A1.1 The quantum mechanics of atoms and molecules 


John F Stanton 


A1.1.1 INTRODUCTION 

At the turn of the 19th century, it was generally believed that the great distance between earth and the stars 
would forever limit what could be learned about the universe. Apart from their approximate size and distance 
from earth, there seemed to be no hope of determining intensive properties of stars, such as temperature and 
composition. While this pessimistic attitude may seem quaint from a modern perspective, it should be 
remembered that all knowledge gained in these areas has been obtained by exploiting a scientific technique 
that did not exist 200 years ago — spectroscopy. 

In 1859, Kirchhoff made a breakthrough discovery about the nearest star, our sun. It had been known for some 
time that a number of narrow dark lines are found when sunlight is bent through a prism. These absences had 
been studied systematically by Fraunhofer, who also noted that dark lines can be found in the spectra of 
other stars; furthermore, many of these absences are found at the same wavelengths as those in the solar 
spectrum. By burning substances in the laboratory, Kirchhoff was able to show that some of the features are 
due to the presence of sodium atoms in the solar atmosphere. For the first time, it had been demonstrated that 
an element found on our planet is not unique, but exists elsewhere in the universe. Perhaps most important, 
the field of modern spectroscopy was born. 

Armed with the empirical knowledge that each element in the periodic table has a characteristic spectrum, and 
that heating materials to a sufficiently high temperature disrupts all interatomic interactions, Bunsen and 
Kirchhoff invented the spectroscope, an instrument that atomizes substances in a flame and then records their 
emission spectrum. Using this instrument, the elemental composition of several compounds and minerals was 
deduced by measuring the wavelength of radiation that they emit. In addition, this new science led to the 
discovery of elements, notably caesium and rubidium. 

Despite the enormous benefits of the fledgling field of spectroscopy for chemistry, the underlying physical 
processes were completely unknown a century ago. It was believed that the characteristic frequencies of 
elements were caused by (nebulously defined) vibrations of the atoms, but even a remotely satisfactory 
quantitative theory proved to be elusive. In 1885, the Swiss mathematician Balmer noted that wavelengths in 
the visible region of the hydrogen atom emission spectrum could be fitted by the empirical equation 

$$\lambda = \frac{b\,n^2}{n^2 - m^2} \tag{A1.1.1}$$

where m = 2 and n is an integer. Subsequent study showed that frequencies in other regions of the hydrogen 
spectrum could be fitted to this equation by assigning different integer values to m, albeit with a different 
value of the constant b. Ritz noted that a simple modification of Balmer's formula 

$$\frac{1}{\lambda} = R_\infty\left(\frac{1}{m^2} - \frac{1}{n^2}\right) \tag{A1.1.2}$$

succeeds in fitting all the line spectra corresponding to different values of m with only the single constant $R_\infty$. 
Although this formula provides an important clue regarding the underlying processes involved in 
spectroscopy, more than two decades passed before a theory of atomic structure succeeded in deriving this 
equation from first principles. 
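
The relationship between the Balmer and Ritz forms can be illustrated with a few lines of Python. This is only a sketch: the numerical value of the hydrogen Rydberg constant and the helper names `ritz_wavelength` and `balmer_constant` are assumptions introduced here, not part of the original text.

```python
# Sketch of the Ritz formula 1/lambda = R (1/m^2 - 1/n^2) for hydrogen.
# R_H is an assumed literature value (m^-1); it is not quoted in the chapter.
R_H = 1.0967758e7


def ritz_wavelength(m: int, n: int, rydberg: float = R_H) -> float:
    """Wavelength in metres of the line connecting integer levels m < n."""
    if not 0 < m < n:
        raise ValueError("require integers with 0 < m < n")
    inverse_wavelength = rydberg * (1.0 / m**2 - 1.0 / n**2)
    return 1.0 / inverse_wavelength


def balmer_constant(m: int = 2, rydberg: float = R_H) -> float:
    """Balmer's b for a given m: lambda = b n^2 / (n^2 - m^2) implies b = m^2 / R."""
    return m**2 / rydberg


if __name__ == "__main__":
    # Visible (m = 2) lines of hydrogen, printed in nanometres.
    for n in range(3, 7):
        print(f"n = {n}: {ritz_wavelength(2, n) * 1e9:.1f} nm")
```

Because b = m²/R, the constant b changes with m while R does not, which is why the Ritz form needs only a single constant.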

The origins of line spectra as well as other unexplained phenomena such as radioactivity and the intensity 
profile in the emission spectrum of hot objects eventually led to a realization that the physics of the day was 
incomplete. New ideas were clearly needed before a detailed understanding of the submicroscopic world of 
atoms and molecules could be gained. At the turn of the 20th century, Planck succeeded in deriving an 
equation that gave a correct description of the radiation emitted by an idealized isolated solid (blackbody 
radiation). In the derivation, Planck assumed that the energy of electromagnetic radiation emitted by the 
vibrating atoms of the solid cannot take just any value, but must be an integral multiple of hν, where ν is the 
frequency of the radiation and h is now known as Planck's constant. The resulting formula matched the 
experimental blackbody spectrum perfectly. 

Another phenomenon that could not be explained by classical physics involved what is now known as the 
photoelectric effect. When light impinges on a metal, ionization leading to ejection of electrons happens only 
at wavelengths (λ = c/ν, where c is the speed of light) below a certain threshold. At shorter wavelengths 
(higher frequency), the kinetic energy of the photoelectrons depends linearly on the frequency of the applied 
radiation field and is independent of its intensity. These findings were inconsistent with conventional 
electromagnetic theory. A brilliant analysis of this phenomenon by Einstein convincingly demonstrated that 
electromagnetic energy is indeed absorbed in bundles, or quanta (now called photons), each with energy hv 
where h is precisely the same quantity that appears in Planck's formula for the blackbody emission spectrum. 

While the revolutionary ideas of Planck and Einstein forged the beginnings of the quantum theory, the physics 
governing the structure and properties of atoms and molecules remained unknown. Independent experiments 
by Thomson, Wiechert and Kaufmann had established that atoms are not the indivisible entities postulated by 
Democritus 2000 years ago and assumed in Dalton's atomic theory. Rather, it had become clear that all atoms 
contain identical negative charges called electrons. At first, this was viewed as a rather esoteric feature of 
matter, the electron being an entity that 'would never be of any use to anyone'. With time, however, the 
importance of the electron and its role in the structure of atoms came to be understood. Perhaps the most 
significant advance was Rutherford's interpretation of the scattering of alpha particles from a thin gold foil in 
terms of atoms containing a very small, dense, positively charged core surrounded by a cloud of electrons. 
This picture of atoms is fundamentally correct, and is now learned each year by millions of elementary school 
students. 

Like the photoelectric effect, the atomic model developed by Rutherford in 1911 is not consistent with the 
classical theory of electromagnetism. In the hydrogen atom, the force due to Coulomb attraction between the 
nucleus and the electron results in acceleration of the electron (Newton's second law). Classical electromagnetic 
theory mandates that all accelerated bodies bearing charge must emit radiation. Since emission of radiation 
necessarily results in a loss of energy, the electron should eventually be captured by the nucleus. But this 
catastrophe does not occur. Two years after Rutherford's gold-foil experiment, the first quantitatively 
successful theory of an atom was developed by Bohr. This model was based on a combination of purely 
classical ideas, use of Planck's constant h and the bold assumption that radiative loss of energy does not occur 
provided the electron adheres to certain special orbits, or 'stationary states'. Specifically, electrons that move 
in a circular path about the nucleus with a classical angular momentum mvr equal to an integral multiple of 
Planck's constant divided by 2π (a quantity of sufficient general use that it is designated by the simple symbol 
ħ) are immune from energy loss in the Bohr model. By simply writing the classical energy of the orbiting 
electron in terms of its mass m, velocity v, distance r from the nucleus and charge e, 

$$E = T + V = \tfrac{1}{2} m v^2 - \frac{e^2}{r} \tag{A1.1.3}$$

invoking the (again classical) virial theorem that relates the average kinetic (⟨T⟩) and potential (⟨V⟩) energy of 
a system governed by a potential that depends on pairwise interactions of the form $r^k$ via 

$$\langle T \rangle = \frac{k}{2}\langle V \rangle \tag{A1.1.4}$$

(so that ⟨T⟩ = −⟨V⟩/2 for the Coulomb potential, k = −1) and using Bohr's criterion for stable orbits 

$$m v r = n\hbar \tag{A1.1.5}$$

it is relatively easy to demonstrate that energies associated with orbits having angular momentum nħ in the 
hydrogen atom are given by 

$$E_n = -\frac{m e^4}{2\hbar^2 n^2} \tag{A1.1.6}$$

with corresponding radii 

$$r_n = \frac{n^2\hbar^2}{m e^2} \tag{A1.1.7}$$

Bohr further postulated that quantum jumps between the different allowed energy levels are always 
accompanied by absorption or emission of a photon, as required by energy conservation, viz. 

$$\Delta E = E_n - E_m = \frac{m e^4}{2\hbar^2}\left(\frac{1}{m^2} - \frac{1}{n^2}\right) = h\nu \tag{A1.1.8}$$

or perhaps more illustratively 

$$\frac{1}{\lambda} = \frac{m e^4}{4\pi\hbar^3 c}\left(\frac{1}{m^2} - \frac{1}{n^2}\right) \tag{A1.1.9}$$

precisely the form of the equation deduced by Ritz. (In the prefactor m is the electron mass; inside the 
parentheses m and n are the integers labelling the two levels.) The constant term of equation (A1.1.2) calculated from 
Bohr's equation did not exactly reproduce the experimental value at first. However, this situation was quickly 
remedied when it was realized that a proper treatment of the two-particle problem involved use of the reduced 
mass of the system, $\mu = m\,m_{\mathrm{proton}}/(m + m_{\mathrm{proton}})$, a minor modification that gives striking agreement with 
experiment. 
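
The Bohr expressions and the reduced-mass correction can be checked numerically. The sketch below is written in SI units, so the Gaussian-unit e² of the equations above is replaced by e²/(4πε₀); the constants are assumed CODATA-style values and all function names are hypothetical helpers introduced for illustration.

```python
import math

# Assumed SI values of the physical constants (not taken from the chapter).
M_E = 9.1093837015e-31      # electron mass, kg
M_P = 1.67262192369e-27     # proton mass, kg
E_CHARGE = 1.602176634e-19  # elementary charge, C
H = 6.62607015e-34          # Planck constant, J s
C = 299792458.0             # speed of light, m/s
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
HBAR = H / (2.0 * math.pi)


def coulomb_e2() -> float:
    """SI counterpart of the Gaussian e^2 used in the text: e^2 / (4 pi eps0)."""
    return E_CHARGE**2 / (4.0 * math.pi * EPS0)


def bohr_energy(n: int, mass: float = M_E) -> float:
    """E_n = -m e^4 / (2 hbar^2 n^2), in joules."""
    return -mass * coulomb_e2() ** 2 / (2.0 * HBAR**2 * n**2)


def bohr_radius(n: int, mass: float = M_E) -> float:
    """r_n = n^2 hbar^2 / (m e^2), in metres."""
    return n**2 * HBAR**2 / (mass * coulomb_e2())


def rydberg_constant(mass: float = M_E) -> float:
    """Prefactor m e^4 / (4 pi hbar^3 c) of the Ritz-type formula, in m^-1."""
    return mass * coulomb_e2() ** 2 / (4.0 * math.pi * HBAR**3 * C)


def reduced_mass(m1: float = M_E, m2: float = M_P) -> float:
    """mu = m1 m2 / (m1 + m2)."""
    return m1 * m2 / (m1 + m2)


if __name__ == "__main__":
    print("E_1 / eV          :", bohr_energy(1) / E_CHARGE)
    print("r_1 / m           :", bohr_radius(1))
    print("R (electron mass) :", rydberg_constant())
    print("R (reduced mass)  :", rydberg_constant(reduced_mass()))
```

Passing the reduced mass instead of the electron mass lowers the constant by roughly one part in 1800, which is the size of the 'minor modification' referred to above.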


Despite its success in reproducing the hydrogen atom spectrum, the Bohr model of the atom rapidly 
encountered difficulties. Advances in the resolution obtained in spectroscopic experiments had shown that the 
spectral features of the hydrogen atom are actually composed of several closely spaced lines; these are not 
accounted for by quantum jumps between Bohr's allowed orbits. However, by modifying the Bohr model to 
allow for elliptical orbits and to include the special theory of relativity, Sommerfeld was able to account for 
some of the fine structure of spectral lines. More serious problems arose when the planetary model was 
applied to systems that contained more than one electron. Efforts to calculate the spectrum of helium were 
completely unsuccessful, as was a calculation of the spectrum of the hydrogen molecule ion ($\mathrm{H}_2^+$) that used a 
generalization of the Bohr model to treat a problem involving two nuclei. This latter work formed the basis of 
the PhD thesis of Pauli, who was to become one of the principal players in the development of a more mature 
and comprehensive theory of atoms and molecules. 

In retrospect, the Bohr model of the hydrogen atom contains several flaws. Perhaps most prominent among 
these is that the angular momentum of the hydrogen ground state (n = 1) given by the model is ħ; it is now 
known that the correct value is zero. Efforts to remedy the Bohr model for its insufficiencies, pursued 
doggedly by Sommerfeld and others, were ultimately unsuccessful. This 'old' quantum theory was replaced in 
the 1920s by a considerably more abstract framework that forms the basis for our current understanding of the 
detailed physics governing chemical processes. The modern quantum theory, unlike Bohr's, does not involve 
classical ideas coupled with an ad hoc incorporation of Planck's quantum hypothesis. It is instead founded 
upon a limited number of fundamental principles that cannot be proven, but must be regarded as laws of 
nature. While the modern theory of quantum mechanics is exceedingly complex and fraught with certain 
philosophical paradoxes (which will not be discussed), it has withstood the test of time; no contradiction 
between predictions of the theory and actual atomic or molecular phenomena has ever been observed. 

The purpose of this chapter is to provide an introduction to the basic framework of quantum mechanics, with 
an emphasis on aspects that are most relevant for the study of atoms and molecules. After summarizing the 
basic principles of the subject that represent required knowledge for all students of physical chemistry, the 
independent-particle approximation so important in molecular quantum mechanics is introduced. A significant 
effort is made to describe this approach in detail and to communicate how it is used as a foundation for 
qualitative understanding and as a basis for more accurate treatments. Following this, the basic techniques 
used in accurate calculations that go beyond the independent-particle picture (variational method and 
perturbation theory) are described, with some attention given to how they are actually used in practical 
calculations. 

It is clearly impossible to present a comprehensive discussion of quantum mechanics in a chapter of this 
length. Instead, one is forced to present cursory overviews of many topics or to limit the scope and provide a 
more rigorous treatment of a select group of subjects. The latter alternative has been followed here. 
Consequently, many areas of quantum mechanics are largely ignored. For the most part, however, the areas 
lightly touched upon or completely absent from this chapter are specifically dealt with elsewhere in the 
encyclopedia. Notable among these are the interaction between matter and radiation, spin and magnetism, 
techniques of quantum chemistry including the Born-Oppenheimer approximation, the Hartree-Fock method 
and electron correlation, scattering theory and the treatment of internal nuclear motion (rotation and vibration) 
in molecules. 


A1.1.2 CONCEPTS OF QUANTUM MECHANICS 

A1.1.2.1 BEGINNINGS AND FUNDAMENTAL POSTULATES 

The modern quantum theory derives from work done independently by Heisenberg and Schrödinger in the 
mid-1920s. Superficially, the mathematical formalisms developed by these individuals appear very different; 
the quantum mechanics of Heisenberg is based on the properties of matrices, while that of Schrödinger is 
founded upon a differential equation that bears similarities to those used in the classical theory of waves. 
Schrödinger's formulation was strongly influenced by the work of de Broglie, who made the revolutionary 
hypothesis that entities previously thought to be strictly particle-like (electrons) can exhibit wavelike 
behaviour (such as diffraction) with particle 'wavelength' and momentum (p) related by the equation λ = h/p. 
This truly startling premise was subsequently verified independently by Davisson and Germer as well as by 
Thomson, who showed that electrons exhibit diffraction patterns when passed through crystals and very small 
circular apertures, respectively. Both the treatment of Heisenberg, which did not make use of wave theory 
concepts, and that of Schrödinger were successfully applied to the calculation of the hydrogen atom spectrum. 
It was ultimately proven by both Pauli and Schrödinger that the 'matrix mechanics' of Heisenberg and the 
'wave mechanics' of Schrödinger are mathematically equivalent. Connections between the two methods were 
further clarified by the transformation theory of Dirac and Jordan. The importance of this new quantum theory 
was recognized immediately: Heisenberg was awarded the 1932 Nobel Prize in physics, and Schrödinger and 
Dirac shared the 1933 prize, for their work. 

While not unique, the Schrödinger picture of quantum mechanics is the most familiar to chemists principally 
because it has proven to be the simplest to use in practical calculations. Hence, the remainder of this section 
will focus on the Schrödinger formulation and its associated wavefunctions, operators and eigenvalues. 
Moreover, effects associated with the special theory of relativity (which include spin) will be ignored in this 
subsection. Treatments of alternative formulations of quantum mechanics and discussions of relativistic 
effects can be found in the reading list that accompanies this chapter. 

Like the geometry of Euclid and the mechanics of Newton, quantum mechanics is an axiomatic subject. By 
making several assertions, or postulates, about the mathematical properties of and physical interpretation 
associated with solutions to the Schrödinger equation, the subject of quantum mechanics can be applied to 
understand behaviour in atomic and molecular systems. The first of these postulates is: 

1. Corresponding to any collection of n particles, there exists a time-dependent function 
$\Psi(q_1, q_2, \ldots, q_n; t)$ that comprises all information that can be known about the system. This function 
must be continuous and single valued, and have continuous first derivatives at all points where 
the classical force has a finite magnitude. 

In classical mechanics, the state of the system may be completely specified by the set of Cartesian particle 
coordinates $\mathbf{r}_i$ and velocities $\mathrm{d}\mathbf{r}_i/\mathrm{d}t$ at any given time. These evolve according to Newton's equations of 
motion. In principle, one can write down equations involving the state variables and forces acting on the 
particles which can be solved to give the location and velocity of each particle at any later (or earlier) time t′, 
provided one knows the precise state of the classical system at time t. In quantum mechanics, the state of the 
system at time t is instead described by a well behaved mathematical function of the particle coordinates $q_i$ 
rather than a simple list of positions and velocities. 


The relationship between this wavefunction (sometimes called state function) and the location of particles in 
the system forms the basis for a second postulate: 

2. The product of $\Psi(q_1, q_2, \ldots, q_n; t)$ and its complex conjugate has the following physical 
interpretation. The probability of finding the n particles of the system in the regions bounded by 
the coordinates $q_1', q_2', \ldots, q_n'$ and $q_1'', q_2'', \ldots, q_n''$ at time t is proportional to the integral 

$$\int_{q_1'}^{q_1''}\int_{q_2'}^{q_2''}\cdots\int_{q_n'}^{q_n''} \Psi^*(q_1, q_2, \ldots, q_n; t)\,\Psi(q_1, q_2, \ldots, q_n; t)\,\mathrm{d}q_1\,\mathrm{d}q_2\cdots\mathrm{d}q_n \tag{A1.1.10}$$


The proportionality between the integral and the probability can be replaced by an equivalence if the 
wavefunction is scaled appropriately. Specifically, since the probability that the n particles will be found 
somewhere must be unity, the wavefunction can be scaled so that the equality 

$$\int \Psi^*\Psi\,\mathrm{d}\tau = 1 \tag{A1.1.11}$$

is satisfied. The symbol dτ introduced here and used throughout the remainder of this section indicates that 
the integral is to be taken over the full range of all particle coordinates. Any wavefunction that satisfies 
equation (A1.1.11) is said to be normalized. The product Ψ*Ψ corresponding to a normalized wavefunction is 
sometimes called a probability, but this is an imprecise use of the word. It is instead a probability density, 
which must be integrated to find the chance that a given measurement will find the particles in a certain region 
of space. This distinction can be understood by considering the classical counterpart of Ψ*Ψ for a single 
particle moving on the x-axis. In classical mechanics, the probability at time t for finding the particle at the 
coordinate (x') obtained by propagating Newton's equations of motion from some set of initial conditions is 
exactly equal to one; it is zero for any other value of x. What is the corresponding probability density function, 
P(x; t)? Clearly, P(x; t) vanishes at all points other than x' since its integral over any interval that does not 
include x' must equal zero. At x', the value of P(x; t) must be chosen so that the normalization condition 

$$\int P(x; t)\,\mathrm{d}x = 1$$

is satisfied. Functions such as this play a useful role in quantum mechanics. They are known as Dirac delta 
functions, and are designated by $\delta(r - r_0)$. These functions have the properties 

$$\delta(r - r_0) = 0 \quad (r \neq r_0)$$

$$\int \delta(r - r_0)\,\mathrm{d}r = 1$$

$$\int f(r)\,\delta(r - r_0)\,\mathrm{d}r = f(r_0)$$

Although a seemingly odd mathematical entity, it is not hard to appreciate that a simple one-dimensional 
realization of the classical P(x; t) can be constructed from the familiar Gaussian distribution centred about x' 
by letting the standard deviation (σ) go to zero, 

$$P(x; t) = \lim_{\sigma \to 0} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - x')^2}{2\sigma^2}\right]$$

Hence, although the probability for finding the particle at x' is equal to one, the corresponding probability 
density function is infinitely large. In quantum mechanics, the probability density is generally nonzero for all 
values of the coordinates, and its magnitude can be used to determine which regions are most likely to contain 
particles. However, because the number of possible coordinates is infinite, the probability associated with any 
precisely specified choice is zero. The discussion above shows a clear distinction between classical and 
quantum mechanics; given a set of initial conditions, the locations of the particles are determined exactly at all 
future times in the former, while one generally can speak only about the probability associated with a given 
range of coordinates in quantum mechanics. 
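
The distinction between a probability and a probability density can be made concrete with the Gaussian picture of the delta function. In this sketch (the function names and the chosen widths are illustrative assumptions), the peak height of the density grows without bound as the standard deviation shrinks, while the probability of finding the particle in a fixed small interval about the centre approaches one.

```python
import math


def gaussian_density(x: float, centre: float, sigma: float) -> float:
    """Normalized Gaussian probability density centred at `centre`."""
    prefactor = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return prefactor * math.exp(-((x - centre) ** 2) / (2.0 * sigma**2))


def probability_within(half_width: float, sigma: float) -> float:
    """Exact integral of the density over centre +/- half_width (via erf)."""
    return math.erf(half_width / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    for sigma in (1.0, 0.1, 0.01, 0.001):
        peak = gaussian_density(0.0, 0.0, sigma)
        prob = probability_within(0.05, sigma)
        print(f"sigma = {sigma:<6} peak density = {peak:10.3f}  P(|x - x'| < 0.05) = {prob:.6f}")
```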


To extract information from the wavefunction about properties other than the probability density, additional 
postulates are needed. All of these rely upon the mathematical concepts of operators, eigenvalues and 
eigenfunctions. An extensive discussion of these important elements of the formalism of quantum mechanics 
is precluded by space limitations. For further details, the reader is referred to the reading list supplied at the 
end of this chapter. In quantum mechanics, the classical notions of position, momentum, energy etc are 
replaced by mathematical operators that act upon the wavefunction to provide information about the system. 
The third postulate relates to certain properties of these operators: 

3. Associated with each system property A is a linear, Hermitian operator $\hat{A}$. 

Although not a unique prescription, the quantum-mechanical operators $\hat{A}$ can be obtained from their classical 
counterparts A by making the substitutions x → x (coordinates); t → t (time); $p_q \to -\mathrm{i}\hbar\,\partial/\partial q$ (component of 
momentum). Hence, the quantum-mechanical operators of greatest relevance to the dynamics of an n-particle 
system such as an atom or molecule are: 


Dynamical variable A             Classical quantity                                  Quantum-mechanical operator $\hat{A}$

Time                             $t$                                                 $t$
Position of particle i           $\mathbf{r}_i$                                      $\mathbf{r}_i$
Momentum of particle i           $\mathbf{p}_i = m_i\mathbf{v}_i$                    $-\mathrm{i}\hbar\nabla_i$
Angular momentum of particle i   $\mathbf{L}_i = \mathbf{r}_i \times \mathbf{p}_i$   $-\mathrm{i}\hbar\,\mathbf{r}_i \times \nabla_i$
Kinetic energy of particle i     $T_i = p_i^2/(2m_i)$                                $-(\hbar^2/2m_i)\nabla_i^2$
Potential energy                 $V(q, t)$                                           $V(q, t)$


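where the gradient 

$$\nabla_i = \mathbf{i}\frac{\partial}{\partial x_i} + \mathbf{j}\frac{\partial}{\partial y_i} + \mathbf{k}\frac{\partial}{\partial z_i}$$

and Laplacian 

$$\nabla_i^2 = \frac{\partial^2}{\partial x_i^2} + \frac{\partial^2}{\partial y_i^2} + \frac{\partial^2}{\partial z_i^2}$$

operators have been introduced. Note that a potential energy which depends upon only particle coordinates 
and time has exactly the same form in classical and quantum mechanics. A particularly useful operator in 
quantum mechanics is that which corresponds to the total energy. This Hamiltonian operator is obtained by 
simply adding the potential and kinetic energy operators 

$$\hat{H} = \hat{T} + \hat{V} = -\sum_{i}^{\text{particles}} \frac{\hbar^2}{2m_i}\left[\frac{\partial^2}{\partial x_i^2} + \frac{\partial^2}{\partial y_i^2} + \frac{\partial^2}{\partial z_i^2}\right] + V(q, t) \tag{A1.1.19}$$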


The relationship between the abstract quantum-mechanical operators $\hat{A}$ and the corresponding physical 
quantities A is the subject of the fourth postulate, which states: 

4. If the system property A is measured, the only values that can possibly be observed are those 
that correspond to eigenvalues of the quantum-mechanical operator $\hat{A}$. 

An illustrative example is provided by investigating the possible momenta for a single particle travelling in 
the x-direction, $p_x$. First, one writes the equation that defines the eigenvalue condition 

$$\hat{p}_x f(x) = -\mathrm{i}\hbar\frac{\mathrm{d}f(x)}{\mathrm{d}x} = \lambda f(x) \tag{A1.1.20}$$

where λ is an eigenvalue of the momentum operator and f(x) is the associated eigenfunction. It is easily 
verified that this differential equation has an infinite number of solutions of the form 

$$f_k(x) = A\exp(\mathrm{i}kx) \tag{A1.1.21}$$

with corresponding eigenvalues 

$$\lambda_k = \hbar k \tag{A1.1.22}$$


in which k can assume any value. Hence, nature places no restrictions on allowed values of the linear 
momentum. Does this mean that a quantum-mechanical particle in a particular state ψ(x; t) is allowed to have 
any value of $p_x$? The answer to this question is 'yes', but the interpretation of its consequences is rather subtle. 
Eventually a fifth postulate will be required to establish the connection between the quantum-mechanical 
wavefunction ψ and the possible outcomes associated with measuring properties of the system. It turns out 
that the set of possible momenta for our particle depends entirely on its wavefunction, as might be expected 
from the first postulate given above. The infinite set of solutions to equation (A1.1.20) means only that no 
values of the momentum are excluded, in the sense that they can be associated with a particle described by an 
appropriately chosen wavefunction. However, the choice of a specific function might (or might not) impose 
restrictions on which values of $p_x$ are allowed. 
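
The eigenvalue condition for the momentum operator can be verified numerically. The following sketch assumes natural units (ħ = 1) and approximates the derivative by a central finite difference; `plane_wave`, `apply_momentum` and `eigenvalue_estimate` are hypothetical helper names.

```python
import cmath

HBAR = 1.0  # natural units, an assumption of this sketch


def plane_wave(x: float, k: float, amplitude: complex = 1.0) -> complex:
    """f_k(x) = A exp(i k x)."""
    return amplitude * cmath.exp(1j * k * x)


def apply_momentum(f, x: float, step: float = 1e-5) -> complex:
    """Approximate (-i hbar d/dx) f at the point x by a central difference."""
    derivative = (f(x + step) - f(x - step)) / (2.0 * step)
    return -1j * HBAR * derivative


def eigenvalue_estimate(k: float, x: float) -> complex:
    """Ratio (p f)(x) / f(x); for an eigenfunction it is independent of x."""
    f = lambda y: plane_wave(y, k)
    return apply_momentum(f, x) / f(x)


if __name__ == "__main__":
    for k in (-3.0, 0.5, 2.5):
        print(k, eigenvalue_estimate(k, x=0.7))
```

For every real k the ratio returned is close to ħk at any point x, in line with the statement that no value of the momentum is excluded.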

The rather complicated issues raised in the preceding paragraph are central to the subject of quantum 
mechanics, and their resolution forms the basis of one of the most important postulates associated with the 
Schrödinger formulation of the subject. In the example above, discussion focuses entirely on the eigenvalues 
of the momentum operator. What significance, if any, can be attached to the eigenfunctions of quantum- 
mechanical operators? In the interest of simplicity, the remainder of this subsection will focus entirely on the 
quantum mechanics associated with operators that have a finite number of eigenvalues. These are said to have 
a discrete spectrum, in contrast to those such as the linear momentum, which have a continuous spectrum. 
Discrete spectra of eigenvalues arise whenever boundaries limit the region of space in which a system can be. 
Examples are particles in hard-walled boxes, or soft-walled shells and particles attached to springs. The 
results developed below can all be generalized to the continuous case, but at the expense of increased 
mathematical complexity. Readers interested in these details should consult chapter 1 of Landau and Lifshitz 
(see additional reading). 


It can be shown that the eigenfunctions of Hermitian operators necessarily exhibit a number of useful 
mathematical properties. First, if all eigenvalues are distinct, the eigenfunctions $\{f_1, f_2, \ldots, f_n\}$ are 
orthogonal in the sense that the integral of the product formed from the complex conjugate of eigenfunction 
j ($f_j^*$) and eigenfunction k ($f_k$) vanishes unless j = k, 

$$\int f_j^* f_k\,\mathrm{d}\tau = 0 \quad \text{if } j \neq k \tag{A1.1.23}$$

If there are identical eigenvalues (a common occurrence in atomic and molecular quantum mechanics), it is 
permissible to form linear combinations of the eigenfunctions corresponding to these degenerate eigenvalues, 
as these must also be eigenfunctions of the operator. By making a judicious choice of the expansion 
coefficients, the degenerate eigenfunctions can also be made orthogonal to one another. Another useful 
property is that the set of eigenfunctions is said to be complete. This means that any function of the 
coordinates that appear in the operator can be written as a linear combination of its eigenfunctions, provided 
that the function obeys the same boundary conditions as the eigenfunctions and shares any fundamental 
symmetry property that is common to all of them. If, for example, all of the eigenfunctions vanish at some 
point in space, then only functions that vanish at the same point can be written as linear combinations of the 
eigenfunctions. Similarly, if the eigenfunctions of a particular operator in one dimension are all odd functions 
of the coordinate, then all linear combinations of them must also be odd. It is clearly impossible in the latter 
case to expand functions such as cos(x), exp(−x²) etc in terms of odd functions. This qualification is omitted in 
some elementary treatments of quantum mechanics, but it is one that turns out to be important for systems 
containing several identical particles. Nevertheless, if these criteria are met by a suitable function g, then it is 
always possible to find coefficients $c_k$ such that 

$$g = \sum_k c_k f_k \tag{A1.1.24}$$

where the coefficient $c_j$ is given by 

$$c_j = \frac{\int f_j^* g\,\mathrm{d}\tau}{\int f_j^* f_j\,\mathrm{d}\tau} \tag{A1.1.25}$$

If the eigenfunctions are normalized, this expression reduces to 

$$c_j = \int f_j^* g\,\mathrm{d}\tau \tag{A1.1.26}$$

When normalized, the eigenfunctions corresponding to a Hermitian operator are said to represent an 
orthonormal set. 
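
Orthonormality and completeness can be illustrated with a concrete set of eigenfunctions. The sketch below assumes the normalized particle-in-a-box functions √2 sin(kπx) on the interval [0, 1] (an example chosen here, not taken from the text) and expands the function g(x) = x(1 − x), which obeys the same boundary conditions. The integration routine and all names are illustrative.

```python
import math


def simpson(func, a: float, b: float, intervals: int = 2000) -> float:
    """Composite Simpson rule on [a, b]; `intervals` must be even."""
    h = (b - a) / intervals
    total = func(a) + func(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def box_eigenfunction(k: int):
    """Normalized eigenfunction sqrt(2) sin(k pi x); it vanishes at x = 0 and x = 1."""
    return lambda x: math.sqrt(2.0) * math.sin(k * math.pi * x)


def overlap(f, g) -> float:
    """Integral of f* g over [0, 1]; the functions here are real, so f* = f."""
    return simpson(lambda x: f(x) * g(x), 0.0, 1.0)


def parabola(x: float) -> float:
    """Example function g(x) = x (1 - x), zero at both walls."""
    return x * (1.0 - x)


def expansion_coefficients(g, n_terms: int) -> list:
    """c_j = integral of f_j* g for j = 1 .. n_terms (normalized eigenfunctions)."""
    return [overlap(box_eigenfunction(j), g) for j in range(1, n_terms + 1)]


def reconstruct(coeffs, x: float) -> float:
    """Evaluate the truncated expansion sum_k c_k f_k(x)."""
    return sum(c * box_eigenfunction(k)(x) for k, c in enumerate(coeffs, start=1))


if __name__ == "__main__":
    coeffs = expansion_coefficients(parabola, 9)
    print("coefficients:", [round(c, 6) for c in coeffs])
    print("g(0.5) exact:", parabola(0.5), " expansion:", reconstruct(coeffs, 0.5))
```

Only odd k contribute, because g is symmetric about the centre of the box; a function that did not vanish at the walls could not be represented well by this set, in line with the qualification discussed above.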

The mathematical properties discussed above are central to the next postulate: 

5. In any experiment, the probability of observing a particular non-degenerate value for the 
system property A can be determined by the following procedure. First, expand the 
wavefunction in terms of the complete set of normalized eigenfunctions of the quantum-mechanical 
operator, $\hat{A}$, 

$$\Psi = \sum_k c_k \phi_k \tag{A1.1.27}$$

The probability of measuring $A = \lambda_k$, where $\lambda_k$ is the eigenvalue associated with the normalized 
eigenfunction $\phi_k$, is precisely equal to $|c_k|^2 = c_k^* c_k$. For degenerate eigenvalues, the probability 
of observation is given by $\sum |c_k|^2$, where the sum is taken over all of the eigenfunctions $\phi_k$ that 
correspond to the degenerate eigenvalue $\lambda_k$. 
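
The bookkeeping of the fifth postulate, including the degenerate case, can be sketched as follows. The state is represented directly by its expansion coefficients in an orthonormal eigenbasis; the coefficients, the eigenvalues and the function names are made-up illustrative values, not data from the text.

```python
from collections import defaultdict


def normalize(coeffs):
    """Scale the coefficients so that sum |c_k|^2 = 1."""
    norm = sum(abs(c) ** 2 for c in coeffs) ** 0.5
    return [c / norm for c in coeffs]


def outcome_probabilities(coeffs, eigenvalues):
    """Map each distinct eigenvalue to the summed |c_k|^2 of its eigenfunctions."""
    probabilities = defaultdict(float)
    for c, lam in zip(normalize(coeffs), eigenvalues):
        probabilities[lam] += abs(c) ** 2
    return dict(probabilities)


if __name__ == "__main__":
    # Hypothetical four-term expansion; the eigenvalue 1.5 is doubly degenerate.
    coefficients = [1 + 1j, 1, -1j, 2]
    eigenvalues = [0.5, 1.5, 1.5, 2.5]
    for value, prob in outcome_probabilities(coefficients, eigenvalues).items():
        print(f"P(A = {value}) = {prob:.3f}")
```

The coefficients may be complex; only their squared moduli enter, and contributions from eigenfunctions sharing an eigenvalue are added together.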

At this point, it is appropriate to mention an elementary concept from the theory of probability. If there are n 
possible numerical outcomes ($\xi_i$) associated with a particular process, the average value $\langle\xi\rangle$ can be calculated 
by summing up all of the outcomes, each weighted by its corresponding probability 

$$\langle\xi\rangle = \sum_{i=1}^{n} \xi_i P_i \tag{A1.1.28}$$

As an example, the possible outcomes and associated probabilities for rolling a pair of six-sided dice are 


Sum    Probability

2      1/36
3      1/18
4      1/12
5      1/9
6      5/36
7      1/6
8      5/36
9      1/9
10     1/12
11     1/18
12     1/36


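The average value is therefore given by the sum 

$$2\left(\tfrac{1}{36}\right) + 3\left(\tfrac{1}{18}\right) + 4\left(\tfrac{1}{12}\right) + 5\left(\tfrac{1}{9}\right) + 6\left(\tfrac{5}{36}\right) + 7\left(\tfrac{1}{6}\right) + 8\left(\tfrac{5}{36}\right) + 9\left(\tfrac{1}{9}\right) + 10\left(\tfrac{1}{12}\right) + 11\left(\tfrac{1}{18}\right) + 12\left(\tfrac{1}{36}\right) = 7$$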
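
The dice average can be reproduced by enumerating all outcomes. This short sketch uses exact fractions; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def dice_sum_probabilities(sides: int = 6) -> dict:
    """Probability of each possible sum for a pair of fair dice."""
    probabilities = {}
    single_outcome = Fraction(1, sides * sides)
    for first, second in product(range(1, sides + 1), repeat=2):
        total = first + second
        probabilities[total] = probabilities.get(total, Fraction(0)) + single_outcome
    return probabilities


def average(probabilities: dict) -> Fraction:
    """Sum of the outcomes, each weighted by its probability."""
    return sum(outcome * p for outcome, p in probabilities.items())


if __name__ == "__main__":
    table = dice_sum_probabilities()
    for outcome in sorted(table):
        print(outcome, table[outcome])
    print("average:", average(table))
```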

What does this have to do with quantum mechanics? To establish a connection, it is necessary to first expand 
the wavefunction in terms of the eigenfunctions of a quantum-mechanical operator $\hat{A}$, 

$$\Psi = \sum_k c_k \phi_k \tag{A1.1.29}$$

We will assume that both the wavefunction and the orthogonal eigenfunctions are normalized, which implies 
that 

$$\sum_j c_j^* c_j = \sum_j |c_j|^2 = 1 \tag{A1.1.30}$$

Now, the operator $\hat{A}$ is applied to both sides of equation (A1.1.29), which because of its linearity, gives 

$$\hat{A}\Psi = \hat{A}\sum_k c_k\phi_k = \sum_k c_k\hat{A}\phi_k = \sum_k c_k\lambda_k\phi_k \tag{A1.1.31}$$

where $\lambda_k$ represents the eigenvalue associated with the eigenfunction $\phi_k$. Next, both sides of the preceding 
equation are multiplied from the left by the complex conjugate of the wavefunction and integrated over all 
space 

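$$\int \Psi^*\hat{A}\Psi\,\mathrm{d}\tau = \int \sum_j\sum_k c_j^* c_k \lambda_k \phi_j^*\phi_k\,\mathrm{d}\tau \tag{A1.1.32}$$

$$\int \Psi^*\hat{A}\Psi\,\mathrm{d}\tau = \sum_j\sum_k c_j^* c_k \lambda_k \int \phi_j^*\phi_k\,\mathrm{d}\tau = \sum_k \lambda_k |c_k|^2$$

The last identity follows from the orthogonality property of eigenfunctions and the assumption of 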
normalization. The right-hand side in the final result is simply equal to the sum over all eigenvalues of the 
operator (possible results of the measurement) multiplied by the respective probabilities. Hence, an important 
corollary to the fifth postulate is established: 

$$\langle A \rangle = \int \Psi^*\hat{A}\Psi\,\mathrm{d}\tau$$

This provides a recipe for calculating the average value of the system property associated with the quantum-mechanical 
operator $\hat{A}$, for a specific but arbitrary choice of the wavefunction Ψ, notably those choices which 
are not eigenfunctions of $\hat{A}$. 
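
The equivalence of the two routes to an average value can be checked for a simple case. The sketch assumes a particle in a box of unit length with ħ = m = 1, so that the Hamiltonian is −(1/2) d²/dx², its normalized eigenfunctions are √2 sin(kπx) and its eigenvalues are k²π²/2. The trial wavefunction √30 x(1 − x) is an example chosen here and is not an eigenfunction; all names are illustrative.

```python
import math


def simpson(func, a: float, b: float, intervals: int = 4000) -> float:
    """Composite Simpson rule on [a, b]; `intervals` must be even."""
    h = (b - a) / intervals
    total = func(a) + func(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def psi(x: float) -> float:
    """Normalized trial wavefunction sqrt(30) x (1 - x) on [0, 1]."""
    return math.sqrt(30.0) * x * (1.0 - x)


def hamiltonian_psi(x: float) -> float:
    """(H psi)(x) with H = -(1/2) d^2/dx^2; since psi'' = -2 sqrt(30), H psi = sqrt(30)."""
    return math.sqrt(30.0)


def phi(k: int, x: float) -> float:
    """Normalized eigenfunction of the box Hamiltonian."""
    return math.sqrt(2.0) * math.sin(k * math.pi * x)


def eigenvalue(k: int) -> float:
    """Energy eigenvalue k^2 pi^2 / 2 in the assumed units."""
    return (k * math.pi) ** 2 / 2.0


def expectation_direct() -> float:
    """Integral of psi* H psi over the box."""
    return simpson(lambda x: psi(x) * hamiltonian_psi(x), 0.0, 1.0)


def expectation_from_probabilities(n_terms: int = 30) -> float:
    """Sum over eigenvalues weighted by the probabilities |c_k|^2 (truncated)."""
    total = 0.0
    for k in range(1, n_terms + 1):
        c_k = simpson(lambda x: phi(k, x) * psi(x), 0.0, 1.0)
        total += eigenvalue(k) * c_k**2
    return total


if __name__ == "__main__":
    print("direct integral       :", expectation_direct())
    print("probability-weighted  :", expectation_from_probabilities())
```

The truncated sum approaches the direct integral from below as more eigenfunctions are included, since every omitted term is non-negative.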

The fifth postulate and its corollary are extremely important concepts. Unlike classical mechanics, where 
everything can in principle be known with precision, one can generally talk only about the probabilities 
associated with each member of a set of possible outcomes in quantum mechanics. By making a measurement 
of the quantity A, all that can be said with certainty is that one of the eigenvalues of $\hat{A}$ will be observed, and its 
probability can be calculated precisely. However, if it happens that the wavefunction corresponds to one of 
the eigenfunctions of the operator $\hat{A}$, then and only then is the outcome of the experiment certain: the 
measured value of A will be the corresponding eigenvalue. 
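
The corollary to the fifth postulate can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical three-state expansion whose coefficients and eigenvalues are chosen only for illustration; it computes the average value as the sum of the eigenvalues weighted by their probabilities.

```python
import math


def expectation_value(coeffs, eigenvalues):
    """Return sum_k |c_k|^2 a_k for a normalized eigenfunction expansion."""
    probs = [abs(c) ** 2 for c in coeffs]
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9):
        raise ValueError("expansion coefficients are not normalized")
    return sum(p * a for p, a in zip(probs, eigenvalues))


# Hypothetical expansion coefficients (may be complex) and eigenvalues.
coeffs = [math.sqrt(0.5), 0.5j, 0.5]
eigenvalues = [1.0, 2.0, 4.0]

# Probability of observing each eigenvalue in a single measurement.
probabilities = [abs(c) ** 2 for c in coeffs]

# Average over many measurements on identically prepared systems.
average = expectation_value(coeffs, eigenvalues)
```

Any single measurement still returns one of the three eigenvalues; only the mean of many such measurements approaches `average`.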

Up until now, little has been said about time. In classical mechanics, complete knowledge about the system at 
any time t suffices to predict with absolute certainty the properties of the system at any other time t′. The
situation is quite different in quantum mechanics, however, as it is not possible to know everything about the
system at any time t. Nevertheless, a quantum-mechanical system evolves in time in a well-defined way that
depends on the Hamiltonian operator and the wavefunction $\Psi$ according to the last postulate

6. The time evolution of the wavefunction is described by the differential equation 
The differential equation above is known as the time-dependent Schrodinger equation. There is an interesting
and intimate connection between this equation and the classical expression for a travelling wave

$$\psi(x, t) = A \exp\left[2\pi i\left(\frac{x}{\lambda} - \nu t\right)\right] \qquad \text{(A1.1.37)}$$
To convert (A1.1.37) into a quantum-mechanical form that describes the 'matter wave' associated with a free
particle travelling through space, one might be tempted to simply make the substitutions $\nu = E/h$ (Planck's
hypothesis) and $\lambda = h/p$ (de Broglie's hypothesis). It is relatively easy to verify that the resulting expression
satisfies the time-dependent Schrodinger equation. However, it should be emphasized that this is not a 
derivation, as there is no compelling reason to believe that this ad hoc procedure should yield one of the 
fundamental equations of physics. Indeed, the time-dependent Schrodinger equation cannot be derived in a 
rigorous way and therefore must be regarded as a postulate. 
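
The verification mentioned above can be sketched numerically. The example below is a hedged illustration, not part of the original treatment: it assumes natural units (ħ = m = 1), a free particle with E = p²/2m, and the standard form iħ ∂Ψ/∂t = −(ħ²/2m) ∂²Ψ/∂x² of the time-dependent equation. The function names and the sample point are hypothetical.

```python
import cmath

HBAR = 1.0  # natural units (assumption)
MASS = 1.0  # particle mass (assumption)


def matter_wave(x, t, p):
    """Travelling wave after substituting nu = E/h and lambda = h/p."""
    energy = p * p / (2.0 * MASS)  # free particle: kinetic energy only
    return cmath.exp(1j * (p * x - energy * t) / HBAR)


def tdse_residual(x, t, p, h=1e-4):
    """|i*hbar dPsi/dt + (hbar^2/2m) d2Psi/dx2| from central differences."""
    dpsi_dt = (matter_wave(x, t + h, p) - matter_wave(x, t - h, p)) / (2.0 * h)
    d2psi_dx2 = (matter_wave(x + h, t, p) - 2.0 * matter_wave(x, t, p)
                 + matter_wave(x - h, t, p)) / (h * h)
    lhs = 1j * HBAR * dpsi_dt
    rhs = -(HBAR ** 2) / (2.0 * MASS) * d2psi_dx2
    return abs(lhs - rhs)


# The residual should vanish up to finite-difference and rounding error.
residual = tdse_residual(x=0.7, t=0.4, p=1.3)
```

A small residual confirms consistency only; as the text stresses, it does not amount to a derivation of the equation.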

The time-dependent Schrodinger equation allows the precise determination of the wavefunction at any time t 
from knowledge of the wavefunction at some initial time, provided that the forces acting within the system are 
known (these are required to construct the Hamiltonian). While this suggests that quantum mechanics has a 
deterministic component, it must be emphasized that it is not the observable system properties that evolve in a 
precisely specified way, but rather the probabilities associated with values that might be found for them in a 
measurement. 

A1.1.2.2 STATIONARY STATES, SUPERPOSITION AND UNCERTAINTY 

From the very beginning of the 20th century, the concept of energy conservation has made it abundantly clear 
that electromagnetic energy emitted from and absorbed by material substances must be accompanied by 
compensating energy changes within the material. Hence, the discrete nature of atomic line spectra suggested 
that only certain energies are allowed by nature for each kind of atom. The wavelengths of radiation emitted 
or absorbed must therefore be related to the difference between energy levels via Planck's hypothesis,
$\Delta E = h\nu = hc/\lambda$.

The Schrodinger picture of quantum mechanics summarized in the previous subsection allows an important 
deduction to be made that bears directly on the subject of energy levels and spectroscopy. Specifically, the 
energies of spectroscopic transitions must correspond precisely to differences between distinct eigenvalues of 
the Hamiltonian operator, as these correspond to the allowed energy levels of the system. Hence, the set of 
eigenvalues of the Hamiltonian operator are of central importance in chemistry. These can be determined by 
solving the so-called time-independent Schrodinger equation, 

$$\hat{H}\psi_k(q_1, q_2, \ldots, q_n) = E_k \psi_k(q_1, q_2, \ldots, q_n) \qquad \text{(A1.1.38)}$$

for the eigenvalues $E_k$ and eigenfunctions $\psi_k$. It should be clear that the set of eigenfunctions and eigenvalues
does not evolve with time provided the Hamiltonian operator itself is time independent. Moreover, since the
eigenfunctions of the Hamiltonian (like those of any other operator) form a complete set, it is always possible
to expand the exact wavefunction of the system at any time in terms of them:

$$\Psi(q_1, q_2, \ldots, q_n, t) = \sum_k c_k(t) \psi_k(q_1, q_2, \ldots, q_n) \qquad \text{(A1.1.39)}$$
It is important to point out that this expansion is valid even if time-dependent terms are added to the
Hamiltonian (as, for example, when an electric field is turned on). If there is more than one nonzero value of
$c_k$ at any time t, then the system is said to be in a superposition of the energy eigenstates $\psi_k$ associated with
non-vanishing expansion coefficients, $c_k$. If it were possible to measure energies directly, then the fifth
postulate of the previous section tells us that the probability of finding energy $E_k$ in a given measurement
would be $c_k^* c_k$.

When a molecule is isolated from external fields, the Hamiltonian contains only kinetic energy operators for 
all of the electrons and nuclei as well as terms that account for repulsion and attraction between all distinct 
pairs of like and unlike charges, respectively. In such a case, the Hamiltonian is constant in time. When this 
condition is satisfied, the representation of the time-dependent wavefunction as a superposition of 
Hamiltonian eigenfunctions can be used to determine the time dependence of the expansion coefficients. If 
equation (A1.1.39) is substituted into the time-dependent Schrodinger equation

$$i\hbar \frac{\partial}{\partial t} \sum_k c_k(t) \psi_k = \hat{H} \sum_k c_k(t) \psi_k \qquad \text{(A1.1.40)}$$

the simplification 

$$i\hbar \sum_k \psi_k \frac{\partial c_k(t)}{\partial t} = \sum_k E_k c_k(t) \psi_k \qquad \text{(A1.1.41)}$$

can be made to the right-hand side since the restriction of a time-independent Hamiltonian means that $\psi_k$ is
always an eigenfunction of $\hat{H}$. By simply equating the coefficients of the $\psi_k$, it is easy to show that the choice


$$c_k(t) = c_k(0) \exp\left(\frac{-iE_k t}{\hbar}\right) \qquad \text{(A1.1.42)}$$

for the time-dependent expansion coefficients satisfies equation (A1.1.41). As with any differential equation,
there are an infinite number of solutions from which a choice must be made to satisfy some set of initial
conditions. The state of the quantum-mechanical system at time t = 0 is used to fix the arbitrary multipliers
$c_k(0)$, which can always be chosen as real numbers. Hence, the wavefunction $\Psi$ becomes

$$\Psi = \sum_k c_k(0) \exp\left(\frac{-iE_k t}{\hbar}\right) \psi_k \qquad \text{(A1.1.43)}$$


Suppose that the system property A is of interest, and that it corresponds to the quantum-mechanical operator 
A. The average value of A obtained in a series of measurements can be calculated by exploiting the corollary 
to the fifth postulate 


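$$\langle A \rangle = \int \Psi^* \hat{A} \Psi \, d\tau = \sum_j \sum_k c_j(0) c_k(0) \int \exp\left(\frac{iE_j t}{\hbar}\right) \psi_j^* \, \hat{A} \exp\left(\frac{-iE_k t}{\hbar}\right) \psi_k \, d\tau. \qquad \text{(A1.1.44)}$$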
Now consider the case where $\hat{A}$ is itself a time-independent operator, such as that for the position, momentum
or angular momentum of a particle or even the energy of the benzene molecule. In these cases, the
time-dependent expansion coefficients are unaffected by application of the operator, and one obtains

$$\langle A \rangle = \sum_k c_k^2(0) \int \psi_k^* \hat{A} \psi_k \, d\tau + \sum_j \sum_{k \neq j} c_j(0) c_k(0) \cos\left[\frac{(E_k - E_j)t}{\hbar}\right] \int \psi_j^* \hat{A} \psi_k \, d\tau. \qquad \text{(A1.1.45)}$$


As one might expect, the first term that contributes to the expectation value of A is simply its value at t = 0, 
while the second term exhibits an oscillatory time dependence. If the superposition initially includes large 
contributions from states of widely varying energy, then the oscillations in $\langle A \rangle$ will be rapid. If the states that
are strongly mixed have similar energies, then the timescale for oscillation in the properties will be slower.
However, there is one special class of system properties A that exhibit no time dependence whatsoever. If (and
only if) every one of the states $\psi_k$ is an eigenfunction of $\hat{A}$, then the property of orthogonality can be used to
show that every contribution to the second term vanishes. An obvious example is the Hamiltonian operator
itself; it turns out that the expectation value for the energy of a system subjected to forces that do not vary
with time is a constant. Are there other operators that share the same set of eigenfunctions $\psi_k$ with $\hat{H}$, and if
so, how can they be recognized? It can be shown that any two operators which satisfy the property

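$$\hat{A}\hat{B}f = \hat{B}\hat{A}f \;\Rightarrow\; [\hat{A}, \hat{B}]f = 0 \qquad \text{(A1.1.46)}$$

for all functions $f$ share a common set of eigenfunctions, and $\hat{A}$ and $\hat{B}$ are said to commute. (The symbol
$[\hat{A}, \hat{B}]$, meaning $\hat{A}\hat{B} - \hat{B}\hat{A}$, is called the commutator of the operators $\hat{A}$ and $\hat{B}$.) Hence, there is no time dependence for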
the expectation value of any system property that corresponds to a quantum-mechanical operator that 
commutes with the Hamiltonian. Accordingly, these quantities are known as constants of the motion: their 
average values will not vary, provided the environment of the system does not change (as it would, for 
example, if an electromagnetic field were suddenly turned on). In nonrelativistic quantum mechanics, two 
examples of constants of the motion are the square of the total angular momentum, as well as its projection 
along an arbitrarily chosen axis. Other operators, such as that for the dipole moment, do not commute with the 
Hamiltonian and the expectation value associated with the corresponding properties can indeed oscillate with 
time. It is important to note that the frequency of these oscillations is given by differences between the 
allowed energies of the system divided by Planck's constant. These are the so-called Bohr frequencies, and it 
is perhaps not surprising that these are exactly the frequencies of electromagnetic radiation that cause 
transitions between the corresponding energy levels. 
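
The oscillation at a Bohr frequency can be made concrete with a two-state model. This is a minimal sketch under stated assumptions: natural units (ħ = 1), two hypothetical energy levels, real initial coefficients, and a hypothetical 2×2 matrix `dipole_like` standing in for an operator that does not commute with the Hamiltonian.

```python
import cmath
import math

HBAR = 1.0  # natural units (assumption)


def expectation(c0, energies, operator, t):
    """<A>(t) for Psi = sum_k c_k(0) exp(-i E_k t / hbar) psi_k.

    `operator[j][k]` holds the matrix element of A between psi_j and psi_k.
    """
    c_t = [c * cmath.exp(-1j * e * t / HBAR) for c, e in zip(c0, energies)]
    total = 0j
    for j, cj in enumerate(c_t):
        for k, ck in enumerate(c_t):
            total += cj.conjugate() * operator[j][k] * ck
    return total.real


energies = [0.0, 2.0]                       # hypothetical level energies
c0 = [1.0 / math.sqrt(2.0)] * 2             # equal superposition at t = 0
dipole_like = [[0.0, 1.0], [1.0, 0.0]]      # off-diagonal: does not commute with H
hamiltonian = [[0.0, 0.0], [0.0, 2.0]]      # diagonal in its own eigenbasis

# Period associated with the Bohr frequency (E_2 - E_1) / h.
bohr_period = 2.0 * math.pi * HBAR / (energies[1] - energies[0])
```

In this model the expectation value of `dipole_like` returns to its initial value after one `bohr_period`, while the expectation value of `hamiltonian` is the same at every time, as expected for a constant of the motion.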

Close inspection of equation (A1.1.45) reveals that, under very special circumstances, the expectation value
does not change with time for any system properties that correspond to fixed (static) operator representations.
Specifically, if the spatial part of the time-dependent wavefunction is the exact eigenfunction $\psi_j$ of the
Hamiltonian, then $c_j(0) = 1$ (the zero of time can be chosen arbitrarily) and all other $c_k(0) = 0$. The second
term clearly vanishes in these cases, which are known as stationary states. As the name implies, no
observable property of these states varies with time. In a stationary state, the energy of the system has a
precise value (the corresponding eigenvalue of $\hat{H}$) as do observables that are associated with operators that
commute with $\hat{H}$. For all other properties (such as the position and momentum),
one can speak only about average values or probabilities associated with a given measurement, but these
quantities themselves do not depend on time. When an external perturbation such as an electric field is applied 
or a collision with another atom or molecule occurs, however, the system and its properties generally will 
evolve with time. The energies that can be absorbed or emitted in these processes correspond precisely to 
differences in the stationary state energies, so it should be clear that solving the time-independent Schrodinger 
equation for the stationary state wavefunctions and eigenvalues provides a wealth of spectroscopic 
information. The importance of stationary state solutions is so great that it is common to refer to equation 
(A1.1.38) as 'the Schrodinger equation', while the qualified name 'time-dependent Schrodinger equation' is
generally used for equation (A1.1.36). Indeed, the subsequent subsections are devoted entirely to discussions
that centre on the former and its exact and approximate solutions, and the qualifier 'time independent' will be 
omitted. 

Starting with the quantum-mechanical postulate regarding a one-to-one correspondence between system 
properties and Hermitian operators, and the mathematical result that only operators which commute have a 
common set of eigenfunctions, a rather remarkable property of nature can be demonstrated. Suppose that one 
desires to determine the values of the two quantities A and B, and that the corresponding quantum-mechanical 
operators do not commute. In addition, the properties are to be measured simultaneously so that both reflect 
the same quantum-mechanical state of the system. If the wavefunction is neither an eigenfunction of $\hat{A}$ nor $\hat{B}$,
then there is necessarily some uncertainty associated with the measurement. To see this, simply expand the
wavefunction $\psi$ in terms of the eigenfunctions of the relevant operators

$$\psi = \sum_k a_k f_k^A \qquad \text{(A1.1.47)}$$

$$\psi = \sum_k b_k f_k^B \qquad \text{(A1.1.48)}$$

where the eigenfunctions $f_k^A$ and $f_k^B$ of operators $\hat{A}$ and $\hat{B}$, respectively, are associated with corresponding
eigenvalues $\lambda_k^A$ and $\lambda_k^B$. Given that $\psi$ is not an eigenfunction of either operator, at least two of the coefficients
$a_k$ and two of the $b_k$ must be nonzero. Since the probability of observing a particular eigenvalue is
proportional to the square of the expansion coefficient corresponding to the associated eigenfunction, there 
will be no less than four possible outcomes for the set of values A and B. Clearly, they both cannot be 
determined precisely. Indeed, under these conditions, neither of them can be! 

In a more favourable case, the wavefunction $\psi$ might indeed correspond to an eigenfunction of one of the
operators. If $\psi = f_k^A$, then a measurement of A necessarily yields $\lambda_k^A$, and this is an unambiguous result.

What can be said about the measurement of B in this case? It has already been said that the eigenfunctions of
two commuting operators are identical, but here the pertinent issue concerns eigenfunctions of two operators
that do not commute. Suppose $f_k^A$ is an eigenfunction of $\hat{A}$. Then, it must be true that

$$\hat{A} f_k^A = \lambda_k^A f_k^A \qquad \text{(A1.1.49)}$$
If $f_k^A$ is also an eigenfunction of $\hat{B}$, then it follows that $\hat{A}\hat{B} f_k^A = \hat{B}\hat{A} f_k^A = \lambda_k^A \lambda_k^B f_k^A$, which contradicts the
assumption that $\hat{A}$ and $\hat{B}$ do not commute. Hence, no nontrivial eigenfunction of $\hat{A}$ can also be an eigenfunction
of $\hat{B}$. Therefore, if measurement of A yields a precise result, then some uncertainty must be associated with B.
That is, the expansion of $\psi$ in terms of eigenfunctions of $\hat{B}$ (equation (A1.1.48)) must have at least two
non-vanishing coefficients; the corresponding eigenvalues therefore represent distinct possible outcomes of the
experiment, each having probability $b_k^* b_k$. A physical interpretation of $\hat{A} f_k^A$ is the process of measuring the
value of A for a system in a state with a unique value for this property ($\lambda_k^A$). However, $\hat{B} f_k^A$ represents a
measurement that changes the state of the system, so that if we measure B and then measure A, we would
no longer find $\lambda_k^A$ as its value: $\hat{B}\hat{A} f_k^A = \lambda_k^A \hat{B} f_k^A \neq \hat{A}\hat{B} f_k^A$.
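
A finite-dimensional analogue makes the argument tangible. The sketch below is an illustration under stated assumptions, not part of the original discussion: the two operators are represented by hypothetical 2×2 matrices `op_a` and `op_b`, and the state is chosen as an eigenvector of `op_a`.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """[A, B] = AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


op_a = [[1, 0], [0, -1]]   # hypothetical operator A (diagonal)
op_b = [[0, 1], [1, 0]]    # hypothetical operator B (off-diagonal)
comm = commutator(op_a, op_b)  # nonzero, so A and B do not commute

# State prepared as the eigenvector of A with eigenvalue +1.
state = [1.0, 0.0]

# Normalized eigenvectors of B, keyed by eigenvalue.
s = 1.0 / math.sqrt(2.0)
b_eigenvectors = {+1: [s, s], -1: [s, -s]}

# Probability of each outcome for B: squared expansion coefficient.
b_probabilities = {
    value: sum(v * c for v, c in zip(vector, state)) ** 2
    for value, vector in b_eigenvectors.items()
}
```

A measurement of A on `state` is certain, yet its expansion in eigenvectors of B has two non-vanishing coefficients, so two outcomes for B are possible.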

The Heisenberg uncertainty principle offers a rigorous treatment of the qualitative picture sketched above. If 
several measurements of A and B are made for a system in a particular quantum state, then quantitative 
uncertainties are provided by standard deviations in the corresponding measurements. Denoting these as $\sigma_A$
and $\sigma_B$, respectively, it can be shown that

$$\sigma_A \sigma_B \geq \tfrac{1}{2} \left| \langle [\hat{A}, \hat{B}] \rangle \right| \qquad \text{(A1.1.50)}$$

One feature of this inequality warrants special attention. In the previous paragraph it was shown that the
precise measurement of A made possible when $\psi$ is an eigenfunction of $\hat{A}$ necessarily results in some
uncertainty in a simultaneous measurement of B when the operators $\hat{A}$ and $\hat{B}$ do not commute. However, the
mathematical statement of the uncertainty principle tells us that measurement of B is in fact completely
uncertain: one can say nothing at all about B apart from the fact that any and all values of B are equally
probable! A specific example is provided by associating A and B with the position and momentum of a
particle moving along the x-axis. It is rather easy to demonstrate that $[\hat{p}_x, \hat{x}] = -i\hbar$, so that $\sigma_{p_x} \sigma_x \geq \hbar/2$. If
the system happens to be described by a Dirac delta function at the point $x_0$ (which is an eigenfunction of the
position operator corresponding to eigenvalue $x_0$), then the probabilities associated with possible momenta
can be determined by expanding $\delta(x - x_0)$ in terms of the momentum eigenfunctions $A \exp(ikx)$. Carrying out
such a calculation shows that all of the infinite number of possible momenta (the momentum operator has a
continuous spectrum) appear in the wavefunction expansion, all with precisely the same weight. Hence, no
particular momentum or (more properly in this case) range bounded by $p_x$ and $p_x + dp_x$ is more likely to be
observed than any other.
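
The position-momentum inequality can be checked numerically for a normalizable state. The sketch below is an illustration under stated assumptions: it uses a Gaussian wavefunction (not the delta function discussed above), natural units (ħ = 1), and simple grid sums in place of exact integrals. The width parameter is hypothetical.

```python
import math

HBAR = 1.0  # natural units (assumption)


def gaussian(x, width):
    """Real, normalized Gaussian wavefunction whose |psi|^2 has std. dev. `width`."""
    return (2.0 * math.pi * width ** 2) ** -0.25 * math.exp(-x * x / (4.0 * width ** 2))


def uncertainty_product(width, points=4001, span=12.0):
    """Return sigma_x * sigma_p evaluated on a uniform grid."""
    dx = 2.0 * span * width / (points - 1)
    xs = [-span * width + i * dx for i in range(points)]
    psi = [gaussian(x, width) for x in xs]
    norm = sum(p * p for p in psi) * dx
    mean_x = sum(x * p * p for x, p in zip(xs, psi)) * dx / norm
    mean_x2 = sum(x * x * p * p for x, p in zip(xs, psi)) * dx / norm
    # psi is real, so <p> = 0 and <p^2> = hbar^2 * integral of (dpsi/dx)^2.
    slopes = [(psi[i + 1] - psi[i - 1]) / (2.0 * dx) for i in range(1, points - 1)]
    mean_p2 = HBAR ** 2 * sum(s * s for s in slopes) * dx / norm
    sigma_x = math.sqrt(mean_x2 - mean_x ** 2)
    sigma_p = math.sqrt(mean_p2)
    return sigma_x * sigma_p


product = uncertainty_product(width=0.7)
```

For a Gaussian the product sits at the lower bound ħ/2 to within the grid error; narrowing the Gaussian reduces the spread in position only at the cost of a larger spread in momentum.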

A1.1.2.3 SOME QUALITATIVE FEATURES OF STATIONARY STATES

A great number of qualitative features associated with the stationary states that correspond to solutions of the 
time-independent Schrodinger equation can be worked out from rather general mathematical considerations and use of
the postulates of quantum mechanics. Mastering these concepts and the qualifications that may apply to them 
is essential if one is to obtain an intuitive feeling for the subject. In general, the systems of interest to chemists 
are atoms and molecules, both in isolation as well as how they interact with each other or with an externally 
applied field. In all of these cases, the forces acting upon the particles in the system give rise to a potential 
energy function that varies with the positions of the particles, strength of the applied fields etc. In general, the 
potential is a smoothly varying function of the coordinates, either growing without bound for large values of 
the coordinates or tending asymptotically towards a finite value. In these cases, there is necessarily a 
minimum value at what is known as the global equilibrium position (there may be several global minima that 
are equivalent by symmetry). In many cases, there are also other minima 
(meaning that the matrix of second derivatives with respect to the coordinates has only non-negative
eigenvalues) that have higher energies, which are called local minima. If the potential becomes infinitely large
for infinite values of the coordinates (as it does, for example, when the force on a particle varies linearly with
its displacement from equilibrium) then all solutions to the Schrodinger equation are known as bound states;
that with the smallest eigenvalue is called the ground state while the others are called excited states. In other 
cases, such as potential functions that represent realistic models for diatomic molecules by approaching a 
constant finite value at large separation (zero force on the particles, with a finite dissociation energy), there 
are two classes of solutions. Those associated with eigenvalues that are below the asymptotic value of the 
potential energy are the bound states, of which there is usually a finite number; those having higher energies 
are called the scattering (or continuum) states and form a continuous spectrum. The latter are dealt with in 
section A3.11 of the encyclopedia and will be mentioned here only when necessary for mathematical reasons.

Bound state solutions to the Schrodinger equation decay to zero for infinite values of the coordinates, and are 
therefore integrable since they are continuous functions in accordance with the first postulate. The solutions 
may assume zero values elsewhere in space and these regions — which may be a point, a plane or a three- or 
higher-dimensional hypersurface — are known as nodes. From the mathematical theory of differential 
eigenvalue equations, it can be demonstrated that the lowest eigenvalue is always associated with an 
eigenfunction that has the same sign at all points in space. From this result, which can be derived from the 
calculus of variations, it follows that the wavefunction corresponding to the smallest eigenvalue of the 
Hamiltonian must have no nodes. It turns out, however, that relativistic considerations require that this 
statement be qualified. For systems that contain more than two identical particles of a specific type, not all 
solutions to the Schrodinger equation are allowed by nature. Because of this restriction, which is described in 
subsection (A1.1.3.3), it turns out that the ground states of lithium, all larger atoms and all molecules other
than H$_2^+$, H$_2$ and isoelectronic species have nodes. Nevertheless, our conceptual understanding of electronic
structure as well as the basis for almost all highly accurate calculations is ultimately rooted in a single-particle
approximation. The quantum mechanics of one-particle systems is therefore important in chemistry.

Shapes of the ground- and first three excited-state wavefunctions are shown in figure A1.1.1 for a particle in
one dimension subject to the potential $V = \tfrac{1}{2} k x^2$, which corresponds to the case where the force acting on the
particle is proportional in magnitude and opposite in direction to its displacement from equilibrium
($F = -\nabla V = -kx$). The corresponding Schrodinger equation

$$-\frac{\hbar^2}{2m} \frac{d^2 \psi}{dx^2} + \frac{1}{2} k x^2 \psi = E\psi \qquad \text{(A1.1.51)}$$

can be solved analytically, and this problem (probably familiar to most readers) is that of the quantum 
harmonic oscillator. As expected, the ground-state wavefunction has no nodes. The first excited state has a 
single node, the second two nodes and so on, with the number of nodes growing with increasing magnitude of 
the eigenvalue. From the form of the kinetic energy operator, one can infer that regions where the slope of the 
wavefunction is changing rapidly (large second derivatives) are associated with large kinetic energy. It is 
quite reasonable to accept that wavefunctions with regions of large curvature (where the function itself has 
appreciable magnitude) describe states with high energy, an expectation that can be made rigorous by 
applying a quantum-mechanical version of the virial theorem. 
Figure A1.1.1. Wavefunctions for the four lowest states of the harmonic oscillator, ordered from the n = 0
ground state (at the bottom) to the n = 3 state (at the top). The vertical displacement of the plots is chosen so
that the locations of the classical turning points are those that coincide with the superimposed potential
function (dotted line). Note that the number of nodes in each state corresponds to the associated quantum
number.
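
The node count can be reproduced numerically. The following is a minimal sketch, assuming the standard analytic form of the harmonic-oscillator eigenfunctions (a physicists' Hermite polynomial times a Gaussian, in the dimensionless coordinate ξ = x(mω/ħ)^{1/2}), which is not written out in the text; normalization constants are omitted because they do not affect the nodes.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    previous, current = 0.0, 1.0  # H_{-1} placeholder and H_0
    for k in range(n):
        previous, current = current, 2.0 * x * current - 2.0 * k * previous
    return current


def count_nodes(n, x_max=6.0, points=2000):
    """Count sign changes of the unnormalized n-th eigenfunction on a grid."""
    step = 2.0 * x_max / (points - 1)
    nodes = 0
    last = 0.0
    for i in range(points):
        x = -x_max + i * step
        value = hermite(n, x) * math.exp(-0.5 * x * x)
        if value == 0.0:
            continue  # skip exact zeros; the sign change is caught at the next point
        if last != 0.0 and (value > 0.0) != (last > 0.0):
            nodes += 1
        last = value
    return nodes


node_counts = [count_nodes(n) for n in range(4)]
```

The grid range is adequate only for low quantum numbers; higher states extend further and would need a larger `x_max`.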

Classically, a particle with fixed energy E described by a quadratic potential will move back and forth 
between the points where V = E, known as the classical turning points. Movement beyond the classical
turning points is forbidden, because energy conservation implies that the particle will have a negative kinetic 
energy in these regions, and imaginary velocities are clearly inconsistent with the Newtonian picture of the 
universe. Inside the turning points, the particle will have its maximum kinetic energy as it passes through the 
minimum, slowing in its climb until it comes to rest and subsequently changes direction at the turning points 
(imagine a marble rolling in a parabola). Therefore, if a camera were to take snapshots of the particle at 
random intervals, most of the pictures would show the particle near the turning points (the equilibrium 
position is actually the least likely location for the particle). A more detailed analysis of the problem shows
that the probability of seeing the classical particle in the neighbourhood of a given position x is proportional to
$(E - \tfrac{1}{2} k x^2)^{-1/2}$. Note that the situation found for the ground state described by quantum mechanics bears very little
resemblance to the classical situation. The particle is most likely to be found at the equilibrium position and,
within the classically allowed region, least likely to be seen at the turning points. However, the situation is 
even stranger than this: the probability of finding the particle outside the turning points is non-zero! This 
phenomenon, known as tunnelling, is not unique to the harmonic oscillator. Indeed, it occurs for bound states 
described by every potential
that tends asymptotically to a finite value since the wavefunction and its derivatives must approach zero in a
smooth fashion for large values of the coordinates where (by the definition of a bound state) V must exceed E.
However, at large energies (see the 29th excited state probability density in figure A1.1.2), the situation is
more consistent with expectations based on classical theory: the probability density has its largest value near 
the turning points, the general appearance is as implied by the classical formula (if one ignores the 
oscillations) and its magnitude in the classically forbidden region is reduced dramatically with respect to that 
found for the low-lying states. This merging of the quantum-mechanical picture with expectations based on 
classical theory always occurs for highly excited states and is the basis of the correspondence principle. 
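
The size of the tunnelling effect in the harmonic-oscillator ground state can be estimated directly. This is a hedged sketch, assuming the standard ground-state density π^{-1/2} exp(−ξ²) in the dimensionless coordinate ξ = x(mω/ħ)^{1/2}, for which the classical turning points lie at ξ = ±1; neither expression is given explicitly in the text.

```python
import math


def ground_state_density(xi):
    """|psi_0|^2 of the harmonic oscillator in the dimensionless coordinate xi."""
    return math.exp(-xi * xi) / math.sqrt(math.pi)


def probability_outside(turning_point=1.0, upper=10.0, points=20001):
    """Trapezoid-rule estimate of the probability that |xi| > turning_point."""
    dx = (upper - turning_point) / (points - 1)
    values = [ground_state_density(turning_point + i * dx) for i in range(points)]
    one_side = dx * (sum(values) - 0.5 * (values[0] + values[-1]))
    return 2.0 * one_side  # the density is symmetric about xi = 0


p_forbidden = probability_outside()

# Closed form for comparison: the complementary error function at 1.
p_exact = math.erfc(1.0)
```

Under these assumptions roughly one measurement in six would find the particle in the classically forbidden region.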



Figure A1.1.2. Probability density ($\psi^* \psi$) for the n = 29 state of the harmonic oscillator. The vertical displacement is
chosen as in figure A1.1.1, so that the locations of the turning points coincide with the superimposed potential
function.

The energy level spectrum of the harmonic oscillator is completely regular. The ground state energy is given
by $\tfrac{1}{2} h\nu$, where $\nu$ is the classical frequency of oscillation given by

$$\nu = \frac{1}{2\pi} \sqrt{\frac{k}{m}} \qquad \text{(A1.1.52)}$$

although it must be emphasized that our inspection of the wavefunction shows that the motion of the particle
cannot be literally thought of in this way. The energy of the first excited state is $h\nu$ above that of the ground
state and precisely the same difference separates each excited state from those immediately above and below. 
A different example is provided by a particle trapped in the Morse potential 


$$V(x) = D_e [\exp(-ax) - 1]^2, \qquad \text{(A1.1.53)}$$

originally suggested as a realistic model for the vibrational motion of diatomic molecules. Although the 
wavefunctions associated with the Morse levels exhibit largely the same qualitative features as the harmonic 
oscillator functions and are not shown here, the energy level structures associated with the two systems are 
qualitatively different. Since V(x) tends to a finite value ($D_e$) for large x, there are only a limited number of
bound state solutions, and the spacing between them decreases with increasing eigenvalue. This is another 
general feature; energy level spacings for states associated with potentials that tend towards asymptotic values 
at infinity tend to decrease with increasing quantum number. 
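
The contrast between the two level structures can be sketched with the standard closed-form Morse eigenvalues, E_n = ħω(n + ½) − [ħω(n + ½)]²/(4D_e) with ω = a(2D_e/m)^{1/2}, a textbook result that is not derived in the text. Natural units (ħ = m = 1) and the parameter values are assumptions chosen only for illustration.

```python
import math

HBAR = 1.0  # natural units (assumption)
MASS = 1.0  # particle mass (assumption)


def morse_levels(d_e, a):
    """Bound-state energies of V(x) = D_e [exp(-a x) - 1]^2, measured from the minimum."""
    omega = a * math.sqrt(2.0 * d_e / MASS)
    # Highest bound level: n <= sqrt(2 m D_e) / (a hbar) - 1/2.
    n_max = math.floor(math.sqrt(2.0 * MASS * d_e) / (a * HBAR) - 0.5)
    levels = []
    for n in range(n_max + 1):
        harmonic = HBAR * omega * (n + 0.5)  # harmonic-oscillator value
        levels.append(harmonic - harmonic ** 2 / (4.0 * d_e))
    return levels


levels = morse_levels(d_e=10.0, a=1.0)

# Gaps between adjacent bound levels.
spacings = [upper - lower for lower, upper in zip(levels, levels[1:])]
```

With these hypothetical parameters only four bound levels lie below the dissociation limit, and each gap is smaller than the one below it, whereas the harmonic oscillator has infinitely many levels separated by the constant amount hν.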

The one-dimensional cases discussed above illustrate many of the qualitative features of quantum mechanics, 
and their relative simplicity makes them quite easy to study. Motion in more than one dimension and 
(especially) that of more than one particle is considerably more complicated, but many of the general features 
of these systems can be understood from simple considerations. While one relatively common feature of 
multidimensional problems in quantum mechanics is degeneracy, it turns out that the ground state must be 
non-degenerate. To prove this, simply assume the opposite to be true, i.e. 

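$$\hat{H}\psi_1 = E_0 \psi_1 \qquad \text{(A1.1.54)}$$

$$\hat{H}\psi_2 = E_0 \psi_2 \qquad \text{(A1.1.55)}$$

where $E_0$ is the ground state energy, and

$$\int \psi_1^* \psi_2 \, d\tau = 0. \qquad \text{(A1.1.56)}$$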


In order to satisfy equation (A1.1.56), the two functions must have identical signs at some points in space and
different signs elsewhere. It follows that at least one of them must have at least one node. However, this is 
incompatible with the nodeless property of ground-state eigenfunctions. 

Having established that the ground state of a single-particle system is non-degenerate and nodeless, it is 
straightforward to prove that the wavefunctions associated with every excited state must contain at least one 
node (though they need not be degenerate!), just as seen in the example problems. It follows from the 
orthogonality of eigenfunctions corresponding to a Hermitian operator that 


$$\int \psi_0^* \psi_x \, d\tau = 0 \qquad \text{(A1.1.57)}$$

for all excited states $\psi_x$. In order for this equality to be satisfied, it is necessary that the integrand either
vanishes at all points in space (which contradicts the assumption that both $\psi_0$ and $\psi_x$ are nodeless) or is
positive in some regions of space and negative in others. Given that the ground state has no nodes, the latter
condition can be satisfied only if the excited-state wavefunction changes sign at one or more points in space.
Since the first postulate states that all wavefunctions are continuous, it is therefore necessary that $\psi_x$ has at
least one node.

In classical mechanics, it is certainly possible for a system subject to dissipative forces such as friction to 
come to rest. For example, a marble rolling in a parabola lined with sandpaper will eventually lose its kinetic 
energy and come to rest at the bottom. Rather remarkably, making a measurement of E that coincides with
$V_{\min}$ (as would be found classically for our stationary marble) is incompatible with quantum mechanics.
Turning back to our example, the ground-state energy is indeed larger than the minimum value of the 
potential energy for the harmonic oscillator. That this property of zero-point energy is guaranteed in quantum 
mechanics can be demonstrated by straightforward application of the basic principles of the subject. Unlike 
nodal features of the wavefunction, the arguments developed here also hold for many-particle systems. 
Suppose the total energy of a stationary state is E. Since the energy is the sum of kinetic and potential 
energies, it must be true that expectation values of the kinetic and potential energies are related according to 


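$$E = \langle T \rangle + \langle V \rangle. \qquad \text{(A1.1.58)}$$

If the total energy associated with the state is equal to the potential energy at the equilibrium position, it
follows that

$$V_{\min} - \langle V \rangle = \langle T \rangle. \qquad \text{(A1.1.59)}$$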

Two cases must be considered. In the first, it will be assumed that the wavefunction is nonzero at one or more 
points for which $V > V_{\min}$ (for the physically relevant case of a smoothly varying and continuous potential,
this includes all possibilities other than that in which the wavefunction is a Dirac delta function at the
equilibrium position). This means that $\langle V \rangle$ must also be greater than $V_{\min}$, thereby forcing the average kinetic
energy to be negative. This is not possible. The kinetic energy operator for a quantum-mechanical particle 
moving in the x-direction has the (unnormalized) eigenfunctions 

$$f = \exp(ikx) \qquad \text{(A1.1.60)}$$

where

$$k = \left(\frac{2m\alpha}{\hbar^2}\right)^{1/2} \qquad \text{(A1.1.61)}$$

and $\alpha$ are the corresponding eigenvalues. It can be seen that negative values of $\alpha$ give rise to real arguments
of the exponential and correspondingly divergent eigenfunctions. Zero and positive values are associated
with constant and oscillatory solutions in which the argument of the exponential vanishes or is imaginary,
respectively. Since divergence of the actual wavefunction is incompatible with its probabilistic interpretation,
no contribution from negative $\alpha$ eigenfunctions can appear when the wavefunction is expanded in terms of
kinetic energy eigenfunctions.
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It follows from the fifth postulate that the kinetic energy of each particle in the system (and therefore the total 
kinetic energy) is restricted to non-negative values. Therefore, the expectation value of the kinetic energy 
cannot be negative. The other possibility is that the wavefunction is non-vanishing only when $V = V_{\min}$. For
the case of a smoothly varying, continuous potential, this corresponds to a state described by a Dirac delta
function at the equilibrium position, which is the quantum-mechanical equivalent of a particle at rest. In any
event, the fact that the wavefunction vanishes at all points for which $V \neq V_{\min}$ means that the expectation
value of the kinetic energy operator must also vanish if there is to be no zero-point energy. Considering the
discussion above, this can occur only when the wavefunction is the same as the zero-kinetic-energy
eigenfunction ($\psi$ = constant). This contradicts the assumption used in this case, where the wavefunction is a
delta function. Following the general arguments used in both cases above, it is easily shown that $E$ can only be
larger than $V_{\min}$, which means that any measurement of $E$ for a particle in a stationary or non-stationary state
must give a result that satisfies the inequality $E > V_{\min}$.
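
The behaviour of the kinetic energy eigenfunctions invoked in this argument can be checked numerically. The following Python sketch is only an illustration: it assumes units in which $\hbar = m = 1$, and the function names are hypothetical. It evaluates $k$ from equation (A1.1.61) and the magnitude of $f = \exp(ikx)$ from equation (A1.1.60) for negative, zero and positive eigenvalues $a$.

```python
import cmath

def wavenumber(a, m=1.0, hbar=1.0):
    """k = sqrt(2 m a) / hbar, equation (A1.1.61); imaginary when a < 0."""
    return cmath.sqrt(2.0 * m * a) / hbar

def eigenfunction(a, x, m=1.0, hbar=1.0):
    """Unnormalized kinetic energy eigenfunction f(x) = exp(i k x), equation (A1.1.60)."""
    return cmath.exp(1j * wavenumber(a, m, hbar) * x)

# Units with hbar = m = 1 are an assumption made purely for illustration.
for a in (-1.0, 0.0, 1.0):
    magnitudes = [abs(eigenfunction(a, x)) for x in (-10.0, 0.0, 10.0)]
    print(a, magnitudes)
```

For $a < 0$ the magnitude grows without bound in one direction, whereas for $a \geq 0$ it stays equal to one everywhere, which is the distinction the argument relies on.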


A1.1.3 QUANTUM MECHANICS OF MANY-PARTICLE SYSTEMS 

A1.1.3.1 THE HYDROGEN ATOM

It is admittedly inconsistent to begin a section on many-particle quantum mechanics by discussing a problem 
that can be treated as a single particle. However, the hydrogen atom and atomic ions in which only one
electron remains (He$^+$, Li$^{2+}$ etc) are the only atoms for which exact analytic solutions to the Schrödinger
equation can be obtained. In no cases are exact solutions possible for molecules, even after the Born-Oppenheimer
approximation (see section B3.1.1.1) is made to allow for separate treatment of electrons and
nuclei. Despite the limited interest of hydrogen atoms and hydrogen-like ions to chemistry, the quantum
mechanics of these systems is highly instructive and provides a basis for treatments of more complex
atoms and molecules. Comprehensive discussions of one-electron atoms can be found in many textbooks; the 
emphasis here is on qualitative aspects of the solutions. 

The Schrödinger equation for a one-electron atom with nuclear charge $Z$ is

$$\left(-\frac{\hbar^2}{2\mu}\nabla^2 - \frac{Ze^2}{r}\right)\psi = E\psi$$ (A1.1.62)

where $\mu$ is the reduced mass of the electron-nucleus system and the Laplacian is most conveniently expressed
in spherical polar coordinates. While not trivial, this differential equation can be solved analytically. Some of
the solutions are normalizable, and others are not. The former are those that describe the bound states of
one-electron atoms, and can be written in the form

$$\psi_{nlm_l} = N R_{nl}(r)\, Y_{lm_l}(\theta, \phi)$$ (A1.1.63)

where $N$ is a normalization constant, and $R_{nl}(r)$ and $Y_{lm_l}(\theta, \phi)$ are specific functions that depend on the
quantum numbers $n$, $l$ and $m_l$. The first of these is called the principal quantum number, while $l$ is known as
the angular momentum, or azimuthal, quantum number, and $m_l$ the magnetic quantum number. The quantum
numbers that allow for normalizable wavefunctions are limited to integers that run over the ranges


$$n = 1, 2, 3, \ldots$$ (A1.1.64)

$$l = 0, 1, 2, \ldots, n - 1$$ (A1.1.65)

$$m_l = -l, -l + 1, \ldots, l - 1, l.$$ (A1.1.66)
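
The ranges in equations (A1.1.64)-(A1.1.66) are easy to enumerate explicitly. The sketch below is an illustration only, and the function name `hydrogenic_states` is hypothetical. It lists every allowed $(n, l, m_l)$ combination for a given $n$; the number of combinations works out to $n^2$.

```python
def hydrogenic_states(n):
    """All (n, l, m_l) triples allowed by equations (A1.1.64)-(A1.1.66) for a given n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # l runs from 0 to n - 1; for each l, m_l runs from -l to +l.
    return [(n, l, m_l) for l in range(n) for m_l in range(-l, l + 1)]

for n in (1, 2, 3):
    states = hydrogenic_states(n)
    print(n, len(states), states)
```

Spin is not included in this count; it is introduced later in the section.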

The fact that there is no restriction on n apart from being a positive integer means that there are an infinite 
number of bound-state solutions to the hydrogen atom, a peculiarity that is due to the form of the Coulomb 
potential. Unlike most bound state problems, the range of the potential is infinite (it goes to zero at large r, but 
diverges to negative infinity at $r = 0$). The eigenvalues of the Hamiltonian depend only on the principal
quantum number and are (in attojoules ($10^{-18}$ J))

$$E_n = -\frac{2.18\,Z^2}{n^2}$$ (A1.1.67)

where it should be noted that the zero of energy corresponds to infinite separation of the particles. For each
value of $n$, the Schrödinger equation predicts that all states are degenerate, regardless of the choice of $l$ and $m_l$.
Hence, any linear combination of wavefunctions corresponding to some specific value of $n$ is also an
eigenfunction of the Hamiltonian with eigenvalue $E_n$. States of hydrogen are usually characterized as $n$s, $n$p,
$n$d etc where $n$ is the principal quantum number and s is associated with $l = 0$, p with $l = 1$ and so on. The
functions $R_{nl}(r)$ describe the radial part of the wavefunctions and can all be written in the form

$$R_{nl}(r) = \exp(-\rho/2)\,\rho^l L_{nl}(\rho)$$ (A1.1.68)

where $\rho$ is proportional to the electron-nucleus separation $r$ and the atomic number $Z$. $L_{nl}$ is a polynomial of
order $n - l - 1$ that has zeros (radial nodes, where the wavefunction, and therefore the probability of finding
the electron, vanishes) only for positive values of $\rho$. The functions $Y_{lm_l}(\theta, \phi)$ are the spherical harmonics.
The first few members of this series are familiar to everyone who has studied physical chemistry: $Y_{00}$ is a
constant, leading to a spherically symmetric wavefunction, while $Y_{10}$, and specific linear combinations of
$Y_{11}$ and $Y_{1-1}$, vanish (have an angular node) in the xy, xz and yz planes, respectively. In general, these
functions exhibit $l$ nodes, meaning that the number of overall nodes corresponding to a particular $\psi_{nlm_l}$ is
equal to $n - 1$. For example, the 4d state has two angular nodes ($l = 2$) and one radial node ($L_{nl}(\rho)$ has one
zero for positive $\rho$). In passing, it should be noted that many of the ubiquitous qualitative features of quantum
mechanics are illustrated by the wavefunctions and energy levels of the hydrogen atom. First, the system has a
zero-point energy, meaning that the ground-state energy is larger than the lowest value of the potential ($-\infty$)
and the spacing between the energy levels decreases with increasing energy. Second, the ground state of the
system is nodeless (the electron may be found at any point in space), while the number of nodes exhibited by 
the excited states increases with energy. Finally, there is a finite probability that the electron is found in a 
classically forbidden region in all bound states. For the hydrogen atom ground state, this corresponds to all 
electron-proton separations larger than 105.8 pm, where the electron is found 23.8% of the time. As usual, 
this tunnelling phenomenon is less pronounced in excited states: the corresponding values for the 3s state are 
1904 pm and 16.0%. 
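
Several of the numbers quoted above can be reproduced with a few lines of Python. The sketch below is illustrative rather than definitive: the Bohr radius value is an assumed constant that the text does not quote, and the function names are hypothetical. It evaluates the energies of equation (A1.1.67), counts radial and angular nodes, and computes the ground-state tunnelling probability from the closed-form integral of the 1s radial density beyond the classical turning point $r = 2a_0/Z$, where the Coulomb potential equals the 1s energy.

```python
import math

BOHR_RADIUS_PM = 52.9177   # Bohr radius a0 in picometres (assumed constant, not quoted in the text)
GROUND_STATE_AJ = 2.18     # prefactor of equation (A1.1.67), in attojoules

def energy_aj(n, Z=1):
    """Bound-state energy of a one-electron atom, following equation (A1.1.67)."""
    return -GROUND_STATE_AJ * Z**2 / n**2

def node_counts(n, l):
    """Return (radial nodes, angular nodes); the two always sum to n - 1."""
    if not 0 <= l < n:
        raise ValueError("require 0 <= l <= n - 1")
    return n - l - 1, l

def turning_point_1s_pm(Z=1):
    """Radius at which the 1s energy equals the Coulomb potential: r = 2 a0 / Z."""
    return 2.0 * BOHR_RADIUS_PM / Z

def forbidden_probability_1s():
    """Probability of finding a 1s electron beyond its classical turning point.

    Integrating the 1s radial density from R to infinity gives
    exp(-2x) * (1 + 2x + 2x^2) with x = Z R / a0; at the turning point x = 2.
    """
    x = 2.0
    return math.exp(-2.0 * x) * (1.0 + 2.0 * x + 2.0 * x * x)

print(energy_aj(1), energy_aj(2))
print(node_counts(4, 2))
print(round(turning_point_1s_pm(), 1), round(100 * forbidden_probability_1s(), 1))
```

Under these assumptions the last line gives 105.8 pm and 23.8%, the ground-state figures quoted in the text. The excited-state values require a numerical integration that is not attempted here.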

The Hamiltonian commutes with the angular momentum operator $L_z$ as well as that for the square of the
angular momentum $L^2$. The wavefunctions above are also eigenfunctions of these operators, with eigenvalues
$m_l\hbar$ ($L_z$) and $l(l + 1)\hbar^2$ ($L^2$). It should be emphasized that the total angular momentum is $L = \sqrt{l(l + 1)}\,\hbar$,
and not a simple integral multiple of $\hbar$ as assumed in the Bohr model. In particular, the ground state of hydrogen
has zero angular momentum, while the Bohr atom ground state has $L = \hbar$. The meaning associated with the $m_l$ quantum
number is more difficult to grasp. The choice of z instead of x or y seems to be (and is) arbitrary and it is
illogical that a specific value of the angular momentum projection along one coordinate must be observed in
any experiment, while those associated with x and y are not similarly restricted. However, the states with a
given $l$ are degenerate, and the wavefunction at any particular time will in general be some linear combination
of the $m_l$ eigenfunctions. The only way to isolate a specific $\psi_{nlm_l}$ (and therefore ensure the result of measuring
$L_z$) is to apply a magnetic field that lifts the degeneracy and breaks the symmetry of the problem. The z axis
then corresponds to the magnetic field direction, and it is the projection of the angular momentum vector on
this axis that must be equal to $m_l\hbar$.

The quantum-mechanical treatment of hydrogen outlined above does not provide a completely satisfactory 
description of the atomic spectrum, even in the absence of a magnetic field. Relativistic effects cause both a 
scalar shifting in all energy levels as well as splittings caused by the magnetic fields associated with both 
motion and intrinsic properties of the charges within the atom. The features of this fine structure in the energy 
spectrum were successfully (and miraculously, given that it preceded modern quantum mechanics by a decade 
and was based on a two-dimensional picture of the hydrogen atom) predicted by a formula developed by 
Sommerfeld in 1915. These interactions, while small for hydrogen, become very large indeed for larger atoms 
where very strong electron-nucleus attractive potentials cause electrons to move at velocities close to the 
speed of light. In these cases, quantitative calculations are extremely difficult and even the separability of 
orbital and intrinsic angular momenta breaks down. 

A1.1.3.2 THE INDEPENDENT-PARTICLE APPROXIMATION

Applications of quantum mechanics to chemistry invariably deal with systems (atoms and molecules) that 
contain more than one particle. Apart from the hydrogen atom, the stationary-state energies cannot be 
calculated exactly, and compromises must be made in order to estimate them. Perhaps the most useful and 
widely used approximation in chemistry is the independent-particle approximation, which can take several 
forms. Common to all of these is the assumption that the Hamiltonian operator for a system consisting of n 
particles is approximated by the sum 


$$H_0 = h_1 + h_2 + \cdots + h_n$$ (A1.1.69)

where the single-particle Hamiltonians $h_i$ consist of the kinetic energy operator plus a potential
that does not explicitly depend on the coordinates of the other $n - 1$ particles in the
system. Of course, the simplest realization of this model is to completely neglect forces due to the other
particles, but this is often too severe an approximation to be useful. In any event, the quantum mechanics of a
system described by a Hamiltonian of the form given by equation (A1.1.69) is worthy of discussion simply
because the independent-particle approximation is the foundation for molecular orbital theory, which is the 
central paradigm of descriptive chemistry. 

Let the orthonormal functions $\chi_i(1)$, $\chi_j(2)$, ..., $\chi_p(n)$ be selected eigenfunctions of the corresponding
single-particle Hamiltonians $h_1$, $h_2$, ..., $h_n$, with eigenvalues $\lambda_i$, $\lambda_j$, ..., $\lambda_p$. It is easily verified that the product of
these single-particle wavefunctions (which are often called orbitals when the particles are electrons in atoms
and molecules)

$$\phi = \chi_i(1)\chi_j(2) \cdots \chi_p(n)$$ (A1.1.70)

satisfies the approximate Schrödinger equation for the system

$$H_0\phi = E_0\phi$$ (A1.1.71)

with the corresponding energy

$$E_0 = \lambda_i + \lambda_j + \cdots + \lambda_p.$$ (A1.1.72)

Hence, if the Hamiltonian can be written as a sum of terms that individually depend only on the coordinates of
one of the particles in the system, then the wavefunction of the system can be written as a product of
functions, each of which is an eigenfunction of one of the single-particle Hamiltonians, $h_i$. The corresponding
eigenvalue is then given by the sum of eigenvalues associated with each single-particle wavefunction $\chi$
appearing in the product.
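
The statement embodied in equations (A1.1.70)-(A1.1.72) can be verified on a small finite-dimensional model. The sketch below is an illustration under stated assumptions: the two 2x2 matrices standing in for $h_1$ and $h_2$ are hypothetical, chosen only because their eigenpairs are known in closed form, and all helper names are invented for this example. It builds $H_0 = h_1 + h_2$ in the product basis and checks that the product of single-particle eigenvectors is an eigenvector of $H_0$ with eigenvalue $\lambda_i + \lambda_j$.

```python
import math

def kron_vec(u, v):
    """Product function chi(1) chi'(2) as a vector in the two-particle product basis."""
    return [a * b for a in u for b in v]

def kron_sum(h1, h2):
    """Matrix of H0 = h1 (acting on particle 1) + h2 (acting on particle 2)."""
    n1, n2 = len(h1), len(h2)
    H = [[0.0] * (n1 * n2) for _ in range(n1 * n2)]
    for i in range(n1):
        for j in range(n2):
            for k in range(n1):
                for l in range(n2):
                    value = 0.0
                    if j == l:
                        value += h1[i][k]   # h1 leaves particle 2 untouched
                    if i == k:
                        value += h2[j][l]   # h2 leaves particle 1 untouched
                    H[i * n2 + j][k * n2 + l] = value
    return H

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical single-particle Hamiltonians of the form [[a, b], [b, a]]:
# eigenvalue a + b with eigenvector (1, 1)/sqrt(2), a - b with (1, -1)/sqrt(2).
h1 = [[1.0, 0.5], [0.5, 1.0]]
h2 = [[2.0, -0.3], [-0.3, 2.0]]
s = 1.0 / math.sqrt(2.0)
chi_i, lam_i = [s, s], 1.5     # eigenpair of h1 (a + b)
chi_j, lam_j = [s, -s], 2.3    # eigenpair of h2 (a - b)

phi = kron_vec(chi_i, chi_j)
H0_phi = matvec(kron_sum(h1, h2), phi)
E0 = lam_i + lam_j
residual = max(abs(x - E0 * p) for x, p in zip(H0_phi, phi))
print(E0, residual)
```

The residual vanishes to rounding error, which is the matrix analogue of equation (A1.1.71).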

The approximation embodied by equations (A1.1.69), (A1.1.70), (A1.1.71) and
(A1.1.72) presents a conceptually appealing picture of many-particle systems. The behaviour and energetics
of each particle can be determined from a simple function of three coordinates and the eigenvalue of a 
differential equation considerably simpler than the one that explicitly accounts for all interactions. It is 
precisely this simplification that is invoked in qualitative interpretations of chemical phenomena such as the 
inert nature of noble gases and the strongly reducing property of the alkali metals. The price paid is that the 
model is only approximate, meaning that properties predicted from it (for example, absolute ionization 
potentials rather than just trends within the periodic table) are not as accurate as one might like. However, as 
will be demonstrated in the latter parts of this section, a carefully chosen independent-particle description of a 
many-particle system provides a starting point for performing more accurate calculations. It should be 
mentioned that even qualitative features might be predicted incorrectly by independent-particle models in 
extreme cases. One should always be aware of this possibility and the oft-misunderstood fact that there really 
is no such thing as an orbital. Fortunately, however, it turns out that qualitative errors are uncommon for 
electronic properties of atoms and molecules when the best independent-particle models are used. 

One important feature of many-particle systems has been neglected in the preceding discussion. Identical 
particles in quantum mechanics must be indistinguishable, which implies that the exact wavefunctions $\psi$
which describe them must satisfy certain symmetry properties. In particular, interchanging the coordinates of
any two particles in the mathematical form of the wavefunction cannot lead to a different prediction of the
system properties. Since any rearrangement of particle coordinates can be achieved by successive pairwise
permutations, it is sufficient to consider the case of a single permutation in analysing the symmetry properties
that wavefunctions must obey. In the following, it will be assumed that the wavefunction is real. This is not
restrictive, as stationary state wavefunctions for isolated atoms and molecules can always be written in this
way. If the operator $P_{ij}$ is that which permutes the coordinates of particles $i$ and $j$, then indistinguishability
requires that

$$\int (P_{ij}\psi)^* A (P_{ij}\psi)\, d\tau = \int \psi^* A \psi\, d\tau$$ (A1.1.73)

for any operator $A$ (including the identity) and choice of $i$ and $j$. Clearly, a wavefunction that is symmetric with
respect to the interchange of coordinates for any two particles

$$P_{ij}\psi = \psi$$ (A1.1.74)

satisfies the indistinguishability criterion. However, equation (A1.1.73) is also satisfied if the permutation of
particle coordinates results in an overall sign change of the wavefunction, i.e.

$$P_{ij}\psi = -\psi.$$ (A1.1.75)


Without further considerations, the only acceptable real quantum-mechanical wavefunctions for an $n$-particle
system would appear to be those for which

$$P_{ij}\psi = \pm\psi$$ (A1.1.76)

where $i$ and $j$ are any pair of identical particles. For example, if the system comprises two protons, a neutron
and two electrons, the relevant permutations are that which interchanges the proton coordinates and that
which interchanges the electron coordinates. The other possible pairs involve distinct particles and the action
of the corresponding $P_{ij}$ operators on the wavefunction will in general result in something quite different.
Since indistinguishability is a necessary property of exact wavefunctions, it is reasonable to impose the same
constraint on the approximate wavefunctions $\phi$ formed from products of single-particle solutions. However, if
two or more of the $\chi_i$ in the product are different, it is necessary to form linear combinations if the condition
$P_{ij}\psi = \pm\psi$ is to be met. An additional consequence of indistinguishability is that the $h_i$ operators
corresponding to identical particles must also be identical and therefore have precisely the same
eigenfunctions. It should be noted that there is nothing mysterious about this perfectly reasonable restriction 
placed on the mathematical form of wavefunctions. 

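For the sake of simplicity, consider a system of two electrons for which the corresponding single-particle
states are $\chi_i$, $\chi_j$, $\chi_k$, ..., $\chi_n$, with eigenvalues $\lambda_i$, $\lambda_j$, $\lambda_k$, ..., $\lambda_n$. Clearly, the two-electron wavefunction
$\phi = \chi_i(1)\chi_i(2)$ satisfies the indistinguishability criterion and describes a stationary state with energy $E_0 = 2\lambda_i$.
However, the state $\chi_i(1)\chi_j(2)$ is not satisfactory. While it is a solution to the Schrödinger equation, it is neither
symmetric nor antisymmetric with respect to particle interchange. Two acceptable states can nevertheless be formed by
taking the linear combinations

$$\phi_S = \frac{1}{\sqrt{2}}[\chi_i(1)\chi_j(2) + \chi_i(2)\chi_j(1)]$$ (A1.1.77)

$$\phi_A = \frac{1}{\sqrt{2}}[\chi_i(1)\chi_j(2) - \chi_i(2)\chi_j(1)]$$ (A1.1.78)

which are symmetric and antisymmetric with respect to particle interchange, respectively. Because the
functions $\chi$ are orthonormal, the energies calculated from $\phi_S$ and $\phi_A$ are the same as that corresponding to the
unsymmetrized product state $\chi_i(1)\chi_j(2)$, as demonstrated explicitly for $\phi_S$:

$$\begin{aligned}
\int \phi_S H_0 \phi_S\, d\tau = \frac{1}{2}\Big[ &\int \chi_i(1)\chi_j(2)\, H_0\, \chi_i(1)\chi_j(2)\, d\tau_1 d\tau_2 + \int \chi_i(1)\chi_j(2)\, H_0\, \chi_i(2)\chi_j(1)\, d\tau_1 d\tau_2 \\
+ &\int \chi_i(2)\chi_j(1)\, H_0\, \chi_i(1)\chi_j(2)\, d\tau_1 d\tau_2 + \int \chi_i(2)\chi_j(1)\, H_0\, \chi_i(2)\chi_j(1)\, d\tau_1 d\tau_2 \Big] \\
= \frac{1}{2}(\lambda_i + \lambda_j)\Big[ &\int \chi_i(1)\chi_i(1)\, d\tau_1 \int \chi_j(2)\chi_j(2)\, d\tau_2 + \int \chi_i(1)\chi_j(1)\, d\tau_1 \int \chi_j(2)\chi_i(2)\, d\tau_2 \\
+ &\int \chi_j(1)\chi_i(1)\, d\tau_1 \int \chi_i(2)\chi_j(2)\, d\tau_2 + \int \chi_j(1)\chi_j(1)\, d\tau_1 \int \chi_i(2)\chi_i(2)\, d\tau_2 \Big] \\
= \frac{1}{2}(\lambda_i + \lambda_j)&[1 + 0 + 0 + 1] = \lambda_i + \lambda_j.
\end{aligned}$$ (A1.1.79)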
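
The result of equation (A1.1.79) can be checked on a toy model. In the sketch below, which is an illustration only, a hypothetical 2x2 matrix stands in for the common single-particle Hamiltonian of two identical particles, and every helper name is invented for the example. The code forms the symmetric and antisymmetric combinations of equations (A1.1.77) and (A1.1.78), applies the interchange of particle labels, and evaluates the energy expectation values.

```python
import math

def kron_vec(u, v):
    """Product chi(1) chi'(2) as a vector with index i * n + j."""
    return [a * b for a in u for b in v]

def two_particle_h0(h):
    """Matrix of H0 = h(1) + h(2) for two identical particles."""
    n = len(h)
    return [[(h[i][k] if j == l else 0.0) + (h[j][l] if i == k else 0.0)
             for k in range(n) for l in range(n)]
            for i in range(n) for j in range(n)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def exchange(v, n):
    """P_12: interchange the labels of particles 1 and 2."""
    return [v[j * n + i] for i in range(n) for j in range(n)]

# Hypothetical single-particle Hamiltonian with known eigenpairs:
# 1.5 with (1, 1)/sqrt(2) and 0.5 with (1, -1)/sqrt(2).
h = [[1.0, 0.5], [0.5, 1.0]]
s = 1.0 / math.sqrt(2.0)
chi_i, lam_i = [s, s], 1.5
chi_j, lam_j = [s, -s], 0.5

ij, ji = kron_vec(chi_i, chi_j), kron_vec(chi_j, chi_i)
phi_S = [(a + b) * s for a, b in zip(ij, ji)]   # equation (A1.1.77)
phi_A = [(a - b) * s for a, b in zip(ij, ji)]   # equation (A1.1.78)

H0 = two_particle_h0(h)
energy_S = dot(phi_S, matvec(H0, phi_S))
energy_A = dot(phi_A, matvec(H0, phi_A))
print(energy_S, energy_A, lam_i + lam_j)
```

Both expectation values equal $\lambda_i + \lambda_j$ to rounding error, while `exchange` returns `phi_S` unchanged and `phi_A` with its sign reversed.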

It should be mentioned that the single-particle Hamiltonians in general have an infinite number of solutions, 
so that an uncountable number of wavefunctions $\psi$ can be generated from them. Very often, interest is
focused on the ground state of many-particle systems. Within the independent-particle approximation, this
state can be represented by simply assigning each particle to the lowest-lying energy level. If a calculation is
performed on the lithium atom in which interelectronic repulsion is ignored completely, the single-particle
Schrödinger equations are precisely the same as those for the hydrogen atom, apart from the difference in
nuclear charge. The following lithium atom wavefunction could then be constructed from single-particle
orbitals

$$\phi = N \chi_{1s}(1)\chi_{1s}(2)\chi_{1s}(3)$$ (A1.1.80)

a form that is obviously symmetric with respect to interchange of particle coordinates. If this wavefunction is 
used to calculate the expectation value of the energy using the exact Hamiltonian (which includes the explicit 
electron-electron repulsion terms), 


$$E = \int \phi^* H \phi\, d\tau$$ (A1.1.81)

one obtains an energy lower than the actual result, which (see (A1.1.4.1)) suggests that there are serious
problems with this form of the wavefunction. Moreover, a relatively simple analysis shows that ionization 
potentials of atoms would increase monotonically — approximately linearly for small atoms and quadratically 
for large atoms — if the independent-particle picture discussed thus far has any validity. Using a relatively 
simple model that assumes that the lowest lying orbital is a simple exponential, ionization potentials of 13.6, 
23.1, 33.7 and 45.5 electron volts (eV) are predicted for hydrogen, helium, lithium and beryllium, 
respectively. The value for hydrogen (a one-electron system) is exact and that for helium is in relatively good 
agreement with the experimental value of 24.6 eV. However, the other values are well above the actual
ionization energies of Li and Be (5.4 and 9.3 eV, respectively), both of which are smaller than those of H and 
He! All freshman chemistry students learn that ionization potentials do not increase monotonically with 
atomic number, and that there are in fact many pronounced and more subtle decreases that appear when this 
property is plotted as a function of atomic number. 
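
The text does not spell out the model behind the predicted ionization potentials. One standard choice that happens to reproduce the quoted figures is sketched below, and it should be read as an assumption rather than as the author's stated method. All $N$ electrons are placed in a single orbital $\exp(-\zeta r)$, the exponent $\zeta$ is optimized variationally using the known 1s-1s repulsion integral $5\zeta/8$ (in atomic units), and the ionization potential is taken as the energy difference between the ion and the neutral atom. The conversion factor and function names are likewise assumptions of this sketch.

```python
HARTREE_EV = 27.2114  # hartree to electron volt conversion (assumed constant, not quoted in the text)

def model_energy(n_electrons, Z):
    """Variational energy (hartree) with all electrons in one orbital exp(-zeta r).

    E(zeta) = N zeta^2 / 2 - N Z zeta + [N (N - 1) / 2] (5 / 8) zeta
    is minimized at zeta = Z - 5 (N - 1) / 16, where E = -N zeta^2 / 2.
    """
    if n_electrons == 0:
        return 0.0
    zeta = Z - 5.0 * (n_electrons - 1) / 16.0
    return -n_electrons * zeta**2 / 2.0

def ionization_potential_ev(Z):
    """Energy of the singly charged ion minus energy of the neutral atom, in eV."""
    return (model_energy(Z - 1, Z) - model_energy(Z, Z)) * HARTREE_EV

for name, Z in (("H", 1), ("He", 2), ("Li", 3), ("Be", 4)):
    print(name, round(ionization_potential_ev(Z), 1))
```

With these assumptions the loop prints 13.6, 23.1, 33.7 and 45.5 eV, in line with the values quoted above, and the monotonic increase with atomic number is evident.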

There is evidently a grave problem here. The wavefunction proposed above for the lithium atom contains all 
of the particle coordinates, adheres to the boundary conditions (it decays to zero when the particles are 
removed to infinity) and obeys the restrictions $P_{12}\phi = P_{13}\phi = P_{23}\phi = \pm\phi$ that govern the behaviour of the
exact wavefunctions. Therefore, if no other restrictions are placed on the wavefunctions of multiparticle
systems, the product wavefunction for lithium
must lie in the space spanned by the exact wavefunctions. However, it clearly does not, because it is proven in
subsection (A1.1.4.1) that any function expressible as a linear combination of Hamiltonian eigenfunctions
cannot have an energy lower than that of the exact ground state. This means that there is at least one
additional symmetry obeyed by all of the exact wavefunctions that is not satisfied by the product form given
for lithium in equation (A1.1.80).

This missing symmetry provided a great puzzle to theorists in the early days of quantum mechanics.
Taken together, ionization potentials of the first four elements in the periodic table indicate that
wavefunctions which assign two electrons to the same single-particle functions such as

$$\phi = \chi_{1s}(1)\chi_{1s}(2)$$ (A1.1.82)

(helium) and

$$\phi = S\,\chi_{1s}(1)\chi_{1s}(2)\chi_{2s}(3)\chi_{2s}(4)$$ (A1.1.83)

(beryllium, the operator $S$ produces the labelled $\chi_{1s}\chi_{1s}\chi_{2s}\chi_{2s}$ product that is symmetric with respect to
interchange of particle indices) are somehow acceptable but that those involving three or more electrons in
one state are not! The resolution of this Zweideutigkeit (two-valuedness) puzzle was made possible only by the
discovery of electron spin, which is discussed below.

A1.1.3.3 SPIN AND THE PAULI PRINCIPLE

In the early 1920s, spectroscopic experiments on the hydrogen atom revealed a striking inconsistency with the 
Bohr model, as adapted by Sommerfeld to account for relativistic effects. Studies of the fine structure 
associated with the $n = 4 \to n = 3$ transition revealed five distinct peaks, while six were expected from
arguments based on the theory of interaction between matter and radiation. The problem was ultimately
reconciled by Uhlenbeck and Goudsmit, who reinterpreted one of the quantum numbers appearing in
Sommerfeld's fine structure formula based on a startling assertion that the electron has an intrinsic angular
momentum independent of that associated with its motion. This idea was also supported by previous 
experiments of Stern and Gerlach, and is now known as electron spin. Spin is a mysterious phenomenon with 
a rather unfortunate name. Electrons are fundamental particles, and it is no more appropriate to think of them 
as charges that resemble extremely small billiard balls than as waves. Although they exhibit behaviour 
characteristic of both, they are in fact neither. Elementary textbooks often depict spin in terms of spherical 
electrons whirling about their axis (a compelling idea in many ways, since it reinforces the Bohr model by 
introducing a spinning planet), but this is a purely classical perspective on electron spin that should not be 
taken literally. 

Electrons and most other fundamental particles have two distinct spin wavefunctions that are degenerate in the 
absence of an external magnetic field. Associated with these are two abstract states which are eigenfunctions
of the intrinsic spin angular momentum operator $S_z$

$$S_z\sigma = m_s\hbar\,\sigma.$$ (A1.1.84)

The allowed quantum numbers $m_s$ are $\tfrac{1}{2}$ and $-\tfrac{1}{2}$, and the corresponding eigenfunctions are usually written as $\alpha$
and $\beta$, respectively. The associated eigenvalues $\tfrac{1}{2}\hbar$ and $-\tfrac{1}{2}\hbar$ give the projection of the intrinsic angular momentum
vector along the direction of a magnetic field that can be applied to resolve the degeneracy. The overall spin
angular momentum of the electron is given in terms of the quantum number $s$ by $\sqrt{s(s + 1)}\,\hbar$. For an electron,
$s = \tfrac{1}{2}$. For a collection of particles, the overall spin and its projection are given in terms of the spin quantum
numbers $S$ and $M_S$ (which are equal to the corresponding lower-case quantities for single particles) by
$\sqrt{S(S + 1)}\,\hbar$ and $M_S\hbar$, respectively. $S$ must be non-negative and can assume either integral or half-integral values,
and the $M_S$ quantum numbers lie in the interval

$$M_S = -S, -S + 1, -S + 2, \ldots, S - 1, S$$ (A1.1.85)

where a correspondence to the properties of orbital angular momentum should be noted. The multiplicity of a
state is given by $2S + 1$ (the number of possible $M_S$ values) and it is customary to associate the terms singlet
with $S = 0$, doublet with $S = \tfrac{1}{2}$, triplet with $S = 1$ and so on.
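
The relation between $S$, the allowed $M_S$ values of equation (A1.1.85) and the multiplicity can be tabulated directly. The sketch below is an illustration only, with hypothetical function names; it uses exact rational arithmetic so that half-integral spins are handled without rounding.

```python
from fractions import Fraction

def ms_values(S):
    """Allowed M_S values -S, -S + 1, ..., S of equation (A1.1.85)."""
    S = Fraction(S)
    if S < 0 or (2 * S).denominator != 1:
        raise ValueError("S must be a non-negative integer or half-integer")
    return [-S + k for k in range(int(2 * S) + 1)]

def multiplicity(S):
    """Number of M_S values, 2S + 1."""
    return len(ms_values(S))

TERMS = {1: "singlet", 2: "doublet", 3: "triplet"}

for S in (0, Fraction(1, 2), 1):
    values = ", ".join(str(m) for m in ms_values(S))
    print(f"S = {S}: {TERMS[multiplicity(S)]}, M_S = {values}")
```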

In the non-relativistic quantum mechanics discussed in this chapter, spin does not appear naturally. Although
Dirac showed in 1928 that a fourth quantum number associated with intrinsic angular momentum appears in a
relativistic treatment of the free electron, it is customary to treat spin heuristically. In general, the
wavefunction of an electron is written as the product of the usual spatial part (which corresponds to a solution
of the non-relativistic Schrödinger equation and involves only the Cartesian coordinates of the particle) and a
spin part $\sigma$, where $\sigma$ is either $\alpha$ or $\beta$. A common shorthand notation is often used, whereby

$$\psi \equiv \psi_{\mathrm{spatial}}\,\alpha$$ (A1.1.86)

$$\bar{\psi} \equiv \psi_{\mathrm{spatial}}\,\beta.$$ (A1.1.87)

In the context of electronic structure theory, the composite functions above are often referred to as spin 
orbitals. When spin is taken into account, one finds that the ground state of the hydrogen atom is actually 
doubly degenerate. The spatial part of the wavefunction is the Schrödinger equation solution discussed in
section (A1.1.3.1), but the possibility of either spin $\alpha$ or $\beta$ means that there are two distinct overall
wavefunctions. The same may be said for any of the excited states of hydrogen (all of which are, however, 
already degenerate in the nonrelativistic theory), as the level of degeneracy is doubled by the introduction of 
spin. Spin may be thought of as a fourth coordinate associated with each particle. Unlike Cartesian 
coordinates, for which there is a continuous distribution of possible values, there are only two possible values 
of the spin coordinate available to each particle. This has important consequences for our discussion of 
indistinguishability and symmetry properties of the wavefunction, as the concept of coordinate permutation 
must be amended to include the spin variable of the particles. As an example, the independent-particle ground 
state of the helium atom based on hydrogenic wavefunctions 

$$\chi_{1s}(1)\chi_{1s}(2)$$ (A1.1.88)

must be replaced by the four possibilities

$$\chi_{1s}(1)\chi_{1s}(2)$$ (A1.1.89)

$$\chi_{1s}(1)\bar{\chi}_{1s}(2)$$ (A1.1.90)

$$\bar{\chi}_{1s}(1)\chi_{1s}(2)$$ (A1.1.91)

$$\bar{\chi}_{1s}(1)\bar{\chi}_{1s}(2).$$ (A1.1.92)

While the first and fourth of these are symmetric with respect to particle interchange and thereby satisfy the
indistinguishability criterion, the other two are not and appropriate linear combinations must be formed.
Doing so, one finds the following four wavefunctions

$$\phi_{S1} = \chi_{1s}(1)\chi_{1s}(2)$$ (A1.1.93)

$$\phi_{S2} = \frac{1}{\sqrt{2}}[\chi_{1s}(1)\bar{\chi}_{1s}(2) + \chi_{1s}(2)\bar{\chi}_{1s}(1)]$$ (A1.1.94)

$$\phi_{S3} = \bar{\chi}_{1s}(1)\bar{\chi}_{1s}(2)$$ (A1.1.95)

$$\phi_A = \frac{1}{\sqrt{2}}[\chi_{1s}(1)\bar{\chi}_{1s}(2) - \chi_{1s}(2)\bar{\chi}_{1s}(1)]$$ (A1.1.96)

where the first three are symmetric with respect to particle interchange and the last is antisymmetric. This 
suggests that under the influence of a magnetic field, the ground state of helium might be resolved into 
components that differ in terms of overall spin, but this is not observed. For the lithium example, there are
eight possible ways of assigning the spin coordinates, only two of which

$$\phi = \chi_{1s}(1)\chi_{1s}(2)\chi_{1s}(3)$$ (A1.1.97)

$$\phi = \bar{\chi}_{1s}(1)\bar{\chi}_{1s}(2)\bar{\chi}_{1s}(3)$$ (A1.1.98)

satisfy the criterion $P_{ij}\phi = \pm\phi$. The other six must be mixed in appropriate linear combinations. However,
there is an important difference between lithium and helium. In the former case, all assignments of the spin
variable to the state given by equation (A1.1.80) produce a product function in which the same state (in terms
of both spatial and spin coordinates) appears at least twice. A little reflection shows that it is not possible to
generate a linear combination of such functions that is antisymmetric with respect to all possible interchanges;
only symmetric combinations such as

$$\phi = \frac{1}{\sqrt{3}}[\chi_{1s}(1)\chi_{1s}(2)\bar{\chi}_{1s}(3) + \chi_{1s}(1)\bar{\chi}_{1s}(2)\chi_{1s}(3) + \bar{\chi}_{1s}(1)\chi_{1s}(2)\chi_{1s}(3)]$$ (A1.1.99)

can be constructed. The fact that antisymmetric combinations appear for helium (where the independent-particle
ground state made up of hydrogen 1s functions is qualitatively consistent with experiment) and not for
lithium (where it is not) raises the interesting possibility that the exact wavefunction satisfies a condition more
restrictive than $P_{ij}\psi = \pm\psi$, namely $P_{ij}\psi = -\psi$. For reasons that are not at all obvious, or even intuitive, nature
does indeed enforce this restriction, which is one statement of the Pauli exclusion principle. When this idea is
first met with, one usually learns an equivalent but less general statement that applies only within the
independent-particle approximation: no two electrons can have the same quantum numbers. What does this
mean? Within the independent-particle picture of an atom, each single-particle wavefunction, or orbital, is
described by the quantum numbers $n$, $l$, $m_l$ and (when spin is considered) $m_s$.
Since it is not possible to generate antisymmetric combinations of products if the same spin orbital appears
twice in each term, it follows that states which assign the same set of four quantum numbers twice cannot
possibly satisfy the requirement $P_{ij}\psi = -\psi$, so this statement of the exclusion principle is consistent with the
more general symmetry requirement. An even more general statement of the exclusion principle, which can be
regarded as an additional postulate of quantum mechanics, is

The wavefunction of a system must be antisymmetric with respect to interchange of the
coordinates of identical particles $\gamma$ and $\delta$ if they are fermions, and symmetric with respect to
interchange of $\gamma$ and $\delta$ if they are bosons.
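
The link between antisymmetry and the familiar rule that no two electrons share the same set of quantum numbers can be made concrete with a small symbolic antisymmetrizer. The sketch below is an illustration only; the string labels for spin orbitals and the function names are hypothetical. It sums a product of spin orbitals over all permutations of the electron labels, weighting each term by the sign of the permutation, and shows that the sum vanishes identically whenever a spin orbital is repeated.

```python
from itertools import permutations

def permutation_sign(perm):
    """Sign of a permutation of 0..n-1: +1 if even, -1 if odd."""
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]   # each swap puts one entry in place
            sign = -sign
    return sign

def antisymmetrize(spin_orbitals):
    """Return {assignment: coefficient} for the signed sum over all permutations.

    An assignment is a tuple whose k-th entry is the spin orbital occupied by
    electron k + 1; terms whose coefficients cancel are dropped.
    """
    result = {}
    for perm in permutations(range(len(spin_orbitals))):
        term = tuple(spin_orbitals[p] for p in perm)
        result[term] = result.get(term, 0) + permutation_sign(perm)
    return {term: c for term, c in result.items() if c != 0}

# Helium: 1s with opposite spins gives a non-vanishing antisymmetric combination.
print(antisymmetrize(("1s_alpha", "1s_beta")))
# Lithium with three electrons in 1s: a spin orbital must repeat, so nothing survives.
print(antisymmetrize(("1s_alpha", "1s_alpha", "1s_beta")))
# Lithium with the third electron in 2s: six signed terms survive.
print(len(antisymmetrize(("1s_alpha", "1s_beta", "2s_alpha"))))
```

Normalization is omitted; only the pattern of surviving terms and their signs matters for the argument.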

Electrons, protons and neutrons and all other particles that have $s = \tfrac{1}{2}$ are known as fermions. Other particles
are restricted to $s = 0$ or 1 and are known as bosons. There are thus profound differences in the
quantum-mechanical properties of fermions and bosons, which have important implications in fields ranging from
statistical mechanics to spectroscopic selection rules. It can be shown that the spin quantum number S 
associated with an even number of fermions must be integral, while that for an odd number of them must be 
half-integral. The resulting composite particles behave collectively like bosons and fermions, respectively, so 
the wavefunction symmetry properties associated with bosons can be relevant in chemical physics. One 
prominent example is the treatment of nuclei, which are typically considered as composite particles rather 
than interacting protons and neutrons. Nuclei with even mass number (an even total number of protons and
neutrons) therefore behave like individual bosons and those with odd mass number as fermions, a distinction
that plays an important role in rotational spectroscopy of polyatomic molecules.
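
The counting rule for composite particles is simple enough to state in code. The sketch below is an illustration with a hypothetical function name: it classifies a nucleus by whether its total number of protons and neutrons, each a fermion with $s = \tfrac{1}{2}$, is even or odd.

```python
def nuclear_statistics(protons, neutrons):
    """Classify a nucleus as a composite boson or fermion from its nucleon count."""
    if protons < 1 or neutrons < 0:
        raise ValueError("a nucleus needs at least one proton and no negative counts")
    # An even number of spin-1/2 constituents gives integral total spin (boson);
    # an odd number gives half-integral total spin (fermion).
    return "boson" if (protons + neutrons) % 2 == 0 else "fermion"

for label, protons, neutrons in (("1H", 1, 0), ("2H", 1, 1), ("12C", 6, 6),
                                 ("13C", 6, 7), ("14N", 7, 7), ("16O", 8, 8)):
    print(label, nuclear_statistics(protons, neutrons))
```

The pair $^{1}$H and $^{2}$H shows that isotopes of the same element can obey different statistics.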


A1.1.3.4 INDEPENDENT-PARTICLE MODELS IN ELECTRONIC STRUCTURE

At this point, it is appropriate to make some comments on the construction of approximate wavefunctions for
the many-electron problems associated with atoms and molecules. The Hamiltonian operator for a molecule is 
given by the general form 


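$$H = -\frac{\hbar^2}{2}\sum_i \frac{1}{m_i}\nabla_i^2 + \sum_{i<j}\frac{q_i q_j}{r_{ij}}$$ (A1.1.100)

where the sums run over all particles (electrons and nuclei) with masses $m_i$ and charges $q_i$, and $r_{ij}$ is the
distance between particles $i$ and $j$.
It should be noted that nuclei and electrons are treated equivalently in $H$, which is clearly inconsistent with the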
way that we tend to think about them. Our understanding of chemical processes is strongly rooted in the 
concept of a potential energy surface which determines the forces that act upon the nuclei. The potential 
energy surface governs all behaviour associated with nuclear motion, such as vibrational frequencies, mean 
and equilibrium internuclear separations and preferences for specific conformations in molecules as complex 
as proteins and nucleic acids. In addition, the potential energy surface provides the transition state and 
activation energy concepts that are at the heart of the theory of chemical reactions. Electronic motion, 
however, is never discussed in these terms. All of the important and useful ideas discussed above derive from 
the Born-Oppenheimer approximation, which is discussed in some detail in section B3.1. Within this model,
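the electronic states are solutions to the equation

$$\left[-\frac{\hbar^2}{2m}\sum_i \nabla_i^2 - \sum_{i,A}\frac{Z_A e^2}{r_{iA}} + \sum_{i<j}\frac{e^2}{r_{ij}}\right]\psi = E\psi$$ (A1.1.101)

where the nuclei are assumed to be stationary; here $m$ is the electron mass, $i$ and $j$ label electrons and $A$ labels
nuclei of charge $Z_A e$. The electronic energies are given by the eigenvalues (usually
augmented by the wavefunction-independent internuclear repulsion energy) of the electronic Hamiltonian. The functions obtained by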
plotting the electronic energy as a function of nuclear position are the potential energy surfaces described 
above. The latter are different for every electronic state; their shape gives the usual information about 
molecular structure, barrier heights, isomerism and so on. The Born-Oppenheimer separation is also made in 
the study of electronic structure in atoms. However, this is a rather subtle point and is not terribly important in 
applications since the only assumption made is that the nucleus has infinite mass. 

Although a separation of electronic and nuclear motion provides an important simplification and appealing 
qualitative model for chemistry, the electronic Schrödinger equation is still formidable. Efforts to solve it 
approximately and apply these solutions to the study of spectroscopy, structure and chemical reactions form 
the subject of what is usually called electronic structure theory or quantum chemistry. The starting point for 
most calculations and the foundation of molecular orbital theory is the independent-particle approximation. 

For many-electron systems such as atoms and molecules, it is obviously important that approximate 
wavefunctions obey the same boundary conditions and symmetry properties as the exact solutions. Therefore, 
they should be antisymmetric with respect to interchange of each pair of electrons. Such states can always be 
constructed as linear combinations of products such as 

$$\chi_i(1)\,\chi_j(2)\,\chi_k(3)\cdots\chi_q(n). \tag{A1.1.102}$$


The $\chi$ are assumed to be spin orbitals (which include both the spatial and spin parts) and each term in the 
product differs in the way that the electrons are assigned to them. Of course, it does not matter how the 
electrons are distributed amongst the $\chi$ in equation (A1.1.102), as the necessary subsequent 
antisymmetrization makes all choices equivalent apart from an overall sign (which has no physical 
significance). Hence, the product form is usually written without assigning electrons to the individual orbitals, 
and the set of unlabelled $\chi$ included in the product represents an electron configuration. It should be noted that 
all of the single-particle orbitals $\chi$ in the product are distinct. A very convenient method for constructing 
antisymmetrized combinations corresponding to products of particular single-particle states is to form the 
Slater determinant 


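$$\phi = \frac{1}{\sqrt{n!}} \begin{vmatrix} \chi_i(1) & \chi_i(2) & \chi_i(3) & \cdots & \chi_i(n) \\ \chi_j(1) & \chi_j(2) & \chi_j(3) & \cdots & \chi_j(n) \\ \chi_k(1) & \chi_k(2) & \chi_k(3) & \cdots & \chi_k(n) \\ \vdots & \vdots & \vdots & & \vdots \\ \chi_q(1) & \chi_q(2) & \chi_q(3) & \cdots & \chi_q(n) \end{vmatrix} \tag{A1.1.103}$$

where the nominal electron configuration can be determined by simply scanning along the main diagonal of 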
the matrix. A fundamental result of linear algebra is that the determinant of a matrix changes sign when any 
two rows or columns are interchanged. Inspection of equation (A1.1.103) shows that interchanging any two 
columns of the Slater determinant corresponds to interchanging the labels of two electrons, so the Pauli 
exclusion principle is automatically incorporated into this convenient representation. Whether all orbitals in a 
given row are identical and all particle labels the same in each column (as above) or vice versa is not 
important, as determinants are invariant with respect to transposition. In particular, it should be noted that the 
Slater determinant necessarily vanishes when two of the spin orbitals are identical, reflecting the alternative 
statement of the Pauli principle: no two electrons can have the same set of quantum numbers. One qualification 
which should be stated here is that Slater determinants are not necessarily eigenfunctions of the $\hat{S}^2$ operator, 
and it is often advantageous to form linear combinations of those corresponding to electron configurations that 
differ only in the assignment of the spin variable to the spatial orbitals. The resulting functions $\phi$ are 
sometimes known as spin-adapted configurations. 
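
The antisymmetry of the Slater determinant, and its vanishing when two spin orbitals coincide, can be checked numerically. The following Python sketch is illustrative only: the three one-dimensional spin orbitals, the electron coordinates and all function names are hypothetical choices that do not appear in the text, and the determinant is expanded by brute force over permutations, which is practical only for very small n.

```python
import math
from itertools import permutations


def permutation_sign(perm):
    """Return +1 for an even permutation of 0..n-1 and -1 for an odd one."""
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        while perm[i] != i:            # sort by swaps; each swap flips the sign
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign


def slater_determinant(orbitals, electrons):
    """Normalized determinant with orbitals along rows and electrons along columns."""
    n = len(orbitals)
    total = 0.0
    for perm in permutations(range(n)):
        term = float(permutation_sign(perm))
        for row, col in enumerate(perm):
            term *= orbitals[row](electrons[col])
        total += term
    return total / math.sqrt(math.factorial(n))


def spin_orbital(spatial, spin):
    """Spin orbital: spatial part times a spin function that is 1 or 0."""
    return lambda electron: spatial(electron[0]) if electron[1] == spin else 0.0


# Hypothetical spin orbitals; an electron is a (position, spin) pair.
chi_1 = spin_orbital(lambda x: math.exp(-x * x), "alpha")
chi_2 = spin_orbital(lambda x: math.exp(-x * x), "beta")
chi_3 = spin_orbital(lambda x: x * math.exp(-x * x), "alpha")

electrons = [(0.3, "alpha"), (-0.7, "beta"), (1.1, "alpha")]
value = slater_determinant([chi_1, chi_2, chi_3], electrons)

# Interchanging the labels of electrons 1 and 2 swaps two columns.
swapped = slater_determinant([chi_1, chi_2, chi_3],
                             [electrons[1], electrons[0], electrons[2]])

# Using the same spin orbital twice makes two rows identical.
repeated = slater_determinant([chi_1, chi_2, chi_1], electrons)

print("determinant:            ", value)
print("after electron exchange:", swapped)
print("with a repeated orbital:", repeated)
```

The exchanged value is the negative of the original, and the determinant with a repeated spin orbital is zero to within rounding, in line with the two statements of the Pauli principle given above.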


Within an independent-particle picture, there are a very large number of single-particle wavefunctions $\chi$ 
available to each particle in the system. If the single-particle Schrödinger equations can be solved exactly, 
then there are often an infinite number of solutions. Approximate solutions are, however, necessitated in most 
applications, and some subtleties must be considered in this case. The description of electrons in atoms and 
molecules is often based on the Hartree-Fock approximation, which is discussed in section B3.1 of this 
encyclopedia. In the Hartree-Fock method, only briefly outlined here, the orbitals are chosen in such a way 
that the total energy of a state described by the Slater determinant that comprises them is minimized. There 
are cogent reasons for using an energy minimization strategy that are based on the variational principle 
discussed later in this section. The Hartree-Fock method derives from an earlier treatment of Hartree, in 
which indistinguishability and the Pauli principle were ignored and the wavefunction expressed as in equation 
(A1.1.102). However, that approach is not satisfactory because it can lead to pathological solutions such as 
that discussed earlier for lithium. In Hartree-Fock theory, the orbital optimization is achieved at the expense 
of introducing a very complicated single-particle potential term $v_i$. This potential depends on all of the other 
orbitals in which electrons reside, requires the evaluation of difficult integrals and necessitates a self-consistent (iterative) solution. The resulting one-electron Hamiltonian is known as the Fock operator, and it 
has (in principle) an infinite number of eigenfunctions; a subset of these are exactly the same as the $\chi$ that 
correspond to the occupied orbitals upon which it is parametrized. The resulting equations cannot be solved 
analytically; for atoms, exact solutions for the occupied orbitals can be determined by numerical methods, but 
the infinite number of unoccupied functions are unknown apart from the fact that they must be orthogonal to 
the occupied ones. In molecular calculations, it is customary to assume that the orbitals $\chi$ can be written as 
linear combinations of a fixed set of $N$ basis functions, where $N$ is typically of the order of tens to a few 
hundred. Iterative solution of a set of matrix equations provides approximations for the orbitals describing the 
$n$ electrons of the molecule and $N - n$ unoccupied orbitals. 

The choice of basis functions is straightforward in atomic calculations. It can be demonstrated that all 
solutions to an independent-particle Hamiltonian have the symmetry properties of the hydrogenic 
wavefunctions. Each is, or can be written as, an eigenfunction of the $\hat{L}_z$ and $\hat{L}^2$ operators and involves a radial 
part multiplied by a spherical harmonic. Atomic calculations that use basis sets (not all of them do) typically 
choose functions that are similar to those that solve the Schrödinger equation for hydrogen. If the complete set 
of hydrogenic functions is used, the solutions to the basis set equations are the exact Hartree-Fock solutions. 
However, practical considerations require the use of finite basis sets; the corresponding solutions are therefore 
only approximate. Although the distinction is rarely made, it is preferable to refer to these as self-consistent 
field (SCF) solutions and energies in order to distinguish them from the exact Hartree-Fock results. As the 
quality of a basis is improved, the energy approaches that of the Hartree-Fock solution from above. 

In molecules, things are a great deal more complicated. In principle, one can always choose a subset of all the 
hydrogenic wavefunctions centred at some point in space. Since the resulting basis functions include all 
possible electron coordinates and Slater determinants constructed from them vanish at infinity and satisfy the 
Pauli principle, the corresponding approximate solutions must lie in the space spanned by the exact solutions 
and be qualitatively acceptable. In particular, use of enough basis functions will result in convergence to the 
exact Hartree-Fock solution. Because of the difficulties involved with evaluating integrals involving 
exponential hydrogenic functions centred at more than one point in space, such single-centre expansions were 
used in the early days of quantum chemistry. The main drawback is that convergence to the exact Hartree- 
Fock result is extraordinarily slow. The states of the hydrogen molecule are reasonably well approximated by 
linear combinations of hydrogenic functions centred on each of the two nuclei. Hence, a more practical 
strategy is to construct a basis by choosing a set of hydrogenic functions for each atom in the molecule (the 
same functions are usually used for identical atoms, whether or not they are equivalent by symmetry). Linear 
combinations of a relatively small number of these functions are capable of describing the electronic 
distribution in molecules much better than is possible with a corresponding number of functions in a single- 
centre expansion. This approach is often called the linear combination of atomic orbitals (LCAO) 
approximation, and is used in virtually all molecular SCF calculations performed today. The problem 
associated with evaluation of multicentre integrals alluded to above was solved more than a quarter-century 
ago by the introduction of Gaussian (rather than exponential) basis functions, which permit all of the 
integrals appearing in the Fock operator to be calculated analytically. Although Gaussian functions are not 
hydrogenic functions (and are inferior basis functions), the latter can certainly be approximated well by linear 
combinations of the former. The ease of integral evaluation using Gaussian functions makes them the standard 
choice for practical calculations. Selecting an appropriate basis set is of great practical 
importance in quantum chemistry and many other aspects of atomic and molecular quantum mechanics. An 
illustrative example of basis set selection and its effect on calculated energies is given in subsection 
(A1.1.4.2). While the problem studied there involves only the motion of a single particle in one dimension, an 
analogy with the LCAO and single-centre expansion methods should be apparent, with the desirable features 
of the former clearly illustrated. 
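
The sense in which a Gaussian is an inferior but usable substitute for an exponential function can be seen in the simplest possible case, the hydrogen atom. The sketch below is not taken from the text: it assumes atomic units and uses the standard closed-form expectation value of the hydrogen Hamiltonian over a single normalized Gaussian exp(-alpha r^2), and the function names and search interval are hypothetical. The exponent is optimized by minimizing the energy, a strategy justified by the variational principle discussed later in this section.

```python
import math

EXACT_GROUND_STATE = -0.5   # hydrogen atom, hartree (exact 1s energy)


def gaussian_energy(alpha):
    """Energy (hartree) of hydrogen for the trial function exp(-alpha r^2).

    Kinetic energy: 3*alpha/2.  Nuclear attraction: -2*sqrt(2*alpha/pi).
    """
    return 1.5 * alpha - 2.0 * math.sqrt(2.0 * alpha / math.pi)


def golden_section_minimum(f, lo, hi, tol=1e-10):
    """Locate the minimum of a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if f(x1) < f(x2):
            hi = x2          # the minimum lies to the left of x2
        else:
            lo = x1          # the minimum lies to the right of x1
    return 0.5 * (lo + hi)


alpha_best = golden_section_minimum(gaussian_energy, 0.01, 2.0)
energy_best = gaussian_energy(alpha_best)

print("optimal exponent:     ", alpha_best)
print("single-Gaussian energy:", energy_best)
print("exact energy:          ", EXACT_GROUND_STATE)
```

Under these assumptions the best single Gaussian recovers only about 85% of the exact energy and lies above it, which is why practical basis sets use linear combinations of several Gaussians to mimic each hydrogenic function.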

Even Hartree-Fock calculations are difficult and expensive to apply to large molecules. As a result, further 
simplifications are often made. Parts of the Fock operator are ignored or replaced by parameters chosen by 
some sort of statistical procedure to account, in an average way, for the known properties of selected 
compounds. While calculating properties that have already been measured experimentally is of limited 
interest to anyone other than theorists trying to establish the accuracy of a method, the hope of these 
approximate Hartree-Fock procedures (which include the well known Hückel approximation and are 
collectively known as semiempirical methods) is that the parametrization works just as well for both 
unmeasured properties of known molecules (such as transition state structures) and the structure and 
properties of transient or unknown species. No further discussion of these approaches is given here (more 
details are given in section B3.1 and section B3.2); it should only be emphasized that all of these methods are 
based on the independent-particle approximation. 

Regardless of how many single-particle wavefunctions $\chi$ are available, this number is overwhelmed by the 
number of $n$-particle wavefunctions $\phi$ (Slater determinants) that can be constructed from them. For example, 
if a two-electron system is treated within the Hartree-Fock approximation using 100 basis functions, both of 
the electrons can be assigned to any of the $\chi$ obtained in the calculation, resulting in 10,000 two-electron 
wavefunctions. For water, which has ten electrons, the number of electronic wavefunctions with equal 
numbers of $\alpha$ and $\beta$ spin electrons that can be constructed from 100 single-particle wavefunctions is roughly 
$10^{15}$. The significance of these other solutions may be hard to grasp. If one is interested solely in the 
electronic ground state and its associated potential energy surface (the focus of investigation in most quantum 
chemistry studies), these solutions play no role whatsoever within the HF-SCF approximation. Moreover, one 
might think (correctly) that solutions obtained by putting an electron in one of the 
unoccupied orbitals offer a poor treatment of excited states since only the occupied orbitals are optimized. 
However, there is one very important feature of the extra solutions. If the HF solution has been obtained and 
all (an infinite number) of the virtual orbitals are available, then the basic principles of quantum mechanics imply that 
the exact wavefunction can be written as the sum of Slater determinants 


$$\psi = \sum_{k} c_k \phi_k \tag{A1.1.104}$$


where the $\phi_k$ correspond to all possible electron configurations. The individual Slater determinants are thus 
seen to play a role in the representation of the exact wavefunction that is analogous to that played by the 
hydrogenic (or LCAO) functions in the expansion of the Hartree-Fock orbitals. The Slater determinants are 
sometimes said to form an n-electron basis, while the hydrogenic (LCAO) functions are the one-electron 
basis. 
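
The size of the n-electron basis can be estimated with a short counting exercise. The sketch below is a rough illustration rather than the author's calculation: it assumes that the 100 single-particle functions are spatial orbitals, each of which can hold one alpha and one beta electron, so that the five alpha and five beta electrons of water are distributed independently. The variable names are hypothetical.

```python
from math import comb

n_orbitals = 100

# Two electrons, each assigned freely to any of the 100 functions.
two_electron_products = n_orbitals ** 2

# Water: five alpha and five beta electrons, each set placed in distinct orbitals.
n_alpha = n_beta = 5
water_determinants = comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)

print("two-electron products:", two_electron_products)
print("water determinants:   ", water_determinants)
print("order of magnitude:    10^%d" % (len(str(water_determinants)) - 1))
```

Under these assumptions the count for water is a few times 10^15, which shows why only a small selection of determinants can be used in practice.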

A similar expansion can be made in practical finite-basis calculations, except that limitations of the basis set 
preclude the possibility that the exact wavefunction lies in the space spanned by the available $\phi_k$. However, it 
should be clear that the formation of linear combinations of the finite number of $\phi_k$ offers a way to better 
approximate the exact solution. In fact, it is possible to obtain by this means a wavefunction that exactly 
satisfies the electronic Schrödinger equation when the assumption is made that the solution must lie in the 
space spanned by the $n$-electron basis functions $\phi_k$. However, even this is usually impossible, and only a select 
number of the $\phi_k$ are used. The general principle of writing $n$-electron wavefunctions as linear combinations 
of Slater determinants is known as configuration interaction, and the resultant improvement in the 
wavefunction is said to account for electron correlation. The origin of this term is easily understood. 
Returning to helium, an inspection of the Hartree-Fock wavefunction 

$$\phi = \frac{1}{\sqrt{2}}\left[\chi_{1s\alpha}(1)\,\chi_{1s\beta}(2) - \chi_{1s\alpha}(2)\,\chi_{1s\beta}(1)\right] \tag{A1.1.105}$$


exhibits some rather unphysical behaviour: the probability of finding one electron at a particular point in 
space is entirely independent of where the other electron is! In particular, the probability does not vanish 
when the two particles are coincident, the associated singularity in the interelectronic repulsion potential 
notwithstanding. Of course, electrons do not behave in this way, and do indeed tend to avoid each other. 
Hence, their motion is correlated, and this qualitative feature is absent from the Hartree-Fock approximation 
when the electrons have different spins. When they are of like spin, then the implicit incorporation of the 
Pauli principle into the form of the Slater determinant allows for some measure of correlation (although these 
like-spin effects are characteristically overestimated) since the wavefunction vanishes when the coordinates of 
the two electrons coincide. Treatments of electron correlation and the related concept of correlation energy 
(the difference between the Hartree-Fock and exact non-relativistic results) take a number of different forms 
that differ in the strategies used to determine the expansion coefficients $c_k$ and the energy (which is not always 
given by the expectation value of the Hamiltonian over a function of the form equation (A1.1.104)). The basic 
theories underlying the most popular choices are the variational principle and perturbation theory, which are 
discussed in a general way in the remainder of this section. Specific application of these tools in electronic 
structure theory is dealt with in section B3.1. Before leaving this discussion, it should also be mentioned that a 
concept very similar to the independent-particle approximation is used in the quantum-mechanical treatment 
of molecular vibrations. In that case, it is always possible to solve the Schrödinger equation for nuclear 
motion exactly if the potential energy function is assumed to be quadratic. 
The corresponding functions $\chi_i$, $\chi_j$ etc. then define what are known as the normal coordinates of vibration, 
and the Hamiltonian can be written in terms of these in precisely the form given by equation (A1.1.69), with 
the caveat that each term refers not to the coordinates of a single particle, but rather to independent 
coordinates that involve the collective motion of many particles. An additional distinction is that treatment of 
the vibrational problem does not involve the complications of antisymmetry associated with identical 
fermions and the Pauli exclusion principle. Products of the normal coordinate functions nevertheless describe 
all vibrational states of the molecule (both ground and excited) in very much the same way that the product 
states of single-electron functions describe the electronic states, although it must be emphasized that one 
model is based on independent motion and the other on collective motion, which are qualitatively very 
different. Neither model faithfully represents reality, but each serves as an extremely useful conceptual model 
and a basis for more accurate calculations. 
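
The separation of a quadratic potential into independent normal coordinates can be illustrated with a toy model. The sketch below is a hypothetical example, not a system discussed in the text: two unit masses are each tied to a wall by a spring and coupled to each other by a weaker spring, with all force constants in arbitrary units and hbar set to one. Diagonalizing the mass-weighted force-constant matrix gives the normal coordinates, and the energy of any product state is then a sum of independent harmonic oscillator energies.

```python
import numpy as np

# Hypothetical model: two unit masses, each tied to a wall by a spring of force
# constant k_wall and to each other by a weaker spring k_couple.
masses = np.array([1.0, 1.0])
k_wall, k_couple = 1.0, 0.25
hessian = np.array([[k_wall + k_couple, -k_couple],
                    [-k_couple, k_wall + k_couple]])

# Mass-weight the quadratic potential and diagonalize it; the eigenvectors are
# the normal coordinates and the eigenvalues are squared angular frequencies.
inv_sqrt_mass = 1.0 / np.sqrt(masses)
mass_weighted = hessian * np.outer(inv_sqrt_mass, inv_sqrt_mass)
omega_squared, normal_modes = np.linalg.eigh(mass_weighted)
frequencies = np.sqrt(omega_squared)


def vibrational_energy(quanta, hbar=1.0):
    """Energy of the product state with quanta[i] quanta in normal mode i."""
    quanta = np.asarray(quanta, dtype=float)
    return hbar * float(np.sum(frequencies * (quanta + 0.5)))


print("normal-mode frequencies:", frequencies)
print("ground state energy:    ", vibrational_energy([0, 0]))
print("one quantum in mode 0:  ", vibrational_energy([1, 0]))
```

Each normal mode here involves the motion of both masses (in phase for the lower frequency, out of phase for the higher), which is the collective character referred to above.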


A1.1.4 APPROXIMATING EIGENVALUES OF THE HAMILTONIAN 

Since its eigenvalues correspond to the allowed energy states of a quantum-mechanical system, the time-independent Schrödinger equation plays an important role in the theoretical foundation of atomic and 
molecular spectroscopy. For cases of chemical interest, the equation is always easy to write down but 
impossible to solve exactly. Approximation techniques are needed for the application of quantum mechanics 
to atoms and molecules. The purpose of this subsection is to outline two distinct procedures, the variational 
principle and perturbation theory, that form the theoretical basis for most methods used to approximate 
solutions to the Schrödinger equation. Although some tangible connections are made with ideas of quantum 
chemistry and the independent-particle approximation, the presentation in the next two sections (and example 
problem) is intended to be entirely general so that the scope of applicability of these approaches is not 
underestimated by the reader. 

A1.1.4.1 THE VARIATIONAL PRINCIPLE 

Although it may be impossible to solve the Schrödinger equation for a specific choice of the Hamiltonian, it is 
always possible to guess! While randomly chosen functions are unlikely to be good approximations to the 
exact quantum-mechanical wavefunction, an educated guess can usually be made. For example, if one is 
interested in the ground state of a single particle subjected to some potential energy function, the qualitative 
features discussed in subsection (A1.1.2.3) can be used as a guide in constructing a guess. Specifically, an 
appropriate choice would be one that decays to zero at positive and negative infinity, has its largest values in 
regions where the potential is deepest, and has no nodes. For more complicated problems, especially those 
involving several identical particles, it is not so easy to intuit the form of the wavefunction. Nevertheless, 
guesses can be based on solutions to a (perhaps grossly) simplified Schrödinger equation, such as the Slater 
determinants associated with independent-particle models. 

In general, approaches based on guessing the form of the wavefunction fall into two categories. In the first, 
the ground-state wavefunction is approximated by a function that contains one or more nonlinear parameters. 
For example, if exp(ax) is a solution to a simplified Schrödinger equation, then the function exp(bx) provides 
a plausible guess for the actual problem. The parameter b can then be varied to obtain the most accurate 
description of the exact ground state. However, there is an apparent contradiction here. If the exact ground-state wavefunction and energy are not known (and indeed impossible to obtain analytically), then how is one 
to determine the best choice for the parameter b? 

The answer to the question that closes the preceding paragraph is the essence of the variational principle in 
quantum mechanics. If a guessed or trial wavefunction $\phi$ is chosen, the energy $\varepsilon$ obtained by taking the 
expectation value of the Hamiltonian (it must be emphasized that the actual Hamiltonian is used to evaluate 
the expectation value, rather than the approximate Hamiltonian that may have been used to generate the form 
of the trial function) over $\phi$ cannot be lower than the exact ground-state energy. It seems rather remarkable that 
the mathematics seems to know precisely where the exact eigenvalue lies, even though the problem cannot be 
solved exactly. However, it is not difficult to prove that this assertion is true. The property of mathematical 
completeness tells us that our trial function can be written as a linear combination of the exact wavefunctions 
(so long as our guess obeys the boundary conditions and fundamental symmetries of the problem), even when 
the latter cannot be obtained. Therefore one can always write 

$$\phi = \sum_{k} c_k \psi_k \tag{A1.1.106}$$


where $\psi_k$ is the exact Hamiltonian eigenfunction corresponding to eigenvalue $\lambda_k$, and ordered so that $\lambda_0 \le \lambda_1 \le \lambda_2 \le \cdots$. Assuming normalization of both the exact wavefunctions and the trial function, the expectation 
value of the Hamiltonian is 

$$\varepsilon = \int \phi^{*} \hat{H} \phi \, d\tau = \int \Big(\sum_j c_j \psi_j\Big)^{*} \hat{H} \Big(\sum_k c_k \psi_k\Big) d\tau = \sum_j \sum_k c_j^{*} c_k \int \psi_j^{*} \hat{H} \psi_k \, d\tau. \tag{A1.1.107}$$

Since the $\psi_k$ represent exact eigenfunctions of the Hamiltonian, equation (A1.1.107) simplifies to 


$$\varepsilon = \sum_j \sum_k c_j^{*} c_k \lambda_k \int \psi_j^{*} \psi_k \, d\tau = \sum_k c_k^{*} c_k \lambda_k = \sum_k |c_k|^{2} \lambda_k. \tag{A1.1.108}$$


The assumption of normalization imposed on the trial function means that 

$$\sum_k |c_k|^{2} = 1 \tag{A1.1.109}$$

hence 

$$|c_0|^{2} = 1 - |c_1|^{2} - |c_2|^{2} - \cdots. \tag{A1.1.110}$$


Inserting equation (A1.1.110) into equation (A1.1.108) yields 

$$\varepsilon = \lambda_0 + |c_1|^{2}(\lambda_1 - \lambda_0) + |c_2|^{2}(\lambda_2 - \lambda_0) + \cdots. \tag{A1.1.111}$$

The first term on the right-hand side of the equation for $\varepsilon$ is the exact ground-state energy. All of the 
remaining contributions involve norms of the expansion coefficients and the differences $\lambda_k - \lambda_0$, both of 
which must be either positive or zero. Therefore, $\varepsilon$ is equal to the ground-state energy plus a number that 
cannot be negative. In the case where the trial function is precisely equal to the ground-state wavefunction, 
then $\varepsilon = \lambda_0$; otherwise $\varepsilon > \lambda_0$. Hence, the expectation value of the Hamiltonian with respect to any arbitrarily 
chosen trial function provides an upper bound to the exact ground-state energy. The dilemma raised earlier, 
how to define the best value of the variational parameter b, has a rather straightforward answer, namely the 
choice that minimizes the value of $\varepsilon$, known as the variational energy. 
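
The optimization of a single nonlinear parameter can be made concrete with a small numerical sketch. The problem chosen here is an assumption made for illustration and is not one treated in the text: a particle of unit mass in the quartic potential x^4 (with hbar = 1), described by the Gaussian trial function exp(-b x^2) in place of the exp(bx) form used above. For this trial function the expectation value of the Hamiltonian has the closed form b/2 + 3/(16 b^2). The reference energy is a literature value quoted as an assumption, not something computed by the sketch.

```python
import math

# Literature value for the ground state of H = -1/2 d^2/dx^2 + x^4
# (assumed here for comparison; it is not computed by this sketch).
REFERENCE_GROUND_STATE = 0.667986


def trial_energy(b):
    """Expectation value of H over the trial function exp(-b x^2), b > 0.

    Kinetic energy: b/2.  Potential energy: <x^4> = 3/(16 b^2).
    """
    if b <= 0.0:
        raise ValueError("the trial function is normalizable only for b > 0")
    return 0.5 * b + 3.0 / (16.0 * b * b)


# Scan the variational parameter on a grid and keep the lowest energy.
candidates = [0.2 + 0.001 * i for i in range(3000)]
b_best = min(candidates, key=trial_energy)
energy_best = trial_energy(b_best)

print("best value of b:   ", b_best)
print("variational energy:", energy_best)
print("reference energy:  ", REFERENCE_GROUND_STATE)
```

Every value of b gives an energy above the reference value, and the minimum over b comes within about two per cent of it, even though a Gaussian is not an exact eigenfunction of this Hamiltonian.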

A concrete example of the variational principle is provided by the Hartree-Fock approximation. This method 
asserts that the electrons can be treated independently, and that the $n$-electron wavefunction of the atom or 
molecule can be written as a Slater determinant made up of orbitals. These orbitals are defined to be those 
which minimize the expectation value of the energy. Since the general mathematical form of these orbitals is 
not known (especially in molecules), the resulting problem is highly nonlinear and formidably difficult to 
solve. However, as mentioned in subsection (A1.1.3.2), a common approach is to assume that the orbitals can 
be written as linear combinations of one-electron basis functions. If the basis functions are fixed, then the 
optimization problem reduces to that of finding the best set of coefficients for each orbital. This tremendous 
simplification provided a revolutionary advance for the application of the Hartree-Fock method to molecules, 
and was originally proposed by Roothaan in 1951. A similar form of the trial function occurs when it is 
assumed that the exact (as opposed to Hartree-Fock) wavefunction can be written as a linear combination of 
Slater determinants (see equation (A1.1.104)). In the conceptually simpler latter case, the objective is to 
minimize an expression of the form 


$$\varepsilon = \int \phi^{*} \hat{H} \phi \, d\tau \tag{A1.1.112}$$

where $\phi$ is parametrized as 

$$\phi = \sum_k c_k \chi_k \tag{A1.1.113}$$

and both the (fixed functions) $\chi_k$ and $\phi$ are assumed to be normalized. 


The representation of trial functions as linear combinations of fixed basis functions is perhaps the most 
common approach used in variational calculations; optimization of the coefficients $c_k$ is often said to be an 
application of the linear variational principle. Although some very accurate work on small atoms (notably 
helium and lithium) has been based on complicated trial functions with several nonlinear parameters, attempts 
to extend these calculations to larger atoms and molecules quickly run into formidable difficulties (not the 
least of which is how to choose the form of the trial function). Basis set expansions like that given by equation 
(A1.1.113) are much simpler to design, and the procedures required to obtain the coefficients that minimize $\varepsilon$ 
are all easily carried out by computers. 

For the example discussed above, where exp(ax) is the solution to a simpler problem, a trial function using 
five basis functions 

$$\phi = c_1 e^{(a-2)x} + c_2 e^{(a-1)x} + c_3 e^{ax} + c_4 e^{(a+1)x} + c_5 e^{(a+2)x} \tag{A1.1.114}$$

could be used instead of exp(bx) if the exact function is not expected to deviate too much from exp(ax). What 
is gained from replacing a trial function containing a single parameter by one that contains five? To see, 
consider the problem of how coefficients can be chosen to minimize the variational energy $\varepsilon$, 

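$$\varepsilon = \frac{\int \phi^{*} \hat{H} \phi \, d\tau}{\int \phi^{*} \phi \, d\tau}. \tag{A1.1.115}$$

The denominator is included in equation (A1.1.115) because it is impossible to ensure that the trial function is 
normalized for arbitrarily chosen coefficients $c_k$. In order to minimize the value of $\varepsilon$ for the trial function 

$$\phi = \sum_{k=0}^{N} c_k \chi_k \tag{A1.1.116}$$

it is necessary (but not sufficient) that its first partial derivatives with respect to all expansion coefficients 
vanish, viz 

$$\frac{\partial \varepsilon}{\partial c_0} = \frac{\partial \varepsilon}{\partial c_1} = \frac{\partial \varepsilon}{\partial c_2} = \cdots = 0. \tag{A1.1.117}$$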

It is worthwhile, albeit tedious, to work out the condition that must be satisfied in order for equation (A1.1.117) 
to hold true. Expanding the trial function according to equation (A1.1.113), assuming that the basis functions 
and expansion coefficients are real and making use of the technique of implicit differentiation, one finds 

$$\frac{\partial \varepsilon}{\partial c_k} \sum_i \sum_j c_i c_j S_{ij} = 2 \sum_j c_j \left(H_{jk} - \varepsilon S_{jk}\right) \tag{A1.1.118}$$


where shorthand notations for the overlap matrix elements 

$$S_{jk} = \int \chi_j \chi_k \, d\tau \tag{A1.1.119}$$

and Hamiltonian matrix elements 

$$H_{jk} = \int \chi_j \hat{H} \chi_k \, d\tau \tag{A1.1.120}$$


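have been introduced. Since the term multiplying the derivative of the expansion coefficient is simply the 
norm of the wavefunction, the variational condition equation (A1.1.117) is satisfied if the term on the right-hand side of equation (A1.1.118) vanishes for all values of $k$. Specifically, the set of homogeneous linear 
equations corresponding to the matrix expression 

$$\begin{pmatrix} H_{00} - \varepsilon S_{00} & H_{01} - \varepsilon S_{01} & \cdots & H_{0N} - \varepsilon S_{0N} \\ H_{10} - \varepsilon S_{10} & H_{11} - \varepsilon S_{11} & \cdots & H_{1N} - \varepsilon S_{1N} \\ \vdots & \vdots & & \vdots \\ H_{N0} - \varepsilon S_{N0} & H_{N1} - \varepsilon S_{N1} & \cdots & H_{NN} - \varepsilon S_{NN} \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_N \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \tag{A1.1.121}$$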


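must be satisfied. It is a fundamental principle of linear algebra that systems of equations of this general type 
have nontrivial solutions only for certain choices of $\varepsilon$, namely those for which the determinant 

$$\begin{vmatrix} H_{00} - \varepsilon S_{00} & H_{01} - \varepsilon S_{01} & \cdots & H_{0N} - \varepsilon S_{0N} \\ H_{10} - \varepsilon S_{10} & H_{11} - \varepsilon S_{11} & \cdots & H_{1N} - \varepsilon S_{1N} \\ \vdots & \vdots & & \vdots \\ H_{N0} - \varepsilon S_{N0} & H_{N1} - \varepsilon S_{N1} & \cdots & H_{NN} - \varepsilon S_{NN} \end{vmatrix} \tag{A1.1.122}$$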


is identically equal to zero. There are precisely as many values of $\varepsilon$ that satisfy this condition as there are basis functions, some of which might 
be degenerate, and their determination constitutes what is known as the generalized eigenvalue problem. 
While this is reasonably well suited to computation, a further simplification is usually made. When suited to 
the problem under consideration, the basis functions are usually chosen to be members of an orthonormal set. 
In other cases (for example, in the LCAO treatment of molecules) where this is not possible, the original basis 
functions $\chi'_i$ corresponding to the overlap matrix $\mathbf{S}'$ can be subjected to the orthonormalizing transformation 

$$\chi_k = \sum_i \chi'_i X_{ik} \tag{A1.1.123}$$

where $\mathbf{X}$ is the reciprocal square root of the overlap matrix in the primed basis, 

$$\mathbf{X} = \mathbf{S}'^{-1/2}. \tag{A1.1.124}$$


The simplest way to obtain $\mathbf{X}$ is to diagonalize $\mathbf{S}'$, take the reciprocal square roots of the eigenvalues and then 
transform the matrix back to its original representation, i.e. 

$$\mathbf{X} = \mathbf{C}\,\mathbf{s}^{-1/2}\,\mathbf{C}^{\mathrm{T}} \tag{A1.1.125}$$

where $\mathbf{s}^{-1/2}$ is the diagonal matrix of reciprocal square roots of the eigenvalues, and $\mathbf{C}$ is the matrix of 
eigenvectors for the original $\mathbf{S}'$ matrix. Doing this, one finds that the transformed basis functions are 
orthonormal. In terms of implementation, elements of the Hamiltonian are usually first evaluated in the 
primed basis, and the resulting matrix representation of $\hat{H}$ is then transformed to the orthogonal basis ($\mathbf{H} = \mathbf{X}^{\mathrm{T}}\mathbf{H}'\mathbf{X}$). 
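
The orthonormalizing transformation and the subsequent diagonalization can be sketched with NumPy. The 3 x 3 overlap and Hamiltonian matrices below are hypothetical numbers invented for illustration, and the variable names are not taken from the text; the steps themselves follow the recipe just described, assuming real symmetric matrices.

```python
import numpy as np

# Hypothetical overlap and Hamiltonian matrices in a non-orthogonal (primed) basis.
S_prime = np.array([[1.0, 0.5, 0.2],
                    [0.5, 1.0, 0.5],
                    [0.2, 0.5, 1.0]])
H_prime = np.array([[-1.0, -0.6, -0.1],
                    [-0.6, -0.8, -0.6],
                    [-0.1, -0.6, -0.5]])

# Diagonalize S', take reciprocal square roots of the eigenvalues and
# transform back: X = C s^(-1/2) C^T.
s, C = np.linalg.eigh(S_prime)
X = C @ np.diag(s ** -0.5) @ C.T

# Transform the Hamiltonian to the orthonormal basis and diagonalize it.
H = X.T @ H_prime @ X
energies, coefficients = np.linalg.eigh(H)

# Express the eigenvectors in the original primed basis.
coefficients_prime = X @ coefficients

print("X^T S' X (should be the unit matrix):")
print(np.round(X.T @ S_prime @ X, 12))
print("variational energies:", energies)
```

The back-transformed eigenvectors satisfy the generalized eigenvalue problem in the primed basis, so the two routes (solving with the overlap matrix present, or orthonormalizing first) give the same energies.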

In an orthonormal basis, $S_{jk} = 1$ if $k = j$, and vanishes otherwise. The problem of finding the variational energy 
of the ground state then reduces to that of determining the smallest value of $\varepsilon$ that satisfies 

$$\det\left(\mathbf{H} - \varepsilon \mathbf{1}\right) = 0 \tag{A1.1.126}$$


a task that modern digital computers can perform very efficiently. Given an orthonormal basis, the variational 
problem can be solved by diagonalizing the matrix representation of the Hamiltonian, $\mathbf{H}$. Associated with 
each eigenvalue $\varepsilon$ is an eigenvector $(c_0, c_1, c_2, \ldots, c_N)$ that tells how the basis functions are combined in the 
corresponding approximate wavefunction $\phi$ as parametrized by equation (A1.1.116). That the lowest 
eigenvalue $\varepsilon_0$ of $\mathbf{H}$ provides an upper bound to the exact ground-state energy has already been proven; it is also 
true (but will not be proved here) that the first excited state of the actual system must lie below the next 
largest eigenvalue $\varepsilon_1$, and indeed all remaining eigenvalues provide upper bounds to the corresponding 
excited states. That is, 

$$\varepsilon_0 \ge \lambda_0, \quad \varepsilon_1 \ge \lambda_1, \quad \varepsilon_2 \ge \lambda_2, \quad \ldots, \quad \varepsilon_N \ge \lambda_N. \tag{A1.1.127}$$


The equivalence between variational energies and the exact eigenvalues of the Hamiltonian is achieved only 
in the case where the corresponding exact wavefunctions can be written as linear combinations of the basis 
functions. Suppose that the Schrödinger equation for the problem of interest cannot be solved, but another 
simpler problem that involves precisely the same set of coordinates lends itself to an analytic solution. In 
practice, this can often be achieved by ignoring certain interaction terms in the Hamiltonian, as discussed 
earlier. Since the eigenfunctions of the simplified Hamiltonian form a complete set, they provide a 
conceptually useful basis since all of the eigenfunctions of the intractable Hamiltonian can be written as linear 
combinations of them (for example, Slater determinants for electrons or products of normal mode 
wavefunctions for vibrational states). In this case, diagonalization of H in this basis of functions provides an 
exact solution to the Schrödinger equation. It is worth pausing for a moment to analyse what is meant by this 
rather remarkable statement. One simply needs to ignore interaction terms in the Hamiltonian that preclude an 
analytic determination of the stationary states and energies of the system. The corresponding Schrödinger 
equation can then be solved to provide a set of orthonormal basis functions, and the integrals that represent the 
matrix elements of $\mathbf{H}$ 

$$H_{ij} = \int \chi_i^{*} \hat{H} \chi_j \, d\tau \tag{A1.1.128}$$

computed. Diagonalization of the resulting matrix provides the sought-after solution to the quantum-mechanical problem. Although this process replaces an intractable differential equation by a problem in linear 
algebra, the latter offers its own insurmountable hurdle: the dimension of the matrix (equal to the number of 
rows or columns) is equal to the number of functions included in the complete set of solutions to the 
simplified Schrodinger equation. Regrettably, this number is usually infinite. At present, special algorithms 
can be used with modern computers to obtain eigenvalues of matrices with dimensions of about 100 million 
relatively routinely, but this still falls far short of infinity. Therefore, while it seems attractive (and much 
simpler) to do away with the differential equation in favour of a matrix diagonalization, it is not a magic bullet 
that makes exact quantum-mechanical calculations a possibility. 

In order to apply the linear variational principle, it is necessary to work with a matrix sufficiently small that it 
can be diagonalized by a computer; such calculations are said to employ a finite basis. Use of a finite basis 
means that the eigenvalues of H are not exact unless the basis chosen for the problem has the miraculous (and 
extraordinarily unlikely) property of being sufficiently flexible to allow one or more of the exact solutions to 
be written as linear combinations of its members. The choice of which functions to retain therefore matters. 
For example, if the intractable system Hamiltonian contains only 
small interaction terms that are ignored in the simplified Hamiltonian used to obtain the basis functions, then 
$\chi_0$ is probably a reasonably good approximation to the exact ground-state wavefunction $\psi_0$. At the very least, one 
can be relatively certain that it is closer to $\psi_0$ than are those that correspond to the thousandth, millionth and 
billionth excited states of the simplified system. Hence, if the objective of a variational calculation is to 
determine the ground-state energy of the system, it is important to include $\chi_0$ and other solutions to the 
simplified problem with relatively low lying energies, while $\chi_{1\,000\,000}$ and other high lying solutions can be 
excluded more safely. 

A1.1.4.2 EXAMPLE PROBLEM: THE DOUBLE-WELL OSCILLATOR 

To illustrate the use of the variational principle, results are presented here for calculations of the four lowest 
energy states (the ground state and the first three excited states) of a particle subject to the potential 

$$V(q) = 0.05q^4 - q^2$$    (A1.1.129)

which is shown in figure A1.1.3. The potential goes asymptotically to infinity (like that for the harmonic 
oscillator), but exhibits two symmetrical minima at $q = \pm\sqrt{10}$ and a maximum at the origin. This function is 
known as a double well, and provides a qualitative description of the potential governing a variety of 
quantum-mechanical processes, such as motion involved in the inversion mode of ammonia (where the 
minima play the role of the two equivalent pyramidal structures and the maximum that of planar NH$_3$). For 
simplicity, the potential is written in terms of the dimensionless coordinate q defined by 


$$q = \left(\frac{mk}{\hbar^2}\right)^{1/4} x$$    (A1.1.130)

where x is a Cartesian displacement and k is a constant with units of (mass)(time)$^{-2}$. The corresponding 
Schrödinger equation can be written as 

$$\left(-\frac{1}{2}\frac{d^2}{dq^2} + V(q)\right)\psi = E\psi$$    (A1.1.131)

where the energy is given as a multiple of $\hbar (k/m)^{1/2}$. This value corresponds to $h\nu$, where $\nu$ is the frequency 
of a quantum harmonic oscillator with force constant k. 


Figure A1.1.3. Potential function used in the variational calculations. Note that the energies of all states lie 
above the lowest point of the potential (V = -5), which occurs at $q = \pm\sqrt{10}$. 
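
The location and depth of the minima, and the harmonic potential that approximates each well, can be checked by elementary calculus. The short derivation below is a worked check added for illustration and uses only the potential of equation (A1.1.129):

```latex
\begin{align*}
V(q)   &= 0.05\,q^{4} - q^{2}, \\
V'(q)  &= 0.2\,q^{3} - 2q = 0
          \;\Rightarrow\; q = 0 \ \text{or}\ q = \pm\sqrt{10}, \\
V(\pm\sqrt{10}) &= 0.05\,(100) - 10 = -5, \qquad V(0) = 0, \\
V''(q) &= 0.6\,q^{2} - 2, \qquad V''(\pm\sqrt{10}) = 4 > 0, \qquad V''(0) = -2 < 0, \\
V(q)   &\approx -5 + 2\,(q \mp \sqrt{10})^{2} \quad \text{near } q = \pm\sqrt{10}.
\end{align*}
```

The second derivative of 4 at each minimum is what makes a harmonic oscillator with potential $2q^2$ (displaced to the minimum) the natural local approximation to each well.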

It is not possible to solve this equation analytically, and two different calculations based on the linear 
variational principle are used here to obtain the approximate energy levels for this system. In the first, 
eigenfunctions corresponding to the potential $V = 2q^2$ (this corresponds to the shape of the double-well 
potential in the vicinity of its minima) are used as a basis. It should be noted at the outset that these functions 
form a complete set, and it is therefore possible to write exact solutions to the double-well oscillator problem 
in terms of them. However, since we expect the ground-state wavefunction to have maximum amplitude in the 
regions around $q = \pm\sqrt{10}$, it is unlikely that the first few harmonic oscillator functions (which have maxima 
closer to the origin) are going to provide a good representation of the exact ground state. The four lowest 
eigenvalues of the Hamiltonian are given in the table below, where N indicates 
the size of the variational basis, which includes the N lowest energy harmonic oscillator functions centred at 
the origin. 


N     Ground state   1st excited   2nd excited   3rd excited

2        0.25937       0.79687
4       -0.46737      -0.35863       1.98999       3.24862
6       -1.41439      -1.05171       0.68935       1.43457
8       -2.22597      -1.85067       0.09707       0.39614
10      -2.89174      -2.58094      -0.35857      -0.33930
20      -4.02122      -4.01289      -2.16221      -2.12538
30      -4.02663      -4.02660      -2.20411      -2.20079
40      -4.02663      -4.02660      -2.20411      -2.20079
50      -4.02663      -4.02660      -2.20411      -2.20079


Note that the energies decrease with increasing size of the basis set, as expected from the variational principle. 
With 30 or more functions, the energies of the four states are well converged (to about one part in 100,000). In 
figure A1.1.4 the wavefunctions of the ground and first excited states of the system calculated with 40 basis 
functions are shown. As expected, the probability density is localized near the symmetrically disposed minima 
on the potential. The ground state has no nodes and the first excited state has a single node. The ground-state 
wavefunction calculated with only eight basis functions (shown in figure A1.1.5) is clearly imperfect. The 
rapid oscillations in the wavefunction are not real, but rather an artifact of the incomplete basis used in the 
calculation. A larger number of functions is required to reduce the amplitude of the oscillations. 
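
A calculation of this kind can be sketched in a few lines. The following is a minimal illustration, not the procedure used to produce the published numbers: it assumes NumPy, builds the matrix of H from harmonic-oscillator ladder operators rather than by explicit integration, and the function names `single_centre_hamiltonian` and `lowest_levels` are hypothetical.

```python
import numpy as np


def single_centre_hamiltonian(n_basis, omega=2.0):
    """Matrix of H = -1/2 d^2/dq^2 + 0.05 q^4 - q^2 in the n_basis lowest
    harmonic-oscillator functions of frequency omega centred at q = 0.

    omega = 2 corresponds to the oscillator potential V = 2 q^2."""
    m = n_basis + 4                       # padding keeps q^2 and q^4 exact after truncation
    n = np.arange(m)
    a = np.diag(np.sqrt(n[1:]), k=1)      # annihilation operator: a|n> = sqrt(n)|n-1>
    q = (a + a.T) / np.sqrt(2.0 * omega)  # coordinate operator in the oscillator basis
    q2 = q @ q
    q4 = q2 @ q2
    h_osc = np.diag(omega * (n + 0.5))    # -1/2 d^2/dq^2 + (omega^2 / 2) q^2
    # Kinetic energy = h_osc - (omega^2 / 2) q^2; then add the double-well potential.
    h = h_osc - 0.5 * omega**2 * q2 + 0.05 * q4 - q2
    return h[:n_basis, :n_basis]          # keep only the requested basis functions


def lowest_levels(n_basis, how_many=4):
    """Variational estimates: the lowest eigenvalues of the truncated matrix."""
    return np.linalg.eigvalsh(single_centre_hamiltonian(n_basis))[:how_many]


if __name__ == "__main__":
    for n_basis in (2, 4, 6, 8, 10, 20, 30):
        print(n_basis, np.round(lowest_levels(n_basis), 5))
```

For N = 2 the matrix is diagonal (the two functions have opposite parity), so the eigenvalues are simply the diagonal elements 0.259375 and 0.796875. Because each larger basis contains the smaller one, every eigenvalue can only stay the same or decrease as N grows.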



Figure A1.1.4. Wavefunctions for the four lowest states of the double-well oscillator. The ground-state 
wavefunction is at the bottom and the others are ordered from bottom to top in terms of increasing energy. 

Figure A1.1.5. Ground state wavefunction of the double-well oscillator, as obtained in a variational 
calculation using eight basis functions centred at the origin. Note the spurious oscillatory behaviour near the 
origin and the location of the peak maxima, both of which are well inside the potential minima. 

The form of the approximate wavefunctions suggests another choice of basis for this problem, namely one 
comprising some harmonic oscillator functions centred about one minimum and additional harmonic 
oscillator functions centred about the other minimum. The only minor difficulty in this calculation is that the 
basis set is not orthogonal (which should be clear simply by inspecting the overlap of the ground-state 
harmonic oscillator wavefunctions centred at the two points) and an orthonormalization based on equation 
(A1.1.123), equation (A1.1.124) and equation (A1.1.125) is necessary. Placing an equal number of $V = 2q^2$ 
harmonic oscillator functions at the position of each minimum (these correspond to solutions of the harmonic 
oscillator problems with $V = 2(q - \sqrt{10})^2$ and $V = 2(q + \sqrt{10})^2$, respectively) yields the eigenvalues given 
below for the four lowest states (in each case, there are N/2 functions centred at each point). 

N     Ground state   1st excited   2nd excited   3rd excited

2       -3.99062      -3.99062
4       -4.01787      -4.01787      -1.92588      -1.92588
6       -4.01851      -4.01851      -2.12522      -2.12521
8       -4.02523      -4.02523      -2.14247      -2.14245
10      -4.02632      -4.02632      -2.17690      -2.17680
20      -4.02663      -4.02660      -2.20290      -2.20064
30      -4.02663      -4.02660      -2.20411      -2.20079
40      -4.02663      -4.02660      -2.20411      -2.20079
50      -4.02663      -4.02660      -2.20411      -2.20079


These results may be compared to those obtained with the basis centred at q = 0. The rate of convergence is 
faster in the present case, which attests to the importance of a carefully chosen basis. It should be pointed out 
that there is a clear correspondence between the two approaches used here and the single-centre and LCAO 
expansions used in molecular orbital theory; the reader should appreciate the advantages of choosing an 
appropriately designed multicentre basis set in achieving rapid convergence in some calculations. Finally, in 
figure A1.1.6 the ground-state wavefunction calculated with a mixed basis of eight functions (four centred 
about each of the two minima) is displayed. Note that the oscillations seen in the single-centre basis calculation 
using the same number of functions are completely missing in the non-orthogonal basis calculation. 
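
The two-centre calculation can be sketched along the same lines. The version below is an illustration under stated assumptions rather than the method behind the published numbers: integrals are evaluated by simple quadrature on a grid, the non-orthogonal basis is handled by symmetric orthogonalization with $S^{-1/2}$ (assumed here as one convenient orthonormalization), and the names `ho_functions` and `two_centre_levels` are hypothetical. It is intended for small N, where the overlap matrix is well conditioned.

```python
import numpy as np

A = np.sqrt(10.0)   # the minima lie at q = +A and q = -A
OMEGA = 2.0         # frequency of the V = 2 q^2 oscillator


def ho_functions(u, n_max, omega=OMEGA):
    """Normalised harmonic-oscillator functions phi_0 .. phi_{n_max-1} at displacements u."""
    x = np.sqrt(omega) * u
    phi = np.zeros((n_max, u.size))
    phi[0] = (omega / np.pi) ** 0.25 * np.exp(-0.5 * x**2)
    if n_max > 1:
        phi[1] = np.sqrt(2.0) * x * phi[0]
    for n in range(1, n_max - 1):         # stable three-term recurrence
        phi[n + 1] = np.sqrt(2.0 / (n + 1)) * x * phi[n] - np.sqrt(n / (n + 1)) * phi[n - 1]
    return phi


def two_centre_levels(n_basis, how_many=4):
    """Lowest eigenvalues with n_basis/2 oscillator functions centred at each minimum."""
    half = n_basis // 2
    q = np.linspace(-14.0, 14.0, 5601)    # quadrature grid (integrands vanish at the ends)
    dq = q[1] - q[0]
    v = 0.05 * q**4 - q**2
    basis, h_on_basis = [], []
    for centre in (A, -A):
        u = q - centre
        phi = ho_functions(u, half)
        for n in range(half):
            basis.append(phi[n])
            # H phi_n = [omega (n + 1/2) - (omega^2 / 2) u^2 + V(q)] phi_n
            h_on_basis.append((OMEGA * (n + 0.5) - 0.5 * OMEGA**2 * u**2 + v) * phi[n])
    b = np.array(basis)
    hb = np.array(h_on_basis)
    s = (b @ b.T) * dq                    # overlap matrix
    h = (b @ hb.T) * dq                   # Hamiltonian matrix
    h = 0.5 * (h + h.T)                   # remove quadrature-level asymmetry
    w, vec = np.linalg.eigh(s)
    x = vec @ np.diag(w ** -0.5) @ vec.T  # S^(-1/2)
    return np.linalg.eigvalsh(x @ h @ x)[:how_many]


if __name__ == "__main__":
    for n_basis in (2, 4, 6, 8, 10):
        print(n_basis, np.round(two_centre_levels(n_basis), 5))
```

With one function per well (N = 2) the overlap between the two Gaussians is of order $e^{-20}$, so both eigenvalues are essentially the single-well expectation value -4 + 0.05(3/16) = -3.990625.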



Figure A1.1.6. Ground-state wavefunction of the double-well oscillator, as obtained in a variational 
calculation using four basis functions centred at $q = \sqrt{10}$ and four centred at $q = -\sqrt{10}$. Note the absence of a 
node at the origin. 

A1.1.4.3 PERTURBATION THEORY 

Calculations that employ the linear variational principle can be viewed as those that obtain the exact solution 
to an approximate problem. The problem is approximate because the basis necessarily chosen for practical 
calculations is not sufficiently flexible to describe the exact states of the quantum-mechanical system. 
Nevertheless, within this finite basis, the problem is indeed solved exactly: the variational principle provides a 
recipe to obtain the best possible solution in the space spanned by the basis functions. In this section, a 
somewhat different approach is taken for obtaining approximate solutions to the Schrödinger equation. 
Instead of obtaining exact eigenvalues of H in a finite basis, a strategy is developed for determining 
approximate eigenvalues of the exact matrix representation of H. It can also be used (and almost always is in 
practical calculations) to obtain approximate eigenvalues of approximate (incomplete basis) Hamiltonian 
matrices that are nevertheless much larger in dimension than those that can be diagonalized exactly. The 
standard textbook presentation of this technique, which is known as perturbation theory, generally uses the 
Schrödinger differential equation as the starting point. However, some of the generality and usefulness of the 
technique can be lost in the treatment. Students may not come away with an appreciation for the role of linear 
algebra in perturbation theory, nor do they usually grasp the distinction between matrix diagonalization in the 
linear variational principle (approximate problem, exact answer) and the use of perturbation theory (right, or 
at least less approximate, problem; approximate answer). 

In perturbation theory, the Hamiltonian is divided into two parts. One of these corresponds to a Schrödinger 
equation that can be solved exactly 

$$H_0 \chi_k^{(0)} = \lambda_k^{(0)} \chi_k^{(0)}$$    (A1.1.132)


while the remainder of the Hamiltonian is designated here as V. The orthonormal eigenfunctions $\chi_k^{(0)}$ of the 
unperturbed, or zeroth-order, Hamiltonian $H_0$ form a convenient basis for a matrix representation of the 
Hamiltonian H. Diagonalization of H gives the exact quantum-mechanical energy levels if the complete set of 
$\chi_k^{(0)}$ is used, and approximate solutions if the basis is truncated. Instead of focusing on the exact eigenvalues of 
H, however, the objective of perturbation theory is to approximate them. The starting point is the matrix 
representation of $H_0$ and V, which will be designated as h and v, 
respectively, where the matrix elements $h_{ij}$ and $v_{ij}$ are given by the integrals 

$$h_{ij} = \int \chi_i^{(0)*} H_0 \chi_j^{(0)} \, d\tau$$    (A1.1.135)

$$v_{ij} = \int \chi_i^{(0)*} V \chi_j^{(0)} \, d\tau.$$    (A1.1.136)

Note that h is simply the diagonal matrix of zeroth-order eigenvalues $\lambda_k^{(0)}$. In the following, it will be assumed 
that the zeroth-order eigenfunction $\chi_0^{(0)}$ is a reasonably good approximation to the exact ground-state 
wavefunction (meaning that $\lambda_0^{(0)} \approx \lambda_0$), and h and v will be written in the compact representations 

$$h = \begin{pmatrix} \lambda_0^{(0)} & 0^{T} \\ 0 & h_{qq} \end{pmatrix}$$    (A1.1.137)

$$v = \begin{pmatrix} v_{00} & v_{0q} \\ v_{q0} & v_{qq} \end{pmatrix}.$$    (A1.1.138)


It is important to realize that while the uppermost diagonal elements of these matrices are numbers, the other 
diagonal element is a matrix of dimension N. Specifically, these are the matrix representations of $H_0$ and V in 
the basis q, which consists of all $\chi_k^{(0)}$ in the original set apart from $\chi_0^{(0)}$, i.e. 

$$q = \{\chi_1^{(0)}, \chi_2^{(0)}, \ldots, \chi_N^{(0)}\}.$$    (A1.1.139)

The off-diagonal elements in this representation of h and v are the zero vector of length N (for h) and matrix 
elements which couple the zeroth-order ground-state eigenfunction $\chi_0^{(0)}$ to members of the set q (for v): 

$$v_{0q} = (v_{01}, v_{02}, \ldots, v_{0N}).$$    (A1.1.140)

The exact ground-state eigenvalue $\lambda_0$ and corresponding eigenvector 

$$c = \begin{pmatrix} c_0 \\ c_q \end{pmatrix}$$    (A1.1.141)

clearly satisfy the coupled equations (with H = h + v partitioned in the same way as h and v) 

$$H_{00} c_0 + H_{0q} c_q = \lambda_0 c_0$$    (A1.1.142)

$$H_{q0} c_0 + H_{qq} c_q = \lambda_0 c_q.$$    (A1.1.143)

The latter of these can be solved for $c_q$ 

$$c_q = [\lambda_0 1 - H_{qq}]^{-1} H_{q0} c_0$$    (A1.1.144)

(the N by N identity matrix is represented here and in the following by 1) and inserted into equation 
(A1.1.142) to yield the implicit equation 

$$\lambda_0 = H_{00} + H_{0q} [\lambda_0 1 - H_{qq}]^{-1} H_{q0}.$$    (A1.1.145)

Thus, one can solve for the eigenvalue iteratively, by guessing $\lambda_0$, evaluating the right-hand side of equation 
(A1.1.145), using the resulting value as the next guess and continuing in this manner until convergence is 
achieved. However, this is not a satisfactory method for solving the Schrödinger equation, because the 
problem of diagonalizing a matrix of dimension N + 1 is replaced by an iterative procedure in which a matrix 
of dimension N must be inverted for each successive improvement in the guessed eigenvalue. This is an even 
more computationally intensive problem than the straightforward diagonalization approach associated with 
the linear variational principle. 
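
The iterative procedure just described can be made concrete with a short sketch. It is an illustration only, assuming NumPy and a small real symmetric test matrix chosen for the purpose; the function name `iterate_implicit_eigenvalue` is hypothetical, and a linear solve is used in place of an explicit matrix inverse.

```python
import numpy as np


def iterate_implicit_eigenvalue(h_mat, guess=None, tol=1e-12, max_iter=200):
    """Fixed-point iteration on lambda = H_00 + H_0q [lambda 1 - H_qq]^(-1) H_q0.

    h_mat is a real symmetric matrix whose first row and column refer to the
    zeroth-order ground state. Returns (eigenvalue, number of iterations)."""
    h00, h0q, hqq = h_mat[0, 0], h_mat[0, 1:], h_mat[1:, 1:]
    identity = np.eye(hqq.shape[0])
    lam = h00 if guess is None else guess
    for iteration in range(1, max_iter + 1):
        # Solve (lam 1 - H_qq) x = H_q0; one N-dimensional linear solve per cycle.
        x = np.linalg.solve(lam * identity - hqq, h0q)
        new = h00 + h0q @ x
        if abs(new - lam) < tol:
            return new, iteration
        lam = new
    raise RuntimeError("iteration did not converge")


if __name__ == "__main__":
    # Test matrix (an assumption for illustration): well-separated diagonal, weak coupling.
    h_mat = np.diag([0.0, 1.0, 2.0]) + 0.05 * np.ones((3, 3))
    lam, n_iter = iterate_implicit_eigenvalue(h_mat)
    print("iterative:", lam, "after", n_iter, "iterations")
    print("direct   :", np.linalg.eigvalsh(h_mat)[0])
```

Each cycle costs a linear solve of dimension N, which is why the text regards this route as no cheaper than direct diagonalization.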

Nevertheless, equation (A1.1.145) forms the basis for the approximate diagonalization procedure provided by 
perturbation theory. To proceed, the exact ground-state eigenvalue and corresponding eigenvector are written 
as the sums 

$$c = c^{(0)} + c^{(1)} + c^{(2)} + \cdots$$    (A1.1.146)

and 

$$\lambda_0 = \lambda_0^{(0)} + \lambda_0^{(1)} + \lambda_0^{(2)} + \cdots$$    (A1.1.147)

where $c^{(k)}$ and $\lambda_0^{(k)}$ are said to be kth-order contributions in the perturbation expansion. What is meant here by 
order? Ultimately, the various contributions to c and $\lambda_0$ will be written as matrix products involving the 
unperturbed Hamiltonian matrix h and the matrix representation of the perturbation v. The order of a 
particular contribution is defined by the number of times v appears in the corresponding matrix product. 
Roughly speaking, if $\lambda_0^{(0)} 1 - h_{qq}$ is of order unity, and the matrix elements of v are an order of magnitude or 
two smaller, then the third-order energy contribution should be in the range $10^{-3}$ to $10^{-6}$. Therefore, one expects 
the low order contributions to be most important and the expansions given by equation (A1.1.146) and 
equation (A1.1.147) to converge rapidly, provided the zeroth-order description of the quantum-mechanical 
system is reasonably accurate. 

To derive equations for the order-by-order contributions to the eigenvalue $\lambda_0$, the implicit equation for the 
eigenvalue is first rewritten as 

$$\begin{aligned}
\lambda_0^{(0)} + \Delta\lambda &= \lambda_0^{(0)} + v_{00} + v_{0q} [(\lambda_0^{(0)} + \Delta\lambda) 1 - h_{qq} - v_{qq}]^{-1} v_{q0} \\
&= \lambda_0^{(0)} + v_{00} + v_{0q} [1 - (\lambda_0^{(0)} 1 - h_{qq})^{-1} (v_{qq} - \Delta\lambda 1)]^{-1} (\lambda_0^{(0)} 1 - h_{qq})^{-1} v_{q0}
\end{aligned}$$    (A1.1.148)

where $\Delta\lambda$ is a shorthand notation for the error in the zeroth-order eigenvalue $\lambda_0^{(0)}$ 

$$\Delta\lambda = \lambda_0 - \lambda_0^{(0)} = \lambda_0^{(1)} + \lambda_0^{(2)} + \lambda_0^{(3)} + \cdots .$$    (A1.1.149)

There are two matrix inverses that appear on the right-hand side of these equations. One of these is trivial; the 
matrix $\lambda_0^{(0)} 1 - h_{qq}$ is diagonal. The other inverse 

$$[1 - (\lambda_0^{(0)} 1 - h_{qq})^{-1} (v_{qq} - \Delta\lambda 1)]^{-1}$$    (A1.1.150)

is more involved because the matrix $v_{qq}$ is not diagonal, and direct inversion is therefore problematic. 
However, if the zeroth-order ground-state energy is well separated from low lying excited states, the diagonal 
matrix hereafter designated as $R_q$ 

$$R_q = (\lambda_0^{(0)} 1 - h_{qq})^{-1}$$    (A1.1.151)

that acts in equation (A1.1.150) to scale 

$$(v_{qq} - \Delta\lambda 1)$$    (A1.1.152)

will consist of only small elements. Thus, the matrix to be inverted can be considered as 

$$1 - X$$    (A1.1.153)

where X is, in the sense of matrices, small with respect to 1. It can be shown that the inverse of the matrix 1 - X 
can be written as a series expansion 

$$(1 - X)^{-1} = 1 + X + XX + XXX + XXXX + \cdots$$    (A1.1.154)

that converges if all eigenvalues of X lie within the unit circle in the complex plane (complex numbers $a + bi$ 
such that $a^2 + b^2 < 1$). Applications of perturbation theory in quantum mechanics are predicated on the 
assumption that the series converges for the inverse given by equation (A1.1.150), but efforts are rarely made 
to verify that this is indeed the case. Use of the series representation of the inverse in equation (A1.1.148) 
gives the unwieldy formal equality 

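$$\begin{aligned}
\lambda_0^{(0)} + \Delta\lambda = {} & \lambda_0^{(0)} + v_{00} + v_{0q} R_q v_{q0} + v_{0q} R_q (v_{qq} - \Delta\lambda 1) R_q v_{q0} \\
& + v_{0q} R_q (v_{qq} - \Delta\lambda 1) R_q (v_{qq} - \Delta\lambda 1) R_q v_{q0} + \cdots
\end{aligned}$$    (A1.1.155)

from which the error in the zeroth-order energy $\Delta\lambda$ is easily seen to be 

$$\begin{aligned}
\lambda_0^{(1)} + \lambda_0^{(2)} + \lambda_0^{(3)} + \cdots = {} & v_{00} + v_{0q} R_q v_{q0} + v_{0q} R_q (v_{qq} - \Delta\lambda 1) R_q v_{q0} \\
& + v_{0q} R_q (v_{qq} - \Delta\lambda 1) R_q (v_{qq} - \Delta\lambda 1) R_q v_{q0} + \cdots .
\end{aligned}$$    (A1.1.156)

Each term on the right-hand side of the equation involves matrix products that contain v a specific number of 
times, either explicitly or implicitly (for the terms that involve $\Delta\lambda$). Recognizing that $R_q$ is a zeroth-order 
quantity, it is straightforward to make the associations 

$$\lambda_0^{(1)} = v_{00}$$    (A1.1.157)

$$\lambda_0^{(2)} = v_{0q} R_q v_{q0}$$    (A1.1.158)

$$\lambda_0^{(3)} = v_{0q} R_q v_{qq} R_q v_{q0} - \lambda_0^{(1)} v_{0q} R_q R_q v_{q0}$$    (A1.1.159)

$$\begin{aligned}
\lambda_0^{(4)} = {} & v_{0q} R_q v_{qq} R_q v_{qq} R_q v_{q0} - \lambda_0^{(2)} v_{0q} R_q R_q v_{q0} \\
& - 2\lambda_0^{(1)} v_{0q} R_q R_q v_{qq} R_q v_{q0} + (\lambda_0^{(1)})^2 v_{0q} R_q R_q R_q v_{q0}
\end{aligned}$$    (A1.1.160)

which provide recipes for calculating corrections to the energy to fourth order. Similar analysis of equation 
(A1.1.146) provides successively ordered corrections to the zeroth-order eigenvector ($c_0^{(0)} = 1$, $c_q^{(0)} = 0$), 
specifically 

$$c_q^{(1)} = R_q v_{q0}$$    (A1.1.161)

$$c_q^{(2)} = R_q v_{qq} R_q v_{q0} - \lambda_0^{(1)} R_q R_q v_{q0}.$$    (A1.1.162)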
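
The energy formulas through fourth order translate directly into matrix arithmetic. The sketch below is an illustration only: it assumes NumPy, a real symmetric perturbation matrix (so that the row vector coupling state 0 to the set q is the transpose of the corresponding column vector), and a small test problem invented for the purpose; the function names are hypothetical. It also evaluates the largest eigenvalue magnitude of X, the quantity on which convergence of the series expansion depends.

```python
import numpy as np


def perturbation_energies(h_diag, v):
    """Energy corrections of orders 1 to 4 for state 0.

    h_diag : 1-D array of zeroth-order eigenvalues (the diagonal of h)
    v      : real symmetric matrix of the perturbation in the same basis"""
    lam0 = h_diag[0]
    v00, v0q, vqq = v[0, 0], v[0, 1:], v[1:, 1:]
    r = np.diag(1.0 / (lam0 - h_diag[1:]))   # R_q = (lam0 1 - h_qq)^(-1), diagonal
    rv = r @ v0q                             # R_q v_q0
    e1 = v00
    e2 = v0q @ rv
    e3 = rv @ vqq @ rv - e1 * (rv @ rv)
    e4 = (rv @ vqq @ r @ vqq @ rv
          - e2 * (rv @ rv)
          - 2.0 * e1 * (rv @ r @ vqq @ rv)
          + e1**2 * (rv @ r @ rv))
    return np.array([e1, e2, e3, e4])


def series_spectral_radius(h_diag, v, delta_lambda):
    """Largest |eigenvalue| of X = R_q (v_qq - delta_lambda 1); must be below 1."""
    r = np.diag(1.0 / (h_diag[0] - h_diag[1:]))
    x = r @ (v[1:, 1:] - delta_lambda * np.eye(len(h_diag) - 1))
    return np.max(np.abs(np.linalg.eigvals(x)))


if __name__ == "__main__":
    h_diag = np.array([0.0, 1.0, 2.0])       # test problem chosen for illustration
    v = 0.05 * np.ones((3, 3))
    corrections = perturbation_energies(h_diag, v)
    exact = np.linalg.eigvalsh(np.diag(h_diag) + v)[0]
    for order in range(1, 5):
        print(order, h_diag[0] + corrections[:order].sum())
    print("exact", exact)
    print("spectral radius of X:", series_spectral_radius(h_diag, v, exact - h_diag[0]))
```

For this test problem the first three corrections are 0.05, -0.00375 and 0.000125, each roughly an order of magnitude smaller than the last, which is the behaviour anticipated when the perturbation is small compared with the zeroth-order level spacing.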


At this point, it is appropriate to make some general comments about perturbation theory that relate to its use 
in qualitative aspects of chemical physics. Very often, our understanding of complex systems is based on 
some specific zeroth-order approximation that is then modified to allow for the effect of a perturbation. For 
example, chemical bonding is usually presented as a weak interaction between atoms in which the atomic 
orbitals interact to form bonds. Hence, the free atoms represent the zeroth-order picture, and the perturbation 
is the decrease in internuclear distance that accompanies bond formation. Many rationalizations for bonding 
trends traditionally taught in descriptive chemistry are ultimately rooted in perturbation theory. As a specific 
illustration, the decreasing bond strength of carbon-halogen bonds in the sequence C-F > C-Cl > C-Br > C-I 
(a similar trend is found in the sequence CO, CS, CSe, CTe) can be attributed to a 'mismatch' of the np 
halogen orbitals with the 2p orbitals of carbon for larger values of n. From the point of view of perturbation theory, 
it is easily understood that the interaction between the bonding electrons is maximized when the 
corresponding energy levels are close (small denominators, large values of $R_q$) while large energy 
mismatches (such as that between the valence orbitals of iodine and carbon) allow for less interaction and 
correspondingly weaker bonds. 

For qualitative insight based on perturbation theory, the two lowest order energy corrections and the first-order 
wavefunction corrections are undoubtedly the most useful. The first-order energy corresponds to 
averaging the effects of the perturbation over the approximate wavefunction $\chi_0^{(0)}$, and can usually be evaluated 
without difficulty. The sum of $\lambda_0^{(0)}$ and $v_{00}$ is precisely equal to the expectation value of the Hamiltonian over 
the zeroth-order description $\chi_0^{(0)}$, and is therefore the proper energy to associate with a simplified model. (It 
should be pointed out that it is this energy and not the zeroth-order energy obtained by summing up orbital 
eigenvalues that is used as the basis for orbital optimization in Hartree-Fock theory. It is often stated that the 
first-order correction to the Hartree-Fock energy vanishes, but this is 
misleading; the first-order energy is defined instead to be part of the Hartree-Fock energy.) The second-order 
correction allows for interaction between the zeroth-order wavefunction and all others, weighted by the 
reciprocal of the corresponding energy differences and the magnitude of the matrix elements $v_{0k}$. The same 
interactions between $\chi_0^{(0)}$ and the $\chi_k^{(0)}$ determine the extent to which the latter are mixed in to the first-order 
perturbed wavefunction described by $c_q^{(1)}$. This is essentially the idea invoked in the theory of orbital 
hybridization. In the presence of four identical ligands approaching a carbon atom tetrahedrally, its valence s 
and p orbitals are mixed (through the corresponding $v_{jk}$ elements, which vanish at infinite separation) and 
their first-order correction in the presence of the perturbation (the ligands) can be written as four equivalent 
linear combinations between the s and three p zeroth-order orbitals. 

Some similarities and differences between perturbation theory and the linear variational principle need to be 
emphasized. First, neither approach can be used in practice to obtain exact solutions to the Schrödinger 
equation for intractable Hamiltonians. In either case, an infinite basis is required; neither the sums given by 
perturbation theory nor the matrix diagonalization of a variational calculation can be carried out. Hence, the 
strengths and weaknesses of the two approaches should be analysed from the point of view that the basis is 
necessarily truncated. Within this constraint, diagonalization of H represents the best solution that is possible 
in the space spanned by the basis set. In variational calculations, rather severe truncation of H is usually 
required, with the effect that its eigenvalues might be poor approximations to the exact values. The problem, 
of course, is that the basis is not sufficiently flexible to accurately represent the true quantum-mechanical 
wavefunction. In perturbation theory, one can include significantly more functions in the calculation. It turns 
out that the results of a low order perturbation calculation are often superior to a practical variational 
treatment of the same problem. Unlike variational methods, perturbation theory does not provide an upper 
bound to the energy (apart from a first-order treatment) and is not even guaranteed to converge. However, in 
chemistry, it is virtually always energy differences, and not absolute energies, that are of interest, and 
differences of energies obtained variationally are not themselves upper (or lower) bounds to the exact values. 
For example, suppose a spectroscopic transition energy between the states $\psi_f$ and $\psi_i$ is calculated from the 
difference $\lambda_f - \lambda_i$ obtained by diagonalizing H in a truncated basis. There is no way of knowing whether this 
value is above or below the exact answer, a situation no different from that associated with taking the 
difference between two approximate eigenvalues obtained from two separate calculations based on 
perturbation theory. 

In the quantum mechanics of atoms and molecules, both perturbation theory and the variational principle are 
widely used. For some problems, one of the two classes of approach is clearly best suited to the task, and is 
thus an established choice. However, in many others, the situation is less clear cut, and calculations can be 
done with either of the methods or a combination of both. 


A1.2 Internal molecular motions 


Michael E Kellman 


A1.2.1 INTRODUCTION 

Ideas on internal molecular motions go back to the very beginnings of chemistry as a natural science, to the 
days of Robert Boyle and Isaac Newton [1]. Much of Boyle's interest in chemistry, apart from the 
'bewitchment' he found in performing chemical experiments [2], arose from his desire to revive and 
transform the corpuscular philosophy favoured by some of the ancient Greeks, such as Epicurus [3]. This had 
lain dormant for centuries, overshadowed by the apparently better-founded Aristotelian cosmology [4], 
including the theory of the four elements. With the revolution in celestial mechanics that was taking place in 
modern Europe in the 17th century, Boyle was concerned to persuade natural philosophers that chemistry, 
then barely emerging from alchemy, was potentially of great value for investigating the corpuscular view, 
which was re-emerging as a result of the efforts of thinkers such as Francis Bacon and Descartes. This belief 
of Boyle's was based partly on the notion that the qualitative properties of real substances and their chemical 
changes could be explained by the joining together of elementary corpuscles, and the 'local motions' within 
these aggregates — what we now call the internal motions of molecules. Boyle influenced his younger 
colleague in the Royal Society, Isaac Newton. Despite immense efforts in chemical experimentation, Newton 
wrote only one paper in chemistry, in which he conjectured the existence of short-range forces in what we 
now recognize as molecules. Thus, in a true sense, with Boyle and Newton was born the science of chemical 
physics [1]. 

This was a child whose development was long delayed, however. Not until the time of John Dalton in the 
early 19th century, after the long interlude in which the phlogiston theory triumphed and then was overthrown 
in the chemistry of Lavoisier, did the nascent corpuscular view of Boyle and Newton really begin to grow into 
a useful atomic and molecular theory [1, 5]. It became apparent that it was necessary to think of the compound 
states of the elements of Lavoisier in terms of definite molecular formulae, to account for the facts that were 
becoming known about the physical properties of gases and the reactions of the elements, their joining into 
compounds and their separation again into elements. 

However, it was still a long time even after Dalton before anything definite could be known about the internal 
motions in molecules. The reason was that the microscopic nature of atoms and molecules was a bar to any 
knowledge of their internal constituents. Furthermore, nothing at all was known about the physical laws that 
applied at the microscopic level. The first hints came in the late 19th century, with the classical Maxwell- 
Lorentz theory of the dynamics of charged particles interacting through the electromagnetic field. The 
electron was discovered by Thomson, and a little later the nuclear structure of the atom by Rutherford. This 
set the stage in the 20th century for a physical understanding in terms of quantum theory of the constituents of 
molecules, and the motions of which they partake. 

This section will concentrate on the motions of atoms within molecules ('internal molecular motions') as 
comprehended by the revolutionary quantum ideas of the 20th century. Necessarily, limitations of space 
prevent many topics from being treated in the detail they deserve. Some of these are treated in more detail in 
other articles in this Encyclopedia, or in references in the Bibliography. The emphasis is on treating certain 
key topics in sufficient depth to build a foundation for further exploration by the reader, and for branching off 
into related topics that cannot be treated 
in depth at all. There will not be much focus on molecules undergoing chemical reactions, except for 
unimolecular rearrangements, which are a rather extreme example of internal molecular motion. However, it 
must be emphasized that the distinctions between the internal motions of molecules, the motions of atoms in a 
molecule which is undergoing dissociation and the motion of atoms in two or more molecules undergoing 
reaction are somewhat artificial. Even the motions which are most properly called 'internal' play a central role 
in theories of reaction dynamics. In fact, their character in chemical reactions is one of the most important 
unsolved mysteries in molecular motion. Although we will not have anything directly to say about general 
theories of reaction [6], the internal motion of molecules undergoing isomerization and the importance of the 
internal motions in efforts to control reactions with sophisticated laser sources will be two of the topics 
considered. 

A key theme of contemporary chemical physics and physical chemistry is 'ultrafast' molecular processes [7, 8 
and 9], including both reaction dynamics and internal molecular motions that do not involve reaction. The 
probing of ultrafast processes generally is thought of in terms of very short laser pulses, through the window 
of the time domain. However, most of the emphasis of this section is on probing molecules through the 
complementary window of the frequency domain, which usually is thought of as the realm of the time- 
independent processes, which is to say, the 'ultraslow'. One of the key themes of this section is that encrypted 
within the totality of the information which can be gathered on a molecule in the frequency domain is a vast 
store of information on ultrafast internal motions. The decoding of this information by new theoretical 
techniques for analysis of experimental spectra is a leading theme of recent work. 


A1.2.2 QUANTUM THEORY OF ATOMIC AND MOLECULAR STRUCTURE AND MOTION 

The understanding of molecular motions is necessarily based on quantum mechanics, the theory of 
microscopic physical behaviour worked out in the first quarter of the 20th century. This is because molecules 
are microscopic systems in which it is impossible — or at least very dangerous! — to ignore the dual wave- 
particle nature of matter first recognized in quantum theory by Einstein (in the case of classical waves) and de 
Broglie (in the case of classical particles). 

The understanding of the quantum mechanics of atoms was pioneered by Bohr, in his theory of the hydrogen 
atom. This combined the classical ideas on planetary motion — applicable to the atom because of the formal 
similarity of the gravitational potential to the Coulomb potential between an electron and nucleus — with the 
quantum ideas that had recently been introduced by Planck and Einstein. This led eventually to the formal 
theory of quantum mechanics, first discovered by Heisenberg, and most conveniently expressed by 
Schrödinger in the wave equation that bears his name. 

However, the hydrogen atom is a relatively simple quantum mechanical system, because it contains only 
two constituents, the electron and the nucleus. This situation is the quantum mechanical analogue of a single 
planet orbiting a sun. It might be thought that an atom with more than one electron is much like a solar system 
with more than one planet, in which the motion of each of the planets is more or less independent and regular. 
However, this is not the case, because the interaction between the electrons is relatively much 
stronger than the attraction between the planets in our solar system. The problem of the internal dynamics of 
atoms (the internal motion when there is more than one electron) is still very far from a complete 
understanding. The electrons are not really independent, nor would their motion, if it were described by 
classical rather than quantum mechanics, be regular, unlike the annual orbits of the 
planets. Instead, in general, it would be chaotic. The corresponding complexity of the quantum mechanical 
atom with more than one electron, or even one electron in a field, is to this day a challenge [10, 11 and 12]. 
(In fact, even in the solar system, despite the relative strengths of planetary attraction, there are constituents, 
the asteroids, with very irregular, chaotic behaviour. The issue of chaotic motion in molecules is one that 
will appear later with great salience.) 

As we shall see, in molecules as well as atoms, the interplay between the quantum description of the internal 
motions and the corresponding classical analogue is a constant theme. However, when referring to the internal 
motions of molecules, we will be speaking, loosely, of the motion of the atoms in the molecule, rather than of 
the fundamental constituents, the electrons and nuclei. This is a fundamental point, to which we
now turn.


A 1.2.3 THE MOLECULAR POTENTIAL ENERGY SURFACE 

One of the most salient facts about the structure of molecules is that the electrons are far lighter than the 
nuclei, by three orders of magnitude and more. This is extremely fortunate for our ability to attain a rational 
understanding of the internal motion of the electrons and nuclei. In fact, without this it might well be that not 
much progress would have been made at all! Soon after the discovery of quantum mechanics it was realized 
that the vast difference in the mass scales of the electrons and nuclei means that it is possible, in the main, to 
separate the problem into two parts, an electronic and a nuclear part. This is known as the Born-Oppenheimer 
separability or approximation [13]. The underlying physical idea is that the electrons move much faster than 
the nuclei, so they adjust rapidly to the relatively much slower nuclear motion. Therefore, the electrons are 
described by a quantum mechanical 'cloud' obtained by solving the Schrödinger wave equation. The nuclei
then move slowly within this cloud, which in turn adjusts rapidly as the nuclei move. 

The result is that, to a very good approximation, as treated elsewhere in this Encyclopedia, the nuclei move in
a mechanical potential created by the much more rapid motion of the electrons. Since the electronic and nuclear
motion are approximately separable, the electron cloud can be described mathematically by the quantum
mechanical theory of electronic structure, in a framework where the nuclei are fixed. The resulting
Born-Oppenheimer potential energy surface (PES) created by the electrons is the mechanical potential in which
the nuclei move.
When we speak of the internal motion of molecules, we therefore mean essentially the motion of the nuclei, 
which contain most of the mass, on the molecular potential energy surface, with the electron cloud rapidly 
adjusting to the relatively slow nuclear motion. 

We will now treat the internal motion on the PES in cases of progressively increasing molecular complexity. 
We start with the simplest case of all, the diatomic molecule, where the notions of the Born-Oppenheimer 
PES and internal motion are particularly simple. 

The potential energy surface for a diatomic molecule can be represented as in figure A1.2.1. The x-axis gives
the internuclear separation R and the y-axis the potential function V(R). At a given value of R, the potential
V(R) is determined by solving the quantum mechanical electronic structure problem in a framework with the
nuclei fixed at the given value of R. (To reiterate the discussion above, it is only possible to regard the nuclei
as fixed in this calculation because of the Born-Oppenheimer separability, and it is important to keep in mind
that this is only an approximation. There can be subtle but important non-adiabatic effects [14, 15], due to the
non-exactness of the separability of the nuclei and electrons. These are treated elsewhere in this
Encyclopedia.) The potential function V(R) is determined by repeatedly solving the quantum mechanical
electronic problem at different values of R. Physically, the variation of V(R) is due to the fact that the
electronic cloud adjusts to different values of the internuclear separation R in a subtle interplay of mutual
particle attractions and repulsions: electron-electron repulsions, nuclear-nuclear repulsions and
electron-nuclear attractions.


Figure A1.2.1. Potential V(R) of a diatomic molecule as a function of the internuclear separation R. The
equilibrium distance R_0 is at the potential minimum.

The potential function in figure A1.2.1 has several crucial characteristics. It has a minimum at a certain value
R_0 of the internuclear separation. This is the equilibrium internuclear distance. Near R_0, the function V(R)
rises as R increases or decreases. This means that there is a mechanical restoring force tending to return the
nuclei to R_0. At large values of R, V(R) flattens out and asymptotically approaches a value which in figure
A1.2.1 is arbitrarily chosen to be zero. This means that the molecule dissociates into separated atoms at large
R. The difference between the equilibrium potential V(R_0) and the asymptotic energy is the dissociation, or
binding, energy. At values of R less than R_0, the potential V(R) again rises, but now without limit. This
represents the repulsion between nuclei as the molecule is compressed.
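
As an illustration of these features, the short Python sketch below evaluates a Morse function. This is a standard model form that is not taken from this section and is used here only as an assumption; the parameters D_e, a and R_0 are arbitrary, dimensionless values. The sketch checks that the model has its minimum at R_0, approaches zero at large R and becomes repulsive at small R. (Unlike a real potential, the Morse form stays finite as R approaches zero.)

```python
import math

def morse(R, D_e=1.0, a=1.5, R_0=1.0):
    """Morse model potential, shifted so that V -> 0 as R -> infinity."""
    x = 1.0 - math.exp(-a * (R - R_0))
    return D_e * (x * x - 1.0)

# Sample the curve on a grid and locate its minimum numerically.
grid = [0.4 + 0.001 * i for i in range(5000)]
values = [morse(R) for R in grid]
R_min = grid[values.index(min(values))]

print(f"minimum near R = {R_min:.3f}, V = {min(values):.3f}")
print(f"V at large R   = {morse(20.0):.6f}")
print(f"V at small R   = {morse(0.4):.3f}")
```

In this model the depth of the well below the asymptote, D_e, plays the role of the dissociation energy.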

Classically, the nuclei vibrate in the potential V(R), much like two steel balls connected by a spring which is 
stretched or compressed and then allowed to vibrate freely. This vibration along the nuclear coordinate R is 
our first example of internal molecular motion. Most of the rest of this section is concerned with different 
aspects of molecular vibrations in increasingly complicated situations. 

Near the bottom of the potential well, V(R) can be approximated by a parabola, so the function V(R) is
approximated as

$$V(R) = \tfrac{1}{2} k R^{2} \tag{A1.2.1}$$

where R is now measured as the displacement from R_0 and k is the force constant. This is the form of the
potential for a harmonic oscillator, so near the bottom of the well, the nuclei undergo nearly harmonic
vibrations. For a harmonic oscillator with potential as in (A1.2.1), the classical frequency of oscillation is
independent of energy and is given by [16, 17 and 18]

$$\omega_0 = 2\pi\nu_0 = \sqrt{k/\mu} \tag{A1.2.2}$$

where μ is the reduced mass. Quantum mechanically, the oscillator has a series of discrete energy levels,
characterized by the number of quanta n in the oscillator. This is the quantum mechanical analogue for the
oscillator of the quantized energy levels of the electron in a hydrogen atom. The energy levels of the harmonic
oscillator are given by

$$E_n = \omega_0\left(n + \tfrac{1}{2}\right) \tag{A1.2.3}$$

where ħ, i.e. Planck's constant h divided by 2π, has been omitted as a factor on the right-hand side, as is
appropriate when the customary wavenumber (cm^-1) units are used [18].
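
The harmonic formulas can be checked numerically. The following sketch assumes the convention V = (1/2)kR^2, with k the force constant, and uses arbitrary dimensionless values of k and μ chosen only for illustration. It evaluates the frequency of (A1.2.2) and the levels of (A1.2.3), and shows that neighbouring levels are equally spaced by ω_0.

```python
import math

def harmonic_frequency(k, mu):
    """Classical angular frequency omega_0 = sqrt(k / mu) for V = (1/2) k R^2."""
    return math.sqrt(k / mu)

def harmonic_level(n, omega_0):
    """Energy of the level with n quanta, with the factor hbar omitted."""
    return omega_0 * (n + 0.5)

omega_0 = harmonic_frequency(k=4.0, mu=1.0)  # illustrative, dimensionless values
levels = [harmonic_level(n, omega_0) for n in range(5)]
spacings = [upper - lower for lower, upper in zip(levels, levels[1:])]

print("omega_0  =", omega_0)
print("levels   =", levels)
print("spacings =", spacings)
```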


A 1.2.4 ANHARMONICITY

If the potential were exactly harmonic for all values of R, the vibrational motion would be extremely simple,
consisting of vibrations with frequency ω_0 for any given amount of vibrational energy. That this is a
drastic oversimplification for a real molecule can be seen from the fact that such a molecule would never
dissociate, lacking the flatness in the potential at large R that we saw in figure A1.2.1. As the internuclear
separation departs from the bottom of the well at R_0, the harmonic approximation (A1.2.1) progressively
becomes less accurate as a description of the potential. This is known as anharmonicity or nonlinearity.
Anharmonicity introduces complications into the description of the vibrational motion. The frequency is no
longer given by the simple harmonic formula (A1.2.2). Instead, it varies with the amount of energy in the
oscillator. This variation of frequency with the number of quanta is the essence of the nonlinearity.

The variation of the frequency can be approximated by a series in the number of quanta, so the energy levels
are given by

$$E_n = \omega_0\left(n + \tfrac{1}{2}\right) + \gamma_1\left(n + \tfrac{1}{2}\right)^{2} + \gamma_2\left(n + \tfrac{1}{2}\right)^{3} + \cdots \tag{A1.2.4}$$

Often, it is a fair approximation to truncate the series at the quadratic term with γ_1. The energy levels are then
approximated as

$$E_n = \omega_0\left(n + \tfrac{1}{2}\right) + \gamma_1\left(n + \tfrac{1}{2}\right)^{2} \tag{A1.2.5}$$

The first term is known as the harmonic contribution and the second term as the quadratic anharmonic
correction.
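
To see how the quadratic term changes the level pattern, the sketch below evaluates the truncated series (A 1.2.5) for illustrative values of the harmonic frequency and the quadratic coefficient (assumed values in cm^-1, not fitted to any molecule). A negative quadratic coefficient is assumed, as is typical for a bond stretch whose potential softens towards dissociation, so the spacing between neighbouring levels shrinks as n grows.

```python
def anharmonic_level(n, omega_0, gamma_1):
    """Harmonic term plus quadratic anharmonic correction for n quanta."""
    x = n + 0.5
    return omega_0 * x + gamma_1 * x * x

omega_0, gamma_1 = 2000.0, -20.0  # illustrative values in cm^-1 (assumptions)
levels = [anharmonic_level(n, omega_0, gamma_1) for n in range(6)]
spacings = [upper - lower for lower, upper in zip(levels, levels[1:])]

for n, gap in enumerate(spacings):
    print(f"E({n + 1}) - E({n}) = {gap:.1f} cm^-1")
```

Each spacing equals the harmonic frequency plus the quadratic coefficient multiplied by 2(n + 1), so with these values successive spacings fall by 40 cm^-1.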


Even with these complications due to anharmonicity, the vibrating diatomic molecule is a relatively simple 
mechanical system. In polyatomics, the problem is fundamentally more complicated with the presence of 
more than two atoms. The anharmonicity leads to many extremely interesting effects in the internal molecular 
motion, including the possibility of chaotic dynamics. 


It must be pointed out that another type of internal motion is the overall rotation of the molecule. The
vibration and rotation of the molecule are shown schematically in figure A1.2.2.

Figure A1.2.2. Internal nuclear motions of a diatomic molecule. Top: the molecule in its equilibrium
configuration. Middle: vibration of the molecule. Bottom: rotation of the molecule.


A 1.2.5 POLYATOMIC MOLECULES 

In polyatomic molecules there are many more degrees of freedom, or independent ways in which the atoms of
the molecule can move. With n atoms, there are a total of 3n degrees of freedom. Three of these are the
motion of the centre of mass, leaving (3n-3) internal degrees of freedom [18]. Of these, except in linear
polyatomics, three are rotational degrees of freedom, leaving (3n-6) vibrational degrees of freedom. (In linear
molecules, there are only two rotational degrees of freedom, corresponding to the two orthogonal axes of
rotation perpendicular to the molecular axis, leaving (3n-5) vibrational degrees of freedom. For example, the
diatomic has only one vibrational degree of freedom, the vibration along the coordinate R which we
encountered above.)
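
The counting argument can be written as a small function. This is a minimal sketch; the function name and the `linear` flag are illustrative choices.

```python
def vibrational_degrees_of_freedom(n_atoms, linear=False):
    """Number of vibrational degrees of freedom of a molecule with n_atoms atoms."""
    if n_atoms < 2:
        raise ValueError("a molecule needs at least two atoms")
    translational = 3                  # motion of the centre of mass
    rotational = 2 if linear else 3    # linear molecules have two rotational axes
    return 3 * n_atoms - translational - rotational

print("diatomic:", vibrational_degrees_of_freedom(2, linear=True))
print("H2O     :", vibrational_degrees_of_freedom(3))
print("CO2     :", vibrational_degrees_of_freedom(3, linear=True))
print("CH4     :", vibrational_degrees_of_freedom(5))
```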

Because of limitations of space, this section concentrates very little on rotational motion and its interaction 
with the vibrations of a molecule. However, this is an extremely important aspect of molecular dynamics of 
long-standing interest, and with the development of new methods it is the focus of intense investigation [18, 19,
20, 21, 22 and 23]. One very interesting aspect of rotation-vibration dynamics involving geometric phases is
addressed in section A1.2.20.


The (3n-6) degrees of vibrational motion again take place on a PES. This implies that the PES itself must be a
function in a (3n-6) dimensional space, i.e. it is a function of (3n-6) internal coordinates r_1 ... r_N, where N =
(3n-6), which depend on the positions of all the nuclei. The definition of the coordinates r_1 ... r_N has a great
deal of flexibility. To be concrete, for H2O one choice is the set of internal coordinates illustrated in figure
A1.2.3. These are a bending coordinate, i.e. the angular bending displacement from the equilibrium geometry,
and two bond displacement coordinates, i.e. the stretching displacement of each O-H bond from its
equilibrium length.



Figure A1.2.3. The internal coordinates of the H2O molecule. There are two bond stretching coordinates and
a bend coordinate.

An equilibrium configuration for the molecule is any configuration (r_10 ... r_N0) where the PES has a minimum,
analogous to the minimum in the diatomic potential at R_0 in figure A1.2.1. In general, there can be a number
of local equilibrium configurations in addition to the lowest equilibrium configuration, which is called the
global equilibrium or minimum. We will refer to an equilibrium configuration in speaking of any of the local
equilibria, and the equilibrium configuration when referring to the global minimum. In the very close vicinity
of the equilibrium configuration, the molecule will execute harmonic vibrations. Since there are (3n-6)
vibrational degrees of freedom, there must be (3n-6) harmonic modes, or independent vibrational motions.
This means that on the multi-dimensional PES, there must be (3n-6) independent coordinates, along any of
which the potential is harmonic, near the equilibrium configuration. We will denote these independent degrees
of freedom as the normal mode coordinates R_1 ... R_N. Each of the R_i in general is some combination of the
internal coordinates r_1 ... r_N in terms of which the nuclear positions and PES were defined earlier. These are
illustrated for the case of water in figure A1.2.4. One of the normal modes is a bend, very much like the
internal bending coordinate in figure A1.2.3. The other two modes are a symmetric and antisymmetric
stretch. Near the equilibrium configuration, given knowledge of the molecular potential, it is possible by the
procedure of normal mode analysis [24] to calculate the frequencies of each of the normal modes and their
exact expression in terms of the original internal coordinates r_1 ... r_N.
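
A minimal sketch of a normal mode analysis is given below for a toy model of two equivalent bond stretches r_1 and r_2 of unit mass, with the harmonic potential V = (1/2)k(r_1^2 + r_2^2) + k_12 r_1 r_2. The model, its parameter values and the helper function are assumptions made for illustration; in particular, the kinetic coupling present in a real triatomic is ignored. Diagonalizing the 2x2 mass-weighted force constant matrix gives the squared frequencies and shows that the normal modes are the symmetric and antisymmetric combinations of the bond displacements.

```python
import math

def symmetric_2x2_eigen(a, b, d):
    """Eigenvalues and unit eigenvectors of [[a, b], [b, d]], lowest eigenvalue first."""
    if b == 0.0:
        # Already diagonal: the eigenvectors are the coordinate axes.
        return sorted([(a, (1.0, 0.0)), (d, (0.0, 1.0))])
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    result = []
    for lam in (mean - radius, mean + radius):
        vx, vy = b, lam - a  # solves (a - lam) * vx + b * vy = 0
        norm = math.hypot(vx, vy)
        result.append((lam, (vx / norm, vy / norm)))
    return result

k, k12, mass = 1.0, 0.1, 1.0  # illustrative, dimensionless values (assumptions)
modes = symmetric_2x2_eigen(k / mass, k12 / mass, k / mass)

for lam, (c1, c2) in modes:
    print(f"omega = {math.sqrt(lam):.4f}   R = {c1:+.3f} r_1 {c2:+.3f} r_2")
```

With a positive coupling k_12 the combination with coefficients of opposite sign (the antisymmetric stretch) has the lower frequency in this model; the ordering reverses if k_12 is negative.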



Figure A1.2.4. The normal vibrational coordinates of H2O. Left: symmetric stretch. Middle: antisymmetric
stretch. Right: bend.


It is often very useful to describe classical vibrations in terms of a trajectory in the space of coordinates
r_1 ... r_N. If the motion follows one of the normal modes, the trajectory is one in which the motion repeats
itself along a closed curve. An example is shown in figure A1.2.5 for the symmetric and antisymmetric stretch
modes. The x and y coordinates r_1, r_2 are the displacements of the two O-H bonds. (For each mode i there is a
family of curves, one for each value of the energy, with the amplitude of vibration along the normal modes in
figure A1.2.5 increasing with energy; the figure shows the trajectory of each mode for one value of the
energy.)

Figure A1.2.5. Harmonic stretch normal modes of a symmetric triatomic. The symmetric stretch s and
antisymmetric stretch a are plotted as a function of the bond displacements r_1, r_2.

In general, each normal mode in a molecule has its own frequency, which is determined in the normal mode 
analysis [24]. However, this is subject to the constraints imposed by molecular symmetry [18, 25, 26]. For 
example, in the methane molecule CH4, four of the normal modes can essentially be designated as normal
stretch modes, i.e. consisting primarily of collective motions built from the four C-H bond displacements. The
molecule has tetrahedral symmetry, and this constrains the stretch normal mode frequencies. One mode is the
totally symmetric stretch, with its own characteristic frequency. The other three stretch normal modes are all
constrained by symmetry to have the same frequency, and are referred to as being triply degenerate.


The (3n-6) normal modes with coordinates R_1 ... R_N are often designated ν_1 ... ν_N. (Not to be confused with
the common usage of ν to denote a frequency, as in equation (A1.2.2), the last such usage in this section.)
Quantum mechanically, each normal mode ν_i is characterized by the number of vibrational quanta n_i in the
mode. Then the vibrational state of the molecule is designated or assigned by the number of quanta n_i in each
of the modes, i.e. (n_1 ... n_N). In the harmonic approximation in which each mode i is characterized by a
frequency ω_i, the vibrational energy of a state assigned as (n_1 ... n_N) is given by

$$E(n_1 \ldots n_N) = \left(n_1 + \tfrac{1}{2}\right)\omega_1 + \left(n_2 + \tfrac{1}{2}\right)\omega_2 + \cdots + \left(n_N + \tfrac{1}{2}\right)\omega_N \tag{A1.2.6}$$
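
The harmonic energy formula amounts to a sum over modes. The sketch below evaluates it for three modes with round, illustrative frequencies in cm^-1 (assumed values, not data for any molecule), giving the zero-point energy and the excitation energy of one assigned state relative to it.

```python
def harmonic_energy(quanta, frequencies):
    """Sum of (n_i + 1/2) * omega_i over all modes, with hbar omitted."""
    if len(quanta) != len(frequencies):
        raise ValueError("one quantum number is needed per mode")
    return sum((n + 0.5) * omega for n, omega in zip(quanta, frequencies))

frequencies = (3800.0, 1600.0, 3900.0)  # illustrative values in cm^-1 (assumptions)
zero_point = harmonic_energy((0, 0, 0), frequencies)
excitation = harmonic_energy((1, 0, 2), frequencies) - zero_point

print("zero-point energy       :", zero_point)
print("excitation of (1, 0, 2) :", excitation)
```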


A 1.2.6 ANHARMONIC NORMAL MODES 


In the polyatomic molecule, just as in the diatomic, the PES must again be highly anharmonic away from the 
vicinity of the potential minimum, as seen from the fact that the polyatomic can dissociate; in fact in a 
multiplicity of ways, because in general there can be several dissociation products. In addition, the molecule 
can have complicated internal rearrangements in which it isomerizes. This means that motion takes place from 
one minimum in the PES, over a saddle, or 'pass', and into another minimum. We will have something to say 
about these internal rearrangements later. However, the fact of anharmonicity raises important questions about 
the normal modes even in the near vicinity of an equilibrium configuration. We saw above that anharmonicity
in a diatomic means that the frequency of the vibrational motion varies with the amount of vibrational energy.
An analogous variation of frequency of the normal modes occurs in polyatomics.

However, there is a much more profound prior issue concerning anharmonic normal modes. The existence of 
the normal vibrational modes, involving the collective motion of all the atoms in the molecule as illustrated 
for H2O in figure A1.2.4, was predicated on the existence of a harmonic potential. But if the
potential is not exactly harmonic, as is the case everywhere except right at the equilibrium configuration, are 
there still collective normal modes? And if so, since they cannot be harmonic, what is their nature and their 
relation to the harmonic modes? 

The beginning of an answer comes from a theorem of Moser and Weinstein in mathematical nonlinear 
dynamics [27, 28]. This theorem states that in the vicinity of a potential minimum, a system with (3n-6) 
vibrational degrees of freedom has (3n-6) anharmonic normal modes. What is the difference between the 
harmonic normal modes and the anharmonic normal modes proven to exist by Moser and Weinstein? Figure
A1.2.6 shows anharmonic stretch normal modes. The symmetric stretch looks the same as its harmonic
counterpart in figure A1.2.5; this is necessarily so because of the symmetry of the problem. The
antisymmetric stretch, however, is distinctly different, having a curvilinear appearance in the zero-order bond
modes. The significance of the Moser-Weinstein theorem is that it guarantees that in the vicinity of a
minimum in the PES, there must be a set of (3n-6) of these anharmonic modes.

Figure A1.2.6. Anharmonic stretch normal modes of a symmetric triatomic. The plot is similar to figure
A1.2.5, except the normal modes are now anharmonic and can be curvilinear in the bond displacement
coordinates r_1, r_2. The antisymmetric stretch is curved, but the symmetric stretch is linear because of
symmetry.

It is sometimes very useful to look at a trajectory such as the symmetric or antisymmetric stretch of figure
A1.2.5 and figure A1.2.6 not in the physical spatial coordinates (r_1 ... r_N), but in the phase space of
Hamiltonian mechanics [16, 29], which in addition to the coordinates (r_1 ... r_N) also has as additional
coordinates the set of conjugate momenta (p_1 ... p_N). In phase space, a one-dimensional trajectory such as the
antisymmetric stretch again appears as a one-dimensional curve, but now the curve closes on itself. Such a
trajectory is referred to in nonlinear dynamics as a periodic orbit [29]. One says that the anharmonic normal
modes of Moser and Weinstein are stable periodic orbits.

What does it mean to say the modes are stable? Suppose that one fixes the initial conditions (the initial
values of the coordinates and momenta, for a given fixed value of the energy) so the trajectory does not lie
entirely on one of the anharmonic modes. At any given time the position and momentum are some combination
of each of the normal motions. An example of the kind of trajectory that can result is shown in figure A1.2.7.
The trajectory lies in a box with extensions in each of the anharmonic normal modes, filling the box in a very
regular, 'woven' pattern. In phase space, a regular trajectory in a box is no longer a one-dimensional closed
curve, or periodic orbit. Instead, in phase space a box-filling trajectory lies on a surface which has the
qualitative form, or topology, of a torus, the surface of a doughnut. The confinement of the trajectory to such
a box indicates that the normal modes are stable. (Unstable modes do exist and will be of importance later.)
Another quality of the trajectory in the box is its 'woven' pattern. Such a trajectory is called regular. We will
consider other, chaotic types of trajectories later; the chaos and instability of modes are closely related. The
issues of periodic orbits, stable modes and regular and chaotic motion have been studied in great depth in the
theory of Hamiltonian or energy-preserving dynamical systems [29, 30]. We will return repeatedly to
concepts of classical dynamical systems.

Figure A1.2.7. Trajectory of two coupled stretches, obtained by integrating Hamilton's equations for motion
on a PES for the two modes. The system has stable anharmonic symmetric and antisymmetric stretch modes,
like those illustrated in figure A1.2.6. In this trajectory, semiclassically there is one quantum of energy in each
mode, so the trajectory corresponds to a combination state with quantum numbers [n_s, n_a] = [1, 1]. The
'woven' pattern shows that the trajectory is regular rather than chaotic, corresponding to motion in phase
space on an invariant torus.
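
A trajectory of this kind can be generated numerically. The sketch below integrates Hamilton's equations with the velocity Verlet scheme for two coupled stretches on a simple model potential with cubic anharmonicity. The potential, its parameters and the initial conditions are assumptions chosen for illustration; they are not the PES used for the figure. At this low energy the trajectory stays confined to a box in the (r_1, r_2) plane, and the total energy is conserved to the accuracy of the integrator.

```python
def potential(r1, r2, k=1.0, c=0.1, alpha=0.05):
    """Two harmonic stretches with bilinear coupling c and cubic anharmonicity alpha."""
    return 0.5 * k * (r1 * r1 + r2 * r2) + c * r1 * r2 - alpha * (r1 ** 3 + r2 ** 3)

def forces(r1, r2, k=1.0, c=0.1, alpha=0.05):
    """Forces -dV/dr1 and -dV/dr2 for the model potential."""
    f1 = -(k * r1 + c * r2 - 3.0 * alpha * r1 * r1)
    f2 = -(k * r2 + c * r1 - 3.0 * alpha * r2 * r2)
    return f1, f2

def energy(r1, r2, p1, p2, mass=1.0):
    """Total kinetic plus potential energy."""
    return (p1 * p1 + p2 * p2) / (2.0 * mass) + potential(r1, r2)

def integrate(r1, r2, p1, p2, dt=0.01, steps=5000, mass=1.0):
    """Velocity Verlet integration; returns the (r1, r2) path and the final state."""
    path = [(r1, r2)]
    f1, f2 = forces(r1, r2)
    for _ in range(steps):
        p1 += 0.5 * dt * f1          # half step in the momenta
        p2 += 0.5 * dt * f2
        r1 += dt * p1 / mass         # full step in the coordinates
        r2 += dt * p2 / mass
        f1, f2 = forces(r1, r2)
        p1 += 0.5 * dt * f1          # second half step in the momenta
        p2 += 0.5 * dt * f2
        path.append((r1, r2))
    return path, (r1, r2, p1, p2)

start = (0.3, -0.1, 0.0, 0.0)  # initial r1, r2, p1, p2 (illustrative)
path, final = integrate(*start)
e_start, e_final = energy(*start), energy(*final)
extent = max(max(abs(a), abs(b)) for a, b in path)

print(f"energy drift      = {e_final - e_start:.2e}")
print(f"largest |r_1|,|r_2| = {extent:.3f}")
```

Plotting the points in `path` would show whether the box is filled in a regular pattern; the sketch itself only checks confinement and energy conservation.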

However, the reader may be wondering, what is the connection of all of these classical notions (stable
normal modes, regular motion on an invariant torus) to the quantum spectrum of a molecule observed in a
spectroscopic experiment? Recall that in the harmonic normal modes approximation, the quantum levels are
defined by the set of quantum numbers (n_1 ... n_N) giving the number of quanta in each of the normal modes.
Does it make sense to associate a definite quantum number n_i to each mode i in an anharmonic system? In
general, this is an extremely difficult question! But remember that so far, we are speaking of the situation in
some small vicinity of a minimum on the PES, where the Moser-Weinstein theorem guarantees the existence
of the anharmonic normal modes. This essentially guarantees that quantum levels with low enough n_i values
correspond to trajectories that lie on invariant tori. Since the levels are quantized, these must be special tori,
each characterized by quantized values of the classical actions I_i = (n_i + 1/2)ħ, which are constants of the
motion on the invariant torus. As we shall see, the possibility of assigning a set of N quantum numbers n_i to a
level, one for each mode, is a very special situation that holds only near the potential minimum, where the
motion is described by the N anharmonic normal modes. However, let us continue for now with the region of the
spectrum where this special situation applies.


If there are n_i quanta in mode i and zero quanta in all the other modes, the state is called an overtone of the
normal mode i. What does such a state correspond to in terms of a classical trajectory? Consider the overtone
of the antisymmetric stretch, again neglecting the bend. If all the energy in the overtone were in mode i, the
trajectory would look like the anharmonic mode itself in figure A1.2.6. However, because of the unavoidable
quantum mechanical zero-point energy associated with the action ħ/2 in each mode, an overtone state actually
has a certain amount of energy in all of the normal modes. Therefore, classically, the overtone of the
antisymmetric stretch corresponds to a box-like trajectory, with most of the extension along the antisymmetric
stretch, but with some extension along the symmetric stretch, corresponding to the irreducible zero-point
energy.

The other kind of quantum level we considered above is one with quanta in more than one mode, i.e.
(n_1 ... n_N) with more than one of the n_i not equal to zero. Such a state is called a combination level. This
corresponds, classically, to a box-like trajectory with extension in each mode corresponding to the number of
quanta; an example was seen in figure A1.2.7.

What does one actually observe in the experimental spectrum, when the levels are characterized by the set of 
quantum numbers (n_1 ... n_N) for the normal modes? The most obvious spectral observation is simply the set of
energies of the levels; another important observable quantity is the intensities. The latter depend very 
sensitively on the type of probe of the molecule used to obtain the spectrum; for example, the intensities in 
absorption spectroscopy are in general far different from those in Raman spectroscopy. From now on we will 
focus on the energy levels of the spectrum, although the intensities most certainly carry much additional 
information about the molecule, and are extremely interesting from the point of view of theoretical dynamics. 

If the molecule really had harmonic normal modes, the energy formula (A1.2.6) would apply and the
spectrum would be extremely simple. It is common to speak of a progression in a mode i; a progression
consists of the series of levels containing the fundamental, with n_i = 1, along with the overtone levels n_i > 1.
Each progression of a harmonic system would consist of equally spaced levels, with the level spacing given
by the frequency ω_i. It is also common to speak of sequences, in which the sum of the number of quanta in
two modes is fixed. In a harmonic spectrum, the progressions and sequences would be immediately evident to
the eye in a plot of the energy levels.

In a system with anharmonic normal modes, the spectral pattern is not so simple. Instead of the simple energy
level formula (A1.2.6), in addition to the harmonic terms there are anharmonic terms, similar to the terms
γ_1(n + 1/2)^2, γ_2(n + 1/2)^3, ... in (A1.2.4). For each mode i, there is a set of such terms γ_ii(n_i + 1/2)^2,
γ_iii(n_i + 1/2)^3, etc, where now by common convention the i's in the subscript refer to mode i and the number of
subscripts matches the power, for example γ_ii with the quadratic power (n_i + 1/2)^2. However, there are also
cross terms γ_ij(n_i + 1/2)(n_j + 1/2), γ_iij(n_i + 1/2)^2(n_j + 1/2), etc. As an example, the anharmonic energy
level formula for just a symmetric and antisymmetric stretch is given to the second order in the quantum numbers by

$$E(n_s, n_a) = \omega_s\left(n_s + \tfrac{1}{2}\right) + \omega_a\left(n_a + \tfrac{1}{2}\right) + \gamma_{ss}\left(n_s + \tfrac{1}{2}\right)^{2} + \gamma_{aa}\left(n_a + \tfrac{1}{2}\right)^{2} + \gamma_{sa}\left(n_s + \tfrac{1}{2}\right)\left(n_a + \tfrac{1}{2}\right) \tag{A1.2.7}$$

An energy expression for a polyatomic in powers of the quantum numbers like (A1.2.7) is an example of an
anharmonic expansion [18]. In the anharmonic spectrum, within a progression or sequence there will not be
equal spacings between levels; rather, the spacings will depend on the quantum numbers of the adjacent
levels. Nonetheless, the spectrum will appear very regular to the eye. Spectra that follow closely a formula
such as (A1.2.7), perhaps including higher powers in the quantum numbers, are very common in the
spectroscopy of polyatomic molecules at relatively low energy near the minimum of the PES. This regularity
is not too surprising, when one recalls that it is associated with the existence of the good quantum numbers
(n_1 ... n_N), which themselves correspond classically to regular motion of the kind shown in figure A1.2.7.
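
The level pattern implied by a second-order anharmonic expansion for two stretches can be explored with a few lines of Python. The parameter values below are illustrative assumptions in cm^-1, not fitted constants, and the subscripts s and a label the symmetric and antisymmetric stretch. The sketch prints the progression in the symmetric stretch with no quanta in the antisymmetric stretch: the spacings are unequal, but they change by the same amount from one level to the next, which is the regularity referred to above.

```python
def stretch_energy(n_s, n_a, w_s, w_a, g_ss, g_aa, g_sa):
    """Second-order anharmonic expansion for a symmetric and an antisymmetric stretch."""
    x_s, x_a = n_s + 0.5, n_a + 0.5
    return (w_s * x_s + w_a * x_a
            + g_ss * x_s * x_s + g_aa * x_a * x_a
            + g_sa * x_s * x_a)

# Illustrative parameters in cm^-1 (assumed values, not fitted to any molecule).
params = dict(w_s=3800.0, w_a=3900.0, g_ss=-40.0, g_aa=-45.0, g_sa=-160.0)

progression = [stretch_energy(n, 0, **params) for n in range(5)]
spacings = [upper - lower for lower, upper in zip(progression, progression[1:])]
steps = [later - earlier for earlier, later in zip(spacings, spacings[1:])]

print("spacings          :", spacings)
print("change in spacing :", steps)
```

In this model the change in spacing along a progression is twice the diagonal anharmonic constant of that mode, while the cross term shifts every spacing by the same amount.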

A 1.2.7 SPECTRA THAT ARE NOT SO REGULAR

If this were all there was to molecular spectra they would be essentially well understood by now and their power
to give information on molecules nearly exhausted. However, this cannot be the case: consider that molecules
dissociate (a very irregular type of motion!) while a molecule whose spectrum strictly followed a formula
such as (A1.2.7) would have quantum levels all corresponding semiclassically to motion on invariant tori that
are described by the N anharmonic normal modes. Motion as simple as this is expected only near a potential
minimum, where the Moser-Weinstein theorem applies. How is the greater complexity of real molecules
manifested in a spectrum? The spectrum is a reflection of the physical PES, since the vibrational spectrum is
determined quantum mechanically by the PES. Since the PES contains the possibility of much less regular
motion than that reflected in a Dunham formula, i.e. an expansion in powers of the quantum numbers such as
(A1.2.7), how can a Dunham formula be modified so as to represent a real spectrum, including portions
corresponding to less regular motion? We will consider first what these modifications must look like, then pose
the following question: suppose we have a generalized spectral Hamiltonian and use this to represent
experimental observations, how can we use this representation to decode the dynamical information on the
internal molecular motions that is contained in the spectrum?


A 1.2.8 RESONANCE COUPLINGS 

The fact that terms in addition to those present in the energy level formula (A1.2.7) might arise in molecular
spectra is already strongly suggested by one of the features already discussed: the cross-anharmonic terms
such as γ_ij(n_i + 1/2)(n_j + 1/2). These terms show that the anharmonicity arises not only from the normal modes
themselves (the 'self-anharmonicity' terms like γ_ii(n_i + 1/2)^2) but also from couplings between the normal
modes. The cross-anharmonic terms depend only on the vibrational quantum numbers; the Hamiltonian so
far is diagonal in the normal mode quantum numbers. However, there are also terms in the generalized
Hamiltonian that are not diagonal in the quantum numbers. It is these that are responsible for the profoundly
greater complexity of the internal motion of a polyatomic, as compared to a diatomic.

Consider how these non-diagonal terms would arise in the analysis of an experimental spectrum. Given a set
of spectral data, one would try to fit the data to a Hamiltonian of the form of (A1.2.7). The Hamiltonian then
is to be regarded as a 'phenomenological' or 'effective' spectroscopic Hamiltonian, to be used to describe the
results of experimental observations. The fitting consists of adjusting the parameters of the Hamiltonian, for
example the ω's, the γ's, etc, until the best match possible is obtained between the spectroscopic Hamiltonian
and the data. If a good fit is not obtained with a given number of terms in the Dunham expansion, one could
simply add terms of higher order in the quantum numbers. However, it is found in fitting the spectrum of the
stretch modes of a molecule like H2O that this does not work at all well. Instead, a large resonance coupling
term which exchanges quanta between the modes is found to be necessary to obtain a good fit to the data, as
was first discovered long ago by Darling and Dennison [31]. Specifically, the Darling-Dennison coupling
takes two quanta out of the symmetric stretch, and places two into the antisymmetric stretch. There is also a
coupling which does the reverse, taking two quanta from the antisymmetric stretch and placing them into the
symmetric stretch. It is convenient to represent this coupling in terms of the raising and lowering operators
[32] a_i†, a_i. These, respectively, have the action of placing a quantum into or removing a quantum from an
oscillator which originally has n quanta:

$$a^{\dagger}|n\rangle = |n+1\rangle \qquad a|n\rangle = |n-1\rangle \tag{A1.2.8}$$

The raising and lowering operators originated in the algebraic theory of the quantum mechanical oscillator, 
essentially by the path followed by Heisenberg in formulating quantum mechanics [33]. In terms of raising 
and lowering operators, the Darling-Dennison coupling operator is 

$$\kappa_{\mathrm{DD}}\left(a_a^{\dagger} a_a^{\dagger} a_s a_s + a_s^{\dagger} a_s^{\dagger} a_a a_a\right) \tag{A1.2.9}$$

where κ_DD is a parameter which defines the strength of the coupling; κ_DD is optimized to obtain the best
possible fit between the data and the spectroscopic Hamiltonian.
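
The action of the Darling-Dennison coupling on states with n_s quanta in the symmetric stretch and n_a in the antisymmetric stretch can be tabulated directly. The sketch below assumes the standard normalized matrix elements a†|n⟩ = sqrt(n+1)|n+1⟩ and a|n⟩ = sqrt(n)|n-1⟩, of which (A 1.2.8) is a schematic form without the numerical factors, and builds the coupling matrix among all states with a fixed total number of stretch quanta n_s + n_a. The function names and the value of the coupling strength are illustrative assumptions. Because the coupling moves quanta in pairs, it never connects states with different totals.

```python
import math

def dd_element(n_s, n_a, kappa):
    """Matrix element <n_s - 2, n_a + 2| V_DD |n_s, n_a>.

    Uses a|n> = sqrt(n)|n-1> and a+|n> = sqrt(n+1)|n+1>.
    """
    if n_s < 2:
        return 0.0  # two quanta cannot be removed from the symmetric stretch
    return kappa * math.sqrt(n_s * (n_s - 1) * (n_a + 1) * (n_a + 2))

def dd_matrix(total, kappa):
    """Coupling matrix among the states with n_s + n_a = total."""
    states = [(n_s, total - n_s) for n_s in range(total + 1)]
    index = {state: i for i, state in enumerate(states)}
    size = len(states)
    matrix = [[0.0] * size for _ in range(size)]
    for (n_s, n_a), i in index.items():
        j = index.get((n_s - 2, n_a + 2))
        if j is not None:
            # The reverse transfer has the same magnitude, so the matrix is symmetric.
            matrix[i][j] = matrix[j][i] = dd_element(n_s, n_a, kappa)
    return states, matrix

kappa_dd = -40.0  # illustrative coupling strength in cm^-1 (assumed value)
states, matrix = dd_matrix(2, kappa_dd)
for state, row in zip(states, matrix):
    print(state, row)
```

For a total of two quanta the only non-zero elements connect the states (2, 0) and (0, 2); the state (1, 1) is untouched because neither mode holds two quanta to give up.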

Physically, why does a term like the Darling-Dennison coupling arise? We have said that the spectroscopic 
Hamiltonian is an abstract representation of the more concrete, physical Hamiltonian formed by letting the 
nuclei in the molecule move with specified initial conditions of displacement and momentum on the PES, 
with a given total kinetic plus potential energy. This is the sense in which the spectroscopic Hamiltonian is an 
'effective' Hamiltonian, in the nomenclature used above. The concrete Hamiltonian that it mimics is 
expressed in terms of particle momenta and displacements, in the representation given by the normal 
coordinates. Then, in general, it may contain terms proportional to all the powers of the products of the 
normal coordinates, $R_i^{n_i} R_j^{n_j}$. (It will also contain terms containing the momenta that arise from the kinetic 
energy; however, these latter kinetic energy terms are more restricted in form than the terms from the 
potential.) In the spectroscopic Hamiltonian, these will partly translate into expressions with terms 
proportional to the powers of the quantum numbers, as in (A1.2.7). However, there will also be resonance 
couplings, such as the Darling-Dennison coupling (A1.2.9). These arise directly from the fact that the 
oscillator raising and lowering operators (A1.2.8) have a close connection to the position and momentum 
operators of the oscillator [32], so the resonance couplings are implicit in the terms of the physical 
Hamiltonian such as $R_i^{n_i} R_j^{n_j}$. 

Since all powers of the coordinates appear in the physical PES, and these give rise to resonance couplings, 
one might expect a large, in fact infinite, number of resonance couplings in the spectroscopic Hamiltonian. 
However, in practice, a small number of resonance couplings — and often none, especially at low energy — is 
sufficient to give a good fit to an experimental spectrum, so effectively the Hamiltonian has a rather simple 
form. To understand why a small number of resonance couplings is usually sufficient we will focus again on 
H₂O. 

In fitting the H₂O stretch spectrum, it is found that the Darling-Dennison coupling is necessary to obtain a 
good fit, but only the Darling-Dennison and no other. (It turns out that a second coupling, between the 
symmetric stretch and bend, is necessary to obtain a good fit when significant numbers of bending quanta are 
involved; we will return to this point later.) If all resonance terms in principle are involved in the Hamiltonian, 
why is it that, empirically, only the Darling-Dennison coupling is important? To understand this, a very 
important notion, the polyad quantum number, is necessary. 
A1.2.9 POLYAD NUMBER 

The characteristic of the Darling-Dennison coupling is that it exchanges two quanta between the symmetric 
and antisymmetric stretches. This means that the individual quantum numbers $n_s$, $n_a$ are no longer good 
quantum numbers of the Hamiltonian containing $V_{DD}$. However, the total number of stretch quanta 

$$n_{str} = (n_s + n_a) \qquad \text{(A1.2.10)}$$

is left unchanged by $V_{DD}$. Thus, while it might appear that $V_{DD}$ has destroyed two quantum numbers, 
corresponding to two constants of motion, it has in fact preserved $n_{str}$ as a good quantum number, often 
referred to as a polyad quantum number. So, the Darling-Dennison term $V_{DD}$ couples together a set of 
zero-order states with common values of the polyad number $n_{str}$. For example, the set with $n_{str} = 4$ contains 
zero-order states $[n_s, n_a]$ = [4, 0], [3, 1], [2, 2], [1, 3], [0, 4]. These five zero-order states are referred to as the 
zero-order polyad with $n_{str} = 4$. 
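
The conservation of the total number of stretch quanta can be checked numerically. The following is a minimal sketch, assuming standard harmonic oscillator matrix elements in a truncated basis; the truncation `DIM`, the value of `k_dd` and the helper names are illustrative choices, not quantities taken from any fit.

```python
import numpy as np

def lowering(dim):
    """Truncated lowering operator with the standard sqrt(n) matrix elements."""
    return np.diag(np.sqrt(np.arange(1, dim)), k=1)

DIM = 7                       # each stretch holds 0..6 quanta (illustrative truncation)
a = lowering(DIM)
identity = np.eye(DIM)

# Two-mode operators: first factor = symmetric stretch, second = antisymmetric stretch.
a_s = np.kron(a, identity)
a_a = np.kron(identity, a)
n_s = a_s.T @ a_s             # number operators (the matrices are real, so .T is the adjoint)
n_a = a_a.T @ a_a

k_dd = -42.0                  # hypothetical coupling strength in cm^-1
v_dd = k_dd * (a_s.T @ a_s.T @ a_a @ a_a + a_a.T @ a_a.T @ a_s @ a_s)

def commutator(x, y):
    return x @ y - y @ x

# The coupling changes n_s, but leaves n_s + n_a untouched.
breaks_n_s = not np.allclose(commutator(v_dd, n_s), 0.0)
conserves_n_str = np.allclose(commutator(v_dd, n_s + n_a), 0.0)

def polyad_states(n_str):
    """Zero-order states [n_s, n_a] sharing the polyad number n_str."""
    return [(n_str - quanta_a, quanta_a) for quanta_a in range(n_str + 1)]

print(breaks_n_s, conserves_n_str)
print(polyad_states(4))
```

The vanishing commutator with the sum of the two number operators is the matrix statement that the polyad number remains a good quantum number, while the non-vanishing commutator with the symmetric stretch number operator alone shows that the individual quantum numbers are lost.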

If only zero-order states from the same polyad are coupled together, this constitutes a fantastic simplification 
in the Hamiltonian. Enormous computational economies result in fitting spectra, because the spectroscopic 
Hamiltonian is block diagonal in the polyad number. That is, only zero-order states within blocks with the 
same polyad number are coupled; the resulting small matrix diagonalization problem is vastly simpler than 
diagonalizing a matrix with all the zero-order states coupled to each other. 

However, why should such a simplification be a realistic approximation? For example, why should not a 
coupling of the form 

$$(a_s^\dagger a_s^\dagger a_s^\dagger a_a + a_a^\dagger a_s a_s a_s) \qquad \text{(A1.2.11)}$$

which would break the polyad number $n_{str}$, be just as important as $V_{DD}$? There is no reason a priori why it 
might not have just as large a contribution as $V_{DD}$ when the coordinate representation of the PES is expressed 
in terms of the raising and lowering operators $a_i^\dagger$, $a_i$. To see why it nonetheless is found empirically to be 
unimportant in the fit, and therefore is essentially negligible, consider again the molecule H₂O. A coupling 
like (A1.2.11), which removes three quanta from one mode but puts only one quantum in the other mode, is 
going to couple zero-order states with vastly different zero-order energy. For example, $[n_s, n_a]$ = [3, 0] will be 
coupled to [0, 1], but these zero-order states are nowhere near each other in energy. By general quantum 
mechanical arguments of perturbation theory [32], the coupling of states which differ greatly in energy will 
have a correspondingly small effect on the wavefunctions and energies. In a molecule like H₂O, such a 
coupling can essentially be ignored in the fitting Hamiltonian. 
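
The perturbation theory argument can be made concrete with a two-level model. The sketch below compares the level shift produced by the same coupling matrix element for a nearly degenerate pair and for a pair separated by roughly two stretch quanta; the numerical values of the coupling and the energy gaps are illustrative assumptions, not fitted parameters.

```python
import math

def two_level_shift(gap, coupling):
    """Exact level shift for two states separated by `gap` and coupled by `coupling`.

    The 2x2 matrix [[0, coupling], [coupling, gap]] has eigenvalues
    gap/2 -/+ sqrt(gap**2/4 + coupling**2); the lower level is pushed down
    by the (positive) amount returned here.
    """
    return math.sqrt(0.25 * gap**2 + coupling**2) - 0.5 * gap

coupling = 50.0                                  # hypothetical matrix element, cm^-1
near_resonant = two_level_shift(20.0, coupling)  # nearly degenerate zero-order states
far_detuned = two_level_shift(7000.0, coupling)  # states far apart in zero-order energy

# Second-order perturbation theory estimate, valid when the gap far exceeds the coupling.
perturbative = coupling**2 / 7000.0

print(near_resonant, far_detuned, perturbative)
```

With these numbers the near-resonant pair is shifted by an amount comparable to the coupling itself, whereas the far-detuned pair is shifted by well under one wavenumber, in line with the second-order estimate.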

This is why the coupling $V_{DD}$ is often called a Darling-Dennison resonance coupling: it is significant 
precisely when it couples zero-order states that differ by a small number of quanta which are approximately 
degenerate with each other, which classically is to say that they are in resonance. The Darling-Dennison 
coupling, because it involves taking two quanta from one mode and placing two in another, is also called a 2:2 
coupling. Other orders of coupling $n$:$m$ also arise in different situations (such as the stretch-bend coupling in 
H₂O), and these will be considered later. 
However, if only the Darling-Dennison coupling is important for the coupled stretches, what is its importance 
telling us about the internal molecular motion? It turns out that the right kind of analysis of the spectroscopic 
fitting Hamiltonian reveals a vast amount about the dynamics of the molecule: it allows us to decipher the 
story encoded in the spectrum of what the molecule is 'really doing' in its internal motion. We will approach 
this 'spectral cryptology' from two complementary directions: the spectral pattern of the Darling-Dennison 
spectroscopic Hamiltonian; and, less directly, the analysis of a classical Hamiltonian corresponding to the 
spectroscopic quantum Hamiltonian. We will see that the Darling-Dennison coupling produces a pattern in the 
spectrum that is very distinctly different from the pattern of a 
'pure normal modes Hamiltonian', without coupling, such as (A1.2.7). Then, when we look at the classical 
Hamiltonian corresponding to the Darling-Dennison quantum fitting Hamiltonian, we will subject it to the 
mathematical tool of bifurcation analysis [34]. From this, we will infer a dramatic birth in bifurcations of new 
'natural motions' of the molecule, i.e. local modes. This will be directly connected with the distinctive 
quantum spectral pattern of the polyads. Some aspects of the pattern can be accounted for by the classical 
bifurcation analysis; while others give evidence of intrinsically non-classical effects in the quantum dynamics. 

It should be emphasized here that while the discussion of contemporary techniques for decoding spectra for 
information on the internal molecular motions will largely concentrate on spectroscopic Hamiltonians and 
bifurcation analysis, there are distinct, but related, contemporary developments that show great promise for 
the future. Examples include approaches using advanced 'algebraic' techniques [35, 36] as alternative ways to 
build the spectroscopic Hamiltonian, and 'hierarchical analysis' using techniques related to general 
classification methods [37]. 


A1.2.10 SPECTRAL PATTERN OF THE DARLING-DENNISON HAMILTONIAN 

Consider the polyad $n_{str} = 6$ of the Hamiltonian (A1.2.7). This polyad contains the set of levels conventionally 
assigned as [6, 0], [5, 1], ..., [0, 6]. If a Hamiltonian such as (A1.2.7) described the spectrum, the polyad 
would have a pattern of levels with monotonically varying spacing, like that shown in figure A1.2.8. 
However, suppose the fit of the experimental spectrum requires the addition of a strong Darling-Dennison 
term $V_{DD}$, as empirically is found to be the case for the stretch spectrum of a molecule like H₂O. In general, 
because of symmetry, only certain levels may be spectroscopically allowed; for example, in absorption 
spectra, only levels with an odd number of quanta $n_a$ in the antisymmetric stretch. However, diagonalization of 
the polyad Hamiltonian gives all the levels of the polyad. When these are plotted for the Darling-Dennison 
Hamiltonian, including the spectroscopically unobserved levels with even $n_a$, a striking pattern, shown in 
figure A1.2.9, is immediately evident. At the top of the polyad the level spacing pattern is like that of the 
anharmonic normal modes, as in figure A1.2.8, but at the bottom of the polyad the levels come in 
near-degenerate doublets. What is this pattern telling us about the change in the internal molecular motion resulting 
from inclusion of the Darling-Dennison coupling? 
Figure A1.2.8. Typical energy level pattern of a sequence of levels with quantum numbers $[n_s, n_a]$ for the 
number of quanta in the symmetric and antisymmetric stretch. The bend quantum number is neglected and 
may be taken as fixed for the sequence. The total number of quanta ($n_s + n_a = 6$) is the polyad number, which 
is the same for all levels. [6, 0] and [0, 6] are the overtones of the symmetric and antisymmetric stretch; the 
other levels are combination levels. The levels have a monotonic sequence of energy spacings from bottom to 
top. 


Figure A1.2.9. Energy level pattern of polyad 6 of a spectroscopic Hamiltonian for coupled stretches with 
strong Darling-Dennison coupling. Within the polyad the transition from normal to local modes is evident. At 
the bottom of the polyad are two nearly degenerate pairs of levels. Semiclassically, the bottom pair derive 
from local mode overtone states. The levels are symmetrized mixtures of the individual local mode overtones. 
Semiclassically, they are exactly degenerate; quantum mechanically, a small splitting is present, due to 
tunnelling. The next highest pair are symmetrized local mode combination states. The tunnelling splitting is 
larger than in the bottom pair; above this pair, the levels have normal mode character, as evidenced by the 
energy level pattern. 
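
The qualitative pattern described for polyad 6 can be reproduced with a small matrix diagonalization. The sketch below assumes that the zero-order energies follow a Dunham-type expansion through second order in the quantum numbers and that the coupling has standard harmonic oscillator matrix elements. All parameter values are hypothetical, chosen to mimic two weakly coupled anharmonic bond oscillators rather than taken from any published fit.

```python
import numpy as np

def polyad_hamiltonian(n_str, omega_s, omega_a, x_ss, x_aa, x_sa, k_dd):
    """Matrix of the stretch polyad n_str in the basis [n_s, n_a] = [n_str - i, i]."""
    dim = n_str + 1
    h = np.zeros((dim, dim))
    for i in range(dim):
        n_a = i
        n_s = n_str - i
        # Zero-order (diagonal) energy, assumed Dunham-type form.
        h[i, i] = (omega_s * (n_s + 0.5) + omega_a * (n_a + 0.5)
                   + x_ss * (n_s + 0.5) ** 2 + x_aa * (n_a + 0.5) ** 2
                   + x_sa * (n_s + 0.5) * (n_a + 0.5))
        # Darling-Dennison coupling: two quanta moved from symmetric to antisymmetric stretch.
        if n_s >= 2:
            element = k_dd * np.sqrt(n_s * (n_s - 1) * (n_a + 1) * (n_a + 2))
            h[i, i + 2] = h[i + 2, i] = element
    return h

# Hypothetical parameters in cm^-1 (illustrative only).
h6 = polyad_hamiltonian(6, omega_s=3820.0, omega_a=3920.0,
                        x_ss=-42.0, x_aa=-42.0, x_sa=-168.0, k_dd=-42.0)
levels = np.linalg.eigvalsh(h6)   # eigenvalues in ascending order
gaps = np.diff(levels)            # spacings between adjacent levels, bottom to top

print(levels)
print(gaps)
```

For this parameter choice the spacing between the two lowest levels is far smaller than the spacing between the two highest, which is the near-degenerate doublet structure at the bottom of the polyad. Setting `k_dd` to zero returns the uncoupled zero-order energies for comparison.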
This has been the subject of a great deal of work by many people over more than 20 years. Breakthroughs in 
the theoretical understanding of the basic physics began to accumulate in the early 1980s [38, 39, 40 and 41]. 
One approach that has a particularly close relation between experiment and theory uses bifurcation analysis of 
a classical analogue of the spectroscopic fitting Hamiltonian. The mathematical details are presented 
elsewhere [42, 43, 44 and 45]; the qualitative physical meaning is easily described. 

A classical Hamiltonian is obtained from the spectroscopic fitting Hamiltonian by a method that has come to 
be known as the 'Heisenberg correspondence' [46], because it is closely related to the techniques used by 
Heisenberg in fabricating the form of quantum mechanics known as matrix mechanics. 

Once the classical Hamiltonian has been obtained, it is subjected to bifurcation analysis. In a bifurcation, 
typically, a stable motion of the molecule — say, one of the Weinstein-Moser normal modes — suddenly 
becomes unstable; and new stable, anharmonic modes suddenly branch out from the normal mode. An 
illuminating example is presented in figure A1.2.10, which illustrates the results of the bifurcation analysis of 
the classical version of the Darling-Dennison Hamiltonian. One of the normal modes — it can be either the 
symmetric or antisymmetric stretch depending on the specific parameters found empirically in the fitting 
Hamiltonian — remains stable. Suppose it is the antisymmetric stretch that remains stable. At the bifurcation, 
the symmetric stretch suddenly becomes unstable. This happens at some critical value of the mathematical 
'control parameters' [34], which we may take to be some critical combination of the energy and polyad 
number. From the unstable symmetric stretch, there immediately emerge two new stable periodic orbits, or 
anharmonic modes. As the control parameter is increased, the new stable modes creep out from the symmetric 
stretch, which remains in 'fossilized' form as an unstable periodic orbit. Eventually, the new modes point 
more or less along the direction of the zero-order bond displacements, but as curvilinear trajectories. We can 
say that in this bifurcation, anharmonic local modes have been born. 

It is the 'skeleton' of stable and unstable modes in figure A1.2.10(c) that explains the spectral pattern seen in 
figure A1.2.9. Some of the levels in the polyad, those in the upper part, have wavefunctions that are quantized 
in patterns that shadow the normal modes — the still-stable antisymmetric stretch and the now-unstable 
symmetric stretch. Other states, the lower ones in the polyad, are quantized along the local modes. These latter 
states, described by local mode quantum numbers, account for the pattern of near-degenerate doublets. First, 
why is the degeneracy there at all? The two classical local modes have exactly the same energy and 
frequency, by symmetry. In the simplest semiclassical [29] picture, there are two exactly degenerate local 
mode overtones, each pointed along one or the other of the local modes. There are also combination states 
possible with quanta in each of the local modes and, again, semiclassically these must come in exactly 
degenerate pairs. 
Figure A1.2.10. Birth of local modes in a bifurcation. In (a), before the bifurcation there are stable 
anharmonic symmetric and antisymmetric stretch modes, as in figure A1.2.6. At a critical value of the energy 
and polyad number, one of the modes, in this example the symmetric stretch, becomes unstable and new 
stable local modes are born in a bifurcation; the system is shown shortly after the bifurcation in (b), where the 
new modes have moved away from the unstable symmetric stretch. In (c), the new modes clearly have taken 
the character of the anharmonic local modes. 


The classical bifurcation analysis has succeeded in decoding the spectrum to reveal the existence of local and 
normal modes, and the local modes have accounted for the changeover from a normal mode spectral pattern to 
the pattern of degenerate doublets. But why the splitting of the near-degenerate doublets? Here, non-classical 
effects unique to quantum mechanics come into play. A trajectory in the box for one of the local modes is 
confined in phase space to an invariant torus, and classically will never leave its box. However, quantum 
mechanically, there is some probability for classically forbidden processes to take place in which the 
trajectory jumps from one box to the other! This may strike the reader as akin to the quantum mechanical 
phenomenon of tunnelling. In fact, this is more than an analogy. The effect has been called 'dynamical 
tunnelling' [41, 47], and it can be formulated rigorously as a mathematical tunnelling problem [40, 48]. The 
effect of the dynamical tunnelling on the energy levels comes through in another unique manifestation of 
quantum mechanics. The quantum eigenfunctions — the wavefunctions for the energy levels of the true 
quantum spectrum — are symmetrized combinations of the two semiclassical wavefunctions corresponding to 
the two classical boxes [38]. These wavefunctions come in pairs of + and − symmetry; the two levels of a 
near-degenerate pair are split into a + state and a − state. The amount of the splitting is directly related to the 
rate of the non-classical tunnelling process [49]. 
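
The relation between splitting and tunnelling rate can be sketched with the standard two-state argument. Here $\psi_1$ and $\psi_2$ denote the two semiclassical local mode wavefunctions and $\Delta E$ the splitting of the doublet; this notation is introduced only for the illustration.

```latex
\begin{align}
\psi_\pm &= \tfrac{1}{\sqrt{2}}\,(\psi_1 \pm \psi_2), \qquad \Delta E = |E_+ - E_-|, \\
\psi(0) = \psi_1 \;\Rightarrow\; P_2(t) &= |\langle \psi_2 | \psi(t) \rangle|^2
   = \sin^2\!\left(\frac{\Delta E\, t}{2\hbar}\right), \\
\tau_{1 \to 2} &= \frac{\pi \hbar}{\Delta E} = \frac{h}{2\,\Delta E} .
\end{align}
```

In this simple picture a smaller splitting corresponds to a slower transfer between the two local modes, which is consistent with the bottom pair of the polyad, having the smallest splitting, being the most strongly localized.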


A1.2.11 FERMI RESONANCES 

In the example of H₂O, we saw that the Darling-Dennison coupling between the stretches led to a profound 
change in the internal dynamics: the birth of local modes in a bifurcation from one of the original low-energy 
normal modes. The question arises of the possibility of other types of couplings, if not between two identical 
stretch modes, then between other kinds of modes. We have seen that, effectively, only a very small subset of 
possible resonance couplings between 
the stretches is actually important; in the case of the H₂O stretches, only the 2:2 Darling-Dennison coupling. 
This great simplification came about because of the necessity to satisfy a condition of frequency resonance 
between the zero-order modes for the 2:2 Darling-Dennison coupling to be important. In H₂O, there is also an 
approximate 2:1 resonance condition satisfied between the stretch and bend frequencies. Not surprisingly, in 
fitting the H₂O spectrum, in particular when several bending quanta are present, it is necessary to consider a 
2:1 coupling term between the symmetric stretch ($s$) and bend ($b$), of the form 


$$k_{sbb}\,(a_s^\dagger a_b a_b + a_s a_b^\dagger a_b^\dagger). \qquad \text{(A1.2.12)}$$

(The analogous coupling between the antisymmetric stretch and bend is forbidden in the H₂O Hamiltonian 
because of symmetry.) The 2:1 resonance is known as a 'Fermi resonance' after its introduction [50] in 
molecular spectroscopy. The 2:1 resonance is often very prominent in spectra, especially between stretch and 
bend modes, which often have approximate 2:1 frequency ratios. The 2:1 coupling leaves unchanged as a 
polyad number the sum: 

$$n_{sb} = (n_s + n_b/2). \qquad \text{(A1.2.13)}$$

Other resonances, of order $n$:$m$, are possible in various systems. Another type of resonance is a 'multimode' 
resonance. For example, in C₂H₂ the coupling 

$$k_{2345}\,(a_2^\dagger a_3 a_4^\dagger a_5^\dagger + a_2 a_3^\dagger a_4 a_5) \qquad \text{(A1.2.14)}$$

that transfers one quantum from the antisymmetric stretch $\nu_3$ to the C-C stretch $\nu_2$ and each of the bends $\nu_4$ 
and $\nu_5$ is important [51, 52 and 53]. Situations where couplings such as the $n$:$m$ resonance and the 2345 
multimode resonance need to be invoked are often referred to as 'Fermi resonances', though some authors 
restrict this term to the 2:1 resonance and use the term 'anharmonic resonance' to describe the more general 
n:m or multimode cases. Here, we will use the terms 'Fermi' and 'anharmonic' resonances interchangeably. 

It turns out that the language of 'normal and local modes' that emerged from the bifurcation analysis of the 
Darling-Dennison Hamiltonian is not sufficient to describe the general Fermi resonance case, because the 
bifurcations are qualitatively different from the normal-to-local bifurcation in figure A1.2.10. For example, in 
2:1 Fermi systems, one type of bifurcation is that in which 'resonant collective modes' are born [54]. The 
resonant collective modes are illustrated in figure A1.2.11; their difference from the local modes of the 
Darling-Dennison system is evident. Other types of bifurcations are also possible in Fermi resonance systems; 
a detailed treatment of the 2:1 resonance can be found in [44]. 
Figure A1.2.11. Resonant collective modes of the 2:1 Fermi resonance system of a coupled stretch and bend 
with an approximate 2:1 frequency ratio. Shown is one end of a symmetric triatomic such as H₂O. The normal 
stretch and bend modes are superseded by the horseshoe-shaped modes shown in (a) and (b). These two 
modes have different frequency, as further illustrated in figure A1.2.12. 


A1.2.12 MORE SUBTLE ENERGY LEVEL PATTERNS 

The Darling-Dennison Hamiltonian displayed a striking energy level pattern associated with the bifurcation to 
local modes: approximately degenerate local mode doublets, split by dynamical tunnelling. In general Fermi 
resonance systems, the spectral hallmarks of bifurcations are not nearly as obvious. However, subtle, but 
clearly observable spectral markers of bifurcations do exist. For example, associated with the formation of 
resonant collective modes in the 2:1 Fermi system there is a pattern of a minimum in the spacing of adjacent 
energy levels within a polyad [55], as illustrated in figure A1.2.12. This pattern has been invoked [56, 57] in 
the analysis of 'isomerization spectra' of the molecule HCP, which will be discussed later. Other types of 
bifurcations have their own distinct, characteristic spectral patterns; for example, in 2:1 Fermi systems a 
second type of bifurcation has a pattern of alternating level spacings, of a 'fan' or a 'zigzag', which was 
predicted in [55] and subsequently seen [57]. 
Figure A1.2.12. Energy level pattern of a polyad with resonant collective modes. The top and bottom energy 
levels correspond to overtone motion along the two modes shown in figure A1.2.11, which have a different 
frequency. The spacing between adjacent levels decreases until it reaches a minimum between the third and 
fourth levels from the top. This minimum is the hallmark of a separatrix [29, 45] in phase space. 


A1.2.13 MULTIPLE RESONANCES IN POLYATOMICS 

Implicit in the discussion of the Darling-Dennison and Fermi resonances has been the assumption that we can 
isolate each individual resonance, and consider its bifurcations and associated spectral patterns separately 
from other resonances in the system. However, strictly speaking, this cannot be the case. Consider again H₂O. 
The Darling-Dennison resonance couples the symmetric and antisymmetric stretches; the Fermi resonance 
couples the symmetric stretch and bend. Indirectly, all three modes are coupled, and the two resonances are 
linked. It is no longer true that the stretch polyad number ($n_s + n_a$) is conserved, because it is broken by the 
2:1 Fermi coupling; nor is the Fermi polyad number ($n_s + n_b/2$) preserved, because it is broken by the 
Darling-Dennison coupling. However, there is still a generalized 'total' polyad number 

$$n_{total} = (n_s + n_a + n_b/2) \qquad \text{(A1.2.15)}$$

that is conserved by both couplings, as may readily be verified. All told, the Hamiltonian with both couplings 
has two constants of motion, the energy and the polyad number (A1.2.15). A system with fewer constants than 
the number of degrees of freedom, in this case two constants and three degrees of freedom, is 
'non-integrable', in the language of classical mechanics [29]. This means that, in general, trajectories do not lie on 
higher-dimensional invariant tori; instead, they may be chaotic, and in fact this is often observed to be the 
case [58, 59] in trajectories of the semiclassical Hamiltonian for H₂O. 
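
The statement that the total polyad number survives both couplings, while each of the individual polyad numbers is broken by one of them, can be verified with truncated matrix representations. The following sketch assumes standard harmonic oscillator ladder operators; the basis truncations in `dims` and the helper names are illustrative, and the coupling strengths are set to unity since only the operator structure matters here.

```python
import numpy as np

def lowering(dim):
    """Truncated lowering operator with the standard sqrt(n) matrix elements."""
    return np.diag(np.sqrt(np.arange(1, dim)), k=1)

def embed(op, position, dims):
    """Place a single-mode operator at `position` in a tensor product of modes."""
    out = np.eye(1)
    for i, d in enumerate(dims):
        out = np.kron(out, op if i == position else np.eye(d))
    return out

dims = (5, 5, 9)   # truncations: symmetric stretch, antisymmetric stretch, bend
a_s, a_a, a_b = (embed(lowering(d), i, dims) for i, d in enumerate(dims))
n_s, n_a, n_b = (a.T @ a for a in (a_s, a_a, a_b))

v_dd = a_s.T @ a_s.T @ a_a @ a_a + a_a.T @ a_a.T @ a_s @ a_s   # 2:2 stretch-stretch
v_fermi = a_s.T @ a_b @ a_b + a_s @ a_b.T @ a_b.T              # 2:1 stretch-bend

def commutes(x, y):
    return np.allclose(x @ y - y @ x, 0.0)

n_stretch = n_s + n_a
n_fermi = n_s + n_b / 2
n_total = n_s + n_a + n_b / 2

print("stretch polyad:", commutes(v_dd, n_stretch), commutes(v_fermi, n_stretch))
print("Fermi polyad:  ", commutes(v_dd, n_fermi), commutes(v_fermi, n_fermi))
print("total polyad:  ", commutes(v_dd, n_total), commutes(v_fermi, n_total))
```

Each printed pair reports whether the Darling-Dennison and the Fermi coupling, respectively, commute with the given polyad operator; only the total polyad number commutes with both.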
Nonetheless, it is still possible to perform the bifurcation analysis on the multiresonance Hamiltonian. In fact, 
the existence of the polyad number makes this almost as easy, despite the presence of chaos, as in the case of 
an isolated single Fermi or Darling-Dennison resonance. It is found [60] that most often (though not always), 
the same qualitative bifurcation behaviour is seen as in the single resonance case, explaining why the 
simplified individual resonance analysis very often is justified. The bifurcation analysis has now been 
performed for triatomics with two resonances [60] and for C₂H₂ with a number of resonances [61]. 


A1.2.14 POTENTIAL AND EXPERIMENT: CLOSING THE CIRCLE 

We have alluded to the connection between the molecular PES and the spectroscopic Hamiltonian. These are 
two very different representations of the molecular Hamiltonian, yet both are supposed to describe the same 
molecular dynamics. Furthermore, the PES often is obtained via ab initio quantum mechanical calculations; 
while the spectroscopic Hamiltonian is most often obtained by an empirical fit to an experimental spectrum. Is 
there a direct link between these two seemingly very different ways of apprehending the molecular 
Hamiltonian and dynamics? And if so, how consistent are these two distinct ways of viewing the molecule? 

There has been a great deal of work [62, 63] investigating how one can use perturbation theory to obtain an 
effective Hamiltonian like the spectroscopic Hamiltonian, starting from a given PES. It is found that one can 
readily obtain an effective Hamiltonian in terms of normal mode quantum numbers and coupling. 
Furthermore, the actual Hamiltonians obtained very closely match those obtained via the empirical fitting of 
spectra! This consistency lends great confidence that both approaches are complementary, mutually consistent 
ways of apprehending real information on molecules and their internal dynamics. 

Is it possible to approach this problem the other way, from experiment to the molecular PES? This is difficult 
to answer in general, because 'inversion' of spectra is not a very well-posed question mathematically. 
Nonetheless, using spectra to gain information on potentials has been pursued with great vigor. Even for 
diatomics, surprising new, mathematically powerful methods are being developed [64]. For polyatomics, it 
has been shown [65] how the effective spectroscopic Hamiltonian is a very useful way-station on the road 
from experiment back to the PES. This closes the circle, because it shows that one can go from an assumed 
PES to the effective Hamiltonian derived via perturbation theory; or take the opposite path from the 
experimentally obtained effective spectroscopic Hamiltonian to the PES. 


A1.2.15 POLYAD QUANTUM NUMBERS IN LARGER SYSTEMS 

We have seen that resonance couplings destroy quantum numbers as constants of the spectroscopic 
Hamiltonian. With both the Darling-Dennison stretch coupling and the Fermi stretch-bend coupling in H₂O, 
the individual quantum numbers $n_s$, $n_a$ and $n_b$ were destroyed, leaving the total polyad number ($n_s + n_a + n_b/2$) 
as the only remaining quantum number. We can ask: (1) Is there also a good polyad number in larger 
molecules? (2) If so, how robust is this quantum number? For example, how high in energy does it persist as 
the molecule approaches dissociation or a barrier to isomerization? (3) Is the total polyad number the only 
good vibrational quantum number left over after the resonances have been taken into account, or can there be 
others? 
It may be best to start with question (3). Given the set of resonance coupling operators found to be necessary 
to obtain a good fit of an experimental spectrum, it can be shown that the resonance couplings may be 
represented as vectors, which are not necessarily orthogonal. This leads to a simple but very powerful 
'resonance vector analysis' [62, 66, 67]. The original vector space of the normal mode coordinates has $N$ 
dimensions. The subspace spanned by the resonance vectors is the space of the vibrational quantum numbers 
that were destroyed; the complement of this space gives the quantities that remain as good quantum numbers. 
In general, there can be more than one such quantum number; we will encounter an example of this in C₂H₂, 
and see that it has important implications for the internal molecular dynamics. The set of good quantum 
numbers may contain one or more of the original individual normal mode quantum numbers; but in general, 
the good constants are combinations of the original quantum numbers. Examples of this are the polyad 
numbers that we have already encountered. 
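
A minimal numerical sketch of this idea, applied to the two H₂O resonances discussed earlier, is given below. It assumes that each resonance is encoded as the vector of changes it induces in the zero-order quantum numbers, and that the conserved combinations are the vectors orthogonal to all of them; the function name and the encoding are illustrative choices, not the notation of the cited references.

```python
import numpy as np

def conserved_combinations(resonance_vectors, tol=1e-10):
    """Basis of coefficient vectors orthogonal to every resonance vector.

    Each row of `resonance_vectors` lists the change in each mode's quantum
    number produced by one coupling term. The returned rows span the null
    space, obtained from a singular value decomposition.
    """
    r = np.atleast_2d(np.asarray(resonance_vectors, dtype=float))
    _, singular_values, vt = np.linalg.svd(r)
    rank = int(np.sum(singular_values > tol))
    return vt[rank:]

# Mode order: (symmetric stretch, antisymmetric stretch, bend).
darling_dennison = (2, -2, 0)    # two quanta exchanged between the stretches
fermi = (1, 0, -2)               # one stretch quantum exchanged for two bend quanta

null_space = conserved_combinations([darling_dennison, fermi])
polyad = null_space[0] / null_space[0][0]   # normalize the first coefficient to 1

print(len(null_space), polyad)
```

With three modes and two independent resonance vectors, a single conserved combination remains, with coefficients proportional to (1, 1, 1/2), which is the total polyad number (A1.2.15). With only the Darling-Dennison vector supplied, two independent combinations survive.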

The resonance vector analysis has been used to explore all of the questions raised above on the fate of the 
polyad numbers in larger molecules, the most thoroughly investigated case so far probably being C₂H₂. This 
molecule has been very extensively probed by absorption as well as stimulated emission pumping and 
dispersed fluorescence techniques [52, 53, 68, 69, 70 and 71]; the experimental spectra have been analysed in 
great detail and the fits to data have been carefully refined with each new experiment. A large number of 
resonance coupling operators has been found to be important, a good many more than the number of 
vibrational modes, which are seven in number: a symmetric C-H stretch $\nu_1$, antisymmetric C-H stretch $\nu_3$, 
C-C stretch $\nu_2$ and two bends $\nu_4$ and $\nu_5$, each doubly degenerate. Despite the plethora of couplings, the 
resonance vector analysis shows that the total polyad number 


$$N_{total} = (5n_1 + 3n_2 + 5n_3 + n_4 + n_5) \qquad \text{(A1.2.16)}$$

is a good quantum number up to at least about 15 000 cm$^{-1}$. This is at or near the barrier to the formation of 
the isomer vinylidene! (The coefficients 5, 3, 5, 1 and 1 in (A1.2.16) are close to the frequency ratios of the 
zero-order normal modes, which is to say, the polyad number satisfies a resonance condition, as in the earlier 
examples for H₂O.) The polyad number $N_{total}$ has been used with great effect to identify remarkable order 
[66, 67, 68, 69 and 70] in the spectrum: groups of levels can clearly be identified that belong to distinct 
polyads. Furthermore, there are additional 'polyad' constants (that is, quantum numbers that are 
combinations of the original quantum numbers) in addition to the total polyad number (A1.2.16). These 
additional constants have great significance for the molecular dynamics. They imply the existence of energy 
transfer pathways [67]. For example, in dispersed fluorescence spectra in which pure bending motion is 
excited, it has been found that with as many as 22 quanta of bend, all of the vibrational excitation remains in 
the bends on the time scale associated with dispersed fluorescence spectroscopy, with no energy transfer to 
the stretches [72]. 


A1.2.16 ISOMERIZATION SPECTRA 

We have spoken of the simplicity of the bifurcation analysis when the spectroscopic Hamiltonian possesses a 
good polyad number, and also of the persistence of the polyad number in C₂H₂ as the molecule approaches 
the barrier to isomerization to the species vinylidene. This suggests that it might be possible to use detailed 
spectra to probe the dynamics of a system undergoing an internal rearrangement. Several groups [56, 57] have 
been investigating the rearrangement of HCP to the configuration CPH, through analysis of the 'isomerization 
spectrum'. Many of the tools described in this section, including decoding the dynamics through analysis of 
bifurcations and associated spectral 
patterns, have come into play. The various approaches all implicate an 'isomerization mode' in the 
rearrangement process, quite distinct from any of the low-energy normal modes of the system. An explanation 
has been provided [57] in terms of the abrupt birth of the isomerization mode. This occurs at a bifurcation, in 
which the HCP molecule suddenly acquires a stable motion that takes it along the isomerization pathway, 
thereby altering the geometry and with it the rotational constant. 

It should be emphasized that isomerization is by no means the only process involving chemical reactions in 
which spectroscopy plays a key role as an experimental probe. A very exciting topic of recent interest is the 
observation and computation [73, 74] of the spectral properties of the transition state [6] — catching a 
molecule 'in the act' as it passes the point of no return from reactants to products. Furthermore, it has been 
discovered from spectroscopic observation [75] that molecules can have motions that are stable for long times 
even above the barrier to reaction. 


A1.2.17 BREAKDOWN OF THE POLYAD NUMBERS 

The polyad concept is evidently a very simple but powerful tool in the analysis and description of the internal 
dynamics of molecules. This is especially fortunate in larger molecules, where the intrinsic spectral 
complexity grows explosively with the number of atoms and degrees of freedom. Does the polyad number 
ever break down? Strictly speaking, it must: the polyad number is only an approximate property of a 
molecule's dynamics and spectrum. The actual molecular Hamiltonian contains resonance couplings of all 
forms, and these must destroy the polyad numbers at some level. This will show up by looking at high enough 
resolution at a spectrum which at lower resolution has a good polyad number. Levels will be observed of 
small intensity, which would be rigorously zero if the polyad numbers were exact. The fine detail in the 
spectrum corresponds to long-time dynamics, according to the time-energy uncertainty relation [49]. 
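
As a rough numerical illustration of this correspondence, the following sketch converts a spectral resolution in wavenumbers into the time scale it probes. It assumes the simple order-of-magnitude estimate τ ≈ 1/(2πcΔν); the function name and the sample resolutions are illustrative choices, not values taken from the text.

```python
import math

C_CM_PER_S = 2.99792458e10  # speed of light in cm/s (exact by SI definition)

def timescale_ps(resolution_cm1: float) -> float:
    """Approximate time scale (ps) probed by a spectral feature of given width (cm^-1).

    Uses tau ~ 1 / (2*pi*c*dnu), an order-of-magnitude estimate only.
    """
    if resolution_cm1 <= 0:
        raise ValueError("resolution must be positive")
    tau_s = 1.0 / (2.0 * math.pi * C_CM_PER_S * resolution_cm1)
    return tau_s * 1e12  # seconds -> picoseconds

for width in (10.0, 1.0, 0.01):  # illustrative resolutions in cm^-1
    print(f"{width:7.2f} cm^-1  ->  ~{timescale_ps(width):.2f} ps")
```

Under this estimate, structure resolved at about 1 cm^-1 reflects dynamics on a time scale of a few picoseconds, and each hundredfold gain in resolution extends the accessible time scale by the same factor.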

One reason the polyad-breaking couplings are of interest is that they govern the long-time intramolecular 
energy flow, which is important for theories of reaction dynamics. These are considered elsewhere in this 
Encyclopedia and in monographs [6] and will not be considered further here. The long-time energy flow may 
also be important for efforts at coherent control and for problems of energy flow from a molecule to a bath, 
such as a surrounding liquid. Both of these will be considered later. 

Several questions arise on the internal dynamics associated with the breakdown of the polyad number. We can 
only speculate in what follows, awaiting the illumination of future research. 

When the polyad number breaks down, as evidenced by the inclusion of polyad-breaking terms in the 
spectroscopic Hamiltonian, what is the residue left in the spectrum of the polyads as approximately conserved 
entities? There is already some indication [76] that the polyad organization of the spectrum will still be 
evident even with the inclusion of weak polyad-breaking terms. The identification of these polyad-breaking 
resonances will be a challenge, because each such resonance probably only couples a given polyad to a very 
small subset of 'dark' states of the molecule that lie outside those levels visible in the polyad spectrum. There 
will be a large number of such resonances, each of them coupling a polyad level to a small subset of dark 
levels. 

Another question is the nature of the changes in the classical dynamics that occur with the breakdown of the 
polyad number. In all likelihood there are further bifurcations. Apart from the identification of the individual 
polyad-breaking resonances, the bifurcation analysis itself presents new challenges. This is partly because 
with the breakdown of the polyad number, the great computational simplicity afforded by the 
block-diagonalization of the 
Hamiltonian is lost. Another problem is that the bifurcation analysis is exactly solvable only when a polyad 
number is present [45], so approximate methods will be needed. 
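
The computational simplification that is lost can be made concrete with a small sketch. It assumes, purely for illustration, a two-mode model with a 2:1 Fermi resonance between a stretch and a bend, for which the polyad number is P = 2n_s + n_b; the function names and the basis cutoff are hypothetical and are not taken from the Hamiltonians discussed in this section.

```python
from itertools import product

def polyad_blocks(n_max: int):
    """Group basis states |n_s, n_b> by the assumed polyad number P = 2*n_s + n_b.

    Each mode is truncated at n_max quanta, so high-P blocks are incomplete.
    """
    blocks = {}
    for n_s, n_b in product(range(n_max + 1), repeat=2):
        blocks.setdefault(2 * n_s + n_b, []).append((n_s, n_b))
    return blocks

def fermi_coupled(state_a, state_b) -> bool:
    """True if a 2:1 Fermi term (one stretch quantum <-> two bend quanta) links the states."""
    (ns_a, nb_a), (ns_b, nb_b) = state_a, state_b
    return (ns_b - ns_a, nb_b - nb_a) in {(1, -2), (-1, 2)}

blocks = polyad_blocks(n_max=6)
full_dim = sum(len(states) for states in blocks.values())
largest = max(len(states) for states in blocks.values())
print(f"full basis: {full_dim} states; largest polyad block: {largest} states")
```

Because the resonance term only connects states with the same P, each block can be diagonalized separately. A polyad-breaking term would couple different blocks and force a diagonalization in the full basis.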

When the polyad number breaks down, the bifurcation analysis takes on a new kind of interest. The 
approximate polyad number can be thought of as a type of 'bottleneck' to energy flow, which is restricted to 
the phase space of the individual polyad; the polyad breakdown leads to energy flow in the full phase space. 
We can think of the goal as the search for the 'energy transfer modes' of long-time energy flow processes in 
the molecule, another step beyond the current use of bifurcation analysis to find the natural anharmonic modes 
that emerge within the polyad approximation. 

The existence of the polyad number as a bottleneck to energy flow on short time scales is potentially 
important for efforts to control molecular reactivity using advanced laser techniques, discussed below in 
section A1.2.20. Efforts at control seek to intervene in the molecular dynamics to prevent the effects of 
widespread vibrational energy flow, the presence of which is one of the key assumptions of 
Rice-Ramsperger-Kassel-Marcus (RRKM) and other theories of reaction dynamics [6]. 

In connection with the energy transfer modes, an important question, to which we now turn, is the 
significance of classical chaos in the long-time energy flow process, in particular the relative importance of 
chaotic classical dynamics, versus classically forbidden processes involving 'dynamical tunnelling'. 


A1.2.18 CLASSICAL VERSUS NON-CLASSICAL EFFECTS 

To understand the internal molecular motions, we have placed great store in classical mechanics to obtain a 
picture of the dynamics of the molecule and to predict associated patterns that can be observed in quantum 
spectra. Of course, the classical picture is at best an imprecise image, because the molecular dynamics are 
intrinsically quantum mechanical. Nonetheless, the classical metaphor must surely possess a large kernel of 
truth. The classical structure brought out by the bifurcation analysis has accounted for real patterns seen in 
wavefunctions and also for patterns observed in spectra, such as the existence of local mode doublets, and the 
more subtle level-spacing patterns seen in connection with Fermi resonance spectra. 

However, we have also seen that some of the properties of quantum spectra are intrinsically non-classical, 
apart from the discreteness of quantum states and energy levels implied by the very existence of quanta. An 
example is the splitting of the local mode doublets, which was ascribed to dynamical tunnelling, i.e. processes 
which classically are forbidden. We can ask if non-classical effects are ubiquitous in spectra and, if so, are 
there manifestations accessible to observation other than those we have encountered so far? If there are such 
manifestations, it seems likely that they will constitute subtle peculiarities in spectral patterns, whose 
discernment and interpretation will be an important challenge. 

The question of non-classical manifestations is particularly important in view of the chaos that we have seen 
is present in the classical dynamics of a multimode system, such as a polyatomic molecule, with more than 
one resonance coupling. Chaotic classical dynamics is expected to introduce its own peculiarities into 
quantum spectra [29, 77]. In H₂O, we noted that chaotic regions of phase space are readily seen in the 
classical dynamics corresponding to the spectroscopic Hamiltonian. How important are the effects of chaos in 
the observed spectrum, and in the wavefunctions of the molecule? In H₂O, there were some states whose 
wavefunctions appeared very disordered, in the region of the phase space where the two resonances should 
both be manifesting their effects strongly. This is precisely 
where chaos should be most pronounced, and indeed this was observed to be the case [58]. However, close 
examination of the states in question by Keshavamurthy and Ezra [78] showed that the disorder in the 
quantum wavefunction was due not primarily to chaos, but to dynamical tunnelling, the non-classical effect 
invoked earlier to explain the splitting of local mode doublets. 

This demonstrated importance of the non-classical processes in systems with intact polyad numbers prompts 
us to consider again the breakdown of the polyad number. Will it be associated mainly with chaotic classical 
diffusion, or non-classical effects? It has been suggested [47] that high-resolution structure in spectra, which 
we have said is one of the manifestations of the polyad breakdown, may be predominantly due to 
non-classical, dynamical tunnelling processes, rather than chaotic diffusion. Independent, indirect support comes 
from the observation that energy flow from vibrationally excited diatomic molecules in a liquid bath is 
predominantly due to non-classical effects, to the extent of several orders of magnitude [79]. Whether 
dynamical tunnelling is a far more important energy transfer mechanism within molecules than is classical 
chaos is an important question for the future exploration of the interface of quantum and classical dynamics. 

It should be emphasized that the existence of 'energy transfer modes' hypothesized earlier with the polyad 
breakdown is completely consistent with the energy transfer being due to non-classical, dynamical tunnelling 
processes. This is evident from the observation above that the disorder in the H₂O spectrum is attributable to 
non-classical effects which nonetheless are accompaniments of classical bifurcations. 

The general question of the spectral manifestations of classical chaos and of non-classical processes, and their 
interplay in complex quantum systems, is a profound subject worthy of great current and future interest. 
Molecular spectra can provide an immensely important laboratory for the exploration of these questions. 
Molecules provide all the necessary elements: a mixture of regular and chaotic classical motion, with ample 
complexity for the salient phenomena to make their presence known and yet sufficient simplicity and control 
in the number of degrees of freedom to yield intelligible answers. In particular, the fantastic simplification 
afforded by the polyad constants, together with their gradual breakdown, may well make the spectroscopic 
study of internal molecular motions an ideal arena for a fundamental investigation of the quantum-classical 
correspondence. 


A1.2.19 MOLECULES IN CONDENSED PHASE 

So far we have considered internal motions mostly of isolated molecules, not interacting with an environment. 
This condition will be approximately met in a dilute gas. However, many of the issues raised may be of 
relevance in processes where the molecule is not isolated at all. An example already briefly noted is the 
transfer of vibrational energy from a molecule to a surrounding bath, for example a liquid. It has been found 
[79] that when a diatomic molecule such as O₂ is vibrationally excited in a bath of liquid oxygen, the transfer 
of vibrational energy is extremely slow. This is due to the extreme mismatch between the energy of an O₂ 
vibrational quantum, and the far lower energy of the bath's phonon modes — vibrations involving large 
numbers of the bath molecules oscillating together. Classically, the energy transfer is practically non-existent; 
semiclassical approximations, however, show that quantum effects increase the rate by orders of magnitude. 

The investigation of energy transfer in polyatomic molecules immersed in a bath is just beginning. One issue 
has to do with energy flow from the molecule to the bath. Another issue is the effect of the bath on energy 
flow processes within the molecule. Recent experimental work [80] using ultrafast laser probes of ClO₂ 
immersed in solvents points to the importance of bifurcations within the solute triatomic for the understanding 
of energy flow both within and from the molecule. 

For a polyatomic, there are many questions on the role of the polyad number in energy flow from the 
molecule to the bath. Does polyad number conservation in the isolated molecule inhibit energy flow to the 
bath? Is polyad number breaking a facilitator or even a prerequisite for energy flow? Finally, does the energy 
flow to the bath increase the polyad number breaking in the molecule? One can only speculate until these 
questions become accessible to future research. 


A1.2.20 LASER CONTROL OF MOLECULES 

So far, we have talked about the internal motions of molecules which are exhibiting their 'natural' behaviour, 
either isolated in the gas phase or surrounded by a bath in a condensed phase. These natural motions are 
inferred from carefully designed spectroscopic experiments that are sufficiently mild that they simply probe 
what the molecule does when left to 'follow its own lights'. However, there is also a great deal of effort 
toward using high-intensity, carefully sculpted laser pulses which are anything but mild, in order to control 
the dynamics of molecules. In this quest, what role will be played by knowledge of their natural motions? 

Surprisingly, a possible answer may be 'not much of a role at all'. One promising approach [81] using 
coherent light sources seeks to have the apparatus 'learn' how to control the molecule without knowing much 
at all about its internal properties in advance. Instead, a 'target' outcome is selected, and a large number of 
automated experiments performed, in which the control apparatus learns how to achieve the desired goal by 
rationally programmed trial and error in tailoring coherent light sources. It might not be necessary to learn 
much at all about the molecule's dynamics before, during or after, to make the control process work, even 
though the control apparatus might seem to all appearances to be following a cunning path to achieve its ends. 
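
The flavour of such a learning loop can be conveyed by a toy sketch. This is not the procedure of [81]: the 'experiment' is replaced by a hypothetical function with a hidden optimum, the pulse is reduced to four arbitrary parameters, and the search is a simple greedy trial-and-error scheme chosen only for brevity.

```python
import random

def measured_yield(pulse):
    """Stand-in for an experimental signal; a real apparatus would measure this.

    Hypothetical toy objective: largest when the pulse matches a hidden target.
    """
    hidden_target = [0.3, -0.7, 0.5, 0.1]
    return -sum((p - t) ** 2 for p, t in zip(pulse, hidden_target))

def learn_pulse(n_params=4, n_trials=2000, step=0.1, seed=0):
    """Greedy trial and error: keep a random change only if the yield improves."""
    rng = random.Random(seed)
    best = [0.0] * n_params
    best_yield = measured_yield(best)
    for _ in range(n_trials):
        trial = [p + rng.gauss(0.0, step) for p in best]
        y = measured_yield(trial)
        if y > best_yield:  # the loop never inspects why the change helped
            best, best_yield = trial, y
    return best, best_yield

pulse, final_yield = learn_pulse()
print(f"final yield: {final_yield:.4f}")
```

The loop uses only the measured outcome, never a model of the system, which is the sense in which the apparatus can succeed while learning nothing explicit about the molecule's dynamics.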

It can very well be objected that such a hit-or-miss approach, no matter how cleverly designed, is not likely to 
get very far in controlling polyatomic molecules with more than a very small number of atoms — in fact one 
will do much better by harnessing knowledge of the natural internal motions of molecules in tandem with the 
process of external control. The counter-argument can be made that in the trial and error approach, one will 
hit on the 'natural' way of controlling the molecule, even if one starts out with a method which at first tries 
nothing but brute force, even if one remains resolutely ignorant of why the molecule is responding to the 
evolving control procedure. Of course, if a good way is found to control the molecule, a retrospective 
explanation of how and why it worked almost certainly must invoke the natural motions of the molecule, 
about which much will perhaps have been learned along the way in implementing the process of control. 

The view of this author is that knowledge of the internal molecular motions, perhaps as outlined in this 
chapter, is likely to be important in achieving successful control, in approaches that make use of coherent light 
sources and quantum mechanical coherence. However, at this point, opinions on these issues may not be much 
more than speculation. 

There are also approaches [82, 83 and 84] to control that have had marked success and which do not rely on 
quantum mechanical coherence. These approaches typically rely explicitly on a knowledge of the internal 
molecular dynamics, both in the design of the experiment and in the achievement of control. So far, these 
approaches have exploited only implicitly the very simplest types of bifurcation phenomena, such as the 
transition from local to normal stretch modes. If further success is achieved along these lines in larger 
molecules, it seems likely that deliberate knowledge and exploitation of more complicated bifurcation 
phenomena will be a matter of necessity. 

As discussed in section A1.2.17, the existence of the approximate polyad numbers, corresponding to 
short-time bottlenecks to energy flow, could be very important in efforts for laser control, apart from the separate 
question of bifurcation phenomena. 

Another aspect of laser control of molecular dynamics is the use of control techniques to probe the internal 
motions of molecules. A full account of this topic is far beyond the scope of this section, but one very 
interesting case in point has important relations to other branches of physics and mathematics. This is the 
phenomenon of 'geometric phases', which are closely related to gauge theories. The latter were originally 
introduced into quantum physics from the classical theory of electromagnetism by Weyl and others (see [85]). 
Quantum field theories with generalizations of the electromagnetic gauge invariance were developed in the 
1950s and have since come to play a paramount role in the theory of elementary particles [86, 87]. Geometric 
phases were shown to have directly observable effects in quantum phenomena such as the Aharonov-Bohm 
effect [88]. It was later recognized that these phases are a general phenomenon in quantum systems [89]. One 
of the first concrete examples was pointed out [90] in molecular systems involving the coupling of rotation 
and vibration. A very systematic exposition of geometric phases and gauge ideas in molecular systems was 
presented in [91]. The possibility of the direct optical observation of the effects of the geometric phases in the 
time domain through coherent laser excitations has recently been explored [92]. 


A1.2.21 LARGER MOLECULES 

This section has focused mainly on the internal dynamics of small molecules, where a coherent picture of the 
detailed internal motion has been emerging from intense efforts of many theoretical and experimental 
workers. A natural question is whether these kinds of issues will be important in the dynamics of larger 
molecules, and whether their investigation at the same level of detail will be profitable or tractable. 

There will probably be some similarities, but also some fundamental differences. We have mainly considered 
small molecules with relatively rigid structures, in which the vibrational motions, although much different 
from the low-energy, near-harmonic normal modes, are nonetheless of relatively small amplitude and close to 
an equilibrium structure. (An important exception is the isomerization spectroscopy considered earlier, to 
which we shall return shortly.) 

Molecules larger than those considered so far are formed by linking together several smaller components. A 
new kind of dynamics typical of these systems is already seen in a molecule such as C₂H₆, in which there is 
hindered rotation of the two methyl groups. Systems with hindered internal rotation have been studied in great 
depth [93], but there are still many unanswered questions. It seems likely that semiclassical techniques, using 
bifurcation analysis, could be brought to bear on these systems with great benefit. 

The dynamics begin to take on a qualitatively different nature as the number of components, capable of 
mutual hindered rotation, starts to become only a little larger than in C₂H₆. The reason is that large-amplitude, very 
flexible twisting motions, such as those that start to be seen in a small polymer chain, become very important. 
These large scale 'wiggly motions' define a new class of dynamics and associated frequency scale as a 
characteristic internal motion of the system. 

A hint that bifurcation techniques should be a powerful aid to the understanding of these problems comes 
from the example already considered in HCP isomerization [57]. Here the bifurcation techniques have given 
dramatic insights into the motions that stray very far from the equilibrium structure, in fact approaching the 
top of a barrier to the rearrangement to a different molecular isomer. It seems likely that similar approaches 
will be invaluable for molecules with internal rotors, including flexible polymer systems, but with an increase 
in complexity corresponding to the larger size of the systems. Probably, techniques to separate out the 
characteristic large-amplitude flexible motions from faster high-frequency vibrations, such as those of the 
individual bonds, will be necessary to unlock, along with the tools of the bifurcation analysis, the knowledge 
of the detailed anharmonic motions encrypted in the spectrum. This separation of time scales would be similar 
in some ways to the Born-Oppenheimer separability of nuclear and electronic motion. 

Another class of problems in larger systems, also related to isomerization, is the question of large-amplitude 
motions in clusters of atoms and molecules. The phenomena of internal rearrangements, including processes 
akin to 'melting' and the seeking of minima on potential surfaces of very high dimensionality (due to the 
number of particles), have been extensively investigated [94]. The question of the usefulness of bifurcation 
techniques and the dynamical nature of large-amplitude natural motions in these systems has yet to be 
explored. These problems of large-amplitude motions and the seeking of potential minima in large clusters are 
conceptually related to the problem of protein folding, to which we now turn. 


A1.2.22 PROTEIN FOLDING 

An example of a kind of extreme challenge in the complexity of internal molecular dynamics comes with very 
complicated biological macromolecules. One of the major classes of these is proteins, very long biopolymers 
consisting of large numbers of amino acid residues [95]. They are very important in biological systems 
because they are the output of the translation of the genetic code: the DNA codes for the sequences of amino 
acid residues for each individual protein produced by the organism. A good sequence, i.e. one which forms a 
biologically useful protein, is one which folds to a more-or-less unique 'native' three-dimensional tertiary 
structure. (The sequence itself is the primary structure; subunits within the tertiary structure, consisting of 
chains of residues, fold to well defined secondary structures, which themselves are folded into the tertiary 
structure.) An outstanding problem, still very far from a complete understanding, is the connection between 
the sequence and the specific native structure, and even the prior question whether a given sequence has a 
reliable native structure at all. For sequences which do fold up into a unique structure, it is not yet possible to 
reliably predict what the structure will be, or what it is about the sequence that makes it a good folder. A 
solution of the sequence-structure problem would be very important, because it would make it possible to 
design sequences in the laboratory to fold to a definite, predictable structure, which then could be tailored for 
biological activity. A related question is the kinetic mechanism by which a good protein folds to its native 
structure. 

Both the structural and kinetic aspects of the protein-folding problem are complicated by the fact that folding 
takes place within a bath of water molecules. In fact, hydrophobic interactions are almost certainly crucial for 
both the relation of the sequence and the native structure, and the process by which a good sequence folds to 
its native structure. 

It is presently unknown whether the kind of detailed dynamical analysis of the natural motions of molecules 
outlined in this section will be useful for a problem as complicated as that of protein folding. The likely 
applicability of such methods to systems with several internal rotors strung together, and the incipient interest 
in bifurcation phenomena of small molecules immersed in a bath [80], suggests that dynamical analysis might 
also be useful for the much larger structures in proteins. In a protein, most of the molecular motion may be 
essentially irrelevant, i.e. the high-frequency, small-amplitude vibrations of the backbone of the amino acid 
sequence, and, also, probably much of the localized large-amplitude 'wiggly' motion. It is likely that there is a 
far smaller number of relevant large-amplitude, low-frequency motions that are crucial to the folding process. 
It will be of great interest to discover if techniques of dynamical systems such as bifurcation analysis can be 
used to reveal the 'folding modes' of proteins. For this to work, account must be taken of the complication of 
the bath of water molecules in which the folding process takes place. This introduces effects such as friction, 
for which there is little or no experience at present in applying bifurcation techniques in molecular systems. 
Proteins themselves interact with other proteins and with nucleic acids in biological processes of every 
conceivable kind considered at the molecular level. 


A1.2.23 OUTLOOK 

Knowledge of internal molecular motions became a serious quest with Boyle and Newton, at the very dawn of 
modern natural science. However, real progress only became possible with the advent of quantum theory in 
the 20th century. The study of internal molecular motion for most of the century was concerned primarily 
with molecules near their equilibrium configuration on the PES. This gave an enormous amount of immensely 
valuable information, especially on the structural properties of molecules. 

In recent years, especially the past two decades, the focus has changed dramatically to the study of 
highly-excited states. This came about because of a conjunction of powerful influences, often in mutually productive 
interaction with molecular science. Perhaps the first was the advent of lasers as revolutionary light sources for 
the probing of molecules. Coherent light of unprecedented intensities and spectral purity became available for 
studies in the traditional frequency domain of spectroscopy. This allowed previously inaccessible states of 
molecules to be reached, with new levels of resolution and detail. Later, the development of ultrafast laser 
pulses opened up the window of the ultrafast time domain as a spectroscopic complement to the new richness 
in the frequency domain. At the same time, revolutionary information technology made it possible to apply 
highly-sophisticated analytical methods, including new pattern recognition techniques, to process the wealth 
of new experimental information. The computational revolution also made possible the accurate investigation 
of highly-excited regions of molecular potential surfaces by means of quantum chemistry calculations. 
Finally, new mathematical developments in the study of nonlinear classical dynamics came to be appreciated 
by molecular scientists, with applications such as the bifurcation approaches stressed in this section. 

With these radical advances in experimental technology, computational ability to handle complex systems, 
and new theoretical ideas, the kind of information being sought about molecules has undergone an equally 
profound change. Formerly, spectroscopic investigation, even of vibrations and rotations, had focused 
primarily on structural information. Now there is a marked drive toward dynamical information, including 
problems of energy flow, and internal molecular rearrangement. As emphasized in this section, a tremendous 
impetus to this was the 
recognition that other kinds of motion, such as local modes, could be just as important as the low-energy 
normal modes, in the understanding of the internal dynamics of highly-excited states. Ultrafast pulsed lasers 
have played a major role in these dynamical investigations. There is also a growing awareness of the immense 
potential for frequency domain spectroscopy to yield information on ultrafast processes in the time domain. 
This involves sophisticated measurements and data analysis of the very complex spectra of excited states; and 
equally sophisticated theoretical analysis to unlock the dynamical information encoded in the spectra. One of 
the primary tools is the bifurcation analysis of phenomenological Hamiltonians used directly to model 
experimental spectra. This gives information on the birth of new anharmonic motions in bifurcations of the 
low-energy normal modes. This kind of analysis is yielding information of startling detail about the internal 
molecular dynamics of high-energy molecules, including molecules undergoing isomerization. The 
ramifications are beginning to be explored for molecules in condensed phase. Here, ultrafast time-domain 
laser spectroscopy is usually necessary; but the requisite knowledge of internal molecular dynamics at the 
level of bifurcation analysis must be obtained from frequency-domain, gas phase experiments. Thus, a fruitful 
interplay is starting between gas and condensed phase experiments, and probes using sophisticated time- and 
frequency-domain techniques. Extension to much larger systems such as proteins is an exciting, largely 
unexplored future prospect. The interplay of research on internal molecular dynamics at the levels of small 
molecules, intermediate-size molecules, such as small polymer chains, and the hyper-complex scale of 
biological macromolecules is a frontier area of chemistry which surely will yield fascinating insights and 
discoveries for a long time to come. 


A1.3 Quantum mechanics of condensed phases 

James R Chelikowsky 

A1.3.1 INTRODUCTION 

Traditionally one categorizes matter by phases such as gases, liquids and solids. Chemistry is usually 
concerned with matter in the gas and liquid phases, whereas physics is concerned with the solid phase. 
However, this distinction is not well defined: often chemists are concerned with the solid state and reactions 
between solid-state phases, and physicists often study atoms and molecular systems in the gas phase. The term 
condensed phases usually encompasses both the liquid state and the solid state, but not the gas state. In this 
section, the emphasis will be placed on the solid state with a brief discussion of liquids. 

The solid phase of matter offers a very different environment to examine the chemical bond than does a gas or 
liquid [1, 2, 3, 4 and 5]. The obvious difference involves describing the atomic positions. In a solid state, one 
can often describe atomic positions by a static configuration, whereas for liquid and gas phases this is not 
possible. The properties of the liquids and gases can be characterized only by considering some time-averaged 
ensemble. This difference between phases offers advantages in describing the solid phase, especially for 
crystalline matter. Crystals are characterized by a periodic symmetry that results in a system occupying all 
space [6]. Periodic, or translational, symmetry of crystalline phases greatly simplifies discussions of the solid 
state since knowledge of the atomic structure within a fundamental 'subunit' of the crystal, called the unit cell, 
is sufficient to describe the entire system encompassing all space. For example, if one is interested in the 
spatial distribution of electrons in a crystal, it is sufficient to know what this distribution is within a unit cell. 
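
The way a unit cell determines the whole crystal can be sketched in a few lines. The example below assumes a body-centred cubic cell of unit edge length purely for illustration; the function name `replicate` and its arguments are hypothetical.

```python
from itertools import product

def replicate(basis, lattice, reps):
    """Generate Cartesian atom positions by translating a unit-cell basis.

    basis   : fractional coordinates of the atoms in one unit cell
    lattice : three lattice vectors (rows), in arbitrary length units
    reps    : number of cells along each lattice vector
    """
    positions = []
    for i, j, k in product(range(reps), repeat=3):
        for frac in basis:
            shifted = (frac[0] + i, frac[1] + j, frac[2] + k)
            positions.append(tuple(
                sum(shifted[m] * lattice[m][axis] for m in range(3))
                for axis in range(3)
            ))
    return positions

# Illustrative body-centred cubic cell with unit edge length (an assumed example).
bcc_basis = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
cubic = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
atoms = replicate(bcc_basis, cubic, reps=3)
print(len(atoms), "atoms generated from a 2-atom cell")
```

Only the two basis positions and three lattice vectors are stored; every other atom follows by translation, which is the sense in which the unit cell describes the entire system.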

A related advantage of studying crystalline matter is that one can have symmetry-related operations that 
greatly expedite the discussion of a chemical bond. For example, in an elemental crystal of diamond, all the 
chemical bonds are equivalent. There are no terminating bonds and the characterization of one bond is 
sufficient to understand the entire system. If one were to know the binding energy or polarizability associated 
with one bond, then properties of the diamond crystal associated with all the bonds could be extracted. In 
contrast, molecular systems often contain different bonds and always have atoms at the boundary between the 
molecule and the vacuum. 

Since solids do not exist as truly infinite systems, there are issues related to their termination (i.e. surfaces). 
However, in most cases, the existence of a surface does not strongly affect the properties of the crystal as a 
whole. The number of atoms in the interior of a cluster scales as the cube of the size of the specimen while the 
number of surface atoms scales as the square of the size of the specimen. For a sample of macroscopic size, the 
number of interior atoms vastly exceeds the number of atoms at the surface. On the other hand, there are 
interesting properties of the surface of condensed matter systems that have no analogue in atomic or 
molecular systems. For example, electronic states can exist that 'trap' electrons at the interface between a 
solid and the vacuum [1]. 
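
The cube-versus-square scaling argument above can be made concrete with a short sketch. It assumes, purely for illustration, a simple-cubic block with n atoms along each edge (the function name and the block geometry are not from the text); an atom is counted as a surface atom if it lies on any face of the block.

```python
def surface_fraction(n):
    """Fraction of atoms lying on the surface of an n x n x n simple-cubic block.

    Illustrative assumption: atoms sit on a simple-cubic grid, so the interior
    is an (n - 2)^3 block and everything else is surface.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    total = n ** 3
    interior = max(n - 2, 0) ** 3
    return (total - interior) / total


if __name__ == "__main__":
    # 10 atoms per edge (a small cluster) up to 10^8 per edge (a macroscopic sample)
    for n in (10, 1_000, 100_000_000):
        print(f"n = {n:>11,d}   surface fraction = {surface_fraction(n):.3e}")
```

For large n the fraction approaches 6/n, which is why the surface contributes negligibly to bulk properties of a macroscopic specimen while dominating those of a small cluster.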

Issues associated with order occupy a large area of study for crystalline matter [1, 7, 8]. For nearly perfect 
crystals, one can have systems with defects such as point defects and extended defects such as dislocations and 
grain boundaries. These defects occur in the growth process or can be mechanically induced. In contrast to 
molecular systems, which can be characterized by a 'perfect' structure, solids always have defects. 
Individual atoms that are missing from the ideal crystal structure, or extra atoms not needed to characterize the 
ideal crystal, are called point defects. The missing atoms correspond to vacancies; additional atoms are called 
interstitials. Extended defects are entire planes of atoms or interfaces that do not correspond to those of the 
ideal crystal. For example, edge dislocations occur when an extra half-plane of atoms is inserted in a perfect 
crystal and grain boundaries occur when a solid possesses regions of crystalline matter that have different 
structural orientations. In general, if a solid has no long-range order then one considers the phase to be an 
amorphous solid. The idea of atomic order and 'order parameters' is not usually considered for molecular 
systems, although for certain systems such as long molecular chains of atoms one might invoke a similar 
concept. 

Another issue that distinguishes solids from atomic or molecular systems is the role of controlled defects or 
impurities. Often a pure, elemental crystal is not of great interest for technological applications; however, 
crystals with controlled additions of impurities are of great interest. The alteration of electronic properties 
with defects can be dramatic, involving changes in electrical conductivity by orders of magnitude. As an 
example, the addition of one boron atom for every 10^5 silicon atoms increases the conductivity of pure silicon 
by a factor of 10^3 at room temperature [1]. Much of the electronic materials revolution is based on capitalizing 
on the dramatic changes in electronic properties via the controlled addition of electronically active dopants. 

Of course, condensed phases also exhibit interesting physical properties such as electronic, magnetic, and 
mechanical phenomena that are not observed in the gas or liquid phase. Conductivity issues are generally not 
studied in isolated molecular species, but are actively examined in solids. Recent work in solids has focused 
on dramatic conductivity changes in superconducting solids. Superconducting solids have resistivities that are 
identically zero below some transition temperature [1, 9, 10]. These systems cannot be characterized by 
interactions over a few atomic species. Rather, the phenomenon involves a collective mode characterized by a 
phase representative of the entire solid. 


A1.3.2 MANY-BODY WAVEFUNCTIONS IN CONDENSED PHASES 

One of the most significant achievements of the twentieth century is the description of the quantum 
mechanical laws that govern the properties of matter. It is relatively easy to write down the Hamiltonian for 
interacting fermions. Obtaining a solution to the problem that is sufficient to make predictions is another 
matter. 

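Let us consider $N$ nucleons of charge $Z_n$ at positions $\{\mathbf{R}_n\}$ for $n = 1, \ldots, N$ and $M$ electrons at 
positions $\{\mathbf{r}_i\}$ for $i = 1, \ldots, M$. This is shown schematically in figure A1.3.1. The Hamiltonian for this 
system in its simplest form can be written as 

\[
\mathcal{H}(\mathbf{R}_1, \mathbf{R}_2, \mathbf{R}_3, \ldots; \mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)
= \sum_{n=1}^{N} \frac{-\hbar^2 \nabla_n^2}{2M_n}
+ \frac{1}{2} \sum_{n \neq n'} \frac{Z_n Z_{n'} e^2}{|\mathbf{R}_n - \mathbf{R}_{n'}|}
+ \sum_{i=1}^{M} \frac{-\hbar^2 \nabla_i^2}{2m}
- \sum_{n=1}^{N} \sum_{i=1}^{M} \frac{Z_n e^2}{|\mathbf{R}_n - \mathbf{r}_i|}
+ \frac{1}{2} \sum_{i \neq j} \frac{e^2}{|\mathbf{r}_i - \mathbf{r}_j|}.
\tag{A1.3.1}
\]

$M_n$ is the mass of the $n$th nucleon, $\hbar$ is Planck's constant divided by $2\pi$, and $m$ is the mass of the electron. This 
expression omits some terms such as those involving relativistic interactions, but captures the essential 
features for most condensed matter phases. 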



Figure A1.3.1. Atomic and electronic coordinates. The electrons are illustrated by filled circles; the nuclei by 
open circles. 

Using the Hamiltonian in equation A1.3.1, the quantum mechanical equation known as the Schrödinger 
equation for the electronic structure of the system can be written as 

\[
\mathcal{H}(\mathbf{R}_1, \mathbf{R}_2, \mathbf{R}_3, \ldots; \mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)\,
\Psi(\mathbf{R}_1, \mathbf{R}_2, \mathbf{R}_3, \ldots; \mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)
= E\, \Psi(\mathbf{R}_1, \mathbf{R}_2, \mathbf{R}_3, \ldots; \mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)
\tag{A1.3.2}
\]

where $E$ is the total electronic energy of the system, and $\Psi$ is the many-body wavefunction. In the early part 
of the twentieth century, it was recognized that this equation provided the means of solving for the electronic 
and nuclear degrees of freedom. Using the variational principle, which states that an approximate 
wavefunction will always yield an energy that lies at or above the true ground-state energy, one had an 
equation and a method to test the solution. One can estimate the energy from 

\[
E = \frac{\int \Psi^* \mathcal{H} \Psi \, d^3R_1\, d^3R_2\, d^3R_3 \ldots d^3r_1\, d^3r_2\, d^3r_3 \ldots}
         {\int \Psi^* \Psi \, d^3R_1\, d^3R_2\, d^3R_3 \ldots d^3r_1\, d^3r_2\, d^3r_3 \ldots}.
\tag{A1.3.3}
\]

Solving equation A1.3.2 for anything more complex than a few particles becomes problematic even with the 
most modern computers. Obtaining an approximate solution for condensed matter systems is difficult, but 
considerable progress has been made since the advent of digital computers. Several highly successful 
approximations have been made to solve for the ground-state energy. The nature of the approximations used is 
to remove as many degrees of freedom from the system as possible. 
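
As an illustration of how the variational estimate works in the simplest possible case, the sketch below applies it to a single electron bound to a fixed proton (the hydrogen atom), which is not an example taken from the text. It assumes Hartree atomic units and a Gaussian trial function exp(-alpha r^2), for which the energy quotient has the textbook closed form E(alpha) = 3 alpha/2 - 2 sqrt(2 alpha/pi); the function and variable names are hypothetical.

```python
import numpy as np


def gaussian_trial_energy(alpha):
    """Energy quotient (hartree) of the trial function exp(-alpha * r**2) for hydrogen.

    Assumption: closed-form expectation values for a Gaussian trial function,
    kinetic = 3*alpha/2 and potential = -2*sqrt(2*alpha/pi), in atomic units.
    """
    kinetic = 1.5 * alpha
    potential = -2.0 * np.sqrt(2.0 * alpha / np.pi)
    return kinetic + potential


# Scan the single variational parameter and keep the lowest energy found.
alphas = np.linspace(0.01, 2.0, 20001)
energies = gaussian_trial_energy(alphas)
best_index = np.argmin(energies)
best_alpha = alphas[best_index]
best_energy = energies[best_index]

EXACT_GROUND_STATE = -0.5  # hartree, exact hydrogen ground-state energy

if __name__ == "__main__":
    print(f"best alpha       = {best_alpha:.4f}")
    print(f"variational E    = {best_energy:.5f} hartree")
    print(f"exact E          = {EXACT_GROUND_STATE:.5f} hartree")
```

However the parameter is chosen, the trial energy stays above the exact value; the best Gaussian reaches -4/(3 pi) hartree, and a more flexible trial function would be needed to close the remaining gap.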

One common approximation is to separate the nuclear and electronic degrees of freedom. Since the nuclei are 
considerably more massive than the electrons, it can be assumed that the electrons will respond 
'instantaneously' to the nuclear coordinates. This approximation is called the Born-Oppenheimer or adiabatic 
approximation. It allows one to treat the nuclear coordinates as classical parameters. For most condensed 
matter systems, this assumption is highly accurate [ 11 , 12 ]. 


A1.3.2.1 THE HARTREE APPROXIMATION 

Another common approximation is to construct a specific form for the many-body wavefunction. If one can 
obtain an accurate estimate for the wavefunction, then, via the variational principle, a more accurate estimate 
for the energy will emerge. The most difficult part of this exercise is to use physical intuition to define a trial 
wavefunction. 

One can utilize some very simple cases to illustrate this approach. Suppose one considers a solution for 
non-interacting electrons: i.e. in equation A1.3.1 the last term in the Hamiltonian is ignored. In this limit, it is 
possible to write the many-body Hamiltonian as a sum of independent one-electron Hamiltonians. Using the 
adiabatic approximation, the electronic part of the Hamiltonian becomes 

\[
\mathcal{H}_{\mathrm{el}}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)
= \sum_{i=1}^{M} \frac{-\hbar^2 \nabla_i^2}{2m}
- \sum_{n=1}^{N} \sum_{i=1}^{M} \frac{Z_n e^2}{|\mathbf{R}_n - \mathbf{r}_i|}.
\tag{A1.3.4}
\]

Let us define a nuclear potential, $V_N$, which the $i$th electron sees as 

\[
V_N(\mathbf{r}_i) = -\sum_{n=1}^{N} \frac{Z_n e^2}{|\mathbf{R}_n - \mathbf{r}_i|}.
\tag{A1.3.5}
\]

One can now rewrite a simplified Schrödinger equation as 

\[
\mathcal{H}_{\mathrm{el}}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)\, \psi(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)
= \sum_{i=1}^{M} H^i\, \psi(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)
\tag{A1.3.6}
\]

where the Hamiltonian is now defined for the $i$th electron as 

\[
H^i = \frac{-\hbar^2 \nabla_i^2}{2m} + V_N(\mathbf{r}_i).
\tag{A1.3.7}
\]

For this simple Hamiltonian, let us write the many-body wavefunction as 

\[
\psi(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots) = \phi_1(\mathbf{r}_1)\, \phi_2(\mathbf{r}_2)\, \phi_3(\mathbf{r}_3) \cdots.
\tag{A1.3.8}
\]

The $\phi_i(\mathbf{r})$ orbitals can be determined from a 'one-electron' Hamiltonian 

\[
H^i \phi_i(\mathbf{r}) = \left( \frac{-\hbar^2 \nabla^2}{2m} + V_N(\mathbf{r}) \right) \phi_i(\mathbf{r}) = E_i\, \phi_i(\mathbf{r}).
\tag{A1.3.9}
\]

The index $i$ for the orbital $\phi_i(\mathbf{r})$ can be taken to include the spin of the electron plus any other relevant 
quantum numbers. The index $i$ runs over the number of electrons, each electron being assigned a unique set of 
quantum numbers. This type of Schrödinger equation can be easily solved for fairly complex condensed matter 
systems. The many-body wavefunction in equation A1.3.8 is known as the Hartree wavefunction. If one uses 
this form of the wavefunction as an approximation to solve the Hamiltonian including the electron-electron 
interactions, this is known as the Hartree approximation. By ignoring the electron-electron terms, the Hartree 
approximation simply reflects the electrons independently moving in the nuclear potential. The total energy of 
the system in this case is simply the sum of the eigenvalues, $E_i$. 
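
The statement that a product of one-electron orbitals solves the non-interacting problem, with a total energy equal to the sum of the orbital eigenvalues, can be checked numerically. The following is a minimal sketch under stated assumptions: two particles in one dimension, a finite-difference kinetic operator in atomic units, and a zero potential inside a hard-wall box (all choices made here for illustration, not taken from the text).

```python
import numpy as np


def one_electron_hamiltonian(n_points=30, length=1.0):
    """Finite-difference -1/2 d^2/dx^2 on a hard-wall box (illustrative model, V = 0)."""
    spacing = length / (n_points + 1)
    diagonal = np.full(n_points, 1.0 / spacing**2)
    off_diagonal = np.full(n_points - 1, -0.5 / spacing**2)
    return np.diag(diagonal) + np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1)


h_single = one_electron_hamiltonian()
orbital_energies, orbitals = np.linalg.eigh(h_single)

# Two non-interacting particles: H = H(1) + H(2), built with Kronecker products.
identity = np.eye(h_single.shape[0])
h_total = np.kron(h_single, identity) + np.kron(identity, h_single)

# Hartree-type product of orbital a (particle 1) and orbital b (particle 2).
a, b = 0, 1
product_state = np.kron(orbitals[:, a], orbitals[:, b])

expected_energy = orbital_energies[a] + orbital_energies[b]
residual = np.linalg.norm(h_total @ product_state - expected_energy * product_state)
rayleigh_energy = product_state @ h_total @ product_state

if __name__ == "__main__":
    print(f"E_a + E_b           = {expected_energy:.6f}")
    print(f"<psi|H|psi>         = {rayleigh_energy:.6f}")
    print(f"|H psi - E psi|     = {residual:.2e}")
```

The vanishing residual shows that the product state is an exact eigenstate of the summed Hamiltonian. Once an electron-electron term is added, the Hamiltonian no longer separates in this way and the product form becomes an approximation.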

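To obtain a realistic Hamiltonian, the electron-electron interactions must be reinstated in equation A1.3.6: 

\[
\mathcal{H}_{\mathrm{el}}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)\, \psi(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)
= \sum_{i=1}^{M} \left( H^i + \frac{1}{2} \sum_{j \neq i} \frac{e^2}{|\mathbf{r}_i - \mathbf{r}_j|} \right) \psi(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots).
\tag{A1.3.10}
\]

In this case, the individual orbitals, $\phi_i(\mathbf{r})$, can be determined by minimizing the total energy as per equation 
A1.3.3, with the constraint that the wavefunction be normalized. This minimization procedure results in the 
following Hartree equation: 

\[
H^i \phi_i(\mathbf{r}) = \left( \frac{-\hbar^2 \nabla^2}{2m} + V_N(\mathbf{r})
+ \sum_{j=1,\, j \neq i}^{M} \int \frac{e^2\, |\phi_j(\mathbf{r}')|^2}{|\mathbf{r} - \mathbf{r}'|}\, d^3r' \right) \phi_i(\mathbf{r})
= E_i\, \phi_i(\mathbf{r}).
\tag{A1.3.11}
\]

Using the orbitals, $\phi_i(\mathbf{r})$, from a solution of equation A1.3.11, the Hartree many-body wavefunction can be 
constructed and the total energy determined from equation A1.3.3. 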

The Hartree approximation is useful as an illustrative tool, but it is not a very accurate approximation. A 
significant deficiency of the Hartree wavefunction is that it does not reflect the anti-symmetric nature of the 
electrons as required by the Pauli principle [7]. Moreover, the Hartree equation is difficult to solve. The 
Hamiltonian is orbitally dependent because the summation in equation A1.3.11 does not include the $i$th 
orbital. This means that if there are $M$ electrons, then $M$ Hamiltonians must be considered and equation 
A1.3.11 solved for each orbital. 

A1.3.2.2 THE HARTREE-FOCK APPROXIMATION 

It is possible to write down a many-body wavefunction that will reflect the antisymmetric nature of the 
wavefunction. In this discussion, the spin coordinate of each electron needs to be explicitly treated. The 
coordinates of an electron may be specified by $\mathbf{r}_i s_i$ where $s_i$ represents the spin coordinate. Starting with 
one-electron orbitals, $\phi_i(\mathbf{r} s)$, the following form can be invoked: 

\[
\Psi(\mathbf{r}_1 s_1, \mathbf{r}_2 s_2, \mathbf{r}_3 s_3, \ldots) =
\begin{vmatrix}
\phi_1(\mathbf{r}_1 s_1) & \phi_1(\mathbf{r}_2 s_2) & \cdots & \phi_1(\mathbf{r}_M s_M) \\
\phi_2(\mathbf{r}_1 s_1) & \phi_2(\mathbf{r}_2 s_2) & \cdots & \phi_2(\mathbf{r}_M s_M) \\
\vdots & \vdots & & \vdots \\
\phi_M(\mathbf{r}_1 s_1) & \phi_M(\mathbf{r}_2 s_2) & \cdots & \phi_M(\mathbf{r}_M s_M)
\end{vmatrix}.
\tag{A1.3.12}
\]

This form of the wavefunction is called a Slater determinant. It reflects the proper symmetry of the 
wavefunction and the Pauli principle. If two electrons occupy the same orbital, two rows of the determinant will 
be identical and the many-body wavefunction will have zero amplitude. Likewise, the determinant will vanish if 
two electrons occupy the same point in generalized space (i.e. $\mathbf{r}_i s_i = \mathbf{r}_j s_j$) as two columns of the determinant 
will be identical. If two particles are exchanged, this corresponds to a sign change in the determinant. The Slater 
determinant is a convenient representation. It is probably the simplest form that incorporates the required 
symmetry properties for fermions, or particles with half-integer spins. 
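
The three properties just described (sign change on exchange, vanishing for coincident coordinates, vanishing for a doubly used orbital) can be verified directly. The sketch below is illustrative only: it assumes three same-spin electrons in a one-dimensional hard-wall box so that the spin coordinate can be dropped, uses particle-in-a-box orbitals, and omits any normalization prefactor. Rows label orbitals and columns label electrons.

```python
import numpy as np


def box_orbital(n, x, length=1.0):
    """n-th particle-in-a-box orbital (n = 1, 2, ...), an illustrative choice of orbital."""
    return np.sqrt(2.0 / length) * np.sin(n * np.pi * x / length)


def slater_determinant(quantum_numbers, positions):
    """Unnormalized Slater determinant: rows label orbitals, columns label electrons."""
    matrix = np.array([[box_orbital(n, x) for x in positions]
                       for n in quantum_numbers])
    return np.linalg.det(matrix)


occupied = (1, 2, 3)
x = (0.21, 0.48, 0.83)

psi = slater_determinant(occupied, x)
psi_exchanged = slater_determinant(occupied, (x[1], x[0], x[2]))    # swap electrons 1 and 2
psi_same_point = slater_determinant(occupied, (x[0], x[0], x[2]))   # two identical columns
psi_same_orbital = slater_determinant((1, 1, 3), x)                 # two identical rows

if __name__ == "__main__":
    print(f"psi                  = {psi:+.6f}")
    print(f"psi after exchange   = {psi_exchanged:+.6f}")
    print(f"psi, same position   = {psi_same_point:+.2e}")
    print(f"psi, same orbital    = {psi_same_orbital:+.2e}")
```

A plain Hartree product of the same orbitals would satisfy none of these checks, which is the deficiency the determinant form repairs.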

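If one uses a Slater determinant to evaluate the total electronic energy and maintains the orbital normalization, 
then the orbitals can be obtained from the following Hartree-Fock equations: 

\[
H^i \phi_i(\mathbf{r}) = \left( \frac{-\hbar^2 \nabla^2}{2m} + V_N(\mathbf{r})
+ \sum_{j=1}^{M} \int \frac{e^2\, |\phi_j(\mathbf{r}')|^2}{|\mathbf{r} - \mathbf{r}'|}\, d^3r' \right) \phi_i(\mathbf{r})
- \sum_{j=1}^{M} \int \frac{e^2}{|\mathbf{r} - \mathbf{r}'|}\, \phi_j^*(\mathbf{r}')\, \phi_i(\mathbf{r}')\, d^3r'\; \delta_{s_i, s_j}\, \phi_j(\mathbf{r})
= E_i\, \phi_i(\mathbf{r}).
\tag{A1.3.13}
\]

It is customary to simplify this expression by defining an electronic charge density, $\rho$: 

\[
\rho(\mathbf{r}) = \sum_{i=1}^{M} |\phi_i(\mathbf{r})|^2
\tag{A1.3.14}
\]

and an orbitally dependent exchange-charge density, $\rho_x^i$, for the $i$th orbital: 

\[
\rho_x^i(\mathbf{r}, \mathbf{r}') = \sum_{j=1}^{M}
\frac{\phi_j^*(\mathbf{r}')\, \phi_i(\mathbf{r}')\, \phi_i^*(\mathbf{r})\, \phi_j(\mathbf{r})}{\phi_i^*(\mathbf{r})\, \phi_i(\mathbf{r})}\; \delta_{s_i, s_j}.
\tag{A1.3.15}
\]

This 'density' involves a spin-dependent factor which couples only states $(i, j)$ with the same spin coordinates 
$(s_i, s_j)$. It is not a true density in that it is dependent on $\mathbf{r}, \mathbf{r}'$; it has meaning only as defined below. 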

With these charge densities defined, it is possible to define corresponding potentials. The Coulomb or Hartree 
potential, $V_H$, is defined by 

\[
V_H(\mathbf{r}) = \int \rho(\mathbf{r}')\, \frac{e^2}{|\mathbf{r} - \mathbf{r}'|}\, d^3r'
\tag{A1.3.16}
\]

and an exchange potential can be defined by 

\[
V_x^i(\mathbf{r}) = -\int \rho_x^i(\mathbf{r}, \mathbf{r}')\, \frac{e^2}{|\mathbf{r} - \mathbf{r}'|}\, d^3r'.
\tag{A1.3.17}
\]

This combination results in the following Hartree-Fock equation: 

\[
\left( \frac{-\hbar^2 \nabla^2}{2m} + V_N(\mathbf{r}) + V_H(\mathbf{r}) + V_x^i(\mathbf{r}) \right) \phi_i(\mathbf{r}) = E_i\, \phi_i(\mathbf{r}).
\tag{A1.3.18}
\]


Once the Hartree-Fock orbitals have been obtained, the total Hartree-Fock electronic energy of the system, 
$E_{\mathrm{HF}}$, can be obtained from 

\[
E_{\mathrm{HF}} = \sum_{i}^{M} E_i - \frac{1}{2} \int \rho(\mathbf{r})\, V_H(\mathbf{r})\, d^3r
- \frac{1}{2} \sum_{i}^{M} \int \phi_i^*(\mathbf{r})\, \phi_i(\mathbf{r})\, V_x^i(\mathbf{r})\, d^3r.
\tag{A1.3.19}
\]

$E_{\mathrm{HF}}$ is not a sum of the Hartree-Fock orbital energies, $E_i$. The factor of 1/2 in the electron-electron terms 
arises because the electron-electron interactions have been double-counted in the Coulomb and exchange potentials. 
The Hartree-Fock Schrödinger equation is only slightly more complex than the Hartree equation. Again, the 
equations are difficult to solve because the exchange potential is orbitally dependent. 


There is one notable difference between the Hartree-Fock summation and the Hartree summation. The 
Hartree-Fock sums include the $i = j$ terms in equation A1.3.13. This difference arises because the exchange 
term corresponding to $i = j$ cancels an equivalent term in the Coulomb summation. The $i = j$ term in both the 
Coulomb and exchange term is interpreted as a 'self-screening' of the electron. Without a cancellation 
between Coulomb and exchange terms a 'self-energy' contribution to the total energy would occur. 
Approximate forms of the exchange potential often do not have this property. The total energy then contains a 
self-energy contribution which one needs to remove to obtain a correct Hartree-Fock energy. 
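
The cancellation of the self-interaction can be written out explicitly. As a sketch, assuming the standard form of the Hartree-Fock Coulomb and exchange terms (the explicit integrals below are written in the notation of this section as an assumption, not quoted from the source), the $j = i$ contributions acting on orbital $\phi_i$ are:

```latex
\begin{aligned}
\text{Coulomb } (j = i):\quad
  & +\int \frac{e^2\, |\phi_i(\mathbf{r}')|^2}{|\mathbf{r}-\mathbf{r}'|}\, d^3r' \;\phi_i(\mathbf{r}), \\
\text{exchange } (j = i):\quad
  & -\int \frac{e^2\, \phi_i^*(\mathbf{r}')\, \phi_i(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d^3r' \;\phi_i(\mathbf{r}), \\
\text{sum}:\quad & 0 .
\end{aligned}
```

Because $\phi_i^*\phi_i = |\phi_i|^2$ and an orbital always has the same spin as itself, the two terms are equal and opposite for every $i$. A local, density-only approximation to the exchange potential no longer has this term-by-term structure, so the cancellation is generally incomplete.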

The Hartree-Fock wavefunctions are approximations to the true ground-state many-body wavefunctions. 
Terms not included in the Hartree-Fock energy are referred to as correlation contributions. One definition for 
the correlation energy, $E_{\mathrm{corr}}$, is to write it as the difference between the correct total energy of the system and 
the Hartree-Fock energy: $E_{\mathrm{corr}} = E_{\mathrm{exact}} - E_{\mathrm{HF}}$. Correlation energies are sometimes included by considering 
Slater determinants composed of orbitals which represent excited-state contributions. This method of 
including unoccupied orbitals in the many-body wavefunction is referred to as configuration interaction or 
'CI'. 

Applying Hartree-Fock wavefunctions to condensed matter systems is not routine. The resulting Hartree- 
Fock equations are usually too complex to be solved for extended systems. It has been argued that many-body 
wavefunction approaches to the condensed matter or large molecular systems do not represent a reasonable 
approach to the electronic structure problem of extended systems. 


A1.3.3 DENSITY FUNCTIONAL APPROACHES TO QUANTUM 
DESCRIPTIONS OF CONDENSED PHASES 

Alternative descriptions of quantum states based on a knowledge of the electronic charge density (equation 
A1.3.14) have existed since the 1920s. For example, the Thomas-Fermi description of atoms based on a 
knowledge of $\rho(\mathbf{r})$ was reasonably successful [13, 14, 15]. Most discussions of condensed matter begin 
by considering a limiting case that may be appropriate for condensed matter systems, but not for small 
molecules. One often considers a free electron gas of uniform charge density. The justification for this 
approach comes from the observation that simple metals like aluminium and sodium have properties which 
appear to resemble those of a free electron gas. This model cannot be applied to systems with localized 
electrons such as highly covalent materials like carbon or highly ionic materials like sodium chloride. It is 
also not appropriate for very open structures. In these systems large variations of the electron distribution can 
occur. 

A1.3.3.1 FREE ELECTRON GAS 

Perhaps the simplest description of a condensed matter system is to imagine non-interacting electrons 
contained within a box of volume, $\Omega$. The Schrödinger equation for this system is similar to equation A1.3.9 
with the potential set to zero: 

\[
\frac{-\hbar^2 \nabla^2}{2m}\, \phi(\mathbf{r}) = E\, \phi(\mathbf{r}).
\tag{A1.3.20}
\]

Ignoring spin for the moment, the solution of equation A1.3.20 is 

\[
\phi(\mathbf{r}) = \frac{1}{\sqrt{\Omega}} \exp(i \mathbf{k} \cdot \mathbf{r}).
\tag{A1.3.21}
\]

The energy is given by $E(k) = \hbar^2 k^2 / 2m$ and the charge density by $\rho = 1/\Omega$. $\mathbf{k}$ is called a wavevector. 

A key issue in describing condensed matter systems is to account properly for the number of states. Unlike a 
molecular system, the eigenvalues of condensed matter systems are closely spaced and essentially 'infinite' in 
number. For example, if one has 10^23 electrons, then one can expect to have 10^23 occupied states. In 
condensed matter systems, the number of states per energy unit is a more natural measure to describe the 
energy distribution of states. 

It is easy to do this with periodic boundary conditions. Suppose one considers a one-dimensional specimen of 
length $L$. In this case the wavefunctions obey the rule $\phi(x + L) = \phi(x)$ as $x + L$ corresponds in all physical 
properties to $x$. For a free electron wavefunction, this requirement can be expressed as 
$\exp(ik(x + L)) = \exp(ikx)$ or as $\exp(ikL) = 1$ or $k = 2\pi n/L$ where $n$ is an integer. 

Periodic boundary conditions force $k$ to be a discrete variable with allowed values occurring at intervals of 
$2\pi/L$. For very large systems, one can describe the system as continuous in the limit of $L \to \infty$. Electron states 
can be defined by a density of states defined as follows: 

\[
D(E) = \lim_{\Delta E \to 0} \frac{N(E + \Delta E) - N(E)}{\Delta E} = \frac{dN}{dE}
\tag{A1.3.22}
\]

where $N(E)$ is the number of states whose energy resides below $E$. For the one-dimensional case, 
$N(k) = 2k/(2\pi/L)$ (the factor of two coming from spin) and $dN/dE = (dN/dk) \cdot (dk/dE)$. Using $E(k) = \hbar^2 k^2/2m$, we have 
$k = \sqrt{2mE}/\hbar$ and $dk/dE = \tfrac{1}{2}\sqrt{2m/E}/\hbar$. This results in the one-dimensional density of states as 

\[
D(E) = \frac{L}{2\pi\hbar} \sqrt{\frac{2m}{E}}.
\tag{A1.3.23}
\]
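
The counting argument can be checked numerically. The sketch below is a minimal illustration under stated assumptions: units with hbar = m = 1, a very long one-dimensional specimen, and the same counting convention as the text, N(k) = 2k/(2 pi/L). It verifies that the allowed wavevectors satisfy the periodic boundary condition and that a finite-difference derivative of the state count reproduces D(E) = (L / 2 pi hbar) sqrt(2m/E). All function names are hypothetical.

```python
import numpy as np

HBAR = 1.0   # assumption: natural units for illustration
MASS = 1.0


def allowed_wavevectors(length, n_max):
    """k = 2*pi*n/L for integer n between -n_max and n_max."""
    n = np.arange(-n_max, n_max + 1)
    return 2.0 * np.pi * n / length


def states_below(energy, length):
    """N(E) with the text's convention: k-points spaced 2*pi/L up to k(E), times 2 for spin."""
    k = np.sqrt(2.0 * MASS * energy) / HBAR
    return 2.0 * np.floor(k / (2.0 * np.pi / length))


def dos_1d(energy, length):
    """Analytic one-dimensional density of states, D(E) = L/(2 pi hbar) sqrt(2m/E)."""
    return length / (2.0 * np.pi * HBAR) * np.sqrt(2.0 * MASS / energy)


length = 1.0e6
x = 0.3

# Periodic boundary condition: exp(ik(x + L)) equals exp(ikx) for every allowed k.
k_values = allowed_wavevectors(length, 3)
is_periodic = np.allclose(np.exp(1j * k_values * (x + length)),
                          np.exp(1j * k_values * x))

# Finite-difference estimate of dN/dE compared with the analytic expression.
e0, de = 1.0, 0.01
numerical_dos = (states_below(e0 + de, length) - states_below(e0, length)) / de
analytic_dos = dos_1d(e0 + 0.5 * de, length)

if __name__ == "__main__":
    print(f"periodic boundary condition satisfied: {is_periodic}")
    print(f"numerical D(E) = {numerical_dos:.1f}")
    print(f"analytic  D(E) = {analytic_dos:.1f}")
```

The agreement improves as L grows, since the spacing 2 pi/L between allowed k values shrinks and the staircase N(E) approaches a smooth function.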

The density of states for a one-dimensional system diverges as $E \to 0$. This divergence of $D(E)$ is not a 
serious issue as the integral of the density of states remains finite. In three dimensions, it is straightforward to 
show that 

\[
D(E) = \frac{\Omega}{2\pi^2} \left( \frac{2m}{\hbar^2} \right)^{3/2} \sqrt{E}.
\tag{A1.3.24}
\]

The singularity is removed, although a discontinuity in the derivative exists as $E \to 0$. 

One can determine the total number of electrons in the system by integrating the density of states up to the 
highest occupied energy level. The energy of the highest occupied state is called the Fermi level or Fermi 
energy, $E_F$: 

\[
N = \int_0^{E_F} D(E)\, dE
\tag{A1.3.25}
\]

and 

\[
E_F = \frac{\hbar^2}{2m} \left( \frac{3\pi^2 N}{\Omega} \right)^{2/3}.
\tag{A1.3.26}
\]

By defining a Fermi wavevector as $k_F = (3\pi^2 n_{\mathrm{el}})^{1/3}$ where $n_{\mathrm{el}}$ is the electron density, $n_{\mathrm{el}} = N/\Omega$, of the system, 
one can write 

\[
E_F = \frac{\hbar^2 k_F^2}{2m}.
\tag{A1.3.27}
\]


It should be noted that typical values for $E_F$ for simple metals like sodium or potassium are of the order of 
several electronvolts. If one defines a temperature, $T_F$, where $T_F = E_F/k_B$ and $k_B$ is the Boltzmann constant, 
typical values for $T_F$ might be 10^4-10^5 K. Thus, at ambient temperatures one can often neglect the role of 
temperature in determining the Fermi energy. 
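
To put numbers on these estimates, the free electron expressions k_F = (3 pi^2 n_el)^(1/3) and E_F = hbar^2 k_F^2 / 2m can be evaluated for a simple metal. The sketch below assumes a conduction-electron density for sodium of roughly 2.65 x 10^28 per cubic metre, a commonly tabulated value that is not given in the text, together with SI values of the physical constants.

```python
import math

HBAR = 1.054571817e-34            # J s
ELECTRON_MASS = 9.1093837015e-31  # kg
ELECTRON_VOLT = 1.602176634e-19   # J
BOLTZMANN = 1.380649e-23          # J / K

# Assumption: approximate conduction-electron density of sodium (one electron per atom).
SODIUM_ELECTRON_DENSITY = 2.65e28  # m^-3


def fermi_wavevector(density):
    """k_F = (3 pi^2 n_el)^(1/3) for a free electron gas of number density n_el."""
    return (3.0 * math.pi**2 * density) ** (1.0 / 3.0)


def fermi_energy(density):
    """E_F = hbar^2 k_F^2 / (2 m), returned in joules."""
    return HBAR**2 * fermi_wavevector(density) ** 2 / (2.0 * ELECTRON_MASS)


def fermi_temperature(density):
    """T_F = E_F / k_B, returned in kelvin."""
    return fermi_energy(density) / BOLTZMANN


fermi_energy_ev = fermi_energy(SODIUM_ELECTRON_DENSITY) / ELECTRON_VOLT
fermi_temperature_k = fermi_temperature(SODIUM_ELECTRON_DENSITY)

if __name__ == "__main__":
    print(f"k_F = {fermi_wavevector(SODIUM_ELECTRON_DENSITY):.3e} m^-1")
    print(f"E_F = {fermi_energy_ev:.2f} eV")
    print(f"T_F = {fermi_temperature_k:.3e} K")
```

With this density the Fermi energy comes out at a few electronvolts and the Fermi temperature at tens of thousands of kelvin, far above room temperature, which is the basis for neglecting thermal effects on the Fermi energy.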

A1.3.3.2 HARTREE-FOCK EXCHANGE IN A FREE ELECTRON GAS 

For a free electron gas, it is possible to evaluate the Hartree-Fock exchange energy directly [3, 16]. The Slater 
determinant is constructed using free electron orbitals. Each orbital is labelled by a $\mathbf{k}$ and a spin index. The 
Coulomb potential for an infinite free electron gas diverges, but this divergence can be removed by imposing a 
compensating uniform positive charge. The resulting Hartree-Fock eigenvalues can be written as 

\[
E_k = \frac{\hbar^2 k^2}{2m} - \frac{1}{\Omega} \sum_{k' < k_F} \frac{4\pi e^2}{|\mathbf{k} - \mathbf{k}'|^2}
\tag{A1.3.28}
\]

where the summation is over occupied $\mathbf{k}$-states. It is possible to evaluate the summation by transposing the 
summation to an integration. This transposition is often done for solid-state systems as the state density is so 
high that the system can be treated as a continuum: 

\[
\frac{1}{\Omega} \sum_{k' < k_F} \frac{4\pi e^2}{|\mathbf{k} - \mathbf{k}'|^2}
\;\to\; \frac{1}{(2\pi)^3} \int_{k' < k_F} \frac{4\pi e^2}{|\mathbf{k} - \mathbf{k}'|^2}\, d^3k'.
\tag{A1.3.29}
\]


This integral can be solved analytically. The resulting eigenvalues are given by 

\[
E_k = \frac{\hbar^2 k^2}{2m} - \frac{e^2 k_F}{\pi}
\left( 1 + \frac{k_F^2 - k^2}{2 k k_F} \ln \left| \frac{k + k_F}{k - k_F} \right| \right).
\tag{A1.3.30}
\]

Using the above expression and equation A1.3.19, the total electron energy, $E_{\mathrm{HF}}^{\mathrm{FEG}}$, for a free electron gas 
within the Hartree-Fock approximation is given by 

\[
E_{\mathrm{HF}}^{\mathrm{FEG}} = 2 \sum_{k < k_F} \frac{\hbar^2 k^2}{2m}
- \frac{e^2 k_F}{\pi} \sum_{k < k_F}
\left( 1 + \frac{k_F^2 - k^2}{2 k k_F} \ln \left| \frac{k + k_F}{k - k_F} \right| \right).
\tag{A1.3.31}
\]

The factor of 2 in the first term comes from spin. In the exchange term, there is no extra factor of 2 because 
one can subtract off a 'double-counting term' (see equation A1.3.19). The summations can be executed as per 
equation A1.3.29 to yield 

\[
\frac{E_{\mathrm{HF}}^{\mathrm{FEG}}}{N} = \frac{3}{5} E_F - \frac{3 e^2}{4\pi} k_F.
\tag{A1.3.32}
\]


The first term corresponds to the average energy per electron in a free electron gas. The second term 
corresponds to the exchange energy per electron. The exchange energy is attractive and scales with the cube 
root of the average density. This form provides a clue as to what form the exchange energy might take in an 
interacting electron gas or non-uniform electron gas. 
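
The two coefficients in equation A1.3.32 can be recovered by averaging over the Fermi sphere numerically. The sketch below assumes the standard closed form of the free-electron Hartree-Fock exchange term, proportional to 1 + ((1 - x^2)/2x) ln|(1 + x)/(1 - x)| with x = k/k_F (written out here as an assumption), and uses a simple midpoint rule with the weight 3x^2 appropriate to a uniformly filled sphere. Energies are expressed in units of E_F (kinetic) and e^2 k_F (exchange).

```python
import numpy as np


def exchange_bracket(x):
    """1 + (1 - x^2)/(2x) * ln((1 + x)/(1 - x)) for 0 < x = k/k_F < 1 (assumed closed form)."""
    return 1.0 + (1.0 - x**2) / (2.0 * x) * np.log((1.0 + x) / (1.0 - x))


# Midpoint grid on (0, 1); midpoints avoid the endpoints x = 0 and x = 1.
n_points = 200_000
x = (np.arange(n_points) + 0.5) / n_points
weight = 3.0 * x**2 / n_points          # normalized weight for averaging over a sphere

# Average kinetic energy per electron in units of E_F: <x^2> over the sphere.
kinetic_average = np.sum(weight * x**2)

# Sphere average of the exchange bracket.
bracket_average = np.sum(weight * exchange_bracket(x))

# Per electron there is half a k-point (two spins per k), hence the factor 1/2;
# the result is the exchange energy per electron in units of e^2 k_F.
exchange_per_electron = -0.5 * bracket_average / np.pi

if __name__ == "__main__":
    print(f"<kinetic>/E_F           = {kinetic_average:.5f}   (3/5 = 0.60000)")
    print(f"<bracket>               = {bracket_average:.5f}   (3/2 = 1.50000)")
    print(f"E_x/N in e^2 k_F        = {exchange_per_electron:.5f}   (-3/(4 pi) = {-3/(4*np.pi):.5f})")
```

The numerical averages reproduce the factors 3/5 and 3/(4 pi), and since k_F varies as the cube root of the density, the exchange energy per electron does as well.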

Slater was one of the first to propose that one replace $V_x^i$ in equation A1.3.18 by a term that depends only on 
the cube root of the charge density [17, 18, 19]. In analogy to equation A1.3.32, he suggested that $V_x^i$ be 
replaced by 

\[
V_x^{\mathrm{Slater}}[\rho(\mathbf{r})] = -\frac{3 e^2}{2\pi} \left( 3\pi^2 \rho(\mathbf{r}) \right)^{1/3}.
\tag{A1.3.33}
\]

This expression is not orbitally dependent. As such, a solution of the Hartree-Fock equation (equation 
A1.3.18) is much easier to implement. Although Slater exchange was not rigorously justified for non-uniform 
electron gases, it was quite successful in replicating the essential features of atomic and molecular systems as 
determined by Hartree-Fock calculations. 


A1.3.3.3 THE LOCAL DENSITY APPROXIMATION 

In a number of classic papers Hohenberg, Kohn and Sham established a theoretical framework for justifying 
the replacement of the many-body wavefunction by one-electron orbitals [15, 20, 21]. In particular, they 
proposed that the charge density plays a central role in describing the electronic structure of matter. A key 
aspect of their work was the local density approximation (LDA). Within this approximation, one can express 
the exchange energy as 

\[
E_x[\rho(\mathbf{r})] = \int \rho(\mathbf{r})\, \varepsilon_x[\rho(\mathbf{r})]\, d^3r
\tag{A1.3.34}
\]

where $\varepsilon_x[\rho]$ is the exchange energy per particle of a uniform gas at a density of $\rho$. Within this framework, the 
exchange potential in equation A1.3.18 is replaced by a potential determined from the functional derivative of 
$E_x[\rho]$: 

\[
V_x[\rho] = \frac{\delta E_x[\rho]}{\delta \rho}.
\tag{A1.3.35}
\]


One serious issue is the determination of the exchange energy per particle, $\varepsilon_x$, or the corresponding exchange 
potential, $V_x$. The exact expression for either of these quantities is unknown, save for special cases. If one 
assumes the exchange energy is given by equation A1.3.32, i.e. the Hartree-Fock expression for the exchange 
energy of the free electron gas, then one can write 

\[
E_x[\rho] = -\frac{3 e^2}{4\pi} (3\pi^2)^{1/3} \int [\rho(\mathbf{r})]^{4/3}\, d^3r
\tag{A1.3.36}
\]

and taking the functional derivative, one obtains 

\[
V_x[\rho] = -\frac{e^2}{\pi} \left( 3\pi^2 \rho(\mathbf{r}) \right)^{1/3}.
\tag{A1.3.37}
\]
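
The step from the exchange energy to the exchange potential can be checked numerically. Because the local density integrand depends only on the density at each point, the functional derivative reduces to an ordinary derivative of rho * eps_x(rho) with respect to rho. The sketch below assumes Hartree atomic units (e = 1) and the free electron gas form eps_x = -(3/4 pi)(3 pi^2 rho)^(1/3); the function names are hypothetical.

```python
import numpy as np


def exchange_energy_per_particle(rho):
    """eps_x(rho) for the uniform electron gas, in hartree atomic units (e = 1)."""
    return -(3.0 / (4.0 * np.pi)) * (3.0 * np.pi**2 * rho) ** (1.0 / 3.0)


def kohn_sham_exchange_potential(rho):
    """V_x(rho) = -(1/pi) (3 pi^2 rho)^(1/3), the functional derivative of E_x."""
    return -(1.0 / np.pi) * (3.0 * np.pi**2 * rho) ** (1.0 / 3.0)


def slater_exchange_potential(rho):
    """Slater's form, -(3 / 2 pi) (3 pi^2 rho)^(1/3)."""
    return -(3.0 / (2.0 * np.pi)) * (3.0 * np.pi**2 * rho) ** (1.0 / 3.0)


def numerical_exchange_potential(rho, step=1.0e-6):
    """Central-difference derivative of the local integrand rho * eps_x(rho)."""
    upper = (rho + step) * exchange_energy_per_particle(rho + step)
    lower = (rho - step) * exchange_energy_per_particle(rho - step)
    return (upper - lower) / (2.0 * step)


densities = np.array([0.01, 0.1, 1.0])   # electrons per cubic bohr, illustrative values
numerical = numerical_exchange_potential(densities)
analytic = kohn_sham_exchange_potential(densities)
ratio = analytic / slater_exchange_potential(densities)

if __name__ == "__main__":
    for rho, v_num, v_ana in zip(densities, numerical, analytic):
        print(f"rho = {rho:5.2f}   numerical V_x = {v_num:.6f}   analytic V_x = {v_ana:.6f}")
    print("ratio of functional-derivative exchange to Slater exchange:", ratio)
```

The derivative of rho^(4/3) brings down a factor of 4/3, so the potential is 4/3 of the energy per particle, which in turn makes it 2/3 of the Slater form at every density.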

Comparing this to the form chosen by Slater, we note that this form, known as Kohn-Sham exchange, differs 
by a factor of 2/3: i.e. $V_x = 2V_x^{\mathrm{Slater}}/3$. For a number of years, some controversy existed as to whether the 
Kohn-Sham or Slater exchange was more accurate for realistic systems [15]. Slater suggested that a 
parameter be introduced that would allow one to vary the exchange between the Slater and Kohn-Sham 
values [19]. The parameter, $\alpha$, was often placed in front of the Slater exchange: $V_{x\alpha} = \alpha V_x^{\mathrm{Slater}}$. $\alpha$ was often 
chosen to replicate some known feature of an exact Hartree-Fock calculation such as the total energy of an atom 
or ion. Acceptable values of $\alpha$ were viewed to range from $\alpha = 2/3$ to $\alpha = 1$. Slater's so-called 'X$\alpha$' method 
was very successful in describing molecular systems [19]. Notable drawbacks of the X$\alpha$ method centre on its 
ad hoc nature through the $\alpha$ parameter and the omission of an explicit treatment of correlation energies. 

In contemporary theories, $\alpha$ is taken to be 2/3, and correlation energies are explicitly included in the energy 
functionals [15]. Sophisticated numerical studies have been performed on uniform electron gases resulting in 
local density expressions of the form $V_{xc}[\rho(\mathbf{r})] = V_x[\rho(\mathbf{r})] + V_c[\rho(\mathbf{r})]$ where $V_c$ represents contributions to the 
total energy beyond the Hartree-Fock limit [22]. It is also possible to describe the role of spin explicitly by 
considering the charge density for up and down spins: $\rho = \rho_\uparrow + \rho_\downarrow$. This approximation is called the local spin 
density approximation [15]. 

The Kohn-Sham equation [21] for the electronic structure of matter is given by 

\[
\left( \frac{-\hbar^2 \nabla^2}{2m} + V_N(\mathbf{r}) + V_H(\mathbf{r}) + V_{xc}[\rho(\mathbf{r})] \right) \phi_i(\mathbf{r}) = E_i\, \phi_i(\mathbf{r}).
\tag{A1.3.38}
\]

This equation is usually solved 'self-consistently'. An approximate charge is assumed to estimate the 
exchange-correlation potential and to determine the Hartree potential from equation A1.3.16. These 
approximate potentials are inserted in the Kohn-Sham equation and the total charge density is obtained from 
equation A1.3.14. The 'output' charge density is used to construct new exchange-correlation and Hartree 
potentials. The process is repeated until the input and output charge densities or potentials are identical to 
within some prescribed tolerance. 
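
The self-consistency cycle described above can be sketched in a few lines. This is a toy model, not a Kohn-Sham calculation: it assumes one spatial dimension, atomic units, a harmonic external potential in place of the nuclear potential, two electrons in the lowest orbital, and a purely local model potential proportional to the density standing in for the Hartree and exchange-correlation potentials. Simple linear mixing of input and output densities is used, which is one common choice among several.

```python
import numpy as np


def solve_orbitals(potential, spacing):
    """Diagonalize -1/2 d^2/dx^2 + V(x) on a uniform grid (finite differences)."""
    n = potential.size
    kinetic_diagonal = np.full(n, 1.0 / spacing**2)
    kinetic_off = np.full(n - 1, -0.5 / spacing**2)
    hamiltonian = (np.diag(kinetic_diagonal + potential)
                   + np.diag(kinetic_off, 1) + np.diag(kinetic_off, -1))
    return np.linalg.eigh(hamiltonian)


def self_consistent_density(x, external, coupling=0.5, mixing=0.3,
                            tolerance=1.0e-8, max_iterations=500):
    """Iterate density -> potential -> orbitals -> density until input and output agree."""
    spacing = x[1] - x[0]
    density = np.zeros_like(x)                       # initial guess for the input density
    for iteration in range(1, max_iterations + 1):
        # Toy density-dependent potential (assumption, stands in for V_H + V_xc).
        potential = external + coupling * density
        _, orbitals = solve_orbitals(potential, spacing)
        lowest = orbitals[:, 0] / np.sqrt(spacing)   # normalize so that sum |phi|^2 dx = 1
        output = 2.0 * lowest**2                     # two electrons in the lowest orbital
        if np.max(np.abs(output - density)) < tolerance:
            return output, iteration, True
        # Linear mixing of input and output densities stabilizes the iteration.
        density = (1.0 - mixing) * density + mixing * output
    return density, max_iterations, False


x = np.linspace(-6.0, 6.0, 241)
density, iterations, converged = self_consistent_density(x, 0.5 * x**2)
electron_count = np.sum(density) * (x[1] - x[0])

if __name__ == "__main__":
    print(f"converged: {converged} after {iterations} iterations")
    print(f"integrated density = {electron_count:.6f} electrons")
```

Feeding the output density straight back in (mixing = 1) can oscillate or diverge in realistic calculations, which is why production codes mix old and new densities or use more elaborate acceleration schemes.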

Once a solution of the Kohn-Sham equation is obtained, the total energy can be computed from 

\[
E_{\mathrm{KS}} = \sum_{i}^{M} E_i - \frac{1}{2} \int \rho(\mathbf{r})\, V_H(\mathbf{r})\, d^3r
+ \int \rho(\mathbf{r}) \left( \varepsilon_{xc}[\rho(\mathbf{r})] - V_{xc}[\rho(\mathbf{r})] \right) d^3r.
\tag{A1.3.39}
\]

The electronic energy, as determined from $E_{\mathrm{KS}}$, must be added to the ion-ion interactions to obtain the 
structural energies. This is a straightforward calculation for confined systems. For extended systems such as 
crystals, the calculations can be done using Madelung summation techniques [2]. 

Owing to its ease of implementation and overall accuracy, the local density approximation is the current 
method of choice for describing the electronic structure of condensed matter. Moreover, recent developments 
have included so-called gradient corrections to the local density approximation. In this approach, the 
exchange-correlation energy depends on both the local density and the gradient of the density. This approach 
is called the generalized gradient approximation or GGA [23]. 

When first proposed, density functional theory was not widely accepted in the chemistry community. The 
theory is not 'rigorous' in the sense that it is not clear how to improve the estimates for the ground-state 
energies. For wavefunction-based methods, one can include more Slater determinants as in a configuration 
interaction approach. As the wavefunctions improve via the variational theorem, the energy is lowered. In 
density functional theory, there is no analogous procedure. The Kohn-Sham equations are also variational, but 
need not approach the true ground-state energy. This is not a problem provided that one is interested in 
relative energies and any inherent density functional errors cancel in the difference. 

In some sense, density functional theory is an a posteriori theory. Given the transference of the exchange- 
correlation energies from an electron gas, it is not surprising that errors would arise in its implementation to 
highly non-uniform electron gas systems as found in realistic systems. However, the degree of error 
cancellations is rarely known a priori. The reliability of density functional theory has only been established by 
numerous calculations for a wide variety of condensed matter systems. For example, the cohesive energies, 
compressibility, structural parameters and vibrational spectra of elemental solids have been calculated within 
the density functional theory [24]. The accuracy of the method is best for systems in which the cancellation of 
errors is expected to be complete. Since cohesive energies involve the difference in energies between atoms in 
solids and atoms in free space, two very different environments, error cancellation is expected to be less 
complete. This is reflected in the fact 
that historically cohesive energies have presented greater challenges for density functional theory: the errors 
between theory and experiment are typically ~5-10%, depending on the nature of the density functional. In 
contrast, vibrational frequencies which involve small structural changes within a given crystalline 
environment are easily reproduced to within 1-2%. 


A1.3.4 ELECTRONIC STATES IN PERIODIC POTENTIALS: BLOCH'S 
THEOREM 

Crystalline matter serves as the testing ground for electronic structure methods applied to extended systems. 
Owing to the translational periodicity of the system, a knowledge of the charge density in part of the crystal is 
sufficient to understand the charge density throughout the crystal. This greatly simplifies quantum 
descriptions of condensed matter. 

A1.3.4.1 THE STRUCTURE OF CRYSTALLINE MATTER 

A key aspect in defining a crystal is the existence of a building block which, when translated by a precise 
prescription an infinite number of times, replicates the structure of interest. This building block is called a unit 
cell. The number of atoms required to define a unit cell can vary greatly from one solid to another. For 
simple metals such as sodium only one atom may be needed in defining the unit cell. Complex organic 
crystals can require thousands of atoms to define the building block. 

The unit cell can be defined in terms of three lattice vectors: $(\mathbf{a}, \mathbf{b}, \mathbf{c})$. In a periodic system, the point $\mathbf{x}$ is 
equivalent to any point $\mathbf{x}'$, provided the two points are related as follows: 

\[
\mathbf{x} = \mathbf{x}' + n_1 \mathbf{a} + n_2 \mathbf{b} + n_3 \mathbf{c}
\tag{A1.3.40}
\]

where $n_1, n_2, n_3$ are arbitrary integers. This requirement can be used to define the translation vectors. 
Equation A1.3.40 can also be written as 

\[
\mathbf{x} = \mathbf{x}' + \mathbf{R}_{n_1, n_2, n_3}
\tag{A1.3.41}
\]

where $\mathbf{R}_{n_1, n_2, n_3} = n_1 \mathbf{a} + n_2 \mathbf{b} + n_3 \mathbf{c}$ is called a translation vector. The set of points located by $\mathbf{R}_{n_1, n_2, n_3}$ 
formed by all possible combinations of $(n_1, n_2, n_3)$ is called a lattice. 

Knowing the lattice is usually not sufficient to reconstruct the crystal structure. A knowledge of the vectors 
$(\mathbf{a}, \mathbf{b}, \mathbf{c})$ does not specify the positions of the atoms within the unit cell. The positions of the atoms within the unit 
cell are given by a set of vectors: $\mathbf{x}_i$, $i = 1, 2, 3, \ldots, n$ where $n$ is the number of atoms in the unit cell. The set of 
vectors, $\mathbf{x}_i$, is called the basis. For simple elemental structures, the unit cell may contain only one atom. The 
lattice sites in this case can be chosen to correspond to the atomic sites, and no basis exists. 


The position of the $i$th atom in a crystal, $\mathbf{r}_i$, is given by 

\[
\mathbf{r}_i = \mathbf{x}_j + \mathbf{R}_{n_1, n_2, n_3}
\tag{A1.3.42}
\]

where the index $j$ refers to the $j$th atom in the cell and the indices $n_1, n_2, n_3$ refer to the cell. The construction 
of the unit cell, i.e. the lattice vectors $\mathbf{R}_{n_1, n_2, n_3}$ and the basis vectors $\mathbf{x}_j$, is not unique. The choice of unit cell is 
usually dictated by convenience. The smallest possible unit cell which properly describes a crystal is called 
the primitive unit cell. 

(A) FACE-CENTRED CUBIC (FCC) STRUCTURE 

The FCC structure is illustrated in figure A1.3.2. Metallic elements such as calcium, nickel, and copper form 
in the FCC structure, as well as some of the inert gases. The conventional unit cell of the FCC structure is 
cubic with the length of the edge given by the lattice parameter, $a$. There are four atoms in the conventional 
cell. In the primitive unit cell, there is only one atom. This atom coincides with the lattice points. The lattice 
vectors for the primitive cell are given by 

\[
\mathbf{a} = a(\hat{\mathbf{y}} + \hat{\mathbf{z}})/2 \qquad
\mathbf{b} = a(\hat{\mathbf{x}} + \hat{\mathbf{z}})/2 \qquad
\mathbf{c} = a(\hat{\mathbf{x}} + \hat{\mathbf{y}})/2.
\tag{A1.3.43}
\]

This structure is called 'close packed' because the number of atoms per unit volume is quite large compared 
with other simple crystal structures. 

Figure A1.3.2. Structure of a FCC crystal. 

(B) BODY-CENTRED CUBIC (BCC) STRUCTURE 

The BCC structure is illustrated in figure A1.3.3. Elements such as sodium, tungsten and iron form in the
BCC structure. The conventional unit cell of the BCC structure is cubic, like FCC, with the length of the edge
given by the lattice parameter, $a$. There are two atoms in the conventional cell. In the primitive unit cell, there
is only one atom and the lattice vectors are given by

$$\mathbf{a} = a(-\hat{\mathbf{x}} + \hat{\mathbf{y}} + \hat{\mathbf{z}})/2 \qquad \mathbf{b} = a(\hat{\mathbf{x}} - \hat{\mathbf{y}} + \hat{\mathbf{z}})/2 \qquad \mathbf{c} = a(\hat{\mathbf{x}} + \hat{\mathbf{y}} - \hat{\mathbf{z}})/2. \tag{A1.3.44}$$
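The atom counts quoted for the conventional cells can be checked from the primitive lattice vectors. The following is a minimal sketch, assuming NumPy and a lattice parameter of 1 in arbitrary units; the function name `cell_volume` is illustrative only. It computes the primitive cell volume as a triple product and divides the cubic cell volume $a^3$ by it.

```python
import numpy as np

def cell_volume(a1, a2, a3):
    """Volume |a1 . (a2 x a3)| of the cell spanned by three lattice vectors."""
    return abs(np.dot(a1, np.cross(a2, a3)))

a = 1.0                      # lattice parameter (arbitrary units, assumed)
x, y, z = np.eye(3)          # Cartesian unit vectors

# Primitive lattice vectors of equations A1.3.43 (FCC) and A1.3.44 (BCC)
fcc = [a * (y + z) / 2, a * (x + z) / 2, a * (x + y) / 2]
bcc = [a * (-x + y + z) / 2, a * (x - y + z) / 2, a * (x + y - z) / 2]

fcc_volume = cell_volume(*fcc)
bcc_volume = cell_volume(*bcc)

# Cubic cell volume divided by primitive cell volume gives the number of
# lattice points (here: atoms) in the conventional cell.
fcc_points = a**3 / fcc_volume
bcc_points = a**3 / bcc_volume
print(fcc_volume, bcc_volume, fcc_points, bcc_points)
```

The ratios reproduce the four atoms per conventional FCC cell and two per conventional BCC cell stated in the text.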

(C) DIAMOND STRUCTURE 

The diamond structure is illustrated in figure A1.3.4. Elements such as carbon, silicon and germanium form in
the diamond structure. The conventional unit cell of the diamond structure is cubic with the length of the edge
given by the lattice parameter, $a$. There are eight atoms in the conventional cell. The diamond structure can
be constructed by considering two interpenetrating FCC crystals displaced by one-fourth of the body diagonal.
For the primitive unit cell, the lattice vectors are the same as for the FCC crystal; however, each lattice point
has a basis associated with it. The basis can be chosen as

$$\boldsymbol{\tau}_1 = -a(1, 1, 1)/8 \qquad \boldsymbol{\tau}_2 = a(1, 1, 1)/8. \tag{A1.3.45}$$

(D) ROCKSALT STRUCTURE 

The rocksalt structure is illustrated in figure A1.3.5. This structure represents one of the simplest compound
structures. Numerous ionic crystals form in the rocksalt structure, such as sodium chloride (NaCl). The
conventional unit cell of the rocksalt structure is cubic. There are eight atoms in the conventional cell. For the
primitive unit cell, the lattice vectors are the same as FCC. The basis consists of two atoms: one at the origin
and one displaced by one-half the body diagonal of the conventional cell.

Figure A1.3.3. Structure of a BCC crystal.

Figure A1.3.4. Structure of a diamond crystal.

Figure A1.3.5. Structure of a rocksalt crystal.

A1.3.4.2 BLOCH'S THEOREM


The periodic nature of crystalline matter can be utilized to construct wavefunctions which reflect the
translational symmetry. Wavefunctions so constructed are called Bloch functions [1]. These functions greatly
simplify the electronic structure problem and are applicable to any periodic system.


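For example, consider a simple crystal with one atom per lattice point: the total ionic potential can be written
as

$$V_{\rm ion}(\mathbf{r}) = \sum_{\mathbf{R},\boldsymbol{\tau}} V^a_{\rm ion}(\mathbf{r} - \mathbf{R} - \boldsymbol{\tau}). \tag{A1.3.46}$$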

This ionic potential is periodic. A translation of $\mathbf{r}$ to $\mathbf{r} + \mathbf{R}$ can be accommodated by simply reordering the
summation. Since the valence charge density is also periodic, the total potential is periodic, as the Hartree and
exchange-correlation potentials are functions of the charge density. In this situation, it can be shown that the
wavefunctions for crystalline matter can be written as

$$\phi_{\mathbf{k}}(\mathbf{r}) = \exp(i\mathbf{k}\cdot\mathbf{r})\,u_{\mathbf{k}}(\mathbf{r}) \tag{A1.3.47}$$

where $\mathbf{k}$ is a wavevector and $u_{\mathbf{k}}(\mathbf{r})$ is a periodic function, $u_{\mathbf{k}}(\mathbf{r} + \mathbf{R}) = u_{\mathbf{k}}(\mathbf{r})$. This is known as Bloch's
theorem. In the limit of a free electron, $\hbar\mathbf{k}$ can be identified with the momentum of the electron and $u_{\mathbf{k}} = 1$.
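The Bloch form can be checked numerically in one dimension. The sketch below is an illustration under stated assumptions, not part of the original treatment: it assumes NumPy, units with $\hbar = m = 1$, and a hypothetical cosine potential $V(x) = 2V_1\cos(2\pi x/a)$. The Hamiltonian is diagonalized in a basis of plane waves $\exp(i(k + G)x)$, and the resulting eigenfunction is tested for the two properties stated above.

```python
import numpy as np

a = 1.0                      # lattice period (assumed units)
V1 = 0.5                     # strength of the assumed potential 2*V1*cos(2*pi*x/a)
k = 0.3 * np.pi / a          # a wavevector inside the first Brillouin zone
n = np.arange(-5, 6)         # plane-wave indices
G = 2 * np.pi * n / a        # one-dimensional reciprocal lattice vectors

# Hamiltonian in the basis exp(i(k+G)x): kinetic energy on the diagonal,
# V1 coupling plane waves whose G differ by one reciprocal lattice vector.
H = np.diag(0.5 * (k + G) ** 2)
H += V1 * (np.eye(len(n), k=1) + np.eye(len(n), k=-1))
energies, coeffs = np.linalg.eigh(H)
c = coeffs[:, 0]             # expansion coefficients of the lowest band at this k

def psi(x):
    """Eigenfunction as a sum of plane waves exp(i(k+G)x)."""
    return np.sum(c * np.exp(1j * (k + G) * x))

def u(x):
    """Periodic part obtained by stripping the phase factor exp(ikx)."""
    return np.exp(-1j * k * x) * psi(x)

x0 = 0.37                    # arbitrary test point
bloch_error = abs(psi(x0 + a) - np.exp(1j * k * a) * psi(x0))
periodic_error = abs(u(x0 + a) - u(x0))
print(bloch_error, periodic_error)
```

Both differences vanish to rounding error because every basis function carries the same factor $\exp(ikx)$ multiplied by a function with the period of the lattice.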

The wavevector is a good quantum number: e.g., the orbitals of the Kohn-Sham equations [21] can be
rigorously labelled by $\mathbf{k}$ and spin. In three dimensions, four quantum numbers are required to characterize an
eigenstate. In spherically symmetric atoms, the numbers correspond to $n$, $l$, $m$, $s$, the principal, angular
momentum, azimuthal and spin quantum numbers, respectively. Bloch's theorem states that the equivalent
quantum numbers in a crystal are $k_x$, $k_y$, $k_z$ and spin. The spin index is usually dropped for non-magnetic
materials.

By taking the $\phi_{\mathbf{k}}$ orbitals to be of the Bloch form, the Kohn-Sham equations can be written as

$$\left(\frac{(\mathbf{p} + \hbar\mathbf{k})^2}{2m} + V_N(\mathbf{r}) + V_H(\mathbf{r}) + V_{xc}[\rho(\mathbf{r})]\right)u_{\mathbf{k}}(\mathbf{r}) = E(\mathbf{k})\,u_{\mathbf{k}}(\mathbf{r}). \tag{A1.3.48}$$

Knowing the energy distribution of electrons, $E(\mathbf{k})$, and the spatial distribution of electrons, $\rho(\mathbf{r})$, is important
in obtaining the structural and electronic properties of condensed matter systems.


A1.3.5 ENERGY BANDS FOR CRYSTALLINE SOLIDS 

A1.3.5.1 KRONIG-PENNEY MODEL

One of the first models to describe electronic states in a periodic potential was the Kronig-Penney model [1].
This model is commonly used to illustrate the fundamental features of Bloch's theorem and solutions of the
Schrödinger equation for a periodic system.

This model considers the wavefunctions that solve a one-dimensional Schrödinger equation:

$$\left(-\frac{\hbar^2\nabla^2}{2m} + V(x)\right)\psi(x) = E\,\psi(x). \tag{A1.3.49}$$

This Schrödinger equation has a particularly simple solution for a finite energy well: $V(x) = -V_0$ for $0 < x < a$
(region I) and $V(x) = 0$ elsewhere (region II), as indicated in figure A1.3.6. This is a standard problem in
elementary quantum mechanics. For a bound state ($E < 0$) the wavefunctions have solutions in region I:
$\psi_{\rm I}(x) = B\exp(iKx) + C\exp(-iKx)$ and in region II: $\psi_{\rm II}(x) = A\exp(-Q|x|)$, with an amplitude $A$ that may differ on
the two sides of the well. The wavefunctions are required to be continuous: $\psi_{\rm I}(0) = \psi_{\rm II}(0)$ and
$\psi_{\rm I}(a) = \psi_{\rm II}(a)$, and to have continuous first derivatives: $\psi_{\rm I}'(0) = \psi_{\rm II}'(0)$ and $\psi_{\rm I}'(a) = \psi_{\rm II}'(a)$. With these
conditions imposed at $x = 0$

$$B/C = -\frac{(1 + iK/Q)^2}{1 + K^2/Q^2} \tag{A1.3.50}$$

and at $x = a$

$$B/C = -\frac{(1 - iK/Q)^2\exp(-2iKa)}{1 + K^2/Q^2}. \tag{A1.3.51}$$

A nontrivial solution will exist only if

$$(1 + iK/Q)^2 = (1 - iK/Q)^2\exp(-2iKa) \tag{A1.3.52}$$

or

$$Q^2 + 2QK\cot(Ka) - K^2 = 0. \tag{A1.3.53}$$

This results in two solutions:

$$Q = -K\cot(Ka/2) \quad\text{and}\quad Q = K\tan(Ka/2). \tag{A1.3.54}$$
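The two branches of equation A1.3.54 can be solved numerically for the bound-state energies. The sketch below is illustrative only: it assumes NumPy, units with $\hbar = m = 1$, the standard relations $K = \sqrt{2(E + V_0)}$ and $Q = \sqrt{-2E}$ for this well, and hypothetical values $V_0 = 200$ and $a = 1$. To avoid the poles of the tangent and cotangent, it finds the zeros of the product of the two branch conditions multiplied by $\sin(Ka)$, that is $(Q^2 - K^2)\sin(Ka) + 2QK\cos(Ka) = 0$, using a grid scan followed by bisection.

```python
import numpy as np

V0, a = 200.0, 1.0   # assumed well depth and width (hbar = m = 1)

def g(E):
    """Smooth bound-state condition; its zeros are the levels of the well."""
    K = np.sqrt(2.0 * (E + V0))
    Q = np.sqrt(-2.0 * E)
    return (Q**2 - K**2) * np.sin(K * a) + 2.0 * Q * K * np.cos(K * a)

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection on an interval where f changes sign."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if flo * fmid <= 0.0:
            hi = mid
        else:
            lo, flo = mid, fmid
    return 0.5 * (lo + hi)

# Scan -V0 < E < 0 for sign changes, then refine each bracket.
grid = np.linspace(-V0 + 1e-6, -1e-6, 20001)
vals = g(grid)
levels = [bisect(g, grid[i], grid[i + 1])
          for i in range(len(grid) - 1) if vals[i] * vals[i + 1] < 0.0]

# Infinite-well ('particle in a box') values measured from the well bottom.
box = [(n * np.pi / a) ** 2 / 2.0 for n in range(1, len(levels) + 1)]
for E, Eb in zip(levels, box):
    print(f"E + V0 = {E + V0:8.3f}   box value = {Eb:8.3f}")
```

Each finite-well level lies below the corresponding box value, and the two approach each other as the assumed depth $V_0$ is increased.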

If $\psi_{\rm I}$ and $\psi_{\rm II}$ are inserted into the one-dimensional Schrödinger equation, one finds $E = \hbar^2K^2/2m - V_0$, or
$K = \sqrt{2m(E + V_0)/\hbar^2}$, and $E = -\hbar^2Q^2/2m$. In the limit $V_0 \to \infty$, or $K \to \infty$, equation A1.3.53 can result in a finite
value for $Q$ only if $\tan(Ka/2) \to 0$ or $\cot(Ka/2) \to 0$ (i.e. $Ka = n\pi$ where $n$ is an integer). The energy levels in
this limit correspond to the standard 'particle in a box' eigenvalues when energies are measured from the bottom
of the well:

$$E_n = \frac{\hbar^2(n\pi/a)^2}{2m}.$$

In the Kronig-Penney model, a periodic array of such potentials is considered, as illustrated in figure A1.3.6.
The width of the wells is $a$ and the wells are separated by a distance $b$. Space can be divided into distinct
regions: region I ($-b < x < 0$), region II ($0 < x < a$) and region III ($a < x < a + b$). In region I, the wavefunction
can be taken as


$$\psi_{\rm I}(x) = C\exp(Qx) + D\exp(-Qx). \tag{A1.3.55}$$

In region II, the wavefunction is

$$\psi_{\rm II}(x) = A\exp(iKx) + B\exp(-iKx). \tag{A1.3.56}$$

Unlike an isolated well, there is no restriction on the sign of the exponentials, i.e. both $\exp(+Qx)$ and
$\exp(-Qx)$ are allowed. For an isolated well, the sign was restricted so that the exponential vanished as $|x| \to \infty$.
Either sign is allowed for the periodic array as the extent of the wavefunction within each region is finite.

Figure A1.3.6. An isolated square well (top). A periodic array of square wells (bottom). This model is used in
the Kronig-Penney description of energy bands in solids.


Because our system is periodic, one need only consider the wavefunctions in I and II and apply the periodic
boundary conditions for other regions of space. Bloch's theorem can be used in this case: $\psi(x + a) = \exp(ika)\psi(x)$
for a lattice of period $a$ or, for the present array of period $a + b$, $\psi(x + (a + b)) = \exp(ik(a + b))\psi(x)$. This
relates $\psi_{\rm III}$ and $\psi_{\rm I}$:

$$\psi_{\rm III}(x) = \exp(ik(a + b))\,\psi_{\rm I}(x - a - b) \tag{A1.3.57}$$

or

$$\psi_{\rm III}(x) = \exp(ik(a + b))\{C\exp(Q(x - a - b)) + D\exp(-Q(x - a - b))\}. \tag{A1.3.58}$$


k now serves to label the states in the same sense n serves to label states for a square well. 

As in the case of the isolated well, one can impose continuity of the wavefunctions and the first derivatives of
the wavefunctions at $x = 0$ and $x = a$. At $x = 0$,

$$A + B = C + D, \qquad iK(A - B) = Q(C - D) \tag{A1.3.59}$$

and at $x = a$

$$A\exp(iKa) + B\exp(-iKa) = \exp(ik(a + b))\{C\exp(-Qb) + D\exp(Qb)\} \tag{A1.3.60}$$

$$iK\{A\exp(iKa) - B\exp(-iKa)\} = Q\exp(ik(a + b))\{C\exp(-Qb) - D\exp(Qb)\}. \tag{A1.3.61}$$


This results in four equations and four unknowns. Since the equations are homogeneous, a nontrivial solution 
exists only if the determinant formed by the coefficients of A, B, C and D vanishes. The solution to this 
equation is 


$$\frac{Q^2 - K^2}{2QK}\sinh(Qb)\sin(Ka) + \cosh(Qb)\cos(Ka) = \cos(k(a + b)). \tag{A1.3.62}$$

Equation A1.3.62 provides a relationship between the wavevector, $k$, and the energy, $E$, which is implicit in $Q$
and $K$.
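Because the right-hand side of equation A1.3.62 is a cosine, a real $k$ exists only at energies for which the left-hand side lies between $-1$ and $1$. The sketch below uses this to locate the allowed bands for $-V_0 < E < 0$. It is an illustration under stated assumptions: NumPy, units with $\hbar = m = 1$, $K = \sqrt{2(E + V_0)}$ and $Q = \sqrt{-2E}$, hypothetical parameters $V_0 = 10$ and $a = 1$, and illustrative function names `kp_lhs` and `allowed_bands`.

```python
import numpy as np

def kp_lhs(E, V0, a, b):
    """Left-hand side of the Kronig-Penney condition for -V0 < E < 0."""
    K = np.sqrt(2.0 * (E + V0))
    Q = np.sqrt(-2.0 * E)
    return ((Q**2 - K**2) / (2.0 * Q * K)) * np.sinh(Q * b) * np.sin(K * a) \
        + np.cosh(Q * b) * np.cos(K * a)

def allowed_bands(V0, a, b, n_grid=200001):
    """Return (E_min, E_max) pairs of grid energies where |lhs| <= 1."""
    E = np.linspace(-V0 + 1e-9, -1e-9, n_grid)
    ok = np.abs(kp_lhs(E, V0, a, b)) <= 1.0
    bands, start, previous = [], None, None
    for Ei, allowed in zip(E, ok):
        if allowed and start is None:
            start = Ei                        # band opens
        elif not allowed and start is not None:
            bands.append((start, previous))   # band closed at the previous point
            start = None
        previous = Ei
    if start is not None:                     # band still open at the top of the grid
        bands.append((start, E[-1]))
    return bands

bands_far = allowed_bands(V0=10.0, a=1.0, b=2.0)    # widely separated wells
bands_near = allowed_bands(V0=10.0, a=1.0, b=0.2)   # closely spaced wells
width_far = bands_far[0][1] - bands_far[0][0]
width_near = bands_near[0][1] - bands_near[0][0]
print("lowest band, b = 2.0:", bands_far[0], "width", width_far)
print("lowest band, b = 0.2:", bands_near[0], "width", width_near)
```

For the larger separation the lowest band is very narrow and sits close to the lowest level of the isolated well; for the smaller separation it broadens. The scan covers only bound-state energies, since the form of equation A1.3.62 used here assumes $E < 0$.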


Before this result is explored in more detail, consider the limit where $b \to \infty$. In this limit, the wells become
isolated and $k$ has no meaning. As $b \to \infty$, $\sinh(Qb) \to \exp(Qb)/2$ and $\cosh(Qb) \to \exp(Qb)/2$. One can
rewrite equation A1.3.62 as

$$\frac{\exp(Qb)}{2}\left(\frac{Q^2 - K^2}{2QK}\sin(Ka) + \cos(Ka)\right) = \cos(k(a + b)). \tag{A1.3.63}$$

As $\exp(Qb)/2 \to \infty$, this equation can be valid only if

$$\frac{Q^2 - K^2}{2QK}\sin(Ka) + \cos(Ka) = 0 \tag{A1.3.64}$$

otherwise the left-hand side of equation A1.3.63 would diverge. Multiplying equation A1.3.64 by $2QK/\sin(Ka)$
reduces it to the isolated well solution (equation A1.3.53):

$$Q^2 + 2QK\cot(Ka) - K^2 = 0. \tag{A1.3.65}$$


Since $k$ does not appear in equation A1.3.65 in this limit, it is undefined.

One can illustrate how the energy states evolve from discrete levels in an isolated well to states appropriate
for periodic systems by varying the separation between wells. In figure A1.3.7 solutions for $E$ versus $k$ are
shown for isolated wells and for strongly interacting wells. It is important to note that $k$ is defined only to
within an additive term $2\pi m/(a + b)$, where $m$ is an integer, as $\cos((k + 2\pi m/(a + b))(a + b)) = \cos(k(a + b))$. The $E$
versus $k$ plot need be displayed only for $k$ between 0 and $\pi/(a + b)$, as larger values of $k$ can be mapped into
this interval by subtracting off multiples of $2\pi/(a + b)$ and using $E(k) = E(-k)$.

Figure A1.3.7. Evolution of energy bands in the Kronig-Penney model as the separation between wells, $b$
(figure A1.3.6), is decreased from (a) to (d). In (a) the wells are separated by a large distance (large value of $b$)
and the energy bands resemble discrete levels of an isolated well. In (d) the wells are quite close together
(small value of $b$) and the energy bands are free-electron-like.

In the case where the wells are far apart, the resulting energy levels are close to those of the isolated well.
However, an interesting phenomenon occurs as the wells are brought closer together. The energy levels cease being
constant as a function of the wavevector, k. There are regions of allowed solutions and regions where no 
energy state occurs. The region of allowed energy values is called an energy band. The range of energies 
within the band is called the band width. As the width of the band increases, it is said that the band has greater 
dispersion. 

The Kronig-Penney solution illustrates that, for periodic systems, gaps can exist between bands of energy
states. As for the case of a free electron gas, each band can hold $2N$ electrons where $N$ is the number of wells
present. In one dimension, this implies that if a well contains an odd number of electrons, one will have partially
occupied bands. If one has an even number of electrons per well, one will have fully occupied energy bands. This
distinction between odd and even numbers of electrons per cell is of fundamental importance. The Kronig-Penney
model implies that crystals with an odd number of electrons per unit cell are always metallic whereas
an even number of electrons per unit cell implies an insulating state. This simple rule is valid for more realistic
potentials and need be only slightly modified in three dimensions. In three dimensions, an even number of
electrons per unit cell is a necessary condition for an insulating state, but not a sufficient condition.

One of the major successes of energy band theory is that it can be used to predict whether a crystal exists as a
metal or insulator. If a band is filled, the Pauli principle prevents electrons from changing their momentum in
response to the electric field as all possible momentum states are occupied. In a metal this constraint is not
present as an electron can change its momentum state by moving from a filled to an unoccupied state within a
given band. The distinct types of energy bands for insulators, metals, semiconductors and semimetals are
schematically illustrated in figure A1.3.8. In an insulator, energy bands are either completely empty or
completely filled. The band gap between the highest occupied band and lowest empty band is large, e.g. above
5 eV. In a semiconductor, the bands are also completely filled or empty, but the gap is smaller, e.g. below 3
eV. In metals, bands are not completely occupied and no gap between filled and empty states occurs.
Semimetals are a special case. No gap exists, but one band is almost completely occupied; it overlaps with a
band that is almost completely empty.

Figure A1.3.8. Schematic energy bands illustrating an insulator (large band gap), a semiconductor (small
band gap), a metal (no gap) and a semimetal. In a semimetal, one band is almost filled and another band is
almost empty.


A1.3.5.2 RECIPROCAL SPACE 


Expressing $E(\mathbf{k})$ is complicated by the fact that $\mathbf{k}$ is not unique. In the Kronig-Penney model, if one replaced $k$
by $k + 2\pi/(a + b)$, the energy remained unchanged. In three dimensions $\mathbf{k}$ is known only to within a reciprocal
lattice vector, $\mathbf{G}$. One can define a set of reciprocal vectors, given by

$$\mathbf{G} = m_1\mathbf{A} + m_2\mathbf{B} + m_3\mathbf{C} \tag{A1.3.66}$$


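where the $m_i$ are integers and the set $(\mathbf{A}, \mathbf{B}, \mathbf{C})$ defines a lattice in reciprocal space. These vectors can be
defined by

$$\mathbf{A} = \frac{2\pi}{\Omega}\,\mathbf{b}\times\mathbf{c} \qquad \mathbf{B} = \frac{2\pi}{\Omega}\,\mathbf{c}\times\mathbf{a} \qquad \mathbf{C} = \frac{2\pi}{\Omega}\,\mathbf{a}\times\mathbf{b} \tag{A1.3.67}$$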


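where $\Omega$ is defined as the unit cell volume. Note that $\Omega = |\mathbf{a}\cdot\mathbf{b}\times\mathbf{c}|$ from elementary vector analysis. It is easy
to show that

$$\begin{aligned}
\mathbf{A}\cdot\mathbf{a} &= 2\pi & \mathbf{A}\cdot\mathbf{b} &= 0 & \mathbf{A}\cdot\mathbf{c} &= 0 \\
\mathbf{B}\cdot\mathbf{a} &= 0 & \mathbf{B}\cdot\mathbf{b} &= 2\pi & \mathbf{B}\cdot\mathbf{c} &= 0 \\
\mathbf{C}\cdot\mathbf{a} &= 0 & \mathbf{C}\cdot\mathbf{b} &= 0 & \mathbf{C}\cdot\mathbf{c} &= 2\pi.
\end{aligned} \tag{A1.3.68}$$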


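It is apparent that, for a lattice vector $\mathbf{R} = n_1\mathbf{a} + n_2\mathbf{b} + n_3\mathbf{c}$,

$$\mathbf{G}\cdot\mathbf{R} = 2\pi(n_1m_1 + n_2m_2 + n_3m_3). \tag{A1.3.69}$$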
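The construction of the reciprocal vectors and the relations A1.3.68 and A1.3.69 can be verified directly. The sketch below assumes NumPy and the FCC primitive vectors of equation A1.3.43 with a lattice parameter of 1; the function name `reciprocal_vectors` and the integer triples are illustrative choices. The signed triple product is used for the volume so that the relations also hold for a left-handed set of lattice vectors; for the right-handed set used here it equals $\Omega$.

```python
import numpy as np

def reciprocal_vectors(a, b, c):
    """Reciprocal vectors A, B, C built from cross products of a, b, c."""
    omega = np.dot(a, np.cross(b, c))          # signed cell volume
    A = 2.0 * np.pi * np.cross(b, c) / omega
    B = 2.0 * np.pi * np.cross(c, a) / omega
    C = 2.0 * np.pi * np.cross(a, b) / omega
    return A, B, C

x, y, z = np.eye(3)
a_lat = 1.0                                    # lattice parameter (assumed units)
a, b, c = a_lat * (y + z) / 2, a_lat * (x + z) / 2, a_lat * (x + y) / 2

A, B, C = reciprocal_vectors(a, b, c)
direct = np.array([a, b, c])
recip = np.array([A, B, C])

# dots[i, j] is the dot product of the i-th reciprocal vector with the
# j-th direct vector; it should equal 2*pi times the identity matrix.
dots = recip @ direct.T

# An arbitrary lattice vector R and reciprocal lattice vector G.
n = np.array([2, 0, -1])
m = np.array([1, -2, 3])
R = n @ direct
G = m @ recip
phase_over_2pi = (G @ R) / (2.0 * np.pi)       # equals n1*m1 + n2*m2 + n3*m3
print(dots / (2.0 * np.pi))
print(A / (2.0 * np.pi), phase_over_2pi)
```

For these inputs the reciprocal vectors are of the BCC type, for example $\mathbf{A} = (2\pi/a)(-1, 1, 1)$.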


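Reciprocal lattice vectors are useful in defining periodic functions. For example, the valence charge density,
$\rho(\mathbf{r})$, can be expressed as

$$\rho(\mathbf{r}) = \sum_{\mathbf{G}}\rho(\mathbf{G})\exp(i\mathbf{G}\cdot\mathbf{r}). \tag{A1.3.70}$$

It is clear that $\rho(\mathbf{r} + \mathbf{R}) = \rho(\mathbf{r})$ from equation A1.3.69, since $\exp(i\mathbf{G}\cdot\mathbf{R}) = 1$. The Fourier coefficients, $\rho(\mathbf{G})$,
can be determined from

$$\rho(\mathbf{G}) = \frac{1}{\Omega}\int\rho(\mathbf{r})\exp(-i\mathbf{G}\cdot\mathbf{r})\,dV. \tag{A1.3.71}$$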


Because $E(\mathbf{k}) = E(\mathbf{k} + \mathbf{G})$, a knowledge of $E(\mathbf{k})$ within a given volume called the Brillouin zone is sufficient to
determine $E(\mathbf{k})$ for all $\mathbf{k}$. In one dimension, $G = 2\pi n/d$ where $d$ is the lattice spacing between atoms. In this
case, $E(k)$ is known for all $k$ once it is determined for $-\pi/d < k < \pi/d$. (For example, in the Kronig-Penney model
(figure A1.3.6), $d = a + b$ and $k$ was defined only to within a vector $2\pi/(a + b)$.) In three dimensions, this
subspace can result in complex polyhedra for the Brillouin zone.

In figure A1.3.9 the Brillouin zones for an FCC and a BCC crystal are illustrated. It is a common practice to
label high-symmetry points and directions by letters or symbols. For example, the $\mathbf{k} = 0$ point is called the $\Gamma$
point. For cubic crystals, there exist 48 symmetry operations and this symmetry is maintained in the energy
bands: e.g., $E(k_x, k_y, k_z)$ is invariant under sign changes and permutations of $(x, y, z)$. As such, one need only have
knowledge of $E(\mathbf{k})$ in 1/48 of the zone to determine the energy band throughout the zone. The part of the zone
which cannot be reduced by symmetry is called the irreducible Brillouin zone.
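In one dimension, the reduction of an arbitrary wavevector to the first Brillouin zone amounts to subtracting a multiple of $2\pi/d$. A minimal sketch, assuming NumPy and a hypothetical helper named `fold_to_first_zone`, is:

```python
import numpy as np

def fold_to_first_zone(k, d):
    """Map k into the interval [-pi/d, pi/d) by removing a multiple of 2*pi/d."""
    G = 2.0 * np.pi / d
    return (k + 0.5 * G) % G - 0.5 * G

d = 1.0                      # lattice spacing (assumed units)
k = 7.0                      # a wavevector outside the first zone
k_folded = fold_to_first_zone(k, d)

# Integer number of reciprocal lattice vectors that was removed.
n = round((k - k_folded) * d / (2.0 * np.pi))
print(k_folded, n)
```

Any quantity with the periodicity of the reciprocal lattice, such as $\cos(kd)$ in the Kronig-Penney condition, takes the same value at `k` and `k_folded`.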




Figure A1.3.9. Brillouin zones for the FCC and BCC crystal structures.

A1.3.5.3 REALISTIC ENERGY BANDS

Since the electronic structure of a solid can be determined from a knowledge of the spatial and energetic
distribution of electrons (i.e. from the charge density, $\rho(\mathbf{r})$, and the electronic density of states, $D(E)$), it is
highly desirable to have the ability to determine the quantum states of a crystal. The first successful electronic
structure calculations for energy bands of crystalline matter were not performed from 'first principles'. 
Although elements of density functional theory were understood by the mid-1960s, it was not clear how 
reliable these methods were. Often, two seemingly identical calculations would yield very different results for 
simple issues such as whether a solid was a metal or an insulator. Consequently, some of the first reliable 
energy bands were constructed using empirical pseudopotentials [25]. These potentials were extracted from 
experimental data and not determined from first principles. 


A1.3.5.4 EMPIRICAL PSEUDOPOTENTIALS

The first reliable energy band theories were based on a powerful approximation, called the pseudopotential
approximation. Within this approximation, the all-electron potential corresponding to interaction of a valence
electron with the inner, core electrons and the nucleus is replaced by a pseudopotential. The pseudopotential
reproduces only the properties of the outer electrons. There are rigorous theorems such as the Phillips-Kleinman
cancellation theorem that can be used to justify the pseudopotential model [2, 3, 26]. The Phillips-Kleinman
cancellation theorem states that the orthogonality requirement of the valence states to the core
states can be described by an effective repulsive potential. This repulsive potential cancels the strong
Coulombic potential within the core region. The
cancellation theorem explains, in part, why valence electrons feel a less attractive potential than would be 
expected on the basis of the Coulombic part of the potential. For example, in alkali metals an 'empty' core 
pseudopotential approximation is often made. In this model pseudopotential, the valence electrons experience 
no Coulomb potential within the core region. 

Since the pseudopotential does not bind the core states, it is a very weak potential. Simple basis functions can 
be used to describe the pseudo-wavefunctions. For example, a simple grid or plane wave basis will yield a 
converged solution [25]. The simplicity of the basis is important as it results in an unbiased, flexible 
description of the charge density. Also, since the nodal structure of the pseudo-wavefunctions has been 
removed, the charge density varies slowly in the core region. A schematic of the pseudopotential model
is illustrated in figure A1.3.10. The pseudopotential model describes a solid as a sea of valence electrons
moving in a periodic background of cores (composed of nuclei and inert core electrons). In this model many 
of the complexities of all-electron calculations, calculations that include the core and valence electrons on an 
equal footing, are avoided. A group IV solid such as C with 6 electrons per atom is treated in a similar fashion 
to Sn with 50 electrons per atom since both have 4 valence electrons per atom. In addition, the focus of the 
calculation is only on the accuracy of the valence electron wavefunction in the spatial region away from the 
chemically inert core. 



Figure A1.3.10. Pseudopotential model. The outer electrons (valence electrons) move in a fixed arrangement
of chemically inert ion cores. The ion cores are composed of the nucleus and core electrons.


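One can quantify the pseudopotential by writing the total crystalline potential for an elemental solid as

$$V_p(\mathbf{r}) = \sum_{\mathbf{G}} S(\mathbf{G})\,V_p^a(\mathbf{G})\exp(i\mathbf{G}\cdot\mathbf{r}). \tag{A1.3.72}$$

$S(\mathbf{G})$ is the structure factor given by

$$S(\mathbf{G}) = \frac{1}{N_a}\sum_{\boldsymbol{\tau}}\exp(-i\mathbf{G}\cdot\boldsymbol{\tau}) \tag{A1.3.73}$$

where $N_a$ is the number of atoms in the unit cell and $\boldsymbol{\tau}$ is a basis vector. $V_p^a(\mathbf{G})$ is the form factor given by

$$V_p^a(\mathbf{G}) = \frac{1}{\Omega_a}\int V_p^a(\mathbf{r})\exp(-i\mathbf{G}\cdot\mathbf{r})\,dV \tag{A1.3.74}$$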

where $\Omega_a$ is the volume per atom and $V_p^a(\mathbf{r})$ is a pseudopotential associated with an atom. Often this potential
is assumed to be spherically symmetric. In this case, the form factor depends only on the magnitude of $\mathbf{G}$:
$V_p^a(\mathbf{G}) = V_p^a(|\mathbf{G}|)$. A schematic pseudopotential is illustrated in figure A1.3.11. Outside the core region the
pseudopotential is commensurate with the all-electron potential. When this potential is transformed into
Fourier space, it is often sufficient to keep just a few unique form factors to characterize the potential. These
form factors are then treated as adjustable parameters which can be fitted to experimental data. This is
illustrated in figure A1.3.12.

Figure A1.3.11. Schematic pseudopotential in real space.

Figure A1.3.12. Schematic pseudopotential in reciprocal space.

The empirical pseudopotential method can be illustrated by considering a specific semiconductor such as
silicon. The crystal structure of Si is diamond. The structure is shown in figure A1.3.4. The lattice vectors and
basis for a primitive cell have been defined in the section on crystal structures (A1.3.4.1). In Cartesian
coordinates, one can write $\mathbf{G}$ for the diamond structure as

$$\mathbf{G} = \frac{2\pi}{a}(n, l, m) \tag{A1.3.75}$$

where the indices $(n, l, m)$ must be either all odd or all even: e.g., $\mathbf{G} = (2\pi/a)(1, 0, 0)$ is not allowed, but
$\mathbf{G} = (2\pi/a)(2, 0, 0)$ is permitted. It is convenient to organize G-vectors by their magnitude squared in units of
$(2\pi/a)^2$. In this scheme: $G^2 = 0, 3, 4, 8, 11, 12, \ldots$. The structure factor for the diamond structure is
$S(\mathbf{G}) = \cos(\mathbf{G}\cdot\boldsymbol{\tau})$. For some values of $\mathbf{G}$, this structure factor vanishes: e.g., if $\mathbf{G} = (2\pi/a)(2, 0, 0)$, then
$\mathbf{G}\cdot\boldsymbol{\tau} = \pi/2$ and $S(\mathbf{G}) = 0$. If the structure factor vanishes, the corresponding form factor is irrelevant as it is
multiplied by a zero structure factor. In the case of the diamond structure, this eliminates the $G^2 = 4, 12$ form
factors. Also, the $G^2 = 0$ factor is not important for spectroscopy as it corresponds to the average potential and
serves to shift the energy bands by a constant. The rapid convergence of the pseudopotential in Fourier space
coupled with the vanishing of the structure factor for certain $\mathbf{G}$ means that only three form factors are required
to fix the energy bands for diamond semiconductors like Si and Ge: $V_p^a(G^2 = 3)$, $V_p^a(G^2 = 8)$ and $V_p^a(G^2 = 11)$. These
form factors can be fixed by comparisons to reflectivity measurements or photoemission [25].
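The selection of G-vectors and the vanishing of the structure factor can be reproduced by direct enumeration. The sketch below assumes NumPy, works with $\mathbf{G}$ in units of $2\pi/a$ and $\boldsymbol{\tau} = a(1, 1, 1)/8$, and restricts the indices to an arbitrary range of $-4$ to $4$, which is enough to cover the shells quoted above. It groups the allowed vectors by $G^2$ and records $|\cos(\mathbf{G}\cdot\boldsymbol{\tau})|$ for each shell.

```python
import itertools
import numpy as np

tau = np.array([1, 1, 1]) / 8.0          # basis vector in units of a
shells = {}                              # G^2 -> set of |S(G)| values found

for n, l, m in itertools.product(range(-4, 5), repeat=3):
    if len({n % 2, l % 2, m % 2}) != 1:
        continue                         # mixed parity: not a reciprocal lattice vector
    g = np.array([n, l, m])              # G in units of 2*pi/a
    g2 = int(g @ g)                      # G^2 in units of (2*pi/a)^2
    s = np.cos(2.0 * np.pi * (g @ tau))  # structure factor cos(G . tau)
    shells.setdefault(g2, set()).add(round(abs(s), 12))

for g2 in sorted(shells)[:6]:
    print(g2, sorted(shells[g2]))
```

The shells with $G^2 = 4$ and $12$ have zero structure factor, while $G^2 = 3$, $8$ and $11$ survive, which is why only those three form factors are needed once $G^2 = 0$ is set aside.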

A1.3.5.5 DENSITY FUNCTIONAL PSEUDOPOTENTIALS

Another realistic approach is to construct pseudopotentials using density functional theory. The 
implementation of the Kohn-Sham equations to condensed matter phases without the pseudopotential 
approximation is not easy owing to the dramatic span in length scales of the wavefunction and the energy 
range of the eigenvalues. The pseudopotential eliminates this problem by removing the core electrons from 
the problem and results in a much simpler problem [27].

In the pseudopotential construction, the atomic wavefunctions for the valence electrons are taken to be
nodeless. The pseudo-wavefunction is taken to be identical to the appropriate all-electron wavefunction in the
regions of interest for solid-state effects. For the core region, the wavefunction is extrapolated back to the
origin in a manner consistent with the normalization condition. This type of construction was first introduced
by Fermi to account for the shift in the wavefunctions of high-lying states of alkali atoms subject to
perturbations from foreign atoms. In this remarkable paper, Fermi introduced the conceptual basis for both the
pseudopotential and the scattering length [28].

With the density functional theory, the first step in the construction of a pseudopotential is to consider the 
solution for an isolated atom [27]. If the atomic wavefunctions are known, the pseudo-wavefunction can be 
constructed by removing the nodal structure of the wavefunction. For example, if one considers a valence 
wavefunction for the isolated atom, $\psi_v(r)$, then a pseudo-wavefunction, $\phi_p(r)$, might have the properties

$$\phi_p(r) = \begin{cases} r^l\exp(-\alpha r^4 - \beta r^3 - \gamma r^2 - \delta) & r < r_c \\ \psi_v(r) & r > r_c. \end{cases} \tag{A1.3.76}$$

The pseudo-wavefunction within this framework is guaranteed to be nodeless. The parameters $(\alpha, \beta, \gamma, \delta)$ are
fixed so that (1) $\phi_p$ and $\psi_v$ have the same eigenvalue, $\varepsilon_v$, and the same norm:

$$\int_0^{r_c}|\phi_p(r)|^2r^2\,dr = \int_0^{r_c}|\psi_v(r)|^2r^2\,dr. \tag{A1.3.77}$$

This ensures that $\phi_p(r) = \psi_v(r)$ for $r > r_c$ after the wavefunctions have been normalized. (2) The
pseudo-wavefunction should be continuous and have continuous first and second derivatives at $r_c$. An example of a
pseudo-wavefunction is given in figure A1.3.13.

Figure A1.3.13. All-electron and pseudopotential wavefunction for the 3s state in silicon. The all-electron 3s
state has nodes which arise because of an orthogonality requirement to the 1s and 2s core states.

Once the eigenvalue and pseudo-wavefunction are known for the atom, the Kohn-Sham equation can be
inverted to yield the ionic pseudopotential:

$$V_{{\rm ion},p}(r) = \varepsilon_v - V_H(r) - V_{xc}[\rho(r)] + \frac{\hbar^2\nabla^2\phi_p}{2m\,\phi_p}. \tag{A1.3.78}$$

Since $V_H$ and $V_{xc}$ depend only on the valence charge densities, they can be determined once the valence
pseudo-wavefunctions are known. Because the pseudo-wavefunctions are nodeless, the resulting
pseudopotential is well defined despite the last term in equation A1.3.78. Once the pseudopotential has been
constructed from the atom, it can be transferred to the condensed matter system of interest. For example, the
ionic pseudopotential defined by equation A1.3.78 from an atomic calculation can be transferred to
condensed matter phases without any significant loss of accuracy.
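The inversion step can be illustrated on a case where the answer is known. The sketch below is a simplified check, not a pseudopotential generator: it assumes NumPy, atomic units ($\hbar = m = 1$), and a single electron, so the Hartree and exchange-correlation terms are omitted. It takes the nodeless hydrogen 1s orbital $\exp(-r)$ with eigenvalue $-1/2$ hartree, forms the eigenvalue plus the kinetic term $\nabla^2\phi/(2\phi)$ by finite differences, and compares the result with the Coulomb potential $-1/r$.

```python
import numpy as np

r = np.linspace(0.05, 10.0, 4000)    # radial grid in bohr (assumed range)
phi = np.exp(-r)                     # nodeless hydrogen 1s orbital, unnormalized
eps = -0.5                           # its eigenvalue in hartree

# Radial Laplacian of an s state: phi'' + (2/r) phi'
dphi = np.gradient(phi, r)
d2phi = np.gradient(dphi, r)
laplacian = d2phi + 2.0 * dphi / r

# Inverted one-electron equation: V = eps + (1/2) * laplacian(phi) / phi
V = eps + 0.5 * laplacian / phi

# Skip a few end points, where the one-sided differences are less accurate.
interior = slice(5, -5)
max_error = np.max(np.abs(V[interior] + 1.0 / r[interior]))
print(max_error)
```

The division by the wavefunction is harmless here because the orbital has no nodes, which is the same reason the pseudopotential of equation A1.3.78 is well defined.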

There are complicating issues in defining pseudopotentials, e.g. the pseudopotential in equation A1.3.78 is
state dependent and orbitally dependent, and the energy and spatial separations between valence and core
electrons are sometimes not transparent. These are not insurmountable issues. The state dependence is usually
weak and can be ignored. The orbital dependence requires different potentials for different angular
momentum components. This can be incorporated via non-local operators. The distinction between valence
and core states can be addressed by incorporating the core level in question as part of the valence shell. For
example, in Zn one can treat the $3d^{10}$ shell as a valence shell. In this case, the valency of Zn is 12, not 2.
There are also very reliable approximate methods for treating the outer core states without explicitly
incorporating them in the valence shell.

A1.3.5.6 OTHER APPROACHES


There are a variety of other approaches to understanding the electronic structure of crystals. Most of them rely
on a density functional approach, with or without the pseudopotential, and use different bases. For example,
instead of a plane wave basis, one might write a basis composed of atomic-like orbitals:

$$\psi_{\mathbf{k}}(\mathbf{r}) = \sum_{i,\mathbf{R}}\alpha_i(\mathbf{k})\exp(i\mathbf{k}\cdot\mathbf{R})\,\phi_i(\mathbf{r} - \mathbf{R}) \tag{A1.3.79}$$

where the $\exp(i\mathbf{k}\cdot\mathbf{R})$ is explicitly written to illustrate the Bloch form of this wavefunction: i.e.
$\psi_{\mathbf{k}}(\mathbf{r} + \mathbf{R}) = \exp(i\mathbf{k}\cdot\mathbf{R})\,\psi_{\mathbf{k}}(\mathbf{r})$. The orbitals $\phi_i$ can be taken from atomic structure solutions where $i$ is a
general index such as $lmns$, or $\phi_i$ can be taken to be some localized function such as an exponential, called a
Slater-type orbital, or a Gaussian orbital. Provided the basis functions are appropriately chosen, this approach
works quite well for a wide variety of solids. This approach is called the tight binding method [2, 7].
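The Bloch property of the sum in equation A1.3.79 can be seen in a one-dimensional example with a single orbital per site. The sketch below assumes NumPy, a hypothetical Gaussian orbital of width $0.3a$, and a finite chain of 101 sites standing in for the infinite lattice; the orbital decays fast enough that the truncation has no visible effect at the test point.

```python
import numpy as np

a = 1.0                          # lattice spacing (assumed units)
k = 0.4 * np.pi / a              # wavevector of the Bloch sum
R = a * np.arange(-50, 51)       # lattice sites of a finite chain

def orbital(x):
    """Localized Gaussian orbital centred at the origin (illustrative choice)."""
    return np.exp(-(x / 0.3) ** 2)

def bloch_sum(x):
    """Sum over sites of exp(ikR) times the orbital centred on R."""
    return np.sum(np.exp(1j * k * R) * orbital(x - R))

x0 = 0.21                        # arbitrary test point near the chain centre
lhs = bloch_sum(x0 + a)
rhs = np.exp(1j * k * a) * bloch_sum(x0)
print(abs(lhs - rhs))
```

Shifting the argument by one lattice spacing relabels the sites in the sum and leaves an overall factor $\exp(ika)$, which is the statement $\psi_k(x + a) = \exp(ika)\psi_k(x)$.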

An approach closely related to the pseudopotential is the orthogonalized plane wave method [29]. In this 
method, the basis is taken to be as follows: 


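$$\phi_{\mathbf{k}}^{\rm OPW}(\mathbf{r}) = \exp(i\mathbf{k}\cdot\mathbf{r}) - \sum_i\beta_i\,\chi_{i,\mathbf{k}}(\mathbf{r}) \tag{A1.3.80}$$

and

$$\chi_{i,\mathbf{k}}(\mathbf{r}) = \sum_{\mathbf{R}}\exp(i\mathbf{k}\cdot\mathbf{R})\,a_i(\mathbf{r} - \mathbf{R}) \tag{A1.3.81}$$

where $\chi_{i,\mathbf{k}}$ is a tight binding wavefunction composed of atomic core functions, $a_i$. As an example, one would
take $(a_{1s}, a_{2s}, a_{2p})$ atomic orbitals for the core states of silicon. The form for $\phi_{\mathbf{k}}^{\rm OPW}(\mathbf{r})$ is motivated by several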
factors. In the interstitial regions of a crystal, the potential should be weak and slowly varying. The 


wavefunction should look like a plane wave in this region. Near the nucleus, the wavefunction should look 
atomic-like. The basis reflects these different regimes by combining plane waves with atomic orbitals. 
Another important attribute of the wavefunction is an orthogonality condition. This condition arises from the 
form of the Schrödinger equation; states with higher-energy eigenvalues must have wavefunctions which are
orthogonal to more tightly bound states of the same symmetry: e.g., the 2s wavefunction of an atom must be
orthogonal to the 1s state. It is possible to choose $\beta_i$ so that

$$\int\phi_{\mathbf{k}}^{\rm OPW}(\mathbf{r})\,\chi^{*}_{i,\mathbf{k}}(\mathbf{r})\,d^3r = 0. \tag{A1.3.82}$$

The orthogonality condition assures one that the lowest energy state will not converge to core-like states, but
to valence states. The wavefunction for the solid can be written as

$$\psi_{\mathbf{k}}(\mathbf{r}) = \sum_{\mathbf{G}}\alpha(\mathbf{k}, \mathbf{G})\,\phi_{\mathbf{k}+\mathbf{G}}^{\rm OPW}(\mathbf{r}). \tag{A1.3.83}$$

As with any basis method, the $\alpha(\mathbf{k}, \mathbf{G})$ coefficients are determined by solving a secular equation.

Other methods for determining the energy band structure include cellular methods, Green function approaches 
and augmented plane waves [2, 3]. The choice of which method to use is often dictated by the particular 
system of interest. Details in applying these methods to condensed matter phases can be found elsewhere (see
section B3.2).


A1.3.6 EXAMPLES FOR THE ELECTRONIC STRUCTURE AND 
ENERGY BANDS OF CRYSTALS 

Many phenomena in solid-state physics can be understood by resort to energy band calculations. Conductivity 
trends, photoemission spectra, and optical properties can all be understood by examining the quantum states 
or energy bands of solids. In addition, electronic structure methods can be used to extract a wide variety of 
properties such as structural energies, mechanical properties and thermodynamic properties. 

A1.3.6.1 SEMICONDUCTORS 

A prototypical semiconducting crystal is silicon. Historically, silicon has been the testing ground for quantum 
theories of condensed matter. This is not surprising given the importance of silicon for technological 
applications. The energy bands for Si are shown in figure A1.3.14. Each band can hold two electrons per unit
cell. There are four valence electrons per silicon atom and two atoms in the unit cell. This leads to four filled
bands. It is customary to show the filled bands and the lowest few empty bands. In the case of silicon the
filled and empty bands are separated by a gap of approximately 1 eV. Semiconductors have band gaps that are less
than a few electronvolts. Displaying the energy bands is not a routine matter as $E(\mathbf{k})$ is often a complex function.
The bands are typically displayed only along high-symmetry directions in the Brillouin zone (see figure A1.3.9).
For example, one might plot the energy bands along the (100) direction (the $\Delta$ direction).

Figure A1.3.14. Band structure for silicon as calculated from empirical pseudopotentials [25].

The occupied bands are called valence bands; the empty bands are called conduction bands. The top of the
valence band is usually taken as energy zero. The lowest conduction band has a minimum along the $\Delta$
direction; the highest occupied valence band has a maximum at $\Gamma$. Semiconductors which have the highest
occupied state $\mathbf{k}_v$ and lowest empty state $\mathbf{k}_c$ at different points are called indirect gap semiconductors. If
$\mathbf{k}_v = \mathbf{k}_c$, the semiconductor is called a direct gap semiconductor. Germanium is also an indirect gap semiconductor
whereas GaAs has a direct gap. It is not easy to predict whether a given semiconductor will have a direct gap
or not.

Electronic and optical excitations usually occur between the upper valence bands and lowest conduction band. 
In optical excitations, electrons are transferred from the valence band to the conduction band. This process 
leaves an empty state in the valence band. These empty states are called holes. Conservation of wavevectors 
must be obeyed in these transitions: $\mathbf{k}_{\mathrm{photon}} + \mathbf{k}_v = \mathbf{k}_c$, where 
$\mathbf{k}_{\mathrm{photon}}$ is the wavevector of the photon, $\mathbf{k}_v$ is the wavevector of the electron 
in the initial valence band state and $\mathbf{k}_c$ is the wavevector of the electron in the final conduction 
band state. For optical excitations, $\mathbf{k}_{\mathrm{photon}} \approx 0$ on the scale of the Brillouin zone. 
This implies that the excitation must be direct: $\mathbf{k}_v \approx \mathbf{k}_c$. Because of this 
conservation rule, direct optical excitations are stronger than indirect excitations. 
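
A rough numerical comparison shows why the photon wavevector can be neglected. The sketch below assumes a 
wavelength of 5000 Å for visible light and a cubic lattice constant of 5.43 Å for silicon; both numbers are 
illustrative inputs supplied here, not values quoted in the text. 

```python
import math


def wavevector(length_angstrom):
    """Magnitude 2*pi/length of a wavevector, in inverse angstroms."""
    return 2.0 * math.pi / length_angstrom


wavelength = 5000.0        # assumed optical wavelength (angstrom)
lattice_constant = 5.43    # assumed silicon lattice constant (angstrom)

k_photon = wavevector(wavelength)
k_zone = wavevector(lattice_constant)   # typical Brillouin zone dimension
ratio = k_photon / k_zone

print("k_photon / k_zone = %.2e" % ratio)
```

The ratio is of order one part in a thousand, so on a band structure plot an optical transition is, to a very 
good approximation, a vertical line. 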

Semiconductors are poor conductors of electricity at low temperatures. Since the valence band is completely 
occupied, an applied electric field cannot change the total momentum of the valence electrons. This is a 
reflection of the Pauli principle. This would not be true for an electron that is excited into the conduction 
band. However, for a band gap of 1 eV or more, few electrons can be thermally excited into the conduction 
band at ambient temperatures. Conversely, the electronic properties of semiconductors at ambient 
temperatures can be profoundly altered by the addition of impurities. In silicon, each atom has four covalent 
bonds, one to each neighbouring atom. All the valence electrons are consumed in saturating these bonds. If a 
silicon atom is removed and replaced by an atom with a different number of valence electrons, there will be a 
mismatch between the number of electrons and the number of covalent bonds. For example, if one replaces a 
silicon atom by a phosphorus atom, then there will be an extra electron that cannot be accommodated as 
phosphorus possesses five instead of four valence electrons. This extra electron is only loosely bound to the 
phosphorus atom and can be easily excited 
into the conduction band. Impurities with an 'extra' electron are called donors. Under the influence of an 
electric field, this donor electron can contribute to the electrical conductivity of silicon. If one were to replace 
a silicon atom by a boron atom, the opposite situation would occur. Boron has only three valence electrons 
and does not possess a sufficient number of electrons to saturate the bonds. In this case, an electron in the 
valence band can readily move into the unsaturated bond. Under the influence of an electric field, this 
unsaturated bond can propagate and contribute to the electrical conductivity as if it were a positively charged 
particle. The unsaturated bond corresponds to a hole excitation. Impurity atoms that have fewer valence 
electrons than are needed to saturate all the covalent bonds are called acceptors. 

Several factors determine how efficient impurity atoms will be in altering the electronic properties of a 
semiconductor. For example, the size of the band gap, the shape of the energy bands near the gap and the 
ability of the valence electrons to screen the impurity atom are all important. The process of adding controlled 
impurity atoms to semiconductors is called doping. The ability to produce well defined doping levels in 
semiconductors is one reason for the revolutionary developments in the construction of solid-state electronic 
devices. 

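Another useful quantity in defining the electronic structure of a solid is the electronic density of states. In 
general the density of states can be defined as 

$$
D(E) = \frac{\Omega_c}{(2\pi)^3} \sum_n \int_{\mathrm{BZ}} \delta\big(E - E_n(\mathbf{k})\big)\, d^3k \tag{A1.3.84}
$$

where $\Omega_c$ is the volume of the unit cell. 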

Unlike the density of states defined in equation A1.3.24, which was specific for the free electron gas, equation 
A1.3.84 is a general expression. The sum in equation A1.3.84 is over all energy bands and the integral is over 
all k-points in the Brillouin zone. The density of states is an extensive function that scales with the size of the 
sample. It is usually normalized to the number of electronic states per atom. In the case of silicon, the number 
of states obtained by integrating D(E) up to the highest occupied states is four states per atom. Since each 
state can hold two electrons with different spin coordinates, eight electrons can be accommodated within the 
valence bands. This corresponds to the number of electrons within the unit cell with the resulting valence 
bands being fully occupied. 
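
In practice the Brillouin zone integral is replaced by a sum over a grid of k-points, with the delta function 
approximated by a histogram. The sketch below does this for a single toy band; the simple cubic tight-binding 
dispersion, the grid size and the bin width are assumptions chosen for illustration and do not describe silicon. 
With the normalization used here (per unit cell, spin not counted) each band must integrate to one state. 

```python
import math


def tight_binding_band(kx, ky, kz, t=1.0):
    """Toy s-band on a simple cubic lattice with unit lattice constant."""
    return -2.0 * t * (math.cos(kx) + math.cos(ky) + math.cos(kz))


def density_of_states(band, n_grid=20, n_bins=52, e_min=-6.5, e_max=6.5):
    """Histogram estimate of D(E) per unit cell for one band (no spin factor)."""
    width = (e_max - e_min) / n_bins
    hist = [0.0] * n_bins
    step = 2.0 * math.pi / n_grid
    # Midpoint grid covering the Brillouin zone from -pi to +pi in each direction.
    ks = [-math.pi + (i + 0.5) * step for i in range(n_grid)]
    weight = 1.0 / (n_grid ** 3 * width)
    for kx in ks:
        for ky in ks:
            for kz in ks:
                b = int((band(kx, ky, kz) - e_min) / width)
                if 0 <= b < n_bins:
                    hist[b] += weight
    centres = [e_min + (i + 0.5) * width for i in range(n_bins)]
    return centres, hist, width


energies, dos, de = density_of_states(tight_binding_band)
states_per_band = sum(dos) * de
print("integrated states per band: %.6f" % states_per_band)
```

Summing such histograms over all occupied bands and integrating up to the highest occupied energy reproduces the 
state counting described above. 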

The density of states for crystalline silicon is shown in figure A1.3.15. The density of states is a more general 
representation of the energetic distribution of electrons than the energy band structure. The distribution of 
states can be given without regard to the k wavevector. It is possible to compare the density of states from the 
energy band structure directly to experimental probes such as those obtained in photoemission. Photoemission 
measurements can be used to measure the distribution of binding electrons within a solid. In these 
measurements, a photon with a well defined energy impinges on the sample. If the photon carries sufficient 
energy, an electron can be excited from the valence state to a free electron state. By knowing the energy of the 
absorbed photon and the emitted electron, it is possible to determine the energy of the electron in the valence 
state. The number of electrons emitted is proportional to the number of electrons in the initial valence states; 
the density of states gives a measure of the number of photoemitted electrons for a given binding energy. In 
realistic calculations of the photoemission spectra, the probability of making a transition from the valence 
band to the vacuum must be included, but often the transition probabilities are similar over the entire valence 
band. This is illustrated in figure A1.3.15. Empty states cannot be measured using photoemission so these 
contributions are not observed. 

Figure A1.3.15. Density of states for silicon (bottom panel) as calculated from empirical pseudopotentials 
[25]. The top panel represents the photoemission spectra as measured by x-ray photoemission spectroscopy 
[30]. The density of states is a measure of the photoemission spectra. 

By examining the spatial character of the wavefunctions, it is possible to attribute atomic characteristics to the 
density of states spectrum. For example, the lowest states, 8 to 12 eV below the top of the valence band, are 
s-like and arise from the atomic 3s states. From 4 to 6 eV below the top of the valence band are states that are 
also s-like, but change character very rapidly toward the valence band maximum. The states residing within 4 
eV of the top of the valence band are p-like and arise from the 3p states. 

A major achievement of the quantum theory of matter has been to explain the interaction of light and matter. 
For example, the first application of quantum theory, the Bohr model of the atom, accurately predicted the 
electronic excitations in the hydrogen atom. In atomic systems, the absorption and emission of light is 
characterized by sharp lines. Predicting the exact frequencies for atomic absorption and emission lines 
provides a great challenge and testing ground for any theory. This is in apparent contrast to the spectra of 
solids. The continuum of states in solids, i.e. energy bands, allows many possible transitions. A photon with 
energy well above the band gap can excite a number of different states corresponding to different bands and 
k-points. The resulting spectra correspond to broad excitation spectra without the sharp structures present in 
atomic transitions. This is illustrated in figure A1.3.16. The spectrum consists of three broad peaks with the 
central peak at about 4.5 eV. 

Figure A1.3.16. Reflectivity of silicon. The theoretical curve is from an empirical pseudopotential method 
calculation [25]. The experimental curve is from [31]. 

The interpretation of solid-state spectra as featureless and lacking the information content of atomic spectra is 
misleading. If one modulates the reflectivity spectra of solids, the spectra are quite rich in structure. This is 
especially the case at low temperatures where vibrational motions of the atoms are reduced. In figure A1.3.17 
the spectrum of silicon is differentiated. The process of measuring a differentiated spectrum is called modulation 
spectroscopy. In modulated reflectivity spectra, broad undulating features are suppressed and sharp features 
are enhanced. It is possible to modulate the reflectivity spectrum in a variety of ways. For example, one can 
mechanically vibrate the crystal, apply an alternating electric field or modulate the temperature of the sample. 
One of the most popular methods is to measure the reflectivity directly and then numerically differentiate the 
reflectivity data. This procedure has the advantage of being easily interpreted [25]. 

Figure A1.3.17. Modulated reflectivity spectrum of silicon. The theoretical curve is obtained from an 
empirical pseudopotential calculation [25]. The experimental curve is from a wavelength modulation 
experiment [32]. 
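
The effect of numerical differentiation can be seen in a small sketch. The reflectivity below is synthetic: a 
broad Gaussian feature centred at 4.5 eV with a weak, sharp step at 3.4 eV superimposed. These functional forms and 
numbers are invented for illustration and are not a model of the silicon data. A central difference plays the role 
of the differentiation step. 

```python
import math


def synthetic_reflectivity(e):
    """Invented test spectrum: broad peak at 4.5 eV plus a weak sharp step at 3.4 eV."""
    broad = 0.30 * math.exp(-((e - 4.5) / 1.5) ** 2)
    sharp = 0.01 * math.tanh((e - 3.4) / 0.02)
    return 0.35 + broad + sharp


def central_difference(y, h):
    """dy/dx at the interior points of a uniformly spaced grid."""
    return [(y[i + 1] - y[i - 1]) / (2.0 * h) for i in range(1, len(y) - 1)]


h = 0.005
energies = [2.0 + h * i for i in range(1001)]          # 2 eV to 7 eV
reflectivity = [synthetic_reflectivity(e) for e in energies]
derivative = central_difference(reflectivity, h)        # belongs to energies[1:-1]

i_r = max(range(len(reflectivity)), key=lambda i: reflectivity[i])
i_d = max(range(len(derivative)), key=lambda i: abs(derivative[i]))
e_peak_r = energies[i_r]
e_peak_dr = energies[i_d + 1]

print("largest reflectivity at %.3f eV" % e_peak_r)
print("largest |dR/dE| at %.3f eV" % e_peak_dr)
```

The raw spectrum is dominated by the broad peak, whereas the derivative is dominated by the weak step, which is 
the sense in which modulation suppresses broad features and enhances sharp ones. 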

The structure in the reflectivity can be understood in terms of band structure features: i.e. from the quantum 
states of the crystal. The normal incident reflectivity from matter is given by 

$$
R = \frac{I}{I_0} = \left| \frac{N - 1}{N + 1} \right|^2 \tag{A1.3.85}
$$

where $I_0$ is the incident intensity of the light and $I$ is the reflected intensity. $N$ is the complex index of 
refraction. The complex index of refraction, $N$, can be related to the dielectric function of matter by 

$$
N^2 = \epsilon_1 + i\epsilon_2 \tag{A1.3.86}
$$

where $\epsilon_1$ is the real part of the dielectric function and $\epsilon_2$ is the imaginary part of the dielectric function. 
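
These two relations are easily evaluated numerically. The following sketch takes the real and imaginary parts of 
the dielectric function, forms the complex index of refraction and returns the normal-incidence reflectivity. The 
function name and the test value $\epsilon_1 = 12$, $\epsilon_2 = 0$ are illustrative choices made here. 

```python
import cmath


def normal_incidence_reflectivity(eps1, eps2):
    """Reflected fraction |(N - 1)/(N + 1)|^2 with N^2 = eps1 + i*eps2."""
    n_complex = cmath.sqrt(complex(eps1, eps2))
    return abs((n_complex - 1.0) / (n_complex + 1.0)) ** 2


r_transparent = normal_incidence_reflectivity(12.0, 0.0)   # illustrative, non-absorbing
r_vacuum = normal_incidence_reflectivity(1.0, 0.0)         # no interface: nothing reflected

print("R(eps1=12, eps2=0) = %.4f" % r_transparent)
print("R(eps1=1,  eps2=0) = %.4f" % r_vacuum)
```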

It is possible to make a connection between the quantum states of a solid and the resulting optical properties 
of a solid. Most studies have concentrated on insulators and semiconductors, rather than metals, because their 
optical structure readily lends itself to a straightforward interpretation. Within certain approximations, the 
imaginary part of the dielectric function for semiconducting or insulating crystals is given by 

$$
\epsilon_2(\omega) = \frac{4\pi^2 e^2 \hbar}{3 m^2 \omega^2} \sum_{v,c} \frac{2}{(2\pi)^3} \int_{\mathrm{BZ}} \delta\big(\omega_{vc}(\mathbf{k}) - \omega\big)\, |M_{vc}(\mathbf{k})|^2 \, d^3k. \tag{A1.3.87}
$$


The matrix elements are given by 

$$
M_{vc}(\mathbf{k}) = \int_V u^*_{\mathbf{k},v}(\mathbf{r})\, \nabla u_{\mathbf{k},c}(\mathbf{r})\, d^3r \tag{A1.3.88}
$$

where $u_{\mathbf{k},c}$ is the periodic part of the Bloch wavefunction. The summation in equation A1.3.87 is over all 
occupied to empty state transitions from valence (v) to conduction bands (c). The energy difference between 
occupied and empty states is given by $E_c(\mathbf{k}) - E_v(\mathbf{k})$, which can be defined as a frequency: 
$\omega_{vc}(\mathbf{k}) = (E_c(\mathbf{k}) - E_v(\mathbf{k}))/\hbar$. The delta function term, 
$\delta(\omega_{vc}(\mathbf{k}) - \omega)$, ensures conservation of energy. The matrix elements, $M_{vc}$, 
control the oscillator strength. As an example, suppose that the v → c transition couples states which have 
similar parity. The matrix elements will be small because the momentum operator is odd. Although angular 
momentum is not a good quantum number in condensed matter phases, atomic selection rules remain 
approximately true. 

This expression for $\epsilon_2$ neglects the spatial variation of the perturbing electric field. The wavelength of light 
for optical excitations is between 4000 and 7000 Å and greatly exceeds a typical bond length of 1-2 Å. Thus, the 
assumption of a uniform field is usually a good approximation. Other effects ignored include many-body 
contributions such as correlation and electron-hole interactions. 

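Once the imaginary part of the dielectric function is known, the real part can be obtained from the 
Kramers-Kronig relation: 

$$
\epsilon_1(\omega) = 1 + \frac{2}{\pi} P \int_0^\infty \frac{\omega' \epsilon_2(\omega')}{\omega'^2 - \omega^2}\, d\omega'. \tag{A1.3.89}
$$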


The principal part of the integral is taken and the integration must be done over all frequencies. In practice, 
the integration is often terminated outside of the frequency range of interest. Once the full dielectric function 
is known, the reflectivity of the solid can be computed. 
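
A minimal numerical sketch of the Kramers-Kronig transform is given below. It is tested on a single damped 
(Lorentz) oscillator, for which both parts of the dielectric function are known in closed form; the oscillator 
parameters, the grid spacing and the cut-off frequency are assumptions made for this illustration. The principal 
part is handled by placing the integration nodes midway between the frequencies at which the result is evaluated, 
so that the singular contributions on either side of $\omega$ cancel. 

```python
import math

OMEGA_0, GAMMA, OMEGA_P2 = 4.5, 1.0, 100.0   # illustrative oscillator parameters


def eps2_lorentz(w):
    """Imaginary part of 1 + wp^2 / (w0^2 - w^2 - i*gamma*w)."""
    return OMEGA_P2 * GAMMA * w / ((OMEGA_0 ** 2 - w ** 2) ** 2 + (GAMMA * w) ** 2)


def eps1_lorentz(w):
    """Real part of the same model, used only to check the transform."""
    d = OMEGA_0 ** 2 - w ** 2
    return 1.0 + OMEGA_P2 * d / (d ** 2 + (GAMMA * w) ** 2)


def kramers_kronig_eps1(eps2, w, h=0.01, w_max=200.0):
    """Principal-value integral by a midpoint rule; w should be a multiple of h."""
    n = int(round(w_max / h))
    total = 0.0
    for j in range(n):
        wp = (j + 0.5) * h            # nodes never coincide with w
        total += wp * eps2(wp) / (wp * wp - w * w)
    return 1.0 + (2.0 / math.pi) * total * h


for w in (0.0, 3.0, 6.0):
    print("w = %.1f  KK: %8.4f  exact: %8.4f"
          % (w, kramers_kronig_eps1(eps2_lorentz, w), eps1_lorentz(w)))
```

The integration is cut off at a finite frequency, as noted in the text; for this model the neglected tail is 
negligible because $\epsilon_2$ decays rapidly at high frequency. 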

It is possible to understand the fine structure in the reflectivity spectrum by examining the contributions to the 
imaginary part of the dielectric function. If one considers transitions between two bands (v → c), equation 
A1.3.87 can be written as 

$$
\epsilon_2(\omega)_{vc} = \frac{4\pi^2 e^2 \hbar}{3 m^2 \omega^2}\, \frac{2}{(2\pi)^3} \int_{\mathrm{BZ}} |M_{vc}(\mathbf{k})|^2\, \delta\big(\omega_{vc}(\mathbf{k}) - \omega\big)\, d^3k. \tag{A1.3.90}
$$

Under the assumption that the matrix elements can be treated as constants, they can be factored out of the 
integral. This is a good approximation for most crystals. By comparison with equation A1.3.84, it is possible 
to define a function similar to the density of states. In this case, since both valence and conduction band states 
are included, the function is called the joint density of states: 

$$
J_{vc}(\omega) = \frac{2}{(2\pi)^3} \int_{\mathrm{BZ}} \delta\big(\omega_{vc}(\mathbf{k}) - \omega\big)\, d^3k. \tag{A1.3.91}
$$


With this definition, one can write 

$$
\epsilon_2(\omega)_{vc} = \frac{4\pi^2 e^2 \hbar}{3 m^2 \omega^2}\, |M_{vc}|^2\, J_{vc}(\omega). \tag{A1.3.92}
$$

Within this approximation, the structure in $\epsilon_2(\omega)_{vc}$ can be related to structure in the joint density of states. 
The joint density of states can be written as a surface integral [1]: 

$$
J_{vc}(\omega) = \frac{2}{(2\pi)^3} \int \frac{ds}{|\nabla_{\mathbf{k}}\, \omega_{vc}(\mathbf{k})|} \tag{A1.3.93}
$$

where $ds$ is a surface element on the surface defined by $\omega_{vc}(\mathbf{k}) = \omega$. The sharp structure in 
the joint density of states arises from zeros in the denominator. This occurs at critical points where 

$$
\nabla_{\mathbf{k}}\, \omega_{vc}(\mathbf{k}) = 0 \tag{A1.3.94}
$$

or 

$$
\nabla_{\mathbf{k}} E_v(\mathbf{k}) = \nabla_{\mathbf{k}} E_c(\mathbf{k}) \tag{A1.3.95}
$$

when the slopes of the valence band and conduction band are equal. The group velocity of an electron or hole 
is defined as $\mathbf{v} = \nabla_{\mathbf{k}} E(\mathbf{k})/\hbar$. Thus, the critical points occur when the hole 
and electrons have the same group velocity. 

The band energy difference, or $\omega_{vc}(\mathbf{k})$, can be expanded around a critical point $\mathbf{k}_0$ as 

$$
\omega_{vc}(\mathbf{k}) = \omega_{vc}(\mathbf{k}_0) + \sum_{n=1}^{3} a_n (k_n - k_{0n})^2 + \cdots. \tag{A1.3.96}
$$

The expansion is done around the principal axes so only three terms occur in the summation. The nature of the 
critical point is determined by the signs of the $a_n$. If $a_n > 0$ for all n, then the critical point corresponds to a 
local minimum. If $a_n < 0$ for all n, then the critical point corresponds to a local maximum. Otherwise, the 
critical points correspond to saddle points. 

The types of critical points can be labelled by the number of $a_n$ less than zero. Specifically, the critical points 
are labelled by $M_i$ where $i$ is the number of $a_n$ which are negative: i.e. a local minimum critical point would 
be labelled by $M_0$, a local maximum by $M_3$ and the saddle points by ($M_1$, $M_2$). Each critical point has a 
characteristic line shape. For example, the $M_0$ critical point has a joint density of states which behaves as 
$J_{vc} = \mathrm{constant} \times \sqrt{\omega - \omega_0}$ for $\omega > \omega_0$ and zero otherwise, where 
$\omega_0$ corresponds to the $M_0$ critical point energy. At $\omega = \omega_0$, $J_{vc}$ has a discontinuity in 
the first derivative. In figure A1.3.18 the characteristic structure of the joint 
density of states is presented for each type of critical point. 
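
The labelling rule and the square-root onset at a minimum can both be checked with a short sketch. The function 
names, the unit coefficients and the k-point grid are assumptions made for illustration. The number of k-points 
with transition frequency below $\omega$ grows as $(\omega - \omega_0)^{3/2}$ at a minimum, which is the integral 
of the square-root line shape, so quadrupling $\omega - \omega_0$ should multiply the count by about eight. 

```python
def critical_point_label(a):
    """Label M0..M3 from the signs of the three quadratic coefficients a_n."""
    if len(a) != 3 or any(x == 0.0 for x in a):
        raise ValueError("expected three non-zero coefficients")
    return "M%d" % sum(1 for x in a if x < 0.0)


def count_below(a, omega_0, omega, n_grid=40, k_max=1.0):
    """Number of grid k-points with omega_0 + sum_n a_n k_n^2 <= omega."""
    step = 2.0 * k_max / n_grid
    ks = [-k_max + (i + 0.5) * step for i in range(n_grid)]
    count = 0
    for kx in ks:
        for ky in ks:
            for kz in ks:
                if omega_0 + a[0] * kx * kx + a[1] * ky * ky + a[2] * kz * kz <= omega:
                    count += 1
    return count


minimum = (1.0, 1.0, 1.0)      # illustrative coefficients for a local minimum
omega_0 = 3.0                  # illustrative critical point energy
ratio = count_below(minimum, omega_0, omega_0 + 0.64) / count_below(minimum, omega_0, omega_0 + 0.16)

print(critical_point_label(minimum), critical_point_label((1.0, -1.0, 1.0)),
      critical_point_label((-1.0, -1.0, 1.0)), critical_point_label((-1.0, -1.0, -1.0)))
print("count ratio for a fourfold increase in (omega - omega_0): %.2f" % ratio)
```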





Figure A1.3.18. Typical critical point structure in the joint density of states. 

For a given pair of valence and conduction bands, there must be at least one $M_0$ and one $M_3$ critical point and 
at least three $M_1$ and three $M_2$ critical points. However, it is possible for the saddle critical points to be 
degenerate. In the simplest possible configuration of critical points, the joint density of states appears as in 
figure A1.3.19. 



Figure A1.3.19. Simplest possible critical point structure in the joint density of states for a given energy band. 

It is possible to identify particular spectral features in the modulated reflectivity spectra with band structure 
features. For example, in a direct band gap the joint density of states must resemble that of an $M_0$ critical point. 
One of the first applications of the empirical pseudopotential method was to calculate reflectivity spectra for a 
given energy band. Differences between the calculated and measured reflectivity spectra could be assigned to 
errors in the energy band structure. Such errors usually involve the incorrect placement or energy of a critical 
point feature. By making small adjustments in the pseudopotential, it is almost always possible to extract an 
energy band structure consistent with the measured reflectivity. 

The critical point analysis performed for the joint density of states can also be applied to the density of states. 
By examining the photoemission spectrum compared with the calculated density of states, it is also possible to 
assess the quality of the energy band structure. Photoemission spectra are superior to reflectivity spectra in the 
sense of giving the band structure energies relative to a fixed energy reference, such as the vacuum level. 
Reflectivity measurements only give relative energy differences between energy bands. 

In figure A1.3.20 and figure A1.3.21 the real and imaginary parts of the dielectric function are illustrated for 
silicon. There are some noticeable differences in the line shapes between theory and experiment. These 
differences can be attributed to issues outside of elementary band theory such as the interactions of electrons 
and holes. This issue will be discussed further in the following section on insulators. Qualitatively, the real 
part of the dielectric function appears as a simple harmonic oscillator with a resonance at about 4.5 eV. This 
energy corresponds approximately to the cohesive energy per atom of silicon. 

Figure A1.3.20. Real part of the dielectric function for silicon. The experimental work is from [31]. The 
theoretical work is from an empirical pseudopotential calculation [25]. 

Figure A1.3.21. Imaginary part of the dielectric function for silicon. The experimental work is from [31]. The 
theoretical work is from an empirical pseudopotential calculation [25]. 

It is possible to determine the spatial distributions, or charge densities, of electrons from a knowledge of the 
wavefunctions. The arrangement of the charge density is very useful in characterizing the bond in the solid. 
For example, if the charge is highly localized between neighbouring atoms, then the bond corresponds to a 
covalent bond. The classical picture of the covalent bond is the sharing of electrons between two atoms. This 
picture is supported by quantum calculations. In figure A1.3.22 the electronic charge distribution is illustrated 
for crystalline carbon and silicon in the diamond structure. In carbon the midpoint between neighbouring 
atoms is a saddle point: this is typical of the covalent bond in organics, but not in silicon where the midpoint 
corresponds to a maximum of the charge density. X-ray measurements also support the existence of the 
covalent bonding charge as determined from quantum calculations [33]. 

Figure A1.3.22. Spatial distributions or charge densities for carbon and silicon crystals in the diamond 
structure, shown as the valence charge density in the (110) plane. The density is only for the valence 
electrons; the core electrons are omitted. This charge density is from an ab initio pseudopotential 
calculation [27]. 


Although empirical pseudopotentials present a reliable picture of the electronic structure of semiconductors, 
these potentials are not applicable for understanding structural properties. However, 
density-functional-derived pseudopotentials can be used for examining the structural properties of matter. 
Once a self-consistent field solution of the Kohn-Sham equations has been achieved, the total electronic energy 
of the system can be determined from equation A1.3.39. One of the first applications of this method was to forms 
of crystalline silicon. Various structural forms of silicon were considered: diamond, hexagonal diamond, β-Sn, 
simple cubic, FCC, BCC and so on. For a given volume, the lattice parameters and any internal parameters can be 
optimized to achieve a ground-state energy. In figure A1.3.23 the total structural energy of the system is 
plotted for eight different forms of silicon. The lowest energy form of silicon is correctly predicted to be the 
diamond structure. By examining the change in the structural energy with respect to volume, it is possible to 
determine the equation of state for each form. It is possible to determine which phase is lowest in energy for a 
specified volume and to determine transition pressures between different phases. As an example, one can 
predict from this phase diagram the transition pressure to transform silicon in the diamond structure to the 
white tin (β-Sn) structure. This pressure is predicted to be approximately 9 GPa (90 kbar); the measured 
pressure is about 12 GPa (120 kbar) [34]. The role of temperature has been neglected in the calculation of the 
structural energies. For most applications, this is not a serious issue as the role of temperature is often less 
than the inherent errors within density functional theory. 

Figure A1.3.23. Phase diagram of silicon in various polymorphs from an ab initio pseudopotential calculation 
[34]. The volume is normalized to the experimental volume. The binding energy is the total electronic energy 
of the valence electrons. The slope of the dashed curve gives the pressure to transform silicon in the diamond 
structure to the β-Sn structure. Other polymorphs listed include face-centred cubic (fcc), body-centred cubic 
(bcc), simple hexagonal (sh), simple cubic (sc) and hexagonal close-packed (hcp) structures. 
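
The common-tangent construction behind such a transition pressure can be sketched for two model energy curves. 
Everything numerical below is hypothetical: each phase is represented by a harmonic curve 
$E(V) = E_0 + \tfrac{1}{2}k(V - V_0)^2$ in arbitrary units, not by calculated silicon energies. At zero 
temperature the phases coexist at the pressure where their enthalpies $E + PV$, each minimized over volume, are 
equal; at that pressure the line joining the two equilibrium volumes has slope $-P$. 

```python
def enthalpy(p, e0, v0, k):
    """Minimum over V of E(V) + p*V for E(V) = e0 + 0.5*k*(V - v0)**2."""
    return e0 + p * v0 - p * p / (2.0 * k)


def transition_pressure(phase_a, phase_b, p_lo, p_hi, tol=1e-12):
    """Bisection for the pressure at which the two enthalpies are equal."""
    def difference(p):
        return enthalpy(p, *phase_a) - enthalpy(p, *phase_b)

    if difference(p_lo) * difference(p_hi) > 0.0:
        raise ValueError("no enthalpy crossing bracketed by [p_lo, p_hi]")
    while p_hi - p_lo > tol:
        mid = 0.5 * (p_lo + p_hi)
        if difference(p_lo) * difference(mid) <= 0.0:
            p_hi = mid
        else:
            p_lo = mid
    return 0.5 * (p_lo + p_hi)


def energy(v, e0, v0, k):
    return e0 + 0.5 * k * (v - v0) ** 2


# Hypothetical phases (e0, v0, k): an open low-energy phase and a denser high-energy phase.
open_phase = (0.0, 1.0, 4.0)
dense_phase = (0.25, 0.75, 6.0)

p_t = transition_pressure(open_phase, dense_phase, 0.0, 2.0)
v_open = open_phase[1] - p_t / open_phase[2]       # volume where dE/dV = -p_t
v_dense = dense_phase[1] - p_t / dense_phase[2]
slope = (energy(v_dense, *dense_phase) - energy(v_open, *open_phase)) / (v_dense - v_open)

print("transition pressure: %.6f (arbitrary units)" % p_t)
print("common tangent slope: %.6f" % slope)
```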

One notable consequence of the phase diagram in figure A1.3.23 was the prediction that high-pressure forms 
of silicon might be superconducting [35, 36]. This prediction was based on the observation that some 
high-pressure forms of silicon are metallic, but retain strong covalent-like bonds. It was later verified by 
high-pressure measurements that the predicted phase was a superconductor [36]. This success of the structural 
phase diagram of silicon helped verify the utility of the pseudopotential density functional method and has 
resulted in its widespread applicability to condensed phases. 

A1.3.6.2 INSULATORS 

Insulating solids have band gaps which are notably larger than those of semiconductors. It is not unusual for an 
alkali halide to have a band gap of ~10 eV or more. Electronic states in insulators are often highly localized 
around the atomic sites. In most cases, this arises from a large transfer of electrons from one 
site to another. Exceptions are insulating materials like sulfur and carbon where the covalent bonds are so 
strong as to strongly localize charge between neighbouring atoms. 

As an example of the energy band structures for an insulator, the energy bands for lithium fluoride are 
presented in figure A1.3.24. LiF is a highly ionic material which forms in the rocksalt structure (figure 
A1.3.25). The bonding in this crystal can be understood by transferring an electron from the highly 
electropositive Li to the electronegative F atoms: i.e. one can view crystalline LiF as consisting of Li⁺F⁻ 
constituents. The highly localized nature of the electronic charge density results in very narrow, almost 
atomic-like, energy bands. 

Figure A1.3.24. Band structure of LiF from ab initio pseudopotentials [39]. 

One challenge of modern electronic structure calculations has been to reproduce excited-state properties. 
Density functional theory is a ground-state theory. The eigenvalues for empty states do not have physical 
meaning in terms of giving excitation energies. If one were to estimate the band gap from density functional 
theory by taking the eigenvalue differences between the highest occupied and lowest empty states, the energy 
difference would badly underestimate the band gap. Contemporary approaches [37, 38] have resolved this 
issue by correctly including spatial variations in the electron-electron interactions and including self-energy 
terms (see section A1.3.2.2). 

Because of the highly localized nature of electronic and hole states in insulators, it is difficult to describe the 
optical excitations. The excited electron is strongly affected by the presence of the hole state. One failure of 
the energy band picture concerns the interaction between the electron and hole. The excited electron and the 
hole can form a hydrogen atomic-like interaction resulting in the formation of an exciton, or a bound 
electron-hole pair. The exciton binding energy reduces the energy for an excitation below that of the conduction 
band and results in strong, discrete optical lines. This is illustrated in figure A1.3.25. 

Figure A1.3.25. Schematic illustration of exciton binding energies in an insulator or semiconductor. 

A simple model for the exciton is to assume a screened interaction between the electron and hole using a static 
dielectric function. In addition, it is common to treat the many-body interactions in a crystal by replacing the 
true mass of the electron and hole by a dynamical or effective mass. Unlike a hydrogen atom, where the proton 
mass exceeds that of the electron by three orders of magnitude, the masses of the interacting electron and hole 
are almost equivalent. Using the reduced mass for this system, we have $1/\mu = 1/m_e + 1/m_h$. Within this model, 
the binding energy of the exciton can be found from 

$$
\left[ -\frac{\hbar^2 \nabla^2}{2\mu} - \frac{e^2}{\epsilon r} \right] \psi(\mathbf{r}) = E_b\, \psi(\mathbf{r}) \tag{A1.3.97}
$$

where $\epsilon$ is the static dielectric function for the insulator of interest. The binding energy from this hydrogenic 
Schrödinger equation is given by 

$$
E_b = -\frac{\mu e^4}{2 \epsilon^2 \hbar^2 n^2} \tag{A1.3.98}
$$

where n = 1, 2, 3, .... Typical values of $\mu$ and $\epsilon$ for a semiconductor are $\mu = 0.1m$ and $\epsilon = 10$. 
This results in a binding energy of about 0.01 eV for the ground state, n = 1. For an insulator, the binding energy 
is much larger. For a material like silicon dioxide, one might have $\mu = 0.5m$ and $\epsilon = 3$, or a binding 
energy of roughly 1 eV. This estimate suggests that reflectivity spectra in insulators might be strongly altered 
by exciton interactions. Even in semiconductors, where it might appear that the exciton binding energies would be 
of interest only for low temperature regimes, excitonic effects can strongly alter the line shape of excitations 
away from the band gap. 


The size of the electron-hole pair can be estimated from the Bohr radius for this system: 

$$
a_B = \frac{\epsilon \hbar^2}{\mu e^2}. \tag{A1.3.99}
$$

The size of the exciton is approximately 50 Å in a material like silicon, whereas for an insulator the size 
would be much smaller: for example, using our numbers above for silicon dioxide, one would obtain a radius 
of only ~3 Å or less. For excitons of this size, it becomes problematic to incorporate a static dielectric 
constant based on macroscopic crystalline values. 
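
The estimates quoted for the binding energy and the radius follow from scaling the hydrogen atom values. The 
sketch below assumes the hydrogenic model just described, with the hydrogen Rydberg energy (13.6057 eV) and Bohr 
radius (0.529177 Å) as the only physical constants; the function names are illustrative. 

```python
RYDBERG_EV = 13.6057          # hydrogen ground-state binding energy (eV)
BOHR_RADIUS_A = 0.529177      # hydrogen Bohr radius (angstrom)


def exciton_binding_energy(mu_over_m, eps, n=1):
    """Magnitude of the hydrogenic binding energy (eV) for reduced mass mu and screening eps."""
    return RYDBERG_EV * mu_over_m / (eps ** 2 * n ** 2)


def exciton_radius(mu_over_m, eps):
    """Ground-state Bohr radius (angstrom) of the electron-hole pair."""
    return BOHR_RADIUS_A * eps / mu_over_m


semiconductor = (0.1, 10.0)   # mu/m and eps quoted in the text for a semiconductor
insulator = (0.5, 3.0)        # mu/m and eps quoted in the text for silicon dioxide

for name, (mu, eps) in (("semiconductor", semiconductor), ("insulator", insulator)):
    print("%-13s  E_b = %.4f eV   radius = %.1f angstrom"
          % (name, exciton_binding_energy(mu, eps), exciton_radius(mu, eps)))
```

With these inputs the semiconductor values come out near 0.014 eV and 53 Å and the insulator values near 0.76 eV 
and 3.2 Å, in line with the rounded figures in the text. 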

The reflectivity of LiF is illustrated in figure A1.3.26. The first large peak corresponds to an excitonic 
transition. 

Figure A1.3.26. Reflectivity of LiF from ab initio pseudopotentials. (Courtesy of E L Shirley, see [39] and 
references therein.) 

A1.3.6.3 METALS 

Metals are fundamentally different from insulators as they possess no gap in the excitation spectra. Under the 
influence of an external field, electrons can respond by readily changing from one k state to another. The ease 
with which the ground-state configuration is changed accounts for the high conductivity of metals. 

Arguments based on a free electron model can be made to explain the conductivity of a metal. It can be shown 
that the wavevector $\mathbf{k}$ will evolve following a Newtonian law [1]: 

$$
\hbar \frac{d\mathbf{k}}{dt} = -e\mathbf{E}. \tag{A1.3.100}
$$

This can be integrated to yield 

$$
\mathbf{k}(t) - \mathbf{k}(0) = -e\mathbf{E}t/\hbar. \tag{A1.3.101}
$$

After some typical time, $\tau$, the electron will scatter off a lattice imperfection. This imperfection might be a 
lattice vibration or an impurity atom. If one assumes that no memory of the event resides after the scattering 
event, then on average one has $\Delta\mathbf{k} = -e\mathbf{E}\tau/\hbar$. In this picture, the conductivity of the 
metal, $\sigma$, can be extracted from Ohm's law: $\sigma = J/E$ where $J$ is the current density. With $n$ the electron 
density and $\Delta v = \hbar \Delta k/m$ the corresponding change in velocity, the current density is given by 

$$
J = -ne\Delta v = -ne(-eE\tau/m) = ne^2\tau E/m \tag{A1.3.102}
$$

or 

$$
\sigma = ne^2\tau/m. \tag{A1.3.103}
$$

This expression for the conductivity is consistent with experimental trends. 
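
The conductivity formula can be inverted to estimate the scattering time from a measured conductivity. The 
sketch below uses room-temperature values for copper, a conduction electron density of $8.47 \times 10^{28}$ 
m$^{-3}$ and a conductivity of $5.96 \times 10^{7}$ S m$^{-1}$; these inputs are assumptions supplied here for 
illustration and do not come from the text. SI units are used throughout. 

```python
ELECTRON_MASS = 9.1093837e-31      # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def drude_conductivity(n, tau, mass=ELECTRON_MASS):
    """sigma = n e^2 tau / m in S/m."""
    return n * ELEMENTARY_CHARGE ** 2 * tau / mass


def scattering_time(n, sigma, mass=ELECTRON_MASS):
    """Invert the conductivity formula for tau (seconds)."""
    return sigma * mass / (n * ELEMENTARY_CHARGE ** 2)


n_cu = 8.47e28       # assumed electron density of copper (m^-3)
sigma_cu = 5.96e7    # assumed room-temperature conductivity of copper (S/m)

tau_cu = scattering_time(n_cu, sigma_cu)
drift_speed = ELEMENTARY_CHARGE * 1.0 * tau_cu / ELECTRON_MASS   # |delta v| for a field of 1 V/m

print("tau = %.2e s" % tau_cu)
print("drift speed in a 1 V/m field = %.2e m/s" % drift_speed)
```

A scattering time of order $10^{-14}$ s results, and the drift speed is only millimetres per second for a field of 
1 V m$^{-1}$. 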

Another important accomplishment of the free electron model concerns the heat capacity of a metal. At low 
temperatures, the heat capacity of a metal goes linearly with the temperature and vanishes at absolute zero. 
This behaviour is in contrast with classical statistical mechanics. According to classical theories, the 
equipartition theorem predicts that a free particle should have a heat capacity of $\tfrac{3}{2}k_B$, where $k_B$ is 
the Boltzmann constant. An ideal gas has a heat capacity consistent with this value. The electrical conductivity 
of a metal suggests that the conduction electrons behave like 'free particles' and might also have a heat capacity 
of $\tfrac{3}{2}k_B$, which would be strongly at variance with the observed behaviour and in violation of the third 
law of thermodynamics. 

The resolution of this issue is based on the application of the Pauli exclusion principle and Fermi-Dirac 
statistics. From the free electron model, the total electronic energy, U, can be written as 

$$
U(T) = \int_0^\infty \epsilon\, f(\epsilon, T)\, D(\epsilon)\, d\epsilon \tag{A1.3.104}
$$

where $f(\epsilon, T)$ is the Fermi-Dirac distribution function and $D(\epsilon)$ is the density of states. The 
Fermi-Dirac function gives the probability that a given orbital will be occupied: 

$$
f(\epsilon, T) = \frac{1}{\exp\big((\epsilon - E_F)/kT\big) + 1}. \tag{A1.3.105}
$$

The value of $E_F$ at zero temperature can be estimated from the electron density (equation A1.3.26). Typical 
values of the Fermi energy range from about 1.6 eV for Cs to 14.1 eV for Be. In terms of temperature 
($T_F = E_F/k$), the range is approximately 20 000-160 000 K. As a consequence, the Fermi energy is a very weak 
function of temperature under ambient conditions. The electronic contribution to the heat capacity, C, can be 
determined from 

$$
C = \frac{dU}{dT} = \int_0^\infty \epsilon\, \frac{\partial f}{\partial T}\, D(\epsilon)\, d\epsilon. \tag{A1.3.106}
$$

The integral can be approximated by noting that the derivative of the Fermi function is highly localized 
around $E_F$. To a very good approximation, the heat capacity is 

$$
C = \frac{\pi^2}{3} D(E_F)\, k^2 T. \tag{A1.3.107}
$$

The linear dependence of C with temperature agrees well with experiment, but the pre-factor can differ by a 
factor of two or more from the free electron value. The origin of the difference is thought to arise from several 
factors: the electrons are not truly free, they interact with each other and with the crystal lattice, and the 
dynamical behaviour of the electrons interacting with the lattice results in an effective mass which differs from 
the free electron mass. For example, as the electron moves through the lattice, the lattice can distort and exert 
a dragging force. 
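
The orders of magnitude involved can be checked with a short sketch. It evaluates the Fermi-Dirac occupation, 
converts the quoted Fermi energies to Fermi temperatures, and evaluates the linear heat capacity per electron. The 
last step assumes the standard free electron result $D(E_F) = 3N/(2E_F)$ for N electrons, which turns the linear 
law into $C = (\pi^2/2) N k\, T/T_F$; that substitution is an assumption made here, not a formula given in this 
section. 

```python
import math

K_B_EV = 8.617333262e-5     # Boltzmann constant in eV/K


def fermi_dirac(energy_ev, fermi_ev, temperature_k):
    """Occupation probability of an orbital at the given energy."""
    if temperature_k == 0.0:
        return 1.0 if energy_ev < fermi_ev else (0.5 if energy_ev == fermi_ev else 0.0)
    x = (energy_ev - fermi_ev) / (K_B_EV * temperature_k)
    if x > 700.0:            # avoid overflow in exp; the occupation is zero to machine precision
        return 0.0
    return 1.0 / (math.exp(x) + 1.0)


def fermi_temperature(fermi_ev):
    return fermi_ev / K_B_EV


def heat_capacity_per_electron(temperature_k, fermi_ev):
    """C/(N k) = (pi^2/2) T/T_F, assuming the free electron D(E_F) = 3N/(2 E_F)."""
    return 0.5 * math.pi ** 2 * temperature_k / fermi_temperature(fermi_ev)


t_f_cs = fermi_temperature(1.6)     # Fermi energy quoted for Cs
t_f_be = fermi_temperature(14.1)    # Fermi energy quoted for Be
c_cs_300 = heat_capacity_per_electron(300.0, 1.6)

print("T_F(Cs) = %.0f K, T_F(Be) = %.0f K" % (t_f_cs, t_f_be))
print("C/(N k) for Cs at 300 K = %.3f (classical value 1.5)" % c_cs_300)
print("occupation 0.1 eV below E_F at 300 K = %.4f" % fermi_dirac(1.5, 1.6, 300.0))
```

Because room temperature is a small fraction of the Fermi temperature, the electronic heat capacity per electron 
is far below the classical value of 1.5 k. 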

Simple metals like alkalis, or ones with only s and p valence electrons, can often be described by a free 
electron gas model, whereas transition metals and rare earth metals which have d and f valence electrons 
cannot. Transition metal and rare earth metals do not have energy band structures which resemble free 
electron models. The bonds formed from d and f states often have some strong covalent character. This 
character strongly modulates the free-electron-like bands. 

An example of a metal with significant d-bonding is copper. The atomic configuration of copper is
$1s^2 2s^2 2p^6 3s^2 3p^6 3d^{10} 4s^1$. If the 3d states were truly core states, then one might expect copper to resemble
potassium, whose atomic configuration is $1s^2 2s^2 2p^6 3s^2 3p^6 4s^1$. The strong differences between copper and
potassium in terms of their chemical properties suggest that the 3d states interact strongly with the valence
electrons. This is reflected in the energy band structure of copper (figure A1.3.27).

Figure A1.3.27. Energy bands of copper from ab initio pseudopotential calculations [40].


Copper has an FCC structure with one atom in the primitive unit cell. From simple orbital counting, one might
expect the ten d electrons to occupy five d-like bands and the one s electron to occupy one s-like band. This
is apparent in the figure, although the interpretation is not straightforward. The lowest band (running from L
through $\Gamma_1$ to X) is s-like, but it mixes strongly with the d-like bands (at $\Gamma_{25}$ and $\Gamma_{12}$, where
these bands are triply and doubly degenerate, respectively). Were it not for the d-mixing, the s-like band would
be continuous from $\Gamma_1$ to X. The d-mixing 'splits' the s bands. The Fermi level cuts the s-like band along
the $\Delta$ direction, reflecting the partial occupation of the s levels.


A1.3.7 NON-CRYSTALLINE MATTER 

A1.3.7.1 AMORPHOUS SOLIDS

Crystalline matter can be characterized by long-range order. For a perfect crystal, a prescription can be used 
to generate the positions of atoms arbitrarily far away from a specified origin. However, 'real crystals' always 
contain imperfections. They contain defects which can be characterized as point defects localized to an atomic 
site or extended defects spread over a number of sites. Vacancies on lattice sites and impurity atoms are
examples of point defects. Grain boundaries and dislocations are examples of extended defects. One might
imagine starting from an ideal crystal and gradually introducing defects such as vacancies. At some point the
number of defects will be so large as to ruin the long-range order of the crystal. Solid materials that lack
long-range order are called amorphous solids or glasses. The precise definition of an amorphous material is
somewhat problematic. Usually, any material which does not display a sharp x-ray diffraction pattern is
considered to be 'amorphous'. Some textbooks [1] define amorphous solids as 'not crystalline on any significant scale'.

Glassy materials are usually characterized by an additional criterion. It is often possible to cool a liquid below
the thermodynamic melting point (i.e. to supercool the liquid). In glass-forming materials, the viscosity of the
supercooled liquid increases dramatically at a temperature well below the melting point of the solid. This
temperature is called the glass transition temperature and is labelled $T_g$. This increase in viscosity delineates
the supercooled liquid state from the glass state. Unlike thermodynamic transitions between the liquid and solid
states, the liquid → glass transition is not well defined. Most amorphous materials, such as the tetrahedrally
coordinated semiconductors silicon and germanium, do not exhibit a glass transition.

Defining order in an amorphous solid is problematic at best. There are several 'qualitative concepts' that can 
be used to describe disorder [7]. In figure A1.3.28(a) a perfect crystal is illustrated. A simple form of disorder
involves crystals containing more than one type of atom. Suppose one considers an alloy consisting of two 
different atoms (A and B). In an ordered crystal one might consider each A surrounded by B and vice versa. 
In a random alloy, one might consider the lattice sites to remain unaltered but randomly place A and B atoms. 
This type of disorder is called compositional disorder. Other forms of disorder may involve minor distortions 
of the lattice that destroy the long-range order of the solid, but retain the chemical ordering and short-range 
order of the solid. For example, in short-range ordered solids, the coordination number of each atom might be 
preserved. In a highly disordered solid, no short-range order is retained: the chemical ordering is random with 
a number of over- and under-coordinated species. 
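
The difference between an ordered alloy and a compositionally disordered one can be illustrated with a toy model. The sketch below assumes a two-dimensional square lattice with periodic boundaries (an illustrative choice, not taken from the text). It places A and B atoms on the same set of sites, either in a checkerboard pattern or at random, and counts the fraction of nearest-neighbour pairs that are unlike. All function names are hypothetical.

```python
import random


def ordered_alloy(n):
    """Checkerboard AB arrangement on an n x n square lattice (n even)."""
    return [["A" if (i + j) % 2 == 0 else "B" for j in range(n)] for i in range(n)]


def random_alloy(n, seed=0):
    """Same lattice sites and 50:50 composition, but species placed at random (n even)."""
    rng = random.Random(seed)
    species = ["A", "B"] * (n * n // 2)
    rng.shuffle(species)
    return [species[i * n:(i + 1) * n] for i in range(n)]


def unlike_pair_fraction(lattice):
    """Fraction of nearest-neighbour bonds joining an A to a B (periodic boundaries)."""
    n = len(lattice)
    unlike = 0
    for i in range(n):
        for j in range(n):
            # Count the bond to the right and the bond below each site once.
            unlike += lattice[i][j] != lattice[i][(j + 1) % n]
            unlike += lattice[i][j] != lattice[(i + 1) % n][j]
    return unlike / (2 * n * n)


print(unlike_pair_fraction(ordered_alloy(20)))   # every neighbour is unlike
print(unlike_pair_fraction(random_alloy(20)))    # close to one half
```

In both cases the lattice sites themselves are untouched, which is what distinguishes compositional disorder from positional disorder.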

Figure A1.3.28. Examples of disorder: (a) perfect crystal, (b) compositional disorder, (c) positional disorder
which retains the short-range order and (d) no long-range or short-range order.

In general, it is difficult to quantify structural properties of disordered matter via experimental probes such as
x-ray or neutron scattering. Such probes measure statistically averaged properties like the pair-correlation
function, also called the radial distribution function. The pair-correlation function measures the average
distribution of atoms around a particular site.

Several models have been proposed to describe amorphous solids, in particular glasses. Structural models of
glasses often focus on two archetypes [1]: the continuous random network and microcrystallite models. In the
continuous random network model, short-range order is preserved. For example, in forms of crystalline silica
each silicon atom is surrounded by four oxygen atoms. The SiO4 tetrahedra are linked together in a regular
way which establishes the crystal structure. In a continuous random network each SiO4 tetrahedral unit is
preserved, but the relative arrangement between tetrahedral units is random. In another model, the so-called
microcrystallite model, small 'crystallites' of the perfect structure exist, but these crystallites are randomly
arranged. The difference between the random network model and the crystallite model cannot be
experimentally determined unless the crystallites are sufficiently large to be detected; this is usually not the
case.

Amorphous materials exhibit special quantum properties with respect to their electronic states. The loss of 
periodicity renders Bloch's theorem invalid; k is no longer a good quantum number. In crystals, structural 
features in the reflectivity can be associated with critical points in the joint density of states. Since amorphous 
materials cannot be described by k-states, selection rules associated with k are no longer appropriate. 
Reflectivity spectra and associated spectra are often featureless, or they may correspond to highly smoothed 
versions of the crystalline spectra. 


One might suppose that optical gaps would not exist in amorphous solids, as the structural disorder would 
result in allowed energy states throughout the solid. However, this is not the case, as disordered insulating 
solids such as silica are quite transparent. This situation reflects the importance of local order in determining 
gaps in the excitation spectra. It is still possible to have gaps in the joint density of states without resort to a 
description of energy versus wavevector. For example, in silica the large energy gap arises from the existence
of SiO4 units. Disordering these units can cause states near the top of the occupied states and near the bottom
of the empty states to tail into the gap region, but does not remove the gap itself.


Disorder plays an important role in determining the extent of electronic states. In crystalline matter one can 
view states as existing throughout the crystal. For disordered matter, this is not the case: electronic states 
become localized near band edges. Localization has profound effects on transport properties.
Electrons and holes can still carry current in amorphous semiconductors, but the carriers can be strongly 
scattered by the disordered structure. For the localized states near the band edges, electrons can be propagated 
only by a thermally activated hopping process. 

A1.3.7.2 LIQUIDS

Unlike the solid state, the liquid state cannot be characterized by a static description. In a liquid, bonds break 
and reform continuously as a function of time. The quantum states in the liquid are similar to those in 
amorphous solids in the sense that the system is also disordered. The liquid state can be quantified only by 
considering some ensemble averaging and using statistical measures. For example, consider an elemental
liquid. Just as for amorphous solids, one can ask how atoms are distributed, on average, at a given distance from
a reference atom; that is, the radial distribution function (or pair correlation function) can also be
defined for a liquid. In scattering experiments on liquids, a structure factor is measured. The radial
distribution function, g(r), is related to the structure factor, S(q), by

$$ S(\mathbf{q}) = 1 + \rho \int [g(r) - 1] \exp(i\mathbf{q}\cdot\mathbf{r}) \, d\mathbf{r} \qquad \text{(A1.3.108)} $$

where $\rho$ is the average number density of the liquid. By taking the Fourier transform of the structure factor, it
is possible to determine the radial distribution function of the liquid.
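
For an isotropic liquid the angular part of the integral in equation (A1.3.108) can be carried out analytically, which leaves the one-dimensional form $S(q) = 1 + 4\pi\rho \int_0^\infty r^2 [g(r) - 1] \frac{\sin(qr)}{qr} \, dr$. The Python sketch below evaluates this integral numerically. The step-function g(r) and the density value are hypothetical inputs chosen only because the result can be checked by hand; they are not data for any real liquid.

```python
import math


def structure_factor(q, g, rho, r_max, n_steps=20000):
    """Isotropic S(q) = 1 + 4 pi rho * int_0^r_max r^2 [g(r) - 1] sin(qr)/(qr) dr.

    The integral is evaluated with the midpoint rule; g(r) is assumed to have
    reached 1 beyond r_max, and q must be positive.
    """
    dr = r_max / n_steps
    total = 0.0
    for k in range(n_steps):
        r = (k + 0.5) * dr
        total += r * r * (g(r) - 1.0) * math.sin(q * r) / (q * r) * dr
    return 1.0 + 4.0 * math.pi * rho * total


def g_step(r, sigma=1.0):
    """Toy pair correlation: excluded core of diameter sigma, no other structure."""
    return 0.0 if r < sigma else 1.0


rho = 0.1  # number density in units of sigma^-3 (hypothetical value)
for q in (1.0, 3.0, 6.0):
    print(q, structure_factor(q, g_step, rho, r_max=2.0))
```

For this toy g(r) the integral has the closed form $S(q) = 1 - 4\pi\rho\,[\sin(q\sigma) - q\sigma\cos(q\sigma)]/q^3$, which gives a simple check on the quadrature.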

Typical results for a semiconducting liquid are illustrated in figure A1.3.29, where the experimental pair
correlation function and structure factor for silicon are presented. The radial distribution function shows a sharp first
peak followed by oscillations. The structure in the radial distribution function reflects some local ordering.
The nature and degree of this order depend on the chemical nature of the liquid state. For example,
semiconductor liquids are especially interesting in this sense as they are believed to retain covalent bonding
characteristics even in the melt.

Figure A1.3.29. Pair correlation and structure factor for liquid silicon from experiment [41].

One simple measure of the liquid structure is the average coordination number of an atom. For example, the 
average coordination of a silicon atom is four in the solid phase at ambient pressure and increases to six in 
high pressure forms of silicon. In the liquid state, the average coordination of silicon is six. The average 
coordination of the liquid can be determined from the radial distribution function. One common prescription 
is to integrate the area under the first peak of the radial distribution function. The integration is terminated at 
the first local minimum after the first peak. For a crystalline case, this procedure gives the exact number of 
nearest neighbours. In general, coordination numbers greater than four correspond to metallic states of silicon. 
As such, the radial distribution function suggests that silicon is a metal in the liquid state. This is consistent 
with experimental values of the conductivity. Most tetrahedrally coordinated semiconductors, e.g. Ge, GaAs
and InP, become metallic upon melting.
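
The prescription for the coordination number described above amounts to evaluating $N = 4\pi\rho \int_0^{r_{min}} r^2 g(r) \, dr$, where $r_{min}$ is the first local minimum after the first peak. A minimal Python sketch follows. The tabulated g(r) is a hypothetical model built from Gaussian-broadened neighbour shells (4 neighbours at distance 1.0 and 12 at distance 1.6, in arbitrary units), not measured data, and the peak-finding logic assumes a smooth, noise-free curve.

```python
import math


def coordination_number(r, g, rho):
    """Integrate 4 pi rho r^2 g(r) up to the first minimum after the first peak.

    r and g are equal-length lists on an increasing grid; rho is the number density.
    """
    # Locate the first peak: the first local maximum where g(r) exceeds 1.
    i = 1
    while i < len(g) - 1 and not (g[i] > 1.0 and g[i] >= g[i - 1] and g[i] >= g[i + 1]):
        i += 1
    # Walk downhill to the first local minimum after that peak.
    while i + 1 < len(g) and g[i + 1] < g[i]:
        i += 1
    # Trapezoidal rule from the start of the grid to the minimum.
    total = 0.0
    for k in range(i):
        f0 = r[k] ** 2 * g[k]
        f1 = r[k + 1] ** 2 * g[k + 1]
        total += 0.5 * (f0 + f1) * (r[k + 1] - r[k])
    return 4.0 * math.pi * rho * total


def toy_g(r, rho, shells=((1.0, 4), (1.6, 12)), width=0.05):
    """Hypothetical g(r): Gaussian-broadened shells given as (distance, neighbour count)."""
    norm = width * math.sqrt(2.0 * math.pi)
    return sum(n / (4.0 * math.pi * rho * r * r)
               * math.exp(-0.5 * ((r - r_s) / width) ** 2) / norm
               for r_s, n in shells)


rho = 0.05
r_grid = [0.001 * k for k in range(1, 2501)]
g_grid = [toy_g(x, rho) for x in r_grid]
print(coordination_number(r_grid, g_grid, rho))  # recovers the first-shell count
```

With well separated shells, as in a crystal, the integral returns the nearest-neighbour count. For a real liquid the first peak overlaps the second, so the result depends on where the integration is cut off.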

It is possible to use the quantum states to predict the electronic properties of the melt. A typical procedure is 
to implement molecular dynamics simulations for the liquid, which permit the wavefunctions to be 
determined at each time step of the simulation. As an example, one can use the eigenpairs for a given atomic 
configuration to calculate the optical conductivity. The real part of the conductivity can be expressed as

$$ \sigma_r(\omega) = \frac{2\pi e^2}{3 m^2 \omega \Omega} \sum_{n,m} \sum_{\alpha = x,y,z} |\langle \psi_m | p_\alpha | \psi_n \rangle|^2 \, \delta(E_n - E_m - \hbar\omega) $$

where $E_i$ and $\psi_i$ are eigenvalues and eigenfunctions, and $\Omega$ is the volume of the supercell. The dipole
transition elements, $\langle \psi_m | p_\alpha | \psi_n \rangle$, reflect the spatial resolution of the initial and final wavefunctions. If the
initial and final states were both to have even parity, then the electromagnetic field would not couple to these
states.


The conductivity can be calculated for each time step in a simulation and averaged over a long simulation 
time. This procedure can be used to distinguish the metallic and semiconducting behaviour of the liquid state. 
As an example, the calculated frequency dependence of the electrical conductivity of gallium arsenide and
cadmium telluride is illustrated in figure A1.3.30. In the melt, gallium arsenide is a metal. As the
temperature of the liquid is increased, its DC conductivity decreases. For cadmium telluride, the situation is 
reversed. As the temperature of the liquid is increased, the DC conductivity increases. This is similar to the 
behaviour of a semiconducting solid. As the temperature of the solid is increased, more carriers are thermally 
excited into the conduction bands and the conductivity increases. The relative conductivity of GaAs versus 
CdTe as determined via theoretical calculations agrees well with experiment. 

Figure A1.3.30. Theoretical frequency-dependent conductivity for GaAs and CdTe liquids from ab initio
molecular dynamics simulations [42].




REFERENCES 


[1] Kittel C 1996 Introduction to Solid State Physics 7th edn (New York: Wiley)

[2] Ziman J M 1986 Principles of the Theory of Solids 2nd edn (Cambridge: Cambridge University Press)

[3] Kittel C 1987 Quantum Theory of Solids 2nd revn (New York: Wiley)

[4] Callaway J 1974 Quantum Theory of the Solid State (Boston: Academic)

[5] Yu P and Cardona M 1996 Fundamentals of Semiconductors (New York: Springer)

[6] Wells A F 1984 Structural Inorganic Chemistry 5th edn (Oxford: Clarendon)

[7] Madelung O 1996 Introduction to Solid State Theory (New York: Springer)

[8] Tauc J (ed) 1974 Amorphous and Liquid Semiconductors (New York: Plenum)

[9] Phillips J C 1989 Physics of High-Tc Superconductors (Boston: Academic)

[10] Anderson P W 1997 The Theory of Superconductivity in the High-Tc Cuprates (Princeton Series in Physics) (Princeton: Princeton University Press)

[11] Ziman J M 1960 Electrons and Phonons (Oxford: Oxford University Press)

[12] Haug A 1972 Theoretical Solid State Physics (New York: Pergamon)

[13] Thomas L H 1926 Proc. Camb. Phil. Soc. 23 542

[14] Fermi E 1928 Z. Phys. 48 73

[15] Lundqvist S and March N H (eds) 1983 Theory of the Inhomogeneous Electron Gas (New York: Plenum)

[16] Ashcroft N W and Mermin N D 1976 Solid State Physics (New York: Holt, Rinehart and Winston)

[17] Slater J C 1951 Phys. Rev. 81 385 

[18] Slater J C 1964-74 Quantum Theory of Molecules and Solids vols 1-4 (New York: McGraw-Hill) 

[19] Slater J C 1968 Quantum Theory of Matter (New York: McGraw-Hill) 

[20] Hohenberg P and Kohn W 1964 Phys. Rev. B 136 864 

[21] Kohn W and Sham L 1965 Phys. Rev. A 140 1133 

[22] Ceperley D M and Alder B J 1980 Phys. Rev. Lett. 45 566 

[23] Perdew J P, Burke K and Wang Y 1996 Phys. Rev. B 54 16 533 and references therein 




[24] Chelikowsky J R and Louie S G (eds) 1996 Quantum Theory of Real Materials (Boston: Kluwer) 

[25] Cohen M L and Chelikowsky J R 1989 Electronic Structure and Optical Properties of Semiconductors 2nd edn 
(Springer) 

[26] Phillips J C and Kleinman L 1959 Phys. Rev. 116 287 

[27] Chelikowsky J R and Cohen M L 1992 Ab initio pseudopotentials for semiconductors Handbook on 
Semiconductors vol 1, ed P Landsberg (Amsterdam: Elsevier) p 59 

[28] Fermi E 1934 Nuovo Cimento 11 157 

[29] Herring C 1940 Phys. Rev. 57 1169

[30] Ley L, Kowalczyk S P, Pollack R A and Shirley D A 1972 Phys. Rev. Lett. 29 1088 

[31] Philipp H R and Ehrenreich H 1963 Phys. Rev. Lett. 127 1550 

[32] Zucca R R L and Shen Y R 1970 Phys. Rev. B 1 2668 

[33] Yang L W and Coppens P 1974 Solid State Commun. 15 1555 

[34] Yin M T and Cohen M L 1980 Phys. Rev. Lett. 45 1004 

[35] Chang K J, Dacorogna M M, Cohen M L, Mignot J M, Chouteau G and Martinez G 1985 Phys. Rev. Lett. 54 2375 

[36] Dacorogna M M, Chang K J and Cohen M L 1985 Phys. Rev. B 32 1853 

[37] Hybertsen M and Louie S G 1985 Phys. Rev. Lett. 55 1418 

[38] Hybertsen M and Louie S G 1986 Phys. Rev. B 34 5390 

[39] Benedict L X and Shirley E L 1999 Phys. Rev. B 59 5441 


[40] Chelikowsky J R and Chou M Y 1988 Phys. Rev. B 38 7966 

[41] Waseda Y 1980 The Structure of Non-Crystalline Materials (New York: McGraw-Hill) 

[42] Godlevsky V, Derby J and Chelikowsky J R 1998 Phys. Rev. Lett. 81 4959 


FURTHER READING 

Anderson P W 1963 Concepts in Solids (New York: Benjamin) 

Cox P A 1987 The Electronic Structure and Chemistry of Solids (Oxford: Oxford University Press) 

Harrison W A 1999 Elementary Electronic Structure (River Edge: World Scientific) 

Harrison W A 1989 Electronic Structure and the Properties of Solids: The Physics of the Chemical Bond (New York: Dover) 



Hummel R 1985 Electronic Properties of Materials (New York: Springer) 

Jones W and March N 1973 Theoretical Solid State Physics (New York: Wiley) 

Lerner R G and Trigg G L (eds) 1983 Concise Encyclopedia of Solid State Physics (Reading, MA: Addison-Wesley)

Myers H P 1997 Introductory Solid State Physics (London: Taylor and Francis) 

Patterson J D 1971 Introduction to the Theory of Solid State Physics (Reading, MA: Addison-Wesley) 

Peierls R 1955 Quantum Theory of Solids (Oxford: Clarendon) 

Phillips J C 1973 Bands and Bonds in Semiconductors (New York: Academic) 

Pines D 1963 Elementary Excitations in Solids (New York: Benjamin) 

Seitz F 1948 Modern Theory of Solids (New York: McGraw-Hill) 



A1.4 The symmetry of molecules 

Per Jensen and P R Bunker 


A1.4.1 INTRODUCTION 

Unlike most words in a glossary of terms associated with the theoretical description of molecules, the word
'symmetry' has a meaning in everyday life. Many objects look exactly like their mirror image, and we say
that they are symmetrical or, more precisely, that they have reflection symmetry. In addition to having
reflection symmetry, a pencil (for example) is such that if we rotate it through any angle about its long axis it
will look the same. We say it has rotational symmetry. The concepts of rotation and reflection symmetry are
familiar to us all.

The ball-and-stick models used in elementary chemistry education to visualize molecular structure are 
frequently symmetrical in the sense discussed above. Reflections in certain planes, rotations by certain angles 
about certain axes, or more complicated symmetry operations involving both reflection and rotation, will 
leave them looking the same. One might initially think that this is 'the symmetry of molecules' discussed in 
the present chapter, but it is not. Ball-and-stick models represent molecules fixed at their equilibrium 
configuration, that is, at the minimum (or at one of the minima) of the potential energy function for the 
electronic state under consideration. A real molecule is not static and generally it does not possess the 
rotation-reflection symmetry of its equilibrium configuration. In any case, the use we make of molecular
symmetry in understanding molecules, their spectra and their dynamics, has its basis in considerations other 
than the appearance of the molecule at equilibrium. 

The true basis for understanding molecular symmetry involves studying the operations that leave the energy 
of a molecule unchanged, rather than studying the rotations or reflections that leave a molecule in its 
equilibrium configuration looking the same. Symmetry is a general concept. Not only does it apply to 
molecules, but it also applies, for example, to atoms, to atomic nuclei and to the particles that make up atomic 
nuclei. Also, the concept of symmetry applies to nonrigid molecules such as ammonia NH3, ethane C2H6, the
hydrogen dimer (H2)2, the water trimer (H2O)3 and so on, that easily contort through structures that differ in
the nature of their rotational and reflection symmetry. For a hydrogen molecule that is translating, rotating and
vibrating in space, with the electrons orbiting, it is clear that the total energy of the molecule is unchanged if 
we interchange the coordinates and momenta of the two protons; the total kinetic energy is unchanged (since 
the two protons have the same mass), and the total electrostatic potential energy is unchanged (since the two 
protons have the same charge). However, the interchange of an electron and a proton will almost certainly not 
leave the molecular energy unchanged. Thus the permutation of identical particles is a symmetry operation 
and we will introduce others. In quantum mechanics the possible molecular energies are the eigenvalues of the 
molecular Hamiltonian and if the Hamiltonian is invariant to a particular operation (or, equivalently, if the 
Hamiltonian commutes with a particular operation) then that operation is a symmetry operation. 
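
The energy-invariance argument for the hydrogen molecule can be checked numerically. The Python sketch below uses a classical energy function (kinetic plus Coulomb, in atomic units) as a stand-in for the molecular Hamiltonian; this is a simplifying assumption, as are the randomly chosen coordinates and momenta. Interchanging the two protons leaves the energy unchanged, while interchanging a proton with an electron does not.

```python
import itertools
import math
import random

MASS = {"p": 1836.15267, "e": 1.0}   # particle masses in atomic units
CHARGE = {"p": 1.0, "e": -1.0}       # particle charges in atomic units


def total_energy(kinds, positions, momenta):
    """Classical kinetic plus Coulomb energy of a set of point particles."""
    kinetic = sum(sum(c * c for c in p) / (2.0 * MASS[k])
                  for k, p in zip(kinds, momenta))
    potential = sum(CHARGE[kinds[i]] * CHARGE[kinds[j]]
                    / math.dist(positions[i], positions[j])
                    for i, j in itertools.combinations(range(len(kinds)), 2))
    return kinetic + potential


def swap(values, i, j):
    """Return a copy of the list with entries i and j interchanged."""
    out = list(values)
    out[i], out[j] = out[j], out[i]
    return out


rng = random.Random(1)
kinds = ["p", "p", "e", "e"]   # H2: two protons and two electrons
pos = [[rng.uniform(-1, 1) for _ in range(3)] for _ in kinds]
mom = [[rng.uniform(-1, 1) for _ in range(3)] for _ in kinds]

e0 = total_energy(kinds, pos, mom)
e_pp = total_energy(kinds, swap(pos, 0, 1), swap(mom, 0, 1))   # proton <-> proton
e_pe = total_energy(kinds, swap(pos, 0, 2), swap(mom, 0, 2))   # proton <-> electron
print(e0, e_pp, e_pe)
```

Only the labels attached to coordinates and momenta are exchanged; the masses and charges stay with the particle types, which is why the proton-electron interchange changes the energy.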

We collect symmetry operations into various 'symmetry groups', and this chapter is about the definition and 
use of such symmetry operations and symmetry groups. Symmetry groups are used to label molecular states 
and this labelling makes the states, and their possible interactions, much easier to understand. One important 
symmetry group that we describe is called the molecular symmetry group and the symmetry operations it 
contains are permutations of identical nuclei with and without the inversion of the molecule at its centre of 
mass. One fascinating outcome is that indeed for
rigid molecules (i.e., molecules that do not undergo large amplitude contortions to become nonrigid as
discussed above) we can obtain a group of rotation and reflection operations that describes the rotation and 
reflection symmetry of the equilibrium molecular structure from the molecular symmetry group. However, by 
following the energy-invariance route we can understand the generality of the concept of symmetry and can 
readily deduce the symmetry groups that are appropriate for nonrigid molecules as well. 

This introductory section continues with a subsection that presents the general motivation for using symmetry 
and ends with a short subsection that lists the various types of molecular symmetry. 

A1.4.1.1 MOTIVATION: ROTATIONAL SYMMETRY AS AN EXAMPLE

Rotational symmetry is used here as an example to explain the motivation for using symmetry in molecular 
physics; it will be discussed in more detail in section A1.4.3.2.


We consider an isolated molecule in field-free space with Hamiltonian $\hat{H}$. We let $\hat{\mathbf{F}}$ be the total angular
momentum operator of the molecule, that is

$$ \hat{\mathbf{F}} = \hat{\mathbf{N}} + \hat{\mathbf{S}} + \hat{\mathbf{I}} \qquad \text{(A1.4.1)} $$

where $\hat{\mathbf{N}}$ is the operator for the rovibronic angular momentum that results from the rotational motion of the
nuclei and the orbital motion of the electrons, $\hat{\mathbf{S}}$ is the total electron spin angular momentum operator and $\hat{\mathbf{I}}$ is
the total nuclear spin angular momentum operator. We introduce a Cartesian axis system (X, Y, Z). The
orientation of the (X, Y, Z) axis system is fixed in space (i.e., it is independent of the orientation in space of the
molecule), but the origin is tied to the molecular centre of mass. It is well known that the molecular
Hamiltonian $\hat{H}$ commutes with the operators

$$ \hat{F}^2 = \hat{F}_X^2 + \hat{F}_Y^2 + \hat{F}_Z^2 \qquad \text{(A1.4.2)} $$

and $\hat{F}_Z$, where this is the component of $\hat{\mathbf{F}}$ along the Z axis, i.e.,

$$ [\hat{F}^2, \hat{H}] = \hat{F}^2 \hat{H} - \hat{H} \hat{F}^2 = 0 \qquad \text{(A1.4.3)} $$

and

$$ [\hat{F}_Z, \hat{H}] = 0. \qquad \text{(A1.4.4)} $$

It is also well known that $\hat{F}^2$ and $\hat{F}_Z$ have simultaneous eigenfunctions $|F, m_F\rangle$ and that

$$ \hat{F}^2 |F, m_F\rangle = F(F + 1)\hbar^2 |F, m_F\rangle \qquad \text{(A1.4.5)} $$

and

$$ \hat{F}_Z |F, m_F\rangle = m_F \hbar |F, m_F\rangle \qquad \text{(A1.4.6)} $$

where, for a given molecule, F assumes non-negative values that are either integral (= 0, 1, 2, 3, ...) or
half-integral (= 1/2, 3/2, 5/2, ...) and, for a given F value, $m_F$ has the 2F + 1 values -F, -F + 1, ..., F - 1, F.

We can solve the molecular Schrödinger equation

$$ \hat{H} \Psi_j = E_j \Psi_j \qquad \text{(A1.4.7)} $$

by representing the unknown wavefunction $\Psi_j$ (where j is an index labelling the solutions) as a linear
combination of known basis functions $\Psi_n^0$,

$$ \Psi_j = \sum_n C_{jn} \Psi_n^0 \qquad \text{(A1.4.8)} $$

where the $C_{jn}$ are expansion coefficients and n is an index labelling the basis functions. As described, for
example, in section 6.6 of Bunker and Jensen [1], the eigenvalues $E_j$ and expansion coefficients $C_{jn}$ can be
determined from the 'Hamiltonian matrix' by solving the secular equation

$$ |H_{mn} - \delta_{mn} E| = 0 \qquad \text{(A1.4.9)} $$

where the Kronecker delta $\delta_{mn}$ has the value 1 for m = n and the value 0 for m ≠ n, and the Hamiltonian
matrix elements $H_{mn}$ are given by

$$ H_{mn} = \int \Psi_m^{0*} \hat{H} \Psi_n^0 \, d\tau \qquad \text{(A1.4.10)} $$

with integration carried out over the configuration space of the molecule. This process is said to involve
'diagonalizing the Hamiltonian matrix'.

We now show what happens if we set up the Hamiltonian matrix using basis functions $\Psi_n^0$ that are
eigenfunctions of $\hat{F}^2$ and $\hat{F}_Z$ with eigenvalues given by (equation A1.4.5) and (equation A1.4.6). We denote
this particular choice of basis functions as $\Psi^0_{n,F,m_F}$. From (equation A1.4.3), (equation A1.4.5) and the fact
that $\hat{F}^2$ is a Hermitian operator, we derive

$$ \int \Psi^{0*}_{m,F',m_F'} [\hat{F}^2, \hat{H}] \Psi^0_{n,F'',m_F''} \, d\tau = [F'(F' + 1) - F''(F'' + 1)] \hbar^2 \int \Psi^{0*}_{m,F',m_F'} \hat{H} \Psi^0_{n,F'',m_F''} \, d\tau = 0 \qquad \text{(A1.4.11)} $$

from which it follows that the matrix element $\int \Psi^{0*}_{m,F',m_F'} \hat{H} \Psi^0_{n,F'',m_F''} \, d\tau$ must vanish if F' ≠ F''. From
(equation A1.4.4) it follows in a similar manner that the matrix element must also vanish if $m_F' \neq m_F''$.
That is, in the basis $\Psi^0_{n,F,m_F}$ the Hamiltonian matrix is block diagonal in F and $m_F$, and we can rewrite
(equation A1.4.8) as

$$ \Psi_{j,F,m_F} = \sum_n C_{jn} \Psi^0_{n,F,m_F} \qquad \text{(A1.4.12)} $$

The eigenfunctions of $\hat{H}$ are therefore also eigenfunctions of $\hat{F}^2$ and $\hat{F}_Z$. We can further show that since $m_F$ quantizes the
molecular angular momentum along the arbitrarily chosen, space-fixed Z axis, the energy (i.e., the eigenvalue
of $\hat{H}$ associated with the function $\Psi_{j,F,m_F}$) is independent of $m_F$. That is, the 2F + 1 states with common values of
j and F and $m_F$ = -F, -F + 1, ..., F, are degenerate.

In order to solve (equation A1.4.7) we do not have to choose the basis functions to be eigenfunctions of $\hat{F}^2$
and $\hat{F}_Z$, but there are obvious advantages in doing so:

• The Hamiltonian matrix factorizes into blocks for basis functions having common values of F and
$m_F$. This reduces the numerical work involved in diagonalizing the matrix.

• The solutions can be labelled by their values of F and $m_F$. We say that F and $m_F$ are good quantum
numbers. With this labelling, it is easier to keep track of the solutions and we can use the good
quantum numbers to express selection rules for molecular interactions and transitions. In field-free
space only states having the same values of F and $m_F$ can interact, and an electric dipole transition
between states with F = F' and F = F'' can take place only if

$$ |F' - F''| \le 1 \quad \text{and} \quad F' + F'' \ge 1. \qquad \text{(A1.4.13)} $$
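
The bookkeeping implied by the good quantum numbers is easy to make concrete. The following Python sketch lists the 2F + 1 values of $m_F$ for integral or half-integral F and tests the angular momentum condition of equation (A1.4.13), read as |F' - F''| ≤ 1 together with F' + F'' ≥ 1. The function names are hypothetical, and the check covers only this one condition; other selection rules (parity, for instance) are not considered.

```python
from fractions import Fraction


def m_f_values(f):
    """Return the 2F + 1 values -F, -F + 1, ..., F for integral or half-integral F."""
    f = Fraction(f)
    if f < 0 or (2 * f).denominator != 1:
        raise ValueError("F must be a non-negative integer or half-integer")
    return [-f + k for k in range(int(2 * f) + 1)]


def dipole_allowed(f_upper, f_lower):
    """Angular momentum condition: |F' - F''| <= 1 and F' + F'' >= 1."""
    f1, f2 = Fraction(f_upper), Fraction(f_lower)
    return abs(f1 - f2) <= 1 and f1 + f2 >= 1


print(m_f_values(Fraction(3, 2)))
print(dipole_allowed(0, 0), dipole_allowed(0, 1), dipole_allowed(1, 3))
```

Exact fractions are used so that half-integral values such as 3/2 are handled without floating-point comparisons.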

At this point the reader may feel that we have done little in the way of explaining molecular symmetry. All we
have done is to state basic results, normally treated in introductory courses on quantum mechanics, connected
with the fact that it is possible to find a complete set of simultaneous eigenfunctions for two or more
commuting operators. However, as we shall see in section A1.4.3.2, the fact that the molecular Hamiltonian
$\hat{H}$ commutes with $\hat{F}^2$ and $\hat{F}_Z$ is intimately connected to the fact that $\hat{H}$ commutes with (or, equivalently, is
invariant to) any rotation of the molecule about a space-fixed axis passing through the centre of mass of the
molecule. As stated above, an operation that leaves the Hamiltonian invariant is a symmetry operation of the
Hamiltonian. The infinite set of all possible rotations of the
molecule about all possible axes that pass through the molecular centre of mass can be collected together to
form a group (see below). Following the notation of Bunker and Jensen [1] we call this group K(spatial).
Since all elements of K(spatial) are symmetry operations of $\hat{H}$, we say that K(spatial) is a symmetry group of
$\hat{H}$. Any group has a set of irreducible representations and they define the way coordinates, wavefunctions and
operators have to transform under the operations in the group; it so happens that the irreducible
representations of K(spatial), $D^{(F)}$, are labelled by the angular momentum quantum number F. The 2F + 1
functions $|F, m_F\rangle$ (or $\Psi^0_{n,F,m_F}$ or $\Psi_{j,F,m_F}$) with a common value of F (and n or j) and $m_F$ = -F, -F + 1, ..., F
transform according to the irreducible representation $D^{(F)}$ of K(spatial). As a result, we can reformulate our
procedure for solving the Schrödinger equation of a molecule as follows:

• For the Hamiltonian $\hat{H}$ we identify a symmetry group, and this is a group of symmetry operations of
$\hat{H}$, a symmetry operation being defined as an operation that leaves $\hat{H}$ invariant (i.e., that commutes with
$\hat{H}$). In our example, the symmetry group is K(spatial).

• Having done this we solve the Schrödinger equation for the molecule by diagonalizing the
Hamiltonian matrix in a complete set of known basis functions. We choose the basis functions so that 
they transform according to the irreducible representations of the symmetry group. 

• The Hamiltonian matrix will be block diagonal in this basis set. There will be one block for each 
irreducible representation of the symmetry group. 

• As a result the eigenstates of $\hat{H}$ can be labelled by the irreducible representations of the symmetry
group and these irreducible representations can be used as 'good quantum numbers' for understanding 
interactions and transitions. 
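
The procedure summarized above can be sketched for a small model problem. In the Python example below, a 4 × 4 real symmetric 'Hamiltonian matrix' is written in a basis whose functions carry a symmetry label, and all matrix elements between functions with different labels are zero. The matrix entries and labels are invented for illustration. Each 2 × 2 block is diagonalized separately, and every eigenvalue obtained this way is then checked against the secular determinant of the full matrix, as in equation (A1.4.9).

```python
import math


def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def secular(h, energy):
    """Value of the secular determinant |H_mn - delta_mn E| for a trial energy."""
    n = len(h)
    return det([[h[i][j] - (energy if i == j else 0.0) for j in range(n)]
                for i in range(n)])


def eig_2x2(a, b, d):
    """Eigenvalues of the real symmetric block [[a, b], [b, d]]."""
    mean, half = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    return mean - half, mean + half


# Hypothetical basis: functions 0 and 2 carry label F = 0, functions 1 and 3 carry F = 1.
labels = [0, 1, 0, 1]
h = [[1.0, 0.0, 0.3, 0.0],
     [0.0, 2.0, 0.0, 0.5],
     [0.3, 0.0, 1.5, 0.0],
     [0.0, 0.5, 0.0, 4.0]]

# Group the basis functions by symmetry label and diagonalize each block on its own.
energies = []
for label in sorted(set(labels)):
    i, j = [k for k, lab in enumerate(labels) if lab == label]
    energies.extend(eig_2x2(h[i][i], h[i][j], h[j][j]))

for e in energies:
    print(f"E = {e:.6f}, secular determinant = {secular(h, e):.2e}")
```

Two 2 × 2 problems replace one 4 × 4 problem here; for realistic basis sets the saving from block factorization is much larger.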

We have described here one particular type of molecular symmetry, rotational symmetry. On one hand, this 
example is complicated because the appropriate symmetry group, K(spatial), has infinitely many elements.
On the other hand, it is simple because each irreducible representation of K(spatial) corresponds to a
particular value of the quantum number F which is associated with a physically observable quantity, the 
angular momentum. Below we describe other types of molecular symmetry, some of which give rise to finite 
symmetry groups. 


A1.4.1.2 A LIST OF THE VARIOUS TYPES OF MOLECULAR SYMMETRY 

The possible types of symmetry for the Hamiltonian of an isolated molecule in field-free space (all of them 
are discussed in more detail later on in the article) can be listed as follows: 

(i) Translational symmetry. A translational symmetry operation displaces all nuclei and electrons in the 
molecule uniformly in space (i.e., all particles are moved in the same direction and by the same 
distance). This symmetry is a consequence of the uniformity of space. 

(ii) Rotational symmetry. A rotational symmetry operation rotates all nuclei and electrons by the same 
angle about a space-fixed axis that passes through the molecular centre of mass. This symmetry is a 
consequence of the isotropy of space. 


(iii) Inversion symmetry. The Hamiltonian that we customarily use to describe a molecule involves 
only the electromagnetic forces between the particles (nuclei and electrons) and these forces are 
invariant to the 'inversion operation' E* which inverts all particle positions through the centre 
of mass of the molecule. Thus such a Hamiltonian commutes with E*; the use of this operation 
leads (as we see in section A1.4.2.5) to the concept of parity, and parity can be + or -. This
symmetry results from the fact that the electromagnetic force is invariant to inversion. It is not a 
property of space. 

(iv) Identical particle permutation symmetry. The corresponding symmetry operations permute 
identical particles in a molecule. These particles can be electrons, or they can be identical 
nuclei. This symmetry results from the indistinguishability of identical particles. 

(v) Time reversal symmetry. The time reversal symmetry operation T or $\hat{\theta}$ reverses the direction of
motion in a molecule by reversing the sign of all linear and angular momenta. This symmetry
results from the properties of the Schrödinger equation of a system of particles moving under
the influence of electromagnetic forces. It is not a property of space-time. 
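
The first three symmetries in this list can be illustrated with the electrostatic part of the energy alone. The Python sketch below evaluates the Coulomb energy of a set of point charges and confirms that it is unchanged by a uniform translation, by a rotation about an axis and by inversion of all positions. The charges and coordinates are arbitrary, and for simplicity the coordinate origin stands in for the molecular centre of mass; both are assumptions made for the example.

```python
import itertools
import math
import random


def coulomb_energy(charges, positions):
    """Sum of q_i q_j / r_ij over all pairs (atomic units)."""
    return sum(charges[i] * charges[j] / math.dist(positions[i], positions[j])
               for i, j in itertools.combinations(range(len(charges)), 2))


def translate(positions, shift):
    """Displace every particle by the same vector."""
    return [[x + s for x, s in zip(p, shift)] for p in positions]


def rotate_z(positions, angle):
    """Rotate every particle by the same angle about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y, s * x + c * y, z] for x, y, z in positions]


def invert(positions):
    """Apply r -> -r to every particle (inversion through the origin)."""
    return [[-x for x in p] for p in positions]


rng = random.Random(7)
charges = [1.0, 1.0, -1.0, -1.0]
pos = [[rng.uniform(-1, 1) for _ in range(3)] for _ in charges]

v0 = coulomb_energy(charges, pos)
v_translated = coulomb_energy(charges, translate(pos, [0.3, -1.2, 2.0]))
v_rotated = coulomb_energy(charges, rotate_z(pos, 0.77))
v_inverted = coulomb_energy(charges, invert(pos))
print(v0, v_translated, v_rotated, v_inverted)
```

All three operations preserve every interparticle distance, which is why the Coulomb energy is unaffected.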

We hope that by now the reader has it firmly in mind that the way molecular symmetry is defined and 
used is based on energy invariance and not on considerations of the geometry of molecular equilibrium 
structures. Symmetry defined in this way leads to the idea of conservation. For example, the total 
angular momentum of an isolated molecule in field-free space is a conserved quantity (like the total
energy) since there are no terms in the Hamiltonian that can mix states having different values of F. This
point is discussed further in section A1.4.3.1 and section A1.4.3.2.


A1.4.2 GROUP THEORY 

The use of symmetry involves the mathematical apparatus of group theory, and in this section we 
summarize the basics. We first define the concept of a group by considering the permutations of the 
protons in the phosphine molecule PH$_3$ (figure A1.4.1) as an example. This leads to the definition of the 
nuclear permutation group for PH$_3$. We briefly discuss point groups and then introduce representations 
of a group; in particular we define irreducible representations. We then go on to show how 
wavefunctions are transformed by symmetry operations, and how this enables molecular states to be 
labelled according to the irreducible representations of the applicable symmetry group. The final 
subsection explains the vanishing integral rule which is of major use in applying molecular symmetry in 
order to determine which transitions and interactions can and cannot occur. 

Figure A1.4.1. A PH$_3$ molecule at equilibrium. The protons are labelled 1, 2 and 3, respectively, and the 
phosphorus nucleus is labelled 4. 


A1.4.2.1 NUCLEAR PERMUTATION GROUPS 

The three protons in PH$_3$ are identical and indistinguishable. Therefore the molecular Hamiltonian will 
commute with any operation that permutes them, where such a permutation interchanges the space and spin 
coordinates of the protons. Although this is a rather obvious symmetry, and a proof is hardly necessary, it 
can be proved by formal algebra as done in chapter 6 of [1]. 

How many distinct ways of permuting the three protons are there? For example, we can interchange 
protons 1 and 2. The corresponding symmetry operation is denoted (12) (pronounced 'one-two') and it is 
to be understood quite literally: protons 1 and 2 interchange their positions in space. There are obviously 
two further distinct operations of this type: (23) and (31). A permutation operation that interchanges just 
two nuclei is called a transposition. A more complicated symmetry operation is (123). Here, nucleus 1 is 
replaced by nucleus 2, nucleus 2 by nucleus 3 and nucleus 3 by nucleus 1. Thus, after (123) nucleus 2 ends 
up at the position in space initially occupied by nucleus 1, nucleus 3 ends up at the position in space 
initially occupied by nucleus 2 and nucleus 1 ends up at the position in space initially occupied by nucleus 
3. Such an operation, which involves more than two nuclei, is called a cycle. A moment's thought will 
show that in the present case, there exists one other distinct cycle, namely (132). We could write further 
cycles like (231), (321) etc, but we discover that each of them has the same effect as (123) or (132). There 
are thus five distinct ways of permuting three protons: (123), (132), (12), (23) and (31). 

We can apply permutations successively. For example, we can first apply (12), and then (123); the net 
effect of doing this is to interchange protons 1 and 3. Thus we have 

$$(123)(12) = (31). \tag{A1.4.14}$$

When we apply permutations (or other symmetry operations) successively (this is commonly referred to as 
multiplying the operations so that (31) is the product of (123) and (12)), we write the operation to be 
applied first to the right in the manner done for general quantum mechanical operators. Permutations do not 
necessarily commute. For example, 

$$(12)(123) = (23). \tag{A1.4.15}$$

If we apply the operation (12) twice, or the operation (123) three times, we obviously get back to the 
starting point. We write this as 

$$(12)(12) = (123)(123)(123) = E \tag{A1.4.16}$$

where the identity operation E leaves the molecule unchanged by definition. Having defined E, we define 
the reciprocal (or inverse) $R^{-1}$ of a symmetry operation R (which, in our present example, could be (123), 
(132), (12), (23) or (31)) by the equation 

$$RR^{-1} = R^{-1}R = E. \tag{A1.4.17}$$


It is easy to verify that for example 

$$(12)^{-1} = (12) \quad\text{and}\quad (123)^{-1} = (132). \tag{A1.4.18}$$

The six operations 

$$S_3 = \{E, (123), (132), (12), (23), (31)\} \tag{A1.4.19}$$

are said to form a group because they satisfy the following group axioms: 

(i) We can multiply (i.e., successively apply) the operations together in pairs and the result is a 
member of the group. 

(ii) One of the operations in the group is the identity operation E. 

(iii) The reciprocal of each operation is a member of the group. 

(iv) Multiplication of the operations is associative; that is, in a multiple product the answer is 
independent of how the operations are associated in pairs, e.g., 

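$$(12)(123)(23) = (12)[(123)(23)] = [(12)(123)](23) = E. \tag{A1.4.20}$$

Here the first bracket evaluates to (123)(23) = (12) and the second to (12)(123) = (23), so that both 
groupings give E. 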

The fact that the group axioms (i), (ii), (iii) and (iv) are satisfied by the set in (equation A1.4.19) can be 
verified by inspecting the multiplication table of the group $S_3$ given in table A1.4.1; this table lists all 
products $R_1R_2$ where $R_1$ and $R_2$ are members of $S_3$. The group $S_3$ is the permutation group (or 
symmetric group) of degree 3, and it consists of all permutations of three objects. There are six elements 
in $S_3$ and the group is said to have order six. In general, the permutation group $S_n$ (all permutations of n 
objects) has order n!. 

Table A1.4.1 The multiplication table of the $S_3$ group. 

            E       (123)   (132)   (12)    (23)    (31)

    E       E       (123)   (132)   (12)    (23)    (31)
    (123)   (123)   (132)   E       (31)    (12)    (23)
    (132)   (132)   E       (123)   (23)    (31)    (12)
    (12)    (12)    (23)    (31)    E       (123)   (132)
    (23)    (23)    (31)    (12)    (132)   E       (123)
    (31)    (31)    (12)    (23)    (123)   (132)   E

Each entry is the product of first applying the permutation at the top of 
the column and then applying the permutation at the left end of the 
row. 
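
The table can be regenerated mechanically. The following is a minimal sketch, not part of the original treatment; it assumes that each permutation is stored as a tuple `p` in which `p[k-1]` is the label that replaces label `k` (so that (123) becomes `(2, 3, 1)`), and the names `NAMES`, `product` and `table` are illustrative choices. It builds the products in the N-convention and checks the four group axioms by brute force.

```python
# Each element of S3 is a tuple p where p[k-1] is the label replacing label k.
NAMES = {
    (1, 2, 3): "E",
    (2, 3, 1): "(123)",
    (3, 1, 2): "(132)",
    (2, 1, 3): "(12)",
    (1, 3, 2): "(23)",
    (3, 2, 1): "(31)",
}
ELEMENTS = list(NAMES)
IDENTITY = (1, 2, 3)


def product(a, b):
    """Product 'a b' in the N-convention: b is applied first, then a."""
    return tuple(a[b[k] - 1] for k in range(3))


# table[row][column] = (row operation)(column operation), column applied first.
table = {
    NAMES[a]: {NAMES[b]: NAMES[product(a, b)] for b in ELEMENTS}
    for a in ELEMENTS
}

# Brute-force checks of the group axioms (i), (iii) and (iv);
# axiom (ii) holds because IDENTITY is one of the elements.
closed = all(product(a, b) in NAMES for a in ELEMENTS for b in ELEMENTS)
has_inverses = all(
    any(product(a, b) == IDENTITY for b in ELEMENTS) for a in ELEMENTS
)
associative = all(
    product(a, product(b, c)) == product(product(a, b), c)
    for a in ELEMENTS for b in ELEMENTS for c in ELEMENTS
)

print(" " * 7 + " ".join(name.ljust(6) for name in table))
for row in table:
    print(row.ljust(6), " ".join(table[row][col].ljust(6) for col in table))
print(closed, has_inverses, associative)
```

Looking up `table["(123)"]["(12)"]` and `table["(12)"]["(123)"]` reproduces (equation A1.4.14) and (equation A1.4.15), respectively.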


There is another way of developing the algebra of permutation multiplication, and we briefly explain it. In this 
approach for PH$_3$ three positions in space are introduced and labelled 1, 2 and 3; the three protons are labelled 
H$_1$, H$_2$ and H$_3$. The permutation $(12)_S$ (where S denotes space-fixed position labels) is defined in this 
approach as permuting the nuclei that are in positions 1 and 2, and the permutation $(123)_S$ as replacing the 
proton in position 1 by the proton in position 2 etc. With this definition the effect of first doing $(12)_S$ and then 
doing $(123)_S$ can be drawn as 

    protons:     1 2 3   --(12)_S-->   2 1 3   --(123)_S-->   1 3 2
    positions:   1 2 3                 1 2 3                  1 2 3

where the net effect is that of $(23)_S$, and we see that 

$$(123)_S(12)_S = (23)_S. \tag{A1.4.21}$$

This is not the same as (equation A1.4.14). In fact, in this convention, which we can call the S-convention, the 
multiplication table is the transpose of that given in table A1.4.1. The convention we use and which leads to 
the multiplication table given in table A1.4.1, will be called the N-convention (where N denotes nuclear-fixed 
labels). 
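
The difference between the two conventions can be made concrete with a short sketch. It is an illustration only: the arrangement is stored as a tuple whose entry at position p is the label of the proton sitting there, and the helper names `act_N` and `act_S` are hypothetical. In the N-convention a permutation relabels the nuclei wherever they are; in the S-convention it moves the contents of the numbered positions.

```python
# Cycles written as maps: (123) sends 1 -> 2, 2 -> 3, 3 -> 1.
P12 = {1: 2, 2: 1}
P23 = {2: 3, 3: 2}
P31 = {3: 1, 1: 3}
P123 = {1: 2, 2: 3, 3: 1}


def act_N(cycle, arrangement):
    """N-convention: nucleus k is replaced by nucleus cycle[k], wherever it sits."""
    return tuple(cycle.get(label, label) for label in arrangement)


def act_S(cycle, arrangement):
    """S-convention: position p receives whatever was in position cycle[p]."""
    n = len(arrangement)
    return tuple(arrangement[cycle.get(p, p) - 1] for p in range(1, n + 1))


start = (1, 2, 3)  # proton k initially in position k

# First (12), then (123), in each convention.
n_result = act_N(P123, act_N(P12, start))
s_result = act_S(P123, act_S(P12, start))

print(n_result, act_N(P31, start))  # N-convention: net effect of (31)
print(s_result, act_S(P23, start))  # S-convention: net effect of (23)_S
```

The intermediate arrangement in the S-convention, `act_S(P12, start)`, is `(2, 1, 3)`, matching the middle column of the diagram above.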

A1.4.2.2 POINT GROUPS 

Having defined the concept of a group in section A1.4.2.1, we discuss in the present section a particular type 
of group that most readers will have heard about: the point group. We do this with some reluctance since 
point group operations do not commute with the complete molecular Hamiltonian and thus they are not true 
symmetry operations of the kind discussed in section A1.4.1.2. Also the actual effect that the operations have 
on molecular coordinates is not straightforward to explain. From a pedagogical and logical point of view it 
would be better to bring them into the discussion of molecular symmetry only after groups consisting of the 
true symmetry operations enumerated in section A1.4.1.2 have been thoroughly explained. However, because 
of their historical importance we have decided to introduce them early on. As explained in section A1.4.4 the 
operations of a molecular point group involve the rotation and/or reflection of vibrational displacement 
coordinates and electronic coordinates, within the molecular-fixed coordinate system which itself remains 
fixed in space. Thus the rotational variables (called Euler angles) that define the orientation of a molecule in 
space are not transformed and in particular the molecule is not subjected to an overall rotation by the 
operations that are called 'rotations' in the molecular point group. It turns out that the molecular point group is 
a symmetry group of use in the understanding of the vibrational and electronic states of molecules. However, 
because of centrifugal and Coriolis forces the vibrational and electronic motion is not completely separable 
from the rotational motion and, as we explain in section A1.4.5, the molecular point group is only a near 
symmetry group of the complete molecular Hamiltonian appropriate for the hypothetical, non-rotating 
molecule. 

In general, a point group symmetry operation is defined as a rotation or reflection of a macroscopic object 
such that, after the operation has been carried out, the object looks the same as it did originally. The 
macroscopic objects we consider here are models of molecules in their equilibrium configuration; we could 
also consider idealized objects such as cubes, pyramids, spheres, cones, tetrahedra etc. in order to define the 
various possible point groups. 

As an example, we again consider the PH$_3$ molecule. In its pyramidal equilibrium configuration PH$_3$ has all 
three P-H distances equal and all three bond angles ∠(HPH) equal. This object has the point group symmetry 
$C_{3v}$ where the operations of the group are 

$$C_{3v} = \{E, C_3, C_3^2, \sigma_1, \sigma_2, \sigma_3\}. \tag{A1.4.22}$$


The operations in the group can be understood by referring to figure A1.4.2. In this figure the right-handed 
Cartesian (p, q, r) axis system has origin at the molecular centre of mass, the P nucleus is above the pq plane 
(the plane of the page), and the three H nuclei are below the pq plane. The operations $C_3$ and $C_3^2$ in (equation 
A1.4.22) are right-handed rotations of 120° and 240°, respectively, about the r axis. In general, we use the 
notation $C_n$ for a rotation of $2\pi/n$ radians about an axis. Somewhat unfortunately, it is customary to use the 
symbol $C_n$ to denote not only the rotation operation, but also the rotation axis. That is, we say that the r axis in 
figure A1.4.2 is a $C_3$ axis. The operation $\sigma_1$ is a reflection in the pr plane (which, with the same unfortunate 
lack of distinction used in the case of the $C_3$ operation and the $C_3$ axis, we call the $\sigma_1$ plane), and $\sigma_2$ and $\sigma_3$ 
are reflections in the $\sigma_2$ and $\sigma_3$ planes; these planes are obtained by rotating by 120° and 240°, respectively, 
about the r axis from the pr plane. As shown in figure A1.4.2, each of the H nuclei in the PH$_3$ molecule lies 
in a $\sigma_k$ plane (k = 1, 2, 3) and the P nucleus lies on the $C_3$ axis. It is clear that the operations of $C_{3v}$ as defined 
here leave the PH$_3$ molecule in its static equilibrium configuration looking unchanged. It is important to 
realize that when we apply the point group operations we do not move the (p,q,r) axes (we call this the 
'space-fixed' axis convention) and we will now show how this aspect of the way point group operations are defined 
affects the construction of the group multiplication table. 

Figure A1.4.2. The PH$_3$ molecule at equilibrium. The symbol (+ r) indicates that the r axis points up, out of 
the plane of the page. 


Formally, we can say that the operations in $C_{3v}$ act on points in space. For example, we show in figure A1.4.2 
how a point P in the pq plane is transformed into another point P' by the operation $\sigma_1$; we can say that P' = 
$\sigma_1$P. The reader can now show by geometrical considerations that if we first reflect a point P in the $\sigma_1$ plane 
to obtain P' = $\sigma_1$P, and we then reflect P' in the $\sigma_2$ plane to obtain P'' = $\sigma_2$P' = $\sigma_2\sigma_1$P, then P'' can be 
obtained directly from P by a 240° anticlockwise rotation about the r axis. Thus P'' = $C_3^2$P or generally 

$$C_3^2 = \sigma_2\sigma_1. \tag{A1.4.23}$$

We can also show that 

$$C_3 = \sigma_1\sigma_2. \tag{A1.4.24}$$

The complete multiplication table of the $C_{3v}$ point group, worked out using arguments similar to those leading 
to (equation A1.4.23) and (equation A1.4.24), is given in table A1.4.2. It is left as an exercise for the reader to 
use this table to show that the elements of $C_{3v}$ satisfy the group axioms given in section A1.4.2.1. 
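
The geometrical argument can be checked numerically. The sketch below is an illustration under stated assumptions: because every operation of $C_{3v}$ leaves the r coordinate of a point unchanged, each operation is modelled by a 2 x 2 matrix acting on the (p, q) coordinates in the space-fixed axis convention, with the $\sigma_k$ planes at 0°, 120° and 240° from the p axis. The short names `s1`, `s2`, `s3` stand for $\sigma_1$, $\sigma_2$, $\sigma_3$, and all helper names are hypothetical.

```python
import math


def rotation(deg):
    """Anticlockwise rotation about the r axis by deg degrees, acting on (p, q)."""
    t = math.radians(deg)
    return ((math.cos(t), -math.sin(t)), (math.sin(t), math.cos(t)))


def reflection(deg):
    """Reflection in the plane containing the r axis at deg degrees from the p axis."""
    t = math.radians(2 * deg)
    return ((math.cos(t), math.sin(t)), (math.sin(t), -math.cos(t)))


def product(a, b):
    """Matrix product a b: b acts on the point first, then a."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def same(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


OPS = {
    "E": rotation(0), "C3": rotation(120), "C3^2": rotation(240),
    "s1": reflection(0), "s2": reflection(120), "s3": reflection(240),
}


def name_of(matrix):
    """Identify which of the six operations a matrix corresponds to."""
    return next(name for name, op in OPS.items() if same(matrix, op))


# table[row][column]: column operation applied first, then the row operation.
table = {r: {c: name_of(product(OPS[r], OPS[c])) for c in OPS} for r in OPS}

print(table["s2"]["s1"])  # compare with equation (A1.4.23)
print(table["s1"]["s2"])  # compare with equation (A1.4.24)
```

Every product is identified with one of the six operations, which is the closure property required by group axiom (i).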

Table A1.4.2 The multiplication table of the $C_{3v}$ point group using the space-fixed axis convention (see text). 

            E       C3      C3^2    σ1      σ2      σ3

    E       E       C3      C3^2    σ1      σ2      σ3
    C3      C3      C3^2    E       σ3      σ1      σ2
    C3^2    C3^2    E       C3      σ2      σ3      σ1
    σ1      σ1      σ2      σ3      E       C3      C3^2
    σ2      σ2      σ3      σ1      C3^2    E       C3
    σ3      σ3      σ1      σ2      C3      C3^2    E

Each entry is the product of first applying the operation at the top of the column and then applying 
the operation at the left end of the row. 

If we were to define the operations of the point group as also rotating and reflecting the 
(p,q,r) axis system (in which case the axes would be 'tied' to the positions of the nuclei), we 
would obtain a different multiplication table. We could call this the 'nuclear-fixed axis 
convention.' To implement this the protons in the $\sigma_1$, $\sigma_2$ and $\sigma_3$ planes in figure A1.4.2 
would be numbered H$_1$, H$_2$ and H$_3$ respectively. With this convention the $C_3$ operation 
would move the $\sigma_1$ plane to the position in space originally occupied by the $\sigma_2$ plane. If we 
follow such a $C_3$ operation by the $\sigma_1$ reflection (in the plane containing H$_1$) we find that, in 
the nuclear-fixed axis convention: 

$$\sigma_1 C_3 = \sigma_3. \tag{A1.4.25}$$

Similarly, with the nuclear-fixed axis convention, we determine that 

$$C_3 = \sigma_1\sigma_3 \tag{A1.4.26}$$

and this result also follows by multiplying (equation A1.4.25) on the left by $\sigma_1$. The 
multiplication table obtained using the nuclear-fixed axis convention is the transpose of the 
multiplication table obtained using the space-fixed axis convention (compare (equation 
A1.4.24) and (equation A1.4.26)). In dealing with point groups we will use the space-fixed 
axis convention. For defining the effect of permutation operations the S-convention (see 
(equation A1.4.21)) is related to the N-convention (see (equation A1.4.14)) in the same way 
that the space-fixed and nuclear-fixed axis conventions for point groups are related. 

The operations in a point group are associated with so-called symmetry elements. Symmetry elements can be 
rotation axes (such as the $C_3$ axis that gives rise to the $C_3$ and $C_3^2$ operations in $C_{3v}$) or reflection planes (such as 
the planes $\sigma_1$, $\sigma_2$, $\sigma_3$, each of which gives rise to a reflection operation in $C_{3v}$). A third type of symmetry 
element not present in $C_{3v}$ is the rotation-reflection axis or improper axis. For example, an allene molecule 
H$_2$CCCH$_2$ in its equilibrium configuration will be unchanged in appearance by a rotation of 90° about the 
CCC axis combined with a reflection in a plane perpendicular to this axis and containing the 'middle' C 
nucleus. This operation (a rotation-reflection or an improper rotation) is called $S_4$; it is an element of the 
point group of allene, $D_{2d}$. Allene is said to have as a symmetry element the rotation-reflection axis or 
improper axis $S_4$. It should be noted that neither the rotation of 90° about the CCC axis nor the reflection in 
the plane perpendicular to it is itself in $D_{2d}$. For an arbitrary point group, all symmetry elements will 
intersect at the centre of mass of the object; this point is left unchanged by the group operations and hence the 
name point group. In order to determine the appropriate point group for a given static arrangement of nuclei, 
one first identifies the symmetry elements present. Cotton [2] gives in his section 3.14 a systematic procedure 
to select the appropriate point group from the symmetry elements found. The labels customarily used for point 
groups (such as $C_{3v}$ and $D_{2d}$) are named Schönflies symbols after their inventor. The most important point 
groups (defined by their symmetry elements) are 

$C_n$       one n-fold rotation axis, 

$C_{nv}$    one n-fold rotation axis and n reflection planes containing this axis, 

$C_{nh}$    one n-fold rotation axis and one reflection plane perpendicular to this axis, 

$D_n$       one n-fold rotation axis and n twofold rotation axes perpendicular to it, 

$D_{nd}$    those of $D_n$ plus n reflection planes containing the n-fold rotation axis and bisecting the angles 
            between the n twofold rotation axes, 

$D_{nh}$    those of $D_n$ plus a reflection plane perpendicular to the n-fold rotation axis, 

$S_n$       one alternating axis of symmetry (about which rotation by $2\pi/n$ radians followed by reflection in a 
            plane perpendicular to the axis is a symmetry operation). 

The point groups $T_d$, $O_h$ and $I_h$ consist of all rotation, reflection and rotation-reflection symmetry operations 
of a regular tetrahedron, cube and icosahedron, respectively. 

Point groups are discussed briefly in sections 4.3 and 4.4 of [1] and very extensively in chapter 3 of Cotton 
[2]. We refer the reader to these literature sources for more details. 

A1.4.2.3 IRREDUCIBLE REPRESENTATIONS AND CHARACTER TABLES 

If we have two groups A and B, of the same order h: 

$$A = \{A_1, A_2, A_3, \ldots, A_h\} \tag{A1.4.27}$$

$$B = \{B_1, B_2, B_3, \ldots, B_h\} \tag{A1.4.28}$$

where $A_1 = B_1 = E$, the identity operation, and if there is a one-to-one correspondence between the elements of 
A and B, $A_k \leftrightarrow B_k$, k = 1, 2, 3, ..., h, so that if 

$$A_iA_j = A_m \tag{A1.4.29}$$

it can be inferred that 

$$B_iB_j = B_m \tag{A1.4.30}$$

for all $i \le h$ and $j \le h$, then the two groups A and B are said to be isomorphic. 

As an example we consider the group $S_3$ introduced in (equation A1.4.19) and the point group $C_{3v}$ given in 
(equation A1.4.22). Inspection shows that the multiplication table of $C_{3v}$ in table A1.4.2 can be obtained from 
the multiplication table of the group $S_3$ (table A1.4.1) by the following mapping: 

$$\begin{array}{lcccccc} S_3: & E & (123) & (132) & (12) & (23) & (31) \\ C_{3v}: & E & C_3 & C_3^2 & \sigma_3 & \sigma_1 & \sigma_2 \end{array} \tag{A1.4.31}$$

Thus, $C_{3v}$ and $S_3$ are isomorphic. 

Homomorphism is analogous to isomorphism. Where an isomorphism is a one-to-one correspondence 
between elements of groups of the same order, homomorphism is a many-to-one correspondence between 
elements of groups having different orders. The larger group is said to be homomorphic onto the smaller 
group. For example, the point group $C_{3v}$ is homomorphic onto $S_2$ = {E, (12)} with the following 
correspondences: 

$$\begin{array}{lcc} C_{3v}: & \{E,\ C_3,\ C_3^2\} & \{\sigma_1,\ \sigma_2,\ \sigma_3\} \\ S_2: & E & (12) \end{array} \tag{A1.4.32}$$

The multiplication table of $S_2$ has the entries EE = E, E(12) = (12)E = (12) and (12)(12) = E. If, in the 
multiplication table of $C_{3v}$ (table A1.4.2), the elements E, $C_3$ and $C_3^2$ are each replaced by E (of $S_2$) and $\sigma_1$, $\sigma_2$ 
and $\sigma_3$ each by (12), we obtain the multiplication table of $S_2$ nine times over. 

We are particularly concerned with isomorphisms and homomorphisms, in which one of the groups involved 
is a matrix group. In this circumstance the matrix group is said to be a representation of the other group. The 
elements of a matrix group are square matrices, all of the same dimension. The 'successive application' of two 
matrix group elements (in the sense of group axiom (i) in section A1.4.2.1) is matrix multiplication. Thus, the 
identity operation E of a matrix group is the unit matrix of the appropriate dimension, and the inverse element 
of a matrix is its inverse matrix. Matrices and matrix groups are discussed in more detail in section 5.1 of [1]. 

For the group A in (equation A1.4.27) to be isomorphic to, or homomorphic onto, a 
matrix group containing matrices of dimension $\ell$, say, each element $A_k$ of A is 
mapped onto an $\ell \times \ell$ matrix $M_k$, k = 1, 2, 3, 4, ..., h, and (equation A1.4.29) and 
(equation A1.4.30) can be rewritten in the form 

$$\text{if } A_iA_j = A_m \text{ then } M_iM_j = M_m \tag{A1.4.33}$$

for all $i \le h$ and $j \le h$. The latter part of this equation says that the $\ell \times \ell$ matrix $M_m$ is 
the product of the two $\ell \times \ell$ matrices $M_i$ and $M_j$. 

If we have found one representation of $\ell$-dimensional matrices $M_1, M_2, M_3, \ldots$ of 
the group A, then, at least for $\ell > 1$, we can define infinitely many other equivalent 
representations consisting of the matrices 

$$M'_k = V^{-1}M_kV, \qquad k = 1, 2, 3, \ldots, h \tag{A1.4.34}$$

where V is an $\ell \times \ell$ matrix. The determinant of V must be nonvanishing, so that $V^{-1}$ 
exists, but otherwise V is arbitrary. We say that $M'_k$ is obtained from $M_k$ by a 
similarity transformation. It is straightforward to show that the matrices $M'_k$, k = 
1, 2, 3, ..., h form a representation of A since they satisfy an equation analogous to 
(equation A1.4.33). 

It is well known that the trace of a square matrix (i.e., the sum of its diagonal 
elements) is unchanged by a similarity transformation. If we define the traces 

$$\chi_k = \sum_{p=1}^{\ell} (M_k)_{pp} \quad\text{and}\quad \chi'_k = \sum_{p=1}^{\ell} (M'_k)_{pp} \tag{A1.4.35}$$

we have 

$$\chi'_k = \chi_k. \tag{A1.4.36}$$

The traces of the representation matrices are called the characters of the 
representation, and (equation A1.4.36) shows that all equivalent representations 
have the same characters. Thus, the characters serve to distinguish inequivalent 
representations. 
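
A quick numerical illustration of (equation A1.4.36) follows. It is a sketch only: the matrices `M` and `V` are arbitrary illustrative choices, not taken from the text, and exact rational arithmetic is used so that the comparison of traces is not affected by rounding.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices stored as tuples of rows."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def inverse_2x2(v):
    """Inverse of a 2 x 2 integer matrix; the determinant must not vanish."""
    (a, b), (c, d) = v
    det = a * d - b * c
    if det == 0:
        raise ValueError("V must have a nonvanishing determinant")
    return ((Fraction(d, det), Fraction(-b, det)),
            (Fraction(-c, det), Fraction(a, det)))


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


M = ((0, 1), (1, 0))   # illustrative representation matrix
V = ((2, 1), (1, 3))   # arbitrary matrix with determinant 5

# Similarity transformation M' = V^-1 M V, as in equation (A1.4.34).
M_prime = matmul(matmul(inverse_2x2(V), M), V)

print(M_prime)
print(trace(M), trace(M_prime))
```

The transformed matrix differs from `M` element by element, yet the two traces agree.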

If we select an element of A, $A_j$ say, and determine the set of elements S given by 
forming all products 

$$S = R^{-1}A_jR \tag{A1.4.37}$$

where R runs over all elements of A, then the set of distinct elements obtained, 
which will include $A_j$ (since for $R = R^{-1} = E$ we have $S = A_j$), is said to form a class 
of A. For any group the identity operation E is always in a class of its own since for 
all R we have $S = R^{-1}ER = R^{-1}R = E$. The reader can use the multiplication table 
(table A1.4.1) to determine the classes of the group $S_3$ (equation (A1.4.19)); there 
are three classes [E], [(123),(132)] and [(12),(23),(31)]. Since the groups $S_3$ and $C_{3v}$ 
(equation (A1.4.22)) are isomorphic, the class structure of $C_{3v}$ can be immediately 
inferred from the class structure of $S_3$ together with (equation A1.4.31). $C_{3v}$ has the 
classes [E], [$C_3$, $C_3^2$] and [$\sigma_1$, $\sigma_2$, $\sigma_3$]. 
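
The class structure of $S_3$ can also be found by brute force, forming $R^{-1}A_jR$ for every pair of elements. This is a minimal sketch under the assumption that each permutation is stored as a tuple `p` with `p[k-1]` the label replacing label `k`; the helper names are illustrative.

```python
NAMES = {
    (1, 2, 3): "E",
    (2, 3, 1): "(123)",
    (3, 1, 2): "(132)",
    (2, 1, 3): "(12)",
    (1, 3, 2): "(23)",
    (3, 2, 1): "(31)",
}


def product(a, b):
    """Product 'a b': b is applied first, then a."""
    return tuple(a[b[k] - 1] for k in range(3))


def inverse(a):
    """The permutation that undoes a."""
    inv = [0, 0, 0]
    for k, image in enumerate(a, start=1):
        inv[image - 1] = k
    return tuple(inv)


def conjugacy_class(a):
    """All distinct elements R^-1 a R, with R running over the whole group."""
    return frozenset(product(product(inverse(r), a), r) for r in NAMES)


classes = {conjugacy_class(a) for a in NAMES}
class_names = sorted((sorted(NAMES[x] for x in c) for c in classes), key=len)

for members in class_names:
    print(members)
```

The three classes have 1, 2 and 3 members, and their sizes add up to the order of the group.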

If two elements of A, $A_i$ and $A_j$ say, are in the same class, then there exists a third 
element of A, R, such that 

$$A_i = R^{-1}A_jR. \tag{A1.4.38}$$

Then by (equation A1.4.33) 

$$M_i = M_R^{-1}M_jM_R \tag{A1.4.39}$$

where $M_i$, $M_j$ and $M_R$ are the representation matrices associated with $A_i$, $A_j$ and R, 
respectively. That is, $M_i$ is obtained from $M_j$ in a similarity transformation, and 
these two matrices thus have the same trace or character. Consequently, all the 
elements in a given class of a group are represented by matrices with the same 
character. 

If we start with an $\ell$-dimensional representation of A consisting of the matrices $M_1$, 
$M_2$, $M_3$, ..., it may be that we can find a matrix V such that when it is used with 
(equation A1.4.34) it produces an equivalent representation $M'_1$, $M'_2$, $M'_3$, ... each 
of whose matrices is in the same block diagonal form. For example, the 
nonvanishing elements of each of the matrices $M'_k$ could form an upper-left-corner 
$\ell_1 \times \ell_1$ block and a lower-right-corner $\ell_2 \times \ell_2$ block, where $\ell_1 + \ell_2 = \ell$. In this 
situation, a few moments' consideration of the rules of matrix multiplication shows 
that all the upper-left-corner $\ell_1 \times \ell_1$ blocks, taken on their own, form an 
$\ell_1$-dimensional representation of A and all the lower-right-corner $\ell_2 \times \ell_2$ blocks, taken 
on their own, form an $\ell_2$-dimensional representation. In these circumstances the 
original representation $\Gamma$ consisting of $M_1$, $M_2$, $M_3$, ... is reducible and we have 
reduced it to the sum of the two representations, $\Gamma_1$ and $\Gamma_2$ say, of dimensions $\ell_1$ 
and $\ell_2$, respectively. We write this reduction as 

$$\Gamma = \Gamma_1 \oplus \Gamma_2. \tag{A1.4.40}$$

Clearly, a one-dimensional representation (also called a non-degenerate 
representation) is of necessity irreducible in that it cannot be reduced to 
representations of lower dimensions. Degenerate representations (i.e., groups of 
matrices with dimension higher than 1) can also be irreducible, which means that 
there is no matrix that by a similarity transformation will bring all the matrices of 
the representation into the same block diagonal form. It can be shown that the 
number of irreducible representations of a given group is equal to the number of 
classes in the group. We have seen that the group $S_3$ has three classes [E], [(123), 
(132)] and [(23),(31),(12)] and therefore it has three irreducible representations. For 
a general group with n irreducible representations with dimensions $\ell_1, \ell_2, \ell_3, \ldots, \ell_n$, it 
can also be shown that 

$$\sum_{i=1}^{n} \ell_i^2 = h \tag{A1.4.41}$$

where h is the order of the group. For $S_3$ this equation yields 

$$\ell_1^2 + \ell_2^2 + \ell_3^2 = 6 \tag{A1.4.42}$$


and, since the $\ell_i$ have to be positive integers, we obtain $\ell_1 = \ell_2 = 1$ and $\ell_3 = 2$. When developing general 
formulae we label the irreducible representations of a group as $\Gamma_1, \Gamma_2, \ldots, \Gamma_n$ and denote the characters 
associated with $\Gamma_i$ as $\chi^{\Gamma_i}[R]$, where R is an element of the group under study. However, the irreducible 
representations of symmetry groups are denoted by various other special symbols such as $A_2$, $\Sigma^-$ and $D^{(4)}$. 
The characters of the irreducible representations of a symmetry group are collected together into a 
character table and the character table of the group $S_3$ is given in table A1.4.3. The construction of 
character tables for finite groups is treated in section 4.4 of [2] and section 3-4 of [3]. 

Table A1.4.3 The character table of the $S_3$ group. 

    S3      E       (123)   (12)
            1       2       3

    A1      1       1       1
    A2      1       1       -1
    E       2       -1      0

One representative element in each class is given, and the number written below each element is the number of 
elements in the class. 


For any $\Gamma_i$ we have $\chi^{\Gamma_i}[E] = \ell_i$, the dimension of $\Gamma_i$. This is because the identity operation E is always 
represented by an $\ell_i \times \ell_i$ unit matrix whose trace obviously is $\ell_i$. For any group there will be one irreducible 
representation (called the totally symmetric representation $\Gamma^{(s)}$) which has all $\chi^{\Gamma^{(s)}}[R] = 1$. Such a 
representation exists because any group is homomorphic onto the one-member matrix group {1} (where the 
'1' is interpreted as a 1 x 1 matrix). The irreducible characters $\chi^{\Gamma_i}[R]$ satisfy several equations (see, for 
example, section 4.3 of [2] and section 3-3 of [3]), for example 

$$\sum_{R} \chi^{\Gamma_i}[R]^{*}\,\chi^{\Gamma_j}[R] = h\,\delta_{ij} \tag{A1.4.43}$$

where the sum runs over all elements R of the group. 

In applications of group theory we often obtain a reducible representation, and we then need to reduce it to its 
irreducible components. The way that a given representation of a group is reduced to its irreducible 
components depends only on the characters of the matrices in the representation and on the characters of the 
matrices in the irreducible representations of the group. Suppose that the reducible representation is $\Gamma$ and that 
the group involved has irreducible representations that we label $\Gamma_1, \Gamma_2, \Gamma_3, \ldots$. What we mean by 'reducing' $\Gamma$ 
is finding the integral coefficients $a_i$ in the expression 

$$\Gamma = a_1\Gamma_1 \oplus a_2\Gamma_2 \oplus a_3\Gamma_3 \oplus \cdots \tag{A1.4.44}$$

where 

$$\chi^{\Gamma}[R] = \sum_{j} a_j\,\chi^{\Gamma_j}[R] \tag{A1.4.45}$$

with the sum running over all the irreducible representations of the group. Multiplying (equation A1.4.45) on 
the right by $\chi^{\Gamma_i}[R]^{*}$ and summing over R it follows from the character orthogonality relation (equation 
(A1.4.43)) that the required $a_i$ are given by 

$$a_i = \frac{1}{h}\sum_{R} \chi^{\Gamma}[R]\,\chi^{\Gamma_i}[R]^{*} \tag{A1.4.46}$$

where h is the order of the group and R runs over all the elements of the group. 
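
As a worked illustration of (equation A1.4.46), the sketch below reduces one representation of $S_3$. The example representation is an assumption introduced here, not taken from the text: it is the three-dimensional representation in which each permutation is represented by its 3 x 3 permutation matrix, so that its character for each element is the number of protons left in place (3 for E, 0 for the cycles, 1 for the transpositions). The irreducible representation labels `A1`, `A2` and `E` and the dictionary names are likewise illustrative. Because every character of $S_3$ is real, the complex conjugation in (equation A1.4.46) is omitted, and the sum over elements is carried out class by class.

```python
# Number of elements in each class, keyed by a representative element.
CLASS_SIZES = {"E": 1, "(123)": 2, "(12)": 3}

# Characters of the three irreducible representations of S3.
CHARACTERS = {
    "A1": {"E": 1, "(123)": 1, "(12)": 1},
    "A2": {"E": 1, "(123)": 1, "(12)": -1},
    "E": {"E": 2, "(123)": -1, "(12)": 0},
}

h = sum(CLASS_SIZES.values())  # order of the group


def inner(chi_a, chi_b):
    """Sum over all group elements of chi_a[R] * chi_b[R] (real characters)."""
    return sum(CLASS_SIZES[c] * chi_a[c] * chi_b[c] for c in CLASS_SIZES)


def reduce_representation(chi):
    """Coefficients a_i of equation (A1.4.46) for a representation with characters chi."""
    return {label: inner(chi, irrep) // h for label, irrep in CHARACTERS.items()}


# Characters of the assumed permutation-matrix representation.
chi_perm = {"E": 3, "(123)": 0, "(12)": 1}

print(reduce_representation(chi_perm))
```

For this example the coefficients are 1, 0 and 1, that is, the representation reduces to the sum of the totally symmetric representation and the two-dimensional one. The same `inner` helper can be used to confirm the orthogonality relation (equation A1.4.43) for every pair of rows of the character table.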

A1.4.2.4 THE EFFECTS OF SYMMETRY OPERATIONS 

For the PH$_3$ molecule, which we continue using as an example, we consider that proton i (= 1, 2 or 3) initially 
has the coordinates $(X_i, Y_i, Z_i)$ in the (X, Y, Z) axis system, and the phosphorus nucleus has the coordinates 
$(X_4, Y_4, Z_4)$. After applying the permutation operation (12) to the PH$_3$ molecule, nucleus 1 is where nucleus 2 
was before. Consequently, nucleus 1 now has the coordinates $(X_2, Y_2, Z_2)$. Nucleus 2 is where nucleus 1 was 
before and has the coordinates $(X_1, Y_1, Z_1)$. Thus we can write 

$$\begin{aligned} (12)\,[X_1, Y_1, Z_1, X_2, Y_2, Z_2, X_3, Y_3, Z_3, X_4, Y_4, Z_4] &= [X'_1, Y'_1, Z'_1, X'_2, Y'_2, Z'_2, X'_3, Y'_3, Z'_3, X'_4, Y'_4, Z'_4] \\ &= [X_2, Y_2, Z_2, X_1, Y_1, Z_1, X_3, Y_3, Z_3, X_4, Y_4, Z_4] \end{aligned} \tag{A1.4.47}$$

where $X'_i$, $Y'_i$ and $Z'_i$ are the X, Y and Z coordinates of nucleus i after applying the permutation (12). By convention we 
always give first the (X, Y, Z) coordinates of nucleus 1, then those of nucleus 2, then those of nucleus 3 etc. 

Similarly, after applying the operation (123) to the PH$_3$ molecule, nucleus 2 is where nucleus 1 was before 
and has the coordinates $(X_1, Y_1, Z_1)$. Nucleus 3 is where nucleus 2 was before and has the coordinates 
$(X_2, Y_2, Z_2)$ and, finally, nucleus 1 is where nucleus 3 was before and has the coordinates $(X_3, Y_3, Z_3)$. So 

$$\begin{aligned} (123)\,[X_1, Y_1, Z_1, X_2, Y_2, Z_2, X_3, Y_3, Z_3, X_4, Y_4, Z_4] &= [X'_1, Y'_1, Z'_1, X'_2, Y'_2, Z'_2, X'_3, Y'_3, Z'_3, X'_4, Y'_4, Z'_4] \\ &= [X_3, Y_3, Z_3, X_1, Y_1, Z_1, X_2, Y_2, Z_2, X_4, Y_4, Z_4] \end{aligned} \tag{A1.4.48}$$

where here $X'_i$, $Y'_i$ and $Z'_i$ are the X, Y and Z coordinates of nucleus i after applying the permutation (123). 

The procedure exemplified by (equation A1.4.47) and (equation A1.4.48) can be trivially generalized to 
define the effect of any symmetry operation, R say, on the coordinates $(X_i, Y_i, Z_i)$ of any nucleus or electron i in 
any molecule by writing 

$$R\,[X_i, Y_i, Z_i] = [RX_i, RY_i, RZ_i] = [X'_i, Y'_i, Z'_i]. \tag{A1.4.49}$$

We can also write 

$$R^{-1}[X'_i, Y'_i, Z'_i] = [R^{-1}X'_i, R^{-1}Y'_i, R^{-1}Z'_i] = [X_i, Y_i, Z_i]. \tag{A1.4.50}$$

We use the nuclear permutation operations (123) and (12) to show what happens when we apply two 
operations in succession. We write the successive effect of these two permutations as (remember that we are 
using the N-convention; see (equation A1.4.14)) 

$$\begin{aligned} (123)(12)\,[X_1, Y_1, Z_1, X_2, Y_2, Z_2, X_3, Y_3, Z_3, X_4, Y_4, Z_4] &= (123)\,[X'_1, Y'_1, Z'_1, X'_2, Y'_2, Z'_2, X'_3, Y'_3, Z'_3, X'_4, Y'_4, Z'_4] \\ &= [X'_3, Y'_3, Z'_3, X'_1, Y'_1, Z'_1, X'_2, Y'_2, Z'_2, X'_4, Y'_4, Z'_4] \\ &= [X_3, Y_3, Z_3, X_2, Y_2, Z_2, X_1, Y_1, Z_1, X_4, Y_4, Z_4] \\ &= (31)\,[X_1, Y_1, Z_1, X_2, Y_2, Z_2, X_3, Y_3, Z_3, X_4, Y_4, Z_4] \end{aligned} \tag{A1.4.51}$$

where $X'_i$, $Y'_i$, $Z'_i$ are the coordinates of the nuclei after applying the operation (12). The result in (equation 
A1.4.51) is in accord with (equation A1.4.14). 
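
The bookkeeping in (equation A1.4.47), (equation A1.4.48) and (equation A1.4.51) can be reproduced with a few lines of code. This is a sketch under stated assumptions: the coordinate triple of nucleus i is abbreviated by the placeholder string `"r" + str(i)`, the configuration is a dictionary from nucleus label to coordinates, and the helper names `act` and `as_list` are hypothetical.

```python
# Cycles written as maps: (123) means nucleus 1 is replaced by nucleus 2, etc.
P12 = {1: 2, 2: 1}
P31 = {3: 1, 1: 3}
P123 = {1: 2, 2: 3, 3: 1}


def act(cycle, coords):
    """N-convention: nucleus cycle[i] ends up where nucleus i was before."""
    return {cycle.get(i, i): xyz for i, xyz in coords.items()}


def as_list(coords):
    """Coordinates listed in the order nucleus 1, 2, 3, 4."""
    return [coords[i] for i in sorted(coords)]


# "r1" stands for (X1, Y1, Z1), and so on; nucleus 4 is phosphorus.
start = {1: "r1", 2: "r2", 3: "r3", 4: "r4"}

print(as_list(act(P12, start)))    # compare with equation (A1.4.47)
print(as_list(act(P123, start)))   # compare with equation (A1.4.48)

# (123)(12): apply (12) first, then (123), as in equation (A1.4.51).
after = act(P123, act(P12, start))
print(as_list(after), after == act(P31, start))
```

The phosphorus coordinates `r4` are never moved, since no proton permutation involves nucleus 4.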

Molecular wavefunctions are functions of the coordinates of the nuclei and electrons in a molecule, and we 
are now going to consider how such functions can be transformed by the general symmetry operation 7? as 
defined in (equation Al.4.49). To do this we introduce three functions of the coordinates, /(X., y,Z.), f^ 

(X^Y^Zj) and f ^(X,y.,Z.). The functions ^and f ^are such that their values at any point in configuration 

space are each related to the value of the function/at another point in configuration space, where the 
coordinates of this 'other' point are defined by the effect of R as follows: 

/ N * (* , s ^ , Z; ) = f(X' f , I?, Z\) (A1 .4.52) 
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and 


f«(X' i .Y;,Z' i ) = f(X h Y i ,Z i ) (A1.4.53) 


or equivalently, 


/"[Xi. Y it 7^) = f(RXi. RY t . RZ t ) (A1.4.54) 

and 

ff (X t , Yi, Z,) = /(R-'Xi, R~ l Y h R~ ] 2d. (A1.4.55) 

This means that f^is such that its value at any point (X,y.,Z z .) is the same as the value off at the point 
(RXpRYpRZ), and that f ^is such that its value at any point (X, Y^Z^ is the same as the value off at the point 

(IT 1 X f IC 1 Y f Rr 1 Z l ). Alternatively, for the latter we can say that f^is such that its value at (RX^RY^RZ^ is 
the same as the value off at the point (X.,Y.,Z). 

We define the effect of a symmetry operation on a wavefunction in two different ways depending on whether 
the symmetry operation concerned uses a moving or fixed 'reference frame' (see [4]). Either we define its 
effect using the equation 

Rf(X t . Y„ Z t ) = /*(X,, Y t . Zi) = fiRXi, RYt.RZi), (A1.4.56) 


or we define its effect using 

Rf(Xi> Y;, Zi) = f£(Xi>Y;, Zi) = f{R- x X f , R~ [ Yi, R^Z^. (A1.4.57) 

Nuclear permutations in the N-convention (which convention we always use for nuclear permutations) and rotation operations relative to a nuclear-fixed or molecule-fixed reference frame are defined to transform wavefunctions according to (equation A1.4.56). These symmetry operations involve a moving reference frame. Nuclear permutations in the S-convention, point group operations in the space-fixed axis convention (which is the convention that is always used for point group operations; see section A1.4.2.2) and rotation operations relative to a space-fixed frame are defined to transform wavefunctions according to (equation A1.4.57). These operations involve a fixed reference frame.

Another distinction we make concerning symmetry operations involves the active and passive pictures. Below we consider translational and rotational symmetry operations. We describe these operations in a space-fixed axis system $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ with axes parallel to the $(X,Y,Z)$ axes, but with the origin fixed in space. In the active picture, which we adopt here, a translational symmetry operation displaces all nuclei and electrons in the molecule along a vector $\mathbf{A}$, say, and leaves the $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ axis system unaffected. In the passive picture, the molecule is left unaffected but the $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ axis system is displaced by $-\mathbf{A}$. Similarly, in the active picture a rotational symmetry operation physically rotates the molecule, leaving the axis system unaffected, whereas in the passive picture the axis system is rotated and the molecule is unaffected. If we think about symmetry operations in the passive picture, it is immediately obvious that they must leave the Hamiltonian invariant (i.e., commute with it). The energy of an isolated molecule in field-free space is obviously unaffected if we translate or rotate the $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ axis system.

A1.4.2.5 THE LABELLING OF MOLECULAR ENERGY LEVELS

The irreducible representations of a symmetry group of a molecule are used to label its energy levels. The way 
we label the energy levels follows from an examination of the effect of a symmetry operation on the 
molecular Schrödinger equation

$$ H\,\Psi_n(X_i, Y_i, Z_i) = E_n\,\Psi_n(X_i, Y_i, Z_i) \tag{A1.4.58} $$

where $\Psi_n(X_i,Y_i,Z_i)$ is a molecular eigenfunction having eigenvalue $E_n$.

By definition, a symmetry operation $R$ commutes with the molecular Hamiltonian $H$ and so we can write the operator equation:

HR = RH. (A1.4.59) 

If we act with each side of this equation on an eigenfunction $\Psi_n(X_i,Y_i,Z_i)$ from (equation A1.4.58) we derive

$$ HR\,\Psi_n(X_i,Y_i,Z_i) = RH\,\Psi_n(X_i,Y_i,Z_i) = RE_n\,\Psi_n(X_i,Y_i,Z_i) = E_nR\,\Psi_n(X_i,Y_i,Z_i). \tag{A1.4.60} $$

The second equality follows from (equation A1.4.58), and the third equality from the fact that $E_n$ is a number and numbers are not affected by symmetry operations. We can rewrite the result of (equation A1.4.60) as

$$ H[R\,\Psi_n(X_i,Y_i,Z_i)] = E_n[R\,\Psi_n(X_i,Y_i,Z_i)]. \tag{A1.4.61} $$

Thus

$$ R\,\Psi_n(X_i,Y_i,Z_i) = \Psi_n^{R}(X_i,Y_i,Z_i) \tag{A1.4.62} $$

is an eigenfunction having the same eigenvalue as $\Psi_n(X_i,Y_i,Z_i)$. If $E_n$ is a nondegenerate eigenvalue then $\Psi_n^{R}$ cannot be linearly independent of $\Psi_n$, which means that we can only have

$$ R\,\Psi_n(X_i,Y_i,Z_i) = c\,\Psi_n(X_i,Y_i,Z_i) \tag{A1.4.63} $$

where $c$ is a constant. An arbitrary symmetry operation $R$ is such that $R^m = E$, the identity, where $m$ is an integer. From (equation A1.4.63) we deduce that

$$ R^m\,\Psi_n(X_i,Y_i,Z_i) = c^m\,\Psi_n(X_i,Y_i,Z_i). \tag{A1.4.64} $$

Since $R^m = E$ we must have $c^m = 1$ in (equation A1.4.64), which gives

$$ c = \sqrt[m]{1}. \tag{A1.4.65} $$

Thus, for example, for the PH$_3$ molecule any nondegenerate eigenfunction can only be multiplied by $+1$, $\omega = \exp(2\pi i/3)$, or $\omega^2 = \exp(4\pi i/3)$ by the symmetry operation (123) since $(123)^3 = E$ (so that $m = 3$ in (equation A1.4.65)). In addition, such a function can only be multiplied by $+1$ or $-1$ by the symmetry operations (12), (23) or (31) since each of these operations is self-reciprocal (so that $m = 2$ in (equation A1.4.65)).
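As a small numerical illustration of (equation A1.4.65), the sketch below lists the allowed multipliers $c$ for $m = 3$ and $m = 2$. The function name `roots_of_unity` is introduced here for illustration only and is not part of the original treatment.

```python
import cmath

def roots_of_unity(m):
    """Return the m complex numbers c that satisfy c**m = 1."""
    return [cmath.exp(2j * cmath.pi * k / m) for k in range(m)]

# (123) satisfies (123)^3 = E, so m = 3: c is 1, omega or omega^2.
multipliers_123 = roots_of_unity(3)

# (12), (23) and (31) are self-reciprocal, so m = 2: c is +1 or -1.
multipliers_12 = roots_of_unity(2)

for c in multipliers_123:
    # Each allowed multiplier returns to 1 after three applications.
    assert abs(c**3 - 1) < 1e-12
```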

We will apply this result to the H$_2$ molecule as a way of introducing the fact that nondegenerate molecular energy levels can be labelled according to the one-dimensional irreducible representations of a symmetry group of the molecular Hamiltonian. The Hamiltonian for the H$_2$ molecule commutes with $E^*$ and with the operation (12) that permutes the protons. Thus, the eigenfunction of any nondegenerate molecular energy level is either invariant, or changed in sign, by the inversion operation $E^*$ since $(E^*)^2 = E$ (i.e., $m = 2$ for $R = E^*$ in (equation A1.4.65)); invariant states are said to have positive parity (+) and states that are changed in sign by $E^*$ to have negative parity (−). Similarly, any nondegenerate energy level will be invariant or changed in sign by the proton permutation operation (12); states that are invariant are said to be symmetric (s) with respect to (12) and states that are changed in sign are said to be antisymmetric (a). This enables us to label nondegenerate energy levels of the H$_2$ molecule as being (+s), (−s), (+a) or (−a) according to the effect of the operations $E^*$ and (12). For the H$_2$ molecule we can form a symmetry group using these elements: $\{E, (12), E^*, (12)^*\}$, where

$$ (12)^* = (12)E^* = E^*(12) \tag{A1.4.66} $$

and the character table of the group is given in table A1.4.4. The effect of the operation $(12)^*$ on a wavefunction is simply the product of the effects of (12) and $E^*$. The labelling of the states as (+s), (−s), (+a) or (−a) is thus according to the irreducible representations of the symmetry group and the nondegenerate energy levels of the H$_2$ molecule are of four different symmetry types in this group.
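The four labels can be generated from the two signs they encode. The following sketch, which uses helper names of our own choosing (`characters`, `overlap`), builds each row of characters from the parity and the permutation symmetry and checks that the rows are mutually orthogonal, as expected for distinct one-dimensional irreducible representations of a group of order 4.

```python
from itertools import product

# Each label is (parity, permutation symmetry): the characters under E* and (12).
labels = {"+s": (1, 1), "-s": (-1, 1), "+a": (1, -1), "-a": (-1, -1)}

def characters(parity, perm):
    """Characters under E, (12), E* and (12)* = (12)E*."""
    return [1, perm, parity, perm * parity]

table = {name: characters(*signs) for name, signs in labels.items()}

def overlap(row_a, row_b):
    """Sum over the four operations of the product of two character rows."""
    return sum(a * b for a, b in zip(row_a, row_b))

# Distinct one-dimensional representations are orthogonal; each has norm h = 4.
for (na, ra), (nb, rb) in product(table.items(), repeat=2):
    assert overlap(ra, rb) == (4 if na == nb else 0)
```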
Table A1.4.4 The character table of a symmetry group for the H$_2$ molecule.

        E     (12)    E*    (12)*
+s      1      1       1      1
-s      1      1      -1     -1
+a      1     -1       1     -1
-a      1     -1      -1      1

The energy level of an $l$-fold degenerate eigenstate can be labelled according to an $l$-fold degenerate irreducible representation of the symmetry group, as we now show.

Suppose the $l$ orthonormal eigenfunctions $\Psi_{n1}, \Psi_{n2}, \ldots, \Psi_{nl}$ all have the same eigenvalue $E_n$ of the molecular Hamiltonian. If we apply a symmetry operation $R$ to one of these functions the resulting function will also be an eigenfunction of the Hamiltonian with eigenvalue $E_n$ (see (equation A1.4.61) and the sentence after it) and the most general function of this type is a linear combination of the $l$ functions $\Psi_{ni}$ given above. Thus, using matrix notation, we can write the effect of $R$ as

$$ R\,\Psi_{ni} = \sum_{j=1}^{l} D[R]_{ij}\,\Psi_{nj} \tag{A1.4.67} $$

where $i = 1, 2, \ldots, l$. For example, choosing $i = 1$, we have the effect of $R$ on $\Psi_{n1}$ as:

$$ R\,\Psi_{n1} = D[R]_{11}\Psi_{n1} + D[R]_{12}\Psi_{n2} + \cdots + D[R]_{1l}\Psi_{nl}. \tag{A1.4.68} $$

The $D[R]_{ij}$ are numbers and $D[R]$ is a matrix of these numbers; the matrix $D[R]$ is generated by the effect of $R$ on the $l$ functions $\Psi_{ni}$. We can visualize (equation A1.4.67) as the effect of $R$ acting on a column matrix $\Psi_n$ being equal to the product of a square matrix $D[R]$ and a column matrix $\Psi_n$, i.e.,

$$ R[\Psi_n] = [D[R]][\Psi_n]. \tag{A1.4.69} $$

Each operation in a symmetry group of the Hamiltonian will generate such an $l \times l$ matrix, and it can be shown (see, for example, appendix 6-1 of [1]) that if three operations of the group $P_1$, $P_2$ and $P_{12}$ are related by

$$ P_1 P_2 = P_{12} \tag{A1.4.70} $$

then the matrices generated by application of them to the $\Psi_{ni}$ (as described by (equation A1.4.67)) will satisfy

$$ D[P_1]\,D[P_2] = D[P_{12}]. \tag{A1.4.71} $$

Thus, the matrices will have a multiplication table with the same structure as the multiplication table of the symmetry group and hence will form an $l$-dimensional representation of the group.
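The homomorphism property in (equation A1.4.71) can be checked numerically for a simple case. The sketch below is only an illustration under assumptions of our own: it uses the six permutations of three objects, composes them as "apply the right-hand factor first", and lets the matrices act on basis vectors. This is not necessarily the N-convention used in the text, and the helper names (`compose`, `matrix`, `matmul`) are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'apply q first, then p' on indices 0, 1, 2."""
    return tuple(p[q[i]] for i in range(len(q)))

def matrix(p):
    """Matrix with a 1 in row p[i], column i (it sends basis vector i to p[i])."""
    n = len(p)
    return [[1 if p[col] == row else 0 for col in range(n)] for row in range(n)]

def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

group = list(permutations(range(3)))  # the six permutations of three objects

# Every product in the group is mirrored by the product of the matrices.
for p in group:
    for q in group:
        assert matmul(matrix(p), matrix(q)) == matrix(compose(p, q))
```

The traces of these matrices are 3, 1 and 0 for the identity, the transpositions and the 3-cycles, so this particular three-dimensional representation is reducible.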
A given $l$-fold degenerate state can generate a reducible or an irreducible $l$-dimensional representation of the symmetry group considered. If the representation is irreducible then the degeneracy is said to be necessary, i.e., imposed by the symmetry of the Hamiltonian. However, if the representation is reducible then the degeneracy between the different states is said to be accidental and it is not imposed by the symmetry of the Hamiltonian. The occurrence of accidental degeneracy can indicate that some other symmetry operation has been forgotten, or paradoxically it can indicate that too many symmetry operations (called unfeasible symmetry operations in section A1.4.4) have been introduced.

These considerations mean that, for example, using the symmetry group $S_3$ for the PH$_3$ molecule (see table A1...) the energy levels are determined to be of symmetry type $A_1$, $A_2$ or $E$. In molecular physics the labelling of molecular energy levels according to the irreducible representations of a symmetry group is mainly what we use symmetry for. Once we have labelled the energy levels of a molecule, we can use the labels to determine which of the levels can interact with each other as the result of adding a term $H'$ to the molecular Hamiltonian. This term could be the result of applying an external perturbation such as an electric or magnetic field, it could be the result of including a previously unconsidered term from the Hamiltonian, or this term could result from the effect of shining electromagnetic radiation through a sample of the molecules. In this latter case the symmetry labels enable us to determine the selection rules for allowed transitions in the spectrum of the molecule. All this becomes possible by making use of the vanishing integral rule.

A1.4.2.6 THE VANISHING INTEGRAL RULE 

To explain the vanishing integral rule we first have to explain how we determine the symmetry of a product. Given an $s$-fold degenerate state of energy $E_n$ and symmetry $\Gamma_n$, with eigenfunctions $\Phi_{n1}, \Phi_{n2}, \ldots, \Phi_{ns}$, and an $r$-fold degenerate state of energy $E_m$ and symmetry $\Gamma_m$, with eigenfunctions $\Phi_{m1}, \Phi_{m2}, \ldots, \Phi_{mr}$, we wish to determine the symmetry $\Gamma_{nm}$ of the set of functions $\Psi_{ij} = \Phi_{ni}\Phi_{mj}$, where $i = 1, 2, \ldots, s$ and $j = 1, 2, \ldots, r$. There will be $s \times r$ functions of the type $\Psi_{ij}$. The matrices $D^{\Gamma_n}$ and $D^{\Gamma_m}$ in the representations $\Gamma_n$ and $\Gamma_m$, respectively, are obtained from (see (equation A1.4.67))

$$ R\,\Phi_{ni} = \sum_{k=1}^{s} D^{\Gamma_n}[R]_{ik}\,\Phi_{nk} $$

and

$$ R\,\Phi_{mj} = \sum_{l=1}^{r} D^{\Gamma_m}[R]_{jl}\,\Phi_{ml} $$

where $R$ is an operation of the symmetry group. To obtain the matrices in the representation $\Gamma_{nm}$ we write

$$ R\,\Psi_{ij} = R[\Phi_{ni}\Phi_{mj}] = \sum_{k=1}^{s}\sum_{l=1}^{r} D^{\Gamma_n}[R]_{ik}\,D^{\Gamma_m}[R]_{jl}\,\Phi_{nk}\Phi_{ml} $$
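and we can write this as

$$ R\,\Psi_{ij} = \sum_{k=1}^{s}\sum_{l=1}^{r} D^{\Gamma_{nm}}[R]_{ij,kl}\,\Psi_{kl}. $$

From this we see that the $s \times r$ dimensional representation $\Gamma_{nm}$ generated by the $s \times r$ functions $\Psi_{ij}$ has matrices with elements given by

$$ D^{\Gamma_{nm}}[R]_{ij,kl} = D^{\Gamma_n}[R]_{ik}\,D^{\Gamma_m}[R]_{jl} $$

where each element of $D^{\Gamma_{nm}}$ is indexed by a row label $ij$ and a column label $kl$, each of which runs over $s \times r$ values. The $ij,ij$ diagonal element is given by

$$ D^{\Gamma_{nm}}[R]_{ij,ij} = D^{\Gamma_n}[R]_{ii}\,D^{\Gamma_m}[R]_{jj} $$

and the character of the matrix is given by

$$ \chi^{\Gamma_{nm}}[R] = \sum_{i=1}^{s}\sum_{j=1}^{r} D^{\Gamma_{nm}}[R]_{ij,ij} = \sum_{i=1}^{s}\sum_{j=1}^{r} D^{\Gamma_n}[R]_{ii}\,D^{\Gamma_m}[R]_{jj} = \chi^{\Gamma_n}[R]\,\chi^{\Gamma_m}[R]. \tag{A1.4.78} $$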

We can therefore calculate the character, under a symmetry operation $R$, in the representation generated by the product of two sets of functions, by multiplying together the characters under $R$ in the representations generated by each of the sets of functions. We write $\Gamma_{nm}$ symbolically as

$$ \Gamma_{nm} = \Gamma_n \otimes \Gamma_m $$

where the characters satisfy (equation A1.4.78) in which usual algebraic multiplication is used. Knowing the characters in $\Gamma_{nm}$ from (equation A1.4.78) we can then reduce the representation to its irreducible components using (equation A1.4.47). Suppose $\Gamma_{nm}$ can be reduced to irreducible representations $\Gamma_1$, $\Gamma_2$ and $\Gamma_3$ according to

$$ \Gamma_n \otimes \Gamma_m = 3\Gamma_1 \oplus \Gamma_2 \oplus 2\Gamma_3. $$

In this circumstance we say that $\Gamma_{nm}$ contains $\Gamma_1$, $\Gamma_2$ and $\Gamma_3$; since $\Gamma_n \otimes \Gamma_m$ contains $\Gamma_1$, for example, we write

$$ \Gamma_n \otimes \Gamma_m \supset \Gamma_1. $$
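To make the product-and-reduce procedure concrete, the sketch below multiplies characters class by class and then reduces the result. It assumes the standard character table of the permutation group of three objects (irreducible representations $A_1$, $A_2$ and $E$) and the usual reduction formula $a_i = (1/h)\sum_R \chi[R]\,\chi^{\Gamma_i}[R]^*$, which we take to be the content of (equation A1.4.47); that equation is not reproduced in this section, so this is an assumption. The function names are ours.

```python
from fractions import Fraction

# Classes of S3: identity, the two 3-cycles, the three transpositions.
class_sizes = [1, 2, 3]
irreps = {"A1": [1, 1, 1], "A2": [1, 1, -1], "E": [2, -1, 0]}

def product_characters(chi_a, chi_b):
    """Characters of a product representation: class-by-class products."""
    return [a * b for a, b in zip(chi_a, chi_b)]

def reduce_representation(chi):
    """Multiplicity of each irreducible representation in chi (real characters)."""
    h = sum(class_sizes)  # order of the group
    result = {}
    for name, chi_irrep in irreps.items():
        total = sum(n * c * ci for n, c, ci in zip(class_sizes, chi, chi_irrep))
        result[name] = Fraction(total, h)
    return result

chi_ExE = product_characters(irreps["E"], irreps["E"])
decomposition = reduce_representation(chi_ExE)
```

Here `chi_ExE` holds the characters of $E \otimes E$, and `decomposition` gives the number of times each irreducible representation occurs in it.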
Suppose that we can write the total Hamiltonian as $H = H^0 + H'$, where $H'$ is a perturbation. Let us further suppose that the Hamiltonian $H^0$ ($H'$ having been neglected) has normalized eigenfunctions $\Psi_m^0$ and $\Psi_n^0$, with eigenvalues $E_m^0$ and $E_n^0$, respectively, and that $H^0$ commutes with the group of symmetry operations $G = \{R_1, R_2, \ldots, R_h\}$. $H^0$ will transform as the totally symmetric representation $\Gamma^{(s)}$ of $G$, and we let $\Psi_m^0$, $\Psi_n^0$ and $H'$ generate the representations $\Gamma_m$, $\Gamma_n$ and $\Gamma'$ of $G$, respectively. The complete set of eigenfunctions of $H^0$ forms a basis set for determining the eigenfunctions and eigenvalues of the Hamiltonian $H = H^0 + H'$ and the Hamiltonian matrix $\mathbf{H}$ in this basis set is a matrix with elements $H_{mn}$ given by the integrals

$$ H_{mn} = \int \Psi_m^{0*}\,(H^0 + H')\,\Psi_n^0\,d\tau = \delta_{mn}E_n^0 + H'_{mn} \tag{A1.4.82} $$

where

$$ H'_{mn} = \int \Psi_m^{0*}\,H'\,\Psi_n^0\,d\tau. \tag{A1.4.83} $$


The eigenvalues $E$ of $H$ can be determined from the Hamiltonian matrix by solving the secular equation

$$ |H_{mn} - \delta_{mn}E| = 0. \tag{A1.4.84} $$

In solving the secular equation it is important to know which of the off-diagonal matrix elements $H'_{mn}$ vanish since this will enable us to simplify the equation.

We can use the symmetry labels $\Gamma_m$ and $\Gamma_n$ on the levels $E_m^0$ and $E_n^0$, together with the symmetry $\Gamma'$ of $H'$, to determine which $H'_{mn}$ elements must vanish. The function $\Psi_m^{0*}H'\Psi_n^0$ generates the product representation $\Gamma_m^* \otimes \Gamma' \otimes \Gamma_n$ ($\Psi_m^{0*}$ has symmetry $\Gamma_m^*$). We can now state the vanishing integral rule: the matrix element

$$ \int \Psi_m^{0*}\,H'\,\Psi_n^0\,d\tau = 0 \tag{A1.4.85} $$

if

$$ \Gamma_m^* \otimes \Gamma' \otimes \Gamma_n \not\supset \Gamma^{(s)} \tag{A1.4.86} $$
where $\Gamma^{(s)}$ is the totally symmetric representation. If $H'$ is totally symmetric in $G$ then $H'_{mn}$ will vanish if

$$ \Gamma_m^* \otimes \Gamma_n \not\supset \Gamma^{(s)} \tag{A1.4.87} $$

i.e., if

$$ \Gamma_m \neq \Gamma_n. \tag{A1.4.88} $$

It would be an accident if $H'_{mn}$ vanished even though $\Gamma_m^* \otimes \Gamma' \otimes \Gamma_n \supset \Gamma^{(s)}$, but if this were the case it might well indicate that there is some extra unconsidered symmetry present.
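The rule lends itself to a mechanical check. The sketch below applies it to the permutation group of three objects, assuming its standard character table (irreducible representations $A_1$, $A_2$, $E$, all with real characters so that $\Gamma_m^* = \Gamma_m$). The function names `contains_totally_symmetric` and `must_vanish` are hypothetical helpers, not notation from the text.

```python
# Characters of S3 on its classes: identity, 3-cycles, transpositions.
class_sizes = [1, 2, 3]
irreps = {"A1": [1, 1, 1], "A2": [1, 1, -1], "E": [2, -1, 0]}

def contains_totally_symmetric(*labels):
    """True if the product of the named representations contains A1."""
    chi = [1, 1, 1]
    for label in labels:
        chi = [c * x for c, x in zip(chi, irreps[label])]
    # Number of times A1 occurs: (1/h) * sum over classes of size * character.
    count = sum(n * c for n, c in zip(class_sizes, chi)) // sum(class_sizes)
    return count > 0

def must_vanish(gamma_m, gamma_prime, gamma_n):
    """Vanishing integral rule for the matrix element of H' between m and n."""
    return not contains_totally_symmetric(gamma_m, gamma_prime, gamma_n)
```

For a totally symmetric perturbation, `must_vanish("A1", "A1", "A2")` is true while `must_vanish("E", "A1", "E")` is false, in line with (equation A1.4.88).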

The value of the vanishing integral rule is that it allows the matrix $\mathbf{H}$ to be block diagonalized. This occurs if we order the eigenfunctions $\Psi^0$ according to their symmetry when we set up $\mathbf{H}$. Let us initially consider the case when $\Gamma' = \Gamma^{(s)}$. In this case all off-diagonal matrix elements between $\Psi^0$ basis functions of different symmetry will vanish, and the Hamiltonian matrix will block diagonalize with there being one block for each symmetry type of $\Psi^0$ function. Each eigenfunction of $H$ will only be a linear combination of $\Psi^0$ functions having the same symmetry in $G$ ($G$ being the symmetry group of $H^0$). Thus the symmetry of each eigenfunction $\Psi_j$ of $H$ in the group $G$ will be the same as the symmetry of the $\Psi^0$ basis functions that make it up ($G$ is a symmetry group of $H$ when $\Gamma' = \Gamma^{(s)}$) and each block of a block diagonal matrix can be diagonalized separately, which is a great simplification. The symmetry of the $\Psi_j$ functions can be obtained from the symmetry of the $\Psi^0$ functions without worrying about the details of $H'$ and this is frequently very useful.

When $\Gamma' \neq \Gamma^{(s)}$ all off-diagonal matrix elements between $\Psi^0$ functions of symmetry $\Gamma_m$ and $\Gamma_n$ will vanish if (equation A1.4.87) is satisfied, and there will also be a block diagonalization of $\mathbf{H}$ (it will be necessary to rearrange the rows or columns of $\mathbf{H}$, i.e., to rearrange the order of the $\Psi^0$ functions, to obtain $\mathbf{H}$ in block diagonal form). However, now nonvanishing matrix elements occur in $\mathbf{H}$ that connect $\Psi^0$ functions of different symmetry in $G$ and as a result the eigenfunctions of $H$ may not contain only functions of one symmetry type of $G$; when $\Gamma' \neq \Gamma^{(s)}$ the group $G$ is not a symmetry group of $H$ and its eigenfunctions $\Psi_j$ cannot be classified in $G$. However, the classification of the basis functions $\Psi^0$ in $G$ will still allow a simplification of the Hamiltonian matrix.

The vanishing integral rule is not only useful in determining the nonvanishing elements of the Hamiltonian matrix $\mathbf{H}$. Another important application is the derivation of selection rules for transitions between molecular states. For example, the intensity of an electric dipole transition from a state with wavefunction $\Psi''$ to a state with wavefunction $\Psi'$ (see (equation A1.4.12)) is proportional to the quantity

$$ |T|^2 = \left| \int \Psi'^{*}\,\mu_A\,\Psi''\,d\tau \right|^2 \tag{A1.4.89} $$

where $\mu_A$, $A = X, Y, Z$, is the component of the molecular dipole moment operator along the $A$ axis. If $\Psi'$ and $\Psi''$ belong to the irreducible representations $\Gamma(\Psi')$ and $\Gamma(\Psi'')$, respectively, and $\mu_A$ has the symmetry $\Gamma(\mu_A)$, then $|T|^2$, and thus the intensity of the transition, vanishes unless

$$ \Gamma(\Psi')^* \otimes \Gamma(\mu_A) \otimes \Gamma(\Psi'') \supset \Gamma^{(s)}. \tag{A1.4.90} $$

In the rotational symmetry group K (spatial) discussed in section A1.4.1.1, we have $\Gamma(\Psi') = D^{(F')}$, $\Gamma(\Psi'') = D^{(F'')}$ and $\Gamma(\mu_A) = D^{(1)}$. In this case the application of the vanishing integral rule leads to the selection rule given in (equation A1.4.13) (see section 7.3.2, in particular equation (7-47), of [1]).

A1.4.3 SYMMETRY OPERATIONS AND SYMMETRY GROUPS 

The various types of symmetry enumerated in section A1.4.1.2 are discussed in detail here and the symmetry groups containing such symmetry operations are presented.


A1.4.3.1 TRANSLATIONAL SYMMETRY 

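In the active picture adopted here the $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ axis system remains fixed in space and a translational symmetry operation changes the $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ coordinates of all nuclei and electrons in the molecule by constant amounts, $(\Delta X, \Delta Y, \Delta Z)$ say,

$$ (\mathbf{X}_i, \mathbf{Y}_i, \mathbf{Z}_i) \rightarrow (\mathbf{X}_i + \Delta X, \mathbf{Y}_i + \Delta Y, \mathbf{Z}_i + \Delta Z). \tag{A1.4.91} $$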

We obtain a coordinate set more suitable for describing translational symmetry by introducing the centre of mass coordinates

$$ X_0 = \frac{1}{M}\sum_{i=1}^{l} m_i\mathbf{X}_i, \qquad Y_0 = \frac{1}{M}\sum_{i=1}^{l} m_i\mathbf{Y}_i, \qquad Z_0 = \frac{1}{M}\sum_{i=1}^{l} m_i\mathbf{Z}_i \tag{A1.4.92} $$

together with

$$ (X_i, Y_i, Z_i) = (\mathbf{X}_i - X_0, \mathbf{Y}_i - Y_0, \mathbf{Z}_i - Z_0) \tag{A1.4.93} $$

for each particle $i$, where there are $l$ particles in the molecule ($N$ nuclei and $l - N$ electrons), $m_i$ is the mass of particle $i$ and $M = \sum_{i=1}^{l} m_i$ is the total mass of the molecule. In this manner we have introduced a new axis system $(X, Y, Z)$ with axes parallel to the $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ axes but with origin at the molecular centre of mass. The molecule is described by the $3l$ coordinates

$$ X_0, Y_0, Z_0, X_2, Y_2, Z_2, X_3, Y_3, Z_3, \ldots, X_l, Y_l, Z_l. $$

The coordinates $(X_1, Y_1, Z_1)$ are redundant since they can be determined from the condition that the $(X, Y, Z)$ axis system has origin at the molecular centre of mass. Obviously, the translational symmetry operation discussed above has the effect of changing the centre of mass coordinates

$$ (X_0, Y_0, Z_0) \rightarrow (X_0 + \Delta X, Y_0 + \Delta Y, Z_0 + \Delta Z) \tag{A1.4.94} $$

whereas the coordinates $X_2, Y_2, Z_2, X_3, Y_3, Z_3, \ldots, X_l, Y_l, Z_l$ are unchanged by this operation.

We now define the effect of a translational symmetry operation on a function. Figure A1.4.3 shows how a PH$_3$ molecule is displaced a distance $\Delta X$ along the $X$ axis by the translational symmetry operation that changes $X_0$ to $X_0' = X_0 + \Delta X$. Together with the molecule, we have drawn a sine wave symbolizing the molecular wavefunction, $\Psi_j$ say. We have marked one wavecrest to better keep track of the way the function is displaced by the symmetry operation. For the physical situation to be unchanged by the symmetry operation, the marked wavecrest, and thus the entire wavefunction, is displaced by $\Delta X$ along the $X$ axis as shown in figure A1.4.3. Thus, an operator $R_T^{(\Delta X,\Delta Y,\Delta Z)}$, which describes the effect of the translational symmetry operation on a wavefunction, is defined according to the S-convention (see (equation A1.4.57))

$$ R_T^{(\Delta X,\Delta Y,\Delta Z)}\,\Psi_j(X_0, Y_0, Z_0, X_2, Y_2, Z_2, \ldots, X_l, Y_l, Z_l) = \Psi_j(X_0 - \Delta X, Y_0 - \Delta Y, Z_0 - \Delta Z, X_2, Y_2, Z_2, \ldots, X_l, Y_l, Z_l). \tag{A1.4.95} $$

This definition causes the wavefunction to 'move with the molecule' as shown for the $X$ direction in figure A1.4.3. The set of all translation symmetry operations $R_T^{(\Delta X,\Delta Y,\Delta Z)}$ constitutes a group which we call the translational group $G_T$. Because of the uniformity of space, $G_T$ is a symmetry group of the molecular Hamiltonian $H$ in that all its elements commute with $H$:

$$ [R_T^{(\Delta X,\Delta Y,\Delta Z)}, H] = 0. \tag{A1.4.96} $$

We could stop here in the discussion of the translational group. However, for the purpose of understanding the relation between translational symmetry and the conservation of linear momentum, we now show how the operator $R_T^{(\Delta X,\Delta Y,\Delta Z)}$ can be expressed in terms of the quantum mechanical operators representing the translational linear momentum of the molecule; these operators are defined as

$$ P_X = -i\hbar\frac{\partial}{\partial X_0}, \qquad P_Y = -i\hbar\frac{\partial}{\partial Y_0}, \qquad P_Z = -i\hbar\frac{\partial}{\partial Z_0}. \tag{A1.4.97} $$

The translational linear momentum is conserved for an isolated molecule in field-free space and, as we see below, this is closely related to the fact that the molecular Hamiltonian commutes with $R_T^{(\Delta X,\Delta Y,\Delta Z)}$ for all values of $(\Delta X, \Delta Y, \Delta Z)$. The conservation of linear momentum and translational symmetry are directly related.
Figure A1.4.3. A PH$_3$ molecule and its wavefunction, symbolized by a sine wave, before (top) and after (bottom) a translational symmetry operation.


In order to determine the relationship between $R_T^{(\Delta X,\Delta Y,\Delta Z)}$ and the $(P_X, P_Y, P_Z)$ operators, we consider a translation $R_T^{(\delta X,0,0)}$ where $\delta X$ is infinitesimally small. In this case we can approximate the right hand side of (equation A1.4.95) by a first-order Taylor expansion:

$$ R_T^{(\delta X,0,0)}\,\Psi_j(X_0, Y_0, Z_0, X_2, Y_2, Z_2, \ldots, X_l, Y_l, Z_l) = \Psi_j(X_0 - \delta X, Y_0, Z_0, X_2, Y_2, Z_2, \ldots, X_l, Y_l, Z_l) \approx \Psi_j(X_0, Y_0, Z_0, X_2, Y_2, Z_2, \ldots, X_l, Y_l, Z_l) - \frac{\partial \Psi_j}{\partial X_0}\,\delta X. \tag{A1.4.98} $$

From the definition of the translational linear momentum operator $P_X$ (in (equation A1.4.97)) we see that

$$ \frac{\partial \Psi_j}{\partial X_0} = \frac{i}{\hbar}P_X\Psi_j \tag{A1.4.99} $$

and by introducing this identity in (equation A1.4.98) we obtain

$$ R_T^{(\delta X,0,0)}\,\Psi_j = \left(1 - \frac{i}{\hbar}\,\delta X\,P_X\right)\Psi_j \tag{A1.4.100} $$
where we have omitted the coordinate arguments for brevity. Since the function $\Psi_j$ in (equation A1.4.100) is arbitrary, it follows that we can write the symmetry operation as

$$ R_T^{(\delta X,0,0)} = 1 - \frac{i}{\hbar}\,\delta X\,P_X. \tag{A1.4.101} $$

The operation $R_T^{(\Delta X,0,0)}$, for which $\Delta X$ is an arbitrary finite length, obviously has the same effect on a wavefunction as the operation $R_T^{(\delta X,0,0)}$ applied to the wavefunction $\Delta X/\delta X$ times. We simply divide the translation by $\Delta X$ into $\Delta X/\delta X$ steps, each step of length $\delta X$. This remains true in the limit of $\delta X \rightarrow 0$. Thus

$$ R_T^{(\Delta X,0,0)} = \lim_{\delta X \rightarrow 0}\left(1 - \frac{i}{\hbar}\,\delta X\,P_X\right)^{\Delta X/\delta X} = \exp\left(-\frac{i}{\hbar}\,\Delta X\,P_X\right) \tag{A1.4.102} $$

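where we have used the general identity

$$ \lim_{x \rightarrow 0}(1 + ax)^{y/x} = \exp(ay). \tag{A1.4.103} $$

We can derive expressions analogous to (equation A1.4.102) for $R_T^{(0,\Delta Y,0)}$ and $R_T^{(0,0,\Delta Z)}$ and we can resolve a general translation $R_T^{(\Delta X,\Delta Y,\Delta Z)}$ as

$$ R_T^{(\Delta X,\Delta Y,\Delta Z)} = R_T^{(\Delta X,0,0)}\,R_T^{(0,\Delta Y,0)}\,R_T^{(0,0,\Delta Z)}. \tag{A1.4.104} $$

Consequently,

$$ R_T^{(\Delta X,\Delta Y,\Delta Z)} = \exp\left[-\frac{i}{\hbar}\left(\Delta X\,P_X + \Delta Y\,P_Y + \Delta Z\,P_Z\right)\right]. \tag{A1.4.105} $$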

We deal with the exponentials in (equation A1.4.102) and (equation A1.4.105) whose arguments are operators by using their Taylor expansion

$$ \exp(iO) = 1 + iO + \frac{1}{2!}(iO)^2 + \cdots \tag{A1.4.106} $$

where $O$ is a Hermitian operator.
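The Taylor series in (equation A1.4.106) can be seen at work in one dimension. With $P_X = -i\hbar\,\partial/\partial X_0$, the operator in (equation A1.4.102) becomes $\exp(-\Delta X\,\partial/\partial X_0)$. The sketch below applies this series to a polynomial, for which it terminates, and the result is the polynomial evaluated at $X_0 - \Delta X$, as (equation A1.4.95) requires. Polynomials are stored as coefficient lists, and all function names are illustrative choices of ours.

```python
from math import factorial

def derivative(coeffs):
    """Differentiate a polynomial stored as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))

def translate(coeffs, dx):
    """Apply exp(-dx d/dx) = sum_k (-dx)^k / k! d^k/dx^k to a polynomial."""
    result = [0.0] * len(coeffs)
    term = list(coeffs)   # k-th derivative of the polynomial, starting at k = 0
    k = 0
    while term:           # the series stops once the derivative is empty
        for i, c in enumerate(term):
            result[i] += (-dx) ** k / factorial(k) * c
        term = derivative(term)
        k += 1
    return result

f = [1.0, 2.0, 3.0]               # f(x) = 1 + 2x + 3x^2
shifted = translate(f, 0.5)       # coefficients of f(x - 0.5)
```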
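It follows from (equation A1.4.105) that (equation A1.4.96) is satisfied for arbitrary $\Delta X$, $\Delta Y$, $\Delta Z$ if and only if

$$ [H, P_X] = [H, P_Y] = [H, P_Z] = 0. \tag{A1.4.107} $$

From the fact that $H$ commutes with the operators $(P_X, P_Y, P_Z)$ it is possible to show that the linear momentum of a molecule in free space must be conserved. First we note that the time-dependent wavefunction $\Psi(t)$ of a molecule fulfills the time-dependent Schrödinger equation

$$ i\hbar\frac{\partial \Psi(t)}{\partial t} = H\Psi(t). \tag{A1.4.108} $$

For $A = X$, $Y$, or $Z$, we use this identity to derive an expression for

$$ \frac{\partial}{\partial t}\langle\Psi(t)|P_A|\Psi(t)\rangle = \left\langle\frac{\partial\Psi(t)}{\partial t}\middle|P_A\middle|\Psi(t)\right\rangle + \left\langle\Psi(t)\middle|\frac{\partial P_A}{\partial t}\middle|\Psi(t)\right\rangle + \left\langle\Psi(t)\middle|P_A\middle|\frac{\partial\Psi(t)}{\partial t}\right\rangle = \left\langle\frac{\partial\Psi(t)}{\partial t}\middle|P_A\middle|\Psi(t)\right\rangle + \left\langle\Psi(t)\middle|P_A\middle|\frac{\partial\Psi(t)}{\partial t}\right\rangle \tag{A1.4.109} $$

where, in the last equality, we have used the fact that $P_A$ does not depend explicitly on $t$. We obtain $\partial\Psi(t)/\partial t$ from (equation A1.4.108) and insert the resulting expression in (equation A1.4.109); this yields

$$ \frac{\partial}{\partial t}\langle\Psi(t)|P_A|\Psi(t)\rangle = \frac{i}{\hbar}\Bigl(\langle H\Psi(t)|P_A|\Psi(t)\rangle - \langle\Psi(t)|P_A|H\Psi(t)\rangle\Bigr) = \frac{i}{\hbar}\langle\Psi(t)|[H, P_A]|\Psi(t)\rangle = 0 \tag{A1.4.110} $$

where we have used (equation A1.4.107) in conjunction with the fact that $H$ is Hermitian. (Equation A1.4.110) shows that the expectation value of each linear momentum operator is conserved in time and thus the conservation of linear momentum directly follows from the translational invariance of the molecular Hamiltonian ((equation A1.4.96)).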


Because of (equation A1.4.107) and because of the fact that $P_X$, $P_Y$ and $P_Z$ commute with each other, we know that there exists a complete set of simultaneous eigenfunctions of $P_X$, $P_Y$, $P_Z$ and $H$. An eigenfunction of $P_X$, $P_Y$ and $P_Z$ has the form

$$ \Psi_T(X_0, Y_0, Z_0) = \exp[i(k_X X_0 + k_Y Y_0 + k_Z Z_0)] \tag{A1.4.111} $$
where

$$ P_A\,\Psi_T(X_0, Y_0, Z_0) = \hbar k_A\,\Psi_T(X_0, Y_0, Z_0) \tag{A1.4.112} $$

with $A = X$, $Y$ or $Z$, so that ((equation A1.4.105))

$$ R_T^{(\Delta X,\Delta Y,\Delta Z)}\,\Psi_T(X_0, Y_0, Z_0) = \exp[-i(\Delta X\,k_X + \Delta Y\,k_Y + \Delta Z\,k_Z)]\,\Psi_T(X_0, Y_0, Z_0). \tag{A1.4.113} $$

That is, the effect of a translational operation is determined solely by the vector with components $(k_X, k_Y, k_Z)$ which defines the linear momentum.
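A one-dimensional check of (equation A1.4.113) is straightforward: evaluating the plane wave at $X_0 - \Delta X$ gives the same result as multiplying it by the phase factor $\exp(-i\,\Delta X\,k_X)$. The numerical values below are arbitrary and the function name `psi_T` is ours.

```python
import cmath

def psi_T(x0, k):
    """One-dimensional analogue of the translational eigenfunction exp(i k X0)."""
    return cmath.exp(1j * k * x0)

k, x0, dx = 1.7, 0.4, 0.25        # arbitrary illustrative values

shifted = psi_T(x0 - dx, k)       # S-convention: evaluate at X0 - dX
phase = cmath.exp(-1j * dx * k)   # factor predicted by (A1.4.113)

assert abs(shifted - phase * psi_T(x0, k)) < 1e-12
```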

For a molecular wavefunction $\Psi_j(X_0, Y_0, Z_0, X_2, Y_2, Z_2, \ldots, X_l, Y_l, Z_l)$ to be a simultaneous eigenfunction of $P_X$, $P_Y$, $P_Z$ and $H$ it must have the form

$$ \Psi_j(X_0, Y_0, Z_0, X_2, Y_2, Z_2, \ldots, X_l, Y_l, Z_l) = \Psi_T(X_0, Y_0, Z_0)\,\Psi_{\rm int}(X_2, Y_2, Z_2, \ldots, X_l, Y_l, Z_l) \tag{A1.4.114} $$

where $\Psi_{\rm int}$ describes the internal motion of the molecule (see also section 7.3.1 of [1]).

We can describe the conservation of linear momentum by noting the analogy between the time-dependent Schrödinger equation, (equation A1.4.108), and (equation A1.4.99). For an isolated molecule, $H$ does not depend explicitly on $t$ and we can repeat the arguments expressed in (equation A1.4.98), (equation A1.4.99), (equation A1.4.100), (equation A1.4.101) and (equation A1.4.102) with $X$ replaced by $t$ and $P_X$ replaced by $-H$ to show that

$$ \Psi(t) = \exp\left(-\frac{i}{\hbar}\,t\,H\right)\Psi(t = 0). \tag{A1.4.115} $$

If the wavefunction at $t = 0$, $\Psi(t = 0)$, is an eigenfunction of $P_X$, $P_Y$, $P_Z$ and $H$ so that it can be expressed as given in (equation A1.4.114), it follows from (equation A1.4.115) that at any other time $t$,

$$ \Psi(t) = \exp\left(-\frac{i}{\hbar}\,E\,t\right)\Psi(t = 0) \tag{A1.4.116} $$

where $E$ is the energy (i.e., the eigenvalue of $H$ associated with the eigenfunction $\Psi(t = 0)$). It is straightforward to show that this function is an eigenfunction of $P_X$, $P_Y$, $P_Z$ and $H$ with the same eigenvalues as $\Psi(t = 0)$. This is another way of proving that linear momentum and energy are conserved in time.


A1.4.3.2 ROTATIONAL SYMMETRY


In order to discuss rotational symmetry, we must first introduce the rotational and vibrational coordinates customarily used in molecular theory. We define a set of $(x, y, z)$ axes with an orientation relative to the $(X, Y, Z)$ axes discussed above that is defined by the positions of the nuclei. These axes are called 'molecule fixed' axes; their orientation is determined by the coordinates of the nuclei only and the coordinates of the electrons are not involved. The $(x, y, z)$ and $(X, Y, Z)$ axis systems are always chosen to be right handed. For any placement of the $N$ nuclei in space (i.e., any set of values for the $3N - 3$ independent coordinates $X_i$, $Y_i$ and $Z_i$ of the nuclei) there is an unambiguous way of specifying the orientation of the $(x, y, z)$ axes with respect to the $(X, Y, Z)$ axes. Three equations are required to define the three Euler angles $(\theta, \phi, \chi)$ (see figure A1.4.4) that specify this orientation and the equations used are the Eckart equations (equation A1.4.5). The Eckart equations minimize the angular momentum in the $(x, y, z)$ axis system and so they optimize the separation of the rotational and vibrational degrees of freedom in the rotation-vibration Schrödinger equation. It is described in detail in chapter 10 of [1] how, by introducing the Eckart equations, we can define the $(x, y, z)$ axis system and thus the Euler angles $(\theta, \phi, \chi)$. Suffice it to say that we describe the internal motion of a nonlinear molecule by $3l - 3$ coordinates, where the first three are the Euler angles $(\theta, \phi, \chi)$ describing rotation, the next $3N - 6$ are normal coordinates $Q_1, Q_2, Q_3, \ldots, Q_{3N-6}$ describing the vibration of the nuclei and the remaining $3(l - N)$ are electronic coordinates $x_{N+1}, y_{N+1}, z_{N+1}, x_{N+2}, y_{N+2}, z_{N+2}, \ldots, x_l, y_l, z_l$, simply chosen as the Cartesian coordinates of the electrons in the $(x, y, z)$ axis system.



Figure A1.4.4. The definition of the Euler angles $(\theta, \phi, \chi)$ that relate the orientation of the molecule fixed $(x, y, z)$ axes to the $(X, Y, Z)$ axes. The origin of both axis systems is at the nuclear centre of mass O, and the node line ON is directed so that a right handed screw is driven along ON in its positive direction by twisting it from $Z$ to $z$ through $\theta$ where $0 \le \theta \le \pi$. $\phi$ and $\chi$ have the ranges 0 to $2\pi$. $\chi$ is measured from the node line.

We consider rotations of the molecule about space-fixed axes in the active picture. Such a rotation causes the $(x, y, z)$ axis system to rotate so that the Euler angles change

$$ (\theta, \phi, \chi) \rightarrow (\theta + \Delta\theta, \phi + \Delta\phi, \chi + \Delta\chi). \tag{A1.4.117} $$

The normal coordinates $Q_r$, $r = 1, 2, \ldots, 3N - 6$, and the electronic coordinates $x_i, y_i, z_i$, $i = N + 1, N + 2, \ldots, l$, all describe motion relative to the $(x, y, z)$ axis system and are invariant to rotations.
Initially, we neglect terms depending on the electron spin $\mathbf{S}$ and the nuclear spin $\mathbf{I}$ in the molecular Hamiltonian $H$. In this approximation, we can take the total angular momentum to be $\mathbf{N}$ (see (equation A1.4.1)) which results from the rotational motion of the nuclei and the orbital motion of the electrons. The components of $\mathbf{N}$ in the $(X, Y, Z)$ axis system are given by:

$$ N_X = -i\hbar\left(-\sin\phi\,\frac{\partial}{\partial\theta} + \operatorname{cosec}\theta\,\cos\phi\,\frac{\partial}{\partial\chi} - \cot\theta\,\cos\phi\,\frac{\partial}{\partial\phi}\right) \tag{A1.4.118} $$

$$ N_Y = -i\hbar\left(\cos\phi\,\frac{\partial}{\partial\theta} + \operatorname{cosec}\theta\,\sin\phi\,\frac{\partial}{\partial\chi} - \cot\theta\,\sin\phi\,\frac{\partial}{\partial\phi}\right) \tag{A1.4.119} $$

and

$$ N_Z = -i\hbar\,\frac{\partial}{\partial\phi}. \tag{A1.4.120} $$

By analogy with our treatment of translation symmetry, we aim to derive an operator $R_K^{(\Delta\theta,\Delta\phi,\Delta\chi)}$ which, when applied to a wavefunction, describes the effect of a general symmetry operation that causes the change in the Euler angles given in (equation A1.4.117). Because of the analogy between (equation A1.4.120) and the definition of $P_X$ in (equation A1.4.97), we can repeat the arguments expressed in (equation A1.4.98), (equation A1.4.99), (equation A1.4.100), (equation A1.4.101) and (equation A1.4.102) with $X$ replaced by $\phi$ to show that

$$ R_K^{(0,\Delta\phi,0)} = \exp\left(-\frac{i}{\hbar}\,\Delta\phi\,N_Z\right). \tag{A1.4.121} $$

A more involved derivation (see, for example, section 3.2 of Zare [6]) shows that for a general rotation

$$ R_K^{(\Delta\theta,\Delta\phi,\Delta\chi)} = \exp\left(-\frac{i}{\hbar}\,\Delta\phi\,N_Z\right)\exp\left(-\frac{i}{\hbar}\,\Delta\theta\,N_Y\right)\exp\left(-\frac{i}{\hbar}\,\Delta\chi\,N_Z\right). \tag{A1.4.122} $$

The operators $N_Y$ and $N_Z$ in (equation A1.4.122) do not commute and we have (see equation (10-90) of [1])

$$ [N_Y, N_Z] = i\hbar N_X. \tag{A1.4.123} $$

The commutators $[N_X, N_Y]$ and $[N_Z, N_X]$ are obtained by replacing $XYZ$ by $ZXY$ and $YZX$, respectively, in (equation A1.4.123). It is, therefore, important in using (equation A1.4.122) that the exponential factors be applied in the correct order.
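The importance of the ordering can be illustrated with ordinary $3 \times 3$ rotation matrices acting on vectors. These are used here only as a stand-in for the operators in (equation A1.4.122), which act on wavefunctions. In the sketch below, rotations about the $Y$ and $Z$ axes by a quarter turn give different results in the two orders, while two rotations about the same axis simply add. The helper names are ours.

```python
import math

def rot_y(angle):
    """Rotation matrix for a rotation by 'angle' about the Y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(angle):
    """Rotation matrix for a rotation by 'angle' about the Z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def max_difference(a, b):
    """Largest absolute difference between corresponding matrix elements."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

quarter = math.pi / 2
zy = matmul(rot_z(quarter), rot_y(quarter))   # Y rotation first, then Z
yz = matmul(rot_y(quarter), rot_z(quarter))   # Z rotation first, then Y
order_matters = max_difference(zy, yz) > 0.5
```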
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The set of all rotation operations /^"-^^forms a group which we call the rotational group K (spatial). 

Since space is isotropic, K (spatial) is a symmetry group of the molecular Hamiltonian //in that all its 
elements commute with //: 


[R t R ^ A0 ^ t H]=O. (A1.4.124) 

It follows from ( equation A 1.4. 122 ) that (equation Al.4.124) is satisfied for arbitrary (A0,A(|),A%) if and only 
if //commutes with /V^and N z . But then //also commutes with N x because of ( equation Al .4. 1 23 ). That is 


[H h N x \ = [H t N Y ] = [« t A r zl = U (A1.4.125) 

this equation is analogous to ( equation Al.4.107 ). We discussed above (in connection with ( equation 
Al.4.108 ), ( equation Al.4.109 ) and ( equation A 1.4.1 10 )) how the invariance of the molecular Hamiltonian to 
translation is related to the conservation of linear momentum. We now see that, in a completely analogous 
manner, the invariance of the molecular Hamiltonian to rotation is related to the conservation of angular 
momentum. 
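The commutation relation (A1.4.123) and the importance of the ordering of the exponential factors in (A1.4.122) can be checked numerically. The following is a minimal sketch, not taken from the source: it assumes units with $\hbar = 1$ and uses the 3 x 3 Cartesian (vector) matrices $(N_k)_{ij} = -i\hbar\,\epsilon_{kij}$ as one concrete matrix representation of the angular momentum components; the function names `generator` and `rotation` are hypothetical.

```python
import numpy as np

HBAR = 1.0  # assumption of this sketch: units in which hbar = 1

def generator(k):
    """Return the 3x3 matrix (N_k)_{ij} = -i*hbar*eps_{kij}, k = 0, 1, 2 for X, Y, Z."""
    eps = np.zeros((3, 3, 3))
    for i, j, l in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, l] = 1.0
        eps[i, l, j] = -1.0
    return -1j * HBAR * eps[k]

def rotation(angle, n_op):
    """exp(-i*angle*N/hbar) for a Hermitian matrix N, built from its eigendecomposition."""
    vals, vecs = np.linalg.eigh(n_op)
    return vecs @ np.diag(np.exp(-1j * angle * vals / HBAR)) @ vecs.conj().T

NX, NY, NZ = generator(0), generator(1), generator(2)

# Check [N_Y, N_Z] = i*hbar*N_X in this representation.
commutator = NY @ NZ - NZ @ NY
commutes_as_stated = bool(np.allclose(commutator, 1j * HBAR * NX))

# Two exponential factors applied in the two possible orders give different results.
a, b = 0.3, 0.7  # arbitrary illustrative angles in radians
order_matters = not np.allclose(rotation(a, NY) @ rotation(b, NZ),
                                rotation(b, NZ) @ rotation(a, NY))

print(commutes_as_stated, order_matters)
```

Both checks are expected to succeed: the commutator reproduces $i\hbar N_X$, and exchanging two of the exponential factors changes the resulting rotation.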

The (X, Y, Z) components of N do not commute and so we cannot find simultaneous eigenfunctions of all the four operators occurring in (equation A1.4.125). It is straightforwardly shown from the commutation relations in (equation A1.4.123) that the operator

$N^2 = N_X^2 + N_Y^2 + N_Z^2$ (A1.4.126)

commutes with $N_X$, $N_Y$ and $N_Z$. Because of (equation A1.4.125), this operator also commutes with $H$. As a consequence, we can find simultaneous eigenfunctions of $H$, $N^2$ and one component of N, customarily chosen as $N_Z$. We can use this result to simplify the diagonalization of the matrix representation of the molecular Hamiltonian. We choose the basis functions as $\Psi_{N,m}$. They are eigenfunctions of $N^2$ (with eigenvalues $N(N+1)\hbar^2$, N = 0, 1, 2, 3, 4, ...) and $N_Z$ (with eigenvalues $m\hbar$, m = -N, -N+1, ..., N-1, N). The functions $\Psi_{N,m}$, m = -N, -N+1, ..., N-1, N, transform according to the irreducible representation $D^{(N)}$ of K (spatial) (see section A1.4.1.1). With these basis functions, the matrix representation of the molecular Hamiltonian will be block diagonal in N and m in the manner described for the quantum numbers F and $m_F$ in section A1.4.1.1.

If we allow for the terms in the molecular Hamiltonian depending on the electron spin S (see chapter 7 of [1]), the resulting Hamiltonian no longer commutes with the components of N as given in (equation A1.4.125), but with the components of

$J = N + S.$ (A1.4.127)

In this case, we choose the basis functions $\Psi_{J,m_J}$, that is, the eigenfunctions of $J^2$ (with eigenvalues $J(J+1)\hbar^2$, J = |N - S|, |N - S| + 1, ..., N + S - 1, N + S) and $J_Z$ (with eigenvalues $m_J\hbar$, $m_J$ = -J, -J+1, ..., J-1, J). These functions are linear combinations of products $\Psi_{N,m}\Psi_{S,m_S}$, where the function $\Psi_{N,m}$ is an eigenfunction of $N^2$ and $N_Z$ as described above, and $\Psi_{S,m_S}$ is an eigenfunction of $S^2$ (with eigenvalues $S(S+1)\hbar^2$, S = 0, 1/2, 1, 3/2, 2, 5/2, 3, ...) and $S_Z$ (with eigenvalues $m_S\hbar$, $m_S$ = -S, -S+1, ..., S-1, S). In this basis, the matrix representation of the molecular Hamiltonian is block diagonal in J and $m_J$. The functions $\Psi_{S,m_S}$, $m_S$ = -S, -S+1, ..., S-1, S, transform according to the irreducible representation $D^{(S)}$ of K (spatial) and the functions $\Psi_{J,m_J}$, $m_J$ = -J, -J+1, ..., J-1, J, have $D^{(J)}$ symmetry in K (spatial). Singlet states have S = 0 and for them J = N and $m_J$ = m.
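The range of J values quoted above can be enumerated directly. The sketch below is an illustration only (the helper names `allowed_j` and `count_states` are hypothetical): it lists J = |N - S|, ..., N + S using exact fractions and checks that the coupled functions, with 2J + 1 values of $m_J$ each, are as numerous as the uncoupled products, of which there are (2N + 1)(2S + 1).

```python
from fractions import Fraction

def allowed_j(n, s):
    """Return J = |N - S|, |N - S| + 1, ..., N + S as exact fractions."""
    n, s = Fraction(n), Fraction(s)
    j = abs(n - s)
    values = []
    while j <= n + s:
        values.append(j)
        j += 1
    return values

def count_states(n, s):
    """Return the sizes of the coupled and uncoupled basis sets for given N and S."""
    coupled = sum(2 * j + 1 for j in allowed_j(n, s))
    uncoupled = (2 * Fraction(n) + 1) * (2 * Fraction(s) + 1)
    return coupled, uncoupled

print([str(j) for j in allowed_j(2, Fraction(1, 2))])  # a doublet state with N = 2
print(allowed_j(3, 0))                                  # a singlet state: J = N only
print(count_states(2, 1))                               # a triplet state with N = 2
```

For a singlet state the list contains the single value J = N, in agreement with the last sentence of the paragraph above.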

Finally, we consider the complete molecular Hamiltonian which contains not only terms depending on the electron spin, but also terms depending on the nuclear spin I (see chapter 7 of [1]). This Hamiltonian commutes with the components of F given in (equation A1.4.1). The diagonalization of the matrix representation of the complete molecular Hamiltonian proceeds as described in section A1.4.1.1. The theory of rotational symmetry is an extensive subject and we have only scratched the surface here. A relatively new book, which is concerned with molecules, is by Zare [6] (see [7] for the solutions to all the problems in [6] and a list of the errors). This book describes, for example, the method for obtaining the functions $\Psi_{J,m_J}$ from $\Psi_{N,m}$ and $\Psi_{S,m_S}$, and for obtaining the functions $\Psi_{F,m_F}$ (section A1.4.1.1) from the $\Psi_{J,m_J}$ combined with eigenfunctions of $I^2$ and $I_Z$.

A1.4.3.3 INVERSION SYMMETRY

We have already discussed inversion symmetry and how it leads to the parity label in section A1.4.1.2 and section A1.4.2.5. For any molecule in field-free space, if we neglect terms arising from the weak interaction force (see the next paragraph), the molecular Hamiltonian commutes with the inversion operation E* and thus for such a Hamiltonian the inversion group $\mathcal{E}$ = {E, E*} is a symmetry group. The character table of the inversion group is given in table A1.4.5 and the irreducible representations are labelled + and - to give the parity.

Table A1.4.5 The character table of the inversion group $\mathcal{E}$.

        E     E*
+       1     1
-       1    -1


Often molecular energy levels occur in closely spaced doublets having opposite parity. This is of particular 
interest when there are symmetrically equivalent minima, separated by a barrier, in the potential energy 
function of the electronic state under investigation. This happens in the PH$_3$ molecule and such pairs of levels are called 'inversion doublets'; the splitting between such parity doublet levels depends on the extent of the quantum mechanical tunnelling through the barrier that separates the two minima. This is discussed further in section A1.4.4.
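How tunnelling produces a doublet of opposite parity can be illustrated with a two-state model. This model is not part of the text above: it assumes one localized state in each of the two equivalent minima, both with energy `e0`, connected by a hypothetical tunnelling matrix element `coupling`; the numerical values are arbitrary. In this basis the inversion operation E* simply exchanges the two localized states.

```python
import numpy as np

def inversion_doublet(e0, coupling):
    """Diagonalise the 2x2 model Hamiltonian in the basis (left well, right well)."""
    h = np.array([[e0, -coupling], [-coupling, e0]], dtype=float)
    energies, vectors = np.linalg.eigh(h)  # eigenvalues in ascending order
    e_star = np.array([[0.0, 1.0], [1.0, 0.0]])  # E* swaps the two wells
    # Expectation value of E* in each eigenvector gives its parity (+1 or -1).
    parities = [int(round(float(v @ e_star @ v))) for v in vectors.T]
    return energies, parities

energies, parities = inversion_doublet(e0=100.0, coupling=0.4)
splitting = energies[1] - energies[0]
print(parities, splitting)
```

In this model the two eigenstates are the symmetric and antisymmetric combinations of the localized states, they have parity + and - respectively, and their separation is twice the tunnelling matrix element, so that the doublet collapses as the tunnelling becomes negligible.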
The Hamiltonian considered above, which commutes with E*, involves the electromagnetic forces between the nuclei and electrons. However, there is another force between particles, the weak interaction force, that is not invariant to inversion. The weak charged current interaction force is responsible for the beta decay of nuclei, and the related weak neutral current interaction force has an effect in atomic and molecular systems. If we include this force between the nuclei and electrons in the molecular Hamiltonian (as we should because of electroweak unification) then the Hamiltonian will not commute with E*, and states of opposite parity will be mixed. However, the effect of the weak neutral current interaction force is extremely small (and it is a very short range force), although its effect has been detected in extremely precise experiments on atoms (see, for


example, Wood et al [8], who detect that a small part (-10 ) of a P state of caesium is mixed into an S state 
by this force). Its effect has not been detected in a molecule and, thus, for practical purposes we can neglect it 
and consider E* to be a symmetry operation. Note that inversion symmetry is not a universal symmetry like 
translational or rotational symmetry and it does not derive from a general property of space. In the theoretical 
physics community, when dealing with particle symmetry, the inversion operation is called the 'parity 
operator' P. 

An optically active molecule is a particular type of molecule in which there are two equivalent minima 
separated by an insuperable barrier in the potential energy surface and for which the molecular structures at 
these two minima are not identical (as they are in PH$_3$) but are mirror images of one another. The two forms
of the molecule are called the dextrorotatory (D) and laevorotatory (L) forms and they can be separated. The 
D and L wavefunctions are not eigenfunctions of E* and E* interconverts them. In the general case 
eigenstates of the Hamiltonian are eigenstates of E* and they have a definite parity. In the laboratory, when 
one makes an optically active molecule one obtains a racemic 50/50 mixture of the D and L forms, but in 
living organisms use is made of only one isomer; natural proteins, for example, are composed exclusively of 
L-amino acids, whereas nucleic acids contain only D-sugars. This fact is unexplained but it has been pointed 
out (see [9] and references therein) that in the molecular Hamiltonian the weak neutral current interaction 
term $H_{\mathrm{WI}}$ would give rise to a small energy difference between the energy levels of the D and L forms, and this small energy difference could have acted to select one isomer over the long time of prebiotic evolution.
The experimental determination of the energy difference between the D and L forms of any optically active 
molecule has yet to be achieved. However, see Daussy C, Marrel T, Amy-Klein A, Nguyen C T, Borde C J 
and Chardonnet C 1999 Phys. Rev. Lett. 83 1554 for a recent determination of an upper bound of 13 Hz on the 
energy difference between CHFClBr enantiomers. 

A very recent paper concerning the search for a parity-violating energy difference between enantiomers of a chiral molecule is by Lahamer A S, Mahurin S M, Compton R N, House D, Laerdahl J K, Lein M and Schwerdtfeger P 2000 Phys. Rev. Lett. 85 4470. The importance of the parity-violating energy difference in leading to prebiotic asymmetric synthesis is discussed in Frank P, Bonner W A and Zare R N 2000 On one hand but not the other: the challenge of the origin and survival of homochirality in prebiotic chemistry Chemistry for the 21st Century ed E Keinan and I Schechter (Weinheim: Wiley-VCH) pp 175-208.

A1.4.3.4 IDENTICAL PARTICLE PERMUTATION SYMMETRY

If there are n electrons in a molecule there are n! ways of permuting them and we can form the permutation group (or symmetric group) $S_n^{(e)}$ of degree n and order n! that contains all the electron permutations. The molecular Hamiltonian is invariant to the elements of this group. Similarly, there can be sets of identical nuclei in a molecule and the Hamiltonian is invariant to the relevant identical-nucleus permutation groups. For example, the ethanol molecule CH$_3$CH$_2$OH consists of 26 electrons, a set of six identical hydrogen nuclei, a set of two identical carbon nuclei and a lone oxygen nucleus. The molecular Hamiltonian of ethanol is therefore invariant to the 26! ($\approx 4 \times 10^{26}$) elements of the electron permutation group $S_{26}^{(e)}$, the 6! = 720 possible permutations of the hydrogen nuclei in the group $S_6^{(\mathrm{H})}$ and the two possible permutations of the C nuclei (E and their exchange) in the group $S_2^{(\mathrm{C})}$. The group of all possible permutations of identical nuclei in a molecule is called the complete nuclear permutation (CNP) group of the molecule, $G^{\mathrm{CNP}}$. For ethanol $G^{\mathrm{CNP}}$ consists of all 6! elements of $S_6^{(\mathrm{H})}$ and of all these elements taken in combination with the exchange of the two C nuclei; 2 x 6! elements in all. This CNP group is called the direct product of the groups $S_6^{(\mathrm{H})}$ and $S_2^{(\mathrm{C})}$ and is written

$G^{\mathrm{CNP}} = S_6^{(\mathrm{H})} \otimes S_2^{(\mathrm{C})}.$ (A1.4.128)

The CNP group of a molecule containing l identical nuclei of one type, m of another, n of another and so on is the direct product group

$G^{\mathrm{CNP}} = S_l \otimes S_m \otimes S_n \otimes \cdots$ (A1.4.129)

and the order of the group is l! x m! x n! .... It would seem that we have a very rich set of irreducible representation labels with which we can label the molecular energy levels of a molecule using the electron permutation group and the CNP group. But this is not the case for internal states described by $\Phi_{\mathrm{int}}$ (see (equation A1.4.114)) because there is fundamentally no observable difference between states that differ
merely in the permutation of identical particles. The environment of a molecule (e.g. an external electric or 
magnetic field, or the effect of a neighbouring molecule) affects whether the Hamiltonian of that molecule is 
invariant to a rotation operation or the inversion operation; states having different symmetry labels from the 
rotation or inversion groups can be mixed and transitions can occur between such differently labelled states. 
However, the Hamiltonian of a molecule regardless of the environment of the molecule is invariant to any 
identical particle permutation. Two $\Phi_{\mathrm{int}}$ states that differ only in the permutation of identical particles are
observationally indistinguishable and there is only one state. Since there is only one state it can only transform 
as one set of irreducible representations of the various identical particle permutation groups that apply for the 
particular molecule under investigation. It is an experimental fact that particles with half-integral spin (called fermions), such as electrons and protons, transform as that one-dimensional irreducible representation of their permutation group that has character +1 for all even permutations and character -1 for all odd permutations. Nuclei that have integral spin (called bosons), such as $^{12}$C nuclei and deuterons, transform as the totally symmetric representation of their permutation group (having character +1 for all permutations). Thus fermion
wavefunctions are changed in sign by an odd permutation but boson wavefunctions are invariant. This simple 
experimental observation has defied simple theoretical proof but there is a complicated proof [ 10 ] that we 
cannot recommend any reader of the present article to look at. 

The fact that allowed fermion states have to be antisymmetric, i.e., changed in sign by any odd permutation of the fermions, leads to an interesting result concerning the allowed states. Let us write a state wavefunction for a system of n noninteracting fermions as

$|X\rangle = |a_1\rangle |b_2\rangle |c_3\rangle \cdots |q_n\rangle$ (A1.4.130)

where this indicates that particle 1 is in state a, particle 2 in state b and so on. Clearly this does not correspond to an allowed (i.e., antisymmetric) state since making an odd permutation of the indices, such as (12), does not give -1 times $|X\rangle$. But we can get an antisymmetric function by making all permutations of the indices in $|X\rangle$ and adding the results with the coefficient -1 for those functions obtained by making an odd permutation, i.e.,

$|F\rangle = \sum_P \pm P\, |a_1\rangle |b_2\rangle |c_3\rangle \cdots |q_n\rangle$ (A1.4.131)

where the sum over all permutations involves a + or - sign as the permutation P is even or odd respectively. We can write (equation A1.4.131) as the determinant

$|F\rangle = \begin{vmatrix} |a_1\rangle & |b_1\rangle & |c_1\rangle & \cdots & |q_1\rangle \\ |a_2\rangle & |b_2\rangle & |c_2\rangle & \cdots & |q_2\rangle \\ \vdots & \vdots & \vdots & & \vdots \\ |a_n\rangle & |b_n\rangle & |c_n\rangle & \cdots & |q_n\rangle \end{vmatrix}$ (A1.4.132)

The state $|F\rangle$ is such that the particle states a, b, c, ..., q are occupied and each particle is equally likely to be in any one of the particle states. However, if two of the particle states a, b, c, ..., q are the same then $|F\rangle$ vanishes; it does not correspond to an allowed state of the assembly. This is a characteristic of antisymmetric states and it is called 'the Pauli exclusion principle': no two identical fermions can be in the same particle state. The general function for an assembly of bosons is

$|B\rangle = \sum_P P\, |a_1\rangle |b_2\rangle |c_3\rangle \cdots |q_n\rangle$ (A1.4.133)

where the sum over all permutations involves just '+' signs. In such a state it is possible for two or more of the particles to be in the same particle state.
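The sums over permutations in (A1.4.131) and (A1.4.133) can be carried out explicitly for a small example. The sketch below is illustrative only: it assumes that each single-particle state is represented by a real vector, that a product state is the outer product of these vectors (one array axis per particle), and the function names are hypothetical.

```python
import itertools
import numpy as np

def permutation_sign(perm):
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def symmetrised_state(states, fermions=True):
    """Sum of (+/-) P |a_1>|b_2>... over all permutations P of the particle labels."""
    n = len(states)
    total = 0
    for perm in itertools.permutations(range(n)):
        # Particle k (array axis k) is placed in the single-particle state perm[k].
        product = states[perm[0]]
        for k in range(1, n):
            product = np.multiply.outer(product, states[perm[k]])
        sign = permutation_sign(perm) if fermions else 1
        total = total + sign * product
    return total

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

fermi_ab = symmetrised_state([a, b])                 # two different particle states
fermi_aa = symmetrised_state([a, a])                 # the same particle state twice
bose_aa = symmetrised_state([a, a], fermions=False)  # bosons in the same state

print(np.allclose(fermi_ab, -fermi_ab.T),  # sign change under exchange of the particles
      np.allclose(fermi_aa, 0),            # the fermion function vanishes
      np.allclose(bose_aa, 0))             # the boson function does not vanish
```

For two particles, exchanging the particle labels corresponds to transposing the array, so the first check confirms antisymmetry; the second and third reproduce the statements made above about fermions and bosons occupying the same particle state.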

It would appear that identical particle permutation groups are not of help in providing distinguishing 
symmetry labels on molecular energy levels as are the other groups we have considered. However, they do 
provide very useful restrictions on the way we can build up the complete molecular wavefunction from basis 
functions. Molecular wavefunctions are usually built up from basis functions that are products of electronic 
and nuclear parts. Each of these parts is further built up from products of separate 'uncoupled' coordinate (or 
orbital) and spin basis functions. When we combine these separate functions, the final overall product states 
must conform to the permutation symmetry rules that we stated above. This leads to restrictions in the way 
that we can combine the uncoupled basis functions. 

We explain this by considering the H$_2$ molecule. For the H$_2$ molecule we label the electrons a and b, and the hydrogen nuclei 1 and 2. The electron permutation group is $S_2^{(e)}$ = {E, (ab)}, and the CNP group is $G^{\mathrm{CNP}}$ = {E, (12)}. The character tables of these groups are given in table A1.4.6 and table A1.4.7. If there were no restriction on permutation symmetry we might think that the energy levels of the H$_2$ molecule could be of any one of the following four symmetry types using these two groups: ($\Gamma_1^{(e)}$, $\Gamma_1^{(\mathrm{CNP})}$), ($\Gamma_1^{(e)}$, $\Gamma_2^{(\mathrm{CNP})}$), ($\Gamma_2^{(e)}$, $\Gamma_1^{(\mathrm{CNP})}$) and ($\Gamma_2^{(e)}$, $\Gamma_2^{(\mathrm{CNP})}$). However, both electrons and protons are fermions (having a spin of 1/2) and so, from the above rules, the wavefunctions of the H$_2$ molecule must be multiplied by -1 by both (ab) and (12). Thus the energy levels of the H$_2$ molecule can only be of symmetry ($\Gamma_2^{(e)}$, $\Gamma_2^{(\mathrm{CNP})}$).

Table A1.4.6 The character table of the group $S_2^{(e)}$.

                      E     (ab)
$\Gamma_1^{(e)}$      1      1
$\Gamma_2^{(e)}$      1     -1

These limitations lead to electron spin multiplicity restrictions and to differing nuclear spin statistical weights for the rotational levels. Writing the electronic wavefunction as the product of an orbital function $\Psi_{\mathrm{eo}}$ and a spin function $\Psi_{\mathrm{es}}$, there are restrictions on how these functions can be combined. The restrictions are imposed by the fact that the complete function $\Psi_{\mathrm{eo}}\Psi_{\mathrm{es}}$ has to be of symmetry $\Gamma_2^{(e)}$ in the group $S_2^{(e)}$. The orbital function $\Psi_{\mathrm{eo}}$ can be of symmetry $\Gamma_1^{(e)}$ or $\Gamma_2^{(e)}$ and, for example, $\Psi_{\mathrm{eo}}$ for the ground electronic state of H$_2$ has symmetry $\Gamma_1^{(e)}$. For a two electron system there are four possible electron spin functions: $\alpha\alpha$, $\alpha\beta$, $\beta\alpha$ and $\beta\beta$, where $\alpha$ is a 'spin-up' function having $m_s$ = +1/2 and $\beta$ is a 'spin-down' function having $m_s$ = -1/2. The functions $\Psi_{\mathrm{es}}^{(1)} = \alpha\alpha$ and $\Psi_{\mathrm{es}}^{(2)} = \beta\beta$ are invariant to the operation (ab) and therefore have symmetry $\Gamma_1^{(e)}$. The functions $\alpha\beta$ and $\beta\alpha$ are interchanged by (ab) and do not transform irreducibly, but it is easy to see that their sum and difference, $\Psi_{\mathrm{es}}^{(3)} = (\alpha\beta + \beta\alpha)/\sqrt{2}$ and $\Psi_{\mathrm{es}}^{(4)} = (\alpha\beta - \beta\alpha)/\sqrt{2}$, transform as $\Gamma_1^{(e)}$ and $\Gamma_2^{(e)}$ respectively. The three functions $\Psi_{\mathrm{es}}^{(1)}$, $\Psi_{\mathrm{es}}^{(2)}$ and $\Psi_{\mathrm{es}}^{(3)}$, each of symmetry $\Gamma_1^{(e)}$, form a triplet electron spin state (with $m_S$ = 1, -1 and 0, for S = 1) and the function $\Psi_{\mathrm{es}}^{(4)}$, having symmetry $\Gamma_2^{(e)}$, is a singlet state (with S = 0). The ground electronic state cannot be a triplet state since if it were then the symmetry of both $\Psi_{\mathrm{eo}}$ and $\Psi_{\mathrm{es}}$ would be $\Gamma_1^{(e)}$ and the product would therefore be of symmetry $\Gamma_1^{(e)}$, which is not allowed. Hence the ground electronic state of H$_2$ has to be a singlet electronic state.
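The behaviour of the four two-electron spin functions under the exchange (ab) can be verified with a few lines of code. This is a sketch under stated assumptions: $\alpha$ and $\beta$ are represented by two-component unit vectors, a two-electron spin function by a 2 x 2 array whose first index refers to electron a and whose second refers to electron b, and the helper names are hypothetical.

```python
import numpy as np

alpha = np.array([1.0, 0.0])  # 'spin-up' function, m_s = +1/2
beta = np.array([0.0, 1.0])   # 'spin-down' function, m_s = -1/2

def pair(u, v):
    """Two-electron spin function u(a) v(b) as a 2x2 array indexed by (a, b)."""
    return np.outer(u, v)

def character_under_ab(psi):
    """Return +1 or -1 if (ab) multiplies psi by that factor, otherwise None."""
    swapped = psi.T  # exchanging electrons a and b transposes the array
    if np.allclose(swapped, psi):
        return 1
    if np.allclose(swapped, -psi):
        return -1
    return None

spin_functions = {
    "aa": pair(alpha, alpha),
    "bb": pair(beta, beta),
    "ab+ba": (pair(alpha, beta) + pair(beta, alpha)) / np.sqrt(2),
    "ab-ba": (pair(alpha, beta) - pair(beta, alpha)) / np.sqrt(2),
    "ab": pair(alpha, beta),  # not symmetry adapted: (ab) turns it into ba
}
characters = {name: character_under_ab(psi) for name, psi in spin_functions.items()}
print(characters)
```

The three functions with character +1 are the triplet components and the single function with character -1 is the singlet; the bare product $\alpha\beta$ is returned as `None` because it is not an eigenfunction of (ab).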


The way we combine the nuclear spin basis functions $\Psi_{\mathrm{ns}}$ with the rotation-vibration-electronic (rovibronic) basis functions $\Psi_{\mathrm{rve}}$ in H$_2$ follows the same type of argument using the nuclear permutation group $G^{\mathrm{CNP}}$. Rovibronic states of symmetry $\Gamma_1^{(\mathrm{CNP})}$ can only be combined with $\Psi_{\mathrm{ns}}$ of species $\Gamma_2^{(\mathrm{CNP})}$ (of which there is one with I = 0), and rovibronic states of symmetry $\Gamma_2^{(\mathrm{CNP})}$ can only be combined with $\Psi_{\mathrm{ns}}$ of species $\Gamma_1^{(\mathrm{CNP})}$ (of which there are three with I = 1). Thus rovibronic states of symmetry $\Gamma_1^{(\mathrm{CNP})}$ have a nuclear spin statistical weight of 1, and rovibronic states of symmetry $\Gamma_2^{(\mathrm{CNP})}$ have a nuclear spin statistical weight of 3. An interesting result of these considerations follows for the $^{16}$O$_2$ molecule by using the $G^{\mathrm{CNP}}$ group. Labelling the O nuclei 1 and 2 this group is as in table A1.4.7. The spin of $^{16}$O nuclei is 0 and so the nuclear spin wavefunction is of species $\Gamma_1^{(\mathrm{CNP})}$. There is no nuclear spin wavefunction of species $\Gamma_2^{(\mathrm{CNP})}$ in $^{16}$O$_2$. Since $^{16}$O nuclei are bosons the complete wavefunction must be of symmetry $\Gamma_1^{(\mathrm{CNP})}$ and thus rovibronic states of species $\Gamma_2^{(\mathrm{CNP})}$ (which can only be combined with a nuclear spin wavefunction of species $\Gamma_2^{(\mathrm{CNP})}$) have no nuclear spin partner with which to combine. Thus these states cannot occur and are 'missing.' This means that half the rotational levels of every vibronic state are missing in this molecule. Missing levels arise in other molecules and can also involve nuclei with nonzero spin; they arise for the ammonia molecule NH$_3$.
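The statistical weights quoted for H$_2$ and the missing levels of $^{16}$O$_2$ follow from counting the symmetric and antisymmetric nuclear spin functions of two identical nuclei. The sketch below assumes the standard counting for two identical nuclei with d = 2I + 1 spin states each, namely d(d + 1)/2 symmetric and d(d - 1)/2 antisymmetric combinations; this formula and the function names are not given in the text above and are introduced here for illustration.

```python
from fractions import Fraction

def pair_spin_weights(spin):
    """Numbers of symmetric and antisymmetric spin functions for two identical nuclei."""
    d = int(2 * Fraction(spin) + 1)  # number of spin states per nucleus
    symmetric = d * (d + 1) // 2
    antisymmetric = d * (d - 1) // 2
    return symmetric, antisymmetric

def rovibronic_weights(spin):
    """Nuclear spin statistical weights of rovibronic states of species Gamma_1 and Gamma_2."""
    sym, anti = pair_spin_weights(spin)
    is_fermion = (2 * Fraction(spin)) % 2 == 1
    # Fermions require a complete function of species Gamma_2 (sign change under (12)),
    # bosons a complete function of species Gamma_1.
    if is_fermion:
        return {"Gamma_1": anti, "Gamma_2": sym}
    return {"Gamma_1": sym, "Gamma_2": anti}

print(rovibronic_weights(Fraction(1, 2)))  # H2: protons have spin 1/2
print(rovibronic_weights(0))               # 16O2: 16O nuclei have spin 0
```

A weight of zero corresponds to a 'missing' level: for spin-zero nuclei there is no antisymmetric nuclear spin function, so rovibronic states of species $\Gamma_2^{(\mathrm{CNP})}$ have no partner.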


Table A1.4.7 The character table of the group $G^{\mathrm{CNP}}$ for H$_2$.

                               E     (12)
$\Gamma_1^{(\mathrm{CNP})}$    1      1
$\Gamma_2^{(\mathrm{CNP})}$    1     -1


The Pauli exclusion principle follows from the indistinguishability of electrons and the rules of fermion 
permutation. It prevents the occurrence of states that have two or more electrons in the same particle state. As 
a result of the indistinguishability of nuclei, and the rules of fermion and boson permutation, there are missing 
levels. Both of these results can be tested experimentally. A negative result from trying to put an electron into 
the 1s state of Cu (this state already having two electrons of opposite spin in it) was reported by Ramberg and
Snow [11] and by analysing their results they determined an upper limit for the violation of the Pauli 

exclusion principle of 1.7 x 10 ; this means that at this level the electrons are indistinguishable. Attempts to 
observe spectral lines that would arise from transitions between 'missing' levels have been made in order to 
see whether the levels are truly missing. Such missing levels would arise if the nuclei involved are not 
completely identical. Such a situation is conceivable. Three negative attempts at a sensitivity level of only 
about 10 have been reported [12, 13, 14 ]. 

A1.4.3.5 TIME REVERSAL SYMMETRY

The time reversal symmetry operation $\hat{\theta}$ (or T) is the operation of reversing the direction of motion; it reverses all momenta, including spin angular momenta, but not the coordinates (see [15] for a good general account of this symmetry operation). As with the inversion operation E*, the weak interaction force is not invariant to time reversal and we discuss this further in the next subsection. However, for all practical purposes in molecular physics we can take this to be a symmetry operation. This symmetry operation has the property, unlike the other symmetry operations discussed here, of being antiunitary. Also, time reversal invariance does not lead to any conservation law and molecular states are not eigenstates of $\hat{\theta}$. However, this symmetry operator constrains the form of the Hamiltonian, an example being that no term in the Hamiltonian can contain the product of an odd number of momenta. Also, it is sometimes a useful tool in determining whether certain matrix elements vanish (see, for example, [16]) and it can be responsible for extra degeneracies. In particular, if a symmetry group has a pair of irreducible representations, $\Gamma$ and $\Gamma^*$ say, whose characters are the complex conjugates of each other, then energy levels of symmetry $\Gamma$ and $\Gamma^*$ will always coincide in pairs and be degenerate because of time reversal symmetry. Such a pair of irreducible representations of a symmetry group are called 'separably degenerate'. The irreducible representations $E_+$ and $E_-$ of the point group $C_3$ (see table A1.4.8) are separably degenerate. Such a character table can be condensed by adding the characters of the separably degenerate irreducible representations and this is done for the $C_3$ group in table A1.4.9. In the condensed character table the separably degenerate representations are marked 'sep'.
Table A1.4.8 The character table of the point group $C_3$.

           E     $C_3$        $C_3^2$
A          1     1            1
$E_+$      1     $\omega$     $\omega^2$
$E_-$      1     $\omega^2$   $\omega$

$\omega = \exp(2\pi i/3)$.
Table A1.4.9 The condensed character table of the $C_3$ group.

        E     $C_3$
A       1      1
E       2     -1     sep
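That the characters of $E_+$ and $E_-$ are complex conjugates of each other, and that adding them gives the condensed entry, can be checked numerically. In this sketch the characters are listed for the three operations E, $C_3$ and $C_3^2$; this column order, and the explicit third column, are assumptions made for the illustration.

```python
import cmath

omega = cmath.exp(2j * cmath.pi / 3)

# Characters for the operations E, C3, C3^2 (column order assumed for this sketch).
e_plus = [complex(1), omega, omega**2]
e_minus = [complex(1), omega**2, omega]

# The two sets of characters are complex conjugates of each other.
conjugate_pair = all(abs(p.conjugate() - m) < 1e-12 for p, m in zip(e_plus, e_minus))

# Adding the two rows gives real characters for the condensed representation E.
condensed = [round((p + m).real) for p, m in zip(e_plus, e_minus)]

print(conjugate_pair, condensed)
```

The summed characters are real because $\omega + \omega^2 = -1$, which is why the condensed representation can be listed with the characters 2 and -1.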


Apart from the degeneracy of separably degenerate states, time reversal symmetry leads to Kramers' degeneracy or Kramers' theorem: all energy levels of a system containing an odd number of particles with half-integral spin (i.e., fermions) must be at least doubly degenerate. One generally only considers systems having an odd number of electrons, but if nuclei with half-integral spin cause the degeneracy then one must resolve the nuclear hyperfine structure for the degeneracy to be revealed.

A1.4.3.6 CONCLUDING REMARKS ABOUT SYMMETRIES

In the above we have discussed several different symmetry groups: the translation group $G_T$, the rotation group K (spatial), the inversion group $\mathcal{E}$, the electron permutation group $S_n^{(e)}$ and the complete nuclear permutation group $G^{\mathrm{CNP}}$. We have also discussed the time reversal symmetry operation $\hat{\theta}$. The translational states $\Phi_{\mathrm{CM}}$ can be classified according to their linear momentum using $G_T$, but we rarely worry about the translational state of a molecule. The internal states $\Phi_{\mathrm{int}}$ can be labelled with their angular momentum (F, $m_F$) using K (spatial), and their parity (±) using $\mathcal{E}$. The symmetry in the group $S_n^{(e)}$ leads to restrictions on the electron spin multiplicities (the Pauli exclusion principle) and the symmetry in $G^{\mathrm{CNP}}$ leads to nuclear spin statistical weights. One might think that we should form a 'full' symmetry group of the molecular Hamiltonian, $G_{\mathrm{FULL}}$ say, describing all symmetry types simultaneously and symmetry classify our basis functions and eigenfunctions in this group. If we neglect time reversal symmetry (which requires special consideration because the operator is antiunitary), we have

$G_{\mathrm{FULL}} = G_T \otimes K(\mathrm{spatial}) \otimes \mathcal{E} \otimes S_n^{(e)} \otimes G^{\mathrm{CNP}},$ (A1.4.134)

that is, the full symmetry group for an isolated molecule in field-free space is the direct product of the groups describing the individual symmetry types. However, it can be shown that it is completely equivalent, and easier, to treat each type of symmetry and each symmetry group separately. In order to transform irreducibly in $G_{\mathrm{FULL}}$ a wavefunction must transform irreducibly in each of the groups $G_T$, K (spatial), $\mathcal{E}$, $S_n^{(e)}$ and $G^{\mathrm{CNP}}$. This is discussed in section 7.3 of [1]. Watson [17] has shown that for a molecule in an external electric field the full symmetry group cannot be factorized in the simple manner of (equation A1.4.134). In this case, instead of the three separate groups K (spatial), $\mathcal{E}$ and $G^{\mathrm{CNP}}$, it is necessary to consider a more complicated group containing selected elements of their direct product group. In the following section we show how the direct product of the groups $\mathcal{E}$ and $G^{\mathrm{CNP}}$, called the complete nuclear permutation-inversion (CNPI) group $G^{\mathrm{CNPI}}$, is used in molecular physics; it leads to the definition of the molecular symmetry (MS) group. In the final section we show how the molecular point group emerges from the molecular symmetry group as a near symmetry group of the molecular Hamiltonian.

As a postscript to this section we consider the operation of charge conjugation symmetry. This operation is 
not used in molecular physics but it is an important symmetry in nature, and it does lead to an important 
implication about the probable breakdown of time reversal symmetry. Classical electrodynamic forces are 
invariant if we change the signs of the charges. In elementary particle physics the 'charge conjugation 
operation' C is introduced as a generalization of this changing-the-sign-of-the-charge operation: it is the 
operation of changing every particle (including uncharged particles like the neutron) into its antiparticle. 
Weak interactions are not invariant to the operation C just as they are not invariant to the inversion operation 
P. One might hope to preserve the exact 'mirror symmetry' of nature if invariance to the product CP were a 
fact. Unfortunately, CP symmetry is not universal [18], although its violation is a small effect that has never 
been observed outside the neutral K meson (kaon) system and the extent of its violation cannot be calculated 
(unlike the situation with parity violation, which by comparison is a big effect). CP violation permits unequal 
treatment of particles and antiparticles and it may be responsible for the domination of matter over antimatter 
in the universe [19]. Very recent considerations concerning CP violation are summarized in [20]; in particular, 
this reference points out that the study of CP violation in neutral B mesons will probe the physics behind the 
'standard model', which does not predict sufficient CP violation to account, by itself, for the predominance of 
matter over antimatter in the universe. In the light of the fact that C was introduced as a generalization of the 
changing-the-sign-of-the-charge operation, it is appropriate that CP violation provides an unambiguous 
'convention-free' definition of positive charge: it is the charge carried by the lepton preferentially produced 
in the decay of the long-lived neutral K meson [21]. Although CP violation is a fact there is one invariance in nature involving C that is believed to be universal (based on quantum field theory) and that is invariance under the triple operation TCP, which also involves the time reversal operation T. TCP symmetry implies that every particle has the same mass and lifetime as its antiparticle. However, if TCP symmetry is true
the observation of CP violation in experiments on neutral K mesons must mean that there is a compensating 
violation of time reversal symmetry at the same time. A direct experimental measure of the violation of time 
reversal symmetry has not been made, mainly because the degree of violation is very small. 


A1.4.4 THE MOLECULAR SYMMETRY GROUP 

The complete nuclear permutation inversion (CNPI) group of the PH$_3$ molecule is the direct product of the complete nuclear permutation (CNP) group $S_3$ (see (equation A1.4.19)) and the inversion group $\mathcal{E}$ = {E, E*}. This is a group of 12 elements that we call $G_{12}$:

$G_{12} = \{E, (123), (132), (12), (23), (31), E^*, (123)^*, (132)^*, (12)^*, (23)^*, (31)^*\}.$ (A1.4.135)
The rotation-vibration-electronic energy levels of the PH$_3$ molecule (neglecting nuclear spin) can be labelled with the irreducible representation labels of the group $G_{12}$. The character table of this group is given in table A1.4.10.

Table A1.4.10 The character table of the CNPI group $G_{12}$.

             E     (123)   (12)    E*    (123)*   (12)*
                   (132)   (23)          (132)*   (23)*
                           (31)                   (31)*

$A_1^+$      1      1       1      1      1        1
$A_1^-$      1      1       1     -1     -1       -1
$A_2^+$      1      1      -1      1      1       -1
$A_2^-$      1      1      -1     -1     -1        1
$E^+$        2     -1       0      2     -1        0
$E^-$        2     -1       0     -2      1        0

Before we consider the results of this symmetry labelling, we should consider the effect of the inversion motion in PH$_3$. In figure A1.4.5 we depict the two versions (see [22] for a discussion of this term) of the numbered equilibrium structure of the molecule and call them a and b. The inversion coordinate $\rho$ is also indicated in this figure. In figure A1.4.6 we schematically indicate the cross-section in the potential energy surface of the PH$_3$ molecule that contains the two minima and the barrier between them. In this figure we also indicate several vibrational energy levels of the molecule. The barrier to inversion is so high ($\approx$11 300 cm$^{-1}$; see [23]) that there is no observable inversion tunnelling splitting. Thus, the energy levels can be calculated by just considering the motion in one of the two minima and we do not need to consider both minima. The 'single minimum' calculation is represented in figure A1.4.7; each minimum has a duplicate set of energy levels.


Figure A1.4.5. PH$_3$ inversion.

Figure A1.4.6. A cross-section of the potential energy surface of PH$_3$. The coordinate $\rho$ is defined in figure A1.4.5.

Figure A1.4.7. A cross-section of the potential energy surface of PH$_3$ obtained by ignoring the version b (see figure A1.4.6). The coordinate $\rho$ is defined in figure A1.4.5.

If we were to calculate the vibrational energy levels using the double minimum potential energy surface, we
would find that well below the barrier, every energy level would be doubly degenerate to within measurement
accuracy for PH3. If we symmetry classified the levels using the group G12 we would find that there were
three types of energy level: A1' + A2'', A2' + A1'' or E' + E''. This double degeneracy would be resolved by
inversion tunnelling and it is an accidental degeneracy not forced by the symmetry group G12. If the inversion
tunnelling is not resolved we have actually done too much work here. There are only three distinct types of
level and yet we have used a symmetry group with six irreducible representations. However, Longuet-Higgins
[24] showed how to obtain the appropriate subgroup of G12 that avoids the unnecessary double labels. This is
achieved by just using the elements of G12 that are appropriate for a single minimum; we delete elements such
as E* and (12) that interconvert the a and b forms. Longuet-Higgins termed the deleted elements 'unfeasible'.
The group obtained is 'the molecular symmetry (MS) group'. In the case of PH3, we obtain the particular MS
group

$$C_{3v}(\mathrm{M}) = \{E, (123), (132), (12)^*, (23)^*, (31)^*\}; \tag{A1.4.136}$$


its character table (with the class structure indicated) is given in table A1.4.11. Using this group, we achieve a
sufficient symmetry labelling of the levels as being either A1, A2 or E. All possible interactions can be
understood using this group (apart from the effect of inversion tunnelling).

Table A1.4.11. The character table of the molecular symmetry group C3v(M).

          E      (123)    (12)*
                 (132)    (23)*
                          (31)*

A1        1        1        1
A2        1        1       -1
E         2       -1        0
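The construction of C3v(M) from G12 can be checked mechanically. The following is a minimal sketch, not part of the original treatment: it represents each permutation-inversion element as a pair (permutation of the three proton labels, flag for E*), assumes that E* commutes with every permutation so that the flags combine by exclusive-or, and then confirms that the six feasible elements are closed under multiplication and fall into classes of size 1, 2 and 3, as in table A1.4.11. The function names are illustrative choices.

```python
from itertools import permutations

def compose(a, b):
    """Product of two permutation-inversion elements: apply b first, then a."""
    (pa, sa), (pb, sb) = a, b
    return (tuple(pa[pb[i]] for i in range(3)), sa ^ sb)

def inverse(a):
    """Inverse element: invert the permutation, keep the E* flag."""
    perm, star = a
    inv = [0, 0, 0]
    for i, image in enumerate(perm):
        inv[image] = i
    return (tuple(inv), star)

def is_odd(perm):
    """True if the permutation is a product of an odd number of pair exchanges."""
    inversions = sum(perm[i] > perm[j] for i in range(3) for j in range(i + 1, 3))
    return inversions % 2 == 1

# G12: every permutation of the three protons, without (0) or with (1) E*.
G12 = [(p, s) for p in permutations(range(3)) for s in (0, 1)]

# Feasible elements for PH3: E, (123), (132) unstarred and (12)*, (23)*, (31)* starred,
# i.e. even permutations without E* and odd permutations with E*.
C3v_M = [g for g in G12 if is_odd(g[0]) == bool(g[1])]

# A subgroup must be closed under multiplication.
closed = all(compose(a, b) in C3v_M for a in C3v_M for b in C3v_M)

def conjugacy_classes(group):
    """Partition a group into classes {h g h^-1 : h in group}."""
    classes, seen = [], set()
    for g in group:
        if g in seen:
            continue
        cls = {compose(compose(h, g), inverse(h)) for h in group}
        classes.append(cls)
        seen |= cls
    return classes

class_sizes = sorted(len(c) for c in conjugacy_classes(C3v_M))
print(len(G12), len(C3v_M), closed, class_sizes)
```

Because closure and class structure do not depend on whether permutations are read as acting on labels or on positions, the check is insensitive to that choice of convention.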


For PH3 the labour saved by using the MS group rather than the CNPI group is not very great, but for larger
molecules, such as the water trimer for example, a great saving is achieved if all unfeasible elements of the
CNPI group are eliminated from consideration. An unfeasible element of the CNPI group is one that takes the
molecule between versions that are separated by an insuperable energy barrier in the potential energy
function. For the water trimer the CNPI group has 6! × 3! × 2 = 8640 elements. The MS group that is used to
interpret the spectrum has 48 elements [25].
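The element count quoted for the water trimer follows from multiplying the number of permutations of each set of identical nuclei by two (for E*). A short sketch of that arithmetic, with a hypothetical helper name:

```python
from math import factorial

def cnpi_order(identical_nuclei_counts):
    """Order of the CNPI group: product of n! over each set of identical nuclei, times 2 for E*."""
    order = 2
    for n in identical_nuclei_counts:
        order *= factorial(n)
    return order

water_trimer = cnpi_order([6, 3])   # six protons, three oxygen nuclei
phosphine = cnpi_order([3, 1])      # three protons, one phosphorus nucleus
print(water_trimer, phosphine)
```

The same function reproduces the order of G12 for PH3, which makes the contrast in group size between the two molecules explicit.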

Ammonia (NH3) is pyramidal like PH3 and in its electronic ground state there are two versions of the
numbered equilibrium structure exactly as shown for PH3 in figure A1.4.5. The potential barrier between the
two versions, however, is around 2000 cm⁻¹ for ammonia [26] and thus much lower than in PH3. This barrier
is so low that the molecule will tunnel through it on the time scale of a typical spectroscopic experiment, and
the tunnelling motion gives rise to energy level splittings that can be resolved experimentally (see, for
example, figure 15-3 of [1]). Thus, for NH3, all elements of the group G12 are feasible, and the molecular
symmetry group of NH3 in its electronic ground state is G12. This group is isomorphic to the point group D3h
and in the literature it is customarily called D3h(M).


A1.4.5 THE MOLECULAR POINT GROUP 

The MS group is introduced by deleting unfeasible elements from the CNPI group. It can be applied to 
symmetry label the rotational, vibrational, electronic and spin wavefunctions of a molecule, regardless of 
whether the molecule is rigid or nonrigid. It is a true symmetry group and no terms in the Hamiltonian can 
violate the symmetry labels obtained (with the exception of the as yet undetected effect of the weak neutral 
current interaction). The MS group can be used to determine nuclear spin statistical weights, to determine 
which states can and cannot interact as a result of considering previously neglected higher order terms in the 
Hamiltonian, or the effect of externally applied magnetic or electric fields and it can be used to determine the 
selection rules for allowed electric and magnetic dipole transitions. What then of the molecular point group? 

For a molecule that has no observable tunnelling between minima on the potential energy surface (i.e., for a
rigid molecule) and for which the equilibrium structure is nonlinear (see footnote 11), it turns out that the MS
group is isomorphic to the point group of the equilibrium structure. For example, PH3 has the molecular symmetry
group C3v(M) given in (equation A1.4.136) and its equilibrium structure has the point group C3v given in
(equation A1.4.22). It is easy to show from (equation A1.4.31) (using the fact that E*E* = E) that these two
groups are isomorphic with the following mapping:

(A1.4.137)

Obviously, we have chosen the name C3v(M) for the molecular symmetry group of PH3 because this group is
isomorphic to C3v.

Quite remarkably, if we neglect the effect of the MS group elements on the rotational variables (the Euler
angles θ, φ and χ) then each element of the MS group rotates and/or reflects the vibrational displacements and
electronic coordinates in the manner described by its partner in the point group. In fact, for the purpose of
classifying vibrational and electronic wavefunctions this defines what the elements of the molecular point 
group actually do to the molecular coordinates for a rigid nonlinear molecule. By starting with the 
fundamental definition of symmetry in terms of energy invariance, by considering the operations of inversion 
and identical nuclei permutation and, finally, by deleting unfeasible elements of the CNPI group, we recover 
the simple description of molecular symmetry in terms of rotations and reflections, but the rotations and 
reflections are of the vibrational displacements and the electronic coordinates — not of the entire molecule at 
its equilibrium configuration. Such operations are not symmetry operations of the full Hamiltonian (unlike the 
elements of the MS group) since the transformation of the rotational variables is neglected. This means that 
such effects as Coriolis coupling for example, which involve a coupling of rotation and vibration, will mix 
vibrational states of different point group symmetry. The molecular point group is a near symmetry group of 
the full Hamiltonian. However, the molecular point group is a symmetry group of the vibration-electronic 
Hamiltonian of a rigid molecule and in practice it is always used for labelling the vibration-electronic states 
of such molecules. Its use enables one, for example, to classify the normal vibration coordinates and to study 
the transformation properties of the electronic wavefunction without having to bother about molecular 
rotation. This is a useful simplification, but the reader must be aware that the rotation and/or reflection
operations of the molecular point group do not rotate and/or reflect the molecule in space; they rotate and/or
reflect the vibrational displacements and electronic coordinates (see footnote 12). To study the effect of
molecular rotation (as one needs to do if one is interested in understanding high resolution rotationally
resolved molecular spectra), or to study nonrigid molecules such as the water trimer, the point group is of no
use and one must employ the appropriate MS group.

Clearly, the operation (21) has the same effect as (12), (13) has the same effect as (31) etc.

2 The axis labels (p,q,r) are chosen in order not to confuse this axis system with other systems, such as the 
molecule fixed axes (x,y,z) discussed below, used to describe molecular motion. 

3 For an observer viewing the pq plane from a point that has a positive r coordinate (figure A1.4.2), the positive
right-handed direction of the $C_3$ and $C_3^2$ rotations is anticlockwise.

4 Equivalently, it follows if we apply R to both sides of (equation A1.4.58) and then use (equation A1.4.59) on the
left hand side.

5 Two functions $\Psi_{ni}$ and $\Psi_{nj}$ are orthogonal if the product $\Psi_{ni}^* \Psi_{nj}$, integrated over all
configuration space, vanishes. A function $\Psi$ is normalized if the product $\Psi^* \Psi$ integrated over all
configuration space is unity. An orthonormal set contains functions that are normalized and orthogonal to each other.

6 Note the order of the subscripts on D[R] which follows from the fact that we use the N-convention of (equation
A1.4.56) to define the effect of a permutation on a function.

7 Proved, for example, in section 6.5 of [1].

8 That is, a molecule for which the minimum of the Born-Oppenheimer potential energy function corresponds to a 
nonlinear geometry. The theory of linear molecules is explained in chapter 17 of [1]. 

9 An even (odd) permutation is one that when expressed as the product of pair exchanges involves an even (odd) 
number of such exchanges. Thus (123)=(12)(23) and (12345)=(12)(23)(34)(45) are even permutations, whereas 
(12), (1234)=(12)(23)(34) and (123456)=(12)(23)(34)(45)(56) are odd permutations. 


10 We give the spin of electron a first and of electron b second.

11 Rigid linear molecules are a special case in which an extended MS group, rather than the MS group, is
isomorphic to the point group of the equilibrium structure; see chapter 17 of [1].

12 A detailed discussion of the relation between MS group operations and point group operations is given in section
4.5 of [1].

A1.5 Intermolecular interactions

Ajit J Thakkar 


A1.5.1 INTRODUCTION 


The existence of intermolecular interactions is apparent from elementary experimental observations. There 
must be attractive forces because otherwise condensed phases would not form, gases would not liquefy, and 
liquids would not solidify. There must be short-range repulsive interactions because otherwise solids and 
liquids could be compressed to much smaller volumes with ease. The kernel of these notions was formulated 
in the late eighteenth century, and Clausius made a clear statement along the lines of this paragraph as early as 
1857 [1]. 

Since the interaction energy V between a pair of molecules must have an attractive region at large
intermolecular separations r and a steeply repulsive region at short distances, it is evident that V(r) must have
the schematic form illustrated in figure A1.5.1. It is conventional to denote the distance at which the
interaction energy is a minimum by either $r_m$ or $r_e$ and to refer to this distance as the equilibrium distance.
Similarly it is common to denote the shorter distance at which the interaction energy is zero by $\sigma$ and refer to
it as the slow collision diameter. The net potential energy of attraction at the minimum is $V(r_m) = -\varepsilon$, and
$\varepsilon$ is called the well depth.



Figure A1.5.1. Potential energy curve for NeF⁻ based on ab initio calculations of Archibong et al.


In 1873, van der Waals [2] first used these ideas to account for the deviation of real gases from the ideal gas
law PV = RT in which P, V and T are the pressure, molar volume and temperature of the gas and R is the gas
constant. He argued that the incompressible molecules occupied a volume b leaving only the volume V - b free
for the molecules to move in. He further argued that the attractive forces between the molecules reduced the
pressure they exerted on the container by $a/V^2$; thus the pressure appropriate for the gas law is $P + a/V^2$
rather than P. These ideas led him to the van der Waals equation of state:

$$(P + a/V^2)(V - b) = RT. \tag{A1.5.1}$$


The importance of the van der Waals equation is that, unlike the ideal gas equation, it predicts a gas-liquid
transition and a critical point for a pure substance. Even though this simple equation has been superseded, its
remarkable success led to the custom of referring to the attractive and repulsive forces between molecules as
van der Waals forces.
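To make the prediction of a critical point concrete, the sketch below evaluates equation (A1.5.1) and the standard critical constants that follow from it, $V_c = 3b$, $T_c = 8a/(27Rb)$ and $P_c = a/(27b^2)$ (a textbook result that is not derived in this section). The values of a and b are illustrative, roughly those usually tabulated for CO2, and are an assumption of this example rather than data from the text.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def vdw_pressure(V, T, a, b):
    """Pressure from the van der Waals equation (P + a/V^2)(V - b) = RT, V = molar volume."""
    return R * T / (V - b) - a / V**2

def vdw_critical_point(a, b):
    """Critical molar volume, temperature and pressure of a van der Waals fluid."""
    return 3.0 * b, 8.0 * a / (27.0 * R * b), a / (27.0 * b**2)

# Illustrative parameters (assumed, approximately CO2): a in Pa m^6 mol^-2, b in m^3 mol^-1.
a_demo, b_demo = 0.364, 4.27e-5
V_c, T_c, P_c = vdw_critical_point(a_demo, b_demo)

# The critical isotherm passes through P_c at V_c ...
pressure_matches = math.isclose(vdw_pressure(V_c, T_c, a_demo, b_demo), P_c, rel_tol=1e-9)

# ... and is flat there: a central-difference estimate of dP/dV should be close to zero.
h = 1.0e-4 * V_c
slope = (vdw_pressure(V_c + h, T_c, a_demo, b_demo)
         - vdw_pressure(V_c - h, T_c, a_demo, b_demo)) / (2.0 * h)

print(f"T_c = {T_c:.1f} K, P_c = {P_c / 1e5:.1f} bar, V_c = {V_c * 1e6:.1f} cm^3/mol")
```

Below $T_c$ the same isotherm develops a region of positive slope, which is the signature of the gas-liquid transition mentioned above.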

The feature that distinguishes intermolecular interaction potentials from intramolecular ones is their relative
strength. Most typical single bonds have a dissociation energy in the 150-500 kJ mol⁻¹ range but the strength
of the interactions between small molecules, as characterized by the well depth, is in the 1-25 kJ mol⁻¹ range.

A1.5.1.1 MANY-BODY EXPANSION 

The total energy of an assembly of molecules can be written as

$$E = \sum_i E_i + \sum_{i>j} V_{ij} + \sum_{i>j>k} V_{ijk} + \cdots \tag{A1.5.2}$$

in which $E_i$ is the energy of isolated molecule i, $V_{ij}$ is the energy of interaction between molecules i and j in
the absence of any others, $V_{ijk}$ is the non-additive energy of interaction among the three molecules i, j and k in
the absence of any others, and so on. The interaction energy is then

$$V = E - \sum_i E_i = \sum_{i>j} V_{ij} + \sum_{i>j>k} V_{ijk} + \cdots \tag{A1.5.3}$$

For example, if there are three molecules A, B and C, then equation (A1.5.3) can be written as

$$V = V_{AB} + V_{BC} + V_{CA} + V_{ABC}. \tag{A1.5.4}$$

$V_{AB}$ is the interaction energy of molecules A and B in the absence of molecule C. The interaction between
molecules A and B will be different in the presence of molecule C, and so on. The non-additive, three-body
term $V_{ABC}$ is the total correction for these errors in the three pair interactions. When there are four molecules,
a three-body correction is included for each distinct triplet of molecules and the remaining error is corrected
by the non-additive four-body term.
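The bookkeeping behind equation (A1.5.4) can be sketched in a few lines. The energies below are invented round numbers chosen only to make the arithmetic easy to follow; the function name and dictionary layout are likewise assumptions of this example.

```python
def three_body_decomposition(E_mono, E_dimer, E_trimer):
    """Split the interaction energy of a trimer ABC into pair terms and a three-body term.

    E_mono  : energies of the isolated molecules, keyed 'A', 'B', 'C'
    E_dimer : energies of the isolated pairs, keyed 'AB', 'BC', 'CA'
    E_trimer: energy of the full assembly
    """
    # Pair interaction: dimer energy minus the two monomer energies.
    V_pair = {pair: E_dimer[pair] - E_mono[pair[0]] - E_mono[pair[1]] for pair in E_dimer}
    # Total interaction energy: trimer energy minus all monomer energies.
    V_total = E_trimer - sum(E_mono.values())
    # Whatever the pair terms fail to account for is the non-additive three-body term.
    V_ABC = V_total - sum(V_pair.values())
    return V_pair, V_ABC, V_total

# Hypothetical energies in kJ/mol (not taken from any calculation or experiment).
E_mono = {"A": -100.0, "B": -100.0, "C": -100.0}
E_dimer = {"AB": -205.0, "BC": -204.0, "CA": -206.0}
V_pair, V_ABC, V_total = three_body_decomposition(E_mono, E_dimer, -316.0)
print(V_pair, V_ABC, V_total)
```

In this made-up case the pair terms sum to -15 kJ/mol and the three-body term contributes a further -1 kJ/mol, so dropping it (the pairwise additivity approximation) would recover most, but not all, of the interaction energy.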


In many cases, it is reasonable to expect that the sum of two-body interactions will be much greater than the
sum of the three-body terms which in turn will be greater than the sum of the four-body terms and so on.
Retaining only the two-body terms in equation (A1.5.3) is called the pairwise additivity approximation. This
approximation is quite good so the bulk of our attention can be focused on describing the two-body
interactions. However, it is now known that the many-body terms cannot be neglected altogether, and they are
considered briefly in section A1.5.2.6 and section A1.5.3.5.

A1.5.1.2 TYPES OF INTERMOLECULAR INTERACTIONS 

It is useful to classify various contributions to intermolecular forces on the basis of the physical phenomena
that give rise to them. The first level of classification is into long-range forces that vary as inverse powers of
the distance, $r^{-n}$, and short-range forces that decrease exponentially with distance as in $\exp(-\alpha r)$.

There are three important varieties of long-range forces: electrostatic, induction and dispersion. Electrostatic
forces are due to classical Coulombic interactions between the static charge distributions of the two
molecules. They are strictly pairwise additive, highly anisotropic, and can be either repulsive or attractive.
The distortions of a molecule's charge distribution induced by the electric field of all the other molecules
lead to induction forces that are always attractive and highly non-additive. Dispersion forces are always
present, always attractive, nearly pairwise additive, and arise from the instantaneous fluctuations of the
electron distributions of the interacting molecules. If the molecules are in closed-shell ground states, then
there are no other important long-range interactions. However, if one or more of the molecules are in
degenerate states, then non-additive, resonance interactions of either sign can arise. Long-range forces are
discussed in greater detail in section A1.5.2.

The most important short-range forces are exchange and repulsion; they are very often taken together and 
referred to simply as exchange-repulsion. They are both non-additive and of opposing sign, but the repulsion 
dominates at short distances. The overlap between the electron densities of molecules when they are close to 
one another leads to modifications of the long-range terms and thence to short-range penetration, charge 
transfer and damping effects. All these effects are discussed in greater detail in section A1.5.3.

A1.5.1.3 POTENTIAL ENERGY SURFACES 

Only the interactions between a pair of atoms can be described as a simple function V(r) of the distance 
between them. For nonlinear molecules, several coordinates are required to describe the relative orientation of 
the interacting species. Thus it is necessary to think of the interaction energy as a 'potential energy 
surface' (PES) that depends on many variables. There are usually several points of minimum energy on this 
surface; many of these will be 'local minima' and at least one will be the 'global minimum'. The interaction 
energy at a local minimum is lower than at any point in its neighbourhood but there can be lower energy 
minima further away. If there is more than one global minimum, then these are located at symmetry 
equivalent points on the surface, corresponding to the same minimum energy. 

For the interaction between a nonlinear molecule and an atom, one can place the coordinate system at the
centre of mass of the molecule so that the PES is a function of the three spherical polar coordinates $r, \theta, \phi$
needed to specify the location of the atom. If the molecule is linear, V does not depend on $\phi$ and the PES is a
function of only two variables. In the general case of two nonlinear molecules, the interaction energy depends
on the distance between the centres of mass, and five of the six Euler angles needed to specify the relative
orientation of the molecular axes with respect to the global or 'space-fixed' coordinate axes.


A1.5.2 LONG-RANGE FORCES 

A1.5.2.1 LONG-RANGE PERTURBATION THEORY

Perturbation theory is a natural tool for the description of intermolecular forces because they are relatively 
weak. If the interacting molecules (A and B) are far enough apart, then the theory becomes relatively simple 
because the overlap between the wavefunctions of the two molecules can be neglected. This is called the 
polarization approximation. Such a theory was first formulated by London [3, 4], and then reformulated by 
several others [5, 6 and 7]. 

Each electron in the system is assigned to either molecule A or B, and Hamiltonian operators $\mathcal{H}_A$ and
$\mathcal{H}_B$ for each molecule are defined in terms of its assigned electrons. The unperturbed Hamiltonian for the
system is then $\mathcal{H}_0 = \mathcal{H}_A + \mathcal{H}_B$, and the perturbation $\mathcal{H}'$ consists of the Coulomb
interactions between the nuclei and electrons of A and those of B. The unperturbed states, eigenfunctions of
$\mathcal{H}_0$, are simple product functions $\psi_A \psi_B$. For closed-shell molecules, non-degenerate
Rayleigh-Schrödinger perturbation theory gives the energy of the ground state of the interacting system. The
first-order interaction energy is the electrostatic energy, and the second-order energy is partitioned into
induction and dispersion energies. The induction energy consists of all terms that involve excited states of only
one molecule at a time, whereas the dispersion energy includes all the remaining terms that involve excited
states of both molecules simultaneously.

Long-range forces are most conveniently expressed as a power series in 1/r, the reciprocal of the
intermolecular distance. This series is called the multipole expansion. It is so common to use the multipole
expansion that the electrostatic, induction and dispersion energies are referred to as 'non-expanded' if the
expansion is not used. In early work it was noted that the multipole expansion did not converge in a
conventional way and doubt was cast upon its use in the description of long-range electrostatic, induction and
dispersion interactions. However, it is now established [8, 9, 10, 11, 12 and 13] that the series is asymptotic in
Poincaré's sense. The interaction energy can be written as


$$V(r) = \sum_{n=0}^{N} \frac{V_n}{r^n} + O(1/r^{N+1}) \tag{A1.5.5}$$

with the assurance that the remainder left upon truncation after some chosen term in $r^{-N}$ tends to zero in the
limit as $r \to \infty$. In other words, the multipole expansion can be made as accurate as one desires for large
enough intermolecular separations, even though it cannot be demonstrated to converge at any given value of r
and, in some cases, diverges for all r!

Some electric properties of molecules are described in section A1.5.2.2 because the coefficients of the powers
of 1/r turn out to be related to them. The electrostatic, induction and dispersion energies are considered in turn
in section A1.5.2.3, section A1.5.2.4 and section A1.5.2.5, respectively.


A1.5.2.2 MULTIPOLE MOMENTS AND POLARIZABILITIES

The long-range interactions between a pair of molecules are determined by electric multipole moments and
polarizabilities of the individual molecules. Multipole moments are measures that describe the non-sphericity
of the charge distribution of a molecule. The zeroth-order moment is the total charge of the molecule:
$Q = \sum_i q_i$ where $q_i$ is the charge of particle i and the sum is over all electrons and nuclei in the molecule.
The first-order moment is the dipole moment vector with Cartesian components given by

$$\mu_\alpha = \int \rho(\mathbf{r})\, r_\alpha \, d^3r, \qquad \alpha \in \{x, y, z\} \tag{A1.5.6}$$

in which $\rho(\mathbf{r})$ is the total (electronic plus nuclear) charge density of the molecule. The direction of the
dipole moment is from negative to positive. Dipole moments have been measured for a vast variety of molecules [14,
15 and 16].

Next in order is the quadrupole moment tensor with components:

$$\Theta_{\alpha\beta} = \frac{1}{2}\int \rho(\mathbf{r})\,(3 r_\alpha r_\beta - r^2 \delta_{\alpha\beta})\, d^3r, \qquad \alpha, \beta \in \{x, y, z\} \tag{A1.5.7}$$

where the 'Kronecker delta' $\delta_{\alpha\beta} = 1$ for $\alpha = \beta$ and $\delta_{\alpha\beta} = 0$ for $\alpha \neq \beta$. The
quadrupole moment is a symmetric ($\Theta_{\alpha\beta} = \Theta_{\beta\alpha}$) second-rank tensor. Moreover, it is traceless:

$$\Theta_{xx} + \Theta_{yy} + \Theta_{zz} = 0. \tag{A1.5.8}$$

Therefore, it has at most five independent components, and fewer if the molecule has some symmetry.
Symmetric top molecules have only one independent component of $\Theta$, and, in such cases, the axial
component is often referred to as the quadrupole moment. A quadrupolar distribution can be created from four
charges of the same magnitude, two positive and two negative, by arranging them in the form of two dipole
moments parallel to each other but pointing in opposite directions. Centro-symmetric molecules, like CO2,
have a zero dipole moment but a non-zero quadrupole moment.
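For a set of point charges the integrals in equations (A1.5.6) and (A1.5.7) reduce to sums, which makes the definitions easy to test. The sketch below uses a hypothetical linear arrangement of charges -q, +2q, -q (a crude stand-in for a centro-symmetric molecule, in arbitrary units, not a model of any real charge distribution) and shows a vanishing charge and dipole together with a non-zero, traceless quadrupole tensor.

```python
def total_charge(charges):
    """Zeroth-order moment: the sum of all point charges."""
    return sum(q for q, _ in charges)

def dipole(charges):
    """Dipole vector: mu_a = sum_i q_i r_ia."""
    return [sum(q * pos[a] for q, pos in charges) for a in range(3)]

def quadrupole(charges):
    """Traceless quadrupole tensor: Theta_ab = (1/2) sum_i q_i (3 r_ia r_ib - r_i^2 delta_ab)."""
    theta = [[0.0] * 3 for _ in range(3)]
    for q, pos in charges:
        r2 = sum(c * c for c in pos)
        for a in range(3):
            for b in range(3):
                delta = 1.0 if a == b else 0.0
                theta[a][b] += 0.5 * q * (3.0 * pos[a] * pos[b] - r2 * delta)
    return theta

# Hypothetical linear model along z, arbitrary units: each entry is (charge, (x, y, z)).
q, d = 1.0, 1.0
linear_model = [(-q, (0.0, 0.0, -d)), (2.0 * q, (0.0, 0.0, 0.0)), (-q, (0.0, 0.0, d))]

Q = total_charge(linear_model)
mu = dipole(linear_model)
theta = quadrupole(linear_model)
trace = theta[0][0] + theta[1][1] + theta[2][2]
print(Q, mu, theta[2][2], theta[0][0], trace)
```

The single independent component here is the axial one, $\Theta_{zz}$; the two perpendicular components are each $-\Theta_{zz}/2$, as the traceless condition requires for an axially symmetric distribution.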

The multipole moment of rank n is sometimes called the $2^n$-pole moment. The first non-zero multipole
moment of a molecule is origin independent but the higher-order ones depend on the choice of origin.
Quadrupole moments are difficult to measure and experimental data are scarce [17, 18 and 19]. The octopole
and hexadecapole moments have been measured only for a few highly symmetric molecules whose lower
multipole moments vanish. Ab initio calculations are probably the most reliable way to obtain quadrupole and
higher multipole moments [20, 21 and 22].

The charge redistribution that occurs when a molecule is exposed to an electric field is characterized by a set
of constants called polarizabilities. In a uniform electric field F, a component of the dipole moment is

$$\mu_\alpha = \mu_\alpha^0 + \alpha_{\alpha\beta} F_\beta + \frac{1}{2}\beta_{\alpha\beta\gamma} F_\beta F_\gamma + \frac{1}{6}\gamma_{\alpha\beta\gamma\delta} F_\beta F_\gamma F_\delta + \cdots \tag{A1.5.9}$$

in which $\alpha_{\alpha\beta}$, $\beta_{\alpha\beta\gamma}$ and $\gamma_{\alpha\beta\gamma\delta}$, respectively, are components of the dipole
polarizability, hyperpolarizability and second hyperpolarizability tensors, and a summation is implied over
repeated subscripts.

The dipole polarizability tensor characterizes the lowest-order dipole moment induced by a uniform field. The
$\alpha$ tensor is symmetric and has no more than six independent components, fewer if the molecule has some
symmetry. The scalar or mean dipole polarizability

$$\bar{\alpha} = \frac{1}{3}\,\mathrm{Tr}\,\alpha = \frac{1}{3}(\alpha_{xx} + \alpha_{yy} + \alpha_{zz}) \tag{A1.5.10}$$

is invariant to the choice of coordinate system and is often referred to simply as 'the polarizability'. It is
related to many important bulk properties of an ensemble of molecules including the dielectric constant, the
refractive index, the extinction coefficient, and the electric susceptibility. The polarizability is a measure of
the softness of the molecule's electron density, and correlates directly with molecular size, and inversely with
the ionization potential and HOMO-LUMO gap. Another scalar polarizability invariant commonly
encountered is the polarizability anisotropy:

$$(\Delta\alpha)^2 = \frac{1}{2}\left[3\,\mathrm{Tr}\,\alpha^2 - (\mathrm{Tr}\,\alpha)^2\right]. \tag{A1.5.11}$$

In linear, spherical and symmetric tops the components of $\alpha$ along and perpendicular to the principal axis of
symmetry are often denoted by $\alpha_\parallel$ and $\alpha_\perp$, respectively. In such cases, the anisotropy is simply
$\Delta\alpha = \alpha_\parallel - \alpha_\perp$. If the applied field is oscillating at a frequency $\omega$, then the dipole
polarizability is frequency dependent as well: $\alpha(\omega)$. The zero frequency limit of the 'dynamic'
polarizability $\alpha(\omega)$ is the static polarizability described above.
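The two invariants in equations (A1.5.10) and (A1.5.11) are simple to evaluate from a 3 × 3 tensor. The sketch below uses an arbitrary axially symmetric tensor (made-up numbers in arbitrary units) to show that the general anisotropy formula reduces to the difference between the parallel and perpendicular components; note that taking the square root only recovers the magnitude of $\Delta\alpha$.

```python
import math

def trace(m):
    """Trace of a 3 x 3 matrix given as nested lists."""
    return sum(m[i][i] for i in range(3))

def trace_of_square(m):
    """Tr(m m) = sum_ij m_ij m_ji."""
    return sum(m[i][j] * m[j][i] for i in range(3) for j in range(3))

def mean_polarizability(alpha):
    """Mean polarizability: one third of the trace."""
    return trace(alpha) / 3.0

def anisotropy(alpha):
    """Magnitude of the anisotropy: sqrt( (1/2) [3 Tr(alpha^2) - (Tr alpha)^2] )."""
    return math.sqrt(0.5 * (3.0 * trace_of_square(alpha) - trace(alpha) ** 2))

# Hypothetical symmetric-top tensor: alpha_perp = 2, alpha_par = 5 (arbitrary units).
alpha_perp, alpha_par = 2.0, 5.0
alpha = [[alpha_perp, 0.0, 0.0],
         [0.0, alpha_perp, 0.0],
         [0.0, 0.0, alpha_par]]

alpha_bar = mean_polarizability(alpha)
delta_alpha = anisotropy(alpha)
print(alpha_bar, delta_alpha)
```

For this tensor the mean is $(2\alpha_\perp + \alpha_\parallel)/3 = 3$ and the anisotropy is $\alpha_\parallel - \alpha_\perp = 3$.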

There are higher multipole polarizabilities that describe higher-order multipole moments induced by non- 
uniform fields. For example, the quadrupole polarizability is a fourth-rank tensor C that characterizes the 
lowest-order quadrupole moment induced by an applied field gradient. There are also mixed polarizabilities 
such as the third-rank dipole-quadrupole polarizability tensor A that describes the lowest-order response of 
the dipole moment to a field gradient and of the quadrupole moment to a dipolar field. All polarizabilities of 
order higher than dipole depend on the choice of origin. Experimental values are basically restricted to the 
dipole polarizability and hyperpolarizability [23, 24 and 25]. Ab initio calculations are an important source of 
both dipole and higher polarizabilities [20]; some recent examples include [26, 27].

A1.5.2.3 ELECTROSTATIC INTERACTIONS

The electrostatic potential generated by a molecule A at a distant point B can be expanded in inverse powers
of the distance r between B and the centre of mass (CM) of A. This series is called the multipole expansion
because the coefficients can be expressed in terms of the multipole moments of the molecule. With this
expansion in hand, it is straightforward to write the electrostatic interaction between molecule A and another
molecule with its CM at B as a multipole expansion. The formal expression [7, 28] for this electrostatic
interaction, in terms of 'T tensors', is intimidating to all but the experts. However, explicit expressions for
individual terms in this expansion are easily understood.

Consider the case of two neutral, linear, dipolar molecules, such as HCN and KCl, in a coordinate system with
its origin at the CM of molecule A and the z-axis aligned with the intermolecular vector r pointing from the
CM of A to the CM of B. The relative orientation of the two molecules is uniquely specified by their spherical
polar angles $\theta_A$, $\theta_B$ and the difference $\phi = \phi_A - \phi_B$ between their azimuthal angles. The leading
term in the multipole expansion of the electrostatic interaction energy is the dipole-dipole term

$$V_{dd}(r, \theta_A, \theta_B, \phi) = -\frac{\mu_A \mu_B}{4\pi\varepsilon_0 r^3}\,(2\cos\theta_A \cos\theta_B - \sin\theta_A \sin\theta_B \cos\phi) \tag{A1.5.12}$$

in which $\varepsilon_0$ is the vacuum permittivity, and $\mu_A$ and $\mu_B$ are the magnitudes of the dipole moments of A
and B. This expression is also applicable to the dipole-dipole interaction between any pair of neutral molecules
provided that the angles are taken to specify the relative orientation of the dipole moment vectors of the
molecules.

The leading term in the electrostatic interaction between a pair of linear, quadrupolar molecules, such as
HCCH and CO2, is

$$V_{qq} = \frac{3\,\Theta_A \Theta_B}{16\pi\varepsilon_0 r^5}\left[1 - 5\cos^2\theta_A - 5\cos^2\theta_B - 15\cos^2\theta_A \cos^2\theta_B + 2\,(4\cos\theta_A \cos\theta_B - \sin\theta_A \sin\theta_B \cos\phi)^2\right] \tag{A1.5.13}$$

in which $\Theta_A$ and $\Theta_B$ are the axial quadrupole moments of A and B. This expression is also applicable to the
quadrupole-quadrupole interaction between any pair of spherical or symmetric top molecules provided that
the angles are taken to specify the relative orientation of the axial component of the quadrupole moment
tensors of the molecules.

The leading term in the electrostatic interaction between the dipole moment of molecule A and the axial
quadrupole moment of a linear, spherical or symmetric top B is

$$V_{dq} = \frac{3\,\mu_A \Theta_B}{8\pi\varepsilon_0 r^4}\left[\cos\theta_A\,(3\cos^2\theta_B - 1) - \sin\theta_A \sin 2\theta_B \cos\phi\right]. \tag{A1.5.14}$$

Note the r dependence of these three terms: the dipole-dipole interaction varies as $r^{-3}$, the dipole-quadrupole
as $r^{-4}$ and the quadrupole-quadrupole as $r^{-5}$. In general, the interaction between a $2^\ell$-pole moment and a
$2^L$-pole moment varies as $r^{-(\ell+L+1)}$. Thus, the dipole-octopole interaction also varies as $r^{-5}$. At large
enough r, only the term involving the lowest-rank, non-vanishing multipole moment is important. Higher terms
begin to play a role as r decreases. The angular variation of the electrostatic interaction is much greater than
that of the induction and dispersion. Hence, electrostatic forces often determine the geometry of a van der Waals
complex even when they do not constitute the dominant contribution to the overall interaction.


At a fixed distance r, the angular factor in equation (A1.5.12) leads to the greatest attraction when the dipoles
are lined up in a linear head-to-tail arrangement, $\theta_A = \theta_B = 0$, whereas the linear tail-to-tail geometry,
$\theta_A = \pi$, $\theta_B = 0$, is the most repulsive. A head-to-tail, parallel arrangement, $\theta_A = \theta_B = \pi/2$,
$\phi = \pi$, is attractive but less so than the linear head-to-tail geometry. Nevertheless, if the molecules are
linear, the head-to-tail, parallel geometry may be more stable because it allows the molecules to get closer and
thus increases the $r^{-3}$ factor. For example, the HCN dimer takes the linear head-to-tail geometry in the gas
phase [29], but the crystal structure shows a parallel, head-to-tail packing [30].
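The ordering of the three dipole-dipole geometries described above can be confirmed by evaluating the angular factor of equation (A1.5.12) directly. The sketch below does so, and also converts one case to kJ/mol for two illustrative 3 D dipoles 4 Å apart; those two numbers are assumptions chosen for scale, not data from the text.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F m^-1
DEBYE = 3.33564e-30       # one debye in C m
AVOGADRO = 6.02214076e23  # mol^-1

def dd_angular(theta_a, theta_b, phi):
    """Angular factor of the dipole-dipole energy, overall minus sign included."""
    return -(2.0 * math.cos(theta_a) * math.cos(theta_b)
             - math.sin(theta_a) * math.sin(theta_b) * math.cos(phi))

def v_dd(r, theta_a, theta_b, phi, mu_a, mu_b):
    """Dipole-dipole energy in joules (SI inputs: metres, radians, C m)."""
    return mu_a * mu_b / (4.0 * math.pi * EPS0 * r**3) * dd_angular(theta_a, theta_b, phi)

geometries = {
    "linear head-to-tail": (0.0, 0.0, 0.0),
    "linear tail-to-tail": (math.pi, 0.0, 0.0),
    "parallel head-to-tail": (math.pi / 2, math.pi / 2, math.pi),
}
factors = {name: dd_angular(*angles) for name, angles in geometries.items()}

# Illustrative scale only: two 3 D dipoles, 4 angstrom apart, linear head-to-tail.
energy_kj_mol = v_dd(4.0e-10, 0.0, 0.0, 0.0, 3.0 * DEBYE, 3.0 * DEBYE) * AVOGADRO / 1000.0

for name, value in factors.items():
    print(f"{name:>22s}: {value:+.2f}")
print(f"example energy: {energy_kj_mol:.1f} kJ/mol")
```

The factors come out as -2, +2 and -1 in units of $\mu_A \mu_B / (4\pi\varepsilon_0 r^3)$, so the parallel arrangement needs only a modest reduction in r to compete with the linear one.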

For interactions between two quadrupolar molecules which have $\Theta_A$ and $\Theta_B$ of the same sign, at a fixed
separation r, the angular factor in equation (A1.5.13) leads to a planar, T-shaped structure, $\theta_A = 0$,
$\theta_B = \pi/2$, $\phi = 0$, being preferred. This geometry is often seen for nearly spherical quadrupolar molecules.
There are other planar ($\phi = 0$) configurations with $\theta_A = \pi/2 - \theta_B$ that are also attractive. A planar,
'slipped parallel' structure, $\theta_A = \theta_B \approx \pi/4$, $\phi = 0$, is often preferred by planar molecules, and long
and narrow molecules, because it allows them to approach closer thereby increasing the radial factor. For example,
benzene, naphthalene and many other planar quadrupolar molecules have crystal structures consisting of stacks of
tilted parallel molecules.

For interactions between two quadrupolar molecules which have $\Theta_A$ and $\Theta_B$ of the opposite sign, at a fixed
separation r, the angular factor in equation (A1.5.13) leads to a linear structure, $\theta_A = \theta_B = 0$, being the
most attractive. Linear molecules may also prefer a C2v rectangular or non-planar 'cross' arrangement with
$\theta_A = \theta_B = \pi/2$, which allows them to approach closer and increase the radial factor.
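The geometric preferences just described follow from the sign of the bracketed angular factor in equation (A1.5.13). As a rough check, and assuming the standard form of that factor, the sketch below tabulates it for the named geometries. A negative value is attractive when the two quadrupole moments have the same sign, and a positive value is attractive when they have opposite signs.

```python
import math

def qq_angular(theta_a, theta_b, phi):
    """Bracketed angular factor of the quadrupole-quadrupole energy."""
    ca, cb = math.cos(theta_a), math.cos(theta_b)
    sa, sb = math.sin(theta_a), math.sin(theta_b)
    return (1.0 - 5.0 * ca**2 - 5.0 * cb**2 - 15.0 * ca**2 * cb**2
            + 2.0 * (4.0 * ca * cb - sa * sb * math.cos(phi))**2)

half_pi = math.pi / 2
geometries = {
    "T-shaped": (0.0, half_pi, 0.0),
    "slipped parallel": (math.pi / 4, math.pi / 4, 0.0),
    "linear": (0.0, 0.0, 0.0),
    "rectangular": (half_pi, half_pi, 0.0),
    "cross": (half_pi, half_pi, half_pi),
}
factors = {name: qq_angular(*angles) for name, angles in geometries.items()}

for name, value in factors.items():
    print(f"{name:>16s}: {value:+.2f}")
```

The T-shaped value (-4) is the most negative of those listed and the linear value (+8) the most positive, consistent with the T-shape being favoured for like-signed quadrupoles and the linear arrangement for opposite-signed ones.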

Although such structural arguments based purely on electrostatics are greatly appealing, they are
also grossly over-simplified because all other interactions, such as exchange-repulsion and dispersion, are
neglected, and there are serious shortcomings of the multipole expansion at smaller intermolecular
separations.

A1.5.2.4 INDUCTION INTERACTIONS 

If the long-range interaction between a pair of molecules is treated by quantum mechanical perturbation
theory, then the electrostatic interactions considered in section A1.5.2.3 arise in first order, whereas induction
and dispersion effects appear in second order. The multipole expansion of the induction energy in its full
generality [7, 28] is quite complex. Here we consider only explicit expressions for individual terms in the
multipole expansion that can be understood readily.

Consider the interaction of a neutral, dipolar molecule A with a neutral, S-state atom B. There are no
electrostatic interactions because all the multipole moments of the atom are zero. However, the electric field
of A distorts the charge distribution of B and induces multipole moments in B. The leading induction term is
the interaction between the permanent dipole moment of A and the dipole moment induced in B. The latter
can be expressed in terms of the polarizability of B, see equation (A1.5.9), and the dipole-induced-dipole
interaction is given by

$$V_{d,id} = -\frac{\mu_A^2\, \bar{\alpha}_B}{2\,(4\pi\varepsilon_0)^2 r^6}\,(3\cos^2\theta_A + 1) \tag{A1.5.15}$$

in which $\theta_A$ is the angle between the dipole moment vector of A and the intermolecular vector, and
$\bar{\alpha}_B$ is the mean dipole polarizability of B. Since B is a spherical atom, its polarizability tensor is
diagonal with the three diagonal components equal to one another and to the mean.

If molecule A is a linear, spherical or symmetric top that has a zero dipole moment like benzene, then the
leading induction term is the quadrupole-induced-dipole interaction

$$V_{q,id} = -\frac{9\,\Theta_A^2\, \bar{\alpha}_B}{8\,(4\pi\varepsilon_0)^2 r^8}\,(4\cos^4\theta_A + \sin^4\theta_A) \tag{A1.5.16}$$

in which $\theta_A$ is the angle between the axial component of the quadrupole moment tensor of A and the
intermolecular vector.

If the molecule is an ion bearing a charge $Q_A$, then the leading induction term is the isotropic,
charge-induced-dipole interaction

$$V_{c,id} = -\frac{Q_A^2\, \bar{\alpha}_B}{2\,(4\pi\varepsilon_0)^2 r^4}. \tag{A1.5.17}$$

For example, this is the dominant long-range interaction between a neon atom and a fluoride anion F⁻.

Note the r dependence of these terms: the charge-induced-dipole interaction varies as $r^{-4}$, the
dipole-induced-dipole as $r^{-6}$ and the quadrupole-induced-dipole as $r^{-8}$. In general, the interaction between a
permanent $2^{\ell}$-pole moment and an induced $2^{L}$-pole moment varies as $r^{-2(\ell+L+1)}$. At large enough r, only the
leading term is important, with higher terms increasing in importance as r decreases. The induction forces are
clearly non-additive because a third molecule will induce another set of multipole moments in the first two, and
these will then interact. Induction forces are almost never dominant since dispersion is usually more important.
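
As a small illustration of these power laws, the following Python sketch evaluates the exponent $-2(\ell+L+1)$ and the
charge-induced-dipole energy. It assumes atomic units, so that no electrical constants appear, and it assumes
that equation (A1.5.17) has the form $-Q_A^2\alpha_B/(2r^4)$. The function names are hypothetical and chosen only for this
example.

```python
def induction_exponent(l: int, L: int) -> int:
    """Power n in r**n for a permanent 2**l-pole inducing a 2**L-pole moment."""
    return -2 * (l + L + 1)


def charge_induced_dipole(q: float, alpha: float, r: float) -> float:
    """Assumed form of equation (A1.5.17) in atomic units: -q**2 * alpha / (2 r**4)."""
    return -q * q * alpha / (2.0 * r ** 4)


# Leading induction terms for an induced dipole (L = 1).
for name, l in (("charge", 0), ("dipole", 1), ("quadrupole", 2)):
    print(f"{name}-induced-dipole varies as r^{induction_exponent(l, 1)}")
```

Doubling the separation reduces the charge-induced-dipole energy by a factor of 16, which is why this term
outlasts the others at long range when one partner is an ion.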

A1.5.2.5 DISPERSION INTERACTIONS

The most important second-order forces are dispersion forces. London [3, 31, 32] showed that they are caused
by a correlation of the electron distribution in one molecule with that in the other, and pointed out that the
electrons contributing most strongly to these forces are the same as those responsible for the dispersion of
light. Since then, these forces have been called London or dispersion forces. Dispersion interactions are
always present, even between S-state atoms such as neon and krypton, although there are no electrostatic or
induction interaction terms since all the multipole moments of both species are zero.

Dispersion forces cannot be explained classically but a semiclassical description is possible. Consider the 
electronic charge cloud of an atom to be the time average of the motion of its electrons around the nucleus. 
The average cloud is spherically symmetric with respect to the nucleus, but at any instant of time there may be 
a polarization of charge giving rise to an instantaneous dipole moment. This instantaneous dipole induces a 
corresponding instantaneous dipole in the other atom and there is an interaction between the instantaneous 
dipoles. The dipole of either atom averages to zero over time, but the interaction energy does not because the 
instantaneous and induced dipoles are correlated and they stay in phase. The average interaction energy falls
off as $r^{-6}$ just as the dipole-induced-dipole energy of equation (A1.5.15). Higher-order instantaneous multipole
moments are also involved, giving rise to higher-order dispersion terms. This picture is visually appealing but
it should not be taken too literally. The actual effect is not time dependent in the sense of classical
fluctuations taking place.

The multipole expansion of the dispersion interaction can be written as 


$$V(r) = -\frac{C_6}{r^6} - \frac{C_8}{r^8} - \frac{C_{10}}{r^{10}} - \cdots \qquad \text{(A1.5.18)}$$

where the dispersion coefficients $C_6$, $C_8$ and $C_{10}$ are positive, and depend on the electronic properties of the
interacting species. The first term is the interaction between the induced-dipole moments on the atoms, the
second is the induced-dipole-induced-quadrupole term and the third consists of the
induced-dipole-induced-octopole term as well as the interaction between induced quadrupoles. In general, the
interaction between an induced $2^{\ell}$-pole moment and an induced $2^{L}$-pole moment varies as $r^{-2(\ell+L+1)}$. The
dispersion coefficients are constants for atoms but, for non-spherical molecules, they depend upon the five
angles describing the relative orientation of the molecules. For example, the dispersion coefficients for the
interactions between an S-state atom and a $^1\Sigma$-state diatomic molecule can be expressed as

$$C_{2n}(\theta) = \sum_{L=0}^{n-2} C_{2n}^{2L}\,P_{2L}(\cos\theta) \qquad \text{(A1.5.19)}$$

where the $C_{2n}^{2L}$ are dispersion constants, the $P_{2L}(\cos\theta)$ are Legendre polynomials, and $\theta$ is the angle between
the symmetry axis of the diatomic and the intermolecular vector. Note that $C_{2n}^{0}$ is the spherical average of
$C_{2n}(\theta)$ and is the appropriate quantity to use in equation (A1.5.18) if the orientation dependence is being
neglected. Purely anisotropic dispersion terms varying as $r^{-7}, r^{-9}, \ldots$ arise if at least one of the interacting
species lacks inversion symmetry.
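
The statement that the L = 0 constant is the spherical average can be checked numerically. The sketch below
assumes the Legendre expansion of equation (A1.5.19) in the form $C_{2n}(\theta) = \sum_L C_{2n}^{2L} P_{2L}(\cos\theta)$; the numerical
constants are hypothetical values chosen for illustration, not data for any real atom-diatom pair.

```python
import math


def legendre(n: int, x: float) -> float:
    """Legendre polynomial P_n(x) from the three-term (Bonnet) recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def c2n(theta: float, constants) -> float:
    """C_2n(theta) = sum over L of constants[L] * P_2L(cos theta)."""
    x = math.cos(theta)
    return sum(c * legendre(2 * L, x) for L, c in enumerate(constants))


def spherical_average(constants, steps: int = 2000) -> float:
    """(1/2) * integral over theta of C_2n(theta) sin(theta), midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += c2n(theta, constants) * math.sin(theta)
    return 0.5 * total * h


# Hypothetical C6 constants [C6^0, C6^2] in arbitrary units.
C6_CONSTANTS = [10.0, 2.0]
print(c2n(0.0, C6_CONSTANTS), c2n(math.pi / 2, C6_CONSTANTS), spherical_average(C6_CONSTANTS))
```

The collinear and perpendicular values differ, but the angular average returns the isotropic constant because
every Legendre polynomial of non-zero order integrates to zero over the sphere.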

Perturbation theory yields a sum-over-states formula for each of the dispersion coefficients. For example, the 
isotropic $C_6^{AB}$ coefficient for the interaction between molecules A and B is given by

$$C_6^{AB} = \frac{3e^4\hbar^4}{2m_e^2} \sum_{m\neq 0}\sum_{n\neq 0} \frac{f_{Am}\,f_{Bn}}{\Delta E_{Am}\,\Delta E_{Bn}\,(\Delta E_{Am} + \Delta E_{Bn})} \qquad \text{(A1.5.20)}$$

in which $\hbar$ is the Planck-Dirac constant, $\Delta E_{Am} = E_{Am} - E_{A0}$ is the excitation energy from the ground state m
= 0 to state m for molecule A and $f_{Am}$ is the corresponding dipole oscillator strength averaged over
degenerate final states. Similarly, the sum-over-states formula for the mean, frequency-dependent,
polarizability can be written as

$$\alpha_A(\omega) = \frac{e^2}{m_e} \sum_{m\neq 0} \frac{f_{Am}}{\omega_{Am}^2 - \omega^2} \qquad \text{(A1.5.21)}$$

where $\omega_{Am} = \Delta E_{Am}/\hbar$ is the mth excitation frequency. An important advance consisted in the realization [33,
34 and 35] that use of the Feynman identity

$$\frac{1}{ab(a+b)} = \frac{2}{\pi} \int_0^\infty \frac{\mathrm{d}\omega}{(a^2+\omega^2)(b^2+\omega^2)}, \qquad a > 0,\ b > 0 \qquad \text{(A1.5.22)}$$

together with equation (A1.5.20) and equation (A1.5.21) leads to

$$C_6^{AB} = \frac{3\hbar}{\pi} \int_0^\infty \alpha_A(\mathrm{i}\omega)\,\alpha_B(\mathrm{i}\omega)\,\mathrm{d}\omega \qquad \text{(A1.5.23)}$$

where $\alpha_A(\mathrm{i}\omega)$ is the analytic continuation of the dynamic dipole polarizability to the imaginary axis. The
significance of equation (A1.5.23) is that it expresses an interaction coefficient in terms of properties of the
individual, interacting molecules. The anisotropic components of $C_6$ can be written as similar integrals
involving $\Delta\alpha(\mathrm{i}\omega)$, and the higher dispersion coefficients as integrals involving components of the
higher-order, dynamic polarizability tensors at imaginary frequency.
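
The equivalence of the sum-over-states and imaginary-frequency routes to $C_6$ can be verified numerically. The
sketch below works in atomic units ($\hbar = e = m_e = 1$), in which the sum-over-states prefactor is assumed to
reduce to 3/2 and the integral prefactor to $3/\pi$. The oscillator strengths and excitation energies are
hypothetical, and the quadrature (a midpoint rule after mapping the infinite range onto the unit interval) is
simply one convenient choice.

```python
import math


def alpha_imag(omega, spectrum):
    """alpha(i*omega) = sum_m f_m / (dE_m**2 + omega**2), atomic units."""
    return sum(f / (de * de + omega * omega) for f, de in spectrum)


def c6_sum_over_states(spec_a, spec_b):
    """(3/2) * sum_mn f_Am f_Bn / (dE_Am dE_Bn (dE_Am + dE_Bn))."""
    return 1.5 * sum(fa * fb / (ea * eb * (ea + eb))
                     for fa, ea in spec_a for fb, eb in spec_b)


def c6_casimir_polder(spec_a, spec_b, steps=4000):
    """(3/pi) * integral of alpha_A(i w) alpha_B(i w) dw over [0, inf).

    The substitution w = t / (1 - t) maps the range onto [0, 1).
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        w = t / (1.0 - t)
        jacobian = 1.0 / (1.0 - t) ** 2
        total += alpha_imag(w, spec_a) * alpha_imag(w, spec_b) * jacobian
    return 3.0 / math.pi * total * h


# Hypothetical (oscillator strength, excitation energy) pairs, atomic units.
SPEC_A = [(0.6, 0.8), (1.4, 1.5)]
SPEC_B = [(1.0, 0.5)]
print(c6_sum_over_states(SPEC_A, SPEC_B), c6_casimir_polder(SPEC_A, SPEC_B))
```

The two numbers agree to within the quadrature error. The practical advantage of the integral form is that each
polarizability can be computed or fitted once per molecule and then reused for every pair.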

Many methods for the evaluation of $C_6$ from equation (A1.5.20) use moments of the dipole oscillator strength
distribution (DOSD) defined, for molecule A, by

$$S_A(k) = \left(\frac{a_0}{e^2}\right)^{k} \sum_{m\neq 0} f_{Am}\,(\Delta E_{Am})^{k} \quad \text{for } k = 2, 1, 0, -1, -2, \ldots \qquad \text{(A1.5.24)}$$

These moments are related to many physical properties. The Thomas-Kuhn-Reiche sum rule says that S(0)
equals the number of electrons in the molecule. Other sum rules [36] relate S(2), S(1) and S(-1) to ground
state expectation values. The mean static dipole polarizability is $\alpha(0) = e^2 a_0^2 S(-2)/E_h$, with $E_h$ the Hartree
energy. The Cauchy expansion of the refractive index n at low frequencies $\omega$ is given by

$$n^2 - 1 = K_0\left[S(-2) + \omega^2 K_1 S(-4) + \omega^4 K_2 S(-6) + \cdots\right] \qquad \text{(A1.5.25)}$$

where the $K_i$ are known constants. One approach is to use experimental photoabsorption, refractive index and
Verdet constant data, together with known sum rules to construct a constrained DOSD from which dipole
properties including $C_6$ can be calculated. This approach was pioneered by Margenau [5, 37], extended by
Dalgarno and coworkers [38, 39], and refined and exploited by Meath and coworkers [40, 41 and 42] who also
generalized it to anisotropic properties [43, 44]. Many methods for bounding $C_6$ in terms of a few DOSD moments
have been explored, and the best of these have been identified by an extensive comparative study [45]. Ab initio
calculations are the only route to the higher-order dispersion coefficients, and Wormer and his colleagues [46,
47, 48, 49 and 50] have led the field in this area. The dimensionless ratio $C_6 C_{10}/C_8^2$ is predicted to be a
constant for all interactions by simple models [51], and this ratio still serves as a useful check on ab initio
computations [48]. Dispersion coefficients of even higher order can be estimated from simple models as well
[52, 53].

The dispersion coefficient $C_6^{AB}$ for interactions between molecules A and B can be estimated to an average
accuracy of 0.5% [45] from those of the A-A and B-B interactions using the Moelwyn-Hughes [54]
combining rule:

$$C_6^{AB} = \frac{2\,\alpha_A\,\alpha_B\,C_6^{AA}\,C_6^{BB}}{\alpha_A^2\,C_6^{BB} + \alpha_B^2\,C_6^{AA}} \qquad \text{(A1.5.26)}$$

where $\alpha_A$ and $\alpha_B$ are the static dipole polarizabilities of A and B, respectively. This rule has a sound
theoretical basis [55, 56].
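
One way to see why the rule works is to note that it is exact when each molecule is represented by a single
effective excitation energy (the one-term London model). The sketch below assumes the combining rule in the
form $2\alpha_A\alpha_B C_6^{AA} C_6^{BB}/(\alpha_A^2 C_6^{BB} + \alpha_B^2 C_6^{AA})$ and uses hypothetical polarizabilities and excitation
energies in atomic units; it is an illustration, not a reproduction of the accuracy study cited above.

```python
def c6_combining_rule(alpha_a, alpha_b, c6_aa, c6_bb):
    """Unlike-pair C6 from like-pair C6 values and static polarizabilities."""
    numerator = 2.0 * alpha_a * alpha_b * c6_aa * c6_bb
    denominator = alpha_a ** 2 * c6_bb + alpha_b ** 2 * c6_aa
    return numerator / denominator


def london_c6(alpha_a, e_a, alpha_b, e_b):
    """One-term London model: (3/2) a_A a_B E_A E_B / (E_A + E_B)."""
    return 1.5 * alpha_a * alpha_b * e_a * e_b / (e_a + e_b)


# Hypothetical one-term models for two species (atomic units).
ALPHA_A, E_A = 1.4, 0.9
ALPHA_B, E_B = 11.0, 0.6

c6_aa = london_c6(ALPHA_A, E_A, ALPHA_A, E_A)
c6_bb = london_c6(ALPHA_B, E_B, ALPHA_B, E_B)
print(c6_combining_rule(ALPHA_A, ALPHA_B, c6_aa, c6_bb),
      london_c6(ALPHA_A, E_A, ALPHA_B, E_B))
```

For real molecules the agreement is no longer exact, because several excitations contribute, but the rule still
uses only quantities that are available for the separate species.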

A1.5.2.6 MANY-BODY LONG-RANGE FORCES

The induction energy is inherently non-additive. In fact, the non-additivity is displayed elegantly in a 
distributed polarizability approach [28]. Non-additive induction energies have been found to stabilize what 
appear to be highly improbable crystal structures of the alkaline earth halides [57]. 

In the third order of long-range perturbation theory for a system of three atoms A, B and C, the leading non- 
additive dispersion term is the Axilrod-Teller-Muto triple-dipole interaction [58, 59] 

$$V_{ddd} = C_9\,\frac{1 + 3\cos\theta_A\cos\theta_B\cos\theta_C}{r_{AB}^3\,r_{BC}^3\,r_{CA}^3} \qquad \text{(A1.5.27)}$$

where $r_{AB}$, $r_{BC}$ and $r_{CA}$ are the sides of the triangle formed by the atoms, and $\theta_A$, $\theta_B$ and $\theta_C$ are its internal
angles, and the $C_9$ coefficient can be written [60] in terms of the dynamic polarizabilities of the monomers as

$$C_9^{ABC} = \frac{3\hbar}{\pi} \int_0^\infty \alpha_A(\mathrm{i}\omega)\,\alpha_B(\mathrm{i}\omega)\,\alpha_C(\mathrm{i}\omega)\,\mathrm{d}\omega \qquad \text{(A1.5.28)}$$

Hence, the same techniques used to calculate $C_6$ are also used for $C_9$. Note that equation (A1.5.27) has a
geometrical factor whose sign depends upon the geometry, and that, unlike the case of the two-body
dispersion interaction, the triple-dipole dispersion energy has no minus sign in front of the positive coefficient
$C_9$. For example, for an equilateral triangle configuration the triple-dipole dispersion is repulsive and varies
as $+(11/8)\,C_9\,r^{-9}$. There are strongly anisotropic, non-additive dispersion interactions arising from higher-order
polarizabilities as well [61], and the relevant coefficients for rare gas atoms have been calculated ab initio [48].
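
The geometry dependence of the triple-dipole term is easy to explore numerically. The sketch below assumes the
Axilrod-Teller-Muto form $C_9(1 + 3\cos\theta_A\cos\theta_B\cos\theta_C)/(r_{AB} r_{BC} r_{CA})^3$ and obtains the internal angles
from the side lengths by the law of cosines; the function name and the unit value of $C_9$ are illustrative only.

```python
def triple_dipole(c9, r_ab, r_bc, r_ca):
    """Triple-dipole energy for a triangle with the given side lengths."""
    # Law of cosines: the angle at each atom is opposite the side joining
    # the other two atoms.
    cos_a = (r_ab ** 2 + r_ca ** 2 - r_bc ** 2) / (2.0 * r_ab * r_ca)
    cos_b = (r_ab ** 2 + r_bc ** 2 - r_ca ** 2) / (2.0 * r_ab * r_bc)
    cos_c = (r_bc ** 2 + r_ca ** 2 - r_ab ** 2) / (2.0 * r_bc * r_ca)
    geometry = 1.0 + 3.0 * cos_a * cos_b * cos_c
    return c9 * geometry / (r_ab * r_bc * r_ca) ** 3


# Equilateral triangle of side 2: repulsive, (11/8) * C9 / r**9.
print(triple_dipole(1.0, 2.0, 2.0, 2.0))
# Collinear arrangement A-B-C with unit spacing: attractive.
print(triple_dipole(1.0, 1.0, 1.0, 2.0))
```

The sign change between the two geometries is the reason the three-body dispersion energy can either raise or
lower the energy of a cluster relative to the pairwise sum.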


A1.5.3 SHORT- AND INTERMEDIATE-RANGE FORCES 

A1.5.3.1 EXCHANGE PERTURBATION THEORIES

The perturbation theory described in section A1.5.2.1 fails completely at short range. One reason for the
failure is that the multipole expansion breaks down, but this is not a fundamental limitation because it is 
feasible to construct a 'non-expanded', long-range, perturbation theory which does not use the multipole 
expansion [6]. A more profound reason for the failure is that the polarization approximation of zero overlap is 
no longer valid at short range. 

When the overlap between the wavefunctions of the interacting molecules cannot be neglected, the zeroth-order
wavefunction must be antisymmetrized with respect to all the electrons. The requirement of
antisymmetrization brings with it some difficult problems. If electrons have been assigned to individual molecules
in order to partition the Hamiltonian into an unperturbed part $\mathcal{H}^0$ and a perturbation $\lambda\mathcal{H}'$, as described in
section A1.5.2.1, then these parts do not commute with the antisymmetrization operator $\mathcal{A}_{AB}$ for the full
system

$$[\mathcal{A}_{AB}, \mathcal{H}^0] \neq 0, \qquad [\mathcal{A}_{AB}, \lambda\mathcal{H}'] \neq 0. \qquad \text{(A1.5.29)}$$

On the other hand, the system Hamiltonian $\mathcal{H}_{AB} = \mathcal{H}^0 + \lambda\mathcal{H}'$ is symmetric with respect to all the electrons and
commutes with $\mathcal{A}_{AB}$

$$[\mathcal{A}_{AB}, \mathcal{H}^0 + \lambda\mathcal{H}'] = 0. \qquad \text{(A1.5.30)}$$

Combining these commutation relations, we find

$$[\mathcal{A}_{AB}, \mathcal{H}^0] = -[\mathcal{A}_{AB}, \lambda\mathcal{H}'] \neq 0 \qquad \text{(A1.5.31)}$$

which indicates that a zeroth-order quantity is equal to a non-zero, first-order quantity. Unfortunately, this 
means that there will be no unique definition of the order of a term in our perturbation expansion. Moreover, 
antisymmetrized products of the wavefunctions of A and B will be non-orthogonal, and therefore they will not 
be eigenfunctions of any Hermitian, zeroth-order Hamiltonian. 

Given these difficulties, it is natural to ask whether we really need to antisymmetrize the zeroth-order 
wavefunction. If we start with the product function, can we reasonably expect that the system wavefunction 
obtained by perturbation theory will converge to a properly antisymmetric one? Unfortunately, in that case, 
the series barely converges [62, 63]. Moreover, there are an infinite number of non-physical states with
bosonic character that lie below the physical ground state [64] for most systems of interest, namely all those
containing at least one atom with atomic number greater than two [13]. Claverie [64] has argued that if
perturbation theory converges at all, it will converge to one of these unphysical states.

Clearly, standard Rayleigh-Schrödinger perturbation theory is not applicable and other perturbation methods
have to be devised. Excellent surveys of the large and confusing variety of methods, usually called 'exchange 
perturbation theories', that have been developed are available [28, 65]. Here it is sufficient to note that the 
methods can be classified as either 'symmetric' or 'symmetry-adapted'. Symmetric methods start with 
antisymmetrized product functions in zeroth order and deal with the non-orthogonality problem in various 
ways. Symmetry-adapted methods start with non-antisymmetrized product functions and deal with the 
antisymmetry problem in some other way, such as antisymmetrization at each order of perturbation theory. 

A further difficulty arises because the exact wavefunctions of the isolated molecules are not known, except for 
one-electron systems. A common starting point is the Hartree-Fock wavefunctions of the individual 
molecules. It is then necessary to include the effects of intramolecular electron correlation by considering 
them as additional perturbations. Jeziorski and coworkers [66] have developed and computationally 
implemented a triple perturbation theory of the symmetry-adapted type. They have applied their method, 
dubbed SAPT, to many interactions with more success than might have been expected given the fundamental 
doubts [67] raised about the method. SAPT is currently both useful and practical. A recent application [68] to
the CO₂ dimer illustrates what can be achieved with SAPT, and is a rich source of references to previous
SAPT work.

A1.5.3.2 FIRST-ORDER INTERACTIONS 

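In all methods, the first-order interaction energy is just the difference between the expectation value of the
system Hamiltonian for the antisymmetrized product function and the zeroth-order energy

$$E^{(1)} = \frac{\langle \mathcal{A}_{AB}\Psi_A\Psi_B | \mathcal{H}_{AB} | \mathcal{A}_{AB}\Psi_A\Psi_B \rangle}{\langle \mathcal{A}_{AB}\Psi_A\Psi_B | \mathcal{A}_{AB}\Psi_A\Psi_B \rangle} - (E_A + E_B) \qquad \text{(A1.5.32)}$$

in which $\Psi_A$ and $\Psi_B$ are the wavefunctions, and $E_A$ and $E_B$ the ground-state energies, of isolated molecules A
and B. An electrostatic part is usually separated out from the first-order energy, also called the
Heitler-London energy, and the remainder is called the exchange-repulsion part:

$$E^{(1)} = E^{(1)}_{\mathrm{C}} + E^{(1)}_{\mathrm{er}} \qquad \text{(A1.5.33)}$$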


The 'non-expanded' form of the electrostatic or 'Coulomb' energy is

$$E^{(1)}_{\mathrm{C}} = \int\!\!\int \frac{\rho_A(\mathbf{r}_1)\,\rho_B(\mathbf{r}_2)}{|\mathbf{r}_1 - \mathbf{r}_2|}\,\mathrm{d}\tau_1\,\mathrm{d}\tau_2 \qquad \text{(A1.5.34)}$$

where $\rho_A$ and $\rho_B$ are the total (nuclear plus electronic) charge densities of A and B, respectively. A multipole
expansion of equation (A1.5.34) leads to the long-range electrostatic energy discussed in section A1.5.2.3.
The difference between the converged multipole expansion of the electrostatic energy and $E^{(1)}_{\mathrm{C}}$ is sometimes
called the first-order penetration energy. The exchange-repulsion is often simply called the exchange energy.
For Hartree-Fock monomer wavefunctions, $E^{(1)}_{\mathrm{er}}$ can be divided cleanly [69] into attractive exchange and
dominant repulsion parts. The exchange part arises because the electrons of one molecule can extend over the
entire system, whereas the repulsion arises because the Pauli principle does not allow electrons of the same
spin to be in the same place.

Figure A1.5.2 shows $E^{(1)}_{\mathrm{C}}$ and $E^{(1)}_{\mathrm{er}}$ for the He-He interaction computed from accurate monomer wavefunctions
[70]. Figure A1.5.3 shows that, as in interactions between other species, the first-order energy $E^{(1)}$ for He-He
decays exponentially with interatomic distance. It can be fitted [70] within 0.6% by a function of the form


$$E^{(1)} = (A/r)\,\mathrm{e}^{-br - cr^2} \qquad \text{(A1.5.35)}$$

where A, b, c are fitted parameters.

Figure A1.5.2 First-order Coulomb (○) and exchange-repulsion (□) energies for He-He. Based on data from
Komasa and Thakkar [70].

Figure A1.5.3 First-order interaction energy for He-He. Based on data from Komasa and Thakkar [70].

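The exchange-repulsion energy is approximately proportional to the overlap of the charge densities of the
interacting molecules [71, 72 and 73]

$$E^{(1)}_{\mathrm{er}} \approx K\left[\int \rho_A(\mathbf{r})\,\rho_B(\mathbf{r})\,\mathrm{d}\mathbf{r}\right]^{n} \qquad \text{(A1.5.36)}$$

where K is a proportionality constant and $n \approx 1$.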


A1.5.3.3 SECOND-ORDER INTERACTIONS 


The details of the second-order energy depend on the form of exchange perturbation theory used. Most known 
results are numerical. However, there are some common features that can be described qualitatively. The 
short-range induction and dispersion energies appear in a non-expanded form and the differences between 
these and their multipole expansion counterparts are called penetration terms.

The non-expanded dispersion energy can be written as

$$V_{\mathrm{disp}}(r) = -f_6(r)\frac{C_6}{r^6} - f_8(r)\frac{C_8}{r^8} - f_{10}(r)\frac{C_{10}}{r^{10}} - \cdots \qquad \text{(A1.5.37)}$$

where the $f_6(r)$, $f_8(r)$, ... are 'damping' functions. The damping functions tend to unity as $r \to \infty$ so that the
long-range form of equation (A1.5.18) is recovered. As $r \to 0$, the damping functions tend to zero as $r^n$ so that
they suppress the spurious $r^{-n}$ singularity of the undamped dispersion, equation (A1.5.18). Meath and
coworkers [74, 75, 76, 77, 78 and 79] have performed ab initio calculations of these damping functions for
interactions between small species. The general form is shown in figure A1.5.4. Observe that the distance at
which the damping functions begin to decrease significantly below unity increases with n. The orientation
dependence of the damping functions is not known. Similar damping functions also arise for the induction
energy [74, 76, 79].
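
The qualitative behaviour of damped dispersion can be illustrated with a simple analytic model. The sketch
below does not use the ab initio damping functions discussed above; instead it substitutes the widely used
Tang-Toennies incomplete-gamma form, $f_n(r) = 1 - \mathrm{e}^{-br}\sum_{k=0}^{n}(br)^k/k!$, which vanishes as $r^{n+1}$ at small r, one
power faster than is strictly needed to cancel the singularity. The range parameter b and the dispersion
coefficients are hypothetical values.

```python
import math


def tang_toennies(n: int, b: float, r: float) -> float:
    """Model damping function f_n(r); tends to 1 at large r and 0 at small r."""
    x = b * r
    partial_sum = sum(x ** k / math.factorial(k) for k in range(n + 1))
    return 1.0 - math.exp(-x) * partial_sum


def damped_dispersion(r: float, b: float, coefficients: dict) -> float:
    """-sum_n f_n(r) C_n / r**n for coefficients given as {n: C_n}."""
    return -sum(tang_toennies(n, b, r) * c / r ** n
                for n, c in coefficients.items())


# Hypothetical coefficients in atomic units, for illustration only.
COEFFS = {6: 6.5, 8: 124.0, 10: 3286.0}
for r in (1.0, 3.0, 6.0, 12.0):
    undamped = -sum(c / r ** n for n, c in COEFFS.items())
    print(r, damped_dispersion(r, 2.0, COEFFS), undamped)
```

At large separations the damped and undamped energies coincide, while at short range the damped energy stays
finite. At any fixed r the model gives $f_{10} < f_8 < f_6$, in line with the observation that damping sets in at
larger distances for larger n.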



Figure A1.5.4 Dispersion damping functions, $f_6$, $f_8$ and $f_{10}$, for H-H based on data from [74].


A 'charge transfer' contribution is often identified in perturbative descriptions of intermolecular forces. This, 
however, is not a new effect but a part of the short-range induction energy. It is possible to separate the charge 
transfer part from the rest of the induction energy [80]. It turns out to be relatively small and often negligible. 
Stone [28] has explained clearly how charge transfer has often been a source of confusion and error.

A1.5.3.4 SUPERMOLECULE CALCULATIONS

The conceptually simplest way to calculate potential energy surfaces for weakly interacting species is to treat
the interacting system AB as a 'supermolecule', use the Schrödinger equation to compute its energy as a
function of the relative coordinates of the interacting molecules, and subtract off similarly computed energies
of the isolated molecules. This scheme permits one to use any available method for solving the Schrödinger
equation.

Unfortunately, the supermolecule approach [81, 82] is full of technical difficulties, which stem chiefly from 
the very small magnitude of the interaction energy relative to the energy of the supermolecule. Even today, a 
novice would be ill-advised to attempt such a computation using one of the 'black-box' computer programs 
available for performing ab initio calculations. 


That said, the remarkable advances in computer hardware have made ab initio calculations feasible for small
systems, provided that various technical details are carefully treated. A few examples of recent computations
include potential energy surfaces for He-He [83], Ne-Ne and Ar-Ar [84], Ar-H₂, Ar-HF and Ar-NH₃ [85],
N₂-He [86, 87], He-F⁻ and Ne-F⁻ [88]. Density-functional theory [89] is currently unsuitable for the
calculation of van der Waals interactions [90], but the situation could change.


A1.5.3.5 MANY-BODY SHORT-RANGE FORCES

A few ab initio calculations are the main source of our current, very meagre knowledge of non-additive
contributions to the short-range energy [91]. It is unclear whether the short-range non-additivity is more or
less important than the long-range, dispersion non-additivity in the rare-gas solids [28, 92].


A1.5.4 EXPERIMENTAL INFORMATION 

Despite the recent successes of ab initio calculations, many of the most accurate potential energy surfaces for 
van der Waals interactions have been obtained by fitting to a combination of experimental and theoretical 
data. The future is likely to see many more potential energy surfaces obtained by starting with an ab initio 
surface, fitting it to a functional form and then allowing it to vary by small amounts so as to obtain a good fit 
to many experimental properties simultaneously; see, for example, a recent study on 'morphing' an ab initio
potential energy surface for Ne-HF [93].

This section discusses how spectroscopy, molecular beam scattering, pressure virial coefficients, 
measurements on transport phenomena and even condensed phase data can help determine a potential energy 
surface. 

A1.5.4.1 SPECTROSCOPY 

Spectroscopy is the most important experimental source of information on intermolecular interactions. A wide 
range of spectroscopic techniques is being brought to bear on the problem of weakly bound or 'van der 
Waals' complexes [94, 95]. Molecular beam microwave spectroscopy, pioneered by Klemperer and refined by 
Flygare, has been used to determine the microwave spectra of a large number of weakly bound complexes and 
obtain structural information averaged over the vibrational ground state. With the development of tunable
far-infrared lasers and sophisticated detectors, far-infrared 'vibration-rotation-tunnelling' spectroscopy has
enabled Saykally and others to measure data that probe portions of the potential energy surface further from
the minimum. Other techniques including vacuum ultraviolet spectroscopy and conventional gas-phase absorption
spectroscopy with very long path lengths have also been used.

Spectroscopic data for a complex formed from two atoms can be inverted by the Rydberg-Klein-Rees
procedure to determine the interatomic potential in the region probed by the data. The classical turning points
$r_L$ and $r_R$ corresponding to a specific energy level E(v,J) with vibrational and rotational quantum numbers v
and J can be determined from a knowledge of all the vibrational and rotational energy level spacings between
the bottom of the well and the given energy level. The standard equations are [96]

$$f(v,J) = r_R - r_L = \frac{2\hbar}{\sqrt{2\mu}} \int_{v_{\min}}^{v} \frac{\mathrm{d}v'}{[E(v,J) - E(v',J)]^{1/2}} \qquad \text{(A1.5.38)}$$

and

$$g(v,J) = \frac{1}{r_L} - \frac{1}{r_R} = \frac{2\sqrt{2\mu}}{\hbar} \int_{v_{\min}}^{v} \frac{B(v',J)\,\mathrm{d}v'}{[E(v,J) - E(v',J)]^{1/2}} \qquad \text{(A1.5.39)}$$

where $\mu$ is the reduced mass, $v_{\min}$ is the (non-integer) value of v at the bottom of the well, and
$B(v,J) = (2J+1)^{-1}(\partial E/\partial J)$ is a generalized rotational constant. If the rotational structure has not been
resolved, then the vibrational spacings alone can be used to determine the well-width function f(v, 0). Similar
methods have been developed which enable a spherically averaged potential function to be obtained by
inversion of rotational levels, measured precisely enough to yield information on centrifugal distortion, for a
single vibrational state. However, most van der Waals complexes are too floppy for a radial potential energy
function to be a useful representation of the full PES.
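
The well-width integral has an integrable singularity at its upper limit, which a change of variable removes.
The sketch below assumes the width relation in the form $r_R - r_L = (2\hbar/\sqrt{2\mu})\int \mathrm{d}v'/[E(v) - E(v')]^{1/2}$, sets $\hbar = 1$,
and substitutes $v' = v - u^2$ so that the integrand becomes smooth. It is tested on a harmonic oscillator, for
which the exact width is known; the level function and parameter values are hypothetical.

```python
import math


def rkr_width(energy, v, v_min, mu, hbar=1.0, steps=2000):
    """Classical well width r_R - r_L at level v from the level function energy(v).

    With v' = v - u**2 the integrand 2u / sqrt(E(v) - E(v - u**2)) is finite
    as u -> 0; the midpoint rule never evaluates it exactly at u = 0.
    """
    u_max = math.sqrt(v - v_min)
    h = u_max / steps
    e_v = energy(v)
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += 2.0 * u / math.sqrt(e_v - energy(v - u * u))
    return 2.0 * hbar / math.sqrt(2.0 * mu) * total * h


# Harmonic oscillator test case: E(v) = omega * (v + 1/2), bottom at v = -1/2.
OMEGA, MU = 2.0, 3.0
width = rkr_width(lambda v: OMEGA * (v + 0.5), 4.0, -0.5, MU)
exact = 2.0 * math.sqrt(2.0 * OMEGA * 4.5 / (MU * OMEGA ** 2))
print(width, exact)
```

For measured levels the function `energy` would be replaced by a smooth interpolation of the observed term
values, and the second integral, involving B(v,J), would be treated in the same way to separate the two
turning points.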

Determination of a PES from spectroscopic data generally requires fitting a parameterized surface to the 
observed energy levels together with theoretical and other experimental data. This is a difficult process 
because it is not easy to devise realistic functional representations of a PES with parameters that are not 
strongly correlated, and because calculation of the vibrational and rotational energy levels from a PES is not 
straightforward and is an area of current research. The former issue will be discussed further in section
A1.5.5.3. The approaches available for the latter currently include numerical integration of a truncated set of
'close-coupled' equations, methods based on the discrete variable representation and diffusion Monte Carlo
techniques [28]. Some early and fine examples of potential energy surfaces determined in this manner include
the H₂-rare gas surfaces of LeRoy and coworkers [97, 98 and 99], and the hydrogen halide-rare gas potential
energy surfaces of Hutson [100, 101 and 102]. More recent work is reviewed by van der Avoird et al [103].

A1.5.4.2 MOLECULAR BEAM SCATTERING

One direct way to study molecular interactions is to cross two molecular beams, one for each of the
interacting species, and to study how the molecules scatter after elastic collisions at the crossing point of the
two beams. A collision of two atoms depends upon the relative kinetic energy E of collision and the impact
parameter b, which is the distance by which the centres of mass would miss each other in the absence of
interatomic interaction. Collimated beams with well defined initial velocities can be used, and the scattering
measured as a function of deflection angle $\chi$. However, it is not possible to restrict the collisions to a single
impact parameter, and results are therefore reported in the form of differential cross sections $\sigma(\chi, E)$ which
are measures of the observed scattering intensity. The integral cross section

$$Q(E) = \int \sigma(\chi, E)\,\mathrm{d}\Omega \qquad \text{(A1.5.40)}$$

is simply the integral of the differential cross section over all solid angles.

The situation is much the same as with spectroscopic measurements. In the case of interactions between
monatomic species, if all the oscillations in the measured differential cross sections are fully resolved, then an
inversion procedure can be applied to obtain the interatomic potential [104, 105]. No formal inversion
procedures exist for the determination of a PES from measured cross sections for polyatomic molecules, and it
is necessary to fit a parametrized surface to the observed cross sections.


A1.5.4.3 GAS IMPERFECTIONS

The virial equation of state, first advocated by Kamerlingh Onnes in 1901, expresses the compressibility
factor of a gas as a power series in the number density:

$$PV/RT = 1 + B(T)/V + C(T)/V^2 + \cdots \qquad \text{(A1.5.41)}$$

in which B(T), C(T), ... are called the second, third, ... virial coefficients. The importance of this equation in
the study of intermolecular forces stems from the statistical mechanical proof that the second virial coefficient 
depends only on the pair potential, even if the total interaction contains significant many-body contributions. 
For spherically symmetric interactions the relationship between B(T) and V(r) was well established by 1908, 
and first Keesom in 1912, and then Jones (later known as Lennard-Jones) in the 1920s exploited it as a tool
for the determination of intermolecular potentials from experiment [106, 107]. The relationship is simply
[108]:

$$B(T) = -2\pi N_A \int_0^\infty \left[\exp(-V(r)/kT) - 1\right] r^2\,\mathrm{d}r. \qquad \text{(A1.5.42)}$$

In the repulsive region ($r < \sigma$) there is a one-to-one correspondence between the interaction energy and the
intermolecular distance. Hence it is possible, in principle at least, to obtain V(r) for $r < \sigma$ by inverting B(T).
However, in the region of the potential well ($r > \sigma$), both the inner and outer turning points of the classical
motion correspond to the same V and hence it is impossible to obtain V(r) uniquely by inverting B(T). In fact
[109, 110], inversion of B(T) can only yield the width of the well as a function of its depth. For light species,
equation (A1.5.42) is the first term in a semi-classical expansion, and the following terms are called the
quantum corrections [106, 107, 111]. For nonlinear molecules, the classical relationship is analogous to
equation (A1.5.42) except that the integral is six dimensional since five angles are required to specify the
relative orientation of the molecules. In such cases, inversion of B(T) is a hopeless task. Nevertheless, virial
coefficient data provide an important test of a proposed potential function.
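
Testing a proposed spherical potential against virial data requires only a one-dimensional quadrature of
equation (A1.5.42). The sketch below works in reduced units: lengths in units of $\sigma$, temperature as $kT/\varepsilon$, and
the result as $B/N_A$ in units of $\sigma^3$. The hard-sphere and Lennard-Jones (12,6) potentials are used as test cases;
the integration cut-off and step count are arbitrary choices for this illustration.

```python
import math


def second_virial(potential, temperature, r_max=20.0, steps=20000):
    """B(T)/N_A = -2 pi * integral of [exp(-V(r)/kT) - 1] r**2 dr (midpoint rule).

    `potential` returns V(r)/epsilon and `temperature` is kT/epsilon.
    """
    h = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += (math.exp(-potential(r) / temperature) - 1.0) * r * r
    return -2.0 * math.pi * total * h


def hard_sphere(r, sigma=1.0):
    """Infinite repulsion inside sigma, zero outside."""
    return math.inf if r < sigma else 0.0


def lennard_jones(r, eps=1.0, sigma=1.0):
    """Lennard-Jones (12,6) potential with well depth eps."""
    x6 = (sigma / r) ** 6
    return 4.0 * eps * (x6 * x6 - x6)


print(second_virial(hard_sphere, 1.0), 2.0 * math.pi / 3.0)
for t in (1.0, 3.0, 10.0):
    print(t, second_virial(lennard_jones, t))
```

The hard-sphere result reproduces $2\pi\sigma^3/3$ at every temperature, whereas the Lennard-Jones coefficient is negative at
low temperature, where the attractive well dominates, and positive at high temperature, where the repulsive
core dominates.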

The third virial coefficient C(T) depends upon three-body interactions, both additive and non-additive. The 
relationship is well understood [ 106 , 107 , 111 ]. If the pair potential is known precisely, then C(T) ought to 
serve as a good probe of the non-additive, three-body interaction energy. The importance of the non-additive 
contribution has been confirmed by C(T) measurements. Unfortunately, large experimental uncertainties in
C(T) have precluded unequivocal tests of details of the non-additive, three-body interaction.

A1.5.4.4 TRANSPORT PROPERTIES

The viscosity, thermal conductivity and diffusion coefficient of a monatomic gas at low pressure depend only
on the pair potential but through a more involved sequence of integrations than the second virial coefficient.
The transport properties can be expressed in terms of 'collision integrals' defined [111] by

$$\Omega^{(l,s)}(T) = \left[(s+1)!\,(kT)^{s+2}\right]^{-1} \int_0^\infty Q^{(l)}(E)\,\mathrm{e}^{-E/kT} E^{s+1}\,\mathrm{d}E \qquad \text{(A1.5.43)}$$

where k is the Boltzmann constant and E is the relative kinetic energy of the collision. The collision integral is
a thermal average of the transport cross section

$$Q^{(l)}(E) = 2\pi \left[1 - \frac{1 + (-1)^l}{2(1+l)}\right]^{-1} \int_0^\infty (1 - \cos^l\chi)\,b\,\mathrm{d}b \qquad \text{(A1.5.44)}$$

in which b is the impact parameter of the collision, and $\chi$ is the deflection angle given by

$$\chi(E,b) = \pi - 2b \int_{r_0}^\infty \frac{\mathrm{d}r}{r^2\left[1 - b^2/r^2 - V(r)/E\right]^{1/2}} \qquad \text{(A1.5.45)}$$

where $r_0$, the distance of closest approach in the collision, is the outermost classical turning point of the
effective potential. The latter is the sum of the true potential and the centrifugal potential so that
$V_{\mathrm{eff}}(L,r) = V(r) + L^2/(2\mu r^2) = V(r) + Eb^2/r^2$ in which L is the angular momentum and $\mu$ the reduced mass. Hence
$r_0$ is the outermost solution of $E = V_{\mathrm{eff}}(L, r_0)$.

The Chapman-Enskog solution of the Boltzmann equation [112] leads to the following expressions for the
transport coefficients. The viscosity of a pure, monatomic gas can be written as

$$\eta(T) = \frac{5}{16}\,\frac{(\pi m kT)^{1/2}}{\Omega^{(2,2)}(T)}\,f_\eta \qquad \text{(A1.5.46)}$$

and the thermal conductivity as

$$\lambda(T) = \frac{75}{64}\left(\frac{\pi k^3 T}{m}\right)^{1/2} \frac{f_\lambda}{\Omega^{(2,2)}(T)} \qquad \text{(A1.5.47)}$$

where m is the molecular mass. $f_\eta$ and $f_\lambda$ are higher-order correction factors that differ from unity by only 1 or
2% over a wide temperature range, and can be expressed in terms of collision integrals with different values
of l and s. Expression (A1.5.46) and expression (A1.5.47) imply that

$$\frac{\lambda(T)}{\eta(T)} = \frac{15k}{4m}\,\frac{f_\lambda}{f_\eta} \qquad \text{(A1.5.48)}$$

and this is borne out experimentally [111] with the ratio of correction factors being a gentle function of
temperature: $f_\lambda/f_\eta \approx 1 + 0.0042\,(1 - \mathrm{e}^{0.33(1-T^*)})$ for $1 < T^* < 90$ with $T^* = kT/\varepsilon$. The self-diffusion coefficient
can be written in a similar fashion:

$$D(T) = \frac{3}{8}\,\frac{(\pi kT/m)^{1/2}}{n\,\Omega^{(1,1)}(T)}\,f_D \qquad \text{(A1.5.49)}$$

where n is the number density. The higher-order correction factor $f_D$ differs from unity by only a few per cent
and can also be expressed in terms of other collision integrals.
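
The normalization of the transport cross section can be checked for hard spheres, for which the deflection
angle is $\chi = 2\arccos(b/\sigma)$ when $b < \sigma$ and zero otherwise. The sketch below assumes the forms
$Q^{(l)} = 2\pi[1 - (1+(-1)^l)/(2(1+l))]^{-1}\int(1-\cos^l\chi)\,b\,\mathrm{d}b$, $\eta = (5/16)(\pi mkT)^{1/2}f_\eta/\Omega^{(2,2)}$ and
$\lambda = (75/64)(\pi k^3T/m)^{1/2}f_\lambda/\Omega^{(2,2)}$; all numerical inputs are in arbitrary reduced units and the function names are
hypothetical.

```python
import math


def transport_cross_section(l: int, sigma: float = 1.0, steps: int = 4000) -> float:
    """Q^(l) for hard spheres of diameter sigma, by the midpoint rule in b."""
    norm = 1.0 / (1.0 - (1.0 + (-1) ** l) / (2.0 * (1.0 + l)))
    h = sigma / steps
    total = 0.0
    for i in range(steps):
        b = (i + 0.5) * h
        chi = 2.0 * math.acos(b / sigma)  # hard-sphere deflection angle
        total += (1.0 - math.cos(chi) ** l) * b
    return 2.0 * math.pi * norm * total * h


def viscosity(mass, k, temperature, omega22, f_eta=1.0):
    """First-order Chapman-Enskog viscosity in the form assumed above."""
    return 5.0 / 16.0 * math.sqrt(math.pi * mass * k * temperature) * f_eta / omega22


def thermal_conductivity(mass, k, temperature, omega22, f_lambda=1.0):
    """First-order Chapman-Enskog thermal conductivity in the form assumed above."""
    return (75.0 / 64.0 * k * math.sqrt(math.pi * k * temperature / mass)
            * f_lambda / omega22)


# For hard spheres Q^(l) is independent of E, so every collision integral
# equals pi * sigma**2.
print(transport_cross_section(1), transport_cross_section(2), math.pi)
```

Because the hard-sphere cross sections do not depend on energy, the thermal average in the collision integral
is trivial and the viscosity varies as $T^{1/2}$. Real gases deviate from this behaviour precisely because
$Q^{(l)}(E)$ reflects the softness of the repulsion and the depth of the well.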

Despite the complexity of these expressions, it is possible to invert transport coefficients to obtain information 
about the intermolecular potential by an iterative procedure [ 111 ] that converges rapidly, provided that the 
initial guess for V(r) has the right well depth.
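
Any iterative inversion of this kind repeatedly needs the forward calculation from a trial V(r) to the
deflection angle of equation (A1.5.45). The sketch below assumes that equation in the form
$\chi = \pi - 2b\int_{r_0}^{\infty} \mathrm{d}r/\{r^2[1 - b^2/r^2 - V(r)/E]^{1/2}\}$. It locates the outermost turning point by an inward scan followed
by bisection, and removes the square-root singularity with the substitutions $x = r_0/r$ and $x = 1 - s^2$. It is
checked against the potential $V = C/r^2$, for which $\chi = \pi[1 - b/(b^2 + C/E)^{1/2}]$. Orbiting collisions in attractive
potentials are not handled, and the scan range and step counts are arbitrary.

```python
import math


def closest_approach(potential, energy, b, r_max=50.0, steps=20000):
    """Outermost root r_0 of 1 - b**2/r**2 - V(r)/E = 0 (requires b < r_max)."""
    def g(r):
        return 1.0 - (b / r) ** 2 - potential(r) / energy

    h = r_max / steps
    hi = r_max
    for i in range(steps - 1, 0, -1):       # scan inward from r_max
        lo = i * h
        if g(lo) < 0.0:
            break
        hi = lo
    else:
        raise ValueError("no turning point found")
    for _ in range(200):                    # bisection keeps g(hi) >= 0
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi


def deflection_angle(potential, energy, b, steps=4000):
    """Classical deflection angle chi(E, b) by the midpoint rule in s."""
    r0 = closest_approach(potential, energy, b)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        x = 1.0 - s * s                     # x = r0 / r
        r = r0 / x
        g = 1.0 - (b / r) ** 2 - potential(r) / energy
        total += 2.0 * s / math.sqrt(g)
    return math.pi - 2.0 * (b / r0) * total * h


chi = deflection_angle(lambda r: 1.0 / (r * r), 1.0, 1.0)
print(chi, math.pi * (1.0 - 1.0 / math.sqrt(2.0)))
```

Feeding such deflection angles into the transport cross section and then into the thermal average gives the
collision integrals for the trial potential, which can be compared with the measured transport coefficients at
each iteration.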

The theory connecting transport coefficients with the intermolecular potential is much more complicated for
polyatomic molecules because the internal states of the molecules must be accounted for. Both quantum
mechanical and semi-classical theories have been developed. McCourt and his coworkers [113, 114] have
brought these theories to computational fruition and transport properties now constitute a valuable test of
proposed potential energy surfaces that can be performed routinely. Electric and magnetic field effects on
transport properties [113, 114] depend primarily on the non-spherical part of the interaction, and serve as
stringent checks on the anisotropy of potential energy surfaces.


A1.5.5 MODEL INTERACTION POTENTIALS 

There are many large molecules whose interactions we have little hope of determining in detail. In these cases 
we turn to models based on simple mathematical representations of the interaction potential with empirically 
determined parameters. Even for smaller molecules where a detailed interaction potential has been obtained 
by an ab initio calculation or by a numerical inversion of experimental data, it is useful to fit the calculated 
points to a functional form which then serves as a computationally inexpensive interpolation and extrapolation 
tool for use in further work such as molecular simulation studies or predictive scattering computations. There 
are a very large number of such models in use, and only a small sample is considered here. The most 
frequently used simple spherical models are described in section A1.5.5.1 and some of the more common
elaborate models are discussed in section A1.5.5.2, section A1.5.5.3 and section A1.5.5.4.

A1.5.5.1 SIMPLE SPHERICAL MODELS

The hard sphere model considers each molecule to be an impenetrable sphere of diameter $\sigma$ so that

\[ V(r) = \begin{cases} \infty & r \le \sigma \\ 0 & r > \sigma. \end{cases} \tag{A1.5.50} \]


This simple model is adequate for some properties of rare gas fluids. When it is combined with an accurate 
description of the electrostatic interactions, it can rationalize the structures of a large variety of van der Waals
complexes [115, 116 and 117].

The venerable bireciprocal potential consists of a repulsive term $A/r^n$ and an attractive term $-B/r^m$ with $n > m$.
This potential function was introduced by Mie [118] but is usually named after Lennard-Jones who used it
extensively. Almost invariably, m = 6 is chosen so that the attractive term represents the leading dispersion
term. Many different choices of n have been used, but the most common is n = 12 because of its
computational convenience. The 'Lennard-Jones (12,6)' potential can be written in terms of the well depth ($\epsilon$)
and either the minimum position ($r_m$) or the zero potential location ($\sigma$) as

\[ V(r) = 4\epsilon[(\sigma/r)^{12} - (\sigma/r)^6] = \epsilon[(r_m/r)^{12} - 2(r_m/r)^6] \tag{A1.5.51} \]

in which the relationship $r_m = 2^{1/6}\sigma$ is a consequence of having only two parameters. Fitted values of the
coefficient $4\epsilon\sigma^6$ of the $r^{-6}$ term are often twice as large as the true $C_6$ value because the attractive term has to
compensate for the absence of the higher-order dispersion terms. It is remarkable that this simple model
continues to be used almost a century after its introduction.

Morse [119] introduced a potential energy model for the vibrations of bound molecules

\[ V(r) = \epsilon[e^{-2(c/\sigma)(r - r_m)} - 2e^{-(c/\sigma)(r - r_m)}] \tag{A1.5.52} \]

where c is a dimensionless parameter related to the curvature of the potential at its minimum. This function
has a more realistic repulsion than the Lennard-Jones potential, but has incorrect long-range behaviour. It has
the merit that its vibrational and rotational energy levels are known analytically [119, 120].

The 'exp-6' potential replaces the inverse power repulsion in the Lennard-Jones (12,6) function by a more
realistic exponential form:

\[ V(r) = \begin{cases} \dfrac{\epsilon}{1 - 6/\alpha}\left[\dfrac{6}{\alpha}e^{\alpha(1 - r/r_m)} - \left(\dfrac{r_m}{r}\right)^6\right] & r \ge r_{max} \\ \infty & r < r_{max}. \end{cases} \tag{A1.5.53} \]

The potential has a spurious maximum at $r_{max}$ where the $r^{-6}$ term again starts to dominate. The dimensionless
parameter $\alpha$ is a measure of the steepness of the repulsion and is often assigned a value of 14 or 15. The ideas
of an exponential repulsion and of its combination with an $r^{-6}$ attraction were introduced by Slater and
Kirkwood [121], and the cut-off at $r_{max}$ by Buckingham [122]. An exponential repulsion, $Ae^{-br}$, is commonly
referred to as a Born-Mayer form, perhaps because their work [123] is better known than that of Slater and
Kirkwood.
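The spurious inner maximum of the exp-6 form can be located numerically. The following Python sketch assumes the commonly used two-term exp-6 expression with well depth epsilon at r_m and steepness alpha; the function names, the grid search and the reduced-unit parameter values are illustrative assumptions, not part of the original treatment.

```python
import math


def exp6(r, epsilon, r_m, alpha):
    """exp-6 form with well depth epsilon at r = r_m and steepness alpha."""
    prefactor = epsilon / (1.0 - 6.0 / alpha)
    repulsion = (6.0 / alpha) * math.exp(alpha * (1.0 - r / r_m))
    attraction = (r_m / r) ** 6
    return prefactor * (repulsion - attraction)


def spurious_maximum(epsilon, r_m, alpha, steps=20000):
    """Grid search on 0 < r < r_m for the inner maximum of the exp-6 form."""
    best_r, best_v = None, -math.inf
    for i in range(1, steps):
        r = r_m * i / steps
        v = exp6(r, epsilon, r_m, alpha)
        if v > best_v:
            best_r, best_v = r, v
    return best_r, best_v


# Hypothetical reduced units: epsilon = 1, r_m = 1, alpha = 14.
r_max, v_max = spurious_maximum(1.0, 1.0, 14.0)
```

Inside r_max the attractive term wins and the function plunges towards minus infinity, which is why a hard cut-off is imposed there in practice.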

The parameters in simple potential models for interactions between unlike molecules A and B are often
deduced from the corresponding parameters for the A-A and B-B interactions using 'combination rules'. For
example, the $\sigma$ and $\epsilon$ parameters are often estimated from the 'Lorentz-Berthelot' rules:

\[ \sigma_{AB} = (\sigma_{AA} + \sigma_{BB})/2 \tag{A1.5.54} \]

\[ \epsilon_{AB} = (\epsilon_{AA}\epsilon_{BB})^{1/2} \tag{A1.5.55} \]

The former is useful but the latter tends to overestimate the well depth. A harmonic mean rule

\[ \epsilon_{AB} = 2\epsilon_{AA}\epsilon_{BB}/(\epsilon_{AA} + \epsilon_{BB}) \tag{A1.5.56} \]

proposed by Fender and Halsey [124] is generally better than the geometric mean of equation (A1.5.55).
Combination rules for the steepness parameter in the exp-6 model include the arithmetic mean

\[ \alpha_{AB} = (\alpha_{AA} + \alpha_{BB})/2 \tag{A1.5.57} \]

and the somewhat more accurate harmonic mean

\[ \alpha_{AB} = 2\alpha_{AA}\alpha_{BB}/(\alpha_{AA} + \alpha_{BB}). \tag{A1.5.58} \]

Many other rules, some of which are rather more elaborate, have been proposed [ 111 ], but these rules have 
insubstantial theoretical underpinnings and they continue to be used only because there is often no better way 
to proceed. 
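As a hedged illustration of the simple models and combination rules discussed above, the following Python sketch evaluates the Lennard-Jones (12,6) potential together with the Lorentz-Berthelot and harmonic mean rules. The function names and the numerical parameter values are hypothetical, chosen only for illustration; they are not fitted values for any real system.

```python
import math


def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones (12,6) potential, 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    x6 = (sigma / r) ** 6
    return 4.0 * epsilon * (x6 * x6 - x6)


def lorentz_berthelot(sigma_aa, sigma_bb, eps_aa, eps_bb):
    """Arithmetic mean for sigma, geometric mean for epsilon."""
    return 0.5 * (sigma_aa + sigma_bb), math.sqrt(eps_aa * eps_bb)


def fender_halsey(eps_aa, eps_bb):
    """Harmonic mean rule for the unlike-pair well depth."""
    return 2.0 * eps_aa * eps_bb / (eps_aa + eps_bb)


# Hypothetical like-pair parameters in arbitrary units (illustration only).
sigma_ab, eps_geometric = lorentz_berthelot(3.0, 4.0, 1.0, 4.0)
eps_harmonic = fender_halsey(1.0, 4.0)

# The minimum of the (12,6) form lies at r_m = 2**(1/6) * sigma with depth -epsilon.
r_m = 2.0 ** (1.0 / 6.0) * sigma_ab
well = lennard_jones(r_m, eps_geometric, sigma_ab)
```

The harmonic mean never exceeds the geometric mean, so it always predicts a well that is at most as deep, in line with the remark that the geometric mean tends to overestimate the well depth.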

A1.5.5.2 ELABORATE SPHERICAL MODELS

The potential functions for the interactions between pairs of rare-gas atoms are known to a high degree of
accuracy [125]. However, many of them use ad hoc functional forms parametrized to give the best possible fit
to a wide range of experimental data. They will not be considered because it is more instructive to consider 
representations that are more firmly rooted in theory and could be used for a wide range of interactions with 
confidence. 

Slater and Kirkwood's idea [121] of an exponential repulsion plus dispersion needs only one concept,
damping functions (see section A1.5.3.3), to lead to a working template for contemporary work. Buckingham
and Corner [126] suggested such a potential with an empirical damping function more than 50 years ago:

\[ V(r) = Ae^{-br} - (C_6/r^6 + C_8/r^8)f(r) \tag{A1.5.59} \]

where the damping function is

\[ f(r) = \begin{cases} \exp[-4(r_m/r - 1)^3] & r < r_m \\ 1 & r \ge r_m. \end{cases} \tag{A1.5.60} \]

Modern versions of this approach use a more elaborate exponential function for the repulsion, more dispersion 
terms, induction terms if necessary, and individual damping functions for each of the dispersion, and 
sometimes induction, terms as in equation (A1.5.37).


Functional forms used for the repulsion include the simple exponential multiplied by a linear combination of 
powers (possibly non-integer) of r, a generalized exponential function exp(-b(r)), where b(r) is typically a
polynomial in r, and a combination of these two ideas. 

Parametrized representations of individual damping dispersion functions were first obtained [127] by fitting
ab initio damping functions [74] for H-H interactions. The one-parameter damping functions of Douketis et
al are [127]:

\[ f_n(r) = [1 - \exp(-2.1s/n - 0.109s^2/n^{1/2})]^n \tag{A1.5.61} \]

where $s = \rho r$, and $\rho$ is a scale parameter (defined to be $\rho = 1/a_0$ for H-H) that enables the damping functions to
be used for any interaction. Meath and coworkers [78, 128] prefer the more elaborate form

\[ f_n(r) = [1 - \exp(-a_ns - b_ns^2 - d_ns^3)]^{n+1} \tag{A1.5.62} \]

in which the $a_n$, $b_n$, $d_n$ (n = 6, 8, ..., 20) are parameters obtained by fitting to ab initio damping functions for H-H.
A one-parameter damping function of the incomplete gamma form, based on asymptotic arguments and the
H-H interaction, is advocated by Tang and Toennies [129]:

\[ f_n(r) = 1 - \exp(-br)\sum_{k=0}^{n}(br)^k/k! \tag{A1.5.63} \]

where b is a scale parameter which is often set equal to the corresponding steepness parameter in the
Born-Mayer repulsion.
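The behaviour of these damping functions is easy to explore numerically. The sketch below implements the Tang-Toennies incomplete gamma form and the one-parameter form of Douketis et al as they are usually quoted; the function names and the idea of applying the damping to a single C6/r^6 term are illustrative assumptions, and any numerical inputs are hypothetical.

```python
import math


def tang_toennies_damping(n, b, r):
    """f_n(r) = 1 - exp(-b r) * sum_{k=0}^{n} (b r)**k / k!."""
    x = b * r
    term = 1.0   # (b r)**0 / 0!
    total = 1.0
    for k in range(1, n + 1):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total


def douketis_damping(n, rho, r):
    """f_n(r) = [1 - exp(-2.1 s/n - 0.109 s**2 / sqrt(n))]**n with s = rho * r."""
    s = rho * r
    return (1.0 - math.exp(-2.1 * s / n - 0.109 * s * s / math.sqrt(n))) ** n


def damped_dispersion(r, c6, b):
    """Damped leading dispersion term, -f_6(r) * C6 / r**6 (Tang-Toennies damping)."""
    return -tang_toennies_damping(6, b, r) * c6 / r ** 6
```

Both functions vanish at r = 0 and tend to unity at large r, so the damped dispersion term stays finite at short range while recovering the undamped multipole form asymptotically.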

Functional forms based on the above ideas are used in the HFD [127] and Tang-Toennies models [129],
where the repulsion term is obtained by fitting to Hartree-Fock calculations, and in the XC model [92] where
the repulsion is modelled by an ab initio Coulomb term $E_C^{(1)}$ and a semi-empirical exchange-repulsion term
$E_X^{(1)}$. Current versions of all these models employ an individually damped dispersion series for the attractive
term.

An example of a potential energy function based on all these ideas is provided by the 10-parameter function
used [88] as a representation of ab initio potential energy curves for He-F$^-$ and Ne-F$^-$

\[ V(r) = A\exp[-b(r)] - \sum_{n=2}^{5} f_{2n}(r)C_{2n}/r^{2n} \tag{A1.5.64} \]

where $b(r) = (b_0 + b_1z + b_2z^2)r$ with $z = (r - r_s)/(r + r_s)$, the damping functions $f_{2n}(r)$ are those of equation
(A1.5.61), the $r^{-4}$ term is a pure induction term, and the higher $r^{-2n}$ terms contain both dispersion and
induction. Note that this representation implicitly assumes that the dispersion damping functions are
applicable to induction without change.

A1.5.5.3 MODEL NON-SPHERICAL INTERMOLECULAR POTENTIALS

The complete intermolecular potential energy surface depends upon the intermolecular distance and up to five
angles, as discussed in section A1.5.1.3.

The interaction energy can be written as an expansion employing Wigner rotation matrices and spherical
harmonics of the angles [28, 130]. As a simple example, the interaction between an atom and a diatomic
molecule can be expanded in Legendre polynomials as

\[ V(r, \theta) = \sum_{L=0}^{\infty} V_L(r)P_L(\cos\theta). \tag{A1.5.65} \]

This Legendre expansion converges rapidly only for weakly anisotropic potentials. Nonetheless, truncated 
expansions of this sort are used more often than justified because of their computational advantages. 
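A truncated Legendre expansion of this kind is straightforward to evaluate. The following Python sketch is a minimal illustration: the Legendre polynomials are generated by the standard three-term recurrence, and the radial coefficient functions passed in are placeholders that a user would supply (the names `legendre` and `anisotropic_potential` are hypothetical).

```python
import math


def legendre(order, x):
    """Legendre polynomial P_order(x) from the three-term (Bonnet) recurrence."""
    p_prev, p = 1.0, x
    if order == 0:
        return p_prev
    for l in range(1, order):
        p_prev, p = p, ((2 * l + 1) * x * p - l * p_prev) / (l + 1)
    return p


def anisotropic_potential(radial_terms, r, theta):
    """Sum V_L(r) * P_L(cos(theta)) over the supplied terms.

    radial_terms maps each order L to a callable V_L(r).
    """
    x = math.cos(theta)
    return sum(v_l(r) * legendre(order, x) for order, v_l in radial_terms.items())


# Hypothetical two-term expansion with constant radial coefficients.
terms = {0: lambda r: 2.0, 2: lambda r: 4.0}
collinear = anisotropic_potential(terms, 1.0, 0.0)
t_shaped = anisotropic_potential(terms, 1.0, math.pi / 2.0)
```

For a homonuclear diatomic molecule only even orders L contribute by symmetry, which is why the illustrative dictionary contains L = 0 and L = 2 only.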

A more natural way to account for the anisotropy is to treat the parameters in an interatomic potential, such as 
equation (A1.5.64), as functions of the relative orientation of the interacting molecules. Corner [131] was
perhaps the first to use such an approach. Pack [132] pointed out that Legendre expansions of the well depth $\epsilon$
and equilibrium location $r_m$ of the interaction potential converge more rapidly than Legendre expansions of
the potential itself. 

As an illustration, consider the function used to fit an ab initio surface for N$_2$-He [86, 87]. It includes a
repulsive term of the form

\[ V_{rep}(r, \theta) = \exp[A(\theta) - b(\theta)r + \gamma(\theta)\ln r] \tag{A1.5.66} \]

in which

\[ A(\theta) = A_0 + A_2P_2(\cos\theta) + A_4P_4(\cos\theta) \tag{A1.5.67} \]

and similar three-term Legendre expansions are used for $b(\theta)$ and $\gamma(\theta)$. The same surface includes an
anisotropic attractive term consisting of damped dispersion and induction terms:

\[ V_{att}(r, \theta) = -\sum_{n=3}^{5} f_{2n}(r, \theta)C_{2n}(\theta)/r^{2n} \tag{A1.5.68} \]

in which the combined dispersion and induction coefficients $C_{2n}(\theta)$ are given by Legendre series as in
equation (A1.5.19), and the damping functions are given by a version of equation (A1.5.61) modified so that
the scale factor has a weak angle dependence

\[ \rho(\theta) = \rho_0 + \rho_2P_2(\cos\theta). \tag{A1.5.69} \]

To improve the description of the short-range anisotropy, the surface also includes a repulsive 'site-site' term 


V„ = V^e z "^ h- V^e z "^ ( A 1.5.70) 
where $r_A$ and $r_B$ are distances between the nitrogen atoms and the helium atom.

A1.5.5.4 SITE-SITE INTERMOLECULAR POTENTIALS

The approach described in section A1.5.5.3 is best suited for accurate representations of the PES for
interactions between small molecules. Interactions between large molecules are usually handled with an
atom-atom or site-site approach. For example, an atom-atom, exp-6 potential for the interaction between
molecules A and B can be written as

\[ V_{ss} = \sum_{a\in A}\sum_{b\in B}[A_{ab}\exp(-b_{ab}r_{ab}) - C_{ab}/r_{ab}^6] \tag{A1.5.71} \]

where the sums are over the atoms of each molecule, and there are three parameters $A_{ab}$, $b_{ab}$ and $C_{ab}$ for each
distinct type of atom pair. A set of parameters was developed by Filippini and Gavezzotti [133, 134] for
describing crystal structures and another set for hydrogen bonding. 
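The double sum over atom pairs translates directly into code. The following Python sketch is a minimal illustration of an atom-atom exp-6 evaluation; the function name, the dictionary layout for the pair parameters, and all numerical values are hypothetical and do not correspond to any published parameter set.

```python
import math


def exp6_site_site(coords_a, types_a, coords_b, types_b, params):
    """Sum A*exp(-b*r) - C/r**6 over all atom pairs (a in A, b in B).

    params maps an unordered pair of atom-type labels, given as a frozenset,
    to the tuple (A_ab, b_ab, C_ab).
    """
    total = 0.0
    for xyz_a, type_a in zip(coords_a, types_a):
        for xyz_b, type_b in zip(coords_b, types_b):
            r = math.dist(xyz_a, xyz_b)
            big_a, b, c = params[frozenset((type_a, type_b))]
            total += big_a * math.exp(-b * r) - c / r ** 6
    return total


# Hypothetical parameters and geometry in arbitrary units (illustration only).
params = {
    frozenset(("X",)): (1000.0, 3.0, 10.0),
    frozenset(("X", "Y")): (500.0, 3.5, 5.0),
}
molecule_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
molecule_b = [(0.0, 0.0, 4.0)]
energy = exp6_site_site(molecule_a, ["X", "Y"], molecule_b, ["X"], params)
```

Keying the parameters on an unordered pair of labels reflects the statement that there is one parameter triple per distinct type of atom pair, independent of which molecule each atom belongs to.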

A more accurate approach is to begin with a model of the charge distribution for each of the molecules. 
Various prescriptions for obtaining point charge models, such as fitting to the electrostatic potential of the 
molecule [ 135 , 136 ], are currently in use. Unfortunately, these point charge models are insufficiently accurate 
if only atom-centred charges are used [ 137 ]. Hence, additional charges are sometimes placed at off-atom sites. 
This increases the accuracy of the point charge model at the expense of arbitrariness in the choice of off-atom 
sites and an added computational burden. A less popular but sounder procedure is to use a distributed 
multipole model [ 28 , 138 , 139 ] instead of a point charge model. 

Once the models for the charge distributions are in hand, the electrostatic interaction is computed as the 
interaction between the sets of point charges or distributed multipoles, and added to an atom-atom, exp-6 
form that represents the repulsion and dispersion interactions. Different exp-6 parameters, often from [ 140 , 
141 and 142 ], are used in this case. The induction interaction is frequently omitted because it is small, or it is 
modelled by a single site polarizability on each molecule interacting with the point charges or distributed 
multipoles on the other. 

A further refinement [ 143 , 144 ] is to treat the atoms as being non-spherical by rewriting the repulsive part of 
the atom-atom exp-6 model, equation (A1.5.71), as

\[ V_{rep} = V_{ref}\sum_{a\in A}\sum_{b\in B}\exp[-b_{ab}(\Omega_{ab})(r_{ab} - \rho_{ab}(\Omega_{ab}))] \tag{A1.5.72} \]

where $\Omega_{ab}$ is used as a generic designation for all the angles required to specify the relative orientation of the
molecules, and $V_{ref}$ is an energy unit. The $\rho_{ab}(\Omega_{ab})$ functions describe the shape of the contour on which the
repulsion energy between atoms a and b equals $V_{ref}$. The spherical harmonic expansions used to represent the
angular variation of the steepness $b_{ab}(\Omega_{ab})$ and shape $\rho_{ab}(\Omega_{ab})$ functions are quite rapidly convergent.
A1.6 Interaction of light with matter: a coherent
perspective 

David J Tannor 


A1.6.1 THE BASIC MATTER-FIELD INTERACTION 

There has been phenomenal expansion in the range of experiments connected with light-molecule 
interactions. If one thinks of light as an electromagnetic (EM) wave, like any wave it has an amplitude, a 
frequency and a phase. The advent of the laser in 1960 completely revolutionized the control over all three of 
these factors. The amplitude of the EM wave is related to its intensity; current laser capabilities allow

intensities up to about $10^{20}$ W cm$^{-2}$, fifteen orders of magnitude larger than prelaser technology allowed.
Laser beams can be made extremely monochromatic. Finally, it is increasingly possible to control the absolute 
phase of the laser light. There have also been remarkable advances in the ability to construct ultrashort pulses. 

Currently it is possible to construct pulses of the order of $10^{-15}$ s (several femtoseconds), a time scale short
compared with typical vibrational periods of molecules. These short pulses consist of a coherent superposition 
of many frequencies of the light; the word coherent implies a precise phase relationship between the different 
frequency components. When these coherent ultrashort pulses interact with a molecule they excite coherently
many frequency components in the molecule. Such coherent excitation, whether it is with short pulses or with 
monochromatic light, introduces new concepts in thinking about the light-matter interaction. These new 
concepts can be used passively, to learn about molecular properties via new coherent spectroscopies, or 
actively, to control chemical reactions using light, or to use light to cool atoms and molecules to temperatures 
orders of magnitude lower than 1 K. 

A theme which will run through this section is the complementarity of light and the molecule with which it 
interacts. The simplest example is energy: when a photon of energy $E = \hbar\omega$ is absorbed by a molecule it
disappears, transferring the identical quantity of energy $E = \hbar(\omega_f - \omega_i)$ to the molecule. But this is only one of
a complete set of such complementary relations: the amplitude of the EM field determines the amplitude of
the excitation; the phase of the EM field determines the phase of the excitation; and the time of the
interaction with the photon determines the time of excitation of the molecule. Moreover, both the magnitude 
and direction of the momentum of the photon are imparted to the molecules, an observation which plays a 
crucial role in translational cooling. Finally, because of the conservation or increase in entropy in the universe, 
any entropy change in the system has to be compensated for by an entropy change in the light; specifically, 
coherent light has zero or low entropy while incoherent light has high entropy. Entropy exchange between the 
system and the light plays a fundamental role in laser cooling, where entropy from the system is carried off by 
the light via incoherent, spontaneous emission, as well as in lasing itself where entropy from incoherent light
must be transferred to the system.

This section begins with a brief description of the basic light-molecule interaction. As already indicated, 
coherent light pulses excite coherent superpositions of molecular eigenstates, known as 'wavepackets', and 
we will give a description of their motion, their coherence properties, and their interplay with the light. Then 
we will turn to linear and nonlinear spectroscopy, and, finally, to a brief account of coherent control of 
molecular motion. 


A1.6.1.1 ELECTROMAGNETIC FIELDS 

The material in this section can be found in many textbooks and monographs. Our treatment follows that in 
[1,2 and 3]. 

(A) MAXWELL'S EQUATIONS AND ELECTROMAGNETIC POTENTIALS 

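The central equations of electromagnetic theory are elegantly written in the form of four coupled equations for
the electric and magnetic fields. These are known as Maxwell's equations. In free space, these equations take
the form:

\[ \nabla\times\mathbf{E} = -\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} \tag{A1.6.1} \]

\[ \nabla\times\mathbf{B} = \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t} + \frac{4\pi}{c}\mathbf{J} \tag{A1.6.2} \]

\[ \nabla\cdot\mathbf{E} = 4\pi\rho \tag{A1.6.3} \]

\[ \nabla\cdot\mathbf{B} = 0 \tag{A1.6.4} \]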

where E is the electric field vector, B is the magnetic field vector, J is the current density, $\rho$ is the charge
density and c is the speed of light. It is convenient to define two potentials, a scalar potential $\phi$ and a vector
potential A, such that the electric and magnetic fields are defined in terms of derivatives of these potentials.
The four Maxwell equations are then replaced by two equations which define the fields in terms of the
potentials,

\[ \mathbf{E} = -\nabla\phi - \frac{1}{c}\frac{\partial\mathbf{A}}{\partial t} \tag{A1.6.5} \]

\[ \mathbf{B} = \nabla\times\mathbf{A} \tag{A1.6.6} \]

together with two equations for the vector and scalar fields themselves. Note that there is a certain amount of
flexibility in the choice of A and $\phi$, such that the same values for E and B are obtained (called gauge
invariance). We will adopt below the Coulomb gauge, in which $\nabla\cdot\mathbf{A} = 0$.

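In free space ($\rho$ = 0, J = 0, $\phi$ = constant), the equations for the potentials decouple and take the following
simple form:

\[ \nabla^2\phi = 0 \tag{A1.6.7} \]

\[ \nabla^2\mathbf{A} = \frac{1}{c^2}\frac{\partial^2\mathbf{A}}{\partial t^2} \tag{A1.6.8} \]

Equation (A1.6.8), along with the definitions (A1.6.5) and (A1.6.6), constitutes the central equation for the
propagation of electromagnetic waves in free space. The form of equation (A1.6.8) admits harmonic solutions of
the form

\[ \mathbf{A} = A_0\mathbf{e}\cos(\mathbf{k}\cdot\mathbf{r} - \omega t) \tag{A1.6.9} \]

from which it follows that

\[ \mathbf{E} = -\frac{\omega}{c}A_0\mathbf{e}\sin(\mathbf{k}\cdot\mathbf{r} - \omega t) = E_0\mathbf{e}\sin(\mathbf{k}\cdot\mathbf{r} - \omega t) \tag{A1.6.10} \]

\[ \mathbf{B} = -A_0(\mathbf{k}\times\mathbf{e})\sin(\mathbf{k}\cdot\mathbf{r} - \omega t) \tag{A1.6.11} \]

(e is a unit vector in the direction of E and $\omega = |\mathbf{k}|c$).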

(B) ENERGY AND PHOTON NUMBER DENSITY 

In what follows it will be convenient to convert between field strength and numbers of photons in the field. 
According to classical electromagnetism, the energy E in the field is given by

\[ E = \frac{1}{8\pi}\int(\mathbf{E}^2 + \mathbf{B}^2)\,\mathrm{d}^3r. \tag{A1.6.12} \]

If we assume a single angular frequency of the field, $\omega$, and a constant magnitude of the vector potential, $A_0$,
in the volume V, we obtain, using equation (A1.6.10) and equation (A1.6.11), and noting that the average
value of $\sin^2(x)$ is 1/2,

\[ E = V\frac{E_0^2}{8\pi}. \tag{A1.6.13} \]

But by the Einstein relation we know that the energy of a single photon of frequency $\omega$ is given by $\hbar\omega$, and
hence the total energy in the field is

\[ E = N\hbar\omega \tag{A1.6.14} \]

where N is the number of photons. Combining equation (A1.6.13) and equation (A1.6.14) we find that

\[ E_0 = \left(\frac{8\pi N\hbar\omega}{V}\right)^{1/2}. \tag{A1.6.15} \]

Equation (A1.6.15) provides the desired relationship between field strength and the number of photons.
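The conversion between field strength and photon number can be sketched in a few lines of Python. The example assumes Gaussian (CGS) units, consistent with the form of Maxwell's equations used above, and takes the relation in the form E_0 = (8 pi N hbar omega / V)^(1/2) obtained by equating V E_0^2 / 8 pi with N hbar omega. The function names and the numerical inputs are hypothetical.

```python
import math

HBAR_CGS = 1.054571817e-27  # reduced Planck constant in erg s


def field_amplitude(n_photons, omega, volume):
    """E_0 = (8*pi*N*hbar*omega / V)**0.5 in Gaussian units."""
    return math.sqrt(8.0 * math.pi * n_photons * HBAR_CGS * omega / volume)


def photon_number(e0, omega, volume):
    """Invert the relation: N = V * E_0**2 / (8*pi*hbar*omega)."""
    return volume * e0 ** 2 / (8.0 * math.pi * HBAR_CGS * omega)


# Hypothetical example: 1e10 photons of angular frequency 3e15 rad/s in 1 cm^3.
e0 = field_amplitude(1.0e10, 3.0e15, 1.0)
n_back = photon_number(e0, 3.0e15, 1.0)
```

Because the amplitude scales as the square root of N, quadrupling the photon number only doubles the field strength.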


A1.6.1.2 INTERACTION BETWEEN FIELD AND MATTER 


(A) CLASSICAL THEORY 


To this point, we have considered only the radiation field. We now turn to the interaction between the matter 
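and the field. According to classical electromagnetic theory, the force on a particle with charge e due to the
electric and magnetic fields is

\[ \mathbf{F} = e\left(\mathbf{E} + \frac{\mathbf{v}\times\mathbf{B}}{c}\right). \tag{A1.6.16} \]

This interaction can also be expressed in terms of a Hamiltonian:

\[ H(\mathbf{p}, \mathbf{A}) = \frac{1}{2m}\left(\mathbf{p} - \frac{e}{c}\mathbf{A}\right)^2 \tag{A1.6.17} \]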


where A = A(x) and where p and x are the conjugate variables that obey the canonical Hamilton equations.
(Verifying that equation (A1.6.17) reduces to equation (A1.6.16) is non-trivial (cf [3]).) Throughout the
remainder of this section the radiation field will be treated using classical electromagnetic theory, while the
matter will be treated quantum mechanically, that is, a 'semiclassical' treatment. The Hamiltonian form for
the interaction, equation (A1.6.17), provides a convenient starting point for this semiclassical treatment.

(B) QUANTUM HAMILTONIAN FOR A PARTICLE IN AN ELECTROMAGNETIC FIELD

To convert the Hamiltonian for the material from a classical to a quantum form, we simply replace p with $-i\hbar\nabla$.
This gives:

\[ H = \frac{1}{2m}\left(-i\hbar\nabla - \frac{e}{c}\mathbf{A}\right)^2 + V_s \tag{A1.6.18} \]

\[ = -\frac{\hbar^2}{2m}\nabla^2 + V_s + \frac{i\hbar e}{2mc}(\nabla\cdot\mathbf{A} + \mathbf{A}\cdot\nabla) + \frac{e^2}{2mc^2}\mathbf{A}\cdot\mathbf{A} \tag{A1.6.19} \]

\[ = H_0 + V \tag{A1.6.20} \]

where $H_0$ is the Hamiltonian of the bare system and V is the part of the Hamiltonian that comes from the
radiation field and the radiation-matter interaction. Note that an additional term, $V_s$, has been included in the
system Hamiltonian, to allow for internal potential energy of the system. $V_s$ contains all the interesting
features that make different atoms and molecules distinct from one another, and will play a significant role in
later sections.

We now make the following observations.

(i) For many charged particles

\[ V = \sum_j\left[\frac{i\hbar e_j}{2m_jc}(\nabla_j\cdot\mathbf{A} + \mathbf{A}\cdot\nabla_j) + \frac{e_j^2}{2m_jc^2}\mathbf{A}\cdot\mathbf{A}\right]. \]

(ii) In the Coulomb gauge $\nabla\cdot\mathbf{A} = 0$. This implies that $\nabla\cdot(\mathbf{A}\psi) = \mathbf{A}\cdot\nabla\psi$ for any $\psi$, and hence the terms
linear in A can be combined:

\[ \nabla\cdot\mathbf{A} + \mathbf{A}\cdot\nabla = 2\mathbf{A}\cdot\nabla. \]

(iii) The quadratic term in A,

\[ \frac{e^2}{2mc^2}\mathbf{A}\cdot\mathbf{A}, \]

can be neglected except for very strong fields, on the order of $10^{15}$ W cm$^{-2}$ [4].

(iv) For isolated molecules, it is generally the case that the wavelength of light is much larger than the
molecular dimensions. In this case it is a good approximation to make the replacement $e^{i\mathbf{k}\cdot\mathbf{r}} \approx 1$, which
allows the replacement [3]

\[ V = -\frac{e}{mc}\mathbf{A}\cdot\mathbf{p} = -\mathbf{E}\cdot e\mathbf{r}. \]

For many electrons and nuclei, V takes the following form:

\[ V = -\mathbf{E}\cdot\sum_j Z_je\mathbf{r}_j = -\boldsymbol{\mu}\cdot\mathbf{E} \tag{A1.6.21} \]

where we have defined the dipole operator, $\boldsymbol{\mu} = \sum_j Z_je\mathbf{r}_j$. The dipole moment is seen to be a product
of charge and distance, and has the physical interpretation of the degree of charge separation in the
atom or molecule. Note that for not-too-intense fields, equation (A1.6.21) is the dominant term in the
radiation-matter interaction; this is the dipole approximation.

A1.6.1.3 ABSORPTION, STIMULATED EMISSION AND SPONTANEOUS EMISSION OF LIGHT

Consider a quantum system with two levels, a and b, with energy levels $E_a$ and $E_b$. Furthermore, let the
perturbation between these levels be of the form equation (A1.6.21), with monochromatic light, that is, $\mathbf{E} = \mathbf{E}_0\cos(\omega t)$,
resonant to the transition frequency between the levels, so $\omega = (E_b - E_a)/\hbar \equiv E_{ba}/\hbar$. The perturbation matrix
element between a and b is then given by

\[ V_{ba} = -\mathbf{E}_0\cos(\omega t)\cdot\langle b|\boldsymbol{\mu}|a\rangle = -\frac{\mathbf{E}_0}{2}(e^{i\omega t} + e^{-i\omega t})\cdot\boldsymbol{\mu}_{ba} \tag{A1.6.22} \]

where

\[ \boldsymbol{\mu}_{ba} = \langle b|\boldsymbol{\mu}|a\rangle = \boldsymbol{\mu}_{ab}^* \]

is the dipole matrix element. There are three fundamental possible kinds of transitions connected by the dipole
interaction: absorption (a → b), corresponding to the second term in equation (A1.6.22); stimulated emission
(b → a) governed by the first term in equation (A1.6.22); and spontaneous emission (also b → a), for which
there is no term in the classical radiation field. For a microscopic description of the latter, a quantum
mechanical treatment of the radiation field is required. Nevertheless, there is a simple prescription for taking
spontaneous emission into account, which was derived by Einstein during the period of the old quantum 
theory on the basis of considerations of thermal equilibrium between the matter and the radiation. Although 
for most of the remainder of this section the assumption of thermal equilibrium will not be satisfied, it is 
convenient to invoke it here to quantify spontaneous emission. 

Fermi's Golden Rule expresses the rate of transitions between b and a as

\[ W = \frac{2\pi}{\hbar}|V_{ba}|^2\rho(E_{ba}) \tag{A1.6.23} \]

where $\rho(E_{ba})$ is the density of final states for both the system and the light. As described above, we will
consider the special case of both the matter and light at thermal equilibrium. The system final state is by
assumption non-degenerate, but there is a frequency dependent degeneracy factor for thermal light, $\rho(E)\,\mathrm{d}E$,
where

\[ \rho(E) = \frac{V\omega^2\,\mathrm{d}\Omega}{(2\pi c)^3\hbar} \tag{A1.6.24} \]

and V is the volume of the 'box' and $\mathrm{d}\Omega$ is an element of solid angle.

The thermal light induces transitions from a → b and from b → a in proportion to the number of photons
present. The number of transitions per second induced by absorption is

\[ W_{abs}(a \to b) = \frac{2\pi}{\hbar}|V_{ba}|^2\rho(E_b - E_a) \tag{A1.6.25} \]

\[ = \frac{2\pi}{\hbar}\frac{E_0^2}{4}|\mathbf{e}\cdot\boldsymbol{\mu}_{ba}|^2\frac{V\omega^2}{(2\pi c)^3\hbar}\,\mathrm{d}\Omega. \tag{A1.6.26} \]

Integrating over all solid angles and using equation (A1.6.15) and equation (A1.6.10) we find

\[ W_{abs}(a \to b) = \frac{4}{3\hbar}N\frac{\omega^3}{c^3}|\boldsymbol{\mu}_{ba}|^2. \tag{A1.6.27} \]

For thermal light, the number of transitions per second induced by stimulated emission integrated over solid
angles, $W_{stim}$, is equal to $W_{abs}$. The total emission, which is the sum of the stimulated and spontaneous
emission, may be obtained by letting N → N + 1 in the expression for stimulated emission, giving

\[ W_{em}(b \to a) = \frac{4}{3\hbar}(N + 1)\frac{\omega^3}{c^3}|\boldsymbol{\mu}_{ba}|^2. \tag{A1.6.28} \]

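Einstein's original treatment [5] used a somewhat different notation, which is still in common use:

\[ W_{abs}(a \to b) = B(a \to b)\rho(\nu), \qquad W_{stim}(b \to a) = B(b \to a)\rho(\nu) \]

where

\[ \rho(\nu) = \frac{2hN\hbar\omega}{V}\rho(E) = \frac{2N\omega^3\hbar}{\pi c^3} \]

is the energy in the field per unit volume between frequencies $\nu$ and $\nu + \mathrm{d}\nu$ (the 'radiation density'); the factor
of 2 comes from the two polarizations of the light and the factor h from the scaling between energy and
frequency. Comparing with equation (A1.6.27) leads to the identification

\[ B(b \to a) = B(a \to b) = \frac{2\pi}{3\hbar^2}|\boldsymbol{\mu}_{ab}|^2. \]

Moreover, in Einstein's treatment

\[ W_{spont}(b \to a) = A(b \to a) = \frac{4}{3\hbar}\left(\frac{\omega}{c}\right)^3|\boldsymbol{\mu}_{ab}|^2 \]

leading to the following ratio of the Einstein A and B coefficients:

\[ \frac{A(b \to a)}{B(b \to a)} = \frac{2\hbar}{\pi}\left(\frac{\omega}{c}\right)^3. \tag{A1.6.29} \]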


The argument is sometimes given that equation (A1.6.29) implies that the ratio of spontaneous to stimulated
emission goes as the cube of the emitted photon frequency. This argument must be used with some care: recall
that for light at thermal equilibrium, $W_{stim}$ goes as $B\rho$, and hence the rate of stimulated emission has a factor
of $(\omega/c)^3$ coming from $\rho$. The ratio of the spontaneous to the stimulated emission rates is therefore frequency
independent! However, for non-thermal light sources (e.g. lasers), only a small number of energetically
accessible states of the field are occupied, and the $\rho$ factor is on the order of unity. The rate of spontaneous
emission still goes as $\omega^3$, but the rate of stimulated emission goes as $\omega$, and hence the ratio of spontaneous to
stimulated emission goes as $\omega^2$. Thus, for typical light sources, spontaneous emission dominates at
frequencies in the UV region and above, while stimulated emission dominates at frequencies in the far-IR
region and below, with both processes participating at intermediate frequencies.
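The frequency independence of the thermal ratio can be checked numerically. The sketch below assumes Gaussian units and takes the Einstein coefficients in the forms A = 4 omega^3 |mu|^2 / (3 hbar c^3) and B = 2 pi |mu|^2 / (3 hbar^2), with the radiation density written as 2 N hbar omega^3 / (pi c^3) for N photons per mode. The function names and the numerical inputs (a 1 debye transition dipole, for instance) are hypothetical.

```python
import math

HBAR_CGS = 1.054571817e-27   # reduced Planck constant in erg s
C_CGS = 2.99792458e10        # speed of light in cm / s


def einstein_a(omega, mu):
    """Spontaneous emission rate A for transition dipole mu (esu cm)."""
    return 4.0 * omega ** 3 * mu ** 2 / (3.0 * HBAR_CGS * C_CGS ** 3)


def einstein_b(mu):
    """Einstein B coefficient for transition dipole mu (esu cm)."""
    return 2.0 * math.pi * mu ** 2 / (3.0 * HBAR_CGS ** 2)


def radiation_density(n_photons, omega):
    """Radiation density for N photons per mode at angular frequency omega."""
    return 2.0 * n_photons * HBAR_CGS * omega ** 3 / (math.pi * C_CGS ** 3)


omega = 3.0e15     # rad / s
mu = 1.0e-18       # esu cm, i.e. 1 debye
n_photons = 5.0    # photons per mode
stimulated = einstein_b(mu) * radiation_density(n_photons, omega)
spontaneous = einstein_a(omega, mu)
```

With these definitions the stimulated rate equals N times the spontaneous rate at any frequency, which is the content of the N → N + 1 rule for the total emission.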

A1.6.1.4 INTERACTION BETWEEN MATTER AND FIELD 

In the previous sections we have described the interaction of the electromagnetic field with matter, that is, the 
way the material is affected by the presence of the field. But there is a second, reciprocal perspective: the 
excitation of the material by the electromagnetic field generates a dipole (polarization) where none existed 
previously. Over a sample of finite size this dipole is macroscopic, and serves as a new source term in 
Maxwell's equations. For weak fields, the source term, P, is linear in the field strength. Thus, 


\[ P = \chi E \tag{A1.6.30} \]

where the proportionality constant $\chi$, called the (linear) susceptibility, is generally frequency dependent and
complex. As we shall see below, the imaginary part of the linear susceptibility determines the absorption
spectrum while the real part determines the dispersion, or refractive index of the material. There is a universal
relationship between the real part and the imaginary part of the linear susceptibility, known as the
Kramers-Kronig relation, which establishes a relationship between the absorption spectrum and the
frequency-dependent refractive index. With the addition of the source term P, Maxwell's equations still have wavelike
solutions, but the relation between frequency and wavevector in equation (A1.6.10) must be generalized as
follows:

\[ k^2 = \left(\frac{\omega}{c}\right)^2(1 + \chi) = \left(\frac{\omega}{c}\right)^2\epsilon. \tag{A1.6.31} \]

The quantity $1 + \chi$ is known as the dielectric constant, $\epsilon$; it is constant only in the sense of being independent
of E, but is generally dependent on the frequency of E. Since $\chi$ is generally complex so is the wavevector k. It
is customary to write

\[ \frac{kc}{\omega} = \eta + i\kappa \tag{A1.6.32} \]

where $\eta$ and $\kappa$ are the refractive index and extinction coefficient, respectively. The travelling wave solutions
to Maxwell's equations, propagating in the z-direction, now take the form

\[ \exp(i(kz - \omega t)) = \exp\left[i\omega\left(\frac{\eta z}{c} - t\right)\right]\exp\left(-\frac{\omega\kappa z}{c}\right). \tag{A1.6.33} \]

In this form it is clear that $\kappa$ leads to an attenuation of the electric field amplitude with distance (i.e.
absorption).
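The link between a complex susceptibility and attenuation can be illustrated with a short Python sketch. It follows the convention of the text, in which the square of the complex index equals 1 + chi, and uses the principal branch of the complex square root so that a positive imaginary part of chi gives a positive extinction coefficient. The function names and the numerical susceptibility are hypothetical.

```python
import cmath
import math

C_CGS = 2.99792458e10  # speed of light in cm / s


def refractive_index(chi):
    """Return (eta, kappa) from (eta + i*kappa)**2 = 1 + chi."""
    n = cmath.sqrt(1.0 + chi)
    return n.real, n.imag


def amplitude_attenuation(kappa, omega, z):
    """Factor exp(-omega*kappa*z/c) multiplying the field amplitude after a path z (cm)."""
    return math.exp(-omega * kappa * z / C_CGS)


# Hypothetical weakly absorbing medium.
eta, kappa = refractive_index(0.5 + 0.01j)
factor = amplitude_attenuation(kappa, 3.0e15, 1.0e-4)
```

A purely real susceptibility gives kappa = 0 and no attenuation, while any positive imaginary part produces exponential decay of the amplitude with distance.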

For stronger fields the relationship between the macroscopic polarization and the incident field is non-linear. 
The general relation between P and E is written as 

\[ P = \chi^{(1)}\cdot E + \chi^{(2)}\cdot E^2 + \chi^{(3)}\cdot E^3 + \cdots = P^{(1)} + P^{(2)} + P^{(3)} + \cdots. \tag{A1.6.34} \]

The microscopic origin of $\chi$ and hence of P is the non-uniformity of the charge distribution in the medium. To
lowest order this is given by the dipole moment, which in turn can be related to the dipole moments of the
component molecules in the sample. Thus, on a microscopic quantum mechanical level we have the relation

\[ P = \langle\psi|\mu|\psi\rangle. \tag{A1.6.35} \]

Assuming that the material has no permanent dipole moment, P originates from changes in the wavefunction
$\psi$ that are induced by the field; this will be our starting point in section A1.6.4.


A1.6.2 COHERENCE PROPERTIES OF LIGHT AND MATTER 

In the previous section we discussed light and matter at equilibrium in a two-level quantum system. For the 
remainder of this section we will be interested in light and matter which are not at equilibrium. In particular, 
laser light is completely different from the thermal radiation described at the end of the previous section. In 
the first place, only one, or a small number of states of the field are occupied, in contrast with the Planck 
distribution of occupation numbers in thermal radiation. Second, the field state can have a precise phase; in 
thermal radiation this phase is assumed to be random. If multiple field states are occupied in a laser they can 
have a precise phase relationship, something which is achieved in lasers by a technique called 'mode-locking'. 
Multiple frequencies with a precise phase relation give rise to laser pulses in time. Nanosecond experiments
have been very useful in probing, for example, radiationless transitions, intramolecular dynamics and
radiative lifetimes of single vibronic levels in molecules. Picosecond experiments have been useful in 
probing, for example, collisional relaxation times and rotational reorientation times in solutions. Femtosecond 
experiments have been useful in observing the real time breaking and formation of chemical bonds; such 
experiments will be described in the next section. Any time that the phase is precisely correlated in time over 
the duration of an experiment, or there is a superposition of frequencies with well-defined relative phases, the 
process is called coherent. Single frequency coherent processes will be the major subject of section A1.6.2,
while multifrequency coherent processes will be the focus for the remainder of the section.

A1.6.2.1 WAVEPACKETS: SOLUTIONS OF THE TIME-DEPENDENT SCHRÖDINGER EQUATION

The central equation of (non-relativistic) quantum mechanics, governing an isolated atom or molecule, is the
time-dependent Schrödinger equation (TDSE):

$$i\hbar\frac{\partial\Psi(x,t)}{\partial t} = H\Psi(x,t). \tag{A1.6.36}$$

In this equation H is the Hamiltonian (developed in the previous section) which consists of the bare system 
Hamiltonian and a term coming from the interaction between the system and the light. That is, 

$$H = -\frac{\hbar^2}{2m}\nabla^2 + V_0(x) - E(t)\mu. \tag{A1.6.37}$$

Since we are now interested in the possibility of coherent light, we have taken the interaction between the
radiation and matter to be some general time-dependent interaction, $V = -E(t)\mu$, which could in principle
contain many frequency components. At the same time, for simplicity, we neglect the vector character of the
electric field in what follows. The vector character will be reintroduced in section A1.6.4, in the context of
nonlinear spectroscopy.

Real molecules in general have many quantum levels, and the TDSE can exhibit complicated behaviour even 
in the absence of a field. To simplify matters, it is worthwhile discussing some properties of the solutions of 
the TDSE in the absence of a field and then reintroducing the field. First let us consider 

$$H = -\frac{\hbar^2}{2m}\nabla^2 + V_0(x). \tag{A1.6.38}$$

Since in this case the Hamiltonian is time independent, the general solution can be written as 

$$\Psi(x,t) = \sum_{n=1}^{\infty} a_n\psi_n(x)\,e^{-(i/\hbar)E_n t}. \tag{A1.6.39}$$

(This expression assumes a system with a discrete level structure; for systems with both a discrete and a
continuous portion to their spectrum the expression consists of a sum over the discrete states and an integral
over the continuous states.) Here, $\psi_n(x)$ is a solution of the time-independent Schrödinger equation,

$$H\psi_n(x) = E_n\psi_n(x),$$

with eigenvalue $E_n$. The coefficients, $a_n$, satisfy the normalization condition $\sum_n |a_n|^2 = 1$, and are time
independent in this case. Equation (A1.6.39) describes a moving wavepacket, that is, a state whose average
values in coordinate and momentum change with time. To see this, note that according to quantum mechanics
$|\Psi(x,t)|^2\,dx$ is the probability to find the particle between $x$ and $x + dx$ at time $t$. Using equation
(A1.6.39) we see that

$$|\Psi(x,t)|^2 = \sum_{n,m} a_n a_m^*\,\psi_n(x)\psi_m^*(x)\,e^{-(i/\hbar)(E_n - E_m)t};$$

in other words, the probability density has a non-vanishing time dependence so long as there are components
of two or more different energy eigenstates.
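
As a worked illustration (a sketch restricted to two states, a special case not singled out in the text), keeping
only two terms, labelled 1 and 2, in the expansion of equation (A1.6.39) gives

```latex
|\Psi(x,t)|^2 = |a_1|^2|\psi_1(x)|^2 + |a_2|^2|\psi_2(x)|^2
  + 2\,\mathrm{Re}\!\left[a_1 a_2^*\,\psi_1(x)\,\psi_2^*(x)\,e^{-(i/\hbar)(E_1 - E_2)t}\right].
```

The first two terms are static; only the cross term moves, oscillating at the Bohr frequency $(E_2 - E_1)/\hbar$,
so that the density returns to its initial form after a time $2\pi\hbar/|E_2 - E_1|$. If either $a_1$ or $a_2$
vanishes the cross term disappears and the density is stationary.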

One of the remarkable features of time evolution of wavepackets is the close connection they exhibit with the 
motion of a classical particle. Specifically, Ehrenfest's theorem indicates that for potentials up to quadratic, 
the average value of position and momentum of the quantum wavepacket as a function of time is exactly the 
same as that of a classical particle on the same potential that begins with the corresponding initial conditions 
in position and momentum. This classical-like behaviour is illustrated in figure A1.6.1 for a displaced
Gaussian wavepacket in a harmonic potential. For the case shown, the initial width is the same as the
ground-state width, a 'coherent state', and hence the Gaussian moves without spreading. By way of contrast, if the
initial Gaussian has a different width parameter, the centre of the Gaussian still satisfies the classical
equations of motion; however, the width will spread and contract periodically in time, twice per period.

Figure A1.6.1. Gaussian wavepacket in a harmonic oscillator. Note that the average position and momentum
change according to the classical equations of motion (adapted from [6]).
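
The classical behaviour of the average position can be checked numerically from the eigenstate expansion of
equation (A1.6.39). The following is a minimal sketch, assuming dimensionless units ($\hbar = m = \omega = 1$) and
a ground-state Gaussian displaced to `x0`; the function names and the truncation `n_max` are illustrative choices,
not part of the original treatment.

```python
import math

def coherent_state_amplitudes(x0, n_max=80):
    """Number-state amplitudes a_n of a ground-state Gaussian displaced to x0.

    Dimensionless units (hbar = m = omega = 1) are assumed, so alpha = x0 / sqrt(2).
    """
    alpha = x0 / math.sqrt(2.0)
    amps = [math.exp(-0.5 * alpha * alpha)]
    for n in range(n_max):
        # recursion a_{n+1} = a_n * alpha / sqrt(n + 1)
        amps.append(amps[-1] * alpha / math.sqrt(n + 1))
    return amps

def mean_position(x0, t, n_max=80):
    """<x>(t) from the eigenstate expansion, with E_n = n + 1/2."""
    amps = coherent_state_amplitudes(x0, n_max)
    # time-dependent coefficients c_n(t) = a_n exp(-i E_n t)
    coeffs = [a * complex(math.cos((n + 0.5) * t), -math.sin((n + 0.5) * t))
              for n, a in enumerate(amps)]
    # <x> = sqrt(2) Re sum_n sqrt(n + 1) conj(c_n) c_{n+1}
    total = sum(math.sqrt(n + 1) * coeffs[n].conjugate() * coeffs[n + 1]
                for n in range(n_max))
    return math.sqrt(2.0) * total.real
```

For any time `t`, `mean_position(x0, t)` should agree with the classical trajectory `x0 * math.cos(t)` to within
the truncation error of the sum.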

A1.6.2.2 COHERENCE IN A TWO-LEVEL SYSTEM: THE RABI SOLUTION

We now add the field back into the Hamiltonian, and examine the simplest case of a two-level system coupled
to coherent, monochromatic radiation. This material is included in many textbooks (e.g. [6, 7, 8, 9, 10 and
11]). The system is described by a Hamiltonian $H_0$ having only two eigenstates, $\psi_a$ and $\psi_b$, with energies
$E_a = \hbar\omega_a$ and $E_b = \hbar\omega_b$. Define $\omega_0 = \omega_b - \omega_a$. The most general wavefunction for this system may be
written as

$$\Psi(t) = a(t)e^{-i\omega_a t}\psi_a + b(t)e^{-i\omega_b t}\psi_b. \tag{A1.6.40}$$

The coefficients $a(t)$ and $b(t)$ are subject to the constraint that $|a(t)|^2 + |b(t)|^2 = 1$. If we couple this system
to a light field, represented as $V = -\mu_{ab}E_0\cos(\omega t)$, then we may write the TDSE in matrix form as

$$i\hbar\frac{d}{dt}\begin{pmatrix} a(t)e^{-i\omega_a t} \\ b(t)e^{-i\omega_b t}\end{pmatrix} = \begin{pmatrix} E_a & -\mu_{ab}E_0\cos(\omega t) \\ -\mu_{ba}E_0\cos(\omega t) & E_b\end{pmatrix}\begin{pmatrix} a(t)e^{-i\omega_a t} \\ b(t)e^{-i\omega_b t}\end{pmatrix}. \tag{A1.6.41}$$

To continue we define a detuning parameter, $\Delta = \omega - \omega_0$. If $|\Delta| \ll \omega_0$ then $\exp(-i(\omega - \omega_0)t)$ is slowly
varying while $\exp(-i(\omega + \omega_0)t)$ is rapidly varying and cannot transfer much population from state A to state B. We
therefore ignore the latter term; this is known as the 'rotating wave approximation'. If we choose as initial
conditions $|a(0)|^2 = 1$ and $|b(0)|^2 = 0$ then the solution of equation (A1.6.41) is

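$$a(t) = e^{i\Delta t/2}\left[\cos\left(\tfrac12\Omega t\right) - i\frac{\Delta}{\Omega}\sin\left(\tfrac12\Omega t\right)\right] \tag{A1.6.42}$$

$$b(t) = e^{-i\Delta t/2}\left(\frac{\mu_{ba}E_0}{\hbar\Omega}\right) i\sin\left(\tfrac12\Omega t\right), \tag{A1.6.43}$$

where the Rabi frequency, $\Omega$, is defined as

$$\Omega = \sqrt{\Delta^2 + \left(\frac{\mu_{ba}E_0}{\hbar}\right)^2}. \tag{A1.6.44}$$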


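The populations as functions of time are then

$$|a(t)|^2 = \cos^2\left(\tfrac12\Omega t\right) + \left(\frac{\Delta}{\Omega}\right)^2\sin^2\left(\tfrac12\Omega t\right) \tag{A1.6.45}$$

$$|b(t)|^2 = \left(\frac{\mu_{ba}E_0}{\hbar\Omega}\right)^2\sin^2\left(\tfrac12\Omega t\right). \tag{A1.6.46}$$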

The population in the upper state as a function of time is shown in figure A1.6.2. There are several important
things to note. At early times, resonant and non-resonant excitation produce the same population in the upper
state because, for short times, the population in the upper state is independent of the Rabi frequency:

$$|b(t)|^2 \approx \left(\frac{\mu_{ba}E_0}{\hbar\Omega}\right)^2\left(\frac{\Omega t}{2}\right)^2 = \left(\frac{\mu_{ba}E_0 t}{2\hbar}\right)^2 \qquad (\Omega t \ll 1). \tag{A1.6.47}$$


One should also notice that resonant excitation completely cycles the population between the lower and upper 
state with a period of $2\pi/\Omega$. Non-resonant excitation also cycles population between the states but never
completely depopulates the lower state. Finally, one should notice that non-resonant excitation cycles 
population between the two states at a faster rate than resonant excitation. 



Figure A1.6.2. The population in the upper state as a function of time for resonant excitation (full curve) and
for non-resonant excitation (dashed curve). 
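
The three observations above can be reproduced directly from the closed-form upper-state population. The sketch
below assumes the population has the form of equation (A1.6.46), with the coupling $\mu E_0/\hbar$ and the
detuning $\Delta$ given in the same (arbitrary) frequency units; the function names and numerical values are
illustrative.

```python
import math

def rabi_frequency(detuning, coupling):
    """Omega = sqrt(Delta^2 + (mu E0 / hbar)^2), cf. equation (A1.6.44)."""
    return math.hypot(detuning, coupling)

def upper_state_population(t, detuning, coupling):
    """|b(t)|^2 for a system that starts entirely in the lower state."""
    omega = rabi_frequency(detuning, coupling)
    return (coupling / omega) ** 2 * math.sin(0.5 * omega * t) ** 2

coupling = 1.0                      # mu*E0/hbar, arbitrary units (assumed value)
t_pi = math.pi / coupling           # half a resonant Rabi period
resonant_peak = upper_state_population(t_pi, 0.0, coupling)
# first maximum of the non-resonant curve, for an assumed detuning of 2.0
detuned_peak = upper_state_population(math.pi / rabi_frequency(2.0, coupling), 2.0, coupling)
```

Here `resonant_peak` is the complete inversion reached on resonance, while `detuned_peak` is limited to
`(coupling / omega) ** 2`, reached earlier because the Rabi frequency is larger off resonance.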

A1.6.2.3 GEOMETRICAL REPRESENTATION OF THE EVOLUTION OF A TWO-LEVEL SYSTEM

A more intuitive, and more general, approach to the study of two-level systems is provided by the Feynman- 
Vernon-Hellwarth geometrical picture. To understand this approach we need to first introduce the density 
matrix. 


In the Rabi solution of the previous section we considered a wavefunction $\Psi(t)$ of the form

$$\Psi(t) = a(t)e^{-i\omega_a t}\psi_a + b(t)e^{-i\omega_b t}\psi_b. \tag{A1.6.48}$$

We saw that the time-dependent populations in each of the two levels are given by $P_a = |a(t)|^2$ and $P_b = |b(t)|^2$.
So long as the field is on, these populations continue to change; however, once the external field is turned off,
these populations remain constant (discounting relaxation processes, which will be introduced below). Yet the
amplitudes in the states $\psi_a$ and $\psi_b$ do continue to change with time, due to the accumulation of
time-dependent phase factors during the field-free evolution. We can obtain a convenient separation of the
time-dependent and the time-independent quantities by defining a density matrix, $\rho$. For the case of the
wavefunction $|\Psi\rangle$, $\rho$ is given as the 'outer product' of $|\Psi\rangle$ with itself,

$$\rho = |\Psi\rangle\langle\Psi|. \tag{A1.6.49}$$

This outer product gives four terms, which may be arranged in matrix form as

$$\rho = \begin{pmatrix} |a|^2 & ab^*e^{i\omega_0 t} \\ a^*b\,e^{-i\omega_0 t} & |b|^2\end{pmatrix}. \tag{A1.6.50}$$

Note that the diagonal elements of the matrix, $|a|^2$ and $|b|^2$, correspond to the populations in the energy levels,
a and b, and contain no time dependence, while the off-diagonal elements, called the coherences, contain all 
the time dependence. 
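
A short numerical check makes the separation explicit. The sketch below forms the outer product for the
wavefunction of equation (A1.6.48) with the field off (constant `a` and `b`); the amplitudes and frequencies
are arbitrary illustrative values, and the row and column ordering (a, b) is an assumption of the example.

```python
import cmath

def density_matrix(a, b, omega_a, omega_b, t):
    """Outer product |Psi><Psi| for Psi = a e^{-i w_a t} psi_a + b e^{-i w_b t} psi_b.

    Rows and columns are ordered (a, b); a and b are held fixed (field off).
    """
    c = [a * cmath.exp(-1j * omega_a * t), b * cmath.exp(-1j * omega_b * t)]
    return [[c[i] * c[j].conjugate() for j in range(2)] for i in range(2)]

a, b = 0.6, 0.8j                 # illustrative amplitudes with |a|^2 + |b|^2 = 1
rho_0 = density_matrix(a, b, 1.0, 3.5, t=0.0)
rho_t = density_matrix(a, b, 1.0, 3.5, t=2.0)
```

The diagonal elements of `rho_t` equal those of `rho_0`, whereas the off-diagonal element `rho_t[0][1]` differs
from `rho_0[0][1]` by the phase factor $e^{i\omega_0 t}$ with $\omega_0 = \omega_b - \omega_a$.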

A differential equation for the time evolution of the density operator may be derived by taking the time
derivative of equation (A1.6.49) and using the TDSE to replace the time derivative of the wavefunction with
the Hamiltonian operating on the wavefunction. The result is called the Liouville equation, that is,

$$i\hbar\frac{\partial\rho}{\partial t} = [H,\rho]. \tag{A1.6.51}$$

The strategy for representing this differential equation geometrically is to expand both $H$ and $\rho$ in terms of the
three Pauli spin matrices, $\sigma_1$, $\sigma_2$ and $\sigma_3$, and then view the coefficients of these matrices as time-dependent
vectors in three-dimensional space. We begin by writing the two-level system Hamiltonian in the
following general form,

$$H = \begin{pmatrix} E_b & V_{ab} \\ V_{ab}^* & E_a\end{pmatrix} \tag{A1.6.52}$$

where we take the radiation-matter interaction to be of the dipole form, but allow for arbitrary time-dependent
electric fields:

$$V_{ab} = -\mu_{ab}E(t). \tag{A1.6.53}$$

Moreover, we will write the density matrix for the system as

$$\rho = \begin{pmatrix} bb^* & a^*b \\ ab^* & aa^*\end{pmatrix} \tag{A1.6.54}$$
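where $a$ and $b$ now contain the bare system evolution phase factors. We proceed to express both the
Hamiltonian and the density matrix in terms of the standard Pauli spin matrices:

$$H = \tfrac12(E_a + E_b)I + \tfrac12\left[(V_{ab} + V_{ab}^*)\sigma_1 + i(V_{ab} - V_{ab}^*)\sigma_2 + (E_b - E_a)\sigma_3\right],$$

$$\rho = \tfrac12\left[I + (ab^* + a^*b)\sigma_1 + i(a^*b - ab^*)\sigma_2 + (bb^* - aa^*)\sigma_3\right].$$

We now define the three-dimensional vectors, $\mathbf{r}$ and $\mathbf{\Omega}$, consisting of the coefficients of the Pauli matrices in
the expansion of $\rho$ and $H$, respectively (the latter divided by $\hbar$):

$$\mathbf{r} = (r_1, r_2, r_3) = \left(ab^* + a^*b,\ i(a^*b - ab^*),\ bb^* - aa^*\right) \tag{A1.6.55}$$

$$\mathbf{\Omega} = \frac{1}{\hbar}\left(V_{ab} + V_{ab}^*,\ i(V_{ab} - V_{ab}^*),\ E_b - E_a\right). \tag{A1.6.56}$$

Using these vectors, we can rewrite the Liouville equation for the two-level system as

$$\frac{d\mathbf{r}}{dt} = \mathbf{\Omega}\times\mathbf{r}. \tag{A1.6.57}$$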


Note that $r_3$ is the population difference between the upper and lower states: having all the population in the
lower state corresponds to $r_3 = -1$ while having a completely inverted population (i.e. no population in the
lower state) corresponds to $r_3 = +1$.
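
The step from the Liouville equation (A1.6.51) to the precession equation (A1.6.57) can be sketched as follows.
We assume the normalizations $\rho = \tfrac12(I + \mathbf{r}\cdot\boldsymbol{\sigma})$ and
$H = h_0 I + \tfrac{\hbar}{2}\mathbf{\Omega}\cdot\boldsymbol{\sigma}$ (the symbol $h_0$ and these prefactors are
conventions adopted here for illustration) and use the standard Pauli-matrix identity
$[\mathbf{A}\cdot\boldsymbol{\sigma}, \mathbf{B}\cdot\boldsymbol{\sigma}] = 2i(\mathbf{A}\times\mathbf{B})\cdot\boldsymbol{\sigma}$:

```latex
i\hbar\,\tfrac12\,\dot{\mathbf r}\cdot\boldsymbol\sigma
  = [H,\rho]
  = \tfrac{\hbar}{4}\,[\boldsymbol\Omega\cdot\boldsymbol\sigma,\ \mathbf r\cdot\boldsymbol\sigma]
  = \tfrac{i\hbar}{2}\,(\boldsymbol\Omega\times\mathbf r)\cdot\boldsymbol\sigma
\quad\Longrightarrow\quad
\dot{\mathbf r} = \boldsymbol\Omega\times\mathbf r .
```

The term proportional to the identity in $H$ commutes with $\rho$ and drops out. Because the right-hand side is
always perpendicular to $\mathbf{r}$, the length of $\mathbf{r}$ is conserved, which is why the motion takes place on the
unit sphere.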

This representation is slightly inconvenient since $\Omega_1$ and $\Omega_2$ in equation (A1.6.56) are explicitly
time-dependent. For a monochromatic light field of frequency $\omega$, we can transform to a frame of reference rotating
at the frequency of the light field so that the vector $\mathbf{\Omega}$ is a constant. To completely remove the time dependence
we make the rotating wave approximation (RWA) as before:
$E_0\cos(\omega t) = \tfrac12(E_0e^{-i\omega t} + E_0e^{i\omega t}) \to \tfrac12E_0e^{-i\omega t}$. In
the rotating frame, the Liouville equation for the system is

$$\frac{d\mathbf{r}'}{dt} = \mathbf{\Omega}'\times\mathbf{r}' \tag{A1.6.58}$$


where $\mathbf{\Omega}'$ is now time independent. The geometrical interpretation of this equation is that the pseudospin
vector, $\mathbf{r}'$, precesses around the field vector, $\mathbf{\Omega}'$, in exactly the same way that the angular momentum vector
precesses around a body fixed axis of a rigid object in classical mechanics. This representation of the
two-level system is called the Feynman-Vernon-Hellwarth, or FVH representation; it gives a unified, pictorial
view with which one can understand a wide variety of optical pulse effects in two-level systems.
For example, the geometrical picture of Rabi cycling within the FVH picture is shown in figure A1.6.3. Assuming
that at $t = 0$ all the population is in the ground state then the initial position of the $\mathbf{r}'$ vector is (0,0,-1), and so
$\mathbf{r}'$ points along the negative z-axis. For a resonant field, $\omega - \omega_0 = 0$ and so the $\mathbf{\Omega}'$ vector points along the
x-axis. Equation (A1.6.58) then says that the population vector simply precesses about the x-axis. It then
periodically points along the positive z-axis, which corresponds to having all the population in the upper state.
If the field is non-resonant, then $\mathbf{\Omega}'$ no longer points along the x-axis but along some other direction in the
xz-plane. The population vector still precesses about the field vector, but now at some angle to the z-axis. Thus,
the projection onto the z-axis of $\mathbf{r}'$ never equals one and so there is never a complete population inversion.



Figure A1.6.3. FVH diagram, exploiting the isomorphism between the two-level system and a pseudospin
vector precessing on the unit sphere. The pseudospin vector, $\mathbf{r}'$, precesses around the field vector, $\mathbf{\Omega}'$,
according to the equation $d\mathbf{r}'/dt = \mathbf{\Omega}'\times\mathbf{r}'$. The z-component of the $\mathbf{r}'$ vector is the population
difference between the two levels, while the x- and y-components refer to the polarization, that is, the real and
imaginary parts of the coherence between the amplitude in the two levels. In the frame of reference rotating at the
carrier frequency, the z-component of the $\mathbf{\Omega}'$ vector is the detuning of the field from resonance, while the x- and
y-components indicate the field amplitude. In the rotating frame, the y-component of $\mathbf{\Omega}'$ may be set equal to zero
(since the overall phase of the field is irrelevant, assuming no coherence of the levels at $t = 0$), unless there is
non-uniform change in phase in the field during the process.
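
For a constant field vector the precession equation (A1.6.58) is solved by a rigid rotation, which makes the
resonant and non-resonant cases easy to compare numerically. The sketch below uses Rodrigues' rotation formula;
the assignment of the x-component to the field amplitude and the z-component to the detuning follows the caption
above, while the numerical values, units and function name are illustrative assumptions.

```python
import math

def precess(r, field, t):
    """Rotate r about the constant vector `field` by the angle |field| * t.

    This is the closed-form solution of dr/dt = field x r (Rodrigues' rotation formula).
    """
    norm = math.sqrt(sum(c * c for c in field))
    n = [c / norm for c in field]
    angle = norm * t
    n_dot_r = sum(ni * ri for ni, ri in zip(n, r))
    n_cross_r = [n[1] * r[2] - n[2] * r[1],
                 n[2] * r[0] - n[0] * r[2],
                 n[0] * r[1] - n[1] * r[0]]
    return [r[i] * math.cos(angle)
            + n_cross_r[i] * math.sin(angle)
            + n[i] * n_dot_r * (1.0 - math.cos(angle)) for i in range(3)]

ground = [0.0, 0.0, -1.0]
# Resonant field along x (assumed amplitude 1): a rotation by pi inverts the population.
resonant = precess(ground, [1.0, 0.0, 0.0], math.pi)
# Non-resonant field in the xz-plane: the largest r3 is reached after half a precession period.
field = [1.0, 0.0, 2.0]
detuned = precess(ground, field, math.pi / math.sqrt(5.0))
```

The z-component of `resonant` reaches +1, whereas the z-component of `detuned` stops well short of it, in line
with the incomplete inversion described for non-resonant excitation.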

The FVH representation allows us to visualize the results of more complicated laser pulse sequences. A laser
pulse which takes $\mathbf{r}'$ from (0,0,-1) to (0,0,1) is called a $\pi$-pulse since the $\mathbf{r}'$ vector precesses $\pi$ radians about
the field vector. Similarly, a pulse which takes $\mathbf{r}'$ from (0,0,-1) to (+1,0,0) is called a $\pi/2$-pulse. The state
represented by the vector (+1,0,0) is a coherent superposition of the upper and lower states of the system.

One interesting experiment is to apply a $\pi/2$-pulse followed by a $\pi/2$ phase shift of the field. This phase shift
will bring $\mathbf{\Omega}'$ parallel to $\mathbf{r}'$. Since now $\mathbf{\Omega}'\times\mathbf{r}' = 0$, the population is fixed in time in a coherent
superposition between the ground and excited states. This is called photon locking.

A second interesting experiment is to begin with a pulse which is far below resonance and slowly and
continuously sweep the frequency until the pulse is far above resonance. At $t = -\infty$ the field vector is pointing
nearly along the -z-axis, and is therefore almost parallel to the state vector. As the field vector slowly moves
from $z = -1$ to $z = +1$ the state vector adiabatically follows it, precessing about the instantaneous direction of
the field vector (figure A1.6.4). When, at $t \to +\infty$, the field vector is directed nearly along the +z-axis, the
state vector is directed there as well, signifying complete population inversion. The remarkable feature of
'adiabatic following', as this effect is known, is its robustness: there is almost no sensitivity to either the
field strength or the exact schedule of changing the frequency, provided the conditions for adiabaticity are met.



Figure A1.6.4. FVH diagram, showing the concept of adiabatic following. The Bloch vector, $\mathbf{r}'$, precesses in
a narrow cone about the rotating frame torque vector, $\mathbf{\Omega}'$. As the detuning, $\Delta$, changes from negative to
positive, the field vector, $\mathbf{\Omega}'$, becomes inverted. If the change in $\mathbf{\Omega}'$ is adiabatic the Bloch vector follows the
field vector in this inversion process, corresponding to complete population transfer to the excited state.
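
Adiabatic following can be illustrated by integrating the precession equation with a slowly swept field vector.
The sketch below is only a numerical illustration under stated assumptions: a linear sweep of the z-component,
a constant x-component, a piecewise-constant field over each small time step, and arbitrary units; the function
names and parameter values are hypothetical.

```python
import math

def rotate(r, field, dt):
    """Advance r by one step of dr/dt = field x r, treating `field` as constant over dt."""
    norm = math.sqrt(sum(c * c for c in field))
    n = [c / norm for c in field]
    cos_a, sin_a = math.cos(norm * dt), math.sin(norm * dt)
    n_dot_r = sum(ni * ri for ni, ri in zip(n, r))
    n_cross_r = [n[1] * r[2] - n[2] * r[1],
                 n[2] * r[0] - n[0] * r[2],
                 n[0] * r[1] - n[1] * r[0]]
    return [r[i] * cos_a + n_cross_r[i] * sin_a + n[i] * n_dot_r * (1.0 - cos_a)
            for i in range(3)]

def sweep(coupling, sweep_rate, max_detuning, dt=0.01):
    """Linear sweep of the z-component from -max_detuning to +max_detuning."""
    steps = int(round(2.0 * max_detuning / (sweep_rate * dt)))
    r = [0.0, 0.0, -1.0]                       # all population in the lower state
    for k in range(steps):
        t_mid = (k + 0.5) * dt                 # midpoint of the current step
        z = -max_detuning + sweep_rate * t_mid
        r = rotate(r, [coupling, 0.0, z], dt)
    return r

slow = sweep(coupling=1.0, sweep_rate=0.1, max_detuning=20.0)   # adiabatic
fast = sweep(coupling=1.0, sweep_rate=50.0, max_detuning=20.0)  # far from adiabatic
```

With the slow sweep the z-component of the final vector ends close to +1 (near-complete inversion), while with
the fast sweep the vector cannot follow the field and remains in the lower hemisphere.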

A1.6.2.4 RELAXATION OF THE DENSITY OPERATOR TO EQUILIBRIUM

In real physical systems, the populations $|a(t)|^2$ and $|b(t)|^2$ are not truly constant in time, even in the absence of
a field, because of relaxation processes. These relaxation processes lead, at sufficiently long times, to thermal
equilibrium, characterized by the populations $p_a = e^{-\beta E_a}/Q$, $p_b = e^{-\beta E_b}/Q$, where $Q$ is the canonical partition
function which serves as a normalization factor and $\beta = 1/kT$, where $k$ is Boltzmann's constant and $T$ is the
temperature. The thermal equilibrium state for a two-level system, written as a density matrix, takes the
following form (basis ordered as in equation (A1.6.54)):

$$\rho_{\mathrm{eq}} = \frac{1}{Q}\begin{pmatrix} e^{-\beta E_b} & 0 \\ 0 & e^{-\beta E_a}\end{pmatrix}. \tag{A1.6.59}$$

The populations, $e^{-\beta E_n}/Q$, appear on the diagonal as expected, but note that there are no off-diagonal
elements, that is, no coherences; this is reasonable since we expect the equilibrium state to be time-independent,
and we have associated the coherences with time.

It follows that there are two kinds of processes required for an arbitrary initial state to relax to an equilibrium
state: the diagonal elements must redistribute to a Boltzmann distribution and the off-diagonal elements must
decay to zero. The first of these processes is called population decay; in two-level systems this time scale is
called $T_1$. The second of these processes is called dephasing, or coherence decay; in two-level systems there is
a single time scale for this process called $T_2$. There is a well-known relationship in two-level systems, valid
for weak system-bath coupling, that

$$\frac{1}{T_2} = \frac{1}{2T_1} + \frac{1}{T_2^*} \tag{A1.6.60}$$

where $T_2^*$ is the time scale for so-called pure dephasing. Equation (A1.6.60) has the following significance:
even without pure dephasing there is still a minimal dephasing rate that accompanies population relaxation.

In the presence of some form of relaxation the equations of motion must be supplemented by a term involving
a relaxation superoperator (superoperator because it maps one operator into another operator). The literature
on the correct form of such a superoperator is large, contradictory and incomplete. In brief, the extant theories
can be divided into two kinds, those without memory relaxation (Markovian), $\Gamma\rho$, and those with memory
relaxation (non-Markovian), $\int_0^t\Gamma(t - t')\rho(t')\,dt'$. The Markovian theories can be further subdivided into
those that preserve positivity of the density matrix (all $p_i \geq 0$ in equation (A1.6.66) for all admissible $\rho$) and
those that do not. For example, the following widely used Markovian equation of motion is guaranteed to
preserve positivity of the density operator for any choice of $\{V_j\}$:

$$\frac{\partial\rho}{\partial t} = -\frac{i}{\hbar}[H,\rho] + \sum_j\left(V_j\rho V_j^\dagger - \tfrac12V_j^\dagger V_j\rho - \tfrac12\rho V_j^\dagger V_j\right). \tag{A1.6.61}$$


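As an example, consider the two-level system, with relaxation that arises from spontaneous emission. In this
case there is just a single $V_j$:

$$V = \sqrt{\gamma}\begin{pmatrix} 0 & 0 \\ 1 & 0\end{pmatrix}, \qquad V^\dagger = \sqrt{\gamma}\begin{pmatrix} 0 & 1 \\ 0 & 0\end{pmatrix}. \tag{A1.6.62}$$

It is easy to verify that the dissipative contribution is given by

$$V\rho V^\dagger - \tfrac12V^\dagger V\rho - \tfrac12\rho V^\dagger V = \gamma\begin{pmatrix} -\rho_{bb} & -\tfrac12\rho_{ba} \\ -\tfrac12\rho_{ab} & \rho_{bb}\end{pmatrix}, \tag{A1.6.63}$$

where $\rho_{bb} = bb^*$ is the upper-state population and $\rho_{ba} = a^*b$, $\rho_{ab} = ab^*$ are the coherences of
equation (A1.6.54).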


We now make two connections with topics discussed earlier. First, at the beginning of this section we defined
$1/T_1$ as the rate constant for population decay and $1/T_2$ as the rate constant for coherence decay. Equation
(A1.6.63) shows that for spontaneous emission $1/T_1 = \gamma$, while $1/T_2 = \gamma/2$; comparing with equation (A1.6.60)
we see that for spontaneous emission, $1/T_2^* = 0$. Second, note that $\gamma$ is the rate constant for population transfer
due to spontaneous emission; it is identical to the Einstein A coefficient which we defined in equation
(A1.6.3).
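
The rates quoted above can be verified by evaluating the dissipative term explicitly. The sketch below assumes
the standard positivity-preserving (Lindblad) form $V\rho V^\dagger - \tfrac12V^\dagger V\rho - \tfrac12\rho V^\dagger V$
for a single operator, with the upper state taken as the first basis state; the decay rate and the density
matrix are arbitrary illustrative values.

```python
def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(x):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[complex(x[j][i]).conjugate() for j in range(2)] for i in range(2)]

def dissipator(v, rho):
    """V rho V+ - (1/2) V+V rho - (1/2) rho V+V for a single operator V."""
    vd = dagger(v)
    jump = matmul(matmul(v, rho), vd)
    vdv = matmul(vd, v)
    left, right = matmul(vdv, rho), matmul(rho, vdv)
    return [[jump[i][j] - 0.5 * left[i][j] - 0.5 * right[i][j] for j in range(2)]
            for i in range(2)]

gamma = 0.4                                   # illustrative decay rate
root = gamma ** 0.5
v = [[0.0, 0.0], [root, 0.0]]                 # upper state (first index) -> lower state
rho = [[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]]  # arbitrary valid density matrix
d = dissipator(v, rho)
```

The upper-state population decays at the rate `gamma` (`d[0][0] == -gamma * rho[0][0]`), the lower state gains
what the upper state loses, and the coherences decay at `gamma / 2`, so that no pure dephasing is present.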

For the two-level system, the evolution equation for $\rho$ may also be expressed, as before, in terms of the
three-vector $\mathbf{r}$:

$$\frac{d\mathbf{r}}{dt} = \mathbf{\Omega}\times\mathbf{r} - \Gamma\mathbf{r} \tag{A1.6.64}$$

where

$$\Gamma\mathbf{r} = \left(\frac{r_1}{T_2},\ \frac{r_2}{T_2},\ \frac{r_3 - r_3^{\mathrm{eq}}}{T_1}\right) \tag{A1.6.65}$$

and $r_3^{\mathrm{eq}}$ denotes the equilibrium value of the population difference. Equation (A1.6.64) describes the
relaxation to equilibrium of a two-level system in terms of a vector equation. It is the analogue of the Bloch
equation, originally developed for magnetic resonance, in the optical regime and hence is called the optical
Bloch equation.

In the above discussion of relaxation to equilibrium, the density matrix was implicitly cast in the energy 
representation. However, the density operator can be cast in a variety of representations other than the energy 
representation. Two of the most commonly used are the coordinate representation and the Wigner phase space 
representation. In addition, there is the diagonal representation of the density operator; in this representation, 
the most general form of p takes the form 


$$\rho = \sum_i p_i|\psi_i\rangle\langle\psi_i| \tag{A1.6.66}$$

where the $p_i$ are real numbers, $0 \leq p_i \leq 1$ and $\sum_i p_i = 1$. This equation expresses $\rho$ as an incoherent
superposition of fundamental density operators, $|\psi_i\rangle\langle\psi_i|$, where $|\psi_i\rangle$ is a wavefunction but not necessarily
an eigenstate. In equation (A1.6.66), the $p_i$ are the probabilities (not amplitudes) of finding the system in state
$|\psi_i\rangle$. Note that in addition to the usual probabilistic interpretation for finding the particle described by a
particular wavefunction at a specified location, there is now a probability distribution for being in different
states! If one of the $p_i = 1$ and all the others are zero, the density operator takes the form equation
(A1.6.49) and corresponds to a single wavefunction; we say the system is in a pure state. If more than one of
the $p_i > 0$ we say the system is in a mixed state.


A measure of the purity or coherence of a system is given by $\sum_i p_i^2$: $\sum_i p_i^2 = 1$ for a pure state and $\sum_i p_i^2 < 1$
for a mixed state; the greater the degree of mixture the lower will be the purity. A general expression for the
purity, which reduces to the above definition but is representation free, is given by $\mathrm{Tr}(\rho^2)$: $\mathrm{Tr}(\rho^2) < 1$ for a
mixed state and $\mathrm{Tr}(\rho^2) = 1$ for a pure state. Note that in the absence of dissipation, the purity of the system, as
measured by $\mathrm{Tr}(\rho^2)$, is conserved in time. To see this, take the equation of motion for $\rho$ to be purely
Hamiltonian, that is,

$$\dot\rho = -\frac{i}{\hbar}[H,\rho]. \tag{A1.6.67}$$

Then:

$$\frac{d}{dt}\mathrm{Tr}(\rho^2) = 2\,\mathrm{Tr}(\rho\dot\rho) = \frac{2}{i\hbar}\mathrm{Tr}(\rho[H,\rho]) = \frac{2}{i\hbar}\mathrm{Tr}(\rho(H\rho - \rho H)) = 0 \tag{A1.6.68}$$

where in the last step we have used the cyclic invariance of the trace. This invariance of the purity to 
Hamiltonian manipulations is essentially equivalent to the invariance of phase space density, or entropy, to 
Hamiltonian manipulations. Including the dissipative part to the equations of motion gives 

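$$\dot\rho = -\frac{i}{\hbar}[H,\rho] + \Gamma\rho \quad\text{and}\quad \frac{d}{dt}\mathrm{Tr}(\rho^2) = 2\,\mathrm{Tr}(\rho\,\Gamma\rho) \neq 0. \tag{A1.6.69}$$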
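
The conservation of purity under Hamiltonian evolution is easy to confirm numerically. The sketch below is
restricted, for simplicity, to field-free evolution under a diagonal Hamiltonian (an assumption of the example,
not a limitation of the result); the matrices and energies are illustrative values.

```python
import cmath

def purity(rho):
    """Tr(rho^2) for a 2x2 density matrix stored as nested lists."""
    return sum(rho[i][k] * rho[k][i] for i in range(2) for k in range(2)).real

def evolve(rho, e_first, e_second, t, hbar=1.0):
    """Field-free Hamiltonian evolution rho -> U rho U+ with U = diag(e^{-iE t/hbar})."""
    u = [cmath.exp(-1j * e_first * t / hbar), cmath.exp(-1j * e_second * t / hbar)]
    return [[u[i] * rho[i][j] * u[j].conjugate() for j in range(2)] for i in range(2)]

pure = [[0.5, 0.5], [0.5, 0.5]]        # single wavefunction, equal superposition
mixed = [[0.5, 0.2], [0.2, 0.5]]       # partially dephased version of the same state
```

`purity(pure)` equals one, `purity(mixed)` is smaller than one, and `purity(evolve(mixed, 1.0, 2.3, 0.7))`
returns the same value as `purity(mixed)`: the phases acquired by the coherences cancel in the trace.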

In concluding this section, we note the complementarity of the light and matter, this time in terms of 
coherence properties (i.e. phase relations). The FVH geometrical picture shows explicitly how the phase of 
the field is inseparably intertwined with the phase change in the matter; in the next section, in the context of 
short pulses, we shall see how the time of interaction with the pulse is similarly intertwined with the time of 
the response of the molecule, although in general an integration over all such times must be performed. But 
both these forms of complementarity are on the level of the Hamiltonian portion of the evolution only. The 
complementarity of the dissipation will appear at the end of this section, in the context of laser cooling. 


A1.6.3 THE FIELD TRANSFERS ITS COHERENCE TO THE MATTER 

Much of the previous section dealt with two-level systems. Real molecules, however, are not two-level 
systems: for many purposes there are only two electronic states that participate, but each of these electronic 
states has many states corresponding to different quantum levels for vibration and rotation. A coherent 
femtosecond pulse has a bandwidth which may span many vibrational levels; when the pulse impinges on the 
molecule it excites a coherent superposition of all these vibrational states — a vibrational wavepacket. In this 
section we deal with excitation by one or two femtosecond optical pulses, as well as continuous wave 
excitation; in section A1.6.4 we will use the concepts developed here to understand nonlinear molecular
electronic spectroscopy. 

The pioneering use of wavepackets for describing absorption, photodissociation and resonance Raman spectra 
is due to Heller [12, 13, 14, 15 and 16]. The application to pulsed excitation, coherent control and nonlinear 
spectroscopy was initiated by Tannor and Rice ([17] and references therein). 

A1.6.3.1 FIRST-ORDER AMPLITUDE: WAVEPACKET INTERFEROMETRY

Consider a system governed by Hamiltonian $H = H_0 + H_1$, where $H_0$ is the bare molecular Hamiltonian and
$H_1$ is the perturbation, taken to be $-\mu E(t)$ as we have seen earlier. Adopting the Born-Oppenheimer (BO)
approximation and specializing to two BO states, $H_0$ can be written as

$$H_0 = \begin{pmatrix} H_a & 0 \\ 0 & H_b\end{pmatrix} \tag{A1.6.70}$$

and $H_1$ as

$$H_1 = \begin{pmatrix} 0 & -\mu_{ab}E^*(t) \\ -\mu_{ba}E(t) & 0\end{pmatrix}. \tag{A1.6.71}$$


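The TDSE in matrix form reads:

$$i\hbar\frac{\partial}{\partial t}\begin{pmatrix}\psi_a(x,t) \\ \psi_b(x,t)\end{pmatrix} = \begin{pmatrix} H_a & -\mu_{ab}E^*(t) \\ -\mu_{ba}E(t) & H_b\end{pmatrix}\begin{pmatrix}\psi_a(x,t) \\ \psi_b(x,t)\end{pmatrix}. \tag{A1.6.72}$$

Note the structural similarity between equation (A1.6.72) and equation (A1.6.41), with $E_a$ and $E_b$ being
replaced by $H_a$ and $H_b$, the BO Hamiltonians governing the quantum mechanical evolution in electronic states
$a$ and $b$, respectively. These Hamiltonians consist of a nuclear kinetic energy part and a potential energy part
which derives from nuclear-electron attraction and nuclear-nuclear repulsion, which differs in the two
electronic states.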

If $H_1$ is small compared with $H_0$ we may treat $H_1$ by perturbation theory. The first-order perturbation theory
formula takes the form [18, 19, 20 and 21]:

$$\psi^{(1)}(x,t) = \frac{1}{i\hbar}\int_0^t e^{-(i/\hbar)H_b(t - t')}\{-\mu_{ba}E(t')\}\,e^{-(i/\hbar)H_at'}\psi_a(x,0)\,dt' \tag{A1.6.73}$$

where we have assumed that all the amplitude starts on the ground electronic state. This formula has a very
appealing physical interpretation. At $t = 0$ the wavefunction is in, say, $v = 0$ of the ground electronic state. The
wavefunction evolves from $t = 0$ until time $t'$ under the ground electronic state Hamiltonian, $H_a$. If we assume
that the initial state is a vibrational eigenstate of $H_a$ ($H_a\psi_v = E_v\psi_v$), there is no spatial evolution, just the
accumulation of an overall phase factor; that is, $e^{-(i/\hbar)H_at'}\psi_a(x,0)$ can be replaced by
$e^{-(i/\hbar)E_vt'}\psi_v(x,0)$. For concreteness, in what follows we will take $v = 0$, which is the most common case of
interest. At $t = t'$ the electric field, of amplitude $E(t')$, interacts with the transition dipole moment, promoting
amplitude to the excited electronic state. This amplitude evolves under the influence of $H_b$ from time $t'$ until
time $t$. The integral over $t'$ indicates that one must take into account all instants in time $t'$ at which the
interaction with the field could have taken place. In general, if the field has some envelope of finite duration in
time, the promotion to the excited state can take place at any instant under this envelope, and there will be
interference from portions of the amplitude that are excited at one instant and portions that are excited at
another. The various steps in the process may be visualized schematically with the use of Feynman diagrams. The
Feynman diagram for the process just described is shown in figure A1.6.5.

Figure A1.6.5. Feynman diagram for the first-order process described in the text.


We will now proceed to work through some applications of this formula to different pulse sequences. Perhaps
the simplest is to consider the case of a $\delta$-function excitation by light. That is,

$$E(t') = \delta(t' - t_1). \tag{A1.6.74}$$

In this case, the first-order amplitude reduces to

$$\psi^{(1)}(x,t) = \frac{1}{i\hbar}e^{-(i/\hbar)H_b(t - t_1)}\{-\mu_{ba}\}\,e^{-(i/\hbar)E_0t_1}\psi_0(x,0). \tag{A1.6.75}$$

Within the Condon approximation ($\mu_{ba}$ independent of $x$), the first-order amplitude is simply a constant times
the initial vibrational state, propagated on the excited-state potential energy surface! This process can be
visualized by drawing the ground-state vibrational wavefunction displaced vertically to the excited-state
potential. The region of the excited-state potential which is accessed by the vertical transition is called the
Franck-Condon region, and the vertical displacement is the Franck-Condon energy. Although the initial
vibrational state was an eigenstate of $H_a$, in general it is not an eigenstate of $H_b$, and starts to evolve as a
coherent wavepacket. For example, if the excited-state potential energy surface is repulsive, the wavepacket
will evolve away from the Franck-Condon region toward the asymptotic region of the potential,
corresponding to separated atomic or molecular fragments (see figure A1.6.6). If the excited-state potential is
bound, the wavepacket will leave the Franck-Condon region, but after half a period reach a classical turning
point and return to the Franck-Condon region for a complete or partial revival.

Figure A1.6.6. The wavepacket picture corresponding to the first-order process described in the text. The
wavepacket propagates on the ground-state surface until time $t_1$, but since it is an eigenstate of this surface it
only develops a phase factor. At time $t_1$ a photon impinges and promotes the initial vibrational state to an
excited electronic state, for which it is not an eigenstate. The state is now a wavepacket and begins to move
according to the TDSE. Often the ensuing motion is very classical-like, the initial motion being along the
gradient of the excited-state potential, with recurrences at multiples of the excited-state vibrational period
(adapted from [32]).

An alternative perspective is as follows. A $\delta$-function pulse in time has an infinitely broad frequency range.
Thus, the pulse promotes transitions to all the excited-state vibrational eigenstates having good overlap 
(Franck-Condon factors) with the initial vibrational state. The pulse, by virtue of its coherence, in fact 
prepares a coherent superposition of all these excited-state vibrational eigenstates. From the earlier sections, 
we know that each of these eigenstates evolves with a different time-dependent phase factor, leading to 
coherent spatial translation of the wavepacket. 
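
The eigenstate picture can be made concrete for a model system. The sketch below assumes a harmonic excited
state with the same frequency as the ground state, displaced by a dimensionless amount `alpha`, so that the
Franck-Condon factors follow a Poisson distribution; the zero-point phase is removed for convenience. These
modelling choices, the function names and the numerical values are illustrative assumptions.

```python
import cmath
import math

def franck_condon_weights(alpha, n_max=60):
    """|c_n|^2 for a ground-state Gaussian on a displaced harmonic excited state.

    Assumes equal ground- and excited-state frequencies, so the weights are Poisson
    with mean alpha**2 (alpha is the dimensionless displacement).
    """
    weights = [math.exp(-alpha * alpha)]
    for n in range(n_max):
        weights.append(weights[-1] * alpha * alpha / (n + 1))
    return weights

def autocorrelation(alpha, omega, t, n_max=60):
    """<psi(0)|psi(t)> = sum_n |c_n|^2 exp(-i n omega t), zero-point phase removed."""
    return sum(w * cmath.exp(-1j * n * omega * t)
               for n, w in enumerate(franck_condon_weights(alpha, n_max)))

omega = 1.0
period = 2.0 * math.pi / omega
half = abs(autocorrelation(1.5, omega, 0.5 * period))   # packet at the far turning point
full = abs(autocorrelation(1.5, omega, period))         # packet back in the Franck-Condon region
```

The overlap with the initial state is small at half a period, when the different phase factors have carried the
packet away from the Franck-Condon region, and returns to unity after a full period, when all the phase factors
realign.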

The $\delta$-function excitation is not only the simplest case to consider; it is the fundamental building block, in the
sense that the more complicated pulse sequences can be interpreted as superpositions of $\delta$-functions, giving
rise to superpositions of wavepackets which can in principle interfere.

The simplest case of this interference is the case of two $\delta$-function pulses [22, 23 and 24]:

$$E(t') = \delta(t' - t_1)e^{-i\omega_Lt_1} + \delta(t' - t_2)e^{-i\omega_Lt_2}e^{i\phi}. \tag{A1.6.76}$$
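
We will explore the effect of three parameters: $t_2 - t_1$, $\omega_L$ and $\phi$, that is, the time delay between the pulses, the
tuning or detuning of the carrier frequency from resonance with an excited-state vibrational transition and the
relative phase of the two pulses. We follow closely the development of [22]. Using equation (A1.6.73),

$$\psi^{(1)}(x,t) = \frac{1}{i\hbar}\left[e^{-(i/\hbar)H_b(t - t_1)}\{-\mu_{ba}e^{-i\omega_Lt_1}\}e^{-(i/\hbar)E_0t_1} + e^{-(i/\hbar)H_b(t - t_2)}\{-\mu_{ba}e^{-i\omega_Lt_2 + i\phi}\}e^{-(i/\hbar)E_0t_2}\right]\psi_0(x,0) \tag{A1.6.77}$$

$$= \frac{1}{i\hbar}e^{-(i/\hbar)(E_{00} + E_{0b})t}\left[e^{-(i/\hbar)\tilde H_b(t - t_1)}\{-\mu_{ba}e^{-i\tilde\omega_Lt_1}\} + e^{-(i/\hbar)\tilde H_b(t - t_2)}\{-\mu_{ba}e^{-i\tilde\omega_Lt_2 + i\phi}\}\right]\psi_0(x,0). \tag{A1.6.78}$$

To simplify the notation, we define $\tilde H_b = H_b - E_{00} - E_{0b}$ and $\tilde\omega_L = \omega_L + \omega_0 - E_{00}/\hbar - E_{0b}/\hbar$, where $E_{00}$ is the
vertical displacement between the minimum of the ground electronic state and the minimum of the excited
electronic state, $E_{0b}$ is the zero point energy on the excited-state surface and $\omega_0 = E_0/\hbar$. Specializing to the
harmonic oscillator, $e^{-(i/\hbar)\tilde H_b\tau}\psi(x,0) = \psi(x,0)$, where $\tau = 2\pi/\omega$ is the excited-state vibrational period, that
is, any wavefunction in the harmonic oscillator returns exactly to its original spatial distribution after one
period. To highlight the effect of detuning we write $\tilde\omega_L = n\omega + \Delta$, where $\Delta$ is the detuning from an
excited-state vibrational eigenstate, and we examine time delays equal to the vibrational period, $t_2 - t_1 = \tau$. We obtain:

$$\psi^{(1)}(x,t) = \frac{1}{i\hbar}e^{-(i/\hbar)(E_{00} + E_{0b})t}\left(e^{-(i/\hbar)\tilde H_b\tau}e^{i\tilde\omega_L\tau - i\phi} + 1\right)e^{-(i/\hbar)\tilde H_b(t - t_2)}\{-\mu_{ba}e^{-i\tilde\omega_Lt_2 + i\phi}\}\psi_0(x,0) \tag{A1.6.79}$$

$$= \frac{1}{i\hbar}e^{-(i/\hbar)(E_{00} + E_{0b})t}\left(e^{i(\Delta\tau - \phi)} + 1\right)e^{-(i/\hbar)\tilde H_b(t - t_2)}\{-\mu_{ba}e^{-i\tilde\omega_Lt_2 + i\phi}\}\psi_0(x,0). \tag{A1.6.80}$$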

To illustrate the dependence on detuning, $\Delta$, time delay, $\tau$, and phase difference, $\phi$, we consider
some special cases. (i) If $\Delta = \phi = 0$ then the term in parentheses gives $1 + 1 = 2$. In this case, the
two pulses create two wavepackets which add constructively, giving two units of amplitude or four units of
excited-state population. (ii) If $\Delta = 0$ and $\phi = \pm\pi$ then the term in parentheses gives
$-1 + 1 = 0$. In this case, the two pulses create two wavepackets which add destructively, giving no
excited-state population! Viewed from the point of view of the light, this is stimulated emission. Whether
emission or absorption occurs is therefore controlled by the phase of the second pulse relative to the first.
(iii) If $\Delta = 0$ and $\phi = \pm(\pi/2)$ then the term in parentheses gives $\pm i + 1$. In this case, the
excited-state population, $\langle\psi^{(1)}|\psi^{(1)}\rangle$, is governed by the factor $(-i + 1)(i + 1) = 2$.
The amplitudes created by the two pulses overlap, but have no net interference contribution. This result is
related to the phenomenon of 'photon locking', which was discussed in section A1.6.2. (iv) If
$\Delta = \omega/2$ and $\phi = 0$ then the term in parentheses gives $-1 + 1 = 0$. This is the ultrashort
excitation counterpart of tuning the excitation frequency between vibrational resonances in a single frequency
excitation: no net excited-state population is produced. As in the case above, of the two pulses $\pi$ out of
phase, the two wavepackets destructively interfere. In this case, the destructive interference comes from the
offset of the carrier frequency from resonance, leading to a phase factor of $(\omega/2)\tau = \pi$. For time
delays that are significantly different from $\tau$ the first wavepacket is not in the Franck-Condon region when
the second packet is promoted to the excited state, and the packets do not interfere; two units of population
are prepared on the excited state, as in the case of a $\pm(\pi/2)$ phase shift. These different cases are
summarized in figure A1.6.7.
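
The four special cases can be checked with a few lines of arithmetic. The following is a minimal sketch, not the author's calculation: it assumes that the term in parentheses has the form $e^{i(\Delta\tau + \phi)} + 1$ (a form chosen only because it reproduces the values quoted in the text) and that the population is its squared modulus in units of the single-pulse population. The function name `interference_population` is hypothetical.

```python
import cmath
import math

def interference_population(detuning, delay, phase):
    """Excited-state population, in units of the single-pulse population.

    Assumption: the interference term is exp(i*(detuning*delay + phase)) + 1,
    which reproduces the special cases (i)-(iv) quoted in the text.
    """
    amplitude = cmath.exp(1j * (detuning * delay + phase)) + 1.0
    return abs(amplitude) ** 2

omega = 1.0                    # excited-state vibrational frequency (arbitrary units)
tau = 2.0 * math.pi / omega    # time delay equal to one vibrational period

cases = {
    "(i)   detuning 0, phase 0": (0.0, 0.0),
    "(ii)  detuning 0, phase pi": (0.0, math.pi),
    "(iii) detuning 0, phase pi/2": (0.0, math.pi / 2.0),
    "(iv)  detuning omega/2, phase 0": (omega / 2.0, 0.0),
}
for label, (detuning, phase) in cases.items():
    print(f"{label}: population = {interference_population(detuning, tau, phase):.3f}")
```

Only the modulus of the interference term enters the population, so the sign convention chosen for the phase does not affect the result.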

Figure A1.6.7. Schematic diagram illustrating the different possibilities of interference between a pair of
wavepackets, as described in the text. The diagram illustrates the role of phase ((a) and (c)), as well as the role
of time delay (b). These cases provide the interpretation for the experimental results shown in figure A1.6.8.
Reprinted from [22].

Figure A1.6.8 shows the experimental results of Scherer et al on the excitation of I$_2$ using pairs of
phase-locked pulses. By the use of heterodyne detection, those authors were able to measure just the interference
contribution to the total excited-state fluorescence (i.e. the difference in excited-state population from the two
units of population which would be prepared if there were no interference). The basic qualitative dependence
on time delay and phase is the same as that predicted by the harmonic model: significant interference is
observed only at multiples of the excited-state vibrational period, and the relative phase of the two pulses
determines whether that interference is constructive or destructive.

There is a good analogy between the effects of pulse pairs and pulse shapes, and Fresnel and Fraunhofer 
diffraction in optics. Fresnel diffraction refers to the interference pattern obtained by light passing through 
two slits; interference from the wavefronts passing through the two slits is the spatial analogue of the 
interference from the two pulses in time discussed above. Fraunhofer diffraction refers to interference arising 
from the finite width of a single slit. The different subportions of a single slit can be thought of as independent 
slits that happen to adjoin; wavefronts passing through each of these subslits will interfere. This is the 
analogue of a single pulse with finite duration: there is interference from excitation coming from different 
subportions of the pulse, which may be insignificant if the pulse is short but can be important for longer pulse 
durations. 

Figure A1.6.8. Wavepacket interferometry. The interference contribution to the excited-state fluorescence of
I$_2$ as a function of the time delay between a pair of ultrashort pulses. The interference contribution is
isolated by heterodyne detection. Note that the structure in the interferogram occurs only at multiples of 300 fs,
the excited-state vibrational period of I$_2$: it is only at these times that the wavepacket promoted by the first
pulse is back in the Franck-Condon region. For a phase shift of 0 between the pulses the returning wavepacket and
the newly promoted wavepacket are in phase, leading to constructive interference (upper trace), while for a
phase shift of $\pi$ the two wavepackets are out of phase, and interfere destructively (lower trace). Reprinted
from Scherer N F et al 1991 J. Chem. Phys. 95 1487.

There is an alternative, and equally instructive, way of viewing the effect of different pulse sequences, by
Fourier transforming the pulse train to the frequency domain. In the time domain, the wavefunction produced
is the convolution of the pulse sequence with the excited-state dynamics; in frequency it is simply the product
of the frequency envelope with the Franck-Condon spectrum (the latter is simply the spectrum of overlap
factors between the initial vibrational state and each of the excited vibrational states). The Fourier transform
of $\delta$-function excitation is simply a constant excitation in frequency, which excites the entire
Franck-Condon spectrum. The Fourier transform of a sequence of two $\delta$-functions in time with spacing $\tau$
is a spectrum having peaks with a spacing of $2\pi/\tau$. If the carrier frequency of the pulses is resonant and
the relative phase between the pulses is zero, the frequency spectrum of the pulses will lie on top of the
Franck-Condon spectrum and the product will be non-zero; if, on the other hand, the carrier frequency is between
resonances, or the relative phase is $\pi$, the frequency spectrum of the pulses will lie in between the features
of the Franck-Condon spectrum, signifying zero net absorption. Similarly, a single pulse of finite duration may
have a frequency envelope which is narrower than that of the entire Franck-Condon spectrum. The absorption process
will depend on the overlap of the frequency spectrum with the Franck-Condon spectrum, and hence on both pulse
shape and carrier frequency.
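
The frequency-domain picture of the pulse pair can be sketched numerically. The example below is an illustration under stated assumptions, not part of the original treatment: the pulses are taken as ideal $\delta$-functions of equal strength, frequencies are measured as offsets from a carrier that is resonant with one vibrational level, and the Franck-Condon lines are taken to be equally spaced. The function name `pulse_pair_power` is hypothetical.

```python
import math

def pulse_pair_power(offset, delay, phase):
    """Power spectrum |1 + exp(i*(offset*delay + phase))|^2 of two delta-function pulses.

    offset is the frequency measured from the carrier; delay and phase are the
    spacing and relative phase of the two pulses.
    """
    return 2.0 + 2.0 * math.cos(offset * delay + phase)

omega_vib = 1.0                                  # spacing of the Franck-Condon lines (assumed uniform)
delay = 2.0 * math.pi / omega_vib                # pulses separated by one vibrational period
lines = [n * omega_vib for n in range(-2, 3)]    # line positions relative to a resonant carrier

in_phase = [pulse_pair_power(w, delay, 0.0) for w in lines]
out_of_phase = [pulse_pair_power(w, delay, math.pi) for w in lines]
print("relative phase 0 :", [round(p, 6) for p in in_phase])
print("relative phase pi:", [round(p, 6) for p in out_of_phase])
```

With zero relative phase the maxima of the pulse-pair spectrum fall on the lines; a relative phase of $\pi$ shifts the comb by half its spacing, so that the lines sit at its zeros.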

A1.6.3.2 SECOND-ORDER AMPLITUDE: CLOCKING CHEMICAL REACTIONS


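We now turn to the second-order amplitude. This quantity is given by [18, 19, 20 and 21]

$$\psi^{(2)}(x,t) = \left(\frac{1}{i\hbar}\right)^{2}\int_0^{t}\!\int_0^{t''} e^{-(i/\hbar)H_c(t-t'')}\,\{-\mu_{cb}E(t'')\}\,e^{-(i/\hbar)H_b(t''-t')}\,\{-\mu_{ba}E(t')\}\,e^{-(i/\hbar)H_a t'}\,\psi(x,0)\,dt'\,dt'' \tag{A1.6.81}$$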


This expression may be interpreted in a very similar spirit to that given above for one-photon processes. Now
there is a second interaction with the electric field and the subsequent evolution is taken to be on a third
surface, with Hamiltonian $H_c$. In general, there is also a second-order interaction with the electric field
through $\mu_{ab}$ which returns a portion of the excited-state amplitude to surface a, with subsequent evolution
on surface a. The Feynman diagram for this second-order interaction is shown in figure A1.6.9.

Figure A1.6.9. Feynman diagram for the second-order process described in the text.

Second-order effects include experiments designed to 'clock' chemical reactions, pioneered by Zewail and
co-workers [25]. The experiments are shown schematically in figure A1.6.10. An initial 100-150 fs pulse moves
population from the bound ground state to the dissociative first excited state in ICN. A second pulse, time
delayed from the first, then moves population from the first excited state to the second excited state, which is
also dissociative. By noting the frequency of light absorbed from the second pulse, Zewail can estimate the
distance between the two excited-state surfaces and thus infer the motion of the initially prepared wavepacket
on the first excited state (figure A1.6.10).


[Figure A1.6.10. Axes: displacement in C-I (au); probe pulse delay (fs).]

Figure A1.6.10. (a) Schematic representation of the three potential energy surfaces of ICN in the Zewail
experiments. (b) Theoretical quantum mechanical simulations for the reaction
ICN → ICN* → [I···CN]‡* → I + CN. The wavepacket moves and spreads in time, with its centre evolving about 5 Å
in 200 fs. Wavepacket dynamics refers to motion on the intermediate potential energy surface B. Reprinted from
Williams S O and Imre D G 1988 J. Phys. Chem. 92 6648. (c) Calculated FTS signal (total fluorescence from
state C) as a function of the time delay between the first excitation pulse (A → B) and the second excitation
pulse (B → C). Reprinted from Williams S O and Imre D G, as above.

A dramatic set of experiments by Zewail involves the use of femtosecond pulse pairs to probe the wavepacket
dynamics at the crossing between covalent and ionic states of NaI [25]. A first pulse promotes wavepacket
amplitude from the ionic to the covalent potential curve. The packet begins to move out, but most of the
amplitude is reflected back from the crossing between the covalent and ionic curves: the adiabatic
potential changes character to ionic at large distances, and this curve is bound, so the wavepacket is
reflected back to the Franck-Condon (FC) region. The result is a long progression of wavepacket revivals, with a
slow overall decay coming from amplitude which dissociates on the diabatic curve every period.


Femtosecond pump-probe experiments have burgeoned in the last ten years, and this field is now commonly
referred to as 'laser femtochemistry' [26, 27, 28 and 29].

A1.6.3.3 SPECTROSCOPY AS THE RATE OF ABSORPTION OF MONOCHROMATIC RADIATION

In this section we will discuss more conventional spectroscopies: absorption, emission and resonance Raman
scattering. These spectroscopies are generally measured under single frequency conditions, and therefore our
formulation will be tailored accordingly: we will insert monochromatic perturbations of the form $e^{i\omega t}$
into the perturbation theory formulae used earlier in the section. We will then define the spectrum as the time
rate of change of the population in the final level. The same formulae apply with only minor modifications to
electronic absorption, emission, photoionization and photodetachment/transition state spectroscopy. If the CW
perturbation is inserted into the second-order perturbation theory one obtains the formulae for resonance
Raman scattering, two-photon absorption and dispersed fluorescence spectroscopy. The spectroscopies of this
section are to be contrasted with coherent nonlinear spectroscopies, such as coherent anti-Stokes Raman
spectroscopy (CARS) or photon echoes, in which the signal is directional, which will be described in section
A1.6.4.

(A) ELECTRONIC ABSORPTION AND EMISSION SPECTROSCOPY 

Consider the radiation-matter Hamiltonian, equation (A1.6.73), with the interaction term of the form:

$$H_1(t) = -\mu E(t) = \begin{cases} -\mu E_0\, e^{-i\omega_I t} & \text{absorption} \\ -\mu E_0\, e^{+i\omega_S t} & \text{emission} \end{cases} \tag{A1.6.82}$$

where the incident (scattered) light has frequency $\omega_I$ ($\omega_S$) and $\mu$ is the (possibly
coordinate-dependent) transition dipole moment for going from the lower state to the upper state. This form for
the matter-radiation interaction Hamiltonian represents a light field that is 'on' all the time from $-\infty$ to
$\infty$. This interaction will continuously move population from the lower state to the upper state. The
propagating packets on the upper state will interfere with one another: constructively, if the incident light is
resonant with a transition from an eigenstate of the lower surface to an eigenstate of the upper surface,
destructively if not. Since for a one-photon process we have two potential energy surfaces we have, in effect, two
different $H_0$'s: one for before excitation (call it $H_a$) and one for after (call it $H_b$). With this in mind,
we can use the results of section A1.6.2 to write down the first-order correction to the unperturbed wavefunction.
If $|\psi_i(-\infty)\rangle$ is an eigenstate of the ground-state Hamiltonian, $H_a$, then

$$|\psi^{(1)}(t)\rangle = \frac{1}{i\hbar}\int_{-\infty}^{t} e^{-(i/\hbar)H_b(t-t')}\,\{-\mu E_0\, e^{-i\omega_I t'}\}\,e^{-(i/\hbar)E_i t'}\,|\psi_i(-\infty)\rangle\,dt'. \tag{A1.6.83}$$

Defining $\bar\omega = E_i/\hbar + \omega_I$, replacing $\psi(-\infty)$ by $\psi(0)$, since the difference is only
a phase factor, which exactly cancels in the bra and ket, and assuming that the electric field vector is time
independent, we find

$$\langle\psi^{(1)}(t)|\psi^{(1)}(t)\rangle = \frac{E_0^2}{\hbar^2}\int_{-\infty}^{t}\!\int_{-\infty}^{t}\langle\psi_i(0)|\mu\, e^{-(i/\hbar)H_b(t'-t'')}\mu|\psi_i(0)\rangle\, e^{i\bar\omega(t'-t'')}\,dt'\,dt''. \tag{A1.6.84}$$

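The absorption spectrum, $\sigma(\omega_I)$, is the ratio of the transition probability per unit time to the
incident photon flux. The incident photon flux is the number of photons per unit area per unit time passing a
particular location, and is given by

$$\frac{Nc}{V} = \frac{E_0^2 c}{2\pi\hbar\omega_I},$$

where we have used equation (A1.6.15). Finally, we obtain [12, 13]:

$$\sigma(\omega_I) = \frac{2\pi\hbar\omega_I}{E_0^2 c}\,\frac{d}{dt}\langle\psi^{(1)}(t)|\psi^{(1)}(t)\rangle \tag{A1.6.85}$$

$$\phantom{\sigma(\omega_I)} = \frac{2\pi\omega_I}{\hbar c}\int_{-\infty}^{\infty}\langle\psi_i(0)|\mu\, e^{-(i/\hbar)H_b t}\mu|\psi_i(0)\rangle\, e^{i\bar\omega t}\,dt. \tag{A1.6.86}$$

Rotational averaging yields

$$\sigma(\omega_I) = \frac{2\pi\omega_I}{3\hbar c}\int_{-\infty}^{\infty}\langle\phi_i(0)|\phi_i(t)\rangle\, e^{i\bar\omega t}\,dt, \tag{A1.6.87}$$

where in the last equation we have defined $|\phi_i(0)\rangle = \mu|\psi_i(0)\rangle$ and
$|\phi_i(t)\rangle = e^{-(i/\hbar)H_b t}|\phi_i(0)\rangle$.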

Since the absorption spectrum is a ratio it is amenable to other interpretations. One such interpretation is that
the absorption spectrum is the ratio of energy absorbed to energy incident. From this perspective, the quantity
$\hbar\omega_I\,(d/dt)\langle\psi^{(1)}(t)|\psi^{(1)}(t)\rangle$ is interpreted as the rate of energy absorption
(per unit volume), since $dE/dt = \hbar\omega_I\,(dN/dt)$, while the quantity $E_0^2 c/2\pi$ is interpreted as the
incident energy flux, which depends only on the field intensity and is independent of frequency.

Equation (A1.6.87) expresses the absorption spectrum as the Fourier transform of a wavepacket correlation
function. This is a result of central importance. The Fourier transform relationship between the wavepacket 
autocorrelation function and the absorption spectrum provides a powerful tool for interpreting absorption 
spectra in terms of the underlying nuclear wavepacket dynamics that follows the optically induced transition. 
The relevant correlation function is that of the moving wavepacket on the excited-state potential energy 
surface (or more generally, on the potential energy surface accessed by interaction with the light) with the 
stationary wavepacket on the ground-state surface (more generally, the initial wavepacket on the potential 
surface of origin), and thus the spectrum is a probe of excited-state dynamics, particularly in the Franck- 
Condon region (i.e. the region accessed by the packet undergoing a vertical transition at t = 0). Since often 
only short or intermediate dynamics enter in the spectrum (e.g. because of photodissociation or radiationless 
transitions to other electronic states) computation of the time correlation function can be much simpler than 
evaluation of the spectrum in terms of Franck-Condon overlaps, which formally can involve millions of 
eigenstates for an intermediate sized molecule. 

We now proceed to some examples of this Fourier transform view of optical spectroscopy. Consider, for
example, the UV absorption spectrum of CO$_2$, shown in figure A1.6.11. The spectrum is seen to have a long
progression of vibrational features, each with fairly uniform shape and width. What is the physical
interpretation of this vibrational progression and what is the origin of the width of the features? The goal is to
come up with a dynamical model that leads to a wavepacket autocorrelation function whose Fourier transform
agrees with the spectrum in figure A1.6.11. Figure A1.6.12 gives a plausible dynamical model leading to such
an autocorrelation function. In (a), equipotential contours of the excited-state potential energy surface of
CO$_2$ are shown, as a function of the two bond lengths, $R_1$ and $R_2$, or, equivalently, as a function of the
symmetric and antisymmetric stretch coordinates, $v$ and $u$ (the latter are linear combinations of the former).
Along the axis $u = 0$ the potential has a minimum; along the axis $v = 0$ (the local 'reaction path') the
potential has a maximum. Thus, the potential in the region $u = 0$, $v = 0$ has a 'saddle-point'. There are two
symmetrically related exit channels, for large values of $R_1$ and $R_2$, respectively, corresponding to the
formation of OC + O versus O + CO. Figure A1.6.12(a) also shows the initial wavepacket, which is approximately a
two-dimensional Gaussian. Its centre is displaced from the minimum in the symmetric stretch coordinate. Figure
A1.6.12(b)-(f) show the subsequent dynamics of the wavepacket. It moves downhill along the $v$ coordinate,
while at the same time spreading. After one vibrational period in the $v$ coordinate the centre of the wavepacket
comes back to its starting point in $v$, but has spread in $u$ (figure A1.6.12(e)). The resulting wavepacket
autocorrelation function is shown in figure A1.6.12(right)(a). At $t = 0$ the autocorrelation function is 1. On a
time scale $\tau_b$ the correlation function has decayed to nearly 0, reflecting the fact that the wavepacket has
moved away from its initial Franck-Condon location (figure A1.6.12(b)). At time $\tau_e$ the wavepacket has
come back to the Franck-Condon region in the $v$ coordinate, and the autocorrelation function has a
recurrence. However, the magnitude of the recurrence is much smaller than the initial value, since there is
irreversible spreading of the wavepacket in the $u$ coordinate. Note there are further, smaller recurrences at
multiples of $\tau_e$.




Figure A1.6.11. Idealized UV absorption spectrum of CO$_2$. Note the regular, intermediate-resolution
vibrational progression. In the frequency regime this structure is interpreted as a Franck-Condon
progression in the symmetric stretch, with broadening of each of the lines due to predissociation. Reprinted
from [31].

The spectrum obtained by Fourier transform of figure A1.6.12(right)(a) is shown in figure A1.6.12(right)(b).
Qualitatively, it has all the features of the spectrum in figure A1.6.11: a broad envelope with resolved
vibrational structure underneath, but with an ultimate, unresolvable linewidth. Note that the shortest time
decay, $\delta$, determines the overall envelope in frequency, $1/\delta$; the recurrence time, $T$, determines
the vibrational frequency spacing, $2\pi/T$; the overall decay time determines the width of the vibrational
features. Moreover, note that decays in time correspond to widths in frequency, while recurrences in time
correspond to spacings in frequency.
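
The correspondence between the three time scales and the three frequency scales can be illustrated with a model correlation function. The sketch below is not the CO$_2$ calculation: it assumes a real, even model function made of Gaussian recurrences of width `DELTA` every `T_RECUR` under a slow exponential decay at rate `GAMMA`, and all names and parameter values are hypothetical. The Fourier transform is evaluated by a plain Riemann sum.

```python
import math

T_RECUR = 10.0   # recurrence time T
DELTA = 0.5      # initial decay time (width of each recurrence)
GAMMA = 0.05     # rate of the slow overall decay

def correlation(t):
    """Model correlation function: Gaussian recurrences every T_RECUR under an exponential decay.

    Only the nearest recurrence is kept; the others are negligible because DELTA << T_RECUR.
    """
    k = round(t / T_RECUR)
    return math.exp(-GAMMA * abs(t)) * math.exp(-((t - k * T_RECUR) ** 2) / (2.0 * DELTA ** 2))

def spectrum(omega, t_max=150.0, dt=0.05):
    """Fourier transform of the real, even model function, by a Riemann sum over |t| <= t_max."""
    steps = int(t_max / dt)
    total = correlation(0.0)
    for j in range(1, steps + 1):
        t = j * dt
        total += 2.0 * correlation(t) * math.cos(omega * t)
    return total * dt

spacing = 2.0 * math.pi / T_RECUR
for multiple in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"omega = {multiple:3.1f} * 2*pi/T: spectrum = {spectrum(multiple * spacing):.4f}")
```

In this model the spectrum peaks at multiples of $2\pi/T$, the peak heights fall off under an envelope whose width is of order $1/\delta$, and the width of each peak is set by the slow decay rate.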

Figure A1.6.12. Left: A qualitative diagram showing evolution of $\phi(t)$ on the upper potential surface. Note
the oscillation along the $v$ (symmetric stretch) coordinate, and the spreading along the $u$ (antisymmetric
stretch) coordinate. Reprinted from [32]. Right: (a) The absolute value of the correlation function,
$|\langle\phi|\phi(t)\rangle|$, versus $t$ for the dynamical situation shown in figure A1.6.12. (b) The Fourier
transform of $\langle\phi|\phi(t)\rangle$, giving the absorption spectrum. Note that the central lobe in the
correlation function, with decay constant $\delta$, gives rise to the overall width of the absorption spectrum, on
the order of $2\pi/\delta$. Furthermore, the recurrences in the correlation on the time scale $T$ give rise to the
oscillations in the spectrum on the frequency scale $2\pi/T$. Reprinted from [32].

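Perhaps the more conventional approach to electronic absorption spectroscopy is cast in the energy, rather
than in the time domain. It is straightforward to show that equation (A1.6.87) can be rewritten as

$$\sigma(\omega_I) = \frac{4\pi^2\omega_I}{3\hbar c}\sum_n |\langle\psi_n|\mu|\psi_i\rangle|^2\,\delta(\bar\omega - \omega_n). \tag{A1.6.88}$$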


Note that if we identify the sum over $\delta$-functions with the density of states, then equation (A1.6.88) is
just Fermi's Golden Rule, which we employed in section A1.6.1. This is consistent with the interpretation of the
absorption spectrum as the transition rate from state i to state n.
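
The step from the time-domain to the energy-domain expression can be sketched as follows. This is an outline under stated assumptions rather than the original derivation: the time-domain formula is taken in the form $\sigma(\omega_I) = (2\pi\omega_I/3\hbar c)\int_{-\infty}^{\infty}\langle\phi_i(0)|\phi_i(t)\rangle e^{i\bar\omega t}\,dt$, and the excited-state Hamiltonian $H_b$ is assumed to have a complete discrete set of eigenstates $\psi_n$ with energies $\hbar\omega_n$.

```latex
\begin{align*}
\langle\phi_i(0)|\phi_i(t)\rangle
  &= \sum_n \langle\psi_i|\mu|\psi_n\rangle\, e^{-i\omega_n t}\,\langle\psi_n|\mu|\psi_i\rangle
   = \sum_n |\langle\psi_n|\mu|\psi_i\rangle|^2\, e^{-i\omega_n t}, \\
\int_{-\infty}^{\infty} e^{i(\bar\omega-\omega_n)t}\,dt
  &= 2\pi\,\delta(\bar\omega-\omega_n), \\
\sigma(\omega_I)
  &= \frac{2\pi\omega_I}{3\hbar c}\cdot 2\pi \sum_n |\langle\psi_n|\mu|\psi_i\rangle|^2\,\delta(\bar\omega-\omega_n)
   = \frac{4\pi^2\omega_I}{3\hbar c}\sum_n |\langle\psi_n|\mu|\psi_i\rangle|^2\,\delta(\bar\omega-\omega_n).
\end{align*}
```

Each eigenstate contributes a pure oscillation to the correlation function, and the infinite-time Fourier transform of a pure oscillation is a $\delta$-function. Any decay of the correlation function replaces the $\delta$-function by a line of finite width.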

The coefficients of the $\delta$-function in the sum are called Franck-Condon factors, and reflect the overlap of
the initial state with the excited state $\psi_n$ at energy $E_n = \hbar\omega_n$ (see figure A1.6.13). Formally,
equation (A1.6.88) gives a 'stick' spectrum of the type shown in figure A1.6.13(b); generally, however, the
experimental absorption spectrum is diffuse, as in figure A1.6.11. This highlights one of the advantages of the
time domain approach: that the broadening of the stick spectrum need not be introduced artificially, but arises
naturally from the decay of the wavepacket correlation function, as we have seen in figure A1.6.11.

Figure A1.6.13. (a) Potential energy curves for two electronic states. The vibrational wavefunctions of the
excited electronic state and for the lowest level of the ground electronic state are shown superimposed. (b)
Stick spectrum representing the Franck-Condon factors (the square of the overlap integral) between the
vibrational wavefunction of the ground electronic state and the vibrational wavefunctions of the excited
electronic state (adapted from [3]).

The above formulae for the absorption spectrum can be applied, with minor modifications, to other one-photon
spectroscopies, for example, emission spectroscopy, photoionization spectroscopy and
photodetachment spectroscopy (photoionization of a negative ion). For stimulated emission spectroscopy, the
factor of $\omega_I$ is simply replaced by $\omega_S$, the stimulated light frequency; however, for spontaneous
emission spectroscopy, the prefactor $\omega_I$ is replaced by the prefactor $\omega_S^3$. The extra factor of
$\omega_S^2$ is due to the density of states of vacuum field states which induce the spontaneous emission, which
increases quadratically with frequency. Note that in emission spectroscopy the roles of the ground- and
excited-state potential energy surfaces are reversed: the initial wavepacket starts from the vibrational ground
state of the excited electronic state and its spectrum has information on the vibrational eigenstates and
potential energy surface of the ground electronic state.

(B) RESONANCE RAMAN SPECTROSCOPY 


We will now look at two-photon processes. We will concentrate on Raman scattering although two-photon
absorption can be handled using the same approach. In Raman scattering, absorption of an incident photon of
frequency $\omega_I$ carries the initial wavefunction, $\psi_i$, from the lower potential to the upper. The
emission of a photon of frequency $\omega_S$ returns the system to the lower potential, to state $\psi_f$. If
$\omega_S = \omega_I$ then the scattering is elastic and the process is called Rayleigh scattering. Raman
scattering occurs when $\omega_S \neq \omega_I$ and in that case $\psi_i \neq \psi_f$. The measured quantity is the
Raman intensity, $I(\omega_I, \omega_S)$. The amplitudes of the incident and emitted fields are taken as $E_I$
and $E_S$; for simplicity, we begin with the case of stimulated Raman scattering, and then discuss the
modifications for spontaneous Raman scattering at the end.

We start from the expression for the second-order wavefunction:

$$|\psi^{(2)}(t)\rangle = \left(\frac{1}{i\hbar}\right)^{2}\int_{-\infty}^{t}dt''\int_{-\infty}^{t''}dt'\; e^{-(i/\hbar)H_a(t-t'')}\,\{-\mu E_S\, e^{i\omega_S t''}\}\,e^{-(i/\hbar)H_b(t''-t')-(\gamma/2)(t''-t')}\,\{-\mu E_I\, e^{-i\bar\omega_I t'}\}\,|\psi_i\rangle + \text{NRT} \tag{A1.6.89}$$

where $H_a$ ($H_b$) is the Hamiltonian for the lower (upper) potential energy surface and, as before,
$\bar\omega_I = \omega_I + \omega_i$. In words, equation (A1.6.89) is saying that the second-order wavefunction is
obtained by propagating the initial wavefunction on the ground-state surface until time $t'$, at which time it is
excited up to the excited state, upon which it evolves until it is returned to the ground state at time $t''$,
where it propagates until time $t$. NRT stands for non-resonant term: it is obtained by
$E_I \leftrightarrow E_S$ and $\omega_I \leftrightarrow -\omega_S$, and its physical interpretation is the
counterintuitive possibility that the emitted photon precedes the incident photon. $\gamma$ is the
spontaneous emission rate.

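If we define $\bar\omega_S = \omega_S + \omega_f$ then we can follow the same approach as in the one-photon case.
We now take the time derivative of the norm of $|\psi^{(2)}(t)\rangle$, with the result:

$$I(\omega_I, \omega_S) = \frac{d}{dt}\,\frac{\langle\psi^{(2)}(t)|\psi^{(2)}(t)\rangle}{|E_I|^2} = \frac{2\pi}{\hbar^2}\,|E_S|^2\,|\alpha_{fi}(\omega_I)|^2\,\delta\big(\omega_f + \omega_S - (\omega_I + \omega_i)\big), \tag{A1.6.90}$$

where

$$\alpha_{fi}(\omega_I) = \frac{i}{\hbar}\int_0^{\infty}\langle\psi_f|\mu\, e^{-(i/\hbar)H_b t}\mu|\psi_i\rangle\, e^{i\bar\omega_I t-\gamma t/2}\,dt + \text{NRT}. \tag{A1.6.91}$$

Again, NRT is obtained from the first term by the replacement $\omega_I \to -\omega_S$. If we define
$|\phi_f\rangle = \mu|\psi_f\rangle$ and $|\phi_i(t)\rangle = e^{-(i/\hbar)H_b t}\mu|\psi_i\rangle$ then we see that
the frequency-dependent polarizability, $\alpha_{fi}(\omega_I)$, can be written in the following compact form [14]:

$$\alpha_{fi}(\omega_I) = \frac{i}{\hbar}\int_0^{\infty}\langle\phi_f|\phi_i(t)\rangle\, e^{i\bar\omega_I t-\gamma t/2}\,dt + \text{NRT}. \tag{A1.6.92}$$

The only modification of equation (A1.6.90) for spontaneous Raman scattering is the multiplication by the
density of states of the cavity, equation (A1.6.24), leading to a prefactor of the form $\omega_I\omega_S^3$.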

Equation (A1.6.92) has a simple physical interpretation. At $t = 0$ the initial state, $\psi_i$, is multiplied by
$\mu$ (which may be thought of as approximately constant in many cases, the Condon approximation). This product,
denoted $\phi_i$, constitutes an initial wavepacket which begins to propagate on the excited-state potential
energy surface (figure A1.6.14). Initially, the wavepacket will have overlap only with $\psi_i$, and will be
orthogonal to all other $\psi_f$ on the ground-state surface. As the wavepacket begins to move on the excited
state, however, it will develop overlap with ground vibrational states of ever-increasing quantum number.
Eventually, the wavepacket will reach a turning point and begin moving back towards the Franck-Condon region of
the excited-state surface, now overlapping ground vibrational states in decreasing order of their quantum number.
These time-dependent overlaps determine the Raman intensities via equation (A1.6.92) and equation
(A1.6.90). If the excited state is dissociative, then the wavepacket never returns to the Franck-Condon region
and the Raman spectrum has a monotonically decaying envelope. If the wavepacket bifurcates on the excited
state due to a bistable potential, then it will only have non-zero overlaps with ground vibrational states which
are of even parity; the Raman spectrum will then have 'missing' lines. In multidimensional systems, there are
ground vibrational states corresponding to each mode of vibration. The Raman intensities then
contain information about which coordinates participate in the wavepacket motion, to
what extent, and even in what sequence [15, 33]. Clearly, resonance Raman intensities can be a sensitive
probe of wavepacket dynamics on the excited-state potential.
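
The rise and fall of the overlaps with successive ground vibrational states can be made concrete in the simplest model. The sketch below is an illustration under explicit assumptions, none of which come from the text: the ground and excited surfaces are harmonic with the same frequency, the excited surface is displaced by a dimensionless amount `displacement`, the initial state is the vibrational ground state and the Condon approximation holds. The moving packet is then a coherent state of the ground-state oscillator with $|\alpha(t)|^2 = \Delta^2(1 - \cos\omega t)$, and its squared overlaps with the ground vibrational states follow a Poisson distribution. The function name `overlap_probability` is hypothetical.

```python
import math

def overlap_probability(n, t, displacement, omega):
    """|<n|phi(t)>|^2 for a ground-state Gaussian moving on a displaced harmonic surface.

    Assumes equal ground- and excited-state frequencies and a constant transition dipole,
    so the packet is a coherent state with mean quantum number
    displacement**2 * (1 - cos(omega * t)) relative to the ground-state oscillator.
    """
    mean_n = displacement ** 2 * (1.0 - math.cos(omega * t))
    if mean_n <= 0.0:
        return 1.0 if n == 0 else 0.0
    # Poisson distribution, evaluated in log form to stay stable for large n
    return math.exp(-mean_n + n * math.log(mean_n) - math.lgamma(n + 1))

omega = 1.0
period = 2.0 * math.pi / omega
displacement = 1.5   # dimensionless shift between the two minima (assumed value)
levels = range(40)

for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    t = fraction * period
    probs = [overlap_probability(n, t, displacement, omega) for n in levels]
    most_probable = max(levels, key=lambda n: probs[n])
    print(f"t = {fraction:4.2f} period: most probable final level n = {most_probable}")
```

In this model the most probable final level climbs until the packet reaches its outer turning point at half a period and then falls again as the packet returns to the Franck-Condon region.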

Figure A1.6.14. Schematic diagram showing the promotion of the initial wavepacket to the excited electronic
state, followed by free evolution. Cross-correlation functions with the excited vibrational states of the
ground-state surface (shown in the inset) determine the resonance Raman amplitude to those final states (adapted
from [14]).

One of the most interesting features of the Raman spectrum is its dependence on the incident light frequency,
$\omega_I$. When $\omega_I$ is on resonance with the excited electronic state, the scattering process closely
resembles a process of absorption followed by emission. However, as $\omega_I$ is detuned from resonance there are
no longer any nearby eigenstates, and thus no absorption: the transition from the initial state $i$ to the final
state $f$ is a 'scattering' process. In the older literature the non-existent intermediate state was called a
'virtual' state.

There can be no completely rigorous separation between absorption-emission and Raman scattering. This is
clear from the time-domain expression, equation (A1.6.92), in which the physical meaning of the variable $t$ is
the time interval between incident and emitted photons. If the second photon is emitted long after the first
photon was incident the process is called absorption/emission. If the second photon is emitted almost
immediately after the first photon is incident the process is called scattering. The limits on the integral in
(A1.6.92) imply that the Raman amplitude has contributions from all values of this interval ranging from
0 (scattering) to $\infty$ (absorption/emission). However, the regions that contribute most depend on the incident
light frequency. In particular, as the incident frequency is detuned from resonance there can be no absorption
and the transition becomes dominated by scattering. This implies that as the detuning is increased, the relative
contribution to the integral from small values of $t$ is greater.

Mathematically, the above observation suggests a time-energy uncertainty principle [15]. If the incident
frequency is detuned by an amount $\Delta\omega$ from resonance with the excited electronic state, the wavepacket
can 'live' on the excited state only for a time $\tau \approx 1/\Delta\omega$ (see figure A1.6.15). This follows
from inspection of the integral in equation (A1.6.92): if the incident light frequency is mismatched from the
intrinsic frequencies of the evolution operator, there will be a rapidly oscillating phase to the integrand.
Normally, such a rapidly oscillating phase would kill the integral completely, but there is a special effect that
comes into play here, since the lower bound of the integral is 0 and not $-\infty$. The absence of contributions
from negative $t$ leads to an incomplete cancellation of the portions of the integral around $t = 0$. The size of
the region around $t = 0$ is inversely proportional to the mismatch in frequencies, $\Delta\omega$. Since the
physical significance of $t$ is time delay between incident and scattered photons, and this time delay is the
effective wavepacket lifetime in the excited state, we are led to conclude that the effective lifetime decreases
as the incident frequency is detuned from resonance.
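
The incomplete cancellation in a one-sided integral can be seen in a minimal model. The sketch below makes assumptions that are not in the text: the overlap in the integrand is replaced by a constant, the only time dependence is a phase oscillating at the detuning plus a slow decay at rate `gamma`, and the function name `half_fourier` is hypothetical. The one-sided integral of $e^{(i\Delta\omega - \gamma)t}$ is $1/(\gamma - i\Delta\omega)$, whose modulus plays the role of an effective lifetime.

```python
import cmath

def half_fourier(detuning, gamma, t_max=400.0, dt=0.01):
    """Midpoint-rule estimate of the integral of exp((i*detuning - gamma)*t) from 0 to t_max."""
    steps = int(t_max / dt)
    total = 0.0 + 0.0j
    for j in range(steps):
        t = (j + 0.5) * dt
        total += cmath.exp((1j * detuning - gamma) * t)
    return total * dt

gamma = 0.05   # slow intrinsic decay rate (assumed value)
for detuning in (0.0, 0.5, 2.0, 5.0):
    effective_time = abs(half_fourier(detuning, gamma))
    exact = 1.0 / abs(gamma - 1j * detuning)
    print(f"detuning {detuning:4.1f}: |integral| = {effective_time:8.4f}, "
          f"1/|gamma - i*detuning| = {exact:8.4f}")
```

On resonance the integral is limited only by the intrinsic decay and has magnitude $1/\gamma$; once the detuning exceeds $\gamma$ the magnitude approaches $1/\Delta\omega$, in line with the effective lifetime quoted above.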


Because of the two frequencies, $\omega_I$ and $\omega_S$, that enter into the Raman spectrum, Raman spectroscopy
may be thought of as a 'two-dimensional' form of spectroscopy. Normally, one fixes $\omega_I$ and looks at the
intensity as a function of $\omega_S$; however, one may vary $\omega_I$ and probe the intensity as a function of
$\omega_I - \omega_S$. This is called a Raman excitation profile.

Figure A1.6.15. Schematic diagram, showing the time-energy uncertainty principle operative in resonance
Raman scattering. If the incident light is detuned from resonance by an amount $\Delta\omega$, the effective
lifetime on the excited state is $\tau \approx 1/\Delta\omega$ (adapted from [15]).

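The more conventional, energy domain formula for resonance Raman scattering is the expression by
Kramers-Heisenberg-Dirac (KHD). The differential cross section for Raman scattering into a solid angle $d\Omega$
can be written in the form

$$\frac{d\sigma_{fi}}{d\Omega} = \frac{\omega_I\omega_S^3}{c^4}\,\big\langle|(\alpha_{fi})_{e_I e_S}(\omega_I)|^2\big\rangle \tag{A1.6.93}$$

where

$$(\alpha_{fi})_{\rho\lambda}(\omega_I) = \frac{1}{\hbar}\sum_j\left[\frac{\langle\psi_f|\mu_\rho|\psi_j\rangle\langle\psi_j|\mu_\lambda|\psi_i\rangle}{\omega_j - \omega_i - \omega_I - i\gamma/2} + \frac{\langle\psi_f|\mu_\lambda|\psi_j\rangle\langle\psi_j|\mu_\rho|\psi_i\rangle}{\omega_j - \omega_i + \omega_S - i\gamma/2}\right] \tag{A1.6.94}$$

and the angular brackets indicate orientational averaging. The labels $e_{I,S}$ refer to the direction of
polarization of the incident and scattered light, respectively, while the subscripts $\rho$ and $\lambda$ refer to
$x$, $y$ and $z$ components of the vector $\mu$. Integrated over all directions and polarizations one obtains
[33, 34]:

$$\sigma_{fi}(\omega_I) = \frac{8\pi\,\omega_I\omega_S^3}{9c^4}\sum_{\rho,\lambda}|(\alpha_{fi})_{\rho\lambda}|^2. \tag{A1.6.95}$$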

Equation (A1.6.94) is called the KHD expression for the polarizability, $\alpha$. Inspection of the denominators
indicates that the first term is the resonant term and the second term is the non-resonant term. Note the
product of Franck-Condon factors in the numerator: one corresponding to the amplitude for excitation and the
other to the amplitude for emission. The KHD formula is sometimes called the 'sum-over-states' formula,
since formally it requires a sum over all intermediate states $j$, each intermediate state participating according
to how far it is from resonance and the size of the matrix elements that connect it to the states $\psi_i$ and
$\psi_f$. The KHD formula is fully equivalent to the time domain formula, equation (A1.6.92), and can be derived
from the latter in a straightforward way. However, the time domain formula can be much more convenient,
particularly as one detunes from resonance, since one can exploit the fact that the effective dynamics become
shorter and shorter as the detuning is increased.
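
The equivalence of the two formulae can be sketched for the resonant term. The outline below rests on assumptions about the form of the time-domain expression: it takes $\alpha_{fi}(\omega_I) = (i/\hbar)\int_0^\infty\langle\phi_f|\phi_i(t)\rangle e^{i\bar\omega_I t - \gamma t/2}\,dt$ plus a non-resonant term, with $\bar\omega_I = \omega_I + \omega_i$, and inserts a complete set of eigenstates $\psi_j$ of $H_b$ with energies $\hbar\omega_j$. The prefactor $i/\hbar$ and the damping $\gamma/2$ are conventions assumed here.

```latex
\begin{align*}
\langle\phi_f|\phi_i(t)\rangle
  &= \sum_j \langle\psi_f|\mu|\psi_j\rangle\langle\psi_j|\mu|\psi_i\rangle\, e^{-i\omega_j t}, \\
\int_0^{\infty} e^{i(\bar\omega_I-\omega_j)t-\gamma t/2}\,dt
  &= \frac{1}{\gamma/2 - i(\bar\omega_I-\omega_j)}
   = \frac{i}{\bar\omega_I-\omega_j+i\gamma/2}, \\
\alpha_{fi}^{\mathrm{res}}(\omega_I)
  &= \frac{i}{\hbar}\sum_j \langle\psi_f|\mu|\psi_j\rangle\langle\psi_j|\mu|\psi_i\rangle\,
     \frac{i}{\bar\omega_I-\omega_j+i\gamma/2}
   = \frac{1}{\hbar}\sum_j \frac{\langle\psi_f|\mu|\psi_j\rangle\langle\psi_j|\mu|\psi_i\rangle}
     {\omega_j-\omega_i-\omega_I-i\gamma/2}.
\end{align*}
```

The one-sided time integral is what produces the energy denominator. Far from resonance the denominator is large for every $j$, which is the energy-domain counterpart of the short effective propagation time in the time-domain picture.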


A1.6.4 COHERENT NONLINEAR SPECTROSCOPY 

As described at the end of section A1.6.1, in nonlinear spectroscopy a polarization is created in the material
which depends in a nonlinear way on the strength of the electric field. As we shall now see, the microscopic 
description of this nonlinear polarization involves multiple interactions of the material with the electric field. 
The multiple interactions in principle contain information on both the ground electronic state and excited 
electronic state dynamics, and for a molecule in the presence of solvent, information on the molecule-solvent 
interactions. Excellent general introductions to nonlinear spectroscopy may be found in [35, 36 and 37].
Raman spectroscopy, described at the end of the previous section, is also a nonlinear spectroscopy, in the
sense that it involves more than one interaction of light with the material, but it is a pathological example
since the second interaction is through spontaneous emission and therefore not proportional to a driving field
and not directional; at the end of this section we will connect the present formulation with Raman
spectroscopy [38].

What information is contained in nonlinear spectroscopy? For gas-phase experiments, that is, experiments in 
which the state of the system undergoes little or no dissipation, the goal of nonlinear spectroscopy is generally 
as in linear spectroscopy, that is, revealing the quantum energy level structure of the molecule, both in the 
ground and the excited electronic state(s). For example, two-photon spectroscopy allows transitions that are 
forbidden due to symmetry with one photon; thus the two-photon spectrum allows the spectroscopic study of 
many systems that are otherwise dark. Moreover, nonlinear spectroscopy allows one to access highly excited
vibrational levels that cannot be accessed by ordinary spectroscopies, as in the example of time-dependent
CARS spectroscopy below. Finally, nonlinear spectroscopy has emerged as a powerful probe of molecules
in anisotropic environments, for example, molecules at interfaces, where there is a $P^{(2)}$ signal which is absent
for molecules in an isotropic environment.

A feature of nonlinear spectroscopy which is perhaps unique is the ability to probe not only energy levels and 
their populations, but to probe directly coherences, be they electronic or vibrational, via specially designed 
pulse sequences. For an isolated molecule this is generally uninteresting, since in the absence of relaxation the 
coherences are completely determined by the populations; however, for a molecule in solution the decay of 
the coherence is an indicator of molecule-solvent interactions. One normally distinguishes two sources of
decay of the coherence: inhomogeneous decay, which represents static differences in the environment of
different molecules; and homogeneous decay, which represents the dynamic interaction with the
surroundings and is the same for all molecules. Both these sources of decay contribute to the linewidth of
spectral lines; in many cases the inhomogeneous decay is faster than the homogeneous decay, masking the
latter. In echo spectroscopies, which are related to a particular subset of diagrams in $P^{(3)}$, one can at least
partially discriminate between homogeneous and inhomogeneous decay.
From the experimental point of view, nonlinear spectroscopy has the attractive feature of giving a directional
signal (in a direction other than that of any of the incident beams), and hence a background-free signal (figure
A1.6.16). A significant amount of attention is given in the literature on nonlinear spectroscopy to the
signals that are emitted in different directions, and their dynamical interpretation. As we
shall see, many dynamical pathways can contribute to the signal in each direction, and the dynamical
interpretation of the signal depends on sorting out these contributions or designing an experiment which
selects for just a single dynamical pathway.



Figure A1.6.16. Diagram showing the directionality of the signal in coherent spectroscopy. Associated with
the carrier frequency of each interaction with the light is a wavevector, k. The output signal in coherent 
spectroscopies is determined from the direction of each of the input signals via momentum conservation (after 
[48a]). 


A1.6.4.1 GENERAL DEVELOPMENT 

As discussed in section A1.6.1, on a microscopic quantum mechanical level, within the dipole approximation,
the polarization, $P(t)$, is given by

$$P(t) = \langle\psi|\mu|\psi\rangle. \tag{A1.6.96}$$

Assuming that the system has no permanent dipole moment, the existence of $P(t)$ depends on a non-stationary
$\psi$ induced by an external electric field. For weak fields, we may expand the polarization in orders of the
perturbation,

$$P(t) = \langle\psi|\mu|\psi\rangle = P^{(0)}(t) + P^{(1)}(t) + P^{(2)}(t) + P^{(3)}(t) + \cdots \tag{A1.6.97}$$

We can then identify each term in the expansion with one or more terms in the perturbative expansion of
$\langle\psi|\mu|\psi\rangle$:

$$P^{(0)}(t) = \langle\psi^{(0)}(t)|\mu|\psi^{(0)}(t)\rangle \tag{A1.6.98}$$

$$P^{(1)}(t) = \langle\psi^{(0)}(t)|\mu|\psi^{(1)}(t)\rangle + \text{c.c.} \tag{A1.6.99}$$

$$P^{(2)}(t) = \langle\psi^{(0)}(t)|\mu|\psi^{(2)}(t)\rangle + \text{c.c.} + \langle\psi^{(1)}(t)|\mu|\psi^{(1)}(t)\rangle \tag{A1.6.100}$$

and

$$P^{(3)}(t) = \langle\psi^{(0)}(t)|\mu|\psi^{(3)}(t)\rangle + \text{c.c.} + \langle\psi^{(1)}(t)|\mu|\psi^{(2)}(t)\rangle + \text{c.c.} \tag{A1.6.101}$$
etc. Note that for an isotropic medium, terms of the form $P^{(2n)}(t)$ ($P^{(0)}(t)$, $P^{(2)}(t)$, etc.) do not survive
orientational averaging. For example, the first term, $\langle\psi^{(0)}|\mu|\psi^{(0)}\rangle$, is the permanent dipole moment, which
gives zero when averaged over an isotropic medium. At an interface, however (e.g. between air and water),
these even orders of $P(t)$ do not vanish, and in fact are sensitive probes of interface structure and dynamics.
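
The vanishing of the even-order terms can be checked numerically in a deliberately simplified picture. The sketch below assumes, beyond what the text states, that all fields are polarized along the laboratory z axis and that the molecule has a single dipole direction, so that the n-th order polarization carries n + 1 projection factors $\cos\theta$ (n interactions plus the emission). The function name `orientational_average` is hypothetical.

```python
import numpy as np


def orientational_average(n_projections, n_nodes=64):
    """Isotropic average of cos(theta)**n_projections.

    For a uniform distribution over the sphere the average reduces to
    (1/2) * integral_{-1}^{1} x**m dx, evaluated here by Gauss-Legendre quadrature.
    """
    x, w = np.polynomial.legendre.leggauss(n_nodes)  # nodes and weights on [-1, 1]
    return 0.5 * np.sum(w * x**n_projections)


if __name__ == "__main__":
    for order in range(4):
        # P^(order) involves order + 1 dipole projections in this simplified model.
        print(order, orientational_average(order + 1))
```

In this model the averages for $P^{(0)}$ and $P^{(2)}$ (one and three projections) integrate to zero by symmetry, while those for $P^{(1)}$ and $P^{(3)}$ (two and four projections) do not.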

The central dynamical objects that enter into the polarization are the coherences of the form $\langle\psi^{(0)}(t)|\mu|\psi^{(1)}(t)\rangle$
and $\langle\psi^{(1)}(t)|\mu|\psi^{(2)}(t)\rangle$, etc. These quantities are overlaps between wavepackets moving on different potential
energy surfaces [40, 41 and 42, 52]: the instantaneous overlap of the wavepackets creates a non-vanishing
transition dipole moment which interacts with the light. This view is appropriate both in the regime of weak
fields, where perturbation theory is valid, and for strong fields, where perturbation theory is no longer valid.
Note that in the previous sections we saw that the absorption and Raman spectra were related to
$\frac{d}{dt}\langle\psi^{(1)}(t)|\psi^{(1)}(t)\rangle$ and $\frac{d}{dt}\langle\psi^{(2)}(t)|\psi^{(2)}(t)\rangle$. The coherences that appear in equation (A1.6.99) and equation
(A1.6.101) are precisely equivalent to these derivatives: the rate of change of a population is proportional to
the instantaneous coherence, a relationship which can be observed already in the vector precession model of
the two-level system (section A1.6.2.3).

The coherences can be written compactly using the language of density matrices. The total polarization is
given by

$$P = \mathrm{Tr}(\rho\mu) = P^{(0)}(t) + P^{(1)}(t) + P^{(2)}(t) + P^{(3)}(t) + \cdots \tag{A1.6.102}$$

where the different terms in the perturbative expansion of $P$ are accordingly as follows:

$$P^{(1)} = \mathrm{Tr}(\rho^{(1)}\mu), \qquad P^{(2)} = \mathrm{Tr}(\rho^{(2)}\mu), \qquad P^{(3)} = \mathrm{Tr}(\rho^{(3)}\mu), \quad \text{etc.} \tag{A1.6.103}$$


In the absence of dissipation and for pure-state initial conditions, equation (A1.6.102) and equation (A1.6.103)
are equivalent to equations (A1.6.97), (A1.6.98), (A1.6.99), (A1.6.100) and (A1.6.101). But equation
(A1.6.102) and equation (A1.6.103) are more general, allowing for the possibility of dissipation, and hence
for describing nonlinear spectroscopy in the presence of an environment. There is an important caveat
however. In the presence of an environment, it is customary to define a reduced density matrix which
describes the system, in which the environment degrees of freedom have been traced out. The tracing out of
the environment should be performed only at the end, after all the interactions of the system-environment with
the field, otherwise important parts of the nonlinear spectrum (e.g. phonon sidebands) will be missing. The
tracing of the environment at the end can be done analytically if the system is a two-level system and the
environment is harmonic, the so-called spin-boson or Brownian oscillator model. However, in general the
dynamics in the full system-environment degrees of freedom must be calculated, which essentially entails a
return to a wavefunction description, equation (A1.6.97), equation (A1.6.98), equation (A1.6.99), equation
(A1.6.100) and equation (A1.6.101), but in a larger space.
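
The equivalence of the two descriptions for a pure state, and the extra freedom of the density matrix once coherence is lost, can be illustrated with a two-level toy model. This is a minimal sketch under stated assumptions: the basis, the dipole matrix `MU` (arbitrary units), and the helper names are all hypothetical, and the `dephase` function simply scales the off-diagonal elements rather than modelling any particular environment.

```python
import numpy as np

# Dipole operator in a two-level basis {|g>, |e>}; arbitrary units (assumption).
MU = np.array([[0.0, 1.0],
               [1.0, 0.0]])


def pure_state(theta, phi):
    """Normalized superposition cos(theta)|g> + exp(i*phi) sin(theta)|e>."""
    return np.array([np.cos(theta), np.exp(1j * phi) * np.sin(theta)])


def polarization_from_wavefunction(psi):
    """<psi| mu |psi> for a normalized state vector."""
    psi = np.asarray(psi, dtype=complex)
    return float(np.real(np.vdot(psi, MU @ psi)))  # vdot conjugates its first argument


def polarization_from_density(rho):
    """Tr(rho mu) for a density matrix."""
    rho = np.asarray(rho, dtype=complex)
    return float(np.real(np.trace(rho @ MU)))


def dephase(rho, factor):
    """Scale the coherences (off-diagonal elements) by `factor`; populations are untouched."""
    out = np.array(rho, dtype=complex)
    out[0, 1] *= factor
    out[1, 0] *= factor
    return out


if __name__ == "__main__":
    psi = pure_state(np.pi / 6, 0.3)
    rho = np.outer(psi, psi.conj())
    print(polarization_from_wavefunction(psi), polarization_from_density(rho))
    print(polarization_from_density(dephase(rho, 0.5)))
```

For the pure state the two functions agree; after dephasing, the density matrix no longer corresponds to any single wavefunction of the two-level system, yet Tr(ρμ) is still well defined.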

The total of three interactions of the material with the field can be distributed in several different ways 
between the ket and the bra (or more generally, between left and right interactions of the field with the density 
operator). For example, the first term in equation (A1.6.101) corresponds to all three interactions being with
the ket, while the second term corresponds to two interactions with the ket and one with the bra. The second
term can be further subdivided into three possibilities: that the single interaction with the bra is before,
between or after the two interactions with the ket (or correspondingly, left/right interactions of the field with
the density operator) [37]. These different contributions to $P^{(3)}$ (or, equivalently, to $\rho^{(3)}$) are represented
conveniently using double-sided Feynman diagrams, a generalization of the single-sided Feynman diagrams
introduced in section A1.6.3, as shown in figure A1.6.17.
Figure A1.6.17. Double-sided Feynman diagrams, showing the interaction time with the ket (left) and the bra
(right). Time runs from bottom to top (adapted from [36]).


The subdivision of the second term into three possibilities has an interesting physical interpretation. The 
ordering of the interactions determines whether diagonal vs off-diagonal elements of the density matrix are 
produced: populations versus coherences. In the presence of relaxation processes (dephasing and population 
relaxation) the order of the interactions and the duration between them determines the duration for which 
population versus coherence relaxation mechanisms are in effect. This can be shown schematically using a 
Liouville space diagram, figure A1.6.18 [37]. The different pathways in Liouville space are drawn on a lattice,
where ket interactions are horizontal steps and bra interactions are vertical. The diagonal vertices represent
populations and the off-diagonal vertices are coherences. The three different time orderings for contributions
to $|\psi^{(2)}\rangle\langle\psi^{(1)}|$ correspond to the three Liouville pathways shown in figure A1.6.18. From such a diagram one
sees at a glance which pathways pass through intermediate populations (i.e. diagonal vertices) and hence are
governed by population decay processes, and which pathways do not.
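
The bookkeeping described above is easy to mechanize. The following sketch enumerates the time orderings of ket ('K') and bra ('B') interactions, walks each one across the Liouville lattice (ket order on the horizontal axis, bra order on the vertical axis, as in the text), and flags the pathways that visit a diagonal vertex between the first and last interaction. The function names and the 'K'/'B' string encoding are illustrative choices, not notation from the references.

```python
from itertools import product


def pathways(n_ket, n_bra):
    """All time orderings with n_ket ket interactions and n_bra bra interactions."""
    n = n_ket + n_bra
    return ["".join(s) for s in product("KB", repeat=n) if s.count("K") == n_ket]


def liouville_path(sequence):
    """Lattice points (ket order, bra order) visited by a sequence of 'K'/'B' steps."""
    ket = bra = 0
    path = [(0, 0)]
    for side in sequence:
        if side == "K":
            ket += 1  # horizontal step: interaction with the ket
        else:
            bra += 1  # vertical step: interaction with the bra
        path.append((ket, bra))
    return path


def passes_through_population(sequence):
    """True if an intermediate vertex (start and end excluded) lies on the diagonal."""
    return any(k == b for k, b in liouville_path(sequence)[1:-1])


if __name__ == "__main__":
    for seq in pathways(2, 1):
        print(seq, liouville_path(seq), passes_through_population(seq))
```

For two ket interactions and one bra interaction the enumeration returns three orderings; the two in which the bra acts before the second ket interaction pass through the diagonal vertex (1, 1), while the ordering with both ket interactions first does not.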
Figure A1.6.18. Liouville space lattice representation in one-to-one correspondence with the diagrams in
figure A1.6.17. Interactions of the density matrix with the field from the left (right) are signified by a vertical
(horizontal) step. The advantage of the Liouville lattice representation is that populations are clearly identified
as diagonal lattice points, while coherences are off-diagonal points. This allows immediate identification of
the pathways subject to population decay processes (adapted from [37]).
As a first application of the lattice representation of Liouville pathways, it is interesting to re-examine the
process of electromagnetic spontaneous light emission, discussed in the previous section. Note that formally,
the diagrams in figure A1.6.18 all contribute to the Kramers-Heisenberg-Dirac formula for resonance Raman scattering.
However, diagrams (b) and (c) produce an excited electronic state population (both the bra and ket are excited
in the first two interactions) and hence are subject to excited-state vibrational population relaxation processes,
while diagram (d) does not. Typically, in the condensed phase, the fluorescence spectrum consists of sharp
lines against a broad background. Qualitatively speaking, the sharp lines are associated with diagram (d), and
are called the resonance Raman spectrum, while the broad background is associated with diagrams (a) and
(b), and is called the resonance fluorescence spectrum [38]. Indeed, the emission frequency of the sharp lines
changes with the excitation frequency, indicating no excited electronic state population relaxation, while the
broad background is independent of excitation frequency, indicating vibrationally relaxed fluorescence.

There is an aspect of nonlinear spectroscopy which we have so far neglected, namely the spatial dependence
of the signal. In general, three incident beams, described by k-vectors $\mathbf{k}_1$, $\mathbf{k}_2$ and $\mathbf{k}_3$, will produce an outgoing
beam at each of the directions:

$$\mathbf{k}_s = \pm\mathbf{k}_1 \pm \mathbf{k}_2 \pm \mathbf{k}_3. \tag{A1.6.104}$$


Figure A1.6.19 shows eight out of the 48 Feynman diagrams that contribute to an outgoing k-vector at $-\mathbf{k}_1 +
\mathbf{k}_2 + \mathbf{k}_3$. The spatial dependence is represented by the wave vector $\mathbf{k}$ on each of the arrows in figure A1.6.19.
Absorption (emission) by the ket corresponds to a plus (minus) sign of $\mathbf{k}$; absorption (emission) by the bra
corresponds to a minus (plus) sign of $\mathbf{k}$. The eight diagrams shown dominate under conditions of electronic
resonance; the other 40 diagrams correspond to non-resonant contributions, involving emission before
absorption. The reason there are eight resonant diagrams now, instead of the four in figure A1.6.17, is a result
of the fact that the introduction of the k-dependence makes the order of the interactions distinguishable. At the
same time, the k-dependence of the detection eliminates many additional processes that might otherwise
contribute; for example, detection at $-\mathbf{k}_1 + \mathbf{k}_2 + \mathbf{k}_3$ eliminates processes in which $\mathbf{k}_1$ and $\mathbf{k}_2$ are interchanged,
as well as processes representing two or more interactions with a single beam. Under conditions in which the
interactions have a well-defined temporal sequence, just two diagrams dominate, while two of the diagrams in
figure A1.6.17 are eliminated since they emit to $\mathbf{k}_1 - \mathbf{k}_2 + \mathbf{k}_3$. Below we will see that in resonant CARS,
where in addition to the electronic resonance there is a vibrational resonance after the second interaction, there
is only a single resonant diagram. All else being equal, the existence of multiple diagrams complicates the
interpretation of the signal, and experiments that can isolate the contribution of individual diagrams have a
better chance for complete interpretation and should be applauded.
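
The directional bookkeeping can be made concrete with a few lines of code. The sketch below lists the eight sign combinations of equation (A1.6.104) for three given beam wave vectors and reproduces the count of 48 diagrams under one common way of counting, assumed here rather than taken from the text: each of the three beams acts exactly once, giving 3! time orderings, and each interaction may act on the ket or on the bra, giving 2^3 assignments. The function names are hypothetical.

```python
from itertools import permutations, product

import numpy as np


def signal_directions(k1, k2, k3):
    """Map each sign triple (s1, s2, s3) to the wave vector s1*k1 + s2*k2 + s3*k3."""
    ks = [np.asarray(k, dtype=float) for k in (k1, k2, k3)]
    return {
        signs: sum(s * k for s, k in zip(signs, ks))
        for signs in product((+1, -1), repeat=3)
    }


def count_diagrams():
    """Number of (beam ordering, ket/bra assignment) pairs for three distinct beams."""
    orderings = list(permutations((1, 2, 3)))      # 3! time orderings of the beams
    sides = list(product("KB", repeat=3))          # ket or bra for each interaction
    return len(orderings) * len(sides)


if __name__ == "__main__":
    directions = signal_directions((1, 0, 0), (0, 1, 0), (0, 0, 1))
    print(directions[(-1, +1, +1)])  # the direction -k1 + k2 + k3 discussed in the text
    print(count_diagrams())
```

Placing a detector along one of the eight directions selects the diagrams whose absorption and emission steps add up to that particular sign combination.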
Figure A1.6.19. Eight double-sided Feynman diagrams corresponding to electronic resonance and emission at
$-\mathbf{k}_1 + \mathbf{k}_2 + \mathbf{k}_3$. Absorption is shown by an incoming arrow, while emission is indicated by an outgoing arrow.
Note that if an arrow is moved from (to) the ket to (from) the bra while changing from (to) absorption to
(from) emission, the slope of the arrow and therefore its k-vector will be unchanged. The eight diagrams are
for arbitrary time ordering of the interactions; with a fixed time ordering of the interactions, as in the case of
non-overlapping pulses, only two of the diagrams survive (adapted from [48]).

A1.6.4.2 LINEAR RESPONSE

We now proceed to the spectrum, or frequency-dependent response [ 41 , 42]. The power, or rate of energy 
absorption, is given by 




(A 1.6.105) 
(In the second step we have used equation (A1.6.72) and noted that the terms involving $\partial\psi/\partial t$ cancel.) To
lowest order, this gives

$$\mathcal{P} = -2\,\mathrm{Re}\{P^{(01)}\dot{E}^{*}\}. \tag{A1.6.106}$$

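The total energy absorbed, $\Delta E$, is the integral of the power over time. Keeping just the lowest order terms we
find

$$\Delta E = -2\,\mathrm{Re}\int_{-\infty}^{\infty} P^{(01)}(t)\,\frac{\partial E^{*}(t)}{\partial t}\,dt = 2\,\mathrm{Im}\int_{-\infty}^{\infty} \omega\,P^{(01)}(\omega)\,E^{*}(\omega)\,d\omega \tag{A1.6.107}$$

where

$$P^{(01)}(\omega) = \int_{-\infty}^{\infty} P^{(01)}(t)\,e^{i\omega t}\,dt \tag{A1.6.108}$$

and

$$E(\omega) = \int_{-\infty}^{\infty} E(t)\,e^{i\omega t}\,dt. \tag{A1.6.109}$$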


The last relation in equation (A1.6.107) follows from the Fourier convolution theorem and the property of the
Fourier transform of a derivative; we have also assumed that $E(\omega) = E(-\omega)$. The absorption spectrum is
defined as the total energy absorbed at frequency $\omega$, normalized by the energy of the incident field at that
frequency. Identifying the integrand on the right-hand side of equation (A1.6.107) with the total energy
absorbed at frequency $\omega$, we have

aM= l^ = ^!^£». (A ,.6,10, 

| £(*>) | 2 3f7i \E(&)\ 2 

Note the presence of the $\omega$ prefactor in the absorption spectrum, as in equation (A1.6.87); again its origin is
essentially the faster rate of the change of the phase of higher frequency light, which in turn is related to a
higher rate of energy absorption. The equivalence between the other factors in equation (A1.6.110) and
equation (A1.6.87) under linear response will now be established.

In the perturbative regime one may decompose these coherences into the contribution from the field and a part
which is intrinsic to the matter, the response function. For example, note that the expression $P^{(01)}(t) = \langle\psi^{(0)}(t)|\mu|\psi^{(1)}(t)\rangle$
is not simply an intrinsic function of the molecule: it depends on the functional form of the field,
since $\psi^{(1)}(t)$ does. However, since the dependence on the field is linear it is possible to write $P^{(01)}$ as a
convolution of the field with a response function which depends on the material. Using the definition of $\psi^{(1)}$,




e- i ^ ( '-' ,, {-^ira')k" i£ ' n '> ,0 Mf' (a 1.6.111) 
we find that


I. f* 
p^(t) = — j ^ (W (f)|//e-' J * (r - ,r) (-/i£(f')Je- £ »' i |^ (0> }df' (A 1.6.112) 


Wo 


X 

V 


(^^(Ol^C-^'/il^* 01 )^^ - *"><** (A 1.6.113) 

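$$= \frac{i}{\hbar}\{E(t)\otimes S_{00}(t)\} \tag{A1.6.114}$$

where $S_{00}(t)$ is the half or causal form of the autocorrelation function:

$$S_{00}(t) = \begin{cases} C_{00}(t) & t > 0 \qquad \text{(A1.6.115)} \\ 0 & t < 0 \qquad \text{(A1.6.116)} \end{cases}$$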

and $\otimes$ signifies convolution. We have defined the wavepacket autocorrelation function

$$C_{00}(t) \equiv \langle\psi^{(0)}|\mu\,e^{-iHt/\hbar}\,\mu|\psi^{(0)}\rangle, \tag{A1.6.117}$$

where $C_{00}(t)$ is just the wavepacket autocorrelation function we encountered in section A1.6.3.3. There we
saw that the Fourier transform of $C_{00}(t)$ is proportional to the linear absorption spectrum. The same result
appears here but with a different interpretation. There, the correlation function governed the rate of excited-state
population change. Here, the expectation value of the dipole moment operator with the correlation
function is viewed as the response function of the molecule.
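
The statement that the Fourier transform of the autocorrelation function gives the absorption lineshape can be illustrated with a model that is not part of the discussion above: a single harmonic mode displaced between the ground and excited surfaces. For that model the zero-temperature autocorrelation function has the standard closed form $\exp[-S(1 - e^{-i\omega_v t})]$, where the Huang-Rhys factor `S`, the vibrational frequency `omega_v`, and the phenomenological damping `gamma` are all assumptions of this sketch, and the electronic origin and zero-point energy have been dropped. Frequency prefactors are omitted, so only the lineshape is computed.

```python
import numpy as np


def autocorrelation(t, S=1.0, omega_v=1.0, gamma=0.02):
    """Model C(t) for a displaced harmonic oscillator, with exponential damping."""
    return np.exp(-S * (1.0 - np.exp(-1j * omega_v * t))) * np.exp(-gamma * np.abs(t))


def lineshape(omega, S=1.0, omega_v=1.0, gamma=0.02, t_max=600.0, dt=0.01):
    """2 Re of the half Fourier transform of C(t), using the trapezoid rule.

    Because C(-t) = C(t)*, this equals the full transform of C(t) over all times.
    """
    t = np.arange(0.0, t_max + dt, dt)
    c = autocorrelation(t, S, omega_v, gamma)
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    integrand = c[None, :] * np.exp(1j * omega[:, None] * t[None, :])
    integral = dt * (integrand.sum(axis=1) - 0.5 * (integrand[:, 0] + integrand[:, -1]))
    return 2.0 * integral.real


if __name__ == "__main__":
    peaks = lineshape([0.0, 1.0, 2.0, 3.0])
    print(peaks / peaks[0])  # relative heights of the vibronic lines
```

With weak damping the peaks sit at multiples of `omega_v` and their relative heights follow the Poisson distribution $S^n/n!$, the familiar Franck-Condon progression, obtained here entirely from time-domain information.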

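By the Fourier convolution theorem

$$P^{(01)}(\omega) = \int_{-\infty}^{\infty} P^{(01)}(t)\,e^{i\omega t}\,dt = \frac{i}{\hbar}E(\omega)\,S_{00}(\omega). \tag{A1.6.118}$$

Using the definition of the susceptibility, $\chi$ (equation (A1.6.30)), we see that

$$\chi^{(1)}(\omega) = \frac{i}{\hbar}S_{00}(\omega). \tag{A1.6.119}$$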

Substituting $P^{(01)}(\omega)$ into equation (A1.6.110) we find that the linear absorption spectrum is given by

$$\sigma(\omega) = \frac{4\pi\omega}{3\hbar c}\,\mathrm{Re}\{S_{00}(\omega)\} \tag{A1.6.120}$$

$$= \frac{2\pi\omega}{3\hbar c}\int_{-\infty}^{\infty} C_{00}(t)\,e^{i\omega t}\,dt \tag{A1.6.121}$$

in agreement with equation (A1.6.87). We also find that


fTito) = ^Im ix {l H*>)} (A 1.6.122) 

3cft 

establishing the result in section A1.6.1.4 that the absorption spectrum is related to the imaginary part of the
susceptibility $\chi$ at frequency $\omega$.

A1.6.4.3 NONLINEAR RESPONSE: ISOLATED SYSTEMS 

As discussed above, the nonlinear material response, $P^{(3)}(t)$, is the most commonly encountered nonlinear term
since $P^{(2)}$ vanishes in an isotropic medium. Because of the special importance of $P^{(3)}$ we will discuss it in
some detail. We will now focus on a few examples of $P^{(3)}$ spectroscopy where just one or two of the 48
double-sided Feynman diagrams are important, and will stress the dynamical interpretation of the signal. A
pictorial interpretation of all the different resonant diagrams in terms of wavepacket dynamics is given in [41].

(A) COHERENT ANTI-STOKES RAMAN SPECTROSCOPY (CARS)

Our first example of a $P^{(3)}$ signal is coherent anti-Stokes Raman spectroscopy, or CARS. Formally, the
emission signal into direction $\mathbf{k} = \mathbf{k}_1 - \mathbf{k}_2 + \mathbf{k}_3$ has 48 Feynman diagrams that contribute. However, if the
frequency $\omega_1$ is resonant with the electronic transition from the ground to the excited electronic state, and the
mismatch between frequencies $\omega_1$ and $\omega_2$ is resonant with a ground-state vibrational transition or transitions,
only one diagram is resonant, namely, the one corresponding to $R_6$ in figure A1.6.19 (with the interchange of
labels $\mathbf{k}_1$ and $\mathbf{k}_2$).

To arrive at a dynamical interpretation of this diagram it is instructive to write the formula for the dominant
term in $P^{(3)}$ explicitly:

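$$\begin{aligned}
\langle\psi^{(0)}(t)|\mu|\psi^{(3)}(t)\rangle
&= \left(\frac{1}{i\hbar}\right)^{3}\int_{-\infty}^{t}dt_3\int_{-\infty}^{t_3}dt_2\int_{-\infty}^{t_2}dt_1\,\langle\psi^{(0)}(t)|\mu\,e^{-iH_b(t-t_3)/\hbar}\{-\mu E_3(t_3)\}\,e^{-iH_a(t_3-t_2)/\hbar} \qquad \text{(A1.6.123)} \\
&\quad\times\{-\mu E_2(t_2)\}\,e^{-iH_b(t_2-t_1)/\hbar}\{-\mu E_1(t_1)\}\,e^{-iH_a t_1/\hbar}|\psi^{(0)}(0)\rangle \qquad \text{(A1.6.124)}
\end{aligned}$$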

where in the second line we have substituted explicitly for the third-order wavefunction, $\psi^{(3)}(t)$. This formula,
although slightly longer than the formulae for the first- and second-order amplitude discussed in the previous
section, has the same type of simple dynamical interpretation. The initial wavepacket, $\psi^{(0)}$, interacts with the
field at time $t_1$ and propagates on surface b for time $t_2 - t_1$; at time $t_2$ it interacts a second time with the field
and propagates on the ground surface a for time $t_3 - t_2$; at time $t_3$ it interacts a third time with the field and
propagates on surface b until variable time $t$. The third-order wavepacket on surface b is projected onto the
initial wavepacket on the ground state; this overlap
is a measure of the coherence which determines both the magnitude and phase of the CARS signal. Formally,
the expression involves an integral over three time variables, reflecting the coherent contribution of all
possible instants at which the interaction with the light took place, for each of the three interactions. However,
if the interaction is with pulses that are short compared with a vibrational period, as we saw in equation
(A1.6.76), one can approximate the pulses by $\delta$-functions in time, eliminate the three integrals and the simple
dynamical interpretation above becomes precise.


Qualitatively, the delay between interaction 1 and 2 is a probe of excited-state dynamics, while the delay 
between interaction 2 and 3 reflects ground-state dynamics. If pulses 1 and 2 are coincident, the combination 
of the first two pulses prepares a vibrationally excited wavepacket on the ground-state potential energy 
surface; the time delay between pulses 2 and 3 then determines the time interval for which the wavepacket 
evolves on the ground-state potential, and is thus a probe of ground-state dynamics [43, 45, 52]. If a second 
delay, the delay between pulses 1 and 2, is introduced this allows large wavepacket excursions on the excited 
state before coming back to the ground state. The delay between pulses 1 and 2 can be used in a very precise 
way to tune the level of ground-state vibrational excitation, and can prepare ground vibrational wavepackets 
with extremely high energy content [44]. The sequences of pulses involving one versus two time delays are
shown in figure A1.6.20(a) and figure A1.6.20(b). The control over the vibrational energy content in the
ground electronic state via the delay between pulses 1 and 2 is illustrated in figure A1.6.20 (right).
Figure A1.6.20. (Left) Level scheme and nomenclature used in (a) single time-delay CARS, (b) two-time
delay CARS ((TD) CARS). The wavepacket is excited by $\omega$, then transferred back to the ground state by
$\omega_{st}$ with Raman shift $\omega_R$. Its evolution is then monitored by $\omega$ (after [44]). (Right) Relevant potential energy
surfaces for the iodine molecule. The creation of the wavepacket in the excited state is done by $\omega$. The
transfer to the final state is shown by the dashed arrows according to the state one wants to populate (after
[44]).



Figure A1.6.21. Bra and ket wavepacket dynamics which determine the coherence overlap, $\langle\varphi_1|\varphi_2\rangle$.
Vertical arrows mark the transitions between electronic states and horizontal arrows indicate free propagation 
on the potential surface. Full curves are used for the ket wavepacket, while dashed curves indicate the bra 
wavepacket. (a) Stimulated emission, (b) Excited state (transient) absorption (from [41]). 
Figure A1.6.22. (a) Sequence of pulses in the canonical echo experiment. (b) Polarization versus time for the
pulse sequence in (a), showing an echo at a time delay equal to the delay between the excitation pulses.

(B) STIMULATED RAMAN AND DYNAMIC ABSORPTION SPECTROSCOPY 

In CARS spectroscopy, $\omega_1 = \omega_3$, and $\omega_2$ is generally different and of lower frequency. If $\omega_1 = \omega_2 = \omega_3$ the
process is called degenerate four-wave mixing (DFWM). Now, instead of a single diagram dominating, two
diagrams participate if the pulses are non-overlapping, four dominate if two pulses overlap and all eight
resonant diagrams contribute if all three pulses overlap (e.g., in continuous wave excitation) [43, 46]. The
additional diagrams correspond to terms of the form $\langle\psi^{(1)}(t)|\mu|\psi^{(2)}(t)\rangle$ discussed above; this is the overlap of a
second-order wavepacket on the ground-state surface with a first-order wavepacket on the excited-state
surface. These new diagrams come in for two reasons. First, even if the pulses are non-overlapping, the
degeneracy of the first two interactions allows the second interaction to produce an absorption, not just
emission. If the pulses are overlapping there is the additional flexibility of interchanging the order of pulses 1
and 2 (at the same time exchanging their role in producing absorption versus emission). The contribution of
these additional diagrams to the $P^{(3)}$ signal is not simply additive, but there are interference terms among all
the contributions, considerably complicating the interpretation of the signal. Diagrams $R_1$-$R_4$ are commonly
referred to as stimulated Raman scattering: the first two interactions produce an excited-state population while
the last interaction produces stimulated emission back to the ground electronic state.

A process which is related diagrammatically to stimulated Raman scattering is transient absorption
spectroscopy. In an ordinary absorption spectrum, the initial state is typically the ground vibrational eigenstate
of the ground electronic state. Dynamic absorption spectroscopy refers to the excitation of a vibrational
wavepacket to an electronic state b via a first pulse, and then the measurement of the spectrum of that moving
wavepacket on a third electronic state c as a function of time delay between the pump and the probe. The time
delay controls the instantaneous wavepacket on state b whose spectrum is being measured with the second
pulse; in an ideal situation, one may obtain 'snapshots' of the wavepacket on electronic state b as a function of
time, by observing its shadow onto surface c. This form of spectroscopy is very similar in spirit to the pump-probe
experiments of Zewail et al [25], described in section A1.6.3.2, but there are two differences. First, the
signal in a dynamic absorption spectrum is a coherent signal in the direction of the probe
pulse (pulse 3), as opposed to measuring fluorescence from state c, which is non-directional. Second, the field
intensity in the direction of the probe pulse can be frequency resolved to give simultaneous time and
frequency resolution of the transient absorption. Although in principle the fluorescence from state c can also
be frequency resolved, this fluorescence takes place over a time which is orders of magnitude longer than the
vibrational dynamics of interest and the signal contains a complicated combination of all excited- and ground-state
frequency differences.

The dynamic absorption signal, $P^{(3)}$, can be written in a form which looks analogous to the linear absorption
signal $P^{(1)}$ (see equation (A1.6.113)),


P L V) = - / ^ (h (/')|/*e" ,ft - (| - nffl ^|^r fn (0}£{/ i )d^ (A 1.6.125) 

ft J-^ 

However, because of the $t'$ dependence in $\psi^{(1)}(t')$ one cannot write that $P^{(3)} = E(t) \otimes S_{11}(t)$. For the latter to
hold, it is necessary to go to the limit of a probe pulse which is short compared with the dynamics on surface
1. In this case, $\psi^{(1)}(t')$ is essentially frozen and we can write $\psi^{(1)}(t') \approx \psi^{(1)}_{\tau}$, where we have indicated explicitly
the parametric dependence on the pump-probe delay time, $\tau$. In this case, equation (A1.6.125) is isomorphic
with equation (A1.6.113), indicating that under conditions of impulsive excitation, dynamic absorption
spectroscopy is just first-order spectroscopy on the frozen state, $\psi^{(1)}_{\tau}$, on surface c. Note the residual
dependence of the frozen state on $\tau$, the pump-probe delay, and thus variation of the variables ($\omega$, $\tau$) generates
a two-dimensional dynamic absorption spectrum. Note that the pair of variables ($\omega$, $\tau$) are not limited by some
form of time-energy uncertainty principle. This is because, although the absorption is finished when the probe
pulse is finished, the spectral analysis of which frequency components were absorbed depends on the full time
evolution of the system, beyond its interaction with the probe pulse. Thus, the dynamic absorption signal can
give high resolution both in time (i.e. time delay between pump and probe pulses) and frequency,
simultaneously.

A1.6.4.4 NONLINEAR RESPONSE: SYSTEMS COUPLED TO AN ENVIRONMENT

(A) ECHO SPECTROSCOPY 

In discussing spectroscopy in condensed phase environments, one normally distinguishes two sources of
decay of the coherence: inhomogeneous decay, which represents static differences in the environment of
different molecules, and homogeneous decay, which represents the dynamic interaction with the
surroundings and is the same for all molecules. Both these sources of decay contribute to the linewidth of
spectral lines; in many cases the inhomogeneous decay is faster than the homogeneous decay, masking the
latter. In echo spectroscopies, which are related to a particular subset of diagrams in $P^{(3)}$, one can at least
partially discriminate between homogeneous and inhomogeneous decay.

Historically, photon echoes grew up as optical analogues of spin echoes in NMR. Thus, the earliest photon
echo experiments were based on a sequence of two excitation pulses, a $\pi/2$ pulse followed by a $\pi$ pulse,
analogous to the pulse sequence used in NMR. Conceptually, the $\pi/2$ pulse prepares an optical coherence,
which will proceed to dephase due to both homogeneous and inhomogeneous mechanisms. After a delay time
$\tau$, the $\pi$ pulse reverses the role of the excited and ground electronic states, which causes the inhomogeneous
contribution to the dephasing to reverse itself but does not affect the homogeneous decay. The reversal of
phases generated by the $\pi$ pulse has been described in many colourful ways over the years (see figure
A1.6.23).
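
The rephasing argument can be tried out on a toy ensemble. The sketch below is an illustration under assumptions that go beyond the text: each molecule is a two-level system with a static detuning drawn from a Gaussian distribution (the inhomogeneous part), all molecules share one exponential decay time `T2` (the homogeneous part), and an ideal, instantaneous $\pi$ pulse at time `tau` is modelled as complex conjugation of the accumulated phase. The function name `ensemble_coherence` is hypothetical.

```python
import numpy as np


def ensemble_coherence(t, detunings, tau=None, T2=np.inf):
    """Magnitude of the ensemble-averaged coherence at times t.

    Without a pi pulse (tau=None) each member accumulates the phase exp(-i*delta*t).
    With a pi pulse at time tau the phase accumulated so far is conjugated, so for
    t >= tau the net phase is exp(-i*delta*(t - 2*tau)).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    if tau is None:
        phase_time = t
    else:
        phase_time = np.where(t < tau, t, t - 2.0 * tau)
    phases = np.exp(-1j * np.outer(phase_time, detunings))
    return np.abs(phases.mean(axis=1)) * np.exp(-t / T2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    detunings = rng.normal(0.0, 1.0, 20000)  # static (inhomogeneous) frequency offsets
    times = np.linspace(0.0, 12.0, 7)
    print(ensemble_coherence(times, detunings, tau=None, T2=50.0))  # free decay
    print(ensemble_coherence(times, detunings, tau=5.0, T2=50.0))   # echo near t = 10
```

At `t = 2*tau` the static phases cancel exactly for every member of the ensemble, so the echo amplitude there is set by the homogeneous factor `exp(-2*tau/T2)` alone, which is how the echo separates the two contributions.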
Figure A1.6.23. Schematic representation of dephasing and reversal on a race track, leading to coherent
rephasing and an 'echo' of the starting configuration. From Phys. Today, (Nov. 1953), front cover.

Fundamentally, the above description of photon echoes is based on a two-level description of the system. As
we have seen throughout this article, much of molecular electronic spectroscopy is described using two
electronic states, albeit with a vibrational manifold in each of these electronic states. This suggests that photon
echoes can be generalized to include these vibrational manifolds, provided that the echo signal is now defined
in terms of a wavepacket overlap (or density matrix coherence) involving the coherent superposition of all the
participating vibrational levels. This is shown schematically in figure A1.6.24. The $\pi/2$ pulse transfers 50% of
the wavepacket amplitude to the excited electronic state. This creates a non-stationary vibrational wavepacket
in the excited electronic state (and generally, the remaining amplitude in the ground electronic state is non-stationary
as well). After a time delay $\tau$ a $\pi$ pulse comes in, exchanging the wavepackets on the ground and
excited electronic states. The wavepackets continue to evolve on their new respective surfaces. At some later
time, when the wavepackets overlap, an echo will be observed. This sequence is shown in figure A1.6.24.
Note that this description refers only to the isolated molecule; if there are dephasing mechanisms due to the
environment as well, the echo requires the rephasing in both the intramolecular and the environmental degrees
of freedom.

Figure A1.6.24. Schematic representation of a photon echo in an isolated, multilevel molecule. (a) The initial
pulse prepares a superposition of ground- and excited-state amplitude. (b) The subsequent motion on the
ground and excited electronic states. The ground-state amplitude is shown as stationary (which in general it
will not be for strong pulses), while the excited-state amplitude is non-stationary. (c) The second pulse
exchanges ground- and excited-state amplitude. (d) Subsequent evolution of the wavepackets on the ground
and excited electronic states. When they overlap, an echo occurs (after [40]).

Although the early photon echo experiments were cast in terms of $\pi/2$ and $\pi$ pulses, these precise inversions
of the population are by no means necessary [36]. In fact echoes can be observed using sequences of weak
pulses, and can be described within the perturbative $P^{(3)}$ formalism which we have used throughout section
A1.6.4. Specifically, the diagrams $R_1$, $R_4$, $R_5$ and $R_8$ in figure A1.6.19 correspond to echo diagrams, while the
diagrams $R_2$, $R_3$, $R_6$ and $R_7$ do not. In the widely used Brownian oscillator model for the relaxation of the
system [37, 48], the central dynamical object is the electronic frequency correlation,


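$$M(t) = \frac{\langle \Delta\omega(0)\,\Delta\omega(t) \rangle}{\langle \Delta\omega^2 \rangle} \qquad \text{(A1.6.126)}$$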


where $\Delta\omega(t) = \langle\omega\rangle - \omega(t)$. Here $\langle\omega\rangle$ is the average transition frequency, $\omega(t)$ is the transition frequency at
time t, and the brackets denote an ensemble average. It can be shown that as long as M(t) is a monotonically
decaying function, the diagrams $R_1$, $R_4$, $R_5$ and $R_8$ can cause rephasing of $P^{(3)}$ while the diagrams $R_2$, $R_3$, $R_6$
and $R_7$ cannot (see figure A1.6.25).

Figure A1.6.25. Modulus squared of the rephasing, $|R_1|^2$, (a), and non-rephasing, $|R_2|^2$, (b), response
functions versus final time t for a near-critically overdamped Brownian oscillator model M(t). The time delay
between the second and third pulse, T, is varied as follows: (a) from top to bottom, T = 0, 20, 40, 60, 80, 100,
∞ fs; (b) from bottom to top, T = 0, 20, 40, 60, 80, 100, ∞ fs. Note that $|R_1|^2$ and $|R_2|^2$ are identical at T = ∞.
After [48].
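
As an illustration of the quantity M(t), the sketch below estimates a normalized frequency correlation function from a sampled trajectory of transition frequencies, assuming the normalized form $M(t) = \langle\Delta\omega(0)\Delta\omega(t)\rangle / \langle\Delta\omega^2\rangle$. Two further assumptions are made that are not in the text: a time average replaces the ensemble average, and the test trajectory is synthetic, exponentially correlated noise. The name `frequency_correlation` is hypothetical.

```python
import numpy as np


def frequency_correlation(omega, max_lag):
    """Normalized correlation M(t) of transition-frequency fluctuations.

    omega   : samples of the transition frequency on a uniform time grid
    max_lag : number of lags (in grid steps) to evaluate
    A time average stands in for the ensemble average (ergodicity assumed).
    """
    omega = np.asarray(omega, dtype=float)
    delta = omega.mean() - omega            # Delta omega(t) = <omega> - omega(t)
    variance = np.mean(delta * delta)
    n = len(delta)
    corr = [np.mean(delta[: n - lag] * delta[lag:]) for lag in range(max_lag)]
    return np.array(corr) / variance


# Synthetic trajectory: exponentially correlated (Ornstein-Uhlenbeck) noise,
# an assumed stand-in for real frequency fluctuations.
rng = np.random.default_rng(1)
n_steps, tau_c = 50_000, 20.0               # correlation time in grid steps
a = np.exp(-1.0 / tau_c)
noise = rng.normal(size=n_steps)
omega = np.empty(n_steps)
omega[0] = noise[0]
for k in range(1, n_steps):
    omega[k] = a * omega[k - 1] + np.sqrt(1.0 - a * a) * noise[k]

M = frequency_correlation(omega, max_lag=60)
```

For this synthetic input M starts at 1 and decays roughly as $\exp(-t/\tau_c)$, which is an example of the monotonically decaying behaviour for which the rephasing statement above is made.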

It is instructive to contrast echo spectroscopy with single time-delayed CARS spectroscopy, discussed above. 
Schematically, TD-CARS spectroscopy involves the interaction between pulses 1 and 2 being close in time, 
creating a ground-state coherence, and then varying the delay before interaction 3 to study ground-state 
dynamics. In contrast, echo spectroscopy involves an isolated interaction 1 creating an electronic coherence 
between the ground and the excited electronic state, followed by a pair of interactions 2 and 3, one of which 
operates on the bra and the other on the ket. The pair of interactions 2,3 essentially reverses the role of the 
ground and the excited electronic states. If there is any inhomogeneous broadening, or more generally any 
bath motions that are slow compared with the time intervals between the pulses, these modes will show up as 
echo signal after the third pulse is turned off [47]. 

We close with three comments. First, there is preliminary work on retrieving not only the amplitude but also 
the phase of photon echoes [49]. This appears to be a promising avenue to acquire complete 2-dimensional 
time and frequency information on the dynamics, analogous to methods that have been used in NMR. Second, 
we note that there is a growing literature on non-perturbative, numerical simulation of nonlinear 
spectroscopies. In these methods, the consistency of the order of interaction with the field and the appropriate 
relaxation process is achieved automatically, and thus these methods may become a useful alternative to the
perturbative formalism [50, 51]. Third, there is a growing field of single molecule spectroscopy. If the optical
response from individual molecules in a
condensed phase environment is detected, then one has a more direct approach than echo spectroscopy for 
removing the effect of environmental inhomogeneity. Moreover, the spectral change of individual molecules 
can be followed in time, giving data that are masked in even the best echo spectrum. 


A1.6.5 COHERENT CONTROL OF MOLECULAR DYNAMICS 

Not only has there been great progress in making femtosecond pulses in recent years, but also progress has 
been made in the shaping of these pulses, that is, giving each component frequency any desired amplitude and 
phase. Given the great experimental progress in shaping and sequencing femtosecond pulses, the inevitable
question is: how can one take advantage of this wide range of possible coherent excitations to bring
about selective and energetically efficient photochemical reactions? Many intuitive approaches to laser 
selective chemistry have been tried since 1980. Most of these approaches have focused on depositing energy 
in a sustained manner, using monochromatic radiation, into a particular state or mode of the molecule. 
Virtually all such schemes have failed, due to rapid intramolecular energy redistribution. 

The design of pulse sequences to selectively control chemical bond breaking is naturally formulated as a 
problem in the calculus of variations [17, 52]. This is the mathematical apparatus for finding the best shape, 
subject to certain constraints. Examples include the shape which encloses the maximum area for a given perimeter;
the minimum distance between two points on a sphere subject to the constraint that the connecting path be on 
the sphere; the shape of a cable of fixed length and fixed endpoints which minimizes the potential energy; the 
trajectory of least time; the path of least action; all these are searches for the best shape, and are problems in 
the classical calculus of variations. In our case, we are searching for the best shape of laser pulse intensity 
against time. If we admit complex pulses this involves an optimization over the real and imaginary parts of the 
pulse shape. We may be interested in the optimal pulse subject to some constraints, for example for a fixed 
total energy in the pulse. 

It turns out that there is another branch of mathematics, closely related to the calculus of variations, although 
historically the two fields grew up somewhat separately, known as optimal control theory (OCT). Although 
the boundary between these two fields is somewhat blurred, in practice one may view optimal control theory 
as the application of the calculus of variations to problems with differential equation constraints. OCT is used 
in chemical, electrical, and aeronautical engineering, where the differential equation constraints may be
chemical kinetic equations, electrical circuit equations, the Navier-Stokes equations for air flow, or Newton's 
equations. In our case, the differential equation constraint is the TDSE in the presence of the control, which is 
the electric field interacting with the dipole (permanent or transition dipole moment) of the molecule [53, 54, 
55 and 56]. From the point of view of control theory, this application presents many new features relative to 
conventional applications; perhaps most interesting mathematically is the admission of a complex state 
variable and a complex control; conceptually, the application of control techniques to steer the microscopic 
equations of motion is both a novel and potentially very important new direction. 

A very exciting approach adopted more recently involves letting the laser learn to design its own optimal
pulse shape in the laboratory [59, 60, 61, 62 and 63]. This is achieved by having a feedback loop, such that the
increase or decrease in yield from making a change in the pulse is fed back to the pulse shaper, guiding the
design of the next trial pulse. A particular implementation of this approach is the 'genetic algorithm', in which
a large set of initial pulses is generated; those giving the highest yield are used as 'parents' to produce a new
'generation' of pulses, by allowing segments of the parent pulses to combine in random new combinations.
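
The feedback loop can be sketched in a few lines. The example below is a toy model rather than a description of any laboratory implementation: each trial pulse is represented by an array of spectral phases, and the measured yield is replaced by a hypothetical function `toy_yield` that is largest when all phases agree. Keeping the best pulse unchanged and adding a small random mutation to each child are common additions assumed here; the text mentions only recombination of parent segments.

```python
import numpy as np


def toy_yield(phases):
    """Hypothetical stand-in for a measured yield: largest when all phases agree."""
    return float(np.abs(np.exp(1j * phases).sum()) ** 2 / len(phases) ** 2)


def evolve_pulses(yield_fn, n_pixels=16, pop_size=40, n_parents=10,
                  n_generations=30, mutation=0.1, seed=0):
    """Feedback loop: score trial pulses, keep the best, recombine their segments."""
    rng = np.random.default_rng(seed)
    population = rng.uniform(0.0, 2.0 * np.pi, size=(pop_size, n_pixels))
    best_per_generation = []
    for _ in range(n_generations):
        scores = np.array([yield_fn(p) for p in population])
        ranking = np.argsort(scores)[::-1]
        parents = population[ranking[:n_parents]]
        best_per_generation.append(scores[ranking[0]])
        children = [parents[0].copy()]          # keep the best pulse unchanged
        while len(children) < pop_size:
            i, j = rng.choice(n_parents, size=2, replace=False)
            cut = rng.integers(1, n_pixels)     # segment boundary
            child = np.concatenate([parents[i][:cut], parents[j][cut:]])
            child += mutation * rng.normal(size=n_pixels)   # small random change
            children.append(child)
        population = np.array(children)
    return population[0], np.array(best_per_generation)


best_pulse, history = evolve_pulses(toy_yield)
```

Because the best pulse of each generation is carried over unchanged, the recorded best yield cannot decrease from one generation to the next; in an experiment, `yield_fn` would be replaced by the measured signal.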


The various approaches to laser control of chemical reactions have been discussed in detail in several recent 
reviews [64, 65].

A1.6.5.1 INTUITIVE CONTROL CONCEPTS 

Consider the ground electronic state potential energy surface in figure A1.6.26. This potential energy surface,
corresponding to collinear ABC, has a region of stable ABC and two exit channels, one corresponding to A +
BC and one to AB + C. This system is the simplest paradigm for control of chemical product formation: a two
degree of freedom system is the minimum that can display two distinct chemical products. The objective is,
starting out in a well-defined initial state (v = 0 for the ABC molecule), to design an electric field as a function
of time which will steer the wavepacket out of channel 1, with no amplitude going out of channel 2, and vice
versa [19, 52].



Figure A1.6.26. Stereoscopic view of ground- and excited-state potential energy surfaces for a model
collinear ABC system with the masses of HHD. The ground-state surface has a minimum, corresponding to
the stable ABC molecule. This minimum is separated by saddle points from two distinct exit channels, one
leading to AB + C, the other to A + BC. The object is to use optical excitation and stimulated emission
between the two surfaces to 'steer' the wavepacket selectively out of one of the exit channels (reprinted from
[54]).

We introduce a single excited electronic state surface at this point. The motivation is severalfold. (i)
Transition dipole moments are generally much stronger than permanent dipole moments. (ii) The difference in
functional form of the excited and ground potential energy surface will be our dynamical kernel; with a single
surface one must make use of the (generally weak) coordinate dependence of the dipole. Moreover, the use of
excited electronic states facilitates large changes in force on the molecule, effectively instantaneously, without
necessarily using strong fields. (iii) The technology for amplitude and phase control of optical pulses is
significantly ahead of the corresponding technology in the infrared.

The object now will be to steer the wavefunction out of a specific exit channel on the ground electronic state,
using the excited electronic state as an intermediate. Insofar as the control is achieved by transferring
amplitude between two electronic states, all the concepts regarding the central quantity introduced above
will now come into play.


(A) PUMP-DUMP SCHEME 


Consider the following intuitive scheme, in which the timing between a pair of pulses is used to control the 
identity of products [52]. The scheme is based on the close correspondence between the centre of a 
wavepacket in time and that of a classical trajectory (Ehrenfest's theorem). The first pulse produces an excited 
electronic state wavepacket. The time delay between the pulses controls the time that the wavepacket evolves 
on the excited electronic state. The second pulse stimulates emission. By the Franck-Condon principle, the 
second step prepares a wavepacket on the ground electronic state with the same position and momentum, 
instantaneously, as the excited-state wavepacket. By controlling the position and momentum of the 
wavepacket produced on the ground state through the second step, one can gain some measure of control over 
product formation on the ground state. This 'pump-dump' scheme is illustrated classically in figure A1.6.27.
The trajectory originates at the ground-state surface minimum (the equilibrium geometry). At t = 0 it is
promoted to the excited-state potential surface (a two-dimensional harmonic oscillator in this model) where it
originates at the Condon point, that is, vertically above the ground-state minimum. Since this position is
displaced from equilibrium on the excited state, the trajectory begins to evolve, executing a two-dimensional
Lissajous motion. After some time delay, the trajectory is brought down vertically to the ground state
(keeping both the instantaneous position and momentum it had on the excited state) and allowed to continue
to evolve on the ground state. Figure A1.6.27 shows that for one choice of time delay it will exit into channel
1, while for a second choice of time delay it will exit into channel 2. Note how the position and momentum of the
trajectory on the ground state, immediately after it comes down from the excited state, are both consistent
with the values it had when it left the excited state, and at the same time are ideally suited for exiting through
their respective channels.

Figure A1.6.27. Equipotential contour plots of (a) the excited- and (b), (c) ground-state potential energy
surfaces. (Here a harmonic excited state is used because that is the way the first calculations were performed.)
(a) The classical trajectory that originates from rest on the ground-state surface makes a vertical transition to
the excited state, and subsequently undergoes Lissajous motion, which is shown superimposed. (b) Assuming
a vertical transition down at time $t_1$ (position and momentum conserved) the trajectory continues to evolve on
the ground-state surface and exits from channel 1. (c) If the transition down is at time $t_2$ the classical
trajectory exits from channel 2 (reprinted from [52]).
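
The excited-state part of this classical picture has a closed form when the excited surface is a separable two-dimensional harmonic oscillator. The sketch below evaluates the position and momentum that a vertical 'dump' at a given delay would hand to the ground-state surface. The geometry, frequencies and function names are hypothetical; only the two delay values are borrowed from the wavepacket calculations discussed in the text, and the ground-state propagation is not modelled because the ground-state potential is not specified here.

```python
import numpy as np


def excited_state_trajectory(t, q_condon, q_min, omega, mass=1.0):
    """Classical (q, p) at time t on a separable 2D harmonic excited-state surface.

    The trajectory starts at rest at the Condon point q_condon; q_min is the
    excited-state minimum and omega holds the two harmonic frequencies.
    """
    q_condon, q_min, omega = (np.asarray(v, dtype=float) for v in (q_condon, q_min, omega))
    displacement = q_condon - q_min
    q = q_min + displacement * np.cos(omega * t)
    p = -mass * omega * displacement * np.sin(omega * t)
    return q, p


def excited_state_energy(q, p, q_min, omega, mass=1.0):
    """Total energy on the harmonic excited-state surface (conserved between pulses)."""
    q_min, omega = np.asarray(q_min, dtype=float), np.asarray(omega, dtype=float)
    return float(np.sum(p ** 2 / (2.0 * mass) + 0.5 * mass * omega ** 2 * (q - q_min) ** 2))


# Hypothetical parameters (atomic units); unequal frequencies give Lissajous motion.
q_condon, q_min, omega = [1.0, 1.0], [1.4, 1.2], [0.010, 0.017]
for delay in (610.0, 810.0):
    q_dump, p_dump = excited_state_trajectory(delay, q_condon, q_min, omega)
    # A vertical "dump" hands (q_dump, p_dump) unchanged to the ground-state surface.
    print(delay, q_dump, p_dump)
```

Different delays give different pairs (q, p) at the moment of the downward transition, which is the only control knob in this scheme.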


A full quantum mechanical calculation based on these classical ideas is shown in figure A1.6.28 and figure
A1.6.29 [19]. The dynamics of the two-electronic-state model was solved, starting in the lowest vibrational
eigenstate of the ground electronic state, in the presence of a pair of femtosecond pulses that couple the states. 
Because the pulses were taken to be much shorter than a vibrational period, the effect of the pulses is to 
prepare a wavepacket on the excited/ground state which is almost an exact replica of the instantaneous 
wavefunction on the other surface. Thus, the first pulse prepares an initial wavepacket which is almost a 
perfect Gaussian, and which begins to evolve on the excited-state surface. The second pulse transfers the 
instantaneous wavepacket at the arrival time of the pulse back to the ground state, where it continues to evolve 
on the ground-state surface, given its position and momentum at the time of arrival from the excited state. For 
one choice of time delay the exit out of channel 1 is almost completely selective (figure A1.6.28), while for a
second choice of time delay the exit out of channel 2 is almost completely selective (figure A1.6.29). Note the close
correspondence with the classical model: the wavepacket on the excited state is executing a Lissajous motion
almost identical with that of the classical trajectory (the wavepacket is a nearly Gaussian wavepacket on a
two-dimensional harmonic oscillator). On the ground state, the wavepacket becomes spatially extended but its
exit channel, as well as the partitioning of energy into translation and vibration (i.e. parallel and perpendicular 
to the exit direction) are seen to be in close agreement with the corresponding classical trajectory. 

Figure A1.6.28. Magnitude of the excited-state wavefunction for a pulse sequence of two Gaussians with
time delay of 610 a.u. = 15 fs. (a) t = 200 a.u., (b) t = 400 a.u., (c) t = 600 a.u. Note the close correspondence
with the results obtained for the classical trajectory (figure A1.6.27(a) and (b)). Magnitude of the ground-state
wavefunction for the same pulse sequence, at (a) t = 0, (b) t = 800 a.u., (c) t = 1000 a.u. Note the close
correspondence with the classical trajectory of figure A1.6.27(c). Although some of the amplitude remains in
the bound region, that which does exit does so exclusively from channel 1 (reprinted from [52]).

Figure A1.6.29. Magnitude of the ground- and excited-state wavefunctions for a sequence of two Gaussian
pulses with time delay of 810 a.u. (Upper diagram) Excited-state wavefunction at 800 a.u., before the second
pulse. (a) Ground-state wavefunction at 0 a.u. (b) Ground-state wavefunction at 1000 a.u. (c) Ground-state
wavefunction at 1200 a.u. That amplitude which does exit does so exclusively from channel 2. Note the close
correspondence with the classical trajectory of figure A1.6.27(c) (reprinted from [52]).

This scheme is significant for three reasons: (i) it shows that control is possible, (ii) it gives a starting point for 
the design of optimal pulse shapes, and (iii) it gives a framework for interpreting the action of two-pulse and
more complicated pulse sequences. Nevertheless, the approach is limited: in general with the best choice of 
time delay and central frequency of the pulses one may achieve only partial selectivity. Perhaps most 
importantly, this scheme does not exploit the phase of the light. Intuition breaks down for more complicated 
processes and classical pictures cannot adequately describe the role of the phase of the light and the 
wavefunction. Hence, attempts were made to develop a systematic procedure for improving an initial pulse 
sequence. 

Before turning to these more systematic procedures for designing shaped pulses, we point out an interesting
alternative perspective on pump-dump control. A central tenet of Feynman's approach to quantum mechanics
was to think of quantum interference as arising from multiple dynamical paths that lead to the same final state.
The simplest example of this interference involves an initial state, two intermediate states and a single final
state, although if the objective is to control some branching ratio at the final energy then at least two final
states are necessary. By controlling the phase with which each of the two intermediate states contributes to the
final state, one may control constructive versus destructive interference in the final states. This is the basis of
the Brumer-Shapiro approach to coherent control [57, 58]. It is interesting to note that pump-dump control
can be viewed entirely from this perspective. Now, however, instead of two intermediate states there are
many, corresponding to the vibrational levels of the excited electronic state (see figure A1.6.31). The control
of the phase which determines how each of these intermediate levels contributes to the final state is achieved
via the time delay between the excitation and the stimulated emission pulse. This 'interfering pathways'
interpretation of pump-dump control is shown in figure A1.6.30.

Figure A1.6.30. (a) Two-pulse sequence used in the Tannor-Rice pump-dump scheme. (b) The Husimi time-
frequency distribution corresponding to the two-pulse sequence in (a), constructed by taking the overlap of
the pulse sequence with a two-parameter family of Gaussians, characterized by different centres in time and
carrier frequency, and plotting the overlap as a function of these two parameters. Note that the Husimi
distribution allows one to visualize both the time delay and the frequency offset of pump and dump
simultaneously (after [52a]).
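
The construction described in the caption can be sketched directly: the field is overlapped with Gaussians of fixed width whose centre in time and carrier frequency are scanned, and the squared magnitude of the overlap is recorded. The pulse parameters, the window width and the function names below are hypothetical, and the field is taken in complex (analytic) form for simplicity.

```python
import numpy as np


def husimi(field, t, centres, frequencies, width):
    """|overlap|^2 of field(t) with Gaussians centred at time t0 and carrier w0.

    Returns an array of shape (len(centres), len(frequencies)).
    """
    t = np.asarray(t, dtype=float)
    dt = t[1] - t[0]
    t0 = np.asarray(centres, dtype=float)[:, None, None]
    w0 = np.asarray(frequencies, dtype=float)[None, :, None]
    window = np.exp(-(t - t0) ** 2 / (2.0 * width ** 2)) * np.exp(-1j * w0 * t)
    overlap = (np.conj(window) * field).sum(axis=-1) * dt
    return np.abs(overlap) ** 2


def gaussian_pulse(t, centre, carrier, width, amplitude=1.0):
    """Complex Gaussian pulse with the given centre, carrier frequency and width."""
    return amplitude * np.exp(-(t - centre) ** 2 / (2.0 * width ** 2)) * np.exp(-1j * carrier * t)


t = np.linspace(-20.0, 50.0, 1401)
# Hypothetical pump (t = 0, carrier 2.0) and weaker dump (t = 30, carrier 1.5).
field = gaussian_pulse(t, 0.0, 2.0, 3.0) + gaussian_pulse(t, 30.0, 1.5, 3.0, amplitude=0.8)
centres = np.linspace(-10.0, 40.0, 51)
frequencies = np.linspace(1.0, 2.5, 31)
H = husimi(field, t, centres, frequencies, width=3.0)
i, j = np.unravel_index(np.argmax(H), H.shape)   # location of the strongest feature
```

The resulting map shows two well separated features, one per pulse, so the time delay and the frequency offset between pump and dump can be read off from a single plot of `H`.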

Figure A1.6.31. Multiple pathway interference interpretation of pump-dump control. Since each of the pair
of pulses contains many frequency components, there are an infinite number of combination frequencies
which lead to the same final energy state, which generally interfere. The time delay between the pump and
dump pulses controls the relative phase among these pathways, and hence determines whether the interference
is constructive or destructive. The frequency domain interpretation highlights two important features of
coherent control. First, if final products are to be controlled there must be degeneracy in the dissociative
continuum. Second, a single interaction with the light, no matter how it is shaped, cannot produce control of
final products: at least two interactions with the field are needed to obtain interfering pathways.
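
The role of the time delay as a phase control can be made concrete with a minimal model, assuming that each intermediate level n contributes a fixed amplitude multiplied by the free-evolution phase $\exp(-iE_n\tau/\hbar)$ accumulated during the pump-dump delay $\tau$. The energies, amplitudes and the function name are illustrative assumptions.

```python
import numpy as np


def final_state_population(tau, energies, amplitudes, hbar=1.0):
    """Population of one final state reached through several intermediate levels.

    Each pathway n contributes amplitudes[n] * exp(-i * energies[n] * tau / hbar),
    where tau is the pump-dump delay.
    """
    tau = np.atleast_1d(np.asarray(tau, dtype=float))[:, None]
    energies = np.asarray(energies, dtype=float)[None, :]
    amplitudes = np.asarray(amplitudes, dtype=complex)[None, :]
    total = (amplitudes * np.exp(-1j * energies * tau / hbar)).sum(axis=1)
    return np.abs(total) ** 2


# Two equally weighted pathways separated by one energy unit (hypothetical values).
delays = np.array([0.0, np.pi, 2.0 * np.pi])
populations = final_state_population(delays, energies=[0.0, 1.0], amplitudes=[0.5, 0.5])
```

For two pathways the population oscillates between fully constructive and fully destructive interference with period $2\pi\hbar/\Delta E$ in the delay; with many intermediate levels the same sum describes the wavepacket recurrences exploited in the pump-dump scheme.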

A1.6.5.2 VARIATIONAL FORMULATION OF CONTROL OF PRODUCT FORMATION

The next step, therefore, is to address the question: how is it possible to take advantage of the many additional 
available parameters: pulse shaping, multiple pulse sequences, etc — in general an E(t) with arbitrary 
complexity — to maximize and perhaps obtain perfect selectivity? Posing the problem mathematically, one 
seeks to maximize 


$$J = \lim_{T\to\infty} \langle \psi(T) | P_\alpha | \psi(T) \rangle \qquad \text{(A1.6.127)}$$

where $P_\alpha$ is a projection operator for chemical channel $\alpha$ (here, $\alpha$ takes on two values, referring to
arrangement channels A + BC and AB + C; in general, in a triatomic molecule ABC, $\alpha$ takes on three values,
1, 2, 3, referring to arrangement channels A + BC, AB + C and AC + B). The time T is understood to be longer
than the duration of the pulse sequence, E(t); the yield, J, is defined as $T \to \infty$, that is, after the wavepacket
amplitude has time to reach its asymptotic arrangement. The key observation is that the quantity J is a
functional of E(t), that is, J is a function of a function, because $\psi(T)$ depends on the whole history of E(t). To
make this dependence on E(t) explicit we may write


$$J[E(t)] = \lim_{T\to\infty} \langle \psi[E(t)](T) | P_\alpha | \psi[E(t)](T) \rangle \qquad \text{(A1.6.128)}$$

where square brackets are used to indicate functional dependence. The problem of maximizing a function of a 
function has a rich history in mathematical physics, and falls into the class of problems belonging to the 
calculus of variations. 

In the OCT formulation, the TDSE written as a 2 x 2 matrix in a BO basis set, equation (A1.6.72), is
introduced into the objective functional with a Lagrange multiplier, $\chi(x, t)$ [54]. The modified objective
functional may now be written as


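$$\bar{J} = \lim_{T\to\infty} \langle \psi(T) | P_\alpha | \psi(T) \rangle + 2\,\mathrm{Re} \int_0^T dt\, \left\langle \chi(t) \left| \frac{\partial}{\partial t} - \frac{H}{i\hbar} \right| \psi(t) \right\rangle - \lambda \int_0^T dt\, |E(t)|^2 \qquad \text{(A1.6.129)}$$

where a constraint (or penalty) on the time integral of the energy in the electric field, weighted by $\lambda$, has also
been added. It is clear that as long as $\psi$ satisfies the TDSE the new term in $\bar{J}$ will vanish for any $\chi(x, t)$. The
function of the new term is to make the variations of $\bar{J}$ with respect to E and with respect to $\psi$ independent,
to first-order in $\delta E$ (i.e. to 'deconstrain' $\psi$ and E).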


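The requirement that $\delta\bar{J}/\delta\psi = 0$ leads to the following equations:

$$i\hbar \frac{\partial \chi}{\partial t} = H\chi \qquad \text{(A1.6.130)}$$

$$\chi(x, T) = P_\alpha \psi(x, T) \qquad \text{(A1.6.131)}$$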

that is, the Lagrange multiplier must obey the TDSE, subject to the boundary condition at the final time T that
$\chi$ be equal to the projection operator operating on the Schrödinger wave function. These conditions 'conspire',
so that a change in E, which would ordinarily change $\bar{J}$ through the dependence of $\psi(T)$ on E, does not do so
to first-order in the field. For a physically meaningful solution it is required that

$$i\hbar \frac{\partial \psi}{\partial t} = H\psi \qquad \text{(A1.6.132)}$$

$$\psi(x, 0) = \psi_0(x). \qquad \text{(A1.6.133)}$$

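Finally, the optimal E(t) is given by the condition that $\delta\bar{J}/\delta E = 0$, which leads to the equation

$$E(t) = \frac{i}{\hbar\lambda}\left[\langle \chi_a | \mu | \psi_b \rangle - \langle \psi_a | \mu | \chi_b \rangle\right]. \qquad \text{(A1.6.134)}$$

The interested reader is referred to [54] for the details of the derivation.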

Equations (A1.6.130) to (A1.6.134) form the basis for a double-ended boundary value problem. $\psi$ is known at
t = 0, while $\chi$ is known at t = T. Taking a guess for E(t) one can propagate $\psi$ forward in time to obtain $\psi(t)$;
at time T the projection operator $P_\alpha$ may be applied to obtain $\chi(T)$, which may be propagated backwards in
time to obtain $\chi(t)$. Note, however, that the above description is not self-consistent: the guess of E(t) used to
propagate $\psi(t)$ forward in time and to propagate $\chi(t)$ backwards in time is not, in general, equal to the value
of E(t) given by equation (A1.6.134). Thus, in general, one has to solve these equations iteratively until
self-consistency is achieved. Optimal control theory has become a widely used tool for designing laser pulses
with specific objectives. The interested reader can consult the review in [65] for further examples.
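
The forward and backward propagation loop can be sketched for the simplest possible case. The example below is an illustration under several assumptions that are not in the text: a two-level system stands in for the two electronic surfaces, the field is real, $\hbar = 1$, the interaction is taken as $-\mu E(t)$, and the field is improved by plain gradient ascent on the penalized yield instead of by solving the self-consistent equations directly. For a real field the gradient of the yield with respect to E(t) is $-2\,\mathrm{Im}\langle\chi(t)|\mu|\psi(t)\rangle$, and the penalty contributes $-2\lambda E(t)$. All names and parameter values are hypothetical.

```python
import numpy as np

OMEGA, MU = 1.0, 1.0                       # level spacing and dipole (hbar = 1)
P = np.diag([0.0, 1.0])                    # projector onto the target state
MU_OP = np.array([[0.0, 1.0], [1.0, 0.0]])


def step_propagator(field, dt):
    """Exact propagator over one time step with the field held constant."""
    h = np.array([[0.0, -MU * field], [-MU * field, OMEGA]])
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * dt)) @ v.conj().T


def propagate(field, dt, start, backward=False):
    """State at every grid point, forwards from t = 0 or backwards from t = T."""
    n = len(field)
    states = np.empty((n + 1, 2), dtype=complex)
    if backward:
        states[n] = start
        for k in range(n - 1, -1, -1):
            states[k] = step_propagator(field[k], dt).conj().T @ states[k + 1]
    else:
        states[0] = start
        for k in range(n):
            states[k + 1] = step_propagator(field[k], dt) @ states[k]
    return states


def optimise_field(field, dt, n_iter=40, step=0.01, lam=0.01):
    """Iterate: propagate psi forward, chi backward, then update the field."""
    psi0 = np.array([1.0, 0.0], dtype=complex)
    yields = []
    for _ in range(n_iter):
        psi = propagate(field, dt, psi0)                           # psi known at t = 0
        yields.append(float(np.vdot(psi[-1], P @ psi[-1]).real))
        chi = propagate(field, dt, P @ psi[-1], backward=True)     # chi known at t = T
        overlap = np.einsum("ki,ij,kj->k", chi[:-1].conj(), MU_OP, psi[:-1])
        gradient = -2.0 * overlap.imag - 2.0 * lam * field
        field = field + step * gradient
    psi = propagate(field, dt, psi0)
    yields.append(float(np.vdot(psi[-1], P @ psi[-1]).real))
    return field, yields


dt, n_steps = 0.05, 400
t = dt * np.arange(n_steps)
guess = 0.05 * np.cos(OMEGA * t)           # weak resonant guess field
field, yields = optimise_field(guess, dt)
```

Each pass uses the current field for both propagations and then corrects it, which mirrors the iteration to self-consistency described above; more robust update rules exist but are outside the scope of this sketch.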

A1.6.5.3 OPTIMAL CONTROL AND LASER COOLING OF MOLECULES

The use of lasers to cool atomic translational motion has been one of the most exciting developments in 
atomic physics in the last 15 years. For excellent reviews, see [66, 67]. Here we give a non-orthodox 
presentation, based on [68].

(A) CALIBRATION OF COOLING: THE ZEROTH LAW 

Consider figure A1.6.32, in which a system is initially populated with an incoherent distribution of
populations with Boltzmann probabilities $P_n$, $\sum_n P_n = 1$. The simple-minded definition of cooling is to
manipulate all the population into the lowest energy quantum state, i.e. to make $P_0 = 1$ and all the other
$P_n = 0$. Cooling can then be measured by the quantity $\sum_n P_n^2$: for the initial, incoherent distribution
$\sum_n P_n^2 < 1$, while for the final distribution $\sum_n P_n^2 = 1$. However, adoption of this definition of cooling
implies that if all the population is put into any single quantum state, not necessarily the lowest energy state,
the degree of cooling is identical. Although this seems surprising at first, it is in fact quite an appealing
definition of cooling. It highlights the fact that the essence of cooling is the creation of a pure state starting
from a mixed state; once the state is pure then coherent manipulations, which are relatively straightforward,
can transfer this population to the ground state. As described in section A1.6.2.4, the conventional measure of
the degree of purity of a system in quantum mechanics is $\mathrm{Tr}(\rho^2)$, where $\rho$ is the system's density matrix, and
thus we have here defined cooling as the process of bringing $\mathrm{Tr}(\rho^2)$ from its initial value less than 1 to unity.
The definition of cooling in terms of $\mathrm{Tr}(\rho^2)$ leads to an additional surprise, namely, that the single quantum
state need not even be an eigenstate: it can, in principle, be a coherent superposition of many eigenstates. So
long as the state is pure (i.e. can be described by a single Schrödinger wavefunction) it can be manipulated
into the lowest energy state by a unitary transformation, and in a very real sense is already cold!
Figure A1.6.32 gives a geometrical interpretation of cooling. The density matrix is represented as a point on a
generalized Bloch sphere of radius $R = \mathrm{Tr}(\rho^2)$. For an initially thermal state the radius R < 1, while for a pure
state R = 1. Thus, the object of cooling, that is, increasing the purity of the density matrix, corresponds to
manipulating the density matrix onto spheres of increasingly larger radius.

Figure A1.6.32. (a) Initial and (b) final population distributions corresponding to cooling. (c) Geometrical
interpretation of cooling. The density matrix is represented as a point on a generalized Bloch sphere of radius
$R = \mathrm{Tr}(\rho^2)$. For an initially thermal state the radius R < 1, while for a pure state R = 1. The object of cooling is to
manipulate the density matrix onto spheres of increasingly larger radius.
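
The purity measure is easy to evaluate numerically. The following sketch, with a hypothetical five-level system and illustrative function names, checks three statements made above: a thermal (incoherent) distribution has $\mathrm{Tr}(\rho^2) = \sum_n P_n^2 < 1$, a coherent superposition of many eigenstates is nevertheless pure, and a unitary transformation leaves the purity unchanged.

```python
import numpy as np


def purity(rho):
    """Tr(rho^2): 1 for a pure state, less than 1 for a mixed state."""
    return float(np.real(np.trace(rho @ rho)))


def thermal_state(energies, kT):
    """Diagonal density matrix with Boltzmann populations P_n."""
    energies = np.asarray(energies, dtype=float)
    weights = np.exp(-(energies - energies.min()) / kT)
    return np.diag(weights / weights.sum()).astype(complex)


rng = np.random.default_rng(2)
rho_thermal = thermal_state(energies=np.arange(5), kT=2.0)   # hypothetical 5-level system

# A coherent superposition of all five levels is pure although it is not an eigenstate.
c = rng.normal(size=5) + 1j * rng.normal(size=5)
c /= np.linalg.norm(c)
rho_pure = np.outer(c, c.conj())

# A random unitary, standing in for a purity-preserving coherent manipulation.
u, _ = np.linalg.qr(rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5)))
rho_rotated = u @ rho_thermal @ u.conj().T
```

Comparing `purity(rho_thermal)` with `purity(rho_rotated)` shows why coherent fields alone cannot cool in this sense: they move the state around at fixed purity.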

We have seen in section A1.6.2.4 that external fields alone cannot change the value of $\mathrm{Tr}(\rho^2)$! Changes in the
purity can arise only from spontaneous emission, which is inherently uncontrollable. Where then is the
control?

A first glimmer of the resolution to the paradox of how control fields can control purity content is obtained by
noting that the second derivative, $\frac{d^2}{dt^2}\mathrm{Tr}(\rho^2)$, does depend on the external field. Loosely speaking, the
independence of the first derivative and the dependence of the second derivative on the control field indicates
that the control of cooling is achieved only in two stages: preparation of the initial state by the control field,
followed by spontaneous emission into that recipient state. This two-stage interpretation will now be
quantified.

To find the boundary between heating and cooling we set $\frac{d}{dt}\mathrm{Tr}(\rho^2) = 0$. Figure A1.6.33 shows isocontours of
$\frac{d}{dt}\mathrm{Tr}(\rho^2)$ as a function of the parameters $\rho_{22}$ (z) and $|\rho_{12}|$ (x). The dark region corresponds to
$\frac{d}{dt}\mathrm{Tr}(\rho^2) > 0$, that is, cooling, while the light region corresponds to $\frac{d}{dt}\mathrm{Tr}(\rho^2) < 0$ (i.e. heating). Note that
the cooling region fills part, but not all, of the lower hemisphere. For fixed z, the maximum occurs along the
line x = 0, with the global maximum at z = 1/4, x = 0.

Figure A1.6.33. (a) Contour map of $\frac{d}{dt}\mathrm{Tr}(\rho^2)$ as a function of the parameters $\rho_{22}$ (z) and $|\rho_{12}|$ (x). The dark
region corresponds to $\frac{d}{dt}\mathrm{Tr}(\rho^2) > 0$, i.e. cooling, while the light region corresponds to $\frac{d}{dt}\mathrm{Tr}(\rho^2) < 0$, i.e. heating.
For fixed z, the maximum occurs along the line x = 0. (b) Isopurity, or isocoherence contours (contours of fixed
$\mathrm{Tr}(\rho^2)$) as a function of $\rho_{22}$ (z) and $|\rho_{12}|$ (x) for the two-level system. The contour takes its maximum value of 1,
corresponding to a pure state, along the outermost circle, while the function takes its minimum value of 1/2,
representing the most impure state, at the centre.

To gain a qualitative understanding for the heating and cooling regions we consider three representative points
(top to bottom in figure A1.6.33(a)). (i) Spontaneous emission will lead from 1:99 to 0:100 and hence purity
increase. (ii) Spontaneous emission will lead from 100:0 to 99:1 and hence purity decrease. (iii) Spontaneous
emission will lead from 40:60 to 30:70 which suggests a purity increase; however, if there is purity stored in
the coherences $\rho_{12}$, spontaneous emission will force these to decay at a rate $\Gamma_2 = 1/(2T_1)$; this leads to a
decrease in purity which is greater than the increase in purity brought about by the population transfer.
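
These three cases can be checked with a short calculation. The sketch assumes a two-level system subject to spontaneous emission only, with the excited population $z = \rho_{22}$ decaying at rate $\Gamma$ and the coherence magnitude $x = |\rho_{12}|$ decaying at rate $\Gamma/2$. With $\mathrm{Tr}(\rho^2) = (1-z)^2 + z^2 + 2x^2$ this gives $\frac{d}{dt}\mathrm{Tr}(\rho^2) = 2\Gamma(z - 2z^2 - x^2)$. The ratios are read as excited:ground populations, and case (iii) is evaluated with the largest coherence allowed for those populations; both readings, and the function name, are assumptions made for illustration.

```python
import numpy as np


def purity_rate(z, x, gamma=1.0):
    """d/dt Tr(rho^2) for a two-level system undergoing spontaneous emission only.

    z = rho_22 (excited-state population), x = |rho_12| (coherence magnitude).
    Assumes rho_22 decays at rate gamma and rho_12 at rate gamma / 2.
    """
    z, x = np.asarray(z, dtype=float), np.asarray(x, dtype=float)
    return 2.0 * gamma * (z - 2.0 * z ** 2 - x ** 2)


# The three representative cases, read as excited:ground population ratios.
rate_i = purity_rate(0.01, 0.0)                         # 1:99, no coherence
rate_ii = purity_rate(1.00, 0.0)                        # 100:0
rate_iii = purity_rate(0.40, np.sqrt(0.40 * 0.60))      # 40:60, maximal coherence

# Locate the maximum on a grid of physically allowed points, x^2 <= z (1 - z).
z, x = np.meshgrid(np.linspace(0.0, 1.0, 401), np.linspace(0.0, 0.5, 201), indexing="ij")
rate = np.where(x ** 2 <= z * (1.0 - z), purity_rate(z, x), -np.inf)
iz, ix = np.unravel_index(np.argmax(rate), rate.shape)
```

Within this model the rate is positive for case (i) and negative for cases (ii) and (iii), and the grid maximum falls at z = 1/4, x = 0, in line with the statement about the global maximum.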

The manipulations allowed by the external field are those that move the system along a contour of constant
value of $\mathrm{Tr}(\rho^2)$, an isocoherence contour; it is clear from figure A1.6.33 that the location on this contour has a
profound effect on $\frac{d}{dt}\mathrm{Tr}(\rho^2)$. This gives a second perspective on how the external field cannot directly change
$\mathrm{Tr}(\rho^2)$, but can still affect the rate of change of $\mathrm{Tr}(\rho^2)$. If we imagine that at every instant in time the external
field moves the system along the instantaneous isocoherence contour until it intersects the curve of maximum
$\frac{d}{dt}\mathrm{Tr}(\rho^2)$, that would provide an optimal cooling strategy. This last observation is the crux of our cooling theory
and puts into sharp perspective the role played by the external field: while the external field cannot itself
change the purity of the system it can perform purity-preserving transformations which subsequently affect
the rate of change of purity.


To summarize, we have the following chain of dependence:
$(\rho_{22}, |\rho_{12}|) \rightarrow \mathrm{Tr}(\rho^2) \rightarrow (\rho_{22}, |\rho_{12}|) \rightarrow \frac{d}{dt}\mathrm{Tr}(\rho^2)$. This
chain of dependence gives $\frac{d}{dt}\mathrm{Tr}(\rho^2)$ as a function of $\mathrm{Tr}(\rho^2)$, which is a differential equation for the optimal
trajectory $\mathrm{Tr}(\rho^2)(t)$. By studying the rate of approach of the optimal trajectory to absolute zero (i.e. to a pure
state) we will have found an inviolable limitation on cooling rate with the status of a third law of
thermodynamics.
Note that the differential equation obtained from this approach will never agree perfectly with the results of a 
simulation. The above formulation is essentially an adiabatic formulation of the process: the spontaneous 
emission is considered to be slow compared with the time scale for the purity-preserving transformations 
generated by the external field, which is what allows us to assume in the theory that the external field 
manipulation along the isocoherence contour is instantaneous. If the external field is sufficiently intense, the 
population transfer may become nearly instantaneous relative to the spontaneous emission, and the adiabatic 
approximation will be excellent. 

(B) COOLING AND LASING AS COMPLEMENTARY PROCESSES 

It is interesting to consider the regions of heating, that is, regions where $\frac{d}{dt}\mathrm{Tr}(\rho^2) < 0$. We conjecture that these 
regions correspond to regions where lasing can occur. The conjecture is based on the following 
considerations: 

(i) Note that for the two-level system with no coherence ($\rho_{12} = 0$), the region where $\frac{d}{dt}\mathrm{Tr}(\rho^2) < 0$ 
corresponds to $\rho_{22} > 1/2$. This corresponds to the conventional population inversion criterion for lasing: 
that the population in the excited state be larger than that in the ground state. 

(ii) The fact that in this region the system coherence is decreasing leaves open the possibility that 
coherence elsewhere can increase. In particular, excitation with incoherent light can lead to emission 
of coherent light. This is precisely the reverse of the situation in laser cooling, where coherent light is 
transformed to incoherent light (spontaneous emission), increasing the level of coherence of the 
system. 

(iii) The regions with $\frac{d}{dt}\mathrm{Tr}(\rho^2) < 0$ and $\rho_{22} < 1/2$ necessarily imply $|\rho_{12}| > 0$, that is, coherences between the ground 
and excited state. This may correspond to lasing without population inversion, an effect which has 
attracted a great deal of attention in recent years, and is made possible by coherences between the 
ground and excited states. Indeed, in the three-level $\Lambda$ system the boundary between heating and 
cooling is in exact agreement with the boundary between lasing and non-lasing. 

Fundamentally, the conditions for lasing are determined unambiguously once the populations and coherences 
of the system density matrix are known. Yet, we have been unable to find in the literature any simple criterion 
for lasing in multilevel systems in terms of the system density matrix alone. Our conjecture is that entropy, as 
expressed by the purity content $\mathrm{Tr}(\rho^2)$, is the unifying condition; the fact that such a simple criterion could 
have escaped previous observation may be understood, given the absence of thermodynamic considerations in 
conventional descriptions of lasing. 
A1.7 Surfaces and interfaces 

J A Yarmoff 


A1.7.1 INTRODUCTION 

Some of the most interesting and important chemical and physical interactions occur when dissimilar 
materials meet, i.e. at an interface. The understanding of the physics and chemistry at interfaces is one of the 
most challenging and important endeavors in modern science. 

Perhaps the most intensely studied interface is that between a solid and vacuum, i.e. a surface. There are a 
number of reasons for this. For one, it is more experimentally accessible than other interfaces. In addition, it is 
conceptually simple, as compared to interfaces between two solids or between a solid and a liquid, so that the 
vacuum-solid interface is more accessible to fundamental theoretical investigation. Finally, it is the interface 
most easily accessible for modification, for example by photons or charged particle beams that must be 
propagated in vacuum. 

Studies of surfaces and surface properties can be traced to the early 1800s [1]. Processes that involved 
surfaces and surface chemistry, such as heterogeneous catalysis and Daguerre photography, were first 
discovered at that time. Since then, there has been a continual interest in catalysis, corrosion and other 
chemical reactions that involve surfaces. The modern era of surface science began in the late 1950s, when 
instrumentation that could be used to investigate surface processes on the molecular level started to become 
available. 

Since the modern era began, the study of solid surfaces has been one of the fastest growing areas in solid-state 
research. The geometric, electronic and chemical structure at the surface of a solid is generally quite different 
from that of the bulk material. It is now possible to measure the properties of a surface on the atomic scale 
and, in fact, to image individual atoms on a surface. The theoretical understanding of the chemistry and 
physics at surfaces is also improving dramatically. Much of the theoretical work has been motivated by the 
experimental results, and has been made possible by the vast improvements in the computer technology 
required to carry out complex numerical calculations. 

Surface studies address important issues in basic physics and chemistry, but are also relevant to a variety of 
applications. One of the most important uses of a surface, for example, is in heterogeneous catalysis. Catalysis 
occurs via adsorption, diffusion and reaction on a solid surface, so that delineation of surface chemical 
mechanisms is critical to the understanding of catalysis. Microelectronic devices are manufactured by 
processing of single-crystal semiconductor surfaces. Most dry processes that occur during device manufacture 
involve surface etching or deposition. Thus, understanding how molecules adsorb and react on surfaces and 
how electron and ion beams modify surfaces is crucial to the development of manufacturing techniques for 
semiconductor and, more recently, micro-electromechanical (MEMS), devices. Surfaces are also the active 
component in tribology, i.e. solid lubrication. In order to design lubricants that will stick to one surface, yet 
have minimal contact with another, one must understand the fundamental surface interactions involved. In 
addition, the movement of pollutants through the environment is controlled by the interactions of chemicals 
with the surfaces encountered in the soil. Thus, a fundamental understanding of the surface chemistry of metal 
oxide materials is needed in order to properly evaluate and solve environmental problems. 


Surfaces are found to exhibit properties that are different from those of the bulk material. In the bulk, each 
atom is bonded to other atoms in all three dimensions. In fact, it is this infinite periodicity in three dimensions 
that gives rise to the power of condensed matter physics. At a surface, however, the three-dimensional 
periodicity is broken. The surface atoms respond to this change in their local environment by 
adjusting their geometric and electronic structures. The physics and chemistry of clean surfaces is discussed in 
section A1.7.2. 

The importance of surface science is most often exhibited in studies of adsorption on surfaces, especially with 
regard to technological applications. Adsorption is the first step in any surface chemical reaction or 
film-growth process. The mechanisms of adsorption and the properties of adsorbate-covered surfaces are discussed 
in section A1.7.3. 

Most fundamental surface science investigations employ single-crystal samples cut along a low-index plane. 
The single-crystal surface is prepared to be nearly atomically flat. The surface may also be modified in 
vacuum. For example, it may be exposed to a gas that adsorbs (sticks) to the surface, or a film can be grown 
onto a sample by evaporation of material. In addition to single-crystal surfaces, many researchers have 
investigated vicinal, i.e. stepped, surfaces as well as the surfaces of polycrystalline and disordered materials. 
In section A1.7.4, methods for the preparation of surfaces are discussed. 

Surfaces are investigated with surface-sensitive techniques in order to elucidate fundamental information. The 
approach most often used is to employ a variety of techniques to investigate a particular materials system. As 
each technique provides only a limited amount of information, results from many techniques must be 
correlated in order to obtain a comprehensive understanding of surface properties. In section A1.7.5, methods 
for the experimental analysis of surfaces in vacuum are outlined. Note that the interactions of various kinds of 
particles with surfaces are a critical component of these techniques. In addition, one of the more interesting 
aspects of surface science is to use the tools available, such as electron, ion or laser beams, or even the tip of a 
scanning probe instrument, to modify a surface at the atomic scale. The physics of the interactions of particles 
with surfaces and the kinds of modifications that can be made to surfaces are an integral part of this section. 

The liquid-solid interface, which is the interface that is involved in many chemical and environmental 
applications, is described in section A1.7.6. This interface is more complex than the solid-vacuum interface, 
and can only be probed by a limited number of experimental techniques. Thus, obtaining a fundamental 
understanding of its properties represents a challenging frontier for surface science. 


A1.7.2 CLEAN SURFACES 

The study of clean surfaces attracted a great deal of interest in the early days of surface science. From this, we 
now have a reasonable idea of the geometric and electronic structure of many clean surfaces, and the tools are 
readily available for obtaining this information from other systems, as needed. 


When discussing geometric structure, the macroscopic morphology must be distinguished from the 
microscopic atomic structure. The morphology is the macroscopic shape of the material, which is a collective 
property of groups of atoms determined largely by surface and interfacial tension. The following discussion, 
however, will concentrate on the structure at the atomic level. Note that the atomic structure often plays a role 
in determining the ultimate morphology of the surface. What is most important about the atomic structure, 
however, is that it affects the manner in which chemistry occurs on a surface at the molecular level. 

A1.7.2.1 SURFACE CRYSTALLOGRAPHY 

To first approximation, a single-crystal surface is atomically flat and uniform, and is composed of a regular 
array of atoms positioned at well defined lattice sites. Materials generally have of the order of $10^{15}$ atoms 
positioned at the outermost atomic layer of each square centimetre of exposed surface. A bulk crystalline 
material has virtually infinite periodicity in three dimensions, but infinite periodicity remains in only two 
dimensions when a solid is cut to expose a surface. In the third dimension, i.e. normal to the surface, the 
periodicity abruptly ends. Thus, the surface crystal structure is described in terms of a two-dimensional unit 
cell parallel to the surface. 
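
The order-of-magnitude figure for the surface atom density can be checked with a minimal sketch for an ideal, bulk-terminated fcc(100) surface. The choice of Ni and its lattice constant (about 3.52 Å) are assumptions made for illustration and are not taken from the text; the function name is likewise hypothetical.

```python
ANGSTROM_IN_CM = 1.0e-8


def fcc100_surface_density(lattice_constant_angstrom):
    """Atoms per cm^2 in the outermost layer of an ideal fcc(100) surface.

    The square face of the conventional cubic cell has area a^2 and holds
    two atoms: four corners each shared by four cells, plus one face centre.
    """
    a_cm = lattice_constant_angstrom * ANGSTROM_IN_CM
    return 2.0 / a_cm**2


# Assumed lattice constant for Ni (about 3.52 angstrom); not given in the text.
ni_density = fcc100_surface_density(3.52)
print(f"Ni(100): {ni_density:.2e} atoms per cm^2")
```

The result is of the order of $10^{15}$ atoms per square centimetre, consistent with the estimate quoted above.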

In describing a particular surface, the first important parameter is the Miller index that corresponds to the 
orientation of the sample. Miller indices are used to describe directions with respect to the three-dimensional 
bulk unit cell [2]. The Miller index indicating a particular surface orientation is the one that points in the 
direction of the surface normal. For example, a Ni crystal cut perpendicular to the [100] direction would be 
labelled Ni(100). 
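
For cubic crystals such as Ni, the [hkl] direction is normal to the (hkl) plane, so the angle between two surface orientations follows from the dot product of their Miller indices. The sketch below assumes a cubic lattice; it does not hold for lower-symmetry crystals, and the function name is illustrative.

```python
import math


def plane_angle_cubic(hkl_a, hkl_b):
    """Angle in degrees between the normals of two planes of a cubic crystal."""
    dot = sum(a * b for a, b in zip(hkl_a, hkl_b))
    norm_a = math.sqrt(sum(a * a for a in hkl_a))
    norm_b = math.sqrt(sum(b * b for b in hkl_b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))


print(plane_angle_cubic((1, 0, 0), (1, 1, 1)))  # angle between (100) and (111)
print(plane_angle_cubic((1, 0, 0), (0, 1, 0)))  # angle between (100) and (010)
```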

The second important parameter to consider is the size of the surface unit cell. A surface unit cell cannot be 
smaller than the projection of the bulk cell onto the surface. However, the surface unit cell is often bigger than 
it would be if the bulk unit cell were simply truncated at the surface. The symmetry of the surface unit cell is 
easily determined by visual inspection of a low-energy electron diffraction (LEED) pattern (LEED is 
discussed in section A1.7.5.1 and section B1.21). 

There is a well defined nomenclature employed to describe the symmetry of any particular surface [1]. The 
standard notation for describing surface symmetry is in the form 

M(hkl)-(p × q)-A 

where M is the chemical symbol of the substrate material, h, k and l are the Miller indices that indicate the 
surface orientation, p and q relate the size of the surface unit cell to that of the substrate unit cell and A is the 
chemical symbol for an adsorbate (if applicable). For example, atomically clean Ni cut perpendicular to the 
[100] direction would be notated as Ni(100)-(1 × 1), since this surface has a bulk-terminated structure. If the 
unit cell were bigger than that of the substrate in one direction or the other, then p and/or q would be larger 
than one. For example, if a Si single crystal is cleaved perpendicular to the [111] direction, a Si(111)-(2 × 1) surface 
is produced. Note that p and q are often, but are not necessarily, integers. If an adsorbate is involved in 
forming the reconstruction, then it is explicitly part of the nomenclature. For example, when silver is adsorbed 
on Si(111) under the proper conditions, the Si(111)-(√3 × √3)-Ag structure is formed. 

In addition, the surface unit cell may be rotated with respect to the bulk cell. Such a rotated unit cell is notated 
as 

M(hkl)-(p × q)Rr°-A 

where r is the angle in degrees between the surface and bulk unit cells. For example, when iodine is adsorbed 
onto the (111) face of silver, the Ag(111)-(√3 × √3)R30°-I structure can be produced. 

Finally, there is an abbreviation 'c', which stands for 'centred', that is used to indicate certain common 
symmetries. In a centred structure, although the primitive unit cell is rotated from the substrate unit cell, the 
structure can also be considered as a non-rotated unit cell with an additional atom placed in the centre. For 
example, a common adsorbate structure involves placing an atom at every other surface site of a square 
lattice. This has the effect of rotating the primitive unit cell by 45°, so that such a structure would ordinarily 
be notated as (√2 × √2)R45°. However, the unit cell can also be thought of as a (2 × 2) in registry with the 
substrate with an additional atom placed in the centre of the cell. Thus, in order to simplify the nomenclature, 
this structure is equivalently called a c(2 × 2). Note that the abbreviation 'p', which stands for 'primitive', is 
sometimes used for a unit cell that is in registry with the substrate in order to distinguish it from a centred 
symmetry. Thus, p(2 × 2) is just an unambiguous way of representing a (2 × 2) unit cell. 
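
The equivalence of the (√2 × √2)R45° and c(2 × 2) descriptions, and the geometry of a (√3 × √3)R30° cell, can be verified numerically. The following is a minimal sketch in which the lattice vectors are expressed in units of the substrate lattice constant; the function names and the choice of basis vectors are assumptions made for illustration.

```python
import math


def wood_parameters(a1, a2, b1, b2):
    """Return (p, q, rotation in degrees) for overlayer vectors b1, b2
    relative to substrate vectors a1, a2 (all given as 2D tuples)."""
    p = math.hypot(*b1) / math.hypot(*a1)
    q = math.hypot(*b2) / math.hypot(*a2)
    rotation = math.degrees(math.atan2(b1[1], b1[0]) - math.atan2(a1[1], a1[0]))
    return p, q, rotation


def cell_area(v1, v2):
    """Area of the parallelogram spanned by two 2D vectors."""
    return abs(v1[0] * v2[1] - v1[1] * v2[0])


# Square substrate with every other site occupied (checkerboard pattern).
a1, a2 = (1.0, 0.0), (0.0, 1.0)
b1, b2 = (1.0, 1.0), (-1.0, 1.0)
p, q, rot = wood_parameters(a1, a2, b1, b2)
print(f"square lattice: ({p:.3f} x {q:.3f})R{rot:.0f}")
# The primitive cell (area 2) holds one adsorbate; the centred (2 x 2) cell
# (area 4) holds two, so both describe the same structure.
print(cell_area(b1, b2), cell_area((2.0, 0.0), (0.0, 2.0)))

# Hexagonal substrate (120 degree basis) with a sqrt(3) overlayer.
h1, h2 = (1.0, 0.0), (-0.5, math.sqrt(3) / 2)
s1 = (2 * h1[0] + h2[0], 2 * h1[1] + h2[1])
s2 = (-h1[0] + h2[0], -h1[1] + h2[1])
p, q, rot = wood_parameters(h1, h2, s1, s2)
print(f"hexagonal lattice: ({p:.3f} x {q:.3f})R{rot:.0f}")
```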

A1.7.2.2 TERRACES AND STEPS 

For many studies of single-crystal surfaces, it is sufficient to consider the surface as consisting of a single 
domain of a uniform, well ordered atomic structure based on a particular low-Miller-index orientation. 
However, real materials are not so flawless. It is therefore useful to consider how real surfaces differ from the 
ideal case, so that the behaviour that is intrinsic to a single domain of the well ordered orientation can be 
distinguished from that caused by defects. 


Real, clean, single-crystal surfaces are composed of terraces, steps and defects, as illustrated in figure A1.7.1. 
This arrangement is called the TLK, or terrace-ledge-kink, model. A terrace is a large region in which the 
surface has a well-defined orientation and is atomically flat. Note that a singular surface is defined as one that 
is composed solely of one such terrace. It is impossible to orient an actual single-crystal surface to precise 
atomic flatness, however, and steps provide the means to reconcile the macroscopic surface plane with the 
microscopic orientation. A step separates singular terraces, or domains, from each other. Most steps are single 
atomic height steps, although for certain surfaces a double-height step is required in order that each terrace is 
equivalent. Figure A1.7.1(a) illustrates two perfect terraces separated by a perfect monoatomic step. The 
overall number and arrangement of the steps on any actual surface is determined by the misorientation, which 
is the angle between the nominal crystal axis direction and the actual surface normal. If the misorientation is 
not along a low-index direction, then there will be kinks in the steps to adjust for this, as illustrated in figure 
A1.7.1(b). 
Figure A1.7.1. Schematic diagram illustrating terraces, steps, and defects. (a) Perfect flat terraces separated 
by a straight, monoatomic step. (b) A surface containing various defects. 


A surface that differs from a singular orientation by a finite amount is called vicinal. Vicinal surfaces are 
composed of well oriented singular domains separated by steps. Figure A1.7.2 shows a large-scale scanning 
tunnelling microscope (STM) image of a stepped Si(111) surface (STM instruments are described in section 
A1.7.5.3 and section B1.20). In this image, flat terraces separated by well defined steps are easily visible. It 
can be seen that the steps are all pointing along the same general direction. 
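
The relation between the misorientation and the step density can be estimated with a simple geometric sketch. It assumes an idealized vicinal surface made of evenly spaced, straight steps of a single height h, for which the average terrace width is w = h / tan(θ). The step height used below (3.14 Å, roughly one Si(111) bilayer) and the miscut angles are illustrative assumptions, not values from the text.

```python
import math


def terrace_width(step_height, miscut_deg):
    """Average terrace width for evenly spaced steps of one height.

    The result is in the same length unit as step_height.
    """
    return step_height / math.tan(math.radians(miscut_deg))


# Assumed step height of 3.14 angstrom and a few hypothetical miscut angles.
for angle in (0.1, 1.0, 4.0):
    print(f"{angle:4.1f} deg -> {terrace_width(3.14, angle):8.1f} angstrom")
```

Within this idealization, smaller misorientation angles give wider terraces, which is why a nominally flat sample still shows widely spaced steps.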



Figure A1.7.2. Large-scale (5000 Å × 5000 Å) scanning tunnelling microscope image of a stepped 
Si(111)-(7 × 7) surface showing flat terraces separated by step edges (courtesy of Alison Baski). 

Although all real surfaces have steps, they are not usually labelled as vicinal unless they are purposely 
misoriented in order to create a regular array of steps. Vicinal surfaces have unique properties, which make 
them useful for many types of experiments. For example, steps are often more chemically reactive than 
terraces, so that vicinal surfaces provide a means for investigating reactions at step edges. Also, it is possible 
to grow 'nanowires' by deposition of a metal onto a surface of another metal in such a way that the deposited 
metal diffuses to and attaches at the step edges [3]. 

Many surfaces have additional defects other than steps, some of which are illustrated in figure 
A1.7.1(b). For example, steps are usually not straight, i.e. they do not lie along a single low-index direction, but 
instead have kinks. Terraces are also not always perfectly flat, and often contain defects such as adatoms or 
vacancies. An adatom is an isolated atom adsorbed on top of a terrace, while a vacancy is an atom or group of 
atoms missing from an otherwise perfect terrace. In addition, a group of atoms called an island may form on a 
terrace, as illustrated. 


Much surface work is concerned with the local atomic structure associated with a single domain. Some 
surfaces are essentially bulk-terminated, i.e. the atomic positions are basically unchanged from those of the 
bulk as if the atomic bonds in the crystal were simply cut. More common, however, are deviations from the 
bulk atomic structure. These structural adjustments can be classified as either relaxations or reconstructions. 
To illustrate the various classifications of surface structures, figure A1.7.3(a) shows a side-view of a 
bulk-terminated surface, figure A1.7.3(b) shows an oscillatory relaxation and figure A1.7.3(c) shows a 
reconstructed surface. 
Figure A1.7.3. Schematic illustration showing side views of (a) a bulk-terminated surface, (b) a relaxed 
surface with oscillatory behaviour, and (c) a reconstructed surface. 

A1.7.2.3 RELAXATION 

Most metal surfaces have the same atomic structure as in the bulk, except that the interlayer spacings of the 
outermost few atomic layers differ from the bulk values. In other words, entire atomic layers are shifted as a 
whole in a direction perpendicular to the surface. This is called relaxation, and it can be either inward or 
outward. Relaxation is usually reported as a percentage of the value of the bulk interlayer spacing. Relaxation 
does not affect the two-dimensional surface unit cell symmetry, so surfaces that are purely relaxed have 
(1 × 1) symmetry. 
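
Since relaxation is reported as a percentage of the bulk interlayer spacing, the conversion from measured spacings is a one-line calculation. The sketch below uses hypothetical spacings chosen only to illustrate the sign convention (negative for inward contraction, positive for outward expansion); they do not refer to any particular material.

```python
def relaxation_percent(d_surface, d_bulk):
    """Change in an interlayer spacing as a percentage of the bulk spacing."""
    return 100.0 * (d_surface - d_bulk) / d_bulk


# Hypothetical spacings in angstrom, for illustration only.
d_bulk = 2.00
d12 = 1.90  # first-to-second layer spacing
d23 = 2.04  # second-to-third layer spacing
print(f"d12: {relaxation_percent(d12, d_bulk):+.1f}%")
print(f"d23: {relaxation_percent(d23, d_bulk):+.1f}%")
```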

The reason that relaxation occurs can be understood in terms of the free electron character of a metal. Because 
the electrons are free, they are relatively unperturbed by the periodic ion cores. Thus, the electron density is 
homogeneous parallel to the surface. At the surface of a metal the solid abruptly stops, so that there is a net dipole 
perpendicular to the surface. This dipole field acts to attract electrons to the surface and is, in fact, responsible 
for the surface work function. The dipole field also interacts with the ion cores of the outermost atomic layer, 
however, causing them to move perpendicular to the surface. Note that some metals are also reconstructed 
since the assumption of perfectly free electrons unperturbed by the ion cores is not completely valid. 


In many materials, the relaxations between the layers oscillate. For example, if the first-to-second layer 
spacing is reduced by a few percent, the second-to-third layer spacing would be increased, but by a smaller 
amount, as illustrated in figure A1.7.3(b). These oscillatory relaxations have been measured with LEED [4, 5] 
and ion scattering [6, 7] to extend to at least the fifth atomic layer into the material. The oscillatory nature of 
the relaxations results from oscillations in the electron density perpendicular to the surface, which are called 
Friedel oscillations [8]. The Friedel oscillations arise from Fermi-Dirac statistics and impart oscillatory forces 
to the ion cores. 

A1.7.2.4 RECONSTRUCTION 

The three-dimensional symmetry that is present in the bulk of a crystalline solid is abruptly lost at the surface. 
In order to minimize the surface energy, the thermodynamically stable surface atomic structures of many 
materials differ considerably from the structure of the bulk. These materials are still crystalline at the surface, 
in that one can define a two-dimensional surface unit cell parallel to the surface, but the atomic positions in 
the unit cell differ from those of the bulk structure. Such a change in the local structure at the surface is called 
a reconstruction. 

For covalently bonded semiconductors, the largest driving force behind reconstructions is the need to pair up 
electrons. For example, as shown in figure A1.7.4(a), if a Si(100) surface were to be bulk-terminated, each 
surface atom would have two lone electrons pointing away from the surface (assuming that each atom remains 
in a tetrahedral configuration). Lone electrons protruding into the vacuum are referred to as dangling bonds. 
Instead of maintaining two dangling bonds at each surface atom, however, dimers can form in which electrons 
are shared by two neighbouring atoms. Figure A1.7.4(b) shows two symmetrically dimerized Si atoms, in 
which two dangling bonds have been eliminated, although the atoms still have one dangling bond each. Figure 
A1.7.4(c) shows the asymmetric arrangement that further lowers the energy by pairing up two lone electrons 
onto one atom. In this arrangement, the electrons at any instant are associated with one Si atom, while the 
other has an empty orbital. This distorts the crystal structure, as the upper atom is essentially $sp^3$ hybridized, 
i.e. tetrahedral, while the other is $sp^2$, i.e. flat. 


Figure A1.7.4. Schematic illustration of two Si atoms as they would be oriented on the (100) surface. (a) 
Bulk-terminated structure showing two dangling bonds (lone electrons) per atom. (b) Symmetric dimer, in 
which two electrons are shared and each atom has one remaining dangling bond. (c) Asymmetric dimer in 
which two electrons pair up on one atom and the other has an empty orbital. 

Figure A1.7.5(a) shows a larger scale schematic of the Si(100) surface if it were to be bulk-terminated, while 
figure A1.7.5(b) shows the arrangement after the dimers have been formed. The dashed boxes outline the 
two-dimensional surface unit cells. The reconstructed Si(100) surface has a unit cell that is two times larger than 
the bulk unit cell in one direction and the same in the other. Thus, it has a (2 × 1) symmetry and the surface is 
labelled as Si(100)-(2 × 1). Note that in actuality, however, any real Si(100) surface is composed of a mixture 
of (2 × 1) and (1 × 2) domains. This is because the dimer direction rotates by 90° at each step edge. 
Figure A1.7.5. Schematic illustration showing the top view of the Si(100) surface. (a) Bulk-terminated 
structure. (b) Dimerized Si(100)-(2 × 1) structure. The dashed boxes show the two-dimensional surface unit 
cells. 

The surface unit cell of a reconstructed surface is usually, but not necessarily, larger than the corresponding 
bulk-terminated two-dimensional unit cell would be. The LEED pattern is therefore usually the first indication 
that a reconstruction exists. However, certain surfaces, such as GaAs(110), have a reconstruction with a 
surface unit cell that is still (1 × 1). At the GaAs(110) surface, Ga atoms are moved inward perpendicular to 
the surface, while As atoms are moved outward. 

The most celebrated surface reconstruction is probably that of Si(111)-(7 × 7). The fact that this surface has 
such a large unit cell had been known for some time from LEED, but the detailed atomic structure took many 
person-years of work to elucidate. Photoelectron spectroscopy [9], STM [10] and many other techniques were 
applied to the determination of this structure. It was transmission electron diffraction (TED), however, that 
provided the final information enabling the structure to be determined [11]. The structure now accepted is the 
so-called DAS, or dimer adatom stacking-fault, model, as shown in figure A1.7.6. In this structure, there are a 
total of 19 dangling bonds per unit cell, which can be compared to the 49 dangling bonds that the 
bulk-terminated surface would have. Figure A1.7.7 shows an atomic resolution STM image of the Si(111)-(7 × 7) 
surface. The bright spots in the image represent individual Si adatoms. 
Figure A1.7.6. Schematic diagrams of the DAS model of the Si(111)-(7 × 7) surface structure. There are 12 
'adatoms' per unit cell in the outermost layer, which each have one dangling bond perpendicular to the 
surface. The second layer, called the rest layer, also has six 'rest' atoms per unit cell, each with a 
perpendicular dangling bond. The 'corner holes' at the edges of the unit cells also contain one atom with a 
dangling bond. 
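
The dangling-bond count quoted for the DAS model follows from the contributions listed in the caption. As a worked check, taking one corner-hole atom per unit cell and one dangling bond for each of the 7 × 7 atoms of the bulk-terminated cell:

```latex
\begin{align*}
N_{\mathrm{DAS}}  &= \underbrace{12}_{\text{adatoms}}
                   + \underbrace{6}_{\text{rest atoms}}
                   + \underbrace{1}_{\text{corner hole}} = 19, \\
N_{\mathrm{bulk}} &= 7 \times 7 = 49, \\
\frac{N_{\mathrm{DAS}}}{N_{\mathrm{bulk}}} &= \frac{19}{49} \approx 0.39 .
\end{align*}
```

The reconstruction therefore removes roughly 60% of the dangling bonds that the bulk-terminated surface would have.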
Figure A1.7.7. Atomic-resolution, empty-state STM image (100 Å × 100 Å) of the reconstructed 
Si(111)-(7 × 7) surface. The bright spots correspond to a top layer of adatoms, with 12 adatoms per unit cell (courtesy of 
Alison Baski). 

Although most metal surfaces exhibit only relaxation, some do have reconstructions. For example, the fcc 
metals Pt(110), Au(110) and Ir(110) each have a (1 × 2) surface unit cell. The accepted structure of these 
surfaces is a missing row model, in which every other surface row is missing. Also, as discussed below, when an 
adsorbate attaches to a metal surface, a reconstruction of the underlying substrate may be induced. 

Reliable tables that list many known surface structures can be found in [1]. Also, the National Institute of 
Standards and Technology (NIST) maintains databases of surface structures and other surface-related 
information, which can be found at http://www.nist.gov/srd/surface.htm . 

A1.7.2.5 SELF-DIFFUSION 

The atoms on the outermost surface of a solid are not necessarily static, particularly as the surface temperature 
is raised. There has been much theoretical [12, 13] and experimental work (described below) undertaken to 
investigate surface self-diffusion. These studies have shown that surfaces actually have dynamic, changing 
structures. For example, atoms can diffuse along a terrace to or from step edges. When atoms diffuse across a 
surface, they may move by hopping from one surface site to the next, or by exchanging places with second 
layer atoms. 
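
The hopping picture of self-diffusion can be illustrated with a minimal random-walk sketch. It assumes uncorrelated nearest-neighbour hops on an ideal square lattice of sites, and ignores the exchange mechanism, step edges and any temperature dependence of the hop rate; under these assumptions the mean-square displacement grows linearly with the number of hops. The function and variable names are illustrative.

```python
import random

# Displacements to the four nearest-neighbour sites of a square lattice.
HOPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def mean_square_displacement(n_hops, n_walkers=2000, seed=0):
    """Average r^2, in units of the site spacing squared, after n_hops hops."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    total = 0
    for _ in range(n_walkers):
        x = y = 0
        for _ in range(n_hops):
            dx, dy = rng.choice(HOPS)
            x += dx
            y += dy
        total += x * x + y * y
    return total / n_walkers


for n in (10, 100):
    print(n, mean_square_displacement(n))
```

For this idealized walk the expected mean-square displacement equals the number of hops, so the printed values should lie close to 10 and 100, respectively.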

The field ion microscope (FIM) has been used to monitor surface self-diffusion in real time. In the FIM, a 
sharp, crystalline tip is placed in a large electric field in a chamber filled with He gas [14]. At the tip, He ions 
are formed, and then accelerated away from the tip. The angular distribution of the He ions provides a picture 
of the atoms at the tip with atomic resolution. In these images, it has been possible to monitor the diffusion of 
a single adatom on a surface in real time [15]. The limitations of FIM, however, include its applicability only 
to metals, and the fact that the surfaces are limited to those that exist on a sharp tip, i.e. diffusion along a large 
terrace cannot be observed. 

More recently, studies employing STM have been able to address surface self-diffusion across a terrace [16, 
17, 18 and 19]. It is possible to image the same area on a surface as a function of time, and 'watch' the 
movement of individual atoms. These studies are limited only by the speed of the instrument. Note that the 
performance of STM instruments is constantly improving, and has now surpassed the 1 ps time resolution 
mark [20]. Not only has self-diffusion of surface atoms been studied, but the diffusion of vacancy defects on 
surfaces has also been observed with STM [18]. 

It has also been shown that sufficient surface self-diffusion can occur so that entire step edges move in a 
concerted manner. Although it does not achieve atomic resolution, the low-energy electron microscopy 
(LEEM) technique allows for the observation of the movement of step edges in real time [21]. LEEM has also 
been useful for studies of epitaxial growth and surface modifications due to chemical reactions. 

A1.7.2.6 SURFACE ELECTRONIC STRUCTURE 

At a surface, not only can the atomic structure differ from the bulk, but electronic energy levels are present 
that do not exist in the bulk band structure. These are referred to as 'surface states'. If the states are occupied, 
they can easily be measured with photoelectron spectroscopy (described in section A1.7.5.1 and section 
B1.25.2). If the states are unoccupied, a technique such as inverse photoemission or x-ray absorption is 
required [22, 23]. Also, note that STM has been used to measure surface states by monitoring the tunnelling 
current as a function of the bias voltage [24] (see section B1.20). This is sometimes called scanning tunnelling 
spectroscopy (STS). 
Surface states can be divided into those that are intrinsic to a well ordered crystal surface with two- 
dimensional periodicity, and those that are extrinsic [25]. Intrinsic states include those that are associated with 
relaxation and reconstruction. Note, however, that even in a bulk-terminated surface, the outermost atoms are 
in a different electronic environment than the substrate atoms, which can also lead to intrinsic surface states. 
Extrinsic surface states are associated with imperfections in the perfect order of the surface region. Extrinsic 
states can also be formed by an adsorbate, as discussed below. 

Note that in core-level photoelectron spectroscopy, it is often found that the surface atoms have a different 
binding energy than the bulk atoms. These are called surface core-level shifts (SCLS), and should not be 
confused with intrinsic surface states. An SCLS is observed because the atom is in a chemically different 
environment than the bulk atoms, but the core-level state that is being monitored is one that is present in all of 
the atoms in the material. A surface state, on the other hand, exists only at the particular surface. 


A1.7.3 ADSORPTION

When a surface is exposed to a gas, the molecules can adsorb, or stick, to the surface. Adsorption is an 
extremely important process, as it is the first step in any surface chemical reaction. Some of the aspects of 
adsorption that surface science is concerned with include the mechanisms and kinetics of adsorption, the 
atomic bonding sites of adsorbates and the chemical reactions that occur with adsorbed molecules. 

The coverage of adsorbates on a given substrate is usually reported in monolayers (ML). Most often, 1 ML is 
defined as the number of atoms in the outermost atomic layer of the unreconstructed, i.e. bulk-terminated, 
substrate. Sometimes, however, 1 ML is defined as the maximum number of adsorbate atoms that can stick to 
a particular surface, which is termed the saturation coverage. The saturation coverage can be much smaller
than the number of surface atoms, particularly with large adsorbates. Thus, in reading the literature, care must
be taken to understand how a particular author defines 1 ML. 

Molecular adsorbates usually cover a substrate with a single layer, after which the surface becomes passive 
with respect to further adsorption. The actual saturation coverage varies from system to system, and is often 
determined by the strength of the repulsive interactions between neighbouring adsorbates. Some molecules 
will remain intact upon adsorption, while others will adsorb dissociatively. This is often a function of the 
surface temperature and composition. There are also often multiple adsorption states, in which the stronger, 
more tightly bound states fill first, and the more weakly bound states fill last. The factors that control 
adsorbate behaviour depend on the complex interactions between adsorbates and the substrate, and between 
the adsorbates themselves. 

The probability for sticking is known as the sticking coefficient, S. Usually, S decreases with coverage. Thus, 
the sticking coefficient at zero coverage, the so-called initial sticking coefficient, $S_0$, reflects the interaction of
a molecule with the bare surface. 

In order to calibrate the sticking coefficient, one needs to determine the exposure, i.e. how many molecules
have initially impacted a surface. The Langmuir (L) is a unit of exposure that is defined as $10^{-6}$ Torr s. An
exposure of 1 L is approximately the number of incident molecules such that each outermost surface atom is
impacted once. Thus, a 1 L exposure would produce 1 ML of adsorbates if the sticking coefficient were unity.
Note that a quantitative calculation of the exposure per surface atom depends on the molecular weight of the
gas molecules and on the actual density of surface atoms, but the approximations inherent in the definition of
the Langmuir are often inconsequential.
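As a rough numerical check on the statement that 1 L corresponds to about one impact per surface atom, the following sketch evaluates the kinetic-theory impingement flux, P/sqrt(2 pi m k T). This formula, the choice of N2 at 300 K and the nominal site density of 10^15 atoms per cm^2 are assumptions made for illustration and are not taken from the text.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053907e-27      # atomic mass unit, kg
TORR_TO_PA = 133.322368   # pascals per Torr


def impingement_flux(pressure_torr, mass_amu, temperature_k):
    """Molecules striking unit area per second, in cm^-2 s^-1 (kinetic theory)."""
    p = pressure_torr * TORR_TO_PA
    m = mass_amu * AMU
    flux_m2 = p / math.sqrt(2.0 * math.pi * m * K_B * temperature_k)
    return flux_m2 * 1.0e-4  # convert m^-2 s^-1 to cm^-2 s^-1


def impacts_per_site(exposure_langmuir, mass_amu, temperature_k, sites_per_cm2=1.0e15):
    """Impacts per surface atom for an exposure in Langmuir (1 L = 1e-6 Torr s).

    sites_per_cm2 is an assumed, nominal surface atom density.
    """
    fluence = impingement_flux(1.0e-6, mass_amu, temperature_k) * exposure_langmuir
    return fluence / sites_per_cm2


if __name__ == "__main__":
    # N2 (28 amu) at room temperature: a few tenths of an impact per site per Langmuir
    print(impacts_per_site(1.0, 28.0, 300.0))
```

With these assumed values the result is of order unity rather than exactly one, and it changes with molecular mass and site density, which is the sense in which the Langmuir is only an approximate measure of impacts per surface atom.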

A1.7.3.1 PHYSISORPTION 

Adsorbates can physisorb onto a surface into a shallow potential well, typically 0.25 eV or less [25]. In 
physisorption, or physical adsorption, the electronic structure of the system is barely perturbed by the 
interaction, and the physisorbed species are held onto a surface by weak van der Waals forces. This attractive 
force is due to charge fluctuations in the surface and adsorbed molecules, such as mutually induced dipole 
moments. Because of the weak nature of this interaction, the equilibrium distance at which physisorbed 
molecules reside above a surface is relatively large, of the order of 3 Å or so. Physisorbed species can be
induced to remain adsorbed for a long period of time if the sample temperature is held sufficiently low. Thus, 
most studies of physisorption are carried out with the sample cooled by liquid nitrogen or helium. 

Note that the van der Waals forces that hold a physisorbed molecule to a surface exist for all atoms and 
molecules interacting with a surface. The physisorption energy is usually insignificant if the particle is 
attached to the surface by a much stronger chemisorption bond, as discussed below. Often, however, just 
before a molecule forms a strong chemical bond to a surface, it exists in a physisorbed precursor state for a 
short period of time, as discussed below in section A1.7.3.3.

A1.7.3.2 CHEMISORPTION 

Chemisorption occurs when the attractive potential well is large so that upon adsorption a strong chemical 
bond to a surface is formed. Chemisorption involves changes to both the molecule and surface electronic 
states. For example, when oxygen adsorbs onto a metal surface, a partially ionic bond is created as charge 
transfers from the substrate to the oxygen atom. Other chemisorbed species interact in a more covalent 
manner by sharing electrons, but this still involves perturbations to the electronic system. 


Chemisorption is always an exothermic process. By convention, the heat of adsorption, $\Delta H_{\mathrm{ads}}$, has a positive
sign, which is opposite to the normal thermodynamic convention [1]. Although the heat of adsorption has
been directly measured with the use of a very sensitive microcalorimeter [26], it is more commonly measured
via adsorption isotherms [1]. An isotherm is generated by measuring the coverage of adsorbates obtained by
reaction at a fixed temperature as a function of the flux of incoming gas molecules. The flux is adjusted by
regulating the pressure used during exposure. An analysis of the data then allows $\Delta H_{\mathrm{ads}}$ and other parameters to
be determined. Heats of adsorption can also be determined from temperature programmed desorption (TPD) if
the adsorption is reversible (TPD is discussed in section A1.7.5.4 and section B1.25).

When a molecule adsorbs to a surface, it can remain intact or it may dissociate. Dissociative chemisorption is 
common for many types of molecules, particularly if all of the electrons in the molecule are tied up so that 
there are no electrons available for bonding to the surface without dissociation. Often, a molecule will 
dissociate upon adsorption, and then recombine and desorb intact when the sample is heated. In this case, 
dissociative chemisorption can be detected with TPD by employing isotopically labelled molecules. If mixing 
occurs during the adsorption/desorption sequence, it indicates that the initial adsorption was dissociative.

Atom abstraction is a dissociation reaction at a surface in which one of the dissociation
products sticks to the surface, while another is emitted. If the chemisorption reaction is particularly
exothermic, the excess energy generated by chemical bond formation can be channelled into the kinetic 
energy of the desorbed dissociation fragment. An example of atom abstraction involves the reaction of 
molecular halogens with Si surfaces [27, 28]. In this case, one halogen atom chemisorbs while the other atom 
is ejected from the surface. 

A1.7.3.3 ADSORPTION KINETICS

When an atom or molecule approaches a surface, it feels an attractive force. The interaction potential between 
the atom or molecule and the surface, which depends on the distance between the molecule and the surface 
and on the lateral position above the surface, determines the strength of this force. The incoming molecule 
feels this potential, and upon adsorption becomes trapped near the minimum in the well. Often the molecule 
has to overcome an activation barrier, $E_{\mathrm{act}}$, before adsorption can occur.

It is the relationship between the bound potential energy surface of an adsorbate and the vibrational states of
the molecule that determines whether an adsorbate remains on the surface, or whether it desorbs after a period
of time. The lifetime of the adsorbed state, $\tau$, depends on the size of the well relative to the vibrational energy
inherent in the system, and can be written as

$$\tau = \tau_0 \exp(\Delta H_{\mathrm{ads}}/kT). \tag{A1.7.1}$$

Such lifetimes vary from less than a picosecond to times greater than the age of the universe [29]. Thus, 
adsorbed states with short lifetimes can occur during a surface chemical reaction, or long-lived adsorbed 
states exist in which atoms or molecules remain attached to a surface indefinitely. 
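The enormous range of lifetimes follows directly from the exponential dependence on the heat of adsorption. The sketch below evaluates a lifetime of the form tau = tau_0 exp(dH_ads/kT) for a few cases; the prefactor of 10^-13 s (roughly one vibrational period) and the particular heats of adsorption are illustrative assumptions, not values given in the text.

```python
import math

K_B_EV = 8.617333e-5        # Boltzmann constant, eV/K
AGE_OF_UNIVERSE_S = 4.3e17  # roughly 13.8 billion years, in seconds


def residence_time(heat_ev, temperature_k, tau0_s=1.0e-13):
    """Adsorbed-state lifetime tau = tau0 * exp(dH_ads / kT), in seconds.

    tau0_s is an assumed prefactor of about one vibrational period.
    """
    return tau0_s * math.exp(heat_ev / (K_B_EV * temperature_k))


if __name__ == "__main__":
    cases = [(0.25, 300.0), (0.25, 77.0), (2.0, 300.0)]  # (dH_ads in eV, T in K)
    for heat, temp in cases:
        tau = residence_time(heat, temp)
        print(f"dH_ads = {heat} eV, T = {temp} K: tau = {tau:.2e} s")
```

Under these assumptions a weakly bound species (0.25 eV) leaves a room-temperature surface within nanoseconds but persists for a long time at liquid nitrogen temperature, while a 2 eV chemisorption bond at room temperature gives a lifetime far longer than the age of the universe.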

In this manner, it can also be seen that molecules will desorb as the surface temperature is raised. This is the 
phenomenon employed for TPD spectroscopy (see section A1.7.5.4 and section B1.25). Note that some
adsorbates may adsorb and desorb reversibly, i.e. the heats of adsorption and desorption are equal. Other 
adsorbates, however, will adsorb and desorb via different pathways. 

Note that chemisorption often begins with physisorption into a weakly bound precursor state. While in this
state, the molecule can diffuse along the surface to find a likely site for chemisorption. This is particularly
important in the case of dissociative chemisorption, as the precursor state can involve physisorption of the 
intact molecule. If a precursor state is involved in adsorption, a negative temperature dependence to the 
adsorption probability will be found. A higher surface temperature reduces the lifetime of the physisorbed 
precursor state, since a weakly bound species will not remain on the surface in the presence of thermal 
excitation. Thus, the sticking probability will be reduced at higher surface temperatures. 
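The negative temperature dependence can be illustrated with a simple competition between two Arrhenius rates: a trapped precursor either desorbs or converts to the chemisorbed state. This two-rate picture, the barrier heights and the prefactor ratio used below are assumptions chosen for illustration; they are not taken from the text.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant, eV/K


def precursor_sticking(temperature_k, e_des_ev=0.25, e_chem_ev=0.10, prefactor_ratio=100.0):
    """Fraction of trapped precursors that chemisorb rather than desorb.

    S = k_chem / (k_chem + k_des), with each rate of Arrhenius form.
    e_des_ev and e_chem_ev are the assumed barriers for desorption and for
    conversion to the chemisorbed state; prefactor_ratio is nu_des / nu_chem.
    """
    kt = K_B_EV * temperature_k
    k_des_over_k_chem = prefactor_ratio * math.exp(-(e_des_ev - e_chem_ev) / kt)
    return 1.0 / (1.0 + k_des_over_k_chem)


if __name__ == "__main__":
    for temp in (200.0, 400.0, 600.0):
        print(f"T = {temp} K: S = {precursor_sticking(temp):.3f}")
```

Whenever the desorption barrier exceeds the conversion barrier, raising the temperature favours desorption and the computed sticking probability falls, in line with the qualitative argument above.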

The kinetics of the adsorption process are important in determining the value and behaviour of S for any given 
system. There are several factors that come into play in determining S [25].

(a) The activation barrier must be overcome in order for a molecule to adsorb. Thus, only the fraction of the
incident particles whose energy exceeds $E_{\mathrm{act}}$ will actually stick.

(b) The electronic orbitals of the incoming molecule must have the correct orientation with respect to the 
orbitals of the surface. Thus, only a fraction of the incoming molecules will immediately stick to the 
surface. Some of the incoming molecules may, however, diffuse across the surface while in a precursor 
state until they achieve the proper orientation. Thus, the details of how the potential energy varies across 
the surface are critical in determining the adsorption kinetics. 

(c) Upon adsorption, a molecule must effectively lose the remaining part of its kinetic energy, and possibly 
the excess energy liberated by an exothermic reaction, in a time period smaller than one vibrational 
period. Thus, excitations of the surface that can carry away this excess energy, such as plasmons or 
phonons, play a role in the adsorption kinetics. 

(d) Adsorption sites must be available for reaction. Thus, the kinetics may depend critically on the coverage 
of adsorbates already present on the surface, as these adsorbates may block or modify the remaining 
adsorption sites. 

A1.7.3.4 ADSORPTION MODELS 

The most basic model for chemisorption is that developed by Langmuir. In the Langmuir model, it is assumed 
that there is a finite number of adsorption sites available on a surface, and each has an equal probability for 
reaction. Once a particular site is occupied, however, the adsorption probability at that site goes to zero. 
Furthermore, it is assumed that the adsorbates do not diffuse, so that once a site is occupied it remains 
unreactive until the adsorbate desorbs from the surface. Thus, the sticking probability S goes to zero when the
coverage, $\theta$, reaches the saturation coverage, $\theta_0$. These assumptions lead to the following relationship
between the sticking coefficient and the surface coverage,

$$S = S_0 (1 - \theta/\theta_0). \tag{A1.7.2}$$
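A short sketch may help to show what the linear Langmuir form implies for uptake. It evaluates S = S_0(1 - theta/theta_0) and the coverage reached after a given number of impacts per site, assuming that desorption is negligible during the exposure; that assumption and the function names are introduced here for illustration only.

```python
import math


def langmuir_sticking(theta, s0=1.0, theta_sat=1.0):
    """S = S0 * (1 - theta/theta_sat), clipped at zero beyond saturation."""
    return max(0.0, s0 * (1.0 - theta / theta_sat))


def coverage_after_exposure(impacts_per_site, s0=1.0, theta_sat=1.0):
    """Closed-form solution of d(theta)/dN = S(theta), N being impacts per site.

    Assumes no desorption during the exposure.
    """
    return theta_sat * (1.0 - math.exp(-s0 * impacts_per_site / theta_sat))


if __name__ == "__main__":
    for n in (0.5, 1.0, 3.0, 10.0):
        print(f"{n:5.1f} impacts per site -> theta = {coverage_after_exposure(n):.3f}")
```

Because the sticking probability falls as sites fill, the coverage approaches saturation exponentially, so that several impacts per site are needed to come close to a full layer even when the initial sticking coefficient is unity.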


Figure A1.7.8 shows the relationships between S and $\theta$ expected for various models, with
the straight line indicating Langmuir adsorption.

Figure A1.7.8. Sticking probability as a function of surface coverage for three different adsorption models.

Adsorbate atoms have a finite lifetime, $\tau$, for remaining on a surface. Thus, there will always be a flux of
molecules leaving the surface even as additional molecules are being adsorbed. If the desorption rate is equal
to the rate of adsorption, then an isotherm can be collected by measuring the equilibrium coverage at a fixed
temperature as a function of pressure, $p$. From the assumptions of the Langmuir model, one can derive the
following expression relating the equilibrium coverage to pressure [29]:

$$\theta = \frac{\chi p}{1 + \chi p} \tag{A1.7.3}$$

where $\chi$ is a constant that depends on the adsorbate lifetime and surface temperature, $T$, as

$$\chi \propto \tau T^{-1/2}. \tag{A1.7.4}$$

If Langmuir adsorption occurs, then a plot of $\theta$ versus $p$ for a particular isotherm will display the form of
equation (A1.7.3). Measurements of isotherms are routinely employed in this manner in order to determine
adsorption kinetics.
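One way in which an isotherm of the Langmuir form, theta = chi p/(1 + chi p), might be analysed is sketched below. Rearranging gives theta/(1 - theta) = chi p, so chi is the slope of a straight line through the origin. The coverage is taken here as a fraction of saturation, and the pressures and the value of chi are synthetic numbers chosen for illustration, not measured data.

```python
def langmuir_isotherm(pressure, chi):
    """Equilibrium fractional coverage theta = chi*p / (1 + chi*p)."""
    return chi * pressure / (1.0 + chi * pressure)


def fit_chi(pressures, coverages):
    """Least-squares slope through the origin of theta/(1 - theta) versus p."""
    ys = [theta / (1.0 - theta) for theta in coverages]
    numerator = sum(p * y for p, y in zip(pressures, ys))
    denominator = sum(p * p for p in pressures)
    return numerator / denominator


if __name__ == "__main__":
    chi_true = 2.5e6                       # assumed value, in Torr^-1
    pressures = [1e-8, 1e-7, 1e-6, 1e-5]   # Torr
    coverages = [langmuir_isotherm(p, chi_true) for p in pressures]
    print(fit_chi(pressures, coverages))
```

With real data the linearized plot also serves as a test of the model, since systematic curvature would indicate that the Langmuir assumptions do not hold.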

Langmuir adsorption adequately describes the behaviour of many systems in which strong chemisorption 
takes place, but it has limitations. For one, the sticking at surface sites actually does depend on the occupancy 
of neighbouring sites. Thus, sticking probability usually changes with coverage. A common observation, for 
example, is that the sticking probability is reduced exponentially with coverage, i.e.

$$S \propto \exp(-\alpha\theta/kT) \tag{A1.7.5}$$

which is called the Elovich equation [25]. This is compared to the Langmuir model in figure A1.7.8.


If adsorption occurs via a physisorbed precursor, then the sticking probability at low coverages will be 
enhanced due to the ability of the precursor to diffuse and find a lattice site [30]. The details depend on 
parameters such as strength of the lateral interactions between the adsorbates and the relative rates of 
desorption and reaction of the precursor. In figure A1.7.8 an example of a plot of S versus $\theta$ for precursor-
mediated adsorption is presented.
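The qualitative difference between the three kinds of behaviour can be sketched numerically. The Langmuir and exponential (Elovich-type) forms follow the expressions in the text; the precursor-mediated curve uses a Kisliuk-type expression, S = S_0/(1 + K theta/(1 - theta)), which is a commonly used functional form that is not given in the text. The parameter values are arbitrary, and coverage is expressed as a fraction of saturation.

```python
import math


def langmuir(theta, s0=1.0):
    """Linear decrease of sticking with fractional coverage."""
    return s0 * (1.0 - theta)


def elovich(theta, s0=1.0, decay=3.0):
    """Exponential decrease; decay stands in for alpha/kT (assumed value)."""
    return s0 * math.exp(-decay * theta)


def kisliuk(theta, s0=1.0, k=0.1):
    """Precursor-mediated form; k = 1 recovers the Langmuir line (assumed model)."""
    if theta >= 1.0:
        return 0.0
    return s0 / (1.0 + k * theta / (1.0 - theta))


if __name__ == "__main__":
    for theta in (0.0, 0.25, 0.5, 0.75, 0.95):
        print(f"{theta:4.2f}  {langmuir(theta):.3f}  {elovich(theta):.3f}  {kisliuk(theta):.3f}")
```

For small k the precursor curve stays close to the initial sticking coefficient over most of the coverage range and drops only near saturation, whereas the exponential form falls below the Langmuir line at intermediate coverage.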

Another limitation of the Langmuir model is that it does not account for multilayer adsorption. The 
Brunauer, Emmett and Teller (BET) model is a refinement of Langmuir adsorption in which multiple layers
of adsorbates are allowed [29, 31]. In the BET model, the particles in each layer act as the adsorption sites for
the subsequent layers. There are many refinements to this approach, in which parameters such as sticking 
coefficient, activation energy, etc, are considered to be different for each layer. 
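For orientation, the standard textbook form of the BET isotherm can be evaluated as follows. The expression, the BET constant c and the relative pressure p/p0 (with p0 the saturation vapour pressure) are not defined in the text and are introduced here as assumptions; this is a sketch of the simplest version of the model, in which all layers beyond the first are treated identically.

```python
def bet_coverage(relative_pressure, c):
    """Total coverage in ML from the standard BET expression.

    relative_pressure is x = p/p0 with 0 <= x < 1; c is the dimensionless
    BET constant comparing first-layer and multilayer binding.
    """
    x = relative_pressure
    if not 0.0 <= x < 1.0:
        raise ValueError("relative pressure must satisfy 0 <= p/p0 < 1")
    return c * x / ((1.0 - x) * (1.0 + (c - 1.0) * x))


if __name__ == "__main__":
    for x in (0.01, 0.1, 0.5, 0.9):
        print(f"p/p0 = {x:4.2f}: coverage = {bet_coverage(x, 100.0):.2f} ML")
```

Unlike the Langmuir isotherm, the coverage is not limited to a single layer and grows without bound as the pressure approaches the saturation vapour pressure.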

A1.7.3.5 ADSORPTION SITES

When atoms, molecules, or molecular fragments adsorb onto a single-crystal surface, they often arrange 
themselves into an ordered pattern. Generally, the size of the adsorbate-induced two-dimensional surface unit 
cell is larger than that of the clean surface. The same nomenclature is used to describe the surface unit cell of 
an adsorbate system as is used to describe a reconstructed surface, i.e. the symmetry is given with respect to 
the bulk terminated (unreconstructed) two-dimensional surface unit cell. 

When chemisorption takes place, there is a strong interaction between the adsorbate and the substrate. The 
details of this interaction determine the local bonding site, particularly at the lowest coverages. At higher 
coverages, adsorbate-adsorbate interactions begin to also play a role. Most non-metallic atoms will adsorb 
above the surface at specific lattice sites. Some systems have multiple bonding sites. In this case, one site will 
usually dominate at low coverage, but a second, less stable site will be filled at higher coverages. Some 
adsorbates will interact with only one surface atom, i.e. be singly coordinated, while others prefer multiply
coordinated adsorption sites. Other systems may form alloys or intermix during adsorption.

Local adsorption sites can be roughly classified either as on-top, bridge or hollow, as illustrated for a four-fold
symmetric surface in figure A1.7.9. In the on-top configuration, a singly coordinated adsorbate is attached
directly on top of a substrate atom. A bridge site is the two-fold site between two neighbouring surface atoms.
A hollow site is positioned between three or four surface atoms, for surfaces with three- or four-fold
symmetry, respectively.

Figure A1.7.9. Schematic diagram illustrating three types of adsorption sites.
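The coordination of the three site types on a four-fold symmetric surface can be checked with a small geometric sketch. It assumes an ideal square lattice of substrate atoms with spacing a and considers only in-plane distances; the site coordinates are given in units of a, and the names used are illustrative.

```python
import math


def nearest_substrate_atoms(x, y, a=1.0, tol=1e-9):
    """Return (in-plane distance, count) of the closest atoms of a square lattice."""
    distances = []
    for i in range(-2, 4):
        for j in range(-2, 4):
            distances.append(math.hypot(x - i * a, y - j * a))
    d_min = min(distances)
    count = sum(1 for d in distances if abs(d - d_min) < tol)
    return d_min, count


# Site positions in units of the lattice spacing (assumed ideal geometry)
SITES = {"on-top": (0.0, 0.0), "bridge": (0.5, 0.0), "hollow": (0.5, 0.5)}

if __name__ == "__main__":
    for name, (x, y) in SITES.items():
        d_min, count = nearest_substrate_atoms(x, y)
        print(f"{name:7s}: {count} nearest substrate atom(s) at in-plane distance {d_min:.3f} a")
```

The counts of one, two and four nearest substrate atoms correspond to the singly coordinated on-top site, the two-fold bridge site and the four-fold hollow site described above.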

There are interactions between the adsorbates themselves, which greatly affect the structure of the adsorbates 
[32]. If surface diffusion is sufficiently facile during or following the adsorption step, attractive interactions 
can induce the adsorbates to form islands in which the local adsorbate concentration is quite high. Other 
adsorbates may repel each other at low coverages, forming structures in which the distance between adsorbates
is maximized. Certain co-adsorption systems form complex ordered overlayer structures. The driving force for
forming ordered overlayers is provided by these adsorbate-adsorbate interactions. These interactions dominate
the long-range structure of the surface in the same way that long-range interactions cause the formation of
three-dimensional solid crystals.

Adsorbed atoms and molecules can also diffuse across terraces from one adsorption site to another [33]. On a
perfect terrace, adatom diffusion could be considered as a 'random walk' between adsorption sites, with a
diffusivity that depends on the barrier height between neighbouring sites and the surface temperature [29].
The diffusion of adsorbates has been studied with FIM [14], STM [34, 35] and laser-induced thermal
desorption [36].
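The random-walk picture can be made concrete with a short sketch. It assumes uncorrelated nearest-neighbour hops on a square lattice, a total hop rate of Arrhenius form with an attempt frequency of 10^13 s^-1, and a site spacing of 3 Å; none of these values comes from the text. For such a walk the mean square displacement after N hops is N a^2, which corresponds to a diffusivity of (hop rate) a^2/4.

```python
import math
import random

K_B_EV = 8.617333e-5  # Boltzmann constant, eV/K


def hop_rate(barrier_ev, temperature_k, attempt_hz=1.0e13):
    """Total hop rate in s^-1, assuming simple Arrhenius behaviour."""
    return attempt_hz * math.exp(-barrier_ev / (K_B_EV * temperature_k))


def diffusivity(barrier_ev, temperature_k, site_spacing_cm=3.0e-8, attempt_hz=1.0e13):
    """D = (hop rate) * a^2 / 4 for a two-dimensional square lattice, in cm^2/s."""
    return hop_rate(barrier_ev, temperature_k, attempt_hz) * site_spacing_cm ** 2 / 4.0


def mean_square_displacement(n_hops, n_walkers=2000, seed=0):
    """Average squared displacement, in units of a^2, from simulated walks."""
    rng = random.Random(seed)
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    total = 0
    for _ in range(n_walkers):
        x = y = 0
        for _ in range(n_hops):
            dx, dy = rng.choice(steps)
            x += dx
            y += dy
        total += x * x + y * y
    return total / n_walkers


if __name__ == "__main__":
    print(mean_square_displacement(200))   # close to 200, i.e. N * a^2
    print(diffusivity(0.5, 300.0), diffusivity(0.5, 600.0))
```

The simulated mean square displacement grows linearly with the number of hops, and the strong temperature dependence of the diffusivity enters entirely through the hop rate.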

A1.7.3.6 ADSORPTION-INDUCED RECONSTRUCTION

When an adsorbate attaches to a surface, the substrate itself may respond to the perturbation by either losing 
its relaxation or reconstruction, or by forming a new reconstruction. This is not surprising, considering the 
strength of a chemisorption bond. Chemisorption bonds can provide electrons to satisfy the requirements for 
charge neutrality or electron pairing that may otherwise be missing at a surface. 

For a reconstructed surface, the effect of an adsorbate can be to provide a more bulk-like environment for the 
outermost layer of substrate atoms, thereby lifting the reconstruction. An example of this is As adsorbed onto
Si(111)-(7 × 7) [37]. Arsenic atoms have one more valence electron than Si. Thus, if an As atom were to
replace each outermost Si atom in the bulk-terminated structure, a smooth surface with no unpaired electrons
would be produced, with a second layer consisting of Si atoms in their bulk positions. Arsenic adsorption has,
in fact, been found to remove the reconstruction and form a Si(111)-(1 × 1)-As structure. This surface has a
particularly high stability due to the absence of dangling bonds. 

An example of the formation of a new reconstruction is given by certain fcc(110) metal surfaces. The clean
surfaces have (1 × 1) symmetry, but become (2 × 1) upon adsorption of oxygen [16, 38]. The (2 × 1)
symmetry is not just due to oxygen being adsorbed into a (2 × 1) surface unit cell, but also because the
substrate atoms rearrange themselves into a new configuration. The reconstruction that occurs is sometimes
called the 'missing-row' structure because every other row of surface atoms along the 2× direction is missing.
A more correct terminology, however, is the 'added-row' structure, as STM studies have shown that it is
formed by metal atoms diffusing away from a step edge and onto a terrace to create a new first layer, rather
than by atoms being removed [16]. In this case, the (2 × 1) symmetry results not just from the long-range
structure of the adsorbed layer, but also
from a rearrangement of the substrate atoms. 

A more dramatic type of restructuring occurs with the adsorption of alkali metals onto certain fcc metal
surfaces [39]. In this case, multilayer composite surfaces are formed in which the alkali and metal atoms are 
intermixed in an ordered structure. These structures involve the substitution of alkali atoms into substrate 
sites, and the details of the structures are found to be coverage-dependent. The structures are influenced by the 
repulsion between the dipoles formed by neighbouring alkali adsorbates and by the interactions of the alkalis 
with the substrate itself [40]. 

There is also an interesting phenomenon that has been observed following the deposition of approximately 1 ML
of a metal onto another metallic substrate. For certain systems, this small coverage is sufficient to alter the
surface energy so that a large-scale faceting of the surface occurs [41]. The morphology of such a faceted
surface can be seen in the STM image of figure A1.7.10, which was collected from an annealed W(111)
surface onto which a small amount of Pd had been deposited.



Figure A1.7.10. STM image (1000 Å × 1000 Å) of the (111) surface of a tungsten single crystal, after it had
been coated with a very thin film of palladium and heated to about 800 K (courtesy of Ted Madey).

A1.7.3.7 WORK FUNCTION CHANGES INDUCED BY ADSORBATES


The surface work function is formally defined as the minimum energy needed in order to remove an electron 
from a solid. It is often described as being the difference in energy between the Fermi level and the vacuum 
level of a solid. The work function is a sensitive measure of the surface electronic structure, and can be 
measured in a number of ways, as described in section B1.26.4. Many processes, such as catalytic surface
reactions or resonant charge transfer between ions and surfaces, are critically dependent on the work function. 

When an electropositive or electronegative adsorbate attaches itself to a surface, there is usually a change in 
the surface dipole, which, in turn, affects the surface work function. Thus, very small coverages of adsorbates 
can be used to modify the surface work function in order to ascertain the role that the work function plays in a 
given process. Conversely, work function measurements can be used to accurately determine the coverage of 
these adsorbates. 


For example, alkali atoms adsorbed onto surfaces donate some or all of their valence electron to the solid,
thereby producing dipoles pointing away from the surface [40, 42]. This has the effect of substantially 
lowering the work function for coverages as small as 0.01 ML. When the alkali coverage is increased to the 
point at which the alkali adsorbates can interact with each other, they tend to depolarize. Thus, the work 
function initially decreases as alkali atoms are adsorbed until a minimum in the work function is attained. At 
higher alkali coverages, the work function may increase slightly due to the adsorbate-adsorbate interactions. 
Note that it is very common to use alkali adsorption as a means of modifying the surface work function. 
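The appearance of a work function minimum can be illustrated with a point-dipole depolarization expression of the Topping type, in which the change in work function varies as -A theta/(1 + B theta^(3/2)). This functional form is a standard model that is not given in the text, and the lumped parameters A and B used below are hypothetical values chosen only to produce a curve of reasonable shape.

```python
def work_function_change(theta, a_ev=10.0, b=8.0):
    """Topping-type estimate: dphi = -a*theta / (1 + b*theta**1.5), in eV.

    a_ev lumps together the isolated-adsorbate dipole and site density;
    b lumps together the polarizability and geometry. Both are assumed values.
    """
    return -a_ev * theta / (1.0 + b * theta ** 1.5)


def coverage_at_minimum(b=8.0):
    """Coverage at which d(dphi)/d(theta) = 0, i.e. where b*theta**1.5 = 2."""
    return (2.0 / b) ** (2.0 / 3.0)


if __name__ == "__main__":
    theta_min = coverage_at_minimum()
    print(f"minimum at theta = {theta_min:.3f} ML, dphi = {work_function_change(theta_min):.2f} eV")
    for theta in (0.01, 0.1, 0.4, 0.8):
        print(f"theta = {theta:4.2f} ML: dphi = {work_function_change(theta):.2f} eV")
```

At very low coverage the change is linear in coverage, since each adsorbate contributes its full dipole, while at higher coverage mutual depolarization reduces the dipole per adsorbate and the work function passes through a minimum before rising slightly.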


A1.7.3.8 SURFACE CHEMICAL REACTIONS 

Surface chemical reactions can be classified into three major categories [29]: 

(a) corrosion reactions, 

(b) crystal growth reactions, 

(c) catalytic reactions. 

All three types of reactions begin with adsorption of species onto a surface from the gas phase. 

In corrosion, adsorbates react directly with the substrate atoms to form new chemical species. The products 
may desorb from the surface (volatilization reaction) or may remain adsorbed in forming a corrosion layer. 
Corrosion reactions have many industrial applications, such as dry etching of semiconductor surfaces. An 
example of a volatilization reaction is the etching of Si by fluorine [43]. In this case, fluorine reacts with the
Si surface to form SiF$_4$ gas. Note that the crystallinity of the remaining surface is also severely disrupted by
this reaction. An example of corrosion layer formation is the oxidation of Fe metal to form rust. In this case,
none of the products are volatile, but the crystallinity of the surface is disrupted as the bulk oxide forms.
Corrosion and etching reactions are discussed in more detail in section A3.10 and section C2.9.

The growth of solid films onto solid substrates allows for the production of artificial structures that can be 
used for many purposes. For example, film growth is used to create pn junctions and metal-semiconductor 
contacts during semiconductor manufacture, and to produce catalytic surfaces with properties that are not 
found in any single material. Lubrication can be applied to solid surfaces by the appropriate growth of a solid 
lubricating film. Film growth is also used to fabricate quantum wells and other types of layered structures that
have unique electronic properties.
These reactions may involve dissociative or non-dissociative adsorption as the first step. The three basic types 
of film growth reactions are physical vapour deposition (PVD), chemical vapour deposition (CVD) and 
molecular beam epitaxy (MBE). In PVD, an atomic gas is condensed onto a surface forming a solid. In CVD, 
a molecular gas dissociates upon adsorption. Some of the dissociation fragments solidify to form the material, 
while other dissociation fragments are evolved back into the gas phase. In MBE, carefully controlled atomic 
and/or molecular beams are condensed onto a surface in the proper stoichiometry in order to grow a desired 
material [44]. MBE is particularly important in the growth of III-V semiconductor materials. 

In crystal growth reactions, material is deposited onto a surface in order to extend the surface crystal structure, 
or to grow a new material, without disruption of the underlying substrate. Growth mechanisms can be roughly 
divided into three categories. If the film grows one atomic layer at a time such that a smooth, uniform film is
created, it is called Frank-van der Merwe growth. Such layer-by-layer growth will occur if the surface energy
of the overlayer is lower than that of the substrate. If the film grows in a Frank-van der Merwe growth mode such
that it forms a single crystal in registry with the substrate, it is referred to as epitaxial. The smaller the lattice
mismatch between the overlayer and the substrate, the more likely it is that epitaxial growth can be achieved.
If the first ML is deposited uniformly, but subsequent layers agglomerate into islands, it is called Stranski-
Krastanov growth. In this case, the surface energy of the first layer is lower than that of the substrate, but the
surface energy of the bulk overlayer material is higher. If the adsorbate agglomerates into islands
immediately, without even wetting the surface, it is referred to as Volmer-Weber growth. In this case, the
surface energy of the substrate is lower than that of the overlayer. Growth reactions are discussed in more
detail in section A3.10.
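The surface-energy criteria for the three growth modes can be written as a simple decision rule. The sketch below encodes only the qualitative comparisons stated above; it ignores the interface energy and strain, which a fuller treatment would include, and the function name and arguments are illustrative.

```python
def growth_mode(gamma_substrate, gamma_first_layer, gamma_bulk_overlayer):
    """Classify film growth from surface energies (any consistent units).

    Simplified rule of thumb: interface energy and lattice strain are neglected.
    """
    if gamma_first_layer >= gamma_substrate:
        # The overlayer does not wet the substrate: islands form immediately
        return "Volmer-Weber"
    if gamma_bulk_overlayer <= gamma_substrate:
        # Every layer lowers the surface energy: layer-by-layer growth
        return "Frank-van der Merwe"
    # The first layer wets, but thicker material prefers to form islands
    return "Stranski-Krastanov"


if __name__ == "__main__":
    print(growth_mode(2.0, 1.0, 1.5))
    print(growth_mode(2.0, 1.0, 2.5))
    print(growth_mode(1.0, 2.0, 2.0))
```

The numerical values in the usage example are arbitrary and serve only to exercise each branch of the rule.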


The desire to understand catalytic chemistry was one of the motivating forces underlying the development of 
surface science. In a catalytic reaction, the reactants first adsorb onto the surface and then react with each 
other to form volatile product(s). The substrate itself is not affected by the reaction, but the reaction would not 
occur without its presence. Types of catalytic reactions include exchange, recombination, unimolecular 
decomposition, and bimolecular reactions. A reaction would be considered to be of the Langmuir- 
Hinshelwood type if both reactants first adsorbed onto the surface, and then reacted to form the products. If 
one reactant first adsorbs, and the other then reacts with it directly from the gas phase, the reaction is of the
Eley-Rideal type. Catalytic reactions are discussed in more detail in section A3.10 and section C2.8.

A tremendous amount of work has been done to delineate the detailed reaction mechanisms for many catalytic 
reactions on well characterized surfaces [1, 45]. Many of these studies involved impinging molecules onto 
surfaces at relatively low pressures, and then interrogating the surfaces in vacuum with surface science 
techniques. For example, a useful technique for catalytic studies is TPD, as the reactants can be adsorbed onto 
the sample in one step, and the products formed in a second step when the sample is heated. Note that 
catalytic surface studies have also been performed by reacting samples in a high-pressure cell, and then 
returning them to vacuum for measurement. 

Recently, in situ studies of catalytic surface chemical reactions at high pressures have been undertaken [46,
47]. These studies employed sum frequency generation (SFG) and STM in order to probe the surfaces as the
reactions are occurring under conditions similar to those employed for industrial catalysis (SFG is a laser-
based technique that is described in section A1.7.5.5 and section B1.22). These studies have shown that the
highly stable adsorbate sites that are probed under vacuum conditions are not necessarily the same sites that 
are active in high-pressure catalysis. Instead, less stable sites that are only occupied at high pressures are often 
responsible for catalysis. Because the active adsorption sites are not populated at low pressures, they are not
seen in vacuum surface science experiments.
Despite this, however, the low-pressure experiments are necessary in order to calibrate the spectroscopy so 
that the high-pressure results can be properly interpreted. 


A1.7.4 PREPARATION OF CLEAN SURFACES 

The exact methods employed to prepare any particular surface for study vary from material to material, and 
are usually determined empirically. In some respects, sample preparation is more of an art than a science. 
Thus, it is always best to consult the literature to look for preparation methods before starting with a new 
material. 

Most samples require some initial ex situ preparation before insertion into a vacuum chamber [45]. A bulk 
single crystal must first be oriented [48], which is usually done with back-reflection Laue x-ray diffraction, 
and then cut to expose the desired crystal plane. Samples are routinely prepared to be within ±1° of the
desired orientation, but an accuracy of ±1/4° or better can be obtained. Cutting is often done using
an electric discharge machine (spark cutter) for metals or a diamond saw or slurry drill for semiconductors. 
The surface must then be polished. Most polishing is done mechanically, with alumina or diamond paste, by 
polishing with finer and finer grits until the finest available grit is employed, which is usually of the order of 
0.5 µm. Often, as a final step, the surface is electrochemically or chemi-mechanically polished. In addition,
some samples are chemically reacted in solution in order to remove a large portion of the oxide layer that is 
present due to reaction with the atmosphere. Note that this layer is referred to as the native oxide. 


In order to maintain the cleanliness of a surface at the atomic level, investigations must be carried out in ultra-
high vacuum (UHV). UHV is usually considered to be a pressure of the order of $1 \times 10^{-10}$ Torr or below.
Surface science techniques are often sensitive to adsorbate levels as small as 1% of a ML or less, so that great
care must be taken to keep the surface contamination to a minimum. Even at moderate pressures, many
contaminants will easily adsorb onto a surface. For example, at $1 \times 10^{-6}$ Torr, which is a typical pressure
realized by many diffusion-pumped systems, a 1 L exposure to the background gases will occur in 1 s. Thus,
any molecule that is present in the background and has a high sticking probability, such as water or oxygen, 
will cover the surface within seconds. It is for this reason that extremely low pressures are necessary in order 
to keep surfaces contaminant-free at the atomic level. 
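
The arithmetic behind this estimate can be sketched in a few lines of Python. The sketch is illustrative only: it assumes the usual definition of the langmuir (1 L = 10⁻⁶ Torr s), and the function name time_for_exposure is introduced here purely for illustration. Treating a 1 L exposure as roughly one monolayer of coverage holds only for a sticking probability near unity.

```python
# Exposure bookkeeping for a surface held at constant background pressure.
# Assumption: 1 langmuir (L) = 1e-6 Torr * 1 s.
LANGMUIR_TORR_S = 1.0e-6


def time_for_exposure(pressure_torr: float, exposure_langmuir: float = 1.0) -> float:
    """Return the time in seconds needed to accumulate the given exposure."""
    if pressure_torr <= 0:
        raise ValueError("pressure must be positive")
    return exposure_langmuir * LANGMUIR_TORR_S / pressure_torr


if __name__ == "__main__":
    # Compare a diffusion-pumped pressure with progressively better vacuum.
    for pressure in (1e-6, 1e-8, 1e-10):
        seconds = time_for_exposure(pressure)
        print(f"{pressure:.0e} Torr: 1 L accumulates in {seconds:.0f} s")
```

At 1 × 10⁻¹⁰ Torr the same 1 L exposure takes of the order of 10⁴ s, i.e. a few hours, which is what gives a UHV experiment a practical working time.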

Once a sample is properly oriented and polished, it is placed into a UHV chamber for the final preparation 
steps. Samples are processed in situ by a variety of methods in order to produce an atomically clean and flat 
surface. Ion bombardment and annealing (IBA) is the most common method used. Other methods include 
cleaving and film growth. 

In IBA, the samples are first irradiated for a period of time with noble gas ions, such as Ar⁺ or Ne⁺, that have 
kinetic energies in the range of 0.5-2.0 keV. This removes the outermost layers of adsorbed contaminants and 
oxides by the process of sputtering. In sputtering, ions directly collide with the atoms at the surface of the 
sample, physically knocking out material. Usually the sample is at room temperature during sputtering and the 
ion beam is incident normal to the surface. Certain materials, however, are better prepared by sputtering at 
elevated temperature or with different incidence directions. 

Because keV ions penetrate several layers deep into a solid, a side effect of sputtering is that it destroys the 
crystallinity of the surface region. In the preparation of a single-crystal surface, the damage is removed by 
annealing (heating) the surface in UHV in order to re-crystallize it. Care must be taken to not overheat the 
sample for (at least) two reasons. First, surfaces will melt and/or sublime well below the melting point of the 
bulk material. Second, contaminants sometimes diffuse to the surface from the bulk at high temperatures. If 
the annealing temperature is not high enough, however, the material will not be sufficiently well ordered. 
Thus, care must be taken to determine the optimal annealing temperature for any given material. 

After a sample has been sputtered to remove the contaminants and then annealed at the proper temperature to 
re-crystallize the surface region, a clean, atomically smooth and homogeneous surface can be produced. Note, 
however, that it usually takes many cycles of IBA to produce a good surface. This is because a side effect of 
annealing is that the chamber pressure is raised as adsorbed gases are emitted from the sample holder, which 
causes additional contaminants to be deposited on the surface. Also, contaminants may have diffused to the 
surface from the bulk during annealing. Another round of sputtering is then needed to remove these additional 
contaminants. After a sufficient number of cycles, the contaminants in either the sample holder or the bulk 
solid are depleted to the point that annealing does not significantly contaminate the surface. 

For some materials, the most notable being silicon, heating alone suffices to clean the surface. Commercial Si 
wafers are produced with a thin layer of silicon dioxide covering the surface. This native oxide is inert to 
reaction with the atmosphere, and therefore keeps the underlying Si material clean. The native oxide layer is 
desorbed, i.e. removed into the gas phase, by heating the wafer in UHV to a temperature above approximately 
1100 °C. This procedure directly forms a clean, well ordered Si surface. 

At times, in situ chemical treatments are used to remove particular contaminants. This is done by introducing
a low pressure (~10⁻⁶ Torr) of gas to the vacuum chamber, which causes it to adsorb (stick) to the sample
surface, followed by heating the sample to remove the adsorbates. The purpose is to induce a chemical
reaction between the contaminants and the adsorbed gas to form a volatile product. For example, carbon can
be removed by exposing a surface to hydrogen gas and then heating it. This procedure produces methane gas, 
which desorbs from the surface into the vacuum. Similarly, hydrogen adsorption can be used to remove 
oxygen by forming gaseous water molecules. 

Certain materials, most notably semiconductors, can be mechanically cleaved along a low-index crystal plane 
in situ in a UHV chamber to produce an ordered surface without contamination. This is done using a sharp 
blade to slice the sample along its preferred cleavage direction. For example, Si cleaves along the (111) plane, 
while III-V semiconductors cleave along the (110) plane. Note that the atomic structure of a cleaved surface 
is not necessarily the same as that of the same crystal face following treatment by IBA. 

In addition, ultra-pure films are often grown in situ by evaporation of material from a filament or crucible, by 
molecular beam epitaxy (MBE), or with the use of chemical methods. Since the films are grown in UHV, the 
surfaces as grown will be atomically clean. Film growth has the advantage of producing a much cleaner 
and/or more highly ordered surface than could be obtained with IBA. In addition, certain structures can be 
formed with MBE that cannot be produced by any other preparation method. Film growth is discussed more 
explicitly above in section A1.7.3.8 and in section A3.10. 


A1.7.5 TECHNIQUES FOR THE INVESTIGATION OF SURFACES 

Because surface science employs a multitude of techniques, it is necessary that any worker in the field be 
acquainted with at least the basic principles underlying the most popular ones. These will be briefly described 
here. For a more detailed discussion of the physics underlying the major surface analysis techniques, see the 
appropriate chapter in this encyclopedia, or [49]. 

With the exception of the scanning probe microscopies, most surface analysis techniques involve scattering of 
one type or another, as illustrated in figure A1.7.11. A particle is incident onto a surface, and its interaction 
with the surface either causes a change to the particle's energy and/or trajectory, or the interaction induces the 
emission of a secondary particle(s). The particles that interact with the surface can be electrons, ions, photons 
or even heat. An analysis of the mass, energy and/or trajectory of the emitted particles, or the dependence of 
the emitted particle yield on a property of the incident particles, is used to infer information about the surface. 
Although these probes are indirect, they do provide reliable information about the surface composition and 
structure. 

Figure A1.7.11. Schematic diagram of a generic surface science experiment. Particles, such as photons, 
electrons, or ions, are incident onto a solid surface, while the particles emitted from the surface are collected 
and measured by the detector. 


Energetic particles interacting with a surface can also modify its structure and/or stimulate chemical processes on it. 
Absorbed particles excite electronic and/or vibrational (phonon) states in the near-surface region. Some 
surface scientists investigate the fundamental details of particle-surface interactions, while others are 
concerned about monitoring the changes to the surface induced by such interactions. Because of the 
importance of these interactions, the physics involved in both surface analysis and surface modification is 
discussed in this section. 

The instrumentation employed for these studies is almost always housed inside a stainless-steel UHV 
chamber. One UHV chamber usually contains equipment for performing many individual techniques, each 
mounted on a different port, so that they can all be applied to the same sample. The sample is mounted onto a 
manipulator that allows for movement of the sample from one port to another, as well as for in situ heating 
and often cooling with liquid nitrogen (or helium). The chamber contains facilities for sample preparation, 
such as sputtering and annealing, as well as the possibility for gaseous exposures and/or film growth. Many 
instruments also contain facilities for the transfer of the sample from one chamber to another while
maintaining UHV. This allows for the incorporation of even more 
techniques, as well as the easy introduction of new samples into the chamber via a load-lock mechanism. 
Sample transfer into a reaction chamber also allows for the exposure of samples at high pressures or with 
corrosive gases or liquids that could not otherwise be introduced into a UHV chamber. 

Below are brief descriptions of some of the particle-surface interactions important in surface science. The 
descriptions are intended to provide a basic understanding of how surfaces are probed, as most of the 
information that we have about surfaces was obtained through the use of techniques that are based on such 
interactions. The section is divided into some general categories, and the important physics of the interactions 
used for analysis are emphasized. All of these techniques are described in greater detail in subsequent sections 
of the encyclopaedia. Also, note that there are many more techniques than just those discussed here. These 
particular techniques were chosen not to be comprehensive, but instead to illustrate the kind of information 
that can be obtained from surfaces and interfaces. 

A1.7.5.1 ELECTRON SPECTROSCOPY 

Electrons are extremely useful as surface probes because the distances that they travel within a solid before 
scattering are rather short. This implies that any electrons that are created deep within a sample do not escape 
into vacuum. Any technique that relies on measurements of low-energy electrons emitted from a solid 
therefore provides information from just the outermost few atomic layers. Because of this inherent surface 
sensitivity, the various electron spectroscopies are probably the most useful and popular techniques in surface 
science. 

Electrons interact with solid surfaces by elastic and inelastic scattering, and these interactions are employed in 
electron spectroscopy. For example, electrons that elastically scatter will diffract from a single-crystal lattice. 
The diffraction pattern can be used as a means of structural determination, as in LEED. Electrons scatter 
inelastically by inducing electronic and vibrational excitations in the surface region. These losses form the 
basis of electron energy loss spectroscopy (EELS). An incident electron can also knock out an inner-shell, or 
core, electron from an atom in the solid that will, in turn, initiate an Auger process. Electrons can also be used 
to induce stimulated desorption, as described in section A1.7.5.6. 

Figure A1.7.12 shows the scattered electron kinetic energy distribution produced when a monoenergetic
electron beam is incident on an Al surface. Some of the electrons are elastically backscattered with essentially
no energy loss, as evidenced by the elastic peak. Others lose energy inelastically, however, by inducing
particular excitations in the solid, but are then emitted from the surface by elastic backscattering. The plasmon
loss features seen in figure A1.7.12 represent scattered electrons that have lost energy inelastically by
excitation of surface plasmons. A plasmon is a collective excitation of substrate electrons, and a single 
plasmon excitation typically has an energy in the range of 5-25 eV. A small feature due to the emission of 
Auger electrons is also seen in the figure. Finally, the largest feature in the spectrum is the inelastic tail. The 
result of all of the electronic excitations is the production of a cascade of secondary electrons that are ejected 
from the surface. The intensity of the secondary electron 'tail' increases as the kinetic energy is reduced, until 
the cutoff energy is reached. The exact position of the cutoff is determined by the surface work function, and, 
in fact, is often used to measure the work function changes as the surface composition is modified. 

Figure A1.7.12. Secondary electron kinetic energy distribution, obtained by measuring the scattered electrons 
produced by bombardment of Al(100) with a 170 eV electron beam. The spectrum shows the elastic peak, loss 
features due to the excitation of plasmons, a signal due to the emission of Al LMM Auger electrons and the 
inelastic tail. The exact position of the cutoff at eV depends on the surface work function. 

The inelastic mean free path (IMFP) is often used to quantify the surface sensitivity of electron spectroscopy. 
The IMFP is the average distance that an electron travels through a solid before it loses energy by inelastic 
scattering. The minimum in the IMFP for electrons travelling in a solid occurs just above the plasmon energy, 
as these electrons have the highest probability for excitation. Thus, for most materials, the electrons with the 
smallest mean free path are those with approximately 25-50 eV of kinetic energy [50]. When performing 
electron spectroscopy for quantitative analysis, it is necessary to define the mean escape depth (MED), rather 
than just use the IMFP [51]. The MED is the average depth below the surface from which electrons have 
originated, and includes losses by all possible elastic and inelastic mechanisms. Typical values of the MED 
for 10-1000 eV electrons are in the range of 4-10 Å, which is of the order of the interlayer spacings of a solid 
[52, 53]. Electron attenuation is modelled by assuming that the yield of electrons originating from a particular 
depth within the sample decreases exponentially with increasing depth, i.e., 


$$\text{Number of electrons} = \exp(-d/\lambda), \qquad \text{(A1.7.6)}$$

where $\lambda$ is the MED for the particular material and $d$ is the distance below the surface from which the
electron originated. This consideration allows measurements of depth distributions by changing either the
electron kinetic energy or the emission angle in order to vary $\lambda$.
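
As a rough numerical illustration of equation (A1.7.6), the following Python sketch evaluates the exponential attenuation and the fraction of the total signal that originates within a given depth of a uniform sample. The function names are hypothetical, and the cosine dependence on emission angle is an added assumption (straight-line electron trajectories, elastic scattering neglected) rather than something stated in the text.

```python
import math


def relative_yield(depth: float, med: float) -> float:
    """Relative yield exp(-d/MED) of electrons that originate at a given depth.

    depth and med must be in the same length unit (for example angstroms).
    """
    return math.exp(-depth / med)


def fraction_from_top(depth: float, med: float) -> float:
    """Fraction of the total signal that comes from between the surface and depth.

    Obtained by integrating exp(-z/MED) from 0 to depth and normalising by the
    integral from 0 to infinity, for a uniform semi-infinite sample.
    """
    return 1.0 - math.exp(-depth / med)


def effective_med(med: float, emission_angle_deg: float) -> float:
    """Effective MED along the surface normal for off-normal collection.

    Assumption: straight-line trajectories, so the depth sampled scales with
    the cosine of the emission angle measured from the surface normal.
    """
    return med * math.cos(math.radians(emission_angle_deg))


if __name__ == "__main__":
    med = 5.0  # angstroms, an illustrative value inside the quoted 4-10 range
    for depth in (5.0, 10.0, 15.0):
        print(f"top {depth:4.1f} A contributes {fraction_from_top(depth, med):.1%}")
    print(f"effective MED at 60 deg: {effective_med(med, 60.0):.2f} A")
```

Under these assumptions, about 95% of the detected signal comes from within three escape depths of the surface, and collecting electrons at grazing emission angles makes the measurement more surface sensitive.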

A popular electron-based technique is Auger electron spectroscopy (AES), which is described in section
B1.25.2.2. In AES, a 3-5 keV electron beam is used to knock out inner-shell, or core, electrons from atoms in
the near-surface region of the material. Core holes are unstable, and are soon filled by either fluorescence or
Auger decay. In the Auger process, one valence, or lower-lying core, electron fills the hole while another is
emitted from the sample, in 
order to satisfy conservation of energy. The emitted Auger electrons have kinetic energies that are 
characteristic of a particular element. The Perkin-Elmer Auger handbook contains sample spectra of each 
element, along with information on the relative sensitivity of each Auger line [54]. AES is most useful as a 
quantitative measure of the surface atomic composition, and is a standard technique employed to determine 
sample cleanliness. The ratio of the AES signal from an adsorbate to that of the substrate is also commonly 
used to quantify the coverage of an adsorbate. 

LEED is used primarily to ascertain the crystallinity and symmetry of a single-crystal surface, but can also be 
used to obtain detailed structural information [55, 56]. LEED is described in detail in section B1.21. In LEED, 
a 20-200 eV electron beam is incident upon a single-crystal surface along the sample normal. The angular 
distribution of the elastically scattered electrons is then measured, usually by viewing a phosphorescent 
screen. At certain angles, there are spots that result from the diffraction of electrons. The symmetry of the 
pattern of spots is representative of the two-dimensional unit cell of the surface. Note, however, that the 
spacings between LEED spots provide distances in inverse space, i.e. more densely packed LEED spots 
correspond to larger surface unit cells. The sharpness of the spots is an indication of the average size of the 
ordered domains on the surface. In order to extract detailed atomic positions from LEED, the intensity of the 
spots as a function of the electron energy, or intensity-voltage (I-V) curves, are collected and then compared 
to theoretical predictions for various surface structures [55, 56]. LEED I-V analysis is capable of providing 
structural details to an accuracy of 0.01 Å. LEED is probably the most accurate structural technique available, 
but it will only work for structures that are not overly complex. 
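
The inverse relationship between spot spacing and unit cell size can be made concrete with a short Python sketch. It assumes non-relativistic electrons, normal incidence and the one-dimensional grating condition sin θ = nλ/a for rows of atoms separated by a distance a. The function names and the example spacings are illustrative assumptions, not values taken from the text.

```python
import math

# CODATA values in SI units.
PLANCK_H = 6.62607015e-34         # J s
ELECTRON_MASS = 9.1093837015e-31  # kg
JOULE_PER_EV = 1.602176634e-19    # J


def electron_wavelength_angstrom(energy_ev: float) -> float:
    """Non-relativistic de Broglie wavelength of an electron, in angstroms."""
    momentum = math.sqrt(2.0 * ELECTRON_MASS * energy_ev * JOULE_PER_EV)
    return PLANCK_H / momentum * 1e10


def diffraction_angle_deg(energy_ev: float, row_spacing_angstrom: float, order: int = 1):
    """Angle from the surface normal of the n-th order beam at normal incidence.

    Uses sin(theta) = n * wavelength / a. Returns None when the beam does not
    emerge from the surface (the sine would exceed 1).
    """
    sine = order * electron_wavelength_angstrom(energy_ev) / row_spacing_angstrom
    if sine > 1.0:
        return None
    return math.degrees(math.asin(sine))


if __name__ == "__main__":
    energy = 100.0  # eV, inside the 20-200 eV range used in LEED
    print(f"wavelength at {energy:.0f} eV: {electron_wavelength_angstrom(energy):.3f} A")
    for spacing in (2.5, 5.0):  # hypothetical row spacings in angstroms
        angle = diffraction_angle_deg(energy, spacing)
        print(f"row spacing {spacing} A -> first-order beam at {angle:.1f} deg")
```

Doubling the real-space periodicity moves the first-order beam closer to the specular direction, so a larger surface unit cell produces a denser spot pattern. The wavelengths involved, roughly 1 Å, are comparable to interatomic distances, which is why diffraction is observed at these energies.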

The excitation of surface quanta can be monitored directly with EELS, as discussed in section B1.7 and 
section B1.25.5. In EELS, a monoenergetic electron beam is incident onto a surface and the kinetic energy 
distribution of the scattered electrons is collected. The kinetic energy distribution will display peaks 
corresponding to electrons that have lost energy by exciting transitions in the near-surface region, such as the 
plasmon loss peaks shown in figure A1.7.12. EELS can be used to probe electronic transitions, in which case 
incident electron energies in the range of 10-100 eV are used. More commonly, however, EELS is used to 
probe low-energy excitations, such as molecular vibrations or phonon modes [57]. In this case, very low 
incident electron energies (<10 eV) are employed and a very high energy resolution is required. When EELS 
is performed in this manner, the technique is known as high-resolution electron energy loss spectroscopy 
(HREELS). 

Photoelectron spectroscopy provides a direct measure of the filled density of states of a solid. The kinetic 
energy distribution of the electrons that are emitted via the photoelectric effect when a sample is exposed to a 
monochromatic ultraviolet (UV) or x-ray beam yields a photoelectron spectrum. Photoelectron spectroscopy 
not only provides the atomic composition, but also information concerning the chemical environment of the 
atoms in the near-surface region. Thus, it is probably the most popular and useful surface analysis technique. 
There are a number of forms of photoelectron spectroscopy in common use. 


X-ray photoelectron spectroscopy (XPS), also called electron spectroscopy for chemical analysis (ESCA), is 
described in section B1.25.2.1. The most commonly employed x-rays are the Mg Kα (1253.6 eV) and the Al 
Kα (1486.6 eV) lines, which are produced from a standard x-ray tube. Peaks are seen in XPS spectra that 
correspond to the bound core-level electrons in the material. The intensity of each peak is proportional to the 
abundance of the emitting atoms in the near-surface region, while the precise binding energy of each peak 
depends on the chemical oxidation state and local environment of the emitting atoms. The Perkin-Elmer XPS 
handbook contains sample spectra of each element and binding energies for certain compounds [58]. 
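
The conversion from a measured kinetic energy to a binding energy follows from energy conservation in the photoelectric effect. The Python sketch below assumes the standard relation E_B = hν − E_K − φ, where φ is the spectrometer work function. The default value of φ and the function names are placeholders chosen for illustration, since in practice φ is fixed by calibrating the instrument.

```python
# Photon energies of the two laboratory x-ray lines quoted in the text, in eV.
PHOTON_ENERGY_EV = {
    "Mg K-alpha": 1253.6,
    "Al K-alpha": 1486.6,
}


def binding_energy_ev(kinetic_energy_ev: float,
                      source: str = "Al K-alpha",
                      work_function_ev: float = 4.5) -> float:
    """Binding energy relative to the Fermi level from a measured kinetic energy.

    Assumption: E_B = h*nu - E_K - phi, with phi the spectrometer work
    function. The default of 4.5 eV is only a placeholder.
    """
    return PHOTON_ENERGY_EV[source] - kinetic_energy_ev - work_function_ev


def kinetic_energy_ev(binding_energy: float,
                      source: str = "Al K-alpha",
                      work_function_ev: float = 4.5) -> float:
    """Inverse of binding_energy_ev: where a core level appears on the kinetic energy scale."""
    return PHOTON_ENERGY_EV[source] - binding_energy - work_function_ev


if __name__ == "__main__":
    level = 285.0  # eV, a hypothetical core-level binding energy
    for line in PHOTON_ENERGY_EV:
        print(f"{line}: photoelectrons appear at {kinetic_energy_ev(level, line):.1f} eV")
```

A photoelectron peak therefore moves by 233 eV on the kinetic energy scale when the source is switched between the two lines, whereas an Auger peak stays at a fixed kinetic energy. This is a common way of telling the two kinds of feature apart.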

XPS is also often performed employing synchrotron radiation as the excitation source [59]. This technique is 
sometimes called soft x-ray photoelectron spectroscopy (SXPS) to distinguish it from laboratory XPS. The 
use of synchrotron radiation has two major advantages: (1) a much higher spectral resolution can be achieved 
and (2) the photon energy of the excitation can be adjusted which, in turn, allows for a particular electron 
kinetic energy to be selected. 

One of the more recent advances in XPS is the development of photoelectron microscopy [60]. By either 
focusing the incident x-ray beam, or by using electrostatic lenses to image a small spot on the sample, 
spatially-resolved XPS has become feasible. The limits to the spatial resolution are currently of the order of 1 
μm, but are expected to improve. This technique has many technological applications. For example, the 
chemical makeup of micromechanical and microelectronic devices can be monitored on the scale of the 
device dimensions. 

Ultraviolet photoelectron spectroscopy (UPS) is a variety of photoelectron spectroscopy that is aimed at 
measuring the valence band, as described in section B1.25.2.3. Valence band spectroscopy is best performed 
with photon energies in the range of 20-50 eV. A He discharge lamp, which can produce 21.2 or 40.8 eV 
photons, is commonly used as the excitation source in the laboratory, or UPS can be performed with 
synchrotron radiation. Note that UPS is sometimes just referred to as photoelectron spectroscopy (PES), or 
simply valence band photoemission. 

A particularly useful variety of UPS is angle-resolved photoelectron spectroscopy (ARPES), also called 
angle-resolved ultraviolet photoelectron spectroscopy (ARUPS) [61, 62]. In this technique, measurements are 
made of the valence band photoelectrons emitted into a small angle as the electron emission angle or photon 
energy is varied. This allows for the simultaneous determination of the kinetic energy and momentum of the 
photoelectrons with respect to the two-dimensional surface Brillouin zone. From this information, the 
electronic band structure of a single-crystal material can be experimentally determined. 
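
How an emission angle and a kinetic energy translate into a point in the surface Brillouin zone can be sketched as follows. The sketch assumes the free-electron relation for the photoelectron in vacuum and uses the fact that the component of momentum parallel to the surface is conserved on emission (to within a surface reciprocal lattice vector), so that k∥ = (2mE_K)^(1/2) sin θ / ħ. The function name and the example numbers are illustrative assumptions.

```python
import math

# CODATA values in SI units.
HBAR = 1.054571817e-34            # J s
ELECTRON_MASS = 9.1093837015e-31  # kg
JOULE_PER_EV = 1.602176634e-19    # J


def k_parallel_inv_angstrom(kinetic_energy_ev: float, emission_angle_deg: float) -> float:
    """Surface-parallel wavevector of a photoelectron, in inverse angstroms.

    emission_angle_deg is measured from the surface normal. The electron is
    treated as free in vacuum: k = sqrt(2 m E_K) / hbar.
    """
    k_total = math.sqrt(2.0 * ELECTRON_MASS * kinetic_energy_ev * JOULE_PER_EV) / HBAR
    return k_total * math.sin(math.radians(emission_angle_deg)) * 1e-10


if __name__ == "__main__":
    # Hypothetical example: electrons from near the Fermi level excited by
    # 21.2 eV photons, with an assumed work function of 4.5 eV.
    kinetic_energy = 21.2 - 4.5
    for angle in (0.0, 15.0, 30.0, 45.0):
        k_par = k_parallel_inv_angstrom(kinetic_energy, angle)
        print(f"theta = {angle:4.1f} deg -> k_parallel = {k_par:.3f} 1/A")
```

Stepping the emission angle at fixed photon energy thus scans k∥ across the surface Brillouin zone, and plotting the peak energies against k∥ gives the band dispersion.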

The diffraction of photoelectrons (or Auger electrons) is also used as a structural tool [63, 64]. When electrons 
of a well defined energy are created at a particular atomic site, such as in XPS or AES, then the emitted 
electrons interact with other atoms in the crystal structure prior to leaving the surface. The largest effect is 
'forward scattering', in which the intensity of an electron wave emitted from one atom is enhanced when it 
passes through another atom. Thus, the angular distribution of the emitted electron intensity provides a 'map' 
of the surface crystal structure. More generally, however, there is a complex multiple scattering behaviour, 
which produces variations of the emitted electron intensity with respect to both angle and energy such that the 
intensity modulations do not necessarily relate to the atomic bond directions. In order to determine a surface 
structure from such diffraction data, the measured angular and/or energy distributions of the Auger or 
photoelectrons are compared to a theoretical prediction for a given structure. Similar to LEED analysis, the 
structure employed for the calculation is varied until the best fit to the data is found. 

A1.7.5.2 ION SPECTROSCOPY 


Ions scattered from solid surfaces are useful probes for elemental identification of surface species and for 
measurements of the three-dimensional atomic structure of a single-crystal surface. Ions used for surface 
studies can be roughly divided into low (0.5-10 keV), medium (10-100 keV) and high (100 keV-1 MeV) 
energy regimes. In each regime, ions have distinct interactions with solid material and each regime is used for 
different types of measurements. The use of particle scattering for surface structure determination is described 
in detail in section B1.23. 

The fundamental interactions between ions and surfaces can be separated into elastic and inelastic processes. 
When an ion undergoes a direct collision with a single atom in a solid, it loses energy elastically by 
transferring momentum to the target atom. As an ion travels through a material, it also loses energy 
inelastically by initiating various electronic and vibrational excitations. The elastic and inelastic energy losses 
can usually be treated independently from each other. 

Elastic losses result from binary collisions between the ions and unbound target atoms positioned at the lattice 
sites. For keV and higher energy ions, the cross sections for collisions are small enough that the ions 
essentially 'see' each atom in the solid individually, i.e. the trajectory can be considered as a sequence of 
events in which the ion interacts with one target atom at a time. This is the so-called binary collision 
approximation (BCA). The energy of a scattered particle is determined by conservation of energy and 
momentum during the single collision (the binding energy of the target atom to the surface can be neglected 
since it is considerably smaller than the energy of the ions). The smaller the mass of the target atom relative to 
the projectile, the more energy is lost during an elastic collision and the lower the scattered energy. 
Peaks are seen in scattered ion energy spectra, called single scattering peaks (SSP), or quasi-single (QS) 
scattering peaks, that result from these binary collisions. In this manner, ion scattering produces a mass 
spectrum of the surface region, as the position of each SSP indicates the mass of the target atom. 
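
The position of a single scattering peak follows directly from conservation of energy and momentum in one binary collision. The Python sketch below evaluates the resulting ratio of scattered to incident energy, often called the kinematic factor, for a target atom at least as heavy as the projectile. The function name and the example masses are illustrative assumptions, and inelastic losses are ignored.

```python
import math


def kinematic_factor(m_projectile: float, m_target: float, scattering_angle_deg: float) -> float:
    """Ratio E1/E0 of scattered to incident energy for one elastic binary collision.

    E1/E0 = ((cos(theta) + sqrt(A**2 - sin(theta)**2)) / (1 + A))**2,
    with A = m_target / m_projectile and theta the laboratory scattering angle.
    Restricted to A >= 1, where the solution is single-valued at every angle.
    """
    if m_target < m_projectile:
        raise ValueError("this sketch only handles targets at least as heavy as the projectile")
    mass_ratio = m_target / m_projectile
    theta = math.radians(scattering_angle_deg)
    root = math.sqrt(mass_ratio ** 2 - math.sin(theta) ** 2)
    return ((math.cos(theta) + root) / (1.0 + mass_ratio)) ** 2


if __name__ == "__main__":
    incident_energy_ev = 1000.0
    projectile_mass = 20.0  # roughly Ne, in atomic mass units
    # Approximate target masses in atomic mass units, for illustration only.
    for name, mass in (("Al", 27.0), ("Cu", 63.5), ("Au", 197.0)):
        factor = kinematic_factor(projectile_mass, mass, 135.0)
        print(f"{name}: single scattering peak near {factor * incident_energy_ev:.0f} eV")
```

Heavier target atoms give peaks closer to the incident energy, which is how the energy spectrum of scattered ions maps onto the masses of the atoms at the surface.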

Ions in the low-energy range have reasonably short penetration depths, and therefore provide a surface- 
sensitive means for probing a material. Low-energy ion scattering (LEIS), often called ion scattering 
spectroscopy (ISS), is generally used as a measure of the surface composition. The surface sensitivity when 
using noble gas ions for standard ISS results from the high probability for neutralization for any ions that have 
penetrated past the first atomic layer. The intensity of an SSP is related to the surface concentration of the 
particular element, but care must be taken in performing quantitative analysis to properly account for ion 
neutralization. Energy losses due to inelastic excitations further modify the ion energies and charge states of 
scattered particles. In the low-energy regime, these effects are often neglected, as they only slightly alter the 
shapes of the SSP and shift it to a lower energy. In the high-energy regime, however, inelastic excitations are 
dominant in determining the shape of the scattered ion energy spectrum, as in Rutherford backscattering 
spectroscopy (RBS) [65, 66], which is discussed in section B1.24. 

Measurements of the angular distributions of scattered ions are often used as a structural tool, as they depend 
strongly on the relative positions of the atoms in the near-surface region. Ion scattering is used for structure 
determination by consideration of the shadow cones and blocking cones. These 'cones' are the regions behind 
each atom from which incoming ions are excluded because of scattering. A shadow cone is formed when an 
ion is incident onto the surface, while a blocking cone is formed when an ion that has scattered from a
deeply-lying atom interacts with a surface atom along the outgoing trajectory. The ion flux is increased at the edges 
of the cones. Thus, rotating the ion beam or detector relative to the sample alters the flux of ions that scatter 
from any particular atom. The angular distributions are usually analysed by comparing the measured 
distributions to those obtained by computer simulation for a given geometry. Shadow/blocking cone analysis 
is used in both low- and medium-energy ion scattering to provide the atomic structure, and is accurate to 
about 0.1 Å [67, 68]. 


In the high-energy ion regime, ion channelling is used for surface structure determination [65, 66]. In this 
technique, the incident ion beam is aligned along a low-index direction in the crystal. Thus, most of the ions 
will penetrate into 'channels' created by the crystal structure. Those few ions that do backscatter from a 
surface atom are collected. The number of these scattering events is dependent on the detailed atomic 
structure. For performing a structure determination, the data is usually collected as 'rocking curves' in which 
the backscattered ion yield is collected as the crystal is precisely rotated about the channelling direction. The 
measured rocking curves are then compared to the results of computer simulations performed for particular
model surface structures. As in LEED I-V analysis, 
the structure employed for the simulation that most closely matches the experimental data is deemed to be 
correct. 

Ions are also used to initiate secondary ion mass spectrometry (SIMS) [69], as described in section B1.25.3. In 
SIMS, the ions sputtered from the surface are measured with a mass spectrometer. SIMS provides an accurate 
measure of the surface composition with extremely good sensitivity. SIMS can be collected in the 'static' 
mode in which the surface is only minimally disrupted, or in the 'dynamic' mode in which material is 
removed so that the composition can be determined as a function of depth below the surface. SIMS has also 
been used along with a shadow and blocking cone analysis as a probe of surface structure [70]. 

A1.7.5.3 SCANNING PROBE METHODS 

Scanning probe microscopies have become the most conspicuous surface analysis techniques since their 
invention in the mid-1980s and the awarding of the 1986 Nobel Prize in Physics [71, 72]. The basic idea 
behind these techniques is to move an extremely fine tip close to a surface and to monitor a signal as a 
function of the tip's position above the surface. The tip is moved with the use of piezoelectric materials, 
which can control the position of a tip to a sub-Angstrom accuracy, while a signal is measured that is 
indicative of the surface topography. These techniques are described in detail in section B1.20. 

The most popular of the scanning probe techniques are STM and atomic force microscopy (AFM). STM and 
AFM provide images of the outermost layer of a surface with atomic resolution. STM measures the spatial 
distribution of the surface electronic density by monitoring the tunnelling of electrons either from the sample 
to the tip or from the tip to the sample. This provides a map of the density of filled or empty electronic states, 
respectively. The variations in surface electron density are generally correlated with the atomic positions. 
AFM measures the spatial distribution of the forces between an ultrafine tip and the sample. The distribution 
of these forces is also highly correlated with the atomic structure. STM is able to image many semiconductor 
and metal surfaces with atomic resolution. AFM is necessary for insulating materials, however, as electron 
conduction is required for STM in order to achieve tunnelling. Note that there are many modes of operation 
for these instruments, and many variations in use. In addition, there are other types of scanning probe 
microscopies under development. 

Scanning probe microscopies have afforded incredible insight into surface processes. They have provided 
visual images of surfaces on the atomic scale, from which the atomic structure can be observed in real time. 
All of the other surface techniques discussed above involve averaging over a macroscopic region of the 
surface. From STM images, it is seen that many surfaces are actually not composed of an ideal single domain, 
but rather contain a mixture of domains. STM has been able to provide direct information on the structure of 
atoms in each domain, and at steps and defects on surfaces. Furthermore, STM has been used to monitor the 
movement of single atoms on a surface. Refinements to the instruments now allow images to be collected 
over temperatures ranging from 4 to 1200 K, so that dynamical processes can be directly investigated. An
STM has also been adapted for performing single-atom vibrational spectroscopy [73]. 

One of the more interesting new areas of surface science involves manipulation of adsorbates with the tip of 
an STM. This allows for the formation of artificial structures on a surface at the atomic level. In fact, STM 
tips are being investigated for possible use in lithography as part of the production of very small features on 
microcomputer chips [74]. 

Some of the most interesting work in this area has involved physisorbed molecules at temperatures as low as 4 
K [75]. Note that it takes a specialized instrument to be able to operate at these low temperatures. An STM tip 
is brought into contact with the physisorbed species by lightly pushing down on it. Then, the STM tip is 
translated parallel to the surface while pressure is maintained on the adsorbate. In this manner, the adsorbates 
can be moved to any location on the surface. Manipulation of this type has led to the writing of 'IBM' with 
single atoms [76], as well as to the formation of structures such as the 'quantum corral' [77]. The quantum 
corral is so named, as it is an oval-shaped enclosure made from adsorbate atoms that provides a barrier for the 
free electrons of the metal substrate. Inside the corral, standing wave patterns are set up that can be imaged 
with the STM. 

There are many other experiments in which surface atoms have been purposely moved, removed or 
chemically modified with a scanning probe tip. For example, atoms on a surface have been induced to move 
via interaction with the large electric field associated with an STM tip [78]. A scanning force microscope has 
been used to create three-dimensional nanostructures by 'pushing' adsorbed particles with the tip [79]. In 
addition, the electrons that are tunnelling from an STM tip to the sample can be used as sources of electrons 
for stimulated desorption [80]. The tunnelling electrons have also been used to promote dissociation of
adsorbed O₂ molecules on metal or semiconductor surfaces [81, 82].

A1.7.5.4 THERMAL DESORPTION 

Temperature programmed desorption (TPD), also called thermal desorption spectroscopy (TDS), provides 
information about the surface chemistry such as surface coverage and the activation energy for desorption 
[49]. TPD is discussed in detail in section B1.25. In TPD, a clean surface is first exposed to a gaseous
molecule that adsorbs. The surface is then quickly heated (on the order of 10 K s⁻¹), while the desorbed
molecules are measured with a mass spectrometer. An analysis of TPD spectra basically provides three types 
of information: (1) The identities of the desorbed product(s) are obtained directly from the mass spectrometer. 
(2) The area of a TPD peak provides a good measure of the surface coverage. In cases where there are 
multiple species desorbed, the ratios of the TPD peaks provide the stoichiometry. (3) The shapes of the peaks, 
and how they change with surface coverage, provide detailed information on the kinetics of desorption. For 
example, the shapes of TPD curves differ for zeroth-, first- or second-order processes. 
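
The dependence of peak shape on desorption order can be illustrated with a short numerical sketch. The rate law assumed here, rate = ν θ^n exp(−E_des/RT) (often called the Polanyi-Wigner equation), is not stated in the text above, and the parameter values (E_des = 100 kJ mol⁻¹, ν = 10¹³ s⁻¹, heating rate 10 K s⁻¹) are illustrative assumptions rather than data for any particular system.

```python
import math

R = 8.314  # molar gas constant, J mol^-1 K^-1


def tpd_peak_temperature(order, theta0, e_des=100e3, nu=1e13, beta=10.0,
                         t_start=250.0, t_end=700.0, dt=0.05):
    """Return the temperature (K) of maximum desorption rate.

    Integrates d(theta)/dT = -(nu/beta) * theta**order * exp(-E_des/(R*T))
    with explicit Euler steps. All default parameters are assumed values.
    """
    theta = theta0          # coverage in monolayers (ML)
    temp = t_start
    best_rate, best_temp = 0.0, t_start
    while temp < t_end and theta > 0.0:
        rate = nu * theta**order * math.exp(-e_des / (R * temp))  # ML s^-1
        if rate > best_rate:
            best_rate, best_temp = rate, temp
        # Coverage lost during one temperature step; clamp at zero.
        theta = max(theta - rate / beta * dt, 0.0)
        temp += dt
    return best_temp


for order in (1, 2):
    for theta0 in (0.25, 1.0):
        peak = tpd_peak_temperature(order, theta0)
        print(f"order {order}, initial coverage {theta0} ML: peak near {peak:.1f} K")
```

With these assumed parameters the first-order peak temperature does not depend on the initial coverage, whereas the second-order peak moves to lower temperature as the coverage increases, which is one way that the order can be recognized in measured spectra.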

A1.7.5.5 LASER-SURFACE INTERACTIONS 

Lasers have been used to both modify and probe surfaces. When operated at low fluxes, lasers can excite 
electronic and vibrational states, which can lead to photochemical modification of surfaces. At higher fluxes, 
the laser can heat the surface to extremely high temperatures in a region localized at the very surface. A high- 
power laser beam produces a very non-equilibrium situation in the near-surface region, during which the 
effective electron temperature can be extremely high. Thus, lasers can also be used to initiate thermal 
desorption. Laser-induced thermal desorption (LITD) has some advantages over TPD as an analytical 
technique [36]. When a laser is used to heat the surface, the heat is localized in the surface region and the
temperature rise is extremely fast. It is also possible to produce excitations that involve multiple photons
because of the high flux available with lasers. Furthermore, there are nonlinear effects that occur with laser
irradiation of surfaces that allow for surface sensitive probes that do not require UHV, such as second 
harmonic generation (SHG) and sum frequency generation (SFG) [83, 84]. Optical techniques in surface 
science are discussed in section B1.22. 

Surface photochemistry can drive a surface chemical reaction in the presence of laser irradiation that would
not otherwise occur. The types of excitations that initiate surface photochemistry can be roughly divided into 
those that occur due to direct excitations of the adsorbates and those that are mediated by the substrate. In a 
direct excitation, the adsorbed molecules are excited by the laser light, and will directly convert into products, 
much as they would in the gas phase. In substrate-mediated processes, however, the laser light acts to excite 
electrons from the substrate, which are often referred to as 'hot electrons'. These hot electrons then interact 
with the adsorbates to initiate a chemical reaction. 

Femtosecond lasers represent the state of the art in laser technology. These lasers can have pulse widths of
the order of 100 fs. This is the same time scale as many processes that occur on surfaces, such as desorption
or diffusion. Thus, femtosecond lasers can be used to directly measure surface dynamics through techniques 
such as two-photon photoemission [85]. Femtochemistry occurs when the laser imparts energy over an 
extremely short time period so as to directly induce a surface chemical reaction [86]. 

A1.7.5.6 STIMULATED DESORPTION 

An electron or photon incident on a surface can induce an electronic excitation. When the electronic excitation 
decays, an ion or neutral particle can be emitted from the surface as a result of the excitation. Such processes 
are known as desorption induced by electronic transitions (DIET) [87]. The specific techniques are known as 
electron-stimulated desorption (ESD) and photon-stimulated desorption (PSD), depending on the method of 
excitation. 

A DIET process involves three steps: (1) an initial electronic excitation, (2) an electronic rearrangement to 
form a repulsive state and (3) emission of a particle from the surface. The first step can be a direct excitation 
to an antibonding state, but more frequently it is simply the removal of a bound electron. In the second step, 
the surface electronic structure rearranges itself to form a repulsive state. This rearrangement could be, for 
example, the decay of a valence band electron to fill a hole created in step (1). The repulsive state must have a 
sufficiently long lifetime that the products can desorb from the surface before the state decays. Finally, during 
the emission step, the particle can interact with the surface in ways that perturb its trajectory. 

There are two main theoretical descriptions applied to stimulated desorption. The Menzel-Gomer-Redhead 
(MGR) model is used to describe low-energy valence excitations, while the Knotek-Feibelman mechanism is 
used to describe a type of desorption that occurs with ionically-bound species. In the MGR model, it is 
assumed that the initial excitation occurs by absorption of a photon or electron to directly create an excited, 
repulsive state. This excited state can be neutral or ionic. It simply needs to have a sufficient lifetime so that 
desorption can occur before the system relaxes to the ground state. Thus, the MGR mechanism can be applied 
to positive or negative ion emission, or to the emission of a neutral atom. The Knotek-Feibelman mechanism 
applies when there is an ionic bond at the surface. In this case, the incident electron kicks out an inner-shell 
electron, and an Auger process then fills the resulting core hole. In the Auger process, one electron drops 
down to fill the hole, while another electron is emitted from the surface in order to satisfy conservation of 
energy. Thus, the system has lost at least two electrons, which is sufficient to turn the negatively charged 
anion into a positive ion. Finally, Coulomb repulsion between this positive ion and the cation leads to the 
emission of a positive ion from the surface. Although this mechanism was originally proposed for maximally
valent bonding, it has since been observed to occur in a variety of systems providing that there is at least a
moderate amount of charge transfer involved in the bonding. Note that this mechanism is often referred to as 
Auger-stimulated desorption (ASD). 

Electron stimulated desorption ion angular distributions (ESDIAD) [88] provide a quick measure of the bond
angles for a lightly bound adsorbate. ESDIAD patterns are recorded by impinging an electron beam onto a 
surface and then measuring the angular distributions of the desorbed ions with an imaging analyser. The 
measured ion emission angles are related to the original surface bond angles. The initial excitation responsible 
for ESD is normally directly along the bond axis. As an ion is exiting from a surface, however, there are two 
effects that act to alter the ion's trajectory. First, the ion is attracted to its image charge, which tends to spread
out the ESDIAD pattern. Second, there is inhomogeneous neutralization of the emitted ions,
in that the ions emitted at more grazing angles are preferentially neutralized. This acts to compress the 
observed pattern. Thus, a balance between these competing effects produces the measured angular 
distribution, and it is therefore difficult, although not impossible, to quantitatively determine the bond angle. 

The ESDIAD pattern does, however, provide very useful information on the nature and symmetry of an 
adsorbate. As an example, figure A1.7.13(a) shows the ESDIAD pattern of desorbed F⁺ collected from a 0.25
ML coverage of PF₃ on Ru(0001) [89]. The F⁺ pattern displays a ring of emission, which indicates that the
molecule adsorbs intact and is bonded through the P end. It freely rotates about the P-Ru bond so that the F⁺
emission occurs at all azimuthal angles, regardless of the substrate structure. In figure A1.7.13(b), the
ESDIAD pattern is shown following sufficient e⁻-beam damage to remove much of the fluorine and produce
adsorbed PF₂ and PF. Now, the F⁺ emission shows six lobes along particular azimuths and one lobe along the
surface normal. The off-normal lobes arise from PF₂, and indicate that PF₂ adsorbs in registry with the
substrate, with the F atoms pointing away from the surface at an off-normal angle. The centre lobe arises from
PF and indicates that the PF moiety is bonded through the P end, with F pointing normal to the surface.



Figure A1.7.13. ESDIAD patterns showing the angular distributions of F⁺ emitted from PF₃ adsorbed on
Ru(0001) under electron bombardment: (a) 0.25 ML coverage, (b) the same surface following electron beam
damage.

Some recent advances in stimulated desorption were made with the use of femtosecond lasers. For example, it 
was shown by using a femtosecond laser to initiate the desorption of CO from Cu while probing the surface 
with SHG, that the entire process is completed in less than 325 fs [90]. The mechanism for this kind of laser- 
induced desorption has been termed desorption induced by multiple electronic transitions (DIMET) [91]. Note 
that the mechanism must involve a multiphoton process, as a single photon at the laser frequency has 
insufficient energy to directly induce desorption. DIMET is a modification of the MGR mechanism in which 
each photon excites the adsorbate to a higher vibrational level, until a sufficient amount of vibrational energy 
has been amassed so that the particle can escape the surface. 

A1.7.6 LIQUID-SOLID INTERFACE

One of the less explored frontiers in atomic-scale surface science is the study of the liquid-solid interface. 
This interface is critically important in many applications, as well as in biological systems. For example, the 
movement of pollutants through the environment involves a series of chemical reactions of aqueous 
groundwater solutions with mineral surfaces. Although the liquid-solid interface has been studied for many 
years, it is only recently that the tools have been developed for interrogating this interface at the atomic level. 
This interface is particularly complex, as the interactions of ions dissolved in solution with a surface are 
affected not only by the surface structure, but also by the solution chemistry and by the effects of the electrical 
double layer [31]. It has been found, for example, that some surface reconstructions present in UHV persist
under solution, while others do not. 

The electrical double layer basically acts as a capacitor by storing charge at the surface that is balanced by 
ions in solution [92]. The capacitance of the double layer is a function of the electrochemical potential of the 
solution, and has a maximum at the potential of zero-charge (pzc). The pzc in solution is essentially 
equivalent to the work function of that surface in vacuum. In solution, however, the electrode potential can be 
used to vary the surface charge in much the same way that alkali adsorbates are used to vary the work 
function of a surface in vacuum. The difference is that in solution the surface charge can be varied, while the 
surface composition is unchanged. The surface energy, which affects the atomic structure and reactivity, is
directly related to the surface charge. It has been shown, for example, that by adjusting the electrode potential 
the reconstructions of certain surfaces in solution can be altered in a reversible manner. Electrochemistry can 
also be used to deposit and remove adsorbates from solution in a manner that is controlled by the electrode 
potential. 

Studies of the liquid-solid interface can be divided into those that are performed ex situ and those performed 
in situ. In an ex situ experiment, a surface is first reacted in solution, and then removed from the solution and 
transferred into a UHV spectrometer for measurement. There has recently been, however, much work aimed 
at interrogating the liquid-solid interface in situ, i.e. while chemistry is occurring rather than after the fact. 

In performing ex situ surface analysis, the transfer from solution to the spectrometer sometimes occurs either 
through the air or within a glove bag filled with an inert atmosphere. Many ex situ studies of chemical 
reactions at the liquid-solid interface, however, have been carried out using special wet cells that are directly 
attached to a UHV chamber [93, 94]. With this apparatus, the samples can be reacted and then immediately 
transferred to UHV without encountering air. Note that some designs enable complete immersion of the 
sample into solution, while others only allow the sample surface to interact with a meniscus. Although these 
investigations do not probe the liquid-solid interface directly, they can provide much information on the 
surface chemistry that has taken place. 

One of the main uses of these wet cells is to investigate surface electrochemistry [94, 95]. In these
experiments, a single-crystal surface is prepared by UHV techniques and then transferred into an 
electrochemical cell. An electrochemical reaction is then run and characterized using cyclic voltammetry, with 
the sample itself being one of the electrodes. In order to be sure that the electrochemical measurements all 
involved the same crystal face, for some experiments a single-crystal cube was actually oriented and polished 
on all six sides! Following surface modification by electrochemistry, the sample is returned to UHV for
measurement with standard techniques, such as AES and LEED. It has been found that the chemisorbed layers
that are deposited by electrochemical reactions are stable and remain adsorbed after removal from solution. 
These studies have enabled the determination of the role that surface structure plays in electrochemistry. 

The force between two adjacent surfaces can be measured directly with the surface force apparatus (SFA), as 
described in section B1.20 [96]. The SFA can be employed in solution to provide an in situ determination of
the forces. Although this instrument does not directly involve an atomically resolved measurement, it has 
provided considerable insight into the microscopic origins of surface friction and the effects of electrolytes 
and lubricants [97]. 

Scanning probe microscopies are atomically resolved techniques that have been successfully applied to 
measurements of the liquid-solid interface in situ [98, 99, 100, 101 and 102]. The STM has provided
atomically resolved images of surface reconstructions and adsorption geometry under controlled conditions in 
solution, and the dependence of these structures on solution composition and electrode potential. Note that in 
order to perform STM under solution, a special tip coated with a dielectric must be used in order to reduce the 
Faradaic current that would otherwise transmit through the solution. As an example, figure A1.7.14 shows an
STM image collected in solution from docosanol physisorbed on a graphite surface. The graphite lattice and
the individual atoms in the adsorbed molecules can be imaged with atomic resolution. In addition, scanning
probe microscopies have been used to image the surfaces of biological molecules and even living cells in
solution [103].

Figure A1.7.14. 3.4 nm × 3.4 nm STM images of 1-docosanol physisorbed onto a graphite surface in
solution. This image reveals the hydrogen-bonding alcohol molecules assembled in lamellar fashion at the
liquid-solid interface. Each 'bright' circular region is attributed to the location of an individual hydrogen
atom protruding upward out of the plane of the all-trans hydrocarbon backbone, which is lying flat on the
surface. (a) Top view, and (b) a perspective image (courtesy of Leanna Giancarlo and George Flynn).

Since water is transparent to visible light, optical techniques can be used to interrogate the liquid-solid
interface in situ [104]. For example, SFG has been used to perform IR spectroscopy directly at the
liquid-solid interface [105, 106]. The surface sensitivity of SFG arises from the breaking of centrosymmetry at the
interface, rather than from electron attenuation as in more traditional surface techniques, so that the
information obtained is relevant to atomic-scale processes at the solid-liquid interface. This allows for the
identification of the adsorbed species while a reaction is occurring. Note that these techniques can be extended
to the liquid-liquid interface, as well [107]. In addition, x-ray scattering employing synchrotron radiation is
being developed for use at the liquid-solid interface. For example, an in situ electrochemical cell for x-ray
scattering has been designed [108].

A2.1 Classical thermodynamics

Robert L Scott 


A2.1.1 INTRODUCTION 

Thermodynamics is a powerful tool in physics, chemistry and engineering and, by extension, in substantially
all other sciences. However, its power is narrow, since it says nothing whatsoever about time-dependent 
phenomena. It can demonstrate that certain processes are impossible, but it cannot predict whether 
thermodynamically allowed processes will actually take place. 

It is important to recognize that thermodynamic laws are generalizations of experimental observations on 
systems of macroscopic size; for such bulk systems the equations are exact (at least within the limits of the 
best experimental precision). The validity and applicability of the relations are independent of the correctness 
of any model of molecular behaviour adduced to explain them. Moreover, the usefulness of thermodynamic 
relations depends crucially on measurability; unless an experimenter can keep the constraints on a system and 
its surroundings under control, the measurements may be worthless. 

The approach that will be outlined here is due to Caratheodory [1] and Born [2] and should present fresh 
insights to those familiar only with the usual development in many chemistry, physics or engineering 
textbooks. However, while the formulations differ somewhat, the equations that finally result are, of course, 
identical. 

A2.1.2 THE ZEROTH LAW 

A2.1.2.1 THE STATE OF A SYSTEM 

First, a few definitions: a system is any region of space, any amount of material for which the boundaries are 
clearly specified. At least for thermodynamic purposes it must be of macroscopic size and have a topological 
integrity. It may not be only part of the matter in a given region, e.g. all the sucrose in an aqueous solution. A 
system could consist of two non-contiguous parts, but such a specification would rarely be useful. 

To define the thermodynamic state of a system one must specify the values of a minimum number of 
variables, enough to reproduce the system with all its macroscopic properties. If special forces (surface 
effects, external fields — electric, magnetic, gravitational, etc) are absent, or if the bulk properties are 
insensitive to these forces, e.g. the weak terrestrial magnetic field, it ordinarily suffices (for a one-component
system) to specify three variables, e.g. the temperature T, the pressure p and the number of moles n, or an
equivalent set. For example, if the volume of a surface layer is negligible in comparison with the total volume, 
surface effects usually contribute negligibly to bulk thermodynamic properties. 


In order to specify the size of the system, at least one of these variables ought to be extensive (one that is
proportional to the size of the system, like n or the total volume V). In the special case of several phases in
equilibrium several extensive properties, e.g. n and V for two phases, may be required to determine the
relative amounts of the two phases. The rest of the variables can be intensive (independent of the size of the
system) like T, p, the molar volume $\bar{V} = V/n$, or the density ρ. For multicomponent systems, additional
variables, e.g. several $n_i$, are needed to specify composition.


For example, the definition of a system as 10.0 g H₂O at 10.0°C at an applied pressure p = 1.00 atm is
sufficient to specify that the water is liquid and that its other properties (energy, density, refractive index, even 
non-thermodynamic properties like the coefficients of viscosity and thermal conductivity) are uniquely fixed. 

Although classical thermodynamics says nothing about time effects, one must recognize that nearly all 
thermodynamic systems are metastable in the sense that over long periods of time — much longer than the 
time to perform experiments — they may change their properties, e.g. perhaps by a very slow chemical 
reaction. Moreover, the time scale is merely relative; if a thermodynamic measurement can be carried out fast 
enough that it is finished before some other reaction can perturb the system, but slow enough for the system to 
come to internal equilibrium, it will be valid. 

A2.1.2.2 WALLS AND EQUATIONS OF STATE 

Of special importance is the nature of the boundary of a system, i.e. the wall or walls enclosing it and 
separating it from its surroundings. The concept of 'surroundings' can be somewhat ambiguous, and its 
thermodynamic usefulness needs to be clarified. It is not the rest of the universe, but only the external 
neighbourhood with which the system may interact. Moreover, unless this neighbourhood is substantially at 
internal equilibrium, its thermodynamic properties cannot be exactly specified. Examples of 'surroundings' 
are a thermostatic bath or the external atmosphere. 

If neither matter nor energy can cross the boundary, the system is described as isolated; if only energy (but 
not matter) can cross the boundary, the system is closed; if both matter and energy can cross the boundary, the 
system is open. 
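
The three definitions above amount to a small decision rule, which can be sketched as follows. The function name and the example systems are hypothetical and serve only as illustration; the case of matter crossing without energy is not one of the cases defined in the text and is therefore rejected.

```python
def classify_system(matter_can_cross: bool, energy_can_cross: bool) -> str:
    """Return 'isolated', 'closed' or 'open' following the definitions in the text."""
    if not matter_can_cross and not energy_can_cross:
        return "isolated"
    if energy_can_cross and not matter_can_cross:
        return "closed"
    if matter_can_cross and energy_can_cross:
        return "open"
    # Matter crossing without energy is not one of the three cases defined above.
    raise ValueError("matter transfer without energy transfer is not classified here")


# Hypothetical examples chosen for illustration only.
examples = {
    "sealed, thermally insulated flask": classify_system(False, False),
    "sealed flask in a thermostatic bath": classify_system(False, True),
    "open beaker": classify_system(True, True),
}

for description, kind in examples.items():
    print(f"{description}: {kind}")
```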

(Sometimes, when defining a system, one must be careful to clarify whether the walls are part of the system or 
part of the surroundings. Usually the contribution of the wall to the thermodynamic properties is trivial by 
comparison with the bulk of the system and hence can be ignored.) 

Consider two distinct closed thermodynamic systems each consisting of n moles of a specific substance in a 
volume V and at a pressure p. These two distinct systems are separated by an idealized wall that may be either
adiabatic (heat-impermeable) or diathermic (heat-conducting). However, because the concept of heat has not 
yet been introduced, the definitions of adiabatic and diathermic need to be considered carefully. Both kinds of 
walls are impermeable to matter; a permeable wall will be introduced later. 

If a system at equilibrium is enclosed by an adiabatic wall, the only way the system can be disturbed is by 
moving part of the wall; i.e. the only coupling between the system and its surroundings is by work, normally 
mechanical. (The adiabatic wall is an idealized concept; no real wall can prevent any conduction of heat over 
a long time. However, heat transfer must be negligible over the time period of an experiment.) 


The diathermic wall is defined by the fact that two systems separated by such a wall cannot be at equilibrium
at arbitrary values of their variables of state, $p^{\alpha}$, $V^{\alpha}$, $p^{\beta}$ and $V^{\beta}$. (The superscripts are not exponents; they
symbolize different systems, subsystems or phases; numerical subscripts are reserved for components in a
mixture.) Instead there must be a relation between the four variables, which can be called an equation of state:

$F(p^{\alpha}, V^{\alpha}, p^{\beta}, V^{\beta}) = 0.$ (A2.1.1)

Equation (A2.1.1) is essentially an expression of the concept of thermal equilibrium. Note, however, that, in 
this formulation, this concept precedes the notion of temperature. 


To make the differences between the two kinds of walls clearer, consider the situation where both systems are ideal
gases, each satisfying the ideal-gas law pV = nRT. If the two were separated by a diathermic wall, one would
observe experimentally that $p^{\alpha}V^{\alpha}/p^{\beta}V^{\beta} = C$ where the constant C would be $n^{\alpha}/n^{\beta}$. If the wall were adiabatic,
the two pV products could be varied independently.
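
A minimal numerical sketch of this contrast is given below. The amounts of gas and the temperatures are assumed values chosen for illustration; the only physics used is the ideal-gas law quoted above.

```python
R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def pv_product(n_mol, temperature_k):
    """pV for an ideal gas, from pV = nRT."""
    return n_mol * R * temperature_k


n_alpha, n_beta = 2.0, 0.5  # assumed amounts of the two gases, mol

# Diathermic wall: both gases settle at one common temperature, whatever it is,
# so the ratio of the pV products is fixed by the amounts alone.
ratios_diathermic = [
    pv_product(n_alpha, t) / pv_product(n_beta, t) for t in (250.0, 300.0, 400.0)
]

# Adiabatic wall: the two temperatures are unrelated, so the ratio is not fixed.
ratio_adiabatic = pv_product(n_alpha, 300.0) / pv_product(n_beta, 400.0)

print("diathermic:", ratios_diathermic)
print("adiabatic example:", ratio_adiabatic)
```

Every entry in the diathermic list equals the ratio of the two amounts, while the adiabatic example gives a different value that depends on the two independently chosen temperatures.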

A2.1.2.3 TEMPERATURE AND THE ZEROTH LAW 

The concept of temperature derives from a fact of common experience, sometimes called the 'zeroth law of 
thermodynamics', namely, if two systems are each in thermal equilibrium with a third, they are in thermal 
equilibrium with each other. To clarify this point, consider the three systems shown schematically in figure
A2.1.1, in which there are diathermic walls between systems α and γ and between systems β and γ, but an
adiabatic wall between systems α and β.



Figure A2.1.1. Illustration of the zeroth law. Three systems with two diathermic walls (solid) and one 
adiabatic wall (open). 


Equation (A2.1.1) governs the diathermic walls, so one may write

$F_A(p^{\alpha}, V^{\alpha}, p^{\gamma}, V^{\gamma}) = 0$ (A2.1.2a)

$F_B(p^{\beta}, V^{\beta}, p^{\gamma}, V^{\gamma}) = 0.$ (A2.1.2b)

It is a universal experimental observation, i.e. a 'law of nature', that the equations of state of systems α and β
are then coupled as if the wall separating them were diathermic rather than adiabatic. In other words, there is a
relation

$F_C(p^{\alpha}, V^{\alpha}, p^{\beta}, V^{\beta}) = 0.$ (A2.1.2c)

It may seem that equation (A2.1.2c) is just a mathematical consequence of equation (A2.1.2a) and equation
(A2.1.2b), but it is not; it conveys new physical information. If one rewrites equation (A2.1.2a) and equation
(A2.1.2b) in the form

$p^{\gamma} = f_{\alpha}(p^{\alpha}, V^{\alpha}, V^{\gamma}) = f_{\beta}(p^{\beta}, V^{\beta}, V^{\gamma})$

it is evident that this does not reduce to equation (A2.1.2c) unless one can separate $V^{\gamma}$ out of the equation.
This is not possible unless $f_{\alpha} = f_{\alpha}(p^{\alpha}, V^{\alpha})\,g(V^{\gamma}) + h(V^{\gamma})$ and $f_{\beta} = f_{\beta}(p^{\beta}, V^{\beta})\,g(V^{\gamma}) + h(V^{\gamma})$. If equation
(A2.1.2c) is a statement of a general experimental result, then $f_{\alpha}(p^{\alpha}, V^{\alpha}) = f_{\beta}(p^{\beta}, V^{\beta})$ and the symmetry of
equation (A2.1.2a), equation (A2.1.2b) and equation (A2.1.2c) extends the equality to $f_{\gamma}(p^{\gamma}, V^{\gamma})$:

$f_{\alpha}(p^{\alpha}, V^{\alpha}) = f_{\beta}(p^{\beta}, V^{\beta}) = f_{\gamma}(p^{\gamma}, V^{\gamma}) = \theta.$ (A2.1.3)

The three systems share a common property θ, the numerical value of the three functions $f_{\alpha}$, $f_{\beta}$ and $f_{\gamma}$, which
can be called the empirical temperature. The equations (A2.1.3) are equations of state for the various systems,
but the choice of θ is entirely arbitrary, since any function of f (e.g. $f^2$, $\log f$, $\cos f - 3f$, etc) will satisfy
equation (A2.1.3) and could serve as 'temperature'.

Redlich [3] has criticized the 'so-called zeroth law' on the grounds that the argument applies equally well for 
the introduction of any generalized force, mechanical (pressure), electrical (voltage), or otherwise. The 
difference seems to be that the physical nature of these other forces has already been clearly defined or 
postulated (at least in the conventional development of physics) while in classical thermodynamics, especially 
in the Born-Caratheodory approach, the existence of temperature has to be inferred from experiment. 

For convenience, one of the systems will be taken as an ideal gas whose equation of state follows Boyle's law,

$pV = n f(\theta) = n C \theta^{\mathrm{ig}}$ (A2.1.4)

and which defines an ideal-gas temperature $\theta^{\mathrm{ig}}$ proportional to pV/n. Later this will be identified with the
thermodynamic temperature T. It is now possible to use the pair of variables V and θ instead of p and V to
define the state of the system (for fixed n). [The pair p and θ would also do unless there is more than one
phase present, in which case some variable or variables (in addition to n) must be extensive.]
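
As a rough illustration of equation (A2.1.4), the sketch below computes an ideal-gas temperature from p, V and n. Setting the constant C equal to the molar gas constant R is an assumption made here so that the scale comes out in kelvin; the two gas samples and the rescaling function are likewise invented for the example.

```python
import math


def ideal_gas_temperature(p, volume, n_mol, c=8.314462618):
    """theta_ig = pV / (nC); with C taken as R (an assumption) the scale is kelvin."""
    return p * volume / (n_mol * c)


# Two assumed gas samples with the same value of pV/n (SI units: Pa, m^3, mol).
theta_a = ideal_gas_temperature(2.0e5, 0.010, 1.0)
theta_b = ideal_gas_temperature(1.0e5, 0.040, 2.0)


def rescaled(theta):
    """Any function of theta labels the same states equally well, e.g. log."""
    return math.log(theta)


print(theta_a, theta_b)
print(rescaled(theta_a), rescaled(theta_b))
```

The two samples have equal pV/n and therefore the same empirical temperature, and they remain equal under the rescaled label, which is the sense in which the choice of temperature function is arbitrary at this stage.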


A2.1.3 THE FIRST LAW 


A2.1.3.1 WORK 


There are several different forms of work, all ultimately reducible to the basic definition of the infinitesimal
work Dw = f dl where f is the force acting to produce movement along the distance dl. Strictly speaking, both f
and dl are vectors, so Dw is positive when the extension dl of the system is in the same direction as the
applied force; if they are in opposite directions Dw is negative. Moreover, this definition assumes (as do all
the equations that follow in this section) that there is a substantially equal and opposite force resisting the 
movement. Otherwise the actual work done on the system or by the system on the surroundings will be less or 
even zero. As will be shown later, the maximum work is obtained when the process is essentially 'reversible'. 

The work depends on the detailed path, so Dw is an 'inexact differential' as symbolized by the capitalization.
(There is no established convention about this symbolism; some books, and all mathematicians, use the
same symbol for all differentials; some use δ for an inexact differential; others use a bar through the d; still
others, as in this article, use D.) The difference between an exact and an inexact differential is crucial in
thermodynamics. In general, the integral of a differential depends on the path taken from the initial to the final 
state. However, for some special but important cases, the integral is independent of the path; then and only 
then can one write 


$\int_{i}^{f} dF = F_f - F_i = \Delta F.$


One then speaks of F as a 'state function' because it is a function only of those variables that define the state 
of the system, and not of the path by which the state was reached. An especially important feature of such 
functions is that if one writes DF as a function of several variables, say x, y, z,

$DF = X(x, y, z)\,dx + Y(x, y, z)\,dy + Z(x, y, z)\,dz$

then, for exact differentials only, $X = (\partial F/\partial x)_{y,z}$, $Y = (\partial F/\partial y)_{x,z}$ and $Z = (\partial F/\partial z)_{x,y}$. Since these exact
differentials are path-independent, the order of differentiation is immaterial and one can then write

$(\partial^2 F/\partial x\,\partial y)_z = (\partial X/\partial y)_{x,z} = (\partial Y/\partial x)_{y,z}$, etc.

One way of verifying the exactness of a differential is to check the validity of expressions like that above. 
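
The cross-derivative test can be tried numerically. The sketch below is a minimal illustration for a differential in two variables, X dx + Y dy, using central finite differences; the helper names, the test function F = x²y and the sample points are all assumptions made for the example. Agreement at a few sample points can only fail to refute exactness, whereas a single disagreement is enough to show that a differential is inexact.

```python
def partial(func, point, index, h=1e-5):
    """Central-difference estimate of d(func)/d(variable number `index`)."""
    up, down = list(point), list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2.0 * h)


def looks_exact(coeff_x, coeff_y, points, tol=1e-6):
    """Check (dX/dy)_x == (dY/dx)_y at each sample point for X dx + Y dy."""
    return all(
        abs(partial(coeff_x, pt, 1) - partial(coeff_y, pt, 0)) < tol for pt in points
    )


# Exact: dF for the assumed state function F = x**2 * y, so X = 2xy and Y = x**2.
dF_is_exact = looks_exact(lambda x, y: 2.0 * x * y, lambda x, y: x * x,
                          [(1.0, 2.0), (0.5, -1.0)])

# Inexact: Dw = 0 dT - (nRT/V) dV for one mole of ideal gas, variables (T, V).
N_R = 8.314462618  # n*R for n = 1 mol, J K^-1
Dw_is_exact = looks_exact(lambda t, v: 0.0, lambda t, v: -N_R * t / v,
                          [(300.0, 0.010), (350.0, 0.020)])

print("dF passes the test:", dF_is_exact)
print("Dw passes the test:", Dw_is_exact)
```
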

(A) GRAVITATIONAL WORK

What is probably the simplest form of work to understand occurs when a force is used to raise the system in a 
gravitational field: 

$Dw_{\mathrm{grav}} = mg\,dh$

where m is the mass of the system, g is the acceleration of gravity, and dh is the infinitesimal increase in
height. Gravitational work is rarely significant in most thermodynamic applications except when a falling 
weight outside the system drives a paddle wheel inside the system, as in one of the famous experiments in 
which Joule (1849) compared the work done with the increase in temperature of the system, and determined 
the 'mechanical equivalent of heat'. Note that, in this example, positive work is done on the system as the 
potential energy of the falling weight decreases. Note also that, in free fall, the potential energy of the weight 
decreases, but no work is done. 

(B) ONE-DIMENSIONAL WORK 

When a spring is stretched or compressed, work is done. If the spring is the system, then the work done on it 
is simply 

$$Dw_{\mathrm{spring}} = f\,dl.$$

Note that a displacement from the initial equilibrium, either by compression or by stretching, produces 
positive work on the system. A situation analogous to the stretching of a spring is the stretching of a chain 
polymer. 

(C) TWO-DIMENSIONAL (SURFACE) WORK 

When a surface is compressed by a force $f = \pi L$, the 'surface pressure' $\pi = f/L$ is the force per unit width $L$ 
producing a decrease in length $dl$. (Note that $L$ and $l$ are not the same; indeed they are orthogonal.) The work 
is then 

$$Dw_{\mathrm{surf}} = -\pi\,dA,$$

where $dA = L\,dl$ is the change in the surface area. This kind of work and the related thermodynamic functions 
for surfaces are important in dealing with monolayers in a Langmuir trough, and with membranes and other 
materials that are quasi-two-dimensional. 

(D) THREE-DIMENSIONAL (PRESSURE-VOLUME) WORK 

When a piston of area $A$, driven by a force $f = pA$, moves a distance $dl = -dV/A$, it produces a compression of 
the system by a volume $dV$. The work is then 

$$Dw_{pV} = -p\,dV. \tag{A2.1.5}$$

It is this type of work that is ubiquitous in chemical thermodynamics, principally because of changes of the 
volume of the system under the external pressure of the atmosphere. The negative sign of the work done on 
the system is, of course, because the application of excess pressure produces a decrease in volume. (The 
negative sign in the two-dimensional case is analogous.) 


(E) OTHER MECHANICAL WORK 

One can also do work by stirring, e.g. by driving a paddle wheel as in the Joule experiment above. If the 
paddle is taken as part of the system, the energy input (as work) is determined by appropriate measurements 
on the electric motor, falling weights or whatever drives the paddle. 

(F) ELECTRICAL WORK 

When a battery (or a generator or other power supply) outside the system drives current, i.e. a flow of electric 
charge, through a wire that passes through the system, work is done on the system: 

$$Dw_{\mathrm{elec}} = \mathcal{E}\,dQ,$$

where $dQ$ is the infinitesimal charge that crosses the boundary of the system and $\mathcal{E}$ is the electric potential 
(voltage) across the system, i.e. between the point where the wire enters and the point where it leaves. 

Converting to current $I = dQ/dt$, where $dt$ is an infinitesimal time interval, and to resistance $\mathcal{R} = \mathcal{E}/I$, one can 
rewrite this equation in the form 

$$Dw_{\mathrm{elec}} = \mathcal{E} I\,dt = (\mathcal{E}^2/\mathcal{R})\,dt.$$

Such a resistance device is usually called an 'electrical heater' but, since there is no means of measurement at 
the boundary between the resistance and the material in contact with it, it is easier to regard the resistance as 
being inside the system, i.e. a part of it. Energy enters the system in the form of work where the wire breaches 
the wall, i.e. enters the container. 

(G) ELECTROCHEMICAL WORK 

A special example of electrical work occurs when work is done on an electrochemical cell or by such a cell on 
the surroundings (-w in the convention of this article). Thermodynamics applies to such a cell when it is at 
equilibrium with its surroundings, i.e. when the electrical potential (electromotive force, emf) of the cell is 
balanced by an external potential. 

(H) ELECTROMAGNETIC WORK 

This poses a special problem because the source of the electromagnetic field may lie outside the defined 
boundaries of the system. A detailed discussion of this is outside the scope of this section, but the basic 
features can be briefly summarized. 

When a specimen is moved in or out of an electric field or when the field is increased or decreased, the total 
work done on the whole system (charged condenser + field + specimen) in an infinitesimal change is 

$$Dw_{\mathrm{el}} = \int dV\,(\mathbf{E} \cdot d\mathbf{D}),$$

where $\mathbf{E}$ is the electric field vector, $\mathbf{D} = \boldsymbol{\varepsilon}\mathbf{E}$ is the electric displacement vector, and $\boldsymbol{\varepsilon}$ is the electric 
permittivity tensor. The integration is over the whole volume encompassed by the total system, which must 
in principle extend as far as measurable fields exist. 

Similarly, when a specimen is moved in or out of a magnetic field or when the magnetic field is increased or 
decreased, the total work done on the whole system (coil + field + specimen) in an infinitesimal change is 

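$$Dw_{\mathrm{mag}} = \int dV\,(\mathbf{H} \cdot d\mathbf{B}),$$

where $\mathbf{H}$ is the magnetic field vector, $\mathbf{B} = \boldsymbol{\mu}\mathbf{H}$ is the magnetic induction vector and $\boldsymbol{\mu}$ is the magnetic 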
permeability tensor. (Some modern discussions of magnetism regard B as the fundamental magnetic field 
vector, but usually fail to give a new name to H.) As before the integration is over the whole volume. 

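For the special but familiar case of an isotropic specimen in a uniform external field $\mathbf{E}_0$ or $\mathbf{B}_0$, it can be shown 
[4] that 

$$Dw_{\mathrm{el}} = \int dV\,(\varepsilon_0 \mathbf{E}_0 \cdot d\mathbf{E}_0 - \mathbf{P} \cdot d\mathbf{E}_0) \tag{A2.1.6}$$

$$Dw_{\mathrm{mag}} = \int dV\,(\mathbf{B}_0 \cdot d\mathbf{B}_0/\mu_0 + \mathbf{B}_0 \cdot d\mathbf{M}) \tag{A2.1.7}$$

where $\mathbf{P}$ is the polarization vector and $\mathbf{M}$ the magnetization vector; $\varepsilon_0$ and $\mu_0$ are the permittivity and 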
permeability of the vacuum in the absence of the specimen. The vector notation could now be dropped since 
the external field and the induced field are parallel and the scalar product of two vectors oriented identically is 
simply the product of their scalar magnitudes; this will not be done in this article to avoid confusion with 
other thermodynamic quantities. (Note that equation (A2.1.7) is not the analogue of equation (A2.1.6).) 

The work done increases the energy of the total system and one must now decide how to divide this energy 
between the field and the specimen. This separation is not measurably significant, so the division can be made 
arbitrarily; several self-consistent systems exist. The first term on the right-hand side of equation (A2.1.6) is 
obviously the work of creating the electric field, e.g. charging the plates of a condenser in the absence of the 
specimen, so it appears logical to consider the second term as the work done on the specimen. 


By analogy, one is tempted to make the same division in equation (A2.1.7), regarding the first term as the 
work of creating the magnetic field in the absence of the specimen and the second, $\int dV\,(\mathbf{B}_0 \cdot d\mathbf{M})$, as the 
work done on the specimen. This is the way most books on thermodynamics present the problem and it is an 
acceptable convention, except that it is inconsistent with the measured spectroscopic energy levels and with 
one's intuitive idea of work. For example, equation (A2.1.7) says that the work done in moving a permanent 
magnet (constant magnetization $\mathbf{M}$) into or out of an electromagnet of constant $\mathbf{B}_0$ is exactly zero! This is 
actually correct if one considers the extra electrochemical work done on the battery driving the current 
through the electromagnet while the permanent magnet is moving; this exactly balances the mechanical work. 
A careful analysis [5, 6] shows that, if one writes equation (A2.1.7) in the following form: 


$$Dw_{\mathrm{mag}} = \int dV\,[\underbrace{\mathbf{B}_0 \cdot d\mathbf{B}_0/\mu_0}_{A}\;\underbrace{-\,\mathbf{M} \cdot d\mathbf{B}_0}_{B}\;+\;\underbrace{d(\mathbf{B}_0 \cdot \mathbf{M})}_{C}],$$

then term A is the work of creating the field in the absence of the specimen; term B is the work done on the 
specimen by 'ponderable forces', e.g. by a spring or by a physical push or pull; this is directly reflected in a 
change of the kinetic energy of the electrons; and term C is the work done by the electromotive force in the 
coil in creating the interaction field between $\mathbf{B}_0$ and $\mathbf{M}$. We elect to consider term B as the only work done on 
the specimen and write for the electromagnetic work 

$$Dw_{\mathrm{elmag}} = \int dV\,(-\mathbf{P} \cdot d\mathbf{E}_0 - \mathbf{M} \cdot d\mathbf{B}_0).$$


If in addition the specimen is assumed to be spherical as well as isotropic, so that $\mathbf{P}$ and $\mathbf{M}$ are uniform 
throughout the volume $V$, one can then write for the electromagnetic work 

$$Dw_{\mathrm{elmag}} = V(-\mathbf{P} \cdot d\mathbf{E}_0 - \mathbf{M} \cdot d\mathbf{B}_0). \tag{A2.1.8}$$

Equation (A2.1.8) turns out to be consistent with the changes of the energy levels measured spectroscopically, 
so the energy produced by work defined this way is frequently called the 'spectroscopic energy'. Note that the 
electric and magnetic parts of the equations are now symmetrical. 

A2.1.3.2 ADIABATIC WORK 

One may now consider how changes can be made in a system across an adiabatic wall. The first law of 
thermodynamics can now be stated as another generalization of experimental observation, but in an unfamiliar 
form: the work required to transform an adiabatic (thermally insulated) system from a completely specified 
initial state to a completely specified final state is independent of the source of the work (mechanical, 
electrical, etc.) and independent of the nature of the adiabatic path. This is exactly what Joule observed; the 
same amount of work, mechanical or electrical, was always required to bring an adiabatically enclosed 
volume of water from one temperature $\theta_1$ to another $\theta_2$. 

This can be illustrated by showing the net work involved in various adiabatic paths by which one mole of 
helium gas (4.00 g) is brought from an initial state in which $p$ = 1.000 atm, $V$ = 24.62 l [$T$ = 300.0 K], to a 
final state in which $p$ = 1.200 atm, $V$ = 30.77 l [$T$ = 450.0 K]. Ideal-gas behaviour is assumed (actual 
experimental measurements on a slightly non-ideal real gas would be slightly different). Information shown in 
brackets could be measured or calculated, but is not essential to the experimental verification of the first law. 

Path I
  (a) Do electrical work on the system at constant $V$ = 24.62 l until the pressure has risen to 1.500 atm.
      [$\Delta T$ = 150.0 K, $w = (3/2)R\Delta T$]
      $w_{\mathrm{elec}}$ = 1871 J
  (b) Expand the gas into a vacuum (i.e. against zero external pressure) until the total volume $V$ is 30.77 l
      and $p$ = 1.200 atm. [$\Delta T$ = 0]
      $w_{\mathrm{exp}}$ = 0 J
  Total: $w_{\mathrm{tot}}$ = 1871 J

Path II
  (a) Compress the gas reversibly and adiabatically from 1.000 atm to 1.200 atm.
      [At the end of the compression $T$ = 322.7 K, $V$ = 22.07 l, $w = (3/2)R\Delta T$]
      $w_{\mathrm{comp}}$ = 283 J
  (b) Do electrical work on the system, holding the pressure constant at 1.200 atm, until the volume $V$ has
      increased to 30.77 l; under these circumstances the system also does expansion work against the
      external pressure.
      [Electrical work = $(5/2)R\Delta T$]
      [Expansion work = $-p\Delta V$ = -10.45 l atm]
      $w_{\mathrm{elec}}$ = 2646 J
      $w_{\mathrm{exp}}$ = -1058 J
  Total: $w_{\mathrm{tot}}$ = 1871 J

Path III
  (a) Do electrical work on the system, holding the pressure constant at 1.000 atm, until the volume $V$ has
      increased to 34.33 l; under these circumstances, the system also does expansion work against the
      external pressure.
      [Final $T$ = 418.4 K]
      [Electrical work = $(5/2)R\Delta T$]
      [Expansion work = $-p\Delta V$ = -9.71 l atm]
      $w_{\mathrm{elec}}$ = 2460 J
      $w_{\mathrm{exp}}$ = -984 J
  (b) Compress the gas reversibly and adiabatically from 1.000 atm to 1.200 atm.
      [At the end of the compression $T$ = 450.0 K, $V$ = 30.77 l, $\Delta T$ = 31.65 K, $w = (3/2)R\Delta T$]
      $w_{\mathrm{comp}}$ = 395 J
  Total: $w_{\mathrm{tot}}$ = 1871 J

For all of these adiabatic processes, the total (net) work is exactly the same. 
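
The bookkeeping for the three paths can be checked with a short script. This is a minimal sketch assuming one mole of a monatomic ideal gas (C_V = 3R/2, C_p = 5R/2) and the usual conversion 1 l atm = 101.325 J; the variable and function names are illustrative only.

```python
R = 8.314462618          # gas constant, J K^-1 mol^-1
L_ATM = 101.325          # joules per litre-atmosphere
R_L_ATM = R / L_ATM      # gas constant in l atm K^-1 mol^-1
CV, CP = 1.5 * R, 2.5 * R  # monatomic ideal gas, n = 1 mol (assumption)
GAMMA = CP / CV

T_i, p_i = 300.0, 1.000  # initial state (K, atm)
T_f, p_f = 450.0, 1.200  # final state (K, atm)

def volume(T, p):
    """Ideal-gas volume in litres for one mole."""
    return R_L_ATM * T / p

def adiabat_T(T, p_start, p_end):
    """Temperature after a reversible adiabatic change of pressure."""
    return T * (p_end / p_start) ** ((GAMMA - 1.0) / GAMMA)

# Path I: electrical work at constant V, then free expansion (no work).
w_I = CV * (T_f - T_i) + 0.0

# Path II: adiabatic compression, then electrical work at constant p_f.
T_mid_II = adiabat_T(T_i, p_i, p_f)
w_comp_II = CV * (T_mid_II - T_i)
w_elec_II = CP * (T_f - T_mid_II)
w_exp_II = -p_f * (volume(T_f, p_f) - volume(T_mid_II, p_f)) * L_ATM
w_II = w_comp_II + w_elec_II + w_exp_II

# Path III: electrical work at constant p_i, then adiabatic compression.
T_mid_III = adiabat_T(T_f, p_f, p_i)  # state from which the adiabat reaches (T_f, p_f)
w_elec_III = CP * (T_mid_III - T_i)
w_exp_III = -p_i * (volume(T_mid_III, p_i) - volume(T_i, p_i)) * L_ATM
w_comp_III = CV * (T_f - T_mid_III)
w_III = w_elec_III + w_exp_III + w_comp_III

for name, w in (("I", w_I), ("II", w_II), ("III", w_III)):
    print(f"Path {name}: net work = {w:.0f} J")
```

Under these assumptions the net work on each path equals C_V(T_f - T_i), so the three totals agree regardless of how the individual contributions are split.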


(As we shall see, because of the limitations that the second law of thermodynamics imposes, it may be 
impossible to find any adiabatic paths from a particular state A to another state B because $S_B - S_A < 0$. In this 
situation, however, there will be several adiabatic paths from state B to state A.) 


If the adiabatic work is independent of the path, it is the integral of an exact differential and suffices to define 
a change in a function of the state of the system, the energy U. (Some thermodynamicists call this the 'internal 
energy', so as to exclude any kinetic energy of the motion of the system as a whole.) 


$$dU = Dw_{\mathrm{adiabatic}}$$

or 

$$\Delta U = U_f(V_f, \theta_f) - U_i(V_i, \theta_i) = \int_i^f Dw_{\mathrm{adiabatic}} = w_{\mathrm{adiabatic}}. \tag{A2.1.9}$$


Here the subscripts $i$ and $f$ refer to the initial and final states of the system and the work $w$ is defined as the 
work performed on the system (the opposite sign convention, with $w$ as work done by the system on the 
surroundings, is also in common use). Note that a cyclic process (one in which the system is returned to its 
initial state) is not introduced; as will be seen later, a cyclic adiabatic process is possible only if every step is 
reversible. Equation (A2.1.9), i.e. the introduction of $U$ as a state function, is an expression of the law of 
conservation of energy. 

A2.1.3.3 NON-ADIABATIC PROCESSES. HEAT 

Not all processes are adiabatic, so when a system is coupled to its environment by diathermic walls, the heat q 
absorbed by the system is defined as the difference between the actual work performed and that which would 
have been required had the change occurred adiabatically. 

$$Dq = Dw_{\mathrm{adiabatic}} - Dw = dU - Dw$$

or 

$$q = w_{\mathrm{adiabatic}} - w = \Delta U - w. \tag{A2.1.10}$$

Note that, since Dw is inexact, so also must be Dq. 

This definition may appear eccentric because many people have an intuitive feeling for 'heat' as a certain kind 
of energy flow. However, thoughtful reconsideration supports a suspicion that the intuitive feeling is for the 
heat absorbed in a particular kind of process, e.g. constant pressure, for which, as we shall see, the heat q is 
equal to the change in a state function, the enthalpy change $\Delta H$. For another example, the 'heats' measured in 
modern calorimeters are usually determined either by a measurement of electrical or mechanical work or by 
comparing one process with another so calibrated (as in an ice calorimeter). Indeed one can argue that one 
never measures q directly, that all 'measurements' require equation (A2.1.10); one always infers q from other 
measurements. 
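
To see how q is inferred rather than measured, the following sketch applies equation (A2.1.10) to a hypothetical diathermic path between the same two helium states used for the adiabatic paths. The path (heating at constant volume, then at constant pressure, with no electrical work) and all names are assumptions made for illustration; one mole of a monatomic ideal gas is assumed throughout.

```python
R = 8.314462618      # gas constant, J K^-1 mol^-1
L_ATM = 101.325      # joules per litre-atmosphere
n = 1.0              # moles of a monatomic ideal gas (assumption)

T_i, p_i = 300.0, 1.000   # initial state (K, atm)
T_f, p_f = 450.0, 1.200   # final state (K, atm)
V_i = n * R * T_i / (p_i * L_ATM)   # litres
V_f = n * R * T_f / (p_f * L_ATM)   # litres

# Work needed on any adiabatic path between the two states (equals Delta U).
w_adiabatic = 1.5 * n * R * (T_f - T_i)

# Hypothetical non-adiabatic path:
#   step 1: heat at constant V_i until p = p_f (no work);
#   step 2: heat at constant p_f until V = V_f (expansion work only).
w = -p_f * (V_f - V_i) * L_ATM

# Heat inferred from equation (A2.1.10): q = w_adiabatic - w.
q = w_adiabatic - w

# Cross-check with heat capacities: C_V for step 1, C_p for step 2.
T_mid = T_i * p_f / p_i
q_direct = 1.5 * n * R * (T_mid - T_i) + 2.5 * n * R * (T_f - T_mid)

print(f"w = {w:.1f} J, q = {q:.1f} J, q from heat capacities = {q_direct:.1f} J")
```

The heat obtained as the difference between the adiabatic work and the actual work matches the value computed from heat capacities, although only work terms and state variables enter the first calculation.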

A2.1.4 THE SECOND LAW 

In this and nearly all subsequent sections, the work Dw will be restricted to pressure-volume work, -p dV, and 
the fact that the 'heat' Dq may in some cases be electrical work will be ignored. 

A2.1.4.1 REVERSIBLE PROCESSES 

A particular path from a given initial state to a given final state is the reversible process, one in which after 
each infinitesimal step the system is in equilibrium with its surroundings, and one in which an infinitesimal 
change in the conditions (constraints) would reverse the direction of the change. 

A simple example (figure A2.1.2) consists of a gas confined by a movable piston supporting a pile of sand 
whose weight produces a downward force per unit area equal to the pressure of the gas. Removal of a grain of 
sand decreases the downward pressure by an amount $\delta p$ and the piston rises with an increase of volume $\delta V$ 
sufficient to decrease the gas pressure by the same $\delta p$; the system is now again at equilibrium. Restoration of 
the grain of sand will drive the piston and the gas back to their initial states. Conversely, the successive 
removal of additional grains of sand will produce additional small decreases in pressure and small increases in 
volume; the sum of a very large number of such small steps can produce substantial changes in the 
thermodynamic properties of the system. Strictly speaking, such experimental processes are never quite 
reversible because one can never make the small changes in pressure and volume infinitesimally small (in 
such a case there would be no tendency for change and the process would take place only at an infinitely slow 
rate). The true reversible process is an idealized concept; however, one can usually devise processes 
sufficiently close to reversibility that no measurable differences will be observed. 

Figure A2.1.2. Reversible expansion of a gas with the removal one-by-one of grains of sand atop a piston. 

The mere fact that a substantial change can be broken down into a very large number of small steps, with 
equilibrium (with respect to any applied constraints) at the end of each step, does not guarantee that the 
process is reversible. One can modify the gas expansion discussed above by restraining the piston, not by a 
pile of sand, but by the series of stops (pins that one can withdraw one-by-one) shown in figure A2.1.3. Each 
successive state is indeed an equilibrium one, but the pressures on opposite sides of the piston are not equal, 
and pushing the pins back in one-by-one will not drive the piston back down to its initial position. The two 
processes are, in fact, quite different even in the infinitesimal limit of their small steps; in the first case work 
is done by the gas to raise the sand pile, while in the second case there is no such work. Both the processes 
may be called 'quasi-static' but only the first is anywhere near reversible. (Some thermodynamics texts 
restrict the term 'quasi-static' to a more restrictive meaning equivalent to 'reversible', but this then leaves no 
term for the slow irreversible process.) 



Figure A2.1.3. Irreversible expansion of a gas as stops are removed. 

If a system is coupled with its environment through an adiabatic wall free to move without constraints (such 
as the stops of the second example above), mechanical equilibrium, as discussed above, requires equality of 
the pressure $p$ on opposite sides of the wall. With a diathermic wall, thermal equilibrium requires that the 
temperature of the system equal that of its surroundings. Moreover, it will be shown later that, if the wall is 
permeable and permits exchange of matter, material equilibrium (no tendency for mass flow) requires equality 
of a chemical potential $\mu$. 


Obviously the first law is not all there is to the structure of thermodynamics, since some adiabatic changes 
occur spontaneously while the reverse process never occurs. An aspect of the second law is that a state 
function, the entropy S, is found that increases in a spontaneous adiabatic process and remains unchanged in a 
reversible adiabatic process; it cannot decrease in any adiabatic process. 

The next few sections deal with the way these experimental results can be developed into a mathematical 
system. A reader prepared to accept the second law on faith, and who is interested primarily in applications, 
may skip section A2.1.4.2 and section A2.1.4.6 and perhaps even A2.1.4.7, and go to the final statement in 
section A2.1.4.8. 

A2.1.4.2 ADIABATIC REVERSIBLE PROCESSES AND INTEGRABILITY 

In the example of the previous section, the release of the stop always leads to the motion of the piston in one 
direction, to a final state in which the pressures are equal, never in the other direction. This obvious 
experimental observation turns out to be related to a mathematical problem, the integrability of differentials in 
thermodynamics. The differential $Dq$, even $Dq_{\mathrm{rev}}$, is inexact, but in mathematics many such expressions can 
be converted into exact differentials with the aid of an integrating factor. 

In the example of pressure-volume work in the previous section, the adiabatic reversible process consisted 
simply of the sufficiently slow motion of an adiabatic wall as a result of an infinitesimal pressure difference. 
The work done on the system during an infinitesimal reversible change in volume is then $-p\,dV$ and one can 
write equation (A2.1.11) in the form 

$$Dq_{\mathrm{rev}} = dU + p\,dV = 0. \tag{A2.1.11}$$

If $U$ is expressed as a function of two variables of state, e.g. $V$ and $\theta$, one can write 
$dU = (\partial U/\partial V)_\theta\,dV + (\partial U/\partial\theta)_V\,d\theta$ and transform equation (A2.1.11) into the following: 

$$Dq_{\mathrm{rev}} = [(\partial U/\partial V)_\theta + p]\,dV + (\partial U/\partial\theta)_V\,d\theta = Y\,dV + Z\,d\theta = 0. \tag{A2.1.12}$$

The coefficients $Y$ and $Z$ are, of course, functions of $V$ and $\theta$ and therefore state functions. However, since in 
general $(\partial p/\partial\theta)_V$ is not zero, $\partial Y/\partial\theta$ is not equal to $\partial Z/\partial V$, so $Dq_{\mathrm{rev}}$ is not the differential of a state function but 
rather an inexact differential. 
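
A short worked case may make the inexactness concrete. It assumes a monatomic ideal gas with $pV = nR\theta$ and $U = \tfrac{3}{2}nR\theta$, which the argument above does not require; it is offered only as an illustration of the cross-derivative test and of how an integrating denominator can repair it.

```latex
% Coefficients Y and Z of Dq_rev = Y dV + Z d\theta for the assumed ideal gas
Y = \left(\frac{\partial U}{\partial V}\right)_\theta + p = \frac{nR\theta}{V},
\qquad
Z = \left(\frac{\partial U}{\partial \theta}\right)_V = \tfrac{3}{2}nR .

% Cross-derivatives differ, so Dq_rev is inexact
\left(\frac{\partial Y}{\partial \theta}\right)_V = \frac{nR}{V}
\;\neq\;
\left(\frac{\partial Z}{\partial V}\right)_\theta = 0 .

% Dividing by \theta gives an exact differential
\frac{Dq_{\mathrm{rev}}}{\theta}
  = nR\,\frac{dV}{V} + \tfrac{3}{2}nR\,\frac{d\theta}{\theta}
  = d\!\left[\,nR\ln V + \tfrac{3}{2}nR\ln\theta\,\right].
```

In this special case the empirical temperature itself serves as an integrating denominator; the general argument for its existence is developed in the text that follows.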

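For a system composed of two subsystems $\alpha$ and $\beta$ separated from each other by a diathermic wall and from 
the surroundings by adiabatic walls, the equation corresponding to equation (A2.1.12) is 

$$\begin{aligned} Dq_{\mathrm{rev}} &= [(\partial U^\alpha/\partial V^\alpha)_\theta + p^\alpha]\,dV^\alpha + [(\partial U^\beta/\partial V^\beta)_\theta + p^\beta]\,dV^\beta + [(\partial U^\alpha/\partial\theta)_{V^\alpha} + (\partial U^\beta/\partial\theta)_{V^\beta}]\,d\theta \\ &= X\,dV^\alpha + Y\,dV^\beta + Z\,d\theta = 0. \end{aligned} \tag{A2.1.13}$$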

One must now examine the integrability of the differentials in equation (A2.1.12) and equation (A2.1.13), 
which are examples of what mathematicians call Pfaff differential equations. If the equation is integrable, one 
can find an integrating denominator $\lambda$, a function of the variables of state, such that $Dq_{\mathrm{rev}}/\lambda = d\phi$, where $d\phi$ is 
the exact differential of a function $\phi$ that defines a surface (a line in the case of equation (A2.1.12)) in which the 
reversible adiabatic path must lie. 


All equations of two variables, such as equation (A2.1.12), are necessarily integrable because they can be 
written in the form $dy/dx = f(x, y)$, which determines a unique value of the slope of the line through any point 
$(x, y)$. Figure A2.1.4 shows a set of non-intersecting lines in $V$-$\theta$ space representing solutions of equation 
(A2.1.12). 



Figure A2.1.4. Adiabatic reversible (isentropic) paths that do not intersect. (The curves have been calculated 
for the isentropic expansion of a monatomic ideal gas.) 
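
The two-variable case can be explored numerically by treating equation (A2.1.12) as an ordinary differential equation for the slope of the adiabat. The sketch below assumes a monatomic ideal gas, for which the slope is dθ/dV = -Y/Z = -(2/3)θ/V, and integrates it with a classical fourth-order Runge-Kutta step; the function names and the choice of integrator are illustrative assumptions.

```python
def slope(V, theta):
    """d(theta)/dV = -Y/Z for a monatomic ideal gas (Y = nR*theta/V, Z = 1.5*nR)."""
    return -(2.0 / 3.0) * theta / V

def integrate_adiabat(V0, theta0, V1, steps=1000):
    """Follow the reversible adiabat from (V0, theta0) to volume V1 (RK4)."""
    h = (V1 - V0) / steps
    V, theta = V0, theta0
    for _ in range(steps):
        k1 = slope(V, theta)
        k2 = slope(V + h / 2, theta + h * k1 / 2)
        k3 = slope(V + h / 2, theta + h * k2 / 2)
        k4 = slope(V + h, theta + h * k3)
        theta += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        V += h
    return theta

# Adiabatic compression of the helium sample from 24.62 l to 22.07 l.
theta_num = integrate_adiabat(24.62, 300.0, 22.07)

# Closed form for the same gas: theta * V**(2/3) is constant along an adiabat.
theta_exact = 300.0 * (24.62 / 22.07) ** (2.0 / 3.0)

print(f"numerical: {theta_num:.2f} K, closed form: {theta_exact:.2f} K")
```

Because the slope is single-valued at every point, adiabats started from different temperatures at the same volume keep a fixed ratio and never cross, which is the behaviour sketched in figure A2.1.4.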

For equations such as (A2.1.13) involving more than two variables the problem is no longer trivial. Most such 
equations are not integrable. 

(Born [2] cites as an example of a simple expression for which no integrating factor exists 

$$DF = dy + x\,dz = \lambda(x, y, z)\,d\phi.$$

If an integrating factor exists, $\partial\phi/\partial x = 0$, $\partial\phi/\partial y = 1/\lambda$ and $\partial\phi/\partial z = x/\lambda$. From the first of these relations one 
concludes that $\phi$ depends only on $y$ and $z$. Using this result in the second relation one concludes that $\lambda$ depends 
only on $y$ and $z$. Given that $\phi$ and $\lambda$ are both functions only of $y$ and $z$, the third relation is a contradiction, so 
no factor $\lambda$ can exist.) 
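
The same conclusion can be reached with the standard integrability test for a Pfaffian form in three variables, which is not the argument used above but gives a quick check. Writing the expression as $X\,dx + Y\,dy + Z\,dz$ with coefficient vector $\mathbf{A} = (X, Y, Z)$, an integrating factor exists locally only if $\mathbf{A} \cdot (\nabla \times \mathbf{A}) = 0$. The vector notation here is introduced purely for this illustration.

```latex
% Born's example: DF = dy + x dz, so A = (X, Y, Z) = (0, 1, x)
\nabla \times \mathbf{A}
  = \left(
      \frac{\partial Z}{\partial y} - \frac{\partial Y}{\partial z},\;
      \frac{\partial X}{\partial z} - \frac{\partial Z}{\partial x},\;
      \frac{\partial Y}{\partial x} - \frac{\partial X}{\partial y}
    \right)
  = (0,\,-1,\,0),

\mathbf{A} \cdot (\nabla \times \mathbf{A})
  = 0 \cdot 0 + 1 \cdot (-1) + x \cdot 0 = -1 \neq 0 .
```

Since the product does not vanish anywhere, no integrating factor exists for this expression, in agreement with the argument by contradiction.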

There are now various adiabatic reversible paths because one can choose to vary $dV^\alpha$ or $dV^\beta$ in any 
combination of steps. The paths can cross and interconnect. The question of integrability is tied to the question 
of whether all regions of $V^\alpha$, $V^\beta$, $\theta$ space are accessible by a series of connected adiabatic reversible paths or 
whether all such paths lie in a series of non-crossing surfaces. To distinguish, one must use a theorem of 
Carathéodory (the proof can be found in [1] and [2] and in books on differential equations): 


If a Pfaff differential expression $DF = X\,dx + Y\,dy + Z\,dz$ has the property that every arbitrary neighbourhood 
of a point $P(x, y, z)$ contains points that are inaccessible along a path corresponding to a solution of the 
equation $DF = 0$, then an integrating denominator exists. Physically this means that there are two mutually 
exclusive possibilities: either (a) a hierarchy of non-intersecting surfaces $\phi(x, y, z) = C$, each with a different 
value of the constant $C$, represents the solutions $DF = 0$, in which case a point on one surface is inaccessible 
by a path that is confined to another, or (b) any two points can be connected by a path, each infinitesimal 
segment of which satisfies the condition $DF = 0$. One must perform some experiments to determine which 
situation prevails in the physical world. 

It suffices to carry out one such experiment, such as the expansion or compression of a gas, to establish that 
there are states inaccessible by adiabatic reversible paths, indeed even by any adiabatic irreversible path. For 
example, if one takes one mole of N$_2$ gas in a volume of 24 litres at a pressure of 1.00 atm (i.e. at 25 °C), 
there is no combination of adiabatic reversible paths that can bring the system to a final state with the same 
volume and a different temperature. A higher temperature (on the ideal-gas scale $\theta^{\mathrm{ig}}$) can be reached by an 
adiabatic irreversible path, e.g. by doing electrical work on the system, but a state with the same volume and a 
lower temperature $\theta^{\mathrm{ig}}$ is inaccessible by any adiabatic path. 

A2.1.4.3 ENTROPY AND TEMPERATURE 

One concludes, therefore, that equation (A2.1.13) is integrable and there exists an integrating factor $\lambda$. For the 
general case $Dq_{\mathrm{rev}} = \lambda\,d\phi$ it can be shown [1, 2] that 

$$\ln \lambda = \int g(\theta)\,d\theta + \ln I(\phi),$$

where $I(\phi)$ is a constant of integration. It then follows that one may define two new quantities by the relations: 

$$\ln(T/C) = \int g(\theta)\,d\theta, \qquad S = (1/C)\int I(\phi)\,d\phi,$$

and one can now write 

$$Dq_{\mathrm{rev}} = \lambda\,d\phi = T\,dS. \tag{A2.1.14}$$

There are an infinite number of other integrating factors $\lambda$ with corresponding functions $\phi$; the new quantities 
$T$ and $S$ are chosen for convenience. $S$ is, of course, the entropy and $T$, a function of $\theta$ only, is the 'absolute 
temperature', which will turn out to be the ideal-gas temperature, $\theta^{\mathrm{ig}}$. The constant $C$ is just a scale factor 
determining the size of the degree. 

The surfaces in which the paths satisfying the condition $Dq = 0$ must lie are, thus, surfaces of constant 
entropy; they do not intersect and can be arranged in an order of increasing or decreasing numerical value of 
the constant S. One half of the second law of thermodynamics, namely that for reversible changes, is now 
established. 


Since $Dw_{\mathrm{rev}} = -p\,dV$, one can utilize the relation $dU = Dq_{\mathrm{rev}} + Dw_{\mathrm{rev}}$ and write 

$$dU = T\,dS - p\,dV. \tag{A2.1.15}$$

Equation (A2.1.15) involves only state functions, so it applies to any infinitesimal change in state whether the 
actual process is reversible or not (although, as equation (A2.1.14) suggests, dS is not experimentally 
accessible unless some reversible path exists). 


A2.1.4.4 THERMODYNAMIC TEMPERATURE AND THE IDEAL-GAS THERMOMETER 

So far, the thermodynamic temperature $T$ has appeared only as an integrating denominator, a function of the 
empirical temperature $\theta$. One now can show that $T$ is, except for an arbitrary proportionality factor, the same 
as the empirical ideal-gas temperature $\theta^{\mathrm{ig}}$ introduced earlier. Equation (A2.1.15) can be rewritten in the form 

$$T\,dS = dU + p\,dV = (\partial U/\partial\theta)_V\,d\theta + [(\partial U/\partial V)_\theta + p]\,dV. \tag{A2.1.16}$$

One assumes the existence of a fluid that obeys Boyle's law (equation (A2.1.4)) and that, on adiabatic 
expansion into a vacuum, shows no change in temperature, i.e. for which $pV = f(\theta)$ and $(\partial U/\partial V)_\theta = 0$. (All 
real gases satisfy this condition in the limit of zero pressure.) Equation (A2.1.16) then simplifies to 

$$T\,dS = (dU/d\theta)\,d\theta + [f(\theta)/V]\,dV = f(\theta)\{[(dU/d\theta)/f(\theta)]\,d\theta + dV/V\}.$$

The factor in wavy brackets is obviously an exact differential because the coefficient of $d\theta$ is a function only 
of $\theta$ and the coefficient of $dV$ is a function only of $V$. (The cross-derivatives vanish.) Manifestly then 

$$T = Cf(\theta) = C(pV)$$

$$dS = \frac{1}{Cf(\theta)}\,dU + \frac{1}{CV}\,dV = \frac{1}{T}\,dU + \frac{p}{T}\,dV.$$

If the arbitrary constant $C$ is set equal to $(nR)^{-1}$, where $n$ is the number of moles in the system and $R$ is the gas 
constant per mole, then the thermodynamic temperature $T = \theta^{\mathrm{ig}}$, where $\theta^{\mathrm{ig}}$ is the temperature measured by the 
ideal-gas thermometer depending on the equation of state 

$$pV = nR\theta^{\mathrm{ig}} = nRT. \tag{A2.1.17}$$

Now that the identity has been proved, $\theta^{\mathrm{ig}}$ need not be used again. 
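
With T identified, the relation dS = dU/T + (p/T) dV can be integrated for an ideal gas of constant heat capacity to give ΔS = n[C_V ln(T2/T1) + R ln(V2/V1)]. The sketch below applies this to the helium states used earlier for the adiabatic paths; it assumes a monatomic ideal gas, and the function name `delta_S` is illustrative.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1

def delta_S(T1, V1, T2, V2, n=1.0, cv_over_R=1.5):
    """Entropy change of an ideal gas with constant C_V between two (T, V) states.

    Obtained by integrating dS = n*C_V*dT/T + n*R*dV/V; only the volume
    ratio enters, so any consistent volume unit may be used.
    """
    return n * R * (cv_over_R * math.log(T2 / T1) + math.log(V2 / V1))

# Overall change of state: 300.0 K, 24.62 l -> 450.0 K, 30.77 l.
dS_overall = delta_S(300.0, 24.62, 450.0, 30.77)

# Reversible adiabatic compression: 300.0 K, 24.62 l -> 322.7 K, 22.07 l.
dS_adiabat = delta_S(300.0, 24.62, 322.7, 22.07)

print(f"overall change:      {dS_overall:.2f} J/K")
print(f"reversible adiabat:  {dS_adiabat:.4f} J/K")
```

The overall change has a positive entropy change, consistent with its being reachable by adiabatic paths that include irreversible steps, while the reversible adiabatic step leaves the entropy unchanged to within the rounding of the tabulated values.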
A2.1.4.5 IRREVERSIBLE CHANGES AND THE SECOND LAW 

It is still necessary to consider the role of entropy in irreversible changes. To do this we return to the system 
considered earlier in section A2.1.4.2, the one composed of two subsystems in thermal contact, each coupled 
with the outside through movable adiabatic walls. Earlier this system was described as a function of three 
independent variables, $V^\alpha$, $V^\beta$ and $\theta$ (or $T$). Now, instead of the temperature, the entropy $S = S^\alpha + S^\beta$ will be 
used as the third variable. A final state $V^{\alpha\prime}$, $V^{\beta\prime}$, $S'$ can always be reached from an initial state $V^{\alpha 0}$, $V^{\beta 0}$, $S^0$ by a 
two-step process. 

(1) The volumes are changed adiabatically and reversibly from $V^{\alpha 0}$ and $V^{\beta 0}$ to $V^{\alpha\prime}$ and $V^{\beta\prime}$, during which 
change the entropy remains constant at $S^0$. 

(2) At constant volumes $V^{\alpha\prime}$ and $V^{\beta\prime}$, the state is changed by the adiabatic performance of work (stirring, 
rubbing, electrical 'heating') until the entropy is changed from $S^0$ to $S'$. 

If the entropy change in step (2) could be at times greater than zero and at other times less than zero, every 
neighbouring state $V^{\alpha\prime}$, $V^{\beta\prime}$, $S'$ would be accessible, for there is no restriction on the adjustment of volumes in 
step (1). This contradicts the experimental fact that allowed the integration of equation (A2.1.13) and 
established the entropy $S$ as a state function. It must, therefore, be true that either $S' > S^0$ always or that $S' < S^0$ 
always. One experiment demonstrates that the former is the correct alternative; if one takes the absolute 
temperature as a positive number, one finds that the entropy cannot decrease in an adiabatic process. This 
completes the specification of temperature, entropy and part of the second law of thermodynamics. One 
statement of the second law of thermodynamics is therefore: 


$$\text{for any adiabatic process } (Dq = 0): \quad dS \geq 0. \tag{A2.1.18}$$


(This is frequently stated for an isolated system, but the same statement about an adiabatic system is broader.) 

A2.1.4.6 IRREVERSIBLE CHANGES AND THE MEASUREMENT OF ENTROPY 

Thermodynamic measurements are possible only when both the initial state and the final state are essentially 
at equilibrium, i.e. internally and with respect to the surroundings. Consequently, for a spontaneous 
thermodynamic change to take place, some constraint — internal or external — must be changed or released. 
For example, the expansion of a gas requires the release of a pin holding a piston in place or the opening of a 
stopcock, while a chemical reaction can be initiated by mixing the reactants or by adding a catalyst. One often 
finds statements that 'at equilibrium in an isolated system (constant U, V, n), the entropy is maximized'. What 
does this mean? 

Consider two ideal-gas subsystems $\alpha$ and $\beta$ coupled by a movable diathermic wall (piston) as shown in figure 
A2.1.5. The wall is held in place at a fixed position $l$ by a stop (pin) that can be removed; then the wall is free 
to move to a new position $l'$. The total system ($\alpha + \beta$) is adiabatically enclosed, indeed isolated ($q = w = 0$), so 
the total energy, volume and number of moles are fixed. 



Figure A2.1.5. Irreversible changes. Two gases at different pressures separated by a diathermic wall, a piston 
that can be released by removing a stop (pin). 

When the pin is released, the wall will either (a) move to the right, or (b) move to the left, or (c) remain at the 
original position $l$. It is evident that these three cases correspond to initial situations in which $p^\alpha > p^\beta$, $p^\alpha < p^\beta$ 
and $p^\alpha = p^\beta$, respectively; if there are no other stops, the piston will come to rest in a final state where 
$p^\alpha = p^\beta$. For the two spontaneous adiabatic changes (a) and (b), the second law requires that $\Delta S > 0$, but one 
does not yet know the magnitude. (Nothing happens in case (c), so $\Delta S = 0$.) 


The change of case (a) can be carried out in a series of small steps by having a large number of stops 
separated by successive distances $\Delta l$. For any intermediate step, $p^{\alpha\prime} > p^{\alpha\prime\prime} > p^{\beta\prime\prime} > p^{\beta\prime}$, but since the steps, 
no matter how small, are never reversible, one still has no information about $\Delta S$. 


The only way to determine the entropy change is to drive the system back from the final state to the initial 
state along a reversible path. One reimposes a constraint, not with simple stops, but with a gear system that 
permits one to do mechanical work driving the piston back to its original position $l_0$ along a reversible path; 
this work can be measured in various conventional ways. During this reverse change the system is no longer 
isolated; the total $V$ and the total $n$ remain unchanged, but the work done on the system adds energy. To keep 
the total energy constant, an equal amount of energy must leave the system in the form of heat: 

$$dU = Dq_{\mathrm{rev}} + Dw_{\mathrm{rev}} = 0$$

or 

$$\Delta S_{\mathrm{forward}} = -\Delta S_{\mathrm{reverse}} = -\int \frac{Dq_{\mathrm{rev}}}{T} = \int \frac{Dw_{\mathrm{rev}}}{T}.$$

For an ideal gas and a diathermic piston, the condition of constant energy means constant temperature. The 
reverse change can then be carried out simply by relaxing the adiabatic constraint on the external walls and 
immersing the system in a thermostatic bath. More generally the initial state and the final state may be at 
different temperatures so that one may have to have a series of temperature baths to ensure that the entire 
series of steps is reversible. 

Note that although the change in state has been reversed, the system has not returned along the same detailed 
path. The forward spontaneous process was adiabatic, unlike the driven process and, since it was not 
reversible, surely involved some transient temperature and pressure gradients in the system. Even for a series 
of small steps ('quasi-static' changes), the infinitesimal forward and reverse paths must be different in detail. 
Moreover, because q and w are different, there are changes in the surroundings; although the system has 
returned to its initial state, the surroundings have not. 

One can, in fact, drive the piston in both directions from the equilibrium value $l = l_e$ ($p^\alpha = p^\beta$) and construct a
curve of entropy S (with an arbitrary zero) as a function of the piston position l (figure A2.1.6). If there is a
series of stops, releasing the piston will cause l to change in the direction of increasing entropy until the piston
is constrained by another stop or until l reaches $l_e$. It follows that at $l = l_e$, $dS/dl = 0$ and $d^2S/dl^2 < 0$; i.e. S is
maximized when l is free to seek its own value. Were this not so, one could find spontaneous processes to
jump from the final state to one of still higher entropy.
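
The shape of such an entropy curve can be sketched numerically. The following is only an illustrative model, not part of the original treatment: it assumes two ideal gases at a common fixed temperature in a cylinder of unit cross-section and unit total length, and the function names and mole numbers are hypothetical. The position-dependent part of S is scanned over piston positions, and the pressures are compared at the maximum.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1

def entropy(l, n_a=1.0, n_b=2.0, length=1.0, area=1.0):
    """Position-dependent part of S for two ideal gases at fixed T.

    Subsystem alpha occupies volume area*l, subsystem beta area*(length - l).
    The additive constant (the arbitrary zero of S) is dropped.
    """
    return R * (n_a * math.log(area * l) + n_b * math.log(area * (length - l)))

def pressures(l, T=300.0, n_a=1.0, n_b=2.0, length=1.0, area=1.0):
    """Ideal-gas pressures on the two sides of the piston."""
    return n_a * R * T / (area * l), n_b * R * T / (area * (length - l))

# Scan piston positions and pick the one of maximum entropy.
grid = [i / 10000 for i in range(1, 10000)]
l_e = max(grid, key=entropy)

p_a, p_b = pressures(l_e)
print(f"l_e = {l_e:.4f}, p_alpha = {p_a:.1f} Pa, p_beta = {p_b:.1f} Pa")
```

Within the resolution of the grid, the position of maximum entropy is the one at which the two pressures agree, as the argument above requires.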



Figure A2.1.6. Entropy as a function of piston position l (the piston held by stops). The horizontal lines mark
possible positions of stops, whose release produces an increase in entropy, the amount of which can be 
measured by driving the piston back reversibly. 

Thus, the spontaneous process involves the release of a constraint while the driven reverse process involves 
the imposition of a constraint. The details of the reverse process are irrelevant; any series of reversible steps 
by which one can go from the final state back to the initial state will do to measure $\Delta S$.

A2.1.4.7 IRREVERSIBLE PROCESSES: WORK, HEAT AND ENTROPY CREATION

One has seen that thermodynamic measurements can yield information about the change $\Delta S$ in an irreversible
process (and thereby the changes in other state functions as well). What does thermodynamics tell one about
work and heat in irreversible processes? Not much, in spite of the assertion in many thermodynamics books
that

$$Dw = -p_{ext}\,dV = p_{ext}\,dV_{ext} \tag{A2.1.19}$$

and

$$Dq = -T_{ext}\,dS_{ext} = dU - Dw \tag{A2.1.20}$$

where $p_{ext}$ and $T_{ext}$ are the external pressure and temperature, i.e. those of the surroundings in which the
changes $dV_{ext} = -dV$ and $dS_{ext}$ occur.

Consider the situation illustrated in figure A2.1.5, with the modification that the piston is now an adiabatic
wall, so the two temperatures need not be equal. Energy is transmitted from subsystem α to subsystem β only
in the form of work; obviously $dV^\alpha = -dV^\beta$ so, in applying equation (A2.1.20), is $dU^{\alpha\to\beta}$ equal to
$-p^\alpha\,dV^\beta = p^\alpha\,dV^\alpha$ or equal to $p^\beta\,dV^\alpha$, or is it something else entirely? One can measure the changes in temperature,
$T^\alpha_f - T^\alpha_i$ and $T^\beta_f - T^\beta_i$, and thus determine $\Delta U^{\alpha\to\beta}$ after the fact, but could it have been predicted in
advance, at least for ideal gases? If the piston were a diathermic wall so the final temperatures are equal, the
energy transfer $\Delta U^{\alpha\to\beta}$ would be calculable, but even in this case it is unclear how this transfer should be
divided between heat and work.

In general, the answers to these questions are ambiguous. When the pin in figure A2.1.5 is released, the 
potential energy inherent in the pressure difference imparts a kinetic energy to the piston. Unless there is a 
mechanism for the dissipation of this kinetic energy, the piston will oscillate like a spring; frictional forces, of 
course, dissipate this energy, but the extent to which the dissipation takes place in subsystem α or subsystem
β depends on details of the experimental design not uniquely fixed by specifying the initial thermodynamic
state. (For example, one can introduce braking mechanisms that dissipate the energy largely in subsystem α
or, conversely, largely in subsystem β.) Only in one special case is there a clear prediction: if one subsystem
(β) is empty no work can be done by α on β; for expansion into a vacuum necessarily w = 0. A more detailed
discussion of the work involved in irreversible expansion has been given by Kivelson and Oppenheim [7]. 

The paradox involved here can be made more understandable by introducing the concept of entropy creation.
Unlike the energy, the volume or the number of moles, the entropy is not conserved. The entropy of a system
(in the example, subsystems α or β) may change in two ways: first, by the transport of entropy across the
boundary (in this case, from α to β or vice versa) when energy is transferred in the form of heat, and second,
by the creation of entropy within the subsystem during an irreversible process. Thus one can write for the
change in the entropy of subsystem α in which some process is occurring

$$dS^\alpha = d_t S^\alpha + d_i S^\alpha$$

where $d_t S^\alpha = -d_t S^\beta$ is the change in entropy due to heat transfer to subsystem α and $d_i S^\alpha$ is the irreversible
entropy creation inside subsystem α. (In the adiabatic example the dissipation of the kinetic energy of the
piston by friction creates entropy, but no entropy is transferred because the piston is an adiabatic wall.)

The total change $dS^\alpha$ can be determined, as has been seen, by driving the subsystem α back to its initial state,
but the separation into $d_t S^\alpha$ and $d_i S^\alpha$ is sometimes ambiguous. Any statistical mechanical interpretation of the
second law requires that, at least for any volume element of macroscopic size, $d_i S \geq 0$. However, the total
entropy change $dS^\alpha$ can be either positive or negative since the second law places no limitation on either the
sign or the magnitude of $d_t S^\alpha$. (In the example above, the piston's adiabatic wall requires that
$d_t S^\alpha = d_t S^\beta = 0$.)

In an irreversible process the temperature and pressure of the system (and other properties such as the
chemical potentials $\mu_i$ to be defined later) are not necessarily definable at some intermediate time between the
equilibrium initial state and the equilibrium final state; they may vary greatly from one point to another. One
can usually define T and p for each small volume element. (These volume elements must not be too small; e.g.
for gases, it is impossible to define T, p, S, etc for volume elements smaller than the cube of the mean free
path.) Then, for each such sub-subsystem, $d_i S$ (but not the total dS) must not be negative. It follows that $d_i S^\alpha$,
the sum of all the $d_i S$s for the small volume elements, is zero or positive. A detailed analysis of such
irreversible processes is beyond the scope of classical thermodynamics, but is the basis for the important field
of 'irreversible thermodynamics'.

The assumption (frequently unstated) underlying equation (A2.1.19) and equation (A2.1.20) for the
measurement of irreversible work and heat is this: in the surroundings, which will be called subsystem β,
internal equilibrium (uniform $T^\beta$, $p^\beta$ and $\mu_i^\beta$ throughout the subsystem; i.e. no temperature, pressure or
concentration gradients) is maintained throughout the period of time in which the irreversible changes are
taking place in subsystem α. If this condition is satisfied $d_i S^\beta = 0$ and all the entropy creation takes place
entirely in α. In any thermodynamic measurement that purports to yield values of q or w for an irreversible
change, one must ensure that this condition is very nearly met. (Obviously, in the expansion depicted in figure
A2.1.5 neither subsystem α nor subsystem β satisfied this requirement.)

Essentially this requirement means that, during the irreversible process, immediately inside the boundary, i.e. 
on the system side, the pressure and/or the temperature are only infinitesimally different from that outside, 
although substantial pressure or temperature gradients may be found outside the vicinity of the boundary. 
Thus an infinitesimal change in $p_{ext}$ or $T_{ext}$ would instantly reverse the direction of the energy flow, i.e. the
sign of w or q. That part of the total process occurring at the boundary is then 'reversible'. 

Subsystem β may now be called the 'surroundings' or as Callen (see further reading at the end of this article)
does, in an excellent discussion of this problem, a 'source'. To formulate this mathematically one notes that, if
$d_i S^\beta = 0$, one can then write

$$dS^\beta = d_t S^\beta = Dq^\beta / T^\beta$$

and thus

$$d_t S^\alpha = -d_t S^\beta = -Dq^\beta / T^\beta = Dq^\alpha / T^\beta$$

because $Dq^\alpha$, the energy received by α in the form of heat, must be the negative of that lost by β. Note,
however, that the temperature specified is still $T^\beta$, since only in the β subsystem has no entropy creation been
assumed ($d_i S^\beta = 0$). Then

$$d_i S^\alpha = dS^\alpha - d_t S^\alpha = dS^\alpha - Dq^\alpha / T^\beta \geq 0$$

If one adds $Dq^\alpha / T^\beta$ to both sides of the inequality one has

$$dS^\alpha \geq Dq^\alpha / T^\beta$$

A2.1.4.8 FINAL STATEMENT 

If one now considers α as the 'system' and β as the 'surroundings' the second law can be reformulated in the
form:

There exists a state function S, called the entropy of a system, related to the heat Dq absorbed from the
surroundings during an infinitesimal change by the relations

$$dS = \frac{Dq_{rev}}{T_{surr}}, \qquad \Delta S = \int \frac{Dq_{rev}}{T_{surr}} \qquad \text{(reversible)}$$

$$dS > \frac{Dq_{irrev}}{T_{surr}}, \qquad \Delta S > \int \frac{Dq_{irrev}}{T_{surr}} \qquad \text{(irreversible)}$$

where $T_{surr}$ is a positive quantity depending only on the (empirical) temperature of the surroundings. It is
understood that for the surroundings $d_i S_{surr} = 0$. For the integral to have any meaning $T_{surr}$ must be constant,
or one must change the surroundings in each step. The above equations can be written in the more compact
form

$$dS \geq \frac{Dq}{T_{surr}} \tag{A2.1.21}$$

where, in this and subsequent similar expressions, the symbol $\geq$ ('greater than or equal to') implies the
equality for a reversible process and the inequality for a spontaneous (irreversible) process. 

Equation (A2.1.21) includes, as a special case, the statement $dS \geq 0$ for adiabatic processes (for which Dq = 0)
and, a fortiori, the same statement about processes that may occur in an isolated system (Dq = Dw = 0). If the 
universe is an isolated system (an assumption that, however plausible, is not yet subject to experimental 
verification), the first and second laws lead to the famous statement of Clausius: 'The energy of the universe 
is constant; the entropy of the universe tends always toward a maximum.' 

It must be emphasized that equation (A2.1.21) permits the entropy of a particular system to decrease; this can
occur if more entropy is transferred to the surroundings than is created within the system. The entropy of the
system cannot decrease, however, without an equal or greater increase in entropy somewhere else.

There are many equivalent statements of the second law, some of which involve statements about heat engines
and 'perpetual motion machines of the second kind' that appear superficially quite different from equation
(A2.1.21). They will not be dealt with here, but two variant forms of equation (A2.1.21) may be noted: in
view of the definition $dS = Dq_{rev}/T_{surr}$ one can also write for an infinitesimal change

$$Dq \leq Dq_{rev}$$

and, because $dU = Dq_{rev} + Dw_{rev} = Dq + Dw$,

$$Dw \geq Dw_{rev}$$

Since w is defined as work done on the system, the minimum amount of work necessary to produce a given 
change in the system is that in a reversible process. Conversely, the amount of work done by the system on 
the surroundings is maximal when the process is reversible. 
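
A small numerical sketch may make these two inequalities concrete. It is an illustration under stated assumptions rather than anything taken from the text: the gas is ideal, the change is an isothermal expansion, the irreversible path is a single-step expansion against a constant external pressure equal to the final pressure, and the surroundings are taken to remain in internal equilibrium so that equation (A2.1.19) applies. The function name and the numerical values are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1

def isothermal_expansion(n, T, V1, V2):
    """Return (w_rev, w_irr, q_rev, q_irr, T_dS) in joules for an ideal gas.

    w is the work done ON the gas. The irreversible path is a one-step
    expansion against a constant external pressure equal to the final pressure.
    """
    w_rev = -n * R * T * math.log(V2 / V1)
    p_ext = n * R * T / V2
    w_irr = -p_ext * (V2 - V1)
    # Delta U = 0 for an ideal gas at constant T, so q = -w on either path.
    q_rev, q_irr = -w_rev, -w_irr
    T_dS = n * R * T * math.log(V2 / V1)  # T times the state-function change
    return w_rev, w_irr, q_rev, q_irr, T_dS

w_rev, w_irr, q_rev, q_irr, T_dS = isothermal_expansion(1.0, 298.15, 1.0e-3, 2.0e-3)
print(f"w_rev = {w_rev:.1f} J, w_irr = {w_irr:.1f} J")
print(f"q_rev = {q_rev:.1f} J, q_irr = {q_irr:.1f} J, T*dS = {T_dS:.1f} J")
```

In this model the irreversible path absorbs less heat than $T\Delta S$ and delivers less work to the surroundings than the reversible path, in line with the two variant forms above.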

One may note, in concluding this discussion of the second law, that in a sense the zeroth law (thermal 
equilibrium) presupposes the second. Were there no irreversible processes, no tendency to move toward 
equilibrium rather than away from it, the concepts of thermal equilibrium and of temperature would be 
meaningless. 

A2.1.5 OPEN SYSTEMS 

A2.1.5.1 PERMEABLE WALLS AND THE CHEMICAL POTENTIAL 

We now turn to a new kind of boundary for a system, a wall permeable to matter. Molecules that pass through 
a wall carry energy with them, so equation (A2.1.15) must be generalized to include the change of the energy 
with a change in the number of moles dn: 

$$dU = T\,dS - p\,dV + \mu\,dn \tag{A2.1.22}$$

Here μ is the 'chemical potential' just as the pressure p is a mechanical potential and the temperature T is a
thermal potential. A difference in chemical potential $\Delta\mu$ is a driving 'force' that results in the transfer of
molecules through a permeable wall, just as a pressure difference $\Delta p$ results in a change in position of a
movable wall and a temperature difference $\Delta T$ produces a transfer of energy in the form of heat across a
diathermic wall. Similarly equilibrium between two systems separated by a permeable wall must require 
equality of the chemical potential on the two sides. For a multicomponent system, the obvious extension of 
equation (A2.1.22) can be written 

$$dU = T\,dS - p\,dV + \sum_i \mu_i\,dn_i \tag{A2.1.23}$$

where $\mu_i$ and $n_i$ are the chemical potential and number of moles of the ith species. Equation (A2.1.23) can also
be generalized to include various forms of work (such as gravitational, electrochemical, electromagnetic,
surface formation, etc., as well as the familiar pressure-volume work), in which a generalized force $X_j$
produces a displacement $dx_j$ along the coordinate $x_j$, by writing

$$dU = T\,dS + \sum_j X_j\,dx_j + \sum_i \mu_i\,dn_i$$


As a particular example, one may take the electromagnetic work terms of equation (A2.1.8) and write

$$dU = T\,dS - V(\mathbf{P}\cdot d\mathbf{E}_0 + \mathbf{M}\cdot d\mathbf{B}_0) + \sum_i \mu_i\,dn_i \tag{A2.1.24}$$

The chemical potential now includes any such effects, and one refers to the gravochemical potential, the 
electrochemical potential, etc. For example, if the system consists of a gas extending over a substantial 
difference in height, it is the gravochemical potential (which includes a term mgh) that is the same at all 
levels, not the pressure. The electrochemical potential will be considered later. 
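
As a rough illustration of the gravochemical potential, consider an isothermal ideal gas in a uniform gravitational field. On a molar basis the quantity that must be uniform is then $\mu^\circ(T) + RT\ln(p/p^\circ) + Mgh$, which gives the familiar barometric formula. The sketch below assumes this model; the molar mass (roughly that of nitrogen), the temperature and the function names are hypothetical choices for illustration only.

```python
import math

R = 8.314462618   # gas constant, J K^-1 mol^-1
g = 9.80665       # standard acceleration of gravity, m s^-2

def pressure_at_height(h, p0=101325.0, M=0.028, T=288.15):
    """Pressure (Pa) at height h (m) implied by a uniform gravochemical potential."""
    return p0 * math.exp(-M * g * h / (R * T))

def gravochemical_offset(h, p0=101325.0, M=0.028, T=288.15):
    """mu(h) - mu(0) in J/mol, i.e. R T ln(p(h)/p0) + M g h."""
    p = pressure_at_height(h, p0, M, T)
    return R * T * math.log(p / p0) + M * g * h

for h in (0.0, 1000.0, 5000.0):
    print(h, round(pressure_at_height(h)), gravochemical_offset(h))
```

The pressure falls with height while the printed offset stays at zero to within rounding error, which is the sense in which the gravochemical potential, not the pressure, is the same at all levels.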

A2.1.5.2 INTERNAL EQUILIBRIUM 

Two subsystems α and β, in each of which the potentials T, p, and all the $\mu_i$s are uniform, are permitted to
interact and come to equilibrium. At equilibrium all infinitesimal processes are reversible, so for the overall
system (α + β), which may be regarded as isolated, the quantities conserved include not only energy, volume
and numbers of moles, but also entropy, i.e. there is no entropy creation in a system at equilibrium. One now
considers an infinitesimal reversible process in which small amounts of entropy $dS^{\alpha\to\beta}$, volume $dV^{\alpha\to\beta}$ and
numbers of moles $dn_i^{\alpha\to\beta}$ are transferred from subsystem α to subsystem β. For this reversible change, one
may use equation (A2.1.23) and write for $dU^\alpha$ and $dU^\beta$

$$dU^\alpha = -T^\alpha\,dS^{\alpha\to\beta} + p^\alpha\,dV^{\alpha\to\beta} - \sum_i \mu_i^\alpha\,dn_i^{\alpha\to\beta}$$

$$dU^\beta = T^\beta\,dS^{\alpha\to\beta} - p^\beta\,dV^{\alpha\to\beta} + \sum_i \mu_i^\beta\,dn_i^{\alpha\to\beta}$$

Combining, one obtains for dU

$$dU = dU^\alpha + dU^\beta = (T^\beta - T^\alpha)\,dS^{\alpha\to\beta} - (p^\beta - p^\alpha)\,dV^{\alpha\to\beta} + \sum_i (\mu_i^\beta - \mu_i^\alpha)\,dn_i^{\alpha\to\beta} = 0$$


Thermal equilibrium means free transfer (exchange) of energy in the form of heat, mechanical (hydrostatic)
equilibrium means free transfer of energy in the form of pressure-volume work, and material equilibrium
means free transfer of energy by the motion of molecules across the boundary. Thus it follows that at
equilibrium our choices of $dS^{\alpha\to\beta}$, $dV^{\alpha\to\beta}$ and $dn_i^{\alpha\to\beta}$ are independent and arbitrary. Yet the total energy
must be kept unchanged, so the conclusion that the coefficients of $dS^{\alpha\to\beta}$, $dV^{\alpha\to\beta}$ and $dn_i^{\alpha\to\beta}$ must vanish is
inescapable:

$$T^\alpha = T^\beta, \qquad p^\alpha = p^\beta, \qquad \mu_i^\alpha = \mu_i^\beta$$


If there are more than two subsystems in equilibrium in the large isolated system, the transfers of S, V and $n_i$
between any pair can be chosen arbitrarily; so it follows that at equilibrium all the subsystems must have the
same temperature, pressure and chemical potentials. The subsystems can be chosen as very small volume
elements, so it is evident that the criterion of internal equilibrium within a system (asserted earlier, but without
proof) is uniformity of temperature, pressure and chemical potentials throughout. It has now been
demonstrated conclusively that T, p and $\mu_i$ are 'potentials'; they are intensive properties that measure 'levels';
they behave like the (equal) heights of the water surface in two interconnected reservoirs at equilibrium.
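
The equalization of a potential can be checked numerically in the simplest case, thermal contact only. The sketch below is an illustration, not the author's procedure: it assumes two monatomic ideal gases with fixed volumes and mole numbers that may exchange energy at fixed total energy, and it locates the maximum of the total entropy with a golden-section search. All names and numerical values are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1

def total_entropy(U_a, U_total, n_a, n_b):
    """Energy-dependent part of S for two monatomic ideal gases (fixed V, n)."""
    return 1.5 * R * (n_a * math.log(U_a) + n_b * math.log(U_total - U_a))

def temperature(U, n):
    """T = (dU/dS) at fixed V and n for a monatomic ideal gas, U = (3/2) n R T."""
    return U / (1.5 * n * R)

def maximize(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol * (hi - lo):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d   # maximum lies in [a, d]
        else:
            a = c   # maximum lies in [c, b]
    return 0.5 * (a + b)

n_a, n_b, U_total = 1.0, 3.0, 15000.0   # mol, mol, J (illustrative values)
U_a = maximize(lambda u: total_entropy(u, U_total, n_a, n_b), 1.0, U_total - 1.0)
T_a, T_b = temperature(U_a, n_a), temperature(U_total - U_a, n_b)
print(f"U_alpha = {U_a:.2f} J, T_alpha = {T_a:.3f} K, T_beta = {T_b:.3f} K")
```

The energy partition that maximizes the total entropy is the one at which the two temperatures coincide; the same reasoning applied to volume or mole-number exchange would equalize p or $\mu_i$.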

A2.1.5.3 INTEGRATION OF DU 

Equation (A2.1.23) can be integrated by the following trick: One keeps T, p, and all the chemical potentials $\mu_i$
constant and increases the number of moles $n_i$ of each species by an amount $n_i\,d\xi$, where $d\xi$ is the same
fractional increment for each. Obviously one is increasing the size of the system by a factor $(1 + d\xi)$,
increasing all the extensive properties (U, S, V, $n_i$) by this factor and leaving the relative compositions (as
measured by the mole fractions) and all other intensive properties unchanged. Therefore, $dS = S\,d\xi$,
$dV = V\,d\xi$, $dn_i = n_i\,d\xi$, etc, and

$$dU = U\,d\xi = TS\,d\xi - pV\,d\xi + \sum_i \mu_i n_i\,d\xi$$

Dividing by $d\xi$ one obtains

$$U = TS - pV + \sum_i \mu_i n_i \tag{A2.1.25}$$

Mathematically equation (A2.1.25) is the direct result of the statement that U is homogeneous and of first
degree in the extensive properties S, V and $n_i$. It follows, from a theorem of Euler, that

$$U = (\partial U/\partial S)_{V,n}\,S + (\partial U/\partial V)_{S,n}\,V + \sum_i (\partial U/\partial n_i)_{S,V,n_j}\,n_i \tag{A2.1.26}$$

(The expression $(\partial U/\partial n_i)_{S,V,n_j}$ signifies, by common convention, the partial derivative of U with respect to the
number of moles $n_i$ of a particular species, holding S, V and the number of moles $n_j$ of all other species ($j \neq i$)
constant.)

Equation (A2.1.26) is equivalent to equation (A2.1.25) and serves to identify T, p, and $\mu_i$ as appropriate
partial derivatives of the energy U, a result that also follows directly from equation (A2.1.23) and the fact that
dU is an exact differential.

$$T = (\partial U/\partial S)_{V,n}, \qquad p = -(\partial U/\partial V)_{S,n}, \qquad \mu_i = (\partial U/\partial n_i)_{S,V,n_j}$$
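
These identifications can be tested on a concrete model. The sketch below assumes the fundamental equation of a one-component monatomic ideal gas in the form $U = C\,n^{5/3} V^{-2/3} \exp[2S/(3nR)]$, where C is an arbitrary positive constant that fixes the zero of entropy (so the energy units here are arbitrary). The partial derivatives are taken by central differences; the helper names and the chosen state are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1
C = 1.0          # arbitrary positive constant fixing the zero of S (assumption)

def U(S, V, n):
    """Model fundamental equation U(S, V, n) of a monatomic ideal gas."""
    return C * n ** (5.0 / 3.0) * V ** (-2.0 / 3.0) * math.exp(2.0 * S / (3.0 * n * R))

def partial(f, args, i, rel_step=1e-6):
    """Central-difference derivative of f with respect to argument number i."""
    h = abs(args[i]) * rel_step
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

state = (50.0, 0.02, 1.5)          # S / J K^-1, V / m^3, n / mol (illustrative)
T = partial(U, state, 0)           # T  =  (dU/dS) at constant V, n
p = -partial(U, state, 1)          # p  = -(dU/dV) at constant S, n
mu = partial(U, state, 2)          # mu =  (dU/dn) at constant S, V

S, V, n = state
euler_sum = T * S - p * V + mu * n            # right-hand side of (A2.1.25)
scaling_ratio = U(2.0 * S, 2.0 * V, 2.0 * n) / U(S, V, n)
print(U(S, V, n), euler_sum, scaling_ratio)
```

Doubling S, V and n doubles U, and the sum $TS - pV + \mu n$ built from the numerical derivatives reproduces U to within the finite-difference error.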

If equation (A2.1.25) is differentiated, one obtains

$$dU = T\,dS + S\,dT - p\,dV - V\,dp + \sum_i \mu_i\,dn_i + \sum_i n_i\,d\mu_i$$

which, on combination with equation (A2.1.23), yields a very important relation between the differentials of
the potentials:

$$S\,dT - V\,dp + \sum_i n_i\,d\mu_i = 0 \tag{A2.1.27}$$


The special case of equation (A2.1.27) when T and p are constant (dT = 0, dp = 0) is called the Gibbs-Duhem
equation, so equation (A2.1.27) is sometimes called the 'generalized Gibbs-Duhem equation'. 
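
The Gibbs-Duhem equation can be verified numerically for a simple model. The sketch below assumes a binary ideal mixture, for which $\mu_i = \mu_i^\circ + RT \ln x_i$ at fixed T and p; the standard potentials and function names are hypothetical, and the derivatives with respect to composition are taken by central differences.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1

def mu_ideal(x, mu_standard, T):
    """Chemical potential of a component at mole fraction x in an ideal mixture."""
    return mu_standard + R * T * math.log(x)

def gibbs_duhem_residual(x1, dx=1e-6, T=298.15, mu1_std=-5000.0, mu2_std=-8000.0):
    """x1*(dmu1/dx1) + x2*(dmu2/dx1) at constant T and p, by central differences."""
    x2 = 1.0 - x1
    dmu1 = (mu_ideal(x1 + dx, mu1_std, T) - mu_ideal(x1 - dx, mu1_std, T)) / (2 * dx)
    # Raising x1 by dx lowers x2 by dx, hence the reversed order of the arguments.
    dmu2 = (mu_ideal(x2 - dx, mu2_std, T) - mu_ideal(x2 + dx, mu2_std, T)) / (2 * dx)
    return x1 * dmu1 + x2 * dmu2

for x1 in (0.1, 0.5, 0.9):
    print(x1, gibbs_duhem_residual(x1))
```

Each term is of the order of RT, yet their weighted sum vanishes to within numerical error: the chemical potentials of the two components cannot be varied independently at fixed T and p.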

A2.1.5.4 ADDITIONAL FUNCTIONS AND DIFFERING CONSTRAINTS 

The preceding sections provide a substantially complete summary of the fundamental concepts of classical 
thermodynamics. The basic equations, however, can be expressed in terms of other variables that are 
frequently more convenient in dealing with experimental situations under which different constraints are 
applied. It is often not convenient to use S and V as independent variables, so it is useful to define other
quantities that are also functions of the thermodynamic state of the system. These include the enthalpy
(sometimes unfortunately called the 'heat content') H = U + pV, the Helmholtz free energy (or 'work content') A =
U − TS and the Gibbs free energy (or 'Lewis free energy', frequently just called the 'free energy') G = A + pV.
The usefulness of these will become apparent as some special situations are considered. In what follows it 
shall be assumed that there is no entropy creation in the surroundings, whose temperature and pressure can be 
controlled, so that equation (A2.1.19) and equation (A2.1.20) can be used to determine dw and dq. Moreover, 
for simplicity, the equations will be restricted to include only pressure-volume work; i.e. to equation (A2.1.5) ; 
the extension to other forms of work should be obvious. 

(A) CONSTANT-VOLUME (ISOCHORIC) PROCESSES 

If there is no volume change (dV = 0), then obviously there is no pressure-volume work done (dw = 0)
irrespective of the pressure, and it follows from equation (A2.1.10) that the change in energy is due entirely to
the heat absorbed, which can be designated as $q_V$:

$$(\Delta V = 0) \qquad dU = dq, \qquad \Delta U = q_V \tag{A2.1.28}$$

Note that in this special case, the heat absorbed directly measures a state function. One still has to consider
how this constant-volume 'heat' is measured, perhaps by an 'electric heater', but then is this not really work? 
Conventionally, however, if work is restricted to pressure-volume work, any remaining contribution to the 
energy transfers can be called 'heat'. 

(B) CONSTANT-PRESSURE (ISOBARIC) PROCESSES 

For such a process the pressure $p_{ext}$ of the surroundings remains constant and is equal to that of the system in
its initial and final states. (If there are transient pressure changes within the system, they do not cause changes
in the surroundings.) One may then write

$$dw = -p_{ext}\,dV$$

$$dq = dU + p_{ext}\,dV$$

However, since $dp_{ext} = 0$ and the initial and final pressures inside equal $p_{ext}$, i.e. $\Delta p = 0$ for the change in
state,

$$(\Delta p = 0) \qquad dq = d(U + p_{ext}V) = d(U + pV) = dH$$

$$\Delta(U + pV) = \Delta H = q_p \tag{A2.1.29}$$


Thus for isobaric processes a new function, the enthalpy H, has been introduced and its change $\Delta H$ is more
directly related to the heat that must have been absorbed than is the energy change $\Delta U$. The same reservations
about the meaning of heat absorbed apply in this process as in the constant-volume process.
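
A brief numerical sketch, assuming a monatomic ideal gas with $U = \tfrac{3}{2} nRT$ heated at constant pressure, shows the bookkeeping behind equation (A2.1.29). The function name and the numerical values are hypothetical.

```python
R = 8.314462618  # gas constant, J K^-1 mol^-1

def isobaric_heating(n, p, T1, T2):
    """Monatomic ideal gas heated at constant pressure p.

    Returns (dU, w, q, dH) in joules, with w the work done ON the gas.
    """
    V1, V2 = n * R * T1 / p, n * R * T2 / p
    dU = 1.5 * n * R * (T2 - T1)      # U = (3/2) n R T
    w = -p * (V2 - V1)                # pressure-volume work only
    q = dU - w                        # first law
    dH = dU + p * (V2 - V1)           # H = U + pV at constant p
    return dU, w, q, dH

dU, w, q, dH = isobaric_heating(n=2.0, p=101325.0, T1=300.0, T2=350.0)
print(f"dU = {dU:.1f} J, w = {w:.1f} J, q_p = {q:.1f} J, dH = {dH:.1f} J")
```

The heat absorbed exceeds $\Delta U$ by the work of expansion and coincides with $\Delta H$, which for this model equals $\tfrac{5}{2} nR\,\Delta T$.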

(C) CONSTANT-TEMPERATURE CONSTANT-VOLUME (ISOTHERMAL-ISOCHORIC) PROCESSES 

In analogy to the constant-pressure process, constant temperature is defined as meaning that the temperature T 
of the surroundings remains constant and equal to that of the system in its initial and final (equilibrium) states. 
First to be considered are constant-temperature constant-volume processes (again dw = 0). For a reversible 
process 

$$(\Delta T = 0,\ \Delta V = 0) \qquad dU = dq_{rev} = T\,dS$$

For an irreversible process, invoking the notion of entropy transfer and entropy creation, one can write

$$dU = dq = T\,d_t S < T(d_t S + d_i S) = T\,dS = d(TS) = dq_{rev} \tag{A2.1.30}$$

which includes the inequality of equation (A2.1.21). Expressed this way the inequality dU < T dS looks like a
contradiction of equation (A2.1.15) until one realizes that the right-hand side of equation (A2.1.30) refers to
the measurement of the entropy by a totally different process, a reverse (driven) process in which some work
must be done on the system. If equation (A2.1.30) is integrated to obtain the isothermal change in state one
obtains

$$(\Delta T = 0,\ \Delta V = 0) \qquad \Delta U = q < q_{rev} = T\Delta S = \Delta(TS)$$

or, rearranging the inequality,

$$\Delta(U - TS) = \Delta A < 0 \tag{A2.1.31}$$

Thus, for spontaneous processes at constant temperature and volume a new quantity, the Helmholtz free 
energy A, decreases. At equilibrium under such restrictions dA = 0. 

(D) CONSTANT-TEMPERATURE CONSTANT-PRESSURE (ISOTHERMAL-ISOBARIC) PROCESSES 

The constant-temperature constant-pressure situation yields an analogous result. One can write for the 
reversible process 

$$(\Delta T = 0,\ \Delta p = 0) \qquad dU = dq_{rev} + dw_{rev} = T\,dS - p\,dV$$

and for the irreversible process

$$dU = dq + dw = T\,d_t S - p\,dV < T\,dS - p\,dV$$

which integrated becomes

$$(\Delta T = 0,\ \Delta p = 0) \qquad \Delta U < T\Delta S - p\Delta V = \Delta(TS - pV)$$

$$\Delta(U + pV - TS) = \Delta G < 0 \tag{A2.1.32}$$

For spontaneous processes at constant temperature and pressure it is the Gibbs free energy G that decreases,
while at equilibrium under such conditions dG = 0.
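
A familiar spontaneous process at constant temperature and pressure is the mixing of ideal gases when a partition between them is removed. The sketch below assumes ideal behaviour, so that the enthalpy of mixing is zero and $\Delta G = -T\Delta S$ with $\Delta S = -R \sum_i n_i \ln x_i$; the function name and the amounts are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1

def mixing_changes(moles, T):
    """Return (dG, dS) for mixing ideal gases at constant T and total pressure.

    Each gas is initially pure at the common temperature and pressure.
    """
    n_total = sum(moles)
    dS = -R * sum(n * math.log(n / n_total) for n in moles if n > 0)
    dG = -T * dS          # the enthalpy of mixing is zero for ideal gases
    return dG, dS

dG, dS = mixing_changes([1.0, 1.0], 298.15)
print(f"dS = {dS:.3f} J/K, dG = {dG:.1f} J")
```

The Gibbs free energy decreases for any composition with more than one component present, consistent with equation (A2.1.32) for the release of the constraint imposed by the partition.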

More generally, without considering the various possible kinds of work, one can write for an isothermal
change in a closed system ($dn_i = 0$)

$$\Delta U = q + w = T\Delta S + \Delta A$$

Now, as has been shown, $q = T\Delta S$ for an isothermal reversible process only; for an isothermal irreversible
process $\Delta S = \Delta_t S + \Delta_i S$, and $q = T\Delta_t S$. Since $\Delta_i S$ is positive for irreversible changes and zero only for
reversible processes, one concludes

$$q = T\Delta S, \qquad w = \Delta A \qquad \text{(isothermal reversible changes)}$$

$$q < T\Delta S, \qquad w > \Delta A \qquad \text{(isothermal irreversible changes)}$$

Another statement of the second law would be: 'The maximum work from (i.e. −w) a given isothermal
change in thermodynamic state is obtained when the change in state is carried out reversibly; for irreversible 
isothermal changes, the work obtained is less'. Thus, in the expression U= TS + A, one may regard the TS 
term as that part of the energy of a system that is unavailable for conversion into work in an isothermal 
process, while A measures the 'free' energy that is available for isothermal conversion into work to be done 
on the surroundings. In isothermal changes some of A may be transferred quantitatively from one subsystem 
to another, or it may spontaneously decrease (be destroyed), but it cannot be created. Thus one may transfer 
the available part of the energy of an isothermal system (its 'free' energy) to a more convenient container, but 
one cannot increase its amount. In an irreversible process some of this 'free' energy is lost in the creation of 
entropy; some capacity for doing work is now irretrievably lost. 

The usefulness of the Gibbs free energy G is, of course, that most changes of chemical interest are carried out 
under constant atmospheric pressure where work done on (or by) the atmosphere is not under the 
experimenter's control. In an isothermal-isobaric process (constant T and p), the maximum available 'useful'
work, i.e. work other than pressure-volume work, is $-\Delta G$; indeed Guggenheim (1950) suggested the term
'useful energy' for G to distinguish it from the Helmholtz 'free energy' A. (Another suggested term for G is
'free enthalpy' from G = H − TS.) An international recommendation is that A and G simply be called the
'Helmholtz function' and the 'Gibbs function', respectively. 

A2.1.5.5 USEFUL INTERRELATIONS 

By differentiating the defining equations for H, A and G and combining the results with equation (A2.1.23)
and equation (A2.1.25) for dU and U (which are repeated here) one obtains general expressions for the
differentials dH, dA, dG and others. One differentiates the defined quantities on the left-hand side of equation
(A2.1.34), equation (A2.1.35), equation (A2.1.36), equation (A2.1.37), equation (A2.1.38) and equation
(A2.1.39) and then substitutes the right-hand side of equation (A2.1.33) to obtain the appropriate differential.
These are examples of Legendre transformations:

$$U = TS - pV + \sum_i \mu_i n_i \qquad dU = T\,dS - p\,dV + \sum_i \mu_i\,dn_i \tag{A2.1.33}$$

$$H = U + pV = TS + \sum_i \mu_i n_i \qquad dH = T\,dS + V\,dp + \sum_i \mu_i\,dn_i \tag{A2.1.34}$$

$$A = U - TS = -pV + \sum_i \mu_i n_i \qquad dA = -S\,dT - p\,dV + \sum_i \mu_i\,dn_i \tag{A2.1.35}$$

$$G = U + pV - TS = \sum_i \mu_i n_i \qquad dG = -S\,dT + V\,dp + \sum_i \mu_i\,dn_i \tag{A2.1.36}$$

$$-pV = U - TS - \sum_i \mu_i n_i \qquad -d(pV) = -S\,dT - p\,dV - \sum_i n_i\,d\mu_i \tag{A2.1.37}$$

$$TS = U + pV - \sum_i \mu_i n_i \qquad d(TS) = T\,dS + V\,dp - \sum_i n_i\,d\mu_i \tag{A2.1.38}$$

$$0 = U - TS + pV - \sum_i \mu_i n_i \qquad d(0) = 0 = -S\,dT + V\,dp - \sum_i n_i\,d\mu_i \tag{A2.1.39}$$

Equation (A2.1.39) is the 'generalized Gibbs-Duhem equation' previously presented (equation (A2.1.27)).
Note that the Gibbs free energy is just the sum over species of the chemical potentials weighted by the numbers of moles, $G = \sum_i \mu_i n_i$.
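
As a worked instance of the procedure described above, the differential of the enthalpy in equation (A2.1.34) follows from the definition of H and the expression for dU in equation (A2.1.33); nothing beyond those two relations is assumed.

```latex
\begin{align*}
H  &= U + pV \\
dH &= dU + p\,dV + V\,dp \\
   &= \Bigl(T\,dS - p\,dV + \sum_i \mu_i\,dn_i\Bigr) + p\,dV + V\,dp \\
   &= T\,dS + V\,dp + \sum_i \mu_i\,dn_i .
\end{align*}
```

The transformation replaces the independent variable V by its conjugate p; the analogous steps with $-TS$ in place of $+pV$ give dA, and both together give dG.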

If there are other kinds of work, similar expressions apply. For example, with electromagnetic work (equation
(A2.1.8)) instead of pressure-volume work, one can write for the Helmholtz free energy

$$dA = -S\,dT - V(\mathbf{P}\cdot d\mathbf{E}_0 + \mathbf{M}\cdot d\mathbf{B}_0) + \sum_i \mu_i\,dn_i \tag{A2.1.40}$$


It should be noted that the differential expressions on the right-hand side of equation (A2.1.33), equation
(A2.1.34), equation (A2.1.35), equation (A2.1.36), equation (A2.1.37), equation (A2.1.38), equation
(A2.1.39) and equation (A2.1.40) express for each function the appropriate independent variables for that
function, i.e. the variables (read constraints) that are kept constant during a spontaneous process.

All of these quantities are state functions, i.e. the differentials are exact, so each of the coefficients is a partial
derivative. For example, from equation (A2.1.35) $p = -(\partial A/\partial V)_{T,n_i}$, while from equation (A2.1.36)
$S = -(\partial G/\partial T)_{p,n_i}$. Moreover, because the order of partial differentiation is immaterial, one obtains as
cross-differentiation identities from equation (A2.1.33), equation (A2.1.34), equation (A2.1.35), equation (A2.1.36),
equation (A2.1.37), equation (A2.1.38), equation (A2.1.39) and equation (A2.1.40) a whole series of useful
equations usually known as 'Maxwell relations'. A few of these are: from equation (A2.1.33):

$$(\partial^2 U/\partial S\,\partial V)_{n_i} = \partial(\partial U/\partial V)/\partial S = \partial(\partial U/\partial S)/\partial V$$

$$-(\partial p/\partial S)_{V,n_i} = (\partial T/\partial V)_{S,n_i}$$

from equation (A2.1.35):

$$(\partial^2 A/\partial T\,\partial V)_{n_i} = -(\partial S/\partial V)_{T,n_i} = -(\partial p/\partial T)_{V,n_i} \tag{A2.1.41}$$

from equation (A2.1.36):

$$(\partial^2 G/\partial T\,\partial p)_{n_i} = -(\partial S/\partial p)_{T,n_i} = (\partial V/\partial T)_{p,n_i} \tag{A2.1.42}$$

and from equation (A2.1.40)

$$(\partial^2 A/\partial T\,\partial \mathbf{E}_0)_{n} = -(\partial S/\partial \mathbf{E}_0)_{T,n} = -V(\partial \mathbf{P}/\partial T)_{\mathbf{E}_0,n}$$

$$(\partial^2 A/\partial T\,\partial \mathbf{B}_0)_{n} = -(\partial S/\partial \mathbf{B}_0)_{T,n} = -V(\partial \mathbf{M}/\partial T)_{\mathbf{B}_0,n} \tag{A2.1.43}$$

(Strictly speaking, differentiation with respect to a vector quantity is not allowed. However for the isotropic
spherical samples for which equation (A2.1.8) is appropriate, the two vectors have the same direction and 
could have been written as scalars; the vector notation was kept to avoid confusion with other thermodynamic 
quantities such as energy, pressure, etc. It should also be noted that the Maxwell equations above are correct 
for either of the choices for electromagnetic work discussed earlier; under the other convention A is replaced 
by a generalized G.) 
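
The Maxwell relation of equation (A2.1.41) lends itself to a numerical check. The sketch below assumes a model Helmholtz function for a van der Waals gas with constant $C_V = \tfrac{3}{2} nR$ (additive constants in the entropy are omitted) and parameter values roughly those usually quoted for carbon dioxide; both the model and the names are illustrative assumptions. S and p are obtained from A by central differences and then differentiated once more.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1

# van der Waals parameters, roughly those of CO2 (illustrative assumption)
A_VDW = 0.364     # Pa m^6 mol^-2
B_VDW = 4.27e-5   # m^3 mol^-1

def helmholtz(T, V, n=1.0):
    """Model A(T, V, n) of a van der Waals gas with C_V = (3/2) n R."""
    return (-n * R * T * (math.log((V - n * B_VDW) / n) + 1.5 * math.log(T))
            - A_VDW * n * n / V)

def derivative(f, x, h):
    """Central-difference first derivative of f at x with step h."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def entropy(T, V):
    """S = -(dA/dT) at constant V."""
    return -derivative(lambda t: helmholtz(t, V), T, 1e-3)

def pressure(T, V):
    """p = -(dA/dV) at constant T."""
    return -derivative(lambda v: helmholtz(T, v), V, 1e-7)

T0, V0 = 300.0, 1.0e-3                                  # K, m^3 (one mole)
dS_dV = derivative(lambda v: entropy(T0, v), V0, 1e-6)  # (dS/dV) at constant T
dp_dT = derivative(lambda t: pressure(t, V0), T0, 1e-2) # (dp/dT) at constant V
print(dS_dV, dp_dT, R / (V0 - B_VDW))
```

For this model both derivatives should equal $nR/(V - nb)$, which is printed as the third value for comparison.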

A2.1.5.6 FEATURES OF EQUILIBRIUM 

Earlier in this section it was shown that, when a constraint, e.g. fixed l, was released in a system for which U,
V and n were held constant, the entropy would seek a maximum value consistent with the remaining
restrictions (e.g. $dS/dl = 0$ and $d^2S/dl^2 < 0$). One refers to this, a result of equation (A2.1.33), as a 'feature of
equilibrium'. We can obtain similar features of equilibrium under other conditions from equation (A2.1.34),
equation (A2.1.35), equation (A2.1.36), equation (A2.1.37), equation (A2.1.38) and equation (A2.1.39). Since
at equilibrium all processes are reversible, all these equations are valid at equilibrium. Each equation is a 
linear relation between differentials; so, if all but one are fixed equal to zero, at equilibrium the remaining 
differential quantity must also be zero. That is to say, the function of which it is the differential must have an 
equilibrium value that is either maximized or minimized and it is fairly easy, in any particular instance, to 
decide between these two possibilities. To summarize the more important of these equilibrium features: 

for fixed U, V, $n_i$: S is a maximum

for fixed H, p, $n_i$: S is a maximum

for fixed S, V, $n_i$: U is a minimum

for fixed S, p, $n_i$: H is a minimum

for fixed T, V, $n_i$: A is a minimum

for fixed T, p, $n_i$: G is a minimum.

Of these the last condition, minimum Gibbs free energy at constant temperature, pressure and composition, is 
probably the one of greatest practical importance in chemical systems. (This list does not exhaust the 
mathematical possibilities; thus one can also derive other apparently unimportant conditions such as that at 
constant U, S and $n_i$, V is a minimum.) However, an experimentalist will wonder how one can hold the
entropy constant and release a constraint so that some other state function seeks a minimum. 

A2.1.5.7 THE CHEMICAL POTENTIAL AND PARTIAL MOLAR QUANTITIES 

From equation (A2.1.33), equation (A2.1.34), equation (A2.1.35) and equation (A2.1.36) it follows that the
chemical potential may be defined by any of the following relations:

$$\mu_i = (\partial U/\partial n_i)_{S,V,n_j} = (\partial H/\partial n_i)_{S,p,n_j} = (\partial A/\partial n_i)_{T,V,n_j} = (\partial G/\partial n_i)_{T,p,n_j} \tag{A2.1.44}$$

In experimental work it is usually most convenient to regard temperature and pressure as the independent
variables, and for this reason the term partial molar quantity (denoted by a bar above the quantity) is always
restricted to the derivative with respect to $n_i$ holding T, p, and all the other $n_j$ constant. (Thus
$\bar{V}_i = (\partial V/\partial n_i)_{T,p,n_j}$.) From the right-hand side of equation (A2.1.44) it is apparent that the chemical potential
is the same as the partial molar Gibbs free energy $\bar{G}_i$ and, therefore, some books on thermodynamics, e.g.
Lewis and Randall (1923), do not give it a special symbol. Note that the partial molar Helmholtz free energy
is not the chemical potential; it is

$$\bar{A}_i = (\partial A/\partial n_i)_{T,p,n_j} = \mu_i - p\bar{V}_i$$

On the other hand, in the theoretical calculations of statistical mechanics, it is frequently more convenient to
use volume as an independent variable, so it is important to preserve the general importance of the chemical
potential as something more than a quantity $\bar{G}_i$ whose usefulness is restricted to conditions of constant
temperature and pressure.

From cross-differentiation identities one can derive some additional Maxwell relations for partial molar 
quantities: 

$$(\partial^2 G/\partial p\,\partial n_i)_{T,n_j} = (\partial\mu_i/\partial p)_{T,n_j} = (\partial V/\partial n_i)_{T,p,n_j} = \bar V_i.$$

In passing one should note that the method of expressing the chemical potential is arbitrary. The amount of 
matter of species $i$ in this article, as in most thermodynamics books, is expressed by the number of moles $n_i$; it 
can, however, be expressed equally well by the number of molecules $N_i$ (convenient in statistical mechanics) 
or by the mass $m_i$ (Gibbs' original treatment). 

A2.1.5.8 SOME ADDITIONAL IMPORTANT QUANTITIES 

As one raises the temperature of the system along a particular path, one may define a heat capacity $C_{\mathrm{path}} = Dq_{\mathrm{path}}/\mathrm{d}T$. 
(The term 'heat capacity' is almost as unfortunate a name as the obsolescent 'heat content' for $H$; 
alas, no alternative exists.) However, several such paths define state functions, e.g. equation (A2.1.28) and 
equation (A2.1.29). Thus we can define the heat capacity at constant volume $C_V$ and the heat capacity at 
constant pressure $C_p$ as 

$$C_V = (\partial U/\partial T)_{V,n_i} = T(\partial S/\partial T)_{V,n_i} \tag{A2.1.45}$$

$$C_p = (\partial H/\partial T)_{p,n_i} = T(\partial S/\partial T)_{p,n_i}. \tag{A2.1.46}$$

The right-hand equalities in these two equations arise directly from equation (A2.1.33) and equation 
(A2.1.34) . 

Two other important quantities are the isobaric expansivity ('coefficient of thermal expansion') $\alpha_p$ and the 
isothermal compressibility $\kappa_T$, defined as 

$$\alpha_p = (1/V)(\partial V/\partial T)_{p,n_i}$$

$$\kappa_T = -(1/V)(\partial V/\partial p)_{T,n_i}.$$

The adjectives 'isobaric' and 'isothermal' and the corresponding subscripts are frequently omitted, but it is 
important to distinguish between the isothermal compressibility and the adiabatic compressibility $\kappa_S$. 

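A relation between $C_p$ and $C_V$ can be obtained by writing 

$$(\partial S/\partial T)_{p,n_i} = (\partial S/\partial T)_{V,n_i} + (\partial S/\partial V)_{T,n_i}(\partial V/\partial T)_{p,n_i} = (\partial S/\partial T)_{V,n_i} + (\partial p/\partial T)_{V,n_i}(\partial V/\partial T)_{p,n_i}$$

$$(\partial p/\partial T)_{V,n_i} = -(\partial p/\partial V)_{T,n_i}(\partial V/\partial T)_{p,n_i} = \alpha_p/\kappa_T \quad \text{(from the cyclic rule).}$$

Combining these, we have 

$$(\partial H/\partial T)_{p,n_i} - (\partial U/\partial T)_{V,n_i} = T(\partial p/\partial T)_{V,n_i}(\partial V/\partial T)_{p,n_i}$$

or 

$$C_p - C_V = TV\alpha_p^2/\kappa_T. \tag{A2.1.47}$$

For the special case of the ideal gas (equation (A2.1.17)), $\alpha_p = 1/T$ and $\kappa_T = 1/p$, 

$$C_p - C_V = TVp/T^2 = nR \quad \text{(ideal gas only).}$$

A similar derivation leads to the difference between the isothermal and adiabatic compressibilities: 

$$\kappa_T - \kappa_S = TV\alpha_p^2/C_p. \tag{A2.1.48}$$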
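As a numerical illustration of equation (A2.1.47), the following minimal sketch evaluates $TV\alpha_p^2/\kappa_T$ and checks the ideal-gas special case. The function names and the sample state (2 mol at 300 K and $10^5$ Pa) are assumptions chosen only for illustration.

```python
R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def cp_minus_cv(T, V, alpha_p, kappa_T):
    """Return C_p - C_V = T V alpha_p^2 / kappa_T (equation A2.1.47), in J/K."""
    return T * V * alpha_p**2 / kappa_T


def ideal_gas_cp_minus_cv(n, T, p):
    """Ideal-gas special case: alpha_p = 1/T and kappa_T = 1/p."""
    V = n * R * T / p  # ideal-gas volume in m^3
    return cp_minus_cv(T, V, alpha_p=1.0 / T, kappa_T=1.0 / p)


if __name__ == "__main__":
    # Illustrative state: 2 mol at 300 K and 1e5 Pa; the result should equal nR.
    print(ideal_gas_cp_minus_cv(2.0, 300.0, 1.0e5), 2.0 * R)
```

For a real substance one would pass measured values of $V$, $\alpha_p$ and $\kappa_T$ to `cp_minus_cv` directly.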

A2.1.5.9 THERMODYNAMIC EQUATIONS OF STATE 

Two exact equations of state can be derived from equation (A2.1.33) and equation (A2.1.34) 


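$$(\partial U/\partial V)_{T,n_i} = -p + T(\partial S/\partial V)_{T,n_i} = -p + T(\partial p/\partial T)_{V,n_i}$$
$$\text{or}\quad p = T(\partial p/\partial T)_{V,n_i} - (\partial U/\partial V)_{T,n_i} \tag{A2.1.49}$$

$$(\partial H/\partial p)_{T,n_i} = V + T(\partial S/\partial p)_{T,n_i} = V - T(\partial V/\partial T)_{p,n_i}$$
$$\text{or}\quad V = T(\partial V/\partial T)_{p,n_i} + (\partial H/\partial p)_{T,n_i}. \tag{A2.1.50}$$

It is interesting to note that, when the van der Waals equation for a fluid, 

$$p = nRT/(V - nb) - n^2a/V^2,$$

is compared with equation (A2.1.49), the right-hand sides separate in the same way: 

$$T(\partial p/\partial T)_{V,n} = nRT/(V - nb) \quad\text{and}\quad (\partial U/\partial V)_{T,n} = n^2a/V^2.$$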
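The separation noted above can be checked numerically. The sketch below, under the assumption of SI units and purely hypothetical values of $a$ and $b$, evaluates $T(\partial p/\partial T)_V - p$ from equation (A2.1.49) for a van der Waals fluid by a central finite difference; because the van der Waals pressure is linear in $T$, the difference quotient is exact apart from rounding.

```python
R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def vdw_pressure(n, V, T, a, b):
    """van der Waals pressure p = nRT/(V - nb) - n^2 a / V^2."""
    return n * R * T / (V - n * b) - n**2 * a / V**2


def internal_pressure(n, V, T, a, b, dT=1.0):
    """(dU/dV)_T = T (dp/dT)_V - p, from equation (A2.1.49).

    The temperature derivative is taken by a central difference at fixed V.
    """
    dp_dT = (vdw_pressure(n, V, T + dT, a, b)
             - vdw_pressure(n, V, T - dT, a, b)) / (2.0 * dT)
    return T * dp_dT - vdw_pressure(n, V, T, a, b)


if __name__ == "__main__":
    # Hypothetical parameters: a in Pa m^6 mol^-2, b in m^3 mol^-1.
    n, V, T, a, b = 1.0, 1.0e-3, 300.0, 0.14, 3.2e-5
    print(internal_pressure(n, V, T, a, b), n**2 * a / V**2)
```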

A2.1.6 APPLICATIONS 

A2.1.6.1 PHASE EQUILIBRIA 

When two or more phases, e.g. gas, liquid or solid, are in equilibrium, the principles of internal equilibrium 
developed in section A2.1.5.2 apply. If transfers between two phases $\alpha$ and $\beta$ can take place, the appropriate 
potentials must be equal, even though densities and other properties can be quite different. 

As shown in preceding sections, one can have equilibrium of some kinds while inhibiting others. Thus, it is 
possible to have thermal equilibrium ($T^\alpha = T^\beta$) through a fixed impermeable diathermic wall; in such a case 
$p^\alpha$ need not equal $p^\beta$, nor need $\mu_i^\alpha$ equal $\mu_i^\beta$. It is possible to achieve mechanical equilibrium ($p^\alpha = p^\beta$) through 
a movable impermeable adiabatic wall; in such a case the transfer of heat or matter is prevented, so $T$ and $\mu_i$ 
can be different on opposite sides. It is possible to have both thermal and mechanical equilibrium ($p^\alpha = p^\beta$, 
$T^\alpha = T^\beta$) through a movable diathermic wall. For a one-component system $\mu = f(T, p)$, so $\mu^\alpha = \mu^\beta$ even if the wall 
is impermeable. However, for a system of two or more components one can have $p^\alpha = p^\beta$ and $T^\alpha = T^\beta$, but the 
chemical potential is now also a function of composition, so $\mu_i^\alpha$ need not equal $\mu_i^\beta$. It does not seem 
experimentally possible to permit material equilibrium ($\mu_i^\alpha = \mu_i^\beta$) without simultaneously achieving thermal 
equilibrium ($T^\alpha = T^\beta$). 

Finally, in membrane equilibria, where the wall is permeable to some species $i$, e.g. the solvent, but not to others $j$, 
thermal equilibrium ($T^\alpha = T^\beta$) and solvent equilibrium ($\mu_i^\alpha = \mu_i^\beta$) are found, but $\mu_j^\alpha \neq \mu_j^\beta$ and $p^\alpha \neq p^\beta$; the 
difference $p^\beta - p^\alpha$ is the osmotic pressure. 

For a one-component system, $\Delta G^{\alpha\to\beta} = \mu^\beta - \mu^\alpha = 0$, so one may write 

$$\Delta G^{\alpha\to\beta} = \Delta H^{\alpha\to\beta} - T\Delta S^{\alpha\to\beta} = 0 \quad\text{or}\quad \Delta S^{\alpha\to\beta} = \Delta H^{\alpha\to\beta}/T. \tag{A2.1.51}$$

THE CLAPEYRON EQUATION 

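Moreover, using the generalized Gibbs-Duhem equations (A2.1.27) for each of the two one-component 
phases, 

$$S^\alpha\,\mathrm{d}T - V^\alpha\,\mathrm{d}p + n^\alpha\,\mathrm{d}\mu = 0$$

or 

$$\mathrm{d}\mu = \bar V^\alpha\,\mathrm{d}p - \bar S^\alpha\,\mathrm{d}T = \bar V^\beta\,\mathrm{d}p - \bar S^\beta\,\mathrm{d}T$$

one obtains the Clapeyron equation for the change of pressure with temperature as the two phases continue to 
coexist: 

$$\mathrm{d}p/\mathrm{d}T = \Delta\bar S^{\alpha\to\beta}/\Delta\bar V^{\alpha\to\beta} = \Delta\bar H^{\alpha\to\beta}/T\Delta\bar V^{\alpha\to\beta}. \tag{A2.1.52}$$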

The analogue of the Clapeyron equation for multicomponent systems can be derived by a complex procedure 
of systematically eliminating the various chemical potentials, but an alternative derivation uses the Maxwell 
relation (A2.1.41) 

$$(\partial^2 A/\partial T\,\partial V)_{n_i} = -(\partial S/\partial V)_{T,n_i} = -(\partial p/\partial T)_{V,n_i}. \tag{A2.1.41}$$

Applied to a two-phase system, this says that the change in pressure with temperature is equal to the change in 
entropy at constant temperature as the total volume of the system ($\alpha + \beta$) is increased, which can only take 
place if some $\alpha$ is converted to $\beta$: 

$$\mathrm{d}p/\mathrm{d}T = \Delta S^{\alpha\to\beta}/\Delta V^{\alpha\to\beta} = \Delta H^{\alpha\to\beta}/T\Delta V^{\alpha\to\beta}.$$

In this case, whatever $n_i$ moles of each species are required to accomplish the $\Delta V$ are the same $n_i$ that 
determine $\Delta S$ or $\Delta H$. Note that this general equation includes the special one-component case of equation 
(A2.1.52). 

When, for a one-component system, one of the two phases in equilibrium is a sufficiently dilute gas, i.e. is at a 
pressure well below 1 atm, one can obtain a very useful approximate equation from equation (A2.1.52). The 
molar volume of the gas is at least two orders of magnitude larger than that of the liquid or solid, and the gas is very 
nearly ideal. Then one can write 

$$\Delta\bar V^{l\to g} \approx \bar V^{g} \approx RT/p$$

which can be substituted into equation (A2.1.52) to obtain 

$$\mathrm{d}p/\mathrm{d}T \approx \Delta\bar H^{l\to g}(p/RT^2)$$

or 

$$\mathrm{d}\ln p/\mathrm{d}T \approx \Delta\bar H^{l\to g}/RT^2 = \Delta\bar H_{\mathrm{vap}}/RT^2 \tag{A2.1.53}$$

or 

$$\mathrm{d}\ln p/\mathrm{d}(1/T) \approx -\Delta\bar H_{\mathrm{vap}}/R,$$

where $\Delta\bar H_{\mathrm{vap}}$ is the molar enthalpy of vaporization at the temperature $T$. The corresponding equation for the 
vapour pressure of the solid is identical except for the replacement of the enthalpy of vaporization by the 
enthalpy of sublimation. 

(Equation (A2.1.53) is frequently called the Clausius-Clapeyron equation, although this name is sometimes 
applied to equation (A2.1.52) . Apparently Clapeyron first proposed equation (A2.1.52) in 1834, but it was 
derived properly from thermodynamics decades later by Clausius, who also obtained the approximate 
equation (A2.1.53).) 

It is interesting and surprising to note that, although the molar enthalpy $\Delta\bar H_{\mathrm{vap}}$ and the molar volume of 
vaporization $\Delta\bar V_{\mathrm{vap}}$ both decrease to zero at the critical temperature of the fluid (where the fluid is very 
non-ideal), a plot of $\ln p$ against $1/T$ for most fluids is very nearly a straight line all the way from the melting point 
to the critical point. For example, for krypton, the slope $\mathrm{d}\ln p/\mathrm{d}(1/T)$ varies by less than 1% over the entire 
range of temperatures; even for water the maximum variation of the slope is only about 15%. 
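Because the slope of $\ln p$ against $1/T$ is so nearly constant, the last form of equation (A2.1.53) is often integrated with $\Delta\bar H_{\mathrm{vap}}$ treated as constant. The sketch below does exactly that; the function name and the sample numbers (a reference point of 373.15 K and 101325 Pa with an enthalpy of vaporization of 40.66 kJ/mol, roughly appropriate to water) are illustrative assumptions, not data from this article.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def vapour_pressure(T, T_ref, p_ref, dH_vap):
    """Integrate d ln p / d(1/T) = -dH_vap / R with constant dH_vap.

    Returns the vapour pressure at T given one known point (T_ref, p_ref).
    """
    return p_ref * math.exp(-dH_vap / R * (1.0 / T - 1.0 / T_ref))


if __name__ == "__main__":
    # Illustrative estimate 10 K below the reference temperature.
    print(vapour_pressure(363.15, 373.15, 101325.0, 40660.0))
```

The constant-enthalpy assumption is the only approximation added here beyond those already made in deriving equation (A2.1.53).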

THE PHASE RULE 

Finally one can utilize the generalized Gibbs-Duhem equations (A2.1.27) for each phase 

$$S^\alpha\,\mathrm{d}T - V^\alpha\,\mathrm{d}p + \sum_i n_i^\alpha\,\mathrm{d}\mu_i = 0$$

$$S^\beta\,\mathrm{d}T - V^\beta\,\mathrm{d}p + \sum_i n_i^\beta\,\mathrm{d}\mu_i = 0$$

etc to obtain the 'Gibbs phase rule'. The number of variables (potentials) equals the number of components 
$C$ plus two (temperature and pressure), and these are connected by an equation for each of the $P$ phases. It 
follows that the number of potentials that can be varied independently (the 'degrees of freedom' $F$) is the 
number of variables minus the number of equations: 

$$F = C + 2 - P.$$

From this equation one concludes that the maximum number of phases that can coexist in a one-component 
system ($C = 1$) is three, at a unique temperature and pressure ($F = 0$). When two phases coexist ($F = 1$), 
selecting a temperature fixes the pressure. Conclusions for other situations should be obvious. 
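The counting argument above (variables minus equations) is simple enough to express as a short function. This is only a sketch; the function name and the error handling are assumptions added for illustration.

```python
def degrees_of_freedom(components, phases):
    """Gibbs phase rule: (components + 2) potentials minus one equation per phase."""
    if components < 1 or phases < 1:
        raise ValueError("need at least one component and one phase")
    f = components + 2 - phases
    if f < 0:
        raise ValueError("too many coexisting phases for this many components")
    return f


if __name__ == "__main__":
    print(degrees_of_freedom(1, 3))  # one-component triple point
    print(degrees_of_freedom(1, 2))  # two coexisting phases of one component
```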

A2.1.6.2 REAL AND IDEAL GASES 

Real gases follow the ideal-gas equation (A2.1.17) only in the limit of zero pressure, so it is important to be 
able to handle the thermodynamics of real gases at non-zero pressures. There are many semi-empirical 
equations with parameters that purport to represent the physical interactions between gas molecules, the 
simplest of which is the van der Waals equation (A2.1.50) . However, a completely general form for 
expressing gas non-ideality is the series expansion first suggested by Kamerlingh Onnes (1901) and known as 
the virial equation of state: 

$$pV/nRT = 1 + B(n/V) + C(n/V)^2 + D(n/V)^3 + \cdots.$$

The equation is more conventionally written expressing the variable $n/V$ as the inverse of the molar volume, 
$1/\bar V$, although $n/V$ is just the molar concentration $c$, and one could equally well write the equation as 

$$p/RT = c + Bc^2 + Cc^3 + Dc^4 + \cdots. \tag{A2.1.54}$$

The coefficients B, C, D, etc for each particular gas are termed its second, third, fourth, etc. virial coefficients, 
and are functions of the temperature only. It can be shown, by statistical mechanics, that B is a function of the 
interaction of an isolated pair of molecules, C is a function of the simultaneous interaction of three molecules, 
D, of four molecules, etc., a feature suggested by the form of equation (A2.1.54). 

While volume is a convenient variable for the calculations of theoreticians, the pressure is normally the 
variable of choice for experimentalists, so there is a corresponding equation in which the equation of state is 
expanded in powers of p: 

$$pV/n = RT + B'p + C'p^2 + D'p^3 + \cdots. \tag{A2.1.55}$$

The pressure coefficients can be related to the volume coefficients by reverting the series and one finds that 

$$B' = B \qquad C' = (C - B^2)/RT \qquad D' = (D - 3BC + 2B^3)/(RT)^2 \qquad \text{etc.}$$
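The conversion between the two sets of coefficients can be sketched in a few lines. The function name below is an assumption; the formulas are those quoted above, with the volume-series coefficients in SI units (m^3/mol, m^6/mol^2, m^9/mol^3).

```python
R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def pressure_series_coefficients(B, C, D, T):
    """Convert volume-series virial coefficients (B, C, D) of equation (A2.1.54)
    to the pressure-series coefficients (B', C', D') of equation (A2.1.55)."""
    RT = R * T
    B_prime = B
    C_prime = (C - B**2) / RT
    D_prime = (D - 3.0 * B * C + 2.0 * B**3) / RT**2
    return B_prime, C_prime, D_prime


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to exercise the function.
    print(pressure_series_coefficients(-1.0e-4, 5.0e-9, 0.0, 300.0))
```

At low density the two truncated series should give the same value of $pV/n$ to within the neglected higher-order terms, which offers a quick consistency check.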

According to equation (A2.1.39) $(\partial\mu/\partial p)_T = V/n$, so equation (A2.1.55) can be integrated to obtain the 
chemical potential: 

$$\mu(T, p) - \mu^\circ(T, p^\circ) = RT\ln(p/p^\circ) + B'p + C'p^2/2 + D'p^3/3 + \cdots. \tag{A2.1.56}$$

Note that a constant of integration $\mu^\circ$ has come into the equation; this is the chemical potential of the 
hypothetical ideal gas at a reference pressure $p^\circ$, usually taken to be one atmosphere. In principle this 
involves a process of taking the real gas down to zero pressure and bringing it back to the reference pressure 
as an ideal gas. Thus, since $\mathrm{d}\mu = (V/n)\,\mathrm{d}p$, one may write 

$$\mu(T, p^\circ) - \mu^\circ(T, p^\circ) = \int_0^{p^\circ}[(V/n) - (RT/p)]\,\mathrm{d}p = B'p^\circ + C'(p^\circ)^2/2 + \cdots.$$

If $p^\circ = 1$ atm, it is sufficient to retain only the first term on the right. However, one does not need to know the 
virial coefficients; one may simply use volumetric data to evaluate the integral. 

The molar entropy and the molar enthalpy, also with constants of integration, can be obtained, either by 
differentiating equation (A2.1.56) or by integrating equation (A2.1.42) or equation (A2.1.50): 

$$\bar S(T, p) - \bar S^\circ(T, p^\circ) = -R\ln(p/p^\circ) - (\mathrm{d}B'/\mathrm{d}T)p - (\mathrm{d}C'/\mathrm{d}T)p^2/2 - (\mathrm{d}D'/\mathrm{d}T)p^3/3 - \cdots \tag{A2.1.57}$$

$$\bar H(T, p) - \bar H^\circ(T, p^\circ) = [B' - T(\mathrm{d}B'/\mathrm{d}T)]p + [C' - T(\mathrm{d}C'/\mathrm{d}T)]p^2/2 + [D' - T(\mathrm{d}D'/\mathrm{d}T)]p^3/3 + \cdots$$

where, as in the case of the chemical potential, the reference molar entropy $\bar S^\circ$ and reference molar enthalpy 
$\bar H^\circ$ are for the hypothetical ideal gas at a pressure $p^\circ$. 

It is sometimes convenient to retain the generality of the limiting ideal-gas equations by introducing the 
activity $a$, an 'effective' pressure (or, as we shall see later in the case of solutions, an effective mole fraction, 
concentration, or molality). For gases, after Lewis (1901), this is usually called the fugacity and symbolized 
by $f$ rather than by $a$. One can then write 

$$\mu(T, p) - \mu^\circ(T, p^\circ) = RT\ln(f/p^\circ).$$

One can also define an activity coefficient or fugacity coefficient $\gamma = f/p$; obviously 

$$RT\ln\gamma = B'p + C'p^2/2 + D'p^3/3 + \cdots.$$
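The series for $RT\ln\gamma$ is easy to evaluate once the pressure-series coefficients are known. The following sketch truncates it after the $D'$ term; the function name and the sample second virial coefficient are hypothetical and serve only to show the size of the correction near atmospheric pressure.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def fugacity_coefficient(p, T, B_prime, C_prime=0.0, D_prime=0.0):
    """gamma = f/p from RT ln(gamma) = B'p + C'p^2/2 + D'p^3/3 (truncated)."""
    rt_ln_gamma = B_prime * p + C_prime * p**2 / 2.0 + D_prime * p**3 / 3.0
    return math.exp(rt_ln_gamma / (R * T))


if __name__ == "__main__":
    # Hypothetical B' = -1.5e-4 m^3/mol at 300 K and 1e5 Pa.
    gamma = fugacity_coefficient(1.0e5, 300.0, -1.5e-4)
    print(gamma, gamma * 1.0e5)  # coefficient and the corresponding fugacity in Pa
```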


TEMPERATURE DEPENDENCE OF THE SECOND VIRIAL COEFFICIENT 

Figure A2.1.7 shows schematically the variation of $B = B'$ with temperature. It starts strongly negative 
(theoretically at minus infinity for zero temperature, but of course unmeasurable) and decreases in magnitude 
until it changes sign at the Boyle temperature ($B = 0$, where the gas is more nearly ideal to higher pressures). 
The slope $\mathrm{d}B/\mathrm{d}T$ remains positive, but decreases in magnitude until very high temperatures. Theory requires the virial coefficient finally 
to reach a maximum and then slowly decrease, but this has been experimentally observed only for helium. 

Figure A2.1.7. The second virial coefficient $B$ as a function of temperature $T/T_B$. (Calculated for a gas 
satisfying the Lennard-Jones potential [8].) 

It is widely believed that gases are virtually ideal at a pressure of one atmosphere. This is more nearly true at 
relatively high temperatures, but at the normal boiling point (roughly 20% of the Boyle temperature), typical 
gases have values of pV/nRT that are 5 to 15% lower than the ideal value of unity. 


THE JOULE-THOMSON EFFECT 


One of the classic experiments on gases was the measurement by Joule and Thomson (1853) of the change in 
temperature when a gas flows through a porous plug (throttling valve) from a pressure $p_1$ to a pressure $p_2$ 
(figure A2.1.8). The total system is jacketed in such a way that the process is adiabatic ($q = 0$), and the 
pressures are constant (other than an infinitesimal $\delta p$) in the two parts. The work done on the gas in the 
right-hand region to bring it through is $p_2\,\mathrm{d}V_2$, while that in the left-hand region is $-p_1\,\mathrm{d}V_1$ (because $\mathrm{d}V_1$ is 
negative). The two volumes are of course unequal, but no assumption about the ideality of the gas is necessary 
(or even desirable). The total energy change can then be written as the loss of energy from the left-hand 
region plus the gain in energy by the right-hand region 

$$\mathrm{d}U = -\mathrm{d}U_1 + \mathrm{d}U_2 = p_1\,\mathrm{d}V_1 - p_2\,\mathrm{d}V_2$$

$g(\omega)$ is essentially the density of states and the above expression corresponds to the Debye model. 

In general, the phonon density of states $g(\omega)\,\mathrm{d}\omega$ is a complicated function which can be directly measured 
from experiments, or can be computed from the results of computer simulations of a crystal. The explicit 
analytic expression of $g(\omega)$ for the Debye model is a consequence of the two assumptions that were made 
above for the frequency and velocity of the elastic waves. An even simpler assumption about $g(\omega)$ leads to the 
Einstein model, which first showed how quantum effects lead to deviations from the classical equipartition 
result as seen experimentally. In the Einstein model, one assumes that only one level at frequency $\omega_E$ is 
appreciably populated by phonons so that $g(\omega) = \delta(\omega - \omega_E)$ and , for each of the Einstein modes. $\hbar\omega_E/k_B$ is 
called the Einstein temperature $\Theta_E$. 

High-temperature behaviour. Consider $T$ much higher than a characteristic temperature like $\Theta_D$ or $\Theta_E$. Since 
$\beta\hbar\omega$ is then small compared to 1, one can expand the exponential to obtain 

$$e^{\beta\hbar\omega} - 1 \approx \beta\hbar\omega$$


and 


as expected by the equipartition law. This leads to a value of $3Nk_B$ for the heat capacity $C_V$. This is known as 
the law of Dulong and Petit. 

Low-temperature behaviour. In the Debye model, when $T \ll \Theta_D$, the upper limit, $x_D$, can be approximately 
replaced by $\infty$; the integral over $x$ then has a value $\pi^4/15$ and the total phonon energy reduces to an expression 
proportional to $T^4$. This leads to the heat capacity, for $T \ll \Theta_D$, 

$$C_V = \frac{12\pi^4}{5}Nk_B\left(\frac{T}{\Theta_D}\right)^3. \tag{A2.2.96}$$

This result is called the Debye $T^3$ law. Figure A2.2.4 compares the experimental and Debye model values for 
the heat capacity $C_V$. It also gives Debye temperatures for various solids. One can also evaluate $C_V$ for the 
Einstein model: as expected it approaches the equipartition result at high temperatures but decays 
exponentially to zero as $T$ goes to zero. 

$$(\partial T/\partial V)_U = -(\partial U/\partial V)_T/(\partial U/\partial T)_V = -RT^2[(\mathrm{d}B/\mathrm{d}T)(n/V)^2 + \cdots]/C_V.$$

Unlike $(\partial H/\partial p)_T$, $(\partial U/\partial V)_T$ does indeed vanish for real gases as the pressure goes to zero, but this is because 
the derivative is with respect to $V$, not because of the difference between $U$ and $H$. At appreciable pressures 
$(\partial T/\partial V)_U$ is almost invariably negative, because the Joule temperature, at which $\mathrm{d}B/\mathrm{d}T$ becomes negative, is 
extremely high (see figure A2.1.7). 

A2.1.6.3 GASEOUS MIXTURES 

According to Dalton's law of partial pressures, observed experimentally at sufficiently low pressures, the 
pressure of a gas mixture in a given volume $V$ is the sum of the pressures that each gas would exert alone in 
the same volume at the same temperature. Expressed in terms of moles $n_i$, 

$$p(n_1, n_2, \ldots, n_k, V, T) = \sum_{i=1}^{k} p_i(n_i, V, T),$$

or, given the validity of the ideal-gas law (equation (A2.1.18)) at these pressures, 

$$p = \sum_i n_i RT/V.$$

The partial pressure $p_i$ of a component in an ideal-gas mixture is thus 

$$p_i = \Big(n_i\Big/\sum_j n_j\Big)p = x_i p \tag{A2.1.59}$$

where $x_i = n_i/n$ is the mole fraction of species $i$ in the mixture. (The partial pressure is always defined by 
equation (A2.1.59) even at higher pressures where Dalton's law is no longer valid.) 
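Equation (A2.1.59) translates directly into code. The sketch below takes a mapping from species labels to mole numbers and returns the partial pressures; the function name and the species labels are placeholders.

```python
def partial_pressures(moles, p_total):
    """Return p_i = x_i * p for each species, with x_i = n_i / sum_j n_j."""
    n_total = sum(moles.values())
    if n_total <= 0:
        raise ValueError("total amount of gas must be positive")
    return {species: n_i / n_total * p_total for species, n_i in moles.items()}


if __name__ == "__main__":
    # Hypothetical two-component mixture at a total pressure of 100 (any unit).
    print(partial_pressures({"A": 3.0, "B": 1.0}, 100.0))
```

By construction the partial pressures sum to the total pressure, whether or not Dalton's law holds for the real mixture.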

Given this experimental result, it is plausible to assume (and is easily shown by statistical mechanics) that the 
chemical potential of a substance with partial pressure $p_i$ in an ideal-gas mixture is equal to that in the 
one-component ideal gas at pressure $p' = p_i$: 

$$\mu_i(p, T, x_i) = \mu_i(p' = x_i p, T, x_i = 1). \tag{A2.1.60}$$

What thermodynamic experiments can be cited to support such an assumption? There are several: 

(1) There are a few semipermeable membranes that separate a gas mixture from a pure component gas. 
One is palladium, which is permeable to hydrogen, but not (in any reasonable length of time) to other 
gases. Another is rubber, through which carbon dioxide or ammonia diffuses rapidly, while gases like 
nitrogen or argon diffuse much more slowly. In such cases, at equilibrium (when the chemical 
potential of the diffusing gas must be the same on both sides of the membrane) the pressure of the 
one-component gas on one side of the membrane is found to be equal to its partial pressure in the gas 
mixture on the other side. 

(2) In the phase equilibrium between a pure solid (or a liquid) and its vapour, the addition of other gases, 
as long as they are insoluble in the solid or liquid, has negligible effect on the partial pressure of the 
vapour. 

(3) In electrochemical cells (to be discussed later), if a particular gas participates in a chemical reaction at 
an electrode, the observed electromotive force is a function of the partial pressure of the reactive gas 
and not of the partial pressures of any other gases present. 

For precise measurements, there is a slight correction for the effect of the slightly different pressure on the 
chemical potentials of the solid or of the components of the solution. More important, corrections must be 
made for the non-ideality of the pure gas and of the gaseous mixture. With these corrections, equation 
(A2.1.60) can be verified within experimental error. 

Given equation (A2.1.60) one can now write for an ideal-gas mixture 

$$\begin{aligned}
\mu_i(p, T, x_i) = \mu_i^{\mathrm{mix}} &= \mu_i^\circ(p^\circ, T) + RT\ln(p_i/p^\circ) \\
&= \mu_i^\circ(p^\circ, T) + RT\ln(x_i p/p^\circ) \\
&= \mu_i^\circ(p^\circ, T) + RT\ln(p/p^\circ) + RT\ln x_i.
\end{aligned} \tag{A2.1.61}$$

Note that this has resulted in the separation of pressure and composition contributions to chemical potentials 
in the ideal-gas mixture. Moreover, the thermodynamic functions for ideal-gas mixing at constant pressure 
can now be obtained (ideal gas only): 

$$\Delta G_m(T, p) = \sum_i n_i[\mu_i^{\mathrm{mix}}(T, p, x_i) - \mu_i(T, p, x_i = 1)] = RT\sum_i n_i\ln x_i$$

$$\Delta S_m(T, p) = -(\partial\Delta G_m/\partial T)_{p,n_i} = -R\sum_i n_i\ln x_i$$

$$\Delta H_m(T, p) = \Delta G_m(T, p) + T\Delta S_m(T, p) = 0.$$
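The three ideal-mixing functions can be evaluated together. This is a minimal sketch with an assumed function name; species with zero amount are skipped, since their contribution $n_i\ln x_i$ vanishes in the limit.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def ideal_mixing(moles, T):
    """Return (dG_mix, dS_mix, dH_mix) for ideal-gas mixing at constant T and p.

    moles is a sequence of amounts n_i in mol; results are in J, J/K and J.
    """
    n_total = sum(moles)
    log_sum = sum(n_i * math.log(n_i / n_total) for n_i in moles if n_i > 0)
    dG = R * T * log_sum      # RT * sum n_i ln x_i (negative)
    dS = -R * log_sum         # -R * sum n_i ln x_i (positive)
    dH = dG + T * dS          # vanishes for ideal mixing
    return dG, dS, dH


if __name__ == "__main__":
    # One mole each of two gases at 298.15 K.
    print(ideal_mixing([1.0, 1.0], 298.15))
```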


Gas mixtures are subject to the same degree of non-ideality as the one-component ('pure') gases that were 
discussed in the previous section. In particular, the second virial coefficient for a gas mixture can be written as 
a quadratic average 

$$B(T, x_1, \ldots, x_k) = \sum_{i=1}^{k}\sum_{j=1}^{k} x_i x_j B_{ij} \tag{A2.1.62}$$

where $B_{ij}$, a function of temperature only, depends on the interaction of an $i,j$ pair. Thus, for a binary mixture 
of gases, one has $B_{11}$ and $B_{22}$ from measurements on the pure gases, but one needs to determine $B_{12}$ as well. 
The corresponding third virial coefficient is a cubic average over the $C_{ijk}$s, but this is rarely needed. 
Appropriate differentiation of equation (A2.1.62) will lead to the non-ideal corrections to the equations for the 
chemical potentials and the mixing functions. 
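The quadratic average of equation (A2.1.62) is a double sum over mole fractions. The sketch below assumes the pair coefficients are supplied as a symmetric nested list; the function name and the numerical values are hypothetical.

```python
def mixture_second_virial(x, B):
    """Return B_mix = sum_i sum_j x_i x_j B_ij (equation A2.1.62).

    x is a list of mole fractions and B a square nested list with B[i][j] = B[j][i].
    """
    k = len(x)
    return sum(x[i] * x[j] * B[i][j] for i in range(k) for j in range(k))


if __name__ == "__main__":
    # Hypothetical binary mixture: B11 = -1, B12 = -2, B22 = -3 (arbitrary units).
    B = [[-1.0, -2.0], [-2.0, -3.0]]
    print(mixture_second_virial([0.5, 0.5], B))
```

For a binary mixture the sum reduces to $x_1^2B_{11} + 2x_1x_2B_{12} + x_2^2B_{22}$, which shows why the cross coefficient $B_{12}$ must be determined separately.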

A2.1.6.4 DILUTE SOLUTIONS AND HENRY'S LAW 

Experiments on sufficiently dilute solutions of non-electrolytes yield Henry's law, that the vapour pressure of 
a volatile solute, i.e. its partial pressure in a gas mixture in equilibrium with the solution, is directly 
proportional to its concentration, expressed in any units (molar concentrations, molality, mole fraction, weight 
fraction, etc.) because in sufficiently dilute solution these are all proportional to each other. 

$$p_i = k_c c_i = k_m m_i = k_x x_i$$

where $c_i$ is the molar concentration of species $i$ (conventionally, but not necessarily, expressed in units of 
moles per litre of solution), $m_i$ is its molality (conventionally expressed as moles per kilogram of solvent), and 
$x_i$ is its mole fraction. The Henry's law constants $k_c$, $k_m$ and $k_x$ differ, of course, with the choice of units. 

It follows that, because phase equilibrium requires that the chemical potential $\mu_i$ be the same in the solution as 
in the gas phase, one may write for the chemical potential in the solution: 

$$\mu_i(T, c_i) - \mu_i^\circ(T, c^\circ) = RT\ln(c_i/c^\circ). \tag{A2.1.63}$$

Here the composition is expressed as concentration $c_i$ and the reference state is for unit concentration $c^\circ$ 
(conventionally 1 mol l$^{-1}$) but it could have been expressed using any other composition variable and the 
corresponding reference state. 

It seems appropriate to assume the applicability of equation (A2.1.63) to sufficiently dilute solutions of non- 
volatile solutes and, indeed, to electrolyte species. This assumption can be validated by other experimental 
methods (e.g. by electrochemical measurements) and by statistical mechanical theory. 

Just as increasing the pressure of a gas or a gas mixture introduces non-ideal corrections, so does increasing 
the concentration. As before, one can introduce an activity $a_i$ and an activity coefficient $\gamma_i$ and write $a_i = c_i\gamma_i$ 
and 

$$\mu_i(T, c_i) - \mu_i^\circ(T, c^\circ) = RT\ln(a_i/c^\circ) = RT\ln(c_i/c^\circ) + RT\ln\gamma_i.$$

In analogy to the gas, the reference state is for the ideally dilute solution at $c^\circ$, although at $c^\circ$ the real solution 
may be far from ideal. (Technically, since this has now been extended to non-volatile solutes, it is defined at 
the reference pressure $p^\circ$ rather than at the vapour pressure; however, because $(\partial\mu_i/\partial p)_T = \bar V_i$, and molar 
volumes are small in condensed systems, this is rarely of any consequence.) 

Using the Gibbs-Duhem equation ((A2.1.27) with $\mathrm{d}T = 0$, $\mathrm{d}p = 0$), one can show that the solvent must obey 
Raoult's law over the same concentration range where Henry's law is valid for the solute (or solutes): 

$$p_0 = x_0 p_0^\circ$$

where $x_0$ is the mole fraction of solvent, $p_0$ is its vapour pressure, and $p_0^\circ$ is the vapour pressure of pure 
solvent, i.e. at $x_0 = 1$. A more careful expression of Raoult's law might be 

It should be noted that, whatever the form of Henry's law (i.e. in whatever composition units), Raoult's law 
must necessarily be expressed in mole fraction. This says nothing about the appropriateness of mole fractions 
in condensed systems, e.g. in equilibrium expressions; it arises simply from the fact that it is a statement about 
the gas phase. 

The reference state for the solvent is normally the pure solvent, so one may write 

Finally, a brief summary of the known behaviour of activity coefficients: 
Binary non-electrolyte mixtures: 

solvent: $\ln\gamma_0 = kc_1^2 + O(c_1^3)$ 

solute: $\ln\gamma_1 = k'c_1 + O(c_1^2)$. 

(Theory shows that these equations must be simple power series in the concentration (or an alternative 
composition variable) and experimental data can always be fitted this way.) 

Single electrolyte solution: 

solvent: $\ln\gamma_0 = k''m_1^{3/2} + O(m_1^2)$ 

solute: $\ln\gamma_1 = k'''m_1^{1/2} + O(m_1)$. 

(The situation for electrolyte solutions is more complex; theory confirms the limiting expressions (originally 
from Debye-Hückel theory), but, because of the long-range interactions, the resulting equations are 
non-analytic rather than simple power series.) It is evident that electrolyte solutions are 'ideally dilute' only at 
extremely low concentrations. Further details about these activity coefficients will be found in other articles. 

A2.1.6.5 CHEMICAL EQUILIBRIUM 

If a thermodynamic system includes species that may undergo chemical reactions, one must allow for the fact 
that, even in a closed system, the number of moles of a particular species can change. If a chemical reaction 
(e.g. N$_2$ + 3H$_2$ → 2NH$_3$) is represented by the symbolic equation 

$$\nu_A\mathrm{A} + \nu_B\mathrm{B} + \cdots \rightarrow \nu_Y\mathrm{Y} + \nu_Z\mathrm{Z} + \cdots \tag{A2.1.64}$$

it is obvious that any changes in the numbers of moles of the species must be proportional to the coefficients 
$\nu$. Thus if $n_A^0$, $n_B^0$, etc, are the numbers of moles of the species at some initial point, we may write for the 
number of moles at some subsequent point 

$$n_A = n_A^0 - \nu_A\xi \qquad \mathrm{d}n_A = -\nu_A\,\mathrm{d}\xi$$

$$n_B = n_B^0 - \nu_B\xi \qquad \mathrm{d}n_B = -\nu_B\,\mathrm{d}\xi$$

$$n_Y = n_Y^0 + \nu_Y\xi \qquad \mathrm{d}n_Y = \nu_Y\,\mathrm{d}\xi$$

$$n_Z = n_Z^0 + \nu_Z\xi \qquad \mathrm{d}n_Z = \nu_Z\,\mathrm{d}\xi$$

where the parameter $\xi$ is called the 'degree of advancement' of the reaction. (If the variable $\xi$ goes from 0 to 
1, one unit of the reaction represented by equation (A2.1.64) takes place, but $\xi = 0$ does not necessarily mean 
that only reactants are present, nor does $\xi = 1$ mean that only products remain.) More generally one can write 

$$n_i = n_i^0 + \nu_i\xi \qquad \mathrm{d}n_i = \nu_i\,\mathrm{d}\xi \tag{A2.1.65}$$

where positive values of $\nu_i$ designate products and negative values of $\nu_i$ designate reactants. Equation 
(A2.1.33), equation (A2.1.34), equation (A2.1.35) and equation (A2.1.36) can be rewritten in new forms 
appropriate for these closed, but chemically reacting systems. Substitution from equation (A2.1.65) yields 
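Equation (A2.1.65), with its sign convention (positive $\nu_i$ for products, negative for reactants), can be illustrated for the ammonia synthesis quoted above. The function name and the starting amounts in this sketch are assumptions made for illustration.

```python
def moles_at_extent(n0, nu, xi):
    """Return n_i = n_i^0 + nu_i * xi for every species (equation A2.1.65).

    n0 maps species to initial amounts; nu maps species to signed
    stoichiometric coefficients (negative for reactants).
    """
    return {species: n0[species] + nu[species] * xi for species in nu}


if __name__ == "__main__":
    # N2 + 3 H2 -> 2 NH3, starting from a stoichiometric reactant mixture.
    n0 = {"N2": 1.0, "H2": 3.0, "NH3": 0.0}
    nu = {"N2": -1.0, "H2": -3.0, "NH3": 2.0}
    print(moles_at_extent(n0, nu, 0.25))
```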


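$$\begin{aligned}
\mathrm{d}U &= T\,\mathrm{d}S - p\,\mathrm{d}V + \sum_i\mu_i\,\mathrm{d}n_i = T\,\mathrm{d}S - p\,\mathrm{d}V + \Big(\sum_i\nu_i\mu_i\Big)\mathrm{d}\xi \\
\mathrm{d}H &= T\,\mathrm{d}S + V\,\mathrm{d}p + \sum_i\mu_i\,\mathrm{d}n_i = T\,\mathrm{d}S + V\,\mathrm{d}p + \Big(\sum_i\nu_i\mu_i\Big)\mathrm{d}\xi \\
\mathrm{d}A &= -S\,\mathrm{d}T - p\,\mathrm{d}V + \sum_i\mu_i\,\mathrm{d}n_i = -S\,\mathrm{d}T - p\,\mathrm{d}V + \Big(\sum_i\nu_i\mu_i\Big)\mathrm{d}\xi \\
\mathrm{d}G &= -S\,\mathrm{d}T + V\,\mathrm{d}p + \sum_i\mu_i\,\mathrm{d}n_i = -S\,\mathrm{d}T + V\,\mathrm{d}p + \Big(\sum_i\nu_i\mu_i\Big)\mathrm{d}\xi.
\end{aligned}$$

We have seen that equilibrium in an isolated system ($\mathrm{d}U = 0$, $\mathrm{d}V = 0$) requires that the entropy $S$ be a 
maximum, i.e. that $(\partial S/\partial\xi)_{U,V} = 0$. Examination of the first equation above shows that this can only be true if 
$\sum_i\nu_i\mu_i$ vanishes. Exactly the same conclusion applies for equilibrium under the other constraints. Thus, for 
constant temperature and pressure, minimization of the Gibbs free energy requires that $(\partial G/\partial\xi)_{T,p} = \sum_i\nu_i\mu_i = 0$. 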


THE AFFINITY 

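This new quantity $\sum_i\nu_i\mu_i$, the negative of which De Donder (1920) has called the 'affinity' and given the 
symbol of a script $\mathcal{A}$, is obviously the important thermodynamic function for chemical equilibrium: 

$$-\mathcal{A} = \sum_i\nu_i\mu_i = -T(\partial S/\partial\xi)_{U,V} = (\partial U/\partial\xi)_{S,V} = (\partial H/\partial\xi)_{S,p} = (\partial A/\partial\xi)_{T,V} = (\partial G/\partial\xi)_{T,p}.$$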

i 

Figure A2.1.9 illustrates how the entropy $S$ and the affinity $\mathcal{A}$ vary with $\xi$ in a constant $U$, $V$ system. It is 
apparent that when the slope $(\partial S/\partial\xi)_{U,V}$ is positive (positive affinity), $\xi$ will spontaneously increase; when it 
is negative, $\xi$ will spontaneously decrease; when it is zero, $\xi$ has no tendency to change and the system is at 
equilibrium. Moreover, one should note the feature that $\mathcal{A} = 0$ is the criterion for equilibrium for all these sets 
of constraints, whether $U$ and $V$ are fixed, or $T$ and $p$ are fixed. 

Instead of using the chemical potential $\mu_i$ one can use the absolute activity $\lambda_i = \exp(\mu_i/RT)$. Since at 
equilibrium $\mathcal{A} = 0$, 

$$-\mathcal{A} = \sum_i\nu_i\mu_i = RT\sum_i\nu_i\ln\lambda_i = 0 \quad\text{or}\quad \prod_i\lambda_i^{\nu_i} = 1.$$

It is convenient to define a relative activity $a_i$ in terms of the standard states of the reactants and products at 
the same temperature and pressure, where $a_i = \lambda_i/\lambda_i^\circ$, so that $\mu_i = \mu_i^\circ + RT\ln a_i$. 

Thus, at equilibrium 

$$-\mathcal{A} = \sum_i\nu_i\mu_i = \sum_i\nu_i(\mu_i^\circ + RT\ln a_i) = \sum_i\nu_i\mu_i^\circ + RT\sum_i\nu_i\ln a_i = 0.$$

If we define an equilibrium constant $K$ as 

$$K = \prod_i a_i^{\nu_i} = \prod_i(\lambda_i/\lambda_i^\circ)^{\nu_i} \tag{A2.1.66}$$

it can now be related directly to $\mathcal{A}^\circ$ or $\Delta G^\circ$: 

$$\mathcal{A}^\circ = -\sum_i\nu_i\mu_i^\circ = -\Delta G^\circ = RT\ln K. \tag{A2.1.67}$$

Figure A2.1.9. Chemically reacting systems. (a) The entropy $S$ as a function of the degree of advancement $\xi$ 
of the reaction at constant $U$ and $V$. (b) The affinity $\mathcal{A}$ as a function of $\xi$ for the same reacting system. 
Equilibrium is reached at $\xi = 0.623$ where $S$ is a maximum and $\mathcal{A} = 0$. 


To proceed further, to evaluate the standard free energy $\Delta G^\circ$, we need information (experimental or 
theoretical) about the particular reaction. One source of information is the equilibrium constant for a chemical 
reaction involving gases. Previous sections have shown how the chemical potential for a species in a gaseous 
mixture or in a dilute solution (and the corresponding activities) can be defined and measured. Thus, if one 
can determine (by some kind of analysis) the partial pressures of the reacting gases in an equilibrium mixture 
or the concentrations of reacting species in a dilute solution equilibrium (and, where possible, adjust them to 
activities so one allows for non-ideality), one can obtain the thermodynamic equilibrium constant $K$ and the 
standard free energy of reaction $\Delta G^\circ$ from equation (A2.1.66) and equation (A2.1.67). 
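The relation $-\Delta G^\circ = RT\ln K$ of equation (A2.1.67) can be applied in either direction. The two helper functions below are a minimal sketch with assumed names; the sample value of $\Delta G^\circ$ is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def equilibrium_constant(dG0, T):
    """Return the dimensionless K from dG0 (J/mol) using -dG0 = RT ln K."""
    return math.exp(-dG0 / (R * T))


def standard_free_energy(K, T):
    """Return dG0 (J/mol) from a dimensionless equilibrium constant K."""
    return -R * T * math.log(K)


if __name__ == "__main__":
    # Hypothetical reaction with dG0 = -10 kJ/mol at 298.15 K.
    K = equilibrium_constant(-10000.0, 298.15)
    print(K, standard_free_energy(K, 298.15))
```

Note that the value of $K$ obtained this way refers to whatever standard states define $\Delta G^\circ$.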


A cautionary word about units: equilibrium constants are usually expressed in units, because pressures and 
concentrations have units. Yet the argument of a logarithm must be dimensionless, so the activities in 
equation (A2.1.66) , defined in terms of the absolute activities (which are dimensionless) are dimensionless. 

The value of the standard free energy $\Delta G^\circ$ depends on the choice of reference state, as does the equilibrium 
constant. Thus it would be safer to write the equilibrium constant $K$ for a gaseous reaction as 

$$K(T, p^\circ) = \prod_i(p_i/p^\circ)^{\nu_i} = K_p\,(p^\circ)^{-\sum_i\nu_i}.$$

Here $K$ is dimensionless, but $K_p$ is not. Conversely, the factor $(p^\circ)^{\sum_i\nu_i}$ has units (unless $\sum_i\nu_i = 0$) but the value 
unity if $p^\circ = 1$ atm. Similar considerations apply to equilibrium constants expressed in concentrations or 
molalities. 

A2.1.6.6 REVERSIBLE GALVANIC CELLS 

A second source of standard free energies comes from the measurement of the electromotive force of a 
galvanic cell. Electrochemistry is the subject of other articles (A2.4 and B1.28), so only the basics of a
reversible chemical cell will be presented here. For example, consider the cell conventionally written as 

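\[ \mathrm{Pt,\ H_2(1\ atm)\ |\ HCl}(m)\ |\ \mathrm{AgCl(s),\ Ag(s)} \]

for which the electrode reactions are oxidation at the left electrode, the anode,

\[ \tfrac{1}{2}\mathrm{H_2(1\ atm)} \rightleftharpoons \mathrm{H^+}(m) + \mathrm{e^-}(\psi_{\mathrm{left}}) \]

and reduction at the right electrode, the cathode,

\[ \mathrm{AgCl(s)} + \mathrm{e^-}(\psi_{\mathrm{right}}) \rightleftharpoons \mathrm{Cl^-}(m) + \mathrm{Ag(s)} \]

which can be rewritten as two concurrent reactions:

\[ \tfrac{1}{2}\mathrm{H_2(1\ atm)} + \mathrm{AgCl(s)} \rightleftharpoons \mathrm{H^+}(m) + \mathrm{Cl^-}(m) + \mathrm{Ag(s)} \tag{I} \]

\[ \mathrm{e^-}(\psi_{\mathrm{right}}) \rightleftharpoons \mathrm{e^-}(\psi_{\mathrm{left}}) \tag{II} \]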
The chemical reaction (I) cannot come to equilibrium directly; it can come to equilibrium only if the two
electrodes are connected so that electrons can flow. One can use this feature to determine the affinity (or the
$\Delta G$) of reaction (I) by determining the affinity of reaction (II) which balances it.

In these equations the electrostatic potential $\psi$ might be thought to be the potential at the actual electrodes, the
platinum on the left and the silver on the right. However, electrons are not the hypothetical test particles of
physics, and the electrostatic potential difference at a junction between two metals is unmeasurable. What is
measurable is the difference in the electrochemical potential $\tilde{\mu}$ of the electron, which at equilibrium must be
the same in any two wires that are in electrical contact. One assumes that the electrochemical potential can be
written as the combination of two terms, a 'chemical' potential minus the electrical potential ($-\psi$ because of
the negative charge on the electron). When two copper wires are connected to the two electrodes, the
'chemical' part of the electrochemical potential is assumed to be the same in both wires; then the
potentiometer measures, under conditions of zero current flow, the electrostatic potential difference
$\Delta\psi = \psi_{\mathrm{right}} - \psi_{\mathrm{left}}$ between the two copper wires, which is called the electromotive force (emf)
$\mathcal{E}$ of the cell.

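For reaction (I) the two solids and the hydrogen gas are in their customary standard states ($a = 1$), so

\[ \Delta G_{\mathrm{I}} = \Delta G^\circ_{\mathrm{I}} + RT \ln\left( \frac{a_{\mathrm{H^+}}\, a_{\mathrm{Cl^-}}\, a_{\mathrm{Ag}}}{a_{\mathrm{H_2}}^{1/2}\, a_{\mathrm{AgCl}}} \right) = \Delta G^\circ_{\mathrm{I}} + RT \ln\left( a_{\mathrm{H^+}} a_{\mathrm{Cl^-}} \right) \]

while for the electrons

\[ \Delta G_{\mathrm{II}} = F(\psi_{\mathrm{right}} - \psi_{\mathrm{left}}) = F\mathcal{E} \]

where $F$ is the Faraday constant (the amount of charge in one mole of electrons).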

When no current flows, there is a constrained equilibrium in which the chemical reaction cannot proceed in
either direction, and $\mathcal{E}$ can be measured. With this constraint, for the overall reaction
$\Delta G = \Delta G_{\mathrm{I}} + \Delta G_{\mathrm{II}} = 0$, so

\[ \Delta G^\circ_{\mathrm{I}} + RT \ln\left( a_{\mathrm{H^+}} a_{\mathrm{Cl^-}} \right) = -F\mathcal{E}. \]

Were the HCl in its standard state, $\Delta G^\circ_{\mathrm{I}}$ would equal $-F\mathcal{E}^\circ$, where $\mathcal{E}^\circ$ is the standard emf for the
reaction. In general, for any reversible chemical cell without transference, i.e. one with a single electrolyte
solution, not one with any kind of junction between two solutions,

\[ \Delta G^\circ + RT \ln\Bigl( \prod_i a_i^{\nu_i} \Bigr) = -nF\mathcal{E} \tag{A2.1.68} \]

\[ \Delta G^\circ = -nF\mathcal{E}^\circ \tag{A2.1.69} \]
where $n$ is the number of electrons associated with the cell reaction as written. By combining equation
(A2.1.68) and equation (A2.1.69) one obtains the Nernst equation

\[ \mathcal{E} = \mathcal{E}^\circ - (RT/nF) \ln\Bigl( \prod_i a_i^{\nu_i} \Bigr). \]

Thus, if the activities of the various species can be determined or if one can extrapolate to infinite dilution, the 
measurement of the emf yields the standard free energy of the reaction. 
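
The use of the Nernst equation can be sketched numerically. The following is a minimal illustration, not a data evaluation: the standard emf and the ionic activities are assumed placeholder inputs for a one-electron cell of the type discussed above, and the function names are hypothetical.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def emf(emf_standard, n, activity_quotient, T=298.15):
    """Nernst equation: E = E0 - (RT/nF) ln(prod_i a_i^nu_i), in volts."""
    return emf_standard - (R * T / (n * F)) * math.log(activity_quotient)


def standard_free_energy(emf_standard, n):
    """Equation (A2.1.69): standard free energy -n F E0, in J mol^-1."""
    return -n * F * emf_standard


# Assumed, illustrative inputs for a cell with n = 1 (not measured data).
E0 = 0.2224               # V, assumed standard emf
a_H, a_Cl = 0.08, 0.08    # assumed activities of H+ and Cl-

E = emf(E0, n=1, activity_quotient=a_H * a_Cl)
dG0 = standard_free_energy(E0, n=1)
```

Because the activity product here is less than unity, the computed emf lies above the assumed standard emf; with all activities equal to one the two coincide.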

A2.1.6.7 STANDARD STATES AND STANDARD FREE ENERGIES OF FORMATION 

With several experimental methods for determining the $\Delta G^\circ$s of chemical reactions, one can start to organize
the information in a systematic way. (To complete this satisfactorily, or at least efficiently and precisely, one
needs the third law to add third-law entropies to calorimetrically determined $\Delta H^\circ$s. Discussion of this is
deferred to the next section, but it will be assumed for the purpose of this section that all necessary
information is available.)


STANDARD STATES 

Conventions about standard states (the reference states introduced earlier) are necessary because otherwise the 
meaning of the standard free energy of a reaction would be ambiguous. We summarize the principal ones: 

(1) All standard states, both for pure substances and for components in mixtures and solutions, are
defined for a pressure of exactly 1 atmosphere. However the temperature must be specified. (There is
some movement towards metricating this to a pressure of 1 bar = 100 kPa = 0.986 924 atm. This
would make a significant difference only for gases; at $T = 298$ K, this would decrease a $\mu^\circ$ by 32.6 J
mol$^{-1}$.)

(2) As noted earlier, the standard state of a gas is the hypothetical ideal gas at 1 atmosphere and the 
specified temperature T. 

(3) The standard state of a substance in a condensed phase is the real liquid or solid at 1 atm and T. 

(4) The standard state of an electrolyte is the hypothetical ideally dilute solution (Henry's law) at a
molality of 1 mol kg$^{-1}$. (Actually, as will be seen, electrolyte data are conventionally reported as for
the formation of individual ions.) Standard states for non-electrolytes in dilute solution are rarely
invoked.

(5) For a free energy of formation, the preferred standard state of the element should be the 
thermodynamically stable (lowest chemical potential) form of it; e.g. at room temperature, graphite for 
carbon, the orthorhombic crystal for sulfur. 
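
The size of the shift quoted in item (1) follows from the ideal-gas form of the chemical potential: changing the standard pressure from 1 atm to 1 bar lowers $\mu^\circ$ of a gas by $RT \ln(1.01325)$. A minimal check, assuming ideal-gas behaviour and a hypothetical helper name:

```python
import math

R = 8.314462618        # gas constant, J mol^-1 K^-1
ATM_IN_BAR = 1.01325   # 1 atm expressed in bar


def mu_standard_shift(T):
    """Decrease in the standard chemical potential of an ideal gas (J mol^-1)
    when the standard pressure is changed from 1 atm to 1 bar."""
    return R * T * math.log(ATM_IN_BAR)


shift_298 = mu_standard_shift(298.0)  # close to the 32.6 J mol^-1 quoted in item (1)
```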

Compounds that are products in reactions are sometimes reported in standard states for phases that are not the
most stable at the temperature in question. The stable standard state of H$_2$O at 298 K (and 1 atm) is, of course,
the liquid, but $\Delta G^\circ$s are sometimes reported for reactions leading to gaseous H$_2$O at 298 K. Moreover the
standard functions for the formation of some metastable states, e.g. C(diamond) or S(monoclinic) at 298 K,
are sometimes reported in tables.
The useful thermodynamic functions (e.g. $G$, $H$, $S$, $C_p$, etc) are all state functions, so their values in any
particular state are independent of the path by which the state is reached. Consequently, one can combine (by
addition or subtraction) the $\Delta G^\circ$s for several chemical reactions at the same temperature, to obtain the $\Delta G^\circ$ of
another reaction that is not directly measurable. (Indeed one experimentalist has commented that the principal
usefulness of thermodynamics arises 'because some quantities are easier to measure than others'.)

In particular, one can combine reactions to yield the $\Delta G_f^\circ$, $\Delta H_f^\circ$ and $\Delta S_f^\circ$ for formation of compounds from
their elements, quantities rarely measurable directly. (Many $\Delta H_f^\circ$s for formation of substances are easily
calculated from the calorimetric measurement of the enthalpies of combustion of the compound and of its
constituent elements.) For example, consider the dimerization of NO$_2$ at 298 K. In appropriate tables one
finds

\[ \tfrac{1}{2}\mathrm{N_2(g)} + \mathrm{O_2(g)} \rightarrow \mathrm{NO_2(g)} \qquad \Delta G_f^\circ = 51.840\ \mathrm{kJ\ mol^{-1}} \]

\[ \mathrm{N_2(g)} + 2\mathrm{O_2(g)} \rightarrow \mathrm{N_2O_4(g)} \qquad \Delta G_f^\circ = 98.286\ \mathrm{kJ\ mol^{-1}} \]

\[ 2\mathrm{NO_2(g)} \rightarrow \mathrm{N_2O_4(g)} \qquad \Delta G^\circ = (98.286 - 2 \times 51.840)\ \mathrm{kJ\ mol^{-1}} = -5.394\ \mathrm{kJ\ mol^{-1}}. \]

With this information one can now use equation (A2.1.67) to calculate the equilibrium constant at 298 K. One
finds $K = \exp(5394/(8.314 \times 298)) \approx 8.8$ or, using the dimensional constant, $K_p \approx 8.8$ atm$^{-1}$. (In fact, the
free energies of formation were surely calculated using the experimental data on the partial pressures of the
gases in the equilibrium. One might also note that this is one of the very few equilibria involving only
gaseous species at room temperature that have constants $K$ anywhere near unity.)
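
The bookkeeping in this example can be sketched in a few lines. The free energies of formation are the two values quoted above; the function names and the dictionary layout are illustrative assumptions, and the equilibrium constant is obtained from $\ln K = -\Delta G^\circ/RT$.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1

# Standard free energies of formation at 298 K quoted in the text, kJ mol^-1.
dGf = {"NO2": 51.840, "N2O4": 98.286}


def reaction_free_energy(stoichiometry, formation_values):
    """Sum nu_i * dGf_i, with nu_i negative for reactants and positive for products."""
    return sum(nu * formation_values[s] for s, nu in stoichiometry.items())


def equilibrium_constant(dG0, T):
    """Dimensionless K from ln K = -dG0 / RT (dG0 in kJ mol^-1)."""
    return math.exp(-dG0 / (R * T))


dimerization = {"NO2": -2, "N2O4": +1}   # 2 NO2 -> N2O4
dG0 = reaction_free_energy(dimerization, dGf)
K = equilibrium_constant(dG0, 298.15)
```

With these inputs the reaction free energy is $-5.394$ kJ mol$^{-1}$ and a direct evaluation gives $K$ between 8 and 9.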

Thermodynamic tables usually report at least three quantities: almost invariably the standard enthalpy of
formation at 298 K, $\Delta H_f^\circ$(298 K); usually the standard entropy at 298 K, $S^\circ$(298 K) (not $\Delta S_f^\circ$(298 K), but the
entropy based on the third-law convention (see subsequent section) that $S^\circ$(0 K) = 0); and some form of the
standard free energy of formation, usually either $\Delta G_f^\circ$(298 K) or $\log_{10} K_f$. Many tables will include these
quantities at a series of temperatures, as well as the standard heat capacity $C_p^\circ$, and enthalpies and entropies
of various transitions (phase changes).

THE STANDARD FREE ENERGY OF FORMATION OF IONS 

A special convention exists concerning the free energies of ions in aqueous solution. Most thermodynamic 
information about strong (fully dissociated) electrolytes in aqueous solutions comes, as has been seen, from 
measurements of the emf of reversible cells. Since the ions in very dilute solution (or in the hypothetical
ideally dilute solution at $m = 1$ mol kg$^{-1}$) are essentially independent, one would like to assign free energy
values to individual ions and add together the values for the anion and the cation to get that for the electrolyte.
Unfortunately the emf of a half cell is unmeasurable, although there have been some attempts to estimate it
theoretically. Consequently, the convention that the standard half-cell emf of the hydrogen electrode is exactly
zero has been adopted.

\[ \tfrac{1}{2}\mathrm{H_2}(\text{ideal gas}, T) \rightarrow \mathrm{H^+}(\text{aq, ideal}, m = 1\ \mathrm{mol\ kg^{-1}}, T) + \mathrm{e^-}, \qquad \mathcal{E}^\circ = 0, \quad \Delta G_f^\circ = 0. \]
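Thus, when tables report the standard emf or standard free energy of the chloride ion,

\[ \tfrac{1}{2}\mathrm{Cl_2}(\text{ideal gas}, T) + \mathrm{e^-} \rightarrow \mathrm{Cl^-}(\text{aq, ideal}, m = 1\ \mathrm{mol\ kg^{-1}}, T) \]

it is really that of the reaction

\[ \tfrac{1}{2}\mathrm{H_2}(\text{ideal gas}, T) + \tfrac{1}{2}\mathrm{Cl_2}(\text{ideal gas}, T) \rightarrow \mathrm{H^+}(\text{aq, ideal}, m = 1\ \mathrm{mol\ kg^{-1}}, T) + \mathrm{Cl^-}(\text{aq, ideal}, m = 1\ \mathrm{mol\ kg^{-1}}, T). \]

Similarly, the standard free energy or standard emf of the sodium ion, reported for

\[ \mathrm{Na}(\mathrm{s}, T) \rightarrow \mathrm{Na^+}(\text{aq, ideal}, m = 1\ \mathrm{mol\ kg^{-1}}, T) + \mathrm{e^-} \]

is really that for

\[ \mathrm{H^+}(\text{aq, ideal}, m = 1\ \mathrm{mol\ kg^{-1}}, T) + \mathrm{Na}(\mathrm{s}, T) \rightarrow \tfrac{1}{2}\mathrm{H_2}(\text{ideal gas}, T) + \mathrm{Na^+}(\text{aq, ideal}, m = 1\ \mathrm{mol\ kg^{-1}}, T). \]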

TEMPERATURE DEPENDENCE OF THE EQUILIBRIUM CONSTANT 

Since equation (A2.1.67) relates the equilibrium constant $K$ to the standard free energy $\Delta G^\circ$ of the reaction,
one can rewrite the equation as

\[ \ln K = -\Delta G^\circ / RT \]

and differentiate with respect to temperature

\[ \mathrm{d}\ln K/\mathrm{d}T = -(\mathrm{d}\Delta G^\circ/\mathrm{d}T)/RT + \Delta G^\circ/RT^2 = (T\Delta S^\circ + \Delta G^\circ)/RT^2 = \Delta H^\circ/RT^2. \tag{A2.1.70} \]

This important relation between the temperature derivative of the equilibrium constant $K$ and the standard
enthalpy of the reaction $\Delta H^\circ$ is sometimes known as the van 't Hoff equation. (Note that the derivatives are not
expressed as partial derivatives at constant pressure, because the quantities involved are all defined for the
standard pressure $p^\circ$. Note also that in this derivation one has not assumed, as is sometimes alleged, that
$\Delta H^\circ$ and $\Delta S^\circ$ are independent of temperature.)

The validity of equation (A2.1.70) has sometimes been questioned when enthalpies of reaction determined 
from calorimetric experiments fail to agree with those determined from the temperature dependence of the 
equilibrium constant. The thermodynamic equation is rigorously correct, so doubters should instead examine 
the experimental uncertainties and whether the two methods actually relate to exactly the same reaction. 
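
That the van 't Hoff equation does not require a temperature-independent enthalpy can be illustrated numerically. The sketch below assumes a hypothetical reaction with a constant heat-capacity change, so that both $\Delta H^\circ$ and $\Delta S^\circ$ vary with temperature; all parameter values are invented for illustration. It compares a finite-difference derivative of $\ln K$ with $\Delta H^\circ/RT^2$.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

# Hypothetical reaction parameters (assumed for illustration only).
T_REF = 298.15       # reference temperature, K
DH_REF = -57000.0    # standard enthalpy of reaction at T_REF, J mol^-1
DS_REF = -176.0      # standard entropy of reaction at T_REF, J mol^-1 K^-1
DCP = -20.0          # constant heat-capacity change, J mol^-1 K^-1


def delta_h(T):
    """Standard enthalpy of reaction at T for a constant heat-capacity change."""
    return DH_REF + DCP * (T - T_REF)


def delta_s(T):
    """Standard entropy of reaction at T for a constant heat-capacity change."""
    return DS_REF + DCP * math.log(T / T_REF)


def ln_k(T):
    """ln K = -dG0/RT with dG0 = dH0 - T dS0."""
    return -(delta_h(T) - T * delta_s(T)) / (R * T)


def dlnk_dT_numeric(T, h=1e-3):
    """Central finite-difference estimate of d ln K / dT."""
    return (ln_k(T + h) - ln_k(T - h)) / (2.0 * h)


T = 350.0
lhs = dlnk_dT_numeric(T)          # numerical derivative of ln K
rhs = delta_h(T) / (R * T ** 2)   # right-hand side of equation (A2.1.70)
```

The two sides agree to within the finite-difference error even though the enthalpy at 350 K differs from its value at the reference temperature.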
A2.1.7 THE THIRD LAW

A2.1.7.1 HISTORY; THE NERNST HEAT THEOREM 

The enthalpy, entropy and free energy changes for an isothermal reaction near 0 K cannot be measured
directly because of the impossibility of carrying out the reaction reversibly in a reasonable time. One can,
however, by a suitable combination of measured values, calculate them indirectly. In particular, if the value of
$\Delta H^\circ$, $\Delta S^\circ$ or $\Delta G^\circ$ is known at a specified temperature $T$, say 298 K, its value at another temperature $T'$ can be
computed using this value and the changes involved in bringing the products and the reactants separately from
$T'$ to $T$. If these measurements can be extrapolated to 0 K, the isothermal changes for the reaction at 0 K can
be calculated.

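If, in going from 0 K to $T$, a substance undergoes phase changes (fusion, vaporization, etc) at $T_A$ and $T_B$ with
molar enthalpies of transition $\Delta H_A$ and $\Delta H_B$, one can write

\[ S^\circ(T) - S^\circ(0) = \int_0^{T_A} \frac{C_p^\circ}{T}\,\mathrm{d}T + \frac{\Delta H_A}{T_A} + \int_{T_A}^{T_B} \frac{C_p^\circ}{T}\,\mathrm{d}T + \frac{\Delta H_B}{T_B} + \int_{T_B}^{T} \frac{C_p^\circ}{T}\,\mathrm{d}T. \tag{A2.1.71} \]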


It is manifestly impossible to measure heat capacities down to exactly 0 K, so some kind of extrapolation is
necessary. Unless $C_p$ were to approach zero as $T$ approaches zero, the limiting value of $C_p/T$ would not be
finite and the first integral in equation (A2.1.71) would be infinite. Experiments suggested that $C_p$ might
approach zero and Nernst (1906) noted that computed values of the entropy change $\Delta S^\circ$ for various reactions
appeared to approach zero as the temperature approached 0 K. This empirical discovery, known as the Nernst
heat theorem, can be expressed mathematically in various forms as

\[ \lim_{T \to 0} (\mathrm{d}\Delta G^\circ/\mathrm{d}T) = \lim_{T \to 0} \Delta S^\circ = \lim_{T \to 0} (\mathrm{d}\Delta H^\circ/\mathrm{d}T) = 0. \tag{A2.1.72} \]


However, the possibility that $C_p$ might not go to zero could not be excluded before the development of the
quantum theory of the heat capacity of solids. When Debye (1912) showed that, at sufficiently low
temperatures, $C_p$ is proportional to $T^3$, this uncertainty was removed, and a reliable method of extrapolation
for most crystalline substances could be developed. (For metals there is an additional term, proportional to $T$,
a contribution from the electrons.) If the temperature $T'$ is low enough that $C_p = aT^3$, one may write

\[ S^\circ(T') - S^\circ(0) = \int_0^{T'} (C_p^\circ/T)\,\mathrm{d}T = \int_0^{T'} aT^2\,\mathrm{d}T = aT'^3/3 = C_p^\circ(T')/3. \tag{A2.1.73} \]

With this addition, better entropy determinations, e.g. measurements plus extrapolations to 0 K, became
available.
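
The extrapolation rule of equation (A2.1.73) can be checked with a short numerical sketch. The coefficient `a` and the lowest measured temperature are assumed values chosen only for illustration; the midpoint-rule integration stands in for the analytic integral of $C_p/T$.

```python
def debye_entropy_from_cp(cp_at_T):
    """Entropy below T' in the T^3 region, equation (A2.1.73): S(T') - S(0) = Cp(T')/3."""
    return cp_at_T / 3.0


def integrate_cp_over_T(a, T_upper, steps=10000):
    """Midpoint-rule estimate of the integral of (a T^3)/T from 0 to T_upper."""
    h = T_upper / steps
    return sum(a * ((i + 0.5) * h) ** 2 * h for i in range(steps))


a = 2.0e-4        # assumed Debye coefficient, J mol^-1 K^-4
T_lowest = 10.0   # assumed lowest temperature of measurement, K

cp_lowest = a * T_lowest ** 3                      # heat capacity at T_lowest
s_rule = debye_entropy_from_cp(cp_lowest)          # Cp(T')/3
s_numeric = integrate_cp_over_T(a, T_lowest)       # direct integration of Cp/T
```

The two estimates agree, so in practice only the lowest measured heat capacity is needed to supply the entropy contribution between 0 K and $T'$.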
The evidence in support of equation (A2.1.72) is of several kinds:

(1) Many substances exist in two or more solid allotropic forms. At 0 K, the thermodynamically stable
form is of course the one of lowest energy, but in many cases it is possible to make thermodynamic
measurements on another (metastable) form down to very low temperatures. Using the measured
entropy of transition at equilibrium, the measured heat capacities of both forms and equation
(A2.1.73) to extrapolate to 0 K, one can obtain the entropy of transition at 0 K. Within experimental
error $\Delta S^\circ(0)$ is zero for the transitions between $\beta$- and $\gamma$-phosphine, between orthorhombic and
monoclinic sulfur and between different forms of cyclohexanol.

(2) As seen in previous sections, the standard entropy $\Delta S^\circ$ of a chemical reaction can be determined from
the equilibrium constant $K$ and its temperature derivative, or equivalently from the temperature
derivative of the standard emf of a reversible electrochemical cell. As in the previous case,
calorimetric measurements on the separate reactants and products, plus the usual extrapolation, will
yield $\Delta S^\circ(0)$.

The limiting $\Delta S^\circ(0)$ so calculated is usually zero within experimental error, but there are some disturbing
exceptions. Not only must solutions and some inorganic and organic glasses be excluded, but also crystalline
CO, NO, N$_2$O and H$_2$O. It may be easy to see, given the most rudimentary statistical ideas of entropy, that
solutions and glasses have some positional disorder frozen in, and one is driven to conclude that the same 
situation must occur with these few simple crystals as well. For these substances in the gaseous state at 
temperature T there is a disagreement between the 'calorimetric' entropy calculated using equation (A2.1.71) 
and the 'spectroscopic' entropy calculated by statistical mechanics using the rotational and vibrational 
constants of the gas molecule; this difference is sometimes called 'residual entropy'. However, it can be 
argued that, because such a substance or mixture is frozen into a particular disordered state, its entropy is in 
fact zero. In any case, it is not in internal equilibrium (unless some special hypothetical constraints are 
applied), and it cannot be reached along a reversible path. 

It is beyond the scope of this article to discuss the detailed explanation of these exceptions; suffice it to say 
that there are reasonable explanations in terms of the structure of each crystal. 

A2.1.7.2 FIRST STATEMENT ($\Delta S^\circ \to 0$)

Because it is necessary to exclude some substances, including some crystals, from the Nernst heat theorem, 
Lewis and Gibson (1920) introduced the concept of a 'perfect crystal' and proposed the following 
modification as a definitive statement of the 'third law of thermodynamics' (exact wording due to Lewis and 
Randall (1923)): 


If the entropy of each element in some crystalline state be taken as zero at the absolute zero of temperature, 
every substance has a finite positive entropy, but at the absolute zero of temperature the entropy may become 
zero, and does so become in the case of perfect crystalline substances. 

Because of the Nernst heat theorem and the third law, standard thermodynamic tables usually do not report
entropies of formation of compounds; instead they report the molar entropy $S^\circ$ for each element and
compound. The entropies reported for those substances that show 'residual entropy' (the 'imperfect'
crystalline substances) are 'spectroscopic' entropies, not 'calorimetric' entropies.
For those who are familiar with the statistical mechanical interpretation of entropy, which asserts that at 0 K
substances are normally restricted to a single quantum state, and hence have zero entropy, it should be pointed
out that the conventional thermodynamic zero of entropy is not quite that, since most elements and
compounds are mixtures of isotopic species that in principle should separate at 0 K, but of course do not. The
thermodynamic entropies reported in tables ignore the entropy of isotopic mixing, and in some cases ignore
other complications as well, e.g. ortho- and para-hydrogen.

A2.1.7.3 SECOND STATEMENT (UNATTAINABILITY OF 0 K)

In the Lewis and Gibson statement of the third law, the notion of 'a perfect crystalline substance', while 
understandable, strays far from the macroscopic logic of classical thermodynamics and some scientists have 
been reluctant to place this statement in the same category as the first and second laws of thermodynamics. 
Fowler and Guggenheim (1939), noting that the first and second laws both state universal limitations on 
processes that are experimentally possible, have pointed out that the principle of the unattainability of 
absolute zero, first enunciated by Nernst (1912) expresses a similar universal limitation: 

It is impossible by any procedure, no matter how idealized, to reduce the temperature of any system to the 
absolute zero of temperature in a finite number of operations. 

No one doubts the correctness of either of these statements of the third law and they are universally accepted 
as equivalent. However, there seems to have been no completely satisfactory proof of their equivalence; some 
additional, but very plausible, assumption appears necessary in making the connection. 

Consider how the change of a system from a thermodynamic state $\alpha$ to a thermodynamic state $\beta$ could
decrease the temperature. (The change in state $\alpha \to \beta$ could be a chemical reaction, a phase transition, or just
a change of volume, pressure, magnetic field, etc). Initially assume that $\alpha$ and $\beta$ are always in complete
internal equilibrium, i.e. neither has been cooled so rapidly that any disorder is frozen in. Then the Nernst heat
theorem requires that $S_\alpha(0) = S_\beta(0)$ and the plot of entropy versus temperature must look something like the
sketch in figure A2.1.10a.
Figure A2.1.10. The impossibility of reaching absolute zero. (a) Both states $\alpha$ and $\beta$ in complete internal
equilibrium. Reversible and irreversible paths (dashed) are shown. (b) State $\beta$ not in internal equilibrium and
with 'residual entropy'. The true equilibrium situation for $\beta$ is shown dotted.
The most effective cooling process will be an adiabatic reversible one ($\Delta S = 0$). In any non-adiabatic process,
heat will be absorbed from the surroundings (which are at a higher temperature), thus defeating the cooling
process. Moreover, according to the second law, for an irreversible adiabatic process $\Delta S > 0$; it is obvious
from figure A2.1.10a that the reversible process gets to the lower temperature. It is equally obvious that the
process must end with $\beta$ at a non-zero temperature.


But what if the thermodynamic state $\beta$ is not in 'complete internal equilibrium', but has some 'residual
entropy' frozen in? One might then imagine a diagram like figure A2.1.10b with a path for $\beta$ leading to a
positive entropy at 0 K. But $\beta$ is not the true internal equilibrium situation at low temperature; it was obtained
by freezing in what was equilibrium at a much higher temperature. In a process that generates $\beta$ at a much
lower temperature, one will not get this same frozen disorder; one will end on something more nearly like the
true internal equilibrium curve (shown dotted). This inconceivability of the low-temperature process yielding
the higher temperature's frozen disorder is the added assumption needed to prove the equivalence of the two
statements. (Most ordinary processes become increasingly unlikely at low temperatures; only processes with
essentially zero activation energy can occur and these are hardly the kinds of processes that could generate
'frozen' situations.)

The principle of the unattainability of absolute zero in no way limits one's ingenuity in trying to obtain lower
and lower thermodynamic temperatures. The third law, in its statistical interpretation, essentially asserts that
the ground quantum level of a system is ultimately non-degenerate, that some energy difference $\Delta\varepsilon$ must exist
between states, so that at equilibrium at 0 K the system is certainly in that non-degenerate ground state with
zero entropy. However, the $\Delta\varepsilon$ may be very small and temperatures of the order of $\Delta\varepsilon/k$ (where $k$ is the
Boltzmann constant, the gas constant per molecule) may be obtainable.

MAGNETIC COOLING 

A standard method of attaining very low temperatures is adiabatic demagnetization, a procedure suggested
independently by Debye and by Giauque in 1926. A paramagnetic solid is cooled to a low temperature (one
can reach about 1 K by the vaporization of liquid helium) and the solid is then magnetized isothermally in a
high magnetic field $B_0$. (Any heat developed is carried away by contact with dilute helium gas.) As shown in
figure A2.1.11, the entropy obviously decreases (compare equation (A2.1.43)).
Figure A2.1.11. Magnetic cooling: isothermal magnetization at 4 K followed by adiabatic demagnetization to
0.04 K. (Constructed for a hypothetical magnetic substance with two magnetic states with an energy
separation $\Delta\varepsilon = k(0.1\ \mathrm{K})$ at $B_0 = 0$ and at $B^*$ (the field at which the separation is $10\Delta\varepsilon$, or 7400 gauss); the
crystalline Stark effect has been ignored. The entropy above $S/Nk = 0.69 = \ln 2$ is due to the vibration of a
Debye crystal.)

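Now the system is thermally insulated and the magnetic field is decreased to zero; in this adiabatic, essentially
reversible (isentropic) process, the temperature necessarily decreases since

\[ (\partial T/\partial B_0)_S = -(\partial S/\partial B_0)_T \, / \, (\partial S/\partial T)_{B_0}. \]

($(\partial S/\partial B_0)_T$ is negative and $(\partial S/\partial T)_{B_0}$, a heat capacity divided by temperature, is surely positive, so
$(\partial T/\partial B_0)_S$ is positive.)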
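
The two steps of magnetic cooling can be sketched for the spin system alone. The example below is a simplified illustration under stated assumptions: it treats only two non-degenerate magnetic states, neglects the lattice (Debye) entropy entirely, and takes a starting temperature of 1 K; the level separations follow the hypothetical substance described for figure A2.1.11. Because the entropy of a two-level system depends only on the ratio of the separation to $kT$, an isentropic reduction of the separation lowers the temperature in the same proportion.

```python
import math


def two_level_entropy(sep_over_k, T):
    """Entropy per particle, S/Nk, for two non-degenerate states whose
    separation divided by k is sep_over_k (in kelvin)."""
    x = sep_over_k / T
    return math.log(1.0 + math.exp(-x)) + x / (math.exp(x) + 1.0)


SEP_ZERO_FIELD = 0.1   # separation / k at zero field, K
SEP_HIGH_FIELD = 1.0   # separation / k in the high field (ten times larger), K
T_START = 1.0          # assumed starting temperature, K

# Step 1: isothermal magnetization lowers the spin entropy.
s_before = two_level_entropy(SEP_ZERO_FIELD, T_START)
s_magnetized = two_level_entropy(SEP_HIGH_FIELD, T_START)

# Step 2: adiabatic (isentropic) demagnetization keeps separation/T fixed.
T_FINAL = T_START * SEP_ZERO_FIELD / SEP_HIGH_FIELD
s_after = two_level_entropy(SEP_ZERO_FIELD, T_FINAL)
```

In this idealized model the final temperature is one tenth of the starting temperature; a real calculation would add the lattice entropy, so the numbers are not meant to reproduce the figure.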
A2.1.7.4 THIRD STATEMENT (SIMPLE LIMITS AND STATISTICAL THERMODYNAMICS)

As we have seen, the third law of thermodynamics is closely tied to a statistical view of entropy. It is hard to 
discuss its implications from the exclusively macroscopic view of classical thermodynamics, but the problems 
become almost trivial when the molecular view of statistical thermodynamics is introduced. Guggenheim 
(1949) has noted that the usefulness of a molecular view is not unique to the situation of substances at low 
temperatures, that there are other limiting situations where molecular ideas are helpful in interpreting general 
experimental results: 

(1) Substances at high dilution, e.g. a gas at low pressure or a solute in dilute solution, show simple 
behaviour. The ideal-gas law and Henry's law for dilute solutions antedate the development of the 
formalism of classical thermodynamics. Earlier sections in this article have shown how these 
experimental laws lead to simple thermodynamic equations, but these results are added to 
thermodynamics; they are not part of the formalism. Simple molecular theories, even if they are not 
always recognized as statistical mechanics, e.g. the 'kinetic theory of gases', make the experimental 
results seem trivially obvious. 

(2) The entropy of mixing of very similar substances, i.e. the ideal solution law, can be derived from the 
simplest of statistical considerations. It too is a limiting law, of which the most nearly perfect example 
is the entropy of mixing of two isotopic species. 

With this in mind Guggenheim suggested still another statement of the 'third law of thermodynamics': 

By the standard methods of statistical thermodynamics it is possible to derive for certain entropy changes 
general formulas that cannot be derived from the zeroth, first, and second laws of classical thermodynamics. 
In particular one can obtain formulae for entropy changes in highly disperse systems, for those in very cold 
systems, and for those associated with the mixing of very similar substances. 

A2.1.8 THERMODYNAMICS AND STATISTICAL MECHANICS 

Any detailed discussion of statistical mechanics would be inappropriate for this section, especially since other 
sections (A2.2 and A2.3) treat this in detail. However, a few aspects that relate to classical thermodynamics
deserve brief mention. 

A2.1.8.1 ENSEMBLES AND THE CONSTRAINTS OF CLASSICAL THERMODYNAMICS 

It is customary in statistical mechanics to obtain the average properties of members of an ensemble, an
essentially infinite set of systems subject to the same constraints. Of course each of the systems contains the
same substance or group of substances, but in addition the constraints placed on a particular ensemble are
parallel to those encountered in classical thermodynamics.

The microcanonical ensemble is a set of systems each having the same number of molecules $N$, the same
volume $V$ and the same energy $U$. In such an ensemble of isolated systems, any allowed quantum state is
equally probable. In classical thermodynamics at equilibrium at constant $n$ (or equivalently, $N$), $V$, and $U$, it is
the entropy $S$ that is a maximum. For the microcanonical ensemble, the entropy is directly related to the
number of allowed quantum states $\Omega(N, V, U)$:

\[ S(N, V, U) = k \ln \Omega(N, V, U). \]

The canonical ensemble is a set of systems each having the same number of molecules $N$, the same volume $V$
and the same temperature $T$. This corresponds to putting the systems in a thermostatic bath or, since the
number of systems is essentially infinite, simply separating them by diathermic walls and letting them
equilibrate. In such an ensemble, the probability of finding the system in a particular quantum state $i$ is
proportional to $e^{-U_i/kT}$, where $U_i(N, V)$ is the energy of the $i$th quantum state and $k$, as before, is the Boltzmann
constant. In classical thermodynamics, the appropriate function for fixed $N$, $V$ and $T$ is the Helmholtz free
energy $A$, which is at a minimum at equilibrium, and in statistical mechanics it is $A$ that is directly related to
the canonical partition function $Q$ for the canonical ensemble.

\[ Q(N, V, T) = \sum_i e^{-U_i(N, V)/kT} \]

\[ A(N, V, T) = -kT \ln Q(N, V, T). \]

The grand canonical ensemble is a set of systems each with the same volume $V$, the same temperature $T$ and
the same chemical potential $\mu$ (or if there is more than one substance present, the same set of $\mu_i$s). This
corresponds to a set of systems separated by diathermic and permeable walls and allowed to equilibrate. In
classical thermodynamics, the appropriate function for fixed $\mu$, $V$, and $T$ is the product $pV$ (see equation
(A2.1.37)) and statistical mechanics relates $pV$ directly to the grand canonical partition function $\Xi$.

\[ \Xi(\mu, V, T) = \sum_N \lambda^N Q(N, V, T) = \sum_N \sum_i \lambda^N e^{-U_i(N, V)/kT} \]

\[ pV = kT \ln \Xi(\mu, V, T) \]

where $\lambda$ is the absolute activity of section A2.1.6.5.

Since other sets of constraints can be used, there are other ensembles and other partition functions, but these 
three are the most important. 
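
The canonical relations can be made concrete with a small numerical sketch. The three energy levels below are hypothetical values chosen for illustration, and the function name is an assumption; the code evaluates the partition function, the Helmholtz free energy $A = -kT \ln Q$, the average energy, and the entropy from the state probabilities, so that the thermodynamic identity $A = U - TS$ can be checked.

```python
import math

k = 1.380649e-23  # Boltzmann constant, J K^-1


def canonical(levels, T):
    """Return (Q, A, U, S) for a system with the given quantum-state energies (J)
    held at temperature T (K) in the canonical ensemble."""
    weights = [math.exp(-u / (k * T)) for u in levels]
    Q = sum(weights)                                  # canonical partition function
    probs = [w / Q for w in weights]                  # state probabilities
    A = -k * T * math.log(Q)                          # Helmholtz free energy
    U = sum(p * u for p, u in zip(probs, levels))     # ensemble-average energy
    S = -k * sum(p * math.log(p) for p in probs)      # entropy from probabilities
    return Q, A, U, S


# Hypothetical energy levels, expressed as multiples of k (in kelvin).
levels = [0.0, 100.0 * k, 250.0 * k]
Q, A, U, S = canonical(levels, T=150.0)
```

For any choice of levels and temperature the computed values satisfy $(U - A)/T = S$ to rounding error, which is the statistical counterpart of the thermodynamic definition of $A$.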

A2.1.8.2 FLUCTUATIONS; THE 'EXACTNESS' OF THERMODYNAMICS 

In defining the thermodynamic state of a system in terms of fixed macroscopic constraints, classical 
thermodynamics appears to assume the identical character of two states subject to the identical set of 
constraints. However, any consideration of the fact that such systems are composed of many molecules in 
constant motion suggests that this must be untrue. Surely, fixing the number of gas molecules N in volume V 
at temperature T does not guarantee that the molecules striking the wall create exactly the same pressure at all 
times and in all such systems. If the pressure $p$ is just an average, what can one say about the magnitude of
fluctuations about this average?


According to statistical mechanics, for the canonical ensemble one may calculate $\langle U \rangle$, the average energy of
all the members of the ensemble, while for the grand canonical ensemble one can calculate two averages, $\langle N \rangle$
and $\langle U \rangle$. Of crucial importance, however, is the probability of observing significant variations (fluctuations)
from these averages in any particular member of the ensemble. Fortunately, statistical mechanics yields an
answer to these questions.
Probability theory shows that the standard deviation $\sigma_x$ of a quantity $x$ can be written as

\[ \sigma_x^2 = \langle x^2 \rangle - \langle x \rangle^2 \]

and statistical mechanics can relate these averages to thermodynamic quantities. In particular, for the
canonical ensemble

\[ \sigma_U^2 = kT^2 C_V \]

while for the grand canonical ensemble

\[ \sigma_N^2 = kT N^2 \kappa_T / V, \]

where $\kappa_T$ is the isothermal compressibility.

All the quantities in these equations are intensive (independent of the size of the system) except $C_V$, $N$, and $V$,
which are extensive (proportional to the size of the system, i.e. to $N$). It follows that $\sigma_N^2$ is of the order of $N$, so
$\sigma_N/N$ is of the order of $N^{-1/2}$, as is $\sigma_U/U$. Since a macroscopic system described by thermodynamics probably
has at least about $10^{18}$ molecules, the uncertainty, i.e. the typical fluctuation, of a measured thermodynamic
quantity must be of the order of $10^{-9}$ times that quantity, orders of magnitude below the precision of any
current experimental measurement. Consequently we may describe thermodynamic laws and equations as
'exact'.

(An exception to this conclusion is found in the immediate vicinity of critical points, where fluctuations
become much more significant, although, with present experimental precision, still not of the order of $N$.)
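
The scaling of relative fluctuations with system size can be illustrated for one simple case. The sketch below assumes a monatomic ideal gas in the canonical ensemble, for which $C_V = \tfrac{3}{2}Nk$ and $U = \tfrac{3}{2}NkT$, and uses the canonical relation between the energy variance and $kT^2C_V$; the number of molecules and the temperature are assumed inputs.

```python
import math

k = 1.380649e-23  # Boltzmann constant, J K^-1


def relative_energy_fluctuation(N, T):
    """sigma_U / U for a monatomic ideal gas of N molecules at temperature T,
    using sigma_U^2 = k T^2 C_V with C_V = 3Nk/2 and U = 3NkT/2."""
    C_V = 1.5 * N * k
    U = 1.5 * N * k * T
    sigma_U = math.sqrt(k * T ** 2 * C_V)
    return sigma_U / U


small = relative_energy_fluctuation(1.0e6, 300.0)    # a microscopic sample
large = relative_energy_fluctuation(1.0e18, 300.0)   # assumed macroscopic sample
```

The result reduces to $(2/3N)^{1/2}$ and is independent of temperature, so increasing $N$ by twelve orders of magnitude reduces the relative fluctuation by six.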
A2.2 Statistical mechanics of weakly interacting
systems 


Rashmi C Desai 


A2.2.1 INTRODUCTION 

Thermodynamics is a phenomenological theory based upon a small number of fundamental laws, which are 
deduced from the generalization and idealization of experimental observations on macroscopic systems. The 
goal of statistical mechanics is to deduce the macroscopic laws of thermodynamics and other macroscopic 
theories (e.g. hydrodynamics and electromagnetism) starting from mechanics at a microscopic level and 
combining it with the rules of probability and statistics. As a branch of theoretical physics, statistical 
mechanics has extensive applications in physics, chemistry, biology, astronomy, materials science and 
engineering. Applications have been made to systems which are in thermodynamic equilibrium, to systems in 
steady state and also to non-equilibrium systems. Even though the scope of statistical mechanics is quite 
broad, this section is mostly limited to basics relevant to equilibrium systems. 

At its foundation level, statistical mechanics involves some profound and difficult questions which are not 
fully understood, even for systems in equilibrium. At the level of its applications, however, the rules of 
calculation that have been developed over more than a century have been very successful. 

The approach outlined here will describe a viewpoint which leads to the standard calculational rules used in 
various applications to systems in thermodynamic (thermal, mechanical and chemical) equilibrium. Some 
applications to ideal and weakly interacting systems will be made, to illustrate how one needs to think in 
applying statistical considerations to physical problems. 

Equilibrium is a macroscopic phenomenon which implies a description on a length and time scale much larger 
than those appropriate to the molecular motion. The concept of 'absolute equilibrium' is an idealization and 
refers to a state of an absolutely isolated system and an infinitely long observation time. In non-equilibrium 
systems with slowly varying macroscopic properties, it is often useful to consider 'local equilibrium' where 
the 'macroscopic' time and length scales are determined in the context of an observation, or an experiment, 
and the system. A typical value of an experimentally measured property corresponds to the average value over 
the observation time of the corresponding physical observable; the physical properties of a system in 
equilibrium are time invariant and should be independent of the observation time. The observation time of an 
equilibrium state is typically quite long compared to the time characteristic of molecular motions. 

Conservation laws at a microscopic level of molecular interactions play an important role. In particular, 
energy as a conserved variable plays a central role in statistical mechanics. Another important concept for 
equilibrium systems is the law of detailed balance. Molecular motion can be viewed as a sequence of 
collisions, each of which is akin to a reaction. Most often it is the momentum, energy and angular momentum 
of each of the constituents that is changed during a collision; if the molecular structure is altered, one has a 
chemical reaction. The law of detailed balance implies that, in equilibrium, the number of each reaction in the 
forward direction is the same as that in the reverse direction; i.e. each microscopic reaction is in equilibrium. 
This is a consequence of the time reversal symmetry of mechanics. 


A2.2.2 MECHANICS, MICROSTATES AND THE DEGENERACY 
FUNCTION 

Macroscopic systems contain a large number, $N$, of microscopic constituents. Typically $N$ is of the order of
$10^{20}$ to $10^{25}$. Thus many aspects of statistical mechanics involve techniques appropriate to systems with large
$N$. In this respect, even the non-interacting systems are instructive and lead to non-trivial calculations. The
degeneracy function that is considered in this subsection is an essential ingredient of the formal and general
methods of statistical mechanics. The degeneracy function is often referred to as the density of states.

We first consider three examples as a prelude to the general discussion of basic statistical mechanics. These
are: (i) $N$ non-interacting spin-½ particles in a magnetic field, (ii) $N$ non-interacting point particles in a box,
and (iii) $N$ non-interacting harmonic oscillators. For each example the results of quantum mechanics are used
to enumerate the microstates of the $N$-particle system and then obtain the degeneracy function (density of
states) of the system's energy levels. Even though these three examples are for ideal non-interacting systems,
there are many realistic systems which turn out to be well approximated by them.

A microstate (or a microscopic state) is one of the quantum states determined from
$\mathcal{H}\Phi_l = E_l\Phi_l$ ($l = 1, 2, \ldots$), where $\mathcal{H}$ is the Hamiltonian of the system, $E_l$ is the
energy of the quantum state $l$ and $\Phi_l$ is the wavefunction representing the quantum state $l$. The large-$N$
behaviour of the degeneracy function is of great relevance. The
calculation of the degeneracy function in these three examples is a useful precursor to the conceptual use of
the density of states of an arbitrary interacting system in the general framework of statistical mechanics.

(i) $N$ non-interacting spin-½ particles in a magnetic field. Each particle can be considered as an elementary
magnet (its magnetic moment has a magnitude equal to $\mu$) which can point along two possible directions in
space ($+z$ or 'up' and $-z$ or 'down'). A microstate of such a model system is specified by giving the orientation
(+ or −) of each magnet. For $N$ such independent magnets, there are $2^N$ different microstates for
this system. Note that this number grows exponentially with $N$. The total magnetic moment $\mathbf{M}$ of the model
system is the vector sum of the magnetic moments of its $N$ constituents. The component $M_z$ of $\mathbf{M}$ along the $z$
direction varies between $-N\mu$ and $+N\mu$, and can take any of the $(N + 1)$ possible values
$N\mu, (N-2)\mu, (N-4)\mu, \ldots, -N\mu$. This number of possible values of $M_z$ is much less than the total number of
microstates $2^N$, for large $N$. The number of microstates for a given $M_z$ is the degeneracy function. In each of
these microstates, there will be $\tfrac{1}{2}N + m$ spins up and $\tfrac{1}{2}N - m$ spins down, such that the difference
between the two is $2m$, which is called the spin excess and equals $M_z/\mu$. If $x$ is the probability for a particle
to have its spin up and $y = (1 - x)$ is the probability for its spin down, the degeneracy function $g(N, m)$ can be
obtained by inspection from the binomial expansion

$$(x + y)^N = \sum_{m=-N/2}^{N/2} \frac{N!}{(\tfrac{1}{2}N + m)!\,(\tfrac{1}{2}N - m)!}\; x^{\frac{1}{2}N + m}\, y^{\frac{1}{2}N - m}.$$

That is

$$g(N, m) = \frac{N!}{(\tfrac{1}{2}N + m)!\,(\tfrac{1}{2}N - m)!}. \qquad \text{(A2.2.1)}$$

By setting $x = y = \tfrac{1}{2}$, so that $(x + y)^N = 1$ and every term carries the factor $2^{-N}$, one can see that
$\sum_m g(N, m) = 2^N$. For typical macroscopic systems, the number $N$ of constituent molecules is very large:
$N \sim 10^{22}$. For large $N$, $g(N, m)$ is a very sharply peaked function of $m$. In
order to see this one needs to use the Stirling approximation for $N!$, which is valid when $N \gg 1$:

$$N! \approx (2\pi N)^{1/2} N^N \exp\left[-N + \frac{1}{12N} + \cdots\right]. \qquad \text{(A2.2.2)}$$

For sufficiently large $N$, the terms $1/(12N) + \cdots$ can be neglected in comparison with $N$ and one obtains

$$\log N! \approx \tfrac{1}{2}\log(2\pi) + \left(N + \tfrac{1}{2}\right)\log N - N \qquad \text{(A2.2.3)}$$

$$\approx N \log N - N \qquad \text{(A2.2.4)}$$

since both $\tfrac{1}{2}\log(2\pi)$ and $\tfrac{1}{2}$ are negligible compared to $N$. Using (A2.2.3), for $\log g(N, m)$ with
$|m| \ll N$ one obtains

$$\log g(N, m) \approx \tfrac{1}{2}\log\left(\frac{2}{\pi N}\right) + N \log 2 - \frac{2m^2}{N},$$

which reduces to a Gaussian distribution for $g(N, m)$:

$$g(N, m) \approx g(N, 0)\exp\left(-2m^2/N\right) \qquad \text{(A2.2.5)}$$

with

$$g(N, 0) = \frac{N!}{(\tfrac{1}{2}N)!\,(\tfrac{1}{2}N)!} \approx \left(\frac{2}{\pi N}\right)^{1/2} 2^N. \qquad \text{(A2.2.6)}$$

When $m^2 = N/2$, the value of $g$ is decreased by a factor of $e$ from its maximum at $m = 0$. Thus the fractional
width of the distribution is $\Delta(m/N) \sim (1/N)^{1/2}$. For $N \sim 10^{22}$ the fractional width is of the order of $10^{-11}$. It is
the sharply peaked behaviour of the degeneracy functions that leads to the prediction that the thermodynamic
properties of macroscopic systems are well defined.
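
The quality of the Gaussian form can be checked numerically. The following is a minimal sketch, not part of the original text: it compares the exact binomial degeneracy $N!/[(\tfrac{1}{2}N+m)!\,(\tfrac{1}{2}N-m)!]$ with the approximation $(2/\pi N)^{1/2}\,2^N \exp(-2m^2/N)$ for an illustrative (and assumed) value $N = 1000$. The function names `g_exact` and `g_gauss` are hypothetical.

```python
import math

def g_exact(N, m):
    """Exact degeneracy N! / [(N/2 + m)! (N/2 - m)!] for even N and integer m."""
    return math.comb(N, N // 2 + m)

def g_gauss(N, m):
    """Gaussian approximation g(N,0) exp(-2 m^2 / N), with the Stirling form of g(N,0)."""
    return math.sqrt(2.0 / (math.pi * N)) * 2.0**N * math.exp(-2.0 * m * m / N)

N = 1000  # illustrative size; small enough that 2**N still fits in a float

# Summing the degeneracy over all spin excesses recovers the 2^N microstates.
total = sum(g_exact(N, m) for m in range(-N // 2, N // 2 + 1))

# Ratio exact / Gaussian near the peak; values close to 1 indicate a good approximation.
ratios = {m: g_exact(N, m) / g_gauss(N, m) for m in (0, 10, 20)}
print(total == 2**N, ratios)
```

For $|m| \ll N$ the ratios stay close to one; the approximation is not expected to hold in the far tails, where $|m|$ becomes comparable to $N$.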

For this model system the magnetic potential energy in the presence of a uniform magnetic field $\mathbf{H}$ is given by
$-\mathbf{M}\cdot\mathbf{H}$ and, for $\mathbf{H}$ pointing in the $+z$ direction, it is $-M_z H$ or $-2m\mu H$. A fixed magnetic potential
energy thus implies a fixed value of $m$. For a given $H$, the magnetic potential energy of the system is bounded from above
and below. This is not the case for the next two examples. Besides the example of an ideal paramagnet that this model explicitly
represents, there are many other systems which can be modelled as effective two-state systems. These include
the lattice gas model of an ideal gas, a binary alloy and a simple two-state model of a linear polymer. The time
dependence of the mean square displacement of a Brownian particle can also be analysed using such a model.

(ii) $N$ non-interacting point particles in a box. The microstate (orbital) of a free particle of mass $M$ confined in
a cube of volume $L^3$ is specified by three integers (quantum numbers): $(n_x, n_y, n_z) \equiv \mathbf{n}$; $n_x, n_y, n_z = 1, 2, 3, \ldots$.
Its wavefunction is

$$\phi_{\mathbf{n}} = (2/L)^{3/2} \sin(x p_x/\hbar)\,\sin(y p_y/\hbar)\,\sin(z p_z/\hbar)$$

with $\mathbf{p} = (\hbar\pi/L)\mathbf{n}$, and the energy $\epsilon = p^2/(2M) = (\hbar\pi/L)^2 n^2/(2M)$. The energy grows quadratically with $n$, and
without bounds. One can enumerate the orbitals by considering the positive octant of a sphere in the space
defined by $n_x$, $n_y$ and $n_z$ for free particle orbitals. With every unit volume $\Delta n_x \Delta n_y \Delta n_z = 1$, there is one orbital
per spin orientation of the particle. For particles of spin $I$, there are $\gamma = (2I + 1)$ independent spin orientations.
The energy of an orbital on the surface of a sphere of radius $n_0$ in the $\mathbf{n}$ space is $\epsilon_0 = (\hbar\pi/L)^2 n_0^2/(2M)$. The
degeneracy function, or equivalently, the number of orbitals in the allowed (positive) octant of a spherical
shell of thickness $\Delta n$ is $\gamma\,\tfrac{1}{8}\,4\pi n_0^2\,\Delta n = \tfrac{1}{2}\gamma\pi n_0^2\,\Delta n$. This is an approximate result valid asymptotically for large
$n_0$. Often one needs the number of orbitals with energy between $\epsilon$ and $\epsilon + d\epsilon$. If it is denoted by $D(\epsilon)\,d\epsilon$, it is
easy to show by using

$$D(\epsilon)\,d\epsilon = \tfrac{1}{2}\gamma\pi n^2\,\frac{dn}{d\epsilon}\,d\epsilon \qquad \text{(A2.2.7)}$$

that

$$D(\epsilon) = \frac{\gamma V}{4\pi^2}\left(\frac{2M}{\hbar^2}\right)^{3/2}\epsilon^{1/2}. \qquad \text{(A2.2.8)}$$

This is the density of microstates for one free particle in volume $V = L^3$.
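
The octant-counting argument can be tested by brute force. The sketch below is illustrative only and rests on stated assumptions: energies are measured in units of $(\hbar\pi/L)^2/(2M)$, so that $\epsilon = n_x^2 + n_y^2 + n_z^2$, and $\gamma = 1$. In these units the density of states $\tfrac{1}{4}\pi\epsilon^{1/2}$ integrates to $\pi\epsilon^{3/2}/6$ orbitals below energy $\epsilon$. The function name `count_orbitals` is hypothetical.

```python
import math

def count_orbitals(eps):
    """Count orbitals (n_x, n_y, n_z >= 1) with n_x^2 + n_y^2 + n_z^2 <= eps (eps an integer)."""
    nmax = math.isqrt(eps)
    count = 0
    for nx in range(1, nmax + 1):
        for ny in range(1, nmax + 1):
            rest = eps - nx * nx - ny * ny
            if rest >= 1:
                # Number of allowed n_z values: 1, 2, ..., floor(sqrt(rest)).
                count += math.isqrt(rest)
    return count

eps = 2500                                   # corresponds to a sphere of radius n_0 = 50
counted = count_orbitals(eps)
predicted = math.pi * eps**1.5 / 6.0         # integral of the density of states up to eps
ratio = counted / predicted
print(counted, round(predicted), round(ratio, 3))
```

The exact count falls slightly below the continuum estimate because orbitals with a vanishing quantum number are excluded; this surface correction becomes relatively smaller as $n_0$ grows, in line with the statement that the result is asymptotic for large $n_0$.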

For $N$ non-interacting particles in a box, the result depends on the particle statistics: Fermi, Bose or
Boltzmann. The state of a quantum system can be specified by the wavefunction for that state,
$\psi_\nu(q_1, q_2, \ldots, q_N)$. $\psi_\nu$ is the $\nu$th eigensolution to the Schrödinger equation for an $N$-particle system. If the
particles are non-interacting, then the wavefunction can be expressed in terms of the single-particle wavefunctions
($\phi_{\mathbf{n}}$ given above is an example). Let these be denoted by $\phi_1(q), \phi_2(q), \ldots, \phi_j(q), \ldots$. For a specific state $\nu$,
$\psi_\nu(q_1, q_2, \ldots, q_N)$ will be the appropriately symmetrized product containing $n_1$ particles with the single-particle
wavefunction $\phi_1(q)$, $n_2$ particles with $\phi_2(q)$, etc. For Fermi particles (with half integral spin) the product is
antisymmetric and for Bose particles (with integer spin) it is symmetric. The antisymmetry of the Fermi
particle wavefunction implies that fermions obey the Pauli exclusion principle. The numbers $n_1, n_2, \ldots, n_j, \ldots$
are the occupation numbers of the respective single-particle states, and this set of occupation numbers $\{n_j\}$
completely specifies the state $\nu$ of the system. If there are $N_\nu$ particles in this state then
$N_\nu = \sum_j n_j$, and if the $j$th single-particle state has energy $\epsilon_j$, then the energy of the system in the state $\nu$ is
$E_\nu = \sum_j \epsilon_j n_j$.

In an ideal molecular gas, each molecule typically has translational, rotational and vibrational degrees of 
freedom. The example of 'one free particle in a box' is appropriate for the translational motion. The next 
example of oscillators can be used for the vibrational motion of molecules. 

(iii) $N$ non-interacting harmonic oscillators. Energy levels of a harmonic oscillator are non-degenerate,
characterized by a quantum number $l$ and are given by the expression $\epsilon_l = (l + \tfrac{1}{2})\hbar\omega$, $l = 0, 1, 2, 3, \ldots, \infty$. For a
system of $N$ independent oscillators, if the $i$th oscillator is in the state $n_i$, the set $\{n_i\}$ gives the microstate of
the system, and its total energy $E$ is given by $E = \tfrac{1}{2}N\hbar\omega + \sum_{i=1}^{N} n_i\hbar\omega$. Consider the case when the energy of the
system, above its zero point energy $\tfrac{1}{2}N\hbar\omega$, is a fixed amount $E_0 \equiv (E - \tfrac{1}{2}N\hbar\omega)$. Define $n$ such that $E_0 = n\hbar\omega$.
Then, one needs to find the number of ways in which a set of $\{n_i\}$ can be chosen such that $\sum_{i=1}^{N} n_i = n$. This
number is the degeneracy function $g(N, n)$ for this system. For a single-oscillator case, $n_1 = n$ and $g(1, n) = 1$,
for all $n$. Consider a sum $\sum_{n=0}^{\infty} g(1, n)\,t^n$; it is called a generating function. Since $g(1, n) = 1$, it sums to $(1 - t)^{-1}$,
if $|t| < 1$. For $N$ independent oscillators, one would therefore use $(1 - t)^{-N}$, rewriting it as

$$(1 - t)^{-N} = \left(\sum_{s=0}^{\infty} t^s\right)^{N} = \sum_{n=0}^{\infty} g(N, n)\,t^n.$$

Now since in general,

$$g(N, n) = \frac{1}{n!}\lim_{t \to 0}\left(\frac{d}{dt}\right)^{n}\sum_{s=0}^{\infty} g(N, s)\,t^s,$$

its use for the specific example of $N$ oscillators gives

$$g(N, n) = \frac{1}{n!}\lim_{t \to 0}\left(\frac{d}{dt}\right)^{n}(1 - t)^{-N} = \frac{1}{n!}\,N(N + 1)(N + 2)\cdots(N + n - 1)$$

with the final result that the degeneracy function for the $N$-oscillator system is

$$g(N, n) = \frac{(N + n - 1)!}{n!\,(N - 1)!}. \qquad \text{(A2.2.9)}$$

The model of non-interacting harmonic oscillators has a broad range of applicability. Besides vibrational
motion of molecules, it is appropriate for phonons in harmonic crystals and photons in a cavity (black-body
radiation).
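
The generating-function argument lends itself to a direct check. The following is a small sketch with hypothetical function names: it builds the coefficients of $(1 + t + t^2 + \cdots)^N$ by repeated multiplication and compares them with the closed form $(N + n - 1)!/[n!\,(N - 1)!]$.

```python
import math

def g_generating(N, nmax):
    """Coefficients of t^0 .. t^nmax in (1 + t + t^2 + ...)^N."""
    coeffs = [1] + [0] * nmax            # start from the polynomial "1"
    for _ in range(N):
        # Multiplying a series by 1/(1 - t) replaces its coefficients by running sums.
        running = 0
        new = []
        for c in coeffs:
            running += c
            new.append(running)
        coeffs = new
    return coeffs

def g_closed(N, n):
    """Closed form (N + n - 1)! / [n! (N - 1)!]."""
    return math.comb(N + n - 1, n)

N, nmax = 5, 8
print(g_generating(N, nmax))
print([g_closed(N, n) for n in range(nmax + 1)])
```

As a hand check, two quanta shared among three oscillators can be arranged in six ways (three with both quanta on one oscillator, three with the quanta on different oscillators), which agrees with `g_closed(3, 2)`.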

A2.2.2.1 CLASSICAL MECHANICS 

The set of microstates of a finite system in quantum statistical mechanics is a finite, discrete denumerable set
of quantum states each characterized by an appropriate collection of quantum numbers. In classical statistical
mechanics, the set of microstates form a continuous (and therefore infinite) set of points in $\Gamma$ space (also
called phase space).

Following Gibbs, the $\Gamma$ space is defined as a $2f$-dimensional space for a system with $f$ degrees of freedom:
$(p_1, p_2, \ldots, p_f; q_1, q_2, \ldots, q_f)$, abbreviated as $(p, q)$. Here $(p_i, q_i)$, $i = 1, \ldots, f$ are the canonical momenta and
canonical coordinates of the $f$ degrees of freedom of the system. Given a precise initial state $(p^0, q^0)$, a system
with the Hamiltonian $\mathcal{H}(p, q)$ evolves deterministically according to the canonical equations of motion:

$$\frac{\partial\mathcal{H}}{\partial p_i} = \dot{q}_i; \qquad \frac{\partial\mathcal{H}}{\partial q_i} = -\dot{p}_i. \qquad \text{(A2.2.10)}$$

Now, if $D/Dt$ represents time differentiation along the deterministic trajectory of the system in the $\Gamma$ space, it
follows that

$$\frac{D\mathcal{H}}{Dt} = \frac{\partial\mathcal{H}}{\partial t} + \sum_{i=1}^{f}\left(\frac{\partial\mathcal{H}}{\partial p_i}\dot{p}_i + \frac{\partial\mathcal{H}}{\partial q_i}\dot{q}_i\right) = \frac{\partial\mathcal{H}}{\partial t} \qquad \text{(A2.2.11)}$$

where the last equality is obtained using the equations of motion. Thus, when $\mathcal{H}$ does not depend on time
explicitly, i.e. when $\partial\mathcal{H}/\partial t = 0$, the above equation implies that $\mathcal{H}(p, q) = E = \text{constant}$. The locus of points in
$\Gamma$ space satisfying this condition defines a $(2f - 1)$-dimensional energy hypersurface $S$, and the trajectory of
such a system in $\Gamma$ space would lie on this hypersurface. Furthermore, since a given trajectory is uniquely
determined by the equations of motion and the initial conditions, two trajectories in $\Gamma$ space can never
intersect.


A2.2.2.2 LIOUVILLE'S THEOREM 

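The volume of a $\Gamma$-space volume element does not change in the course of time if each of its points traces out
a trajectory in $\Gamma$ space determined by the equations of motion. Equivalently, the Jacobian

$$J(t, t_0) \equiv \frac{\partial(p, q)}{\partial(p^0, q^0)} = 1. \qquad \text{(A2.2.12)}$$

Liouville's theorem is a restatement of mechanics. The proof of the theorem consists of two steps.
(1) Expand $J(t, t_0)$ around $t_0$ to obtain:

$$J(t, t_0) = 1 + (t - t_0)\sum_{i=1}^{f}\left(\frac{\partial\dot{p}_i}{\partial p_i} + \frac{\partial\dot{q}_i}{\partial q_i}\right) + O\left((t - t_0)^2\right),$$

where, by the equations of motion (A2.2.10), the sum is
$\sum_i\left(-\partial^2\mathcal{H}/\partial p_i\partial q_i + \partial^2\mathcal{H}/\partial q_i\partial p_i\right) = 0$. Hence,

$$\left.\frac{\partial J(t, t_0)}{\partial t}\right|_{t = t_0} = 0.$$

(2) From the multiplication rule of Jacobians, $J(t, t_0) = J(t, t_1)\,J(t_1, t_0)$, one has for any $t_1$ between $t_0$ and $t$,

$$\frac{\partial J(t, t_0)}{\partial t} = \frac{\partial J(t, t_1)}{\partial t}\,J(t_1, t_0).$$

Now let $t_1$ approach $t$. Then, by the result of the first step, the first factor on the right-hand side vanishes.
Hence,

$$\frac{\partial J(t, t_0)}{\partial t} = 0 \quad\text{and}\quad J(t, t_0) = \text{constant}.$$

Finally, since $J(t_0, t_0) = 1$, one obtains the result $J(t, t_0) = 1$, which concludes the proof of Liouville's theorem.

Geometrically, Liouville's theorem means that if one follows the motion of a small phase volume in $\Gamma$ space,
it may change its shape but its volume is invariant. In other words the motion of this volume in $\Gamma$ space is like
that of an incompressible fluid. Liouville's theorem, being a restatement of mechanics, is an important
ingredient in the formulation of the theory of statistical ensembles, which is considered next.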
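
Liouville's theorem can be illustrated numerically for a system with one degree of freedom. The sketch below is an assumption-laden example rather than anything taken from the text: it uses a pendulum with $\mathcal{H} = p^2/2 - \cos q$ (unit mass and length), integrates the canonical equations with a fourth-order Runge-Kutta scheme, and estimates the Jacobian $\partial(p, q)/\partial(p^0, q^0)$ by central finite differences. The function names, step sizes and initial point are all illustrative choices.

```python
import math

def flow(p0, q0, t, dt=1e-3):
    """Integrate dq/dt = p, dp/dt = -sin(q) from (p0, q0) over a time t with RK4."""
    def rhs(p, q):
        return -math.sin(q), p               # (dp/dt, dq/dt) from Hamilton's equations
    p, q = p0, q0
    for _ in range(round(t / dt)):
        k1p, k1q = rhs(p, q)
        k2p, k2q = rhs(p + 0.5 * dt * k1p, q + 0.5 * dt * k1q)
        k3p, k3q = rhs(p + 0.5 * dt * k2p, q + 0.5 * dt * k2q)
        k4p, k4q = rhs(p + dt * k3p, q + dt * k3q)
        p += dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
        q += dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    return p, q

def jacobian_det(p0, q0, t, h=1e-5):
    """Finite-difference estimate of det d(p, q)/d(p0, q0) for the flow over time t."""
    pa, qa = flow(p0 + h, q0, t)
    pb, qb = flow(p0 - h, q0, t)
    pc, qc = flow(p0, q0 + h, t)
    pd, qd = flow(p0, q0 - h, t)
    dp_dp0, dq_dp0 = (pa - pb) / (2 * h), (qa - qb) / (2 * h)
    dp_dq0, dq_dq0 = (pc - pd) / (2 * h), (qc - qd) / (2 * h)
    return dp_dp0 * dq_dq0 - dp_dq0 * dq_dp0

J = jacobian_det(0.5, 1.0, t=5.0)
print(J)   # expected to be very close to 1
```

The individual matrix elements change appreciably with time (the phase volume is sheared), while their determinant stays at unity within the numerical error of the integrator and of the finite differences.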

A2.2.3 STATISTICAL ENSEMBLES 

In equilibrium statistical mechanics, one is concerned with the thermodynamic and other macroscopic 
properties of matter. The aim is to derive these properties from the laws of molecular dynamics and thus 
create a link between microscopic molecular motion and thermodynamic behaviour. A typical macroscopic 
system is composed of a large number $N$ of molecules occupying a volume $V$ which is large compared to that
occupied by a molecule:

$$N \approx 10^{21}\ \text{molecules} \qquad V \approx 10^{23}\ \text{molecular volumes}.$$

Due to such large numbers, it is useful to consider the limiting case of the thermodynamic limit, which is
defined as

$$N \to \infty, \qquad V \to \infty, \qquad \frac{V}{N} = \upsilon \qquad \text{(A2.2.13)}$$

where the specific volume $\upsilon$ is a given finite number. For a three-dimensional system with $N$ point particles,
the total number of degrees of freedom $f = 3N$.

A statistical ensemble can be viewed as a description of how an experiment is repeated. In order to describe a 
macroscopic system in equilibrium, its thermodynamic state needs to be specified first. From this, one can 
infer the macroscopic constraints on the system, i.e. which macroscopic (thermodynamic) quantities are held 
fixed. One can also deduce, from this, what are the corresponding microscopic variables which will be 
constants of motion. A macroscopic system held in a specific thermodynamic equilibrium state is typically 
consistent with a very large number (classically infinite) of microstates. Each of the repeated experimental 
measurements on such a system, under ideal 
conditions and with identical macroscopic constraints, would correspond to the system in a different
accessible microstate which satisfies the macroscopic constraints. It is natural to represent such a collection of 
microstates by an ensemble (a mental construct) of systems, which are identical in composition and 
macroscopic conditions (constraints), but each corresponding to a different microstate. For a properly 
constructed ensemble, each of its member systems satisfies the macroscopic constraints appropriate to the 
experimental conditions. Collectively the ensemble then consists of all the microstates that satisfy the 
macroscopic constraints (all accessible states). The simplest assumption that one can make in order to 
represent the repeated set of experimental measurements by the ensemble of accessible microstates is to give 
each an equal weight. The fundamental assumption in the ensemble theory is then the postulate of 'equal
a priori probabilities'. It states that 'when a macroscopic system is in thermodynamic equilibrium, its
microstate is equally likely to be any of the accessible states, each of which satisfies the macroscopic
constraints on the system'.

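Such an ensemble of systems can be geometrically represented by a distribution of representative points in the
$\Gamma$ space (classically a continuous distribution). It is described by an ensemble density function $\rho(p, q, t)$ such
that $\rho(p, q, t)\,d^{2f}\Omega$ is the number of representative points which at time $t$ are within the infinitesimal phase
volume element $d^f p\,d^f q$ (denoted by $d^{2f}\Omega$) around the point $(p, q)$ in the $\Gamma$ space.

Let us consider the consequence of mechanics for the ensemble density. As in subsection A2.2.2.1, let $D/Dt$
represent differentiation along the trajectory in $\Gamma$ space. By definition, the number of representative points in a
volume element that moves with the trajectories is conserved:

$$\frac{D}{Dt}\left(\rho\,d^{2f}\Omega\right) = 0.$$

According to Liouville's theorem,

$$\frac{D}{Dt}\left(d^{2f}\Omega\right) = 0.$$

Therefore,

$$\frac{D\rho}{Dt} = 0$$

or, equivalently,

$$\frac{\partial\rho}{\partial t} + \sum_{i=1}^{f}\left(\frac{\partial\rho}{\partial p_i}\dot{p}_i + \frac{\partial\rho}{\partial q_i}\dot{q}_i\right) = 0,$$

which can be rewritten in terms of Poisson brackets using the equations of motion, (A2.2.10):

$$\frac{\partial\rho}{\partial t} + \sum_{i=1}^{f}\left(\frac{\partial\mathcal{H}}{\partial p_i}\frac{\partial\rho}{\partial q_i} - \frac{\partial\mathcal{H}}{\partial q_i}\frac{\partial\rho}{\partial p_i}\right) = 0.$$

This is the same as

$$\frac{\partial\rho}{\partial t} + [\mathcal{H}, \rho]_{\mathrm{PB}} = 0, \qquad \text{(A2.2.14)}$$

where the Poisson bracket $[\mathcal{H}, \rho]_{\mathrm{PB}}$ denotes the sum in the preceding equation.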

For the quantum mechanical case, $\rho$ and $\mathcal{H}$ are operators (or matrices in appropriate representation) and the
Poisson bracket is replaced by the commutator $[\mathcal{H}, \rho]_-$. If the distribution is stationary, as for the systems in
equilibrium, then $\partial\rho/\partial t = 0$, which implies

$$[\mathcal{H}, \rho]_{\mathrm{PB}} = 0 \ \text{classically and}\ [\mathcal{H}, \rho]_- = 0 \ \text{quantum mechanically}. \qquad \text{(A2.2.15)}$$

A stationary ensemble density distribution is constrained to be a functional of the constants of motion
(globally conserved quantities). In particular, a simple choice is $\rho(p, q) = \rho(\mathcal{H}(p, q))$, where $\rho(\mathcal{H})$ is some
functional (function of a function) of $\mathcal{H}$. Any such functional has a vanishing Poisson bracket (or a
commutator) with $\mathcal{H}$ and is thus a stationary distribution. Its dependence on $(p, q)$ through $\mathcal{H}(p, q) = E$ is
expected to be reasonably smooth. Quantum mechanically, $\rho(\mathcal{H})$ is the density operator which has some
functional dependence on the Hamiltonian $\mathcal{H}$ depending on the ensemble. It is also normalized: $\mathrm{Tr}\,\rho = 1$. The
density matrix is the matrix representation of the density operator in some chosen representation of a complete
orthonormal set of states. If the complete orthonormal set of eigenstates of the Hamiltonian is known:

$$\mathcal{H}|\nu\rangle = E_\nu|\nu\rangle, \qquad \langle\nu|\nu'\rangle = \delta_{\nu\nu'},$$

then the density operator is

$$\rho = \sum_\nu |\nu\rangle\,\rho(E_\nu)\,\langle\nu|.$$

Often the eigenstates of the Hamiltonian are not known. Then one uses an appropriate set of states $|u\rangle$ which
are complete and orthonormal. In any such representation the density matrix, given as $\langle u|\rho(\mathcal{H})|u'\rangle$, is not
diagonal.

A2.2.3.1 MICROCANONICAL ENSEMBLE 

An explicit example of an equilibrium ensemble is the microcanonical ensemble, which describes closed
systems with adiabatic walls. Such systems have constraints of fixed $N$, $V$ and $E < \mathcal{H} < E + dE$. $dE$ is very
small compared to $E$, and corresponds to the assumed very weak interaction of the 'isolated' system with the
surroundings. $dE$ has to be chosen such that it is larger than $\delta E \sim h/t_{\mathrm{ob}}$, where $h$ is Planck's constant
and $t_{\mathrm{ob}}$ is the duration of the observation time. In such a case, even though $dE$ may be small, there will be a
great number of microstates for a macroscopic size system. For a microcanonical ensemble, the 'equal a
priori probability' postulate gives its density distribution as:

classically,

$$\rho(p, q) = \begin{cases} \text{constant} & \text{if } E < \mathcal{H}(p, q) < E + dE \\ 0 & \text{otherwise} \end{cases} \qquad \text{(A2.2.16)}$$

quantum mechanically, if the system microstate is denoted by $l$, then

$$\rho_l = \begin{cases} \text{constant} & \text{if } E < E_l < E + dE \\ 0 & \text{otherwise.} \end{cases} \qquad \text{(A2.2.17)}$$

One considers systems for which the energy shell is a closed (or at least finite) hypersurface $S$. Then the
energy shell has a finite volume:

$$\int_{E < \mathcal{H} < E + dE} d^{2f}\Omega.$$

For each degree of freedom, classical states within a small volume $\Delta p_i\,\Delta q_i \sim h$ merge into a single quantum
state which cannot be further distinguished on account of the uncertainty principle. For a system with $f$
degrees of freedom, this volume is $h^f$. Furthermore, due to the indistinguishability of identical particles in
quantum mechanics, there are $N!$ distinguishable classical states for each quantum mechanical state, which are
obtained by simple permutations of the $N$ particles in the system. Then the number of microstates $\Gamma(E)$ in $\Gamma$
space occupied by the microcanonical ensemble is given by

$$\Gamma(E) = \left[h^f N!\right]^{-1}\int_{E < \mathcal{H} < E + dE} d^{2f}\Omega = \mathcal{D}(E)\,dE \qquad \text{(A2.2.18)}$$

where $f$ is the total number of degrees of freedom for the $N$-particle system, and $\mathcal{D}(E)$ is the density of states
of the system at energy $E$. If the system of $N$ particles is made up of $N_A$ particles of type A, $N_B$ particles of
type B, $\ldots$, then $N!$ is replaced by $N_A!\,N_B!\cdots$. Even though $dE$ is conceptually essential, it does not affect
the thermodynamic properties of macroscopic systems. In order that the ensemble density $\rho$ is normalized, the
'constant' above in (A2.2.16) has to be $[\Gamma(E)]^{-1}$. Quantum mechanically $\Gamma(E)$ is simply the total number of
microstates within the energy interval $E < E_l < E + dE$, and fixes the 'constant' in (A2.2.17). $\Gamma(E)$ is the
microcanonical partition function; in addition to its indicated dependence on $E$, it also depends on $N$ and $V$.

Consider a measurable property $B(p, q)$ of the system, such as its energy or momentum. When a system is in
equilibrium, according to Boltzmann, what is observed macroscopically are the time averages of the form

$$\bar{B} = \lim_{T \to \infty}\frac{1}{T}\int_0^T B(p_t, q_t)\,dt. \qquad \text{(A2.2.19)}$$

It was assumed that, apart from a vanishingly small number of exceptions, the initial conditions do not have
an effect on these averages. However, since the limiting value of the time averages cannot be computed, an
ergodic hypothesis
was introduced: time averages are identical with statistical averages over a microcanonical ensemble, for
reasonable functions $B$, except for a number of initial conditions, whose importance is vanishingly small
compared with that of all other initial conditions. Here the ensemble average is defined as

$$\langle B\rangle = \frac{\int d^{2f}\Omega\;B(p, q)\,\rho(p, q)}{\int d^{2f}\Omega\;\rho(p, q)}. \qquad \text{(A2.2.20)}$$

The thinking behind this was that, over a long time period, a system trajectory in $\Gamma$ space passes through every
configuration in the region of motion (here the energy shell), i.e. the system is ergodic, and hence the infinite
time average is equal to the average in the region of motion, or the average over the microcanonical ensemble
density. The ergodic hypothesis is meant to provide justification of the 'equal a priori probability' postulate.

It is a strong condition. For a system of $N$ particles, the infinitely long time must be much longer than $O(e^N)$,
whereas the usual observation time window is $O(1)$. (When one writes $y = O(x)$ and $z = o(x)$, it implies that
$\lim_{x \to \infty} y/x = \text{finite} \neq 0$ and $\lim_{x \to \infty} z/x = 0$.) However, if by 'reasonable functions $B$' one means large
variables $O(N)$, then their values are nearly the same everywhere in the region of motion and the trajectory
need not be truly ergodic for the time average to be equal to the ensemble average. Ergodicity of a trajectory
is a difficult mathematical problem in mechanics.

The microcanonical ensemble is a certain model for the repetition of experiments: in every repetition, the
system has 'exactly' the same energy, $N$ and $V$; but otherwise there is no experimental control over its
microstate. Because the microcanonical ensemble distribution depends only on the total energy, which is a
constant of motion, it is time independent and mean values calculated with it are also time independent. This
is as it should be for an equilibrium system. Besides the ensemble average value $\langle B\rangle$, another commonly used
'average' is the most probable value, which is the value of $B(p, q)$ that is possessed by the largest number of
systems in the ensemble. The ensemble average and the most probable value are nearly equal if the mean
square fluctuation is small, i.e. if

$$\frac{\langle B^2\rangle - \langle B\rangle^2}{\langle B\rangle^2} \ll 1. \qquad \text{(A2.2.21)}$$

If this condition is not satisfied, there is no unique way of calculating the observed value of $B$, and the validity
of the statistical mechanics should be questioned. In all physical examples, the mean square fluctuations are of
the order of $1/N$ and vanish in the thermodynamic limit.
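
The $1/N$ scaling of the relative mean square fluctuation can be made concrete with the two-state spin model of section A2.2.2. The sketch below is a simplified illustration under an explicit assumption: all $2^N$ spin microstates are given equal weight (as in zero field), and the observable $B$ is taken to be the number of up spins. Exact rational arithmetic is used so that no rounding enters; the function name is hypothetical.

```python
from fractions import Fraction
import math

def relative_fluctuation(N):
    """(<B^2> - <B>^2) / <B>^2 for B = number of up spins, all 2^N microstates equally weighted."""
    weights = [math.comb(N, k) for k in range(N + 1)]    # microstates with k spins up
    total = sum(weights)                                  # equals 2^N
    mean = Fraction(sum(k * w for k, w in enumerate(weights)), total)
    mean_sq = Fraction(sum(k * k * w for k, w in enumerate(weights)), total)
    return (mean_sq - mean**2) / mean**2

for N in (10, 100, 400):
    print(N, relative_fluctuation(N))
```

For this observable the mean is $N/2$ and the variance is $N/4$, so the ratio is exactly $1/N$, which is the behaviour the text describes for physical examples.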

A2.2.3.2 MIXING 

In the last subsection, the microcanonical ensemble was formulated as an ensemble from which the
equilibrium properties of a dynamical system can be determined by its energy alone. We used the postulate of
equal a priori probability and gave a discussion of the ergodic hypothesis. The ergodicity condition, even
though a strong condition, does not ensure that if one starts from a non-equilibrium ensemble the expectation
values of dynamical functions will approach their equilibrium values as time proceeds. For this, one needs a
stronger condition than ergodicity, a condition of mixing. Every mixing system is ergodic, but the reverse is
not true.
Consider, at $t = 0$, some non-equilibrium ensemble density $\rho_{\mathrm{ne}}(p^0, q^0)$ on the constant energy hypersurface $S$,
such that it is normalized to one. By Liouville's theorem, at a later time $t$ the ensemble density becomes
$\rho_{\mathrm{ne}}(\phi_{-t}(p, q))$, where $\phi_{-t}(p, q)$ is the function that takes the current phase coordinates $(p, q)$ to their initial values
time $t$ ago; the function $\phi$ is uniquely determined by the equations of motion. The expectation value of any
dynamical variable $B$ at time $t$ is therefore

$$\langle B\rangle_t = \int d^{2f}\Omega\;B(p, q)\,\rho_{\mathrm{ne}}(\phi_{-t}(p, q)). \qquad \text{(A2.2.22)}$$

As $t$ becomes large, this should approach the equilibrium value $\langle B\rangle$, which for an ergodic system is

$$\langle B\rangle = \frac{\int_S d^{2f}\Omega\;B(p, q)\,\rho(p, q)}{\int_S d^{2f}\Omega\;\rho(p, q)} \qquad \text{(A2.2.23)}$$

where $S$ is the hypersurface of the energy shell for the microcanonical ensemble. This equality is satisfied if
the system is mixing.

A system is mixing if, for every pair of functions $f$ and $g$ whose squares are integrable on $S$,

$$\lim_{t \to \infty}\int_S d^{2f}\Omega\;f(p, q)\,g(\phi_{-t}(p, q)) = \frac{\int_S d^{2f}\Omega\;f(p, q)\;\int_S d^{2f}\Omega\;g(p, q)}{\int_S d^{2f}\Omega}. \qquad \text{(A2.2.24)}$$

The statement of the mixing condition is equivalent to the following: if $Q$ and $R$ are arbitrary regions in $S$, and
an ensemble is initially distributed uniformly over $Q$, then the fraction of members of the ensemble with phase
points in $R$ at time $t$ will approach a limit as $t \to \infty$, and this limit equals the fraction of area of $S$ occupied by
$R$.

The ensemble density $\rho_{\mathrm{ne}}(p_t, q_t)$ of a mixing system does not approach its equilibrium limit in the pointwise
sense. It is only in a 'coarse-grained' sense that the average of $\rho_{\mathrm{ne}}(p_t, q_t)$ over a region $R$ in $S$ approaches a
limit to the equilibrium ensemble density as $t \to \infty$ for each fixed $R$.

In the condition of mixing, equation (A2.2.24), if the function $g$ is replaced by $\rho_{\mathrm{ne}}$, then the integral on the
left-hand side is the expectation value of $f$, and at long times approaches the equilibrium value, which is the
microcanonical ensemble average $\langle f\rangle$, given by the right-hand side. The condition of mixing is a sufficient
condition for this result. The condition of mixing for equilibrium systems also has the implication that every
equilibrium time-dependent correlation function, such as $\langle f(p, q)\,g(\phi_t(p, q))\rangle$, approaches a limit of the
uncorrelated product $\langle f\rangle\langle g\rangle$ as $t \to \infty$.
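
The region-based statement of mixing (an ensemble started uniformly in a region $Q$ ends up occupying any region $R$ in proportion to the area of $R$) can be illustrated with a toy model. The sketch below is not a Hamiltonian flow on an energy shell; as a stand-in it uses the Arnold cat map on the unit torus, a standard area-preserving map that is known to be mixing. The regions, the number of sample points and the number of iterations are arbitrary illustrative choices.

```python
import random

def cat_map(x, y):
    """Arnold cat map on the unit torus (area preserving and mixing)."""
    return (2.0 * x + y) % 1.0, (x + y) % 1.0

def fraction_in_region(n_points=20000, n_steps=15, seed=0):
    """Fraction of ensemble members found in R after 0, 1, ..., n_steps iterations."""
    rng = random.Random(seed)
    # Ensemble initially uniform over the small region Q = [0, 0.1) x [0, 0.1).
    pts = [(0.1 * rng.random(), 0.1 * rng.random()) for _ in range(n_points)]
    history = []
    for _ in range(n_steps + 1):
        # Region R = [0.5, 1) x [0, 0.5), which covers one quarter of the torus.
        inside = sum(1 for x, y in pts if x >= 0.5 and y < 0.5)
        history.append(inside / n_points)
        pts = [cat_map(x, y) for x, y in pts]
    return history

history = fraction_in_region()
print([round(h, 3) for h in history])
```

The fraction starts at zero, because $Q$ and $R$ do not overlap, and settles near 0.25, the fractional area of $R$, up to sampling noise. Individual points never settle down, which mirrors the remark that the approach to equilibrium holds only in a coarse-grained sense.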

A2.2.3.3 ENTROPY 

For equilibrium systems, thermodynamic entropy is related to ensemble density distribution $\rho$ as

$$S = -k_{\mathrm{b}}\langle\log\rho\rangle \qquad \text{(A2.2.25)}$$

where $k_{\mathrm{b}}$ is a universal constant, Boltzmann's constant. For equilibrium systems, $\rho$ is a functional of
constants of motion like energy and is time independent. Thus, the entropy as defined here is also invariant
over time. In what follows it will be seen that this definition of entropy obeys all properties of thermodynamic
entropy.

A low-density gas which is not in equilibrium is well described by the one-particle distribution $f(\mathbf{v}, \mathbf{r}, t)$,
which describes the behaviour of a particle in $\mu$ space of the particle's velocity $\mathbf{v}$ and position $\mathbf{r}$. One can
obtain $f(\mathbf{v}, \mathbf{r}, t)$ from the classical density distribution $\rho(p, q, t)$ (defined earlier following the postulate of
equal a priori probabilities) by integrating over the degrees of freedom of the remaining $(N - 1)$ particles.
Such a coarse-grained distribution $f$ satisfies the Boltzmann transport equation (see section A3.1). Boltzmann
used the crucial assumption of molecular chaos in deriving this equation. From $f$ one can define an $H$ function
as $H(t) = \langle\log f\rangle_{\mathrm{ne}}$, where the non-equilibrium average is taken over the time-dependent $f$. Boltzmann proved
an $H$ theorem, which states that 'if at a given instant $t$, the state of gas satisfies the assumption of molecular
chaos, then at the instant $t + \epsilon$ ($\epsilon \to 0$), $dH(t)/dt \leq 0$; the equality $dH(t)/dt = 0$ is satisfied if and only if $f$ is
the equilibrium Maxwell-Boltzmann distribution'; i.e. $H(t)$ obtained from the Boltzmann transport equation
is a monotonically decreasing function of $t$. Thus a generalization of the equilibrium definition of entropy to
systems slightly away from equilibrium can be made:

$$S(t) = -k_{\mathrm{b}}\langle\log f\rangle_{\mathrm{ne}} = -k_{\mathrm{b}}H(t). \qquad \text{(A2.2.26)}$$

Such a generalization is consistent with the Second Law of Thermodynamics, since the $H$ theorem and the
generalized definition of entropy together lead to the conclusion that the entropy of an isolated non-equilibrium
system increases monotonically, as it approaches equilibrium.

A2.2.3.4 ENTROPY AND TEMPERATURE IN A MICROCANONICAL ENSEMBLE 

For a microcanonical ensemble, $\rho = [\Gamma(E)]^{-1}$ for each of the allowed $\Gamma(E)$ microstates. Thus for an isolated
system in equilibrium, represented by a microcanonical ensemble,

$$S = k_{\mathrm{b}}\log\Gamma(E). \qquad \text{(A2.2.27)}$$
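
The step from the general definition $S = -k_{\mathrm{b}}\langle\log\rho\rangle$ to the microcanonical result can be written out explicitly. The following worked lines are a sketch in the quantum (discrete) notation, where the average is assumed to be the sum over the $\Gamma(E)$ allowed microstates $l$, each weighted by $\rho_l = [\Gamma(E)]^{-1}$:

```latex
\begin{align*}
S &= -k_{\mathrm{b}} \langle \log \rho \rangle
   = -k_{\mathrm{b}} \sum_{l=1}^{\Gamma(E)} \rho_l \log \rho_l \\
  &= -k_{\mathrm{b}} \, \Gamma(E) \, \frac{1}{\Gamma(E)} \, \log \frac{1}{\Gamma(E)}
   = k_{\mathrm{b}} \log \Gamma(E).
\end{align*}
```

Microstates outside the energy shell have $\rho_l = 0$ and contribute nothing, since $\rho_l\log\rho_l \to 0$ as $\rho_l \to 0$.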

Consider the microstates with energy $E_l$ such that $E_l < E$. The total number of such microstates is given by

$$\Sigma(E) = \left[h^f N!\right]^{-1}\int_{\mathcal{H} < E} d^{2f}\Omega. \qquad \text{(A2.2.28)}$$

Then $\Gamma(E) = \Sigma(E + dE) - \Sigma(E)$, and the density of states $\mathcal{D}(E) = d\Sigma/dE$. A system containing a large number of
particles $N$, or an indefinite number of particles but with a macroscopic size volume $V$, normally has the
number of states $\Sigma$, which approaches asymptotically to

$$\Sigma \sim \exp\left[N\phi\left(\frac{E}{N}\right)\right] \quad\text{or}\quad \Sigma \sim \exp\left[V\psi\left(\frac{E}{V}\right)\right] \qquad \text{(A2.2.29)}$$

where $E/N$, $E/V$ and $\phi$ or $\psi$ are each $\sim O(1)$, and

$$\phi > 0, \qquad \phi' > 0, \qquad \phi'' < 0. \qquad \text{(A2.2.30)}$$

Consider the three examples considered in section A2.2.2. For examples (i) and (iii), the degeneracy
function $g(N, n)$ is a discrete analogue of $\Gamma(E)$. Even though $\Sigma(E)$ can be obtained from $g(N, n)$ by summing it
over $n$ from the lowest-energy state up to energy $E$, the largest value of $n$ dominates the sum if $N$ is large, so
that $g(N, n)$ is also like $\Sigma(E)$. For example (ii), $\Sigma(E)$ can be obtained from the density of states $D(\epsilon)$ by an
integration over $\epsilon$ from zero to $E$. $\Sigma(E)$ so obtained conforms to the above asymptotic properties for large $N$,
for the last two of the three examples. For the first example of '$N$ non-interacting spin-½ particles in a
magnetic field', this is the case only for the energy states corresponding to $0 < m < N/2$. (The other half state
space with $0 > m > (-N/2)$, corresponding to the positive magnetic potential energy in the range between zero
and $\mu HN$, corresponds to the system in non-equilibrium states, which have sometimes been described using
'negative temperatures' in an equilibrium description. Such peculiarities often occur in a model system which
has a finite upper bound to its energy.)

Using the asymptotic properties of S(£) for large TV, one can show that, within an additive constant ~0(log TV) 
or smaller, the three quantities k^ log T (£), k b log V(E) and k^ log S(£) are equivalent and thus any of the 

three can be used to obtain S for a large system. This leads to the result that S = k^ log T (E) = k^N§, so that 
the entropy as defined is an extensive quantity, consistent with the thermodynamic behaviour. 

For an isolated system, among the independent macroscopic variables $N$, $V$ and $E$, only $V$ can change. Now $V$
cannot decrease without compressing the system, and that would remove its isolation. Thus $V$ can only
increase, as for example is the case for the free expansion of a gas when one of the containing walls is
suddenly removed. For such an adiabatic expansion, the number of microstates in the final state is larger; thus
the entropy of the final state is larger than that of the initial state. More explicitly, note that $\Sigma(E)$ is a
non-decreasing function of $V$, since if $V_1 > V_2$, then the integral in the defining equation, (A2.2.28), for $V = V_1$
extends over a domain of integration that includes that for $V = V_2$. Thus $S(E, V) = k_b \log \Sigma(E)$ is a
non-decreasing function of $V$. This is also consistent with the Second Law of Thermodynamics.

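Next, let $x_i$ be either $p_i$ or $q_i$ ($i = 1, \ldots, f$). Consider the ensemble average $\langle x_i\, \partial\mathcal{H}/\partial x_j \rangle$:

$$\left\langle x_i \frac{\partial\mathcal{H}}{\partial x_j} \right\rangle = \frac{[h^f N!]^{-1}}{\Gamma(E)} \int_{E<\mathcal{H}<E+dE} d^{2f}\Omega\; x_i \frac{\partial\mathcal{H}}{\partial x_j} = \frac{[h^f N!]^{-1}}{\mathcal{D}(E)} \frac{\partial}{\partial E} \int_{\mathcal{H}<E} d^{2f}\Omega\; x_i \frac{\partial\mathcal{H}}{\partial x_j}.$$

Since $\partial E/\partial x_j = 0$,

$$\int_{\mathcal{H}<E} d^{2f}\Omega\; x_i \frac{\partial\mathcal{H}}{\partial x_j} = \int_{\mathcal{H}<E} d^{2f}\Omega\; \frac{\partial}{\partial x_j}\left[x_i(\mathcal{H} - E)\right] - \delta_{ij} \int_{\mathcal{H}<E} d^{2f}\Omega\; (\mathcal{H} - E).$$

The first integral on the right-hand side is zero: it becomes a surface integral over the boundary where $(\mathcal{H} - E) = 0$.
Using the result in the previous equation, one obtains

$$\left\langle x_i \frac{\partial\mathcal{H}}{\partial x_j} \right\rangle = \frac{\delta_{ij}\,[h^f N!]^{-1}}{\mathcal{D}(E)} \int_{\mathcal{H}<E} d^{2f}\Omega = \delta_{ij} \frac{\Sigma(E)}{\mathcal{D}(E)} = \delta_{ij} \left[\frac{\partial \log \Sigma}{\partial E}\right]^{-1} = \delta_{ij} \frac{k_b}{\partial S/\partial E}. \tag{A2.2.31}$$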


In a microcanonical ensemble, the internal energy of a system is

$$U = \langle \mathcal{H} \rangle = E.$$

Since the temperature $T$ relates $U$ and $S$ as $T = (\partial U/\partial S)_V$, it is appropriate to make the identification

$$\frac{\partial S}{\partial E} = \frac{1}{T}.$$

Since $T$ is positive for systems in thermodynamic equilibrium, $S$ and hence $\log \Sigma$ should both be
monotonically increasing functions of $E$. This is the case as discussed above.

With this identification of $T$, the above result reduces to the generalized equipartition theorem:

$$\left\langle x_i \frac{\partial\mathcal{H}}{\partial x_j} \right\rangle = \delta_{ij} k_b T. \tag{A2.2.32}$$




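For $i = j$ and $x_i = p_i$, one has

$$\left\langle p_i \frac{\partial\mathcal{H}}{\partial p_i} \right\rangle = k_b T$$

and for $i = j$ and $x_i = q_i$,

$$\left\langle q_i \frac{\partial\mathcal{H}}{\partial q_i} \right\rangle = k_b T.$$

Now since $\partial\mathcal{H}/\partial q_i = -\dot{p}_i$, one gets the virial theorem: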


(A2.2.33) 


(A2.2.34) 


(A2.2.35) 


(A2.2.36) 


$$\left\langle \sum_{i=1}^{f} q_i \dot{p}_i \right\rangle = -f k_b T. \tag{A2.2.37}$$


There are many physical systems which are modelled by Hamiltonians that can be transformed through a
canonical transformation to a quadratic form:

$$\mathcal{H} = \sum_{i=1}^{f} \left(a_i p_i^2 + b_i q_i^2\right) \tag{A2.2.38}$$


where $p_i$ and $q_i$ are canonically conjugate variables and $a_i$ and $b_i$ are constants. For such a form of a
Hamiltonian:

$$\sum_{i} \left( p_i \frac{\partial\mathcal{H}}{\partial p_i} + q_i \frac{\partial\mathcal{H}}{\partial q_i} \right) = 2\mathcal{H}. \tag{A2.2.39}$$

If $f$ of the constants $a_i$ and $b_i$ are non-zero, then it follows from above that

$$\langle \mathcal{H} \rangle = \tfrac{1}{2} f k_b T. \tag{A2.2.40}$$

Each harmonic term in the Hamiltonian contributes $\frac{1}{2}k_bT$ to the average energy of the system, which is the
theorem of the equipartition of energy. Since this is also the internal energy $U$ of the system, one can compute
the heat capacity

$$\frac{C_V}{k_b} = \frac{f}{2}. \tag{A2.2.41}$$

This is a classical result valid only at high temperatures. At low temperatures, quantum mechanical attributes
of a degree of freedom can partially or fully freeze it, thereby modifying or removing its contribution to $U$ and
$C_V$.

A2.2.3.5 THERMODYNAMICS IN A MICROCANONICAL ENSEMBLE: CLASSICAL IDEAL GAS

The definition of entropy and the identification of temperature made in the last subsection provide us with a
connection between the microcanonical ensemble and thermodynamics.

A quasistatic thermodynamic process corresponds to a slow variation of $E$, $V$ and $N$. This is performed by
coupling the system to external agents. During such a process the ensemble is represented by uniformly
distributed points in a region in $\Gamma$ space, and this region slowly changes as the process proceeds. The change
is slow enough that at every instant we have a microcanonical ensemble. Then the change in the entropy
during an infinitesimal change in $E$, $V$ and $N$ during the quasistatic thermodynamic process is

$$dS = \left(\frac{\partial S}{\partial E}\right)_{V,N} dE + \left(\frac{\partial S}{\partial V}\right)_{E,N} dV + \left(\frac{\partial S}{\partial N}\right)_{E,V} dN. \tag{A2.2.42}$$

The coefficient of $dE$ is the inverse absolute temperature as identified above. We now define the pressure and
chemical potential of the system as

$$P = T\left(\frac{\partial S}{\partial V}\right)_{E,N}, \qquad \mu = -T\left(\frac{\partial S}{\partial N}\right)_{E,V}. \tag{A2.2.43}$$

Then one has

$$dS = \frac{1}{T}\left(dE + P\,dV - \mu\,dN\right) \quad\text{or}\quad dE = T\,dS - P\,dV + \mu\,dN. \tag{A2.2.44}$$

This is the First Law of Thermodynamics. 

The complete thermodynamics of a system can now be obtained as follows. Let the isolated system with $N$
particles, which occupies a volume $V$ and has an energy $E$ within a small uncertainty $dE$, be modelled by a
microscopic Hamiltonian $\mathcal{H}$. First, find the density of states $\mathcal{D}(E)$ from the Hamiltonian. Next, obtain the
entropy as $S(E, V, N) = k_b \log \mathcal{D}(E)$ or, alternatively, by either of the other two equivalent expressions
involving $\Gamma(E)$ or $\Sigma(E)$. Then, solve for $E$ in terms of $S$, $V$ and $N$. This is the internal energy of the system:
$U(S, V, N) = E(S, V, N)$. Finally, find other thermodynamic functions as follows: the absolute temperature from
$T = (\partial U/\partial S)_{V,N}$, the pressure from $P = T(\partial S/\partial V)_{E,N} = -(\partial U/\partial V)_{S,N}$, the Helmholtz free energy from
$A = U - TS$, the enthalpy from $H = U + PV$, the Gibbs free energy from $G = U + PV - TS$, the chemical potential from
$\mu = G/N$ and the heat capacity at constant volume from $C_V = (\partial U/\partial T)_{V,N}$.

To illustrate, consider an ideal classical gas of $N$ molecules occupying a volume $V$ and each with mass $M$ and
three degrees of translational motion. The Hamiltonian is

$$\mathcal{H} = \frac{1}{2M} \sum_{i=1}^{3N} p_i^2. \tag{A2.2.45}$$


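Calculate $\Sigma(E)$ first. It is

$$\Sigma(E) = \frac{1}{h^{3N} N!} \int_{0<\mathcal{H}<E} d^{3N}q\, d^{3N}p = \frac{1}{N!} \left(\frac{V}{h^3}\right)^{N} \int_{\mathcal{H}<E} d^{3N}p.$$

If $P_0 = (2ME)^{1/2}$, then the integral is the volume of a $3N$-sphere of radius $P_0$, which is also equal to
$C_{3N} P_0^{3N}$, where $C_{3N}$ is the volume of a unit sphere in $3N$ dimensions. It can be shown that

$$C_{3N} = \frac{\pi^{3N/2}}{(3N/2)!}.$$

For large $N$, $N! \sim N^N e^{-N}$ and $C_{3N}$ reduces to

$$C_{3N} = \left(\frac{2\pi}{3N}\right)^{3N/2} \exp(3N/2).$$

This gives

$$\Sigma(E) = \frac{V^N C_{3N}}{h^{3N} N!} (2ME)^{3N/2}. \tag{A2.2.46}$$

Now one can use $S = k_b \log \Sigma$. Then, for large $N$, for entropy one obtains

$$S(E, V, N) = N k_b \left[ \frac{5}{2} + \log\left( \frac{V}{N} \left( \frac{4\pi M E}{3 N h^2} \right)^{3/2} \right) \right]. \tag{A2.2.47}$$

It is now easy to invert this result to obtain $E(S, V, N) = U(S, V, N)$:

$$U(S, V, N) = \frac{3 h^2 N^{5/3}}{4\pi M V^{2/3}} \exp\left( \frac{2S}{3Nk_b} - \frac{5}{3} \right). \tag{A2.2.48}$$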

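As expected, $S$ and $U$ are extensive, i.e. are proportional to $N$. From $U$ one can obtain the temperature

$$T = \left(\frac{\partial U}{\partial S}\right)_{V,N} = \frac{2}{3} \frac{U}{N k_b}. \tag{A2.2.49}$$

From this result it follows that

$$C_V = \left(\frac{\partial U}{\partial T}\right)_{V,N} = \frac{3}{2} N k_b. \tag{A2.2.50}$$

Finally, the equation of state is

$$P = -\left(\frac{\partial U}{\partial V}\right)_{S,N} = \frac{2U}{3V} = \frac{N k_b T}{V} \tag{A2.2.51}$$

and the chemical potential is

$$\mu = \left(\frac{\partial U}{\partial N}\right)_{S,V} = k_b T \log\left[ \frac{N}{V} \left( \frac{h^2}{2\pi M k_b T} \right)^{3/2} \right]. \tag{A2.2.52}$$
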
For practical calculations, the microcanonical ensemble is not as useful as other ensembles corresponding to 
more commonly occurring experimental situations. Such equilibrium ensembles are considered next. 
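
The microcanonical recipe for the classical ideal gas can be checked numerically. The following is a minimal sketch, not part of the original derivation: it evaluates the entropy (A2.2.47) and recovers the temperature and pressure from finite-difference derivatives of $S(E, V, N)$. The function names and the argon-like parameter values are illustrative assumptions.

```python
import math

H_PLANCK = 6.62607015e-34  # Planck constant, J s
K_B = 1.380649e-23         # Boltzmann constant, J/K


def entropy(E, V, N, M):
    """Entropy S(E, V, N) of the classical ideal gas, equation (A2.2.47)."""
    arg = (V / N) * (4.0 * math.pi * M * E / (3.0 * N * H_PLANCK**2)) ** 1.5
    return N * K_B * (2.5 + math.log(arg))


def temperature(E, V, N, M, rel_step=1e-5):
    """T from 1/T = (dS/dE)_{V,N}, using a central finite difference."""
    dE = rel_step * E
    dS = entropy(E + dE, V, N, M) - entropy(E - dE, V, N, M)
    return 2.0 * dE / dS


def pressure(E, V, N, M, rel_step=1e-5):
    """P from P = T (dS/dV)_{E,N}, using a central finite difference."""
    dV = rel_step * V
    dS = entropy(E, V + dV, N, M) - entropy(E, V - dV, N, M)
    return temperature(E, V, N, M) * dS / (2.0 * dV)


# Illustrative (assumed) values: one mole of argon-like atoms in 24.8 litres,
# with the energy chosen so that E = (3/2) N k_b T at T = 300 K.
N = 6.02214076e23
M = 6.6335e-26   # kg
V = 0.0248       # m^3
T_TARGET = 300.0
E = 1.5 * N * K_B * T_TARGET

T_num = temperature(E, V, N, M)
P_num = pressure(E, V, N, M)
P_ideal = N * K_B * T_TARGET / V
```

Under these assumptions `T_num` should reproduce the target temperature and `P_num` the ideal-gas value `P_ideal`, which is the content of the temperature and equation-of-state results derived above.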

A2.2.3.6 INTERACTION BETWEEN SYSTEMS 

Between two systems there can be a variety of interactions. Thermodynamic equilibrium of a system implies 
thermal, chemical and mechanical equilibria. It is therefore logical to consider, in sequence, the following 
interactions between two systems: thermal contact, which enables the two systems to share energy; material 
contact, which enables exchange of particles between them; and pressure transmitting contact, which allows 
an exchange of volume between the two systems. In each of the cases, the combined composite system is
supposed to be isolated (surrounded by adiabatic walls as described in section A2.1).

In addition, there could be a mechanical or electromagnetic interaction of a system with an external entity 
which may do work on an otherwise isolated system. Such a contact with a work source can be represented by 
the Hamiltonian H(p, q, x) where x is the coordinate (for example, the position of a piston in a box containing 
a gas, or the magnetic moment if an external magnetic field is present, or the electric dipole moment in the 
presence of an external electric field) describing the interaction between the system and the external work 
source. Then the force, canonically conjugate to $x$, which the system exerts on the outside world is

$$X = -\frac{\partial \mathcal{H}(p, q, x)}{\partial x}. \tag{A2.2.53}$$

A thermal contact between two systems can be described in the following way. Let two systems with
Hamiltonians $\mathcal{H}_{\rm I}$ and $\mathcal{H}_{\rm II}$ be in contact and interact with Hamiltonian $\mathcal{H}'$. Then the composite system (I + II)
has Hamiltonian $\mathcal{H} = \mathcal{H}_{\rm I} + \mathcal{H}_{\rm II} + \mathcal{H}'$. The interaction should be weak, such that the microstate of the composite
system, say $l$, is specified by giving the microstate $l'$ of system I and the microstate $l''$ of system II, with the
energy $E_l$ of the composite system given, to a good approximation, by $E_l = E^{\rm I}_{l'} + E^{\rm II}_{l''}$, where $l = (l', l'')$. The
existence of the weak interaction is supposed to allow a sufficiently frequent exchange of energy between the
two systems in contact. Then, after sufficient time, one expects the composite system to reach a final state
regardless of the initial states of the subsystems. In the final state, every microstate $(l', l'')$ of the composite
system will be realized with equal probability, consistent with the postulate of equal a priori probability. Any
such final state is a state of statistical equilibrium, the corresponding ensemble of states is called a canonical
ensemble, and corresponds to thermal equilibrium in thermodynamics. The thermal contact as described here
corresponds to a diathermic wall in thermodynamics (see section A2.1).

Contacts between two systems which enable them to exchange energy (in a manner similar to thermal contact)
and to exchange particles are other examples of interaction. In these cases, the microstates of the composite
system can be given for the case of a weak interaction by $(N, l) = (N', l'; N'', l'')$. The sharing of the energy
and the number of particles leads to the constraints $E_l(N) = E^{\rm I}_{l'}(N') + E^{\rm II}_{l''}(N'')$ and $N = N' + N''$. The
corresponding equilibrium ensemble is called a grand canonical ensemble, or a $T$-$\mu$ ensemble.

Finally, if two systems are separated by a movable diathermic (perfectly conducting) wall, then the two
systems are able to exchange energy and volume: $E_l(V) = E^{\rm I}_{l'}(V') + E^{\rm II}_{l''}(V'')$ and $V = V' + V''$. If the
interaction is weak, the microstate of the composite system is $(V, l) = (V', l'; V'', l'')$, and the corresponding
equilibrium ensemble is called the $T$-$P$ ensemble.


A2.2.4 CANONICAL ENSEMBLE 

Consider two systems in thermal contact as discussed above. Let the system II (with volume $V_R$ and particles
$N_R$) correspond to a reservoir R which is much larger than the system I (with volume $V$ and particles $N$) of
interest. In order to find the canonical ensemble distribution one needs to obtain the probability that the
system I is in a specific microstate $\nu$ which has an energy $E_\nu$. When the system is in this microstate, the
reservoir will have the energy $E_R = E_T - E_\nu$ due to the constraint that the total energy of the isolated
composite system I+II is fixed and denoted by $E_T$; but the reservoir can be in any one of the $\Gamma_R(E_T - E_\nu)$
possible states that the mechanics within the reservoir dictates. Given that the microstate of the system of
interest is specified to be $\nu$, the total number of accessible states for the composite system is clearly
$\Gamma_R(E_T - E_\nu)$. Then, by the postulate of equal a priori probability, the probability that the system will be in state $\nu$
(denoted by $P_\nu$) is proportional to $\Gamma_R(E_T - E_\nu)$:

$$P_\nu(E_\nu) = c\,\Gamma_R(E_T - E_\nu)$$

where the proportionality constant is obtained by the normalization of $P_\nu$,

$$c = \left[\sum_\nu \Gamma_R(E_T - E_\nu)\right]^{-1}$$

where the sum is over all microstates accessible to the system I. Thus

$$P_\nu(E_\nu) = \frac{\Gamma_R(E_T - E_\nu)}{\sum_\nu \Gamma_R(E_T - E_\nu)}$$

which can be rewritten as

$$P_\nu(E_\nu) = \frac{\exp[\log \Gamma_R(E_T - E_\nu)]}{\sum_\nu \exp[\log \Gamma_R(E_T - E_\nu)]} = \frac{\exp[S_R(E_T - E_\nu)/k_b]}{\sum_\nu \exp[S_R(E_T - E_\nu)/k_b]}$$

where the following definition of statistical entropy is introduced:

$$S(E, V, N) = k_b \log \Gamma(E, V, N). \tag{A2.2.54}$$

Now, since the reservoir is much bigger than the system I, one expects $E_T \gg E_\nu$. Thermal equilibrium between
the reservoir and the system implies that their temperatures are equal. Therefore, using the identification of $T$
in section A2.1.4, one has

$$\frac{\partial S_R(E_R)}{\partial E_R} = \frac{\partial S_{\rm I}(E_\nu)}{\partial E_\nu} = \frac{\partial S_R(E_T)}{\partial E_T} = \frac{1}{T}. \tag{A2.2.55}$$

Then it is natural to use the expansion of $S_R(E_R)$ around the maximum value of the reservoir energy, $E_T$:

$$S_R(E_T - E_\nu) = S_R(E_T) - \frac{\partial S_R(E_T)}{\partial E_T} E_\nu + \cdots.$$

Using the leading terms in the expansion and the identification of the common temperature $T$, one obtains

$$\frac{S_R(E_T - E_\nu)}{k_b} = \frac{S_R(E_T)}{k_b} - \frac{E_\nu}{k_b T}$$

from which it follows that

$$P_\nu(E_\nu) = \frac{\exp(-E_\nu/k_b T)}{\sum_\nu \exp(-E_\nu/k_b T)}. \tag{A2.2.56}$$

Note that in this normalized probability, the properties of the reservoir enter the result only through the
common equilibrium temperature $T$. The accuracy of the expansion used above can be checked by considering
the next term, which is

$$\frac{1}{2} \frac{\partial^2 S_R(E_T)}{\partial E_T^2} E_\nu^2.$$

The magnitude of its ratio to the first term can be seen to be $(\partial T/\partial E_T) E_\nu / 2T$. Since $E_\nu$ is proportional to the number of
particles in the system $N$ and $E_T$ is proportional to the number of particles in the composite system $(N + N_R)$,
the ratio of the second-order term to the first-order term is proportional to $N/(N + N_R)$. Since the reservoir is
assumed to be much bigger than the system (i.e. $N_R \gg N$), this ratio is negligible, and the truncation of the
expansion is justified. The combination $1/(k_b T)$ occurs frequently and is denoted by $\beta$ below.
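
The reservoir argument can be illustrated with a model in which the state counting is exact. The sketch below is an illustration under stated assumptions, not the construction used in the text: the reservoir is taken to be a set of `N_R` identical harmonic oscillators sharing a fixed number of energy quanta (an Einstein-solid model), and the system of interest is one further oscillator holding $n$ quanta. The probability $P_n \propto \Gamma_R(E_T - E_n)$ is compared with the Boltzmann form $\exp(-\beta E_n)$, with $\beta$ obtained from the derivative of $\log \Gamma_R$. All names and numerical values are hypothetical.

```python
import math


def log_multiplicity(quanta, oscillators):
    """log of the number of ways to distribute `quanta` among `oscillators`,
    i.e. log[(quanta + oscillators - 1)! / (quanta! (oscillators - 1)!)]."""
    return (math.lgamma(quanta + oscillators)
            - math.lgamma(quanta + 1)
            - math.lgamma(oscillators))


def reservoir_probabilities(total_quanta, reservoir_oscillators, n_max):
    """P_n proportional to Gamma_R(E_T - E_n) for a system holding n quanta."""
    logs = [log_multiplicity(total_quanta - n, reservoir_oscillators)
            for n in range(n_max + 1)]
    ref = max(logs)  # subtract the largest log to avoid overflow
    weights = [math.exp(x - ref) for x in logs]
    norm = sum(weights)
    return [w / norm for w in weights]


def boltzmann_probabilities(beta_eps, n_max):
    """P_n proportional to exp(-beta * eps * n), truncated at n_max."""
    weights = [math.exp(-beta_eps * n) for n in range(n_max + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


N_R = 100_000   # reservoir oscillators (assumed value)
Q_T = 50_000    # total number of quanta (assumed value)
N_MAX = 20      # largest system occupation retained

# beta * eps = d(log Gamma_R)/dq evaluated at q = Q_T, valid for large q and N_R
beta_eps = math.log(1.0 + (N_R - 1) / Q_T)

p_exact = reservoir_probabilities(Q_T, N_R, N_MAX)
p_boltz = boltzmann_probabilities(beta_eps, N_MAX)
max_diff = max(abs(a - b) for a, b in zip(p_exact, p_boltz))
```

With the reservoir much larger than the system, `max_diff` is expected to be very small, in line with the estimate that the neglected term is of relative order $N/(N + N_R)$.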

The above derivation leads to the identification of the canonical ensemble density distribution. More
generally, consider a system with volume $V$ and $N_A$ particles of type A, $N_B$ particles of type B, etc., such that
$N = N_A + N_B + \ldots$, and let the system be in thermal equilibrium with a much larger heat reservoir at
temperature $T$. Then if $\mathcal{H}$ is the system Hamiltonian, the canonical distribution is (quantum mechanically)

$$\rho = \frac{\exp(-\beta\mathcal{H})}{\mathrm{Tr}[\exp(-\beta\mathcal{H})]}. \tag{A2.2.57}$$

The corresponding classical distribution is

$$\rho(p, q)\, d^{2f}\Omega = \frac{1}{h^f N_A! N_B! \cdots} \frac{e^{-\beta\mathcal{H}(p,q)}}{Q_N(\beta, V)}\, d^{2f}\Omega \tag{A2.2.58}$$

where $f$ is the total number of degrees of freedom for the $N$-particle system and

$$Q_N(\beta, V) = \frac{1}{h^f N_A! N_B! \cdots} \int e^{-\beta\mathcal{H}(p,q)}\, d^{2f}\Omega \tag{A2.2.59}$$

which, for a one-component system, reduces to

$$Q_N(\beta, V) = \frac{1}{h^f N!} \int e^{-\beta\mathcal{H}(p,q)}\, d^{2f}\Omega. \tag{A2.2.60}$$

This result is the classical analogue of

$$Q_N(\beta, V) = \sum_\nu \exp(-\beta E_\nu) = \mathrm{Tr}[\exp(-\beta\mathcal{H})]. \tag{A2.2.61}$$

$Q_N(\beta, V)$ is called the canonical partition function, and plays a central role in determining the thermodynamic
behaviour of the system. The constants in front of the integral in (A2.2.59) and (A2.2.60) can be understood in
terms of the uncertainty principle and indistinguishability of particles, as was discussed earlier in section
A2.2.3.1 while obtaining (A2.2.18). Later, in section A2.2.5.5, the classical limit of an ideal quantum gas is
considered, which also leads to a similar understanding of these multiplicative constants, which arise on
account of overcounting of microstates in classical mechanics.

The canonical distribution corresponds to the probability density for the system to be in a specific microstate
with energy $E \sim \mathcal{H}$; from it one can also obtain the probability $\mathcal{P}(E)\,dE$ that the system has an energy between $E$
and $E + dE$ if the density of states $\mathcal{D}(E)$ is known. This is because, classically,

$$[h^f N_A! N_B! \cdots]^{-1} \int_{E<\mathcal{H}<E+dE} d^{2f}\Omega = \mathcal{D}(E)\,dE \tag{A2.2.62}$$

and, quantum mechanically, the sum over the degenerate states with $E < \mathcal{H} < E + dE$ also yields the extra
factor $\mathcal{D}(E)\,dE$. The result is

$$\mathcal{P}(E)\,dE = [Q_N]^{-1} e^{-\beta E} \mathcal{D}(E)\,dE. \tag{A2.2.63}$$

Then, the partition function can also be rewritten as

$$Q_N = \int e^{-\beta E} \mathcal{D}(E)\,dE. \tag{A2.2.64}$$

A2.2.4.1 THERMODYNAMICS IN A CANONICAL ENSEMBLE 

In the microcanonical ensemble, one has specified $E \sim U(S, V, N)$, and $T$, $P$ and $\mu$ are among the derived
quantities. In the canonical ensemble, the system is held at fixed $T$, and the change of a thermodynamic
variable from $S$ in a microcanonical ensemble to $T$ in a canonical ensemble is achieved by replacing the
internal energy $U(S, V, N)$ by the Helmholtz free energy $A(T, V, N) = (U - TS)$. The First Law statement for
$dU$, equation (A2.2.44), now leads to

$$dA = \mu\,dN - P\,dV - S\,dT. \tag{A2.2.65}$$

If one denotes the averages over a canonical distribution by $\langle \ldots \rangle$, then the relation $A = U - TS$ and $U = \langle \mathcal{H} \rangle$
leads to the statistical mechanical connection to the thermodynamic free energy $A$:

$$A = -k_b T \log Q_N. \tag{A2.2.66}$$

To see this, note that $S = -k_b \langle \log \rho \rangle$. Thus

$$S = -k_b \left\langle \log\left( Q_N^{-1} e^{-\beta\mathcal{H}} \right) \right\rangle = k_b \log Q_N + T^{-1} \langle \mathcal{H} \rangle = k_b \log Q_N + T^{-1} U$$

which gives the result $A = U - TS = -k_b T \log Q_N$. For any canonical ensemble system, its thermodynamic
properties can be found once its partition function is obtained from the system Hamiltonian. The sequence can
be

$$\mathcal{H} \to Q_N \to A \to (\mu, P, S) \tag{A2.2.67}$$

where the last connection is obtained from the differential relations

$$\mu = \left(\frac{\partial A}{\partial N}\right)_{T,V}, \qquad P = -\left(\frac{\partial A}{\partial V}\right)_{T,N}, \qquad S = -\left(\frac{\partial A}{\partial T}\right)_{V,N}. \tag{A2.2.68}$$

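One can trivially obtain the other thermodynamic potentials $U$, $H$ and $G$ from the above. It is also interesting
to note that the internal energy $U$ and the heat capacity $C_{V,N}$ can be obtained directly from the partition
function. Since $Q_N(\beta, V) = \sum_\nu \exp(-\beta E_\nu)$, one has

$$U = \langle E \rangle = \frac{\sum_\nu E_\nu \exp(-\beta E_\nu)}{\sum_\nu \exp(-\beta E_\nu)} = -\left(\frac{\partial \log Q_N}{\partial \beta}\right)_{N,V}. \tag{A2.2.69}$$

Fluctuations in energy are related to the heat capacity $C_{V,N}$, and can be obtained by twice differentiating
$\log Q_N$ with respect to $\beta$, and using equation (A2.2.69):

$$\langle (E_\nu - \langle E \rangle)^2 \rangle = \langle E_\nu^2 \rangle - \langle E \rangle^2 = \left(\frac{\partial^2 \log Q_N}{\partial \beta^2}\right)_{N,V} = -\left(\frac{\partial U}{\partial \beta}\right)_{N,V} = k_b T^2 C_{V,N}. \tag{A2.2.70}$$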

Both $\langle E \rangle$ and $C_{V,N}$ are extensive quantities and proportional to $N$ or the system size. The root mean square
fluctuation in energy is therefore proportional to $N^{1/2}$, and the relative fluctuation in energy is

$$\frac{\langle (E_\nu - \langle E \rangle)^2 \rangle^{1/2}}{\langle E \rangle} \propto \frac{1}{N^{1/2}}. \tag{A2.2.71}$$

This behaviour is characteristic of thermodynamic fluctuations. It also implies the equivalence
of various ensembles in the thermodynamic limit. Specifically, as $N \to \infty$ the energy fluctuations vanish, the
partition of energy between the system and the reservoir becomes uniquely defined and the thermodynamic
properties in microcanonical and canonical ensembles become identical.
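
The $N^{-1/2}$ scaling of the relative energy fluctuation can be made concrete with a simple model. The sketch below assumes a system of $N$ independent two-level units with level energies 0 and $\epsilon$ (a hypothetical choice made only because $\log Q_N$ is then known in closed form) and obtains the mean energy and its variance from numerical first and second derivatives of $\log Q_N$ with respect to $\beta$, as in (A2.2.69) and (A2.2.70).

```python
import math


def log_partition(beta, n_units, eps=1.0):
    """log Q_N for n_units independent two-level units with energies 0 and eps."""
    return n_units * math.log1p(math.exp(-beta * eps))


def mean_and_variance(beta, n_units, h=1e-4):
    """<E> = -d(log Q)/d(beta) and <(dE)^2> = d^2(log Q)/d(beta)^2,
    both estimated with central finite differences of step h."""
    f0 = log_partition(beta, n_units)
    fp = log_partition(beta + h, n_units)
    fm = log_partition(beta - h, n_units)
    mean = -(fp - fm) / (2.0 * h)
    variance = (fp - 2.0 * f0 + fm) / h**2
    return mean, variance


def relative_fluctuation(beta, n_units):
    """Root mean square energy fluctuation divided by the mean energy."""
    mean, variance = mean_and_variance(beta, n_units)
    return math.sqrt(variance) / mean


small = relative_fluctuation(1.0, 100)
large = relative_fluctuation(1.0, 10_000)
ratio = small / large  # expected to be close to sqrt(10000 / 100) = 10
```

For this model the relative fluctuation equals $e^{\beta\epsilon/2}/N^{1/2}$, so increasing $N$ by a factor of 100 should reduce it by a factor of 10.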
A2.2.4.2 EXPANSION IN POWERS OF $\hbar$

In the relation (A2.2.66), one can use the partition function evaluated using either (A2.2.59) or (A2.2.61). The
use of (A2.2.59) gives the first term in an expansion of the quantum mechanical $A$ in powers of $\hbar$ in the
quasi-classical limit. In this section the next non-zero term in this expansion is evaluated. For this consider the
partition function (A2.2.61). The trace of $\exp(-\beta\mathcal{H})$ can be obtained using the wavefunctions of free motion
of the ideal gas of $N$ particles in volume $V$:

$$\psi_p = V^{-N/2} \exp\left( \frac{i}{\hbar} \sum_j p_j q_j \right) \tag{A2.2.72}$$

where $q_j$ are the coordinates and $p_j = \hbar k_j$ are the corresponding momenta of the $N$ particles, whose $3N$ degrees
of freedom are labelled by the suffix $j$. The particles may be identical (with same mass $M$) or different. For
identical particles, the wavefunctions above have to be made symmetrical or antisymmetrical in the
corresponding $\{q_j\}$ depending on the statistics obeyed by the particles. This effect, however, leads to
an exponentially small correction in $A$ and can be neglected. The other consequence of the indistinguishability of
particles is in the manner in which the momentum sums are done. This produces a correction which is third
order in $\hbar$, obtained in section A2.2.5.5, and does not affect the $O(\hbar^2)$ term that is calculated here. In each of
the wavefunctions $\psi_p$, the momenta $p_j$ are definite constants and form a dense discrete set with spacing
between the neighbouring $p_j$ proportional to $V^{-1/3}$. Thus, the summation of the matrix elements
$\langle \psi_p | \exp(-\beta\mathcal{H}) | \psi_p \rangle$ with respect to all $p_j$ can be replaced by an integration:

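$$Q_N(\beta, V) = \mathrm{Tr}\{\psi_p^{*} \exp(-\beta\mathcal{H})\, \psi_p\} \tag{A2.2.73}$$

$$= \frac{1}{h^{3N} N_A! N_B! \cdots} \int I\, d^{3N}p\, d^{3N}q \tag{A2.2.74}$$

where

$$I = \exp\left( -\frac{i}{\hbar} \sum_j p_j q_j \right) \exp(-\beta\mathcal{H}) \exp\left( \frac{i}{\hbar} \sum_j p_j q_j \right).$$

When $\beta = 0$, $I = 1$. For systems in which the Hamiltonian $\mathcal{H}$ can be written as

$$\mathcal{H} = \sum_j \frac{p_j^2}{2M_j} + U = -\frac{\hbar^2}{2} \sum_j \frac{1}{M_j} \frac{\partial^2}{\partial q_j^2} + U \tag{A2.2.75}$$

with $U = U(\{q_j\})$ as the potential energy of interaction between $N$ particles, the integral $I$ can be evaluated by
considering its derivative with respect to $\beta$ (note that the operator $\mathcal{H}$ will act on all factors to its right):

$$\frac{\partial I}{\partial \beta} = -e^{-(i/\hbar)\sum_j p_j q_j}\, \mathcal{H} \left( e^{(i/\hbar)\sum_j p_j q_j}\, I \right) \tag{A2.2.76}$$

$$= -E(p, q)\, I + \sum_j \frac{\hbar^2}{2M_j} \left( \frac{2i}{\hbar} p_j \frac{\partial I}{\partial q_j} + \frac{\partial^2 I}{\partial q_j^2} \right) \tag{A2.2.77}$$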

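where $E(p, q) = (\sum_j p_j^2/2M_j + U)$ is the classical form of the energy. By using the substitution
$I = \exp(-\beta E(p, q))\,\chi$ and expanding $\chi = 1 + \hbar\chi_1 + \hbar^2\chi_2 + \cdots$, one can obtain the quantum corrections to the classical
partition function. Since for $\beta = 0$, $I = 1$, one also has for $\beta = 0$, $\chi = 1$, and $\chi_1 = \chi_2 = 0$. With this boundary
condition, one obtains the result that

$$\chi_1 = -\frac{i\beta^2}{2} \sum_j \frac{p_j}{M_j} \frac{\partial U}{\partial q_j}$$

and

$$\chi_2 = -\frac{\beta^4}{8} \left( \sum_j \frac{p_j}{M_j} \frac{\partial U}{\partial q_j} \right)^2 + \frac{\beta^3}{6} \sum_j \sum_k \frac{p_j p_k}{M_j M_k} \frac{\partial^2 U}{\partial q_j \partial q_k} + \frac{\beta^3}{6} \sum_j \frac{1}{M_j} \left( \frac{\partial U}{\partial q_j} \right)^2 - \frac{\beta^2}{4} \sum_j \frac{1}{M_j} \frac{\partial^2 U}{\partial q_j^2}.$$

For the partition function, the contribution from $\chi_1$, which is the first-order correction in $\hbar$, vanishes
identically. One obtains

$$Q_N(\beta, V) = Q_N^{\rm cl} \left( 1 + \hbar^2 \langle \chi_2 \rangle_{\rm cl} + \cdots \right) \tag{A2.2.78}$$

where the superscript (cl) corresponds to the classical value, and $\langle \chi_2 \rangle_{\rm cl}$ is the classical canonical ensemble
average of $\chi_2$. The free energy $A$ can then be inferred as

$$A = A_{\rm cl} - \beta^{-1} \log\left( 1 + \hbar^2 \langle \chi_2 \rangle_{\rm cl} + \cdots \right) \tag{A2.2.79}$$

$$\approx A_{\rm cl} - \beta^{-1} \hbar^2 \langle \chi_2 \rangle_{\rm cl}. \tag{A2.2.80}$$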

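One can formally evaluate $\langle \chi_2 \rangle_{\rm cl}$. Since $\langle p_i p_j \rangle = M_i \beta^{-1} \delta_{ij}$, one obtains

$$\langle \chi_2 \rangle_{\rm cl} = \frac{\beta^3}{24} \sum_j \frac{1}{M_j} \left\langle \left( \frac{\partial U}{\partial q_j} \right)^2 \right\rangle - \frac{\beta^2}{12} \sum_j \frac{1}{M_j} \left\langle \frac{\partial^2 U}{\partial q_j^2} \right\rangle. \tag{A2.2.81}$$

This can be further simplified by noting that

$$\int \frac{\partial^2 U}{\partial q_j^2} e^{-\beta U}\, dq_j = \left[ \frac{\partial U}{\partial q_j} e^{-\beta U} \right]_{-\infty}^{+\infty} + \beta \int \left( \frac{\partial U}{\partial q_j} \right)^2 e^{-\beta U}\, dq_j \tag{A2.2.82}$$

which implies that

$$\left\langle \frac{\partial^2 U}{\partial q_j^2} \right\rangle = \beta \left\langle \left( \frac{\partial U}{\partial q_j} \right)^2 \right\rangle. \tag{A2.2.83}$$

It follows that

$$\langle \chi_2 \rangle_{\rm cl} = -\frac{\beta^3}{24} \sum_j \frac{1}{M_j} \left\langle \left( \frac{\partial U}{\partial q_j} \right)^2 \right\rangle \tag{A2.2.84}$$

with the end result that

$$Q_N(\beta, V) = Q_N^{\rm cl} \left( 1 - \frac{\hbar^2 \beta^3}{24} \sum_j \frac{1}{M_j} \left\langle \left( \frac{\partial U}{\partial q_j} \right)^2 \right\rangle \right) \tag{A2.2.85}$$

and

$$A \approx A_{\rm cl} + \frac{\hbar^2 \beta^2}{24} \sum_j \frac{1}{M_j} \left\langle \left( \frac{\partial U}{\partial q_j} \right)^2 \right\rangle. \tag{A2.2.86}$$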


The leading order quantum correction to the classical free energy is always positive, is proportional to the sum
of mean square forces acting on the particles and decreases with either increasing particle mass or increasing
temperature. The next term in this expansion is of order $\hbar^4$. This feature enables one to independently
calculate the leading correction due to quantum statistics, which is $O(\hbar^3)$. The result calculated in section
A2.2.5.5 is

$$A_3 = \pm \frac{\pi^{3/2}}{2\gamma} \frac{N^2 \hbar^3 \beta^{1/2}}{V M^{3/2}} \tag{A2.2.87}$$

for an ideal quantum gas of $N$ identical particles. The upper sign is for Fermi statistics, the lower is for Bose
statistics and $\gamma$ is the degeneracy factor due to nuclear and electron spins.
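
The size of the leading $\hbar^2$ correction can be checked on a case where the quantum free energy is known exactly. The sketch below assumes a single one-dimensional harmonic oscillator with $U = \frac{1}{2}M\omega^2 q^2$, which is an illustrative choice and not an example from the text. For this potential the classical average is $\langle (\partial U/\partial q)^2 \rangle = M\omega^2/\beta$, so the correction term in (A2.2.86) becomes $\beta(\hbar\omega)^2/24$, while the exact and classical free energies are $\beta^{-1}\log[2\sinh(\beta\hbar\omega/2)]$ and $\beta^{-1}\log(\beta\hbar\omega)$.

```python
import math


def free_energy_exact(beta, hbar_omega):
    """Exact quantum free energy of a 1D harmonic oscillator."""
    return math.log(2.0 * math.sinh(0.5 * beta * hbar_omega)) / beta


def free_energy_classical(beta, hbar_omega):
    """Classical free energy from Q_cl = 1 / (beta * hbar * omega)."""
    return math.log(beta * hbar_omega) / beta


def leading_quantum_correction(beta, hbar_omega):
    """(hbar^2 beta^2 / 24) (1/M) <(dU/dq)^2> with <(dU/dq)^2> = M omega^2 / beta."""
    return beta * hbar_omega**2 / 24.0


# Energies are measured in arbitrary units; beta * hbar_omega = 0.1 is well
# inside the quasi-classical regime (assumed illustrative values).
BETA = 1.0
HBAR_OMEGA = 0.1

exact = free_energy_exact(BETA, HBAR_OMEGA)
classical = free_energy_classical(BETA, HBAR_OMEGA)
correction = leading_quantum_correction(BETA, HBAR_OMEGA)
residual = exact - (classical + correction)
```

The residual is of fourth order in $\beta\hbar\omega$, which is consistent with the statement that the next term in the expansion is of order $\hbar^4$.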
In the following three subsections, the three examples described in A2.2.2 are considered. In each case the
model system is in thermal equilibrium with a large reservoir at temperature $T = (k_b \beta)^{-1}$. Then the partition
function for each system is evaluated and its consequences for the thermodynamic behaviour of the model
system are explored.

A2.2.4.3 APPLICATION TO IDEAL SYSTEMS: TWO-STATE MODEL 

Let us consider first the two-state model of non-interacting spin-$\frac{1}{2}$ particles in a magnetic field. For a system
with only one such particle there are two non-degenerate energy levels with energies $\pm\mu H$, and the partition
function is $Q_1 = \exp(-\beta\mu H) + \exp(\beta\mu H) = 2\cosh(\beta\mu H)$. For $N$ such independent spin-$\frac{1}{2}$ particles (localized,
and therefore distinguishable), the canonical partition function is

$$Q_N = Q_1^N = [2\cosh(\beta\mu H)]^N.$$

The internal energy is

$$U = -\frac{\partial \log Q_N}{\partial \beta} = -N\mu H \tanh(\beta\mu H).$$

If $H$ is $\infty$ (very large) or $T$ is zero, the system is in the lowest possible and non-degenerate energy state and
$U = -N\mu H$. If either $H$ or $\beta$ is zero, then $U = 0$, corresponding to an equal number of spins up and down. There
is a symmetry between the positive and negative values of $\beta\mu H$, but negative $\beta$ values do not correspond to
thermodynamic equilibrium states. The heat capacity is

$$\frac{C_{H,N}}{k_b} = -\beta^2 \left( \frac{\partial U}{\partial \beta} \right)_{H,N} = N (\beta\mu H)^2\, \mathrm{sech}^2(\beta\mu H).$$

Figure A2.2.1 shows $C_{H,N}$, in units of $Nk_b$, as a function of $(\beta\mu H)$. $C_{H,N}$ is zero in the two limits of zero and
infinite values of $(\beta\mu H)$, which also implies the limits of $T = \infty$ and $T = 0$. For small $(\beta\mu H)$, it approaches zero
as $\sim(\beta\mu H)^2$ and for large $(\beta\mu H)$ as $(\beta\mu H)^2 \exp(-2\beta\mu H)$. It has a maximum value of $0.439Nk_b$ around $\beta\mu H =
1.2$. This behaviour is characteristic of any two-state system, and the maximum in the heat capacity is called a
Schottky anomaly.
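
The position and height of the Schottky maximum quoted above can be reproduced with a few lines of code. The sketch below is illustrative: it uses the closed form $C_{H,N}/(Nk_b) = x^2\,\mathrm{sech}^2 x$ with $x = \beta\mu H$, and locates the maximum by solving the stationarity condition $x\tanh x = 1$ by bisection. The function names and the bracketing interval are assumptions made for the example.

```python
import math


def heat_capacity_per_spin(x):
    """C_{H,N} / (N k_b) = x^2 sech^2(x), where x = beta * mu * H."""
    return (x / math.cosh(x)) ** 2


def peak_position(lo=0.5, hi=2.0, tol=1e-12):
    """Solve x * tanh(x) = 1 by bisection.

    The derivative of x^2 sech^2(x) is 2 x sech^2(x) (1 - x tanh(x)),
    so the maximum for x > 0 sits where x tanh(x) = 1.
    """
    def g(x):
        return 1.0 - x * math.tanh(x)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_peak = peak_position()
c_peak = heat_capacity_per_spin(x_peak)
```

Here `x_peak` comes out close to 1.2 and `c_peak` close to 0.439, matching the values stated in the text; at the maximum the height equals $x^2 - 1$ because $x\tanh x = 1$ there.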
Figure A2.2.1. Heat capacity of a two-state system as a function of the dimensionless temperature, $k_bT/(\mu H)$.

From the partition function, one also finds the Helmholtz free energy as

$$F = -k_b T \log Q_N = -N k_b T \log[2\cosh(\beta\mu H)].$$

One can next obtain the entropy either from $S = (U - F)/T$ or from $S = -(\partial F/\partial T)_{V,N,H}$, and one can verify that
the result is the same.


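It is also instructive to start from the expression for entropy $S = k_b \log(g(N, m))$ for a specific energy partition
between the two-state system and the reservoir. Using the result for $g(N, m)$ in section A2.2.2, and noting that
$E = -(2\mu H)m$, one gets (using the Stirling approximation $N! \approx (2\pi N)^{1/2} N^N e^{-N}$),

$$k_b^{-1} S = -N \left[ \left( \frac{1}{2} + \frac{m}{N} \right) \log\left( \frac{1}{2} + \frac{m}{N} \right) + \left( \frac{1}{2} - \frac{m}{N} \right) \log\left( \frac{1}{2} - \frac{m}{N} \right) \right].$$

Since $E = -(2\mu H)m$, a given spin excess value $2m$ implies a given energy partition. The free energy for such a
specific energy partition is

$$F = E - TS = -2\mu H m + N k_b T \left[ \left( \frac{1}{2} + \frac{m}{N} \right) \log\left( \frac{1}{2} + \frac{m}{N} \right) + \left( \frac{1}{2} - \frac{m}{N} \right) \log\left( \frac{1}{2} - \frac{m}{N} \right) \right].$$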

This has to be minimized with respect to $E$ or equivalently $m/N$ to obtain the thermal equilibrium result. The
value of $m/N$ that corresponds to equilibrium is found to be

$$\frac{m}{N} = \frac{1}{2} \tanh(\beta\mu H)$$

which corresponds to $\langle 2m \rangle = N\tanh(\beta\mu H)$ and leads to the same $U$ as above. It also gives the equilibrium
magnetization as $M = N\mu \tanh(\beta\mu H)$.

A2.2.4.4 APPLICATION TO IDEAL SYSTEMS: CLASSICAL IDEAL GAS

Consider a system of $N$ non-interacting point particles in a three-dimensional cubical box of volume $V = L^3$.
First consider one classical particle with energy $E = p^2/(2M)$. The partition function is

$$Q_1 = h^{-3} \int d^3q \int dp_x \int dp_y \int dp_z\; e^{-(p_x^2 + p_y^2 + p_z^2)/(2Mk_bT)} = \frac{V}{V_q} \tag{A2.2.88}$$

where the definition of the quantum volume $V_q = \lambda_T^3$ associated with the thermal de Broglie wavelength,
$\lambda_T = h/(2\pi M k_b T)^{1/2}$, is introduced. The same result is obtained using the density of states $\mathcal{D}(\epsilon)$ obtained for this
case in section A2.2.2. Even though this $\mathcal{D}(\epsilon)$ was obtained using quantum considerations, the sum over $n$ was
replaced by an integral, which is an approximation that is valid when $k_bT$ is large compared to the energy level
spacing. This high-temperature approximation leads to classical behaviour.

For an ideal gas of $N$ indistinguishable point particles one has $Q_N = Q_1^N/N! = (V/V_q)^N/N!$. For large $N$ one can
again use the Stirling approximation for $N!$ and obtain the Helmholtz free energy

$$F = -k_b T \log Q_N = -N k_b T \log\left( \frac{eV}{N V_q} \right) = N k_b T \log\left[ \frac{N}{eV} \left( \frac{h^2}{2\pi M k_b T} \right)^{3/2} \right].$$

(The term $\frac{1}{2}\log(2\pi N)\,k_bT$ is negligible compared to terms proportional to $Nk_bT$.) The entropy obtained from the
relation $S = -(\partial F/\partial T)_{N,V}$ agrees with the expression, equation (A2.2.47), obtained for the microcanonical
ensemble, and one also obtains $U = F + TS = \frac{3}{2}Nk_bT$, consistent with the equipartition law. The ideal equation
of state $P = Nk_bT/V$ is also obtained from evaluating $P = -(\partial F/\partial V)_{N,T}$. Thus one obtains the same
thermodynamic behaviour from the canonical and microcanonical ensembles. This is generally the case when
$N$ is very large since the fluctuations around the average behave as $N^{-1/2}$. A quantum ideal gas with either Fermi
or Bose statistics is treated in subsection A2.2.5.4, subsection A2.2.5.5, subsection A2.2.5.6 and subsection
A2.2.5.7.
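
The agreement between the canonical and microcanonical results can be verified numerically. The following sketch rests on stated assumptions rather than on anything prescribed by the text: it builds $F = -k_bT\log Q_N$ from $Q_N = (V/V_q)^N/N!$, using `math.lgamma` for $\log N!$ instead of the Stirling approximation, and differentiates $F$ by central finite differences to recover $S$, $P$ and $U$. The parameter values describe one mole of an argon-like gas and are purely illustrative.

```python
import math

H_PLANCK = 6.62607015e-34  # Planck constant, J s
K_B = 1.380649e-23         # Boltzmann constant, J/K


def quantum_volume(T, M):
    """V_q = lambda_T^3 with lambda_T = h / sqrt(2 pi M k_b T)."""
    return (H_PLANCK / math.sqrt(2.0 * math.pi * M * K_B * T)) ** 3


def helmholtz(T, V, N, M):
    """F = -k_b T log Q_N with Q_N = (V / V_q)^N / N!."""
    log_q = N * math.log(V / quantum_volume(T, M)) - math.lgamma(N + 1.0)
    return -K_B * T * log_q


def entropy_canonical(T, V, N, M, rel_step=1e-5):
    """S = -(dF/dT)_{N,V} by central finite difference."""
    dT = rel_step * T
    return -(helmholtz(T + dT, V, N, M) - helmholtz(T - dT, V, N, M)) / (2.0 * dT)


def pressure_canonical(T, V, N, M, rel_step=1e-5):
    """P = -(dF/dV)_{N,T} by central finite difference."""
    dV = rel_step * V
    return -(helmholtz(T, V + dV, N, M) - helmholtz(T, V - dV, N, M)) / (2.0 * dV)


def entropy_microcanonical(E, V, N, M):
    """Equation (A2.2.47), used here as the reference value."""
    arg = (V / N) * (4.0 * math.pi * M * E / (3.0 * N * H_PLANCK**2)) ** 1.5
    return N * K_B * (2.5 + math.log(arg))


# Illustrative (assumed) values
N = 6.02214076e23
M = 6.6335e-26   # kg
V = 0.0248       # m^3
T = 300.0        # K

S_can = entropy_canonical(T, V, N, M)
P_can = pressure_canonical(T, V, N, M)
U_can = helmholtz(T, V, N, M) + T * S_can
S_micro = entropy_microcanonical(1.5 * N * K_B * T, V, N, M)
```

For a macroscopic $N$ the two entropies should agree to many digits, since the neglected Stirling term is of order $\log N$ while the entropy itself is of order $N$.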

A2.2.4.5 IDEAL GAS OF DIATOMIC MOLECULES 

Consider a gas of $N$ non-interacting diatomic molecules moving in a three-dimensional system of volume $V$.
Classically, the motion of a diatomic molecule has six degrees of freedom: three translational degrees
corresponding to the centre of mass motion, two more for the rotational motion about the centre of mass and
one additional degree for the vibrational motion about the centre of mass. The equipartition law gives
$\langle E_{\rm trans} \rangle = \frac{3}{2}Nk_bT$. In a similar manner,
since the rotational Hamiltonian has rotational kinetic energy from two orthogonal angular momentum
components, in directions each perpendicular to the molecular axis, equipartition gives $\langle E_{\rm rot} \rangle = Nk_bT$. For a
rigid dumb-bell model, one would then get $\langle E_{\rm total} \rangle = \frac{5}{2}Nk_bT$, since no vibration occurs in a rigid dumb-bell.
The corresponding heat capacity per mole (where $N = N_a$ is Avogadro's number and $R = N_a k_b$ is the gas
constant) is $C_V = \frac{5}{2}R$ and $C_P = \frac{7}{2}R$. If one has a vibrating dumb-bell, the additional vibrational motion has two
quadratic terms in the associated Hamiltonian: one for the kinetic energy of vibration and another for the
potential energy as in a harmonic oscillator. The vibrational motion thus gives an additional $\langle E_{\rm vib} \rangle = Nk_bT$
from the equipartition law, which leads to $\langle E_{\rm total} \rangle = \frac{7}{2}Nk_bT$ and heat capacities per mole of $C_V = \frac{7}{2}R$ and
$C_P = \frac{9}{2}R$.

These results do not agree with experimental results. At room temperature, while the translational motion of
diatomic molecules may be treated classically, the rotation and vibration have quantum attributes. In addition,
quantum mechanically one should also consider the electronic degrees of freedom. However, typical
electronic excitation energies are very large compared to $k_bT$ (they are of the order of a few electronvolts, and
1 eV corresponds to $T \approx 10\,000$ K). Such internal degrees of freedom are considered frozen, and an electronic
cloud in a diatomic molecule is assumed to be in its ground state $\epsilon_0$ with degeneracy $g_0$. The two nuclei A and
B, which along with the electronic cloud make up the molecule, have spins $I_A$ and $I_B$, and the associated
degeneracies $(2I_A + 1)$ and $(2I_B + 1)$, respectively. If the molecule is homonuclear, A and B are
indistinguishable and, by interchanging the two nuclei, but keeping all else the same, one obtains the same
configuration. Thus for a homonuclear molecule, the configurations can be overcounted by a factor of two if
the counting scheme used is the same as that for heteronuclear molecules. Thus, the degeneracy factor in
counting the internal states of a diatomic molecule is $g = g_0(2I_A + 1)(2I_B + 1)/(1 + \delta_{AB})$, where $\delta_{AB}$ is zero for
the heteronuclear case and one for the homonuclear case.

The energy of a diatomic molecule can be divided into translational and internal contributions: €■ = (fik) l(2M) 
+ e int , and £- mt = £ Q + e rot + e yib . In the canonical ensemble for an ideal gas of diatomic molecules in thermal 

equilibrium at temperature T= (k^fi) the partition function then factorizes: 

e.v = (Ar!)- | [G trans f[2 int ] A ' 

where the single-molecule translational partition function $Q_{trans}$ is the same as $Q_1$ in equation (A2.2.88) and the single-molecule internal partition function is

$$Q_{int} = g\,e^{-\beta\epsilon_0}\,Q_{rot}\,Q_{vib}.$$


The rotational and vibrational motions of the nuclei are uncoupled, to a good approximation, on account of a 
mismatch in time scales, with vibrations being much faster than the rotations (electronic motions are even 
faster than the vibrational ones). One typically models these as a rigid rotation plus a harmonic oscillation, 
and obtains the energy eigenstates for such a model diatomic molecule. The resulting vibrational states are 
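non-degenerate, are characterized by a vibrational quantum number $v = 0, 1, 2, \ldots$ and have an energy $\epsilon_{vib} = \epsilon_v = (\tfrac{1}{2} + v)\hbar\omega_0$, where $\omega_0$ is the characteristic vibrational frequency. Thus

$$Q_{vib} = \sum_{v=0}^{\infty} e^{-\beta(v + 1/2)\hbar\omega_0} = \frac{e^{-\beta\hbar\omega_0/2}}{1 - e^{-\beta\hbar\omega_0}}.$$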


The rotational states are characterized by a quantum number $J = 0, 1, 2, \ldots$, are degenerate with degeneracy $(2J + 1)$ and have energy $\epsilon_{rot} = \epsilon_J = J(J + 1)\hbar^2/(2I_0)$, where $I_0$ is the molecular moment of inertia. Thus

$$Q_{rot} = \sum_{J=0}^{\infty}(2J + 1)\,e^{-J(J+1)\theta_r/T}$$

where $\theta_r = \hbar^2/(2I_0k_b)$. If the spacing between the rotational levels is small compared to $k_bT$, i.e. if $T \gg \theta_r$, the sum can be replaced by an integral (this is appropriate for heavy molecules and is a good approximation for molecules other than hydrogen):

$$Q_{rot} \approx \int_0^{\infty} dJ\,(2J + 1)\,e^{-J(J+1)\theta_r/T} = \frac{T}{\theta_r}$$

which is the high-temperature, or classical, limit. A better evaluation of the sum is obtained with the use of the 
Euler-Maclaurin formula: 

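$$\sum_{n=0}^{\infty} f(n) = \int_0^{\infty} f(x)\,dx + \tfrac{1}{2}f(0) - \tfrac{1}{12}f'(0) + \tfrac{1}{720}f'''(0) - \cdots$$

Putting $f(J) = (2J + 1)\exp(-J(J + 1)\theta_r/T)$, one obtains

$$Q_{rot} = \frac{T}{\theta_r} + \frac{1}{3} + \frac{1}{15}\frac{\theta_r}{T} + \frac{4}{315}\left(\frac{\theta_r}{T}\right)^2 + \cdots$$

If $T \ll \theta_r$, then only the first few terms in the sum need to be retained:

$$Q_{rot} = 1 + 3e^{-2\theta_r/T} + 5e^{-6\theta_r/T} + \cdots$$
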

Once the partition function is evaluated, the contributions of the internal motion to thermodynamics can be 
evaluated. $Q_{int}$ depends only on $T$, and has no effect on the pressure. Its effect on the heat capacity $C_V$ can be obtained from the general expression $C_V = (k_bT^2)^{-1}(\partial^2\log Q_N/\partial\beta^2)$. Since the partition function factorizes, its logarithm and, hence, the heat capacity reduce to additive contributions from the translational, rotational and vibrational motions: $C_V = C_V^{trans} + C_V^{rot} + C_V^{vib}$, where the translational motion (treated classically) yields $C_V^{trans} = \tfrac{3}{2}Nk_b$.

The rotational part at high temperatures gives


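$$C_V^{rot} = Nk_b\left[1 + \frac{1}{45}\left(\frac{\theta_r}{T}\right)^2 + \frac{16}{945}\left(\frac{\theta_r}{T}\right)^3 + \cdots\right]$$

which shows that $C_V^{rot}$ decreases at high $T$, reaching the classical equipartition value from above at $T = \infty$. At low temperatures,

$$C_V^{rot} \approx 12Nk_b\left(\frac{\theta_r}{T}\right)^2 e^{-2\theta_r/T}$$

so that as $T \to 0$, $C_V^{rot}$ drops to zero exponentially. The vibrational contribution $C_V^{vib}$ is given by

$$C_V^{vib} = Nk_b\left(\frac{\theta_v}{T}\right)^2\frac{e^{\theta_v/T}}{(e^{\theta_v/T} - 1)^2}, \qquad \theta_v = \frac{\hbar\omega_0}{k_b}.$$

For $T \gg \theta_v$, $C_V^{vib}$ is very nearly $Nk_b$, the equipartition value, and for $T \ll \theta_v$, $C_V^{vib}$ tends to zero as $(\theta_v/T)^2\exp(-\theta_v/T)$. For most diatomic molecules $\theta_v$ is of the order of 1000 K and $\theta_r$ is less than 100 K. For HCl, $\theta_r = 15$ K; for N$_2$, O$_2$ and NO it is between 2 and 3 K; for H$_2$, D$_2$ and HD it is, respectively, 85, 43 and 64 K. Thus, at room temperature, the rotational contribution could be nearly $Nk_b$ and the vibrational contribution could be only a few per cent of the equipartition value. Figure A2.2.2 shows the temperature dependence of $C_V$ for HD, HT and DT, various isotopic forms of the hydrogen molecule.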
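
The rotational and vibrational results above lend themselves to a quick numerical check. The following is a minimal sketch, not part of the original text: it works in reduced temperatures ($t = T/\theta_r$ or $T/\theta_v$), and the function names and the cut-off `j_max` are illustrative assumptions. It compares the direct rotational sum with the Euler-Maclaurin series and with the three-term low-temperature form, and evaluates the vibrational heat capacity per molecule in units of $k_b$.

```python
import math

def q_rot_direct(t, j_max=2000):
    """Direct sum over J of (2J+1) exp(-J(J+1)/t), where t = T/theta_r.

    j_max is an assumed cut-off; it is ample for t of order 100 or less.
    """
    return sum((2 * j + 1) * math.exp(-j * (j + 1) / t) for j in range(j_max + 1))

def q_rot_high_t(t):
    """Euler-Maclaurin series: t + 1/3 + (1/15)/t + (4/315)/t^2."""
    return t + 1.0 / 3.0 + 1.0 / (15.0 * t) + 4.0 / (315.0 * t * t)

def q_rot_low_t(t):
    """First three terms (J = 0, 1, 2) of the sum, useful for T << theta_r."""
    return 1.0 + 3.0 * math.exp(-2.0 / t) + 5.0 * math.exp(-6.0 / t)

def c_vib(t):
    """Vibrational heat capacity per molecule in units of k_b, t = T/theta_v.

    Written as x^2 e^{-x} / (1 - e^{-x})^2, which equals
    x^2 e^{x} / (e^{x} - 1)^2 but avoids overflow at large x = 1/t.
    """
    x = 1.0 / t
    return x * x * math.exp(-x) / math.expm1(-x) ** 2

if __name__ == "__main__":
    for t in (0.2, 1.0, 10.0):
        print(t, q_rot_direct(t), q_rot_high_t(t), q_rot_low_t(t))
    print(c_vib(0.1), c_vib(100.0))
```

At $t = 10$ the series and the direct sum agree closely, while at $t = 0.2$ the three-term form is essentially exact; near $t \approx 1$ neither approximation is reliable and the direct sum should be used.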

Figure A2.2.2. The rotational-vibrational specific heat, $C_V$, of the diatomic gases HD, HT and DT as a function of temperature. From Statistical Mechanics by Raj Pathria. Reprinted by permission of Butterworth Heinemann.

A2.2.4.6 APPLICATION TO IDEAL SYSTEMS: BLACK BODY RADIATION 

This subsection, and the next, deal with a system of $N$ non-interacting harmonic oscillators.

Electromagnetic radiation in thermal equilibrium within a cavity is often referred to as black-body radiation. A classical black hole is an ideal black body. Our own star, the Sun, is pretty black! A perfect black body absorbs all radiation that falls onto it. By Kirchhoff's law, which states that 'a body must emit at the same rate as it absorbs radiation if equilibrium is to be maintained', the emissivity of a black body is highest. As shown below, the use of classical statistical mechanics leads to an infinite emissivity from a black body. Planck quantized the standing wave modes of the electromagnetic radiation within a black-body cavity and solved this anomaly. He considered the distribution of energy $U$ among $N$ oscillators of frequency $\omega$. If $U$ is viewed as divisible without limit, then an infinite number of distributions are possible. Planck considered '$U$ as made up of an entirely determined number of finite equal parts' of value $\hbar\omega$. This quantization of the electromagnetic radiation leads to the concept of photons, energy quanta $\hbar\omega$, each of which has a Hamiltonian of the form of a harmonic oscillator. A state of the free electromagnetic field is specified by the number, $n$, for each of such oscillators, and $n$ then corresponds to the number of photons in a state with energy $\hbar\omega$. Photons obey Bose-Einstein statistics. Denote by $n_j$ the number of photons with energy $\epsilon_j = \hbar\omega_j$.

Then $n_j = 0, 1, 2, \ldots$ and the canonical partition function is

$$Q = \sum_{n_1, n_2, \ldots} e^{-\beta(n_1\epsilon_1 + n_2\epsilon_2 + \cdots + n_j\epsilon_j + \cdots)}.$$

Here the zero point energy is temporarily suppressed. Now the exponential is a product of independent factors. Thus one gets

$$Q = \prod_j\left(\sum_{n_j=0}^{\infty} e^{-\beta n_j\epsilon_j}\right) = \prod_j\left(\frac{1}{1 - e^{-\beta\epsilon_j}}\right) \tag{A2.2.89}$$

on account of the geometric nature of the series being summed. One should note that photons are massless and their total number is indeterminate. Since $\log Q = -\beta A$, one can obtain various properties of the photon gas. Specifically consider the average occupation number of the $j$th state:

$$\langle n_j\rangle = \frac{\sum_{n_1, n_2, \ldots} n_j\,e^{-\beta(n_1\epsilon_1 + n_2\epsilon_2 + \cdots)}}{Q} = \frac{\sum_{n_j} n_j\,e^{-\beta n_j\epsilon_j}}{\sum_{n_j} e^{-\beta n_j\epsilon_j}} = \frac{1}{e^{\beta\epsilon_j} - 1} = \frac{1}{e^{\beta\hbar\omega_j} - 1}.$$

This is the Planck distribution function. The thermal average energy in the $j$th mode is (including the zero point energy)

$$\langle\epsilon_j\rangle = \frac{\hbar\omega_j}{2} + \frac{\hbar\omega_j}{e^{\beta\hbar\omega_j} - 1}.$$

Since for small $y = \beta\hbar\omega_j$, $(\exp(y) - 1)^{-1} \approx [y(1 + y/2 + \cdots)]^{-1} \approx y^{-1}(1 - y/2) = (y^{-1} - 1/2)$, one obtains, when $\epsilon_j \ll k_bT$, the result for the high-temperature limit: $\langle\epsilon_j\rangle \to k_bT$. This is also the average energy for a classical harmonic oscillator with two quadratic degrees of freedom (one kinetic and one potential) in the Hamiltonian, an equipartition result. For low temperatures one has $\epsilon_j \gg k_bT$ and $\langle\epsilon_j\rangle \to (\tfrac{1}{2}\hbar\omega_j + \hbar\omega_j e^{-\beta\hbar\omega_j})$. The oscillator settles down in the ground state at zero temperature.
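
The two limits of the mode energy can be checked numerically. The sketch below is illustrative only (the function name is an assumption): it evaluates the mean energy of one mode, zero point term included, in units of $k_bT$ as a function of $y = \beta\hbar\omega_j$.

```python
import math

def mean_energy(y):
    """<eps_j>/(k_b T) for y = hbar*omega_j/(k_b T), zero point energy included.

    Uses y/2 + y/(e^y - 1); expm1 keeps the small-y limit accurate.
    """
    return y / 2.0 + y / math.expm1(y)

if __name__ == "__main__":
    # High temperature (small y): the value approaches 1, i.e. <eps_j> -> k_b T.
    print(mean_energy(0.01))
    # Low temperature (large y): <eps_j>/(hbar*omega_j) approaches 1/2.
    print(mean_energy(20.0) / 20.0)
```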

Any cavity contains an infinite number of electromagnetic modes. For radiation confined to a perfectly 
conducting cubical cavity of volume $V = L^3$, the modes are given by the electric field components of the form:

$$E_x = E_{x0}\sin\omega t\,\cos(n_x\pi x/L)\,\sin(n_y\pi y/L)\,\sin(n_z\pi z/L)$$
$$E_y = E_{y0}\sin\omega t\,\sin(n_x\pi x/L)\,\cos(n_y\pi y/L)\,\sin(n_z\pi z/L)$$
$$E_z = E_{z0}\sin\omega t\,\sin(n_x\pi x/L)\,\sin(n_y\pi y/L)\,\cos(n_z\pi z/L).$$

Within the cavity $\nabla\cdot\mathbf{E} = 0$, which in Fourier space is $\mathbf{k}\cdot\mathbf{E} = 0$. Thus, only two of the three components of $\mathbf{E}$ are independent. The electromagnetic field in a cavity is a transversely polarized field with two independent polarization directions, which are mutually perpendicular and are each normal to the propagation direction $\mathbf{k}$ of the $\mathbf{E}$ field, which satisfies the electromagnetic wave equation, $c^2\nabla^2\mathbf{E} = \partial^2\mathbf{E}/\partial t^2$. Substituting the form of the $\mathbf{E}$ field above, one gets

$$c^2\pi^2n^2 = \omega^2L^2 \quad\text{where}\quad n = (n_x^2 + n_y^2 + n_z^2)^{1/2}$$

so that the quantized photon modes have frequencies of the form $\omega_n = n\pi c/L$. The total energy of the photons in the cavity is then

$$U = \sum_n\langle\epsilon_n\rangle = \sum_n\frac{\hbar\omega_n}{e^{\beta\hbar\omega_n} - 1}.$$

Here the zero point energy is ignored, which is appropriate at reasonably large temperatures when the average occupation number is large. In such a case one can also replace the sum over $n$ by an integral. Each of the triplet $(n_x, n_y, n_z)$ can take the values $0, 1, 2, \ldots, \infty$. Thus the sum over $(n_x, n_y, n_z)$ can be replaced by an integral over the volume element $dn_x\,dn_y\,dn_z$, which is equivalent to an integral in the positive octant of the three-dimensional $n$-space. Since there are two independent polarizations for each triplet $(n_x, n_y, n_z)$, one has

$$\sum_n(\cdots) = 2\cdot\frac{1}{8}\int_0^{\infty} 4\pi n^2\,dn\,(\cdots).$$

Then

$$U = \pi\int_0^{\infty} dn\,n^2\,\frac{\hbar\omega_n}{e^{\beta\hbar\omega_n} - 1} = V\frac{\hbar}{\pi^2c^3}\int_0^{\infty} d\omega\,\frac{\omega^3}{e^{\beta\hbar\omega} - 1} = V\int_0^{\infty} d\omega\,u_\omega.$$

Since $\int_0^{\infty} dx\,x^3/(e^x - 1) = \pi^4/15$, one obtains the result for the energy per unit volume as

$$\frac{U}{V} = \frac{\pi^2k_b^4}{15\hbar^3c^3}\,T^4. \tag{A2.2.90}$$

This is known as the Stefan-Boltzmann law of radiation. If in this calculation of total energy $U$ one uses the classical equipartition result $\langle\epsilon_n\rangle = k_bT$, one encounters the integral $\int_0^{\infty} d\omega\,\omega^2$, which is infinite. This divergence, which is the Rayleigh-Jeans result, was one of the historical results which collectively led to the inevitability of a quantum hypothesis. This divergence is also the cause of the infinite emissivity prediction for a black body according to classical mechanics.


The quantity $u_\omega$ introduced above is the spectral density defined as the energy per unit volume per unit frequency range and is

$$u_\omega = \frac{\hbar}{\pi^2c^3}\,\frac{\omega^3}{e^{\beta\hbar\omega} - 1}. \tag{A2.2.91}$$


This is known as the Planck radiation law. Figure A2.2.3 shows this spectral density function. The surface 
temperature of a hot body such as a star can be estimated by approximating it by a black body and measuring 
the frequency at which the maximum emission of radiant energy occurs. It can be shown that the maximum of 
the Planck spectral density occurs at $\hbar\omega_{max}/(k_bT) \approx 2.82$. So a measurement of $\omega_{max}$ yields an estimate of the temperature of the hot body. From the total energy $U$, one can also obtain the entropy of the photon gas (black-body radiation). At constant volume, $dS = dU/T = [4\pi^2k_b^4V/(15\hbar^3c^3)]\,T^2\,dT$. This can be integrated with the result

$$S = \frac{4\pi^2k_b^4V}{45\hbar^3c^3}\,T^3.$$

The constant of integration is zero: at zero temperature all the modes go to the unique non-degenerate ground 
state corresponding to the zero point energy. For this state S ~ log(g) = log(l) = 0, a confirmation of the Third 
Law of Thermodynamics for the photon gas. 
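
Two numbers quoted in this subsection, the integral $\pi^4/15$ and the position of the maximum of the Planck spectral density near 2.82, can be reproduced with a few lines of code. This is a minimal sketch with hypothetical helper names; it uses a composite Simpson rule with an assumed finite upper limit in place of infinity, and locates the maximum from the stationarity condition $x = 3(1 - e^{-x})$, which follows from setting the derivative of $x^3/(e^x - 1)$ to zero.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def planck_shape(x):
    """x^3/(e^x - 1): frequency dependence of u_omega, x = beta*hbar*omega."""
    return x ** 3 / math.expm1(x) if x > 0.0 else 0.0

def wien_peak(iterations=60):
    """Maximum of planck_shape, found by iterating x = 3(1 - e^{-x})."""
    x = 3.0
    for _ in range(iterations):
        x = 3.0 * (1.0 - math.exp(-x))
    return x

if __name__ == "__main__":
    # The upper limit 60 stands in for infinity; the neglected tail is negligible.
    print(simpson(planck_shape, 0.0, 60.0, 6000), math.pi ** 4 / 15.0)
    print(wien_peak())
```

Given a measured $\omega_{max}$, the temperature estimate described in the text is then $T \approx \hbar\omega_{max}/(2.82\,k_b)$.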

Figure A2.2.3. Planck spectral density function as a function of the dimensionless frequency $\hbar\omega/(k_bT)$.

A2.2.4.7 APPLICATION TO IDEAL SYSTEMS: ELASTIC WAVES IN A SOLID 


The energy of an elastic wave in a solid is quantized just as the energy of an electromagnetic wave in a cavity. 


The quanta of the elastic wave energy are called phonons. The thermal average number of phonons in an elastic wave of frequency $\omega$ is given, just as in the case of photons, by

$$\langle n(\omega)\rangle = (\exp(\beta\hbar\omega) - 1)^{-1}.$$


Phonons are normal modes of vibration of a low-temperature solid, where the atomic motions around the 
equilibrium lattice can be approximated by harmonic vibrations. The coupled atomic vibrations can be 
diagonalized into uncoupled normal modes (phonons) if a harmonic approximation is made. In the simplest 
analysis of the contribution of phonons to the average internal energy and heat capacity one makes two 
assumptions: (i) the frequency of an elastic wave is independent of the strain amplitude and (ii) the velocities 
of all elastic waves are equal and independent of the frequency, direction of propagation and the direction of 
polarization. These two assumptions are used below for all the modes and lead to the famous Debye model.

There are differences between photons and phonons: while the total number of photon modes in a cavity is infinite, the number of elastic modes in a finite solid is finite and equals $3N$ if there are $N$ atoms in a three-dimensional solid. Furthermore, an elastic wave has three possible polarizations, two transverse and one longitudinal, in contrast to only two transverse polarizations for photons. Thus the sum of a quantity over all phonon modes is approximated by

$$\sum_n(\cdots) = \frac{3}{8}\int_0^{n_D} 4\pi n^2\,dn\,(\cdots)$$

where the maximum number $n_D$ is obtained from the constraint that the total number of phonon modes is $3N$:

$$\frac{3}{8}\int_0^{n_D} 4\pi n^2\,dn = 3N$$

which gives $n_D = (6N/\pi)^{1/3}$. Keeping in mind the differences noted above, the total thermal energy contributed by phonons can be calculated in a manner analogous to that used above for photons. In place of the velocity of light $c$, one has the velocity of sound $v$ and $\omega_n = n\pi v/L$. The maximum value $n_D$ corresponds to the highest allowed mode frequency $\omega_D = n_D\pi v/L$, and $\omega_D$ is referred to as the Debye frequency. The calculation for $U$ then proceeds as


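$$U = \sum_n\frac{\hbar\omega_n}{e^{\beta\hbar\omega_n} - 1} = \frac{3\pi}{2}\int_0^{n_D} dn\,n^2\,\frac{\hbar\omega_n}{e^{\beta\hbar\omega_n} - 1}$$

$$\phantom{U} = \frac{3V}{2\pi^2v^3}\int_0^{\omega_D} d\omega\,\frac{\hbar\omega^3}{e^{\beta\hbar\omega} - 1} = \frac{3V}{2\pi^2v^3\hbar^3\beta^4}\int_0^{x_D} dx\,\frac{x^3}{e^x - 1}.$$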


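The upper limit of the dimensionless variable, $x_D$, is typically written in terms of the Debye temperature $\theta_D$ as $x_D = \theta_D/T$, where using $x_D = \beta\hbar\omega_D = \beta\hbar\pi vn_D/L = \beta\hbar v(6\pi^2N/V)^{1/3}$, one identifies the Debye temperature as

$$\theta_D = (\hbar v/k_b)(6\pi^2N/V)^{1/3}. \tag{A2.2.92}$$

Since $\omega_D^3 = 6\pi^2v^3N/V$, one can also write

$$U = \int_0^{\omega_D} d\omega\,g(\omega)\,\frac{\hbar\omega}{e^{\beta\hbar\omega} - 1} \tag{A2.2.93}$$

where $g(\omega)\,d\omega$ is the number of phonon states with a frequency between $\omega$ and $\omega + d\omega$, and is given by

$$g(\omega) = \frac{9N\omega^2}{\omega_D^3} \quad (\omega \le \omega_D). \tag{A2.2.94}$$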

$g(\omega)$ is essentially the density of states and the above expression corresponds to the Debye model.

In general, the phonon density of states $g(\omega)\,d\omega$ is a complicated function which can be directly measured from experiments, or can be computed from the results of computer simulations of a crystal. The explicit analytic expression of $g(\omega)$ for the Debye model is a consequence of the two assumptions that were made above for the frequency and velocity of the elastic waves. An even simpler assumption about $g(\omega)$ leads to the Einstein model, which first showed how quantum effects lead to deviations from the classical equipartition result as seen experimentally. In the Einstein model, one assumes that only one level at frequency $\omega_E$ is appreciably populated by phonons so that $g(\omega) = \delta(\omega - \omega_E)$ and $U = \hbar\omega_E/(e^{\beta\hbar\omega_E} - 1)$ for each of the Einstein modes. $\hbar\omega_E/k_b$ is called the Einstein temperature $\theta_E$.

High-temperature behaviour. Consider $T$ much higher than a characteristic temperature like $\theta_D$ or $\theta_E$. Since $\beta\hbar\omega$ is then small compared to 1, one can expand the exponential to obtain

$$\frac{1}{e^{\beta\hbar\omega} - 1} \approx \frac{1}{\beta\hbar\omega}$$

and

$$U = \sum_n\frac{\hbar\omega_n}{e^{\beta\hbar\omega_n} - 1} \approx k_bT\sum_n 1 = 3Nk_bT \tag{A2.2.95}$$

as expected by the equipartition law. This leads to a value of $3Nk_b$ for the heat capacity $C_V$. This is known as the law of Dulong and Petit.

Low-temperature behaviour. In the Debye model, when $T \ll \theta_D$, the upper limit, $x_D$, can be approximately replaced by $\infty$; the integral over $x$ then has the value $\pi^4/15$ and the total phonon energy reduces to

$$U(T) = \frac{3V}{2\pi^2v^3\hbar^3\beta^4}\,\frac{\pi^4}{15} = \frac{3\pi^4Nk_bT^4}{5\theta_D^3}$$

proportional to $T^4$. This leads to the heat capacity, for $T \ll \theta_D$,

$$C_V = \frac{12\pi^4Nk_b}{5\theta_D^3}\,T^3. \tag{A2.2.96}$$

This result is called the Debye $T^3$ law. Figure A2.2.4 compares the experimental and Debye model values for the heat capacity $C_V$. It also gives Debye temperatures for various solids. One can also evaluate $C_V$ for the Einstein model: as expected it approaches the equipartition result at high temperatures but decays exponentially to zero as $T$ goes to zero.
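
The two limits of the Debye model, and the contrast with the Einstein model, can be seen numerically. The sketch below is an illustration rather than part of the original treatment. It assumes the heat capacity obtained by differentiating the Debye energy integral with respect to $T$, namely $C_V/(3Nk_b) = 3t^3\int_0^{1/t} dx\,x^4e^x/(e^x - 1)^2$ with $t = T/\theta_D$, and the Einstein form $C_V/(3Nk_b) = x^2e^x/(e^x - 1)^2$ with $x = \theta_E/T$. Function names are hypothetical.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def debye_cv(t, n=2000):
    """C_V/(3 N k_b) in the Debye model at reduced temperature t = T/theta_D."""
    def integrand(x):
        if x == 0.0:
            return 0.0
        # x^4 e^x/(e^x - 1)^2 rewritten to avoid overflow at large x
        return x ** 4 * math.exp(-x) / math.expm1(-x) ** 2
    return 3.0 * t ** 3 * simpson(integrand, 0.0, 1.0 / t, n)

def einstein_cv(t):
    """C_V/(3 N k_b) in the Einstein model at t = T/theta_E."""
    x = 1.0 / t
    return x * x * math.exp(-x) / math.expm1(-x) ** 2

if __name__ == "__main__":
    for t in (0.02, 0.05, 0.2, 1.0, 10.0):
        t3_law = (4.0 * math.pi ** 4 / 5.0) * t ** 3  # Debye T^3 law, same units
        print(t, debye_cv(t), t3_law, einstein_cv(t))
```

At $t = 10$ both models are close to the Dulong and Petit value of 1 in these units, while at $t = 0.05$ the Debye result follows the $T^3$ law and the Einstein result is already exponentially small.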

The Debye model is more appropriate for the acoustic branches of the elastic modes of a harmonic solid. For molecular solids one has in addition optical branches in the elastic wave dispersion, and the Einstein model is more appropriate to describe the contribution to $U$ and $C_V$ from the optical branch. The above discussion for phonons is suitable for non-metallic solids. In metals, one has, in addition, the contribution from the electronic motion to $U$ and $C_V$. This is discussed later, in section (A2.2.5.6).

Figure A2.2.4. Experimental and Debye values for the heat capacity $C_V$. From Born and Huang.


A2.2.5 GRAND CANONICAL ENSEMBLE 

Now consider two systems that are in thermal and diffusive contact, such that there can be sharing of both energy and particles between the two. Again let I be the system and II be a much larger reservoir. Since the composite system is isolated, one has the situation in which the volumes of the two are fixed at $V'$ and $V''$, respectively, and the total energy and total number of particles are shared: $E_l = E'_{l'} + E''_{l''}$, where $l = (l', l'')$, and $N = N' + N''$. We shall use the notation $E = E' + E''$ for the former of these two constraints. For a given partition the number of allowed microstates of the system I is given by $\Gamma_I(E', N')$ and that for the system II by $\Gamma_{II}(E'', N'') = \Gamma_{II}(E - E', N - N')$. Then the total number of allowed microstates for the composite system, subject to the two constraints, is

$$\Gamma_C(E, N) = \sum_{E', N'}\Gamma_I(E', N')\,\Gamma_{II}(E - E', N - N').$$

Among all possible partitions in the above expression, the equilibrium partition corresponds to the most probable partition, for which $d\Gamma_C = 0$. Evaluating this differential yields the following relation:

$$\left[\frac{\partial\log\Gamma_I}{\partial E'} - \frac{\partial\log\Gamma_{II}}{\partial E''}\right]dE' + \left[\frac{\partial\log\Gamma_I}{\partial N'} - \frac{\partial\log\Gamma_{II}}{\partial N''}\right]dN' = 0.$$

Since $E'$ and $N'$ are independent variables, their variations are arbitrary. Hence, for the above equality to be satisfied, each of the two bracketed expressions must vanish when the $(E, N)$ partition is most probable. The vanishing of the coefficient of $dE'$ implies the equality of temperatures of I and II, consistent with thermal equilibrium:

$$\beta_I = \frac{\partial\log\Gamma_I}{\partial E'} = \frac{\partial\log\Gamma_{II}}{\partial E''} = \beta_{II}. \tag{A2.2.97}$$

The result that the coefficient of $dN'$ is zero for the most probable partition is the consequence of the chemical equilibrium between the system and the reservoir. It leads us to identify the chemical potential $\mu$ as

$$\frac{\partial\log\Gamma}{\partial N} = -\beta\mu \tag{A2.2.98}$$

in analogy to the thermodynamic definition. Then, since $\beta_I = \beta_{II}$, the vanishing of the coefficient of $dN'$ leads to the equality of chemical potentials: $\mu_I = \mu_{II}$. In a manner similar to that used to obtain the canonical distribution, one can expand


$$\Gamma_{II}(E - E', N - N') = \exp[S_{II}(E - E', N - N')/k_b]$$

$$\approx \Gamma_{II}(E, N)\exp\left(-\frac{E'}{k_bT} + \frac{\mu N'}{k_bT}\right)$$

$$\propto \exp[-\beta(E' - \mu N')].$$

With this result and arguments similar to those used in the last section, one finds the grand canonical ensemble distribution as (quantum mechanically)

$$\rho = \frac{\exp(-\beta[\hat{H} - \mu\hat{N}])}{\sum_{N=0}^{\infty}\mathrm{Tr}[\exp(-\beta[\hat{H} - \mu\hat{N}])]}. \tag{A2.2.99}$$

The corresponding classical distribution is

$$\rho(p, q; N)\,d^{2f}\Omega = \frac{e^{-\beta[H(p,q) - \mu N]}\,d^{2f}\Omega}{N!\,h^f\,\Xi(\beta, \mu, V)} \tag{A2.2.100}$$

where $f$ is the total number of degrees of freedom if the system has $N$ particles, and the grand partition function $\Xi(\beta, \mu, V)$ is given by

$$\Xi(\beta, \mu, V) = \sum_{N=0}^{\infty}\frac{1}{N!\,h^f}\int e^{-\beta[H(p,q) - \mu N]}\,d^{2f}\Omega \tag{A2.2.101}$$

which is the classical analogue of

$$\sum_{N=0}^{\infty}\sum_l\exp(-\beta[E_l - \mu N]) = \sum_{N=0}^{\infty}\mathrm{Tr}[\exp(-\beta[\hat{H} - \mu\hat{N}])]. \tag{A2.2.102}$$

In the above, the sum over $N$ has the upper limit of infinity. This is clearly correct in the thermodynamic limit. However, for a system with finite volume, $V$, depending on the 'hard core size' of its constituents, there will be a maximum number of particles, $M(V)$, that can be packed in volume $V$. Then, for all $N$ such that $N > M(V)$, the value of $\beta W$, with $W$ the interaction energy, becomes infinite and all terms in the $N$ sum with $N > M(V)$ vanish. Thus, provided the inter-particle interactions contain a strongly repulsive part, the $N$ sum in the above discussion can be extended to infinity.

If, in this ensemble, one wants to find only the probability that the system has $N$ particles, one sums the distribution over the energy microstates to obtain:

$$P(N) = \frac{e^{\beta\mu N}\,Q_N(\beta, V)}{\Xi(\beta, \mu, V)}. \tag{A2.2.103}$$

The combination $e^{\beta\mu}$ occurs frequently. It is called the fugacity and is denoted by $z$. The grand canonical ensemble is also known as the $T$-$\mu$ ensemble.
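
As a concrete illustration of the number distribution, consider a classical ideal gas, for which the canonical partition function takes the form $Q_N = Q_1^N/N!$. That form is an assumption of this sketch, not something derived in this subsection. The grand partition function then sums to $\Xi = \exp(zQ_1)$ and the probability of finding $N$ particles is a Poisson distribution with mean $zQ_1$. The function names and the parameter `z_q1` (the product $zQ_1$) are illustrative.

```python
import math

def number_distribution(z_q1, n_max):
    """P(N) = z^N Q_N / Xi for Q_N = Q_1^N / N!, where z_q1 = z * Q_1 > 0.

    For this model log Xi = z * Q_1, so P(N) is Poisson with mean z_q1.
    Logarithms are used to avoid overflow in z^N and N!.
    """
    log_xi = z_q1
    return [
        math.exp(n * math.log(z_q1) - math.lgamma(n + 1) - log_xi)
        for n in range(n_max + 1)
    ]

def mean_and_variance(p):
    """Mean and variance of N for a list of probabilities p[N]."""
    mean = sum(n * pn for n, pn in enumerate(p))
    var = sum((n - mean) ** 2 * pn for n, pn in enumerate(p))
    return mean, var

if __name__ == "__main__":
    p = number_distribution(50.0, 400)
    mean, var = mean_and_variance(p)
    # The fractional rms fluctuation equals 1/sqrt(<N>) for this model.
    print(sum(p), mean, var, math.sqrt(var) / mean)
```

For this ideal model the variance equals the mean, so the relative width of the distribution shrinks as the inverse square root of the average particle number.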


A2.2.5.1 T-P ENSEMBLE 


In many experiments the sample is in thermodynamic equilibrium, held at constant temperature and pressure, 
and various properties are measured. For such experiments, the T - P ensemble is the appropriate description. 
In this case the system has fixed $N$ and shares energy and volume with the reservoir: $E = E' + E''$ and $V = V' + V''$, i.e. the system and the reservoir are connected by a pressure-transmitting movable diathermic membrane which enables the sharing of the energy and the volume. The most probable partition leads to the conditions for thermal (equality of temperatures) and mechanical (equality of pressures) equilibria. The latter condition is obtained after identifying pressure $P$ as

$$\frac{\partial\log\Gamma}{\partial V} = \beta P. \tag{A2.2.104}$$

The $T$-$P$ ensemble distribution is obtained in a manner similar to the grand canonical distribution as (quantum mechanically)

$$\rho = \frac{\exp(-\beta[\hat{H} + PV])}{\int_0^{\infty} dV\,\mathrm{Tr}[\exp(-\beta[\hat{H} + PV])]} \tag{A2.2.105}$$

and classically as

$$\rho(p, q; V)\,d^{2f}\Omega = \frac{e^{-\beta[H(p,q;V) + PV]}\,d^{2f}\Omega}{N!\,h^f\,Y(T, P, N)} \tag{A2.2.106}$$

where the $T$-$P$ partition function $Y(T, P, N)$ is given by

$$Y(T, P, N) = \frac{1}{N!\,h^f}\int_0^{\infty} dV\int e^{-\beta[H(p,q;V) + PV]}\,d^{2f}\Omega. \tag{A2.2.107}$$

Its quantum mechanical analogue is

$$Y(T, P, N) = \int_0^{\infty} dV\sum_l\exp(-\beta[E_l(V) + PV]) \tag{A2.2.108}$$

$$\phantom{Y(T, P, N)} = \int_0^{\infty} dV\,\mathrm{Tr}[\exp(-\beta[\hat{H} + PV])]. \tag{A2.2.109}$$

The $T$-$P$ partition function can also be written in terms of the canonical partition function $Q_N$ as:

$$Y(T, P, N) = \int_0^{\infty} Q_N(\beta, V)\,e^{-\beta PV}\,dV \tag{A2.2.110}$$

and the probability that the system will have a volume between $V$ and $V + dV$ is given by

$$P(V)\,dV = \frac{Q_N(\beta, V)\,e^{-\beta PV}\,dV}{Y(T, P, N)}. \tag{A2.2.111}$$

From the canonical ensemble where $(V, T, N)$ are held fixed, one needs to change $V$ to $P$ as an independent variable in order to obtain the $T$-$P$ ensemble where $(P, T, N)$ are fixed. This change is done through a Legendre transform, $G = A + PV$, which replaces the Helmholtz free energy by the Gibbs free energy as the relevant thermodynamic potential for the $T$-$P$ ensemble. Now, the internal energy $U$ and its natural independent variables $S$, $V$ and $N$ are all extensive quantities, so that for an arbitrary constant $a$,

$$U(aS, aV, aN) = aU(S, V, N). \tag{A2.2.112}$$

Differentiating both sides with respect to $a$ and using the differential form of the First Law, $dU = T\,dS - P\,dV + \mu\,dN$, one obtains the Gibbs-Duhem equation:

$$U = TS - PV + \mu N \tag{A2.2.113}$$

which implies that $G = A + PV = U - TS + PV = \mu N$. The connection to thermodynamics in the $T$-$P$ ensemble is made by the identification

$$G(T, P, N) = -k_bT\log Y(T, P, N). \tag{A2.2.114}$$

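The average value and root mean square fluctuations in volume $V$ of the $T$-$P$ ensemble system can be computed from the partition function $Y(T, P, N)$:

$$\langle V\rangle = -k_bT\left(\frac{\partial\log Y}{\partial P}\right)_{T,N} \tag{A2.2.115}$$

$$\langle(V - \langle V\rangle)^2\rangle = (k_bT)^2\left(\frac{\partial^2\log Y}{\partial P^2}\right)_{T,N} = -k_bT\left(\frac{\partial\langle V\rangle}{\partial P}\right)_{T,N}. \tag{A2.2.116}$$

The entropy $S$ can be obtained from

$$S = -\left(\frac{\partial G}{\partial T}\right)_{P,N} = k_b\log Y + k_bT\left(\frac{\partial\log Y}{\partial T}\right)_{P,N}. \tag{A2.2.117}$$
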

A2.2.5.2 THERMODYNAMICS IN A GRAND CANONICAL ENSEMBLE 

In a canonical ensemble, the system is held at fixed $(V, T, N)$. In a grand canonical ensemble the $(V, T, \mu)$ of the system are fixed. The change from $N$ to $\mu$ as an independent variable is made by a Legendre transformation in which the dependent variable $A$, the Helmholtz free energy, is replaced by the grand potential

$$\Omega_G = A - \mu N = U - TS - \mu N = -PV. \tag{A2.2.118}$$

Therefore, from the differential relation, equation (A2.2.65), one obtains

$$d\Omega_G = -S\,dT - P\,dV - N\,d\mu \tag{A2.2.119}$$

which implies

$$S = -\left(\frac{\partial\Omega_G}{\partial T}\right)_{V,\mu}, \qquad P = -\left(\frac{\partial\Omega_G}{\partial V}\right)_{T,\mu}, \qquad N = -\left(\frac{\partial\Omega_G}{\partial\mu}\right)_{T,V}. \tag{A2.2.120}$$

Using equation (A2.2.101) and equation (A2.2.60), one has

$$\Xi(\beta, \mu, V) = \sum_{N=0}^{\infty} e^{\beta\mu N}\,Q_N(\beta, V) = \sum_{N=0}^{\infty} z^N\,Q_N(\beta, V). \tag{A2.2.121}$$

Using this expression for $\Xi$ and the relation $A = -k_bT\log Q_N$, one can show that the average of $(\mu N - A)$ in the grand canonical ensemble is

a 

{fiN -A) = —(log E(/!. /i, V)). (A2.2.122) 

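The connection between the grand canonical ensemble and thermodynamics of fixed $(V, T, \mu)$ systems is provided by the identification

$$\log\Xi(\beta, \mu, V) = -\beta\Omega_G = \beta PV. \tag{A2.2.123}$$

Then one has

$$k_bT\log\Xi(\beta, \mu, V) = \mu\langle N\rangle - \langle A\rangle. \tag{A2.2.124}$$

In the grand canonical ensemble, the number of particles fluctuates. By differentiating $\log\Xi$, equation (A2.2.121), with respect to $\beta\mu$ at fixed $V$ and $\beta$, one obtains

$$\langle N\rangle = \frac{1}{\beta}\left(\frac{\partial\log\Xi}{\partial\mu}\right)_{\beta,V} \tag{A2.2.125}$$

and

$$\langle(N - \langle N\rangle)^2\rangle = \langle N^2\rangle - \langle N\rangle^2 = \frac{1}{\beta^2}\left(\frac{\partial^2\log\Xi}{\partial\mu^2}\right)_{\beta,V} = \frac{1}{\beta}\left(\frac{\partial\langle N\rangle}{\partial\mu}\right)_{\beta,V}. \tag{A2.2.126}$$

Since $\partial\langle N\rangle/\partial\mu \sim \langle N\rangle$, the fractional root mean square fluctuation in $N$ is

$$\frac{\langle(N - \langle N\rangle)^2\rangle^{1/2}}{\langle N\rangle} \sim \frac{1}{\langle N\rangle^{1/2}}. \tag{A2.2.127}$$

There are two further useful results related to $\langle(N - \langle N\rangle)^2\rangle$. First is its connection to the isothermal compressibility $\kappa_T = -V^{-1}(\partial V/\partial P)_{T,\langle N\rangle}$ and the second to the spatial correlations of density fluctuations in a grand canonical system.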

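Now since $\Omega_G = -PV$, the Gibbs-Duhem equation gives $d\Omega_G = -S\,dT - P\,dV - \langle N\rangle\,d\mu = -P\,dV - V\,dP$, which implies that $d\mu = (V\,dP - S\,dT)/\langle N\rangle$. Let $v = V/\langle N\rangle$ be the specific volume, and express $\mu$ as $\mu(v, T)$. Then the result for $d\mu$ gives

$$\left(\frac{\partial\mu}{\partial v}\right)_T = v\left(\frac{\partial P}{\partial v}\right)_T.$$

Now a change in $v$ can occur either through $V$ or $\langle N\rangle$:

$$\left(\frac{\partial}{\partial v}\right)_{V,T} = -\frac{\langle N\rangle^2}{V}\left(\frac{\partial}{\partial\langle N\rangle}\right)_{V,T}, \qquad \left(\frac{\partial}{\partial v}\right)_{\langle N\rangle,T} = \langle N\rangle\left(\frac{\partial}{\partial V}\right)_{\langle N\rangle,T}.$$

These two should lead to an equivalent change in $v$. Thus one obtains

$$-\frac{\langle N\rangle^2}{V}\left(\frac{\partial\mu}{\partial\langle N\rangle}\right)_{V,T} = V\left(\frac{\partial P}{\partial V}\right)_{\langle N\rangle,T}$$

the substitution of which yields, for the mean square number fluctuations, the result

$$\frac{\langle(N - \langle N\rangle)^2\rangle}{\langle N\rangle} = \frac{k_bT\,\kappa_T}{v}. \tag{A2.2.128}$$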

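For homogeneous systems, the average number density is $n_0 = \langle N\rangle/V = v^{-1}$. Let us define a local number density through

$$n(\mathbf{r}) = \sum_{i=1}^{N}\delta(\mathbf{r} - \mathbf{r}_i) \tag{A2.2.129}$$

where $\mathbf{r}$ is a point within the volume $V$ of the grand ensemble system in which, at a given instant, there are $N$ particles whose positions are given by the vectors $\mathbf{r}_i$, $i = 1, 2, \ldots, N$. One has $N = \int_V d\mathbf{r}\,n(\mathbf{r})$ and, for homogeneous systems, $\langle N\rangle = \int_V d\mathbf{r}\,\langle n(\mathbf{r})\rangle = \int_V d\mathbf{r}\,n_0 = Vn_0$. One can then define the fluctuations in the local number density as $\delta n = n - n_0$, and construct the spatial density-density correlation function as

$$G(\mathbf{r} - \mathbf{r}') \equiv n_0^{-2}\langle\delta n(\mathbf{r})\,\delta n(\mathbf{r}')\rangle. \tag{A2.2.130}$$

$G(\mathbf{r})$ is also called the pair correlation function and is sometimes denoted by $h(\mathbf{r})$. Integration over $\mathbf{r}$ and $\mathbf{r}'$ through the domain of system volume gives, on the one hand,

$$\int_V d\mathbf{r}'\int_V d\mathbf{r}\,G(\mathbf{r} - \mathbf{r}') = V\int d\mathbf{r}\,G(\mathbf{r})$$

and, on the other,

$$\int_V d\mathbf{r}'\int_V d\mathbf{r}\,G(\mathbf{r} - \mathbf{r}') = n_0^{-2}\int_V d\mathbf{r}'\int_V d\mathbf{r}\,[\langle n(\mathbf{r})\,n(\mathbf{r}')\rangle - n_0^2] = n_0^{-2}\langle(N - \langle N\rangle)^2\rangle.$$

Comparing the two results and substituting the relation of the mean square number fluctuations to isothermal compressibility, equation (A2.2.128), one has

$$\int d\mathbf{r}\,G(\mathbf{r}) = k_bT\,\kappa_T. \tag{A2.2.131}$$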

The correlation function $G(\mathbf{r})$ quantifies the density fluctuations in a fluid. Characteristically, density fluctuations scatter light (or any radiation, like neutrons, with which they can couple). Then, if radiation of wavelength $\lambda$ is incident on the fluid, the intensity of radiation scattered through an angle $\theta$ is proportional to the structure factor

$$S(\mathbf{q}) = n_0\int d\mathbf{r}\,e^{i\mathbf{q}\cdot\mathbf{r}}\,G(\mathbf{r}) \tag{A2.2.132}$$

where $|\mathbf{q}| = 4\pi\sin(\theta/2)/\lambda$. The limiting value of $S(\mathbf{q})$ as $q \to 0$ is then proportional to $\kappa_T$. Near the critical point of a fluid, anomalous density fluctuations create a divergence of $\kappa_T$ which is the cause of the
phenomenon of critical opalescence: density fluctuations become correlated over a lengthscale which is long 
compared to a molecular lengthscale and comparable to the wavelength of the incident light. This causes the 
light to be strongly scattered, whereby multiple scattering becomes dominant, making the fluid medium 
appear turbid or opaque. 


For systems in which the constituent particles interact via short-range pair potentials, $W = \sum_{i=1}^{N}\sum_{j>i} u(|\mathbf{r}_i - \mathbf{r}_j|)$, there are two relations, which one can prove by evaluating the average of the total energy $E = K + W$, where $K$ is the total kinetic energy, and the average pressure $P$, that are valid in general. These are

$$\frac{\langle E\rangle}{N} = \frac{3}{2}k_bT + \frac{1}{2}n_0\int d\mathbf{r}\,g(r)\,u(r) \tag{A2.2.133}$$

and the virial equation of state,

$$P = n_0k_bT\left(1 - \frac{n_0}{6k_bT}\int d\mathbf{r}\,g(r)\,r\,\frac{du(r)}{dr}\right). \tag{A2.2.134}$$

Here $g(r) = G(r) + 1$ is called a radial distribution function, since $n_0g(r)$ is the conditional probability that a particle will be found at $\mathbf{r}$ if there is another at the origin. For strongly interacting systems, one can also introduce the potential of the mean force $w(r)$ through the relation $g(r) = \exp(-\beta w(r))$. Both $g(r)$ and $w(r)$ are also functions of temperature $T$ and density $n_0$.


A2.2.5.3 DENSITY EXPANSION 

For an imperfect gas, i.e. a low-density gas in which the particles are, most of the time, freely moving as in an 
ideal gas and only occasionally having binary collisions, the potential of the mean force is the same as the pair 
potential $u(r)$. Then, $g(r) \approx \exp(-\beta u(r))[1 + O(n_0)]$, and from equation (A2.2.133) the change from the ideal gas energy, $\Delta U = \langle E\rangle - \langle E\rangle_{ideal}$, to leading order in $n_0$, is

$$\frac{\Delta U}{N} = \frac{1}{2}n_0\int_V d\mathbf{r}\,u(r)\,e^{-\beta u(r)} = -\frac{1}{2}n_0\frac{\partial}{\partial\beta}\int_V d\mathbf{r}\,f(r) \tag{A2.2.135}$$

where

$$f(r) = e^{-\beta u(r)} - 1. \tag{A2.2.136}$$

Figure A2.2.5 shows a sketch of $f(r)$ for the Lennard-Jones pair potential. Now if $\Delta A$ is the excess Helmholtz free energy relative to its ideal gas value, then $(-\beta\Delta A) = \log(Q/Q_{ideal})$ and $\Delta U/N = [\partial(\beta\Delta A/N)/\partial\beta]$. Then, integrating with respect to $\beta$, one obtains

$$-\beta\Delta A/N = \frac{1}{2}n_0\int d\mathbf{r}\,f(r) + O(n_0^2). \tag{A2.2.137}$$

One can next obtain pressure $P$ from the above by

$$\beta P = n_0 + n_0^2\,\frac{\partial(\beta\Delta A/N)}{\partial n_0} = n_0[1 + B_2(T)\,n_0 + O(n_0^2)] \tag{A2.2.138}$$

where

$$B_2(T) = -\frac{1}{2}\int d\mathbf{r}\,f(r). \tag{A2.2.139}$$

The same result can also be obtained directly from the virial equation of state given above and the low-density form of $g(r)$. $B_2(T)$ is called the second virial coefficient and the expansion of $P$ in powers of $n_0$ is known as the virial expansion, of which the leading non-ideal term is deduced above. The higher-order terms in the virial expansion for $P$ and in the density expansion of $g(r)$ can be obtained using the methods of cluster expansion and cumulant expansion.

Figure A2.2.5. Sketch of $f(r)$ for the Lennard-Jones pair potential $u(r) = 4\epsilon[(\sigma/r)^{12} - (\sigma/r)^6]$; full curve $\beta\epsilon = 1.0$ and broken curve $\beta\epsilon = 0.5$. From Plischke and Bergersen 1985, further reading.

For purely repulsive potentials ($u(r) > 0$), $f(r)$ is negative and $B_2(T)$ is positive. For purely attractive potentials, on the other hand, $f(r)$ is always positive, leading to a negative $B_2(T)$. Realistic interatomic potentials contain both a short-range repulsive potential (due to the strong short distance overlap of electronic wavefunctions of the two atoms) and a weaker longer range van der Waals attractive potential. The temperature dependence of $B_2(T)$ can be used to qualitatively probe the nature of interatomic potential. At a certain temperature $T_B$, known as the Boyle temperature, the effects of attractive and repulsive potentials balance exactly, giving $B_2(T_B) = 0$. A phenomenological extension of the ideal gas equation of state was made by van der Waals more than 100 years ago. For one mole of gas,

$$Pv = RT \;\Rightarrow\; \left(P + \frac{a}{v^2}\right)(v - b) = RT. \tag{A2.2.140}$$


Here $b$ corresponds to the repulsive part of the potential, which is equivalent to the excluded volume due to the finite atomic size, and $a/v^2$ corresponds to the attractive part of the potential. The van der Waals equation


of state is a very good qualitative description of liquids as well as imperfect gases. Historically, it is the first 
example of a mean field theory. It fails only in the neighbourhood of a critical point due to its improper 
treatment of the density fluctuations. 
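
The sign change of the second virial coefficient and the Boyle temperature can be illustrated for the Lennard-Jones potential. The following is a minimal sketch under stated assumptions: lengths are measured in units of $\sigma$ and temperature in units of $\epsilon/k_b$, the radial integral $B_2 = -2\pi\int_0^{\infty} f(r)\,r^2\,dr$ is evaluated with a Simpson rule up to an assumed cut-off `r_cut`, and the remainder is approximated analytically using $f(r) \approx 4\beta\epsilon/r^6$ at large $r$. Function names, the cut-off and the bisection bracket are illustrative choices.

```python
import math

def mayer_f(r, beta_eps):
    """f(r) = exp(-beta*u(r)) - 1 for the Lennard-Jones potential, r in units of sigma."""
    if r == 0.0:
        return -1.0  # infinitely repulsive core
    s6 = r ** -6
    return math.expm1(-4.0 * beta_eps * (s6 * s6 - s6))

def b2_reduced(t_star, r_cut=10.0, n=4000):
    """B_2/sigma^3 at reduced temperature t_star = k_b*T/epsilon (n must be even)."""
    beta_eps = 1.0 / t_star
    h = r_cut / n
    total = 0.0
    for i in range(n + 1):
        r = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * mayer_f(r, beta_eps) * r * r
    integral = total * h / 3.0
    # Tail beyond r_cut, using f ~ 4*beta_eps/r^6 for large r.
    tail = 4.0 * beta_eps / (3.0 * r_cut ** 3)
    return -2.0 * math.pi * (integral + tail)

def boyle_temperature(lo=2.0, hi=5.0, tol=1e-6):
    """Reduced temperature where B_2 changes sign, located by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if b2_reduced(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for t_star in (1.0, 2.0, 5.0, 10.0):
        print(t_star, b2_reduced(t_star))
    print(boyle_temperature())
```

Below the Boyle temperature the attractive well dominates and $B_2$ is negative; above it the repulsive core dominates and $B_2$ is positive, as described in the text.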

A2.2.5.4 IDEAL QUANTUM GASES 

Thermodynamics of ideal quantum gases is typically obtained using a grand canonical ensemble. In principle this can also be done using a canonical ensemble partition function, $Q = \sum_l\exp(-\beta E_l)$. For the photon and phonon gases, the canonical ensemble was used in section A2.2.4.6 and section A2.2.4.7. Photons and phonons are massless and their total number indeterminate, since they can be created or destroyed, provided the momentum and energy are conserved in the process. On the other hand, for an ideal gas consisting of particles with non-zero mass, in a canonical ensemble, the total number of particles is fixed at $N$. Thus, in the occupation number representation of the single-particle states $j$, the sum of all $n_j$ is constrained to be $N$:

$$Q_N = \sum_{\{n_j\},\;\sum_j n_j = N}\exp\Big(-\beta\sum_j n_j\epsilon_j\Big)$$


where $\varepsilon_j$ is the energy of the jth single-particle state. The restriction on the sum over the $n_j$ creates a
combinatorial problem which, even though solvable, is non-trivial. This constraint is removed by considering
the grand canonical partition function:


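$$\Xi = \sum_{N=0}^{\infty} e^{\beta\mu N} Q_N = \sum_{N=0}^{\infty} \sum_{\{n_j\}}{}^{'} \exp\Big(\beta\mu \sum_j n_j\Big) \exp\Big(-\beta \sum_j n_j \varepsilon_j\Big) = \sum_{\{n_j\}} \exp\Big(-\beta \sum_j (\varepsilon_j - \mu)\, n_j\Big). \tag{A2.2.141}$$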

Now the exponential factors for the various $n_j$ within the sum are independent, which simplifies the result to

$$\Xi = \prod_j \sum_{n_j} e^{-\beta(\varepsilon_j - \mu)\, n_j}.$$

The sum over $n_j$ can now be performed, but this depends on the statistics that the particles in the ideal gas
obey. Fermi particles obey the Pauli exclusion principle, which allows only two possible values: $n_j = 0, 1$. For
Bose particles, $n_j$ can be any integer between zero and infinity. Thus the grand partition function is


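$$\Xi = \prod_j \left[1 + e^{-\beta(\varepsilon_j - \mu)}\right] \quad \text{for fermions} \tag{A2.2.142}$$

and

$$\Xi = \prod_j \left[1 - e^{-\beta(\varepsilon_j - \mu)}\right]^{-1} \quad \text{for bosons.} \tag{A2.2.143}$$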
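
The two products follow from elementary single-state sums. As a sketch, with the shorthand $x_j$ introduced here only for convenience:

```latex
% Shorthand (introduced for this sketch): x_j = exp[-beta (eps_j - mu)]
% Fermions: n_j = 0, 1
\sum_{n_j=0}^{1} x_j^{\,n_j} = 1 + x_j ,
\qquad x_j \equiv e^{-\beta(\varepsilon_j - \mu)}
% Bosons: n_j = 0, 1, 2, ...  (geometric series)
\sum_{n_j=0}^{\infty} x_j^{\,n_j} = \frac{1}{1 - x_j} ,
\qquad \text{convergent only if } x_j < 1, \ \text{i.e. } \mu < \varepsilon_j \ \text{for every } j .
```

The convergence condition for bosons is the origin of the requirement that the chemical potential of a Bose gas lie below the lowest single-particle energy.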


This leads to, using equation (A2.2.123),

$$\beta PV = -\beta\Omega_G = \log \Xi = \pm \sum_j \log\left[1 \pm e^{-\beta(\varepsilon_j - \mu)}\right] \tag{A2.2.144}$$

where the upper sign corresponds to fermions and the lower sign to bosons. From equation (A2.2.141), the
average occupation number is $\langle n_j \rangle = -\partial(\log \Xi)/\partial(\beta\varepsilon_j)$. From this one obtains

$$\langle n_j \rangle = \left[e^{\beta(\varepsilon_j - \mu)} \pm 1\right]^{-1} \tag{A2.2.145}$$

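where again the upper sign corresponds to fermions and the lower sign to bosons. From this, one has, for the
total number of particles, $\langle N \rangle$,

$$\langle N \rangle = \sum_j \left[e^{\beta(\varepsilon_j - \mu)} \pm 1\right]^{-1} \tag{A2.2.146}$$

and for the total internal energy $U = \langle E \rangle$

$$U = \sum_j \varepsilon_j \left[e^{\beta(\varepsilon_j - \mu)} \pm 1\right]^{-1}. \tag{A2.2.147}$$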
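
The occupation numbers and the sums for the particle number and energy are easy to evaluate for a finite list of levels. The following is a minimal sketch; the function names and the `statistics` keyword are hypothetical, and energies are measured in units where $\beta$ multiplies them directly.

```python
import math


def occupation(eps, mu, beta, statistics):
    """Mean occupation of a single-particle level of energy eps."""
    x = beta * (eps - mu)
    if statistics == "fermi":
        return 1.0 / (math.exp(x) + 1.0)
    if statistics == "bose":
        if x <= 0.0:
            raise ValueError("Bose statistics requires mu < eps")
        return 1.0 / math.expm1(x)  # expm1(x) = exp(x) - 1, accurate for small x
    if statistics == "boltzmann":
        return math.exp(-x)
    raise ValueError("statistics must be 'fermi', 'bose' or 'boltzmann'")


def totals(levels, mu, beta, statistics):
    """Return (<N>, U) as sums over the given single-particle energies."""
    levels = list(levels)
    n = [occupation(e, mu, beta, statistics) for e in levels]
    return sum(n), sum(e * nj for e, nj in zip(levels, n))
```

For $\beta(\varepsilon_j - \mu) \gg 1$ the three distributions agree to leading order, with the Fermi value slightly below and the Bose value slightly above the Boltzmann factor.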


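When the single-particle states j are densely packed within any energy interval of order $k_BT$, the sum over j can be
replaced by an integral over energy such that

$$\sum_j \;\longrightarrow\; \int_0^{\infty} d\varepsilon\, \mathcal{D}(\varepsilon). \tag{A2.2.148}$$

Using equation (A2.2.88), this can be rewritten as

$$\sum_j \;\longrightarrow\; \frac{2\gamma V}{\sqrt{\pi}\, V_q} \int_0^{\infty} d(\beta\varepsilon)\, (\beta\varepsilon)^{1/2}. \tag{A2.2.149}$$

Using this approximation, expressions for $\langle N \rangle$, U and P reduce to

$$\langle N \rangle = \frac{2\gamma V}{\sqrt{\pi}\, V_q} \int_0^{\infty} d(\beta\varepsilon)\, (\beta\varepsilon)^{1/2} \left[e^{\beta(\varepsilon - \mu)} \pm 1\right]^{-1} \tag{A2.2.150}$$

$$\phantom{\langle N \rangle} = \frac{2\gamma V}{\sqrt{\pi}\, V_q}\, F_{1/2}(\beta\mu), \tag{A2.2.151}$$

$$U = \frac{2\gamma V}{\sqrt{\pi}\, V_q}\, \beta^{-1} \int_0^{\infty} d(\beta\varepsilon)\, (\beta\varepsilon)^{3/2} \left[e^{\beta(\varepsilon - \mu)} \pm 1\right]^{-1} \tag{A2.2.152}$$

$$\phantom{U} = \frac{2\gamma V}{\sqrt{\pi}\, V_q}\, \beta^{-1} F_{3/2}(\beta\mu) \tag{A2.2.153}$$

and

$$P = \pm \frac{2\gamma}{\sqrt{\pi}\, V_q}\, \beta^{-1} \int_0^{\infty} d(\beta\varepsilon)\, (\beta\varepsilon)^{1/2} \log\left[1 \pm e^{-\beta(\varepsilon - \mu)}\right] \tag{A2.2.154}$$

$$\phantom{P} = \frac{2}{3}\, \frac{2\gamma}{\sqrt{\pi}\, V_q}\, \beta^{-1} \int_0^{\infty} d(\beta\varepsilon)\, (\beta\varepsilon)^{3/2} \left[e^{\beta(\varepsilon - \mu)} \pm 1\right]^{-1} \tag{A2.2.155}$$

$$\phantom{P} = \frac{2}{3}\, \frac{2\gamma}{\sqrt{\pi}\, V_q}\, \beta^{-1} F_{3/2}(\beta\mu), \tag{A2.2.156}$$

where $F_{\nu}(\beta\mu) = \int_0^{\infty} dy\, y^{\nu} \left[e^{y - \beta\mu} \pm 1\right]^{-1}$.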


An integration by parts was used to deduce equation (A2.2.155) from equation (A2.2.154). Comparing the
results for U and P, one finds that, just as for the classical gas, the relation $U = \tfrac{3}{2}PV$ is also satisfied
for ideal quantum gases. In the above results it was found that, at a given temperature, $P = P(\beta\mu)$ and
$\langle N \rangle/V = n_0 = n_0(\beta\mu)$. In principle, one
has to eliminate $\beta\mu$ between the two in order to deduce the equation of state, $P = P(\beta, n_0)$, for ideal quantum
gases. Now $F_{\nu}(\beta\mu)$ is a function of a single variable. Therefore P is a homogeneous function of order 5/2 in $\mu$
and $k_BT$. Similarly $n_0$ is a homogeneous function of order 3/2 in $\mu$ and $k_BT$; and so is $S/V = (\partial P/\partial T)_{V,\mu}$. This
means that $S/\langle N \rangle$ is a homogeneous function of order zero, i.e. $S/\langle N \rangle = \phi(\beta\mu)$, which in turn implies that for an
adiabatic process $\beta\mu$ remains constant. Thus, from the expressions above for P and $\langle N \rangle/V$, one has for
adiabatic processes, $VT^{3/2}$ = constant, $PT^{-5/2}$ = constant and $PV^{5/3}$ = constant.
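
The integration by parts mentioned above can be written out explicitly. In this sketch $y = \beta\varepsilon$, $a = \beta\mu$, and C stands for the common prefactor multiplying the integral in the pressure (a shorthand assumed here, not a symbol of the original text):

```latex
% y = beta*eps, a = beta*mu, C = common prefactor (shorthand assumed for this sketch)
P = \pm C \int_0^{\infty} dy\, y^{1/2} \log\!\left[1 \pm e^{-(y-a)}\right]
  = \pm C \left\{ \left[ \tfrac{2}{3}\, y^{3/2} \log\!\left(1 \pm e^{-(y-a)}\right) \right]_0^{\infty}
      \pm \tfrac{2}{3} \int_0^{\infty} dy\, \frac{y^{3/2}}{e^{\,y-a} \pm 1} \right\}
  = \tfrac{2}{3}\, C \int_0^{\infty} dy\, \frac{y^{3/2}}{e^{\,y-a} \pm 1} .
% The boundary term vanishes at both limits, and (\pm)(\pm) = +.
% Since U = C V \int_0^\infty dy\, y^{3/2} / (e^{y-a} \pm 1), it follows that PV = (2/3) U.
```

The result holds for either sign, which is why the relation between U and PV is the same for Fermi, Bose and classical ideal gases.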

A2.2.5.5 IDEAL QUANTUM GASES: CLASSICAL LIMIT

When the temperature is high and the density is low, one expects to recover the classical ideal gas limit. The
number of particles is still given by $N = \sum_j n_j$. Thus the average number of particles is given by equation
(A2.2.146). The average density $\langle N \rangle/V = n_0$ is the thermodynamic density. At low $n_0$ and high T one expects
many more accessible single-particle states than available particles, and $\langle N \rangle = \sum_j \langle n_j \rangle$ means that each $\langle n_j \rangle$
must be small compared to one. Thus, from equation (A2.2.145) for $\langle n_j \rangle$, the classical limit corresponds to the
limit when $\exp(\beta(\varepsilon_j - \mu)) \gg 1$. This has to be so for any $\varepsilon_j$, which means that $z^{-1} = \exp(-\beta\mu) \gg 1$, or
$(-\beta\mu) \gg 1$, at low $n_0$ and high T, where $z = \exp(\beta\mu)$ is the fugacity. In this classical limit,
$\langle n_j \rangle = \exp(-\beta(\varepsilon_j - \mu))$. The chemical potential $\mu$ is determined from $\langle N \rangle = \sum_j \langle n_j \rangle$, which leads to the result that

$$\mu = k_BT \log\left[\frac{\langle N \rangle}{\sum_j e^{-\beta\varepsilon_j}}\right] \tag{A2.2.157}$$

with the final result that

$$\langle n_j \rangle = \langle N \rangle\, \frac{e^{-\beta\varepsilon_j}}{\sum_j e^{-\beta\varepsilon_j}}.$$

This is the classical Boltzmann distribution in which $\langle n_j \rangle/\langle N \rangle$, the probability of finding a particle in the
single-particle state j, is proportional to the classical Boltzmann factor $e^{-\beta\varepsilon_j}$.

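Now $\log Q_N = -\beta A$, $A = G - PV = \mu\langle N \rangle - PV$ and $\beta PV = \log \Xi$. Thus the canonical partition function is

$$\log Q(\langle N \rangle, V, T) = -\beta\mu\langle N \rangle + \log \Xi$$

which leads to the classical limit result:

$$\log Q = -\beta\mu\langle N \rangle \pm \sum_j \log\left[1 \pm e^{-\beta(\varepsilon_j - \mu)}\right] = -\beta\mu\langle N \rangle + \sum_j \langle n_j \rangle = -\beta\mu\langle N \rangle + \langle N \rangle$$

where the approximation $\log(1 + x) \approx x$ for small x is used. Now, from the result for $\mu$ above, in equation
(A2.2.157), one has

$$\beta\mu = \log\langle N \rangle - \log \sum_j e^{-\beta\varepsilon_j}. \tag{A2.2.158}$$

Thus

$$\log Q = -\langle N \rangle \log\langle N \rangle + \langle N \rangle + \langle N \rangle \log \sum_j e^{-\beta\varepsilon_j}.$$

For large $\langle N \rangle$, $\langle N \rangle \log\langle N \rangle - \langle N \rangle \approx \log(\langle N \rangle!)$, whereby

$$Q(\langle N \rangle, V, T) = \frac{1}{\langle N \rangle!} \Big(\sum_j e^{-\beta\varepsilon_j}\Big)^{\langle N \rangle}.$$

This result is identical to that obtained from a canonical ensemble approach in the thermodynamic limit,
where the fluctuations in N vanish and $\langle N \rangle = N$. The single-particle expression for the canonical partition
function, $Q_1 = \sum_j e^{-\beta\varepsilon_j}$, can be evaluated using $\varepsilon_n = (\pi\hbar)^2 V^{-2/3} n^2/(2m)$ for a particle in a cubical box of volume V.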

In the classical limit, the triplet of quantum numbers n can be replaced by a continuous variable through the
transformation

$$\sum_{\mathbf{n}} \;\longrightarrow\; \frac{\gamma}{8} \int_{-\infty}^{\infty} dn_x \int_{-\infty}^{\infty} dn_y \int_{-\infty}^{\infty} dn_z \qquad \text{and} \qquad \pi\hbar\, \mathbf{n}\, V^{-1/3} \;\longrightarrow\; \mathbf{p},$$

where p is the momentum of the classical particle. The transformation leads to the result that

$$Q_1 = \frac{\gamma V}{(2\pi\hbar)^3} \int d^3p\; e^{-\beta p^2/(2m)} = \frac{\gamma V}{V_q}.$$

This is the same as that in the canonical ensemble. All the thermodynamic results for a classical ideal gas then
follow, as in section A2.2.4.4. In particular, since from equation (A2.2.158) the chemical potential is related
to $Q_1$, which was obtained in equation (A2.2.88), one obtains

$$\beta\mu = \log\langle N \rangle - \log Q_1 = \log\left[\langle N \rangle V_q/(\gamma V)\right] = \log(n_0 V_q/\gamma)$$

or, equivalently, $z = n_0 V_q/\gamma$. The classical limit is valid at low densities when $n_0 \ll \gamma/V_q$, i.e. when $z = \exp(\beta\mu) \ll 1$.
For $n_0 \geq \gamma/V_q$ one has a quantum gas. Equivalently, from the definition of $V_q$ one has a quantum gas
when $k_BT$ is below

$$\frac{2\pi\hbar^2}{m} \left(\frac{n_0}{\gamma}\right)^{2/3}.$$

If $z = \exp(\beta\mu) \ll 1$, one can also consider the leading order quantum correction to the classical limit. For this
consider the thermodynamic potential $\Omega_G$ given in equation (A2.2.144). Using equation (A2.2.149), one can
convert the sum to an integral, integrate the resulting integral by parts and obtain the result:

$$\beta\Omega_G = -\frac{4\gamma V}{3\sqrt{\pi}\, V_q} \int_0^{\infty} dy\, \frac{y^{3/2}}{z^{-1} e^{y} \pm 1} \tag{A2.2.159}$$

where $z = \exp(\beta\mu)$ is the fugacity, which has an ideal gas value of $n_0 V_q/\gamma$. Note that the integral is the same as
$F_{3/2}$. Since $V_q$ is proportional to $\hbar^3$, and z is small, the expansion of the integrand in powers of z is appropriate
and leads to the leading quantum correction to the classical ideal gas limit. Using

$$\left[z^{-1} e^{y} \pm 1\right]^{-1} = z e^{-y} \left[1 \pm z e^{-y}\right]^{-1} \tag{A2.2.160}$$

$$\phantom{\left[z^{-1} e^{y} \pm 1\right]^{-1}} = z e^{-y} \left[1 \mp z e^{-y} + O(z^2)\right] \tag{A2.2.161}$$

$\Omega_G$ can be evaluated with the result

$$\beta\Omega_G = -\beta PV = -\frac{\gamma V z}{V_q} \left[1 \mp \frac{z}{2^{5/2}} + O(z^2)\right]. \tag{A2.2.162}$$

The first term is the classical ideal gas term and the next term is the first-order quantum correction due to
Fermi or Bose statistics, so that one can write

$$\Omega_G = \Omega_G^{\mathrm{cl}} \pm \frac{\gamma V z^2}{2^{5/2}\, \beta V_q} + O(z^3). \tag{A2.2.163}$$

The small additions to all thermodynamic potentials are the same when expressed in terms of appropriate
variables. Thus the first-order correction term, when expressed in terms of V and $\beta$, is the correction term for
the Helmholtz free energy A:

$$A = A_{\mathrm{cl}} \pm \frac{\langle N \rangle^2}{2\gamma\beta V} \left(\frac{\pi\beta\hbar^2}{m}\right)^{3/2} + \cdots \tag{A2.2.164}$$

where the classical limiting value $z = n_0 V_q/\gamma$ and the definition in equation (A2.2.88) of $V_q$ are used. Finally,
one can obtain the correction to the ideal gas equation of state by computing $P = -(\partial A/\partial V)_{T,N}$. The result is

$$P = n_0 k_BT \left[1 \pm \frac{1}{2\gamma} \left(\frac{\pi\beta\hbar^2}{m}\right)^{3/2} n_0 + \cdots\right]. \tag{A2.2.165}$$

The leading correction to the classical ideal gas pressure term due to quantum statistics is proportional to $\hbar^3$
and to $n_0$. The correction at constant density is larger in magnitude at lower temperatures and for lighter masses.
The coefficient of $n_0$ can be viewed as an effective second virial coefficient $B_2(T)$. The effect of quantum
statistics at this order of correction is to add to a classical ideal gas some effective interaction. The upper sign
is for a Fermi gas and yields a positive $B_2(T)$, equivalent to an effective repulsive interaction which is a
consequence of the Pauli exclusion rule. The lower sign is for a Bose gas, which yields a negative
$B_2(T)$ corresponding to an effective attractive interaction. This is an indicator of the tendency of Bose
particles to condense in the lowest-energy state. This phenomenon is treated in section A2.2.5.7.
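
To get a feeling for the size of this statistical correction, the effective second virial coefficient can be evaluated numerically. The sketch below assumes the standard thermal (quantum) volume $V_q = (2\pi\hbar^2/mk_BT)^{3/2}$, in terms of which the coefficient of the density in the pressure is $\pm V_q/(2^{5/2}\gamma)$; the function names are illustrative.

```python
import math

HBAR = 1.054571817e-34   # J s
K_B = 1.380649e-23       # J K^-1
AMU = 1.66053906660e-27  # kg


def quantum_volume(mass, T):
    """Thermal (quantum) volume V_q = (2 pi hbar^2 / (m k_B T))^(3/2) in m^3."""
    return (2.0 * math.pi * HBAR**2 / (mass * K_B * T)) ** 1.5


def statistical_b2(mass, T, gamma, fermions):
    """Effective second virial coefficient (m^3 per particle) from quantum statistics alone.

    Positive for fermions (effective repulsion), negative for bosons (effective attraction).
    """
    sign = 1.0 if fermions else -1.0
    return sign * quantum_volume(mass, T) / (2.0**2.5 * gamma)


# Example: spin-zero helium-4 atoms (gamma = 1) at 4 K, treated as an ideal Bose gas.
M_HE4 = 4.002602 * AMU
b2_he4 = statistical_b2(M_HE4, 4.0, gamma=1, fermions=False)
```

The magnitude scales as $T^{-3/2}$ and $m^{-3/2}$, so the correction matters only for light particles at low temperature.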
A2.2.5.6 IDEAL FERMI GAS AND ELECTRONS IN METALS

The effects of quantum statistics are relevant in many forms of matter, particularly in solids at low
temperatures. The presence of strong Coulombic interactions between electrons and ions leads one to expect
that the behaviour of such systems will be highly complex and nonlinear. It is then remarkable that numerous
metals and semiconductors can be well described in many respects in terms of models of effectively
non-interacting 'particles'. One such example is the thermal properties of conduction electrons in metals, which
can be well approximated by an ideal gas of fermions. This approximation works at high densities on account
of the Pauli exclusion principle. No two indistinguishable fermions occupy the same state, many
single-particle states are filled and the lowest energy of unoccupied states is many times $k_BT$, so that the energetics of
interactions between electrons become negligible. If the conduction electrons (mass $m_e$) in a metal are
modelled by an ideal Fermi gas, the occupation number in the jth single-particle state is (from equation
(A2.2.145))


$$\langle n_j \rangle = f(\varepsilon_j) \quad \text{where} \quad f(\varepsilon_j) = \left[e^{\beta(\varepsilon_j - \mu)} + 1\right]^{-1} \tag{A2.2.166}$$

with $\varepsilon_j = (\hbar k)^2/(2m_e)$ and $k = n\pi V^{-1/3}$, as for a free particle in a cubical box. Consider first the situation at T = 0,
$\beta = \infty$. Since $\mu$ depends on T, it is useful to introduce the Fermi energy and Fermi temperature as $\varepsilon_F = k_BT_F
= \mu(T = 0)$. At T = 0, $\langle n_j \rangle$ is one for $\varepsilon_j \leq \varepsilon_F$ and is zero for $\varepsilon_j > \varepsilon_F$. Due to the Pauli exclusion principle, each
single-particle state n is occupied by one spin-up electron and one spin-down electron. The N available
electrons fill up the single-particle states up to the Fermi energy $\varepsilon_F$, which therefore depends on N. If $n_F$ is
defined via $\varepsilon_F = (\hbar^2/2m_e)(\pi n_F V^{-1/3})^2$, then $N = (2)(\tfrac{1}{8})(4\pi n_F^3/3) = \pi n_F^3/3$, which gives the relation
between N and $\varepsilon_F$:

$$\varepsilon_F = k_BT_F = \frac{\hbar^2}{2m_e} \left(\frac{3\pi^2 N}{V}\right)^{2/3}. \tag{A2.2.167}$$

The total energy of the Fermi gas at T = 0 is

$$U_0 = \sum_j \langle n_j \rangle \varepsilon_j = 2 \sum_{n \leq n_F} \varepsilon_n = 2 \left(\tfrac{1}{8}\right) \int_0^{n_F} dn\, 4\pi n^2\, \varepsilon_n \tag{A2.2.168}$$

$$\phantom{U_0} = \frac{\pi^3 \hbar^2}{10\, m_e V^{2/3}}\, n_F^5 = \tfrac{3}{5} N \varepsilon_F.$$

The average kinetic energy per particle at T = 0 is 3/5 of the Fermi energy $\varepsilon_F$. At constant N, the energy
increases as the volume decreases since $\varepsilon_F \sim V^{-2/3}$. Due to the Pauli exclusion principle, the Fermi energy gives
a repulsive contribution to the binding of any material. This is balanced by the Coulombic attraction between
ions and electrons in metals.
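
The relation between electron density and Fermi energy is easily evaluated. The sketch below uses SI constants and an assumed conduction-electron density for copper (one electron per atom, about $8.47 \times 10^{28}$ m$^{-3}$); neither the function names nor this density appear in the text above.

```python
import math

HBAR = 1.054571817e-34   # J s
M_E = 9.1093837015e-31   # electron mass, kg
K_B = 1.380649e-23       # J K^-1
EV = 1.602176634e-19     # J per electronvolt


def fermi_energy(n):
    """Fermi energy (J) of an ideal electron gas of number density n (m^-3)."""
    return HBAR**2 / (2.0 * M_E) * (3.0 * math.pi**2 * n) ** (2.0 / 3.0)


def fermi_temperature(n):
    """Fermi temperature T_F = eps_F / k_B in kelvin."""
    return fermi_energy(n) / K_B


def ground_state_energy_per_electron(n):
    """Average kinetic energy per electron at T = 0, i.e. 3/5 of the Fermi energy (J)."""
    return 0.6 * fermi_energy(n)


N_CU = 8.47e28  # assumed conduction-electron density of copper, m^-3
eps_f_ev = fermi_energy(N_CU) / EV
t_f = fermi_temperature(N_CU)
```

With this density the Fermi energy comes out near 7 eV and the Fermi temperature near $8 \times 10^4$ K, far above room temperature. Doubling the density raises $\varepsilon_F$ by a factor $2^{2/3}$, which is the $V^{-2/3}$ dependence noted above.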

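The thermal average of a physical quantity X can be computed at any temperature through

$$\langle X \rangle = \sum_j f(\varepsilon_j, T, \mu)\, X(\varepsilon_j).$$

This can be expressed, in terms of the density of states $\mathcal{D}(\varepsilon)$, as

$$\langle X \rangle = \int d\varepsilon\, \mathcal{D}(\varepsilon)\, f(\varepsilon, T, \mu)\, X(\varepsilon).$$

For the total number of particles N and total energy U one has

$$N = \int_0^{\infty} d\varepsilon\, \mathcal{D}(\varepsilon)\, f(\varepsilon, T, \mu)$$

and

$$U = \int_0^{\infty} d\varepsilon\, \mathcal{D}(\varepsilon)\, \varepsilon\, f(\varepsilon, T, \mu).$$

For an ideal gas the density of states is computed in section A2.2.2 (equation A2.2.8). Its use in evaluating N
at T = 0 gives

$$N = \frac{V}{3\pi^2} \left(\frac{2m_e \varepsilon_F}{\hbar^2}\right)^{3/2} \qquad \text{and} \qquad \mathcal{D}(\varepsilon) = \frac{V}{2\pi^2} \left(\frac{2m_e}{\hbar^2}\right)^{3/2} \varepsilon^{1/2}. \tag{A2.2.169}$$

This gives a simple relation

$$\mathcal{D}(\varepsilon_F) = 3N/(2\varepsilon_F) = 3N/(2k_BT_F). \tag{A2.2.170}$$

At T = 0, N and U obtained above can also be found using

$$N = \int_0^{\varepsilon_F} d\varepsilon\, \mathcal{D}(\varepsilon) \qquad \text{and} \qquad U = \int_0^{\varepsilon_F} d\varepsilon\, \mathcal{D}(\varepsilon)\, \varepsilon. \tag{A2.2.171}$$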


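If the increase in the total energy of a system of N conduction electrons when heated from zero to T is denoted
by $\Delta U$, then

$$\Delta U = \int_0^{\infty} d\varepsilon\, \mathcal{D}(\varepsilon)\, \varepsilon f(\varepsilon) - \int_0^{\varepsilon_F} d\varepsilon\, \mathcal{D}(\varepsilon)\, \varepsilon.$$

Now multiplying the identity $N = \int_0^{\infty} d\varepsilon\, \mathcal{D}(\varepsilon) f(\varepsilon) = \int_0^{\varepsilon_F} d\varepsilon\, \mathcal{D}(\varepsilon)$ by $\varepsilon_F$, one has

$$\left(\int_0^{\varepsilon_F} + \int_{\varepsilon_F}^{\infty}\right) d\varepsilon\, \varepsilon_F\, \mathcal{D}(\varepsilon) f(\varepsilon) = \int_0^{\varepsilon_F} d\varepsilon\, \varepsilon_F\, \mathcal{D}(\varepsilon).$$

Using this, one can rewrite the expression for $\Delta U$ in physically transparent form:

$$\Delta U = \int_{\varepsilon_F}^{\infty} d\varepsilon\, (\varepsilon - \varepsilon_F)\, \mathcal{D}(\varepsilon) f(\varepsilon) + \int_0^{\varepsilon_F} d\varepsilon\, (\varepsilon_F - \varepsilon)\, \mathcal{D}(\varepsilon) \left[1 - f(\varepsilon)\right]. \tag{A2.2.172}$$

The first integral is the energy needed to move electrons from $\varepsilon_F$ to orbitals with energy $\varepsilon > \varepsilon_F$, and the second
integral is the energy needed to bring electrons to $\varepsilon_F$ from orbitals below $\varepsilon_F$. The heat capacity of the electron
gas can be found by differentiating $\Delta U$ with respect to T. The only T-dependent quantity is $f(\varepsilon)$. So one obtains

$$C_v = \frac{\partial U}{\partial T} = \int_0^{\infty} d\varepsilon\, (\varepsilon - \varepsilon_F)\, \mathcal{D}(\varepsilon)\, \frac{\partial f}{\partial T}.$$

Now, typical Fermi temperatures in metals are of the order of 50 000 K. Thus, at room temperature, $T/T_F$ is
very small compared to one. So, one can ignore the T dependence of $\mu$, to obtain

$$\frac{\partial f}{\partial T} = \frac{\varepsilon - \varepsilon_F}{k_BT^2}\, \frac{e^{\beta(\varepsilon - \varepsilon_F)}}{\left[e^{\beta(\varepsilon - \varepsilon_F)} + 1\right]^2}.$$

This is a very sharply peaked function around $\varepsilon_F$ with a width of the order of $k_BT$. (At T = 0, $f(\varepsilon)$ is a step
function and its temperature derivative is a delta function at $\varepsilon_F$.) Thus in the integral for $C_v$ one can replace
$\mathcal{D}(\varepsilon)$ by its value at $\varepsilon_F$, transform the integration variable from $\varepsilon$ to $x = \beta(\varepsilon - \varepsilon_F)$ and replace the lower limit of x, which is
$(-\beta\varepsilon_F)$, by $(-\infty)$. Then one obtains

$$C_v = k_B^2 T\, \mathcal{D}(\varepsilon_F) \int_{-\infty}^{\infty} dx\, \frac{x^2 e^{x}}{(e^{x} + 1)^2} = \frac{\pi^2}{3}\, k_B^2 T\, \mathcal{D}(\varepsilon_F) = \frac{\pi^2}{2}\, N k_B\, \frac{T}{T_F} \tag{A2.2.173}$$

where equation (A2.2.170) is used. This result can be physically understood as follows. For small $T/T_F$, the
number of electrons excited at T from the T = 0 step-function Fermi distribution is of the order of $NT/T_F$ and
the energy of each of these electrons is increased by about $k_BT$. This gives $\Delta U \sim Nk_BT^2/T_F$ and $C_v \sim Nk_BT/T_F$.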
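
The dimensionless integral that fixes the numerical coefficient of the electronic heat capacity, $\int_{-\infty}^{\infty} dx\, x^2 e^x/(e^x + 1)^2 = \pi^2/3$, can be checked with a few lines of numerical quadrature. This is a sketch with illustrative function names; the integrand is rewritten as $x^2/(4\cosh^2(x/2))$, an equivalent form chosen here to avoid overflow.

```python
import math


def sommerfeld_integrand(x):
    """x^2 e^x / (e^x + 1)^2, written as x^2 / (4 cosh^2(x/2)) for numerical stability."""
    return x * x / (4.0 * math.cosh(0.5 * x) ** 2)


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal intervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def electronic_heat_capacity(N, T, T_F, k_B=1.380649e-23):
    """Low-temperature heat capacity (J/K) of N ideal electrons: (pi^2/2) N k_B T / T_F."""
    return 0.5 * math.pi**2 * N * k_B * T / T_F


integral = trapezoid(sommerfeld_integrand, -40.0, 40.0, 8000)
```

Because the integrand decays like $x^2 e^{-|x|}$, truncating the range at $|x| = 40$ introduces a negligible error, which also illustrates why the lower limit $-\beta\varepsilon_F$ can be replaced by $-\infty$ when $T \ll T_F$.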

In typical metals, both electrons and phonons contribute to the heat capacity at constant volume. The
temperature-dependent expression

$$C_v = A_{\mathrm{el}} T + A_{\mathrm{ph}} T^3 \tag{A2.2.174}$$

where $A_{\mathrm{ph}}$ is given in equation (A2.2.96), obtained from the Debye theory discussed in section A2.2.4.7, fits
the low-temperature experimental measurements of $C_v$ for many metals quite well, as shown in figure A2.2.6
for copper.

Figure A2.2.6. Electronic contribution to the heat capacity $C_v$ of copper at low temperatures between 1 and 4
K. (From Corak et al [2]).

A2.2.5.7 IDEAL BOSE GAS AND BOSE-EINSTEIN CONDENSATION 

In an ideal Bose gas, at a certain transition temperature a remarkable effect occurs: a macroscopic fraction of 
the total number of particles condenses into the lowest-energy single-particle state. This effect, which occurs 
when the Bose particles have non-zero mass, is called Bose-Einstein condensation, and the key to its 
understanding is the chemical potential. For an ideal gas of photons or phonons, which have zero mass, this 
effect does not occur. This is because their total number is arbitrary and the chemical potential is effectively 
zero for the photon or phonon gas. 

From equation (A2.2.145), the average occupation number of an ideal Bose gas is

$$\langle n_j \rangle = \left[e^{\beta(\varepsilon_j - \mu)} - 1\right]^{-1} = \left[z^{-1} e^{\beta\varepsilon_j} - 1\right]^{-1}.$$

There is clearly a possible singularity in $\langle n_j \rangle$ if $(\varepsilon_j - \mu)$ vanishes. Let the energy scale be chosen such that the
ground-state energy $\varepsilon_0 = 0$. Then the ground-state occupancy is

$$\langle n_0 \rangle = \left[e^{-\beta\mu} - 1\right]^{-1} = \left[z^{-1} - 1\right]^{-1}.$$

At T = 0, it is expected that all the N particles will be in the ground state. Now if at low temperatures $N \approx \langle n_0 \rangle$
is to be large, such as $10^{20}$, then one must have z very close to one and $|\beta\mu| \ll 1$. Thus

$$N \approx \langle n_0 \rangle \approx \left[(1 - \beta\mu) - 1\right]^{-1} = -\frac{1}{\beta\mu},$$

which gives the chemical potential of a Bose gas, as $T \to 0$, to be

$$\mu = -\frac{k_BT}{N} \qquad \text{and the fugacity} \qquad z = e^{\beta\mu} \approx 1 - \frac{1}{N}. \tag{A2.2.175}$$

The chemical potential for an ideal Bose gas has to be lower than the ground-state energy. Otherwise the
occupancy $\langle n_j \rangle$ of some state j would become negative.

Before proceeding, an order of magnitude calculation is in order. For $N = 10^{20}$ at T = 1 K, one obtains $\mu = -1.4 \times 10^{-36}$
ergs. For a $^4$He atom (mass M) in a cube with $V = L^3$, the two lowest states correspond to $(n_x, n_y, n_z)$ = (1, 1, 1)
and (2, 1, 1). The difference in these two energies is $\Delta\varepsilon = \varepsilon(211) - \varepsilon(111) = 3\hbar^2\pi^2/(2ML^2)$. For a
box with L = 1 cm containing $^4$He particles, $\Delta\varepsilon = 2.5 \times 10^{-30}$ ergs. This is very small compared to $k_BT$, which
even at 1 mK is $1.38 \times 10^{-19}$ ergs. On the other hand, $\Delta\varepsilon$ is large compared to $\mu$, which at 1 mK is $-1.4 \times 10^{-39}$
ergs. Thus the occupancy of the (211) orbital is $\langle n_{211} \rangle \approx [\exp(\beta\Delta\varepsilon) - 1]^{-1} \approx [\beta\Delta\varepsilon]^{-1} \approx 0.5 \times 10^{11}$, and $\langle n_{111} \rangle
\approx N \approx 10^{20}$, so that the ratio $\langle n_{211} \rangle/\langle n_{111} \rangle \approx 0.5 \times 10^{-9}$, a very small fraction.
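
These order-of-magnitude estimates can be reproduced directly. The sketch below works in CGS units to match the ergs quoted above; the function names are illustrative, and the physical constants are standard values rather than numbers taken from the text.

```python
import math

HBAR = 1.054571817e-27                # erg s
K_B = 1.380649e-16                    # erg K^-1
M_HE4 = 4.002602 * 1.66053906660e-24  # mass of a helium-4 atom, g


def chemical_potential(N, T):
    """Low-temperature chemical potential mu = -k_B T / N (erg)."""
    return -K_B * T / N


def level_spacing(mass, L):
    """Energy gap between the (2,1,1) and (1,1,1) states of a cubic box of side L (erg)."""
    return 3.0 * HBAR**2 * math.pi**2 / (2.0 * mass * L**2)


def excited_occupancy(delta_eps, T):
    """Bose occupancy of a level delta_eps above the ground state, with mu taken as ~0."""
    return 1.0 / math.expm1(delta_eps / (K_B * T))


N = 1.0e20
mu_1K = chemical_potential(N, 1.0)            # about -1.4e-36 erg
gap = level_spacing(M_HE4, 1.0)               # about 2.5e-30 erg for L = 1 cm
n_211 = excited_occupancy(gap, 1.0e-3)        # about 0.5e11 at 1 mK
ratio = n_211 / N                             # about 0.5e-9
```

The point of the comparison is that the first excited orbital, although separated from the ground state by far less than $k_BT$, holds a negligible fraction of the particles.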
For a spin-zero particle in a cubic box, the density of states is

$$\mathcal{D}(\varepsilon) = \frac{V}{4\pi^2} \left(\frac{2M}{\hbar^2}\right)^{3/2} \varepsilon^{1/2}.$$


The total number of particles in an ideal Bose gas at low temperatures needs to be written such that the
ground-state occupancy is separated from the excited-state occupancies:

$$N = N_0(T) + N_e(T) = \langle n_0 \rangle + \int_0^{\infty} d\varepsilon\, \mathcal{D}(\varepsilon) \left[z^{-1} e^{\beta\varepsilon} - 1\right]^{-1} \tag{A2.2.176}$$

$$\phantom{N} = \frac{z}{1 - z} + \frac{V}{4\pi^2} \left(\frac{2M}{\hbar^2}\right)^{3/2} \int_0^{\infty} d\varepsilon\, \frac{\varepsilon^{1/2}}{z^{-1} e^{\beta\varepsilon} - 1}.$$

Since $\mathcal{D}(\varepsilon)$ is zero when $\varepsilon = 0$, the ground state does not contribute to the integral for $N_e$. At sufficiently low
temperatures, $N_0$ will be very large compared to one, which implies z is very close to one. Then one can
approximate z by one in the integrand for $N_e$. The integral can then be evaluated by using the transformation
$x = \beta\varepsilon$ and the known value of the integral

$$\int_0^{\infty} dx\, \frac{x^{1/2}}{e^{x} - 1} = 1.306\, \pi^{1/2}.$$

The result for the total number in the excited states is

$$N_e(T) = \frac{1.306\, V}{4} \left(\frac{2Mk_BT}{\pi\hbar^2}\right)^{3/2} = 2.612\, n_q V \tag{A2.2.177}$$

where $n_q \equiv V_q^{-1} = [(Mk_BT)/(2\pi\hbar^2)]^{3/2}$ is the quantum concentration. The fraction $N_e/N \approx 2.612\, n_q/n$ where
$n = N/V$. This ratio can also be written as

$$\frac{N_e}{N} = \left(\frac{T}{T_E}\right)^{3/2} \qquad \text{where} \qquad T_E = \frac{2\pi\hbar^2}{Mk_B} \left(\frac{N}{2.612\, V}\right)^{2/3}. \tag{A2.2.178}$$

$T_E$ is called the Einstein temperature; $N_e(T_E) = N$. Above $T_E$ the ground-state occupancy is not a macroscopic
number. Below $T_E$, however, $N_0$ begins to become a macroscopic fraction of the total number of particles
according to the relation

$$N_0 = N - N_e = N \left[1 - \left(\frac{T}{T_E}\right)^{3/2}\right]. \tag{A2.2.179}$$

The function $N_0(T)$ is sketched in figure A2.2.7. At zero temperature all the Bose particles occupy the ground
state. This phenomenon is called Bose-Einstein condensation and $T_E$ is the temperature at which the
transition to the condensed state occurs.
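
As a rough numerical illustration, the transition temperature and condensate fraction of an ideal Bose gas can be evaluated for helium-4 atoms at the density of the liquid. The molar volume of 27.6 cm$^3$ mol$^{-1}$ used below is an assumed input, liquid helium is far from an ideal gas, and the function names are hypothetical, so the number obtained is only indicative.

```python
import math

HBAR = 1.054571817e-27                # erg s
K_B = 1.380649e-16                    # erg K^-1
N_A = 6.02214076e23                   # mol^-1
M_HE4 = 4.002602 * 1.66053906660e-24  # g


def einstein_temperature(mass, n):
    """Condensation temperature T_E (K) of an ideal Bose gas of number density n (cm^-3)."""
    return (2.0 * math.pi * HBAR**2 / (mass * K_B)) * (n / 2.612) ** (2.0 / 3.0)


def condensate_fraction(T, T_E):
    """Ground-state fraction N_0/N = 1 - (T/T_E)^(3/2) below T_E, and zero above it."""
    if T >= T_E:
        return 0.0
    return 1.0 - (T / T_E) ** 1.5


n_liquid = N_A / 27.6  # assumed number density of liquid helium-4, cm^-3
t_e = einstein_temperature(M_HE4, n_liquid)
```

The estimate comes out close to 3 K. At half the transition temperature about 65% of the particles are already in the ground state.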
Figure A2.2.7. Fraction of Bose particles in the ground state as a function of the temperature.


A2.2.6 SUMMARY 


In this chapter, the foundations of equilibrium statistical mechanics are introduced and applied to ideal and
weakly interacting systems. The connection between statistical mechanics and thermodynamics is made by
introducing ensemble methods. The role of mechanics, both quantum and classical, is described. In particular,
the concept of the density of states is introduced and used. Applications are made to ideal quantum and classical
gases, the ideal gas of diatomic molecules, photons and black body radiation, phonons in a harmonic solid,
conduction electrons in metals and Bose-Einstein condensation. Introductory aspects of the density
expansion of the equation of state and the expansion of thermodynamic quantities in powers of $\hbar$ are also
given. Other chapters deal with applications to strongly interacting systems and critical
phenomena. Much of this section is restricted to equilibrium systems. Other sections discuss the kinetic theory of
fluids, chaotic and other dynamical systems, and other non-equilibrium phenomena.
A2.3 Statistical mechanics of strongly interacting
systems: liquids and solids

Jayendran C Rasaiah 

Had I been present at the creation, I would have given some useful hints for the better 

ordering of the universe. 

Alphonso X, Learned King of Spain, 1252-1284 


A2.3.1 INTRODUCTION 

Statistical mechanics provides a link between the microscopic properties of a system at an atomic or
molecular level, and the equilibrium and dynamic properties measured in a laboratory. The statistical element
follows from the enormously large number of particles involved, of the order of Avogadro's number
($6.022 \times 10^{23}$), and the assumption that the measured properties (e.g. the pressure) are averages over instantaneous
values. The equilibrium properties are determined from the partition function, while the transport coefficients
of a system, not far from its equilibrium state, are related to equilibrium time correlation functions in the
so-called linear response regime.

Fluctuations of observables from their average values, unless the observables are constants of motion, are
especially important, since they are related to the response functions of the system. For example, the constant
volume specific heat $C_V$ of a fluid is a response function related to the fluctuations in the energy of a system at
constant N, V and T, where N is the number of particles in a volume V at temperature T. Similarly, fluctuations
in the number density ($\rho = N/V$) of an open system at constant $\mu$, V and T, where $\mu$ is the chemical potential,
are related to the isothermal compressibility $\kappa_T$, which is another response function. Temperature-dependent
fluctuations characterize the dynamic equilibrium of thermodynamic systems, in contrast to the equilibrium of
purely mechanical bodies in which fluctuations are absent.

In this chapter we discuss the main ideas and results of the equilibrium theory of strongly interacting systems. 
The partition function of a weakly interacting system, such as an ideal gas, is easily calculated to provide the 
absolute free energy and other properties (e.g. the entropy). The determination of the partition function of a 
strongly interacting system, however, is much more difficult, if not impossible, except in a few special cases. 
The special cases include several one-dimensional systems (e.g. hard rods, the one-dimensional (1D) Ising
ferromagnet), the two-dimensional (2D) Ising model for a ferromagnet at zero magnetic field and the entropy 
of ice. Onsager's celebrated solution of the 2D Ising model at zero field profoundly influenced our 
understanding of strongly interacting systems near the critical point, where the response functions diverge. 
Away from this region, however, the theories of practical use to most chemists, engineers and physicists are 
approximations based on a mean-field or average description of the prevailing interactions. Theories of fluids 
in which, for example, the weaker interactions due to dispersion forces or the polarity of the molecules are 
treated as perturbations to the harsh repulsive forces responsible for the structure of the fluid, also fall into the 
mean-field category. 

The structure of a fluid is characterized by the spatial and orientational correlations between atoms and
molecules determined through x-ray and neutron diffraction experiments. Examples are the atomic pair
correlation functions ($g_{OO}$, $g_{OH}$, $g_{HH}$) in liquid water. An important feature of these correlation functions is that
the thermodynamic properties of a
system can be calculated from them. The information they contain is equivalent to that present in the partition
function, and is more directly related to experimental observations. It is therefore natural to focus attention on
the theory of these correlation functions, which is now well developed, especially in the region away from the
critical point. Analytic and numerical approximations to the correlation functions are more readily
formulated than for the corresponding partition functions from which they are derived. This has led to several
useful theories, which include the scaled particle theory for hard bodies and integral equation approximations
for the two-body correlation functions of simple fluids. Examples are the Percus-Yevick, mean spherical and
hypernetted chain approximations, which are briefly described in this chapter, and perturbation theories of
fluids, which are treated in greater detail.

We discuss classical non-ideal liquids before treating solids. The strongly interacting fluid systems of interest
are hard spheres characterized by their harsh repulsions, atoms and molecules with dispersion interactions
responsible for the liquid-vapour transitions of the rare gases, ionic systems including strong and weak
electrolytes, and simple and not quite so simple polar fluids like water. The solid phase systems discussed are
ferromagnets and alloys.


A2.3.2 CLASSICAL NON-IDEAL FLUIDS 

The main theoretical problem is to calculate the partition function given the classical Hamiltonian

$$H(\mathbf{r}^N, \mathbf{p}^N, \omega^N) = K(\mathbf{p}^N) + E_{\mathrm{int}} + U_N(\mathbf{r}^N, \omega^N) \tag{A2.3.1}$$

where $K(\mathbf{p}^N)$ is the kinetic energy, $E_{\mathrm{int}}$ is the internal energy due to vibration, rotation and other internal
degrees of freedom and

$$U_N(\mathbf{r}^N, \omega^N) = \sum_{i<j} u_{ij}(r_{ij}, \omega_i, \omega_j) + \sum_{i<j<k} u_{ijk}(\mathbf{r}_i, \mathbf{r}_j, \mathbf{r}_k, \omega_i, \omega_j, \omega_k) + \cdots \tag{A2.3.2}$$

is the intermolecular potential composed of two-body, three-body and higher-order interactions. Here $\mathbf{p}^N$
stands for the set of momenta $\{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_N\}$ of the N particles, and likewise $\mathbf{r}^N$ and $\omega^N$ are the
corresponding sets of the positions and angular coordinates of the N particles and $r_{ij}$ is the distance between
particles i and j. For an ideal gas $U_N(\mathbf{r}^N, \omega^N) = 0$.

A2.3.2.1 INTERATOMIC POTENTIALS 

Information about interatomic potentials comes from scattering experiments as well as from model potentials
fitted to the thermodynamic and transport properties of the system. We will confine our discussion mainly to
systems in which the total potential energy $U(\mathbf{r}^N, \omega^N)$ for a given configuration $\{\mathbf{r}^N, \omega^N\}$ is pairwise additive,
which implies that the three- and higher-body potentials are ignored. This is an approximation because the
fluctuating electron charge distribution in atoms and molecules determines their polarizability, which is not
pairwise additive. However, the total potential can be approximated as the sum of effective pair potentials.


A few of the simpler pair potentials are listed below.

(a) The potential for hard spheres of diameter $\sigma$

$$u_{ij}^{\mathrm{HS}}(r) = \begin{cases} \infty & r < \sigma_{ij} \\ 0 & r > \sigma_{ij}. \end{cases} \tag{A2.3.3}$$

(b) The square well or mound potential

$$u_{ij}^{\mathrm{SW}}(r) = \begin{cases} \infty & r < a_{ij} \\ d_{ij} & a_{ij} < r < b_{ij} \\ 0 & b_{ij} < r. \end{cases} \tag{A2.3.4}$$

(c) The Lennard-Jones potential

$$u^{\mathrm{LJ}}(r) = 4\varepsilon \left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] \tag{A2.3.5}$$

which has two parameters representing the atomic size $\sigma$ and the well depth $\varepsilon$ of the interatomic potential. The
$r^{-6}$ dependence of the attractive part follows from the dispersive forces between the particles, while the $r^{-12}$
dependence is a convenient representation of the repulsive forces. The potential is zero at $r = \sigma$ and $-\varepsilon$ at the
minimum when $r = 2^{1/6}\sigma$. Typical values of $\varepsilon$ and $\sigma$ are displayed in table A2.3.1.
Table A2.3.1 Parameters for the Lennard-Jones potential.

Substance    σ (Å)    ε/k (K)
He           2.556    10.22
Ne           2.749    35.6
Ar           3.406    119.8
Kr           3.60     171
Xe           4.10     221
CH4          3.817    148.2

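(d) The Coulomb potential between charges $e_i$ and $e_j$ separated by a distance r

$$u_{ij}^{\mathrm{Coul}}(r) = \frac{e_i e_j}{r}. \tag{A2.3.6}$$

(e) The point dipole potential

$$u_{ij}^{\mathrm{DD}}(r, \omega_i, \omega_j) = -\frac{1}{r^{3}} \left[\frac{3(\boldsymbol{\mu}_i \cdot \mathbf{r}_{ij})(\boldsymbol{\mu}_j \cdot \mathbf{r}_{ij})}{r^{2}} - \boldsymbol{\mu}_i \cdot \boldsymbol{\mu}_j\right] \tag{A2.3.7}$$

where $\boldsymbol{\mu}_i$ is the dipole moment of particle i and $r = |\mathbf{r}_{ij}|$ is the intermolecular separation between the point
dipoles i and j.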

Thermodynamic stability requires a repulsive core in the interatomic potential of atoms and molecules, which 
is a manifestation of the Pauli exclusion principle operating at short distances. This means that the Coulomb 
and dipole interaction potentials between charged and uncharged real atoms or molecules must be 
supplemented by a hard core or other repulsive interactions. Examples are as follows. 

(f) The restricted primitive model (RPM) for ions in solution

$$u_{ij}^{\mathrm{RPM}}(r) = u_{ij}^{\mathrm{HS}}(r) + u_{ij}^{\mathrm{Coul}}(r) \tag{A2.3.8}$$

in which the positive or negative charges are embedded in hard spheres of the same size in a continuum
solvent of dielectric constant $\varepsilon$. An extension of this is the primitive model (PM) electrolyte in which the
restriction of equal sizes for the oppositely charged ions is relaxed.

Other linear combinations of simple potentials are also widely used to mimic the interactions in real systems. 
An example is the following. 

(g) The Stockmayer potential for dipolar molecules:

$$u^{\mathrm{S}}(r, \omega_i, \omega_j) = u^{\mathrm{LJ}}(r) + u^{\mathrm{DD}}(r, \omega_i, \omega_j) \tag{A2.3.9}$$

which combines the Lennard-Jones and point dipole potentials.
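
A few of the pair potentials listed above can be written as short functions. This is a minimal sketch with illustrative names; dipole and separation vectors are passed as 3-tuples, Gaussian-type units are assumed for the dipole term, and the argon parameters are those of table A2.3.1.

```python
import math


def lennard_jones(r, eps, sigma):
    """Lennard-Jones pair energy 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dipole_dipole(mu_i, mu_j, r_ij):
    """Point-dipole energy for dipole vectors mu_i, mu_j joined by the vector r_ij."""
    r = math.sqrt(_dot(r_ij, r_ij))
    return -(3.0 * _dot(mu_i, r_ij) * _dot(mu_j, r_ij) / r**2 - _dot(mu_i, mu_j)) / r**3


def stockmayer(mu_i, mu_j, r_ij, eps, sigma):
    """Stockmayer energy: Lennard-Jones plus point-dipole interaction."""
    r = math.sqrt(_dot(r_ij, r_ij))
    return lennard_jones(r, eps, sigma) + dipole_dipole(mu_i, mu_j, r_ij)


# Argon parameters from table A2.3.1: sigma in angstrom, eps/k in kelvin.
SIGMA_AR, EPS_AR = 3.406, 119.8
r_min = 2.0 ** (1.0 / 6.0) * SIGMA_AR
```

Evaluating `lennard_jones` at `SIGMA_AR` and at `r_min` recovers the zero and the well depth quoted above, and the dipole term is attractive for head-to-tail dipoles and repulsive for parallel side-by-side dipoles.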

Important applications of atomic potentials are models for water (TIP3, SPC/E) in which the intermolecular
potential consists of atom-atom interactions between the oxygen and hydrogen atoms of distinct molecules,
with the characteristic atomic geometry maintained (i.e. an HOH angle of 109° and an intramolecular OH
distance of 1 Å) by imposing constraints between atoms of the same molecule. For example, the effective
simple point charge model (SPC/E) for water is defined as a linear combination of Lennard-Jones interactions
between the oxygen atoms of distinct molecules and Coulombic interactions between the charges adjusted for
a self-polarization correction.


The SPC/E model approximates many-body effects in liquid water and corresponds to a molecular dipole 
moment of 2.35 Debye (D) compared to the actual dipole moment of 1.85 D for an isolated water molecule. 
The model reproduces the diffusion coefficient and thermodynamic properties at ambient temperatures to
within a few per cent, and the critical parameters (see below) are predicted to within 15%. The same model 
potential has been extended to include the interactions between ions and water by fitting the parameters to the 
hydration energies of small ion-water clusters. The parameters for the ion-water and water-water interactions 
in the SPC/E model are given in table A2.3.2. 
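
The quoted dipole moment can be checked from the model geometry. The sketch below assumes hydrogen charges of +0.4238 e (inferred here from charge neutrality with the oxygen charge of -0.8476 e), an OH distance of 1.000 Å and an HOH angle of 109.47°; the function name and the conversion factor 1 e Å = 4.8032 D are supplied for the illustration.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # one elementary charge times one angstrom, in debye


def spce_dipole(q_h=0.4238, d_oh=1.0, angle_deg=109.47):
    """Dipole moment (debye) of a rigid three-site water model.

    q_h is the charge on each hydrogen in units of e, d_oh the OH distance in
    angstrom and angle_deg the HOH angle. The oxygen carries -2*q_h, so the
    dipole points along the bisector with magnitude 2*q_h*d_oh*cos(angle/2).
    """
    half_angle = math.radians(angle_deg) / 2.0
    return 2.0 * q_h * d_oh * math.cos(half_angle) * E_ANGSTROM_TO_DEBYE


mu_spce = spce_dipole()
```

The value obtained is close to 2.35 D, larger than the gas-phase moment, which is how the model builds an average polarization effect into fixed charges.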

Table A2.3.2 Halide-water, alkali metal cation-water and water-water potential parameters (SPC/E model).
In the SPC/E model for water, the charges on H are at 1.000 Å from the Lennard-Jones centre at O. The
negative charge is at the O site and the HOH angle is 109.47°.

Ion/water      σ_IO (Å)    ε_IO (kJ mol^-1)    Charge (q)
F-             3.143       0.6998              -1
Cl-            3.785       0.5216              -1
Br-            3.896       0.4948              -1
I-             4.168       0.5216              -1
Li+            2.337       0.6700              +1
Na+            2.876       0.5216              +1
K+             3.250       0.5216              +1
Rb+            3.348       0.5216              +1
Cs+            3.526       0.5216              +1

Water-water    σ_OO (Å)    ε_OO (kJ mol^-1)    Charge (q)
O(H2O)         3.169       0.6502              -0.8476
H(H2O)
A2.3.2.2 EQUATIONS OF STATE, THE VIRIAL SERIES AND THE LIQUID-VAPOUR CRITICAL POINT

The equation of state of a fluid relates the pressure (P), density ($\rho$) and temperature (T),

$$P = P(\rho, T). \tag{A2.3.10}$$

It is determined experimentally; an early study was the work of Andrews on carbon dioxide [1]. The exact
form of the equation of state is unknown for most substances except in rather simple cases, e.g. a 1D gas of
hard rods. However, the ideal gas law $P = \rho kT$, where k is Boltzmann's constant, is obeyed even by real fluids
at high temperature and low densities, and systematic deviations from this are expressed in terms of the virial
series:

$$Z = P/\rho kT = 1 + B_2(T)\rho + B_3(T)\rho^2 + \cdots \tag{A2.3.11}$$

which is an expansion of the compressibility factor $Z = P/\rho kT$ in powers of the number density $\rho$ at constant
temperature. Here $B_2(T)$, $B_3(T)$, . . . , $B_n(T)$ etc are the second, third, . . . and nth virial coefficients determined
by the intermolecular potentials as discussed later in this chapter. They can be determined experimentally, but
the radius of convergence of the virial series is not known. Figure A2.3.1 shows the second virial coefficient
plotted as a function of temperature for several gases.
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Figure A2.3.1 Second virial coefficient B 2 (T) of several gases as a function of temperature T. (From [10]). 

The temperature at which $B_2(T)$ is zero is the Boyle temperature $T_B$. The excess Helmholtz free energy
follows from the thermodynamic relation

\[ \frac{A^{ex}}{NkT} = \int_0^{\rho}\left(\frac{P}{\rho kT} - 1\right) d\ln\rho = \sum_{n=2}^{\infty}\frac{B_n(T)}{n-1}\,\rho^{\,n-1}. \tag{A2.3.12} \]



The first seven virial coefficients of hard spheres are positive and no Boyle temperature exists for hard 
spheres. 


Statistical mechanical theory and computer simulations provide a link between the equation of state and the
interatomic potential energy functions. A fluid-solid transition at high density has been inferred from
computer simulations of hard spheres. A vapour-liquid phase transition also appears when an attractive
component is present in the interatomic potential (e.g. atoms interacting through a Lennard-Jones potential),
provided the temperature lies below $T_c$, the critical temperature for this transition. This is illustrated in figure
A2.3.2, where the critical point is a point of inflexion of the critical isotherm in the P-V plane.

Below $T_c$, liquid and vapour coexist and their densities approach each other along the coexistence curve in the
T-V plane until they coincide at the critical temperature $T_c$. The coexisting densities in the critical region are
related to $T - T_c$ by the power law

\[ \rho_l - \rho_g \propto (T_c - T)^{\beta} \tag{A2.3.13} \]

where $\beta$ is called a critical exponent. The pressure $P$ approaches the critical pressure $P_c$ along the critical
isotherm like

\[ |P - P_c| \propto |\rho - \rho_c|^{\delta} \tag{A2.3.14} \]

which defines another critical exponent $\delta$. The isothermal compressibility $\kappa_T$ and the constant volume specific
heat $C_V$ are response functions determined by fluctuations in the density and the energy. They diverge at the
critical point, and determine two other critical exponents $\alpha$ and $\gamma$ defined, along the critical isochore, by

\[ C_V \propto |T - T_c|^{-\alpha} \tag{A2.3.15} \]
and
\[ \kappa_T \propto |T - T_c|^{-\gamma}. \tag{A2.3.16} \]
As discussed elsewhere in this encyclopaedia, the critical exponents are related by the following expressions: 

(A2.3.17) 

The individual values of the exponents are determined by the symmetry of the Hamiltonian and the 
dimensionality of the system. 

Although the exact equations of state are known only in special cases, there are several useful approximations 
collectively described as mean-field theories. The most widely known is van der Waals' equation [2] 

\[ P = \frac{\rho kT}{1 - b\rho} - a\rho^2. \tag{A2.3.18} \]


Figure A2.3.2 (a) P-V-T surface for a one-component system that contracts on freezing, (b) P-V isotherms in 
the region of the critical point. 


The parameters a and b are characteristic of the substance, and represent corrections to the ideal gas law due 
to the attractive (dispersion) interactions between the atoms and the volume they occupy due to their repulsive 
cores. We will discuss van der Waals' equation in some detail as a typical example of a mean-field theory. 


van der Waals' equation shows a liquid-gas phase transition with the critical constants $\rho_c = 1/(3b)$,
$P_c = a/(27b^2)$ and $T_c = 8a/(27kb)$. This follows from the property that at the critical point on the P-V plane
there is a point of inflexion and $(\partial^2 P/\partial V^2)_T = (\partial P/\partial V)_T = 0$. These relations determine the parameters a and b
from the experimentally determined critical constants for a substance. The compressibility factor
$Z_c = P_c/\rho_c kT_c$, however, is 3/8, in contrast to the experimental value of about 0.30 for the rare gases. By
expanding van der Waals' equation in powers of $\rho$ one finds that the second virial coefficient

\[ B_2(T) = b - \frac{a}{kT}. \tag{A2.3.19} \]

This is qualitatively of the right form. As $T \to \infty$, $B_2(T) \to b$, and $B_2(T) = 0$ at the Boyle temperature
$T_B = a/(kb)$. This provides another route to determining the parameters a and b.
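
The critical constants and the Boyle temperature quoted above can be checked numerically. The following is a minimal sketch, assuming van der Waals' equation in the per-molecule form $P = \rho kT/(1 - b\rho) - a\rho^2$; the function names and the choice of units with $k = 1$ are illustrative assumptions, not part of the original text.

```python
def vdw_pressure(rho, T, a, b, k=1.0):
    """van der Waals pressure P = rho*k*T/(1 - b*rho) - a*rho**2."""
    return rho * k * T / (1.0 - b * rho) - a * rho ** 2


def vdw_critical_constants(a, b, k=1.0):
    """Critical density, pressure and temperature: 1/(3b), a/(27b^2), 8a/(27kb)."""
    return 1.0 / (3.0 * b), a / (27.0 * b ** 2), 8.0 * a / (27.0 * k * b)


def vdw_second_virial(T, a, b, k=1.0):
    """Second virial coefficient B2(T) = b - a/(kT) of the van der Waals fluid."""
    return b - a / (k * T)


if __name__ == "__main__":
    a, b = 2.0, 0.5  # arbitrary illustrative parameters (units with k = 1)
    rho_c, P_c, T_c = vdw_critical_constants(a, b)
    # Compressibility factor at the critical point; the text quotes 3/8.
    print("Z_c =", P_c / (rho_c * T_c))
    # B2 vanishes at the Boyle temperature T_B = a/(k b).
    print("B2(T_B) =", vdw_second_virial(a / b, a, b))
```

Any positive a and b give the same $Z_c$, which is one way of seeing that the van der Waals critical compressibility factor is universal.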


van der Waals' equation of state is a cubic equation with three distinct solutions for V at a given P and T
below the critical values. Subcritical isotherms show a characteristic loop in which the middle portion
corresponds to positive $(\partial P/\partial V)_T$, representing an unstable region.

The coexisting densities below $T_c$ are determined by the equalities of the chemical potentials and pressures of
the coexisting phases, which implies that the horizontal line joining the coexisting vapour and liquid phases
obeys the condition

\[ \mu_{vapour} - \mu_{liquid} = \int_{liquid}^{vapour} V\,dP = 0 \qquad (T\ \text{constant}). \tag{A2.3.20} \]



This is the well known equal areas rule derived by Maxwell [3], who enthusiastically publicized van der
Waals' equation (see figure A2.3.3). The critical exponents for van der Waals' equation are typical mean-field
exponents $\alpha = 0$, $\beta = 1/2$, $\gamma = 1$ and $\delta = 3$. This follows from the assumption, common to van der Waals'
equation and other mean-field theories, that the critical point is an analytic point about which the free energy
and other thermodynamic properties can be expanded in a Taylor series.

Figure A2.3.3 P-V isotherms for van der Waals' equation of state. Maxwell's equal areas rule (area ABE =
area ECD) determines the volumes of the coexisting phases at subcritical temperatures.

van der Waals' equation can be written in the reduced form

\[ P_R = \frac{8T_R}{3V_R - 1} - \frac{3}{V_R^2} \tag{A2.3.21} \]

using the reduced variables $V_R = V/V_c$, $P_R = P/P_c$ and $T_R = T/T_c$. The reduced second virial coefficient

\[ B_{2R}(T_R) = \frac{B_2(T)}{v_c} = \frac{1}{3} - \frac{9}{8T_R} \tag{A2.3.22} \]

where $v_c = V_c/N$, and the reduced Boyle temperature $T_{RB} = 27/8$. The reduced forms for the equation of state
and second virial coefficient are examples of the law of corresponding states. The statistical mechanical basis
of this law is discussed in this chapter and has wider applicability than the van der Waals form.
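
Maxwell's equal areas rule can be applied numerically to the reduced van der Waals isotherm $P_R = 8T_R/(3V_R - 1) - 3/V_R^2$. The sketch below is one possible implementation, not a method taken from the text: it locates the two spinodal volumes, then bisects on the pressure until the integral of $(P_R - P)\,dV_R$ between the liquid and vapour roots vanishes. The helper names, bracketing intervals and iteration counts are assumptions chosen for illustration, and the routine is only intended for reduced temperatures moderately below 1.

```python
import math


def p_reduced(v, t):
    """Reduced van der Waals pressure P_R = 8 T_R/(3 V_R - 1) - 3/V_R**2."""
    return 8.0 * t / (3.0 * v - 1.0) - 3.0 / v ** 2


def _bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f changes sign between lo and hi."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0.0) == (flo > 0.0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def maxwell_construction(t):
    """Return (P_R, V_liquid, V_vapour) at a reduced temperature t < 1."""

    def spinodal(v):
        # dP/dV = 0 is equivalent to 4 t V**3 - (3V - 1)**2 = 0.
        return 4.0 * t * v ** 3 - (3.0 * v - 1.0) ** 2

    v_s1 = _bisect(spinodal, 1.0 / 3.0 + 1e-12, 1.0)  # liquid-side spinodal
    v_s2 = _bisect(spinodal, 1.0, 1e3)                # vapour-side spinodal
    p_lo = max(p_reduced(v_s1, t), 1e-10)             # loop minimum (or ~0)
    p_hi = p_reduced(v_s2, t)                         # loop maximum

    def outer_volumes(p):
        # The liquid root lies left of v_s1, the vapour root right of v_s2.
        def g(v):
            return p_reduced(v, t) - p
        return _bisect(g, 1.0 / 3.0 + 1e-9, v_s1), _bisect(g, v_s2, 1e12)

    def antiderivative(v):
        # Indefinite integral of P_R with respect to V_R.
        return (8.0 * t / 3.0) * math.log(3.0 * v - 1.0) + 3.0 / v

    def area_mismatch(p):
        # Integral of (P_R(V) - p) dV between the outer roots; zero at coexistence.
        v_l, v_g = outer_volumes(p)
        return antiderivative(v_g) - antiderivative(v_l) - p * (v_g - v_l)

    p_coex = _bisect(area_mismatch, p_lo, p_hi)
    v_l, v_g = outer_volumes(p_coex)
    return p_coex, v_l, v_g


if __name__ == "__main__":
    print(maxwell_construction(0.9))
```

Because the result depends only on $T_R$, the same coexistence curve applies to every van der Waals fluid, which is the law of corresponding states in its simplest form.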


A2.3.3 ENSEMBLES 

An ensemble is a collection of systems with the same r + 2 variables, at least one of which must be extensive.
Here r is the number of components. For a one-component system there are just three variables and the
ensembles are characterized by given values of N, V, T (canonical), μ, V, T (grand canonical), N, V, E
(microcanonical) and N, P, T (constant pressure). Our discussion of strongly interacting systems of classical
fluids begins with the fundamental equations in canonical and grand canonical ensembles. The results are
equivalent to each other in the thermodynamic limit. The particular choice of an ensemble is a matter of
convenience in developing the theory, or in treating fluctuations such as those in the density or the energy.

A2.3.3.1 CANONICAL ENSEMBLE (N, V, T)

This is a collection of closed systems with the same number of particles N and volume V (constant density) for
each system at temperature T. The partition function

\[ Q(N, V, T) = \frac{Q_{int}\,Z(N, V, T)}{N!\,\Lambda^{3N}} \tag{A2.3.23} \]

where $Q_{int}$ is the internal partition function (PF) determined by the vibration, rotation, electronic states and
other degrees of freedom. It can be factored into contributions $q_{int}$ from each molecule so that $Q_{int} = q_{int}^N$. The
factor $1/\Lambda^{3N}$ is the translational PF in which $\Lambda = h/(2\pi mkT)^{1/2}$ is the thermal de Broglie wavelength. The
configurational PF assuming classical statistics for this contribution is

\[ Z(N, V, T) = \frac{1}{\Omega^N}\int\!\!\int \exp\bigl(-\beta U_N(\mathbf{r}^N, \omega^N)\bigr)\,d\mathbf{r}^N\,d\omega^N \tag{A2.3.24} \]

where $\Omega$ is $4\pi$ for linear molecules and $8\pi^2$ for nonlinear molecules. The classical configurational PF is
independent of the momenta and the masses of the particles. In the thermodynamic limit ($N \to \infty$, $V \to \infty$,
$N/V = \rho$), the Helmholtz free energy

\[ A = -kT \ln Q(N, V, T). \tag{A2.3.25} \]

Other thermodynamic properties are related to the PF through the equation

\[ dA = -S\,dT - P\,dV + \sum_i \mu_i\,dN_i \tag{A2.3.26} \]

where $\mu_i$ is the chemical potential of species i, and the summation is over the different species. The pressure

\[ P = kT\left(\frac{\partial \ln Z}{\partial V}\right)_{N,T} \tag{A2.3.27} \]

and since the classical configurational PF Z is independent of the mass, so is the equation of state derived
from it. Differences between the equations of state of isotopes or isotopically substituted compounds (e.g.
H2O and D2O) are due to quantum effects.


For an ideal gas, $U_N(\mathbf{r}^N, \omega^N) = 0$ and the configurational PF $Z(N, V, T) = V^N$. Making use of Stirling's
approximation $N! \approx (N/e)^N$ for large N, it follows that the Helmholtz free energy

\[ A^{ideal} = -NkT \ln(q_{int}\,e/\Lambda^3) + NkT \ln\rho \tag{A2.3.28} \]

and the chemical potential

\[ \mu^{ideal} = (\partial A^{ideal}/\partial N)_{V,T} = kT \ln(\Lambda^3/q_{int}) + kT \ln\rho. \tag{A2.3.29} \]

The excess Helmholtz free energy

\[ A^{ex} = A - A^{ideal} = -kT \ln[Z(N, V, T)/V^N] \tag{A2.3.30} \]

and the excess chemical potential

\[ \begin{aligned} \mu^{ex} &= (\partial A^{ex}/\partial N)_{V,T} \\ &\approx A^{ex}(N+1, V, T) - A^{ex}(N, V, T) \quad \text{for large } N \\ &= kT \ln[V Z(N, V, T)/Z(N+1, V, T)]. \end{aligned} \tag{A2.3.31} \]

Confining our attention to angularly-independent potentials to make the argument and notation simpler,

\[ Z(N+1, V, T) = \int \exp\bigl(-\beta U_{N+1}(\mathbf{r}^{N+1})\bigr)\,d\mathbf{r}^{N+1} \]

in which

\[ U_{N+1}(\mathbf{r}^{N+1}) = U_N(\mathbf{r}^N) + \Delta U_N(\mathbf{r}_{N+1}, \mathbf{r}^N) \]

where $\Delta U_N(\mathbf{r}_{N+1}, \mathbf{r}^N)$ is the interaction of the (N + 1)th particle situated at $\mathbf{r}_{N+1}$ with the remaining N
particles at coordinates $\mathbf{r}^N = \{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N\}$. Substituting this into the expression for $Z(N+1, V, T)$ we see
that the ratio

\[ \frac{Z(N+1, V, T)}{Z(N, V, T)} = \int\!\!\int \exp\bigl(-\beta\Delta U_N(\mathbf{r}_{N+1}, \mathbf{r}^N)\bigr)\,\frac{\exp\bigl(-\beta U_N(\mathbf{r}^N)\bigr)}{Z(N, V, T)}\,d\mathbf{r}^N\,d\mathbf{r}_{N+1}. \]

But $\exp(-\beta U_N(\mathbf{r}^N))/Z(N, V, T)$ is just the probability that the N particle system is in the configuration $\{\mathbf{r}^N\}$,
and it follows that

\[ \begin{aligned} \mu^{ex} &= -kT \ln[Z(N+1, V, T)/V Z(N, V, T)] \\ &= -kT \ln\bigl\langle \exp\bigl(-\beta\Delta U_N(\mathbf{r}_{N+1}, \mathbf{r}^N)\bigr)\bigr\rangle_N \end{aligned} \tag{A2.3.32} \]

where $\langle \ldots \rangle_N$ represents a configurational average over the canonical N particle system. This expression for
$\mu^{ex}$, proposed by Widom, is the basis of the particle insertion method used in computer simulations to
determine the excess chemical potential. The exponential term is either zero or one for hard core particles, and

\[ \mu^{ex} = -kT \ln P(0, v) \tag{A2.3.33} \]

where $P(0, v)$ is the probability of forming a cavity of the same size and shape as the effective volume v
occupied by the particle.
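
The particle insertion idea can be illustrated with a small Monte Carlo sketch. To keep it exactly checkable, the example below uses 1D hard rods between hard walls rather than a 3D fluid; this choice, the function names and the parameter values are assumptions made for illustration. For hard rods the Boltzmann factor of a trial insertion is 1 if the test rod overlaps nothing and 0 otherwise, so the average in Widom's formula is just the fraction of successful insertions. With walls, the coordinate of the test rod ranges over a length L - d (which plays the role of V and tends to L in the thermodynamic limit), and the expected fraction is $Z(N+1)/[(L - d)Z(N)]$ with $Z(N) = (L - Nd)^N$.

```python
import math
import random


def sample_hard_rods(n, length, d, rng):
    """Draw an equilibrium configuration of n hard rods on [0, length].

    Returns sorted left-end positions. Sorting n uniform numbers on the free
    length (length - n*d) and re-inserting the rod lengths samples this 1D
    model exactly.
    """
    free = sorted(rng.uniform(0.0, length - n * d) for _ in range(n))
    return [y + i * d for i, y in enumerate(free)]


def widom_insertion_probability(n, length, d, n_samples=20000, seed=0):
    """Estimate <exp(-beta * dU)>_N for hard rods by trial insertions."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_samples):
        rods = sample_hard_rods(n, length, d, rng)
        trial = rng.uniform(0.0, length - d)  # left end of the test rod
        # exp(-beta * dU) is 1 when the test rod overlaps no rod, else 0.
        if all(abs(trial - x) >= d for x in rods):
            accepted += 1
    return accepted / n_samples


def exact_insertion_probability(n, length, d):
    """Z(N+1) / [(L - d) Z(N)] with Z(N) = (L - N d)**N for hard rods."""
    return (length - (n + 1) * d) ** (n + 1) / ((length - d) * (length - n * d) ** n)


if __name__ == "__main__":
    estimate = widom_insertion_probability(10, 20.0, 1.0)
    exact = exact_insertion_probability(10, 20.0, 1.0)
    # Excess chemical potential in units of kT: mu_ex/kT = -ln(probability).
    print(-math.log(estimate), -math.log(exact))
```

In a real simulation the configurations would come from Monte Carlo or molecular dynamics sampling rather than from an exact sampler, and insertion becomes inefficient at high density because successful insertions are rare.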

A2.3.3.2 GRAND CANONICAL ENSEMBLE (μ, V, T)

This is a collection of systems at constant μ, V and T in which the number of particles can fluctuate. It is of
particular use in the study of open systems. The PF

\[ \Xi(\mu, V, T) = \sum_{N \ge 0} Q(N, V, T)\,\lambda^N = \sum_{N \ge 0} \frac{Z(N, V, T)}{N!}\,z^N \tag{A2.3.34} \]

where the absolute activity $\lambda = \exp(\mu/kT)$ and the fugacity $z = q_{int}\lambda/\Lambda^3$. The first equation shows that the
grand PF is a generating function for the canonical ensemble PF. The chemical potential

\[ \mu = kT \ln(\Lambda^3/q_{int}) + kT \ln z. \tag{A2.3.35} \]

For an ideal gas $Z(N, V, T) = V^N$, we have seen earlier that

\[ \mu^{ideal} = kT \ln(\Lambda^3/q_{int}) + kT \ln\rho. \tag{A2.3.36} \]

The excess chemical potential

\[ \mu^{ex} = \mu - \mu^{ideal} = kT \ln(z/\rho) \tag{A2.3.37} \]

and $(z/\rho) \to 1$ as $\rho \to 0$. In the thermodynamic limit ($V \to \infty$), the pressure

\[ P = \frac{kT}{V} \ln \Xi(\mu, V, T). \tag{A2.3.38} \]

The characteristic thermodynamic equation for this ensemble is

\[ d(PV) = S\,dT + P\,dV + \sum_i N_i\,d\mu_i \tag{A2.3.39} \]

from which the thermodynamic properties follow. In particular, for a one-component system,

\[ \langle N \rangle = \left(\frac{\partial (PV)}{\partial \mu}\right)_{T,V} = kT\left(\frac{\partial \ln \Xi}{\partial \mu}\right)_{T,V}. \tag{A2.3.40} \]

Note that the average density $\rho = \langle N \rangle/V$. Defining $\chi(z) = (1/V)\ln \Xi$, one finds that

\[ P/kT = \chi(z), \qquad \rho = z\,d\chi(z)/dz. \tag{A2.3.41} \]

On eliminating z between these equations, we get the equation of state $P = P(\rho, T)$ and the virial coefficients
by expansion in powers of the density. For an ideal gas, $Z(N, V, T) = V^N$, $\Xi = \exp(zV)$ and $P/kT = z = \chi(z) = \rho$.
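
The ideal gas result can be verified by summing the grand PF term by term. The sketch below assumes $Z(N, V, T) = V^N$ and truncates the sum at a finite number of terms; the function names and the truncation parameter are illustrative choices, not from the text.

```python
import math


def grand_pf_ideal(z, volume, n_max=200):
    """Xi = sum over N of V**N z**N / N!, truncated at n_max terms."""
    term, total = 1.0, 1.0  # N = 0 contribution
    for n in range(1, n_max + 1):
        term *= z * volume / n  # builds (zV)**n / n! recursively
        total += term
    return total


def mean_number_ideal(z, volume, n_max=200):
    """<N> as the average of N with weights (zV)**N / N!."""
    term, norm, weighted = 1.0, 1.0, 0.0
    for n in range(1, n_max + 1):
        term *= z * volume / n
        norm += term
        weighted += n * term
    return weighted / norm


if __name__ == "__main__":
    z, volume = 0.5, 10.0
    chi = math.log(grand_pf_ideal(z, volume)) / volume  # P/kT
    rho = mean_number_ideal(z, volume) / volume         # <N>/V
    print(chi, rho)  # both should be close to z for an ideal gas
```

The weights $(zV)^N/N!$ form a Poisson distribution in N, which is why the mean particle number is simply zV.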

A2.3.3.3 THE VIRIAL COEFFICIENTS 

The systematic calculation of the virial coefficients, using the grand canonical ensemble by eliminating z 
between the equations presented above, is a technically formidable problem. The solutions presented using 
graph theory to represent multidimensional integrals are elegant, but impractical beyond the first six or seven 
virial coefficients due to the mathematical complexity and labour involved in calculating the integrals. 
However, the formal theory led to the development of density expansions for the correlation functions, which 
have proved to be extremely useful in formulating approximations for them. 

A direct and transparent derivation of the second virial coefficient follows from the canonical ensemble. To 
make the notation and argument simpler, we first assume pairwise additivity of the total potential with no 
angular contribution. The extension to angularly-independent non-pairwise additive potentials is 
straightforward. The total potential 

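\[ U_N(\mathbf{r}^N) = \sum_{i<j} u_{ij}(r_{ij}) \tag{A2.3.42} \]

and the configurational PF assuming classical statistics is

\[ \begin{aligned} Z(N, V, T) &= \int \cdots \int \exp\Bigl(-\beta \sum_{i<j} u_{ij}(r_{ij})\Bigr)\,d\mathbf{r}_1 \ldots d\mathbf{r}_N \\ &= \int \cdots \int \prod_{i<j} \exp\bigl(-\beta u_{ij}(r_{ij})\bigr)\,d\mathbf{r}_1 \ldots d\mathbf{r}_N. \end{aligned} \tag{A2.3.43} \]

The Mayer function defined by

\[ f_{ij}(r_{ij}) = \exp\bigl(-\beta u_{ij}(r_{ij})\bigr) - 1 \tag{A2.3.44} \]

figures prominently in the theoretical development [7]. It is a step function for hard spheres:

\[ f_{ij}(r_{ij}) = \begin{cases} -1 & r_{ij} < \sigma \\ 0 & r_{ij} > \sigma \end{cases} \tag{A2.3.45} \]

where $\sigma$ is the sphere diameter. More generally, for potentials with a repulsive core and an attractive well,
$f_{ij}(r_{ij})$ has the limiting values -1 and 0 as $r \to 0$ and $\infty$, respectively, and a maximum at the interatomic distance
corresponding to the minimum in $u_{ij}(r_{ij})$. Substituting $(1 + f_{ij}(r_{ij}))$ for $\exp(-\beta u_{ij}(r_{ij}))$ decomposes the
configurational PF into a sum of terms:

\[ \begin{aligned} Z(N, V, T) &= \int \cdots \int \prod_{i<j} \bigl(1 + f_{ij}(r_{ij})\bigr)\,d\mathbf{r}_1 \ldots d\mathbf{r}_N \\ &= \int \cdots \int \Bigl[1 + \sum_{i<j} f_{ij}(r_{ij}) + \cdots\Bigr]\,d\mathbf{r}_1 \ldots d\mathbf{r}_N \\ &= V^N + \frac{N(N-1)}{2}\,V^{N-2} \int\!\!\int f_{12}(r_{12})\,d\mathbf{r}_1\,d\mathbf{r}_2 + \cdots \\ &= V^N\Bigl[1 - \frac{N(N-1)}{V}\,I_2(T) + \cdots\Bigr] \end{aligned} \tag{A2.3.46} \]

where

\[ I_2(T) = -\frac{1}{2V} \int\!\!\int f_{12}(r_{12})\,d\mathbf{r}_1\,d\mathbf{r}_2. \]

The third step follows after interchanging summation and integration and recognizing that the N(N - 1)/2
terms in the sum are identical. The pressure follows from the relation

\[ \begin{aligned} P &= kT\left(\frac{\partial \ln Z}{\partial V}\right)_{N,T} = \frac{NkT}{V} + \frac{kT\,[N(N-1)/V^2]\,I_2(T)}{1 - [N(N-1)/V]\,I_2(T)} + \cdots \\ &= \frac{NkT}{V}\Bigl[1 + \frac{N-1}{V}\,I_2(T) + \cdots\Bigr]. \end{aligned} \tag{A2.3.47} \]

In the thermodynamic limit ($N \to \infty$, $V \to \infty$ with $N/V = \rho$), this is just the virial expansion for the pressure,
with $I_2(T)$ identified as the second virial coefficient

\[ B_2(T) = -\frac{1}{2V} \int\!\!\int f_{12}(r_{12})\,d\mathbf{r}_1\,d\mathbf{r}_2 = -\frac{1}{2V} \int\!\!\int f_{12}(r_{12})\,d\mathbf{r}_1\,d\mathbf{r}_{12}. \]

The second step involves a coordinate transformation to the origin centred at particle 1 with respect to which
the coordinates of particle 2 are defined. Since $f_{12}$ depends only on the distance between particles 1 and 2,
integration over the position of 1 gives a factor V that cancels the V in the denominator:

\[ B_2(T) = -\frac{1}{2} \int f_{12}(r_{12})\,d\mathbf{r}_{12}. \tag{A2.3.48} \]

Finally, the assumed spherical symmetry of the interactions implies that the volume element $d\mathbf{r}_{12}$ is
$4\pi r_{12}^2\,dr_{12}$. For angularly-dependent potentials, the second virial coefficient

\[ B_2(T) = -\frac{1}{2\Omega^2} \int\!\!\int\!\!\int f_{12}(\mathbf{r}_{12}, \omega_1, \omega_2)\,d\mathbf{r}_{12}\,d\omega_1\,d\omega_2 \tag{A2.3.49} \]

where $f_{12}(\mathbf{r}_{12}, \omega_1, \omega_2)$ is the corresponding Mayer f-function for an angularly-dependent pair potential
$u_{12}(\mathbf{r}_{12}, \omega_1, \omega_2)$.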
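
For spherically symmetric potentials the result above reduces to the one-dimensional integral $B_2(T) = -2\pi\int_0^\infty f_{12}(r)\,r^2\,dr$, which is easy to evaluate numerically. The following is a minimal sketch using Simpson's rule; the Lennard-Jones 12-6 form, the reduced units ($\varepsilon = \sigma = k = 1$), the cut-off radius and the grid size are assumptions made for illustration.

```python
import math


def mayer_f(r, beta, u):
    """Mayer function f(r) = exp(-beta * u(r)) - 1."""
    return math.exp(-beta * u(r)) - 1.0


def second_virial(u, beta, r_max=50.0, n=20000):
    """B2 = -2*pi * integral of f(r) r**2 dr on [0, r_max], Simpson's rule (n even)."""
    h = r_max / n
    total = 0.0
    for i in range(n + 1):
        r = i * h
        # At r = 0 the integrand f(r) * r**2 vanishes because f is finite there.
        g = 0.0 if i == 0 else mayer_f(r, beta, u) * r * r
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * g
    return -2.0 * math.pi * total * h / 3.0


def lennard_jones(r, eps=1.0, sigma=1.0):
    """Lennard-Jones 12-6 pair potential."""
    x = (sigma / r) ** 6
    return 4.0 * eps * (x * x - x)


def hard_sphere(r, sigma=1.0):
    """Hard-sphere potential: infinite inside the core, zero outside."""
    return math.inf if r < sigma else 0.0


if __name__ == "__main__":
    # Hard spheres: B2 should be close to 2*pi*sigma**3/3 at any temperature.
    print(second_virial(hard_sphere, beta=1.0, r_max=2.0, n=2000))
    # Lennard-Jones: B2 changes sign at the Boyle temperature.
    for t_star in (1.0, 3.0, 4.0):
        print(t_star, second_virial(lennard_jones, beta=1.0 / t_star))
```

The hard-sphere value is only accurate to about the grid spacing, because Simpson's rule is applied across the discontinuity of the step function at $r = \sigma$.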

The nth virial coefficient can be written as sums of products of Mayer f-functions integrated over the
coordinates and orientations of n particles. The third virial coefficient for spherically symmetric potentials is

\[ B_3(T) = -\frac{1}{3} \int\!\!\int f_{12}(r_{12})\,f_{13}(r_{13})\,f_{23}(r_{23})\,d\mathbf{r}_{12}\,d\mathbf{r}_{13}. \tag{A2.3.50} \]

If we represent the f bond by a line with two open circles to denote the coordinates of the particles 1 and 2,
then the first two virial coefficients can be depicted graphically as

B_2(T) = -(1/2) [graph: one f bond joining an open circle and a blackened circle] (A2.3.51)

and

B_3(T) = -(1/3) [graph: triangle of three f bonds with one open and two blackened circles] (A2.3.52)

where blackening a circle implies integration over the coordinates of the particle represented by the circle.
The higher virial coefficients can be economically expressed in this notation by extending it to include the
symmetry number [5].

For hard spheres of diameter $\sigma$, $f_{12}(r_{12}) = -1$ for $r_{12} < \sigma$ and is zero otherwise. It follows that

\[ B_2^{HS} = b_0 = 2\pi\sigma^3/3 \tag{A2.3.53} \]

where the second virial coefficient, abbreviated as $b_0$, is independent of temperature and is four times the
volume of each sphere. This is called the excluded volume correction per molecule; the difference between
the system volume V and the excluded volume of the molecules is the actual volume available for further
occupancy. The factor four arises from the fact that $\sigma$ is the distance of closest approach of the centres of the
two spheres and the excluded volume for a pair is the volume of a sphere of radius $\sigma$. Each molecule
contributes half of this to the second virial coefficient, which is equal to $b_0$. The third and fourth virial
coefficients for hard spheres have been calculated exactly and are

\[ B_3^{HS}/b_0^2 = 5/8 \tag{A2.3.54} \]

\[ B_4^{HS}/b_0^3 = [2707\pi + 438\sqrt{2} - 4131 \arccos(1/3)]/4480\pi \tag{A2.3.55} \]

while the fifth, sixth and seventh virial coefficients were determined numerically by Ree and Hoover [8]:

\[ B_5^{HS}/b_0^4 = 0.1103, \qquad B_6^{HS}/b_0^5 = 0.0386, \qquad B_7^{HS}/b_0^6 = 0.0138. \tag{A2.3.56} \]

They are positive and independent of temperature.

The virial series in terms of the packing fraction $\eta = \pi\rho\sigma^3/6$ is then

\[ P/\rho kT = 1 + 4\eta + 10\eta^2 + 18.36\eta^3 + 28.25\eta^4 + 39.5\eta^5 + \cdots \tag{A2.3.57} \]

which, as noticed by Carnahan and Starling [9], can be approximated as

\[ P/\rho kT = 1 + 4\eta + 10\eta^2 + 18\eta^3 + 28\eta^4 + 40\eta^5 + \cdots. \tag{A2.3.58} \]

This is equivalent to approximating the first few coefficients in this series by

\[ C_n = (n - 1)^2 + 3(n - 1) \quad \text{for } n \ge 2. \]

Assuming that this holds for all n enables the series to be summed exactly, when

\[ \frac{P}{\rho kT} = \frac{1 + \eta + \eta^2 - \eta^3}{(1 - \eta)^3}. \tag{A2.3.59} \]

This is Carnahan and Starling's (CS) equation of state for hard spheres; it agrees well with the computer
simulations of hard spheres in the fluid region. The excess Helmholtz free energy

\[ \frac{A^{ex}}{NkT} = \int_0^{\eta} \left(\frac{P}{\rho kT} - 1\right) \frac{d\eta}{\eta} = \frac{\eta(4 - 3\eta)}{(1 - \eta)^2}. \tag{A2.3.60} \]

Figure A2.3.4 compares $P/\rho kT - 1$, calculated from the CS equation of state for hard spheres, as a function of
the reduced density $\rho\sigma^3$ with the virial expansion.

Figure A2.3.4 The equation of state $P/\rho kT - 1$, calculated from the virial series and the CS equation of state
for hard spheres, as a function of $\eta = \pi\rho\sigma^3/6$ where $\rho\sigma^3$ is the reduced density.
These equations provide a convenient and accurate representation of the thermodynamic properties of hard
spheres, especially as a reference system in perturbation theories for fluids.

A2.3.3.4 QUANTUM CORRECTIONS AND PATH INTEGRAL METHODS

We have so far ignored quantum corrections to the virial coefficients by assuming classical statistical
mechanics in our discussion of the configurational PF. Quantum effects, when they are relatively small, can
be treated as a perturbation (Friedman 1995) when the leading correction to the PF can be written as

\[ 1 - \frac{\beta^2\hbar^2}{24} \sum_{i=1}^{N} \frac{1}{m_i}\,\langle U_i'' \rangle \tag{A2.3.61} \]

where $U_i'' = \partial^2 U_N/\partial r_i^2$ is the curvature in the potential energy function for a given configuration. The
curvature of the pair potential is greatest near the minimum in the potential, and is analogous to a force
constant, with the corresponding angular frequency given by $(\langle U'' \rangle/m)^{1/2}$. Expressing this as a wavenumber $\tilde{\nu}$
in cm^-1, the leading correction to the classical Helmholtz free energy of a system with a pairwise additive
potential is given by




(A2.3.62) 


which shows that the quantum correction is significant only if the mean curvature of the potential corresponds
to a frequency of 1000 cm^-1 or more [4] and the temperature is low. Thus the quantum corrections to the
second virial coefficient of light atoms or molecules such as 4He, H2, D2 and Ne are significant [6, 8]. For
angularly-dependent potentials such as for water, the quantum effects of the rotational contributions to the
second virial coefficient also contribute to significant deviations from the classical value (10 or 20%) at low
temperatures [10].

When quantum effects are large, the PF can be evaluated by path integral methods [11]. Our exposition
follows a review article by Gillan [12]. Starting with the canonical PF for a system of N particles

\[ Q(N, V, T) = \mathrm{Tr}\,e^{-\beta H} = \sum_n \langle n | e^{-\beta H} | n \rangle \tag{A2.3.63} \]

where $\beta = 1/kT$ and the trace is taken over a complete set of orthonormal states $|n\rangle$. If the states $|n\rangle$ are the
eigenstates of the Hamiltonian operator, the PF simplifies to

\[ Q(N, V, T) = \sum_n \exp(-\beta E_n) \tag{A2.3.64} \]

where the $E_n$ are the eigenvalues. The average value of an observable A is given by

\[ \langle A \rangle = Q(N, V, T)^{-1} \sum_n \langle n | \hat{A} | n \rangle \exp(-\beta E_n) \tag{A2.3.65} \]

where $\hat{A}$ is the operator corresponding to the observable A. The above expression for $\langle A \rangle$ is the sum of the
expectation values of A in each state weighted by the probabilities of each state at a temperature T. The
difficulty in evaluating this for all except simple systems lies in (a) the enormous number of variables required
to represent the state of N particles, which makes the sum difficult to determine, and (b) the symmetry
requirements of quantum mechanics for particle exchange, which must be incorporated into the sum.

To make further progress, consider first the PF of a single particle in a potential field V(x) moving in one
dimension. The Hamiltonian operator

\[ H = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + V(x). \tag{A2.3.66} \]

The eigenvalues and eigenfunctions are $E_n$ and $\phi_n(x)$ respectively. The density matrix

\[ \rho(x, x'; \beta) = \sum_n \phi_n(x)\,\phi_n^*(x') \exp(-\beta E_n) \tag{A2.3.67} \]

is a sum over states; it is a function of x and x' and the temperature. The PF

\[ Q(V, T) = \sum_n \exp(-\beta E_n) = \int dx\,\rho(x, x; \beta) \tag{A2.3.68} \]

which is equivalent to setting x = x' in the density matrix and integrating over x. The average value of an
observable A is then given by

\[ \langle A \rangle = Q^{-1} \int \bigl[\hat{A}\rho(x, x'; \beta)\bigr]_{x = x'}\,dx \tag{A2.3.69} \]

where $[\hat{A}\rho(x, x'; \beta)]_{x = x'}$ means $\hat{A}$ operates on $\rho(x, x'; \beta)$ and then x' is set equal to x. A differential equation for the
density matrix follows from differentiating it with respect to $\beta$, when one finds

\[ \frac{\partial \rho(x, x'; \beta)}{\partial \beta} = -H\rho(x, x'; \beta) \tag{A2.3.70} \]

which is the Bloch equation. This is similar in form to the time-dependent Schrödinger equation:

\[ i\hbar\,\frac{\partial \psi(x, t)}{\partial t} = H\psi(x, t). \tag{A2.3.71} \]

The time evolution of the wavefunction $\psi(x, t)$ is given by

\[ \psi(x, t) = \int dx'\,K(x, x'; t)\,\psi(x', 0) \tag{A2.3.72} \]

where $K(x, x'; t)$ is the propagator, which has two important properties. The first follows from differentiating
the above equation with respect to time, when it is seen that the propagator obeys the equation

\[ i\hbar\,\frac{\partial K(x, x'; t)}{\partial t} = HK(x, x'; t) \tag{A2.3.73} \]

with the boundary condition

\[ K(x, x'; 0) = \delta(x - x'). \tag{A2.3.74} \]

Comparing this with the Bloch equation establishes a correspondence between t and $-i\beta\hbar$. Putting $t = -i\beta\hbar$, one
finds

\[ \rho(x, x'; \beta) = K(x, x'; -i\beta\hbar). \tag{A2.3.75} \]

The boundary condition is equivalent to the completeness relation

\[ \sum_n \phi_n(x)\,\phi_n^*(x') = \delta(x - x'). \tag{A2.3.76} \]

The second important property of the propagator follows from the fact that time is a serial quantity, and

\[ \begin{aligned} \psi(x, t_1 + t_2) &= \int dx''\,K(x, x''; t_2)\,\psi(x'', t_1) \\ &= \int\!\!\int dx'\,dx''\,K(x, x''; t_2)\,K(x'', x'; t_1)\,\psi(x', 0) \end{aligned} \tag{A2.3.77} \]

which implies that

\[ K(x, x'; t_1 + t_2) = \int dx''\,K(x, x''; t_2)\,K(x'', x'; t_1). \tag{A2.3.78} \]
A similar expression applies to the density matrix, from its correspondence with the propagator. For example,

\[ \rho(x, x'; \beta) = \int dx''\,\rho(x, x''; \beta/2)\,\rho(x'', x'; \beta/2) \tag{A2.3.79} \]

and generalizing this to the product of P factors at the inverse temperature $\beta/P$,

\[ \rho(x_0, x_P; \beta) = \int \cdots \int dx_1\,dx_2 \ldots dx_{P-1}\,\rho(x_0, x_1; \beta/P)\,\rho(x_1, x_2; \beta/P) \ldots \rho(x_{P-1}, x_P; \beta/P). \tag{A2.3.80} \]

Here P is an integer. By joining the ends $x_0$ and $x_P$ and labelling this x, one sees that

\[ \begin{aligned} Q(N, V, T) &= \int dx\,\rho(x, x; \beta) \\ &= \int \cdots \int dx_1 \ldots dx_P\,\rho(x_0, x_1; \beta/P)\,\rho(x_1, x_2; \beta/P) \ldots \rho(x_{P-1}, x_P; \beta/P), \qquad x_0 = x_P, \end{aligned} \tag{A2.3.81} \]

which has an obvious cyclic structure of P beads connecting P density matrices at an inverse temperature of
$\beta/P$. This increase in temperature by a factor P for the density matrix corresponds to a short time
approximation for the propagator reduced by the same factor P.

To evaluate the density matrix at high temperature, we return to the Bloch equation, which for a free particle
($V(x) = 0$) reads

\[ \frac{\partial \rho(x, x'; \beta)}{\partial \beta} = \frac{\hbar^2}{2m}\frac{\partial^2 \rho(x, x'; \beta)}{\partial x^2} \tag{A2.3.82} \]

which is similar to the diffusion equation

\[ \frac{\partial n}{\partial t} = D\,\frac{\partial^2 n}{\partial x^2}. \tag{A2.3.83} \]

The solution to this is a Gaussian function, which spreads out in time. Hence the solution to the Bloch
equation for a free particle is also a Gaussian:

\[ \rho(x, x'; \beta) = \left(\frac{m}{2\pi\beta\hbar^2}\right)^{1/2} \exp\left[-\frac{m}{2\beta\hbar^2}(x - x')^2 - \beta V\right] \tag{A2.3.84} \]

where V is zero. The above solution also applies to a single-particle, 1D system in a constant potential V. For a
single particle acted on by a potential V(x), we can treat it as constant in the region between x and x', with a
value equal to its mean, when $\beta$ is small. One then has the approximation

\[ \rho(x, x'; \beta/P) \approx \left(\frac{mP}{2\pi\beta\hbar^2}\right)^{1/2} \exp\left[-\frac{mP}{2\beta\hbar^2}(x - x')^2 - \frac{\beta}{2P}\bigl(V(x) + V(x')\bigr)\right] \tag{A2.3.85} \]

which leads to the following expression for the PF of a single particle in a potential V(x):

\[ Q_P = \left(\frac{mP}{2\pi\beta\hbar^2}\right)^{P/2} \int \cdots \int dx_1 \ldots dx_P \exp\left\{-\beta \sum_{s=1}^{P} \left[\frac{\kappa}{2}(x_{s+1} - x_s)^2 + P^{-1}V(x_s)\right]\right\}, \qquad x_{P+1} = x_1. \tag{A2.3.86} \]

Feynman showed that this is exact in the limit $P \to \infty$. The expression for $Q_P$ has the following important
characteristics.

(a) It is the classical PF for a polymer chain of P beads coupled by the harmonic force constant

\[ \kappa = \frac{mP}{\beta^2\hbar^2} \tag{A2.3.87} \]

with each bead acted on by a potential V(x)/P. The spring constant comes from the kinetic energy operator in
the Hamiltonian.

(b) The cyclic polymer chain has a mean extension, characterized by the root mean square of its radius of
gyration

\[ \Delta^2 = P^{-1} \Bigl\langle \sum_{s=1}^{P} (\Delta x_s)^2 \Bigr\rangle \tag{A2.3.88} \]

where $\Delta x_s = x_s - \bar{x}$ and $\bar{x}$ is the instantaneous centre of mass of the chain, defined by

\[ \bar{x} = P^{-1} \sum_{s=1}^{P} x_s. \tag{A2.3.89} \]

For free particles, the mean square radius of gyration is essentially the thermal wavelength to within a
numerical factor, and for a 1D harmonic oscillator in the $P \to \infty$ limit,

\[ \Delta^2 = (\beta m\omega^2)^{-1} \bigl[(\beta\hbar\omega/2)\coth(\beta\hbar\omega/2) - 1\bigr] \tag{A2.3.90} \]

where $\omega$ is the frequency of the oscillator. As $T \to 0$, this tends to the mean square amplitude of vibration in
the ground state.

(c) The probability distribution of finding the particle at $x_1$ is given by

\[ P(x_1) = \frac{\int \cdots \int dx_2 \ldots dx_P \exp\bigl\{-\beta \sum_{s} \bigl[(\kappa/2)(x_{s+1} - x_s)^2 + P^{-1}V(x_s)\bigr]\bigr\}}{\int \cdots \int dx_1 \ldots dx_P \exp\bigl\{-\beta \sum_{s} \bigl[(\kappa/2)(x_{s+1} - x_s)^2 + P^{-1}V(x_s)\bigr]\bigr\}} \tag{A2.3.91} \]

which shows that it is the same as the probability of finding one particular bead at $x_1$ in the classical
isomorph, which is the same as 1/P times the probability of finding any one of the P beads at $x_1$.
The isomorphism between the quantum system and the classical polymer chain allows a variety of techniques,
including simulations, to study these systems.
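
The factorization of the density matrix into P high-temperature pieces can be tested on a grid for a 1D harmonic oscillator, where the exact PF $1/[2\sinh(\beta\hbar\omega/2)]$ is known. The sketch below is an illustration rather than a production path integral code: it builds the high-temperature density matrix at inverse temperature $\beta/P$ on a uniform grid and squares it repeatedly, each squaring being a discretized version of the convolution over the intermediate coordinate. The units ($m = \hbar = \omega = 1$), the grid range and spacing, and the restriction to $P = 2^k$ are all assumptions made for simplicity.

```python
import math


def high_temperature_density_matrix(grid, tau, potential, m=1.0, hbar=1.0):
    """rho(x, x'; tau) in the high-temperature form with a symmetrized potential."""
    pref = math.sqrt(m / (2.0 * math.pi * hbar ** 2 * tau))
    return [[pref * math.exp(-m * (x - y) ** 2 / (2.0 * hbar ** 2 * tau)
                             - 0.5 * tau * (potential(x) + potential(y)))
             for y in grid] for x in grid]


def convolve(rho_a, rho_b, dx):
    """Discretized integral over the intermediate coordinate x''."""
    cols = list(zip(*rho_b))
    return [[dx * sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in rho_a]


def partition_function(beta, potential, log2_beads=4, x_max=6.0, n_grid=61):
    """Q = integral of rho(x, x; beta) dx, built from P = 2**log2_beads factors."""
    dx = 2.0 * x_max / (n_grid - 1)
    grid = [-x_max + i * dx for i in range(n_grid)]
    rho = high_temperature_density_matrix(grid, beta / 2 ** log2_beads, potential)
    for _ in range(log2_beads):  # each squaring doubles the inverse temperature
        rho = convolve(rho, rho, dx)
    return dx * sum(rho[i][i] for i in range(n_grid))


if __name__ == "__main__":
    def harmonic(x):
        return 0.5 * x * x

    beta = 2.0
    print("P = 1 :", partition_function(beta, harmonic, log2_beads=0))
    print("P = 16:", partition_function(beta, harmonic, log2_beads=4))
    print("exact :", 1.0 / (2.0 * math.sinh(0.5 * beta)))
```

With a single bead the result is the classical PF $1/(\beta\hbar\omega)$; increasing P moves it towards the quantum value, which mirrors the statement that the approximation becomes exact as $P \to \infty$. For many particles the grid approach is impractical, and the polymer isomorphism is instead sampled by Monte Carlo or molecular dynamics.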

The eigenfunctions of a system of two particles are determined by their positions x and y, and the density
matrix is generalized to

\[ \rho(x, y; x', y'; \beta) = \sum_n \psi_n(x, y) \exp(-\beta E_n)\,\psi_n^*(x', y') \tag{A2.3.92} \]

with the PF given by

\[ Q(2, V, T) = \int\!\!\int dx\,dy\,\rho(x, y; x, y; \beta). \tag{A2.3.93} \]

In the presence of a potential function U(x, y), the density matrix in the high-temperature approximation has
the form

\[ \rho(x, y; x', y'; \beta/P) \approx \frac{mP}{2\pi\beta\hbar^2} \exp\left\{-\frac{mP}{2\beta\hbar^2}\bigl[(x - x')^2 + (y - y')^2\bigr] - \frac{\beta}{2P}\bigl[U(x, y) + U(x', y')\bigr]\right\}. \tag{A2.3.94} \]

Using this in the expression for the PF, one finds

\[ Q(2, V, T) \approx \left(\frac{mP}{2\pi\beta\hbar^2}\right)^{P} \int \cdots \int dx_1\,dy_1 \ldots dx_P\,dy_P \exp\left\{-\beta\left[\frac{\kappa}{2}\sum_{s=1}^{P}(x_{s+1} - x_s)^2 + \frac{\kappa}{2}\sum_{s=1}^{P}(y_{s+1} - y_s)^2 + P^{-1}\sum_{s=1}^{P} U(x_s, y_s)\right]\right\}. \tag{A2.3.95} \]

There is a separate cyclic polymer chain for each of the two particles, with the same force constant between
adjacent beads on each chain. The potential acting on each bead in a chain is reduced, as before, by a factor
1/P but interacts only with the corresponding bead of the same label in the second chain. The generalization to
many particles is straightforward, with one chain for each particle, each having the same number of beads
coupled by harmonic springs between adjacent beads and with interactions between beads of the same label
on different chains. This, however, is still an approximation as the exchange symmetry of the wavefunctions
of the system is ignored.

The invariance of the Hamiltonian to particle exchange requires the eigenfunctions to be symmetric or 
antisymmetric, depending on whether the particles are bosons or fermions. The density matrix for bosons and 
fermions must then be sums over the corresponding symmetric and anti-symmetric states, respectively. 
Important applications of path integral simulations are to mixed classical and quantum systems, e.g. an 
electron in argon. For further discussion, the reader is referred to the articles by Gillan, Chandler and 
Wolynes, Berne and Thirumalai, Alavi and Makri in the further reading section. 

We return to the study of classical systems in the remaining sections. 

A2.3.3.5 1D HARD RODS

This is an example of a classical non-ideal system for which the PF can be deduced exactly [13]. Consider N
hard rods of length d in a 1D extension of length L which takes the place of the volume in a three-dimensional
(3D) system. The canonical PF

\[ Q(N, L, T) = \frac{Z(N, L, T)}{N!\,\Lambda^N} \tag{A2.3.96} \]

where the configurational PF

\[ Z(N, L, T) = \int \cdots \int \exp\bigl(-\beta U_N(r^N)\bigr)\,dr_1 \ldots dr_N. \tag{A2.3.97} \]

For hard rods of length d

\[ u_{ij}(r_{ij}) = \begin{cases} \infty & r_{ij} < d \\ 0 & r_{ij} > d \end{cases} \tag{A2.3.98} \]

so that an exponential factor in the integrand is zero unless $r_{ij} > d$. Another restriction in 1D systems is that
since the particles are ordered in a line they cannot pass each other. Hence $d/2 < r_1 < L - Nd + d/2$,
$3d/2 < r_2 < L - Nd + 3d/2$ etc. Changing variables to $x_1 = r_1 - d/2$, $x_2 = r_2 - 3d/2$, ..., $x_N = r_N - (2N - 1)d/2$, we have

\[ Z(N, L, T) = \int \cdots \int_{0 < x_i < L - Nd} dx_1 \ldots dx_N = (L - Nd)^N. \tag{A2.3.99} \]

The pressure

\[ P = kT\left(\frac{\partial \ln Z}{\partial L}\right)_{N,T} = \frac{\rho kT}{1 - \rho d} \tag{A2.3.100} \]

where the density of rods $\rho = N/L$. This result is exact. Expanding the denominator (when $\rho d < 1$) leads to the
virial series for the pressure:

\[ P/\rho kT = 1 + d\rho + d^2\rho^2 + \cdots. \tag{A2.3.101} \]

The nth virial coefficient $B_n^{HR} = d^{\,n-1}$ is independent of the temperature. It is tempting to assume that the
pressure of hard spheres in three dimensions is given by a similar expression, with d replaced by the excluded
volume $b_0$, but this is clearly an approximation as shown by our previous discussion of the virial series for
hard spheres. This is the excluded volume correction used in van der Waals' equation, which is discussed
next. Other 1D models have been solved exactly in [14, 15 and 16].
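
The exact hard-rod results lend themselves to a quick numerical check. The sketch below, with illustrative function names and units in which kT = 1, compares the closed-form compressibility factor with partial sums of the virial series and recovers the pressure by differentiating $\ln Z = N\ln(L - Nd)$ numerically.

```python
import math


def hard_rod_z(rho, d):
    """Exact compressibility factor P/(rho k T) = 1/(1 - rho d) for hard rods."""
    if not 0.0 <= rho * d < 1.0:
        raise ValueError("require 0 <= rho*d < 1")
    return 1.0 / (1.0 - rho * d)


def hard_rod_virial_sum(rho, d, n_terms):
    """Partial sum of the virial series with B_n = d**(n-1), n = 1..n_terms."""
    return sum((rho * d) ** (n - 1) for n in range(1, n_terms + 1))


def pressure_from_z(n, length, d, kT=1.0, h=1e-6):
    """P = kT d(ln Z)/dL by central difference, with Z = (L - N d)**N."""
    def ln_z(L):
        return n * math.log(L - n * d)
    return kT * (ln_z(length + h) - ln_z(length - h)) / (2.0 * h)


if __name__ == "__main__":
    n, length, d = 50, 100.0, 1.0
    rho = n / length
    print(hard_rod_z(rho, d), hard_rod_virial_sum(rho, d, 10))
    print(pressure_from_z(n, length, d), rho * hard_rod_z(rho, d))
```

The virial series here is a geometric series, so its radius of convergence, $\rho d = 1$, coincides with close packing; this is a special property of the 1D model.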

A2.3.3.6 MEAN-FIELD THEORY— VAN DER WAALS' EQUATION 

van der Waals' equation corrects the ideal gas equation for the attractive and repulsive interactions between
molecules. The approximations employed are typical of mean-field theories. We consider a simple derivation,
assuming that the total potential energy $U_N(\mathbf{r}^N)$ of the N molecules is pairwise additive and can be divided
into a reference part and a remainder in any given spatial configuration $\{\mathbf{r}^N\}$. This corresponds roughly to
repulsive and attractive contributions, and

\[ U_N(\mathbf{r}^N) = U_N^{0}(\mathbf{r}^N) + U_N^{1}(\mathbf{r}^N) \tag{A2.3.102} \]

where $U_N^{0}(\mathbf{r}^N)$ is the energy of the reference system and

\[ U_N^{1}(\mathbf{r}^N) = \sum_{i<j} u_{ij}^{1}(r_{ij}) \tag{A2.3.103} \]

is the attractive component. This separation assumes the cross terms are zero. In the above sum, there are
N(N - 1)/2 terms that depend on the relative separation of the particles. The total attractive interaction of each
molecule with the rest is replaced by an approximate average interaction, expressed as

\[ U_N^{1}(\mathbf{r}^N) \approx \frac{N(N-1)}{2V^2} \int\!\!\int u^{1}(r_{12})\,d\mathbf{r}_1\,d\mathbf{r}_2 \approx -\frac{aN^2}{V} = -aN\rho \tag{A2.3.104} \]

where

\[ a = -\frac{1}{2} \int u^{1}(r_{12})\,4\pi r_{12}^2\,dr_{12}. \tag{A2.3.105} \]

The PF

\[ \begin{aligned} Z(N, V, T) &= \int \exp\bigl(-\beta U_N(\mathbf{r}^N)\bigr)\,d\mathbf{r}^N \\ &\approx \exp(\beta aN^2/V)\,Z^{0}(N, V, T) \end{aligned} \tag{A2.3.106} \]

in which $Z^{0}(N, V, T)$ is the configurational PF of the reference system. The Helmholtz free energy and
pressure follow from the fundamental equations for the canonical ensemble

\[ A = -kT \ln Q(N, V, T) = A^{0} - aN^2/V \tag{A2.3.107} \]

and

\[ P = kT\left(\frac{\partial \ln Z(N, V, T)}{\partial V}\right)_{N,T} = P^{0} - aN^2/V^2. \tag{A2.3.108} \]

Assuming a hard sphere reference system with the pressure given by

\[ P^{0} = NkT/(V - Nb) \tag{A2.3.109} \]

we immediately recover van der Waals' equation of state

\[ P = NkT/(V - Nb) - aN^2/V^2 \tag{A2.3.110} \]

since $\rho = N/V$. An improvement to van der Waals' equation would be to use a more accurate expression for
the hard sphere reference system, such as the CS equation of state discussed in the previous section. A more
complete derivation that includes the Maxwell equal area rule was given in [18].
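
The suggested improvement can be sketched by swapping the reference pressure $NkT/(V - Nb)$ for the Carnahan and Starling expression while keeping the same mean-field attraction $-a\rho^2$. The code below is an illustration under stated assumptions, not a formula given in the text: b is identified with the hard-sphere second virial coefficient $b_0 = 2\pi\sigma^3/3$, so that the packing fraction is $\eta = b\rho/4$, and the function names are hypothetical.

```python
def pressure_vdw(rho, T, a, b, k=1.0):
    """van der Waals pressure: rho k T/(1 - b rho) - a rho^2."""
    return rho * k * T / (1.0 - b * rho) - a * rho ** 2


def pressure_cs_mean_field(rho, T, a, b, k=1.0):
    """Mean-field pressure with a Carnahan-Starling hard-sphere reference.

    Assumes b equals b0 = 2*pi*sigma**3/3 (four times the sphere volume),
    so the packing fraction is eta = b*rho/4.
    """
    eta = b * rho / 4.0
    z_hs = (1.0 + eta + eta ** 2 - eta ** 3) / (1.0 - eta) ** 3
    return rho * k * T * z_hs - a * rho ** 2


if __name__ == "__main__":
    a, b, T = 1.0, 1.0, 1.0
    for rho in (0.01, 0.2, 0.5, 0.8):
        print(rho, pressure_vdw(rho, T, a, b), pressure_cs_mean_field(rho, T, a, b))
```

Both forms share the same second virial coefficient $b - a/kT$, so they agree at low density; they differ at high density, where the van der Waals reference term diverges at $b\rho = 1$, well below the packing fractions the CS expression can describe.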

A2.3.3.7 THE LAW OF CORRESPONDING STATES 

van der Waals' equation is one of several two-parameter, mean-field equations of state (e.g. the Redlich- 
Kwong equation) that obey the law of corresponding states. This is a scaling law in which the thermodynamic 
properties are expressed as functions of reduced variables defined in terms of the critical parameters of the 
system. 

A theoretical basis for the law of corresponding states can be demonstrated for substances with the same 
intermolecular potential energy function but with different parameters for each substance. Conversely, the 
experimental verification of the law implies that the underlying intermolecular potentials are essentially 
similar in form and can be transformed from substance to substance by scaling the potential energy 
parameters. The potentials are then said to be conformal. There are two main assumptions in the derivation: 

(a) quantum effects are neglected, i.e. classical statistics is assumed; 


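(b) the total potential is pairwise additive

$$U_N(\mathbf{r}^N) = \sum_{i<j} u_{ij}(r_{ij}) \tag{A2.3.111}$$

and characterized by two parameters: a well depth $\varepsilon$ and a size $\sigma$

$$u_{ij}(r_{ij}) = \varepsilon\, \phi(r_{ij}/\sigma). \tag{A2.3.112}$$

The configurational PF

$$\begin{aligned}
Z(N,V,T) &= \int \cdots \int \exp(-\beta U_N(\mathbf{r}^N))\, d\mathbf{r}_1\, d\mathbf{r}_2 \cdots d\mathbf{r}_N \\
&= \sigma^{3N} \int \cdots \int \exp\Big(-\frac{1}{T^*} \sum_{i<j} \phi(r^*_{ij})\Big)\, d\mathbf{r}^*_1 \cdots d\mathbf{r}^*_N = \sigma^{3N} Z^*(N,V^*,T^*)
\end{aligned} \tag{A2.3.113}$$

where $\mathbf{r}^* = \mathbf{r}/\sigma$, $V^* = V/\sigma^3$, $T^* = kT/\varepsilon$ and $Z^*(N,V^*,T^*)$ is the integral in the second line.

The excess Helmholtz free energy

$$A^{ex} = -kT \ln(Z(N,V,T)/V^N) = -kT \ln(Z^*(N,V^*,T^*)/V^{*N}) = -NkT \ln[\rho^* f(T^*,\rho^*)] \tag{A2.3.114}$$

where $\rho^* = \rho\sigma^3$, $\rho = N/V$ and

$$f(T^*,\rho^*) = [Z^*(N,V^*,T^*)]^{1/N}/N. \tag{A2.3.115}$$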

It follows that atoms or molecules interacting with the same pair potential $\varepsilon\phi(r_{ij}/\sigma)$, but with different $\varepsilon$ and $\sigma$,
have the same thermodynamic properties, derived from $A^{ex}/NkT$, at the same scaled temperature $T^*$ and scaled
density $\rho^*$. They obey the same scaled equation of state, with identical coexistence curves in scaled variables
below the critical point, and have the same scaled vapour pressures and second virial coefficients as a function
of the scaled temperature. The critical compressibility factor $P_cV_c/RT_c$ is the same for all substances obeying
this law and provides a test of the hypothesis. Table A2.3.3 lists the critical parameters and the
compressibility factors of rare gases and other simple substances.

Table A2.3.3 Critical constants.

Substance   T_c (K)   P_c (atm)   V_c (cm^3 mol^-1)   P_cV_c/RT_c

He          5.21      2.26        57.76               0.305
Ne          44.44     26.86       41.74               0.307
Ar          150.7     48          75.2                0.292
Kr          209.4     54.3        92.2                0.291
Xe          289.8     58.0        118.8               0.290
CH4         190.6     45.6        98.7                0.288
H2          33.2      12.8        65.0                0.305
N2          126.3     33.5        90.1                0.292
O2          154.8     50.1        78.0                0.308
CO2         304.2     72.9        94.0                0.274
NH3         405.5     111.3       72.5                0.242
H2O         647.4     218.3       55.3                0.227

The compressibility factor $Z_c = P_cV_c/RT_c$ of the rare gases Ar, Kr and Xe at the critical point is nearly 0.291,
and they are expected to obey a law of corresponding states, but one very different from the prediction of van
der Waals' equation discussed earlier, for which the compressibility factor at the critical point is 0.375.
Deviations of $Z_c$ from 0.291 for the other substances listed in the table are small, except for CO$_2$, NH$_3$ and
H$_2$O, for which the molecular charge distributions contribute significantly to the intermolecular potential and
lead to deviations from the law of corresponding states. The effect of hydrogen bonding in water contributes
to its anomalous properties; this is mimicked by the charge distribution in the SPC/E or other models
discussed in section A2.3.2. The pair potentials of all the substances listed in the table, except CO$_2$, NH$_3$ and
H$_2$O, are fairly well represented by the Lennard-Jones potential (see table A2.3.1). The lighter substances, He,
H$_2$ and to some extent Ne, show deviations due to quantum effects. The rotational PF of water in the vapour
phase also has a significant contribution from this source.
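
The last column of table A2.3.3 can be recomputed from the other three. The sketch below does this for the rare gases, using the critical constants listed in the table and the gas constant in units of cm^3 atm mol^-1 K^-1. The dictionary layout and function names are illustrative assumptions.

```python
R = 82.0574  # gas constant in cm^3 atm mol^-1 K^-1

# (T_c / K, P_c / atm, V_c / cm^3 mol^-1) for the rare gases of table A2.3.3
CRITICAL_CONSTANTS = {
    "He": (5.21, 2.26, 57.76),
    "Ne": (44.44, 26.86, 41.74),
    "Ar": (150.7, 48.0, 75.2),
    "Kr": (209.4, 54.3, 92.2),
    "Xe": (289.8, 58.0, 118.8),
}


def critical_compressibility(t_c, p_c, v_c):
    """Z_c = P_c V_c / (R T_c)."""
    return p_c * v_c / (R * t_c)


def reduced_state(substance, temperature, pressure):
    """Reduced variables T_R = T/T_c and P_R = P/P_c for a tabulated substance."""
    t_c, p_c, _ = CRITICAL_CONSTANTS[substance]
    return temperature / t_c, pressure / p_c


if __name__ == "__main__":
    for name, constants in CRITICAL_CONSTANTS.items():
        print(name, round(critical_compressibility(*constants), 3))
```

Two substances at the same reduced state, as returned by `reduced_state`, are expected to have approximately the same value of $P/\rho kT$ when the law of corresponding states holds.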

The equation of state determined by $Z^*(N,V^*,T^*)$ is not known in the sense that it cannot be written down as
a simple expression. However, the critical parameters depend on $\varepsilon$ and $\sigma$, and a test of the law of
corresponding states is to use the reduced variables $T_R$, $P_R$ and $V_R$ as the scaled variables in the equation of
state. Figure A2.3.5(b) illustrates this for the liquid-gas coexistence curves of several substances. As first
shown by Guggenheim [19], the curvature near the critical point is consistent with a critical exponent $\beta$ closer
to 1/3 rather than the 1/2 predicted by van der Waals' equation. This provides additional evidence that the law
of corresponding states obeyed is not the form associated with van der Waals' equation. Figure A2.3.5(a)
shows that $P/\rho kT$ is approximately the same function of the reduced variables $T_R$ and $P_R$

$$P/\rho kT = f(T_R, P_R) \tag{A2.3.116}$$

for several substances. 

Figure A2.3.6 illustrates the corresponding states principle for the reduced vapour pressure $P_R$ and the second
virial coefficient as functions of the reduced temperature $T_R$ showing that the law of corresponding states is
obeyed approximately by the substances indicated in the figures. The usefulness of the law also lies in its
predictive value.

Figure A2.3.5 (a) $P/\rho kT$ as a function of the reduced variables $T_R$ and $P_R$ and (b) coexisting liquid and
vapour densities in reduced units $\rho_R$ as a function of $T_R$ for several substances (after [19]).

Figure A2.3.6 (a) Reduced second virial coefficient $B_2/V_c$ as a function of $T_R$ and (b) $\ln P_R$ versus $1/T_R$ for
several substances (after [19]).

A2.3.4 CORRELATION FUNCTIONS OF SIMPLE FLUIDS

The correlation functions provide an alternate route to the equilibrium properties of classical fluids. In 
particular, the two-particle correlation function of a system with a pairwise additive potential determines all of 
its thermodynamic properties. It also determines the compressibility of systems with even more complex 
three-body and higher-order interactions. The pair correlation functions are easier to approximate than the PFs 
to which they are related; they can also be obtained, in principle, from x-ray or neutron diffraction 
experiments. This provides a useful perspective of fluid structure, and enables Hamiltonian models and 
approximations for the equilibrium structure of fluids and solutions to be tested by direct comparison with the 
experimentally determined correlation functions. We discuss the basic relations for the correlation functions 
in the canonical and grand canonical ensembles before considering applications to model systems. 

A2.3.4.1 CANONICAL ENSEMBLE 

The probability of observing the configuration $\{\mathbf{r}^N\}$ in a system of given $N$, $V$ and $T$ is

$$P(\mathbf{r}^N) = \frac{\exp(-\beta U(\mathbf{r}^N))}{Z(N,V,T)} \tag{A2.3.117}$$

where $Z(N,V,T)$ is the configurational PF and

$$\int P(\mathbf{r}^N)\, d\mathbf{r}^N = 1. \tag{A2.3.118}$$


The probability function $P(\mathbf{r}^N)$ cannot be factored into contributions from individual particles, since they are
coupled by their interactions. However, integration over the coordinates of all but a few particles leads to
reduced probability functions containing the information necessary to calculate the equilibrium properties of
the system.

Integration over the coordinates of all but one particle provides the one-particle correlation function:

$$\rho_N^{(1)}(\mathbf{r}_1) = N \int \cdots \int P(\mathbf{r}^N)\, d\mathbf{r}_2 \cdots d\mathbf{r}_N = \Big\langle \sum_{i=1}^{N} \delta(\mathbf{r}_1 - \mathbf{r}'_i) \Big\rangle_{CE} \tag{A2.3.119}$$

where $\langle D \rangle_{CE}$ denotes the average value of a dynamical variable $D$ in the canonical ensemble and the primed
coordinates are the particle positions. Likewise, the two- and $n$-particle reduced correlation functions are defined by

$$\rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2) = N(N-1) \int \cdots \int P(\mathbf{r}^N)\, d\mathbf{r}_3 \cdots d\mathbf{r}_N \tag{A2.3.120}$$

$$\rho_N^{(n)}(\mathbf{r}_1, \ldots, \mathbf{r}_n) = \frac{N!}{(N-n)!} \int \cdots \int P(\mathbf{r}^N)\, d\mathbf{r}_{n+1} \cdots d\mathbf{r}_N \tag{A2.3.121}$$

where $n$ is an integer. Integrating these functions over the coordinates $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n$ gives the normalization
factor $N!/(N-n)!$. In particular,

$$\int\!\!\int \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2)\, d\mathbf{r}_1\, d\mathbf{r}_2 = N(N-1). \tag{A2.3.122}$$


For an isotropic fluid, the one-particle correlation function is independent of the position and

$$\rho_N^{(1)}(\mathbf{r}_1) = N/V = \rho \tag{A2.3.123}$$

which is the fluid density. The two-particle correlation function depends on the relative separation between
particles. Assuming no angular dependence in the pair interaction,

$$\rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2) = \rho_N^{(2)}(r) = \rho^2 g(r) \tag{A2.3.124}$$

where $r = |\mathbf{r}_{12}|$. The second relation defines the radial distribution function $g(r)$ which measures the
correlation between the particles in the fluid at a separation $r$. Thus, we could regard $\rho g(r)$ as the average
density of particles at $\mathbf{r}_{12}$, given that there is one at the origin $\mathbf{r}_1$. Since the fluid is isotropic, the average local
density in a shell of radius $r$ and thickness $\Delta r$ around each particle is independent of the direction, and

$$g(r) = \frac{\langle N(r, \Delta r) \rangle}{[(N-1)/V]\, V_{shell}} \tag{A2.3.125}$$

where $V_{shell} = 4\pi r^2 \Delta r$ is the volume of the shell and $\langle N(r, \Delta r) \rangle$ is the average number of particles in the shell.
We see that the pair distribution function $g(r)$ is the ratio of the average number $\langle N(r, \Delta r) \rangle$ in the shell of
radius $r$ and thickness $\Delta r$ to the number that would be present if there were no particle interactions. At large
distances, this interaction is zero and $g(r) \to 1$ as $r \to \infty$. At very small separations, the strong repulsion
between real atoms (the Pauli exclusion principle working again) reduces the number of particles in the shell,
and $g(r) \to 0$ as $r \to 0$. For hard spheres of diameter $\sigma$, $g(r)$ is exactly zero for $r < \sigma$. Figure A2.3.7 illustrates
the radial distribution function $g(r)$ of argon, a typical monatomic fluid, determined by a molecular dynamics
(MD) simulation using the Lennard-Jones potential for argon at $T^* = 0.72$ and $\rho^* = 0.84$, and figure A2.3.8
shows the corresponding atom-atom radial distribution functions $g_{OO}(r)$, $g_{OH}(r)$ and $g_{HH}(r)$ of the SPC/E
model for water at 25 °C also determined by MD simulations. The correlation functions are in fair agreement
with the experimental results obtained by x-ray and neutron diffraction.
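
The shell-counting definition (A2.3.125) translates directly into a histogram over pair distances. The sketch below is one way to do this for a single configuration in a cubic periodic box; the function name, the minimum-image treatment of the periodic boundaries and the use of the exact shell volume $(4\pi/3)(r_{out}^3 - r_{in}^3)$ instead of $4\pi r^2 \Delta r$ are choices made for this illustration, not details taken from the text. In practice the result would be averaged over many configurations.

```python
import numpy as np


def radial_distribution(positions, box, r_max, n_bins):
    """Estimate g(r) by shell counting, following (A2.3.125).

    positions: (N, 3) array in a cubic periodic box of side `box`;
    r_max should not exceed box / 2.
    """
    n = len(positions)
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts = np.zeros(n_bins)
    for i in range(n - 1):
        d = positions[i + 1:] - positions[i]
        d -= box * np.round(d / box)            # minimum-image convention
        r = np.sqrt((d * d).sum(axis=1))
        counts += np.histogram(r, bins=edges)[0]
    # Each pair was counted once, so the average number of neighbours
    # per particle in each shell is 2 * counts / N.
    shell_volume = 4.0 * np.pi / 3.0 * (edges[1:] ** 3 - edges[:-1] ** 3)
    ideal = (n - 1) / box**3 * shell_volume     # uncorrelated particles
    r_mid = 0.5 * (edges[1:] + edges[:-1])
    return r_mid, 2.0 * counts / n / ideal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    box = 10.0
    ideal_gas = rng.random((500, 3)) * box      # non-interacting particles
    r, g = radial_distribution(ideal_gas, box, r_max=5.0, n_bins=25)
    print(np.round(g[-5:], 2))                  # close to 1 for an ideal gas
```

For non-interacting particles the estimate fluctuates around 1 at all separations, which is the large-distance limit described above.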



Figure A2.3.7 The radial distribution function $g(r)$ of a Lennard-Jones fluid representing argon at $T^* = 0.72$
and $\rho^* = 0.844$ determined by computer simulations using the Lennard-Jones potential.

Figure A2.3.8 Atom-atom distribution functions $g_{OO}(r)$, $g_{OH}(r)$ and $g_{HH}(r)$ for water at 25 °C determined
by MD simulations using the SPC/E model. Curves are from the leftmost peak: $g_{OH}$, $g_{HH}$, $g_{OO}$ are red, green,
blue, respectively.

Between the limits of small and large $r$, the pair distribution function $g(r)$ of a monatomic fluid is determined
by the direct interaction between the two particles, and by the indirect interaction between the same two
particles through other particles. At low densities, it is only the direct interaction that operates through the
Boltzmann distribution and

$$g(r) \approx \exp(-\beta u(r)) \quad \text{(low-density approximation)}. \tag{A2.3.126}$$

At higher densities, the effect of indirect interactions is represented by the cavity function $y(r, \rho, T)$, which
multiplies the Boltzmann distribution

$$g(r) = \exp(-\beta u(r))\, y(r, \rho, T). \tag{A2.3.127}$$

Here $y(r, \rho, T) \to 1$ as $\rho \to 0$, and it is a continuous function of $r$ even for hard spheres at $r = \sigma$, the diameter of the
spheres. It has a density expansion similar to the virial series:

$$y(r, \rho, T) = 1 + \sum_{n=1}^{\infty} y_n(r, T)\, \rho^n. \tag{A2.3.128}$$

The coefficient of $\rho$ is the convolution integral of Mayer $f$-functions:

$$y_1(r, T) = \int f_{13}(r_{13})\, f_{32}(r_{32})\, d\mathbf{r}_3. \tag{A2.3.129}$$

In the graphical representation of this integral, a line represents the Mayer function $f_{ij}(r_{ij})$ between
two particles $i$ and $j$. The coordinates are represented by open circles that are labelled, unless it is integrated
over the volume of the system, when the circle representing it is blackened and the label erased. The black
circle in the graph for $y_1$ represents an integration over the coordinates of particle 3, and is not labelled. The
coefficient of $\rho^2$ is the sum of three terms represented graphically as

[Graphical expansion of $y_2(r, T)$; diagrams not reproduced] (A2.3.130)

In general, each graph contributing to $y_n(r, T)$ has $n$ black circles representing the number of particles through
which the indirect interaction occurs; this is weighted by the $n$th power of the density in the expression for
$g(r)$. This observation, and the symmetry number of a graph, can be used to further simplify the graphical
notation, but this is beyond the scope of this article. The calculation or accurate approximation of the cavity
function are important problems in the correlation function theory of non-ideal fluids.

For hard spheres, the coefficients $y_n(r)$ are independent of temperature because the Mayer $f$-functions, in
terms of which they can be expressed, are temperature independent. The calculation of the leading term $y_1(r)$
is simple, but the determination of the remaining terms increases in complexity for larger $n$. Recalling that the
Mayer $f$-function for hard spheres of diameter $\sigma$ is $-1$ when $r < \sigma$, and zero otherwise, it follows that $y_1(r, T)$
is zero for $r > 2\sigma$. For $r < 2\sigma$, it is just the overlap volume of two spheres of radius $\sigma$ and a simple calculation
shows that

$$y_1(r) = \frac{4\pi\sigma^3}{3} \left[ 1 - \frac{3}{4}\left(\frac{r}{\sigma}\right) + \frac{1}{16}\left(\frac{r}{\sigma}\right)^3 \right], \quad r < 2\sigma. \tag{A2.3.131}$$
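
The closed form (A2.3.131) can be checked against a direct numerical estimate of the convolution integral (A2.3.129). The sketch below samples the position of particle 3 uniformly in a cube enclosing the sphere around particle 1 and counts the points that overlap both particles, where the product $f_{13}f_{32}$ equals 1. The sampling scheme, sample size and function names are assumptions made for this illustration.

```python
import numpy as np


def y1_hard_sphere(r, sigma=1.0):
    """Closed form (A2.3.131): overlap volume of two spheres of radius sigma."""
    x = np.asarray(r, dtype=float) / sigma
    value = 4.0 * np.pi * sigma**3 / 3.0 * (1.0 - 0.75 * x + x**3 / 16.0)
    return np.where(x < 2.0, value, 0.0)


def y1_monte_carlo(r, sigma=1.0, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the integral of f_13 f_32 over r_3."""
    rng = np.random.default_rng(seed)
    # Particle 1 sits at the origin and particle 2 on the x axis at distance r.
    # r_3 is sampled in the cube of side 2*sigma centred on particle 1,
    # which contains every point where f_13 is non-zero.
    p3 = rng.uniform(-sigma, sigma, size=(n_samples, 3))
    p2 = np.array([r, 0.0, 0.0])
    overlaps_1 = (p3**2).sum(axis=1) < sigma**2
    overlaps_2 = ((p3 - p2) ** 2).sum(axis=1) < sigma**2
    # f_13 f_32 = (-1)(-1) = 1 where both overlaps occur, 0 elsewhere.
    return (2.0 * sigma) ** 3 * np.mean(overlaps_1 & overlaps_2)


if __name__ == "__main__":
    for r in (0.5, 1.0, 1.5, 2.5):
        print(r, float(y1_hard_sphere(r)), y1_monte_carlo(r))
```

The two estimates agree within the statistical error of the sampling, and both vanish once the separation exceeds $2\sigma$.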


This leads to the third virial coefficient for hard spheres. In general, the $n$th virial coefficient of pairwise
additive potentials is related to the coefficient $y_{n-2}(r, T)$ in the expansion of $g(r)$, except for Coulombic systems
for which the virial coefficients diverge and special techniques are necessary to resum the series.

The pair correlation function has a simple physical interpretation in terms of the potential of mean force between two
particles separated by a distance $r$

$$w(r) = -kT \ln g(r) = u(r) - kT \ln y(r). \tag{A2.3.132}$$

As $\rho \to 0$, $y(r) \to 1$ and $w(r) \to u(r)$. At higher densities, however, $w(r) \neq u(r)$. To understand its significance,
consider the mean force between two fixed particles at $\mathbf{r}_1$ and $\mathbf{r}_2$, separated by the distance $r = |\mathbf{r}_1 - \mathbf{r}_2|$. The
mean force on particle 1 is the force averaged over the positions and orientations of all other particles, and is
given by

$$\begin{aligned}
-\langle \nabla_1 U_N \rangle &= \frac{-\int \cdots \int \nabla_1 U_N(\mathbf{r}^N) \exp(-\beta U_N(\mathbf{r}^N))\, d\mathbf{r}_3 \cdots d\mathbf{r}_N}{\int \cdots \int \exp(-\beta U_N(\mathbf{r}^N))\, d\mathbf{r}_3 \cdots d\mathbf{r}_N} \\
&= kT\, \nabla_1 \ln \left[ \frac{N(N-1)}{Z(N,V,T)} \int \cdots \int \exp(-\beta U_N(\mathbf{r}^N))\, d\mathbf{r}_3 \cdots d\mathbf{r}_N \right] \\
&= kT\, \nabla_1 \ln \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2) = kT\, \nabla_1 \ln g(r) = -\nabla_1 w(r)
\end{aligned} \tag{A2.3.133}$$

where we have used the definition of the two-particle correlation function, $\rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2)$, and its representation
as $\rho^2 g(r)$ for an isotropic fluid in the last two steps. It is clear that the negative of the gradient of $w(r)$ is the
force on the fixed particles, averaged over the motions of the others. This explains its characterization as the
potential of mean force.
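
To see how $w(r)$ can differ from $u(r)$, the sketch below evaluates the potential of mean force for hard spheres with the cavity function truncated at first order in the density, $y(r) \approx 1 + \rho\, y_1(r)$, using the closed form for $y_1(r)$ given in (A2.3.131). The truncation is an assumption that is only reasonable at low density, and the function names are illustrative.

```python
import math


def y1_hard_sphere(r, sigma=1.0):
    """Overlap volume of two spheres of radius sigma, (A2.3.131)."""
    x = r / sigma
    if x >= 2.0:
        return 0.0
    return 4.0 * math.pi * sigma**3 / 3.0 * (1.0 - 0.75 * x + x**3 / 16.0)


def beta_w_hard_sphere(r, rho, sigma=1.0):
    """beta*w(r) = beta*u(r) - ln y(r), with y truncated at first order in rho."""
    if r < sigma:
        return math.inf                     # hard-core overlap: u(r) is infinite
    return -math.log(1.0 + rho * y1_hard_sphere(r, sigma))


if __name__ == "__main__":
    rho = 0.1                               # reduced density rho * sigma^3
    for r in (1.0, 1.25, 1.5, 1.75, 2.0):
        print(r, beta_w_hard_sphere(r, rho))
```

Although the bare hard-sphere potential vanishes for $r > \sigma$, the potential of mean force in this approximation is negative for $\sigma < r < 2\sigma$: the surrounding particles produce an effective attraction between the fixed pair.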

The concept of the potential of mean force can be extended to mixtures and solutions. Consider two ions in a
sea of water molecules at fixed temperature $T$ and solvent density $\rho$. The potential of mean force $w_{ij}(r)$ is the
direct interaction between the ions $u_{ij}(r) = u^*_{ij}(r) + q_i q_j/r$, plus the interaction between the ions through water
molecules which is $-kT \ln y_{ij}(r)$. Here $u^*_{ij}(r)$ is the short-range potential and $q_i q_j/r$ is the Coulombic potential
between ions. Thus,

$$w_{ij}(r) = u^*_{ij}(r) + q_i q_j/r - kT \ln y_{ij}(r). \tag{A2.3.134}$$

At large distances, $u^*_{ij}(r) \to 0$ and $w_{ij}(r) \to q_i q_j/\varepsilon r$ where $\varepsilon$ is the macroscopic dielectric constant of the solvent.
This shows that the dielectric constant $\varepsilon$ of a polar solvent is related to the cavity function for two ions at large
separations. One could extend this concept to define a local dielectric constant $\varepsilon(r)$ for the interaction between
two ions at small separations.

The direct correlation function $c(r)$ of a homogeneous fluid is related to the pair correlation function through
the Ornstein-Zernike relation

$$h(r_{12}) = c(r_{12}) + \rho \int c(r_{13})\, h(r_{32})\, d\mathbf{r}_3 \tag{A2.3.135}$$

where $h(r) = g(r) - 1$ differs from the pair correlation function only by a constant term. $h(r) \to 0$ as $r \to \infty$ and
is equal to $-1$ in the limit of $r = 0$. For hard spheres of diameter $\sigma$, $h(r) = -1$ inside the hard core, i.e. $r < \sigma$.
The function $c(r)$ has the range of the intermolecular potential $u(r)$, and is generally easier to approximate.
Figure A2.3.9 shows plots of $g(r)$ and $c(r)$ for a Lennard-Jones fluid at the triple point $T^* = 0.72$, $\rho^* = 0.84$,
compared to $\beta u(r) = \phi(r/\sigma)/T^*$.

Figure A2.3.9 Plots of $g(r)$ and $c(r)$ versus $r$ for a Lennard-Jones fluid at $T^* = 0.72$, $\rho^* = 0.84$, compared to
$\beta u(r)$.

The second term in the Ornstein-Zernike equation is a convolution integral. Substituting for $h(r)$ in the
integrand, followed by repeated iteration, shows that $h(r)$ is the sum of convolutions of $c$-functions or 'bonds'
containing one or more $c$-functions in series. Representing each $c(r)$ graphically as a bond between two circles, we see that

$$h(r_{12}) = c(r_{12}) + \rho \int c(r_{13})\, c(r_{32})\, d\mathbf{r}_3 + \rho^2 \int\!\!\int c(r_{13})\, c(r_{34})\, c(r_{42})\, d\mathbf{r}_3\, d\mathbf{r}_4 + \cdots \tag{A2.3.136}$$

in which each term corresponds to a chain diagram of $c$-bonds.
$h(r) - c(r)$ is the sum of series diagrams of $c$-bonds, with black circles signifying integration over the
coordinates. It represents only part of the indirect interactions between two particles through other particles.
The remaining indirect interactions cannot be represented as series diagrams and are called bridge diagrams.
We now state, without proof, that the logarithm of the cavity function

$$\ln y(r) = h(r) - c(r) + B(r) \tag{A2.3.137}$$

where the bridge diagram $B(r)$ has the $f$-bond density expansion

$$B(r_{12}) = \frac{\rho^2}{2} \int\!\!\int f_{13}\, f_{14}\, f_{23}\, f_{24}\, f_{34}\, d\mathbf{r}_3\, d\mathbf{r}_4 + \cdots \tag{A2.3.138}$$

Only the first term in this expansion is shown. It is identical to the last term shown in the equation for $y_2(r)$,
which is the coefficient of $\rho^2$ in the expansion of the cavity function $y(r)$.

It follows that the exact expression for the pair correlation function is

$$g(r) = h(r) + 1 = \exp(-\beta u(r) + h(r) - c(r) + B(r)). \tag{A2.3.139}$$


Combining this with the Ornstein-Zernike equation, we have two equations and three unknowns h(r),c(r) and 
B(r) for a given pair potential u(r). The problem then is to calculate or approximate the bridge functions for 
which there is no simple general relation, although some progress for particular classes of systems has been 
made recently. 

The thermodynamic properties of a fluid can be calculated from the two-, three- and higher-order correlation
functions. Fortunately, only the two-body correlation functions are required for systems with pairwise additive
potentials, which means that for such systems we need only a theory at the level of the two-particle
correlations. The average value of the total energy

$$\langle E \rangle = \langle KE \rangle + \langle E^{int} \rangle + \langle U_N \rangle \tag{A2.3.140}$$

where the translational kinetic energy $\langle KE \rangle = (3/2)NkT$ is determined by the equipartition theorem. The
rotational, vibrational and electronic contributions to $\langle E^{int} \rangle$ are separable and determined classically or
quantum mechanically. The average potential energy

$$\langle U_N \rangle = \frac{\int \cdots \int U_N(\mathbf{r}^N) \exp(-\beta U_N(\mathbf{r}^N))\, d\mathbf{r}^N}{Z(N,V,T)}. \tag{A2.3.141}$$

For a pairwise additive potential, each term in the sum of pair potentials gives the same result in the above
expression and there are $N(N-1)/2$ such terms. It follows that

$$\begin{aligned}
\langle U_N \rangle &= \frac{N(N-1)}{2\, Z(N,V,T)} \int \cdots \int u(r_{12}) \exp(-\beta U_N(\mathbf{r}^N))\, d\mathbf{r}^N \\
&= \frac{1}{2} \int\!\!\int u(r_{12})\, \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2)\, d\mathbf{r}_1\, d\mathbf{r}_2.
\end{aligned} \tag{A2.3.142}$$

For a fluid $\rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2) = \rho^2 g(r_{12})$ where the number density $\rho = N/V$. Substituting this in the integral, changing
to relative coordinates with respect to particle 1 as the origin, and integrating over $\mathbf{r}_1$ to give $V$, leads to

$$\frac{\langle U_N \rangle}{N} = \frac{\rho}{2} \int u(r_{12})\, g(r_{12})\, d\mathbf{r}_{12}. \tag{A2.3.143}$$

The pressure follows from the virial theorem, or from the characteristic thermodynamic equation and the PF.
It is given by

$$\frac{P}{\rho kT} = 1 - \frac{\rho}{6kT} \int r_{12}\, \frac{\partial u(r_{12})}{\partial r_{12}}\, g(r_{12})\, d\mathbf{r}_{12} \tag{A2.3.144}$$

which is called the virial equation for the pressure. The first term is the ideal gas contribution to the pressure
and the second is the non-ideal contribution. Inserting $g(r) = \exp(-\beta u(r))\, y(r)$ and the density expansion of the
cavity function $y(r)$ into this equation leads to the virial expansion for the pressure. The $n$th virial coefficient,
$B_n(T)$, given in terms of the coefficients $y_n(r, T)$ in the density expansion of the cavity function is

$$B_n(T) = -\frac{1}{6kT} \int r_{12}\, \frac{\partial u(r_{12})}{\partial r_{12}} \exp(-\beta u(r_{12}))\, y_{n-2}(r_{12}, T)\, d\mathbf{r}_{12}. \tag{A2.3.145}$$
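
For $n = 2$ the coefficient $y_0 = 1$, and (A2.3.145) gives the second virial coefficient directly from the pair potential. The sketch below evaluates it for the Lennard-Jones potential in reduced units and compares it with the equivalent Mayer-function form $B_2 = -\frac{1}{2}\int f(r)\, d\mathbf{r}$, which follows from (A2.3.145) on integrating by parts. The grid, the cut-off `r_max` and the function names are assumptions made for this illustration.

```python
import numpy as np


def trapezoid(y, x):
    """Composite trapezoidal rule on a grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def lj(r):
    """Reduced Lennard-Jones potential u/epsilon, with r in units of sigma."""
    return 4.0 * (r**-12 - r**-6)


def lj_derivative(r):
    """Derivative of the reduced Lennard-Jones potential with respect to r."""
    return 4.0 * (-12.0 * r**-13 + 6.0 * r**-7)


def b2_virial_route(t_star, r_max=50.0, n=400_001):
    """B_2 / sigma^3 from (A2.3.145) with n = 2 and y_0 = 1."""
    r = np.linspace(1e-3, r_max, n)
    integrand = r * lj_derivative(r) * np.exp(-lj(r) / t_star) * 4.0 * np.pi * r**2
    return -trapezoid(integrand, r) / (6.0 * t_star)


def b2_mayer_route(t_star, r_max=50.0, n=400_001):
    """B_2 / sigma^3 = -(1/2) * integral of the Mayer f-function."""
    r = np.linspace(1e-3, r_max, n)
    f = np.exp(-lj(r) / t_star) - 1.0
    return -0.5 * trapezoid(f * 4.0 * np.pi * r**2, r)


if __name__ == "__main__":
    for t_star in (1.0, 2.0, 5.0, 10.0):
        print(t_star, b2_virial_route(t_star), b2_mayer_route(t_star))
```

The two routes agree up to the small boundary term left by truncating the integrals at `r_max`, and the coefficient changes sign from negative to positive as the reduced temperature increases.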


The virial pressure equation for hard spheres has a simple form determined by the density $\rho$, the hard sphere
diameter $\sigma$ and the distribution function at contact $g(\sigma+)$. The derivative of the hard sphere potential is
discontinuous at $r = \sigma$, and

$$\exp(-\beta u(r)) = \begin{cases} 0, & r < \sigma \\ 1, & r > \sigma \end{cases}$$

is a step function. The derivative of this with respect to $r$ is a delta function

$$\frac{d}{dr} \exp(-\beta u(r)) = -\beta\, \frac{\partial u(r)}{\partial r} \exp(-\beta u(r)) = \delta(r - \sigma+)$$

and it follows that

$$-\beta \int r\, \frac{\partial u(r)}{\partial r}\, g(r)\, d\mathbf{r} = \int_0^\infty r\, \delta(r - \sigma+)\, y(r)\, 4\pi r^2\, dr = 4\pi\sigma^3 y(\sigma).$$

Inserting this expression in the virial pressure equation, we find that

$$\frac{P}{\rho kT} = 1 + \frac{2\pi}{3}\, \rho\sigma^3 g(\sigma+) \tag{A2.3.146}$$

where we have used the fact that $y(\sigma) = g(\sigma+)$ for hard spheres. The virial coefficients of hard spheres are thus
also related to the contact values of the coefficients $y_n(\sigma)$ in the density expansion of the cavity function. For
example, the expression $y_1(r)$ for hard spheres given earlier leads to the third virial coefficient $B_3 = 5b^2/8$, where $b = 2\pi\sigma^3/3$.
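
The arithmetic behind the hard-sphere third virial coefficient can be written out explicitly. Inserting $g(\sigma+) = y(\sigma) = 1 + y_1(\sigma)\rho + \cdots$ into (A2.3.146) gives $B_2 = b$ and $B_3 = b\, y_1(\sigma)$ with $b = 2\pi\sigma^3/3$; the sketch below evaluates both from the contact value of (A2.3.131). The function names are illustrative.

```python
import math


def y1_contact(sigma=1.0):
    """y_1 at r = sigma from (A2.3.131): (4 pi sigma^3 / 3) * (1 - 3/4 + 1/16)."""
    return 4.0 * math.pi * sigma**3 / 3.0 * (1.0 - 0.75 + 1.0 / 16.0)


def hard_sphere_virials(sigma=1.0):
    """Second and third virial coefficients from the contact values of y_0 and y_1."""
    b = 2.0 * math.pi * sigma**3 / 3.0
    b2 = b                       # coefficient of rho, since y_0 = 1
    b3 = b * y1_contact(sigma)   # coefficient of rho^2
    return b2, b3


def compressibility_factor(rho, sigma=1.0):
    """P/(rho kT) truncated after the third virial coefficient."""
    b2, b3 = hard_sphere_virials(sigma)
    return 1.0 + b2 * rho + b3 * rho**2


if __name__ == "__main__":
    b2, b3 = hard_sphere_virials()
    print("B3 / B2^2 =", b3 / b2**2)
```

The ratio $B_3/B_2^2$ evaluates to 5/8, independent of the sphere diameter.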
We conclude this section by discussing an expression for the excess chemical potential in terms of the pair
correlation function and a parameter $\lambda$, which couples the interactions of one particle with the rest. The idea
of a coupling parameter was introduced by Onsager [20] and Kirkwood [21]. The choice of $\lambda$ depends on the
system considered. In an electrolyte solution it could be the charge, but in general it is some variable that
characterizes the pair potential. The potential energy of the system

$$U_N(\mathbf{r}^N; \lambda) = U_{N-1}(\mathbf{r}^{N-1}) + \lambda \sum_{j=2}^{N} u_{1j}(r_{1j}) \tag{A2.3.147}$$

where the particle at $\mathbf{r}_1$ couples with the remaining $N-1$ particles and $0 \le \lambda \le 1$. The configurational PF for
this system

$$Z(N,V,T; \lambda) = \int \cdots \int \exp(-\beta U_N(\mathbf{r}^N; \lambda))\, d\mathbf{r}^N. \tag{A2.3.148}$$

When $\lambda = 1$, we recover the PF $Z(N,V,T)$ for the fully coupled system. In the opposite limit of $\lambda = 0$,
$Z(N,V,T; 0) = VZ(N-1,V,T)$, where $Z(N-1,V,T)$ refers to a fully coupled system of $N-1$ particles. Our previous
discussion of the chemical potential showed that the excess chemical potential is related to the logarithm of
the ratio $Z(N,V,T)/VZ(N-1,V,T)$ for large $N$:

$$\mu^{ex} = -kT \ln[Z(N,V,T)/VZ(N-1,V,T)] = -kT \int_0^1 \frac{\partial \ln Z(N,V,T; \lambda)}{\partial \lambda}\, d\lambda. \tag{A2.3.149}$$

The integral is easily simplified for a pairwise additive system, and one finds

$$\begin{aligned}
\frac{\partial Z(N,V,T; \lambda)}{\partial \lambda} &= -\beta \int \cdots \int \sum_{j=2}^{N} u_{1j}(r_{1j}) \exp(-\beta U_N(\mathbf{r}^N; \lambda))\, d\mathbf{r}^N \\
&= -\beta (N-1) \int \cdots \int u_{12}(r_{12}) \exp(-\beta U_N(\mathbf{r}^N; \lambda))\, d\mathbf{r}^N.
\end{aligned}$$

Dividing by $Z(N,V,T; \lambda)$ and recalling the definition of the correlation function

$$\frac{\partial \ln Z(N,V,T; \lambda)}{\partial \lambda} = -\frac{\beta}{N} \int\!\!\int u_{12}(r_{12})\, \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2; \lambda)\, d\mathbf{r}_1\, d\mathbf{r}_2.$$

For a fluid, $\rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2; \lambda) = \rho^2 g(r_{12}; \lambda)$. Changing to relative coordinates, integrating over the coordinates of
particle 1 and inserting this in the expression for the excess chemical potential leads to the final result

$$\mu^{ex} = \rho \int_0^1 d\lambda \int u(r_{12})\, g(r_{12}; \lambda)\, d\mathbf{r}_{12}. \tag{A2.3.150}$$

This is Kirkwood's expression for the chemical potential. To use it, one needs the pair correlation function as
a function of the coupling parameter $\lambda$ as well as its spatial dependence. For instance, if $\lambda$ is the charge on a
selected ion in an electrolyte, the excess chemical potential follows from a theory that provides the
dependence of $g(r_{12}; \lambda)$ on the charge and the distance $r_{12}$. This method of calculating the chemical potential
is known as the Güntelberg charging process, after Güntelberg who applied it to electrolytes.

By analogy with the correlation function for the fully coupled system, the pair correlation function $g(r; \lambda)$ for
intermediate values of $\lambda$ is given by

$$g(r_{12}; \lambda) = \exp(-\beta\lambda u_{12}(r_{12}))\, y(r_{12}, \rho, T; \lambda) \tag{A2.3.151}$$

where $y(r, \rho, T; \lambda)$ is the corresponding cavity function for the partially coupled system. Kirkwood derived an
integral equation for $g(r; \lambda)$ in terms of a three-body correlation function approximated as the product of
two-body correlation functions called the superposition approximation. The integral equation, which can be solved
numerically, gives results of moderate accuracy for hard spheres and Lennard-Jones systems. A similar
approximation is due to Born and Green [23, 24] and Yvon [22]. Other approximations for $g(r)$ are discussed
later in this chapter.
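
The charging integral (A2.3.150) can be tested in the low-density limit, where $y \to 1$ and $g(r; \lambda) \approx \exp(-\beta\lambda u(r))$. The $\lambda$ integration can then be done analytically and gives $\mu^{ex} = -\rho kT \int f(r)\, d\mathbf{r} = 2B_2\rho kT$, where $f$ is the Mayer function of the fully coupled pair. The sketch below checks this numerically. It uses a bounded Gaussian-core pair potential purely as a convenient illustrative choice (it is not a model discussed in the text), because the integrand stays finite at $\lambda = 0$; the quadrature settings and function names are likewise assumptions.

```python
import numpy as np


def trapezoid(y, x):
    """Composite trapezoidal rule on a grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def gaussian_core(r, eps=1.0, sigma=1.0):
    """Bounded, purely repulsive pair potential (illustrative choice)."""
    return eps * np.exp(-((r / sigma) ** 2))


def mu_ex_charging(rho, kT, r_max=10.0, n_r=20_001, n_lambda=16):
    """mu_ex from (A2.3.150) with g(r; lambda) = exp(-beta * lambda * u(r))."""
    r = np.linspace(0.0, r_max, n_r)
    u = gaussian_core(r)
    nodes, weights = np.polynomial.legendre.leggauss(n_lambda)
    lambdas = 0.5 * (nodes + 1.0)       # map Gauss-Legendre nodes to [0, 1]
    weights = 0.5 * weights
    total = 0.0
    for lam, w in zip(lambdas, weights):
        g = np.exp(-lam * u / kT)
        total += w * trapezoid(u * g * 4.0 * np.pi * r**2, r)
    return rho * total


def mu_ex_mayer(rho, kT, r_max=10.0, n_r=20_001):
    """Low-density result mu_ex = -rho kT * integral of f(r) = 2 B_2 rho kT."""
    r = np.linspace(0.0, r_max, n_r)
    f = np.exp(-gaussian_core(r) / kT) - 1.0
    return -rho * kT * trapezoid(f * 4.0 * np.pi * r**2, r)


if __name__ == "__main__":
    print(mu_ex_charging(0.1, 1.0), mu_ex_mayer(0.1, 1.0))
```

At higher densities the same charging integral applies, but $g(r; \lambda)$ must then come from a theory or simulation of the partially coupled system.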

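The presence of three-body interactions in the total potential energy leads to an additional term in the internal
energy and virial pressure involving the three-body potential $u_3(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3)$, and the corresponding
three-body correlation function $g^{(3)}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3)$. The expression for the energy is then

$$\frac{\langle U_N \rangle}{N} = \frac{\rho}{2} \int u_2(r_{12})\, g^{(2)}(r_{12})\, d\mathbf{r}_{12} + \frac{\rho^2}{6} \int\!\!\int u_3(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3)\, g^{(3)}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3)\, d\mathbf{r}_{12}\, d\mathbf{r}_{13}. \tag{A2.3.152}$$

The virial equation for the pressure is also modified by the three-body and higher-order terms, and is given in
general by

$$P = \rho kT - \frac{1}{DV} \Big\langle \sum_{i=1}^{N} \mathbf{r}_i \cdot \nabla_i U_N(\mathbf{r}^N) \Big\rangle \tag{A2.3.153}$$

where $D$ is the dimensionality of the system.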



A2.3.4.2 GRAND CANONICAL ENSEMBLE (μ, V, T)

The grand canonical ensemble is a collection of open systems of given chemical potential $\mu$, volume $V$ and
temperature $T$, in which the number of particles or the density in each system can fluctuate. It leads to an
important expression for the compressibility $\kappa_T$ of a one-component fluid:

$$\rho kT \kappa_T = \frac{\langle N^2 \rangle - \langle N \rangle^2}{\langle N \rangle} \tag{A2.3.154}$$

where the compressibility can be determined experimentally from light scattering or neutron scattering
experiments. Generalizations of the above expression to multi-component systems have important
applications in the theory of solutions [25].
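
The fluctuation formula (A2.3.154) is easy to illustrate for non-interacting particles, for which $\rho kT\kappa_T = 1$. In the sketch below a small slab of a closed box plays the role of the open system: the number of particles in the slab fluctuates from sample to sample, and the ratio of its variance to its mean is computed. Because the surrounding reservoir is finite, the counts are binomial and the ratio is $1 - p$, where $p$ is the fraction of the volume occupied by the slab; it approaches the ideal gas value 1 as $p \to 0$. The sampling set-up and function name are assumptions made for this illustration.

```python
import numpy as np


def number_fluctuation_ratio(n_total=1000, fraction=0.05, n_samples=4000, seed=0):
    """(<N^2> - <N>^2)/<N> for ideal-gas particles in a slab of a unit box.

    Only the x coordinate matters for membership of the slab 0 <= x < fraction.
    """
    rng = np.random.default_rng(seed)
    x = rng.random((n_samples, n_total))
    counts = (x < fraction).sum(axis=1)     # particles in the open subsystem
    return counts.var() / counts.mean()


if __name__ == "__main__":
    for fraction in (0.2, 0.05, 0.01):
        print(fraction, number_fluctuation_ratio(fraction=fraction))
```

For an interacting fluid the same ratio, measured in a subvolume that is large compared with the correlation length but small compared with the whole system, estimates $\rho kT\kappa_T$.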

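It was shown in section A2.3.3.2 that the grand canonical ensemble (GCE) PF is a generating function for the
canonical ensemble PF, from which it follows that the integrals of the correlation functions in the GCE are just
averages of the fluctuating numbers $N$ and $N(N-1)$

$$\begin{aligned}
\int \langle \rho^{(1)}(\mathbf{r}_1) \rangle_{GCE}\, d\mathbf{r}_1 &= \langle N \rangle \\
\int\!\!\int \langle \rho^{(2)}(\mathbf{r}_1, \mathbf{r}_2) \rangle_{GCE}\, d\mathbf{r}_1\, d\mathbf{r}_2 &= \langle N(N-1) \rangle.
\end{aligned} \tag{A2.3.155}$$

We see that

$$\int\!\!\int \left[ \langle \rho^{(2)}(\mathbf{r}_1, \mathbf{r}_2) \rangle - \langle \rho^{(1)}(\mathbf{r}_1) \rangle \langle \rho^{(1)}(\mathbf{r}_2) \rangle \right] d\mathbf{r}_1\, d\mathbf{r}_2 = \langle N(N-1) \rangle - \langle N \rangle^2$$

where the subscript GCE has been omitted for convenience. The right-hand side of this is just $\langle N^2 \rangle - \langle N \rangle^2 - \langle N \rangle$.
The pair correlation function $g(\mathbf{r}_1, \mathbf{r}_2)$ is defined by

$$\langle \rho^{(2)}(\mathbf{r}_1, \mathbf{r}_2) \rangle = \langle \rho^{(1)}(\mathbf{r}_1) \rangle \langle \rho^{(1)}(\mathbf{r}_2) \rangle\, g(\mathbf{r}_1, \mathbf{r}_2) \tag{A2.3.156}$$

and it follows that

$$\int\!\!\int \langle \rho^{(1)}(\mathbf{r}_1) \rangle \langle \rho^{(1)}(\mathbf{r}_2) \rangle \left[ g(\mathbf{r}_1, \mathbf{r}_2) - 1 \right] d\mathbf{r}_1\, d\mathbf{r}_2 = \langle N^2 \rangle - \langle N \rangle^2 - \langle N \rangle.$$

For an isotropic fluid, the singlet density is the density of the fluid, i.e. $\langle \rho^{(1)}(\mathbf{r}) \rangle = \langle N \rangle/V = \rho$, and the pair
correlation function $g(\mathbf{r}_1, \mathbf{r}_2)$ depends on the interparticle separation $r_{12} = |\mathbf{r}_1 - \mathbf{r}_2|$. Using this in the above
integral, changing to relative coordinates with respect to particle 1 as the origin and integrating over its
coordinates, one finds

$$\rho^2 V \int [g(r_{12}) - 1]\, d\mathbf{r}_{12} = \langle N^2 \rangle - \langle N \rangle^2 - \langle N \rangle.$$

Division by $\langle N \rangle$ and taking note of the fluctuation formula (A2.3.154) for the compressibility leads to the
fundamental relation

$$\rho kT \kappa_T = 1 + \rho \int [g(r_{12}) - 1]\, d\mathbf{r}_{12} \tag{A2.3.157}$$

called the compressibility equation which is not limited to systems with pairwise additive potentials.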
Integrating the compressibility with respect to the density provides an independent route to the pressure, aside 
from the pressure calculated from the virial equation. The exact pair correlation function for a given model 
system should give the same values for the pressure calculated by different routes. This serves as a test for the 
accuracy of an approximate g(r) for a given Hamiltonian. 

The first term in the compressibility equation is the ideal gas term and the second term, the integral of $g(r) - 1 = h(r)$,
represents the non-ideal contribution due to the correlation or interaction between the particles. The
correlation function $h(r)$ is zero for an ideal gas, leaving only the first term. The correlations between the
particles in a fluid displaying a liquid-gas critical point are characterized by a correlation length $\xi$ that
becomes infinitely large as the critical point is approached. This causes the integral in the compressibility
equation and the compressibility $\kappa_T$ to diverge.

The divergence in the correlation length $\xi$ is characterized by the critical exponent $\nu$ defined by

$$\xi \sim |T - T_c|^{-\nu} \tag{A2.3.158}$$

while the divergence in the compressibility, near the critical point, is characterized by the exponent $\gamma$ as
discussed earlier. The correlation function near the critical region has the asymptotic form [26]

$$h(r) \approx \frac{f(r/\xi)}{r^{D-2+\eta}} \tag{A2.3.159}$$

where $D$ is the dimensionality and $\eta$ is a critical exponent. Substituting this in the compressibility equation, it
follows with $D = 3$ that

$$kT\kappa_T \approx 4\pi\, \xi^{2-\eta} \int f(x)\, x^{1-\eta}\, dx \tag{A2.3.160}$$

where $x = r/\xi$. Inserting the expressions for the temperature dependence of the compressibility and the
correlation length near the critical point, one finds that the exponents are related by

$$\gamma = \nu(2 - \eta). \tag{A2.3.161}$$

Table A2.3.4 summarizes the values of these critical exponents in two and three dimensions and the
predictions of mean field theory.

Table A2.3.4 The critical exponents γ, ν and η.

Exponent   MFT   Ising (d = 2)   Numerical (d = 3)

ν          1/2   1               0.630 ± 0.001
γ          1     7/4             1.239 ± 0.002
η          0     1/4             0.03
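
The scaling relation $\gamma = \nu(2 - \eta)$ of (A2.3.161) can be checked against the entries of table A2.3.4. The sketch below does the arithmetic exactly for the mean-field and two-dimensional Ising values and in floating point for the three-dimensional numerical estimates; the dictionary layout and function name are illustrative.

```python
from fractions import Fraction


def gamma_from_scaling(nu, eta):
    """Exponent gamma predicted by the scaling relation gamma = nu * (2 - eta)."""
    return nu * (2 - eta)


# Values of nu, gamma and eta from table A2.3.4.
EXPONENTS = {
    "MFT": {"nu": Fraction(1, 2), "gamma": Fraction(1), "eta": Fraction(0)},
    "Ising d=2": {"nu": Fraction(1), "gamma": Fraction(7, 4), "eta": Fraction(1, 4)},
    "numerical d=3": {"nu": 0.630, "gamma": 1.239, "eta": 0.03},
}

if __name__ == "__main__":
    for name, e in EXPONENTS.items():
        predicted = gamma_from_scaling(e["nu"], e["eta"])
        print(name, "tabulated:", e["gamma"], "from scaling:", predicted)
```

The relation holds exactly for the first two columns and to within the quoted uncertainties for the three-dimensional values.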


The compressibility equation can also be written in terms of the direct correlation function. Taking the Fourier
transform of the Ornstein-Zernike equation

$$\hat{h}(k) = \hat{c}(k) + \rho\, \hat{c}(k)\, \hat{h}(k) \tag{A2.3.162}$$

where we have used the property that the Fourier transform of a convolution integral is the product of Fourier
transforms of the functions defining the convolution. Here the Fourier transform of a function $f(r)$ is defined
by

$$\hat{f}(k) = \int f(r) \exp(-i\mathbf{k} \cdot \mathbf{r})\, d\mathbf{r}. \tag{A2.3.163}$$

From the Ornstein-Zernike equation in Fourier space one finds that

$$1 + \rho\hat{h}(k) = [1 - \rho\hat{c}(k)]^{-1}.$$

When $k = 0$, $1 + \rho\hat{h}(0)$ is just the right-hand side of the compressibility equation. Taking the inverse, it follows
that

$$\beta \left( \frac{\partial P}{\partial \rho} \right)_T = 1 - \rho\hat{c}(0) = 1 - \rho \int c(r)\, d\mathbf{r}. \tag{A2.3.164}$$

At the critical point $\beta(\partial P/\partial \rho)_T = 0$, and the integral of the direct correlation function remains finite, unlike the
integral of $h(r)$.
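
The algebra of the Ornstein-Zernike equation in Fourier space can be demonstrated with a discrete transform. The sketch below is a one-dimensional periodic analogue, chosen only to keep the example short: it takes a model $c(x)$ (a negative Gaussian, an arbitrary illustrative choice), obtains $h$ from $\hat{h} = \hat{c}/(1 - \rho\hat{c})$, and then checks by direct summation that $h = c + \rho\, c * h$. A three-dimensional isotropic fluid would use the radial Fourier transform instead; the grid parameters and function names are assumptions.

```python
import numpy as np


def gaussian_c(n, length):
    """Model direct correlation function on a periodic grid (illustrative)."""
    dx = length / n
    x = dx * np.arange(n)
    distance = np.minimum(x, length - x)    # periodic distance from the origin
    return -np.exp(-(distance**2)), dx


def solve_oz_fourier(c, rho, dx):
    """Return h from h_hat = c_hat / (1 - rho * c_hat) on a periodic 1D grid."""
    c_hat = dx * np.fft.fft(c)
    h_hat = c_hat / (1.0 - rho * c_hat)
    return np.real(np.fft.ifft(h_hat)) / dx


def oz_residual(h, c, rho, dx):
    """Residual of h = c + rho * (c convolved with h), by direct summation."""
    n = len(c)
    idx = np.arange(n)
    conv = np.array([dx * np.sum(c * h[(i - idx) % n]) for i in range(n)])
    return h - c - rho * conv


def compressibility_routes(h, c, rho, dx):
    """1 + rho*h_hat(0) and 1/(1 - rho*c_hat(0)); the two should coincide."""
    return 1.0 + rho * dx * h.sum(), 1.0 / (1.0 - rho * dx * c.sum())


if __name__ == "__main__":
    rho = 0.3
    c, dx = gaussian_c(256, 20.0)
    h = solve_oz_fourier(c, rho, dx)
    print("max residual:", np.max(np.abs(oz_residual(h, c, rho, dx))))
    print("compressibility routes:", compressibility_routes(h, c, rho, dx))
```

The residual is at the level of rounding error, and the two expressions for the $k = 0$ limit agree, which is the content of (A2.3.164).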
A2.3.4.3 INTEGRAL EQUATION APPROXIMATIONS FOR A FLUID

The equilibrium properties of a fluid are related to the correlation functions which can also be determined 
experimentally from x-ray and neutron scattering experiments. Exact solutions or approximations to these 
correlation functions would complete the theory. Exact solutions, however, are usually confined to simple 
systems in one dimension. We discuss a few of the approximations currently used for 3D fluids. 

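Successive $n$ and $n + 1$ particle density functions of fluids with pairwise additive potentials are related by the
Yvon-Born-Green (YBG) hierarchy [6]

$$\nabla_1 \rho^{(n)}(\mathbf{r}^n) = \beta \mathbf{F}_1^{ext} \rho^{(n)}(\mathbf{r}^n) + \beta \sum_{j=2}^{n} \mathbf{F}_{1j}\, \rho^{(n)}(\mathbf{r}^n) + \beta \int \mathbf{F}_{1,n+1}\, \rho^{(n+1)}(\mathbf{r}^{n+1})\, d\mathbf{r}_{n+1} \tag{A2.3.165}$$

where $\mathbf{F}_1^{ext} = -\nabla_1 \phi$ is the external force, $\mathbf{F}_{1j} = -\nabla_1 u(r_{1j})$ and $\mathbf{r}^n \equiv \{\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3 \ldots \mathbf{r}_n\}$ is the set of coordinates of
$n$ particles. The simplest of these occurs when $n = 1$, and it relates the one- and two-particle density functions
of a fluid in an inhomogeneous field, e.g. a fluid near a wall:

$$kT\, \nabla_1 \ln \rho(\mathbf{r}_1; [\phi]) = -\nabla_1 \phi(\mathbf{r}_1) - \int \rho(\mathbf{r}_2 | \mathbf{r}_1; [\phi])\, \nabla_1 u(r_{12})\, d\mathbf{r}_2 \tag{A2.3.166}$$

where $\rho(\mathbf{r}_2 | \mathbf{r}_1; [\phi]) = \rho^{(2)}(\mathbf{r}_1, \mathbf{r}_2; [\phi])/\rho(\mathbf{r}_1; [\phi])$ and the superscript 1 is omitted from the one-particle local
density. For a homogeneous fluid in the absence of an external field, $\mathbf{F}^{ext} = 0$ and $\rho^{(n)}(\mathbf{r}^n) = \rho^n g^{(n)}(\mathbf{r}^n)$ and
the YBG equation leads to

$$kT\, \nabla_1 g^{(n)}(\mathbf{r}^n) = \sum_{j=2}^{n} \mathbf{F}_{1j}\, g^{(n)}(\mathbf{r}^n) + \rho \int \mathbf{F}_{1,n+1}\, g^{(n+1)}(\mathbf{r}^{n+1})\, d\mathbf{r}_{n+1}. \tag{A2.3.167}$$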

Kirkwood derived an analogous equation that also relates two- and three-particle correlation functions but an 
approximation is necessary to uncouple them. The superposition approximation mentioned earlier is one such 
approximation, but unfortunately it is not very accurate. It is equivalent to the assumption that the potential of 
average force of three or more particles is pairwise additive, which is not the case even if the total potential is 
pair decomposable. The YBG equation for n = 1, however, is a convenient starting point for perturbation 
theories of inhomogeneous fluids in an external field. 

We will describe integral equation approximations for the two-particle correlation functions. There is no 
single approximation that is equally good for all interatomic potentials in the 3D world, but the solutions for a 
few important models can be obtained analytically. These include the Percus-Yevick (PY) approximation [27,
28] for hard spheres and the mean spherical (MS) approximation for charged hard spheres, for hard spheres
with point dipoles and for atoms interacting with a Yukawa potential. Numerical solutions for other
approximations, such as the hypernetted chain (HNC) approximation for charged systems, are readily
obtained by fast Fourier transform methods.
The Ornstein-Zernike equation

$$h(r_{12}) = c(r_{12}) + \rho\int c(r_{13})h(r_{32})\,d\mathbf{r}_3 \tag{A2.3.168}$$

and the exact relation for the pair correlation function

$$g(r) = h(r) + 1 = \exp[-\beta u(r) + h(r) - c(r) + B(r)] \tag{A2.3.169}$$

provide a convenient starting point for the discussion of these approximations. This is equivalent to the exact
relation

$$c(r) = -\beta u(r) - \ln g(r) + h(r) + B(r) \tag{A2.3.170}$$

for the direct correlation function. As $r \to \infty$, $c(r) \to -\beta u(r)$ except at $T = T_c$. Given the pair potential $u(r)$,
we have two equations for the three unknowns $h(r)$, $c(r)$ and $B(r)$; one of these is the Ornstein-Zernike relation
and the other is either one of the exact relations cited above. Each of the unknown functions has a density
expansion which is the sum of integrals of products of Mayer $f$-functions, which motivates their
approximation by considering different classes of terms. In this sense, the simplest approximation is the
following.

(a) Hypernetted chain approximation 
This sets the bridge function 

$$B(r) = 0. \tag{A2.3.171}$$

It is accurate for simple low valence electrolytes in aqueous solution at 25°C and for molten salts away from 
the critical point. The solutions are obtained numerically. A related approximation is the following. 

(b) Percus-Yevick (PY) approximation 

In this case [27, 28], the function $\exp[h(r) - c(r)]$ in the exact relation for $g(r)$ is linearized after assuming
$B(r) = 0$, when

$$g(r) = \exp(-\beta u(r))[1 + h(r) - c(r)] = \exp(-\beta u(r))[g(r) - c(r)]. \tag{A2.3.172}$$

Rearranging this, we have the PY approximation for the direct correlation function

$$c(r) = f(r)y(r), \tag{A2.3.173}$$

where $f(r) = \exp(-\beta u(r)) - 1$ is the Mayer $f$-function and $y(r) = g(r)\exp(\beta u(r))$. This expression is combined
with the Ornstein-Zernike equation to obtain the solution for $c(r)$.
For hard spheres of diameter $\sigma$, the PY approximation is equivalent to $c(r) = 0$ for $r > \sigma$ supplemented by the
core condition $g(r) = 0$ for $r < \sigma$. The analytic solution to the PY approximation for hard spheres was obtained
independently by Wertheim [32] and Thiele [33]. Solutions for other potentials (e.g. Lennard-Jones) are
obtained numerically.

(c) Mean spherical approximation 

In the MS approximation, for hard core particles of diameter $\sigma$, one approximates the direct correlation
function by

$$c(r) = -\beta u(r) \quad \text{for } r > \sigma \tag{A2.3.174}$$

and supplements this with the exact relation

$$g(r) = 0 \quad \text{for } r < \sigma. \tag{A2.3.175}$$

The solution determines c(r) inside the hard core from which g(r) outside this core is obtained via the 
Ornstein-Zernike relation. For hard spheres, the approximation is identical to the PY approximation. Analytic 
solutions have been obtained for hard spheres, charged hard spheres, dipolar hard spheres and for particles 
interacting with the Yukawa potential. The MS approximation for point charges (charged hard spheres in the
limit of zero size) yields the Debye-Hückel limiting law distribution function.

It would appear that the approximations listed above are progressively more drastic. Their accuracy, however, 
is unrelated to this progression and depends on the nature of the intermolecular potential. Approximations that 
are good for systems with strong long-range interactions are not necessarily useful when the interactions are 
short ranged. For example, the HNC approximation is accurate for simple low valence electrolytes in aqueous 
solution in the normal preparative (0-2 M) range at 25°C, but fails near the critical region. The PY 
approximation, on the other hand, is poor for electrolytes, but is much better for hard spheres. The relative 
accuracy of these approximations is determined by the cancellation of terms in the density expansions of the 
correlation functions, which depends on the range of the intermolecular potential. 
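The three closures differ only in how $c(r)$ is built from $h(r)$ and $\beta u(r)$ at a given separation. The following Python sketch evaluates each closure pointwise. The function names are hypothetical, the bridge function is set to zero in the HNC case, and the MS form applies only outside the hard core. It illustrates the algebra of the closures and is not a solver for the Ornstein-Zernike equation.

```python
import math

def c_hnc(beta_u, h):
    """HNC closure: c = -beta*u - ln(1 + h) + h, i.e. the exact relation with B(r) = 0."""
    return -beta_u - math.log(1.0 + h) + h

def c_py(beta_u, h):
    """PY closure: c = f*y with f = exp(-beta*u) - 1 and y = g*exp(beta*u)."""
    g = 1.0 + h
    f = math.exp(-beta_u) - 1.0
    y = g * math.exp(beta_u)
    return f * y

def c_msa(beta_u):
    """MS closure outside the hard core: c = -beta*u."""
    return -beta_u

# Illustrative values only: a weak potential and a small total correlation.
beta_u, h = 0.02, 0.05
for name, value in (("HNC", c_hnc(beta_u, h)),
                    ("PY", c_py(beta_u, h)),
                    ("MS", c_msa(beta_u))):
    print(f"{name:>3s}: c = {value:+.5f}")
```

For a weak potential and a small $h(r)$ all three values stay close to $-\beta u(r)$, consistent with the common asymptotic behaviour of $c(r)$; the closures separate as the correlations grow.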


A2.3.5 EQUILIBRIUM PROPERTIES OF NON-IDEAL FLUIDS 

A2.3.5.1 INTEGRAL EQUATION AND SCALED PARTICLE THEORIES 

Theories based on the solution to integral equations for the pair correlation functions are now well developed 
and widely employed in numerical and analytic studies of simple fluids [6]. Further improvements for simple 
fluids would require better approximations for the bridge functions B(r). It has been suggested that these 
functions can be scaled to the same functional form for different potentials. The extension of integral equation 
theories to molecular fluids was first accomplished by Chandler and Andersen [30] through the introduction 
of the site-site direct correlation function $c_{\alpha\beta}(r)$ between atoms in each molecule and a site-site
Ornstein-Zernike relation called the reference interaction site
model (RISM) equation [31]. Approximations, corresponding to the closures for simple monatomic fluids,
enable the site-site pair correlation functions $h_{\alpha\beta}(r)$ to be obtained. The theory has been successfully applied
to simple molecules and to polymers. 

Integral equation approximations for the distribution functions of simple atomic fluids are discussed in the 
following. 


(a) Hard spheres 

(i) PY and MS approximations. The two approximations are identical for hard spheres, as noted earlier. The 
solution yields the direct correlation function inside the hard core as a cubic polynomial: 


$$c(r) = \begin{cases} -\lambda_1 - 6\eta\lambda_2(r/\sigma) - \tfrac{1}{2}\eta\lambda_1(r/\sigma)^3, & r < \sigma \\ 0, & r > \sigma. \end{cases} \tag{A2.3.176}$$

In this expression, the packing fraction $\eta = \pi\rho\sigma^3/6$, and the other two parameters are related to this by

$$\lambda_1 = (1 + 2\eta)^2/(1 - \eta)^4, \qquad \lambda_2 = -(1 + \eta/2)^2/(1 - \eta)^4.$$

The solution was first obtained independently by Wertheim [32] and Thiele [33] using Laplace transforms.
Subsequently, Baxter [34] obtained the same solutions by a Wiener-Hopf factorization technique. This
method has been generalized to charged hard spheres. 
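As a numerical check on the cubic polynomial, the sketch below integrates the PY $c(r)$ over the core and evaluates $1 - \rho\int c(r)\,d\mathbf{r}$, which by the compressibility relation equals $\beta(\partial P/\partial\rho)_T$. For this polynomial the integral works out to $\lambda_1$. The function names and the midpoint quadrature are illustrative choices, and lengths are measured in units of the sphere diameter.

```python
import math

def py_lambdas(eta):
    """Parameters of the PY hard-sphere direct correlation function."""
    lam1 = (1 + 2 * eta) ** 2 / (1 - eta) ** 4
    lam2 = -(1 + eta / 2) ** 2 / (1 - eta) ** 4
    return lam1, lam2

def c_py_hs(x, eta):
    """PY c(r) for hard spheres, with x = r/sigma; zero outside the core."""
    if x >= 1.0:
        return 0.0
    lam1, lam2 = py_lambdas(eta)
    return -lam1 - 6 * eta * lam2 * x - 0.5 * eta * lam1 * x ** 3

def inverse_compressibility(eta, n=4000):
    """1 - rho * integral of c(r) over all space, by the midpoint rule."""
    rho_sigma3 = 6 * eta / math.pi          # rho * sigma^3 from the packing fraction
    dx = 1.0 / n
    integral = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        integral += c_py_hs(x, eta) * 4 * math.pi * x * x
    return 1 - rho_sigma3 * integral * dx

def contact_value(eta):
    """g(sigma+) = -c(sigma-) from the polynomial evaluated at r = sigma."""
    lam1, lam2 = py_lambdas(eta)
    return lam1 + 6 * eta * lam2 + 0.5 * eta * lam1

eta = 0.3
print(inverse_compressibility(eta), py_lambdas(eta)[0])
print(contact_value(eta), (1 + eta / 2) / (1 - eta) ** 2)
```

The two numbers on each printed line should agree to quadrature accuracy; the second line uses the closed form $(1 + \eta/2)/(1 - \eta)^2$ for the contact value, which follows from the same polynomial.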

The pressure from the virial equation is calculated by noting that $h(r) - c(r)$ is continuous at $r = \sigma$, and $c(r) = 0$
for $r > \sigma$. It follows that

$$h(\sigma+) - c(\sigma+) = h(\sigma-) - c(\sigma-)$$

and since $c(\sigma+) = 0$ and $h(\sigma-) = -1$, we have $g(\sigma+) = 1 + h(\sigma+) = -c(\sigma-)$. This gives an expression for the
pressure of hard spheres in the PY approximation in terms of $c(\sigma-)$, equivalent to the virial pressure equation
A2.3.146

$$\frac{P}{\rho kT} = 1 - \frac{2\pi}{3}\rho\sigma^3 c(\sigma-). \tag{A2.3.177}$$

Setting $r = \sigma$ in the solution for $c(r)$, it follows that

$$\frac{P_V}{\rho kT} = \frac{1 + 2\eta + 3\eta^2}{(1 - \eta)^2} \tag{A2.3.178}$$
where the subscript V denotes the pressure calculated from the virial pressure equation. The pressure from the
compressibility equation follows from the expression for $(\partial P/\partial\rho)_T$ in terms of the integral of the direct
correlation function $c(r)$; the upper limit of this integral is $r = \sigma$ in the PY approximation for hard spheres
since $c(r) = 0$ for $r > \sigma$. One finds that the pressure $P_C$ from the compressibility equation is given by

$$\frac{P_C}{\rho kT} = \frac{1 + \eta + \eta^2}{(1 - \eta)^3}. \tag{A2.3.179}$$

The CS equation for the pressure is found to be the weighted mean of the pressure calculated from the virial
and compressibility equations:

$$P_{CS} = (1/3)P_V + (2/3)P_C.$$
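The weighted-mean statement can be verified directly from the two PY compressibility factors. The sketch below assumes the usual closed form of the Carnahan-Starling equation, $P/\rho kT = (1 + \eta + \eta^2 - \eta^3)/(1 - \eta)^3$, which is taken here to be the CS equation referred to in the text; the function names are illustrative.

```python
def z_virial(eta):
    """PY virial-route compressibility factor P_V / (rho k T)."""
    return (1 + 2 * eta + 3 * eta ** 2) / (1 - eta) ** 2

def z_compressibility(eta):
    """PY compressibility-route factor P_C / (rho k T)."""
    return (1 + eta + eta ** 2) / (1 - eta) ** 3

def z_carnahan_starling(eta):
    """Carnahan-Starling compressibility factor (assumed standard form)."""
    return (1 + eta + eta ** 2 - eta ** 3) / (1 - eta) ** 3

for eta in (0.1, 0.2, 0.3, 0.4):
    mean = z_virial(eta) / 3 + 2 * z_compressibility(eta) / 3
    print(f"eta={eta:.1f}  Z_V={z_virial(eta):.4f}  Z_CS={z_carnahan_starling(eta):.4f}  "
          f"Z_C={z_compressibility(eta):.4f}  mean={mean:.4f}")
```

At every packing fraction the CS value coincides with the weighted mean and lies between the virial and compressibility results, which is the bracketing described for the fluid phase.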


Figure A2.3.10 compares the virial and compressibility pressures for hard spheres with the pressure calculated from
the CS equation and also with the pressures determined in computer simulations.



Figure A2.3.10 Equation of state for hard spheres from the PY and HNC approximations compared with the
CS equation (- - -). C and V refer to the compressibility and virial routes to the pressure (after [6]).

The CS pressures are close to the machine calculations in the fluid phase, and are bracketed by the pressures 
from the virial and compressibility equations using the PY approximation. Computer simulations show a 
fluid-solid phase transition that is not reproduced by any of these equations of state. The theory has been 
extended to mixtures of hard spheres with additive diameters by Lebowitz [35], Lebowitz and Rowlinson 
[35], and Baxter [36]. 
(ii) Scaled particle theory. The virial equation for the pressure of hard spheres is determined by the contact
value $g(\sigma+)$ of the pair correlation function, which is related to the average density of hard spheres in contact
with a spherical cavity of radius $\sigma$, from which the spheres are excluded. The fixed cavity affects the fluid in
the same way that a hard sphere at the centre of this cavity would influence the rest of the fluid. Reiss, Frisch
and Lebowitz [37] developed an approximate method to calculate this, and found that the pressure for hard
spheres is identical to the pressure from the compressibility equation in the PY approximation given in
equation A2.3.179.

The method has been extended to mixtures of hard spheres, to hard convex molecules and to hard 
spherocylinders that model a nematic liquid crystal. For mixtures (m subscript) of hard convex molecules of 
the same shape but different sizes, Gibbons [38] has shown that the pressure is given by 


$$\frac{P}{\rho kT} = \frac{1}{1 - \xi_m} + \frac{AB}{(1 - \xi_m)^2} + \frac{B^2C}{3(1 - \xi_m)^3} \tag{A2.3.180}$$


where 


_ (A2.3.181) 

i t 

where $R_i$ is the radius of particle $i$ averaged over all orientations, $V_i$ and $S_i$ are the volume and surface area of
the particle $i$, respectively, and $x_i$ is its mole fraction. The pressure corresponding to the PY compressibility
equation is obtained for parameters corresponding to hard sphere mixtures. We refer the reader to the review
article by Reiss in the further reading section for more detailed discussions.

(iii) Gaussian statistics. Chandler [39] has discussed a model for fluids in which the probability $P(N,v)$ of
observing $N$ particles within a molecular size volume $v$ is a Gaussian function of $N$. The moments of the
probability distribution function are related to the $n$-particle correlation functions $g^{(n)}(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n)$, and

$$a_n = \langle N(N-1)\cdots(N-n+1)\rangle = \rho^n\int_v\cdots\int_v g^{(n)}(\mathbf{r}_1, \ldots, \mathbf{r}_n)\,d\mathbf{r}_1\cdots d\mathbf{r}_n.$$

The inversion of this leads to an expression for $P(N,v)$:

$$P(N,v) = \sum_{n=N}^{\infty}\frac{(-1)^{n-N}}{N!\,(n-N)!}\,a_n$$

involving all of the moments of the probability distribution function. The Gaussian approximation implies that
only the first two moments $\langle N\rangle$ and $\langle N^2\rangle$, which are determined by the density and the pair correlation
function, are sufficient to determine the probability distribution $P(N,v)$. Computer simulation studies of hard
spheres by Crooks and Chandler [40] and even water by Hummer et al [41] have shown that the Gaussian model
is accurate at moderate fluid densities; deviations for hard spheres begin to occur at very low and high densities
near the ideal gas limit and close to the transition to a solid phase, respectively.

The assumption of Gaussian fluctuations gives the PY approximation for hard sphere fluids and the MS 
approximation on addition of an attractive potential. The RISM theory for molecular fluids can also be 
derived from the same model. 
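The inversion of factorial moments into an occupancy probability can be illustrated in the one case where every moment is known in closed form. The sketch assumes the standard identity $P(N) = \sum_{n\ge N}(-1)^{n-N}a_n/[N!\,(n-N)!]$ and an ideal gas, for which $g^{(n)} = 1$ and $a_n = (\rho v)^n$; the result should then be the Poisson distribution. Function names and the truncation length are illustrative.

```python
import math

def probability_from_factorial_moments(N, moment, n_terms=60):
    """Truncated inversion sum; moment(n) returns the n-th factorial moment a_n."""
    total = 0.0
    for k in range(n_terms + 1):            # k = n - N
        n = N + k
        total += (-1) ** k * moment(n) / (math.factorial(N) * math.factorial(k))
    return total

def ideal_gas_moment(mean):
    """Factorial moments of an ideal gas: a_n = (rho * v)^n with rho * v = mean."""
    return lambda n: mean ** n

mean = 2.0
for N in range(5):
    p = probability_from_factorial_moments(N, ideal_gas_moment(mean))
    poisson = mean ** N * math.exp(-mean) / math.factorial(N)
    print(N, p, poisson)
```

For an interacting fluid the higher moments are not available, which is where the Gaussian truncation to $\langle N\rangle$ and $\langle N^2\rangle$ enters.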

(b) Strong electrolytes 

The long-range interactions between ions lie at the opposite extreme to the harsh repulsive interactions 
between hard spheres. The methods used to calculate the thermodynamic properties through the virial 
expansion cannot be directly applied to Coulombic systems since the integrals entering into the virial 
coefficients diverge. The correct asymptotic form of the thermodynamic properties at low concentrations was 
first obtained by Debye and Hückel in their classic study of charged hard spheres [42] by linearizing the
Poisson-Boltzmann equation as discussed below. This immediately excludes serious consideration of ion
pairing, but this defect, especially in low dielectric solvents, was taken into account by Bjerrum [43], who
assumed that all oppositely charged ions within a distance $|e_+e_-|/2\varepsilon_0 kT$ were paired, while the rest were free.
The free ions were treated in the Debye-Hückel approximation.

The Debye treatment is not easily extended to higher concentrations and special methods are required to
incorporate these improvements. One method, due to Mayer [44], resums the virial expansion to cancel out
the divergences of the integrals. Mayer obtained the Debye-Hückel limiting law and the first correction to this
as a convergent renormalized second virial coefficient that automatically incorporates the effect of ion pairing.
Improvements due to Outhwaite, Bhuiyan and others involve modifications of Debye and Hückel's original
treatment of the Poisson-Boltzmann equation to yield a modified Poisson-Boltzmann (MPB) equation for the
average electrostatic potential $\psi_i(r)$ of an ion. We refer to the review article by Outhwaite (1974) in the
further reading section for a detailed discussion.

Two widely used theories of electrolytes at room temperature are the MS and HNC approximations for the 
pair correlation functions. The approximations fail or are less successful near the critical point. The solutions 
to the HNC approximation in the usual laboratory concentration range are obtained numerically, where fast 
Fourier transform methods are especially useful [45]. They are accurate for low valence electrolytes in 
aqueous solution at room temperature up to 1 or 2 M. However, the HNC approximation does not give a 
numerical solution near the critical point. The MS approximation of charged hard spheres can be solved 
analytically, as first shown by Waisman and Lebowitz [46]. This is very convenient and useful in mapping out 
the properties of electrolytes of varying charges over a wide range of concentrations. The solution has been 
extended recently to charged spheres of unequal size [47] and to sticky charged hard spheres [48, 49]. Ebeling
[50] extended Bjerrum's theory of association by using the law of mass action to determine the number of ion
pairs while treating the free ions in the MS approximation supplemented with the second ionic virial
coefficient. Ebeling and Grigo [51] located a critical point from this theory. The critical region of
electrolytes is known to be characterized by pairing and clustering of ions and it has been observed 
experimentally that dimers are abundant in the vapour phase of ionic fluids. The nature of the critical 
exponents in this region, whether they are classical or non-classical, and the possibilities of a crossover from 
one to the other are currently under study [52, 53, 54, 55 and 56]. Computer simulation studies of this region 
are also under active investigation [52, 58 and 59]. Koneshan and Rasaiah [60] have observed clusters of 
sodium and chloride ions in simulations of aqueous sodium chloride solutions under supercritical conditions. 




Strong electrolytes are dissociated into ions that are also paired to some extent when the charges are high or 
the dielectric constant of the medium is low. We discuss their properties assuming that the ionized gas or 
solution is electrically neutral, i.e. 


$$\sum_{i=1}^{\sigma} c_ie_i = 0 \tag{A2.3.182}$$

where $c_i$ is the concentration of the free ion $i$ with charge $e_i$ and $\sigma$ is the number of ionic species. The local
charge density at a distance $r$ from the ion $i$ is related to the ion concentrations $c_j$ and pair distribution
functions $g_{ij}(r)$ by

$$\rho_i(r) = \sum_{j=1}^{\sigma} c_je_jg_{ij}(r). \tag{A2.3.183}$$

The electroneutrality condition can be expressed in terms of the integral of the charge density by recognizing
that the total charge around an ion is equal in magnitude and opposite in sign to the charge on
the central ion. This leads to the zeroth moment condition

$$-e_i = \int\rho_i(r)\,d\mathbf{r}. \tag{A2.3.184}$$

The distribution functions also satisfy a second moment condition, as first shown by Stillinger and Lovett
[61]:

$$-\frac{3\varepsilon_0 kT}{2\pi} = \sum_{i=1}^{\sigma} c_ie_i\int\rho_i(r)\,r^2\,d\mathbf{r} \tag{A2.3.185}$$

where $\varepsilon_0$ is the dielectric constant of the medium in which the ions are immersed. The Debye-Hückel limiting
law and the HNC and MS approximations satisfy the zeroth and second moment conditions.
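Both moment conditions can be checked numerically for the simplest case. The sketch below assumes the limiting-law charge density for point ions, $\rho_i(r) = -\kappa^2e_i\exp(-\kappa r)/4\pi r$ with $\kappa^2 = (4\pi/\varepsilon_0kT)\sum_i c_ie_i^2$ in Gaussian units, and works in reduced units with $\varepsilon_0 = kT = 1$. The concentrations, charges and function names are illustrative.

```python
import math

EPS0 = 1.0   # dielectric constant of the medium (reduced units, assumption)
KT = 1.0     # thermal energy (reduced units, assumption)

def kappa_from(species):
    """species: list of (concentration, charge) pairs."""
    return math.sqrt(4 * math.pi * sum(c * e * e for c, e in species) / (EPS0 * KT))

def charge_density(r, e_i, kappa):
    """Limiting-law charge density around a point ion of charge e_i."""
    return -kappa ** 2 * e_i * math.exp(-kappa * r) / (4 * math.pi * r)

def radial_moment(e_i, kappa, power, n=20000):
    """Midpoint rule for the integral of rho_i(r) * r^power over all space."""
    r_max = 50.0 / kappa                     # the integrand is negligible beyond this
    dr = r_max / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        total += charge_density(r, e_i, kappa) * r ** power * 4 * math.pi * r * r
    return total * dr

species = [(0.01, +1.0), (0.01, -1.0)]       # electroneutral 1-1 electrolyte
kappa = kappa_from(species)

zeroth = radial_moment(+1.0, kappa, 0)
second = sum(c * e * radial_moment(e, kappa, 2) for c, e in species)
print(zeroth, -1.0)
print(second, -3 * EPS0 * KT / (2 * math.pi))
```

The first line compares the total cloud charge with $-e_i$, and the second compares the weighted second moment with $-3\varepsilon_0kT/2\pi$.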

The thermodynamic properties are calculated from the ion-ion pair correlation functions by generalizing the 
expressions derived earlier for one-component systems to multicomponent ionic mixtures. For ionic solutions 
it is also necessary to note that the interionic potentials are solvent averaged ionic potentials of average force: 

$$u_{ij}(r;T,P) = u^*_{ij}(r;T,P) + \frac{e_ie_j}{\varepsilon_0 r}. \tag{A2.3.186}$$

Here $u^*_{ij}(r;T,P)$ is the short-range potential for ions, and $\varepsilon_0$ is the dielectric constant of the solvent. The
solvent averaged potentials are thus actually free energies that are functions of temperature and pressure. The 
thermodynamic properties calculated from the pair correlation functions are summarized below. 
(i) The virial equation provides the osmotic coefficient measured in isopiestic experiments:


i=l j=] 


(A2.3.187) 


(ii) The generalization of the compressibility equation, taking into account electroneutrality, 

fl In )/+ 1 


Sine cG± 
where 


- 1 (A2.3.188) 


=/<*- 


G±= t(K±- Ddr (A2.3.189) 


provides the concentration dependence of the mean activity coefficient $\gamma_\pm$ determined experimentally from cell
EMFs.

(iii) The energy equation is related to the heat of dilution determined from calorimetric measurements 


r „ Iff f im,jir)] ... 


(A2.3.190) 


For an ionic solution 


3r c r [ a In 7 J 3r 


(A2.3.191) 


and $\partial\ln\varepsilon_0/\partial\ln T = -1.3679$ for water at 25°C.

(iv) The equation for the excess volume is related to the partial molar volumes of the solute determined from 
density measurements 


r = l /=| 
In an ionic solution


(A2.3.192) 


m 




■*■ 


a[/9«rj(r)] 


c)/ 5 


(A2.3.193) 


where $\partial\ln\varepsilon_0/\partial\ln P_0 = 47.1 \times 10^{-6}$ for water at 25°C.

The theory of strong electrolytes due to Debye and Hückel derives the exact limiting laws for low valence
electrolytes and introduces the idea that the Coulomb interactions between ions are screened at finite ion 
concentrations. 

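(c) The Debye-Hückel theory

The model used is the RPM. The average electrostatic potential $\psi_i(r)$ at a distance $r$ away from an ion $i$ is
related to the charge density $\rho_i(r)$ by Poisson's equation

$$\nabla^2\psi_i(r) = -\frac{4\pi}{\varepsilon_0}\rho_i(r). \tag{A2.3.194}$$

Debye and Hückel [42] assumed that the ion distribution functions are related to $\psi_i(r)$ by

$$g_{ij}(r) = \exp(-\beta e_j\psi_i(r)),$$

which is an approximation. This leads to the PB equation

$$\nabla^2\psi_i(r) = \begin{cases} -\dfrac{4\pi}{\varepsilon_0}\sum_j c_je_j\exp(-\beta e_j\psi_i(r)), & r > \sigma \\ 0, & r < \sigma. \end{cases} \tag{A2.3.195}$$

Linearizing the exponential,

$$g_{ij}(r) = 1 - \beta e_j\psi_i(r) \tag{A2.3.196}$$

in the PB equation leads to the Debye-Hückel differential equation:

$$\nabla^2\psi_i(r) = \begin{cases} \kappa^2\psi_i(r), & r > \sigma \\ 0, & r < \sigma \end{cases} \tag{A2.3.197}$$

where $\kappa$ is defined by

$$\kappa^2 = \frac{4\pi}{\varepsilon_0 kT}\sum_i c_ie_i^2. \tag{A2.3.198}$$

The solution to this differential equation is

$$\psi_i(r) = \begin{cases} \dfrac{e_i\exp(-\kappa(r - \sigma))}{\varepsilon_0(1 + \kappa\sigma)\,r}, & r > \sigma \\[2ex] \dfrac{e_i}{\varepsilon_0 r} - \dfrac{e_i\kappa}{\varepsilon_0(1 + \kappa\sigma)}, & r < \sigma \end{cases} \tag{A2.3.199}$$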

which obeys the zeroth moment or electroneutrality condition, but not the second moment condition. 

The mean activity coefficient $\gamma_\pm$ of a single electrolyte in this approximation is given by

$$\ln\gamma_\pm = -\frac{A|z_+z_-|\sqrt{I}}{1 + Ba\sqrt{I}} \tag{A2.3.200}$$

where $a$ is the effective distance of closest approach of the ions, $I$ is the ionic strength, and $A$ and $B$ are constants
determined by the temperature $T$ and the dielectric constant of the solvent $\varepsilon_0$. This expression is widely used to calculate the
activity coefficients of simple electrolytes in the usual preparative range. The contributions of the hard cores 
to non-ideal behaviour are ignored in this approximation. 

When $\kappa\sigma \ll 1$ (i.e. at very low concentrations), we have the Debye-Hückel limiting law distribution function:

$$g_{ij}(r) = \begin{cases} 1 - \beta e_ie_j\exp(-\kappa r)/\varepsilon_0 r, & r > \sigma \\ 0, & r < \sigma \end{cases} \tag{A2.3.201}$$

which satisfies both the zeroth and second moment conditions. It also has an interesting physical
interpretation. The total charge $P_i(r)\,dr$ in a shell of radius $r$ and thickness $dr$ around an ion is

$$P_i(r)\,dr = \rho_i(r)4\pi r^2\,dr = -\kappa^2e_ir\exp(-\kappa r)\,dr \tag{A2.3.202}$$

whose magnitude has a maximum at a distance $r = 1/\kappa$, which is called the Debye length or the radius of the 'ionic
atmosphere'. Each ion is pictured as surrounded by a cloud or 'ionic atmosphere' whose net charge is
opposite in sign to the central ion. The limiting law distribution function implies that the electrostatic potential

$$\psi_i(r) = e_i\exp(-\kappa r)/\varepsilon_0 r. \tag{A2.3.203}$$
Expanding the exponential, one finds for small $\kappa r$ that

$$\psi_i(r) = \frac{e_i}{\varepsilon_0 r} - \frac{e_i\kappa}{\varepsilon_0}. \tag{A2.3.204}$$

The first term is the Coulomb field of the ion, and the second is the potential due to the ion atmosphere at an
effective distance equal to $1/\kappa$. For a univalent aqueous electrolyte at 298 K,

$$1/\kappa \approx 3.04/\sqrt{C}\ \text{Å}$$

where $C$ is the total electrolyte concentration in moles per litre.
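The size of the ionic atmosphere is easy to estimate numerically. The sketch below converts the definition of $\kappa$ to SI units for a 1-1 electrolyte and also locates the maximum of the shell charge $r\exp(-\kappa r)$ by a grid search. The relative permittivity of water (78.4) and the temperature (298.15 K) are assumed values, `EPS_VACUUM` is the vacuum permittivity rather than the solvent dielectric constant written $\varepsilon_0$ in the text, and the function names are illustrative.

```python
import math

E_CHARGE = 1.602176634e-19       # elementary charge, C
K_B = 1.380649e-23               # Boltzmann constant, J/K
N_A = 6.02214076e23              # Avogadro constant, 1/mol
EPS_VACUUM = 8.8541878128e-12    # vacuum permittivity, F/m

def debye_length_angstrom(molarity, temperature=298.15, eps_r=78.4):
    """Debye length 1/kappa in angstrom for a fully dissociated 1-1 electrolyte."""
    n_pairs = 1000.0 * N_A * molarity                     # ion pairs per m^3
    kappa_sq = (2.0 * n_pairs * E_CHARGE ** 2
                / (EPS_VACUUM * eps_r * K_B * temperature))
    return 1e10 / math.sqrt(kappa_sq)

def shell_charge_peak(kappa, n=100000):
    """Grid search for the radius that maximizes r * exp(-kappa * r)."""
    r_max = 10.0 / kappa
    best_r, best_val = 0.0, 0.0
    for i in range(1, n + 1):
        r = i * r_max / n
        val = r * math.exp(-kappa * r)
        if val > best_val:
            best_r, best_val = r, val
    return best_r

for conc in (0.001, 0.01, 0.1, 1.0):
    print(f"C = {conc:5.3f} M   1/kappa = {debye_length_angstrom(conc):7.2f} angstrom")
print(shell_charge_peak(kappa=2.0))   # expected to be close to 1/kappa = 0.5
```

With these assumed parameters the Debye length comes out at about 3.04 Å for a 1 M solution and grows as $C^{-1/2}$ on dilution.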

The thermodynamic properties derived from the limiting law distribution functions are 


NkT Ssr 


inrj 


~ ' 3r n^| (A2.3.205) 

d In 


In y ± = In y m (A2.3.206) 

= HS (A2.3.207) 


24jt<" 


NkT NkT 24,tc 


(A2.3.208) 


where $c = \sum_i c_i$ is the total ionic concentration and the superscript HS refers to the properties of the
corresponding uncharged hard sphere system. Debye and Hückel assumed ideal behaviour for the uncharged
system ($\phi^{HS} = \gamma^{HS} = 1$ and $A^{ex,HS} = 0$).

The Debye-Hückel limiting law predicts a square-root dependence on the ionic strength $I = \frac{1}{2}\sum_i c_iz_i^2$ of the
logarithm of the mean activity coefficient ($\log\gamma_\pm$), the heat of dilution ($E^{ex}/VI$) and the excess volume ($V^{ex}$); it
is considered to be an exact expression for the behaviour of an electrolyte at infinite dilution. Some
experimental results for the activity coefficients and heats of dilution are shown in figure A2.3.11 for aqueous
solutions of NaCl and ZnSO$_4$ at 25°C; the results are typical of the observations for 1-1 (e.g. NaCl) and 2-2
(e.g. ZnSO$_4$) aqueous electrolyte solutions at this temperature.
Figure A2.3.11 The mean activity coefficients and heats of dilution of NaCl and ZnSO$_4$ in aqueous solution
at 25°C as a function of $|z_+z_-|\sqrt{I}$, where $I$ is the ionic strength. DHLL = Debye-Hückel limiting law.
The thermodynamic properties approach the limiting law at infinite dilution, but deviate from it at low
concentrations, in different ways for the two charge types. Evidence from the ionic conductivity of 2-2
electrolyte solutions suggests that the negative deviations from the limiting law observed for these solutions
are due to ion pairing or association. The opposite behaviour found for aqueous 1-1 electrolytes, for which
ion pairing is negligible at room temperature, is caused by the finite size of the ions and is the excluded
volume effect. The Debye-Hückel theory ignores ion association and treats the effect of the sizes of the ions
incompletely. The limiting law slopes and deviations from them depend strongly on the temperature and
dielectric constant of the solvent and on the charges on the ions. An aqueous solution of sodium chloride, for
instance, behaves like a weak electrolyte near the critical temperature of water because the dielectric constant
of the solvent decreases rapidly with increasing temperature.

As pointed out earlier, the contributions of the hard cores to the thermodynamic properties of the solution at 
high concentrations are not negligible. Using the CS equation of state, the osmotic coefficient of an uncharged 
hard sphere solute (in a continuum solvent) is given by 


$$\phi^{HS} = 1 + \frac{4\eta - 2\eta^2}{(1 - \eta)^3} \tag{A2.3.209}$$

where $\eta = \pi c\sigma^3/6$. For a 1 M solution this contributes 0.03 to the deviation of the osmotic coefficient from
ideal behaviour.

(d) Mayer's theory

The problem with the virial expansion when applied to ionic solutions is that the virial coefficients diverge.
This difficulty was resolved by Mayer, who showed how the series could be resummed to cancel the
divergences and yield a new expansion for a charged system. The terms in the new series are ordered
differently from those in the original expansion, and the Debye-Hückel limiting law follows as the leading
correction due to the non-ideal behaviour of the corresponding uncharged system. In principle, the theory
enables systematic corrections to the limiting law to be obtained at higher electrolyte concentrations. The
results are quite general and are applicable to any electrolyte with a well defined short-range potential $u^*_{ij}(r)$,
besides the RPM electrolyte.

The principal ideas and main results of the theory at the level of the second virial coefficient are presented
below. The Mayer $f$-function for the solute pair potential can be written as the sum of terms:

$$f_{ij}(r) = f^*_{ij}(r) + (1 + f^*_{ij}(r))\sum_{n=1}^{\infty}\frac{1}{n!}\left(-\frac{\beta e_ie_j}{\varepsilon_0 r}\right)^n \tag{A2.3.210}$$

where $f^*_{ij}(r)$ is the corresponding Mayer $f$-function for the short-range potential $u^*_{ij}(r)$, which we represent
graphically as an $f^*$-bond, and $\beta = 1/kT$. Then the above expansion can be represented graphically as
[Graphical representation of the expansion] (A2.3.211)

where the $f$-bond represents $f_{ij}(r)$, the Mayer $f$-function for the pair potential $u_{ij}(r)$, and the Coulomb bond
represents the Coulomb potential multiplied by $-\beta$. The graphical representation of the virial coefficients in
terms of Mayer $f$-bonds can now be replaced by an expansion in terms of $f^*$-bonds and Coulomb bonds.
Each $f$-bond is replaced by an $f^*$-bond and the sum of one or more Coulomb bonds in parallel with or without
an $f^*$-bond in parallel. The virial coefficients then have the following graphical representation:
[Graphical expansion of the virial coefficients] (A2.3.212)

where HT stands for higher-order terms. There is a symmetry number associated with each graph which we
do not need to consider explicitly in this discussion. Each black circle denotes summation over the
concentration $c_i$ and integration over the coordinates of species $i$. The sum over all graphs in which the $f$-bond
is replaced by an $f^*$-bond gives the free energy $A^{ex*}$ of the corresponding uncharged system. The effect of the
Coulomb potential on the expansion is more complicated because of its long range. The second term in the
expansion of the second virial coefficient is the bare Coulomb bond multiplied by $-\beta$. If we multiply this by a
screening function and carry out the integration the result is finite, but it contributes nothing to the overall free
energy because of electroneutrality. This is because the contribution of the charge $e_i$ from the single Coulomb
bond at a vertex when multiplied by $c_i$ and summed over $i$ is zero. The result for a graph with a cycle of
Coulomb bonds, however, is finite. Each vertex in these graphs has two Coulomb bonds leading into it and
instead of $\sum_i c_ie_i$ we have $\sum_i c_ie_i^2$ (which appears as a factor in the definition of $\kappa^2$). This is not zero unless the
ion concentration is also zero. Mayer summed all graphs with cycles of Coulomb bonds
and found that this leads to the Debye-Hückel limiting law expression for the excess free energy! The
essential mechanism behind this astonishing result is that the long-range nature of the Coulomb interaction
requires that the ions be considered collectively rather than in pairs, triplets etc, which is implied by the
conventional virial expansion. The same mechanism is also responsible for the modification of the interaction
between two charges by the presence of others, which is called 'screening'. The sum of all chains of Coulomb
bonds between two ions represents the direct interaction as well as the sum of indirect interactions (of the
longest range) through other ions. The latter is a subset of the graphs which contribute to the correlation
function $h_{ij}(r) = g_{ij}(r) - 1$ and has the graphical representation

[Sum of chains of Coulomb bonds between ions $i$ and $j$]

Explicit calculation of this sum shows that it is the Debye screened potential

$$q_{ij}(r) = -\beta e_ie_j\exp(-\kappa r)/\varepsilon_0 r. \tag{A2.3.213}$$
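The chain sum can be sketched in Fourier space. This is a reconstruction under stated assumptions and not necessarily the route taken in the original derivation: Gaussian units are used, the bare Coulomb bond $-\beta e_ie_j/\varepsilon_0 r$ has the transform $-4\pi\beta e_ie_j/\varepsilon_0k^2$, and each intermediate vertex contributes a factor $\sum_l c_le_l^2$. A chain with $n$ intermediate vertices is then a product of $n + 1$ bond transforms, and the chains form a geometric series:

```latex
\begin{aligned}
\hat q_{ij}(k)
  &= -\frac{4\pi\beta e_ie_j}{\varepsilon_0k^2}
     \sum_{n=0}^{\infty}\left(-\frac{4\pi\beta}{\varepsilon_0k^2}\sum_l c_le_l^2\right)^{n}
   = -\frac{4\pi\beta e_ie_j}{\varepsilon_0k^2}
     \sum_{n=0}^{\infty}\left(-\frac{\kappa^2}{k^2}\right)^{n} \\
  &= -\frac{4\pi\beta e_ie_j}{\varepsilon_0\,(k^2+\kappa^2)}, \\
q_{ij}(r)
  &= -\frac{\beta e_ie_j}{\varepsilon_0}\,\frac{\exp(-\kappa r)}{r}.
\end{aligned}
```

The series converges term by term only for $k > \kappa$; the closed form is taken as its resummed value for all $k$. The last line uses the fact that $4\pi/(k^2 + \kappa^2)$ is the Fourier transform of $\exp(-\kappa r)/r$, so each divergent bare bond is replaced by a finite screened one.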

Going beyond the limiting law it is found that the modified (or renormalized) virial coefficients in Mayer's
theory of electrolytes are functions of the concentration through their dependence on $\kappa$. The ionic second
virial coefficient $B_2(\kappa)$ is given by [62]




(A2.3.214) 


-^■(,)72]dr. 


This expression contains the contribution of the short-range potential included earlier in $A^{ex*}$, so that the
excess free energy, to this level of approximation, is


/ A™ \ k 3 B 2 (k) 

\ = + - • . (A2.3.215) 

\NkT/ mLL _ Bl line c 

This is called the DHLL+$B_2$ approximation. On carrying out the integrations over $q_{ij}(r)$ and $q_{ij}(r)^2/2$ and using
the electroneutrality condition, this can be rewritten as [63]

= t (A2.3.216) 
where


S 2 (X)= -T^D^V / K«p(-jS«y(0+wW) " l]dr. (A2.3.217) 

1 r=l j = l J 


This has the form of a second virial coefficient in which the Debye screened potential has replaced the 
Coulomb potential. Expressions for the other excess thermodynamic properties are easily derived. 

Mayer's theory is formally exact within the radius of convergence of the virial series and it predicts the 
properties characteristic of all charge types without the need to introduce any additional assumptions. 
Unfortunately, the difficulty in calculating the higher virial coefficients limits the range of concentrations to 
which the theory can be applied with precision. The DHLL+$B_2$ approximation is qualitatively correct in
reproducing the association effects observed at low concentrations for higher valence electrolytes and the 
excluded volume effects observed for all electrolytes at higher concentrations. 

(e) The MS approximation 

The MS approximation for the RPM, i.e. charged hard spheres of the same size in a continuum dielectric, was
solved by Waisman and Lebowitz [46] using Laplace transforms. The solutions can also be obtained [47] by
an extension of Baxter's method to solve the PY approximation for hard spheres and sticky hard spheres. The
method can be further extended to solve the MS approximation for unsymmetrical electrolytes (with hard
cores of unequal size) and weak electrolytes, in which chemical bonding is mimicked by a delta function
interaction. We discuss the solution to the MS approximation for the symmetrically charged RPM electrolyte.

For the RPM of an electrolyte the MS approximation is

$$c_{ij}(r) = -\beta u_{ij}(r) = -\beta e_ie_j/\varepsilon_0 r \quad \text{for } r > \sigma \tag{A2.3.218}$$

with the exact relation

$$h_{ij}(r) = g_{ij}(r) - 1 = -1 \quad \text{for } r < \sigma. \tag{A2.3.219}$$

The generalization of the Ornstein-Zernike equation to a mixture is

$$h_{ij}(r_{12}) = c_{ij}(r_{12}) + \sum_k \rho_k\int c_{ik}(r_{13})h_{kj}(r_{32})\,d\mathbf{r}_3 \tag{A2.3.220}$$

where $i$ and $j$ refer to the ionic species (positive and negative ions), $\rho_i$ is the concentration (or number density)
of the $i$th species and the sum runs over all ionic species. Taking Fourier transforms and using the convolution
theorem puts this in matrix form




$$\mathbf{H} = \mathbf{C} + \mathbf{H}\mathbf{P}\mathbf{C} \tag{A2.3.221}$$

where $\mathbf{H}$ and $\mathbf{C}$ are matrices whose elements are the Fourier transforms of $h_{ij}$ and $c_{ij}$, and $\mathbf{P}$ is a diagonal 
matrix whose elements are the concentrations $\rho_i$ of the ions. The correlation function matrix is symmetric 
since $c_{+-} = c_{-+}$ and $h_{+-} = h_{-+}$. The RPM symmetrically charged electrolyte has the additional simplification 

$$|e_+| = |e_-| = e, \qquad \rho_+ = \rho_- = \rho/2 \tag{A2.3.222}$$

and 

$$c_{++} = c_{--}, \qquad h_{++} = h_{--} \tag{A2.3.223}$$

where $e$ is the magnitude of the charge and $\rho$ is the total ion concentration. Defining the sum and difference 
functions 

$$F_S = (F_{++} + F_{+-})/2 \quad \text{and} \quad F_D = (F_{+-} - F_{++})/2 \tag{A2.3.224}$$

of the direct and indirect correlation functions $c_{ij}$ and $h_{ij}$, the Ornstein-Zernike equation separates into two 
equations 

$$h_S = c_S + \rho\, c_S * h_S \tag{A2.3.225}$$

$$h_D = c_D - \rho\, c_D * h_D \tag{A2.3.226}$$

where $*$ stands for a convolution integral and the core condition is replaced by 

$$h_S = -1, \qquad h_D = 0 \qquad \text{for } 0 < r < \sigma. \tag{A2.3.227}$$
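
The separation into sum and difference equations can be checked by hand. The following is only a sketch: it writes out the $++$ and $+-$ components of the mixture Ornstein-Zernike equation under the RPM symmetry conditions ($\rho_+ = \rho_- = \rho/2$, $c_{++} = c_{--}$, $h_{++} = h_{--}$), uses $*$ for the convolution, and takes the difference functions as $F_D = (F_{+-} - F_{++})/2$.

```latex
\begin{align*}
h_{++} &= c_{++} + \tfrac{\rho}{2}\left(c_{++} * h_{++} + c_{+-} * h_{+-}\right), \\
h_{+-} &= c_{+-} + \tfrac{\rho}{2}\left(c_{++} * h_{+-} + c_{+-} * h_{++}\right). \\[1ex]
\text{Half the sum:}\quad
h_S &= c_S + \tfrac{\rho}{4}\,(c_{++} + c_{+-}) * (h_{++} + h_{+-})
     = c_S + \rho\, c_S * h_S, \\
\text{Half the difference:}\quad
h_D &= c_D - \tfrac{\rho}{4}\,(c_{+-} - c_{++}) * (h_{+-} - h_{++})
     = c_D - \rho\, c_D * h_D .
\end{align*}
```

The minus sign in the second equation comes entirely from the choice of sign in the definition of the difference functions; with the opposite choice both equations would carry a plus sign.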


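The MS solution for $c_S$ turns out to be identical to the MS (or PY) approximation for hard spheres of diameter 
$\sigma$; it is a cubic polynomial in $r/\sigma$. The solution for $c_D$ is given by 

$$c_D(r) = \begin{cases} \dfrac{\beta e^2}{\varepsilon_0 \sigma}\left[2B - B^2\left(\dfrac{r}{\sigma}\right)\right] & 0 < r < \sigma \\[2ex] \dfrac{\beta e^2}{\varepsilon_0 r} & r > \sigma \end{cases} \tag{A2.3.228}$$

where 

$$B = \frac{(1 + x) - (1 + 2x)^{1/2}}{x} \tag{A2.3.229}$$

and $x = \kappa\sigma$. The excess energy of a fully dissociated strong electrolyte in the MS approximation is 

$$\frac{E^{\mathrm{ex}}}{NkT} = -\frac{x\left[1 + x - (1 + 2x)^{1/2}\right]}{4\pi\rho\sigma^3}. \tag{A2.3.230}$$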


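Integration with respect to $\beta$, from $\beta = 0$ to finite $\beta$, leads to the excess Helmholtz free energy: 

$$\frac{A^{\mathrm{ex}}}{NkT} = \frac{A^{\mathrm{ex,HS}}}{NkT} - \frac{\left[6x + 3x^2 + 2 - 2(1 + 2x)^{3/2}\right]}{12\pi\rho\sigma^3} \tag{A2.3.231}$$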


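where $A^{\mathrm{ex,HS}}$ is the excess free energy of hard spheres. The osmotic coefficient follows from this and is given 
by 

$$\phi_E = \phi^{\mathrm{HS}} + \frac{\left[3x + 3x(1 + 2x)^{1/2} - 2(1 + 2x)^{3/2} + 2\right]}{12\pi\rho\sigma^3} \tag{A2.3.232}$$

where $\phi^{\mathrm{HS}}$ is the osmotic coefficient of the uncharged hard spheres of diameter $\sigma$ in the MS or PY 
approximation. The excess Helmholtz free energy is related to the mean activity coefficient $\gamma_\pm$ by 

$$\frac{A^{\mathrm{ex}}}{NkT} = 1 - \phi + \ln\gamma_\pm \tag{A2.3.233}$$

and the activity coefficient from the energy equation, calculated from $\phi_E$ and $A^{\mathrm{ex}}$, is given by 

$$\ln\gamma_\pm^{E} = \ln\gamma_\pm^{\mathrm{HS}} - \frac{x\left[1 + x - (1 + 2x)^{1/2}\right]}{4\pi\rho\sigma^3}. \tag{A2.3.234}$$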

The second term on the right is $E^{\mathrm{ex}}/NkT$. This is true for any theory that predicts $\beta(A^{\mathrm{ex}} - A^{\mathrm{ex,HS}})/V$ as a function 
of $x = \kappa\sigma$ only, which is the case for the MS approximation. 

The thermodynamic properties calculated by different routes are different, since the MS solution is an 
approximation. The osmotic coefficients from the virial pressure, compressibility and energy equations are not 
the same. Of these, the energy equation is the most accurate by comparison with the computer simulations of 
Card and Valleau [63]. The osmotic coefficients from the virial and compressibility equations are 

$$\phi_v = \phi_v^{\mathrm{HS}} - \frac{x^2 B}{12\pi\rho\sigma^3} \tag{A2.3.235}$$

$$\phi_c = \phi_c^{\mathrm{HS}}. \tag{A2.3.236}$$

In the limit of zero ion size, i.e. as $\sigma \to 0$, the distribution functions and thermodynamic functions in the MS 
approximation become identical to the Debye-Hückel limiting law. 
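
The closed-form MS results for the RPM are easy to evaluate numerically. The sketch below is illustrative only: it assumes the energy-route expressions $E^{\mathrm{ex}}/NkT = -x[1 + x - (1 + 2x)^{1/2}]/4\pi\rho\sigma^3$, $(A^{\mathrm{ex}} - A^{\mathrm{ex,HS}})/NkT = -[6x + 3x^2 + 2 - 2(1 + 2x)^{3/2}]/12\pi\rho\sigma^3$ and $\phi_E - \phi^{\mathrm{HS}} = [3x + 3x(1 + 2x)^{1/2} - 2(1 + 2x)^{3/2} + 2]/12\pi\rho\sigma^3$ with $x = \kappa\sigma$, and the function names are hypothetical. It checks that the energy approaches the Debye-Hückel limiting law $-\kappa^3/8\pi\rho$ at small $x$ and that the sum of the free energy and osmotic terms reproduces the energy term, as required by the activity coefficient relation.

```python
import math

def msa_B(x):
    """B = [(1 + x) - sqrt(1 + 2x)] / x, with x = kappa * sigma."""
    return ((1.0 + x) - math.sqrt(1.0 + 2.0 * x)) / x

def msa_excess(x, rho_sigma3):
    """Electrical parts of the MS thermodynamics of the RPM (energy route).

    x          : kappa * sigma (dimensionless)
    rho_sigma3 : reduced total ion density rho * sigma**3
    Returns (E_ex/NkT, (A_ex - A_ex_HS)/NkT, phi_E - phi_HS).
    """
    s = math.sqrt(1.0 + 2.0 * x)
    energy = -x * (1.0 + x - s) / (4.0 * math.pi * rho_sigma3)
    free_energy = -(6.0 * x + 3.0 * x**2 + 2.0 - 2.0 * s**3) / (12.0 * math.pi * rho_sigma3)
    osmotic = (3.0 * x + 3.0 * x * s - 2.0 * s**3 + 2.0) / (12.0 * math.pi * rho_sigma3)
    return energy, free_energy, osmotic

def debye_huckel_energy(x, rho_sigma3):
    """Debye-Hueckel limiting law, E_ex/NkT = -kappa^3 / (8 pi rho)."""
    return -x**3 / (8.0 * math.pi * rho_sigma3)

if __name__ == "__main__":
    for x in (1e-3, 0.1, 1.0):
        e, a, phi = msa_excess(x, rho_sigma3=0.01)
        # a + phi - e should vanish: ln(gamma) - ln(gamma_HS) equals E_ex/NkT
        print(x, e, debye_huckel_energy(x, 0.01), a + phi - e)
```

The reduced density 0.01 used in the loop is an arbitrary illustrative value, not one taken from the text.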

(f) The HNC approximation 

The solutions to this approximation are obtained numerically. Fast Fourier transform methods and a 
reformulation of the HNC (and other integral equation approximations) in terms of the screened Coulomb 
potential by Allnatt [64] are especially useful in the numerical solution. Figure A2.3.12 compares the osmotic 
coefficient of a 1-1 RPM electrolyte at 25°C with each of the available Monte Carlo calculations of Card and 
Valleau [63]. 

Figure A2.3.12 The osmotic coefficient of a 1-1 RPM electrolyte compared with the Monte Carlo results of 
[63]. 




The agreement is excellent up to a 1 molar concentration. The excess energies for 1-1, 2-1, 2-2 and 3-1 
charge types calculated from the MS and HNC approximations are shown in figure A2.3.13. The Monte Carlo 
results for 2-2 and 3-1 electrolytes are also shown in the same figure. The agreement is good, even for the 
energies of the higher valence electrolytes. However, as illustrated in figure A2.3.14, the HNC and MS 
approximations deteriorate in accuracy as the charges on the ions are increased [67]. 

Figure A2.3.13 The excess energy of 1-1, 2-1, 3-1 and 2-2 RPM electrolytes in water at 25°C. The full and 
dashed curves are from the HNC and MS approximations, respectively. The Monte Carlo results of Card and 
Valleau [63] for the 3-1 and 2-2 charge types are also shown. 

Figure A2.3.14 Osmotic coefficients for 1-1, 2-1 and 3-1 RPM electrolytes according to the MS and HNC 
approximations. 

The osmotic coefficients from the HNC approximation were calculated from the virial and compressibility 
equations; the discrepancy between $\phi_v$ and $\phi_c$ is a measure of the accuracy of the approximation. The osmotic 
coefficients calculated via the energy equation in the MS approximation are comparable in accuracy to the 
HNC approximation for low valence electrolytes. Figure A2.3.15 shows deviations from the Debye-Hückel 
limiting law for the energy and osmotic coefficient of a 2-2 RPM electrolyte according to several theories. 
The negative deviations from the limiting law are reproduced by the HNC and DHLL + $B_2$ equations but not 
by the MS approximation. 

Figure A2.3.15 Deviations ($\Delta$) of the heat of dilution $E^{\mathrm{ex}}/I$ and the osmotic coefficient $\phi$ from the Debye-Hückel 
limiting law for 1-1 and 2-2 RPM electrolytes according to the DHLL + $B_2$, HNC and MS approximations. 

In figure A2.3.16 the theoretical HNC osmotic coefficients for a range of ion size parameters in the primitive 
model are compared with experimental data for the osmotic coefficients of several 1-1 electrolytes at 25°C. 
Choosing $a_{+-} = r_+ + r_-$ to fit the data at low concentrations, it is found that the calculated osmotic coefficients 
are too large at the higher concentrations. On choosing $a_{+-}$ to be the sum of the Pauling radii of the ions, and a 
short-range potential given by a square well or mound $d_{ij}$ whose width equals that of a water molecule (2.76 Å), it is 
found that the osmotic coefficients can be fitted to the accuracy shown in figure A2.3.17 [65]. There are other 
models for the short-range potential which produce comparable fits for the osmotic coefficients, showing that 
the square well approximation is by no means unique [66]. 

Figure A2.3.16 Theoretical HNC osmotic coefficients for a range of ion size parameters in the primitive 
model compared with experimental data for the osmotic coefficients of several 1-1 electrolytes at 25°C. The 
curves are labelled according to the assumed value of $a_{+-} = r_+ + r_-$. 

Figure A2.3.17 Theoretical (HNC) calculations of the osmotic coefficients for the square well model of an 
electrolyte compared with experimental data for aqueous solutions at 25°C. The parameters for this model are 
$a_{+-} = r_+(\text{Pauling}) + r_-(\text{Pauling})$, $d_{++} = d_{--} = 0$ and $d_{+-}$ as indicated in the figure. 


A2.3.5.2 WEAK ELECTROLYTES 

In a weak electrolyte (e.g. an aqueous solution of acetic acid) the solute molecules AB are incompletely 
dissociated into ions A$^+$ and B$^-$ according to the familiar chemical equation 

$$\mathrm{AB} = \mathrm{A}^+ + \mathrm{B}^-. \tag{A2.3.237}$$

The forces binding the atoms in AB are chemical in nature and must be introduced, at least approximately, in 
the Hamiltonian in a theoretical treatment of this problem. The binding between A and B in the dimer AB is 
quite distinct from the formation of ion pairs in higher valence electrolytes (e.g. aqueous solutions of ZnSO$_4$ 
at room temperature) where the Coulomb interactions between the ions lead to ion pairs which account for the 
anomalous conductance and activity coefficients at low concentration. The greater shielding of the ion charges 
with increasing electrolyte concentration would induce the ion pairs to dissociate as the concentration rises, 
whereas the dimer population produced by the chemical bonding represented in the above chemical reaction 
would increase with the concentration of the solution. 




Weak electrolytes in which dimerization (as opposed to ion pairing) is the result of chemical bonding between 
oppositely charged ions have been studied using a sticky electrolyte model (SEM). In this model, a delta 
function interaction is introduced in the Mayer $f$-function for the oppositely charged ions at a distance $L \le \sigma$, 
where $\sigma$ is the hard sphere diameter. The delta function mimics bonding and the Mayer $f$-function becomes 

$$f_{+-}(r) = -1 + \frac{L\zeta\,\delta(r - L)}{12} \qquad r < \sigma \tag{A2.3.238}$$

where $\zeta$ is the sticking coefficient. This induces a delta function in the correlation function $h_{+-}(r)$ for 
oppositely charged ions with a different coefficient $\lambda$: 

$$h_{+-}(r) = -1 + \frac{L\lambda\,\delta(r - L)}{12} \qquad r < \sigma. \tag{A2.3.239}$$

The interaction between ions of the same sign is assumed to be a pure hard sphere repulsion for $r < \sigma$. It 
follows from simple steric considerations that an exact solution will predict dimerization only if $L < \sigma/2$, but 
polymerization may occur for $\sigma/2 < L \le \sigma$. However, an approximate solution may not reveal the full extent 
of polymerization that occurs in a more accurate or exact theory. Cummings and Stell [69] used the model to 
study chemical association of uncharged atoms. It is closely related to the model for adhesive hard spheres 
studied by Baxter [70]. 

The association 'constant' $K$ defined by $K = \rho_{AB}/\rho_+\rho_-$ is 

$$K = \frac{\pi\lambda L^3}{3\left(1 - \langle N\rangle\right)^2}$$

where the average number of dimers $\langle N\rangle = \eta\lambda(L/\sigma)^3$ and $\eta = \pi\rho\sigma^3/6$, in which $\rho$ is the total ionic density. We 
can now distinguish three different cases: 

$\lambda = 0$: no dimers; strong electrolyte (RPM) 

$\lambda = (\sigma/L)^3/\eta$: all dimers; if $L < \sigma/2$, dipolar dumb-bells 

$0 < \lambda < (\sigma/L)^3/\eta$: ions + dimers; weak electrolyte (SEM). 
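
The three regimes follow directly from $\langle N\rangle = \eta\lambda(L/\sigma)^3$, and a few lines of code make the bookkeeping explicit. This is a sketch under stated assumptions rather than the procedure of the cited work: the function names are hypothetical, and the association constant is evaluated by assuming that the dimer density is $\rho\langle N\rangle/2$ and that the free ion densities are $\rho_+ = \rho_- = \rho(1 - \langle N\rangle)/2$.

```python
import math

def packing_fraction(rho, sigma):
    """eta = pi * rho * sigma^3 / 6 for total ion density rho."""
    return math.pi * rho * sigma**3 / 6.0

def mean_dimer_number(lam, rho, sigma, L):
    """<N> = eta * lambda * (L / sigma)^3."""
    return packing_fraction(rho, sigma) * lam * (L / sigma) ** 3

def lambda_full_association(rho, sigma, L):
    """Value of lambda at which <N> = 1, i.e. (sigma / L)^3 / eta."""
    return (sigma / L) ** 3 / packing_fraction(rho, sigma)

def association_constant(lam, rho, sigma, L):
    """K = rho_AB / (rho_+ rho_-).

    Assumption (not stated in the text): rho_AB = rho <N> / 2 and
    rho_+ = rho_- = rho (1 - <N>) / 2.
    """
    n = mean_dimer_number(lam, rho, sigma, L)
    if not 0.0 <= n < 1.0:
        raise ValueError("K is finite only for 0 <= <N> < 1")
    return 2.0 * n / (rho * (1.0 - n) ** 2)

def regime(lam, rho, sigma, L):
    """Classify the system by the value of the association parameter lambda."""
    lam_max = lambda_full_association(rho, sigma, L)
    if lam == 0.0:
        return "no dimers: strong electrolyte (RPM)"
    if math.isclose(lam, lam_max):
        return "all dimers"
    if 0.0 < lam < lam_max:
        return "ions + dimers: weak electrolyte (SEM)"
    raise ValueError("lambda outside the physical range")
```

With these assumptions the computed $K$ coincides with $\pi\lambda L^3/3(1 - \langle N\rangle)^2$, which gives a quick consistency check on the definitions of $\eta$ and $\langle N\rangle$.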

Either the same or different approximations may be used to treat the binding at r = L and the remaining 
electrical interactions between the ions. The excess energy of the sticky electrolyte is given by 


/C'\ 


(Af)3 


NkT 2 dfi 
where 


In* / tUri£o\ kH 

— - - [ + A2.3.241) 

W \ 3\iiTj 2 


« = 'L 


hcjfrjrdr (A2.3.242) 
and $h_D(r) = [h_{+-}(r) - h_{++}(r)]/2$. The first term is the binding energy and the second is the energy due to the 
interactions between the charges which can be determined analytically in the MS approximation and 
numerically in the HNC approximation. For any integer $n = \sigma/L$, $H' = H/\sigma$ in the MS approximation has the 
form [48] 

$$H' = \frac{a_1 + a_2 x - \left(a_1^2 + 2a_3 x\right)^{1/2}}{24 a_4 \eta} \tag{A2.3.243}$$

where $a_i$ ($i$ = 1 to 4) are functions of the reduced ion concentration $\eta$, the association parameter $\lambda$ and $n$. When 
$\lambda = 0$, $a_i = 1$, the average number of dimers $\langle N\rangle = 0$ and the energy of the RPM strong electrolyte in the MS 
approximation discussed earlier is recovered. A hard sphere solvent enhances the association parameter $\lambda$ of a 
weak electrolyte through the packing effect of the solvent, while adding a dipole to the solvent has the 
opposite effect [71]. 

The PY approximation for the binding leads to negative results for $\lambda$; the HNC approximation for this is 
satisfactory. Figure A2.3.18 shows the excess energy as a function of the weak electrolyte concentration for 
the RPM and SEM for a 2-2 electrolyte. 

Figure A2.3.18 The excess energy $E^{\mathrm{ex}}$ in units of $NkT$ as a function of the concentration $c_t$ for the RPM and 
SEM 2-2 electrolyte. The curves and points are results of the HNC/MS and HNC approximations, 
respectively, for the binding and the electrical interactions. The ion parameters are $\sigma$ = 4.2 Å, and $\varepsilon$ = 73.4. 
The sticking coefficients $\zeta$ = 1.6*10 and 2.44*10 for $L = \sigma/2$ and $\sigma/3$, respectively. 
In the limit $\lambda = (\sigma/L)^3/\eta$ with $L < \sigma/2$, the system should consist of dipolar dumb-bells. The asymptotic form 
of the direct correlation function (defined through the Ornstein-Zernike equation) for this system (in the 
absence of a solvent) is given by 

$$c_{ij}(r) = -\frac{\beta A e_i e_j}{r} \tag{A2.3.244}$$

where $A = \varepsilon/(\varepsilon - 1)$ and $\varepsilon$ is the dielectric constant of the system of dipolar dumb-bells. The energy of dipolar 
dumb-bells, excluding the binding energy, in the MS approximation is [48] 


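$$\frac{E^{\mathrm{ex}}}{N_D kT} = -\frac{x'\left[c_1 + c_2 x' - \left(c_1^2 + 2c_3 x'\right)^{1/2}\right]}{24\eta} \tag{A2.3.245}$$

where $x'$ is the reduced dipole moment defined by 

$$x' = \kappa\sigma = 2n\left(\frac{\pi\rho}{kT}\right)^{1/2}\mu, \tag{A2.3.246}$$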


dipole moment $\mu = eL = e\sigma/n$, $N_D = N/2$ is the number of dipoles, the coefficients $c_i$ ($i$ = 1 to 3) depend on the 
dipole elongation and $n$ is an integer. This provides an analytic solution for the energy of dipolar dumb-bells 
in the MS approximation; it suffers from the defect that it tends to a small but finite constant in the limit of 
zero density and should strictly be applicable only for $L < \sigma/2$. 


A2.3.6 PERTURBATION THEORY 


The attractive dispersive forces between the atoms of a simple homogeneous fluid increase their cohesive 
energy, but their vector sums nearly cancel, producing little alteration in the structure determined primarily by 
the repulsive part of the interatomic potential. Charges, dipoles and hydrogen bonding, as in water molecules, 
increase the cohesive energy of molecules and produce structural changes. Despite this, the harsh interatomic 
repulsions dominate the structure of simple fluids. This observation forms the physical basis of perturbation 
theory. Van der Waals implicitly used this idea in his equation of state in which the attractive part of the 
interaction is treated as a perturbation to the repulsive part in a mean-field approximation. 

In perturbation theories of fluids, the total pair potential is divided into a reference part and a perturbation 

$$u(1, 2) = u^0(1, 2) + w(1, 2) \tag{A2.3.247}$$

where $u^0(1, 2)$ is the pair potential of the reference system which usually has the features that determine the 
size and shape of the molecules, while the perturbation $w(1, 2)$ contains dispersive and attractive components 
which provide the cohesive energy of the fluid. The equilibrium properties of the system are calculated by 
expansion in a suitable parameter about the reference system, whose properties are assumed known to the 
extent that is necessary. 

The reference system may be anisotropic, i.e. with $u^0(1, 2) = u^0(r_{12}, \Omega_1, \Omega_2)$, where $(\Omega_1, \Omega_2)$ represent the 
angular coordinates of atoms 1 and 2, or it may be isotropic when $u^0(1, 2) = u^0(r_{12})$. 

The most common choice for a reference system is one with hard cores (e.g. hard spheres or hard spheroidal 
particles) whose equilibrium properties are necessarily independent of temperature. Although exact results are 
lacking in three dimensions, excellent approximations for the free energy and pair correlation functions of 
hard spheres are now available to make the calculations feasible. 

The two principal methods of expansion used in perturbation theories are the high-temperature $\lambda$ expansion of 
Zwanzig [72], and the $\gamma$ expansion introduced by Hemmer [73]. In the $\lambda$ expansion, the perturbation $w(1, 2)$ 
is modulated by the switching parameter $\lambda$ which varies between 0 and 1, thereby turning on the perturbation. 
The free energy is expanded in powers of $\lambda$, and reduces to that of the reference system when $\lambda = 0$. In the $\gamma$ 
expansion, the perturbation is long ranged of the form $w(r) = -\gamma^3\phi(\gamma r)$, and the free energy is expanded in 
powers of $\gamma$ about $\gamma = 0$. In the limit as $\gamma \to 0$, the free energy reduces to a mean-field van der Waals-like 
equation. The $\gamma$ expansion is especially useful in understanding long-range perturbations, such as Coulomb 
and dipolar interactions, but difficulties in its practical implementation lie in the calculation of higher-order 
terms in the expansion. Another perturbation approach is the mode expansion of Andersen and Chandler [74], 
in which the configurational integral is expanded in terms of collective coordinates that are the Fourier 
transforms of the particle densities. The expansion is especially useful for electrolytes and has been optimized 
and improved by adding the correct second virial coefficient. Combinations of the $\lambda$ and $\gamma$ expansions, the 
union of the $\lambda$ and virial expansions and other improvements have also been discussed in the literature. Our 
discussion will be mainly confined to the $\lambda$ expansion and to applications of perturbation theory to 
determining free energy differences by computer simulation. We conclude the section with a brief discussion 
of perturbation theory of inhomogeneous fluids. 

A2.3.6.1 THE $\lambda$ EXPANSION 

The first step is to divide the total potential into two parts: a reference part and the remainder treated as a 
perturbation. A coupling parameter $\lambda$ is introduced to serve as a switch which turns the perturbation on or off. 
The total potential energy of $N$ particles in a given configuration $(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ is 

$$U_N(\mathbf{r}_1, \ldots, \mathbf{r}_N; \lambda) = U_N^0(\mathbf{r}_1, \ldots, \mathbf{r}_N) + \lambda W_N(\mathbf{r}_1, \ldots, \mathbf{r}_N) \tag{A2.3.248}$$

where $0 \le \lambda \le 1$, $U_N^0(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ is the reference potential and $W_N(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ is the perturbation. When $\lambda$ is 
zero the perturbation is turned off, and it is on when $\lambda = 1$. 

The configurational PF is 

$$Z(N, V, T; \lambda) = \int \exp(-\beta U_N(\mathbf{r}^N; \lambda))\, d\mathbf{r}^N = \int \exp(-\beta U_N^0(\mathbf{r}^N)) \exp(-\beta\lambda W_N(\mathbf{r}^N))\, d\mathbf{r}^N. \tag{A2.3.249}$$

Multiplying and dividing by the configurational PF of the reference system 

$$Z^0(N, V, T) = \int \exp(-\beta U_N^0(\mathbf{r}^N))\, d\mathbf{r}^N \tag{A2.3.250}$$

one finds that 

$$Z(N, V, T; \lambda) = Z^0(N, V, T)\, \langle \exp(-\beta\lambda W_N(\mathbf{r}^N)) \rangle_0 \tag{A2.3.251}$$

where $\langle \ldots \rangle_0$ is an average over the reference system. The Helmholtz free energy is 

$$A(N, V, T; \lambda) = -kT \ln Q(N, V, T; \lambda) = -kT \ln \frac{Z(N, V, T; \lambda)}{N!\, \Lambda^{3N}}. \tag{A2.3.252}$$

It follows that the change in Helmholtz free energy due to the perturbation is 

$$A(N, V, T; \lambda) - A^0(N, V, T) = -kT \ln \langle \exp(-\beta\lambda W_N(\mathbf{r}^N)) \rangle_0. \tag{A2.3.253}$$

This equation was first derived by Zwanzig [72]. Note that $\beta$ and $\lambda$ always occur together. Expanding about $\lambda$ 
= 0 at constant $\beta$ (or equivalently about $\beta = 0$ at constant $\lambda$) one finds 

$$-\beta \Delta A(\lambda) = \ln \langle \exp(-\beta\lambda W_N(\mathbf{r}^N)) \rangle_0 = \sum_{n=1}^{\infty} a_n (-\beta\lambda)^n \tag{A2.3.254}$$

which defines the coefficients $a_n$. By comparing the coefficients of $(-\beta\lambda)^n$ for different $n$, one finds 

$$a_1 = \langle W_N \rangle_0, \qquad a_2 = \frac{1}{2!}\left[\langle W_N^2 \rangle_0 - \langle W_N \rangle_0^2\right], \qquad a_3 = \frac{1}{3!}\left[\langle W_N^3 \rangle_0 - 3\langle W_N \rangle_0 \langle W_N^2 \rangle_0 + 2\langle W_N \rangle_0^3\right] \quad \text{etc} \tag{A2.3.255}$$

where the averages are over the reference system whose properties are assumed to be known. 

The first term in the high-temperature expansion, $a_1$, is essentially the mean value of the perturbation 
averaged over the reference system. It provides a strict upper bound for the free energy called the Gibbs-Bogoliubov 
inequality. It follows from the observation that $\exp(-y) \ge 1 - y$; applying this with $y = x - \langle x \rangle$ and averaging 
gives $\langle \exp(-x) \rangle \ge \exp(-\langle x \rangle)$, which implies that $\ln \langle \exp(-x) \rangle \ge -\langle x \rangle$. Hence 

$$\ln \langle \exp(-\beta\lambda W_N(\mathbf{r}^N)) \rangle_0 \ge -\beta\lambda \langle W_N(\mathbf{r}^N) \rangle_0.$$

Multiplying by $-1$ reverses this inequality, and we see that 

$$\beta \Delta A(\lambda) \le \beta\lambda \langle W_N(\mathbf{r}^N) \rangle_0 = \beta\lambda a_1 \tag{A2.3.256}$$

which proves the result. The higher-order terms in the high-temperature expansion represent fluctuations 
about the mean. 
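
Zwanzig's formula and the Gibbs-Bogoliubov bound can be checked on a model small enough to integrate directly. The sketch below is purely illustrative and is not taken from the cited work: it assumes a single coordinate $x$ with reference energy $U^0 = x^2/2$ and perturbation $W = kx^2/2$, for which the exact free energy change is $\Delta A = (2\beta)^{-1}\ln(1 + \lambda k)$, and it evaluates the averages by a simple sum on a grid.

```python
import math

def zwanzig_check(beta=1.0, lam=1.0, k=1.0, half_width=12.0, n=6001):
    """Return (Delta A from Zwanzig's average, first-order bound, exact Delta A).

    Toy model (an assumption for illustration): one coordinate x with
    reference energy U0 = x^2 / 2 and perturbation W = k x^2 / 2.
    """
    dx = 2.0 * half_width / (n - 1)
    xs = [-half_width + i * dx for i in range(n)]
    w = [0.5 * k * x * x for x in xs]
    boltz0 = [math.exp(-beta * 0.5 * x * x) for x in xs]   # reference weights
    z0 = sum(boltz0) * dx
    mean_exp = sum(b * math.exp(-beta * lam * wi) for b, wi in zip(boltz0, w)) * dx / z0
    mean_w = sum(b * wi for b, wi in zip(boltz0, w)) * dx / z0
    delta_a = -math.log(mean_exp) / beta            # -kT ln <exp(-beta lam W)>_0
    bound = lam * mean_w                            # lam * a_1, the upper bound
    exact = 0.5 * math.log(1.0 + lam * k) / beta    # Gaussian integrals done by hand
    return delta_a, bound, exact

if __name__ == "__main__":
    print(zwanzig_check())
```

For the default parameters the first-order estimate is 0.5 in units of $kT$, which lies above the exact value $\tfrac{1}{2}\ln 2$, as the inequality requires.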

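Assuming the perturbing potential is pairwise additive, 

$$W_N(\mathbf{r}^N) = \sum_{i<j} w_{ij}(r_{ij}) \tag{A2.3.257}$$

we have 

$$\begin{aligned} a_1 = \langle W_N \rangle_0 &= \frac{\int \cdots \int \sum_{i<j} w_{ij}(r_{ij}) \exp(-\beta U_N^0(\mathbf{r}^N))\, d\mathbf{r}^N}{Z^0(N, V, T)} \\ &= \frac{N(N-1)}{2}\, \frac{\int \cdots \int w_{12}(r_{12}) \exp(-\beta U_N^0(\mathbf{r}^N))\, d\mathbf{r}_3 \cdots d\mathbf{r}_N\, d\mathbf{r}_1\, d\mathbf{r}_2}{Z^0(N, V, T)} \\ &= \frac{1}{2} \int\!\!\int \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2)\, w_{12}(r_{12})\, d\mathbf{r}_1\, d\mathbf{r}_2 \end{aligned}$$

where translational and rotational invariance of the reference fluid system implies that $\rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2) = \rho^2 g_0^{(2)}(r_{12})$. 
Using this in the above expression, changing to relative coordinates and integrating over coordinates of 
1, one has 

$$a_1 = \frac{N\rho}{2} \int w_{12}(r_{12})\, g_0^{(2)}(r_{12})\, d\mathbf{r}_{12} \tag{A2.3.258}$$

which was first obtained by Zwanzig. As discussed above, this provides an upper bound for the free energy, 
so that 

$$\frac{\Delta A(\lambda)}{N} \le \frac{\lambda\rho}{2} \int w_{12}(r_{12})\, g_0^{(2)}(r_{12})\, d\mathbf{r}_{12}. \tag{A2.3.259}$$

The high-temperature expansion, truncated at first order, reduces to van der Waals' equation, when the 
reference system is a fluid of hard spheres. 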
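
The reduction to van der Waals' equation can be made concrete. The sketch below is a rough illustration under stated assumptions, not the treatment of the original references: it takes the low-density form of the hard sphere pair correlation function, $g_0(r) = 0$ for $r < d$ and $1$ for $r > d$, so that the first-order term becomes $\beta\rho/2 \int w\, g_0\, d\mathbf{r} = -\beta\rho a$ with $a = -2\pi\int_d^\infty w(r) r^2\, dr$, and it uses the attractive Lennard-Jones tail in reduced units as an example perturbation. All function names are hypothetical.

```python
import math

def lj_tail(r, eps=1.0, sigma=1.0):
    """Lennard-Jones potential, used here only for r > d as the perturbation w(r)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def vdw_a(w, d, r_max=60.0, n=200000):
    """van der Waals constant a = -2 pi * integral_d^r_max w(r) r^2 dr (midpoint rule).

    Assumes g0(r) = 1 for r > d and 0 inside the core (low-density hard spheres).
    """
    dr = (r_max - d) / n
    total = 0.0
    for i in range(n):
        r = d + (i + 0.5) * dr
        total += w(r) * r * r
    return -2.0 * math.pi * total * dr

def first_order_free_energy(rho, beta, beta_a_hs_per_particle, a):
    """beta A / N to first order: hard sphere part minus beta * rho * a."""
    return beta_a_hs_per_particle - beta * rho * a

if __name__ == "__main__":
    a = vdw_a(lj_tail, d=1.0)
    print(a, 16.0 * math.pi / 9.0)   # numerical and analytic values for d = sigma
```

For $d = \sigma$ the integral can be done by hand and gives $a = 16\pi\varepsilon\sigma^3/9$, which the numerical value reproduces apart from the small truncated tail beyond `r_max`.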
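The second-order term, $a_2$, was also obtained by Zwanzig, and involves two-, three- and four-body correlation 
functions for an $N$-particle system. Before passage to the thermodynamic limit, 

$$\begin{aligned} a_2 = \frac{1}{2}\Big[ &\frac{1}{2} \int\!\!\int \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2)\, w_{12}^2(r_{12})\, d\mathbf{r}_1\, d\mathbf{r}_2 + \int\!\!\int\!\!\int \rho_N^{(3)}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3)\, w_{12}(r_{12})\, w_{23}(r_{23})\, d\mathbf{r}_1\, d\mathbf{r}_2\, d\mathbf{r}_3 \\ &+ \frac{1}{4} \int\!\!\int\!\!\int\!\!\int \left[\rho_N^{(4)}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \mathbf{r}_4) - \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2)\, \rho_N^{(2)}(\mathbf{r}_3, \mathbf{r}_4)\right] w_{12}(r_{12})\, w_{34}(r_{34})\, d\mathbf{r}_1\, d\mathbf{r}_2\, d\mathbf{r}_3\, d\mathbf{r}_4 \Big]. \end{aligned} \tag{A2.3.260}$$

Evaluating its contribution to the free energy of the system requires taking the thermodynamic limit ($N \to \infty$) 
for the four-particle distribution function. Lebowitz and Percus [75] and Hiroike [76] showed that the 
asymptotic behaviour of $\rho_N^{(4)}$ in the canonical ensemble, when the 1,2 and 3,4 pairs are widely separated, is 
given by 

$$\rho_N^{(4)}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \mathbf{r}_4) = \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2)\, \rho_N^{(2)}(\mathbf{r}_3, \mathbf{r}_4) + \chi(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \mathbf{r}_4)/N + O(N^{-2}) \tag{A2.3.261}$$

where the $O(1/N)$ term makes a finite contribution to the last term in $a_2$. This correction can also be evaluated 
in the grand canonical ensemble. 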

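The high-temperature expansion could also be derived as a Taylor expansion of the free energy in powers of $\lambda$ 
about $\lambda = 0$: 

$$A(\lambda) = A_0 + \lambda\left(\frac{\partial A}{\partial \lambda}\right)_{\lambda=0} + \frac{\lambda^2}{2}\left(\frac{\partial^2 A}{\partial \lambda^2}\right)_{\lambda=0} + \cdots \tag{A2.3.262}$$

so that the coefficients of the various powers of $\lambda$ are related to $a_n$, with 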




(A2.3.263) 


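It follows that 

$$A(\lambda) = A(0) + \int_0^{\lambda} \left(\frac{\partial A}{\partial \lambda'}\right) d\lambda'. \tag{A2.3.264}$$

Assuming the perturbing potential is pairwise additive, an argument virtually identical to the calculation of $a_1$ 
$= \langle W_N \rangle_0$ shows that 

$$\frac{\partial A}{\partial \lambda} = \langle W_N(\mathbf{r}^N) \rangle_{\lambda} = \frac{1}{2} \int\!\!\int \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2; \lambda)\, w_{12}(r_{12})\, d\mathbf{r}_1\, d\mathbf{r}_2$$

where $\rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2; \lambda)$ is the two-particle density correlation function in an $N$-particle system with potential 
$U_N(\mathbf{r}_1, \ldots, \mathbf{r}_N; \lambda)$. Substituting this in the expression for $A(\lambda)$, we have 

$$\Delta A(\lambda) = \frac{1}{2} \int_0^{\lambda} d\lambda' \int\!\!\int \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2; \lambda')\, w_{12}(r_{12})\, d\mathbf{r}_1\, d\mathbf{r}_2 \tag{A2.3.265}$$

where $\Delta A(\lambda) = A(\lambda) - A(0)$. Expanding the two-particle density correlation function in powers of $\lambda$, 

$$\rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2; \lambda) = \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2; 0) + \lambda \left(\frac{\partial \rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2; \lambda)}{\partial \lambda}\right)_{\lambda=0} + \cdots \tag{A2.3.266}$$

we see that the zeroth order term in $\lambda$ yields the first-order term $a_1$ in the high-temperature expansion for the 
free energy, and the first-order term in $\lambda$ gives the second-order term $a_2$ in this expansion. As is usual for a 
fluid, $\rho_N^{(2)}(\mathbf{r}_1, \mathbf{r}_2; \lambda) = \rho^2 g(r_{12}; \lambda)$ and 

$$g(r_{12}; \lambda) = g_0(r_{12}) + \lambda \left(\frac{\partial g(r_{12}; \lambda)}{\partial \lambda}\right)_{\lambda=0} + \cdots. \tag{A2.3.267}$$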

But by definition 

$$g(r_{12}; \lambda) = \exp[-\beta(u_{12}^0(r_{12}) + \lambda w_{12}(r_{12}))]\, y(r_{12}; \lambda) \tag{A2.3.268}$$

where $y(r_{12}; \lambda)$ is the cavity function, and $u_{12}^0(r_{12})$ is the pair potential of the reference system, from which it 
follows that 

$$g(r_{12}; \lambda) \approx [1 - \beta\lambda w_{12}(r_{12})]\, g_0^{(2)}(r_{12}) \tag{A2.3.269}$$

which suggests $g(r_{12}; \lambda) \simeq g_0(r_{12})$ when $\beta w_{12}(r_{12}) \approx \beta\varepsilon \ll 1$, where $\varepsilon$ is the depth of the potential well. It also 
suggests an improved approximation 

$$g(r_{12}; \lambda) \approx \exp[-\beta u_{12}(r_{12}; \lambda)]\, y(r_{12}; 0) \tag{A2.3.270}$$
where $u_{12}(r_{12}; \lambda) = u_{12}^0(r_{12}) + \lambda w_{12}(r_{12})$ is the pair potential. The calculation of the second-order term in the 
high-temperature expansion involves the three- and four-body correlation functions which are generally not 
known even for a hard sphere reference system. The situation becomes worse for the higher-order terms in the 
perturbation expansion. However, determination of the first-order term in this expansion requires only the pair 
correlation function of the reference system, for which a convenient choice is a fluid of hard spheres whose 
equilibrium properties are known. Barker and Henderson [77] suggested a hard sphere diameter defined by 

$$d = \int_0^{\sigma} [1 - \exp(-\beta u_{12}(r))]\, dr \tag{A2.3.271}$$

where $u_{12}(r)$ is the pair potential and $\sigma$ is the separation at which it passes through zero. This diameter is 
temperature dependent and the free energy needs to be calculated to second order to obtain the best results. 
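
The temperature dependence of the Barker-Henderson diameter is easy to see numerically. The following sketch assumes a Lennard-Jones pair potential in reduced units ($\varepsilon = \sigma = 1$, $T^* = kT/\varepsilon$) and integrates $1 - \exp(-\beta u)$ from 0 to the zero of the potential by the midpoint rule; the potential and the function names are illustrative choices, not part of the original discussion.

```python
import math

def lennard_jones(r, eps=1.0, sigma=1.0):
    """Lennard-Jones 12-6 potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def barker_henderson_diameter(t_star, n=20000):
    """d / sigma = integral_0^1 [1 - exp(-u(r) / T*)] dr in reduced LJ units."""
    dr = 1.0 / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr                      # midpoint of each sub-interval
        total += 1.0 - math.exp(-lennard_jones(r) / t_star)
    return total * dr

if __name__ == "__main__":
    for t_star in (0.75, 1.0, 2.0, 4.0):
        print(t_star, barker_henderson_diameter(t_star))
```

The diameter stays slightly below $\sigma$ and shrinks as the temperature rises, reflecting the softness of the repulsive core.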

Truncation at the first-order term is justified when the higher-order terms can be neglected. When $\beta\varepsilon \ll 1$, a 
judicious choice of the reference and perturbed components of the potential could make the higher-order 
terms small. One choice exploits the fact that $a_1$, which is the mean value of the perturbation over the 
reference system, provides a strict upper bound for the free energy. This is the basis of a variational approach 
[78, 79] in which the reference system is approximated as hard spheres, whose diameters are chosen to 
minimize the upper bound for the free energy. The diameter depends on the temperature as well as the 
density. The method was applied successfully to Lennard-Jones fluids, and a small correction for the softness 
of the repulsive part of the interaction, which differs from hard spheres, was added to improve the results. 


A very successful first-order perturbation theory is due to Weeks, Chandler and Andersen [80], in which the 
pair potential $u(r)$ is divided into a reference part $u^0(r)$ and a perturbation $w(r)$ 

$$u(r) = u^0(r) + w(r) \tag{A2.3.272}$$

in which 

$$u^0(r) = \begin{cases} u(r) + \varepsilon & r < R_{\min} \\ 0 & r \ge R_{\min} \end{cases} \tag{A2.3.273}$$

and 

$$w(r) = \begin{cases} -\varepsilon & r < R_{\min} \\ u(r) & r \ge R_{\min} \end{cases} \tag{A2.3.274}$$

where $\varepsilon$ is the depth of the potential well which occurs at $r = R_{\min}$. This division into reference and perturbed 
parts is very fortuitous. The second step is to relate the reference system to an equivalent hard sphere fluid 
with a pair potential $u^{\mathrm{HS}}(r)$. This is done by defining $v(r)$ by 

$$\exp(-\beta v(r)) = \exp(-\beta u^{\mathrm{HS}}(r)) + \alpha[\exp(-\beta u^0(r)) - \exp(-\beta u^{\mathrm{HS}}(r))] \tag{A2.3.275}$$

in which $\alpha$ is an expansion parameter with $0 \le \alpha \le 1$. The free energy of the system with the pair potential $v(r)$ 
is expanded in powers of $\alpha$ to $O(\alpha)$ to yield 

$$\frac{\beta A^{v}}{N} = \frac{\beta A^{\mathrm{HS}}}{N} - \frac{\alpha\rho}{2} \int [\exp(-\beta u^0(r)) - \exp(-\beta u^{\mathrm{HS}}(r))]\, y^{\mathrm{HS}}(r; d)\, d\mathbf{r} + O(\alpha^2) \tag{A2.3.276}$$

where $y^{\mathrm{HS}}(r; d)$ is the cavity function of hard spheres of diameter $d$ determined by annihilating the term of 
order $\alpha$ by requiring the integral to be zero. The diameter $d$ is temperature and density dependent as in the 
variational theory. The free energy of the fluid with the pair potential $u(r)$ is now calculated to first order 
using the approximation 

$$g^0(r) \approx \exp(-\beta u^0(r))\, y^{\mathrm{HS}}(r; d). \tag{A2.3.277}$$

This implies, with the indicated choice of hard sphere diameter $d$, that the compressibilities of the reference 
system and of the equivalent hard sphere system are the same. 
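
The Weeks-Chandler-Andersen division and the choice of the equivalent hard sphere diameter can be sketched for a Lennard-Jones fluid in reduced units ($\varepsilon = \sigma = 1$). Two simplifications are made here and should be read as assumptions: the cavity function $y^{\mathrm{HS}}$ is set to 1, which is appropriate only in the low-density limit and removes the density dependence of $d$, and the function names are hypothetical. With $y^{\mathrm{HS}} = 1$, requiring the integral of $[\exp(-\beta u^0) - \exp(-\beta u^{\mathrm{HS}})]$ over all space to vanish gives $d^3 = 3\int_0^{R_{\min}} [1 - \exp(-\beta u^0(r))] r^2\, dr$.

```python
import math

R_MIN = 2.0 ** (1.0 / 6.0)   # position of the LJ minimum in units of sigma

def lennard_jones(r):
    """Lennard-Jones 12-6 potential in reduced units (eps = sigma = 1)."""
    sr6 = (1.0 / r) ** 6
    return 4.0 * (sr6 * sr6 - sr6)

def wca_reference(r):
    """u0(r): purely repulsive part, shifted up by the well depth."""
    return lennard_jones(r) + 1.0 if r < R_MIN else 0.0

def wca_perturbation(r):
    """w(r): constant -eps inside the minimum, the full potential outside."""
    return -1.0 if r < R_MIN else lennard_jones(r)

def low_density_wca_diameter(t_star, n=20000):
    """Hard sphere diameter from the zero-integral condition with y_HS = 1.

    d^3 = 3 * integral_0^R_MIN [1 - exp(-u0(r) / T*)] r^2 dr (midpoint rule).
    """
    dr = R_MIN / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += (1.0 - math.exp(-wca_reference(r) / t_star)) * r * r
    return (3.0 * total * dr) ** (1.0 / 3.0)

if __name__ == "__main__":
    for t_star in (0.75, 1.0, 2.0):
        print(t_star, low_density_wca_diameter(t_star))
```

At finite density the cavity function of the hard sphere fluid would have to be supplied, for example from the PY solution, and the condition solved iteratively for $d$.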

Another important application of perturbation theory is to molecules with anisotropic interactions. Examples 
are dipolar hard spheres, in which the anisotropy is due to the polarity of the molecule, and liquid crystals in 
which the anisotropy is due also to the shape of the molecules. The use of an anisotropic reference system is 
more natural in accounting for molecular shape, but presents difficulties. Hence, we will consider only 
isotropic reference systems, in which the reference potential $u^0(r_{12})$ is usually chosen in one of two ways. In 
the first choice, $u^0(r_{12})$ is defined by 

$$\exp[-\beta u^0(r_{12})] = \Omega^{-2} \int\!\!\int \exp[-\beta u(r_{12}, \Omega_1, \Omega_2)]\, d\Omega_1\, d\Omega_2 \tag{A2.3.278}$$

which can be applied even to hard non-spherical molecules. The ensuing reference potential, first introduced 
by Rushbrooke [81], is temperature dependent and was applied by Cook and Rowlinson [82] to spheroidal 
molecules. It is more complicated to use than the temperature-independent reference potential defined by the 
simple averaging 

$$u^0(r_{12}) = \Omega^{-2} \int\!\!\int u(r_{12}, \Omega_1, \Omega_2)\, d\Omega_1\, d\Omega_2. \tag{A2.3.279}$$

This choice was introduced independently by Pople [83] and Zwanzig [84]. 
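We assume that the anisotropic pair interaction can be written as 

$$u(r_{12}, \Omega_1, \Omega_2; \lambda) = u^0(r_{12}) + \lambda w(r_{12}, \Omega_1, \Omega_2) \tag{A2.3.280}$$

where the switching function $\lambda$ lies between zero and one. The perturbation is fully turned on when $\lambda$ is one 
and is switched off when $\lambda$ is zero. In the $\lambda$ expansion for the free energy of the fluid, 

$$A(\lambda) = A_0 + \lambda A_1 + \lambda^2 A_2 + \lambda^3 A_3 + \cdots \tag{A2.3.281}$$

one finds that the leading term of order $\lambda$ 

$$A_1 = \frac{N\rho}{2\Omega^2} \int\!\!\int\!\!\int g_0(r_{12})\, w(r_{12}, \Omega_1, \Omega_2)\, d\mathbf{r}_{12}\, d\Omega_1\, d\Omega_2 \tag{A2.3.282}$$

vanishes on carrying out the angular integration due to the spherical symmetry of the reference potential. The 
expressions for the higher-order terms are 

$$A_2 = -\frac{\beta N\rho}{4\Omega^2} \int\!\!\int\!\!\int g_0(r_{12})\, [w(r_{12}, \Omega_1, \Omega_2)]^2\, d\mathbf{r}_{12}\, d\Omega_1\, d\Omega_2 \tag{A2.3.283}$$

$$\begin{aligned} A_3 = {} &\frac{\beta^2 N\rho}{12\Omega^2} \int\!\!\int\!\!\int g_0(r_{12})\, [w(r_{12}, \Omega_1, \Omega_2)]^3\, d\mathbf{r}_{12}\, d\Omega_1\, d\Omega_2 \\ &+ \frac{\beta^2 N\rho^2}{6\Omega^3} \int \cdots \int g_0^{(3)}(r_{12}, r_{13}, r_{23})\, w(r_{12}, \Omega_1, \Omega_2)\, w(r_{13}, \Omega_1, \Omega_3)\, w(r_{23}, \Omega_2, \Omega_3)\, d\mathbf{r}_{12}\, d\mathbf{r}_{13}\, d\Omega_1\, d\Omega_2\, d\Omega_3. \end{aligned} \tag{A2.3.284}$$

The expansion of the perturbation $w(r_{12}, \Omega_1, \Omega_2)$ in terms of multipole potentials (e.g. dipole-dipole, 
dipole-quadrupole, quadrupole-quadrupole) using spherical harmonics 

$$w(r_{12}, \Omega_1, \Omega_2) = \sum_{l_1 l_2} \sum_{m} X^{l_1 l_2 m}(r_{12})\, Y_{l_1 m}(\Omega_1)\, Y_{l_2 \bar{m}}(\Omega_2) \tag{A2.3.285}$$

leads to additional simplifications due to symmetry. For example, for molecules with only dipole and 
quadrupolar interactions, all terms in which the dipole moment appears an odd number of times at an 
integration vertex vanish. In particular, for a pure dipolar fluid, the two-body integral contributing to $A_3$ 
vanishes. Angular integration of the three-body integral leads to the Axilrod-Teller three-body potential: 

$$u_3(r_{12}, r_{13}, r_{23}) = \frac{3\cos\theta_1 \cos\theta_2 \cos\theta_3 + 1}{(r_{12}\, r_{13}\, r_{23})^3} \tag{A2.3.286}$$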
where $\theta_1$, $\theta_2$, $\theta_3$ and $r_{12}$, $r_{13}$, $r_{23}$ are the angles and sides of the triangle formed by the three dipoles so that 
only the spatial integration remains to evaluate $A_3$. This is accomplished by invoking the superposition 
approximation 

$$g_0^{(3)}(r_{12}, r_{13}, r_{23}) = g_0(r_{12})\, g_0(r_{13})\, g_0(r_{23}) \tag{A2.3.287}$$

which makes only a small error when it contributes to the third-order perturbation term. Tables of the relevant 
integrals for dipoles and multipoles associated with a hard sphere reference system are available [85]. 

For many molecules the reduced dipole moment $\mu^* = (\mu^2/\varepsilon\sigma^3)^{1/2}$ is greater than 1 and 
successive terms in the $\lambda$ expansion oscillate widely. Stell, Rasaiah and Narang [85] suggested taming this by 
replacing the truncated expansion by the Padé approximant 

$$\Delta A = A_2 \left(1 - \frac{A_3}{A_2}\right)^{-1} \tag{A2.3.288}$$

which reproduces the expected behaviour that as $\mu$ becomes large the free energy $A$ increases as $\mu^2$. The Padé 
approximant is quite successful in reproducing the thermodynamic behaviour of polar fluids. However, the 
critical exponents, as in all mean-field theories, are the classical exponents. 
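
The effect of the Padé resummation is easy to demonstrate. The sketch below is illustrative only: it assumes that the second- and third-order terms scale as $A_2 = -k_2\mu^4$ and $A_3 = +k_3\mu^6$, as expected when the perturbation is proportional to $\mu^2$, with made-up positive constants `k2` and `k3` standing in for the integrals over the reference system.

```python
def truncated_expansion(a2, a3):
    """Free energy change from the lambda expansion cut off at third order."""
    return a2 + a3

def pade_approximant(a2, a3):
    """Delta A = A2 / (1 - A3 / A2)."""
    return a2 / (1.0 - a3 / a2)

def model_terms(mu, k2=1.0, k3=1.0):
    """Hypothetical scaling A2 = -k2 mu^4, A3 = +k3 mu^6 (k2, k3 are made-up constants)."""
    return -k2 * mu**4, k3 * mu**6

if __name__ == "__main__":
    for mu in (0.1, 1.0, 3.0, 10.0):
        a2, a3 = model_terms(mu)
        print(mu, truncated_expansion(a2, a3), pade_approximant(a2, a3))
```

For small $\mu$ the two forms agree, since the approximant reproduces the expansion through third order. For large $\mu$ the truncated sum is dominated by the $\mu^6$ term, while the approximant tends to $-(k_2^2/k_3)\mu^2$.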

The generalization of the $\lambda$ expansion to multicomponent systems is straightforward but requires knowledge 
of the reference system pair correlation functions of all the different species. Application to electrically 
neutral Coulomb systems is complicated by the divergence of the leading term of order $\lambda$ in the expansion, but 
this difficulty can be circumvented by exploiting the electroneutrality condition and using a screened 
Coulomb potential 

$$u_{ij}(r) = u_{ij}^0(r) + \frac{\lambda e_i e_j}{\varepsilon_0 r} \exp(-\alpha r) \tag{A2.3.289}$$

where $\alpha$ is the screening parameter and $u_{ij}^0(r)$ is the reference potential. The term of order $\lambda$, generalized to the 
multicomponent electrolyte, is 

$$\frac{A_1}{V} = \frac{1}{2\varepsilon_0} \sum_i \sum_j e_i e_j c_i c_j \int \frac{\exp(-\alpha r)}{r}\, g_{ij}^0(r)\, d\mathbf{r}. \tag{A2.3.290}$$

The integrals are divergent in the limit $\alpha = 0$. However, substituting $g_{ij}^0(r) = h_{ij}^0(r) + 1$, and making use of the 
electroneutrality condition in the form 

$$\sum_i \sum_j e_i e_j c_i c_j = \Big(\sum_i e_i c_i\Big)^2 = 0 \tag{A2.3.291}$$

one finds, on taking the limit $\alpha \to 0$, that 

$$\frac{A_1}{V} = \frac{1}{2\varepsilon_0} \sum_i \sum_j e_i e_j c_i c_j \int \frac{h_{ij}^0(r)}{r}\, d\mathbf{r} \tag{A2.3.292}$$

in which the divergences have been subtracted out. It follows from the Gibbs-Bogoliubov inequality that the 
first two terms form an upper bound, and 

$$A \le A^0 + A_1. \tag{A2.3.293}$$

For a symmetrical system in which the reference species are identical (e.g. hard spheres of the same size), the 
integral can be taken outside the summation, which then adds up to zero due to the electroneutrality condition, 
to yield 

$$A_1 = 0. \tag{A2.3.294}$$

The reference free energy in this case is an upper bound for the free energy of the electrolyte. A lower bound 
for the free energy difference $\Delta A$ between the charged and uncharged RPM system was derived by Onsager 
[86]; this states that $\Delta A/N \ge -e^2/\varepsilon_0\sigma$. Improved upper and lower bounds for the free energy have been 
discussed by Gillan [87]. 

The expression for κ² shows that it is the product of the ionic concentration c and $e^2/\varepsilon kT$, which is called the 
Bjerrum parameter. The virial series is an expansion in the total ionic concentration c at a fixed value of 
$e^2/\varepsilon kT$. A theory due to Stell and Lebowitz (SL) [88], on the other hand, is an expansion in the Bjerrum 
parameter at constant c. The leading terms in this expansion are 

$\frac{A}{NkT} = \frac{A^0}{NkT} + \frac{A_1}{NkT} - \frac{\kappa_1^3}{12\pi c}$ (A2.3.295) 

where we have already seen that the first two terms form an upper bound. Here $\kappa_1^{-1}$ is a modified Debye length 
defined by 


*?=*> + 


— T?5I5Z*W^ / h l dr > (A2.3.296) 


By the same argument as before, the integral may be taken outside the summations for a symmetrical 
reference system (e.g. the RPM electrolyte) and, applying the electroneutrality condition, one sees that in this 
case κ₁ = κ. Since the first two terms in the SL expansion form an upper bound for the free energy, the 
limiting law as T → ∞ at constant c must always be approached from one side, unlike the Debye-Hückel 
limiting law which can be approached from above or below as c → 0 at fixed temperature T (e.g. ZnSO₄ and 
HCl in aqueous solutions). 

Examination of the terms to $O(\kappa^6)$ in the SL expansion for the free energy shows that the convergence is 
extremely slow for a RPM 2-2 electrolyte in 'aqueous solution' at room temperature. Nevertheless, the series 
can be summed using a Padé approximant similar to that for dipolar fluids, which gives results that are 
comparable in accuracy to the MS approximation as shown in figure A2.3.19(a). However, unlike the DHLL 
+ B₂ approximation, neither of these approximations produces the negative deviations in the osmotic and 
activity coefficients from the DHLL observed for higher valence electrolytes at low concentrations. This can 
be traced to the absence of the complete renormalized second virial coefficient in these theories; it is present 
only in a linearized form. The union of the Padé approximant (SL6(P)), derived from the SL theory to $O(\kappa^6)$, 
and the Mayer expansion carried as far as DHLL + B₂ 

SL6(P) ∪ B₂ = SL6(P) + B₂ − SL6(P) ∩ B₂ (A2.3.297) 

produces the right behaviour at low concentrations and has an accuracy comparable to the MS approximation 
at high concentrations. Figure A2.3.19(b) shows the coexistence curves for charged hard spheres predicted by 
SL6(P) and SL6(P) ∪ B₂. 

Figure A2.3.19 Coexistence curve for the RPM predicted by SL6(P) and SL6(P) ∪ B₂. The reduced 
temperature T* = εkTσ/e² and x = ρσ³ (after [85]). 
By integrating over the hard cores in the SL expansion and collecting terms it is easily shown that this expansion 
may be viewed as a correction to the MS approximation which still lacks the complete second virial 
coefficient. Since the MS approximation has a simple analytic form with an accuracy comparable to the 
Padé (SL6(P)) approximation, it may be more convenient to consider the union of the MS approximation with 
Mayer theory. Systematic improvements to the MS approximation for the free energy were used to determine 
the critical point and coexistence curves of charged hard spheres by Stell, Wu and Larsen [89], and are 
discussed by Hafskjold and Stell [90]. 


A2.3.6.2 COMPUTATIONAL ALCHEMY 


Perturbation theory is also used to calculate free energy differences between distinct systems by computer 
simulation. This computational alchemy is accomplished by the use of a switching parameter λ, ranging from 
zero to one, that transforms the Hamiltonian of one system to the other. The linear relation 

$U(\lambda) = \lambda U_C + (1 - \lambda) U_B$ (A2.3.298) 

interpolates between the energies $U_B(\mathbf{r}^N, \omega^N)$ and $U_C(\mathbf{r}^N, \omega^N)$ of the initial and final states of molecules B 
and C and allows for fictitious intermediate states also to be sampled. The switching parameter could be the 
dihedral angle in a peptide or polymer chain, the charge on an atom or one of the parameters ε or σ defining 
the well depth or size of its pair interaction with the environment. 

It follows from our previous discussion of perturbation theory that 

$\Delta A(B \to C) = A_C - A_B = -kT \ln \langle \exp[-\beta(U_C - U_B)] \rangle_B$ (A2.3.299) 

which is the free energy difference between the two states as a function of their energy difference sampled 
over the equilibrium configurations of one of the states. In the above expression, the averaging is over the 
equilibrium states of B, and C is treated as a perturbation of B. One could equally well sample over C and 
treat B as a perturbation of C. The averages are calculated using Monte Carlo or molecular dynamics methods 
discussed elsewhere; convergence is rapid when the initial and final states are similar. Since free energy 
differences are additive, the change in free energy between widely different states can also be determined 
through multiple simulations via closely-spaced intermediates determined by the switching function λ which 
gradually mutates B into C. The total free energy difference is the sum of these changes, so that 

$\Delta A(B \to C) = \int_0^1 \frac{\partial A(\lambda)}{\partial \lambda}\, d\lambda$ (A2.3.300) 

and one calculates the derivative by using perturbation theory for small increments in λ. The accuracy can be 
improved by calculating incremental changes in both directions. 
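
As a hedged illustration of the exponential averaging in equation (A2.3.299), the following Python sketch applies it to a toy problem that is not taken from the text: two one-dimensional harmonic wells, U_B = k_init x²/2 and U_C = k_final x²/2, for which the exact result A_C − A_B = (kT/2) ln(k_final/k_init) is known. The function names, force constants and sample size are illustrative assumptions, and the equilibrium configurations of B are drawn directly from its Gaussian Boltzmann distribution rather than from a Monte Carlo or molecular dynamics run.

```python
import math
import random


def fep_forward(k_init, k_final, kT=1.0, n_samples=200_000, seed=0):
    """Estimate A_C - A_B by averaging exp[-(U_C - U_B)/kT] over state B.

    Toy potentials (an assumption of this sketch):
    U_B = k_init * x**2 / 2 and U_C = k_final * x**2 / 2.
    """
    rng = random.Random(seed)
    # The Boltzmann distribution of a harmonic well is Gaussian with
    # variance kT / k_init, so state B can be sampled exactly.
    sigma_b = math.sqrt(kT / k_init)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma_b)
        delta_u = 0.5 * (k_final - k_init) * x * x  # U_C - U_B
        total += math.exp(-delta_u / kT)
    return -kT * math.log(total / n_samples)


def exact_difference(k_init, k_final, kT=1.0):
    """Analytic A_C - A_B for the two harmonic wells."""
    return 0.5 * kT * math.log(k_final / k_init)


if __name__ == "__main__":
    print("FEP estimate:", fep_forward(1.0, 1.5))
    print("exact value :", exact_difference(1.0, 1.5))
```

The two wells are deliberately similar, which is the regime in which the text notes that convergence is rapid; for widely different force constants the average is dominated by rare configurations and intermediate λ states become necessary.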

A closely-related method for determining free energy differences is characterized as thermodynamic 
integration. The configurational free energy of an intermediate state is 

$A(\lambda) = -kT \ln Z(\lambda)$ (A2.3.301) 

from which it follows that 

$\Delta A(B \to C) = \int_0^1 \left\langle \frac{\partial U(\lambda)}{\partial \lambda} \right\rangle_\lambda d\lambda$ (A2.3.302) 

where the derivative pertains to equilibrated intermediate states. This forms the basis of the 'slow growth' 
method, in which the perturbation is applied linearly over a finite number of time steps and the free energy 
difference computed as the sum of energy differences. This method, however, samples over non-equilibrium 
intermediate states and the choice of the number of time steps over which the perturbation is applied and the 
corresponding accuracy of the calculation must be determined empirically. 
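
Thermodynamic integration can be sketched on a toy system in the same spirit. The example below is an illustrative assumption, not a calculation from the text: it uses the linear switching relation (A2.3.298) between two harmonic wells, for which ∂U/∂λ = U_C − U_B, samples each intermediate state at equilibrium, and integrates the averages over λ with the trapezoidal rule. The grid of 11 λ values and the sample size are arbitrary choices.

```python
import math
import random


def mean_dU_dlambda(lam, k_init, k_final, kT, n_samples, rng):
    """Equilibrium average of dU/dlambda = U_C - U_B at fixed lambda."""
    # U(lambda) = lam*U_C + (1 - lam)*U_B is itself harmonic.
    k_lam = (1.0 - lam) * k_init + lam * k_final
    sigma = math.sqrt(kT / k_lam)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        total += 0.5 * (k_final - k_init) * x * x
    return total / n_samples


def thermodynamic_integration(k_init, k_final, kT=1.0,
                              n_lambda=11, n_samples=20_000, seed=1):
    """Integrate <dU/dlambda> from lambda = 0 to 1 (trapezoidal rule)."""
    rng = random.Random(seed)
    lambdas = [i / (n_lambda - 1) for i in range(n_lambda)]
    values = [mean_dU_dlambda(lam, k_init, k_final, kT, n_samples, rng)
              for lam in lambdas]
    step = 1.0 / (n_lambda - 1)
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


if __name__ == "__main__":
    print("TI estimate:", thermodynamic_integration(1.0, 2.0))
    print("exact value:", 0.5 * math.log(2.0))
```

Unlike the slow growth method described above, every λ point here is sampled at equilibrium, so the only errors are statistical noise and the quadrature error of the λ grid.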


Free energy perturbation (FEP) theory is now widely used as a tool in computational chemistry and 
biochemistry [91]. It has been applied to determine differences in the free energies of solvation of two solutes 
and free energy differences between conformational or tautomeric forms of the same solute, by mutating one 
molecule or form into the other. Figure A2.3.20 illustrates this for the mutation of CH₃OH → CH₃CH₃ [92]. 

Figure A2.3.20 Free energy change in the transformation of CH₃OH to CH₃CH₃ (after [92]). 

There are many other applications. They include determination of the ratios of the partition coefficients 
($P_B/P_C$) of solutes B and C in two different solvents by using the thermodynamic cycle: 

                      ΔG_t(B) 
    B(solvent 1)  ------------>  B(solvent 2) 
         |                            | 
    ΔG_sol(B → C)_1             ΔG_sol(B → C)_2          (A2.3.303) 
         v                            v 
    C(solvent 1)  ------------>  C(solvent 2) 
                      ΔG_t(C) 

It follows that 

$2.3RT \log(P_B/P_C) = \Delta G_t(C) - \Delta G_t(B) = \Delta G_{sol}(B \to C)_2 - \Delta G_{sol}(B \to C)_1$ (A2.3.304) 

where ΔG_t(B) and ΔG_t(C) are the free energies of transfer of solute B and C, respectively, from solvent 1 to 
2 and ΔG_sol(B → C)_1 and ΔG_sol(B → C)_2 are differences in the solvation energies of B and C in the 
respective solvents. Likewise, the relative pK_a values of two acids HA and HB in the same solvent can be calculated 
from the cycle depicted below for the acid HA 


                 ΔG_gas(HA) 
    HA(gas)   ------------>  H⁺(gas)   +   A⁻(gas) 
       |                        |             | 
    ΔG_hyd(HA)              ΔG_hyd(H⁺)    ΔG_hyd(A⁻)          (A2.3.305) 
       v                        v             v 
    HA(soln)  ------------>  H⁺(soln)  +   A⁻(soln) 
                 ΔG_aq(HA) 

with a corresponding cycle for HB. From the cycle for HA, we see that 

$2.3RT\,pK_a(HA) = \Delta G_{aq}(HA) = \Delta G_{gas}(HA) + \Delta G_{hyd}(H^+) + \Delta G_{hyd}(A^-) - \Delta G_{hyd}(HA)$ (A2.3.306) 

with a similar expression for pK_a(HB). The difference in the two pK_a values is related to the differences in the gas 
phase acidities (free energies of dissociation of the acids), the free energies of hydration of B⁻ and A⁻ and 
the corresponding free energies of the undissociated acids: 

$2.3RT[pK_a(HA) - pK_a(HB)] = \Delta G_{gas}(HA) - \Delta G_{gas}(HB) + \Delta G_{hyd}(HB) - \Delta G_{hyd}(HA) + \Delta G_{hyd}(A^-) - \Delta G_{hyd}(B^-).$ 

The relative acidities in the gas phase can be determined from ab initio or molecular orbital calculations while 
differences in the free energies of hydration of the acids and the anions are obtained from FEP simulations in 
which HA and A⁻ are mutated into HB and B⁻ respectively. 
Another important application of FEP is in molecular recognition and host-guest binding with its dependence 
on structural alterations. The calculations parallel our discussion of acid dissociation constants and have been 
used to determine the free energies of binding of A-T and C-G base pairs in solution from the corresponding 
binding energies in the gas phase. The relative free energies of binding two different ligands L₁ and L₂ to the 
same host are obtained from the following cycle: 

               ΔG_1 
    E + L₁  ---------->  EL₁ 
        |                 | 
      ΔG_3              ΔG_4          (A2.3.307) 
        v                 v 
    E + L₂  ---------->  EL₂ 
               ΔG_2 

The difference in the free energy change when L₁ is replaced by L₂ is 

$\Delta G_2 - \Delta G_1 = \Delta G_4 - \Delta G_3$ (A2.3.308) 

which is determined in FEP simulations by mutating L₁ to L₂ and EL₁ to EL₂. 

FEP theory has also been applied to modelling the free energy profiles of reactions in solution. An important 
example is the solvent effect on the SN2 reaction 

Cl⁻ + CH₃Cl → [Cl···CH₃···Cl]⁻ → ClCH₃ + Cl⁻ (A2.3.309) 

as illustrated from the work of Jorgensen [93] in figure A2.3.21. 

The gas phase reaction shows a double minimum and a small barrier along the reaction coordinate, which is 
the difference between the two C-Cl distances. The minima disappear in aqueous solution and this is 
accompanied by an increase in the height of the barrier. The behaviour in dimethyl formamide is intermediate 
between these two. 

Figure A2.3.21 Free energy profile of the SN2 reaction Cl⁻ + CH₃Cl → [Cl···CH₃···Cl]⁻ → ClCH₃ + Cl⁻ in the gas 
phase, dimethyl formamide and in water (from [93]). 


A2.3.6.3 INHOMOGENEOUS FLUIDS 


An inhomogeneous fluid is characterized by a non-uniform singlet density $\rho(\mathbf{r}_1; [\phi])$ that changes with 
distance over a range determined by an external field. Examples of an external field are gravity, the walls 
enclosing a system or charges on an electrode. Inhomogeneous fluids are important in studies of interfacial 
phenomena such as wetting and the electrical double layer. The attractive interatomic forces in such systems 
do not effectively cancel due to the presence of the external field and perturbation theories applied to 
homogeneous systems are not very useful. Integral equation methods that ignore the bridge diagrams are also 
not very successful. 

As discussed earlier, the singlet density $\rho(\mathbf{r}_1; [\phi])$ in an external field φ due to a wall is given by 

$kT \nabla_1 \ln \rho(\mathbf{r}_1; [\phi]) = -\nabla_1 \phi(\mathbf{r}_1) - \int \rho(\mathbf{r}_2 | \mathbf{r}_1; [\phi]) \nabla_1 u(r_{12})\, d\mathbf{r}_2$ (A2.3.310) 

where the conditional density $\rho(\mathbf{r}_2 | \mathbf{r}_1; [\phi]) = \rho^{(2)}(\mathbf{r}_1, \mathbf{r}_2; [\phi]) / \rho(\mathbf{r}_1; [\phi])$. Weeks, Selinger and Broughton (WSB) 
[94] use this as a starting point of a perturbation theory in which the potential is separated into two parts, 

$u(r_{12}) = u_R(r_{12}) + w(r_{12})$ (A2.3.311) 

where $u_R(r_{12})$ is the reference potential and $w(r_{12})$ the perturbation. An effective field $\phi_R$ for the reference 
system is chosen so that the singlet density is unchanged from that of the complete system, implying the same 
average force on the atoms at $\mathbf{r}_1$: 

$\rho_R(\mathbf{r}_1; [\phi_R]) = \rho(\mathbf{r}_1; [\phi]).$ (A2.3.312) 

The effective field is determined by assuming that the conditional probabilities are the same, i.e. 

$\rho_R(\mathbf{r}_2 | \mathbf{r}_1; [\phi_R]) = \rho(\mathbf{r}_2 | \mathbf{r}_1; [\phi])$ (A2.3.313) 

when it follows that 

$\nabla_1[\phi_R(\mathbf{r}_1) - \phi(\mathbf{r}_1)] = \int \rho_R(\mathbf{r}_2 | \mathbf{r}_1; [\phi_R]) \nabla_1 w(r_{12})\, d\mathbf{r}_2.$ (A2.3.314) 

The conditional probability $\rho_R(\mathbf{r}_2 | \mathbf{r}_1; [\phi_R])$ differs from the singlet density $\rho_R(\mathbf{r}_2; [\phi_R])$ mainly when 1 and 2 
are close and the gradient of the perturbation $\nabla_1 w(r_{12})$ is small. Replacing $\rho_R(\mathbf{r}_2 | \mathbf{r}_1; [\phi_R])$ by $\rho_R(\mathbf{r}_2; [\phi_R])$, 
taking the gradient outside the integral and integrating, 

$\phi_R(\mathbf{r}_1) - \phi(\mathbf{r}_1) = \int [\rho_R(\mathbf{r}_2; [\phi_R]) - \rho_B]\, w(r_{12})\, d\mathbf{r}_2 = \rho_B \int [g_R(\mathbf{r}_2; [\phi_R]) - 1]\, w(r_{12})\, d\mathbf{r}_2$ (A2.3.315) 

where $\rho_B$ is the bulk density far from the wall and $\rho_R(\mathbf{r}_2; [\phi_R]) = \rho_B\, g_R(\mathbf{r}_2; [\phi_R])$. This equation can be solved 
by standard methods provided the reference fluid distribution functions in the external field $\phi_R$ are known, for 
example through computer simulations. Other approximate methods to do this have also been devised by 
WSB [95]. 
A2.3.7 SOLIDS AND ALLOYS 


A2.3.7.1 INTRODUCTION 


Our discussion of solids and alloys is mainly confined to the Ising model and to systems that are isomorphic 
to it. This model considers a periodic lattice of N sites of any given symmetry in which a spin variable s_i = ±1 is 
associated with each site and interactions between sites are confined only to those between nearest 
neighbours. The total potential energy of interaction is 

$U_N(\{s_k\}) = -J \sum_{\langle ij \rangle} s_i s_j - H \sum_i s_i$ (A2.3.316) 

where {s_k} denotes the set of spin variables {s_1, s_2, . . . , s_N}, J is the coupling constant between neighbouring sites 
and H is the external field which acts on the spins s_i at each site. The notation ⟨ij⟩ denotes summation over the 
nearest-neighbour sites; there are Nq/2 terms in this sum where q is the coordination number of each site. 
Ferromagnetic systems correspond to J > 0, for which the spins are aligned in domains either up (↑↑↑↑↑) or 
down (↓↓↓↓↓) at temperatures below the critical point, while in an antiferromagnet J < 0, and alternating spins 
(↑↓↑↓↑↓) on the lattice sites dominate at the lowest temperatures. The main theoretical problem in these 
systems is to predict the critical temperature and the phase diagram. Of added interest is the isomorphism to 
the lattice gas and to a two-component alloy so that the phase diagram for an Ising ferromagnet can be 
mapped on to those for these systems as well. This analogy is further strengthened by the universality 
hypothesis, which states that the critical exponents and properties near the critical point are identical, to the 
extent that they depend only on the dimensionality of the system and the symmetry of the Hamiltonian. The 
details of the intermolecular interactions are thus of less importance. 

A2.3.7.2 ISING MODEL 

The partition function (PF) for the Ising [96] model for a system of given N, H and T is 

$Z(N, H, T) = \exp(-\beta G) = \sum_{\{s_k\}} \exp[-\beta U_N(\{s_k\})].$ (A2.3.317) 

There are 2^N terms in the sum since each site has two configurations with spin either up or down. For a finite 
number of sites N the PF is analytic and the critical exponents are classical; non-classical exponents become 
possible only in the thermodynamic limit (N → ∞), which also ensures that the results for different ensembles 
are equivalent. The characteristic thermodynamic equation for the variables N, H and T is 
$dG = -S\,dT - M\,dH + \mu\,dN$ (A2.3.318) 

where M is the total magnetization and μ is the chemical potential. Since the sites are identical, the average 
magnetization per site is independent of its location, m(H, T) = ⟨s_i⟩ for all sites i. The total magnetization 

$M = N m(H, T) = N \langle s_0 \rangle.$ (A2.3.319) 

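The magnetization per site 

$\langle s_0 \rangle = \frac{1}{\beta N}\left(\frac{\partial \ln Z}{\partial H}\right)_T = \frac{1}{Z(N, H, T)} \sum_{\{s_k\}} s_0 \exp[-\beta U_N(\{s_k\})]$ (A2.3.320) 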


follows from the PF. As H → 0 

$m(H, T) = \langle s_0 \rangle = 0 \quad (T > T_c)$ (A2.3.321) 

unless the temperature is less than the critical temperature T_c when the magnetization lies between -1 and +1, 

$-1 < m(H, T) < 1 \quad (T < T_c)$ (A2.3.322) 

and m(H, T) versus H is a symmetrical odd function as shown in figure A2.3.22. 
Figure A2.3.22 (a) The free energy G and (b) the magnetization m(H,T) as a function of the magnetic field H 
at different temperatures, (c) The magnetization m(H,T) and (d) the susceptibility χ as a function of 
temperature. 
At T = 0, all the spins are either aligned up or down. The magnetization per site is an order parameter which 
vanishes at the critical point. Along the coexistence curve at zero field 

$m(H, T) \approx (T_c - T)^{\beta}$ (A2.3.323) 

where β here is a critical exponent identical to that for a fluid in the same dimension due to the universality. 
Since s_i = ±1, at all temperatures ⟨s_i²⟩ = 1. 
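
For a small lattice the sum over the 2^N configurations in the partition function (A2.3.317) can be carried out directly. The sketch below is an illustrative assumption rather than a method from the text: it enumerates a periodic ring of N spins (coordination number q = 2), compares the result with the standard transfer-matrix expression for the one-dimensional ring, and evaluates the magnetization per site, which comes out as an odd function of H. The function names and parameter values are hypothetical.

```python
import itertools
import math


def ring_energy(spins, J, H):
    """U_N = -J * sum_<ij> s_i s_j - H * sum_i s_i on a periodic ring."""
    n = len(spins)
    pair_sum = sum(spins[i] * spins[(i + 1) % n] for i in range(n))
    return -J * pair_sum - H * sum(spins)


def partition_function(n, J, H, beta):
    """Brute-force sum over all 2**n spin configurations (n >= 3)."""
    return sum(math.exp(-beta * ring_energy(s, J, H))
               for s in itertools.product((-1, 1), repeat=n))


def magnetization(n, J, H, beta):
    """Average spin per site, <s_0>, from the same enumeration."""
    z = 0.0
    m = 0.0
    for s in itertools.product((-1, 1), repeat=n):
        w = math.exp(-beta * ring_energy(s, J, H))
        z += w
        m += w * sum(s) / n
    return m / z


def ring_partition_function_exact(n, J, H, beta):
    """Transfer-matrix result for the 1D ring: Z = lam_plus**n + lam_minus**n."""
    a = math.exp(beta * J) * math.cosh(beta * H)
    b = math.sqrt(math.exp(2 * beta * J) * math.sinh(beta * H) ** 2
                  + math.exp(-2 * beta * J))
    return (a + b) ** n + (a - b) ** n


if __name__ == "__main__":
    print(partition_function(8, 1.0, 0.3, 0.7),
          ring_partition_function_exact(8, 1.0, 0.3, 0.7))
    print(magnetization(8, 1.0, 0.3, 0.7), magnetization(8, 1.0, -0.3, 0.7))
```

Because N is finite here, the magnetization is a smooth function of H at every temperature, in line with the remark above that singular behaviour appears only in the thermodynamic limit.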

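To calculate the spin correlation functions ⟨s_0 s_i⟩ between any two sites, multiply the expression for ⟨s_0⟩ by Z 
when 

$Z \langle s_0 \rangle = \sum_{\{s_k\}} s_0 \exp[-\beta U_N(\{s_k\})].$ 

Differentiating with respect to H and dividing by Z, we have 

$\left(\frac{\partial \langle s_0 \rangle}{\partial H}\right)_T + \beta N \langle s_0 \rangle^2 = \beta \sum_i \langle s_0 s_i \rangle$ 

where the factor N comes from the fact that there are N identical sites. The magnetic susceptibility per site 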

$\chi_T(H) = \left(\frac{\partial m}{\partial H}\right)_T = \beta \sum_i [\langle s_0 s_i \rangle - \langle s_0 \rangle \langle s_i \rangle].$ (A2.3.324) 

For T > T_c, ⟨s_0⟩ = ⟨s_i⟩ = 0 when H = 0. Separating out the term i = 0 from the rest and noting that ⟨s_0²⟩ = 1, we 
have 

$\chi_T(0) = \beta \left[1 + \sum_{i \neq 0} \langle s_0 s_i \rangle\right] \quad \text{for } T > T_c$ (A2.3.325) 

which relates the susceptibility at zero field to the sum of the pair correlation function over different sites. 
This equation is analogous to the compressibility equation for fluids and diverges with the same exponent γ as 
the critical temperature is approached from above: 

$\chi_T(0) \approx |T - T_c|^{-\gamma}.$ (A2.3.326) 

The correlation length $\xi \approx |T - T_c|^{-\nu}$ diverges with the exponent ν. Assuming that when T > T_c the site 
correlation function decays as 

$\langle s_0 s_i \rangle \approx \frac{1}{r^{D-2+\eta}}$ (A2.3.327) 

where r is the distance between sites, D is the dimensionality and η is another critical exponent, one finds, as 
for fluids, that (2 - η)ν = γ. 


An alternative formulation of the nearest-neighbour Ising model is to consider the number of up [↑] and down 
[↓] spins, the numbers of nearest-neighbour pairs of spins [↑↑], [↓↓] and [↑↓] and their distribution over the 
lattice sites. Not all of the spin densities are independent since 

$N = [\uparrow] + [\downarrow]$ (A2.3.328) 

and, if q is the coordination number of each site, 

$q[\uparrow] = 2[\uparrow\uparrow] + [\uparrow\downarrow]$ (A2.3.329) 

$q[\downarrow] = 2[\downarrow\downarrow] + [\uparrow\downarrow].$ (A2.3.330) 

Thus, only two of the five quantities [↑], [↓], [↑↑], [↓↓] and [↑↓] are independent. We choose the number of down 
spins [↓] and nearest-neighbour pairs of down spins [↓↓] as the independent variables. Adding and 
subtracting the above two equations, 

$qN = 2([\uparrow\uparrow] + [\downarrow\downarrow] + [\uparrow\downarrow])$ (A2.3.331) 

$q([\uparrow] - [\downarrow]) = 2([\uparrow\uparrow] - [\downarrow\downarrow])$ (A2.3.332) 

and 

$[\uparrow\uparrow] + [\downarrow\downarrow] = qN/2 - [\uparrow\downarrow].$ (A2.3.333) 
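
The counting relations (A2.3.328) to (A2.3.333) hold for every spin configuration, which is easy to confirm numerically. The following sketch is an illustration under stated assumptions: a periodic square lattice (q = 4) of side L ≥ 3 with randomly assigned spins; the lattice size, seed and function name are arbitrary choices.

```python
import random


def spin_counts(spins):
    """Count [up], [down], [up up], [down down], [up down] on a periodic
    square lattice given as an L x L list of +1/-1 values (q = 4)."""
    size = len(spins)
    n_up = sum(1 for row in spins for s in row if s == 1)
    n_down = size * size - n_up
    uu = dd = ud = 0
    for i in range(size):
        for j in range(size):
            s = spins[i][j]
            # Visit the right and lower neighbour so each bond is counted once.
            for t in (spins[(i + 1) % size][j], spins[i][(j + 1) % size]):
                if s == 1 and t == 1:
                    uu += 1
                elif s == -1 and t == -1:
                    dd += 1
                else:
                    ud += 1
    return n_up, n_down, uu, dd, ud


if __name__ == "__main__":
    rng = random.Random(0)
    L, q = 6, 4
    lattice = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    n_up, n_down, uu, dd, ud = spin_counts(lattice)
    print(q * n_up == 2 * uu + ud, q * n_down == 2 * dd + ud)
```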

Defining the magnetization per site as the average number of up spins minus down spins, 

$\langle s_0 \rangle = m(H, T) = ([\uparrow] - [\downarrow])/N = 2([\uparrow\uparrow] - [\downarrow\downarrow])/qN$ (A2.3.334) 

where the last relation follows because we consider only nearest-neighbour interactions between sites. The 
lattice Hamiltonian 

$U_N = -J([\downarrow\downarrow] + [\uparrow\uparrow] - [\uparrow\downarrow]) - H([\uparrow] - [\downarrow]).$ (A2.3.335) 

Making use of the relations between the spin densities, the energy of a given spin configuration can be written 
in terms of the numbers of down spins [↓] and nearest-neighbour down spins [↓↓]: 

$U_N(\{s_k\}) = -J\left(\frac{qN}{2} - 2[\uparrow\downarrow]\right) - H(N - 2[\downarrow]) = -N\left(\frac{qJ}{2} + H\right) + 2(qJ + H)[\downarrow] - 4J[\downarrow\downarrow].$ (A2.3.336) 

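For given J, H, q and N, the PF is determined by the numbers [↓] and [↓↓] and their distribution over the 
sites, and is given by 

$Z(N, H, T) = \exp\left[\beta N\left(\frac{qJ}{2} + H\right)\right] \sum_{[\downarrow]} \exp\{-2\beta(qJ + H)[\downarrow]\} \sum_{[\downarrow\downarrow]} g_N([\downarrow], [\downarrow\downarrow]) \exp(4\beta J[\downarrow\downarrow])$ (A2.3.337) 

where the sum over the number of nearest-neighbour down spins [↓↓] is for a given number of down spins 
[↓], and $g_N([\downarrow], [\downarrow\downarrow])$ is the number of ways of distributing [↓] and [↓↓] over N sites. Summing this over all 
[↓↓] for fixed [↓] just gives the number of ways of distributing [↓] down spins over N sites, so that 

$\sum_{[\downarrow\downarrow]} g_N([\downarrow], [\downarrow\downarrow]) = \frac{N!}{[\downarrow]!\,(N - [\downarrow])!}.$ (A2.3.338) 

In this formulation a central problem is the calculation of $g_N([\downarrow], [\downarrow\downarrow])$. 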

The Ising model is isomorphic with the lattice gas and with the nearest-neighbour model for a binary alloy, 
enabling the solution for one to be transcribed into solutions for the others. The three problems are thus 
essentially one and the same problem, which emphasizes the importance of the Ising model in developing our 
understanding not only of ferromagnets but other systems as well. 

A2.3.7.3 LATTICE GAS 

This model for a fluid was introduced by Lee and Yang [92]. The system is divided into cells with occupation 
numbers 

$n_i = \begin{cases} 1 & \text{cell } i \text{ is occupied} \\ 0 & \text{cell } i \text{ is not occupied.} \end{cases}$ (A2.3.339) 

No more than one particle may occupy a cell, and only nearest-neighbour cells that are both occupied interact 
with energy -ε. Otherwise the energy of interactions between cells is zero. The total energy for a given set of 
occupation numbers {n} = {n_1, n_2, . . . , n_N} of the cells is then 

$E(\{n\}) = -\varepsilon \sum_{\langle ij \rangle} n_i n_j$ (A2.3.340) 

where the sum is over nearest-neighbour cells. The grand PF for this system is 

$\Xi = \sum_{n_1 = 0,1} \cdots \sum_{n_N = 0,1} \exp\left[\beta\mu \sum_i n_i + \beta\varepsilon \sum_{\langle ij \rangle} n_i n_j\right].$ (A2.3.341) 

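The relationship between the lattice gas and the Ising model follows from the observation that the cell 
occupation number 

$n_i = \frac{1}{2}(1 + s_i) = \begin{cases} 1 & s_i = 1 \\ 0 & s_i = -1 \end{cases}$ (A2.3.342) 

which associates the spin variable s_i = ±1 of the Ising model with the cell occupation number of the lattice 
gas. To calculate the energy, note that 

$\sum_{\langle ij \rangle} n_i n_j = \frac{1}{4} \sum_{\langle ij \rangle} (1 + s_i)(1 + s_j) = \frac{Nq}{8} + \frac{q}{4} \sum_i s_i + \frac{1}{4} \sum_{\langle ij \rangle} s_i s_j$ (A2.3.343) 

where the second equality follows from 

$\sum_{\langle ij \rangle} 1 = \frac{Nq}{2}, \qquad \sum_{\langle ij \rangle} (s_i + s_j) = q \sum_i s_i.$ (A2.3.344) 

Also 

$\sum_i n_i = \frac{N}{2} + \frac{1}{2} \sum_i s_i.$ (A2.3.345) 

It follows that the exponent appearing in the PF for the lattice gas, 

$\beta\varepsilon \sum_{\langle ij \rangle} n_i n_j + \beta\mu \sum_i n_i = \beta N\left(\frac{\varepsilon q}{8} + \frac{\mu}{2}\right) + \frac{\beta\varepsilon}{4} \sum_{\langle ij \rangle} s_i s_j + \beta\left(\frac{\varepsilon q}{4} + \frac{\mu}{2}\right) \sum_i s_i.$ (A2.3.346) 

Using this in the lattice gas grand PF, 

$\Xi = \exp\left[\beta N\left(\frac{\varepsilon q}{8} + \frac{\mu}{2}\right)\right] \sum_{\{s_k\}} \exp\left[\frac{\beta\varepsilon}{4} \sum_{\langle ij \rangle} s_i s_j + \beta\left(\frac{\varepsilon q}{4} + \frac{\mu}{2}\right) \sum_i s_i\right].$ (A2.3.347) 

Comparing with the PF of the Ising model 

$Z = \sum_{\{s_k\}} \exp\left[\beta J \sum_{\langle ij \rangle} s_i s_j + \beta H \sum_i s_i\right]$ 

one sees that they are of the same form, with solutions related by the following transcription table: 

    Ising model      Lattice gas 
    4J               ε 
    H                εq/4 + μ/2                    (A2.3.348) 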


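It follows from this that 

$\rho(\text{lattice gas}) = \langle n_i \rangle = (1 + \langle s_i \rangle)/2 = (1 + m)/2$ 
$\mu(\text{lattice gas}) = 2H - 2Jq$ 
$p(\text{lattice gas}) = -\frac{G}{N} + \frac{\varepsilon q}{8} + \frac{\mu}{2} = -\frac{G}{N} - \frac{Jq}{2} + H.$ (A2.3.349) 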
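
The transcription between the two models can be checked by brute force on a small lattice. The sketch below is an illustration under stated assumptions: a periodic ring of N cells (q = 2), the occupation-number convention n_i = (1 + s_i)/2, and the identifications ε = 4J and μ = 2H − 2qJ, so that the lattice gas grand PF should equal exp[βN(εq/8 + μ/2)] times the Ising PF. The function names and numerical values are hypothetical.

```python
import itertools
import math


def ising_pf(n, J, H, beta):
    """Ising PF on a periodic ring of n sites (n >= 3), by enumeration."""
    total = 0.0
    for s in itertools.product((-1, 1), repeat=n):
        pairs = sum(s[i] * s[(i + 1) % n] for i in range(n))
        total += math.exp(beta * (J * pairs + H * sum(s)))
    return total


def lattice_gas_grand_pf(n, eps, mu, beta):
    """Lattice gas grand PF on the same ring, by enumeration."""
    total = 0.0
    for occ in itertools.product((0, 1), repeat=n):
        pairs = sum(occ[i] * occ[(i + 1) % n] for i in range(n))
        total += math.exp(beta * (mu * sum(occ) + eps * pairs))
    return total


def grand_pf_from_ising(n, J, H, beta, q=2):
    """Grand PF predicted from the Ising PF via eps = 4J, mu = 2H - 2qJ."""
    eps = 4.0 * J
    mu = 2.0 * H - 2.0 * q * J
    prefactor = math.exp(beta * n * (eps * q / 8.0 + mu / 2.0))
    return prefactor * ising_pf(n, J, H, beta)


if __name__ == "__main__":
    J, H, beta, n, q = 0.5, 0.2, 0.8, 6, 2
    print(lattice_gas_grand_pf(n, 4 * J, 2 * H - 2 * q * J, beta),
          grand_pf_from_ising(n, J, H, beta, q))
```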

At H = 0, μ(lattice gas) = -2Jq and the chemical potential is analytic even at T = T_c. From the thermodynamic 
relation, 

$d\mu = -S_M\,dT + V_M\,dp$ (A2.3.350) 

where S_M and V_M are the molar entropy and volume, it follows that 

$C_V = -T\left(\frac{\partial^2 \mu}{\partial T^2}\right)_V + V_M T\left(\frac{\partial^2 p}{\partial T^2}\right)_V.$ (A2.3.351) 

The specific heat along the critical isochore hence has the same singularity as $(\partial^2 p/\partial T^2)$ for a lattice gas. 




The relationship between the lattice gas and the Ising model is also transparent in the alternative formulation 
of the problem, in terms of the number of down spins [↓] and pairs of nearest-neighbour down spins [↓↓]. For 
a given degree of site occupation [↓], 

$U_N = -\varepsilon[\downarrow\downarrow]$ (A2.3.352) 

and the lattice gas canonical ensemble PF 

$Q([\downarrow], N, T) = \sum_{[\downarrow\downarrow]} g_N([\downarrow], [\downarrow\downarrow]) \exp(\beta\varepsilon[\downarrow\downarrow]).$ (A2.3.353) 

Removing the restriction on fixed [↓], by considering the grand ensemble which sums over [↓], one has 

$\exp(\beta p N) = \Xi(z, N, T) = \sum_{[\downarrow]} z^{[\downarrow]} \sum_{[\downarrow\downarrow]} g_N([\downarrow], [\downarrow\downarrow]) \exp(\beta\varepsilon[\downarrow\downarrow])$ (A2.3.354) 

where the fugacity z = exp(βμ). Comparing this with the PF for the Ising model in this formulation, the entries 
in the transcription table given above are readily derived. Note that 

$m(T, H) \Leftrightarrow (1 - 2\rho)$ (A2.3.355) 

and 

$2H \Leftrightarrow -kT \ln(z/\sigma)$ (A2.3.356) 

where σ = exp(-2βqJ) = exp(-βqε/2). Since m is an odd function of H for the Ising ferromagnet, (1 - 2ρ) 
must be an odd function of kT ln(z/σ) for a lattice gas and m = 0 corresponds to ρ = 1/2 for a lattice gas. The 
liquid and vapour branches of the lattice gas are completely symmetrical about ρ = 1/2 when T < T_c. The 
phase diagram in the ρ-μ plane is illustrated in figure A2.3.23. 
Figure A2.3.23 The phase diagram for the lattice gas. 

For two symmetrically placed points A and B on the isotherm, i.e. conjugate phases, 

$\rho(z) = 1 - \rho(z')$ 
$\ln(z/\sigma) = -\ln(z'/\sigma)$, i.e. $zz' = \sigma^2$ 
$\beta p(z) - (1/2)\ln z = \beta p(z') - (1/2)\ln z'$ 
$E/N - \rho(z) q\varepsilon/2 = E'/N - \rho(z') q\varepsilon/2$ (A2.3.357) 

from which it follows that for the conjugate phases 

$\rho(z) + \rho(z') = 1$ 
$p(z, T) - p(z', T) = (1/2)[\mu(z, T) - \mu(z', T)]$ 
$E(z, T) - E(z', T) = (Nq\varepsilon/2)[\rho(z, T) - \rho(z', T)].$ (A2.3.358) 

A2.3.7.4 BINARY ALLOY 


A binary alloy of two components A and B with nearest-neighbour interactions ε_AA, ε_BB and ε_AB, 
respectively, is also isomorphic with the Ising model. This is easily seen on associating spin down with atom A 
and spin up with atom B. There are no vacant sites, and the occupation numbers of the site i are defined by 

$n_{iA} = (1/2)(1 - s_i) \qquad n_{iB} = (1/2)(1 + s_i).$ (A2.3.359) 

Summing over the sites 

$\sum_i (n_{iA} + n_{iB}) = N_A + N_B = N$ (A2.3.360) 

where N_A and N_B are the number of atoms of A and B, respectively, distributed over N sites. For an open 
system, 

$\langle N_A \rangle = \left\langle \sum_i (1/2)(1 - s_i) \right\rangle = (N/2)[1 - m(H, T)].$ (A2.3.361) 

The coordination number of each site is q, and 

$qN_A = 2N_{AA} + N_{AB}$ (A2.3.362) 

$qN_B = 2N_{BB} + N_{AB}$ (A2.3.363) 

$N = N_A + N_B = (2/q)[N_{AA} + N_{BB} + N_{AB}].$ (A2.3.364) 

On a given lattice of N sites, one number from the set {N_A, N_B} and another from the set {N_AA, N_BB, N_AB} 
determine the rest. We choose N_A and N_AA as the independent variables. Assuming only nearest-neighbour 
interactions, the energy of a given configuration 

$U_N(N_A, N_{AA}) = \frac{qN\varepsilon_{BB}}{2} + qN_A(\varepsilon_{AB} - \varepsilon_{BB}) + N_{AA}(\varepsilon_{AA} + \varepsilon_{BB} - 2\varepsilon_{AB})$ (A2.3.365) 

which should be compared with the corresponding expressions for the lattice gas and the Ising model. The 
grand PF for the binary alloy is 

$\Xi(N, z_A, z_B, T) = \sum_{N_A} z_A^{N_A} z_B^{N_B} \sum_{N_{AA}} g_N(N_A, N_{AA}) \exp[-\beta U_N(N_A, N_{AA})]$ 
$= z_B^N \exp(-\beta qN\varepsilon_{BB}/2) \sum_{N_A} (z_A/z_B)^{N_A} \exp[-\beta qN_A(\varepsilon_{AB} - \varepsilon_{BB})] \sum_{N_{AA}} g_N(N_A, N_{AA}) \exp[-\beta N_{AA}(\varepsilon_{AA} + \varepsilon_{BB} - 2\varepsilon_{AB})]$ (A2.3.366) 

where $g_N(N_A, N_{AA})$ is the number of ways of distributing N_A and N_AA over N lattice sites and z_A = exp(βμ_A) 
and z_B = exp(βμ_B) are the fugacities of A and B, respectively. Comparing the grand PF for the binary alloy 
with the corresponding PFs for the lattice gas and Ising model leads to the following transcription table: 

    Ising model      Lattice gas      Binary alloy 
    -4J              -ε               ε_AA + ε_BB - 2ε_AB 
    -2(qJ + H)       μ                μ_A - μ_B + q(ε_BB - ε_AB) 

When 2ε_AB > (ε_AA + ε_BB), the binary alloy corresponds to an Ising ferromagnet (J > 0) and the system splits 
into two phases: one rich in A and the other rich in component B below the critical temperature T_c. On the 
other hand, when 2ε_AB < (ε_AA + ε_BB), the system corresponds to an antiferromagnet: the ordered phase below 
the critical temperature has A and B atoms occupying alternate sites. 


A2.3.8 MEAN-FIELD THEORY AND EXTENSIONS 

Our discussion shows that the Ising model, lattice gas and binary alloy are related and present one and the 
same statistical mechanical problem. The solution to one provides, by means of the transcription tables, the 
solution to the others. Historically, however, they were developed independently before the analogy between 
the models was recognized. 

We now turn to a mean-field description of these models, which in the language of the binary alloy is the 
Bragg-Williams approximation and is equivalent to the Curie-Weiss approximation for the Ising model. Both 
these approximations are closely related to the van der Waals description of a one-component fluid, and lead 
to the same classical critical exponents α = 0, β = 1/2, δ = 3 and γ = 1. 

As a prelude to discussing mean-field theory, we review the solution for non-interacting magnets by setting J 
= 0 in the Ising Hamiltonian. The PF 

$Z(N, H, T) = \sum_{s_1 = \pm 1} \sum_{s_2 = \pm 1} \cdots \sum_{s_N = \pm 1} \exp\left(\beta H \sum_{i=1}^{N} s_i\right) = \prod_{i=1}^{N} \sum_{s_i = \pm 1} \exp(\beta H s_i) = \left[\sum_{s = \pm 1} \exp(\beta H s)\right]^N = 2^N \cosh^N(\beta H)$ (A2.3.367) 


where the third step follows from the identity of all the N lattice sites. The magnetization per site 

$m(H, T) = \frac{1}{\beta N}\left(\frac{\partial \ln Z}{\partial H}\right)_T = \tanh(\beta H)$ (A2.3.368) 

and the graph of m(H,T) versus H is a symmetrical sigmoid curve through the origin with no residual 
magnetization at any temperature when H = 0. This is because there are no interactions between the sites. We 
will see that a modification of the local field at a site that includes, even approximately, the effect of 
interactions between the sites leads to a critical temperature and residual magnetization below this 
temperature. 

The local field at site i in a given configuration is 

$H_i = J \sum_{j(i)} s_j + H = qJ\langle s_j \rangle + H + J \sum_{j(i)} (s_j - \langle s_j \rangle)$ (A2.3.369) 

where the sum is over the q nearest neighbours j of site i and the last term represents a fluctuation from the 
average value of the spin, which is the magnetization m(H,T) per site. In the mean-field theory, this fluctuation 
is ignored and the effective mean field at all sites is 

$H_{\mathrm{eff}} = qJm(H, T) + H.$ (A2.3.370) 

Substituting this in the expression for the PF for non-interacting magnets with the external field replaced by 
the effective field H_eff we have 

$Z_{\mathrm{eff}}(N, H, T) = 2^N \cosh^N[\beta(qJm(H, T) + H)]$ (A2.3.371) 

and by differentiation with respect to H, 

$m(H, T) = \tanh[\beta(qJm(H, T) + H)]$ (A2.3.372) 

from which it follows that 

$\frac{1 + m}{1 - m} = \exp[2\beta(qJm + H)].$ (A2.3.373) 
Since dG = -S dT - M dH, integration with respect to H yields the free energy per site: 

$\frac{G}{N} = \frac{kT}{2} \ln\left[\frac{1 - m^2}{4}\right] + \frac{qJm^2}{2} = -\frac{qJm^2}{2} - Hm + kT\left[\frac{1 + m}{2} \ln\frac{1 + m}{2} + \frac{1 - m}{2} \ln\frac{1 - m}{2}\right]$ (A2.3.374) 

where, in the second form, the first two terms represent the energy contribution and the last term is the 
negative of the temperature times the contribution of the entropy to the free energy. It is apparent that this 
entropy contribution corresponds to ideal mixing. 

At zero field (H = 0), 

$m(0, T) = \tanh[\beta qJm(0, T)]$ (A2.3.375) 

which can be solved graphically by plotting tanh[βqJm(0,T)] versus m(0,T) and finding where this cuts the 
line through the origin with a slope of one, see figure A2.3.24. 
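
The graphical construction can also be carried out numerically. The following sketch is a minimal illustration, not part of the original treatment: it finds the positive root of m = tanh(qJm/kT) by bisection, returning zero when only the trivial root exists. The coordination number q = 4, the units J = k = 1 and the function name are assumptions made for the example.

```python
import math


def residual_magnetization(T, q=4, J=1.0, k=1.0, tol=1e-12):
    """Positive solution of m = tanh(q*J*m/(k*T)), or 0.0 if none exists."""
    a = q * J / (k * T)          # slope of tanh(a*m) at m = 0
    if a <= 1.0:
        return 0.0               # only the trivial root m = 0
    # f(m) = tanh(a*m) - m is positive just above m = 0 and negative at m = 1,
    # so the non-trivial root is bracketed by [lo, hi].
    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.tanh(a * mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for T in (1.0, 2.0, 3.0, 3.9, 4.1, 5.0):
        print(T, residual_magnetization(T))
```

With these parameters a non-zero solution appears only when qJ/kT exceeds one, that is, for T below 4 in these units.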

Figure A2.3.24 Plot of tanh[βqJm(0,T)] versus m(0,T) at different temperatures. 
Since 

$\tanh(\beta qJm) = \beta qJm - (1/3)(\beta qJm)^3 + \cdots$ (A2.3.376) 

the slope as m → 0 is βqJ. A solution exists for the residual magnetization when the slope is greater than 1. 
This implies that the critical temperature T_c = qJ/k, which depends on the coordination number q and is 
independent of the dimensionality. Table A2.3.5 compares the critical temperatures predicted by mean-field 
theory with the 'exact' results. Mean-field theory is seriously in error for 1D systems but its accuracy 
improves with the dimensionality. For D > 4 it is believed to be exact. 


It follows from our equation (A2.3.373) that 

$H = \frac{1}{2\beta} \ln\left[\frac{1 + m}{1 - m}\right] - qJm.$ (A2.3.377) 

The magnetization is plotted as a function of the field in figure A2.3.25. 

Figure A2.3.25 The magnetic field versus the magnetization m(H,T) at different temperatures. 

When T > T_c, m = 0 and the susceptibility at zero field χ_T(0) > 0. At T = T_c, $(\partial H/\partial m)_{T, H=0} = 0$ which implies 
that the susceptibility diverges, i.e. χ_T(0) = ∞. 

When T < T_c, the graph of H versus m shows a van der Waals like loop, with an unstable region where the 
susceptibility χ_T(0) < 0. In the limit H → 0, there are three solutions for the residual magnetization m = (-m_0, 
0, m_0), of which the solution m = 0 is rejected as unphysical since it lies in the unstable region. The 
symmetrically disposed acceptable solutions for the residual magnetizations are solutions to 
$m_0 = \tanh[\beta qJm_0] = \tanh[(T_c/T)m_0] = (T_c/T)m_0 - (1/3)[(T_c/T)m_0]^3 + \cdots$ 

from which it follows that as T → T_c 

$m_0 \approx \sqrt{3}\,(T/T_c)[1 - (T/T_c)]^{1/2}.$ (A2.3.378) 

This shows that the critical exponent β = 1/2. 
The susceptibility at finite field $H$ is given by

$$\chi_T(H) = (\partial m/\partial H)_T = \frac{\beta(1 - m^2)}{1 - \beta qJ(1 - m^2)}. \tag{A2.3.379}$$

Recalling that $\beta qJ = T_c/T$, the susceptibility at zero field ($H \to 0$) is

$$\chi_T(0) = \frac{\beta(1 - m_0^2)}{1 - (T_c/T)(1 - m_0^2)} = \frac{1 - m_0^2}{k[(T - T_c) + T_c m_0^2]}. \tag{A2.3.380}$$

For $T > T_c$, $m_0 = 0$ and

$$\chi_T(0) = A_1(T - T_c)^{-\gamma} = (1/k)(T - T_c)^{-1} \tag{A2.3.381}$$

which shows that the critical exponent $\gamma = 1$ and the amplitude $A_1$ of the divergence of $\chi_T(0)$ is $1/k$, when the
critical point is approached from above $T_c$. When $T < T_c$, $m_0^2 \approx 3(T_c - T)/T_c$ and

$$\chi_T(0) = A_2(T_c - T)^{-\gamma} = (1/2k)(T_c - T)^{-1} \tag{A2.3.382}$$

which shows that the critical exponent $\gamma$ remains the same but the amplitude $A_2$ of the divergence is $1/2k$ when
the critical point is approached from below $T_c$. This is half the amplitude when $T_c$ is approached from above.
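The factor of two between the amplitudes can be checked numerically from equation (A2.3.380). The sketch below assumes reduced units with $k = 1$ and $T_c = qJ/k = 1$ (the constants `K_B` and `T_C` are illustrative), solves the zero-field self-consistency condition for $m_0$, and evaluates the susceptibility at equal small distances on either side of $T_c$.

```python
import math

K_B = 1.0   # Boltzmann constant in reduced units (assumption)
T_C = 1.0   # critical temperature qJ/k in reduced units (assumption)

def m0_squared(T):
    """Square of the residual magnetization from m = tanh(m Tc/T) (bisection)."""
    if T >= T_C:
        return 0.0
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - math.tanh(mid * T_C / T) < 0.0:
            lo = mid
        else:
            hi = mid
    return (0.5 * (lo + hi)) ** 2

def chi_zero_field(T):
    """Zero-field susceptibility, equation (A2.3.380)."""
    m2 = m0_squared(T)
    return (1.0 - m2) / (K_B * ((T - T_C) + T_C * m2))

eps = 1e-4
ratio = chi_zero_field(T_C + eps) / chi_zero_field(T_C - eps)
print(f"chi(Tc + eps) / chi(Tc - eps) = {ratio:.4f}")
```

The ratio tends to 2 as `eps` is reduced, with corrections of order `eps`.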

Along the critical isotherm, $T = T_c$,

$$H = kT_c\left[\tfrac{1}{2}\ln\left(\frac{1 + m_0}{1 - m_0}\right) - m_0\right] = kT_c\left[m_0 + \frac{m_0^3}{3} + \frac{m_0^5}{5} + \cdots - m_0\right] \approx kT_c\frac{m_0^3}{3} + \cdots. \tag{A2.3.383}$$

It follows that the critical exponent $\delta$, defined by $H \propto m_0^{\delta}$, is 3.

Fluctuations in the magnetization are ignored by mean-field theory and there is no correlation between
neighbouring sites, so that

$$\langle s_i s_j\rangle = \langle s_i\rangle\langle s_j\rangle \tag{A2.3.384}$$

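and the spins are randomly distributed over the sites. As seen earlier, the entropy contribution to the free
energy is that of ideal mixing of up and down spins. The average energy

$$\langle E\rangle = -J\sum_{\langle ij\rangle}\langle s_i s_j\rangle - H\sum_i\langle s_i\rangle = -J\sum_{\langle ij\rangle}\langle s_i\rangle\langle s_j\rangle - H\sum_i\langle s_i\rangle = -J(Nq/2)m^2 - HNm \tag{A2.3.385}$$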

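which is in accord with our interpretation of the terms contributing to the free energy in the mean-field
approximation. Since $m = 0$ at zero field for $T > T_c$ and $m = \pm m_0$ at zero field when $T < T_c$, the configurational
energy at zero field ($H = 0$) is given by

$$\langle E\rangle_{H=0} = \begin{cases} 0 & T > T_c \\ -(NqJ/2)\,m_0^2 & T < T_c. \end{cases} \tag{A2.3.386}$$

This shows very clearly that the specific heat has a jump discontinuity at $T = T_c$:

$$C_{H=0}(T) = (1/N)\left(\partial\langle E\rangle_{H=0}/\partial T\right) = \begin{cases} 0 & T > T_c \\ -(qJ/2)\,(\partial m_0^2/\partial T) & T < T_c. \end{cases} \tag{A2.3.387}$$

With $m_0^2 \approx 3(T_c - T)/T_c$ and $qJ = kT_c$, the lower branch tends to $3k/2$ as $T \to T_c^-$.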


The neglect of fluctuations in mean-field theory implies that, with a common constant of proportionality,

$$[\downarrow\downarrow] \propto [\downarrow]^2 \qquad [\uparrow\uparrow] \propto [\uparrow]^2 \qquad [\uparrow\downarrow] \propto 2[\uparrow][\downarrow] \tag{A2.3.388}$$

and it follows that

$$\frac{[\uparrow\downarrow]^2}{[\uparrow\uparrow][\downarrow\downarrow]} = 4. \tag{A2.3.389}$$

This is the equilibrium constant for the 'reaction'

$$[\uparrow\uparrow] + [\downarrow\downarrow] = 2[\uparrow\downarrow] \tag{A2.3.390}$$

assuming the energy change is zero. An obvious improvement is to use the correct energy change ($4J$) for the
'reaction', when

$$\frac{[\uparrow\downarrow]^2}{[\uparrow\uparrow][\downarrow\downarrow]} = 4\exp(-4J/kT). \tag{A2.3.391}$$

This is the quasi-chemical approximation introduced by Fowler and Guggenheim [98] which treats the
nearest-neighbour pairs of sites, and not the sites themselves, as independent. It is exact in one dimension. The
critical temperature in this approximation is

$$T_c = (2J/k)\,[1/\ln(q/(q-2))] \tag{A2.3.392}$$

which predicts the correct result of $T_c = 0$ for the 1D Ising model, and better estimates than mean-field theory,
as seen in table A2.3.5, for the same model in two and three dimensions (d = 2 and 3). Bethe [99] obtained
equivalent results by a different method. Mean-field theory now emerges as an approximation to the
quasi-chemical approximation, but the critical exponents in the quasi-chemical approximation are still the classical
values. Figure A2.3.26 shows mean-field and quasi-chemical approximations for the specific heat and residual
magnetization of a square lattice (d = 2) compared to the exact results.
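The two closed-form estimates of the critical temperature are simple to tabulate. The sketch below evaluates $kT_c/J$ from mean-field theory ($T_c = qJ/k$) and from equation (A2.3.392) for a few coordination numbers, and compares the square-lattice values with the exact result that follows from the condition $\sinh(2J/kT_c) = 1$ discussed later in this section. The function names are illustrative.

```python
import math

def tc_mean_field(q):
    """k*Tc/J in mean-field theory: Tc = qJ/k."""
    return float(q)

def tc_quasi_chemical(q):
    """k*Tc/J from equation (A2.3.392); q = 2 gives Tc = 0."""
    if q <= 2:
        return 0.0
    return 2.0 / math.log(q / (q - 2))

# Exact 2D square-lattice value from sinh(2J/kTc) = 1.
tc_exact_square = 2.0 / math.asinh(1.0)

for name, q in (("1D chain", 2), ("square", 4), ("simple cubic", 6)):
    print(f"{name:12s} q={q}  MFT={tc_mean_field(q):.3f}  QC={tc_quasi_chemical(q):.3f}")
print(f"exact square lattice: {tc_exact_square:.3f}")
```

For the square lattice the ordering is mean-field > quasi-chemical > exact, so both approximations overestimate $T_c$, the quasi-chemical one by less.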

Table A2.3.5 Critical temperatures predicted by mean-field theory (MFT) and the quasi-chemical (QC)
approximation compared with the exact results.

Figure A2.3.26 Mean-field and quasi-chemical approximations for the specific heat and residual
magnetization of a square lattice (d = 2) compared to the exact results.

A2.3.8.1 LANDAU'S GENERALIZED MEAN-FIELD THEORY

An essential feature of mean-field theories is that the free energy is an analytical function at the critical point. 
Landau [ 100 ] used this assumption, and the up-down symmetry of magnetic systems at zero field, to analyse 
their phase behaviour and determine the mean-field critical exponents. It also suggests a way in which mean- 
field theory might be modified to conform with experiment near the critical point, leading to a scaling law, 
first proposed by Widom [ 101 ], which has been experimentally verified. 

Assume that the free energy can be expanded in powers of the magnetization $m$, which is the order parameter.
At zero field, only even powers of $m$ appear in the expansion, due to the up-down symmetry of the system,
and

$$G = a_0 + a_2 m^2 + a_4 m^4 + \cdots \tag{A2.3.393}$$

where the coefficients $a_i$ are temperature dependent and $a_4 > 0$ but $a_2$ may be positive, negative or zero as
illustrated in figure A2.3.27.

Figure A2.3.27 The free energy as a function of $m$ in the Landau theory for (a) $a_2 > 0$, (b) $a_2 = 0$, (c) $a_2 < 0$
and (d) $a_2 < 0$ with $a_4 > 0$.

A finite residual magnetization of $\pm m_0$ is obtained only if $a_2 < 0$; and the critical temperature corresponds to
$a_2 = 0$. Assume that $a_2$ is linear in $t = (T - T_c)/T_c$, near $T_c$, when

$$a_2 \approx a_2^0 t. \tag{A2.3.394}$$

The free energy expansion reads

$$G = a_0 + a_2^0 t m^2 + a_4 m^4 + \cdots. \tag{A2.3.395}$$

The equilibrium magnetization corresponds to a minimum free energy which implies that

$$(\partial G/\partial m) = 0 = 2a_2^0 t m + 4a_4 m^3 + \cdots.$$

It follows that

$$m_0^2 = [a_2^0/(2a_4)](-t) \tag{A2.3.396}$$

which implies that $m_0$ is real and finite if $t < 0$ (i.e. $T < T_c$) and the critical exponent $\beta = 1/2$. For $t > 0$ the only
solution is $m_0 = 0$. Hence $m_0$ changes continuously with $t$ and is zero for $t > 0$.

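Along the coexistence curve, $t < 0$,

$$G = a_0 - \frac{(a_2^0)^2 t^2}{4a_4} \tag{A2.3.397}$$

and the specific heat

$$C = -T(\partial^2 G/\partial T^2) = \begin{cases} 0 & t > 0 \\ T(a_2^0)^2/(2a_4 T_c^2) & t < 0 \end{cases} \tag{A2.3.398}$$

where only the contribution of the $m$-dependent terms is shown and $a_2^0$ and $a_4$ are treated as constants near $T_c$.
There is a jump discontinuity in the specific heat as the temperature passes from below to above the critical
temperature.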

To determine the critical exponents $\gamma$ and $\delta$, a magnetic interaction term $-hm$ is added to the free energy and

$$G = a_0 - hm + a_2^0 t m^2 + a_4 m^4 + \cdots. \tag{A2.3.399}$$

Minimizing the free energy with respect to $m$, one finds

$$h = 2a_2^0 t m + 4a_4 m^3 + \cdots. \tag{A2.3.400}$$

Along the critical isotherm, $t = 0$ and

$$h \approx 4a_4 m^3 \tag{A2.3.401}$$

which implies that $\delta = 3$.

For $t \ne 0$, the inverse susceptibility

$$\chi_T^{-1} = (\partial h/\partial m)_T = 2a_2^0 t + 12a_4 m^2 \tag{A2.3.402}$$

which, as $h \to 0$, leads to

$$\chi_T(0)^{-1} = (\partial h/\partial m)_T = 2a_2^0 t + 12a_4 m_0^2. \tag{A2.3.403}$$

When $t > 0$, $m_0 = 0$ and

$$\chi_T(0)_{h=0} = (1/2a_2^0)\,t^{-1} \tag{A2.3.404}$$

while for $t < 0$, $m_0^2 = [a_2^0/(2a_4)](-t)$ and

$$\chi_T(0)_{h=0} = -(1/4a_2^0)\,t^{-1}. \tag{A2.3.405}$$

This implies that the critical exponent $\gamma = 1$, whether the critical temperature is approached from above or
below, but the amplitudes are different by a factor of 2, as seen in our earlier discussion of mean-field theory.
The critical exponents are the classical values $\alpha = 0$, $\beta = 1/2$, $\delta = 3$ and $\gamma = 1$.
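The Landau results can be verified with a few lines of arithmetic. The sketch below is an illustration under assumed values $a_2^0 = a_4 = 1$ (the parameters `a2` and `a4` are placeholders, not values taken from the text): it locates the zero-field minimum of the free energy on a grid, compares it with equation (A2.3.396), and evaluates the ratio of the susceptibilities at equal $|t|$ above and below the critical point.

```python
def landau_G(m, t, h=0.0, a2=1.0, a4=1.0):
    """Landau free energy relative to a_0: -h m + a2 t m^2 + a4 m^4."""
    return -h * m + a2 * t * m**2 + a4 * m**4

def m0_numeric(t, a2=1.0, a4=1.0, n=200001):
    """Minimise G on a uniform grid of m in [0, 2] at zero field."""
    grid = [2.0 * i / (n - 1) for i in range(n)]
    return min(grid, key=lambda m: landau_G(m, t, 0.0, a2, a4))

def inverse_chi(t, a2=1.0, a4=1.0):
    """Zero-field inverse susceptibility 2 a2 t + 12 a4 m0^2, equation (A2.3.403)."""
    m0_sq = a2 * (-t) / (2.0 * a4) if t < 0 else 0.0
    return 2.0 * a2 * t + 12.0 * a4 * m0_sq

t_below, t_above = -0.02, 0.02
print(m0_numeric(t_below), (-t_below / 2.0) ** 0.5)    # grid minimum vs (A2.3.396)
print(inverse_chi(t_below) / inverse_chi(t_above))     # chi(above) / chi(below)
```

The second line of output is the amplitude ratio of 2 quoted in the text.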

The assumption that the free energy is analytic at the critical point leads to classical exponents. Deviations
from this require that this assumption be abandoned. In mean-field theory,

$$h = am(t + bm^2) + \cdots \tag{A2.3.406}$$

near the critical point, which implies $\beta = 1/2$ and $\gamma = 1$. Modifying this to

$$h = am(t + bm^{1/\beta})^{\gamma} \tag{A2.3.407}$$

implies that $\delta = 1 + \gamma/\beta$, which is correct. Widom postulated that

$$h = m\,\phi(t, m^{1/\beta}) \tag{A2.3.408}$$

where $\phi$ is a generalized homogeneous function of degree $\gamma$

$$\phi(\lambda t, \lambda m^{1/\beta}) = \lambda^{\gamma}\phi(t, m^{1/\beta}). \tag{A2.3.409}$$

This is Widom's scaling assumption. It predicts a scaled equation of state, like the law of corresponding
states, that has been verified for fluids and magnets [102].
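As a short worked check, the three exponents can be read off the modified equation of state (A2.3.407) directly. The steps below are a sketch that assumes $a > 0$ and $b > 0$; they are not reproduced from the original.

```latex
% Critical isotherm, t = 0:
h = a m \,(b\, m^{1/\beta})^{\gamma} = a\, b^{\gamma}\, m^{\,1 + \gamma/\beta}
\quad\Longrightarrow\quad \delta = 1 + \gamma/\beta .

% Zero field, t < 0: the bracket must vanish for m \neq 0,
t + b\, m_0^{1/\beta} = 0
\quad\Longrightarrow\quad m_0 = (-t/b)^{\beta} .

% Zero field, t > 0, m \to 0:
\chi_T(0)^{-1} = (\partial h/\partial m)_{m \to 0} = a\, t^{\gamma}
\quad\Longrightarrow\quad \chi_T(0) \propto t^{-\gamma} .
```

With the classical values $\beta = 1/2$ and $\gamma = 1$ this reproduces $\delta = 3$.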

A2.3.9 HIGH- AND LOW-TEMPERATURE EXPANSIONS 


Information about the behaviour of the 3D Ising ferromagnet near the critical point was first obtained from
high- and low-temperature expansions. The expansion parameter in the high-temperature series is $\tanh K$, and
the corresponding parameter in the low-temperature expansion is $\exp(-2K)$. A 2D square lattice is self-dual in
the sense that the bisectors of the lines joining the lattice points also form a square lattice and the coefficients
of the two expansions, for the 2D square lattice system, are identical to within a factor of two. The singularity
occurs when

$$\tanh K = \exp(-2K). \tag{A2.3.410}$$

Kramers and Wannier [103] used this to locate the critical temperature $T_c = 2.27\,J/k$.


A2.3.9.1 THE HIGH-TEMPERATURE EXPANSION 


The PF at zero field

$$Z(N, 0, T) = \sum_{\{s\}}\exp\Big(K\sum_{\langle ij\rangle}s_i s_j\Big) \tag{A2.3.411}$$

where $K = \beta J$ and $\{s\}$ implies summation over the spins on the lattice sites. Since $s_i s_j = \pm 1$,

$$\exp(Ks_i s_j) = \exp(\pm K) = \cosh K \pm \sinh K = \cosh K + s_i s_j\sinh K = \cosh K\,[1 + s_i s_j\tanh K] \tag{A2.3.412}$$

from which it follows that

$$Z(N, 0, T) = (\cosh K)^{Nq/2}\sum_{\{s\}}\prod_{\langle ij\rangle}(1 + s_i s_j\tanh K) = (\cosh K)^{Nq/2}\sum_{\{s\}}\Big[1 + \tanh K\sum_1 s_i s_j + \tanh^2 K\sum_2 (s_i s_j)(s_k s_l) + \cdots\Big] \tag{A2.3.413}$$

where $\sum_l$ is the sum over all possible sets of $l$ pairs of nearest-neighbour spins. The expansion parameter is
$\tanh K$, which $\to 0$ as $T \to \infty$ and becomes 1 as $T \to 0$. The expansion coefficients can be expressed
graphically. A coefficient $(s_i s_j)(s_k s_l)\ldots(s_p s_q)$ of $\tanh^r K$ is the product or sum of products of graphs with $r$
bonds in which each bond is depicted as a line joining two sites. Note also that

$$\sum_{s_i=\pm 1}s_i^n = \begin{cases} 2 & n\ \text{even} \\ 0 & n\ \text{odd.} \end{cases} \tag{A2.3.414}$$

Hence, on summing over the graphs, the only non-zero terms are closed polygons with an even number of
bonds at each site, i.e. $s_i$ must appear an even number of times at a lattice site in a graph that does not add up
to zero on summing over the spins on the sites.


Each lattice point extraneous to the sites connected by graphs also contributes a factor of two on summing
over spin states. Hence all lattice points contribute a factor of $2^N$ whether they are connected or not, and

$$Z(N, 0, T) = (\cosh K)^{Nq/2}\,2^N\sum_r n(r, N)\tanh^r K \tag{A2.3.415}$$

where (for $r \ne 0$), $n(r, N)$ is the number of distinct $r$-side polygons (closed graphs) drawn on $N$ sites such that
there are an even number of bonds on each site. For $r = 0$, no lattice site is connected, but define $n(0, N) = 1$.
Also, since closed polygons cannot be connected on one or two sites, $n(1, N) = n(2, N) = 0$. The problem then
is to count $n(r, N)$ for all $r$. On an infinite lattice of identical sites, $n(r, N) = Np(r)$ where $p(r)$ is the number of
$r$-side polygons that can be constructed on a given lattice site. This number is closely connected to the structure
of the lattice.
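The graph-counting rule can be tested on a small cluster, where both the direct spin sum and the closed-graph sum are cheap to enumerate. The sketch below uses a hypothetical 3 x 2 patch of square lattice with free edges (6 sites, 7 bonds). Because the patch is finite and not periodic, the prefactor $(\cosh K)^{Nq/2}$ is replaced by $(\cosh K)^{B}$, with $B$ the actual number of bonds; that substitution is an assumption of the example.

```python
import itertools
import math

def brute_force_Z(sites, bonds, K):
    """Sum exp(K * sum over bonds of s_i s_j) over all spin configurations."""
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=len(sites)):
        s = dict(zip(sites, spins))
        total += math.exp(K * sum(s[i] * s[j] for i, j in bonds))
    return total

def closed_graph_counts(sites, bonds):
    """n(r): number of bond subsets of size r with an even number of bonds at every site."""
    counts = {}
    for r in range(len(bonds) + 1):
        for subset in itertools.combinations(bonds, r):
            degree = dict.fromkeys(sites, 0)
            for i, j in subset:
                degree[i] += 1
                degree[j] += 1
            if all(d % 2 == 0 for d in degree.values()):
                counts[r] = counts.get(r, 0) + 1
    return counts

def graph_expansion_Z(sites, bonds, K):
    """(cosh K)^B * 2^N * sum_r n(r) tanh^r K for a finite cluster with B bonds."""
    n = closed_graph_counts(sites, bonds)
    series = sum(c * math.tanh(K) ** r for r, c in n.items())
    return math.cosh(K) ** len(bonds) * 2 ** len(sites) * series

# 3 x 2 patch of square lattice with free edges
sites = [(x, y) for x in range(3) for y in range(2)]
bonds = [((x, y), (x + 1, y)) for x in range(2) for y in range(2)]
bonds += [((x, 0), (x, 1)) for x in range(3)]

K = 0.3
print(closed_graph_counts(sites, bonds))
print(brute_force_Z(sites, bonds, K), graph_expansion_Z(sites, bonds, K))
```

The only closed graphs on this patch are the empty graph, the two unit squares and the six-bond outer rectangle, and the two evaluations of the PF agree to rounding error.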

A2.3.9.2 THE LOW-TEMPERATURE EXPANSION

At zero field (see equation (A2.3.335)),

$$\sum_{\langle ij\rangle}s_i s_j = [\uparrow\uparrow] + [\downarrow\downarrow] - [\uparrow\downarrow] = qN/2 - 2[\uparrow\downarrow] \tag{A2.3.416}$$

and

$$Z(N, 0, T) = \exp(qNK/2)\sum\exp(-2K[\uparrow\downarrow]) = \exp(qNK/2)\sum_r m(r, N)\exp(-2Kr) \tag{A2.3.417}$$

where $m(r, N)$ is the number of configurations with $[\uparrow\downarrow] = r$. The high- and low-temperature expansions are
complementary.

The dual lattice is obtained by drawing the bisectors of lines connecting neighbouring lattice points. Examples 
of lattices in two dimensions and their duals are shown in figure A2.3.28 . A square lattice is self-dual. 

Consider a closed polygon which appears in the high-temperature expansion. Put up spins $[\uparrow]$ on the sites of
the lattice inside the polygon and down spins $[\downarrow]$ on the lattice sites outside. The spins across the sides of the
closed polygons are oppositely paired. The PF can be calculated equally well by counting closed polygons on
the original lattice (high-temperature expansion) or oppositely paired spins on the dual lattice
(low-temperature expansion). Both expansions are exact.

Figure A2.3.28 Square and triangular lattices and their duals. The square lattice is self-dual.

For a 2D square lattice $q = 4$, and the high- and low-temperature expansions are related in a simple way

$$2n(r, N) = m(r, N) \tag{A2.3.418}$$

where the factor two comes from the fact that reversing all the spins does not change the number of oppositely
paired spins. The dual of the lattice is a square lattice. Hence the PFs in the two expansions have the same
coefficients except for irrelevant constant factors:

$$Z(N, 0, T) = 2^N(\cosh K)^{2N}\sum_r n(r, N)\tanh^r K = 2\exp(2NK)\sum_r n(r, N)\exp(-2Kr). \tag{A2.3.419}$$

Both expansions are exact and, assuming there is only one singularity, identified with the critical point, this
must occur when

$$\tanh K = \exp(-2K). \tag{A2.3.420}$$

With $x = \exp(-2K)$, this implies that

$$x = (1 - x)/(1 + x)$$

which leads to a quadratic equation

$$x^2 + 2x - 1 = 0. \tag{A2.3.421}$$

The solutions are $x = -1 \pm \sqrt{2}$. Since $K = \beta J$ is necessarily not negative, $0 < x \le 1$ and the only acceptable solution is
$x = -1 + \sqrt{2}$. Identifying the singularity with the critical point, the solution $x = \exp(-2K_c) = -1 + \sqrt{2}$ is equivalent to the
condition

$$\sinh(2K_c) = \sinh(2J/kT_c) = 1 \tag{A2.3.422}$$

from which it follows that the critical temperature $T_c = 2.27\,J/k$. This result was known before Onsager's
solution to the 2D Ising model at zero field.
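The location of the singularity is easily confirmed numerically. The sketch below solves $\tanh K = \exp(-2K)$ by bisection and checks the result against the root of the quadratic (A2.3.421) and the condition (A2.3.422). The function name is illustrative.

```python
import math

def kramers_wannier_Kc(tol=1e-14):
    """Solve tanh K = exp(-2K) for K > 0 by bisection."""
    f = lambda K: math.tanh(K) - math.exp(-2.0 * K)
    lo, hi = 0.0, 2.0            # f(0) = -1 < 0 and f(2) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

Kc = kramers_wannier_Kc()
x = math.exp(-2.0 * Kc)
print(Kc, 1.0 / Kc)                 # K_c = J/kT_c and kT_c/J
print(x, math.sqrt(2.0) - 1.0)      # acceptable root of x^2 + 2x - 1 = 0
print(math.sinh(2.0 * Kc))          # condition (A2.3.422)
```

The first line gives $K_c \approx 0.4407$, i.e. $kT_c/J \approx 2.269$.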

More generally, for other lattices and dimensions, numerical analysis of the high-temperature expansion
provides information on the critical exponents and temperature. The high-temperature expansion of the
susceptibility may be written in powers of $K = \beta J$ as

$$\chi_T(0) = \sum_{n=0}^{\infty}a_n(\beta J)^n. \tag{A2.3.423}$$

Suppose the first $n + 1$ coefficients are known, where $n \approx 15$. The susceptibility diverges as $(1 - \beta/\beta_c)^{-\gamma}$ as $\beta \to \beta_c^-$ and we have

$$\chi_T(0) \approx A(1 - \beta/\beta_c)^{-\gamma} = A\Big[1 + \gamma(\beta/\beta_c) + \cdots + \frac{\gamma(\gamma+1)\cdots(\gamma+n-1)}{n!}(\beta/\beta_c)^n + \cdots\Big]. \tag{A2.3.424}$$

For large $n$

$$a_n J^n \approx A\,\frac{\gamma(\gamma+1)\cdots(\gamma+n-1)}{n!\,\beta_c^n}. \tag{A2.3.425}$$

Taking the ratio of successive terms and dividing by the coordination number $q$

$$r = \frac{a_n}{qa_{n-1}} = \frac{kT_c}{qJ}\Big(1 + \frac{\gamma - 1}{n}\Big). \tag{A2.3.426}$$

Plotting $r$ versus $1/n$ gives $kT_c/qJ$ as the intercept and $(kT_c/qJ)(\gamma - 1)$ as the slope, from which $T_c$ and $\gamma$ can be
determined. Figure A2.3.29 illustrates the method for lattices in one, two and three dimensions and compares
it with mean-field theory which is independent of the dimensionality.

Figure A2.3.29 Calculation of the critical temperature $T_c$ and the critical exponent $\gamma$ for the magnetic
susceptibility of Ising lattices in different dimensions from high-temperature expansions.
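The ratio method can be illustrated on a model series whose singularity is known in advance. The sketch below generates the coefficients of $A(1 - x/x_c)^{-\gamma}$ in powers of $x$ (here $x$ stands for $\beta J$, and the division by $q$ is omitted), then fits a straight line to the ratios $c_n/c_{n-1}$ against $1/n$. The chosen values `gamma=1.75` and `x_c=0.4407` are test inputs only. For a real lattice series the plot is linear only asymptotically, so the fit gives estimates rather than exact values.

```python
def series_coefficients(gamma, x_c, n_max, amplitude=1.0):
    """Coefficients c_n of amplitude * (1 - x/x_c)**(-gamma) in powers of x."""
    coeffs = [amplitude]
    for n in range(1, n_max + 1):
        coeffs.append(coeffs[-1] * (gamma + n - 1) / (n * x_c))
    return coeffs

def ratio_fit(coeffs, n_min=5):
    """Least-squares line through r_n = c_n / c_{n-1} versus 1/n.

    Returns the estimates (x_c, gamma) from the intercept and slope.
    """
    pts = [(1.0 / n, coeffs[n] / coeffs[n - 1]) for n in range(n_min, len(coeffs))]
    k = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    slope = (k * sxy - sx * sy) / (k * sxx - sx * sx)
    intercept = (sy - slope * sx) / k
    return 1.0 / intercept, 1.0 + slope / intercept

coeffs = series_coefficients(gamma=1.75, x_c=0.4407, n_max=15)
print(ratio_fit(coeffs))
```

Because the model series is a pure power law, the ratios lie exactly on a line and the input values are recovered.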


A2.3.10 EXACT SOLUTIONS TO THE ISING MODEL 


The Ising model has been solved exactly in one and two dimensions; Onsager's solution of the model in two
dimensions is only at zero field. Information about the Ising model in three dimensions comes from high- and
low-temperature expansions pioneered by Domb and Sykes [104] and others. We will discuss the solution to
the 1D Ising model in the presence of a magnetic field and the results of the solution to the 2D Ising model at
zero field.


A2.3.10.1 ONE DIMENSION 


We will describe two cases: open and closed chains of $N$ sites. For an open chain of $N$ sites, the energy of a
spin configuration $\{s_k\}$ is

$$U_N(\{s_k\}) = -J\sum_{i=1}^{N-1}s_i s_{i+1} - H\sum_{i=1}^{N}s_i \tag{A2.3.427}$$

and for a closed chain of $N$ sites with periodic boundary conditions $s_{N+1} = s_1$

$$U_N(\{s_k\}) = -J\sum_{i=1}^{N}s_i s_{i+1} - \frac{H}{2}\sum_{i=1}^{N}(s_i + s_{i+1}). \tag{A2.3.428}$$

Both systems give the same results in the thermodynamic limit. We discuss the solution for the open chain at
zero field and the closed chain for the more general case of $H \ne 0$.
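(a) Open chain at zero field, i.e. $H = 0$

The PF

$$Z(N, 0, T) = \sum_{s_1=\pm 1}\cdots\sum_{s_N=\pm 1}\exp\Big(\beta J\sum_{i=1}^{N-1}s_i s_{i+1}\Big) = \sum_{s_1=\pm 1}\cdots\sum_{s_{N-1}=\pm 1}\exp\Big(\beta J\sum_{i=1}^{N-2}s_i s_{i+1}\Big)\sum_{s_N=\pm 1}\exp(\beta Js_{N-1}s_N). \tag{A2.3.429}$$

Doing the last sum

$$Z(N, 0, T) = Z(N-1, 0, T)[\exp(\beta Js_{N-1}) + \exp(-\beta Js_{N-1})] = Z(N-1, 0, T)\,2\cosh(\beta J) \tag{A2.3.430}$$

since $s_{N-1} = \pm 1$. Proceeding by iteration, starting from $N = 1$, which has just two states with the spin up or
down

$$\begin{aligned} Z(1, 0, T) &= 2 \\ Z(2, 0, T) &= Z(1, 0, T)\,2\cosh(\beta J) = 2^2\cosh(\beta J) \\ Z(3, 0, T) &= 2^3\cosh^2(\beta J) \\ Z(N, 0, T) &= 2^N\cosh^{N-1}(\beta J). \end{aligned} \tag{A2.3.431}$$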
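The closed form (A2.3.431) can be compared with a direct sum over all $2^N$ configurations for small $N$. The sketch below does this for a few chain lengths. The value `K = 0.7` for $\beta J$ is an arbitrary test input.

```python
import itertools
import math

def open_chain_Z_brute(N, K):
    """Sum exp(K * sum_i s_i s_{i+1}) over all 2**N configurations (K = beta J)."""
    total = 0.0
    for s in itertools.product((-1, 1), repeat=N):
        total += math.exp(K * sum(s[i] * s[i + 1] for i in range(N - 1)))
    return total

def open_chain_Z_closed_form(N, K):
    """Equation (A2.3.431): 2**N * cosh(K)**(N - 1)."""
    return 2.0**N * math.cosh(K) ** (N - 1)

K = 0.7
for N in (1, 2, 3, 8):
    print(N, open_chain_Z_brute(N, K), open_chain_Z_closed_form(N, K))
```

The enumeration cost grows as $2^N$, so the comparison is only practical for short chains; the recursion itself holds for any $N$.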

The free energy $G$ in the thermodynamic limit ($N \to \infty$) follows from

$$-\frac{\beta G}{N} = \lim_{N\to\infty}\frac{1}{N}\ln Z(N, 0, T) = \ln 2 + \lim_{N\to\infty}\Big(\frac{N-1}{N}\Big)\ln\cosh(\beta J) = \ln[2\cosh(\beta J)].$$
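
(b) Closed chain, $H \ne 0$

The PF in this case is

$$Z(N, H, T) = \sum_{s_1=\pm 1}\cdots\sum_{s_N=\pm 1}\exp\Big[\beta\sum_{i=1}^{N}\Big(Js_i s_{i+1} + \frac{H}{2}(s_i + s_{i+1})\Big)\Big] = \sum_{s_1=\pm 1}\cdots\sum_{s_N=\pm 1}P_{s_1 s_2}P_{s_2 s_3}\cdots P_{s_N s_1}$$

where $P_{s_i s_{i+1}}$ are the elements of a 2 x 2 matrix called the transfer matrix

$$\mathbf{P} = \begin{pmatrix} P_{1,1} & P_{1,-1} \\ P_{-1,1} & P_{-1,-1} \end{pmatrix} = \begin{pmatrix} \exp\beta(J + H) & \exp(-\beta J) \\ \exp(-\beta J) & \exp\beta(J - H) \end{pmatrix}$$

with the property that $\sum_{s_2=\pm 1}P_{s_1 s_2}P_{s_2 s_3} = (\mathbf{P}^2)_{s_1 s_3}$. It follows for the closed chain that

$$Z(N, H, T) = \sum_{s_1=\pm 1}(\mathbf{P}^N)_{s_1 s_1} = \mathrm{Tr}\,\mathbf{P}^N$$

where $\mathbf{P}^N$ is also a 2 x 2 matrix.
The trace is evaluated by diagonalizing the matrix $\mathbf{P}$ using a similarity transformation $\mathbf{S}$:

$$\mathbf{P}' = \mathbf{S}^{-1}\mathbf{P}\mathbf{S} = \begin{pmatrix} \lambda_+ & 0 \\ 0 & \lambda_- \end{pmatrix}$$

where the diagonal elements of the matrix $\mathbf{P}'$ are the eigenvalues of $\mathbf{P}$, and

$$\mathbf{P}'^N = \begin{pmatrix} \lambda_+^N & 0 \\ 0 & \lambda_-^N \end{pmatrix}.$$

(A2.3.432)-(A2.3.437)

Noting that

$$\mathbf{P}'^N = \mathbf{S}^{-1}\mathbf{P}\mathbf{S}\,\mathbf{S}^{-1}\mathbf{P}\mathbf{S}\cdots\mathbf{S}^{-1}\mathbf{P}\mathbf{S} = \mathbf{S}^{-1}\mathbf{P}^N\mathbf{S}$$

by virtue of the property that $\mathbf{S}\mathbf{S}^{-1} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix, we see that

$$\mathrm{Tr}[\mathbf{P}'^N] = \mathrm{Tr}[\mathbf{S}^{-1}\mathbf{P}^N\mathbf{S}] = \mathrm{Tr}[\mathbf{S}\mathbf{S}^{-1}\mathbf{P}^N] = \mathrm{Tr}[\mathbf{P}^N]$$

which leads to

$$Z(N, H, T) = \lambda_+^N + \lambda_-^N. \tag{A2.3.438}$$

Assuming the eigenvalues are not degenerate and $\lambda_+ > \lambda_-$,

$$Z(N, H, T) = \lambda_+^N[1 + (\lambda_-/\lambda_+)^N].$$

In the thermodynamic limit of $N \to \infty$,

$$-\frac{\beta G}{N} = \lim_{N\to\infty}\frac{1}{N}\ln Z(N, H, T) = \ln\lambda_+. \tag{A2.3.439}$$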

This is an important general result which relates the free energy per particle to the largest eigenvalue of the
transfer matrix, and the problem reduces to determining this eigenvalue.

The eigenvalues of the transfer matrix are the solutions to

$$\det|\mathbf{P} - \lambda\mathbf{I}| = 0.$$

This leads to a quadratic equation whose solutions are

$$\lambda_\pm = \exp(\beta J)\{\cosh(\beta H) \pm [\sinh^2(\beta H) + \exp(-4\beta J)]^{1/2}\} \tag{A2.3.440}$$

which confirms that the eigenvalues are not degenerate. The free energy per particle

$$-\frac{\beta G}{N} = \beta J + \ln\{\cosh(\beta H) + [\sinh^2(\beta H) + \exp(-4\beta J)]^{1/2}\}. \tag{A2.3.441}$$
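The transfer-matrix result can be checked against a direct enumeration of a short ring. The sketch below, written in the reduced variables $K = \beta J$ and $h = \beta H$, compares $\lambda_+^N + \lambda_-^N$ with the brute-force PF and then shows how quickly $(1/N)\ln Z$ approaches $\ln\lambda_+$. The numerical values of `K`, `h` and `N` are arbitrary test inputs.

```python
import itertools
import math

def closed_chain_Z_brute(N, K, h):
    """Direct sum over configurations of a ring (K = beta J, h = beta H)."""
    total = 0.0
    for s in itertools.product((-1, 1), repeat=N):
        exponent = sum(K * s[i] * s[(i + 1) % N] + h * s[i] for i in range(N))
        total += math.exp(exponent)
    return total

def eigenvalues(K, h):
    """lambda_+ and lambda_- from equation (A2.3.440)."""
    root = math.sqrt(math.sinh(h) ** 2 + math.exp(-4.0 * K))
    return math.exp(K) * (math.cosh(h) + root), math.exp(K) * (math.cosh(h) - root)

def closed_chain_Z_transfer(N, K, h):
    """Z = lambda_+**N + lambda_-**N, equation (A2.3.438)."""
    lam_plus, lam_minus = eigenvalues(K, h)
    return lam_plus**N + lam_minus**N

K, h, N = 0.5, 0.3, 8
print(closed_chain_Z_brute(N, K, h), closed_chain_Z_transfer(N, K, h))
lam_plus, _ = eigenvalues(K, h)
print(math.log(closed_chain_Z_transfer(200, K, h)) / 200, math.log(lam_plus))
```

On the ring, the symmetric field term $(H/2)\sum_i(s_i + s_{i+1})$ equals $H\sum_i s_i$, which is the form used in the brute-force sum.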

This reduces to the results for the free energy at zero field ($H = 0$)

$$-\frac{\beta G}{N} = \ln[2\cosh(\beta J)] \tag{A2.3.442}$$

and the free energy of non-interacting magnets in an external field

$$-\frac{\beta G}{N} = \ln[2\cosh(\beta H)] \tag{A2.3.443}$$

which were derived earlier. At finite $T$ (i.e. $T > 0$), $\lambda_+$ is analytic and there is no phase transition. However, as
$T \to 0$,

$$\lambda_+ = \exp(K)[\cosh(h) + |\sinh(h)|(1 + O(\exp(-4K)))]$$

where $K = \beta J$ and $h = \beta H$. But $\cosh(h) + |\sinh(h)| = \exp|h|$, and it follows that

$$\lambda_+ \to \exp(K + |h|)$$

as $T \to 0$. We see from this that as $T \to 0$

$$-\frac{G}{N} = kT\ln\lambda_+ = kT[K + |h|] = J + |H| \tag{A2.3.444}$$

and

$$m = -\frac{1}{N}\Big(\frac{\partial G}{\partial H}\Big)_T = \begin{cases} +1 & H > 0 \\ -1 & H < 0 \end{cases} \tag{A2.3.445}$$

which implies a residual magnetization $m_0 = \pm 1$ at zero field and a first-order phase transition at $T = 0$. For
$T \ne 0$, there is no discontinuity in $m$ as $H$ passes through zero from positive to negative values or vice versa, and
differentiation of $G$ with respect to $H$ at constant $T$ provides the magnetization per site

$$m(H, T) = \frac{\sinh(\beta H)}{[\sinh^2(\beta H) + \exp(-4\beta J)]^{1/2}} \tag{A2.3.446}$$

which is an odd function of $H$ with $m \to 0$ as $H \to 0$. Note that this reduces to the result

$$m(H, T) = \tanh(\beta H) \tag{A2.3.447}$$

for non-interacting magnets.
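The magnetization per site can be checked by differentiating $\ln\lambda_+$ numerically. The sketch below works in the reduced variables $K = \beta J$ and $h = \beta H$, in which $m = \partial\ln\lambda_+/\partial h$. It compares a central-difference derivative with the closed form, and also checks the non-interacting limit and the small-field slope $\exp(2K)$. The step size `dh` and the test values are illustrative.

```python
import math

def ln_lambda_plus(K, h):
    """ln(lambda_+) from the largest transfer-matrix eigenvalue."""
    return K + math.log(math.cosh(h) + math.sqrt(math.sinh(h) ** 2 + math.exp(-4.0 * K)))

def magnetization(K, h):
    """Closed form sinh(h) / [sinh^2(h) + exp(-4K)]^(1/2)."""
    return math.sinh(h) / math.sqrt(math.sinh(h) ** 2 + math.exp(-4.0 * K))

def magnetization_numeric(K, h, dh=1e-6):
    """Central-difference derivative of ln(lambda_+) with respect to h."""
    return (ln_lambda_plus(K, h + dh) - ln_lambda_plus(K, h - dh)) / (2.0 * dh)

print(magnetization(0.8, 0.4), magnetization_numeric(0.8, 0.4))
print(magnetization(0.0, 0.4), math.tanh(0.4))             # J = 0 limit
print(magnetization(0.8, 1e-6) / 1e-6, math.exp(2 * 0.8))  # small-field slope
```

Each line prints a pair of values that should coincide to within the accuracy of the finite difference.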

As $H \to 0$, $\sinh(\beta H) \to \beta H$, $m(H, T) \to \beta H\exp(2\beta J)$ and

$$\chi_T(0) = (\partial m/\partial H)_T = \beta\exp(2\beta J) \tag{A2.3.448}$$

which diverges exponentially as $T \to 0$, which is also characteristic of a phase transition at $T = 0$.
The average energy $\langle E\rangle$ follows from the relation

$$\langle E\rangle/N = -(1/N)(\partial\ln Z/\partial\beta)_{H,J} = -(\partial\ln\lambda_+/\partial\beta)_{H,J} \tag{A2.3.449}$$

and at zero field

$$\langle E\rangle_{H=0}/N = -J\tanh(\beta J). \tag{A2.3.450}$$

The specific heat at zero field follows easily,

$$C_{H=0} = \frac{1}{N}\Big(\frac{\partial\langle E\rangle_{H=0}}{\partial T}\Big) = k(\beta J)^2\,\mathrm{sech}^2(\beta J) \tag{A2.3.451}$$

and we note that it passes through a maximum as a function of $T$.
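The position of the maximum in the zero-field specific heat follows from setting the derivative of $K^2\,\mathrm{sech}^2 K$ to zero, with $K = \beta J$, which gives the condition $K\tanh K = 1$. The sketch below solves this by bisection. The bracketing interval is an assumption chosen so that the condition changes sign across it.

```python
import math

def specific_heat(K):
    """C_{H=0}/k = K^2 sech^2(K) with K = beta J = J/kT, equation (A2.3.451)."""
    return (K / math.cosh(K)) ** 2

def specific_heat_maximum(tol=1e-12):
    """Locate the maximum from d/dK [K^2 sech^2 K] = 0, i.e. K tanh K = 1."""
    lo, hi = 0.5, 2.0            # K tanh K - 1 changes sign on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * math.tanh(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

K_max = specific_heat_maximum()
print(K_max, 1.0 / K_max, specific_heat(K_max))
```

The output gives the maximum near $kT/J \approx 0.83$ with a height of about $0.44k$, a smooth peak rather than a singularity, as expected in the absence of a finite-temperature transition.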

The spin correlation functions and their dependence on the distance between sites and the coupling between
adjacent sites are of great interest in understanding the range of these correlations. In general, for a closed
chain

$$\langle s_i s_{i+n}\rangle = Z(N, H, T)^{-1}\sum_{s_1=\pm 1}\cdots\sum_{s_N=\pm 1}s_i s_{i+n}\exp\Big(\sum_{j=1}^{N}[Ks_j s_{j+1} + hs_j]\Big). \tag{A2.3.452}$$

For nearest-neighbour spins

$$\langle s_i s_{i+1}\rangle = [NZ(N, H, T)]^{-1}[\partial Z(N, H, T)/\partial K] \tag{A2.3.453}$$

and making use of $Z(N, H, T) = \lambda_+^N[1 + (\lambda_-/\lambda_+)^N]$, in the thermodynamic limit ($N \to \infty$)

$$\langle s_i s_{i+1}\rangle = (\partial\ln\lambda_+/\partial K) = 1 - \frac{2\exp(-4K)[\sinh^2 h + \exp(-4K)]^{-1/2}}{\cosh h + [\sinh^2 h + \exp(-4K)]^{1/2}}. \tag{A2.3.454}$$

At zero field ($H = 0$), $h = 0$ and

$$\langle s_i s_{i+1}\rangle = \tanh K \tag{A2.3.455}$$

which shows that the correlation between neighbouring sites approaches 1 as $T \to 0$. The correlation between
non-nearest neighbours is easily calculated by assuming that the couplings $(K_1, K_2, K_3, \ldots, K_{N-1})$ between the
sites are different, in which case a simple generalization of the results for equal couplings leads to the PF at
zero field

$$Z(N, 0, T) = 2^N\prod_{i=1}^{N-1}\cosh K_i. \tag{A2.3.456}$$

Repeating the earlier steps one finds, as expected, that the coupling $K_i$ between the spins at the sites $i$ and $i + 1$
determines their correlation:

$$\langle s_i s_{i+1}\rangle = Z^{-1}(\partial Z(N, 0, T)/\partial K_i) = \tanh K_i. \tag{A2.3.457}$$

Now notice that since $s_{i+1}^2 = 1$,

$$\langle s_i s_{i+2}\rangle = \langle s_i s_{i+1}s_{i+1}s_{i+2}\rangle = Z^{-1}\frac{\partial^2 Z(N, 0, T)}{\partial K_i\,\partial K_{i+1}} = \tanh K_i\tanh K_{i+1}. \tag{A2.3.458}$$

In the limit $K_i = K_{i+1} = K$,

$$\langle s_i s_{i+2}\rangle = \tanh^2 K \tag{A2.3.459}$$

and repeating this argument serially for the spin correlations between $i$ and $i + n$ sites

$$\langle s_i s_{i+n}\rangle = \tanh^n K \tag{A2.3.460}$$

so the correlation between non-neighbouring sites approaches 1 as $T \to 0$ since the spins are all aligned in this
limit.

The correlation length $\xi$ follows from the above relation, since

$$\langle s_i s_{i+j}\rangle = \exp(j\ln\tanh K) = \exp(-j\ln\coth K) = \exp(-j/\xi) \tag{A2.3.461}$$

from which it follows that

$$\xi = 1/\ln\coth(K). \tag{A2.3.462}$$

As expected, as $T \to 0$, $K \to \infty$ and the correlation length $\xi \approx \exp(2\beta J)/2 \to \infty$, while in the opposite limit, as
$T \to \infty$, $\xi \to 0$.
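The decay of the correlations with separation can be confirmed by direct enumeration of a short open chain at zero field. The sketch below compares the brute-force average $\langle s_i s_{i+n}\rangle$ with $\tanh^n K$ and evaluates the correlation length $1/\ln\coth K$ together with its low-temperature form $\exp(2K)/2$. The chain length, site index and couplings are arbitrary test inputs.

```python
import itertools
import math

def correlation_brute(N, K, i, n):
    """<s_i s_{i+n}> for an open chain of N spins at zero field (0-based i)."""
    num = den = 0.0
    for s in itertools.product((-1, 1), repeat=N):
        w = math.exp(K * sum(s[j] * s[j + 1] for j in range(N - 1)))
        num += s[i] * s[i + n] * w
        den += w
    return num / den

def correlation_length(K):
    """xi = 1 / ln coth K with K = beta J."""
    return 1.0 / math.log(1.0 / math.tanh(K))

K = 0.8
for n in (1, 2, 3):
    print(n, correlation_brute(8, K, 2, n), math.tanh(K) ** n)
print(correlation_length(3.0), math.exp(2 * 3.0) / 2.0)
```

For the open chain at zero field the product form is exact for any pair of sites, so the two columns agree to rounding error.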

A2.3.10.2 TWO DIMENSIONS

Onsager's solution to the 2D Ising model in zero field ($H = 0$) is one of the most celebrated results in
theoretical chemistry [105]; it is the first example of critical exponents. Also, the solution for the Ising model
can be mapped onto the lattice gas, binary alloy and a host of other systems that have Hamiltonians that are
isomorphic to the Ising model Hamiltonian.

By a deft application of the transfer matrix technique, Onsager showed that the free energy is given by

$$-\frac{\beta G}{N} = \ln[2\cosh(2\beta J)] + \frac{1}{2\pi}\int_0^{\pi}\mathrm{d}\phi\,\ln\Big\{\tfrac{1}{2}\big[1 + (1 - \kappa^2\sin^2\phi)^{1/2}\big]\Big\} \tag{A2.3.463}$$

where

$$\kappa = \frac{2\sinh(2\beta J)}{\cosh^2(2\beta J)} \tag{A2.3.464}$$

which is zero at $T = 0$ and $T = \infty$ and passes through a maximum of 1 when $\beta J = 0.440\,69$. This corresponds
to a critical temperature $T_c = 2.269\,J/k$ when a singularity occurs in the Gibbs free energy, since
$[1 + (1 - \kappa^2\sin^2\phi)^{1/2}] \to 1$ as $T \to T_c$ and $\phi \to \pi/2$. As $T \to T_c$,

$$C_{H=0} \approx \frac{8k}{\pi}\Big(\frac{J}{kT_c}\Big)^2\ln|T - T_c|^{-1} \tag{A2.3.465}$$

so that the critical exponent $\alpha = 0_{\log}$, i.e. the divergence is logarithmic. The spontaneous magnetization

$$m_0 = \begin{cases} 0 & T > T_c \\ [1 - \sinh^{-4}(2\beta J)]^{1/8} & T < T_c \end{cases} \tag{A2.3.466}$$

and the critical exponent $\beta = 1/8$. This result was first written down by Onsager during a discussion at a
scientific meeting, but the details of his derivation were never published. Yang [107] gave the first published
proof of this remarkably simple result. The spin correlation functions at $T = T_c$ decay in a simple way as
shown by Kaufman and Onsager [106],

$$\langle s_i s_{i+j}\rangle \propto 1/r^{1/4} \tag{A2.3.467}$$

where $r$ is the distance between the sites.
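The closed-form results quoted above are easy to evaluate. The sketch below computes the critical coupling from $\sinh(2\beta_c J) = 1$, confirms that $\kappa$ reaches its maximum of 1 there, and tabulates the spontaneous magnetization at a few temperatures given in units of $J/k$. The function names and sample temperatures are illustrative.

```python
import math

K_C = 0.5 * math.asinh(1.0)      # beta_c J from sinh(2 beta_c J) = 1

def kappa(K):
    """kappa = 2 sinh(2K) / cosh^2(2K) with K = beta J."""
    return 2.0 * math.sinh(2.0 * K) / math.cosh(2.0 * K) ** 2

def spontaneous_magnetization(K):
    """[1 - sinh^-4(2K)]^(1/8) below T_c (K > K_C), zero otherwise."""
    if K <= K_C:
        return 0.0
    return (1.0 - math.sinh(2.0 * K) ** -4) ** 0.125

print(K_C, 1.0 / K_C, kappa(K_C))
for T in (1.0, 2.0, 2.2, 2.26, 2.3):       # temperatures in units of J/k
    print(T, spontaneous_magnetization(1.0 / T))
```

The table shows how flat the magnetization is at low temperature and how steeply it falls to zero close to $T_c$, which reflects the small exponent 1/8.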

A2.3.11 SUMMARY

We have described the statistical mechanics of strongly interacting systems, in particular those of non-ideal
fluids, solids and alloys. For fluids, the virial coefficients, the law of corresponding states, integral equation
approximations for the correlation functions and perturbation theories are treated in some detail, along with
applications to hard spheres, polar fluids, strong and weak electrolytes and inhomogeneous fluids. The use of
perturbation theory in computational studies of the free energy of ligand binding and other reactions of
biochemical interest is discussed. In treating solids and alloys, the Ising model and its equivalence to the
lattice gas model and a simple model of binary alloys are emphasized. Mean-field approximations to this
model and the use of high- and low-temperature approximations are described. Solutions to the 1D Ising
model with and without a magnetic field are derived and Onsager's solution to the 2D case is briefly
discussed.

A2.4 Fundamentals of electrochemistry

Andrew Hamnett 


Electrochemistry is concerned with the study of the interface between an electronic and an ionic conductor 
and, traditionally, has concentrated on: (i) the nature of the ionic conductor, which is usually an aqueous or 
(more rarely) a non-aqueous solution, polymer or superionic solid containing mobile ions; (ii) the structure of 
the electrified interface that forms on immersion of an electronic conductor into an ionic conductor; and (iii) 
the electron-transfer processes that can take place at this interface and the limitations on the rates of such 
processes. 

Ionic conductors arise whenever there are mobile ions present. In electrolyte solutions, such ions are normally 
formed by the dissolution of an ionic solid. Provided the dissolution leads to the complete separation of the 
ionic components to form essentially independent anions and cations, the electrolyte is termed strong. By 
contrast, weak electrolytes, such as organic carboxylic acids, are present mainly in the undissociated form in 
solution, with the total ionic concentration orders of magnitude lower than the formal concentration of the 
solute. Ionic conductivity will be treated in some detail below, but we initially concentrate on the equilibrium 
structure of liquids and ionic solutions. 


A2.4.1 THE ELEMENTARY THEORY OF LIQUIDS

Modern-day approaches to ionic solutions need to be able to contend with the following problems: 

(1) the nature of the solvent itself, and the interactions taking place in that solvent; 

(2) the changes taking place on the dissolution of an ionic electrolyte in the solvent; 


(3) macroscopic and microscopic studies of the properties of electrolyte solutions. 

Even the description of the solvent itself presents major theoretical problems: the partition function for a
liquid can be written in the classical limit [1, 2] as

$$Q = \frac{q^N}{N!\,\Lambda^{3N}}\int\cdots\int\mathrm{d}X^N\exp[-\beta U_N(X^N)] \tag{A2.4.1}$$

where $\beta = 1/kT$ and the integral in (A2.4.1) is over both spatial and orientation coordinates (i.e. both Cartesian and
Eulerian coordinates) of each of the $N$ molecules, and is termed the configurational integral, $Z_N$. In this
equation, $q$ is the partition function for the internal degrees of freedom in each molecule (rotational,
vibrational and electronic), $\Lambda$ is the thermal de Broglie wavelength $h/(2\pi mkT)^{1/2}$ arising from the translational
partition function, and $U_N(X^N)$ is the energy
associated with the instantaneous configuration of the $N$ molecules defined by $X^N$. Clearly, the direct
evaluation of (A2.4.1) for all but the simplest cases is quite impossible, and modern theories have made an indirect
attack on (A2.4.1) by defining distribution functions. If we consider, as an example, two particles, which we fix
with total coordinates $X_1$ and $X_2$, then the joint probability of finding particle 1 in volume d$X_1$ and particle 2
in volume d$X_2$ is

$$P^{(2)}(X_1, X_2)\,\mathrm{d}X_1\,\mathrm{d}X_2 = \mathrm{d}X_1\,\mathrm{d}X_2\int\cdots\int\mathrm{d}X_3\cdots\mathrm{d}X_N\,P(X_1, X_2, X_3, \ldots, X_N) \tag{A2.4.2}$$

where $P(X_1, \ldots, X_N)$ is the Boltzmann factor:

$$P(X_1, \ldots, X_N) = \frac{\exp[-\beta U_N(X^N)]}{\int\cdots\int\mathrm{d}X^N\exp[-\beta U_N(X^N)]}.$$

In fact, given that there are $N(N-1)$ ways of choosing a pair of particles, the pair distribution function,
$\rho^{(2)}(X_1, X_2)\,\mathrm{d}X_1\,\mathrm{d}X_2 = N(N-1)P^{(2)}(X_1, X_2)\,\mathrm{d}X_1\,\mathrm{d}X_2$, is the probability of finding any particle at $X_1$ in volume
d$X_1$ and a different particle at $X_2$ in volume d$X_2$. A little reflection will show that for an isotropic liquid, the
value of $\rho^{(1)}(X_1)$ is just the number density of the liquid, $\rho = N/V$, since we can integrate over all orientations
and, if the liquid is isotropic, its density is everywhere constant.

A2.4.1.1 CORRELATION FUNCTIONS

The pair distribution function clearly has dimensions (density)$^2$, and it is normal to introduce the pair
correlation function $g(X_1, X_2)$ defined by

$$g(X_1, X_2) = \frac{\rho^{(2)}(X_1, X_2)}{\rho^2} \tag{A2.4.3}$$

and we can average over the orientational parts of both molecules, to give

$$g(R_1, R_2) = \frac{1}{(8\pi^2)^2}\int\int\mathrm{d}\Omega_1\,\mathrm{d}\Omega_2\;g(X_1, X_2) \tag{A2.4.4}$$

where $\Omega_1$ and $\Omega_2$ denote the orientational (Eulerian) coordinates of the two molecules.
Given that $R_1$ can be arbitrarily chosen as anywhere in the sample volume of an isotropic liquid in the absence
of any external field, we can transform the variables $R_1$, $R_2$ into the variables $R_1$, $R_{12}$, where $R_{12}$ is the vector
separation of molecules 1 and 2. This allows us to perform one of the two integrations in equation (A2.4.2)
above, allowing us to write the probability of a second molecule being found at a distance $r$ ($= |r_{12}|$) from a
central molecule as $\rho g(r)r^2\,\mathrm{d}r\sin\theta\,\mathrm{d}\theta\,\mathrm{d}\phi$. Integration over the angular variables gives the number of
molecules found in a spherical shell at a distance $r$ from a central molecule as

$$N(r)\,\mathrm{d}r = 4\pi r^2\rho g(r)\,\mathrm{d}r. \tag{A2.4.5}$$

The function $g(r)$ is central to the modern theory of liquids, since it can be measured experimentally using
neutron or x-ray diffraction and can be related to the interparticle potential energy. Experimental data [1] for
two liquids, water and argon, are shown in figure A2.4.1 plotted as a function of $R^* = R/\sigma$, where $\sigma$ is the
effective diameter of the species, and is roughly the position of the first maximum in $g(R)$. For water, $\sigma = 2.82$ Å,
very close to the intermolecular distance in the normal tetrahedrally bonded form of ice, and for argon, $\sigma = 3.4$ Å.
The second peak for argon is at $R^* = 2$, as expected for a spherical molecular system consisting roughly
of concentric spheres. However, for water, the second peak in $g(r)$ is found at $R^* = 1.6$, which corresponds
closely to the second-nearest-neighbour distance in ice, strongly supporting the model for the structure of
water that is ice-like over short distances. This strongly structured model for water in fact dictates many of its
anomalous properties.

Figure A2.4.1. Radial distribution function $g(R^*)$ for water (dashed curve) at 4 °C and 1 atm and for liquid
argon (full curve) at 84.25 K and 0.71 atm as functions of the reduced distance $R^* = R/\sigma$, where $\sigma$ is the
molecular diameter; from [1].

The relationship between $g(r)$ and the interparticle potential energy is most easily seen if we assume that the
interparticle energy can be factorized into pairwise additive potentials as

$$U_N(X^N) = U_N(X_1, X_2, \ldots, X_N) = \sum_{i<j}u(r_{ij}) \tag{A2.4.6}$$

where the summation is over all pairs $i, j$. From equation A2.4.1 we can calculate the total internal energy $U$
as

$$U = -\Big(\frac{\partial\ln Q}{\partial\beta}\Big) = \frac{3N}{\Lambda}\frac{\partial\Lambda}{\partial\beta} + \frac{1}{Z_N}\int\cdots\int\mathrm{d}X^N\,U_N(X^N)\exp[-\beta U_N(X^N)] \tag{A2.4.7}$$

and where, for simplicity, we have ignored internal rotational and orientational effects. For an isotropic liquid,
the summation in A2.4.6 over pairs of molecules yields $N(N-1)/2$ equal terms, which can be written as the
product of a two-particle integral over the $u(r_{12})$ and integrals of the type shown in A2.4.2 above. After some
algebra, we find

$$U = \frac{3}{2}NkT + \frac{1}{2}\int_V\mathrm{d}r_1\int_V\mathrm{d}r_2\,u(r_{12})\rho^{(2)}(r_1, r_2) = \frac{3}{2}NkT + \frac{N\rho}{2}\int_0^{\infty}\mathrm{d}r\,4\pi r^2 g(r)u(r). \tag{A2.4.8}$$

To the same order of approximation, the pressure $P$ can be written as

$$\frac{P}{\rho kT} = 1 - \frac{\rho}{6kT}\int_0^{\infty}\mathrm{d}r\,4\pi r^3\frac{\mathrm{d}u(r)}{\mathrm{d}r}g(r) \tag{A2.4.9}$$

where in both A2.4.8 and A2.4.9 the potential $u(r)$ is assumed to be sufficiently short range for the integrals to
converge. Other thermodynamic functions can be calculated once $g(r)$ is known, and it is of considerable
importance that $g(r)$ can be obtained from $u(r)$ and, ideally, that the inverse process can also be carried out.
Unfortunately, this latter process is much more difficult to do in such a way as to distinguish different possible
$u(r)$s with any precision.
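A simple consistency check on the pressure equation is available in the low-density limit, where $g(r) \to \exp[-u(r)/kT]$ and the coefficient of $\rho$ in $P/\rho kT$ must reduce to the second virial coefficient. The sketch below assumes a Lennard-Jones potential in reduced units ($\epsilon = \sigma = k = 1$), a choice made for the example only, and compares the pressure-equation route with the standard Mayer-function integral. The integration limits and grid are illustrative.

```python
import math

def lj(r):
    """Lennard-Jones potential in reduced units (epsilon = sigma = 1); an assumed model."""
    return 4.0 * (r**-12 - r**-6)

def lj_derivative(r):
    """du/dr for the Lennard-Jones potential."""
    return 4.0 * (-12.0 * r**-13 + 6.0 * r**-7)

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def b2_from_pressure_equation(T, r_min=0.3, r_max=30.0, n=60000):
    """Coefficient of rho in P/(rho kT) from the pressure equation with g = exp(-u/kT)."""
    f = lambda r: 4.0 * math.pi * r**3 * lj_derivative(r) * math.exp(-lj(r) / T)
    return -trapezoid(f, r_min, r_max, n) / (6.0 * T)

def b2_from_mayer_function(T, r_min=0.3, r_max=30.0, n=60000):
    """-2 pi * integral of (exp(-u/kT) - 1) r^2 dr; the core below r_min is added analytically."""
    f = lambda r: (math.exp(-lj(r) / T) - 1.0) * r * r
    return -2.0 * math.pi * (trapezoid(f, r_min, r_max, n) - r_min**3 / 3.0)

T = 2.0
print(b2_from_pressure_equation(T), b2_from_mayer_function(T))
```

The two routes are related by an integration by parts, so they agree apart from the small truncation error at `r_max`. The negative value at this temperature reflects the dominance of the attractive well.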

Clearly, the assumption of pairwise additivity is unlikely to be a good one for water; indeed, it will break
down for any fluid at high density. Nonetheless, $g(r)$ remains a good starting point for any liquid, and we need
to explore ways in which it can be calculated. There are two distinct methods: (a) solving equations relating
$g(r)$ to $u(r)$ by choosing a specific $u(r)$; (b) simulation, using molecular dynamics or Monte Carlo
methods.

There are two approaches commonly used to derive an analytical connection between g(r) and u(r): the 
Percus-Yevick (PY) equation and the hypernetted chain (HNC) equation. Both are derived from attempts to 
form functional Taylor expansions of different correlation functions. These auxiliary correlation functions 
include: 

(i) the total correlation function, 

\[ h(r_1, r_2) = g(r_1, r_2) - 1; \tag{A2.4.10} \]

(ii) the background correlation function, 

\[ y(r_1, r_2) = g(r_1, r_2) \exp[\beta u(r_1, r_2)]; \tag{A2.4.11} \]

(iii) the direct correlation function, C(r_1, r_2), defined through the Ornstein-Zernike relation: 

\[ h(r_1, r_2) - C(r_1, r_2) = \rho \int \mathrm{d}r_3 \, h(r_1, r_3)\, C(r_3, r_2). \tag{A2.4.12} \]

The singlet direct correlation function C^(1)(r) is defined through the relationship 

\[ C^{(1)}(r) = \ln[\rho^{(1)}(r)\Lambda^3] + \beta[w(r) - \mu] \tag{A2.4.13} \]

where ρ^(1) is as defined above, μ is the chemical potential and w(r) the local one-body potential in an 
inhomogeneous system. 


The PY equation is derived from a Taylor expansion of the direct correlation function, and has the form 

\[ y(r_1, r_2) \approx 1 + \rho \int \mathrm{d}r_3 \, C(r_2, r_3)\, h(r_3, r_1) \tag{A2.4.14} \]

and comparison with the Ornstein-Zernike equation shows that C(r_1, r_2) ≈ g(r_1, r_2) − y(r_1, r_2) = 
y(r_1, r_2) f(r_1, r_2), where f(r_1, r_2) = exp[−βu(r_1, r_2)] − 1. Substitution of this expression into 
A2.4.14 finally gives us, in terms of the pair correlation function alone, 

\[ g(r_1, r_2) \exp[\beta u(r_1, r_2)] = 1 + \rho \int \mathrm{d}r_3 \, g(r_2, r_3) \exp[\beta u(r_2, r_3)] \left[e^{-\beta u(r_2, r_3)} - 1\right] \left[g(r_3, r_1) - 1\right]. \tag{A2.4.15} \]

This integral equation can be solved by expansion of the integrand in bipolar coordinates [2, 3]. Further 
improvement to the PY equation can be obtained by analytical fit to simulation studies as described below. 

The HNC equation uses, instead of the expression for C(r_1, r_2) from A2.4.14 above, an expression 
C(r_1, r_2) ≈ h(r_1, r_2) − ln(y(r_1, r_2)), which leads to the first-order HNC equation: 

\[ \ln(y(r_1, r_2)) \approx \rho \int \mathrm{d}r_3 \, C(r_1, r_3)\, h(r_3, r_2). \tag{A2.4.16} \]

Comparison with the PY equation shows that the HNC equation is nonlinear, and this does present problems 
in numerical work, as well as preventing any analytical solutions being developed even in the simplest of 
cases. 

In the limit of low densities, A2.4.15 shows that the zeroth-order approximation for g(r) has the form 

\[ g(r_1, r_2) \approx e^{-\beta u(r_1, r_2)} \tag{A2.4.17} \]

a form that will be useful in our consideration of the electrolyte solutions described below. 
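
As a rough numerical illustration of A2.4.8, A2.4.9 and A2.4.17, the sketch below evaluates the excess internal 
energy per particle and the compressibility factor P/(ρkT) using the low-density form g(r) ≈ exp[−βu(r)]. The 
Lennard-Jones potential, the reduced units (ε = σ = k = 1), the cut-off r_max and the trapezoidal quadrature 
are all assumptions made for this example; the function names are hypothetical. 

```python
import math

def u_lj(r):
    """Lennard-Jones pair potential in reduced units (epsilon = sigma = 1)."""
    sr6 = (1.0 / r) ** 6
    return 4.0 * (sr6 * sr6 - sr6)

def du_lj(r):
    """Derivative du/dr of the Lennard-Jones potential."""
    sr6 = (1.0 / r) ** 6
    return -24.0 * (2.0 * sr6 * sr6 - sr6) / r

def low_density_properties(rho, T, r_max=10.0, n=20000):
    """Return ((U - 3NkT/2)/N, P/(rho k T)) from A2.4.8 and A2.4.9 with g = exp(-u/T)."""
    beta = 1.0 / T
    h = r_max / n
    e_sum = 0.0
    p_sum = 0.0
    for i in range(1, n + 1):            # r = 0 is skipped: g vanishes there
        r = i * h
        g = math.exp(-beta * u_lj(r))    # low-density approximation, A2.4.17
        w = 0.5 if i == n else 1.0       # trapezoid end-point weight
        e_sum += w * 4.0 * math.pi * r ** 2 * g * u_lj(r)
        p_sum += w * 4.0 * math.pi * r ** 3 * g * du_lj(r)
    u_excess = 0.5 * rho * e_sum * h           # energy equation, A2.4.8
    z = 1.0 - rho * beta * p_sum * h / 6.0     # pressure equation, A2.4.9
    return u_excess, z

u_ex, z = low_density_properties(rho=0.01, T=2.0)
print(u_ex, z)
```

At this dilute, fairly warm state point the attractive part of the potential dominates, so the excess energy 
is negative and the compressibility factor falls slightly below the ideal-gas value of unity. 
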
A2.4.1.2 SIMULATION METHODS 

Simulation methods for calculating g(r) have come into their own in the past 20 years as the cost of 
computing has fallen. The Monte Carlo method is the simplest in concept: it depends essentially on 
identifying a statistical or Monte Carlo approach to the solution of equation A2.4.2. As with all Monte Carlo 
integrations, a series of random values of the coordinates X_1, ..., X_N is generated, and the integrand evaluated. 
The essential art in the technique is to pick predominantly configurations of high probability, or at least to 
eliminate the wasteful evaluation of the integrand for configurations of high energy. This is achieved by 
moving one particle randomly from the previous configuration, i, and checking the energy difference ΔU = 
U_(i+1) − U_i. If ΔU < 0 the configuration is accepted, and if ΔU > 0, the value of exp(−βΔU) is compared 
to a second random number ξ, where 0 < ξ < 1. If exp(−βΔU) > ξ the configuration is accepted, otherwise it is 
rejected and a new single-particle movement generated. A second 
difficulty is that the total number of particles that can be treated by Monte Carlo techniques is relatively small 
unless a huge computing resource is available: given this, boundary effects would be expected to be dominant, 
and so periodic boundary conditions are imposed, in which any particle leaving through one surface re-enters 
the system through the opposite surface. Detailed treatments of the Monte Carlo technique were first 
described by Metropolis et al [4]; the method has proved valuable not only in the simulation of realistic 
interparticle potentials, but also in the simulation of model potentials for comparison with the integral 
equation approaches above. 
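
The acceptance rule and the periodic boundary conditions described above can be sketched as follows. This is 
a minimal illustration, not a production code: the Lennard-Jones potential, reduced units, cubic box, 
minimum-image convention and all function names are assumptions introduced for the example. 

```python
import math
import random

def pair_energy(r2):
    """Lennard-Jones energy for squared separation r2 (reduced units, assumed model)."""
    inv6 = 1.0 / r2 ** 3
    return 4.0 * (inv6 * inv6 - inv6)

def particle_energy(pos, k, box):
    """Energy of particle k with all others, using the nearest periodic image."""
    e = 0.0
    for j, pj in enumerate(pos):
        if j == k:
            continue
        r2 = 0.0
        for a in range(3):
            d = pos[k][a] - pj[a]
            d -= box * round(d / box)    # minimum-image convention
            r2 += d * d
        e += pair_energy(r2)
    return e

def metropolis_sweep(pos, box, beta, step, rng):
    """Attempt one single-particle move per particle; return the number accepted."""
    accepted = 0
    for k in range(len(pos)):
        old = pos[k]
        e_old = particle_energy(pos, k, box)
        # A particle leaving through one face re-enters through the opposite face.
        pos[k] = [(x + step * (2.0 * rng.random() - 1.0)) % box for x in old]
        delta_u = particle_energy(pos, k, box) - e_old
        # Downhill moves are accepted; uphill moves with probability exp(-beta * dU).
        if delta_u <= 0.0 or math.exp(-beta * delta_u) > rng.random():
            accepted += 1
        else:
            pos[k] = old                 # rejected: restore the previous configuration
    return accepted
```

A typical use would start from a lattice, call `metropolis_sweep` many times with a seeded 
`random.Random` instance, and accumulate a histogram of pair separations to estimate g(r). 
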

The alternative simulation approaches are based on molecular dynamics calculations. This is conceptually 
simpler than the Monte Carlo method: the equations of motion are solved for a system of N molecules, and 
periodic boundary conditions are again imposed. This method permits both the equilibrium and transport 
properties of the system to be evaluated, essentially by numerically solving the equations of motion 

\[ m \frac{\mathrm{d}^2 R_k}{\mathrm{d}t^2} = \sum_{j=1,\, j \neq k}^{N} F(R_{kj}) = -\sum_{j=1,\, j \neq k}^{N} \nabla_k u(R_{kj}) \tag{A2.4.18} \]

by integrating over discrete time intervals δt. Details are given elsewhere [2]. 
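
One common way of integrating equations of motion of this kind over discrete time steps is the velocity Verlet 
scheme; the text does not say which integrator is intended, so the choice here is an assumption, as are the 
Lennard-Jones forces, the reduced units and the omission of periodic boundaries. The function names are 
hypothetical. 

```python
def lj_forces(pos):
    """Return (forces, potential energy) for Lennard-Jones particles in reduced units."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    energy = 0.0
    for k in range(n - 1):
        for j in range(k + 1, n):
            d = [pos[k][a] - pos[j][a] for a in range(3)]
            r2 = sum(x * x for x in d)
            inv6 = 1.0 / r2 ** 3
            energy += 4.0 * (inv6 * inv6 - inv6)
            # (-du/dr)/r, so that the force on particle k is coef * d
            coef = 24.0 * (2.0 * inv6 * inv6 - inv6) / r2
            for a in range(3):
                forces[k][a] += coef * d[a]
                forces[j][a] -= coef * d[a]
    return forces, energy

def velocity_verlet(pos, vel, mass, dt, n_steps):
    """Advance positions and velocities in place; return the final total energy."""
    forces, pot = lj_forces(pos)
    for _ in range(n_steps):
        for k in range(len(pos)):
            for a in range(3):
                vel[k][a] += 0.5 * dt * forces[k][a] / mass   # first half kick
                pos[k][a] += dt * vel[k][a]                   # drift
        forces, pot = lj_forces(pos)
        for k in range(len(pos)):
            for a in range(3):
                vel[k][a] += 0.5 * dt * forces[k][a] / mass   # second half kick
    kin = 0.5 * mass * sum(v * v for p in vel for v in p)
    return kin + pot
```

Conservation of the total energy over a run is a convenient check that the time step is small enough. 
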


A2.4.2 IONIC SOLUTIONS 

There is, in essence, no limitation, other than the computing time, to the accuracy and predictive capacity of 
molecular dynamics and Monte Carlo methods, and, although the derivation of realistic potentials for water is a 
formidable task in its own right, we can anticipate that accurate simulations of water will have been made 
relatively soon. However, there remain major theoretical problems in deriving any analytical theory for water, 
and indeed any other highly-polar solvent of the sort encountered in normal electrochemistry. It might be felt, 
therefore, that the extension of the theory to analytical descriptions of ionic solutions was a well-nigh hopeless 
task. However, a major simplification of our problem is allowed by the possibility, at least in more dilute 
solutions, of smoothing out the influence of the solvent molecules and reducing their influence to such 
average quantities as the dielectric permittivity, ε_m, of the medium. Such a viewpoint is developed within the 
McMillan-Mayer theory of solutions [1,2], which essentially seeks to partition the interaction potential into 
three parts: that due to the interaction between the solvent molecules themselves, that due to the interaction 
between the solvent and the solute and that due to the interaction between the solute molecules dispersed 
within the solvent. The main difference from the dilute fluid results presented above is that the potential 
energy u(r_ij) is replaced by the potential of mean force W(r_ij) for two particles and, for N_α particles of solute 
in the solvent, by the expression 

\[ W(X^{N_\alpha}; z_\alpha \to 0) = -kT \ln g^{(N_\alpha)}(X^{N_\alpha}; z_\alpha \to 0) \]

where z_α is the so-called activity defined as z_α = q_α e^(βμ_α)/Λ_α³ (cf equation A2.4.1); it has units of number 
density. 


The McMillan-Mayer theory allows us to develop a formalism similar to that of a dilute interacting fluid for 
solute dispersed in the solvent provided that a sensible description of W can be given. At the limit of dilution, 
when intersolute interactions can be neglected, we know that the chemical potential of α can be written as 
μ_α = W(α|s) + kT ln(ρ_α Λ_α³ q_α⁻¹), where W(α|s) is the potential of mean force for the interaction of a solute 
molecule with the solvent. If we define γ_α⁰ = lim (z_α/ρ_α) as ρ_α → 0, so that γ_α⁰ = exp[βW(α|s)], then the 
grand canonical partition function can be written in the form: 

\[ \Xi(T, V, \lambda) = \sum_{N_\alpha \geq 0} \frac{1}{N_\alpha!}\, \Xi(T, V, \lambda_s) \left(\frac{z_\alpha}{\gamma_\alpha^0}\right)^{N_\alpha} \int \exp[-\beta W(X^{N_\alpha})] \, \mathrm{d}X^{N_\alpha} \tag{A2.4.19} \]

where we have successfully partitioned the solute-solute interactions into a modified configuration integral, 
the solute-solvent interactions into γ_α⁰ and the solvent-solvent interactions into the partition function 
Ξ(T, V, λ_s). 

A2.4.2.1 THE STRUCTURE OF WATER AND OTHER POLAR SOLVENTS 

In terms of these three types of interactions, we should first consider the problems of water and other polar 
solvents in more detail. Of the various components of the interaction between water molecules, we may 
consider the following. 

(1) At very short distances, less than about 2.5 Å, a reasonable description of the interaction will be strongly 
repulsive, to prevent excessive interpenetration; a Lennard-Jones function will be adequate: 

\[ u_{LJ}(R) = 4\varepsilon\left[\left(\frac{\sigma}{R}\right)^{12} - \left(\frac{\sigma}{R}\right)^{6}\right] \tag{A2.4.20} \]


(2) At distances of a few molecular diameters, the interaction will be dominated by electric multipole 
interactions; for dipolar molecules, such as water, the dominant term will be the dipole-dipole 
interaction: 

\[ U_{DD}(X_1, X_2) = R_{12}^{-3}\left[\mu_1 \cdot \mu_2 - 3(\mu_1 \cdot u_{12})(\mu_2 \cdot u_{12})\right] \tag{A2.4.21} \]

where u_12 is a unit vector in the direction of the vector R_2 − R_1. 

(3) At intermediate distances, 2.4 Å < R < 4 Å, there is a severe analytical difficulty for water and other 
hydrogen-bonded solvents, in that the hydrogen-bond energy is quite large, but is extremely orientation 
dependent. If the water molecule is treated as tetrahedral, with two O-H vectors h_i1, h_i2 and two lone-pair 
vectors l_i1, l_i2 for the ith molecule, then the hydrogen-bond energy has the form 

\[ U_{HB}(X_i, X_j) = \varepsilon_{HB}\, G_\sigma(R_{ij} - R_H) \left\{ \sum_{\alpha,\beta=1}^{2} G_\sigma[(h_{i\alpha} \cdot u_{ij}) - 1]\, G_\sigma[(l_{j\beta} \cdot u_{ij}) + 1] + \sum_{\alpha,\beta=1}^{2} G_\sigma[(l_{i\alpha} \cdot u_{ij}) - 1]\, G_\sigma[(h_{j\beta} \cdot u_{ij}) + 1] \right\} \tag{A2.4.22} \]

an expression that looks unwieldy but is quite straightforward to apply numerically. The function G_σ(x) 
is defined either as unity for |x| < σ and zero for |x| > σ or in a Gaussian form: G_σ(x) = exp(−x²/2σ²). 
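
To show how directly such orientation-dependent terms can be evaluated, the sketch below codes the 
dipole-dipole energy of A2.4.21 and the Gaussian form of the switching function G. It assumes the same 
unit convention as the text (no 4πε₀ factor), reads the Gaussian as exp(−x²/2σ²), and uses hypothetical 
function names. 

```python
import math

def dipole_dipole_energy(mu1, mu2, r1, r2):
    """Dipole-dipole energy of A2.4.21 (no 4*pi*eps0 factor, as in the text)."""
    sep = [b - a for a, b in zip(r1, r2)]            # R_2 - R_1
    dist = math.sqrt(sum(x * x for x in sep))
    u12 = [x / dist for x in sep]                    # unit vector along R_2 - R_1

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return (dot(mu1, mu2) - 3.0 * dot(mu1, u12) * dot(mu2, u12)) / dist ** 3

def g_sigma(x, sigma):
    """Gaussian switching function, assumed to be exp(-x^2 / (2 sigma^2))."""
    return math.exp(-x * x / (2.0 * sigma * sigma))

# Two unit dipoles a distance 2 apart: head-to-tail versus side-by-side parallel.
print(dipole_dipole_energy([1, 0, 0], [1, 0, 0], [0, 0, 0], [2, 0, 0]))
print(dipole_dipole_energy([1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 2, 0]))
```

The head-to-tail arrangement is attractive and the side-by-side parallel arrangement repulsive, which is the 
orientation dependence that the multipole term contributes at a few molecular diameters. 
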

The form of the hydrogen-bonded potential leads to a strongly structured model for water, as discussed above. 
In principle, this structure can be defined in terms of the average number of hydrogen bonds formed by a 
single water molecule with its neighbours. In normal ice this is four, and we expect a value close to this in 
water close to the freezing point. We also intuitively expect that this number will decrease with increasing 
temperature, an expectation confirmed by the temperature dependence of g(R) for water in figure A2.4.2 [1]. 
The picture should be seen as highly dynamic, with these hydrogen bonds forming and breaking continuously, 
with the result that the clusters of water molecules characterizing this picture are themselves in a continuous 
state of flux. 


Figure A2.4.2. The temperature dependence of g(R) of water. From [1]. 

A2.4.2.2 HYDRATION AND SOLVATION OF IONS 

The solute-solvent interaction in equation A2.4.19 is a measure of the solvation energy of the solute species at 
infinite dilution. The basic model for ionic hydration is shown in figure A2.4.3 [5]: there is an inner hydration 
sheath of water molecules whose orientation is essentially determined entirely by the field due to the central 
ion. The number of water molecules in this inner sheath depends on the size and chemistry of the central ion, 
being, for example, four for Be²⁺, but six for Mg²⁺, Al³⁺ and most of the first-row transition ions. Outside 
this primary shell, there is a secondary sheath of more loosely bound water molecules oriented essentially by 
hydrogen bonding, the evidence for which was initially indirect and derived from ion mobility measurements. 
More recent evidence for this secondary shell has now come from x-ray diffraction and scattering studies and 
infrared (IR) measurements. A further highly diffuse region, the tertiary region, is probably present, marking a 
transition to the hydrogen-bonded structure of water described above. The ion, as it moves, will drag at least 
part of this solvation sheath with it, but the picture should be seen as essentially dynamic, with the well 
defined inner sheath structure of figure A2.4.3 being mainly found in highly-charged ions of 
high electronic stability, such as Cr³⁺. The enthalpy of solvation of cations primarily depends on the charge 
on the central ion and the effective ionic radius, the latter being the sum of the normal Pauling ionic radius 
and the radius of the oxygen atom in water (0.85 Å). A reasonable approximate formula is 

\[ \Delta H_{hyd} = -\frac{695\, Z^2}{r_+ + 0.85} \ \mathrm{kJ\ mol^{-1}}. \tag{A2.4.23} \]
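
A short numerical sketch of A2.4.23 follows, reading the coefficient as 695 and taking r₊ in ångströms. The 
Pauling radii used below are illustrative values assumed for this example and are not taken from the text; the 
function name is hypothetical. 

```python
def hydration_enthalpy(charge, pauling_radius_angstrom):
    """Approximate cation hydration enthalpy in kJ/mol from A2.4.23."""
    return -695.0 * charge ** 2 / (pauling_radius_angstrom + 0.85)

# (charge, Pauling radius in angstroms): assumed illustrative values.
ions = {"Na+": (1, 0.95), "Mg2+": (2, 0.65), "Al3+": (3, 0.50)}
for name, (z, r) in ions.items():
    print(f"{name}: {hydration_enthalpy(z, r):.0f} kJ/mol")
```

The strong Z² dependence is evident: with these radii the estimate rises from a few hundred kJ mol⁻¹ for a 
singly-charged ion to several thousand for a triply-charged one. 
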


In general, anions are less strongly hydrated than cations, but recent neutron diffraction data have indicated 
that even around the halide ions there is a well defined primary hydration shell of water molecules, which, in 
the case of Cl⁻, varies from four to six in constitution, the exact number being a sensitive function of 
concentration and the nature of the accompanying cation. 

Figure A2.4.3. The localized structure of a hydrated metal cation in aqueous solution (the metal ion being 
assumed to have a primary hydration number of six). From [5]. 

(A) METHODS FOR DETERMINING THE STRUCTURE OF THE SOLVATION SHEATH 

Structural investigations of metal-ion hydration have been carried out by spectroscopic, scattering and 
diffraction techniques, but these techniques do not always give identical results since they measure on 
different timescales. There are three distinct types of measurement: 

(1) those giving an average structure, such as neutron and x-ray scattering and diffraction studies; 

(2) those revealing dynamic properties of coordinated water molecules, such as nuclear magnetic resonance 
(NMR) and quasi-elastic scattering methods; 

(3) those based on energetic discrimination between water in the different hydration sheaths and bulk water, 
such as IR, Raman and thermodynamic studies. 

First-order difference neutron scattering methods for the analysis of concentrated solutions of anions and 
cations were pioneered by Enderby [6] and co-workers and some results for Ni²⁺ plotted as Δg(r) are shown 
in figure A2.4.4 [5]. The sharp M-O and M-D pair correlations are typical of long-lived inner hydration 
sheaths, with the broader structure showing the second hydration sheath being clearly present, but more 
diffuse. Note that the water molecule is tilted so that the D-O-D...M atoms are not coplanar. This tilt appears 
to be concentration dependent, and decreases to zero below 0.1 M NiCl₂. It is almost certainly caused by 
interaction between the hydrogen-bonded secondary sheaths around the cations, a fact that will complicate the 
nature of the potential of mean force, discussed in more detail below. The secondary hydration sheaths 
have been studied by large-angle x-ray scattering (LAXS). For Cr³⁺, a well defined secondary sheath 
containing 13 ± 1 molecules of water could be identified some 4.02 ± 0.2 Å distant from the central ion. The 
extended x-ray absorption-edge fine structure (EXAFS) technique has also been used to study the local 
environment around anions and cations: in principle the technique is ideally suited for this, since it has high 
selectivity for the central ion and can be used in solutions more dilute than those accessible to neutron or x-ray 
scattering. However, the technique also depends on the capacity of the data to resolve different structural 
models that may actually give rise to rather similar EXAFS spectra. The sensitivity of the technique also falls 
away for internuclear distances in excess of 2.5 Å. 


Figure A2.4.4. Plot of the radial distribution difference function Δg(r) against distance r (pm) for a 1.46 M 
solution of NiCl₂ in D₂O, with the features due to primary shell water and secondary shell water labelled. 
From [5]. 

The secondary hydration sheath has also been studied using vibrational spectroscopy. In the presence of 
highly-charged cations, such as Al³⁺, Cr³⁺ and Rh³⁺, frequency shifts can be seen due to the entire primary 
and secondary hydration structure, although the number of water molecules hydrating the cation is somewhat 
lower than that expected on the basis of neutron data or LAXS data. By contrast, comparison of the Raman 
and neutron diffraction data for Sc³⁺ indicates the presence of [Sc(H₂O)₇]³⁺ in solution, a result supported by 
the observation of pentagonal bipyramidal coordination in the x-ray structure of the aqua di-μ-hydroxo dimer 
[Sc₂(OH)₂]⁴⁺. 

The hydration of more inert ions has been studied by ¹⁸O labelling mass spectrometry. ¹⁸O-enriched water is 
used, and an equilibrium between the solvent and the hydration around the central ion is first attained, after 
which the cation is extracted rapidly and analysed. The method essentially reveals the number of oxygen 
atoms that exchange slowly on the timescale of the extraction, and has been used to establish the existence of 
the stable [Mo₃O₄]⁴⁺ cluster in aqueous solution. 


One of the most powerful methods for the investigation of hydration is NMR, and both ¹H and ¹⁷O nuclei 
have been used. By using paramagnetic chemical shift reagents such as Co²⁺ and Dy³⁺, which essentially 
shift the peak position of bulk water, hydration measurements have been carried out using ¹H NMR on a 
number of tripositive ions. ¹⁷O NMR measurements have also been carried out and, by varying the 
temperature, the dynamics of water exchange can also be studied. The hydration numbers measured by this 
technique are those for the inner hydration sheath and, again, values of four are found for Be²⁺ and six for 
many other di- and tri-positive cations. The hydration numbers for the alkali metals' singly-positive cations 
have also been determined by this method, with values of around three being found. 

Hydration and solvation have also been studied by conductivity measurements; these measurements give rise 
to an effective radius for the ion, from which a hydration number can be calculated. These effective radii are 
reviewed in the next section. 


A2.4.3 IONIC CONDUCTIVITY 

A2.4.3.1 THE BOLTZMANN TRANSPORT EQUATION 

The motion of particles in a fluid is best approached through the Boltzmann transport equation, provided that 
the combination of internal and external perturbations does not substantially disturb the equilibrium. In other 
words, our starting point will be the statistical thermodynamic treatment above, and we will consider the 
effect of both the internal and external fields. Let the chemical species in our fluid be distinguished by the 
Greek subscripts α, β, ... and let f_α(r, c, t) dV dc_x dc_y dc_z be the number of molecules of type α located in 
volume dV at r and having velocities between c_x and c_x + dc_x etc. Note that we expect c and r to be 
independent. Let the external force on molecules of type α be F_α. At any space point, r, the rate of increase of 
f_α, (∂f_α/∂t), will be determined by: 

(1) the nett flow of molecules of type α with velocity c into dV, −∇·(c f_α) = −c·grad(f_α); 

(2) acceleration of molecules in dV into and out of the range dc by F_α, −(1/m_α) ∇_c·(F_α f_α); 

(3) accelerations, de-excitations, etc of local molecules by intermolecular collisions. This is the most 
troublesome part analytically: it will be composed of terms corresponding to gain of molecules in dV at 
c and a loss by collisions. We will not write down an explicit expression for the nett collision effect, but 
rather write (∂f_α/∂t)_coll. 

The nett result is 

\[ \frac{\partial f_\alpha}{\partial t} + c \cdot \nabla_r f_\alpha + \frac{1}{m_\alpha} \nabla_c \cdot (F_\alpha f_\alpha) = \left(\frac{\partial f_\alpha}{\partial t}\right)_{coll} \tag{A2.4.24} \]

which is Boltzmann's transport equation. To make progress, we make the assumption now that in first order, 
f_α on the left-hand side of the equation is the equilibrium value, f_α⁰. We further make the so-called 
relaxation-time approximation that (∂f_α/∂t)_coll = (f_α⁰ − f_α)/τ, where τ is, in principle, a function of c, or 
at least of |c|. We then have, from A2.4.24, (z_α e_0/m_α) E·∇_c(f_α⁰) = (f_α⁰ − f_α)/τ, where the charge on ions 
of type α is z_α e_0 and the applied electric field is E. Given that the current density, j, in dV is 

\[ j = \iiint z_\alpha e_0\, c\, f_\alpha \, \mathrm{d}c = \iiint z_\alpha e_0\, c\, (f_\alpha - f_\alpha^0) \, \mathrm{d}c \tag{A2.4.25} \]

substituting into A2.4.25 from the Boltzmann equation and evaluating the conductivity, κ_α, from ions of type α, 
we have, after carrying out the integrations 

\[ \kappa_\alpha = -\frac{z_\alpha^2 e_0^2}{m_\alpha} \iiint \tau c_x \frac{\partial f_\alpha^0}{\partial c_x} \, \mathrm{d}c = \frac{z_\alpha^2 e_0^2}{m_\alpha} \iiint f_\alpha^0 \frac{\partial (\tau c_x)}{\partial c_x} \, \mathrm{d}c = \frac{N z_\alpha^2 e_0^2 \tau}{m_\alpha} \tag{A2.4.26} \]

where N is the number of ions per unit volume. From elementary analysis, if we define a mean ionic drift 
velocity v in the direction of the applied electric field, E, the conductivity contribution from ions of type α 
will be N z_α e_0 v/|E| = N z_α e_0 u, where u is termed the mobility, from which we can see that u = z_α e_0 τ/m_α. 
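
To get a feel for the magnitude of the relaxation time, the relation u = z e_0 τ/m can be inverted for a 
measured mobility. The sketch below does this for Na⁺ using the mobility of about 5.19 × 10⁻⁸ m² V⁻¹ s⁻¹ 
quoted in this chapter; taking the mass of the bare ion (22.99 u) and ignoring the hydration sheath is an 
assumption of the example, and the function name is hypothetical. 

```python
E0 = 1.602176634e-19       # elementary charge, C
AMU = 1.66053906660e-27    # atomic mass unit, kg

def relaxation_time(mobility, mass_amu, charge=1):
    """tau = u * m / (|z| * e0), obtained by inverting u = z e0 tau / m (SI units)."""
    return mobility * mass_amu * AMU / (abs(charge) * E0)

tau_na = relaxation_time(5.19e-8, 22.99)   # bare Na+ mass assumed
print(tau_na)                              # seconds
```

The result is of the order of 10⁻¹⁴ s, which indicates how quickly the drift velocity of an ion in solution 
relaxes after a change in the applied field. 
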

A2.4.3.2 THE ELEMENTARY THEORY OF IONIC CONDUCTIVITY [7] 

An alternative approach is to consider ions of charge z_α e_0 accelerated by the electric field strength, E, being 
subject to a frictional force, K_R, that increases with velocity, v, and is given, for simple spherical ions of 
radius r_α, by the Stokes formula, K_R = 6πηr_α v, where η is the viscosity of the medium. After a short 
induction period, the velocity attains a limiting value, v_max, corresponding to the exact balance between the 
electrical and frictional forces: 

\[ z_\alpha e_0 E = 6\pi\eta r_\alpha v_{max} \tag{A2.4.27} \]

and the terminal velocity is given by 

\[ v_{max} = \frac{z_\alpha e_0 E}{6\pi\eta r_\alpha} \tag{A2.4.28} \]

and it is evident that τ = m_α/(6πηr_α). It follows that, for given values of η and E, each type of ion will have a 
transport velocity dependent on the charge and the radius of the solvated ion and a direction of migration 
dependent on the sign of the charge. 
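
As a quick numerical illustration of the force balance, the sketch below evaluates the terminal velocity for a 
singly-charged ion. The viscosity (8.9 × 10⁻⁴ Pa s, roughly that of water at 25 °C), the solvated radius 
(1.84 Å) and the field strength are assumed values chosen for the example; the function name is hypothetical. 

```python
import math

E0 = 1.602176634e-19   # elementary charge, C

def terminal_velocity(charge, field, viscosity, radius):
    """v_max = z e0 E / (6 pi eta r), all quantities in SI units."""
    return charge * E0 * field / (6.0 * math.pi * viscosity * radius)

# Assumed inputs: z = +1, E = 100 V/m, eta = 8.9e-4 Pa s, r = 1.84e-10 m.
v = terminal_velocity(1, 100.0, 8.9e-4, 1.84e-10)
print(v)   # m/s; a negative charge gives a velocity of opposite sign
```

Even in a field of 100 V m⁻¹ the drift velocity is only a few micrometres per second, far smaller than 
thermal velocities, which is why the equilibrium velocity distribution is barely disturbed. 
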

For an electrolyte solution containing both anions and cations, with the terminal velocity of the cations being 
v⁺_max and the number of ions of charge z⁺e_0 per unit volume being N⁺, the product A N⁺ v⁺_max corresponds just 
to that quantity of positive ions that passes per unit time through a surface of area A normal to the direction of 
flow. The product A N⁻ v⁻_max can be defined analogously, and the amount of charge carried through this surface 
per unit time, or the current through the area A, is given by 

\[ I = I^+ + I^- = A e_0 (N^+ z^+ v^+_{max} + N^- z^- v^-_{max}) = A e_0 (N^+ z^+ u^+ + N^- z^- u^-)\,|E| \tag{A2.4.29} \]

where the u are the mobilities defined above. If the potential difference between the electrodes is ΔV, and the 
distance apart of the electrodes is l, then the magnitude of the electric field |E| = ΔV/l. Since I = GΔV, where 
G is the conductance, G is given by 

\[ G = (A/l)\, e_0 (N^+ z^+ u^+ + N^- z^- u^-). \tag{A2.4.30} \]

The conductivity is obtained from this by division by the geometric factor (A/l), giving 

\[ \kappa = e_0 (N^+ z^+ u^+ + N^- z^- u^-). \tag{A2.4.31} \]

It is important to recognize the approximations made here: the electric field is supposed to be sufficiently 
small so that the equilibrium distribution of velocities of the ions is essentially undisturbed. We are also 
assuming that we can use the relaxation approximation, and that the relaxation time τ is independent of the 
ionic concentration and velocity. We shall see below that these approximations break down at higher ionic 
concentrations: a primary reason for this is that ion-ion interactions begin to affect both τ and F_α, as we shall 
see in more detail below. However, in very dilute solutions, the ion scattering will be dominated by solvent 
molecules, and in this limiting region A2.4.31 will be an adequate description. 

Measurement of the conductivity can be carried out to high precision with specially designed cells. In 
practice, these cells are calibrated by first measuring the conductance of an accurately known standard, and 
then introducing the sample under study. Conductances are usually measured at about 1 kHz AC rather than 
with DC voltages in order to avoid complications arising from electrolysis at anode and cathode [8]. 

The conductivity of solutions depends, from A2.4.31, on both the concentration of ions and their mobility. 
Typically, for 1 M NaCl in water at 18 °C, a value of 7.44 Ω⁻¹ m⁻¹ is found: by contrast, 1 M H₂SO₄ has a 
conductivity of 36.6 Ω⁻¹ m⁻¹ at the same temperature and acetic acid, a weak electrolyte, has a conductivity 
of only 0.13 Ω⁻¹ m⁻¹. 


In principle, the effects of the concentration of ions can be removed by dividing A2.4.31 by the concentration. 
Taking Avogadro's constant as L and assuming a concentration of solute c mol m⁻³, then from the 
electroneutrality principle we have N⁺z⁺ = N⁻z⁻ = ν±z±cL and clearly 

\[ \Lambda = \frac{\kappa}{c} = \nu_\pm z_\pm L e_0 (u^+ + u^-) = \nu_\pm z_\pm F (u^+ + u^-) \tag{A2.4.32} \]

where Λ is termed the molar conductivity and F is the Faraday, which has the numerical value 96 485 C mol⁻¹. 
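
A small worked example of the division by concentration, using the 1 M NaCl conductivity of 7.44 Ω⁻¹ m⁻¹ 
quoted above. The only step requiring care is the unit conversion from mol per litre to mol m⁻³; the function 
name is hypothetical. 

```python
def molar_conductivity(kappa, conc_mol_per_litre):
    """Lambda = kappa / c in ohm^-1 m^2 mol^-1, with kappa in ohm^-1 m^-1."""
    c_si = conc_mol_per_litre * 1000.0    # mol per cubic metre
    return kappa / c_si

lam = molar_conductivity(7.44, 1.0)
print(lam * 1e4)   # converted to ohm^-1 cm^2 mol^-1
```

This gives about 74 Ω⁻¹ cm² mol⁻¹ at 1 M, well below the sum of the limiting single-ion values tabulated for 
Na⁺ and Cl⁻ later in this section, although those refer to a different temperature and so the comparison is only 
indicative. 
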

In principle, Λ should be independent of the concentration according to A2.4.31, but this is not found 
experimentally. At very low concentrations Λ is roughly constant, but at higher concentrations substantial 
changes in the mobilities of the ions are found, reflecting increasing ion-ion interactions. Even at low 
concentrations the mobilities are not constant and, empirically, for strong electrolytes, Kohlrausch observed 
that Λ decreased with concentration according to the expression 

\[ \Lambda = \Lambda_0 - k\sqrt{c/c^0} \tag{A2.4.33} \]

where Λ_0 is the molar conductivity extrapolated to zero concentration and c⁰ is the standard concentration 
(usually taken as 1 M). Λ_0 plays an important part in the theory of ionic conductivity since at high dilution the 
ions should be able to move completely independently, and as a result equation A2.4.32 expressed in the form 

\[ \Lambda_0 = \nu_\pm z_\pm F (u_0^+ + u_0^-) = \nu_+ \lambda_0^+ + \nu_- \lambda_0^- \tag{A2.4.34} \]

is exactly true. 
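
In practice the limiting molar conductivity is obtained by fitting measured values to the square-root law and 
extrapolating to zero concentration. The sketch below does this with an ordinary least-squares line in 
√(c/c⁰). The data are synthetic, generated from assumed parameters purely to exercise the fit, and the function 
name is hypothetical. 

```python
import math

def kohlrausch_fit(concs, lambdas, c_std=1.0):
    """Least-squares fit of Lambda = Lambda0 - k*sqrt(c/c_std); returns (Lambda0, k)."""
    x = [math.sqrt(c / c_std) for c in concs]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(lambdas) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, lambdas))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, -slope

# Synthetic data from assumed Lambda0 = 126.5 and k = 90 (hypothetical values).
concs = [1e-4, 4e-4, 1e-3, 4e-3]                          # mol per litre
lambdas = [126.5 - 90.0 * math.sqrt(c) for c in concs]
lambda0, k = kohlrausch_fit(concs, lambdas)
print(lambda0, k)
```

With real measurements the fit should be restricted to low concentrations, where the square-root law holds. 
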

The fraction of current carried by the cations is clearly I⁺/(I⁺ + I⁻); this fraction is termed the transport 
number of the cations, t⁺, and evidently 

\[ t^+ = \frac{u^+}{u^+ + u^-} \tag{A2.4.35} \]


In general, since the mobilities are functions of the concentration, so are the transport numbers, but limiting 
transport numbers can be defined by analogy to A2.4.34. The measurement of transport numbers can be 
carried out straightforwardly, allowing an unambiguous partition of the conductivity and assessment of the 
individual ionic mobilities at any concentration. Some typical transport numbers are given in table A2.4.1 [7] 
for aqueous solutions at 25 °C, and some limiting single-ion molar conductivities are given in table A2.4.2 [7]. 
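
The two tables are linked: a limiting transport number follows from the limiting single-ion conductivities, 
since for an electroneutral salt the ratio of mobilities in A2.4.35 equals the ratio of the ν-weighted ionic 
conductivities. The sketch below checks this for a few 1:1 electrolytes using values from table A2.4.2; the 
function name is hypothetical. 

```python
def cation_transport_number(lambda_plus, lambda_minus, nu_plus=1, nu_minus=1):
    """Limiting t0+ = nu+ lambda0+ / (nu+ lambda0+ + nu- lambda0-)."""
    plus = nu_plus * lambda_plus
    return plus / (plus + nu_minus * lambda_minus)

# Limiting ionic conductivities in ohm^-1 mol^-1 cm^2, from table A2.4.2.
print(cation_transport_number(73.5, 76.4))    # KCl
print(cation_transport_number(50.11, 76.4))   # NaCl
print(cation_transport_number(349.8, 76.4))   # HCl
```

The computed values agree with the tabulated transport numbers for KCl, NaCl and HCl to within about 0.001. 
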

Table A2.4.1. Typical transport numbers for aqueous solutions. 

Electrolyte     t₀⁺       t₀⁻ (= 1 − t₀⁺) 
KCl             0.4906    0.5094 
NH₄Cl           0.4909    0.5091 
HCl             0.821     0.179 
KOH             0.274     0.726 
NaCl            0.3962    0.6038 
NaOOCCH₃        0.5507    0.4493 
KOOCCH₃         0.6427    0.3573 
CuSO₄           0.375     0.625 

Table A2.4.2. Limiting single-ion conductivities. 

Ion              λ₀⁺, λ₀⁻ (Ω⁻¹ mol⁻¹ cm²) 
Ag⁺              62.2 
H⁺               349.8 
Na⁺              50.11 
OH⁻              197 
Li⁺              38.68 
K⁺               73.5 
[Fe(CN)₆]⁴⁻      440 
NH₄⁺             73.7 
[Fe(CN)₆]³⁻      303 
Rb⁺              77.5 
[CrO₄]²⁻         166 
Cs⁺              77 
[SO₄]²⁻          161.6 
I⁻               76.5 
Ba²⁺             126.4 
Cl⁻              76.4 
Ca²⁺             119.6 
NO₃⁻             71.5 
Mg²⁺             106 
CH₃COO⁻          40.9 
C₆H₅COO⁻         32.4 

A2.4.3.3 THE SOLVATION OF IONS FROM CONDUCTIVITY MEASUREMENTS 

We know from equation A2.4.32 and equation A2.4.34 that the limiting ionic conductivities are directly 
proportional to the limiting ionic mobilities: in fact 

\[ \lambda_0^+ = z^+ F u_0^+ \tag{A2.4.36} \]

\[ \lambda_0^- = z^- F u_0^- \tag{A2.4.37} \]

At infinite dilution, the assumption of a constant relaxation time is reasonable and, using Stokes law as well, 
we have 

\[ u_0 = \frac{z e_0}{6\pi\eta r}. \tag{A2.4.38} \]

At first sight, we would expect that the mobilities of more highly-charged ions would be larger, but it is 
apparent from table A2.4.2 that this is not the case; the mobilities of Na⁺ and Ca²⁺ are comparable, even 
though equation A2.4.38 would imply that the latter should be about a factor of two larger. The explanation 
lies in the fact that r also increases with charge, which, in turn, can be traced to the increased size of the 
hydration sheath in the doubly-charged species, since there is an increased attraction of the water dipoles to 
the more highly-charged cations. 

It is also possible to explain, from hydration models, the differences between equally-charged cations, such as 
the alkali metals (λ₀(K⁺) = 73.5, λ₀(Na⁺) = 50.11 and λ₀(Li⁺) = 38.68, all in units of Ω⁻¹ mol⁻¹ cm²). From atomic 
physics it is known that the radii of the bare ions are in the order Li⁺ < Na⁺ < K⁺. The attraction of the water 
dipoles to the cation increases strongly as the distance between the charge centres of the cation and water 
molecule decreases, with the result that the total radius of the ion and bound water molecules actually 
increases in the order K⁺ < Na⁺ < Li⁺, and this accounts for the otherwise rather strange order of mobilities. 
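
The ordering of the effective radii can be made quantitative by combining A2.4.36 with A2.4.38 to obtain a 
Stokes radius from each limiting conductivity. In the sketch below the viscosity of water (8.9 × 10⁻⁴ Pa s, 
roughly the 25 °C value) is an assumed input, and the function name is hypothetical. 

```python
import math

E0 = 1.602176634e-19    # elementary charge, C
FARADAY = 96485.0       # C mol^-1
ETA_WATER = 8.9e-4      # Pa s, assumed viscosity of water near 25 C

def stokes_radius(lambda0_cm2, charge=1, viscosity=ETA_WATER):
    """Effective radius in metres from lambda0 given in ohm^-1 cm^2 mol^-1."""
    z = abs(charge)
    mobility = lambda0_cm2 * 1e-4 / (z * FARADAY)          # A2.4.36, SI units
    return z * E0 / (6.0 * math.pi * viscosity * mobility)  # A2.4.38 solved for r

for ion, lam in [("K+", 73.5), ("Na+", 50.11), ("Li+", 38.68)]:
    print(ion, stokes_radius(lam) * 1e10, "angstrom")
```

The radii come out in the order K⁺ < Na⁺ < Li⁺, the reverse of the bare-ion order, as argued above. 
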

The differing extent of hydration shown by the different types of ion can be determined experimentally from 
the amount of water carried over with each type of ion. A simple measurement can be carried out by adding 
an electrolyte such as LiCl to an aqueous solution of sucrose in a Hittorf cell. Such a cell consists of two 
compartments separated by a narrow neck [7]; on passage of charge the strongly hydrated Li⁺ ions will 
migrate from the anode to the cathode compartment, whilst the more weakly hydrated Cl⁻ ions migrate 
towards the anode compartment; the result is a slight increase in the concentration of sucrose in the anode 
compartment, since the sucrose itself is essentially electrically neutral and does not migrate in the electric 
field. The change in concentration of the sucrose can be determined either analytically or by measuring the 
change in rotation of plane polarized light transmitted through the compartment. Measurements carried out in 
this way lead to hydration numbers for ions, these being the number of water molecules that migrate with each 
cation or anion. Values of 10-12 for Mg²⁺, 5.4 for K⁺, 8.4 for Na⁺ and 14 for Li⁺ are clearly in reasonable 
agreement with the values inferred from the Stokes law arguments above. They are also in agreement with the 
measurements carried out using large organic cations to calibrate the experiment, since these are assumed not 
to be hydrated at all. 

Anions are usually less strongly hydrated, as indicated above, and from equation A2.4.38 this would suggest 
that increasing the charge on the anion should lead unequivocally to an increase in mobility and hence to an 
increase in limiting ionic conductivity. An inspection of table A2.4.2 shows this to be borne out to some 
extent by the limited data available. The rather low conductivities exhibited by organic anions are a result of 
their considerably larger size; even taking hydration into account, their total diameter normally exceeds that 
of the simple anions. 

One anomaly immediately obvious from table A2.4.2 is the much higher mobilities of the proton and 
hydroxide ions than expected from even the most approximate estimates of their ionic radii. The origin of this 
behaviour lies in the way in which these ions can be accommodated into the water structure described above. 
Free protons cannot exist as such in aqueous solution: the very small radius of the proton would lead to an
enormous electric field that would polarize any molecule, and in an aqueous solution the proton immediately
attaches itself to the oxygen atom of a water molecule, giving rise to an H₃O⁺ ion. In this ion, however, the
positive charge does not simply reside on a single hydrogen atom; NMR spectra show that all three hydrogen
atoms are equivalent, giving a structure similar to that of the NH₃ molecule. The formation of a water cluster
around the H₃O⁺ ion and its subsequent fragmentation may then lead to the positive charge being transmitted
across the cluster without physical migration of the proton, and the limiting factor in proton motion becomes
hydrogen-bonded cluster formation and not conventional migration. It is clear that this model can be applied
to the anomalous conductivity of the hydroxide ion without any further modification. Hydrogen-atom
tunnelling from a water molecule to an OH⁻ ion will leave behind an OH⁻ ion, and the migration of OH⁻ ions
is, in fact, traceable to the migration of H⁺ in the opposite direction. This type of mechanism is supported by
the observation of the effect of temperature. It is found that the mobility of the proton goes through a 
maximum at a temperature of 150°C (where, of course, the measurements are carried out under pressure). 
This arises because as the temperature is increased from ambient, the main initial effect is to loosen the
hydrogen-bonded local structure that inhibits reorientation. However, at higher temperatures, the thermal 
motion of the water molecules becomes so marked that cluster formation becomes inhibited. 

The complete hydration shell of the proton consists of both the central H₃O⁺ unit and further associated water
molecules; mass spectrometric evidence would suggest that a total of four water molecules form the actual
H₉O₄⁺ unit, giving a hydration number of four for the proton. Of course, the measurement of this number by
the Hittorf method is not possible since the transport of protons takes place by a mechanism that does not
involve the actual movement of this unit. By examining concentration changes and using large organic cations
as calibrants, a hydration number of one is obtained, as would be expected.

From equation A2.4.36 and equation A2.4.37, we can calculate the magnitudes of the mobilities for cations
and anions. As an example, from table A2.4.2, the limiting ionic conductivity for the Na⁺ ion is 50.11 × 10⁻⁴
m² Ω⁻¹ mol⁻¹. From this we obtain a value of $u_0 = \lambda_0/F \approx 5.19 \times 10^{-8}$ m² V⁻¹ s⁻¹, which implies that in a
field of 100 V m⁻¹, the sodium ion would move a distance of about 2 cm in 1 h. The mobilities of other ions
have about the same magnitude ((4-8) × 10⁻⁸ m² V⁻¹ s⁻¹), with the marked exception of the proton. This has
an apparent mobility of 3.63 × 10⁻⁷ m² V⁻¹ s⁻¹, almost an order of magnitude higher, reflecting the different
conduction mechanism described above. These mobilities give rise to velocities that are small compared to
thermal velocities, at least for the small electric fields normally used, confirming the validity of the analysis
carried out above.
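
The arithmetic in this paragraph can be checked with a short sketch. It assumes a singly charged ion, so that the mobility is simply the limiting ionic conductivity divided by the Faraday constant; the function names are illustrative and not part of the text.

```python
FARADAY = 96485.33212  # Faraday constant, C mol^-1


def mobility(limiting_conductivity):
    """Mobility u0 (m^2 V^-1 s^-1) of a singly charged ion.

    limiting_conductivity is lambda0 in m^2 ohm^-1 mol^-1.
    """
    return limiting_conductivity / FARADAY


def drift_distance(u0, field, seconds):
    """Distance (m) covered at the drift velocity u0 * field (field in V m^-1)."""
    return u0 * field * seconds


u_na = mobility(50.11e-4)                       # Na+ value quoted in the text
d_na = drift_distance(u_na, 100.0, 3600.0)      # 100 V m^-1 for one hour
print(f"u0(Na+) = {u_na:.3e} m^2 V^-1 s^-1")
print(f"distance in 1 h = {100.0 * d_na:.2f} cm")
```

The result is close to the 2 cm per hour quoted above, which illustrates how slow ionic drift is compared with thermal motion.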

With the knowledge now of the magnitude of the mobility, we can use equation A2.4.38 to calculate the radii
of the ions; thus for lithium, using the value of 0.000 89 kg m⁻¹ s⁻¹ for the viscosity of pure water (since we
are using the conductivity at infinite dilution), the radius is calculated to be 2.38 × 10⁻¹⁰ m (= 2.38 Å). This
can be contrasted with the crystalline ionic radius of Li⁺, which has the value 0.78 Å. The difference between
these values reflects the presence of the hydration sheath of water molecules; as we showed above, the
transport measurements suggest that Li⁺ has a hydration number of 14.
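
As a rough check of the lithium radius, the sketch below applies the Stokes relation r = |z| e₀ / (6π η u₀). The limiting conductivity used for Li⁺ (38.7 × 10⁻⁴ m² Ω⁻¹ mol⁻¹) is an assumed literature-style value, since table A2.4.2 is not reproduced here; the function name and the generalization to |z| ≠ 1 are likewise illustrative.

```python
import math

E0 = 1.602176634e-19   # elementary charge, C
FARADAY = 96485.33212  # Faraday constant, C mol^-1


def stokes_radius(limiting_conductivity, viscosity, charge=1):
    """Hydrodynamic (Stokes) radius in m.

    limiting_conductivity: lambda0 in m^2 ohm^-1 mol^-1
    viscosity: solvent viscosity in kg m^-1 s^-1
    """
    u0 = limiting_conductivity / (abs(charge) * FARADAY)
    return abs(charge) * E0 / (6.0 * math.pi * viscosity * u0)


LAMBDA0_LI = 38.7e-4   # assumed value for Li+, not taken from the text
radius = stokes_radius(LAMBDA0_LI, 0.00089)
print(f"Stokes radius of Li+ = {radius * 1e10:.2f} angstrom")
```

With these inputs the sketch gives a radius of about 2.4 Å, in line with the figure quoted in the text and roughly three times the crystallographic radius.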

From equation A2.4.38 we can, finally, deduce Walden's rule, which states that the product of the ionic
mobility at infinite dilution and the viscosity of the pure solvent is a constant. In fact

$$ u_0 \eta = \frac{z e_0}{6 \pi r} = \text{constant} \tag{A2.4.39} $$

whereby $\lambda_0 \eta$ = constant and $\Lambda_0 \eta$ = constant. This rule permits us to make an estimate of the change of $u_0$ and
$\lambda_0$ with a change in temperature and the alteration of the solvent, simply by incorporating the changes in
viscosity.
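
A minimal sketch of how Walden's rule is used in practice: the conductivity at a new viscosity follows from keeping the product λ₀η fixed. The second viscosity below is purely illustrative, and the rule itself assumes the solvated radius does not change.

```python
def walden_estimate(lambda0_ref, viscosity_ref, viscosity_new):
    """Estimate lambda0 at a new viscosity from lambda0 * eta = constant."""
    return lambda0_ref * viscosity_ref / viscosity_new


# Na+ in water: lambda0 = 50.11e-4 m^2 ohm^-1 mol^-1 at eta = 0.00089 kg m^-1 s^-1.
# The target viscosity of 0.00100 kg m^-1 s^-1 is a hypothetical example.
estimate = walden_estimate(50.11e-4, 0.00089, 0.00100)
print(f"estimated lambda0 = {estimate:.4e} m^2 ohm^-1 mol^-1")
```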


A2.4.4 IONIC INTERACTIONS 

The McMillan-Mayer theory offers the most useful starting point for an elementary theory of ionic 
interactions, since at high dilution we can incorporate all ion-solvent interactions into a limiting chemical 
potential, and deviations from solution ideality can then be explicitly connected with ion-ion interactions only. 
Furthermore, we may assume that, at high dilution, the interaction energy between two ions (assuming only 
two are present in the solution) will be of the form 



$$ u_{ij}(r) = \frac{z_i z_j e_0^2}{4 \pi \varepsilon \varepsilon_0 r}, \qquad r > d \tag{A2.4.40} $$

where in the limiting dilution law, first calculated by Debye and Hückel (DH), d is taken as zero. It should be
emphasized that u(r) is not the potential of mean force, W(r), defined in the McMillan-Mayer theory above;
this latter needs to be worked out by calculating the average electrostatic potential (AEP), $\psi_i(r)$, surrounding a
given ion, i, with charge $z_i e_0$. This is because although the interaction between any ion j and this central ion is
given by A2.4.40, the work required to bring the ion j from infinity to a distance r from i is influenced by
other ions surrounding i. Oppositely charged ions will tend to congregate around the central ion, giving rise to
an ionic 'atmosphere' or cosphere, which intervenes between ions i and j, screening the interaction
represented in A2.4.40. The resulting AEP is the sum of the central interaction and the interaction with the
ionic cosphere, and it can be calculated by utilizing the Poisson equation:

$$ \nabla^2 \psi_i(r) = -\frac{q_i(r)}{\varepsilon \varepsilon_0} \tag{A2.4.41} $$

where $q_i(r)$ is the charge density (i.e. the charge per unit volume) at a distance r from the centre
i. In terms of the pair correlation coefficient defined above:

$$ q_i(r) = e_0 \sum_j z_j \rho_j g_{ji}(r) \tag{A2.4.42} $$

where $\rho_j$ is the number density of ions of type j. From A2.4.20 above, we have $g_{ji}(r) = e^{-\beta W_{ji}(r)}$, and it is the
potential of mean force W that is related to the AEP. The first approximation in the DH theory is then to write


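$$ W_{ji}(r) \approx e_0 z_j \psi_i(r) \tag{A2.4.43} $$

whence $g_{ji}(r) \approx \exp[-\beta e_0 z_j \psi_i(r)]$, which was originally given by DH as a consequence of the Boltzmann law, but
clearly has a deep connection with statistical thermodynamics of fluids. From (A2.4.41) and (A2.4.42), we
have

$$ \nabla^2 \psi_i(r) = -\frac{e_0}{\varepsilon \varepsilon_0} \sum_j z_j \rho_j \exp[-\beta e_0 z_j \psi_i(r)] \tag{A2.4.44} $$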

The major deficiency of the equation as written is that there is no excluded volume, a deficiency DH could 
rectify for the central ion, but not for all ions around the central ion. This deficiency has been addressed 
within the DH framework by Outhwaite [9]. 

To solve A2.4.44, the assumption is made that $\beta z_j e_0 \psi_i(r) \ll 1$, so the exponential term can be expanded.
Furthermore, we must have $\sum_j z_j \rho_j = 0$ since the overall solution is electroneutral. Finally we end up with

$$ \nabla^2 \psi_i(r) = -\frac{e_0}{\varepsilon \varepsilon_0} \sum_j z_j \rho_j e^{-\beta z_j e_0 \psi_i(r)} \approx \frac{\beta e_0^2}{\varepsilon \varepsilon_0} \Big( \sum_j z_j^2 \rho_j \Big) \psi_i(r) \equiv \kappa^2 \psi_i(r) \tag{A2.4.45} $$

where κ has the units of inverse length. The ionic strength, I, is defined as

$$ I = \frac{1}{2} \sum_j \rho_j (z_j e_0)^2 \tag{A2.4.46} $$

so $\kappa^2 = 2I/\varepsilon \varepsilon_0 kT$. For aqueous solutions at 25°C, $\kappa^{-1}/\mathrm{m} = 3.046 \times 10^{-10}/I^{1/2}$, with I in this numerical
formula expressed relative to a standard concentration of 1 mol kg⁻¹. Equation A2.4.45 can be
solved straightforwardly providing the assumption is made that the mean cosphere around each ion is
spherical. On this basis A2.4.45 reduces to

$$ \frac{1}{r^2} \frac{\mathrm{d}}{\mathrm{d}r} \left( r^2 \frac{\mathrm{d}\psi}{\mathrm{d}r} \right) = \kappa^2 \psi \tag{A2.4.47} $$

which solves to give

$$ \psi(r) = \frac{A}{r} e^{-\kappa r} + \frac{B}{r} e^{\kappa r} \tag{A2.4.48} $$

where A and B are constants of integration. B is clearly zero and, in the original DH model with no core
repulsion term, A was fixed by the requirement that as r → 0, ψ(r) must behave as $z_i e_0/4\pi\varepsilon\varepsilon_0 r$. In the
extended model, equation A2.4.47 is also solved for the central ion, and the integration constants determined by
matching ψ and its derivative at the ionic radius boundary. We finally obtain, for the limiting DH model:

$$ \psi_i(r) = \frac{z_i e_0}{4 \pi \varepsilon \varepsilon_0 r} e^{-\kappa r} \tag{A2.4.49} $$
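
To give a feeling for the magnitudes involved, the following sketch evaluates the screening length κ⁻¹ and the screened potential of the limiting DH model. It assumes a relative permittivity of 78.4 for water at 25 °C and takes the ionic strength on the molar scale (mol dm⁻³); both are assumptions of this illustration, as are the function names.

```python
import math

E0 = 1.602176634e-19       # elementary charge, C
KB = 1.380649e-23          # Boltzmann constant, J K^-1
EPS0 = 8.8541878128e-12    # vacuum permittivity, F m^-1
AVOGADRO = 6.02214076e23   # mol^-1


def debye_length(ionic_strength, eps_r=78.4, temperature=298.15):
    """Screening length 1/kappa in m.

    ionic_strength is (1/2) * sum_j c_j z_j^2 in mol dm^-3 (assumed scale).
    """
    # sum_j rho_j z_j^2 in m^-3, from the molar ionic strength
    sum_rho_z2 = 2.0 * ionic_strength * 1000.0 * AVOGADRO
    kappa_sq = E0**2 * sum_rho_z2 / (eps_r * EPS0 * KB * temperature)
    return 1.0 / math.sqrt(kappa_sq)


def screened_potential(r, z, ionic_strength, eps_r=78.4, temperature=298.15):
    """Average electrostatic potential (V) at distance r (m) from an ion of charge z."""
    kappa = 1.0 / debye_length(ionic_strength, eps_r, temperature)
    return z * E0 * math.exp(-kappa * r) / (4.0 * math.pi * eps_r * EPS0 * r)


for ionic_strength in (0.001, 0.01, 0.1):
    print(f"I = {ionic_strength:5.3f}: 1/kappa = {debye_length(ionic_strength) * 1e9:.2f} nm")
```

The screening length shrinks as the inverse square root of the ionic strength, from roughly 10 nm in millimolar solution to about 1 nm at 0.1 M.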


Given that $W_{ji}(r) \approx e_0 z_j \psi_i(r)$, we finally obtain for the pair correlation coefficient

$$ g_{ji}(r) = \exp[-\beta z_j e_0 \psi_i(r)] \approx 1 - \frac{\beta z_i z_j e_0^2}{4 \pi \varepsilon \varepsilon_0 r} e^{-\kappa r} \tag{A2.4.50} $$

An alternative derivation, which uses the Ornstein-Zernike equation (equation (A2.4.12)), was given by Lee
[2]. The Ornstein-Zernike equation can be written as

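$$ h_{ij}(\mathbf{r}, \mathbf{r}') - c_{ij}(\mathbf{r}, \mathbf{r}') = \sum_k \rho_k \int \mathrm{d}\mathbf{s}\, c_{ik}(\mathbf{r}, \mathbf{s})\, h_{kj}(\mathbf{s}, \mathbf{r}') \tag{A2.4.51} $$

Given $y(\mathbf{r}, \mathbf{r}') \approx 1$ for very dilute solutions, the PY condition leads to

$$ c_{ij} \approx f_{ij} = e^{-u_{ij}/kT} - 1 \approx -u_{ij}/kT \quad \text{and} \quad h_{ij} \approx -W_{ij}/kT \tag{A2.4.52} $$

whence $W_{ij}$ satisfies the integral equation

$$ W_{ij}(r) = \frac{z_i z_j e_0^2}{4 \pi \varepsilon \varepsilon_0 r} - \sum_k \frac{\rho_k}{kT} \int \mathrm{d}\mathbf{s}\, \frac{z_i z_k e_0^2}{4 \pi \varepsilon \varepsilon_0 |\mathbf{s}|}\, W_{kj}(|\mathbf{r} - \mathbf{s}|) \tag{A2.4.53} $$

This can be solved by standard techniques to yield

$$ W_{ij}(r) = \frac{z_i z_j e_0^2}{4 \pi \varepsilon \varepsilon_0} \frac{e^{-\kappa r}}{r} \tag{A2.4.54} $$

a result identical to that above.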

From these results, the thermodynamic properties of the solutions may be obtained within the McMillan- 
Mayer approximation; i.e. treating the dilute solution as a quasi-ideal gas, and looking at deviations from this 
model solely in terms of ion-ion interactions, we have 
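
$$ \frac{U - U^{\mathrm{ideal}}}{V} = \frac{1}{2} \sum_i \sum_j \rho_i \rho_j \int_0^\infty \mathrm{d}r\, [4 \pi r^2 u_{ij}(r) g_{ij}(r)] \tag{A2.4.55} $$

using (A2.4.50) and $\sum_i z_i \rho_i = 0$, this can be evaluated to give

$$ \frac{U - U^{\mathrm{ideal}}}{V} = -\frac{\kappa^3 kT}{8 \pi} \tag{A2.4.56} $$

The chemical potential may be calculated from the expression

$$ \frac{\mu_j}{kT} = \ln(\rho_j \Lambda_j^3) + \sum_i \frac{\rho_i}{kT} \int_0^1 \mathrm{d}\xi \int_0^\infty \mathrm{d}r\, [4 \pi r^2 u_{ij}(r) g_{ij}(r; \xi)] \tag{A2.4.57} $$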

where ξ is a coupling parameter which determines the extent to which ion j is coupled to the remaining ions in
the solution. This is closely related to the work of charging the jth ion in the potential of all the other ions and,
for the simple expression in (A2.4.57), the charging can be represented by writing the charge on the jth ion in
the equation for $g_{ij}$ as $z_j \xi$, with ξ increasing from zero to one as in (A2.4.57). Again, using (A2.4.50) and
$\sum_i z_i \rho_i = 0$, we find

$$ \frac{\mu_j}{kT} = \ln(\rho_j \Lambda_j^3) - \frac{z_j^2 e_0^2 \kappa}{8 \pi \varepsilon \varepsilon_0 kT} \tag{A2.4.58} $$

This is, in fact, the main result of the DH analysis. The activity coefficient is clearly given by

$$ \ln \gamma_j = -\frac{z_j^2 e_0^2 \kappa}{8 \pi \varepsilon \varepsilon_0 kT} \tag{A2.4.59} $$

where again the activity coefficient is referred to a dilute non-interacting 'gas' of solvated ions in the
solvent. From equation (A2.4.46), we can express (A2.4.59) in terms of the ionic strength I, and we find:

$$ \ln \gamma_j = -A z_j^2 \sqrt{I} = -1.172\, z_j^2 \sqrt{I} \tag{A2.4.60} $$

where I is the ionic strength relative to a standard concentration of 1 mol kg⁻¹ and A is a constant that depends solely
on the properties of the solvent, and the numerical value refers to water at 25°C. It should be realized that separate
ionic activity coefficients are not, in fact, accessible experimentally, and only the mean activity coefficient,
defined for a binary electrolyte $A_{\nu_+}B_{\nu_-}$ by $\gamma_\pm = (\gamma_+^{\nu_+} \gamma_-^{\nu_-})^{1/(\nu_+ + \nu_-)}$, is accessible. Straightforward algebra gives

$$ \ln \gamma_\pm = -A |z_+ z_-| \sqrt{I} = -1.172\, |z_+ z_-| \sqrt{I} \tag{A2.4.61} $$
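
The limiting law is simple enough to evaluate directly. The sketch below computes the ionic strength from the molality and stoichiometry of a binary electrolyte and then the mean activity coefficient from (A2.4.61), using A = 1.172 for water at 25 °C as quoted above. The function names are illustrative.

```python
import math

A_WATER_25C = 1.172  # natural-log DH constant for water at 25 °C, from (A2.4.60)


def ionic_strength(molality, nu_plus, z_plus, nu_minus, z_minus):
    """Ionic strength (mol kg^-1 scale) of a fully dissociated binary electrolyte."""
    return 0.5 * molality * (nu_plus * z_plus**2 + nu_minus * z_minus**2)


def gamma_limiting(z_plus, z_minus, strength):
    """Mean activity coefficient from the DH limiting law (A2.4.61)."""
    return math.exp(-A_WATER_25C * abs(z_plus * z_minus) * math.sqrt(strength))


# 0.001 mol kg^-1 solutions of a 1-1, a 1-2 and a 2-2 electrolyte
for label, nu_p, z_p, nu_m, z_m in (("1-1", 1, 1, 1, -1),
                                    ("1-2", 2, 1, 1, -2),
                                    ("2-2", 1, 2, 1, -2)):
    strength = ionic_strength(0.001, nu_p, z_p, nu_m, z_m)
    print(f"{label}: I = {strength:.3f}, gamma = {gamma_limiting(z_p, z_m, strength):.4f}")
```

The three values agree with the corresponding limiting-law entries at m = 0.001 mol kg⁻¹ in table A2.4.3 to within rounding.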

We have seen that the DH theory in the limiting case neglects excluded volume effects; in fact the excluded
volume of the central ion can be introduced into the theory as explained after A2.4.48. If the radius of the ions
is taken as $a_0$ for all ions, we have, in first order,

$$ \ln \gamma_j = -\frac{z_j^2 e_0^2 \kappa}{8 \pi \varepsilon \varepsilon_0 kT (1 + \kappa a_0)} \tag{A2.4.62} $$

For different electrolytes of the same charge, expression A2.4.61 predicts the same values for γ±; in other
words, the limiting law does not make allowance for any differences in size or other ionic properties. For 1-1 

electrolytes, this is experimentally found to be the case for concentrations below 10 M, although for multi- 
charged electrolytes, the agreement is less good, even for 10 mol kg . In table A2.4.3 [7] some values of y + 
calculated from A2.4.61 are collected and compared to some measured activity coefficients for a few simple
electrolytes. Figure A2.4.5 [7] shows these properties graphed for 1-1 electrolytes to emphasize the nature of 
the deviations from the limiting law. It is apparent from the data in both the table and the figure that 
deviations from the limiting law are far more serious for 2-1 electrolytes, such as H₂SO₄ and Na₂SO₄. In the
latter case, for example, the limiting law is in serious error even at 0.005 M. 



Figure A2.4.5. Theoretical variation of the activity coefficient γ± with √I from equation (A2.4.61) and
experimental results for 1-1 electrolytes at 25°C. From [7].

Table A2.4.3. γ± values of various electrolytes at different concentrations.

1-1 electrolytes

m (mol kg⁻¹)   I       Equation (A2.4.60)   HCl      KNO₃     LiF
0.001          0.001   0.9636               0.9656   0.9649   0.965
0.002          0.002   0.9489               0.9521   0.9514   0.951
0.005          0.005   0.9205               0.9285   0.9256   0.922
0.010          0.010   0.8894               0.9043   0.8982   0.889
0.020          0.020   0.8472               0.8755   0.8623   0.850
0.050          0.050   -                    0.8304   0.7991   -
0.100          0.100   -                    0.7964   0.7380   -

1-2 or 2-1 electrolytes

m (mol kg⁻¹)   I       Equation (A2.4.60)   H₂SO₄    Na₂SO₄
0.001          0.003   0.8795               0.837    0.887
0.002          0.006   0.8339               0.767    0.847
0.005          0.015   0.7504               0.646    0.778
0.010          0.030   0.6662               0.543    0.714
0.020          0.060   -                    0.444    0.641
0.050          0.150   -                    -        0.536
0.100          0.300   -                    0.379    0.453

2-2 electrolytes

m (mol kg⁻¹)   I       Equation (A2.4.60)   CdSO₄    CuSO₄
0.001          0.004   0.7433               0.754    0.74
0.002          0.008   0.6674               0.671    -
0.005          0.020   0.5152               0.540    0.53
0.010          0.040   -                    0.432    0.41
0.020          0.080   -                    0.336    0.315
0.050          0.200   -                    0.277    0.209
0.100          0.400   -                    0.166    0.149

A2.4.4.1 BEYOND THE LIMITING LAW

At concentrations greater than 0.001 mol kg⁻¹, equation A2.4.61 becomes progressively less and less
accurate, particularly for unsymmetrical electrolytes. It is also clear, from table A2.4.3, that even the
properties of electrolytes of the same charge type are no longer independent of the chemical identity of the
electrolyte itself, and our neglect of the factor $\kappa a_0$ in the derivation of A2.4.61 is also not valid. As indicated
above, a partial improvement in the DH theory may be made by including the effect of finite size of the
central ion alone. This leads to the expression

$$ \ln \gamma_\pm = -\frac{|z_+ z_-| e_0^2 \kappa}{8 \pi \varepsilon \varepsilon_0 kT (1 + \kappa a_0)} = -\frac{A |z_+ z_-| \sqrt{I}}{1 + B a_0 \sqrt{I}} \tag{A2.4.63} $$

where the parameter B also depends only on the properties of the solvent, and has the value 3.28 × 10⁹ m⁻¹
for water at 25°C. The parameter $a_0$ is adjustable in the theory, and usually the product $B a_0$ is close to unity.
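
The effect of the finite-size correction can be seen by comparing (A2.4.61) with (A2.4.63) numerically. In the sketch below the ion-size parameter is chosen so that B a₀ = 1, which the text describes as typical; that choice, and the function names, are assumptions of the illustration.

```python
import math

A_WATER_25C = 1.172   # natural-log DH constant for water at 25 °C
B_WATER_25C = 3.28e9  # m^-1, value quoted in the text


def ln_gamma_limiting(z_plus, z_minus, strength):
    """ln(gamma) from the limiting law (A2.4.61)."""
    return -A_WATER_25C * abs(z_plus * z_minus) * math.sqrt(strength)


def ln_gamma_extended(z_plus, z_minus, strength, a0):
    """ln(gamma) from the extended law (A2.4.63); a0 is the ion-size parameter in m."""
    root_i = math.sqrt(strength)
    return (-A_WATER_25C * abs(z_plus * z_minus) * root_i
            / (1.0 + B_WATER_25C * a0 * root_i))


a0 = 1.0 / B_WATER_25C  # assumed, so that B * a0 = 1 (about 3 angstrom)
for strength in (0.001, 0.01, 0.1):
    limiting = math.exp(ln_gamma_limiting(1, -1, strength))
    extended = math.exp(ln_gamma_extended(1, -1, strength, a0))
    print(f"I = {strength:5.3f}: limiting {limiting:.4f}, extended {extended:.4f}")
```

The correction always raises the predicted activity coefficient, and its size grows with ionic strength, which is the direction of the experimental deviations in table A2.4.3 for 1-1 electrolytes.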

Even A2.4.63 fails at concentrations above about 0.1 M, and the mean activity coefficient for NaCl shown in
figure A2.4.6 [2] demonstrates that in more concentrated solutions the activity coefficients begin to rise, often
exceeding the value of unity. This rise can be traced to more than one effect. As we shall see below, the
inclusion of ion-exclusion effects for all the ions gives rise to this phenomenon. In addition, the ion-ion
interactions at higher concentrations cannot really be treated by a hard-sphere model anyway, and models
taking into account the true ion-ion potential for solvated ions at close distances are required. Furthermore, the
number of solvent molecules essentially immobilized in the solvent sheath about each ion becomes a
significant fraction of the total amount of solvent present. This can be exemplified by the case of sulphuric
acid: given that each proton requires four water molecules for solvation and the sulphate ion can be estimated
to require one, each mole of H₂SO₄ will require 9 mol of water. One kilogram of water contains
approximately 55 mol, so that a 1 mol kg⁻¹ solution of H₂SO₄ will only leave 46 mol of 'free' water. The
effective concentration of an electrolyte will, therefore, be appreciably higher than its analytical value, and
this effect becomes more marked the higher the concentration. A further effect also becomes important at
higher concentrations: implicit in our whole approach is the assumption that the free energy of the solvation of
the ions is independent of concentration. However, if we look again at our example of sulphuric acid, it is
clear that for m > 6 mol kg⁻¹, apparently all the water is present in the solvation sheaths of the ions! Of course
what actually occurs is that the extent of solvation of the ions changes, in effect decreasing the stability of the
ions. However, this process essentially invalidates the McMillan-Mayer approach, or at the least requires the
potential of mean force to be chosen in such a way as to reproduce the change in solvation energy.
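
The sulphuric acid bookkeeping above can be written out as a small sketch. It takes the hydration numbers from the text (four per proton, one per sulphate) and assumes a molar mass of 18.015 g mol⁻¹ for water; the notion of an "effective molality" referred to free water only is an illustrative reading of the argument, not a formula given in the source.

```python
MOLES_WATER_PER_KG = 1000.0 / 18.015  # about 55.5 mol of water per kg
WATER_PER_H2SO4 = 2 * 4 + 1           # 2 protons x 4 waters + 1 for sulphate


def free_water_moles(molality):
    """Moles of water per kg of solvent not bound in solvation sheaths."""
    return MOLES_WATER_PER_KG - WATER_PER_H2SO4 * molality


def effective_molality(molality):
    """Molality referred to the free water only (illustrative definition)."""
    free = free_water_moles(molality)
    if free <= 0.0:
        raise ValueError("no free water left at this molality")
    return molality * MOLES_WATER_PER_KG / free


def exhaustion_molality():
    """Molality at which all the water would be bound in solvation sheaths."""
    return MOLES_WATER_PER_KG / WATER_PER_H2SO4


print(f"free water at 1 mol/kg: {free_water_moles(1.0):.1f} mol")
print(f"effective molality at 1 mol/kg: {effective_molality(1.0):.2f}")
print(f"all water bound above about {exhaustion_molality():.1f} mol/kg")
```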

Figure A2.4.6. Mean activity coefficient for NaCl solution at 25 °C as a function of the concentration: full
curve from (A2.4.61); dashed curve from (A2.4.63); dot-dashed curve from (A2.4.64). The crosses denote
experimental data. From [2].

Within the general DH approach, equation A2.4.63 may be further modified by adding a linear term, as
suggested by Hitchcock [8]:

$$ \ln \gamma_\pm = -\frac{A |z_+ z_-| \sqrt{I}}{1 + B a_0 \sqrt{I}} + bI \tag{A2.4.64} $$

where b is a parameter to be fitted to the data. As can be seen from figure A2.4.6 this accounts for the
behaviour quite well, but the empirical nature of the parameter b and the lack of agreement on its
interpretation mean that A2.4.64 can only be used empirically.


The simplest extension to the DH equation that does at least allow the qualitative trends at higher
concentrations to be examined is to treat the excluded volume rationally. This model, in which the ion of
charge $z_i e_0$ is given an ionic radius $d_i$, is termed the primitive model. If we assume an essentially spherical
equation for the $u_{ij}$:

$$ u_{ij}(r) = \begin{cases} \infty & r < d_{ij} \\ \dfrac{z_i z_j e_0^2}{4 \pi \varepsilon \varepsilon_0 r} & r > d_{ij} \end{cases} \tag{A2.4.65} $$

This can be treated analytically within the mean spherical approximation for which

$$ g_{ij} = 0, \quad r < d_{ij}; \qquad c_{ij} = -\frac{u_{ij}}{kT}, \quad r > d_{ij} \tag{A2.4.66} $$

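These equations were solved by Blum [10], and a characteristic inverse length, 2Γ, appears in the theory. This
length is implicitly given by the equation

$$ 2\Gamma = \alpha \left\{ \sum_i \rho_i \left[ \frac{z_i - (\pi / 2\Delta)\, d_i^2 P_n}{1 + \Gamma d_i} \right]^2 \right\}^{1/2} \tag{A2.4.67} $$

where

$$ P_n = \frac{1}{\Omega} \sum_k \frac{\rho_k d_k z_k}{1 + \Gamma d_k}, \qquad \Omega = 1 + \frac{\pi}{2\Delta} \sum_k \frac{\rho_k d_k^3}{1 + \Gamma d_k}, \qquad \Delta = 1 - \frac{\pi}{6} \sum_k \rho_k d_k^3, \qquad \alpha^2 = \frac{e_0^2}{\varepsilon \varepsilon_0 kT} $$

In this formalism, which is already far from transparent, the internal energy is given by

$$ \frac{U - U^{\mathrm{HS}}}{V kT} = -\frac{e_0^2}{4 \pi \varepsilon \varepsilon_0 kT} \left\{ \Gamma \sum_i \frac{\rho_i z_i^2}{1 + \Gamma d_i} + \frac{\pi}{2\Delta} \Omega P_n^2 \right\} \tag{A2.4.68} $$

and the mean activity coefficient by

$$ \ln \gamma_\pm = \ln \gamma_\pm^{\mathrm{HS}} + \frac{U - U^{\mathrm{HS}}}{\rho V kT} - \frac{\alpha^2}{8 \rho} \left( \frac{P_n}{\Delta} \right)^2, \qquad \rho = \sum_i \rho_i \tag{A2.4.69} $$

where the superscript HS refers to solutions of the pure hard-sphere model as given by Lee [2].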
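
An implicit equation of this kind is normally solved by iteration. The sketch below treats only the simplest case, the restricted primitive model, in which all ions share one size parameter d and the solution is electroneutral. Under those assumptions (which are a standard simplification of the MSA, not something stated in the text) the condition for Γ reduces to 2Γ(1 + Γd) = κ, which also has a closed-form root that can be used to check the iteration. The function names are illustrative.

```python
import math


def msa_gamma(kappa, d, tol=1e-12, max_iter=1000):
    """Solve 2*Gamma*(1 + Gamma*d) = kappa by fixed-point iteration.

    kappa: inverse Debye length (m^-1); d: common ion size parameter (m).
    """
    gamma = 0.5 * kappa  # DH limit (d -> 0) as the starting guess
    for _ in range(max_iter):
        updated = kappa / (2.0 * (1.0 + gamma * d))
        if abs(updated - gamma) <= tol * updated:
            return updated
        gamma = updated
    raise RuntimeError("fixed-point iteration did not converge")


def msa_gamma_closed_form(kappa, d):
    """Positive root of 2*d*Gamma^2 + 2*Gamma - kappa = 0 (requires d > 0)."""
    return (math.sqrt(1.0 + 2.0 * kappa * d) - 1.0) / (2.0 * d)


kappa = 3.29e9  # m^-1, roughly a 1 M 1-1 electrolyte in water at 25 °C
d = 4.0e-10     # m, hypothetical ion size
print(f"iterated    2*Gamma = {2.0 * msa_gamma(kappa, d):.4e} m^-1")
print(f"closed form 2*Gamma = {2.0 * msa_gamma_closed_form(kappa, d):.4e} m^-1")
print(f"DH value      kappa = {kappa:.4e} m^-1")
```

In this simplified case 2Γ is always smaller than κ and tends to κ as d tends to zero, so the MSA screening parameter recovers the DH value in the point-ion limit.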


The integral equation approach has also been explored in detail for electrolyte solutions, with the PY equation
proving less useful than the HNC equation. This is partly because the latter model reduces cleanly to the MSA
model for small h(12) since

$$ c(12) = h(12) - \ln y(12) = h(12) - \ln[1 + h(12)] - \beta u(12) \approx -\beta u(12) + \tfrac{1}{2} [h(12)]^2 + \cdots $$

Using the Ornstein-Zernike equation, numerical solutions for the restricted primitive model can be obtained.

In principle, simulation techniques can be used, and Monte Carlo simulations of the primitive model of
electrolyte solutions have appeared since the 1960s. Results for the osmotic coefficients are given for
comparison in table A2.4.4 together with results from the MSA, PY and HNC approaches. The primitive
model is clearly deficient for values of $r_{ij}$ close to the closest distance of approach of the ions. Many years
ago, Gurney [11] noted that when two ions are close enough together for their solvation sheaths to overlap,
some solvent molecules become freed from ionic attraction and are effectively returned to the bulk [12].

Table A2.4.4. Osmotic coefficients obtained by various methods.

Concentration
(mol dm⁻³)      Monte Carlo   MSA     PY      HNC
0.00911         0.97          0.969   0.97    0.97
0.10376         0.945         0.931   0.946   0.946
0.425           0.977         0.945   0.984   0.980
1.00            1.094         1.039   1.108   1.091
1.968           1.346         1.276   1.386   1.340


The potential model for this approach has the form 


-,-.„2 


Uii = < 


4itsr^j (A2.4.70) 


2 


ZiZj^Q 


The $A_{ij}$ are essentially adjustable parameters and, clearly, unless some of the parameters in A2.4.70 are fixed
by physical argument, then calculations using this model will show an improved fit for purely algebraic
reasons. In principle, the radii can be fixed by using tables of ionic radii; calculations of this type, in which
just the $A_{ij}$ are adjustable, have been carried out by Friedman and co-workers using the HNC approach [12].
Further refinements were also discussed by Friedman [13], who pointed out that an additional term $u_{\mathrm{cav},ij}$ is
required to account for the fact that each ion is actually in a cavity of low dielectric constant, $\varepsilon_c$, compared to
that of the bulk solvent, ε. A real difficulty discussed by Friedman is that of making the potential continuous,
since the discontinuous potentials above may lead to artefacts. Friedman [13] addressed this issue and derived
formulae that use repulsion terms of the form $B_{ij} [(r_i + r_j)/r_{ij}]^n$, rather than the hard-sphere model presented
above.


A quite different approach was adopted by Robinson and Stokes [8], who emphasized, as above, that if the
solute dissociated into ν ions, and a total of h molecules of water are required to solvate these ions, then the
real concentration of the ions should be corrected to reflect only the bulk solvent. Robinson and Stokes derive,
with these ideas, the following expression for the activity coefficient:

$$ \ln \gamma_\pm = -\frac{A |z_+ z_-| \sqrt{I}}{1 + B a_0 \sqrt{I}} - \frac{h}{\nu} \ln a_A - \ln[1 + 0.001\, W_A (\nu - h) m] \tag{A2.4.71} $$

where $a_A$ is the activity of the solvent, $W_A$ is its molar mass and m is the molality of the solution. Equation
(A2.4.71) has been extensively tested for electrolytes and, provided h is treated as a parameter, fits
remarkably well for a large range of electrolytes up to molalities in excess of two. Unfortunately, the values of
h so derived, whilst showing some sensible trends, also show some rather counter-intuitive effects, such as an
increase from Cl⁻ to I⁻. Furthermore, the values of h are not additive between cations and anions in solution,
leading to significant doubts about the interpretation of the equation. Although considerable effort has gone
into finding alternative, more accurate expressions, there remains considerable doubt about the overall
physical framework.

A2.4.4.2 INTERIONIC INTERACTIONS AND THE CONDUCTIVITY 

Equation (A2.4.24) determines the mobility of a single ion in solution, and contains no correction terms for 
the interaction of the ions themselves. However, in solution, the application of an electric field causes positive 
and negative ions to move in opposite directions and the symmetrical spherical charge distribution of equation 
(A2.4.49) becomes distorted. Each migrating ion will attempt to rebuild its atmosphere during its motion, but 
this rebuilding process will require a certain time, termed the relaxation time, so that the central ion, on its 
progress through the solution, will always be a little displaced from the centre of charge of its ionic cloud. The 
result of this is that each central ion will experience a retarding force arising from its associated ionic cloud, 
which is migrating in the opposite direction, an effect termed the relaxation or asymmetry effect. Obviously 
this effect will be larger the nearer, on average, the ions are in solution; in other words, the effect will increase 
at higher ionic concentrations. 

In addition to the relaxation effect, the theory of Debye, Hückel and Onsager also takes into account a second
effect, which arises from the Stokes law discussed above. We saw that each ion travelling through the solution 
will experience a frictional effect owing to the viscosity of the liquid. However, this frictional effect itself 
depends on concentration, since, with increasing concentration, encounters between the solvent sheaths of 
oppositely charged ions will become more frequent. The solvent molecules in the solvation sheaths are 
moving with the ions, and therefore an individual ion will experience an additional drag associated with the 
solvent molecules in the solvation sheaths of oppositely charged ions; this is termed the electrophoretic effect. 

The quantitative calculation of the dependence of the electrolyte conductivity on concentration begins from
expression (A2.4.49) for the potential exerted by a central ion and its associated ionic cloud. As soon as this
ionic motion begins, the ion will experience an effective electric field $E_{\mathrm{rel}}$ in a direction opposite to that of the
applied electric field, whose magnitude will depend on the ionic mobility. In addition, there is a second effect
identified by Onsager due to the movement of solvent sheaths associated with the oppositely charged ions
encountered during its own migration through the solution. This second term, the electrophoresis term, will
depend on the viscosity of the liquid, and combining this with the reduction in conductivity due to relaxation
terms we finally emerge with the Debye-Hückel-Onsager equation [8]:

$$ \Lambda = \Lambda_0 - \Lambda_0 \frac{z_+ |z_-| e_0^2}{12 \pi \varepsilon \varepsilon_0 kT} \frac{q}{1 + \sqrt{q}}\, \kappa - \frac{L e_0^2 (z_+ + |z_-|)}{6 \pi \eta}\, \kappa \tag{A2.4.72} $$

where

$$ q = \frac{z_+ |z_-|}{z_+ + |z_-|} \cdot \frac{\lambda_0^+ + \lambda_0^-}{z_+ \lambda_0^- + |z_-| \lambda_0^+} \tag{A2.4.73} $$

L is Avogadro's constant and κ is defined above. It can be seen that there are indeed two corrections to the
conductivity at infinite dilution: the first corresponds to the relaxation effect, and is correct in (A2.4.72) only
under the assumption of a zero ionic radius. For a finite ionic radius, $a_0$, the first term needs to be modified:
Falkenhagen [8] originally showed that simply dividing by a term $(1 + \kappa a_0)$ gives a first-order correction, and
more complex corrections have been reviewed by Pitts et al [14], who show that, to a second order, the
relaxation term in (A2.4.72) should be divided by $(1 + \kappa a_0)(1 + \kappa a_0 \sqrt{q})$. The electrophoretic effect should also
be corrected in more concentrated solutions for ionic size; again to a first order, it is sufficient to divide by the
correction factor $(1 + \kappa a_0)$. Note that for a completely dissociated 1-1 electrolyte q = 0.5, and expression
(A2.4.72) can be re-written in terms of the molarity, $c/c^\circ$, remembering that κ can be expressed either in terms
of molalities or molarities; in the latter case we have

where $c^\circ$ is the standard concentration of 1 M. Finally, we see

$$ \Lambda = \Lambda_0 - (B_1 \Lambda_0 + B_2) \sqrt{c/c^\circ} \tag{A2.4.75} $$

in which $B_1$ and $B_2$ are independent of concentration. This is evidently identical in form to Kohlrausch's
empirical law already discussed earlier (equation A2.4.33). Equation A2.4.75 is valid in the same
concentration range as the DH limiting law, i.e. for concentrations below 0.01 M for symmetrical electrolytes, and
for unsymmetrical electrolytes to even lower values. In fact, for symmetrical singly-charged 1-1 electrolytes,
useful estimations of the behaviour can be obtained, with a few per cent error, for up to 0.1 mol kg⁻¹
concentrations, but symmetrical multi-charged electrolytes ($z_+ = |z_-| \neq 1$) usually show deviations, even at 0.01
M.

At higher concentrations, division by the factor $(1 + \kappa a_0)$ gives rise to an expression of the form

$$ \Lambda = \Lambda_0 - \frac{(B_1 \Lambda_0 + B_2) \sqrt{c/c^\circ}}{1 + \kappa a_0} \tag{A2.4.76} $$

which is valid for concentrations up to about 0.1 M.

In aqueous solution, the values of $B_1$ and $B_2$ can be calculated straightforwardly. If Λ is expressed in m² Ω⁻¹
mol⁻¹ and c° is taken as 1 mol dm⁻³, then, for water at 298 K and $z_+ = |z_-| = 1$, $B_1$ = 0.229 and $B_2$ = 6.027 ×
10⁻³ m² Ω⁻¹ mol⁻¹. At 291 K the corresponding values are 0.229 and 5.15 × 10⁻³ m² Ω⁻¹ mol⁻¹ respectively.
Some data for selected 1-1 electrolytes are given in table A2.4.5 [7]; it can be seen that Onsager's formula is
very well obeyed.
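
The two Onsager coefficients for a 1-1 electrolyte can be estimated from fundamental constants, which makes a useful consistency check on the numbers above. The sketch assumes a relative permittivity of 78.4 and a viscosity of 0.00089 kg m⁻¹ s⁻¹ for water at 298 K, takes q = 0.5, and evaluates κ at c = c° = 1 mol dm⁻³; the limiting conductivity used in the example call is a hypothetical round value.

```python
import math

E0 = 1.602176634e-19       # elementary charge, C
KB = 1.380649e-23          # Boltzmann constant, J K^-1
EPS0 = 8.8541878128e-12    # vacuum permittivity, F m^-1
AVOGADRO = 6.02214076e23   # mol^-1


def onsager_coefficients(eps_r=78.4, viscosity=0.00089, temperature=298.15):
    """Return (B1, B2) for a 1-1 electrolyte; B2 in m^2 ohm^-1 mol^-1."""
    thermal = eps_r * EPS0 * KB * temperature
    # kappa at c = 1 mol dm^-3: sum_j rho_j z_j^2 = 2 * 1000 * L for a 1-1 salt
    kappa = math.sqrt(E0**2 * 2.0 * 1000.0 * AVOGADRO / thermal)
    q = 0.5
    b1 = E0**2 * q * kappa / (12.0 * math.pi * thermal * (1.0 + math.sqrt(q)))
    b2 = AVOGADRO * E0**2 * 2.0 * kappa / (6.0 * math.pi * viscosity)
    return b1, b2


def molar_conductivity(lambda0, concentration, b1, b2):
    """Limiting-law conductivity; concentration in mol dm^-3 (c/c° with c° = 1 M)."""
    return lambda0 - (b1 * lambda0 + b2) * math.sqrt(concentration)


b1, b2 = onsager_coefficients()
print(f"B1 = {b1:.3f}, B2 = {b2:.3e} m^2 ohm^-1 mol^-1")
# hypothetical 1-1 salt with limiting conductivity 0.0150 m^2 ohm^-1 mol^-1
print(f"conductivity at 0.001 M = {molar_conductivity(0.0150, 0.001, b1, b2):.5f}")
```

With these inputs the sketch gives B₁ close to 0.229 and B₂ within about one per cent of the quoted value; the small difference reflects the particular permittivity and viscosity assumed.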

Table A2.4.5. Experimental and theoretical values of B₁Λ₀ + B₂ for various salts
in aqueous solution at 291 K.

        Observed value of        Calculated value of
Salt    B₁Λ₀ + B₂                B₁Λ₀ + B₂
        (m² Ω⁻¹ mol⁻¹)           (m² Ω⁻¹ mol⁻¹)
LiCl    7.422 × 10⁻³             7.343 × 10⁻³
NaCl    7.459 × 10⁻³             7.569 × 10⁻³
KCl     8.054 × 10⁻³             8.045 × 10⁻³
LiNO₃   7.236 × 10⁻³             7.258 × 10⁻³


A2.4.5 THE ELECTRIFIED DOUBLE LAYER

Once an electrode, which for our purposes may initially be treated as a conducting plane, is introduced into an
electrolyte solution, several things change. There is a substantial loss of symmetry: the potential experienced
by an ion will now be not only the screened potential of the other ions but will contain a term arising from the
field due to the electrode and a term due to the image charge in the electrode. The structure of the solvent is
also perturbed: next to the electrode the orientation of the molecules of solvent will be affected by the electric 
field at the electrode surface and the nett orientation will derive from both the interaction with the electrode 
and with neighbouring molecules and ions. Finally, there may be a sufficiently strong interaction between the 
ions and the electrode surface such that the ions lose at least some of their inner solvation sheath and adsorb 
onto the electrode surface. 

The classical model of the electrified interface is shown in figure A2.4.7 [15], and the following features are 
apparent. 

(1) There is an ordered layer of solvent dipoles next to the electrode surface, the extent of whose orientation 
is expected to depend on the charge on the electrode. 

(2) There is, or may be, an inner layer of specifically adsorbed anions on the surface; these anions have 
displaced one or more solvent molecules and have lost part of their inner solvation sheath. An imaginary 
plane can be drawn through the centres of these anions to form the inner Helmholtz plane (IHP). 

(3) The layer of solvent molecules not directly adjacent to the metal defines the closest distance of approach of
solvated cations. Since the enthalpy of solvation of cations is normally substantially larger than that of
anions, it is normally expected that there will be insufficient energy to strip the cations of their inner 
solvation sheaths, and a second imaginary plane can be drawn through the centres of the solvated 
cations. This second plane is termed the outer Helmholtz plane (OHP). 

(4) Outside the OHP, there may still be an electric field and hence an imbalance of anions and cations 
extending in the form of a diffuse layer into the solution. 

Owing to the various uncompensated charges at the interface there will be associated changes in the potential, 
but there are subtleties about what can actually be measured that need some attention. 

Figure A2.4.7. Hypothetical structure of the electrolyte double layer. From [15].

A2.4.5.1 THE ELECTRODE POTENTIAL


Any measurement of potential must describe a reference point, and we will take as this point the potential of 
an electron well separated from the metal and at rest in vacuo. By reference to figure A2.4.8 [16], we can 
define the following quantities. 


(1) The Fermi energy $\varepsilon_F$, which is the difference in energy between the bottom of the conduction band and
the Fermi level; it is positive and in the simple Sommerfeld theory of metals [17],
$\varepsilon_F = \hbar^2 k_F^2 / 2m = \hbar^2 (3 \pi^2 n_e)^{2/3} / 2m$, where $n_e$ is the number density of electrons.

(2) The work function $\Phi_M$, which is the energy required to remove an electron from the inner Fermi level to
vacuum.

(3) The surface potential of the phase, $\chi_M$, due to the presence of surface dipoles. At the metal-vacuum
interface these dipoles arise from the fact that the electrons in the metal can relax at the surface to some
degree, extending outwards by a distance of the order of 1 Å, and giving rise to a spatial imbalance of
charge at the surface.

(4) The chemical potential of the electrons in the metal, $\mu_e^M$, a negative quantity.

(5) The potential energy of the electrons, V, which is a negative quantity that can be partitioned into bulk
and surface contributions, as shown.
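
The Sommerfeld expression in item (1) is easy to evaluate numerically. The sketch below uses it to estimate the Fermi energy from the conduction-electron density; the density used for sodium (2.65 × 10²⁸ m⁻³) is an assumed textbook-style value that does not appear in the source.

```python
import math

HBAR = 1.054571817e-34          # reduced Planck constant, J s
ELECTRON_MASS = 9.1093837015e-31  # kg
E0 = 1.602176634e-19            # elementary charge, C (J per eV)


def fermi_energy_ev(electron_density):
    """Sommerfeld free-electron Fermi energy in eV; electron_density in m^-3."""
    k_fermi = (3.0 * math.pi**2 * electron_density) ** (1.0 / 3.0)
    return HBAR**2 * k_fermi**2 / (2.0 * ELECTRON_MASS) / E0


N_SODIUM = 2.65e28  # assumed conduction-electron density of sodium, m^-3
print(f"Fermi energy of sodium = {fermi_energy_ev(N_SODIUM):.2f} eV")
```

The result of a few electronvolts is typical of simple metals and is positive, as stated in item (1).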



Figure A2.4.8. Potential energy profile at the metal-vacuum interface. Bulk and surface contributions to V are
shown separately. From [16].

Of the quantities shown in figure A2.4.8, $\Phi_M$ is measurable, as is $\varepsilon_F$, but the remainder are not and must be
calculated. Values of 1-2 V have been obtained for $\chi_M$, although smaller values are found for the alkali
metals.


If two metals with different work functions are placed in contact there will be a flow of electrons from the
metal with the lower work function to that with the higher work function. This will continue until the
electrochemical potentials of the electrons in the two phases are equal. This change gives rise to a measurable
potential difference between the two metals, termed the contact potential or Volta potential difference. Clearly
$\Delta^{M_1}_{M_2} \Phi = e_0 \Delta^{M_1}_{M_2} \psi$, where $\Delta^{M_1}_{M_2} \psi$ is the Volta potential difference between a point close to the surface of M₁ and
that close to the surface of M₂. The actual number
of electrons transferred is very small, so the Fermi energies of the two phases will be unaltered, and only the
value of the potential V will have changed. If we assume that the $\chi_M$ are unaltered as well, and we define the
potential inside the metal as φ, then the equality of electrochemical potentials also leads to


-^ i +«Ai3;* + Mj l2 = o, 


(A2.4.77) 


This internal potential, (|), is not directly measurable; it is termed the Galvani potential, and is the target of 
most of the modelling discussed below. Clearly we have aJ{|^ = &^,x + ^u' $ m 


Once a metal is immersed in a solvent, a second dipolar layer will form at the metal surface due to the
alignment of the solvent dipoles. Again, this contribution to the potential is not directly measurable; in
addition, the metal dipole contribution itself will change since the distribution of the electron cloud will be
modified by the presence of the solvent. Finally, there will be a contribution from free charges both on the
metal and in the electrolyte. The overall contribution to the Galvani potential difference between the metal
and solution then consists of these four quantities, as shown in figure A2.4.9 [16]. If the potential due to
dipoles at the metal-vacuum interface for the metal is $\chi_M$ and for the solvent-vacuum interface is $\chi_S$, then the
Galvani potential difference between metal and solvent can be written either as

$$\Delta^M_S\phi = (\chi_M + \delta\chi_M) - (\chi_S + \delta\chi_S) + g(\text{ion}) = g^M_S(\text{dip}) + g(\text{ion}) \qquad \text{(A2.4.78)}$$

or as

$$\Delta^M_S\phi = \Delta^M_S\chi + \Delta^M_S\psi \qquad \text{(A2.4.79)}$$


where $\delta\chi_M$, $\delta\chi_S$ are the changes in surface dipole for metal and solvent on forming the interface and the g
values are local to the interface. In A2.4.78 we pass across the interface, and in A2.4.79 we pass into the
vacuum from both the metal and the solvent. As before, the value of $\Delta^M_S\psi$, the Volta potential difference, is
measurable experimentally, but it is evident that we cannot associate this potential difference with that due to
free charges at the interface, since there are changes in the dipole contribution on both sides as well. Even if
there are no free charges at the interface (at the point of zero charge, or PZC), the Volta potential difference is
not zero unless $\delta\chi_M = \delta\chi_S$; i.e. the free surfaces of the two phases will still be charged unless the changes in
surface dipole of solvent and metal exactly balance. In practice, this is not the case: careful measurements [18]
show that $\Delta^M_S\psi$ is non-zero at the PZC, showing that the dipole changes do not, in fact, compensate. Historically,
this discussion is of considerable interest, since the EMF at the centre of the bitter dispute between Galvani
and Volta, that found when two different metals are immersed in the same solution, could, in principle, be due
just to the Volta potential difference between the metals. In fact, it is easy to see that if conditions are such
that there are no free charges on either metal, the difference in potential between them, again a measurable
quantity, is given by

$$\Delta E_{\sigma=0} = \Delta\Phi + (\Delta^{M_1}_S\psi)_{\sigma=0} - (\Delta^{M_2}_S\psi)_{\sigma=0} \qquad \text{(A2.4.80)}$$


showing that the difference in work functions would only account for the difference in the electrode potentials 
if the two Volta terms were actually zero. 


Figure A2.4.9. Components of the Galvani potential difference at a metal-solution interface. From [16].

A2.4.5.2 INTERFACIAL THERMODYNAMICS OF THE DIFFUSE LAYER 

Unlike the situation in section A2.4.1, in which the theory was developed in an essentially isotropic
manner, the presence of an electrode introduces an inherently anisotropic element into the equations.
Neglecting rotation-dependent interactions, we see that the overall partition function can be written




(A2.4.81) 


where $w(x_k)$ is the contribution to the potential energy deriving from the electrode itself, and $x_k$ is the distance
between the kth particle and the electrode surface. Clearly, if $w(x_k) \to 0$, then we recover the partition
function for the isotropic fluid. We will work within the McMillan-Mayer theory of solutions, so we will
evaluate A2.4.81 for ions in a continuum, recognizing the need to replace $u(r_{ij})$ by a potential of mean force.
In a similar way, an exact analytical form for $w(x_k)$ is also expected to prove difficult to derive. A complete
account of w must include the following contributions.

(1) A short-range contribution, $w_s(x_k)$, which takes into account the nearest distance of approach of the ion
to the electrode surface. For ions that do not specifically adsorb this will be the OHP, at distance h from the
electrode. For ions that do specifically adsorb, $w_s(x_k)$ will be more complex, having contributions both
from short-range attractive forces and from the energy of de-solvation.

(2) A contribution from the charge on the surface, $w^{(Q)}(x_k)$. If this charge density is written $Q_0$ then
elementary electrostatic theory shows that $w^{(Q)}(x_k)$ will have the unscreened form

$$w^{(Q)}(x_k) = \text{constant} - \frac{z_k e_0 Q_0 x_k}{\varepsilon\varepsilon_0}. \qquad \text{(A2.4.82)}$$

(3) An energy of attraction of the ion to its intrinsic image, $w^{(im)}(x_k)$, of unscreened form

$$w^{(im)}(x_k) = -\frac{z_k^2 e_0^2}{16\pi\varepsilon\varepsilon_0 x_k}. \qquad \text{(A2.4.83)}$$

In addition, the energy of interaction between any two ions will contain a contribution from the mirror
potential of the second ion; $u(r_{ij})$ is now given by a short-range term and a term of the form

$$u(r_{ij}) = \frac{z_i z_j e_0^2}{4\pi\varepsilon\varepsilon_0}\left(\frac{1}{r_{ij}} - \frac{1}{r^*_{ij}}\right) \qquad \text{(A2.4.84)}$$

where $r^*_{ij}$ is the distance between ion i and the image of ion j.

Note that there are several implicit approximations made in this model: the most important is that we have
neglected the effects of the electrode on orientating the solvent molecules at the surface. This is highly
significant: image forces arise whenever there is a discontinuity in the dielectric function, and the simple
model above would suggest that at least the layer of solvent next to the electrode should have a dielectric
function rather different, especially at low frequencies, from the bulk dielectric function. Implicit in
(A2.4.82), (A2.4.83) and (A2.4.84) is also the fact that $\varepsilon$ is assumed independent of x, an assumption again at
variance with the simple model presented in A2.4.5.1. In principle, these deficiencies could be overcome by
modifying the form of the short-range potentials, but it is not obvious that this will be satisfactory for the
description of the image forces, which are intrinsically long range.

The most straightforward development of the above equations has been given by Martynov and Salem [19], 
who show that to a reasonable approximation in dilute electrolytes: 

kThk(p™(x a )fp^) \ u£(jf„) * Vrf^) ■+ ^4M-%) = (A2.4.85) 

kT ln( S (ri, Tj)) + wl fi {Rff) + ZacMrt. Tj) = (A2.4.86) 

where $\phi(x)$ is the Galvani potential in the electrolyte at distance x from the electrode, $\delta\varphi(x_\alpha)$ is the change in
the single-ion mirror-plane potential on moving the ion from infinity to point $x_\alpha$, $\rho_\alpha^0$ is the number density of
ions of type $\alpha$ at a distance remote from the electrode, and $z_\alpha e_0\psi(\mathbf{r}_i, \mathbf{r}_j)$ is the binary electrostatic potential
determined by solution of the relevant Poisson equation:




ft \fl a \ri)/P„ n p /r0 A 


(A2.4.87) 


The physical meaning of the second term in A2.4.87 is that the bracket gives the excess concentration of ions
$\beta$ at point $\mathbf{r}_j$ given an ion $\alpha$ at point $\mathbf{r}_i$. Finally, we need the Poisson equation for the Galvani potential at
distance x from the electrode, which is given by

g* __ £, ***/&' »u> _ (A2488) 

Ax 2 SSq 

By using the expressions for $\rho_\alpha$ and $g$ from (A2.4.85) and (A2.4.86) in (A2.4.87) and (A2.4.88), solutions
may in principle be obtained for the various potentials and charge distributions in the system. The final
equations for a dilute electrolyte are


where ;i " = p {l \ and which is to be solved under the boundary conditions 

dx set* 

and 


(A2.4.89) 


Xj =0 


Rjj ->■ oo 

# a ^Q. 


ff ° ' (A2.4.90) 


+v^,^m-,))/a/]- u 
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with the boundary conditions 

Equations A2.4.89 and A2.4.90 are the most general equations governing the behaviour of an
electrolyte near an electrode, and solving them would, in principle, give a combined DH ionic atmosphere and
a description of the ionic distribution around each electrode.

The zeroth-order solution to the above equations is the Gouy-Chapman theory dating from the early part of
the 20th century [20]. In this solution, the ionic atmosphere is ignored, as is the mirror image potential for the
ion. Equation A2.4.90 can therefore be ignored and equation A2.4.89 reduces to

$$\frac{d^2\phi}{dx^2} = -\frac{e_0}{\varepsilon\varepsilon_0}\sum_\alpha z_\alpha \rho_\alpha^0 \exp\{-\beta[z_\alpha e_0\phi(x)]\} \qquad \text{(A2.4.91)}$$

where we have built in the further assumption that $w_s(x) = 0$ for $x > h$ and $w_s(x) = \infty$ for $x < h$. This
corresponds to the hard-sphere model introduced above. Whilst A2.4.91 can be solved for general electrolyte
solutions, a solution in closed form can be most easily obtained for a 1-1 electrolyte with ionic charges ±z.
Under these circumstances, (A2.4.91) reduces to

$$\frac{d^2\phi}{dx^2} = \frac{2ze_0n^0}{\varepsilon\varepsilon_0}\sinh\left(\frac{ze_0\phi}{kT}\right) \qquad \text{(A2.4.92)}$$

where we have assumed that $\rho_+^0 = \rho_-^0 = n^0$. Integration under the boundary conditions above gives:

$$Q_0 = (8kT\varepsilon\varepsilon_0 n^0)^{1/2}\sinh\left(\frac{ze_0\phi(h)}{2kT}\right) \qquad \text{(A2.4.93)}$$

$$\phi(h) = \frac{2kT}{ze_0}\sinh^{-1}\left(\frac{Q_0}{(8kT\varepsilon\varepsilon_0 n^0)^{1/2}}\right) \qquad \text{(A2.4.94)}$$

$$\phi(x) = \frac{4kT}{ze_0}\tanh^{-1}\left\{e^{-\kappa(x-h)}\tanh\left(\frac{ze_0\phi(h)}{4kT}\right)\right\} \qquad \text{(A2.4.95)}$$

and $\kappa$ has the same meaning as in the DH theory, $\kappa^2 = 2z^2e_0^2n^0/\varepsilon\varepsilon_0kT$. These are the central results of the
Gouy-Chapman theory. Clearly, if $ze_0\phi(h)/4kT$ is small then $\phi(x) \approx \phi(h)\,e^{-\kappa(x-h)}$ and the potential decays
exponentially in the bulk of the electrolyte. The basic physics is similar to the DH analysis, in that the actual
field due to the electrode becomes screened by the ionic charges in the electrolyte.
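
The Gouy-Chapman results for a symmetrical electrolyte can be evaluated numerically. The following is a minimal sketch, assuming the standard closed forms of the theory (screening parameter, surface charge and tanh-type potential profile). The function names, the relative permittivity of 78.5 for bulk water and the example concentration and potential are illustrative assumptions rather than values from the text.

```python
import math

E0 = 1.602176634e-19     # elementary charge, C
KB = 1.380649e-23        # Boltzmann constant, J K^-1
EPS0 = 8.8541878128e-12  # vacuum permittivity, F m^-1
NA = 6.02214076e23       # Avogadro constant, mol^-1


def number_density(conc_molar):
    """Ions of each sign per m^3 for a symmetrical electrolyte (conc in mol dm^-3)."""
    return conc_molar * 1000.0 * NA


def kappa(conc_molar, z=1, eps_r=78.5, temp=298.15):
    """Inverse screening length (m^-1): kappa^2 = 2 z^2 e0^2 n0 / (eps eps0 k T)."""
    n0 = number_density(conc_molar)
    return math.sqrt(2.0 * z ** 2 * E0 ** 2 * n0 / (eps_r * EPS0 * KB * temp))


def surface_charge(phi_h, conc_molar, z=1, eps_r=78.5, temp=298.15):
    """Electrode charge density (C m^-2) for a potential phi_h (V) at the OHP."""
    n0 = number_density(conc_molar)
    prefactor = math.sqrt(8.0 * KB * temp * eps_r * EPS0 * n0)
    return prefactor * math.sinh(z * E0 * phi_h / (2.0 * KB * temp))


def potential(x, phi_h, conc_molar, h=0.0, z=1, eps_r=78.5, temp=298.15):
    """Diffuse-layer potential (V) at distance x >= h (m) from the electrode."""
    k = kappa(conc_molar, z, eps_r, temp)
    t = math.tanh(z * E0 * phi_h / (4.0 * KB * temp))
    return 4.0 * KB * temp / (z * E0) * math.atanh(math.exp(-k * (x - h)) * t)


if __name__ == "__main__":
    c = 0.1  # mol dm^-3, assumed example concentration
    print(f"screening length: {1e9 / kappa(c):.2f} nm")
    print(f"charge at 50 mV:  {surface_charge(0.05, c):.4f} C m^-2")
    print(f"potential at 1 nm: {1e3 * potential(1e-9, 0.05, c):.1f} mV")
```

For small potentials at the OHP the profile returned by `potential` reduces to a simple exponential decay with length 1/κ, which for a 0.1 mol dm^-3 1-1 electrolyte in water is a little under 1 nm.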


A better approximation may be obtained by expansion of 4> and \\f in powers of the dimensionless variable 
q = (zelfEEokT)K. If *s fa + #i and ^ *s ^, then it is possible to show that 
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VjVi(r,, ry) - K 2 f\ (r ti Vj) = -— &(R tj ) (A2.4.96) 

££0 


and 


^T"* 2 * --£**» (A2.4.97) 

dx z k'l 

where Sep = kT e ^lAe^iz is the screened image potential. Solutions to equations (A2.4.96 and A2.4.97) 
have been obtained, and it is found that, for a given $Q_0$, the value of $\phi(h)$ is always smaller than that predicted
by the Gouy-Chapman theory. This theory, both in the zeroth-order and first-order analyses given here, has
been the subject of considerable analytical investigation [15], but there has been relatively little progress in
devising more accurate theories, since the approximations made even in these simple derivations are very
difficult to correct accurately.

A2.4.5.3 SPECIFIC IONIC ADSORPTION AND THE INNER LAYER 

Interaction of the water molecules with the electrode surface can be developed through simple statistical 
models. Clearly for water molecules close to the electrode surface, there will be several opposing effects: the 
hydrogen bonding, tending to align the water molecules with those in solution; the electric field, tending to 
align the water molecules with their dipole moments perpendicular to the electrode surface; and dipole-dipole 
interactions, tending to orient the nearest-neighbour dipoles in opposite directions. Simple estimates [21]
based on 20 kJ mol$^{-1}$ for each hydrogen bond suggest that the orientation energy $pE$ becomes comparable to
this for $E \sim 5 \times 10^9$ V m$^{-1}$; such field strengths will be associated with surface charges of the order of 0.2-0.3
C m$^{-2}$ or 20-30 μC cm$^{-2}$, assuming $p = 6.17 \times 10^{-30}$ C m and the dielectric function for water at the electrode
surface of about six. This corresponds to all molecules being strongly oriented. These are comparable to the
fields expected at reasonably high electrode potentials. Similarly, the energy of interaction of two dipoles
lying antiparallel to each other is $-p^2/(4\pi\varepsilon_0 R^3)$; for $R \sim 4$ Å the orientational field needs to be in excess of
$10^9$ V m$^{-1}$, a comparable number.
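
The order-of-magnitude estimates in the paragraph above can be checked with a few lines of arithmetic. The sketch below assumes a field of 5 x 10^9 V m^-1 (the value at which the dipole orientation energy approaches a hydrogen-bond energy of about 20 kJ mol^-1), the quoted dipole moment and an interfacial relative permittivity of six; the function names are illustrative.

```python
import math

P_WATER = 6.17e-30       # water dipole moment quoted in the text, C m
EPS0 = 8.8541878128e-12  # vacuum permittivity, F m^-1
NA = 6.02214076e23       # Avogadro constant, mol^-1


def orientation_energy_kj_per_mol(field):
    """Dipole orientation energy p*E, expressed per mole of dipoles (kJ mol^-1)."""
    return P_WATER * field * NA / 1000.0


def surface_charge_density(field, eps_r=6.0):
    """Surface charge (C m^-2) that sustains a field E in a medium of permittivity eps_r."""
    return eps_r * EPS0 * field


def field_matching_pair_energy(separation):
    """Field (V m^-1) at which p*E equals the antiparallel pair energy p^2/(4 pi eps0 R^3)."""
    return P_WATER / (4.0 * math.pi * EPS0 * separation ** 3)


field = 5e9  # V m^-1, assumed field strength
print(f"p*E        = {orientation_energy_kj_per_mol(field):.1f} kJ/mol")
print(f"charge     = {surface_charge_density(field):.2f} C/m^2")
print(f"pair field = {field_matching_pair_energy(4e-10):.2e} V/m")
```

With these inputs the orientation energy comes out slightly below 20 kJ mol^-1, the surface charge falls in the 0.2-0.3 C m^-2 range, and the field needed to overcome the antiparallel dipole pairing at 4 Å is close to 10^9 V m^-1.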

The simplest model for water at the electrode surface has just two possible orientations of the water molecules 
at the surface, and was initially described by Watts-Tobin [22]. The associated potential drop is given by 

(AL - N.)p 
.<?(dip) = -i- (A2.4.98) 

and if the total potential drop across the inner region of dimension $h_i$ is $\Delta\phi$:

$$N_+/N_- = \exp[-(U_0 - 2p\,\Delta\phi/h_i)/kT] \qquad \text{(A2.4.99)}$$

where $U_0$ is the energy of interaction between neighbouring dipoles. A somewhat more sophisticated model is
to assume that water is present in the form of both monomers and dimers, with the dimers so oriented as not to
give any nett contribution to the value of g(dip).

A further refinement has come with the work of Parsons [23], building on an analysis by Damaskin and
Frumkin [24]. Parsons suggested that the solvent molecules at the interface could be thought of as being either
free or associated as clusters. In the second case the nett dipole moment would be reduced from the value found
for perpendicular alignment since the clusters would impose their own alignment. The difficulty with such
models is that the structure of the clusters themselves is likely to be a function of the electric field, and
simulation methods show (see below) that this is indeed the case.

The experimental data and arguments by Trasatti [25] show that at the PZC, the water dipole contribution to
the potential drop across the interface is relatively small, varying from about 0 V for Au to about 0.2 V for In
and Cd. For transition metals, values as high as 0.4 V are suggested. The basic idea of water clusters on the
electrode surface dissociating as the electric field is increased has also been supported by in situ Fourier
transform infrared (FTIR) studies [26], and this model also underlies more recent statistical mechanical
studies [27].

The model of the inner layer suggests that the interaction energy of water molecules with the metal will be at 
a minimum somewhere close to the PZC, a result strongly supported by the fact that adsorption of less polar 
organic molecules often shows a maximum at this same point [18]. However, particularly at anodic potentials, 
there is now strong evidence that simple anions may lose part of their hydration or solvation sheath and 
migrate from the OHP to the IHP. There is also evidence that some larger cations, such as [R$_4$N]$^+$, Tl$^+$ and
Cs$^+$, also undergo specific adsorption at sufficiently negative potentials. The evidence for specific adsorption
comes not only from classical experiments in which the surface tension of mercury is studied as a function of
the potential (electrocapillarity), and the coverage derived from rather indirect reasoning [28], but also from more
direct methods, such as the measurement of the amount of material removed from solution, using radioactive
tracers and ellipsometry. A critical problem in much of this work, particularly in those data derived from
electrocapillarity, is that the validity of the Gouy-Chapman model must be assumed, an assumption that has
been queried. The calculation of the free energy change associated with this process is not simple, and the
following effects need to be considered.

(1) The energy gained on moving from the OHP to the IHP. The electrostatic part of this will have the form
$(ze_0Q_0/\varepsilon\varepsilon_0)(x_{\text{OHP}} - x_{\text{IHP}})$, but the de-solvation part is much more difficult to estimate.

(2) The fact that more than one molecule of water may be displaced for each anion adsorbed, and that the 
adsorption energy of these water molecules will show a complex dependence on the electrode potential. 

(3) The fact that a chemical bond may form between a metal and an anion, leading to, at least, a partial 
discharge of the ion. 

(4) The necessity to calculate the electrostatic contribution to both the ion-electrode attraction and the
ion-ion repulsion energies, bearing in mind that there are at least two dielectric function discontinuities in
the simple double-layer model above.

(5) That short-range contributions to both the ion-ion and ion-electrode interactions must be included. 

These calculations have, as their aim, the generation of an adsorption isotherm, relating the concentration of 
ions in the solution to the coverage in the IHP and the potential (or more usually the charge) on the electrode. 
No complete calculations have been carried out incorporating all the above terms. In general, the analytical 
form for the isotherm is 


$$\ln(f(\theta)) = \text{constant} + \ln(a_\pm) + AQ_0 + g(\theta) \qquad \text{(A2.4.100)}$$

where $f(\theta)$ and $g(\theta)$ are functions of the coverage. For models where lateral interactions are dominant, $g(\theta)$
will have a $\theta^{1/2}$ dependence; if multiple electrostatic imaging is important, a term linear in $\theta$ will be found,
whereas, if dispersion interactions between ions on the surface are important, then a term in $\theta^3$ becomes
significant. The form of $f(\theta)$ is normally taken as $\theta/(1 - \theta)^p$, where p is the number of water molecules
displaced by one adsorbed ion. Details of the various isotherms are given elsewhere [28], but modern
simulation methods, as reviewed below, are needed to make further progress.

A2.4.5.4 SIMULATION TECHNIQUES 

The theoretical complexity of the models discussed above and the relative difficulty of establishing 
unequivocal models from the experimental data available has led to an increasing interest in Monte Carlo and, 
particularly, molecular dynamics approaches. Such studies have proved extremely valuable in establishing 
quite independent models of the interface against which the theories described above can be tested. In 
particular, these simulation techniques allow a more realistic explicit treatment of the solvent and ions on an 
equal footing. Typically, the solvent is treated within a rigid multipole model, in which the electrical 
distribution is modelled by a rigid distribution of charges on the various atoms in the solvent molecule. 
Dispersion and short-range interactions are modelled using Lennard-Jones or similar model potentials, and the
interaction of water with the metal surface is generally modelled with a corrugated potential term to take
account of the atomic structure of the metal. Such potentials are particularly marked for metal-oxygen
interactions. In the absence of an electrical charge on the electrode the Pt-O interaction energy is usually
given by an expression of the form

$$U_{\text{Pt-O}} = [A e^{-\alpha r} - B e^{-\beta r}]\,f(x, y) + C e^{-\gamma r}[1 - f(x, y)] \qquad \text{(A2.4.101)}$$

where $f(x, y) = e^{-\delta(x^2 + y^2)}$ and the Pt-H interaction is weakly repulsive, of the form

$$U_{\text{Pt-H}} = D e^{-\mu r}. \qquad \text{(A2.4.102)}$$

This potential will lead to a single water molecule adsorbing at the PZC on Pt with the dipole pointing away 
from the surface and the oxygen atom pointing directly at a Pt-atom site (on-top configuration). 

The main difficulty in these simulations is the long-range nature of the Coulomb interactions, since both 
mirror-plane images and real charges must be included, and the finite nature of the simulated volume must 
also be included. A more detailed discussion is given by Benjamin [29], and the following conclusions have 
been reached. 

(1) Only at extremely high electric fields are the water molecules fully aligned at the electrode surface. For 
electric fields of the size normally encountered, a distribution of dipole directions is found, whose half- 
width is strongly dependent on whether specific adsorption of ions takes place. In the absence of such 
adsorption the distribution function steadily narrows, but in the presence of adsorption the distribution 
may show little change from that found at the PZC; an example is shown in figure A2.4.10 [30]. 

(2) The pair correlation functions $g_{OO}$, $g_{OH}$ and $g_{HH}$ have been obtained for water on an uncharged
electrode surface. For Pt(100), the results are shown in figure A2.4.11 [29], and compared to the
correlation functions for the second, much more liquid-like layer. It is clear that the first solvation peak
is enhanced by comparison to the liquid, but is in the same position, emphasizing the importance of
hydrogen bonding in determining nearest O-O distances; however, beyond the first peak there are new
peaks in the pair correlation function for the water layer immediately adjacent to the electrode that are
absent in the liquid, and result from the periodicity of the Pt surface. By contrast, these peaks have
disappeared in the second layer, which is very similar to normal liquid water.

(3) Simulation results for turning on the electric field at the interface in a system consisting of a water layer 
between two Pt electrodes 3 nm apart show that the dipole density initially increases fairly slowly, but 
that between 10 and 20 V nm -1 there is an apparent phase transition from a moderately ordered 
structure, in which the ordering is confined close to the electrodes only, to a substantially ordered layer 
over the entire 3 nm thickness. Effectively, at this field, which corresponds to the energy of about four 
hydrogen bonds, the system loses all the ordering imposed by the hydrogen bonds and reverts to a purely 
linear array of dipoles. 

(4) For higher concentrations of aqueous electrolyte, the simulations suggest that the ionic densities do not
change monotonically near the electrode surface, as might be expected from the Gouy-Chapman
analysis above, but oscillate in the region x < 10 Å. This oscillation is, in part, associated with the
oscillation in the oxygen atom density caused by the layering effect occurring in liquids near a surface.

(5) At finite positive and negative charge densities on the electrode, the counterion density profiles often 
exhibit significantly higher maxima, i.e. there is an overshoot, and the derived potential actually shows 
oscillations itself close to the electrode surface at concentrations above about 1 M. 

(6) Whether the potentials are derived from quantum mechanical calculations or classical image forces, it is 
quite generally found that there is a stronger barrier to the adsorption of cations at the surface than 
anions, in agreement with that generally . 

Figure A2.4.10. Orientational distribution of the water dipole moment in the adsorbate layer for three
simulations with different surface charge densities (in units of μC cm$^{-2}$ as indicated). In the figure, $\theta$ is the
angle between the water dipole vector and the surface normal that points into the aqueous phase. From [30].

Figure A2.4.11. Water pair correlation functions near the Pt(100) surface. In each panel, the full curve is for
water molecules in the first layer, and the broken curve is for water molecules in the second layer. From [30].


A2.4.6 THERMODYNAMICS OF ELECTRIFIED INTERFACES

If a metal, such as copper, is placed in contact with a solution containing the ions of that metal, such as from
aqueous copper sulphate, then we expect an equilibrium to be set up of the following form:

$$\text{Cu}^0 \rightleftharpoons \text{Cu}^{2+} + 2e^-_{\text{M}} \qquad \text{(A2.4.103)}$$

where the subscript M refers to the metal. As indicated above, there will be a potential difference across the
interface, between the Galvani potential in the interior of the copper and that in the interior of the electrolyte.
The effects of this potential difference must be incorporated into the normal thermodynamic equations
describing the interface, which is done, as above, by defining the electrochemical potential of a species with
charge $ze_0$ per ion or $zF$ per mole. If one mole of z-valent ions is brought from a remote position to the
interior of the solution, in which there exists a potential $\phi$, then the work done will be $zF\phi$; this work term
must be added to or subtracted from the free energy per mole, $\mu$, depending on the relative signs of the charge
and $\phi$, and the condition for equilibrium for component i partitioned between two phases with potentials $\phi$(I)
and $\phi$(II) is

$$\mu_i(\text{I}) + z_iF\phi(\text{I}) = \mu_i(\text{II}) + z_iF\phi(\text{II}) \qquad \text{(A2.4.104)}$$

where $\phi$(I) and $\phi$(II) are the Galvani or inner potentials in the interior of phases (I) and (II). The expression
$\mu_i + z_iF\phi$ is referred to as the electrochemical potential, $\tilde{\mu}_i$. We have

$$\tilde{\mu}_i = \mu_i + z_iF\phi = \mu_i^0 + RT\ln a_i + z_iF\phi \qquad \text{(A2.4.105)}$$


and the condition for electrochemical equilibrium can be written for our copper system:

$$\tilde{\mu}_{\text{Cu}^0}(\text{M}) = \tilde{\mu}_{\text{Cu}^{2+}}(\text{S}) + 2\tilde{\mu}_{e^-}(\text{M}) \qquad \text{(A2.4.106)}$$

where the labels M and S refer to the metal and to the solution respectively. Assuming the copper atoms in the
metal to be neutral, so that $\tilde{\mu}_{\text{Cu}^0} = \mu_{\text{Cu}^0}$, we then have

$$\mu^0_{\text{Cu}^0}(\text{M}) + RT\ln(a_{\text{Cu}^0}(\text{M})) = \mu^0_{\text{Cu}^{2+}}(\text{S}) + RT\ln(a_{\text{Cu}^{2+}}) + 2F\phi_S + 2\mu^0_{e^-}(\text{M}) + 2RT\ln(a_{e^-}) - 2F\phi_M \qquad \text{(A2.4.107)}$$

Given that the concentration of both the copper atoms and the electrons in the copper metal will be effectively
constant, so that two of the activity terms can be neglected, we finally have, on rearranging A2.4.107,

$$\Delta\phi = \phi_M - \phi_S = \frac{\mu^0_{\text{Cu}^{2+}}(\text{S}) + 2\mu^0_{e^-}(\text{M}) - \mu^0_{\text{Cu}^0}(\text{M})}{2F} + \frac{RT}{2F}\ln(a_{\text{Cu}^{2+}}) = \Delta\phi^0 + \frac{RT}{2F}\ln(a_{\text{Cu}^{2+}}) \qquad \text{(A2.4.108)}$$

where $\Delta\phi^0$ is the Galvani potential difference at equilibrium between the electrode and the solution in the case
where $a_{\text{Cu}^{2+}}$(aq) = 1, and is referred to as the standard Galvani potential difference. It can be seen, in general,
that the Galvani potential difference will alter by an amount (RT/zF) ln 10 = 0.059/z V at 298 K, for every
order of magnitude change in activity of the metal ion, where z is the valence of the metal ion in solution.
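
The size of this shift is easy to verify numerically. The following short sketch evaluates (RT/zF) ln 10 and the shift in the Galvani potential difference for a given metal-ion activity; the function names and the example activity of 0.01 are illustrative assumptions.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
F = 96485.33212  # Faraday constant, C mol^-1


def nernst_slope(z, temp=298.15):
    """Change in Galvani potential difference (V) per decade of ion activity."""
    return R * temp * math.log(10.0) / (z * F)


def galvani_shift(activity, z, temp=298.15):
    """Shift (V) from the standard Galvani potential difference at a given activity."""
    return R * temp / (z * F) * math.log(activity)


print(f"slope, z = 1: {1e3 * nernst_slope(1):.1f} mV per decade")
print(f"slope, z = 2: {1e3 * nernst_slope(2):.1f} mV per decade")
# Assumed example: a copper(II) solution of activity 0.01.
print(f"shift, Cu2+ at a = 0.01: {1e3 * galvani_shift(0.01, 2):.1f} mV")
```

For the divalent copper ion the slope is half that of a monovalent ion, so lowering the activity by two decades lowers the Galvani potential difference by roughly 59 mV in total.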

A2.4.6.1 THE NERNST EQUATION FOR REDOX ELECTRODES 

In addition to the case of a metal in contact with its ions in solution there are other cases in which a Galvani
potential difference between two phases may be found. One case is the immersion of an inert electrode, such
as platinum metal, into an electrolyte solution containing a substance 'S' that can exist in either an oxidized or
reduced form through the loss or gain of electrons from the electrode. In the simplest case, we have

$$\text{S}_{\text{ox}} + ne^- \rightleftharpoons \text{S}_{\text{red}} \qquad \text{(A2.4.109)}$$

an example being

$$\text{Fe}^{3+} + e^- \rightleftharpoons \text{Fe}^{2+} \qquad \text{(A2.4.110)}$$

where the physical process described is the exchange of electrons (not ions) between the electrolyte and the
electrode: at no point is the electron conceived as being free in the solution. The equilibrium properties of the
redox reaction A2.4.109 can, in principle, be treated in the same way as above. At equilibrium, once a double
layer has formed and a Galvani potential difference set up, we can write

$$\tilde{\mu}_{\text{ox}} + n\tilde{\mu}_{e^-} = \tilde{\mu}_{\text{red}} \qquad \text{(A2.4.111)}$$

and, bearing in mind that the positive charge on 'ox' must exceed 'red' by $|ne^-|$ if we are to have
electroneutrality, then A2.4.111 becomes

$$\mu^0_{\text{ox}} + RT\ln(a_{\text{ox}}) + nF\phi_S + n\mu^0_{e^-} - nF\phi_M = \mu^0_{\text{red}} + RT\ln(a_{\text{red}}) \qquad \text{(A2.4.112)}$$

whence

$$\Delta\phi = \phi_M - \phi_S = \frac{\mu^0_{\text{ox}} + n\mu^0_{e^-} - \mu^0_{\text{red}}}{nF} + \frac{RT}{nF}\ln\left(\frac{a_{\text{ox}}}{a_{\text{red}}}\right) = \Delta\phi^0 + \frac{RT}{nF}\ln\left(\frac{a_{\text{ox}}}{a_{\text{red}}}\right) \qquad \text{(A2.4.113)}$$

where the standard Galvani potential difference is now defined as that for which the activities of S$_{\text{ox}}$ and S$_{\text{red}}$
are equal. As can be seen, an alteration of this ratio by a factor of ten leads to a change of 0.059/n V in $\Delta\phi$ at
equilibrium. It can also be seen that $\Delta\phi$ will be independent of the magnitudes of the activities provided that
their ratio is a constant.

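For more complicated redox reactions, a general form of the Nernst equation may be derived by analogy with
A2.4.113. If we consider a stoichiometric reaction of the following type:

$$\nu_1\text{S}_1 + \nu_2\text{S}_2 + \cdots + \nu_i\text{S}_i + ne^- \rightleftharpoons \nu_j\text{S}_j + \cdots + \nu_p\text{S}_p \qquad \text{(A2.4.114)}$$

which can be written in the abbreviated form

$$\sum_{\text{ox}}\nu_{\text{ox}}\text{S}_{\text{ox}} + ne^- \rightleftharpoons \sum_{\text{red}}\nu_{\text{red}}\text{S}_{\text{red}} \qquad \text{(A2.4.115)}$$

then straightforward manipulation leads to the generalized Nernst equation:

$$\Delta\phi = \Delta\phi^0 + \frac{RT}{nF}\ln\left(\frac{\prod_{\text{ox}} a_{\text{ox}}^{\nu_{\text{ox}}}}{\prod_{\text{red}} a_{\text{red}}^{\nu_{\text{red}}}}\right) \qquad \text{(A2.4.116)}$$

where the notation

$$\prod_{\text{ox}} a_{\text{ox}}^{\nu_{\text{ox}}} = a_{\text{S}_1}^{\nu_1}\,a_{\text{S}_2}^{\nu_2}\cdots a_{\text{S}_i}^{\nu_i} \qquad \text{(A2.4.117)}$$

has been used for the product of activities. As an example, the reduction of permanganate in acid solution
follows the equation

$$\text{MnO}_4^- + 8\text{H}_3\text{O}^+ + 5e^- \rightleftharpoons \text{Mn}^{2+} + 12\text{H}_2\text{O} \qquad \text{(A2.4.118)}$$

and the potential of a platinum electrode immersed in a solution containing both permanganate and Mn$^{2+}$ is
given by

$$\Delta\phi = \Delta\phi^0 + \frac{RT}{5F}\ln\left(\frac{a_{\text{MnO}_4^-}\,a_{\text{H}_3\text{O}^+}^8}{a_{\text{Mn}^{2+}}}\right) \qquad \text{(A2.4.119)}$$

assuming that the activity of neutral H$_2$O can be put equal to unity.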
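
The generalized Nernst equation lends itself to a small helper that takes lists of activities and stoichiometric coefficients. The sketch below is one possible arrangement, not a prescribed implementation; the function name, the argument layout and the activities used for the permanganate couple are illustrative assumptions.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
F = 96485.33212  # Faraday constant, C mol^-1


def nernst_shift(n, oxidized, reduced, temp=298.15):
    """Return the shift of the Galvani potential difference from its standard value (V).

    oxidized, reduced: iterables of (activity, stoichiometric coefficient) pairs
    for the species on the oxidized and reduced sides of the reaction.
    """
    ln_q = sum(nu * math.log(a) for a, nu in oxidized)
    ln_q -= sum(nu * math.log(a) for a, nu in reduced)
    return R * temp / (n * F) * ln_q


# Permanganate couple with assumed activities: unit activity for MnO4- and Mn2+,
# hydronium activity 0.1 (one pH unit above standard conditions).
shift = nernst_shift(
    n=5,
    oxidized=[(1.0, 1), (0.1, 8)],  # MnO4-, H3O+
    reduced=[(1.0, 1)],             # Mn2+ (water activity taken as unity)
)
print(f"{1e3 * shift:.1f} mV")
```

Because eight hydronium ions are consumed for five electrons, each unit increase in pH lowers the potential by 8/5 of the single-electron Nernst slope, roughly 95 mV at 298 K.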

A2.4.6.2 THE NERNST EQUATION FOR GAS ELECTRODES 

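The Nernst equation above for the dependence of the equilibrium potential of redox electrodes on the activity
of solution species is also valid for uncharged species in the gas phase that take part in electron exchange
reactions at the electrode-electrolyte interface. For the specific equilibrium process involved in the reduction
of chlorine:

$$\text{Cl}_2 + 2e^- \rightleftharpoons 2\text{Cl}^- \qquad \text{(A2.4.120)}$$

the corresponding Nernst equation can easily be shown to be

$$\Delta\phi = \Delta\phi^0 + \frac{RT}{2F}\ln\left(\frac{a_{\text{Cl}_2}(\text{aq})}{a_{\text{Cl}^-}^2}\right) \qquad \text{(A2.4.121)}$$

where $a_{\text{Cl}_2}$(aq) is the activity of the chlorine gas dissolved in water. If the Cl$_2$ solution is in equilibrium with
chlorine at pressure $p_{\text{Cl}_2}$ in the gas phase, then

$$\mu_{\text{Cl}_2}(\text{gas}) = \mu_{\text{Cl}_2}(\text{aq}). \qquad \text{(A2.4.122)}$$

Given that $\mu_{\text{Cl}_2}(\text{gas}) = \mu^0_{\text{Cl}_2}(\text{gas}) + RT\ln(p_{\text{Cl}_2}/p^0)$ and $\mu_{\text{Cl}_2}(\text{aq}) = \mu^0_{\text{Cl}_2}(\text{aq}) + RT\ln(a_{\text{Cl}_2}(\text{aq}))$, where $p^0$ is the
standard pressure of 1 atm (= 101 325 Pa), then it is clear that

$$a_{\text{Cl}_2}(\text{aq}) = \left(\frac{p_{\text{Cl}_2}}{p^0}\right)\exp\left(\frac{\mu^0_{\text{Cl}_2}(\text{gas}) - \mu^0_{\text{Cl}_2}(\text{aq})}{RT}\right) \qquad \text{(A2.4.123)}$$

and we can write

$$\Delta\phi = \Delta\phi^{0\prime} + \frac{RT}{2F}\ln\left(\frac{p_{\text{Cl}_2}}{p^0\,a_{\text{Cl}^-}^2}\right) \qquad \text{(A2.4.124)}$$

where $\Delta\phi^{0\prime}$ is the Galvani potential difference under the standard conditions of $p_{\text{Cl}_2} = p^0$ and $a_{\text{Cl}^-} = 1$.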
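
To illustrate how the chlorine electrode responds to gas pressure and chloride activity, the sketch below evaluates the shift from the standard Galvani potential difference for the reaction Cl2 + 2e- = 2Cl-. The function name and the example pressures and activities are illustrative assumptions.

```python
import math

R = 8.314462618    # gas constant, J mol^-1 K^-1
F = 96485.33212    # Faraday constant, C mol^-1
P_STANDARD = 101325.0  # standard pressure of 1 atm, Pa


def chlorine_electrode_shift(p_cl2, a_chloride, temp=298.15):
    """Shift (V) from the standard value for Cl2 + 2e- = 2Cl-.

    p_cl2: chlorine partial pressure in Pa; a_chloride: chloride ion activity.
    """
    return R * temp / (2.0 * F) * math.log((p_cl2 / P_STANDARD) / a_chloride ** 2)


# Assumed example conditions.
print(f"{1e3 * chlorine_electrode_shift(101325.0, 0.1):.1f} mV")    # tenfold lower chloride activity
print(f"{1e3 * chlorine_electrode_shift(10 * 101325.0, 1.0):.1f} mV")  # tenfold higher pressure
```

Two electrons are transferred but the chloride activity enters squared, so a tenfold change in chloride activity moves the potential by the full single-electron slope (about 59 mV), whereas a tenfold change in chlorine pressure moves it by only half that amount.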

A2.4.6.3 THE MEASUREMENT OF ELECTRODE POTENTIALS AND CELL VOLTAGES

Although the results quoted above are given in terms of the Galvani potential difference between a metal
electrode and an electrolyte solution, direct measurement of this Galvani potential difference between an
electrode and an electrolyte is not possible, since any voltmeter or similar device will incorporate unknowable
surface potentials into the measurement. In particular, any contact of a measurement probe with the solution
phase will have to involve a second phase boundary between the metal and the electrolyte somewhere; at this
boundary an electrochemical equilibrium will be set up and with it a second equilibrium Galvani potential
difference, and the overall potential difference measured by this instrument will in fact be the difference of
two Galvani voltages at the two interfaces. In other words, even at zero current, the actual EMF measured for
a galvanic cell will be the difference between the two Galvani voltages $\Delta\phi$(I) and $\Delta\phi$(II) for the two interfaces,
as shown in figure A2.4.12 [7].

Figure A2.4.12. The EMF of a galvanic cell as the difference between the equilibrium Galvani potentials at
the two electrodes: (a) $\Delta\phi$(I) > 0, $\Delta\phi$(II) > 0 and (b) $\Delta\phi$(I) > 0, $\Delta\phi$(II) < 0. From [7].

Figure A2.4.12 shows the two possibilities that can exist, in which the Galvani potential of the solution, (|> s , 
lies between (|)(I) and (|)(II) and in which it lies below (or, equivalently, above) the Galvani potentials of the 
metals. It should be emphasized that figure A2.4.12 is highly schematic: in reality the potential near the phase 
boundary in the solution changes initially linearly and then exponentially with distance away from the 
electrode surface, as we saw above. The other point is that we have assumed that (|> s is a constant in the region 
between the two electrodes. This will only be true provided the two electrodes are immersed in the same 
solution and that no current is passing. 

It is clear from figure A2.4.12 that the EMF or potential difference, E, between the two metals is given by 


$$E = \Delta\phi(\mathrm{II}) - \Delta\phi(\mathrm{I}) = \phi(\mathrm{II}) - \phi(\mathrm{I}) \qquad \text{(A2.4.125)}$$


where we adopt the normal electrochemical convention that the EMF is always equal to the potential on the 
metal on the right of the figure minus the potential of the metal on the left. It follows that once the Galvani 
potential of any one electrode is known it should be possible, at least in principle, to determine the potentials 
for all other electrodes. 
In practice, since the Galvani potential of no single electrode is known, the method adopted is to choose one
reference electrode arbitrarily and assign a value to its Galvani potential. The choice actually made is that of
the hydrogen electrode, in which hydrogen gas at one atmosphere pressure is bubbled over a platinized
platinum electrode immersed in a solution of unit H₃O⁺ activity. From the discussion in section A2.4.6.2, it
will be clear that provided an equilibrium can be established rapidly for such an electrode, its Galvani 
potential difference will be a constant, and changes in the measured EMF of the complete cell as conditions 
are altered at the other electrode will actually reflect the changes in the Galvani potential difference of that 
electrode. 

Cells need not necessarily contain a reference electrode to obtain meaningful results; as an example, if the two
electrodes in figure A2.4.12 are made from the same metal, M, but these are now in contact with two solutions
of the same metal ions, $\mathrm{M}^{z+}$, but with differing ionic activities, which are separated from each other by a glass
frit that permits contact, but impedes diffusion, then the EMF of such a cell, termed a concentration cell, is
given by

$$E = \Delta\phi(\mathrm{II}) - \Delta\phi(\mathrm{I}) = \frac{RT}{zF}\ln\left[\frac{a_{\mathrm{M}^{z+}}(\mathrm{II})}{a_{\mathrm{M}^{z+}}(\mathrm{I})}\right]. \qquad \text{(A2.4.126)}$$


Equation A2.4.126 shows that, at 25 °C, the EMF increases by 0.059/z V for each decade change in the activity
ratio in the two solutions.
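
The decade rule quoted for equation (A2.4.126) can be checked numerically. The short sketch below is only an illustration: the function name and the default temperature of 298.15 K are assumptions introduced here, the constants are standard values of R and F, and any liquid-junction potential is ignored.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def concentration_cell_emf(a_right, a_left, z, temperature=298.15):
    """EMF in volts of a concentration cell, following equation (A2.4.126).

    a_right: ion activity at electrode II (right-hand side)
    a_left:  ion activity at electrode I (left-hand side)
    z:       charge number of the metal ion
    The liquid-junction potential is neglected (an assumption of this sketch).
    """
    return (R * temperature) / (z * F) * math.log(a_right / a_left)


if __name__ == "__main__":
    # One, two and three decades of activity ratio for a singly charged ion.
    for ratio in (10.0, 100.0, 1000.0):
        print(ratio, concentration_cell_emf(ratio, 1.0, z=1))
```

Each additional decade in the activity ratio adds the same increment, (RT/zF) ln 10, which is about 0.059/z V only near 25 °C; at other temperatures the slope scales with T.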

A2.4.6.4 CONVENTIONS IN THE DESCRIPTION OF CELLS 

In order to describe any electrochemical cell a convention is required for writing down the cells, such as the 
concentration cell described above. This convention should establish clearly where the boundaries between 
the different phases exist and, also, what the overall cell reaction is. It is now standard to use vertical lines to 
delineate phase boundaries, such as those between a solid and a liquid or between two immiscible liquids. The 
junction between two miscible liquids, which might be maintained by the use of a porous glass frit, is
represented by a single vertical dashed line, ¦, and two dashed lines, ¦¦, are used to indicate two liquid phases
joined by an appropriate electrolyte bridge adjusted to minimize potentials arising from the different diffusion
coefficients of the anion and cation (so-called 'junction potentials').

The cell is written such that the cathode is to the right when the cell is acting in the galvanic mode, and 
electrical energy is being generated from the electrochemical reactions at the two electrodes. From the point 
of view of external connections, the cathode will appear to be the positive terminal, since electrons will travel 
in the external circuit from the anode, where they pass from electrolyte to metal, to the cathode, where they 
pass back into the electrolyte. The EMF of such a cell will then be the difference in the Galvani potentials of 
the metal electrodes on the right-hand side and the left-hand side. Thus, the concentration cell of section
A2.4.6.3 would be represented by $\mathrm{M|M^{z+}(I)|M^{z+}(II)|M}$.

In fact, some care is needed with regard to this type of concentration cell, since the assumption implicit in the 
derivation of A2.4.126, that the potential in the solution is constant between the two electrodes, cannot be
entirely correct. At the phase boundary between the two solutions, which is here a semi-permeable membrane 
permitting the passage of water molecules but not ions between the two solutions, there will be a potential 
jump. This so-called liquid-junction potential will increase or decrease the measured EMF of the cell 
depending on its sign. Potential jumps at liquid-liquid junctions are in general rather small compared to 
normal cell voltages, and can be minimized further by suitable experimental modifications to the cell. 
If two redox electrodes both use an inert electrode material such as platinum, the cell EMF can be written
down immediately. Thus, for the hydrogen/chlorine fuel cell, which we represent by the cell H₂(g)|Pt|HCl(m)|Pt|Cl₂(g)
and for which it is clear that the cathodic reaction is the reduction of Cl₂ as considered in section
A2.4.6.2:

$$E = \Delta\phi(\mathrm{Cl_2/Cl^-}) - \Delta\phi(\mathrm{H_2/H_3O^+(aq)})$$

$$\phantom{E} = E^0 - \frac{RT}{F}\ln\left(a_{\mathrm{H_3O^+}}\,a_{\mathrm{Cl^-}}\right) + \frac{RT}{2F}\ln\left[\frac{p_{\mathrm{H_2}}}{p^0}\,\frac{p_{\mathrm{Cl_2}}}{p^0}\right] \qquad \text{(A2.4.127)}$$

where $E^0$ is the standard EMF of the fuel cell, or the EMF at which the activities of H₃O⁺ and Cl⁻ are unity
and the pressures of H₂ and Cl₂ are both equal to the standard pressure, $p^0$.
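
As a rough numerical illustration of equation (A2.4.127), the sketch below evaluates the EMF of the hydrogen/chlorine cell for given activities and pressures. The function name is hypothetical, and the default standard EMF of 1.358 V is the commonly tabulated standard potential of the Cl₂/Cl⁻ couple at 25 °C, which is an assumption brought in here and is not stated in the text.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def h2_cl2_cell_emf(a_h3o, a_cl, p_h2, p_cl2,
                    e_standard=1.358, temperature=298.15, p_standard=1.0):
    """EMF in volts of H2(g)|Pt|HCl|Pt|Cl2(g), following equation (A2.4.127).

    Pressures and p_standard must share the same unit.
    e_standard = 1.358 V is an assumed literature value, not taken from the text.
    """
    rt_over_f = R * temperature / F
    activity_term = rt_over_f * math.log(a_h3o * a_cl)
    pressure_term = 0.5 * rt_over_f * math.log((p_h2 / p_standard) * (p_cl2 / p_standard))
    return e_standard - activity_term + pressure_term


if __name__ == "__main__":
    print(h2_cl2_cell_emf(1.0, 1.0, 1.0, 1.0))   # standard conditions
    print(h2_cl2_cell_emf(0.1, 0.1, 1.0, 1.0))   # more dilute acid
```

Lowering the ionic activities raises the EMF, while lowering either gas pressure reduces it, as the signs of the two logarithmic terms require.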


A2.4.7 ELECTRICAL POTENTIALS AND ELECTRICAL CURRENT

The discussion in earlier sections has focussed, by and large, on understanding the equilibrium structures in 
solution and at the electrode-electrolyte interface. In this last section, some introductory discussion will be 
given of the situation in which we depart from equilibrium by permitting the flow of electrical current through 
the cell. Such current flow leads not only to a potential drop across the electrolyte, which affects the cell 
voltage by virtue of an ohmic drop $IR_i$ (where $R_i$ is the internal resistance of the electrolyte between the
electrodes), but each electrode exhibits a characteristic current-voltage behaviour, and the overall cell voltage 
will, in general, reflect both these effects. 

A2.4.7.1 THE CONCEPT OF OVERPOTENTIAL 

Once current passes through the interface, the Galvani potential difference will differ from that expected from 
the Nernst equation above; the magnitude of the difference is termed the overpotential, which is defined 
heuristically as 

$$\eta = \Delta\phi - \Delta\phi_r = E - E_r \qquad \text{(A2.4.128)}$$

where the subscript r refers to the 'rest' situation, i.e. to the potential measured in the absence of any current 
passing. Provided equilibrium can be established, this rest potential will correspond to that predicted by the 
Nernst equation. Obviously, the sign of $\eta$ is determined by whether $E$ is greater than or less than $E_r$.

At low currents, the rate of change of the electrode potential with current is associated with the limiting rate of 
electron transfer across the phase boundary between the electronically conducting electrode and the ionically 
conducting solution, and is termed the electron transfer overpotential. The electron transfer rate at a given 
overpotential has been found to depend on the nature of the species participating in the reaction, and the 
properties of the electrolyte and the electrode itself (such as, for example, the chemical nature of the metal). 
At higher current densities, the primary electron transfer rate is usually no longer limiting; instead, limitations 
arise through the slow transport of reactants from the solution to the electrode surface or, conversely, the slow 
transport of the product away from the electrode (diffusion overpotential) or through the inability of chemical 
reactions coupled to the electron transfer step to keep pace (reaction overpotential). 
Examples of the latter include the adsorption or desorption of species participating in the reaction or the
participation of chemical reactions before or after the electron transfer step itself. One such process occurs in 
the evolution of hydrogen from a solution of a weak acid, HA: in this case, the electron transfer from the 
electrode to the proton in solution must be preceded by the acid dissociation reaction taking place in solution. 


A2.4.7.2 THE THEORY OF ELECTRON TRANSFER 

The rate of simple chemical reactions can now be calculated with some confidence either within the 
framework of activated-complex theory or directly from quantum mechanical first principles, and theories that 
might lead to analogous predictions for simple electron transfer reactions at the electrode-electrolyte interface 
have been the subject of much recent investigation. Such theories have hitherto been concerned primarily with 
greatly simplified models for the interaction of an ion in solution with an inert electrode surface. The specific 
adsorption of electroactive species has been excluded and electron transfer is envisaged only as taking place 
between the strongly solvated ion in the outer Helmholtz layer and the metal electrode. The electron transfer 
process itself can only be understood through the formalism of quantum mechanics, since the transfer itself is 
a tunnelling phenomenon that has no simple analogue in classical mechanics. 

Within this framework, by considering the physical situation of the electrode double layer, the free energy of 
activation of an electron transfer reaction can be identified with the reorganization energy of the solvation 
sheath around the ion. This idea will be carried through in detail for the simple case of the strongly solvated
Fe³⁺/Fe²⁺ couple, following the change in the ligand-ion distance as the critical reaction variable during the
transfer process.

In aqueous solution, the oxidation of Fe²⁺ can be conceived as a reaction of two aquo-complexes of the form

$$[\mathrm{Fe(H_2O)_6}]^{2+} \rightleftharpoons [\mathrm{Fe(H_2O)_6}]^{3+} + e^-. \qquad \text{(A2.4.129)}$$

The H₂O molecules of these aquo-complexes constitute the inner solvation shell of the ions, which are, in
turn, surrounded by an external solvation shell of more or less uncoordinated water molecules forming part of
the water continuum, as described in section A2.4.2 above. Owing to the difference in the solvation energies,
the radius of the Fe³⁺ aquo-complex is smaller than that of Fe²⁺, which implies that the mean distance of the
vibrating water molecules at their normal equilibrium point must change during the electron transfer.
Similarly, changes must take place in the outer solvation shell during electron transfer, all of which implies
that the solvation shells themselves inhibit electron transfer. This inhibition by the surrounding solvent
molecules in the inner and outer solvation shells can be characterized by an activation free energy $\Delta G^{\ddagger}$.

Given that the tunnelling process itself requires no activation energy, and that tunnelling will take place at 
some particular configuration of solvent molecules around the ion, the entire activation energy referred to 
above must be associated with ligand/solvent movement. Furthermore, from the Franck-Condon principle, the 
electron tunnelling process will take place on a rapid time scale compared to nuclear motion, so that the ligand 
and solvent molecules will be essentially stationary during the actual process of electron transfer. 

Consider now the aquo-complexes above, and let $x$ be the distance of the centre of mass of the water
molecules constituting the inner solvation shell from the central ion. The binding interaction of these
molecules leads to vibrations of frequency $f = \omega/2\pi$ taking place about an equilibrium point $x_0$ and,
if the harmonic approximation is valid, the potential energy change $U_{\mathrm{pot}}$ associated with the ligand
vibration can be written in parabolic form as

$$U_{\mathrm{pot}} = \frac{M\omega^2}{2}(x - x_0)^2 + B + U_{\mathrm{el}} \qquad \text{(A2.4.130)}$$

where $M$ is the mass of the ligands, $B$ is the binding energy of the ligands and $U_{\mathrm{el}}$ is the electrical energy of
the ion-electrode system. The total energy of the system will also contain the kinetic energy of the ligands,
written in the form $p^2/2M$, where $p$ is the momentum of the molecules during vibrations:

$$U = \frac{p^2}{2M} + \frac{M\omega^2}{2}(x - x_0)^2 + B + U_{\mathrm{el}}. \qquad \text{(A2.4.131)}$$

It is possible to write two such equations for the initial state, $i$ (corresponding to the reduced aquo-complex
[Fe(H₂O)₆]²⁺), and the final state, $f$, corresponding to the oxidized aquo-complex and the electron now present
in the electrode. Clearly

$$U^i = \frac{p^2}{2M} + \frac{M\omega^2}{2}(x - x_0^i)^2 + B^i + U^i_{\mathrm{el}} \qquad \text{(A2.4.132)}$$

with a corresponding equation for state $f$ and with the assumption that the frequency of vibration does not
alter between the initial and final states of the aquo-complex. During electron transfer, the system moves, as
shown in figure A2.4.13 [7], from an equilibrium situation centred at $x_0^i$ along the parabolic curve of the
initial state to the point $x_s$ where electron transfer takes place; following this, the system will move along the
curve of the final state to the new equilibrium situation centred on $x_0^f$.


Figure A2.4.13. Potential energy of a redox system as a function of ligand-metal separation. From [7]. 
The point at which electron transfer takes place clearly corresponds to the condition $U^i_{\mathrm{pot}} = U^f_{\mathrm{pot}}$; equating equations
(A2.4.132) for the states $i$ and $f$ we find that

$$x_s = \frac{B^f + U^f_{\mathrm{el}} - B^i - U^i_{\mathrm{el}} + (M\omega^2/2)\left([x_0^f]^2 - [x_0^i]^2\right)}{M\omega^2\,(x_0^f - x_0^i)}. \qquad \text{(A2.4.133)}$$

The activation energy, $U_{\mathrm{act}}$, is defined as the minimum additional energy above the zero-point energy that is
needed for a system to pass from the initial to the final state in a chemical reaction. In terms of equation
(A2.4.132), the energy of the initial reactants at $x = x_s$ is given by

$$U^i = \frac{p^2}{2M} + \frac{M\omega^2}{2}(x_s - x_0^i)^2 + B^i + U^i_{\mathrm{el}} \qquad \text{(A2.4.134)}$$

where $B^i + U^i_{\mathrm{el}}$ is the zero-point energy of the initial state. The minimum energy required to reach the point $x_s$
is clearly that corresponding to the momentum $p = 0$. By substituting for $x_s$ from equation (A2.4.133), we find

$$U_{\mathrm{act}} = \frac{\left(U_s + U^f_{\mathrm{el}} + B^f - U^i_{\mathrm{el}} - B^i\right)^2}{4U_s} \qquad \text{(A2.4.135)}$$

where $U_s$ has the value $(M\omega^2/2)(x_0^f - x_0^i)^2$. $U_s$ is termed the reorganization energy since it is the additional
energy required to deform the complex from initial to final value of $x$. It is common to find the symbol $\lambda$ for
$U_s$, and model calculations suggest that $U_s$ normally has values in the neighbourhood of 1 eV ($10^5$ J mol$^{-1}$)
for the simplest redox processes.
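
The step from the crossing point (A2.4.133) to the closed-form activation energy (A2.4.135) can be checked numerically. The sketch below is a minimal illustration with hypothetical function names and made-up parameter values (a force constant $M\omega^2$ of 50 eV Å⁻² and equilibrium distances of 2.2 Å and 2.0 Å), chosen only so that $U_s$ comes out at 1 eV; `delta_u` stands for $(B^f + U^f_{\mathrm{el}}) - (B^i + U^i_{\mathrm{el}})$.

```python
def crossing_point(m_omega2, x0_i, x0_f, delta_u):
    """x_s from equation (A2.4.133); delta_u = (B^f + U_el^f) - (B^i + U_el^i)."""
    numerator = delta_u + 0.5 * m_omega2 * (x0_f**2 - x0_i**2)
    return numerator / (m_omega2 * (x0_f - x0_i))


def reorganization_energy(m_omega2, x0_i, x0_f):
    """U_s = (M omega^2 / 2)(x0_f - x0_i)^2."""
    return 0.5 * m_omega2 * (x0_f - x0_i) ** 2


def activation_energy_direct(m_omega2, x0_i, x0_f, delta_u):
    """Energy needed to climb the initial-state parabola from x0_i to x_s (p = 0)."""
    x_s = crossing_point(m_omega2, x0_i, x0_f, delta_u)
    return 0.5 * m_omega2 * (x_s - x0_i) ** 2


def activation_energy_closed_form(u_s, delta_u):
    """Equation (A2.4.135): (U_s + delta_u)^2 / (4 U_s)."""
    return (u_s + delta_u) ** 2 / (4.0 * u_s)


if __name__ == "__main__":
    m_omega2, x0_i, x0_f = 50.0, 2.2, 2.0      # illustrative values only
    u_s = reorganization_energy(m_omega2, x0_i, x0_f)
    for delta_u in (0.0, 0.2, -0.2):
        direct = activation_energy_direct(m_omega2, x0_i, x0_f, delta_u)
        closed = activation_energy_closed_form(u_s, delta_u)
        print(delta_u, direct, closed)
```

For `delta_u = 0` both routes give $U_s/4$, which is the activation energy that reappears later for the exchange current density.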

In our simple model, the expression in A2.4.135 corresponds to the activation energy for a redox process in
which only the interaction between the central ion and the ligands in the primary solvation shell is considered,
and this only in the form of the totally symmetrical vibration. In reality, the rate of the electron transfer
reaction is also influenced by the motion of molecules in the outer solvation shell, as well as by other
vibrational modes of the inner shell. These can be incorporated into the model provided that each type of
motion can be treated within the simple harmonic approximation. The total energy of the system will then
consist of the kinetic energy of all the atoms in motion together with the potential energy arising from each
vibrational degree of freedom. It is no longer possible to picture the motion, as in figure A2.4.13, as a
one-dimensional translation over an energy barrier, since the total energy is a function of a large number of normal
coordinates describing the motion of the entire system. Instead, we have two potential energy surfaces for the
initial and final states of the redox system, whose intersection describes the reaction hypersurface. The
reaction pathway will proceed now via the saddle point, which is the minimum of the total potential energy
subject to the condition $U^i_{\mathrm{pot}} = U^f_{\mathrm{pot}}$, as above.

This is a standard problem [31] and essentially the same result is found as in equation (A2.4.135), save that
$B^i$ and $B^f$ now become the sum over all the binding energies of the central ion in the initial and final states
and $U_s$ is now given by

$$U_s = \sum_j \frac{M_j\omega_j^2}{2}\left(x_{j,0}^f - x_{j,0}^i\right)^2 \qquad \text{(A2.4.136)}$$

where $M_j$ is the effective mass of the $j$th mode and $\omega_j$ is the corresponding frequency; we still retain the
approximation that these frequencies are all the same for the initial and final states.

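With the help of $U_s$, an expression for the rate constant for the reaction

$$[\mathrm{Fe(H_2O)_6}]^{2+} \rightleftharpoons [\mathrm{Fe(H_2O)_6}]^{3+} + e^-_{\mathrm{M}} \qquad \text{(A2.4.137)}$$

can be written

$$k_f = A\exp\left(-\frac{U_{\mathrm{act}}}{k_BT}\right) = A\exp\left[-\frac{\left(U_s + U^f_{\mathrm{el}} + B^f - U^i_{\mathrm{el}} - B^i\right)^2}{4U_sk_BT}\right] \qquad \text{(A2.4.138)}$$

where $A$ is the so-called frequency factor and $e^-_{\mathrm{M}}$ refers to an electron in the metal electrode. The rate constant
for the back reaction is obtained by interchanging the indices $i$ and $f$ in equation (A2.4.138). It will be observed
that under these circumstances $U_s$ remains the same and we obtain

$$k_b = A\exp\left[-\frac{\left(U_s + U^i_{\mathrm{el}} + B^i - U^f_{\mathrm{el}} - B^f\right)^2}{4U_sk_BT}\right]. \qquad \text{(A2.4.139)}$$
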
A2.4.7.3 THE EXCHANGE CURRENT 

It is now possible to derive an expression for the actual current density from A2.4.138 and A2.4.139,
assuming reaction A2.4.137, and, for simplicity, assuming that the concentrations, $c$, of Fe²⁺ and Fe³⁺ are
equal. The potential difference between the electrode and the outer Helmholtz layer, $\Delta\phi$, is incorporated into
the electronic energy of the Fe³⁺ + $e^-_{\mathrm{M}}$ system through a potential-dependent term of the form

$$U^f_{\mathrm{el}} = U^f_{\mathrm{el},0} - e_0\Delta\phi \qquad \text{(A2.4.140)}$$

where the minus sign in A2.4.140 arises through the negative charge on the electron. Inserting this into
A2.4.138 and A2.4.139 and multiplying by concentration and the Faraday to convert from rate constants to
current densities, we have
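$$j^+ = FAc\exp\left[-\frac{\left(U_s + U^f_{\mathrm{el},0} - e_0\Delta\phi + B^f - U^i_{\mathrm{el}} - B^i\right)^2}{4U_sk_BT}\right] \qquad \text{(A2.4.141)}$$

$$j^- = -FAc\exp\left[-\frac{\left(U_s + U^i_{\mathrm{el}} + B^i - U^f_{\mathrm{el},0} + e_0\Delta\phi - B^f\right)^2}{4U_sk_BT}\right] \qquad \text{(A2.4.142)}$$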

where we adopt the convention that positive current involves oxidation. At the rest potential, $\Delta\phi_r$, which is
actually the same as the standard Nernst potential $\Delta\phi^0$ when assuming that the activity coefficients of the ions
are also equal, the rates of these two reactions are equal, which implies that the terms in brackets in the two
equations must also be equal when $\Delta\phi = \Delta\phi^0$. From this it is clear that

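$$e_0\Delta\phi^0 = U^f_{\mathrm{el},0} - U^i_{\mathrm{el}} + B^f - B^i \qquad \text{(A2.4.143)}$$

and if we introduce the overpotential, $\eta = \Delta\phi - \Delta\phi^0$, evidently

$$j^+ = FAc\exp\left[-\frac{(U_s - e_0\eta)^2}{4U_sk_BT}\right] \qquad \text{(A2.4.144)}$$

$$j^- = -FAc\exp\left[-\frac{(U_s + e_0\eta)^2}{4U_sk_BT}\right] \qquad \text{(A2.4.145)}$$

from which we obtain the exchange current density as the current at $\eta = 0$:

$$j_0 = FAc\exp\left(-\frac{U_s}{4k_BT}\right) \qquad \text{(A2.4.146)}$$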


and the activation energy of the exchange current density can be seen to be $U_s/4$. If the overpotential is small,
such that $e_0|\eta| \ll U_s$ (and recalling that $U_s$ lies in the region of about 1 eV), the quadratic form of A2.4.144 and
A2.4.145 can be expanded with neglect of terms in $\eta^2$. Recalling, also, that $e_0/k_B = F/R$, we then finally
obtain

$$j = j^+ + j^- = FAc\exp\left(-\frac{U_s}{4k_BT}\right)\left[\exp\left(\frac{F\eta}{2RT}\right) - \exp\left(-\frac{F\eta}{2RT}\right)\right] \qquad \text{(A2.4.147)}$$

which is the simplest form of the familiar Butler-Volmer equation with a symmetry factor $\beta = 1/2$. This result
arises from the strongly simplified molecular model that we have used above and, in particular, the
assumption that the values of $\omega_j$ are the same for all normal modes. Relaxation of this assumption leads to a
more general equation:
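$$j = j^+ + j^- = FAc\exp\left(-\frac{U_s}{4k_BT}\right)\left[\exp\left(\frac{\beta F\eta}{RT}\right) - \exp\left(-\frac{(1-\beta)F\eta}{RT}\right)\right] \qquad \text{(A2.4.148)}$$

$$\phantom{j} = j_0\left[\exp\left(\frac{\beta F\eta}{RT}\right) - \exp\left(-\frac{(1-\beta)F\eta}{RT}\right)\right] \qquad \text{(A2.4.149)}$$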
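
To see how good the small-overpotential expansion is, the quadratic expressions (A2.4.144) and (A2.4.145) can be compared with the Butler-Volmer form. The sketch below is illustrative only: the function names are hypothetical, currents are returned in units of $FAc$, $U_s$ is taken as 1 eV, and the form with a general symmetry factor `beta` is the standard textbook Butler-Volmer expression, which reduces to (A2.4.147) for `beta = 0.5`.

```python
import math

K_B_EV = 8.617333262e-5   # Boltzmann constant in eV K^-1


def quadratic_current(eta, u_s=1.0, temperature=298.15):
    """j / (F A c) from (A2.4.144) + (A2.4.145); eta in volts, u_s in eV.

    With eta in volts, e0 * eta is numerically equal to eta in eV.
    """
    kt = K_B_EV * temperature
    anodic = math.exp(-(u_s - eta) ** 2 / (4.0 * u_s * kt))
    cathodic = math.exp(-(u_s + eta) ** 2 / (4.0 * u_s * kt))
    return anodic - cathodic


def butler_volmer_current(eta, u_s=1.0, temperature=298.15, beta=0.5):
    """j / (F A c) from the Butler-Volmer form; beta = 0.5 gives (A2.4.147)."""
    kt = K_B_EV * temperature
    j0 = math.exp(-u_s / (4.0 * kt))          # exchange current, (A2.4.146)
    return j0 * (math.exp(beta * eta / kt) - math.exp(-(1.0 - beta) * eta / kt))


if __name__ == "__main__":
    for eta in (0.01, 0.05, 0.2):
        ratio = quadratic_current(eta) / butler_volmer_current(eta)
        print(eta, ratio)
```

For `beta = 0.5` the two expressions differ by exactly the neglected factor $\exp[-(e_0\eta)^2/(4U_sk_BT)]$, so the ratio printed above stays close to one only while $e_0|\eta| \ll U_s$.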


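For a more general reaction of the form Ox + $ne^-$ ⇌ Red, with differing concentrations of Ox and Red, the
exchange current density is given by

$$j_0 = nFA\,c_{\mathrm{Ox}}^{(1-\beta)}\,c_{\mathrm{Red}}^{\beta}\exp\left(-\frac{U_s}{4k_BT}\right). \qquad \text{(A2.4.150)}$$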


Some values for $j_0$ and $\beta$ for electrochemical reactions of importance are given in table A2.4.6, and it can be
seen that the exchange currents can be extremely dependent on the electrode material, particularly for more 
complex processes such as hydrogen oxidation. Many modern electrochemical studies are concerned with 
understanding the origin of these differences in electrode performance. 

Table A2.4.6.

System                          Electrolyte       Temperature (°C)   Electrode   j₀ (A cm⁻²)    β
Fe³⁺/Fe²⁺ (0.005 M)             1 M H₂SO₄         25                 Pt          2 × 10⁻³       0.58
K₃Fe(CN)₆/K₄Fe(CN)₆ (0.02 M)    0.5 M K₂SO₄       25                 Pt          5 × 10⁻²       0.49
Ag/10⁻³ M Ag⁺                   1 M HClO₄         25                 Ag          1.5 × 10⁻¹     0.65
Cd/10⁻² M Cd²⁺                  0.4 M K₂SO₄       25                 Cd          1.5 × 10⁻³     0.55
Cd(Hg)/1.4 × 10⁻³ M Cd²⁺        0.5 M Na₂SO₄      25                 Cd(Hg)      2.5 × 10⁻²     0.8
Zn(Hg)/2 × 10⁻² M Zn²⁺          1 M HClO₄                            Zn(Hg)      5.5 × 10⁻³     0.75
Ti⁴⁺/Ti³⁺ (10⁻³ M)              1 M acetic acid   25                 Pt          9 × 10⁻⁴       0.55
H₂/OH⁻                          1 M KOH           25                 Pt          10⁻³           0.5
H₂/H⁺                           1 M H₂SO₄         25                 Hg          10⁻¹²          0.5
H₂/H⁺                           1 M H₂SO₄         25                 Pt          10⁻³           0.5
O₂/H⁺                           1 M H₂SO₄         25                 Pt          10⁻⁶           0.25
O₂/OH⁻                          1 M KOH           25                 Pt          10⁻⁶           0.3

A2.5 Phase transitions and critical phenomena


Robert L Scott 


A2.5.1 ONE-COMPONENT FIRST-ORDER TRANSITIONS 

The thermodynamic treatment of simple phase transitions is straightforward and is discussed in A2.1.6 and 
therefore need not be repeated here. In a one-component two-phase system, the phase rule yields one degree 
of freedom, so the transition between the two phases can be maintained along a pressure-temperature line. 
Figure A2.5.1 shows a typical p,T diagram with lines for fusion (solid-liquid), sublimation (solid-gas), and 
vaporization (liquid-gas) meeting at a triple point (solid-liquid-gas). Each of these lines can, at least in 
principle, be extended as a metastable line (shown as a dashed line) beyond the triple point. (Supercooling of 
gases below the condensation point, supercooling of liquids below the freezing point and superheating of 
liquids above the boiling point are well known; superheating of solids above the melting point is more 
problematic.) The vaporization line (i.e. the vapour pressure curve) ends at a critical point, with a unique 
pressure, temperature, and density, features that will be discussed in detail in subsequent sections. Because 
this line ends it is possible for a system to go around it and move continuously from gas to liquid without a 
phase transition; above the critical temperature the phase should probably just be called a 'fluid'. 



Figure A2.5.1. Schematic phase diagram (pressure $p$ versus temperature $T$) for a typical one-component
substance. The full lines mark the transitions from one phase to another (g, gas; $\ell$, liquid; s, solid). The liquid-gas
line (the vapour pressure curve) ends at a critical point (c). The dotted line is a constant pressure line. The
dashed lines represent metastable extensions of the stable phases. 

Figure A2.5.2 shows schematically the behaviour of several thermodynamic functions along a constant-pressure
line (shown as a dotted line in figure A2.5.1): the molar Gibbs free energy $\bar{G}$ (for a one-component
system the same as the chemical potential $\mu$), the molar enthalpy $\bar{H}$ and the molar heat capacity at constant
pressure $C_p$. Again, at least in principle, each of the phases can be extended into a metastable region beyond
the equilibrium transition.
Figure A2.5.2. Schematic representation of the behaviour of several thermodynamic functions as a function
of temperature $T$ at constant pressure for the one-component substance shown in figure A2.5.1. (The
constant-pressure path is shown as a dotted line in figure A2.5.1.) (a) The molar Gibbs free energy $\bar{G}$, (b) the molar
enthalpy $\bar{H}$, and (c) the molar heat capacity at constant pressure $C_p$. The functions shown are dimensionless
($R$ is the gas constant per mole, while K is the temperature unit kelvin). The dashed lines represent metastable
extensions of the stable phases beyond the transition temperatures.


It will be noted that the free energy (figure A2.5.2(a)) is necessarily continuous through the phase transitions,
although its first temperature derivative (the negative of the entropy $S$) is not. The enthalpy $H = G + TS$
(shown in figure A2.5.2(b)) is similarly discontinuous; the vertical discontinuities are of course the enthalpies
of transition. The graph for the molar heat capacity $C_p$ (figure A2.5.2(c)) looks superficially like that for the
enthalpy, but represents something quite different at the transition. The vertical line with an arrow at a
transition temperature is a mathematical delta function, representing an ordinate that is infinite and an abscissa
($\Delta T$) that is zero, but whose product is nonzero, an 'area' that is equal to the molar enthalpy of transition $\Delta\bar{H}$.


Phase transitions at which the entropy and enthalpy are discontinuous are called 'first-order transitions' 
because it is the first derivatives of the free energy that are discontinuous. (The molar volume $V = (\partial\bar{G}/\partial p)_T$
is also discontinuous.) Phase transitions at which these derivatives are continuous but second derivatives of $G$
are discontinuous (e.g. the heat capacity, the isothermal compressibility, the thermal expansivity etc) are
called 'second order'.

The initial classification of phase transitions made by Ehrenfest (1933) was extended and clarified by Pippard 
[1], who illustrated the distinctions with schematic heat capacity curves. Pippard distinguished different kinds
of second- and third-order transitions and examples of some of his second-order transitions will appear in 
subsequent sections; some of his types are unknown experimentally. Theoretical models exist for third-order 
transitions, but whether these have ever been found is unclear. 


A2.5.2 PHASE TRANSITIONS IN TWO-COMPONENT SYSTEMS 

Phase transitions in binary systems, normally measured at constant pressure and composition, usually do not 
take place entirely at a single temperature, but rather extend over a finite but nonzero temperature range. 
Figure A2.5.3 shows a temperature-mole fraction (T, x) phase diagram for one of the simplest of such 
examples, vaporization of an ideal liquid mixture to an ideal gas mixture, all at a fixed pressure (e.g. 1 atm).
Because there is an additional composition variable, the sample path shown in the figure is not only at 
constant pressure, but also at a constant total mole fraction, here chosen to be x = 1/2. 

As the temperature of the liquid phase is increased, the system ultimately reaches a phase boundary, the 
'bubble point' at which the gas phase (vapour) begins to appear, with the composition shown at the left end of 
the horizontal two-phase 'tie-line'. As the temperature rises more gas appears and the relative amounts of the 
two phases are determined by applying a lever-arm principle to the tie-line: the ratio of the fraction $f_g$ of
molecules in the gas phase to that $f_\ell$ in the liquid phase is given by the inverse of the ratio of the distances
from the phase boundary to the position of the overall mole fraction $x_0$ of the system,

$$f_g/f_\ell = (x_\ell - x_0)/(x_0 - x_g).$$
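
The lever-arm principle is easy to apply in code. The following sketch uses a hypothetical helper, `phase_fractions`, which returns the fractions of molecules in the gas and liquid phases for an overall mole fraction lying on a tie-line whose ends are at the gas and liquid compositions; the numerical compositions in the usage lines are made up for illustration.

```python
def phase_fractions(x_overall, x_gas, x_liquid):
    """Return (f_gas, f_liquid) from the lever rule on a two-phase tie-line.

    The ratio f_gas / f_liquid equals (x_liquid - x_overall) / (x_overall - x_gas),
    and the two fractions sum to one.
    """
    span = x_liquid - x_gas
    if span == 0:
        raise ValueError("tie-line has zero length: the two phases are identical")
    low, high = sorted((x_gas, x_liquid))
    if not low <= x_overall <= high:
        raise ValueError("overall composition lies outside the two-phase region")
    f_gas = (x_liquid - x_overall) / span
    return f_gas, 1.0 - f_gas


if __name__ == "__main__":
    # Illustrative tie-line ends; the overall mole fraction is 1/2 as in the text.
    print(phase_fractions(0.5, 0.3, 0.7))
    print(phase_fractions(0.5, 0.2, 0.5))   # bubble point: liquid still has x = 1/2
```

At the bubble point the liquid composition coincides with the overall composition and the gas fraction is zero; at the dew point the roles are reversed and the gas fraction is one.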
Figure A2.5.3. Typical liquid-gas phase diagram (temperature $T$ versus mole fraction $x$ at constant pressure)
for a two-component system in which both the liquid and the gas are ideal mixtures. Note the extent of the
two-phase liquid-gas region. The dashed vertical line is the direction ($x = 1/2$) along which the functions in
figure A2.5.4 are determined.


With a further increase in the temperature the gas composition moves to the right until it reaches x = 1/2 at the 
phase boundary, at which point all the liquid is gone. (This is called the 'dew point' because, when the gas is 
cooled, this is the first point at which drops of liquid appear.) An important feature of this behaviour is that 
the transition from liquid to gas occurs gradually over a nonzero range of temperature, unlike the situation 
shown for a one-component system in figure A2.5.1 . Thus the two-phase region is bounded by a dew-point 
curve and a bubble-point curve. 

Figure A2.5.4 shows for this two-component system the same thermodynamic functions as in figure A2.5.2,
the molar Gibbs free energy $\bar{G} = x_1\mu_1 + x_2\mu_2$, the molar enthalpy $\bar{H}$ and the molar heat capacity $C_p$, again all at
constant pressure, but now also at constant composition, $x = 1/2$. Now the enthalpy is continuous because the
vaporization extends over an appreciable temperature range. Moreover, the heat capacity, while discontinuous 
at the beginning and at the end of the transition, is not a delta function. Indeed the graph appears to satisfy the 
definition of a second-order transition (or rather two, since there are two discontinuities). 
Figure A2.5.4. Thermodynamic functions $\bar{G}$, $\bar{H}$ and $C_p$ as a function of temperature $T$ at constant pressure
and composition ($x = 1/2$) for the two-component system shown in figure A2.5.3. Note the difference between
these and those shown for the one-component system shown in figure A2.5.2. The functions shown are
dimensionless as in figure A2.5.2. The dashed lines represent metastable extensions (superheating or
supercooling) of the one-phase systems.


However, this behaviour is not restricted to mixtures; it is also found in a one-component fluid system
observed along a constant-volume path rather than the constant-pressure path illustrated in figure A2.5.2.
Clearly it would be confusing to classify the same transition as first- or second-order depending on the path.
Pippard described such one-component constant-volume behaviour (discussed by Gorter) as a 'simulated
second-order transition' and elected to restrict the term 'second-order' to a path along which two phases 
became more and more nearly alike until at the transition they became identical. As we shall see, that is what 
is seen when the system is observed along a path through a critical point. Further clarification of this point 
will be found in subsequent sections. 

It is important to note that, in this example, as in 'real' second-order transitions, the curves for the two-phase 
region cannot be extended beyond the transition; to do so would imply that one had more than 100% of one 
phase and less than 0% of the other phase. Indeed it seems to be a quite general feature of all known second- 
order transitions (although it does not seem to be a thermodynamic requirement) that some aspect of the 
system changes gradually until it becomes complete at the transition point. 

Three other examples of liquid-gas phase diagrams for a two-component system are illustrated in figure
A2.5.5, all a result of deviations from ideal behaviour. Such deviations in the liquid mixture can sometimes
produce azeotropic behaviour, in which there are maximum or minimum boiling mixtures (shown in figure
A2.5.5(a) and figure A2.5.5(b)). Except at the azeotropic composition (that of the maximum or minimum), a
constant-composition path through the vaporization yields the same kind of qualitative behaviour shown in
figure A2.5.4. Behaviour like that shown in figure A2.5.2 is found only on the special path through the
maximum or minimum, where the entire vaporization process occurs at a unique temperature. 
Figure A2.5.5. Phase diagrams for two-component systems with deviations from ideal behaviour
(temperature T versus mole fraction x at constant pressure). Liquid-gas phase diagrams with maximum (a) 
and minimum (b) boiling mixtures (azeotropes). (c) Liquid-liquid phase separation, with a coexistence curve 
and a critical point. 


A third kind of phase diagram in a two-component system (as shown in figure A2.5.5(c)) is one showing
liquid-liquid phase separation below a critical-solution point, again at a fixed pressure. (On a T, x diagram, 
the critical point is always an extremum of the two-phase coexistence curve, but not always a maximum. 
Some binary systems show a minimum at a lower critical-solution temperature; a few systems show closed- 
loop two-phase regions with a maximum and a minimum.) As the temperature is increased at any composition 
other than the critical composition x = x c , the compositions of the two coexisting phases adjust themselves to 
keep the total mole fraction unchanged until the coexistence curve is reached, above which only one phase
persists. Again, the behaviour of the thermodynamic functions agrees qualitatively with that shown in figure
A2.5.4 except that there is now only one transition line, not two. However, along any special path leading
through the critical point, there are special features in the
thermodynamic functions that will be discussed in subsequent sections. First, however, we return to the one- 
component fluid to consider the features of its critical point. 


A2.5.3 ANALYTIC TREATMENT OF CRITICAL PHENOMENA IN FLUID 
SYSTEMS. THE VAN DER WAALS EQUATION 

All simple critical phenomena have similar characteristics, although all the analogies were not always 
recognized in the beginning. The liquid-vapour transition, the separation of a binary mixture into two phases, 
the order-disorder transition in binary alloys, and the transition from ferromagnetism to paramagnetism all 
show striking common features. At a low temperature one has a highly ordered situation (separation into two 
phases, organization into a superlattice, highly ordered magnetic domains, etc). At a sufficiently high 
temperature all long-range order is lost, and for all such cases one can construct a phase diagram (not always 
recognized as such) in which the long-range order is lost gradually until it vanishes at a critical point. 

A2.5.3.1 THE VAN DER WAALS FLUID 

Although later models for other kinds of systems are symmetrical and thus easier to deal with, the first 
analytic treatment of critical phenomena is that of van der Waals (1873) for coexisting liquid and gas [2]. The 
familiar van der Waals equation gives the pressure p as a function of temperature T and molar volume V, 

$$p = \frac{RT}{V - b} - \frac{a}{V^2} \qquad \text{(A2.5.1)}$$

where R is the gas constant per mole and a and b are constants characteristic of the particular fluid. The 
constant a is a measure of the strength of molecular attraction, while b is the volume excluded by a mole of 
molecules considered as hard spheres. 

Figure A2.5.6 shows a series of typical p, V isotherms calculated using equation (A2.5.1). (The temperature, 
pressure and volume are in reduced units to be explained below.) At sufficiently high temperatures the 
pressure decreases monotonically with increasing volume, but below a critical temperature the isotherm 
shows a maximum and a minimum. 



Figure A2.5.6. Constant temperature isotherms of reduced pressure $p_r$ versus reduced volume $V_r$ for a van der 
Waals fluid. Full curves (including the horizontal two-phase tie-lines) represent stable situations. The dashed 
parts of the smooth curve are metastable extensions. The dotted curves are unstable regions. 

The coexistence lines are determined from the requirement that the potentials (which will subsequently be 
called 'fields'), i.e. the pressure, the temperature, and the chemical potential $\mu$, be the same in the conjugate 
(coexisting) phases, liquid and gas, at opposite ends of the tie-line. The equality of chemical potentials is 
equivalent to the requirement — for these variables (p, V), not necessarily for other choices — that the two areas 
between the horizontal coexistence line and the smooth cubic curve be equal. The full two-phase line is the 
stable situation, but the continuation of the smooth curve represents metastable liquid or gas phases (shown as 
a dashed curve); these are sometimes accessible experimentally. The part of the curve with a positive slope 
(shown as a dotted curve) represents a completely unstable situation, never realizable in practice because a 
negative compressibility implies instant separation into two phases. In analytic treatments like this of van der 
Waals, the maxima and minima in the isotherms (i.e. the boundary between metastable and unstable regions) 
define a 'spinodal' curve; in principle this line distinguishes between nucleation with an activation energy and 
'spinodal decomposition' (see A3.3). It should be noted that, if the free energy is nonanalytic, as we shall find 
necessary in subsequent sections, the concept of the spinodal becomes unclear and can only be maintained by 
invoking special internal constraints. However, the usefulness of the distinction between activated nucleation 
and spinodal decomposition can be preserved. 

With increasing temperature the two-phase tie-line becomes shorter and shorter until the two conjugate phases 
become identical at a critical point where the maximum and minimum in the isotherm have coalesced. Thus 
the critical point is defined as that point at which $(\partial p/\partial V)_T$ and $(\partial^2 p/\partial V^2)_T$ are simultaneously zero, or where 
the equivalent quantities $(\partial^2 A/\partial V^2)_T$ and $(\partial^3 A/\partial V^3)_T$ are simultaneously zero. These requirements yield the 
critical constants in terms of the constants R, a and b, 

$$p_c = \frac{a}{27b^2}, \qquad V_c = 3b, \qquad T_c = \frac{8a}{27Rb}.$$

Equation (A2.5.1) can then be rewritten in terms of the reduced quantities $p_r = p/p_c$, $T_r = T/T_c$ and $V_r = V/V_c$: 

$$p_r = \frac{8T_r}{3V_r - 1} - \frac{3}{V_r^2}. \qquad \text{(A2.5.2)}$$


It is this equation with the reduced quantities that appears in figure A2.5.6 . 
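
The critical constants and the reduced form can be checked numerically. The sketch below is only an illustration: the function names and the values chosen for a and b are arbitrary assumptions, not taken from the text. It evaluates equation (A2.5.1) and its first two volume derivatives at the point $(T_c, V_c)$ and compares the result with equation (A2.5.2).

```python
R = 8.314462618  # molar gas constant in J mol^-1 K^-1


def vdw_pressure(T, V, a, b):
    """Equation (A2.5.1): p = RT/(V - b) - a/V^2."""
    return R * T / (V - b) - a / V**2


def vdw_dp_dV(T, V, a, b):
    """First volume derivative of (A2.5.1) at constant T."""
    return -R * T / (V - b) ** 2 + 2.0 * a / V**3


def vdw_d2p_dV2(T, V, a, b):
    """Second volume derivative of (A2.5.1) at constant T."""
    return 2.0 * R * T / (V - b) ** 3 - 6.0 * a / V**4


def critical_constants(a, b):
    """Return (p_c, V_c, T_c) in terms of a, b and R."""
    return a / (27.0 * b**2), 3.0 * b, 8.0 * a / (27.0 * R * b)


def reduced_pressure(T_r, V_r):
    """Equation (A2.5.2): p_r = 8 T_r/(3 V_r - 1) - 3/V_r^2."""
    return 8.0 * T_r / (3.0 * V_r - 1.0) - 3.0 / V_r**2


if __name__ == "__main__":
    a, b = 0.364, 4.27e-5  # arbitrary illustrative values (SI units)
    p_c, V_c, T_c = critical_constants(a, b)
    print("p_c, V_c, T_c =", p_c, V_c, T_c)
    print("dp/dV at critical point:", vdw_dp_dV(T_c, V_c, a, b))
    print("d2p/dV2 at critical point:", vdw_d2p_dV2(T_c, V_c, a, b))
    print("Z_c = p_c V_c / (R T_c) =", p_c * V_c / (R * T_c))
```

Both derivatives vanish at the critical point to within rounding error, and the critical compressibility factor $p_c V_c/(RT_c)$ comes out as 3/8 whatever values of a and b are used.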

Since the pressure $p = -(\partial A/\partial V)_T$, integration of equation (A2.5.1) yields A(T, V): 

$$A(T, V) - A^\circ(T, V^\circ) = -RT \ln[(V - b)/V^\circ] - a/V$$

where $A^\circ(T, V^\circ)$ is the molar free energy of the ideal gas at T and $V^\circ$. (It is interesting to note that the van der 
Waals equation involves a simple separation of the free energy into entropy and energy; the first term on the 
right is just $-T(S - S^\circ)$, while the second is just $U - U^\circ$.) 

The phase separation shown in figure A2.5.6 can also be illustrated by the entirely equivalent procedure of 
plotting the molar Helmholtz free energy A(T, V) as a function of the molar volume V for a series of constant 
temperatures, shown in figure A2.5.7. At constant temperature and volume, thermodynamic equilibrium 
requires that the Helmholtz free energy must be minimized. It is evident for temperatures below the critical 
point that for certain values of the molar volume the molar free energy A(T, V) can be lowered by separation 
into two phases. The exact position of the phase separation is found by finding a straight line that is 
simultaneously tangent to the curve at two points; the slope at any point of the curve is $(\partial A/\partial V)_T = -p$, so 
the pressures are equal at the two tangent points. Similarly the chemical potential $\mu = A + pV$ is the same at 
the two points. That the dashed and dotted parts of the curve are metastable or unstable is clear because they 
represent higher values of A than the corresponding points on the two-phase line. (The metastable region is 
separated from the completely unstable region by a point of inflection on the curve.) 

The problem with figure A2.5.6 and figure A2.5.7 is that, because it extends to infinity, volume is not a 
convenient variable for a graph. A more useful variable is the molar density $\rho = 1/V$ or the reduced density 
$\rho_r = 1/V_r$, which have finite ranges, and the familiar van der Waals equation can be transformed into an 
alternative although relatively unfamiliar form by choosing as independent variables the chemical potential $\mu$ 
and the density $\rho$. 
Figure A2.5.7. Constant temperature isotherms of reduced Helmholtz free energy $A_r$ versus reduced volume 
$V_r$. The two-phase region is defined by the line simultaneously tangent to two points on the curve. The dashed 
parts of the smooth curve are metastable one-phase extensions while the dotted curves are unstable regions. 
(The isotherms are calculated for an unphysical ^=0.1, the only effect of which is to separate the isotherms 
better.) 

Unlike the pressure, where p = 0 has physical meaning, the zero of free energy is arbitrary, so, instead of the 
ideal gas volume, we can use as a reference the molar volume of the real fluid at its critical point. A reduced 
Helmholtz free energy $A_r$ in terms of the reduced variables $T_r$ and $V_r$ can be obtained by replacing a and b by 
their values in terms of the critical constants 

$$A_r = \frac{A(T, V) - A(T, V_c)}{p_c V_c} = -\frac{8}{3} T_r \ln\left[\frac{3V_r - 1}{2}\right] - \frac{3(1 - V_r)}{V_r}.$$


Then, since the chemical potential for a one-component system is just $\mu = G = A + pV$, a reduced chemical 
potential can be written in terms of a reduced density $\rho_r = \rho/\rho_c = V_c/V$: 

$$\mu_r = [\mu(T, \rho) - \mu(T, \rho_c)](\rho_c/p_c) = -\frac{8}{3} T_r \left\{ \ln\left[\frac{3 - \rho_r}{2\rho_r}\right] - \frac{3}{2}\,\frac{\rho_r - 1}{3 - \rho_r} \right\} - 6(\rho_r - 1). \qquad \text{(A2.5.3)}$$

Equation (A2.5.3) is a $\mu_r$, $\rho_r$ equation of state, an alternative to the $p_r$, $V_r$ equation (A2.5.2). 

The van der Waals $\mu_r$, $\rho_r$ isotherms, calculated using equation (A2.5.3), are shown in figure A2.5.8. It is 
immediately obvious that these are much more nearly antisymmetric around the critical point than are the 
corresponding $p_r$, $V_r$ isotherms in figure A2.5.6 (of course, this is mainly due to the finite range of $\rho_r$ from 0 to 
3). The symmetry is not exact, however, as a careful examination of the figure will show. This choice of 
variables also satisfies the equal-area condition for coexistent phases; here the horizontal tie-line makes the 
chemical potentials equal and the equal-area construction makes the pressures equal. 
Figure A2.5.8. Constant temperature isotherms of reduced chemical potential $\mu_r$ versus reduced density $\rho_r$ for 
a van der Waals fluid. Full curves (including the horizontal two-phase tie-lines) represent stable situations. 
The dashed parts of the smooth curve are metastable extensions, while the dotted curves are unstable regions. 


For a system in which the total volume remains constant, the same minimization condition that applies to 
A also applies to $A/V = A\rho$, or to $(A\rho)_r$, a quantity that can easily be expressed in terms of $T_r$ and $\rho_r$, 

$$(A\rho)_r = -\frac{8}{3} T_r \rho_r \ln\left[\frac{3 - \rho_r}{2\rho_r}\right] - 3\rho_r(\rho_r - 1) - (4T_r - 3)(\rho_r - 1).$$

It is evident that, for the system shown in figure A2.5.9, $A/V = A\rho$ or $(A\rho)_r$ can be minimized for certain 
values of $\rho$ on the low-temperature isotherms if the system separates into two phases rather than remaining as 
a single phase. As in figure A2.5.7 the exact position of the phase separation is found by finding a straight line 
that is simultaneously tangent to the curve at two points; the slope at any point on the curve is the chemical 
potential $\mu$, as is easily established by differentiating $A\rho$ with respect to $\rho$, 

$$[\partial(A\rho)/\partial\rho]_T = A + \rho(\partial A/\partial\rho)_T = A - V(\partial A/\partial V)_T = A + pV = \mu.$$

(The last term in the equation for $(A\rho)_r$ has been added to avoid adding a constant to $\mu_r$; doing so does not 
affect the principle, but makes figure A2.5.9 clearer.) Thus the common tangent satisfies the condition of 
equal chemical potentials, $\mu_l = \mu_g$. (The common tangent also satisfies the condition that $p_l = p_g$ because 
$p = \mu\rho - A\rho$.) The two points of tangency determine the densities of liquid and gas, $\rho_l$ and $\rho_g$, and the relative 
volumes of the two phases are determined by the lever-arm rule. 
Figure A2.5.9. $(A\rho)_r$, the Helmholtz free energy per unit volume in reduced units, of a van der Waals fluid as 
a function of the reduced density $\rho_r$ for several constant temperatures above and below the critical 
temperature. As in the previous figures the full curves (including the tangent two-phase tie-lines) represent 
stable situations, the dashed parts of the smooth curve are metastable extensions, and the dotted curves are 
unstable regions. See text for details. 

The $T_r$, $\rho_r$ coexistence curve can be calculated numerically to any desired precision and is shown in figure 
A2.5.10. The spinodal curve (shown dotted) satisfies the equation 

$$(\partial^2 A_r/\partial V_r^2)_{T_r} = -(\partial p_r/\partial V_r)_{T_r} = 6\rho_r^2\left[4T_r/(3 - \rho_r)^2 - \rho_r\right] = 0.$$

Alternatively, expansion of equation (A2.5.1), equation (A2.5.2) or equation (A2.5.3) into Taylor series leads 
ultimately to series expressions for the densities of liquid and gas, $\rho_l$ and $\rho_g$, in terms of their sum (called the 
'diameter') and their difference: 

$$(\rho_l + \rho_g)/(2\rho_c) = 1 + (2/5)(1 - T_r) + (128/875)(1 - T_r)^2 + \cdots \qquad \text{(A2.5.4)}$$

$$(\rho_l - \rho_g)/(2\rho_c) = 2(1 - T_r)^{1/2} - (13/25)(1 - T_r)^{3/2} + \cdots$$

or

$$[(\rho_l - \rho_g)/(2\rho_c)]^2 = 4(1 - T_r) - (52/25)(1 - T_r)^2 + \cdots \qquad \text{(A2.5.5)}$$

Note that equation (A2.5.5), like equation (A2.5.4), is just a power series in $(1 - T_r) = (T_c - T)/T_c$, a variable 
that will appear often and will henceforth be represented by t. All simple equations of state (e.g. the Dieterici 
and Berthelot equations) yield equations of the same form as equation (A2.5.4) and equation (A2.5.5); only 
the coefficients differ. There are better expressions for the contribution of the hard-sphere fluid to the pressure 
than the van der Waals $RT/(V - b)$, but the results are similar. Indeed it can be shown that any analytic 
equation of state, however complex, must necessarily yield similar power series. 
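
The numerical calculation of the coexistence curve mentioned above can be sketched in a few lines. The example assumes the reduced pressure of equation (A2.5.2) rewritten with $\rho_r = 1/V_r$, namely $p_r = 8T_r\rho_r/(3 - \rho_r) - 3\rho_r^2$, and a reduced chemical potential whose density derivative is $(1/\rho_r)(\partial p_r/\partial\rho_r)$. The function names and the use of a two-variable Newton iteration are illustrative choices, and the starting guess (taken from the leading terms of the series) is only expected to be adequate fairly close to the critical point.

```python
import math


def p_r(rho, T_r):
    """Reduced van der Waals pressure as a function of reduced density."""
    return 8.0 * T_r * rho / (3.0 - rho) - 3.0 * rho**2


def dp_r(rho, T_r):
    """Derivative of p_r with respect to reduced density at constant T_r."""
    return 24.0 * T_r / (3.0 - rho) ** 2 - 6.0 * rho


def mu_r(rho, T_r):
    """Reduced chemical potential, zero at rho = 1 for every T_r."""
    return (-(8.0 / 3.0) * T_r * math.log((3.0 - rho) / (2.0 * rho))
            + 4.0 * T_r * (rho - 1.0) / (3.0 - rho)
            - 6.0 * (rho - 1.0))


def coexistence(T_r, tol=1e-12, max_iter=50):
    """Return (rho_gas, rho_liquid) with equal p_r and equal mu_r.

    Newton iteration in two unknowns; intended for T_r somewhat below 1.
    """
    if not 0.0 < T_r < 1.0:
        raise ValueError("T_r must lie between 0 and 1")
    t = 1.0 - T_r
    s = math.sqrt(t)
    # Starting guess from the leading terms of the series expansions.
    rho_l = 1.0 + 2.0 * s + 0.4 * t - (13.0 / 25.0) * t * s
    rho_g = 1.0 - 2.0 * s + 0.4 * t + (13.0 / 25.0) * t * s
    for _ in range(max_iter):
        f1 = p_r(rho_l, T_r) - p_r(rho_g, T_r)
        f2 = mu_r(rho_l, T_r) - mu_r(rho_g, T_r)
        # Jacobian rows: d(f1)/d(rho_l, rho_g) and d(f2)/d(rho_l, rho_g),
        # using d(mu_r)/d(rho) = (1/rho) d(p_r)/d(rho).
        a, b = dp_r(rho_l, T_r), -dp_r(rho_g, T_r)
        c, d = a / rho_l, b / rho_g
        det = a * d - b * c
        step_l = (-f1 * d + f2 * b) / det
        step_g = (-a * f2 + c * f1) / det
        rho_l += step_l
        rho_g += step_g
        if abs(step_l) + abs(step_g) < tol:
            return rho_g, rho_l
    raise RuntimeError("Newton iteration did not converge")


if __name__ == "__main__":
    for T_r in (0.99, 0.95, 0.90):
        rho_g, rho_l = coexistence(T_r)
        t = 1.0 - T_r
        print(T_r, rho_g, rho_l,
              "diameter:", 0.5 * (rho_l + rho_g),
              "series:", 1.0 + 0.4 * t + (128.0 / 875.0) * t**2)
```

Close to the critical point the computed diameter and half-difference agree with the truncated series of equation (A2.5.4) and equation (A2.5.5); the agreement degrades as t grows because the neglected higher-order terms become significant.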
Figure A2.5.10. Phase diagram for the van der Waals fluid, shown as reduced temperature $T_r$ versus reduced 
density $\rho_r$. The region under the smooth coexistence curve is a two-phase liquid-gas region as indicated by 
the horizontal tie-lines. The critical point at the top of the curve has the coordinates (1,1). The dashed line is 
the diameter, and the dotted curve is the spinodal curve. 

If the small terms in $t^2$ and higher are ignored, equation (A2.5.4) is the 'law of the rectilinear diameter' as 
evidenced by the straight line that extends to the critical point in figure A2.5.10; this prediction is in good 
qualitative agreement with most experiments. However, equation (A2.5.5), which predicts a parabolic shape 
for the top of the coexistence curve, is unsatisfactory as we shall see in subsequent sections. 

The van der Waals energy $U = -a/V = -a\rho$. On a path at constant total critical density $\rho = \rho_c$, which is a 
constant-volume path, the energy of the system will be the sum of contributions from the two conjugate 
phases, the densities and amounts of which are changing with temperature. With proper attention to the 
amounts of material in the two phases that maintain the constant volume, this energy can be written relative to 
the one-phase energy $U_c$ at the critical point, 

$$\frac{U - U_c}{a\rho_c} = \frac{U - U_c}{9RT_c/8} = -\frac{\rho_l + \rho_g}{\rho_c} + \frac{\rho_l\rho_g}{\rho_c^2} + 1 = -\frac{\rho_l + \rho_g}{\rho_c} + \frac{(\rho_l + \rho_g)^2}{(2\rho_c)^2} - \frac{(\rho_l - \rho_g)^2}{(2\rho_c)^2} + 1$$

or, substituting from equation (A2.5.4) and equation (A2.5.5), 

$$(U - U_c)/(RT_c) = (9/2)[-t + (14/25)t^2 + \cdots].$$

Differentiating this with respect to T (with $dT = -T_c\,dt$) yields a heat capacity at constant volume, 

$$C_V = (\partial U/\partial T)_V = (9/2)R[1 - (28/25)t + \cdots]. \qquad \text{(A2.5.6)}$$
This is of course an excess heat capacity, an amount in addition to the contributions of the heat capacities $C_V$ 
of the liquid and vapour separately. Note that this excess (as well as the total heat capacity shown later in 
figure A2.5.26) is always finite, and increases only linearly with temperature in the vicinity of the critical 
point; at the critical point there is no further change in the excess energy, so this part of the heat capacity 
drops to zero. This behaviour looks very similar to that of the simple binary system in section (A2.5.2). 
However, unlike that system, in which there is no critical point, the experimental heat capacity $C_V$ along the 
critical isochore (constant volume) appears to diverge at the critical temperature, contrary to the prediction of 
equation (A2.5.6). 
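
The substitution that leads to the energy series and to equation (A2.5.6) can be written out explicitly. In the worked step below, the abbreviations S and D for the reduced half-sum and half-difference of the coexisting densities are introduced only for this illustration and do not appear in the original treatment.

```latex
% Abbreviations (assumed for this illustration only):
%   S = (\rho_l + \rho_g)/(2\rho_c),   D = (\rho_l - \rho_g)/(2\rho_c)
\begin{align*}
\frac{U - U_c}{a\rho_c} &= -2S + S^2 - D^2 + 1 = (S - 1)^2 - D^2 \\
 &= \left[\tfrac{2}{5}t + \cdots\right]^2
    - \left[4t - \tfrac{52}{25}t^2 + \cdots\right] \\
 &= -4t + \left(\tfrac{4}{25} + \tfrac{52}{25}\right)t^2 + \cdots
  = -4t + \tfrac{56}{25}t^2 + \cdots \\
\frac{U - U_c}{RT_c} &= \frac{9}{8}\left[-4t + \tfrac{56}{25}t^2 + \cdots\right]
  = \frac{9}{2}\left[-t + \tfrac{14}{25}t^2 + \cdots\right] \\
C_V &= \left(\frac{\partial U}{\partial T}\right)_V
  = -\frac{1}{T_c}\frac{\mathrm{d}U}{\mathrm{d}t}
  = \frac{9}{2}R\left[1 - \tfrac{28}{25}t + \cdots\right]
\end{align*}
```

The factor 9/8 is $a\rho_c/(RT_c)$, which follows from $\rho_c = 1/(3b)$ and $T_c = 8a/(27Rb)$.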

Finally, we consider the isothermal compressibility $\kappa_T = -(\partial \ln V/\partial p)_T = (\partial \ln \rho/\partial p)_T$ along the coexistence 
curve. A consideration of figure A2.5.6 shows that the compressibility is finite and positive at every point in 
the one-phase region except at the critical point. Differentiation of equation (A2.5.2) yields the 
compressibility along the critical isochore: 

$$p_c\kappa_T = 1/[6(T_r - 1)] = 1/(-6t) \qquad (\rho = \rho_c,\ T_r > 1).$$

At the critical point (and anywhere in the two-phase region because of the horizontal tie-line) the 
compressibility is infinite. However the compressibility of each conjugate phase can be obtained as a series 
expansion by evaluating the derivative (as a function of $\rho_r$) for a particular value of $T_r$, and then substituting 
the values of $\rho_r$ for the ends of the coexistence curve. The final result is 

$$p_c\kappa_T = 1/[12t \pm (216/5)t^{3/2} + \cdots] = [1/(12t)][1 \mp (18/5)t^{1/2} + \cdots] \qquad (\text{coex},\ T_r < 1) \qquad \text{(A2.5.7)}$$

where in the $\pm$ and the $\mp$, the upper sign applies to the liquid phase and the lower sign to the gas phase. It is to 
be noted that although the compressibility becomes infinite as one approaches $T_c$ from either direction, its 
value at a small $\delta T$ below $T_c$ is only half that at the same $\delta T$ above $T_c$; this means that there is a discontinuity 
at the critical point. 
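
The factor of two between the compressibilities just above and just below the critical temperature can be checked with a small sketch. It assumes the reduced pressure $p_r = 8T_r\rho_r/(3 - \rho_r) - 3\rho_r^2$ and, for the coexisting densities, the truncated series of equation (A2.5.4) and equation (A2.5.5); the function names are illustrative.

```python
import math


def rho_dp_drho(rho, T_r):
    """rho_r * (d p_r / d rho_r) at constant T_r, equal to 1/(p_c kappa_T)."""
    return 24.0 * T_r * rho / (3.0 - rho) ** 2 - 6.0 * rho**2


def reduced_kappa(rho, T_r):
    """Reduced isothermal compressibility p_c * kappa_T of a single phase."""
    return 1.0 / rho_dp_drho(rho, T_r)


def coexistence_densities_series(t):
    """Approximate (rho_gas, rho_liquid) from the truncated series in t."""
    s = math.sqrt(t)
    mean = 1.0 + 0.4 * t + (128.0 / 875.0) * t**2
    half_diff = 2.0 * s - (13.0 / 25.0) * t * s
    return mean - half_diff, mean + half_diff


if __name__ == "__main__":
    t = 1.0e-4
    above = reduced_kappa(1.0, 1.0 + t)       # critical isochore, T_r > 1
    rho_g, rho_l = coexistence_densities_series(t)
    print("above T_c:", above, " 1/(6t):", 1.0 / (6.0 * t))
    print("liquid branch ratio:", reduced_kappa(rho_l, 1.0 - t) / above)
    print("gas branch ratio:   ", reduced_kappa(rho_g, 1.0 - t) / above)
```

Both ratios approach 1/2 as t tends to zero, the liquid from below and the gas from above, which reflects the first correction term of opposite sign for the two phases.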

Although the previous paragraphs hint at the serious failure of the van der Waals equation to fit the shape of 
the coexistence curve or the heat capacity, failures to be discussed explicitly in later sections, it is important to 
recognize that many of the other predictions of analytic theories are reasonably accurate. For example, 
analytic equations of state, even ones as approximate as that of van der Waals, yield reasonable values (or at 
least 'ball park estimates') of the critical constants $p_c$, $T_c$ and $V_c$. Moreover, in two-component systems 
where the critical point expands to a critical line in p, T space, or in three-component systems where there are 
critical surfaces, simple models yield many useful predictions. It is only in the vicinity of critical points that 
analytic theories fail. 

A2.5.3.2 THE VAN DER WAALS FLUID MIXTURE 

Van der Waals (1890) extended his theory to mixtures of components A and B by introducing 
mole-fraction-dependent parameters $a_m$ and $b_m$ defined as quadratic averages 

$$a_m = (1 - x)^2 a_{AA} + 2x(1 - x)a_{AB} + x^2 a_{BB} \qquad \text{(A2.5.8)}$$

$$b_m = (1 - x)^2 b_{AA} + 2x(1 - x)b_{AB} + x^2 b_{BB} \qquad \text{(A2.5.9)}$$

where the a's and b's extend the meanings defined in section A2.5.3.1 for the one-component fluid to the three 
kinds of pairs in the binary mixture; x is the mole fraction of the second component (B). With these 
definitions of $a_m$ and $b_m$, equation (A2.5.1) for the pressure remains unchanged, but an entropy of mixing 
must be added to the equation for the Helmholtz free energy: 

$$A(T, V) - (1 - x)A^\circ_A(T, V^\circ) - xA^\circ_B(T, V^\circ) = RT[(1 - x)\ln(1 - x) + x\ln x] - RT\ln[(V - b_m)/V^\circ] - a_m/V.$$

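Van der Waals and especially van Laar simplified these expressions by assuming a geometric mean for $a_{AB}$ 
and an arithmetic mean for $b_{AB}$, 

$$a_{AB} = (a_{AA}a_{BB})^{1/2}, \qquad b_{AB} = (b_{AA} + b_{BB})/2.$$

Then equation (A2.5.8) and equation (A2.5.9) for $a_m$ and $b_m$ become 

$$a_m = [(1 - x)a_{AA}^{1/2} + xa_{BB}^{1/2}]^2 \qquad \text{(A2.5.10)}$$

$$b_m = (1 - x)b_{AA} + xb_{BB}. \qquad \text{(A2.5.11)}$$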

With these simplifications, and with various values of the as and bs, van Laar (1906-1910) calculated a wide 
variety of phase diagrams, determining critical lines, some of which passed continuously from liquid-liquid 
critical points to liquid-gas critical points. Unfortunately, he could only solve the difficult coupled equations 
by hand and he restricted his calculations to the geometric mean assumption for $a_{AB}$ (i.e. to equation 
(A2.5.10)). For a variety of reasons, partly due to the eclipse of the van der Waals equation, this extensive 
work was largely ignored for decades. 
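
The reduction of the quadratic averages to the simplified forms can be verified directly. The sketch below is illustrative only: the function names and the numerical parameter values are assumptions. It compares the general quadratic mixing rules of equation (A2.5.8) and equation (A2.5.9) with equation (A2.5.10) and equation (A2.5.11) when the geometric and arithmetic means are used for the cross terms.

```python
import math


def a_mix(x, a_AA, a_AB, a_BB):
    """Quadratic mixing rule for the attraction parameter."""
    return (1.0 - x) ** 2 * a_AA + 2.0 * x * (1.0 - x) * a_AB + x**2 * a_BB


def b_mix(x, b_AA, b_AB, b_BB):
    """Quadratic mixing rule for the excluded-volume parameter."""
    return (1.0 - x) ** 2 * b_AA + 2.0 * x * (1.0 - x) * b_AB + x**2 * b_BB


def a_mix_geometric(x, a_AA, a_BB):
    """Attraction parameter with a_AB taken as the geometric mean."""
    return ((1.0 - x) * math.sqrt(a_AA) + x * math.sqrt(a_BB)) ** 2


def b_mix_arithmetic(x, b_AA, b_BB):
    """Excluded volume with b_AB taken as the arithmetic mean."""
    return (1.0 - x) * b_AA + x * b_BB


if __name__ == "__main__":
    a_AA, a_BB = 0.30, 0.55      # arbitrary illustrative values
    b_AA, b_BB = 4.0e-5, 6.5e-5  # arbitrary illustrative values
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        full_a = a_mix(x, a_AA, math.sqrt(a_AA * a_BB), a_BB)
        full_b = b_mix(x, b_AA, 0.5 * (b_AA + b_BB), b_BB)
        print(x, full_a, a_mix_geometric(x, a_AA, a_BB),
              full_b, b_mix_arithmetic(x, b_AA, b_BB))
```

With the arithmetic mean for the cross term, $b_m$ becomes linear in x, so the excluded volume is simply additive in the mole fractions.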

A2.5.3.3 GLOBAL PHASE DIAGRAMS 

Half a century later Van Konynenburg and Scott (1970, 1980) [3] used the van der Waals equation to derive 
detailed phase diagrams for two-component systems with various parameters. Unlike van Laar they did not 
restrict their treatment to the geometric mean for $a_{AB}$, and for the special case of $b_{AA} = b_{BB} = b_{AB}$ 
(equal-sized molecules), they defined two reduced variables, 

$$\zeta = (a_{BB} - a_{AA})/(a_{BB} + a_{AA}) \qquad \text{(A2.5.12)}$$

$$\lambda = (a_{AA} - 2a_{AB} + a_{BB})/(a_{BB} + a_{AA}). \qquad \text{(A2.5.13)}$$
Physically, $\zeta$ is a measure of the difference in the energies of vaporization of the two species (roughly a 
difference in normal boiling point), and $\lambda$ is a measure of the energy of mixing. With these definitions 
equation (A2.5.8) can be rewritten as 

$$a_m/a_{AA} = [(1 - \zeta) + 2x(\zeta - \lambda) + 2x^2\lambda]/(1 - \zeta).$$

If $a_{AB}$ is weak in comparison to $a_{AA}$ and $a_{BB}$, $\lambda$ is positive and separation into two phases may occur. 
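
As a check on the reduced variables, the following sketch computes $\zeta$ and $\lambda$ from the three attraction parameters and confirms that the rewritten expression for $a_m/a_{AA}$ reproduces the quadratic average of equation (A2.5.8). Function names and numerical values are illustrative assumptions.

```python
import math


def zeta_lambda(a_AA, a_AB, a_BB):
    """Reduced variables of equations (A2.5.12) and (A2.5.13)."""
    total = a_AA + a_BB
    zeta = (a_BB - a_AA) / total
    lam = (a_AA - 2.0 * a_AB + a_BB) / total
    return zeta, lam


def a_m_over_a_AA(x, zeta, lam):
    """a_m / a_AA expressed through zeta and lambda."""
    return ((1.0 - zeta) + 2.0 * x * (zeta - lam) + 2.0 * x**2 * lam) / (1.0 - zeta)


def a_mix(x, a_AA, a_AB, a_BB):
    """Quadratic average of equation (A2.5.8)."""
    return (1.0 - x) ** 2 * a_AA + 2.0 * x * (1.0 - x) * a_AB + x**2 * a_BB


if __name__ == "__main__":
    a_AA, a_AB, a_BB = 0.30, 0.35, 0.55  # arbitrary illustrative values
    zeta, lam = zeta_lambda(a_AA, a_AB, a_BB)
    print("zeta =", zeta, "lambda =", lam)
    for x in (0.2, 0.5, 0.8):
        print(x, a_m_over_a_AA(x, zeta, lam), a_mix(x, a_AA, a_AB, a_BB) / a_AA)
    # Geometric-mean cross term: lambda is fixed by zeta alone.
    z_g, l_g = zeta_lambda(a_AA, math.sqrt(a_AA * a_BB), a_BB)
    print("geometric mean:", l_g, 1.0 - math.sqrt(1.0 - z_g**2))
```

For the geometric-mean cross term the two variables are not independent: $\lambda = 1 - (1 - \zeta^2)^{1/2}$, so that assumption restricts a mixture to a single curve in the $\lambda$, $\zeta$ plane.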

With this formulation a large number of very different phase diagrams were calculated using computers that 
did not exist in 1910. Six principal types of binary fluid phase diagrams can be distinguished by considering 
where critical lines begin and end. These are presented in figure A2.5.11 as the p, T projections of p, T, x 
diagrams, which show the vapour pressure curves for the pure substances, critical lines and three-phase lines. 
To facilitate understanding of these projections, a number of diagrams showing T versus x for a fixed 
pressure, identified by horizontal lines in figure A2.5.11, are shown in figure A2.5.12. Note that neither of 
these figures shows any solid state, since the van der Waals equation applies only to fluids. The simple van 
der Waals equation for mixtures yields five of these six types of phase diagrams. Type VI, with its 
low-pressure (i.e. below 1 atm) closed loop between a lower critical-solution temperature (LCST) and an upper 
critical-solution temperature (UCST), cannot be obtained from the van der Waals equation without making $\lambda$ 
temperature dependent. 
Figure A2.5.11. Typical pressure-temperature phase diagrams for a two-component fluid system. The full 
curves are vapour pressure lines for the pure fluids, ending at critical points. The dotted curves are critical 
lines, while the dashed curves are three-phase lines. The dashed horizontal lines are not part of the phase 
diagram, but indicate constant-pressure paths for the (T, x) diagrams in figure A2.5.12. All but the type VI 
diagrams are predicted by the van der Waals equation for binary mixtures. Adapted from figures in [3]. 

Figure A2.5.12. Typical temperature T versus mole fraction x diagrams for the constant-pressure paths shown 
in figure A2.5.11. Note the critical points (x) and the horizontal three-phase lines. 

The boundaries separating these principal types of phase behaviour are shown on a $\lambda$, $\zeta$ diagram (for 
equal-sized molecules) in figure A2.5.13. For molecules of different size, but with the approximation of equation 
(A2.5.10), more global phase diagrams were calculated using a third parameter, 

$$\xi = (b_{BB} - b_{AA})/(b_{BB} + b_{AA})$$

and appropriately revised definitions for $\zeta$ and $\lambda$. For different-sized molecules ($\xi \neq 0$), the global phase 
diagram is no longer symmetrical, but the topology is the same. 
Figure A2.5.13. Global phase diagram for a van der Waals binary mixture for which $b_{AA} = b_{BB}$. The 
coordinates $\lambda$ and $\zeta$ are explained in the text. The curves separate regions of the different types shown in 
figure A2.5.11. The heavier curves are tricritical lines (explained in section A2.5.9). The 'shield region' in the 
centre of the diagram where three tricritical lines intersect consists of especially complex phase diagrams not 
yet found experimentally. Adapted from figures in [3]. 

In recent years global phase diagrams have been calculated for other equations of state, not only van der 
Waals-like ones, but others with complex temperature dependences. Some of these have managed to find type 
VI regions in the overall diagram. Some of the recent work was brought together at a 1999 conference [4]. 


A2.5.4 ANALYTIC TREATMENTS OF OTHER CRITICAL PHENOMENA 

A2.5.4.1 LIQUID-LIQUID PHASE SEPARATION IN A SIMPLE BINARY MIXTURE 

The previous section showed how the van der Waals equation was extended to binary mixtures. However, 
much of the early theoretical treatment of binary mixtures ignored equation-of-state effects (i.e. the 
contributions of the expansion beyond the volume of a close-packed liquid) and implicitly avoided the 
distinction between constant pressure and constant volume by putting the molecules, assumed to be equal in 
size, into a kind of pseudo-lattice. Figure A2.5.14 shows schematically an equimolar mixture of A and B, at a 
high temperature where the distribution is essentially random, and at a low temperature where the mixture has 
separated into two virtually one-component phases. 
Figure A2.5.14. Quasi-lattice representation of an equimolar binary mixture of A and B (a) randomly mixed 
at high temperature, and (b) phase separated at low temperature. 

The molar Helmholtz free energy of mixing (appropriate at constant volume) for such a symmetrical system 
of molecules of equal size, usually called a 'simple mixture', is written as a function of the mole fraction x of 
the component B 

$$\Delta A^M = A - (1 - x)\mu^\circ_A - x\mu^\circ_B = RT[x \ln x + (1 - x)\ln(1 - x)] + Kx(1 - x) \qquad \text{(A2.5.14)}$$

where the $\mu^\circ$'s are the chemical potentials of the pure components. The Gibbs free energy of mixing $\Delta G^M$ is 
(at constant pressure) $\Delta A^M + p\Delta V^M$, and many theoretical treatments of such a system ignore the volume 
change on mixing and use the equation above for $\Delta G^M$, which is the quantity of interest for experimental 
measurements at constant pressure. Equation (A2.5.14) is used to plot $\Delta A^M$ or equivalently $\Delta G^M$ versus x for 
several temperatures in figure A2.5.15. As in the case of the van der Waals fluid a tangent line determines 
phase separation, but here the special symmetry requires that it be horizontal and that the mole fractions of the 
conjugate phases x' and x'' satisfy the condition x'' = 1 - x'. The critical-solution point occurs where 
$(\partial^2 \Delta G^M/\partial x^2)_T$ and $(\partial^3 \Delta G^M/\partial x^3)_T$ are simultaneously zero; for this special case, this point is at $x_c = 1/2$ and 
$T_c = K/2R$. The reduced temperatures that appear on the isotherms in figure A2.5.15 are then defined as 
$T_r = T/T_c = 2RT/K$. 
Figure A2.5.15. The molar Gibbs free energy of mixing $\Delta G^M$ versus mole fraction x for a simple mixture at 
several temperatures. Because of the symmetry of equation (A2.5.15) the tangent lines indicating two-phase 
equilibrium are horizontal. The dashed and dotted curves have the same significance as in previous figures. 

In the simplest model the coefficient K depends only on the differences of the attractive energies $-\varepsilon$ of the 
nearest-neighbour pairs (these energies are negative relative to those of the isolated atoms, but here their 
magnitudes $\varepsilon$ are expressed as positive numbers) 

$$K = N(z/2)(\varepsilon_{AA} - 2\varepsilon_{AB} + \varepsilon_{BB}) = Nw.$$

Here N is the number of molecules in one mole and z is the coordination number of the pseudo-lattice (4 in 
figure A2.5.14), so N(z/2) is the total number of nearest-neighbour pairs in one mole of the mixture. The 
quantity w is the interchange energy, the energy involved in the single exchange of molecules between the 
two pure components. (Compare the parameter $\lambda$ for the van der Waals mixture in the section above (equation 
(A2.5.13)).) If w is a constant, independent of temperature, then K is temperature independent and Kx(1 - x) 
is simply the molar energy of mixing $\Delta U^M$ (frequently called the molar enthalpy of mixing $\Delta H^M$ when the 
volume change is assumed to be zero). If the chemical potentials $\mu_A$ and $\mu_B$ are derived from the free energy 
of mixing, $\mu$, x isotherms are obtained that are qualitatively similar to the $\mu$, $\rho$ isotherms shown in figure 
A2.5.8, an unsurprising result in view of the similarity of the free energy curves for the two cases. For such a 
symmetrical system one may define a 'degree of order' or 'order parameter' s = 2x - 1 such that s varies from 
-1 at x = 0 to +1 at x = 1. Then x = (1 + s)/2 and 1 - x = (1 - s)/2, and equation (A2.5.14) can be rewritten as: 

$$\Delta G^M = \Delta A^M = RT\left[\frac{1 + s}{2}\ln\frac{1 + s}{2} + \frac{1 - s}{2}\ln\frac{1 - s}{2}\right] + \frac{K}{4}(1 - s^2). \qquad \text{(A2.5.15)}$$


It is easy to derive the coexistence curve. Because of the symmetry, the double tangent is horizontal and the 
coexistent phases occur at values of s where $(\partial \Delta G^M/\partial s)_T$ equals zero: 

$$T_r = \frac{2s}{\ln[(1 + s)/(1 - s)]} \qquad \text{or} \qquad s = \tanh(s/T_r). \qquad \text{(A2.5.16)}$$

Even if K is temperature dependent, the coexistence curve can still be defined in terms of a reduced 
temperature $T_r = 2RT/K(T)$, although the reduced temperature is then no longer directly proportional to the 
temperature T. 

Figure A2.5.16 shows the coexistence curve obtained from equation (A2.5.16). The logarithms (or the 
hyperbolic tangent) can be expanded in a power series, yielding 

$$T_r = 1 - t = 1 - s^2/3 - 4s^4/45 - \cdots$$

This series can be reverted and the resulting equation is very simple: 

$$s^2 = 3t - (12/5)t^2 + \cdots \qquad \text{(A2.5.17)}$$

The leading term in equation (A2.5.17) is the same kind of parabolic coexistence curve found in section 
A2.5.3.1 from the van der Waals equation. The similarity between equation (A2.5.5) and equation (A2.5.17) 
should be obvious; the form is the same even though the coefficients are different. 
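
The coexistence curve of the simple mixture is easy to evaluate numerically. The sketch below assumes the coexistence condition that follows from setting $(\partial \Delta G^M/\partial s)_T = 0$ in equation (A2.5.15), namely $\ln[(1 + s)/(1 - s)] = 2s/T_r$, which is equivalent to $T_r = s/\operatorname{artanh}(s)$, and the spinodal condition $T_r = 1 - s^2$ obtained from the vanishing second derivative. Function names and the use of bisection are illustrative choices.

```python
import math


def T_r_coexistence(s):
    """Reduced temperature at which a phase with order parameter s coexists."""
    if s == 0.0:
        return 1.0
    return s / math.atanh(s)


def T_r_spinodal(s):
    """Reduced temperature on the spinodal for order parameter s."""
    return 1.0 - s * s


def order_parameter(T_r, max_iter=200):
    """Positive root s of s = tanh(s/T_r), found by bisection."""
    lo, hi = 0.0, 1.0 - 1e-12
    if not T_r_coexistence(hi) < T_r < 1.0:
        raise ValueError("T_r outside the range handled by this sketch")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        # T_r_coexistence decreases monotonically from 1 as s grows.
        if T_r_coexistence(mid) > T_r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for T_r in (0.99, 0.9, 0.7, 0.5):
        s = order_parameter(T_r)
        t = 1.0 - T_r
        print(T_r, "s =", s, "x' =", 0.5 * (1.0 - s), "x'' =", 0.5 * (1.0 + s),
              "s^2 =", s * s, "3t - (12/5)t^2 =", 3.0 * t - 2.4 * t * t)
```

Near the critical point the computed $s^2$ agrees with the leading terms $3t - (12/5)t^2$ of the reverted series; at lower temperatures the truncated series becomes a poor approximation while the bisection result remains valid.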


Figure A2.5.16. The coexistence curve, $T_r = 2RT/K$ versus mole fraction x for a simple mixture. Also shown 
as an abscissa is the order parameter s, which makes the diagram equally applicable to order-disorder 
phenomena in solids and to ferromagnetism. The dotted curve is the spinodal. 
The derivative $(\partial \Delta G^M/\partial x)_T = \mu_B - \mu_A$, where $\mu_B$ and $\mu_A$ are the chemical potentials of the two species, is the 
analogue of the pressure in the one-component system. From equation (A2.5.15) one obtains 

$$(\partial \Delta G^M/\partial x)_T = 2(\partial \Delta G^M/\partial s)_T = RT[\ln(1 + s) - \ln(1 - s)] - Ks.$$

At s = 0 this derivative obviously vanishes for all temperatures, but this is simply a result of the symmetry. 
The second derivative is another matter: 

$$(\partial^2 \Delta G^M/\partial x^2)_T = 4(\partial^2 \Delta G^M/\partial s^2)_T = 2[2RT/(1 - s^2) - K] = 4RT_c[T_r/(1 - s^2) - 1].$$

This vanishes at the critical-solution point as does $(\partial p/\partial V)_T$ at the one-component fluid critical point. Thus 
an 'osmotic compressibility' or 'osmotic susceptibility' can be defined by analogy with the compressibility 
$\kappa_T$ of the one-component fluid. Its value along the simple-mixture coexistence curve can be obtained using 
equation (A2.5.17) and is found to be proportional to $t^{-1}$. The osmotic compressibility of a binary mixture 
diverges at the critical point just like the compressibility of a one-component fluid (compare this to equation 
(A2.5.7)). 

For a temperature-independent K, the molar enthalpy of mixing is 

$$\Delta H^M = Kx(1 - x) = 2RT_c(1 - s^2)/4 = (RT_c/2)[1 - 3t + (12/5)t^2 + \cdots]$$

and the excess mixing contribution to the heat capacity (now at constant pressure) is 

$$\Delta C_p^M = (R/2)[3 - (24/5)t + \cdots].$$

Again, as in the case of $C_V$ for the van der Waals fluid, there is a linear increase up to a finite value at the 
critical point and then a sudden drop to the heat capacity of the one-phase system because the liquids are now 
completely mixed. 

Few if any binary mixtures are exactly symmetrical around x = 1/2, and phase diagrams like that sketched in 
figure A2.5.5(c) are typical. In particular one can write for mixtures of molecules of different size (different 
molar volumes $V^\circ_A$ and $V^\circ_B$) the approximate equation 

$$\Delta G^M = RT[x \ln\phi + (1 - x)\ln(1 - \phi)] + [(1 - x)V^\circ_A + xV^\circ_B]K'\phi(1 - \phi)$$

which is a combination of the Flory entropy of mixing for polymer solutions with an enthalpy of mixing due 
to Scatchard and Hildebrand. The variable $\phi$ is the volume fraction of component B, 
$\phi = xV^\circ_B/[(1 - x)V^\circ_A + xV^\circ_B]$, and the parameter K' now has the dimensions of energy per unit volume. The 
condition for a critical-solution point, that the two derivatives cited above must simultaneously equal zero, 
yields the results 

$$K'(T_c) = (RT_c/2)[(1/V^\circ_A)^{1/2} + (1/V^\circ_B)^{1/2}]^2.$$

This simple model continues to ignore the possibility of volume changes on mixing, so for simplicity the
molar volumes V°_A and V°_B are taken as those of the pure components. It should come as no surprise that in
this unsymmetrical system both φ' + φ'' and φ' - φ'' must be considered and that the resulting equations have
extra terms that look like those in equation (A2.5.4) and equation (A2.5.5) for the van der Waals mixture; as
in that case, however, the top of the coexistence curve is still parabolic. Moreover the parameter K' is now
surely temperature dependent (especially so for polymer solutions), and the calculation of a coexistence curve
will depend on such details.
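
The critical condition for the unsymmetrical mixture can be evaluated numerically. The sketch below assumes the form K'(T_c) = (RT_c/2)[(1/V°_A)^(1/2) + (1/V°_B)^(1/2)]^2 for the critical value of K'; the function name and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def critical_k_prime(t_c, v_a, v_b):
    """K'(T_c) = (R*T_c/2) * [(1/V_A)**0.5 + (1/V_B)**0.5]**2.

    t_c is in K and the molar volumes in m^3 mol^-1, so the result is an
    energy per unit volume (J m^-3).
    """
    return 0.5 * R * t_c * (1.0 / math.sqrt(v_a) + 1.0 / math.sqrt(v_b)) ** 2


# Hypothetical inputs, chosen only for illustration.
T_C = 300.0   # K
V = 1.0e-4    # m^3 mol^-1

# Equal molar volumes: K'*V reduces to the symmetric-mixture value K = 2*R*T_c.
k_equal = critical_k_prime(T_C, V, V)
# Tenfold size mismatch: a smaller K' is enough to reach the critical point.
k_mismatch = critical_k_prime(T_C, V, 10.0 * V)
```

For equal molar volumes the product K'V recovers the symmetric-mixture condition T_c = K/2R, which is a quick consistency check on the formula.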

As in the one-fluid case, the experimental sums are in good agreement with the law of the rectilinear diameter, 
but the experimental differences fail to give a parabolic shape to the coexistence curve. 

It should be noted that a strongly temperature-dependent K (or K') can yield more than one solution to the
equation T_c = K/2R. Figure A2.5.17 shows three possible examples of a temperature-dependent K for the
simple mixture: (a) a constant K as assumed in the discussion above, (b) a K that slowly decreases with T, the
most common experimental situation, and (c) a K that is so sharply curved that it produces not only an upper
critical-solution temperature (UCST), but also a lower critical-solution temperature (LCST) below which the
fluids are completely miscible (i.e. the type VI closed-loop binary diagram of section A2.5.3.3). The position
of the curves can be altered by changing the pressure; if the two-phase region shrinks until the LCST and
UCST merge, one has a 'double critical point' where the curve just grazes the critical line. A fourth possibility
(known experimentally but not shown in figure A2.5.17) is an opposite curvature producing a low-temperature
UCST and a high-temperature LCST with a one-phase region at intermediate temperatures; if
these two critical-solution temperatures coalesce, one has a 'critical double point'.


Figure A2.5.17. The coefficient K as a function of temperature T. The line K = 2RT (shown as dashed line)
defines the critical point and separates the two-phase region from the one-phase region. (a) A constant K as
assumed in the simplest example; (b) a slowly decreasing K, found frequently in experimental systems, and
(c) a sharply curved K(T) that produces two critical-solution temperatures with a two-phase region in between.
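
The critical-solution temperatures are the intersections of K(T) with the line 2RT, so they can be located by a simple root search. The following is a minimal sketch; the two K(T) functions are hypothetical model forms (not fitted to any real system), constructed so that the constant case has T_c = 300 K and the curved case has an LCST at 300 K and a UCST at 400 K.

```python
R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def critical_temperatures(k_of_t, t_lo, t_hi, steps=2000, tol=1e-9):
    """Return the temperatures in [t_lo, t_hi] where K(T) - 2RT changes sign."""
    def f(t):
        return k_of_t(t) - 2.0 * R * t

    roots = []
    dt = (t_hi - t_lo) / steps
    a, fa = t_lo, f(t_lo)
    for i in range(1, steps + 1):
        b = t_lo + i * dt
        fb = f(b)
        if fa == 0.0:
            roots.append(a)              # grid point sits exactly on a root
        elif fa * fb < 0.0:
            lo, hi, flo = a, b, fa       # refine the bracket by bisection
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                fm = f(mid)
                if flo * fm <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fm
            roots.append(0.5 * (lo + hi))
        a, fa = b, fb
    if fa == 0.0:
        roots.append(a)                  # root exactly at the upper limit
    return roots


def k_constant(t):
    """Case (a): temperature-independent K with T_c = 300 K (hypothetical)."""
    return 2.0 * R * 300.0


def k_closed_loop(t):
    """Case (c): sharply curved K(T); K > 2RT only between 300 K and 400 K."""
    return 2.0 * R * (t + 25.0 - 0.01 * (t - 350.0) ** 2)


single = critical_temperatures(k_constant, 200.0, 500.0)
pair = critical_temperatures(k_closed_loop, 200.0, 500.0)
```

In the curved case the mixture is in the two-phase region only where K exceeds 2RT, that is, between the two roots, which corresponds to the closed-loop diagram.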

A2.5.4.2 ORDER-DISORDER IN SOLID MIXTURES 

In a liquid mixture with a negative K (negative interchange energy w), the formation of unlike pairs is
favoured and there is no phase separation. However, in a crystal there is long-range order and at low
temperatures, although there is no physical phase separation, a phase transition from a disordered arrangement
to a regular arrangement of alternating atoms is possible. The classic example is that of β-brass (CuZn) which
crystallizes in a body-centred cubic lattice. At high temperature the two kinds of atoms are distributed at
random, but at low temperature they are arranged on two interpenetrating superlattices such that each Cu
has eight Zn nearest neighbours, and each Zn eight Cu nearest neighbours, as shown in figure A2.5.18; this is
like the arrangement of ions in a crystal of CsCl.

Figure A2.5.18. Body-centred cubic arrangement of β-brass (CuZn) at low temperature showing two
interpenetrating simple cubic superlattices, one all Cu, the other all Zn, and a single lattice of randomly
distributed atoms at high temperature. Reproduced from Hildebrand J H and Scott R L 1950 The Solubility of
Nonelectrolytes 3rd edn (New York: Reinhold) p 342.


The treatment of such order-disorder phenomena was initiated by Gorsky (1928) and generalized by Bragg
and Williams (1934) [5]. For simplicity we restrict the discussion to the symmetrical situation where there are
equal amounts of each component (x = 1/2). The lattice is divided into two superlattices α and β, like those in
the figure, and a degree of order s is defined such that the mole fraction of component B on superlattice β is
(1 + s)/4 while that on superlattice α is (1 - s)/4. Conservation conditions then yield the mole fraction of A on
the two superlattices

$$x_A^{\alpha} = x_B^{\beta} = (1 + s)/4 \quad\text{and}\quad x_A^{\beta} = x_B^{\alpha} = (1 - s)/4.$$

If the entropy and the enthalpy for the separate mixing in each of the half-mole superlattices are calculated
and then combined, the following equation is obtained:

$$\Delta G^{M} = RT\left[\frac{1+s}{2}\ln\frac{1+s}{2} + \frac{1-s}{2}\ln\frac{1-s}{2}\right] + Nw\,\frac{1+s^2}{4}. \qquad \text{(A2.5.18)}$$


Note that equation (A2.5.18) is almost identical with equation (A2.5.15). Only the final term differs and then
only by the sign preceding s^2. Now, however, the interchange energy can be negative if the unlike attraction is
stronger than the like attractions; then of course K = Nw is also negative. If a reduced temperature T_r is
defined as -2RT/K, a plot of ΔG^M versus s for various T_r is identical to that in figure A2.5.15. For all
values of K/(2RT) above -1 (i.e. T_r > 1), the minimum occurs at s = 0, corresponding to complete disorder
when each superlattice is filled with equal amounts of A and B. However, for values below -1, i.e. T_r < 1, the
minimum occurs at nonzero values of s, values that increase with decreasing temperature. Recall that K/(2RT)
= +1 defined the critical temperature for phase separation in a symmetrical binary mixture; here a value of -1
defines the limit of long-range ordering. Thus for order-disorder
behaviour T_c = -K/2R defines a kind of critical temperature, although, by analogy with magnetic phenomena
in solids, it is more often called the Curie point.

The free energy minimum is found by differentiating equation (A2.5.18) with respect to s at constant T and 
setting the derivative equal to zero. In its simplest form the resultant equation is 

$$RT\ln[(1 + s)/(1 - s)] = 2RT\tanh^{-1}s = -Ks$$

exactly the same as equation (A2.5.13) for phase separation in simple mixtures except that this has -Ks
instead of +Ks. However, since it is a negative K that produces superlattice separation, the effect is identical,
and figure A2.5.15 and figure A2.5.16 apply to both situations. The physical models are different, but the
mathematics are just the same. This 'disordering curve', like the coexistence curve, is given by equation
(A2.5.15) and is parabolic, and, for a temperature-independent K, the molar heat capacity C_p for the
equimolar alloy will be exactly the same as that for the simple mixture.
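
With the reduced temperature T_r = -2RT/K, the minimum condition 2RT tanh^-1(s) = -Ks can be rewritten as s = tanh(s/T_r), which is easy to solve numerically. The sketch below uses plain bisection; the function name and tolerance are illustrative assumptions, not part of the original treatment.

```python
import math


def order_parameter(t_r, tol=1e-12):
    """Positive root of s = tanh(s / T_r); returns 0.0 when T_r >= 1."""
    if t_r >= 1.0:
        return 0.0                      # only the disordered solution exists

    def g(s):
        return math.tanh(s / t_r) - s

    lo, hi = 1e-9, 1.0                  # g(lo) > 0 and g(1) < 0 for T_r < 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


s_disordered = order_parameter(1.2)     # above the Curie point
s_half = order_parameter(0.5)           # well below it: strongly ordered
s_near = order_parameter(0.99)          # just below it: small s
```

Close to T_r = 1 the computed s^2 approaches 3(1 - T_r), the parabolic form of the disordering curve.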

Other examples of order-disorder second-order transitions are found in the alloys CuPd and Fe₃Al. However,
not all ordered alloys pass through second-order transitions; frequently the partially ordered structure changes 
to a disordered structure at a first-order transition. 

Nix and Shockley [6] gave a detailed review of the status of order-disorder theory and experiment up to 1938,
with emphasis on analytic improvements to the original Bragg-Williams theory, some of which will be
discussed later in section A2.5.4.4.

A2.5.4.3 MAGNETISM

The magnetic case also turns out to be similar to that of fluids, as Curie and Weiss recognized early on, but
later for a long period this similarity was overlooked by those working on fluids (mainly chemists) and by
those working on magnetism (mainly physicists). In a ferromagnetic material such as iron, the magnetic
interactions between adjacent atomic magnetic dipoles cause them to be aligned so that a region (a 'domain')
has a substantial magnetic dipole. Ordinarily the individual domains are aligned at random, and there is no
overall magnetization. However, if the sample is placed in a strong external magnetic field, the domains can
be aligned and, if the temperature is sufficiently low, a 'permanent magnet' is made, permanent in the sense
that the magnetization is retained even though the field is turned off. Above a certain temperature, the Curie
temperature T_C, long-range ordering in domains is no longer possible and the material is no longer
ferromagnetic, but only paramagnetic. Individual atoms can be aligned in a magnetic field, but all ordering is
lost if the field is turned off. (The use of a subscript C for the Curie temperature should pose no serious
confusion, since it is a kind of critical temperature too.)

The little atomic magnets are of course quantum mechanical, but Weiss's original theory of paramagnetism
and ferromagnetism (1907) [7] predated even the Bohr atom. He assumed that in addition to the external
magnetic field B_0, there was an additional internal 'molecular field' B_i proportional to the overall
magnetization M of the sample,

$$B = B_0 + B_i = B_0 + \lambda M.$$

If this field is then substituted into the Curie law appropriate for independent dipoles one obtains

$$M = C(B_0 + \lambda M)/T \qquad \text{(A2.5.19)}$$

where C is the Curie constant. The experimental magnetic susceptibility χ is defined as just M/B_0, since the
internal field cannot be measured. Rearrangement of equation (A2.5.19) leads to the result

$$\chi = M/B_0 = C/(T - C\lambda) = C/(T - T_C). \qquad \text{(A2.5.20)}$$

Equation (A2.5.20) is the Curie-Weiss law, and T_C, the temperature at which the magnetic susceptibility
becomes infinite, is the Curie temperature. Below this temperature the substance shows spontaneous
magnetization and is ferromagnetic. Normally the Curie temperature lies between 1 and 10 K. However,
typical ferromagnetic materials like iron have very much larger values for quantum-mechanical reasons that
will not be pursued here.
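
The step from the molecular-field assumption to the Curie-Weiss law can be checked numerically. The sketch below iterates the self-consistent relation M = C(B_0 + λM)/T and compares the result with the closed form C/(T - Cλ); the function names and the numerical values (in arbitrary units) are hypothetical.

```python
def magnetization_self_consistent(b0, temp, curie_c, lam, iterations=200):
    """Iterate M = C*(B0 + lam*M)/T; converges for T above T_C = C*lam."""
    m = 0.0
    for _ in range(iterations):
        m = curie_c * (b0 + lam * m) / temp
    return m


def curie_weiss_susceptibility(temp, curie_c, lam):
    """Closed form chi = C / (T - C*lam), valid for T > T_C."""
    return curie_c / (temp - curie_c * lam)


# Hypothetical parameters in arbitrary units: T_C = C*lam = 5.
C_CURIE, LAM, B0 = 2.0, 2.5, 0.1
m_iterated = magnetization_self_consistent(B0, 10.0, C_CURIE, LAM)
chi_closed = curie_weiss_susceptibility(10.0, C_CURIE, LAM)
```

The iteration converges only for T > T_C, where each pass shrinks the error by the factor T_C/T; as T approaches T_C the susceptibility grows without bound.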

Equation (A2.5.19) and equation (A2.5.20) are valid only for small values of B_0 and further modelling is
really not possible without some assumption, usually quantum mechanical, about the magnitude and
orientation of the molecular magnets. This was not known to Weiss, but in the simplest case (half-integral
spins), the magnetic dipole has the value of the Bohr magneton β_e, and the maximum possible magnetization
M_max when all the dipoles are aligned with the field is Nβ_e/V, where N/V is the number of dipoles per unit
volume.

If an order parameter s is defined as M/M_max, it can be shown that

$$s = \tanh\left[(s + \beta_e B_0/kT_c)(T_c/T)\right] = \tanh\left[(s + B_r)/T_r\right]. \qquad \text{(A2.5.21)}$$


Isotherms of β_e B_0/kT_c, which might be called a reduced variable B_r, versus s are shown in figure A2.5.19
and look rather similar to the p_r, V_r plots for a fluid (figure A2.5.6). There are some differences, however,
principally the symmetry that the fluid plots lack. At values of T > T_c, the curves are smooth and monotonic,
but at T_c, as required, the magnetic susceptibility becomes infinite (i.e. the slope of B_r versus s becomes
horizontal).

Figure A2.5.19. Isotherms showing the reduced external magnetic field B_r = β_e B_0/kT_c versus the order
parameter s = M/M_max for various reduced temperatures T_r = T/T_c.

For T < T_c (T_r < 1), however, the isotherms are S-shaped curves, reminiscent of the p_r, V_r isotherms that the
van der Waals equation yields at temperatures below the critical (figure A2.5.6). As in the van der Waals case,
the dashed and dotted portions represent metastable and unstable regions. For zero external field, there are
two solutions, corresponding to two spontaneous magnetizations. In effect, these represent two 'phases' and
the horizontal line is a 'tie-line'. Note, however, that unlike the fluid case, even as shown in μ_r, ρ_r form
(figure A2.5.8), the symmetry causes all the 'tie-lines' to lie on top of one another at B_r = 0 (B_0 = 0).
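
The shape of these isotherms follows directly from inverting the relation s = tanh[(s + B_r)/T_r] to give B_r = T_r tanh^-1(s) - s. A minimal sketch (the function name is an assumption made here for illustration):

```python
import math


def reduced_field(s, t_r):
    """Reduced field B_r on the isotherm T_r, for an order parameter |s| < 1."""
    return t_r * math.atanh(s) - s


grid = [i / 100.0 for i in range(-95, 96)]

# Above the critical temperature the isotherm rises monotonically.
above = [reduced_field(s, 1.2) for s in grid]

# Below it the slope at s = 0 is T_r - 1 < 0, giving the S-shaped loop.
below_small = reduced_field(0.1, 0.8)    # negative field at small positive s
below_large = reduced_field(0.99, 0.8)   # positive again near saturation

# On the critical isotherm the leading term is cubic: B_r ~ s**3 / 3.
critical_small = reduced_field(0.01, 1.0)
```

The sign change of `reduced_field` between small and large s at T_r < 1 marks the nonzero zero-field solution, and the antisymmetry in s is the symmetry that places all the 'tie-lines' at B_r = 0.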


For B_0 = 0, equation (A2.5.21) reduces to

$$s = \tanh(s/T_r)$$


which, while it looks somewhat different, is exactly the same as equation (A2.5.16) and yields exactly the
same parabolic 'coexistence curve' as that from equation (A2.5.17). Experimentally, as we shall see in the
next section, the curve is not parabolic, but more nearly cubic. More generally, equation (A2.5.21) may be
used to plot T_r versus s for fixed values of B_r as shown in figure A2.5.20. The similarity of this to a typical
phase diagram (T, ρ or T, x) is obvious. Note that for nonzero values of the external field B_r the curves always
lie outside the 'two-phase' region.

Figure A2.5.20. The reduced temperature T_r = T/T_c versus the order parameter s = M/M_max for various
values of the reduced magnetic field B_r. Note that for all nonzero values of the field the curves lie outside the
'two-phase' region.
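
The parabolic limit of the zero-field curve can be verified by a short series expansion. The following is a sketch of one possible route, assuming the reduced distance from the critical temperature is written as t = 1 - T_r (a convention assumed here rather than restated in this passage):

$$
\begin{aligned}
s = \tanh(s/T_r) \;&\Rightarrow\; 1/T_r = \tanh^{-1}(s)/s = 1 + s^2/3 + s^4/5 + \cdots \\
&\Rightarrow\; t = 1 - T_r = s^2/3 + (4/45)s^4 + \cdots \\
&\Rightarrow\; s^2 = 3t - (12/5)t^2 + \cdots
\end{aligned}
$$

To leading order s = ±(3t)^(1/2), so the top of the 'coexistence curve' is quadratic in s.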

Related to these ferromagnetic materials, but different, are antiferromagnetic substances like certain
transition-metal oxides. In these crystals, there is a complicated three-dimensional structure of two
interpenetrating superlattices not unlike those in CuZn. Here, at low temperatures, the two superlattices
consist primarily of magnetic dipoles of opposite orientation, but above a kind of critical temperature, the
Néel temperature T_N, all long-range order is lost and the two superlattices are equivalent. For B_0 = 0 the
behaviour of an antiferromagnet is exactly analogous to that of a ferromagnet with a similar 'coexistence
curve' s(T_r), but for nonzero magnetic fields they are different. Unlike a ferromagnet at its Curie temperature,
the susceptibility of an antiferromagnet does not diverge at the Néel temperature; extrapolation using the
Curie-Weiss law yields a negative Curie temperature. Below the Néel temperature the antiferromagnetic
crystal is anisotropic because there is a preferred axis of orientation. The magnetic susceptibility is finite, but
varies with the angle between the crystal axis and the external field.

A related phenomenon with electric dipoles is 'ferroelectricity' where there is long-range ordering (nonzero 
values of the polarization P even at zero electric field E) below a second-order transition at a kind of critical 
temperature. 

A2.5.4.4 MEAN FIELD VERSUS 'MOLECULAR FIELD' 

Apparently Weiss believed (although van der Waals did not) that the interactions between molecules were 
long-range and extended over the entire system; under such conditions, it was reasonable to assume that the 
energies could be represented as proportional to the populations of the various species. With the development 
of theories of intermolecular forces in the 1920s that showed that intermolecular interactions were usually 
very short-range, this view was clearly unrealistic. In the discussions of liquid and solid mixtures in the 
preceding sections it has been assumed that the principal interactions, or perhaps even the only ones, are 
between nearest neighbours; this led to energies proportional to the interchange energy w. It was therefore 
necessary to introduce what is clearly only an approximation, that the probability of finding a particular
molecular species in the nearest-neighbour shell (or indeed any more distant shell) around a given molecule
is simply the probability of finding that species in the entire system. This is the 'mean-field' approximation
that underlies many of the early analytic theories.

However, one can proceed beyond this zeroth approximation, and this was done independently by 
Guggenheim (1935) with his 'quasi-chemical' approximation for simple mixtures and by Bethe (1935) for the 
order-disorder solid. These two approximations, which turned out to be identical, yield some enhancement to 
the probability of finding like or unlike pairs, depending on the sign of w and on the coordination number z of 
the lattice. (For the unphysical limit of z equal to infinity, they reduce to the mean-field results.) 

The integral under the heat capacity curve is an energy (or enthalpy as the case may be) and is more or less 
independent of the details of the model. The quasi-chemical treatment improved the heat capacity curve, 
making it sharper and narrower than the mean-field result, but it still remained finite at the critical point. 
Further improvements were made by Bethe with a second approximation, and by Kirkwood (1938). Figure 
A2.5.21 compares the various theoretical calculations [6]. These modifications lead to somewhat lower values 
of the critical temperature, which could be related to a flattening of the coexistence curve. Moreover, and 
perhaps more important, they show that a short-range order persists to higher temperatures, as it must because 
of the preference for unlike pairs; the excess heat capacity shows a discontinuity, but it does not drop to zero 
as mean-field theories predict. Unfortunately these improvements are still analytic and in the vicinity of the 
critical point still yield a parabolic coexistence curve and a finite heat capacity just as the mean-field 
treatments do.

Figure A2.5.21. The heat capacity of an order-disorder alloy like β-brass calculated from various analytic
treatments. Bragg-Williams (mean-field or zeroth approximation); Bethe-1 (first approximation, also
Guggenheim); Bethe-2 (second approximation); Kirkwood. Each approximation makes the heat capacity
sharper and higher, but still finite. Reproduced from [6] Nix F C and Shockley W 1938 Rev. Mod. Phys. 10
14, figure 13. Copyright (1938) by the American Physical Society.

Figure A2.5.22 shows [6] the experimental heat capacity of β-brass (CuZn) measured by Moser in 1934. Note
that the experimental curve is sharper and goes much higher than any of the theoretical curves in figure
A2.5.21; however, at that time it was still believed to have a finite limit.

Figure A2.5.22. The experimental heat capacity of a β-brass (CuZn) alloy containing 48.9 atomic percent Zn
as measured by Moser (1934). The dashed line is calculated from the specific heats of Cu and Zn assuming an
ideal mixture. Reproduced from [6] Nix F C and Shockley W 1938 Rev. Mod. Phys. 10 4, figure 4. Copyright
(1938) by the American Physical Society.


A2.5.4.5 THE CRITICAL EXPONENTS 


It has become customary to characterize various theories of critical phenomena and the experiments with 
which they are compared by means of the exponents occurring in certain relations that apply in the limit as the 
critical point is approached. In general these may be defined by the equation 


$$E = \lim_{Y \to Y_c}\left[\frac{\partial\ln|X - X_c|}{\partial\ln|Y - Y_c|}\right]$$


where E is an exponent, X and Y are properties of the system, and the path along which the derivative is
evaluated must be specified.

Exponents derived from the analytic theories are frequently called 'classical' as distinct from 'modern' or
'nonclassical' although this has nothing to do with 'classical' versus 'quantum' mechanics or 'classical'
versus 'statistical' thermodynamics. The important thermodynamic exponents are defined here, and their
classical values noted; the values of the more general nonclassical exponents, determined from experiment
and theory, will appear in later sections. The equations are expressed in reduced units in order to compare the
amplitude coefficients in subsequent sections.


(A) THE HEAT-CAPACITY EXPONENT α


An exponent α governs the limiting slope of the molar heat capacity, variously C_V, C_{p,x}, or C_M, along a line
through the critical point,


$$C(\rho_c T_c/p_c) = A^{\pm}t^{-\alpha} + \cdots \qquad \text{(A2.5.22)}$$

where the ± recognizes that the coefficient A^+ for the function above the critical point will differ from the A^-
below the critical point. A similar quantity is the thermal expansivity α_p = (∂ ln V/∂T)_p. For all these
analytic theories, as we have seen on pages 533 and 539, the heat capacity remains finite, so α = 0. As we
shall see, these properties actually diverge with exponents slightly greater than zero. Such divergences are
called 'weak'.

(B) THE COEXISTENCE-CURVE EXPONENT β

In general the width of the coexistence line (Δρ, Δx, or ΔM) is proportional to an order parameter s, and its
absolute value may be written as

$$|(\rho - \rho_c)/\rho_c| = |s| = Bt^{\beta} + \cdots. \qquad \text{(A2.5.23)}$$

As we have seen, all the analytic coexistence curves are quadratic in the limit, so for all these analytic
theories, the exponent β = 1/2.

(C) THE SUSCEPTIBILITY EXPONENT γ

A third exponent γ, usually called the 'susceptibility exponent' from its application to the magnetic
susceptibility χ in magnetic systems, governs what in pure-fluid systems is the isothermal compressibility κ_T,
and what in mixtures is the osmotic compressibility, and determines how fast these quantities diverge as the
critical point is approached (i.e. as T_r → 1).

$$p_c\kappa_T = -p_c(\partial\ln V/\partial p)_T = \Gamma^{\pm}t^{-\gamma} + \cdots \qquad \text{(A2.5.24)}$$

For analytic theories, γ is simply 1, and we have seen that for the van der Waals fluid Γ^+/Γ^- equals 2.
Divergences with exponents of the order of magnitude of unity are called 'strong'.

(D) THE CRITICAL-ISOTHERM EXPONENT δ

Finally the fourth exponent δ governs the limiting form of the critical isotherm, in the fluid case, simply

$$|(p - p_c)/p_c| = D|(\rho - \rho_c)/\rho_c|^{\delta} + \cdots. \qquad \text{(A2.5.25)}$$

Since all the analytic treatments gave cubic curves, their δ is obviously 3.

Exponent values derived from experiments on fluids, binary alloys, and certain magnets differ substantially 
from all those derived from analytic (mean-field) theories. However it is surprising that the experimental 
values appear to be the same from all these experiments, not only for different fluids and fluid mixtures, but 
indeed the same for the magnets and alloys as well (see section A2.5.5). 


(E) THERMODYNAMIC INEQUALITIES. 

Without assuming analyticity, but by applying thermodynamics, Rushbrooke (1963) and Griffiths (1964)
derived general constraints relating the values of the exponents.

$$\alpha_2^- + 2\beta + \gamma_1^- \ge 2$$
$$\alpha_2^- + \beta(1 + \delta) \ge 2.$$

Here α_2^- is the exponent for the heat capacity measured along the critical isochore (i.e. in the two-phase region)
below the critical temperature, while γ_1^- is the exponent for the isothermal compressibility measured in the
one-phase region at the edge of the coexistence curve. These inequalities say nothing about the exponents α^+
and γ^+ in the one-phase region above the critical temperature.

Substitution of the classical values of the exponents into these equations shows that they satisfy these 
conditions as equalities. 
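
The substitution is simple enough to carry out explicitly. The sketch below uses exact rational arithmetic and the classical values α = 0, β = 1/2, γ = 1 and δ = 3 quoted in the preceding subsections; the function names are illustrative only.

```python
from fractions import Fraction


def rushbrooke_sum(alpha, beta, gamma):
    """Left-hand side of the Rushbrooke inequality: alpha + 2*beta + gamma."""
    return alpha + 2 * beta + gamma


def griffiths_sum(alpha, beta, delta):
    """Left-hand side of the Griffiths inequality: alpha + beta*(1 + delta)."""
    return alpha + beta * (1 + delta)


# Classical (analytic, mean-field) exponent values.
classical = {
    "alpha": Fraction(0),
    "beta": Fraction(1, 2),
    "gamma": Fraction(1),
    "delta": Fraction(3),
}

rushbrooke = rushbrooke_sum(classical["alpha"], classical["beta"], classical["gamma"])
griffiths = griffiths_sum(classical["alpha"], classical["beta"], classical["delta"])
```

Both sums come out exactly equal to 2, so the classical exponents sit on the boundary of the allowed region rather than inside it.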


A2.5.5 THE EXPERIMENTAL FAILURE OF THE ANALYTIC 
TREATMENT 

Nearly all experimental 'coexistence' curves, whether from liquid-gas equilibrium, liquid mixtures,
order-disorder in alloys, or in ferromagnetic materials, are far from parabolic, and more nearly cubic, even far below
the critical temperature. This was known for fluid systems, at least to some experimentalists, more than one
hundred years ago. Verschaffelt (1900), from a careful analysis of data (pressure-volume and densities) on
isopentane, concluded that the best fit was with β = 0.34 and δ = 4.26, far from the classical values. Van Laar
apparently rejected this conclusion, believing that, at least very close to the critical temperature, the
coexistence curve must become parabolic. Even earlier, van der Waals, who had derived a classical theory of
capillarity with a surface-tension exponent of 3/2, found (1893)
that experimental results on three liquids yielded lower exponents (1.23-1.27); he too apparently expected
that the discrepancy would disappear closer to the critical point. Goldhammer (1920) formulated a law of
corresponding states for a dozen fluids assuming that the exponent β was 1/3. For reasons that are not entirely
clear, this problem seems to have attracted little attention for decades after it was first pointed out. (This
interesting history has been detailed by Levelt Sengers [8, 9].)

In 1945 Guggenheim [10], as part of an extensive discussion of the law of corresponding states, showed that,
when plotted as reduced temperature T_r versus reduced density ρ_r, all the coexistence-curve measurements on
three inert gases (Ar, Kr, Xe) fell on a single curve, and that Ne, N₂, O₂, CO and CH₄ also fit the same curve
very closely. Moreover he either rediscovered or re-emphasized the fact that the curve was unequivocally
cubic (i.e. β = 1/3) over the entire range of experimental temperatures, writing for ρ_r

$$\rho_r = 1 + (3/4)t \pm (7/4)t^{1/3}. \qquad \text{(A2.5.26)}$$

Figure A2.5.23 reproduces Guggenheim's figure, with experimental results and the fit to equation (A2.5.26).
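
Guggenheim's cubic fit is easy to tabulate. The sketch below assumes the form ρ_r = 1 + (3/4)t ± (7/4)t^(1/3) with t = 1 - T/T_c, taking the plus sign for the liquid branch and the minus sign for the vapour branch; the function name is illustrative.

```python
def guggenheim_densities(t):
    """Reduced (liquid, vapour) densities for t = 1 - T/T_c >= 0."""
    diameter = 1.0 + 0.75 * t                 # mean of the two branches
    half_width = 1.75 * t ** (1.0 / 3.0)      # half the liquid-vapour gap
    return diameter + half_width, diameter - half_width


liquid_c, vapour_c = guggenheim_densities(0.0)    # at the critical point
liquid, vapour = guggenheim_densities(0.001)      # 0.1% below T_c
```

The mean of the two branches is linear in t (the law of the rectilinear diameter), while their difference vanishes as t^(1/3), so even 0.1% below T_c the two densities already differ by about 35% of the critical density.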


It is curious that he never commented on the failure to fit the analytic theory even though that treatment, with
the quadratic form of the coexistence curve, was presented in great detail in Statistical Thermodynamics
(Fowler and Guggenheim, 1939). The paper does not discuss any of the other critical exponents, except to fit
the vanishing of the surface tension σ at the critical point to an equation

$$\sigma = \sigma_0 t^{11/9}.$$

This exponent 11/9, now called μ, is almost identical with that found by van der Waals in 1893.

Figure A2.5.23. Reduced temperature T_r = T/T_c versus reduced density ρ_r = ρ/ρ_c for Ne, Ar, Kr, Xe, N₂, O₂,
CO, and CH₄. The full curve is the cubic equation (A2.5.26). Reproduced from [10], p 257 by permission of
the American Institute of Physics.

In 1953 Scott [11] pointed out that, if the coexistence curve exponent was 1/3, the usual conclusion that the
corresponding heat capacity remained finite was invalid. As a result the heat capacity might diverge and he
suggested an exponent α = 1/3. Although it is now known that the heat capacity does diverge, this suggestion
attracted little attention at the time.

However, the discovery in 1962 by Voronel and coworkers [12] that the constant-volume heat capacity of
argon showed a weak divergence at the critical point had a major impact on uniting fluid criticality with that
of other systems. They thought the divergence was logarithmic, but it is not quite that weak, satisfying
equation (A2.5.22) with an exponent α now known to be about 0.11. The equation applies both above and
below the critical point, but with different coefficients; A^- is larger than A^+. Thus the heat capacity (figure
A2.5.24) is quite asymmetrical around T_c and appears like a sharp discontinuity.

Figure A2.5.24. The heat capacity of argon in the vicinity of the critical point, as measured by Voronel and
coworkers. Adapted from figure 1 of [12].

In 1962 Heller and Benedek made accurate measurements of the zero-field magnetization of the
antiferromagnet MnF₂ as a function of temperature and reported a β of 0.335±0.005, a result supporting an
experimental parallelism between fluids and magnets.

By 1966 the experimental evidence that the classical exponents were wrong was overwhelming and some
significant theoretical advances had been made. In that year an important conference on critical phenomena
[13] was held at the US National Bureau of Standards, which brought together physicists and chemists,
experimentalists and theoreticians. Much progress had already been made in the preceding several years, and
finally the similarity between the various kinds of critical phenomena was clearly recognized. The next
decade brought near resolution to the problems.

A2.5.6 THE ISING MODEL AND THE GRADUAL SOLUTION OF THE PROBLEM


A2.5.6.1 THE ISING MODEL 


In 1925 Ising [14] suggested (but solved only for the relatively trivial case of one dimension) a lattice model
for magnetism in solids that has proved to have applicability to a wide variety of other, but similar, situations.
The mathematical solutions, or rather attempts at solution, have made the Ising model one of the most famous
problems in classical statistical mechanics.

The model is based on a classical Hamiltonian ℋ (here shown in script to distinguish it from the enthalpy H)

$$\mathcal{H} = -\sum_{i<j} J_{ij}\sigma_i\sigma_j - h\sum_i \sigma_i$$

where σ_i and σ_j are scalar numbers (+1 or -1) associated with occupancy of the lattice sites. In the magnetic
case these are obviously the two orientations of the spin s = 1/2, but without any vector significance. The
same Hamiltonian can be used for the lattice-solid mixture, where +1 signifies occupancy by molecule A,
while -1 signifies a site occupied by molecule B (essentially the model used for the order-disorder transition
in section A2.5.4.2). For the 'lattice gas', +1 signifies a site occupied by a molecule, while -1 signifies an
unoccupied site (a 'hole').

The parameter J_ij is a measure of the energy of interaction between sites i and j while h is an external potential
or field common to the whole system. The term h Σ_i σ_i is a generalized work term (i.e. -pV, μN, VB_0M,
etc), so ℋ is a kind of generalized enthalpy. If the interactions J are zero for all but nearest-neighbour sites,
there is a single nonzero value for J, and then

$$\mathcal{H} = -J\sum_{\mathrm{n.n.},\,i<j}\sigma_i\sigma_j - h\sum_i \sigma_i.$$

Thus any nearest-neighbour pair with the same signs for σ (spins parallel) contributes a term -J to the energy
and hence to ℋ. Conversely any nearest-neighbour pair with opposite signs for σ (spins opposed) contributes
+J to the energy. (If this doesn't seem right when extended to the lattice gas or to the lattice solid, it should be
noted that a shift of the zero of energy resolves this problem and yields exactly the same equation. Thus, in
the lattice mixture, there is only one relevant energy parameter, the interchange energy w.) What remained to
be done was to derive the various thermodynamic functions from this simple Hamiltonian.
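
The nearest-neighbour Hamiltonian can be evaluated directly for a small configuration. The sketch below is only an illustration of the bookkeeping: it assumes a two-dimensional square grid with open (non-periodic) edges, and the function name and test configurations are hypothetical.

```python
def ising_energy(spins, coupling_j, field_h=0.0):
    """Nearest-neighbour Ising energy of a 2D grid of +1/-1 values.

    Each horizontal and vertical neighbour pair is counted once; the grid
    has open edges, so sites on the boundary have fewer neighbours.
    """
    rows, cols = len(spins), len(spins[0])
    pair_sum = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:                      # neighbour to the right
                pair_sum += spins[r][c] * spins[r][c + 1]
            if r + 1 < rows:                      # neighbour below
                pair_sum += spins[r][c] * spins[r + 1][c]
    site_sum = sum(sum(row) for row in spins)
    return -coupling_j * pair_sum - field_h * site_sum


parallel_pair = ising_energy([[1, 1]], coupling_j=1.0)
opposed_pair = ising_energy([[1, -1]], coupling_j=1.0)
all_up = ising_energy([[1, 1], [1, 1]], coupling_j=1.0, field_h=0.5)
checkerboard = ising_energy([[1, -1], [-1, 1]], coupling_j=1.0, field_h=0.5)
```

A parallel pair contributes -J and an opposed pair +J, as stated above; with positive J the fully aligned grid is lowest in energy, while a negative J would instead favour the alternating (checkerboard) arrangement of the order-disorder alloy.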

The standard analytic treatment of the Ising model is due to Landau (1937). Here we follow the presentation
by Landau and Lifschitz [15], which casts the problem in terms of the order-disorder solid, but this is
substantially the same as the magnetic problem if the vectors are replaced by scalars (as the Ising model
assumes). The thermodynamic
potential, in this case G(T, p, s), is expanded as a Taylor series in even powers of the order parameter s
(because of the symmetry of the problem there are no odd powers)

$$G(T, p, s) = G_0 + G_2 s^2 + G_4 s^4 + G_6 s^6 + \cdots.$$

Here the coefficients G_2, G_4, and so on, are functions of p and T, presumably expandable in Taylor series
around p - p_c and T - T_c. However, it is frequently overlooked that the derivation is accompanied by the
comment that 'since ... the second-order transition point must be some singular point of the thermodynamic
potential, there is every reason to suppose that such an expansion cannot be carried out up to terms of
arbitrary order', but that 'there are grounds to suppose that its singularity is of higher order than that of the
terms of the expansion used'. The theory developed below was based on this assumption.

For the kind of transition above which the order parameter is zero and below which other values are stable,
the coefficient G_2 must change sign at the transition point and G_4 must remain positive. As we have seen, the
dependence of s on temperature is determined by requiring the free energy to be a minimum (i.e. by setting its
derivative with respect to s equal to zero). Thus

$$(\partial G/\partial s)_{T,p} = 2G_2 s + 4G_4 s^3 + 6G_6 s^5 + \cdots = 0.$$

If the G coefficients are expanded (at constant pressure p) in powers of t, this can be rewritten as

$$(-g_{21}t + g_{22}t^2 + \cdots) + (g_{40} - g_{41}t + \cdots)s^2 + (g_{60} + \cdots)s^4 + \cdots = 0.$$

Reverting this series and simplifying yields the final result in powers of t

$$s^2 = (g_{21}/g_{40})t - \left[(g_{22}g_{40}^2 - g_{21}g_{41}g_{40} + g_{21}^2 g_{60})/g_{40}^3\right]t^2 + \cdots \qquad \text{(A2.5.27)}$$

and we see that, like all the previous cases considered, this curve too is quadratic in the limit. (The derivation 
here has been carried to higher powers than shown in [15].) These results are more general than the analytic 
results in previous sections (in the sense that the coefficients are more general), but the basic conclusion is the 
same; moreover other properties like the heat capacity are also described in the analytic forms discussed in 
earlier sections. There is no way of explaining the discrepancies without abandoning the assumption of 
analyticity. (It is an interesting historical note that many Russian scientists were among the last to accept this 
failure; they were sure that Landau had to have been right, and ignored his stated reservations.) 
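
The series reversion can be checked numerically. The sketch below assumes the minimum condition has the truncated form $(-g_{21}t + g_{22}t^2) + (g_{40} - g_{41}t)s^2 + g_{60}s^4 = 0$, with the numerical factors from the differentiation absorbed into the g coefficients; the coefficient values are hypothetical and chosen only for illustration. It compares the exact root for $s^2$ with the two-term series of equation (A2.5.27).

```python
import math

# Hypothetical expansion coefficients (illustrative values only).
g21, g22 = 1.0, 0.3
g40, g41 = 2.0, 0.5
g60 = 0.7


def s2_exact(t):
    """Root s^2 of the truncated minimum condition that vanishes as t -> 0."""
    a = g60                      # coefficient of (s^2)^2
    b = g40 - g41 * t            # coefficient of s^2
    c = -g21 * t + g22 * t * t   # term independent of s
    disc = b * b - 4.0 * a * c
    # Numerically stable form of (-b + sqrt(disc)) / (2a)
    return (2.0 * c) / (-b - math.sqrt(disc))


def s2_series(t):
    """Two-term reverted series in powers of t, as in equation (A2.5.27)."""
    lin = g21 / g40
    quad = (g22 * g40**2 - g21 * g41 * g40 + g21**2 * g60) / g40**3
    return lin * t - quad * t * t


for t in (0.02, 0.01, 0.005):
    print(t, s2_exact(t), s2_series(t), s2_exact(t) - s2_series(t))
```

The difference between the two columns should shrink roughly as $t^3$, the first neglected order, while the leading behaviour $s^2 \propto t$ reproduces the classical quadratic coexistence curve.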

That analyticity was the source of the problem should have been obvious from the work of Onsager (1944) 
[16] who obtained an exact solution for the two-dimensional Ising model in zero field and found that the heat
capacity goes to infinity at the transition, a logarithmic singularity that yields $\alpha = 0$, but not the $\alpha = 0$ of the
analytic theory, which corresponds to a finite discontinuity. (While diverging at the critical point, the heat
capacity is symmetrical without an actual discontinuity, so perhaps should be called third-order.) 
Subsequently Onsager (1948) reported other exponents, and Yang (1952) completed the derivation. The 
exponents are rational numbers, but not the classical ones. 
The 'coexistence curve' is nearly flat at its top, with an exponent $\beta = 1/8$, instead of the mean-field value of
1/2. The critical isotherm is also nearly flat at $T_c$; the exponent $\delta$ (determined later) is 15 rather than the 3 of
the analytic theories. The susceptibility diverges with an exponent $\gamma = 7/4$, a much stronger divergence than
that predicted by the mean-field value of 1.

The classical treatment of the Ising model makes no distinction between systems of different dimensionality, 
so, if it fails so badly for d = 2, one might have expected that it would also fail for d = 3. Landau and Lifshitz
[15] discussed the Onsager and Yang results, but continued to emphasize the analytic conclusions for d = 3.

A2.5.6.2 THE ASSUMPTION OF HOMOGENEITY. THE 'SCALING' LAWS 

The first clear step away from analyticity was made in 1965 by Widom [17] who suggested that the
assumption of analytic functions be replaced by the less severe assumption that the singular part of the
appropriate thermodynamic function was a homogeneous function of two variables, $(\rho_r - 1)$ and $(1 - T_r)$. A
homogeneous function $f(u, v)$ of two variables is one that satisfies the condition

$$f(\lambda^{a_u} u, \lambda^{a_v} v) = \lambda f(u, v).$$

If one assumes that the singular part A of the Helmholtz free energy is such a function

$$A[\lambda^{a_\rho}(\rho_r - 1), \lambda^{a_T}(1 - T_r)] = \lambda A[(\rho_r - 1), (1 - T_r)]$$

then a great deal follows. In particular, the reduced chemical potential $\mu_r = [\mu(\rho, T) - \mu(\rho_c, T)](\rho_c/p_c)$ of a
fluid can be written as

$$\mu_r[\lambda^{a_\rho}(\rho_r - 1), \lambda^{a_T}(1 - T_r)] = \lambda^{1 - a_\rho}\mu_r[(\rho_r - 1), (1 - T_r)].$$

(The brackets symbolize 'function of', not multiplication.) Since there are only two parameters, $a_\rho$ and $a_T$, in
this expression, the homogeneity assumption means that all four exponents $\alpha$, $\beta$, $\gamma$ and $\delta$ must be functions of
these two; hence the inequalities in section A2.5.4.5(e) must be equalities. Equations for the various other
thermodynamic quantities, in particular the singular part of the heat capacity $C_V$ and the isothermal
compressibility $\kappa_T$, may be derived from this equation for $\mu_r$. The behaviour of these quantities as the critical
point is approached can be satisfied only if

$$a_\rho = 1/(\delta + 1) = \beta/(2 - \alpha) \quad\text{and}\quad a_T = a_\rho/\beta.$$

This implies that $\mu_r$ may be written in a scaled form

$$\mu_r = [\mu(\rho, T) - \mu(\rho_c, T)](\rho_c/p_c) = (\rho_r - 1)|\rho_r - 1|^{\delta - 1} D h(x/x_0)$$ (A2.5.28)

where $h(x/x_0)$ is an analytic function of $x = (T_r - 1)/|\rho_r - 1|^{1/\beta}$ and $x_0$, the value of x at the critical point,
$x_0 = B^{-1/\beta}$. The curve $x = -x_0$ is the coexistence curve, the curve x = 0 is the critical isotherm, and the curve $x = \infty$
is the critical isochore. All the rest of the thermodynamic behaviour in the critical region can be derived from
this equation, with the appropriate exponents as functions of $\beta$ and $\delta$. Note that there are now not only just
two independent exponents, but also only two independent amplitudes, B and D, the amplitudes in equation
(A2.5.23) and equation (A2.5.25). This homogeneity assumption is now known as the 'principle of two-scale-factor
universality'. This principle, proposed as an approximation, seems to have stood the test of time; no
further generalization seems to be needed. (We shall return to discuss exponents and amplitudes in section
A2.5.7.1.)
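
A short sketch of why homogeneity fixes the exponents may help here. It uses the abbreviations $\Delta\rho = \rho_r - 1$ and $t = 1 - T_r$ and writes the two scaling parameters as $a_\rho$ and $a_T$; this is the standard textbook argument and is not necessarily the route taken in [17].

```latex
% Homogeneity of the singular free energy, with two particular choices of lambda
\begin{aligned}
A(\lambda^{a_\rho}\Delta\rho,\ \lambda^{a_T} t) &= \lambda\, A(\Delta\rho,\ t) \\[4pt]
\lambda = |\Delta\rho|^{-1/a_\rho}:\quad
A(\Delta\rho, t) &= |\Delta\rho|^{1/a_\rho}\, A\!\left(\pm 1,\ t/|\Delta\rho|^{a_T/a_\rho}\right) \\[4pt]
\lambda = |t|^{-1/a_T}:\quad
A(\Delta\rho, t) &= |t|^{1/a_T}\, A\!\left(\Delta\rho/|t|^{a_\rho/a_T},\ \pm 1\right) \\[8pt]
\text{critical isotherm } (t = 0):\quad
\mu \sim \partial A/\partial\rho &\propto |\Delta\rho|^{1/a_\rho - 1} \equiv |\Delta\rho|^{\delta}
\ \Rightarrow\ a_\rho = 1/(\delta + 1) \\[4pt]
\text{coexistence curve (fixed scaling variable)}:\quad
|\Delta\rho| &\propto |t|^{a_\rho/a_T} \equiv |t|^{\beta}
\ \Rightarrow\ a_T = a_\rho/\beta \\[4pt]
\text{critical isochore } (\Delta\rho = 0):\quad
C_V \sim \partial^2 A/\partial T^2 &\propto |t|^{1/a_T - 2} \equiv |t|^{-\alpha}
\ \Rightarrow\ a_\rho = \beta/(2 - \alpha)
\end{aligned}
```

Equating the two expressions for $a_\rho$ gives $\beta(\delta + 1) = 2 - \alpha$, one of the equalities that the homogeneity assumption imposes on the exponents.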

An unexpected conclusion from this formulation, shown in various degrees of generality in 1970-71, is that
for systems that lack the symmetry of simple lattice models the slope of the diameter of the coexistence curve
should have a weak divergence proportional to $t^{-\alpha}$. This is very hard to detect experimentally because it
usually produces only a small addition to the classical linear term in the equation for the diameter

$$(\rho_l + \rho_g)/(2\rho_c) = \rho_d = 1 + d_{1-\alpha}t^{1-\alpha} + d_1 t + \cdots.$$

However this effect was shown convincingly first [18] by Jüngst, Knuth and Hensel (1985) for the fluid
metals caesium and rubidium (where the effect is surprisingly large) and then by Pestak et al (1987) for a
series of simple fluids; figure A2.5.25 shows the latter results [19]. Not only is it clear that there is curvature
very close to the critical point, but it is also evident that for this reason critical densities determined by
extrapolating a linear diameter may be significantly too high. The magnitude of the effect (i.e. the value of the
coefficient $d_{1-\alpha}$) seems to increase with the polarizability of the fluid.
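
The consequence for extrapolated critical densities can be illustrated with a small numerical sketch. It assumes a diameter of the form $\rho_d = 1 + d_{1-\alpha}t^{1-\alpha} + d_1 t$ with hypothetical amplitudes (`D_SING` and `D_LIN` are invented for illustration, not fitted to any fluid), fits a straight line to points well away from the critical point, and extrapolates that line to t = 0.

```python
ALPHA = 0.109   # d = 3 Ising heat-capacity exponent
D_SING = 0.5    # hypothetical amplitude of the t**(1 - alpha) term
D_LIN = 0.3     # hypothetical amplitude of the linear term


def diameter(t):
    """Reduced diameter (rho_l + rho_g) / (2 rho_c) at reduced temperature t."""
    return 1.0 + D_SING * t ** (1.0 - ALPHA) + D_LIN * t


def diameter_slope(t):
    """Derivative of the diameter; diverges weakly as t**(-alpha) when t -> 0."""
    return D_SING * (1.0 - ALPHA) * t ** (-ALPHA) + D_LIN


def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# "Far from critical" data: t between 0.05 and 0.20
ts = [0.05 + 0.01 * i for i in range(16)]
slope, intercept = linear_fit(ts, [diameter(t) for t in ts])

print("extrapolated reduced critical density:", intercept)
print("true value: 1.0")
print("slope at t = 1e-2 and 1e-6:", diameter_slope(1e-2), diameter_slope(1e-6))
```

Because the singular term is concave in t, the straight line extrapolates to an intercept above 1 when the hypothetical amplitude is positive, which is the sense of the error described above.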

Figure A2.5.25. Coexistence-curve diameters as functions of reduced temperature for Ne, $\mathrm{N_2}$, $\mathrm{C_2H_4}$, $\mathrm{C_2H_6}$
and $\mathrm{SF_6}$. Dashed lines indicate linear fits to the data far from the critical point. Reproduced from [19] Pestak
M W, Goldstein R E, Chan M H W, de Bruyn J R, Balzarini D A and Ashcroft N W 1987 Phys. Rev. B 36
599, figure 3. Copyright (1987) by the American Physical Society.

Figure A2.5.26. Molar heat capacity $C_V$ of a van der Waals fluid as a function of temperature: from mean-field
theory (dotted line); from crossover theory (full curve). Reproduced from [29] Kostrowicka
Wyczalkowska A, Anisimov M A and Sengers J V 1999 Global crossover equation of state of a van der
Waals fluid Fluid Phase Equilibria 158-160 532, figure 4, by permission of Elsevier Science.

A2.5.6.3 THE 'REASON' FOR THE NONANALYTICITY: FLUCTUATIONS 

No system is exactly uniform; even a crystal lattice will have fluctuations in density, and even the Ising model 
must permit fluctuations in the configuration of spins around a given spin. Moreover, even the classical 
treatment allows for fluctuations; the statistical mechanics of the grand canonical ensemble yields an exact 
relation between the isothermal compressibility $\kappa_T$ and the number of molecules N in volume V:

$$\sigma^2 = \langle N^2\rangle - \langle N\rangle^2 = kT\kappa_T\langle N\rangle^2/V$$

where $\sigma$ is the standard deviation of the distribution of N's, and the brackets indicate averages over the
distribution. 


If the finite size of the system is ignored (after all, N is probably $10^{20}$ or greater), the compressibility is
essentially infinite at the critical point, and then so are the fluctuations. In reality, however, the
compressibility diverges more sharply than classical theory allows (the exponent $\gamma$ is significantly greater than
1), and thus so do the fluctuations.
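
To give a feeling for the magnitudes, the following sketch evaluates the relative fluctuation $\sigma/\langle N\rangle$ from the relation $\sigma^2 = kT\kappa_T\langle N\rangle^2/V$. The ideal-gas compressibility $\kappa_T = 1/p$ is used only as a convenient reference case, and the sample size, volume and temperature are arbitrary illustrative values.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def relative_fluctuation(n_mean, volume, temperature, kappa_t):
    """sigma/<N> from sigma^2 = k T kappa_T <N>^2 / V (SI units)."""
    variance = K_B * temperature * kappa_t * n_mean**2 / volume
    return math.sqrt(variance) / n_mean


# Reference case: ideal gas, kappa_T = 1/p = V / (<N> k T),
# for which sigma/<N> reduces to <N>**-0.5.
n_mean = 1.0e20        # illustrative number of molecules
temperature = 300.0    # K
volume = 1.0e-6        # m^3
kappa_ideal = volume / (n_mean * K_B * temperature)

rel_ideal = relative_fluctuation(n_mean, volume, temperature, kappa_ideal)
# A compressibility 10**6 times larger, as might occur near a critical point
rel_near_critical = relative_fluctuation(n_mean, volume, temperature, 1.0e6 * kappa_ideal)

print(rel_ideal, rel_near_critical)
```

Since $\sigma/\langle N\rangle$ scales as the square root of $\kappa_T$, a millionfold increase in compressibility raises the relative fluctuation a thousandfold.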


Microscopic theory yields an exact relation between the integral of the radial distribution function g(r) and the
compressibility

$$RT\rho\kappa_T = 1 + \rho\int [g(r) - 1]\,\mathrm{d}\mathbf{r} = 1 + \rho\int h(r)\,\mathrm{d}\mathbf{r}$$

where g(r) is the radial distribution function which is the probability density for finding a molecule a distance
r from the centre of a specified molecule, and h(r) is the pair correlation function. At sufficiently long
distances g(r) must become unity while h(r) must become zero. Since $\kappa_T$ diverges at the critical point, so also
must the integral. The only way the integral can diverge is for the integrand to develop a very long tail. The
range of the fluctuations is measured by the correlation length $\xi$. Near, but not exactly at, the critical point, the
behaviour of h(r) can be represented by the Ornstein-Zernike (1914, 1916) equation

while, at the critical point, $h(r) \propto 1/r^{d-2+\eta}$, where d is the dimensionality of the system and $\eta$ is a very small
number (zero classically). The correlation length $\xi$ increases as the critical point is approached and it will
ultimately diverge. On the critical isochore, $\rho = \rho_c$, one finds

$$\xi = \xi_0 |t|^{-\nu}$$

where classically $\nu = \gamma/2 = 1/2$. If the hypothesis of homogeneity is extended to the correlation length, what
has become known as hyperscaling yields relations between the exponents $\nu$ and $\eta$ and the thermodynamic
exponents:

$$\nu = (2 - \alpha)/d \quad\text{and}\quad 2 - \eta = d(\delta - 1)/(\delta + 1).$$

Here d is the dimensionality of the system. (One recovers the analytic values with d = 4.) 
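
The hyperscaling relations $\nu = (2 - \alpha)/d$ and $2 - \eta = d(\delta - 1)/(\delta + 1)$ can be checked with exact rational arithmetic. The sketch below is only an illustration: it inserts the two-dimensional Ising values ($\alpha = 0$, $\delta = 15$) and the classical values ($\alpha = 0$, $\delta = 3$) evaluated at d = 4.

```python
from fractions import Fraction as F


def nu_from_alpha(alpha, d):
    """Hyperscaling relation nu = (2 - alpha) / d."""
    return (2 - alpha) / d


def eta_from_delta(delta, d):
    """Hyperscaling relation 2 - eta = d (delta - 1) / (delta + 1)."""
    return 2 - d * (delta - 1) / (delta + 1)


# Two-dimensional Ising model: alpha = 0 (logarithmic), delta = 15
nu_2d = nu_from_alpha(F(0), 2)
eta_2d = eta_from_delta(F(15), 2)

# Classical exponents: alpha = 0 (finite jump), delta = 3, evaluated at d = 4
nu_classical = nu_from_alpha(F(0), 4)
eta_classical = eta_from_delta(F(3), 4)

print(nu_2d, eta_2d)                 # 1 and 1/4
print(nu_classical, eta_classical)   # 1/2 and 0
```

With d = 3 and the classical thermodynamic exponents the same functions give $\nu = 2/3$ and $\eta = 1/2$, which are not the classical values; this is the sense in which the analytic values are recovered only with d = 4.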

Fluctuations in density and composition produce opalescence, a recognized feature of the critical region. 
Since systems very close to a critical point become visibly opaque, the fluctuations must extend over ranges 
comparable to the wavelength of light (i.e. to distances very much greater than molecular dimensions). 
Measurements of light scattering can yield quantitative information about the compressibility and thus about 
the magnitude of the fluctuations. Such measurements in the critical region showed the failure of the analytic 
predictions and yielded the first good experimental determinations of the exponent $\gamma$. As predicted even by
classical theory the light scattering (i.e. the compressibility) on the critical isochore at a small temperature $\delta T$
above the critical temperature is larger than that at the same $\delta T$ below the critical temperature along the
coexistence curve. 

What this means is that mean-field (analytic) treatments fail whenever the range of correlations greatly 
exceeds the range of intermolecular forces. It follows that under these circumstances there should be no 
difference between the limiting behaviour of an Ising lattice and the nonlattice fluids; they should have the 
same exponents. Nearly a century after the introduction of the van der Waals equation for fluids, Kac, 
Uhlenbeck and Hemmer (1963) [20] proved that, in a one-dimensional system, it is exact for an
intermolecular interaction that is infinite in range and infinitesimal in magnitude. (It is interesting to note that, 
in disagreement with van der Waals, Boltzmann insisted that the equation could only be correct if the range of 
the interactions were infinite.) 

Moreover, well away from the critical point, the range of correlations is much smaller, and when this range is
of the order of the range of the intermolecular forces, analytic treatments should be appropriate, and the
exponents should be 'classical'. The need to reconcile the nonanalytic region with the classical region has led
to attempts to solve the 'crossover' problem, to be discussed in section A2.5.7.2.

A2.5.6.4 A UNIFORM GEOMETRIC VIEW OF CRITICAL PHENOMENA: 'FIELDS' AND 'DENSITIES' 


While there was a general recognition of the similarity of various types of critical phenomena, the situation 
was greatly clarified in 1970 by a seminal paper by Griffiths and Wheeler [21]. In particular the difference 
between variables that are 'fields' and those that are 'densities' was stressed. A 'field' is any variable that is 
the same in two phases at equilibrium (e.g. pressure, temperature, chemical potential, magnetic field).
Conversely a 'density' is a variable that is different in the two phases (e.g. molar volume or density, a
composition variable like mole fraction or magnetization). The similarity between different kinds of critical
phenomena is seen more clearly when the phase diagram is shown exclusively with field variables. (Examples
of this are figure A2.5.1 and figure A2.5.11.)

The field-density concept is especially useful in recognizing the parallelism of path in different physical 
situations. The criterion is the number of densities held constant; the number of fields is irrelevant. A path to 
the critical point that holds only fields constant produces a strong divergence; a path with one density held 
constant yields a weak divergence; a path with two or more densities held constant is nondivergent. Thus the 
compressibility $\kappa_T$ of a one-component fluid shows a strong divergence, while $C_V$ in the one-component fluid
is comparable to $C_{p,x}$ (constant pressure and composition) in the two-component fluid and shows a weak
divergence. 

The divergences of the heat capacity $C_V$ and of the compressibility $\kappa_T$ for a one-component fluid are usually
defined as along the critical isochore, but if the phase diagram is shown in field space (p versus T as in figure
A2.5.1 or figure A2.5.11), it is evident that this is a 'special' direction along the vapour pressure curve. Indeed
any direction that lies within the coexistence curve (e.g. constant enthalpy etc) and intersects that curve at the
critical point will yield the same exponents. Conversely any path that intersects this special direction, such as
the critical isobar, will yield different exponents. These other directions are not unique; there is no such thing
as orthogonality in thermodynamics. Along the critical isobar, the compressibility divergence is still strong,
but the exponent is reduced by renormalization from $\gamma$ to $\gamma/\beta\delta$, nearly a 40% reduction. The weak divergence
of $C_V$ is reduced by a similar amount from $\alpha$ to $\alpha/\beta\delta$.

Another feature arising from field-density considerations concerns the coexistence curves. For one-component 
fluids, they are usually shown as temperature T versus density p, and for two-component systems, as 
temperature versus composition (e.g. the mole fraction x); in both cases one field is plotted against one 
density. However in three-component systems, the usual phase diagram is a triangular one at constant 
temperature; this involves two densities as independent variables. In such situations exponents may be 
'renormalized' to higher values; thus the coexistence curve exponent may rise to $\beta/(1 - \alpha)$. (This
'renormalization' has nothing to do with the 'renormalization group' to be discussed in the next section.) 
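
The size of these path-dependent changes is easy to evaluate. The sketch below inserts the three-dimensional Ising values $\beta = 0.3258$, $\gamma = 1.2396$ and $\delta = 4.8047$, takes $\alpha$ from the scaling equality $\alpha = 2 - 2\beta - \gamma$, and computes the exponents $\gamma/\beta\delta$ and $\alpha/\beta\delta$ along the critical isobar together with the exponent $\beta/(1 - \alpha)$ for a coexistence curve drawn with two densities as variables.

```python
BETA, GAMMA, DELTA = 0.3258, 1.2396, 4.8047   # d = 3 Ising values
ALPHA = 2.0 - 2.0 * BETA - GAMMA              # from the scaling equality

beta_delta = BETA * DELTA

# Critical isobar: exponents are divided by beta * delta
gamma_isobar = GAMMA / beta_delta
alpha_isobar = ALPHA / beta_delta
reduction = 1.0 - 1.0 / beta_delta            # fractional reduction

# Two densities held as independent variables: beta -> beta / (1 - alpha)
beta_two_densities = BETA / (1.0 - ALPHA)

print(f"gamma: {GAMMA:.4f} -> {gamma_isobar:.4f}")
print(f"alpha: {ALPHA:.4f} -> {alpha_isobar:.4f}")
print(f"fractional reduction: {reduction:.3f}")
print(f"beta:  {BETA:.4f} -> {beta_two_densities:.4f}")
```

With these inputs the reduction along the critical isobar comes out at about 36%, consistent with the "nearly 40%" quoted above, and the coexistence exponent rises from about 0.326 to about 0.366.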

Finally the concept of fields permits clarification of the definition of the order of transitions [22]. If one
considers a space of all fields (e.g. figure A2.5.1 but not figure A2.5.3), a first-order transition occurs where
there is a discontinuity in the first derivative of one of the fields with respect to another (e.g. $(\partial\mu/\partial T)_p = -S$
and $(\partial\mu/\partial p)_T = V$), while a second-order transition occurs when the corresponding first derivative is
continuous but the second is not and so on. Thus the Ehrenfest-Pippard definitions are preserved if the paths
are not defined in terms of any densities.

A feature of a critical point, line, or surface is that it is located where divergences of various properties, in
particular correlation lengths, occur. Moreover it is reasonable to assume that at such a point there is always
an order parameter that is zero on one side of the transition and that becomes nonzero on the other side.
Nothing of this sort occurs at a first-order transition, even the gradual liquid-gas transition shown in figure
A2.5.3 and figure A2.5.4.

A2.5.6.5 THE CALCULATION OF EXPONENTS 


From 1965 on there was an extensive effort to calculate, or rather to estimate, the exponents for the Ising 
model. Initially this usually took the form of trying to obtain a low-temperature expansion (i.e. in powers of T)
or a high-temperature expansion (i.e. in powers of 1/T) of the partition function, in the hope of obtaining
information about the ultimate form of the series, and hence to learn about the singularities at the critical
point. Frequently this effort took the form of converting the finite series (sometimes with as many as 25
terms) into a Padé approximant, the ratio of two finite series. From this procedure, estimates of the various
critical exponents (normally as the ratio of two integers) could be obtained. For the two-dimensional Ising 
model these estimates agreed with the values deduced by Onsager and Yang, which encouraged the belief that 
those for the three-dimensional model might be nearly correct. Indeed the d = 3 exponents estimated from 
theory were in reasonable agreement with those deduced from experiments close to the critical point. In this 
period much of the theoretical progress was made by Domb, Fisher, Kadanoff, and their coworkers. 
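
The idea behind the Padé analysis can be sketched on a synthetic series. The example below is an illustration under simplifying assumptions, not a reconstruction of any published calculation: it generates the Taylor coefficients of a test function $f(x) = (1 - x/x_c)^{-\gamma}$, forms the series of the logarithmic derivative $f'/f$, and fits the lowest-order [0/1] Padé approximant, whose pole estimates $x_c$ and whose residue estimates $\gamma$. Analyses of real series generally need higher-order approximants.

```python
def power_law_series(gamma, x_c, n_terms):
    """Taylor coefficients of the test function (1 - x/x_c)**(-gamma)."""
    coeffs = [1.0]
    for n in range(1, n_terms):
        coeffs.append(coeffs[-1] * (gamma + n - 1) / (n * x_c))
    return coeffs


def log_derivative_series(a):
    """Coefficients of f'/f given the coefficients a of f (a[0] != 0)."""
    g = []
    for n in range(len(a) - 1):
        acc = (n + 1) * a[n + 1]          # coefficient of x**n in f'
        for k in range(n):
            acc -= g[k] * a[n - k]        # remove lower-order contributions
        g.append(acc / a[0])
    return g


def dlog_pade_01(a):
    """[0/1] Pade approximant g0 / (1 - (g1/g0) x) to f'/f.

    Returns (estimate of x_c, estimate of gamma).
    """
    g = log_derivative_series(a)
    x_c = g[0] / g[1]      # location of the pole
    gamma = g[0] * x_c     # residue at the pole gives the exponent
    return x_c, gamma


# Synthetic input: exponent 7/4 with an arbitrary critical value x_c = 0.4
series = power_law_series(1.75, 0.4, 6)
x_c_est, gamma_est = dlog_pade_01(series)
print(x_c_est, gamma_est)
```

For a pure power law the logarithmic derivative is exactly $\gamma/(x_c - x)$, so the lowest approximant recovers both inputs; for a real series the estimates drift with the order of the approximant, and that drift is what gave the uncertainties of the pre-RG estimates.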

In 1971 Wilson [23] recognized the analogy between quantum-field theory and the statistical mechanics of
critical phenomena and developed a renormalization-group (RG) procedure that was quickly recognized as a
better approach for dealing with the singularities at the critical point. New calculation methods were
developed, one of which, expansion in powers of $\varepsilon = 4 - d$, where d is the dimension taken as a continuous
variable, was first proposed by Wilson and Fisher (1972). These new procedures led to theoretical values of
the critical exponents with much smaller estimates of uncertainty. The best current values are shown in table
A2.5.1 in section A2.5.7.1. The RG method does assume, without proof, the homogeneity hypothesis and thus
that the exponent inequalities are equalities. Some might wish that these singularities and exponents could be 
derived from a truly molecular statistical-mechanical theory; however, since the singular behaviour arises 
from the approach of the correlations to infinite distance, this does not seem likely in the foreseeable future. 
This history, including a final chapter on the renormalization group, is discussed in detail in a recent (1996) 
book by Domb [23]. 

Table A2.5.1 Ising model exponents.

Exponent                                                                          d = 2    d = 3              Classical ($d \geq 4$)
$\alpha$   Heat capacity, $C_V$, $C_{p,x}$, $C_M$                                 (log)    0.109 ± 0.004      (finite jump)
$\beta$    Coexistence, $\Delta\rho$, $\Delta x$, $\Delta M$                      1/8      0.3258 ± 0.0014    1/2
$\gamma$   Compressibility, $\kappa_T$, $\alpha_p$, $\kappa_{T,\mu}$, $\chi_T$    7/4      1.2396 ± 0.0013    1
$\delta$   Critical isotherm, p(V), $\mu(x)$, B(M)                                15       4.8047 ± 0.0044    3
$\nu$      Correlation length, $\xi$                                              1        0.6304 ± 0.0013    1/2
$\eta$     Critical correlation function                                          1/4      0.0335 ± 0.0025    0


A2.5.6.6 EXTENDED SCALING. WEGNER CORRECTIONS 


In 1972 Wegner [25] derived a power-series expansion for the free energy of a spin system represented by a
Hamiltonian roughly equivalent to the scaled equation (A2.5.28), and from this he obtained power-series
expansions of various thermodynamic quantities around the critical point. For example the compressibility
can be written as

$$\kappa_T = \Gamma_0 t^{-\gamma} + \Gamma_1 t^{-\gamma + \Delta_1} + \Gamma_2 t^{-\gamma + \Delta_2} + \cdots.$$

The new parameters in the exponents, $\Delta_1$ and $\Delta_2$, are exactly or very nearly 0.50 and 1.00 respectively.
Similar equations apply to the 'extended scaling' of the heat capacity and the coexistence curve for the
determination of $\alpha$ and $\beta$.

The Wegner corrections have been useful in analysing experimental results in the critical region. The 'correct'
exponents are the limiting values as $T_r$ approaches unity, not the average values over a range of temperatures.
Unfortunately the Wegner expansions do not converge very quickly (if they converge at all), so the procedure
does not help in handling a crossover to the mean-field behaviour at lower temperatures where the correlation
length is of the same order of magnitude as the range of intermolecular forces. A consistent method of
handling crossover is discussed in section A2.5.7.2.
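
The distinction between a limiting exponent and an average one can be made concrete with a short sketch. It assumes a compressibility with a single correction term, $\kappa_T \propto t^{-\gamma}(1 + a_1 t^{\Delta_1})$, with $\Delta_1 = 0.5$ and a hypothetical amplitude $a_1 = 1$ (the leading amplitude is set to unity), and evaluates the effective exponent $\gamma_{\mathrm{eff}} = -\mathrm{d}\ln\kappa_T/\mathrm{d}\ln t$ at several distances from the critical point.

```python
import math

GAMMA = 1.2396    # d = 3 Ising value
DELTA_1 = 0.5     # first correction exponent
A_1 = 1.0         # hypothetical correction amplitude


def kappa(t):
    """Model compressibility with one correction-to-scaling term."""
    return t ** (-GAMMA) * (1.0 + A_1 * t ** DELTA_1)


def gamma_eff(t, h=1e-4):
    """Effective exponent -d ln(kappa)/d ln(t), centred difference in ln t."""
    up = math.log(kappa(t * math.exp(h)))
    down = math.log(kappa(t * math.exp(-h)))
    return -(up - down) / (2.0 * h)


for t in (1e-2, 1e-3, 1e-4, 1e-6):
    print(f"t = {t:.0e}   gamma_eff = {gamma_eff(t):.4f}")
```

With these assumed parameters the effective exponent at t = 0.01 lies about 0.045 below the limiting value and is still visibly shifted at t = 0.0001, so a fit over a finite temperature range returns an average exponent rather than $\gamma$ itself.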

A2.5.6.7 SOME EXPERIMENTAL PROBLEMS 

The scientific studies of the early 1970s are full of concern whether the critical exponents determined 
experimentally, particularly those for fluids, could be reconciled with the calculated values, and at times it 
appeared that they could not be. However, not only were the theoretical values more uncertain (before RG 
calculations) than first believed, but also there were serious problems with the analysis of the experiments, in 
addition to those associated with the Wegner corrections outlined above. Scott [26] has discussed in detail
experimental difficulties with binary fluid mixtures, but some of the problems he cited apply to one-component
fluids as well.

An experiment in the real world has to deal with gravitational effects. There will be gravity-induced density 
gradients and concentration gradients such that only at one height in an experimental cell will the system be 
truly at the critical point. To make matters worse, equilibration in the critical region is very slow. These 
problems will lead to errors of uncertain magnitude in the determination of all the critical exponents. For 
example, the observed heat capacity will not display an actual divergence because the total enthalpy is 
averaged over the whole cell and only one layer is at the critical point. 

Another problem can be the choice of an order parameter for the determination of $\beta$ and of the departure from
linearity of the diameter, which should be proportional to $t^{1-\alpha}$. In the symmetrical systems, the choice of the
order parameter s is usually obvious, and the symmetry enforces a rectilinear diameter. Moreover, in the one-component
fluid, the choice of the reduced density $\rho/\rho_c$ has always seemed the reasonable choice. However,
for the two-component fluid, there are two order parameters, density and composition. It is not the density $\rho$
that drives the phase separation, but should the composition order parameter be mole fraction x, volume
fraction $\phi$, or what? For the coexistence exponent $\beta$ the choice is ultimately immaterial if one gets close
enough to the critical temperature, although some choices are better than others in yielding an essentially
cubic curve over a greater range of reduced temperature. (Try plotting the van der Waals coexistence curve
against molar volume V instead of density $\rho$.) However this ambiguity can have a very serious effect on any
attempt to look for experimental evidence for departures from the rectilinear diameter in binary mixtures; an
unwise choice for the order parameter can yield an exponent $2\beta$ rather than the theoretical $1 - \alpha$ (previously
discussed in section A2.5.6.2) thus causing a much greater apparent departure from linearity.


A2.5.7 THE CURRENT STATUS OF THE ISING MODEL; THEORY AND 
EXPERIMENT 

Before reviewing the current knowledge about Ising systems, it is important to recognize that there are non-Ising
systems as well. A basic feature of the Ising model is that the order parameter is a scalar, even in the
magnetic system of spin 1/2. If the order parameter is treated as a vector, it has a dimensionality n, such that
n = 1 signifies a scalar (the Ising model), n = 2 signifies a vector with two components (the XY model), n = 3
signifies a three-component vector (the Heisenberg model), $n = \infty$ is an unphysical limit to the vector concept
(the so-called spherical model), and $n \to 0$ is a curious mathematical construct that seems to fit critical
phenomena in some polymer equilibria. Some of these models will be discussed in subsequent sections, but
first we limit ourselves to the Ising model.

A2.5.7.1 THE ISING EXPONENTS AND AMPLITUDES

There is now consensus on some questions about which there had been lingering doubts. 


(a) There is now agreement between experiment and theory on the Ising exponents. Indeed it is now 
reasonable to assume that the theoretical values are better, since their range of uncertainty is less. 

(b) There is no reason to doubt that the inequalities of section A2.5.4.5(e) are other than equalities. The
equalities are assumed in most of the theoretical calculations of exponents, but they are confirmed 
(within experimental error) by the experiments. 

(c) The exponents apply not only to solid systems (e.g. order-disorder phenomena and simple magnetic 
systems), but also to fluid systems, regardless of the number of components. (As we have seen in section 
A2.5.6.4 it is necessary in multicomponent systems to choose carefully the variable to which the 
exponent is appropriate.) 

(d) There is no distinction between the exponents above and below the critical temperature. Thus $\gamma^+ = \gamma^- = \gamma$
and $\alpha^+ = \alpha^- = \alpha$. However, there is usually a significant difference in the coefficients above and below (e.g. $A^+$ and
$A^-$); this produces the discontinuities at the critical point.

Many of the earlier uncertainties arose from apparent disagreements between the theoretical values and 
experimental determinations of the critical exponents. These were resolved in part by better calculations, but 
mainly by measurements closer and closer to the critical point. The analysis of earlier measurements assumed 
incorrectly that the measurements were close enough. (Van der Waals and van Laar were right that one 
needed to get closer to the critical point, but were wrong in expecting that the classical exponents would then 
appear.) As was shown in section A2.5.6.6, there are additional contributions from 'extended' scaling.

Moreover, some uncertainty was expressed about the applicability to fluids of exponents obtained for the 
Ising lattice. Here there seemed to be a serious discrepancy between theory and experiment, only cleared up 
by later and better experiments. By hindsight one should have realized that long-range fluctuations should be 
independent of the presence or absence of a lattice. 

Table A2.5.1 shows the Ising exponents for two and three dimensions, as well as the classical exponents. The 
uncertainties are those reported by Guida and Zinn-Justin [27]. These exponent values satisfy the equalities
(as they must, considering the scaling assumption) which are here reprised as functions of $\beta$ and $\gamma$:

$$\begin{aligned}
\alpha &= 2 - 2\beta - \gamma \\
\delta &= (\beta + \gamma)/\beta \\
\eta &= 2 - d\gamma/(2\beta + \gamma) = 2 - \gamma/\nu.
\end{aligned}$$

The small uncertainties in the calculated exponents seem to preclude the possibility that the d = 3 exponents 
are rational numbers (i.e. the ratio of integers). (At an earlier stage this possibility had been suggested, since 
not only the classical exponents, but also the d = 2 exponents are rational numbers; pre-RG calculations had 
suggested $\beta = 5/16$ and $\gamma = 5/4$.)
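
As a consistency check, the equalities can be evaluated numerically. The sketch below starts from the d = 3 values of $\beta$ and $\gamma$ in table A2.5.1, derives $\alpha$, $\delta$, $\nu$ and $\eta$ from the equalities $\alpha = 2 - 2\beta - \gamma$, $\delta = (\beta + \gamma)/\beta$, $\nu = (2\beta + \gamma)/d$ and $\eta = 2 - \gamma/\nu$, and compares each with the tabulated value and its quoted uncertainty.

```python
D = 3
BETA, GAMMA = 0.3258, 1.2396     # d = 3 entries of table A2.5.1

derived = {
    "alpha": 2.0 - 2.0 * BETA - GAMMA,
    "delta": (BETA + GAMMA) / BETA,
    "nu": (2.0 * BETA + GAMMA) / D,
}
derived["eta"] = 2.0 - GAMMA / derived["nu"]

# Tabulated (value, uncertainty) pairs for d = 3
TABLE = {
    "alpha": (0.109, 0.004),
    "delta": (4.8047, 0.0044),
    "nu": (0.6304, 0.0013),
    "eta": (0.0335, 0.0025),
}

for name, (value, sigma) in TABLE.items():
    diff = derived[name] - value
    status = "within" if abs(diff) <= sigma else "outside"
    print(f"{name:5s} derived {derived[name]:.4f}  table {value:.4f}  ({status} uncertainty)")
```

All four derived values fall inside the quoted uncertainties, as expected for exponents calculated under the scaling assumption.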

As noted earlier in section A2.5.6.2, the assumption of homogeneity and the resulting principle of two-scale-factor
universality requires the amplitude coefficients to be related. In particular the following relations can be
derived:

$$\begin{aligned}
\alpha A^+\Gamma^+/B^2 &= 0.0574 \pm 0.0020 \\
\Gamma^+ D B^{\delta - 1} &= 1.669 \pm 0.018 \\
A^+/A^- &= 0.537 \pm 0.019 \\
\Gamma^+/\Gamma^- &= 4.79 \pm 0.10.
\end{aligned}$$

These numerical values come from theory [26] and are in good agreement with recent experiments.

A2.5.7.2 CROSSOVER FROM MEAN-FIELD TO THE CRITICAL REGION

At temperatures well below the critical region one expects a mean-field treatment (or at least a fully analytic 
one) to be applicable, since the correlations will be short range. In the critical region, as we have seen, when 
the correlation length becomes far greater than the range of intermolecular forces, the mean-field treatment 
fails. Somewhere between these two limits the treatment of the problem has to 'cross over'. Early attempts to 
bridge the gap between the two regimes used switching functions, and various other solutions have been 
proposed. A reasonably successful treatment has been developed during the past few years by Anisimov and 
Sengers and their collaborators. (Detailed references will be found in a recent review chapter [28].) 

As a result of long-range fluctuations, the local density will vary with position; in the classical Landau-Ginzburg
theory of fluctuations this introduces a gradient term. A Ginzburg number $N_G$ is defined (for a
three-dimensional Ising system) as proportional to a dimensionless parameter $v_0^2/\xi_0^6$ which may be regarded as
the inverse sixth power of a normalized interaction range. ($\xi_0$ is the coefficient of the correlation length
equation in section A2.5.6.3 and $v_0$ is a molecular volume.) The behaviour of the fluid will be nonanalytic
(Ising-like) when $\tau = (T_c - T)/T = t/(1 - t)$ is much smaller than $N_G$, while it is analytic (van der Waals-like)
when $\tau$ is much greater than $N_G$. A significant result of this recent research is that the free energy can be
rescaled to produce a continuous function over the whole range of temperatures.

For simple fluids $N_G$ is estimated to be about 0.01, and Kostrowicka Wyczalkowska et al [29] have used this
to apply crossover theory to the van der Waals equation with interesting results. The critical temperature $T_c$ is
reduced by 11% and the coexistence curve is of course flattened to a cubic. The critical density $\rho_c$ is almost
unchanged (by 2%), but the critical pressure $p_c$ is reduced greatly by 38%. These changes reduce the critical
compression factor $(pV/RT)_c$ from 0.375 to 0.26; the experimental value for argon is 0.29. The molar heat
capacity $C_V$ for the classical van der Waals fluid and the crossover van der Waals fluid are compared in figure
A2.5.26.
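
The change in the critical compression factor follows directly from the quoted shifts, as this short sketch shows. It starts from the classical van der Waals value $(pV/RT)_c = 3/8$ and applies the 11% reduction in $T_c$ and the 38% reduction in $p_c$; the sign of the 2% change in $\rho_c$ is not stated in the text, so both signs are tried.

```python
Z_C_CLASSICAL = 3.0 / 8.0     # van der Waals value of (pV/RT)_c

T_FACTOR = 1.0 - 0.11         # critical temperature reduced by 11%
P_FACTOR = 1.0 - 0.38         # critical pressure reduced by 38%


def crossover_zc(rho_shift):
    """Compression factor after the shifts; rho_shift is the fractional
    change in critical density (sign assumed, magnitude 2%)."""
    rho_factor = 1.0 + rho_shift
    # Z_c = p_c / (rho_c R T_c), so it scales as p / (rho * T)
    return Z_C_CLASSICAL * P_FACTOR / (rho_factor * T_FACTOR)


z_low = crossover_zc(+0.02)
z_high = crossover_zc(-0.02)
print(f"classical Z_c = {Z_C_CLASSICAL:.3f}")
print(f"crossover Z_c between {z_low:.3f} and {z_high:.3f}")
```

Either sign gives a value close to 0.26, noticeably nearer to the compression factors of real simple fluids than the classical 0.375.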

Povodyrev et al [30] have applied crossover theory to the Flory equation (section A2.5.4.1) for polymer
solutions for various values of N, the number of monomer units in the polymer chain, obtaining the
coexistence curve and values of the coefficient $\beta_{\mathrm{eff}}$ from the slope of that curve. Figure A2.5.27 shows their
comparison between classical and crossover values of $\beta_{\mathrm{eff}}$ for N = 1, which is of course just the simple
mixture. As seen in this figure, the crossover to classical behaviour is not complete until far below the critical
temperature.

Figure A2.5.27. The effective coexistence curve exponent $\beta_{\mathrm{eff}} = \mathrm{d}\ln x/\mathrm{d}\ln\tau$ for a simple mixture (N = 1) as a
function of the temperature parameter $\tau = t/(1 - t)$ calculated from crossover theory and compared with the
corresponding curve from mean-field theory (i.e. from figure A2.5.15). Reproduced from [30], Povodyrev A
A, Anisimov M A and Sengers J V 1999 Crossover Flory model for phase separation in polymer solutions
Physica A 264 358, figure 3, by permission of Elsevier Science.

Sengers and coworkers (1999) have made calculations for the coexistence curve and the heat capacity of the
real fluid $\mathrm{SF_6}$ and the real mixture 3-methylpentane + nitroethane and the agreement with experiment is
excellent; their comparison for the mixture [28] is shown in figure A2.5.28.

Figure A2.5.28. The coexistence curve and the heat capacity of the binary mixture 3-methylpentane +
nitroethane. The circles are the experimental points, and the lines are calculated from the two-term crossover
model. Reproduced from [28], 2000 Supercritical Fluids: Fundamentals and Applications ed E Kiran, P G
Debenedetti and C J Peters (Dordrecht: Kluwer) Anisimov M A and Sengers J V Critical and crossover
phenomena in fluids and fluid mixtures, p 16, figure 3, by kind permission from Kluwer Academic Publishers.

However, for more complex fluids such as high-polymer solutions and concentrated ionic solutions, where the
range of intermolecular forces is much longer than that for simple fluids and $N_G$ is much smaller, mean-field
behaviour is observed much closer to the critical point. Thus the crossover is sharper, and it can also be
nonmonotonic.


A2.5.8 OTHER EXAMPLES OF SECOND-ORDER TRANSITIONS 

There are many other examples of second-order transitions involving critical phenomena. Only a few can be 
mentioned here. 

A2.5.8.1 TWO-DIMENSIONAL ISING SYSTEMS 

No truly two-dimensional systems exist in a three-dimensional world. However, monolayers adsorbed on
crystalline or fluid surfaces offer an approximation to two-dimensional behaviour. Chan and coworkers [31]
have measured the coexistence curve for methane adsorbed on graphite by an ingenious method of
determining the maximum in the heat capacity at various coverages. The coexistence curve (figure A2.5.29) is
fitted to $\beta = 0.127$, very close to the theoretical 1/8. A 1992 review [32] summarizes the properties of rare
gases on graphite.
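
The exponent quoted here enters through the power law that describes the width of the coexistence curve close
to the critical point, width $\propto (1 - T/T_c)^{\beta}$. As an illustration only, the sketch below recovers
$\beta$ as the slope of a log-log plot. The points are synthetic, generated from the power law itself with
$\beta = 1/8$; they are not the data of [31], and the amplitude, the reduced temperatures and the function names
are all assumptions.

```python
import math

def coexistence_width(t, beta, amplitude=1.0):
    """Width of the coexistence curve at reduced temperature t = 1 - T/Tc > 0."""
    return amplitude * t ** beta

def fit_exponent(ts, widths):
    """Least-squares slope of ln(width) against ln(t)."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(w) for w in widths]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

# Synthetic points (an assumption, not measured values): t = 1e-1 ... 1e-5
ts = [10.0 ** (-k) for k in range(1, 6)]
widths = [coexistence_width(t, beta=1.0 / 8.0) for t in ts]

beta_fit = fit_exponent(ts, widths)
print(f"fitted beta = {beta_fit:.4f}")
```

With real measurements the fitted slope depends on the temperature range used and on the assumed critical
temperature, which is why a fitted value such as 0.127 is compared with, rather than identified with, 1/8.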

Figure A2.5.29. Peak positions of the liquid-vapour heat capacity as a function of methane coverages on
graphite. These points trace out the liquid-vapour coexistence curve. The full curve is drawn for $\beta = 0.127$.
Reproduced from [31] Kim H K and Chan M H W Phys. Rev. Lett 53 171 (1984) figure 2. Copyright (1984) 
by the American Physical Society. 

A2.5.8.2 THE XY MODEL (N = 2) 

If the scalar order parameter of the Ising model is replaced by a two-component vector ($n = 2$), the XY model
results. An important example that satisfies this model is the $\lambda$-transition in helium, from superfluid helium-II
to ordinary liquid helium, occurring for the isotope $^4$He and for mixtures of $^4$He with $^3$He. (This is the
transition at 2.2 K, not the liquid-gas critical point at 5.2 K, which is Ising.) Calculations indicate that at the
$n = 2$ transition, the heat capacity exponent $\alpha$ is very small, but negative. If so, the heat capacity does not
diverge, but rather reaches a maximum just at the $\lambda$-point, as shown in the following equation:


$$C(n = 2, d = 3) = C_{\max} - A^{\pm}|t|^{-\alpha}$$

where $C_{\max}$ is the value at the $\lambda$-transition. At first this prediction was hard to distinguish experimentally
from a logarithmic divergence, but experiments in space under conditions of microgravity by Lipa and
coworkers (1996) have confirmed it [33] with $\alpha = -0.01285$, a value within the limits of uncertainty of the
theoretical calculations. The results above and below the transition were fitted to the same values of $C_{\max}$ and
$\alpha$ but with $A^+/A^- = 1.054$. Since the heat capacity is finite and there is no discontinuity, this should perhaps be
called a third-order transition.
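
To see how a small negative exponent produces a finite cusp rather than a divergence, the sketch below evaluates
$C = C_{\max} - A^{\pm}|t|^{-\alpha}$ with $\alpha = -0.01285$ and $A^+/A^- = 1.054$ as quoted. The values
$C_{\max} = 1$ and $A^- = 1$ are arbitrary illustrative units, not fitted values from [33], and the function name
is hypothetical.

```python
ALPHA = -0.01285         # heat capacity exponent quoted in the text
AMPLITUDE_RATIO = 1.054  # A+/A- quoted in the text

def heat_capacity(t, c_max=1.0, a_minus=1.0):
    """C(t) = C_max - A(+/-) * |t|**(-alpha); t is the reduced temperature.

    c_max and a_minus are illustrative units (assumptions), not fitted values.
    """
    if t == 0.0:
        return c_max
    # A+ applies above the transition (t > 0), A- below it (t < 0)
    amplitude = a_minus * AMPLITUDE_RATIO if t > 0 else a_minus
    return c_max - amplitude * abs(t) ** (-ALPHA)

for t in (-1e-2, -1e-4, -1e-8, 0.0, 1e-8, 1e-4, 1e-2):
    print(f"t = {t:+.0e}   C = {heat_capacity(t):.4f}")
```

Because $|t|^{0.01285}$ falls towards zero only very slowly, the computed curve keeps rising over many decades of
$|t|$, which is why the cusp is hard to tell apart from a logarithmic divergence.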

The liquid-crystal transition between smectic-A and nematic for some systems is an XY transition. Depending
on the value of the McMillan ratio, the ratio of the temperature of the smectic-A-nematic transition to that of
the nematic-isotropic transition (which is Ising), the behaviour of such systems varies continuously from a
$\lambda$-type transition to a tricritical one (see section A2.5.9). Garland and Nounesis [34] reviewed these systems in
1994.

A2.5.8.3 THE HEISENBERG MODEL (N = 3) 

While the behaviour of some magnetic systems is Ising-like, others require a three-dimensional vector. In the
limit where the value of the quantum number $J$ goes to infinity (i.e. where all values of the magnetic
quantum number $M$ are possible), the Heisenberg model ($n = 3$) applies. The exponents $\beta$ and $\gamma$ are somewhat
larger than the Ising or XY values; the exponent $\alpha$ is substantially negative (about $-0.12$).

A2.5.8.4 POLYMERIZATION SYSTEMS (N -> 0) 

Some equilibrium polymerizations are such that over a range of temperatures only the monomer exists in any 
significant quantity, but below or above a unique temperature polymers start to form in increasing number. 
Such a polymerization temperature is a critical point, another kind of second-order transition. The classic 
example is that of the ring-chain transition in sulfur, but more recently similar behaviour has been found in a 
number of 'living polymers'. Wheeler and coworkers [35] have shown that these systems can best be treated 
as examples of the mathematical limit of the $n$-vector model with $n \to 0$. The heat capacity in such a system
diverges more strongly than that of an Ising system ($\alpha = 0.235$ [27]); the heat capacity of sulfur fits the model
qualitatively, but there are chemical complications. 

Mixtures of such polymeric substances with solvents show a line of critical points that in theory ends at a
tricritical point. (See section A2.5.9 for further discussion of tricritical phenomena.)

A2.5.8.5 SUPERCONDUCTIVITY 

Alone among all known physical phenomena, the transition in low-temperature ($T < 25$ K) superconducting
materials (mainly metals and alloys) retains its classical behaviour right up to the critical point; thus the
exponents are the analytic ones. Unlike the situation in other systems, such superconducting interactions are
truly long range and thus mean field. For the newer high-temperature superconducting materials, the situation
is different. These substances crystallize in structures that require a two-component order parameter and show
XY behaviour, usually three dimensional (i.e. $n = 2$, $d = 3$). Pasler et al [36] have measured the thermal
expansivity of YBa$_2$Cu$_3$O$_{7-\delta}$ and have found the exponent $\alpha$ to be $0 \pm 0.018$, which is consistent with the
small negative value calculated for the XY model and found for the $\lambda$-transition in helium.


A2.5.9 MULTICRITICAL POINTS 

An ordinary critical point such as those discussed in earlier sections occurs when two phases become more
and more nearly alike and finally become one. Because this involves two phases, it is occasionally called a
'bicritical point'. A point where three phases simultaneously become one is a 'tricritical point'. There are two 
kinds of tricritical points, symmetrical and unsymmetrical; there is a mathematical similarity between the two, 
but the physical situation is so different that they need to be discussed quite separately. One feature that both 
kinds have in common is that the dimension at and above which modern theory yields agreement between 
'classical' and 'nonclassical' treatments is d = 3, so that analytic treatments (e.g. mean-field theories) are 
applicable to paths leading to tricritical points, unlike the situation with ordinary critical points where the 
corresponding dimension is d = 4. (In principle there are logarithmic corrections to these analytic predictions 
for d = 3, but they have never been observed directly in experiments.) 

A 1984 volume reviews in detail theories and experiments [37] on multicritical points; some important papers 
have appeared since that time. 

A2.5.9.1 SYMMETRICAL TRICRITICAL POINTS 

In the absence of special symmetry, the phase rule requires a minimum of three components for a tricritical 
point to occur. Symmetrical tricritical points do have such symmetry, but it is easiest to illustrate such 
phenomena with a true ternary system with the necessary symmetry. A ternary system comprised of a pair of 
enantiomers (optically active d- and l-isomers) together with a third optically inert substance could satisfy this
condition. While liquid-liquid phase separation between enantiomers has not yet been found, ternary phase
diagrams like those shown in figure A2.5.30 can be imagined; in these diagrams there is a necessary
symmetry around a horizontal axis that represents equal amounts of the two enantiomers. 

Figure A2.5.30. Left-hand side: Eight hypothetical phase diagrams (A through H) for ternary mixtures of d-
and l-enantiomers with an optically inactive third component. Note the symmetry about a line corresponding
to a racemic mixture. Right-hand side: Four T, x diagrams ((a) through (d)) for 'pseudobinary' mixtures of a 
racemic mixture of enantiomers with an optically inactive third component. Reproduced from [37] 1984 
Phase Transitions and Critical Phenomena ed C Domb and J Lebowitz, vol 9, ch 2, Knobler C M and Scott R 
L Multicritical points in fluid mixtures. Experimental studies pp 213-14, (Copyright 1984) by permission of 
the publisher Academic Press. 


Now consider such a symmetrical system, that of a racemic mixture of the enantiomers plus the inert third 
component. A pair of mirror-image conjugate phases will not physically separate or even become turbid, since
they have exactly the same density and the same refractive index. Unless we find evidence to the contrary, we 
might conclude that this is a binary mixture with a T, x phase diagram like one of those on the right-hand side 
of figure A2.5.30. In particular any symmetrical three-phase region will have to shrink symmetrically, so it 
may disappear at a tricritical point, as shown in two of the four 'pseudobinary' diagrams. The dashed lines in 
these diagrams are two-phase critical points, and will show the properties of a second-order transition. Indeed, 
a feature of these diagrams is that with increasing temperature, a first-order transition ends at a tricritical point 
that is followed by a second-order transition line. (This is even more striking if the phase diagram is shown in
field space as a $p$, $T$ or $\mu$, $T$ diagram.)

These unusual 'pseudobinary' phase diagrams were derived initially by Meijering (1950) from a 'simple 
mixture' model for ternary mixtures. Much later, Blume, Emery and Griffiths (1971) deduced the same 
diagrams from a three-spin model of helium mixtures. The third diagram on the right of figure A2.5.30 is
essentially that found experimentally for the fluid mixture $^4$He + $^3$He; the dashed line (second-order transition)
is that of the $\lambda$-transition.

Symmetrical tricritical points are predicted for fluid mixtures of sulfur or living polymers in certain solvents. 
Scott (1965) in a mean-field treatment [38] of sulfur solutions found that a second-order transition line (the
critical polymerization line) ended where two-phase separation of the polymer and the solvent begins; the theory
yields a tricritical point at that point. Later Wheeler and Pfeuty [39] extended their $n \to 0$ treatment of
equilibrium polymerization to sulfur solutions; in mean field their theory reduces to that of Scott, and the 
predictions from the nonclassical formulation are qualitatively similar. The production of impurities by slow 
reaction between sulfur and the solvent introduces complications; it can eliminate the predicted three-phase 
equilibrium, flatten the coexistence curve and even introduce an unsymmetrical tricritical point. 

Symmetrical tricritical points are also found in the phase diagrams of some systems forming liquid crystals. 

A2.5.9.2 UNSYMMETRICAL TRICRITICAL POINTS 

While, in principle, a tricritical point is one where three phases simultaneously coalesce into one, that is not 
what would be observed in the laboratory if the temperature of a closed system is increased along a path that 
passes exactly through a tricritical point. Although such a difficult experiment is yet to be performed, it is 
clear from theory (Kaufman and Griffiths 1982, Pegg et al 1990) and from experiments in the vicinity of 
tricritical points that below the tricritical temperature $T_t$ only two phases coexist and that the volume of one
shrinks precipitously to zero at $T_t$.

While the phase rule requires three components for an unsymmetrical tricritical point, theory can reduce this 
requirement to two components with a continuous variation of the interaction parameters. Lindh et al (1984) 
calculated a phase diagram from the van der Waals equation for binary mixtures and found (in accord with
figure A2.5.13) that a tricritical point occurred at sufficiently large values of the parameter $\zeta$ (a measure of the
difference between the two components).

One can effectively reduce the three components to two with 'quasibinary' mixtures in which the second 
component is a mixture of very similar higher hydrocarbons. Figure A2.5.31 shows a phase diagram [40]
calculated from a generalized van der Waals equation for mixtures of ethane ($n_1 = 2$) with normal
hydrocarbons of different carbon number $n_2$ (treated as continuous). It is evident that, for some values of the
parameter $n_2$, those to the left of the tricritical point at $n_2 = 16.48$, all that will be observed with increasing
temperature is a two-phase region ($\alpha + \beta$) above which only the $\beta$ phase exists. Conversely, for larger values
of $n_2$, those to the right of the tricritical point, increasing the temperature takes the system from the two-phase
region ($\alpha + \beta$) through a narrow three-phase region ($\alpha + \beta + \gamma$) to a different two-phase region ($\beta + \gamma$).

Figure A2.5.31. Calculated $T/T_{tc}$, $n_2$ phase diagram in the vicinity of the tricritical point for binary mixtures
of ethane ($n_1 = 2$) with a higher hydrocarbon of continuous $n_2$. The system is in a sealed tube at fixed
tricritical density and composition. The tricritical point is at the confluence of the four lines. Because of the 
fixing of the density and the composition, the system does not pass through critical end points; if the critical 
end-point lines were shown, the three-phase region would be larger. An experiment increasing the 
temperature in a closed tube would be represented by a vertical line on this diagram. Reproduced from [40], 
figure 8, by permission of the American Institute of Physics. 

Most of the theoretical predictions have now been substantially verified by a large series of experiments in a 
number of laboratories. Knobler and Scott and their coworkers (1977-1991) have studied a number of 
quasibinary mixtures, in particular ethane + (hexadecane + octadecane) for which the experimental $n_2 = 17.6$.
Their experimental results essentially confirm the theoretical predictions shown in figure A2.5.31. 

A2.5.9.3 HIGHER-ORDER CRITICAL POINTS 

Little is known about higher order critical points. Tetracritical points, at least unsymmetrical ones, require 
four components. However, for tetracritical points the crossover dimension is $d = 2$, so any treatment can surely
be mean-field, or at least analytic. 


A2.5.10 HIGHER-ORDER PHASE TRANSITIONS 

We have seen in previous sections that the two-dimensional Ising model yields a symmetrical heat capacity 
curve that is divergent, but with no discontinuity, and that the experimental heat capacity at the $\lambda$-transition of
helium is finite without a discontinuity. Thus, according to the Ehrenfest-Pippard criterion these transitions 
might be called third-order. 

It has long been known from statistical mechanical theory that a Bose-Einstein ideal gas, which at low
temperatures would show condensation of molecules into the ground translational state (a condensation in
momentum space rather than in position space), should show a third-order phase transition at the temperature
at which this condensation starts. Normal helium ($^4$He) is a Bose-Einstein substance, but is far from ideal at
low temperatures, and the very real forces between molecules make the $\lambda$-transition to He II very different
from that predicted for a Bose-Einstein gas.

Recent research (1995-) has produced at very low temperatures (nanokelvins) a Bose-Einstein condensation
of magnetically trapped alkali metal atoms. Measurements [41] of the fraction of atoms in the ground
state of $^{87}$Rb as a function of temperature show good agreement with the predictions for a finite number of
noninteracting bosons in the three-dimensional harmonic potential produced by the magnets; indeed
this occupancy differs only slightly from that predicted for translation in a 3D box. However, the
variation of the energy as a function of temperature is significantly different from that predicted for a 3D box;
the harmonic potential predicts a discontinuity in the heat capacity which is confirmed by experiment; thus
this transition is second-order rather than third-order.
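
For orientation, the ground-state occupancy of an ideal Bose gas in a three-dimensional harmonic trap has, in
the limit of a very large number of particles, the textbook form $N_0/N = 1 - (T/T_c)^3$ below the condensation
temperature $T_c$ and zero above it. The sketch below evaluates this limiting expression only; it leaves out the
finite-number corrections that enter the comparison with experiment, and the function name and the default
exponent argument are illustrative assumptions.

```python
def condensate_fraction(t_over_tc, exponent=3.0):
    """Ideal-gas ground-state fraction N0/N at reduced temperature T/Tc.

    exponent = 3 is the large-N result for a 3D harmonic trap (assumed here);
    finite-number corrections are deliberately not included.
    """
    if t_over_tc < 0.0:
        raise ValueError("T/Tc must be non-negative")
    if t_over_tc >= 1.0:
        return 0.0
    return 1.0 - t_over_tc ** exponent

for ratio in (0.0, 0.25, 0.5, 0.75, 1.0, 1.25):
    print(f"T/Tc = {ratio:4.2f}   N0/N = {condensate_fraction(ratio):.4f}")
```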


A3.1 Kinetic theory: transport and fluctuations

J R Dorfman 


A3.1.1 INTRODUCTION 

The kinetic theory of gases has a long history, extending over a period of a century and a half, and is 
responsible for many central insights into, and results for, the properties of gases, both in and out of 
thermodynamic equilibrium [1]. Strictly speaking, there are two familiar versions of kinetic theory, an
informal version and a formal version. The informal version is based upon very elementary considerations of 
the collisions suffered by molecules in a gas, and upon elementary probabilistic notions regarding the velocity 
and free path distributions of the molecules. In the hands of Maxwell, Boltzmann and others, the informal 
version of kinetic theory led to such important predictions as the independence of the viscosity of a gas from its
density at low densities, and to qualitative results for the equilibrium thermodynamic properties, the transport 
coefficients, and the structure of microscopic boundary layers in a dilute gas. The more formal theory is also 
due to Maxwell and Boltzmann, and may be said to have had its beginning with the development of the 
Boltzmann transport equation in 1872 [2]. At that time Boltzmann obtained, by heuristic arguments, an 
equation for the time dependence of the spatial and velocity distribution function for particles in the gas. This 
equation provided a formal foundation for the informal methods of kinetic theory. It leads directly to the 
Maxwell-Boltzmann velocity distribution for the gas in equilibrium. For non-equilibrium systems, the 
Boltzmann equation leads to a version of the second law of thermodynamics (the Boltzmann $H$-theorem), as
well as to the Navier-Stokes equations of fluid dynamics, with explicit expressions for the transport 
coefficients in terms of the intermolecular potentials governing the interactions between the particles in the 
gas [3]. It is not an exaggeration to state that the kinetic theory of gases was one of the great successes of 
nineteenth century physics. Even now, the Boltzmann equation remains one of the main cornerstones of our 
understanding of non-equilibrium processes in fluid as well as solid systems, both classical and quantum 
mechanical. It continues to be a subject of investigation in both the mathematical and physical literature and 
its predictions often serve as a way of distinguishing different molecular models employed to calculate gas 
properties. Kinetic theory is typically used to describe the non-equilibrium properties of dilute to moderately 
dense gases composed of atoms, or diatomic or polyatomic molecules. Such properties include the 
coefficients of shear and bulk viscosity, thermal conductivity, diffusion, as well as gas phase chemical 
reaction rates, and other, similar properties. 

In this section we will survey both the informal and formal versions of the kinetic theory of gases, starting 
with the simpler informal version. Here the basic idea is to combine both probabilistic and mechanical 
arguments to calculate quantities such as the equilibrium pressure of a gas, the mean free distance between 
collisions for a typical gas particle, and the transport properties of the gas, such as its viscosity and thermal 
conductivity. The formal version again uses both probabilistic and mechanical arguments to obtain an 
equation, the Boltzmann transport equation, that determines the distribution function, $f(\mathbf{r},\mathbf{v},t)$, that describes
the number of gas particles in a small spatial region, $\delta\mathbf{r}$, about a point $\mathbf{r}$, and in a small region of velocities,
$\delta\mathbf{v}$, about a given velocity $\mathbf{v}$, at some time $t$. The formal theory forms the basis for almost all applications of
kinetic theory to realistic systems. 

We will almost always treat the case of a dilute gas, and almost always consider the approximation that the 
gas particles obey classical, Hamiltonian mechanics. The effects of quantum properties and/or of higher 
densities will be briefly commented upon. A number of books have been devoted to the kinetic theory of 
gases. Here we note that some of the interesting and easily accessible ones are those of Boltzmann [2],
Chapman and Cowling [3], Hirschfelder et al [4], Hanley [5], Ferziger and Kaper [6], Resibois and de Leener [7],
Liboff [8] and Present [9]. Most textbooks on the subject of statistical thermodynamics have one or more
chapters on kinetic theory [10, 11, 12 and 13].


A3.1.2 THE INFORMAL KINETIC THEORY FOR THE DILUTE GAS 

We begin by considering a gas composed of $N$ particles in a container of volume $V$. We suppose, first, that the
particles are single atoms, interacting with forces of finite range denoted by $a$. Polyatomic molecules can be
incorporated into this informal discussion, to some extent, but atoms and molecules interacting with long-range
forces require a separate treatment based upon the Boltzmann transport equation. This equation is
capable of treating particles that interact with infinite-range forces, at least if the forces approach zero
sufficiently rapidly as the separation of the particles becomes infinite. Typical potential energies describing
the interactions between particles in the gas are illustrated in figure A3.1.1, where we describe Lennard-Jones
(LJ) and Weeks-Chandler-Andersen (WCA) potentials. The range parameter, $a$, is usually taken to be a value
close to the first point where the potential energy becomes negligible for all greater separations. While the choice
of the location of this point is largely subjective, it will not be a serious issue in what follows, since the results
to be described below are largely qualitative order-of-magnitude results. However, we may usefully take the
distance $a$ to represent the effective diameter of a particle.

Figure A3.1.1. Typical pair potentials. Illustrated here are the Lennard-Jones potential, $\phi_{LJ}$, and the Weeks-
Chandler-Andersen potential, $\phi_{WCA}$, which gives the same repulsive force as the Lennard-Jones potential.
The relative separation is scaled by $a$, the distance at which the Lennard-Jones potential first passes through
zero. The energy is scaled by the well depth, $\epsilon$.


The dilute gas condition can be stated as the condition that the available volume per particle in the container is
much larger than the volume of the particle itself. In other words

$$\frac{V}{N} \gg a^3 \quad\text{or}\quad na^3 \ll 1 \qquad \text{(A3.1.1)}$$

where $n = N/V$ is the average number of particles per unit volume. We will see below that this condition is
equivalent to the requirement that the mean free path between collisions, which we denote by $\lambda$, is much
greater than the size of a particle, $a$. Next we suppose that the state of the gas can be described by a
distribution function $f(\mathbf{r},\mathbf{v},t)$, such that $f(\mathbf{r},\mathbf{v},t)\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{v}$ is the number of gas particles in $\mathrm{d}\mathbf{r}$ about $\mathbf{r}$, and in $\mathrm{d}\mathbf{v}$
about $\mathbf{v}$ at time $t$. To describe the state of a gas of polyatomic molecules, or of any mixture of different
particles, we would need to include additional variables in the argument of $f$ to describe the internal states of
the molecules and the various components of the mixture. To keep the discussion simple, we will consider
gases of monoatomic particles, for the time being.
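
As a rough numerical check of condition (A3.1.1), the sketch below evaluates $na^3$ for a gas whose number
density is taken from the ideal-gas relation $n = P/(k_BT)$. The diameter $a = 3.4 \times 10^{-10}$ m is an assumed,
argon-like value chosen only for illustration, and the helper names are hypothetical.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K

def number_density(pressure_pa, temperature_k):
    """Ideal-gas number density n = P / (k_B T), in particles per cubic metre."""
    return pressure_pa / (K_B * temperature_k)

def diluteness(n, a):
    """Dimensionless parameter n * a**3; the gas is dilute when this is << 1."""
    return n * a ** 3

# Illustrative conditions (assumptions): 1 atm, 273.15 K, argon-like diameter
n = number_density(101325.0, 273.15)
a = 3.4e-10
print(f"n = {n:.3e} m^-3,  n a^3 = {diluteness(n, a):.2e}")
```

Under these assumed conditions $na^3$ comes out of order $10^{-3}$, so the dilute-gas condition holds comfortably;
raising the density by a factor of several hundred would bring it close to unity.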

At this point it is important to make some clarifying remarks: (1) clearly one cannot regard $\mathrm{d}\mathbf{r}$ in the above
expression, strictly, as a mathematical differential. It cannot be infinitesimally small, since $\mathrm{d}\mathbf{r}$ must be large
enough to contain some particles of the gas. We suppose instead that $\mathrm{d}\mathbf{r}$ is large enough to contain some
particles of the gas but small compared with any important physical length in the problem under
consideration, such as a mean free path, or the length scale over which a physical quantity, such as a
temperature, might vary. (2) The distribution function $f(\mathbf{r},\mathbf{v},t)$ typically does not describe the exact state of the
gas in the sense that it tells us exactly how many particles are in the designated regions at the given time $t$. To
obtain and use such an exact distribution function one would need to follow the motion of the individual
particles in the gas, that is, solve the mechanical equations for the system, and then do the proper counting.
Since this is clearly impossible for even a small number of particles in the container, we have to suppose that $f$
is an ensemble average of the microscopic distribution functions for a very large number of identically
prepared systems. This, of course, implies that kinetic theory is a branch of the more general area of statistical 
mechanics. As a result of these two remarks, we should regard any distribution function we use as an 
ensemble average rather than an exact expression for our particular system, and we should be careful when 
examining the variation of the distribution with space and time, to make sure that we are not too concerned 
with variations on spatial scales that are of the order or less than the size of a molecule, or on time scales that 
are of the order of the duration of a collision of a particle with a wall or of two or more particles with each 
other. 

A3.1.2.1 EQUILIBRIUM PROPERTIES FROM KINETIC THEORY 

The equilibrium state for a gas of monoatomic particles is described by a spatially uniform, time-independent
distribution function whose velocity dependence has the form of the Maxwell-Boltzmann distribution,
obtained from equilibrium statistical mechanics. That is, $f(\mathbf{r},\mathbf{v},t)$ has the form $f_{\mathrm{eq}}(\mathbf{v})$ given by

$$f_{\mathrm{eq}}(\mathbf{v}) = n\varphi(\mathbf{v}) \qquad \text{(A3.1.2)}$$

where

$$\varphi(\mathbf{v}) = \left(\frac{\beta m}{2\pi}\right)^{3/2}\exp\left(-\frac{\beta m v^2}{2}\right) \qquad \text{(A3.1.3)}$$

is the usual Maxwell-Boltzmann velocity distribution function. Here $m$ is the mass of the particle, and the
quantity $\beta = (k_BT)^{-1}$, where $T$ is the equilibrium thermodynamic temperature of the gas and $k_B$ is Boltzmann's
constant, $k_B = 1.380 \times 10^{-23}$ J K$^{-1}$.

We are now going to use this distribution function, together with some elementary notions from mechanics 
and probability theory, to calculate some properties of a dilute gas in equilibrium. We will calculate the 
pressure that the gas exerts on the walls of the container as well as the rate of effusion of particles from a very 
small hole in the wall of the container. As a last example, we will calculate the mean free path of a molecule 
between collisions with other molecules in the gas. 

(A) THE PRESSURE 

To calculate the pressure, we need to know the force per unit area that the gas exerts on the walls of the
vessel. We calculate the force as the negative of the rate of change of the vector momentum of the gas
particles as they strike the container. We consider then some small area, $A$, on the wall of the vessel and look
at particles with a particular velocity $\mathbf{v}$, chosen so that it is physically possible for particles with this velocity
to strike the designated area from within the container. We consider a small time interval $\delta t$, and look for all
particles with velocity $\mathbf{v}$ that will strike this area, $A$, over the time interval $\delta t$. As illustrated in figure A3.1.2, all
such particles must lie in a small 'cylinder' of base area $A$ and height $|\mathbf{v}\cdot\hat{\mathbf{n}}|\,\delta t$, where $\hat{\mathbf{n}}$ is a unit normal to the
surface of the container at the small area $A$, directed toward the interior of the vessel. We will assume that
the gas is very dilute and that we can ignore the collisions between particles, and take only collisions of
particles with the wall into account. Every time such a particle hits our small area of the wall, its momentum
changes, since its momentum after a collision differs from its momentum prior to the collision. Let us suppose
that the particles make elastic, specular collisions with the surface, so that the momentum change per particle
at each collision is $\Delta\mathbf{p} = -2(\mathbf{p}\cdot\hat{\mathbf{n}})\hat{\mathbf{n}} = -2m(\mathbf{v}\cdot\hat{\mathbf{n}})\hat{\mathbf{n}}$. This vector is directed into the interior of the container.

Now to calculate the total change in the momentum of the gas in time $\delta t$ due to collisions with the wall at the
point of interest, we have to know how many particles with velocity $\mathbf{v}$ collide with the wall, multiply the number of
collisions by the change in momentum per collision, and then integrate over all possible values of the velocity
$\mathbf{v}$ that can lead to such a collision. To calculate the number of particles striking the small area, $A$, in time
interval $\delta t$, we have to invoke probabilistic arguments, since we do not know the actual locations and the
velocities of all the particles at the beginning of the time interval. We do know that if we ignore possible
collisions amongst the particles themselves, all of the particles with velocity $\mathbf{v}$ colliding with $A$ in time $\delta t$ will
have to reside in the small cylinder illustrated in figure A3.1.2, with volume $A|\mathbf{v}\cdot\hat{\mathbf{n}}|\,\delta t$. Now, using the
distribution function $f$ given by equation (A3.1.2), we find that the number, $\delta\mathcal{N}(\mathbf{v})$, of particles with velocity $\mathbf{v}$
in the range $\mathrm{d}\mathbf{v}$, in the collision cylinder is


$$\delta\mathcal{N}(\mathbf{v}) = nA\,|\mathbf{v}\cdot\hat{\mathbf{n}}|\,\delta t\,\varphi(\mathbf{v})\,\mathrm{d}\mathbf{v}. \qquad \text{(A3.1.4)}$$


Now each such particle adds its change in momentum, as given above, to the total change of momentum of
the gas in time $\delta t$. The total change in momentum of the gas is obtained by multiplying $\delta\mathcal{N}$ by the change in
momentum per particle and integrating over all allowed values of the velocity vector, namely, those for which
$\mathbf{v}\cdot\hat{\mathbf{n}} < 0$. That is


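$$\Delta\mathbf{p}_{\mathrm{total}} = -2Am\,\delta t\,n\,\hat{\mathbf{n}}\int_{\mathbf{v}\cdot\hat{\mathbf{n}}<0}\mathrm{d}\mathbf{v}\,|\mathbf{v}\cdot\hat{\mathbf{n}}|\,(\mathbf{v}\cdot\hat{\mathbf{n}})\,\varphi(\mathbf{v}). \qquad \text{(A3.1.5)}$$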

Finally the pressure, $P$, exerted by the gas on the container, is the negative of the force per unit area that the
wall exerts on the gas. This force is measured by the change in momentum of the gas per unit time. Thus we
are led to

$$P = 2nm\int_{\mathbf{v}\cdot\hat{\mathbf{n}}<0}\mathrm{d}\mathbf{v}\,|\mathbf{v}\cdot\hat{\mathbf{n}}|^2\varphi(\mathbf{v}) = \frac{n}{\beta} = nk_BT. \qquad \text{(A3.1.6)}$$


Here we have carried out the velocity integral over the required half-space and used the explicit form of the
Maxwell-Boltzmann distribution function, given by equation (A3.1.3).
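
The half-space integral in (A3.1.6) can be checked numerically. Because the Maxwell-Boltzmann distribution
factorizes into three one-dimensional Gaussians, the integrals over the two velocity components parallel to the
wall give unity, leaving $P = 2nm\int_{-\infty}^{0}\mathrm{d}v_n\,v_n^2\,\varphi_1(v_n)$, where $\varphi_1$ is the one-dimensional
distribution for the normal component $v_n$. The sketch below evaluates this with a simple midpoint rule and
compares it with $nk_BT$. The gas parameters (an argon-like mass, 300 K and the number density) and all
function names are illustrative assumptions.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def phi_1d(v, m, T):
    """One-dimensional Maxwell-Boltzmann distribution for one velocity component."""
    beta = 1.0 / (K_B * T)
    return math.sqrt(beta * m / (2.0 * math.pi)) * math.exp(-0.5 * beta * m * v * v)

def pressure_from_wall_collisions(n, m, T, steps=20000):
    """P = 2 n m * integral over v_n < 0 of v_n**2 * phi_1d(v_n), midpoint rule."""
    sigma = math.sqrt(K_B * T / m)  # thermal velocity scale
    v_min = -10.0 * sigma           # the Gaussian tail beyond this is negligible
    dv = -v_min / steps
    total = 0.0
    for i in range(steps):
        v = v_min + (i + 0.5) * dv
        total += v * v * phi_1d(v, m, T) * dv
    return 2.0 * n * m * total

# Illustrative values (assumptions): argon-like mass, room temperature
n, m, T = 2.5e25, 6.63e-26, 300.0
p_kinetic = pressure_from_wall_collisions(n, m, T)
p_ideal = n * K_B * T
print(f"kinetic: {p_kinetic:.6e} Pa   n k_B T: {p_ideal:.6e} Pa")
```

The factor of 2 from the momentum reversal and the restriction to one half-space combine to give the full
average of $mv_n^2$, which is why the result reduces to $nk_BT$.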



Figure A3.1.2. A collision cylinder for particles with velocity $\mathbf{v}$ striking a small region of area $A$ on the
surface of a container within a small time interval $\delta t$. Here $\hat{\mathbf{n}}$ is a unit normal to the surface at the small region,
and points into the gas.


(B) THE RATE OF EFFUSION THROUGH A SMALL HOLE 

It is a simple matter now to calculate the number of particles per unit area, per unit time, that pass through a small
hole in the wall of the vessel. This quantity is called the rate of effusion, denoted by $n_e$, and it governs the loss
of particles in a container when there is a small hole in the wall separating the gas from a vacuum, say. This
number is in fact obtained by integrating the quantity $\delta\mathcal{N}(\mathbf{v})$ over all possible velocities having the proper
direction, and then dividing this number by $A\,\delta t$. Thus we find


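$$n_e = \frac{n\bar{v}}{4} \qquad \text{(A3.1.7)}$$

where $\bar{v}$ is the average speed of a particle in a gas in equilibrium, given by

$$\bar{v} = \left(\frac{8k_BT}{\pi m}\right)^{1/2}. \qquad \text{(A3.1.8)}$$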


The result, (A3.1.7), can be viewed also as the number of particles per unit area per unit time colliding from
one side of any small area in the gas, whether real or fictitious. We will use this result in the next section
when we consider an elementary kinetic theory for transport coefficients in a gas with some kind of flow
taking place.
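
As a numerical illustration, the sketch below evaluates the mean speed $\bar{v} = (8k_BT/\pi m)^{1/2}$ and the
one-sided flux $n\bar{v}/4$ that follows from integrating over the half-space of velocities directed toward the
hole. The choice of nitrogen at 300 K and 1 atm, with the number density taken from $n = P/(k_BT)$, is an
assumption made only for illustration, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
AMU = 1.66053907e-27     # atomic mass unit, kg

def mean_speed(m, T):
    """Average speed of a particle of mass m at temperature T."""
    return math.sqrt(8.0 * K_B * T / (math.pi * m))

def effusion_rate(n, m, T):
    """Particles per unit area per unit time crossing a small hole: n * v_bar / 4."""
    return 0.25 * n * mean_speed(m, T)

# Illustrative gas (assumption): N2 at 300 K and 1 atm
m_n2 = 28.0134 * AMU
T = 300.0
n = 101325.0 / (K_B * T)

v_bar = mean_speed(m_n2, T)
flux = effusion_rate(n, m_n2, T)
print(f"mean speed = {v_bar:.1f} m/s,  effusion rate = {flux:.3e} m^-2 s^-1")
```

Since $\bar{v}$ varies as $m^{-1/2}$, lighter particles effuse faster at the same temperature and number density.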

(C) THE MEAN FREE PATH

The previous calculations, while not altogether trivial, are among the simplest uses one can make of kinetic 
theory arguments. Next we turn to a somewhat more sophisticated calculation, that for the mean free path of a 
particle between collisions with other particles in the gas. We will use the general form of the distribution 
function at first, before restricting ourselves to the equilibrium case, so as to set the stage for discussions in 
later sections where we describe the formal kinetic theory. Our approach will be first to compute the average 
frequency with which a particle collides with other particles. The inverse of this frequency is the mean time 
between collisions. If we then multiply the mean time between collisions by the mean speed, given by 
equation (A3.1.8), we will obtain the desired result for the mean free path between collisions. It is important
to point out that one might choose to define the mean free path somewhat differently, by using the root mean
square velocity instead of $\bar{v}$, for example. The only change will be in a numerical coefficient. The important
issue will be to obtain the dependence of the mean free path upon the density and temperature of the gas and 
on the size of the particles. The numerical factors are not that important. 

Let us focus our attention for the moment on a small volume in space, $\delta\mathbf{r}$, and on particles in the volume with
a given velocity $\mathbf{v}$. Let us sit on such a particle and ask if it might collide in time $\delta t$ with another particle
whose velocity is $\mathbf{v}_1$, say. Taking the effective diameter of each particle to be $a$, as described above, we see
that our particle with velocity $\mathbf{v}$ presents a cross-sectional area of size $\pi a^2$ for collisions with other particles.
If we focus on collisions with another particle with velocity $\mathbf{v}_1$ then, as illustrated in figure A3.1.3, a useful
coordinate system to describe this collision is one in which the particle with velocity $\mathbf{v}$ is located at the origin
and the z-axis is aligned along the direction of the vector $\mathbf{g} = \mathbf{v}_1 - \mathbf{v}$. In this coordinate system, the centre of
the particle with velocity $\mathbf{v}_1$ must be somewhere in the collision cylinder of volume $\pi a^2 |\mathbf{g}|\,\delta t$ in order that a
collision between the two particles takes place in the time interval $\delta t$. Now in the small volume $\delta\mathbf{r}$ there are
$f(\mathbf{r},\mathbf{v},t)\,\delta\mathbf{r}$ particles with velocity $\mathbf{v}$ at time $t$, each one with a collision cylinder of the above type
attached to it. Thus the total volume, $\delta V(\mathbf{v},\mathbf{v}_1)$, of these $(\mathbf{v},\mathbf{v}_1)$ collision cylinders is

$$\delta V(\mathbf{v},\mathbf{v}_1) = \pi a^2 |\mathbf{g}|\,\delta t\,\delta\mathbf{r}\, f(\mathbf{r},\mathbf{v},t). \qquad \text{(A3.1.9)}$$

Now, again, we use a probabilistic argument to say that the number of particles with velocity $\mathbf{v}_1$ in this total
volume is given by the product of the total volume and the number of particles per unit volume with velocity
$\mathbf{v}_1$, that is, $\delta V(\mathbf{v},\mathbf{v}_1)\, f(\mathbf{r},\mathbf{v}_1,t)$. To complete the calculation, we suppose that the gas is so dilute that each
of the collision cylinders has either zero or one particle with velocity $\mathbf{v}_1$ in it, and that each such particle
actually collides with the particle with velocity $\mathbf{v}$. Thus the total number of collisions, per unit volume,
suffered by particles with velocity $\mathbf{v}$ in time $\delta t$ is

$$\pi a^2\,\delta t\, f(\mathbf{r},\mathbf{v},t) \int d\mathbf{v}_1\, |\mathbf{v}_1 - \mathbf{v}|\, f(\mathbf{r},\mathbf{v}_1,t).$$

Then it follows that the total number of collisions per unit volume and per unit time suffered by particles with
all velocities is

$$\pi a^2 \int d\mathbf{v} \int d\mathbf{v}_1\, |\mathbf{v}_1 - \mathbf{v}|\, f(\mathbf{r},\mathbf{v},t)\, f(\mathbf{r},\mathbf{v}_1,t). \qquad \text{(A3.1.10)}$$


Notice that each collision is counted twice, once for the particle with velocity $\mathbf{v}$ and once for the particle with
velocity $\mathbf{v}_1$. We also note that we have assumed that the distribution functions $f$ do not vary over distances
which are the lengths of the collision cylinders, as the interval $\delta t$ approaches some small value, but still large
compared with the duration of a binary collision.

Figure A3.1.3. The collision cylinder for collisions between particles with velocities $\mathbf{v}$ and $\mathbf{v}_1$. The origin is
placed at the centre of the particle with velocity $\mathbf{v}$ and the z-axis is in the direction of $\mathbf{v}_1 - \mathbf{v}$. The spheres
indicate the range, $a$, of the intermolecular forces.

Our first result is now the average collision frequency, obtained from the expression (A3.1.10) by dividing it
by the average number of particles per unit volume. Here it is convenient to consider the equilibrium case, and
to use (A3.1.2) for $f$. Then we find that the average collision frequency, $\nu$, for the particles is

$$\nu = n \pi a^2 \int\!\!\int d\mathbf{v}\, d\mathbf{v}_1\, |\mathbf{v}_1 - \mathbf{v}|\, \varphi(\mathbf{v})\, \varphi(\mathbf{v}_1) = 2^{1/2}\, \pi n a^2\, \bar{v}. \qquad \text{(A3.1.11)}$$

The average time between collisions is then $\nu^{-1}$, and in this time the particle will typically travel a distance $\lambda$,
the mean free path, where

$$\lambda = \bar{v}\, \nu^{-1} = \frac{1}{2^{1/2}\, \pi n a^2}. \qquad \text{(A3.1.12)}$$


This is the desired result. It shows that the mean free path is inversely proportional to the density and the
collision cross section. This is a physically sensible result, and could have been obtained by dimensional
arguments alone, except for the unimportant numerical factor.
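
The two ingredients of this result can be checked numerically. The sketch below, which assumes Maxwellian velocities and uses illustrative names and inputs (an effective diameter of 3.4e-10 m, 300 K, 1 atm), estimates by Monte Carlo sampling the ratio of the mean relative speed to the mean speed, which should be close to $2^{1/2}$, and then evaluates $\lambda = 1/(2^{1/2}\pi n a^2)$.

```python
import math
import random

def mean_free_path(number_density, diameter):
    """Mean free path 1 / (sqrt(2) * pi * n * a**2) for a dilute equilibrium gas."""
    return 1.0 / (math.sqrt(2.0) * math.pi * number_density * diameter ** 2)

def relative_to_mean_speed_ratio(samples=100_000, seed=0):
    """Monte Carlo estimate of <|v1 - v|> / <|v|> for Maxwellian velocities.

    Each Cartesian component is drawn from a unit-variance Gaussian, so the
    ratio does not depend on temperature or mass.
    """
    rng = random.Random(seed)
    total_speed = 0.0
    total_relative = 0.0
    for _ in range(samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        v1 = [rng.gauss(0.0, 1.0) for _ in range(3)]
        total_speed += math.sqrt(sum(c * c for c in v))
        total_relative += math.sqrt(sum((c1 - c) ** 2 for c, c1 in zip(v, v1)))
    return total_relative / total_speed

ratio = relative_to_mean_speed_ratio()
print(f"<|v1 - v|>/<|v|> = {ratio:.3f}, compared with sqrt(2) = {math.sqrt(2.0):.3f}")

# Illustrative numbers (assumed): ideal gas at 300 K and 1 atm, diameter 3.4e-10 m.
n = 101325.0 / (1.380649e-23 * 300.0)
lam = mean_free_path(n, 3.4e-10)
print(f"mean free path = {lam:.2e} m")
```

With these inputs the mean free path comes out at several tens of nanometres, a few hundred times the assumed molecular diameter, which is the dilute-gas regime the argument requires.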


A3.1.2.2 THE MEAN FREE PATH EXPRESSIONS FOR TRANSPORT COEFFICIENTS 

One of the most useful applications of the mean free path concept occurs in the theory of transport processes 
in systems where there exist gradients of average but local density, local temperature, and/or local velocity. 
The existence of such gradients causes a transfer of particles, energy or momentum, respectively, from one 
region of the system to another. 

The kinetic theory of transport processes in gases rests upon three basic assumptions. 

(i) The gas is dense enough that the mean free path is small compared with the characteristic size of the 
container. Consequently, the particles collide with each other much more often than they collide with 
the walls of the vessel. 

(ii) As stated above, the gas is sufficiently dilute that the mean free path is much larger than the diameter 
of a particle. 

(iii) The local density, temperature and velocity vary slowly over distances of the order of a mean free
path.

If these assumptions are satisfied then the ideas developed earlier about the mean free path can be used to 
provide qualitative but useful estimates of the transport properties of a dilute gas. While many varied and 
complicated processes can take place in fluid systems, such as turbulent flow, pattern formation, and so on, 
the principles on which these flows are analysed are remarkably simple. The description of both simple and 
complicated flows in fluids is based on five hydrodynamic equations, the Navier-Stokes equations. These 
equations, in turn, are based upon the mechanical laws of conservation of particles, momentum and energy in 
a fluid, together with a set of phenomenological equations, such as Fourier's law of thermal conduction and 
Newton's law of fluid friction. When these phenomenological laws are used in combination with the 
conservation equations, one obtains the Navier-Stokes equations. Our goal here is to derive the 
phenomenological laws from elementary mean free path considerations, and to obtain estimates of the 
associated transport coefficients. Here we will consider thermal conduction and viscous flow as examples. 

(A) THERMAL CONDUCTION 

We can obtain an understanding of Fourier's law of thermal conduction by considering a very simple
situation, frequently encountered in the laboratory. Imagine a layer of gas, as illustrated in figure A3.1.4,
which is small enough to exclude convection, but many orders of magnitude larger than a mean free path.
Imagine further that the temperature is maintained at constant values, $T_1$ and $T_2$, $T_2 > T_1$, along two planes
separated by a distance Z, as illustrated. We suppose that the system has reached a stationary state so that the
local temperature at any point in the fluid is constant in time and depends only upon the z-component of the
location of the point. Now consider some imaginary plane in the fluid, away from the boundaries, and look at
the flow of particles across the plane. We make a major simplification and assume that all particles crossing
the plane carry with them the local properties of the system a mean free path above and below the plane. That
is, suppose we examine the flow of particles through the plane, coming from above it. Then we can say that
the number of particles crossing the plane per unit area and per unit time from above, i.e. the particle current
density, $j_{\downarrow}(z)$, heading down, is given by

$$j_{\downarrow}(z) = \tfrac{1}{4}\, n(z+\lambda)\, \bar{v}(z+\lambda) \qquad \text{(A3.1.13)}$$

where $z$ is the height of the plane we consider, $\lambda$ is the mean free path, and we use (A3.1.7) for this current
density. Similarly, the upward flux is

$$j_{\uparrow}(z) = \tfrac{1}{4}\, n(z-\lambda)\, \bar{v}(z-\lambda). \qquad \text{(A3.1.14)}$$

In a steady state, with no convection, the two currents must be equal, $j_{\uparrow}(z) = j_{\downarrow}(z) \equiv j(z)$. Now we
assume that each particle crossing the plane carries the energy per particle characteristic of the location a
mean free path above or below the plane. Thus the upward and downward energy current densities, $j^{e}_{\uparrow\downarrow}(z)$, are

$$j^{e}_{\uparrow\downarrow}(z) = j(z)\, e(z \mp \lambda) \qquad \text{(A3.1.15)}$$

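where $e(z \mp \lambda)$ is the local energy per particle at a distance $\lambda$ below and above the plane. The net amount of
energy transferred per unit area per unit time in the positive z direction, $q(z)$, is then

$$q(z) = j(z)\,[e(z-\lambda) - e(z+\lambda)] = -\tfrac{1}{2}\, n \bar{v} \lambda\, \frac{de}{dT}\, \frac{\partial T}{\partial z} + \cdots \qquad \text{(A3.1.16)}$$

Neglecting derivatives of the third order and higher, we obtain Fourier's law of thermal conduction

$$q(z) = -k\, \frac{\partial T}{\partial z} \qquad \text{(A3.1.17)}$$

where the coefficient of thermal conductivity, $k$, is given by

$$k = \tfrac{1}{2}\, n \bar{v} \lambda\, \frac{de}{dT}. \qquad \text{(A3.1.18)}$$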

The result is, of course, a case of the more general expression of Fourier's law, namely

$$\mathbf{q} = -k \nabla T \qquad \text{(A3.1.19)}$$

adjusted to the special situation that the temperature gradient is in the z-direction. Since $k$ is obviously
positive, our result is in accord with the second law of thermodynamics, which requires heat to flow from
hotter to colder regions.

Figure A3.1.4. Steady state heat conduction, illustrating the flow of energy across a plane at a height z.

We can easily obtain an expression for $k$ by using the explicit forms for $\bar{v}$ and $\lambda$, given in (A3.1.8) and
(A3.1.12). Thus, in this approximation

$$k = \frac{c\, (k_B T)^{1/2}}{a^2\, m^{1/2}\, \pi^{3/2}} \qquad \text{(A3.1.20)}$$


where $c$ is the specific heat per particle. We have assumed that the gradients are sufficiently small that the
local average speed and mean free path can be estimated by their (local) equilibrium values. The most
important consequences of this result for the thermal conductivity are its independence of the gas density and
its variation with temperature as $T^{1/2}$. The independence of density is well verified at low gas pressures, but
the square-root temperature dependence is only verified at high temperatures. Better results for the
temperature dependence of $k$ can be obtained by use of the Boltzmann transport equation, which we discuss in
the next section. The temperature dependence turns out to be a useful test of the functional form of the
intermolecular potential energy.
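
The density independence and the $T^{1/2}$ scaling can be seen directly by evaluating the estimate in two ways: from the closed form $k = c\,(k_B T)^{1/2}/(\pi^{3/2} m^{1/2} a^2)$ and from the product $\tfrac{1}{2} n \bar{v} \lambda c$. The following is a minimal sketch; the argon-like mass, the diameter of 3.4e-10 m, the monatomic value $c = \tfrac{3}{2} k_B$ and all function names are assumptions made for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def thermal_conductivity(temperature, mass, diameter, heat_capacity):
    """Closed form c * (k_B T)**0.5 / (pi**1.5 * m**0.5 * a**2), in W m^-1 K^-1."""
    return (heat_capacity * math.sqrt(K_B * temperature)
            / (math.pi ** 1.5 * math.sqrt(mass) * diameter ** 2))

def thermal_conductivity_from_path(number_density, temperature, mass, diameter, heat_capacity):
    """Same estimate assembled as (1/2) * n * v_bar * lambda * c."""
    v_bar = math.sqrt(8.0 * K_B * temperature / (math.pi * mass))
    lam = 1.0 / (math.sqrt(2.0) * math.pi * number_density * diameter ** 2)
    return 0.5 * number_density * v_bar * lam * heat_capacity

mass = 39.948 * 1.66053906660e-27   # kg, argon-like atom (assumed)
a = 3.4e-10                         # m, assumed effective diameter
c = 1.5 * K_B                       # J/K, monatomic specific heat per particle (assumed)

for n in (1.0e24, 1.0e25):          # the density cancels between n and lambda
    print(f"n = {n:.1e} m^-3: k = {thermal_conductivity_from_path(n, 300.0, mass, a, c):.3e}")
scaling = thermal_conductivity(1200.0, mass, a, c) / thermal_conductivity(300.0, mass, a, c)
print(f"k(1200 K) / k(300 K) = {scaling:.3f}")
```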

(B) THE SHEAR VISCOSITY 

A distribution of velocities in a fluid gives rise to a transport of momentum in the fluid in complete analogy
with the transport of energy which results from a distribution of temperatures. To analyse this transport of
momentum in a fluid with a gradient in the average local velocity, we use the same method as employed in the
case of thermal conduction. That is, we consider a layer of fluid contained between two parallel planes,
moving with velocities in the x-direction with values $U_1$ and $U_2$, $U_2 > U_1$, as illustrated in figure A3.1.5. We
suppose that the width of the layer is very large compared with a mean free path, and that the fluid adjacent to
the moving planes moves with the velocity of the adjacent plane. If the velocities are not so large as to
develop a turbulent flow, then a steady state can be maintained with an average local velocity, $\mathbf{u}(x,y,z)$, in the
fluid of the form $\mathbf{u}(x,y,z) = u_x(z)\,\hat{\mathbf{x}}$, where $\hat{\mathbf{x}}$ is a unit vector in the x-direction.

Figure A3.1.5. Steady state shear flow, illustrating the flow of momentum across a plane at a height z.

The molecules of the gas are in constant motion, of course, and there is a transport of particles in all directions 
in the fluid. If we consider a fictitious plane in the fluid, far from the moving walls, at a height z, then there 
will be a flow of particles from above and below the plane. The particles coming from above will carry a
momentum with them typical of the average flow at a height $z + \lambda$, while those coming from below will carry
the typical momentum at height $z - \lambda$, where $\lambda$ is the mean free path length. Due to the velocity gradient in the
fluid there will be a net transport of momentum across the plane, tending to slow down the faster regions and 
to accelerate the slower regions. This transport of momentum leads to viscous forces (or stresses if measured 
per unit area) in the fluid, which in our case will be in the x-direction. The analysis of this viscous stress is 
almost identical to that for thermal conduction. 

Following the method used above, and writing $j(z)$ for the particle current density crossing the plane from
either side, we see that there will be an upward flux of momentum in the x-direction, $j_{p\uparrow}(z)$, across the plane
at $z$ given by

$$j_{p\uparrow}(z) = j(z)\, m u_x(z-\lambda) \qquad \text{(A3.1.21)}$$

and a downward flux

$$j_{p\downarrow}(z) = j(z)\, m u_x(z+\lambda). \qquad \text{(A3.1.22)}$$

The net upward flow in the x-component of the momentum is called the shear stress, $\sigma_{zx}$, and by combining
(A3.1.21) and (A3.1.22), we see that

$$\sigma_{zx} = j_{p\uparrow}(z) - j_{p\downarrow}(z) = j(z)\, m u_x(z-\lambda) - j(z)\, m u_x(z+\lambda)
= -\tfrac{1}{2}\, m \lambda\, n(z)\, \bar{v}(z)\, \frac{d u_x}{dz} + \cdots \qquad \text{(A3.1.23)}$$

Here we have neglected derivatives of the local velocity of third and higher orders. Equation (A3.1.23) has the
form of the phenomenological Newton's law of friction

$$\sigma_{zx} = -\eta\, \frac{d u_x}{dz} \qquad \text{(A3.1.24)}$$

if we identify the coefficient of shear viscosity $\eta$ with the quantity

$$\eta = \tfrac{1}{2}\, m \lambda\, n(z)\, \bar{v}(z). \qquad \text{(A3.1.25)}$$

An explicit expression for the coefficient of shear viscosity can be obtained by assuming the system is in local
thermodynamic equilibrium and using the previously derived expressions for $\lambda$ and $\bar{v}$. Thus we obtain

$$\eta = \frac{(m k_B T)^{1/2}}{\pi^{3/2}\, a^2}. \qquad \text{(A3.1.26)}$$

As in the case of thermal conductivity, we see that the viscosity is independent of the density at low densities, 
and grows with the square root of the gas temperature. This latter prediction is modified by a more systematic 
calculation based upon the Boltzmann equation, but the independence of viscosity on density remains valid in 
the Boltzmann equation approach as well. 
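
Substituting $\lambda = 1/(2^{1/2}\pi n a^2)$ and $\bar{v} = (8 k_B T/\pi m)^{1/2}$ into $\eta = \tfrac{1}{2} m \lambda n \bar{v}$ gives $\eta = (m k_B T)^{1/2}/(\pi^{3/2} a^2)$, in which the density has cancelled. The sketch below evaluates both forms; the argon-like mass, the diameter of 3.4e-10 m and the function names are illustrative assumptions, and the number it prints is only a mean-free-path estimate.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def shear_viscosity(temperature, mass, diameter):
    """Closed form (m k_B T)**0.5 / (pi**1.5 * a**2), in Pa s."""
    return math.sqrt(mass * K_B * temperature) / (math.pi ** 1.5 * diameter ** 2)

def shear_viscosity_from_path(number_density, temperature, mass, diameter):
    """Same estimate assembled as (1/2) * m * lambda * n * v_bar."""
    v_bar = math.sqrt(8.0 * K_B * temperature / (math.pi * mass))
    lam = 1.0 / (math.sqrt(2.0) * math.pi * number_density * diameter ** 2)
    return 0.5 * mass * lam * number_density * v_bar

mass = 39.948 * 1.66053906660e-27   # kg, argon-like atom (assumed)
a = 3.4e-10                         # m, assumed effective diameter
eta = shear_viscosity(300.0, mass, a)
print(f"eta (closed form)         = {eta:.3e} Pa s")
print(f"eta (from n = 1e25 m^-3)  = {shear_viscosity_from_path(1.0e25, 300.0, mass, a):.3e} Pa s")
```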

(C) THE EUCKEN FACTOR

We notice, using (A3.1.20) and (A3.1.26), that this method leads to a simple relation between the coefficients
of shear viscosity and thermal conductivity, given by

$$\frac{m k}{\eta\, c_v} = 1 \qquad \text{(A3.1.27)}$$

where $c_v$ is the specific heat per particle at constant volume.


That is, this ratio should be a universal constant, valid for all dilute gases. A more exact calculation based
upon the Boltzmann equation shows that the right-hand side of equation (A3.1.27) should be 2.5 rather than 1,
plus a correction that varies slightly from gas to gas. The value of 2.5 holds with a very high
degree of accuracy for dilute monatomic gases [5]. However, when this ratio is computed for diatomic and
polyatomic gases, the value of 2.5 is no longer recovered.

Eucken advanced a very simple argument which allowed him to extend the Boltzmann equation formula for
$mk/(\eta c_v)$ to diatomic and polyatomic gases. His argument is that when energy is transported in a fluid by
particles, the energy associated with each of the internal degrees of freedom of a molecule is transported.
However, the internal degrees of freedom play no role in the transport of momentum. Thus we should modify
(A3.1.20) to include these internal degrees of freedom. If we also modify it to correct for the factor of 2.5
predicted by the Boltzmann equation, we obtain

$$k = \tfrac{1}{2}\, n \bar{v} \lambda\, \left(C c_v^{\mathrm{tr}} + c_v^{\mathrm{int}}\right) \qquad \text{(A3.1.28)}$$

where $C = 2.5$, $c_v^{\mathrm{tr}}$ is the translational specific heat per molecule, and $c_v^{\mathrm{int}}$ is the specific heat per molecule
associated with the internal degrees of freedom. We can easily obtain a better value for the ratio $mk/(\eta c_v)$ in
terms of the ratio of the specific heat at constant pressure per molecule to the specific heat at constant volume,
$\gamma = c_p/c_v$, as

$$\frac{m k}{\eta\, c_v} = \tfrac{1}{4}\,(9\gamma - 5). \qquad \text{(A3.1.29)}$$

The right-hand side of (A3.1.29), called the Eucken factor, provides a reasonably good estimate for this ratio
[11]. 
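
The step from the modified conductivity to the Eucken factor can be checked with a few lines of arithmetic. The sketch below assumes an ideal gas, so that $c_p = c_v + k_B$ per molecule, and a translational specific heat of $\tfrac{3}{2} k_B$; the function names and the three sample values of the internal specific heat are illustrative choices.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def eucken_factor(gamma):
    """Eucken factor (9 * gamma - 5) / 4."""
    return 0.25 * (9.0 * gamma - 5.0)

def ratio_from_heat_capacities(c_internal, c_factor=2.5):
    """(C * c_tr + c_int) / c_v with c_tr = 3/2 k_B and c_v = c_tr + c_int."""
    c_tr = 1.5 * K_B
    return (c_factor * c_tr + c_internal) / (c_tr + c_internal)

def gamma_from_internal(c_internal):
    """gamma = c_p / c_v for an ideal gas, where c_p = c_v + k_B per molecule."""
    c_v = 1.5 * K_B + c_internal
    return (c_v + K_B) / c_v

# Internal specific heats of 0, k_B and 3/2 k_B per molecule (assumed examples).
for c_int in (0.0, K_B, 1.5 * K_B):
    gamma = gamma_from_internal(c_int)
    print(f"gamma = {gamma:.4f}: "
          f"direct ratio = {ratio_from_heat_capacities(c_int):.4f}, "
          f"(9*gamma - 5)/4 = {eucken_factor(gamma):.4f}")
```

The two columns agree for every choice of the internal specific heat, and the monatomic case ($\gamma = 5/3$) recovers the Boltzmann-equation value of 2.5.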


A3.1.3 THE BOLTZMANN TRANSPORT EQUATION 

In 1872, Boltzmann introduced the basic equation of transport theory for dilute gases. His equation determines
the time-dependent position and velocity distribution function for the molecules in a dilute gas, which we
have denoted by $f(\mathbf{r},\mathbf{v},t)$. Here we present his derivation and some of its major consequences, particularly the
so-called H-theorem, which shows the consistency of the Boltzmann equation with the irreversible form of the
second law of thermodynamics. We also briefly discuss some of the famous debates surrounding the
mechanical foundations of this equation.

We consider a large vessel of volume V, containing N molecules which interact with central, pairwise 
additive, repulsive forces. The latter requirement allows us to avoid the complications of long-lived 'bound' 
states of two molecules which, though interesting, are not central to our discussion here. We suppose that the 
pair potential has a strong repulsive core and a finite range $a$, such as the WCA potential illustrated in figure
A3.1.1. Now, as before, we define a distribution function, $f(\mathbf{r},\mathbf{v},t)$, for the gas over a six-dimensional position
and velocity space, $(\mathbf{r},\mathbf{v})$, such that

$$f(\mathbf{r},\mathbf{v},t)\,\delta\mathbf{r}\,\delta\mathbf{v} = \text{the number of particles in } \delta\mathbf{r}\,\delta\mathbf{v} \text{ around } \mathbf{r} \text{ and } \mathbf{v} \text{ at time } t. \qquad \text{(A3.1.30)}$$

To get an equation for $f(\mathbf{r},\mathbf{v},t)$, we take a region $\delta\mathbf{r}\,\delta\mathbf{v}$ about a point $(\mathbf{r},\mathbf{v})$ that is large enough to contain
many particles, but small compared with the range of variation of $f$.

There are four mechanisms that change the number of particles in this region. The particles can:

(i) flow into or out of $\delta\mathbf{r}$, the free-streaming term,

(ii) leave the $\delta\mathbf{v}$ region as a result of a direct collision, the loss term,

(iii) enter the $\delta\mathbf{v}$ region after a restituting collision, the gain term, and

(iv) collide with the wall of the container (if the region contains part of the walls), the wall term.

We again assume that there is a time interval $\delta t$ which is long compared with the duration of a binary collision
but is too short for particles to cross a cell of size $\delta\mathbf{r}$. Then the change in the number of particles in $\delta\mathbf{r}\,\delta\mathbf{v}$ in
time $\delta t$ can be written as

$$[f(\mathbf{r},\mathbf{v},t+\delta t) - f(\mathbf{r},\mathbf{v},t)]\,\delta\mathbf{r}\,\delta\mathbf{v} = \Gamma_f - \Gamma_- + \Gamma_+ + \Gamma_w \qquad \text{(A3.1.31)}$$

where $\Gamma_f$, $\Gamma_-$, $\Gamma_+$ and $\Gamma_w$ represent the changes in $f$ due to the four mechanisms listed above, respectively.
We suppose that each particle in the small region suffers at most one collision during the time interval $\delta t$, and
calculate the change in $f$.

The computation of $\Gamma_f$ is relatively straightforward. We simply consider the free flow of particles into and out
of the region in time $\delta t$. An expression for this flow in the x-direction, for example, can be obtained by
considering two thin layers of size $v_x\,\delta t\,\delta r_y\,\delta r_z$ that contain particles that move into or out of a cell with its
centre at $(x,y,z)$ in time $\delta t$ (see figure A3.1.6).

Figure A3.1.6. A schematic illustration of flow into and out of a small region. The hatched areas represent
regions where particles enter and leave the region in time $\delta t$.

The free-streaming term can be written as the difference between the number of particles entering and leaving
the small region in time $\delta t$. Consider, for example, a cubic cell and look at the faces perpendicular to the
x-axis. The flow of particles across the faces at $x - \tfrac{1}{2}\delta r_x$ and at $x + \tfrac{1}{2}\delta r_x$ is

$$\Gamma_f^{(x)} = v_x\,\delta t\,\delta r_y\,\delta r_z\,\delta\mathbf{v}\,\left[f(x - \tfrac{1}{2}\delta r_x, y, z, \mathbf{v}, t) - f(x + \tfrac{1}{2}\delta r_x, y, z, \mathbf{v}, t)\right] \qquad \text{(A3.1.32)}$$

and similar expressions exist for the y- and z-directions. The function $f$ is supposed to be sufficiently smooth
that it can be expanded in a Taylor series around $(x,y,z)$. The zeroth-order terms between the brackets
cancel and the first-order terms add up. Neglecting higher-order terms and summing over all
directions then yields

$$\Gamma_f = -\delta t\,\delta\mathbf{r}\,\delta\mathbf{v}\,(\mathbf{v}\cdot\nabla) f(\mathbf{r},\mathbf{v},t). \qquad \text{(A3.1.33)}$$
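
The Taylor-expansion step can be illustrated in one dimension: the difference between the values of a smooth function on the two faces of a cell of width $\delta$ is approximated by $-\delta\, \partial f/\partial x$ at the centre, and the neglected remainder shrinks rapidly with $\delta$. The sketch below uses $\sin x$ as a stand-in for $f$; the test function, the evaluation point and the names are assumptions made for illustration.

```python
import math

def face_difference(f, x, delta):
    """f evaluated on the left face minus f on the right face of a cell centred at x."""
    return f(x - 0.5 * delta) - f(x + 0.5 * delta)

def first_order_estimate(dfdx, x, delta):
    """Leading Taylor term of the face difference, -delta * df/dx at the centre."""
    return -delta * dfdx(x)

def truncation_error(delta, x=0.3):
    """Size of the terms dropped by the first-order estimate, using f = sin."""
    exact = face_difference(math.sin, x, delta)
    return abs(exact - first_order_estimate(math.cos, x, delta))

ratio = truncation_error(0.1) / truncation_error(0.05)
print(f"error(0.1) / error(0.05) = {ratio:.3f}")
```

Halving the cell width reduces the error by a factor close to eight, because the even-order terms cancel in the difference and the leading neglected term is cubic in $\delta$.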


Next we consider the computation of the loss term, $\Gamma_-$. As in the calculation of the mean free path, we need
to calculate the number of collisions suffered by particles with velocity $\mathbf{v}$ in the region $\delta\mathbf{r}\,\delta\mathbf{v}$ in time $\delta t$,
assuming that each such collision results in a change of the velocity of the particle. We carry out the
calculation in several steps. First, we focus our attention on a particular particle with velocity $\mathbf{v}$, and suppose
that it is going to collide sometime during the interval $[t, t + \delta t]$ with a particle with velocity $\mathbf{v}_1$. Now examine
again the coordinate system with origin at the centre of the particle with velocity $\mathbf{v}$, and with the z-axis
directed along the vector $\mathbf{g} = \mathbf{v}_1 - \mathbf{v}$. By examining figure A3.1.3 one can easily see that if the particle with
velocity $\mathbf{v}_1$ is somewhere at time $t$ within the collision cylinder illustrated there, with volume $|\mathbf{v}_1 - \mathbf{v}|\,\pi a^2\,\delta t$,
this particle will collide sometime during the interval $[t, t + \delta t]$ with the particle with velocity $\mathbf{v}$, if no other
particles interfere, which we assume to be the case. These collision cylinders will be referred to as
$(\mathbf{v}_1,\mathbf{v})$-collision cylinders. We also ignore the possibility that the particle with velocity $\mathbf{v}_1$ might, at time $t$, be
somewhere within the action sphere of radius $a$ about the centre of the velocity-$\mathbf{v}$ particle, since such events
lead to terms that are of higher order in the density than those we are considering here, and such terms do not
even exist if the duration of a binary collision is strictly zero, as would be the case for hard spheres, for
example.

We now compute $\Gamma_-$ by noting again the steps involved in calculating the mean free path, but applying them
now to the derivation of an expression for $\Gamma_-$.

• The number of $(\mathbf{v}_1,\mathbf{v})$-collision cylinders in the region $\delta\mathbf{r}\,\delta\mathbf{v}$ is equal to the number of particles with
velocity $\mathbf{v}$ in this region, $f(\mathbf{r},\mathbf{v},t)\,\delta\mathbf{r}\,\delta\mathbf{v}$.

• Each $(\mathbf{v}_1,\mathbf{v})$-collision cylinder has the volume given above, and the total volume of these cylinders is
equal to the product of the volume of each such cylinder with the number of these cylinders, that is
$f(\mathbf{r},\mathbf{v},t)\,|\mathbf{v}_1 - \mathbf{v}|\,\pi a^2\,\delta\mathbf{r}\,\delta\mathbf{v}\,\delta t$.

• If we wish to know the number of $(\mathbf{v}_1,\mathbf{v})$-collisions that actually take place in this small time interval,
we need to know exactly where each particle is located and then follow the motion of all the particles
from time $t$ to time $t + \delta t$. In fact, this is what is done in computer-simulated molecular dynamics. We
wish to avoid this exact specification of the particle trajectories, and instead carry out a plausible
argument for the computation of $\Gamma_-$. To do this, Boltzmann made the following assumption, called
the Stosszahlansatz, which we encountered already in the calculation of the mean free path:

Stosszahlansatz. The total number of $(\mathbf{v}_1,\mathbf{v})$-collisions taking place in $\delta t$ equals the total volume of the
$(\mathbf{v}_1,\mathbf{v})$-collision cylinders times the number of particles with velocity $\mathbf{v}_1$ per unit volume.

After integration over $\mathbf{v}_1$, we obtain

$$\Gamma_- = \delta\mathbf{r}\,\delta\mathbf{v}\, f(\mathbf{r},\mathbf{v},t) \int d\mathbf{v}_1\, \delta t\, \pi a^2\, |\mathbf{v}_1 - \mathbf{v}|\, f(\mathbf{r},\mathbf{v}_1,t). \qquad \text{(A3.1.34)}$$

The gas has to be dilute because the collision cylinders are assumed not to overlap, and also because collisions
between more than two particles are neglected. Also it is assumed that $f$ hardly changes over $\delta\mathbf{r}$ so that the
distribution functions for both colliding particles can be taken at the same position $\mathbf{r}$.

The assumptions that go into the calculation of $\Gamma_-$ are referred to collectively as the assumption of molecular
chaos. In this context, this assumption says that the probability that a pair of particles with given velocities
will collide can be calculated by considering each particle separately and ignoring any correlation between the
probability for finding one particle with velocity $\mathbf{v}$ and the probability for finding another with velocity $\mathbf{v}_1$ in
the region $\delta\mathbf{r}$.

For the construction of $\Gamma_+$, we need to know how two particles can collide in such a way that one of them has
velocity $\mathbf{v}$ after the collision. The answer to this question can be found by a more careful examination of the
'direct' collisions which we have just discussed. To proceed with this examination, we note that the factor $\pi a^2$
appearing in (A3.1.34) can also be written as an integral over the impact parameters and azimuthal angles of
the $(\mathbf{v}_1,\mathbf{v})$ collisions. That is, $\pi a^2 = \int b\, db \int d\epsilon$, where $b$, the impact parameter, is the initial distance between
the centre of the incoming $\mathbf{v}_1$-particle and the axis of the collision cylinder (z-axis), and $\epsilon$ is the angle between
the x-axis and the position of the $\mathbf{v}_1$-particle in the x-y plane. Here $0 < b < a$, and $0 < \epsilon < 2\pi$. The laws of
conservation of linear momentum, angular momentum, and energy require that both the impact parameter $b$,
and $|\mathbf{g}| = |\mathbf{v}_1 - \mathbf{v}|$, the magnitude of the relative velocity, be the same before and after the collision. To see what
this means let us follow the two particles through and beyond a direct collision. We denote all quantities after
the collision by primes. The conservation of momentum

$$\mathbf{v}_1 + \mathbf{v} = \mathbf{v}_1' + \mathbf{v}'$$

implies, after squaring and using conservation of energy

$$v_1^2 + v^2 = v_1'^2 + v'^2$$

that

$$\mathbf{v}_1 \cdot \mathbf{v} = \mathbf{v}_1' \cdot \mathbf{v}'.$$

By multiplying this result by a factor of -2, and adding the result to the conservation of energy equation, one
easily finds $|\mathbf{g}| = |\mathbf{g}'| = |\mathbf{v}_1' - \mathbf{v}'|$. This result, taken together with conservation of angular momentum,
$\mu g b = \mu g' b'$, where $\mu = m/2$ is the reduced mass of the two-particle system, shows that $b$ is also conserved,
$b = b'$. This is illustrated in figure A3.1.7.



Figure A3.1.7. Direct and restituting collisions in the relative coordinate frame. The collision cylinders as
well as the appropriate scattering and azimuthal angles are illustrated.

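Next, we denote the line between the centres of the two particles at the point of closest approach by the unit
vector $\hat{\mathbf{k}}$. In figure A3.1.7 it can also be seen that the vectors $-\mathbf{g}$ and $\mathbf{g}'$ are each other's mirror images in the
direction of $\hat{\mathbf{k}}$ in the plane of the trajectory of particles:

$$\mathbf{g}' = \mathbf{g} - 2(\mathbf{g}\cdot\hat{\mathbf{k}})\hat{\mathbf{k}} \qquad \text{(A3.1.35)}$$

and thus $(\mathbf{g}\cdot\hat{\mathbf{k}}) = -(\mathbf{g}'\cdot\hat{\mathbf{k}})$. Together with conservation of momentum this gives

$$\mathbf{v}_1' = \mathbf{v}_1 - (\mathbf{g}\cdot\hat{\mathbf{k}})\hat{\mathbf{k}}, \qquad \mathbf{v}' = \mathbf{v} + (\mathbf{g}\cdot\hat{\mathbf{k}})\hat{\mathbf{k}}. \qquad \text{(A3.1.36)}$$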


The main point of this argument is to show that if particles with velocities $\mathbf{v}'$ and $\mathbf{v}_1'$ collide in the right
geometric configuration with impact parameter $b$, such a collision will result in one of the particles having the
velocity of interest, $\mathbf{v}$, after the collision. These kinds of collisions, which produce particles with velocity $\mathbf{v}$,
contribute to $\Gamma_+$, and are referred to as 'restituting' collisions. This is illustrated in figure A3.1.8, where particles
having velocities $\mathbf{v}'$ and $\mathbf{v}_1'$ are arranged to collide in such a way that the unit vector of closest approach, $\hat{\mathbf{k}}$, is
replaced by $-\hat{\mathbf{k}}$. Consider, then, a collision with initial velocities $\mathbf{v}_1'$ and $\mathbf{v}'$ and the same impact parameter as in
the direct collision, but with $\hat{\mathbf{k}}$ replaced by $-\hat{\mathbf{k}}$. The final velocities are now $\mathbf{v}_1''$ and $\mathbf{v}''$, which are equal to $\mathbf{v}_1$ and
$\mathbf{v}$, respectively, because

$$\mathbf{v}_1'' = \mathbf{v}_1' - (\mathbf{g}'\cdot\hat{\mathbf{k}})\hat{\mathbf{k}} = \mathbf{v}_1' + (\mathbf{g}\cdot\hat{\mathbf{k}})\hat{\mathbf{k}} = \mathbf{v}_1 \qquad \text{(A3.1.37)}$$

and

$$\mathbf{v}'' = \mathbf{v}' + (\mathbf{g}'\cdot\hat{\mathbf{k}})\hat{\mathbf{k}} = \mathbf{v}' - (\mathbf{g}\cdot\hat{\mathbf{k}})\hat{\mathbf{k}} = \mathbf{v}. \qquad \text{(A3.1.38)}$$

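Thus the increase of particles in our region due to restituting collisions with an impact parameter between $b$
and $b + db$ and azimuthal angle between $\epsilon$ and $\epsilon + d\epsilon$ (see figure A3.1.7) can be obtained by adjusting the
expression for the decrease of particles due to a 'small' collision cylinder:

Loss: $\delta t\, b\, db\, d\epsilon\, |\mathbf{g}|\, f(\mathbf{r},\mathbf{v},t)\, f(\mathbf{r},\mathbf{v}_1,t)\, \delta\mathbf{v}\, \delta\mathbf{v}_1\, \delta\mathbf{r}$

Gain: $\delta t\, b\, db\, d\epsilon\, |\mathbf{g}'|\, f(\mathbf{r},\mathbf{v}',t)\, f(\mathbf{r},\mathbf{v}_1',t)\, \delta\mathbf{v}'\, \delta\mathbf{v}_1'\, \delta\mathbf{r}$

where $b$ has to be integrated from 0 to $a$, and $\epsilon$ from 0 to $2\pi$. Also, by considering the Jacobian for the
transformation to relative and centre-of-mass velocities, one easily finds that $d\mathbf{v}_1\, d\mathbf{v} = d\mathbf{V}\, d\mathbf{g}$, where $\mathbf{V}$ is the
velocity of the centre of mass of the two colliding particles with respect to the container. After a collision, $\mathbf{g}$ is
rotated in the centre-of-mass frame, so the Jacobian of the transformation $(\mathbf{V},\mathbf{g}) \to (\mathbf{V}',\mathbf{g}')$ is unity and
$d\mathbf{V}\, d\mathbf{g} = d\mathbf{V}'\, d\mathbf{g}'$. So

$$d\mathbf{v}_1\, d\mathbf{v} = d\mathbf{V}\, d\mathbf{g} = d\mathbf{V}'\, d\mathbf{g}' = d\mathbf{v}_1'\, d\mathbf{v}'. \qquad \text{(A3.1.39)}$$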
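
The collision rule and its reversal are easy to verify numerically. The sketch below applies the rule $\mathbf{v}' = \mathbf{v} + (\mathbf{g}\cdot\hat{\mathbf{k}})\hat{\mathbf{k}}$, $\mathbf{v}_1' = \mathbf{v}_1 - (\mathbf{g}\cdot\hat{\mathbf{k}})\hat{\mathbf{k}}$ with $\mathbf{g} = \mathbf{v}_1 - \mathbf{v}$ to randomly chosen velocities, checks that momentum, kinetic energy and $|\mathbf{g}|$ are unchanged, and confirms that a second collision with $-\hat{\mathbf{k}}$ restores the original velocities. Equal masses are assumed, and the helper names and random inputs are illustrative.

```python
import math
import random

def dot(a, b):
    """Scalar product of two 3-vectors stored as lists."""
    return sum(x * y for x, y in zip(a, b))

def collide(v, v1, k):
    """Post-collision velocities for unit vector k, with g = v1 - v.

    Returns (v', v1') where v' = v + (g.k) k and v1' = v1 - (g.k) k.
    """
    g = [b - a for a, b in zip(v, v1)]
    gk = dot(g, k)
    v_new = [a + gk * c for a, c in zip(v, k)]
    v1_new = [b - gk * c for b, c in zip(v1, k)]
    return v_new, v1_new

def random_unit_vector(rng):
    """Uniformly distributed direction obtained by normalising a Gaussian vector."""
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(dot(x, x))
        if norm > 0.0:
            return [c / norm for c in x]

rng = random.Random(42)
v = [rng.gauss(0.0, 1.0) for _ in range(3)]
v1 = [rng.gauss(0.0, 1.0) for _ in range(3)]
k = random_unit_vector(rng)

vp, v1p = collide(v, v1, k)                        # direct collision
vpp, v1pp = collide(vp, v1p, [-c for c in k])      # restituting collision, k -> -k

momentum_change = max(abs((a + b) - (c + d)) for a, b, c, d in zip(v, v1, vp, v1p))
energy_change = abs(dot(v, v) + dot(v1, v1) - dot(vp, vp) - dot(v1p, v1p))
restitution_error = max(abs(a - b) for a, b in zip(v + v1, vpp + v1pp))
print(momentum_change, energy_change, restitution_error)
```

All three printed numbers are at the level of floating-point rounding.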

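Now we are in the correct position to compute $\Gamma_+$, using exactly the same kinds of arguments as in the
computation of $\Gamma_-$, namely, the construction of collision cylinders, computing the total volume of the relevant
cylinders and again making the Stosszahlansatz. Thus, we find that

$$\Gamma_+ = \int\!\!\int\!\!\int d\mathbf{v}_1'\, b\, db\, d\epsilon\, |\mathbf{v}_1' - \mathbf{v}'|\, f(\mathbf{r},\mathbf{v}',t)\, f(\mathbf{r},\mathbf{v}_1',t)\, \delta\mathbf{r}\, \delta\mathbf{v}'\, \delta t. \qquad \text{(A3.1.40)}$$

For every value of the velocity $\mathbf{v}$, the velocity ranges $d\mathbf{v}_1'$, $\delta\mathbf{v}'$ in the above expression are only over that range
of velocities $\mathbf{v}'$, $\mathbf{v}_1'$ such that particles with velocity in the range $\delta\mathbf{v}$ about $\mathbf{v}$ are produced in the
$(\mathbf{v}_1',\mathbf{v}')$-collisions. If we now use the equalities, equation (A3.1.39), as well as the fact that $|\mathbf{g}| = |\mathbf{g}'|$, we can write

$$\Gamma_+ = \int\!\!\int\!\!\int d\mathbf{v}_1\, b\, db\, d\epsilon\, |\mathbf{v}_1 - \mathbf{v}|\, f(\mathbf{r},\mathbf{v}',t)\, f(\mathbf{r},\mathbf{v}_1',t)\, \delta\mathbf{r}\, \delta\mathbf{v}\, \delta t. \qquad \text{(A3.1.41)}$$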


The term describing the interaction with the walls, $\Gamma_w$, is discussed in a paper by Dorfman and van Beijeren
[14].




Figure A3. 1.8. Schematic illustration of the direct and restituting collisions. 

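Finally, all of the $\Gamma$-terms can be inserted in (A3.1.31), and dividing by $\delta t\,\delta\mathbf{r}\,\delta\mathbf{v}$ gives the Boltzmann transport
equation

$$\frac{\partial f(\mathbf{r},\mathbf{v},t)}{\partial t} + \mathbf{v}\cdot\nabla f(\mathbf{r},\mathbf{v},t) = J(f,f) + \Gamma_w \qquad \text{(A3.1.42)}$$

where

$$J(f,f) = \int\!\!\int\!\!\int d\mathbf{v}_1\, b\, db\, d\epsilon\, |\mathbf{v}_1 - \mathbf{v}|\, [f' f_1' - f f_1].$$

The primes and subscripts on the $f$'s refer to their velocity arguments, and the primed velocities in the gain
term should be regarded as functions of the unprimed quantities according to (A3.1.36). It is often convenient
to rewrite the integral over the impact parameter and the azimuthal angle as an integral over the unit vector
$\hat{\mathbf{k}}$ as

$$g\, b\, db\, d\epsilon = B(\mathbf{g},\hat{\mathbf{k}})\, d\hat{\mathbf{k}} \qquad \text{(A3.1.43)}$$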


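where

$$d\hat{\mathbf{k}} = \sin(\pi - \psi)\, d(\pi - \psi)\, d\epsilon \qquad \text{(A3.1.44)}$$

and $\psi$ is the angle between $\mathbf{g}$ and $\hat{\mathbf{k}}$. Then $d\hat{\mathbf{k}} = |\sin\psi\, d\psi\, d\epsilon|$, so that

$$B(\mathbf{g},\hat{\mathbf{k}}) = |\mathbf{v}_1 - \mathbf{v}|\, \frac{b}{\sin\psi} \left|\frac{db}{d\psi}\right| \qquad \text{(A3.1.45)}$$

with the restriction for purely repulsive potentials that $\mathbf{g}\cdot\hat{\mathbf{k}} < 0$. As can be seen in figure A3.1.7,

$$B(\mathbf{g}',\hat{\mathbf{k}}) = B(\mathbf{g},-\hat{\mathbf{k}}). \qquad \text{(A3.1.46)}$$

Let us apply this to the situation where the molecules are hard spheres of diameter $a$. We have
$db/d\psi = d(a \sin\psi)/d\psi = a \cos\psi$ (see figure A3.1.9), and $B(\mathbf{g},\hat{\mathbf{k}}) = g a^2 |\cos\psi| = a^2 |(\mathbf{g}\cdot\hat{\mathbf{k}})|$.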



Figure A3. 1.9. Hard sphere collision geometry in the plane of the collision. Here a is the diameter of the 
spheres. 
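
As a consistency check on the hard-sphere kernel, integrating $B(\mathbf{g},\hat{\mathbf{k}}) = a^2 |\mathbf{g}\cdot\hat{\mathbf{k}}|$ over the hemisphere of unit vectors with $\mathbf{g}\cdot\hat{\mathbf{k}} < 0$ should reproduce $g$ times the collision cross section $\pi a^2$, since $g\, b\, db\, d\epsilon$ integrates to $g \pi a^2$. The sketch below does the polar integral with a simple midpoint rule; the step count and function names are arbitrary illustrative choices.

```python
import math

def hard_sphere_kernel(g, a, psi):
    """B = g * a**2 * |cos(psi)| for hard spheres of diameter a."""
    return g * a * a * abs(math.cos(psi))

def integrated_kernel(g, a, steps=2000):
    """Midpoint-rule integral of B over unit vectors with g.k < 0 (pi/2 < psi < pi)."""
    d_psi = 0.5 * math.pi / steps
    total = 0.0
    for i in range(steps):
        psi = 0.5 * math.pi + (i + 0.5) * d_psi
        # Solid-angle element is sin(psi) dpsi depsilon.
        total += hard_sphere_kernel(g, a, psi) * math.sin(psi) * d_psi
    return 2.0 * math.pi * total   # the azimuthal integral contributes 2*pi

g, a = 1.7, 0.9                    # arbitrary relative speed and diameter
print(integrated_kernel(g, a), math.pi * a * a * g)
```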

The Boltzmann equation for hard spheres is given then as

$$\frac{\partial f}{\partial t} + \mathbf{v}\cdot\nabla f = a^2 \int d\mathbf{v}_1 \int_{\mathbf{g}\cdot\hat{\mathbf{k}} < 0} d\hat{\mathbf{k}}\, |\mathbf{g}\cdot\hat{\mathbf{k}}|\, [f' f_1' - f f_1]. \qquad \text{(A3.1.47)}$$

This completes the heuristic derivation of the Boltzmann transport equation. Now we turn to Boltzmann's
argument that his equation implies the Clausius form of the second law of thermodynamics, namely, that the
entropy of an isolated system will increase as the result of any irreversible process taking place in the system.
This result is referred to as Boltzmann's H-theorem.


A3.1.3.1 BOLTZMANN'S H-THEOREM


Boltzmann showed that under very general circumstances, there exists a time-dependent quantity, $H(t)$, that
never increases in the course of time. This quantity is given by

$$H(t) = \int d\mathbf{r}_1 \int d\mathbf{v}_1\, f(\mathbf{r}_1,\mathbf{v}_1,t)\, [\ln f(\mathbf{r}_1,\mathbf{v}_1,t) - 1]. \qquad \text{(A3.1.48)}$$

Here, the spatial integral is to be carried out over the entire volume of the vessel containing the gas, and for
convenience we have changed the notation slightly. Now we differentiate $H$ with respect to time,

$$\frac{dH(t)}{dt} = \int d\mathbf{r}_1 \int d\mathbf{v}_1\, \frac{\partial f_1}{\partial t}\, \ln f_1 \qquad \text{(A3.1.49)}$$

and use the Boltzmann equation to find that

$$\frac{dH(t)}{dt} = \int\!\!\int d\mathbf{r}_1\, d\mathbf{v}_1\, [-(\mathbf{v}_1\cdot\nabla f_1) + J(f_1,f_1) + \Gamma_w]\, \ln f_1. \qquad \text{(A3.1.50)}$$


We are going to carry out some spatial integrations here. We suppose that the distribution function vanishes at
the surface of the container and that there is no flow of energy or momentum into or out of the container. (We
mention in passing that it is possible to relax this latter condition and thereby obtain a more general form of
the second law than we discuss here. This requires a careful analysis of the wall-collision term $\Gamma_w$. The
interested reader is referred to the article by Dorfman and van Beijeren [14]. Here, we will drop the wall
operator since for the purposes of this discussion it merely ensures that the distribution function vanishes at
the surface of the container.) The first term can be written as

$$-\int\!\!\int d\mathbf{r}_1\, d\mathbf{v}_1\, \mathbf{v}_1\cdot\nabla[f_1(\ln f_1 - 1)]. \qquad \text{(A3.1.51)}$$


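This can be evaluated easily in terms of the distribution function at the walls of the closed container and 
therefore it is zero. The second term of (A3.1.50) is based on the Stosszahlansatz, and is 

$$\frac{dH(t)}{dt} = \iiiint d\mathbf{r}_1\, d\mathbf{v}_1\, d\mathbf{v}_2\, d\hat{\mathbf{k}}\, B(g, \hat{\mathbf{k}})\, \Psi(\mathbf{v}_1)\, (f_1' f_2' - f_1 f_2) \tag{A3.1.52}$$

with $\Psi(\mathbf{v}_1) = \ln f_1$. The integrand may be symmetrized in $\mathbf{v}_1$ and $\mathbf{v}_2$ to give 

$$\frac{dH(t)}{dt} = \frac{1}{2} \iiiint d\mathbf{r}_1\, d\mathbf{v}_1\, d\mathbf{v}_2\, d\hat{\mathbf{k}}\, B(g, \hat{\mathbf{k}})\, [\Psi(\mathbf{v}_1) + \Psi(\mathbf{v}_2)]\, (f_1' f_2' - f_1 f_2).$$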


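For each collision there is an inverse one, so we can also express the time derivative of the H-function in 
terms of the inverse collisions as 

$$\begin{aligned}
\frac{dH(t)}{dt} &= \frac{1}{2} \iiiint d\mathbf{r}_1\, d\mathbf{v}_1'\, d\mathbf{v}_2'\, d\hat{\mathbf{k}}\, B(g', -\hat{\mathbf{k}})\, [\Psi(\mathbf{v}_1') + \Psi(\mathbf{v}_2')]\, (f_1 f_2 - f_1' f_2') \\
&= -\frac{1}{2} \iiiint d\mathbf{r}_1\, d\mathbf{v}_1\, d\mathbf{v}_2\, d\hat{\mathbf{k}}\, B(g, \hat{\mathbf{k}})\, [\Psi(\mathbf{v}_1') + \Psi(\mathbf{v}_2')]\, (f_1' f_2' - f_1 f_2).
\end{aligned}$$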

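We obtain the H-theorem by adding these expressions and dividing by two, 

$$\frac{dH(t)}{dt} = \frac{1}{4} \iiiint d\mathbf{r}_1\, d\mathbf{v}_1\, d\mathbf{v}_2\, d\hat{\mathbf{k}}\, B(g, \hat{\mathbf{k}})\, [\Psi_1 + \Psi_2 - \Psi_1' - \Psi_2']\, (f_1' f_2' - f_1 f_2).$$

Now, using $\Psi(\mathbf{v}_1) = \ln f_1$, the first factor in the integrand becomes $\ln(f_1 f_2 / f_1' f_2')$, and we obtain 
the following. If $f_1 f_2 \neq f_1' f_2'$ the integrand is negative: 

if $f_1 f_2 < f_1' f_2'$ the second factor is positive and the first is negative; 
if $f_1 f_2 > f_1' f_2'$ the second is negative and the first is positive. 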


Both cases give a decreasing H(t). That is 

$$\frac{dH(t)}{dt} \le 0. \tag{A3.1.53}$$

The integral is zero only if for all $\mathbf{v}_1$ and $\mathbf{v}_2$ 

$$f_1 f_2 = f_1' f_2'. \tag{A3.1.54}$$

This is Boltzmann's H-theorem. 

We now show that when H is constant in time, the gas is in equilibrium. The existence of an equilibrium state 
requires the rates of the restituting and direct collisions to be equal; that is, that there is a detailed balance of 
gain and loss processes taking place in the gas. 

Taking the natural logarithm of (A3.1.54), we see that $\ln f_1 + \ln f_2$ has to be conserved for an equilibrium 
solution of the Boltzmann equation. Therefore, $\ln f_1$ can generally be expressed as a linear combination with 
constant coefficients of the (d + 2) quantities conserved by binary collisions, i.e. (i) the number of particles, 
(ii) the d components of the linear momentum, where d is the number of dimensions, and (iii) the kinetic energy: 
$\ln f_1 = A + \mathbf{B} \cdot \mathbf{v}_1 + C v_1^2$. (An angular momentum term added to $\ln(f_1 f_2)$ is not independent of 
conservation of momentum, because the positions of the particles are the same.) The particles are assumed to 
have no internal degrees of freedom. Then 

$$f_1 \propto \exp(\mathbf{B} \cdot \mathbf{v}_1 + C v_1^2) = \mathcal{A} \exp[-\tfrac{1}{2} \beta m (\mathbf{v}_1 - \mathbf{u})^2]. \tag{A3.1.55}$$

When H has reached its minimum value this is the well known Maxwell-Boltzmann distribution for a gas in 
thermal equilibrium with a uniform motion u. So, argues Boltzmann, solutions of his equation for an isolated 
system approach an equilibrium state, just as real gases seem to do. Up to a negative factor ($-k_B$, in fact), 
differences in H are the same as differences in the thermodynamic entropy between initial and final 
equilibrium states. Boltzmann thought that his H-theorem gave a foundation for the increase in entropy as a 
result of the collision integral, whose derivation was based on the Stosszahlansatz. 
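
The sign argument behind (A3.1.53) and the equilibrium condition (A3.1.54) can be checked numerically. The sketch below is only an illustration under stated assumptions: it assumes equal-mass particles and the elastic collision rule v1' = v1 - (g.k)k, v2' = v2 + (g.k)k with g = v1 - v2, and the function names, the trial distributions and all parameter values are choices made here rather than anything taken from the text. For a Maxwell-Boltzmann distribution the factor [ln(f1 f2) - ln(f1' f2')](f1' f2' - f1 f2) should vanish for every collision, while for any other positive distribution it should never be positive.

```python
import math
import random

def collide(v1, v2, k):
    """Elastic collision of two equal-mass particles; k is a unit vector."""
    gk = sum((a - b) * c for a, b, c in zip(v1, v2, k))  # (v1 - v2) . k
    v1p = [a - gk * c for a, c in zip(v1, k)]
    v2p = [b + gk * c for b, c in zip(v2, k)]
    return v1p, v2p

def maxwellian(v, beta_m=1.3, u=(0.2, -0.1, 0.4)):
    """Unnormalized exp[-(beta m / 2)(v - u)^2], cf. (A3.1.55)."""
    return math.exp(-0.5 * beta_m * sum((a - b) ** 2 for a, b in zip(v, u)))

def non_maxwellian(v):
    """An arbitrary positive trial distribution that is not an equilibrium one."""
    return (math.exp(-sum(abs(a) ** 3 for a in v))
            + 0.1 * math.exp(-sum(a * a for a in v)))

def random_unit_vector(rng):
    """Isotropic unit vector from three normalized Gaussian components."""
    while True:
        k = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in k))
        if norm > 1e-12:
            return [c / norm for c in k]

def integrand_factor(f, v1, v2, k):
    """[ln(f1 f2) - ln(f1' f2')] * (f1' f2' - f1 f2) for one collision."""
    v1p, v2p = collide(v1, v2, k)
    direct = f(v1) * f(v2)
    restituting = f(v1p) * f(v2p)
    return (math.log(direct) - math.log(restituting)) * (restituting - direct)

rng = random.Random(0)
samples = []
for _ in range(1000):
    v1 = [rng.uniform(-2.0, 2.0) for _ in range(3)]
    v2 = [rng.uniform(-2.0, 2.0) for _ in range(3)]
    samples.append((v1, v2, random_unit_vector(rng)))

# Largest magnitude for the Maxwellian (should be zero up to round-off).
max_equilibrium = max(abs(integrand_factor(maxwellian, *s)) for s in samples)
# Largest value for the non-equilibrium trial function (should not be positive).
max_nonequilibrium = max(integrand_factor(non_maxwellian, *s) for s in samples)
```

The first quantity tests that ln f is a sum of collision invariants for the Maxwellian; the second tests that the two factors in the integrand always carry opposite signs.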

(A) THE REVERSIBILITY AND THE RECURRENCE PARADOXES 

Boltzmann's H-theorem raises a number of questions, particularly the central one: how can a gas that is 
described exactly by the reversible laws of mechanics be characterized by a quantity that always decreases? 
Perhaps a non-mechanical assumption was introduced here. If so, this would suggest, although not imply, that 
Boltzmann's equation might not be a useful description of nature. In fact, though, this equation is so useful 
and accurate a predictor of the properties of dilute gases, that it is now often used as a test of intermolecular 
potential models. 

The question stated above was formulated in two ways, each using an exact result from classical mechanics. 
One way, associated with the physicist Loschmidt, is fairly obvious. If classical mechanics provides a correct 
description of the gas, then associated with any physical motion of a gas, there is a time-reversed motion, 
which is also a solution of Newton's equations. Therefore if H decreases in one of these motions, there ought 
to be a physical motion of the gas where H increases. This is contrary to the H-theorem. The other objection is 
based on the recurrence theorem of Poincaré [15], and is associated with the mathematician Zermelo. 
Poincaré's theorem states that in a bounded mechanical system with finite energy, any initial state of the gas 
will eventually recur as a state of the gas, to within any preassigned accuracy. Thus, if H decreases during part 
of the motion, it must eventually increase so as to approach, arbitrarily closely, its initial value. 

The recurrence paradox is easy to refute, and Boltzmann did so. He pointed out that the recurrence 
time even for a system of several particles, much less a system of $10^{23}$ particles, is so enormously long 
(orders of magnitude larger than the age of the universe) that one will never live long enough to observe a 
recurrence. The usual response to Loschmidt is to argue that while the gas is indeed a mechanical system, 
almost all initial states of the gas one is likely to encounter in the laboratory will show an approach to 
equilibrium as described by the H-theorem. That is, the Boltzmann equation describes the most typical 
behaviour of a gas. While an anti-Boltzmann-like behaviour is not ruled out by mechanics, it is very unlikely, 
in a statistical sense, since such a motion would require a very careful (to put it mildly) preparation of the 
initial state. Thus, the reversibility paradox is more subtle, and the analysis of it eventually led Boltzmann to 
the very fruitful idea of an ergodic system [16]. In any case, there is no reason to doubt the validity of the 
Boltzmann equation for the description of irreversible processes in dilute gases. It describes the typical 
behaviour of a laboratory system, while any individual system may have small fluctuations about this typical 
behaviour. 

A3.1.3.2 THE CHAPMAN-ENSKOG NORMAL SOLUTIONS OF THE BOLTZMANN EQUATION 

The practical value of the Boltzmann equation resides in the utility of the predictions that one can obtain from 
it. The form of the Boltzmann equation is such that it can be used to treat systems with long range forces, such as 
Lennard-Jones particles, as well as systems with finite-range forces. Given a potential energy function, one 
can calculate the necessary collision cross sections as well as the various restituting velocities well enough to 
derive practical expressions for transport coefficients from the Boltzmann equation. The method for obtaining 
solutions of the equation used for fluid dynamics is due to Enskog and Chapman, and proceeds by finding 
solutions that can be expanded in a series whose first term is a Maxwell-Boltzmann distribution of local 
equilibrium form. That is, the first term takes the form given by (A3.1.55), with the quantities $\mathcal{A}$, $\beta$ and 
$\mathbf{u}$ being functions of r and t. One then assumes that the local temperature, $(k_B \beta)^{-1}$, mean velocity, u, and 
local density, n, are slowly varying in space and time, where the distance over which they change, L, say, is 
large compared with a mean free path, $\lambda$. The higher terms in the Chapman-Enskog solution are then expressed 
in a power series in gradients of the five variables, n, $\beta$ and u, which can be shown to be an expansion in powers 
of $\lambda/L \ll 1$. Explicit results are then obtained for the first, and higher, order solution in $\lambda/L$, which in turn 
lead to Navier-Stokes as well as higher order hydrodynamic equations. Explicit expressions are obtained for the 
various transport coefficients, which can then be compared with experimental data. The agreement is 
sufficiently close that the theoretical results provide a useful way for checking the accuracy of various trial 
potential energy functions. A complete account of the Chapman-Enskog solution method can be found in the 
book by Chapman and Cowling [3], and comparisons with experiments, the extension to polyatomic 
molecules, and to quantum gases, are discussed at some length in the books of Hirschfelder et al [4], of Hanley 
[5] and of Kestin [17] as well as in an enormous literature. 


A3.1.3.3 EXTENSION OF THE BOLTZMANN EQUATION TO HIGHER DENSITIES 

It took well over three quarters of a century for kinetic theory to develop to the point that a systematic 
extension of the Boltzmann equation to higher densities could be found. This was due to the work of 
Bogoliubov, Green and Cohen, who borrowed methods from equilibrium statistical mechanics, particularly 
the use of cluster expansion techniques, to obtain a virial expansion of the right-hand side of the Boltzmann 
equation to include the effects of collisions among three, four and higher numbers of particles, successively. 
However, this virial expansion was soon found to diverge term-by-term in the density, beyond the three-body 
term in three dimensions, and beyond the two-body term in two dimensions. In order to obtain a well behaved 
generalized Boltzmann equation one has to sum these divergent terms. This has been accomplished using 
various methods. One finds that the transport coefficients for moderately dense gases cannot be expressed 
strictly in a power series of the gas density, but small logarithmic terms in the density also appear. Moreover, 
one finds that long-range correlations exist in a non-equilibrium gas, that make themselves felt in the effects 
of light scattering by a dense fluid with gradients in temperature or local velocity. Reviews of the theory of 
non-equilibrium processes in dense gases can be found in articles by Dorfman and van Beijeren [14], by 
Cohen [18] and by Ernst [19], as well as in the book of Resibois and de Leener [7]. 


A3.1.4 FLUCTUATIONS IN GASES 

Statistical mechanics and kinetic theory, as we have seen, are typically concerned with the average behaviour 
of an ensemble of similarly prepared systems. One usually hopes, and occasionally can demonstrate, that the 
variations of these properties from one system to another in the ensemble, or that the variation with time of 
the properties of any one system, are very small. There is a well developed theory for equilibrium fluctuations 
of these types. In 
this theory one can relate, for example, the specific heat at constant volume of a system in contact with a 
thermal reservoir to the mean square fluctuations of the energy of the system. It is also well known that the 
scattering of light by a fluid system is determined by the density fluctuations in the system, caused by the 
motion of the particles in the system. These thermal fluctuations in density, temperature and other equilibrium 
properties of the system typically scale to zero as the number of degrees of freedom in the system becomes 
infinite. A good account of these equilibrium fluctuations can be found in the text by Landau and Lifshitz 
[20]. 

When a system is not in equilibrium, the mathematical description of fluctuations about some time-dependent 
ensemble average can become much more complicated than in the equilibrium case. However, starting with 
the pioneering work of Einstein on Brownian motion in 1905, considerable progress has been made in 
understanding time-dependent fluctuation phenomena in fluids. Modern treatments of this topic may be found 
in the texts by Keizer [21] and by van Kampen [22]. Nevertheless, the non-equilibrium theory is not yet at the 
same level of rigour or development as the equilibrium theory. Here we will discuss the theory of Brownian 
motion since it illustrates a number of important issues that appear in more general theories. 

We consider the motion of a large particle in a fluid composed of lighter, smaller particles. We also suppose 
that the mean free path of the particles in the fluid, $\lambda$, is much smaller than a characteristic size, R, of the large 
particle. The analysis of the motion of the large particle is based upon a method due to Langevin. Consider the 
equation of motion of the large particle. We write it in the form 

$$M \frac{d\mathbf{v}(t)}{dt} = -\zeta \mathbf{v}(t) + \mathbf{F}(t) \tag{A3.1.56}$$


where M is the mass of the large particle, $\mathbf{v}(t)$ is its velocity at time t, the quantity $-\zeta \mathbf{v}(t)$ represents the 
hydrodynamic friction exerted by the fluid on the particle, while the term $\mathbf{F}(t)$ represents the fluctuations in 
the force on the particle produced by the discontinuous nature of the collisions with the fluid particles. If the 
Brownian particle is spherical, with radius R, then $\zeta$ is usually taken to have the form provided by Stokes' law 
of friction on a slowly moving particle by a continuum fluid, 

$$\zeta = 6 \pi \eta R \tag{A3.1.57}$$

where $\eta$ is the shear viscosity of the fluid. The fact that the fluid is not a continuum is incorporated in the 
fluctuating force $\mathbf{F}(t)$. This fluctuating force is taken to have the following properties 

$$\langle \mathbf{F}(t) \rangle = 0 \tag{A3.1.58}$$

$$\langle F_i(t_1) F_j(t_2) \rangle = A\, \delta(t_1 - t_2)\, \delta_{Kr}(i, j) \tag{A3.1.59}$$

where A is some constant yet to be determined, $\delta(t_1 - t_2)$ is a Dirac delta function in the time interval between 
$t_1$ and $t_2$, and $\delta_{Kr}(i, j)$ is a Kronecker delta function in the components of the fluctuating force in the directions 
denoted by i, j. 

The angular brackets denote an average, but the averaging process is somewhat subtle, and discussed in some 
detail in the book of van Kampen [22]. For our purposes, we will take the average to be over an ensemble 
constructed by following a long trajectory of one Brownian particle, cutting the trajectory into a large number 
of smaller, disjoint pieces, all of the same length, and then taking averages over all of the pieces. Thus the 
pieces of the long trajectory define the ensemble over which we average. The delta function correlation in the 
random force assumes that the collisions of the Brownian particle with the particles of the fluid are 
instantaneous and uncorrelated with each other. 

If we now average the Langevin equation, (A3.1.56), we obtain a very simple equation for $\langle \mathbf{v}(t) \rangle$, whose 
solution is clearly 

$$\langle \mathbf{v}(t) \rangle = \langle \mathbf{v}(0) \rangle e^{-\zeta t / M}. \tag{A3.1.60}$$

Thus the average velocity decays exponentially to zero on a time scale determined by the friction coefficient 
and the mass of the particle. This average behaviour is not very interesting, because it corresponds to the 
average of a quantity that may take values in all directions, due to the noise and friction, and so the decay of 
the average value tells us little about the details of the motion of the Brownian particle. A more interesting 
quantity is the mean square velocity, $\langle v^2(t) \rangle$, obtained by solving (A3.1.56), squaring the solution and then 
averaging. Here the correlation function of the random force plays a central role, for we find 

$$\langle v^2(t) \rangle = \langle v^2(0) \rangle e^{-2 \zeta t / M} + \frac{3A}{2 M \zeta} (1 - e^{-2 \zeta t / M}). \tag{A3.1.61}$$


Notice that this quantity does not decay to zero as time becomes long, but rather it reaches the value 

$$\langle v^2(t) \rangle \to \frac{3A}{2 M \zeta} \tag{A3.1.62}$$

as $t \to \infty$. Here we have the first appearance of A, the coefficient of the correlation of the random force. It is 
reasonable to suppose that in the infinite-time limit, the Brownian particle becomes equilibrated with the 
surrounding fluid. This means that the average kinetic energy of the Brownian particle should approach the 
value $3 k_B T / 2$ as time gets large. This is consistent with (A3.1.62), if the coefficient A has the value 

$$A = 2 k_B T \zeta. \tag{A3.1.63}$$

Thus, the requirement that the Brownian particle becomes equilibrated with the surrounding fluid fixes the 
unknown value of A, and provides an expression for it in terms of the friction coefficient and the thermodynamic 
temperature of the fluid. Equation (A3.1.63) is the simplest and best 
known example of a fluctuation-dissipation theorem, obtained by using an equilibrium condition to relate the 
strength of the fluctuations to the frictional forces acting on the particle [22]. 
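
The role of the fluctuation-dissipation value of A can be illustrated by integrating the Langevin equation (A3.1.56) numerically. The following is a minimal sketch, not a production integrator: it assumes a simple Euler-Maruyama scheme, arbitrary units, and the noise strength A = 2 zeta k_B T obtained from the equipartition argument above. The function name, the parameter values and the ensemble size are all choices made here for illustration. Every particle starts at rest, and after many relaxation times M/zeta the ensemble average of v.v is compared with the equipartition value 3 k_B T / M.

```python
import math
import random

def simulate_mean_square_velocity(zeta, mass, kT, n_particles=1000,
                                  n_steps=300, dt_fraction=0.02, seed=1):
    """Euler-Maruyama integration of M dv/dt = -zeta v + F(t) in three dimensions.

    Each force component is white noise with <F_i(t1) F_i(t2)> = A delta(t1 - t2)
    and A = 2 zeta kT. Returns the ensemble average of v.v after n_steps steps.
    """
    rng = random.Random(seed)
    dt = dt_fraction * mass / zeta       # time step in units of M / zeta
    A = 2.0 * zeta * kT                  # fluctuation-dissipation value of A
    kick = math.sqrt(A * dt) / mass      # standard deviation of one velocity kick
    decay = 1.0 - zeta * dt / mass       # frictional damping over one step
    total = 0.0
    for _ in range(n_particles):
        v = [0.0, 0.0, 0.0]              # every particle starts at rest
        for _ in range(n_steps):
            v = [decay * c + kick * rng.gauss(0.0, 1.0) for c in v]
        total += sum(c * c for c in v)
    return total / n_particles

zeta, mass, kT = 2.0, 0.5, 1.5           # arbitrary illustrative units
v2_simulated = simulate_mean_square_velocity(zeta, mass, kT)
v2_equipartition = 3.0 * kT / mass       # long-time limit implied by equipartition
```

With these settings the simulated value should agree with the equipartition value to within a few per cent; the residual difference comes from the finite time step and the finite ensemble.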

Two more important ideas can be illustrated by means of the Langevin approach to Brownian motion. The 
first result comes from a further integration of the velocity equation to find an expression for the fluctuating 
displacement of the moving particle, and for the mean square displacement as a function of time. By carrying 
out the relevant integrals and using the fluctuation-dissipation theorem, we can readily see that the mean 
square displacement, $\langle (\mathbf{r}(t) - \mathbf{r}(0))^2 \rangle$, grows linearly in time t, for large times, as 

$$\langle (\mathbf{r}(t) - \mathbf{r}(0))^2 \rangle = \frac{6 k_B T}{\zeta}\, t. \tag{A3.1.64}$$


Now for a particle undergoing diffusion, it is also known that its mean square displacement grows linearly in 
time, for long times, as 

$$\langle (\mathbf{r}(t) - \mathbf{r}(0))^2 \rangle = 6 D t \tag{A3.1.65}$$

where D is the diffusion coefficient of the particle. By comparing (A3.1.64) and (A3.1.65) we see that 

$$D = \frac{k_B T}{\zeta}. \tag{A3.1.66}$$

This result is often called the Stokes-Einstein formula for the diffusion of a Brownian particle, and the Stokes' 
law friction coefficient $6 \pi \eta R$ is used for $\zeta$. 
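
As a worked number, the Stokes-Einstein formula can be evaluated for a micrometre-sized sphere in water. The inputs below are assumptions chosen for illustration (a temperature of 298.15 K, a viscosity of 8.9e-4 Pa s as a typical value for water near room temperature, and a radius of 0.5 micrometres); only the formula itself comes from the text, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def stokes_einstein(temperature, viscosity, radius):
    """D = k_B T / (6 pi eta R), combining (A3.1.57) and (A3.1.66)."""
    zeta = 6.0 * math.pi * viscosity * radius   # Stokes friction coefficient
    return K_B * temperature / zeta

# Assumed inputs: water near room temperature, sphere of 1 micrometre diameter.
D = stokes_einstein(temperature=298.15, viscosity=8.9e-4, radius=0.5e-6)
print(f"D = {D:.2e} m^2/s")
```

The result is of order 5 x 10^-13 m^2/s, so by (A3.1.65) such a particle wanders a root mean square distance of roughly a couple of micrometres in one second.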

The final result that we wish to present in this connection is an example of the Green-Kubo time-correlation 
expressions for transport coefficients. These expressions give the transport coefficients of a fluid, such as 
viscosity, thermal conductivity, etc, in terms of time integrals of some average time-correlation of an 
appropriate microscopic variable. For example, if we were to compute the time correlation function of one 
component of the velocity of the Brownian particle, $\langle v_x(t_1) v_x(t_2) \rangle$, we would obtain 

$$\langle v_x(t_1) v_x(t_2) \rangle = \frac{k_B T}{M} e^{-\zeta |t_1 - t_2| / M} \tag{A3.1.67}$$

for large times, neglecting factors that decay exponentially in both $t_1$ and $t_2$. The Green-Kubo formula for 
diffusion relates the diffusion coefficient, D, to the time integral of the time-correlation of the velocity through 

$$D = \int_0^\infty d\tau\, \langle v_x(0) v_x(\tau) \rangle, \tag{A3.1.68}$$


a result which clearly reproduces (A3.1.66). The Green-Kubo formulae are of great interest in kinetic theory 
and non-equilibrium statistical mechanics since they provide a new set of functions, the time-correlation 
functions, that tell us more about the microscopic properties of the fluid than do the transport coefficients 
themselves, and that are very useful for analysing fluid behaviour when making computer simulations of the fluid. 
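
The statement that the Green-Kubo integral reproduces the Einstein result can be verified by direct quadrature. The sketch below assumes the exponential velocity autocorrelation (A3.1.67) and integrates it with the trapezoidal rule out to 40 relaxation times; the function names, the cut-off and the parameter values are illustrative choices, not part of the original treatment.

```python
import math

def velocity_autocorrelation(t, zeta, mass, kT):
    """<v_x(0) v_x(t)> = (kT / M) exp(-zeta t / M), cf. (A3.1.67)."""
    return (kT / mass) * math.exp(-zeta * t / mass)

def green_kubo_diffusion(zeta, mass, kT, n_points=20001, t_max_in_relax=40.0):
    """Trapezoidal estimate of the time integral of the velocity autocorrelation."""
    t_max = t_max_in_relax * mass / zeta      # upper limit, 40 relaxation times
    dt = t_max / (n_points - 1)
    values = [velocity_autocorrelation(i * dt, zeta, mass, kT)
              for i in range(n_points)]
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))

zeta, mass, kT = 2.0, 0.5, 1.5                # arbitrary illustrative units
D_green_kubo = green_kubo_diffusion(zeta, mass, kT)
D_einstein = kT / zeta                        # (A3.1.66)
```

In a computer simulation the analytic autocorrelation function would be replaced by one estimated from the particle trajectories, but the integration step is the same.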

A3.2 Non-equilibrium thermodynamics 


Ronald F Fox 


A3.2.1 INTRODUCTION 

Equilibrium thermodynamics may be developed as an autonomous macroscopic theory or it may be derived 
from microscopic statistical mechanics. The intrinsic beauty of the macroscopic approach is partially lost with 
the second treatment. Its beauty lies in its internal consistency. The advantage of the second treatment is that 
certain quantities are given explicit formulae in terms of fundamental constants, whereas the purely 
macroscopic approach must use measurements to determine these same quantities. The Stefan-Boltzmann 
constant is a prime example of this dichotomy. Using purely macroscopic thermodynamic arguments, 
Boltzmann showed that the energy emitted per second from a unit surface of a black body is $\sigma T^4$ 
where T is the temperature and $\sigma$ is the Stefan-Boltzmann constant, but it takes statistical mechanics to 
produce the formula 

$$\sigma = \frac{2 \pi^5 k^4}{15 c^2 h^3}$$


in which k is Boltzmann's constant, c is the speed of light and h is Planck's constant. This beautiful formula 
depends on three fundamental constants, an exhibition of the power of the microscopic viewpoint. Likewise, 
non-equilibrium thermodynamics may be developed as a purely autonomous macroscopic theory or it may be 
derived from microscopic kinetic theory, either classically or quantum mechanically. The separation between 
the macroscopic and microscopic approaches is a little less marked than for the equilibrium theory because 
the existence of the microscopic underpinning leads to the existence of fluctuations in the macroscopic 
picture, as well as to the celebrated Onsager reciprocal relations. On purely macroscopic grounds, the 
fluctuation-dissipation relation that connects the relaxation rates to the strengths of the fluctuations may be 
established, but it takes the full microscopic theory to compute their quantitative values, at least in principle. 
In practice, these computations are very difficult. This presentation is primarily about the macroscopic 
approach, although at the end the microscopic approach, based on linear response theory, is also reviewed. 
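
The microscopic formula for the Stefan-Boltzmann constant is easy to evaluate. The short sketch below simply inserts the SI values of k, c and h (all three are exact by definition in the current SI) into the expression quoted above; the variable names are chosen here for illustration.

```python
import math

k = 1.380649e-23      # Boltzmann constant, J/K (exact in the SI)
c = 299792458.0       # speed of light, m/s (exact in the SI)
h = 6.62607015e-34    # Planck constant, J s (exact in the SI)

# Stefan-Boltzmann constant from the statistical mechanical formula.
sigma = 2.0 * math.pi**5 * k**4 / (15.0 * c**2 * h**3)
print(f"sigma = {sigma:.6e} W m^-2 K^-4")
```

The value obtained, about 5.6704 x 10^-8 W m^-2 K^-4, is the one that a purely macroscopic treatment would have to take from measurement.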


The foundations for the macroscopic approach to non-equilibrium thermodynamics are found in Einstein's 
theory of Brownian movement [1] of 1905 and in the equation of Langevin [2] of 1908. Uhlenbeck and 
Ornstein [3] generalized these ideas in 1930 and Onsager [4, 5] presented his theory of irreversible processes 
in 1931. Onsager's theory [4] was initially deterministic with little mention of fluctuations. His second paper 
[5] used fluctuation theory to establish the reciprocal relations. The fundamental role of fluctuations was 
generalized by the works of Chandrasekhar [6] in 1943, of Wang and Uhlenbeck [7] in 1945, of Casimir [8] in 
1945, of Prigogine [9] in 1947 and of Onsager and Machlup [10, 11] in 1953. In 1962 de Groot and Mazur 
[12] published their definitive treatise Non-Equilibrium Thermodynamics which greatly extended the 
applicability of the theory as well as deepening its foundations. By this time, it was clearly recognized that the 
mathematical setting for non-equilibrium thermodynamics is the theory of stationary, Gaussian-Markov 
processes. The Onsager reciprocal relations may be most easily understood within this context. Nevertheless, 
the issue of the most general form for stationary, Gaussian-Markov processes, although 
broached by de Groot and Mazur [12], was not solved until the work of Fox and Uhlenbeck [13, 14 and 15] in 
1969. For example, this work made it possible to rigorously extend the Onsager theory to hydrodynamics and 
to the Boltzmann equation. 


A3.2.2 GENERAL STATIONARY GAUSSIAN-MARKOV PROCESSES 

A3.2.2.1 CHARACTERIZATION OF RANDOM PROCESSES 

Let a(t) denote a time dependent random process. a(t) is a random process because at time t the value of a(t) is 
not definitely known but is given instead by a probability distribution function $W_1(a, t)$ where a is the value 
a(t) can have at time t with probability determined by $W_1(a, t)$. $W_1(a, t)$ is the first of an infinite collection of 
distribution functions describing the process a(t) [7, 13]. The first two are defined by 

$W_1(a, t)\, da$ = probability at time t that the value of a(t) is between a and a + da 

$W_2(a_1, t_1; a_2, t_2)\, da_1\, da_2$ = probability that at time $t_1$ the value of a(t) is between $a_1$ and $a_1 + da_1$ and that at 
time $t_2$ the value of a(t) is between $a_2$ and $a_2 + da_2$. 

The higher order distributions are defined analogously. $W_2$ contains $W_1$ through the identity 

$$W_1(a_1, t_1) = \int da_2\, W_2(a_1, t_1; a_2, t_2).$$

Similar relations hold for the higher distributions. The Gaussian property of the process implies that all 
statistical information is contained in just $W_1$ and $W_2$. 

The condition that the process a(t) is a stationary process is equivalent to the requirement that all the 
distribution functions for a(t) are invariant under time translations. This has as a consequence that $W_1(a, t)$ is 
independent of t and that $W_2(a_1, t_1; a_2, t_2)$ only depends on $t = t_2 - t_1$. An even stationary process [4] has the 
additional requirement that its distribution functions are invariant under time reflection. For $W_2$, this implies 
$W_2(a_1; a_2, t) = W_2(a_2; a_1, t)$. This is called microscopic reversibility. It means that the quantities are even 
functions of the particle velocities [12]. It is also possible that the variables are odd functions of the particle 
velocities [8], say, $b_1$ and $b_2$ for which $W_2(b_1; b_2, t) = W_2(-b_2; -b_1, t)$. In the general case considered later, 
the thermodynamic quantities are a mixture of even and odd [14, 15]. For the odd case, the presence of a 
magnetic field, B, or a Coriolis force depending on angular velocity $\omega$, requires that B and $\omega$ also change sign 
during time reversal and microscopic reversibility reads as $W_2(b_1; b_2, B, \omega, t) = W_2(-b_2; -b_1, -B, -\omega, t)$. 
Examples of even processes include heat conduction, electrical conduction, diffusion and chemical reactions 
[4]. Examples of odd processes include the Hall effect [12] and rotating frames of reference [4]. Examples of 
the general setting that lacks even or odd symmetry include hydrodynamics [14] and the Boltzmann equation 
[15]. 

Before defining a Markov process a(t), it is necessary to introduce conditional probability distribution 
functions $P_2(a_1, t_1 \mid a_2, t_2)$ and $P_3(a_1, t_1; a_2, t_2 \mid a_3, t_3)$ defined by 

$P_2(a_1, t_1 \mid a_2, t_2)\, da_2$ = probability at time $t_2$ that the value of a(t) is between $a_2$ and $a_2 + da_2$ given that at 
time $t_1 < t_2$ a(t) had the definite value $a_1$. 

$P_3(a_1, t_1; a_2, t_2 \mid a_3, t_3)\, da_3$ = probability at time $t_3$ that the value of a(t) is between $a_3$ and $a_3 + da_3$ given that 
at time $t_2 < t_3$ a(t) had the definite value $a_2$ and at time $t_1 < t_2$ a(t) had the definite value $a_1$. 

These conditional distributions are related to the $W_n$ by 

$$W_2(a_1, t_1; a_2, t_2) = W_1(a_1, t_1)\, P_2(a_1, t_1 \mid a_2, t_2)$$

$$W_3(a_1, t_1; a_2, t_2; a_3, t_3) = W_2(a_1, t_1; a_2, t_2)\, P_3(a_1, t_1; a_2, t_2 \mid a_3, t_3)$$

and so forth for the higher order distributions. The Markov property of a(t) is defined by 

$$P_2(a_2, t_2 \mid a_3, t_3) = P_3(a_1, t_1; a_2, t_2 \mid a_3, t_3) \tag{A3.2.1}$$

which means that knowledge of the value of a(t) at time $t_1$ does not influence the distribution of values of a(t) 
at time $t_3 > t_1$ if there is also information giving the value of a(t) at the intermediate time $t_2$. Therefore, a 
Markov process is completely characterized by its $W_1$ and $P_2$ or equivalently by only $W_2$. A stationary 
Markov process has distributions satisfying the Smoluchowski equation (also called the Chapman-Kolmogorov 
equation) 

$$W_2(a_1; a_2, t) = \int da\, W_2(a_1; a, t - s)\, P_2(a \mid a_2, s) \quad \text{for all } s \in [0, t]. \tag{A3.2.2}$$

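Proof. For $t_1 < t_3 < t_2$ and using (A3.2.1) 

$$\begin{aligned}
W_2(a_1, t_1; a_2, t_2) &= \int da_3\, W_3(a_1, t_1; a_3, t_3; a_2, t_2) \\
&= \int da_3\, W_2(a_1, t_1; a_3, t_3)\, P_2(a_3, t_3 \mid a_2, t_2).
\end{aligned}$$

Setting $s = t_2 - t_3$ and $t = t_2 - t_1$ and $a_3 = a$ gives (A3.2.2) for a stationary process a(t). QED 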


While the Smoluchowski equation is necessary for a Markov process, in general it is not sufficient, but known 
counter-examples are always non-Gaussian as well. 


A3.2.2.2 THE LANGEVIN EQUATION 

The prototype for all physical applications of stationary Gaussian-Markov processes is the treatment of 
Brownian movement using the Langevin equation [2, 3, 7]. The Langevin equation describes the time change 
of the velocity of a slowly moving colloidal particle in a fluid. The effect of the interactions between the 
particle and the fluid molecules produces two forces. One force is an average effect, the frictional drag that is 
proportional to the velocity, whereas the other force is a fluctuating force, F(t), that has mean value zero. 
Therefore, a particle of mass M obeys the Langevin equation 

$$M \frac{du}{dt} = -\alpha u + F(t) \tag{A3.2.3}$$

where $\alpha$ is the frictional drag coefficient and u is the particle's velocity. It is the fluctuating force, F(t), that 
makes u a random process. For a sphere of radius R in a fluid of viscosity $\eta$, $\alpha = 6 \pi \eta R$, a result obtained 
from hydrodynamics by Stokes in 1854. To characterize this process it is necessary to make assumptions 
about F(t). F(t) is taken to be a stationary Gaussian process that is called white noise. This is defined by the 
correlation formula 

$$\langle F(t) F(s) \rangle = 2 \lambda\, \delta(t - s) \tag{A3.2.4}$$

where $\langle \ldots \rangle$ denotes averaging over F(t), $\lambda$ is a constant and the Dirac delta function of time expresses the 
quality of whiteness for the noise. The linearity of (A3.2.3) is sufficient to guarantee that u is also a stationary 
Gaussian process, although this claim requires some care as is shown below. 

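Equation (A3.2.3) must be solved with respect to some initial value for the velocity, u(0). In the conditional 
distribution for the process u(t), the initial value, u(0), is denoted by $u_0$ giving $P_2(u_0 \mid u, t)$. Because u(t) is a 
Gaussian process, $P_2(u_0 \mid u, t)$ is completely determined by the mean value of u(t) and by its mean square. 
Using (A3.2.4) and recalling that $\langle F(t) \rangle = 0$ it is easy to prove that 

$$\langle u(t) \rangle = u(0) \exp\left[-\frac{\alpha}{M} t\right] \tag{A3.2.5}$$

$$\langle u^2(t) \rangle = u^2(0) \exp\left[-2 \frac{\alpha}{M} t\right] + \frac{\lambda}{\alpha M} \left(1 - \exp\left[-2 \frac{\alpha}{M} t\right]\right) \tag{A3.2.6}$$

$$\langle u(t) u(s) \rangle = u^2(0) \exp\left[-\frac{\alpha}{M} (t + s)\right] + \frac{\lambda}{\alpha M} \left(\exp\left[-\frac{\alpha}{M} |t - s|\right] - \exp\left[-\frac{\alpha}{M} (t + s)\right]\right) \tag{A3.2.7}$$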


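Using $\sigma^2 = \lambda / \alpha M$ and $\rho(t) = \exp[-(\alpha / M) t]$, $P_2(u_0 \mid u, t)$ is given by 

$$P_2(u_0 \mid u, t) = \frac{\exp[-(u - u_0 \rho(t))^2 / 2 \sigma^2 (1 - \rho^2(t))]}{\sqrt{2 \pi \sigma^2 (1 - \rho^2(t))}} \tag{A3.2.8}$$

which is checked by seeing that it reproduces (A3.2.5) and (A3.2.6). From (A3.2.8), it is easily seen that the 
$t \to \infty$ limit eliminates any influence of $u_0$ 

$$\lim_{t \to \infty} P_2(u_0 \mid u, t) = W_1(u) = \frac{\exp[-\alpha M u^2 / 2 \lambda]}{\sqrt{2 \pi \lambda / \alpha M}}.$$

However, $W_1(u)$ should also be given by the equilibrium Maxwell distribution 

$$W_1(u) = \frac{\exp[-M u^2 / 2 k T]}{\sqrt{2 \pi k T / M}} \tag{A3.2.9}$$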

in which k is Boltzmann's constant and T is the equilibrium temperature of the fluid. The equality of these two 
expressions for $W_1(u)$ results in Einstein's relation 

$$\lambda = k T \alpha. \tag{A3.2.10}$$

Putting (A3.2.10) into (A3.2.4) gives the prototype example of what is called the fluctuation-dissipation 
relation 

$$\langle F(t) F(s) \rangle = 2 k T \alpha\, \delta(t - s).$$

Looking back at (A3.2.7), we see that a second average over u(0) can be performed using $W_1(u_0)$. This second 
type of averaging is denoted by {...}. Using (A3.2.9) we obtain 

$$\{u^2(0)\} = \frac{k T}{M}$$

which with (A3.2.7) and (A3.2.10), the Einstein relation, implies 

$$\{\langle u(t) u(s) \rangle\} = \frac{k T}{M} \exp\left[-\frac{\alpha}{M} |t - s|\right].$$

This result clearly manifests the stationarity of the process that is not yet evident in (A3.2.7). 


Using $W_2 = W_1 P_2$, (A3.2.8) and (A3.2.9) may be shown to satisfy the Smoluchowski equation, (A3.2.2), 
another necessary property for a stationary process. Thus u(t) is an example of a stationary Gaussian-Markov 
process. In the form given by (A3.2.3), the process u(t) is also called an Ornstein-Uhlenbeck process ('OU 
process'). 
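
That the Ornstein-Uhlenbeck distributions satisfy the Smoluchowski equation can also be confirmed numerically. The sketch below is an illustrative check under assumed values: it uses units in which $\sigma^2 = \lambda/\alpha M = 1$ and $\alpha/M = 1$, builds $W_2 = W_1 P_2$ from (A3.2.8) and (A3.2.9), and evaluates the integral in (A3.2.2) by the trapezoidal rule on a finite interval. The function names, the integration range and the test point are choices made here.

```python
import math

def p2(u0, u, t, sigma2=1.0, rate=1.0):
    """Conditional density (A3.2.8); sigma2 = lambda/(alpha M), rate = alpha/M."""
    rho = math.exp(-rate * t)
    var = sigma2 * (1.0 - rho * rho)
    return math.exp(-(u - u0 * rho) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def w1(u, sigma2=1.0):
    """Stationary distribution, the long-time limit of p2."""
    return math.exp(-u * u / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def w2(u1, u2, t):
    """Stationary two-time distribution W_2 = W_1 P_2."""
    return w1(u1) * p2(u1, u2, t)

def smoluchowski_rhs(u1, u2, t, s, a_max=10.0, n=4001):
    """Trapezoidal estimate of the integral over a in (A3.2.2)."""
    da = 2.0 * a_max / (n - 1)
    total = 0.0
    for i in range(n):
        a = -a_max + i * da
        weight = 0.5 if i in (0, n - 1) else 1.0   # end points count half
        total += weight * w2(u1, a, t - s) * p2(a, u2, s)
    return total * da

u1, u2, t, s = 0.7, -0.4, 1.5, 0.6      # arbitrary test point with 0 < s < t
lhs = w2(u1, u2, t)
rhs = smoluchowski_rhs(u1, u2, t, s)
```

The two sides agree to within the quadrature error for any intermediate time s between 0 and t, which is the content of the Smoluchowski equation.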

Consider an ensemble of Brownian particles. The approach of $P_2$ to $W_1$ as $t \to \infty$ represents a kind of 
diffusion process in velocity space. The description of Brownian movement in these terms is known as the 
Fokker-Planck method [16]. For the present example, this equation can be shown to be 

$$\frac{\partial}{\partial t} P_2(u_0 \mid u, t) = \frac{\alpha}{M} \frac{\partial}{\partial u} \left(u P_2(u_0 \mid u, t)\right) + \frac{k T \alpha}{M^2} \frac{\partial^2}{\partial u^2} P_2(u_0 \mid u, t) \tag{A3.2.11}$$

subject to the initial condition $P_2(u_0 \mid u, 0) = \delta(u - u_0)$. The solution to (A3.2.11) is given by (A3.2.8). The 
Langevin equation and the Fokker-Planck equation provide equivalent complementary descriptions [17]. 
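
A quick finite-difference check makes the connection between the Gaussian solution and the Fokker-Planck equation concrete. The sketch below assumes units in which $\alpha/M = 1$ and $\sigma^2 = kT/M = 1$, so that the diffusion coefficient in velocity space, $kT\alpha/M^2$, equals rate times sigma2 in the code; the function names, the step size h and the sample points are illustrative choices. It evaluates the difference between the two sides of the equation with central differences applied to (A3.2.8).

```python
import math

def p2(u0, u, t, sigma2=1.0, rate=1.0):
    """Conditional density (A3.2.8); sigma2 = kT/M, rate = alpha/M."""
    rho = math.exp(-rate * t)
    var = sigma2 * (1.0 - rho * rho)
    return math.exp(-(u - u0 * rho) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fokker_planck_residual(u0, u, t, sigma2=1.0, rate=1.0, h=1e-3):
    """dP/dt - [rate d(uP)/du + rate*sigma2 d2P/du2], by central differences."""
    def p(x, tau):
        return p2(u0, x, tau, sigma2, rate)
    dp_dt = (p(u, t + h) - p(u, t - h)) / (2.0 * h)
    d_up_du = ((u + h) * p(u + h, t) - (u - h) * p(u - h, t)) / (2.0 * h)
    d2p_du2 = (p(u + h, t) - 2.0 * p(u, t) + p(u - h, t)) / (h * h)
    return dp_dt - rate * d_up_du - rate * sigma2 * d2p_du2

# Sample the residual at a few velocities and times for initial velocity 0.8.
residuals = [abs(fokker_planck_residual(0.8, u, t))
             for u in (-1.0, 0.0, 0.5, 1.2) for t in (0.3, 1.0, 2.5)]
max_residual = max(residuals)
```

The residual is limited only by the truncation error of the finite differences, which is of order h squared.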


A3.2.3 ONSAGER'S THEORY OF NON-EQUILIBRIUM THERMODYNAMICS 

A3.2.3.1 REGRESSION EQUATIONS AND FLUCTUATIONS 

For a system which is close to equilibrium, it is assumed that its state is described by a set of extensive 
thermodynamic variables, $a_1(t), a_2(t), \ldots, a_n(t)$ where n is very much less than the total number of degrees of 
freedom for all of the molecules in the system. The latter may be of order $10^{24}$ while the former may be fewer 
than 10. In equilibrium, the $a_i$ are taken to have value zero so that the non-equilibrium entropy is given by 

$$S = S_0 - \tfrac{1}{2} k\, a_i E_{ij} a_j \tag{A3.2.12}$$

where $S_0$ is the maximum equilibrium value of the entropy, $E_{ij}$ is a symmetric positive definite time 
independent entropy matrix and repeated indices are to be summed, a convention used throughout this 
presentation. Thermodynamic forces are defined by 

$$X_i = \frac{\partial S}{\partial a_i} = -k E_{ij} a_j. \tag{A3.2.13}$$

Onsager postulates [4, 5] the phenomenological equations for irreversible processes given by 

$$R_{ij} \frac{d}{dt} a_j = R_{ij} J_j = X_i = -k E_{ij} a_j \tag{A3.2.14}$$


in which the $J_i$ are called the thermodynamic fluxes, and which is a natural generalization of the linear 
phenomenological laws such as Fourier's law of heat conduction, Newton's law of internal friction etc. The 
matrix $R_{ij}$ is real with eigenvalues having positive real parts and it is invertible. These equations are 
regression equations whose solutions approach equilibrium asymptotically in time. 
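
The regression behaviour can be illustrated with two coupled variables. The sketch below is a toy example under assumed values: the matrices E and L (the inverse of R) are invented for illustration, L is taken symmetric and positive definite, k is set to 1, and the regression equations are rewritten as da/dt = -G a with G = k L E and integrated by forward Euler steps. It records the entropy deficit $S_0 - S$ from (A3.2.12) along the way.

```python
def mat_vec(m, v):
    """Matrix-vector product for small dense matrices stored as nested lists."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def mat_mul(a, b):
    """Matrix-matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def entropy_deficit(a, E, k=1.0):
    """S_0 - S = (k/2) a_i E_ij a_j, from (A3.2.12)."""
    Ea = mat_vec(E, a)
    return 0.5 * k * sum(x * y for x, y in zip(a, Ea))

k = 1.0
E = [[2.0, 0.5], [0.5, 1.0]]   # symmetric, positive definite (assumed values)
L = [[1.0, 0.3], [0.3, 2.0]]   # inverse of R, symmetric here (assumed values)
G = [[k * x for x in row] for row in mat_mul(L, E)]   # G = k L E

a = [1.0, -0.5]                # initial deviation from equilibrium
dt = 1e-3
deficits = [entropy_deficit(a, E, k)]
for _ in range(10000):         # forward-Euler steps of da/dt = -G a
    Ga = mat_vec(G, a)
    a = [x - dt * g for x, g in zip(a, Ga)]
    deficits.append(entropy_deficit(a, E, k))
```

With a positive definite L the entropy deficit falls at every step, so the entropy rises monotonically towards $S_0$ while the $a_i$ relax to zero.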


Since the $a_i$ are thermodynamic quantities, their values fluctuate with time. Thus, (A3.2.14) is properly 
interpreted as the averaged regression equation for a random process that is actually driven by random 
thermodynamic forces, $\tilde{F}_i(t)$. The completed equations are coupled Langevin-like equations 

$$R_{ij} \frac{d}{dt} a_j = R_{ij} J_j = X_i + \tilde{F}_i = -k E_{ij} a_j + \tilde{F}_i \tag{A3.2.15}$$

The mean values of the £.(£) are zero and each is assumed to be stationary Gaussian white noise. The linearity 

of these equations guarantees that the random process described by the a f is also a stationary Gaussian- 
Markov process [12]. Denoting the inverse of R.. by L.. and using the definition 

(A3.2.15) may be rewritten as 

— tfi = Ji = LfjXj + F { = -kL;jEj k a k + Ft. (A3.2.16) 

Since the f\ are linearly related to the eJ(f), they are also stationary Gaussian white noises. This property is 
explicitly expressed by 

{F l U)F J (t)) = 2Q, / Wt-s) (A3.2.17) 

in which $Q_{ij}$, the force-force correlation matrix, is necessarily symmetric and positive definite. While
(A3.2.16) suggests that the fluxes may be coupled to any force, symmetry properties may be applied to show
that this is not so. By establishing the tensor character of the different flux and force components, it can be
shown that only fluxes and forces of like character can be coupled. This result is called Curie's principle
[9, 12].

Let $G_{ij} = kL_{ik}E_{kj}$. The solution to (A3.2.16) is

$$a_i(t) = (\exp[-Gt])_{ij}\,a_j(0) + \int_0^t ds\,(\exp[-G(t-s)])_{ij}\,F_j(s). \tag{A3.2.18}$$


The statistics for the initial conditions, $a_i(0)$, are determined by the equilibrium distribution obtained from
the entropy in (A3.2.12) and in accordance with the Einstein-Boltzmann-Planck formula

$$W_1(a_1, a_2, \ldots, a_n) = \left(\frac{\|E\|}{(2\pi)^n}\right)^{1/2}\exp\left[-\tfrac{1}{2}a_iE_{ij}a_j\right] \tag{A3.2.19}$$

where $\|E\|$ is the determinant of $E_{ij}$. $\{\ldots\}$ will again denote averaging with respect to $W_1$ while
$\langle\ldots\rangle$ continues to denote averaging over the $F_i$. Notice in (A3.2.19) that $W_1$ now depends on
$n$ variables at a single time, a natural generalization of the situation reviewed above for the one-dimensional
Langevin equation. This simply means the process is $n$ dimensional.

A3.2.3.2 TWO TIME CORRELATIONS AND THE ONSAGER RECIPROCAL RELATIONS 


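From (A3.2.18) it follows that

$$\langle a_i(t)\rangle = (\exp[-Gt])_{ij}\,a_j(0)$$

and

$$\{\langle a_i(t)\rangle\} = 0.$$

All the other information needed for this process is contained in the two time correlation matrix because the
process is Gaussian. A somewhat involved calculation [18] results (for $t_2 > t_1$) in

$$\begin{aligned}
\chi_{ij}(t_2, t_1) = \{\langle a_i(t_2)a_j(t_1)\rangle\}
 &= \left(\exp[-G(t_2-t_1)]\exp[-Gt_1]\,E^{-1}\exp[-G^\dagger t_1]\right)_{ij} \\
 &\quad + 2\left(\exp[-G(t_2-t_1)]\right)_{ik}\int_0^{t_1} ds\,\left(\exp[-G(t_1-s)]\,Q\,\exp[-G^\dagger(t_1-s)]\right)_{kj}.
\end{aligned} \tag{A3.2.20}$$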


If we set $t_1 = t_2 = t$, stationarity requires that $\chi_{ij}(t, t) = (E^{-1})_{ij}$ because (A3.2.19) implies

$$\{a_ia_j\} = \int d^na\,W_1(a_1, a_2, \ldots, a_n)\,a_ia_j = (E^{-1})_{ij}.$$

By looking at $G\chi(t, t) + \chi(t, t)G^\dagger$, the resulting integral implied by (A3.2.20) for $t_2 = t_1 = t$
contains an exact differential [18] and one obtains

$$GE^{-1} + E^{-1}G^\dagger = G\chi(t,t) + \chi(t,t)G^\dagger = 2Q + \exp[-Gt]\left(GE^{-1} + E^{-1}G^\dagger - 2Q\right)\exp[-G^\dagger t].$$

This is compatible with stationarity if and only if

$$GE^{-1} + E^{-1}G^\dagger = 2Q \tag{A3.2.21}$$

which is the general case fluctuation-dissipation relation [12]. Inserting this identity into (A3.2.20) and using
similar techniques reduces $\chi_{ij}(t_2, t_1)$ to the manifestly stationary form

$$\chi_{ij}(t_2 - t_1) = \left(\exp[-G|t_2 - t_1|]\right)_{ik}(E^{-1})_{kj}. \tag{A3.2.22}$$

In Onsager's treatment, the additional restriction that the $a_i$ are even functions of the particle velocities is
made. As indicated above, this implies microscopic reversibility which in the present $n$-dimensional case
means (for $t = t_2 - t_1 > 0$)

$$W_2(a_1, a_2, \ldots, a_n; a_1', a_2', \ldots, a_n', t) = W_2(a_1', a_2', \ldots, a_n'; a_1, a_2, \ldots, a_n, t).$$

This implies

$$\begin{aligned}
\chi_{ij}(t_2 - t_1) &= \int d^na\int d^na'\,W_2(a_1, \ldots, a_n; a_1', \ldots, a_n', t)\,a_ia_j' \\
 &= \int d^na\int d^na'\,W_2(a_1', \ldots, a_n'; a_1, \ldots, a_n, t)\,a_ia_j' = \chi_{ji}(t_2 - t_1).
\end{aligned} \tag{A3.2.23}$$

Take the $t_2$-derivative of this equation by using (A3.2.22) and set $t_2 = t_1$:

$$\frac{d}{dt_2}\chi_{ij}(t_2 - t_1)\Big|_{t_2 = t_1} = -G_{ik}(E^{-1})_{kj} = \frac{d}{dt_2}\chi_{ji}(t_2 - t_1)\Big|_{t_2 = t_1} = -G_{jk}(E^{-1})_{ki}. \tag{A3.2.24}$$

Inserting the definition of $G$ gives the celebrated Onsager reciprocal relations [4, 5]

$$L_{ij} = L_{ji}. \tag{A3.2.25}$$

If odd variables, the $b_m$, are also included, then a generalization by Casimir [8] results in the
Onsager-Casimir relations

$$\begin{aligned}
L_{ij}(B, \omega) &= L_{ji}(-B, -\omega) \\
L_{im}(B, \omega) &= -L_{mi}(-B, -\omega) \\
L_{mn}(B, \omega) &= L_{nm}(-B, -\omega)
\end{aligned} \tag{A3.2.26}$$

wherein the indices $i$ and $j$ are for variables $a$ and the indices $m$ and $n$ are for variables $b$. These are
proved in a similar fashion. For example, when there are mixtures of variables $a$ and $b$, microscopic
reversibility becomes

$$W_2(a_1, \ldots, a_p, b_{p+1}, \ldots, b_n; a_1', \ldots, a_p', b_{p+1}', \ldots, b_n', B, \omega, t)
 = W_2(a_1', \ldots, a_p', -b_{p+1}', \ldots, -b_n'; a_1, \ldots, a_p, -b_{p+1}, \ldots, -b_n, -B, -\omega, t).$$

A cross-correlation between an even variable and an odd variable is given by

$$\begin{aligned}
\chi_{im}(B, \omega, t_2 - t_1)
 &= \int d^pa\,d^{n-p}b\,d^pa'\,d^{n-p}b'\;W_2(a_1, \ldots, a_p, b_{p+1}, \ldots, b_n; a_1', \ldots, a_p', b_{p+1}', \ldots, b_n', B, \omega, t)\,a_ib_m' \\
 &= \int d^pa\,d^{n-p}b\,d^pa'\,d^{n-p}b'\;W_2(a_1', \ldots, a_p', -b_{p+1}', \ldots, -b_n'; a_1, \ldots, a_p, -b_{p+1}, \ldots, -b_n, -B, -\omega, t)\,a_ib_m' \\
 &= -\chi_{mi}(-B, -\omega, t_2 - t_1)
\end{aligned}$$

in which the last identity follows from replacing all $b$ by their negatives. Differentiation of this expression
with respect to $t_2$ followed by setting $t_2 = t_1$ results in the middle result of (A3.2.26).

In the general case, (A3.2.23) cannot hold because it leads to (A3.2.24) which requires
$GE^{-1} = (GE^{-1})^\dagger$, which is in general not true. Indeed, the simple example of the Brownian motion of a
harmonic oscillator suffices to make the point [7, 14, 18]. In this case the equations of motion are [3, 7]

$$M\frac{d}{dt}x = p \quad\text{and}\quad \frac{d}{dt}p + M\omega^2x = -\frac{\alpha}{M}p + \tilde{F} \tag{A3.2.27}$$

where $M$ is the oscillator mass, $\omega$ is the oscillator frequency and $\alpha$ is the friction coefficient. The
fluctuating force, $\tilde{F}$, is Gaussian white noise with zero mean and correlation formula

$$\langle\tilde{F}(t)\tilde{F}(s)\rangle = 2D\,\delta(t - s).$$

Define $y$ by $y = M\omega x$. The identifications

$$a_1 = y,\quad a_2 = p,\quad A_{ij} = \begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix},\quad
S_{ij} = \begin{pmatrix} 0 & 0 \\ 0 & \alpha/M \end{pmatrix},\quad F_i = \begin{pmatrix} 0 \\ \tilde{F} \end{pmatrix}$$

permit writing (A3.2.27) as

$$\frac{d}{dt}a_i = -A_{ij}a_j - S_{ij}a_j + F_i.$$

Clearly, $G = A + S$ in this example. The entropy matrix can be obtained from the Maxwell-Boltzmann
distribution

$$W_1(y, p) = W_0\exp\left[-\frac{y^2 + p^2}{2MkT}\right]$$

which implies that

$$E_{ij} = \frac{\delta_{ij}}{MkT}.$$

The $Q$ of (A3.2.17) is clearly given by

$$Q_{ij} = \delta_{i2}\,\delta_{j2}\,D.$$

In this case the fluctuation-dissipation relation, (A3.2.21), reduces to $D = kT\alpha$. It is also clear that
$GE^{-1} = MkT(A + S)$, which is not self-adjoint.
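
The oscillator example can be checked with a few lines of linear algebra. The sketch below is illustrative only:
it assumes the variable ordering $(a_1, a_2) = (y, p)$ with $y = M\omega x$, takes $A$ as the antisymmetric
matrix with entries $\mp\omega$ and $S$ as the friction matrix with single non-zero entry $\alpha/M$ (both are
reconstructions consistent with the equations of motion, not quoted from the source), and uses arbitrary
numerical values for $M$, $\omega$, $\alpha$ and $kT$.

```python
import numpy as np

# Illustrative parameter values (assumptions, reduced units).
M, omega, alpha, kT = 2.0, 1.5, 0.7, 1.3
D = kT * alpha                      # value required by the fluctuation-dissipation relation

A = np.array([[0.0, -omega], [omega, 0.0]])      # reversible (oscillatory) part of G
S = np.array([[0.0, 0.0], [0.0, alpha / M]])     # dissipative (friction) part of G
G = A + S
E_inv = M * kT * np.eye(2)                       # inverse entropy matrix from Maxwell-Boltzmann
Q = np.array([[0.0, 0.0], [0.0, D]])             # noise acts on the momentum equation only

lhs = G @ E_inv + E_inv @ G.T                    # G E^{-1} + E^{-1} G^dagger
GE_inv = G @ E_inv

print("fluctuation-dissipation relation holds:", np.allclose(lhs, 2.0 * Q))
print("G E^{-1} is symmetric:", np.allclose(GE_inv, GE_inv.T))
```

The antisymmetric part $A$ cancels out of the symmetrized combination, so only the friction term balances the
noise strength, while $GE^{-1}$ itself keeps the antisymmetric piece and fails to be self-adjoint.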

A3.2.3.3 LEAST DISSIPATION VARIATIONAL PRINCIPLES 

Onsager [4, 5] generalized Lord Rayleigh's 'principle of the least dissipation of energy' [19]. In homage to 
Lord Rayleigh, Onsager retained the name of the principle (i.e. the word energy) although he clearly stated [4] 
that the role of the potential in this principle is played by the rate of increase of the entropy [9]. This idea is an 
attempt to extend the highly fruitful concept of an underlying variational principle for dynamics, such as 
Hamilton's principle of least action for classical mechanics, to irreversible processes. Because the regression 


equations are linear, the parallel with Lord Rayleigh's principle for linear friction in mechanics is easy to 
make. It has also been extended to velocity dependent forces in electrodynamics [20]. 

From (A3.2.12), it is seen that the rate of entropy production is given by

$$\frac{d}{dt}S = -\left(\frac{d}{dt}a_i\right)kE_{ij}a_j = J_iX_i$$

wherein the definitions of the thermodynamic fluxes and forces of (A3.2.13) and (A3.2.14) have been used.
Onsager defined [5] the analogue of the Rayleigh dissipation function by

$$\Phi(J, J) \equiv \tfrac{1}{2}R_{ij}J_iJ_j.$$

When the reciprocal relations are valid in accord with (A3.2.25) then $R$ is also symmetric. The variational
principle in this case may be stated as

$$0 = \delta\left[J_iX_i - \Phi(J, J)\right] = \left[X_i - R_{ij}J_j\right]\delta J_i$$

wherein the variation is with respect to the fluxes for fixed forces. The second variation is given by $-R$, which
has negative eigenvalues (for symmetric $R$ the eigenvalues are real and positive), which implies that the
difference between the entropy production rate and the dissipation function is a maximum for the averaged
irreversible process, hence least dissipation [5]. By multiplying this by the temperature, the free energy is
generated from the entropy and, hence, Onsager's terminology of least dissipation of energy. Thus, the
principle of least dissipation of entropy for near equilibrium dynamics is already found in the early work of
Onsager [5, 9].
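
The variational statement can be probed numerically. The sketch below is only an illustration under stated
assumptions: `R` and `X` are made-up values for a symmetric, positive definite resistance matrix and a fixed
force vector, and `objective` is a hypothetical helper that evaluates the entropy production rate minus the
dissipation function, taken here as $\tfrac{1}{2}R_{ij}J_iJ_j$. The fluxes solving $R J = X$ should give a
larger value than any perturbed set of fluxes.

```python
import numpy as np


def objective(J, X, R):
    """Entropy production rate J.X minus the dissipation function (1/2) J.R.J."""
    return J @ X - 0.5 * J @ R @ J


R = np.array([[2.0, 0.3], [0.3, 1.0]])   # illustrative symmetric, positive definite matrix
X = np.array([1.0, -2.0])                # illustrative fixed thermodynamic forces
J_star = np.linalg.solve(R, X)           # stationary point: X - R J = 0

rng = np.random.default_rng(0)
best = objective(J_star, X, R)
perturbed = [objective(J_star + 0.1 * rng.standard_normal(2), X, R) for _ in range(1000)]

print("value at the regression fluxes:", best)
print("largest value among perturbed fluxes:", max(perturbed))
```

Because the second variation is $-R$, every perturbation lowers the objective by $\tfrac{1}{2}\delta J\cdot R\,\delta J$,
which is what the comparison shows.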

Related variational principles for non-equilibrium phenomena have been developed by Chandrasekhar [21]. 
How far these ideas can be taken remains an open question. Glansdorff and Prigogine [22] attempted to 
extend Onsager's principle of the least dissipation of entropy [4, 5, 9] to non-linear phenomena far away from 
equilibrium. Their proposal was ultimately shown to be overly ambitious [23]. A promising approach for the 
non-linear steady state regime has been proposed by Keizer [23, 24]. This approach focuses on the covariance 
of the fluctuations rather than on the entropy, although in the linear regime around full equilibrium, the two 
quantities yield identical principles. In the non-linear regime the distinction between them leads to a novel 
thermodynamics of steady states that parallels the near equilibrium theory. An experiment for non-equilibrium 
electromotive force [25] has confirmed this alternative approach and strongly suggests that a fruitful avenue 
of investigation for far from equilibrium thermodynamics has been opened. 


A3.2.4 APPLICATIONS 

A3.2.4.1 SORET EFFECT AND DUFOUR EFFECT [12] 

Consider an isotropic fluid in which viscous phenomena are neglected. Concentrations and temperature are
non-uniform in this system. The rate of entropy production may be written

$$\frac{d}{dt}S = J_q\cdot\nabla\left(\frac{1}{T}\right) + \sum_{i=1}^{n}J_i\cdot\nabla\left(-\frac{\mu_i}{T}\right) \tag{A3.2.28}$$

in which $J_q$ is the heat flux vector and $J_i$ is the mass flux vector for species $i$ which has chemical
potential $\mu_i$. This is over-simplified for the present discussion because the $n$ mass fluxes, $J_i$, are not
linearly independent [12]. This fact may be readily accommodated by eliminating one of the fluxes and using the
Gibbs-Duhem relation [12]. It is straightforward to identify the thermodynamic forces, $X$, using the generic
form for the entropy production in (A3.2.27). The fluxes may be expressed in terms of the forces by

$$\begin{aligned}
J_q &= L_{qq}\nabla\left(\frac{1}{T}\right) + \sum_i L_{qi}\nabla\left(-\frac{\mu_i}{T}\right) \\
J_i &= L_{iq}\nabla\left(\frac{1}{T}\right) + \sum_j L_{ij}\nabla\left(-\frac{\mu_j}{T}\right).
\end{aligned}$$

The Onsager relations in this case are

$$L_{qi} = L_{iq} \quad\text{and}\quad L_{ij} = L_{ji}.$$

The coefficients, $L_{iq}$, are characteristic of the phenomenon of thermal diffusion, i.e. the flow of matter
caused by a temperature gradient. In liquids, this is called the Soret effect [12]. A reciprocal effect associated
with the coefficient $L_{qi}$ is called the Dufour effect [12] and describes heat flow caused by concentration
gradients. The Onsager relation implies that measurement of one of these effects is sufficient to determine the
coupling for both. The coefficient $L_{qq}$ is proportional to the heat conductivity coefficient and is a single
scalar quantity in an isotropic fluid even though its associated flux is a vector. This fact is closely related to
the Curie principle. The remaining coefficients, $L_{ij}$, are proportional to the mutual diffusion coefficients
(except for the diagonal ones which are proportional to self-diffusion coefficients).

Chemical reactions may be added to the situation giving an entropy production of

$$\frac{d}{dt}S = J_q\cdot\nabla\left(\frac{1}{T}\right) + \sum_{i=1}^{n}J_i\cdot\nabla\left(-\frac{\mu_i}{T}\right) - \frac{1}{T}\sum_{j=1}^{r}J_jA_j$$

in which there are $r$ reactions with variable progress rates related to the $J_j$ and with chemical affinities
$A_j$. Once again, these fluxes are not all independent and some care must be taken to rewrite everything so that
symmetry is preserved [12]. When this is done, the Curie principle decouples the vectorial forces from the
scalar fluxes and vice versa [9]. Nevertheless, the reaction terms lead to additional reciprocal relations
because

$$J_j = -\sum_{k=1}^{r}L_{jk}\frac{A_k}{T}$$

implies that $L_{jk} = L_{kj}$.

These are just a few of the standard examples of explicit applications of the Onsager theory to concrete cases.
There are many more involving acoustical, electrical, gravitational, magnetic, osmotic, thermal and other
processes in various combinations. An excellent source for details is the book by de Groot and Mazur [12],
which was published in a Dover edition in 1984, making it readily accessible and inexpensive. There, one will
find many specific accounts. For example, in the case of thermal and electric or thermal and electromagnetic
couplings: (1) the Hall effect is encountered where the Hall coefficient is related to Onsager's relation through
the resistivity tensor; (2) the Peltier effect is encountered where Onsager's relation implies the Thomson
relation between the thermo-electric power and the Peltier heat; and (3) galvanomagnetic and thermomagnetic
effects are met along with the Ettingshausen effect, the Nernst effect and the Bridgman relation. In the case of
so-called discontinuous systems, the thermomolecular pressure effect, thermal effusion and the
mechanocaloric effect are encountered as well as electro-osmosis. Throughout, the entropy production
equation plays a central role [12].

A3.2.4.2 THE FLUCTUATING DIFFUSION EQUATION

A byproduct of the preceding analysis is that the Onsager theory immediately determines the form of the
fluctuations that should be added to the diffusion equation. Suppose that a solute is dissolved in a solvent with
concentration $c$. The diffusion equation for this is

$$\frac{\partial}{\partial t}c = D\nabla^2c \tag{A3.2.29}$$

in which $D$ is the diffusion constant. This is called Fick's law of diffusion [12]. From (A3.2.28), the
thermodynamic force is seen to be (at constant $T$)

$$X = -\nabla\frac{\mu}{T} = -\frac{1}{T}\frac{\partial\mu}{\partial c}\nabla c = -\frac{k}{c_0}\nabla c$$

wherein it has been assumed that the solute is a nonelectrolyte exhibiting ideal behaviour with
$\mu = kT\ln(c)$ and $c_0$ is the equilibrium concentration. Since this is a continuum system, the general results
developed above need to be continuously extended as follows. The entropy production in a volume $V$ may be
written as

$$\begin{aligned}
\frac{d}{dt}S &= \int d^3r\,(D\nabla c)\cdot\left(\frac{k}{c_0}\nabla c\right) = \frac{Dk}{c_0}\int d^3r\,(\nabla c)\cdot(\nabla c) \\
 &= \frac{Dk}{c_0}\int d^3r\int d^3r'\,(\nabla c(r))\cdot\delta(r - r')(\nabla' c(r')) \\
 &= \frac{Dk}{c_0}\int d^3r\int d^3r'\,c(r)\left(\nabla\cdot\nabla'\delta(r - r')\right)c(r').
\end{aligned} \tag{A3.2.30}$$

The continuous extension of (A3.2.12) becomes

$$S = S_0 - \frac{k}{2}\int d^3r\int d^3r'\,c(r)E(r - r')c(r').$$

The time derivative of this expression together with (A3.2.29) implies

$$\frac{d}{dt}S = -\frac{Dk}{2}\int d^3r\int d^3r'\left[c(r)\left(\nabla^2E(r - r') + \nabla'^2E(r - r')\right)c(r')\right]. \tag{A3.2.31}$$

Equations (A3.2.30) and (A3.2.31) imply the identity for the entropy matrix

$$E(r - r') = \frac{1}{c_0}\delta(r - r').$$

Equation (A3.2.29) also implies that the extension of $G$ is now

$$G(r - r') = -D\nabla^2\delta(r - r').$$

The extension of the fluctuation-dissipation relation of (A3.2.21) becomes

$$\begin{aligned}
2Q(r - r') &= \int d^3r''\left[G(r - r'')E^{-1}(r'' - r') + E^{-1}(r - r'')G(r'' - r')\right] \\
 &= -2Dc_0\nabla^2\delta(r - r').
\end{aligned}$$

This means that the fluctuating force can be written as

$$F(r, t) = \nabla\cdot\tilde{g}(r, t) \quad\text{where}\quad \langle\tilde{g}_\alpha(r, t)\tilde{g}_\beta(r', s)\rangle = 2Dc_0\,\delta_{\alpha\beta}\,\delta(r - r')\,\delta(t - s).$$

The resulting fluctuating diffusion equation is

$$\frac{\partial}{\partial t}c = D\nabla^2c + \nabla\cdot\tilde{g}.$$

The quantity $\tilde{g}$ can be thought of as a fluctuating mass flux.

Two applications of the fluctuating diffusion equation are made here to illustrate the additional information
the fluctuations provide over and beyond the deterministic behaviour. Consider an infinite volume with an
initial concentration, $c$, that is constant, $c_0$, everywhere. The solution to the averaged diffusion equation is
then simply $\langle c\rangle = c_0$ for all $t$. However, the two-time correlation function may be shown [26] to be

$$\langle c(r, t)c(r', s)\rangle = c_0^2 + c_0\left(4\pi D|t - s|\right)^{-3/2}\exp\left[-\frac{|r - r'|^2}{4D|t - s|}\right].$$

As the time separation $|t - s|$ approaches $\infty$ the second term in this correlation vanishes and the remaining
term is the equilibrium density-density correlation formula for an ideal solution. The second possibility is to
consider a non-equilibrium initial state, $c(r, 0) = c_0\delta(r)$. The averaged solution is [26]

$$\langle c(r, t)\rangle = c_0\left(4\pi Dt\right)^{-3/2}\exp\left[-\frac{|r|^2}{4Dt}\right]$$

whereas the two-time correlation function may be shown after extensive computation [26] to be 

I r Ir+rf 1 

L [ |r "^ |2 L-: f |r " rf|2 11 

" f87r«|r-.T|) 3CXp ["s/J|A -j|J CXp L"s«|/- 1 ?|JJ " 


This covariance function vanishes as $|t - s|$ approaches $\infty$ because the initial density profile has a finite
integral, which creates a vanishing density when it spreads out over the infinite volume.

This example illustrates how the Onsager theory may be applied at the macroscopic level in a self-consistent 
manner. The ingredients are the averaged regression equations and the entropy. Together, these quantities 
permit the calculation of the fluctuating force correlation matrix, Q. Diffusion is used here to illustrate the 
procedure in detail because diffusion is the simplest known case exhibiting continuous variables. 
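
The spreading of a localized initial profile can be made concrete with a short numerical sketch. It assumes the
standard free-space solution of the averaged diffusion equation for a point release,
$\langle c(r,t)\rangle = c_0(4\pi Dt)^{-3/2}\exp[-r^2/4Dt]$, with arbitrary illustrative values of `c0` and `D`;
the function names are hypothetical. The integral over all space stays fixed at `c0` while the density at any
given point decays as $t^{-3/2}$.

```python
import numpy as np


def mean_concentration(r, t, c0, D):
    """Averaged concentration at distance r and time t for a point release of total amount c0."""
    return c0 * (4.0 * np.pi * D * t) ** -1.5 * np.exp(-r ** 2 / (4.0 * D * t))


def total_amount(t, c0, D, n_points=20001):
    """Integrate the profile over all space with a radial trapezoidal rule."""
    r_max = 12.0 * np.sqrt(2.0 * D * t)           # far beyond the width of the profile
    r = np.linspace(0.0, r_max, n_points)
    shell = 4.0 * np.pi * r ** 2 * mean_concentration(r, t, c0, D)
    return np.sum(0.5 * (shell[1:] + shell[:-1]) * np.diff(r))


c0, D = 3.0, 0.5                                  # illustrative values (assumptions)
for t in (0.1, 1.0, 10.0):
    print(t, total_amount(t, c0, D), mean_concentration(0.0, t, c0, D))
```

The conserved integral together with the decaying peak is the deterministic counterpart of the statement that
the covariance vanishes at long times.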

A3.2.4.3 FLUCTUATING HYDRODYNAMICS 

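A proposal based on Onsager's theory was made by Landau and Lifshitz [27] for the fluctuations that should
be added to the Navier-Stokes hydrodynamic equations. Fluctuating stress tensor and heat flux terms were
postulated in analogy with the Onsager theory. However, since this is a case where the variables are of mixed
time reversal character, the 'derivation' was not fully rigorous. This situation was remedied by the derivation
by Fox and Uhlenbeck [13, 14, 18] based on general stationary Gaussian-Markov processes [12]. The precise
form of the Landau proposal is confirmed by this approach [14].

Let $\Delta\rho$, $\Delta u$ and $\Delta T$ denote the deviations of the mass density, $\rho$, the velocity field, $u$,
and the temperature, $T$, from their full equilibrium values. The fluctuating, linearized Navier-Stokes equations
are

$$\begin{aligned}
\frac{\partial}{\partial t}\Delta\rho + \rho_{eq}\nabla\cdot\Delta u &= 0 \\
\rho_{eq}\frac{\partial}{\partial t}\Delta u_\alpha + A_{eq}\frac{\partial}{\partial x_\alpha}\Delta\rho + B_{eq}\frac{\partial}{\partial x_\alpha}\Delta T
 &= \frac{\partial}{\partial x_\beta}\left[2\eta\,\Delta D_{\alpha\beta} + \left(\zeta - \tfrac{2}{3}\eta\right)\Delta D_{\gamma\gamma}\,\delta_{\alpha\beta} + \tilde{S}_{\alpha\beta}\right] \\
\rho_{eq}C_{eq}\frac{\partial}{\partial t}\Delta T &= K\nabla^2\Delta T - T_{eq}B_{eq}\nabla\cdot\Delta u + \nabla\cdot\tilde{g}
\end{aligned} \tag{A3.2.32}$$

in which $\eta$ is the shear viscosity, $\zeta$ is the bulk viscosity, $K$ is the heat conductivity, the subscript
'eq' denotes equilibrium values and $A_{eq}$, $B_{eq}$ and $C_{eq}$ are defined [14] by

$$A_{eq} = \left(\frac{\partial p}{\partial\rho}\right)_{eq},\qquad B_{eq} = \left(\frac{\partial p}{\partial T}\right)_{eq},\qquad C_{eq} = \left(\frac{\partial\varepsilon}{\partial T}\right)_{eq}$$

in which $p$ is the pressure and $\varepsilon$ is the energy per unit mass. $D_{\alpha\beta}$ is the strain tensor and
$\tilde{S}_{\alpha\beta}$ is the fluctuating stress tensor while $\tilde{g}_\alpha$ is the fluctuating heat flux
vector. These fluctuating terms are Gaussian white noises with zero mean and correlations given by

$$\begin{aligned}
\langle\tilde{S}_{\alpha\beta}(r, t)\tilde{S}_{\mu\nu}(r', t')\rangle &= 2kT_{eq}\left[\eta\left(\delta_{\alpha\mu}\delta_{\beta\nu} + \delta_{\alpha\nu}\delta_{\beta\mu}\right) + \left(\zeta - \tfrac{2}{3}\eta\right)\delta_{\alpha\beta}\delta_{\mu\nu}\right]\delta(r - r')\,\delta(t - t') \\
\langle\tilde{g}_\alpha(r, t)\tilde{g}_\beta(r', t')\rangle &= 2kT_{eq}^2K\,\delta_{\alpha\beta}\,\delta(r - r')\,\delta(t - t') \\
\langle\tilde{S}_{\alpha\beta}(r, t)\tilde{g}_\mu(r', t')\rangle &= 0.
\end{aligned} \tag{A3.2.33}$$

The lack of correlation between the fluctuating stress tensor and the fluctuating heat flux in the third
expression is an example of the Curie principle for the fluctuations. These equations for fluctuating
hydrodynamics are arrived at by a procedure very similar to that exhibited in the preceding section for
diffusion. A crucial ingredient is the equation for entropy production in a fluid

$$\frac{d}{dt}S(t) = \int d^3r\left[\frac{K}{T^2}(\nabla T)\cdot(\nabla T) + \frac{1}{T}\left\{2\eta D_{\alpha\beta}D_{\alpha\beta} + \left(\zeta - \tfrac{2}{3}\eta\right)(D_{\alpha\alpha})^2\right\}\right].$$

This expression determines the entropy matrix needed for the fluctuation-dissipation relation [14] used to
obtain (A3.2.33).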

Three interesting applications of these equations are made here. The first is one of perspective. A fluid in full 
equilibrium will exhibit fluctuations. In fact, these fluctuations are responsible for Rayleigh-Brillouin light 
scattering in fluids [28]. From the light scattering profile of an equilibrium fluid, the viscosities, heat 
conductivity, speed of sound and sound attenuation coefficient can be determined. This is a remarkable 
exhibition of how non-equilibrium properties of the fluid reside in the equilibrium fluctuations. Jerry Gollub 
once posed to the author the question: 'how does a fluid know to make the transition from steady state 
conduction to steady state convection at the threshold of instability in the Rayleigh-Benard system [21]?' The 
answer is that the fluid fluctuations are incessantly testing the stability and nucleate the transition when 
threshold conditions exist. Critical opalescence [28] is a manifestation of this macroscopic influence of the
fluctuations.

The second application is to temperature fluctuations in an equilibrium fluid [18]. Using (A3.2.32) and
(A3.2.33) the correlation function for temperature deviations is found to be

$$\langle\Delta T(r, t)\Delta T(r', t')\rangle = \frac{kT_{eq}^2}{\rho_{eq}C_{eq}}\left(\frac{\rho_{eq}C_{eq}}{4\pi K|t - t'|}\right)^{3/2}\exp\left[-\frac{\rho_{eq}C_{eq}|r - r'|^2}{4K|t - t'|}\right]. \tag{A3.2.34}$$

When the two times are identical, the formula simplifies to

$$\langle\Delta T(r)\Delta T(r')\rangle = \frac{kT_{eq}^2}{\rho_{eq}C_{eq}}\delta(r - r').$$

Define the temperature fluctuations in a volume $V$ by

$$\Delta T_V = \frac{1}{V}\int_V d^3r\,\Delta T(r).$$

This leads to the well known formula

$$\langle\Delta T_V^2\rangle = \frac{kT_{eq}^2}{\rho_{eq}C_{eq}V} = \frac{kT_{eq}^2}{C_V}$$

in which $C_V$ is the ordinary heat capacity since $C_{eq}$ is the heat capacity per unit mass. This formula can be
obtained by purely macroscopic thermodynamic arguments [29]. However, the dynamical information in
(A3.2.34) cannot be obtained from equilibrium thermodynamics alone.
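
To get a feeling for the size of these fluctuations, the formula $\langle\Delta T_V^2\rangle = kT^2/C_V$ can be
evaluated for a small volume. The numbers below are rough, assumed values for liquid water near room
temperature (density about 1000 kg m$^{-3}$, heat capacity per unit mass about 4180 J kg$^{-1}$ K$^{-1}$); they
are not taken from the text, and `temperature_rms` is a hypothetical helper name.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def temperature_rms(T, rho, c_mass, volume):
    """Root-mean-square temperature fluctuation sqrt(k T^2 / C_V) with C_V = rho * c_mass * volume."""
    heat_capacity = rho * c_mass * volume
    return math.sqrt(K_B * T ** 2 / heat_capacity)


# Assumed, approximate properties of liquid water near 300 K.
T, rho, c_mass = 300.0, 1000.0, 4180.0
for edge in (1e-6, 1e-7, 1e-8):                 # cube edge lengths in metres
    print(edge, temperature_rms(T, rho, c_mass, edge ** 3))
```

For a cube one micrometre on a side the fluctuation comes out at roughly half a millikelvin, and it grows as
$V^{-1/2}$ when the volume shrinks.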

The third application is to velocity field fluctuations. For an equilibrium fluid the velocity field is, on average, 
zero everywhere but it does fluctuate. The correlations turn out to be 

/>«, L L 4 H' - ' I J 

n*a*,L v W>' 2 \t-n l/2 )l\ 


(A3.2.35) 


+ 


in which $\nu = \eta/\rho_{eq}$, the kinematic viscosity, and $\Phi(x)$ is defined by

$$\Phi(x) = \frac{2}{\sqrt{\pi}}\int_0^x dy\,\exp(-y^2).$$

When $r = r'$ or for $\nu|t - t'| \gg |r - r'|^2$, (A3.2.35) simplifies greatly, yielding

$$\langle\Delta u_\alpha(r, t)\Delta u_\beta(r', t')\rangle = \frac{2kT_{eq}}{3\rho_{eq}}\left(4\pi\nu|t - t'|\right)^{-3/2}\delta_{\alpha\beta}$$

which is indistinguishable from the famous long-time-tail result [30].

A3.2.4.4 FLUCTUATING BOLTZMANN EQUATION

Onsager's theory can also be used to determine the form of the fluctuations for the Boltzmann equation [15]. 
Since hydrodynamics can be derived from the Boltzmann equation as a contracted description, a contraction 
of the fluctuating Boltzmann equation determines fluctuations for hydrodynamics. In general, a contraction of 
the description creates a new description which is non-Markovian, i.e. has memory. The Markov 


approximation to the contraction of the fluctuating Boltzmann equation is identical with fluctuating 
hydrodynamics [15]. This is an example of the internal consistency of the Onsager approach. Similarly, it is 
possible to consider the hydrodynamic problem of the motion of a sphere in a fluctuating fluid described by 
fluctuating hydrodynamics (with appropriate boundary conditions). A contraction of this description [ 14 ] 
produces Langevin's equation for Brownian movement. Thus, three levels of description exist in this 
hierarchy: fluctuating Boltzmann equation, fluctuating hydrodynamic equations and Langevin's equation. The 
general theory for such hierarchies of description and their contractions can be found in the book by Keizer 
[31]. 


A3.2.5 LINEAR RESPONSE THEORY 

Linear response theory is an example of a microscopic approach to the foundations of non-equilibrium 
thermodynamics. It requires knowledge of the Hamiltonian for the underlying microscopic description. In 
principle, it produces explicit formulae for the relaxation parameters that make up the Onsager coefficients. In 
reality, these expressions are extremely difficult to evaluate and approximation methods are necessary. 
Nevertheless, they provide a deeper insight into the physics. 

The linear response of a system is determined by the lowest order effect of a perturbation on a dynamical
system. Formally, this effect can be computed either classically or quantum mechanically in essentially the
same way. The connection is made by converting quantum mechanical commutators into classical Poisson
brackets, or vice versa. Suppose that the system is described by Hamiltonian $H + H_{ex}$ where $H_{ex}$ denotes an
external perturbation that may depend on time and generally does not commute with $H$. The density matrix
equation for this situation is given by the Bloch equation [32]

$$\frac{\partial}{\partial t}\rho = -\frac{i}{\hbar}[H + H_{ex}, \rho] \tag{A3.2.36}$$

where $\rho$ denotes the density matrix and the square brackets containing two quantities separated by a comma
denote a commutator. In the classical limit, the density matrix becomes a phase space distribution, $f$, of the
coordinates and conjugate momenta and the Bloch equation becomes Liouville's equation [32]

$$\frac{\partial}{\partial t}f = \sum_i\left[\frac{\partial(H + H_{ex})}{\partial q_i}\frac{\partial f}{\partial p_i} - \frac{\partial(H + H_{ex})}{\partial p_i}\frac{\partial f}{\partial q_i}\right] = \{H + H_{ex}, f\} \tag{A3.2.37}$$

in which the index $i$ labels the different degrees of freedom and the second equality defines the Poisson
bracket. Both of these equations may be expressed in terms of Liouville operators in the form

$$\frac{\partial}{\partial t}\rho = i(L + L_{ex})\rho \tag{A3.2.38}$$

where quantum mechanically these operators are defined by (A3.2.36), and classically $\rho$ means $f$ and the
operators are defined by (A3.2.37) [32].

Assuming explicit time dependence in $L_{ex}$, (A3.2.38) is equivalent to the integral equation

$$\rho(t) = \exp[i(t - t_0)L]\rho(t_0) + \int_{t_0}^{t}ds\,\exp[i(t - s)L]\,iL_{ex}(s)\rho(s)$$

as is easily proved by $t$-differentiation. Note that the exponential of the quantum mechanical Liouville
operator may be shown to have the action

$$\exp[itL]A = \exp\left[-\frac{i}{\hbar}tH\right]A\exp\left[\frac{i}{\hbar}tH\right]$$

in which $A$ denotes an arbitrary operator. This identity is also easily proved by $t$-differentiation.
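
The $t$-differentiation argument can be checked numerically on small matrices. The sketch below is an
illustration under stated assumptions: it takes the action of the Liouville exponential to be conjugation by
$\exp[-iHt/\hbar]$, sets $\hbar = 1$, uses random Hermitian matrices for $H$ and $A$, and compares a centred
finite difference of the evolved operator with $-(i/\hbar)[H, A(t)]$. The helper names are hypothetical.

```python
import numpy as np

HBAR = 1.0  # reduced units (assumption)


def random_hermitian(n, rng):
    """Return a random n x n Hermitian matrix."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return 0.5 * (X + X.conj().T)


def evolve(A, H, t):
    """Conjugate A by exp(-i H t / hbar), built from the eigen-decomposition of H."""
    w, V = np.linalg.eigh(H)
    U = V @ np.diag(np.exp(-1j * w * t / HBAR)) @ V.conj().T
    return U @ A @ U.conj().T


rng = np.random.default_rng(1)
H = random_hermitian(4, rng)
A = random_hermitian(4, rng)

t, dt = 0.7, 1e-5
derivative = (evolve(A, H, t + dt) - evolve(A, H, t - dt)) / (2.0 * dt)   # finite-difference d/dt
A_t = evolve(A, H, t)
commutator_form = (-1j / HBAR) * (H @ A_t - A_t @ H)                      # -(i/hbar)[H, A(t)]

print(np.max(np.abs(derivative - commutator_form)))
```

The printed difference should be at the level of the finite-difference error, consistent with the evolved
operator satisfying the same equation of motion as the density matrix in the unperturbed case.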

The usual context for linear response theory is that the system is prepared in the infinite past,
$t_0 \to -\infty$, to be in equilibrium with Hamiltonian $H$ and then $H_{ex}$ is turned on. This means that
$\rho(t_0)$ is given by the canonical density matrix

$$\rho(t_0) = \rho_{eq} = \frac{1}{Z}\exp[-\beta H]$$

where $\beta = 1/kT$ and $Z = \mathrm{Trace}\exp[-\beta H]$. Clearly

$$\exp[itL]\rho_{eq} = \rho_{eq}.$$

Thus, to first order in $L_{ex}$, $\rho(t)$ is given by

$$\rho(t) = \rho_{eq} - \frac{i}{\hbar}\int_{-\infty}^{t}ds\,\exp\left[-\frac{i}{\hbar}(t - s)H\right][H_{ex}(s), \rho_{eq}]\exp\left[\frac{i}{\hbar}(t - s)H\right] + \cdots. \tag{A3.2.39}$$

The classical analogue is

$$f(t) = f_{eq} + \int_{-\infty}^{t}ds\,\exp[i(t - s)L]\{H_{ex}(s), f_{eq}\} + \cdots.$$

Let $B$ denote an observable. Its expectation value at time $t$ is given by

$$\langle B\rangle = \mathrm{Trace}(B\rho(t)).$$

Denote the deviation of $B$ from its equilibrium expectation value by $\Delta B = B - \mathrm{Trace}(B\rho_{eq})$.
From (A3.2.39), the deviation of the expectation value of $B$ from its equilibrium expectation value,
$\delta\langle B\rangle$, is

$$\begin{aligned}
\delta\langle B\rangle &= -\frac{i}{\hbar}\int_{-\infty}^{t}ds\,\mathrm{Trace}\left(B\exp\left[-\frac{i}{\hbar}(t - s)H\right][H_{ex}(s), \rho_{eq}]\exp\left[\frac{i}{\hbar}(t - s)H\right]\right) \\
 &= -\frac{i}{\hbar}\int_{-\infty}^{t}ds\,\mathrm{Trace}\left(\Delta B\exp\left[-\frac{i}{\hbar}(t - s)H\right][H_{ex}(s), \rho_{eq}]\exp\left[\frac{i}{\hbar}(t - s)H\right]\right) \\
 &= -\frac{i}{\hbar}\int_{-\infty}^{t}ds\,\mathrm{Trace}\left(\exp\left[\frac{i}{\hbar}(t - s)H\right]\Delta B\exp\left[-\frac{i}{\hbar}(t - s)H\right][H_{ex}(s), \rho_{eq}]\right) \\
 &= -\frac{i}{\hbar}\int_{-\infty}^{t}ds\,\mathrm{Trace}\left(\Delta B(t - s)[H_{ex}(s), \rho_{eq}]\right)
\end{aligned}$$

where $\Delta B(t - s)$ is the Heisenberg operator solution to the Heisenberg equation of motion

$$\frac{d}{dt}\Delta B = \frac{i}{\hbar}[H, \Delta B]. \tag{A3.2.40}$$

The second equality follows from the fact that going from $B$ to $\Delta B$ involves subtracting a c-number from
$B$ and that c-number can be taken outside the trace. The resulting trace is the trace of a commutator, which
vanishes. The invariance of the trace to cyclic permutations of the order of the operators is used in the third
equality. It is straightforward to write

$$\mathrm{Trace}\left(\Delta B(t - s)[H_{ex}(s), \rho_{eq}]\right) = \mathrm{Trace}\left([\Delta B(t - s), H_{ex}(s)]\rho_{eq}\right).$$

The transition from $H_{ex}$ to $\Delta H_{ex}$ inside the commutator is allowed since
$\mathrm{Trace}(H_{ex}\rho_{eq})$ is a c-number and commutes with any operator. Thus, the final expression is [32]

$$\delta\langle B(t)\rangle = -\frac{i}{\hbar}\int_{-\infty}^{t}ds\,\mathrm{Trace}\left([\Delta B(t - s), \Delta H_{ex}(s)]\rho_{eq}\right).$$

If the external perturbation is turned on with a time dependent function $F(t)$ and $H_{ex}$ takes the form
$AF(t)$ where $A$ is a time independent operator (or $H_{ex}$ is the sum of such terms), then

$$\delta\langle B(t)\rangle = \int_{-\infty}^{t}ds\,\Phi_{BA}(t - s)F(s)$$

which defines the linear response function $\Phi_{BA}(t - s)$. This quantity may be written compactly as

$$\Phi_{BA}(t) = -\frac{i}{\hbar}\langle[\Delta B(t), \Delta A]\rangle_{eq}.$$

An identical expression holds classically [32] if $-i/\hbar$ times the commutator is replaced by the classical
Poisson bracket.

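The Heisenberg equation of motion, (A3.2.40), may be recast for imaginary times $t = -i\hbar\lambda$ as

$$\frac{d}{d\lambda}\Delta A = [H, \Delta A]$$

with the solution

$$\Delta A(-i\hbar\lambda) = \exp[\lambda H]\,\Delta A\,\exp[-\lambda H].$$

Therefore, the Kubo identity [32] follows

$$\begin{aligned}
\exp[-\beta H]\int_0^\beta d\lambda\,\exp[\lambda H][H, A]\exp[-\lambda H]
 &= \exp[-\beta H]\int_0^\beta d\lambda\,\exp[\lambda H](HA - AH)\exp[-\lambda H] \\
 &= \exp[-\beta H]\int_0^\beta d\lambda\,\left(H\exp[\lambda H]A\exp[-\lambda H] - \exp[\lambda H]A\exp[-\lambda H]H\right) \\
 &= \exp[-\beta H]\int_0^\beta d\lambda\,\left[H, \exp[\lambda H]A\exp[-\lambda H]\right] \\
 &= \exp[-\beta H]\int_0^\beta d\lambda\,\frac{d}{d\lambda}\left(\exp[\lambda H]A\exp[-\lambda H]\right) \\
 &= \exp[-\beta H]\left(\exp[\beta H]A\exp[-\beta H] - A\right) \\
 &= A\exp[-\beta H] - \exp[-\beta H]A = [A, \exp[-\beta H]].
\end{aligned}$$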
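
The Kubo identity lends itself to a direct numerical check. The sketch below is an illustration only: it assumes
the identity in the form $\exp[-\beta H]\int_0^\beta d\lambda\,\exp[\lambda H][H, A]\exp[-\lambda H] = [A, \exp[-\beta H]]$,
uses small random Hermitian matrices for $H$ and $A$, and evaluates the $\lambda$ integral with Gauss-Legendre
quadrature. The function name `kubo_identity_sides` and the parameter values are hypothetical.

```python
import numpy as np


def kubo_identity_sides(H, A, beta, order=40):
    """Return both sides of the Kubo identity for Hermitian H and arbitrary A."""
    w, V = np.linalg.eigh(H)

    def exp_H(s):
        # exp(s H) from the eigen-decomposition of the Hermitian matrix H
        return V @ np.diag(np.exp(s * w)) @ V.conj().T

    commutator = H @ A - A @ H
    nodes, weights = np.polynomial.legendre.leggauss(order)
    lam = 0.5 * beta * (nodes + 1.0)          # map nodes from [-1, 1] to [0, beta]
    wts = 0.5 * beta * weights
    integral = sum(wt * exp_H(l) @ commutator @ exp_H(-l) for l, wt in zip(lam, wts))
    lhs = exp_H(-beta) @ integral
    rhs = A @ exp_H(-beta) - exp_H(-beta) @ A
    return lhs, rhs


rng = np.random.default_rng(2)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Y = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = 0.5 * (X + X.conj().T)                     # random Hermitian "Hamiltonian"
A = 0.5 * (Y + Y.conj().T)                     # random Hermitian observable

lhs, rhs = kubo_identity_sides(H, A, beta=0.8)
print(np.max(np.abs(lhs - rhs)))
```

In the eigenbasis of $H$ the integrand is a plain exponential in $\lambda$, which is why a modest quadrature
order already reproduces the commutator to near machine precision.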


Therefore, 


(A3.2.41) 


Using the Heisenberg equation of motion, (A3.2.40), the commutator in the last expression may be replaced
by the time-derivative operator

$$i\hbar\frac{d}{dt}\Delta A = [\Delta A, H].$$

This converts (A3.2.41) into

where the time derivative of $\Delta A$ is evaluated at $t = 0$. The quantity $kT\Phi_{BA}(t)$ is called the
canonical correlation of $d/dt\,\Delta A$ and $\Delta B$ [32]. It is invariant under time translation by
$\exp[-iH\tau/\hbar]$ because both $\rho_{eq}$ and $\exp[\pm\lambda H]$ commute with this time evolution operator
and the trace operation is invariant to cyclic permutations of the product of operators upon which it acts. Thus


(A3.2.42) 


**>i(/ + t) = Trace f/^exp - y Hr \&H(t) espUtfr / dAcxp W (>. - M f — Art) 
= Tracefcxpr^//rl^,expr-^Hrli/;(/)y dXexp[w^l«p[« f k - £*}1 

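Consider the canonical correlation of $\Delta A$ and $\Delta B$, $C(\Delta A, \Delta B)$, defined by

$$C(\Delta A(0), \Delta B(t)) = kT\left\langle\Delta B(t)\int_0^\beta d\lambda\,\exp[\lambda H]\,\Delta A(0)\,\exp[-\lambda H]\right\rangle_{eq}.$$

The analysis used for (A3.2.42) implies

$$C(\Delta A(\tau), \Delta B(t + \tau)) = C(\Delta A(0), \Delta B(t))$$

which means that this correlation is independent of $\tau$, i.e. stationary. Taking the $\tau$-derivative implies

$$C\left(\frac{d}{dt}\Delta A(0), \Delta B(t)\right) + C\left(\Delta A(0), \frac{d}{dt}\Delta B(t)\right) = 0.$$

This is equivalent to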


Vfi A U) = Uliit) j dkexp[\fi] f^A/l(0)J exp[->,//]| 

= -/[ — AS(/))/ dkcxp[kM]AA(Q)cxp[-kH]\ . 

In different applications, one or the other of these two equivalent expressions may prove useful. 


(A3.2.43) 

As an example, let $B$ be the current $J_i$ corresponding to the displacement $A_i$ appearing in $H_{ex}$. Clearly

$$J_i = \frac{d}{dt}A_i = \frac{i}{\hbar}[H, A_i].$$

Because this current is given by a commutator, its equilibrium expectation value is zero. Using the first
expression in (A3.2.43), the response function is given by

4>ij(i)=U(i)[ dXexp[AW]/j(0)exp[-^ff]J . (A3 .2.44) 


For a periodic perturbation, 8 (AB(t)) is also periodic. The complex admittance [ 30 ] is given by 


xw(0= / d/* ffA »«"- 
Jo 

For the case of a current as in (A3. 2. 44) the result is the Kubo formula [32] for the complex conductivity 
u u {to)= f dtc^tJiti) f dJUxp[Atf]Jy(Q)cxp[-J.W]) . 

Several explicit applications of these relations may be found in the books by Kubo et al [32] and by 
McLennan [33]. 

There are other techniques leading to results closely related to Kubo's formula for the conductivity 
coefficient. Notable among them is the Mori-Zwanzig theory [34, 35] based on projection operator techniques 
and yielding the generalized Langevin equation [18]. The formula for the conductivity coefficient is an 
example of the general formula for relaxation parameters, the Green-Kubo formula [36, 37]. The examples of 
Green-Kubo formulae for viscosity, thermal conduction and diffusion are in the book by McLennan [33]. 


A3.2.6 PROSPECTS 

The current frontiers for the subject of non-equilibrium thermodynamics are rich and active. Two areas 
dominate interest: non-linear effects and molecular bioenergetics. The linearization step used in the near 
equilibrium regime is inappropriate far from equilibrium. Progress with a microscopic kinetic theory [38] for 
non-linear fluctuation phenomena has been made. Careful experiments [39] confirm this theory. Non- 
equilibrium long range correlations play an important role in some of the light scattering effects in fluids in 
far from equilibrium states [38, 39]. 
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The role of non-equilibrium thermodynamics in molecular bioenergetics has experienced an experimental 
revolution during the last 35 years. Membrane energetics is now understood in terms of chemiosmosis [40], In 
chemiosmosis, a trans-membrane electrochemical potential energetically couples the oxidation-reduction 
energy generated during catabolism to the adenosine triphosphate (ATP) energy needed for chemosynthesis 
during anabolism. Numerous advances in experimental technology have opened up whole new areas of 
exploration [41]. Quantitative analysis using non-equilibrium thermodynamics to account for the free energy 
and entropy changes works accurately in a variety of settings. There is a rich diversity of problems to be 
worked on in this area. Another biological application brings the subject back to its foundations. Rectified 
Brownian movement (involving a Brownian ratchet) is being invoked as the mechanism behind many 
macromolecular processes [42]. It may even explain the dynamics of actin and myosin interactions in muscle 
fibres [43]. In rectified Brownian movement, metabolic free energy generated during catabolism is used to 
bias boundary conditions for ordinary diffusion, thereby producing a non-zero flux. In this way, thermal 
fluctuations give the molecular mechanisms of cellular processes their vitality [44]. 

A3.3 Dynamics in condensed phase (including nucleation) 


Rashmi C Desai 


A3.3.1 INTRODUCTION 

Radiation probes such as neutrons, x-rays and visible light are used to 'see' the structure of physical systems 
through elastic scattering experiments. Inelastic scattering experiments measure both the structural and 
dynamical correlations that exist in a physical system. For a system which is in thermodynamic equilibrium, 
the molecular dynamics create spatio-temporal correlations which are the manifestation of thermal 
fluctuations around the equilibrium state. For a condensed phase system, dynamical correlations are intimately 
linked to its structure. For systems in equilibrium, linear response theory is an appropriate framework in which to 
inquire into the spatio-temporal correlations resulting from thermodynamic fluctuations. Appropriate response 
and correlation functions emerge naturally in this framework, and the role of theory is to understand these 
correlation functions from first principles. This is the subject of section A3.3.2. 

A system of interest may be macroscopically homogeneous or inhomogeneous. The inhomogeneity may arise 
on account of interfaces between coexisting phases in a system or due to the system's finite size and 
proximity to its external surface. Near the surfaces and interfaces, the system's translational symmetry is 
broken; this has important consequences. The spatial structure of an inhomogeneous system is its average 
equilibrium property and has to be incorporated in the overall theoretical structure, in order to study 
spatio-temporal correlations due to thermal fluctuations around an inhomogeneous spatial profile. This is also 
illustrated in section A3.3.2. 

Another possibility is that a system may be held in a constrained equilibrium by external forces and thus be in 
a non-equilibrium steady state (NESS). In this case, the spatio-temporal correlations contain new ingredients, 
which are also exemplified in section A3.3.2. 

There are also important instances when the system is neither in equilibrium nor in a steady state, but is 
actually evolving in time. This happens, for example, when a binary homogeneous mixture at high 
temperature is suddenly quenched to a low-temperature non-equilibrium state in the middle of the coexistence 
region of the mixture. Following the quench, the mixture may be in a metastable state or in an unstable state, 
as defined within a mean field description. The subsequent dynamical evolution of the system, which follows 
an initial thermodynamic metastability or instability, and the associated kinetics, is a rich subject involving 
many fundamental questions, some of which are yet to be fully answered. The kinetics of thermodynamically 
unstable systems and phenomena like spinodal decomposition are treated in section A3.3.3, after some 
introductory remarks. The late-stage kinetics of domain growth is discussed in section A3.3.4. The discussion 
in this section is applicable to late-stage growth regardless of whether the initial post-quench state was 
thermodynamically unstable or metastable. The study of metastable states is connected with the subject of 
nucleation and subsequent growth kinetics (treated in section A3.3.4). Homogeneous nucleation is the subject 
of section A3.3.5. As will be clear from section A3.3.3.1, the distinction between spinodal decomposition 
and nucleation is not sharp. Growth morphology with apparent nucleation characteristics can occur when 
post-quench states are within the classical spinodal, except when the binary mixture is symmetric. 


The specific examples chosen in this section, to illustrate the dynamics in condensed phases for the variety of 
system-specific situations outlined above, correspond to long-wavelength and low-frequency phenomena. In 
such cases, conservation laws and broken symmetry play important roles in the dynamics, and a macroscopic 
hydrodynamic description is either adequate or is amenable to an appropriate generalization. There are other 
examples where short-wavelength and/or high-frequency behaviour is evident. If this is the case, one would 
require a more microscopic description. For fluid systems which are the focus of this section, such 
descriptions may involve a kinetic theory of dense fluids or generalized hydrodynamics which may be linear 
or may involve nonlinear mode coupling. Such microscopic descriptions are not considered in this section. 


A3.3.2 EQUILIBRIUM SYSTEMS: THERMAL FLUCTUATIONS AND 
SPATIO-TEMPORAL CORRELATIONS 

In this section, we consider systems in thermodynamic equilibrium. Even though the system is in equilibrium, 
molecular constituents are in constant motion. We inquire into the nature of the thermodynamic fluctuations 
which have at their root the molecular dynamics. The space-time correlations that occur on account of 
thermodynamic fluctuations can be probed through inelastic scattering experiments, and the range of space 
and time scales explored depends on the wavenumber and frequency of the probe radiation. We illustrate this 
by using inelastic light scattering from dense fluids. Electromagnetic radiation couples to matter through its 
dielectric fluctuations. Consider a non-magnetic, non-conducting and non-absorbing medium with the average 
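dielectric constant $\varepsilon_0$. Let the incident electric field be a plane wave of the form 

$$\mathbf{E}_i(\mathbf{r},t) = \hat{\mathbf{n}}_i E_0 \exp[i(\mathbf{k}_i\cdot\mathbf{r} - \omega_i t)]$$

where $\hat{\mathbf{n}}_i$ is a unit vector in the direction of the incident field, $E_0$ is the field amplitude, $\mathbf{k}_i$ is the incident 
wavevector and $\omega_i$ is the incident angular frequency. The plane wave is incident upon a medium with local 
dielectric function $\boldsymbol{\varepsilon}(\mathbf{r},t) = \varepsilon_0\mathbf{I} + \delta\boldsymbol{\varepsilon}(\mathbf{r},t)$, where $\delta\boldsymbol{\varepsilon}(\mathbf{r},t)$ is the dielectric tensor fluctuation at position $\mathbf{r}$ and time $t$, 
and $\mathbf{I}$ is a unit second-rank tensor. Basic light scattering theory can be used to find the inelastically scattered 
light spectrum. If the scattered field at the detector is also in the direction $\hat{\mathbf{n}}_i$ (i.e. $\hat{\mathbf{n}}_f = \hat{\mathbf{n}}_i$ for a polarized light 
scattering experiment), the scattered wavevector is $\mathbf{k}_f = \mathbf{k}_i - \mathbf{k}$ and the scattered frequency is $\omega_f = \omega_i - \omega$, then, 
apart from some known constant factors that depend on the geometry of the experiment and the incident field, 
the inelastically scattered light intensity $I(\mathbf{k},\omega)$ is proportional to the spectral density of the local dielectric 
fluctuations. If the medium is isotropic and made up of spherically symmetrical molecules, then the dielectric 
tensor is proportional to the unit tensor $\mathbf{I}$: $\boldsymbol{\varepsilon}(\mathbf{r},t) = [\varepsilon_0 + \delta\varepsilon(\mathbf{r},t)]\mathbf{I}$. From the dielectric equation of state 
$\varepsilon_0 = \varepsilon(\rho_0, T_0)$, one can proceed to obtain the local dielectric fluctuation as 
$\delta\varepsilon(\mathbf{r},t) = (\partial\varepsilon/\partial\rho)_T\,\delta\rho(\mathbf{r},t) + (\partial\varepsilon/\partial T)_\rho\,\delta T(\mathbf{r},t)$. 
In many simple fluids, it is experimentally found that the thermodynamic derivative $(\partial\varepsilon/\partial T)_\rho$ is approximately 
zero. One then has a simple result that 

$$I(\mathbf{k},\omega) \propto \left(\frac{\partial\varepsilon}{\partial\rho}\right)_T^{2} S_{\rho\rho}(\mathbf{k},\omega) \qquad \text{(A3.3.1)}$$

where $S_{\rho\rho}(\mathbf{k},\omega)$ is the spectrum of density fluctuations in the simple fluid system. $S_{\rho\rho}(\mathbf{k},\omega)$ is the space-time 
Fourier transform of the density-density correlation function $S_{\rho\rho}(\mathbf{r},t;\mathbf{r}',t') = \langle\delta\rho(\mathbf{r},t)\,\delta\rho(\mathbf{r}',t')\rangle$. 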

Depending on the type of scattering probe and the scattering geometry, other experiments can probe other 
similar correlation functions. Elastic scattering experiments effectively measure frequency integrated spectra 
and, hence, probe only the space-dependent static structure of a system. Electron scattering experiments probe 
charge density correlations, and magnetic neutron scattering experiments the spin density correlations. 
Inelastic thermal neutron scattering from a non-magnetic system is a sharper probe of density-density 
correlations in a system but, due to the shorter wavelengths and higher frequencies involved, these results are 
complementary to those obtained from inelastic polarized light scattering experiments. The latter provide 
space-time correlations in the long-wavelength hydrodynamic regime. 

In order to analyse results from such experiments, it is appropriate to consider a general framework, linear 
response theory, which is useful whenever the probe radiation weakly couples to the system. The linear 
response framework is also convenient for utilizing various symmetry and analyticity properties of correlation 
functions and response functions, thereby reducing the general problem to determining quantities which are 
amenable to approximations in such a way that the symmetry and analyticity properties are left intact. Such 
approximations are necessary in order to avoid the full complexity of many-body dynamics. The central 
quantity in the linear response theory is the response function. It is related to the corresponding correlation 
function (typically obtained from experimental measurements) through a fluctuation dissipation theorem. In 
the next section, section A3.3.2.1, we discuss only the subset of necessary results from the linear response 
theory, which is described in detail in the book by Forster (see Further Reading). 

A3.3.2.1 LINEAR RESPONSE THEORY 

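Consider a set of physical observables $\{A_i(\mathbf{r})\}$. If a small external field $\delta a_i^{\mathrm{ext}}(\mathbf{r},t)$ couples to the observable 
$A_i$, then in the presence of a set of small external fields $\{\delta a_i^{\mathrm{ext}}\}$, the Hamiltonian $H$ of a system is perturbed to 

$$\mathcal{H}(t) = H - \sum_i \int d\mathbf{r}\, A_i(\mathbf{r})\,\delta a_i^{\mathrm{ext}}(\mathbf{r},t) \qquad \text{(A3.3.2)}$$

in a Schrödinger representation. One can use time-dependent perturbation theory to find the linear response of 
the system to the small external fields. If the system is in equilibrium at time $t = -\infty$, and is evolved under 
$\mathcal{H}(t)$, the effect on $A_i(\mathbf{r},t)$, which is $\delta\langle A_i\rangle = \langle A_i\rangle_{\text{non-eq}} - \langle A_i\rangle_{\text{eq}}$, can be calculated to first order in the external fields. 
The result is (causality dictates the upper limit of the time integration to be $t$) 

$$\delta\langle A_i(\mathbf{r},t)\rangle = \sum_j \int_{-\infty}^{t} dt' \int d\mathbf{r}'\; 2i\,\chi''_{ij}(\mathbf{r}t,\mathbf{r}'t')\,\delta a_j^{\mathrm{ext}}(\mathbf{r}',t') \qquad \text{(A3.3.3)}$$

where the response function (matrix) is given by 

$$\chi''_{ij}(\mathbf{r}t,\mathbf{r}'t') = \chi''_{ij}(\mathbf{r}-\mathbf{r}',t-t') = \left\langle \frac{1}{2\hbar}\,[A_i(\mathbf{r},t), A_j(\mathbf{r}',t')] \right\rangle \qquad \text{(A3.3.4)}$$

in a translationally invariant system. Note that the response function is an equilibrium property of the system 
with Hamiltonian $H$, independent of the small external fields $\{\delta a_i^{\mathrm{ext}}\}$. In the classical limit (see section A2.2.3) 
the quantum mechanical commutator becomes the classical Poisson bracket and the response function reduces 
to $\langle (i/2)\,[A_i(\mathbf{r},t), A_j(\mathbf{r}',t')]_{\mathrm{P.B.}} \rangle$. 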


Since typical scattering experiments probe the system fluctuations in the frequency-wavenumber space, the 
Fourier transform $\chi''_{ij}(\mathbf{k},\omega)$ is closer to measurements; it is in fact the imaginary (dissipative) part of the 
response function (matrix) defined as 

$$\chi_{ij}(\mathbf{k},z) = \int \frac{d\omega}{\pi}\,\frac{\chi''_{ij}(\mathbf{k},\omega)}{\omega - z} \qquad (\operatorname{Im} z \neq 0). \qquad \text{(A3.3.5)}$$

The real part of $\chi_{ij}(\mathbf{k},\omega)$, $\chi'_{ij}$, is the dispersive (reactive) part of $\chi_{ij}$, and the definition of $\chi_{ij}$ implies a relation 
between $\chi'_{ij}$ and $\chi''_{ij}$ which is known as the Kramers-Kronig relation. 

The response function $\chi''_{ij}(\mathbf{k},\omega)$, which is defined in equation (A3.3.4), is related to the corresponding 
correlation function, $S_{ij}(\mathbf{k},\omega)$, through the fluctuation dissipation theorem: 

$$\chi''_{ij}(\mathbf{k},\omega) = \frac{1}{2\hbar}\left(1 - e^{-\beta\hbar\omega}\right) S_{ij}(\mathbf{k},\omega), \qquad \text{(A3.3.6)}$$

with $\beta = 1/k_{\mathrm{B}}T$. The fluctuation dissipation theorem relates the dissipative part of the response function ($\chi''$) to the correlation 
of fluctuations (of the $A_i$), for any system in thermal equilibrium. The left-hand side describes the dissipative 
behaviour of a many-body system: all or part of the work done by the external forces is irreversibly 
distributed into the infinitely many degrees of freedom of the thermal system. The correlation function on the 
right-hand side describes the manner in which a fluctuation arising spontaneously in a system in thermal 
equilibrium, even in the absence of external forces, may dissipate in time. In the classical limit, the fluctuation 
dissipation theorem becomes $\chi''_{ij}(\mathbf{k},\omega) = (\beta\omega/2)\,S_{ij}(\mathbf{k},\omega)$. 
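
As a rough numerical illustration of equation (A3.3.6), the sketch below evaluates the factor $(1 - e^{-\beta\hbar\omega})/2\hbar$ that converts the correlation function into the dissipative response, and compares it with its classical limit $\beta\omega/2$. The function names, the temperature and the sample frequencies are illustrative assumptions and are not taken from the text.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J / K


def fdt_factor_quantum(omega, temperature):
    """Quantum factor (1 - exp(-beta*hbar*omega)) / (2*hbar) of (A3.3.6)."""
    x = HBAR * omega / (K_B * temperature)  # dimensionless beta*hbar*omega
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small x
    return -math.expm1(-x) / (2.0 * HBAR)


def fdt_factor_classical(omega, temperature):
    """Classical limit beta*omega/2, valid when beta*hbar*omega << 1."""
    return omega / (2.0 * K_B * temperature)


if __name__ == "__main__":
    T = 300.0  # illustrative temperature in kelvin (an assumption)
    for omega in (1.0e9, 1.0e12, 1.0e14):  # illustrative angular frequencies, rad/s
        ratio = fdt_factor_quantum(omega, T) / fdt_factor_classical(omega, T)
        print(f"omega = {omega:.1e} rad/s  quantum/classical = {ratio:.6f}")
```

The ratio stays close to one as long as $\hbar\omega \ll k_{\mathrm{B}}T$, which is the regime in which the classical form of the theorem can be used.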

There are two generic types of external fields that are of general interest. In one of these, which relates to the 
scattering experiments, the external fields are to be taken as periodic perturbations 


$$\delta a_i^{\mathrm{ext}}(\mathbf{r},t) = \delta a_i(\mathbf{r})\exp[-(\eta - i\omega)t]$$

where $\eta$ is an infinitesimally small negative constant, and $\delta a_i(\mathbf{r})$ can also be a periodic variation in $\mathbf{r}$, as is the 
case for the incident plane-wave electromagnetic radiation considered earlier. 

In the other class of experiments, the system, in equilibrium at $t = -\infty$, is adiabatically perturbed to a 
non-equilibrium state which gets fully switched on by $t = 0$, through the field $\delta a_i^{\mathrm{ext}}(\mathbf{r},t) = \delta a_i(\mathbf{r})\,e^{\varepsilon t}$, $t < 0$, with $\varepsilon$ 
an infinitesimally small positive constant. At $t = 0$ the external field is turned off, and the system so prepared 
in a non-equilibrium state will, if left to itself, relax back to equilibrium. This is the generic relaxation 
experiment during which the decay of the initial ($t = 0$) value is measured. Such an external field will produce, 
at $t = 0$, spatially varying initial values $\delta\langle A_i(\mathbf{r},t=0)\rangle$ whose spatial Fourier transforms are given by 

$$\delta\langle A_i(\mathbf{k},t=0)\rangle = \sum_j \chi_{ij}(\mathbf{k})\,\delta a_j(\mathbf{k}) \qquad \text{(A3.3.7)}$$

where 

$$\chi_{ij}(\mathbf{k}) = \int \frac{d\omega}{\pi}\,\frac{\chi''_{ij}(\mathbf{k},\omega)}{\omega}. \qquad \text{(A3.3.8)}$$

If $\delta a_j(\mathbf{r})$ is slowly varying in space, the long-wavelength limit $\chi_{ij}(\mathbf{k}\to 0)$ reduces to a set of static 
susceptibilities or thermodynamic derivatives. Now, since for $t > 0$ the external fields are zero, it is useful to 
evaluate the one-sided transform 

$$\begin{aligned}
\delta\langle A_i\rangle(\mathbf{k},z) &= \int_0^{\infty} dt\, e^{izt}\,\delta\langle A_i(\mathbf{k},t)\rangle \\
&= \sum_j \int \frac{d\omega}{\pi i}\,\frac{\chi''_{ij}(\mathbf{k},\omega)}{\omega(\omega - z)}\,\delta a_j(\mathbf{k}) \\
&= \frac{1}{iz}\sum_j \left[\chi(\mathbf{k},z)\,\chi^{-1}(\mathbf{k}) - 1\right]_{ij}\,\delta\langle A_j(\mathbf{k},t=0)\rangle. \qquad \text{(A3.3.9)}
\end{aligned}$$

The second equality is obtained using the form of the external field $\delta a_i^{\mathrm{ext}}(\mathbf{r},t)$ specific to the relaxation 
experiments. The last equality is to be read as a matrix equation. The system stability leads to the positivity of 
all susceptibilities $\chi(\mathbf{k})$, so that its inverse exists. This last equality is superior to the second one, since the 
external fields $\delta a_j$ have been eliminated in favour of the initial values $\delta\langle A_j(\mathbf{k},t=0)\rangle$, which are directly 
measurable in a relaxation experiment. It is then possible to analyse the relaxation experiments by obtaining 
the measurements for positive times and comparing them to $\delta\langle A_i(\mathbf{k},t)\rangle$ as evaluated in terms of the initial 
values using some approximate model for the dynamics of the system's evolution. One such model is a linear 
hydrodynamic description. 
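
The static sum rule (A3.3.8) can be checked numerically for a simple model. The sketch below assumes a single-relaxation-time (Debye-like) dissipative response, $\chi''(\omega) = \chi\,\omega\tau/(1+\omega^2\tau^2)$; this model, the function names and the numerical values are illustrative assumptions rather than results from the text. The integral of $\chi''(\omega)/\omega$ is taken over a finite frequency window, so the estimate falls slightly below the input static value.

```python
import math


def chi_dissipative(omega, chi_static, tau):
    """Assumed Debye-like form: chi * omega*tau / (1 + (omega*tau)**2)."""
    return chi_static * omega * tau / (1.0 + (omega * tau) ** 2)


def static_susceptibility(chi_static, tau, cutoff=1000.0, n=200_000):
    """Midpoint-rule estimate of (1/pi) * integral of chi''(omega)/omega.

    The window is -cutoff/tau < omega < cutoff/tau. With an even number of
    cells the midpoints never land on omega = 0.
    """
    w_max = cutoff / tau
    dw = 2.0 * w_max / n
    total = 0.0
    for i in range(n):
        w = -w_max + (i + 0.5) * dw
        total += chi_dissipative(w, chi_static, tau) / w
    return total * dw / math.pi


if __name__ == "__main__":
    estimate = static_susceptibility(chi_static=2.5, tau=1.0e-12)
    print(f"sum-rule estimate = {estimate:.4f} (input static value 2.5)")
```

The small shortfall comes from the slowly decaying tails outside the integration window and shrinks as `cutoff` is increased.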


A3.3.2.2 FLUCTUATIONS IN THE HYDRODYNAMIC DOMAIN 

We start with a simple example: the decay of concentration fluctuations in a binary mixture which is in 
equilibrium. Let $\delta C(\mathbf{r},t) = C(\mathbf{r},t) - C_0$ be the concentration fluctuation field in the system, where $C_0$ is the mean 
concentration. $C$ is a conserved variable and thus satisfies a continuity equation: 

$$\frac{\partial\,\delta C(\mathbf{r},t)}{\partial t} + \nabla\cdot\mathbf{j}(\mathbf{r},t) = 0 \qquad \text{(A3.3.10)}$$

where a phenomenological linear constitutive relation relates the concentration flux $\mathbf{j}$ to the gradient of the 
local chemical potential $\mu(\mathbf{r},t)$ as follows: 

$$\mathbf{j}(\mathbf{r},t) = -L\,\nabla\mu(\mathbf{r},t). \qquad \text{(A3.3.11)}$$

Here $L$ is the Onsager coefficient and the minus sign indicates that the concentration flow occurs from 
regions of high $\mu$ to low $\mu$ in order that the system irreversibly flows towards the equilibrium state of 
uniform chemical potential. In a system slightly away from equilibrium, the dependence of $\mu$ on the 
thermodynamic state variables, concentration, pressure and temperature $(C,p,T)$, would, in general, relate 
changes in the chemical potential like $\nabla\mu$ to $\nabla C$, $\nabla p$ and $\nabla T$. However, for most systems the thermodynamic 
derivatives $\partial\mu/\partial p$ and $\partial\mu/\partial T$ are small, and one has, to a good approximation, $\nabla\mu = (\partial\mu/\partial C)_{p,T}\,\nabla\delta C$. This 
linear approximation is not always valid; however, it is valid for the thermodynamic fluctuations in a binary 
mixture at equilibrium. It thus enables us to construct a closed linear equation for the concentration 
fluctuations, the diffusion equation: 

$$\frac{\partial\,\delta C}{\partial t} = D\,\nabla^2\delta C \qquad \text{(A3.3.12)}$$

where $D = L(\partial\mu/\partial C)_{p,T}$ is the diffusion coefficient, which we assume to be a constant. The diffusion equation 
is an example of a hydrodynamic equation. The characteristic ingredients of a hydrodynamic equation are a 
conservation law and a linear transport law. 

The solutions of such partial differential equations require information on the spatial boundary conditions and 
initial conditions. Suppose we have an infinite system in which the concentration fluctuations vanish at the 
infinite boundary. If, at $t = 0$, we have a fluctuation at the origin, $\delta C(\mathbf{r},0) = \Delta C_0\,\delta(\mathbf{r})$, then the diffusion equation 
can be solved using spatial Fourier transforms. The solution in Fourier space is $\delta C(\mathbf{k},t) = \exp(-Dk^2t)\,\Delta C_0$, 
which can be inverted analytically since the Fourier transform of a Gaussian is a Gaussian. In real space, the 
initial delta function fluctuation broadens to a Gaussian whose width increases in time as $(Dt)^{1/2}$ and whose 
peak height decreases as $(Dt)^{-d/2}$ for a $d$-dimensional system, while the area under the 
Gaussian remains equal to $\Delta C_0$ due to the conservation law. Linear hydrodynamics is not always valid. For 
this example, near the consolute (critical) point of the mixture, the concentration fluctuations nonlinearly 
couple to transverse velocity modes and qualitatively change the result. Away from the critical point, 
however, the above simple analysis illustrates the manner in which the thermodynamic 
fluctuations decay in the hydrodynamic (i.e. long-wavelength) regime. The diffusion equation and its 
solutions constitute a rich subject with deep connections to Brownian motion theory [1]; both form a paradigm 
for many other models of dynamics in which diffusion-like decay and damping play important roles. 
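
A minimal one-dimensional sketch of this solution is given below. Each Fourier mode decays as $\exp(-Dk^2t)$, and the corresponding real-space profile is the Gaussian $\Delta C_0\,(4\pi Dt)^{-1/2}\exp(-x^2/4Dt)$, whose standard deviation is $(2Dt)^{1/2}$ and whose area stays equal to $\Delta C_0$. The function names and the numerical values of $D$ and $\Delta C_0$ are illustrative assumptions.

```python
import math


def mode_amplitude(k, t, diffusivity, delta_c0):
    """Fourier amplitude delta C(k, t) = exp(-D k^2 t) * Delta C_0."""
    return delta_c0 * math.exp(-diffusivity * k * k * t)


def profile_1d(x, t, diffusivity, delta_c0):
    """Real-space solution in d = 1 for a delta-function initial fluctuation."""
    norm = math.sqrt(4.0 * math.pi * diffusivity * t)
    return delta_c0 * math.exp(-x * x / (4.0 * diffusivity * t)) / norm


def area_and_width(t, diffusivity, delta_c0, n=4000):
    """Midpoint-rule area and root-mean-square width of the 1d profile.

    The grid spans ten standard deviations on either side of the origin.
    """
    sigma = math.sqrt(2.0 * diffusivity * t)
    x_max = 10.0 * sigma
    dx = 2.0 * x_max / n
    area = 0.0
    second_moment = 0.0
    for i in range(n):
        x = -x_max + (i + 0.5) * dx
        c = profile_1d(x, t, diffusivity, delta_c0)
        area += c * dx
        second_moment += x * x * c * dx
    return area, math.sqrt(second_moment / area)


if __name__ == "__main__":
    D = 1.0e-9   # illustrative diffusion coefficient, m^2/s (an assumption)
    dc0 = 3.0    # illustrative initial area of the fluctuation (an assumption)
    for t in (1.0, 4.0):
        area, width = area_and_width(t, D, dc0)
        print(f"t = {t:.0f} s  area = {area:.6f}  rms width = {width:.3e} m")
```

Quadrupling the elapsed time doubles the width while leaving the area unchanged, which is the conservation law at work.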

In dense systems like liquids, the molecular description has a large number of degrees of freedom. There are, 
however, a few collective degrees of freedom, collective modes, which when perturbed through a fluctuation, 
relax to equilibrium very slowly, i.e. with a characteristic decay time that is long compared to the molecular 
interaction time. These modes involve a large number of particles and their relaxation time is proportional to 
the square of their characteristic wavelength, which is large compared to the intermolecular separation. 
Hydrodynamics is suitable to describe the dynamics of such long-wavelength, slowly-relaxing modes. 

In a hydrodynamic description, the fluid is considered as a continuous medium which is locally homogeneous 
and isotropic, with dissipation occurring through viscous friction and thermal conduction. For a 
one-component system, the hydrodynamic (collective) variables are deduced from conservation laws and broken 
symmetry. We first consider (section A3.3.2.3) the example of a Rayleigh-Brillouin spectrum of a 
one-component monatomic fluid. Here conservation laws play the important role. In the next example (section 
A3.3.2.4), we use a fluctuating hydrodynamic description for capillary waves at a liquid-vapour interface 
where broken symmetry plays an important role. A significant understanding of underlying phenomena for 
each of these examples has been obtained using linear hydrodynamics [2], even though, in principle, nonlinear 
dynamical aspects are within the exact dynamics of these systems. 

In the next section we discuss linear hydrodynamics and its role in understanding the inelastic light scattering 
experiments from liquids, by calculating the density-density correlation function, $S_{\rho\rho}$. 

A3.3.2.3 RAYLEIGH-BRILLOUIN SPECTRUM 

The three conservation laws of mass, momentum and energy play a central role in the hydrodynamic 
description. For a one-component system, these are the only hydrodynamic variables. The mass density has an 
interesting feature in the associated continuity equation: the mass current (flux) is the momentum density and 
thus itself is conserved, in the absence of external forces. The mass density $\rho(\mathbf{r},t)$ satisfies a continuity 
equation which can be expressed in the form (see, for example, the book on fluid mechanics by Landau and 
Lifshitz, cited in the Further Reading) 

$$\left(\frac{\partial}{\partial t} + \mathbf{v}\cdot\nabla\right)\rho = -\rho\,\nabla\cdot\mathbf{v}. \qquad \text{(A3.3.13)}$$

The equation of momentum conservation, along with the linear transport law due to Newton, which relates the 
dissipative stress tensor to the rate of strain tensor $e_{jk} = \tfrac{1}{2}(\nabla_j v_k + \nabla_k v_j)$, and which introduces two 
transport coefficients, shear viscosity $\eta$ and bulk viscosity $\eta_b$, lead to the equation of motion for a Newtonian 
fluid: 

$$\left(\frac{\partial}{\partial t} + \mathbf{v}\cdot\nabla\right)\mathbf{v} = -\frac{1}{\rho}\nabla p + \nu\nabla^2\mathbf{v} + (\nu_l - \nu)\,\nabla(\nabla\cdot\mathbf{v}) \qquad \text{(A3.3.14)}$$

where the kinematic viscosity $\nu = \eta/\rho$ and the kinematic longitudinal viscosity $\nu_l = (\tfrac{4}{3}\eta + \eta_b)/\rho$. 

The energy conservation law also leads to an associated continuity equation for the total energy density. The 
total energy density contains both the kinetic energy density per unit volume and the internal energy density. 
The energy flux is made up of four terms: a kinematic term, the rates of work done by reversible pressure and 
dissipative viscous stress, and a dissipative heat flux. It is the dissipative heat flux that is assumed to be 
proportional to the temperature gradient and this linear transport law, Fourier's law, introduces as a 
proportionality coefficient, the coefficient of thermal conductivity, $\kappa$. From the resulting energy equation, one 
can obtain the equation for the rate of entropy balance in the system, which on account of the irreversibility 
and the arrow of time implied by the Second Law of Thermodynamics leads to the result that each of the 
transport coefficients $\eta$, $\eta_b$ and $\kappa$ is a positive definite quantity. Using the mass conservation equation 
(A3.3.13), and thermodynamic relations which relate entropy change to changes in density and temperature, 
the entropy balance equation can be transformed to the hydrodynamic equation for the local temperature 
$T(\mathbf{r},t)$: 

$$\left(\frac{\partial}{\partial t} + \mathbf{v}\cdot\nabla\right)T = -\alpha^{-1}(\gamma - 1)\,\nabla\cdot\mathbf{v} + (\rho C_V)^{-1}\left[\nabla\cdot(\kappa\nabla T) + 2\eta\,e_{jk}e_{jk} + \left(\eta_b - \tfrac{2}{3}\eta\right)(\nabla\cdot\mathbf{v})^2\right] \qquad \text{(A3.3.15)}$$

where $\alpha$ is the thermal expansion coefficient, $\gamma = C_p/C_V$, $C_p$ the heat capacity per unit mass at constant 
pressure and $C_V$ the same at constant volume. $e_{jk}$ is the rate of strain tensor defined above, and a repeated 
subscript implies a summation over that subscript, here and below. 

The three equations (A3.3.13), (A3.3.14) and (A3.3.15) are a useful starting point in many 
hydrodynamic problems. We now apply them to compute the density-density correlation function 

$$S_{\rho\rho}(\mathbf{r},t;\mathbf{r}',t') = \langle\delta\rho(\mathbf{r},t)\,\delta\rho(\mathbf{r}',t')\rangle. \qquad \text{(A3.3.16)}$$

Since the fluctuations are small, it is appropriate to linearize the three equations in $\delta\rho(\mathbf{r},t) = \rho - \rho_0$, 
$\delta T(\mathbf{r},t) = T - T_0$ and $\mathbf{v}(\mathbf{r},t)$, by expanding around their respective equilibrium values $\rho_0$, $T_0$ and zero, where we 
assume that the mean fluid velocity is zero. The linearization eliminates the advective term $\mathbf{v}\cdot\nabla$ from each 
of the three equations, and also removes the bilinear viscous dissipation terms from the temperature equation 
(A3.3.15). The $\nabla p$ term in the velocity equation (A3.3.14) can be expressed in terms of density and 
temperature gradients using thermodynamic derivative identities: 

$$\nabla p = \left(\frac{\partial p}{\partial\rho}\right)_T\left[\nabla\delta\rho + \rho_0\alpha\,\nabla\delta T\right] = \frac{c^2}{\gamma}\left[\nabla\delta\rho + \rho_0\alpha\,\nabla\delta T\right]$$

where $(\partial p/\partial\rho)_T = c_T^2 = c^2/\gamma$, with $c_T$ and $c$ the isothermal and adiabatic speeds of sound, respectively. The 
momentum equation then linearizes to 

$$\frac{\partial\mathbf{v}}{\partial t} + \frac{c^2}{\gamma\rho_0}\left[\nabla\delta\rho + \rho_0\alpha\,\nabla\delta T\right] - \nu\nabla^2\mathbf{v} - (\nu_l - \nu)\,\nabla(\nabla\cdot\mathbf{v}) = 0. \qquad \text{(A3.3.17)}$$

The linearized equations for density and temperature are: 

$$\frac{\partial\,\delta\rho}{\partial t} + \rho_0\,\nabla\cdot\mathbf{v} = 0 \qquad \text{(A3.3.18)}$$

and 

$$\frac{\partial\,\delta T}{\partial t} + \alpha^{-1}(\gamma - 1)\,\nabla\cdot\mathbf{v} - \gamma D_T\nabla^2\delta T = 0. \qquad \text{(A3.3.19)}$$

Here the thermal diffusivity $D_T = \kappa/(\rho_0 C_p)$. These two equations couple only to the longitudinal part $\psi = \nabla\cdot\mathbf{v}$ 
of the fluid velocity. From equation (A3.3.17) it is easy to see that $\psi$ satisfies 

$$\frac{\partial\psi}{\partial t} + \frac{c^2}{\gamma\rho_0}\left[\nabla^2\delta\rho + \rho_0\alpha\,\nabla^2\delta T\right] - \nu_l\nabla^2\psi = 0. \qquad \text{(A3.3.20)}$$

Out of the five hydrodynamic modes, the polarized inelastic light scattering experiment can probe only the 
three modes represented by equations (A3.3.18), (A3.3.19) and (A3.3.20). The other two 
modes, which are in equation (A3.3.17), decouple from the density fluctuations; these are due to the transverse 
velocity components, i.e. the vorticity $\nabla\times\mathbf{v}$. Vorticity fluctuations decay in a manner analogous to 
that of the concentration fluctuations discussed in section A3.3.2.2: if one considers the vorticity fluctuation 
Fourier mode of wavevector $\mathbf{k}$, then the correlations of the $\mathbf{k}$th Fourier mode of vorticity also decay in an 
exponential manner, with the form $\exp(-\nu k^2 t)$. 
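
The three longitudinal modes can be made concrete with a short numerical sketch. Taking Fourier-Laplace transforms of (A3.3.18), (A3.3.19) and (A3.3.20) and eliminating $\psi$ and $\delta T$ gives, for a time dependence $e^{st}$, the cubic $s(s + \nu_l k^2)(s + \gamma D_T k^2) + c^2k^2(s + D_T k^2) = 0$. This cubic is our own elimination and is not quoted from the text; the parameter values below are rough, liquid-like numbers chosen only for illustration. The code refines the small-$k$ estimates $s \approx -D_T k^2$ and $s \approx \pm ick - \Gamma k^2$, with $\Gamma = \tfrac{1}{2}[\nu_l + (\gamma - 1)D_T]$, by Newton iteration.

```python
def dispersion(s, k, c, gamma, d_t, nu_l):
    """Cubic whose roots are the decay rates of the longitudinal modes."""
    k2 = k * k
    return s * (s + nu_l * k2) * (s + gamma * d_t * k2) + c * c * k2 * (s + d_t * k2)


def dispersion_derivative(s, k, c, gamma, d_t, nu_l):
    """Derivative of the cubic with respect to s."""
    k2 = k * k
    a = s + nu_l * k2
    b = s + gamma * d_t * k2
    return a * b + s * b + s * a + c * c * k2


def refine_root(s0, k, c, gamma, d_t, nu_l, steps=50):
    """Newton iteration in the complex plane, started from the guess s0."""
    s = complex(s0)
    for _ in range(steps):
        s -= dispersion(s, k, c, gamma, d_t, nu_l) / dispersion_derivative(
            s, k, c, gamma, d_t, nu_l
        )
    return s


def longitudinal_modes(k, c, gamma, d_t, nu_l):
    """Return (thermal, sound_plus, sound_minus) roots of the cubic."""
    k2 = k * k
    big_gamma = 0.5 * (nu_l + (gamma - 1.0) * d_t)
    guesses = (
        complex(-d_t * k2, 0.0),           # heat-diffusion (Rayleigh) mode
        complex(-big_gamma * k2, c * k),   # damped sound mode, +ck
        complex(-big_gamma * k2, -c * k),  # damped sound mode, -ck
    )
    return tuple(refine_root(g, k, c, gamma, d_t, nu_l) for g in guesses)


if __name__ == "__main__":
    # Illustrative values only (assumptions): k in 1/m, c in m/s, D_T and nu_l in m^2/s
    k, c, gamma, d_t, nu_l = 2.0e7, 850.0, 2.2, 8.0e-8, 3.0e-7
    for name, root in zip(("thermal", "sound +", "sound -"),
                          longitudinal_modes(k, c, gamma, d_t, nu_l)):
        print(f"{name:8s} s = {root.real:.4e} {root.imag:+.4e}i  1/s")
```

For wavenumbers typical of light scattering the refined roots differ only slightly from the small-$k$ estimates, which is why the hydrodynamic spectrum is so well described by one central line and two shifted lines.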

The density fluctuation spectrum can be obtained by taking a spatial Fourier transform and a temporal Laplace 
transform of the three coupled equations (A3.3.18), (A3.3.19) and (A3.3.20), and then 
solving the resulting linear coupled algebraic set for the density fluctuation spectrum. (See details in the books 
by Berne-Pecora and Boon-Yip.) The result for $S_{\rho\rho}(\mathbf{k},\omega)$ given below is proportional to its frequency integral 
$S_{\rho\rho}(\mathbf{k})$, which is the liquid structure factor discussed earlier in section A2.2.5.2. The density fluctuation 
spectrum is 

$$\begin{aligned}
\frac{S_{\rho\rho}(\mathbf{k},\omega)}{S_{\rho\rho}(\mathbf{k})} ={}& \frac{\gamma - 1}{\gamma}\,\frac{2D_Tk^2}{\omega^2 + (D_Tk^2)^2} + \frac{1}{\gamma}\left[\frac{\Gamma k^2}{(\omega + ck)^2 + (\Gamma k^2)^2} + \frac{\Gamma k^2}{(\omega - ck)^2 + (\Gamma k^2)^2}\right] \\
&+ \frac{1}{\gamma}\left[\Gamma + (\gamma - 1)D_T\right]\frac{k}{c}\left[\frac{\omega + ck}{(\omega + ck)^2 + (\Gamma k^2)^2} - \frac{\omega - ck}{(\omega - ck)^2 + (\Gamma k^2)^2}\right] \qquad \text{(A3.3.21)}
\end{aligned}$$

where $\Gamma = \tfrac{1}{2}[\nu_l + (\gamma - 1)D_T]$. 

This is the result for monatomic fluids and is well approximated by a sum of three Lorentzians, as given by 
the first three terms on the right-hand side. The physics of these three Lorentzians can be understood by 
thinking about a local density fluctuation as made up of thermodynamically independent entropy and pressure 
fluctuations: $\rho = \rho(s,p)$. The first term is a consequence of the thermal processes quantified by the entropy 
fluctuations at constant pressure, which lead to the decaying mode $[(\gamma - 1)/\gamma]\exp[-D_Tk^2|t|]$; the associated 
Lorentzian, known as the Rayleigh peak, is centred at zero frequency with a half-width at half-maximum of 
$D_Tk^2$. The next two terms (Lorentzians) arise from the mechanical part of the density fluctuations, the 
pressure fluctuations at constant entropy. These are the adiabatic sound modes $(1/\gamma)\exp[-\Gamma k^2|t|]\cos[\omega(k)|t|]$ 
with $\omega(k) = \pm ck$, and lead to the two spectral lines (Lorentzians) which are shifted in frequency by $-ck$ (Stokes 
line) and $+ck$ (anti-Stokes line). These are known as the Brillouin-Mandelstam doublet. The half-width at 
half-maximum of this pair is $\Gamma k^2$, which gives the attenuation of acoustic modes. In dense liquids, the last two 
terms in the density fluctuation spectrum above are smaller by orders of magnitude compared to the three 
Lorentzians, and lead to s-shaped curves centred at $\omega = \pm ck$. They cause a weak asymmetry in the Brillouin 
peaks which induces a slight pulling of their position towards the central Rayleigh peak. The Rayleigh-Brillouin 
spectrum from liquid argon, as measured by an inelastic polarized light scattering experiment, is 
shown in figure A3.3.1. An accurate measurement of the Rayleigh-Brillouin lineshape can be used to measure 
many of the thermodynamic and transport properties of a fluid. The ratio of the integrated intensity of the 
Rayleigh peak to those of the Brillouin peaks, known as the Landau-Placzek ratio, is $I_R/(2I_B) = \gamma - 1$, and 
directly measures the ratio of specific heats $\gamma$. From the position of the Brillouin peaks one can obtain the 
adiabatic speed of sound $c$, and knowing $\gamma$ and $c$ one can infer the isothermal compressibility. From the width of 
the Rayleigh peak, one can obtain the thermal diffusivity (and, if $C_p$ is known, the thermal conductivity $\kappa$). Then 
from the width of the Brillouin peaks, one can obtain the longitudinal viscosity (and, if the shear viscosity is 
known, the bulk viscosity). 

Figure A3.3.1 Rayleigh-Brillouin spectrum from liquid argon, taken from [4]. 
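
To make the structure of the spectrum (A3.3.21) concrete, the following sketch evaluates its terms separately: the Rayleigh Lorentzian of half-width $D_Tk^2$, the two Brillouin Lorentzians of half-width $\Gamma k^2$ centred at $\pm ck$, and the weak asymmetric correction. It then estimates the Landau-Placzek ratio $I_R/(2I_B)$ by numerical integration. The function names, the finite integration windows and the parameter values are illustrative assumptions; the numbers are not measured values for argon or any other fluid.

```python
import math


def brillouin_halfwidth(k, gamma, d_t, nu_l):
    """Gamma * k^2 with Gamma = (nu_l + (gamma - 1) * D_T) / 2."""
    return 0.5 * (nu_l + (gamma - 1.0) * d_t) * k * k


def rayleigh_term(omega, k, gamma, d_t):
    """Central (entropy-fluctuation) Lorentzian of half-width D_T k^2."""
    w_t = d_t * k * k
    return (gamma - 1.0) / gamma * 2.0 * w_t / (omega * omega + w_t * w_t)


def brillouin_term(omega, k, c, gamma, d_t, nu_l, sign):
    """One sound-mode Lorentzian; sign=+1 centres it at -ck, sign=-1 at +ck."""
    w_s = brillouin_halfwidth(k, gamma, d_t, nu_l)
    return w_s / ((omega + sign * c * k) ** 2 + w_s * w_s) / gamma


def asymmetric_term(omega, k, c, gamma, d_t, nu_l):
    """Small s-shaped correction that pulls the Brillouin peaks inwards."""
    w_s = brillouin_halfwidth(k, gamma, d_t, nu_l)
    b = (w_s / (k * k) + (gamma - 1.0) * d_t) * k / c
    plus = (omega + c * k) / ((omega + c * k) ** 2 + w_s * w_s)
    minus = (omega - c * k) / ((omega - c * k) ** 2 + w_s * w_s)
    return b / gamma * (plus - minus)


def spectrum(omega, k, c, gamma, d_t, nu_l):
    """Normalized spectrum S(k, omega) / S(k): sum of all terms."""
    return (rayleigh_term(omega, k, gamma, d_t)
            + brillouin_term(omega, k, c, gamma, d_t, nu_l, +1.0)
            + brillouin_term(omega, k, c, gamma, d_t, nu_l, -1.0)
            + asymmetric_term(omega, k, c, gamma, d_t, nu_l))


def integrate(func, lo, hi, n=40_000):
    """Plain midpoint rule on [lo, hi]."""
    dw = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * dw) for i in range(n)) * dw


def landau_placzek_ratio(k, c, gamma, d_t, nu_l, widths=1000.0):
    """I_R / (2 I_B), each line integrated over +/- `widths` half-widths."""
    w_t = d_t * k * k
    w_s = brillouin_halfwidth(k, gamma, d_t, nu_l)
    i_r = integrate(lambda w: rayleigh_term(w, k, gamma, d_t),
                    -widths * w_t, widths * w_t)
    i_b = integrate(lambda w: brillouin_term(w, k, c, gamma, d_t, nu_l, -1.0),
                    c * k - widths * w_s, c * k + widths * w_s)
    return i_r / (2.0 * i_b)


if __name__ == "__main__":
    # Illustrative values only (assumptions): k in 1/m, c in m/s, D_T and nu_l in m^2/s
    k, c, gamma, d_t, nu_l = 2.0e7, 850.0, 2.2, 8.0e-8, 3.0e-7
    print("Landau-Placzek ratio:", landau_placzek_ratio(k, c, gamma, d_t, nu_l))
    print("gamma - 1           :", gamma - 1.0)
```

Because the same relative window is used for both lines, the truncated Lorentzian tails cancel in the ratio, and the numerical estimate reproduces $\gamma - 1$ closely.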

A large variety of scattering experiments (inelastic light scattering using polarized and depolarized set ups, 
Raman scattering, inelastic neutron scattering) have been used over the past four decades to probe and 
understand the spatio-temporal correlations and molecular dynamics in monatomic and polyatomic fluids in 
equilibrium, spanning the density range from low-density gases to dense liquids. In the same fashion, 
concentration fluctuations in binary mixtures have also been probed. See [3, 4, 5, 6, 7 and 8] for further 
reading for these topics. 

In the next section, we consider thermal fluctuations in an inhomogeneous system. 

A3.3.2.4 CAPILLARY WAVES 

In this section we discuss the frequency spectrum of excitations on a liquid surface. While we used linearized 
equations of hydrodynamics in the last section to obtain the density fluctuation spectrum in the bulk of a 
homogeneous fluid, here we use linear fluctuating hydrodynamics to derive an equation of motion for the 
instantaneous position of the interface. We then use this equation to analyse the fluctuations in such an 
inhomogeneous system, around equilibrium and around a NESS characterized by a small temperature 
gradient. More details can be found in [9, 10]. 

Surface waves at an interface between two immiscible fluids involve effects due to gravity (g) and surface 
tension (a) forces. (In this section, a denotes surface tension and o jk denotes the stress tensor. The two should 
not be confused with one another.) In a hydrodynamic approach, the interface is treated as a sharp boundary 
and the two bulk phases as incompressible. The Navier-Stokes equations for the two bulk phases (balance of 
macroscopic forces is the ingredient) along with the boundary condition at the interface (surface tension a 
enters here) are solved for possible harmonic oscillations of the interface of the form, exp [-(iw + s)t + i}5\J], 
where w is the frequency, s is the damping coefficient, */is the 2- d wavevector of the periodic oscillation and 
.ra 2-d vector parallel to the surface. For a liquid-vapour interface which we consider, away from the critical 
point, the vapour density is negligible compared to the liquid density and one obtains the hydrodynamic 
dispersion relation for surface waves ur = (nfpu}(j* + £*y- The term gq in the dispersion relation arises from 

the gravity waves, and dominates for macroscopic wavelengths, but becomes negligible for wavelengths 

shorter than the capillary constant (2a/gp o ) , which is of the order of a few millimetres for water. In what 
follows we discuss phenomena at a planar interface (for which g is essential), but restrict ourselves to the 

capillary waves regime and set g = + . Capillary wave dispersion is then w t .(q) = I ^} f q- ? and the 


damping coefficient z(q) = (2r\/p )q . Consider a system of coexisting liquid and vapour contained in a 

~ o 

cubical box of volume Z . An external, infinitesimal gravitational field locates the liquid of density pj in the 
region z < -%, while the vapour of lower density p v is in the region z > %. A flat surface, of thickness 2%, is 
located about z = in the .?= (x,y) plane. The origin of the z axis is defined in accord with Gibbs' prescription: 

/U fL/2 

{P(z)-Pi\dz+ / [pU)-p T |dz = (A3.3.22) 

where ρ(z) is the equilibrium density profile of the inhomogeneous system. Let us first consider the system in
equilibrium. Let it also be away from the critical point. Then ρ_v ≪ ρ_l and the interface thickness is only a few
nanometres, and a model with zero interfacial width and a step function profile (Fowler model) is appropriate.
Also, since the speed of sound is much larger than the capillary wave speed, we can assume the liquid to be
incompressible, which implies a constant ρ in the liquid and, due to the mass continuity equation (equation
(A3.3.13)), also implies ∇·v = 0. Furthermore, if the amplitude of the capillary waves is small, the nonlinear
convective (advective) term (v·∇v) can also be ignored in (A3.3.14). The approach of fluctuating
hydrodynamics corresponds to having additional Gaussian random stress-tensor fluctuations in the Newtonian 
transport law and analogous heat flux fluctuations in the Fourier transport law. These fluctuations arise from 
those short lifetime degrees of freedom that are not included in a hydrodynamic description, a description 
based only on long-lifetime conserved hydrodynamic variables. 
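
As a rough numerical illustration of the surface-wave relations quoted above, the following sketch evaluates ω² = (σ/ρ)q³ + gq, the damping coefficient ε(q) = (2η/ρ)q² and the capillary constant (2σ/gρ)^{1/2}. The function names and the room-temperature values assumed for water are illustrative choices, not taken from the text.

```python
import math

def surface_wave_frequency(q, sigma, rho, g=9.81):
    """Angular frequency from omega^2 = (sigma/rho) q^3 + g q (SI units)."""
    return math.sqrt((sigma / rho) * q**3 + g * q)

def capillary_damping(q, eta, rho):
    """Damping coefficient epsilon(q) = (2 eta / rho) q^2."""
    return 2.0 * eta * q**2 / rho

def capillary_constant(sigma, rho, g=9.81):
    """Length (2 sigma / (g rho))^(1/2) below which the gravity term is negligible."""
    return math.sqrt(2.0 * sigma / (g * rho))

# Assumed, approximate values for water near room temperature (illustrative only)
SIGMA = 0.072   # surface tension, N/m
RHO = 1000.0    # liquid density, kg/m^3
ETA = 1.0e-3    # shear viscosity, Pa s

if __name__ == "__main__":
    q = 2.0 * math.pi / 100e-6          # wavevector for a 100 micrometre wavelength
    print("capillary constant (m):", capillary_constant(SIGMA, RHO))
    print("omega with gravity (1/s):", surface_wave_frequency(q, SIGMA, RHO))
    print("omega, g = 0 (1/s):", surface_wave_frequency(q, SIGMA, RHO, g=0.0))
    print("damping (1/s):", capillary_damping(q, ETA, RHO))
```

For these assumed values the capillary constant comes out at a few millimetres, and at a wavelength of 100 micrometres the gravity term changes the frequency only marginally, which is the regime in which setting g = 0⁺ is reasonable.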
The equations of motion for the bulk fluid for z < 0 are:

\frac{\partial v_x}{\partial x} + \frac{\partial v_z}{\partial z} = 0 \qquad (A3.3.23)


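which is the continuity equation, and

\frac{\partial v_x}{\partial t} = -\frac{1}{\rho}\frac{\partial p}{\partial x} + \frac{\eta}{\rho}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial z^2}\right) v_x + \frac{\partial}{\partial x}\left(\frac{s_{xx}}{\rho}\right) + \frac{\partial}{\partial z}\left(\frac{s_{xz}}{\rho}\right)

\frac{\partial v_z}{\partial t} = -\frac{1}{\rho}\frac{\partial p}{\partial z} + \frac{\eta}{\rho}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial z^2}\right) v_z + \frac{\partial}{\partial x}\left(\frac{s_{xz}}{\rho}\right) + \frac{\partial}{\partial z}\left(\frac{s_{zz}}{\rho}\right).

These are the two components of the Navier-Stokes equation including fluctuations s_ij, which obey the
fluctuation dissipation theorem, valid for incompressible, classical fluids:

\langle s_{ik}(x,z,t)\, s_{lm}(x',z',t') \rangle = 2kT\eta \left(\delta_{il}\delta_{km} + \delta_{im}\delta_{kl} - \tfrac{2}{3}\delta_{ik}\delta_{lm}\right) \delta(x-x')\,\delta(z-z')\,\delta(t-t'). \qquad (A3.3.24)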

This second moment of the fluctuations around equilibrium also defines the form of the ensemble ⟨···⟩ for the
equilibrium average at temperature T.

Surface properties enter through the Young-Laplace equation of state for the 'surface pressure' p_sur:

p_{\mathrm{sur}} = -\sigma \frac{\partial^2 \zeta}{\partial x^2} \quad \text{at } z = 0. \qquad (A3.3.25)

The non-conserved variable ζ(x,t) is a broken symmetry variable; it is the instantaneous position of the Gibbs
surface, and it is the translational symmetry in the z direction that is broken by the inhomogeneity due to the
liquid-vapour interface. In a more microscopic statistical mechanical approach [9], it is related to the number
density fluctuation δρ(x,z,t) as

\zeta(x,t) = (\rho_l - \rho_v)^{-1} \int dz\, \delta\rho(x,z,t) \qquad (A3.3.26)

but in the present hydrodynamic approach it is defined by

\frac{\partial \zeta}{\partial t} = v_z \quad \text{at } z = 0. \qquad (A3.3.27)
The boundary conditions at the z = 0 surface arise from the mechanical equilibrium, which implies that both
the normal and tangential forces are balanced there. This leads to

p = -\sigma \frac{\partial^2 \zeta}{\partial x^2} + 2\eta \frac{\partial v_z}{\partial z} + s_{zz} \quad \text{at } z = 0 \qquad (A3.3.28)


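\eta\left(\frac{\partial v_x}{\partial z} + \frac{\partial v_z}{\partial x}\right) = -s_{xz} \quad \text{at } z = 0. \qquad (A3.3.29)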


If the surface tension is a function of position, then there is an additional term, ∂σ/∂x, on the right-hand side of
the last equation. From the above description it can be shown that the equation of motion for the Fourier
component ζ(q,t) of the broken symmetry variable ζ is




+ 2d<f>^^- + u^Kiij.r) = -- dz&'Unfaz, 0-**A^ z, t) 


dj2 ^ d( b«.„ w ., ^ ^ - i v.vi-w «v^-* (A3.330) 

where ε(q) and ω_c(q) are the damping coefficient and dispersion relation for the capillary waves defined
earlier. This damped harmonic oscillator equation is driven by spontaneous thermal fluctuations and is
valid in the small viscosity limit. It does not have any special capillary wave fluctuations. The thermal random
force fluctuations s_ij are in the bulk and are coupled to the surface by the e^{qz} factor. This surface-bulk
coupling is an essential ingredient of any hydrodynamic theory of the liquid surface: the surface is not a
separable phase.

We now evaluate the spectrum of interfacial fluctuations S(q,ω). It is the space-time Fourier transform of the
correlation function ⟨ζ(x,t)ζ(x',t')⟩. It is convenient to do this calculation first for the fluctuations around a
NESS which has a small constant temperature gradient, no convection and constant pressure. The
corresponding results for the system in equilibrium are obtained by setting the temperature gradient to zero.


There are three steps in the calculation: first, solve the full nonlinear set of hydrodynamic equations in the 
steady state, where the time derivatives of all quantities are zero; second, linearize about the steady-state 
solutions; third, postulate a non-equilibrium ensemble through a generalized fluctuation dissipation relation. 

A steady-state solution of the full nonlinear hydrodynamic equations is v = 0, p = constant and ∂T/∂x =
constant, where the yz walls perpendicular to the xy plane of the interface are kept at different temperatures.
This steady-state solution for a small temperature gradient means that the characteristic length scale of the
temperature gradient satisfies (∂ ln T/∂x)^{-1} ≫ L. The solution also implicitly means that the thermal expansion
coefficient and surface excess entropy are negligible, i.e. (∂ ln ρ/∂ ln T) and (∂ ln σ/∂ ln T) are both
approximately zero, which in turn ensures that there is no convection in the bulk or at the surface. We again
assume that the fluid is incompressible and away from the critical point. Then, linearizing around the steady-state
solution once again leads, for ζ, to an equation of motion identical to (A3.3.30), which in Fourier space
(q,ω) can be written as (assuming ε ≪ ω_c)
P J-»c 


(A3.3.31) 

- tf XJ (</ ? z, w) - 2is xz {g, z, w)l 

The shear viscosity η is, to a good approximation, independent of T. So the only way the temperature gradient
can enter the analysis is through the form of the non-equilibrium ensemble, i.e. through the random forces s_jk.
Now we assume that the short-ranged random forces have the same form of the real-space correlation as in the
thermal equilibrium case above (equation (A3.3.24)), but with T replaced by T(x) = T_0 + (∂T/∂x)_0 x. Thus the
generalized fluctuation dissipation relation for a NESS, which determines the NESS ensemble, is

{jtttf, ■:, w)s im (i}\ -,'. u/Mnlss = 2ftTij \Su& km +S itri 5 k t - -5^ ffl 

Hz - z)Slw - u/)(2?r> 3 (A3.3.32) 

Then, from equation (A3.3.31) and equation (A3.3.32), we obtain the spectrum of interfacial fluctuations:

(£{Jr, M'H(x'. «/)) = 2x&{u> ~ w') [ -—Z^'^Sfa «')■ (A3.3.33) 

J (2jt)- 

In the absence of a temperature gradient, i.e. in thermal equilibrium, the dynamic structure factor S(q,w) is 

Siq, W) = ^5 ,, / ,, , , (A3.3.34) 

which is sharply and symmetrically peaked at the capillary wave frequencies ω_c(q) = ±(σq³/ρ)^{1/2}. In the
NESS, the result is asymmetric and is given by


■SNWffiti- W> = 5(^ IW)(] - ^ A(5 ? W» (A3.3.35) 


where 


[2^ 2 + ^(f/> 2 ]4^f^//»f7 ■ (3 In ?7 f lT)o (A3 3 36) 
Since ζ is the 'surface-averaged' part of δρ from equation (A3.3.26), S(q,ω) is the appropriately 'surface-averaged'
density fluctuation spectrum near an interface, and is thus experimentally accessible. The correction
term Δ(q,ω) is an odd function of frequency which creates an asymmetry in the heights of the two ripplon
peaks. This is on account of the small temperature gradient breaking the time reversal symmetry: there are
more ripples travelling from the hot side to the cold side than from cold to hot. One can also calculate the
zeroth and first frequency moments of S(q,ω):

S_0(q) = \frac{kT}{\sigma q^2} \qquad (A3.3.37)


and 


*■>-[£•■(¥); 


kT 


IH 


V 1 


(A3.3.38) 


These are both long ranged in the long-wavelength limit q → 0: S_0(q) due to broken translational symmetry
and S_1(q) due to broken time reversal symmetry. S_1(q) vanishes for fluctuations around equilibrium, and S_0(q)
is the same for both NESS and equilibrium. The results above are valid only for L_∇ ≫ L ≫ l_c, where the two
bounding length scales are respectively characteristic of the temperature gradient


I". /ainrv 


and of the capillary wave mean free path 


"" %£($)' 

The correction due to the temperature gradient in the capillary wave peak heights is the corresponding
fractional difference, which can be obtained by evaluating Δ(q, ω = ω_c). The result is simple:

A((j,W = W i: ) =3— . 

For the system in thermal equilibrium, one can compute the time-dependent mean square displacement ⟨|ζ|²⟩(q,t)
from the damped forced harmonic oscillator equation for ζ, equation (A3.3.30). The result is

\langle|\zeta|^2\rangle(q,t) = \frac{kT}{\sigma q^2}\left[1 - \exp\left(-\frac{4\eta}{\rho}\, q^2 t\right)\right] \qquad (A3.3.39)
which goes to S_0(q) as t → ∞ as required. By integrating it over the two-dimensional wavevector q, one can
find the mean square displacement of the interface:

\langle\zeta^2\rangle(t) = \frac{kT}{4\pi\sigma}\left[\ln\tau + E_1(\tau) + C\right]

where τ = (4ηq_max²/ρ)t with q_max = 2π/a, a being a typical molecular size (diameter), E_1 is the exponential
integral and C ≈ 0.577 is Euler's constant. Thus, as t → ∞, ⟨ζ²⟩(t) diverges as ln t, which is the dynamic
analogue of the well known infrared divergence of the interfacial thickness. Numerically, the effect is small:
at t ~ 10^18 s we find, using typical values of T, σ, η and ρ, that [⟨ζ²⟩(t)]^{1/2} ~ 9 Å. From equation (A3.3.39)
one can also proceed by first taking the t → ∞ limit and then integrating over q, with the result

\langle\zeta^2\rangle = \frac{kT}{2\pi\sigma}\ln\frac{q_{max}}{q_{min}}

where q_min = 2π/L. Again ⟨ζ²⟩ shows a logarithmic infrared divergence as 2π/L → 0, which is the
conventional result obtained from equilibrium statistical mechanics. This method hides the fact, which is
transparent in the dynamic treatment, that the source of the divergence is the spontaneous random force
fluctuations in the bulk, which drive the oscillations of the surface ζ. The equilibrium description of capillary
wave excitations of a surface is often introduced in the framework of the so-called capillary wave model: the
true free energy is functionally expanded around a 'bare' free energy in terms of the suppressed density
fluctuations δρ, and these 'capillary wave fluctuations' are assumed to be of the form

\delta\rho = -\zeta(x)\,\frac{d\rho(z)}{dz}. \qquad (A3.3.40)


It can be shown that this form leads to an unphysical dispersion relation for capillary waves: ω_c² ~ q⁴, rather
than ~q³. This is precisely because of the neglect of the surface-bulk coupling in the above assumed form.
One can show that a fluctuation consistent with capillary wave dispersion is




(A3.3.41) 


for z > ζ, where one neglects the vapour density, and where c is the speed of sound in the bulk phase coupled
to the surface. Thus, if one wants to introduce density fluctuations into the description, the entire fluid has to
be self-consistently treated as compressible. Physically, the first term ζ(dρ(z)/dz) corresponds to a
perturbation, or kick, of the interface, and the second term self-consistently accounts for the pressure
fluctuations in the bulk due to that kick. Neglecting the second term amounts to violating momentum
conservation, resulting in an incorrect 'energy-momentum relation' for the capillary wave excitations.
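
To give a feeling for how weak the logarithmic infrared divergence is, the sketch below evaluates the static result ⟨ζ²⟩ = (kT/2πσ) ln(q_max/q_min) with q_max = 2π/a and q_min = 2π/L. The function name and the numerical inputs (water-like surface tension, a molecular size of 3 Å, a 1 cm box) are assumptions made only for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def static_interface_width_sq(T, sigma, L, a):
    """Equilibrium <zeta^2> = (kT / (2 pi sigma)) ln(q_max / q_min).

    q_max = 2 pi / a (molecular cutoff) and q_min = 2 pi / L (system size),
    so the ratio q_max / q_min reduces to L / a.
    """
    q_max = 2.0 * math.pi / a
    q_min = 2.0 * math.pi / L
    return K_B * T / (2.0 * math.pi * sigma) * math.log(q_max / q_min)

if __name__ == "__main__":
    # Assumed illustrative inputs: T = 300 K, sigma = 0.072 N/m, a = 3 Angstrom
    for L in (1e-6, 1e-2, 1.0):
        w = math.sqrt(static_interface_width_sq(300.0, 0.072, L, 3e-10))
        print(f"L = {L:g} m  ->  rms width = {w * 1e10:.2f} Angstrom")
```

With these inputs the root mean square width stays at the level of a few ångströms even for macroscopic L, in line with the statement that the effect is numerically small.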
A3.3.3 NON-EQUILIBRIUM TIME-EVOLVING SYSTEMS 

There are many examples in nature where a system is not in equilibrium and is evolving in time towards a 
thermodynamic equilibrium state. (There are also instances where non-equilibrium and time variation appear 
to be a persistent feature. These include chaos, oscillations and strange attractors. Such phenomena are not 
considered here.) 

A pervasive natural phenomenon is the growth of order from disorder which occurs in a variety of systems. 
As a result, an interdisciplinary area rich in problems involving the formation and evolution of spatial 
structures has developed, which combines non-equilibrium dynamics and nonlinear analysis. An important 
class of such problems deals with the kinetics of phase ordering and phase separation, which are 
characteristics of any first-order phase transition. Examples of such growth processes occur in many diverse 
systems, such as chemically reacting systems, biological structures, simple and binary fluids, crystals, 
polymer melts and metallic alloys. It is interesting that such a variety of systems, which display growth 
processes, have common characteristics. In the remainder of chapter A3. 3 we focus our attention on such 
common features of kinetics, and on the models which attempt to explain them. Substantial progress has 
occurred over the past few decades in our understanding of the kinetics of domain growth during first-order 
phase transitions. 

Consider an example of phase separation. It is typically initiated by a rapid change (quench) in a
thermodynamic variable (often temperature, and sometimes pressure) which places a disordered system in a
post-quench initial non-equilibrium state. The system then evolves towards an inhomogeneous ordered state
of coexisting phases, which is its final equilibrium state. Depending on the nature of the quench, the system
can be placed in a post-quench state which is thermodynamically unstable or metastable (see figure A3.3.2).
In the former case, the onset of separation is spontaneous, and the kinetics that follows is known as spinodal
decomposition. In the metastable case, nonlinear fluctuations are required to initiate the separation
process; the system is said to undergo phase separation through homogeneous nucleation if the system is pure
and through heterogeneous nucleation if the system has impurities or surfaces which help initiate nucleation. The
phase transformation kinetics of supercooled substances via homogeneous nucleation is a fundamental topic.
It is also important in science and technology: gases can be compressed well beyond their equilibrium
pressures without forming liquids, and liquids can be supercooled several decades below their freezing
temperature without crystallizing.
Figure A3.3.2 A schematic phase diagram for a typical binary mixture showing stable, unstable and 
metastable regions according to a van der Waals mean field description. The coexistence curve (outer curve) 
and the spinodal curve (inner curve) meet at the (upper) critical point. A critical quench corresponds to a 
sudden decrease in temperature along a constant order parameter (concentration) path passing through the 
critical point. Other constant order parameter paths ending within the coexistence curve are called off-critical 
quenches. 

In both cases the late stages of kinetics show power law domain growth, the nature of which does not depend
on the initial state; it depends on the nature of the fluctuating variable(s) which is (are) driving the phase
separation process. Such a fluctuating variable is called the order parameter; for a binary mixture, the order
parameter φ(r,t) is the relative concentration of one of the two species and its fluctuation around the mean
value is δc(r,t) = c(r,t) − c_0. In the disordered phase, the system's concentration is homogeneous and the order
parameter fluctuations are microscopic. In the ordered phase, the inhomogeneity created by two coexisting
phases leads to a macroscopic spatial variation in the order parameter field near the interfacial region. In a
magnetic system, the average magnetization characterises the para-ferromagnetic transition and is the order
parameter. Depending on the system and the nature of the phase transition, the order parameter may be scalar,
vector or complex, and may be conserved or non-conserved.

Here we shall consider two simple cases: one in which the order parameter is a non-conserved scalar variable 
and another in which it is a conserved scalar variable. The latter is exemplified by the binary mixture phase 
separation, and is treated here at much greater length. The former occurs in a variety of examples, including 
some order-disorder transitions and antiferromagnets. The example of the para-ferro transition is one in 
which the magnetization is a conserved quantity in the absence of an external magnetic field, but becomes 
non-conserved in its presence. 




For a one-component fluid, the vapour-liquid transition is characterized by density fluctuations; here the order
parameter, mass density ρ, is also conserved. The equilibrium structure factor S(k) of a one-component fluid is
discussed in section A2.2.5.2 and is the Fourier transform of the density-density correlation function. For
each of the examples above one can construct the analogous order parameter correlation function. Its spatial
Fourier transform (often also denoted by S(k)) is, in most instances, measurable through an appropriate elastic
scattering experiment. In a quench experiment which monitors the kinetics of a phase transition, the relevant
structure evolves in time. That is, the equal-time correlation function of the order parameter fluctuations
⟨δφ(r,t)δφ(0,t)⟩, which would be time independent in equilibrium, acquires time dependence associated with
the growth of order in the non-equilibrium system. Its spatial Fourier transform, S(k,t), is called the time-dependent
structure factor and is experimentally measured.

The evolution of the system following the quench contains different stages. The early stage involves the 
emergence of macroscopic domains from the initial post-quench state, and is characterized by the formation 
of interfaces (domain walls) separating regions of space where the system approaches one of its final 
coexisting states (domains). Late stages are dominated by the motion of these interfaces as the system acts to 
minimize its surface free energy. During this stage the mean size of the domains grows with time while the 
total amount of interface decreases. Substantial progress in the understanding of late stage domain growth 
kinetics has been inspired by the discovery of dynamical scaling, which arises when a single length dominates 
the time evolution. Then various measures of the morphology depend on time only through this length (an 
instantaneous snapshot of the order parameter's space dependence is referred to as the system's morphology 
at that time). The evolution of the system then acquires self-similarity in the sense that the spatial patterns 
formed by the domains at two different times are statistically identical apart from a global change of the 
length scale. 


The time-dependent structure factor S(k,t), which is proportional to the intensity I(k,t) measured in an elastic 
scattering experiment, is a measure of the strength of the spatial correlations in the ordering system with 
wavenumber k at time t. It exhibits a peak whose position is inversely proportional to the average domain size. 
As the system phase separates (orders), the peak moves towards increasingly smaller wavenumbers (see figure
A3.3.3).
Figure A3.3.3 Time-dependent structure factor as measured through light scattering experiments from a phase
separating mixture of polystyrene (PS) (M = 1.5 × 10^5) and poly(vinylmethylether) (PVME) (M = 4.6 x 10 )
following a fast quench from a homogeneous state to T = 101 °C located in the two-phase region. The time in
seconds following the quench is indicated for each structure factor curve. Taken from [11].

A signature of the dynamical scaling is the collapse of the experimental data to a scaled form,
for a d-dimensional system:

S(k,t) = [R(t)]^d\, S_0(kR(t)) \qquad (A3.3.42)

where S_0 is a time-independent function and R(t) is a characteristic length (such as the average domain size)
(see figure A3.3.4). To the extent that other lengths in the system, such as the interfacial width, play important
roles in the kinetics, the dynamical scaling may be valid only asymptotically at very late times.
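
The data collapse implied by the scaling form can be checked numerically by plotting S(k,t)/R(t)^d against kR(t). The sketch below does this for synthetic data; the master curve x² exp(−x²) and the growth law used in the demonstration are purely hypothetical stand-ins chosen for illustration, not results from the text.

```python
import numpy as np

def rescale_structure_factor(k, S, R, d=3):
    """Map (k, S(k,t)) to scaling variables (k R(t), S(k,t) / R(t)^d)."""
    k = np.asarray(k, dtype=float)
    S = np.asarray(S, dtype=float)
    return k * R, S / R**d

def hypothetical_master_curve(x):
    """Illustrative scaling function S_0(x); an assumption, not a physical result."""
    return x**2 * np.exp(-x**2)

def synthetic_structure_factor(k, R, d=3):
    """Synthetic data obeying S(k,t) = R^d S_0(k R) exactly."""
    return R**d * hypothetical_master_curve(np.asarray(k, dtype=float) * R)

if __name__ == "__main__":
    k = np.linspace(0.05, 4.0, 200)
    for t in (10.0, 100.0, 1000.0):
        R = t ** (1.0 / 3.0)            # assumed power-law growth, for the demo only
        x, y = rescale_structure_factor(k, synthetic_structure_factor(k, R), R)
        # every time gives points lying on the same curve y = S_0(x)
        print(t, float(np.max(np.abs(y - hypothetical_master_curve(x)))))
```

With experimental data, R(t) would typically be taken from the inverse peak position or from a moment of S(k,t), and the quality of the collapse then indicates how well a single length dominates the evolution.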
Figure A3.3.4 Time-dependent structure factor as measured through light scattering experiments from a phase
separating mixture of 2,6-lutidine and water, following a fast quench from a homogeneous state through the
critical point to a temperature 0.6 mK below the critical temperature. The time in seconds following the
quench is indicated for each structure factor curve. In the figure on the right-hand side the data collapse
indicates dynamic scaling. Taken from [12].

Another important characteristic of the late stages of phase separation kinetics, for asymmetric mixtures, is the
cluster size distribution function of the minority phase clusters: n(R,τ)dR is the number of clusters of minority
phase per unit volume with radii between R and R + dR. Its zeroth moment gives the mean number of clusters
at time τ and the first moment is proportional to the mean cluster size.


A3.3.3.1 LANGEVIN MODELS FOR PHASE TRANSITION KINETICS 


A considerable amount of research effort has been devoted, especially over the last three decades, to various
issues in domain growth and dynamical scaling. See the reviews [13, 14, 15, 16, 17].

Although in principle the microscopic Hamiltonian contains the information necessary to describe the phase
separation kinetics, in practice the large number of degrees of freedom in the system makes it necessary to
construct a reduced description. Generally, a subset of slowly varying macrovariables, such as the
hydrodynamic modes, is a useful starting point. The equation of motion of the macrovariables can, in
principle, be derived from the microscopic
Hamiltonian, but in practice one often begins with a phenomenological set of equations. The set of
macrovariables is chosen to include the order parameter and all other slow variables to which it couples.
Such slow variables are typically obtained from the consideration of the conservation laws and broken
symmetries of the system. The remaining degrees of freedom are assumed to vary on a much faster timescale
and enter the phenomenological description as random thermal noise. The resulting coupled nonlinear
stochastic differential equations for such a chosen 'relevant' set of macrovariables are collectively referred to
as the Langevin field theory description.

In two of the simplest Langevin models, the order parameter φ is the only relevant macrovariable; in model A
it is non-conserved and in model B it is conserved. (The labels A, B, etc have a historical origin in the
Langevin models of critical dynamics; the scheme is often referred to as the Hohenberg-Halperin
classification scheme.) For model A, the Langevin description assumes that, on average, the time rate of
change of the order parameter is proportional to (the negative of) the thermodynamic force that drives the
phase transition. For this single variable case, the thermodynamic force is canonically conjugate to the order
parameter: i.e. in a thermodynamic description, if φ is a state variable, then its canonically conjugate force is
∂f/∂φ (see figure A3.3.5), where f is the free energy.
Figure A3.3.5 Thermodynamic force as a function of the order parameter. Three equilibrium isotherms (full
curves) are shown according to a mean field description. For T < T_c, the isotherm has a van der Waals loop,
from which the use of the Maxwell equal area construction leads to the horizontal dashed line for the
equilibrium isotherm. The associated coexistence curve (dotted curve) and spinodal curve (dashed line) are also
shown. The spinodal curve is the locus of extrema of the various van der Waals loops for T < T_c. The states
within the spinodal curve are thermodynamically unstable, and those between the spinodal and coexistence
curves are metastable according to the mean field description.

In a field theory description, the thermodynamic free energy f is generalized to a free energy functional
F[φ(r,t)], leading to the thermodynamic force as the analogous functional derivative. The Langevin equation
for model A is then

\frac{\partial\phi}{\partial t} = -M\frac{\delta F}{\delta\phi} + \eta(\mathbf{r},t) \qquad (A3.3.43)

where the proportionality coefficient M is the mobility coefficient, which is related to the random thermal
noise η through the fluctuation dissipation theorem:

\langle\eta(\mathbf{r},t)\,\eta(\mathbf{r}',t')\rangle = 2kTM\,\delta(\mathbf{r}-\mathbf{r}')\,\delta(t-t'). \qquad (A3.3.44)

The phenomenology of model B, where φ is conserved, can also be outlined simply. Since φ is conserved, it
obeys a conservation law (continuity equation):

\frac{\partial\phi}{\partial t} = -\nabla\cdot\mathbf{j}(\mathbf{r},t) \qquad (A3.3.45)

where (provided j itself is not a conserved variable) one can write the transport law

\mathbf{j}(\mathbf{r},t) = -M\nabla\mu(\mathbf{r},t) + \mathbf{j}_T \qquad (A3.3.46)

with j_T being the order parameter current arising from thermal noise, and μ(r,t), which is the local chemical
potential, being synonymous with the thermodynamic force discussed above. It is related to the free energy
functional as

\mu(\mathbf{r},t) = \frac{\delta F}{\delta\phi}. \qquad (A3.3.47)

Putting it all together, one has the Langevin equation for model B:

\frac{\partial\phi}{\partial t} = M\nabla^2\left(\frac{\delta F}{\delta\phi}\right) + \zeta \qquad (A3.3.48)

where ζ = −∇·j_T is the random thermal noise which satisfies the fluctuation dissipation theorem:

\langle\zeta(\mathbf{r},t)\,\zeta(\mathbf{r}',t')\rangle = -2kTM\,\nabla^2\delta(\mathbf{r}-\mathbf{r}')\,\delta(t-t'). \qquad (A3.3.49)

As is evident, the free energy functional F plays a crucial role in the model A/B kinetics. It contains a number
of terms. One of these is the local free energy term f(φ), which can be thought of as a straightforward
generalization of the thermodynamic free energy function in which the global thermodynamic variable φ is
replaced by its local field value φ(r,t). Many universal features of kinetics are insensitive to the detailed shape
of f(φ). Following Landau, one often uses for it a form obtained by expanding around the value of φ at the
critical point, φ_c. If the mean value of φ is φ̄, then

\delta\phi = (\phi - \bar\phi) = (\phi - \phi_c) + (\phi_c - \bar\phi) = \phi^* + \phi_0 \qquad (A3.3.50)

with φ* = (φ − φ_c) and φ_0 = (φ_c − φ̄). The Landau expansion is written in terms of φ* as

f(\phi^*) = \tfrac{1}{2} a_0 (T - T_c)\,\phi^{*2} + \tfrac{1}{4} u\,\phi^{*4} - H\phi^* \qquad (A3.3.51)


where an external field H is assumed to couple linearly to φ. In the absence of H, f(φ*) has a single minimum
for temperatures above T_c at φ* = 0, and two minima below
T_c at φ* = ±[a_0(T_c − T)/u]^{1/2} ≡ ±φ_eq, corresponding to the two coexisting ordered phases in equilibrium
(see figure A3.3.6).



Figure A3.3.6 Free energy as a function of the order parameter φ* for the homogeneous single phase (a) and
for the two-phase regions (b), H = 0.
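
The single-well and double-well shapes described above can be checked with a few lines of code. The sketch assumes the quartic Landau form f(φ*) = ½a_0(T − T_c)φ*² + ¼uφ*⁴ − Hφ*; the function names and the parameter values a_0 = u = T_c = 1 are arbitrary illustrative choices.

```python
import numpy as np

def landau_free_energy(phi, T, Tc, a0, u, H=0.0):
    """Assumed Landau form: 0.5*a0*(T - Tc)*phi^2 + 0.25*u*phi^4 - H*phi."""
    return 0.5 * a0 * (T - Tc) * phi**2 + 0.25 * u * phi**4 - H * phi

def landau_minima(T, Tc, a0, u):
    """Analytic minima for H = 0: phi* = 0 above Tc, +/- sqrt(a0 (Tc - T)/u) below."""
    if T >= Tc:
        return (0.0,)
    phi_eq = float(np.sqrt(a0 * (Tc - T) / u))
    return (-phi_eq, phi_eq)

if __name__ == "__main__":
    phi = np.linspace(-3.0, 3.0, 60001)
    for T in (1.5, 0.5):                       # one temperature above Tc, one below
        f = landau_free_energy(phi, T, Tc=1.0, a0=1.0, u=1.0)
        print(T, phi[np.argmin(f)], landau_minima(T, 1.0, 1.0, 1.0))
```

A brute-force search over a grid of φ* values reproduces the analytic minima, which is a quick consistency check on the coefficients used in the expansion.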

The free energy functional also contains a square gradient term which is the cost of the inhomogeneity of the 
order parameter at each point in the system. Such a surface energy cost term occurs in many different 
contexts; it was made explicit for binary mixtures by Cahn and Hilliard [18] , for superconductors by 
Ginzburg, and is now commonplace. It is often referred to as the Ginzburg term. This Landau-Ginzburg free 
energy functional is 


F[\phi(\mathbf{r},t)] = \int d^d r \left[ f(\phi) + \tfrac{1}{2}\kappa(\nabla\phi)^2 \right] \qquad (A3.3.52)

where the coefficient κ of the square gradient term is related to the interfacial (domain wall) tension σ through
the mean field expression


\sigma = \kappa \int_{-\infty}^{\infty} dz \left(\frac{d\phi}{dz}\right)^2 \qquad (A3.3.53)


where z is the direction of the local normal to the interface, and §(z) is the equilibrium order parameter profile. 
There is a class of systems where, in addition to the above two terms, there is a non-local free energy
functional term in F. This can arise due to elastic fields, or due to other long-range repulsive interactions

(LRRI) originating from the coherent action of molecular dipoles (electric or magnetic). The square gradient 
(Ginzburg) term is a short-range attractive term which then competes with the LRRI, resulting in a rich 
variety of structures often termed supercrystals. For such systems, which include Langmuir monolayers and 
uniaxial magnetic garnet films, the kinetics is much richer. It will not be considered here. See [19, 20] for 
reviews. 

With the form of the free energy functional prescribed in equation (A3.3.52), equation (A3.3.43) and equation
(A3.3.48) respectively define the problem of kinetics in models A and B. The Langevin equation for model A
is also referred to as the time-dependent Ginzburg-Landau equation (if the noise term is ignored); the model 
B equation is often referred to as the Cahn-Hilliard-Cook equation, and as the Cahn-Hilliard equation in the 
absence of the noise term. 

For deep quenches, where the post-quench T is far below T_c, the equations are conveniently written in terms
of scaled (dimensionless) variables: ψ(x,τ) = φ*/φ_eq and x = r/ξ, where φ_eq = [a_0(T_c − T)/u]^{1/2} is the
magnitude of the equilibrium order parameter, the correlation length is ξ = (κ/(a_0|T_c − T|))^{1/2} and the
dimensionless time τ is defined as [2Ma_0|T_c − T|]t for model A and as
[2Ma_0²(T_c − T)²/κ]t for model B. In terms of these variables the model B Langevin equation can be written
as

\frac{\partial\psi}{\partial\tau} = -\tfrac{1}{2}\nabla^2\left(\nabla^2\psi + \psi - \psi^3\right) + \sqrt{\epsilon}\,\mu(\mathbf{x},\tau) \qquad (A3.3.54)

where 


f 


= -£L-i 1 \ A3.3.55 


is the strength of the random thermal noise μ which satisfies

\langle\mu(\mathbf{x},\tau)\,\mu(\mathbf{x}',\tau')\rangle = -\nabla^2\delta(\mathbf{x}-\mathbf{x}')\,\delta(\tau-\tau'). \qquad (A3.3.56)

Similarly, the dimensionless model A Langevin equation can also be obtained. The result is recovered by
replacing the outermost ∇² by (−1) in equation (A3.3.54) and by (−1) in equation (A3.3.56).

Using renormalization group techniques, Bray [16] has shown that the thermal noise is
irrelevant in the deep-quench kinetics. This is because the free energy has two stable fixed points to which the
system can flow: for T > T_c it is the infinite temperature fixed point, and for T < T_c it is the zero-temperature
strong coupling fixed point. Since at T = 0 the strength of the noise ε vanishes, the thermal noise term μ can
be neglected in model B phase separation kinetics, during which T < T_c. The same conclusion was also
obtained earlier [21] from a numerical simulation of equation (A3.3.54) and equation (A3.3.56). In what
follows, we ignore the thermal noise term. One must note, however, that there are many examples of kinetics
where the thermal noise can play an important role. See for example a recent monograph [22].
For critical quench experiments there is a symmetry φ_0 = 0 and from equation (A3.3.50) δφ = φ*, leading to a
symmetric local free energy (figure A3.3.6) and a scaled order parameter whose average is zero, δψ = ψ. For
off-critical quenches this symmetry is lost. One has δφ = φ* + φ_0, which scales to δψ = ψ + ψ_0 with
ψ_0 = φ_0/φ_eq, where φ_eq = [a_0(T_c − T)/u]^{1/2}. ψ_0 is a measure of how far off-critical the system is. For ψ_0 = ±1 the system will be
quenched to the coexistence curve and ψ_0 = 0 corresponds to a quench through the critical point. In general,
one has to interpret the dimensionless order parameter ψ in (A3.3.54) as a mean value plus the fluctuations. If
one replaces ψ in (A3.3.54) by ψ + ψ_0, the mean value of the order parameter ψ_0 becomes explicit and the
average of such a replaced ψ becomes zero, so that now ψ is the order parameter fluctuation. The
conservation law dictates that the average value of the order parameter remains equal to ψ_0 throughout the
time evolution. Since the final equilibrium phase corresponds to ψ_± = ±1, non-zero ψ_0 reflects an asymmetry
in the spatial extent of these two phases. The degree of asymmetry is given by the lever rule. A substitution of
ψ by ψ + ψ_0 in (A3.3.54) yields the following nonlinear partial differential equation (we ignore the noise term):

\frac{\partial\psi}{\partial\tau} = -\tfrac{1}{2}\nabla^2\left(\nabla^2\psi + q_c^2\psi - 3\psi_0\psi^2 - \psi^3\right) \qquad (A3.3.57)

where q_c² = (1 − 3ψ_0²). For a critical quench, when ψ_0 = 0, the bilinear term vanishes and q_c² becomes one, so
that the equation reduces to the symmetric equation (A3.3.54). In terms of the scaled variables, it can be
shown that the equation of the classical spinodal, shown in figure A3.3.2 and figure A3.3.5, is q_c² = 0 or
|ψ_0| = 1/√3. For states within the classical mean field spinodal, q_c² > 0.

Equation (A3.3.57) must be supplied with appropriate initial conditions describing the system prior to the
onset of phase separation. The initial post-quench state is characterized by the order parameter fluctuations
characteristic of the pre-quench initial temperature T_i. The role of these fluctuations has been described in
detail in [23]. However, again using renormalization group arguments, any initial short-range correlations
should be irrelevant, and one can take the initial conditions to represent a completely disordered state at T =
∞. For example, one can choose the white noise form ⟨ψ(x,0)ψ(x',0)⟩ = ε_0 δ(x − x'), where ⟨···⟩ represents an
average over an ensemble of initial conditions, and ε_0 controls the size of the initial fluctuations in ψ; ε_0 ≪ 1.

The fundamental problem of understanding phase separation kinetics is then posed as finding the nature of
late-time solutions of deterministic equations such as (A3.3.57) subject to random initial conditions.

A linear stability analysis of (A3.3.57) can provide some insight into the structure of solutions to model B.
The linear approximation to (A3.3.57) can be easily solved by taking a spatial Fourier transform. The result
for the $k$th Fourier mode is

$$\psi_k(\tau) = e^{\gamma_k\tau}\,\psi_k(0) \tag{A3.3.58}$$

where the exponential growth exponent $\gamma_k$ is given by

$$\gamma_k = \tfrac{1}{2}k^2\left(q_c^2 - k^2\right). \tag{A3.3.59}$$

For $0 < k < q_c$, $\gamma_k$ is positive, and the corresponding Fourier mode fluctuations grow in time, i.e. these are the
linearly unstable modes of the system. The maximally unstable mode occurs at $k_m = q_c/\sqrt{2}$ and overwhelms
all other growing modes due to exponential growth in the linear approximation. The structure factor can also
be computed analytically in this linear approximation, and has a time invariant maximum at $k = k_m$. In binary
polymer mixtures (polymer melts), the early time experimental observations can be fitted to a structure factor
form obtained from a linear theory, on account of the slow dynamics of such mixtures.
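
The linear growth exponent is simple enough to tabulate directly. The following is a minimal sketch, assuming the form $\gamma_k = \tfrac{1}{2}k^2(q_c^2 - k^2)$ with $q_c^2 = 1 - 3\psi_0^2$ as in (A3.3.59); the function names are illustrative and the grid search is chosen for transparency rather than efficiency.

```python
import math

def growth_rate(k, psi0):
    """Linear growth exponent gamma_k = 0.5 * k^2 * (q_c^2 - k^2), assumed form of (A3.3.59)."""
    qc2 = 1.0 - 3.0 * psi0 ** 2   # q_c^2; negative outside the classical spinodal
    return 0.5 * k ** 2 * (qc2 - k ** 2)

def most_unstable_k(psi0, n=20000):
    """Grid search for the maximum of gamma_k on 0 < k < q_c (illustrative only)."""
    qc2 = 1.0 - 3.0 * psi0 ** 2
    if qc2 <= 0.0:
        return None               # no linearly unstable modes: metastable region
    qc = math.sqrt(qc2)
    ks = [qc * (i + 0.5) / n for i in range(n)]
    return max(ks, key=lambda k: growth_rate(k, psi0))

k_numeric = most_unstable_k(0.0)          # critical quench, q_c = 1
k_expected = 1.0 / math.sqrt(2.0)         # k_m = q_c / sqrt(2)
```

For a critical quench the numerical maximum should agree with $k_m = q_c/\sqrt{2}$ to within the grid spacing, while for $|\psi_0| > 1/\sqrt{3}$ the search returns `None` because no mode grows.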

The limitations and range of validity of the linear theory have been discussed in [17, 23, 24]. The linear
approximation to equation (A3.3.54) and equation (A3.3.57) assumes that the nonlinear terms are small
compared to the linear terms. As $\langle|\psi_k(\tau)|^2\rangle$ increases with time, at some crossover time $t_{cr}$ the linear
approximation becomes invalid. This occurs roughly when $\langle\psi^2\rangle$ becomes comparable to
$(\psi_{sp} - \psi_0)^2$, where $\psi_{sp}$ is the value of the order parameter on the classical spinodal. One can obtain
$t_{cr}$ using equation (A3.3.58), in which $k$ can be replaced by $k_m$, since the maximally unstable mode grows
exponentially faster than other modes. Then the dimensionless crossover time $\tau_{cr}$, the scaled counterpart of
$t_{cr}$, is obtained from

$$(\psi_{sp} - \psi_0)^2 = \langle|\psi_{k_m}(\tau_{cr})|^2\rangle = e^{2\gamma_{k_m}\tau_{cr}}\,\langle|\psi_{k_m}(0)|^2\rangle$$

where the initial fluctuation spectrum is to be determined from the Ornstein-Zernike theory, at the
pre-quench temperature $T_i$:

{\nlm 2 )= ** ,. . 

(*£ + Qc > 

Here 8 Q is given by equation (A3. 3. 5 5) evaluated at T , and can be written as e fl = kT^HK~ 2 ^t~^'- Using the 
values Jt; r = q*/2 t ^ = 1/3, and = Kf[a t AT„ - f r )], one obtains 

2y^x„ =-ln^> + ln(^) + lnf ^- I. 

As is evident from the form of the square gradient term in the free energy functional, equation (A3.3.52), $\kappa$ is
like the square of the effective range of interaction. Thus, the dimensionless crossover time depends only
weakly on the range of interaction as $\ln(\kappa)$. For polymer chains of length $N$, $\kappa \sim N$. Thus for practical
purposes, the dimensionless crossover time $\tau_{cr}$ is not very different for polymeric systems as compared to the
small molecule case. On the other hand, the scaling of $t_{cr}$ to $\tau_{cr}$ is through a characteristic time which itself
increases linearly with $\kappa$, so that $t_{cr}$ behaves like $\kappa\ln(\kappa) \sim N\ln(N)$ for polymeric systems. It is clear
that the longer time for the validity of linear theory for polymer systems is essentially a longer
characteristic time phenomenon.

For initial post-quench states in the metastable region between the classical spinodal and coexistence curves,
$q_c^2$ is negative and so is $\gamma_k$ for all values of $k$. Linear stability analysis is not adequate for the metastable
region, since it predicts that all modes are stable. Nonlinear terms are important and cannot be ignored in the
kinetics leading to either nucleation or spinodal decomposition. The transition from spinodal decomposition to
nucleation is also not well defined because nonlinear instabilities play an increasingly important role as
the 'classical spinodal' is approached from within.

A3.3.3.2 UNSTABLE STATES AND KINETICS OF SPINODAL DECOMPOSITION 

Equation (A3.3.57) is an interesting nonlinear partial differential equation, but it is mathematically intractable.
It contains quadratic and cubic nonlinear terms. The cubic term treats both phases in a symmetric manner; for
a symmetric binary mixture only this term survives and leads to a labyrinthine morphology in which both
phases have an equal share of the system volume. This term is the source of spinodal decomposition. For the
symmetric case the partial differential equation is parameter free and there is no convenient small expansion
parameter, especially during early times (the linear approximation loses its validity around $\tau \sim 10$). At late
times, the ratio of the interfacial width to the time-dependent domain size, $\xi/R(\tau)$, was used as a small
parameter by Pego [25] in a matched asymptotic expansion method. It leads to useful connections of this
nonlinear problem to the Mullins-Sekerka instability for the slowest timescale and to the classic Stefan
problem on a faster timescale. The quadratic term treats the two phases in an asymmetric manner and is the
source of nucleation-like morphology. As the off-criticality $\psi_0$ increases, the quadratic term gradually
assumes a greater role compared to the cubic nonlinear term. Nucleation-like features in the kinetics occur
even for a 49-51 mixture in principle, and are evident at long enough times, since the minority phase will
form clusters within the majority background phase for any asymmetric mixture.

While approximate analytical methods have played a role in advancing our understanding of the model B
kinetics, complementary information from laboratory experiments and numerical simulations has also played
an important role. Figure A3.3.3 and figure A3.3.4 show the time-dependent structure factors from laboratory
experiments on a binary polymer melt and a small molecule binary mixture, respectively. Compared to the
conceptual model B Langevin equation discussed above, real binary mixtures have additional physical effects:
for a binary polymer melt, hydrodynamic interactions play a role at late times [17]; for a small molecule
binary fluid mixture, hydrodynamic flow effects become important at late times [26]; and for a binary alloy,
the elastic effects play a subsidiary, but important, role [37]. In each of these systems, however, there is a
broad range of times when model B kinetics are applicable. Comparing the approximate theory of model B
kinetics with the experimental results from such systems may not be very revealing, since the differences may
be due to effects not contained in model B. Comparing an approximate theory to computer simulation results
provides a good test for the theory, provided a good estimate of the numerical errors in the simulation can be
made.

In the literature there are numerical simulations of equation (A3.3.57) for both two- and three-dimensional
systems [21, 23, 28, 29, 30 and 31]. For a two-dimensional system, morphology snapshots of the order
parameter field $\psi$ are shown in figure A3.3.7 for late times, as obtained from the numerical simulations of
(A3.3.57). The light regions correspond to positive values of $\psi$ and the dark regions to negative values. For
the critical quench case (figure A3.3.7(a) and (b)), the (statistical) symmetry of $\psi$ between the two phases is
apparent. The topological difference between the critical and off-critical quench evolutions at late times is
also clear: bicontinuous for critical quench and isolated closed cluster topology for asymmetric off-critical
quench. Domain coarsening is also evident from these snapshots for each of the two topologies. For the
off-critical quench, from such snapshots one can obtain the time evolution of the cluster size distribution.

Figure A3.3.7 The order parameter field morphology, for $\psi_0 = 0.0$ at (a) $\tau = 500$ and (b) $\tau = 5000$; and for
$\psi_0 = 0.4$ at (c) $\tau = 500$ and (d) $\tau = 5000$. The dark regions have $\psi < 0$. From [28].

From a snapshot at time $\tau$, the spatial correlation function $G(x,\tau) = \langle\psi(\mathbf{x},\tau)\psi(0,\tau)\rangle$ can be
computed, where $\langle\cdots\rangle$ includes the angular average assuming the system to be spatially isotropic. Repeating
this for various snapshots yields the full space-and-time-dependent correlation function $G(x,\tau)$. Its spatial
Fourier transform is essentially the time-dependent structure factor $S(k,\tau)$ measured in light scattering
experiments (see figure A3.3.3 and figure A3.3.4). There are a number of ways to obtain the time-dependent
domain size, $R(\tau)$: (i) from the first zero of $G(x,\tau)$, (ii) from the inverse of the first moment of $S(k,\tau)$,
(iii) from the inverse of the value $k_m$ where $S(k,\tau)$ is a maximum. The result that is now firmly established
from experiments and simulations is that

$$R(\tau) \sim \tau^{1/3} \tag{A3.3.60}$$

independent of the system dimensionality $d$. In the next section (section A3.3.4) we describe the classic theory
of Lifshitz, Slyozov and Wagner, which is one of the cornerstones for understanding the $\tau^{1/3}$ growth law and
asymptotic cluster size distribution for quenches to the coexistence curve.
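
In practice the growth exponent is read off from a log-log plot of domain size against time. The sketch below shows one way this could be done with an ordinary least-squares slope; the function name is illustrative, and the data are synthetic values constructed to obey $R = 2\tau^{1/3}$ exactly, not simulation or experimental output.

```python
import math

def growth_exponent(taus, sizes):
    """Least-squares slope of log R versus log tau, an estimate of n in R ~ tau^n."""
    xs = [math.log(t) for t in taus]
    ys = [math.log(r) for r in sizes]
    n = len(xs)
    xm = sum(xs) / n
    ym = sum(ys) / n
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    sxx = sum((x - xm) ** 2 for x in xs)
    return sxy / sxx

# Synthetic, noise-free data built to follow R = 2 * tau^(1/3) (an assumption for illustration).
taus = [500.0, 1000.0, 2000.0, 5000.0]
sizes = [2.0 * t ** (1.0 / 3.0) for t in taus]
exponent = growth_exponent(taus, sizes)
```

With real data the fit should be restricted to the late-time window, since the early linear regime and crossover effects would bias the slope.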

As in the experiments, the simulation results also show dynamic scaling at late times. The scaling function
$S_0(kR(\tau))$ at late times has the large $k$ behaviour $S_0(y) \sim y^{-(d+1)}$ known as Porod's law [13, 16]. This result
is understood to be the consequence of the sharp interfaces at late times. The small $k$ behaviour, $S_0(y) \sim y^4$,
was independently predicted in [32, 33], and was put on a firm basis in [34].

Interfaces play a central role in phase transition kinetics of both models A and B. Figure A3.3.8 shows the
interfacial structure corresponding to figure A3.3.7(b). One can see the relationship between the interfacial
width and the domain size for a late-stage configuration. The upper part of the figure demarcates the interfacial
regions of the system where $0.75\psi_- < \psi < 0.75\psi_+$. The lower plot gives a cross sectional variation of $\psi$ as the
system is traversed. The steep gradients in $\psi$ in the lower plot clearly indicate the sharpness of interfaces at
late times.



Figure A3.3.8 Interface structure for $\tau = 5000$, $\psi_0 = 0$. In (a) the shaded regions correspond to interfaces
separating the domains. In (b) a cross sectional view of the order parameter $\psi$ is given. The location of the
cross section is denoted by the horizontal line in (a). From [35].

In figure A3.3.9 the early-time results of the interface formation are shown for $\psi_0 = 0.48$. The classical
spinodal corresponds to $\psi_0 \approx 0.58$. Interface motion can be simply monitored by defining the domain
boundary as the location where $\psi = 0$. Surface tension smooths the domain boundaries as time increases.
Large interconnected clusters begin to break apart into small circular droplets around $\tau = 160$. This is because
the quadratic nonlinearity eventually outpaces the cubic one when off-criticality is large, as is the case here.

Figure A3.3.9 Time dependence of the domain boundary morphology for $\psi_0 = 0.48$. Here the domain
boundary is the location where $\psi = 0$. The evolution is shown for early-time $\tau$ values of (a) 50, (b) 100, (c)
150, (d) 200, (e) 250 and (f) 300. From [29].

Some features of late-stage interface dynamics are understood for model B and also for model A. We now
proceed to discuss essential aspects of this interface dynamics. Consider the Langevin equations without
noise. Equation (A3.3.57) can be written in a more general form:

$$\frac{\partial\psi}{\partial\tau} = -\nabla^2\left(\xi^2\nabla^2\psi - f'(\psi)\right) \tag{A3.3.61}$$


where we have absorbed the factor $\tfrac{1}{2}$ in the time units of $\tau$, introduced $\xi$, even though it is one, in order
to keep track of characteristic lengths, and denoted the thermodynamic force (chemical potential) by $f'$. At late
times the domain size $R(\tau)$ is much bigger than the interfacial width $\xi$. Locally, therefore, the interface appears
to be planar. Let its normal be in direction $u$, and let $u = u_0$ at the point within the interface where $\psi = 0$. Then
$\psi = \psi(u, \mathbf{s}, \tau)$ where $\mathbf{s}$ refers to the $(d-1)$ coordinates parallel to the interface at point $\mathbf{x}$. In essence,
we have used the interface specific coordinates: $\mathbf{x} = \mathbf{R}(\mathbf{s}) + u\,\hat{\mathbf{n}}(\mathbf{s})$, where $\hat{\mathbf{n}}(\mathbf{s})$ is a unit normal
at the interface, pointing from the $\psi_-$ phase into the $\psi_+$ phase.

The stationary solution of (A3.3.61), when $\xi/R(\tau)$ is very small, satisfies

$$\xi^2\frac{d^2\psi}{du^2} = \frac{\partial f}{\partial\psi} \tag{A3.3.62}$$

which has a kink profile solution $\psi(u) = \pm\tanh\left((u - u_0)/(\sqrt{2}\,\xi)\right)$ for the double well free energy
$f(\psi) = \psi^4/4 - \psi^2/2$. By linearizing around such a kink solution, one obtains a linear eigenvalue problem.
Its lowest eigenmode is $\zeta_0(u) = d\psi(u)/du$ and the corresponding eigenvalue is zero. It is localized within the
interface and is called the Goldstone mode arising out of the broken translational symmetry at the interface.
The higher eigenmodes, which are constructed to be orthogonal to the Goldstone mode, are the capillary wave
fluctuation modes with the dispersion relation that is a generalised version [35] of that discussed in section
A3.3.2.4. The orthogonality to the Goldstone mode leads to a constraint which is used to show an important
relation between the local interface velocity $v(\mathbf{s})$ and local curvature $K(\mathbf{s})$. It is, to the lowest order in $\xi K$,

$$\sigma\xi^2 K(\mathbf{s}) = \int du \int d\mathbf{x}'\; G(\mathbf{x}, \mathbf{x}')\,\zeta_0(u)\,\zeta_0(u')\,v(\mathbf{s}') \tag{A3.3.63}$$

where $G(\mathbf{x}, \mathbf{x}')$ is the diffusion Green's function satisfying $\nabla^2 G(\mathbf{x}, \mathbf{x}') = \delta(\mathbf{x} - \mathbf{x}')$. The mean
field surface tension $\sigma$, defined in equation (A3.3.53), is the driving force for the interface dynamics. The
diffusion Green's function couples the interface motion, at two points $(\mathbf{x}, \mathbf{x}')$ on the interface, inextricably
to the bulk dynamics.
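
The kink profile quoted above can be checked numerically against the stationary condition $\xi^2\,d^2\psi/du^2 = f'(\psi)$. The following is a minimal sketch, assuming $f'(\psi) = \psi^3 - \psi$ (from the double well $f = \psi^4/4 - \psi^2/2$) and using a central finite difference for the second derivative; the function names and the value $\xi = 1.5$ are illustrative choices.

```python
import math

def kink(u, xi=1.0, u0=0.0):
    """Kink profile psi(u) = tanh((u - u0) / (sqrt(2) * xi))."""
    return math.tanh((u - u0) / (math.sqrt(2.0) * xi))

def residual(u, xi=1.0, h=1e-3):
    """xi^2 * psi'' - f'(psi), with psi'' from a central difference and f'(psi) = psi^3 - psi."""
    psi = kink(u, xi)
    psi_uu = (kink(u + h, xi) - 2.0 * psi + kink(u - h, xi)) / h ** 2
    return xi ** 2 * psi_uu - (psi ** 3 - psi)

# Largest residual on a grid spanning the interface, -5 <= u <= 5.
max_residual = max(abs(residual(0.1 * i - 5.0, xi=1.5)) for i in range(101))
```

The residual is limited only by the finite-difference truncation error, which indicates that the factor $\sqrt{2}\,\xi$ in the argument of the hyperbolic tangent is the one compatible with this free energy.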

For a conserved order parameter, the interface dynamics and late-stage domain growth involve the
evaporation-diffusion-condensation mechanism whereby large droplets (small curvature) grow at the expense
of small droplets (large curvature). This is also the basis for the Lifshitz-Slyozov analysis which is discussed
in section A3.3.4.

If the order parameter is not conserved, the results are much simpler and were discussed by Lifshitz and by
Allen and Cahn [36]. For model A, equation (A3.3.61) is to be replaced by the time-dependent
Ginzburg-Landau equation which is obtained by removing the overall factor of $(-\nabla^2)$ from the right-hand side.
This has the consequence that, in the constraint, equation (A3.3.63), the diffusion Green's function is replaced by
$-\delta(\mathbf{x} - \mathbf{x}')$ and the integrals can be performed with the right-hand side reducing to $-\sigma v(\mathbf{s})$. The surface
tension then cancels from both sides and one gets the Allen-Cahn result:

$$-\xi^2 K(\mathbf{s}) = v(\mathbf{s}). \tag{A3.3.64}$$

For model A, the interfaces decouple from the bulk dynamics and their motion is driven entirely by the local
curvature, and the surface tension plays only a background, but still an important, role. From this model A
interface dynamics result, one can also simply deduce that the domains grow as $R(\tau) \sim \tau^{1/2}$: for a spherical
cluster of radius $R$ at some late time, $K \sim (d-1)/R$ and $v = dR/d\tau$, so that $R\,|dR/d\tau| \sim (d-1)$ and hence
$R^2 \sim (d-1)\tau$.

A3.3.4 LATE-STAGE GROWTH KINETICS AND OSTWALD RIPENING

Late stages of model B dynamics for asymmetric quenches may be described by the Lifshitz-Slyozov-Wagner
(LSW) theory of coarsening. When the scalar order parameter is conserved, the late-stage coarsening
is referred to as Ostwald ripening. The LSW analysis is valid for late-stage domain growth following either
spinodal decomposition or nucleation. A recent paper [37] has combined the steady-state homogeneous
nucleation theory described in the next section, A3.3.5, with the LSW analysis in a new model for the entire
process of phase separation. If the initial condition places the post-quench system just inside and quite near
the coexistence curve, the conservation law dictates that one (minority) phase will occupy a much smaller
'volume' fraction than the other (majority) phase in the final equilibrium state.

The dynamics is governed by interactions between different domains of the minority phase. At late times
these will have attained spherical (circular) shape for a three (two)-dimensional system. For model B systems,
the classic work of Lifshitz and Slyozov [38] and the independent work by Wagner [39] form the theoretical
cornerstone. The late-stage dynamics is mapped onto a diffusion equation with sources and sinks (i.e.
domains) whose boundaries are time dependent. The Lifshitz-Slyozov (LS) treatment of coarsening is based
on a mean field treatment of the diffusive interaction between the domains and on the assumption of an
infinitely sharp interface with well defined boundary conditions. The analysis predicts the onset of dynamical
scaling. As in section A3.3.3.1 we shall denote the extent of the off-criticality by $\psi_0 > 0$. The majority phase
equilibrates at $\psi_+ = +1$ and the minority phase at $\psi_- = -1$. At late times, the minority clusters have radius
$R(\tau)$ which is much larger than the interface width $\xi$. An important coupling exists between the interface and
the majority phase through the surface tension $\sigma$. This coupling is manifested through a Gibbs-Thomson
boundary condition, which is given later.

The LS analysis is based on the premise that the clusters of the minority phase compete for growth through an
evaporation-condensation mechanism, whereby larger clusters grow at the expense of smaller ones. (Material
of the minority phase evaporates from a smaller cluster, diffuses through the majority phase background
matrix and condenses on a larger cluster.) That is, the dominant growth mechanism is the transport of the
order parameter from interfaces of high curvature to regions of low curvature by diffusion through the
intervening bulk phases. The basic model B equations, (A3.3.48) and (A3.3.52), can be linearized around the
majority phase bulk equilibrium value of the order parameter, $\psi_+ = 1$ (which corresponds to the off-criticality
$\psi_0 = -1$), by using $\psi = 1 + \delta\psi$ and keeping only up to first-order terms in $\delta\psi$. The result in dimensionless
form is

$$\frac{\partial\,\delta\psi}{\partial\tau} = \left(f''(1) - \xi^2\nabla^2\right)\nabla^2\delta\psi \tag{A3.3.65}$$


where we have kept the interfacial width $\xi$ as a parameter to be thought of as one; we retain it in order to keep
track of the length scales in the problem. Since at late times the characteristic length scales are large compared
to $\xi$, the $\nabla^4$ term is negligible and $\delta\psi$ satisfies a diffusion equation,

$$\frac{\partial\,\delta\psi}{\partial\tau} = f''(1)\,\nabla^2\delta\psi. \tag{A3.3.66}$$

Due to the conservation law, the diffusion field $\delta\psi$ relaxes in a time much shorter than the time taken by
significant interface motion. If the domain size is $R(\tau)$, the diffusion field relaxes over a time scale $\tau_D \sim R^2$.
However a typical interface velocity is shown below to be $\sim R^{-2}$. Thus in time $\tau_D$, interfaces move a distance
of about one, much smaller than $R$. This implies that the diffusion field $\delta\psi$ is essentially always in
equilibrium with the interfaces and, thus, obeys Laplace's equation

$$\nabla^2\delta\psi = 0 \tag{A3.3.67}$$

in the bulk.


A3.3.4.1 GIBBS-THOMSON BOUNDARY CONDITION 

To derive the boundary condition, it is better to work with the chemical potential instead of the diffusion field.
We have

$$\frac{\partial\psi}{\partial\tau} = -\nabla\cdot\mathbf{j} \tag{A3.3.68}$$

$$\mathbf{j} = -\nabla\mu \tag{A3.3.69}$$

and

$$\mu = f'(\psi) - \xi^2\nabla^2\psi. \tag{A3.3.70}$$

In the bulk, linearizing $\mu$ leads to $\mu = f''(\psi_+)\,\delta\psi - \xi^2\nabla^2\delta\psi$, where the $\nabla^2$ term is again negligible,
so that $\mu$ is proportional to $\delta\psi$. Thus $\mu$ also obeys Laplace's equation

$$\nabla^2\mu = 0. \tag{A3.3.71}$$

Let us analyse $\mu$ near an interface. The Laplacian in the curvilinear coordinates $(u, \mathbf{s})$ can be written such that
(A3.3.70) becomes (near the interface)

$$\mu = f'(\psi) - \xi^2\left(\frac{\partial^2\psi}{\partial u^2}\right)_{\tau} - \xi^2 K\left(\frac{\partial\psi}{\partial u}\right)_{\tau} \tag{A3.3.72}$$

where $K = \nabla\cdot\hat{\mathbf{n}}$ is the total curvature. The value of $\mu$ at the interface can be obtained from (A3.3.72) by
multiplying it with $(\partial\psi/\partial u)_\tau$ (which is sharply peaked at the interface) and integrating over $u$ across the
interface. Since $\mu$ and $K$ vary smoothly through the interface, one obtains a general result that, at the interface,

$$\mu\,\Delta\psi = \Delta f - \xi^2\sigma K \tag{A3.3.73}$$

where $\Delta\psi$ is the change in $\psi$ across the interface and $\Delta f$ is the difference in the minima of the free energy $f$
for the two bulk phases. For the symmetric double well, $\Delta f = 0$ and $\Delta\psi = 2$. Thus

$$\mu = -\tfrac{1}{2}\xi^2\sigma K. \tag{A3.3.74}$$
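
The integration across the interface that produces the Gibbs-Thomson relation can be spelled out term by term. The sketch below assumes that the surface tension of equation (A3.3.53), which is not reproduced in this section, takes the form $\sigma = \int du\,(\partial\psi/\partial u)^2$, and that $\partial\psi/\partial u$ vanishes in both bulk phases.

```latex
\begin{align*}
\int du\,\mu\,\frac{\partial\psi}{\partial u} &\simeq \mu\,\Delta\psi
  && \text{($\mu$ varies slowly across the interface)} \\
\int du\,f'(\psi)\,\frac{\partial\psi}{\partial u} &= \Delta f
  && \text{(exact derivative of $f$)} \\
\int du\,\frac{\partial^2\psi}{\partial u^2}\,\frac{\partial\psi}{\partial u}
  &= \frac{1}{2}\int du\,\frac{\partial}{\partial u}\left(\frac{\partial\psi}{\partial u}\right)^2 = 0
  && \text{(vanishing gradient in the bulk)} \\
\int du\,K\left(\frac{\partial\psi}{\partial u}\right)^2 &\simeq \sigma K
  && \text{($K$ varies slowly across the interface)}
\end{align*}
```

Collecting the four contributions gives $\mu\,\Delta\psi = \Delta f - \xi^2\sigma K$, and setting $\Delta f = 0$, $\Delta\psi = 2$ for the symmetric double well gives $\mu = -\xi^2\sigma K/2$.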

We make two side remarks.
(1) If the free energy minima have unequal depths, then this calculation can also be done. See [14].
(2) Far away from the interface, $\mu = f''(\psi_+)\,\delta\psi = 2\,\delta\psi$ for the $\psi^4$ form. Then one also has for the
supersaturation $\epsilon$

$$\epsilon = -\lim_{u\to\infty}\delta\psi(u) = -\mu/2 = +\xi^2\sigma K/4. \tag{A3.3.75}$$

The supersaturation $\epsilon$, set by the mean value of $\delta\psi$ far from the interfaces, reflects the presence of other
subcritical clusters in the system.

Equation (A3.3.73) is referred to as the Gibbs-Thomson boundary condition. Equation (A3.3.74) determines $\mu$
on the interfaces in terms of the curvature, and between the interfaces $\mu$ satisfies Laplace's equation, equation
(A3.3.71). Now, since $\mathbf{j} = -\nabla\mu$, an interface moves due to the imbalance between the current flowing into
and out of it. The interface velocity is therefore given by

$$j_{\rm out} - j_{\rm in} = v\,\Delta\psi \tag{A3.3.76}$$

and also from equation (A3.3.69),

$$j_{\rm out} - j_{\rm in} = -\left[\frac{\partial\mu}{\partial u}\right] = -\left[\hat{\mathbf{n}}\cdot\nabla\mu\right]. \tag{A3.3.77}$$

Here $[\cdots]$ denotes the discontinuity in $\cdots$ across the interface. Equation (A3.3.71), equation (A3.3.74),
equation (A3.3.76) and equation (A3.3.77) together determine the interface motion.

Consider a single spherical domain of minority phase ($\psi_- = -1$) in an infinite sea of majority phase ($\psi_+ = +1$).
From the definition of $\mu$ in (A3.3.70), $\mu = 0$ at infinity. Let $R(\tau)$ be the domain radius. The solution of
Laplace's equation, (A3.3.71), for $d > 2$, with a boundary condition at $\infty$ and equation (A3.3.74) at $r = R$, is
spherically symmetric and is, using $K = (d-1)/R$,

$$\mu = -\frac{(d-1)\sigma\xi^2}{2r} \quad \text{for } r \geq R \tag{A3.3.78}$$

and

$$\mu = -\frac{(d-1)\sigma\xi^2}{2R} \quad \text{for } r \leq R. \tag{A3.3.79}$$

Then, using equation (A3.3.76) and equation (A3.3.77), we obtain, since $\Delta\psi = 2$,

$$\frac{dR}{d\tau} = v = -\frac{(d-1)\sigma\xi^2}{4R^2}. \tag{A3.3.80}$$

Integrating equation (A3.3.80), we get (setting $\xi = 1$)

$$R^3(\tau) = R^3(0) - \tfrac{3}{4}(d-1)\sigma\tau \tag{A3.3.81}$$

which leads to a '$R^3$ proportional to $\tau$' time dependence of the evaporating domain: the domain evaporates in
time $\tau$ proportional to $R^3(0)$.
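
The closed form for the evaporating domain can be cross-checked by integrating the velocity equation numerically. The following is a minimal sketch, assuming $dR/d\tau = -(d-1)\sigma/(4R^2)$ with $\xi = 1$ as in (A3.3.80); the function names, the fourth-order Runge-Kutta integrator and the values $R(0) = 10$, $\sigma = 1$ are illustrative choices, not taken from the text.

```python
def shrink_rate(radius, sigma, d=3):
    """Right-hand side of (A3.3.80) with xi = 1: dR/dtau = -(d - 1) * sigma / (4 R^2)."""
    return -(d - 1) * sigma / (4.0 * radius ** 2)

def integrate_radius(r0, sigma, tau_end, d=3, steps=10000):
    """Fourth-order Runge-Kutta integration of dR/dtau from tau = 0 to tau_end."""
    h = tau_end / steps
    r = r0
    for _ in range(steps):
        k1 = shrink_rate(r, sigma, d)
        k2 = shrink_rate(r + 0.5 * h * k1, sigma, d)
        k3 = shrink_rate(r + 0.5 * h * k2, sigma, d)
        k4 = shrink_rate(r + h * k3, sigma, d)
        r += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return r

def closed_form_radius(r0, sigma, tau, d=3):
    """R(tau) from (A3.3.81): R^3 = R0^3 - (3/4) (d - 1) sigma tau."""
    return (r0 ** 3 - 0.75 * (d - 1) * sigma * tau) ** (1.0 / 3.0)

# Illustrative values only (assumed): R0 = 10, sigma = 1, d = 3.
r0, sigma = 10.0, 1.0
tau_evap = r0 ** 3 / (0.75 * 2 * sigma)            # time at which R reaches zero
r_numeric = integrate_radius(r0, sigma, 0.5 * tau_evap)
r_exact = closed_form_radius(r0, sigma, 0.5 * tau_evap)
```

The integration is stopped halfway to complete evaporation because the velocity diverges as $R \to 0$, where the sharp-interface description itself ceases to apply.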

A3.3.4.2 LS ANALYSIS FOR GROWING DROPLETS 

Again consider a single spherical droplet of minority phase ($\psi_- = -1$) of radius $R$ immersed in a sea of
majority phase. But now let the majority phase have an order parameter at infinity that is (slightly) smaller
than +1, i.e. $\psi(\infty) = \psi_0 < 1$. The majority phase is now 'supersaturated' with the dissolved minority species,
and if the minority droplet is large enough it will grow by absorbing material from the majority phase.
Otherwise it will evaporate as above. The two regimes are separated by a critical radius $R_c$.

Let $f(\pm 1) = 0$ by convention, then the Gibbs-Thomson boundary condition, equation (A3.3.73), becomes at
$r = R$,

$$(1 + \psi_0)\,\mu = f(\psi_0) - \frac{(d-1)\sigma\xi^2}{R}. \tag{A3.3.82}$$

At $r = \infty$, from equation (A3.3.70),

$$\mu = f'(\psi_0). \tag{A3.3.83}$$

The solution of Laplace's equation, (A3.3.71), with these boundary conditions is, for $d = 3$ (and with $\xi$ set to one),

$$\mu = f'(\psi_0) + \left(\frac{f(\psi_0)}{1+\psi_0} - f'(\psi_0)\right)\frac{R}{r} - \frac{2\sigma}{1+\psi_0}\,\frac{1}{r}, \quad r \geq R \tag{A3.3.84}$$

$$\mu = \frac{f(\psi_0)}{1+\psi_0} - \frac{2\sigma}{(1+\psi_0)R}, \quad r \leq R. \tag{A3.3.85}$$

Using equation (A3.3.76), equation (A3.3.77) and equation (A3.3.84), one finds the interface velocity
$v = dR/d\tau$ as

$$\frac{dR}{d\tau} = \left(\frac{f(\psi_0)}{(1+\psi_0)^2} - \frac{f'(\psi_0)}{1+\psi_0}\right)\frac{1}{R} - \frac{2\sigma}{(1+\psi_0)^2}\,\frac{1}{R^2}. \tag{A3.3.86}$$

For a small supersaturation, $\psi_0 = 1 - \epsilon$ with $\epsilon \ll 1$. To leading (non-trivial) order in $\epsilon$, equation (A3.3.86)
reduces to

$$\frac{dR}{d\tau} = \frac{\sigma}{2R}\left(\frac{1}{R_c} - \frac{1}{R}\right) \tag{A3.3.87}$$

with $R_c = \sigma/(f''(1)\epsilon)$ as the critical radius.
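
The small-supersaturation limit rests on a short Taylor expansion about the bulk minimum. The sketch below assumes the conventions stated in the text ($f(1) = 0$ by convention, $f'(1) = 0$ at the minimum of the double well, $\xi = 1$, $d = 3$) and keeps only the leading order in $\epsilon$.

```latex
\begin{align*}
f(\psi_0) &= f(1-\epsilon) \simeq \tfrac{1}{2} f''(1)\,\epsilon^2, \qquad
f'(\psi_0) \simeq -f''(1)\,\epsilon, \qquad
1 + \psi_0 \simeq 2, \\[4pt]
\frac{dR}{d\tau} &\simeq \frac{f''(1)\,\epsilon}{2R} - \frac{\sigma}{2R^2}
  = \frac{\sigma}{2R}\left(\frac{f''(1)\,\epsilon}{\sigma} - \frac{1}{R}\right)
  = \frac{\sigma}{2R}\left(\frac{1}{R_c} - \frac{1}{R}\right),
  \qquad R_c = \frac{\sigma}{f''(1)\,\epsilon}.
\end{align*}
```

The term proportional to $f(\psi_0)$ is of order $\epsilon^2$ and drops out, so the droplet grows for $R > R_c$ and evaporates for $R < R_c$; as $\epsilon \to 0$ the critical radius diverges and the evaporation law $dR/d\tau = -\sigma/(2R^2)$ of the unsaturated case is recovered.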

The form of $v(R)$ in (A3.3.87) is valid only for $d = 3$. If we write it as

$$\frac{dR}{d\tau} = \frac{\alpha_d}{R}\left(\frac{1}{R_c} - \frac{1}{R}\right), \tag{A3.3.88}$$

then the general expression (see [40]) for $\alpha_d$ is $\alpha_d = (d-1)(d-2)\sigma/4$. For $d = 2$, $\alpha_d$ vanishes due to the
singular nature of the Laplacian in two-dimensional systems. For $d = 2$ and in the limit of a small (zero)
volume fraction of the minority phase, equation (A3.3.87) is modified to (see the appendix of [28]),

$$\frac{dR}{d\tau} = \frac{\sigma}{4R\ln(4\tau)}\left(\frac{1}{R_c} - \frac{1}{R}\right) \tag{A3.3.89}$$

with $R_c = \sigma/(2f''(1)\epsilon)$. A change of variable $\tau^* = \tau/\ln(4\tau)$ converts (A3.3.89) into the same form as
(A3.3.87), but now the time-like variable has a logarithmic modification.

In the LS analysis, an assembly of drops is considered. Growth proceeds by evaporation from drops with
$R < R_c$ and condensation onto drops with $R > R_c$. The supersaturation $\epsilon$ changes in time, so that $\epsilon(\tau)$
becomes a sort of mean field due to all the other droplets and also implies a time-dependent critical radius
$R_c(\tau) = \sigma/[f''(1)\epsilon(\tau)]$. One of the starting equations in the LS analysis is equation (A3.3.87) with $R_c(\tau)$.

For a general dimension $d$, the cluster size distribution function $n(R, \tau)$ is defined such that $n(R, \tau)\,dR$ equals
the number of clusters per unit 'volume' with a radius between $R$ and $R + dR$. Assuming no nucleation of new
clusters and no coalescence, $n(R, \tau)$ satisfies a continuity equation

$$\frac{\partial n}{\partial\tau} + \frac{\partial}{\partial R}\left(v\,n\right) = 0 \tag{A3.3.90}$$

where $v = dR/d\tau$ is given by equation (A3.3.87). Finally, the conservation law is imposed on the entire system
as follows. Let the spatial average of the conserved order parameter be $(1 - \epsilon_0)$. At late times the
supersaturation $\epsilon(\tau)$ tends to zero giving the constraint

$$\epsilon_0 = \epsilon(\tau) + 2V_d\int_0^\infty dR\,R^d\,n(R, \tau) \;\to\; 2V_d\int_0^\infty dR\,R^d\,n(R, \tau) \tag{A3.3.91}$$

where $V_d$ is the volume of the $d$-dimensional unit sphere. Equation (A3.3.88), equation (A3.3.90) and
equation (A3.3.91) constitute the LS problem for the cluster size distribution function $n(R, \tau)$. The LS analysis
of these equations starts by introducing a scaling distribution of droplet sizes. For a $d$-dimensional system, one
writes

$$n(R, \tau) = R_c^{-(d+1)}(\tau)\,f\!\left(\frac{R}{R_c(\tau)}\right). \tag{A3.3.92}$$


Equation (A3.3.91) becomes, denoting $R/R_c$ by $x$,

$$\epsilon_0 = 2V_d\int_0^\infty dx\,x^d f(x) \tag{A3.3.93}$$

and fixes the normalization of $f(x)$. If equation (A3.3.92) is substituted into equation (A3.3.90) we obtain,
using the velocity equation (A3.3.88),

$$R_c^2\frac{dR_c}{d\tau}\left[(d+1)f(x) + x\frac{df}{dx}\right] = \alpha_d\left[\left(\frac{2}{x^3} - \frac{1}{x^2}\right)f(x) + \left(\frac{1}{x} - \frac{1}{x^2}\right)\frac{df}{dx}\right]. \tag{A3.3.94}$$

For the consistency of the scaling form, equation (A3.3.92), $R_c$ dependence should drop out from equation
(A3.3.94); i.e.

$$R_c^2\frac{dR_c}{d\tau} = \alpha_d\gamma \tag{A3.3.95}$$

which integrates to

$$R_c(\tau) = (3\alpha_d\gamma\tau)^{1/3}. \tag{A3.3.96}$$

Equation (A3.3.94) simplifies to

$$\left[\frac{2}{x^3} - \frac{1}{x^2} - \gamma(d+1)\right]f(x) = \left[\gamma x - \frac{1}{x} + \frac{1}{x^2}\right]\frac{df}{dx} \tag{A3.3.97}$$

which integrates to

$$\ln f(x) = \int^x dy\,\frac{2 - y - \gamma(d+1)y^3}{y\left(\gamma y^3 - y + 1\right)}. \tag{A3.3.98}$$

Due to the normalization integral, equation (A3.3.93), $f(x)$ cannot be non-zero for arbitrarily large $x$; $f(x)$ must
vanish for $x$ greater than some cut-off value $x_0$, which must be a pole of the integrand in equation (A3.3.98)
on the positive real axis. For this to occur $\gamma \leq 4/27 = \gamma_0$. Equation (A3.3.88) and equation (A3.3.96) together
yield an equation for $x = R/R_c$:

$$\frac{dx}{d\tau} = \frac{1}{3\gamma\tau}\,g(x) \tag{A3.3.99}$$

$$g(x) = \frac{1}{x} - \frac{1}{x^2} - \gamma x. \tag{A3.3.100}$$


The form of $g(x)$ is shown in figure A3.3.10. For $\gamma < \gamma_0$, $g(x)$ has two positive zeros $x_1 < x_2$, and all drops
with $x > x_1$ will asymptotically approach the size $x_2R_c(\tau)$, which tends to infinity with $\tau$ as $\tau^{1/3}$ from
equation (A3.3.96). For $\gamma > \gamma_0$, all points move to the origin and the conservation condition again cannot be
satisfied. The only allowed solution consistent with the conservation condition, equation (A3.3.93), is that
$\gamma$ asymptotically approaches $\gamma_0$ from above. In doing this it takes an infinite time. (If it reaches $\gamma_0$ in finite
time, all drops with $x > 3/2$ would eventually arrive at $x = 3/2$ and become stuck and one has a repeat of the
$\gamma < \gamma_0$ case.) $\gamma = \gamma_0 = 4/27$ then corresponds to a double pole in the integrand in (A3.3.98). LS show that
$\gamma(\tau)$ differs from $\gamma_0$ by a correction of order $\epsilon^2(\tau)$, with $\epsilon(\tau) \to 0$ as $\tau \to \infty$. For an
asymptotic scaled distribution, one uses $\gamma = \gamma_0 = 4/27$ and evaluates the integral in (A3.3.98) to obtain

$$f(x) = \begin{cases} \text{constant}\;\, x^2\,(3+x)^{-(1+4d/9)}\left(\tfrac{3}{2}-x\right)^{-(2+5d/9)}\exp\left(-\dfrac{d}{3-2x}\right) & \text{for } x < \tfrac{3}{2} \\[8pt] 0 & \text{for } x > \tfrac{3}{2} \end{cases} \tag{A3.3.101}$$

where the normalization constraint, equation (A3.3.93), can be used to determine the constant. $f(x)$ is the
scaled LS cluster distribution and is shown in figure A3.3.11 for $d = 3$.

Figure A3.3.10 $g(x)$ as a function of $x$ for the three possible classes of $\gamma$.

Figure A3.3.11 The asymptotic cluster size distribution $f(x)$ from LS analysis for $d = 3$.
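
The scaled distribution and the function $g(x)$ are easy to evaluate numerically. The following is a minimal sketch, assuming $g(x) = 1/x - 1/x^2 - \gamma x$ and the unnormalized form $f(x) \propto x^2(3+x)^{-(1+4d/9)}(3/2-x)^{-(2+5d/9)}\exp(-d/(3-2x))$ for $x < 3/2$; the function names and the midpoint-rule integration are illustrative choices. For $d = 3$ the scaling equation implies that the mean scaled radius equals one, which the sketch uses as a consistency check.

```python
import math

GAMMA_0 = 4.0 / 27.0

def g(x, gamma):
    """g(x) = 1/x - 1/x^2 - gamma * x, the assumed form of (A3.3.100)."""
    return 1.0 / x - 1.0 / x ** 2 - gamma * x

def ls_distribution(x, d=3):
    """Unnormalized LS scaling function of (A3.3.101); zero at and beyond the cut-off x = 3/2."""
    if x <= 0.0 or x >= 1.5:
        return 0.0
    return (x ** 2
            * (3.0 + x) ** (-(1.0 + 4.0 * d / 9.0))
            * (1.5 - x) ** (-(2.0 + 5.0 * d / 9.0))
            * math.exp(-d / (3.0 - 2.0 * x)))

def moment(power, d=3, n=20000):
    """Midpoint-rule estimate of the integral of x^power * f(x) over 0 < x < 3/2."""
    h = 1.5 / n
    return h * sum(((i + 0.5) * h) ** power * ls_distribution((i + 0.5) * h, d)
                   for i in range(n))

mean_x = moment(1) / moment(0)        # mean scaled radius <R>/R_c for d = 3
g_at_cutoff = g(1.5, GAMMA_0)         # g touches zero at x = 3/2 when gamma = gamma_0
```

The vanishing of $g$ at $x = 3/2$ for $\gamma = \gamma_0$ reflects the double root of $\gamma_0 y^3 - y + 1 = \tfrac{4}{27}(y - \tfrac{3}{2})^2(y + 3)$, which is the double pole referred to in the text.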


In section A3.3.3 the Langevin models that were introduced for phase transition kinetics utilized the Landau
mean field expansion of the free energy, equation (A3.3.50) and equation (A3.3.51). In spite of this, many of
the subsequent results based on (A3.3.57) as a starting point are more broadly valid and are dependent only on
the existence of a double-well nature of the free energy functional as shown in figure A3.3.6. Also, as the
renormalization group analysis shows, the role of thermal noise is irrelevant for the evolution of the initially
unstable state. Thus, apart from the random fluctuations in the initial state, which are essential for the
subsequent growth of unstable modes, the mean field description is a good theoretical starting point in
understanding spinodal decomposition and the ensuing growth. The late-stage growth analysis given in this
section is a qualitatively valid starting point for quenches with sufficient off-criticality, and becomes correct
asymptotically as the off-criticality $\psi_0$ increases, bringing the initial post-quench state closer to the
coexistence curve, where $\psi_0$ is one. In general, one has to add to the LS analysis the cluster-cluster
interactions. The current status of such extensions (which are non-trivial) is reviewed in [37, 40, 41].
The main results are that the universal scaling form of the LS cluster distribution function $f(x)$ given above
acquires a dependence on $\delta\psi_0 \equiv (1 - \psi_0)$, which measures the proximity to the coexistence curve (it is
essentially the volume fraction for the vapour-liquid nucleation); also, the $\tau^{1/3}$ growth law for the domain size
has the form $R(\tau) = [K(\delta\psi_0)\tau]^{1/3}$, where $K(\delta\psi_0)$ is a monotonically increasing function of $\delta\psi_0$.


A3.3.5 NUCLEATION KINETICS— METASTABLE SYSTEMS 

In this section, we restrict our discussion to homogeneous nucleation, which has an illustrious history spanning
at least six decades (see [37, 42] and references therein). Heterogeneous nucleation occurs more commonly in 
nature, since suspended impurities or imperfectly wetted surfaces provide the interface on which the growth 
of the new phase is initiated. Heterogeneous nucleation is treated in a recent book by Debenedetti (see section 
3.4 of this book, which is listed in Further Reading); interesting phenomena of breath figures and dew 
formation are related to heterogeneous nucleation, and are discussed in [43, 44 and 45]. 

In contrast to spinodal decomposition, where small-amplitude, long-wavelength fluctuations initiate the 
growth, the kinetics following an initial metastable state requires large-amplitude (nonlinear) fluctuations for 
the stable phase to nucleate. A qualitative picture for the nucleation event is as follows. For the initial 
metastable state, the two minima of the local free energy functional are not degenerate, in contrast to the 
initially unstable case shown in figure A3.3.6. The metastable system is initially in the higher of the two 
minimum energy states and has to overcome a free energy barrier (provided by the third extremum which is a 
maximum) in order to go over to the absolute minimum, which is the system's ground state. For this, it 
requires an activation energy which is obtained through rarely occurring large-amplitude fluctuations of the 
order parameter, in the form of a critical droplet. Physically, the rarity of the nonlinear fluctuation introduces 
large characteristic times for nucleation to occur. 

The central quantity of interest in homogeneous nucleation is the nucleation rate J, which gives the number of 
droplets nucleated per unit volume per unit time for a given supersaturation. The free energy barrier is the 
dominant factor in determining J; J depends on it exponentially. Thus, a small difference in the different 
model predictions for the barrier can lead to orders of magnitude differences in J. Similarly, experimental 
measurements of J are sensitive to the purity of the sample and to experimental conditions such as 
temperature. In modern field theories, J has a general form 


$J = \dfrac{\Omega}{\tau^*}\,\exp\!\left(-\dfrac{E_c}{kT}\right)$    (A3.3.102) 

where $\tau^*$ is the time scale for the macroscopic fluctuations and $\Omega$ is the volume of phase space accessible for 
fluctuations. The barrier height to nucleation $E_c$ is described below. 
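Because the barrier enters equation (A3.3.102) exponentially, a modest disagreement between model barriers changes J by orders of magnitude. The short Python sketch below illustrates this point. The prefactor and the two barrier values are hypothetical numbers chosen purely for illustration; they are not taken from any particular system or model.

```python
import math


def nucleation_rate(prefactor, barrier_over_kT):
    """Return J = J0 * exp(-E_c / kT) for a given prefactor J0 and reduced barrier."""
    return prefactor * math.exp(-barrier_over_kT)


# Hypothetical inputs (assumptions for illustration only).
J0 = 1.0e30        # assumed prefactor, droplets per cm^3 per s
barrier_a = 60.0   # E_c / kT from one hypothetical model
barrier_b = 66.0   # a 10 % higher barrier from another hypothetical model

ratio = nucleation_rate(J0, barrier_a) / nucleation_rate(J0, barrier_b)
orders_of_magnitude = math.log10(ratio)
print(f"A 10 % change in the barrier shifts J by {orders_of_magnitude:.1f} orders of magnitude")
```

The prefactor cancels in the ratio, so the sensitivity shown here comes from the exponential factor alone.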

A homogeneous metastable phase is always stable with respect to the formation of infinitesimal droplets, 
provided the surface tension $\sigma$ is positive. Between this extreme and the other thermodynamic equilibrium 
state, which is inhomogeneous and consists of two coexisting phases, a critical size droplet state exists, which 
is in unstable equilibrium. In the 'classical' theory, one makes the capillarity approximation: the critical 
droplet is assumed homogeneous up to the boundary separating it from the metastable background and is 
assumed to be the same as the new phase in the bulk. Then the work of formation $W(R)$ of such a droplet of 
arbitrary radius $R$ is the sum of the free energy gain of the new stable phase droplet and the free energy cost 
due to the new interface that is formed: 

$W(R) = 4\pi R^2 \sigma - \dfrac{4\pi}{3} R^3\, \Delta f$    (A3.3.103) 

where $\Delta f$ is the positive bulk free energy difference per unit volume between the stable and metastable phases. 
From this, by maximizing $W(R)$ with respect to $R$, one obtains the barrier height to nucleation $W(R_c) = E_c$ and 
the critical radius $R_c$: $R_c = 2\sigma/\Delta f$ and $E_c = (16\pi/3)\,\sigma^3/(\Delta f)^2$. For a supercooled vapour nucleating into liquid 
drops, $\Delta f$ is given by $kT\rho_l \ln(s)$, where $\rho_l$ is the bulk liquid density and $s = P/P_e$ is the supersaturation ratio, 
which is the ratio of the actual pressure $P$ to the equilibrium vapour pressure $P_e$ of the liquid at the same $T$. 

For the case of the nucleation of a crystal from a supercooled liquid, $\Delta f = \Delta\mu/v$, where $v$ is the volume per 
particle of the solid and $\Delta\mu$ is the chemical potential difference between the bulk solid and the bulk liquid. 
These results, given here for three-dimensional systems, can be easily generalized to an arbitrary d-dimensional 
case [37]. Often, it is useful to use the capillary length $l_c = (2\sigma v)/(kT)$ as the unit of length: the 
critical radius is $R_c = l_c/\varepsilon(t)$, where the supersaturation $\varepsilon(t) = \Delta\mu/(kT)$, and the nucleation barrier is 
$E_c/kT = (\varepsilon_0/\varepsilon(t))^2$, where the dimensionless quantity $\varepsilon_0 = l_c\,[(4\pi\sigma)/(3kT)]^{1/2}$. 
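The capillarity-approximation results can be checked numerically. The following Python sketch evaluates the critical radius and barrier from the work of formation and confirms that they agree with the expressions written in terms of the capillary length. The surface tension, temperature factor, particle volume and supersaturation are illustrative assumptions, not data for a specific material.

```python
import math


def work_of_formation(R, sigma, delta_f):
    """W(R) = 4*pi*R^2*sigma - (4*pi/3)*R^3*delta_f (capillarity approximation)."""
    return 4.0 * math.pi * R**2 * sigma - (4.0 * math.pi / 3.0) * R**3 * delta_f


def critical_radius(sigma, delta_f):
    """Radius at which W(R) is maximal: R_c = 2*sigma/delta_f."""
    return 2.0 * sigma / delta_f


def barrier_height(sigma, delta_f):
    """E_c = W(R_c) = (16*pi/3)*sigma^3/delta_f^2."""
    return (16.0 * math.pi / 3.0) * sigma**3 / delta_f**2


# Illustrative (assumed) parameters in SI units.
sigma = 0.05      # surface tension, J m^-2
kT = 4.0e-21      # thermal energy, J
v = 3.0e-29       # volume per particle of the new phase, m^3
eps = 0.5         # supersaturation, delta_mu / kT

delta_f = eps * kT / v                      # bulk free energy difference per volume
R_c = critical_radius(sigma, delta_f)
E_c = barrier_height(sigma, delta_f)

# The same quantities expressed through the capillary length l_c.
l_c = 2.0 * sigma * v / kT
eps0 = l_c * math.sqrt(4.0 * math.pi * sigma / (3.0 * kT))

print(f"R_c = {R_c:.3e} m, l_c/eps = {l_c / eps:.3e} m")
print(f"E_c/kT = {E_c / kT:.2f}, (eps0/eps)^2 = {(eps0 / eps) ** 2:.2f}")
```

Both routes give the same numbers because $\varepsilon_0^2 = (16\pi/3)\,\sigma^3 v^2/(kT)^3$, which is the barrier $E_c/kT$ multiplied by $\varepsilon^2$.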

Early (classical) theories of homogeneous nucleation are based on a microscopic description of cluster 
dynamics (see reference (1) in [37]). A kinetic equation for the droplet number density $n_i(t)$ of a given size $i$ at 
time $t$ is written, in which its time rate of change is the difference between $J_{i-1}$ and $J_i$, where $J_i$ is the rate at 
which droplets of size $i$ grow to size $i + 1$ by gaining a single molecule [13]. By providing a model for the 
forward and backward rates at which a cluster gains or loses a particle, $J_i$ is related to $\{n_i(t)\}$, and a set of 
coupled rate equations for $\{n_i(t)\}$ is obtained. The nucleation rate is obtained from the steady-state solution in 
which $J_i = J$ for large $i$. The result is in the form of equation (A3.3.102), with specific expressions for the prefactor 
$J_0 = \Omega/\tau^*$ obtained for various cases such as vapour-liquid and liquid-solid transitions. Classical theories give 
nucleation rates that are low compared to experimental measurements. Considerable effort has gone into 
attempts to understand 'classical' theories, and compare their results to experiments (see references in [42]). 

In two classic papers [18, 46], Cahn and Hilliard developed a field theoretic extension of early theories of 
nucleation by considering a spatially inhomogeneous system. Their free energy functional, equation 
(A3.3.52), has already been discussed at length in section A3.3.3. They considered a two-component 
incompressible fluid. The square gradient approximation implied a slow variation of the concentration on the 
coarse-graining length scale $\xi$ (i.e. a diffuse interface). In their 1959 paper [46], they determined the saddle 
point of this free energy functional and analysed the properties of a critical nucleus of the minority phase 
within the metastable binary mixture. While the results agree with those of the early theories for low 
supersaturation, the properties of the critical droplet change as the supersaturation is increased: (i) the work 
required to form a critical droplet becomes progressively smaller than the 'classical' theory result, and 
approaches zero continuously as the spinodal is approached; (ii) the interface with the exterior phase becomes 
more diffuse and the interior of the droplet becomes inhomogeneous in its entirety; (iii) the concentration at 
the droplet centre approaches that of the exterior phase; and (iv) the radius and excess concentration in the 
droplet at first decrease, pass through a minimum and become infinite at the spinodal. These papers provide a 
description of the spatially inhomogeneous critical droplet, which is not restricted to planar interfaces, and 
yields, for $W(R)$, an expression that goes to zero at the mean field spinodal. The Cahn-Hilliard theory has 
been a useful starting point in the development of modern nucleation theories. 

A full theory of nucleation requires a dynamical description. In the late 1960s, the early theories of 
homogeneous nucleation were generalized and made rigorous by Langer [47]. Here one starts with an 
appropriate Fokker-Planck (or its equivalent Langevin) equation for the probability distribution function 
$P(\{\psi_i\},t)$ for the set of relevant field variables $\{\psi_i\}$ which are semi-macroscopic and slowly varying: 

$\dfrac{\partial P}{\partial t} = -\sum_i \dfrac{\delta J_i}{\delta \psi_i}$    (A3.3.104) 

where the probability current $J_i$ is given by 

$J_i = -\sum_j M_{ij}\left(\dfrac{\delta F}{\delta \psi_j}\,P + kT\,\dfrac{\delta P}{\delta \psi_j}\right).$    (A3.3.105) 

$F$ is the free energy functional, for which one can use equation (A3.3.52). The summation above corresponds 
to both the sum over the semi-macroscopic variables and an integration over the spatial variable $\vec{r}$. The 
mobility matrix $M_{ij}$ consists of a symmetric dissipative part and an antisymmetric non-dissipative part. The 
symmetric part corresponds to a set of generalized Onsager coefficients. 

The decay of a metastable state corresponds to passing from a local minimum of $F$ to another minimum of 
lower free energy which occurs only through improbable free energy fluctuations. The most probable path for 
this passage to occur when the nucleation barrier is high is via the saddle point. The saddle point corresponds 
to a critical droplet of the stable phase in a metastable phase background. The nucleation rate is given by the 
steady-state solution of the Fokker-Planck equation that describes a finite probability current across the 
saddle point. The result is of the form given in (A3.3.102). The quantity $1/\tau^*$ is also referred to as a dynamical 
prefactor and $\Omega$ as a statistical prefactor. 

Within this general framework there have been many different systems modelled and the dynamical, statistical 
prefactors have been calculated. These are detailed in [42]. For a binary mixture, phase separating from an 
initially metastable state, the work of Langer and Schwartz [48] using the Langer theory [47] gives the 
nucleation rate as 


™-«^s(?)>?r-[-te?] 

where $l_c$ is the capillary length defined above and the characteristic time $t_c = l_c^2/[D\,v\,C_{eq}(\infty)]$ with $D$ as the 
diffusion coefficient and $C_{eq}(\infty)$ the solute concentration in the background matrix at a planar interface in the 
phase separated system [37]. 

One can introduce a distributed nucleation rate $j(R, t)\,dR$ for nucleating clusters of radius between $R$ and $R + dR$. 
Its integral over $R$ is the total nucleation rate $J(t)$. Equation (A3.3.103) can be viewed as a radius-dependent 
droplet energy which has a maximum at $R = R_c$. If one assumes $j(R, t)$ to be a Gaussian function, 
then 

$j(R, t) = \dfrac{J(t)}{\sqrt{2\pi}\,\delta R}\,\exp\!\left[-\dfrac{(R - R_c)^2}{2(\delta R)^2}\right]$    (A3.3.107) 

where $(\delta R)^2 = 2[E_c - W(R)]/|E_c''|$, with $E_c$ and $E_c''$ being, respectively, the values of $W(R)$ and its second 
derivative evaluated at $R = R_c$. Langer [47] showed that the droplet energy is not only a function of $R$ but can 
also depend on the capillary wavelength fluctuations $w$; i.e. $W(R) \to E(R, w)$. Then, the droplets appear at the 
saddle point in the surface of $E(R, w)$. The 2-d surface area of the droplet is given by $4\pi(R^2 + w^2)$, which 
gives the change in the droplet energy due to non-zero $w$ as $\Delta E(R) = 4\pi\sigma w^2$. Both approaches lead to the 
same Gaussian form of the distributed nucleation rate with $w = \delta R$ estimated from an uncertainty in the 
required activation energy of the order of $kT/2$. 
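The statement that both routes give the same width can be verified for the classical droplet energy of equation (A3.3.103), for which the second derivative at the critical radius is $W''(R_c) = -8\pi\sigma$. The Python sketch below is a minimal check under stated assumptions: the distributed rate is taken to be a normalized Gaussian centred at $R_c$, the activation-energy uncertainty is set to exactly $kT/2$, and all parameter values are illustrative.

```python
import math


def width_from_curvature(sigma, kT):
    """(dR)^2 = 2*(kT/2)/|W''(R_c)| with |W''(R_c)| = 8*pi*sigma for the classical W(R)."""
    curvature = 8.0 * math.pi * sigma
    return math.sqrt(kT / curvature)


def width_from_capillary_fluctuation(sigma, kT):
    """Solve 4*pi*sigma*w^2 = kT/2 for the fluctuation amplitude w."""
    return math.sqrt(kT / (8.0 * math.pi * sigma))


def distributed_rate(R, R_c, dR, J_total):
    """Assumed Gaussian form of j(R, t), normalized so that its integral over R is J_total."""
    norm = J_total / (math.sqrt(2.0 * math.pi) * dR)
    return norm * math.exp(-((R - R_c) ** 2) / (2.0 * dR**2))


def integrate_trapezoid(func, lower, upper, n_points=4001):
    """Composite trapezoidal rule on a uniform grid."""
    step = (upper - lower) / (n_points - 1)
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, n_points - 1):
        total += func(lower + i * step)
    return total * step


# Illustrative (assumed) parameters in SI units.
sigma = 0.05     # surface tension, J m^-2
kT = 4.0e-21     # thermal energy, J
R_c = 1.5e-9     # critical radius, m
J_total = 1.0    # total nucleation rate in arbitrary units

dR = width_from_curvature(sigma, kT)
w = width_from_capillary_fluctuation(sigma, kT)
recovered_rate = integrate_trapezoid(
    lambda R: distributed_rate(R, R_c, dR, J_total), R_c - 8.0 * dR, R_c + 8.0 * dR
)
print(f"dR = {dR:.3e} m, w = {w:.3e} m, integrated rate = {recovered_rate:.6f}")
```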

Just as is the case for the LSW theory of Ostwald ripening, the Langer-Schwartz theory is also valid for 
quenches close to the coexistence curve. Its extension to non-zero volume fractions requires that such a theory 
take into account cluster-cluster correlations. A framework for such a theory has been developed [37] using a 
multi-droplet diffusion equation for the concentration field. This equation has been solved analytically using 
(i) a truncated multipole expansion and (ii) a mean field Thomas-Fermi approximation. The equation has also 
been numerically simulated. Such studies are among the first attempts to construct a unified model for the 
entire process of phase separation that combines steady-state homogeneous nucleation theory with the LSW 
mechanism for ripening, modified to account for the inter-cluster correlations. 


A3.3.6 SUMMARY 

In this brief review of dynamics in condensed phases, we have considered dense systems in various situations. 
First, we considered systems in equilibrium and gave an overview of how the space-time correlations, arising 
from the thermal fluctuations of slowly varying physical variables like density, can be computed and 
experimentally probed. We also considered capillary waves in an inhomogeneous system with a planar 
interface for two cases: an equilibrium system and a NESS system under a small temperature gradient. 
Finally, we considered time evolving non-equilibrium systems in which a quench brings a homogeneous 
system to an initially unstable (spinodal decomposition) or metastable state (nucleation) from which it evolves 
to a final inhomogeneous state of two coexisting equilibrium phases. The kinetics of the associated processes 
provides rich physics involving nonlinearities and inhomogeneities. The early-stage kinetics associated with 
the formation of interfaces and the late-stage interface dynamics in such systems continue to provide 
challenging unsolved problems that have emerged from the experimental observations on real systems and 
from the numerical simulations of model systems. 


A3.4 Gas-phase kinetics 

David Luckhaus and Martin Quack 


A3.4.1 INTRODUCTION 

Gas-phase reactions play a fundamental role in nature, for example atmospheric chemistry [1, 2, 3, 4 and 5] 
and interstellar chemistry [6], as well as in many technical processes, for example combustion and exhaust 
fume cleansing [7, 8 and 9]. Apart from such practical aspects the study of gas-phase reactions has provided 
the basis for our understanding of chemical reaction mechanisms on a microscopic level. The typically small 
particle densities in the gas phase mean that reactions occur in well defined elementary steps, usually not 
involving more than three particles. 

At the limit of extremely low particle densities, for example under the conditions prevalent in interstellar 
space, ion-molecule reactions become important (see chapter A3.5). At very high pressures gas-phase kinetics 
approach the limit of condensed phase kinetics where elementary reactions are less clearly defined due to the 
large number of particles involved (see chapter A3.6). 

Here, we mainly discuss homogeneous gas-phase reactions at intermediate densities where ideal gas 
behaviour can frequently be assumed to be a good approximation and diffusion is sufficiently fast that 
transport processes are not rate determining. The focus is on thermally activated reactions induced by 
collisions at well defined temperatures, although laser induced processes are widely used for the experimental 
study of such gas-phase reactions (see chapter B2.1). The aim of the present chapter is to introduce the basic 
concepts at our current level of understanding. It is not our goal to cover the vast original literature on the 
general topic of gas reactions. We refer to the books and reviews cited as well as to chapter B2.1 for specific 
applications. 

Photochemical reactions (chapter A3.13) and heterogeneous reactions on surfaces (chapter A3.10) are 
discussed in separate chapters. 


A3.4.2 DEFINITIONS OF THE REACTION RATE 

There are many ways to define the rate of a chemical reaction. The most general definition uses the rate of 
change of a thermodynamic state function. Following the second law of thermodynamics, for example, the 
change of entropy $S$ with time $t$ would be an appropriate definition under reaction conditions at constant 
energy $U$ and volume $V$: 

$v_S(t) = \left(\dfrac{\mathrm{d}S}{\mathrm{d}t}\right)_{U,V} > 0.$    (A3.4.1) 

An alternative rate quantity under conditions of constant temperature $T$ and volume, frequently realized in gas 
kinetics, would be 

$v_A(t) = -\left(\dfrac{\mathrm{d}A}{\mathrm{d}t}\right)_{T,V} > 0$    (A3.4.2) 

where $A$ is the Helmholtz free energy. 

For non-zero $v_S$ and $v_A$ the problem of defining the thermodynamic state functions under non-equilibrium 
conditions arises (see chapter A3.2). The definition of rate of change implied by equation (A3.4.1) and 
equation (A3.4.2) includes changes that are not due to chemical reactions. 

In reaction kinetics it is conventional to define reaction rates in the context of chemical reactions with a well 
defined stoichiometric equation 

$0 = \sum_i \nu_i \mathrm{B}_i$    (A3.4.3) 

where $\nu_i$ are the stoichiometric coefficients of species $\mathrm{B}_i$ ($\nu_i < 0$ for reactants and $\nu_i > 0$ for products, by 
convention). This leads to the conventional definition of the 'rate of conversion': 

$v_\xi(t) = \dfrac{\mathrm{d}\xi}{\mathrm{d}t} = \nu_i^{-1}\,\dfrac{\mathrm{d}n_i}{\mathrm{d}t}.$    (A3.4.4) 

The 'extent of reaction' $\xi$ is defined in terms of the amount $n_i$ of species $\mathrm{B}_i$ (i.e. the amount of substance or 
enplethy $n_i$, usually expressed in moles [10]): 

$\nu_i\,\xi(t) = n_i(t) - n_i(t = 0).$    (A3.4.5) 

$v_\xi$ is an extensive quantity, i.e. for two independent subsystems I and II we have $v_\xi(\mathrm{I} + \mathrm{II}) = v_\xi(\mathrm{I}) + v_\xi(\mathrm{II})$. 
For homogeneous reactions we obtain the conventional definition of the 'reaction rate' $v_c$ as rate of 
conversion per volume 

$v_c(t) = V^{-1}\,v_\xi(t) = \nu_i^{-1}\,\dfrac{\mathrm{d}c_i}{\mathrm{d}t}$    (A3.4.6) 


where $c_i$ is the concentration of species $\mathrm{B}_i$, for which we shall equivalently use the notation $[\mathrm{B}_i]$ (with the 
common unit mol dm⁻³ and the unit of $v_c$ being mol dm⁻³ s⁻¹). In gas kinetics it is particularly common to 
use the quantity particle density for concentration, for which we shall use $C_i$ (capital letter) with 

$v_C(t) = N_A V^{-1}\,v_\xi(t) = \nu_i^{-1}\,\dfrac{\mathrm{d}C_i}{\mathrm{d}t}.$    (A3.4.7) 

$N_A$ is Avogadro's constant. The most commonly used unit then is cm⁻³ s⁻¹, sometimes inconsistently written 
(molecule cm⁻³ s⁻¹). $v_C$ is an intensive quantity. Table A3.4.1 summarizes the definitions. 
Table A3.4.1 Definitions of the reaction rate. 

Constraint: U, V = constant (adiabatic) 
  Extensive quantity: entropy S (thermodynamics of irreversible processes) 
  Intensive quantity: local entropy 
  Reaction rate: v_S = dS/dt > 0; unit: J K⁻¹ s⁻¹ (local: J K⁻¹ s⁻¹ cm⁻³) 

Constraint: T, V = constant (isothermal) 
  Extensive quantity: Helmholtz energy A = U − TS 
  Intensive quantity: local A 
  Reaction rate: v_A = −dA/dt > 0; unit: J s⁻¹ (local: J s⁻¹ cm⁻³) 

Constraint: V = constant 
  Extensive quantity: amount of substance n_i, number of particles N_i 
  Intensive quantity: concentration c_i 
  Reaction rate: dn_i/dt or dc_i/dt; unit: mol s⁻¹ or mol cm⁻³ s⁻¹ 

Constraint: isothermal or adiabatic, fixed stoichiometry 
  Extensive quantity: extent of reaction ξ, with dξ = ν_i⁻¹ dn_i 
  Intensive quantity: ξ/V 
  Reaction rate: dξ/dt, or (1/V) dξ/dt = ν_i⁻¹ dc_i/dt; unit: mol s⁻¹, mol cm⁻³ s⁻¹ or molecule cm⁻³ s⁻¹ 
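The practical content of the stoichiometry-based definition is that every species yields the same reaction rate once its concentration derivative is divided by its stoichiometric coefficient. The Python sketch below demonstrates this for a hypothetical reaction 2A + B = 3C; the reaction, the species names and the numerical rate are assumptions made only for illustration.

```python
def reaction_rate(nu_i, dci_dt):
    """Return v_c = (1/nu_i) * dc_i/dt for one species of the stoichiometric equation."""
    if nu_i == 0:
        raise ValueError("species does not take part in the stoichiometric equation")
    return dci_dt / nu_i


# Hypothetical stoichiometric equation 0 = -2 A - 1 B + 3 C
# (negative coefficients for reactants, positive for products).
stoichiometric_coefficients = {"A": -2, "B": -1, "C": 3}

assumed_rate = 1.5e-4  # mol dm^-3 s^-1, illustrative value

# Concentration derivatives that would be observed for each species.
concentration_derivatives = {
    species: nu * assumed_rate for species, nu in stoichiometric_coefficients.items()
}

# Dividing by the stoichiometric coefficient recovers one common rate.
recovered_rates = {
    species: reaction_rate(stoichiometric_coefficients[species], dcdt)
    for species, dcdt in concentration_derivatives.items()
}

for species, rate in recovered_rates.items():
    print(f"{species}: dc/dt = {concentration_derivatives[species]:+.2e}, v_c = {rate:.2e}")
```

Multiplying the stoichiometric equation by a common factor rescales every coefficient and therefore the rate itself, which is why the rate is defined only relative to a stated stoichiometric equation.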

Figure A3.4.1 shows as an example the time-dependent concentrations and entropy for the simple 
decomposition reaction of chloroethane: 

C₂H₅Cl = C₂H₄ + HCl.    (A3.4.8) 


The slopes of the functions shown provide the reaction rates according to the various definitions under the 
reaction conditions specified in the figure caption. These slopes are similar, but not identical (nor exactly 
proportional), in this simple case. In more complex cases, such as oscillatory reactions (chapter A3.14 and 
chapter C3.6), the simple definition of an overall rate law through equation (A3.4.6) loses its usefulness, 
whereas equation (A3.4.1) could still be used for an isolated system. 

Figure A3.4.1. Concentration and entropy as functions of time for reaction equation (A3.4.8). $S_m$ is the 
maximum value of the entropy ([20]). 


A3.4.3 EMPIRICAL RATE LAWS AND REACTION ORDER 

A general form of the 'rate law', i.e. the differential equation for the concentrations, is given by 

$v_c(t) = \nu_i^{-1}\,\dfrac{\mathrm{d}c_i}{\mathrm{d}t} = f(c_i, c_j, c_k, \ldots).$    (A3.4.9) 

The functional dependence of the reaction rate on concentrations may be arbitrarily complicated and include 
species not appearing in the stoichiometric equation, for example, catalysts, inhibitors, etc. Sometimes, 
however, it takes a particularly simple form, for example, under certain conditions for elementary reactions 
and for other relatively simple reactions: 

$v_c(t) = k \prod_i c_i^{m_i}$    (A3.4.10) 

with a concentration-independent and frequently time-independent 'rate coefficient' or 'rate constant' $k$. $m_i$ is 
the order of the reaction with respect to the species $\mathrm{B}_i$ and the total order of the reaction $m$ is given by 

$m = \sum_i m_i$    (A3.4.11) 

where $m$ and $m_i$ are real numbers. Table A3.4.2 summarizes a few examples of such rate laws. In general, one 
may allow for rate coefficients that depend on time (but not on concentration) [11]. 


Table A3.4.2 Rate laws, reaction order, and rate constants. 

If certain species are present in large excess, their concentration stays approximately constant during the 
course of a reaction. In this case the dependence of the reaction rate on the concentration of these species can 
be included in an effective rate constant $k_{\mathrm{eff}}$. The dependence on the concentrations of the remaining species 
then defines the apparent order of the reaction. Take for example equation (A3.4.10) with $c_{i>1} \gg c_1$. The 
result would be a pseudo $m_1$th order effective rate law: 

$v_c(t) = k_{\mathrm{eff}}\,c_1^{m_1}$    (A3.4.12) 

$k_{\mathrm{eff}} = k \prod_{i>1} c_i^{m_i}.$    (A3.4.13) 


This is the situation exploited by the so-called isolation method to determine the order of the reaction with 
respect to each species (see chapter B2.1). It should be stressed that the rate coefficient $k$ in (A3.4.10) depends 
upon the definition of the $\nu_i$ in the stoichiometric equation. It is a conventionally defined quantity to within 
multiplication of the stoichiometric equation by an arbitrary factor (similar to reaction enthalpy). 


The definitions of the empirical rate laws given above do not exclude empirical rate laws of another form. 
Examples are reactions where a reverse reaction is important, such as in the cis-trans isomerization of 
1,2-dichloroethene: 

cis-C₂H₂Cl₂ = trans-C₂H₂Cl₂    (A3.4.14) 

$-\dfrac{\mathrm{d}[cis]}{\mathrm{d}t} = k_a[cis] - k_b[trans]$    (A3.4.15) 

or the classic example of hydrogen bromide formation: 

½H₂ + ½Br₂ = HBr    (A3.4.16) 

$v_c = k_a\,[\mathrm{H_2}][\mathrm{Br_2}]^{1/2}\left(1 + k_b\,\dfrac{[\mathrm{HBr}]}{[\mathrm{Br_2}]}\right)^{-1}.$    (A3.4.17) 

Neither (A3.4.15) nor (A3.4.17) is of the form (A3.4.10) and thus neither reaction order nor a unique rate 
coefficient can be defined. Indeed, the number of possible rate laws that are not of the form of (A3.4.10) 
greatly exceeds those cases following (A3.4.10). However, certain particularly simple reactions necessarily 
follow a law of the type of (A3.4.10). They are particularly important from a mechanistic point of view and are 
discussed in the next section. 


A3.4.4 ELEMENTARY REACTIONS AND MOLECULARITY 

Sometimes the reaction orders $m_i$ take on integer values. This is generally the case if a chemical reaction 

A + B → products    (A3.4.18) 

or 

2A + B → products    (A3.4.19) 

takes place on a microscopic scale through direct interactions between particles as implied by equation 
(A3.4.18) or equation (A3.4.19). Thus, the coefficients of the substances in (A3.4.18) and (A3.4.19) represent 
the actual number of particles involved in the reaction, rather than just the stoichiometric coefficients. To keep 
the distinction clear we shall reserve the reaction arrow '→' for such elementary reactions. Sometimes the 
inclusion of the reverse elementary reaction will be signified by a double arrow '⇌'. Other, compound 
reactions can always be decomposed into a set of (not necessarily consecutive) elementary steps 
representing the reaction mechanism. 


Elementary reactions are characterized by their molecularity, to be clearly distinguished from the reaction 
order. We distinguish uni- (or mono-), bi-, and trimolecular reactions depending on the number of particles 
involved in the 'essential' step of the reaction. There is some looseness in what is to be considered 'essential', 
but in gas kinetics the definitions usually are clearcut through the number of particles involved in a reactive 
collision; plus, perhaps, an additional convention as is customary in unimolecular reactions. 

A3.4.4.1 UNIMOLECULAR REACTIONS 

Strictly unimolecular processes (sometimes also called monomolecular) involve only a single particle: 

A → products.    (A3.4.20) 

Classic examples are the spontaneous emission of light or spontaneous radioactive decay. In chemistry, an 
important class of monomolecular reactions is the predissociation of metastable (excited) species. An example 
is the formation of oxygen atoms in the upper atmosphere by predissociation of electronically excited O₂ 
molecules [12, 13 and 14]: 

O₂* → 2O.    (A3.4.21) 

Excited O₂* molecules are formed by UV light absorption. Monomolecular reactions 
(e.g., c = [O₂*]) show a first-order rate law: 

$-\dfrac{\mathrm{d}c}{\mathrm{d}t} = k\,c(t).$    (A3.4.22) 

Integration of the differential equation with time-independent $k$ leads to the familiar exponential decay: 

$c(t) = c(0)\exp(-kt).$    (A3.4.23) 

The rate constant in this case is of the order of 10 s depending on the rovibronic level considered. 
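The first-order law and its integrated form are easy to check numerically. The Python sketch below evaluates the exponential decay of equation (A3.4.23) and the corresponding half-life $\ln 2/k$; the initial concentration and the rate constant are illustrative assumptions and do not refer to a measured system.

```python
import math


def first_order_concentration(c0, k, t):
    """Integrated first-order rate law: c(t) = c(0) * exp(-k*t)."""
    return c0 * math.exp(-k * t)


def half_life(k):
    """Time after which a first-order decay has halved the concentration."""
    return math.log(2.0) / k


# Illustrative (assumed) values.
c0 = 1.0e-9        # initial concentration, mol cm^-3
k = 5.0e7          # first-order rate constant, s^-1

t_half = half_life(k)
c_at_half_life = first_order_concentration(c0, k, t_half)
print(f"half-life = {t_half:.3e} s, c(t_half)/c(0) = {c_at_half_life / c0:.3f}")
```

The half-life is independent of the initial concentration, which is a convenient experimental signature of first-order kinetics.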

Another example of current interest is the vibrational predissociation of hydrogen bonded complexes such as 
(HF)₂: 

(HF)₂(*) → 2HF.    (A3.4.24) 


With one quantum of non-bonded (HF)-stretching excitation (*) the internal energy (≈50 kJ mol⁻¹) is about 
four times in excess of the hydrogen bond dissociation energy (12.7 kJ mol⁻¹). At this energy the rate constant 
is about k ≈ 5 × 10⁷ s⁻¹ [15]. With two quanta of (HF)-stretching (at about seven times the dissociation 
energy) the rate constant is k ≈ 7.5 × 10⁸ s⁻¹ in all cases, depending on the rovibrational level considered [16, 
17]. 

While monomolecular collision-free predissociation excludes the preparation process from explicit 
consideration, thermal unimolecular reactions involve collisional excitation as part of the unimolecular 
mechanism. The simple mechanism for a thermal chemical reaction may be formally decomposed into three 
(possibly reversible) steps (with rovibronically excited (CH₃NC)*): 

CH₃NC + M ⇌ (CH₃NC)* + M    (A3.4.25) 

(CH₃NC)* ⇌ (CH₃CN)*    (A3.4.26) 

(CH₃CN)* + M ⇌ CH₃CN + M.    (A3.4.27) 

The inert collision partner M is assumed to be present in large excess: 

[M] ≫ [CH₃NC]    (A3.4.28) 

[M] ≈ constant.    (A3.4.29) 

This mechanism as a whole is called 'unimolecular' since the essential isomerization step, equation (A3.4.26), 
only involves a single particle, viz. CH₃NC. Therefore it is often simply written as follows: 

$\mathrm{CH_3NC} \xrightarrow{[\mathrm{M}]} \mathrm{CH_3CN}.$    (A3.4.30) 

Experimentally, one finds the same first-order rate law as for monomolecular reactions, but with an effective 
rate constant $k$ that now depends on [M] (with c = [CH₃NC]): 

$-\dfrac{\mathrm{d}c}{\mathrm{d}t} = k([\mathrm{M}])\,c(t).$    (A3.4.31) 

The correct treatment of the mechanism (equation (A3.4.25), equation (A3.4.26) and equation (A3.4.27)), 
which goes back to Lindemann [18] and Hinshelwood [19], also describes the pressure dependence of the 
effective rate constant in the low-pressure limit ([M] < [CH₃NC], see section A3.4.8.2). 


The unimolecular rate law can be justified by a probabilistic argument. The number ($N_A V\,\mathrm{d}c \propto \mathrm{d}c$) of 
particles which react in a time $\mathrm{d}t$ is proportional both to this same time interval $\mathrm{d}t$ and to the number of 
particles present ($N_A V c \propto c$). However, this probabilistic argument need not always be valid, as illustrated in 
figure A3.4.2 for a simple model [20]: 

A number of particles perform periodic rotations in a ring-shaped container with a small opening, through 
which some particles can escape. Two situations can now be distinguished. 




Figure A3.4.2. A simple illustration of limiting dynamical behaviour: case 1 statistical, case 2 coherent (after 
[20]). 

Case 1. The particles are statistically distributed around the ring. Then, the number of escaping 
particles will be proportional both to the time interval (opening time) dt and to the total number of 
particles in the container. The result is a first-order rate law. 

Case 2. The particles rotate in small packets ('coherently' or 'in phase'). Obviously, the first-order 
rate law no longer holds. In chapter B2.1 we shall see that this simple consideration has found a 
deeper meaning in some of the most recent kinetic investigations [21]. 

A3.4.4.2 BIMOLECULAR REACTIONS 

Bimolecular reactions involve two particles in their essential step. In the so-called self-reactions they are of 
the same species: 


A + A → products    (A3.4.32) 

with the stoichiometric equation 

2A = products.    (A3.4.33) 

Typical examples are radical recombinations: 

CH₃ + CH₃ → (C₂H₆)*    (A3.4.34) 

(C₂H₆)* + M → C₂H₆ + M.    (A3.4.35) 

Here the initially formed excited species (C₂H₆)* is sufficiently long lived that the deactivation step (equation 
(A3.4.35)) is not essential and one writes 

CH₃ + CH₃ → C₂H₆.    (A3.4.36) 

The rate is given by the second-order law (c = [CH₃] or c = [A]) 

$-\dfrac{1}{2}\,\dfrac{\mathrm{d}c}{\mathrm{d}t} = k c^2.$    (A3.4.37) 

Integration leads to 

$\dfrac{1}{c(t)} = 2kt + \dfrac{1}{c(0)}.$    (A3.4.38) 
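The integrated second-order law can be compared with a direct numerical integration of the differential rate law. The Python sketch below does this with a hand-written fourth-order Runge-Kutta step. The initial concentration and the observation time are illustrative assumptions; the rate constant is of the magnitude quoted in this section for methyl radical recombination.

```python
def concentration_self_reaction(c0, k, t):
    """Integrated rate law for A + A -> products: 1/c(t) = 2*k*t + 1/c(0)."""
    return 1.0 / (2.0 * k * t + 1.0 / c0)


def integrate_rate_law(c0, k, t_end, steps=2000):
    """Integrate dc/dt = -2*k*c^2 with a classical fourth-order Runge-Kutta scheme."""

    def dcdt(c):
        return -2.0 * k * c * c

    h = t_end / steps
    c = c0
    for _ in range(steps):
        s1 = dcdt(c)
        s2 = dcdt(c + 0.5 * h * s1)
        s3 = dcdt(c + 0.5 * h * s2)
        s4 = dcdt(c + h * s3)
        c += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return c


# Illustrative (assumed) conditions.
c0 = 1.0e-8       # initial radical concentration, mol cm^-3
k = 2.6e13        # second-order rate constant, cm^3 mol^-1 s^-1
t_end = 1.0e-5    # observation time, s

analytic = concentration_self_reaction(c0, k, t_end)
numeric = integrate_rate_law(c0, k, t_end)
print(f"analytic c = {analytic:.4e}, numeric c = {numeric:.4e} mol cm^-3")
```

A plot of 1/c against t is a straight line of slope 2k, which is the graphical test of second-order behaviour.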

Bimolecular reactions between different species 

A + B → products    (A3.4.39) 

lead to the second-order rate law 

$-\dfrac{\mathrm{d}c_A}{\mathrm{d}t} = k\,c_A c_B.$    (A3.4.40) 

For $c_B(0) \neq c_A(0)$ the solution of this differential equation is 

$\ln\dfrac{c_B(t)}{c_A(t)} - \ln\dfrac{c_B(0)}{c_A(0)} = \left(c_B(0) - c_A(0)\right) k t.$    (A3.4.41) 

The case of equal concentrations, $c_B = c_A = c(t)$, is similar to the case A + A in equation (A3.4.37), except for 
the stoichiometric factor of two. The result thus is 

$\dfrac{1}{c(t)} = kt + \dfrac{1}{c(0)}.$    (A3.4.42) 

If one of the reactants is present in large excess, $c_B \gg c_A$, its concentration will essentially remain constant 
throughout the reaction. Equation (A3.4.41) then simplifies to 

$c_A(t) = c_A(0)\exp(-k_{\mathrm{eff}}\,t)$    (A3.4.43) 

with the effective pseudo first-order rate constant $k_{\mathrm{eff}} = k c_B$. 
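The pseudo first-order limit can be checked against the exact solution. Since each reactive event consumes one A and one B, $c_B(t) - c_A(t)$ stays equal to its initial value, and equation (A3.4.41) can then be solved explicitly for $c_A(t)$. The Python sketch below compares that expression with the exponential approximation; the concentrations and rate constant are dimensionless illustrative assumptions.

```python
import math


def concentration_a_exact(ca0, cb0, k, t):
    """Exact c_A(t) for A + B -> products with unequal initial concentrations.

    Uses ln(c_B/c_A) = ln(c_B0/c_A0) + (c_B0 - c_A0)*k*t together with
    c_B - c_A = c_B0 - c_A0 (one A and one B are consumed per event).
    """
    delta = cb0 - ca0
    if delta == 0.0:
        raise ValueError("use the equal-concentration formula 1/c = k*t + 1/c0")
    ratio = (cb0 / ca0) * math.exp(delta * k * t)
    return delta / (ratio - 1.0)


def concentration_a_pseudo_first_order(ca0, cb0, k, t):
    """Approximation for c_B >> c_A: c_A(t) = c_A(0)*exp(-k_eff*t) with k_eff = k*c_B."""
    k_eff = k * cb0
    return ca0 * math.exp(-k_eff * t)


# Illustrative (assumed) dimensionless values with B in thousandfold excess.
ca0 = 1.0
cb0 = 1000.0
k = 1.0e-3
t = 1.0   # chosen so that k_eff * t = 1

exact = concentration_a_exact(ca0, cb0, k, t)
approximate = concentration_a_pseudo_first_order(ca0, cb0, k, t)
relative_difference = abs(exact - approximate) / exact
print(f"exact = {exact:.6f}, pseudo first-order = {approximate:.6f}")
```

The residual difference shrinks as the excess of B grows, because the neglected consumption of B is at most $c_A(0)/c_B(0)$ of its initial concentration.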

One may justify the differential equation (A3.4.37) and equation (A3.4.40) again by a probability argument. 
The number of reacting particles $N_A V\,\mathrm{d}c \propto \mathrm{d}c$ is proportional to the frequency of encounters between two 
particles and to the time interval $\mathrm{d}t$. Since not every encounter leads to reaction, an additional reaction 
probability $P_R$ has to be introduced. The frequency of encounters is obtained by the following simple 
argument. Assuming a statistical distribution of particles, the probability for a given particle to occupy a 
volume element $\delta V$ is proportional to the concentration $c$. If the particles move independently from each other 
(ideal behaviour) the same is true for a second particle. Therefore the probability for two particles to occupy 
the same volume element (an encounter) is proportional to $c^2$. This leads to the number of particles reacting in 
the time interval $\mathrm{d}t$: 

$N_A V\,\mathrm{d}c \propto P_R\,c^2\,\mathrm{d}t.$    (A3.4.44) 


In the case of bimolecular gas-phase reactions, 'encounters' are simply collisions between two molecules in 
the framework of the general collision theory of gas-phase reactions ( section A3.4.5.2 ). For a random thermal 
distribution of positions and momenta in an ideal gas reaction, the probabilistic reasoning has an exact 
foundation. However, as noted in the case of unimolecular reactions, in principle one must allow for 
deviations from this ideal behaviour and, thus, from the simple rate law, although in practice such deviations 
are rarely taken into account theoretically or established empirically. 

The second-order rate law for bimolecular reactions is empirically well confirmed. Figure A3.4.3 shows the 
example of methyl radical recombination (equation (A3.4.36)) in a graphical representation following 
equation (A3.4.38) [22, 23 and 24]. For this example the bimolecular rate constant is 

k = 4.4 × 10⁻¹¹ cm³ s⁻¹    (A3.4.45) 

or 

k = 2.6 × 10¹³ cm³ mol⁻¹ s⁻¹.    (A3.4.46) 


It is clear from figure A3.4.3 that the second-order law is well followed. However, in particular for 
recombination reactions at low pressures, a transition to a third-order rate law (second order in the 
recombining species and first order in some collision partner) must be considered. If the non-reactive collision 
partner M is present in excess and its concentration [M] is time-independent, the rate law still is pseudo-second 
order with an effective second-order rate coefficient proportional to [M]. 

Figure A3.4.3. Methyl radical recombination as a second-order reaction (after [22, 23]). 
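The two values quoted for the methyl recombination rate constant differ only in the concentration unit: the per-particle value multiplied by Avogadro's constant gives the molar value. The Python sketch below performs this conversion. The function name is a hypothetical helper introduced here for illustration.

```python
AVOGADRO_CONSTANT = 6.02214076e23  # mol^-1, exact in the current SI


def per_particle_to_molar(k_per_particle):
    """Convert a bimolecular rate constant from cm^3 s^-1 to cm^3 mol^-1 s^-1."""
    return k_per_particle * AVOGADRO_CONSTANT


k_particle = 4.4e-11                      # cm^3 s^-1, value quoted in (A3.4.45)
k_molar = per_particle_to_molar(k_particle)
print(f"k = {k_molar:.2e} cm^3 mol^-1 s^-1")
```

The converted value agrees with equation (A3.4.46) to within the two significant figures given in the text.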


A3.4.4.3 TRIMOLECULAR REACTIONS 

Trimolecular reactions require the simultaneous encounter of three particles. At the usually low particle 
densities of gas phase reactions they are relatively unlikely. Examples for trimolecular reactions are atom 
recombination reactions 


A + B+M^AB + M (A3.4.47) 

with the stoichiometric equation 

A+B = AB. (A3.4.48) 

In contrast to the bimolecular recombination of polyatomic radicals (equation (A3.4.34)) there is no long-lived intermediate AB* since there are no extra intramolecular vibrational degrees of freedom to accommodate the excess energy. Therefore, the formation of the bond and the deactivation through collision with the inert collision partner M have to occur simultaneously (within 10-100 fs). The rate law for trimolecular recombination reactions of the type in equation (A3.4.47) is given by

$$-\frac{\mathrm{d}c_\mathrm{A}}{\mathrm{d}t} = k[\mathrm{M}]c_\mathrm{A}c_\mathrm{B} \tag{A3.4.49}$$

as can be derived by a probability argument similar to bimolecular reactions (and with similar limitations). Generally, collisions with different collision partners $\mathrm{M}_i$ may have quite different efficiencies. The rate law actually observed is therefore given by

$$-\frac{\mathrm{d}c_\mathrm{A}}{\mathrm{d}t} = \sum_i k_i[\mathrm{M}_i]c_\mathrm{A}c_\mathrm{B}. \tag{A3.4.50}$$

If the dominant contributions $k_i[\mathrm{M}_i]$ are approximately constant, this leads to pseudo second-order kinetics with an effective rate constant

$$k_\mathrm{eff} = \sum_i k_i[\mathrm{M}_i]. \tag{A3.4.51}$$


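The recombination of oxygen atoms affords an instructive example:

$$\mathrm{O + O + O} \xrightarrow{k_\mathrm{O}} \mathrm{O_2 + O} \tag{A3.4.52}$$

$$\mathrm{O + O + O_2} \xrightarrow{k_{\mathrm{O}_2}} \mathrm{O_2 + O_2} \tag{A3.4.53}$$

with the common stoichiometric equation

$$2\mathrm{O} = \mathrm{O_2}. \tag{A3.4.54}$$

Here $k_\mathrm{O} \gg k_{\mathrm{O}_2}$ because (A3.4.52) proceeds through a highly excited molecular complex $\mathrm{O_3^*}$ with particularly efficient redistribution pathways for the excess energy. As long as $[\mathrm{O}] > [\mathrm{O_2}]$ the rate law for this trimolecular reaction is given by ($c(t) = [\mathrm{O}]$, $k = k_\mathrm{O}$):

$$-\frac{1}{2}\frac{\mathrm{d}c}{\mathrm{d}t} = kc^3. \tag{A3.4.55}$$

Integration leads to

$$\frac{1}{c(t)^2} = 4kt + \frac{1}{c(0)^2}. \tag{A3.4.56}$$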
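The integrated third-order law can be checked numerically. The following Python sketch integrates $\mathrm{d}c/\mathrm{d}t = -2kc^3$ with a classical Runge-Kutta scheme and compares the result with the closed form $1/c(t)^2 = 4kt + 1/c(0)^2$. The function names, the integrator and the dimensionless example values are illustrative assumptions, not part of the original treatment.

```python
import math


def closed_form(c0, k, t):
    """Concentration from the integrated law 1/c(t)**2 = 4*k*t + 1/c0**2."""
    return 1.0 / math.sqrt(4.0 * k * t + 1.0 / c0**2)


def rk4(c0, k, t_end, steps=10_000):
    """Integrate dc/dt = -2*k*c**3 with the classical fourth-order Runge-Kutta scheme."""
    def rate(c):
        return -2.0 * k * c**3

    dt = t_end / steps
    c = c0
    for _ in range(steps):
        s1 = rate(c)
        s2 = rate(c + 0.5 * dt * s1)
        s3 = rate(c + 0.5 * dt * s2)
        s4 = rate(c + dt * s3)
        c += dt * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return c


# Dimensionless example values (assumed for illustration only).
c0, k, t_end = 1.0, 1.0, 2.0
analytic = closed_form(c0, k, t_end)
numeric = rk4(c0, k, t_end)
print(analytic, numeric)
```

A plot of $1/c^2$ against $t$ should then be a straight line with slope $4k$, which is the usual graphical test for this rate law.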


Trimolecular reactions have also been discussed for molecular reactions, postulating concerted reactions via cyclic intermediate complexes, for example

$$2\mathrm{NO} + \mathrm{O_2} \rightarrow [\text{cyclic complex}] \rightarrow 2\mathrm{NO_2}. \tag{A3.4.57}$$

Empirically, one indeed finds a third-order rate law

$$-\frac{1}{2}\frac{\mathrm{d}[\mathrm{NO}]}{\mathrm{d}t} = k[\mathrm{NO}]^2[\mathrm{O_2}]. \tag{A3.4.58}$$


However, the postulated trimolecular mechanism is highly questionable. The third-order rate law would also be consistent with mechanisms arising from consecutive bimolecular elementary reactions, such as

$$\mathrm{NO + NO \rightleftharpoons (NO)_2} \tag{A3.4.59}$$

$$\mathrm{(NO)_2 + O_2 \rightarrow 2NO_2} \tag{A3.4.60}$$

or

$$\mathrm{NO + O_2 \rightleftharpoons NO_3} \tag{A3.4.61}$$

$$\mathrm{NO_3 + NO \rightarrow 2NO_2}. \tag{A3.4.62}$$

In fact, the bimolecular mechanisms are generally more likely. Even the atom recombination reactions sometimes follow a mechanism consisting of a sequence of bimolecular reactions

$$\mathrm{A + M \rightleftharpoons AM} \tag{A3.4.63}$$

$$\mathrm{AM + A \rightarrow A_2 + M}. \tag{A3.4.64}$$

This so-called complex mechanism has occasionally been proven to apply [25, 26].

A3.4.5 THEORY OF ELEMENTARY GAS-PHASE REACTIONS

A3.4.5.1 GENERAL THEORY

The foundations of the modern theory of elementary gas-phase reactions lie in time-dependent molecular quantum dynamics and molecular scattering theory, which provides the link between time-dependent quantum dynamics and chemical kinetics (see also chapter A3.11). A brief outline of the steps in the development is as follows [27].

We start from the time-dependent Schrödinger equation for the state function (wave function) $\Psi(t)$ of the reactive molecular system with Hamiltonian operator $\hat{H}$:

$$\mathrm{i}\hbar\frac{\partial\Psi(t)}{\partial t} = \hat{H}\Psi(t). \tag{A3.4.65}$$

Its solution can be written in terms of the time evolution operator $\hat{U}$

$$\Psi(t) = \hat{U}(t, t_0)\Psi(t_0) \tag{A3.4.66}$$

which satisfies a similar differential equation

$$\mathrm{i}\hbar\frac{\partial\hat{U}}{\partial t} = \hat{H}\hat{U}. \tag{A3.4.67}$$

For time-independent Hamiltonians we have

$$\hat{U}(t, t_0) = \exp[-\mathrm{i}\hat{H}(t - t_0)/\hbar]. \tag{A3.4.68}$$

For strictly monomolecular processes the general theory would now proceed by analysing the time-dependent wavefunction as a function of space (and perhaps spin) coordinates $\{q_i\}$ of the particles in terms of time-dependent probability densities

$$P(\{q_i\}, t) = |\Psi(\{q_i\}, t)|^2 \tag{A3.4.69}$$

which are integrated over appropriate regions of coordinate space assigned to reactants and products. These time-dependent probabilities can be associated with time-dependent concentrations, reaction rates and, if applicable, rate coefficients.

For thermal unimolecular reactions with bimolecular collisional activation steps and for bimolecular reactions, more specifically one takes the limit of the time evolution operator for $t_0 \to -\infty$ and $t \to +\infty$ to describe isolated binary collision events. The corresponding matrix representation of $\hat{U}$ is called the scattering matrix or S-matrix with matrix elements

$$S_{fi} = U_{fi}(t \to +\infty, t_0 \to -\infty). \tag{A3.4.70}$$

The physical interpretation of the scattering matrix elements is best understood in terms of its square modulus

$$P_{fi} = |S_{fi}|^2 \tag{A3.4.71}$$

which is the transition probability between an initial fully specified quantum state $|i\rangle$ before the collision and a final quantum state $|f\rangle$ after the collision.

In a third step the S-matrix is related to state-selected reaction cross sections $\sigma_{fi}$, in principle observable in beam scattering experiments [28, 29, 30, 31, 32, 33, 34 and 35], by the fundamental equation of scattering theory

$$\sigma_{fi} = \frac{\pi}{k_i^2}|\delta_{fi} - S_{fi}|^2. \tag{A3.4.72}$$

Here $\delta_{fi} = 1\,(0)$ is the Kronecker delta for $f = i$ ($f \neq i$) and $k_i$ is the wavenumber for the collision, related to the initial relative centre of mass translational energy $E_{\mathrm{t},i}$ before the collision

$$k_i = \hbar^{-1}\sqrt{2\mu E_{\mathrm{t},i}} \tag{A3.4.73}$$

with reduced mass $\mu$ for the collision partners of mass $m_\mathrm{A}$ and $m_\mathrm{B}$:

$$\mu = \frac{m_\mathrm{A}m_\mathrm{B}}{m_\mathrm{A} + m_\mathrm{B}}. \tag{A3.4.74}$$

Actually equation (A3.4.72) for $\sigma_{fi}$ is still formal, as practically observable cross sections, even at the highest quantum state resolution usually available in molecular scattering, correspond to certain sums and averages of the individual $\sigma_{fi}$. We use capital indices for such coarse-grained state-selected cross sections

$$\sigma_{FI} = \langle\sigma_{fi}\rangle. \tag{A3.4.75}$$

In a fourth step the cross section is related to a state-selected specific bimolecular rate coefficient

$$k_{FI}(E_\mathrm{t}) = \sigma_{FI}(E_\mathrm{t})\sqrt{2E_\mathrm{t}/\mu}. \tag{A3.4.76}$$

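This rate coefficient can be averaged in a fifth step over a translational energy distribution $P(E_\mathrm{t})$ appropriate for the bulk experiment. In principle, any distribution $P(E_\mathrm{t})$ as applicable in the experiment can be introduced at this point. If this distribution is a thermal Maxwell-Boltzmann distribution one obtains a partially state-selected thermal rate coefficient

$$k_{FI}(T) = \left(\frac{8k_\mathrm{B}T}{\pi\mu}\right)^{1/2}\int_0^\infty \sigma_{FI}(E_\mathrm{t})\,\frac{E_\mathrm{t}}{k_\mathrm{B}T}\exp\left(-\frac{E_\mathrm{t}}{k_\mathrm{B}T}\right)\frac{\mathrm{d}E_\mathrm{t}}{k_\mathrm{B}T}. \tag{A3.4.77}$$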


In a final, sixth step one may also average (sum) over a thermal (or other) quantum state distribution $I$ (and $F$) and obtain the usual thermal rate coefficient

$$k(T) = \langle k_{FI}(T)\rangle_{F,I}. \tag{A3.4.78}$$

Figure A3.4.4 summarizes these steps in one scheme. Different theories of elementary reactions represent different degrees of approximations to certain averages, which are observed in experiments.

There are two different aspects to these approximations. One consists in the approximate treatment of the 
underlying many-body quantum dynamics; the other, in the statistical approach to observable average 
quantities. An exhaustive discussion of different approaches would go beyond the scope of this introduction. 
Some of the most important aspects are discussed in separate chapters (see chapter A3.7, chapter A3.11, chapter A3.12, chapter A3.13).

Here, we shall concentrate on basic approaches which lie at the foundations of the most widely used models. Simplified collision theories for bimolecular reactions are frequently used for the interpretation of experimental gas-phase kinetic data. The general transition state theory of elementary reactions forms the starting point of many more elaborate versions of quasi-equilibrium theories of chemical reaction kinetics [27, 36, 37 and 38].

Figure A3.4.4. Steps in the general theory of chemical reactions.

In practice, one of the most important aspects of interpreting experimental kinetic data in terms of model parameters concerns the temperature dependence of rate constants. It can often be described phenomenologically by the Arrhenius equation [39, 40 and 41]

$$k(T) = A(T)\exp[-E_\mathrm{A}(T)/RT] \tag{A3.4.79}$$

where the pre-exponential Arrhenius factor $A$ and the Arrhenius activation energy $E_\mathrm{A}$ generally depend on the temperature. $R$ is the gas constant. This leads to the definition of the Arrhenius parameters:

$$E_\mathrm{A}(T) = RT^2\frac{\mathrm{d}\ln k(T)}{\mathrm{d}T} \tag{A3.4.80}$$

$$A(T) = k(T)\exp\left[\frac{E_\mathrm{A}(T)}{RT}\right]. \tag{A3.4.81}$$


The usefulness of these definitions is related to the usually weak temperature dependence of E A and A. In the 
simplest models they are constant, whereas k(T) shows a very strong temperature dependence. 
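
The definitions of the Arrhenius parameters can be applied to any tabulated or modelled $k(T)$. The Python sketch below evaluates $E_\mathrm{A}(T) = RT^2\,\mathrm{d}\ln k/\mathrm{d}T$ by a central finite difference and then obtains $A(T) = k(T)\exp[E_\mathrm{A}/RT]$. The model rate constant (a $T^{1/2}\exp(-E_0/RT)$ form), its prefactor and the step size are assumptions chosen only for illustration.

```python
import math

R = 8.314462618  # gas constant in J mol^-1 K^-1


def arrhenius_parameters(k, T, rel_step=1e-4):
    """Return (A, E_A) at temperature T for a rate-constant function k(T).

    E_A = R*T**2 * d ln k / dT is estimated with a central difference,
    and A = k(T) * exp(E_A / (R*T)).
    """
    dT = T * rel_step
    dlnk_dT = (math.log(k(T + dT)) - math.log(k(T - dT))) / (2.0 * dT)
    e_a = R * T**2 * dlnk_dT
    a = k(T) * math.exp(e_a / (R * T))
    return a, e_a


E0 = 50_000.0  # hypothetical threshold energy in J mol^-1


def k_model(T):
    """Hypothetical model: k(T) proportional to sqrt(T) * exp(-E0/(R*T))."""
    return 1.0e-10 * math.sqrt(T / 300.0) * math.exp(-E0 / (R * T))


T = 500.0
A_fit, E_A_fit = arrhenius_parameters(k_model, T)
print(A_fit, E_A_fit)
```

For this model the analytical result is $E_\mathrm{A} = E_0 + RT/2$, so the activation energy is only weakly temperature dependent while $k(T)$ itself varies strongly.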


A3.4.5.2 SIMPLE COLLISION THEORIES OF BIMOLECULAR REACTIONS 

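A bimolecular reaction can be regarded as a reactive collision with a reaction cross section $\sigma$ that depends on the relative translational energy $E_\mathrm{t}$ of the reactant molecules A and B (masses $m_\mathrm{A}$ and $m_\mathrm{B}$). The specific rate constant $k(E_\mathrm{t})$ can thus formally be written in terms of an effective reaction cross section $\sigma$, multiplied by the relative centre of mass velocity $v_\mathrm{rel}$

$$k(E_\mathrm{t}) = \sigma(E_\mathrm{t})v_\mathrm{rel} = \sigma(E_\mathrm{t})\sqrt{2E_\mathrm{t}/\mu}. \tag{A3.4.82}$$

Simple collision theories neglect the internal quantum state dependence of $\sigma$. The rate constant as a function of temperature $T$ results as a thermal average over the Maxwell-Boltzmann velocity distribution $p(E_\mathrm{t})$:

$$k(T) = \int_0^\infty p(E_\mathrm{t})k(E_\mathrm{t})\,\mathrm{d}E_\mathrm{t} = \langle v_\mathrm{rel}\rangle\langle\sigma\rangle. \tag{A3.4.83}$$

Here one has the thermal average centre of mass velocity

$$\langle v_\mathrm{rel}\rangle = \sqrt{\frac{8k_\mathrm{B}T}{\pi\mu}} \tag{A3.4.84}$$

and the thermally averaged reaction cross section

$$\langle\sigma\rangle = \int_0^\infty \sigma(E_\mathrm{t})\,\frac{E_\mathrm{t}}{k_\mathrm{B}T}\exp\left(-\frac{E_\mathrm{t}}{k_\mathrm{B}T}\right)\frac{\mathrm{d}E_\mathrm{t}}{k_\mathrm{B}T}. \tag{A3.4.85}$$

We use the symbol $k_\mathrm{B}$ for Boltzmann's constant to distinguish it from the rate constant $k$. Equation (A3.4.85) defines the thermal average reaction cross section $\langle\sigma\rangle$.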

In principle, the reaction cross section not only depends on the relative translational energy, but also on individual reactant and product quantum states. Its sole dependence on $E_\mathrm{t}$ in the simplified effective expression (equation (A3.4.82)) already implies unspecified averages over reactant states and sums over product states. For practical purposes it is therefore appropriate to consider simplified models for the energy dependence of the effective reaction cross section. They often form the basis for the interpretation of the temperature dependence of thermal cross sections. Figure A3.4.5 illustrates several cross section models.


Figure A3.4.5. Simple models for effective collision cross sections $\sigma$: hard sphere without threshold (dotted line), hard sphere with threshold (dashed line) and hyperbolic threshold (full curve). $E_\mathrm{t}$ is the (translational) collision energy and $E_0$ is the threshold energy. $\sigma_0$ is the hard sphere collision cross section. The dashed-dotted curve is of the generalized type $\sigma_\mathrm{R}(E_\mathrm{t} > E_0) = \sigma_0(1 - E_0/E_\mathrm{t})\exp[(1 - E_\mathrm{t}/E_0)/(aE_0)]$ with the parameter $a = 3E_0^{-1}$.


(A) HARD SPHERE COLLISIONS

The reactants are considered as hard spheres with radii $r_\mathrm{A}$ and $r_\mathrm{B}$, respectively. A (reactive) collision occurs on contact yielding a constant cross section $\sigma_0$ independent of the energy:

$$\sigma = \sigma_0 = \pi(r_\mathrm{A} + r_\mathrm{B})^2 \tag{A3.4.86}$$

$$k(T) = \sigma_0\sqrt{\frac{8k_\mathrm{B}T}{\pi\mu}}. \tag{A3.4.87}$$


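(B) CONSTANT CROSS SECTION WITH A THRESHOLD

The reaction can only occur once the collision energy reaches at least a value $E_0$. The reaction cross section remains constant above this threshold:

$$\sigma = \begin{cases} 0 & \text{for } E_\mathrm{t} < E_0 \\ \sigma_0 & \text{for } E_\mathrm{t} \geq E_0 \end{cases} \tag{A3.4.88}$$

$$k(T) = \sigma_0\sqrt{\frac{8k_\mathrm{B}T}{\pi\mu}}\left(1 + \frac{E_0}{k_\mathrm{B}T}\right)\exp\left(-\frac{E_0}{k_\mathrm{B}T}\right). \tag{A3.4.89}$$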


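(C) CROSS SECTION WITH A HYPERBOLIC THRESHOLD

Again, the reaction requires a minimum collision energy $E_0$, but the cross section increases only gradually above the threshold towards a finite, high-energy limit $\sigma_0$:

$$\sigma = \begin{cases} 0 & \text{for } E_\mathrm{t} < E_0 \\ \sigma_0\left(1 - \dfrac{E_0}{E_\mathrm{t}}\right) & \text{for } E_\mathrm{t} \geq E_0 \end{cases} \tag{A3.4.90}$$

$$k(T) = \sigma_0\sqrt{\frac{8k_\mathrm{B}T}{\pi\mu}}\exp\left(-\frac{E_0}{k_\mathrm{B}T}\right). \tag{A3.4.91}$$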


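(D) GENERALIZED COLLISION MODEL

The hyperbolic cross section model can be generalized further by introducing a function $f(\Delta E)$ ($\Delta E = E_\mathrm{t} - E_0$) to describe the reaction cross section above a threshold:

$$\sigma = \begin{cases} 0 & \text{for } E_\mathrm{t} < E_0 \\ \sigma_0\left(1 - \dfrac{E_0}{E_\mathrm{t}}\right)f(\Delta E) & \text{for } E_\mathrm{t} \geq E_0 \end{cases} \tag{A3.4.92}$$

$$k(T) = \sigma_0\sqrt{\frac{8k_\mathrm{B}T}{\pi\mu}}\,g(T)\exp\left(-\frac{E_0}{k_\mathrm{B}T}\right) \tag{A3.4.93}$$

$$g(T) = \int_0^\infty f(\Delta E)\,\frac{\Delta E}{k_\mathrm{B}T}\exp\left(-\frac{\Delta E}{k_\mathrm{B}T}\right)\mathrm{d}\left(\frac{\Delta E}{k_\mathrm{B}T}\right). \tag{A3.4.94}$$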
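
The thermally averaged cross sections of the simple models can be verified by direct quadrature of $\langle\sigma\rangle = \int_0^\infty \sigma(E_\mathrm{t})\,x\,\mathrm{e}^{-x}\,\mathrm{d}x$ with $x = E_\mathrm{t}/k_\mathrm{B}T$. The Python sketch below does this with a midpoint rule for the hard sphere, constant-threshold and hyperbolic-threshold models and compares with the closed forms $\sigma_0$, $\sigma_0(1 + E_0/k_\mathrm{B}T)\exp(-E_0/k_\mathrm{B}T)$ and $\sigma_0\exp(-E_0/k_\mathrm{B}T)$. All function names and the reduced units ($\sigma_0 = 1$, $k_\mathrm{B}T = 1$, $E_0 = 2$) are assumptions for illustration.

```python
import math


def thermal_average(sigma, kT, x_max=60.0, n=60_000):
    """Midpoint-rule estimate of <sigma> = integral of sigma(x*kT) * x * exp(-x) dx."""
    dx = x_max / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += sigma(x * kT) * x * math.exp(-x)
    return total * dx


# Reduced units, assumed for illustration only.
SIGMA0 = 1.0
E0 = 2.0
KT = 1.0


def hard_sphere(energy):
    """Constant cross section without threshold."""
    return SIGMA0


def step_threshold(energy):
    """Constant cross section above the threshold E0."""
    return SIGMA0 if energy >= E0 else 0.0


def hyperbolic(energy):
    """Cross section rising as sigma0 * (1 - E0/E) above the threshold."""
    return SIGMA0 * (1.0 - E0 / energy) if energy >= E0 else 0.0


avg_hard = thermal_average(hard_sphere, KT)
avg_step = thermal_average(step_threshold, KT)
avg_hyp = thermal_average(hyperbolic, KT)
print(avg_hard, avg_step, avg_hyp)
```

Multiplying each average by $\langle v_\mathrm{rel}\rangle = \sqrt{8k_\mathrm{B}T/\pi\mu}$ gives the corresponding thermal rate constant. Only the hyperbolic threshold yields a pure Arrhenius exponential multiplied by the hard sphere rate constant.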

A3.4.6 TRANSITION STATE THEORY

Transition state theory or 'activated complex theory' has been one of the most fruitful approximations in reaction kinetics and has had a long and complex history [42, 43 and 44]. Transition state theory is originally based on the idea that reactant molecules have to pass a bottleneck to reach the product side and that they stay in quasi-equilibrium on the reactant side until this bottleneck is reached. The progress of a chemical reaction can often be described by the motion along a reaction path in the multidimensional molecular configuration space. Figure A3.4.6 shows typical potential energy profiles along such paths for uni- and bimolecular reactions. The effective potential energy $V(r_q)$ includes the zero point energy due to the motion orthogonal to the reaction coordinate $r_q$. The bottleneck is located at $r_q^\ddagger$, usually coinciding with an effective potential barrier, i.e. a first-order saddle point of the multidimensional potential hypersurface. Its height with respect to the reactants' zero point level is $E_0$. In its canonical form the transition state theory assumes a thermal equilibrium between the reactant molecules A and molecules X moving in some infinitesimal range $\delta$ over the barrier towards the product side. For the unimolecular case this yields the equilibrium concentration:

$$[\mathrm{X}] = \frac{q_\mathrm{X}}{q_\mathrm{A}}\exp\left(-\frac{E_0}{k_\mathrm{B}T}\right)[\mathrm{A}] \tag{A3.4.95}$$

$$q_\mathrm{X} = \frac{1}{2}\,q^\ddagger\,\delta\,\sqrt{2\pi\mu k_\mathrm{B}T/h^2} \tag{A3.4.96}$$


where $h$ is Planck's constant and $q$ stands for molecular partition functions referred to the corresponding zero point level. Thus $q_\mathrm{A}$ is the partition function for the reactant A, and $q^\ddagger$ is the restricted partition function for fixed reaction coordinate $r_q = r_q^\ddagger$, referring to the top of the effective (i.e. zero point corrected) barrier. It is often called the 'partition function of the transition state', bearing in mind that, in contrast to the X molecules, it does not correspond to any observable species. Rather, it defines the meaning of the purely technical term 'transition state'. Classically it corresponds to a $(3N - 7)$-dimensional hypersurface in the $(3N - 6)$-dimensional internal coordinate space of an $N$ atomic system. The remainder of (A3.4.96) derives from the classical partition function for the motion in a one-dimensional box of length $\delta$ with an associated reduced mass $\mu$. The factor of one half accounts for the fact that, in equilibrium, only half of the molecules located within $r_q^\ddagger \pm \delta/2$ move towards the product side.

Figure A3.4.6. Potential energy along the reaction coordinate $r_q$ for a unimolecular isomerization (left) and a bimolecular reaction (right). $r_q^\ddagger$ is the location of the transition state at the saddle point. [X] is the concentration of molecules located within $\delta$ of $r_q^\ddagger$ moving from the reactant side to the product side (indicated by the arrow, which is omitted in the text).

Assuming a thermal one-dimensional velocity (Maxwell-Boltzmann) distribution with average velocity $\sqrt{2k_\mathrm{B}T/\pi\mu}$ the reaction rate is given by the equilibrium flux if (1) the flux from the product side is neglected and (2) the thermal equilibrium is retained throughout the reaction:

$$-\frac{\mathrm{d}[\mathrm{A}]}{\mathrm{d}t} = \frac{[\mathrm{X}]}{\delta}\sqrt{\frac{2k_\mathrm{B}T}{\pi\mu}}. \tag{A3.4.97}$$

Combining equation (A3.4.95), equation (A3.4.96) and equation (A3.4.97) one obtains the first Eyring equation for unimolecular rate constants:

$$k_\mathrm{uni}(T) = \frac{k_\mathrm{B}T}{h}\frac{q^\ddagger}{q_\mathrm{A}}\exp\left(-\frac{E_0}{k_\mathrm{B}T}\right). \tag{A3.4.98}$$

A completely analogous derivation leads to the rate coefficient for bimolecular reactions, where the $q$ are partition functions per unit volume:

$$k_\mathrm{bi}(T) = \frac{k_\mathrm{B}T}{h}\frac{q^\ddagger}{q_\mathrm{A}q_\mathrm{B}}\exp\left(-\frac{E_0}{k_\mathrm{B}T}\right). \tag{A3.4.99}$$

In the high barrier limit, $E_0 \gg k_\mathrm{B}T$, $E_0$ is approximately equal to the Arrhenius activation energy. The ratio of the partition functions is sometimes called the 'statistical' or 'entropic' factor. Its product with the 'universal frequency factor' $k_\mathrm{B}T/h$ corresponds approximately to Arrhenius' pre-exponential factor $A(T)$.
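
To give a feeling for the magnitudes involved, the Python sketch below evaluates the universal frequency factor $k_\mathrm{B}T/h$ and the first Eyring equation for a unimolecular reaction. The function names, the use of a molar barrier height (so that $E_0/k_\mathrm{B}T$ becomes $E_0/RT$) and the example partition function ratio are assumptions for illustration; the physical constants are the exact SI defining values.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J K^-1
H = 6.62607015e-34    # Planck constant in J s
N_A = 6.02214076e23   # Avogadro constant in mol^-1
R = K_B * N_A         # gas constant in J mol^-1 K^-1


def frequency_factor(T):
    """Universal frequency factor k_B*T/h in s^-1."""
    return K_B * T / H


def eyring_unimolecular(T, q_ratio, e0_molar):
    """First Eyring equation: (k_B*T/h) * (q_ts/q_A) * exp(-E0/(R*T)).

    q_ratio is the dimensionless ratio q_ts/q_A; e0_molar is the barrier
    height in J mol^-1 measured from the reactant zero point level.
    """
    return frequency_factor(T) * q_ratio * math.exp(-e0_molar / (R * T))


nu_room = frequency_factor(298.15)
# Hypothetical example: barrier of 100 kJ/mol and partition function ratio of 2.
k_example = eyring_unimolecular(298.15, 2.0, 100_000.0)
print(nu_room, k_example)
```

At room temperature the frequency factor is of the order of $10^{12}$ to $10^{13}\ \mathrm{s^{-1}}$, which is why pre-exponential factors of this magnitude are typical for unimolecular reactions with a partition function ratio near unity.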

The quasi-equilibrium assumption in the above canonical form of the transition state theory usually gives an upper bound to the real rate constant. This is sometimes corrected for by multiplying (A3.4.98) and (A3.4.99) with a transmission coefficient $0 < \kappa < 1$.

In a formal analogy to the expressions for the thermodynamical quantities one can now define the standard enthalpy $\Delta^\ddagger H^\ominus$ and entropy $\Delta^\ddagger S^\ominus$ of activation. This leads to the second Eyring equation:

$$k(T) = \frac{k_\mathrm{B}T}{h}\exp\left(\frac{\Delta^\ddagger S^\ominus}{R}\right)\exp\left(-\frac{\Delta^\ddagger H^\ominus}{RT}\right)\left(\frac{k_\mathrm{B}T}{p^\ominus}\right)^j \tag{A3.4.100}$$

where $p^\ominus$ is the standard pressure of the ideal gas ($j = 0$ for unimolecular and $j = 1$ for bimolecular reactions). As a definition (A3.4.100) is strictly identical to (A3.4.98) and (A3.4.99) if considered as a theoretical equation. Since neither $\Delta^\ddagger S^\ominus$ nor $\Delta^\ddagger H^\ominus$ is connected to observable species, equation (A3.4.100) may also be taken as an empirical equation, viz. an alternative representation of Arrhenius' equation (equation (A3.4.79)). In the field of thermochemical kinetics [43] one tries, however, to estimate $\Delta^\ddagger H^\ominus$ and $\Delta^\ddagger S^\ominus$ on the basis of molecular properties.


There is an immediate connection to the collision theory of bimolecular reactions. Introducing internal partition functions $q_\mathrm{int}$ excluding the (separable) degrees of freedom for overall translation,

$$q = q_\mathrm{int}\,q_\mathrm{trans} \tag{A3.4.101}$$

with

$$\frac{q_\mathrm{trans}}{V} = \left(\frac{2\pi Mk_\mathrm{B}T}{h^2}\right)^{3/2} \tag{A3.4.102}$$

and comparing with equation (A3.4.83) the transition state theory expression for the effective thermal cross section of reaction becomes

$$\langle\sigma\rangle = \frac{h^2}{8\pi\mu_\mathrm{AB}k_\mathrm{B}T}\,\frac{q^\ddagger_\mathrm{int}}{q_\mathrm{A,int}\,q_\mathrm{B,int}}\exp\left(-\frac{E_0}{k_\mathrm{B}T}\right) \tag{A3.4.103}$$

where $V$ is the volume, $M = m_\mathrm{A} + m_\mathrm{B}$ is the total mass, and $\mu_\mathrm{AB}$ is the reduced mass for the relative translation of A and B. One may interpret equation (A3.4.103) as the transition state version of the collision theory of bimolecular reactions: transition state theory is used to calculate the thermally averaged reaction cross section to be inserted into equation (A3.4.83).

A3.4.7 STATISTICAL THEORIES BEYOND CANONICAL TRANSITION STATE THEORY

Transition state theory may be embedded in the more general framework of statistical theories of chemical reactions, as summarized in figure A3.4.7 [27, 36]. Such theories have aimed at going beyond canonical transition state theory in several ways. The first extension concerns reaction systems with potential energy schemes depicted in figure A3.4.8 (in analogy to figure A3.4.6), where one cannot identify a saddle point on the potential hypersurface to be related to a transition state. The left-hand diagram corresponds to a complex forming bimolecular reaction, and the right-hand diagram to a direct barrierless bimolecular reaction. The individual sections (the left- and right-hand parts) of the left diagram correspond to the two unimolecular dissociation channels for the intermediate characterized by the potential minimum. These unimolecular dissociation channels correspond to simple bond fissions. The general types of reactions shown in figure A3.4.8 are quite abundant in gas kinetics. Most ion-molecule reactions as well as many radical-radical reactions are of this type. Thus, most of the very fast reactions in interstellar chemistry, atmospheric and combustion chemistry belong to this class of reaction, where standard canonical transition state theory cannot be applied and extension is clearly necessary.

A second extension of interest would apply the fundamental ideas of transition state theory to state-selected reaction cross sections (see section A3.4.5.1). This theoretical program is carried out in the framework of phase space theory [45, 46] and of the statistical adiabatic channel model [27, 47], the latter being more general and containing phase space theory as a special case. In essence, the statistical adiabatic channel model is a completely state-selected version of the transition state theory. Here, the starting point is the S-matrix element (equation (A3.4.104)), which in the statistical limit takes the statistically averaged form


$$\langle|S_{fi}|^2\rangle_{F,I,\Delta E} = \begin{cases} W(E, J)^{-1} & \text{for strongly coupled channels} \\ 0 & \text{for weakly coupled channels} \end{cases} \tag{A3.4.104}$$

where $W(E, J)$ is the total number of adiabatically open reaction channels for a given total angular momentum quantum number $J$ (or any other good quantum number). $\langle\ \rangle_{F,I,\Delta E}$ refers to the averaging over groups of final and initial states ('coarse graining') and over suitably chosen collision energy intervals $\Delta E$. Following the lines of the general theory of reaction cross sections, section A3.4.5.1, and starting from equation (A3.4.104) one can derive all the relevant kinetic quantities: specific reaction cross sections, specific rate constants and lifetimes in unimolecular reactions, and the thermal rate constants analogous to transition state theory.

Figure A3.4.7. Summary of statistical theories of gas kinetics with emphasis on complex forming reactions (in the figure A.M. is the angular momentum, after Quack and Troe [27, 36, 74]). The indices refer to the following references: (a) [75, 76 and 77]; (b) [78]; (c) [79, 80 and 81]; (d) [82, 83, 84 and 85]; (e) [86, 87 and 88]; (f) [36, 37, 42, 89 and 90]; (g) [45, 46, 91]; (h) [92, 93, 94 and 95]; (i) [96, 97, 98, 99, 100, 101, 102, 103, 104 and 105]; (j) [106, 107, 108 and 109]; (k) [88, 94, 98, 99]; and (l) [24, 106, 107, 108, 109, 110, 111, 112].




Figure A3.4.8. Potential energy profiles for reactions without barrier. Complex forming bimolecular reaction 
(left) and direct barrierless bimolecular reaction (right). 

We summarize here only the main results of the theory and refer to a recent review [27] for details. The total number of adiabatically open channels is computed by searching for channel potential maxima $V_{a,\max}$. The channel potentials $V_a(r_q)$ are obtained by following the quantum energy levels of the reaction system along the reaction path $r_q$. An individual adiabatic channel connects an asymptotic scattering channel (corresponding to a reactant or to a product quantum level) with the reaction complex. One has the total number of open channels as a function of energy $E$, angular momentum $J$ and other good quantum numbers:

$$W(E, J, \ldots) = \sum_{a(J\ldots)} h(E - V_{a,\max}). \tag{A3.4.105}$$

Here $h(x)$ is the Heaviside step function with $h(x > 0) = 1$ and $h(x < 0) = 0$ (not to be confused with Planck's constant). The limit $a(J\ldots)$ indicates that the summation is restricted to channel potentials with a given set of good quantum numbers $(J\ldots)$.

A state-to-state integral reaction cross section from reactant level $a$ to product level $b$ takes the form

$$\sigma_{ba} = \frac{\pi}{g_ak_a^2}\sum_J (2J + 1)\frac{W(E, J, a)\,W(E, J, b)}{W(E, J)}. \tag{A3.4.106}$$

Here the levels consist of several states. $g_a$ is the reactant level degeneracy and $k_a$ is the collision wavenumber (see equation (A3.4.73)).

A specific unimolecular rate constant for the decay of a highly excited molecule at energy $E$ and angular momentum $J$ takes the form

$$k(E, J, \ldots) = \gamma\,\frac{W(E, J, \ldots)}{h\,\rho(E, J, \ldots)} \tag{A3.4.107}$$

where $\gamma$ is a dimensionless transmission coefficient (usually $0 < \gamma < 1$) and $\rho(E, J, \ldots)$ is the density of molecular states. These expressions are relevant in the theory of thermal and non-thermal unimolecular reactions and are generalizations of the Rice-Ramsperger-Kassel-Marcus (RRKM) theory (see chapter A3.12).

Finally, the generalization of the partition function $q^\ddagger$ in transition state theory (equation (A3.4.96)) is given by

$$Q^* = \sum_a \exp\left(-\frac{V_{a,\max}}{k_\mathrm{B}T}\right) = \int_0^\infty W(E)\exp\left(-\frac{E}{k_\mathrm{B}T}\right)\left(\frac{\mathrm{d}E}{k_\mathrm{B}T}\right) \tag{A3.4.108}$$

with the total number of open channels

$$W(E) = \sum_J\sum_a (2J + 1)\,W(E, J, a). \tag{A3.4.109}$$

These equations lead to forms for the thermal rate constants that are entirely analogous to transition state theory, although the computations of the partition functions are different in detail. As described in figure A3.4.7, various levels of the theory can be derived by successive approximations in this general state-selected form of the transition state theory in the framework of the statistical adiabatic channel model. We refer to the literature cited in the diagram for details.
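
The equivalence between a sum of Boltzmann factors over channel maxima and a Laplace-type integral over the number of open channels $W(E)$ can be illustrated numerically. The Python sketch below counts open channels with a Heaviside step function and compares both routes to the generalized partition function. The list of channel maxima is hypothetical, the rotational degeneracy factor $(2J + 1)$ is omitted for simplicity, and the value of the step function at $x = 0$ is set to 1 by convention; all of these are assumptions made only for illustration.

```python
import math


def heaviside(x):
    """Heaviside step function; the value at x == 0 is set to 1 by convention here."""
    return 1.0 if x >= 0.0 else 0.0


def open_channels(energy, maxima):
    """Number of channels whose potential maximum lies at or below the given energy."""
    return sum(heaviside(energy - v) for v in maxima)


def q_from_sum(maxima, kT):
    """Generalized partition function as a direct sum of Boltzmann factors."""
    return sum(math.exp(-v / kT) for v in maxima)


def q_from_integral(maxima, kT, x_max=50.0, n=50_000):
    """Same quantity from the integral of W(E) * exp(-E/kT) dE/kT (midpoint rule)."""
    dx = x_max / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += open_channels(x * kT, maxima) * math.exp(-x)
    return total * dx


# Hypothetical channel maxima in units of kT (illustrative values only).
maxima = [0.5, 1.0, 1.0, 2.5]
kT = 1.0
q_sum = q_from_sum(maxima, kT)
q_int = q_from_integral(maxima, kT)
print(q_sum, q_int)
```

The two results agree because each channel contributes $\int_{V_{a,\max}}^\infty \exp(-E/k_\mathrm{B}T)\,\mathrm{d}E/k_\mathrm{B}T = \exp(-V_{a,\max}/k_\mathrm{B}T)$ to the integral.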

It may be useful to mention here one currently widely applied approximation for barrierless reactions, which is now frequently called microcanonical and canonical variational transition state theory (equivalent to the 'minimum density of states' and 'maximum free energy' transition state theory in figure A3.4.7). This type of theory can be understood by considering the partition functions $Q(r_q)$ as functions of $r_q$ similar to equation (A3.4.108) but with $V_a(r_q)$ instead of $V_{a,\max}$. Obviously $Q(r_q) \geq Q^*$ so that the best possible choice for a transition state results from minimizing the partition function along the reaction coordinate $r_q$:

$$Q^*(T) = \min_{r_q} Q(r_q, T) = Q(r_q^\ddagger, T). \tag{A3.4.110}$$

Equation (A3.4.110) represents the canonical form ($T$ = constant) of the 'variational' theory. Minimization at constant energy yields the analogous microcanonical version. It is clear that, in general, this is only an approximation to the general theory, although this point has sometimes been overlooked. One may also define a free energy

$$A(r_q) = -k_\mathrm{B}T\ln Q(r_q) \tag{A3.4.111}$$

which leads to a maximum free energy condition

$$A^*(T) = \max_{r_q} A(r_q, T) = A(r_q^\ddagger, T). \tag{A3.4.112}$$

The free energy as a function of reaction coordinates has been explicitly represented by Quack and Troe [36, 112] for the reaction

$$\mathrm{C_2H_6 \rightarrow 2CH_3} \tag{A3.4.113}$$

but the general concept goes back to Eyring (see [27, 36]).


A3.4.8 GAS-PHASE REACTION MECHANISMS 

The kinetics of a system of elementary reactions forming a reaction mechanism are described by a system of coupled differential equations. Disregarding transport processes there is one differential equation for each species involved. Only a few of these systems of coupled differential equations can be solved exactly in closed form; the accurate solution more generally requires integration by numerical methods. In the simplest case of reversible elementary reactions the stoichiometry is sufficient to decouple the differential equations, leading to simple rate laws. For more complicated compound reaction mechanisms this can only be achieved with more or less far-reaching approximations, usually concerning reactive intermediates. The most important are the quasi-equilibrium (or partial equilibrium) and the quasi-stationarity (or quasi-steady-state) approximations, whose practical importance goes far beyond gas-phase kinetics.

A3.4.8.1 ELEMENTARY REACTIONS WITH BACK-REACTION

The simplest possible gas-phase reaction mechanisms consist of an elementary reaction and its back reaction. Here we consider uni- and bimolecular reactions yielding three different combinations. The resulting rate laws can all be integrated in closed form.

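(A) UNIMOLECULAR REACTIONS WITH UNIMOLECULAR BACK REACTION

The equation

$$\mathrm{A} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \mathrm{B}$$

is the elementary mechanism of reversible isomerization reactions, for example

$$\mathrm{CH_3NC} \underset{[\mathrm{M}]}{\overset{[\mathrm{M}]}{\rightleftharpoons}} \mathrm{CH_3CN}. \tag{A3.4.114}$$

The rate law is given by

$$-\frac{\mathrm{d}c_\mathrm{A}}{\mathrm{d}t} = k_1c_\mathrm{A} - k_{-1}c_\mathrm{B}. \tag{A3.4.115}$$

Exploiting the stoichiometric equation one can eliminate $c_\mathrm{B}$. Integration yields the simple relaxation of the initial concentrations into the equilibrium, $c_\mathrm{A}^\mathrm{eq} = c_\mathrm{A}(\infty)$, with a relaxation time $\tau$:

$$c_\mathrm{A}(t) - c_\mathrm{A}^\mathrm{eq} = \left(c_\mathrm{A}(0) - c_\mathrm{A}^\mathrm{eq}\right)\exp(-t/\tau) \tag{A3.4.116}$$

$$\tau = \frac{1}{k_1 + k_{-1}}. \tag{A3.4.117}$$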
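
The exponential relaxation towards equilibrium for a reversible first-order reaction can be checked against a direct numerical integration of the rate law $-\mathrm{d}c_\mathrm{A}/\mathrm{d}t = k_1c_\mathrm{A} - k_{-1}c_\mathrm{B}$ with $c_\mathrm{A} + c_\mathrm{B}$ conserved. The Python sketch below does this with a Runge-Kutta scheme. The function names and the rate constants and concentrations (arbitrary consistent units) are assumptions for illustration.

```python
import math


def relaxation(c_a0, c_b0, k1, k_m1, t):
    """Closed-form c_A(t) for A <=> B with relaxation time 1/(k1 + k_m1)."""
    total = c_a0 + c_b0
    c_eq = k_m1 * total / (k1 + k_m1)   # equilibrium: k1*c_A = k_m1*c_B
    tau = 1.0 / (k1 + k_m1)
    return c_eq + (c_a0 - c_eq) * math.exp(-t / tau)


def integrate(c_a0, c_b0, k1, k_m1, t_end, steps=10_000):
    """Integrate dc_A/dt = -k1*c_A + k_m1*(total - c_A) with classical RK4."""
    total = c_a0 + c_b0

    def rate(c_a):
        return -k1 * c_a + k_m1 * (total - c_a)

    dt = t_end / steps
    c_a = c_a0
    for _ in range(steps):
        s1 = rate(c_a)
        s2 = rate(c_a + 0.5 * dt * s1)
        s3 = rate(c_a + 0.5 * dt * s2)
        s4 = rate(c_a + dt * s3)
        c_a += dt * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return c_a


# Illustrative values in arbitrary consistent units (assumptions).
k1, k_m1 = 2.0, 0.5
c_a0, c_b0 = 1.0, 0.0
analytic = relaxation(c_a0, c_b0, k1, k_m1, 3.0)
numeric = integrate(c_a0, c_b0, k1, k_m1, 3.0)
print(analytic, numeric)
```

Note that the relaxation time depends on the sum of the forward and reverse rate constants, so the approach to equilibrium is always faster than the forward reaction alone would suggest.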

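(B) BIMOLECULAR REACTIONS WITH UNIMOLECULAR BACK REACTION

For example

$$2\mathrm{A} \underset{k_{-1}}{\overset{k_2}{\rightleftharpoons}} \mathrm{A_2}.$$

The rate law is given by

$$-\frac{1}{2}\frac{\mathrm{d}c_\mathrm{A}}{\mathrm{d}t} = k_2c_\mathrm{A}^2 - k_{-1}c_{\mathrm{A}_2}. \tag{A3.4.118}$$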

After transformation to the turnover variable "~ ''' "'" Ai A: integration yield: 


In I I - In f ■ I = M4fA<0) + K - E.v r )j 


(A3.4.119) 


-^4-(!H^)+(f)T 


(A3.4.120) 


where $K = k_2/k_{-1}$ is the equilibrium constant.


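(C) BIMOLECULAR REACTIONS WITH BIMOLECULAR BACK-REACTION

For example

$$\mathrm{A + B} \underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}} \mathrm{C + D}.$$

The rate law is given by

$$-\frac{\mathrm{d}c_\mathrm{A}}{\mathrm{d}t} = k_2c_\mathrm{A}c_\mathrm{B} - k_{-2}c_\mathrm{C}c_\mathrm{D}. \tag{A3.4.121}$$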


After transformation to the turnover variable x = c A (0) - c A (t), integration yields 


/i-[,/( t , + *)n 
\]-[^/( fl -fr)]/ 


(A3.4.122) 


f ; = 


2(1 - A - " 1 ) 


(A3.4.123) 


■('- 


r A fO)r B (0) - A-'rc(0)CD(0)\ l/2 


1 - K 


i 


"J 


(A3.4.124) 


where $K = k_2/k_{-2}$ is the equilibrium constant.

Bimolecular steps involving identical species yield correspondingly simpler expressions.

A3.4.8.2 THE LINDEMANN-HINSHELWOOD MECHANISM FOR UNIMOLECULAR REACTIONS


A compound reaction mechanism consists of several different (reversible) elementary steps. The kinetics are then described by a system of coupled differential equations rather than by a single rate law. This system can sometimes be decoupled by assuming that the concentrations of the intermediate species are small and quasi-stationary. The Lindemann mechanism of thermal unimolecular reactions [18, 19] affords an instructive example for the application of such approximations. This mechanism is based on the idea that a molecule A has to pick up sufficient energy before it can undergo a monomolecular reaction, for example, bond breaking or isomerization. In thermal reactions this energy is provided by collisions with other molecules M in the gas to produce excited species A*:

$$\mathrm{A + M} \underset{k_\mathrm{d}}{\overset{k_\mathrm{a}}{\rightleftharpoons}} \mathrm{A^* + M} \tag{A3.4.125}$$

$$\mathrm{A^*} \xrightarrow{k_\mathrm{r}} \text{products}. \tag{A3.4.126}$$

Two important points must be noted here.

(1) The collision partners may be any molecule present in the reaction mixture, i.e. inert bath gas molecules, but also reactant or product species. The activation ($k_\mathrm{a}$) and deactivation ($k_\mathrm{d}$) rate constants in equation (A3.4.125) therefore represent the effective average rate constants.

(2) The collision ($k_\mathrm{a}$, $k_\mathrm{d}$) and reaction ($k_\mathrm{r}$) efficiencies may differ significantly between different excited reactant states. This is essentially neglected in the Lindemann-Hinshelwood mechanism. In particular, the strong collision assumption implies that so much energy is transferred in a collision that the collision efficiency can be regarded as effectively independent of the energy.

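With $k_1 = k_\mathrm{a}[\mathrm{M}]$ and $k_{-1} = k_\mathrm{d}[\mathrm{M}]$ the resulting system of differential equations is

$$-\frac{\mathrm{d}[\mathrm{A}]}{\mathrm{d}t} = k_1[\mathrm{A}] - k_{-1}[\mathrm{A^*}] \tag{A3.4.127}$$

$$-\frac{\mathrm{d}[\mathrm{A^*}]}{\mathrm{d}t} = -k_1[\mathrm{A}] + k_\mathrm{r}[\mathrm{A^*}] + k_{-1}[\mathrm{A^*}] \tag{A3.4.128}$$

$$\frac{\mathrm{d}[\text{products}]}{\mathrm{d}t} = k_\mathrm{r}[\mathrm{A^*}]. \tag{A3.4.129}$$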

If the excitation energy required to form activated species A* is much larger than $k_\mathrm{B}T$ its concentration will remain small. This is fulfilled if $k_\mathrm{a} \ll k_\mathrm{d}$. Following Bodenstein, [A*] is then assumed to be quasi-stationary, i.e. after some initialization phase the concentration of activated species remains approximately constant (strictly speaking the ratio [A*]/[A] remains approximately constant (see section A3.4.8.3)):

$$\frac{\mathrm{d}[\mathrm{A^*}]}{\mathrm{d}t} \approx 0 \tag{A3.4.130}$$

$$[\mathrm{A^*}]_\mathrm{qs} = \frac{k_1[\mathrm{A}]}{k_{-1} + k_\mathrm{r}}. \tag{A3.4.131}$$

This yields the quasi-stationary reaction rate with an effective unimolecular rate constant

$$v_\mathrm{c} = \frac{\mathrm{d}[\text{products}]}{\mathrm{d}t} = k_\mathrm{eff}[\mathrm{A}] = k_\mathrm{r}[\mathrm{A^*}]_\mathrm{qs} \tag{A3.4.132}$$

$$k_\mathrm{eff} = \frac{k_1k_\mathrm{r}}{k_{-1} + k_\mathrm{r}} = \frac{k_\mathrm{a}[\mathrm{M}]k_\mathrm{r}}{k_\mathrm{d}[\mathrm{M}] + k_\mathrm{r}}. \tag{A3.4.133}$$


The effective rate law correctly describes the pressure dependence of unimolecular reaction rates at least qualitatively. This is illustrated in figure A3.4.9. In the limit of high pressures, i.e. large [M], $k_\mathrm{eff}$ becomes independent of [M], yielding the high-pressure rate constant $k_\infty$ of an effective first-order rate law. At very low pressures, product formation becomes much faster than deactivation, and $k_\mathrm{eff}$ now depends linearly on [M]. This corresponds to an effective second-order rate law with the pseudo first-order rate constant $k_0$:

$$k_\infty = \frac{k_\mathrm{a}k_\mathrm{r}}{k_\mathrm{d}} \tag{A3.4.134}$$

$$k_0 = k_\mathrm{a}[\mathrm{M}]. \tag{A3.4.135}$$

In addition to [A*] being quasi-stationary the quasi-equilibrium approximation assumes a virtually unperturbed equilibrium between activation and deactivation (equation (A3.4.125)):

$$\frac{[\mathrm{A^*}]}{[\mathrm{A}]} = \frac{k_\mathrm{a}}{k_\mathrm{d}}. \tag{A3.4.136}$$

This approximation is generally valid if $k_\mathrm{r} \ll k_{-1}$. For the Lindemann mechanism of unimolecular reactions this corresponds to the high-pressure limit $k_\mathrm{eff} = k_\infty$.


Figure A3.4.9. Pressure dependence of the effective unimolecular rate constant. Schematic fall-off curve for
the Lindemann-Hinshelwood mechanism (ln $k_\mathrm{eff}$ plotted against ln [M]). $k_\infty$ is the (constant) high-pressure
limit of the effective rate constant $k_\mathrm{eff}$ and $k_0$ is the low-pressure limit, which depends linearly on the
concentration of the inert collision partner [M].

The approximate results can be compared with the long time limit of the exact stationary state solution
derived in section A3.4.8.3:

$$k_\mathrm{eff} = \tfrac{1}{2}\{k_1 + k_{-1} + k_r - [(k_1 + k_{-1} + k_r)^2 - 4k_1k_r]^{1/2}\}. \qquad \text{(A3.4.137)}$$

This leads to the quasi-stationary rate constant of equation (A3.4.133) if $4k_1k_r \ll (k_1 + k_{-1} + k_r)^2$, which is
more general than the Bodenstein condition $k_a \ll k_d$.
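A short numerical comparison shows how closely the two expressions agree. The sketch below is an illustration only: the function names and the two parameter sets are assumptions, and the exact expression (A3.4.137) is evaluated in an algebraically identical form that avoids subtracting nearly equal numbers.

```python
import math

def k_exact(k1, km1, kr):
    """Long-time rate constant of equation (A3.4.137)."""
    s = k1 + km1 + kr
    root = math.sqrt(s * s - 4.0 * k1 * kr)
    # (s - root)/2 rewritten as 2*k1*kr/(s + root); the two forms are
    # identical, but this one is stable when 4*k1*kr << s**2.
    return 2.0 * k1 * kr / (s + root)

def k_quasi_stationary(k1, km1, kr):
    """Quasi-stationary rate constant of equation (A3.4.133)."""
    return k1 * kr / (km1 + kr)

# Assumed example values in s^-1: (k1, k_-1, kr).
cases = {
    "weak activation": (1.0, 1.0e5, 1.0e8),
    "comparable rates": (1.0, 1.0, 1.0),
}
for label, (k1, km1, kr) in cases.items():
    exact = k_exact(k1, km1, kr)
    qs = k_quasi_stationary(k1, km1, kr)
    print(f"{label:17s} exact = {exact:.6e}  QS = {qs:.6e}  "
          f"relative deviation = {abs(qs - exact) / exact:.2e}")
```

For weak activation the two results are practically indistinguishable, whereas for comparable rate constants the quasi-stationary value is noticeably too large.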

A3.4.8.3 GENERALIZED FIRST-ORDER KINETICS 

The Lindemann mechanism for thermally activated unimolecular reactions is a simple example of a particular 
class of compound reaction mechanisms. They are mechanisms whose constituent reactions individually 
follow first-order rate laws [11, 20, 36, 48, 49, 50, 51, 52, 53, 54, 55 and 56]:

$$\mathrm{A}_i \rightarrow \mathrm{A}_j \qquad i, j = 1, \ldots, N \qquad \text{(A3.4.138)}$$

where $N$ is the number of different species involved. With $c_i = [\mathrm{A}_i]$ this leads to the following system of $N$
coupled differential equations called generalized first-order kinetics:

$$-\frac{dc_i}{dt} = \sum_{j=1}^{N} K_{ij}c_j \qquad i = 1, \ldots, N. \qquad \text{(A3.4.139)}$$

The individual reactions need not be unimolecular. It can be shown that the relaxation kinetics after small
perturbations of the equilibrium can always be reduced to the form of (A3.4.138) in terms of extension
variables from equilibrium, even if the underlying reaction system is not of first order [51, 52, 57, 58].

Generalized first-order kinetics have been extensively reviewed in relation to technical chemical applications
[59] and have been discussed in the context of copolymerization [53]. From a theoretical point of view, the
general class of coupled kinetic equations (A3.4.138) and (A3.4.139) is important, because it allows
for a general closed-form solution (in matrix form) [49]. Important applications include the Pauli master
equation for statistical mechanical systems (in particular gas-phase statistical mechanical kinetics) [48] and
the investigation of certain simple reaction systems [49, 50, 55]. It is the basis of the many-level treatment of
thermal unimolecular reactions in terms of the appropriate master equations for energy transfer [36, 55, 60,
61, 62 and 63]. Generalized first-order kinetics also form the basis for certain statistical limiting cases of
multiphoton induced chemical reactions and laser chemistry [54, 56].

Written in matrix notation, the system of first-order differential equations (A3.4.139) takes the form

$$-\frac{d\mathbf{c}(t)}{dt} = \mathbf{K}\mathbf{c}(t). \qquad \text{(A3.4.140)}$$

With time-independent matrix K it has the general solution

$$\mathbf{c}(t) = \exp(-\mathbf{K}t)\,\mathbf{c}(0). \qquad \text{(A3.4.141)}$$

The exponential function of the matrix can be evaluated through the power series expansion of exp(). c is the
column vector whose elements are the concentrations $c_i$. The matrix elements of the rate coefficient matrix K
are the first-order rate constants $K_{ij}$. The system is called closed if all reactions and back reactions are
included. Then K is of rank $N - 1$ with non-negative eigenvalues, of which exactly one is zero. It corresponds to
the equilibrium state, with concentrations $c_i^\mathrm{eq}$ determined by the principle of microscopic reversibility:

$$\frac{c_i^\mathrm{eq}}{c_j^\mathrm{eq}} = \frac{K_{ij}}{K_{ji}}. \qquad \text{(A3.4.142)}$$


In this case K is similar to a real symmetric matrix and equation (A3.4.141) can easily be solved by
diagonalization of K.

If some of the reactions of (A3.4.138) are neglected in (A3.4.139), the system is called open. This generally
complicates the solution of (A3.4.141). In particular, the system no longer has a well defined equilibrium.
However, as long as the eigenvalues of K remain positive, the kinetics at long times will be dominated by the
smallest eigenvalue. This corresponds to a stationary state solution.
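For a closed system the matrix solution can be tried out directly. The following Python sketch is illustrative only: the three-species chain A1 <-> A2 <-> A3, its rate constants and all function names are assumptions. It evaluates exp(-Kt) from the power series (applied to a scaled-down matrix that is then squared repeatedly) and checks that the long-time concentrations obey the detailed-balance ratios $c_i/c_j = K_{ij}/K_{ji}$.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_series(A, terms=30, squarings=10):
    """exp(A) from its power series, applied to A / 2**squarings and then
    squared repeatedly so that the truncated series converges quickly."""
    n = len(A)
    scale = 2.0 ** squarings
    small = [[a / scale for a in row] for row in A]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        # term now holds small**m / m!
        term = [[x / m for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Hypothetical closed system A1 <-> A2 <-> A3 (rate constants in s^-1).
k12, k21 = 2.0, 1.0   # A1 -> A2 and A2 -> A1
k23, k32 = 3.0, 0.5   # A2 -> A3 and A3 -> A2

# Sign convention -dc/dt = K c: losses on the diagonal, gains off-diagonal.
K = [[k12, -k21, 0.0],
     [-k12, k21 + k23, -k32],
     [0.0, -k23, k32]]

def concentrations(t, c0):
    """c(t) = exp(-K t) c(0)."""
    propagator = expm_series([[-k * t for k in row] for row in K])
    return [sum(propagator[i][j] * c0[j] for j in range(3)) for i in range(3)]

c_long = concentrations(50.0, [1.0, 0.0, 0.0])
print("c(t = 50 s) =", [round(c, 6) for c in c_long])
print("c1/c2 =", c_long[0] / c_long[1], "  K12/K21 =", K[0][1] / K[1][0])
print("c2/c3 =", c_long[1] / c_long[2], "  K23/K32 =", K[1][2] / K[2][1])
```

The columns of K sum to zero, so the total concentration is conserved, and the non-zero eigenvalues of this particular K (1.5 and 5 s^-1) have decayed completely by t = 50 s.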

As an example we take again the Lindemann mechanism of unimolecular reactions. The system of differential
equations is given by equation (A3.4.127), equation (A3.4.128) and equation (A3.4.129). The rate coefficient
matrix is

$$\mathbf{K} = \begin{pmatrix} k_1 & -k_{-1} & 0 \\ -k_1 & k_{-1} + k_r & 0 \\ 0 & -k_r & 0 \end{pmatrix}. \qquad \text{(A3.4.143)}$$

Since the back reaction, products → A*, has been neglected this is an open system. Still K has a trivial zero
eigenvalue corresponding to complete reaction, i.e. pure products. Therefore we only need to consider
(A3.4.127) and (A3.4.128) and the corresponding (2 × 2) submatrix (the upper left block) of equation (A3.4.143).

The eigenvalues $\lambda_1 < \lambda_2$ of K are both positive:

$$\lambda_{1,2} = \tfrac{1}{2}\{k_1 + k_{-1} + k_r \mp [(k_1 + k_{-1} + k_r)^2 - 4k_1k_r]^{1/2}\} > 0. \qquad \text{(A3.4.144)}$$

For long times, the smaller eigenvalue $\lambda_1$ will dominate (A3.4.141), yielding the stationary solution

$$\begin{pmatrix} [\mathrm{A}](t) \\ [\mathrm{A}^*](t) \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix} \mathrm{e}^{-\lambda_1 t} \qquad \text{(A3.4.145)}$$

where $a$ and $b$ are time-independent functions of the initial concentrations. With the condition $\lambda_1 \ll \lambda_2$ one
obtains the effective unimolecular rate constant

$$k_\mathrm{eff} = -\frac{1}{[\mathrm{A}]}\frac{d[\mathrm{A}]}{dt} = \lambda_1 \approx \frac{k_1k_r}{k_1 + k_{-1} + k_r}. \qquad \text{(A3.4.146)}$$

For $k_a \ll k_d$ this is identical to the quasi-stationary result, equation (A3.4.133), although only the ratio
[A*]/[A] = b/a (equation (A3.4.145)) is stationary and not [A*] itself. This suggests $d[\mathrm{A}^*]/dt \ll d[\mathrm{A}]/dt$ as a more
appropriate formulation of quasi-stationarity. Furthermore, the general stationary state solution (equation
(A3.4.144)) for the Lindemann mechanism contains cases that are not usually retained in the Bodenstein
quasi-steady-state solution.
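The statement that only the ratio [A*]/[A] becomes stationary can be illustrated with the eigenvector of the slow mode. The sketch below assumes illustrative values of $k_1$, $k_{-1}$ and $k_r$; the function name is hypothetical. It computes both eigenvalues of the (2 × 2) Lindemann submatrix and the ratio b/a, and compares the latter with the quasi-stationary ratio $k_1/(k_{-1} + k_r)$.

```python
import math

def lindemann_modes(k1, km1, kr):
    """Eigenvalues of the 2 x 2 Lindemann submatrix and the stationary
    ratio b/a = [A*]/[A] belonging to the slow eigenvalue."""
    s = k1 + km1 + kr
    root = math.sqrt(s * s - 4.0 * k1 * kr)
    lam1 = 2.0 * k1 * kr / (s + root)   # equals (s - root)/2, numerically stable
    lam2 = 0.5 * (s + root)
    # Second row of (K - lam1*I) acting on (a, b):
    #   -k1*a + (km1 + kr - lam1)*b = 0
    ratio = k1 / (km1 + kr - lam1)
    return lam1, lam2, ratio

# Assumed example values in s^-1.
k1, km1, kr = 1.0e2, 1.0e7, 1.0e6
lam1, lam2, ratio = lindemann_modes(k1, km1, kr)

print(f"lambda_1 = {lam1:.6e} s^-1, lambda_2 = {lam2:.6e} s^-1")
print(f"b/a (slow mode)        = {ratio:.8e}")
print(f"quasi-stationary ratio = {k1 / (km1 + kr):.8e}")
```

Both [A] and [A*] decay with the same rate $\lambda_1$ at long times, so neither concentration is constant, but their ratio is, and it differs from the quasi-stationary ratio only through the small term $\lambda_1$ in the denominator.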

An important example for the application of general first-order kinetics in gas-phase reactions is the master
equation treatment of the fall-off range of thermal unimolecular reactions to describe non-equilibrium effects
in the weak collision limit when activation and deactivation cross sections (equation (A3.4.125)) are to be
retained in detail [60].

General first-order kinetics also play an important role for the so-called local eigenvalue analysis of more
complicated reaction mechanisms, which are usually described by nonlinear systems of differential equations.
Linearization leads to effective general first-order kinetics whose analysis reveals information on the time
scales of chemical reactions, species in steady states (quasi-stationarity), or partial equilibria
(quasi-equilibrium) [64, 65 and 66].

A3.4.8.4 GENERAL COMPOUND REACTION MECHANISMS 

More general compound reaction mechanisms lead to systems of differential equations of different orders. 
They can sometimes be treated by applying a quasi-stationarity or a quasi-equilibrium approximation. Often, 
this may even work for simple chain reactions. Chain reactions generally consist of four types of reaction 
steps: In the chain initiation steps, reactive species (radicals) are produced from stable species (reactants or 
catalysts). They react with stable species to form other reactive species in the chain propagation. Reactive 
species recovered in the chain propagation steps are called chain carriers. Propagation steps where one 
reactive species is replaced by another less-reactive species are sometimes termed inhibiting. Chain branching 
occurs if more than one reactive species are formed. Finally, the chain is terminated by reactions of reactive 
species, which yield stable species, for example through recombination in the gas phase or at the surface of 
the reaction vessel. 

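The assumption of quasi-stationarity can sometimes be justified if there is no significant chain branching, for
example in HBr formation at 200-300 °C:

$$\begin{array}{lllcl}
(1) & \text{initiation} & \mathrm{Br_2} & \xrightarrow{[\mathrm{M}]} & 2\,\mathrm{Br^*} \\
(2) & \text{propagation} & \mathrm{Br^*} + \mathrm{H_2} & \rightarrow & \mathrm{HBr} + \mathrm{H^*} \\
(3) & & \mathrm{H^*} + \mathrm{Br_2} & \rightarrow & \mathrm{HBr} + \mathrm{Br^*} \\
(4) & \text{inhibition} & \mathrm{H^*} + \mathrm{HBr} & \rightarrow & \mathrm{H_2} + \mathrm{Br^*} \\
(5) & \text{termination} & \mathrm{Br^*} + \mathrm{Br^*} & \xrightarrow{[\mathrm{M}]} & \mathrm{Br_2} \\
(6) & & \mathrm{H^*} + \mathrm{H^*} & \xrightarrow{[\mathrm{M}]} & \mathrm{H_2} \\
(7) & & \mathrm{H^*} + \mathrm{Br^*} & \xrightarrow{[\mathrm{M}]} & \mathrm{HBr}.
\end{array} \qquad \text{(A3.4.147)}$$

Chain carriers are indicated by an asterisk. Assuming quasi-stationarity for [H*] and [Br*] and neglecting (6)
and (7) (because [H*] ≪ [Br*]) yields

$$\frac{d[\mathrm{HBr}]}{dt} = \frac{2k_2(k_1/k_5)^{1/2}[\mathrm{H_2}][\mathrm{Br_2}]^{1/2}}{1 + (k_4[\mathrm{HBr}]/k_3[\mathrm{Br_2}])}. \qquad \text{(A3.4.148)}$$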
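The rate law follows from the two quasi-stationarity conditions in a few steps. The sketch below is one way to write the derivation, assuming that steps (6) and (7) are neglected and that any third-body concentration [M] is absorbed into $k_1$ and $k_5$; the rate constants are numbered as the steps of the mechanism.

```latex
\begin{align*}
\frac{d[\mathrm{H}^*]}{dt} &= k_2[\mathrm{Br}^*][\mathrm{H_2}] - k_3[\mathrm{H}^*][\mathrm{Br_2}]
   - k_4[\mathrm{H}^*][\mathrm{HBr}] \approx 0 \\
\frac{d[\mathrm{Br}^*]}{dt} &= 2k_1[\mathrm{Br_2}] - k_2[\mathrm{Br}^*][\mathrm{H_2}]
   + k_3[\mathrm{H}^*][\mathrm{Br_2}] + k_4[\mathrm{H}^*][\mathrm{HBr}]
   - 2k_5[\mathrm{Br}^*]^2 \approx 0 \\
\text{(sum of both)} \quad [\mathrm{Br}^*] &= (k_1/k_5)^{1/2}[\mathrm{Br_2}]^{1/2} \\
\text{(first condition)} \quad [\mathrm{H}^*] &= \frac{k_2[\mathrm{Br}^*][\mathrm{H_2}]}
   {k_3[\mathrm{Br_2}] + k_4[\mathrm{HBr}]} \\
\frac{d[\mathrm{HBr}]}{dt} &= k_2[\mathrm{Br}^*][\mathrm{H_2}] + k_3[\mathrm{H}^*][\mathrm{Br_2}]
   - k_4[\mathrm{H}^*][\mathrm{HBr}] = 2k_3[\mathrm{H}^*][\mathrm{Br_2}] \\
&= \frac{2k_2(k_1/k_5)^{1/2}[\mathrm{H_2}][\mathrm{Br_2}]^{1/2}}
   {1 + k_4[\mathrm{HBr}]/(k_3[\mathrm{Br_2}])}
\end{align*}
```

The half-integer order in [Br₂] arises from the balance between initiation (first order in Br₂) and termination (second order in Br*), and the denominator reflects the competition of Br₂ and HBr for the H* atoms.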

The resulting rate law agrees with the form found experimentally. Of course the postulated mechanism can
only be proven by measuring the rate constants of the individual elementary steps separately and comparing
calculated rates of equation (A3.4.148) with observed rates of HBr formation.

In general, the assumption of quasi-stationarity is difficult to justify a priori. There may be several possible
choices of intermediates for which the assumption of quasi-stationary concentrations appears justified. It is
possible to check for consistency, for example $[\mathrm{A}^*]_\mathrm{QS} \ll [\mathrm{A}]$, but the final justification can only come from a
comparison with the exact solution. Exact solutions usually have to be obtained with numerical solvers for
systems of differential equations [7, 8 and 9]. In particular, if transport phenomena with complicated boundary
conditions must be taken into account this is the only viable approach. Modern fields of application include
atmospheric chemistry and combustion chemistry [67, 68]. A classic example is the H₂/O₂ reaction. The mechanism
includes more than 60 elementary steps and has been discussed in great detail [69]. A recent analysis of the
explosion limits of this system in the range of 0.3-15.7 atm and 850-1040 K included 19 reversible
elementary reactions [67]. Table A3.4.3 summarizes some of the major reactions for the hydrogen-oxygen
reaction. A simplified mechanism involves only six reactions:


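$$\begin{array}{lllcl}
(1) & \text{initiation} & \mathrm{H_2} + \mathrm{O_2} & \rightarrow & 2\,\mathrm{OH} \\
(2) & \text{propagation} & \mathrm{OH} + \mathrm{H_2} & \rightarrow & \mathrm{H_2O} + \mathrm{H} \\
(3) & \text{branching} & \mathrm{H} + \mathrm{O_2} & \rightarrow & \mathrm{OH} + \mathrm{O} \\
(4) & \text{branching} & \mathrm{O} + \mathrm{H_2} & \rightarrow & \mathrm{OH} + \mathrm{H} \\
(5) & \text{termination} & 2\,\mathrm{H} & \xrightarrow{[\mathrm{M}],\,\text{wall}} & \mathrm{H_2} \\
(6) & \text{termination} & \mathrm{H} + \mathrm{O_2} & \xrightarrow{[\mathrm{M}]} & \mathrm{HO_2}
\end{array} \qquad \text{(A3.4.149)}$$

3H₂ + O₂ = 2H + 2H₂O (2 + 2 + 3 + 4).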

Reaction (5) proceeds mostly heterogeneously, reaction (6) mostly homogeneously. This mechanism can be 
integrated with simplifying assumptions to demonstrate the main features of gas-phase explosion kinetics [8]. 


The importance of numerical treatments, however, cannot be overemphasized in this context. Over the
decades enormous progress has been made in the numerical treatment of differential equations of complex
gas-phase reactions [8, 70, 71]. Complex reaction systems can also be seen in the context of nonlinear and
self-organizing reactions, which are separate subjects in this encyclopedia (see chapter A3.14, chapter C3.6).

Table A3.4.3. Rate constants for the reaction of H₂ with O₂ [73]. The rate constants are given in terms of the
following expression: $k(T) = A\,(T/\mathrm{K})^{b}\exp(-E/RT)$.

Reaction                  A (cm³ mol⁻¹ s⁻¹)    b        E (kJ mol⁻¹)
OH + H₂ → H₂O + H         1.2 × 10⁹            1.3      15.2
H + H₂O → OH + H₂         4.5 × 10⁹            1.3      78.7
H + O₂ → OH + O           2.2 × 10¹⁴           0        70.4
OH + O → H + O₂           1.0 × 10¹³           0        0
O + H₂ → OH + H           1.8 × 10¹⁰           1.0      37.3
OH + H → O + H₂           8.3 × 10⁹            1.0      29.1
OH + OH → H₂O + O         1.5 × 10⁹            1.14     0
O + H₂O → OH + OH         1.6 × 10¹⁰           1.14     72.4
H + HO₂ → OH + OH         1.5 × 10¹⁴           0        4.2
H + HO₂ → H₂ + O₂         2.5 × 10¹³           0        2.9
OH + HO₂ → H₂O + O₂       1.5 × 10¹³           0        0
O + HO₂ → OH + O₂         2.0 × 10¹³           0        0

Reaction                  A (cm⁶ mol⁻² s⁻¹)    b        E (kJ mol⁻¹)
H + H + M → H₂ + M        9.0 × 10¹⁶           -0.6     0
O + OH + M → HO₂ + M      2.2 × 10²²           -2.0     0
H + O₂ + M → HO₂ + M      2.3 × 10¹⁸           -0.8     0
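The three-parameter expression of Table A3.4.3 is straightforward to evaluate. The sketch below is a minimal illustration; the function name is hypothetical, and the assignment of the parameter values to the reactions OH + H₂ and O + H₂ is an assumption based on reading the table row by row.

```python
import math

R_KJ = 8.314462618e-3   # gas constant in kJ mol^-1 K^-1

def rate_constant(T, A, b, E):
    """k(T) = A (T/K)^b exp(-E/RT); T in K, E in kJ/mol, result in the units of A."""
    return A * T ** b * math.exp(-E / (R_KJ * T))

# Parameters as read from the table (A in cm^3 mol^-1 s^-1, E in kJ mol^-1);
# the row assignment is an assumption.
oh_h2 = dict(A=1.2e9, b=1.3, E=15.2)    # OH + H2 -> H2O + H
o_h2 = dict(A=1.8e10, b=1.0, E=37.3)    # O + H2 -> OH + H

for T in (300.0, 1000.0, 2000.0):
    print(f"T = {T:6.0f} K   k(OH + H2) = {rate_constant(T, **oh_h2):.3e}"
          f"   k(O + H2) = {rate_constant(T, **o_h2):.3e}   cm^3 mol^-1 s^-1")
```

Because of the factor $(T/\mathrm{K})^b$ the parameter $E$ is not identical to the Arrhenius activation energy obtained from the slope of ln k against 1/T, which is $E + bRT$ for this functional form.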










A3.4.9 SUMMARIZING OVERVIEW 


Although the field of gas-phase kinetics remains full of challenges it has reached a certain degree of maturity.
Many of the fundamental concepts of kinetics in general take a particularly clear and rigorous form in
gas-phase kinetics. The relation between fundamental quantum dynamical theory, empirical kinetic treatments,
and experimental measurements, for example of combustion processes [72], is most clearly established in
gas-phase kinetics. It is the aim of this article to review some of these most basic aspects. Details can be found in
the sections on applications as well as in the literature cited.

A3.5 Ion chemistry

A A Viggiano and Thomas M Miller 


A3.5.1 INTRODUCTION 

Ion chemistry is a product of the 20th century. J J Thomson discovered the electron in 1897 and identified it
as a constituent of all matter. Free positive ions (as distinct from ions deduced to exist in solids or
electrolytes) were first produced by Thomson just before the turn of the century. He produced beams of light
ions, and measured their mass-to-charge ratios, in the early 1900s, culminating in the discovery of two
isotopes of neon in 1912 [1]. This year also marked Thomson's discovery of H₃⁺, which turns out to be the
single most important astrophysical ion and which may be said to mark the beginning of the study of the
chemistry of ions. Thomson noted that 'the existence of this substance is interesting from a chemical point of
view', and the problem of its structure soon attracted the distinguished theorist Niels Bohr [2]. (In 1925, the
specific reaction producing H₃⁺ was recognized [2].) The mobilities of electrons and ions drifting in weak
electric fields were first measured by Thomson, Rutherford and Townsend at the Cavendish Laboratory of
Cambridge University in the closing years of the 19th century. The average mobility of the negative charge
carrier was observed to increase dramatically in some gases, while the positive charge carrier mobility was
unchanged (the anomalous mobility problem), which led to the hypothesis of electron attachment to
molecules to form negative ions [3]. In 1936, Eyring, Hirschfelder and Taylor calculated the rate constant for
an ion-molecule reaction (the production of H₃⁺!), showing it to be 100 times greater than for a typical neutral
reaction, but it was not until 20 years later that any ion-molecule rate constant was measured experimentally
[4]. Negative ion-molecule reactions were not studied at all until 1957 [5].

In this section, the wide diversity of techniques used to explore ion chemistry and ion structure will be 
outlined and a sampling of the applications of ion chemistry will be given in studies of lamps, lasers, plasma 
processing, ionospheres and interstellar clouds. 

Note that chemists tend to refer to positive ions as cations (attracted to the cathode in electrolysis) and 
negative ions as anions (attracted to an anode). In this section of the encyclopedia, the terms positive ion and 
negative ion will be used for the sake of clarity. 


A3.5.2 METHODOLOGIES 

A3.5.2.1 SPECTROSCOPY 
(A) ACTION SPECTROSCOPY 


The term action spectroscopy refers to how a particular 'action', or process, depends on photon energy. For
example, the photodissociation of O₂⁻ with UV light leads to energetic O⁻ + O fragments; the kinetic energy
released has been studied as a function of photon energy by Lavrich et al [6, 7]. Many of the processes
discussed in this section may yield such an action spectrum and we will deal with the processes individually.

(B) LASER INDUCED FLUORESCENCE 

Laser induced fluorescence (LIF) detection of molecules has served as a valuable tool in the study of gas- 
phase processes in combustion, in the atmosphere, and in plasmas [8, 9, 10, 11 and 12 ]. In the LIF technique, 
laser light is used to excite a particular level of an atom or molecule which then radiates (fluoresces) to some 
lower excited state or back to the ground state. It is the fluorescence photon which signifies detection of the 
target. Detection may be by measurement of the total fluorescence signal or by resolved fluorescence, in 
which the various rovibrational populations are separated. LIF is highly selective and may be used to detect 
molecules with densities as low as 10 cm"- 3 in low pressure situations (<0.1 Pa) where collisional quenching 
is negligible. In the presence of an atmosphere of air, the detection limit is about 10¹⁰ cm⁻³. The use of LIF
for ions is more difficult than for neutrals because a typical ion number density may be orders of magnitude 
lower than for neutrals. Nevertheless, important LIF work with ions has been reported. 

LIF has been used to study state-selected ion-atom and ion-molecule collisions in gas cells. Ar⁺ reactions
with N₂ and CO were investigated by Leone and colleagues in the 1980s [13, 14] and that group has
continued to contribute new understanding of the drifting and reaction of ions in gases, including studies of
velocity distributions and rotational alignment [15, 16, 17, 18 and 19]. The vibrational state dependence of the
charge transfer reaction

$$\mathrm{N_2^+}(v = 0, 1, 2) + \mathrm{X} \rightarrow \mathrm{N_2} + \mathrm{X^+}$$

where X = Ar or O₂ and the collisional deactivation reaction

$$\mathrm{N_2^+}(v = 1, 2) + \mathrm{X} \rightarrow \mathrm{N_2^+}(v' < v) + \mathrm{X}$$

were studied by this group using LIF [20]. They showed that charge transfer is enhanced by vibrational
excitation and that vibrational deactivation is much more likely with O₂ than with Ar. We also consider here a
reaction that displays both electron-transfer and proton-transfer channels,

$$\mathrm{DBr^+}(^2\Pi_i, v^+, J^+) + \mathrm{HBr} \rightarrow \mathrm{HBr^+}(^2\Pi_{i'}, v'^+, J'^+) + \mathrm{DBr}$$

and

$$\mathrm{HBr^+}(^2\Pi_i, v^+ = 0, J^+) + \mathrm{HBr} \rightarrow \mathrm{H_2Br^+} + \mathrm{Br}$$

studied via LIF of the $A\,^2\Sigma - X\,^2\Pi_{1/2,\,3/2}$ (0,0) bands of HBr⁺, using photons in the range 358-378 nm [21, 22].
For the electron transfer reaction, it was found that any excess energy in the process was statistically
partitioned among all degrees of freedom of the complex and was manifested in the LIF spectra as rotational
heating. Flow tube experiments tuned to different Br isotopes also showed a hydrogen-atom transfer channel
in the HBr⁺ + HBr reaction.


LIF is also used with liquid and solid samples. For example, LIF is used to detect UO₂²⁺ ions in minerals; the
uranyl ion is responsible for the bright green fluorescence given off by minerals such as autunite and opal
upon exposure to UV light [23].

(C) PHOTODISSOCIATION OF IONS 

Photodissociation of molecular ions occurs when a photon is absorbed by the ion and the energy is released 
(at least partly) by the breaking of one of the molecular bonds. The photodissociation of a molecular ion is 
conceptually similar to that for neutral molecules, but the experimental techniques differ. Photodissociation 
events are divided into two categories: direct dissociation, in which the photoexcitation is from a bound state 
to a repulsive state and predissociation, in which a quasi-bound state is accessed in the excitation. Direct 
dissociation takes place rapidly (fs to ps timescale). The shape of the direct dissociation cross section curve 
against photon energy is governed by the (Franck-Condon) overlap of wavefunctions of the initial state 
(usually the ground state) and the final, repulsive state. It will normally consist of peaks corresponding to 
vibrational structure in the initial level of the target ion with shapes skewed by the overlap with the repulsive 
state. One can model these shapes to obtain the potential curves of both the initial and repulsive states. 
Predissociation, in contrast, may take place over a much longer timescale; the lifetime of a particular 
predissociating state may be determined from the width of the resonance observed. Measurements of the
lifetime for a series of predissociating states give a picture of the predissociation mechanism.
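The conversion from resonance width to lifetime can be sketched as follows. The example assumes a purely lifetime-broadened (Lorentzian) line whose full width at half maximum is given in wavenumbers, for which $\tau = 1/(2\pi c\,\tilde{\Gamma})$; the function name and the sample widths are illustrative assumptions.

```python
import math

C_CM_PER_S = 2.99792458e10   # speed of light in cm/s

def lifetime_from_width(fwhm_wavenumber):
    """Lifetime in seconds of a state whose Lorentzian resonance has the
    given full width at half maximum in cm^-1: tau = 1 / (2 pi c Gamma)."""
    if fwhm_wavenumber <= 0.0:
        raise ValueError("width must be positive")
    return 1.0 / (2.0 * math.pi * C_CM_PER_S * fwhm_wavenumber)

# Assumed widths, chosen only to span a few orders of magnitude.
for width in (10.0, 1.0, 0.01):
    print(f"width = {width:6.2f} cm^-1  ->  lifetime = {lifetime_from_width(width):.3e} s")
```

In practice the instrumental and Doppler contributions to the observed width have to be removed first, otherwise the result is only a lower bound on the lifetime.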

Photodissociation cross sections tend to peak in the 10⁻¹⁸ cm² range and hence are often given in Mb
(megabarn) units.

There are many experimental methods by which photodissociation of ions has been studied. The earliest
were crossed-beams experiments on ^beginning in the late 1960s [24, 25 and 26] and experiments on a 

variety of ions in the 1970s using drift tubes [27, 28 and 29]. Later techniques allowed more detailed
information to be obtained on state symmetries and kinetic energy releases [30, 31 and 32]. Figure A3.5.1
shows the fast ion beam photofragment spectrometer at SRI International; similar apparatus is in use at other
institutions [33]. The apparatus consists of an ion source and mass selector, two electrostatic quadrupole
benders that allow a laser beam to interact coaxially with the ion beam, a product-ion (photofragment) energy
analyser and a particle detector. An interesting feature of the coaxial beam technique, aside from the long
interaction region, is that sub-Doppler line widths can be obtained because of a thousandfold or more
narrowing of the ion velocity distribution in the centre-of-mass reference frame for typical keV ion energies.
By the same token, photofragment ions that differ in energy by a tenth of an electron volt in the
centre-of-mass frame will be separated by typically 10-20 eV in the laboratory frame. This simplifies the job of the
photofragment energy analyser. An example of a photofragment kinetic energy spectrum is shown in figure
A3.5.2. If the laser beam is sent at right angles to the ion beam (instead of coaxially), the optical polarization
vector can be rotated to map out angular distributions of photofragments. (In the coaxial arrangement, the
optical polarization is necessarily always perpendicular to the ion beam direction.)

Figure A3.5.1. The fast ion beam photofragment spectrometer at SRI International. 'L' labels electrostatic
lenses, 'D' labels deflectors and 'A' labels apertures.

Figure A3.5.2. The Ar⁺ photofragment energy spectrum for the dissociation of 3 keV Ar₂⁺ ions at 752.5 nm.
The upper scale gives the kinetic energy release in the centre-of-mass reference frame, both parallel and
antiparallel to the ion beam velocity vector in the laboratory.
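The magnification of small centre-of-mass energies in the laboratory frame follows from adding the fragment velocity to the beam velocity. The sketch below uses standard non-relativistic kinematics; the function name and the example (a homonuclear diatomic ion at 3 keV, so that each fragment carries half the mass and half the total release W) are assumptions for illustration.

```python
import math

def lab_energy(beam_energy, m_fragment, m_parent, w_fragment, cos_theta):
    """Laboratory kinetic energy (eV) of a fragment ejected with centre-of-mass
    kinetic energy w_fragment (eV) at angle theta to the beam direction."""
    e_frame = beam_energy * m_fragment / m_parent   # fragment energy for zero release
    return e_frame + w_fragment + 2.0 * cos_theta * math.sqrt(e_frame * w_fragment)

beam = 3000.0   # eV, assumed beam energy
for W in (0.0, 0.1, 0.2):
    forward = lab_energy(beam, 1.0, 2.0, W / 2.0, +1.0)
    backward = lab_energy(beam, 1.0, 2.0, W / 2.0, -1.0)
    print(f"W = {W:.1f} eV: forward {forward:8.2f} eV, backward {backward:8.2f} eV, "
          f"splitting {forward - backward:6.2f} eV")
```

The cross term $2\cos\theta\,(E_\mathrm{frame} W_\mathrm{f})^{1/2}$ is what spreads a release of a tenth of an electron volt over tens of electron volts in the laboratory, and it is also why fragments ejected parallel and antiparallel to the beam appear symmetrically on either side of the zero-release energy.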


In the past decade there has been photodissociation work on doubly charged positive ions, e.g., 

Ni + . NO; + , CF^ + , CCI^"\ SiH + and SiF^ + [24]- Interestingly, the result of photoexcitation of the latter four 

molecular ions is the loss of neutrals, as a consequence of the electronic structures. There are two possible 
scenarios, illustrated by 

NQ- + +Ai/-^ N + +0*andSiFf +hv^> SiF^+F. 

There has been much activity in the study of photodissociation of cluster ions, dating back to the 1970s when 
it was realized that most ions in the earth's lower atmosphere were heavily clustered [7, 35, 36 ]. 


(D) PHOTOELECTRON SPECTROSCOPY 

Photoelectron spectroscopy (PES) of negative ions involves irradiation of an ion beam with laser light and 
energy analysis of the electrons liberated when the photon energy exceeds the binding energy of the electron. 
The kinetic energy of the detached electron is the difference between the photon energy and the binding 
energy of the electron [37, 38, 39, 40, 41, 42, 43, 44, 45, 46 and 42]. Analysis of the electron energy thus 
gives a direct measurement of the electron affinity of the corresponding neutral atom or molecule, a very 
important thermochemical quantity. Generally speaking, PES yields more information about the neutral atom 
or molecule than the corresponding negative ion, because the target ion is ideally in its ground state and the 
electron kinetic energy is then dependent on the final state of the neutral product. The energy resolution of a 
PES experiment is usually adequate (often 5-10 meV) to resolve vibrational structure due to the neutral 
molecule, certainly for low-mass systems of few atoms and likewise electronic structure, including singlet- 
triplet splittings and fine structure separations. In a few cases, rotational energy levels have been resolved. 
Features may appear in a photoelectron spectrum due to excited levels of the target negative ion and give 
valuable information about the structure of the negative ion, but at the cost of complicating the spectrum. 
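The energy balance of a photodetachment experiment can be written out in a few lines. The following sketch is illustrative: the function names and the sample binding energy are assumptions, and only the relation between photon energy, binding energy and electron kinetic energy is taken from the text.

```python
HC_EV_NM = 1239.841984   # h*c in eV nm

def photon_energy(wavelength_nm):
    """Photon energy in eV for a vacuum wavelength in nm."""
    return HC_EV_NM / wavelength_nm

def electron_kinetic_energy(wavelength_nm, binding_energy_ev):
    """Kinetic energy of the detached electron: photon energy minus binding energy."""
    eke = photon_energy(wavelength_nm) - binding_energy_ev
    if eke < 0.0:
        raise ValueError("photon energy is below the detachment threshold")
    return eke

def binding_energy(wavelength_nm, measured_eke_ev):
    """Electron binding energy recovered from a measured electron kinetic energy."""
    return photon_energy(wavelength_nm) - measured_eke_ev

# 488 nm argon-ion laser line; the 0.75 eV binding energy is a made-up example.
print(f"photon energy at 488 nm: {photon_energy(488.0):.4f} eV")
print(f"electron kinetic energy: {electron_kinetic_energy(488.0, 0.75):.4f} eV")
```

When `binding_energy` is applied to the peak that connects the ground states of the ion and the neutral, the result is the electron affinity; applied to other peaks it gives the binding energy relative to the corresponding excited final state.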

An example of a PES apparatus is shown in figure A3.5.3. A PES apparatus consists of (a) an ion source, (b) a
fixed-frequency laser, (c) an interaction region, (d) an electron energy analyser and (e) an electron detector.
Ion sources include gas discharge, sputtering, electron-impact and flowing afterglow. The laser may be cw
(the argon-ion laser operated at 488 nm is common) or pulsed (which allows frequency doubling etc.). Recent
trends have been toward UV laser light because the negative ions of importance in practical chemistry (e.g.,
atmospheric chemistry and biochemistry) tend to be strongly bound and because the more energetic light
allows one to access more electronic and vibrational states. The interaction region may include a magnetic
field that routes detached electrons toward the energy analyser. The energy analyser is either a hemispherical
electrostatic device or a time-of-flight energy analyser; the latter is especially suited to a pulsed-laser system.

Figure A3.5.3. The negative ion photoelectron spectrometer used at the University of Colorado. The
apparatus now contains a UV-buildup cavity inside the vacuum system (not shown in this sketch).

Data obtained with a PES apparatus are shown in figure A3.5.4 [48]. Interpretation of the spectrum for a
diatomic molecule is particularly straightforward. Peaks to the left of the origin band (each band containing
unresolved rotational structure) are spaced by the vibrational separation in the neutral molecule, and their
relative intensities are determined by the amount of spatial overlap between wavefunctions for the negative
ion and the neutral molecule (Franck-Condon factors). Peaks to the right of the origin band are spaced by the
vibrational separation in the negative ion and their relative intensities give the effective temperature of the ion
source. Subtracting the electron kinetic energy corresponding to the origin band from the photon energy yields
the electron affinity of the molecule. The energy of the maximum of the envelope of neutral molecule peaks is
referred to as the vertical detachment energy, i.e., the energy required to go from the ground vibrational state
of the negative ion to the neutral molecule with no change in nuclear geometry. Photoelectron spectra are often
far more complicated than the example shown, especially for polyatomic molecules. The origin band may have
zero intensity, making the electron affinity difficult to determine directly. Fine structure at least doubles the
number of peaks in the spectrum. PES experiments have been carried out for doubly charged negative ions [49]
and using multiphoton detachment [37].

Figure A3.5.4. The 488 nm photoelectron spectrum of NaCl⁻. The arrow marks the origin band, for
transitions from NaCl⁻ (v = 0) to NaCl (v = 0), from which the electron affinity of NaCl is obtained.

It is advantageous if the laser system permits rotation of the optical polarization. Detached electrons
correlated with different final electronic states of the neutral molecule will generally be emitted with different
angular distributions about the direction of polarization. Measurement of the angular distribution helps in the
interpretation of complex photoelectron spectra. The angular distribution $I(\theta)$ of photoelectrons is [50]

$$I(\theta) = [1 + \beta P_2(\cos\theta)]/4\pi$$

where $\theta$ is the angle between the optical polarization vector and the direction of emission of a photoelectron, $\beta$
is the asymmetry parameter ($\beta$ is in the range -1 to 2 and in general depends on photon energy) and $P_2(\cos\theta)$
is the second-order Legendre polynomial, $(3\cos^2\theta - 1)/2$. $P_2(\cos\theta)$ is zero if $\theta$ = 54.7°, in which case the
detected electron signal is proportional to the angle-averaged detachment cross section. The distribution
function is independent of the azimuthal angle.
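The properties of this distribution are easy to verify numerically. The sketch below is a minimal illustration with hypothetical function names: it evaluates the distribution, computes the angle at which the second-order Legendre polynomial vanishes, and integrates the distribution over all emission directions with a simple midpoint rule.

```python
import math

def legendre_p2(x):
    """Second-order Legendre polynomial (3x^2 - 1)/2."""
    return 0.5 * (3.0 * x * x - 1.0)

def angular_distribution(theta, beta):
    """I(theta) = [1 + beta P2(cos theta)] / 4 pi, per unit solid angle."""
    if not -1.0 <= beta <= 2.0:
        raise ValueError("beta must lie between -1 and 2")
    return (1.0 + beta * legendre_p2(math.cos(theta))) / (4.0 * math.pi)

# Angle at which P2 vanishes: cos^2(theta) = 1/3.
magic_angle = math.acos(1.0 / math.sqrt(3.0))

def integrate_over_sphere(beta, steps=20000):
    """Midpoint-rule integral of I(theta) over the full solid angle."""
    d_theta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        total += angular_distribution(theta, beta) * 2.0 * math.pi * math.sin(theta) * d_theta
    return total

print(f"magic angle = {math.degrees(magic_angle):.2f} degrees")
for beta in (-1.0, 0.0, 2.0):
    print(f"beta = {beta:4.1f}: I(0) = {angular_distribution(0.0, beta):.4f}, "
          f"I(magic) = {angular_distribution(magic_angle, beta):.4f}, "
          f"integral = {integrate_over_sphere(beta):.6f}")
```

The integral is unity for every allowed value of the asymmetry parameter, and the value at 54.7° does not depend on it, which is why a detector placed at that angle samples the angle-averaged cross section.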

ZEKE (zero kinetic energy) photoelectron spectroscopy has also been applied to negative ions [51]. In ZEKE
work, the laser wavelength is swept through photodetachment thresholds and only electrons with near-zero
kinetic energy are allowed into a detector, resulting in narrow threshold peaks. The resulting resolution
(2-3 cm⁻¹) is superior to that commonly encountered with PES (40-300 cm⁻¹).

PES of neutral molecules to give positive ions is a much older field [52]. The information is valuable to
chemists because it tells one about unoccupied orbitals in the neutral that may become occupied in chemical
reactions. Since UV light is needed to ionize neutrals, UV lamps and synchrotron radiation have been used as
well as UV laser light. With suitable electron-energy resolution, vibrational states of the positive ions can be
resolved, as with the negative-ion PES described above. The angular distribution of photoelectrons can also be
determined as described above.

(E) ABSORPTION 

In absorption spectroscopy, the attenuation of light as it passes through a sample is measured as a function of
wavelength. The attenuation is due to rovibrational or electronic transitions occurring in the sample. Mapping
out the attenuation versus photon frequency gives a description of the molecule or molecules responsible for
the absorption. The attenuation at a particular frequency follows the Beer-Lambert law,

$$I = I_0 \exp(-\sigma n L)$$

where $I$ and $I_0$ are the attenuated and unattenuated intensities, $\sigma$ is the cross section, $n$ is the number density of
target molecules and $L$ is the path length. Broadening of spectral lines may be observed, and is classed as
homogeneous broadening (e.g., collisions with other molecules and laser power effects) or inhomogeneous
broadening (e.g., Doppler broadening) [53, 54]. So-called 'UV/vis' absorption spectroscopy is a standard tool
for analysis of chemical samples. In organic samples, absorption by functional groups in the sample aids in
identification of the species because it is strongly dependent upon the relative number of single, double, and
triple bonds.
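The Beer-Lambert law above can be used in both directions: to predict the attenuation for a given sample, or to recover a number density from a measured transmission. The sketch below is illustrative; the function names and the sample values (a cross section of 1 Mb, a dilute ion density and a 1 m path) are assumptions.

```python
import math

def transmitted_fraction(sigma_cm2, density_cm3, path_cm):
    """I / I0 = exp(-sigma n L)."""
    return math.exp(-sigma_cm2 * density_cm3 * path_cm)

def absorbed_fraction(sigma_cm2, density_cm3, path_cm):
    """1 - I / I0, evaluated with expm1 so that very small values stay accurate."""
    return -math.expm1(-sigma_cm2 * density_cm3 * path_cm)

def number_density(sigma_cm2, path_cm, transmitted):
    """Number density (cm^-3) recovered from a measured transmitted fraction."""
    return -math.log(transmitted) / (sigma_cm2 * path_cm)

sigma = 1.0e-18   # cm^2 (1 Mb), assumed
path = 100.0      # cm, assumed
for n in (1.0e10, 1.0e13, 1.0e16):
    print(f"n = {n:.1e} cm^-3: absorbed fraction = {absorbed_fraction(sigma, n, path):.3e}")
```

For the lowest density in this example only about one part in a million of the light is absorbed, which illustrates why direct absorption measurements on dilute samples are demanding.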

Microwave spectra (giving pure rotational spectra) are especially useful for the detection of interstellar 
molecular ions (in some cases the microwave spectrum has first been observed in interstellar spectra!). 

Typically a DC glow discharge tube is used to produce the target ion (e.g., HCO + ) [55, 56]. If the photons 
travel parallel or antiparallel to the electric field direction, there is a small but measurable Doppler shift in 
frequencies. This is due to the drift velocity of ions in the electric field, which may aid in distinguishing ion 
spectra from neutral spectra, but in any case must be accounted for [55, 58]. Infrared spectroscopy has also 
been carried out on ions in a glow discharge tube using the beat frequency between a fixed-frequency visible
laser and a tunable dye laser. The difference frequency laser, in the IR, irradiates a long discharge tube. The
method was first used to study the important astrophysical ion H₃⁺ [59]. Velocity modulation spectroscopy
utilizes an audio frequency glow discharge coupled with phase synchronous demodulation of the absorbed IR
laser radiation to take advantage of the Doppler shift occurring for ions drifting in glow discharge tubes.
Many important positive ions, such as HiO\ NH^and H^, have been studied with this technique with the high 

precision common to IR spectroscopy [58, 60].

Far-infrared spectra of great sensitivity may be obtained with laser magnetic resonance (LMR). The 
sensitivity comes about because the gas sample is located inside the long cavity of the laser where the 
circulating power is typically 100 times that used in extracavity work. A discharge in the gas cell produces the 
radical species to be studied, and an axial magnetic field is varied to bring energy levels into resonance with
one of many laser lines. HBr⁺ was the first ion to be observed with LMR, in 1979. OH⁻ was one of the first
negative ions to have been detected by direct absorption spectroscopy [61].

Strictly speaking, the term absorption spectroscopy refers to measurements of light intensity. In practice, the
absorption may be deduced from the detection of electrons or ions produced in the process, such as in 
absorption of light leading to photodetachment or photodissociation, i.e., action spectra. In the absorption 
spectroscopy of ions, this is a natural tack to take as the charged-particle production can be detected with 
greater precision than is possible for a measurement of a small change in the light intensity. The most 
important of such experiment types is the coaxial beams spectrometer, one of which is described in detail in 
the section on dissociation of ions. In these experiments, the ions are identified by mass and collimated, and 
interact with the laser beam over a long distance (0.25-1 m). The method was first used with ions in 1976 for
HD⁺, with the absorption events detected via enhanced production of buffer gas ions as a result of charge
transfer reactions [62]. Since this time many small ions have been studied in great detail, notably
H₃⁺, O₂⁺, CH⁺, H₂O⁺ and CO⁺ [31, 32, 63]. A few negative ions have been studied using coaxial fast-ion/laser
beams. The high-resolution IR spectrum of NH⁻, for example, was studied in this manner. The negative ion
was excited to an autodetaching state with a photon energy greater than the electron affinity of NH. Detection
of autodetached electrons signified an absorption event. Aside from determination of spectroscopic constants
for NH⁻, information on autodetachment dynamics was obtained [64].

A3.5.2.2 KINETICS AND DYNAMICS 

In principle the study of ion-molecule kinetics and dynamics is no different from studies of the same 
processes in neutral species; however, there are additional forces that govern reactivity, often leading to 
behaviour that is fundamentally different from neutral processes. An important factor in determining ion- 
molecule rate constants and cross sections is the rate at which the reactants collide, i.e. the collision rate. In 
contrast to neutral kinetics, the collision rate at low energies or temperatures is determined not by the size of 
the molecule but by electrical forces. The ion-molecule collision rate is determined by the classical capture 
cross section for a point charge interacting with a structureless multipole. This was first described analytically 
for a point charge interacting with a polarizable species with no other multipole. In this case, the collisional 
value of the rate constant is independent of temperature [65]. The only other force of any significance is from 
the ion-permanent dipole interaction. Other forces, such as those between the ion and the quadrupole moment 
of the neutral, and between the neutral dipole and the induced dipole of the ion, have been shown to be of 
minor importance [58, 59, 60, 61, 62 and 63]. If the physical size of the reactants is greater than the capture 
radius, e.g. at translational energies of several tenths of an electron volt and greater, more conventional 
notions apply. Except for species with very small polarizabilities and systems of large mass, ion-molecule
collision rates are above 10⁻⁹ cm³ molecule⁻¹ s⁻¹, or about a factor of ten larger than neutral collision rates.
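
The temperature-independent capture limit for a point charge and a polarizable neutral is commonly written, in Gaussian units, as $k = 2\pi q\sqrt{\alpha/\mu}$ and is usually called the Langevin rate constant. The following is a minimal sketch of that expression; the function name and the O₂ polarizability of 1.58 Å³ are assumptions introduced here for illustration, not values taken from this chapter.

```python
import math

E_ESU = 4.80320e-10   # elementary charge in esu (Gaussian units)
AMU_G = 1.66054e-24   # atomic mass unit in grams


def langevin_rate_constant(alpha_A3, m_ion_amu, m_neutral_amu, charge=1):
    """Capture rate constant (cm^3 s^-1) for a point charge and a polarizable neutral.

    alpha_A3 is the neutral polarizability in cubic angstroms (1 A^3 = 1e-24 cm^3).
    No temperature appears: in this limit the rate constant is temperature independent.
    """
    alpha_cm3 = alpha_A3 * 1.0e-24
    mu_g = m_ion_amu * m_neutral_amu / (m_ion_amu + m_neutral_amu) * AMU_G
    return 2.0 * math.pi * charge * E_ESU * math.sqrt(alpha_cm3 / mu_g)


# Illustrative case: Ar+ (39.96 amu) colliding with O2 (32.00 amu, alpha assumed 1.58 A^3).
k_capture = langevin_rate_constant(alpha_A3=1.58, m_ion_amu=39.96, m_neutral_amu=32.00)
print(f"k_capture = {k_capture:.2e} cm^3 s^-1")
```

With these assumed inputs the result is roughly 7 × 10⁻¹⁰ cm³ s⁻¹. A permanent dipole on the neutral would add to this value, and that contribution is not included in the sketch.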

Several processes are unique to ions. A common reaction type in which no chemical rearrangement occurs but 
rather an electron is transferred to a positive ion or from a negative ion is termed charge transfer or electron 
transfer. Proton transfer is also common in both positive and negative ion reactions. Many proton- and 
electron-transfer reactions occur at or near the collision rate [72]. A reaction pertaining only to negative ions 
is associative detachment [73, 74],

$$\mathrm{A}^- + \mathrm{B} \rightarrow \mathrm{AB} + e^-.$$

Associative detachment reactions are important in controlling the electron concentration in the earth's 
mesosphere [75]. Reactions in which more than one neutral product are formed also occur and are sometimes 
referred to as reactive detachment [76]. 

Several reactivity trends are worth noting. Reactions that are rapid frequently stay rapid as the temperature or 
centre-of-mass kinetic energy of the reactants is varied. Slow exothermic reactions almost always show 
behaviour such that 
the rate constant decreases with increasing temperature at low temperatures or kinetic energies and then
increases at higher temperature. As an example, figure A3.5.5 shows rate constants for the charge-transfer
reaction of Ar⁺ with O₂ as a function of temperature. The data are from five separate experiments and four
experimental techniques [77, 78, 79, 80 and 81] and cover the extremely wide temperature range of 0.8 K to 
1400 K. The extremely low temperature data are relatively flat. At approximately 20 K, the rate constants 
decrease. The decrease is described by a power law. A minimum is found at 800-900 K and a steep increase 
is found above 1000 K. The position of the minimum varies considerably for other reactions. 

Figure A3.5.5. Rate constants for the reaction of Ar⁺ with O₂ as a function of temperature. CRESU stands for
the French translation of reaction kinetics at supersonic conditions, SIFT is selected ion flow tube, FA is 
flowing afterglow and HTFA is high temperature flowing afterglow. 

The decrease in reactivity with increasing temperature is due to the fact that many low-energy ion-molecule 
reactions proceed through a double-well potential with the following mechanism [82]: 


$$\mathrm{A}^\pm + \mathrm{B} \rightleftharpoons (\mathrm{AB})^{\pm *} \rightarrow \text{products}$$


The minimum energy pathway for the reaction of Cl⁻ with CH₃Br is shown in figure A3.5.6 [83]. As the
reactants approach they are attracted by the ion-dipole and ion-induced-dipole forces and enter the entrance 
channel complex. As the reaction proceeds along the minimum energy path, the potential energy increases due 
to the forces necessary for rearrangement. The species then enter the product well and finally separate into 
products. The two wells are separated by a barrier that is often below the energy of the reactants but still plays 
an important role in controlling reactivity. The decrease in the overall rate constant with increasing 
temperature is due to the rate constant for dissociation of the collision complex back to reactants
increasing more rapidly with temperature than the rate constant for the complex
going to products [68]. The increase at higher energies and temperatures is often due to new channels
opening, including new vibrational and electronic states as well as new chemical channels.

Figure A3.5.6. Minimum energy pathway for the reaction of Cl⁻ with CH₃Br.
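
The negative temperature dependence can be made explicit with a steady-state treatment of the collision complex in the double-well mechanism. The labels $k_c$ (capture), $k_b$ (dissociation of the complex back to reactants) and $k_p$ (passage of the complex to products) are introduced here only for illustration and are not necessarily the symbols of the original scheme.

```latex
\frac{d[(\mathrm{AB})^{\pm *}]}{dt}
  = k_c[\mathrm{A}^\pm][\mathrm{B}] - (k_b + k_p)[(\mathrm{AB})^{\pm *}] \approx 0
\quad\Longrightarrow\quad
k_{\mathrm{obs}} = k_c\,\frac{k_p}{k_b + k_p}
```

When $k_p \gg k_b$ the reaction proceeds at the collision rate, $k_{\mathrm{obs}} \approx k_c$. When $k_b \gg k_p$, $k_{\mathrm{obs}} \approx k_c k_p / k_b$, so a back-dissociation rate that grows faster with temperature than $k_p$ lowers the observed rate constant.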

Rotational and translational energy have been shown to be equivalent in controlling most reactions [84]. 
Vibrational energy often increases reactivity; however, sometimes it does not affect reactivity, or occasionally 
decreases reactivity. The following sections describe several of the more common techniques used to measure 
ion-molecule rate constants or cross sections. 

(A) FLOW TUBES 

Flow tube studies of ion-molecule reactions date back to the early 1960s, when the flowing afterglow was 
adapted to study ion kinetics [85]. This represented a major advance since the flowing afterglow is a thermal 
device under most situations and previous instruments were not. Since that time, many iterations of the ion- 
molecule flow tube have been developed and it is an extremely flexible method for studying ion-molecule 
reactions [86, 87, 88, 89, 90, 91 and 92]. 

The basic flow system is conceptually straightforward. A carrier gas, often helium, flows into the upstream 
end of a tube approximately 1 m long with a radius of several centimetres. The buffer gas pressure is
approximately 100 Pa. Ions are either created in the flow tube or injected from an external source at the
upstream end of the pipe. The carrier gas transports the ions downstream at approximately 100 m s⁻¹. Part
way down the tube, a neutral reactant is added and the ions created in the source region are transformed into 
products. Conditions are chosen so that all ion chemistry leading to the reactant ion is complete and the ions 
are thermalized before they encounter the neutral reagent gas. The rate constant is determined by sampling a 
small portion of the gas with a quadrupole mass spectrometer and monitoring the disappearance of the
primary ion and the appearance of product ions. For the reaction A⁺ + B → products, the rate constant is
given by

$$k = \frac{1}{[\mathrm{B}]\,\tau}\,\ln\!\left(\frac{[\mathrm{A}^+_0]}{[\mathrm{A}^+]}\right) \qquad \text{(A3.5.1)}$$

where $[\mathrm{B}]$ is the reactant neutral concentration in the flow tube, $\tau$ is the reaction time and $[\mathrm{A}^+]$ and $[\mathrm{A}^+_0]$ are
the ion concentrations with and without reactant neutrals in the flow tube, respectively. This equation assumes that the
concentration of B is much greater than that of A⁺, i.e. first-order kinetics apply. This situation applies to all
ion kinetics since it is difficult to make large quantities of ions. Fortunately, the derivation of the rate constant 
depends only on the relative ion concentration, which is much easier to measure than the absolute 
concentration. The reaction time is determined from the flow velocity of the carrier gas. The average ion flow 
velocity is approximately 1.6 times the average neutral flow velocity [87], a result of ion diffusion, ions being 
neutralized on the flow tube walls and the carrier gas having a parabolic flow profile characteristic of laminar 
flow. 
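
As a rough illustration of how equation (A3.5.1) is applied, the sketch below converts a pair of ion signals into a rate constant. The function names, the 0.5 m reaction length and the signal values are hypothetical; only the velocity ratio of about 1.6 and the carrier-gas velocity of about 100 m s⁻¹ come from the text.

```python
import math


def reaction_time_s(distance_m, carrier_velocity_m_s, ion_velocity_ratio=1.6):
    """Reaction time from the reaction-zone length and the carrier-gas velocity.

    ion_velocity_ratio is the ion/neutral flow-velocity ratio (about 1.6 in the text).
    """
    return distance_m / (ion_velocity_ratio * carrier_velocity_m_s)


def flow_tube_rate_constant(signal_without_b, signal_with_b, conc_b_cm3, tau_s):
    """Equation (A3.5.1): k = ln([A0+]/[A+]) / ([B] tau), in cm^3 s^-1.

    Only the ratio of ion signals enters, so relative signals (counts) suffice.
    """
    if signal_without_b <= 0 or signal_with_b <= 0:
        raise ValueError("ion signals must be positive")
    return math.log(signal_without_b / signal_with_b) / (conc_b_cm3 * tau_s)


# Hypothetical run: 0.5 m reaction zone, 100 m/s carrier gas, 1e12 cm^-3 of B,
# primary-ion signal falling from 10000 to 1000 counts when B is added.
tau = reaction_time_s(0.5, 100.0)
k = flow_tube_rate_constant(10000.0, 1000.0, 1.0e12, tau)
print(f"tau = {tau * 1e3:.2f} ms, k = {k:.2e} cm^3 s^-1")
```

In practice the signal is usually recorded at several values of [B] and the slope of ln[A⁺] against [B] is used, which is less sensitive to noise in any single point.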

The basic system described above can be easily modified to study many processes. Figure A3.5.7 shows an
example of a modern ion-molecule flow tube [93] with a number of interesting features. First, ions are created
external to the flow tube. Any suitable ion source can be used, including high- and low-pressure
electron-impact ion sources, a supersonic-expansion source (shown) or a flow-tube source. Once created the ions are
injected into a quadrupole mass spectrometer and only ions with the proper mass are injected into the flow
tube through a Venturi inlet. Under favourable circumstances, only one ion species enters the flow tube. This
configuration is called the selected-ion flow tube (SIFT) [89, 90 and 91]. Alternatively, ions can be created in
the carrier-gas flow by a filament or discharge. Neutral reagents are added through a variety of inlets. 
Unstable species such as O, H and N atoms, molecular radicals and vibrationally excited diatomics can be 
injected by passing the appropriate gas through a microwave discharge. In a SIFT, the chemistry is usually 
straightforward since there is only one reactant ion and one neutral present in the flow tube. 

Figure A3.5.7. Schematic diagram of a selected ion flow drift tube with supersonic expansion ion source.

Flowing afterglows and SIFTs have been operated between 80 K and 1800 K. In addition, the ion kinetic
energy can be varied by adding a drift tube either at room temperature [94, 95, 96 and 92] or with temperature 
variability [84]. A drift tube is a series of rings electrically connected by uniform resistors. Applying a voltage 
to the resistor chain forms a uniform electric field in the flow tube. Ions are accelerated by the electric field 
and decelerated by collisions so that a steady-state velocity results. The ion kinetic energy in the centre of
mass, $\langle \mathrm{KE}_{\mathrm{cm}} \rangle$, is given by the Wannier expression [98]

$$\langle \mathrm{KE}_{\mathrm{cm}} \rangle = \frac{(m_i + m_b)\,m_n}{2\,(m_i + m_n)}\,v_d^2 + \frac{3}{2}kT \qquad \text{(A3.5.2)}$$

where $m_i$, $m_n$ and $m_b$ are the masses of the ion, neutral and buffer, respectively, $v_d$ is the velocity of the ion due
to the electric field and $T$ is the temperature. Equation (A3.5.2) shows that the kinetic energy in a drift tube is
the sum of a thermal component ($\frac{3}{2}kT$) and a drift-field component. At a fixed kinetic energy, varying the
contribution of each term yields information on rotational and vibrational effects [84]. Excited-state effects
can be studied in a number of other ways. Electronic excitation often occurs in SIFT studies of atomic ions. 
Vibrational effects can be studied by exciting neutral diatomics in a microwave discharge. Ion vibrations can 
be excited in the source and monitored by LIF or judicious choice of a reactant neutral, i.e. one that reacts 
differently with excited states than for ground states. Often one looks for a reaction that is endothermic with 
respect to the ground state and energetically allowed for the excited state. Product-state information can be 
obtained by the monitor method or through optical spectroscopy. This list of possibilities is not exhaustive but 
it does give a sample of the type of information that can be obtained. 
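
The centre-of-mass kinetic energy in a drift tube can be evaluated numerically from the Wannier expression in its commonly quoted form, a drift term $(m_i + m_b)m_n v_d^2 / [2(m_i + m_n)]$ plus the thermal term $\frac{3}{2}kT$. The sketch below assumes that form; the function name and the example masses and drift velocity are hypothetical.

```python
K_B = 1.380649e-23      # Boltzmann constant, J K^-1
AMU_KG = 1.66053907e-27  # atomic mass unit, kg
EV_J = 1.602176634e-19   # one electron volt in joules


def wannier_ke_cm_ev(m_ion_amu, m_neutral_amu, m_buffer_amu, v_drift_m_s, temperature_k):
    """Centre-of-mass kinetic energy (eV): drift-field term plus (3/2) kT."""
    reduced = (m_ion_amu + m_buffer_amu) * m_neutral_amu / (m_ion_amu + m_neutral_amu)
    drift_j = 0.5 * reduced * AMU_KG * v_drift_m_s ** 2
    thermal_j = 1.5 * K_B * temperature_k
    return (drift_j + thermal_j) / EV_J


# Hypothetical case: an ion of mass 40 reacting with a neutral of mass 32
# in a helium buffer (mass 4) at 300 K with a 1000 m/s drift velocity.
ke_thermal = wannier_ke_cm_ev(40.0, 32.0, 4.0, 0.0, 300.0)
ke_drift = wannier_ke_cm_ev(40.0, 32.0, 4.0, 1000.0, 300.0)
print(f"no field: {ke_thermal:.4f} eV, with drift: {ke_drift:.4f} eV")
```

The same total energy can be reached with a hot buffer and no field or a cold buffer and a strong field, which is what allows internal-energy effects to be separated from translational ones.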

(B) TRAPS 

Another powerful class of instrumentation used to study ion-molecule reactivity is trapping devices. Traps 
use electric and magnetic fields to store ions for an appreciable length of time, ranging from milliseconds to 
thousands of seconds. Generally, these devices run at low pressure and thus can be used to obtain data at 
pressures well below the range in which flow tubes operate. 

The most widely used type of trap for the study of ion-molecule reactivity is the ion-cyclotron-resonance
(ICR) [99] mass spectrometer and its successor, the Fourier-transform mass spectrometer (FTMS) [100, 101].
Figure A3.5.8 shows the cubic trapping cell used in many FTMS instruments [101]. Ions are created in or
injected into a cubic cell in a vacuum of 10 Pa or lower. A magnetic field, B, confines the motion in the x-y
plane through ion-cyclotron motion. The frequency of motion, ω, is given as 1.537 × 10⁷ Be/m (in Hz), where B is the
magnetic field in tesla (typically 1-7 T), e is the ion charge in units of the elementary charge and m the mass in atomic mass units.
To trap ions in the z direction, a potential is placed on the two end electrodes. The ions oscillate in the z 
direction until their motion is damped by collisions. The magnetic field adds little energy to the motion, and 
the ions can be described as thermal. Ions are detected by applying a radio frequency (RF) pulse (a chirp) to 
the transmitter plates. The RF pulse causes ions with the matching cyclotron frequency to absorb energy. The 
ions are not only energized but quickly move coherently. Image currents on the receiver plates are detected. 
By putting a rapidly varying RF pulse (0-2 MHz) on the transmitter plate one obtains image currents as a 
function of time. A fast Fourier transform yields the frequency spectrum that is directly related to the mass 
spectrum by the equation described above. 

Figure A3.5.8. Schematic diagram of the cell used in a Fourier transform mass spectrometer.
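
The link between the frequency spectrum and the mass spectrum can be sketched from the unperturbed cyclotron relation f = zeB/(2πm). The code below is a minimal illustration under that assumption; it ignores the small shifts caused by the trapping potential, and the function names are hypothetical. Evaluated with current values of the constants, the coefficient comes out near 1.536 × 10⁷ Hz T⁻¹ amu, close to the figure quoted in the text.

```python
import math

E_C = 1.602176634e-19    # elementary charge, C
AMU_KG = 1.66053907e-27  # atomic mass unit, kg


def cyclotron_frequency_hz(b_tesla, mass_amu, charge=1):
    """Unperturbed cyclotron frequency f = z e B / (2 pi m), in Hz."""
    return charge * E_C * b_tesla / (2.0 * math.pi * mass_amu * AMU_KG)


def mass_from_frequency_amu(frequency_hz, b_tesla, charge=1):
    """Invert the same relation to convert a measured frequency into a mass."""
    return charge * E_C * b_tesla / (2.0 * math.pi * frequency_hz * AMU_KG)


# Hypothetical example: a singly charged ion of mass 100 amu in a 3 T magnet.
f_100 = cyclotron_frequency_hz(3.0, 100.0)
print(f"f = {f_100 / 1e3:.1f} kHz")
print(f"mass recovered = {mass_from_frequency_amu(f_100, 3.0):.3f} amu")
```

Because frequency scales as 1/m, light ions appear at high frequency and closely spaced heavy masses are squeezed together at the low-frequency end of the spectrum.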

Kinetics measurements are made by adding a known concentration of reactant gas to the cell and monitoring 
the time evolution of the ion intensities. Rate constants are derived from equation (A3.5.1). Many useful tricks
can be employed. The most useful is to chirp the cell so that all ions except those with the correct mass are 
excited out of the cell. In this manner the kinetics are simplified; only one ion and one neutral exist at time 
zero, and product information is easily obtained. A mass-specific excitation pulse adds energy such that ions 
may acquire enough energy to dissociate upon collision with background gas. The pattern of the dissociation 
often yields structural information. The ICR is particularly suited to the study of radiative association [101]
and radiative cooling [102] of ions since the pressure is low and the trapping time can be long.

Another class of trapping device that is gaining importance is the radiofrequency trap [103]. Quadrupole ion
traps (also called Paul traps) are three-dimensional traps with rotationally symmetric ring and endcap
electrodes. An RF voltage of opposite phase is applied to the ring and endcaps, respectively, to create a
quadrupolar RF field. This type of trap suffers from electric field heating of the ions and can be classified as a
nonthermal device. More innovative traps in limited use are the ring electrode trap and 22-pole trap. Both of
these devices trap ions in a large field-free region and produce thermal ions. Reactions at very low
temperatures have been studied with these types of trap [81, 103].

(C) BEAMS 

The guided-ion beam has become the instrument of choice for studying ion-molecule reactions at elevated 
kinetic energies [103]. In many guided-ion beam systems the lowest energy obtainable is slightly above
thermal energy (~0.1-0.2 eV), although it can be as low as the thermal energy of the target gas. The upper
range varies but is generally in the tens of electron volts. 

In essence, a guided-ion beam is a double mass spectrometer. Figure A3.5.9 shows a schematic diagram of a
guided-ion beam apparatus [104]. Ions are created and extracted from an ion source. Many types of source
have been used and the choice depends upon the application. Using a flow tube such as that described in
this chapter as the ion source has proven to be versatile and ensures that the ions are thermalized [105]. After extraction, the ions
are mass selected. Many types of mass spectrometer can be used; a Wien E×B filter is shown. The ions are
then injected into an octopole ion trap. The octopole consists of eight parallel rods arranged on a circle. An RF
voltage is applied to alternating sets of rods to trap ions in the centre of the octopole in an approximately
square well potential. Little energy is transferred to the ions. The surrounding part of the octopole is a
chamber where reactive gas is added. Typical pressures of added gas are of the order of 10 Pa. Pressure is
kept low so single-collision conditions apply; the primary ions collide at most once with the reactant gas. The
collision cell is generally run at room temperature although cooled and heated versions have been used. The
main advantage of the octopole collision cell arrangement, over the arrangement used in early beam
apparatuses, is greater collection efficiency of the product ions since products scattering in all directions are
collected. The primary ions react with the reactant neutral and the resulting mix of ions exits the octopole to
be mass analysed and detected. A quadrupole mass filter is often used for mass analysis although other mass
spectrometers can be used. The reaction cross section, $\sigma$, is derived from the Beer-Lambert law, $I = I_0 \exp(-\sigma n L)$,
where $I$ and $I_0$ are the reactant ion signals with and without the reactant gas, $n$ is the number density in the
collision cell and $L$ is the length of the collision cell. In the single-collision limit, $I$ is taken as the product ion
signal and $I_0$ as the primary ion signal.

Figure A3.5.9. Schematic diagram of a guided-ion beam.
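
The Beer-Lambert relation I = I₀ exp(-σnL) can be inverted for the cross section, and for weak attenuation it reduces to a ratio of product to primary signal. The sketch below shows both forms; the function names and all numerical inputs are hypothetical and serve only to show the arithmetic.

```python
import math


def cross_section_cm2(signal_with_gas, signal_without_gas, density_cm3, length_cm):
    """Invert I = I0 exp(-sigma n L) for sigma (cm^2) from reactant-ion attenuation."""
    return -math.log(signal_with_gas / signal_without_gas) / (density_cm3 * length_cm)


def thin_target_cross_section_cm2(product_signal, primary_signal, density_cm3, length_cm):
    """Weak-attenuation limit: sigma ~ I_product / (I_primary n L)."""
    return product_signal / (primary_signal * density_cm3 * length_cm)


# Hypothetical measurement: 10 cm cell, 3e12 molecules cm^-3,
# primary signal 1.0e5 counts, 3 % of which is converted to products.
sigma_exact = cross_section_cm2(9.7e4, 1.0e5, 3.0e12, 10.0)
sigma_thin = thin_target_cross_section_cm2(3.0e3, 1.0e5, 3.0e12, 10.0)
print(f"exact: {sigma_exact:.3e} cm^2, thin target: {sigma_thin:.3e} cm^2")
```

The two results differ by about 1.5 % at 3 % attenuation, which indicates how far the single-collision approximation can be pushed before the logarithmic form is needed.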

As with most methods for studying ion-molecule kinetics and dynamics, numerous variations exist. For
low-energy processes, the collision cell can be replaced with a molecular beam perpendicular to the ion beam
[106]. This greatly reduces the thermal energy spread of the reactant neutral. Another approach for low
energies is to use a merged beam [103]. In this system the supersonic expansion is aimed at the throat of the
octopole, and the ions are passed through
a quadrupole bender and merged with the neutral beam. Exceedingly low collision energies can be obtained
with this arrangement. Another important modification is obtained by adding a second octopole between the 
collision cell and the mass analyser. This allows the product-ion flight times to be measured, thus yielding the 
kinetic-energy release in the reaction to provide important dynamical information. Laser radiation can be 
introduced into the octopole to measure product distributions or to study dissociative processes. One valuable 
use of guided-ion beams has been the study of thresholds for endothermic processes in order to measure bond 
strengths and other thermodynamic quantities. 

Two techniques exist for measuring the angular distribution of products. In the crossed-beam setup, the
octopole/collision cell is replaced with an interaction zone defined by the overlap of an ion beam and a
supersonic neutral beam [106]. The angular distribution is measured by moving a mass spectrometer to detect
ions at various angles. A simpler approach is to measure the product transmission as a function of trapping
potential on the octopole [103]. The derivative of the signal yields the angular information but with limited
resolution. Angular distributions are often used to determine the extent of collision complex formation. 

(D) OTHER TECHNIQUES 

While the techniques described above are the most common and versatile for measuring ion-molecule 
kinetics, several other techniques are worth mentioning. An important technique for measuring ion energetics 
is the pulsed, high-pressure mass spectrometer (PHPMS) [107]. In PHPMS, a pulsed beam of 2 keV electrons
enters a small chamber containing reactants and a buffer gas. The chamber is maintained at ~500 Pa. Ion
signals are then recorded as a function of time until equilibrium is established. Knowledge of the ion signals 
and partial pressures of the reactant neutral(s) yields the equilibrium constant. Temperature variation allows 
the enthalpy and entropy of reaction to be derived. Important thermodynamic information obtained by this 
technique includes ligand bond strengths, proton affinities, gas phase basicities, electron affinities and 
ionization energies. Information on kinetics can also be obtained. 

Several instruments have been developed for measuring kinetics at temperatures below that of liquid nitrogen 
[81]. Liquid helium cooled drift tubes and ion traps have been employed, but this apparatus is of limited use 
since most gases freeze at temperatures below about 80 K. Molecules can be maintained in the gas phase at 
low temperatures in a free jet expansion. The CRESU apparatus (acronym for the French translation of 
reaction kinetics at supersonic conditions) uses a Laval nozzle expansion to obtain temperatures of 8-160 K. 
The merged ion beam and molecular beam apparatus are described above. These techniques have provided 
important information on reactions pertinent to interstellar-cloud chemistry as well as the temperature 
dependence of reactions in a regime not otherwise accessible. In particular, information on ion-molecule 
collision rates as a function of temperature has proven valuable in refining theoretical calculations. 

Most ion-molecule techniques study reactivity at pressures below 1000 Pa; however, several techniques now 
exist for studying reactions above this pressure range. These include time-resolved, atmospheric-pressure, 
mass spectrometry; optical spectroscopy in a pulsed discharge; ion-mobility spectrometry [108] and the
turbulent flow reactor [109].

A3.5.3 APPLICATIONS

A3.5.3.1 ION STRUCTURE AND ENERGETICS 

The molecular constants that describe the structure of a molecule can be measured using many optical 
techniques described in section A3.5.1 as long as the resolution is sufficient to separate the rovibrational states
[110, 111 and 112]. Absorption spectroscopy is difficult with ions in the gas phase, hence many ion species
have been first studied by matrix isolation methods [113], in which the IR spectrum is observed for ions
trapped within a frozen noble gas on a liquid-helium cooled surface. The measured frequencies may be shifted 
as much as 1% from gas phase values because of the weak interaction with the matrix. 

These days, remarkably high-resolution spectra are obtained for positive and negative ions using coaxial-beam
spectrometers and various microwave and IR absorption techniques as described earlier. Information on
molecular bond strengths, isomeric forms and energetics may also be obtained from the techniques discussed
earlier. The kinetics of cluster-ion formation, as studied in a selected-ion flow tube (SIFT) or by high-pressure
mass spectrometry, may be interpreted in terms of cluster bond strengths [114]. In addition, the chemistry of
ions may be used to identify the structure. For example, the ionic product of reaction between O₂⁺ and CH₄ at
300 K has been identified as CH₂OOH⁺ from its chemistry; the reaction mechanism is insertion [115].
Collision-induced dissociation (in a SIFT apparatus, a triple-quadrupole apparatus, a guided-ion beam
apparatus, an ICR or a beam-gas collision apparatus) may be used to determine ligand-bond energies,
isomeric forms of ions and gas-phase acidities.

Photoelectron spectra of cluster ions yield cluster-bond strengths, because each added ligand increases the
binding energy of the extra electron in the negative ion by the amount of the ligand bond strength (provided
the bond is electrostatic and does not appreciably affect the chromophore ion) [116].

One example of the determination of molecular constants can be taken from the photoelectron spectrum for
NaCl⁻ shown in figure A3.5.4 [48]. The peak spacing to the left of the origin band is 45 meV: the nominal
vibrational frequency $\omega_e$ in neutral NaCl. The spectral resolution is not good enough to specify the small
anharmonic correction, $\omega_e x_e$, or the rotational constant, $B_e$, but these, along with the equilibrium separation, $r_e$
(= 2.361 Å), are accurately known from optical spectra. The spectrum in figure A3.5.4 also provides new
information about the negative ion: the peaks to the right of the origin band are spaced by 33 meV, the
nominal vibrational spacing in NaCl⁻. The distribution of peak heights everywhere is determined by the
Franck-Condon overlap of wavefunctions for NaCl and NaCl⁻ vibrational states, so the data give the ion
temperature and the magnitude of the change in $r_e$ between the neutral and negative ion (0.136 Å in this case).
Vibrational frequencies and bond-energy considerations imply that $r_e$(NaCl⁻) > $r_e$(NaCl). Therefore, $r_e$(NaCl⁻)
= 2.497 Å, and $B_e$ = 0.195 cm⁻¹. Finally, the position of the origin peak gives the electron binding
energy (the electron affinity of NaCl, 0.727 eV) and a thermochemical cycle allows one to calculate the bond
energy of NaCl⁻ (all other quantities being known):

$$D_0(\mathrm{Na{-}Cl^-}) = D_0(\mathrm{Na{-}Cl}) + \mathrm{EA}(\mathrm{NaCl}) - \mathrm{EA}(\mathrm{Cl})$$
yielding $D_0$(Na-Cl⁻) = 1.34 eV. Admittedly, this is a simple spectrum to interpret. Had the system involved a
Π state instead of Σ states, additional peaks would have complicated the spectrum (but yielded additional
information if resolved). Had the molecules been polyatomic, where vibrations may be bending modes or 
stretches and combinations, the spectra and interpretation become more complex — the systems must be 
described in terms of normal modes instead of the more intuitive, but coupled, stretches and bends. 
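
The two numerical results quoted for NaCl⁻, the rotational constant and the bond energy, can be checked with a few lines of arithmetic. In the sketch below the neutral bond energy of NaCl (4.23 eV), the electron affinity of Cl (3.613 eV) and the isotopic masses of ²³Na and ³⁵Cl are assumed literature values that do not appear in this chapter; the bond length of 2.497 Å and the electron affinity of 0.727 eV are from the text. The function names are hypothetical.

```python
ROT_CONST_FACTOR = 16.857629  # h / (8 pi^2 c) expressed in amu A^2 cm^-1


def rotational_constant_cm1(mass1_amu, mass2_amu, r_angstrom):
    """Rigid-rotor constant B = h / (8 pi^2 c mu r^2) for a diatomic, in cm^-1."""
    mu = mass1_amu * mass2_amu / (mass1_amu + mass2_amu)
    return ROT_CONST_FACTOR / (mu * r_angstrom ** 2)


def anion_bond_energy_ev(d0_neutral_ev, ea_molecule_ev, ea_atom_ev):
    """Thermochemical cycle: D0(Na-Cl-) = D0(Na-Cl) + EA(NaCl) - EA(Cl)."""
    return d0_neutral_ev + ea_molecule_ev - ea_atom_ev


# Assumed inputs (not from this chapter): masses of 23Na and 35Cl,
# D0(NaCl) = 4.23 eV and EA(Cl) = 3.613 eV.
b_anion = rotational_constant_cm1(22.98977, 34.96885, 2.497)
d0_anion = anion_bond_energy_ev(4.23, 0.727, 3.613)
print(f"B(NaCl-) = {b_anion:.3f} cm^-1, D0(Na-Cl-) = {d0_anion:.2f} eV")
```

Both values agree with those quoted above, which suggests that the assumed inputs are close to the ones used in the original analysis.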

A3.5.3.2 THERMOCHEMISTRY

The principles of ion thermochemistry are the same as those for neutral systems; however, there are several 
important quantities pertinent only to ions. For positive ions, the most fundamental quantity is the adiabatic 
ionization potential (IP), defined as the energy required at 0 K to remove an electron from a neutral molecule
[117, 118 and 119].

Positive ions also form readily by adding a proton to a neutral atom or molecule [120]

$$\mathrm{M} + \mathrm{H}^+ \rightarrow \mathrm{MH}^+.$$

The proton affinity, PA, is defined (at 298 K) as [117]

$$\mathrm{PA} = \Delta H_f^\circ(\mathrm{M}) + \Delta H_f^\circ(\mathrm{H}^+) - \Delta H_f^\circ(\mathrm{MH}^+).$$

Negative ions also have two unique thermodynamic quantities associated with them: the electron affinity, EA,
defined as the negative of the enthalpy change for addition of an electron to a molecule at 0 K [117, 121, 122],
and the gas-phase acidity of a molecule, defined as the Gibbs energy change at 298 K, $\Delta G_{\mathrm{acid}}$(AH), for the
process [117, 121]

$$\mathrm{AH} \rightarrow \mathrm{H}^+ + \mathrm{A}^-.$$

The enthalpy for this process is the proton affinity of the negative ion. 

Much effort has gone into determining these quantities since they are fundamental to ionic reactivity. 
Examples include thermodynamic equilibrium measurements for all quantities and photoelectron studies for 
determination of EAs and IPs. The most up-to-date tabulation on ion thermochemistry is the NIST Chemistry 
WebBook (webbook.nist.gov/chemistry) [123].

Neutral thermochemistry can be determined by studying ion thermochemistry. For example, the following 
cycle can be used to determine a neutral bond strength, 

$$\begin{aligned}
\mathrm{A}^- &\rightarrow \mathrm{A} + e^- &\qquad& \mathrm{EA}(\mathrm{A}) \\
\mathrm{AH} &\rightarrow \mathrm{A}^- + \mathrm{H}^+ &\qquad& \Delta H_{\mathrm{acid}}(\mathrm{AH}) \\
\mathrm{H}^+ + e^- &\rightarrow \mathrm{H} &\qquad& -\mathrm{IP}(\mathrm{H}) \\
\mathrm{AH} &\rightarrow \mathrm{A} + \mathrm{H} &\qquad& D(\mathrm{A{-}H})
\end{aligned}$$

where $D$(A-H) is the bond dissociation energy or enthalpy for dissociating a hydrogen atom from AH. Often
it is easier to determine the EA and anionic proton affinity than it is to determine the bond strength directly,
especially when AH is a radical. The IP of hydrogen is well known. As an example, this technique has been
used to determine all bond strengths in ethylene and acetylene [124].
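
Summing the first three steps of the cycle gives D(A-H) = ΔH_acid(AH) + EA(A) - IP(H). The sketch below carries out that sum; the function name is hypothetical, the acidity and electron affinity are placeholder inputs chosen only to show the arithmetic, and the hydrogen ionization energy of 13.598 eV is the standard value. Thermal corrections between 0 K and 298 K quantities are ignored here.

```python
IP_H_EV = 13.598  # ionization energy of the hydrogen atom, eV


def neutral_bond_energy_ev(dh_acid_ev, ea_radical_ev, ip_h_ev=IP_H_EV):
    """D(A-H) = dH_acid(AH) + EA(A) - IP(H), all in eV.

    The sum follows AH -> A- + H+, then A- -> A + e-, then H+ + e- -> H.
    """
    return dh_acid_ev + ea_radical_ev - ip_h_ev


# Placeholder inputs (illustrative only): dH_acid = 16.39 eV, EA(A) = 2.97 eV.
d_ah = neutral_bond_energy_ev(16.39, 2.97)
print(f"D(A-H) = {d_ah:.3f} eV")
```

The large acidity and the large hydrogen ionization energy nearly cancel, so the uncertainty in the derived bond strength is dominated by the uncertainty in the acidity measurement.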

A3.5.3.3 CLUSTER PROPERTIES 

A gas phase ionic cluster can be described as a core ion solvated by one or more neutral atoms or molecules
and it is often represented as A±(B)ₙ or A±·Bₙ, where A± is the core ion and B are the ligand molecules. Of
course, the core and the ligand can be the same species, e.g. the hydrated electron. The interactions governing
the properties of these species are often similar to those governing liquid-phase ionic solvation. Modern
techniques allow clusters with a specific number of neutral molecules to be studied, providing information on
the evolution of properties as a function of solvation number. This leads to insights into the fundamental
properties of solutions and has made this field an active area of research.

The most fundamental of cluster properties are the bond strengths and entropy changes for the process [125]

$$\mathrm{A}^\pm{\cdot}\mathrm{B}_n + \mathrm{B} + \mathrm{M} \rightleftharpoons \mathrm{A}^\pm{\cdot}\mathrm{B}_{n+1} + \mathrm{M}.$$

The thermodynamic quantities are derived from equilibrium measurements as a function of temperature. The
measurements are frequently made in a high-pressure mass spectrometer [107]. The pertinent equation is
$\ln K_{n,n+1} = -\Delta G^\circ/RT = -\Delta H^\circ/RT + \Delta S^\circ/R$. Another important method to determine bond strengths is from
threshold measurements in collisional dissociation experiments [126]. Typically, $\Delta H^\circ$ changes for n = 0 are
1.5 to several times the solution value [127]. The value usually drops monotonically with increasing n. The
step size can have discontinuities as solvent shells are filled. The discontinuities in the thermodynamic
properties appear as magic numbers in the mass spectra, i.e. ions of particular stability such as those with a
closed solvation shell tend to be more abundant than those with one more or less ligand. A graph of bond
strengths for H₂O bonding to several ions against cluster size is shown in figure A3.5.10 [125].

Figure A3.5.10. Bond strengths of water clustering to various core ions as a function of the number of water
molecules. 
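
Extracting ΔH° and ΔS° from equilibrium constants measured at several temperatures amounts to a straight-line fit of ln K against 1/T (a van 't Hoff plot): the slope is -ΔH°/R and the intercept is ΔS°/R. The following is a minimal sketch using an ordinary least-squares fit; the function name is hypothetical and the data are synthetic, generated from assumed values of ΔH° = -80 kJ mol⁻¹ and ΔS° = -100 J mol⁻¹ K⁻¹ rather than taken from any measurement.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def vant_hoff_fit(temperatures_k, equilibrium_constants):
    """Least-squares fit of ln K = -dH/(R T) + dS/R.

    Returns (dH in J mol^-1, dS in J mol^-1 K^-1).
    """
    x = [1.0 / t for t in temperatures_k]
    y = [math.log(k) for k in equilibrium_constants]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -R_GAS * slope, R_GAS * intercept


# Synthetic data from assumed dH = -80 kJ/mol and dS = -100 J/(mol K).
TRUE_DH, TRUE_DS = -80.0e3, -100.0
temps = [400.0, 450.0, 500.0, 550.0]
ks = [math.exp(-TRUE_DH / (R_GAS * t) + TRUE_DS / R_GAS) for t in temps]

dh_fit, ds_fit = vant_hoff_fit(temps, ks)
print(f"dH = {dh_fit / 1e3:.1f} kJ/mol, dS = {ds_fit:.1f} J/(mol K)")
```

With real data the scatter in ln K sets the uncertainty, and the entropy is usually the less well determined quantity because it depends on extrapolating the line to 1/T = 0.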

Clusters can undergo a variety of chemical reactions, some relevant only to clusters. The simplest reaction
involving clusters is association, namely the sticking of ligands to an ionic core. For association to occur, the
ion-neutral complex must release energy either by radiating or by collision with an inert third body. The latter
is an important process in the earth's atmosphere [128, 129, 130 and 131] and the former in interstellar clouds
[101]. Cluster ions can be formed by photon or electron interaction with a neutral cluster produced in a
supersonic expansion [132]. Another process restricted to clusters is ligand-switching, the replacement of
one ligand by another. Exothermic ligand-switching reactions often take place at rates near the gas kinetic
limit, especially for small values of n [72, 133]. Chemical-reactivity studies as a function of cluster size show
a variety of trends [93, 127, 133]. Proton-transfer reactions are often unaffected by solvation, while
nucleophilic-displacement reactions are often shut down by as few as one or two solvent molecules.
Increasing solvation number can also change the type of reactivity. A good example is the reaction of
NO⁺(H₂O)ₙ with H₂O. These associate for small n but react to form H₃O⁺(H₂O)ₙ ions for n = 3. This is an
important process in much of the earth's atmosphere. Neutral reactions have been shown to proceed up to 30
orders of magnitude faster when clustered to inert alkali ions than in the absence of the ionic clustering [134].

Caging is an important property in solution and insight into this phenomenon has been obtained by studying
photodestruction of Br₂⁻(M)ₙ and I₂⁻(M)ₙ clusters, where M is a ligand such as Ar or CO₂. When the X₂⁻ core is
photoexcited above the dissociation threshold of X₂⁻, the competition between the two processes forming
X⁻(M)ₘ and X₂⁻(M)ₘ indicates when caging is occurring. For I₂⁻(CO₂)ₙ the caging is complete at n = 16 [127,
135].

An important class of molecule often described as clusters may better be referred to as micro-particles. This
class includes metal, semiconductor and carbon clusters. Particularly interesting are the carbon clusters, Cₙ.
Mass spectra from a carbon cluster ion source show strong magic numbers at C₆₀⁺ and C₇₀⁺ [136]. This led to the
discovery of the class of molecules called buckminsterfullerenes. Since that time, other polyhedra have been
discovered, most notably metallocarbohedrenes [137]. The first species of this type discovered was Ti₈C₁₂.

Much of the work done on metal clusters has been focused on the transition from cluster properties to bulk
properties as the clusters become larger, e.g. the transition from quantum chemistry to band theory [127].

A3.5.3.4 ATMOSPHERIC CHEMISTRY 

Atmospheric ions are important in controlling atmospheric electrical properties and communications and, in
certain circumstances, aerosol formation [128, 130, 131, 138, 139, 140, 141, 142, 143, 144 and 145]. In
addition, ion composition measurements can be used to derive trace neutral concentrations of the species
involved in the chemistry. Figure A3.5.11 shows the total-charged-particle concentration as a function of
altitude [146]. The total density varies between 10³ and 10⁶ ions cm⁻³. The highest densities occur above 100
km. Below 100 km the total ion density is roughly constant even though the neutral density changes by a
factor of approximately 4 x 10 . Most negative charge is in the form of electrons above 80 km, while negative
ions dominate below this altitude.

Above approximately 80 km, the prominent bulge in electron concentration is called the ionosphere. In this
region ions are created from UV photoionization of the major constituents: O, NO, N₂ and O₂. The
ionosphere has a profound effect on radio communications since electrons reflect radio waves with the same
frequency as the plasma frequency, $f = 8.98 \times 10^{3}\, n_e^{1/2}$ Hz, where $n_e$ is the electron density in cm⁻³ [147]. The
large gradient in electron density ensures that a wide variety of frequencies are absorbed. It is this
phenomenon that allows one to hear distant radio signals. Ion chemistry plays a major role in determining the
electron density. Diatomic ions recombine rapidly with electrons while monatomic ions do not. Monatomic
positive ions do not destroy electrons until they are converted to diatomic ions. The most important reaction in
the ionosphere is the reaction of O⁺ with N₂,

$$\mathrm{O^{+} + N_2 \rightarrow NO^{+} + N}.$$

Although this reaction is exothermic, it has a small rate constant. This is one of the most studied
ion-molecule reactions, and dependences on many parameters have been measured [148].
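
To give a feeling for the radio frequencies involved, the following sketch evaluates the plasma frequency from the standard relation f ≈ 8.98 × 10³ nₑ^(1/2) Hz (nₑ in cm⁻³) over the range of charged-particle densities quoted above. The function name `plasma_frequency_hz` is an illustrative choice, and treating the total ion density as the electron density is an assumption that only holds where electrons carry most of the negative charge.

```python
import math


def plasma_frequency_hz(electron_density_cm3):
    """Electron plasma frequency in Hz for a density given in cm^-3.

    Uses f = 8.98e3 * sqrt(n_e); the prefactor is the usual numerical
    value for n_e expressed in cm^-3.
    """
    if electron_density_cm3 < 0:
        raise ValueError("electron density must be non-negative")
    return 8.98e3 * math.sqrt(electron_density_cm3)


# Densities spanning the 10^3 - 10^6 cm^-3 range mentioned in the text.
for n_e in (1e3, 1e4, 1e5, 1e6):
    f_mhz = plasma_frequency_hz(n_e) / 1e6
    print(f"n_e = {n_e:8.0e} cm^-3  ->  f = {f_mhz:6.3f} MHz")
```

The square-root dependence means that a thousandfold change in electron density shifts the reflected frequency by only a factor of about thirty.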

Figure A3.5.11. Charged particle concentrations in the atmosphere.

More complex ions are created lower in the atmosphere. Almost all ions below 70-80 km are cluster ions.
Below this altitude range free electrons disappear and negative ions form. Three-body reactions become
important. Even though the complexity of the ions increases, the determination of the final species follows a
rather simple scheme. For positive ions, formation of H⁺(H₂O)ₙ is rapid, occurring in times of the order of
milliseconds or shorter in the stratosphere and troposphere. After formation of H⁺(H₂O)ₙ, the chemistry
involves reaction with species that have a higher proton affinity than that of H₂O. The resulting species can be
written as H⁺(X)ₘ(H₂O)ₙ. The main chemical processes include ligand exchange and proton transfer as well
as association and dissociation of H₂O ligands. Examples of species X include NH₃ [149], CH₃COCH₃ [150]
and CH₃CN [151]. The rate constants are large, so the proton hydrates are transformed even when the
concentration of X is low.

The negative ion chemistry is equally clear. NO₃⁻(HNO₃)ₘ(H₂O)ₙ ions are formed rapidly. Only acids, HX,
stronger than HNO₃ react with this series of ions, producing X⁻(HX)ₘ(H₂O)ₙ. Most regions of the atmosphere
have low concentrations of such acids. The two exceptions are a layer of H₂SO₄ in the 30-40 km region [152,
153] and H₂SO₄ and CH₃SO₃H, which play an important role near the ground under some circumstances
[154].

Ion-composition measurements can be used to derive the concentrations of X and HX involved in the
chemistry. This remains the only practical method of monitoring abundances of H₂SO₄ and CH₃CN in the

upper atmosphere. Concentrations as low as 10 molecules cm -3 have been measured. 


A3.5.3.5 ASTROCHEMISTRY 

The astrochemistry of ions may be divided into topics of interstellar clouds, stellar atmospheres, planetary
atmospheres and comets. There are many areas of astrophysics (stars, planetary nebulae, novae, supernovae)
where highly ionized species are important, but beyond the scope of 'ion chemistry'. (Still, molecules,
including H₂O, are observed in solar spectra [155] and a surprise in the study of Supernova 1987A was the
identification of molecular species, CO, SiO and possibly H₃⁺ [156, 157].) In the early universe, after expansion
had cooled matter to the point that molecules could form, the small fraction of positive and negative ions that
remained was crucial to the formation of molecules, for example [156]

$$\mathrm{H^{-} + H \rightarrow H_2 + e^{-}} \quad \text{and} \quad \mathrm{H^{+} + H \rightarrow H_2^{+}} + h\nu.$$

The formation of molecules was the first step toward local gravitational collapses which led to, among other
things, the production of this encyclopedia.

Interstellar clouds of gases contain mostly H, H₂ and He, but the minority species are responsible for the
interesting chemistry that takes place, just as in the earth's atmosphere. Interstellar clouds are divided into two
types: diffuse, with atomic or molecular concentrations in the neighbourhood of 100 cm⁻³ and temperatures of
100-200 K, in which ionization is accomplished primarily by stellar UV light, and dense (or dark) clouds,

with densities of 10—10 cm -3 and temperatures of 10-20 K, in which ionization is a result of galactic cosmic 
rays since visible and UV light cannot penetrate the dense clouds [156, 158]. The dense clouds also contain
particulate matter (referred to as dust or grains). Close to 100 molecular species, as large as 13-atomic, have
been detected in interstellar clouds by RF and MM spectroscopy; among these are nine types of molecular
positive ion. It is assumed that the neutral molecular species (except H₂) are mainly synthesized through
ion-molecule reactions, followed by electron-ion recombination, since neutral-neutral chemical reactions proceed
very slowly in the cold temperatures of the clouds, except on grain surfaces. Ion-molecule reactions are
typically even faster at lower temperatures. Extensive laboratory studies of ion-molecule reactions, including
work at very low temperatures, have mapped out the reaction schemes that take place in interstellar clouds. In
dense clouds the reactions

$$\mathrm{H_2^{+} + H_2 \rightarrow H_3^{+} + H} \quad \text{and} \quad \mathrm{He^{+} + CO \rightarrow C^{+} + O + He}$$

are of paramount importance. These reactions are followed by reactions with C and H₂ to produce CH₃⁺, which
subsequently undergoes reaction with many neutral molecules to give ion products such as CH₅⁺, C₂H₅OH₂⁺ and
CH₃CNH⁺. Many of the reactions involve radiative association. Dissociative electron-ion recombination then
yields neutrals such as CH₄ (methane), C₂H₅OH (ethanol) and CH₃CN (acetonitrile) [158]. It is often joked
that diffuse interstellar clouds contain enough grain alcohol to keep space travellers happy on their long
journeys. In diffuse clouds, the reaction scheme is more varied and leads to smaller molecules in general.

solute surface which, under the assumption of slip boundary conditions, gives for the correction factor C in
equation (A3.6.35):

/riipVh 

Caw = - v , m ... ,,, ,, u (A3.6.36) 

with isothermal compressibility $\kappa_T$, ratio of radii of solvent to solute $a_r$ and a temperature-dependent
parameter $B$. If one compares equation (A3.6.36) with the empirical friction model mentioned above, one
realizes that both contain a factor of the form $C = 1/(1 + a\eta)$, suggesting that these models might be physically
related.

Another, purely experimental possibility to obtain a better estimate of the friction coefficient for rotational
motion $\gamma_{\mathrm{rot}}$ in chemical reactions consists of measuring rotational relaxation times $\tau_{\mathrm{rot}}$ of reactants and
calculating it according to equation (A3.6.35) as $\gamma_{\mathrm{rot}} = 6kT\tau_{\mathrm{rot}}$.


A3.6.4 SELECTED REACTIONS 

A3.6.4.1 PHOTOISOMERIZATION 

According to Kramers' model, for flat barrier tops associated with predominantly small barriers, the transition
from the low- to the high-damping regime is expected to occur in low-density fluids. This expectation is borne
out by an extensively studied model reaction, the photoisomerization of trans-stilbene and similar compounds
[70, 71] involving a small energy barrier in the first excited singlet state whose decay after photoexcitation is
directly related to the rate coefficient of trans-cis-photoisomerization and can be conveniently measured by
ultrafast laser spectroscopic techniques.

(A) PRESSURE DEPENDENCE OF PHOTOISOMERIZATION RATE CONSTANTS 

The results of pressure-dependent measurements for trans-stilbene in supercritical n-pentane [46] (figure
A3.6.5) and the prediction from the model described by equation (A3.6.29), using experimentally determined
microcanonical rate coefficients in jet-cooled trans-stilbene to calculate $k_\infty$, show two marked discrepancies
between model calculation and measurement: (1) experimental values of k are an order of magnitude higher
already at low pressure and (2) the decrease of k due to friction is much less pronounced than predicted. As
interpretations for the first observation, several ideas have been put forward that will not be further discussed
here, such as a decrease of the effective potential barrier height due to electrostatic solute-solvent interactions
enhanced by cluster formation at relatively low pressures [72, 73], or incomplete intramolecular vibrational
energy redistribution in the isolated molecule [74, 75, 76, 77, 78, 79 and 80], or Franck-Condon cooling in
the excitation process [79, 80]. The second effect, the weak viscosity dependence, which was first observed in
solvent series experiments in liquid solution [81, 82 and 83], has also led to controversial interpretations: (i)
the macroscopic solvent viscosity is an inadequate measure for microscopic friction acting along the reaction
path [84, 85], (ii) the multidimensional character of the barrier crossing process leads to a fractional power
dependence of k on 1/η [54, 81, 86, 87], (iii) as the reaction is very fast, one has to take into account the finite
response time of the solvent, i.e. consider frequency-dependent friction [81, 87] and (iv) the effective barrier

Depending on the electron and ion temperatures in a plasma, all of the processes mentioned in this section of
the encyclopedia may be taking place simultaneously in the plasma [163]. Understanding, or modelling, the
plasma may be quite complicated [164]. Flame chemistry involves charged particles [165]. Most of the early
investigations and classifications of electron and ion interactions came about in attempts to understand electric
discharges, and these continue today in regard to electric power devices, such as switches and high-intensity
lamps [166]. Often the goal is to prevent discharges in the face of high voltages. Military applications
involving the earth's ionosphere funded refined work during and following the Second World War. Newer
applications such as gas discharge lasers have driven recent studies of plasma chemistry. The rare-gas halide
excimer laser is a marvellous example of plasma chemistry, because the lasing molecule may be formed in
recombination between a positive and a negative ion, for example [167, 168 and 169]

$$\mathrm{Ar^{+} + F^{-} \rightarrow ArF^{*}\ (excimer\ state) \rightarrow ArF\ (ground\ state)} + h\nu\ (193\ \mathrm{nm})$$

or

$$\mathrm{Ar_2^{+} + F^{-} \rightarrow ArF^{*}\ (excimer\ state) + Ar \rightarrow ArF\ (ground\ state)} + h\nu\ (193\ \mathrm{nm}) + \mathrm{Ar}.$$

The Ar₂⁺ is formed from Ar* + Ar, where the metastable Ar* is a product of electron-impact or charge-transfer
collisions. The F⁻ is formed by dissociative electron attachment to F₂ or NF₃. The population inversion
required for light amplification is simple to obtain in the ArF laser since the ground state of the lasing
molecule is not bound, except by van der Waals forces, and quickly dissociates upon emission of the laser
light.

A3.6 Chemical kinetics in condensed phases

Jorg Schroeder 


A3.6.1 INTRODUCTION 

The transition from the low-pressure gas to the condensed phase is accompanied by qualitative changes in the 
kinetics of many reactions caused by the presence of a dense solvent environment permanently interacting 
with reactants (also during their motion along the reaction path). Though this solvent influence in general may 
be a complex phenomenon as contributions of different origin tend to overlap, it is convenient to single out 
aspects that dominate the kinetics of certain types of reactions under different physical conditions. 

Basic features of solvent effects can be illustrated by considering the variation of the rate constant $k_{\mathrm{uni}}$ of a
unimolecular reaction as one gradually passes from the low-pressure gas phase into the regime of liquid-like
densities [1] (see figure A3.6.1). At low pressures, where the rate is controlled by thermal activation in
isolated binary collisions with bath gas molecules, $k_{\mathrm{uni}}$ is proportional to pressure, i.e. it is in the low-pressure
limit $k_0$. Raising the pressure further, one reaches the fall-off region where the pressure dependence of $k_{\mathrm{uni}}$
becomes increasingly weaker until, eventually, it attains the constant so-called high-pressure limit $k_\infty$. At this
stage, collisions with bath gas molecules, which can still be considered as isolated binary events, are
sufficiently frequent to sustain an equilibrium distribution over rotational and vibrational degrees of freedom
of the reactant molecule, and $k_\infty$ is determined entirely by the intramolecular motion along the reaction path.
$k_\infty$ may be calculated by statistical theories (see chapter A3.4) if the potential-energy (hyper)surface (PES) for
the reaction is known. What kind of additional effects can be expected if the density of the compressed bath
gas approaches that of a dense fluid? Ideally, there will be little further change, as equilibration becomes even
more effective because of permanent energy exchange with the dense heat bath. So, even with more
confidence than in the gas phase, one could predict the rate constant using statistical reaction rate theories
such as, for example, transition state theory (TST). However, this ideal picture may break down if (i) there is
an appreciable change in charge distribution or molar volume as the system moves along the reaction path
from reactant to product state, (ii) the reaction entails large-amplitude structural changes that are subject to
solvent frictional forces retarding the motion along the reaction path or (iii) motion along the reaction path is
sufficiently fast that thermal equilibrium over all degrees of freedom of the solute and the bath cannot be
maintained.

(i) This situation can still be handled by quasi-equilibrium models such as TST, because the solvent
only influences the equilibrium energetics of the system. The ensuing phenomena may be loosely
referred to as 'static' solvent effects. These may be caused by electronic solute-solvent interactions
that change the effective PES by shifting intersection regions of different electronic states or by
lowering or raising potential-energy barriers, but also by solvent structural effects that influence the
free-energy change along the reaction path associated with variations in molar volume.

(ii) The decrease of the rate constant due to the viscous drag exerted by the solvent medium requires an
extension of statistical rate models to include diffusive barrier crossing, because the no-recrossing
postulate of TST is obviously violated. In the so-called Smoluchowski limit, one would expect an
inverse dependence of $k_{\mathrm{uni}}$ on solvent viscosity η at sufficiently high pressure. A reaction rate
constant is still well defined and kinetic rate equations may be used to describe the course of the
reaction.



Figure A3.6.1. Pressure dependence of unimolecular rate constant $k_{\mathrm{uni}}$ (plotted against log p).

This is no longer the case when (iii) motion along the reaction path occurs on a time scale comparable to other 
relaxation times of the solute or the solvent, i.e. the system is partially non-relaxed. In this situation dynamic 
effects have to be taken into account explicitly, such as solvent-assisted intramolecular vibrational energy 
redistribution (IVR) in the solute, solvent-induced electronic surface hopping, dephasing, solute-solvent 
energy transfer, dynamic caging, rotational relaxation, or solvent dielectric and momentum relaxation. 
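
The gas-phase part of this picture (proportionality to pressure at low pressure, then fall-off towards a constant high-pressure limit) can be sketched with the simple Lindemann-Hinshelwood interpolation formula. This particular functional form is an assumption chosen for illustration; it is not claimed to be the model behind the figure, and it deliberately omits the friction-induced decrease at liquid-like densities. The parameter names `k0_per_pressure` and `k_inf` and all numerical values are hypothetical.

```python
def k_uni(pressure, k0_per_pressure, k_inf):
    """Lindemann-Hinshelwood fall-off expression.

    pressure         -- bath gas pressure (arbitrary but consistent units)
    k0_per_pressure  -- slope of the low-pressure limit, so k0 = slope * p
    k_inf            -- high-pressure limiting rate constant
    """
    k_low = k0_per_pressure * pressure
    # Harmonic-type interpolation: ~k_low when k_low << k_inf, ~k_inf otherwise.
    return k_low * k_inf / (k_low + k_inf)


# Hypothetical parameters: slope 1.0 s^-1 per pressure unit, k_inf = 1e3 s^-1.
for p in (1e-3, 1e0, 1e3, 1e6, 1e9):
    print(f"p = {p:8.0e}  k_uni = {k_uni(p, 1.0, 1.0e3):.4e} s^-1")
```

On a double-logarithmic plot this expression gives a line of unit slope that bends over near p = k_inf/k0_per_pressure and then levels off, which is the qualitative shape described above for the binary-collision regime.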

The introductory remarks about unimolecular reactions apply equivalently to bimolecular reactions in 
condensed phase. An essential additional phenomenon is the effect the solvent has on the rate of approach of 
reactants and the lifetime of the collision complex. In a dense fluid the rate of approach evidently is 
determined by the mutual diffusion coefficient of reactants under the given physical conditions. Once 
reactants have met, they are temporarily trapped in a solvent cage until they either diffusively separate again 
or react. It is common to refer to the pair of reactants trapped in the solvent cage as an encounter complex. If 
the 'unimolecular' reaction of this encounter complex is much faster than diffusive separation, i.e. if the
effective reaction barrier is sufficiently small or negligible, the rate of the overall bimolecular reaction is
diffusion controlled. 
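
The competition between reaction and diffusive separation of the encounter complex can be made explicit with a steady-state treatment of the scheme A + B ⇌ (AB) → P. This kinetic scheme and the symbol names below (`k_approach`, `k_separate`, `k_react`) are assumptions introduced for illustration; the numbers are hypothetical.

```python
def overall_rate_constant(k_approach, k_separate, k_react):
    """Steady-state bimolecular rate constant for A + B <-> (AB) -> P.

    k_approach -- diffusive formation of the encounter complex (M^-1 s^-1)
    k_separate -- diffusive separation of the complex (s^-1)
    k_react    -- 'unimolecular' reaction of the complex (s^-1)
    """
    return k_approach * k_react / (k_separate + k_react)


k_approach = 1.0e10   # hypothetical diffusion-limited encounter rate
k_separate = 1.0e10   # hypothetical cage escape rate

# Fast reaction in the cage: the overall rate approaches k_approach.
print(overall_rate_constant(k_approach, k_separate, 1.0e14))
# Slow reaction in the cage: the overall rate is (k_approach/k_separate) * k_react.
print(overall_rate_constant(k_approach, k_separate, 1.0e5))
```

In the first limit the rate is diffusion controlled and independent of the barrier; in the second the encounter pair is effectively in pre-equilibrium and the activated step determines the rate.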


As it has appeared in recent years that many fundamental aspects of elementary chemical reactions in solution
can be understood on the basis of the dependence of reaction rate coefficients on solvent density [2, 3, 4 and
5], increasing attention is paid to reaction kinetics in the gas-to-liquid transition range and supercritical fluids
under varying pressure. In this way, the essential differences between the regime of binary collisions in the
low-pressure gas phase and that of a dense environment with typical many-body interactions become
apparent. An extremely useful approach in this respect is the investigation of rate coefficients, reaction yields
and concentration-time profiles of some typical model reactions over as wide a pressure range as possible,
which permits the continuous and well controlled variation of the physical properties of the solvent. Among
these the most important are density, polarity and viscosity in a continuum description or collision frequency,
local solvent shell structure and various relaxation time scales in a microscopic picture.


Progress in the theoretical description of reaction rates in solution of course correlates strongly with that in 
other theoretical disciplines, in particular those which have profited most from the enormous advances in 
computing power such as quantum chemistry and equilibrium as well as non-equilibrium statistical mechanics 
of liquid solutions where Monte Carlo and molecular dynamics simulations in many cases have taken on the 
traditional role of experiments, as they allow the detailed investigation of the influence of intra- and 
intermolecular potential parameters on the microscopic dynamics not accessible to measurements in the 
laboratory. No attempt, however, will be made here to address these areas in more than a cursory way, and the 
interested reader is referred to the corresponding chapters of the encyclopedia. 

In the sections below a brief overview of static solvent influences is given in A3.6.2, while in A3.6.3 the focus
is on the effect of transport phenomena on reaction rates, i.e. diffusion control and the influence of friction on
intramolecular motion. In A3.6.4 some special topics are addressed that involve the superposition of static and
transport contributions as well as some aspects of dynamic solvent effects that seem relevant to understanding
the solvent influence on reaction rate coefficients observed in homologous solvent series and compressed
solution. More comprehensive accounts of dynamics of condensed-phase reactions can be found in chapter
A3.8, chapter A3.13, chapter B3.3, chapter C3.1, chapter C3.2 and chapter C3.5.


A3.6.2 STATIC SOLVENT EFFECTS 

The treatment of equilibrium solvation effects in condensed-phase kinetics on the basis of TST has a long 
history and the literature on this topic is extensive. As the basic ideas can be found in most physical chemistry 
textbooks and excellent reviews and monographs on more advanced aspects are available (see, for example, 
the recent review article by Truhlar et al [6] and references therein), the following presentation will be brief 
and far from providing a complete picture. 

A3.6.2.1 SEPARATION OF TIME SCALES 

A reactive species in liquid solution is subject to permanent random collisions with solvent molecules that
lead to statistical fluctuations of position, momentum and internal energy of the solute. The situation can be
described by a reaction coordinate X coupled to a huge number of solvent bath modes. If there is a reaction
barrier $E_0^{\pm}$ ('+' refers to the forward direction and '-' to the reverse reaction), in a way similar to what is
common in gas phase reaction kinetics, one may separate the reaction into the elementary steps of activation
of A or B, barrier crossing, and equilibration of B or A, respectively (see figure A3.6.2). The time scale $\tau_r$ for
mounting and crossing the barrier is determined by the magnitude of statistical fluctuations $\delta X(t) = X(t) - \langle X(t)\rangle$ at
temperature T, where $\langle\ \rangle$ indicates ensemble average. In a canonical ensemble this is mainly the Boltzmann
factor $\tau_r \propto e^{E_0^{\pm}/kT}$, where k denotes Boltzmann's constant. Obviously, the reaction is a rare event if the
barrier is large. On the other hand, the time scale for energy relaxation $\tau_s$ in a potential well is inversely
proportional to the curvature of the potential V along X,



where μ denotes reduced mass. So the overall time scale for the reaction

$$\tau_r^{\pm} = \tau_s^{(\mathrm{A,B})} \exp(E_0^{\pm}/kT) \gg \tau_s^{(\mathrm{A,B})}$$

for $E_0^{\pm} \gg kT$. If at the same time $\tau_r$ is also significantly larger than all other relevant time constants of the
solute-bath system (correlation time of the bath, energy and momentum relaxation time, barrier passage time),
X may be considered to be a random variable and the motion of the reacting species along this reaction
coordinate a stochastic Markov process under the influence of a statistically fluctuating force. This simply
means that before, during and after the reaction all degrees of freedom of the solute-solvent system but X are
in thermodynamic equilibrium. In this case quasi-equilibrium models of reaction rates are applicable. If the
additional requirements are met that (i) each trajectory crossing the transition state at the barrier top never
recrosses and (ii) the Born-Oppenheimer approximation is fulfilled, TST can be used to calculate the reaction
rate and provide an upper limit to the real rate coefficient (see chapter A3.12).
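
How quickly the reaction time scale separates from the relaxation time scale can be checked numerically by evaluating the Boltzmann factor exp(E₀/kT) for a few barrier heights. The sketch below works with molar barriers (so R replaces k); the function name `timescale_ratio` and the chosen barrier values are illustrative assumptions.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def timescale_ratio(barrier_kj_per_mol, temperature_K):
    """Ratio of reaction time scale to well relaxation time scale.

    Approximated by the Boltzmann factor exp(E0 / RT) for a molar barrier E0.
    """
    return math.exp(barrier_kj_per_mol * 1.0e3 / (R * temperature_K))


T = 298.15
RT_kj = R * T / 1.0e3  # thermal energy in kJ/mol at this temperature
for barrier in (2.0, 10.0, 25.0, 50.0):
    ratio = timescale_ratio(barrier, T)
    print(f"E0 = {barrier:5.1f} kJ/mol ({barrier / RT_kj:5.1f} RT): ratio = {ratio:.3e}")
```

Barriers of only a few RT give ratios of order ten, so the quasi-equilibrium assumption becomes questionable, whereas barriers of tens of RT separate the time scales by many orders of magnitude.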



Figure A3.6.2. Activation and barrier crossing.

A3.6.2.2 THERMODYNAMIC FORMULATION OF TST AND REFERENCE STATES 

For analysing equilibrium solvent effects on reaction rates it is common to use the thermodynamic 
formulation of TST and to relate observed solvent-induced changes in the rate coefficient to variations in 
Gibbs free-energy differences between solvated reactant and transition states with respect to some reference 
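state. Starting from the simple one-dimensional expression for the TST rate coefficient of a unimolecular
reaction A → P

$$k_{\mathrm{TST}} = \frac{kT}{h}\,\frac{Q^{\ddagger}}{Q_{\mathrm{A}}}\exp(-E_0/kT) = \frac{kT}{h}\,\frac{[\mathrm{A}^{\ddagger}]}{[\mathrm{A}]} \qquad \text{(A3.6.1)}$$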


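where $Q_{\mathrm{A}}$ and $Q^{\ddagger}$ denote the partition functions of reactant and transition state per volume, respectively, $E_0$ is
the barrier height, and [A], [A‡] stand for equilibrium concentration of reactant and, in a formal not physical
sense, the transition state, respectively. Defining an equilibrium constant in terms of activities a

$$K^{\ddagger} = \frac{a^{\ddagger}}{a_{\mathrm{A}}} = \frac{\gamma^{\ddagger}[\mathrm{A}^{\ddagger}]}{\gamma_{\mathrm{A}}[\mathrm{A}]} = \frac{Q^{\ddagger}}{Q_{\mathrm{A}}}\exp(-E_0/kT) \qquad \text{(A3.6.2)}$$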


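with corresponding activity coefficients denoted by γ, one obtains for the rate coefficient from equation
(A3.6.1) and equation (A3.6.2)

$$k_{\mathrm{TST}} = \frac{kT}{h}\,\frac{[\mathrm{A}^{\ddagger}]}{[\mathrm{A}]} = \frac{kT}{h}\,\frac{Q^{\ddagger}}{Q_{\mathrm{A}}}\exp(-E_0/kT)\,\frac{\gamma_{\mathrm{A}}}{\gamma^{\ddagger}} = k^{0}_{\mathrm{TST}}\,\frac{\gamma_{\mathrm{A}}}{\gamma^{\ddagger}} \qquad \text{(A3.6.3)}$$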


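where $k^{0}_{\mathrm{TST}}$ is a standard rate coefficient which depends on the reference state chosen. If one uses the dilute-gas
phase as reference, i.e. $k^{0}_{\mathrm{TST}} = k_{\mathrm{gas}}$, all equilibrium solvation effects according to equation (A3.6.3) are
included in the ratio of activity coefficients $\gamma_{\mathrm{A}}/\gamma^{\ddagger}$, which is related to the Gibbs free energy of activation for
the reaction in the gas phase, $\Delta G^{\ddagger}_{\mathrm{gas}}$, and in solution, $\Delta G^{\ddagger}_{\mathrm{solution}}$:

$$kT\ln\!\left(\frac{k_{\mathrm{solution}}}{k_{\mathrm{gas}}}\right) = kT\ln\!\left(\frac{\gamma_{\mathrm{A}}}{\gamma^{\ddagger}}\right) = \Delta G^{\ddagger}_{\mathrm{gas}} - \Delta G^{\ddagger}_{\mathrm{solution}}. \qquad \text{(A3.6.4)}$$

Since $\Delta G^{\ddagger}_{\mathrm{gas}} - \Delta G^{\ddagger}_{\mathrm{solution}}$ in equation (A3.6.4) is equal to the difference between the Gibbs free energy of
solvation of reactant and transition state, $\Delta G_{\mathrm{sol}}(\mathrm{A}) - \Delta G_{\mathrm{sol}}(\mathrm{A}^{\ddagger})$, one has a direct correlation between
equilibrium solvation free enthalpies and rate coefficient ratios. It is common practice in physical organic
chemistry to use as a reference state not the gas phase, but a suitable reference solvent M, such that one
correlates measured rate coefficient ratios of equation (A3.6.4) to relative changes in Gibbs free energy of
solvation

$$[\Delta G_{\mathrm{sol,S}}(\mathrm{A}) - \Delta G_{\mathrm{sol,S}}(\mathrm{A}^{\ddagger})] - [\Delta G_{\mathrm{sol,M}}(\mathrm{A}) - \Delta G_{\mathrm{sol,M}}(\mathrm{A}^{\ddagger})] = \delta_{\mathrm{M}}\Delta G^{\ddagger}. \qquad \text{(A3.6.5)}$$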

The shorthand notation in the rhs of equation (A3.6.5) is frequently referred to as the Leffler-Grunwald
operator [7].
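
The link between solvation free energies and rate coefficient ratios can be illustrated numerically. The sketch below assumes molar free energies (so R replaces k) and the relation k_solution/k_gas = exp{[ΔG_sol(A) - ΔG_sol(A‡)]/RT}, which follows when the only solvent effect is differential equilibrium solvation of reactant and transition state. The function name and the input values are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def rate_ratio_from_solvation(dG_solv_reactant, dG_solv_ts, temperature_K):
    """Ratio k_solution / k_gas from molar Gibbs free energies of solvation (J/mol).

    A transition state that is solvated more strongly (more negative dG)
    than the reactant lowers the activation free energy and gives a ratio > 1.
    """
    return math.exp((dG_solv_reactant - dG_solv_ts) / (R * temperature_K))


T = 298.15
# Hypothetical case: the transition state is stabilized 10 kJ/mol more than the reactant.
print(rate_ratio_from_solvation(-20.0e3, -30.0e3, T))
# Equal solvation of reactant and transition state leaves the rate unchanged.
print(rate_ratio_from_solvation(-20.0e3, -20.0e3, T))
```

Replacing the gas phase by a reference solvent M only changes the zero of the scale: the ratio k_S/k_M follows from the difference of the two exponents.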

Considering a bimolecular reaction A + B → P, one correspondingly obtains for the rate constant ratio

$$k_{\mathrm{solution}}/k_{\mathrm{gas}} = \gamma_{\mathrm{A}}\gamma_{\mathrm{B}}/\gamma^{\ddagger}. \qquad \text{(A3.6.6)}$$

In the TST limit, the remaining task strictly speaking does not belong to the field of reaction kinetics: it is a 
matter of obtaining sufficiently accurate reactant and transition state structures and charge distributions from 
quantum chemical calculations, constructing sufficiently realistic models of the solvent and the solute-solvent 
interaction potential, and calculating from these ingredients values of Gibbs free energies of solvation and 
activity coefficients. In many cases, a microscopic description may prove a task too complex, and one rather 
has to use simplifying approximations to characterize influences of different solvents on the kinetics of a 
reaction in terms of some macroscopic physical or empirical solvent parameters. In many cases, however, this 
approach is sufficient to capture the kinetically significant contribution of the solvent-solute interactions. 

A3.6.2.3 EQUILIBRIUM SOLVATION - MACROSCOPIC DESCRIPTION

(A) NAIVE VIEW OF SOLVENT CAVITY EFFECTS 

Considering equation (A3.6.3), if activity coefficients of reactant and transition state are approximately equal,
for a unimolecular reaction one should observe $k_{\mathrm{solution}} \approx k_{\mathrm{gas}}$. This in fact is observed for many unimolecular
reactions where the reactant is very similar to the transition state, i.e. only a few bond lengths and angles
change by a small amount and there is an essentially constant charge distribution. There are, however, also
large deviations from this simplistic prediction, in particular for dissociation reactions that require separation
of fragments initially formed inside a common solvent cavity into individually solvated products.

For a bimolecular reaction in such a case one obtains from equation (A3.6.6) $k_{\mathrm{solution}} \approx \gamma \cdot k_{\mathrm{gas}}$, so one has to
estimate the activity coefficient of a reactant to qualitatively predict the solvent effect. Using ad hoc models
of solvation based on the free-volume theory of liquids or the cohesive energy density of a solvent cavity,
purely thermodynamic arguments yield $\gamma \approx 10^{2}$ to $10^{3}$ [8, 9 and 10].

The reason for this enhancement is intuitively obvious: once the two reactants have met, they temporarily are 
trapped in a common solvent shell and form a short-lived so-called encounter complex. During the lifetime of 
the encounter complex they can undergo multiple collisions, which give them a much bigger chance to react 
before they separate again, than in the gas phase. So this effect is due to the microscopic solvent structure in 
the vicinity of the reactant pair. Its description in the framework of equilibrium statistical mechanics requires 
the specification of an appropriate interaction potential. 

(B) ELECTROSTATIC EFFECTS-ONSAGER AND BORN MODELS 

If the charge distribution changes appreciably during the reaction, solvent polarity effects become dominant 
and in liquid solution often mask the structural influences mentioned above. The calculation of solvation 
energy differences between reactant and transition state mainly consists of estimating the Gibbs free energies 
of solvation AG §ol of charges, dipoles, quadrupoles etc in a polarizable medium. If the solute itself is 
considered non-polarizable and the solvent a continuous linear dielectric medium without internal structure, 

then "^'i — I^im where E^, is the solute-solvent interaction energy [11]. Reactant and transition state are 

modelled as point charges or point dipoles situated at the centre of a spherical solvent cavity. The point charge 
or the point dipole will polarize the surrounding dielectric continuum giving rise to an electric field which in 
turn will act on the charge distribution inside the cavity. The energy of the solute in this so-called reaction 
field may be calculated by a method originally developed by Onsager. Using his reaction field theory [12, 13],
one obtains the molar Gibbs free energy of solvation (with respect to vacuum) of an electric point dipole $\mu_{\text{el}}$ in
a spherical cavity of radius $r$ embedded in a homogeneous dielectric of dielectric constant $\varepsilon$ as

$$\Delta G_{\text{sol,dip}} = -\frac{N_A\,\mu_{\text{el}}^{2}}{4\pi\varepsilon_0 r^{3}}\,\frac{\varepsilon-1}{2\varepsilon+1}, \qquad \text{(A3.6.7)}$$

with $\varepsilon_0$ and $N_A$ denoting vacuum permittivity and Avogadro's constant, respectively. The dielectric constant
inside the cavity in this approximation is assumed to be unity. Applying this expression to a solvent series
study of a reaction involving large charge separation, such as the Menshutkin reaction of triethylamine with ethyl iodide

$$\mathrm{Et_3N} + \mathrm{EtI} \rightarrow [\mathrm{Et_3N \cdots Et \cdots I}]^{\ddagger} \rightarrow \mathrm{Et_4N^{+}} + \mathrm{I^{-}}$$

with dipole moments and cavity radii $\mu_a$, $r_a$ and $\mu_b$, $r_b$ for the reactants and $\mu_{ab}^{\ddagger}$, $r_{ab}$ for the transition state,
one obtains

$$\delta_M \Delta G^{\ddagger} = -\frac{N_A}{4\pi\varepsilon_0}\,\frac{\varepsilon-1}{2\varepsilon+1}\left[\frac{(\mu_{ab}^{\ddagger})^{2}}{r_{ab}^{3}} - \frac{\mu_a^{2}}{r_a^{3}} - \frac{\mu_b^{2}}{r_b^{3}}\right]$$

predicting a linear relationship between $\ln(k_{\text{solvent}}/k_{\text{reference}})$ and $(\varepsilon-1)/(2\varepsilon+1)$ which is only approximately
reflected in the experimental data covering a wide range of solvents [14] (see figure A3.6.3). This is not
surprising, in view of the approximate character of the model and, also, because a change of solvent does not
only lead to a variation in the dielectric constant, but at the same time may be accompanied by a change in
other kinetically relevant properties of the medium, demonstrating a general weakness of this type of
experimental approach.

Figure A3.6.3. Solvent polarity dependence of the rate constant for the Menshutkin reaction (data from [14]).

Within the framework of the same dielectric continuum model for the solvent, the Gibbs free energy of
solvation of an ion of radius $r_{\text{ion}}$ and charge $z_{\text{ion}}e$ may be estimated by calculating the electrostatic work done
when hypothetically charging a sphere at constant radius $r_{\text{ion}}$ from $q = 0$ to $q = z_{\text{ion}}e$. This yields the Born
equation [13]

$$\Delta G_{\text{sol,ion}} = -\frac{N_A\,z_{\text{ion}}^{2} e^{2}}{8\pi\varepsilon_0 r_{\text{ion}}}\left(1 - \frac{1}{\varepsilon}\right) \qquad \text{(A3.6.8)}$$


such that for a reaction of the type 


$$A^{z_A} + B \rightarrow (AB)^{\ddagger\,z_A}$$


the change in effective barrier height (difference of Gibbs free energy of solvation changes between transition 
state and reactants) according to equation (A3. 6. 7) and equation (A3. 6. 8) equals 


$\delta_M \Delta G^{\ddagger} = -\,\cdots$

This formula does not include the charge-dipole interaction between reactants A and B. The correlation
between measured rate constants in different solvents and their dielectric parameters in general is of a similar
quality as illustrated for neutral reactants. This is not, however, due to the approximate nature of the Born
model itself which, in spite of its simplicity, leads to remarkably accurate values of ion solvation energies, if
the ionic radii can be reliably estimated [15].
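
As a numerical illustration of equations (A3.6.7) and (A3.6.8), the following Python sketch evaluates the Onsager dipole and Born ion solvation free energies in SI units. It assumes the standard forms $\Delta G_{\text{dip}} = -N_A\mu^2/(4\pi\varepsilon_0 r^3)\cdot(\varepsilon-1)/(2\varepsilon+1)$ and $\Delta G_{\text{ion}} = -N_A z^2e^2/(8\pi\varepsilon_0 r)\cdot(1-1/\varepsilon)$. The function names and the example inputs (a singly charged ion of radius 2 Å, a 5 D dipole in a 3 Å cavity, $\varepsilon = 78.4$) are illustrative choices, not values taken from the text.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
N_A = 6.02214076e23          # Avogadro's constant, 1/mol
E_CHARGE = 1.602176634e-19   # elementary charge, C
DEBYE = 3.33564e-30          # 1 debye in C m


def kirkwood_factor(eps):
    """Dielectric function (eps - 1)/(2 eps + 1) of the Onsager model."""
    return (eps - 1.0) / (2.0 * eps + 1.0)


def onsager_dipole_dg(mu, radius, eps):
    """Molar solvation free energy (J/mol) of a point dipole mu (C m)
    in a spherical cavity of the given radius (m)."""
    return -N_A * mu**2 / (4.0 * math.pi * EPS0 * radius**3) * kirkwood_factor(eps)


def born_ion_dg(z, radius, eps):
    """Molar Born solvation free energy (J/mol) of an ion of charge z*e
    and radius (m)."""
    return -N_A * (z * E_CHARGE)**2 / (8.0 * math.pi * EPS0 * radius) * (1.0 - 1.0 / eps)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the orders of magnitude.
    print("Born ion    :", born_ion_dg(1, 2.0e-10, 78.4) / 1000.0, "kJ/mol")
    print("Onsager dip.:", onsager_dipole_dg(5.0 * DEBYE, 3.0e-10, 78.4) / 1000.0, "kJ/mol")
```

Because the dielectric function saturates at 1/2 for large $\varepsilon$, the sketch also makes it plain why solvent series confined to strongly polar media show little spread in this model.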

Onsager's reaction field model in its original form offers a description of major aspects of equilibrium 
solvation effects on reaction rates in solution that includes the basic physical ideas, but the inherent 
simplifications seriously limit its practical use for quantitative predictions. It since has been extended along 
several lines, some of which are briefly summarized in the next section. 

(C) IMPROVED DIELECTRIC CONTINUUM MODELS 

Onsager's original reaction field method imposes some serious limitations: the description of the solute as a 
point dipole located at the centre of a cavity, the spherical form of the cavity and the assumption that cavity 
size and solute dipole moment are independent of the solvent dielectric constant. 

Kirkwood generalized the Onsager reaction field method to arbitrary charge distributions and, for a spherical
cavity, obtained the Gibbs free energy of solvation in terms of a multipole expansion of the electrostatic field
generated by the charge distribution [12, 13]

$$\Delta G_{\text{sol}} = -\frac{N_A}{8\pi\varepsilon_0}\sum_{n=0}^{\infty}\frac{(n+1)(\varepsilon-1)}{(n+1)\varepsilon+n}\,\frac{1}{r^{2n+1}}\sum_{i=1}^{N}\sum_{j=1}^{N} q_i q_j\,r_i^{n} r_j^{n}\,P_n(\cos\theta_{ij}) \qquad \text{(A3.6.9)}$$

where $N$ is the number of point charges $q_i$ in the cavity of radius $r$, vectors $\mathbf{r}_i$ denote their position, $\theta_{ij}$ the angle between
respective vectors, and $P_n$ are the Legendre polynomials. This expression reduces to equation (A3.6.8) and
equation (A3.6.7) for $n = 0$ and $n = 1$, respectively. It turns out that usually it is sufficient to consider terms up
to $n \approx 4$ to achieve convergence of the expansion. The absolute value of the solvation energy calculated from
equation (A3.6.9), however, critically depends on the size and the shape of the cavity. Even when the charge
distribution of reactants and transition state can be calculated to sufficient accuracy by advanced quantum
chemical methods, this approach will only give
useful quantitative information about the solvent dependence of reaction rates if the cavity does not change
appreciably along the reaction path from reactants to transition state and if it is largely solvent independent.

As this condition usually is not met, considerable effort has gone into developing methods to calculate
solvation energies for cavities of arbitrary shape that as closely as possible mimic the topology of the interface
between solute molecule and solvent continuum. Among these are various implementations of boundary
element methods [16], in which a cavity surface of arbitrary shape is divided into surface elements carrying a
specified surface charge. In one of the simpler variants, a virtual charge scheme as proposed by Miertus
[17], the charge distribution of the solute $\rho^{0}(\mathbf{r})$ reflects itself in corresponding polarization surface charge
densities at the cavity interface $\sigma(s_i)$ that are assigned to each of $m$ surface elements $s_i$ and assumed to be
constant across the respective surface areas $\Delta S_i$. The electric potential generated by these virtual charges is

$$V_\sigma(\mathbf{r}) = \sum_{i=1}^{m}\frac{\sigma(s_i)\,\Delta S_i}{|\mathbf{r}-\mathbf{s}_i|}. \qquad \text{(A3.6.10)}$$


The surface charge density on each surface element is determined by the boundary condition

$$\sigma(s_i) = -\frac{\varepsilon-1}{4\pi\varepsilon}\left(\frac{\partial V}{\partial n_i}\right)_{s_i} \qquad \text{(A3.6.11)}$$

where $\varepsilon$ denotes the static dielectric constant of the solvent, and the derivative of the total electrical potential
$V$ at the interface is taken with respect to the normal vector $\mathbf{n}_i$ of each surface element $s_i$. $V$ is the sum of
contributions from the solute charges $\rho^{0}$ and the induced polarization surface charges $\sigma(s_i)$. Using equation
(A3.6.10) and equation (A3.6.11), the virtual surface charge densities $\sigma(s_i)$ can be calculated iteratively, and
the Gibbs free energy of solvation is then half the electrostatic interaction energy of the solute charge
distribution in the electric potential generated by the induced polarization surface charges

$$\Delta G_{\text{sol}} = \frac{1}{2}\int V_\sigma(\mathbf{r})\,\rho^{0}(\mathbf{r})\,\mathrm{d}\mathbf{r}. \qquad \text{(A3.6.12)}$$


Of course, one has to fix the actual shape and size of the cavity before one can apply equation (A3.6.12).
Since simply taking ionic or van der Waals radii is too crude an approximation, one often uses basis-set-dependent
ab initio atomic radii and constructs the cavity from a set of intersecting spheres centred on the
atoms [18, 19]. An alternative approach, which is comparatively easy to implement, consists of using an
electrical equipotential surface to define the solute-solvent interface shape [20].

The most serious limitation remaining after modifying the reaction field method as mentioned above is the 
neglect of solute polarizability. The reaction field that acts back on the solute will affect its charge distribution 
as well as the cavity shape as the equipotential surface changes. To solve this problem while still using the 
polarizable continuum model (PCM) for the solvent, one has to calculate the surface charges on the solute by 
quantum chemical methods and represent their interaction with the solvent continuum as in classical 
electrostatics. The Hamiltonian of the system thus is written as the sum of the Hamilton operator for the 
isolated solute molecule and its interaction with the macroscopic
electrostatic reaction field. The coupled equations of the solute subject to the reaction field induced in the
solvent are then solved self-consistently to obtain the electron density of the solute in the presence of the
polarizable dielectric; this is the basis of self-consistent reaction field (SCRF) models [21]. Whether this is done in
the framework of, for example, Hartree-Fock theory or density functional theory is a question of optimizing
quantum chemical techniques outside the topics addressed here.

If reliable quantum mechanical calculations of reactant and transition state structures in vacuum are feasible,
treating electrostatic solvent effects on the basis of SCRF-PCM using cavity shapes derived from methods
mentioned above is now sufficiently accurate to predict variations of Gibbs free energies of activation $\delta\Delta G^{\ddagger}$
with solvent polarity reliably, at least in the absence of specific solute-solvent interactions. For instance,
considering again a Menshutkin reaction, in this case of pyridine with methyl bromide,
$\mathrm{Pyr + MeBr \rightarrow MePyr^{+} + Br^{-}}$, in cyclohexane and di-n-butyl ether, the difference between calculated and
experimental values of $\Delta G^{\ddagger}$ is only about 2% and 4%, respectively [22, 23].

Since SCRF-PCM takes into account only macroscopic electrostatic contributions to the Gibbs free energy of
solvation, short-range effects which are limited predominantly to the first solvation shell have to be
considered by adding further terms. These correct for the neglect of effects caused by solute-solvent
electron correlation including dispersion forces, hydrophobic interactions, dielectric saturation in the case of
multiply charged ions and solvent structural influences on cavitation. In many cases, however, the
electrostatic contribution dominates and dielectric continuum models provide a satisfactory description.

A3.6.2.4 EQUILIBRIUM SOLVENT EFFECTS— MICROSCOPIC VIEW 

Specific solute-solvent interactions involving the first solvation shell only can be treated in detail by discrete 
solvent models. The various approaches like point charge models, supermolecular calculations, quantum 
theories of reactions in solution, and their implementations in Monte Carlo methods and molecular dynamics 
simulations like the Car-Parrinello method are discussed elsewhere in this encyclopedia. Here only some 
points will be briefly mentioned that seem of relevance for later sections. 

(A) POINT CHARGE DISTRIBUTION MODEL [11] 

Considering, for simplicity, only electrostatic interactions, one may write the solute-solvent interaction term
of the Hamiltonian for a solute molecule surrounded by $S$ solvent molecules (in atomic units) as

$$\hat{H}_{\text{int}} = \sum_{s=1}^{S}\left[\sum_{i=1}^{N}\sum_{j=1}^{N_s}\frac{1}{r_{ij}} - \sum_{i=1}^{N}\sum_{a=1}^{M_s}\frac{Z_a}{r_{ia}} - \sum_{A=1}^{M}\sum_{j=1}^{N_s}\frac{Z_A}{r_{Aj}} + \sum_{A=1}^{M}\sum_{a=1}^{M_s}\frac{Z_A Z_a}{r_{Aa}}\right] \qquad \text{(A3.6.13)}$$

where the solute contains $N$ electrons and $M$ nuclei with charges $Z_A$ and the solvent molecules $N_s$ electrons and
$M_s$ nuclei with charge $Z_a$. In the point charge method equation (A3.6.13) reduces to

$$\hat{H}_{\text{int}} = \sum_{p}\sum_{A=1}^{M}\frac{Z_A q_p}{r_{Ap}} - \sum_{p}\sum_{i=1}^{N}\frac{q_p}{r_{ip}}. \qquad \text{(A3.6.14)}$$

Here the position $r_{ip}$ of the point charges $q_p$ located on the solvent molecules is determined by the structure of
the solvent shell and the electron density distribution within the solvent molecule. In this type of model, the
latter is assumed to be fixed, i.e. the solvent molecules are considered non-polarizable while solving the
Schrödinger equation for the coupled system.

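Instead of using point charges one may also approximate the interaction Hamiltonian in terms of solute
electrons and nuclei interacting with solvent point dipoles $\boldsymbol{\mu}_d$

$$\hat{H}_{\text{int}} = \sum_{d}\sum_{A=1}^{M}\frac{Z_A\,\boldsymbol{\mu}_d\cdot\mathbf{r}_{Ad}}{r_{Ad}^{3}} - \sum_{d}\sum_{i=1}^{N}\frac{\boldsymbol{\mu}_d\cdot\mathbf{r}_{id}}{r_{id}^{3}} \qquad \text{(A3.6.15)}$$

where the vectors $\mathbf{r}_{Ad}$ and $\mathbf{r}_{id}$ point from dipole $d$ to nucleus $A$ and electron $i$, respectively.
In either case, the structure of the solvation shell has to be supplied by other methods or introduced
ad hoc by some further model assumptions, while charge distributions of the solute and within solvent
molecules are obtained from quantum chemistry.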

(B) SOLVATION SHELL STRUCTURE 

The quality of the results that can be obtained with point charge or dipole models depends critically on the
input solvation shell structure. In view of the computer power available today, taking the most rigorous route
is feasible in many cases, i.e. using statistical methods to calculate distribution functions in solution. In this
way the average structure of solvation shells becomes accessible and can be used in the equilibrium solvation
calculations required to obtain, for example, TST rate constants.

Assuming that additive pair potentials are sufficient to describe the inter-particle interactions in solution, the
local equilibrium solvent shell structure can be described using the pair correlation function $g^{(2)}(\mathbf{r}_1, \mathbf{r}_2)$. If the
potential only depends on inter-particle distance, $g^{(2)}(\mathbf{r}_1, \mathbf{r}_2)$ reduces to the radial distribution function
$g(r) = g(|\mathbf{r}_1 - \mathbf{r}_2|)$ such that $\rho \cdot 4\pi r^{2}\,\mathrm{d}r\,g(r)$ gives the number of particles in a spherical shell of thickness $\mathrm{d}r$ at
distance $r$ from a reference particle ($\rho$ denotes average particle density). The local particle density is then
simply $\rho \cdot g(r)$. The radial distribution function can be obtained experimentally in neutron scattering
experiments by measuring the angular dependence of the scattering amplitude, or by numerical simulation
using Monte Carlo methods.
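
To make the normalization of $g(r)$ concrete, the following Python sketch histograms pair distances from a single particle configuration and divides by the ideal-gas count $\rho \cdot 4\pi r^2\,\mathrm{d}r$ per reference particle. It assumes a cubic box with periodic boundaries and the minimum-image convention; the function name, the bin layout and the requirement `r_max <= box/2` are choices made for this illustration, and a production calculation would average over many configurations from a Monte Carlo or molecular dynamics run.

```python
import math
import random


def radial_distribution(positions, box, dr, r_max):
    """Return bin centres and g(r) for particles in a cubic periodic box.

    positions: list of (x, y, z) tuples; box: edge length;
    dr: bin width; r_max: largest distance sampled (should not exceed box/2).
    """
    n = len(positions)
    n_bins = int(r_max / dr)
    counts = [0] * n_bins
    for i in range(n - 1):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(positions[i], positions[j]):
                d = a - b
                d -= box * round(d / box)      # minimum-image convention
                d2 += d * d
            k = int(math.sqrt(d2) / dr)
            if k < n_bins:
                counts[k] += 1                 # each unique pair counted once
    rho = n / box**3                           # average particle density
    centres, g = [], []
    for k, c in enumerate(counts):
        r_lo, r_hi = k * dr, (k + 1) * dr
        shell = 4.0 / 3.0 * math.pi * (r_hi**3 - r_lo**3)
        # factor 2: a pair contributes to the shell of both partners
        g.append(2.0 * c / (n * rho * shell))
        centres.append(0.5 * (r_lo + r_hi))
    return centres, g


if __name__ == "__main__":
    random.seed(0)
    # Uncorrelated (ideal-gas) test configuration: g(r) should scatter around 1.
    pts = [tuple(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(300)]
    for r, value in zip(*radial_distribution(pts, 10.0, 0.5, 5.0)):
        print(f"{r:5.2f}  {value:6.3f}")
```

For an interacting liquid the same routine would show the familiar first-shell peak, and $\rho \cdot g(r)$ then gives the local density discussed above.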

(C) POTENTIAL OF MEAN FORCE 

At low solvent density, where isolated binary collisions prevail, the radial distribution function $g(r)$ is simply
related to the pair potential $u(r)$ via $g(r) = \exp[-u(r)/kT]$. Correspondingly, at higher density one defines a
function $w(r) = -kT\ln[g(r)]$. It can be shown that the gradient of this function is equivalent to the mean force
between two particles obtained by holding them at fixed distance $r$ and averaging over the remaining $N - 2$
particles of the system. Hence $w(r)$ is called the potential of mean force. Choosing the low-density system as a
reference state one has the relation

$$\lim_{\rho\to 0} g(r) = \exp[-u(r)/kT] \;\Rightarrow\; \lim_{\rho\to 0} w(r) = u(r)$$


and $\Delta w(r) = w(r) - u(r)$ describes the average many-body contribution such as, for example, effects due to
solvation shell structure. In the language of the thermodynamic formulation of TST, the ratio of rate constants
in solution and dilute-gas phase consequently may be written as

$$kT\ln\frac{k_{\text{solution}}}{k_{\text{gas}}} = -\delta\Delta w^{\ddagger} = -\left[\Delta w(r^{\ddagger}) - \Delta w(r_{\text{reac}})\right]. \qquad \text{(A3.6.16)}$$
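
The two relations above translate directly into a few lines of Python. The sketch below converts tabulated values of $g(r)$ into the potential of mean force $w(r) = -kT\ln g(r)$ and evaluates the rate ratio of equation (A3.6.16), read here as $k_{\text{solution}}/k_{\text{gas}} = \exp\{-[\Delta w(r^{\ddagger}) - \Delta w(r_{\text{reac}})]/kT\}$. The function names are hypothetical, and energies are expressed in whatever unit is used for $kT$.

```python
import math


def potential_of_mean_force(g_values, kT):
    """w(r) = -kT ln g(r) for a list of g(r) values; g = 0 maps to +infinity."""
    return [-kT * math.log(g) if g > 0.0 else math.inf for g in g_values]


def solution_to_gas_rate_ratio(dw_ts, dw_reac, kT):
    """k_solution / k_gas from the many-body parts of the potential of mean
    force at the transition state (dw_ts) and for the reactants (dw_reac)."""
    return math.exp(-(dw_ts - dw_reac) / kT)


if __name__ == "__main__":
    kT = 1.0
    # Low-density check: g = exp(-u/kT) must give back w = u.
    u = [-0.5, 0.0, 2.0]
    g = [math.exp(-x / kT) for x in u]
    print(potential_of_mean_force(g, kT))
    # A transition state stabilised by 2 kT relative to the reactants.
    print(solution_to_gas_rate_ratio(-2.0, 0.0, kT))
```

A negative $\delta\Delta w^{\ddagger}$, i.e. a transition state that is better accommodated by the solvent shell than the separated reactants, therefore raises the rate constant in solution.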

A3.6.2.5 PRESSURE EFFECTS 

The inherent difficulties in interpreting the effects observed in solvent series studies of chemical reaction 
rates, which offer little control over the multitude of parameters that may influence the reaction, suggest rather 
using a single liquid solvent and varying the pressure instead, thereby changing solvent density and polarity in 
a well known way. One also may have to consider, of course, variations in the local solvent shell structure 
with increasing pressure. 

(A) ACTIVATION VOLUME 

In the thermodynamic formulation of TST the pressure dependence of the reaction rate coefficient defines a
volume of activation [24, 25 and 26]

$$\left(\frac{\partial \ln k}{\partial p}\right)_T = -\frac{\Delta V^{\ddagger}}{RT} \qquad \text{(A3.6.17)}$$

with $\Delta V^{\ddagger} = V^{\ddagger} - \sum_i V_i$, the difference of the molar volume of transition state and the sum over molar
volumes of reactants. Experimental evidence shows that $|\Delta V^{\ddagger}|$ is of the order of 10 - 10 cm$^{3}$ mol$^{-1}$ and
usually pressure dependent [27]. It is common practice to interpret it using geometric arguments considering
reactant and transition state structures and by differences in solvation effects between reactant and transition
state. If one uses a molar concentration scale (standard state 1 mol dm$^{-3}$), an additional term $+\kappa_T\,\Delta\nu^{\ddagger}$
appears in the rhs of equation (A3.6.17), the product of isothermal solvent compressibility and change in sum
over stoichiometric coefficients between reactants and transition state.
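
In practice the activation volume is extracted from the slope of $\ln k$ against pressure. The Python sketch below does this with an ordinary least-squares fit, assuming that $\Delta V^{\ddagger}$ is constant over the fitted pressure range (which, as noted above, is usually only approximately true) and that no compressibility correction for the concentration scale is needed. The function name and the synthetic data are illustrative only.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol K)


def activation_volume(pressures, rate_constants, temperature):
    """Estimate the activation volume (m^3/mol) as -RT times the
    least-squares slope of ln k versus p (p in Pa, T in K)."""
    n = len(pressures)
    ln_k = [math.log(k) for k in rate_constants]
    p_mean = sum(pressures) / n
    y_mean = sum(ln_k) / n
    sxy = sum((p - p_mean) * (y - y_mean) for p, y in zip(pressures, ln_k))
    sxx = sum((p - p_mean) ** 2 for p in pressures)
    return -R_GAS * temperature * sxy / sxx


if __name__ == "__main__":
    # Synthetic data generated with an assumed value of -20 cm^3/mol.
    dv_true = -20.0e-6
    temperature = 298.15
    pressures = [1.0e5, 5.0e7, 1.0e8, 1.5e8, 2.0e8]
    ks = [1.0e3 * math.exp(-dv_true * p / (R_GAS * temperature)) for p in pressures]
    dv = activation_volume(pressures, ks, temperature)
    print(dv * 1.0e6, "cm^3/mol")
```

If the plot of $\ln k$ against $p$ is visibly curved, a local slope (or a fit restricted to a narrow pressure window) is the more appropriate quantity.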

There is one important caveat to consider before one starts to interpret activation volumes in terms of changes 
of structure and solvation during the reaction: the pressure dependence of the rate coefficient may also be 
caused by transport or dynamic effects, as solvent viscosity, diffusion coefficients and relaxation times may 
also change with pressure [2]. Examples will be given in subsequent sections. 

(B) ACTIVATION VOLUME IN A DIELECTRIC CONTINUUM 

If, in analogy to equation (A3.6.5), one denotes the change of activation volume with respect to some
reference solvent as $\delta_M \Delta V^{\ddagger}$ and considers only electrostatic interactions of reactant and transition state with a
dielectric continuum solvent, one can calculate it directly from

$$\delta_M \Delta V^{\ddagger} = \left(\frac{\partial\,\delta_M \Delta G^{\ddagger}}{\partial p}\right)_T \qquad \text{(A3.6.18)}$$

by using any of the models mentioned above. If the amount of charge redistribution is significant and the
solvent is polar, the dielectric contribution to $\Delta V^{\ddagger}$ by far dominates any so-called intrinsic effects connected
with structural changes between reactant and transition state. For the Menshutkin reaction, for example,
equation (A3.6.18) gives

$$\delta_M \Delta V^{\ddagger} = -\frac{N_A}{4\pi\varepsilon_0}\,\frac{(\mu_{ab}^{\ddagger})^{2}}{r_{ab}^{3}}\left[\frac{3}{(2\varepsilon+1)^{2}}\left(\frac{\partial\varepsilon}{\partial p}\right)_T\right]$$

which includes a positive term resulting from the pressure dependence of the dielectric constant (in square
brackets) and represents the experimentally observed pressure dependence of the activation volume quite
satisfactorily [25]. For the Menshutkin reaction, only the large dipole moment of the transition state needs to
be considered, resulting in a negative activation volume, a typical example of electrostriction. If one assumes
that the neglect of solute polarizability is justified and, in addition, the cavity radius is constant, one may use
this kind of expression to estimate transition state dipole moments. Improved continuum models as outlined in
the preceding sections may, of course, also be applied to analyse activation volumes.

(C) ACTIVATION VOLUME AND LOCAL SOLVENT STRUCTURE 

In a microscopic equilibrium description the pressure-dependent local solvent shell structure enters through
variations of the potential of mean force, $(\partial\,\delta\Delta w^{\ddagger}/\partial p)_T$, such that the volume of activation contains a
contribution related to the pressure dependence of radial distribution functions for reactants and transition
state, i.e.

$$\left(\frac{\partial\,\delta\Delta w^{\ddagger}}{\partial p}\right)_T = -kT\left[\frac{\partial}{\partial p}\ln\frac{g(r^{\ddagger})}{g(r_{\text{reac}})}\right]_T$$

This contribution of local solvent structure to $\Delta V^{\ddagger}$ may be quite significant and, even in nonpolar solvents, in
many cases outweigh the intrinsic part. It essentially describes a caging phenomenon, as with increasing
pressure the local solvent density or packing fraction of solvent molecules around reactants and transition
state increases, thereby enhancing the stability of the solvent cage. This constitutes an equilibrium view of
caging in contrast to descriptions of the cage effect in, for example, photodissociation where solvent friction is
assumed to play a central role.

How large this packing effect can be was demonstrated in simple calculations for the atom
transfer reaction $\mathrm{CH_3 + CH_4 \rightarrow CH_4 + CH_3}$ using a binary solution of hard spheres at infinite dilution as the
model system [28]. Allowing spheres to partially overlap in the transition state, i.e. assuming a common
cavity, reaction rates were calculated by variational TST for different solute-to-solvent hard-sphere ratios
$r_\sigma = \sigma_M/\sigma_S$ and solvent densities $\rho\sigma_S^{3}$. Increasing the latter from 0.70 to 0.95 led to an enhancement of the relative
rate constant $k_{\text{solution}}/k_{\text{gas}}$ by factors of 8.5, 15.5 and 53 for $r_\sigma$ equal to 0.93, 1.07 and 1.41, respectively, thus
clearly showing the effect of local packing density. With respect to the calculated gas phase value the rate
constants at the highest density were 95, 280 and
2670 times larger, respectively. This behaviour is typical for 'tight' transition states, whereas for loose
transition states as they appear, for example, in isomerization reactions, this caging effect is orders of
magnitude smaller.


A3.6.3 TRANSPORT EFFECTS 

If reactant motion along the reaction path in the condensed phase involves significant displacement with 
respect to the surrounding solvent medium and there is non-negligible solute-solvent coupling, frictional 
forces arise that oppose the reactive motion. The overall rate of intrinsically fast reactions for which, for 
example, the TST rate constant is sufficiently large, therefore, may be influenced by the viscous drag that the 
molecules experience on their way from reactants to products. As mentioned in the introduction, dynamic 
effects due to other partially non-relaxed degrees of freedom will not be considered in this section. 

For a bimolecular reaction, this situation is easily illustrated by simply writing the reaction as a sequence of
two steps

$$A + B \underset{k_{-\text{diff}}}{\overset{k_{\text{diff}}}{\rightleftharpoons}} (A \cdots B) \overset{k_{\text{mol}}}{\longrightarrow} \text{(products)} \qquad \text{(A3.6.19)}$$

where brackets denote common solvent cage (encounter complex), $k_{\text{diff}}$ is the rate constant of diffusive
approach of reactants, sometimes called the 'encounter rate', $k_{-\text{diff}}$ is that of diffusive separation of the
unreacted encounter pair and $k_{\text{mol}}$ that of the reactive step in the encounter complex. If $k_{\text{mol}} \gg k_{-\text{diff}}$, the overall
reaction rate constant $k$ essentially equals $k_{\text{diff}}$, and the reaction is said to be diffusion controlled. One
important implicit assumption of this phenomenological description is that diffusive approach and separation
are statistically independent processes, i.e. the lifetime of the encounter pair is sufficiently long to erase any
memory about its formation history. Examples of processes that often become diffusion controlled in solution
are atom and radical recombination, electron and proton transfer, fluorescence quenching and electronic
energy transfer.

In a similar phenomenological approach to unimolecular reactions involving large-amplitude motion, the
effect of friction on the rate constant can be described by a simple transition formula between the high-pressure
limit $k_\infty$ of the rate constant at negligible solvent viscosity and the so-called Smoluchowski limit of
the rate constant, $k_{\text{SM}}$, approached in the high-damping regime at large solvent viscosity [2]:

$$\frac{1}{k} = \frac{1}{k_\infty} + \frac{1}{k_{\text{SM}}}. \qquad \text{(A3.6.20)}$$

As $k_{\text{SM}}$ is inversely proportional to solvent viscosity, in sufficiently viscous solvents the rate constant $k$
becomes equal to $k_{\text{SM}}$. This concerns, for example, reactions such as isomerizations involving significant
rotation around single or double bonds, or dissociations requiring separation of fragments, although it may be
difficult to experimentally distinguish between effects due to local solvent structure and solvent friction.
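
The interpolation formula (A3.6.20) is easy to explore numerically. The following Python sketch combines a viscosity-independent $k_\infty$ with a Smoluchowski limit modelled as $k_{\text{SM}} = c/\eta$; the proportionality constant `c` and the numerical values are hypothetical and serve only to show the turnover from the plateau to the $1/\eta$ regime.

```python
def smoluchowski_limit(c, viscosity):
    """High-damping rate constant, assumed inversely proportional to viscosity."""
    return c / viscosity


def rate_with_friction(k_inf, k_sm):
    """Interpolation 1/k = 1/k_inf + 1/k_SM of equation (A3.6.20)."""
    return 1.0 / (1.0 / k_inf + 1.0 / k_sm)


if __name__ == "__main__":
    k_inf = 1.0e10          # s^-1, hypothetical high-pressure limit
    c = 1.0e7               # Pa, hypothetical constant so that c/eta is in s^-1
    for eta in (1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1):   # viscosity in Pa s
        k = rate_with_friction(k_inf, smoluchowski_limit(c, eta))
        print(f"eta = {eta:8.1e} Pa s   k = {k:10.3e} s^-1")
```

The printed values stay close to $k_\infty$ at low viscosity and fall off as $1/\eta$ once $k_{\text{SM}} \ll k_\infty$, which is the behaviour the text describes.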

Systematic experimental investigations of these transport effects on reaction rates can either be done by
varying solvents in a homologous series to change viscosity without affecting other physicochemical or
chemical properties
(or as little as possible) or, much more elegantly but experimentally more demanding, by varying pressure and
temperature in a single solvent, maintaining control over viscosity, polarity and density at the same time. As
detailed physical insight is gained by the latter approach, the few examples shown all will be from pressure-dependent
experimental studies. Computer experiments involving stochastic trajectory simulations or classical
molecular dynamics simulations have also been extremely useful for understanding details of transport effects
on chemical reaction rates, though they have mostly addressed dynamic effects and been less successful in
actually providing a quantitative connection with experimentally determined solvent or pressure dependences
of rate constants or quantum yields of reactions.

A3.6.3.1 DIFFUSION AND BIMOLECULAR REACTIONS 

(A) DIFFUSION-CONTROLLED RATE CONSTANT 

Smoluchowski theory [29, 30] and its modifications form the basis of most approaches used to interpret
bimolecular rate constants obtained from chemical kinetics experiments in terms of diffusion effects [31]. The
Smoluchowski model is based on Brownian motion theory underlying the phenomenological diffusion
equation in the absence of external forces. In the standard picture, one considers a dilute fluid solution of
reactants A and B with $[\mathrm{A}] \ll [\mathrm{B}]$ and asks for the time evolution of [B] in the vicinity of A, i.e. of the density
distribution $\rho(r,t) = [\mathrm{B}](r,t)/[\mathrm{B}]_{t=0} \approx [\mathrm{B}](r,t)/[\mathrm{B}]$ ([B] is assumed not to change appreciably during the
reaction). The initial distribution and the outer and inner boundary conditions are chosen, respectively, as

$$\begin{aligned}
\rho(r, t=0) &= 1 \quad \text{for } r > R\\
\rho(r\to\infty, t) &= 1 \quad \text{for } t > 0\\
k_{\text{mol}}\,\rho(R) &= 4\pi R^{2} D_{AB}\left(\frac{\partial\rho}{\partial r}\right)_{r=R}
\end{aligned} \qquad \text{(A3.6.21)}$$

where $R$ is the encounter radius and $D_{AB}$ the mutual diffusion coefficient of reactants. The reflecting
boundary condition [32] at the encounter distance $R$ ensures that, once a stationary concentration of encounter
pairs is established, the intrinsic reaction rate in the encounter pair, $k_{\text{mol}}\rho(R)$, equals the rate of diffusive
formation of encounter pairs. In this formulation $k_{\text{mol}}$ is a second-order rate constant. Solving the diffusion
equation

$$\frac{\partial\rho}{\partial t} = D_{AB}\left[\frac{\partial^{2}\rho}{\partial r^{2}} + \frac{2}{r}\frac{\partial\rho}{\partial r}\right] \qquad \text{(A3.6.22)}$$


subject to conditions (A3.6.21) and realizing that the observed reaction rate coefficient $k(t)$ equals $k_{\text{mol}}\rho(R,t)$,
one obtains

$$k(t) = \frac{k_{\text{mol}}}{1+x}\left\{1 + x\exp\left[y^{2}(1+x)^{2}t\right]\operatorname{erfc}\left[y(1+x)\sqrt{t}\right]\right\} \qquad \text{(A3.6.23)}$$

using the abbreviations $x = k_{\text{mol}}/4\pi R D_{AB}$ and $y = \sqrt{D_{AB}}/R$. The time-dependent terms reflect the transition
from the initial to the stationary distribution. After this transient term has decayed to zero, the reaction rate
attains its stationary value

$$k = \frac{4\pi R D_{AB}\,k_{\text{mol}}}{4\pi R D_{AB} + k_{\text{mol}}} = \frac{k_{\text{diff}}\,k_{\text{mol}}}{k_{\text{diff}} + k_{\text{mol}}} \qquad \text{(A3.6.24)}$$

such that for $k_{\text{mol}} \gg k_{\text{diff}}$ one reaches the diffusion limit $k \approx k_{\text{diff}}$. Comparing equation (A3.6.24) with the
simple kinetic scheme (A3.6.19), one realizes that at this level of Smoluchowski theory one has $k_{-\text{diff}} = k_{\text{diff}}\,\rho(R)$,
i.e. there is no effect due to caging of the encounter complex in the common solvation shell. There exist
numerous modifications and extensions of this basic theory that not only involve different initial and
boundary conditions, but also the inclusion of microscopic structural aspects [31]. Among these are
hydrodynamic repulsion at short distances that may be modelled, for example, by a distance-dependent
diffusion coefficient

$$D_{AB}(r) = D_{AB}\left[1 - \frac{1}{2}\exp\left(1 - \frac{r}{R}\right)\right]$$


or the potential of mean force via the radial distribution function $g(r)$, which leads to a significant reduction of
the steady-state rate constant by about one-third with respect to the Smoluchowski value [33, 34]:

$$k_{\text{diff}} = \left[\int_{R}^{\infty}\frac{\mathrm{d}r}{4\pi r^{2} D_{AB}(r)\,g(r)}\right]^{-1}$$
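
The time dependence in equation (A3.6.23) and its stationary limit (A3.6.24) can be checked numerically. The Python sketch below assumes the form $k(t) = \frac{k_{\text{mol}}}{1+x}\{1 + x\,e^{z^2}\operatorname{erfc}(z)\}$ with $x = k_{\text{mol}}/4\pi R D_{AB}$, $y = \sqrt{D_{AB}}/R$ and $z = y(1+x)\sqrt{t}$, works in per-molecule SI units (m³ s⁻¹), and replaces $e^{z^2}\operatorname{erfc}(z)$ by its asymptotic series for large $z$ to avoid overflow. The function names, the switching point $z = 20$ and the example parameters are choices made for this illustration.

```python
import math


def scaled_erfc(z):
    """exp(z^2) * erfc(z) for z >= 0, using an asymptotic series for large z."""
    if z < 20.0:
        return math.exp(z * z) * math.erfc(z)
    inv = 1.0 / (z * z)
    return (1.0 - 0.5 * inv + 0.75 * inv * inv) / (z * math.sqrt(math.pi))


def k_stationary(k_mol, radius, d_ab):
    """Stationary rate constant: k_diff * k_mol / (k_diff + k_mol)."""
    k_diff = 4.0 * math.pi * radius * d_ab
    return k_diff * k_mol / (k_diff + k_mol)


def k_time_dependent(t, k_mol, radius, d_ab):
    """Time-dependent rate coefficient k(t) for t >= 0."""
    x = k_mol / (4.0 * math.pi * radius * d_ab)
    y = math.sqrt(d_ab) / radius
    z = y * (1.0 + x) * math.sqrt(t)
    return k_mol / (1.0 + x) * (1.0 + x * scaled_erfc(z))


if __name__ == "__main__":
    radius, d_ab, k_mol = 5.0e-10, 2.0e-9, 5.0e-17   # m, m^2/s, m^3/s (hypothetical)
    for t in (0.0, 1.0e-12, 1.0e-10, 1.0e-8, 1.0e-6):
        print(f"t = {t:8.1e} s   k(t) = {k_time_dependent(t, k_mol, radius, d_ab):.4e} m^3/s")
    print("stationary:", k_stationary(k_mol, radius, d_ab), "m^3/s")
```

At $t = 0$ the routine returns $k_{\text{mol}}$, since the initial distribution is uniform up to the encounter radius; multiplying the results by $10^{3} N_A$ converts them to dm³ mol⁻¹ s⁻¹.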


Diffusion-controlled reactions between ions in solution are strongly influenced by the Coulomb interaction
accelerating or retarding ion diffusion. In this case, the diffusion equation for $\rho$ concerning motion of one
reactant about the other stationary reactant, the Debye-Smoluchowski equation,

$$\frac{\partial\rho}{\partial t} = D_{AB}\,\nabla\cdot\left[\nabla\rho + \frac{\rho}{kT}\nabla V(r)\right] \qquad \text{(A3.6.25)}$$

includes the gradient of the potential energy $V(r)$ of the ions in the Coulomb field. Using boundary conditions
equivalent to equation (A3.6.21) and an initial condition corresponding to a Boltzmann distribution of
interionic distances

$$\rho(r, t=0) = \exp\left[-\frac{V(r)}{kT}\right]$$

and solving equation (A3.6.25), one obtains the steady-state solution

Many additional refinements have been made, primarily to take into account more aspects of the microscopic
solvent structure, within the framework of diffusion models of bimolecular chemical reactions that encompass
also many-body and dynamic effects, such as, for example, treatments based on kinetic theory [35]. One
should keep in mind, however, that in many cases the practical value of these advanced theoretical models for
a quantitative analysis or prediction of reaction rate data in solution may be limited.
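
For orientation, the size of the Coulomb effect on an ionic encounter rate can be estimated with the well-known Debye result for the fully diffusion-controlled limit, $k_{\text{diff}} = 4\pi D_{AB} r_c/[\exp(r_c/R) - 1]$ with the Onsager distance $r_c = z_A z_B e^2/(4\pi\varepsilon_0\varepsilon kT)$. This is a textbook expression assumed here for illustration (absorbing boundary at $R$, unscreened Coulomb potential in a dielectric continuum) and is not necessarily the steady-state solution referred to in the text. Function names and example values in the Python sketch below are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K


def onsager_distance(z_a, z_b, eps, temperature):
    """Distance (m) at which the Coulomb energy of the ion pair equals kT;
    negative for oppositely charged ions."""
    return z_a * z_b * E_CHARGE**2 / (4.0 * math.pi * EPS0 * eps * K_B * temperature)


def debye_rate(z_a, z_b, radius, d_ab, eps, temperature):
    """Diffusion-limited rate constant (m^3/s per pair) for ions,
    assuming an absorbing boundary and no ionic-strength screening."""
    k_smol = 4.0 * math.pi * radius * d_ab          # neutral-reactant value
    x = onsager_distance(z_a, z_b, eps, temperature) / radius
    if x == 0.0:
        return k_smol
    return k_smol * x / math.expm1(x)


if __name__ == "__main__":
    radius, d_ab = 5.0e-10, 2.0e-9                  # m, m^2/s (hypothetical)
    for z_a, z_b in ((1, -1), (1, 0), (1, 1)):
        k = debye_rate(z_a, z_b, radius, d_ab, 78.4, 298.15)
        print(f"z_A z_B = {z_a * z_b:+d}   k_diff = {k:.3e} m^3/s")
```

In a solvent of low dielectric constant the Onsager distance grows in inverse proportion to $\varepsilon$, so the acceleration for oppositely charged ions and the retardation for like charges both become much more pronounced.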

(B) TRANSITION FROM GASEOUS TO LIQUID SOLVENT— ONSET OF DIFFUSION CONTROL 

Instead of concentrating on the diffusion limit of reaction rates in liquid solution, it can be instructive to 
consider the dependence of bimolecular rate coefficients of elementary chemical reactions on pressure over a 
wide solvent density range covering gas and liquid phase alike. Particularly amenable to such studies are atom 
recombination reactions whose rate coefficients can be easily investigated over a wide range of physical 
conditions from the dilute-gas phase to compressed liquid solution [3, 4]. 

As discussed above, one may try to represent the density dependence of atom recombination rate coefficients
$k$ in the spirit of equation (A3.6.24) as

$$\frac{1}{k} \approx \frac{1}{k_{\text{rec}}^{g}} + \frac{1}{k_{\text{diff}}} \qquad \text{(A3.6.26)}$$

where $k_{\text{rec}}^{g}$ denotes the low-pressure second-order rate coefficient proportional to bath gas density, and $k_{\text{diff}}$ is
the second-order rate coefficient of diffusion-controlled atom recombination as discussed in the previous
section. In order to apply equation (A3.6.26), a number of items require answers specific to the reaction under
study: (i) the density dependence of the diffusion coefficient $D_{AA}$, (ii) the magnitude of the encounter radius
$R$, (iii) the possible participation of excited electronic states and (iv) the density dependence of $k_{\text{rec}}^{g}$. After
these have been dealt with adequately, it can be shown that for many solvent bath gases, the phenomenon of
the turnover from a molecular reaction into a diffusion-controlled recombination follows equation (A3.6.26)
without any apparent discontinuity in the rate coefficient $k$ at the gas-liquid phase transition, as illustrated for
iodine atom recombination in argon [36, 37]. For this particular case, $D_{AA}$ is based on and extrapolated from
experimental data, $R$ is taken to be one-half the sum of the Lennard-Jones radii of iodine atom and solvent
molecule, and the density-dependent contribution of excited electronic states is implicitly considered by
making the transition from the measured $k_{\text{rec}}^{g}$ in dilute ethane gas to $k_{\text{diff}}$ in dense liquid ethane.

A more subtle point concerns scaling of $k_{\text{rec}}^{g}$ with density. Among the various possibilities that exist, either
employing local densities obtained from numerically calculated radial distribution functions [38]

ft! &»-*■ U) PM{n p)

or taking into account that in the gas phase the reaction is controlled to a large extent by the energy transfer
mechanism, such that $k_{\text{rec}}^{g} \propto \beta_c Z_{LJ}$ where $\beta_c$ is a collision efficiency and $Z_{LJ}$ the Lennard-Jones collision
frequency, are probably the most practical. As $Z_{LJ} \propto 1/D_{AM}$ throughout the whole density range, $k_{\text{rec}}^{g}(\rho)$ in the
latter case may be estimated by scaling with the diffusion coefficient [37]

I jam \P) 

Although the transition to diffusion control is satisfactorily described in such an approach, even for these
apparently simple elementary reactions the situation in reality appears to be more complex due to the
participation of weakly bonding or repulsive electronic states which may become increasingly coupled as the
bath gas density increases. These processes manifest themselves in iodine atom and bromine atom
recombination in some bath gases at high densities where marked deviations from 'normal' behaviour are
observed [3, 4]. In particular, it is found that the transition from $k_{\text{rec}}^{g}$ to $k_{\text{diff}}$ is significantly broader than
predicted by equation (A3.6.26), the reaction order of iodine recombination in propane is higher than 3, and
S-shaped curves are observed with He as a bath gas [36] (see figure A3.6.4). This is in contrast to the
recombination of the methyl radicals in Ar which can be satisfactorily described by a theory of particle
encounter kinetics using appropriate interaction potentials and a modified friction for relative motion [39].
The only phenomena that cannot be reproduced by such treatments were observed at moderate gas pressures
between 1 and 100 bar. This indicates that the kinetics of the reaction in this density regime may be
influenced to a large extent by reactant-solute clustering or even chemical association of atoms or radicals
with solvent molecules.


[Figure A3.6.4: horizontal axis log10([Ar]/mol dm^-3).]


Figure A3.6.4. Pressure dependence of atom recombination rate constant of iodine in argon: experiment 
(points) [36] and theory (full line) [ 120 ]. 
This problem is related to the question of appropriate electronic degeneracy factors in chemical kinetics.
Whereas the general belief is that, at very low gas pressures, only the electronic ground state participates in 
atom recombination and that, in the liquid phase, at least most of the accessible states are coupled somewhere 
'far out' on the reaction coordinate, the transition between these two limits as a function of solvent density is 
by no means understood. Direct evidence for the participation of different electronic states in iodine geminate 
recombination in the liquid phase comes from picosecond time-resolved transient absorption experiments in 
solution [40, 41] that demonstrate the participation of the low-lying, weakly bound iodine A and A' states, 
which is also taken into account in recent mixed classical-quantum molecular dynamics simulations [42, 43]. 

A3.6.3.2 UNIMOLECULAR REACTIONS AND FRICTION 

So far the influence of the dense solvent environment on the barrier crossing process in a chemical reaction 
has been ignored. It is evident from the typical pressure dependence of the rate coefficient k of a unimolecular 
reaction from the low-pressure gas phase to the compressed-liquid phase that the prerequisites of TST are 
only met, if at all, in a narrow density regime corresponding to the plateau region of the curve. At low 
pressures, where the rate is controlled by thermal activation in binary collisions with the solvent molecules, k 
is proportional to pressure. This regime is followed by a plateau region where k is pressure independent and 
controlled by intramolecular motion along the reaction coordinate. Here k attains the so-called high-pressure
limit k_∞, which can be calculated by statistical theories if the PES for the reaction is known. If the reaction
entails large-amplitude structural changes, further increasing the pressure can lead to a decrease of k as a
result of frictional forces retarding the barrier crossing process. In the simplest approach, k eventually
approaches an inverse dependence on solvent friction, the so-called Smoluchowski limit k_SM of the reaction
rate.


The transition from k_0 to k_∞ on the low-pressure side can be constructed using multidimensional unimolecular
rate theory [1, 44], if one knows the barrier height for the reaction and the vibrational frequencies of the
reactant and transition state. The transition from k_∞ to k_SM can be described in terms of Kramers' theory [45]
which, in addition, requires knowledge of the pressure dependence of the solvent friction acting on the
molecule during the particular barrier crossing process. The result can be compared with rate coefficients
measured over a wide pressure range in selected solvents to test the theoretical models that are used to
describe this so-called Kramers' turnover of the rate coefficient.

(A) KRAMERS' THEORY 

Kramers' solution of the barrier crossing problem [45] is discussed at length in chapter A3.8 dealing with
condensed-phase reaction dynamics. As the starting point to derive its simplest version one may use the
Langevin equation, a stochastic differential equation for the time evolution of a slow variable, the reaction
coordinate r, subject to a rapidly and statistically fluctuating force F caused by microscopic solute-solvent
interactions under the influence of an external force field generated by the PES V for the reaction

$$M\dot{u} = -\gamma u + F(t) - \nabla_r V \tag{A3.6.27}$$

where dots denote time derivatives, M is the mass moving with velocity u along the reaction path and γ is the
constant friction coefficient for motion along that path. The assumption is that there are no memory effects in
the solvent bath, i.e. one considers a Markov process such that for the ensemble average ⟨F(t)·F(t′)⟩ ~ δ(t − t′). The
corresponding two-dimensional Fokker-Planck equation for the probability distribution in phase space can be
solved for the potential-barrier problem involving a harmonic well and a parabolic barrier in the limit of low
and large friction. Since the low-friction limit, corresponding to the reaction in the gas phase, is correctly
described by multidimensional unimolecular rate theory, only the solution in the large-friction limit is of
interest in this context. One obtains a correction factor F_Kr to the high-pressure limit of the reaction rate
constant k_∞


$$F_{Kr} = \left[1 + \left(\frac{\gamma}{2M\omega_B}\right)^2\right]^{1/2} - \frac{\gamma}{2M\omega_B} \tag{A3.6.28}$$


which contains as an additional parameter the curvature of the parabolic barrier top, the so-called imaginary
barrier frequency ω_B. F_Kr is less than unity and represents the dynamic effect of trajectories recrossing the
barrier top, in contrast to the central assumption of canonical and microcanonical statistical theories, like TST
or RRKM theory. In the high-damping limit, when γ/M ≫ ω_B, F_Kr reduces to ω_B M/γ which simply represents
the Smoluchowski limit where velocities relax much faster than the barrier is crossed. As γ approaches zero,
F_Kr goes to unity and the rate coefficient becomes equal to the high-pressure limit k_∞. In contrast to the
situation in the Smoluchowski limit, the velocities do not obey a Maxwell-Boltzmann distribution.
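
The two limits quoted for the Kramers factor can be checked numerically. The following is a minimal sketch, assuming the standard closed form F_Kr = [1 + (γ/2Mω_B)²]^(1/2) − γ/2Mω_B; the function name and the reduced units (M = ω_B = 1) are illustrative choices and not part of the original text.

```python
import math


def kramers_factor(gamma, mass, omega_b):
    """Kramers correction factor F_Kr for friction gamma, mass M and barrier frequency omega_B."""
    x = gamma / (2.0 * mass * omega_b)
    # sqrt(1 + x^2) - x rewritten as 1 / (sqrt(1 + x^2) + x) to avoid
    # catastrophic cancellation when x is large (high damping).
    return 1.0 / (math.sqrt(1.0 + x * x) + x)


if __name__ == "__main__":
    # Reduced units with M = omega_B = 1, so gamma equals (gamma/M)/omega_B.
    for gamma in (1e-3, 1.0, 1e3):
        f_kr = kramers_factor(gamma, 1.0, 1.0)
        print(f"gamma/(M*omega_B) = {gamma:g}: F_Kr = {f_kr:.6g}, "
              f"Smoluchowski estimate = {1.0 / gamma:.6g}")
```

For weak friction the factor stays close to unity, while for strong friction it approaches ω_B M/γ, which is the behaviour described in the paragraph above.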

(B) PRESSURE DEPENDENCE OF REACTION RATES 

If other fall-off broadening factors arising in unimolecular rate theory can be neglected, the overall 
dependence of the rate coefficient on pressure or, equivalently, solvent density may be represented by the 
expression [1, 2] 


$$k(p) = \frac{k_0 k_\infty}{k_0 + k_\infty}\,F_{Kr}(p) \tag{A3.6.29}$$

This ensures the correct connection between the one-dimensional Kramers model in the regime of large
friction and multidimensional unimolecular rate theory in that of low friction, where Kramers' model is
known to be incorrect as it is restricted to the energy diffusion limit. For low damping, equation (A3.6.29)
reduces to the Lindemann-Hinshelwood expression, while in the case of very large damping, it attains the
Smoluchowski limit

$$k_{SM} = k_\infty \frac{\omega_B}{\gamma/M} \tag{A3.6.30}$$

Sometimes it may be convenient to use an even simpler interpolation formula that connects the different rate 
coefficient limits [4] 

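$$\frac{1}{k} \approx \frac{1}{k_0} + \frac{1}{k_\infty} + \frac{1}{k_{SM}} \quad\Longrightarrow\quad k \approx \frac{k_0 k_\infty}{k_\infty + k_0\left(1 + \gamma/M\omega_B\right)} \tag{A3.6.31}$$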

for which numerical simulations have shown that it is accurate to within 10-20%. 
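
To see how the two expressions compare across the turnover, one can evaluate both on a density grid. This is only a sketch under stated assumptions: equation (A3.6.29) is taken in the form k = k_0 k_∞/(k_0 + k_∞)·F_Kr, equation (A3.6.31) as 1/k = 1/k_0 + 1/k_∞ + 1/k_SM, and both k_0 and γ/M are assumed to scale linearly with density through the hypothetical constants `a` and `b`. All function names and numerical values are illustrative.

```python
import math


def kramers_factor(gamma_over_m, omega_b):
    """F_Kr written in terms of gamma/M and the barrier frequency omega_B."""
    x = gamma_over_m / (2.0 * omega_b)
    return 1.0 / (math.sqrt(1.0 + x * x) + x)


def k_full(k0, k_inf, gamma_over_m, omega_b):
    """Assumed form of (A3.6.29): Lindemann-Hinshelwood expression times F_Kr."""
    return k0 * k_inf / (k0 + k_inf) * kramers_factor(gamma_over_m, omega_b)


def k_interp(k0, k_inf, gamma_over_m, omega_b):
    """Assumed form of (A3.6.31): reciprocal sum of the three limiting rates."""
    k_sm = k_inf * omega_b / gamma_over_m
    return 1.0 / (1.0 / k0 + 1.0 / k_inf + 1.0 / k_sm)


def turnover_curve(densities, a, b, k_inf, omega_b):
    """Rates versus density, assuming k0 = a*rho and gamma/M = b*rho (illustrative scaling)."""
    return [(rho,
             k_full(a * rho, k_inf, b * rho, omega_b),
             k_interp(a * rho, k_inf, b * rho, omega_b))
            for rho in densities]


if __name__ == "__main__":
    grid = [10.0 ** (e / 2.0) for e in range(-6, 7)]
    for rho, full, interp in turnover_curve(grid, 1.0, 1.0, 1.0, 1.0):
        print(f"rho = {rho:9.3g}  k_full = {full:9.3g}  k_interp = {interp:9.3g}")
```

Both curves rise linearly at low density, pass through a maximum below k_∞ and fall off inversely with friction at high density; the printed columns show the size of the deviation between them for the chosen parameters.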
Predicting the solvent or density dependence of rate constants by equation (A3.6.29) or equation (A3.6.31)
requires the same ingredients as the calculation of TST rate constants plus an estimate of ω_B and a suitable
model for the friction coefficient γ and its density dependence. While in the framework of molecular
dynamics simulations it may be worthwhile to numerically calculate friction coefficients from the average of
the relevant time correlation functions, for practical purposes in the analysis of kinetic data it is much more
convenient and instructive to use experimentally determined macroscopic solvent parameters.

As in the case of atom recombination, a convenient 'pressure scale' to use across the entire range is the
inverse of the binary diffusion coefficient, D_AM^-1, of reactant A in solvent M, as compared to density ρ in the
low-pressure gas and the inverse of solvent viscosity η in liquid solution [46]. According to kinetic theory
the diffusion coefficient in a dilute Lennard-Jones gas is given by

3 it 7" ft a - J * I A D 


2%/2MAM« iL,)i|, ZLjp Z U P 

with reduced collision integrals Ω^(l,j)* for Lennard-Jones well depths ε_AM = (ε_A ε_M)^(1/2) and reduced mass μ_AM,
such that the low-pressure rate coefficient is


f am Ji: 


In liquid solution, Brownian motion theory provides the relation between diffusion and friction coefficient
D_AM = kT/γ. Substituting correspondingly in equation (A3.6.31), one arrives at an expression representing the
pressure dependence of the rate constant in terms of the pressure-dependent diffusion coefficient:

k 3z , (A3. 6. 32) 

As data of the binary diffusion coefficient D_AM(ρ,T) are not available in many cases, one has to resort to
taking the solvent self-diffusion coefficient D_M(ρ,T) which requires rescaling in the low-pressure regime
according to

L/2r -|2 fLIJt 




fl.ll*" 

In the Smoluchowski limit, one usually assumes that the Stokes-Einstein relation (Dη/kT)σ = C holds, which
forms the basis of taking the solvent viscosity as a measure for the zero-frequency friction coefficient
appearing in Kramers' expressions. Here C is a constant whose exact value depends on the type of boundary
conditions used in deriving Stokes' law. It follows that the diffusion coefficient ratio is given by D_M/D_AM =
C_M σ_AM/(C_AM σ_M), which may be considered as approximately pressure independent.
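
The diffusion coefficient ratio follows in one step if the Stokes-Einstein relation is written once for the pure solvent and once for the solute-solvent pair. The subscripted constants C_M, C_AM and diameters σ_M, σ_AM are the notation assumed here for the two cases; the viscosity η is that of the solvent in both.

```latex
D_M = \frac{C_M\,kT}{\eta\,\sigma_M}, \qquad
D_{AM} = \frac{C_{AM}\,kT}{\eta\,\sigma_{AM}}
\quad\Longrightarrow\quad
\frac{D_M}{D_{AM}} = \frac{C_M\,\sigma_{AM}}{C_{AM}\,\sigma_M}
```

Because η and kT cancel, the ratio depends on pressure only through any residual pressure dependence of the boundary-condition constants and effective diameters.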
(C) EXTENSIONS OF KRAMERS' BASIC MODEL

As extensions of the Kramers theory [47] are essentially a topic of condensed-phase reaction dynamics, only a
few remarks are in place here. These concern the barrier shape and the dimensionality in the high-damping
regime. The curvature at the parabolic barrier top obviously determines the magnitude of the friction
coefficient at which the rate constant starts to decrease below the upper limit defined by the high-pressure
limit: for relatively sharp barriers this 'turnover' will occur at comparatively high solvent density
corresponding almost to liquid phase densities, whereas reactions involving flat barriers will show this
phenomenon in the moderately dense gas, maybe even in the unimolecular fall-off regime before they reach
k_∞.

Non-parabolic barrier tops cause the prefactor to become temperature dependent [48]. In the Smoluchowski
limit, k_SM ∝ T^n, |n| ~ 1, with n > 0 and n < 0 for curvatures smaller and larger than parabolic, respectively. For
a cusp-shaped barrier top, i.e. in the limit ω_B → ∞ as might be applicable to electron transfer reactions, one
obtains [45]

where ω_A is the harmonic frequency of the potential well in this one-dimensional model. In the other limit, for
an almost completely flat barrier top, the transition curve is extremely broad and the maximum of k is far
below k_∞ [49]. A qualitatively different situation arises when reactant and product well are no longer
separated by a barrier, but one considers escape out of a Lennard- Jones potential well. In this case, dynamics 
inside the well and outside on top of the 'barrier' plateau are no longer separable and, in a strict sense, the 
Smoluchowski limit is not reached any more. The stationary rate coefficient in the high-damping limit turns
out to be [50]

The original Kramers model is restricted to one-dimensional barriers and cannot describe effects due to the 
multidimensional barrier topology that may become important in cases where the system does not follow the 
minimum energy path on the PES but takes a detour across a higher effective potential energy barrier which is 
compensated by a gain in entropy. Considering a two-dimensional circular reaction path, the Smoluchowski 
limit of the rate coefficient obtained by solving the two-dimensional Fokker-Planck equation in coordinate 
space was shown to be [ 51 ] 




where ω_⊥ is the harmonic frequency of the transverse potential well and r_0 the radius of curvature of the
reaction path. This result is in good agreement with corresponding Langevin simulations [52]. A related
concept is based on the picture that with increasing excitation of modes transverse to the reaction path the 

effective barrier curvature may increase according to co & (^j.) a i^-±/' } ) ? where a and b are dimensionless 
parameters [53]. Approximating the 
topology of the saddle point region by a combination of a parabolic barrier top and a transverse parabolic
potential, one arrives at a rate constant in the Smoluchowski limit given by 


*sm = -777*>b<'' "> with w »< r > oc 

y/M 
Multidimensionality may also manifest itself in the rate coefficient as a consequence of anisotropy of the
friction coefficient [54]. Weak friction transverse to the minimum energy reaction path causes a significant 
reduction of the effective friction and leads to a much weaker dependence of the rate constant on solvent 
viscosity. These conclusions based on two-dimensional models also have been shown to hold for the general 
multidimensional case [55, 56, 57, 58, 59, 60 and 61 ]. 

To conclude this section it should be pointed out again that the friction coefficient has been considered to be 
frequency independent as implied in assuming a Markov process, and that zero-frequency friction as 
represented by solvent viscosity is an adequate parameter to describe the effect of friction on observed 
reaction rates. 

(D) FREQUENCY-DEPENDENT FRICTION 

For very fast reactions, as they are accessible to investigation by pico- and femtosecond laser spectroscopy, 
the separation of time scales into slow motion along the reaction path and fast relaxation of other degrees of 
freedom in most cases is no longer possible and it is necessary to consider dynamical models, which are not 
the topic of this section. But often the temperature, solvent or pressure dependence of reaction rate 


coefficients determined in chemical kinetics studies exhibit a signature of underlying dynamic effects, which 
may justify the inclusion of some remarks at this point. 

The key quantity in barrier crossing processes in this respect is the barrier curvature ω_B which sets the time
window for possible influences of the dynamic solvent response. A sharp barrier entails short barrier passage
times during which the memory of the solvent environment may be partially maintained. This non-Markov
situation may be expressed by a generalized Langevin equation including a time-dependent friction kernel γ(t)
[62]

$$M\dot{u} = -\int_0^t \gamma(t-\tau)\,u(\tau)\,\mathrm{d}\tau + F(t) - \nabla_r V$$

in which case the autocorrelation function of the randomly fluctuating force is no longer a δ-function but
obeys ⟨F(t)·F(t′)⟩ = kTγ(t − t′). This ensures that a Maxwell-Boltzmann distribution is re-established after
decay of the solvent response. Adding the assumption of a Gaussian friction kernel, a generalized Fokker-Planck
equation with time-dependent friction may be set up, and for a piecewise parabolic potential one
obtains an expression for the rate coefficient, the so-called Grote-Hynes formula [63]:

$$k_{GH} = \frac{\lambda_r}{\omega_B}\,k_\infty \tag{A3.6.33}$$

λ_r is the reactive frequency or unstable mode which is related to the friction coefficient by the implicit
equation

$$\lambda_r = \frac{\omega_B^2}{\lambda_r + \hat{\gamma}(\lambda_r)/M} \tag{A3.6.34}$$

with γ̂(λ_r) being the Laplace transform of the time-dependent friction, γ̂(λ_r) = ∫_0^∞ γ(t) exp(−λ_r t) dt. It is
obvious that calculation of k_GH requires knowledge of potential barrier parameters and the complete
viscoelastic response of the solvent, demonstrating the fundamental intimate link between condensed-phase
reaction dynamics and solvation dynamics. This kind of description may be equivalently transferred to the
dielectric response of the solvent causing dielectric friction effects in reactions with significant and fast charge
rearrangement [64, 65 and 66].

In the Smoluchowski limit the reaction is by definition the slow coordinate, such that
γ̂(λ_r) ≈ γ̂(0) = ∫_0^∞ γ(t) dt, γ̂(0)/M ≫ λ_r and k_GH ≈ k_SM = k_∞ ω_B M/γ̂(0). Though the time-dependent friction
in principle is accessible via molecular dynamics simulations, for practical purposes in chemical kinetics in
most cases analytical friction models have to be used including a short-time Gaussian 'inertial' component
and a hydrodynamic tail at longer times. In the Grote-Hynes description the latter term only comes into play
when the barrier top is sufficiently flat. As has been pointed out, the reactive mode frequency λ_r can be
interpreted as an effective barrier curvature such that coupling of the reaction coordinate to the solvent
changes position and shape of the barrier in phase space.
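
Because the reactive frequency is defined only implicitly, it has to be found numerically. The sketch below assumes the Grote-Hynes relation in the form λ_r = ω_B²/(λ_r + γ̂(λ_r)/M) and, purely for illustration, an exponential memory kernel γ(t) = (γ_0/τ) exp(−t/τ), whose Laplace transform is γ_0/(1 + λτ); the text itself mentions Gaussian kernels, so the kernel, the function names and the reduced units are all assumptions made here.

```python
def laplace_exponential_kernel(lam, gamma0, tau):
    """Laplace transform of the assumed kernel gamma(t) = (gamma0 / tau) * exp(-t / tau)."""
    return gamma0 / (1.0 + lam * tau)


def reactive_frequency(omega_b, mass, gamma0, tau, tol=1e-12):
    """Solve lam * (lam + gamma_hat(lam) / M) = omega_b**2 by bisection on [0, omega_b]."""
    def residual(lam):
        gamma_hat = laplace_exponential_kernel(lam, gamma0, tau)
        return lam * (lam + gamma_hat / mass) - omega_b ** 2

    lo, hi = 0.0, omega_b  # residual(lo) < 0 and residual(hi) >= 0 bracket the root
    while hi - lo > tol * omega_b:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def grote_hynes_factor(omega_b, mass, gamma0, tau):
    """Transmission factor lambda_r / omega_B multiplying the high-pressure limit."""
    return reactive_frequency(omega_b, mass, gamma0, tau) / omega_b


if __name__ == "__main__":
    for tau in (0.0, 1.0, 10.0):
        print(f"tau = {tau:5.1f}: lambda_r/omega_B = {grote_hynes_factor(1.0, 1.0, 1.0, tau):.6f}")
```

With τ = 0 the kernel has no memory and the factor coincides with the Kramers value; a longer memory time reduces the friction felt at the reactive frequency and moves the factor closer to unity.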

Because of the general difficulty encountered in generating reliable potential energy surfaces and estimating
reasonable friction kernels, it still remains an open question whether by analysis of experimental rate 
constants one can decide whether non-Markovian bath effects or other influences cause a particular solvent or 
pressure dependence of reaction rate coefficients in condensed phase. From that point of view, a purely 


empirical friction model might be a viable alternative, in which the frequency-dependent friction is replaced 

by a state-dependent friction Yy. w i ~ * *^ = Ctf fl/v^/ , J + ")that is described in terms of properties of PES and 
solute-solvent interaction, depicting the reaction as occurring in a frozen environment of fixed microscopic 
viscosity [ 67 , 68 ]. 

(E) MICROSCOPIC FRICTION 

The relation between the microscopic friction acting on a molecule during its motion in a solvent environment 
and macroscopic bulk solvent viscosity is a key problem affecting the rates of many reactions in condensed 
phase. The sequence of steps leading from friction to diffusion coefficient to viscosity is based on the general 
validity of the Stokes-Einstein relation and the concept of describing friction by hydrodynamic as opposed to 
microscopic models involving local solvent structure. In the hydrodynamic limit the effect of solvent friction 
on, for example, rotational relaxation times of a solute molecule is [ 69 ] 

$$\tau_{rot} = \frac{1}{6D_{rot}} = \frac{V_h\,\eta}{kT}\,f_{bc}\,C + \tau_0 \tag{A3.6.35}$$

where V_h is the hydrodynamic volume of the solute in the particular solvent, whereas f_bc and C are parameters
describing hydrodynamic boundary conditions and correcting for aspherical shape, respectively. τ_0 in turn
may be related to the relaxation time of the free rotor. Though in many cases this equation correctly
reproduces the viscosity dependence of τ_rot, in particular when solute and solvent molecules are comparable
in size there are quite a number of significant deviations. One may incorporate this size effect by explicitly
considering the first solvation shell on the solute surface which, under the assumption of slip boundary
conditions, gives for the correction factor C in equation (A3.6.35):

C ^ = 7 w ■ pi i *Ai iih (A3.6.36) 

with isothermal compressibility κ_T, ratio of radii of solvent to solute σ_r and a temperature-dependent
parameter B. If one compares equation (A3.6.36) with the empirical friction model mentioned above, one
realizes that both contain a factor of the form C = 1/(1 + aη), suggesting that these models might be physically
related.

Another, purely experimental possibility to obtain a better estimate of the friction coefficient for rotational
motion γ_rot in chemical reactions consists of measuring rotational relaxation times τ_rot of reactants and
calculating it according to equation (A3.6.35) as γ_rot = 6kTτ_rot.
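
As a small worked example of this conversion, the sketch below assumes γ_rot = 6kTτ_rot, which follows from τ_rot = 1/(6D_rot) together with D_rot = kT/γ_rot. The relaxation time of 10 ps and the temperature of 300 K are arbitrary illustrative inputs, not measured values, and the function names are hypothetical.

```python
BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def rotational_diffusion(tau_rot):
    """Rotational diffusion coefficient D_rot = 1 / (6 tau_rot), in s^-1."""
    return 1.0 / (6.0 * tau_rot)


def rotational_friction(tau_rot, temperature):
    """Rotational friction coefficient gamma_rot = 6 k T tau_rot, in J s."""
    return 6.0 * BOLTZMANN * temperature * tau_rot


if __name__ == "__main__":
    tau = 10e-12   # illustrative rotational relaxation time, 10 ps
    temp = 300.0   # illustrative temperature in K
    print(f"D_rot     = {rotational_diffusion(tau):.4g} s^-1")
    print(f"gamma_rot = {rotational_friction(tau, temp):.4g} J s")
```

The product of the two returned quantities equals kT, which is a quick consistency check on the units.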


A3.6.4 SELECTED REACTIONS 

A3.6.4.1 PHOTOISOMERIZATION 

According to Kramers' model, for flat barrier tops associated with predominantly small barriers, the transition
from the low- to the high-damping regime is expected to occur in low-density fluids. This expectation is borne
out by an extensively studied model reaction, the photoisomerization of trans-stilbene and similar compounds
[70, 71] involving a small energy barrier in the first excited singlet state whose decay after photoexcitation is
directly related to the rate coefficient of trans-cis photoisomerization and can be conveniently measured by
ultrafast laser spectroscopic techniques.

(A) PRESSURE DEPENDENCE OF PHOTOISOMERIZATION RATE CONSTANTS 

The results of pressure-dependent measurements for trans-stilbene in supercritical n-pentane [46] (figure
A3.6.5) and the prediction from the model described by equation (A3.6.29), using experimentally determined
microcanonical rate coefficients in jet-cooled trans-stilbene to calculate k_∞, show two marked discrepancies
between model calculation and measurement: (1) experimental values of k are an order of magnitude higher
already at low pressure and (2) the decrease of k due to friction is much less pronounced than predicted. As
interpretations for the first observation, several ideas have been put forward that will not be further discussed
here, such as a decrease of the effective potential barrier height due to electrostatic solute-solvent interactions
enhanced by cluster formation at relatively low pressures [72, 73], or incomplete intramolecular vibrational
energy redistribution in the isolated molecule [74, 75, 76, 77, 78, 79 and 80], or Franck-Condon cooling in
the excitation process [79, 80]. The second effect, the weak viscosity dependence, which was first observed in
solvent series experiments in liquid solution [81, 82 and 83], has also led to controversial interpretations: (i)
the macroscopic solvent viscosity is an inadequate measure for microscopic friction acting along the reaction
path [84, 85], (ii) the multidimensional character of the barrier crossing process leads to a fractional power
dependence of k on 1/η [54, 81, 86, 87], (iii) as the reaction is very fast, one has to take into account the finite
response time of the solvent, i.e. consider frequency-dependent friction [81, 87] and (iv) the effective barrier
height decreases further with increasing electronic polarizability and polarity of the solvent, and the observed
phenomenon is a manifestation of the superposition of a static solvent effect and hydrodynamic solvent
friction correctly described by η [88]. One may test these hypotheses by studying molecular rotational motion
and reaction independently in compressed sample solutions. A few examples will serve here to illustrate the
main conclusions one can draw from the experimental results.



[Figure A3.6.5: horizontal axis (T/D) / 10^10 K s m^-2.]

Figure A3.6.5. Photoisomerization rate constant of trans-stilbene in n-pentane versus inverse of the
self-diffusion coefficient. Points represent experimental data, the dashed curve is a model calculation based on an
RRKM fit to microcanonical rate constants of isolated trans-stilbene and the solid curve a fit that uses a
reaction barrier height reduced by solute-solvent interaction [46].

(B) MICROSCOPIC AND FREQUENCY-DEPENDENT FRICTION 

Rotational relaxation times τ_rot of trans-stilbene and E,E-diphenylbutadiene (DPB) in liquid solvents like
subcritical ethane and n-octane show a perfectly linear viscosity dependence with a slope that depends on the
solvent [89] (figure A3.6.6), showing that microscopic friction acting during molecular rotational diffusion is
proportional to the macroscopic solvent viscosity and that the relevant solute-solvent coupling changes with
solvent. It seems reasonable to assume, therefore, that a corresponding relation also holds for microscopic
friction governing diffusive motion along the reaction path.


[Figure A3.6.6: horizontal axis η / mPa s.]

Figure A3.6.6. Viscosity dependence of rotational relaxation times of trans-stilbene in ethane (open circles)
and n-octane (full circles).


The validity of this assumption is apparent in the viscosity dependence of rate coefficients for S_1
photoisomerization reactions in a number of related molecules such as cis-stilbene [90] (see figure A3.6.7),
tetraphenylethylene (TPE) [91], DPB [92] and 'stiff' trans-stilbene [93] (where the phenyl ring is fixed by a
five-membered ring to the ethylenic carbon atom). In all these cases a study of the pressure dependence
reveals a linear correlation between k and 1/η in n-alkane and n-alkanol solvents, again with a
solvent-dependent slope. The time scale for motion along the reaction path extends from several hundred picoseconds
in DPB to a couple of hundred femtoseconds in cis-stilbene. There is no evidence for a frequency dependence
of the friction coefficient in these reactions. As the time scale for the similar reaction in trans-stilbene is
between 30 and 300 ps, one may conclude that also in this case the dynamics is mainly controlled by the
zero-frequency friction which, in turn, is adequately represented by the macroscopic solvent viscosity. Therefore,
the discrepancy between experiment and model calculation observed for trans-stilbene in compressed-liquid
n-alkanes does not indicate a breakdown of the simple friction model in the Kramers-Smoluchowski theory.
This result is in contrast to the analysis of a solvent series study in linear alkanes, in which a solvent size effect
of the microviscosity was made responsible for the weak viscosity dependence [94]. Surprisingly, in a different
type of non-polar solvent like methylcyclohexane, an equally weak viscosity dependence was found when the
pressure was varied [95]. So the details of the viscosity influence are still posing puzzling questions.
Figure A3.6.7. Viscosity dependence of reduced S_1-decay rate constants of cis-stilbene in various solvents
[90]. The rate constants are divided by the slope of a linear regression to the measured rate constants in the
respective solvent.

(C) EFFECTIVE BARRIER HEIGHT 

Measuring the pressure dependence of k at different temperatures shows that the apparent activation energy at
constant viscosity decreases with increasing viscosity [46, 89] (figure A3.6.8). From a detailed analysis one
can extract an effective barrier height E_0 along the reaction path that decreases linearly with increasing
density of the solvent. The magnitude of this barrier shift effect is more than a factor of two in nonpolar
solvents like n-hexane or n-pentane [46]. It is interesting to note that in compressed-liquid n-propanol one
almost reaches the regime of barrierless dynamics [96]. This is also evident in the room-temperature k(η)
isotherm measured in n-butanol (figure A3.6.9) which turns into a linear k versus 1/η dependence at higher
pressures, indicating that there is no further decrease of the effective barrier height. Thus the unexpected
dependence of the reaction rate on solvent viscosity is connected with specific properties of the PES of
trans-stilbene in its first excited singlet state, because corresponding measurements for, for example, DPB or TPE in
n-alkanes and n-alkanols do not show any evidence for deviations from standard Kramers-Smoluchowski
behaviour.
Figure A3.6.8. Isotherms of k(η) for trans-stilbene photoisomerization in n-hexane at temperatures between
300 K (bottom) and 480 K (top). The curvature of the isotherms is interpreted as a temperature-dependent
barrier shape [89].
Figure A3.6.9. Viscosity dependence of photoisomerization rate constants of trans-stilbene (open circles) and
E,E-diphenylbutadiene (full circles) in n-butanol. The broken line indicates an η^-1 dependence of k [96].

As a multidimensional PES for the reaction from quantum chemical calculations is not available at present,
one does not know the reason for the surprising barrier effect in excited trans-stilbene. One could suspect that
trans-stilbene already possesses a significant amount of zwitterionic character in the conformation at the
barrier top, implying a fairly 'late' barrier along the reaction path towards the twisted perpendicular structure.
On the other hand, it could also be possible that the effective barrier changes with viscosity as a result of a
multidimensional barrier crossing process along a curved reaction path.

(D) SOLVATION DYNAMICS 

The dependence of k on viscosity becomes even more puzzling when the time scale of motion along the
reaction coordinate becomes comparable to that of solvent dipole reorientation around the changing charge
distribution within the reacting molecule: in addition to mechanical friction, one also has to consider dielectric
friction. For trans-stilbene in ethanol, the k(η) curve exhibits a turning point which is caused by a crossover of
competing solvation and reaction time scales [97] (figure A3.6.10): as the viscosity increases the dielectric relaxation
time of the solvent increases more rapidly than the typical time necessary for barrier crossing. Gradually, the
solvation dynamics starts to freeze out on the time scale of reactive motion, the polar barrier is no longer
decreased by solvent dipole reorientation and the rate coefficient drops more rapidly with increasing viscosity.
As soon as the solvent dipoles are completely 'frozen', one has the same situation as in a non-polar solvent:
i.e. only the electronic polarizability of the solvent causes further decrease of the barrier height.
Figure A3.6.10. Viscosity dependence of photoisomerization rate constants of trans-stilbene (open circles)
and E,E-diphenylbutadiene (full circles) in ethanol. The dashed line indicates an η^-1 dependence of k, the
dotted line indicates the viscosity dependence of the dielectric relaxation time of ethanol and the solid curve is
the result of a kinetic model describing the parallel processes of reaction and solvent relaxation [97].

A3.6.4.2 CHAIR-BOAT INVERSION OF CYCLOHEXANE 

As mentioned above, in liquid solution most reactions are expected to have passed beyond the low-damping
regime where the dynamics is dominated by activating and deactivating collisions between reactant and solvent
molecules. In general, this expectation is met, as long as there is a sufficiently strong intramolecular coupling
of the reaction coordinate to a large number of the remaining modes of the reactant at the transition state
which leads to fast IVR within the reactant. In this case, the high-pressure limit of unimolecular rate theory is
reached, and additional coupling to the liquid solvent environment leads to a decrease of the rate coefficient
through the factor F_Kr. From this point of view, the observation of rate coefficient maxima in liquid solution
would appear to signal a breakdown of RRKM theory. In particular it has been argued that, for the case of
weak intramolecular coupling, a strong coupling of the reaction coordinate to the solvent could effectively
decrease the volume of phase space accessible to the reactant in the liquid with respect to the gas phase
[98, 99]. As the relative strength of intra- and intermolecular coupling
may change with solvent properties, the breakdown of the RRKM model might be accompanied by the
appearance of a rate coefficient maximum in liquid solution as a function of solvent friction.


Among the few reactions for which an increase of a reaction rate coefficient in liquid solution with increasing
reactant-solvent coupling strength has been observed, the most notable is the thermal chair-boat isomerization
reaction of cyclohexane (figure A3.6.11) and 1,1-difluorocyclohexane [100, 101, 102 and 103]. The observed
pressure dependence of the rate coefficients along different isotherms was analysed in terms of
one-dimensional transition state theory by introducing a transmission coefficient κ describing the effect of solvent
friction, k_obs = κ k_TST. In the intermediate- to high-damping regime, κ can be identified with the Kramers term
F_Kr. The observed pressure-dependent activation volumes ΔV‡_obs were considered to represent the sum of a
pressure-independent intrinsic activation volume ΔV‡_TST and a pressure-dependent formal collisional
activation volume ΔV‡_coll arising from the increase of that reactant-solvent coupling with pressure which
corresponds to viscous effects


RT {— Y—* v « 


TfiT 


The intrinsic volume of activation was estimated to correspond to the molar volume difference between 
cyclohexene and cyclohexane, adding the molar volume difference between ethane and ethene to account for 
the two missing protons and shortened double bond in cyclohexene. This yields a value of $\Delta V^{\ddagger}_{\mathrm{TST}}$ = −1.5 cm³ 
mol⁻¹. Then, knowing the pressure dependence of the solvent viscosity, the viscosity dependence of the 
relative transmission coefficient $\kappa$ was estimated from 

The experimental values of $\kappa(\eta)$ have a maximum at a viscosity close to 3 cP and vary by about 15% over 
the entire viscosity range studied. As discussed above, this unexpected dependence of $\kappa$ on solvent friction in 
liquid CS₂ is thought to be caused by a relatively weak intramolecular coupling of the reaction coordinate to 
the remaining modes in cyclohexane. At viscosities below the maximum, motion along the reaction 
coordinate is fast owing to the reduction of the accessible phase space region. The barrier passage is still in the 
inertial regime, and the strong coupling to the solvent leads to increasingly rapid stabilization in the product 
well. With increasing solvent friction, the barrier crossing enters the diffusive regime and begins to show a 
slowdown with further increasing solvent viscosity. 

Figure A3.6.11. Viscosity dependence of transmission coefficient of the rate of cyclohexane chair-boat 
inversion in liquid solution (data from [100]). 
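
The step from measured rate coefficients to a relative transmission coefficient can be illustrated with a minimal sketch. It assumes the conventional definition of an activation volume, ΔV‡ = −RT (∂ ln k/∂p)_T, together with k_obs = κ k_TST and a pressure-independent intrinsic activation volume, which gives ln[κ(p)/κ(p_ref)] = ln[k_obs(p)/k_obs(p_ref)] + ΔV‡_TST (p − p_ref)/RT. The function name, the reference pressure and the numbers in the usage lines are hypothetical and are not data from [100].

```python
import math

R_GAS = 8.314462618   # gas constant, J mol^-1 K^-1
CM3_BAR_TO_J = 0.1    # 1 cm^3 x 1 bar = 0.1 J


def relative_transmission(k_obs, k_obs_ref, p_bar, p_ref_bar, dv_tst_cm3, temp_k):
    """Return kappa(p)/kappa(p_ref), assuming a pressure-independent dv_tst_cm3."""
    ln_rate_ratio = math.log(k_obs / k_obs_ref)
    # Pressure dependence of k_TST implied by the intrinsic activation volume.
    ln_tst_ratio = -dv_tst_cm3 * (p_bar - p_ref_bar) * CM3_BAR_TO_J / (R_GAS * temp_k)
    return math.exp(ln_rate_ratio - ln_tst_ratio)


# Hypothetical input: the observed rate rises by 10% between 1 and 2000 bar at 225 K.
ratio = relative_transmission(1.10, 1.00, 2000.0, 1.0, -1.5, 225.0)
print(f"kappa(p)/kappa(p_ref) = {ratio:.3f}")
```

With a negative intrinsic activation volume the transition state theory rate itself grows with pressure, so an observed increase smaller than that growth corresponds to a falling transmission coefficient.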

This interpretation of the experimentally determined pressure dependence of the isomerization rate rests on 
the assumptions that (i) the barrier height for the reaction is independent of pressure and (ii) the estimate of 
the intrinsic volume of activation is reliable to within a factor of two and $\Delta V^{\ddagger}_{\mathrm{TST}}$ does not change with 
pressure. As pointed out previously, due to the differences in the pressure dependences of solvent viscosity 
and density, a change of the barrier height with solvent density can also give rise to an apparent maximum of 
the rate coefficient as a function of viscosity. In particular, a decrease of $E_0$ with pressure by about 1 kJ mol⁻¹ 
could explain the observed non-monotonic viscosity dependence. Therefore, the constancy to within 0.05 kJ 
mol⁻¹ of the isoviscous activation energy over a limited viscosity range from 1.34 to 2.0 cP lends some 
support to the first assumption. 

From stochastic molecular dynamics calculations on the same system, in the viscosity regime covered by the 
experiment, it appears that intra- and intermolecular energy flow occur on comparable time scales, which 
leads to the conclusion that cyclohexane isomerization in liquid CS₂ is an activated process [99]. Classical 
molecular dynamics calculations [104] also reproduce the observed non-monotonic viscosity dependence of $\kappa$. 
Furthermore, they also yield a solvent contribution to the free energy of activation for the isomerization 
reaction which in liquid CS₂ increases by about 0.4 kJ mol⁻¹ when the solvent density is increased from 1.3 
to 1.5 g cm⁻³. Thus the molecular dynamics calculations support the conclusion that the high-pressure limit of 
this unimolecular reaction is not attained in liquid solution at ambient pressure. It has to be remembered, 
though, that the analysis of the measured isomerization rates depends critically on the estimated value of 
$\Delta V^{\ddagger}_{\mathrm{TST}}$. What is still needed is a reliable calculation of this quantity in CS₂. 

A3.6.4.3 PHOTOLYTIC CAGE EFFECT AND GEMINATE RECOMBINATION 

For very fast reactions, the competition between geminate recombination of a pair of initially formed reactants 
and its escape from the common solvent cage is an important phenomenon in condensed-phase kinetics that 
has received considerable attention both theoretically and experimentally. An extremely well studied example 
is the photodissociation of iodine, for which the quantum yield $\Phi_d$ decreases from unity in the dilute-gas 
phase by up to a factor of ten or more in compressed-liquid solution. An intuitively appealing interpretation of 
this so-called photolytic cage effect, predicted by Franck and Rabinowitch in the 1930s [105], is based on models 
describing it as diffusive escape of the pair [106], formed instantaneously at $t_0$ with initial separation $r_0$, from 
the solvent cage under the influence of Stokes friction subject to inner boundary conditions similar to equation 
(A3.6.21) [31], 

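$$\frac{\partial p}{\partial t} = D_{AA}\nabla^2 p + \delta(r - r_0)\,\delta(t - t_0), \qquad k_{\mathrm{mol}}\,p(R,t) = 4\pi R^2 D_{AA}\left.\frac{\partial p}{\partial r}\right|_{r=R}$$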

Solving this diffusion problem yields an analytical expression for the time-dependent escape probability $q(t)$: 

$$q(t) = 1 - \frac{x}{z(1+x)}\left\{\operatorname{erfc}\left(\frac{z-1}{2y\sqrt{t}}\right) - \exp\left[(z-1)(1+x) + y^2 t\,(1+x)^2\right]\operatorname{erfc}\left(\frac{z-1}{2y\sqrt{t}} + y\sqrt{t}\,(1+x)\right)\right\}$$

where $x$ and $y$ are as defined above and $z = r_0/R$. This equation can be compared with time-resolved 
measurements of geminate recombination dynamics in liquid solution [107, 108] if the parameters $r_0$, $R$, $D_{AA}$ 
and $k_{\mathrm{mol}}$ are known or can be reliably estimated. This simple diffusion model, however, does not satisfactorily 
represent the observed dynamics, which is in part due to the participation of different electronic states. Direct 
evidence for this comes from picosecond time-resolved transient absorption experiments in solution that 
demonstrate the involvement of the low-lying, weakly bound iodine A and A' states. In these experiments it 
was possible to separate geminate pair dynamics and vibrational energy relaxation of the initially formed hot 
iodine molecules [40, 41, 109]. The details of the complex steps of recombination dynamics are still only 
partially understood and are the subject of mixed quantum-classical molecular dynamics simulations [110]. 
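
A minimal numerical sketch of the escape probability is given here. It assumes the standard closed form for diffusion with a partially absorbing inner boundary, q(t) = 1 − x/[z(1+x)] {erfc(a) − exp[(z−1)(1+x) + y²t(1+x)²] erfc(a + y√t(1+x))} with a = (z−1)/(2y√t), and it assumes the reduced variables x = k_mol/(4πR D_AA) and y = √D_AA / R, which may differ from the definitions used earlier in the chapter. The function names are hypothetical. The product of the exponential and the second erfc is rewritten through a scaled complementary error function so that it does not overflow at long times.

```python
import math


def erfcx(b):
    """Scaled complementary error function exp(b^2) * erfc(b) for b >= 0."""
    if b < 25.0:
        return math.exp(b * b) * math.erfc(b)
    # Asymptotic expansion, adequate once exp(b^2) would approach overflow.
    inv = 1.0 / (b * b)
    return (1.0 - 0.5 * inv + 0.75 * inv * inv) / (b * math.sqrt(math.pi))


def escape_probability(t, x, y, z):
    """Probability that the geminate pair has not recombined by time t."""
    if t <= 0.0:
        return 1.0
    s = y * math.sqrt(t)
    a = (z - 1.0) / (2.0 * s)
    b = a + s * (1.0 + x)
    # exp[(z-1)(1+x) + s^2 (1+x)^2] * erfc(b) equals exp(-a^2) * erfcx(b).
    bracket = math.erfc(a) - math.exp(-a * a) * erfcx(b)
    return 1.0 - x / (z * (1.0 + x)) * bracket


def escape_limit(x, z):
    """Long-time limit of the escape probability."""
    return 1.0 - x / (z * (1.0 + x))


# Hypothetical reduced parameters: x = 4, y = 1 (time in units of R^2/D), z = 1.2.
for t in (0.01, 1.0, 100.0):
    print(t, round(escape_probability(t, 4.0, 1.0, 1.2), 4))
```

In this form the escape probability starts at unity and decays towards 1 − x/[z(1+x)], so a pair formed further outside the contact distance (larger z) or reacting less efficiently at contact (smaller x) escapes more often.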

In order to probe the importance of van der Waals interactions between reactants and solvent, experiments in 
the gas-liquid transition range appear to be mandatory. Time-resolved studies of the density dependence of 
the cage and cluster dynamics in halogen photodissociation are needed to extend earlier quantum yield studies 
which clearly demonstrated the importance of van der Waals clustering at moderate gas densities [37, 111] 
(see figure A3.6.12). The pressure dependence of the quantum yield established the existence of two different 
regimes for the cage effect: (i) at low solvent densities, excitation of solvent-clustered halogen molecules 
leads to predissociation of the van der Waals bond and thereby to stabilization of the halogen molecule, 
whereas (ii) at high liquid phase densities, the hard-sphere repulsive caging takes over, which leads to a strong 
reduction in the photodissociation quantum yield. 

Figure A3.6.12. Photolytic cage effect of iodine in supercritical ethane. Points represent measured 
photodissociation quantum yields [37] and the solid curve is the result of a numerical simulation [111]. 

Attractive long-range and repulsive short-range forces both play a role in the cage effect, though each type 
dominates in a different density range. Whereas the second component has traditionally been recognized as 
being responsible for caging in liquid solution and solids, theoretical models and molecular dynamics 
calculations [112, 113] have confirmed the idea that complex formation between halogen and solvent 
molecules in supercritical solvents is important in photodissociation dynamics and responsible for the 
lowering of quantum yields at moderate gas densities [114, 115]. 


The traditional diffusion model permits estimation of the magnitude of the cage effect in solution according to 
[37] 

$$\lim_{t \to \infty} q(t) = 1 - \frac{x}{z(1 + x)}$$

which should directly represent overall photodissociation quantum yields measured in dense solvents, as in 
this quantity dynamical effects are averaged out as a consequence of multiple collisions in the cage and 
effective collision-induced hopping between different electronic states at large interatomic distances. The 
initial separation of the iodine atom pair in the solvent cage may be calculated by assuming that immediately 
after excitation, the atoms are spherical particles subject to Stokes friction undergoing a damped motion on a 
repulsive potential represented by a parabolic branch. This leads to an excitation energy dependence of the 
initial separation [37] 

z - 1 = — exp . with c = = for c < ] 

2c \ ^fT=c?J Jmrtkv - Do) 

where $\sigma_{\mathrm{I}}$ and $m_{\mathrm{I}}$ are radius and mass of the iodine atom, respectively, $h\nu$ is the photon energy and $D_0$ the 
dissociation energy of iodine molecules. Obviously, $c > 1$ corresponds to the overdamped case for which $r_0 = R$ 
irrespective of initial energy. As in experiments a fairly weak dependence of $\Phi_d$ on excitation wavelength was 
found, it seems that, at least at liquid phase densities, separation of the iodine pair is overdamped, a finding 
corroborated by recent classical molecular dynamics simulations using simple model potentials [38]. 

The simple diffusion model of the cage effect again can be improved by taking effects of the local solvent 
structure, i.e. hydrodynamic repulsion, into account in the same way as discussed above for bimolecular 
reactions. The consequence is that the potential of mean force tends to favour escape at larger distances 
($r_0 > 1.5R$) more than it enhances caging at small distances, leading to larger overall photodissociation quantum 
yields [116, 117]. 

The analysis of recent measurements of the density dependence of $\Phi_d$ has shown, however, that considering 
only the variation of solvent structure in the vicinity of the atom pair as a function of density is entirely 
sufficient to understand the observed changes in $\Phi_d$ with pressure and also with size of the solvent molecules 
[38]. Assuming that iodine atoms colliding with a solvent molecule of the first solvation shell under an angle 
$\alpha$ less than $\alpha_{\max}$ (the value of $\alpha_{\max}$ is solvent dependent and has to be found by simulations) are reflected 
back onto each other in the solvent cage, $\Phi_d$ is given by 

o 
where the solvation shell radius $r_{\mathrm{shell}}$ is obtained from Lennard-Jones radii (figure A3.6.13). 

Figure A3.6.13. Density dependence of the photolytic cage effect of iodine in compressed liquid n-pentane 
(circles), n-hexane (triangles), and n-heptane (squares) [38]. The solid curves represent calculations using the 
diffusion model [37], the dotted and dashed curves are from 'static' caging models using Carnahan-Starling 
packing fractions and calculated radial distribution functions, respectively [38]. 

As these examples have demonstrated, in particular for fast reactions, chemical kinetics can only be 
appropriately described if one takes into account dynamic effects, though in practice it may prove extremely 
difficult to separate and identify different phenomena. It seems that more experiments under systematically 
controlled variation of solvent environment parameters are needed, in conjunction with numerical simulations 
that as closely as possible mimic the experimental conditions to improve our understanding of condensed- 
phase reaction kinetics. The theoretical tools that are available to do so are covered in more depth in other 
chapters of this encyclopedia and also in comprehensive reviews [6, 118, 119]. 

A3.7 Molecular reaction dynamics in the gas phase 


Daniel M Neumark 


A3.7.1 INTRODUCTION 

The field of gas phase reaction dynamics is primarily concerned with understanding how the microscopic 
forces between atoms and molecules govern chemical reactivity. This goal is targeted by performing exacting 
experiments which yield measurements of detailed attributes of chemical reactions, and by developing state- 
of-the-art theoretical techniques in order to calculate accurate potential energy surfaces for reactions and 
determine the molecular dynamics that occur on these surfaces. It has recently become possible to compare 
experimental results with theoretical predictions on a series of benchmark reactions. This convergence of 
experiment and theory is leading to significant breakthroughs in our understanding of how the peaks and 
valleys on a potential energy surface can profoundly affect the measurable properties of a chemical reaction. 

In most of gas phase reaction dynamics, the fundamental reactions of interest are bimolecular reactions, 


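$$\mathrm{A} + \mathrm{BC} \rightarrow \mathrm{AB} + \mathrm{C} \qquad \text{(A3.7.1)}$$

and unimolecular photodissociation reactions, 

$$\mathrm{ABC} \xrightarrow{h\nu} \mathrm{AB} + \mathrm{C} \qquad \text{(A3.7.2)}$$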


There are significant differences between these two types of reactions as far as how they are treated 
experimentally and theoretically. Photodissociation typically involves excitation to an excited electronic state, 
whereas bimolecular reactions often occur on the ground-state potential energy surface for a reaction. In 
addition, the initial conditions are very different. In bimolecular collisions one has no control over the reactant 
orbital angular momentum (impact parameter), whereas in photodissociation one can start with cold 
molecules with total angular momentum $J \approx 0$. Nonetheless, many theoretical constructs and experimental 
methods can be applied to both types of reactions, and from the point of view of this chapter their similarities 
are more important than their differences. 

The field of gas phase reaction dynamics has been extensively reviewed elsewhere [1, 2 and 3] in 
considerably greater detail than is appropriate for this chapter. Here, we begin by summarizing the key 
theoretical concepts and experimental techniques used in reaction dynamics, followed by a 'case study', the 
reaction F + H₂ → HF + H, which serves as an illustrative example of these ideas. 


A3.7.2 THEORETICAL BACKGROUND: THE POTENTIAL ENERGY SURFACE 

Experimental and theoretical studies of chemical reactions are aimed at obtaining a detailed picture of the 
potential energy surface on which these reactions occur. The potential energy surface represents the single most 
important theoretical construct in reaction dynamics. For $N$ particles, this is a $(3N - 6)$-dimensional function 
$V(q_1, \ldots, q_{3N-6})$ that gives the potential energy as a function of nuclear internal coordinates. The potential energy 
surface for any reaction can, in principle, be found by solving the electronic Schrödinger equation at many 
different nuclear configurations and then fitting the results to various functional forms, in order to obtain a 
smoothly varying surface in multiple dimensions. In practice, this is extremely demanding from a 
computational perspective. Thus, much of theoretical reaction dynamics as recently as a few years ago was 
performed on highly approximate model surfaces for chemical reactions which were generated using simple 
empirical functions (the London-Eyring-Polanyi-Sato potential, for example [4]). The H + H₂ reaction was 
the first for which an accurate surface fitted to ab initio points was generated [5, 6]. However, recent 
conceptual and computational advances have made it possible to construct accurate surfaces for a small 
number of benchmark systems, including the F + H₂, Cl + H₂ and OH + H₂ reactions [7, 8 and 9]. Even in 
these systems, one must be concerned with the possibility that a single Born-Oppenheimer potential energy 
surface is insufficient to describe the full dynamics [10]. 
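
As an illustration of the kind of simple empirical function mentioned above, the following sketch evaluates a London-Eyring-Polanyi-Sato (LEPS) type potential for a collinear A + BC arrangement. It assumes one common parametrization, in which each diatomic pair contributes Morse-like Coulomb and exchange integrals scaled by a Sato parameter. The well depths, range parameter, equilibrium distance and Sato parameters are hypothetical illustration values, not a fit to any real reaction, and the function names are likewise assumptions.

```python
import math

# Hypothetical parameters: (well depth / eV, Sato parameter) for AB, BC and AC.
PAIRS = {"AB": (4.746, 0.05), "BC": (4.746, 0.30), "AC": (3.445, 0.05)}
ALPHA = 1.942   # Morse range parameter / Angstrom^-1 (assumed equal for all pairs)
R_EQ = 0.742    # equilibrium bond length / Angstrom (assumed equal for all pairs)


def _scaled_integrals(r, depth, sato):
    """Coulomb (Q) and exchange (J) integrals of one pair, divided by (1 + sato)."""
    e = math.exp(-ALPHA * (r - R_EQ))
    q = 0.5 * depth * (1.5 * e * e - e)
    j = 0.25 * depth * (e * e - 6.0 * e)
    return q / (1.0 + sato), j / (1.0 + sato)


def leps_collinear(r_ab, r_bc):
    """Potential energy (eV) for collinear A-B-C, so that r_AC = r_AB + r_BC."""
    q_ab, j_ab = _scaled_integrals(r_ab, *PAIRS["AB"])
    q_bc, j_bc = _scaled_integrals(r_bc, *PAIRS["BC"])
    q_ac, j_ac = _scaled_integrals(r_ab + r_bc, *PAIRS["AC"])
    radicand = (j_ab**2 + j_bc**2 + j_ac**2
                - j_ab * j_bc - j_bc * j_ac - j_ab * j_ac)
    # The radicand is non-negative analytically; clip tiny rounding errors.
    return q_ab + q_bc + q_ac - math.sqrt(max(0.0, radicand))


def contour_grid(r_min=0.5, r_max=3.0, n=26):
    """Tabulate V(r_AB, r_BC) on a square grid, e.g. as input for a contour plot."""
    step = (r_max - r_min) / (n - 1)
    axis = [r_min + i * step for i in range(n)]
    return axis, [[leps_collinear(ra, rb) for rb in axis] for ra in axis]
```

When one atom is far away, this form reduces to a Morse curve for the remaining diatomic divided by (1 + Sato parameter), which gives a quick check of the reactant and product valleys before any dynamics is run on the surface.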

Let us consider the general properties of a potential energy surface for a bimolecular reaction involving three 
atoms, i.e. equation (A3.7.1) with A, B and C all atomic species. A three-atom reaction requires a 
three-dimensional function. It is more convenient to plot two-dimensional surfaces in which all coordinates but two 
are held fixed. Figure A3.7.1 shows a typical example of a potential energy surface contour plot for a 
collinear three-atom reaction. The dotted curve represents the minimum energy path, or reaction coordinate, 
that leads from reactants on the lower right to products on the upper left. The reactant and product valleys 
(often referred to as the entrance and exit valleys, respectively) are connected by the transition-state region, 
where the transformation from reactants to products occurs. 
The potential energy surface shown in figure A3.7.1 is characteristic of a 'direct' reaction, in that there is a 
single barrier (marked by ‡ in figure A3.7.1) along the minimum energy path in the transition-state region. In 
the other general class of bimolecular reaction, a 'complex' reaction, one finds a well rather than a barrier in 
the transition-state region. 



Figure A3.7.1. Two-dimensional contour plot for direct collinear reaction A + BC → AB + C. Transition 
state is indicated by ‡. 


The barrier on the surface in figure A3.7.1 is actually a saddle point; the potential is a maximum along the 
reaction coordinate but a minimum along the direction perpendicular to the reaction coordinate. The classical 
transition state is defined by a slice through the top of the barrier perpendicular to the reaction coordinate. 
This definition holds for multiple dimensions as well; for $N$ particles, the classical transition state is a saddle 
point that is unbound along the reaction coordinate but bound along the $3N - 7$ remaining coordinates. A cut 
through the surface at the transition state perpendicular to the reaction coordinate represents a $(3N - 7)$-
dimensional dividing surface that acts as a 'bottleneck' between reactants and products. The nature of the 
transition state and, more generally, the region of the potential energy in the vicinity of the transition state 
(referred to above as the transition-state region) therefore plays a major role in determining many of the 
experimental observables of a reaction such as the rate constant and the product energy and angular 
distributions. For this reason, the transition-state region is the most important part of the potential energy 
surface from a computational (and experimental) perspective. 
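
The saddle-point character of a classical transition state can be checked numerically by diagonalizing the Hessian at the stationary point: one negative eigenvalue belongs to the reaction coordinate and the positive eigenvalues to the remaining bound coordinates. The sketch below does this for a hypothetical two-dimensional model surface, a toy function chosen for illustration that is not the surface of figure A3.7.1; all function names are assumptions.

```python
import math


def model_surface(x, y):
    """Toy potential with a stationary point at the origin (hypothetical form)."""
    return (x * x - 1.0) ** 2 + 2.0 * y * y + x * y


def hessian_2d(f, x, y, h=1e-3):
    """Central finite-difference second derivatives (f_xx, f_xy, f_yy) at (x, y)."""
    f0 = f(x, y)
    f_xx = (f(x + h, y) - 2.0 * f0 + f(x - h, y)) / (h * h)
    f_yy = (f(x, y + h) - 2.0 * f0 + f(x, y - h)) / (h * h)
    f_xy = (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)
    return f_xx, f_xy, f_yy


def hessian_eigenvalues(f_xx, f_xy, f_yy):
    """Eigenvalues of the symmetric 2x2 matrix [[f_xx, f_xy], [f_xy, f_yy]]."""
    mean = 0.5 * (f_xx + f_yy)
    radius = math.hypot(0.5 * (f_xx - f_yy), f_xy)
    return mean - radius, mean + radius


def is_first_order_saddle(f, x, y):
    """True if the Hessian at (x, y) has one negative and one positive eigenvalue."""
    low, high = hessian_eigenvalues(*hessian_2d(f, x, y))
    return low < 0.0 < high


print(hessian_eigenvalues(*hessian_2d(model_surface, 0.0, 0.0)))
```

For three atoms the same test would involve a 3 x 3 Hessian in the internal coordinates, with one unbound and 3N − 7 = 2 bound directions at the transition state.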

Once such an ab initio potential energy surface for a reaction is known, then all properties of the reaction can, 
in principle, be determined by carrying out multidimensional quantum scattering calculations. This is again 
computationally very demanding, and for many years it was more useful to perform classical and quasi- 
classical trajectory calculations to explore dynamics on potential energy surfaces [11]. The simpler 
calculations led to very valuable generalizations about reaction dynamics, showing, for example, that for an 
exothermic reaction with an entrance channel barrier, reactant translation was far more effective than 
vibration in surmounting the barrier and thus forming products. Such calculations are still very useful, since 
quantum effects in chemical reactions are often relatively small. However, recent conceptual and computational 
advances [12, 13 and 14] have now made it possible to carry out exact quantum scattering calculations on 
multidimensional potential energy surfaces, including the benchmark surfaces mentioned above. Comparison of such 
calculations with experimental observables provides a rigorous test of the potential energy surface. 


A3.7.3 EXPERIMENTAL TECHNIQUES IN REACTION DYNAMICS 


We now shift our focus to a general discussion of experimental chemical reaction dynamics. Given that the 
goal of these experiments is to gain an understanding of the reaction potential energy surface, it is important 
to perform experiments that can be interpreted as cleanly as possible in terms of the underlying surface. 
Hence, bimolecular and unimolecular reactions are often studied under 'single-collision' conditions, meaning 
that the number density in the experiment is sufficiently low that each reactant atom or molecule undergoes at 
most one collision with another reactant or a photon during the course of the experiment, and the products are 
detected before they experience any collisions with bath gases, walls, etc. One can therefore examine the 
results of single-scattering events without concern for the secondary collisions and reactions that often 
complicate the interpretation of more standard chemical kinetics experiments. Moreover, the widespread use 
of supersonic beams in reaction dynamics experiments [15, 16] allows one to perform reactions under well 
defined initial conditions; typically the reactants are rotationally and vibrationally very cold, and the spread in 
collision energies (for bimolecular reactions) is narrow. The study of photodissociation reactions [2, 17] has 
been greatly facilitated by recent developments in laser technology, which now permit one to investigate 
photodissociation at virtually any wavelength over a spectral range extending from the infrared to vacuum 
ultraviolet (VUV). 

What attributes of bimolecular and unimolecular reactions are of interest? Most important is the identity of the 
products, without which any further characterization is impossible. Once this is established, more detailed 
issues can be addressed. For example, in any exothermic reaction, one would like to determine how the excess 
energy is 
partitioned among the translational, rotational, vibrational and electronic degrees of freedom of the products. 
Under the ideal of 'single-collision' conditions, one can measure the 'nascent' internal energy distribution of 
the products, i.e. the distribution resulting from the reaction before any relaxation (collisional or otherwise) 
has occurred. Measurements of the product angular distribution provide considerable insight into the topology 
and symmetry of the potential energy surface(s) on which the reaction occurs. More recently, the 
measurement of product alignment and orientation has become an area of intense interest; in 
photodissociation reactions, for example, one can determine if the rotational angular momentum of a 
molecular fragment is randomly oriented or if it tends to be parallel or perpendicular to the product velocity 
vector. 

An incredible variety of experimental techniques have been developed over the years to address these issues. 
One of the most general is the crossed molecular beams method with mass spectrometric detection of the 
products, an experiment developed by Lee, Herschbach and co-workers [18, 19]. A schematic illustration of 
one version of the experiment is shown in figure A3.7.2. Two collimated beams of reactants cross in a 
vacuum chamber under single-collision conditions. The scattered products are detected by a rotatable mass 
spectrometer, in which the products are ionized by electron impact and mass selected by a quadrupole mass 
spectrometer. By measuring mass spectra as a function of scattering angle, one obtains angular distributions 
for all reaction products. In addition, by chopping either the products or one of the reactant beams with a 
rapidly spinning slotted wheel, one can determine the time of flight of each product from the interaction 
region, where the two beams cross, to the ionizer, and from this the product translational energy $E_T$ can be 
determined at each scattering angle. The resulting product translational energy distributions $P(E_T)$ also 
contain information on the internal energy distribution of the products via conservation of energy, so long as 
the reactant collision energy is well defined. 

Figure A3.7.2. Schematic illustration of crossed molecular beams experiment for F + H₂ reaction. 


In an important variation of this experiment, one of the reactant beams is replaced by a pulsed laser which 
photodissociates molecules in the remaining reactant beam. Use of a pulsed laser makes it straightforward to 
determine the product translational energy distribution by time of flight. This experiment, photofragment 
translational spectroscopy, was first demonstrated by Wilson [20, 21] and is now used in many laboratories 
[17]. 


Mass spectrometry, the primary detection method in the above crossed beams experiments, is a particularly 
general means of analysing reaction products, since no knowledge of the optical spectroscopy of the products 
is required. On the other hand, electron impact ionization often leads to extensive fragmentation, thereby 
complicating identification of the primary products. Very recently, tunable VUV radiation from synchrotrons 
has been used to ionize scattered products from both photodissociation [22] and bimolecular reactions [23]; 
other than the ionization mechanism, the instrument is similar in principle to that shown in figure A3.7.2. By 
choosing the VUV wavelength to lie above the ionization potential of the product of interest but below the 
lowest dissociative ionization threshold (i.e. the minimum energy for AB + hν → A⁺ + B + e⁻) one can 
eliminate fragmentation and thus simplify interpretation of the experiments. 
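
The wavelength window for fragmentation-free photoionization follows from the two thresholds through λ = hc/E. The sketch below converts a pair of threshold energies into the usable VUV range. The function name and the threshold values in the usage lines are hypothetical and do not refer to a specific molecule.

```python
HC_EV_NM = 1239.841984  # h*c expressed in eV nm


def soft_ionization_window(ionization_ev, dissociative_ev):
    """Return (shortest, longest) usable wavelength in nm.

    Photon energies between the ionization potential and the lowest
    dissociative ionization threshold ionize AB without fragmenting it.
    """
    if not 0.0 < ionization_ev < dissociative_ev:
        raise ValueError("need 0 < ionization potential < dissociative threshold")
    return HC_EV_NM / dissociative_ev, HC_EV_NM / ionization_ev


# Hypothetical thresholds of 10.0 eV (ionization) and 12.0 eV (dissociative ionization).
shortest_nm, longest_nm = soft_ionization_window(10.0, 12.0)
print(f"use {shortest_nm:.1f} nm < wavelength < {longest_nm:.1f} nm")
```

Electron impact at typical energies lies far above both thresholds, which is why it fragments the products; a tunable photon source lets the experimenter stay inside this window.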


A complementary approach to reaction dynamics centres on probing reaction products by optical 
spectroscopy. Optical spectroscopy often provides higher resolution on the product internal energy 
distribution than the measurement of translational energy distributions, but is less universally applicable than 
mass spectrometry as a detection scheme. If products are formed in electronically excited states, their 
emission spectra (electronic chemiluminescence) can be observed, but ground-state products are more 
problematic. Polanyi [24] made a seminal contribution in this field by showing that vibrationally excited 
products in their ground electronic state could be detected by spectrally resolving their spontaneous emission 
in the infrared; this method of 'infrared chemiluminescence' has proved of great utility in determining product 
vibrational and, less frequently, rotational distributions. 


However, with the advent of lasers, the technique of 'laser-induced fluorescence' (LIF) has probably become 
the single most popular means of determining product-state distributions; an early example is the work by 
Zare and co-workers on Ba + HX (X = F, Cl, Br, I) reactions [25]. Here, a tunable laser excites an electronic 
transition of one of the products (the BaX product in this example), and the total fluorescence is detected as a 
function of excitation frequency. This is an excellent means of characterizing molecular products with 
bound-bound electronic transitions and a high fluorescence quantum yield; in such cases the LIF spectra are often 
rotationally resolved, yielding rotational, vibrational and, for open shell species, fine-structure distributions. 
LIF has been used primarily for diatomic products since larger species often have efficient non-radiative 
decay pathways that deplete fluorescence, but there are several examples in which LIF has been used to detect 
polyatomic species as well. 

LIF can provide more detail than the determination of the product internal energy distribution. By measuring 
the shape of the LIF profile for individual rotational lines, one can obtain Doppler profiles which yield information 
on the translational energy distribution of the product as well [26, 27]. In photodissociation experiments 
where the photolysis and probe laser are polarized, the Doppler profiles yield information on product 
alignment, i.e. the distribution of $m_J$ levels for products in a particular rotational state $J$ [28]. Experiments of 
this type have shown, for example, that the rotational angular momentum of the OH product from H₂O 
photodissociation tends to be perpendicular to $\mathbf{v}$ [29], the vector describing the relative velocity of the 
products, whereas for H₂O₂ photodissociation [30] one finds $\mathbf{J}$ tends to be parallel to $\mathbf{v}$. These 'vector 
correlation' measurements [31, 32 and 33] are proving very useful in unravelling the detailed dynamics of 
photodissociation and, less frequently, bimolecular reactions. 

The above measurements are 'asymptotic', in that they involve looking at the products of reaction long after 
the collision has taken place. These very valuable experiments are now complemented by 'transition-state 
spectroscopy' experiments, in which one uses frequency- or time-domain experiments to probe the very 
short-lived complex formed when two reactants collide [34]. For example, in our laboratory, we have implemented a 
transition-state spectroscopy experiment based on negative-ion photodetachment [35]. The principle of the experiment, 
in which a stable negative ion serves as a precursor for a neutral transition state, is illustrated in figure A3.7.3. 
If the anion geometry is similar to that of the transition state, then photodetachment of the anion will access 
the transition-state region on the neutral surface. The resulting photoelectron spectrum can give a vibrationally 
resolved picture of the transition-state dynamics, yielding the frequencies of the bound vibrational modes of 
the transition state (i.e. those perpendicular to the reaction coordinate) and thereby realizing the goal of 
transition-state spectroscopy. An example of the successful application of this technique is given below in the 
discussion of the F + H₂ reaction. 

Figure A3.7.3. Principle of transition-state spectroscopy via negative-ion photodetachment. 


Alternatively, one can take advantage of the developments in ultrafast laser technology and use femtosecond 
lasers to follow the course of a reaction in real time. In this approach, pioneered by Zewail [36], a 
unimolecular or bimolecular reaction is initiated by a femtosecond pump pulse, and a femtosecond probe 
pulse monitors some aspect of the reaction as a function of pump-probe delay time. The first example of such 
an experiment was the photodissociation of ICN [37]. Here the pump pulse excited ICN to a repulsive 
electronic state correlating to ground state I + CN(X ²Σ⁺) products. The probe pulse excited the dissociating 
ICN to a second repulsive state correlating to excited I + CN(B ²Σ⁺) products, and progress of the dissociation 
was monitored via LIF. If the probe pulse is tuned to be resonant with the CN B ← X transition, the LIF signal 
rises monotonically on a 200 fs time scale, attributed to the time delay for the formation of CN product. On 
the other hand, at slightly redder probe wavelengths, the LIF signal rises then falls, indicative of the transient 
ICN* species formed by the pump pulse. This experiment thus represented the first observation of a molecule 
in the act of falling apart. 

In an elegant application of this method to bimolecular reactions, the reaction H + CO₂ → OH + CO was 
studied by forming the CO₂-HI van der Waals complex, dissociating the HI moiety with the pump pulse, 
allowing the resulting H atom to react with the CO₂, and then using LIF to probe the OH signal as a function 
of time [38]. This experiment represents the 'real-time clocking' of a chemical reaction, as it monitors the 
time interval between initiation of a bimolecular reaction and its completion. 


The above discussion represents a necessarily brief summary of the aspects of chemical reaction dynamics. 
The theoretical focus of this field is concerned with the development of accurate potential energy surfaces and 
the calculation of scattering dynamics on these surfaces. Experimentally, much effort has been devoted to 
developing complementary asymptotic techniques for product characterization and frequency- and time- 
resolved techniques to study transition-state spectroscopy and dynamics. It is instructive to see what can be 
accomplished with all of these capabilities. Of all the benchmark reactions mentioned in section A3.7.2, the
reaction F + H₂ → HF + H represents the best example of how theory and experiment can converge to yield a
fairly complete picture of the dynamics of a chemical reaction. Thus, the remainder of this chapter focuses on 
this reaction as a case study in reaction dynamics. 


A3.7.4 CASE STUDY: THE F + H₂ REACTION

The energetics for the F + H₂ reaction are shown in figure A3.7.4. The reaction is exothermic by 32.07 kcal
mol⁻¹, so that at collision energies above 0.5 kcal mol⁻¹, enough energy is available to populate HF
vibrational levels up to and including v = 3. Hence the determination of the HF vibration-rotation distribution
from this reaction has been of considerable interest. How might one go about this? Since HF does not have an 
easily accessible bound excited state, LIF is not an appropriate probe technique. On the other hand, the HF 
vibrational transitions in the infrared are exceedingly strong, and this is the spectral region where 
characterization of the HF internal energy distribution has been carried out.

Figure A3.7.4. Energetics of the F + H₂ reaction. All energies in kcal mol⁻¹.
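The energy bookkeeping behind the statement that HF levels up to v = 3 open at low collision energy can be checked with a short calculation. The sketch below is illustrative only: the HF spectroscopic constants and the two-term anharmonic expansion are assumptions taken from standard tabulations rather than from this chapter, and the function names are hypothetical. With this truncated expansion the v = 3 threshold comes out within a few tenths of a kcal mol⁻¹ of the value quoted above.

```python
# Spectroscopic constants for HF in cm^-1 (assumed literature values, not from the text).
OMEGA_E = 4138.32            # harmonic wavenumber
OMEGA_E_X_E = 89.88          # first anharmonic correction
CM1_PER_KCAL_MOL = 349.755   # 1 kcal/mol expressed in cm^-1

EXOTHERMICITY = 32.07        # kcal/mol, value quoted in the text


def hf_level_energy(v):
    """Energy of HF(v) above HF(v = 0) in kcal/mol (two-term anharmonic expansion)."""
    term_cm1 = OMEGA_E * v - OMEGA_E_X_E * v * (v + 1)
    return term_cm1 / CM1_PER_KCAL_MOL


def highest_open_level(collision_energy, v_cap=10):
    """Largest v whose energy does not exceed exothermicity + collision energy."""
    available = EXOTHERMICITY + collision_energy
    v = 0
    # v_cap guards against the unphysical turnover of the truncated expansion at high v.
    while v < v_cap and hf_level_energy(v + 1) <= available:
        v += 1
    return v


for e_coll in (0.0, 0.5, 1.84):
    print(e_coll, highest_open_level(e_coll))
```

Calling `highest_open_level(1.84)` corresponds to the collision energy used in the crossed-beam experiment discussed later in this section.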

The first information on the HF vibrational distribution was obtained in two landmark studies by Pimentel
[39] and Polanyi [24] in 1969; both studies showed extensive vibrational excitation of the HF product.
Pimentel found that the F + H₂ reaction could pump an infrared chemical laser, i.e. the vibrational distribution
was inverted, with the HF(v = 2) population higher than that for the HF(v = 1) level. A more complete picture
was obtained by Polanyi by measuring and spectrally analysing the spontaneous emission from vibrationally
excited HF produced by the reaction. This 'infrared chemiluminescence' experiment yielded relative
populations of 0.29, 1 and 0.47 for the HF(v = 1, 2 and 3)
vibrational levels, respectively. While improvements in these measurements were made in subsequent years,
the numbers describing the vibrational populations have stayed approximately constant. The highly inverted
vibrational distributions are characteristic of a potential energy surface for an exothermic reaction with a
barrier in the entrance channel.

Spectroscopic determination of the HF rotational distribution is another story. In both the chemical laser and 
infrared chemiluminescence experiments, rotational relaxation due to collisions is faster than, or at least
comparable to, the time scale of the measurements, so that accurate determination of the nascent rotational
distribution was not feasible. However, Nesbitt [40, 41] has recently carried out direct infrared absorption 
experiments on the HF product under single-collision conditions, thereby obtaining a full vibration-rotation 
distribution for the nascent products. 


These spectroscopic probes have been complemented by studies using the crossed molecular beams 
technique. In these experiments, two well collimated and nearly monoenergetic beams of H₂ and F atoms
cross in a large vacuum chamber. The scattered products are detected by a rotatable mass spectrometer,
yielding the angular distribution of the reaction products. The experiment measures the translational energy of
the products via time of flight. Thus, one obtains the full translational energy and angular distribution, P(E_T, θ),
for the HF products. The first experiments of this type on the F + D₂ reaction were carried out by Lee [42] in
1970. Subsequent work by the Lee [43, 44] and Toennies [45, 46] groups on the F + H₂, D₂ and HD reactions
has yielded a very complete characterization of the P(E_T, θ) distribution.

As an example, figure A3.7.5 shows a polar contour plot of the HF product velocity distribution at a reactant
collision energy of E_coll = 1.84 kcal mol⁻¹ [43]. p-H₂ refers to para-hydrogen, for which most of the
rotational population is in the J = 0 level under the experimental conditions used here. This plot is in the
centre-of-mass (CM) frame of reference. F atoms are coming from the right, and H₂ from the left, and the
scattering angle is referenced to the H₂ beam. The dashed circles ('Newton circles') represent the maximum
speed of the HF product in a particular vibrational state, given by

$$ u_{\max} = \frac{m_{\mathrm{H}}}{M} \sqrt{\frac{2(\Delta E + E_{\mathrm{coll}} - E_v)}{\mu}} $$

where ΔE is the exothermicity, E_coll the collision energy, E_v the vibrational energy, M is the total mass
(M = m_H + m_HF) and μ = m_HF m_H/M the reduced mass of the products. Thus, all the signal inside the v = 3
circle is from HF(v = 3), all the signal inside the v = 2 circle is from HF(v = 2) or HF(v = 3), etc.

Figure A3.7.5. Velocity-flux contour plot for HF product from the reaction F + para-H₂ → HF + H at a
reactant collision energy of 1.84 kcal mol⁻¹.
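To get a feel for the size of the Newton circles, the maximum CM-frame speed of HF in each vibrational level can be evaluated numerically. The sketch below assumes the usual two-body kinematics (all available energy released as product translation, with HF carrying the fraction m_H/M of the relative velocity). The HF spectroscopic constants, the atomic masses and the function names are assumptions introduced for illustration; only the exothermicity and collision energy come from the text.

```python
import math

KCAL_TO_J = 4184.0           # J per kcal
M_H = 1.00783e-3             # kg/mol, H atom (assumed tabulated mass)
M_HF = 20.00623e-3           # kg/mol, HF (assumed tabulated mass)
OMEGA_E = 4138.32            # cm^-1, HF harmonic wavenumber (assumed literature value)
OMEGA_E_X_E = 89.88          # cm^-1, HF anharmonicity (assumed literature value)
CM1_PER_KCAL_MOL = 349.755   # 1 kcal/mol expressed in cm^-1


def hf_level_energy(v):
    """Energy of HF(v) above HF(v = 0) in kcal/mol."""
    return (OMEGA_E * v - OMEGA_E_X_E * v * (v + 1)) / CM1_PER_KCAL_MOL


def newton_circle_speed(v, collision_energy=1.84, exothermicity=32.07):
    """Maximum CM-frame speed (m/s) of HF(v) with no rotational excitation."""
    e_trans = (exothermicity + collision_energy - hf_level_energy(v)) * KCAL_TO_J  # J/mol
    if e_trans < 0.0:
        raise ValueError("HF(v = %d) is energetically closed" % v)
    total_mass = M_H + M_HF
    mu = M_H * M_HF / total_mass               # reduced mass of the HF + H products
    relative_speed = math.sqrt(2.0 * e_trans / mu)
    return (M_H / total_mass) * relative_speed  # HF share of the relative velocity


for v in range(4):
    print(v, round(newton_circle_speed(v), 1))
```

Because the light H atom carries most of the relative velocity, the HF circles are small, and the v = 3 circle is by far the smallest since that channel is barely open at this collision energy.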

An important feature of figure A3.7.5 is that the contributions from different HF vibrational levels,
particularly the v = 2 and 3 levels, are very distinct, a result of relatively little rotational excitation of the
HF(v = 2) products (i.e. if these products had sufficient rotational excitation, they would have the same
translational energy as HF(v = 3) product in low J levels). As a consequence, from figure A3.7.5 one can infer
the angular distribution for each HF vibrational state, in other words, vibrationally state-resolved differential
cross sections. These are quite different depending on the vibrational level. The HF(v = 2) and (v = 1)
products are primarily back-scattered with their angular distributions peaking at θ = π, while the HF(v = 3)
products are predominantly forward-scattered, peaking sharply at θ = 0°. In general, backward-scattered
products result from low impact parameter, head-on collisions, while forward-scattered products are a 
signature of higher impact parameter, glancing collisions. To understand the significance of these results, it is 
useful to move away from experimental results and consider the development of potential energy surfaces for 
this reaction. 


Many potential energy surfaces have been proposed for the F + H₂ reaction. It is one of the first reactions for
which a surface was generated by a high-level ab initio calculation including electron correlation [47]. The
resulting surface (restricted to collinear geometries) was imperfect, but it had a low barrier (1.66 kcal mol⁻¹)
lying in the entrance channel, as expected for an exothermic reaction with a low activation energy (~1.0 kcal
mol⁻¹). In the 1970s, several empirical surfaces were developed which were optimized so that classical
trajectory calculations performed on these surfaces reproduced experimental results, primarily the rate
constant and HF vibrational energy distribution. One of these, the Muckerman V surface [48], was used in
many classical and quantum mechanical scattering calculations up until the mid-1980s and provided a
generally accepted theoretical foundation for the F + H₂ reaction. However, one notable feature of this surface
was its rather stiff bend potential near the transition state. With such a potential, only near-collinear collisions
were likely to lead to reaction. As a consequence, the HF product angular distribution found by scattering
calculations on this surface was strongly back-scattered for all vibrational states. This is in marked
disagreement with the experimental results in figure A3.7.5, which show the HF(v = 3) distribution to be
strongly forward-scattered.

At the time the experiments were performed (1984), this discrepancy between theory and experiment was
attributed to quantum mechanical resonances that led to enhanced reaction probability in the HF(v = 3)
channel for high impact parameter collisions. However, since 1984, several new potential energy surfaces
using a combination of ab initio calculations and empirical corrections were developed in which the bend
potential near the barrier was found to be very flat or even non-collinear [49, 51], in contrast to the
Muckerman V surface. In 1988, Sato [52] showed that classical trajectory calculations on a surface with a
bent transition-state geometry produced angular distributions in which the HF(v = 3) product was peaked at
θ = 0°, while the HF(v = 2) product was predominantly scattered into the backward hemisphere (θ > 90°),
thereby qualitatively reproducing the most important features in figure A3.7.5.

At this point it is reasonable to ask whether comparing classical or quantum mechanical scattering 
calculations on model surfaces to asymptotic experimental observables such as the product energy and 
angular distributions is the best way to find the 'true' potential energy surface for the F + H₂ (or any other)
reaction. From an experimental perspective, it would be desirable to probe the transition-state region of the
F + H₂ reaction in order to obtain a more direct characterization of the bending potential, since this appears to
be the key feature of the surface. From a theoretical perspective, it would seem that, with the vastly increased 
computational power at one's disposal compared to 10 years ago, it should be possible to construct a 
chemically accurate potential energy surface based entirely on ab initio calculations, with no reliance upon 
empirical corrections. Quite recently, both developments have come to pass and have been applied to the
F + H₂ reaction.

The transition-state spectroscopy experiment based on negative-ion photodetachment described above is well 
suited to the study of the F + H₂ reaction. The experiment is carried out through measurement of the
photoelectron spectrum of the anion FH₂⁻. This species is calculated to be stable with a binding energy of
about 0.20 eV with respect to F⁻ + H₂ [53]. Its calculated equilibrium geometry is linear and the internuclear
distances are such that good overlap with the entrance barrier transition state is expected.

The photoelectron spectrum of FH₂⁻ is shown in figure A3.7.6 [54]. The spectrum is highly structured, showing
a group of closely spaced peaks centred around 1 eV, and a smaller peak at 0.5 eV. We expect to see
vibrational structure corresponding to the bound modes of the transition state perpendicular to the reaction
coordinate. For this reaction with its entrance channel barrier, the reaction coordinate at the transition state is
the F···H₂ distance, and the perpendicular modes are the F-H-H bend and H-H stretch. The bend frequency
should be considerably lower than the stretch. We therefore assign the closely spaced peaks to a progression
in the F-H-H bend and the small peak at 0.5 eV to a transition-state level with one quantum of vibrational
excitation in the H₂ stretch.

Figure A3.7.6. Photoelectron spectrum of FH₂⁻. Here the F⁻ is complexed to para-H₂. Solid curve:
experimental results. Dashed curve: simulated spectrum from scattering calculation on ab initio surface.

The observation of a bend progression is particularly significant. In photoelectron spectroscopy, just as in
electronic absorption or emission spectroscopy, the extent of vibrational progressions is governed by
Franck-Condon factors between the initial and final states, i.e. the intensity I of the transition between the
anion vibrational level v″ and neutral level v′ is given by

$$ I \propto |\langle \psi_{v'} | \psi_{v''} \rangle|^2 \qquad \text{(A3.7.4)} $$

where ψ_v′ and ψ_v″ are the neutral and anion vibrational wavefunctions, respectively. Since the anion is linear,
a progression in a bending mode of the neutral species can only occur if the latter is bent. Hence the
FH₂⁻ photoelectron spectrum implies that the FH₂ transition state is bent.
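The link between a geometry change and the length of a vibrational progression can be illustrated with a deliberately simplified model. The sketch below assumes a one-dimensional harmonic mode with the same frequency in both states, for which the Franck-Condon factors out of the vibrational ground state follow a Poisson distribution in the dimensionless displacement parameter S (often called the Huang-Rhys factor). This is a textbook model, not the scattering calculation used to simulate the FH₂⁻ spectrum, and the function name and the value of S are hypothetical.

```python
import math


def franck_condon_factors(displacement_s, n_max):
    """Franck-Condon factors |<n|0>|^2 for a displaced harmonic oscillator.

    displacement_s is the dimensionless Huang-Rhys factor S; S = 0 means the
    two states share the same equilibrium geometry along this mode.
    """
    return [
        math.exp(-displacement_s) * displacement_s ** n / math.factorial(n)
        for n in range(n_max + 1)
    ]


# No geometry change along the mode: only the 0-0 transition carries intensity.
print(franck_condon_factors(0.0, 4))

# A sizeable geometry change: intensity spreads over an extended progression.
print([round(f, 3) for f in franck_condon_factors(2.5, 6)])
```

In this picture, a linear anion photodetached to a linear neutral would correspond to S = 0 along the bend and show no bend progression, whereas a bent neutral geometry gives a finite displacement and hence several peaks.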

While this experimental work was being carried out, an intensive theoretical effort was being undertaken by
Werner and co-workers to calculate an accurate F + H₂ potential energy surface using purely ab initio
methods. The many previous unsuccessful attempts indicated that an accurate calculation of the barrier height
and transition-state properties requires both very large basis sets and a high degree of electron correlation;
Werner incorporated both elements in his calculation. The resulting Stark-Werner (SW) surface [7] has a bent
geometry at the transition state and a barrier of 1.45 ± 0.25 kcal mol⁻¹. A two-dimensional contour plot of this
potential near the transition state is shown in figure A3.7.7. The reason for the bent transition state is
illuminating. The F atom has one half-filled p orbital and one might expect this to react most readily with H₂
by collinear approach of the reactants with the half-filled p orbital lined up with the internuclear axis of the H₂
molecule. On the other hand, at longer F···H₂ distances, where electrostatic forces dominate, there is a
minimum in the potential energy surface at a T-shaped geometry with the half-filled orbital perpendicular to
the H-H bond. (This arises from the quadrupole-quadrupole interaction between the F and H₂.)
The interplay between favourable reactivity at a collinear geometry and electrostatic forces favouring a
T-shaped geometry leads to a bent geometry at the transition state.

Figure A3.7.7. Two-dimensional contour plot of the Stark-Werner potential energy surface for the F + H₂
reaction near the transition state. The angular coordinate is the F-H-H bend angle.

How good is this surface? The first test was to simulate the FH₂⁻ photoelectron spectrum. This calculation was
carried out by Manolopoulos [54] and the result is shown as a dashed curve in figure A3.7.6. The agreement
with experiment is excellent considering that no adjustable parameters are used in the calculation. In addition,
Castillo et al [55, 56] and Aoiz et al [57] have performed quasi-classical and quantum scattering calculations
on the SW surface to generate angular distributions for each HF product vibrational state for direct
comparison to the molecular beam scattering results in figure A3.7.5. The state-specific forward-scattering of
the HF(v = 3) product is indeed reproduced in the quantum calculations and, to a somewhat lesser extent, in
the quasi-classical calculations. The experimental product vibrational populations are also reproduced by the
calculations. It therefore appears that scattering calculations on the SW surface agree with the key
experimental results for the F + H₂ reaction.

What is left to understand about this reaction? One key remaining issue is the possible role of other electronic
surfaces. The discussion so far has assumed that the entire reaction takes place on a single Born-Oppenheimer
potential energy surface. However, three potential energy surfaces result from the interaction between an F
atom and H₂. The spin-orbit splitting between the ²P₃/₂ and ²P₁/₂ states of a free F atom is 404 cm⁻¹. When
an F atom interacts with H₂, the ²P₃/₂ state splits into two states with A′ and A″ symmetry (²Σ⁺ and ²Π₃/₂,
respectively, for collinear geometry) while the higher-lying ²P₁/₂ state becomes an A′ state (²Π₁/₂ for
collinear geometry). Only the lower A′ state correlates adiabatically to ground-state HF + H products; the
other two states correlate to highly excited products and are therefore non-reactive in the adiabatic limit. In
this limit, the excited F(²P₁/₂) state is completely unreactive.

Since this state is so low in energy, it is likely to be populated in the F atom beams typically used in scattering
experiments (where pyrolysis or microwave/electrical discharges are used to generate F atoms), so the issue of
its reactivity is important. The molecular beam experiments of Lee [43] and Toennies [45] showed no
evidence for reaction from the F(²P₁/₂) state. However, the recent work of Nesbitt [40, 41], in which the
vibrational and rotational HF distribution was obtained by very high-resolution IR spectroscopy, shows more
rotational excitation of the HF(v = 3) product than should be energetically possible from reaction with the
F(²P₃/₂) state. They therefore suggested that this rotationally excited product comes from the F(²P₁/₂) state.
This work prompted an intensive theoretical study of spin-orbit effects on the potential energy surface and
reaction dynamics. A recent study by Alexander and co-workers [58] does predict a small amount of reaction
from the F(²P₁/₂) state but concludes that the adiabatic picture is largely correct. The issue of whether a
reaction can be described by a single Born-Oppenheimer surface is of considerable interest in chemical
dynamics [10], and it appears that the effect of multiple surfaces must be considered to gain a complete picture
of a reaction even for as simple a model system as the F + H₂ reaction.
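The remark that the spin-orbit excited state is likely to be populated in the F atom beam can be made semi-quantitative with a Boltzmann estimate. The sketch below assumes that the two fine-structure levels are thermally equilibrated at the source temperature, which may not hold in a real discharge or supersonic expansion; the temperatures and the function name are illustrative assumptions, while the 404 cm⁻¹ splitting is the value given above.

```python
import math

K_B_CM1 = 0.695035   # Boltzmann constant in cm^-1 per kelvin
SPLITTING = 404.0    # cm^-1, F atom spin-orbit splitting quoted in the text


def excited_state_fraction(temperature):
    """Thermal fraction of F atoms in the 2P(1/2) level at the given temperature (K)."""
    g_lower, g_upper = 4, 2   # degeneracies 2J + 1 for J = 3/2 and J = 1/2
    boltzmann = math.exp(-SPLITTING / (K_B_CM1 * temperature))
    return g_upper * boltzmann / (g_lower + g_upper * boltzmann)


for temperature in (300.0, 1000.0):   # assumed, illustrative source temperatures
    print(temperature, round(excited_state_fraction(temperature), 3))
```

Even at room temperature a few per cent of the atoms are in the upper level under this assumption, and the fraction approaches the statistical limit of one third at high temperature.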


A3.7.5 CONCLUSIONS AND PERSPECTIVES 

This chapter has summarized some of the important concepts and results from what has become an 
exceedingly rich area of chemical physics. On the other hand, the very size of the field means that the vast 
majority of experimental and theoretical advances have been left out; the books referenced in the introduction 
provide a much more complete picture of the field. 

Looking toward the future, two trends are apparent. First, the study of benchmark bimolecular and
photodissociation reactions with increasing levels of detail is likely to continue and be extremely productive.
Although many would claim that the 'three-body problem' is essentially solved from the perspective of
chemical reaction dynamics, the possibility of multiple potential surfaces playing a role in the dynamics adds
a new level of complexity even for well studied model systems such as the F + H₂ reaction considered here.
Slightly more complicated benchmark systems such as the OH + H₂ and OH + CO reactions present even
more of a challenge to both experiment and theory, although considerable progress has been achieved in both 
cases. 

However, in order to deliver on its promise and maximize its impact on the broader field of chemistry, the 
methodology of reaction dynamics must be extended toward more complex reactions involving polyatomic 
molecules and radicals for which even the primary products may not be known. There certainly have been 
examples of this, notably the crossed molecular beams work by Lee [59] on the reactions of O atoms with a
series of hydrocarbons. In such cases the spectroscopy of the products is often too complicated to investigate 
using laser-based techniques, but the recent marriage of intense synchrotron radiation light sources with state- 
of-the-art scattering instruments holds considerable promise for the elucidation of the bimolecular and 
photodissociation dynamics of these more complex species.


A3.8 Molecular reaction dynamics in condensed phases

Gregory A Voth 


A3.8.0 INTRODUCTION 

The effect of the condensed phase environment on chemical reaction rates has been extensively studied over 
the past few decades. The central framework for understanding these effects is provided by the transition state 
theory (TST) [1, 2] developed in the 1930s, the Kramers theory [3] of 1940, the Grote-Hynes [4] and related 
theories [5] of the 1980s and 1990s and the Yamamoto reactive flux correlation function formalism [6] as 
extended and further developed by a number of workers [7, 8]. Each of these seminal theoretical
breakthroughs has, in turn, generated an enormous amount of research in its own right. There are many good
reviews of this body of literature, some of which are cited in [5, 9, 10, 11 and 12]. It therefore serves no useful
purpose to review the field again in the present chapter. Instead, the key issues involving condensed phase 
effects on chemical reactions will be organized around the primary theoretical concepts as they stand at the 
present time. Even more importantly, the gaps in our understanding and prediction of these effects will be 
highlighted. From this discussion it will become evident that, despite the large body of theoretical work in this 
field, there are significant questions that remain unanswered, as well as a need for greater contact between 
theory and experiment. The discussion here is by no means intended to be exhaustive, nor is the reference list 
comprehensive. 


A3.8.1 THE REACTIVE FLUX 

To begin, consider a system which is at equilibrium and undergoing a forward and reverse chemical reaction. 
For simplicity, one can focus on an isomerization reaction, but the discussion also applies to other forms of 
unimolecular reactions as well as to bimolecular reactions that are not diffusion limited. The equilibrium of 
the reaction is characterized by the mole fractions x_R and x_P of reactants and products, respectively, and an
equilibrium constant K. For gas phase reactions, it is commonplace to introduce the concept of the minimum
energy path along some reaction coordinate, particularly if one is interested in microcanonical reaction rates.
In condensed phase chemical dynamics, however, this concept is not useful. In fact, a search for the minimum
energy path in a liquid phase reaction would lead one to the solid state! Instead, one considers a free energy
path along the reaction coordinate q, and the dominant effect of a condensed phase environment is to change
the nature of this path (i.e. its barriers and reactant and product wells, or minima). To illustrate this point, the
free energy function along the reaction coordinate of an isomerizing molecule in the gas phase is shown by
the full curve in figure A3.8.1. In the condensed phase, the free energy function will almost always be
modified by the interaction with the solvent, as shown by the broken curve in figure A3.8.1. (It should be
noted that, in the spirit of TST, the optimal reaction coordinate should probably be redefined
for the condensed phase reaction, but for simplicity it can be taken to be the same coordinate as in the gas
phase.) As can be seen from figure A3.8.1, the solvent can modify the barrier height for the reaction, the
location of the barrier along q, and the reaction free energy (i.e. the difference between the reactant and 
product minima). It may also introduce dynamical effects that are not apparent from the curve, and it is noted 
here that a classical framework has been implicitly used — the generalization to the quantum regime will be 
addressed in a later section.

Figure A3.8.1 A schematic diagram of the potential of mean force (PMF) along the reaction coordinate for an isomerizing solute in the
gas phase (full curve) and in solution (broken curve). Note the modification of the barrier height, the well 
positions, and the reaction free energy due to the interaction with the solvent. 
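To give a sense of scale for the equilibrium solvent effect sketched in figure A3.8.1, one can estimate how a change in the free energy barrier alters a rate constant. The sketch below assumes a TST-like exponential dependence of the rate on the barrier height and an unchanged prefactor, and it ignores the dynamical effects mentioned above. The barrier shift, the temperature and the function name are illustrative assumptions and are not taken from the text.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1


def rate_ratio(barrier_change, temperature=298.15):
    """Ratio k(solution)/k(gas) for a free energy barrier change in kcal/mol.

    Assumes k is proportional to exp(-barrier/RT) with the same prefactor in both phases.
    """
    return math.exp(-barrier_change / (R_KCAL * temperature))


# A solvent that lowers the barrier by 1.4 kcal/mol (hypothetical value).
print(round(rate_ratio(-1.4), 1))
```

Under these assumptions a barrier shift of only about 1.4 kcal mol⁻¹ at room temperature already changes the rate by roughly an order of magnitude, which is why the free energy curve dominates most condensed phase effects.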

It is worth discussing the fact that a free energy can be directly relevant to the rate of a dynamical process 
such as a chemical reaction. After all, a free energy function generally arises from an ensemble average over 
configurations. On the other hand, most condensed phase chemical rate constants are indeed thermally 
averaged quantities, so this fact may not be so surprising after all, although it should be quantified in a 
rigorous fashion. Interestingly, the free energy curve for a condensed phase chemical reaction (cf. figure
A3.8.1) can be viewed, in effect, as a natural consequence of Onsager's linear regression hypothesis as it is
applied to condensed phase chemical reactions, along with some additional analysis and simplifications [7]. 

In the spirit of Onsager, if one imagines a relatively small perturbation of the populations of reactants and 
products away from their equilibrium values, then the regression hypothesis states that the decay of these 
populations back to their equilibrium values will follow the same time-dependent behaviour as the decay of 
correlations of spontaneous fluctuations of the reactant and product populations in the equilibrium system. In 
the condensed phase, it is this powerful principle that connects a macroscopic dynamical quantity such as a 
kinetic rate constant with equilibrium quantities such as a free energy function along a reaction pathway and, 
in turn, the underlying microscopic interactions which determine this free energy function. The effect of the 
condensed phase environment can therefore be largely understood in the equilibrium, or quasi-equilibrium, 
context in terms of the modifications of the free energy curve as shown in figure A3.8.1. As will be shown 
later, the remaining condensed phase effects which are not included in the equilibrium picture may be defined 
as being 'dynamical'. 

The Onsager regression hypothesis, stated mathematically for the chemically reacting system just described, is
given in the classical limit by

$$ \frac{\Delta N_R(t)}{\Delta N_R(0)} = \frac{\langle \delta N_R(0)\,\delta N_R(t) \rangle}{\langle [\delta N_R(0)]^2 \rangle} \qquad \text{(A3.8.1)} $$

where ΔN_R(t) = N_R(t) − ⟨N_R⟩ is the time-dependent difference between the number of reactant molecules
N_R(t) arising from an initial non-equilibrium (perturbed) distribution and the final equilibrium number of the
reactants ⟨N_R⟩. On the right-hand side of the equation, δN_R(t) = N_R(t) − ⟨N_R⟩ is the instantaneous fluctuation
in the number of reactant molecules away from its equilibrium value in the canonical ensemble, and the
notation ⟨···⟩ denotes the ensemble average over initial conditions.

The solution to the usual macroscopic kinetic rate equations for the reactant and product concentrations yields
an expression for the left-hand side of (A3.8.1), namely ΔN_R(t) = ΔN_R(0) exp(−t/τ_rxn), where τ_rxn⁻¹ is
the sum of the forward and reverse rate constants, k_f and k_r, respectively. The connection with the microscopic
dynamics of the reactant molecule comes about from the right-hand side of (A3.8.1). In particular, in the
dilute solute limit, the reactant and product states of the reacting molecule can be identified by the reactant
and product population functions h_R[q(t)] = 1 − h_P[q(t)] and h_P[q(t)], respectively, where h_P[q(t)] = h[q* − q(t)]
and h(x) is the Heaviside step function. The product population function abruptly switches from a value of
zero to one as the reaction coordinate trajectory q(t) passes through the barrier maximum at q* (cf. figure
A3.8.1). The important connection between the macroscopic (exponential) rate law and the decay of
spontaneous fluctuations in the reactant populations, as specified by the function h_R[q(t)] = 1 − h_P[q(t)] and in
terms of the microscopic reaction coordinate q, is valid in a 'coarse-grained' sense in time, i.e. after a period
of molecular-scale transients usually of the order of a few tens of femtoseconds. From the theoretical point of
view, the importance of the connection outlined above cannot be overstated because it provides a link between
the macroscopic (experimentally observed) kinetic phenomena and the molecular scale dynamics of the
reaction coordinate in the equilibrium ensemble.
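The macroscopic side of this connection is easy to verify numerically. The sketch below integrates the two-state rate equations for a reversible isomerization with a simple Euler step and compares the result with the exponential relaxation law, whose decay rate is the sum of the forward and reverse rate constants. The rate constants, populations and function names are arbitrary illustrative choices, not values from the text.

```python
import math


def relax_numerically(n_r0, n_total, k_f, k_r, t_end, dt=1.0e-4):
    """Integrate dN_R/dt = -k_f N_R + k_r N_P with an explicit Euler step."""
    n_r = n_r0
    for _ in range(int(round(t_end / dt))):
        n_p = n_total - n_r
        n_r += (-k_f * n_r + k_r * n_p) * dt
    return n_r


def relax_analytically(n_r0, n_total, k_f, k_r, t):
    """Exponential relaxation of the reactant population towards equilibrium."""
    n_eq = n_total * k_r / (k_f + k_r)   # equilibrium reactant population
    return n_eq + (n_r0 - n_eq) * math.exp(-(k_f + k_r) * t)


numeric = relax_numerically(900.0, 1000.0, k_f=2.0, k_r=1.0, t_end=0.5)
analytic = relax_analytically(900.0, 1000.0, k_f=2.0, k_r=1.0, t=0.5)
print(numeric, analytic)
```

The two numbers agree to within the discretization error of the Euler step, which shows that the perturbation ΔN_R decays with rate k_f + k_r regardless of the size of the initial displacement.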

However, further analysis of the linear regression expression in (A3.8.1) is required to achieve a useful
expression for the rate constant from both computational and conceptual points of view. Such an
expression was first provided by Yamamoto [6], but others have extended, validated, and expounded upon his
analysis in considerable detail [7, 8]. The work of Chandler [7] in this regard is followed most closely here in
order to demonstrate the places in which condensed phase effects can appear in the theory, and hence in the
value of the thermal rate constant. The key mathematical step is to differentiate both sides of the linear
regression formula in (A3.8.1) and then carefully analyse its expected behaviour for systems having a barrier
height of at least several times k_B T. The resulting expression for the classical forward rate constant in terms
of the so-called 'reactive flux' time correlation function is given by [6, 7 and 8]

$$ k_f = x_R^{-1} \langle \dot{q}(0)\, \delta[q(0) - q^*]\, h_P[q(t)] \rangle \qquad \text{(A3.8.2)} $$

where x_R is the equilibrium mole fraction of the reactant. The classical rate constant is obtained from (A3.8.2)
when the correlation function reaches a 'plateau' value at the time t = t_pl after the molecular-scale transients
have ended [7]. Upon inspection of the above expression, it becomes apparent that the classical rate constant
can be calculated by averaging over trajectories initiated at the barrier top with a Boltzmann velocity
distribution for the reaction coordinate and an equilibrium distribution in all other degrees of freedom of the
system. Those trajectories are then weighted by their initial velocity and the initial flux over the barrier is
correlated with the product state population function h_P[q(t)]. The time dependence of the correlation function
is computed until the plateau value is reached, at which point it
becomes essentially constant and the numerical value of the thermal rate constant can be evaluated. An
example of such a correlation function obtained through molecular dynamics simulations is shown in figure
A3.8.2.

Figure A3.8.2 The correlation function κ(t) for the particular case of the reaction of methyl vinyl ketone with
cyclopentadiene in water. The leveling-off of this function to reach a constant value at the plateau time t_pl is
clearly seen.
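The trajectory procedure described above can be sketched for a toy model. The code below is a minimal illustration, not the simulation behind figure A3.8.2: it assumes a one-dimensional inverted parabolic barrier at q = 0 with products at q > 0, replaces the solvent by Langevin friction and noise, and works in reduced units. Initial speeds are drawn from the flux-weighted Boltzmann distribution, trajectories are launched in both directions from the barrier top, and the plateau value of the reactive flux relative to its initial value is taken as the difference between the fractions that commit to products. For this model the Kramers result provides an analytical check. All names and parameter values are hypothetical.

```python
import math
import random


def kramers_kappa(gamma, omega_b):
    """Kramers plateau value for a parabolic barrier of frequency omega_b and friction gamma."""
    x = gamma / (2.0 * omega_b)
    return math.sqrt(1.0 + x * x) - x


def ends_in_product(v0, gamma, omega_b, kt_over_m, rng,
                    dt=0.01, q_commit=5.0, max_steps=20000):
    """Run one Langevin trajectory from the barrier top; return 1.0 if it commits to q > 0."""
    q, v = 0.0, v0
    noise = math.sqrt(2.0 * gamma * kt_over_m * dt)   # fluctuation-dissipation amplitude
    for _ in range(max_steps):
        force = omega_b ** 2 * q - gamma * v          # inverted parabola plus friction
        v += force * dt + noise * rng.gauss(0.0, 1.0)
        q += v * dt
        if abs(q) > q_commit:                         # far enough downhill to be committed
            break
    return 1.0 if q > 0.0 else 0.0


def transmission_coefficient(n_traj=1000, gamma=1.0, omega_b=1.0, kt_over_m=1.0, seed=0):
    """Plateau reactive flux divided by its initial value, from paired +/- trajectories."""
    rng = random.Random(seed)
    plus = minus = 0.0
    for _ in range(n_traj):
        # Flux-weighted (Rayleigh) speed: p(v) proportional to v exp(-v^2 / (2 kT/m)).
        speed = math.sqrt(-2.0 * kt_over_m * math.log(1.0 - rng.random()))
        plus += ends_in_product(+speed, gamma, omega_b, kt_over_m, rng)
        minus += ends_in_product(-speed, gamma, omega_b, kt_over_m, rng)
    return (plus - minus) / n_traj
```

Calling `transmission_coefficient()` and comparing with `kramers_kappa(1.0, 1.0)` shows the reduction below unity caused by recrossing; with `gamma=0.0` no trajectory recrosses and the ratio is exactly one.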

It is important to recognize that the time-dependent behaviour of the correlation function during the molecular
transient time seen in figure A3.8.2 has an important origin [7, 8]. This behaviour is due to trajectories that
recross the transition state and, hence, it can be proven [7] that the classical TST approximation to the rate
constant is obtained from (A3.8.2) in the t → 0⁺ limit:

$$ k_f^{\mathrm{TST}} = x_R^{-1} \langle \dot{q}(0)\, \delta[q(0) - q^*]\, h_P[q(0^+)] \rangle \qquad \text{(A3.8.3)} $$

It is, of course, widely considered that the classical TST provides the central framework for the understanding
of thermal rate constants (see the review article by Truhlar et al [13]) and also for quantifying the dominant
effects of the condensed phase in chemical reactions (see below).

In order to segregate the theoretical issues of condensed phase effects in chemical reaction dynamics, it is
useful to rewrite the exact classical rate constant in (A3.8.2) as [5, 6, 7, 8, 9, 10 and 11]

$$ k_f = \kappa\, k_f^{\mathrm{TST}} \qquad \text{(A3.8.4)} $$

where κ is the dynamical correction factor (or 'transmission coefficient') which is given by

$$ \kappa = \langle h_P[q(t_{\mathrm{pl}})] \rangle_+ - \langle h_P[q(t_{\mathrm{pl}})] \rangle_- \qquad \text{(A3.8.5)} $$

Here, the symbol ⟨···⟩± denotes an averaging over the flux-weighted distribution [7, 8] for positive or negative
initial velocities of the reaction coordinate. Figure A3.8.2 shows the correlation function κ(t) for the
particular case of the reaction of methyl vinyl ketone with cyclopentadiene in water. The leveling-off of this
function to reach a constant value at the plateau time t_pl is clearly seen. The effect of the condensed phase
environment in a thermal rate constant thus appears both in the value of the TST rate constant and in the value
of the dynamical correction factor in (A3.8.4). These effects will be described separately in the following
sections, but it should be noted that the two quantities are not independent of each other in that they both 
depend on the choice of the reaction coordinate q. The 'variational' choice of q amounts to finding a 
definition of that coordinate that causes the value of k to be as close to unity as possible, i.e. to minimize the 
number of recrossing trajectories. It seems clear that an important area of research for the future will be to 
define theoretically the 'best' reaction coordinate in a condensed phase chemical reaction — one in which the 
solvent is explicitly taken into account. In charge transfer reactions, for example, a collective solvent
polarization coordinate can be treated as being coupled to a solute coordinate (see, for example, [14]), but a
more detailed and rigorous microscopic treatment of the full solution phase reaction coordinate is clearly
desirable for the future (see, for example, [15] for progress in this regard).

Before describing the effects of a
solvent on the thermal rate constants, it is worthwhile to first reconsider the above analysis in light of current 
experimental work on condensed phase dynamics and chemical reactions. The formalism outlined above, 
while exceptionally powerful in that it provides a link between microscopic dynamics and macroscopic 
chemical kinetics, is intended to help us calculate and analyse only thermal rate constants in equilibrium 
systems. The linear regression hypothesis provides the key line of analysis for this problem. To the extent that 
the thermal rate constant is the quantity of interest — and many times it is the primary quantity of interest — this 
theoretical approach would appear to be the best. However, in many experiments, for example nonlinear 
optical experiments involving intense laser pulses and/or photoinitiated chemical reactions, the system may 
initially be far from equilibrium and the above theoretical analysis may not be completely applicable. 
Furthermore, experimentally measured quantities such as vibrational or phase relaxation rates are often only 
indirectly related to the thermal rate constant. It would therefore appear that more theoretical effort will be 
required in the future to relate experimental measurements to the particular microscopic dynamics in the 
liquid phase that influence the outcome of such measurements and, in turn, to the more standard quantities 
such as the thermal rate constant. 


A3.8.2 THE ACTIVATION FREE ENERGY AND CONDENSED PHASE 
EFFECTS 

Having separated the dynamical from the equilibrium (or, more accurately, quasi-equilibrium) effects, one can
readily discover the origin of the activation free energy and define the concept of the potential of mean force
by analysis of the expression for the TST rate constant, $k_f^{\mathrm{TST}}$, in (A3.8.3). The latter can be written as [7]

$$k_f^{\mathrm{TST}} = (2\pi m \beta)^{-1/2}\, \frac{\exp[-\beta V_{\mathrm{eq}}(q^*)]}{\int_{-\infty}^{q^*} \mathrm{d}q\, \exp[-\beta V_{\mathrm{eq}}(q)]} \tag{A3.8.6}$$

where $\beta = 1/k_{\mathrm{B}}T$ and $V_{\mathrm{eq}}(q)$ is the potential of mean force (PMF) along the reaction coordinate q. The latter
quantity is all important for quantifying and understanding the effect of the condensed phase on the value of
the thermal rate constant. It is defined as

$$V_{\mathrm{eq}}(q) = -k_{\mathrm{B}}T \ln\left[\int \mathrm{d}\mathbf{x}\, \exp[-\beta V(q, \mathbf{x})]\right] + \text{constant} \tag{A3.8.7}$$

where $\mathbf{x}$ are all coordinates of the condensed phase system other than the reaction coordinate, and $V(q, \mathbf{x})$ is
the total potential energy function. The additive constant in (A3.8.7) is irrelevant to the value of the thermal
rate constant in (A3.8.6). If the PMF around its minimum in the reactant state (cf. Figure A3.8.1) is expanded
quadratically, i.e. $V_{\mathrm{eq}}(q) \approx V_{\mathrm{eq}}(q_0) + \tfrac{1}{2} m \omega_0^2 (q - q_0)^2$, then (A3.8.6) simplifies to [5, 7]

$$k_f^{\mathrm{TST}} = \frac{\omega_0}{2\pi} \exp(-\beta \Delta F^*_{\mathrm{cl}}) \tag{A3.8.8}$$

where the activation free energy of the system is defined as $\Delta F^*_{\mathrm{cl}} = V_{\mathrm{eq}}(q^*) - V_{\mathrm{eq}}(q_0)$. The PMF is often
decomposed as $V_{\mathrm{eq}}(q) = v(q) + W_{\mathrm{eq}}(q)$, where $v(q)$ is the intrinsic contribution to the PMF from the solute
potential energy function and, therefore, by definition, $W_{\mathrm{eq}}(q)$ is the contribution arising from the
solute-solvent coupling. Figure A3.8.1 illustrates how the latter coupling is responsible for the condensed
phase-induced change in the activation free energy, the reaction free energy, and the position of the reactant and
product wells. Thus, within the context of the TST, one can conclude that the condensed phase enters into the
picture in a 'simple' way through the aforementioned modifications of the reaction coordinate free energy
profile in figure A3.8.1.
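The reduction of (A3.8.6) to the harmonic form (A3.8.8) can be checked numerically. The following is a minimal sketch, assuming a hypothetical quartic double-well PMF in reduced units (the potential, its parameters and the function names are illustrative choices, not taken from the text). It evaluates the reactant configuration integral by trapezoidal quadrature and compares the result with the harmonic estimate.

```python
import math

def tst_rate_quadrature(pmf, q_star, q_min, mass, beta, n=20001):
    """TST rate from (A3.8.6); the lower limit -infinity is replaced by q_min."""
    h = (q_star - q_min) / (n - 1)
    total = 0.0
    for i in range(n):
        q = q_min + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoidal end-point weights
        total += weight * math.exp(-beta * pmf(q))
    z_reactant = total * h
    prefactor = 1.0 / math.sqrt(2.0 * math.pi * mass * beta)
    return prefactor * math.exp(-beta * pmf(q_star)) / z_reactant

def tst_rate_harmonic(omega_0, delta_f, beta):
    """Harmonic TST rate from (A3.8.8)."""
    return omega_0 / (2.0 * math.pi) * math.exp(-beta * delta_f)

# Hypothetical model PMF (assumption): V(q) = V_B * ((q/A)^2 - 1)^2,
# reactant minimum at q = -A, barrier top at q* = 0, barrier height V_B.
V_B, A, MASS, BETA = 10.0, 1.0, 1.0, 1.0

def double_well(q):
    return V_B * ((q / A) ** 2 - 1.0) ** 2

# Well frequency from V''(-A) = 8 V_B / A^2.
OMEGA_0 = math.sqrt(8.0 * V_B / (MASS * A ** 2))

k_quad = tst_rate_quadrature(double_well, 0.0, -4.0 * A, MASS, BETA)
k_harm = tst_rate_harmonic(OMEGA_0, V_B, BETA)
print("quadrature:", k_quad, "harmonic:", k_harm, "ratio:", k_quad / k_harm)
```

For a barrier that is large compared with $k_{\mathrm{B}}T$, as assumed here, the two estimates agree to within a few per cent; the residual difference reflects the anharmonicity of the model well.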

In principle, nothing more is necessary to understand the influence of the solvent on the TST rate constant 
than the modification of the PMF, and the resulting changes in the free energy barrier height should be viewed 
as the dominant effect on the rate since these changes appear in an exponential form. As an example, an error
in calculating the solvent contribution to the barrier of 1 kcal mol$^{-1}$ will translate into an error of a factor of
four in the rate constant, a factor which is often larger than any dynamical and/or quantum effects such as
those described in later sections. This is a compelling fact for the theorist, so it is therefore no accident that 
the accurate calculation of the solvent contribution to the activation free energy has become the primary focus 
of many theoretical and computational chemists. The successful completion of such an effort requires four 
things: (1) an accurate representation of the solute potential, usually from highly demanding ab initio 
electronic structure calculations; (2) an accurate representation of both the solvent potential and the solute- 
solvent coupling; (3) an accurate computational method to compute, with good statistics, the activation free 
energy in condensed phase systems; and (4) improved theoretical techniques, both analytical and 
computational, to identify the microscopic origin of the dominant contributions to the activation free energy 
and the relationship of these effects to experimental parameters such as pressure, temperature, solvent 
viscosity and polarity, etc. Each of these areas has in turn generated a significant number of theoretical papers 
over the past few decades — too many in fact to fairly cite them here — and many of these efforts have been 
major steps forward. There seems to be little dispute, however, that much work remains to be done in all of 
these areas. Indeed, one of the computational 'grand challenges' facing theoretical chemistry over the coming 
decades will surely be the quantitative prediction (better than a factor of two) of chemical reaction rates in 
highly complex systems. Some of this effort may, in fact, be driven by the needs of industry and government 
in, for example, the environmental fate prediction of pollutants. 
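The sensitivity of the rate to errors in the barrier follows directly from the exponential in (A3.8.8). The short sketch below evaluates the multiplicative error $\exp(\delta F / RT)$ for a given barrier error. The temperatures used are assumptions, since the text does not state one, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def rate_error_factor(barrier_error_kcal, temperature_k):
    """Multiplicative error in the rate caused by an error in the free energy barrier."""
    return math.exp(barrier_error_kcal / (R_KCAL * temperature_k))

for temperature in (298.15, 363.0):  # assumed temperatures, in kelvin
    print(temperature, rate_error_factor(1.0, temperature))
```

Under these assumptions a 1 kcal mol$^{-1}$ error gives a factor of roughly five at room temperature, the same order as the factor of four quoted above; a factor of exactly four corresponds to a temperature near 363 K.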


A3.8.3 THE DYNAMICAL CORRECTION AND SOLVENT EFFECTS 

While the TST estimate of the thermal rate constant is usually a good approximation to the true rate constant
and contains most of the dominant solvent effects, the dynamical corrections to the rate can be important as
well. In the classical limit, these corrections are responsible for a value of the dynamical correction factor $\kappa$ in
(A3.8.4) that drops below unity. A considerable theoretical effort has been underway over the past 50 years to
develop a general theory for the dynamical correction factor (see, for example, [5, 6, 7, 8, 9, 10, 11 and 12]).
One approach to the problem is a direct calculation of $\kappa$ using molecular dynamics simulation and the reactive
flux correlation function formalism [7, 8, 16]. This approach obviously requires the numerically exact
integration of Newton's equations for the many-body potential energy surface and a good microscopic model
of the condensed phase interactions. However, another approach [5, 9, 10, 11 and 12] has been to employ a
model for the reaction coordinate dynamics around the barrier top, for example, the generalized Langevin
equation (GLE) given by

$$m\ddot{q}(t) = -\frac{\mathrm{d}V_{\mathrm{eq}}(q)}{\mathrm{d}q} - \int_0^t \mathrm{d}t'\, \eta(t - t'; q^*)\, \dot{q}(t') + \delta F(t). \tag{A3.8.9}$$

In this equation, m is the effective mass of the reaction coordinate, $\eta(t - t'; q^*)$ is the friction kernel calculated
with the reaction coordinate 'clamped' at the barrier top, and $\delta F(t)$ is the fluctuating force from all other
degrees of freedom with the reaction coordinate so configured. The friction kernel and force fluctuations are
related by the fluctuation-dissipation relation

$$\eta(t; q^*) = \beta \langle \delta F(0)\, \delta F(t) \rangle_{q^*}. \tag{A3.8.10}$$

In the limit of a very rapidly fluctuating force, the above equation can sometimes be approximated by the
simpler Langevin equation

$$m\ddot{q}(t) = -\frac{\mathrm{d}V_{\mathrm{eq}}(q)}{\mathrm{d}q} - \hat{\eta}(0)\, \dot{q}(t) + \delta F(t) \tag{A3.8.11}$$

where $\hat{\eta}(0)$ is the so-called 'static' friction, $\hat{\eta}(0) = \int_0^\infty \mathrm{d}t\, \eta(t; q^*)$.

The GLE can be derived by invoking the linear response approximation for the response of the solvent modes 
coupled to the motion of the reaction coordinate. 
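In the Langevin limit the fluctuation-dissipation relation implies a white-noise random force with $\langle \delta F(0)\, \delta F(t) \rangle = 2 k_{\mathrm{B}}T\, \hat{\eta}(0)\, \delta(t)$. The following is a minimal sketch of what that balance means in practice. It assumes a free particle (the PMF force is dropped for simplicity), reduced units, and a simple Euler-Maruyama integrator; all names and parameter values are illustrative. If friction and noise are correctly matched, the long-time average of $v^2$ should approach $k_{\mathrm{B}}T/m$.

```python
import math
import random

def langevin_velocity_variance(gamma=1.0, kt_over_m=1.0, dt=0.01,
                               n_steps=200_000, seed=0):
    """Average of v^2 for a free Langevin particle (Euler-Maruyama integration).

    gamma is the static friction divided by the mass; the noise amplitude
    sqrt(2 * gamma * kT/m * dt) is fixed by the fluctuation-dissipation relation.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma * kt_over_m * dt)
    v = 0.0
    accumulated = 0.0
    count = 0
    n_burn = n_steps // 10  # discard the initial transient
    for step in range(n_steps):
        v += -gamma * v * dt + sigma * rng.gauss(0.0, 1.0)
        if step >= n_burn:
            accumulated += v * v
            count += 1
    return accumulated / count

print("<v^2> =", langevin_velocity_variance(), "(target k_B T / m = 1.0)")
```

The agreement is statistical and carries a small time-step bias, so the result should only be expected to match the target to within a few per cent for these settings.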

It should be noted that the friction kernel is not in general independent of the reaction coordinate motion [17],
i.e. a nonlinear response, so the GLE may have a limited range of validity [18, 19 and 20]. Furthermore, even
if the equation is valid, the strength of the friction might be so great that the second and third terms on the
right-hand side of (A3.8.9) could dominate the dynamics much more so than the force generated by the PMF.
It should also be noted that, even though the friction in (A3.8.9) may be adequately approximated to be
dynamically independent of the value of the reaction coordinate, the equation is still in general nonlinear,
depending on the nature of the PMF. For non-quadratic forms of the PMF, $V_{\mathrm{eq}}(q)$, even the solution of the
reactive dynamics from the model perspective of the GLE becomes a non-trivial problem.

Two central results have arisen from the GLE-based perspective on the dynamical correction factor. The first 
is the Kramers theory of 1940 [3], based on the simpler Langevin equation, while the second is the Grote- 
Hynes theory of 1980 [4]. Both have been extensively discussed and reviewed in the literature [5, 9, 10, 11 
and 12]. The important insight of the Kramers theory is that the transmission coefficient for an isomerization 
or metastable escape reaction undergoes a 'turnover' as one increases the static friction from zero to large 
values. For weak damping (friction), the transmission coefficient is proportional to the friction, i.e. $\kappa \propto \hat{\eta}(0)$.
This dependence arises because the barrier recrossings are caused by the slow energy diffusion (equilibration)
in the reaction coordinate motion as it leaves the barrier region. For strong damping, on the other hand, the
transmission coefficient is inversely proportional to the friction, i.e. $\kappa \propto 1/\hat{\eta}(0)$, because the barrier crossings
are caused by the diffusive spatial motion of the reaction coordinate in the barrier region. For systems such as
atom exchange reactions that do not involve a bound reactant state, only the spatial diffusion regime is 
predicted. The basic phenomenology of condensed phase activated rate processes, as mapped out by Kramers, 
captures the essential physics of the problem and remains the seminal work to this day. 
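The strong-damping behaviour can be illustrated with the well-known closed form of Kramers' result for the moderate-to-strong friction branch, $\kappa_{\mathrm{Kr}} = [1 + (\gamma/2\omega_b)^2]^{1/2} - \gamma/2\omega_b$ with $\gamma = \hat{\eta}(0)/m$. This expression is quoted from standard treatments of Kramers theory rather than from the text above, and it does not describe the weak-damping (energy diffusion) side of the turnover. The sketch below, with illustrative names and reduced units, shows the approach to $\kappa \propto 1/\hat{\eta}(0)$.

```python
import math

def kramers_kappa_spatial_diffusion(gamma, omega_b):
    """Kramers transmission coefficient, moderate-to-strong friction branch.

    Written as 1 / (x + sqrt(1 + x^2)) with x = gamma / (2 omega_b), which is
    algebraically equal to sqrt(1 + x^2) - x but avoids cancellation at large x.
    """
    x = gamma / (2.0 * omega_b)
    return 1.0 / (x + math.sqrt(1.0 + x * x))

for gamma in (0.0, 1.0, 10.0, 100.0):  # friction per unit mass, reduced units
    kappa = kramers_kappa_spatial_diffusion(gamma, 1.0)
    print(gamma, kappa, "kappa * gamma / omega_b =", kappa * gamma)
```

At large friction the product $\kappa\gamma/\omega_b$ tends to unity, which is the inverse proportionality described above; at zero friction this branch returns the TST value $\kappa = 1$.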


The second key insight into the dynamical corrections to the TST was provided by the Grote-Hynes theory
[4]. This theory highlights the importance of the time dependence of the friction and demonstrates how it may
be taken into account at the leading order. In the overdamped regime this is done through the insightful and
compact Grote-Hynes (subscript GH) formula for the transmission coefficient [4]:

$$\kappa_{\mathrm{GH}} = \frac{\lambda_0^\ddagger}{\omega_b} = \frac{\omega_b}{\lambda_0^\ddagger + \hat{\eta}(\lambda_0^\ddagger)/m} \tag{A3.8.12}$$

where $\hat{\eta}(z)$ is the Laplace transform of the friction kernel, i.e. $\hat{\eta}(z) = \int_0^\infty \mathrm{d}t\, \mathrm{e}^{-zt}\, \eta(t)$, $\lambda_0^\ddagger$ is the
reactive frequency fixed self-consistently by the second equality, and $\omega_b$ is the
magnitude of the unstable PMF barrier frequency. Importantly, the derivation of this formula assumes a
quadratic approximation to the barrier, $V_{\mathrm{eq}}(q) \approx V_{\mathrm{eq}}(q^*) - \tfrac{1}{2} m \omega_b^2 (q - q^*)^2$, that may not always be a good
one.
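The self-consistent character of the Grote-Hynes formula is easy to see numerically. The following is a minimal sketch, assuming a hypothetical exponential memory kernel $\eta(t) = (\eta_0/\tau)\, \mathrm{e}^{-t/\tau}$, whose Laplace transform is $\hat{\eta}(z) = \eta_0/(1 + z\tau)$. The kernel, parameter values and function names are illustrative assumptions. The reactive frequency $\lambda$ is found by bisection on $\lambda[\lambda + \hat{\eta}(\lambda)/m] = \omega_b^2$.

```python
import math

def eta_hat_exponential(z, eta0, tau):
    """Laplace transform of the assumed kernel (eta0/tau) * exp(-t/tau)."""
    return eta0 / (1.0 + z * tau)

def grote_hynes_kappa(omega_b, mass, eta_hat, tol=1e-12):
    """Solve lambda * (lambda + eta_hat(lambda)/mass) = omega_b^2 by bisection.

    For a non-negative friction transform that makes the left-hand side
    increase with lambda (true for the kernel above), the root lies in
    [0, omega_b] and is unique.
    """
    def residual(lam):
        return lam * (lam + eta_hat(lam) / mass) - omega_b ** 2

    lo, hi = 0.0, omega_b
    while hi - lo > tol * omega_b:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi) / omega_b

# Reduced units: omega_b = 1, m = 1, static friction eta0 = 2 (assumed values).
kappa_markov = grote_hynes_kappa(1.0, 1.0, lambda z: eta_hat_exponential(z, 2.0, 0.0))
kappa_memory = grote_hynes_kappa(1.0, 1.0, lambda z: eta_hat_exponential(z, 2.0, 1.0))
print("tau = 0:", kappa_markov, " tau = 1:", kappa_memory)
```

With $\tau = 0$ the friction is frequency independent and the solution reduces to the Kramers spatial-diffusion value; a finite memory time weakens the friction felt at the barrier frequency and raises $\kappa_{\mathrm{GH}}$ toward unity.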

Research over the past decade has demonstrated that a multidimensional TST approach can also be used to
calculate an even more accurate transmission coefficient than $\kappa_{\mathrm{GH}}$ for systems that can be described by the
full GLE with a non-quadratic PMF. This approach has allowed for variational TST improvements [21] of the
Grote-Hynes theory in cases where the nonlinearity of the PMF is important and/or for systems which have
general nonlinear couplings between the reaction coordinate and the bath force fluctuations. The Kramers
turnover problem has also been successfully treated within the context of the GLE and the multidimensional
TST picture [22]. A multidimensional TST approach has even been applied [15] to a realistic model of an S$_{\mathrm{N}}$2
reaction and may prove to be a promising way to elaborate the explicit microscopic origins of solvent friction.

While there has been great progress toward an understanding and quantification of the dynamical corrections
to the TST rate constant in the condensed phase, there are several quite significant issues that remain largely
open at the present time. For example, even if the GLE were a valid model for calculating the dynamical
corrections, it remains unclear how an accurate and predictive microscopic theory can be developed for the
friction kernel $\eta(t)$ so that one does not have to resort to a molecular dynamics simulation [17] to calculate
this quantity. Indeed, if one could compute the solvent friction along the reaction coordinate in such a manner,
one could instead just calculate the exact rate constant using the reactive-flux formalism. A microscopic theory
for the friction is therefore needed to relate
the friction along the reaction coordinate to the parameters varied by experimentalists such as pressure or
solvent viscosity. No complete test of Kramers theory will ever be possible until such a theoretical effort is
completed. Two possible candidates in the latter vein are the instantaneous normal mode theory of liquids [23]
and the damped normal mode theory [24] for liquid state dynamics.

Another key issue remaining to be resolved is whether a one-dimensional GLE as in (A3.8.11) is the optimal
choice of a dynamical model in the case of strong damping, or whether a two- or multi-dimensional GLE that 
explicitly includes coupling to solvation and/or intramolecular modes is more accurate and/or more insightful. 
Such an approach might, for example, allow better contact with nonlinear optical experiments that could 
measure the dynamics of such additional modes. It is also entirely possible that the GLE may not even be a 
good approximation to the true dynamics in many cases because, for example, the friction strongly depends 
on the position of the reaction coordinate. In fact, a strong solvent modification of the PMF usually ensures 
that the friction will be spatially dependent [25]. Several analytical studies have dealt with this issue (see, for 
example, [26, 27 and 28] and literature cited therein). Spatially-dependent friction is found to have an 
important effect on the dynamical correction in some instances, but in others the Grote-Hynes estimate is 
predicted to be robust [29]. Nevertheless, the question of the nonlinearity and the accurate modelling of real 
activated rate processes by the GLE remains an open one. 

Another important issue, identified by several authors [30, 31], involves the participation of
intramolecular solute modes in defining the range of the energy diffusion-limited regime of condensed phase
activated dynamics. In particular, if the coupling between the reaction coordinate and such modes is strong, 
then the Kramers turnover behaviour as a function of the solvent friction occurs at a significantly lower value 
of the friction than for the simple case of the reaction coordinate coupled to the solvent bath alone. In fact,
whether the turnover can be experimentally observed at all in the condensed phase hinges on this
point. To date, it has remained a challenge to calculate the effective number of intramolecular modes that are
strongly coupled to the reaction coordinate; no general theory yet exists to accomplish this important goal. 

As a final point, it should again be emphasized that many of the quantities that are measured experimentally, 
such as relaxation rates, coherences and time-dependent spectral features, are complementary to the thermal 
rate constant. Their information content in terms of the underlying microscopic interactions may only be 
indirectly related to the value of the rate constant. A better theoretical link is clearly needed between 
experimentally measured properties and the common set of microscopic interactions, if any, that also affect 
the more traditional solution phase chemical kinetics. 


A3.8.4 QUANTUM ACTIVATED RATE PROCESSES AND SOLVENT 
EFFECTS 

The discussion thus far in this chapter has been centred on classical mechanics. However, in many systems, an
explicit quantum treatment is required (not to mention the fact that quantum mechanics is the correct law of physics). This
statement is particularly true for proton and electron transfer reactions in chemistry, as well as for reactions 
involving high-frequency vibrations. 

The exact quantum expression for the activated rate constant was first derived by Yamamoto [6]. The
resulting quantum reactive flux correlation function expression is given by
where $h_P(t)$ is the Heisenberg product state population operator. As opposed to the classical case, however,
the $t \to 0^+$ limit of this expression is always equal to zero [32], which means that an entirely different
approach from the classical analysis must be adopted in order to formulate a quantum TST (QTST), as well as 
a theory for its dynamical corrections. An article by Truhlar et al [13] describes many of the efforts over the 
past 60 years to develop quantum versions of the TST, and many, if not most, of these efforts have been 
applicable to primarily low-dimensional gas phase systems. A QTST that is useful for condensed phase 
reactions is an extremely important theoretical goal since a direct numerical attack on the time-dependent 
Schrödinger equation for many-body systems is computationally prohibitive, if not impossible. (The latter fact
seems to be true in the fundamental sense, i.e. there is an exponential scaling of the numerical effort with 
system size for the exact solution.) In this section, some of the leading candidates for a viable condensed 
phase QTST will now be briefly described. The discussion should by no means be considered complete. 

As a result of several complementary theoretical efforts, primarily the path integral centroid perspective [33,
34 and 35], the periodic orbit [36] or instanton [37] approach and the 'above crossover' quantum activated
rate theory [38], one possible candidate for a unifying perspective on QTST has emerged [39] from the ideas
in [39, 40, 41 and 42]. In this theory, the QTST expression for the forward rate constant is expressed as
[39]

$$k_f^{\mathrm{QTST}} = \nu\, \frac{Q_b}{Q_R} \tag{A3.8.14}$$

where $\nu$ is a simple frequency factor, $Q_R$ is the reactant partition function, and $Q_b$ is the barrier 'partition
function' which is to be interpreted in the appropriate asymptotic limit [39, 40, 41 and 42]. The frequency
factor has the piecewise continuous form [39]


$$\nu = \begin{cases} 1/(\hbar\beta), & \hbar\beta\lambda_0^\ddagger \ge 2\pi \\ \lambda_0^\ddagger/(2\pi), & \hbar\beta\lambda_0^\ddagger < 2\pi \end{cases} \tag{A3.8.15}$$

while the barrier partition function is defined under most conditions as [39]
The quantity $\rho_c(q_c)$ is the Feynman path integral centroid density [43] that is understood to be expressed
asymptotically as

$$\rho_c(q_c) \approx \rho_c(q^*) \exp[-\beta V_c''(q^*)(q_c - q^*)^2/2] \tag{A3.8.17}$$

where the quantum centroid potential of mean force is given by $V_c(q_c) = -k_{\mathrm{B}}T \ln[\rho_c(q_c)]$ + constant and q* is
defined to be the value of the reaction coordinate that gives the maximum value of $V_c(q)$ in the barrier region (i.e. it
may differ [33, 35] from the maximum of the classical PMF along q). The path integral centroid density along
the reaction coordinate is given by the Feynman path integral expression


$$\rho_c(q_c) = \int \cdots \int \mathcal{D}q(\tau)\, \mathcal{D}\mathbf{x}(\tau)\, \delta(q_c - \tilde{q}_0)\, \exp\{-S[q(\tau), \mathbf{x}(\tau)]/\hbar\} \tag{A3.8.18}$$


which is a functional integral over all possible cyclic paths of the system coordinates weighted by the 
imaginary time action function [43]: 




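$$S[q(\tau), \mathbf{x}(\tau)] = \int_0^{\hbar\beta} \mathrm{d}\tau \left\{ \frac{m}{2}\, \dot{q}(\tau)^2 + \sum_i \frac{m_i}{2}\, \dot{x}_i(\tau)^2 + V[q(\tau), \mathbf{x}(\tau)] \right\} \tag{A3.8.19}$$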


The key feature of (A3.8.18) is that the centroids of the reaction coordinate Feynman paths are constrained to
be at the position $q_c$. The centroid $\tilde{q}_0$ of a particular reaction coordinate path $q(\tau)$ is given by the
zero-frequency Fourier mode, i.e.

$$\tilde{q}_0 = \frac{1}{\hbar\beta} \int_0^{\hbar\beta} \mathrm{d}\tau\, q(\tau). \tag{A3.8.20}$$

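In numerical path integral work the cyclic path $q(\tau)$ is usually represented by a finite number of imaginary time slices ('beads'), in which case the centroid integral becomes a simple average. The sketch below uses an arbitrary illustrative path, not data from any calculation. It computes the centroid of a discretized cyclic path and confirms that it coincides with the zero-frequency coefficient of the discrete Fourier expansion.

```python
import cmath
import math

def path_centroid(beads):
    """Discretized centroid: (1/P) * sum_j q(tau_j) for P equally spaced slices."""
    return sum(beads) / len(beads)

def fourier_mode(beads, k):
    """Normalized discrete Fourier coefficient of a cyclic path."""
    n_beads = len(beads)
    total = sum(q * cmath.exp(-2j * math.pi * k * j / n_beads)
                for j, q in enumerate(beads))
    return total / n_beads

# Illustrative cyclic path (assumption): a constant offset plus two oscillating modes.
N_BEADS = 32
beads = [0.3
         + 0.5 * math.cos(2.0 * math.pi * j / N_BEADS)
         + 0.2 * math.sin(4.0 * math.pi * j / N_BEADS)
         for j in range(N_BEADS)]

print("centroid:", path_centroid(beads))
print("zero-frequency mode:", fourier_mode(beads, 0))
```

Only the constant offset survives the average; the oscillating modes describe fluctuations of the path about its centroid and integrate to zero over one period.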
Under most conditions, the sign of $V_c''(q^*)$ in (A3.8.17) is negative. In such cases, the centroid variable
naturally appears in the theory [39], and the equation for the quantum thermal rate constant from
(A3.8.14)-(A3.8.17) is then given by [39]

$$k_f^{\mathrm{QTST}} = \nu\, \frac{\sqrt{2\pi/\beta |V_c''(q^*)|}}{Q_R}\, \exp[-\beta V_c(q^*)]. \tag{A3.8.21}$$

It should be noted that in the cases where $V_c''(q^*) > 0$, the centroid variable becomes irrelevant to the
quantum activated dynamics as defined by (A3.8.14) and the instanton approach [37] to evaluate $Q_b$ based on
the steepest descent approximation to the path integral becomes the approach one may take. Alternatively, one
may seek a more generalized saddle point coordinate about which to evaluate (A3.8.14). This approach has also
been used to provide a unified solution for the thermal rate constant in systems influenced by non-adiabatic
effects, i.e. to bridge the adiabatic and non-adiabatic (Golden Rule) limits of such reactions.

In the limit of reasonably high temperatures (above the so-called 'crossover' temperature), i.e. $\hbar\beta\lambda_0^\ddagger < 2\pi$, the
above formula in (A3.8.21) is best simplified further and approximately written as

$$k_f^{\mathrm{QTST}} \approx \kappa_{\mathrm{GH}}\, \frac{(2\pi m \beta)^{-1/2}}{Q_R}\, \exp[-\beta V_c(q^*)]. \tag{A3.8.22}$$

This formula, aside from the prefactor $\kappa_{\mathrm{GH}}$, is often referred to as the path integral quantum transition state
theory (PI-QTST) formula [33]. One strength of this formula is its clear analogy with, and generalization
of, the classical TST formula in (A3.8.6). In turn, this allows for an interpretation of solvent effects on quantum
activated rate constants in terms of the quantum centroid potential of mean force in a fashion analogous to the
classical case. The quantum activation free energy for highly non-trivial systems can also be directly
calculated with imaginary time path integral Monte Carlo techniques [44]. Many such studies have now been
carried out, but a single example will be described in the following section.
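The crossover condition $\hbar\beta\lambda < 2\pi$ is equivalent to $T > T_c = \hbar\lambda/(2\pi k_{\mathrm{B}})$, where $\lambda$ is the relevant angular barrier frequency. As a rough guide to the temperatures involved, the sketch below evaluates $T_c$ for a barrier frequency specified as a wavenumber. The example wavenumber is a hypothetical input, and the function names are illustrative.

```python
import math

PLANCK_H = 6.62607015e-34       # Planck constant, J s
LIGHT_SPEED_CM = 2.99792458e10  # speed of light, cm s^-1
BOLTZMANN = 1.380649e-23        # Boltzmann constant, J K^-1

def crossover_temperature(wavenumber_cm):
    """T_c = hbar * lambda / (2 pi k_B), with lambda = 2 pi c * wavenumber."""
    return PLANCK_H * LIGHT_SPEED_CM * wavenumber_cm / (2.0 * math.pi * BOLTZMANN)

def above_crossover(temperature_k, wavenumber_cm):
    """True when hbar * beta * lambda < 2 pi, i.e. when T exceeds T_c."""
    return temperature_k > crossover_temperature(wavenumber_cm)

print("T_c for an assumed 1000 cm^-1 barrier frequency:", crossover_temperature(1000.0), "K")
print("300 K above crossover?", above_crossover(300.0, 1000.0))
```

For a barrier frequency of this magnitude the crossover lies somewhat below room temperature, so the high-temperature form would apply at 300 K under this assumption; stiffer barriers push the crossover upward.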

The preceding discussion has focused on the path integral centroid picture of condensed phase quantum 
activated dynamics, primarily because of its strong analogy with the classical case, the PMF, etc, as well as its 
computational utility for realistic problems. However, several recent complementary developments must be 
mentioned. The first is due to Pollak, Liao and Shao [45], who have significantly extended an earlier idea [30] in
which the exact Heisenberg population operator $h_P(t)$ in (A3.8.13) is replaced by one for a parabolic barrier
(plus some other important manipulations, such as symmetrization of the flux operator, that were not done in
[30]). The dynamical population operator then has an analytic form which in turn leads one to a purely analytic
'quantum transition state theory' approximation to (A3.8.13). This approach, which in principle can be
systematically improved upon through perturbation theory, has been demonstrated to be as accurate as the
path integral centroid-based formulae in (A3.8.21) and (A3.8.22) above the crossover temperature.

A second recent development has been the application [46] of the initial value representation [47] to
semiclassically calculate (A3.8.13) (and/or the equivalent time integral of the 'flux-flux' correlation function).
While this approach has to date only been applied to problems with simplified harmonic baths, it shows 
considerable promise for applications to realistic systems, particularly those in which the real solvent 'bath' 
may be adequately treated by a further classical or quasiclassical approximation. 


A3.8.5 SOLVENT EFFECTS IN QUANTUM CHARGE TRANSFER 
PROCESSES 

In this section, the results of a computational study [48] will be used to illustrate the effects of the solvent, and
the significant complexity of these effects, in quantum charge transfer processes. The particular example
described here is for a 'simple' model proton transfer reaction in a polar solvent. This study, while useful
in its own right, also illustrates the level of detail and theoretical formalism that is likely to be necessary in the 
future to accurately study solvent effects in condensed phase charge transfer reactions, even at the equilibrium 
(quantum PMF) level. 

Some obvious targets for quantum activated rate studies are proton, hydride, and hydrogen transfer reactions 
because they are of central importance in solution phase and acid-base chemistry, as well as in
biochemistry. These reactions are particularly interesting because they can involve large quantum mechanical 
effects and, since there is usually a redistribution of solute electronic charge density during the reaction, a 
substantial contribution to the activation free energy may have its origin from the solvent reorganization 
process. It is thought that intramolecular vibrations may also play a crucial role in modulating the reactive 
process by lowering the intrinsic barrier for the reaction. 

Many of the condensed phase effects mentioned above have been studied computationally using the PI-QTST
approach outlined in the first part of the last section. One such study [48] has focused on a model symmetric
three-body (A-H-A) proton transfer reaction in a polar fluid with its dipole moment chosen to model methanol. The
molecular group 'A' represents a generic proton donor/acceptor group.

After some straightforward manipulations of (A3.8.22), the PI-QTST estimate of the proton transfer rate
constant can be shown to be given by [48]

$$k_f^{\mathrm{PI\text{-}QTST}} = \frac{\omega_c}{2\pi} \exp(-\beta \Delta F_c^*) \tag{A3.8.23}$$

where $\omega_c = [V_c''(q_0)/m]^{1/2}$ and the quantum activation free energy is given by [48]

$$\Delta F_c^* = -k_{\mathrm{B}}T \ln[\rho_c(q^*)/\rho_c(q_0)] = -k_{\mathrm{B}}T \ln[P_c(q_0 \to q^*)]. \tag{A3.8.24}$$

The probability $P_c(q_0 \to q^*)$ to move the reaction coordinate centroid variable from the reactant configuration
to the transition state is calculated [48] by path integral Monte Carlo techniques [44] combined with umbrella
sampling [48, 49]. From the calculations on the model proton transfer system above, the quantum activation
free energy curves are shown in figure A3.8.3 for both a rigid and non-rigid (vibrating) intra-complex A-A
(donor/acceptor) distance. Shown are both the activation curves for the complex in isolation and in the
solvent. The effect of the solvent on the total activation free energy is immediately obvious, contributing 2-4
kcal mol$^{-1}$ to its overall value. One effect of the A-A distance fluctuations is a lowering of the quantum
activation free energy (i.e. increased tunnelling) both when the solvent is present and when it is not. A second 
interesting effect becomes evident from a comparison of the curves for the systems with the rigid versus 
flexible A-A distance. The contribution to the quantum activation free energy from the solvent is reduced 
when the A-A distance can fluctuate, resulting in a rate that is 20 times higher than in the rigid case. This 
novel behaviour was found to arise from a nonlinear coupling between the intra-complex fluctuations and the 
solvent activation, resulting in a reduced dipole moment of the solute when there is an inward fluctuation of 
the A-A distance. 
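The relations between the centroid probability, the quantum activation free energy and the rate are simple enough to evaluate directly. The following is a minimal sketch in molar units (so that R replaces $k_{\mathrm{B}}$). The temperature of 300 K and the probability value are assumptions made for illustration only, since neither is stated in this section, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def activation_free_energy(centroid_probability, temperature_k):
    """Quantum activation free energy, -RT ln P_c, in kcal mol^-1."""
    return -R_KCAL * temperature_k * math.log(centroid_probability)

def pi_qtst_rate(omega_c, delta_f_kcal, temperature_k):
    """PI-QTST rate (omega_c / 2 pi) * exp(-Delta F / RT), in the units of omega_c."""
    return omega_c / (2.0 * math.pi) * math.exp(-delta_f_kcal / (R_KCAL * temperature_k))

def free_energy_gap_for_rate_ratio(rate_ratio, temperature_k):
    """Free energy difference (kcal mol^-1) producing a given ratio of rates."""
    return R_KCAL * temperature_k * math.log(rate_ratio)

T_ASSUMED = 300.0  # assumed temperature, K
delta_f = activation_free_energy(1.0e-5, T_ASSUMED)  # hypothetical P_c
print("Delta F for P_c = 1e-5:", delta_f, "kcal/mol")
print("gap for a 20-fold rate change:", free_energy_gap_for_rate_ratio(20.0, T_ASSUMED), "kcal/mol")
```

Under the assumed temperature, a 20-fold rate enhancement at fixed prefactor corresponds to a lowering of the activation free energy by somewhat under 2 kcal mol$^{-1}$, comparable in size to the solvent contribution quoted above.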
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Figure A3.8.3 Quantum activation free energy curves calculated for the model A-H-A proton transfer 
reaction described in [45]. The full line is for the classical limit of the proton transfer solute in isolation, while the
other curves are for different fully quantized cases. The rigid curves were calculated by keeping the A-A 
distance fixed. An important feature here is the direct effect of the solvent activation process on both the 
solvated rigid and flexible solute curves. Another feature is the effect of a fluctuating A-A distance which 
both lowers the activation free energy and reduces the influence of the solvent. The latter feature enhances the 
rate by a factor of 20 over the rigid case. 

From the above PI-QTST studies, it was found that, in order to fully quantify the solvent effects for even a 
'simple' model proton transfer reaction, one must deal with a number of complex, nonlinear interactions. 
Examples of other such interactions include the nonlinear dependence of the solute dipole on the position of 
the proton and the intrinsically nonlinear interactions arising from both solute and solvent polarizability 
effects [48]. In the latter context, it was found that the solvent electronic polarizability modes must be treated
quantum mechanically when studying their influence on the proton transfer activation free energy [48]. (In
general, the adequate treatment of electronic polarizability in a variety of condensed phase contexts is
emerging as an extremely important problem; condensed phase reactions may never be
properly described until this problem is addressed.) The detailed calculations described above, while only for
a model proton transfer system, clearly illustrate the significant challenge that lies ahead for those who hope
to quantitatively predict the rates of condensed phase chemical reactions through computer simulation.


A3.8.6 CONCLUDING REMARKS 

In this chapter many of the basic elements of condensed phase chemical reactions have been outlined. Clearly, 
the material presented here represents just an overview of the most important features of the problem. There is 
an extensive literature on all of the issues described herein and, more importantly, there is still much work to 
be done before a complete understanding of the effects of condensed phase environments on chemical 
reactions can be achieved. The theorist and experimentalist alike can therefore look forward to many more 
years of exciting and challenging research in this important area of physical chemistry.

A3.9 Molecular reaction dynamics: surfaces

George R Darling, Stephen Holloway and Charles Rettner 


A3.9.1 INTRODUCTION 

Molecular reaction dynamics is concerned with understanding elementary chemical reactions in terms of the 
individual atomic and molecular forces and the motions that occur during the process of chemical change. In 
gas phase and condensed phase reactions (discussed in section A3.7 and section A3.8) the reactants, products
and all intermediates are in the same phase. This 'reduces' the complexity of such systems such that we need 
'only' develop experimental and theoretical tools to treat one medium. In a surface reaction, the reactants 
derive from the gas phase, to which the products may or may not return, but the surface is a condensed phase 
exchanging energy with reactants and products and any intermediates in a nontrivial fashion. The electronic 
states of the surface may also play a role by changing the bonding within and between the various species, 
affecting the reaction as a heterogeneous catalyst (see section A3.10). Of course, the surface itself may be one
of the reactants, as in the etching of silicon surfaces by halide molecules. Indeed, it might be argued that if the
reactants achieve thermal equilibrium with the surface, they have become part of a new surface, with
properties differing from those of the clean surface. 

An individual surface reaction may be the result of several steps occurring on very different timescales. For 
example, a simple bimolecular reaction A (gas) + B (gas) → AB (gas) might proceed as follows: A strikes the
surface, losing enough energy to stick (i.e. adsorb), B also adsorbs, A and B diffuse across the surface and 
meet to form AB, after some time AB acquires enough energy to escape (i.e. desorb) from the surface. Each 
part of this schematic process is itself complicated. In the initial collisions with the surface, the molecules can 
lose or gain energy, and this can be translational energy (i.e. from the centre-of-mass motion) or internal 
(rotational, vibrational etc) energy, or both. Internal energy can be exchanged for translational energy, or vice 
versa, or the molecule can simply fragment on impact. Thermalization (i.e. the attainment of thermal 
equilibrium with the surface) is a slower process, requiring possibly tens of bounces of the molecule on the 
surface. The subsequent diffusion of A and B towards each other is even slower, while the desorption of the 
product AB might occur as soon as it is formed, leaving the molecule with some of the energy released in the 
association step. 

Why should we be interested in the dynamics of such complex systems? Apart from the intellectual rewards 
offered by this field, understanding reactions at surfaces can have great practical and economic value.
Gas-surface chemical reactions are employed in numerous processes throughout the chemical and electronic
industries. Heterogeneous catalysis lies at the heart of many synthetic cycles, and etching and deposition are 
key steps in the fabrication of microelectronic components. Gas-surface reactions also play an important role 
in the environment, from acid rain to the chemistry of the ozone hole. Energy transfer at the gas-surface 
interface influences flight, controls spacecraft drag, and determines the altitude of a slider above a computer 
hard disk. Any detailed understanding of such processes needs to be built on fundamental knowledge of the 
dynamics and kinetics at the molecular level. 

For any given gas-surface reaction, the various elementary steps of energy transfer, adsorption, diffusion, 
reaction and desorption are inextricably linked. Rather than trying to study all together in a single system 
where they cannot easily be untangled, most progress has been made by probing the individual steps in 
carefully chosen systems [1, 2 and 3]. 


For example, energy transfer in molecule-surface collisions is best studied in nonreactive systems, such as the 
scattering and trapping of rare-gas atoms or simple molecules at metal surfaces. We follow a similar approach 
below, discussing the dynamics of the different elementary processes separately. The surface must also be 
'simplified' compared to technologically relevant systems. To develop a detailed understanding, we must 
know exactly what the surface looks like and of what it is composed. This requires the use of surface science 
tools (section B1.19-26) to prepare very well-characterized, atomically clean and ordered substrates on which
reactions can be studied under ultrahigh vacuum conditions. The most accurate and specific experiments also
employ molecular beam techniques, discussed in section B2.3.


A3.9.2 REACTION MECHANISMS 

The basic paradigms of surface reaction dynamics originate in the pioneering studies of heterogeneous 
catalysis by Langmuir [4, 5 and 6]. Returning to our model bimolecular reaction A (gas) + B (gas) → AB
(gas), let us assume first that A adsorbs on, and comes into thermal equilibrium with, the surface. We 
categorize the reaction according to the behaviour of molecule B. For most surface reactions, B adsorbs and 
thermalizes on the surface before meeting and reacting with A, by way of a Langmuir-Hinshelwood 
mechanism. However, in some systems, AB can only be formed as a result of a direct collision of the 
incoming B with the adsorbed A. Such reactions, which are discussed in further detail in section A3.9.6, are
said to occur by an Eley-Rideal mechanism. A schematic illustration of these processes is shown in figure
A3.9.1.

For a Langmuir-Hinshelwood reaction, we can expect the surface temperature to be an important variable 
determining overall reactivity because it determines how fast A and B diffuse across the surface. If the 
product AB molecules thermalize before desorption, the distribution of internal and translational energies in 
the gas phase will also reflect the surface temperature (yielding Boltzmann distributions that are modified by a 
dynamical factor related by the principle of detailed balance to the energetics of adsorption [7]). The main 
factor discriminating between the reaction schemes is that for a Langmuir-Hinshelwood reaction, the AB 
molecule can have no memory of the initial state and motion of the B molecule, but these should be evident in 
the AB products if the mechanism is of the Eley-Rideal type. These simple divisions are of course too black 
and white. In all probability, the two paradigms are actually extremes, with real systems reflecting aspects of 
both mechanisms [8]. 
Figure A3.9.1. Schematic illustrations of (a) the Langmuir-Hinshelwood and (b) Eley-Rideal mechanisms in
gas-surface dynamics.


A3.9.3 COLLISION DYNAMICS AND TRAPPING IN NONREACTIVE 
SYSTEMS 

As with any collision process, to understand the dynamics of collisions we need an appreciation of the 
relevant forces and masses. Far from the surface, the incoming atom or molecule will experience the van der 
Waals attraction of the form 


$$V(z) = -\frac{C}{z^{3}} \qquad \text{(A3.9.1)}$$


where z is the distance from the surface, and C is a constant dependent on the polarizability of the particle and 
the dielectric properties of the solid [9]. Close to the surface, where z is 0.1 nm for a nonreactive system, this 
attractive interaction is overwhelmed by repulsive forces (Pauli repulsion) due to the energy cost of 
orthogonalizing the overlapping electronic orbitals of the incoming molecule and the surface. The net result of 
van der Waals attraction and Pauli repulsion is a potential with a shallow well, the physisorption well, 
illustrated in figure A3.9.2. The depth of this well ranges from a few meV for He adsorption to ~30 meV for
H₂ molecules on noble metal surfaces, and to ~100 meV for Ar or Xe on metal surfaces.
Figure A3.9.2. Interaction potential for an atom or molecule physisorbed on a surface. A convenient model is
obtained by 'squaring off' the potential, which facilitates solution of the Schrödinger equation for the
scattering of a quantum particle.

The van der Waals attraction arises from the interaction between instantaneous charge fluctuations in the 
molecule and surface. The molecule interacts with the surface as a whole. In contrast the repulsive forces are 
more short-range, localized to just a few surface atoms. The repulsion is, therefore, not homogeneous but 
depends on the point of impact in the surface plane, that is, the surface is corrugated. 

A3.9.3.1 BINARY COLLISION (HARD-CUBE) MODEL 

We can obtain an approximate description of the molecule-surface encounter using a binary collision model, 
with the projectile of mass m as one collision partner and a 'cube' having an effective mass, M, related to that 
of a surface atom, as the other partner. M depends on how close the projectile approaches the atoms of the 
surface, on the stiffness of the surface, and on the degree of corrugation of the repulsive potential. For rare- 
gas atoms interacting with metal surfaces, the surface electronic orbitals are delocalized and the repulsive 
interaction is effectively with a large cluster. The effective mass is correspondingly large, some 3-9 times the 
mass of a surface atom [10]. In other cases, such as 2 colliding with Ag(111), the degree of corrugation and
the effective mass may be closer to that expected for one atom [11]. 


Approximating the real potential by a square well and an infinitely hard repulsive wall, as shown in figure A3.9.2,
we obtain the hard cube model. For a well depth of W, conservation of energy and momentum lead [11, 12] to
the very useful Baule formula for the translational energy loss, δE, to the substrate

$$\delta E = \frac{4\mu}{(1+\mu)^{2}}\,(E + W) \qquad \text{(A3.9.2)}$$


where E is the initial translational energy of the projectile and μ is the ratio m/M. This formula shows us that
the energy transfer increases with the mass of the projectile, reaching a maximum when projectile and cube 
masses are equal. 
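The dependence on the mass ratio can be checked numerically. The following is a minimal sketch of equation (A3.9.2), not part of the original treatment: the function name and the illustrative energies (in meV) are assumptions chosen only to show that the loss peaks at μ = 1.

```python
def baule_energy_loss(E, W, mu):
    """Hard-cube (Baule) energy loss, delta_E = 4*mu/(1+mu)**2 * (E + W).

    E  : initial translational energy of the projectile
    W  : depth of the square well (same units as E)
    mu : mass ratio m/M of projectile to effective cube mass
    """
    if mu <= 0:
        raise ValueError("mass ratio mu = m/M must be positive")
    return 4.0 * mu / (1.0 + mu) ** 2 * (E + W)


if __name__ == "__main__":
    E, W = 50.0, 100.0  # meV, hypothetical illustrative values
    for mu in (0.1, 0.25, 0.5, 1.0):
        loss = baule_energy_loss(E, W, mu)
        print(f"mu = {mu:4.2f}: delta_E = {loss:6.1f} meV")
```

At μ = 1 the prefactor equals unity, so the projectile hands over all of the energy E + W that it carries at the moment of impact.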


Of course the real projectile-surface interaction potential is not infinitely hard (cf figure A3.9.2). As E
increases, the projectile can penetrate deeper into the surface, so that at its turning point (where it momentarily
stops before reversing direction to return to the gas phase), an energetic projectile interacts with fewer surface
atoms, thus making the effective cube mass smaller. Thus, we expect δE/E to increase with E (and also with
W since the well accelerates the projectile towards the surface). 

The effect of surface temperature, T_s, can be included in this model by allowing the cube to move [12]. E
becomes the translational energy in the frame of the centre-of-mass of projectile and cube; then we average
the results over E, weighting with a Boltzmann distribution at T_s. This causes δE to decrease with increasing
T_s, and when the thermal energy of the cube, kT_s, substantially exceeds E, the projectile actually gains energy
in the collision! This is qualitatively consistent with experimental observations of the scattering of beams of
rare-gas atoms from metal surfaces [14, 15].

A3.9.3.2 SCATTERING AND TRAPPING-DESORPTION DISTRIBUTIONS 

Projectiles leaving the surface promptly after an inelastic collision have exchanged energy with the surface, 
yet their direction of motion and translational and internal energies are clearly related to their initial values. 
This is called direct-inelastic (DI) scattering. At low E, the projectile sees a surface in thermal motion. This 
motion dominates the final energy and angular distributions of the scattering, and so this is referred to as 
thermal scattering. As E becomes large, the projectile penetrates the surface more deeply, seeing more of the 
detailed atomic structure, and the interaction comes to be dominated by scattering from individual atoms. 
Eventually E becomes so large that the surface thermal motion becomes negligible, and the energy and 
angular distributions depend only on the atomic structure of the surface. This is known as the
structure-scattering regime. Comparing experimental results with those of detailed classical molecular dynamics
modelling of these phenomena can allow one to construct good empirical potentials to describe the
projectile-surface interaction, as has been demonstrated for the Xe/Pt(111) [16] and Ar/Ag(111) [17] systems.

From equation (A3.9.2), we can see that at low E, the acceleration into the well dominates the energy loss,
that is, δE does not reduce to zero with decreasing E. Below a critical translational energy, given by

$$E_{c} = \frac{4\mu W}{(1-\mu)^{2}} \qquad \text{(A3.9.3)}$$

the projectile has insufficient energy remaining to escape from the well and it traps at the surface. Inclusion of 
surface temperature (cube motion) leads to a blurring of this cut-off energy so that trapping versus energy 
curves are predicted to be smoothed step functions. In fact, true trapping versus energy curves are closer to 
exponential in form, due to the combined effects of additional averaging over variations of ^with surface site 
and with the orientation of the incident molecule. Additionally, transfer of motion normal to the surface to 
motion parallel to the surface, or into internal motions (rotations) can also lead to trapping, as we shall discuss 
below. 
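The critical energy follows from requiring that the energy remaining after the collision, E − δE, be zero, with δE taken from the Baule formula; solving for E gives 4μW/(1 − μ)². The sketch below illustrates this sharp cut-off for a static cube only. The function names and the numerical values are assumptions, and the restriction to 0 < μ < 1 is an added simplification.

```python
def baule_energy_loss(E, W, mu):
    """Hard-cube energy loss, delta_E = 4*mu/(1+mu)**2 * (E + W)."""
    return 4.0 * mu / (1.0 + mu) ** 2 * (E + W)


def critical_trapping_energy(W, mu):
    """Energy below which a projectile traps on a static hard cube.

    Obtained by solving E - delta_E = 0 for E: E_c = 4*mu*W/(1 - mu)**2.
    Restricted here (an assumption of this sketch) to a projectile
    lighter than the cube, 0 < mu < 1.
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("this sketch assumes 0 < mu < 1")
    return 4.0 * mu * W / (1.0 - mu) ** 2


def is_trapped(E, W, mu):
    """True if the projectile keeps too little energy to leave the well."""
    return E - baule_energy_loss(E, W, mu) < 0.0


if __name__ == "__main__":
    W, mu = 100.0, 0.25  # meV and mass ratio, hypothetical values
    E_c = critical_trapping_energy(W, mu)
    print(f"critical energy: {E_c:.1f} meV")
    for E in (0.9 * E_c, 1.1 * E_c):
        print(f"E = {E:6.1f} meV -> trapped: {is_trapped(E, W, mu)}")
```

Averaging this step over a thermal distribution of cube velocities and over surface sites would smooth it in the manner described above; that averaging is not attempted here.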

Trapped molecules can return to the gas phase once the thermal energy, kT_s, becomes comparable to the well
depth. Having equilibrated with the surface, they have velocity, angular distribution and internal energies
determined by T_s. This is visible in experiment as a scattering component (the trapping-desorption (TD)
scattering component) with a very different appearance to the DI component, being peaked at and 
symmetrical about the surface normal, independently of the incidence conditions of the beam of projectiles. 


Such behaviour has been seen in many systems, for example in the scattering of Ar from Pt(111) [10] as
illustrated in figure A3.9.3.


Figure A3.9.3. Time-of-flight spectra for Ar scattered from Pt(111) at a surface temperature of 100 K [10].
Points in the upper plot are actual experimental data. Curve through points is a fit to a model in which the
bimodal distribution is composed of a sharp, fast moving (hence short flight time), direct-inelastic (DI)
component and a broad, slower moving, trapping-desorption (TD) component. These components are shown
separately in the lower curves. Parameters: E_i = 12.5 kJ mol⁻¹; θ_i = 60°; θ_f = 40°; T_s = 100 K.

A3.9.3.3 SELECTIVE ADSORPTION 

Light projectiles impinging on a cold surface exhibit strong quantum behaviour in the scattering and trapping 
dynamics. Motion in the physisorption well is quantized normal to the surface, as indicated in figure A3.9.2.
Although in the gas phase the projectile can have any parallel momentum, when interacting with a perfect
surface, the parallel momentum can only change by whole numbers of reciprocal lattice vectors (the
wavevectors corresponding to wavelengths fitting within the surface lattice) [9]. The scattering is thus into
special directions, forming a diffraction pattern, which is evident even for quite massive particles such as Ar 
[18]. These quantizations couple to yield maxima in the trapping probability when, to accommodate the gain 
in parallel momentum, the projectile must drop into one of the bound states in the z-direction. In other words, 
the quantized gain in parallel motion leaves the projectile with more translational energy than it had initially, 
but the excess is cancelled by the negative energy of the bound state [19]. This is an entirely elastic
phenomenon: no energy loss to the substrate is required, simply a conversion of normal into parallel motion.
The trapping is undone if the parallel momentum gain is reversed. 
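The energy balance just described can be written as a kinematic resonance condition. The notation below is assumed for illustration and does not appear in the text: m is the projectile mass, k_i the magnitude of the incident wavevector, K_i its component parallel to the surface, G a surface reciprocal lattice vector and ε_n < 0 the energy of the nth bound level of the physisorption well. The sketch further assumes free-particle motion parallel to the surface, which is reasonable only for weak corrugation.

```latex
\frac{\hbar^{2} k_i^{2}}{2m}
  = \frac{\hbar^{2}\left(\mathbf{K}_i + \mathbf{G}\right)^{2}}{2m} + \varepsilon_n ,
  \qquad \varepsilon_n < 0 .
```

The left-hand side is the incident translational energy. On the right, the parallel kinetic energy after diffraction by G exceeds it, and the negative bound-state energy makes up the difference.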

The energies of the selective adsorption resonances are very sensitive to the details of the physisorption 
potential. Accurate measurement allied to computation of bound state energies can be used to obtain a very 
accurate quantitative form for the physisorption potential, as has been demonstrated for helium atom 
scattering. For molecules, we have
the additional possibility of exchanging normal translations for rotational motion (the vibrational energies of
light molecules are much larger than typical physisorption energies). Parallel momentum changes are effected 
by the surface corrugation, giving rise to corrugation mediated selective adsorption (CMSA). By analogy, 
rotational excitations produce rotation mediated selective adsorption (RMSA). Together these yield the 
acronym CRMSA. All such processes have been identified in the scattering of H₂ and its isotopomers from
noble and simple metal surfaces [20]. Typical results are shown in figure A3.9.4. The selective adsorption
resonances show up as peaks in the trapping (minima in the reflectivity) because the long residence time at the 
surface increases the amount of energy lost to the substrate, resulting in sticking [21]. 
Figure A3.9.4. The ratio of specular reflectivity to incident beam intensity for D₂ molecules scattering
from a Cu(100) surface at 30 K [21].


A3.9.4 MOLECULAR CHEMISORPTION AND SCATTERING 


Unlike physisorption, chemisorption results from strong attractive forces mediated by chemical bonding 
between projectile and surface [9]. There is often significant charge transfer between surface and molecule, as 
in the adsorption of 2 on metal surfaces [22]. The characteristics of the chemisorption well (its depth, the distance of its minimum above the
surface, etc.) can vary greatly with surface site [23]. The degree of charge transfer can also differ, such that in
many systems we can speak of there being more than one chemisorbed species [24]. 

The chemisorption interaction is also very strongly dependent on the molecular orientation, especially for 
heteronuclear molecules. This behaviour is exemplified by NO adsorption on metal surfaces, where the N end 
is the more strongly bound. These anisotropic interactions lead to strong steric effects and consequent
rotational excitation in the scattering dynamics. Rainbows are evident in the rotational distributions [25], as
can be seen in figure A3.9.5. These steric effects show up particularly strongly when the incident molecules
are aligned prior to scattering (by magnetic fields) [26, 27].
Figure A3.9.5. Population of rotational states versus rotational energy for NO molecules scattered from an
Ag(111) surface at two different incidence energies and at T_s = 520 K [25]: (a) E = 0.85 eV, θ_f = 15° and (b) E =
0.09 eV, θ_f = 15°. Results at E = 0.85 eV show a pronounced rotational rainbow.

The change of charge state of an adsorbed molecule leads to a change in the intramolecular bonding, usually a 
lengthening of the bond, which can result in vibrational excitation of the scattered molecule. Once again, this 
shows up in the scattering of NO from the Ag(111) surface [28], as shown in figure A3.9.6. In this case, the
vibrational excitation probability is dependent on both the translational energy and the surface temperature.
The translational energy dependence is probably due to the fact that the closer the molecule is to a surface, the
more extended the molecular bond becomes, that is, in the language of section A3.9.5, the NO is trying to get
round the elbow (see figure A3.9.8) to dissociate. It fails, and returns to the gas phase with increased
vibrational energy. Surface temperature can enhance this process by supplying energy from the thermal
motion of a surface atom towards the molecule [29, 30], but interaction with electronic excitations in the
metal has also been demonstrated to be an efficient and likely source of energy transfer to the molecular
vibrations [29, 30].



Figure A3.9.6. Population of the first excited vibrational state (v = 1) versus inverse of surface temperature, 10³/T_s (K⁻¹),
for NO scattering from an Ag(111) surface [28]. Curves: (a) E = 102 kJ mol⁻¹ and (b) E = 9 kJ mol⁻¹.

A3.9.4.1 CHEMISORPTION AND PRECURSOR STATES 

The chemisorption of a molecule is often a precursor [31] to further reactions such as dissociation (see section
A3.9.5.2), that is, the molecule must reside in the precursor state, exploring many configurations until finding
one that leads to a reaction. Where there is more than one distinct chemisorption state, one can act as a precursor
to the other [32]. The physisorption state can also act as a precursor to chemisorption, as is observed for the
2/Ag(110) system [33].

The presence of a precursor breaks the dynamical motion into three parts [34]. First, there is the dynamics of 
trapping into the precursor state; secondly, there is (at least partial) thermalization in the precursor state; and, 
thirdly, the reaction to produce the desired species (possibly a more tightly bound chemisorbed molecule). 
The first two of these we can readily approach with the knowledge gained from the studies of trapping and 
sticking of rare-gas atoms, but the long timescales involved in the third process may perhaps more usefully be 
addressed by kinetics and transition state theory [35]. 
A3.9.5 DYNAMICS OF DISSOCIATION REACTIONS


A3.9.5.1 DIRECT DISSOCIATION OF DIATOMICS 


The direct dissociation of diatomic molecules is the best-studied process in gas-surface dynamics, the
one for which the combination of surface science and molecular beam techniques allied to the computation of 
total energies and detailed and painstaking solution of the molecular dynamics has been most successful. The 
result is a substantial body of knowledge concerning the importance of the various degrees of freedom (e.g. 
molecular rotation) to the reaction dynamics, the details of which are contained in a number of review articles 
[2, 36, 37, 38, 39, 40 and 41]. 


(A) LENNARD-JONES MODEL OF HYDROGEN DISSOCIATION 

In the 1930s Lennard-Jones [42] introduced a model that is still in use today in discussions of the dissociation
of molecules at surfaces. He proposed a description based on two potential energy curves. The first,
describing the interaction of the intact molecule with the surface as a physisorption potential, is shown as
curve (a) in figure A3.9.7. Coupled with this, there is a second potential describing the interaction of the two
separately chemisorbed atoms with the surface (curve (b) in figure A3.9.7). In equilibrium the adsorbed atoms
are located at the minimum, L, of curve (b). The difference between (a) and (b) far from the surface is the
gas-phase molecular dissociation energy, D. A dissociation event occurs if a molecule approaches the surface as far as
the crossing point, K, where it makes a radiationless transition from (a) to (b), becoming adsorbed as atoms.
Figure A3.9.7. A representation of the Lennard-Jones model for dissociative adsorption of H₂. Curves: (a)
interaction of intact molecule with surface; (b) interaction of two separately chemisorbed atoms with surface.

There is an inconsistency in the model in that when changing from (a) to (b) the molecular bond is
instantaneously elongated. Lennard-Jones noted that although one-dimensional potential energy curves (such
as shown in figure A3.9.7) can prove of great value in discussions, 'they do not lend themselves to
generalization when more than one coordinate is necessary to specify a configuration'. In a quantitative theory
there should be a number of additional curves between (a) and (b) corresponding to rotational and vibrational
states of the molecule. In modern terms, we try to describe each
degree of freedom relevant to the problem with a separate dimension. The potential energy curves of figure
A3.9.7 then become a multidimensional surface, the potential energy surface (PES).

The model illustrated in figure A3.9.7 is primarily diabatic: the molecule jumps suddenly from one type of
bonding, represented by a potential energy curve, to another. However, much of the understanding in
gas-surface dynamics derives from descriptions based on motion on a single adiabatic PES, usually the
ground-state PES. In the Lennard-Jones model, this would approximately correspond to whichever of (a) and (b) has
the lower energy. Although this approach is successful in describing H₂ dissociation, it will not be adequate
for reactions involving very sudden changes of electronic state [43]. These may occur, for example, in the 2
reaction with simple metal surfaces [44]; they are so energetic that they can lead to light or electron emission
during reaction [45].

(B) INFLUENCE OF MOLECULAR VIBRATION ON REACTION 


Dissociation involves extension of a molecular bond until it breaks and so it might seem obvious that the more
energy we can put into molecular vibration, the greater the reactivity. However, this is not always so: the
existence of a vibrational enhancement of dissociation reveals something about the shape, or topography, of
the PES itself. This is illustrated in figure A3.9.8, which shows a generic elbow PES [37]. This
two-dimensional PES describes the dynamics in the molecule-surface and intramolecular bond length coordinates
only. Far from the surface, it describes the intramolecular bonding of the projectile by, for example, a Morse
potential. Close to the surface at large bond length, the PES describes the chemisorption of the two atoms to
the surface in similar fashion to curve (b) in figure A3.9.7. The curved region linking these two extremes is
the interaction region (shaded in figure A3.9.8), where the bonding is changing from one type to another. It
corresponds roughly to the curve crossing point, K, in the Lennard-Jones model.
Figure A3.9.8. An elbow potential energy surface representing the dissociation of a diatomic in two
dimensions: the molecular bond length and the distance from the molecule to the surface.

For vibrational effects in the dynamics, the location of the dissociation barrier within the curved interaction
region of figure A3.9.8 is crucial. If the barrier occurs largely before the curved region (an 'early' barrier,
at point E), then vibration will not promote reaction, as it occurs largely at right angles to the barrier. In
contrast, if the barrier occurs when the bond is already extended, say at L (a 'late' barrier) in the figure, the
vibration is now clearly helping the molecule to attack this barrier, and can substantially enhance reaction.
The consequences of these effects have been fully worked out, and agree with the Polanyi rules used in
gas-phase scattering [46, 47]. Experimental observations of both the presence and absence of vibrational
enhancement have been made, most clearly in hydrogen dissociation on metal surfaces. For instance, H₂
dissociation on Ni surfaces shows no vibrational enhancement [48, 49]. On Cu surfaces, however, vibrational
enhancement of dissociation has been clearly demonstrated by using molecular beam techniques (section
B2.6) to vary the internal and translational energies independently [50] and by examining the energy and state
distributions of molecules undergoing the reverse of dissociation, the associative desorption reaction [51].
Figure A3.9.9 shows typical results presented in the form of dissociation versus translational energy curves
backed out from the desorption data [52]. The curves corresponding to the vibrationally excited states clearly
lie at lower energy than those for the vibrational ground-state, implying that some of the energy for the
reaction comes from the H₂ vibration.
Figure A3.9.9. Dissociation probability versus incident energy for D₂ molecules incident on a Cu(111)
surface for the initial quantum states indicated (v indicates the initial vibrational state and J the initial
rotational state) [100]. For clarity, the saturation values have been scaled to the same value irrespective of the
initial state, although in reality the saturation value is higher for the v = 1 state.

An important further consequence of curvature of the interaction region and a late barrier is that molecules 
that fail to dissociate can return to the gas-phase in vibrational states different from the initial, as has been 
observed experimentally in the H₂/Cu system [53, 55]. To undergo vibrational (de-)excitation, the molecules
must round the elbow part way, but fail to go over the barrier, either because it is too high, or because the 
combination of vibrational and translational motions is such that the molecule moves across rather than over 
the barrier. Such vibrational excitation and de-excitation constrains the PES in that we require the elbow to 
have high curvature. Dissociation is not necessary, however, for as we have pointed out, vibrational excitation 
is observed in the scattering of NO from Ag(111) [55].

(C) ROTATIONAL EFFECTS: STERIC HINDRANCE AND CENTRIFUGAL ENHANCEMENT 

Molecular rotation has two competing influences on the dissociation of diatomics [56, 57 and 58]. A molecule 
will only be able to dissociate if its bond is oriented correctly with respect to the plane of the surface. If the 
bond is parallel to the plane, then dissociation will take place, whereas if the molecule is end-on to the 
surface, dissociation requires one atom to be ejected into the gas phase. In most cases, this 'reverse
Eley-Rideal' process is energetically very
unfavourable (although it does occur for very energetic reactions such as halide adsorption on Si surfaces, and
possibly for 2 adsorbing on reactive metals [59]). In general, molecules cannot dissociate when oriented
end-on. The PES is, thus, highly corrugated in the molecular orientation coordinate. In consequence,
increasing the rapidity of motion in this coordinate (i.e. increasing the rotational state) will make it more
likely for the molecule to race past the small dissociation window at the parallel orientation, strike a more
repulsive region of the PES and return to the gas phase. Therefore, dissociation is inhibited by increasing the
rotational energy of the molecule. In opposition to this effect, the rotational motion can enhance reactivity 
when the dissociation barrier is late (i.e. occurs at extended bond length). As the molecule progresses through 
the interaction region, its bond begins to extend. This increases the moment of inertia and thus reduces the 
rotational energy. The rotational energy thus 'lost' feeds into the reaction coordinate, further stretching the 
molecular bond and enhancing the reaction. The combination of these two competing effects has been 
demonstrated in the H₂/Cu(111) system. For the first few rotational states, increases in rotation reduce the
dissociation (i.e. shift the dissociation curve to higher energy) as can be seen in figure A3.9.9. Eventually,
however, centrifugal enhancement wins out, and for the higher rotational states the dissociation curves are 
pushed to lower translational energies. 
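The size of the centrifugal effect can be estimated from a rigid-rotor picture. The sketch below rests on assumptions that are not taken from the text: the angular momentum quantum number J is held fixed while the bond stretches (the orientational anisotropy of the PES, which changes J, is ignored), and the constants are approximate gas-phase values for H₂.

```python
# Rigid-rotor estimate of the rotational energy released on bond stretching.
B_E_MEV = 7.54        # approximate H2 rotational constant in meV (assumed value)
R_E_ANGSTROM = 0.741  # approximate H2 equilibrium bond length in angstrom (assumed value)


def rotational_energy(J, r=R_E_ANGSTROM):
    """Rotational energy (meV) for quantum number J at bond length r (angstrom).

    E_rot = B_e * J * (J + 1) * (r_e / r)**2, because the moment of
    inertia of a diatomic grows as r**2.
    """
    if J < 0 or r <= 0:
        raise ValueError("J must be non-negative and r positive")
    return B_E_MEV * J * (J + 1) * (R_E_ANGSTROM / r) ** 2


def energy_released(J, r):
    """Rotational energy (meV) freed when the bond stretches from r_e to r."""
    return rotational_energy(J) - rotational_energy(J, r)


if __name__ == "__main__":
    r_stretched = 1.0  # angstrom, hypothetical bond length near a late barrier
    for J in range(0, 9, 2):
        released = energy_released(J, r_stretched)
        print(f"J = {J}: E_rot = {rotational_energy(J):6.1f} meV, "
              f"released on stretching = {released:6.1f} meV")
```

Because the released energy grows as J(J + 1), this simple picture is consistent with centrifugal enhancement becoming important only for the higher rotational states.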


The strong dependence of the PES on molecular orientation also leads to strong coupling between rotational 
states, and hence rotational excitation/de-excitation in the scattering. This has been observed experimentally 
for H₂ scattering from Cu surfaces. Recent work has shown that for H₂ the changes in rotational state occur
almost exclusively when the molecular bond is extended, that is, longer than the gas-phase equilibrium value 
[60]. 

(D) SURFACE CORRUGATION AND SITE SPECIFICITY OF REACTION 

The idea that certain sites on a surface are especially active is common in the field of heterogeneous catalysis 
[61]. Often these sites are defects such as dislocations or steps. But surface site specificity for dissociation 
reactions also occurs on perfect surfaces, arising from slight differences in the molecule-surface bonding at 
different locations. This is true not only of insulator and semiconductor surfaces where there is strongly
directional bonding, but also of metal surfaces where the electronic orbitals are delocalized. The site
dependence of the reactivity manifests itself as a strong corrugation in the PES, which has been shown to exist
by ab initio computation of the interaction PES for H₂ dissociation on some simple and noble metal surfaces
[62, 63 and 64].

The dynamical implications of this corrugation appear straightforward: surface sites where the dissociation 
barrier is high (unfavourable reaction sites) should shadow those sites where the barrier is low (the favoured 
reaction sites) if the reacting molecule is incident at an angle to the surface plane. If we assume that the 
motion normal to the surface is important in traversing the dissociation barrier, then those molecules 
approaching at an angle should have lower dissociation probability than those approaching at normal 
incidence. This has indeed been observed in a number of dissociation systems [37], but a far more common
observation is that the dissociation scales with the 'normal energy', E_n = E cos²θ, where E is the translational
energy, and θ the angle of incidence of the beam with respect to the surface normal. Normal energy scaling,
shown in figure A3.9.10, implies that the motion parallel to the surface does not affect dissociation, and the
surface appears flat.
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Figure A3.9.10. The dissociation probability for 2 on W(l 10) [ 101 ] as a function of the normal energy, 
(upper). T = 800 K; 0: (*) 0°, (a) 30° (!) 45° and (Q) 60° The normal energy scaling observed can be 
explained by combining the two surface corrugations indicated schematically (lower diagrams). 

The apparent conflict between a strongly corrugated PES and a surface that appears flat has been resolved with 
the realization that the surface corrugation is not merely of the barrier energy, but also of the distance of the 
barrier above the surface [65]. We then distinguish between energetic corrugation (the variation of the energetic 
height of the barrier) and geometric corrugation (a simple variation of the barrier location or shape). The two 
cases are indicated in figure A3.9.10. For energetic corrugation, the shadowing does lead to lower dissociation 
at off-normal incidence, but this can be counterbalanced by geometric corrugation, for which the parallel 
motion helps the molecule to attack the facing edge of the PES [65]. 
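
Normal energy scaling means that the dissociation probability depends only on the energy associated with motion along the surface normal, E cos²θ, and not on E and θ separately. The short sketch below illustrates this. The function name and the beam energies are illustrative assumptions, not values from the experiments cited above.

```python
import math

def normal_energy(E, theta_deg):
    """Normal energy E cos^2(theta); theta is measured from the surface normal."""
    return E * math.cos(math.radians(theta_deg)) ** 2

# Hypothetical example: beam energies (eV) chosen so that every angle of
# incidence carries the same normal energy. Under normal energy scaling all
# four beams would show the same dissociation probability.
target_normal_energy = 0.5  # eV, assumed for illustration
for theta in (0, 30, 45, 60):
    E = target_normal_energy / math.cos(math.radians(theta)) ** 2
    print(f"theta = {theta:2d} deg  E = {E:.3f} eV  "
          f"E_normal = {normal_energy(E, theta):.3f} eV")
```

A beam at 60° therefore needs four times the translational energy of a normal-incidence beam to reach the same normal energy, since cos²60° = 1/4.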

The site specificity of reaction can also be a state-dependent site specificity, that is, molecules incident in 
different quantum states react more readily at different sites. This has recently been demonstrated by Kroes 
and co-workers for the H₂/Cu(100) system [66]. Additionally, we can find reactivity dominated by certain 
sites, while inelastic collisions leading to changes in the rotational or vibrational states of the scattering 
molecules occur primarily at other sites. This spatial separation of the active site according to the change of 
state occurring (dissociation, vibrational excitation etc) is a very surface specific phenomenon. 

(E) STEERING DOMINATED REACTION 


A very extreme version of surface corrugation has been found in the nonactivated dissociation reactions of H₂ 
on W [67, 68], Pd and Rh systems. In these cases, the very strong chemisorption bond of the H atoms gives 
rise to a very large energy release when the molecule dissociates. In consequence, at certain sites on the 
surface, the molecule accelerates rapidly downhill into the dissociation state. At the unfavourable sites, there 
are usually small dissociation barriers and, of course, molecules oriented end-on to the surface cannot 
dissociate. When we examine the dynamics of motion on such PESs, we find that the molecules are steered 
into the attractive downhill regions [69], away from the end-on orientation and away from the unfavourable 
reaction sites. 

Steering is a very general phenomenon, caused by gradients in the PES, occurring in every gas-surface 
system [36]. However, for these nonactivated systems showing extreme variations in the PES, the steering 
dominates the dissociation dynamics. At the very lowest energies, most molecules have enough time to steer 
into the most favourable geometry for dissociation; hence the dissociation probability is high. At higher E, 
there is less time for steering to be effective and the dissociation probability decreases. The general signature 
of a steering dominated reaction is, therefore, a dissociation probability that falls with increasing E [49, 70, 71], 
as shown in figure A3.9.11. This can be contrasted with the curve usually expected for direct dissociation 
(figure A3.9.10), which increases with E because, as E increases, it is easier to overcome the barriers in 
unfavourable geometries. 

Figure A3.9.11. Dissociation of H₂ on the W(100)-c(2 × 2)-Cu surface as a function of incident energy [71]. 
The steering dominated reaction [102] is evident at low energy, confirmed by the absence of a significant 
surface temperature dependence. 

(F) SURFACE TEMPERATURE DEPENDENCE 

Direct dissociation reactions are affected by surface temperature largely through the motion of the substrate 
atoms [72]. Motion of the surface atom towards the incoming molecule increases the likelihood of (activated) 
dissociation, while motion away decreases the dissociation probability. For low dissociation probabilities, the 
net effect is an enhancement of the dissociation by increasing surface temperature, as observed in the system 
O₂/Pt{100}-hex-R0.7° [73]. 


This interpretation is largely based on the results of cube models for the surface motion. It may also be that 
the thermal disorder of the surface leads to slightly different bonding and hence different barrier heights. 
Increasing temperature also changes the populations of the excited electronic states in the surface, which may 
affect bonding. The contribution of these effects to the overall surface temperature dependence of reaction is 
presently not clear. 

A3.9.5.2 DIRECT DISSOCIATION OF POLYATOMIC MOLECULES 

Although understanding the dissociation dynamics of diatomic molecules has come a long way, that of 
polyatomic molecules is much less well-developed. Quite simply, this is due to the difficulty of computing 
adequate PESs on which to perform suitable dynamics, when there are many atoms. Quantum dynamics also 
becomes prohibitively expensive as the dimensionality of the problem increases. The dissociation of CH₄ (to 
H + CH₃) on metal surfaces is the most studied to date [74]. This shows dependences on molecular 
translational energy and internal state, as well as a strong surface temperature dependence, which has been 
interpreted in terms of thermally assisted quantum tunnelling through the dissociation barrier. More recent 
experimental work has shown complicated behaviour at low E, with the possible involvement of steering or 
trapping [75]. 

A3.9.5.3 PRECURSOR-MEDIATED DISSOCIATION 

Precursor-mediated dissociation involves trapping in a molecularly chemisorbed state (or possibly several 
states) prior to dissociation. If the molecule thermalizes before dissociation, we can expect to observe the 
signature of trapping in the dissociation dynamics, that is, we expect increasing E and increasing surface 
temperature to decrease the likelihood of trapping, and hence of dissociation. This is exemplified by the 
dissociation of N₂ on W(100) in the low energy regime [76], shown in figure A3.9.12. 

The thermalization stage of this dissociation reaction is not amenable to modelling at the molecular dynamics 
level because of the long timescales required. For some systems, such as O₂/Pt(111), a kinetic treatment is 
very successful [77]. However, in others, thermalization is not complete, and the internal energy of the 
molecule can still enhance reaction, as observed for N₂/Fe(111) [78, 79] and in the dissociation of some small 
hydrocarbons on metal surfaces [80]. A detailed explanation of these systems is presently not available. 

Figure A3.9.12. The dissociation probability of N₂ on the W(100) surface as a function of the energy of the 
molecular beam. The falling trend and pronounced surface temperature dependence are indicative of a 
precursor mediated reaction at low energies. 
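
The surface temperature dependence of a precursor-mediated reaction can be illustrated with a simple kinetic sketch, in which a thermalized precursor either dissociates or desorbs and both steps are activated. This competition model is a common textbook form and is an assumption here, not necessarily the treatment used in the work cited above. The trapping probability, barrier heights and prefactors are hypothetical values chosen only to show the trend.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant in eV/K

def precursor_sticking(alpha, T_s, E_des, E_dis, nu_des=1e13, nu_dis=1e13):
    """Dissociation probability for a thermalized precursor.

    alpha  : trapping probability into the precursor state (input, 0..1)
    T_s    : surface temperature in K
    E_des  : activation energy for desorption from the precursor (eV)
    E_dis  : activation energy for dissociation from the precursor (eV)
    nu_*   : Arrhenius prefactors in 1/s (assumed equal by default)
    """
    k_des = nu_des * math.exp(-E_des / (K_B * T_s))
    k_dis = nu_dis * math.exp(-E_dis / (K_B * T_s))
    # Fraction of trapped molecules that dissociate rather than desorb
    return alpha * k_dis / (k_dis + k_des)

# Hypothetical parameters: desorption barrier higher than dissociation barrier,
# so raising the surface temperature favours desorption over dissociation.
for T_s in (300.0, 600.0, 1000.0):
    S = precursor_sticking(alpha=0.8, T_s=T_s, E_des=0.30, E_dis=0.20)
    print(f"T_s = {T_s:6.0f} K  S = {S:.3f}")
```

In this sketch the fall of the dissociation probability with increasing translational energy would enter through alpha, which is treated as a fixed input rather than modelled.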


A3.9.6 ELEY-RIDEAL DYNAMICS 

The idea that reactions can occur directly when an incident reagent strikes an adsorbate is strongly supported 
by detailed theoretical calculations. From early classical simulations on model PESs (e.g. for H on H/tungsten 
[81] and O on C/platinum [82]), to more recent theoretical studies of ER reactions employing 
quantum-mechanical models, it has been clearly established that product molecules should show a high degree of 
internal excitation. As noted in section A3.9.2, certain highly facile gas-surface reactions can occur directly at 
the point of impact between an incident gas-phase reagent and an adsorbate; however, it is more likely that the 
incident reagent will 'bounce' a few times before reaction, this being to some degree accommodated in the 
process. A more useful working definition of an ER reaction is that it should occur before the reagents have 
become equilibrated at the surface. With this definition, we encompass hot-atom dynamics and what Harris 
and Kasemo have termed precursor dynamics [8]. Since the heat of adsorption of the incident reagent is not 
fully accommodated, ER reactions are far more exothermic than their LH counterparts. 

Until relatively recently, the experimental evidence for the ER mechanism came largely from kinetic 
measurements, relating the rate of reaction to the incident flux and to the surface coverage and temperature 
[83]. For example, it has been found that the abstraction of halogens from Si(100) by incident H proceeds 
with a very small activation barrier, consistent with an ER mechanism. To prove that a reaction can occur on 
essentially a single gas-surface collision, however, dynamical measurements are required. The first definitive 
evidence for an ER mechanism was obtained in 1991, in a study showing that hyperthermal N(C₂H₄)₃N can 
pick up a proton from a H/Pt(111) surface to give an ion with translational energy dependent on that of the 
incident molecule [84]. A year later, a study was reported of the formation of HD from H atoms incident on 
D/Cu(111), and from D incident on H/Cu(111) [85]. The angular 
distribution of the HD was found to be asymmetrical about the surface normal and peaked on the opposite 
side of the normal to that of the incident atom. This behaviour proved that the reaction must occur before the 
incident atom reaches equilibrium with the surface. Moreover, the angular distribution was found to depend 
on the translational energy of the incident atom and on which isotope was incident, firmly establishing the 
operation of an ER mechanism for this elementary reaction. 

Conceptually similar studies have since been carried out for the reaction of H atoms with Cl/Au(111). More 
recently, quantum-state distributions have been obtained for both the H + Cl/Au(111) [86, 87 and 88] and H(D) 
+ D(H)/Cu(111) systems. The results of these studies are in good qualitative agreement with calculations. 
Even for the H(D) + D(H)/Cu(111) system [89], where we know that the incident atom cannot be 
significantly accommodated prior to reaction, reaction may not be direct. Detailed calculations yield much 
smaller cross sections for direct reaction than the overall experimental cross section, indicating that reaction 
may occur only after trapping of the incident atom [90]. 

Finally, it should also be clear that ER reactions do not necessarily yield a gas-phase product. The new 
molecule may be trapped on the surface. There is evidence for an ER mechanism in the addition of incident H 
atoms to ethylene and benzene on Cu(111) [91], in the abstraction of H atoms from cyclohexane by 
incident D atoms [92] and in the direct addition of H atoms to CO on Ru(001) [93]. 


A3.9.7 PHOTOCHEMISTRY 

The interaction of light with both clean surfaces and those having adsorbed species has been a popular 
research topic over the past 10 years [94]. Our understanding of processes such as photodesorption, 
photodissociation and photoreaction is still at a very early stage, and modelling has been largely performed on 
a system-by-system basis rather than through any generally applicable theory. One of the most important aspects 
of performing photochemical reactions on surfaces, which has been well documented by Polanyi and 
co-workers, is that it is possible to align species before triggering reactions, which cannot be done in the gas 
phase. This is frequently referred to as surface aligned photochemistry [95]. One of the key issues when light, 
such as that from a picosecond laser, impinges on a surface covered with an adsorbate is where the actual 
absorption takes place. Broadly speaking, there are two possible choices: either in the adsorbate molecule or in 
the surface itself. Unfortunately, although it may seem that unravelling microscopic reaction mechanisms might be quite 
distinct depending on what was absorbing, this is not the case and considerable effort has been spent on 
deciding what the dynamical consequences are for absorption into either localized or extended electronic 
states [96]. 

Of lesser interest here for a laser beam incident upon a surface are the processes that occur due to surface 
heating. Of greater interest are those occasions when an electronic transition is initiated and a process occurs, 
for in these circumstances it becomes possible to 'tune' reactivity by an external agent. A good example of 
this is the UV photodissociation of a range of carbonyls on Si surfaces [97]. Here it was shown explicitly that 
257 nm light can selectively excite the adsorbate and then dissociation ensues. An alternative story unfolds 
when NO is photodesorbed from Pt surfaces. Detailed experiment and modelling show that, in this case, the 
initial excitation (absorption) event occurs in the metal substrate. Following this, the excess energy is 
transferred to the adsorbate by a hot electron which resides for about 10-12 fs before returning to the 
substrate. During this time, it is possible for the NO to gain sufficient energy to overcome the adsorption bond 
[98]. 

Finally, and most recently, femtosecond lasers have been employed to investigate reactions on surfaces; one 
good example is the oxidation of CO on a Ru surface [99]. One of the long outstanding problems in surface 
dynamics is to determine the energy pathways that are responsible for irreversible processes at the surface. 
Both phonons and electrons are capable of taking energy from a prethermalized adsorbate and because of the 
time required for converting electronic motion to nuclear motion, there is the possibility that measurements 
employing ultrashort-pulsed lasers might be able to distinguish the dominant pathway. 


A3.9.8 OUTLOOK 

Despite the considerable progress over the 1990s, the field of gas-surface reaction dynamics is still very much 
in its infancy. We have a relatively good understanding of hydrogen dissociation on noble metals but our 
knowledge of other gas-surface systems is far from complete. Even for other diatomic reagents such as N₂ or 
O₂ a great deal remains to be learned. Nevertheless, we believe that progress will take place, even if in a 
slightly different fashion to that which is described here. 

In parallel with the remarkable increase in computing power, particularly in desktop workstations, there have 
been significant advances also in the algorithmic development of codes that can calculate the potential energy 
(hyper-)surfaces that have been mentioned in this article. Most of the theoretical work discussed here has 
relied to a greater or lesser extent on potential energy surfaces being available from some secondary agency 
and this, we believe, will not be the case in the future. Software is now available which will allow the 
dynamicist to calculate new potentials and then deploy them to evaluate state-to-state cross sections and 
reaction probabilities. Although new, detailed experimental data will provide guidance, a more general 
understanding of gas-surface chemistry will develop further as computational power continues to increase. 


A3.10 Reactions on surfaces: corrosion, growth, etching and catalysis 

Todd P St Clair and D Wayne Goodman 


A3.10.1 INTRODUCTION 

The impact of surface reactions on society is often overlooked. How many of us pause to appreciate integrated 
circuitry before checking email? Yet, without growth and etching reactions, the manufacturing of integrated 
circuits would be quite impractical. Or consider that in 1996, the United States alone consumed 123 billion 
gallons of gasoline [1]. The production of this gasoline from crude petroleum is accomplished by the 
petroleum industry using heterogeneous catalytic reactions. Even the control of automobile exhaust emissions, 
an obvious environmental concern, is achieved via catalytic reactions using 'three-way catalysts' that 
eliminate hydrocarbons, CO and NOₓ. The study of these types of surface reactions and others is an exciting 
and rapidly changing field. Nevertheless, much remains to be understood at the atomic level regarding the 
interaction of gases and liquids with solid surfaces. 

Surface science has thrived in recent years primarily because of its success at providing answers to 
fundamental questions. One objective of such studies is to elucidate the basic mechanisms that control surface 
reactions. For example, a goal could be to determine if CO dissociation occurs prior to oxidation over Pt 
catalysts. A second objective is then to extrapolate this microscopic view of surface reactions to the 
corresponding macroscopic phenomena. 

How are fundamental aspects of surface reactions studied? The surface science approach uses a simplified 
system to model the more complicated 'real-world' systems. At the heart of this simplified system is the use 
of well defined surfaces, typically in the form of oriented single crystals. A thorough description of these 
surfaces should include composition, electronic structure and geometric structure measurements, as well as an 
evaluation of reactivity towards different adsorbates. Furthermore, the system should be constructed such that 
it can be made increasingly more complex to more closely mimic macroscopic systems. However, relating 
surface science results to the corresponding real-world problems often proves to be a stumbling block because 
of the sheer complexity of these real-world systems. 

Essential to modern surface science techniques is the attainment and maintenance of ultrahigh vacuum 
(UHV), which corresponds to pressures of the order of 10⁻¹⁰ Torr (~10⁻¹³ atm). At these pressures, the 
number of collisions between gas phase molecules and a surface is such that a surface can remain relatively 
contaminant-free for a period of hours. For example, in air at 760 Torr and 298 K the collision frequency is 
3 × 10²³ collisions cm⁻² s⁻¹. Assuming a typical surface has 10¹⁵ atoms cm⁻², then each surface atom 
undergoes ~10⁸ collisions per second. Clearly, a surface at 760 Torr has little chance of remaining clean. 
However, by lowering the pressure to 10⁻¹⁰ Torr, the collision frequency decreases to approximately 10¹⁰ 
collisions cm⁻² s⁻¹, corresponding to a collision with a surface atom about every 10⁵ s. Decreasing the 
pressure is obviously a solution to maintaining a clean sample, which itself is crucial to sustaining well 
characterized surfaces during the course of an experiment. 
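
The collision frequencies discussed above can be estimated from the kinetic-theory (Hertz-Knudsen) impingement rate, Z = P / (2πmk_BT)^1/2. The sketch below evaluates it at atmospheric pressure and under UHV. The mean molecular mass of air (29 amu) and the site density of 10¹⁵ atoms cm⁻² are assumptions made for the estimate, and the results should be read as order-of-magnitude figures only.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
AMU = 1.66053907e-27    # atomic mass unit, kg
PA_PER_TORR = 133.322368

def collision_flux_cm2(pressure_torr, temperature_k, mass_amu):
    """Hertz-Knudsen impingement rate in collisions per cm^2 per second."""
    p = pressure_torr * PA_PER_TORR
    m = mass_amu * AMU
    flux_m2 = p / math.sqrt(2.0 * math.pi * m * K_B * temperature_k)
    return flux_m2 * 1e-4  # convert m^-2 to cm^-2

SITE_DENSITY = 1e15  # surface atoms per cm^2 (assumed typical value)

for pressure in (760.0, 1e-10):
    flux = collision_flux_cm2(pressure, 298.0, 29.0)
    per_atom = flux / SITE_DENSITY  # collisions per surface atom per second
    print(f"P = {pressure:8.1e} Torr  flux = {flux:.2e} cm^-2 s^-1  "
          f"time between hits per atom = {1.0 / per_atom:.1e} s")
```

Because the flux is linear in pressure, every decade of pressure reduction buys a decade of time before a given surface atom is struck.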

Modern UHV chambers are constructed from stainless steel. The principal seals are metal-on-metal, so the 
use of greases is avoided. A combination of pumps is normally used, including ion pumps, turbomolecular 
pumps, cryopumps and mechanical (roughing) pumps. The entire system is generally heatable to ~500 K. This 
'bakeout' for a period of 10-20 h increases gas desorption rates from the internal surfaces, ultimately resulting 
in lower pressure. For further reading on vacuum technology, including vacuum and pump theory, see [2, 3]. 

The importance of low pressures has already been stressed as a criterion for surface science studies. However, 
it is also a limitation because real-world phenomena do not occur in a controlled vacuum. Instead, they occur 
at atmospheric pressures or higher, often at elevated temperatures, and in conditions of humidity or even 
contamination. Hence, a major thrust in surface science has been to modify existing techniques and equipment 
to permit detailed surface analysis under conditions that are less than ideal. The scanning tunnelling 
microscope (STM) is a recent addition to the surface science arsenal and has the capability of providing 
atomic-scale information at ambient pressures and elevated temperatures. Incredible insight into the nature of 
surface reactions has been achieved by means of the STM and other in situ techniques. 

This chapter will explore surface reactions at the atomic level. A brief discussion of corrosion reactions is 
followed by a more detailed look at growth and etching reactions. Finally, catalytic reactions will be 
considered, with a strong emphasis on the surface science approach to catalysis. 


A3.10.2 CORROSION 

A3.10.2.1 INTRODUCTION 

Corrosion is a frequently encountered phenomenon in which a surface undergoes changes associated with 
exposure to a reactive environment. While materials such as plastics and cement can undergo corrosion, the 
term corrosion more commonly applies to metal surfaces. Rust is perhaps the most widely recognized form of 
corrosion, resulting from the surface oxidation of an iron-containing material such as steel. Economically, 
corrosion is extremely important. It has been estimated that annual costs associated with combating and 
preventing corrosion are 2-3% of the gross national product for industrialized countries. Equipment damage is 
a major component of the costs associated with corrosion. There are also costs related to corrosion prevention, 
such as implementation of anti-corrosive paints or other protective measures. Finally, there are indirect losses, 
such as plant shutdowns, when equipment or facilities need repair or replacement. 

Most metals tend to corrode in an environment of air and/or water, forming metal oxides or hydrated oxides. 
Whether or not such a reaction is possible is dictated by the thermodynamics of the corrosion reaction. If the 
reaction has a negative Gibbs free energy of formation, then the reaction is thermodynamically favoured. 
While thermodynamics determines whether a particular reaction can occur or not, the rate of the corrosion 
reaction is determined by kinetic factors. A number of variables can affect the corrosion rate, including 
temperature, pH and passivation, which is the formation of a thin protective film on a metal surface. 
Passivation can have a tremendous influence on the corrosion rate, often reducing it to a negligible amount. 

Since metals have very high conductivities, metal corrosion is usually electrochemical in nature. The term 
electrochemical is meant to imply the presence of an electrode process, i.e. a reaction in which free electrons 
participate. For metals, electrochemical corrosion can occur by loss of metal atoms through anodic 
dissolution, one of the fundamental corrosion reactions. As an example, consider a piece of zinc, hereafter 
referred to as an electrode, immersed in water. Zinc tends to dissolve in water, setting up a concentration of 
Zn²⁺ ions very near the electrode 
surface. The term anodic dissolution arises because the area of the surface where zinc is dissolving to form 
Zn²⁺ is called the anode, as it is the source of positive current in the system. Because zinc is oxidized, a 
concentration of electrons builds up on the electrode surface, giving it a negative charge. This combination of 
negatively charged surface region with positively charged near-surface region is called an electrochemical 
double layer. The potential across the layer, called the electrode potential, can be as much as ±1 V. 
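
The thermodynamic criterion for corrosion can be connected to electrode potentials through ΔG = −nFE, where E is the cell potential for the overall reaction. The sketch below applies this to metal oxidation coupled to oxygen reduction. The standard reduction potentials are rounded textbook values for standard conditions and are assumptions for illustration, since real corrosion environments deviate from standard state.

```python
FARADAY = 96485.332  # C/mol

# Rounded standard reduction potentials in V vs SHE (textbook values)
E_RED = {"Zn2+/Zn": -0.76, "Fe2+/Fe": -0.44, "Au3+/Au": 1.50}
E_O2 = 0.40  # O2 + 2 H2O + 4 e- -> 4 OH-

def delta_g_kj(e_metal, e_oxidant=E_O2, n=4):
    """Standard Gibbs energy change (kJ per mol O2) for metal oxidation by O2.

    e_metal   : standard reduction potential of the metal couple (V)
    e_oxidant : standard reduction potential of the cathodic reaction (V)
    n         : electrons transferred per mole of reaction (4 per O2)
    """
    e_cell = e_oxidant - e_metal
    return -n * FARADAY * e_cell / 1000.0

for couple, e_metal in E_RED.items():
    dg = delta_g_kj(e_metal)
    verdict = "favoured" if dg < 0 else "not favoured"
    print(f"{couple:8s} dG = {dg:8.1f} kJ/mol O2  ({verdict})")
```

A negative ΔG only shows that corrosion is possible. As noted in the text, the rate is a kinetic question.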

In moist environments, water is present either at the metal interface in the form of a thin film (perhaps due to 
condensation) or as a bulk phase. Figure A3.10.1 schematically illustrates another example of anodic 
dissolution where a droplet of slightly acidic water (for instance, due to H₂SO₄) is in contact with an Fe 
surface in air [4]. Because Fe is a conductor, electrons are available to reduce O₂ at the edges of the droplets. 
The electrons are then replaced by the oxidation reaction of Fe to Fe²⁺ (forming FeSO₄ if H₂SO₄ is the acid), 
and the rate of corrosion is simply the current induced by metal ions leaving the surface. 

Figure A3.10.1 (a) A schematic illustration of the corrosion process for an oxygen-rich water droplet on an 
iron surface. (b) The process can be viewed as a short-circuited electrochemical cell [4]. 
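
Because the corrosion rate is a current, Faraday's law converts a corrosion current density into a material loss rate. The sketch below does this for iron, assuming uniform attack, oxidation to the divalent ion (n = 2) and a density of 7.87 g cm⁻³. The current density of 1 µA cm⁻² is a hypothetical input, not a measured value.

```python
FARADAY = 96485.332                    # C/mol
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def penetration_rate_um_per_year(i_corr, molar_mass, n, density):
    """Uniform corrosion penetration rate from Faraday's law.

    i_corr     : corrosion current density in A/cm^2
    molar_mass : molar mass of the metal in g/mol
    n          : electrons released per metal atom dissolved
    density    : metal density in g/cm^3
    """
    cm_per_s = i_corr * molar_mass / (n * FARADAY * density)
    return cm_per_s * SECONDS_PER_YEAR * 1e4  # cm/s -> micrometres/year

# Hypothetical case: iron dissolving as the divalent ion at 1 microamp/cm^2
rate = penetration_rate_um_per_year(1e-6, 55.845, 2, 7.87)
print(f"penetration rate = {rate:.1f} micrometres per year")
```

The same function applies to other metals by changing the molar mass, valence and density.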

Corrosion protection of metals can take many forms, one of which is passivation. As mentioned above, 
passivation is the formation of a thin protective film (most commonly oxide or hydrated oxide) on a metallic 
surface. Certain metals that are prone to passivation will form a thin oxide film that displaces the electrode 
potential of the metal by +0.5-2.0 V. The film severely hinders the diffusion rate of metal ions from the 
electrode to the solid-gas or solid-liquid interface, thus providing corrosion resistance. This decreased 
corrosion rate is best illustrated by anodic polarization curves, which are constructed by measuring the net 
current from an electrode into solution (the corrosion current) under an applied voltage. For passivable metals, 
the current will increase steadily with increasing voltage in the so-called active region until the passivating 
film forms, at which point the current will rapidly decrease. This behaviour is characteristic of metals that are 
susceptible to passivation. 

Another method by which metals can be protected from corrosion is called alloying. An alloy is a 
multi-component solid solution whose physical and chemical properties can be tailored by varying the alloy 
composition. For example, copper has relatively good corrosion resistance under non-oxidizing conditions. It 
can be alloyed with zinc to yield a stronger material (brass), but with lowered corrosion resistance. However, 
by alloying copper with a passivating metal such as nickel, both mechanical and corrosion properties are 
improved. Another important alloy is steel, which is an alloy between iron (>50%) and other alloying elements 
such as carbon. 


Although alloying can improve corrosion resistance, brass and steel are not completely resistant to attack and 
often undergo a form of corrosion known as selective corrosion (also called de-alloying or leaching). 
De-alloying consists of the segregation of one alloy component to the surface, followed by the removal of this 
surface component through a corrosion reaction. De-zincification is the selective leaching of zinc from brasses 
in an aqueous solution. The consequences of leaching are that mechanical and chemical properties change 
with compositional changes in the alloy. 

As an example of the effect that corrosion can have on commercial industries, consider the corrosive effects 
of salt water on a seagoing vessel. Corrosion can drastically affect a ship's performance and fuel consumption 
over a period of time. As the hull of a steel boat becomes corroded and fouled by marine growths, the 
performance of the ship declines because of increased frictional drag. Therefore, ships are dry docked 
periodically to restore the smoothness of the hull. Figure A3.10.2 shows the loss of speed due to corrosion and 
marine fouling between annual drydockings for a ship with a steel hull [5]. As corrosion effects progressively 
deteriorated the hull and as marine growth accumulated, the ship experienced an overall loss of speed even 
after drydocking and an increased fuel consumption over time. It is clear that there is strong economic 
motivation to implement corrosion protection. 

Figure A3.10.2 The influence of corrosion (C) and marine fouling (F) on the performance of a steel ship 
dry docked annually for cleaning and painting [5]. 


Surface science studies of corrosion phenomena are excellent examples of in situ characterization of surface 
reactions. In particular, the investigation of corrosion reactions with STM is promising because not only can it 
be used to study solid-gas interfaces, but also solid-liquid interfaces. 

A3.10.2.2 SURFACE SCIENCE OF CORROSION 
(A) THE ROLE OF SULFUR IN CORROSION 


STM has been used to study adsorption on surfaces as it relates to corrosion phenomena [6, 7]. Sulfur is a well 
known corrosion agent and is often found in air (SO₂, H₂S) and in aqueous solution as dissolved anions 
(HSO₄⁻) or dissolved gas (H₂S). By studying the interaction of sulfur with surfaces, insights can be gained into 
the fundamental processes governing corrosion phenomena. A Ni(111) sample with 10 ppm sulfur bulk 
impurity was used to study sulfur adsorption by annealing the crystal to segregate the sulfur to the surface [8]. 
Figure A3.10.3 shows an STM image of a S-covered Ni(111) surface. It was found that sulfur formed islands 
preferentially near step edges, and that the Ni surface reconstructed under the influence of sulfur adsorption. 
This reconstruction results in surface sites that have fourfold symmetry rather than threefold symmetry as on 
the unreconstructed (111) surface. Furthermore, the fourfold symmetry sites are similar to those found on 
unreconstructed Ni(100), demonstrating the strong influence that sulfur adsorption has on this surface. The 
mechanism by which sulfur leads to corrosion of nickel surfaces is clearly linked to the ability of sulfur to 
weaken Ni-Ni bonds. 

Figure A3.10.3 STM images of the early stages of sulfur segregation on Ni(111). Sulfur atoms are seen to 
preferentially nucleate at step edges [8]. 


(B) ANODIC DISSOLUTION IN ALLOYS 

This weakening of Ni-Ni surface bonds by adsorbed sulfur might lead one to expect that the corrosion rate 
should increase in this case. In fact, an increased anodic dissolution rate was observed for Ni₃Fe(100) in 0.05 
M H₂SO₄ [9]. Figure A3.10.4 shows the anodic polarization curves for clean and S-covered single-crystal 
alloy surfaces. While both surfaces show the expected current increase with potential increase, the 
sulfur-covered surface clearly has an increased rate of dissolution. In addition, the sulfur coverage (measured 
using radioactive sulfur, ³⁵S) does not decrease even at the maximum dissolution rate, indicating that adsorbed 
sulfur is not consumed by the dissolution reaction. Instead, surface sulfur simply enhances the rate of 
dissolution, as expected based on the observation above that Ni-Ni bonds are significantly weakened by 
surface sulfur. 


Figure A3.10.4 The effect of sulfur on the anodic polarization curves from a Ni₀.₂₅Fe(100) alloy in 0.05 M 
H₂SO₄. θ is the sulfur (³⁵S) coverage [6]. 

The nature of copper dissolution from CuAu alloys has also been studied. CuAu alloys have been shown to 
have a surface Au enrichment that actually forms a protective Au layer on the surface. The anodic polarization 
curve for CuAu alloys is characterized by a critical potential, E_c, above which extensive Cu dissolution is 
observed [10]. Below E_c, a smaller dissolution current arises that is approximately potential-independent. This 
critical potential depends not only on the alloy composition, but also on the solution composition. STM was 
used to investigate the mechanism by which copper is selectively dissolved from a CuAu₃ electrode in 
solution [11], both above and below the critical potential. At potentials below E_c, it was found that, as copper 
dissolves, vacancies agglomerate on the surface to form voids one atom deep. These voids grow 
two-dimensionally with increasing Cu dissolution while the second atomic layer remains undisturbed. The fact 
that the second atomic layer is unchanged suggests that Au atoms from the first layer are filling 
in holes left by Cu dissolution. In sharp contrast, for potentials above E_c, massive Cu dissolution results in a 
rough surface with voids that grow both parallel and perpendicular to the surface, suggesting a very fast 
dissolution process. These in situ STM observations lend insight into the mechanism by which Cu dissolution 
occurs in CuAu₃ alloys. 

The characterization of surfaces undergoing corrosion phenomena at liquid-solid and gas-solid interfaces 
remains a challenging task. The use of STM for in situ studies of corrosion reactions will continue to shape 
the atomic-level understanding of such surface reactions. 


A3.10.3 GROWTH

A3.10.3.1 INTRODUCTION

Thin crystalline films, or overlayers, deposited onto crystalline substrates can grow in such a way that the 
substrate lattice influences the overlayer lattice. This phenomenon is known as epitaxy; if the deposited
material is different from (the same as) the substrate, the process is referred to as heteroepitaxy
(homoepitaxy). Epitaxial growth is of interest for several reasons. First, it is widely used in the
semiconductor industry for the manufacture of III/V and II/VI semiconductor devices. Second, novel phases 
have been grown epitaxially by exploiting such phenomena as lattice mismatch and strain. These new phases 
have physical and chemical properties of interest to science and engineering. Finally, fundamental catalytic 
studies often focus on modelling oxide-supported metal particles by depositing metal films on oxide single 
crystals and thin films and, in many cases, these oxide and metal films grow epitaxially. 

When considering whether growth will occur epitaxially or not, arguments can be made based on geometrical 
considerations, or row matching. This concept is based on the idea that the overlayer must sit on minima of 
the substrate corrugation potential to minimize the interaction energy. For example, consider the illustration of 
epitaxial growth in figure A3.10.5, where an fcc(111) monolayer has been overlaid on a bcc(110) surface [12].
Figure A3.10.5(a) shows that the overlayer must be expanded or contracted in two directions to obtain row
matching. Figure A3.10.5(b) shows, however, that rotation of the overlayer by 5.26° results in row matching
along the most close-packed row of the lattices. Epitaxial growth clearly provides a pathway to energetically
favourable atomic arrangements.

Figure A3.10.5 An fcc(111) monolayer (full circles) overlaid onto a bcc(110) substrate (open circles). (a) fcc
[011] parallel to bcc[001]. (b) 5.26° rotation relative to (a). The lattice constants were chosen to produce row-
matching in (b) [12].
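
The 5.26° rotation quoted for figure A3.10.5(b) can be recovered from geometry alone. The following is a minimal sketch, assuming ideal, unstrained lattices: within a bcc(110) plane the close-packed <111> rows lie at arccos(1/√3) from [001], whereas the close-packed rows of an fcc(111) layer are 60° apart. Starting from the arrangement in (a), where one fcc close-packed row is parallel to bcc[001], the rotation needed to bring another fcc close-packed row onto a bcc close-packed row is the difference between the two angles. The variable names are illustrative only.

```python
import math

# Angle between a close-packed <111> row and [001] within a bcc(110) plane.
bcc_row_angle = math.degrees(math.acos(1.0 / math.sqrt(3.0)))

# Close-packed rows in an fcc(111) layer are separated by 60 degrees.
fcc_row_angle = 60.0

# Rotation that aligns an fcc close-packed row with a bcc close-packed row.
rotation = fcc_row_angle - bcc_row_angle

print(f"bcc close-packed row angle from [001]: {bcc_row_angle:.2f} deg")
print(f"rotation needed for row matching:      {rotation:.2f} deg")
```

The result agrees with the rotation angle given in the caption to two decimal places.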

The influence of the substrate lattice makes it energetically favourable for two materials to align lattices. On 
the other hand, if two lattices are misaligned or mismatched in some other way, then lattice strain may result. 
This lattice strain can lead to a metastable atomic arrangement of the deposited material. In other words, an 
overlayer can respond to lattice strain by adopting a crystal structure that differs from its normal bulk structure 
in order to row-match the substrate lattice. This phenomenon is known as pseudomorphy. For example, Cu
(fcc) deposited on a Pd(100) surface will grow epitaxially to yield a pseudomorphic fcc overlayer [13].
However, upon increasing the copper film thickness, a body-centred tetragonal (bct) metastable phase, one not
normally encountered for bulk copper, was observed. This phase transformation is due to a high degree of
strain in the fcc overlayer.

Another example of epitaxy is tin growth on the (100) surfaces of InSb or CdTe (a = 6.49 Å) [14]. At room
temperature, elemental tin is metallic and adopts a bct crystal structure ('white tin') with a lattice constant of
5.83 Å. However, upon deposition on either of the two above-mentioned surfaces, tin is transformed into the
diamond structure ('grey tin') with a = 6.49 Å and essentially no misfit at the interface. Furthermore, since
grey tin is a semiconductor, a novel heterojunction material can be fabricated. It is evident that epitaxial
growth can be exploited to synthesize materials with novel physical and chemical properties. 
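
The misfit argument for tin can be made quantitative with a short calculation. This is only a sketch: the definition of misfit used here, (a_overlayer - a_substrate) / a_substrate, is one common convention and is an assumption, as is the crude step of comparing the white-tin lattice constant directly with the cubic substrate value. The lattice constants are those quoted in the text; the function name is hypothetical.

```python
def lattice_misfit(a_overlayer: float, a_substrate: float) -> float:
    """Fractional misfit of an overlayer relative to the substrate (assumed convention)."""
    return (a_overlayer - a_substrate) / a_substrate


A_SUBSTRATE = 6.49  # angstrom, InSb or CdTe (value quoted in the text)
A_WHITE_TIN = 5.83  # angstrom, bct white tin (value quoted in the text)
A_GREY_TIN = 6.49   # angstrom, diamond-structure grey tin (value quoted in the text)

white_misfit = lattice_misfit(A_WHITE_TIN, A_SUBSTRATE)
grey_misfit = lattice_misfit(A_GREY_TIN, A_SUBSTRATE)

print(f"white tin misfit: {white_misfit:+.1%}")
print(f"grey tin misfit:  {grey_misfit:+.1%}")
```

On this estimate white tin would be mismatched by roughly 10%, whereas grey tin matches the substrate, which is consistent with the observation that the diamond structure is the one stabilized at the interface.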

A3.10.3.2 FILM GROWTH TECHNIQUES

There are several design parameters which distinguish film growth techniques from one another, namely 
generation of the source atom/molecule, delivery to the surface and the surface condition. The source 
molecule can be generated in a number of ways including vapour produced thermally from solid and liquid 
sources, decomposition of organometallic
compounds and precipitation from the liquid phase. Depending on the pressures used, gas phase atoms and
molecules impinging on the surface may be in viscous flow or molecular flow. This parameter is important in
determining whether atom-atom (molecule-molecule) collisions, which occur in large numbers at pressures
higher than UHV, can affect the integrity of the atom (molecule) to be deposited. The condition of the
substrate surface may also be a concern: elevating the surface temperature may alter the growth kinetics, or 
the surface may have to be nearly free of defects and/or contamination to promote the proper growth mode. 
Two film growth techniques, molecular beam epitaxy (MBE) and vapour phase epitaxy (VPE), will be briefly
summarized below. These particular techniques were chosen because of their relevance to UHV studies. The
reader is referred elsewhere for more detailed discussions of the various growth techniques [15, 16 and 17].
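
The distinction between viscous and molecular flow can be illustrated with the kinetic-theory mean free path and the Knudsen number (mean free path divided by a characteristic chamber dimension). The sketch below is not taken from the cited references: the molecular diameter (roughly that of N2), the temperature, the 0.1 m chamber dimension and the regime thresholds (Knudsen number below 0.01 for viscous flow, above 1 for molecular flow) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
TORR_TO_PA = 133.322   # pascals per torr


def mean_free_path(pressure_torr, temperature=300.0, diameter=3.7e-10):
    """Hard-sphere mean free path in metres (diameter is an assumed value)."""
    pressure_pa = pressure_torr * TORR_TO_PA
    return K_B * temperature / (math.sqrt(2.0) * math.pi * diameter**2 * pressure_pa)


def flow_regime(pressure_torr, chamber_size=0.1):
    """Classify the flow using assumed Knudsen-number thresholds."""
    knudsen = mean_free_path(pressure_torr) / chamber_size
    if knudsen > 1.0:
        return "molecular"
    if knudsen < 0.01:
        return "viscous"
    return "transitional"


for pressure in (760.0, 1.0e-3, 1.0e-10):
    print(f"{pressure:9.1e} Torr: mean free path = {mean_free_path(pressure):.2e} m, "
          f"{flow_regime(pressure)} flow")
```

Under these assumptions the mean free path at UHV pressures is many orders of magnitude larger than any chamber, so gas phase collisions are negligible and the deposited species arrive intact.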

MBE is accomplished under UHV conditions with pressures of the order of 10^-10 Torr. By using such low
pressures, the substrate surface and deposited thin films can be kept nearly free of contamination. In MBE, the 
material being deposited is usually generated in UHV by heating the source material to the point of 
evaporation or sublimation. The gas phase species is then focused in a molecular beam onto the substrate 
surface, which itself may be at an elevated temperature. The species flux emanating from the source can be 
controlled by varying the source temperature and the species flux arriving at the surface can be controlled by 
the use of mechanical shutters. Precise control of the arrival of species at the surface is a very important 
characteristic of MBE because it allows the growth of epitaxial films with very abrupt interfaces. Several 
sources can be incorporated into a single vacuum chamber, allowing doped semiconductors, compounds or 
alloys to be grown. For instance, MBE is widely used in the semiconductor industry to grow
GaAs/Al_xGa_(1-x)As layers and, in such a situation, a growth chamber would be outfitted with Ga, As and Al
deposition sources. Because of the compatibility of MBE with UHV surface science techniques, it is often the 
choice of researchers studying fundamentals of thin-film growth. 

A second technique, VPE, is also used for surface science studies of overlayer growth. In VPE, the species 
being deposited can be generated in several ways, including vaporization of a liquid precursor into a flowing 
gas stream or sublimation of a solid precursor. VPE generates an unfocused vapour or cloud of the deposited 
material, rather than a collimated beam as in MBE. Historically, VPE played a major role in the development 
of III/V semiconductors. Currently, VPE is used as a tool for studying metal growth on oxides, an issue of 
importance to the catalysis community. 

The following two sections will focus on epitaxial growth from a surface science perspective with the aim of 
revealing the fundamentals of thin-film growth. As will be discussed below, surface science studies of thin- 
film deposition have contributed greatly to an atomic-level understanding of nucleation and growth. 

A3.10.3.3 THERMODYNAMICS

The number of factors affecting thin-film growth is largely dependent upon the choice of growth technique. 
The overall growth mechanism may be strongly influenced by three factors: mass transport, thermodynamics 
and kinetics. For instance, for an exothermic (endothermic) process, increasing the surface
temperature will decrease (increase) the growth rate if the process is thermodynamically limited. On the other
hand, if temperature has no effect on the growth rate, then the process may be limited by mass transport, 
which has very little dependence on the substrate temperature. Another test of mass transport limitations is to 
increase the total flow rate to the surface while keeping the partial pressures constant — if the growth rate is 
influenced, then mass transport limitations should be considered. Alternatively, if the substrate orientation is 
found to influence the growth rates, then the process is very likely kinetically limited. Thus, through a 
relatively straightforward analysis of the parameters affecting macroscopic
quantities, such as growth rate, a qualitative description of the growth mechanism can be obtained. The
growth of epitaxial thin films by vapour deposition in UHV is a non-equilibrium kinetic phenomenon. At
thermodynamic equilibrium, atomic processes are required to proceed in opposite directions at equal rates. 
Hence, a system at equilibrium must have equal adsorption and desorption rates, as well as equal cluster 
growth and cluster decay rates. If growth were occurring under equilibrium conditions, then there would be no 
net change in the amount of deposited material on the surface. Typical growth conditions result in systems far 
from equilibrium, so film growth is usually limited by kinetic considerations. Thermodynamics does play an
important role, however, as will be discussed next. 

Thermodynamics can lend insight into the expected growth mode by examination of energetic
considerations. The energies of importance are the surface free energy of the overlayer, the interfacial energy
between the substrate and the overlayer, and the surface free energy of the substrate. Generally, if the free
energy of the overlayer plus the interface energy is less than the free energy of the substrate, then Frank-
van der Merwe (FM) growth will occur [18]. FM growth, also known as layer-by-layer growth, is
characterized by the completion of a surface overlayer before the second layer begins forming. However, if
the free energy of the overlayer plus the interface energy is greater than the free energy of the substrate then the
growth mode is Volmer-Weber (VW) [18]. VW, or three-dimensional (3D), growth yields 3D islands or
clusters that coexist with bare patches of substrate. There is also a third growth mode, called Stranski-
Krastanov (SK), which can be described as one or two monolayers of growth across the entire surface
subsequently followed by the growth of 3D islands [18]. In SK growth, the sum of the surface free energy of
the overlayer plus interface energy is initially less than that of the substrate, resulting in the completion of
the first monolayer, after which the surface free energy of the overlayer plus interface energy becomes greater
than that of the substrate, resulting in 3D growth. It should be stressed that the energetic arguments for these
growth modes are only valid for equilibrium processes. However, these descriptions provide good models for
the growth modes experimentally observed even under non-equilibrium conditions.
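
The energetic criteria for the three growth modes can be condensed into a small decision function. This is a sketch of the usual wetting criterion (layer-by-layer growth when the overlayer surface energy plus the interface energy does not exceed the substrate surface energy), not a predictive tool. The function name, the optional second interface-energy argument used to mimic the change after the first monolayers, and all numerical values are hypothetical.

```python
def growth_mode(gamma_overlayer, gamma_interface, gamma_substrate,
                gamma_interface_thick=None):
    """Classify the equilibrium growth mode from surface and interface free energies.

    gamma_interface_thick is an assumed, optional interface energy (including
    accumulated strain) that applies once the first monolayers are complete.
    """
    initial_balance = gamma_overlayer + gamma_interface - gamma_substrate
    if initial_balance > 0:
        return "VW"  # overlayer does not wet the substrate: 3D islands
    if gamma_interface_thick is not None:
        later_balance = gamma_overlayer + gamma_interface_thick - gamma_substrate
        if later_balance > 0:
            return "SK"  # wetting layer first, then 3D islands
    return "FM"  # layer-by-layer growth


# Hypothetical energies in J m^-2, chosen only to exercise each branch.
print(growth_mode(1.0, 0.1, 2.0))                             # FM
print(growth_mode(2.0, 0.1, 1.0))                             # VW
print(growth_mode(1.0, 0.1, 2.0, gamma_interface_thick=1.5))  # SK
```

As the text stresses, such a classification is strictly an equilibrium argument; real growth far from equilibrium can deviate from it.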

A3.10.3.4 NUCLEATION AND GROWTH

The process of thin-film growth from an atomic point of view consists of the following stages: adsorption, 
diffusion, nucleation, growth and coarsening. Adsorption is initiated by exposing the substrate surface to the 
deposition source. As described above, this is a non-equilibrium process, and the system attempts to restore 
equilibrium by forming aggregates. The adatoms randomly walk during the diffusion process until two or 
more collide and subsequently nucleate to form a small cluster. A rate-limiting step is the formation of some 
critical cluster size, at which point cluster growth becomes more probable than cluster decay. The clusters 
increase in size during the growth stage, with the further addition of adatoms leading to island formation. 
Growth proceeds at this stage according to whichever growth mode is favoured. Once deposition has ceased, 
further island morphological changes occur during the coarsening stage, whereby atoms in small islands 
evaporate and add to other islands or adsorb onto available high-energy adsorption sites such as step edge 
sites. For an excellent review on the atomic view of epitaxial metal growth, see [19]. 

Experimentally, the variable-temperature STM has enabled great strides to be made towards understanding 
nucleation and growth kinetics on surfaces. The evolution of overlayer growth can be followed using STM 
from the first stages of adatom nucleation through the final stages of island formation. The variable- 
temperature STM has also been crucial to obtaining surface diffusion rates. In such cases, however, the 
importance of tip-sample interactions must be considered. Typically, low tunnelling currents are best because 
under these conditions the tip is further from the surface, thereby reducing the risk of tip-sample interactions.

Much effort in recent years has been aimed at modelling nucleation at surfaces and several excellent reviews
exist [20, 21 and 22]. Mean-field nucleation theory is one of these models and has a simple picture at its core.
In the nucleation stage, an atom arriving at the surface from the gas phase adsorbs and then diffuses at a 
particular rate until it collides with another surface adatom to form a dimer. If the dimers are assumed to be 
stable (so that no decay occurs) and immobile (so that no diffusion occurs) then, as deposition proceeds, the 
concentration of dimers will increase approximately linearly until it is roughly equal to the concentration of 
monomers. At this point, the probability of an atom colliding with a dimer is comparable to the probability of 
an adatom colliding with another adatom, hence growth and nucleation compete. Once the island density has 
saturated, i.e. no more clusters are being formed, then the adatom mean free path is equal to the mean island 
separation and further deposition results in island growth. At coverages near 0.5 monolayers (ML), islands 
begin to coalesce and the island density decreases. 
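
The competition between nucleation and growth described above can be sketched with a pair of mean-field rate equations for the monomer density n1 and the density of stable islands nx. The form used here is a simplified textbook version and is an assumption rather than the specific model of the cited reviews: dimers are stable and immobile, all capture numbers are set to unity, and direct impingement and coalescence are neglected. Densities are expressed per adsorption site, coverage in ML is the independent variable, and the ratio of diffusion rate to deposition flux is a hypothetical value.

```python
def integrate_rate_equations(ratio_d_over_f=1.0e5, coverage_max=0.1, step=1.0e-5):
    """Explicit Euler integration of simplified mean-field nucleation equations.

    dn1/dtheta = 1 - 2*R*n1**2 - R*n1*nx   (deposition, nucleation, capture)
    dnx/dtheta = R*n1**2                   (two adatoms meet to form a stable island)
    where R = ratio_d_over_f. Returns a list of (coverage, n1, nx) tuples.
    """
    n1, nx, coverage = 0.0, 0.0, 0.0
    history = []
    n_steps = int(round(coverage_max / step))
    for _ in range(n_steps):
        nucleation = ratio_d_over_f * n1 * n1  # adatom-adatom encounters
        capture = ratio_d_over_f * n1 * nx     # adatom-island encounters
        n1 += step * (1.0 - 2.0 * nucleation - capture)
        nx += step * nucleation
        coverage += step
        history.append((coverage, n1, nx))
    return history


history = integrate_rate_equations()
for coverage, n1, nx in history[999::3000]:
    print(f"coverage = {coverage:.3f} ML, monomers = {n1:.2e}, islands = {nx:.2e}")
```

In this sketch the monomer density rises, passes through a maximum and then falls as existing islands capture most arriving adatoms, while the island density grows ever more slowly. Because coalescence is neglected, the decrease in island density at higher coverage is not reproduced.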

This simple and idealized picture of nucleation and growth from mean-field nucleation theory was found to be
highly descriptive of the Ag/Pt(111) system at 75 K (figure A3.10.6) [23]. Figure A3.10.6 shows a series of
STM images of increasing Ag coverage on Pt(111) and demonstrates the transition from nucleation to growth.
At very low coverages ((a) and (b)), the average cluster size is 2.4 and 2.6 atoms, respectively, indicating that
dimers and trimers are the predominant surface species. Although the coverage was more than doubled
from (a) to (b), the mean island size remained relatively constant. This result clearly indicates that deposition
at these low coverages is occurring in the nucleation regime. On increasing the coverage to 0.03 ML, the Ag
mean island size more than doubled to 6.4 atoms and the island density increased, indicating that nucleation and growth
were competing. Finally, after increasing the coverage even further (d), the mean island size doubled again,
while the island density saturated, suggesting that a pure growth regime dominated, with little or no
nucleation occurring.

Growth reactions at surfaces will certainly continue to be the focus of much research. In particular, the 
synthesis of novel materials is an exciting field that holds much promise for the nanoscale engineering of 
materials. Undoubtedly, the advent of STM as a means of investigating growth reactions on the atomic scale 
will influence the future of nanoscale technology.

Figure A3.10.6 A series of STM images for Ag/Pt(111) at 75 K showing the transition from nucleation to
growth [23]. Coverages (θ) and mean island sizes (n) are indicated.


A3.10.4 ETCHING 

A3.10.4.1 INTRODUCTION

Etching is a process by which material is removed from a surface. The general idea behind etching is that by 
interaction of an etch atom or molecule with a surface, a surface species can be formed that is easily removed. 
The use of a liquid to etch a surface is known as wet etching, while the use of a gas to etch a surface is known 
as dry etching. Wet etching has been employed since the late Middle Ages. The process then was rather 
simple and could be typified as follows. The metal to be etched was first coated with a wax, or in modern 
vernacular, a mask. Next, a pattern was cut into the wax to reveal the metal surface beneath. Then, an acid 
was used to etch the exposed metal, resulting in a patterned surface. Finally, the mask was removed to reveal 
the finished product. Modern methods are considerably more technologically advanced, although the general 
principles behind etching remain unchanged.

Both wet and dry etching are used extensively in the semiconductor processing industry. However, wet
etching has limitations that prevent it from being used to generate micron or submicron pattern sizes for GaAs
etching. The most serious of these limitations is called substrate undercutting, which is a phenomenon where
etch rates parallel and perpendicular to the surface are approximately equal (isotropic etching). Substrate
undercutting is much less prevalent for silicon surfaces than GaAs surfaces, thus wet etching is more
commonly used to etch silicon surfaces. Generally, when patterning surfaces, anisotropic etching is preferred,
where etch rates perpendicular to the surface exceed etch rates parallel to the surface. Hence, in cases of
undercutting, an ill-defined pattern typically results. In the early 1970s, dry etching (with CF4/O2, for
example) became widely used for patterning. Dry methods have a distinct advantage over wet methods,
namely anisotropic etching.

A form of anisotropic etching that is of some importance is that of orientation-dependent etching, where one 
particular crystal face is etched at a faster rate than another crystal face. A commonly used orientation- 
dependent wet etch for silicon surfaces is a mixture of KOH in water and isopropanol. At approximately 350 
K, this etchant has an etch rate of 0.6 μm min^-1 for the Si(100) plane, 0.1 μm min^-1 for the Si(110) plane and
0.006 μm min^-1 for the Si(111) plane [24]. These different etch rates can be exploited to yield anisotropically
etched surfaces. 
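
The degree of orientation dependence implied by these etch rates can be expressed as simple ratios. The short sketch below uses only the three rates quoted in the text; the function names and the 30 μm example depth are hypothetical.

```python
# Etch rates in micrometres per minute, as quoted in the text for KOH/water/isopropanol.
ETCH_RATE_UM_PER_MIN = {"Si(100)": 0.6, "Si(110)": 0.1, "Si(111)": 0.006}


def selectivity(fast_plane, slow_plane, rates=ETCH_RATE_UM_PER_MIN):
    """Ratio of the etch rates of two crystal planes."""
    return rates[fast_plane] / rates[slow_plane]


def etch_time_minutes(depth_um, plane, rates=ETCH_RATE_UM_PER_MIN):
    """Time needed to etch a given depth normal to the chosen plane."""
    return depth_um / rates[plane]


print(f"(100)/(111) selectivity: {selectivity('Si(100)', 'Si(111)'):.0f}")
print(f"(110)/(111) selectivity: {selectivity('Si(110)', 'Si(111)'):.1f}")
print(f"time to etch 30 um into Si(100): {etch_time_minutes(30.0, 'Si(100)'):.0f} min")
```

The (100) plane is removed about one hundred times faster than the (111) plane, which is why (111) facets effectively act as etch stops in this etchant.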

Semiconductor processing consists of a number of complex steps, of which etching is an integral part. Figure
A3.10.7 shows an example of the use of etching [25], in which the goal is to remove
certain parts of a film, while leaving the rest in a surface pattern to serve as, for example, interconnection
paths. This figure illustrates schematically how etching paired with a technique called photolithography can
be used to manufacture a semiconductor device. In this example, the substrate enters the manufacturing
stream covered with a film (for example, a SiO2 film on a Si wafer). A liquid thin film called a photoresist
(denoted 'positive resist' or 'negative resist', as explained below) is first placed on the wafer, which is then 
spun at several thousand rotations per minute to spread out the film and achieve a uniform coating. Next, the 
wafer is exposed through a mask plate to an ultraviolet (UV) light source. The UV photons soften certain 
resists (positive resists) and harden others (negative resists). Next, a developer solution is used to remove the 
susceptible area, leaving behind the remainder according to the mask pattern. Then, the wafer is etched to 
remove all of the surface film not protected by the photoresist. Finally, the remaining photoresist is removed, 
revealing a surface with a patterned film. Thus the role of etching in semiconductor processing is vital and it 
is evident that motivation exists to explore etching reactions on a fundamental level.

Figure A3.10.7 The role of etching in photolithography [25].

A3.10.4.2 DRY ETCHING TECHNIQUES

It has already been mentioned that dry etching involves the interaction of gas phase molecules/atoms with a 
surface. More specifically, dry etching utilizes either plasmas that generate reactive species, or energetic ion 
beams to etch surfaces. Dry etching is particularly important to GaAs processing because, unlike silicon, there 
are no wet etching methods that result in negligible undercutting. Dry etching techniques can be characterized 
by either chemical or physical etching mechanisms. The chemical mechanisms tend to be more selective, i.e. 
more anisotropic, and tend to depend strongly on the specific material being etched. Several dry etch 
techniques will be briefly discussed below. For a more comprehensive description of these and other 
techniques, the reader is referred to the texts by Williams [26] or Sugawara [27].

Ion milling is a dry etch technique that uses a physical etching mechanism. In ion milling, ions of an inert gas
are generated and then accelerated to impinge on a surface. The etching mechanism is simply the 
bombardment of these energetic ions on the surface, resulting in erosion. The energy of the ions can be 
controlled by varying the accelerating voltage, and it may be possible to change the selectivity by varying the 
angle of incidence. 


Plasma etching is a term used to describe any dry etching process that utilizes reactive species generated from 
a gas plasma. For semiconductor processing, a low-pressure plasma, also called a glow discharge, is used. The 
glow discharge is characterized by pressures in the range 0.1-5 Torr and electron energies of 1-10 eV. The 
simplest type of plasma reactor consists of two parallel plates in a vacuum chamber filled with a gas at low 
pressure. A radio frequency (RF) voltage is applied between the two plates, generating plasma that emits a 
characteristic glow. Reactive radicals are produced by the plasma, resulting in a collection of gas phase 
species that are the products of collisions between photons, electrons, ions and atoms or molecules. These 
chemically reactive species can then collide with a nearby surface and react to form a volatile surface species, 
thereby etching the surface. 

Reactive ion etching (RIE) is distinguished from plasma etching by the fact that the surface reactions are 
enhanced by the kinetic energy of the incoming reactive species. This type of chemical mechanism is referred 
to as a kinetically assisted chemical reaction, and very often results in highly anisotropic etching. RIE is 
typically performed at low pressures (0.01-0.1 Torr) and is used industrially to etch holes in GaAs. 

Dry etching is a commonly used technique for creating highly anisotropic, patterned surfaces. The interaction 
of gas phase etchants with surfaces is of fundamental interest to understanding such phenomena as 
undercutting and the dependence of etch rate on surface structure. Many surface science studies aim to 
understand these interactions at an atomic level, and the next section will explore what is known about the 
etching of silicon surfaces. 

A3.10.4.3 ATOMIC VIEW OF ETCHING 

On the atomic level, etching is composed of several steps: diffusion of the etch molecules to the surface, 
adsorption to the surface, subsequent reaction with the surface and, finally, removal of the reaction products. 
The third step, that of reaction between the etchant and the surface, is of considerable interest to the 
understanding of surface reactions on an atomic scale. In recent years, STM has given considerable insight 
into the nature of etching reactions at surfaces. The following discussion will focus on the etching of silicon 
surfaces [28]. 

Figure A3.10.8 schematically depicts a Si(100) surface (a) being etched to yield a rough surface (b) and a
more regular surface (c). The surfaces shown here are seen to consist of steps, terraces and kinks, and clearly
have a three-dimensional character, rather than the two-dimensional character of an ideally flat, smooth
surface. The general etching mechanism is based on the use of halogen molecules, the principal etchants used
in dry etching. Upon adsorption on silicon at room temperature, Br2 dissociates to form bromine atoms, which
react with surface silicon atoms. Then, if an external source of energy is provided, for example by heating
Si(100) to 900 K, SiBr2 forms and desorbs, revealing the silicon atom(s) beneath and completing the etching
process. Depending upon the relative desorption energies from various surface sites, the surface could be
etched quite differently, as seen in figure A3.10.8(b) and figure A3.10.8(c).

Figure A3.10.8 Depiction of etching on a Si(100) surface. (a) A surface exposed to Br2 as well as electrons,
ions and photons. Following etching, the surface either becomes highly anisotropic with deep etch pits (b), or
more regular (c), depending on the relative desorption energies for different surface sites [28].

Semiconductors such as silicon often undergo rearrangements, or reconstructions, at surface boundaries to
lower their surface free energy. One way of lowering the surface free energy is the reduction of dangling
bonds, which are non-bonding orbitals that extend (dangle) into the vacuum. Si(111) undergoes a complex
(7 × 7) reconstruction that was ultimately solved using STM. Figure A3.10.9(a) shows an STM image of the
reconstructed Si(111) surface [29]. This reconstruction reduces the number of dangling bonds from 49 to 19
per unit cell.

The (7 × 7) reconstruction also affects the second atomic layer, called the rest layer. The rest layer is
composed of silicon atoms arranged in triangular arrays that are separated from one another by rows of silicon
dimers. Figure A3.10.9(b) shows the exposed rest layer following bromine etching at 675 K [29]. It is
noteworthy that the rest layer does not reconstruct to form a new (7 × 7) surface. The stability of the rest layer
following etching of (7 × 7)-Si(111) is due to the unique role of the halogen. The silicon adlayer is removed
by insertion of bromine atoms into Si-Si dimer bonds. Once this silicon adlayer is gone, the halogen stabilizes
the silicon rest layer by reacting with the dangling bonds, effectively inhibiting surface reconstruction to a
(7 × 7) phase. Unfortunately, the exposure of the rest layer makes etching more difficult because to form SiBr2,
bromine atoms must insert into stronger Si-Si bonds.

Figure A3.10.9 STM images of Si(111) surfaces before (a) and after (b) etching by bromine at 675 K. In (a)
the (7 × 7) reconstructed surface is seen. In (b), the rest layer consisting of triangular arrays of Si atoms has
been exposed by etching [28]. Both images show a 17 × 17 nm^2 area.

Si(100) reconstructs as well, yielding a (1 × 2) surface phase that is formed when adjacent silicon atoms bond
through their respective dangling bonds to form a more stable silicon dimer. This reconstructed bonding 
results in a buckling of the surface atoms. Furthermore, because Si-Si dimer bonds are weaker than bulk 
silicon bonds, the reconstruction actually facilitates etching. For a comprehensive discussion on STM studies 
of reconstructed silicon surfaces, see [30]. 

Si(100) is also etched by Br2, although in a more dramatic fashion. Figure A3.10.10 shows an STM image of a
Si(100) surface after etching at 800 K [28]. In this figure, the dark areas are etch pits one atomic layer deep.
The bright rows running perpendicular to these pits are silicon dimer chains, which are composed of silicon
atoms that were released from terraces and step edges during etching. The mechanism by which Si(100) is
etched has been deduced from STM studies. After Br2 dissociatively adsorbs to the surface, a bromine atom
bonds to each silicon atom in the dimer pairs. SiBr2 is the known desorption product and so the logical next
step is the formation of a surface SiBr2 species. This step can occur by the breaking of the Si-Si dimer bond
and the transfer of a bromine atom from one of the dimer atoms to the other. Then, if enough energy is
available to overcome the desorption barrier, SiBr2 will desorb, leaving behind a highly under-coordinated silicon
atom that will migrate to a terrace and eventually re-dimerize. On the other hand, if there is not enough energy
to desorb SiBr2, then the Br atom would transfer back to the original silicon atom, and a silicon dimer bond
would again be formed. In this scenario, SiBr2 desorption is essential to the etching process.

Figure A3.10.10 STM image (55 × 55 nm^2) of a Si(100) surface exposed to molecular bromine at 800 K. The
dark areas are etch pits on the terraces, while the bright rows that run perpendicular to the terraces are Si
dimer chains. The dimer chains consist of Si atoms released from terraces and step edges during etching [28].

Another view of the Si(100) etching mechanism has been proposed recently [28]. Calculations have revealed 
that the most important step may actually be the escape of the bystander silicon atom, rather than SiBr2
desorption. In this way, the SiBr2 becomes trapped in a state that otherwise has a very short lifetime,
permitting many more desorption attempts. Preliminary results suggest that indeed this vacancy-assisted
desorption is the key step to etching Si(100) with Br2.

The implementation of tools such as the STM will undoubtedly continue to provide unprecedented views of 
etching reactions and will deepen our understanding of the phenomena that govern these processes. 


A3.10.5 CATALYTIC REACTIONS 

A3.10.5.1 INTRODUCTION

A catalyst is a material that accelerates the approach of a reaction to its thermodynamic equilibrium conversion
without itself being consumed in the reaction. Reactions occur on catalysts at particular sites, called 'active
sites', which may have different electronic and geometric structures from those of neighbouring sites. Catalytic
reactions are at the heart of many chemical industries, and account for a large fraction of worldwide chemical 
production. Research into fundamental aspects of catalytic reactions has a strong economic motivating factor: 
a better understanding of the catalytic process
may lead to the development of a more efficient catalyst. While the implementation of a new catalyst based on
surface science studies has not yet been realized, the investigation of catalysis using surface science methods
has certainly shaped the current understanding of catalytic reactions. Several recommended texts on catalysis
can be found in [31, 32 and 33].

Fundamental studies in catalysis often incorporate surface science techniques to study catalytic reactions at 
the atomic level. The goal of such experiments is to characterize a catalytic surface before, during and after a 
chemical reaction; this is no small task. The characterization of these surfaces is accomplished using a number 
of modern analytical techniques. For example, surface compositions can be determined using x-ray 
photoelectron spectroscopy (XPS) or Auger electron spectroscopy (AES). Surface structures can be probed 
using low-energy electron diffraction (LEED) or STM. In addition, a number of techniques are available for 
detecting and identifying adsorbed species on surfaces, such as infrared reflection absorption spectroscopy, 
high-resolution electron energy-loss spectroscopy (HREELS) and sum frequency generation (SFG). 

As with the other surface reactions discussed above, the steps in a catalytic reaction (neglecting diffusion) are 
as follows: the adsorption of reactant molecules or atoms to form bound surface species, the reaction of these 
surface species with gas phase species or other surface species and subsequent product desorption. The global 
reaction rate is governed by the slowest of these elementary steps, called the rate-determining or rate-limiting 
step. In many cases, it has been found that either the adsorption or desorption steps are rate determining. It is 
not surprising, then, that the surface structure of the catalyst, which is a variable that can influence adsorption 
and desorption rates, can sometimes affect the overall conversion and selectivity. 

Industrial catalysts usually consist of one or more metals supported on a metal oxide. The supported metal can 
be viewed as discrete single crystals on the support surface. Changes in the catalyst structure can be achieved 
by varying the amount, or 'loading', of the metal. An increased loading should result in a particle size 
increase, and so the relative population of a particular crystal face with respect to other crystal faces may 
change. If a reaction rate on a per active site basis changes as the metal loading changes, then the reaction is 
deemed to be structure sensitive. The surface science approach to studying structure-sensitive reactions has 
been to examine the chemistry that occurs over different crystal orientations. In general, these studies have 
shown that close-packed, atomically smooth metal surfaces such as fcc(111), fcc(100) and bcc(110)
are less reactive than more open, rough surfaces such as fcc(110) and bcc(111). The remaining task is then to
relate the structure sensitivity results from single-crystal studies to the activity results over real-world 
catalysts. 
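
The per-site criterion for structure sensitivity described above can be sketched numerically. The following is a minimal illustration only: the rates, metal amounts and dispersions are hypothetical (none are taken from this chapter), and the 20% tolerance is an arbitrary assumption. It normalizes each measured rate by the number of exposed metal atoms to give a turnover frequency, then checks whether that value stays constant as the loading changes.

```python
def turnover_frequency(rate_mol_per_s, metal_mol, dispersion):
    """Rate per exposed metal atom (s^-1).

    rate_mol_per_s : measured product formation rate (mol s^-1)
    metal_mol      : total moles of metal in the sample
    dispersion     : fraction of metal atoms exposed at the surface (0-1]
    """
    if not 0.0 < dispersion <= 1.0:
        raise ValueError("dispersion must lie in (0, 1]")
    surface_sites_mol = metal_mol * dispersion
    return rate_mol_per_s / surface_sites_mol


def is_structure_sensitive(tofs, tolerance=0.2):
    """True if the turnover frequencies differ by more than `tolerance` (relative)."""
    lowest, highest = min(tofs), max(tofs)
    return (highest - lowest) / highest > tolerance


# Hypothetical samples: (rate in mol s^-1, mol of metal, dispersion) for three loadings.
samples = [(2.0e-7, 1.0e-5, 0.80), (6.0e-7, 5.0e-5, 0.40), (8.0e-7, 1.0e-4, 0.20)]
tofs = [turnover_frequency(*s) for s in samples]
print(tofs, is_structure_sensitive(tofs))
```

If the turnover frequency were the same at every loading, the reaction would be classed as structure insensitive by this criterion.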

Surface science studies of catalytic reactions have certainly shed light on the atomic-level view of catalysis.
Despite this success, however, two past criticisms of the surface science approach to catalysis are that the
pressure regimes (usually 10⁻¹⁰ Torr) and the materials (usually low-surface-area single crystals) are far
removed from the high pressures and high-surface-area supported catalysts used industrially. These criticisms
have been termed the 'pressure gap' and the 'materials gap'. In response, much research in the
last 30 years has focused on bridging these gaps, and many advances have been made that now suggest these
criticisms are no longer warranted.

A3.10.5.2 EXPERIMENTAL

(A) BRIDGING THE PRESSURE GAP 

The implementation of high-pressure reaction cells in conjunction with UHV surface science techniques 
allowed the first true in situ postmortem studies of a heterogeneous catalytic reaction. These cells permit 
exposure of a sample to ambient pressures without any significant contamination of the UHV environment. 
The first such cell was internal to the main vacuum chamber and consisted of a metal bellows attached to a 
reactor cup [34]. The cup could be translated using a hydraulic piston to envelop the sample, sealing it from
the surrounding UHV by means of a copper gasket. Once isolated from the vacuum, the activity of the
enclosed sample for a given reaction could be measured at elevated pressures. Following the reaction, the 
high-pressure cell was evacuated and then retracted, exposing the sample again to the UHV environment, at 
which point any number of surface science techniques could be used to study the 'spent' catalyst surface. 

Shortly thereafter, another high-pressure cell design appeared [35]. This design consisted of a sample
mounted on a retractable bellows, permitting the translation of the sample to various positions. The sample 
could be retracted to a high-pressure cell attached to the primary chamber and isolated by a valve, thereby 
maintaining UHV in the primary chamber when the cell was pressurized for catalytic studies. The reactor 
could be evacuated following high-pressure exposures before transferring the sample back to the main 
chamber for analysis. 

A modification to this design appeared several years later (figure A3.10.11) [36, 37]. In this arrangement, the
sample rod can be moved easily between the UHV chamber and the high-pressure cell without any significant
increase in chamber pressure. Isolation of the reaction cell from UHV is achieved by a differentially pumped
sliding seal mechanism (figure A3.10.12) whereby the sample rod is pushed through the seals until it is
located in the high-pressure cell. Three spring-loaded, differentially pumped Teflon seals are used to isolate 
the reaction chamber from the main chamber by forming a seal around the sample rod. Differential pumping 
is accomplished by evacuating the space between the first and second seals (on the low-pressure side) by a 
turbomolecular pump and the space between the second and third seals (on the high-pressure side) by a 
mechanical (roughing) pump. Pressures up to several atmospheres can be maintained in the high-pressure cell 
while not significantly raising the pressure in the attached main chamber. 

The common thread to these designs is that a sample can be exposed to reaction conditions and then studied
using surface science methods without exposure to the ambient. The drawback common to all of them is that
the samples are still analysed under UHV conditions before and after the reaction under study, not during it.
The need for in situ techniques is clear.

Figure A3.10.11 Side view of a combined high-pressure cell and UHV surface analysis system [37].

Figure A3.10.12 Side view of the high-pressure cell showing the connections to the UHV chamber, the
turbomolecular pump and the gas handling system. The differentially pumped sliding seal is located between
the high-pressure cell and the UHV chamber [37].

Two notable in situ techniques are at the forefront of the surface science of catalysis: STM and SFG. STM is
used to investigate surface structures while SFG is used to investigate surface reaction intermediates. The
significance of both techniques is that they can operate over a pressure range of 13 orders of magnitude, from
10⁻¹⁰ to 10³ Torr, i.e. they are truly in situ techniques. STM has allowed the visualization of surface
structures under ambient conditions and has shed light on adsorbate-induced morphological changes that
occur at surfaces, for both single-crystal metals and metal clusters supported on oxide single crystals. Studies
of surface reactions with SFG have given insight into reaction mechanisms previously investigated under
non-ideal pressure or temperature constraints. Both SFG and STM hold promise as techniques that will contribute
greatly to the understanding of catalytic reactions under in situ conditions.

(B) BRIDGING THE MATERIALS GAP 


Single crystals are traditionally used in UHV studies because they allow a surface to be well characterized.
However, as discussed above, single crystals are quite different from industrial catalysts. Typically,
such catalysts consist of supported particles that can have multiple crystal orientations exposed at the surface. 
Therefore, an obstacle in attempting surface science studies of catalysis is the preparation of a surface in such 
a way that it mimics a real-world catalyst. 

One criterion necessary for using charged-particle spectroscopies such as AES and EELS is that the material 
being investigated should be conductive. This requisite prevents problems such as charging when using 
electron spectroscopies and ensures homogeneous heating during thermal desorption studies. A problem then 
with investigating oxide surfaces for use as metal supports is that many are insulators or semiconductors. For 
example, alumina and silica are often used as oxide supports for industrial catalysts, yet both are insulators at 
room temperature, severely hindering surface science studies of these materials. However, thin films of these
and other oxides can be deposited onto metal substrates, thus providing a conductive substrate (via tunnelling)
for use with electron spectroscopies and other surface science techniques.

Thin oxide films may be prepared by substrate oxidation or by vapour deposition onto a suitable substrate. An
example of the former method is the preparation of silicon oxide thin films by oxidation of a silicon wafer. In
general, however, the thickness and stoichiometry of a film prepared by this method are difficult to control.
On the other hand, vapour deposition, which consists of evaporating the parent metal in an oxidizing
environment, allows precise control of the film thickness. The extent of oxidation can be controlled by
varying the O₂ pressure (lower O₂ pressures can lead to lower oxides) and the film thickness can be controlled
by monitoring the deposition rate. A number of these thin metal oxide films have been prepared by vapour
deposition, including SiO₂, Al₂O₃, MgO, TiO₂ and NiO [38].


MgO films have been grown on a Mo(100) substrate by depositing Mg onto a clean Mo(100) sample in O₂
ambient at 300 K [39, 40]. LEED results indicated that MgO grows epitaxially at an optimum O₂ pressure of
10⁻⁷ Torr, with the (100) face of MgO parallel to the Mo(100) surface. Figure A3.10.13 shows a ball model
illustration of the MgO(100) overlayer on Mo(100). The chemical states of Mg and O were also probed as a
function of the O₂ pressure during deposition by AES and XPS. It was found that as the O₂ pressure was
increased, the metallic Mg⁰ (L₂,₃VV) Auger transition at 44.0 eV decreased while a new transition at 32.0 eV
increased. The transition at 32.0 eV was assigned to a Mg²⁺ (L₂,₃VV) transition due to the formation of MgO.
When the O₂ pressure reached 10⁻⁷ Torr, the Mg²⁺ feature dominated the AES spectrum while the Mg⁰
feature disappeared completely. XPS studies confirmed the LEED and AES results, verifying that MgO was
formed at the optimal O₂ pressure. Furthermore, the Mg 2p and O 1s XPS peaks from the MgO film had the
same binding energy (BE) and peak shape as the Mg 2p and O 1s peaks from an MgO single crystal. Both
AES and XPS indicated that the stoichiometry of the film was MgO. Further annealing in O₂ did
not increase the oxygen content of the film, consistent with the absence of any evidence for Mg suboxides.
This MgO film was successfully used to study the nature of surface defects in Li-doped MgO as they
relate to the catalytic oxidative coupling of methane.



Figure A3.10.13 Ball model illustration of an epitaxial MgO overlayer on Mo(100) [38].


The deposition of titanium oxide thin films on Mo(110) represents a case where the stoichiometry of the film
is sensitive to the deposition conditions [41]. It was found that both TiO₂ and Ti₂O₃ thin films could be made,
depending on the Ti deposition rate and the O₂ background pressure. Lower deposition rates and higher O₂
pressures favoured the formation of TiO₂. The two compositionally different films could be distinguished in
several ways. Different LEED patterns were observed for the different films: TiO₂ exhibited a (1 × 1)
rectangular periodicity, while Ti₂O₃ exhibited a (1 × 1) hexagonal pattern. XPS Ti 2p data clearly
differentiated the two films as well, showing narrow peaks with a Ti 2p3/2 BE of 459.1 eV for TiO₂ and broad
peaks with a Ti 2p3/2 BE of 458.1 eV for Ti₂O₃. From LEED and HREELS results, it was deduced that the
surfaces grown on Mo(110) were TiO₂(100) and Ti₂O₃(0001). Therefore, it is clear that vapour deposition
allows control over thickness and extent of oxidation and is certainly a viable method for producing thin oxide
films for use as model catalyst supports.

Metal vapour deposition is a method that can be used to conveniently prepare metal clusters for investigation
under UHV conditions. The deposition is accomplished using a doser constructed by wrapping a high-purity
wire of the metal to be deposited around a tungsten or tantalum filament that can be resistively heated. After
sufficient outgassing, which is the process of heating the doser to remove surface and bulk impurities, a
surface such as an oxide can be exposed to the metal emanating from the doser to yield a model
oxide-supported metal catalyst.

Model catalysts such as Au/TiO₂(110) have been prepared by metal vapour deposition [42]. Figure A3.10.14
shows an STM image of 0.25 ML (1 ML = 1.387 × 10¹⁵ atoms cm⁻²) Au/TiO₂(110). These catalysts were
tested for CO oxidation to compare to conventional Au catalysts. It is well known that for conventional Au
catalysts there is an optimal Au cluster size (~3 nm) that yields a maximum CO oxidation rate. This result was
reproduced by measuring the CO oxidation rate over model Au/TiO₂(110), where the cluster sizes were varied
by manipulating the deposition amounts. There is a definite maximum in the CO oxidation activity at a cluster
size of approximately 3.5 nm. Furthermore, investigation of the cluster electronic properties using scanning
tunnelling spectroscopy (STS) revealed a correlation between the cluster electronic structure and the maximum
in CO oxidation activity. Pd/SiO₂/Mo(100) model catalysts were also prepared and were found to have
remarkably similar kinetics for CO oxidation when compared to Pd single crystals and conventional
silica-supported Pd catalysts [43]. These results confirm that metal vapour deposition on a suitable substrate is
a viable method for producing model surfaces for UHV studies.
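
As a quick check on the coverage quoted for the STM image, the monolayer definition given above converts directly to an areal density. This is plain arithmetic on the numbers in the text; the function and constant names are illustrative only.

```python
# Monolayer definition quoted in the text for Au on TiO2(110), in atoms cm^-2.
ATOMS_PER_ML = 1.387e15


def coverage_to_areal_density(coverage_ml, atoms_per_ml=ATOMS_PER_ML):
    """Convert a coverage in monolayers (ML) to atoms per cm^2."""
    if coverage_ml < 0:
        raise ValueError("coverage cannot be negative")
    return coverage_ml * atoms_per_ml


density = coverage_to_areal_density(0.25)
print(f"0.25 ML corresponds to {density:.2e} Au atoms cm^-2")
```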



Figure A3.10.14 STM image of 0.25 ML Au vapour-deposited onto TiO₂(110). Atomic resolution of the
substrate is visible as parallel rows. The Au clusters are seen to nucleate preferentially at step edges.

Another method by which model supported catalysts can be made is electron beam lithography [44]. This
method entails spin-coating a polymer solution onto a substrate and then using a collimated electron beam to
damage the polymer surface according to a given pattern. Next, the damaged polymer is removed, exposing
the substrate according to the electron beam pattern, and the sample is coated with a thin metal film. Finally,
the polymer is removed from the substrate, taking with it the metal film except where the metal was bound to
the substrate, leaving behind metal particles of variable size. This technique has been used to prepare Pt
particles with 50 nm diameters and 15 nm heights on an oxidized silicon support [44]. It was found that
ethylene hydrogenation reaction rates on the model catalysts agreed well with turnover rates on Pt single
crystals and conventional supported Pt catalysts.

A3.10.5.3 ATOMIC-LEVEL VIEWS OF CATALYSIS


(A) NH₃ SYNTHESIS: N₂ + 3H₂ ⇌ 2NH₃


Ammonia has been produced commercially from its component elements since 1909, when Fritz Haber first 
demonstrated the viability of this process. Bosch, Mittasch and co-workers discovered an excellent promoted 
Fe catalyst in 1909 that was composed of iron with aluminium oxide, calcium oxide and potassium oxide as 
promoters. Surprisingly, modern ammonia synthesis catalysts are nearly identical to that first promoted iron 
catalyst. The reaction is somewhat exothermic and is favoured at high pressures and low temperatures, 
although, to keep reaction rates high, moderate temperatures are generally used. Typical industrial reaction 
conditions for ammonia synthesis are 650-750 K and 150-300 atm. Given the technological importance of the
ammonia synthesis reaction, it is not surprising that surface science techniques have been used to study
this reaction thoroughly on a molecular level [45, 46].
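
The statement that the synthesis is favoured at high pressure follows from the stoichiometry: four moles of gas become two. A minimal sketch of this, assuming ideal gases and a stoichiometric 1:3 N₂:H₂ feed, is given below. The equilibrium constant used in the example is a hypothetical placeholder rather than a measured value, and the function name is illustrative.

```python
def ammonia_equilibrium_extent(kp, pressure, tol=1e-12):
    """Equilibrium extent xi (0..1) of N2 + 3H2 <=> 2NH3 for a 1:3 feed.

    kp       : equilibrium constant on a partial-pressure basis, with
               pressures expressed relative to the standard pressure
    pressure : total pressure in units of the standard pressure
    Ideal-gas behaviour is assumed throughout.
    """
    def quotient(xi):
        total = 4.0 - 2.0 * xi               # total moles per mole of N2 fed
        y_n2 = (1.0 - xi) / total
        y_h2 = 3.0 * (1.0 - xi) / total
        y_nh3 = 2.0 * xi / total
        # Q_p = Q_y * P**(-2) because two moles of gas are lost on reaction
        return y_nh3**2 / (y_n2 * y_h2**3) / pressure**2

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The quotient rises monotonically with xi, so bisection converges.
        if quotient(mid) < kp:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


KP_HYPOTHETICAL = 1.0e-4  # placeholder value, not taken from the text
for p in (1.0, 20.0, 200.0):
    xi = ammonia_equilibrium_extent(KP_HYPOTHETICAL, p)
    print(f"P = {p:6.1f}: equilibrium extent = {xi:.3f}")
```

For a fixed equilibrium constant the extent of reaction rises with pressure. The exothermicity noted above means the equilibrium constant itself falls as the temperature is raised, which is the trade-off against rate described in the text.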

As mentioned above, a structure-sensitive reaction is one with a reaction rate that depends on the catalyst
structure. The synthesis of ammonia from its elemental components over iron surfaces is an example of a
structure-sensitive reaction. Figure A3.10.15 demonstrates this structure sensitivity by showing that the rate of
NH₃ formation at 20 atm and 600-700 K has a clear dependence on the surface structure [47]. The (111) and
(211) Fe faces are much more active than the (100), (210) and (110) faces. Figure A3.10.16 depicts the
different Fe surfaces for which ammonia synthesis was studied in figure A3.10.15. The coordination of the
different surface atoms is denoted in each drawing. Surface roughness is often associated with higher catalytic
activity; however, in this case the (111) and (210) surfaces, both of which can be seen to be atomically rough,
have distinctly different catalytic activities. Closer inspection of these surfaces reveals that the (111) and
(211) faces have a C₇ site in common, i.e. a surface Fe atom with seven nearest neighbours. The high catalytic
activity of the (111) and (211) Fe faces has been proposed to be due to the presence of these C₇ sites.


[Figure: NH₃ synthesis activity versus surface orientation for the (111), (211), (100), (210) and (110) Fe faces; T = 673 K, 20 atm, 3:1 H₂:N₂.]

Figure A3.10.15 NH₃ synthesis activity of different Fe single-crystal orientations [32]. Reaction conditions
were 20 atm and 600-700 K.

Figure A3.10.16 Illustrations of the surfaces in figure A3.10.15 for which ammonia synthesis activity was
tested. The coordination of the surface atoms is noted in the figure [32].

It is widely accepted that the rate-determining step in NH₃ synthesis is the dissociative adsorption of N₂,
depicted in a Lennard-Jones potential energy diagram in figure A3.10.17 [46]. This result is clearly illustrated
by examining the sticking coefficient (the adsorption rate divided by the collision rate) of N₂ on different Fe
crystal faces (figure A3.10.18) [48]. The concentration of surface nitrogen on the Fe single crystals at elevated
temperatures in UHV was monitored with AES as a function of N₂ exposure. The sticking coefficient is
proportional to the slope of the curves in figure A3.10.18. The initial sticking coefficients increase in the order
(110) < (100) < (111), which is the same trend observed for the ammonia synthesis catalytic activity at high
pressure (20 atm). This result indicates that the pressure gap for ammonia synthesis can be overcome: the
kinetics results obtained in UHV conditions can be readily extended to the kinetics results obtained under
high-pressure reaction conditions.
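
The definition of the sticking coefficient used above (adsorption rate divided by collision rate) can be made concrete with a short sketch. The collision rate is taken here from the kinetic-theory (Hertz-Knudsen) impingement flux for an ideal gas, which is an assumption of this illustration, and the pressure, temperature and uptake rate in the example are hypothetical.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J K^-1
AMU = 1.66053906660e-27    # atomic mass unit, kg
TORR_TO_PA = 133.322368    # pascals per Torr


def collision_flux(pressure_torr, mass_amu, gas_temperature_k):
    """Ideal-gas impingement rate on a surface, in molecules cm^-2 s^-1."""
    pressure_pa = pressure_torr * TORR_TO_PA
    mass_kg = mass_amu * AMU
    flux_per_m2 = pressure_pa / math.sqrt(
        2.0 * math.pi * mass_kg * K_B * gas_temperature_k
    )
    return flux_per_m2 * 1.0e-4  # convert m^-2 to cm^-2


def sticking_coefficient(adsorption_rate, pressure_torr, mass_amu, gas_temperature_k):
    """Adsorption rate (species cm^-2 s^-1) divided by the collision rate."""
    return adsorption_rate / collision_flux(pressure_torr, mass_amu, gas_temperature_k)


# Hypothetical example: N2 at 1e-6 Torr and 300 K, with an initial uptake
# rate read from the slope of a coverage-versus-time curve.
flux = collision_flux(1.0e-6, 28.0134, 300.0)
s0 = sticking_coefficient(4.0e8, 1.0e-6, 28.0134, 300.0)
print(f"flux = {flux:.2e} cm^-2 s^-1, initial sticking coefficient = {s0:.1e}")
```

Because the flux is fixed by the gas pressure and temperature, comparing the initial slopes of uptake curves measured under the same dosing conditions compares the initial sticking coefficients directly.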

Figure A3.10.17 Potential energy diagram for the dissociative adsorption of N₂ [46].

Figure A3.10.18 Surface concentration of nitrogen on different Fe single crystals following N₂ exposure at
elevated temperatures in UHV [48].

Further work on modified Fe single crystals explored the role of promoters such as aluminium oxide and
potassium [49, 50 and 51]. It was found that the simple addition of aluminium oxide to Fe single-crystal
surfaces decreased the ammonia synthesis rate in proportion to the amount of Fe surface covered, indicating
no favourable interaction between Fe and aluminium oxide under those conditions. However, by exposing an
aluminium-oxide-modified Fe surface to water vapour, the surface was oxidized, inducing a favourable
interaction between Fe and the AlxOy. This interaction resulted in a 400-fold increase in ammonia synthesis
activity for AlxOy/Fe(110) as compared to Fe(110) and an activity for AlxOy/Fe(110) comparable to that of
Fe(111). Interestingly, aluminium-oxide-modified Fe(111) showed no change in activity. The increase in activity
for AlxOy/Fe(110) to that of Fe(111) suggests a possible reconstruction
of the catalyst surface, in particular that Fe(111) and Fe(211) surfaces may be formed. These surfaces have C₇
sites and so the formation of crystals with these orientations could certainly lead to an enhancement in
catalytic activity. Thus, the promotion of Fe ammonia synthesis catalysts by AlxOy appears to be primarily a
geometric effect.


The addition of potassium to Fe single crystals also enhances the activity for ammonia synthesis. Figure
A3.10.19 shows the effect of surface potassium concentration on the N₂ sticking coefficient. There is nearly a
300-fold increase in the sticking coefficient as the potassium concentration reaches ~1.5 × 10¹⁴ K atoms cm⁻².
Not only does the sticking coefficient increase, but with the addition of potassium as a promoter, N₂
molecules are bound more tightly to the surface, with the adsorption energy increasing from 30 to 45 kJ mol⁻¹.
A consequence of the lowering of the N₂ potential well is that the activation energy for dissociation (E* in
figure A3.10.17) also decreases. Thus, the promotion of Fe ammonia synthesis catalysts by potassium
appears to be primarily an electronic effect.

Figure A3.10.19 Variation of the initial sticking coefficient of N₂ with increasing potassium surface
concentration on Fe(100) at 430 K [50].

(B) ALKANE HYDROGENOLYSIS 

Alkane hydrogenolysis, or cracking, involves the dissociation of a larger alkane molecule into smaller alkane
molecules. For example, ethane hydrogenolysis in the presence of H₂ yields methane:

C₂H₆ + H₂ → 2CH₄.

Cracking (or hydrocracking, as it is referred to when carried out in the presence of H₂) reactions are an
integral part of petroleum refining. Hydrocracking is used to lower the average molecular weight (MW) of a
higher MW hydrocarbon mixture so that it can then be blended and sold as gasoline. The interest in the
fundamentals of catalytic cracking reactions is strong and it has been thoroughly researched.

Ethane hydrogenolysis has been shown to be structure sensitive over nickel catalysts [43], as seen in figure
A3.10.20 where methane formation rates are plotted for both nickel single crystals and a conventional,
supported nickel catalyst. There is an obvious difference in the rates over Ni(111) and Ni(100), and it is
evident that the rate also changes as a function of particle size for the supported Ni catalysts. In addition,
differences in activation energy were observed: for Ni(111) the activation energy is 192 kJ mol⁻¹, while for
Ni(100) the activation energy is 100 kJ mol⁻¹. It is noteworthy that the hydrogenolysis rates over supported
Ni catalysts overlap with those over the Ni single crystals. The data suggest that small Ni
particles are composed primarily of Ni(100) facets while large Ni particles are composed primarily of Ni(111)
facets. In fact, this has been observed for fcc materials, where surfaces with a (111) orientation are more
commonly observed after thermally induced sintering. The structure sensitivity of this reaction over Ni
surfaces has been clearly demonstrated.
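
The two activation energies quoted above imply very different temperature dependences. A minimal sketch, assuming simple Arrhenius behaviour k = A exp(-Ea/RT), is given below. Since no pre-exponential factors are given in the text, only ratios of rates at two temperatures are computed (the prefactor cancels), and the two temperatures are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def arrhenius_rate_ratio(ea_kj_per_mol, t1_k, t2_k):
    """k(T2)/k(T1) for an Arrhenius rate constant; the prefactor cancels."""
    ea = ea_kj_per_mol * 1.0e3
    return math.exp(-ea / R * (1.0 / t2_k - 1.0 / t1_k))


def apparent_activation_energy(k1, t1_k, k2, t2_k):
    """Ea in kJ mol^-1 from two rates, i.e. from the slope of an Arrhenius plot."""
    slope = math.log(k2 / k1) / (1.0 / t2_k - 1.0 / t1_k)
    return -R * slope / 1.0e3


# Activation energies from the text; the temperatures are illustrative only.
T1, T2 = 500.0, 550.0
ratio_111 = arrhenius_rate_ratio(192.0, T1, T2)
ratio_100 = arrhenius_rate_ratio(100.0, T1, T2)
print(f"Rate increase from {T1:.0f} K to {T2:.0f} K: "
      f"Ni(111) x{ratio_111:.1f}, Ni(100) x{ratio_100:.1f}")
```

The second function runs the calculation in reverse, which is how an activation energy is read from the slope of an Arrhenius plot such as figure A3.10.20.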

Figure A3.10.20 Arrhenius plot of ethane hydrogenolysis activity for Ni(100) and Ni(111) at 100 Torr and
H₂/C₂H₆ = 100. Also included is the hydrogenolysis activity on supported Ni catalysts at 175 Torr and
H₂/C₂H₆ = 6.6 [43].


The initial step in alkane hydrogenolysis is the dissociative adsorption, or 'reactive sticking', of the alkane.
One might suspect that this first step may be the key to the structure sensitivity of this reaction over Ni
surfaces. Indeed, the reactive sticking of alkanes has been shown to depend markedly on surface structure
[52]. Figure A3.10.21 shows the buildup of surface carbon due to methane decomposition (P_methane = 1.00
Torr) over three single-crystal Ni surfaces at 450 K. The rate of methane decomposition clearly depends
on the surface structure, with the decomposition rate increasing in the order (111) < (100) <
(110). It can be seen that, initially, the rates of methane decomposition are
similar for Ni(100) and (110), while Ni(111) has a much lower reaction rate. With increasing reaction time,
i.e. increasing carbon coverage, the carbon buildup over Ni(110) continues to increase linearly while both
Ni(111) and (100) exhibit a nonlinear dependence. This linear dependence over Ni(110) may be due to either
the formation of carbon islands or a reduced carbon coverage dependence as compared to Ni(111) and (100).

Figure A3.10.21 Methane decomposition kinetics on low-index Ni single crystals at 450 K and 1.00 Torr
methane [43].

Hydrogenolysis reactions over Ir single crystals and supported catalysts have also been shown to be structure
sensitive [53, 54 and 55]. In particular, it was found that the reactivity tracked the concentration of
low-coordination surface sites. Figure A3.10.22 shows ethane selectivity (selectivity is reported here because both
ethane and methane are products of butane cracking) for n-butane hydrogenolysis over Ir(111) and the
reconstructed surface Ir(110)-(1 × 2), as well as two supported Ir catalysts. There are clear selectivity
differences between the two Ir surfaces, with Ir(110)-(1 × 2) having approximately three times the ethane
selectivity of Ir(111). There is also a similarity seen between the ethane selectivity on small Ir particles and
Ir(110)-(1 × 2), and between the ethane selectivity on large Ir particles and Ir(111).

Figure A3.10.22 Relationship between selectivity and surface structure for n-butane hydrogenolysis on
iridium. (a) Illustrations of the Ir(110)-(1 × 2) and Ir(111) surfaces. The z-axis is perpendicular to the plane of
the surface. (b) Selectivity for C₂H₆ production (mol% total products) for n-butane hydrogenolysis on both Ir
single crystals and supported catalysts at 475 K. The effective particle size for the single crystal surfaces is
based on the specified geometric shapes [43]. Supported catalysts shown: Ir/Al₂O₃ and Ir/SiO₂.

The mechanisms by which n-butane hydrogenolysis occurs over Ir(110)-(1 × 2) and Ir(111) are different. The
high ethane selectivity of Ir(110)-(1 × 2) has been attributed to the 'missing row' reconstruction that the (110)
surface undergoes (figure A3.10.22). This reconstruction results in the exposure of a highly uncoordinated C₇
site that is sterically unhindered. These C₇ sites are capable of forming a metallocyclopentane (a
five-membered ring consisting of four carbons and an Ir atom) which, based on kinetics and surface carbon
coverages, has been suggested as the intermediate for this reaction [56, 57]. It has been proposed that the
crucial step in this reaction mechanism over the reconstructed (110) surface is the reversible cleavage of the
central C-C bond. On the other hand, the hydrogenolysis of n-butane over Ir(111) is thought to proceed by a
different mechanism, where dissociative chemisorption of n-butane and hydrogen are the first steps. Then, the
adsorbed hydrocarbon undergoes the irreversible cleavage of the terminal C-C bond. It is evident that surface
structure plays an important role in hydrogenolysis reactions over both nickel and iridium surfaces.

(C) CO OXIDATION: 2CO + O₂ → 2CO₂

The oxidation of CO to CO₂, which is essential to controlling automobile emissions, has been extensively
studied because of the relative simplicity of this reaction. CO oxidation was the first reaction to be studied
using the surface science approach and is perhaps the best understood heterogeneous catalytic reaction
[58]. The simplicity of CO oxidation by O₂ lends itself to surface science studies. Both reactants are
diatomic molecules whose adsorption on single-crystal surfaces has been widely studied, and presumably
few steps are necessary to convert CO to CO₂. Surface science studies of CO and O₂ adsorption on metal
surfaces have provided tremendous insight into the mechanism of the CO-O₂ reaction. The mechanism over
platinum surfaces has been unequivocally established and the reaction has shown structure insensitivity over
platinum [59], palladium [55, 59, 60] and rhodium surfaces [61, 62].

Although dissociative adsorption is sometimes observed, CO adsorption on platinum group metals typically
occurs molecularly and this will be the focus of the following discussion. Figure A3.10.23 illustrates
schematically the donor-acceptor model (first proposed by Blyholder [63]) for molecular CO chemisorption
on a metal such as platinum. The bonding of CO to a metal surface is widely accepted to be similar to bond
formation in a metal carbonyl. Experimental evidence indicates that the 5σ highest occupied molecular orbital
(HOMO), which is regarded as a lone pair on the carbon atom, bonds to the surface by donating charge to
unoccupied density of states (DOS) at the surface. Furthermore, this surface bond can be strengthened by
back-donation, which is the transfer of charge from the surface to the 2π* lowest unoccupied molecular
orbital (LUMO). An effect of this backbonding is that the C-O bond weakens, as seen by a lower C-O stretch
frequency for adsorbed CO (typically <2100 cm⁻¹) than for gas phase CO (2143 cm⁻¹).
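
The link between a lower stretch frequency and a weaker bond can be illustrated with the harmonic diatomic relation k = μ(2πcν̃)², where μ is the reduced mass and ν̃ the wavenumber. The sketch below is a rough estimate under stated assumptions: it treats adsorbed CO as an isolated harmonic oscillator, ignoring coupling to the surface and anharmonicity, and it uses 2100 cm⁻¹ as an upper bound for the adsorbed species.

```python
import math

C_CM_PER_S = 2.99792458e10   # speed of light, cm s^-1
AMU = 1.66053906660e-27      # atomic mass unit, kg


def reduced_mass_kg(m1_amu, m2_amu):
    """Reduced mass of a diatomic molecule in kg."""
    return m1_amu * m2_amu / (m1_amu + m2_amu) * AMU


def force_constant(wavenumber_cm, mu_kg):
    """Harmonic force constant k = mu * (2*pi*c*wavenumber)^2, in N m^-1."""
    omega = 2.0 * math.pi * C_CM_PER_S * wavenumber_cm  # angular frequency, rad s^-1
    return mu_kg * omega**2


mu_co = reduced_mass_kg(12.000, 15.995)   # 12C and 16O masses in amu
k_gas = force_constant(2143.0, mu_co)     # gas phase CO, value from the text
k_ads = force_constant(2100.0, mu_co)     # upper bound for adsorbed CO
print(f"gas phase: {k_gas:.0f} N/m; adsorbed (upper bound): {k_ads:.0f} N/m; "
      f"ratio {k_ads / k_gas:.3f}")
```

Because the force constant scales with the square of the wavenumber, even a modest red shift corresponds to a measurable softening of the C-O bond within this approximation.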


Figure A3.10.23 Schematic diagram of molecular CO chemisorption on a metal surface. The model is based
on a donor-acceptor scheme where the CO 5σ HOMO donates charge to surface unoccupied states and the
surface back-donates charge to the CO 2π* LUMO [58].

Ultraviolet photoelectron spectroscopy (UPS) results have provided detailed information about CO adsorption
on many surfaces. Figure A3.10.24 shows UPS results for CO adsorption on Pd(110) [58] that are
representative of molecular CO adsorption on platinum surfaces. The difference spectrum in (c) between the
clean surface and the CO-covered surface shows a strong negative feature just below the Fermi level (E_F), and
two positive features at ~8 and 11 eV below E_F. The negative feature is due to suppression of emission from
the metal d states as a result of an anti-resonance phenomenon. The positive features can be attributed to the
4σ molecular orbital of CO and the overlap of the 5σ and 1π molecular orbitals. The observation of features
due to CO molecular orbitals clearly indicates that CO molecularly adsorbs. The overlap of the 5σ and 1π
levels is caused by a stabilization of the 5σ molecular orbital as a consequence of forming the surface-CO
chemisorption bond.

Figure A3.10.24 UPS data for CO adsorption on Pd(110). (a) Clean surface, (b) CO-dosed surface, (c)
Difference spectrum (b-a). This spectrum is representative of molecular CO adsorption on platinum metals
[58].

The adsorption of O₂ on platinum surfaces is not as straightforward as CO adsorption because molecular and
dissociative adsorption can occur, as well as oxide formation [58]. However, molecular adsorption has been
observed only at very low temperatures, where CO oxidation rates are negligible, hence this form of adsorbed
oxygen will not be discussed here. UPS data indicate dissociative adsorption of O₂ on platinum surfaces at
temperatures >100 K, and isotopic exchange measurements support this finding as well. The oxygen atoms
resulting from O₂ dissociation can be either chemisorbed oxygen or oxygen in the form of an oxide. The two
types of oxygen are distinguished by noting that oxide oxygen is located beneath the surface ('subsurface')
while chemisorbed oxygen is located on the surface. Experimentally, the two types of oxygen are discernible
by AES, XPS and UPS. In general, it has been found that as long as pressure and temperature are kept fairly
low, the most likely surface oxygen species will be chemisorbed. Therefore, when formulating a mechanism
for reaction under these general conditions, only chemisorbed oxygen needs to be considered.
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The mechanism for CO oxidation over platinum group metals has been established from a wealth of data, the 
analysis of which is beyond the scope of this chapter. It is quite evident that surface science provided the 
foundation for this mechanism by directly showing that CO adsorbs molecularly and 2 adsorbs 


dissociatively. The mechanism is represented below (* denotes an empty surface site): 

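$$\mathrm{CO} + {*} \rightleftharpoons \mathrm{CO_{ads}}$$

$$\mathrm{O_2} + 2{*} \rightleftharpoons 2\,\mathrm{O_{ads}}$$

$$\mathrm{O_{ads}} + \mathrm{CO_{ads}} \rightarrow \mathrm{CO_2} + 2{*}.$$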

The first step consists of the molecular adsorption of CO. The second step is the dissociation of O2 to yield
two adsorbed oxygen atoms. The third step is the reaction of an adsorbed CO molecule with an adsorbed
oxygen atom to form a CO2 molecule that, at room temperature and higher, desorbs upon formation. To
simplify matters, this desorption step is not included. This sequence of steps depicts a
Langmuir-Hinshelwood mechanism, whereby reaction occurs between two adsorbed species (as opposed to an
Eley-Rideal mechanism, whereby reaction occurs between one adsorbed species and one gas phase species). The
role of surface science studies in formulating the CO oxidation mechanism was prominent.

CO oxidation by O2 is a structure-insensitive reaction over rhodium catalysts [61, 62]. Figure A3.10.25
illustrates this structure insensitivity by demonstrating that the activation energies over supported Rh catalysts
and a Rh(111) single crystal (given by the slope of the line) were nearly identical. Furthermore, the reaction
rates over both supported Rh/Al2O3 and single crystal Rh(111) surfaces were also remarkably similar. Thus,
the reaction kinetics were quite comparable over both the supported metal particles and the single crystal
surfaces, and no particle size effect (structure sensitivity) was observed.

Figure A3.10.25 Arrhenius plots of CO oxidation by O2 over Rh single crystals and supported Rh/Al2O3 at
$P_{\mathrm{CO}} = P_{\mathrm{O_2}} = 0.01$ atm [43]. The dashed line in the figure is the predicted behaviour based on the rate
constants for CO and O2 adsorption and desorption on Rh under UHV conditions.
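
The statement that the activation energy is given by the slope of an Arrhenius plot can be illustrated with a
short sketch. It fits a straight line to ln(rate) against 1/T and converts the slope to an activation energy.
The temperatures, prefactor and activation energy below are synthetic values chosen only for illustration; they
are not the Rh data of figure A3.10.25, and the function name `arrhenius_fit` is hypothetical.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def arrhenius_fit(temps_kelvin, rates):
    """Least-squares line through ln(rate) vs 1/T.

    Returns (activation energy in J/mol, pre-exponential factor).
    """
    xs = [1.0 / t for t in temps_kelvin]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    # ln(rate) = ln(A) - Ea / (R T), so the slope equals -Ea / R
    return -slope * R_GAS, math.exp(intercept)


# Synthetic (assumed) data generated from a known Arrhenius law
ea_true = 100.0e3  # J/mol, illustrative only
a_true = 1.0e13    # illustrative only
temps = [450.0, 500.0, 550.0, 600.0]
rates = [a_true * math.exp(-ea_true / (R_GAS * t)) for t in temps]

ea_fit, a_fit = arrhenius_fit(temps, rates)
print(f"fitted Ea = {ea_fit / 1000.0:.3f} kJ/mol")
```

Two data sets with the same slope on such a plot share an activation energy, which is the sense in which the
supported and single crystal catalysts are compared in the text.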

The study of catalytic reactions using surface science techniques has been fruitful over the last 30 years. Great
strides have been made towards understanding the fundamentals of catalytic reactions, particularly by
bridging the material and pressure gaps. The implementation of in situ techniques and innovative model
catalyst preparation will undoubtedly shape the future of catalysis.

A3.11 Quantum mechanics of interacting systems:
scattering theory

George C Schatz 


A3.11.1 INTRODUCTION 

Quantum scattering theory is concerned with transitions between states which have a continuous energy 
spectrum, i.e., which are unbound. The most common application of scattering theory in chemical physics is 
to collisions involving atoms, molecules and/or electrons. Such collisions can produce many possible results, 
ranging from elastic scattering to reaction and fragmentation. Scattering theory can also be used to describe 
collisions of atoms, molecules and/or electrons with solid surfaces and it also has application to many kinds of 
dynamical process in solids. These latter include collisions of conduction electrons in a metal with impurities 
or with particle surfaces, or collisions of collective wave motions such as phonons with impurities, or 
adsorbates. Scattering theory is also involved in describing the interaction of light with matter, including 
applications to elastic and inelastic light scattering, photoabsorption and emission. Additionally, there are 
many processes where continuum states of particles are coupled to continuum states of electromagnetic 
radiation, including photodissociation of molecules and photoemission from surfaces. 

While the basic formalism of quantum scattering theory can be found in a variety of general physics textbooks 
[1, 2, 3, 4, 5, 6 and 7] and textbooks that are concerned with scattering theory in a broad sense [8, 9, 10 and
11], many problems in chemical physics require special adaptation of the theory. For example, in collisions of 
particles with surfaces, angular momentum conservation is not important, but linear momentum conservation 
can be crucial. Also, in many collision problems involving atoms and molecules, the de Broglie wavelength is 
short compared to the distances over which the particles interact strongly, making classical or semiclassical 
theory useful. One especially important feature associated with scattering theory applications in chemical 
physics is that the forces between the interacting particles can usually be determined with reasonable accuracy 
(in principle to arbitrary accuracy), so explicit forms for the Hamiltonian governing particle motions are
available. Often these forces are quite complicated, so it is not possible to develop analytical solutions to the
scattering theory problem. However, numerical solutions are possible, so a significant activity among 
researchers in this field is the development of numerical methods for solving scattering problems. There are a 
number of textbooks which consider scattering theory applications of more direct relevance to problems in 
chemical physics [12, 13, 14, 15, 16, 17 and 18], as well as numerous monographs that have a narrower focus 
within the field [19, 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29]. 

Much of what one needs to know about scattering theory can be understood by considering a particle moving
in one dimension governed by a potential that allows it to move freely except for a range of coordinates where
there is a feature such as a barrier or well that perturbs the free particle motion. Our discussion will therefore
begin with this simple problem (section A3.11.2). Subsequently (section A3.11.3) more complete versions of
scattering theory will be developed that apply to collisions involving particles having internal degrees of
freedom. There are both time dependent and time independent versions of scattering theory, and both of these
theories will be considered. In section A3.11.4, the numerical methods that are used to calculate scattering
theory properties are considered, including time dependent and independent approaches. Also presented
(section A3.11.5) are scattering theory methods for determining information that has been summed and
averaged over many degrees of freedom (such as in a Boltzmann distribution).
Finally, in section A3.11.6, scattering theory methods based on classical and semiclassical mechanics are
described.

There are a variety of topics that will not be considered, but it is appropriate to provide references for further 
reading. The development in this paper assumes that the Born-Oppenheimer approximation applies in 
collisions between atoms and molecules and thus the nuclear motion is governed by a single potential energy 
surface. However, there are many important problems where this approximation breaks down and multiple
coupled potential energy surfaces are involved, with nonadiabatic transitions taking place during the 
scattering process. The theory of such processes is described in many places, such as [14, 15, 23, 25]. 

Other topics that have been omitted include the description of scattering processes using Feynman path 
integrals [18, 19] and the description of scattering processes with more than two coupled continua (i.e., where 
three or more independent particles are produced, as in electron impact ionization [30] or collision induced 
dissociation) [31]. Our treatment of resonance effects in scattering processes (i.e., the formation of metastable 
intermediate states) is very brief as this topic is commonly found in textbooks and one monograph is available 
[26]. Finally, it should be mentioned that the theory of light scattering is not considered; interested readers 
should consult textbooks such as that by Newton [8]. 


A3.11.2 QUANTUM SCATTERING THEORY FOR A ONE- 
DIMENSIONAL POTENTIAL FUNCTION 

A3.11.2.1 HAMILTONIAN; BOUNDARY CONDITIONS

The problem of interest in this section is defined by the simple one-dimensional Hamiltonian 


$$H = -\frac{\hbar^2}{2m} \frac{d^2}{dx^2} + V(x)$$ (A3.11.1)

where $V(x)$ is the potential energy function, examples of which are pictured in figure A3.11.1. The potentials
shown are of two general types: those which are constant in the limits $x \to \pm\infty$ (figure A3.11.1(a) and figure
A3.11.1(b)), and those which are constant in the limit $x \to +\infty$ and are infinite in the limit $x \to -\infty$ (figure
A3.11.1(c)) (of course this potential could be flipped around if one wants). In the former case, one can have
particles moving at constant velocity in both asymptotic limits ($x \to \pm\infty$), so there are two physically distinct
processes that can be described, namely, scattering in which the particle is initially moving to the right in the
limit $x \to -\infty$, and scattering in which the particle is initially moving to the left in the limit $x \to +\infty$. In the latter
case (figure A3.11.1(c)), the only physically interesting situation involves the particle initially moving to the
left in the limit $x \to +\infty$. The former case is appropriate for describing a chemical reaction where there is either
a barrier (figure A3.11.1(a)) or a well (figure A3.11.1(b)). It is also relevant to the scattering of an electron from
the surface of a metal, where either transmission or reflection can occur. In figure A3.11.1(c), only reflection
can occur, such as happens in elastic collisions of atoms, or low energy collisions of molecules with surfaces.


Figure A3.11.1. Potential associated with the scattering of a particle in one dimension. The three cases shown
are (a) barrier potential, (b) well potential and (c) scattering off a hard wall that contains an intermediate well.

The physical question to be answered for figure A3.11.1(a) and figure A3.11.1(b) is: what is the probability $P$
that a particle incident with an energy $E$ from the left at $x \to -\infty$ will end up moving to the right at $x \to +\infty$? In
the case of figure A3.11.1(c) only reflection can occur. However, the change in phase of the wavefunction that
occurs in this reflection is often of interest. In the following treatment the detailed theory associated with
figure A3.11.1(a) and figure A3.11.1(b) will be considered. Eventually we will see that figure A3.11.1(c) is a
subset of this theory.

The classical expression for the transmission probability associated with figure A3.11.1(a) or figure
A3.11.1(b) is straightforward, namely

(1) $P(E) = 0$ if $V(x) > E$ for any $x$

(2) $P(E) = 1$ if $V(x) < E$ for all $x$.
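
The classical rule can be written as a few lines of code, which makes the all-or-nothing character of the
classical result explicit. The sketch below samples the potential on a grid; the Gaussian barrier, its height
and the grid are assumptions made for illustration, and the borderline case $V(x) = E$, which the rule above
leaves unspecified, is treated here as reflection.

```python
import math


def classical_transmission(energy, potential, xs):
    """Classical P(E): 1 if the particle clears V(x) at every grid point, else 0."""
    return 1.0 if all(potential(x) < energy for x in xs) else 0.0


def gaussian_barrier(x, height=1.0, width=1.0):
    """Illustrative barrier (an assumption, not a potential from the text)."""
    return height * math.exp(-((x / width) ** 2))


xs = [-10.0 + 0.01 * i for i in range(2001)]  # grid covering the barrier region

p_below = classical_transmission(0.5, gaussian_barrier, xs)  # energy below the barrier top
p_above = classical_transmission(1.5, gaussian_barrier, xs)  # energy above the barrier top
print(p_below, p_above)
```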

The quantum solution to this problem is much more difficult for a number of reasons. First, it is important to
know how to define what we mean by a particle moving in a given direction when $V(x)$ is constant. Second,
one must determine the probability that the particle is moving in any specified direction at any desired
location. Third, we need to be able to solve the Schrödinger equation for the potential $V(x)$.

A3.11.2.2 WAVEPACKETS IN ONE DIMENSION

To understand how to describe a particle moving in a constant potential, consider the case of a free particle for
which $V(x) = 0$. In this case the time-dependent Schrödinger equation is

$$i\hbar \frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m} \frac{\partial^2 \psi}{\partial x^2}$$ (A3.11.2)

and if one invokes the usual procedure for separating the time and spatial parts of this equation, it can readily
be shown that one possible solution is

$$\psi_k(x,t) = e^{-iE_k t/\hbar}\, e^{ikx}$$ (A3.11.3)

where

$$E_k = \frac{\hbar^2 k^2}{2m}$$ (A3.11.4)

is the particle's energy and $\hbar k$ is its linear momentum. Note that both energy and momentum of the particle
are exactly specified in this solution. As might be expected from the uncertainty principle, the location of the
particle is therefore completely undetermined. As a result, this solution to the Schrödinger equation, even
though correct, is not useful for describing the scattering processes of interest.

In order to localize the particle, it is necessary to superimpose wavefunctions $\psi_k$ with different momenta $\hbar k$. A
very general way to do this is to construct a wavepacket, defined through the integral

$$\psi_{\mathrm{wp}}(x,t) = \int_{-\infty}^{\infty} dk\, C(k)\, \psi_k(x,t)$$ (A3.11.5)

where $C(k)$ is a function which tells us how much of each momentum $\hbar k$ is contained in the wavepacket. If the
particle is to move with roughly a constant velocity, $C(k)$ must be peaked at some $k$, which is taken to be $k_0$.
One function which accomplishes this is the Gaussian

$$C(k) = \left(\frac{a}{2\pi^{3/2}}\right)^{1/2} \exp[-a^2 (k - k_0)^2 / 2]$$ (A3.11.6)

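where $a$ measures the width of the packet. Substituting this into equation (A3.11.5), the result is:

$$\psi_{\mathrm{wp}}(x,t) = \pi^{-1/4} a^{1/2} \left(a^2 + \frac{i\hbar t}{m}\right)^{-1/2} \exp\left[ik_0 x - \frac{i\hbar k_0^2 t}{2m} - \frac{(x - \hbar k_0 t/m)^2}{2(a^2 + i\hbar t/m)}\right]$$ (A3.11.7)

The absolute square of this wavefunction is

$$|\psi_{\mathrm{wp}}(x,t)|^2 = \pi^{-1/2} a^{-1} \left(1 + \frac{\hbar^2 t^2}{m^2 a^4}\right)^{-1/2} \exp\left[-\frac{(x - \hbar k_0 t/m)^2}{a^2 (1 + \hbar^2 t^2/m^2 a^4)}\right]$$ (A3.11.8)

This is a Gaussian function which peaks at $x = \hbar k_0 t/m$, moving to the right with a momentum $\hbar k_0$. The width
of this peak is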

$$\Delta = a \left[(\ln 2)\left(1 + \frac{\hbar^2 t^2}{m^2 a^4}\right)\right]^{1/2}$$ (A3.11.9)

which starts out at $\Delta = a(\ln 2)^{1/2}$ at $t = 0$ and increases linearly with time for large $t$. This increase in width
means that the wavepacket spreads as it moves. This is an inevitable consequence of the fact that the wavepacket
was constructed with a distribution of momentum components, and is a natural consequence of the uncertainty
principle. Note that the wavefunction in equation (A3.11.7) still satisfies the Schrödinger equation (A3.11.2).
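
The spreading formula can be checked numerically without using the closed-form wavepacket. The sketch below
evaluates the superposition integral of equation (A3.11.5) by trapezoid quadrature over $k$ for a Gaussian $C(k)$
and confirms that the density falls to half its peak value at a distance $\Delta$ from the centre, with $\Delta$ taken
from equation (A3.11.9). Units with $\hbar = 1$, the parameter values, the unit-norm prefactor of $C(k)$ and the
function names are all assumptions made for this illustration.

```python
import cmath
import math

HBAR = 1.0  # assumed units


def packet(x, t, a, k0, m, n=4001, span=8.0):
    """psi_wp(x, t): trapezoid quadrature of C(k) * exp(i k x - i hbar k^2 t / 2m) over k."""
    norm = math.sqrt(a / (2.0 * math.pi ** 1.5))  # unit-norm Gaussian C(k), assumed
    k_min = k0 - span / a
    dk = 2.0 * span / a / (n - 1)
    total = 0j
    for i in range(n):
        k = k_min + i * dk
        c_k = norm * math.exp(-0.5 * (a * (k - k0)) ** 2)
        phase = cmath.exp(1j * (k * x - HBAR * k * k * t / (2.0 * m)))
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * c_k * phase
    return total * dk


def width(t, a, m):
    """Half-width at half-maximum of the density, equation (A3.11.9)."""
    return a * math.sqrt(math.log(2.0) * (1.0 + (HBAR * t / (m * a * a)) ** 2))


a, k0, m, t = 1.0, 5.0, 1.0, 2.0
x_centre = HBAR * k0 * t / m  # classical position of the packet centre
peak = abs(packet(x_centre, t, a, k0, m)) ** 2
half = abs(packet(x_centre + width(t, a, m), t, a, k0, m)) ** 2
ratio = half / peak
print(f"density ratio at one half-width: {ratio:.6f}")
```

The ratio should come out very close to one half at any time $t$, while the peak height decreases as the packet
spreads.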

One can show that the expectation value of the Hamiltonian operator for the wavepacket in equation
(A3.11.7) is:

$$\langle H \rangle = \frac{\hbar^2 k_0^2}{2m} + \frac{\hbar^2}{4ma^2}$$ (A3.11.10)

The first term is what one would expect to obtain classically for a particle of momentum $\hbar k_0$ and it is much
bigger than the second term provided $k_0 a \gg 1$. Since the de Broglie wavelength $\lambda$ is $2\pi/k_0$, this condition is
equivalent to the statement that the size of the wavepacket be much larger than the de Broglie wavelength.

It is also notable that the spreading of the wavepacket can be neglected for times $t$ such that $t \ll ma^2/\hbar$. In this
time interval the centre of the wavepacket will have moved a distance $(k_0 a)a$. Under the conditions noted
above for which $k_0 a \gg 1$, this distance will be many times larger than the width of the packet.

A3.11.2.3 WAVEPACKETS FOR THE COMPLETE SCATTERING PROBLEM 

The generalization of the treatment of the previous section to the determination of a wavepacket for the
Hamiltonian in equation (A3.11.1) is accomplished by writing the solution as follows:

$$\psi(x,t) = \int_{-\infty}^{\infty} dk\, C(k)\, \psi_k(x)\, e^{-iE_k t/\hbar}$$ (A3.11.11)

where $\psi_k$ is the solution of the time-independent Schrödinger equation

$$H\psi_k = E_k \psi_k$$ (A3.11.12)

for an energy $E_k$. By substituting equation (A3.11.11) into the time-dependent Schrödinger equation one can
readily show that $\psi$ is a solution.

However, it is important to make sure that $\psi$ satisfies the desired boundary conditions initially and finally.
Part of this is familiar already, since we have already demonstrated in equation (A3.11.3), equation (A3.11.5)
and equation (A3.11.7) that use of $\psi_k = e^{ikx}$ and a Gaussian $C(k)$ gives a Gaussian wavepacket which moves
with momentum $\hbar k_0$. This is the behaviour that is of interest initially ($t \to -\infty$) in the limit of $x \to -\infty$.

At the end of the collision ($t \to +\infty$) one expects to see part of the wavepacket moving to the right for $x \to \infty$
(the transmitted part) and part of it moving to the left for $x \to -\infty$ (the reflected part). Both this and the $t \to -\infty$
boundary condition can be satisfied by requiring that

$$\psi_k(x) \xrightarrow[x \to -\infty]{} e^{ikx} + R\, e^{-ikx}$$ (A3.11.13a)

$$\psi_k(x) \xrightarrow[x \to +\infty]{} T\, e^{iKx}$$ (A3.11.13b)

where $R$ and $T$ are as yet undetermined coefficients that will be discussed later and $K = (2m(E - V_0)/\hbar^2)^{1/2}$,
where $V_0$ is the value of the potential in the limit $x \to \infty$. Note that $V_0$ specifies the energy difference between
the potential in the right and left asymptotic limits, and it has been assumed that $E > V_0$, as otherwise there
could not be travelling waves in the $x \to \infty$ limit.

To prove that equation (A3.11.13) gives a wavepacket which satisfies the desired boundary conditions, we
note that substitution of equation (A3.11.13a) into equation (A3.11.11) gives us two wavepackets which roughly
speaking are given by

$$\psi(x,t) \approx e^{-(x - \hbar k_0 t/m)^2/2a^2} + R\, e^{-(x + \hbar k_0 t/m)^2/2a^2}.$$ (A3.11.14a)

In the $t \to -\infty$ limit, only the first term, representing a packet moving to the right, has a peak in the $x \to -\infty$
region (the left asymptotic region). The second term peaks in the right asymptotic region but this is irrelevant
as equation (A3.11.14a) does not apply there. Thus, in the left asymptotic region the second term is negligible
and all we have is a packet moving to the right. For $t \to +\infty$, equation (A3.11.14a) still applies in the left
asymptotic region, but now it is the second term which peaks and this packet moves to the left.

Now substitute equation (A3.11.13b) into equation (A3.11.11). Ignoring various unimportant terms, we obtain

$$\psi(x,t) \approx T\, e^{-(x - \hbar k_0 t/m)^2/2a^2}.$$ (A3.11.14b)

This formula represents a packet moving to the right centred at $x = \hbar k_0 t/m$. For $t \to -\infty$, this is negligible in
the right asymptotic region, so the wavefunction is zero there, while for $t \to +\infty$ this packet is large for
$x \to +\infty$ just as we wanted.


A3.11.2.4 FLUXES AND PROBABILITIES 


Now let us use the wavepackets just discussed to extract the physically measurable information about our
problem, namely, the probabilities of reflection and transmission. As long as the wavepackets do not spread
much during the collision, these probabilities are given by the general definition:

$$\text{probability} = \frac{|\text{total flux outgoing for process of interest}|}{|\text{total flux incident}|}$$ (A3.11.15)

where the flux is the number of particles per unit time that cross a given point (that cross a given surface in
three dimensions), and the total flux is the spatial integral of the instantaneous flux. Classically the flux is just
$\rho v$ where $\rho$ is the density of particles (particles per unit length in one dimension) and $v$ is the velocity of the
particles. In quantum mechanics, the flux $I$ is defined as

$$I = \mathrm{Re}[\psi^* v \psi]$$ (A3.11.16)

where $v$ is the velocity operator ($v = (-i\hbar/m)\, d/dx$ in one dimension) and Re implies that only the real part of
$\psi^* v \psi$ is to be used.

To see how equation (A3.11.16) works, substitute equation (A3.11.7) into (A3.11.16). Under the condition
that wavepacket spreading is small (i.e., $\hbar t/ma^2 \ll 1$) we obtain

$$I = \frac{\hbar k_0}{m}\, \pi^{-1/2} a^{-1} \exp[-(x - \hbar k_0 t/m)^2/a^2]$$ (A3.11.17)

which is just $v_0 |\psi|^2$ where $v_0$ is the initial most probable velocity ($v_0 = \hbar k_0/m$). In view of equation
(A3.11.14a), this is just the incident flux. The integral of this quantity over all space (the total flux) is
$v_0$.

For the reflected wave associated with equation (A3.11.13a), the total outgoing flux is $|R|^2 v_0$, so the
reflection probability $P_R$ is

$$P_R = |R|^2.$$ (A3.11.18a)

A similar calculation of the transmission probability gives

$$P_T = \frac{v}{v_0} |T|^2$$ (A3.11.18b)

where

$$v = \frac{\hbar K}{m}.$$ (A3.11.19)

A3.11.2.5 TIME-INDEPENDENT APPROACH TO SCATTERING

Note from equation (A3.11.18a) and equation (A3.11.18b) that all of the physically interesting information about
the scattering process involves the coefficients $R$ and $T$, which are properties of the time independent
wavefunction $\psi_k$ obtained from equation (A3.11.12) with the boundary conditions in equations (A3.11.13).
As a result, we can use scattering theory completely in a time independent picture. This picture can be thought
of as related to the time dependent picture by the superposition of many Gaussian incident wavepackets to
form a plane wave. The important point to remember in using time independent solutions is that the
asymptotic solution given by equations (A3.11.13) involves waves moving to the left and right that should be
treated separately in calculating fluxes since these solutions do not contribute at the same time to the
evolution of $\psi(x,t)$ in the $t \to \pm\infty$ limits. As a result, fluxes are evaluated by substituting either the left or
right moving wavepacket parts of equations (A3.11.13) into (A3.11.16).
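
A minimal worked case of the time independent picture is a potential step, $V = 0$ for $x < 0$ and $V = V_0$ for
$x > 0$, which is not treated in the text but has the asymptotic behaviour assumed above. Matching the
wavefunction and its derivative at $x = 0$ gives $R = (k - K)/(k + K)$ and $T = 2k/(k + K)$. The sketch below
evaluates the probabilities with the velocity ratio $v/v_0 = K/k$ included for transmission; units with
$\hbar = 1$ and the chosen energy and step height are assumptions.

```python
import math

HBAR = 1.0  # assumed units


def step_probabilities(energy, v0, m=1.0):
    """Reflection and transmission probabilities for a potential step of height v0.

    Left region: exp(ikx) + R exp(-ikx); right region: T exp(iKx).
    """
    if energy <= max(v0, 0.0):
        raise ValueError("travelling waves on both sides require E > max(V0, 0)")
    k = math.sqrt(2.0 * m * energy) / HBAR
    big_k = math.sqrt(2.0 * m * (energy - v0)) / HBAR
    r_amp = (k - big_k) / (k + big_k)  # from continuity of psi and psi' at x = 0
    t_amp = 2.0 * k / (k + big_k)
    p_reflect = abs(r_amp) ** 2
    p_transmit = (big_k / k) * abs(t_amp) ** 2  # flux ratio v / v0 = K / k
    return r_amp, t_amp, p_reflect, p_transmit


r_amp, t_amp, p_r, p_t = step_probabilities(energy=2.0, v0=1.0)
print(f"|T|^2 = {abs(t_amp) ** 2:.4f}, P_R = {p_r:.4f}, P_T = {p_t:.4f}")
```

For an upward step $|T|^2$ exceeds one, yet $P_R + P_T = 1$ once the velocity ratio is included, which is why the
probabilities must be defined through fluxes rather than through amplitudes alone.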

A3.11.2.6 SCATTERING MATRIX 

It is useful to rewrite the asymptotic part of the wavefunction as

$$\psi_k(x) \xrightarrow[x \to -\infty]{} e^{ikx} + S_{11}\, e^{-ikx}$$ (A3.11.20a)

$$\psi_k(x) \xrightarrow[x \to +\infty]{} S_{12}\, (k/K)^{1/2}\, e^{iKx}$$ (A3.11.20b)

where the coefficients $S_{11}$ and $S_{12}$ are two elements of a 2 x 2 matrix known as the scattering (S) matrix. The
other two elements are associated with a different scattering solution in which the incident wave at $t \to -\infty$
moves to the left in the $x \to +\infty$ region. The boundary conditions on this solution are

$$\psi_{-k}(x) \xrightarrow[x \to +\infty]{} e^{-iKx} + S_{22}\, e^{iKx}$$

$$\psi_{-k}(x) \xrightarrow[x \to -\infty]{} S_{21}\, (K/k)^{1/2}\, e^{-ikx}.$$ (A3.11.21)


The S matrix has a number of important properties, one of which is that it is unitary. Mathematically this
means that $S^\dagger S = 1$ where $S^\dagger$ is the Hermitian conjugate (transpose of complex conjugate) of $S$. This property
comes from the equation of continuity, which says that for any solution $\psi$ to the time dependent Schrödinger
equation,

$$\frac{\partial |\psi|^2}{\partial t} + \frac{\partial I}{\partial x} = 0$$ (A3.11.22)

where $I$ is the flux from equation (A3.11.16). Equation (A3.11.22) can be proved by substitution of equation
(A3.11.16) and the time dependent Schrödinger equation into (A3.11.22).


If $\psi = \psi_k(x) e^{-iE_k t/\hbar}$, $|\psi|^2$ is time independent, so equation (A3.11.22) reduces to $\partial I/\partial x = 0$, which implies $I$ is a
constant (i.e., flux is conserved), independent of $x$. If so then the evaluation of $I$ at $x \to +\infty$ and at $x \to -\infty$
should give the same result. By directly substituting equations (A3.11.13) into (A3.11.16) one finds

$$I = \frac{\hbar k}{m}\left(1 - |R|^2\right) \quad (x \to -\infty), \qquad I = \frac{\hbar K}{m}\, |T|^2 \quad (x \to +\infty)$$ (A3.11.23)

and since these two have to be equal, and since comparison of equations (A3.11.13) and (A3.11.20) gives
$R = S_{11}$ and $T = S_{12} (k/K)^{1/2}$, we find that

$$|S_{11}|^2 + |S_{12}|^2 = 1$$ (A3.11.24)

which indicates that the sum of the reflected and transmitted probabilities has to be unity. This is one of the
equations that is implied by unitarity of the S matrix. The other equations can be obtained by using the
solution $\psi_{-k}$ (equation (A3.11.21)) and by using a generalized flux that is defined by

$$I_{k,-k} = \mathrm{Re}(\psi_k^*\, v\, \psi_{-k}).$$ (A3.11.25)

Another useful property of the S matrix is that it is symmetric. This property follows from conservation of the
fluxlike expression

$$I'_{k,-k} = \mathrm{Re}(\psi_k\, v\, \psi_{-k})$$ (A3.11.26)

which differs from equation (A3.11.25) in the absence of a complex conjugate in the wavefunction $\psi_k$. The
symmetry property of S implies that $S_{12}$ in equation (A3.11.20b) equals $S_{21}$ in equation (A3.11.21). Defining
the probability matrix P by the relation

$$P_{ij} = |S_{ij}|^2$$ (A3.11.27)

we see that symmetry of S implies equal probabilities for the $i \to j$ and $j \to i$ transitions. This is a statement of
the principle of microscopic reversibility and it arises from the time reversal symmetry associated with the
Schrödinger equation.
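
Unitarity and symmetry can be verified explicitly for a model where the S matrix is known in closed form. The
sketch below uses a delta-function barrier $V(x) = \lambda\,\delta(x)$, a standard textbook model that is not discussed
in the text and is introduced here only as an assumption. With $\beta = m\lambda/\hbar^2 k$ the transmission
amplitude is $1/(1 + i\beta)$ and the reflection amplitude is $-i\beta/(1 + i\beta)$; since the potential is the same
on both sides, $K = k$ and these amplitudes are the S matrix elements directly.

```python
HBAR = 1.0  # assumed units


def delta_s_matrix(k, strength, m=1.0):
    """S matrix for V(x) = strength * delta(x): S11 = S22 = r and S12 = S21 = t."""
    beta = m * strength / (HBAR ** 2 * k)
    t = 1.0 / (1.0 + 1j * beta)
    r = -1j * beta / (1.0 + 1j * beta)
    return [[r, t], [t, r]]


def s_dagger_s(s):
    """Matrix product of the Hermitian conjugate of s with s (2 x 2)."""
    return [
        [sum(s[l][i].conjugate() * s[l][j] for l in range(2)) for j in range(2)]
        for i in range(2)
    ]


s = delta_s_matrix(k=1.3, strength=0.8)
unit = s_dagger_s(s)  # should be the 2 x 2 identity
probabilities = [[abs(s[i][j]) ** 2 for j in range(2)] for i in range(2)]
print(unit)
print(probabilities)
```

Each row of the probability matrix sums to one (unitarity), and the off-diagonal entries are equal
(microscopic reversibility).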

The probability matrix plays an important role in many processes in chemical physics. For chemical reactions, 
the probability of reaction is often limited by tunnelling through a barrier, or by the formation of metastable 
states (resonances) in an intermediate well. Equivalently, the conductivity of a molecular wire is related to the 
probability of transmission of conduction electrons through the junction region between the wire and the 
electrodes to which the wire is attached. 

A3.11.2.7 GREEN'S FUNCTIONS FOR SCATTERING 

Now let us write down the Schrödinger equation (A3.11.12) using equation (A3.11.1) for $H$ and assuming
that $V_0$ in figure A3.11.1 is zero. The result can be written

$$\left(\frac{d^2}{dx^2} + k^2\right) \psi_k(x) = \frac{2m}{\hbar^2}\, V(x)\, \psi_k(x).$$ (A3.11.28)

One way to solve this is to invert the operator on the left hand side, thereby converting this differential
equation into an integral equation. The general result is

$$\psi_k(x) = \varphi_k(x) + \frac{2m}{\hbar^2} \int_{-\infty}^{\infty} G_0(x,x')\, V(x')\, \psi_k(x')\, dx'$$ (A3.11.29)

where $G_0$ is called the Green function associated with the operator $d^2/dx^2 + k^2$ and $\varphi_k$ is a solution of the
homogeneous equation that is associated with equation (A3.11.28), namely

$$\left(\frac{d^2}{dx^2} + k^2\right) \varphi_k(x) = 0.$$ (A3.11.30)

To determine $G_0(x,x')$, it is customary to reexpress equation (A3.11.28) in a Fourier representation. Let $F_k(k')$
be the Fourier transform of $\psi_k(x)$. Taking the Fourier transform of equation (A3.11.28), we find

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ik'x} (k^2 - k'^2)\, F_k(k')\, dk' = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ik'x} B(k')\, dk'$$ (A3.11.31)

where

$$B(k') = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ik'x}\, \frac{2m}{\hbar^2}\, V(x)\, \psi_k(x)\, dx.$$ (A3.11.32)

Equation (A3.11.31) implies

$$F_k(k') = \frac{B(k')}{k^2 - k'^2}$$ (A3.11.33)

and upon inverting the Fourier transform we find

$$\psi_k(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ik'x}\, \frac{B(k')}{k^2 - k'^2}\, dk' = \frac{2m}{\hbar^2} \int_{-\infty}^{\infty} G_0(x,x')\, V(x')\, \psi_k(x')\, dx'$$ (A3.11.34)

where the Green function is given by

$$G_0(x,x') = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{ik'(x-x')}}{k^2 - k'^2}\, dk'.$$ (A3.11.35)


The evaluation of the integral in equation (A3.11.35) needs to be done carefully as there are poles at $k' = \pm k$. A
standard trick to do it involves replacing $k$ by $k \pm i\varepsilon$ where $\varepsilon$ is a small positive constant that will be set to zero
in the end. This reduces equation (A3.11.35) to

$$G_0(x,x') = \frac{1}{4\pi k} \lim_{\varepsilon \to 0} \int_{-\infty}^{\infty} \exp[ik'(x - x')] \left(\frac{1}{k - k' \pm i\varepsilon} + \frac{1}{k + k' \pm i\varepsilon}\right) dk'.$$ (A3.11.36)

This integral can be done by contour integration using the contours in figure A3.11.2. For the $+i\varepsilon$ choice, the
contour in figure A3.11.2(a) is appropriate for $x < x'$ as the circular part has a negative imaginary $k'$ which
makes $e^{ik'(x-x')}$ vanish for $|k'| \to \infty$. Likewise for $x > x'$, we want to use the contour in figure A3.11.2(b) as this
makes the imaginary part of $k'$ positive along the circular part. In either case, the integral along the real axis
equals the full contour integral, and the latter is determined by the residue theorem to be $2\pi i$ times the residue
at the pole which is encircled by the contour.

Figure A3.11.2. Integration contours used to evaluate equation (A3.11.36) (a) for $x < x'$, (b) for $x > x'$.

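The pole is at $k' = -k - i\varepsilon$ for the contour in figure A3.11.2(a) and at $k' = k + i\varepsilon$ for figure A3.11.2(b). This
gives us

$$G_0^+(x,x') = -\frac{i}{2k} \begin{cases} e^{-ik(x-x')} & \text{for } x < x' \\ e^{ik(x-x')} & \text{for } x > x' \end{cases}$$ (A3.11.37)

which we will call the 'plus' wave free particle Green function $G_0^+$. A different Green function ('minus' wave)
is obtained by using $-i\varepsilon$ in the above formulas. It is

$$G_0^-(x,x') = \frac{i}{2k} \begin{cases} e^{ik(x-x')} & \text{for } x < x' \\ e^{-ik(x-x')} & \text{for } x > x' \end{cases}$$ (A3.11.38)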
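
The defining properties of the plus Green function are easy to check numerically. The sketch below uses the
standard closed form $G_0^+(x,x') = -(i/2k)\, e^{ik|x-x'|}$ for the operator $d^2/dx^2 + k^2$ and verifies by finite
differences that it satisfies the homogeneous equation away from $x = x'$ and that its first derivative jumps
by one across $x = x'$, which is what the delta function source requires. The values of $k$, $x'$ and the step $h$
are arbitrary illustrative choices.

```python
import cmath


def g_plus(x, x_prime, k):
    """Outgoing-wave free particle Green function of d^2/dx^2 + k^2."""
    return -1j / (2.0 * k) * cmath.exp(1j * k * abs(x - x_prime))


k = 1.7
x_prime = 0.3
h = 1.0e-4


def residual(x):
    """Finite-difference estimate of (d^2/dx^2 + k^2) G at a point x != x_prime."""
    second = (g_plus(x + h, x_prime, k) - 2.0 * g_plus(x, x_prime, k)
              + g_plus(x - h, x_prime, k)) / (h * h)
    return second + k * k * g_plus(x, x_prime, k)


# One-sided derivatives on either side of the source point
right_slope = (g_plus(x_prime + h, x_prime, k) - g_plus(x_prime, x_prime, k)) / h
left_slope = (g_plus(x_prime, x_prime, k) - g_plus(x_prime - h, x_prime, k)) / h
jump = right_slope - left_slope

print(abs(residual(1.5)), abs(residual(-2.0)), jump)
```

The residuals should be at the level of finite-difference error and the jump should be close to 1, with a small
imaginary part of order $kh$ from the one-sided differences.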


Upon substitution of $G_0^+$ into equation (A3.11.29) we generate the following integral equation for the solution
$\psi_k^+$ that is associated with $G_0^+$:

$$\psi_k^+(x) = \varphi_k(x) + \frac{2m}{\hbar^2} \int_{-\infty}^{\infty} G_0^+(x,x')\, V(x')\, \psi_k^+(x')\, dx'.$$ (A3.11.39)

For $x \to \pm\infty$, it is possible to make $\psi_k^+$ look like equation (A3.11.20) by setting $\varphi_k(x) = e^{ikx}$. This shows that
the plus Green function is associated with scattering solutions in which outgoing waves move to the right in
the $x \to \infty$ limit. For $x \to -\infty$, equation (A3.11.39) becomes

$$\psi_k^+(x) \to e^{ikx} - \frac{im}{\hbar^2 k}\, e^{-ikx} \int_{-\infty}^{\infty} e^{ikx'}\, V(x')\, \psi_k^+(x')\, dx'.$$ (A3.11.40)

By comparison with equation (A3.11.20a), we see that

$$S_{11} = -\frac{im}{\hbar^2 k} \int_{-\infty}^{\infty} e^{ikx'}\, V(x')\, \psi_k^+(x')\, dx'$$ (A3.11.41)

which is an integral that can be used to calculate $S_{11}$ provided that $\psi_k^+$ is known. One can similarly show that

$$S_{12} = 1 - \frac{im}{\hbar^2 k} \int_{-\infty}^{\infty} e^{-ikx'}\, V(x')\, \psi_k^+(x')\, dx'.$$ (A3.11.42)

The other S matrix components $S_{21}$ and $S_{22}$ can be obtained from the $G_0^-$ Green function.
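
The integral equation lends itself to a direct numerical solution, sketched below for a square barrier of
height $V_0$ on $0 < x < L$ (an illustrative choice, as are the units $\hbar = m = 1$ and all function names). Because
the integrand vanishes wherever $V = 0$, the wavefunction is only needed inside the barrier. Discretizing the
integral with the midpoint rule and using the closed form $(2m/\hbar^2) G_0^+ = -(im/\hbar^2 k)\, e^{ik|x-x'|}$ turns the
integral equation into a linear system; $S_{11}$ and $S_{12}$ then follow from the same kind of integrals as in
equations (A3.11.41) and (A3.11.42). The result is compared with the textbook transmission probability of a
square barrier for $E > V_0$.

```python
import cmath
import math

HBAR = 1.0  # assumed units
M = 1.0


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense complex system."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for j in range(col, n + 1):
                a[r][j] -= factor * a[col][j]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = a[r][n] - sum(a[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / a[r][r]
    return x


def lippmann_schwinger(energy, v0, length, n=120):
    """Solve the integral equation on midpoints inside the barrier; return (S11, S12)."""
    k = math.sqrt(2.0 * M * energy) / HBAR
    h = length / n
    xs = [(j + 0.5) * h for j in range(n)]
    c = 1j * M / (HBAR ** 2 * k)  # (2m/hbar^2) * G0+ equals -c * exp(ik|x - x'|)
    a = [[(1.0 if i == j else 0.0) + c * cmath.exp(1j * k * abs(xs[i] - xs[j])) * v0 * h
          for j in range(n)] for i in range(n)]
    phi = [cmath.exp(1j * k * x) for x in xs]  # incident plane wave
    psi = solve_linear(a, phi)
    s11 = -c * h * sum(cmath.exp(1j * k * x) * v0 * p for x, p in zip(xs, psi))
    s12 = 1.0 - c * h * sum(cmath.exp(-1j * k * x) * v0 * p for x, p in zip(xs, psi))
    return s11, s12


def exact_transmission(energy, v0, length):
    """Textbook transmission probability of a square barrier for E > V0."""
    kappa = math.sqrt(2.0 * M * (energy - v0)) / HBAR
    return 1.0 / (1.0 + v0 ** 2 * math.sin(kappa * length) ** 2 / (4.0 * energy * (energy - v0)))


s11, s12 = lippmann_schwinger(energy=2.0, v0=1.5, length=1.0)
t_numeric = abs(s12) ** 2
t_exact = exact_transmission(2.0, 1.5, 1.0)
print(f"|S12|^2 = {t_numeric:.5f} (exact {t_exact:.5f}), sum = {abs(s11) ** 2 + t_numeric:.5f}")
```

The discretized solution is not exactly unitary, but the deviation shrinks as the grid is refined, which gives
a convenient internal check on the calculation.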


A3.11.2.8 BORN APPROXIMATION 


If $V(x)$ is 'small', $\psi_k^+$ will not be perturbed much from what it would be if $V(x) = 0$. If so, then we can
approximate $\psi_k^+$ by $e^{ikx}$ and obtain

$$S_{11} = -\frac{im}{\hbar^2 k} \int_{-\infty}^{\infty} e^{2ikx'}\, V(x')\, dx'$$ (A3.11.43a)

$$S_{12} = 1 - \frac{im}{\hbar^2 k} \int_{-\infty}^{\infty} V(x')\, dx'.$$ (A3.11.43b)

This is the one dimensional version of what is usually called the Born approximation in scattering theory. The
transition probability obtained from equation (A3.11.43a) is

$$P_{11} = \frac{m^2}{\hbar^2 p^2} \left| \int_{-\infty}^{\infty} e^{2ikx'}\, V(x')\, dx' \right|^2$$ (A3.11.44)

where $p = \hbar k$ is the momentum. Note that this approximation simplifies the evaluation of transition
probabilities to performing an integral.
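
As a hedged illustration of how the Born result behaves, the sketch below evaluates the reflection probability
$(m^2/\hbar^2 p^2)\,|\int e^{2ikx'} V(x')\,dx'|^2$ by midpoint quadrature for a square barrier and compares it with the
textbook exact reflection probability for $E > V_0$. The barrier, the two barrier heights and the units
$\hbar = m = 1$ are assumptions chosen to show one weak and one strong case.

```python
import cmath
import math

HBAR = 1.0  # assumed units
M = 1.0


def born_reflection(energy, potential, x_min, x_max, n=4000):
    """First Born reflection probability via midpoint quadrature of exp(2ikx) V(x)."""
    k = math.sqrt(2.0 * M * energy) / HBAR
    h = (x_max - x_min) / n
    integral = 0j
    for j in range(n):
        x = x_min + (j + 0.5) * h
        integral += cmath.exp(2j * k * x) * potential(x)
    integral *= h
    p = HBAR * k
    return (M / (HBAR * p)) ** 2 * abs(integral) ** 2


def exact_reflection_square(energy, v0, length):
    """Exact reflection probability of a square barrier for E > V0."""
    kappa = math.sqrt(2.0 * M * (energy - v0)) / HBAR
    x = v0 ** 2 * math.sin(kappa * length) ** 2 / (4.0 * energy * (energy - v0))
    return x / (1.0 + x)


energy, length = 2.0, 1.0
weak_born = born_reflection(energy, lambda x: 0.02, 0.0, length)
weak_exact = exact_reflection_square(energy, 0.02, length)
strong_born = born_reflection(energy, lambda x: 1.5, 0.0, length)
strong_exact = exact_reflection_square(energy, 1.5, length)
print(f"weak barrier:   Born {weak_born:.3e}, exact {weak_exact:.3e}")
print(f"strong barrier: Born {strong_born:.3f}, exact {strong_exact:.3f}")
```

For the weak barrier the two numbers agree to within a few per cent, while for the strong barrier the Born
estimate is off by a large factor, in line with the remark that the plane wave approximation is rarely adequate
on its own.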

A number of improvements to the Born approximation are possible, including higher order Born
approximations (obtained by inserting lower order approximations to $\psi_k^+$ into equation (A3.11.40), then the
result into (A3.11.41) and (A3.11.42)), and the distorted wave Born approximation (obtained by replacing the
free particle approximation for $\psi_k^+$ by the solution to a Schrödinger equation that includes part of the
interaction potential). For chemical physics
applications, the distorted wave Born approximation is the most often used approach, as the approximation of
$\psi_k^+$ by a plane wave is rarely of sufficient accuracy to be even qualitatively useful. However, even the distorted
wave Born approximation is poorly convergent for many applications, so other exact and approximate
methods need to be considered.

A3.11.2.9 VARIATIONAL METHODS 

A completely different approach to scattering involves writing down an expression that can be used to obtain 
S directly from the wavefunction, and which is stationary with respect to small errors in the wavefunction. In 
this case one can obtain the scattering matrix element by variational theory. A recent review of this topic has 
been given by Miller [32]. There are many different expressions that give S as a functional of the 
wavefunction and, therefore, there are many different variational theories. This section describes the Kohn 
variational theory, which has proven particularly useful in many applications in chemical reaction dynamics. 
To keep the derivation as simple as possible, we restrict our consideration to potentials of the type plotted in 
figure A3.11.1(c) where the wavefunction vanishes in the limit of $x \to -\infty$, and where the S matrix is a scalar
property so we can drop the matrix notation. 

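The Kohn variational approximation states that for a trial wavefunction $\tilde{\psi}$ which has the asymptotic form

$$\tilde{\psi}(x) \xrightarrow[x \to \infty]{} -v^{-1/2} \left(e^{-ikx} - e^{ikx}\, \tilde{S}\right)$$ (A3.11.45)

the quantity

$$S = \tilde{S} + \frac{i}{\hbar} \langle \tilde{\psi} | H - E | \tilde{\psi} \rangle$$ (A3.11.46)

is stationary with respect to variations in $\tilde{\psi}$, and $S = S_{\mathrm{ex}}$, where $S_{\mathrm{ex}}$ is the exact scattering matrix, when
$\tilde{\psi} = \psi_{\mathrm{ex}}$. Note that $\tilde{\psi}$ is not complex conjugated in calculating $\langle \tilde{\psi} |$.

To prove this we expand $\tilde{\psi}$ about the exact wavefunction $\psi_{\mathrm{ex}}$, that is, we let

$$\tilde{\psi} = \psi_{\mathrm{ex}} + \delta\psi.$$ (A3.11.47)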

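$\psi_{\mathrm{ex}}$ here is assumed to have the asymptotic form

$$\psi_{\mathrm{ex}}(x) \xrightarrow[x \to \infty]{} -v^{-1/2} \left(e^{-ikx} - e^{ikx}\, S_{\mathrm{ex}}\right).$$ (A3.11.48)

This means that

$$\delta\psi(x) \xrightarrow[x \to \infty]{} v^{-1/2}\, e^{ikx}\, \delta S$$ (A3.11.49)

where $\delta S = \tilde{S} - S_{\mathrm{ex}}$. Then we see that

$$\langle \tilde{\psi} | H - E | \tilde{\psi} \rangle = \langle \psi_{\mathrm{ex}} | H - E | \delta\psi \rangle + \langle \delta\psi | H - E | \delta\psi \rangle$$ (A3.11.50)

since $(H - E) | \psi_{\mathrm{ex}} \rangle = 0$.

Now use integration by parts twice to show that

$$\int_{-\infty}^{\infty} \psi_{\mathrm{ex}} \frac{d^2 \delta\psi}{dx^2}\, dx = \left[ \psi_{\mathrm{ex}} \frac{d\,\delta\psi}{dx} - \frac{d\psi_{\mathrm{ex}}}{dx}\, \delta\psi \right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} \delta\psi\, \frac{d^2 \psi_{\mathrm{ex}}}{dx^2}\, dx$$ (A3.11.51)

which means that

$$\langle \psi_{\mathrm{ex}} | H - E | \delta\psi \rangle = -\frac{\hbar^2}{2m} \left[ \psi_{\mathrm{ex}} \frac{d\,\delta\psi}{dx} - \frac{d\psi_{\mathrm{ex}}}{dx}\, \delta\psi \right]_{-\infty}^{\infty} + \langle \delta\psi | H - E | \psi_{\mathrm{ex}} \rangle.$$ (A3.11.52)

The last term vanishes, and so does the first at $x = -\infty$. Using the asymptotic forms (A3.11.48) and
(A3.11.49) at $x \to \infty$, together with $v = \hbar k/m$, the nonzero part is then

$$\langle \psi_{\mathrm{ex}} | H - E | \delta\psi \rangle = -\frac{\hbar^2}{2m} \left(-\frac{ik}{v}\right) \delta S \left[ \left(1 - e^{2ikx} S_{\mathrm{ex}}\right) + \left(1 + e^{2ikx} S_{\mathrm{ex}}\right) \right] = \frac{i\hbar^2 k}{mv}\, \delta S = i\hbar\, \delta S.$$ (A3.11.53)

So overall

$$S = \tilde{S} + \frac{i}{\hbar} \left( i\hbar\, \delta S + \langle \delta\psi | H - E | \delta\psi \rangle \right) = S_{\mathrm{ex}} + \frac{i}{\hbar} \langle \delta\psi | H - E | \delta\psi \rangle$$ (A3.11.54)

which means that the deviations from the exact result are of second order. This means that $S$ is stationary with
respect to variations in the trial function. Later (section A3.11.4) we will show how the variational approach
can be used in practical applications where the scattering wavefunction is expanded in terms of basis
functions.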


A3.11.3 MULTICHANNEL QUANTUM SCATTERING THEORY; 
SCATTERING IN THREE DIMENSIONS 

In this section we consider the generalization of quantum scattering theory to problems with many degrees of 
freedom, and to problems where the translational motion takes place in three dimensions rather than one. The 
simplest multidimensional generalization is to consider two degrees of freedom, and we will spend much of 
our development considering this, as it contains the essence of the complexity that can arise in what is called 
'multichannel' scattering theory. Moreover, models containing two degrees of freedom are of use throughout 
the field of chemical physics. For example, this model can be used to describe the collision of an atom with a 
diatomic molecule with the three atoms constrained to be collinear so that only vibrational motion in the 
diatomic molecule needs to be considered in addition to translational motion of the atom relative to the 
molecule. This model is commonly used in studies of vibrational energy transfer [29] where the collision 
causes changes in the vibrational state of the molecule. In addition, this model can be used to describe 
reactive collisions wherein an atom is transferred to form a new diatomic molecule [23, 23 and 24]. We will 
discuss both of these processes in the following two sections (A3.11.3.1 and A3.11.3.2). 

The treatment of translational motion in three dimensions involves representation of particle motions in terms 
of plane waves $e^{i\mathbf{k}\cdot\mathbf{r}}$ where the wavevector $\mathbf{k}$ specifies the direction of motion in addition to the magnitude of 
the velocity. For problems involving the motion of isolated particles, i.e., gas phase collisions, all problems 
can be represented in terms of eigenfunctions of the total angular momentum, which is a conserved quantity. 
The relationship between these eigenfunctions and the plane wave description of particle motions leads to the 
concept of a partial wave expansion, something that is used throughout the field of chemical physics. This is 
described in the third part of this section (A3.11.3.3). 

Problems in chemical physics which involve the collision of a particle with a surface do not have rotational 
symmetry that leads to partial wave expansions. Instead they have two dimensional translational symmetry for 
motions parallel to the surface. This leads to expansion of solutions in terms of diffraction eigenfunctions. 
This theory is described in the literature [33]. 

A3.11.3.1 MULTICHANNEL SCATTERING— COUPLED CHANNEL EQUATIONS 

Consider the collision of an atom (denoted A) with a diatomic molecule (denoted BC), with motion of the 
atoms constrained to occur along a line. In this case there are two important degrees of freedom, the distance 
R between the atom and the centre of mass of the diatomic, and the diatomic internuclear distance r. The 
Hamiltonian in terms of these coordinates is given by: 

$$H = \frac{P_R^2}{2\mu_{A,BC}} + \frac{P_r^2}{2\mu_{BC}} + V(R, r)$$ (A3.11.55) 

$$\phantom{H} = -\frac{\hbar^2}{2\mu_{A,BC}}\frac{\partial^2}{\partial R^2} - \frac{\hbar^2}{2\mu_{BC}}\frac{\partial^2}{\partial r^2} + V(R, r)$$ (A3.11.56) 

where $\mu_{A,BC}$ is the reduced mass associated with motion in the R coordinate, and $\mu_{BC}$ is the corresponding 
diatom reduced mass. Note that this Hamiltonian can be derived by starting with the Hamiltonian of the 
independent atoms and separating out the motion of the centre of mass. The second form (A3.11.56) arises by 
replacing the momentum operators by their usual quantum mechanical expressions. 


We concentrate in this section on solving the time-independent Schrödinger equation, which, as we learned 
from section A3.11.2.5, is all we need to do to generate the physically meaningful scattering information. If 
BC does not dissociate then it is reasonable to use the BC eigenfunctions as a basis for expanding the 
scattering wavefunction. Assume that as $R \to \infty$, $V(R, r) \to V_{BC}(r)$. Then the BC eigenfunctions are solutions 
to 

$$\left(-\frac{\hbar^2}{2\mu_{BC}}\frac{d^2}{dr^2} + V_{BC}(r)\right)\varphi_v(r) = \varepsilon_v\,\varphi_v(r)$$ (A3.11.57) 

where $\varepsilon_v$ is the vibrational eigenvalue. The expansion of $\Psi$ in terms of the BC eigenfunctions is thus given by 

$$\Psi(R, r) = \sum_v \varphi_v(r)\,g_v(R)$$ (A3.11.58) 

where the $g_v$ are unknown functions to be determined. This equation is called a coupled channel expansion. 
Substituting this into the Schrödinger equation, we find 

$$\sum_v\left(-\frac{\hbar^2}{2\mu_{A,BC}}\frac{d^2}{dR^2} - \frac{\hbar^2}{2\mu_{BC}}\frac{\partial^2}{\partial r^2} + V(R, r)\right)g_v(R)\,\varphi_v(r) = E\sum_v g_v(R)\,\varphi_v(r).$$ (A3.11.59) 


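Now rearrange, multiply by $\varphi_v$, and integrate over r to obtain 

$$-\frac{\hbar^2}{2\mu_{A,BC}}\frac{d^2 g_v}{dR^2} = (E - \varepsilon_v)\,g_v - \sum_{v'}\langle\varphi_v|V - V_{BC}|\varphi_{v'}\rangle\,g_{v'}$$ (A3.11.60) 

or 

$$\frac{d^2 g_v}{dR^2} = \sum_{v'} U_{vv'}(R)\,g_{v'}(R)$$ (A3.11.61) 

where 

$$U_{vv'} = -\frac{2\mu_{A,BC}}{\hbar^2}(E - \varepsilon_v)\,\delta_{vv'} + \frac{2\mu_{A,BC}}{\hbar^2}\langle\varphi_v|V - V_{BC}|\varphi_{v'}\rangle.$$ (A3.11.62) 

In matrix-vector form these coupled-channel equations are 

$$\frac{d^2\mathbf{g}}{dR^2} = \mathbf{U}\mathbf{g}$$ (A3.11.63) 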

where $\mathbf{g}$ is the vector formed using the $g_v$ as elements and $\mathbf{U}$ is a matrix whose elements are $U_{vv'}$. Note that the 
internal states may be either open or closed, depending on whether the energy E is above or below the internal 
energy $\varepsilon_v$. Only the open states (often termed open channels) have measurable scattering properties, but the 
closed channels can be populated as intermediates during the collision, sometimes with important physical 
consequences. In the following we confine our discussion to the open channels. The boundary 
conditions on the open channel solutions are: 

$$\mathbf{g}(R) \to 0 \quad\text{as } R \to 0$$ (A3.11.64) 

provided that the potential is repulsive at short range, and 

$$\mathbf{g}(R) \to \mathbf{v}^{-1/2}\left(e^{-i\mathbf{k}R} - e^{i\mathbf{k}R}\,\mathbf{S}\right) \quad\text{as } R \to \infty.$$ (A3.11.65) 

Here we have collected the N independent $\mathbf{g}$ that correspond to different incoming states for N open channels 
into a matrix $\mathbf{g}$ (where the sans serif bold notation is again used to denote a square matrix). Also we have the 
matrices 

$$(\mathbf{v})_{vv'} = v_v\,\delta_{vv'}$$ (A3.11.66) 

$$(\mathbf{k})_{vv'} = k_v\,\delta_{vv'}$$ (A3.11.67) 

where 

$$k_v^2 = \frac{2\mu_{A,BC}}{\hbar^2}(E - \varepsilon_v)$$ (A3.11.68) 

$$v_v = \frac{\hbar k_v}{\mu_{A,BC}}$$ (A3.11.69) 

$$(\mathbf{S})_{vv'} = S_{vv'}.$$ (A3.11.70) 

$\mathbf{S}$ is the scattering matrix, analogous to that defined earlier. As before, the probabilities for transitions between 
states v and v' are 

$$P_{vv'} = |S_{vv'}|^2.$$ (A3.11.71) 

Often in numerical calculations we determine solutions $\mathbf{g}(R)$ that solve the Schrödinger equations but do not 
satisfy the asymptotic boundary condition in (A3.11.65). To solve for $\mathbf{S}$, we rewrite equation (A3.11.65) and 
its derivative with respect to R in the more general form: 

$$\mathbf{g} = (\mathbf{I} - \mathbf{O}\mathbf{S})\mathbf{A}$$ (A3.11.72) 

$$\mathbf{g}' = (\mathbf{I}' - \mathbf{O}'\mathbf{S})\mathbf{A}$$ (A3.11.73) 

where the incoming and outgoing asymptotic solutions are: 

$$\mathbf{I} = \mathbf{k}^{-1/2}e^{-i\mathbf{k}R}$$ (A3.11.74) 

$$\mathbf{O} = \mathbf{k}^{-1/2}e^{i\mathbf{k}R}.$$ (A3.11.75) 

$\mathbf{A}$ is a coefficient matrix that is designed to transform between solutions that obey arbitrary boundary 
conditions and those which obey the desired boundary conditions. $\mathbf{A}$ and $\mathbf{S}$ can be regarded as unknowns in 
equation (A3.11.72) and equation (A3.11.73). This leads to the following expression for $\mathbf{S}$: 

$$\mathbf{S} = \mathbf{W}^{-1}(\mathbf{I}'\mathbf{g} - \mathbf{I}\mathbf{g}')(\mathbf{O}'\mathbf{g} - \mathbf{O}\mathbf{g}')^{-1}\mathbf{W}$$ (A3.11.76) 

where 

$$\mathbf{W} = \mathbf{O}'\mathbf{I} - \mathbf{O}\mathbf{I}'.$$ (A3.11.77) 
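The extraction of S from an arbitrarily normalized solution can be illustrated with a short numerical sketch. The example below is not taken from the text: it assumes NumPy, builds synthetic two-channel data from a chosen unitary S and an arbitrary coefficient matrix A, and then recovers S using equation (A3.11.76). The function names, wavenumbers and phases are hypothetical.

```python
import numpy as np

def asymptotic_waves(k, R):
    """Return I, O and their R-derivatives as diagonal matrices (A3.11.74-75)."""
    k = np.asarray(k, dtype=float)
    inc = np.diag(k**-0.5 * np.exp(-1j * k * R))   # I: incoming waves
    out = np.diag(k**-0.5 * np.exp(+1j * k * R))   # O: outgoing waves
    return inc, out, np.diag(-1j * k) @ inc, np.diag(1j * k) @ out

def extract_s_matrix(g, dg, k, R):
    """Evaluate S = W^-1 (I'g - I g') (O'g - O g')^-1 W, equation (A3.11.76)."""
    inc, out, dinc, dout = asymptotic_waves(k, R)
    W = dout @ inc - out @ dinc                    # (A3.11.77); equals 2i times the unit matrix
    numerator = dinc @ g - inc @ dg
    denominator = dout @ g - out @ dg
    return np.linalg.solve(W, numerator @ np.linalg.inv(denominator) @ W)

# Synthetic data (illustrative values only): a symmetric unitary S and an
# arbitrary coefficient matrix A that mimics "wrong" boundary conditions.
k = np.array([1.3, 0.7])
R = 25.0
c, s = np.cos(0.3), np.sin(0.3)
mix = np.array([[c, -s], [s, c]])
S_true = mix @ np.diag(np.exp(2j * np.array([0.4, -1.1]))) @ mix.T
A = np.array([[0.9, -0.3], [0.2, 1.7]])

inc, out, dinc, dout = asymptotic_waves(k, R)
g = (inc - out @ S_true) @ A                       # (A3.11.72)
dg = (dinc - dout @ S_true) @ A                    # (A3.11.73)

S = extract_s_matrix(g, dg, k, R)
P = np.abs(S) ** 2                                 # transition probabilities (A3.11.71)
```

Because A cancels in the final expression, the recovered S does not depend on how the propagated solutions happen to be normalized, which is the point of introducing A in the first place.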

The present derivation can easily be generalized to systems with an arbitrary number of internal degrees of 
freedom, and it leads to coupled channel equations identical with equation (A3.11.63), where the coupling 
terms (A3.11.62) are expressed as matrix elements of the interaction potential using states which depend on 
these internal degrees of freedom. These internal states could, in principle, have a continuous spectrum but, 
in practice, if there are multiple continuous degrees of freedom then it is most useful to reformulate the 
problem to take this into account. One particularly important case of this sort arises in the treatment of 
reactive collisions, where the atom B is transferred from C to A, leading to the formation of a new 
arrangement of the atoms with its own scattering boundary conditions. We turn our attention to this situation 
in the next section. 

A3.11.3.2 REACTIVE COLLISIONS 

Let us continue with the atom-diatom collinear collision model, this time allowing for the possibility of the 
reaction A + BC → AB + C. We first introduce mass-scaled coordinates, as these are especially convenient to 
describe rearrangements, using 

$$R' = \sqrt{\frac{\mu_{A,BC}}{m}}\,R$$ (A3.11.78) 

$$r' = \sqrt{\frac{\mu_{BC}}{m}}\,r.$$ (A3.11.79) 

The choice of m in these formulas is arbitrary, but it is customary to take either m = 1 or 

$$m = \sqrt{\mu_{A,BC}\,\mu_{BC}} = \sqrt{\frac{m_A m_B m_C}{m_A + m_B + m_C}}.$$ (A3.11.80) 

Either choice is invariant to permutation of the atom masses. 
In terms of these coordinates, the Hamiltonian of equation (A3.11.55) becomes 

$$H = \frac{P_R^2}{2\mu_{A,BC}} + \frac{P_r^2}{2\mu_{BC}} + V = \frac{P_{R'}^2 + P_{r'}^2}{2m} + V.$$ (A3.11.81) 

One nice thing about H in mass-scaled coordinates is that it is identical to the Hamiltonian of a mass point 
moving in two dimensions. This is convenient for visualizing trajectory motions or wavepackets, so the 
mass-scaled coordinates are commonly used for plotting data from scattering calculations. 

Another reason why mass-scaled coordinates are useful is that they simplify the transformation to the Jacobi 
coordinates that are associated with the products AB + C. If we define S as the distance from C to the centre 
of mass of AB, and s as the AB distance, mass scaling is accomplished via 

$$S' = \sqrt{\frac{\mu_{C,AB}}{m}}\,S$$ (A3.11.82) 

$$s' = \sqrt{\frac{\mu_{AB}}{m}}\,s.$$ (A3.11.83) 

The Hamiltonian in terms of product coordinates is 

$$H = \frac{P_{S'}^2 + P_{s'}^2}{2m} + V$$ (A3.11.84) 

and the transformation between reagent and product coordinates is given by 

$$S' = r'\sin\beta + R'\cos\beta, \qquad s' = -r'\cos\beta + R'\sin\beta$$ (A3.11.85) 

where the angle $\beta$ is defined by: 

$$\tan\beta = \sqrt{\frac{m_B(m_A + m_B + m_C)}{m_A m_C}}.$$ (A3.11.86) 


Equation (A3.11.85) implies that the (R', r') → (S', s') transformation is orthogonal, a point which is responsible 
for the similarities between the Hamiltonian expressed in terms of reagent and product mass-scaled 
coordinates ((A3.11.81) and (A3.11.84)). In fact, the reagent to product transformation can be thought of as a 
rotation by an angle $\beta$ followed by a flip in the sign of s'. The angle $\beta$ is sometimes called the 'skew' angle, 
and it can vary between 0 and 90°, as determined by equation (A3.11.86). If $m_A = m_B = m_C$ (i.e., all three 
masses are identical, as in the reaction H + H$_2$), then $\beta$ = 60°, while for $m_B \gg m_A, m_C$, $\beta \to 90°$ and 
$m_B \ll m_A, m_C$ gives $\beta \to 0$. 
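As a quick numerical illustration of the skew angle, the sketch below evaluates equation (A3.11.86) for a few mass combinations and applies the transformation (A3.11.85). The function names and the sample masses are assumptions introduced for illustration only.

```python
import math

def skew_angle(m_a, m_b, m_c):
    """Skew angle beta in radians from equation (A3.11.86)."""
    return math.atan(math.sqrt(m_b * (m_a + m_b + m_c) / (m_a * m_c)))

def reagent_to_product(R_scaled, r_scaled, beta):
    """Map mass-scaled reagent coordinates (R', r') to product coordinates (S', s'), (A3.11.85)."""
    S_scaled = r_scaled * math.sin(beta) + R_scaled * math.cos(beta)
    s_scaled = -r_scaled * math.cos(beta) + R_scaled * math.sin(beta)
    return S_scaled, s_scaled

beta_equal = math.degrees(skew_angle(1.0, 1.0, 1.0))        # three identical masses
beta_light_b = math.degrees(skew_angle(35.0, 1.0, 35.0))    # light atom transferred between heavy ones
beta_heavy_b = math.degrees(skew_angle(1.0, 35.0, 1.0))     # heavy central atom

# The transformation is orthogonal, so the hyperradius is unchanged.
S_p, s_p = reagent_to_product(3.0, 1.2, skew_angle(1.0, 1.0, 1.0))
```

For the equal-mass case the first value reproduces the 60° quoted above, while the other two cases approach the limits of 0 and 90°.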

Although the Schrödinger equation associated with the A + BC reactive collision has the same form as for the 
nonreactive scattering problem that we considered previously, it cannot be solved by the coupled-channel 
expansion used then, as the reagent vibrational basis functions cannot directly describe the product region (for 
an expansion in a finite number of terms). Instead we need to use alternative schemes, of which there are 
many. 

One possibility is to use hyperspherical coordinates, as these enable the use of basis functions which describe 
reagent and product internal states in the same expansion. Hyperspherical coordinates have been extensively 
discussed in the literature [34, 35 and 36] and in the present application they reduce to polar coordinates ($\rho$, $\eta$) 
defined as follows: 

$$\rho = \sqrt{R'^2 + r'^2} = \sqrt{S'^2 + s'^2}, \qquad 0 < \rho < \infty$$ (A3.11.87) 

$$\eta = \tan^{-1}\left(\frac{r'}{R'}\right), \qquad 0 < \eta < \beta.$$ (A3.11.88) 

Hyperspherical coordinates have the properties that $\eta$ motion is always bound since $\eta = 0$ and $\eta = \beta$ 
correspond to cases where two of the three atoms are on top of one another, yielding a very repulsive 
potential. Also, $\rho \to 0$ is a repulsive part of the potential, while large $\rho$ takes us to the reagent and product 
valleys. 

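To develop coupled-channel methods to solve the Schrödinger equation, we first transform the Hamiltonian 
(A3.11.81) to hyperspherical coordinates, yielding: 

$$H = -\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial\rho^2} + \frac{1}{\rho}\frac{\partial}{\partial\rho} + \frac{1}{\rho^2}\frac{\partial^2}{\partial\eta^2}\right) + V.$$ (A3.11.89) 

Now define a new wavefunction $\chi = \rho^{1/2}\Psi$. Then 

$$H\Psi = \left[-\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial\rho^2} + \frac{1}{\rho}\frac{\partial}{\partial\rho} + \frac{1}{\rho^2}\frac{\partial^2}{\partial\eta^2}\right) + V\right]\rho^{-1/2}\chi = E\,\rho^{-1/2}\chi.$$ (A3.11.90) 

After cancelling out a factor $\rho^{-1/2}$ and regrouping, we obtain a new version of the Schrödinger equation in 
which the first derivative term has been eliminated. 

$$\left[-\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial\rho^2} + \frac{1}{4\rho^2} + \frac{1}{\rho^2}\frac{\partial^2}{\partial\eta^2}\right) + V\right]\chi = E\chi.$$ (A3.11.91) 

Now select out the $\eta$-dependent part to define vibrational functions at some specific $\rho$ which we call $\bar\rho$. 

$$-\frac{\hbar^2}{2m\bar\rho^2}\frac{d^2\varphi_n}{d\eta^2} + V(\bar\rho, \eta)\,\varphi_n = \varepsilon_n\varphi_n$$ (A3.11.92) 

with the boundary condition that $\varphi_n \to 0$ as $\eta \to 0$ and $\eta \to \beta$. For large $\bar\rho$ the $\varphi_n$ will become eigenfunctions 
of the reagent and product diatomics. 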

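To set up coupled-channel equations, use the expansion 

$$\Psi = \rho^{-1/2}\sum_n \varphi_n(\eta)\,g_n(\rho).$$ (A3.11.93) 

This leads to 

$$\sum_n\left[-\frac{\hbar^2}{2m}\left(\frac{d^2}{d\rho^2} + \frac{1}{4\rho^2} + \frac{1}{\rho^2}\frac{\partial^2}{\partial\eta^2}\right) + V - E\right]\varphi_n(\eta)\,g_n(\rho) = 0.$$ (A3.11.94) 

Now substitute the $\rho = \bar\rho$ solution for $d^2\varphi_n/d\eta^2$ from above, multiply by $\varphi_n$ and integrate over $\eta$. This gives 

$$\frac{d^2\mathbf{g}}{d\rho^2} = \mathbf{U}\mathbf{g}$$ (A3.11.95) 

where 

$$U_{nn'} = \frac{2m}{\hbar^2}\left\langle\varphi_n\left|V(\rho, \eta) - \frac{\bar\rho^2}{\rho^2}V(\bar\rho, \eta)\right|\varphi_{n'}\right\rangle + \left[\frac{2m}{\hbar^2}\left(\frac{\bar\rho^2}{\rho^2}\varepsilon_n - E\right) - \frac{1}{4\rho^2}\right]\delta_{nn'}.$$ (A3.11.96) 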


This equation may be solved by the same methods as used with the nonreactive coupled-channel equations 
(discussed later in section A3.11.4.2). However, because $V(\rho, \eta)$ changes rapidly with $\rho$, it is desirable to 
periodically change the expansion basis set $\varphi_n$. To do this we divide the range of $\rho$ to be integrated into 
'sectors' and within each sector choose a $\bar\rho$ (usually the midpoint) to define local eigenfunctions. The 
coupled-channel equations just given then apply within each sector, but at sector boundaries we change basis sets. Let 
$\bar\rho_1$ and $\bar\rho_2$ be the $\bar\rho$ associated with adjacent sectors. Then, at the sector boundary $\rho_b$ we require 

$$\Psi_1(\rho_b, \eta) = \Psi_2(\rho_b, \eta)$$ (A3.11.97) 

or 

$$\sum_n \varphi_n(\eta; \bar\rho_1)\,g_n^{(1)}(\rho_b) = \sum_n \varphi_n(\eta; \bar\rho_2)\,g_n^{(2)}(\rho_b).$$ (A3.11.98) 

Multiply by $\varphi_{n'}(\eta; \bar\rho_2)$ and integrate to obtain 

$$\mathbf{g}^{(2)} = \mathbf{S}_{21}\,\mathbf{g}^{(1)}$$ (A3.11.99) 

where 

$$(\mathbf{S}_{21})_{n'n} = \langle\varphi_{n'}(\eta; \bar\rho_2)|\varphi_n(\eta; \bar\rho_1)\rangle.$$ (A3.11.100) 

The corresponding derivative transformation is: 

$$\frac{d\mathbf{g}^{(2)}}{d\rho} = \mathbf{S}_{21}\frac{d\mathbf{g}^{(1)}}{d\rho}.$$ (A3.11.101) 

This scheme makes it possible to propagate $\mathbf{g}$ from small $\rho$ where $\mathbf{g}$ should vanish to large $\rho$ where an 
asymptotic analysis can be performed. 

To perform the asymptotic analysis we need to first write down the proper asymptotic solution. Clearly we 
want some solutions with incoming waves in the reagents, then outgoing waves in both reagents and products 
and other solutions with the reagent and product labels interchanged. One way to do this is to define a matrix 
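of incoming waves $\mathbf{I}$ and outgoing waves $\mathbf{O}$ such that 

$$I_{\alpha v,\alpha' v'} = k_{\alpha v}^{-1/2}\,\delta_{\alpha\alpha'}\,\delta_{vv'}\times\begin{cases} e^{-ik_{\alpha v}R} & \alpha = 1 \\ e^{-ik_{\alpha v}S} & \alpha = 2 \end{cases}$$ (A3.11.102a) 

$$O_{\alpha v,\alpha' v'} = k_{\alpha v}^{-1/2}\,\delta_{\alpha\alpha'}\,\delta_{vv'}\times\begin{cases} e^{ik_{\alpha v}R} & \alpha = 1 \\ e^{ik_{\alpha v}S} & \alpha = 2 \end{cases}$$ (A3.11.102b) 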

where $\alpha$ is an arrangement channel label such that $\alpha$ = 1 and 2 correspond to the 'reagents' and 'products'. 
Also let $\varphi_{\alpha v}$ be the reagent or product vibrational function. Then the asymptotic solution is 




(A3. 11.103) 


We have expressed $\Psi$ in terms of Jacobi coordinates as this is the coordinate system in which the vibrations 
and translations are separable. The separation does not occur in hyperspherical coordinates except at $\rho = \infty$, so 
it is necessary to interrelate coordinate systems to complete the calculations. There are several approaches for 
doing this. One way is to project the hyperspherical solution onto Jacobi coordinates before performing the 
asymptotic analysis, i.e. 

(A3. 11.104) 


The $\mathbf{G}$ matrix is then obtained by performing the quadrature 

$$G_{vv'} = \int dr\,\varphi_v(r)\,\rho^{-1/2}\sum_n \varphi_n(\eta)\,g_{nv'}(\rho)$$ (A3.11.105) 

where $\rho(R, r)$, $\eta(R, r)$ are to be substituted as needed into the right hand side. 

A3.11.3.3 SCATTERING IN THREE DIMENSIONS 

All the theory developed up to this point has been limited in the sense that translational motion (the 
continuum degree of freedom) has been restricted to one dimension. In this section we discuss the 
generalization of this to three dimensions for collision processes where space is isotropic (i.e., collisions in 
homogeneous phases, such as in a 
vacuum, but not collisions with surfaces). We begin by considering collisions involving a single particle in 
three dimensions; the multichannel case is considered subsequently. 

The biggest change associated with going from one to three dimensional translational motion refers to 
asymptotic boundary conditions. In three dimensions, the initial scattering wavefunction for a single particle 
is represented by a plane wave $e^{i\mathbf{k}\cdot\mathbf{r}}$ moving in a direction which we denote with the wavevector $\mathbf{k}$. Scattering 
then produces outgoing spherical waves as $t \to \infty$ weighted by an amplitude $f_k(\theta)$ which specifies the 
scattered intensity as a function of the angle $\theta$ between $\mathbf{k}$ and the observation direction. Mathematically the 
time independent boundary condition analogous to equation (A3.11.13a), equation (A3.11.13b) is: 

$$\psi_{\mathbf{k}}(\mathbf{r}) \xrightarrow[r\to\infty]{} e^{i\mathbf{k}\cdot\mathbf{r}} + f_k(\theta)\,\frac{e^{ikr}}{r}.$$ (A3.11.106) 

Note that for potentials that depend only on the scalar distance r between the colliding particles, the amplitude 
$f_k(\theta)$ does not depend on the azimuthal angle associated with the direction of observation. 

The measurable quantity in a three dimensional scattering experiment is the differential cross section 
$d\sigma_k(\theta)/d\Omega$. This is defined as 

$$\frac{d\sigma_k(\theta)}{d\Omega} = \frac{|\text{outgoing radial flux}|}{|\text{total incident flux}|}$$ (A3.11.107) 

where outgoing flux refers to the radial velocity operator $v_r = -(i\hbar/\mu)\,\partial/\partial r$. Substitution of equation (A3.11.106) 
into (A3.11.107) using (A3.11.16) yields 

$$\frac{d\sigma_k(\theta)}{d\Omega} = |f_k(\theta)|^2.$$ (A3.11.108) 

It is convenient to expand $f_k(\theta)$ in a basis of Legendre polynomials $P_\ell(\cos\theta)$ (as these define the natural 
angular eigenfunctions associated with motion in three dimensions). Here we write: 

$$f_k(\theta) = \sum_\ell f_\ell^k\,P_\ell(\cos\theta).$$ (A3.11.109) 

We call this a partial wave expansion. To determine the coefficients $f_\ell^k$, one matches asymptotic solutions to 
the radial Schrödinger equation with the corresponding partial wave expansion of equation (A3.11.106). It is 
customary to write the asymptotic radial Schrödinger equation solution as 

$$\psi_{\ell m}(r, \theta, \phi) \xrightarrow[r\to\infty]{} \frac{1}{r}\,Y_{\ell m}(\theta, \phi)\left(e^{-i(kr - \ell\pi/2)} - S_\ell\,e^{i(kr - \ell\pi/2)}\right)$$ (A3.11.110) 

where $S_\ell$ is the scattering matrix for the $\ell$th partial wave and m is the projection quantum number associated 
with $\ell$. Unitarity of the scattering matrix implies that $S_\ell$ can be written as $\exp(2i\delta_\ell)$ where $\delta_\ell$ is a real quantity 
known as the phase shift. 

The asymptotic partial wave expansion of equation (A3.11.106) can be developed using the identity 

$$e^{i\mathbf{k}\cdot\mathbf{r}} = e^{ikr\cos\theta} = \sum_{\ell=0}^{\infty} i^\ell\,(2\ell + 1)\,j_\ell(kr)\,P_\ell(\cos\theta)$$ (A3.11.111) 

where $j_\ell(kr)$ is a spherical Bessel function. At large r, the spherical Bessel function reduces to 

$$j_\ell(kr) \xrightarrow[r\to\infty]{} \frac{\sin(kr - \ell\pi/2)}{kr}.$$ (A3.11.112) 

If equation (A3.11.112) is then used to evaluate (A3.11.111) after substitution of the latter into (A3.11.106) 
and if equation (A3.11.109) is also substituted into (A3.11.106) and the result for each $\ell$ and m is equated to 
(A3.11.110), one finds that only m = 0 contributes, and that 

$$f_\ell^k = \frac{2\ell + 1}{2ik}\,(S_\ell - 1).$$ (A3.11.113) 

From equation (A3.11.108), equation (A3.11.109) and equation (A3.11.113) one then finds 

$$\frac{d\sigma_k(\theta)}{d\Omega} = \frac{1}{4k^2}\left|\sum_\ell (2\ell + 1)\,P_\ell(\cos\theta)\,(S_\ell - 1)\right|^2$$ (A3.11.114a) 

$$\phantom{\frac{d\sigma_k(\theta)}{d\Omega}} = \frac{1}{k^2}\left|\sum_\ell (2\ell + 1)\,P_\ell(\cos\theta)\,e^{i\delta_\ell}\sin\delta_\ell\right|^2.$$ (A3.11.114b) 

This differential cross section may be integrated over scattering angles to define an integral cross section $\sigma$ as 
follows: 

$$\sigma = 2\pi\int_0^\pi \sin\theta\,d\theta\,\frac{d\sigma_k(\theta)}{d\Omega} = \frac{\pi}{k^2}\sum_\ell (2\ell + 1)\,|S_\ell - 1|^2$$ (A3.11.115a) 

$$\phantom{\sigma} = \frac{4\pi}{k^2}\sum_\ell (2\ell + 1)\sin^2\delta_\ell.$$ (A3.11.115b) 

Equations (A3.11.114b) and (A3.11.115b) are in a form that is convenient to use for potential scattering 
problems. One needs only to determine the phase shift $\delta_\ell$ for each $\ell$, then substitute into these equations to 
determine the cross sections. Note that in the limit of large $\ell$, $\delta_\ell$ must vanish so that the infinite sum over 
partial waves $\ell$ will converge. For most potentials of interest to chemical physics, the calculation of $\delta_\ell$ must be 
done numerically. 
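Once the phase shifts are known, the cross sections follow by direct summation. The following sketch (assuming NumPy; the wavenumber and phase shifts are arbitrary illustrative numbers, not results for any particular potential) evaluates the scattering amplitude from (A3.11.109) and (A3.11.113), and checks that integrating $|f_k(\theta)|^2$ over angles by Gauss-Legendre quadrature reproduces the closed form (A3.11.115b).

```python
import numpy as np
from numpy.polynomial import legendre

def scattering_amplitude(theta, k, delta):
    """f_k(theta) = (1/k) sum_l (2l+1) exp(i delta_l) sin(delta_l) P_l(cos theta)."""
    ell = np.arange(len(delta))
    coeffs = (2 * ell + 1) * np.exp(1j * delta) * np.sin(delta) / k
    return legendre.legval(np.cos(theta), coeffs)

def integral_cross_section(k, delta):
    """Integral cross section from equation (A3.11.115b)."""
    ell = np.arange(len(delta))
    return 4 * np.pi / k**2 * np.sum((2 * ell + 1) * np.sin(delta) ** 2)

# Illustrative input: four partial waves with decreasing phase shifts.
k = 2.0
delta = np.array([1.2, 0.6, 0.15, 0.02])

# Angular integration of the differential cross section (A3.11.114b):
# sigma = 2 pi * integral over x = cos(theta) of |f|^2.
x, w = legendre.leggauss(16)
dcs = np.abs(scattering_amplitude(np.arccos(x), k, delta)) ** 2
sigma_quadrature = 2 * np.pi * np.sum(w * dcs)
sigma_sum = integral_cross_section(k, delta)
```

The two numbers agree to rounding error because the orthogonality of the Legendre polynomials removes all cross terms between different partial waves.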

Equation (A3.11.115a) is also useful as a form that enables easy generalization of the potential scattering 
theory that we have just derived to multistate problems. In particular, if we imagine that we are interested in 
the collision of two molecules A and B starting out in states $n_A$ and $n_B$ and ending up in states $n'_A$ and $n'_B$, 
then the asymptotic wavefunction analogous to equation (A3.11.106) is 

$$\psi_{\mathbf{k}n_A n_B} \to \exp(i\mathbf{k}_{n_A n_B}\cdot\mathbf{r})\,|n_A n_B\rangle + r^{-1}\sum_{n'_A n'_B} f_{n_A n_B\to n'_A n'_B}(\theta)\,\exp(ik_{n'_A n'_B}r)\,|n'_A n'_B\rangle$$ (A3.11.116) 

where the scattering amplitude f is now labelled by the initial and final state indices. Integral cross sections are 
then obtained using the following generalization of equation (A3.11.115a): 

$$\sigma_{n_A n_B\to n'_A n'_B} = \frac{\pi}{k_{n_A n_B}^2}\sum_J (2J + 1)\left|S^J_{n_A n_B, n'_A n'_B} - \delta_{n_A n'_A}\delta_{n_B n'_B}\right|^2$$ (A3.11.117) 

where $\mathbf{S}$ is the multichannel scattering matrix, $\delta$ is the Kronecker delta function and J is the total angular 
momentum (i.e., the vector sum of the orbital angular momentum $\ell$ plus the angular momenta of the molecules 
A and B). Here the sum is over J rather than $\ell$, because $\ell$ is not a conserved quantity due to coupling with 
angular momenta in the molecules A and B. 


A3.11.4 COMPUTATIONAL METHODS AND STRATEGIES FOR 
SCATTERING PROBLEMS 

In this section we present several numerical techniques that are commonly used to solve the Schrödinger 
equation for scattering processes. Because the potential energy functions used in many chemical physics 
problems are complicated (but known to reasonable precision), new numerical methods have played an 
important role in extending the domain of application of scattering theory. Indeed, although much of the 
formal development of the previous sections was known 30 years ago, the numerical methods (and 
computers) needed to put this formalism to work have only been developed since then. 

This section is divided into two parts: the first concerned with time-dependent methods for describing the 
evolution of wavepackets and the second concerned with time-independent methods for solving the time 
independent Schrödinger equation. The methods described are designed to be representative of what is in use, 
but not exhaustive. More detailed discussions of time-dependent and time-independent methods are given in 
the literature [37, 38]. 

A3.11.4.1 TIME-DEPENDENT WAVEPACKET METHODS 

(A) OVERALL STRATEGY 

The methods described here are all designed to determine the time evolution of wavepackets that have been 
previously defined. This is only one of several steps for using wavepackets to solve scattering problems. The 
overall procedure involves the following steps: 

(a) First, choose an initial wavepacket $\psi(x, t)$ that describes the range of energies and initial conditions that 
we want to simulate and which is numerically as well behaved as possible. Typically, this is chosen to 
be a Gaussian function of the translational coordinate x, with mean velocity and width chosen to 
describe the range of interest. In making this choice, one needs to consider how the spatial part of the 
Schrodinger equation is to be handled, i.e., whether the dependence of the wavepacket on spatial 
coordinates is to be represented on a grid, or in terms of basis functions. 

(b) Second, one propagates this wavepacket in time using one of the methods described below, for a 
sufficient length of time to describe the scattering process of interest. 

(c) Third, one calculates the scattering information of interest, such as the outgoing flux. 

Typically, the ratio of this to the incident flux determines the transition probability. This information will be 
averaged over the energy range of the initial wavepacket, unless one wants to project out specific energies 
from the solution. This projection procedure is accomplished using the following expression for the energy 
resolved (time-independent) wavefunction in terms of its time-dependent counterpart: 




3C 

(A3.11.118) 
re 


where 


tf[£) = 4T<C-*V(0)>. (A3.11.119) 

(B) SECOND ORDER DIFFERENCING 

A very simple procedure for time evolving the wavepacket is the second order differencing method. Here we 
illustrate how this method is used in conjunction with a fast Fourier transform method for evaluating the 
spatial coordinate derivatives in the Hamiltonian. 

If we write the time-dependent Schrödinger equation as $\partial\psi/\partial t = -(i/\hbar)H\psi$, then, after replacing the time 
derivative by a central difference, we obtain 

$$\frac{\partial\psi}{\partial t} \approx \frac{\psi(t + \Delta t) - \psi(t - \Delta t)}{2\Delta t} = -\frac{i}{\hbar}H\psi(t).$$ (A3.11.120) 

After rearranging this becomes 

$$\psi(t + \Delta t) = \psi(t - \Delta t) - \frac{2i\Delta t}{\hbar}H\psi(t).$$ (A3.11.121) 

To invoke this algorithm, we need to evaluate $H\psi = (T + V)\psi$. If $\psi$ is represented on a uniform grid in 
coordinate x, an effective scheme is to use fast Fourier transforms (FFTs) to evaluate $T\psi$. Thus, in one 
dimension we have, with n points on the grid, 

$$\tilde\psi(k_j, t) = \sum_{m=1}^{n} e^{ik_j x_m}\,\psi(x_m, t)$$ (A3.11.122) 

where the corresponding momentum grid is 

$$k_j = \frac{2\pi j}{n\,\Delta x}.$$ (A3.11.123) 

Differentiation of (A3.11.122) then gives 

$$T\tilde\psi(k_j, t) = \frac{p_j^2}{2m}\,\tilde\psi(k_j, t)$$ (A3.11.124a) 

where $p_j = \hbar k_j$. This expression can be inverted to give 

$$T\psi(x_m, t) = \frac{1}{n}\sum_{j=1}^{n} e^{-ik_j x_m}\,\frac{p_j^2}{2m}\,\tilde\psi(k_j, t).$$ (A3.11.124b) 

This expression, in combination with (A3.11.122), determines the action of the kinetic energy operator on the 
wavefunction at each grid point. The action of V is just $V(x_i)\psi(x_i)$ at each grid point. 
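The FFT evaluation of the kinetic energy and the differencing step (A3.11.121) can be sketched in a few lines. The code below is an illustration only: it assumes NumPy, units with $\hbar$ = m = 1, and a periodic grid, and the function names are hypothetical. Note that `numpy.fft.fft` uses the opposite sign convention in the exponent from (A3.11.122), which does not affect the result because the forward and inverse transforms are used as a pair.

```python
import numpy as np

HBAR = 1.0   # assumed units
MASS = 1.0

def kinetic_on_grid(psi, dx):
    """Apply T = p^2/2m to psi by transforming to the momentum grid and back."""
    k = 2 * np.pi * np.fft.fftfreq(psi.size, d=dx)          # momentum grid k_j
    return np.fft.ifft((HBAR * k) ** 2 / (2 * MASS) * np.fft.fft(psi))

def hamiltonian_on_grid(psi, v, dx):
    """H psi = T psi + V psi, with V acting pointwise on the grid."""
    return kinetic_on_grid(psi, dx) + v * psi

def sod_step(psi_prev, psi_now, v, dx, dt):
    """Second order differencing: psi(t+dt) = psi(t-dt) - (2i dt/hbar) H psi(t)."""
    return psi_prev - 2j * dt / HBAR * hamiltonian_on_grid(psi_now, v, dx)

# Demonstration on a free plane wave that is periodic on the grid.
n, length = 64, 20.0
dx = length / n
x = np.arange(n) * dx
k0 = 2 * np.pi * 5 / length
energy = (HBAR * k0) ** 2 / (2 * MASS)
psi0 = np.exp(1j * k0 * x)
v = np.zeros(n)

dt, n_steps = 0.01, 200
psi_prev = psi0 * np.exp(+1j * energy * dt / HBAR)           # exact psi(-dt) to start the recursion
psi_now = psi0
for _ in range(n_steps):
    psi_prev, psi_now = psi_now, sod_step(psi_prev, psi_now, v, dx, dt)
psi_exact = psi0 * np.exp(-1j * energy * n_steps * dt / HBAR)
```

The scheme needs the wavefunction at two successive times to get started; here the exact value at $t = -\Delta t$ is supplied, whereas in practice a single lower-order step is a common alternative.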

(C) SPLIT-OPERATOR OR FEIT-FLECK METHOD 

A more powerful method for evaluating the time derivative of the wavefunction is the split-operator method 
[39]. Here we start by formally solving $i\hbar\,\partial\psi/\partial t = H\psi$ with the solution $\psi(t) = e^{-iHt/\hbar}\psi(0)$. Note that H is 
assumed to be time-independent. Now imagine evaluating the propagator $e^{-iH\Delta t/\hbar}$ over a short time interval. 

$$e^{-iH\Delta t/\hbar} = e^{-i(T + V)\Delta t/\hbar} \approx e^{-iV\Delta t/2\hbar}\,e^{-iT\Delta t/\hbar}\,e^{-iV\Delta t/2\hbar}.$$ (A3.11.125) 

Evidently, this formula is not exact if T and V do not commute. However for short times it is a good 
approximation, as can be verified by comparing terms in Taylor series expansions of the middle and 
right-hand expressions in (A3.11.125). This approximation is intrinsically unitary, which means that scattering 
information obtained from this calculation automatically conserves flux. 

The complete propagator is then constructed by piecing together N time steps, leading to 

$$e^{-iHt/\hbar} = e^{-iV\Delta t/2\hbar}\,e^{-iT\Delta t/\hbar}\,e^{-iV\Delta t/\hbar}\cdots e^{-iV\Delta t/\hbar}\,e^{-iT\Delta t/\hbar}\,e^{-iV\Delta t/2\hbar}$$ (A3.11.126) 

where $t = N\Delta t$. To evaluate each term we can again do it on a grid, using FFTs as described above to evaluate $e^{-iT\Delta t/\hbar}$. 
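A single split-operator step amounts to two pointwise multiplications in coordinate space and one in momentum space. The sketch below assumes NumPy, units with $\hbar$ = m = 1 by default, and a periodic grid; the Gaussian wavepacket, the barrier and all numerical parameters are illustrative choices rather than values from the text.

```python
import numpy as np

def split_operator_step(psi, v, dx, dt, mass=1.0, hbar=1.0):
    """Advance psi by dt using exp(-iV dt/2hbar) exp(-iT dt/hbar) exp(-iV dt/2hbar)."""
    k = 2 * np.pi * np.fft.fftfreq(psi.size, d=dx)
    half_potential = np.exp(-0.5j * v * dt / hbar)
    kinetic = np.exp(-1j * hbar * k**2 * dt / (2 * mass))   # diagonal in momentum space
    psi = half_potential * psi
    psi = np.fft.ifft(kinetic * np.fft.fft(psi))
    return half_potential * psi

def gaussian_packet(x, x0, k0, width):
    """Normalized-by-hand Gaussian wavepacket centred at x0 with mean wavenumber k0."""
    return np.exp(-((x - x0) ** 2) / (4 * width**2) + 1j * k0 * x)

n, length = 512, 80.0
dx = length / n
x = (np.arange(n) - n // 2) * dx
psi = gaussian_packet(x, x0=-10.0, k0=2.0, width=1.0)
psi = psi / np.sqrt(np.sum(np.abs(psi) ** 2) * dx)

barrier = 1.5 * np.exp(-x**2)                               # hypothetical Gaussian barrier
for _ in range(400):
    psi = split_operator_step(psi, barrier, dx, dt=0.01)

norm = np.sum(np.abs(psi) ** 2) * dx                        # stays at 1: the step is unitary
transmitted = np.sum(np.abs(psi[x > 0]) ** 2) * dx          # probability beyond the barrier centre
```

Every factor has unit modulus and the FFT pair is unitary, so the norm is conserved to rounding error regardless of the size of the time step; only the accuracy of the dynamics depends on it.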

(D) CHEBYSHEV METHOD 

Another approach [40] is to expand $e^{-iH\Delta t/\hbar}$ in terms of Chebyshev polynomials and to evaluate each term in 
the polynomial at the end of the time interval. Here a Chebyshev expansion is chosen as it gives the most 
uniform convergence in representing the exponential over the chosen time interval. The time interval $\Delta t$ is 
typically chosen to be several hundred of the time steps that would be used in the second order differencing or 
split-operator methods. Although the Chebyshev method is not intrinsically unitary, it is capable of much 
higher accuracy than the second order differencing or split-operator methods [41]. 

In order to apply this method it is necessary to scale H to lie in a certain finite interval which is usually chosen 
to be (-1, 1). Thus, if $V_{\max}$ and $V_{\min}$ are estimates of the maximum and minimum potentials and $T_{\max}$ is the 
maximum kinetic energy, we use 

$$H_{\rm norm} = \frac{H - (R + V_{\min})}{R}$$ (A3.11.127) 

where 

$$R = (T_{\max} + V_{\max} - V_{\min})/2.$$ (A3.11.128) 

This choice restricts the range of values of $H_{\rm norm}$ to the interval (-1, 1). Then the propagator becomes 

$$e^{-iH\Delta t/\hbar} = e^{-iH_{\rm norm}R\Delta t/\hbar}\,e^{-i(R + V_{\min})\Delta t/\hbar}.$$ (A3.11.129) 

Now we replace the first exponential in the right-hand side of (A3.11.129) by a Chebyshev expansion as 
follows: 

$$e^{-iH_{\rm norm}R\Delta t/\hbar} = \sum_{k=0}^{N} C_k\,J_k(R\Delta t/\hbar)\,i^k\,T_k(-H_{\rm norm})$$ (A3.11.130) 

where $T_k$ is a Chebyshev polynomial, and $J_k$ is a Bessel function. The coefficients $C_k$ are fixed at $C_0 = 1$, 
$C_k = 2$, $k > 0$. 

To apply this method, the $J_k$ are calculated once and stored while the $T_k$ are generated using the recursion 
formula: 

$$T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x)$$ (A3.11.131) 

with $T_0 = 1$, $T_1 = x$. Actually the $T_k$ are never explicitly stored, as all we really want is $T_k$ operating onto a 
wavefunction. However, the recursion formula is still used to generate this, so the primary computational step 
involves $H_{\rm norm}$ operating onto wavefunctions. This can be done using FFTs as discussed previously. 
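The Chebyshev recursion can be sketched as follows, again assuming NumPy, a periodic FFT grid and $\hbar$ = m = 1 by default. Two choices here are ours rather than the text's: the expansion is written in the equivalent form $\sum_k C_k(-i)^k J_k(\alpha)T_k(H_{\rm norm})$, which follows from $T_k(-x) = (-1)^k T_k(x)$, and the Bessel functions are obtained from their integral representation by a simple trapezoid sum (a library routine such as `scipy.special.jv` would normally be used instead). The truncation rule for the number of terms is also an assumption.

```python
import numpy as np

def bessel_j(order, x, n_quad=2048):
    """J_order(x) from (1/2pi) * integral of cos(order*tau - x*sin(tau)) over one period."""
    tau = 2 * np.pi * np.arange(n_quad) / n_quad
    return np.mean(np.cos(order * tau - x * np.sin(tau)))

def chebyshev_propagate(psi, v, dx, dt, mass=1.0, hbar=1.0):
    """Apply exp(-iH dt/hbar) to psi via the Chebyshev expansion (A3.11.129-131)."""
    k = 2 * np.pi * np.fft.fftfreq(psi.size, d=dx)
    t_k = (hbar * k) ** 2 / (2 * mass)
    v_min, v_max = v.min(), v.max()
    R = 0.5 * (t_k.max() + v_max - v_min)                   # (A3.11.128)

    def h_norm(phi):
        # Scaled Hamiltonian of (A3.11.127); spectrum lies within [-1, 1].
        h_phi = np.fft.ifft(t_k * np.fft.fft(phi)) + v * phi
        return (h_phi - (R + v_min) * phi) / R

    alpha = R * dt / hbar
    n_terms = int(1.5 * alpha) + 20                         # assumed truncation rule
    phi_prev, phi = psi, h_norm(psi)                        # T_0 psi and T_1 psi
    result = bessel_j(0, alpha) * phi_prev + 2 * (-1j) * bessel_j(1, alpha) * phi
    for order in range(2, n_terms):
        phi_prev, phi = phi, 2 * h_norm(phi) - phi_prev     # recursion (A3.11.131)
        result = result + 2 * (-1j) ** order * bessel_j(order, alpha) * phi
    return np.exp(-1j * (R + v_min) * dt / hbar) * result

# Check on the harmonic oscillator ground state, which only acquires a phase.
n, length = 128, 20.0
dx = length / n
x = (np.arange(n) - n // 2) * dx
v = 0.5 * x**2
psi0 = np.exp(-0.5 * x**2).astype(complex)
psi_t = chebyshev_propagate(psi0, v, dx, dt=0.5)
psi_expected = np.exp(-1j * 0.5 * 0.5) * psi0               # ground-state energy 1/2
```

Only the two most recent vectors of the recursion are held in memory, in line with the remark above that the $T_k$ themselves are never stored.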

(E) SHORT ITERATIVE LANCZOS METHOD 

Another approach involves starting with an initial wavefunction $\psi_0$, represented on a grid, then generating 
$H\psi_0$, and considering that this, after orthogonalization to $\psi_0$, defines a new state vector. Successive applications 
of H can now be used to define an orthogonal set of vectors which defines a Krylov space via the iteration 
(n = 0, ..., N): 

$$\beta_n|\psi_{n+1}\rangle = (H - \alpha_n)|\psi_n\rangle - \beta_{n-1}|\psi_{n-1}\rangle$$ (A3.11.132) 

where 

$$\alpha_n = \langle\psi_n|H|\psi_n\rangle, \qquad \beta_{n-1} = \langle\psi_{n-1}|H|\psi_n\rangle.$$ (A3.11.133) 

The Hamiltonian in this vector space is 

$$\mathbf{H} = \begin{pmatrix} \alpha_0 & \beta_0 & & & \\ \beta_0 & \alpha_1 & \beta_1 & & \\ & \beta_1 & \alpha_2 & \beta_2 & \\ & & \beta_2 & \alpha_3 & \ddots \\ & & & \ddots & \ddots \end{pmatrix}.$$ (A3.11.134) 

Here $\mathbf{H}$ forms an N x N matrix, where N is the dimensionality of the space and is generally much smaller than 
the number of grid points. 

Now diagonalize $\mathbf{H}$, calling the eigenvalues $\lambda_k$ and eigenvectors $\mathbf{T}$. Numerically, this is a very efficient 
process due to the tridiagonal form of (A3.11.134). The resulting eigenvalues and eigenvectors are then used 
to propagate for a short time $\Delta t$ via 

$$[\psi(t + \Delta t)]_j = \sum_k (\psi_k)_j \sum_{k'} T_{kk'}\,e^{-i\lambda_{k'}\Delta t/\hbar}\,(T^\dagger)_{k'0}$$ (A3.11.135) 

where the starting vector $\psi_0$ is taken to be the normalized $\psi(t)$ and $(\psi_k)_j$ are coefficients that transfer 
between the jth grid point and the kth order Krylov space in (A3.11.132). 
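A compact sketch of one short iterative Lanczos step is given below. It assumes NumPy and a Hermitian Hamiltonian supplied as a function acting on a vector; the small test matrix, the Krylov dimension and the breakdown tolerance are illustrative choices, and the function name is hypothetical.

```python
import numpy as np

def lanczos_step(apply_h, psi, dt, n_krylov=12, hbar=1.0):
    """Propagate psi by dt within a Krylov space built by the iteration (A3.11.132)."""
    norm = np.linalg.norm(psi)
    basis = [psi / norm]                                    # psi_0: normalized starting vector
    alphas, betas = [], []
    for n in range(n_krylov):
        w = apply_h(basis[n])
        alpha = np.vdot(basis[n], w).real                   # alpha_n = <psi_n|H|psi_n>
        w = w - alpha * basis[n]
        if n > 0:
            w = w - betas[n - 1] * basis[n - 1]
        alphas.append(alpha)
        beta = np.linalg.norm(w)                            # beta_n, fixed by normalization
        if n == n_krylov - 1 or beta < 1e-12:
            break
        betas.append(beta)
        basis.append(w / beta)

    # Tridiagonal Hamiltonian (A3.11.134) and its eigen-decomposition.
    h_krylov = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    lam, T = np.linalg.eigh(h_krylov)
    # Column 0 of exp(-i H_krylov dt/hbar): expansion coefficients of psi(t+dt).
    coeffs = T @ (np.exp(-1j * lam * dt / hbar) * T[0, :])
    return norm * sum(c * q for c, q in zip(coeffs, basis))

# Illustrative test problem: a small real symmetric matrix standing in for H on a grid.
rng = np.random.default_rng(0)
n = 40
B = rng.standard_normal((n, n))
H = np.diag(np.linspace(0.0, 5.0, n)) + 0.1 * (B + B.T)
psi = rng.standard_normal(n) + 1j * rng.standard_normal(n)

psi_lanczos = lanczos_step(lambda vec: H @ vec, psi, dt=0.1)

# Reference result from full diagonalization of H.
energies, vectors = np.linalg.eigh(H)
psi_reference = vectors @ (np.exp(-1j * energies * 0.1) * (vectors.T @ psi))
```

The only operation involving the full grid is the action of H on a vector, so the same routine applies unchanged when `apply_h` is implemented with FFTs as discussed earlier.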


A3.11.4.2 TIME-INDEPENDENT METHODS 

Here we discuss several methods that are commonly used to propagate coupled-channel equations, and we 
also present a linear algebra method for applying variational theory. The coupled-channel equations are 
coupled ordinary differential equations, so they can in principle be solved using any one of a number of 
standard methods for doing this (Runge-Kutta, predictor-corrector etc). However these methods are very 
inefficient for this application and a number of alternatives have been developed which take advantage of 
specific features of the problems being solved. 

(A) GORDON-TYPE METHODS 

In many kinds of atomic and molecular collision problem the wavefunction has many oscillations because the 
energy is high (i.e., $g(R) \approx e^{ikR}$ and k is large). In this case it is useful to expand $g(R)$ in terms of oscillatory 
solutions to some reference problem that is similar to the desired one and then regard the expansion 
coefficients as the quantity being integrated, thereby removing most or all of the oscillations from the R 
dependence of the coefficients. 

For example, suppose that we divide coordinate space into steps $\Delta R$, then evaluate $\mathbf{U}$ (in equation (A3.11.61)) 
at the middle of each step and regard this as the reference for propagation within this step. Further, let us 
diagonalize $\mathbf{U}$, calling the eigenvalues $u_k$ and eigenvectors $\mathbf{T}$. Then, as long as the variation in eigenvalues and 
eigenvectors can be neglected in each step, the Schrödinger equation solution within each step is easily 
expressed in terms of $\sin w_k\Delta R$ and $\cos w_k\Delta R$, where $w_k = \sqrt{-u_k}$ (or exponential solutions if $u_k > 0$). In particular, if 
$\mathbf{g}(R_0)$ is the solution at the beginning of each step, then $\mathbf{T}\mathbf{g}$ transforms into the diagonalized representation and 
$\mathbf{T}\mathbf{g}'$ is the corresponding derivative. The complete solution at the end of each step would then be 

$$\mathbf{g}(R_1) = \mathbf{T}^{-1}\left(\mathbf{w}^{-1}\sin(\mathbf{w}\Delta R)\,\mathbf{T}\mathbf{g}'(R_0) + \cos(\mathbf{w}\Delta R)\,\mathbf{T}\mathbf{g}(R_0)\right)$$ (A3.11.136) 

$$\mathbf{g}'(R_1) = \mathbf{T}^{-1}\left(\cos(\mathbf{w}\Delta R)\,\mathbf{T}\mathbf{g}'(R_0) - \mathbf{w}\sin(\mathbf{w}\Delta R)\,\mathbf{T}\mathbf{g}(R_0)\right).$$ (A3.11.137) 

In principle, one can do better by allowing for ^-dependence to U and T. If we allow them to vary linearly 
with 7?, then we have Gordon's method [42]. However, the higher order evaluation in this case leads to a 
much more cumbersome theory that is often less efficient even though larger steps can be used. 
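
The constant-reference step of equations (A3.11.136) and (A3.11.137) can be sketched for a single channel, where the transformation $\mathbf{T}$ reduces to unity. This is only an illustration under stated assumptions: the function names `propagate_step` and `propagate` are hypothetical, the equation is taken to be $g'' = u(R)\,g$, and $u$ is sampled at the midpoint of each step as described above.

```python
import math

def propagate_step(g, dg, u, dR):
    """Advance (g, g') across one step on which g'' = u*g with u held constant."""
    if u < 0:                      # open channel: oscillatory, w = sqrt(-u)
        w = math.sqrt(-u)
        s, c = math.sin(w * dR), math.cos(w * dR)
        return (s / w) * dg + c * g, c * dg - w * s * g
    if u > 0:                      # closed channel: exponential, kappa = sqrt(u)
        kap = math.sqrt(u)
        s, c = math.sinh(kap * dR), math.cosh(kap * dR)
        return (s / kap) * dg + c * g, c * dg + kap * s * g
    return g + dR * dg, dg         # u == 0: straight-line solution

def propagate(u_of_R, R0, R1, nsteps, g0, dg0):
    """Propagate from R0 to R1, evaluating u at the middle of every step."""
    dR = (R1 - R0) / nsteps
    g, dg = g0, dg0
    for n in range(nsteps):
        R_mid = R0 + (n + 0.5) * dR
        g, dg = propagate_step(g, dg, u_of_R(R_mid), dR)
    return g, dg

# Constant u = -4 with g(0) = 0, g'(0) = 2 should reproduce g(R) = sin(2R).
g_end, dg_end = propagate(lambda R: -4.0, 0.0, 1.0, 10, 0.0, 2.0)
```

For a constant $u$ the reference problem is the actual problem, so the step size does not affect the accuracy; for a varying $u$ the error is controlled by how much $u$ changes across each step.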

One problem with using this method (or any method that propagates $\psi$) is that in regions where $u_k > 0$, the
so-called 'closed' channels, the solutions increase exponentially. If such solutions exist for some channels while
others are still open, the closed-channel solutions can become numerically dominant (i.e., so much bigger that
they overwhelm the open-channel solutions to within machine precision and, after a while, all channels
propagate as if they are closed).

To circumvent this, it is necessary to 'stabilize' the solutions periodically. Typically this is done by
multiplying $\mathbf{g}(R)$ and $\mathbf{g}'(R)$ by some matrix $\mathbf{h}$ that 'orthogonalizes' the solutions as best one can. For
example, this can be done using $\mathbf{h} = \mathbf{g}^{-1}(R_1)$ where $R_1$ is the value of $R$ at the end of the 'current' step. Thus,
after stabilization, the new $\mathbf{g}$ and $\mathbf{g}'$ are:

$$\mathbf{g}_{\rm new}(R_1) = \mathbf{g}_{\rm old}(R_1)\,\mathbf{g}_{\rm old}^{-1}(R_1) = \mathbf{I} \tag{A3.11.138}$$

$$\mathbf{g}'_{\rm new}(R_1) = \mathbf{g}'_{\rm old}(R_1)\,\mathbf{g}_{\rm old}^{-1}(R_1). \tag{A3.11.139}$$

One consequence of performing the stabilization procedure is that the initial conditions that correspond to the
current $\mathbf{g}(R)$ are changed each time stabilization is performed. However, this does not matter as long as the initial
$\mathbf{g}(R)$ value corresponds to the limit $R \to 0$, as then all one needs is for $\mathbf{g}(R)$ to be small (i.e., the actual value
is not important).

(B) LOG DERIVATIVE PROPAGATION 

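One way to avoid the stabilization problem just mentioned is to propagate the log derivative matrix $\mathbf{Y}(R)$ [43].
This is defined by

$$\mathbf{Y}(R) = \mathbf{g}'(R)\,\mathbf{g}^{-1}(R) \tag{A3.11.140}$$

and it remains well behaved numerically even when $\mathbf{g}(R)$ grows exponentially. The differential equation
obeyed by $\mathbf{Y}$ is

$$\mathbf{Y}'(R) = \mathbf{g}''(R)\,\mathbf{g}^{-1}(R) - \mathbf{g}'(R)\,\mathbf{g}^{-1}(R)\,\mathbf{g}'(R)\,\mathbf{g}^{-1}(R). \tag{A3.11.141}$$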

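It turns out that one cannot propagate $\mathbf{Y}$ using standard numerical methods because $|\mathbf{Y}|$ blows up whenever $|\mathbf{g}|$
is zero. To circumvent this one must propagate $\mathbf{Y}$ by 'invariant imbedding'. The basic idea here is to construct
a propagator $\mathcal{Y}$ which satisfies

$$\begin{pmatrix} \mathbf{g}'(R') \\ \mathbf{g}'(R'') \end{pmatrix} = \begin{pmatrix} \mathcal{Y}_1(R',R'') & \mathcal{Y}_2(R',R'') \\ \mathcal{Y}_3(R',R'') & \mathcal{Y}_4(R',R'') \end{pmatrix} \begin{pmatrix} -\mathbf{g}(R') \\ \mathbf{g}(R'') \end{pmatrix} \tag{A3.11.142}$$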


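where $R'$ and $R''$ might form the beginning and end of a propagation step. Assuming for the moment that we
know what the $\mathcal{Y}_i$ are, then the evolution of $\mathbf{Y}$ is as follows

$$\mathbf{Y}(R'') = \mathbf{g}'(R'')\,\mathbf{g}^{-1}(R'') = -\mathcal{Y}_3\,\mathbf{g}(R')\,\mathbf{g}^{-1}(R'') + \mathcal{Y}_4\,\mathbf{g}(R'')\,\mathbf{g}^{-1}(R'') \tag{A3.11.143}$$

$$\mathbf{Y}(R') = \mathbf{g}'(R')\,\mathbf{g}^{-1}(R') = -\mathcal{Y}_1\,\mathbf{g}(R')\,\mathbf{g}^{-1}(R') + \mathcal{Y}_2\,\mathbf{g}(R'')\,\mathbf{g}^{-1}(R'). \tag{A3.11.144}$$

So the overall result is

$$\mathbf{Y}(R'') = \mathcal{Y}_4 - \mathcal{Y}_3\left[\mathbf{Y}(R') + \mathcal{Y}_1\right]^{-1}\mathcal{Y}_2. \tag{A3.11.145}$$

To solve for the $\mathcal{Y}_i$, we begin by solving a reference problem wherein the coupling matrix is assumed diagonal
with constant couplings within each step. (This could be accomplished by diagonalizing $\mathbf{U}$, but it would be
better to avoid this work and use the diagonal $\mathbf{U}$ matrix elements.) Then, in terms of the reference $\mathbf{U}$ (which
we call $\mathbf{U}_d$), we have

$$\mathbf{g}(R_1) = \mathbf{U}_d^{-1}\sin(\mathbf{U}_d\Delta R)\,\mathbf{g}'(R_0) + \cos(\mathbf{U}_d\Delta R)\,\mathbf{g}(R_0) \tag{A3.11.146}$$

$$\mathbf{g}'(R_1) = \cos(\mathbf{U}_d\Delta R)\,\mathbf{g}'(R_0) - \mathbf{U}_d\sin(\mathbf{U}_d\Delta R)\,\mathbf{g}(R_0). \tag{A3.11.147}$$

Now rearrange these to:

$$\mathbf{g}'(R_0) = \mathbf{U}_d\left(\sin\mathbf{U}_d\Delta R\right)^{-1}\mathbf{g}(R_1) - \mathbf{U}_d\cot(\mathbf{U}_d\Delta R)\,\mathbf{g}(R_0) \tag{A3.11.148}$$

$$\mathbf{g}'(R_1) = \mathbf{U}_d\cot(\mathbf{U}_d\Delta R)\,\mathbf{g}(R_1) - \mathbf{U}_d\left(\sin\mathbf{U}_d\Delta R\right)^{-1}\mathbf{g}(R_0) \tag{A3.11.149}$$

which can be written:

$$\begin{pmatrix} \mathbf{g}'(R_0) \\ \mathbf{g}'(R_1) \end{pmatrix} = \begin{pmatrix} \mathbf{U}_d\cot\mathbf{U}_d\Delta R & \mathbf{U}_d\left(\sin\mathbf{U}_d\Delta R\right)^{-1} \\ \mathbf{U}_d\left(\sin\mathbf{U}_d\Delta R\right)^{-1} & \mathbf{U}_d\cot\mathbf{U}_d\Delta R \end{pmatrix} \begin{pmatrix} -\mathbf{g}(R_0) \\ \mathbf{g}(R_1) \end{pmatrix}. \tag{A3.11.150}$$

Note that $|\mathbf{U}_d\Delta R| < \pi$ is required for meaningful results and thus $\Delta R$ cannot be too large. By comparing
equation (A3.11.142) and equation (A3.11.150), we find:

$$\mathcal{Y}_1 = \mathcal{Y}_4 = \mathbf{U}_d\cot\mathbf{U}_d\Delta R \tag{A3.11.151}$$

$$\mathcal{Y}_2 = \mathcal{Y}_3 = \mathbf{U}_d\left(\sin\mathbf{U}_d\Delta R\right)^{-1}. \tag{A3.11.152}$$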

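The standard log-derivative propagator now corrects for the difference between $\mathbf{U}$ and $\mathbf{U}_d$ using a Simpson-rule
integration. The specific formulas are

$$\mathcal{Y}_1 \to \mathcal{Y}_1 + \mathbf{Q}(R_0) \tag{A3.11.153}$$

$$\mathcal{Y}_2 \to \mathcal{Y}_2 \tag{A3.11.154}$$

$$\mathcal{Y}_3 \to \mathcal{Y}_3 \tag{A3.11.155}$$

$$\mathcal{Y}_4 \to \mathcal{Y}_4 + \mathbf{Q}(R_1). \tag{A3.11.156}$$

Then for a step divided into two half-steps, at $R = a, c, b$, write $c = \tfrac{1}{2}(a+b)$, $\Delta R = (b-a)/2$, $\Delta\mathbf{U} = \mathbf{U} - \mathbf{U}_d$,
$a = R_0$, $b = R_1$. This leads to the following expressions for $\mathbf{Q}$:

$$\mathbf{Q}(a) = \frac{\Delta R}{3}\,\Delta\mathbf{U}(a) \tag{A3.11.157}$$

$$\mathbf{Q}(c) = \frac{1}{2}\left[\mathbf{I} - \frac{\Delta R^2}{6}\,\Delta\mathbf{U}(c)\right]^{-1}\frac{4\Delta R}{3}\,\Delta\mathbf{U}(c) \tag{A3.11.158}$$

$$\mathbf{Q}(b) = \frac{\Delta R}{3}\,\Delta\mathbf{U}(b). \tag{A3.11.159}$$

Propagation then proceeds from $R \to 0$ to large $R$, and the scattering matrix is then easily connected to $\mathbf{Y}$ at
large $R$.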
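
The recursion (A3.11.145) with the reference blocks (A3.11.151) and (A3.11.152) can be sketched for one channel with a constant wavenumber $w$, so that the Simpson-rule correction $\mathbf{Q}$ vanishes and all matrices become scalars. The names `reference_blocks` and `propagate_log_derivative` are hypothetical, and the example assumes $w\Delta R < \pi$.

```python
import math

def reference_blocks(w, dR):
    """Scalar versions of (A3.11.151)-(A3.11.152) for a constant wavenumber w."""
    y1 = w / math.tan(w * dR)   # Y1 = Y4 = w cot(w dR)
    y2 = w / math.sin(w * dR)   # Y2 = Y3 = w / sin(w dR)
    return y1, y2, y2, y1

def propagate_log_derivative(Y, w, dR, nsteps):
    """Apply Y(R'') = Y4 - Y3 [Y(R') + Y1]^-1 Y2 repeatedly (scalar case)."""
    for _ in range(nsteps):
        y1, y2, y3, y4 = reference_blocks(w, dR)
        Y = y4 - y3 * y2 / (Y + y1)
    return Y

# For g(R) = sin(wR) the exact log derivative is Y(R) = w cot(wR).
w = 2.0
Y_start = w / math.tan(w * 0.3)
Y_end = propagate_log_derivative(Y_start, w, 0.1, 13)   # R runs from 0.3 to 1.6
```

In this example the node of $g$ at $R = \pi/2w \approx 0.785$ and the zero of $g'$ near $R = \pi/4$... more precisely, the node of $\sin(wR)$ at $R = \pi/w \approx 1.571$ lies inside the propagation range, yet the recursion steps across it without difficulty because it never integrates $Y$ through its singularity.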

(C) VARIATIONAL CALCULATIONS 

Now let us return to the Kohn variational theory that was introduced in section A3.11.2.8. Here we
demonstrate how equation (A3.11.46) may be evaluated using basis set expansions and linear algebra. This
discussion will be restricted to scattering in one dimension, but generalization to multidimensional problems
is very similar.

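To construct $\Psi$, we use the basis expansion

$$\Psi = -u_0(r) + \sum_{t=1}^{N} u_t(r)\,C_t \tag{A3.11.160}$$

where $u_0(r)$ is a special basis function which asymptotically looks like $u_0(r) \sim v^{-1/2}e^{-ikr}$, and $u_1 = u_0^*$ is the
outgoing wave part, multiplied by a coefficient $C_1$ which is $\tilde{S}$. Typically the complete form of $u_0$ is chosen to
be

$$u_0 = v^{-1/2}f(r)\,e^{-ikr} \tag{A3.11.161}$$

where $f(r)$ is a function which is unity at large $r$ and vanishes at small $r$. The functions $u_2, u_3, \ldots$ are taken to
be square integrable and the coefficients $C_1, \ldots, C_N$ are to be variationally modified. Now substitute $\Psi$ into the
expression for $S$. This gives

$$S = \tilde{S} + \frac{i}{\hbar}\Big\langle -u_0 + \sum_t u_t C_t \Big| H - E \Big| -u_0 + \sum_{t'} u_{t'} C_{t'} \Big\rangle$$

$$= \tilde{S} + \frac{i}{\hbar}\langle u_0|H-E|u_0\rangle - \frac{i}{\hbar}\sum_t \langle u_0|H-E|u_t\rangle C_t - \frac{i}{\hbar}\sum_t C_t\langle u_t|H-E|u_0\rangle + \frac{i}{\hbar}\sum_{t,t'} C_t\langle u_t|H-E|u_{t'}\rangle C_{t'}. \tag{A3.11.162}$$

Let us define a matrix $\mathbf{M}$ via

$$M_{0,0} = \langle u_0|H-E|u_0\rangle \tag{A3.11.163}$$

$$(\mathbf{M}_0)_t = \langle u_t|H-E|u_0\rangle \tag{A3.11.164}$$

$$(\mathbf{M})_{t,t'} = \langle u_t|H-E|u_{t'}\rangle. \tag{A3.11.165}$$

Also, employ integration by parts to convert $\langle u_0|H-E|u_t\rangle$ into $\langle u_t|H-E|u_0\rangle$ (plus $\tilde{S} = C_1$), yielding

$$\frac{i}{\hbar}\sum_t \langle u_0|H-E|u_t\rangle C_t = \frac{i}{\hbar}\sum_t \langle u_t|H-E|u_0\rangle C_t - \frac{i}{\hbar}(i\hbar)\tilde{S}. \tag{A3.11.166}$$

This replaces (A3.11.162) with

$$S = \frac{i}{\hbar}\Big(M_{0,0} - 2\sum_t C_t (\mathbf{M}_0)_t + \sum_{t,t'} C_t\, M_{t,t'}\, C_{t'}\Big). \tag{A3.11.167}$$

Now apply the variational criterion as follows:

$$\frac{\partial S}{\partial C_t} = 0 = \frac{i}{\hbar}\Big(-2(\mathbf{M}_0)_t + 2\sum_{t'} M_{t,t'}\, C_{t'}\Big). \tag{A3.11.168}$$

This leads to:

$$\mathbf{M}\mathbf{C} = \mathbf{M}_0 \tag{A3.11.169}$$

and thus:

$$\mathbf{C} = \mathbf{M}^{-1}\mathbf{M}_0 \tag{A3.11.170}$$

and the S matrix is given by:

$$S = \frac{i}{\hbar}\left(M_{0,0} - 2\mathbf{M}_0^{T}\mathbf{M}^{-1}\mathbf{M}_0 + \mathbf{M}_0^{T}\mathbf{M}^{-1}\mathbf{M}\mathbf{M}^{-1}\mathbf{M}_0\right) = \frac{i}{\hbar}\left(M_{0,0} - \mathbf{M}_0^{T}\mathbf{M}^{-1}\mathbf{M}_0\right). \tag{A3.11.171}$$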
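
The linear algebra of the final step can be sketched as follows for two variational basis functions. The matrix elements below are made-up illustrative numbers rather than integrals from a real Hamiltonian, $\mathbf{M}$ is assumed complex symmetric, and the helper names (`solve2`, `kohn_functional`, `kohn_s`) are hypothetical.

```python
def solve2(M, b):
    """Solve a 2x2 linear system M x = b by Cramer's rule."""
    (m00, m01), (m10, m11) = M
    det = m00 * m11 - m01 * m10
    return [(m11 * b[0] - m01 * b[1]) / det, (m00 * b[1] - m10 * b[0]) / det]

def kohn_functional(M00, M0, M, C, hbar=1.0):
    """S as a function of trial coefficients: (i/hbar)(M00 - 2 C.M0 + C^T M C)."""
    n = len(C)
    lin = sum(C[t] * M0[t] for t in range(n))
    quad = sum(C[t] * M[t][s] * C[s] for t in range(n) for s in range(n))
    return (1j / hbar) * (M00 - 2 * lin + quad)

def kohn_s(M00, M0, M, hbar=1.0):
    """Stationary value: C = M^-1 M0 and S = (i/hbar)(M00 - M0^T M^-1 M0)."""
    C = solve2(M, M0)
    S = (1j / hbar) * (M00 - sum(M0[t] * C[t] for t in range(len(C))))
    return S, C

# Illustrative (not physical) matrix elements.
M00 = 0.3 + 0.1j
M0 = [0.5 - 0.2j, 0.1 + 0.4j]
M = [[1.0 + 0.5j, 0.2 - 0.1j],
     [0.2 - 0.1j, 0.8 + 0.3j]]
S, C = kohn_s(M00, M0, M)
```

Evaluating `kohn_functional` at the returned `C` reproduces `S`, and a small displacement of `C` changes the functional only at second order, which is the stationarity expressed by the variational criterion.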

This converts the calculation of S to the evaluation of matrix elements together with linear algebra operations. 
Generalizations of this theory to multichannel calculations exist and lead to a result of more or less the same 
form.


A3.11.5 CUMULATIVE REACTION PROBABILITIES

A special feature of quantum scattering theory as it applies to chemical reactions is that in many applications
it is only the cumulative reaction probability (CRP) that is of interest in determining physically measurable
properties such as the reactive rate constant. This probability $P_{\rm cum}$ (also denoted $N(E)$) is obtained from the S
matrix through the formula:

$$P_{\rm cum} = \sum_i \sum_f |S_{if}|^2. \tag{A3.11.172}$$

Note that the sums are restricted to the portion of the full S matrix that describes reaction (or the specific
reactive process that is of interest). It is clear from this definition that the CRP is a highly averaged property
where there is no information about individual quantum states, so it is of interest to develop methods that
determine this probability directly from the Schrödinger equation rather than indirectly from the scattering
matrix. In this section we first show how the CRP is related to the physically measurable rate constant, and
then we discuss some rigorous and approximate methods for directly determining the CRP. Much of this
discussion is adapted from Miller and coworkers [44, 45].
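
As a minimal sketch of (A3.11.172), the sum over the reactive block of an S matrix can be written in a few lines. The two-channel S matrix below is an illustrative unitary example, not the result of a scattering calculation, and the function name is hypothetical; the matrix is indexed as `S[i][f]`.

```python
import math

def cumulative_reaction_probability(S, reactant_channels, product_channels):
    """Sum |S_if|^2 over initial reactant channels i and final product channels f."""
    return sum(abs(S[i][f]) ** 2
               for i in reactant_channels
               for f in product_channels)

# Illustrative unitary S matrix: channel 0 is a reactant state, channel 1 a product state.
phi = 0.4
S = [[math.cos(phi), 1j * math.sin(phi)],
     [1j * math.sin(phi), math.cos(phi)]]
p_cum = cumulative_reaction_probability(S, [0], [1])   # equals sin(phi)**2 here
```

Only the reactant-to-product block enters the sum, so elastic and inelastic elements within the reactant arrangement do not contribute.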

A3.11.5.1 RATE CONSTANTS 

Consider first a gas phase bimolecular reaction (A + B $\to$ C + D). If we consider that the reagents are
approaching each other with a relative velocity $v$, then the total flux of A moving toward B is just $vC_A$ where
$C_A$ is the concentration of A (number of A per unit volume (or per unit length in one dimension)). If $\sigma$ is the
integral cross section for reaction between A and B for a given velocity $v$ ($\sigma$ is the reaction probability in one
dimension), then for every B, the number of reactive collisions per unit time is $\sigma vC_A$. The total number of
reactive collisions per unit time per unit volume (or per unit length in one dimension) is then $\sigma vC_AC_B$ where
$C_B$ is the concentration of B. Equating this to the rate constant $k$ times $C_AC_B$ leads us to the conclusion that

$$k = \sigma v. \tag{A3.11.173}$$

This rate constant refers to reactants which all move with a velocity $v$, whereas the usual situation is such that
we have a Boltzmann distribution of velocities. If so, then the rate constant is just the average of (A3.11.173)
over a Boltzmann distribution $P_B$:

$$k(T) = \int_0^\infty P_B(v)\,v\,\sigma(v)\,dv. \tag{A3.11.174}$$
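
A numerical version of the average (A3.11.174) can be sketched as follows. It assumes the three-dimensional Maxwell-Boltzmann speed distribution for $P_B$ and a velocity-independent hard-sphere cross section $\sigma = \pi a^2$, for which the exact answer is $\pi a^2\sqrt{8kT/\pi\mu}$. The function names and the reduced units ($\mu = kT = 1$) are illustrative choices.

```python
import math

def maxwell_speed_pdf(v, mu, kT):
    """Three-dimensional Maxwell-Boltzmann distribution of relative speeds."""
    return (4.0 * math.pi * (mu / (2.0 * math.pi * kT)) ** 1.5
            * v * v * math.exp(-mu * v * v / (2.0 * kT)))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0

def thermal_rate(sigma_of_v, mu, kT):
    """k(T) = integral of P_B(v) v sigma(v) dv, truncated where P_B is negligible."""
    v_max = 12.0 * math.sqrt(kT / mu)
    return simpson(lambda v: maxwell_speed_pdf(v, mu, kT) * v * sigma_of_v(v), 0.0, v_max)

a = 1.5                                   # hard-sphere radius (arbitrary units)
k_numeric = thermal_rate(lambda v: math.pi * a * a, mu=1.0, kT=1.0)
k_exact = math.pi * a * a * math.sqrt(8.0 / math.pi)
```

Any velocity-dependent cross section can be passed in place of the constant one; only the truncation point `v_max` may need adjusting if $\sigma(v)$ grows rapidly with $v$.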


This expression is still oversimplified, as it ignores the fact that the molecules A and B have internal states
and that the cross section $\sigma$ depends on these states; $\sigma$ depends also on the internal states of the products C
and D. Letting the indices $i$ and $f$ denote the internal states of the reagents and products respectively, we find
that $\sigma$ in equation (A3.11.174) must be replaced by $\sum_f \sigma_{if}$ and the Boltzmann average must now include the
internal states. Thus, equation (A3.11.174) becomes:

$$k(T) = \sum_i p_B(i)\int_0^\infty P_B(v_i)\,v_i \sum_f \sigma_{if}\,dv_i \tag{A3.11.175}$$

where $p_B(i)$ is the internal state Boltzmann distribution.

Now let us write down explicit expressions for $p_B(i)$, $P_B(v_i)$ and $\sigma_{if}$. Denoting the internal energy for a given
state $i$ as $\epsilon_i$ and the relative translational energy as $E_i = \tfrac{1}{2}\mu v_i^2$, we have (in three dimensions)

$$p_B(i) = e^{-\epsilon_i/kT}/Q_{\rm int} \tag{A3.11.176}$$

and

$$P_B(v_i) = 4\pi\left(\mu/2\pi kT\right)^{3/2} v_i^2 \exp\left(-\mu v_i^2/2kT\right) \tag{A3.11.177}$$

where $Q_{\rm int}$ is the internal state partition function.

The cross section $\sigma_{if}$ is related to the partial wave reactive scattering matrix $S^J_{if}$ through the partial wave sum
(i.e., equation (A3.11.117) evaluated for $n_An_B \neq n_A'n_B'$):

$$\sigma_{if} = \frac{\pi}{k_i^2}\sum_J (2J+1)\,|S^J_{if}|^2 \tag{A3.11.178}$$

where $k_i = \mu v_i/\hbar$. Now substitute equation (A3.11.176), equation (A3.11.177) and equation (A3.11.178) into
(A3.11.175). Replacing the integral over $v_i$ by one over $E_i$ leads us to the expression

$$k(T) = \frac{1}{hQ_{\rm trans}Q_{\rm int}}\sum_i \int_0^\infty e^{-(E_i+\epsilon_i)/kT}\sum_J (2J+1)\sum_f |S^J_{if}|^2\,dE_i. \tag{A3.11.179}$$

If we now change the integration variable from $E_i$ to the total energy $E = E_i + \epsilon_i$, we can rewrite equation
(A3.11.179) as

$$k(T) = \frac{kT}{hQ_{\rm trans}Q_{\rm int}}\int_0^\infty e^{-E/kT}\,P_{\rm cum}(E)\,dE/kT \tag{A3.11.180}$$

where $Q_{\rm trans}$ is the translational partition function per unit volume:

$$Q_{\rm trans} = \left(\frac{2\pi\mu kT}{h^2}\right)^{3/2} \tag{A3.11.181}$$

and $P_{\rm cum}$ is the cumulative reaction probability that we wrote down in equation (A3.11.172), but generalized
to include a sum over the conserved total angular momentum $J$ weighted by the usual $2J+1$ degeneracy:

$$P_{\rm cum}(E) = \sum_J (2J+1)\sum_i\sum_f |S^J_{if}|^2. \tag{A3.11.182}$$

Note that in deriving equation (A3.11.180), we have altered the lower integration limit in equation
(A3.11.179) from zero to $-\epsilon_i$ by defining $S^J_{if}$ to be zero for $E_i < 0$.

In one physical dimension, equation (A3.11.180) still holds, but $Q_{\rm trans}$ is given by its one dimensional
counterpart and (A3.11.172) is used for the CRP.

A3.11.5.2 TRANSITION STATE THEORY 

The form of equation (A3.11.182) is immediately suggestive of statistical approximations. If we assume that
the total reaction probability $\sum_f |S^J_{if}|^2$ is zero for $E < E^\ddagger$ and unity for $E \geq E^\ddagger$, where $E^\ddagger$ is the energy of a
critical bottleneck (commonly known as the transition state), then

$$P_{\rm cum}(E) = \sum_J (2J+1)\sum_i h(E - E_i^\ddagger) \tag{A3.11.183}$$

where $h$ is a Heaviside (step) function which is unity for positive arguments and zero for negative arguments,
and we have added the subscript $i$ to $E^\ddagger$ since the bottleneck energies will in general be dependent on internal
state.

Equation (A3.11.183) is simply a formula for the number of states energetically accessible at the transition
state and equation (A3.11.180) leads to the thermal average of this number. If we imagine that the states of the
system form a continuum, then $P_{\rm cum}(E)$ can be expressed in terms of a density of states $\rho^\ddagger$ as in

$$P_{\rm cum}(E) = \int_0^E \rho^\ddagger(\epsilon)\,d\epsilon. \tag{A3.11.184}$$

Substituting this into the integral in equation (A3.11.180) and inverting the order of integration, one obtains

$$\int_0^\infty e^{-E/kT}\left(\int_0^E \rho^\ddagger(\epsilon)\,d\epsilon\right)dE/kT = \int_0^\infty \rho^\ddagger(\epsilon)\left(\int_\epsilon^\infty e^{-E/kT}\,dE/kT\right)d\epsilon. \tag{A3.11.185}$$

The inner integral on the right-hand side is just $e^{-\epsilon/kT}$, so equation (A3.11.185) reduces to the transition state
partition function (leaving out relative translation):

$$Q^\ddagger = \int_0^\infty \rho^\ddagger(\epsilon)\,e^{-\epsilon/kT}\,d\epsilon. \tag{A3.11.186}$$

Using this in equation (A3.11.180) gives the following

$$k(T) = \frac{kT}{h}\,\frac{Q^\ddagger}{Q_{\rm trans}Q_{\rm int}}. \tag{A3.11.187}$$

This is commonly known as the transition state theory approximation to the rate constant. Note that all one
needs to do to evaluate (A3.11.187) is to determine the partition function of the reagents and transition state,
which is a problem in statistical mechanics rather than dynamics. This makes transition state theory a very
useful approach for many applications. However, what is left out are two potentially important effects,
tunnelling and barrier recrossing, both of which lead to CRPs that differ from the sum of step functions
assumed in (A3.11.183).
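
The reduction of the thermal average to a transition state partition function can be checked numerically for a discrete set of bottleneck energies. The sketch below assumes a step-function CRP built from three made-up threshold energies (in units where $kT = 1$) and compares the Boltzmann-weighted integral appearing in (A3.11.180) with the sum $\sum_i e^{-E_i^\ddagger/kT}$; the function names are hypothetical and the $2J+1$ weighting is omitted for brevity.

```python
import math

def step_crp(E, thresholds):
    """Number of bottleneck states that are open at total energy E."""
    return sum(1 for E_t in thresholds if E >= E_t)

def thermal_average(crp, kT, E_max, n=40000):
    """Midpoint-rule estimate of the integral of exp(-E/kT) * crp(E) dE/kT."""
    h = E_max / n
    total = 0.0
    for j in range(n):
        E = (j + 0.5) * h
        total += math.exp(-E / kT) * crp(E)
    return total * h / kT

def transition_state_sum(thresholds, kT):
    """Discrete analogue of the transition state partition function."""
    return sum(math.exp(-E_t / kT) for E_t in thresholds)

thresholds = [1.0, 1.5, 2.25]             # illustrative bottleneck energies
numeric = thermal_average(lambda E: step_crp(E, thresholds), kT=1.0, E_max=40.0)
analytic = transition_state_sum(thresholds, kT=1.0)
```

Replacing `step_crp` by a smoother function of energy (for example one that rises gradually through each threshold) is a simple way to see how tunnelling or recrossing would change the thermal average relative to the transition state value.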

A3.11.5.3 EXACT QUANTUM EXPRESSIONS FOR THE CUMULATIVE REACTION PROBABILITY 

An important development in the quantum theory of scattering in the last 20 years has been the development 
of exact expressions which directly determine either P cum (E) or the thermal rate constant k(T) from the 
Hamiltonian H. Formally, at least, these expressions avoid the determination of scattering wavefunctions and 
any information related to the internal states of the reagents or products. The fundamental derivations in this 
area have been presented by Miller [44] and by Schwartz et al [45]. 

The basic expression for $P_{\rm cum}(E)$ is

$$P_{\rm cum}(E) = \tfrac{1}{2}(2\pi\hbar)^2\,{\rm Tr}\left[F\,\delta(E-H)\,F\,\delta(E-H)\right] \tag{A3.11.188}$$


where $F$ is the symmetrized flux operator (equation (A3.11.189)).

Note that equation (A3.11.188) includes a quantum mechanical trace, which implies a sum over states. The
states used for this evaluation are arbitrary as long as they form a complete set and many choices have been
considered in recent work. Much of this work has been based on wavepackets [46] or grid point basis
functions [47].

An exact expression for the thermal rate constant is given by:

$$k = Q^{-1}\int_0^\infty dt\,C_f(t) \tag{A3.11.190}$$

where $C_f(t)$ is a flux-flux correlation function

$$C_f(t) = {\rm Tr}\left[F\,e^{iHt_c^*/\hbar}\,F\,e^{-iHt_c/\hbar}\right]. \tag{A3.11.191}$$

Here $t_c$ is a complex time which is given by $t_c = t - i\hbar/2kT$. Methods for evaluating this equation have
included path integrals [45], wavepackets [48, 49] and direct evaluation of the trace in square integrable basis
sets [50].


A3.11.6 CLASSICAL AND SEMICLASSICAL SCATTERING THEORY 

Although the primary focus of this article is on quantum scattering theory, it is important to note that classical 
and semiclassical approximations play an important role in the application of scattering theory to problems in 
chemical physics. The primary reason for this is that the de Broglie wavelength associated with motions of 
atoms and molecules is typically short compared to the distances over which these atoms and molecules move 
during a scattering process. There are exceptions to this of course, in the limits of low temperature and energy, 
and for light atoms such as hydrogen atoms, but for a very broad sampling of problems the dynamics is close 
to the classical limit. 

A3.11.6.1 CLASSICAL SCATTERING THEORY FOR A SINGLE PARTICLE 

Consider collisions between two molecules A and B. For the moment, ignore the structure of the molecules,
so that each is represented as a particle. After separating out the centre of mass motion, the classical
Hamiltonian that describes this problem is

$$H = \tfrac{1}{2}\mu\dot{\mathbf{r}}^2 + V(r) \tag{A3.11.192}$$

where the reduced mass is $\mu = m_Am_B/(m_A + m_B)$ and the potential $V$ only depends on the distance $r$ between
the particles. Because of the spherical symmetry of the potential, motion of the system is confined to a plane.
It is convenient to use polar coordinates $r$, $\theta$, $\varphi$ and to choose the plane of motion such that $\varphi = 0$. In this case
the orbital angular momentum is:

$$|\mathbf{L}| = |\mathbf{r}\times\mathbf{p}| = \mu r^2\dot{\theta}. \tag{A3.11.193}$$

Since angular momentum is conserved, equation (A3.11.192) may be rearranged to give the following implicit
equation for the time dependence of $r$:

$$\int_{r_1}^{r_2}\frac{dr}{\sqrt{(2/\mu)\left[E - V(r) - L^2/2\mu r^2\right]}} = t_2 - t_1. \tag{A3.11.194}$$

The time dependence of $\theta$ can then be obtained by integrating (A3.11.193).

The physical situation of interest in a scattering problem is pictured in figure A3.11.3. We assume that the
initial particle velocity $v$ is coincident with the $z$ axis and that the particle starts at $z = -\infty$, with $x = b$ = impact
parameter, and $y = 0$. In this case, $L = \mu vb$. Subsequently, the particle moves in the $x$, $z$ plane in a trajectory
that might be as pictured in figure A3.11.4 (here shown for a hard sphere potential). There is a point of closest
approach, i.e., $r = r_c$ (inner turning point for $r$ motions) where

$$E = \frac{L^2}{2\mu r_c^2} + V(r_c). \tag{A3.11.195}$$

If we define $t = 0$ at $r = r_c$, then the explicit trajectory motion is determined by

$$t = \int_{r_c}^{r}\frac{dr'}{\sqrt{(2/\mu)\left[E - L^2/2\mu r'^2 - V(r')\right]}} \tag{A3.11.196}$$

$$\theta(t) = \theta(0) - \int_0^t \frac{L}{\mu r(t')^2}\,dt'. \tag{A3.11.197}$$

The final scattering angle $\theta$ is defined using $\theta = \theta(t = \infty)$. There will be a correspondence between $b$ and $\theta$ that
will tend to look like what is shown in figure A3.11.5 for a repulsive potential (here given for the special case
of a hard sphere potential).


Figure A3.11.3. Coordinates for scattering of a particle from a central potential.


Figure A3.11.4. Trajectory associated with a particle scattering off a hard sphere potential.


Figure A3.11.5. Typical dependence of $b$ on $\theta$ (shown for a hard sphere potential).

In an ensemble of collisions, the impact parameters are distributed randomly on a disc with a probability
distribution $P(b)$ that is defined by $P(b)\,db = 2\pi b\,db$. The cross section $d\sigma$ is then defined by

$$d\sigma = 2\pi b\,db. \tag{A3.11.198}$$

Now $d\sigma = (d\sigma/d\Omega)\,d\Omega$ or $[d\sigma/2\pi\,d(\cos\theta)]\,2\pi\,d\cos\theta = [d\sigma/2\pi\,d(\cos\theta)]\,2\pi\sin\theta\,d\theta = I(\theta)\,2\pi\sin\theta\,d\theta$ where $I(\theta)$ is
the differential cross section. Therefore

$$I(\theta) = \left|\frac{b\,db}{\sin\theta\,d\theta}\right| = \frac{b}{\sin\theta}\left|\frac{db}{d\theta}\right| \tag{A3.11.199}$$

where the absolute value takes care of the case when $db/d\theta < 0$. The integral cross section is

$$\sigma = 2\pi\int_0^\pi I\sin\theta\,d\theta = 2\pi\int_0^{b_{\max}} b\,db \tag{A3.11.200}$$

where $b_{\max}$ is the value of the impact parameter that is associated with a scattering angle of 0 (i.e., scattering
in the forward direction). Note that for a potential with infinite range (one that does not go to zero until $r = \infty$),
the cross section predicted by (A3.11.200) is infinite. Except for a Coulomb ($1/r$) potential, this divergence is
a classical artifact; the corresponding quantum mechanical result is finite.

A simple example of a finite range potential is the hard sphere, for which $V(r) = 0$ for $r > a$, $V(r) = \infty$ for $r < a$.
By geometry one can show that $2\varphi + \theta = \pi$ and $\sin\varphi = b/a$. Therefore

$$b = a\sin\left(\frac{\pi}{2} - \frac{\theta}{2}\right) = a\cos\theta/2 \tag{A3.11.201}$$

$$\frac{db}{d\theta} = -\frac{a}{2}\sin\theta/2 \tag{A3.11.202}$$

$$I(\theta) = \frac{a\cos(\theta/2)\,(a/2)\sin(\theta/2)}{\sin\theta} = \frac{a^2}{4} \tag{A3.11.203}$$

and

$$\sigma = \frac{a^2}{4}\int d\Omega = \pi a^2. \tag{A3.11.204}$$

This shows that the differential cross section is independent of angle for this case, and the integral cross
section is, just as expected, the area of the circle associated with the radius of the sphere. More generally it is
important to note that there can be many trajectories which give the same $\theta$ for different $b$. In this case the
DCS is just the sum over trajectories:

$$I(\theta) = \sum_i \frac{b_i}{\sin\theta}\left|\frac{db}{d\theta}\right|_i. \tag{A3.11.205}$$
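
The hard sphere result can be verified numerically by evaluating $I(\theta) = (b/\sin\theta)\,|db/d\theta|$ with a finite-difference derivative of $b(\theta) = a\cos(\theta/2)$. This is only a sanity check of (A3.11.201) to (A3.11.203); the function names and the step size `d` are illustrative choices.

```python
import math

def hard_sphere_b(theta, a):
    """Impact parameter that scatters into angle theta for a hard sphere of radius a."""
    return a * math.cos(theta / 2.0)

def differential_cross_section(theta, a, d=1e-6):
    """I(theta) = (b / sin(theta)) |db/dtheta| with a central-difference derivative."""
    db_dtheta = (hard_sphere_b(theta + d, a) - hard_sphere_b(theta - d, a)) / (2.0 * d)
    return hard_sphere_b(theta, a) / math.sin(theta) * abs(db_dtheta)

a = 2.0
values = [differential_cross_section(theta, a) for theta in (0.3, 1.0, 2.5)]
# Each entry should be close to a**2 / 4, independent of angle.
```

The same function works for any single-valued $b(\theta)$; when several impact parameters lead to the same angle, the contributions of the separate branches are added as described above.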

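An explicit result for the differential cross section (DCS) can be obtained by substituting $L = pb = \mu vb$ into the
following expression:

$$\frac{d\theta}{dr} = \frac{\dot{\theta}}{\dot{r}} = \frac{L/\mu r^2}{\sqrt{(2/\mu)\left(E - V - L^2/2\mu r^2\right)}} = \frac{b/r^2}{\sqrt{1 - V/E - (2\mu E)b^2/(2\mu Er^2)}} = \frac{b/r^2}{\sqrt{1 - V/E - b^2/r^2}}. \tag{A3.11.206}$$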
To integrate this expression, we note that $\theta$ starts at $\pi$ when $r = \infty$, then it decreases while $r$ decreases to its
turning point, then $r$ retraces back to $\infty$ while $\theta$ continues to evolve to its final value. The total change in $\theta$ is then
twice the integral from the turning point $r_c$ to infinity:

$$\theta = \pi - 2b\int_{r_c}^\infty \frac{dr}{r^2}\left(1 - \frac{V}{E} - \frac{b^2}{r^2}\right)^{-1/2}. \tag{A3.11.207}$$

Note that $\theta$ obtained this way can be negative. Because of cylindrical symmetry, only $|\theta|$ (or $\theta$ mod $\pi$) means
anything.
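
As an illustration of (A3.11.207), the integral can be evaluated numerically for the hard sphere, where $V = 0$ outside the sphere and the turning point is $r_c = a$ for $b < a$. The sketch assumes the substitution $u = 1/r$, which turns the integral into $\int_0^{1/a} du\,(1 - b^2u^2)^{-1/2}$ and removes the infinite upper limit; the function name is hypothetical. The result should agree with $\theta = 2\arccos(b/a)$, the inverse of (A3.11.201).

```python
import math

def deflection_hard_sphere(b, a, n=2000):
    """Scattering angle from (A3.11.207) for a hard sphere of radius a (V = 0 for r > a)."""
    if b >= a:
        return 0.0                        # the trajectory misses the sphere
    def integrand(u):                     # u = 1/r, so dr / r**2 = -du
        return 1.0 / math.sqrt(1.0 - (b * u) ** 2)
    h = (1.0 / a) / n
    total = integrand(0.0) + integrand(1.0 / a)
    for j in range(1, n):                 # composite Simpson rule, n even
        total += (4 if j % 2 else 2) * integrand(j * h)
    return math.pi - 2.0 * b * (total * h / 3.0)

theta = deflection_hard_sphere(0.5, 1.0)  # compare with 2 * acos(0.5)
```

For a smooth potential the integrand is singular at the turning point, so a production calculation would need a quadrature suited to an inverse square-root endpoint singularity; the hard sphere avoids this because the wall, not the centrifugal barrier, sets the turning point.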

For a typical interatomic potential such as a 6-12 potential, $\theta(b)$ looks like figure A3.11.6 rather than figure A3.11.5.
This shows that for some $\theta$ there are three $b$ (one for positive $\theta$ and two for negative $\theta$) that contribute to the
DCS. The angles $\theta$ where the number of contributing trajectories changes value are sometimes called rainbow
angles. At these angles, the classical differential cross sections have singularities.


Figure A3.11.6. Dependence of scattering angle $\theta$ on impact parameter for a 6-12 potential.


A3.11.6.2 CLASSICAL SCATTERING THEORY FOR MANY INTERACTING PARTICLES 


To generalize what we have just done to reactive and inelastic scattering, one needs to calculate numerically
integrated trajectories for motions in many degrees of freedom. This is most convenient to develop in
space-fixed Cartesian coordinates. In this case, the classical equations of motion (Hamilton's equations) are given
by:

$$\dot{x}_i = \frac{\partial H}{\partial p_i} = \frac{p_i}{m_i} \tag{A3.11.208}$$

$$\dot{p}_i = -\frac{\partial H}{\partial x_i} = -\frac{\partial V}{\partial x_i} \tag{A3.11.209}$$

where $m_i$ is the mass associated with the $i$th degree of freedom and the second equality applies to Cartesian
coordinates. Methods for solving these equations of motion have been described in review articles, as have 
procedures for defining initial conditions [27]. Note that for most multidimensional problems it is necessary to 
average over initial conditions that represent the internal motions of the species undergoing collision. These 
averages are often determined by Monte Carlo integration (i.e., randomly sampling the coordinates that need 
to be averaged over). The initial conditions may be chosen from canonical or microcanonical ensembles, or 
they may be chosen to mimic an initially prepared quantum state. In the latter case, the trajectory calculation 
is called a 'quasiclassical' trajectory calculation. 
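
A minimal sketch of integrating Hamilton's equations (A3.11.208) and (A3.11.209) is given below. It uses the velocity Verlet scheme, which is one common choice and not necessarily the method used in the cited reviews; the function name, the harmonic test force and the step size are illustrative assumptions.

```python
import math

def velocity_verlet(x, p, masses, force, dt, nsteps):
    """Integrate dx_i/dt = p_i/m_i and dp_i/dt = F_i(x) with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(nsteps):
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]          # half kick
        x = [xi + dt * pi / mi for xi, pi, mi in zip(x, p, masses)]  # drift
        f = force(x)
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]          # half kick
    return x, p

# Test problem: one-dimensional harmonic oscillator with m = k = 1, so F = -x.
def harmonic_force(x):
    return [-xi for xi in x]

x_end, p_end = velocity_verlet([1.0], [0.0], [1.0], harmonic_force, dt=0.001, nsteps=1000)
energy = 0.5 * p_end[0] ** 2 + 0.5 * x_end[0] ** 2   # should stay close to 0.5
```

In a quasiclassical calculation this propagation would be repeated for many initial conditions sampled by Monte Carlo, with the final coordinates and momenta analysed to assign product states.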

A3.11.6.3 SEMICLASSICAL THEORY 

The obvious defect of classical trajectories is that they do not describe quantum effects. The best known of 
these effects is tunnelling through barriers, but there are others, such as effects due to quantization of the 
reagents and products and there are a variety of interference effects as well. To circumvent this deficiency, 
one can sometimes use semiclassical approximations such as WKB theory. WKB theory is specifically for 
motion of a particle in one dimension, but the generalizations of this theory to motion in three dimensions are 
known and will be mentioned at the end of this section. More complete descriptions of WKB theory can be 
found in many standard texts [1, 2, 3, 4 and 5, 18 ]. 

(A) WKB THEORY 

In WKB theory, one generates a wavefunction that is valid in the $\hbar \to 0$ limit using a linear combination of
exponentials of the form

$$\psi(x) = A(x)\,e^{iS(x)/\hbar} \tag{A3.11.210}$$

where $A(x)$ and $S(x)$ are real (or sometimes purely imaginary) functions that are derived from the Hamiltonian.
This expression is, of course, very familiar from scattering theory applications described above (section A3.11.2),
where $A(x)$ is a constant, and $S(x)$ is $\hbar kx$. More generally, by substituting (A3.11.210) into the time independent
Schrödinger equation in one dimension, and expanding $A(x)$ and $S(x)$ in powers of $\hbar$, one can show that the
leading terms representing $A(x)$ and $S(x)$ have the form:

$$A(x) = \left[E - V(x)\right]^{-1/4}, \qquad S(x) = \pm\int\sqrt{2m\left(E - V(x)\right)}\,dx. \tag{A3.11.211}$$

Note that the integrand in $S(x)$ is just the classical momentum $p(x)$, so $S(x)$ is the classical action function. In
addition, $A(x)$ is proportional to $p^{-1/2}$, which means that $|\psi|^2$ is proportional to the inverse of the classical
velocity of the particle. This is just the usual classical expression for the probability density. Note that $A(x)$ and
$S(x)$ are real as long as motion is classically allowed, meaning that $E > V(x)$. If $E < V(x)$, then $S(x)$ becomes
imaginary and $\psi(x)$ involves real rather than complex exponentials. At the point of transition between allowed
and forbidden regions, i.e., at the so-called turning points of the classical motion, $A(x)$ becomes infinite and the
solutions above are not valid. However, it is possible to 'connect' the solutions on either side of the turning point
using 'connection formulas' that are determined from exact solutions to the Schrödinger equation near the turning
point. The reader should consult the standard textbooks [1, 2, 3, 4 and 5, 18] for a detailed discussion of this.

In applications to scattering theory, one takes linear combinations of functions of the form (A3.11.210) to
satisfy the desired boundary conditions and one uses the connection formulas to determine wavefunctions that 
are valid for all values of x. By examining the asymptotic forms of the wavefunction, scattering information 
can be determined. For example, in applications to scattering from a central potential, one can solve the radial 
Schrodinger equation using WKB theory to determine the phase shift for elastic scattering. The explicit result 
depends on how many turning points there are in the radial motion. 

(B) SCATTERING THEORY FOR MANY DEGREES OF FREEDOM 

For multidimensional problems, the generalization of WKB theory to the description of scattering problems is
often called Miller-Marcus or classical S-matrix theory [51]. The reader is referred to review articles for a
more complete description of this theory [52].

Another theory which is used to describe scattering problems and which blends together classical and 
quantum mechanics is the semiclassical wavepacket approach [53]. The basic procedure comes from the fact 
that wavepackets which are initially Gaussian remain Gaussian as a function of time for potentials that are 
constant, linear or quadratic functions of the coordinates. In addition, the centres of such wavepackets evolve 
in time in accord with classical mechanics. We have already seen one example of this with the free particle 
wavepacket of equation (A3.11.7). Consider the general quadratic Hamiltonian (still in one dimension but the
generalization to many dimensions is straightforward)

$$H = \frac{p^2}{2m} + V_0 + V_x(x - x_t) + \tfrac{1}{2}V_{xx}(x - x_t)^2. \tag{A3.11.212}$$

The Gaussian wavepacket is written as

$$\psi(x,t) = \exp\left[(i/\hbar)\,\alpha_t(x - x_t)^2 + (i/\hbar)\,p_t(x - x_t) + (i/\hbar)\,\gamma_t\right]. \tag{A3.11.213}$$

Here $x_t$ and $p_t$ are real time dependent quantities that specify the average position and momentum of the
wavepacket ($p_t = \langle p\rangle$, $x_t = \langle x\rangle$) and $\alpha_t$ and $\gamma_t$ are complex functions which determine the width, phase and
normalization of the wavepacket.

Inserting equation (A3.11.213) into $H\psi = i\hbar\,\partial\psi/\partial t$, and using equation (A3.11.212), leads to the following
relation:

$$\left[-\dot{\alpha}_t(x - x_t)^2 + (2\alpha_t\dot{x}_t - \dot{p}_t)(x - x_t) - \dot{\gamma}_t + p_t\dot{x}_t\right]\psi = \left[\left((2/m)\alpha_t^2 + \tfrac{1}{2}V_{xx}\right)(x - x_t)^2 + \left(2\alpha_tp_t/m + V_x\right)(x - x_t) + V_0 - i\hbar\alpha_t/m + p_t^2/2m\right]\psi. \tag{A3.11.214}$$

Comparing coefficients of like powers of $(x - x_t)$ then gives us three equations involving the four unknowns:

$$\dot{\alpha}_t = -(2/m)\alpha_t^2 - V_{xx}/2 \tag{A3.11.215a}$$

$$2\alpha_t\dot{x}_t - \dot{p}_t = 2\alpha_tp_t/m + V_x \tag{A3.11.215b}$$

$$\dot{\gamma}_t = i\hbar\alpha_t/m + p_t\dot{x}_t - V_0 - p_t^2/2m. \tag{A3.11.215c}$$

To develop an additional equation, we simply make the ansatz that the first term on the left-hand side of
equation (A3.11.215b) equals the first term on the right-hand side and similarly with the second term. This
immediately gives us Hamilton's equations

$$\dot{x}_t = p_t/m \tag{A3.11.216a}$$

$$\dot{p}_t = -V_x \tag{A3.11.216b}$$

from which it follows that $x_t$ and $p_t$ are related through the classical Hamiltonian function

$$H = p_t^2/2m + V_0 = E. \tag{A3.11.217}$$

Equations (A3.11.216) can then be cast in the general form

$$\dot{x}_t = \partial H/\partial p_t \tag{A3.11.218a}$$

$$-\dot{p}_t = \partial H/\partial x_t \tag{A3.11.218b}$$

and the remaining two equations in equation (A3.11.215) become

$$\dot{\alpha}_t = -(2/m)\alpha_t^2 - \tfrac{1}{2}V_{xx} \tag{A3.11.219a}$$

$$\dot{\gamma}_t = i\hbar\alpha_t/m + p_t\dot{x}_t - E. \tag{A3.11.219b}$$

It is not difficult to show that, for a constant potential, equation (A3.11.218) and equation (A3.11.219) can be
solved to give the free particle wavepacket in equation (A3.11.7). More generally, one can solve equation
(A3.11.218) and equation (A3.11.219) numerically for any potential, even potentials that are not quadratic,
but the solution obtained will be exact only for potentials that are constant, linear or quadratic. The deviation
between the exact and Gaussian wavepacket solutions for other potentials depends on how close they are to
being locally quadratic, which means how well the potential can be approximated by a quadratic potential over
the width of the wavepacket. Note that although this theory has many classical features, the $\hbar \to 0$ limit has not
been used. This circumvents problems with singularities in the wavefunction near classical turning points that
cause trouble in WKB theory.


A3.12 Statistical mechanical description of chemical kinetics: RRKM


William L Hase 


A3.12.1 INTRODUCTION 

As reactants transform to products in a chemical reaction, reactant bonds are broken and reformed for the 
products. Different theoretical models are used to describe this process ranging from time-dependent classical 
or quantum dynamics [1,2], in which the motions of individual atoms are propagated, to models based on the 
postulates of statistical mechanics [3]. The validity of the latter models depends on whether statistical 
mechanical treatments represent the actual nature of the atomic motions during the chemical reaction. Such a 
statistical mechanical description has been widely used in unimolecular kinetics [4] and appears to be an 
accurate model for many reactions. It is particularly instructive to discuss statistical models for unimolecular 
reactions, since the model may be formulated at the elementary microcanonical level and then averaged to 
obtain the canonical model. 

Unimolecular reactions are important in chemistry, physics, biochemistry, materials science, and many other 
areas of science and are denoted by 

$$\mathrm{A}^* \to \text{products} \tag{A3.12.1}$$

where the asterisk denotes that the unimolecular reactant A contains sufficient internal vibrational/rotational 
energy to decompose. (Electronic excitation may also promote decomposition of A, but this topic is outside 
the purview of this presentation.) The energy is denoted by E and must be greater than the unimolecular 
decomposition threshold energy $E_0$. There are three general types of potential energy profiles for
unimolecular reactions (see figure A3.12.1). One type is for an isomerization reaction, such as cyclopropane
isomerization

$$\text{cyclo-C}_3\mathrm{H}_6 \to \mathrm{CH_3{-}CH{=}CH_2}$$

for which there is a substantial potential energy barrier separating the two isomers. The other two examples
are for unimolecular dissociation. In one case, as for formaldehyde dissociation

$$\mathrm{H_2CO} \to \mathrm{H_2} + \mathrm{CO}$$

there is a potential energy barrier for the reverse association reaction. In the other, as for aluminium cluster
dissociation

$$\mathrm{Al}_n \to \mathrm{Al}_{n-1} + \mathrm{Al}$$

there is no barrier for association.





Figure A3.12.1. Schematic potential energy profiles for three types of unimolecular reactions. (a)
Isomerization. (b) Dissociation where there is an energy barrier for reaction in both the forward and reverse
directions. (c) Dissociation where the potential energy rises monotonically as for rotational ground-state
species, so that there is no barrier to the reverse association reaction. (Adapted from [5].)


A number of different experimental methods may be used to energize the unimolecular reactant A. For 
example, energization can take place by the potential energy release in chemical reaction, i.e. 


$$ {}^1\mathrm{CH_2} + \mathrm{CH_3SiH_3} \rightarrow \mathrm{CH_3SiH_2CH_3}^* $$

or by absorption of a single photon, 

$$ \mathrm{CH_3NC} + h\nu \rightarrow \mathrm{CH_3NC}^*. $$

Extensive discussions of procedures for energizing molecules are given elsewhere [5]. 

Quantum mechanically, the time dependence of the initially prepared state of A* is given by its wavefunction ψ(t),
which may be determined from the equation of motion

$$ i\hbar\,\frac{\partial\psi(t)}{\partial t} = \hat{H}\psi(t). \tag{A3.12.2} $$

At the unimolecular threshold of moderate to large size molecules (e.g. C$_2$H$_6$ to peptides), there are many
vibrational/rotational states within the experimental energy resolution dE and the initial state of A* may decay 
by undergoing transitions to other states and/or decomposing to products. The former is called intramolecular 
vibrational energy redistribution (IVR) [6]. The probability amplitude versus time of remaining in the initially 
prepared state is given by 

$$ C(t) = \langle\psi(0)|\psi(t)\rangle \tag{A3.12.3} $$

and comprises contributions from both IVR and unimolecular decomposition. The time dependence of
the unimolecular decomposition may be constructed by evaluating $|\psi(t)|^2$ inside the potential energy barrier,
within the reactant region of the potential energy surface.

In the statistical description of unimolecular kinetics, known as Rice-Ramsperger-Kassel-Marcus (RRKM) 
theory [4,7,8], it is assumed that complete IVR occurs on a timescale much shorter than that for the 
unimolecular reaction [9]. Furthermore, to identify states of the system as those for the reactant, a dividing 
surface [10], called a transition state, is placed at the potential energy barrier region of the potential energy 
surface. The assumption implicit in RRKM theory is described in the next section. 


A3.12.2 FUNDAMENTAL ASSUMPTION OF RRKM THEORY: 
MICROCANONICAL ENSEMBLE 

RRKM theory assumes a microcanonical ensemble of A* vibrational/rotational states within the energy
interval E → E + dE, so that each of these states is populated statistically with an equal probability [4]. This
assumption of a microcanonical distribution means that the unimolecular rate constant for A* only depends on
energy, and not on the manner in which A* is energized. If N(0) is the number of A* molecules excited at t = 0
in accord with a microcanonical ensemble, the microcanonical rate constant k(E) is then defined by

$$ -\left.\frac{dN(t)}{dt}\right|_{t=0} = k(E)\,N(0). \tag{A3.12.4} $$


The rapid IVR assumption of RRKM theory means that a microcanonical ensemble is maintained as the A* 
molecules decompose so that, at any time t, k(E) is given by 

$$ -\frac{dN(t)}{dt} = k(E)\,N(t). \tag{A3.12.5} $$

As a result of the fixed time-independent rate constant k(E), N(t) decays exponentially, i.e.

$$ N(t) = N(0)\exp[-k(E)t]. \tag{A3.12.6} $$

A RRKM unimolecular system obeys the ergodic principle of statistical mechanics [11].

The quantity $-dN(t)/[N(0)\,dt]$ is called the lifetime distribution P(t) [12] and according to RRKM theory is
given by

$$ P(t) = k(E)\exp[-k(E)t]. \tag{A3.12.7} $$

Figure A3.12.2(a) illustrates the lifetime distribution of RRKM theory and shows random transitions among
all states at some energy high enough for eventual reaction (toward the right). In reality, transitions between
quantum states (though coupled) are not equally probable: some are more likely than others. Therefore,
transitions between states must be sufficiently rapid and disorderly for the RRKM assumption to be
mimicked, as qualitatively depicted in figure A3.12.2(b). The situation depicted in these figures, where a
microcanonical ensemble exists at t = 0 and rapid IVR maintains its existence during the decomposition, is
called intrinsic RRKM behaviour [9].
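
The exponential decay and lifetime distribution of intrinsic RRKM behaviour can be checked numerically. The following is a minimal sketch, not taken from the original text: the function names, the simple trapezoid integrator and the rate constant value are illustrative assumptions. It evaluates N(t) and P(t) and confirms that P(t) integrates to unity with a mean lifetime of 1/k(E).

```python
import math


def population(n0, k, t):
    """N(t) for a time-independent rate constant k: exponential decay."""
    return n0 * math.exp(-k * t)


def lifetime_distribution(k, t):
    """RRKM lifetime distribution P(t) = k exp(-k t)."""
    return k * math.exp(-k * t)


def trapezoid(f, a, b, n=20000):
    """Composite trapezoid rule on n equal intervals (illustrative helper)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Hypothetical rate constant in s^-1, chosen only for illustration.
K_RATE = 2.0e9
# Integrate out to 20 lifetimes, where exp(-k t) is negligible.
T_MAX = 20.0 / K_RATE

norm = trapezoid(lambda t: lifetime_distribution(K_RATE, t), 0.0, T_MAX)
mean_lifetime = trapezoid(
    lambda t: t * lifetime_distribution(K_RATE, t), 0.0, T_MAX
)
```

Under these assumptions `norm` is close to 1 and `mean_lifetime` is close to `1 / K_RATE`; a non-RRKM system would show up as a P(t) that is not a single exponential.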


Figure A3.12.2. Relation of state occupation (schematically shown at constant energy) to lifetime distribution
for the RRKM theory and for various actual situations. Dashed curves in lifetime distributions for (d) and (e)
indicate RRKM behaviour. (a) RRKM model. (b) Physical counterpart of RRKM model. (c) Collisional state
selection. (d) Chemical activation. (e) Intrinsically non-RRKM. (Adapted from [9].)

The lifetime distribution will depend in part on the manner in which the energy needed for reaction is 
supplied. In many experiments, such as photoactivation and chemical activation, the molecular 
vibrational/rotational states are excited non-randomly. Regardless of the pattern of the initial energizing, the 
RRKM model of rapid IVR requires the distribution of states to become microcanonical in a negligibly short 
time. Three different possible lifetime distributions are represented by figure A3.12.2(d). As shown in the
middle, the lifetime distribution may be similar to that of RRKM theory. In other cases, the probability of a
short lifetime with respect to reaction may be enhanced or reduced, depending on the location of the initial
excitation within the molecule. These are examples of apparent non-RRKM behaviour [9] arising from the
initial non-random excitation. If there are strong internal couplings, P(t) will become that of RRKM theory,
equation (A3.12.7), after rapid IVR. A classic example of apparent non-RRKM behaviour is described below
in section A3.12.8.1.


A situation that arises from the intramolecular dynamics of A* and is completely distinct from apparent
non-RRKM behaviour is intrinsic non-RRKM behaviour [9]. By this, it is meant that A* has a non-random P(t)
even if the internal vibrational states of A* are prepared randomly. This situation arises when transitions
between individual molecular vibrational/rotational states are slower than transitions leading to products. As a
result, the vibrational states do not have equal dissociation probabilities. In terms of classical phase space
dynamics, slow transitions between the states occur when the reactant phase space is metrically decomposable
[13,14] on the timescale of the unimolecular reaction and there is at least one bottleneck [9] in the molecular
phase space other than the one defining the transition state. An intrinsic non-RRKM molecule decays
non-exponentially with a time-dependent unimolecular rate constant or exponentially with a rate constant different
from that of RRKM theory.

The above describes the fundamental assumption of RRKM theory regarding the intramolecular dynamics of 
A*. The RRKM expression for k(E) is now derived. 


A3.12.3 THE RRKM UNIMOLECULAR RATE CONSTANT 

A3.12.3.1 DERIVATION OF THE RRKM k(E)

As discussed above, to identify states of the system as those for the reactant A*, a dividing surface is placed at 
the potential energy barrier region of the potential energy surface. This is a classical mechanical construct and 
classical statistical mechanics is used to derive the RRKM k(E) [4]. 

In the vicinity of the dividing surface, it is assumed that the Hamiltonian for the system may be separated into 
the two parts 

$$ H = H_1 + H' \tag{A3.12.8} $$

where $H_1$ defines the energy for the conjugate coordinate and momentum pair $q_1$, $p_1$ and $H'$ gives the energy
for the remaining conjugate coordinates and momenta. This special coordinate $q_1$ is called the reaction
coordinate. Reactive systems which have a total energy H = E and a value for $q_1$ which lies between $q_1^\ddagger$ and
$q_1^\ddagger + dq_1^\ddagger$ are called microcanonical transition states. The reaction coordinate potential at the
transition state is $E_0$. The RRKM k(E) is determined from the rate at which these transition states form products.

The hypersurface formed from variations in the system's coordinates and momenta at H(p, q) = E is the 
microcanonical system's phase space, which, for a Hamiltonian with 3n coordinates, has a dimension of 6n - 
1. The assumption that the system's states are populated statistically means that the population density over 
the whole surface of the phase space is uniform. Thus, the ratio of molecules at the dividing surface to the
total molecules, $dN(q_1^\ddagger, p_1^\ddagger)/N$, may be expressed as a ratio of the phase space at the dividing
surface to the total phase space. Thus, at any
instant in time, the ratio of molecules whose reaction coordinate and conjugate momentum have values that
range from $q_1^\ddagger$ to $q_1^\ddagger + dq_1^\ddagger$ and from $p_1^\ddagger$ to $p_1^\ddagger + dp_1^\ddagger$ to the total number of molecules is given by

$$ \frac{dN(q_1^\ddagger, p_1^\ddagger)}{N} = \frac{dq_1^\ddagger\,dp_1^\ddagger \int\cdots\int_{H=E-E_1^\ddagger-E_0} dq_2^\ddagger\cdots dq_{3n}^\ddagger\,dp_2^\ddagger\cdots dp_{3n}^\ddagger}{\int\cdots\int_{H=E} dq_1\cdots dq_{3n}\,dp_1\cdots dp_{3n}} \tag{A3.12.9} $$

where $E_1^\ddagger$ is the translational energy in the reaction coordinate. One can think of this expression as a
reactant-transition state equilibrium constant for a microcanonical system. The term $dq_1^\ddagger\,dp_1^\ddagger$ divided by Planck's
constant is the number of translational states in the reaction coordinate and the surface integral in the
numerator divided by $h^{3n-1}$ is the density of states for the 3n - 1 degrees of freedom orthogonal to the
reaction coordinate. Similarly, the surface integral in the denominator is the reactant density of states
multiplied by $h^{3n}$.


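To determine k(E) from equation (A3.12.9) it is assumed that transition states with positive $p_1^\ddagger$ form products.
Noting that $p_1^\ddagger = \mu_1\,dq_1^\ddagger/dt$, where $\mu_1$ is the reduced mass of the separating fragments, all transition states that
lie within $q_1^\ddagger$ and $q_1^\ddagger + dq_1^\ddagger$ with positive $p_1^\ddagger$ will cross the transition state toward products in the time interval
$dt = \mu_1\,dq_1^\ddagger/p_1^\ddagger$. Inserting this expression into equation (A3.12.9), one finds that the reactant-to-product rate (i.e.
flux) through the transition state for momentum $p_1^\ddagger$ is

$$ \frac{dN(q_1^\ddagger, p_1^\ddagger)}{dt} = N\,\frac{p_1^\ddagger\,dp_1^\ddagger}{\mu_1}\,\frac{\int\cdots\int_{H=E-E_1^\ddagger-E_0} dq_2^\ddagger\cdots dq_{3n}^\ddagger\,dp_2^\ddagger\cdots dp_{3n}^\ddagger}{\int\cdots\int_{H=E} dq_1\cdots dq_{3n}\,dp_1\cdots dp_{3n}}. \tag{A3.12.10} $$

Since the energy in the reaction coordinate is $E_1^\ddagger = (p_1^\ddagger)^2/2\mu_1$, its derivative is $dE_1^\ddagger = p_1^\ddagger\,dp_1^\ddagger/\mu_1$ so that equation
(A3.12.10) can be converted into

$$ \frac{dN(q_1^\ddagger, p_1^\ddagger)}{dt} = N\,dE_1^\ddagger\,\frac{\int\cdots\int_{H=E-E_1^\ddagger-E_0} dq_2^\ddagger\cdots dq_{3n}^\ddagger\,dp_2^\ddagger\cdots dp_{3n}^\ddagger}{\int\cdots\int_{H=E} dq_1\cdots dq_{3n}\,dp_1\cdots dp_{3n}}. \tag{A3.12.11} $$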

This equation represents the reaction rate at total energy E with a fixed energy in the reaction coordinate $E_1^\ddagger$
and may be written as

$$ \frac{dN(E, E_1^\ddagger)}{dt} = k(E, E_1^\ddagger)\,N\,dE_1^\ddagger \tag{A3.12.12} $$

where $k(E, E_1^\ddagger)$ is a unimolecular rate constant. As discussed above, the integrals in equation (A3.12.11) are
densities of states ρ (apart from powers of h), so $k(E, E_1^\ddagger)$ becomes

$$ k(E, E_1^\ddagger) = \frac{\rho^\ddagger(E - E_0 - E_1^\ddagger)}{h\rho(E)}. \tag{A3.12.13} $$
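
The cancellation of powers of h that leaves a single factor of h in the denominator can be written out explicitly. In the sketch below, $I^\ddagger$ and $I$ are shorthand introduced here (not in the original) for the surface integrals over the transition state and reactant phase spaces, respectively.

```latex
\begin{aligned}
I^\ddagger &= h^{3n-1}\,\rho^\ddagger(E - E_0 - E_1^\ddagger), \qquad I = h^{3n}\,\rho(E), \\
k(E, E_1^\ddagger) &= \frac{I^\ddagger}{I}
  = \frac{h^{3n-1}\,\rho^\ddagger(E - E_0 - E_1^\ddagger)}{h^{3n}\,\rho(E)}
  = \frac{\rho^\ddagger(E - E_0 - E_1^\ddagger)}{h\,\rho(E)}.
\end{aligned}
```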

To find the total reaction flux, equation (A3.12.12) must be integrated between the limits $E_1^\ddagger$ equal to 0 and
$E - E_0$, so that

$$ \frac{dN(E)}{dt} = N\int_0^{E-E_0} k(E, E_1^\ddagger)\,dE_1^\ddagger = k(E)\,N \tag{A3.12.14} $$

where, using equation (A3.12.13), k(E) is given by

$$ k(E) = \frac{\int_0^{E-E_0}\rho^\ddagger(E - E_0 - E_1^\ddagger)\,dE_1^\ddagger}{h\rho(E)} = \frac{N^\ddagger(E - E_0)}{h\rho(E)}. \tag{A3.12.15} $$

The term $N^\ddagger(E - E_0)$ is the sum of states at the transition state for energies from 0 to $E - E_0$. Equation
(A3.12.15) is the RRKM expression for the unimolecular rate constant.

Only in the high-energy limit does classical statistical mechanics give accurate sums and densities of states
[15]. Thus, in general, quantum statistical mechanics must be used to calculate a RRKM k(E) which may be
compared with experiment [16]. A comparison of classical and quantum harmonic (see below) RRKM rate
constants for $\mathrm{C_2H_5 \rightarrow H + C_2H_4}$ is given in figure A3.12.3 [17]. The energies used for the classical calculation
are with respect to the reactant's and transition state's potential minima. For the quantum calculation the 
energies are with respect to the zero-point levels. If energies with respect to the zero-point levels were used in 
the classical calculation, the classical k(E) would be appreciably smaller than the quantum value [16]. 



Figure A3.12.3. Harmonic RRKM unimolecular rate constants for $\mathrm{C_2H_5 \rightarrow H + C_2H_4}$ dissociation: classical
state counting (solid curve), quantal state counting (dashed curve). Rate constant is in units of s$^{-1}$ and energy
in kcal mol$^{-1}$. (Adapted from [17].)

RRKM theory allows some modes to be uncoupled and not exchange energy with the remaining modes [16]. 
In quantum RRKM theory, these uncoupled modes are not active, but are adiabatic and stay in fixed quantum 
states n during the reaction. For this situation, equation (A3.12.15) becomes

$$ k(E, n) = \frac{N^\ddagger[E - E_0(n), n]}{h\rho(E, n)}. \tag{A3.12.16} $$

In addition to affecting the number of active degrees of freedom, the fixed n also affects the unimolecular
threshold $E_0(n)$. Since the total angular momentum j is a constant of motion and quantized according to

$$ j = \sqrt{J(J+1)}\,\hbar \tag{A3.12.17} $$

the quantum number J is fixed during the unimolecular reaction. This may be denoted by explicitly including
J in equation (A3.12.16), i.e.

$$ k(E, J, n) = \frac{N^\ddagger[E - E_0(n), J, n]}{h\rho(E, J, n)}. \tag{A3.12.18} $$

The treatment of angular momentum is discussed in detail below in section A3.12.4.3.

A3.12.3.2 k(E) AS AN AVERAGE FLUX

The RRKM rate constant is often expressed as an average classical flux through the transition state [18, 19 
and 20]. To show that this is the case, first recall that the density of states p(E) for the reactant may be 
expressed as 

$$ \rho(E) = \frac{d}{dE}\int\cdots\int dq_1\cdots dq_{3n}\,dp_1\cdots dp_{3n}\,\theta(E - H)/h^{3n} \tag{A3.12.19} $$

where θ is the Heaviside function, i.e. θ(x) = 1 for x > 0 and θ(x) = 0 for x < 0. Since the delta and Heaviside
functions are related by δ(x) = dθ(x)/dx, equation (A3.12.19) becomes

$$ \rho(E) = \int\cdots\int dq_1\cdots dq_{3n}\,dp_1\cdots dp_{3n}\,\delta(E - H)/h^{3n}. \tag{A3.12.20} $$


From equations (A3.12.11) to (A3.12.15) and the discussion above, the RRKM rate constant may be written as

$$ k(E) = \frac{\int\left[\int\cdots\int dq_2^\ddagger\cdots dq_{3n}^\ddagger\,dp_2^\ddagger\cdots dp_{3n}^\ddagger\,\delta(E - H)\right]dH}{\int\cdots\int dq_1\cdots dq_{3n}\,dp_1\cdots dp_{3n}\,\delta(E - H)}. \tag{A3.12.21} $$

The inner multiple integral is the transition state's density of states at energy E', and also the numerator in
equation (A3.12.13), which gives the transition state's sum of states $N^\ddagger(E - E_0)$ when integrated from E' = 0 to
$E' = E - E_0$. Using Hamilton's equation $dH/dp_1 = \dot q_1$, dH in the above equation may be replaced by $\dot q_1\,dp_1$.

Also, from the definition of the delta function 


$$ \int\delta(q_1 - q_1^\ddagger)\,dq_1 = 1. \tag{A3.12.22} $$

This expression may be inserted into the numerator of the above equation, without altering the equation.
Making the above two changes and noting that $\delta(q_1 - q_1^\ddagger)$ specifies the transition state, so that the ‡ superscript
to the transition state's coordinates and momenta may be dropped, equation (A3.12.21) becomes

$$ k(E) = \frac{\int\cdots\int dq_1\cdots dq_{3n}\,dp_1\cdots dp_{3n}\,\delta(E - H)\,\dot q_1\,\delta(q_1 - q_1^\ddagger)}{\int\cdots\int dq_1\cdots dq_{3n}\,dp_1\cdots dp_{3n}\,\delta(E - H)}. \tag{A3.12.23} $$

The rate constant is an average of $\dot q_1\,\delta(q_1 - q_1^\ddagger)$, with positive $\dot q_1$, for a microcanonical ensemble H = E and may
be expressed as

$$ k(E) = \langle \dot q_1\,\delta(q_1 - q_1^\ddagger) \rangle. \tag{A3.12.24} $$

The RRKM rate constant written this way is seen to be an average flux through the transition state.

A3.12.3.3 VARIATIONAL RRKM THEORY

In deriving the RRKM rate constant in section A3.12.3.1, it is assumed that the rate at which reactant
molecules cross the transition state, in the direction of products, is the same rate at which the reactants form 
products. Thus, if any of the trajectories which cross the transition state in the product direction return to the 
reactant phase space, i.e. recross the transition state, the actual unimolecular rate constant will be smaller than 
that predicted by RRKM theory. This one-way crossing of the transition state, with no recrossing, is a 
fundamental assumption of transition state theory [21]. Because it is incorporated in RRKM theory, this 
theory is also known as microcanonical transition state theory. 

As a result of possible recrossings of the transition state, the classical RRKM k(E) is an upper bound to the 
correct classical microcanonical rate constant. The transition state should serve as a bottleneck between 
reactants and products, and in variational RRKM theory [22] the position of the transition state along $q_1$ is
varied to minimize k(E). This minimum k(E) is expected to be the closest to the truth. The quantity actually
minimized is $N^\ddagger(E - E_0)$ in equation (A3.12.15), so the operational equation in variational RRKM theory is

$$ \frac{dN^\ddagger[E - E_0(q_1)]}{dq_1} = 0 \tag{A3.12.25} $$

where $E_0(q_1)$ is the potential energy as a function of $q_1$. The minimum in $N^\ddagger[E - E_0(q_1)]$ is identified by $q_1 = q_1^\ddagger$
and this value for $q_1$, with the smallest sum of states, is expected to be the best bottleneck for the reaction.

For reactions with well defined potential energy barriers, as in figure A3.12.1(a) and figure A3.12.1(b), the
variational criterion places the transition state at or very near this barrier. The variational criterion is
particularly important for a reaction where there is no barrier for the reverse association reaction: see figure
A3.12.1(c). There are two properties which give rise to the minimum in $N^\ddagger[E - E_0(q_1)]$ for such a reaction.
As $q_1$ is decreased the potential energy $E_0(q_1)$ decreases and the energy available to the transition state
$E - E_0(q_1)$ increases. This has the effect of increasing the sum of states. However, as $q_1$ is decreased, the
intermolecular anisotropic forces between the dissociating fragments increase, which has the effect of
decreasing the available phase space and, thus, the sum of states. The combination of these two effects gives
rise to a minimum in $N^\ddagger[E - E_0(q_1)]$. Plots of the sum of states versus $q_1$ are shown in figure A3.12.4 for three
model potentials of the $\mathrm{C_2H_6 \rightarrow 2CH_3}$ dissociation reaction [23].

Figure A3.12.4. Plots of $N^\ddagger[E - E_0(q_1)]$ versus $q_1$ for three models of the $\mathrm{C_2H_6 \rightarrow 2CH_3}$ potential energy
function. $r^+$ represents $q_1$ and the term on the abscissa represents $N^\ddagger$. (Adapted from [23].)
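
The competition between the two effects can be illustrated with a toy calculation. The sketch below is an assumption-laden model, not one of the potentials of [23]: it uses a Morse-type curve for the potential along a dimensionless reaction coordinate x, a classical harmonic sum of states, and two hypothetical transitional modes whose quanta decay exponentially as the fragments separate. All parameter values and function names are invented for illustration; energies are in units of the well depth.

```python
import math


def potential(x, well_depth=1.0):
    """Hypothetical Morse-type potential along the reaction coordinate x."""
    return well_depth * (1.0 - math.exp(-x)) ** 2


def sum_of_states(x, energy, hv_conserved=(0.05,),
                  hv_transitional=(0.02, 0.02), decay=0.3):
    """Classical harmonic sum of states at position x on the reaction path.

    Transitional-mode quanta are assumed to loosen as exp(-decay * x).
    """
    available = energy - potential(x)
    if available <= 0.0:
        return 0.0
    quanta = list(hv_conserved)
    quanta += [hv * math.exp(-decay * x) for hv in hv_transitional]
    m = len(quanta)
    return available ** m / (math.factorial(m) * math.prod(quanta))


def variational_transition_state(energy, x_lo=1.0, x_hi=6.0, n_points=501):
    """Scan a grid in x and return (x, N) at the minimum sum of states."""
    grid = [x_lo + i * (x_hi - x_lo) / (n_points - 1) for i in range(n_points)]
    values = [sum_of_states(x, energy) for x in grid]
    i_min = min(range(n_points), key=values.__getitem__)
    return grid[i_min], values[i_min]


# Total energy slightly above the dissociation limit (units of well depth).
x_min, n_min = variational_transition_state(1.2)
```

In this model the rising potential lowers the sum of states at small x while the loosening transitional modes raise it at large x, so the scan finds an interior minimum; a production calculation would replace the grid scan and the model functions with the actual reaction path data.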

Variational RRKM theory is particularly important for unimolecular dissociation reactions, in which
vibrational modes of the reactant molecule become translations and rotations in the products [22]. For
$\mathrm{CH_4 \rightarrow CH_3 + H}$ dissociation there are three vibrational modes of this type, i.e. the C-H stretch which is the reaction
coordinate and the two degenerate H-CH$_3$ bends, which first transform from high-frequency to low-frequency
vibrations and then hindered rotors as the H-C bond ruptures. These latter two degrees of freedom
are called transitional modes [24, 25]. $\mathrm{C_2H_6 \rightarrow 2CH_3}$ dissociation has five transitional modes, i.e. two pairs of
degenerate CH$_3$ rocking/rotational motions and the CH$_3$ torsion.

To calculate $N^\ddagger(E - E_0)$, the non-torsional transitional modes have been treated as vibrations as well as
rotations [26]. The former approach is invalid when the transitional mode's barrier for rotation is low, while
the latter is inappropriate when the transitional mode is a vibration. Harmonic frequencies for the transitional
modes may be obtained from a semi-empirical model [23] or by performing an appropriate normal mode
analysis as a function of the reaction path for the reaction's potential energy surface [26]. Semiclassical
quantization may be used to determine anharmonic energy levels for the transitional modes [27].


The intermolecular Hamiltonian of the product fragments is used to calculate the sum of states of the
transitional modes, when they are treated as rotations. The resulting model [28] is nearly identical to phase
space theory [29], if the distance between the product fragments' centres-of-mass is assumed to be the
reaction coordinate [30].
A more complete model is obtained by using a generalized reaction coordinate, which may contain 
contributions from different motions, such as bond stretching and bending, as well as the above relative 
centre-of-mass motion [31]. 

Variational RRKM calculations, as described above, show that a unimolecular dissociation reaction may have 
two variational transition states [32, 33, 34, 35 and 36], i.e. one that is a tight vibrator type and another that is 
a loose rotator type. Whether a particular reaction has both of these variational transition states, at a particular 
energy, depends on the properties of the reaction's potential energy surface [33, 34 and 35]. For many 
dissociation reactions there is only one variational transition state, which smoothly changes from a loose 
rotator type to a tight vibrator type as the energy is increased [26].


A3.12.4 APPROXIMATE MODELS FOR THE RRKM RATE CONSTANT 

A3.12.4.1 CLASSICAL HARMONIC OSCILLATORS: RRK THEORY

The classical mechanical RRKM k(E) takes a very simple form, if the internal degrees of freedom for the 
reactant and transition state are assumed to be harmonic oscillators. The classical sum of states for s harmonic 
oscillators is [16]

$$ N(E) = \frac{E^s}{s!\,\prod_{i=1}^{s} h\nu_i}. \tag{A3.12.26} $$

The density ρ(E) = dN(E)/dE is then

$$ \rho(E) = \frac{E^{s-1}}{(s-1)!\,\prod_{i=1}^{s} h\nu_i}. \tag{A3.12.27} $$

The reactant density of states in equation (A3.12.15) is given by the above expression for ρ(E). The transition
state's sum of states is

$$ N^\ddagger(E - E_0) = \frac{(E - E_0)^{s-1}}{(s-1)!\,\prod_{i=1}^{s-1} h\nu_i^\ddagger}. \tag{A3.12.28} $$


Inserting equation (A3.12.27) and equation (A3.12.28) into equation (A3.12.15) gives

$$ k(E) = \frac{\prod_{i=1}^{s}\nu_i}{\prod_{i=1}^{s-1}\nu_i^\ddagger}\left(\frac{E - E_0}{E}\right)^{s-1}. \tag{A3.12.29} $$

If the ratio of the products of vibrational frequencies is replaced by ν, equation (A3.12.29) becomes

$$ k(E) = \nu\left(\frac{E - E_0}{E}\right)^{s-1}, \tag{A3.12.30} $$

which is the Rice-Ramsperger-Kassel (RRK) unimolecular rate constant [16,37]. Thus, the k(E) of RRK
theory is the classical harmonic limit of RRKM theory.
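
The reduction of the classical harmonic RRKM expression to the RRK form can be verified numerically. The following is a minimal sketch under stated assumptions: energies and vibrational quanta are expressed as wavenumbers (cm^-1), so that 1/h becomes the speed of light in cm s^-1, and the frequencies, threshold and function names are hypothetical values chosen only for illustration.

```python
import math

# With energies in cm^-1, k = N / (h rho) becomes k = c * N / rho.
C_CM_PER_S = 2.99792458e10


def classical_sum(energy, wavenumbers):
    """Classical sum of states for harmonic oscillators (energy in cm^-1)."""
    s = len(wavenumbers)
    return energy ** s / (math.factorial(s) * math.prod(wavenumbers))


def classical_density(energy, wavenumbers):
    """Classical density of states, dN/dE, in states per cm^-1."""
    s = len(wavenumbers)
    return energy ** (s - 1) / (math.factorial(s - 1) * math.prod(wavenumbers))


def rrkm_classical(energy, e0, reactant_wn, ts_wn):
    """k(E) = N_ts(E - E0) / (h rho(E)) with classical harmonic counting."""
    if energy <= e0:
        return 0.0
    n_ts = classical_sum(energy - e0, ts_wn)
    return C_CM_PER_S * n_ts / classical_density(energy, reactant_wn)


def rrk(energy, e0, reactant_wn, ts_wn):
    """RRK form: nu * ((E - E0) / E) ** (s - 1)."""
    if energy <= e0:
        return 0.0
    s = len(reactant_wn)
    nu = C_CM_PER_S * math.prod(reactant_wn) / math.prod(ts_wn)
    return nu * ((energy - e0) / energy) ** (s - 1)


# Hypothetical three-mode reactant and two-mode transition state.
REACTANT = [1000.0, 1200.0, 1500.0]
TRANSITION_STATE = [900.0, 1100.0]
k_rrkm = rrkm_classical(20000.0, 15000.0, REACTANT, TRANSITION_STATE)
k_rrk = rrk(20000.0, 15000.0, REACTANT, TRANSITION_STATE)
```

The two functions agree to rounding error for any energy above threshold, since the factorials and powers of h cancel exactly as in the algebra above.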

A3.12.4.2 QUANTUM HARMONIC OSCILLATORS

Only in the high-energy limit does classical statistical mechanics give accurate values for the sum and density 
of states terms in equation (A3.12.15) [3,14]. Thus, to determine an accurate RRKM k(E) for the general case,
quantum statistical mechanics must be used. Since it is difficult to make anharmonic corrections, both the
molecule and transition state are often assumed to be a collection of harmonic oscillators for calculating the
sum $N^\ddagger(E - E_0)$ and density ρ(E). This is somewhat incongruous since a molecule consisting of harmonic
oscillators would exhibit intrinsic non-RRKM dynamics. 

With the assumption of harmonic oscillators, the molecule's quantum energy levels are 

$$ E(\mathbf{n}) = \sum_{i=1}^{s}\left(n_i + \tfrac{1}{2}\right)h\nu_i. \tag{A3.12.31} $$

The same expression holds for the transition state, except that the sum is over s - 1 oscillators and the
frequencies are the $\nu_i^\ddagger$. The Beyer-Swinehart algorithm [38] makes a very efficient direct count of the number
of quantum states between an energy of zero and E. The molecule's density of states is then found by finite
difference, i.e.

$$ \rho(E) = \frac{N(E + \Delta E/2) - N(E - \Delta E/2)}{\Delta E} \tag{A3.12.32} $$

where N(E + ΔE/2) is the sum of states at energy E + ΔE/2. The transition state's harmonic $N^\ddagger(E - E_0)$ is
counted directly by the Beyer-Swinehart algorithm. This harmonic model is used so extensively to calculate a
value for the RRKM k(E) that it is easy to forget that RRKM theory is not a harmonic theory.
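
The direct count and the finite-difference density can be sketched in a few lines. The code below is an illustrative implementation of the Beyer-Swinehart recursion, written here under the assumptions that energies are measured from the zero-point level, that vibrational quanta are rounded to an integer number of energy grains, and that the function names and the two-oscillator example are hypothetical.

```python
def beyer_swinehart(wavenumbers, e_max, grain=1.0):
    """Count harmonic vibrational states in energy bins from 0 to e_max.

    Returns a list whose entry i is the number of states with energy
    i * grain above the zero-point level.
    """
    n_bins = int(e_max / grain) + 1
    counts = [0] * n_bins
    counts[0] = 1  # the zero-point level itself
    for wn in wavenumbers:
        step = max(1, round(wn / grain))  # quantum in units of the grain
        for i in range(step, n_bins):
            counts[i] += counts[i - step]
    return counts


def sum_of_states(counts, energy, grain=1.0):
    """N(E): number of states with energy <= E (clipped to the counted range)."""
    if energy < 0:
        return 0
    top = min(int(energy / grain), len(counts) - 1)
    return sum(counts[: top + 1])


def density_of_states(counts, energy, delta_e, grain=1.0):
    """Finite-difference density [N(E + dE/2) - N(E - dE/2)] / dE."""
    upper = sum_of_states(counts, energy + delta_e / 2, grain)
    lower = sum_of_states(counts, energy - delta_e / 2, grain)
    return (upper - lower) / delta_e


# Two hypothetical oscillators with quanta of 100 and 200 cm^-1.
example_counts = beyer_swinehart([100.0, 200.0], 400.0, grain=100.0)
```

For this small example the bins hold 1, 1, 2, 2 and 3 states, which can be confirmed by listing the combinations of quanta by hand. Realistic calculations would use a grain of about 1 cm^-1 and many more modes.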

A3.12.4.3 OVERALL ROTATION

Regardless of the nature of the intramolecular dynamics of the reactant A*, there are two constants of the
motion in a unimolecular reaction, i.e. the energy E and the total angular momentum j. The latter ensures the
rotational quantum number J is fixed during the unimolecular reaction and the quantum RRKM rate constant
is specified as k(E, J).

(A) SEPARABLE VIBRATION/ROTATION

For a RRKM calculation without any approximations, the complete vibrational/rotational Hamiltonian for the 
unimolecular system is used to calculate the reactant density and transition state's sum of states. No 
approximations are made regarding the coupling between vibration and rotation. However, for many 
molecules the exact nature of the coupling between vibration and rotation is uncertain, particularly at high 
energies, and a model in which rotation and vibration are assumed separable is widely used to calculate the 
quantum RRKM k(E, J) [4,16]. To illustrate this model, first consider a linear polyatomic molecule which
decomposes via a linear transition state. The rotational energy for the reactant is assumed to be that for a rigid
rotor, i.e.

$$ E_r(J) = J(J+1)\hbar^2/2I. \tag{A3.12.33} $$

The same expression applies to the transition state's rotational energy $E_r^\ddagger(J)$ except that the moment of inertia I
is replaced by $I^\ddagger$. Since the quantum number J is fixed, the active energies for the reactant and transition state
are $[E - E_r(J)]$ and $[E - E_0 - E_r^\ddagger(J)]$, respectively. The RRKM rate constant is denoted by

$$ k(E, J) = \frac{N^\ddagger[E - E_0 - E_r^\ddagger(J)]}{h\rho[E - E_r(J)]} \tag{A3.12.34} $$

where $N^\ddagger$ and ρ are the sum and density of states for the vibrational degrees of freedom. Each J level is (2J + 1)
degenerate, which cancels for the sum and density.

(B) THE K QUANTUM NUMBER: ADIABATIC OR ACTIVE 

The degree of freedom in equation (A3.12.18), which has received considerable interest regarding its activity
or adiabaticity, is the one associated with the K rotational quantum number for a symmetric or near-symmetric 
top molecule [39,40]. The quantum number K represents the projection of J onto the molecular symmetry 
axis. Coriolis coupling can mix the 2J+1 K levels for a particular J and destroy K as a good quantum number. 
For this situation K is considered an active degree of freedom. On the other hand, if the Coriolis coupling is 
weak, the K quantum number may retain its integrity and it may be possible to measure the unimolecular rate 
constant as a function of K as well as of E and J. For this case, K is an adiabatic degree of freedom. 

It is straightforward to introduce active and adiabatic treatments of K into the widely used RRKM model 
which represents vibration and rotation as separable and the rotations as rigid rotors [41,42]. For a symmetric 
top, the rotational energy is given by 

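$$ E_r(J, K) = \frac{J(J+1)\hbar^2}{2I_a} + \left(\frac{1}{2I_c} - \frac{1}{2I_a}\right)K^2\hbar^2 \tag{A3.12.35} $$

where $I_c$ is the moment of inertia about the symmetry axis and $I_a$ is that about the two equivalent
perpendicular axes.

If K is adiabatic, a molecule containing total vibrational-rotational energy E and, in a particular J, K level, has
a vibrational density of states $\rho[E - E_r(J,K)]$. Similarly, the transition state's sum of states for the same E, J,
and K is $N^\ddagger[E - E_0 - E_r^\ddagger(J,K)]$. The RRKM rate constant for the K adiabatic model is

$$ k(E, J, K) = \frac{N^\ddagger[E - E_0 - E_r^\ddagger(J,K)]}{h\rho[E - E_r(J,K)]}. \tag{A3.12.36} $$

Mixing the 2J + 1 K levels, for the K active model, results in the following sums and densities of states:

$$ N^\ddagger(E, J) = \sum_{K=-J}^{J} N^\ddagger[E - E_0 - E_r^\ddagger(J,K)] \tag{A3.12.37} $$

$$ \rho(E, J) = \sum_{K=-J}^{J} \rho[E - E_r(J,K)]. \tag{A3.12.38} $$

The RRKM rate constant for the K active model is

$$ k(E, J) = \frac{\sum_{K=-J}^{J} N^\ddagger[E - E_0 - E_r^\ddagger(J,K)]}{h\sum_{K=-J}^{J}\rho[E - E_r(J,K)]}. \tag{A3.12.39} $$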

In these models the treatment of K is the same for the molecule and transition state. It is worthwhile noting 
that mixed mode RRKM models are possible in which K is treated differently in the molecule and transition 
state [39]. 
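
The difference between the K adiabatic and K active treatments can be made concrete with a short calculation. The sketch below is illustrative only: it uses classical harmonic vibrational sums and densities for brevity, expresses all energies and rotational constants in cm^-1 (so that 1/h is the speed of light in cm s^-1), and writes the symmetric top energy as B J(J+1) + (A - B)K^2 with A and B the rotational constants for rotation about and perpendicular to the symmetry axis. All names and numerical values are hypothetical.

```python
import math

C_CM_PER_S = 2.99792458e10  # 1/h when energies are in cm^-1


def vib_sum(energy, wavenumbers):
    """Classical harmonic sum of states; zero below the threshold."""
    if energy <= 0.0:
        return 0.0
    s = len(wavenumbers)
    return energy ** s / (math.factorial(s) * math.prod(wavenumbers))


def vib_density(energy, wavenumbers):
    """Classical harmonic density of states per cm^-1; zero below threshold."""
    if energy <= 0.0:
        return 0.0
    s = len(wavenumbers)
    return energy ** (s - 1) / (math.factorial(s - 1) * math.prod(wavenumbers))


def rot_energy(j, k, species):
    """Rigid symmetric-top energy B J(J+1) + (A - B) K^2 in cm^-1."""
    return species["B"] * j * (j + 1) + (species["A"] - species["B"]) * k * k


def k_adiabatic(energy, e0, j, k, reactant, ts):
    """Rate constant with K conserved (K adiabatic model)."""
    n_ts = vib_sum(energy - e0 - rot_energy(j, k, ts), ts["wn"])
    rho = vib_density(energy - rot_energy(j, k, reactant), reactant["wn"])
    return C_CM_PER_S * n_ts / rho if rho > 0.0 else 0.0


def k_active(energy, e0, j, reactant, ts):
    """Rate constant with the 2J + 1 K levels mixed (K active model)."""
    ks = range(-j, j + 1)
    n_ts = sum(vib_sum(energy - e0 - rot_energy(j, k, ts), ts["wn"]) for k in ks)
    rho = sum(vib_density(energy - rot_energy(j, k, reactant), reactant["wn"])
              for k in ks)
    return C_CM_PER_S * n_ts / rho if rho > 0.0 else 0.0


# Hypothetical prolate reactant and transition state.
REACTANT = {"wn": [1000.0, 1200.0, 1500.0], "A": 5.0, "B": 1.0}
TS = {"wn": [900.0, 1100.0], "A": 4.0, "B": 0.8}
rate_active = k_active(20000.0, 15000.0, 10, REACTANT, TS)
rate_k0 = k_adiabatic(20000.0, 15000.0, 10, 0, REACTANT, TS)
```

For J = 0 the two models coincide, and they also coincide for any J when A = B, because the rotational energy then no longer depends on K. A quantum calculation would replace the classical sums by direct counts.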


A3.12.5 ANHARMONIC EFFECTS 

In the above section a harmonic model is described for calculating RRKM rate constants with harmonic sums
and densities of states. This rate constant, denoted by $k_h(E, J)$, is related to the actual anharmonic RRKM rate
constant by

$$ k(E, J) = f_{anh}(E, J)\,k_h(E, J) = f_{anh}(E, J)\,\frac{N_h^\ddagger(E, J)}{h\rho_h(E, J)} \tag{A3.12.40} $$

where $N_h^\ddagger(E, J)$ and $\rho_h(E, J)$ are the harmonic approximations to the sum and density of states. The
anharmonic correction, $f_{anh}(E, J)$, is then

$$ f_{anh}(E, J) = \frac{N_{anh}^\ddagger(E, J)/N_h^\ddagger(E, J)}{\rho_{anh}(E, J)/\rho_h(E, J)} = \frac{f_{anh,N^\ddagger}(E, J)}{f_{anh,\rho}(E, J)} \tag{A3.12.41} $$

the ratio of anharmonic corrections for the sum and density. For energies near the unimolecular threshold,
where the transition state energy $E - E_0$ is small, anharmonicity in the transition state may be negligible, so
that $f_{anh}(E, J)$ may be well approximated by $1/f_{anh,\rho}(E, J)$ [43]. However, for higher energies, anharmonicity is
expected to become important also for the transition state.

There is limited information available concerning anharmonic corrections to RRKM rate constants. Only a 
few experimental studies have investigated the effect of anharmonicity on the reactant's density of states (see 
below). To do so requires spectroscopic information up to very high energies. It is even more difficult to 
measure anharmonicities for transition state energy levels [44,45]. If the potential energy surface is known for 
a unimolecular reactant, anharmonic energy levels for both the reactant and transition state may be determined 
in principle from large-scale quantum mechanical variational calculations. Such calculations are more feasible 
for the transition state with energy $E - E_0$ than for the reactant with the much larger energy E. Such
calculations have been limited to relatively small molecules [4]. 

The bulk of the information about anharmonicity has come from classical mechanical calculations. As 
described above, the anharmonic RRKM rate constant for an analytic potential energy function may be 
determined from either equation (A3.12.4) [13] or equation (A3.12.24) [46] by sampling a microcanonical
ensemble. This rate constant and the one calculated from the harmonic frequencies for the analytic potential
give the anharmonic correction $f_{anh}(E, J)$ in equation (A3.12.41). The transition state's anharmonic classical
sum of states is found from the phase space integral

$$ N_{anh}^\ddagger(E, J) = \int\cdots\int dq_2\cdots dq_{3n}\,dp_2\cdots dp_{3n}\,\theta(E - H)/h^{3n-1} \tag{A3.12.42} $$

which may be combined with the harmonic sum $N_h^\ddagger(E, J)$ to give $f_{anh,N^\ddagger}(E, J)$. The classical anharmonic
correction to the reactant's density of states, $f_{anh,\rho}(E, J)$, may be obtained in a similar manner.

A3.12.5.1 MOLECULES WITH A SINGLE MINIMUM

Extensive applications of RRKM theory have been made to unimolecular reactions, for which there is a single 
potential energy minimum for the reactant molecule [4,47]. For such reactions, the anharmonic correction
$f_{anh}(E, J)$ is usually assumed to be unity and the harmonic model is used to calculate the RRKM k(E, J). Though
this is a widely used approach, uncertainties still remain concerning its accuracy. Anharmonic densities of
states for formaldehyde [48] and acetylene [49], obtained from high-resolution spectroscopic experiments at
energies near their unimolecular thresholds, are 11 and 6 times larger, respectively, than their harmonic
densities. From calculations with analytic potential energy functions at energies near the unimolecular 
thresholds, the HCN quantum anharmonic density of states is 8 times larger than the harmonic value [50] and 
the classical anharmonic density of states for the model alkyl radical HCC is 3-5 times larger than the 
harmonic value [51]. There is a sense that the anharmonic correction may become less important for larger 
molecules, since the average energy per mode becomes smaller [4]. However, as shown below, this 
assumption is clearly not valid for large fluxional molecules with multiple minima. 

Analytic expressions have been proposed for making anharmonic corrections for molecules with a single
potential minimum [52,53,54,55 and 56]. Haarhoff [52] derived an expression for this correction factor by
describing the molecules' degrees of freedom as a collection of Morse oscillators. One of the limitations of
this model is that it is difficult to assign Morse parameters to non-stretching degrees of freedom. Following
the spirit of Haarhoff's work, Troe [54] formulated the correction factor


/= I 


(A3. 12.43) 


for a molecule with s degrees of freedom, m of which are Morse stretches. The remaining s - m degrees of
freedom are assumed to be harmonic oscillators. The $D_i$ are the individual Morse dissociation energies. To
account for bend-stretch coupling, i.e. the attenuation of bending forces as bonds are stretched [51], Troe
amended equation (A3.12.43) to give [55]

J&A,M £ ) = Yl ( n TrrS} ' (A3.12.44) 

The above expressions are empirical approaches, with m and $D_i$ as parameters, for including an anharmonic
correction in the RRKM rate constant. The utility of these equations is that they provide an analytic form for
the anharmonic correction. Clearly, other analytic forms are possible and may be more appropriate. For
example, classical sums of states for H-C-C, H-C=C, and H-C≡C hydrocarbon fragments with Morse
stretching and bend-stretch coupling anharmonicity [51] are fit accurately by the exponential

$$f_{\mathrm{anh},N}(E) = \exp(bE). \tag{A3.12.45}$$

The classical anharmonic density of states is then [56]

$$f_{\mathrm{anh},\rho}(E) = \exp(bE)\,[1 + bE/s]. \tag{A3.12.46}$$

Modifying equation (A3.12.45) to represent the transition state's sum of states, the anharmonic correction to
the RRKM rate constant becomes

$$f_{\mathrm{anh}}(E) = \frac{\exp[b^{\ddagger}(E - E_0)]}{\exp(bE)\,[1 + bE/s]}. \tag{A3.12.47}$$

This expression, and variations of it, have been used to fit classical anharmonic microcanonical k(E, J) for 
unimolecular decomposition [56]. 
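
The exponential fits above lend themselves to a small numerical check. The sketch below is illustrative only: it assumes the classical harmonic sum of states scales as E^s, so that differentiating E^s exp(bE) should reproduce the density correction exp(bE)[1 + bE/s]. The function names and all parameter values (b, b_ts, s, the energies) are hypothetical and are not taken from the fits in [51] or [56].

```python
import math

def f_anh_sum(energy, b):
    """Exponential correction to a sum of states, exp(b*E), as in (A3.12.45)."""
    return math.exp(b * energy)

def f_anh_density(energy, b, s):
    """Correction to the density of states, exp(b*E)*(1 + b*E/s), as in (A3.12.46)."""
    return math.exp(b * energy) * (1.0 + b * energy / s)

def f_anh_rate(energy, e0, b_reactant, b_ts, s):
    """Correction to the RRKM rate constant, as in (A3.12.47)."""
    if energy <= e0:
        raise ValueError("energy must exceed the threshold e0")
    return f_anh_sum(energy - e0, b_ts) / f_anh_density(energy, b_reactant, s)

def density_ratio_by_differentiation(energy, b, s, step=1.0e-4):
    """Differentiate E**s * exp(b*E) and E**s numerically and return the ratio."""
    def anharmonic_sum(x):
        return x ** s * f_anh_sum(x, b)
    d_anh = (anharmonic_sum(energy + step) - anharmonic_sum(energy - step)) / (2.0 * step)
    d_harm = ((energy + step) ** s - (energy - step) ** s) / (2.0 * step)
    return d_anh / d_harm

if __name__ == "__main__":
    # Hypothetical parameters: E and E0 in kcal/mol, b in mol/kcal, s oscillators.
    E, E0, b, b_ts, s = 60.0, 40.0, 0.02, 0.01, 6
    print(f_anh_density(E, b, s), density_ratio_by_differentiation(E, b, s))
    print(f_anh_rate(E, E0, b, b_ts, s))
```

The two numbers on the first printed line should agree closely, which is the sense in which the density correction follows from the sum-of-states correction.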

A3.12.5.2 FLUXIONAL MOLECULES WITH MULTIPLE MINIMA


Anharmonic corrections are expected to be very important for highly fluxional molecules such as clusters and
macromolecules [30]. Figure A3.12.5 illustrates a possible potential energy curve for a fluxional molecule.
There are multiple minima (i.e. conformations) separated by barriers much lower than that for dissociation.
Thus, a moderately excited fluxional molecule may undergo rapid transitions between its many
conformations, and all will contribute to the molecule's unimolecular rate constant. Many different
conformations are expected for the products, but near the dissociation threshold $E_0$, only one set of product
conformations is accessible. As the energy is increased, thresholds for other product conformations are
reached. For energies near $E_0$, there is very little excess energy in the transition state and the harmonic
approximation for the lowest energy product conformation should be very good for the transition state's sum
of states. Thus, for $E \approx E_0$ the anharmonic correction in equation (A3.12.40) is expected to primarily result
from anharmonicity in the reactant density of states.

Figure A3.12.5. A model reaction coordinate potential energy curve for a fluxional molecule. (Adapted from
[30].)

The classical anharmonic RRKM rate constant for a fluxional molecule may be calculated from classical
trajectories by following the initial decay of a microcanonical ensemble of states for the unimolecular
reactant, as given by equation (A3.12.4). Such a calculation has been performed for dissociation of the Al$_6$
and Al$_{13}$ clusters using a model analytic potential energy function written as a sum of Lennard-Jones and
Axelrod-Teller potentials [30]. Structures of some of the Al$_6$ minima, for the potential function, are shown in
figure A3.12.6. The deepest potential minimum has
C$_{2h}$ symmetry and a classical Al$_6$ → Al$_5$ + Al dissociation energy $E_0$ of 43.8 kcal mol$^{-1}$. For energies 30-80
kcal mol$^{-1}$ in excess of this $E_0$, the value of $f_{\mathrm{anh}}$ determined from the trajectories varies from 200 to 130. The
harmonic RRKM rate constants are based on the deepest potential energy minima for the reactant and
transition state, and calculated for a reaction path degeneracy of 6. As discussed above, even larger
corrections are expected at lower energies, particularly for $E \approx E_0$, where anharmonicity in the transition state
does not contribute to $f_{\mathrm{anh}}(E)$. However, because of the size of Al$_6$ and its long unimolecular lifetime, it
becomes impractical to simulate the classical dissociation of Al$_6$ for energies in excess of $E_0$ much smaller
than 30 kcal mol$^{-1}$. For the bigger cluster Al$_{13}$, the anharmonic correction varies from 5500 to 1200 for
excess energies in the range of 85-185 kcal mol$^{-1}$ [30]. These calculations illustrate the critical importance of
including anharmonic corrections when calculating accurate RRKM rate constants for fluxional molecules.

Figure A3.12.6. Structures for some of the potential energy minima for Al$_6$. The unimolecular thresholds for
the C$_{2h}$, D$_{3h}$, C 5? h, C$_{4v}$, and D h minima are 43.8, 40.0, 39.6, 38.8, 31.4 and 20.9 kcal mol$^{-1}$, respectively.
(Adapted from [40].)

In the above discussion it was assumed that the barriers are low for transitions between the different
conformations of the fluxional molecule, as depicted in figure A3.12.5, and therefore the transitions occur on a
timescale much shorter than the RRKM lifetime. This is the rapid IVR assumption of RRKM theory discussed
in section A3.12.2. Accordingly, an initial microcanonical ensemble over all the conformations decays
exponentially. However, for some fluxional molecules, transitions between the different conformations may
be slower than the RRKM rate, giving rise to bottlenecks in the unimolecular dissociation [4,52]. The ensuing
lifetime distribution, equation (A3.12.7), will be non-exponential, as is the case for intrinsic non-RRKM
dynamics, for an initial microcanonical ensemble of molecular states.

A3.12.6 CLASSICAL DYNAMICS OF INTRAMOLECULAR MOTION
AND UNIMOLECULAR DECOMPOSITION

A3.12.6.1 NORMAL-MODE HAMILTONIAN: SLATER THEORY

The classical mechanical model of unimolecular decomposition, developed by Slater [18], is based on the 
normal-mode harmonic oscillator Hamiltonian. Although this Hamiltonian is rigorously exact only for small 
displacements from the molecular equilibrium geometry, Slater extended it to the situation where molecules 
are highly vibrationally energized, undergo large amplitude motions and decompose. Since there are no 
couplings in the normal-mode Hamiltonian, the energies in the individual normal modes do not vary with 
time. This is the essential difference from the RRKM theory which treats a molecule as a collection of 
coupled modes which freely exchange energy. 

The normal-mode harmonic oscillator classical Hamiltonian is

$$H = \sum_{i=1}^{n} \frac{P_i^2 + \lambda_i Q_i^2}{2} \tag{A3.12.48}$$

where $\lambda_i = 4\pi^2\nu_i^2$. Solving the classical equations of motion for this Hamiltonian gives rise to quasiperiodic
motion [58] in which each normal-mode coordinate $Q_i$ varies with time according to

$$Q_i = Q_i^0 \cos(2\pi\nu_i t - \delta_i) \tag{A3.12.49}$$

where $Q_i^0$ is the amplitude and $\delta_i$ the phase of the motion. Thus, if an energy $E_i = (P_i^2 + \lambda_i Q_i^2)/2$ and phase $\delta_i$
are chosen for each normal mode, the complete intramolecular motion of the energized molecule may be
determined for this particular initial condition.

Reaction is assumed to have occurred if a particular internal coordinate q, such as a bond length, attains a
critical extension $q^{\ddagger}$. In the normal-mode approximation, the displacements d of internal coordinates and
normal-mode coordinates Q are related through the linear transformation

$$\mathbf{d} = \mathbf{L}\mathbf{Q}. \tag{A3.12.50}$$

The transformation matrix L is obtained from a normal-mode analysis performed in internal coordinates
[59,60]. Thus, as the evolution of the normal-mode coordinates versus time is evaluated from equation
(A3.12.49), displacements in the internal coordinates and a value for q are found from equation (A3.12.50).
The variation in q with time results from a superposition of the normal modes. At a particular time, the
normal-mode coordinates may phase together so that q exceeds the critical extension $q^{\ddagger}$, at which point
decomposition is assumed to occur.
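
The Slater picture described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Slater's own procedure: each mode is given an amplitude from its energy via E_i = lambda_i (Q_i^0)^2 / 2, one row of the matrix L maps the mode coordinates onto a single internal displacement, and "reaction" is the first grid time at which that displacement reaches a critical value. The function names, the time-grid scan, and all numerical values are hypothetical.

```python
import math

def amplitude_from_energy(mode_energy, nu):
    """Amplitude Q0 of a mode with energy E_i = lambda_i * Q0**2 / 2, lambda_i = (2*pi*nu)**2."""
    return math.sqrt(2.0 * mode_energy) / (2.0 * math.pi * nu)

def internal_displacement(t, amplitudes, freqs, phases, l_row):
    """One row of d = L Q: sum_i L_i * Q0_i * cos(2*pi*nu_i*t - delta_i)."""
    return sum(
        l_i * q0 * math.cos(2.0 * math.pi * nu * t - delta)
        for l_i, q0, nu, delta in zip(l_row, amplitudes, freqs, phases)
    )

def first_crossing_time(amplitudes, freqs, phases, l_row, d_crit, t_max, dt):
    """Scan a time grid; return the first grid time with d(t) >= d_crit, else None."""
    n_steps = int(t_max / dt)
    for step in range(n_steps + 1):
        t = step * dt
        if internal_displacement(t, amplitudes, freqs, phases, l_row) >= d_crit:
            return t
    return None

if __name__ == "__main__":
    # Two hypothetical modes in arbitrary units, started out of phase.
    amps = [1.0, 1.0]
    freqs = [1.0, 1.3]
    phases = [0.0, math.pi]
    print(first_crossing_time(amps, freqs, phases, [1.0, 1.0], 1.9, 50.0, 0.001))
    # A critical extension above sum(|L_i * Q0_i|) can never be reached.
    print(first_crossing_time(amps, freqs, phases, [1.0, 1.0], 2.5, 50.0, 0.001))
```

Because the displacement is bounded by the sum of |L_i Q_i^0|, an initial condition whose bound lies below the critical extension never reacts in this model, whatever the total energy.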

The preceding discussion gives the essential details of the Slater theory. Energy does not flow freely within
the molecule and attaining the critical reaction coordinate extension is not a statistically random process as in
RRKM theory, but depends on the energies and phases of the specific normal modes excited. If a
microcanonical ensemble is chosen at t = 0, Slater theory gives an initial decay rate which agrees with the
RRKM value. However, Slater theory gives rise to intrinsic non-RRKM behaviour [12,13]. The trajectories
are quasiperiodic and each trajectory is restricted to a particular type of motion and region of phase space. 
Thus, as specific trajectories react, other trajectories cannot fill up unoccupied regions of phase space. As a 
result, a microcanonical ensemble is not maintained during the unimolecular decomposition. In addition, some 
of the trajectories may be unreactive and trapped in the reactant region of phase space. 

Overall, the Slater theory is unsuccessful in interpreting experiments. Many unimolecular rate constants and 
reaction paths are consistent with energy flowing randomly within the molecule [4,36]. If one considers the 
nature of classical Hamiltonians for actual molecules, it is not surprising that the Slater theory performs so 
poorly. For example, in Slater theory, the intramolecular and unimolecular dynamics of the molecule conform 
to the symmetry of the molecular vibrations. Thus, if normal modes of a particular symmetry type are excited 
(e.g. in-plane vibrations) a decomposition path of another symmetry type (e.g. out-of-plane dissociation) 
cannot occur. This path requires excitation of out-of-plane vibrations. Normal modes of different symmetry 
types for actual molecules are coupled by Coriolis vibrational-rotational interactions [61]. Similarly, 
nonlinear resonance interactions couple normal modes of vibration, allowing transfer of energy [62,63]. Not 
including these effects is a severe shortcoming of Slater theory. Clearly, understanding the classical 
intramolecular motion of vibrationally excited molecules requires one to go beyond the normal-mode model. 

A3.12.6.2 COUPLED ANHARMONIC HAMILTONIANS


The first classical trajectory study of unimolecular decomposition and intramolecular motion for realistic
anharmonic molecular Hamiltonians was performed by Bunker [12,13]. Both intrinsic RRKM and non-RRKM
dynamics were observed in these studies. Since this pioneering work, there have been numerous
additional studies [9,17,30,64,65,66 and 67] from which two distinct types of intramolecular motion, chaotic 
and quasiperiodic [14], have been identified. Both are depicted in figure A3. 12.7 . Chaotic vibrational motion 
is not regular as predicted by the normal-mode model and, instead, there is energy transfer between the 
modes. If all the modes of the molecule participate in the chaotic motion and energy flow is sufficiently rapid, 
an initial microcanonical ensemble is maintained as the molecule dissociates and RRKM behaviour is 
observed [9]. For non-random excitation initial apparent non-RRKM behaviour is observed, but at longer 
times a microcanonical ensemble of states is formed and the probability of decomposition becomes that of 
RRKM theory.

Figure A3.12.7. Two trajectories for a model HCC Hamiltonian. Top trajectory is for $n_{\mathrm{HC}} = 0$ and $n_{\mathrm{CC}} = 2$, and
is quasiperiodic. Bottom trajectory is for $n_{\mathrm{HC}} = 5$ and $n_{\mathrm{CC}} = 0$, and is chaotic. $R_1$ is the HC bond length and $R_2$
the CC bond length. Distance is in ångströms (Å). (Adapted from [121] and [4].)

Quasiperiodic motion is regular as assumed by the Slater theory. The molecule's vibrational motion may be
represented by a superposition of individual modes, each containing a fixed amount of energy. For some cases
these modes resemble normal modes, but they may be identifiably different. Thus, although actual molecular
Hamiltonians contain potential and kinetic energy coupling terms, they may still exhibit regular vibrational
motion with no energy flow between modes as predicted by the Slater theory. The existence of regular motion
for coupled systems is explained by the Kolmogorov-Arnold-Moser (KAM) theorem [4,68].

Extensive work has been done to understand quasiperiodic and chaotic vibrational motion of molecular 
Hamiltonians [14,58,63,68,69]. At low levels of excitation, quasiperiodic normal mode motion is observed. 
However, as the energy is increased, nonlinear resonances [14,62,63,68] result in the flow of energy between 
the normal modes, giving rise to chaotic trajectories. In general, the fraction of trajectories that are
chaotic at a given energy increases as that energy is raised. With increase in energy the nature of the
quasiperiodic trajectories may undergo a transition from normal mode to another type of motion, e.g. local 
mode [70,71]. In many cases the motion becomes totally chaotic before the unimolecular threshold energy is 
reached, so that the intramolecular dynamics is ergodic. Though this implies intrinsic RRKM behaviour for 
energies above the threshold, the ergodicity must occur on a timescale much shorter than the RRKM lifetime 
for a system to be intrinsically RRKM. 

For some systems quasiperiodic (or nearly quasiperiodic) motion exists above the unimolecular threshold, and 
intrinsic non-RRKM lifetime distributions result. This type of behaviour has been found for Hamiltonians 
with low unimolecular thresholds, widely separated frequencies and/or disparate masses [12,62,65]. Thus, 
classical trajectory simulations performed for realistic Hamiltonians predict that, for some molecules, the 
unimolecular rate constant may be strongly sensitive to the modes excited in the molecule, in agreement with 
the Slater theory. This property is called mode specificity and is discussed in the next section. 

It is of interest to consider the classical/quantal correspondence for the above different types of classical 
motion. If the motion within the classical phase space is ergodic so that the decomposing molecules can be 
described by a microcanonical ensemble, classical RRKM theory will be valid. However, the classical and 
quantal RRKM rate constants may be in considerable disagreement. This results from an incorrect treatment 
of zero-point energy in the classical calculations [17,72] and is the reason quantum statistical mechanics is 
needed to calculate an accurate RRKM rate constant: see the discussion following equation (A3.12.15). With
the energy referenced at the bottom of the well, the total internal energy of the dissociating molecule is $E = E^* + E_z$,
where $E^*$ is the internal energy of the molecule and $E_z$ is its zero-point energy. The classical dissociation
energy is $D_e$ and the energy available to the dissociating molecule at the classical barrier is $E - D_e$. Because
the quantal threshold is $D_e + E_z^{\ddagger}$, where $E_z^{\ddagger}$ is the zero-point energy at the barrier, the classical threshold is
lower than the quantal one by $E_z^{\ddagger}$. For large molecules with a large $E_z^{\ddagger}$ and/or for low levels of excitation the
classical RRKM rate constant is significantly larger than the quantal one. Only in the high-energy limit are
they equal; see figure A3.12.3.

Quasiperiodic trajectories, with an energy greater than the unimolecular threshold, are trapped in the reactant
region of phase space and will not dissociate. These trajectories correspond to quantum mechanical
compound-state resonances $|n\rangle$ (discussed in the next section), which have complex eigenvalues. Applying
semiclassical mechanics to the trajectories [73, 74, 75 and 76] gives energies $E_n$, wavefunctions $\psi_n$, and
unimolecular rate constants $k_n$ for these resonances. A classical microcanonical ensemble for an energized
molecule may consist of quasiperiodic, chaotic, and 'vague tori' trajectories [77]. The lifetimes of trajectories
for the latter may yield correct quantum $k_n$ for resonance states [4].

A3.12.7 STATE-SPECIFIC UNIMOLECULAR DECOMPOSITION

The quantum dynamics of unimolecular decomposition may be studied by solving the time-dependent
Schrödinger equation, i.e. equation (A3.12.2). For some cases the dissociation probability of the molecule is
sufficiently small that one can introduce the concept of quasi-stationary states. Such states are commonly 
referred to as resonances, since the energy of the unimolecular product(s) in the continuum is in resonance 
with (i.e. matches) the energy of a vibrational/rotational level of the unimolecular reactant. For unimolecular 
reactions there are two types of resonance states. A shape resonance occurs when a molecule is temporarily 
trapped by a fairly high and wide potential energy barrier and decomposes by tunnelling. The second type of 
resonance, called a Feshbach or compound-state resonance, arises when energy is initially distributed between 
molecular vibrational/rotational degrees of freedom which are not strongly coupled to the decomposition 
reaction coordinate motion, so that there is a time lag for unimolecular dissociation. 

In a time-dependent picture, resonances can be viewed as localized wavepackets composed of a superposition
of continuum wavefunctions, which qualitatively resemble bound states for a period of time. The
unimolecular reactant in a resonance state moves within the potential energy well for a considerable period of
time, leaving it only when a fairly long time interval $\tau$ has elapsed; $\tau$ may be called the lifetime of the almost
stationary resonance state.

Solving the time-dependent Schrödinger equation for resonance states [78] one obtains a set of complex
eigenvalues, which may be written in the form

$$E = E_n - i\Gamma_n/2 \tag{A3.12.51}$$

where $E_n$ and $\Gamma_n$ are positive constants. The constant $E_n$, the real component of the eigenvalue, gives the
position of the resonance in the spectrum. It is easy to see the physical significance of complex energy values.
The time factor in the wavefunction of a quasi-stationary state is of the form

$$\exp[-(i/\hbar)Et] = \exp[-(i/\hbar)E_n t]\,\exp[-(\Gamma_n/2\hbar)t]. \tag{A3.12.52}$$

Hence, all probabilities given by the squared modulus of the wavefunction decrease as $\exp[-(\Gamma_n/\hbar)t]$ with
time, that is

$$|\psi_n(t)|^2 = |\psi_n(0)|^2 \exp[-(\Gamma_n/\hbar)t]. \tag{A3.12.53}$$

In particular, the probability of finding the unimolecular reactant within its potential energy well decreases
according to this law. Thus $\Gamma_n$ determines the lifetime of the state and the state-specific unimolecular rate
constant is

$$k_n = \Gamma_n/\hbar = 1/\tau_n, \tag{A3.12.54}$$

where $\tau_n$ is the state's lifetime.
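
Since resonance widths are usually quoted in wavenumbers, the relation between width, rate constant and lifetime is convenient to have in numerical form. The sketch below assumes the width is a full width at half-maximum given in cm^-1, so that the energy width is h c times the wavenumber width and the rate constant is 2 pi c times it. The function names and the sample widths are illustrative only.

```python
import math

C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def rate_from_linewidth(width_wavenumber):
    """k = Gamma / hbar with Gamma = h*c*width, i.e. k = 2*pi*c*width (in s^-1)."""
    return 2.0 * math.pi * C_CM_PER_S * width_wavenumber

def lifetime_from_linewidth(width_wavenumber):
    """tau = 1 / k = hbar / Gamma (in s)."""
    return 1.0 / rate_from_linewidth(width_wavenumber)

if __name__ == "__main__":
    # Hypothetical widths in cm^-1; a 1 cm^-1 width corresponds to roughly 5.3 ps.
    for width in (0.1, 1.0, 10.0):
        print(width, rate_from_linewidth(width), lifetime_from_linewidth(width))
```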

The energy spectrum of the resonance states will be quasi-discrete; it consists of a series of broadened levels
with Lorentzian lineshapes whose full-width at half-maximum $\Gamma$ is related to the lifetime by $\Gamma = \hbar/\tau$. The
resonances are said to be isolated if the widths of their levels are small compared with the distances (spacings)
between them, that is

$$\Gamma_n \ll 1/\rho(E) \tag{A3.12.55}$$

where $\rho(E)$ is the density of states for the energized molecule. A possible absorption spectrum for a molecule
with isolated resonances is shown in figure A3.12.8. Below the unimolecular threshold $E_0$, the absorption
lines for the molecular eigenstates are very narrow and are only broadened by interaction of the excited
molecule with the radiation field. However, above $E_0$ the excited states leak toward product space, which
gives rise to widths for the resonances in the spectrum. Each resonance has its own characteristic width (i.e.
lifetime). As the linewidths broaden and/or the number of resonance states in an energy interval increases, the
spectrum may no longer be quasi-discrete since the resonance lines may overlap, that is

$$\Gamma_n \gg 1/\rho(E). \tag{A3.12.56}$$

It is of interest to determine when the linewidth $\Gamma(E)$ associated with the RRKM rate constant k(E) equals the
average distance $\rho(E)^{-1}$ between the reactant energy levels. From equation (A3.12.54) $\Gamma(E) = \hbar k(E)$ and from
the RRKM rate constant expression equation (A3.12.15) $\rho(E)^{-1} = 2\pi\hbar\,k(E)/N^{\ddagger}(E - E_0)$. Equating these two
terms gives $N^{\ddagger}(E - E_0) = 2\pi$, which means that the linewidths, associated with RRKM decomposition, begin
to overlap when the transition state's sum of states exceeds six.
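
The overlap criterion can be written out in two steps. The lines below are a worked restatement of the argument in the text, assuming the standard RRKM form k(E) = N‡(E - E_0)/[h ρ(E)]; no new result is introduced.

```latex
\begin{align}
\Gamma(E) &= \hbar\,k(E)
           = \hbar\,\frac{N^{\ddagger}(E - E_0)}{h\,\rho(E)}
           = \frac{N^{\ddagger}(E - E_0)}{2\pi\,\rho(E)}, \\
\Gamma(E) = \rho(E)^{-1}
  &\;\Longrightarrow\;
  N^{\ddagger}(E - E_0) = 2\pi \approx 6.28 .
\end{align}
```

Since the sum of states takes integer values, the average RRKM width first exceeds the average level spacing once seven transition state levels are open.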


Figure A3.12.8. Possible absorption spectrum for a molecule which dissociates via isolated compound-state
resonances. $E_0$ is the unimolecular threshold. (Adapted from [4].)


The theory of isolated resonances is well understood and is discussed below. Mies and Krauss [79,80] and
Rice [81] were pioneers in treating unimolecular rate theory in terms of the decomposition of isolated
Feshbach resonances.

A3.12.7.1 ISOLATED REACTANT RESONANCE STATES MODE SPECIFICITY

The observation of decomposition from isolated compound-state resonances does not necessarily imply
mode-specific unimolecular decomposition. Nor is mode specificity established by the presence of fluctuations in
state-specific rate constants for resonances within a narrow energy interval. What is required for
mode-specific unimolecular decomposition is a distinguishable and, thus, assignable pattern (or patterns) in the
positions of resonance states in the spectrum. Identifying such patterns in a spectrum allows one to determine 
which modes in the molecule are excited when forming the resonance state. It is, thus, possible to interpret 
particularly large or small state-specific rate constants in terms of mode-specific excitations. Therefore, mode 
specificity means there are exceptionally large or small state-specific rate constants depending on which 
modes are excited. 

The ability to assign a group of resonance states, as required for mode-specific decomposition, implies that
the complete Hamiltonian for these states is well approximated by a zero-order Hamiltonian with
eigenfunctions $\phi_i(m)$ [58]. The $\phi_i$ are product functions of a zero-order orthogonal basis for the reactant
molecule and the quantity m represents the quantum numbers defining $\phi_i$. The wavefunctions $\psi_n$ for the
compound state resonances are given by

$$\psi_n = \sum_i c_{in}\,\phi_i(m). \tag{A3.12.57}$$

Resonance states in the spectra, which are assignable in terms of the zero-order basis $\phi_i(m)$, will have a
predominant expansion coefficient $c_{in}$. Hose and Taylor [58] have argued that for an assignable level $c_{in}^2 > 0.5$
for one of the expansion coefficients. The quasiperiodic and 'vague tori' trajectories for energies above the
unimolecular threshold, discussed in the previous section, are the classical analogue of these quantum mode
specific resonance states.

Mode specificity has been widely observed in the unimolecular decomposition of van der Waals molecules
[82], e.g.

HF-HF → 2HF.

A covalent bond (or particular normal mode) in the van der Waals molecule (e.g. the I$_2$ bond in I$_2$-He) can be
selectively excited, and what is usually observed experimentally is that the unimolecular dissociation rate
constant is orders of magnitude smaller than the RRKM prediction. This is thought to result from weak
coupling between the excited high-frequency intramolecular mode and the low-frequency van der Waals
intermolecular modes [83]. This coupling may be highly mode specific. Exciting the two different HF stretch
modes in the (HF)$_2$ dimer with one quantum results in lifetimes which differ by a factor of 24 [84]. Other van
der Waals molecules studied include (NO)$_2$ [85], NO-HF [86], and (C$_2$H$_4$)$_2$ [87].

There are fewer experimental examples of mode specificity for the unimolecular decomposition of covalently
bound molecules. One example is the decomposition of the formyl radical HCO, namely

HCO → H + CO.

Well defined progressions are seen in the stimulated emission pumping spectrum (SEP) [88,89] so that
quantum numbers may be assigned to the HCO resonance states, and lifetimes for these states may be
associated with the degree of excitation in the HC stretch, CO stretch and HCO bend vibrational modes,
denoted by quantum numbers $v_1$, $v_2$, and $v_3$, respectively. States with large $v_1$ and large excitations in the HC
stretch have particularly short lifetimes, while states with large $v_2$ and large excitations in the CO stretch have
particularly long lifetimes. Short lifetimes for states with a large $v_1$ might be expected, since the reaction
coordinate for dissociation is primarily HC stretching motion. The mode specific effects are illustrated by the
nearly isoenergetic ($v_1$, $v_2$, $v_3$) resonance states (0, 4, 5), (1, 5, 1) and (0, 7, 0) whose respective energies (i.e.
position in spectrum) are 12373, 12487 and 12544 cm$^{-1}$ and whose respective linewidths $\Gamma$ are 42, 55 and
0.72 cm$^{-1}$.

Time-dependent quantum mechanical calculations have also been performed to study the HCO resonance 
states [90,91]. The resonance energies, linewidths and quantum number assignments determined from these 
calculations are in excellent agreement with the experimental results. 

Mode specificity has also been observed for HOCl → Cl + OH dissociation [92, 93 and 94]. For this system,
many of the states are highly mixed and unassignable (see below). However, resonance states with most of the
energy in the OH bond, e.g. $v_{\mathrm{OH}} = 6$, are assignable and have unimolecular rate constants orders of magnitude
smaller than the RRKM prediction [92, 93 and 94]. The lifetimes of these resonances have a very strong
dependence on the J and K quantum numbers of HOCl.

(A) STATISTICAL STATE SPECIFICITY 

In contrast to resonance states which may be assigned quantum numbers and which may exhibit
mode-specific decomposition, there are states which are intrinsically unassignable. Because of extensive couplings,
a zero-order Hamiltonian and its basis set cannot be found to represent the wavefunctions $\psi_n$ for these states.
The spectrum for these states is irregular without patterns, and fluctuations in the $k_n$ are related to the manner
in which the $\psi_n$ are randomly distributed in coordinate space. Thus, the states are intrinsically unassignable
and have no good quantum numbers apart from the total energy and angular momentum. Energies for these
resonance states do not fit into a pattern, and states with particularly large or small rate constants are simply
random occurrences in the spectrum. For the most statistical (i.e. non-separable) situation, the expansion
coefficients in equation (A3.12.57) are random variables, subject only to the normalization and orthogonality
conditions

$$\sum_i c_{in}^2 = 1 \quad\text{and}\quad \sum_i c_{in}c_{im} = 0 \ (n \neq m). \tag{A3.12.58}$$


If all the resonance states which form a microcanonical ensemble have random $\psi_n$, and are thus intrinsically
unassignable, a situation arises which is called statistical state-specific behaviour [95]. Since the
wavefunction coefficients of the $\psi_n$ are Gaussian random variables when projected onto $\phi_i$ basis functions for
any zero-order representation [96], the distribution of the state-specific rate constants $k_n$ will be as statistical
as possible. If these $k_n$ within the energy interval E → E + dE form a continuous distribution, Levine [92] has
argued that the probability of a particular k is given by the Porter-Thomas [98] distribution

$$P(k) = \frac{\nu}{2\bar{k}} \left(\frac{\nu k}{2\bar{k}}\right)^{(\nu-2)/2} \frac{\exp(-\nu k/2\bar{k})}{\Gamma(\nu/2)} \tag{A3.12.59}$$

where $\bar{k}$ is the average state-specific unimolecular rate constant within the energy interval E → E + dE,

$$\bar{k} = \int_0^\infty k\,P(k)\,\mathrm{d}k \tag{A3.12.60}$$

and $\nu$ is the 'effective number of decay channels'. Equation (A3.12.59) is derived in statistics as the
probability distribution of

$$\chi^2 = x_1^2 + x_2^2 + \cdots + x_\nu^2 \tag{A3.12.61}$$

where the $\nu$ $x_i$ are each independent Gaussian distributions [96]. Increasing $\nu$ reduces the variance of the
distribution P(k).
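
The chi-square origin of the Porter-Thomas distribution can be illustrated numerically. The sketch below is an assumption-laden illustration: it takes the distribution to be a chi-square law with ν degrees of freedom rescaled to mean k̄, samples it by summing ν squared standard normal deviates (integer ν only), and compares with the closed-form density. The function names and parameter values are hypothetical.

```python
import math
import random

def porter_thomas_pdf(k, k_mean, nu):
    """Chi-square density with nu degrees of freedom, rescaled to have mean k_mean."""
    a = nu / (2.0 * k_mean)
    return a * (a * k) ** ((nu - 2.0) / 2.0) * math.exp(-a * k) / math.gamma(nu / 2.0)

def sample_rate(k_mean, nu, rng):
    """Draw one rate constant: k_mean/nu times a sum of nu squared standard normals."""
    chi_square = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return k_mean * chi_square / nu

def sample_moments(k_mean, nu, n_samples, seed=0):
    """Return the sample mean and variance of n_samples simulated rate constants."""
    rng = random.Random(seed)
    values = [sample_rate(k_mean, nu, rng) for _ in range(n_samples)]
    mean = sum(values) / n_samples
    variance = sum((v - mean) ** 2 for v in values) / n_samples
    return mean, variance

if __name__ == "__main__":
    # For this distribution the variance is 2*k_mean**2/nu, so it narrows as nu grows.
    for nu in (2, 4, 16):
        print(nu, sample_moments(1.0, nu, 20000), 2.0 / nu)
```

The sampled variance falling roughly as 2k̄²/ν is the quantitative form of the statement that increasing ν narrows P(k).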

The connection between the Porter-Thomas P(k) distribution and RRKM theory is made through the
parameters $\bar{k}$ and $\nu$. Waite and Miller [99] have studied the relationship between the average of the statistical
state-specific rate constants $\bar{k}$ and the RRKM rate constant k(E) by considering a separable (uncoupled)
two-dimensional Hamiltonian, $H = H_x + H_y$, whose decomposition path is tunnelling through a potential energy
barrier along the x-coordinate. They found that the average of the state-specific rate constants for a
microcanonical ensemble $\bar{k}$ is the same as the RRKM rate constant k(E). Though insightful, this is not a
general result since the tunnelling barrier defines the dividing surface, with no recrossings, which is needed to
derive RRKM from classical (not quantum) mechanical principles (see section A3.12.3). For state-specific
decomposition which does not occur by tunnelling, a dividing surface cannot be constructed for a quantum
calculation. However, the above analysis is highly suggestive that $\bar{k}$ may be a good approximation to the
RRKM k(E).

The parameter $\nu$ in equation (A3.12.59) has also been related to RRKM theory. Polik et al [80] have shown
that for decomposition by quantum mechanical tunnelling

$$\nu = \frac{\left[\sum_n \kappa(E - E_n^{\ddagger})\right]^2}{\sum_n \kappa^2(E - E_n^{\ddagger})} \tag{A3.12.62}$$

where $\kappa(E - E_n^{\ddagger})$ is a one-dimensional tunnelling probability through a potential barrier and $E_n^{\ddagger}$ is the
vibrational energy in the 3N - 7 modes orthogonal to the tunnelling coordinate. If the energy is sufficiently
low that all the tunnelling probabilities are much less than 1 and one makes a parabolic approximation to the
tunnelling probabilities [96,100], equation (A3.12.62) becomes

$$\nu = \prod_{k=1}^{3N-7} \coth(\pi\omega_k/\omega_b) \tag{A3.12.63}$$

where the $\omega_k$ are the 3N - 7 frequencies for the modes orthogonal to the tunnelling coordinate and $\omega_b$ is the
barrier frequency. The interesting aspect of equation (A3.12.63) is that it shows $\nu$ to be energy independent in
the tunnelling region. On the other hand, for energies significantly above the barrier so that $\kappa(E - E_n^{\ddagger}) = 1$, it is
easy to show [96,100] that

$$\nu = N^{\ddagger}(E) \tag{A3.12.64}$$


where $N^{\ddagger}(E)$ is the sum of states for the transition state. In this energy region, $\nu$ rapidly increases with
increase in energy and the P(k) distribution becomes more narrowly peaked. Statistical state-specific
behaviour has been observed in experimental SEP studies of H$_2$CO → H$_2$ + CO dissociation [44,48] and
quantum mechanical scattering calculations of HO$_2$ → H + O$_2$ dissociation [101,102]. The state-specific rate
constants for the latter calculation are plotted in figure A3.12.9. For both of these dissociations the RRKM
rate constant and the average of the state-specific quantum rate constants for a small energy interval $\Delta E$ are in
good agreement. Similarly, the fluctuations in the resonance rate constants are well represented by the
Porter-Thomas distribution. That HO$_2$ dissociation is statistical state-specific, while HCO dissociation is mode
specific, is thought to arise from the deeper potential energy well and associated greater couplings and density
of states for HO$_2$.
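
The two limits of the effective number of decay channels can be checked with a short calculation. The sketch below assumes the general form ν = (Σκ)² / Σκ² for a set of channel transmission probabilities, and, for the deep-tunnelling limit, that a parabolic barrier gives probabilities falling off as exp(-2πnω/ω_b) with the number of quanta n in an orthogonal mode. The function names and the frequencies used are hypothetical.

```python
import math

def nu_from_probabilities(kappas):
    """Effective number of channels, (sum kappa)**2 / sum(kappa**2)."""
    first_moment = sum(kappas)
    second_moment = sum(x * x for x in kappas)
    return first_moment * first_moment / second_moment

def nu_tunnelling_limit(omegas, omega_b):
    """Deep-tunnelling limit: product over orthogonal modes of coth(pi*omega/omega_b)."""
    product = 1.0
    for omega in omegas:
        product *= 1.0 / math.tanh(math.pi * omega / omega_b)
    return product

if __name__ == "__main__":
    # One orthogonal mode (hypothetical frequencies in cm^-1): each extra quantum
    # lowers the energy in the tunnelling coordinate and scales kappa by x.
    omega, omega_b = 1000.0, 1500.0
    x = math.exp(-2.0 * math.pi * omega / omega_b)
    kappas = [x ** n for n in range(200)]
    print(nu_from_probabilities(kappas), nu_tunnelling_limit([omega], omega_b))
    # Well above the barrier every open channel has kappa = 1, so nu counts channels.
    print(nu_from_probabilities([1.0] * 5))
```

In the tunnelling limit the result does not depend on the overall prefactor of the probabilities, which is why ν is energy independent there, while above the barrier it simply counts open channels.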
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Figure A3. 12.9. Comparison of the unimolecular dissociation rates for H0 2 ^H+0 2 as obtained from the 
quantum mechanical resonances (k , open circles) and from variational transition state RRKM (^rrjq^ step 
function). E^ is the threshold energy for dissociation. Also shown is the quantum mechanical average (solid 
line) as well as the experimental prediction for J=0 derived from a simplified SACM analysis of high pressure 
unimolecular rate constants. (Adapted from [ 101 ].) 
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A microcanonical ensemble of isolated resonances decays according to 


(A3. 12.65) 


n 

If the state-specific rate constants are assumed continuous, equation (A3. 12.65) can be written as [ 103 ] 

NU.E) = No j exp{-k!)Pik) <l(k) (A3.12.66) 

where Nq is the total number of molecules in the microcanonical ensemble. For the Porter-Thomas P(k) 
distribution, N(t, E) becomes [ 103 , 104 ] 

N(f % E)/Nq = (1 + 2lf/n)" 1,/3 . (A3.12.67) 

The expression for N(t, E) in equation (A3.12.67) has been used to study [103, 104] how the Porter-Thomas
P(k) affects the collision-averaged monoenergetic unimolecular rate constant k(ω, E) [105] and the
Lindemann-Hinshelwood unimolecular rate constant $k_{\mathrm{uni}}(\omega, T)$ [47]. The Porter-Thomas P(k) makes k(ω, E) pressure
dependent [103]. It equals $\bar{k}$ in the high-pressure ω → ∞ limit and $[(\nu-2)/\nu]\bar{k}$ in the ω → 0 low-pressure limit.

P(k) only affects $k_{\mathrm{uni}}(\omega, T)$ in the intermediate pressure regime [40, 104], and has no effect on the high- and
low-pressure limits. This type of analysis has been applied to HO₂ → H + O₂ resonance states [106], which
decay in accord with the Porter-Thomas P(k). Deviations between the $k_{\mathrm{uni}}(\omega, T)$ predicted by the
Porter-Thomas and exponential P(k) are more pronounced for the model in which the rotational quantum number K
is treated as adiabatic than the one with K active.
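
The non-exponential decay implied by a Porter-Thomas distribution can be checked numerically. The sketch below is illustrative only: it assumes the usual chi-squared form of P(k) with mean rate `k_mean` and `nu` degrees of freedom, integrates exp(-kt)P(k) with a simple trapezoidal rule, and compares the result with the closed form $(1 + 2\bar{k}t/\nu)^{-\nu/2}$. The function names and the parameter values are hypothetical choices made for this example.

```python
import math


def porter_thomas_pdf(k, k_mean, nu):
    """Porter-Thomas (chi-squared type) density P(k) with mean k_mean and nu degrees of freedom."""
    shape = nu / 2.0
    rate = nu / (2.0 * k_mean)
    return rate**shape * k ** (shape - 1.0) * math.exp(-rate * k) / math.gamma(shape)


def survival_numeric(t, k_mean, nu, k_max_factor=40.0, n_points=20000):
    """N(t,E)/N0 from numerical integration of exp(-k t) P(k) over k.

    A plain trapezoidal rule is used. For nu > 2 the integrand vanishes at
    k = 0 and is negligible at the upper cut-off, so only interior grid
    points contribute to the sum.
    """
    h = k_max_factor * k_mean / n_points
    total = 0.0
    for i in range(1, n_points):
        k = i * h
        total += math.exp(-k * t) * porter_thomas_pdf(k, k_mean, nu)
    return total * h


def survival_closed_form(t, k_mean, nu):
    """Closed form (1 + 2 k_mean t / nu)^(-nu/2)."""
    return (1.0 + 2.0 * k_mean * t / nu) ** (-nu / 2.0)


def low_pressure_rate(k_mean, nu):
    """Low-pressure (omega -> 0) limit [(nu - 2)/nu] * k_mean, meaningful for nu > 2."""
    return (nu - 2.0) / nu * k_mean


if __name__ == "__main__":
    # Compare the ensemble decay with a single exponential of the same mean rate.
    for t in (0.0, 0.5, 1.5, 5.0):
        print(t, survival_numeric(t, 1.0, 4), survival_closed_form(t, 1.0, 4), math.exp(-t))
```

At long times the ensemble decays more slowly than exp(-k̄t), because the slowly decaying resonances dominate the surviving population. The same weighting towards small k is what lowers the low-pressure limit of k(ω, E) below k̄.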

A3.12.7.2 INDIVIDUAL TRANSITION STATE LEVELS

The prediction of RRKM theory is that at low energies, where $N^{\ddagger}(E)$ is small, there are incremental increases
in $N^{\ddagger}(E)$, resulting in steps in k(E). The minimum rate constant is at the threshold where $N^{\ddagger}(E) = 1$, i.e.
$k(E_0) = 1/[h\rho(E_0)]$. Steps are then expected in k(E) as $N^{\ddagger}(E)$ increases by unit amounts. This type of behaviour has
been observed in experiments for NO₂ → NO + O [107, 108], CH₂CO → CH₂ + CO [44] and
CH₃CHO → CH₃ + HCO [109] dissociation. These experiments do not directly test the rapid IVR assumption of
RRKM theory, since steps are expected in $N^{\ddagger}(E)$ even if all the states of the reactant do not participate in ρ(E).
However, if the measured threshold rate constant $k(E_0)$ equals 1/h times the inverse of the accurate anharmonic density
of states for the reactant (difficult to determine), RRKM theory is verified.
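
The origin of the predicted steps can be illustrated with a direct count of transition-state levels. The following sketch assumes separable harmonic oscillators and uses the Beyer-Swinehart counting scheme; the two transition-state frequencies (300 and 500 cm⁻¹) and the reactant density of states used in the usage lines are hypothetical values chosen only to make the steps visible, not data for any of the molecules discussed here.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def sum_of_states(freqs_cm, e_max_cm, grain_cm=1):
    """Beyer-Swinehart direct count for separable harmonic oscillators.

    Returns a list whose element i is the number of vibrational states with
    energy <= i * grain_cm, measured from the zero-point level.
    """
    n_bins = int(e_max_cm // grain_cm) + 1
    counts = [0] * n_bins
    counts[0] = 1  # the vibrational ground state
    for freq in freqs_cm:
        step = int(round(freq / grain_cm))
        for i in range(step, n_bins):
            counts[i] += counts[i - step]
    # Convert the per-bin state count into a cumulative sum of states.
    sums = []
    running = 0
    for c in counts:
        running += c
        sums.append(running)
    return sums


def rrkm_rate(n_ts, rho_per_cm):
    """RRKM rate constant k = N_ts / (h rho) in s^-1.

    rho_per_cm is the reactant density of states per cm^-1; with energies
    expressed as wavenumbers, 1/(h rho) becomes c / rho_per_cm.
    """
    return n_ts * C_CM_PER_S / rho_per_cm


if __name__ == "__main__":
    n_ts = sum_of_states([300, 500], 1000)
    rho = 1000.0  # hypothetical reactant density of states, per cm^-1
    for excess in (0, 299, 300, 500, 600, 800, 1000):
        print(excess, n_ts[excess], rrkm_rate(n_ts[excess], rho))
```

The rate constant stays at its threshold value until the first transition-state level opens at 300 cm⁻¹ and then rises by one unit of $1/[h\rho]$ at each new level. A real density of states also grows with energy, which this sketch ignores.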

If properly interpreted [110], the above experiments provide information about the energy levels of the
transition state (see figure A3.12.10). For NO₂ → NO + O dissociation, there is no barrier for the reverse
association reaction, and it has been suggested that the steps in its k(E) may arise from quantization of the
transition state's O–NO bending mode [107, 108]. Ketene (CH₂CO) dissociates on both singlet and triplet
surfaces. The triplet surface has a saddlepoint, at which the transition state is located, and the steps in k(E) for
this surface are thought to result from excitation in the transition state's CH₂ wag and C–CO bending
vibrations [44]. The singlet ketene surface does not have a barrier for the reverse ¹CH₂ + CO association and
the small steps in k(E) for dissociation on this surface are attributed to CO free
rotor energy levels for a loose transition state [44]. The steps for acetaldehyde dissociation [109] have been
associated with the torsional and C–CO bending motions at the transition state.


Figure A3.12.10. Schematic diagram of the one-dimensional reaction coordinate and the energy levels
perpendicular to it in the region of the transition state (vertical axis: energy E; horizontal axis: reaction
coordinate). As the molecule's energy is increased, the number of states perpendicular to the reaction
coordinate increases, thereby increasing the rate of reaction. (Adapted from [4].)

Detailed analyses of the above experiments suggest that the apparent steps in k(E) may not arise from 
quantized transition state energy levels [ 110 , 111 ]. Transition state models used to interpret the ketene and 
acetaldehyde dissociation experiments are not consistent with the results of high-level ab initio calculations 
[110, 111]. The steps observed for NO₂ dissociation may originate from the opening of electronically excited
dissociation channels [ 107 , 108 ]. It is also of interest that RRKM-like steps in k(E) are not found from detailed 
quantum dynamical calculations of unimolecular dissociation [ 91 , 101 , 102 , 112 ]. More studies are needed of 
unimolecular reactions near threshold to determine whether there are actual quantized transition states and 
steps in k(E) and, if not, what is the origin of the apparent steps in the above measurements of k(E). 


A3.12.8 EXAMPLES OF NON-RRKM DECOMPOSITION 

A3.12.8.1 APPARENT NON-RRKM

Apparent non-RRKM behaviour occurs when the molecule is excited non-randomly and there is an initial
non-RRKM decomposition before IVR forms a microcanonical ensemble (see section A3.12.2). Reaction
pathways that have non-competitive RRKM rates may be promoted in this way. Classical trajectory
simulations were used in early studies of apparent non-RRKM dynamics [113, 114].

To detect the initial apparent non-RRKM decay, one has to monitor the reaction at short times. This can be
performed by studying the unimolecular decomposition at high pressures, where collisional stabilization
competes with the rate of IVR. The first successful detection of apparent non-RRKM behaviour was
accomplished by Rabinovitch and co-workers [115], who used chemical activation to prepare vibrationally
excited hexafluorobicyclopropyl-d₂:


(Reaction scheme: chemically activated hexafluorobicyclopropyl-d₂, with one ring containing CH₂ and the other
CD₂, decomposing by loss of CF₂.)


The molecule decomposes by elimination of CF₂, which should occur with equal probabilities from each ring
when energy is randomized. However, at pressures in excess of 100 Torr there is a measurable increase in the
fraction of decomposition in the ring that was initially excited. From an analysis of the relative product yield
versus pressure, it was deduced that energy flows between the two cyclopropyl rings with a rate of only 3×10⁹
s⁻¹. In a related set of experiments Rabinovitch et al [116] studied the series of chemically activated
fluoroalkyl cyclopropanes:



R = CF₃, C₃F₇, C₅F₁₁.


The chemically activated molecules are formed by reaction of CH₂ with the appropriate fluorinated alkene.
In all these cases apparent non-RRKM behaviour was observed. As displayed in figure A3.12.11 the
measured unimolecular rate constants are strongly dependent on pressure. The large rate constant at high
pressure reflects an initial excitation of only a fraction of the total number of vibrational modes, i.e. initially
the molecule behaves as if it were smaller than its total size. However, as the pressure is decreased, there is time for IVR
to compete with dissociation, energy is distributed between a larger fraction of the vibrational modes and
the rate constant decreases. At low pressures each rate constant approaches the RRKM value.

Figure A3.12.11. Chemical activation unimolecular rate constants versus ω for fluoroalkyl cyclopropanes.
The □, ○ and △ points are for R = CF₃, C₃F₇ and C₅F₁₁, respectively. (Adapted from [116].)


Apparent non-RRKM dynamics has also been observed in time-resolved femtosecond (fs) experiments in a
collision-free environment [117]. An experimental study of acetone illustrates this work. Acetone is
dissociated to the CH₃ and CH₃CO (acetyl) radicals by a fs laser pulse. The latter, which dissociates by the
channel

CH₃CO → CO + CH₃

is followed in real time by fs mass spectrometry to measure its unimolecular rate constant. It is found to be 

2x10 s and -10 times smaller than the RRKM value, which indicates the experimental excitation process 
does not put energy in the C–C reaction coordinate; the rate constant value and short timescale reflect
restricted IVR and non-RRKM kinetics.

A3.12.8.2 INTRINSIC NON-RRKM

As discussed in section A3.12.2, intrinsic non-RRKM behaviour occurs when there is at least one bottleneck
for transitions between the reactant molecule's vibrational states, so that IVR is slow and a microcanonical
ensemble over the reactant's phase space is not maintained during the unimolecular reaction. The above
discussion of mode-specific decomposition illustrates that there are unimolecular reactions which are
intrinsically non-RRKM. Many van der Waals molecules behave in this manner [4, 82]. For example, in an
initial microcanonical ensemble for the (C₂H₄)₂ van der Waals molecule both the C₂H₄···C₂H₄
intermolecular modes and C₂H₄ intramolecular modes are excited with equal probabilities. However, this
microcanonical ensemble is not maintained as the dimer dissociates. States with energy in the intermolecular
modes react more rapidly than do those with the C₂H₄ intramolecular modes excited [85].
Furthermore, IVR is not rapid between the C₂H₄ intramolecular modes and different excitation patterns of
these modes result in different dissociation rates. As a result of these different timescales for dissociation, the
relative populations of the vibrational modes of the C₂H₄ dimer change with time.

Similar behaviour is observed in both experiments and calculations for HCO → H + CO dissociation [88, 89, 90
and 91] and in calculations for the X⁻···CH₃Y ion-dipole complexes, which participate in S_N2 nucleophilic
substitution reactions [118]. HCO states with HC excitation dissociate more rapidly than do those with CO
excitation and, thus, the relative population of HC to CO excitation decreases with time. The unimolecular
dynamics of the X⁻···CH₃Y complex is similar to that for van der Waals complexes. There is weak coupling
between the X⁻···CH₃Y intermolecular modes and the CH₃Y intramolecular modes, and the two sets of
modes react on different timescales.

Definitive examples of intrinsic non-RRKM dynamics for molecules excited near their unimolecular
thresholds are rather limited. Calculations have shown that intrinsic non-RRKM dynamics becomes more
pronounced at very high energies, where the RRKM lifetime becomes very short and dissociation begins to
compete with IVR [119]. There is a need for establishing quantitative theories (i.e. not calculations) for
identifying which molecules and energies lead to intrinsic non-RRKM dynamics. For example, at thermal
energies the unimolecular dynamics of the Cl⁻···CH₃Cl complex is predicted to be intrinsically non-RRKM
[118], while experiments have shown that simply replacing one of the H-atoms of CH₃Cl with a CN group
leads to intrinsic RRKM dynamics for the Cl⁻···ClCH₂CN complex [120]. This difference is thought to arise
from a deeper potential energy well and less of a separation between vibrational frequencies for the
Cl⁻···ClCH₂CN complex. For the Cl⁻···CH₃Cl complex the three intermolecular vibrational frequencies are 64(2)
and 95 cm⁻¹, while the lowest intramolecular frequency is the C–Cl stretch of 622 cm⁻¹ [118]. Thus, very
high-order resonances are required for energy transfer from the intermolecular to intramolecular modes. In
contrast, for the Cl⁻···ClCH₂CN complex there is less of a hierarchy of frequencies, with 44, 66 and 118 cm⁻¹
for the intermolecular modes and 207, 367, 499, 717, ... for the intramolecular ones [120]. Here there are
low-order resonances which couple the intermolecular and intramolecular modes. It would be very useful if one
could incorporate such molecular properties into a theoretical model to predict intrinsic RRKM and
non-RRKM behaviour.
A3.13 Energy redistribution in reacting systems

Roberto Marquardt and Martin Quack 


A3.13.1 INTRODUCTION

Energy redistribution is the key primary process in chemical reaction systems, as well as in reaction systems 
quite generally (for instance, nuclear reactions). This is because many reactions can be separated into two 
steps: 

(a) activation of the reacting species R, generating an energized species R*:

$$\mathrm{R} \rightarrow \mathrm{R}^* \tag{A3.13.1}$$

(b) reaction of the energized species to give products.

$$\mathrm{R}^* \rightarrow \mathrm{P}. \tag{A3.13.2}$$

The first step (A3. 13.1) is a general process of energy redistribution, whereas the second step (A3. 13.2) is the 
genuine reaction step, occurring with a specific rate constant at energy E. This abstract reaction scheme can 
take a variety of forms in practice, because both steps may follow a variety of quite different mechanisms. For 
instance, the reaction step could be a barrier crossing of a particle, a tunnelling process or a nonadiabatic 
crossing between different potential hypersurfaces to name just a few important examples in chemical 
reactions. 

The first step, which is the topic of the present chapter, can again follow a variety of different mechanisms. 
For instance, the energy transfer could happen within a molecule, say from one initially excited chemical 
bond to another, or it could involve radiative transfer. Finally, the energy transfer could involve a collisional 
transfer of energy between different atoms or molecules. All these processes have been recognized to be 
important for a very long time. The basic idea of collisional energization as a necessary primary step in 
chemical reactions can be found in the early work of van't Hoff [1] and Arrhenius [2, 3], leading to the 
famous Arrhenius equation for thermal chemical reactions (see also chapter A3.4)

$$k(T) = A(T)\exp\left(-\frac{E_{\mathrm{A}}(T)}{RT}\right). \tag{A3.13.3}$$

This equation results from the assumption that the actual reaction step in thermal reaction systems can happen
only in molecules (or collision pairs) with an energy exceeding some threshold energy $E_0$ which is close, in
general, to the Arrhenius activation energy defined by equation (A3.13.3). Radiative energization is at the
basis of classical photochemistry (see e.g. [4, 3 and 7] and chapter B2.5) and historically has had an
interesting sideline in the radiation theory of unimolecular reactions [8], which was later superseded by the
collisional Lindemann mechanism [9]. Recently, radiative energy redistribution has received new impetus
through coherent and incoherent multiphoton excitation [10].

In this chapter we shall first outline the basic concepts of the various mechanisms for energy redistribution, 
followed by a very brief overview of collisional intermolecular energy transfer in chemical reaction systems. 
The main part of this chapter deals with true intramolecular energy transfer in polyatomic molecules, which is 
a topic of particular current importance. Stress is placed on basic ideas and concepts. It is not the aim of this 
chapter to review in detail the vast literature on this topic; we refer to some of the key reviews and books [ 11 , 
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, and 32] and the literature cited 
therein. These cover a variety of aspects of the topic and further, more detailed references will be given 
throughout this review. We should mention here the energy transfer processes, which are of fundamental 
importance but are beyond the scope of this review, such as electronic energy transfer by mechanisms of the 
Förster type [33, 34] and related processes.


A3.13.2 BASIC CONCEPTS FOR INTER- AND INTRAMOLECULAR
ENERGY TRANSFER

The processes summarized by equation (A3. 13.1) can follow quite different mechanisms and it is useful to 
classify them and introduce the appropriate nomenclature as well as the basic equations. 

A3.13.2.1 PROCESSES INVOLVING INTERACTION WITH THE ENVIRONMENT (BIMOLECULAR AND RELATED) 

(a) The first mechanism concerns bimolecular, collisional energy transfer between two molecules or atoms 
and molecules. We may describe such a mechanism by 


$$\mathrm{M} + \mathrm{R} \rightarrow \mathrm{M}' + \mathrm{R}^* \tag{A3.13.4}$$

or more precisely by defining quantum energy levels for both colliding species, e.g.

$$\{\mathrm{M}(E_{\mathrm{Mi}}) + \mathrm{R}(E_{\mathrm{Ri}})\}_{\mathrm{I}} \rightarrow \{\mathrm{M}(E_{\mathrm{Mf}}) + \mathrm{R}(E_{\mathrm{Rf}})\}_{\mathrm{F}}. \tag{A3.13.5}$$

This is clearly a process of intermolecular energy transfer, as energy is transferred between two molecular
species. Generally one may, following section A3.4.5, combine the quantum labels of M and R into one level
index (I for initial and F for final) and define a cross section $\sigma_{\mathrm{FI}}$ for this energy transfer. The specific rate
constant $k_{\mathrm{FI}}(E_{t,\mathrm{I}})$ for the energy transfer with the collision energy $E_{t,\mathrm{I}}$ is given by:

$$k_{\mathrm{FI}}(E_{t,\mathrm{I}}) = \sigma_{\mathrm{FI}}(E_{t,\mathrm{I}})\sqrt{\frac{2E_{t,\mathrm{I}}}{\mu}} \tag{A3.13.6}$$

with the reduced mass:

$$\mu = \frac{m_{\mathrm{M}}\,m_{\mathrm{R}}}{m_{\mathrm{M}} + m_{\mathrm{R}}}. \tag{A3.13.7}$$

We note that, by energy conservation, the following equation must hold:

$$E_{\mathrm{Mi}} + E_{\mathrm{Ri}} + E_{t,\mathrm{I}} = E_{\mathrm{Mf}} + E_{\mathrm{Rf}} + E_{t,\mathrm{F}}. \tag{A3.13.8}$$

Some of the internal (rovibronic) energy of the atomic and molecular collision partners is transformed into
extra translational energy $\Delta E_t = E_{t,\mathrm{F}} - E_{t,\mathrm{I}}$ (or consumed, if $\Delta E_t$ is negative). If one averages over a thermal
distribution of translational collision energies, one obtains the thermal rate constant for collisional energy
transfer:

$$k_{\mathrm{FI}}(T) = \left(\frac{8k_{\mathrm{B}}T}{\pi\mu}\right)^{1/2}\int_0^\infty x\exp(-x)\,\sigma_{\mathrm{FI}}(k_{\mathrm{B}}Tx)\,\mathrm{d}x. \tag{A3.13.9}$$

We note here that the quantum levels denoted by the capital indices I and F may contain numerous energy
eigenstates, i.e. are highly degenerate, and refer to chapter A3.4 for a more detailed discussion of these
equations. The integration variable in equation (A3.13.9) is $x = E_{t,\mathrm{I}}/k_{\mathrm{B}}T$.
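
The thermal average over collision energies can be evaluated numerically for any model cross section. The sketch below is illustrative: it assumes the form k(T) = (8k_BT/πμ)^{1/2} ∫ x exp(-x) σ(k_BTx) dx with x the collision energy in units of k_BT, and uses a simple trapezoidal rule. The two model cross sections (energy-independent, and a step at a threshold energy) and all numerical parameters are hypothetical.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
AMU = 1.66053906660e-27  # atomic mass unit, kg


def mean_relative_speed(temperature, mu_kg):
    """Mean relative speed (8 k_B T / (pi mu))^(1/2) in m/s."""
    return math.sqrt(8.0 * K_B * temperature / (math.pi * mu_kg))


def thermal_rate_constant(sigma, temperature, mu_kg, x_max=60.0, n_points=60000):
    """Thermal rate constant in m^3/s for a cross section sigma(E) given in m^2.

    sigma is a callable taking the collision energy in joules. The integrand
    x*exp(-x)*sigma vanishes at x = 0 and is negligible at x_max, so only
    interior grid points are summed.
    """
    h = x_max / n_points
    total = 0.0
    for i in range(1, n_points):
        x = i * h
        total += x * math.exp(-x) * sigma(K_B * temperature * x)
    return mean_relative_speed(temperature, mu_kg) * total * h


if __name__ == "__main__":
    T = 300.0
    mu = 20.0 * AMU      # hypothetical reduced mass
    sigma0 = 1.0e-19     # hypothetical cross section, m^2
    e0 = 2.0 * K_B * T   # hypothetical threshold energy

    def flat(energy):
        return sigma0

    def step(energy):
        return sigma0 if energy >= e0 else 0.0

    print(thermal_rate_constant(flat, T, mu))  # equals sigma0 times the mean relative speed
    print(thermal_rate_constant(step, T, mu))  # reduced by (1 + x0) exp(-x0), with x0 = 2
```

For an energy-independent cross section the integral is unity and the rate constant reduces to the cross section times the mean relative speed. For the step model the integral is (1 + x₀)exp(-x₀) with x₀ = E₀/k_BT, which gives a quick analytical check of the quadrature.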

(b) The second mechanism, which is sometimes distinguished from the first although it is similar in kind, is
obtained when we assume that the colliding species M does not change its internal quantum state. This special
case is frequently realized if M is an inert gas atom in its electronic ground state, as the energies needed to
generate excited states of M would then greatly exceed the energies available in ordinary reaction systems at
modest temperatures. This type of mechanism is frequently called collision induced intramolecular energy
transfer, as internal energy changes occur only within the molecule R. One must note that in general there is
transfer of energy between intermolecular translation and intramolecular rotation and vibration in such a
process, and thus the nomenclature 'intramolecular' is somewhat unfortunate. It is, however, widely used,
which is the reason for mentioning it here. In the following, we shall not make use of this nomenclature and
shall summarize mechanisms (a) and (b) as one class of bimolecular, intermolecular process. We may also
note that, for mechanism (b) one can define a cross section $\sigma_{fi}$ and rate constant $k_{fi}$ between individual,
nondegenerate quantum states i and f and obtain special equations analogous to equation (A3.13.5), equation
(A3.13.4) and equation (A3.13.3), which we shall not repeat in detail. Indeed, one may then have cross
sections and rates between different individual quantum states i and f of the same energy and thus no transfer
of energy to translation. In this very special case, the redistribution of energy would indeed be entirely
'intramolecular' within R.


(c) The third mechanism would be transfer of energy between molecules and the radiation field. These
processes involve absorption, emission or Raman scattering of radiation and are summarized, in the simplest
case with one or two photons, in equation (A3.13.10), equation (A3.13.11) and equation (A3.13.12):

$$\mathrm{R}_i + h\nu \rightarrow \mathrm{R}_f \quad \text{(absorption)} \tag{A3.13.10}$$

$$\mathrm{R}_i \rightarrow \mathrm{R}_f + h\nu \quad \text{(emission)} \tag{A3.13.11}$$

$$\mathrm{R}_i + h\nu_i \rightarrow \mathrm{R}_f + h\nu_f \quad \text{(Raman scattering)}. \tag{A3.13.12}$$

In the case of polarized, but otherwise incoherent statistical radiation, one finds a rate constant for radiative
energy transfer between initial molecular quantum states i and final states f:

$$K_{fi} = \frac{8\pi^3}{h^2(4\pi\varepsilon_0)}\,|M_{fi}|^2\,\frac{I_\nu}{c} \tag{A3.13.13}$$

where $I_\nu = \mathrm{d}I/\mathrm{d}\nu$ is the intensity per frequency bandwidth of radiation and $M_{fi}$ is the electric dipole transition
moment in the direction of polarization. For unpolarized random spatial radiation of density ρ(ν) per volume
and frequency, $I_\nu/c$ must be replaced by ρ(ν)/3, because of random orientation, and the rate of induced
transitions (absorption or emission) becomes:

$$K_{fi}^{\mathrm{induced}} = B_{fi}\,\rho(\nu_{fi}) = \frac{8\pi^3}{3h^2(4\pi\varepsilon_0)}\,|M_{fi}|^2\,\rho(\nu_{fi}). \tag{A3.13.14}$$

$B_{fi}$ is the Einstein coefficient for induced emission or absorption, which is approximately related to the
absolute value of the dipole transition moment $|M_{fi}|$, to the integrated cross section $G_{fi}$ for the transition and
to the Einstein coefficient $A_{fi}$ for spontaneous emission [10]:

$$B_{fi} = \frac{c}{h}\,G_{fi} = \frac{c^3}{8\pi h\nu_{fi}^3}\,A_{fi} \tag{A3.13.15}$$

with

$$G_{fi} = \int_{\mathrm{line}} \sigma_{fi}(\nu)\,\nu^{-1}\,\mathrm{d}\nu \tag{A3.13.16}$$

and $\sigma_{fi}(\nu)$ the frequency dependent absorption cross section. In equation (A3.13.15), $\nu_{fi} = |E_f - E_i|/h$.
Equation (A3.13.17) is a simple, useful formula relating the integrated cross section and the electric dipole
transition moment as dimensionless quantities, in the electric dipole approximation [10, 100]:

$$\frac{G_{fi}}{\mathrm{pm}^2} \simeq 41.624\,\left|\frac{M_{fi}}{\mathrm{Debye}}\right|^2. \tag{A3.13.17}$$

From these equations one also finds the rate coefficient matrix for thermal radiative transitions including
absorption, induced and spontaneous emission in a thermal radiation field following Planck's law [35]:

$$K_{fi} = A_{fi}\,\frac{\mathrm{sign}(E_f - E_i)}{\exp[(E_f - E_i)/k_{\mathrm{B}}T] - 1}. \tag{A3.13.18}$$

Finally, if one has a condition with incoherent radiation of a small band width Δν exciting a broad absorption
band with σ(ν ± Δν) ≈ σ(ν), one finds:

$$K_{fi}^{\mathrm{induced}} = \frac{\sigma_{fi}(\nu)\,I}{h\nu} \tag{A3.13.19}$$

where I is the radiation intensity. For a detailed discussion refer to [10]. The problem of coherent radiative
excitation is considered in section A3.13.4 and section A3.13.5 in relation to intramolecular vibrational
energy redistribution.

(d) The fourth mechanism is purely intramolecular energy redistribution. It is addressed in the next section. 
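
Two of the radiative relations of mechanism (c) lend themselves to a quick numerical check. The sketch below assumes (i) that the integrated cross section follows from the transition moment as G = 8π³|M|²/[3hc(4πε₀)], which is what combining B = (c/h)G with the dipole expression for B gives, and (ii) the thermal rate coefficient K = A sign(ΔE)/[exp(ΔE/k_BT) - 1] with ΔE = E_f - E_i. Function names and the example transition are hypothetical.

```python
import math

H = 6.62607015e-34       # Planck constant, J s
C = 299792458.0          # speed of light, m/s
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23       # Boltzmann constant, J/K
DEBYE = 3.33564e-30      # 1 Debye in C m


def integrated_cross_section_pm2(m_debye):
    """Integrated cross section G in pm^2 for a transition moment given in Debye."""
    m = m_debye * DEBYE
    g_m2 = 8.0 * math.pi**3 * m * m / (3.0 * H * C * 4.0 * math.pi * EPS0)
    return g_m2 / 1.0e-24  # 1 pm^2 = 1e-24 m^2


def thermal_radiative_rate(a_fi, e_f_minus_e_i, temperature):
    """Thermal radiative rate coefficient for i -> f in a Planck radiation field.

    a_fi is the Einstein coefficient for spontaneous emission (s^-1) and
    e_f_minus_e_i the energy difference in joules (must be non-zero).
    Absorption (positive difference) gives A*n, emission gives A*(n + 1),
    where n is the thermal photon occupation number.
    """
    x = e_f_minus_e_i / (K_B * temperature)
    sign = 1.0 if x > 0 else -1.0
    return a_fi * sign / math.expm1(x)


if __name__ == "__main__":
    print(integrated_cross_section_pm2(1.0))  # close to 41.624
    d_e = H * C * 100.0 * 1000.0              # a 1000 cm^-1 transition, in joules
    up = thermal_radiative_rate(10.0, d_e, 300.0)
    down = thermal_radiative_rate(10.0, -d_e, 300.0)
    print(up, down, down - up)                # the difference is the spontaneous rate A
```

The ratio of upward to downward rate coefficients equals the Boltzmann factor exp(-ΔE/k_BT), as required by detailed balance in a thermal field.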

A3.13.2.2 STRICTLY MONOMOLECULAR PROCESSES IN ISOLATED MOLECULES 

Purely intramolecular energy transfer occurs when energy migrates within an isolated molecule from one part 
to another or from one type of motion to the other. Processes of this type include the vast field of molecular 
electronic radiationless transitions which emerged in the late 1960s [36], but more generally any type of 
intramolecular motion such as intramolecular vibrational energy redistribution (IVR) or intramolecular 
vibrational-rotational energy redistribution (IVRR) and related processes [37, 38 and 39]. These processes 
will be discussed in section A3. 13.5 in some detail in terms of their full quantum dynamics. However, in 
certain situations a statistical description with rate equations for such processes can be appropriate [38].

Figure A3.13.1 illustrates our general understanding of intramolecular energy redistribution in isolated
molecules and shows how these processes are related to 'intermolecular' processes, which may follow any of
the mechanisms discussed in the previous section. The horizontal bars represent levels of nearly degenerate
states of an isolated molecule.

Figure A3.13.1. Schematic energy level diagram and relationship between 'intermolecular' (collisional or
radiative) and intramolecular energy transfer between states of isolated molecules. The fat horizontal bars
indicate thin energy shells of nearly degenerate states.


Having introduced the basic concepts and equations for various energy redistribution processes, we will now
discuss some of them in more detail.


A3.13.3 COLLISIONAL ENERGY REDISTRIBUTION PROCESSES

A3.13.3.1 THE MASTER EQUATION FOR COLLISIONAL RELAXATION REACTION PROCESSES 

The fundamental kinetic master equations for collisional energy redistribution follow the rules of the kinetic 
equations for all elementary reactions. Indeed an energy transfer process by inelastic collision, equation 
(A3. 13.5) , can be considered as a somewhat special 'reaction'. The kinetic differential equations for these 
processes have been discussed in the general context of chapter A3. 4 on gas kinetics. We discuss here some 
special aspects related to collisional energy transfer in reactive systems. The general master equation for 
relaxation and reaction is of the type [ 11 , 12 and 13, 15, 25, 40, 41]: 


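$$\frac{\mathrm{d}c_j}{\mathrm{d}t} = F[c_k(t)] \tag{A3.13.20}$$

$$c_j(t = 0) = c_{j0}. \tag{A3.13.21}$$

The index j can label quantum states of the same or different chemical species. Equation (A3.13.20)
corresponds to a generally stiff initial value problem [42, 43]. In matrix notation one may write:

$$\frac{\mathrm{d}\boldsymbol{c}}{\mathrm{d}t} = \boldsymbol{F}[\boldsymbol{c}(t)] \tag{A3.13.22}$$

$$\boldsymbol{c}(t = 0) = \boldsymbol{c}_0. \tag{A3.13.23}$$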
There is no general, simple solution to this set of coupled differential equations, and thus one will usually 
have to resort to numerical techniques [42, 43] (see also chapter A3. 4 ). 

A3.13.3.2 THE MASTER EQUATION FOR COLLISIONAL AND RADIATIVE ENERGY REDISTRIBUTION UNDER
CONDITIONS OF GENERALIZED FIRST-ORDER KINETICS

There is one special class of reaction systems in which a simplification occurs. If collisional energy
redistribution of some reactant occurs by collisions with an excess of 'heat bath' atoms or molecules that are
considered kinetically structureless, and if furthermore the reaction is either unimolecular or occurs again with
a reaction partner M having an excess concentration, then one will have generalized first-order kinetics for
populations $p_j$ of the energy levels of the reactant, i.e. with

$$\frac{\mathrm{d}p_j}{\mathrm{d}t} = \sum_{k \neq j}\left(K_{jk}\,p_k - K_{kj}\,p_j\right) - k_j\,p_j \tag{A3.13.24}$$

$$K_{jk} = k_{jk}[\mathrm{M}]. \tag{A3.13.25}$$

In equation (A3.13.24), $k_j$ is the specific rate constant for reaction from level j, and $K_{jk}$ are energy transfer
rate coefficients. With appropriate definition of a rate coefficient matrix K one has, in matrix notation,

$$\frac{\mathrm{d}\boldsymbol{p}}{\mathrm{d}t} = \boldsymbol{K}\boldsymbol{p} \tag{A3.13.26}$$

where for j ≠ i

$$K_{ji}(T) = \left(\frac{8k_{\mathrm{B}}T}{\pi\mu}\right)^{1/2}[\mathrm{M}]\int_0^\infty x\exp(-x)\,\sigma_{ji}(k_{\mathrm{B}}Tx)\,\mathrm{d}x \tag{A3.13.27}$$

(see equation (A3.13.9)) and

$$-K_{jj}(T) = k_j + \sum_{i \neq j} K_{ij}(T). \tag{A3.13.28}$$

The master equation (A3.13.26) applies also, under certain conditions, to radiative excitation with rate
coefficients for radiative energy transfer being given by equations (A3.13.13) to (A3.13.19),
depending on the case, or else by more general equations [10]. Finally, the radiative and collisional rate
coefficients may be considered together to be important at the same time in a given reaction system, if time
scales for these processes are of the appropriate order of magnitude. The solution of equation (A3.13.26) is
given by:

$$\boldsymbol{p}(t) = \exp(\boldsymbol{K}t)\,\boldsymbol{p}(0). \tag{A3.13.29}$$

This solution can be obtained explicitly either by matrix diagonalization or by other techniques (see chapter 
A3. 4 and [42, 43]). In many cases the discrete quantum level labels in equation (A3. 13.24) can be replaced by 
a continuous energy variable and the populations by a population density p(E), with replacement of the sum 
by appropriate integrals [11]. This approach can be made the starting point of useful analytical solutions for 
certain simple model systems [ 11 , 19 , 44 , 45 and 46 ]. 

While the time dependent populations $p_j(t)$ may generally show a complicated behaviour, certain simple
limiting cases can be distinguished and characterized by appropriate parameters:

(a) The long time steady state limit (formally t → ∞) is described by the largest eigenvalue $\lambda_1$ of K. Since all
$\lambda_j$ are negative, $\lambda_1$ has the smallest absolute value [35, 47]. In this limit one finds [47] (with the reactant
fraction $F_{\mathrm{R}} = \sum_j p_j$):

$$-\frac{\mathrm{d}\ln F_{\mathrm{R}}}{\mathrm{d}t} = k_{\mathrm{uni}} = -\lambda_1. \tag{A3.13.30}$$

Thus, this eigenvalue $\lambda_1$ determines the unimolecular steady-state reaction rate constant.

(b) The second largest eigenvalue $\lambda_2$ determines ideally the relaxation time towards this steady state, thus:

$$\tau_{\mathrm{relax}} = -\lambda_2^{-1}. \tag{A3.13.31}$$

More generally, further eigenvalues must be taken into account in the relaxation process.

(c) It is sometimes useful to define an incubation time $\tau_{\mathrm{inc}}$ by the limiting equation for steady state:

$$\ln F_{\mathrm{R}}^{\mathrm{st}}(t) = -k_{\mathrm{uni}}\,(t - \tau_{\mathrm{inc}}). \tag{A3.13.32}$$

Figure A3.13.2 illustrates the origin of these quantities. Refer to [47] for a detailed mathematical discussion as
well as the treatment of radiative laser excitation, in which incubation phenomena are important. Also refer to
[11] for some classical examples in thermal systems.
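
The three quantities (steady-state rate constant, relaxation time and incubation time) can be made concrete with the smallest possible model. The sketch below is a hypothetical two-level system: level 0 lies below the reaction threshold, level 1 above it and reacts with a specific rate constant `k_react`; `k_up` and `k_down` are pseudo-first-order energy transfer coefficients. The eigenvalues of the 2×2 rate coefficient matrix are obtained in closed form, and the populations are propagated with a fourth-order Runge-Kutta scheme to confirm the long-time behaviour. All names and numbers are assumptions for illustration.

```python
import math


def rate_matrix(k_up, k_down, k_react):
    """2x2 rate coefficient matrix K for dp/dt = K p.

    K[j][i] (j != i) is the rate coefficient for transfer i -> j; the diagonal
    holds minus the total loss rate from each level, including reaction.
    """
    return [[-k_up, k_down],
            [k_up, -(k_down + k_react)]]


def eigenvalues(m):
    """Eigenvalues (lambda_1, lambda_2) of a real 2x2 matrix, lambda_1 >= lambda_2."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(trace * trace - 4.0 * det)
    return 0.5 * (trace + root), 0.5 * (trace - root)


def reactant_fraction(m, p0, t_end, n_steps):
    """Reactant fraction F_R(t_end) = p_0 + p_1 from RK4 integration of dp/dt = K p."""
    h = t_end / n_steps
    p = list(p0)

    def deriv(q):
        return [m[0][0] * q[0] + m[0][1] * q[1],
                m[1][0] * q[0] + m[1][1] * q[1]]

    for _ in range(n_steps):
        k1 = deriv(p)
        k2 = deriv([p[i] + 0.5 * h * k1[i] for i in range(2)])
        k3 = deriv([p[i] + 0.5 * h * k2[i] for i in range(2)])
        k4 = deriv([p[i] + h * k3[i] for i in range(2)])
        p = [p[i] + h * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]) / 6.0 for i in range(2)]
    return p[0] + p[1]


if __name__ == "__main__":
    K = rate_matrix(k_up=1.0, k_down=10.0, k_react=5.0)  # arbitrary time units
    lam1, lam2 = eigenvalues(K)
    k_uni = -lam1
    tau_relax = -1.0 / lam2
    f5 = reactant_fraction(K, [1.0, 0.0], 5.0, 5000)
    f6 = reactant_fraction(K, [1.0, 0.0], 6.0, 6000)
    # Long-time slope of -ln F_R and the incubation time from ln F_R = -k_uni (t - tau_inc).
    print(k_uni, -math.log(f6 / f5))
    print(tau_relax, 5.0 + math.log(f5) / k_uni)
```

With these numbers the two eigenvalues differ by a factor of about fifty, so relaxation and incubation are complete long before a noticeable fraction of the reactant has decayed. For larger systems one would diagonalize K numerically instead of using the closed-form 2×2 expressions.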



Figure A3.13.2. Illustration of the analysis of the master equation in terms of its eigenvalues $\lambda_1$ and $\lambda_2$ for the
example of IR-multiphoton excitation. The dashed lines give the long time straight line limiting behaviour.
The full line to the right-hand side is for $x = F_{\mathrm{R}}(t)$ with a straight line of slope $-\lambda_1 = k_{\mathrm{uni}}$. The intercept of the
corresponding dashed line ($F_{\mathrm{R}}^{\mathrm{st}} = 1$) indicates $\tau_{\mathrm{inc}}$ (see equation (A3.13.32)). The left-hand line is for
$x = |F_{\mathrm{R}} - F_{\mathrm{R}}^{\mathrm{st}}|$ with limiting slope $-\lambda_2 = \tau_{\mathrm{relax}}^{-1}$ (see text and [47]).


As a rule, in thermal unimolecular reaction systems at modest temperatures, $\lambda_1$ is well separated from the
other eigenvalues, and thus the time scales for incubation and 'relaxation' are well separated from the
steady-state reaction time scale $\tau_{\mathrm{reaction}} = k_{\mathrm{uni}}^{-1}$. On the other hand, at high temperatures, $k_{\mathrm{uni}}$, $\tau_{\mathrm{relax}}^{-1}$ and
$\tau_{\mathrm{inc}}^{-1}$ may merge.

This is illustrated in figure A3.13.3 for the classic example of thermal unimolecular dissociation [48, 49, 50
and 51]:

$$\mathrm{N_2O + Ar \rightarrow N_2 + O + Ar} \tag{A3.13.33}$$
Note that in the 'low pressure limit' of unimolecular reactions (chapter A3.4), the unimolecular rate constant
$k_{\mathrm{uni}}$ is entirely dominated by energy transfer processes, even though the relaxation and incubation rates
($\tau_{\mathrm{relax}}^{-1}$ and $\tau_{\mathrm{inc}}^{-1}$) may be much faster than $k_{\mathrm{uni}}$.

Figure A3.13.3. Dissociation ($k_{\mathrm{uni}} = k_{\mathrm{diss}}$), incubation ($\tau_{\mathrm{inc}}^{-1}$) and relaxation ($\tau_{\mathrm{relax}}^{-1}$) rate constants for the
reaction N₂O → N₂ + O at low pressure in argon (from [11], see discussion in the text for details and
references to the experiments).


The master equation treatment of energy transfer in even fairly complex reaction systems is now well
established and fairly standard [52]. However, the rate coefficients $K_{ij}$ for the individual energy transfer
processes must be established and we shall discuss some aspects of this matter in the following section.


A3.13.3.3 MECHANISMS OF COLLISIONAL ENERGY TRANSFER


Collisional energy transfer in molecules is a field in itself and is of relevance for kinetic theory (chapter
A3.1), gas phase kinetics (chapter A3.4), RRKM theory (chapter A3.12), the theory of unimolecular reactions
in general, as well as the kinetics of laser systems [53]. Chapter C3.3, chapter C3.4 and chapter C3.5 treat
these subjects in detail. We summarize those aspects that are of importance for mechanistic considerations in
chemically reactive systems.


We start from a model in which collision cross sections or rate constants for energy transfer are compared
with a reference quantity such as average Lennard-Jones collision cross sections or the usually cited
Lennard-Jones collision frequencies [54]

$$Z_{\mathrm{LJ}} = \pi\sigma_{\mathrm{AB}}^2\left(\frac{8k_{\mathrm{B}}T}{\pi\mu_{\mathrm{AB}}}\right)^{1/2}\Omega_{\mathrm{AB}}^{(2,2)*} \tag{A3.13.34}$$

where $\sigma_{\mathrm{AB}}$ is the Lennard-Jones parameter and $\Omega_{\mathrm{AB}}^{(2,2)*}$ is the reduced collision integral [54], calculated from the
binding energy ε and the reduced mass $\mu_{\mathrm{AB}}$ for the collision in the Lennard-Jones potential

$$V(r) = 4\varepsilon\left[\left(\frac{\sigma_{\mathrm{AB}}}{r}\right)^{12} - \left(\frac{\sigma_{\mathrm{AB}}}{r}\right)^{6}\right]. \tag{A3.13.35}$$

Given such a reference, we can classify various mechanisms of energy transfer either by the probability that a
certain energy transfer process will occur in a 'Lennard-Jones reference collision', or by the average energy
transferred by one 'Lennard-Jones collision'.
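
As a rough guide to the magnitude of the reference quantity, the sketch below evaluates a Lennard-Jones collision rate coefficient of the assumed form Z = πσ²(8k_BT/πμ)^{1/2}Ω together with the 12-6 potential. The reduced collision integral Ω is passed in as a number (it would normally be taken from tables as a function of k_BT/ε); the values of σ, μ and Ω in the usage lines are hypothetical.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
AMU = 1.66053906660e-27  # atomic mass unit, kg


def lennard_jones_potential(r, epsilon, sigma):
    """12-6 Lennard-Jones potential V(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_collision_rate(sigma_m, temperature, mu_kg, omega22):
    """Lennard-Jones reference collision rate coefficient in m^3/s per collision pair.

    omega22 is the reduced collision integral, supplied by the caller;
    omega22 = 1 recovers the hard-sphere value for diameter sigma_m.
    """
    mean_speed = math.sqrt(8.0 * K_B * temperature / (math.pi * mu_kg))
    return math.pi * sigma_m**2 * mean_speed * omega22


if __name__ == "__main__":
    sigma = 4.0e-10   # hypothetical Lennard-Jones diameter, m
    mu = 20.0 * AMU   # hypothetical reduced mass
    z = lj_collision_rate(sigma, 300.0, mu, omega22=1.2)
    print(z)          # m^3 s^-1; multiply by the bath gas concentration for a frequency
    print(lennard_jones_potential(2.0 ** (1.0 / 6.0) * sigma, 1.0, sigma))  # well depth, -epsilon
```

A measured or computed energy transfer rate coefficient divided by this reference value gives the transfer probability per 'Lennard-Jones collision' used in the classification.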

With this convention, we can now classify energy transfer processes either as resonant, if $|\Delta E_t|$ defined in
equation (A3.13.8) is small, or non-resonant, if it is large. Quite generally the rate of resonant processes can
approach or even exceed the Lennard-Jones collision frequency (the latter is possible if other long-range
potentials are actually applicable, such as by permanent dipole-dipole interaction).

Resonant processes of some importance include resonant electronic to electronic energy transfer (E-E), such
as the pumping process of the iodine atom laser

$$\text{(E-E)} \quad \mathrm{O_2(^1\Delta_g) + I(^2P_{3/2}) \rightarrow O_2(^3\Sigma_g^-) + I(^2P_{1/2})}. \tag{A3.13.36}$$

Another near resonant process is important in the hydrogen fluoride laser, equation (A3.13.37), where
vibrational to vibrational energy transfer is of interest:

$$\text{(V-V)} \quad \mathrm{HF}(v') + \mathrm{HF}(v'') \rightarrow \mathrm{HF}(v' + \Delta v) + \mathrm{HF}(v'' - \Delta v) \tag{A3.13.37}$$

where Δv is the number of vibrational quanta exchanged. If HF were a harmonic oscillator, $\Delta E_t$ would be zero
(perfect resonance). In practice, because of anharmonicity, the most important process is exothermic, leading
to increasing excitation v' of some of the HF molecules with successive collisions [55, 56], because the
exothermicity drives this process to high v' as long as plenty of HF(v") with low v" are available.
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Resonant rotational to rotational (R-R) energy transfer may have rates exceeding the Lennard- Jones collision 
frequency because of long-range dipole-dipole interactions in some cases. Quasiresonant vibration to rotation 
transfer (V-R) has recently been discussed in the framework of a simple model [52]. 

'Non-resonant' processes include vibration-translation (V-T) processes with transfer probabilities decreasing
to very small values for diatomic molecules with very high vibrational frequencies, of the order of 10 and
less for the probability of transferring a quantum in a collision. Also, vibration to rotation (V-R) processes
frequently have low probabilities, of the order of 10 , if $\Delta E_t$ is relatively large. Rotation to translation (R-T)
processes are generally fast, with probabilities near 1. Also, the R-V-T processes in collisions of large
polyatomic molecules have high probabilities, with average energies transferred in one Lennard-Jones
collision being of the order of a few kJ mol⁻¹ [11, 25], or less in collisions with rare gas atoms. As a general
rule one may assume collision cross sections to be small, if $\Delta E_t$ is large [11, 58, 59].

In the experimental and theoretical study of energy transfer processes which involve some of the above
mechanisms, one should distinguish processes in atoms and small molecules from those in large polyatomic
molecules. For small molecules a full theoretical quantum treatment is possible and even computer program
packages are available [60, 62 and 63], with full state to state characterization. A good example is provided
by rotational energy transfer theory and experiments on He + CO [64]:

$$\mathrm{He} + \mathrm{CO}(j) \rightarrow \mathrm{He} + \mathrm{CO}(j'). \tag{A3.13.38}$$

On the experimental side, small molecule energy transfer experiments may use molecular beam techniques
[65, 66 and 67] (see also chapter C3.3 for laser studies).

In the case of large molecules, instead of the detailed quantum state characterization implied in the cross
sections $\sigma_{fi}$ and rate coefficients $K_{fi}$ of the master equation (A3.13.24), one derives more coarse grained
information on 'levels' covering a small energy bandwidth around E and E' (with an optional notation
K(E',E)) or finally energy transfer probabilities P(E',E) for a transition from energy E to energy E' in a highly
excited large polyatomic molecule where the density of states $\rho(E')$ is very large, for example in a collision
with a heat bath inert gas atom [11]. Such processes can currently be modelled by classical trajectories [68, 69
and 70].

Experimental access to the probabilities P(E',E) for energy transfer in large molecules usually involves
techniques providing just the first moment of this distribution, i.e. the average energy $\langle\Delta E\rangle$ transferred in a
collision. Such methods include UV absorption, infrared fluorescence and related spectroscopic techniques
[11, 28, 71, 72, 73 and 74]. More advanced techniques, such as kinetically controlled selective ionization
(KCSI [74]) have also provided information on higher moments of P(E',E), such as $\langle(\Delta E)^2\rangle$.
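
The moments mentioned above can be made concrete with a short sketch. Assuming a discrete, hypothetical set of
final energies E' with probabilities P(E',E), the first and second moments of the energy transferred are plain
weighted sums. The function name and the three-point distribution in the usage note are invented for
illustration and do not represent measured data.

```python
def transfer_moments(e_final, prob, e_initial):
    """Return (<dE>, <dE^2>) for a discrete transfer probability P(E', E).

    e_final: list of final energies E'
    prob: matching list of (not necessarily normalised) probabilities
    e_initial: initial energy E
    """
    total = sum(prob)  # normalise so that the probabilities sum to 1
    deltas = [e - e_initial for e in e_final]  # energy transferred, E' - E
    first = sum(p * d for p, d in zip(prob, deltas)) / total
    second = sum(p * d * d for p, d in zip(prob, deltas)) / total
    return first, second
```

For example, `transfer_moments([90.0, 100.0, 110.0], [0.5, 0.3, 0.2], 100.0)` describes a toy collision that
removes energy on average; the first moment is negative while the second moment is positive by construction.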

The standard mechanisms of collisional energy transfer for both small and large molecules have been treated
extensively and a variety of scaling laws have been proposed to simplify the complicated body of data [58, 59,
75]. To conclude, one of the most efficient special mechanisms for energy transfer is the quasi-reactive
process involving chemically bound intermediates, as in the example of the reaction:

$$\mathrm{O_2}(v,j) + \mathrm{O} \rightarrow \mathrm{O_3^{*}} \rightarrow \mathrm{O_2}(v',j') + \mathrm{O}. \tag{A3.13.39}$$

Such processes transfer very large amounts of energy in one collision and have been treated efficiently by the
statistical adiabatic channel model [11, 19, 30, 76, 77, 78 and 79]. They are quite similar mechanistically to
chemical activation systems. One might say that in such a mechanism one may distinguish three phases:

(a) Formation of a bound collision complex AB:

$$\mathrm{A}(v,j) + \mathrm{B} \rightarrow \mathrm{AB}^{*}. \tag{A3.13.40}$$

(b) IVRR in this complex:

$$\mathrm{AB}^{*} \rightarrow \mathrm{AB}^{**}. \tag{A3.13.41}$$

(c) Finally, dissociation of the internally, statistically equilibrated complex:

$$\mathrm{AB}^{**} \rightarrow \mathrm{A}(v',j') + \mathrm{B}. \tag{A3.13.42}$$

That is, rapid IVR in the long lived intermediate is an essential step. We shall treat this important process in
the next section, but mention here in passing the observation of so-called 'supercollisions' transferring large
average amounts of energy $\langle\Delta E\rangle$ in one collision [80], even if intermediate complex formation may not be
important.


A3.13.4 INTRAMOLECULAR ENERGY TRANSFER STUDIES IN
POLYATOMIC MOLECULES

In this section we review our understanding of IVR as a special case of intramolecular energy transfer. The 
studies are based on calculations of the time evolution of vibrational wave packets corresponding to middle 
size and large amplitude vibrational motion in polyatomic molecules. An early example for the investigation 
of wave packet motion as a key to understanding IVR and its implication on reaction kinetics using 
experimental data is given in [81]. Since then, many other contributions have helped to increase our 
knowledge using realistic potential energy surfaces, mainly for two- and three-dimensional systems, and we 
give a brief summary of these results below. 

A3.13.4.1 IVR AND CLASSICAL MECHANICS

Before undergoing a substantial and, in many cases, practically irreversible, change of geometrical structure
within a chemical reaction, a molecule may often perform a series of vibrations in the multidimensional space
around its equilibrium structure. This applies in general to reactions that take place entirely in the bound
electronic ground state and in many cases to reactions that start in the electronic ground state near the
equilibrium structure, but evolve into highly excited states above the reaction threshold energy. In the latter
case, within the general scheme of equation (A3.13.1), a reaction is thought to be induced by a sufficiently
energetic pulse of electromagnetic radiation or by collisions with adequate high-energy collision partners. In
the first case, a reaction is thought to be the last step after a chain of excitation steps has transferred
enough energy into the molecule to react either thermally, by collisions, or coherently, for instance by
irradiation with infrared laser pulses. These pulses can be tuned to adequately excite vibrations along the
reaction coordinate, the amplitudes of which become gradually larger until the molecule undergoes a
sufficiently large structural change leading to the chemical reaction.

Vibrational motion is thus an important primary step in a general reaction mechanism and detailed
investigation of this motion is of utmost relevance for our understanding of the dynamics of chemical
reactions. In classical mechanics, vibrational motion is described by the time evolution $Q(t)$ and $P(t)$ of
general internal position and momentum coordinates. These time dependent functions are solutions of the
classical equations of motion, e.g. Newton's equations for given initial conditions $Q(t_0) = q_0$ and $P(t_0) = p_0$.

The definition of initial conditions is generally limited in precision to within experimental uncertainties $\Delta q_0$
and $\Delta p_0$, more fundamentally related by the Heisenberg principle $\Delta q_0 \Delta p_0 \geq h/4\pi$. Therefore, we need to
consider an initial distribution $F_0(q - q_0, p - p_0)$, with widths $\Delta q_0$ and $\Delta p_0$, and the time evolution
$F_t(q - Q(t), p - P(t))$, which may be quite different from the initial distribution $F_0$, depending on the
integrability of the dynamical system. Ideally, for classical, integrable systems, vibrational motion may be
understood as the motion of narrow, well localized distributions $F(q - Q(t), p - P(t))$ (ideally δ-functions in a
strict mathematical sense), centred around the solutions of the classical equations of motion. In this picture
we wish to consider initial conditions that correspond to localized vibrational motion along specific
manifolds, for instance a vibration that is induced by elongation of a single chemical bond (local mode
vibrations) as a result of the interaction with some external force, but it is also conceivable that a large
displacement from equilibrium might be induced along a single normal coordinate. Independent of the detailed
mechanism for the generation of localized vibrations, harmonic transfer of excitation may occur when such a
vibration starts to extend into other manifolds of the multidimensional space, resulting in trajectories that
draw Lissajous figures in phase space, and also in configuration space [82] (see also [83]). Furthermore, if
there is anharmonic interaction, IVR may occur. In [84, 85] this type of IVR was called classical
intramolecular vibrational redistribution (CIVR).

A3.13.4.2 IVR AND QUANTUM MECHANICS

In time-dependent quantum mechanics, vibrational motion may be described as the motion of the wave packet
$|\psi(q,t)|^2$ in configuration space, e.g. as defined by the possible values of the position coordinates q. This
motion is given by the time evolution of the wave function $\psi(q,t)$, defined as the projection $\langle q|\psi(t)\rangle$ of the
time-dependent quantum state $|\psi(t)\rangle$ on configuration space. Since the quantum state is a complete description
of the system, the wave packet defining the probability density can be viewed as the quantum mechanical
counterpart of the classical distribution $F(q - Q(t), p - P(t))$. The time dependence is obtained by solution of the
time-dependent Schrödinger equation

$$\mathrm{i}\,\frac{h}{2\pi}\,\frac{\partial}{\partial t}|\psi(t)\rangle = \hat{H}\,|\psi(t)\rangle \tag{A3.13.43}$$

where h is the Planck constant and $\hat{H}$ is the Hamiltonian of the system under consideration. Solutions depend
on initial conditions $|\psi(t_0)\rangle$ and may be formulated using the time evolution operator $\hat{U}(t,t_0)$:

$$|\psi(t)\rangle = \hat{U}(t,t_0)\,|\psi(t_0)\rangle. \tag{A3.13.44}$$

Alternatively, in the case of incoherent (e.g. statistical) initial conditions, the density matrix operator
$\hat{P}(t) = |\psi(t)\rangle\langle\psi(t)|$ at time t can be obtained as the solution of the Liouville-von Neumann equation:

$$\hat{P}(t) = \hat{U}(t,t_0)\,\hat{P}(t_0)\,\hat{U}^{\dagger}(t,t_0) \tag{A3.13.45}$$

where $\hat{U}^{\dagger}(t,t_0)$ is the adjoint of the time evolution operator (in strictly conservative systems, the time
evolution operator is unitary and $\hat{U}^{\dagger}(t,t_0) = \hat{U}^{-1}(t,t_0) = \hat{U}(t_0,t)$).
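
A minimal sketch may help to make the role of the time evolution operator concrete. It assumes a hypothetical
two-level model with a real symmetric Hamiltonian [[a, w], [w, b]] given in angular frequency units, for which
the operator exp(-iHt) is known in closed form. The helper names are illustrative only. The code builds U,
verifies that it is unitary and propagates a density matrix as in the Liouville-von Neumann solution.

```python
import cmath
import math


def evolution_operator(a, b, w, t):
    """U(t, 0) = exp(-i H t) for H = [[a, w], [w, b]] (angular frequencies)."""
    mean, half_gap = 0.5 * (a + b), 0.5 * (a - b)
    omega = math.hypot(half_gap, w)  # half the eigenvalue splitting
    phase = cmath.exp(-1j * mean * t)
    c, s = math.cos(omega * t), math.sin(omega * t)
    sx = s * w / omega if omega else 0.0
    sz = s * half_gap / omega if omega else 0.0
    return [[phase * (c - 1j * sz), phase * (-1j * sx)],
            [phase * (-1j * sx), phase * (c + 1j * sz)]]


def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(x):
    """Adjoint (conjugate transpose) of a 2x2 matrix."""
    return [[x[j][i].conjugate() for j in range(2)] for i in range(2)]


def evolve_density(rho0, u):
    """Density matrix at time t: U rho0 U^dagger."""
    return matmul(matmul(u, rho0), dagger(u))
```

For two degenerate levels (a = b) the population initially placed in the first level is transferred completely
to the second one at t = pi/(2w), while the trace of the density matrix stays equal to one.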

The calculation of the time evolution operator in multidimensional systems is a formidable task and some
results will be discussed in this section. An alternative approach is the calculation of semi-classical dynamics
as demonstrated, among others, by Heller [86, 87 and 88], Marcus [89, 90], Taylor [91, 92], Metiu [93, 94]
and coworkers (see also [83] as well as the review by Miller [95] for more general aspects of semiclassical
dynamics). This method basically consists of replacing the δ-function distribution in the true classical
calculation by a Gaussian distribution in coordinate space. It allows for a simulation of the vibrational
quantum dynamics to the extent that interference effects in the evolving wave packet can be neglected. While
the application of semi-classical methods might still be of some interest for the simulation of quantum
dynamics in large polyatomic molecules in the near future, as a natural extension of classical molecular
dynamics calculations [68, 96], full quantum mechanical calculations of the wave packet evolution in smaller
polyatomic molecules are possible with the currently available computational resources. Following earlier
spectroscopic work and three-dimensional quantum dynamics results [81, 92, 98, 99 and 100], Wyatt and
coworkers have recently demonstrated applications of full quantum calculations to the study of IVR in
fluoroform, with nine degrees of freedom [101, 102] and in benzene [103], considering all 30 degrees of
freedom [104]. Such calculations show clearly the possibilities in the computational treatment of quantum
dynamics and IVR. However, remaining computational limitations restrict the study to the lower energy
regime of molecular vibrations, when all degrees of freedom of systems with more than three dimensions are
treated. Large amplitude motion, which shows the inherently quantum mechanical nature of wave packet
motion and is highly sensitive to IVR, cannot yet be discussed for such molecules, but new results are
expected in the near future, as indicated in recent work on ammonia [105, 106], formaldehyde and hydrogen
peroxide [106, 107 and 108], and hydrogen fluoride dimer [109, 110 and 111] including all six internal
degrees of freedom.

A key feature in quantum mechanics is the dispersion of the wave packet, i.e. the loss of its Gaussian shape.
This feature corresponds to a delocalization of probability density and is largely a consequence of
anharmonicities of the potential energy surface, both the 'diagonal' anharmonicity, along the manifold in
which the motion started, and the 'off-diagonal' anharmonicity, induced by anharmonic coupling terms between
different manifolds in the Hamiltonian. Spreading of the wave packet into different manifolds is thus a further
important feature of IVR. In [84, 85] this type of IVR was called delocalization quantum intramolecular
vibrational redistribution (DIVR). DIVR plays a central role for the understanding of statistical theories for
unimolecular reactions in polyatomic molecules [84, 92], as will be discussed below.

A3.13.4.3 IVR WITHIN THE GENERAL SCHEME OF ENERGY REDISTRIBUTION IN REACTIVE SYSTEMS 

As in classical mechanics, the outcome of time-dependent quantum dynamics and, in particular, the
occurrence of IVR in polyatomic molecules, depends both on the Hamiltonian and the initial conditions, i.e.
the initial quantum mechanical state $|\psi(t_0)\rangle$. We focus here on the time-dependent aspects of IVR, and in this
case such initial conditions always correspond to the preparation, at a time $t_0$, of superposition states of
molecular (spectroscopic) eigenstates involving at least two distinct vibrational energy levels. Strictly, IVR
occurs if these levels involve at least two distinct vibrational manifolds in terms of which the total
(vibrational) Hamiltonian is not separable [84]. In a time-independent view, this requirement states that the
wave functions belonging to the two spectroscopic states are spread in a non-separable way over the
configuration space spanned by at least two different vibrational modes. The conceptual framework for the
investigation of IVR may be sketched within the following scheme, which also mirrors the way we might
investigate IVR in the time-dependent approach, both theoretically and experimentally:

$$|\psi(t_{-1})\rangle \xrightarrow{\;\hat{U}_{\mathrm{prep}}(t_0,t_{-1})\;} |\psi(t_0)\rangle \xrightarrow{\;\hat{U}_{\mathrm{free}}(t,t_0)\;} |\psi(t)\rangle \tag{A3.13.46}$$

In a first time interval $[t_{-1}, t_0]$ of the scheme (A3.13.46), a superposition state is prepared. This step
corresponds to the step in equation (A3.13.1). One might think of a time evolution
$|\psi(t_{-1})\rangle \rightarrow |\psi(t_0)\rangle = \hat{U}_{\mathrm{prep}}(t_0,t_{-1})|\psi(t_{-1})\rangle$, where $|\psi(t_{-1})\rangle$ may be a molecular eigenstate and $\hat{U}_{\mathrm{prep}}$ is the
time evolution operator obtained from the interaction with an external system, to be specified below. The
probability distribution $|\psi(q,t_0)|^2$ is expected to be approximately localized in configuration space, such that
$|\psi(q,t_0)|^2 > 0$ for position coordinates $q \in M$ belonging to some specific manifold $M$ and $|\psi(q,t_0)|^2 \approx 0$ for
coordinates $q \in \bar{M}$ belonging to the complementary manifold $\bar{M} = M^{*}$. In a second time interval $[t_0, t]$, the
superposition state $|\psi(t_0)\rangle$ has a free evolution into states $|\psi(t)\rangle = \hat{U}_{\mathrm{free}}(t,t_0)|\psi(t_0)\rangle$. This step corresponds
to the intermediate step equation (A3.13.47), occurring between the steps described before by equation (A3.13.1)
and equation (A3.13.2) (see also equation (A3.13.41)):

$$\mathrm{R}^{*} \rightarrow \mathrm{R}^{**}. \tag{A3.13.47}$$

IVR is present if $|\psi(q,t)|^2 > 0$ is observed for $t > t_0$ also for $q \in \bar{M}$. IVR may of course also occur during the
excitation process, if its time scale is comparable to that of the excitation.

In the present section, we concentrate on coherent preparation by irradiation with a properly chosen laser
pulse during a given time interval. The quantum state at time $t_{-1}$ may be chosen to be the vibrational ground
state $|\phi_0\rangle$ in the electronic ground state. In principle, other possibilities may also be conceived for the
preparation step, as discussed in section A3.13.1, section A3.13.2 and section A3.13.3. In order to determine
superposition coefficients within a realistic experimental set-up using irradiation, the following questions need
to be answered: (1) What are the eigenstates? (2) What are the electric dipole transition matrix elements? (3)
What is the orientation of the molecule with respect to the laboratory fixed (linearly or circularly) polarized
electric field vector of the radiation? The first question requires knowledge of the potential energy surface, or
the Hamiltonian $\hat{H}_0(p,q)$ of the isolated molecule, the second that of the vector valued surface $\vec{\mu}(q)$ of the
electric dipole moment. This surface yields the operator, which couples spectroscopic states by the impact of
an external irradiation field and thus directly affects the superposition procedure. The third question is indeed
of great importance for comparison with experiments aiming at the measurement of internal wave packet
motion in polyatomic molecules and has recently received much attention in the treatment of molecular
alignment and orientation [112, 113], including non-polar molecules [114, 115]. To the best of our
knowledge, up to now explicit calculations of multidimensional wave packet evolution in polyatomic
molecules have been performed upon neglect of rotational degrees of freedom, i.e. only internal coordinates
have been considered, although calculations on coherent excitation in ozone level structures with rotation
exist [116, 117], which could be interpreted in terms of wave packet evolution. A more detailed discussion of
this point will be given below for a specific example.

A3.13.4.4 CONCEPTS OF COMPUTATIONAL METHODS

There are numerous methods for solving the time dependent Schrödinger equation (A3.13.43), and some of
them were reviewed by Kosloff [118] (see also [119, 120]). Whenever projections of the evolving wave
function on the spectroscopic states are useful for the detailed analysis of the quantum dynamics (and this is
certainly the case for the detailed analysis of IVR), it is convenient to express the Hamiltonian based on
spectroscopic states $|\phi_n\rangle$:

$$\hat{H}_0 = \sum_n \frac{h}{2\pi}\,\omega_n\,|\phi_n\rangle\langle\phi_n| \tag{A3.13.48}$$

where $\omega_n$ are the eigenfrequencies. For an isolated molecule $\hat{H} = \hat{H}_0$ in equation (A3.13.43) and the time
evolution operator is of the form

$$\hat{U}(t,t_0) = \sum_n \exp\left(-\mathrm{i}\omega_n(t-t_0)\right)|\phi_n\rangle\langle\phi_n|. \tag{A3.13.49}$$

The time-dependent wave function is then given by the expression:

$$\psi(q,t) = \sum_n c_n^{0}\,\exp\left(-\mathrm{i}\omega_n(t-t_0)\right)\phi_n(q). \tag{A3.13.50}$$

Here, $\phi_n(q) = \langle q|\phi_n\rangle$ are the wave functions of the spectroscopic states and the coefficients $c_n^{0}$ are
determined from the initial conditions

$$\psi(q,t_0) = \sum_n c_n^{0}\,\phi_n(q), \qquad c_n^{0} = \langle\phi_n|\psi(t_0)\rangle. \tag{A3.13.51}$$
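
The spectral expansion lends itself to a very compact numerical sketch. Assuming that the eigenfrequencies are
given as wavenumbers in cm^-1 (so that omega_n = 2 pi c times the wavenumber) and that the initial coefficients
are known, each coefficient simply acquires a phase factor. The two-state example and the level positions used
in the usage note are hypothetical and serve only to show the time scale implied by a given level spacing.

```python
import cmath
import math

C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def propagate(c0, wavenumbers, t):
    """Coefficients c_n(t) = c_n^0 * exp(-i * omega_n * t), with t0 = 0.

    wavenumbers are term values in cm^-1, t is in seconds.
    """
    return [c * cmath.exp(-2j * math.pi * C_CM_PER_S * nu * t)
            for c, nu in zip(c0, wavenumbers)]


def survival_probability(c0, wavenumbers, t):
    """|<psi(0)|psi(t)>|^2 for the state defined by the coefficients c0."""
    ct = propagate(c0, wavenumbers, t)
    overlap = sum(a.conjugate() * b for a, b in zip(c0, ct))
    return abs(overlap) ** 2


def recurrence_time(splitting_cm):
    """Period of the two-state beat, 1 / (c * splitting), in seconds."""
    return 1.0 / (C_CM_PER_S * splitting_cm)
```

For an equal superposition of two states separated by 100 cm^-1, `recurrence_time(100.0)` is about 0.33 ps, and
the survival probability drops to zero after half of that time. Large frequency separations therefore
correspond to fast, subpicosecond evolution.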


Equation (A3.13.49) describes the spectroscopic access to quantum dynamics. Clearly, when the spectral
structure becomes too congested, i.e. when there are many close lying frequencies $\omega_n$, calculation of all
spectroscopic states becomes difficult. However, often it is not necessary to calculate all states when certain
model assumptions can be made. One assumption concerns the separation of time scales. When there is
evidence for a clear separation of time scales for IVR, only part of the spectroscopic states need to be
considered for fast evolution. Typically, these states have large frequency separations, and considering only
such states means neglecting the fine-grained spectral structure as a first approximation. An example for
separation of time scales is given by the dynamics of the alkyl CH chromophore in CHXYZ compounds,
which will be discussed below. This group spans a three-dimensional linear space of stretching and bending
vibrations. These vibrations are generally quite strongly coupled, which is manifested by the occurrence of a
Fermi resonance in the spectral structure throughout the entire vibrational energy space. As we will see, the
corresponding time evolution and IVR between these modes takes place in less than 1 ps, while other modes
become involved in the dynamics on much longer time scales (10 ps to ns, typically). The assumption for time
scale separation and IVR on the subpicosecond time scale for the alkyl CH chromophore was founded on the
basis of spectroscopic data nearly 20 years ago [98, 121]. The first results on the nature of IVR in the CH
chromophore system and its role in IR photochemistry were also reported by that time [122, 123], including
results for the acetylenic CH chromophore [124] and results obtained from first calculations of the wave
packet motion [81]. The validity of this assumption has recently been confirmed in the case of CHF₃ both
experimentally, from the highly resolved spectral structure of highly excited vibrational overtones [125, 126],
and theoretically, including all nine degrees of freedom for modestly excited vibrational overtones up to 6000
cm⁻¹ [102].

A3.13.4.5 IVR DURING AND AFTER COHERENT EXCITATION: GENERAL ASPECTS

Modern photochemistry (IR, UV or VIS) is induced by coherent or incoherent radiative excitation processes
[4, 5, 6 and 7]. The first step within a photochemical process is of course a preparation step within our
conceptual framework, in which time-dependent states are generated that possibly show IVR. In an ideal
scenario, energy from a laser would be deposited in a spatially localized, large amplitude vibrational motion
of the reacting molecular system, which would then possibly lead to the cleavage of selected chemical bonds.
This is basically the central idea behind the concepts for a 'mode selective chemistry', introduced in the late
1970s [127], and has continuously received much attention [10, 117, 122, 128, 129, 130, 131, 132, 133, 134
and 135]. In a recent review [136], IVR was interpreted as a 'molecular enemy' of possible schemes for mode
selective chemistry. This interpretation is somewhat limited, since IVR represents more complex features of
molecular dynamics [37, 84, 134], and even the opposite situation is possible. IVR can indeed be selective
with respect to certain structural features [85, 97] that may help mode selective reactive processes after
tailored laser excitation [137].

To be more specific, we assume that for a possible preparation step the Hamiltonian might be given during the
preparation time interval $[t_{-1}, t_0]$ by the expression:

$$\hat{H}(t) = \hat{H}_0 + \hat{H}_{\mathrm{I}}(t) \tag{A3.13.52}$$

where $\hat{H}_0$ is the Hamiltonian of the isolated molecule and $\hat{H}_{\mathrm{I}}$ is the interaction Hamiltonian between the
molecule and an external system. In this section, we limit the discussion to the case where the external system
is the electromagnetic radiation field. For the interaction with a classical electromagnetic field with electric
field vector $\vec{E}(t)$, the interaction Hamiltonian is given by the expression:

$$\hat{H}_{\mathrm{I}}(t) = -\hat{\vec{\mu}}\cdot\vec{E}(t) \tag{A3.13.53}$$

where $\hat{\vec{\mu}}$ is the operator of the electric dipole moment. When we treat the interaction with a classical field in
this way, we implicitly assume that the field will remain unaffected by the changes in the molecular system
under consideration. More specifically, its energy content is assumed to be constant. The energy of the
radiation field is thus not explicitly considered in the expression for the total Hamiltonian and all operators
acting on states of the field are replaced by their time-dependent expectation values. These assumptions are
widely accepted, whenever the number of photons in each field mode is sufficiently large. For a coherent, 

monochromatic, polarized field with intensity ' = v*u// J nl^l" ** 1 ^W cm *j n vacu0t> which is a typical 
value used in laser chemical experiments in the gas phase at low pressures, the number N y of mid infrared 

photons existing in a cavity of volume V= 1 nr is [138, p498"|: 
N v = W 10"'. (A 3.13.54)

Equation (A3.13.54) justifies the use of this semi-classical approximation of the molecule-field interaction
in the low-pressure regime. Since $\hat{H}_{\mathrm{I}}(t)$ is explicitly time dependent, the time evolution operator is more
complicated than in equation (A3.13.49). However, the time-dependent wave function can still be written in the
form

$$\psi(q,t) = \sum_n c_n(t)\,\phi_n(q) \tag{A3.13.55}$$

with time-dependent coefficients that are obtained by solving the set of coupled differential equations

$$\mathrm{i}\,\frac{\mathrm{d}c_n(t)}{\mathrm{d}t} = \sum_m \left(W_{nm} + V_{nm}(t)\right)c_m(t) \tag{A3.13.56}$$

where $W_{nm} = \delta_{nm}\omega_n$ ($\delta_{nm}$ is the Kronecker symbol, $\omega_n$ were defined in equation (A3.13.48)) and

$$V_{nm}(t) = \frac{2\pi}{h}\,\langle\phi_n|\hat{H}_{\mathrm{I}}(t)|\phi_m\rangle = -\frac{2\pi}{h}\,\langle\phi_n|\hat{\vec{\mu}}|\phi_m\rangle\cdot\vec{E}(t). \tag{A3.13.57}$$

The matrix elements $\langle\phi_n|\hat{\vec{\mu}}|\phi_m\rangle$ are multidimensional integrals $\int \phi_n^{*}(q)\,\vec{\mu}(q)\,\phi_m(q)\,\mathrm{d}\tau$ of the vector valued dipole
moment surface. The time-independent part of the coupling matrix elements in equation (A3.13.57) can also
be cast into the practical formula

$$\frac{V_{nm}}{2\pi c\ \mathrm{cm^{-1}}} = -0.4609\,\langle\phi_n|\mu_\alpha/\mathrm{Debye}|\phi_m\rangle\,\sqrt{I_0/\mathrm{MW\,cm^{-2}}}, \tag{A3.13.58}$$

where $\alpha$ is the direction of the electric field vector of the linearly polarized radiation field with maximal
intensity $I_0$. The solution of equation (A3.13.56) may still be quite demanding, depending on the size of the
system under consideration. However, it has become a practical routine procedure to use suitable
approximations such as the QRA (quasiresonant approximation) or Floquet treatment [35, 122, 129] and
programmes for the numerical solution are available [139, 140].
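
The structure of the coupled equations can be illustrated with a small sketch for two levels. It is not one of
the programmes cited above. It assumes a linearly polarized field with amplitude E0 related to the intensity by
I0 = c * eps0 * E0^2 / 2 in vacuo, a coupling of the form v * cos(omega_l * t), and a plain fixed-step
Runge-Kutta integrator. All function names are hypothetical. The first function evaluates the coupling
strength mu * E0 / (h c) in cm^-1, which gives about 0.46 cm^-1 for 1 Debye and 1 MW cm^-2.

```python
import math

H_PLANCK = 6.62607015e-34   # J s
C_LIGHT = 2.99792458e8      # m s^-1
EPS0 = 8.8541878128e-12     # F m^-1
DEBYE = 3.33564e-30         # C m


def coupling_wavenumber(mu_debye, intensity_mw_cm2):
    """Return mu * E0 / (h c) in cm^-1 (1 MW cm^-2 = 1e10 W m^-2)."""
    intensity = intensity_mw_cm2 * 1.0e10
    e0 = math.sqrt(2.0 * intensity / (C_LIGHT * EPS0))
    return mu_debye * DEBYE * e0 / (H_PLANCK * C_LIGHT * 100.0)


def rhs(t, c, w, v, omega_l):
    """dc/dt from i dc/dt = (W + V cos(omega_l t)) c for two levels."""
    field = v * math.cos(omega_l * t)
    return [-1j * (w[0] * c[0] + field * c[1]),
            -1j * (w[1] * c[1] + field * c[0])]


def integrate(c, w, v, omega_l, t_end, steps):
    """Fourth-order Runge-Kutta integration from t = 0 to t_end."""
    h = t_end / steps
    t = 0.0
    for _ in range(steps):
        k1 = rhs(t, c, w, v, omega_l)
        k2 = rhs(t + h / 2, [x + h / 2 * k for x, k in zip(c, k1)], w, v, omega_l)
        k3 = rhs(t + h / 2, [x + h / 2 * k for x, k in zip(c, k2)], w, v, omega_l)
        k4 = rhs(t + h, [x + h * k for x, k in zip(c, k3)], w, v, omega_l)
        c = [x + h / 6 * (a + 2 * b + 2 * d + e)
             for x, a, b, d, e in zip(c, k1, k2, k3, k4)]
        t += h
    return c
```

With resonant excitation (omega_l equal to the level spacing) and a coupling that is weak compared with the
level spacing, the population is transferred almost completely to the upper level after a time pi/v, as
expected from the quasiresonant picture. In the dimensionless example
`integrate([1 + 0j, 0j], [0.0, 100.0], 1.0, 100.0, math.pi, 20000)` the upper level population is close to one.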

A3.13.4.6 ELECTRONIC EXCITATION IN THE FRANCK-CONDON LIMIT AND IVR

At this stage we may distinguish between excitation involving different electronic states and excitation 
occurring within the same electronic (ground) state. When the spectroscopic states are located in different 
electronic states, say the ground (g) and excited (e) states, one frequently assumes the Franck-Condon 
approximation to be applicable:

Such electronic excitation processes can be made very fast with sufficiently intense laser fields. For example,
if one considers monochromatic excitation with a wavenumber in the UV region (60 000 cm⁻¹) and a
coupling strength $(\mu_{ge}E_0)/hc \approx 4000$ cm⁻¹ (e.g. $\mu_{ge} \approx 1$ Debye in equation (A3.13.59), $I \approx 50$ TW cm⁻²),
excitation occurs within 1 fs [141]. During such a short excitation time interval the relative positions of the
nuclei remain unchanged (Franck approximation). Within these approximations, if one starts the preparation
step in the vibrational ground state $|\phi_0^{g}\rangle$, the resulting state $|\psi(t_0)\rangle$ at time $t_0$ has the same probability
distribution as the vibrational ground state. However, it is now transferred into the excited electronic state
where it is no longer stationary, since it is a superposition state of vibrational eigenstates in the excited
electronic state:

$$|\psi(t_0)\rangle = \sum_n \langle\phi_n^{e}|\phi_0^{g}\rangle\,|\phi_n^{e}\rangle. \tag{A3.13.60}$$

Often the potential energy surfaces for the ground and excited states are fairly different, i.e. with significantly
different equilibrium positions. The state $|\psi(t_0)\rangle$ will then correspond to a wave packet, which has nearly a
Gaussian shape with a centre position that is largely displaced from the minimal energy configuration on the
excited surface and, since the Franck approximation can be applied, the expectation value of the nuclear linear
momentum vanishes. In a complementary view, the superposition state of equation (A3.13.60) defines the
manifold $M$ in configuration space. It is often referred to as the 'bright' state, since its probability density
defines a region in configuration space, the Franck-Condon region, which has been reached by the irradiation
field through mediation by the electric dipole operator. After the preparation step, the wave packet most likely
starts to move along the steepest descent path from the Franck-Condon region. One possibility is that it
proceeds to occupy other manifolds, which were not directly excited. The occupation of the remaining,
'dark' manifolds (e.g. $M^{*}$) by the time-dependent wave packet is a characteristic feature of IVR.

Studies of wave packet motion in excited electronic states of molecules with three and four atoms were
conducted by Schinke, Engel and collaborators, among others, mainly in the context of photodissociation
dynamics from the excited state [142, 143 and 144] (for an introduction to photodissociation dynamics, see
[7], and also more recent work [145, 146, 147, 148 and 149] with references cited therein). In these studies,
the dissociation dynamics is often described by a time-dependent displacement of the Gaussian wave packet
in the multidimensional configuration space. As time goes on, this wave packet will occupy different
manifolds (from where the molecule possibly dissociates) and this is identified with IVR. The dynamics may
be described within the Gaussian wave packet method [150], and the vibrational dynamics is then of the
classical IVR type (CIVR [84]). The validity of this approach depends on the dissociation rate on the one
hand, and the rate of delocalization of the wave packet on the other hand. The occurrence of DIVR often
receives less attention in the discussions of photodissociation dynamics mentioned above. In [148], for
instance, details of the wave packet motion by means of snapshots of the probability density are missing, but a
delocalization of the wave packet probably takes place, as may be concluded from inspection of figure 5
therein.

A3.13.5 IVR IN THE ELECTRONIC GROUND STATE: THE EXAMPLE
OF THE CH CHROMOPHORE

A3.13.5.1 REDISTRIBUTION DURING AND AFTER COHERENT EXCITATION

A system that shows IVR with very fast spreading of the wave packet, i.e. DIVR in the subpicosecond time
range, is that of the infrared alkyl CH chromophore, which will be used in the remaining part of this chapter to
discuss IVR as a result of a mode specific excitation within the electronic ground state. The CH stretching and
bending modes of the alkyl CH chromophore in CHXYZ compounds are coupled by a generally strong Fermi
resonance [100, 151]. Figure A3.13.4 shows the shape of the potential energy surface for the symmetrical
compound CHD3 as contour line representations of selected one- and two-dimensional sections (see figure
caption for a detailed description). The important feature is the curved shape of the $V(Q_s, Q_{b1})$ potential
section ($V(Q_s, Q_{b2})$ being similarly curved), which indicates a rather strong anharmonic coupling. This
feature is characteristic for compounds of the type CHXYZ [84, 100, 111, 152 and 153]. $Q_s$, $Q_{b1}$ and
$Q_{b2}$ are (mass weighted) normal coordinates of the CH stretching and bending motion, with symmetry
$A_1$ and $E$, respectively, in the $C_{3v}$ point group of symmetrical CHD3. A change of $Q_s$ is a concerted
motion of all atoms along the z-axis, defined in figure A3.13.5. However, displacements along $Q_s$ are small
for the carbon and deuterium atoms, and large for the hydrogen atom. Thus, this coordinate essentially
describes a stretching motion of the CH bond (along the z-axis). In the same way, $Q_{b1}$ and $Q_{b2}$ describe
bending motions of the CH bond along the x- and y-axis, respectively (see figure A3.13.5). In the
one-dimensional sections the positions of the corresponding spectroscopic states are drawn as horizontal
lines. On the left-hand side, in the bending potential section, a total of 800 states up to an energy equivalent
wave number of 25 000 cm$^{-1}$ has been considered. These energy levels may be grouped into
semi-isoenergetic shells defined by multiplets of states with a constant chromophore quantum number
$N = v_s + \tfrac{1}{2} v_b = 0, \tfrac{1}{2}, 1, \tfrac{3}{2}, 2, \ldots$, where $v_s$ and $v_b$ are quantum
numbers of effective basis states ('Fermi modes' [97, 152, 154]) that are strongly coupled by a 2:1
Fermi resonance. These multiplets give rise to spectroscopic polyads and can be well distinguished in the
lower energy region, where the density of states is low.
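
The grouping of 'Fermi modes' into multiplets can be made concrete with a short enumeration. The following is a minimal sketch, assuming that a multiplet is labelled only by the pairs $(v_s, v_b)$ with $v_s + v_b/2 = N$; sublevels arising from the bending vibrational angular momentum are deliberately ignored, and the function name `polyad_states` is hypothetical.

```python
from fractions import Fraction


def polyad_states(n_polyad):
    """Return all (v_s, v_b) pairs with v_s + v_b / 2 equal to n_polyad.

    Assumption: angular momentum sublevels of the bend are not counted.
    """
    n_polyad = Fraction(n_polyad)
    states = []
    v_s = 0
    while v_s <= n_polyad:
        v_b = 2 * (n_polyad - v_s)      # bending quanta needed to reach N
        if v_b.denominator == 1:        # keep integer bending quantum numbers only
            states.append((v_s, int(v_b)))
        v_s += 1
    return states


if __name__ == "__main__":
    for n in (Fraction(1, 2), 1, Fraction(3, 2), 6):
        print(n, polyad_states(n))
```

Under this assumption a multiplet with integer N contains N + 1 basis states, and each step down in $v_s$ is compensated by two additional bending quanta, which is the 2:1 resonance condition.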

Figure A3.13.4. Potential energy cuts along the normal coordinate subspace pertaining to the CH
chromophore in CHD3. $Q_{b1}$ is the $A'$ coordinate in $C_s$ symmetry, essentially changing structure along
the x-axis (see also figure A3.13.5), and $Q_{b2}$ is the $A''$ coordinate, essentially changing structure along
the y-axis. Contour lines show equidistant energies at wave number differences of 3000 cm$^{-1}$ up to
30 000 cm$^{-1}$. The upper curves are one-dimensional cuts along $Q_{b2}$ (left) and $Q_s$ (right). The dashed
curves in the two upper figures show harmonic potential curves (from [154]).

Figure A3.13.5. Coordinates and axes used to describe the wave packet dynamics of the CH chromophore in
CHX3 or CHXYZ compounds.

In the potential section $V(Q_s)$, shown on the right-hand side of figure A3.13.4, the subset of $A_1$ energy
states is drawn. This subset contains only multiplets with integer values of the chromophore quantum number
$N = 0, 1, 2, \ldots$. This reduction allows for an easier visualization of the multiplet structure and also
represents the subset of states that are strongly coupled by the parallel component of the electric dipole
moment (see discussion in the following paragraph). The excitation dynamics of the CH chromophore along
the stretching manifold can indeed be well described by restriction to this subset of states [97, 154].

Excitation specificity is a consequence of the shape of the electric dipole moment surface. For the alkyl CH
chromophore in CHX3 compounds, the parallel component of the dipole moment, i.e. the component parallel
to the symmetry axis, is a strongly varying function of the CH stretching coordinate, whereas it changes little
along the bending manifolds [155, 156]. Excitation along this component will thus preferentially prepare
superposition states lying along the stretching manifold. These states constitute the 'bright' manifold in this
example. The remaining states define the 'dark' manifolds, and any substantial population of these states
during or after such an excitation process can be directly linked to the existence of IVR. On the other hand,
the perpendicular components of the dipole moment vector are strongly varying functions of the bending
coordinates. For direct excitation along one of these components, states belonging to the bending manifolds
become the 'bright' states and any appearance of a subsequent stretching motion can be interpreted as arising
from IVR.

The following discussion shall illustrate our understanding of structural changes along 'dark' manifolds in
terms of wave packet motion as a consequence of IVR. Figure A3.13.6 shows the evolution of the wave
packet for the CH chromophore in CHF3 during the excitation step along the parallel (stretching) coordinate
[97]. The potential surface in the CH chromophore subspace is similar to that for CHD3 (figure A3.13.4
above), with a slightly more curved form in the stretching-bending representation (figures are shown in [97,
151]). The laser is switched on at a given time $t_{-1}$, running thereafter as a continuous, monochromatic
irradiation up to time $t_0$, when it is switched off. Thus, the electric field vector is that of a monochromatic
wave restricted to the interval between $t_{-1}$ and $t_0$ by means of the Heaviside unit step function $h(t)$;
$E_0$ is the amplitude of the electric field vector and $\omega_L = 2\pi c \tilde{\nu}_L$ its angular frequency.
Excitation parameters are the irradiation intensity $I = 30$ TW cm$^{-2}$, which corresponds to a maximal
electric field strength $E_0 \approx 3.4 \times 10^{10}$ V m$^{-1}$, and wave number
$\tilde{\nu}_L = 2832.42$ cm$^{-1}$, which lies in the region of the fundamental for the CH stretching
vibration (see arrows in the potential cut $V(Q_s)$ of figure A3.13.4). The figure shows snapshots of the time
evolution of the wave packet between 50 and 70 fs after the beginning of the irradiation ($t_{-1} = 0$ here). On
the left-hand side, contour maps of the time-dependent, integrated probability density

$$ |\psi(Q_s, Q_b, t)|^2 = \int |\psi(Q_s, Q_b, \varphi_b, t)|^2 \, d\varphi_b \qquad \text{(A3.13.62)} $$

are shown, where $Q_s$ is the coordinate for the stretching motion and $Q_b = \sqrt{Q_{b1}^2 + Q_{b2}^2}$,
$\varphi_b = \arctan(Q_{b2}/Q_{b1})$ are polar representations of the bending coordinates $Q_{b1}$ and
$Q_{b2}$. Additionally, contour curves of the potential energy surface are drawn at the momentary energy of
the wave packet. This energy is defined as:

$$ E(t) = \sum_n p_n(t)\, E_n \qquad \text{(A3.13.63)} $$

where

$$ p_n(t) = c_n^*(t)\, c_n(t) \qquad \text{(A3.13.64)} $$

are the time-dependent populations of the spectroscopic states during the preparation step (the complex
coefficients $c_n(t)$ in equation (A3.13.64) are calculated according to equation (A3.13.55), the spectroscopic
energies $E_n$ are defined in equation (A3.13.48); the dashed curves indicate the quantum mechanical
uncertainty which arises from the superposition of molecular eigenstates). The same evolution is repeated on
the right-hand side of the figure as a three-dimensional representation.
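
As an illustration of equations (A3.13.63) and (A3.13.64), the momentary energy and its spread can be evaluated directly from a set of expansion coefficients. This is a minimal sketch, assuming normalized coefficients and assuming that the quoted 'uncertainty' is the standard deviation of the energy distribution; the function names and the two-state example values are hypothetical and are not taken from [97].

```python
import math


def populations(coeffs):
    """p_n = c_n^* c_n for each complex expansion coefficient."""
    return [abs(c) ** 2 for c in coeffs]


def mean_energy(coeffs, energies):
    """E(t) = sum_n p_n(t) E_n, in the units of `energies`."""
    return sum(p * e for p, e in zip(populations(coeffs), energies))


def energy_uncertainty(coeffs, energies):
    """Standard deviation of the energy distribution (assumed definition)."""
    pops = populations(coeffs)
    mean = sum(p * e for p, e in zip(pops, energies))
    variance = sum(p * (e - mean) ** 2 for p, e in zip(pops, energies))
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical equal superposition of two states separated by 3000 cm^-1.
    coeffs = [1 / math.sqrt(2), 1j / math.sqrt(2)]
    energies = [0.0, 3000.0]
    print(mean_energy(coeffs, energies), energy_uncertainty(coeffs, energies))
```

The relative phases of the coefficients drop out of both quantities; they matter only for the shape of the probability density in configuration space.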


[Figure A3.13.6: four snapshot panels, labelled 50.0, 54.0, 58.0 and 62.0 fs.]

Figure A3.13.6. Time evolution of the probability density of the CH chromophore in CHF3 after 50 fs of
irradiation with an excitation wave number $\tilde{\nu}_L = 2832.42$ cm$^{-1}$ at an intensity $I = 30$ TW
cm$^{-2}$. The contour lines of equiprobability density in configuration space have values
$2 \times 10^{-5}$ u$^{-1}$ pm$^{-2}$ for the lowest line shown and distances between the lines of 24, 15, 29
and $20 \times 10^{-5}$ u$^{-1}$ pm$^{-2}$ in the order of the four images shown. The averaged energy of the
wave packet corresponds to 6000 cm$^{-1}$ (roughly 3100 cm$^{-1}$ absorbed) with a quantum mechanical
uncertainty of ±3000 cm$^{-1}$ (from [97]).

In the treatment adopted in [92], the motion of the CF3 frame is implicitly considered in the dynamics of the
normal modes. Indeed, the integrand $|\psi(Q_s, Q_b, \varphi_b, t)|^2$ in equation (A3.13.62) is to be
interpreted as probability density for the change of the CHF3 structure in the subspace of the CH
chromophore, as defined by the normal coordinates $Q_s$, $Q_{b1}$ and $Q_{b2}$, irrespective of the molecular
structure and its change in the remaining space. This interpretation is also valid beyond the harmonic
approximation, as long as the structural change in the CH chromophore space can be dynamically separated
from that of the rest of the molecule. The assumption of dynamical separation is well confirmed, both from
experiment and theory, at least during the first 1000 fs of motion of the CH chromophore.

When looking at the snapshots in figure A3.13.6 we see that the position of maximal probability oscillates
back and forth along the stretching coordinate between the walls at $Q_s = -20$ and $+25\ \sqrt{\mathrm{u}}$ pm,
with an approximate period of 12 fs, which corresponds to the classical oscillation period $\tau = 1/\nu$ of a
pendulum with a frequency $\nu = c\tilde{\nu} \approx 8.5 \times 10^{13}$ s$^{-1}$ and wave number
$\tilde{\nu} = 2850$ cm$^{-1}$. Indeed, the motion of the whole wave packet approximately follows this
oscillation and, when it does so, the wave packet motion is semiclassical. In harmonic potential wells the
motion of the wave packet is always semiclassical [157, 158 and 159]. However, since the potential surface of
the CH chromophore is anharmonic, some gathering and spreading out of the wave packet is observable on
top of the semiclassical motion. It is interesting to note that, at this 'initial' stage of the excitation step, the
motion of the wave packet is nearly semiclassical, though with modest amplitudes
of the oscillations, despite the anharmonicity of the stretching potential.
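
The quoted period follows from the relation $\tau = 1/\nu = 1/(c\tilde{\nu})$. The following short sketch reproduces the arithmetic; the function name `classical_period_fs` is hypothetical, and only the wave number 2850 cm$^{-1}$ is taken from the text.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def classical_period_fs(wavenumber_cm):
    """Oscillation period in femtoseconds for a wave number given in cm^-1."""
    frequency_hz = C_CM_PER_S * wavenumber_cm   # nu = c * nu_tilde
    return 1.0e15 / frequency_hz                # tau = 1 / nu, converted to fs


if __name__ == "__main__":
    print(round(classical_period_fs(2850.0), 2))  # CH stretching region
```

For 2850 cm$^{-1}$ this gives a frequency of about $8.5 \times 10^{13}$ s$^{-1}$ and a period close to 11.7 fs, consistent with the approximate 12 fs read from the snapshots.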

The later time evolution is shown in figure A3.13.7, between 90 and 100 fs, and in figure A3.13.8, between
390 and 400 fs, after the beginning of the excitation (time step $t_{-1}$). Three observations are readily made:
first, the amount of energy absorbed by the chromophore has increased, from 3000 cm$^{-1}$ in figure
A3.13.6, to 6000 cm$^{-1}$ in figure A3.13.7 and 12 000 cm$^{-1}$ in figure A3.13.8. Second, the initially
semiclassical motion has been replaced by a more irregular motion of probability density, in which the
original periodicity is hardly visible. Third, the wave packet starts to occupy nearly all of the energetically
available region in configuration space, thus escaping from the initial, 'bright' manifolds into the 'dark'
manifolds. From these observations, the following conclusions may be directly drawn: IVR of the CH
chromophore in fluoroform is fast (in the subpicosecond time scale); IVR sets in already during the excitation
process, i.e. when an external force field is driving the molecular system along a well prescribed path in
configuration space (the 'bright' manifold); IVR is of the delocalization type (DIVR). Understanding these
observations is central for the understanding of IVR and they are discussed as follows:

(a) A more detailed analysis of quantum dynamics shows that the molecular system, represented by the group
of vibrations pertaining to the CH chromophore in this example, absorbs continuously more energy as time
goes on. Let the absorbed energy be $E_{abs} = N_{\nu,abs}\,(h/2\pi)\,\omega_L$, where $N_{\nu,abs}$ is the mean
number of absorbed photons. Since the carrier frequency of the radiation field is kept constant at a value close
to the fundamental of the stretching oscillation, $\omega_L \approx \omega_{N=1} - \omega_{N=0}$ ($N$ being the
chromophore quantum number here), this means that the increase in absorbed energy is a consequence of the
stepwise multiphoton excitation process, in which each vibrational level serves as a new starting level for
further absorption of light after it has itself been significantly populated. This process is schematically
represented, within the example for CHD3, by the sequence of upright arrows shown on the right-hand side of
figure A3.13.4. $N_{\nu,abs}$ is thus a smoothly increasing function of time.

Figure A3.13.7. Continuation of the time evolution for the CH chromophore in CHF3 after 90 fs of irradiation
(see also figure A3.13.6). Distances between the contour lines are 10, 29, 16 and
$9 \times 10^{-5}$ u$^{-1}$ pm$^{-2}$ in the order of the four images shown. The averaged energy of the wave
packet corresponds to 9200 cm$^{-1}$ (roughly 6300 cm$^{-1}$ absorbed) with a quantum mechanical
uncertainty of ±5700 cm$^{-1}$ (from [97]).

(b) The disappearance of the semiclassical type of motion and, thus, the delocalization of the wave packet, is
understood to follow the onset of dephasing. With increasing energy, the effective anharmonic couplings
between the 'bright' stretching mode and the 'dark' bending modes and the diagonal anharmonicity of the
'bright' mode both increase. The larger the anharmonicity, the larger the deviation from a purely harmonic
behaviour, in which the wave packet keeps on moving in a semiclassical way. In quantum mechanics, the
increase in anharmonicity of an oscillator leads to an effective broadening $\Delta\nu_{eff} > 0$ in the
distribution of frequencies of high-probability transitions; for transitions induced by the electric dipole
operator these are usually those with a difference of ±1 in the oscillator quantum number (for the harmonic
oscillator $\Delta\nu_{eff} = 0$). On the other hand, these are the transitions which play a major role in the
stepwise multiphoton excitation of molecular vibrations. A broadening of the frequency distribution
invariably leads to a broadening of the distribution of relative phases of the time-dependent coefficients
$c_n(t)$ in equation (A3.13.55). Although the sum in equation (A3.13.55) is entirely coherent, one might
introduce an effective coherence time defined by:

$$ \tau_{c,eff} = 1/\Delta\nu_{eff} \qquad \text{(A3.13.65)} $$

For the stretching oscillations of the CH chromophore in CHF3, $\tau_{c,eff} \approx 100$ fs. Clearly, typical
coherence time ranges depend on both the molecular parameters and the effectively absorbed amount of
energy during the excitation step, which in turn depends on the coupling strength of the molecule-radiation
interaction. A more detailed study of the dispersion of the wave packet and its relationship with decoherence
effects was carried out in [106]. In [97] an excitation process has been studied for the model of two
anharmonically coupled, resonant harmonic oscillators (i.e. with at least one cubic coupling term) but under
similar conditions as for the CH chromophore in fluoroform discussed here. When the cubic coupling
parameter is chosen to be very small compared with the diagonal parameters of the Hamiltonian matrix, the
motion of the wave packet is indeed semiclassical for very long times (up to 600 ps) and, moreover, the wave
packet does probe the bending manifold without significantly changing its initial shape. This means that,
under appropriate conditions, IVR can also be of the classical type within a quantum mechanical treatment of
the dynamics. Such conditions require, for instance, that the band width $\Delta\nu_{eff}$ be smaller than the
resonance width (power broadening) of the excitation process.

(c) The third observation, that the wave packet occupies nearly all of the energetically accessible region in
configuration space, has a direct impact on the understanding of IVR as a rapid promoter of microcanonical
equilibrium conditions. Energy equipartition preceding a possible chemical reaction is the main assumption in
quasiequilibrium statistical theories of chemical reaction dynamics ('RRKM' theory [161, 162 and 163],
'transition state' theory [164, 165] but also within the 'statistical adiabatic channel model' [76, 77]; see also
chapter A3.12 and further recent reviews on varied and extended forms of statistical theories in [25, 166, 167,
168, 169, 170, 171 and 172]). In the case of CHF3 one might conclude from inspection of the snapshots at the
later stage of the excitation dynamics (see figure A3.13.8) that after 400 fs the wave packet delocalization is
nearly complete. Moreover, this delocalization arises here from a fully coherent, isolated evolution of a
system consisting of one molecule and a coherent radiation field (laser). Of course, within the common
interpretation of the wave packet as a probability distribution in configuration space, this result means that, for
an ensemble of identically prepared molecules, vibrational motion is essentially delocalized at this stage and
vibrational energy is nearly equipartitioned.
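
The dephasing argument of item (b) can be illustrated with a one-dimensional toy model. This is a minimal sketch and not the calculation of [97]: it assumes a single oscillator with term values $\tilde{\nu}(n + 1/2) - x(n + 1/2)^2$, Poisson-distributed populations, and hypothetical parameters (3000 cm$^{-1}$, $x$ = 60 cm$^{-1}$, a mean of two quanta). It compares the autocorrelation $|\langle\psi(0)|\psi(t)\rangle|^2$ after one harmonic period with and without anharmonicity.

```python
import cmath
import math

C_CM_PER_FS = 2.99792458e-5  # speed of light in cm per femtosecond


def level_energies(wavenumber, anharmonicity, n_max):
    """Term values (cm^-1) of a toy anharmonic oscillator."""
    return [wavenumber * (n + 0.5) - anharmonicity * (n + 0.5) ** 2
            for n in range(n_max + 1)]


def poisson_populations(mean_quanta, n_max):
    """Truncated, renormalized Poisson populations (an assumed initial state)."""
    weights = [math.exp(-mean_quanta) * mean_quanta ** n / math.factorial(n)
               for n in range(n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def autocorrelation(pops, energies_cm, time_fs):
    """|sum_n p_n exp(-2 pi i c E_n t)|^2 for a freely evolving superposition."""
    amplitude = sum(p * cmath.exp(-2j * math.pi * C_CM_PER_FS * e * time_fs)
                    for p, e in zip(pops, energies_cm))
    return abs(amplitude) ** 2


if __name__ == "__main__":
    pops = poisson_populations(2.0, 12)
    period_fs = 1.0 / (C_CM_PER_FS * 3000.0)
    for x in (0.0, 60.0):
        energies = level_energies(3000.0, x, 12)
        print(x, autocorrelation(pops, energies, period_fs))
```

In the harmonic case all phases realign after each period and the packet recurs exactly; with a non-zero anharmonicity the transition frequencies differ from level to level and the recurrence is visibly reduced already after one period.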

Figure A3.13.8. Continuation of the time evolution for the CH chromophore in CHF3 after 392 fs of
irradiation (see also figure A3.13.6 and figure A3.13.7). Distances between the contour lines are 14, 12, 13
and $14 \times 10^{-5}$ u$^{-1}$ pm$^{-2}$ in the order of the four images shown. The averaged energy of the
wave packet corresponds to 15 000 cm$^{-1}$ (roughly 12 100 cm$^{-1}$ absorbed) with a quantum mechanical
uncertainty of ±5800 cm$^{-1}$ (from [97]).
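
The absorbed energies quoted in the captions of figures A3.13.6 to A3.13.8 can be converted into mean numbers of absorbed photons with the relation $E_{abs} = N_{\nu,abs}(h/2\pi)\omega_L$ from item (a). Expressed in wave numbers this reduces to a simple ratio. The sketch below only does this arithmetic; the function name is hypothetical.

```python
LASER_WAVENUMBER_CM = 2832.42  # excitation wave number from the text, in cm^-1


def mean_absorbed_photons(absorbed_cm, laser_cm=LASER_WAVENUMBER_CM):
    """N_abs = E_abs / (h c nu_L), with both energies given as wave numbers."""
    return absorbed_cm / laser_cm


if __name__ == "__main__":
    for figure, absorbed in (("A3.13.6", 3100.0),
                             ("A3.13.7", 6300.0),
                             ("A3.13.8", 12100.0)):
        print(figure, round(mean_absorbed_photons(absorbed), 2))
```

The three captions thus correspond to roughly 1.1, 2.2 and 4.3 absorbed photons on average, in line with a stepwise multiphoton process in which the mean photon number grows smoothly with time.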

However, the wave packet does not occupy all of the energetically accessible region. A more detailed analysis
of populations [97, table IV] reveals that, during the excitation process, the absorbed energy is
inhomogeneously distributed among the set of molecular eigenstates of a given energy shell (such a shell is
represented by all nearly iso-energetic states belonging to one of the multiplets shown on the right-hand side
of figure A3.13.4). Clearly, equipartition of energy is attained if all states of an energy shell are equally
populated. The microcanonical probability distribution in configuration space may then be represented by a
typical member of the microcanonical ensemble, defined e.g. by the wave function

$$ \psi_{micro} = \frac{1}{\sqrt{N_{shell}}} \sum_{n \in shell} \exp(i\varphi_n^{random})\, \phi_n \qquad \text{(A3.13.66)} $$

where $N_{shell}$ denotes the number of nearly iso-energetic states $\phi_n$ of a shell and $\varphi_n^{random}$
is a random phase. Such a state is shown in figure A3.13.9. When comparing this state with the state generated
by multiphoton excitation, the two different kinds of superposition that lead to these wave packets must, of
course, be distinguished. In the stepwise multiphoton excitation, the time evolved wave packet arises from a
superposition of many states in several multiplets (with roughly constant averaged energy after some
excitation time and a large energy uncertainty). The microcanonical distribution is given by the superposition
of states in a single multiplet (of the same averaged energy but much smaller energy uncertainty). In the case
of the CH chromophore in CHF3 studied in this example, the distribution of populations within a molecular
energy shell is not homogeneous during the excitation process because the multiplets are not ideally centred at
the multiphoton resonance levels and their energy range is effectively too large when compared to the
resonance width of the excitation process (power broadening). If molecular energy shells fall entirely within
the resonance width of the excitation, such as in the model systems of two harmonic oscillators studied in
[97], population distribution within a shell becomes more homogeneous [97, table V]. However, as discussed
in that work, equidistribution of populations does not imply that the wave packet is delocalized. Indeed, the
contrary was shown to occur. If the probability distribution in configuration space is to delocalize, the relative
phases between the superposition states must follow an irregular evolution, such as in a random phase
ensemble, in addition to equidistribution of population. One statement would therefore be that IVR is
not complete, although very fast, during the multiphoton excitation of CHF3. Excitation and redistribution are
indeed two concurrent processes. In the limit of weak field excitation, in the spectroscopic regime, the result
is a superposition of essentially two eigenstates (the ground and an excited state, for instance). Within the
'bright' state concept, strong IVR will be revealed by an instantaneous delocalization of probability density,
both in the 'bright' and the 'dark' manifolds, as soon as the excited state is populated, because the excited state
is, of course, a superposition state of states from both manifolds. On the other hand, strong field stepwise IR
multiphoton excitation promotes, in a first step, the deposition of energy in a spatially localized,
time-dependent molecular structure. Simultaneously, IVR starts to induce redistribution of this energy among
other modes. The redistribution becomes apparent after some time has passed and is expected to be of the
DIVR type, at least on longer time scales. DIVR may lead to a complete redistribution in configuration space,
if the separation between nearly iso-energetic states is small compared to the power broadening of the
excitation field. However, under such conditions, at least during an initial stage of the dynamics, CIVR will
dominate.
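
A typical member of the microcanonical ensemble in the sense of equation (A3.13.66) can be generated numerically by attaching random phases to equal-magnitude coefficients. The following is a minimal sketch, assuming phases drawn uniformly from the interval between 0 and 2π; the function name and the use of a seeded generator are illustrative choices, not part of the original treatment.

```python
import cmath
import math
import random


def random_phase_coefficients(n_shell, rng=None):
    """Coefficients exp(i * phase_n) / sqrt(n_shell) with uniformly random phases."""
    rng = rng or random.Random()
    norm = 1.0 / math.sqrt(n_shell)
    return [norm * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(n_shell)]


if __name__ == "__main__":
    coeffs = random_phase_coefficients(7, random.Random(0))   # N = 6 multiplet
    print([round(abs(c) ** 2, 4) for c in coeffs])            # equal populations
    print(round(sum(abs(c) ** 2 for c in coeffs), 12))        # normalization
```

All populations equal $1/N_{shell}$ by construction, so the only difference between members of the ensemble, and between such a member and a coherently propagated state with equal populations, lies in the relative phases.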


Figure A3.13.9. Probability density of a microcanonical distribution of the CH chromophore in CHF3 within
the multiplet with chromophore quantum number $N = 6$ ($N_{shell} = N + 1 = 7$). Representations in
configuration space of stretching ($Q_s$) and bending ($Q_b$) coordinates (see text following equation
(A3.13.62) and figure A3.13.10). Left-hand side: typical member of the microcanonical ensemble of the
multiplet with $N = 6$ (random phases, equation (A3.13.66)). Right-hand side: microcanonical density for
the multiplet with $N = 6$ ($N_{shell} = 7$). Adapted from [81].

In view of the foregoing discussion, one might ask what is a typical time evolution of the wave packet for the
isolated molecule, what are typical time scales and, if initial conditions are such that an entire energy shell
participates, does the wave packet resulting from the coherent dynamics look like a microcanonical
distribution? Such studies were performed for the case of an initially pure stretching 'Fermi mode'
($v_s$, $v_b = 0$), with a high stretching quantum number, e.g. $v_s = 6$. It was assumed that such a state might
be prepared by irradiation with some hypothetical laser pulse, without specifying details of the pulse. The
energy of that state is located at the upper end of the energy range of the corresponding multiplet [81, 152,
154], which has a total of $N_{shell} = 7$ states. Such a state couples essentially to all remaining states of that
multiplet. The corresponding evolution of the isolated system is shown as snapshots after the preparation step
($t_0 = 0$) in figure A3.13.10. The wave packet starts to spread out from the initially occupied stretching
manifold (along the coordinate axis denoted by $Q_s$) into the bending manifold ($Q_b$) within the first
30-45 fs of evolution (left-hand side). Later on, it remains delocalized most of the time (as shown at the time
steps 80, 220 and 380 fs, on the right-hand side) with exceptional partial recovery of the initial conditions at
some isolated times (such as at 125 fs). The shape of the distribution at 220 fs is very similar to that of a
typical member of the microcanonical ensemble in figure A3.13.9 above. However, in figure A3.13.9, the
relative phases between the seven superposition states were drawn from a random number generator, whereas
in figure A3.13.10 they result from a fully coherent
and deterministic propagation of a wave function.
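
The qualitative behaviour described here, a pure stretching state spreading over the other members of its multiplet under fully coherent evolution, can be mimicked with a small model. This is a toy sketch under stated assumptions and not the Hamiltonian of [81]: the seven basis states $|v_s, v_b = 2(N - v_s)\rangle$ of the $N = 6$ multiplet are coupled by matrix elements that scale like those of $Q_s Q_b^2$ for a one-dimensional harmonic bend, the diagonal detuning (100 cm$^{-1}$ per step) and coupling constant (100 cm$^{-1}$) are hypothetical, and the propagator is a truncated Taylor series.

```python
import math

RAD_PER_FS_PER_CM = 2.0 * math.pi * 2.99792458e-5  # converts cm^-1 to rad/fs


def polyad_hamiltonian(n_polyad, detuning_cm, coupling_cm):
    """Toy tridiagonal Hamiltonian (cm^-1); index j = quanta moved into the bend."""
    size = n_polyad + 1
    h = [[0.0] * size for _ in range(size)]
    for j in range(size):
        v_s, v_b = n_polyad - j, 2 * j
        h[j][j] = -detuning_cm * j          # hypothetical linear detuning
        if j + 1 < size:
            w = 0.5 * coupling_cm * math.sqrt(0.5 * v_s * (v_b + 1) * (v_b + 2))
            h[j][j + 1] = h[j + 1][j] = w   # assumed Fermi resonance scaling
    return h


def step(psi, h, dt_fs, order=12):
    """Apply exp(-i H dt) to psi via a Taylor series truncated at `order`."""
    result, term = list(psi), list(psi)
    for k in range(1, order + 1):
        factor = -1j * dt_fs * RAD_PER_FS_PER_CM / k
        term = [factor * sum(row[m] * term[m] for m in range(len(term)))
                for row in h]
        result = [r + t for r, t in zip(result, term)]
    return result


def survival_probability(h, n_steps, dt_fs):
    """Population of the initial pure stretching state at each time step."""
    psi = [1.0 + 0.0j] + [0.0j] * (len(h) - 1)
    history = [1.0]
    for _ in range(n_steps):
        psi = step(psi, h, dt_fs)
        history.append(abs(psi[0]) ** 2)
    return history, psi


if __name__ == "__main__":
    history, _ = survival_probability(polyad_hamiltonian(6, 100.0, 100.0), 800, 0.5)
    for index in (0, 60, 160, 440, 760):
        print(f"{index * 0.5:6.1f} fs  {history[index]:.3f}")
```

The propagation conserves the norm and contains no random element; any resemblance of the resulting populations to a statistical distribution arises solely from the deterministic phase evolution, which is the point made in the comparison of figures A3.13.9 and A3.13.10.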

Figure A3.13.10. Time-dependent probability density of the isolated CH chromophore in CHF3. Initially, the
system is in a 'Fermi mode' with six quanta of stretching and zero of bending motion. The evolution occurs
within the multiplet with chromophore quantum number $N = 6$ ($N_{shell} = N + 1 = 7$). Representations are
given in the configuration space of stretching ($Q_s$) and bending ($Q_b$) coordinates (see text following
equation (A3.13.62): $Q_b$ is strictly a positive quantity, and there is always a node at $Q_b = 0$; the mirrored
representation at $Q_b = 0$ is artificial and serves to improve visualization). Adapted from [81].


IVR in the example of the CH chromophore in CHF3 is thus at the origin of a redistribution process which is,
despite its coherent nature, of a statistical character. In CHD3, the dynamics after excitation of the stretching
manifold reveals a less complete redistribution process in the same time interval [97]. The reason for this is a
smaller effective coupling constant $k'_{sbb}$ between the 'Fermi modes' of CHD3 (by a factor of four) when
compared to that of CHF3. In [97] it was shown that redistribution in CHD3 becomes significant in the
picosecond time scale. However, on that time scale, the dynamical separation of time scales is probably no
longer valid and couplings to modes pertaining to the space of CD3 vibrations may become important and
have additional influence on the redistribution process.

A3.13.5.2 IVR AND TIME-DEPENDENT CHIRALITY

IVR in the CH chromophore system may also arise from excitation along the bending manifolds. Bending
motions in polyatomic molecules are of great importance as primary steps for reactive processes involving
isomerization and similar, large amplitude changes of internal molecular structure. At first sight, the
one-dimensional section of the potential surface along the out-of-plane CH bending normal coordinate in
CHD3, shown in figure A3.13.4, is clearly less anharmonic than its one-dimensional stretching counterpart,
also shown in that figure, even up to energies in the wave number region of 30 000 cm$^{-1}$. This suggests
that coherent sequential multiphoton excitation of a CH bending motion, for instance along the x-axis in
figure A3.13.5, may induce a quasiclassical motion of the wave packet along that manifold [159, 160], which
is significantly longer lived than the motion induced along the stretching manifold under similar conditions
(see discussion above). Furthermore, the two-dimensional section in the CH bending subspace, spanned by the
normal coordinates in the lower part of figure A3.13.4, is approximately isotropic. This corresponds to an
almost perfect $C_{\infty v}$ symmetry with respect to the azimuthal angle $\varphi$ (in the xy plane of figure
A3.13.5), and is related to the approximate conservation of the bending vibration angular momentum
$\ell_b$ [152, 173]. This implies that the direct anharmonic coupling between the degenerate bending
manifolds is weak. However, IVR between these modes might be mediated by the couplings to the stretching
mode. An interesting question is then to what extent such a coupling scheme might lead to a motion of the
wave packet with quasiclassical exchange of vibrational energy between the two bending manifolds,
following paths which could be described by classical vibrational mechanics, corresponding to CIVR.
Understanding quasiclassical exchange mechanisms of large amplitude vibrational motion opens one
desirable route of exerting control over molecular vibrational motion and reaction dynamics. In [154] these
questions were investigated by considering the CH bending motion in CHD3 and the asymmetric isotopomers
CHD2T and CHDT2. The isotopic substitution was investigated with the special goal of a theoretical study of
the coherent generation of dynamically chiral, bent molecular structures [174] and of the following time
evolution. It was shown that IVR is at the origin of a coherent racemization dynamics which is superimposed
on a very fast, periodic exchange of left- and right-handed chiral structures ('stereomutation' reaction, period
of roughly 20 fs, comparable to the period of the bending motion) and sets in after typically 300-500 fs. The
main results are reviewed in the discussion of figures A3.13.11, A3.13.12 and A3.13.13.

Figure A3.13.11. Illustration of the time evolution of reduced two-dimensional probability densities
$|\psi_{bb}|^2$ and $|\psi_{sb}|^2$, for the excitation of CHD3 between 50 and 70 fs (see [154] for further
details). The full curve is a cut of the potential energy surface at the momentary absorbed energy
corresponding to 3000 cm$^{-1}$ during the entire time interval shown here ($\approx$6000 cm$^{-1}$, if zero
point energy is included). The dashed curves show the energy uncertainty of the time-dependent wave packet,
approximately 500 cm$^{-1}$. Left-hand side: excitation along the x-axis (see figure A3.13.5). The vertical axis
in the two-dimensional contour line representations is the $Q_{b1}$-axis, the horizontal axes are $Q_{b2}$ and
$Q_s$, for $|\psi_{bb}|^2$ and $|\psi_{sb}|^2$, respectively. Right-hand side: excitation along the y-axis, but
with the field vector pointing into the negative y-axis. In the two-dimensional contour line representations,
the vertical axis is the $Q_{b2}$-axis, the horizontal axes are $Q_{b1}$ and $Q_s$, for $|\psi_{bb}|^2$ and
$|\psi_{sb}|^2$, respectively. The lowest contour line has the value $44 \times 10^{-5}$ u$^{-1}$ pm$^{-2}$,
the distance between them is $7 \times 10^{-5}$ u$^{-1}$ pm$^{-2}$. Maximal values are nearly constant for all
the images in this figure and correspond to $140 \times 10^{-5}$ u$^{-1}$ pm$^{-2}$ for $|\psi_{bb}|^2$ and
$180 \times 10^{-5}$ u$^{-1}$ pm$^{-2}$ for $|\psi_{sb}|^2$.

The wave packet motion of the CH chromophore is represented by simultaneous snapshots of
two-dimensional representations of the time-dependent probability density distribution

$$ |\psi_{sb}(t, Q_s, Q_{b1})|^2 = \int dQ_{b2}\, |\psi(t, Q_s, Q_{b1}, Q_{b2})|^2 \qquad \text{(A3.13.67)} $$

and

$$ |\psi_{bb}(t, Q_{b1}, Q_{b2})|^2 = \int dQ_s\, |\psi(t, Q_s, Q_{b1}, Q_{b2})|^2 . \qquad \text{(A3.13.68)} $$

Such a sequence of snapshots, calculated in intervals of 4 fs, is shown as a series of double contour line plots
on the left-hand side of figure A3.13.11 (the outermost row shows the evolution of $|\psi_{bb}|^2$, equation
(A3.13.68), the innermost row is $|\psi_{sb}|^2$, equation (A3.13.67), at the same time steps). This is the wave
packet motion in CHD3 for excitation with a linearly polarized field along the x-axis at 1300 cm$^{-1}$ and
10 TW cm$^{-2}$ after 50 fs of excitation. At this point a more detailed discussion regarding the orientational
dynamics of the molecule is necessary. Clearly, the polarization axis is defined in a laboratory fixed
coordinate system, while the bending axes are fixed to the molecular frame. Thus, exciting internal degrees of
freedom along specific axes in the internal coordinate system requires two assumptions: the molecule must be
oriented or aligned with respect to the external polarization axis, and this state should be stationary, at least
during the relevant time scale for the excitation process. It is possible to prepare oriented states [112, 114, 115]
in the gas phase, and such a state can generally be represented as a superposition of a large number of
rotational eigenstates. Two questions become important then: How fast does such a rotational superposition
state evolve? How well does a purely vibrational wave packet calculation simulate a more realistic calculation
which includes rotational degrees of freedom, i.e. with an initially oriented rotational wave packet? The
second question was studied recently by full dimensional quantum dynamical calculations of the wave packet
motion of a diatomic molecule during excitation in an intense infrared field [175], and it was verified that
rotational degrees of freedom may be neglected whenever vibrational-rotational couplings are not important
for intramolecular rotational-vibrational redistribution (IVRR) [84]. Regarding the first question, because of
the large rotational constant of methane, the time scales on which an initially oriented state of the free
molecule is maintained are likely to be comparatively short and it would also be desirable to carry out
calculations that include rotational states explicitly. Such calculations were done, for instance, for ozone at
modest excitations [116, 117], but they would be quite difficult for the methane isotopomers at the high
excitations considered in the present example.

The multiphoton excitation scheme corresponding to excitation along the x-axis is shown by the upright
arrows on the left-hand side of figure A3.13.4. In the convention adopted in [ 154 ], nuclear displacements
along Q_b1 occur along the x-axis, while displacements along Q_b2 are directed along the y-axis. One observes a
semiclassical, nearly periodic motion of the wave packet along the excited manifold with a period of
approximately 24 fs, corresponding to the frequency of the bending vibrations in the wave number region
around 1500 cm⁻¹. At this stage of the excitation process, the motion of the wave packet is essentially one-dimensional,
as seen from the trajectory followed by the maximum of the probability distribution and its
practically unchanged shape during the oscillations back and forth between the turning points. The latter lie on
the potential energy section defined by the momentary energy E(t) of the wave packet, as described above,
and describe the classically accessible region in configuration space. These potential energy sections are
shown by the continuous curves in the figures, which are surrounded by dotted curves describing the energy 
uncertainty. 
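The correspondence between the quoted oscillation period and the bending wave number can be checked with the relation T = 1/(c ν̃). The following is a minimal sketch; the function names are illustrative and not taken from the source.

```python
# Convert between a vibrational wave number (cm^-1) and an oscillation period (fs).
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s (exact SI value)


def period_fs(wavenumber_cm: float) -> float:
    """Period T = 1 / (c * wavenumber), returned in femtoseconds."""
    return 1.0e15 / (C_CM_PER_S * wavenumber_cm)


def wavenumber_cm(period_in_fs: float) -> float:
    """Inverse conversion: wave number in cm^-1 for a period given in fs."""
    return 1.0e15 / (C_CM_PER_S * period_in_fs)


if __name__ == "__main__":
    print(f"1500 cm^-1 -> {period_fs(1500.0):.1f} fs")
    print(f"24 fs      -> {wavenumber_cm(24.0):.0f} cm^-1")
```

A period of 24 fs corresponds to roughly 1390 cm⁻¹, which is consistent with the statement that the motion lies in the region around 1500 cm⁻¹.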

The sequence on the right-hand side of figure A3.13.11 shows wave packets during the excitation along the
y-axis. Here, excitation was chosen to be antiparallel to the y-axis. This choice induces a phase shift
of π between the two wave packets shown in the figure, in addition to forcing oscillations along different
directions. Excitation along the y-axis can be used to generate dynamically chiral structures. If the excitation
laser field is switched off, e.g. at time step 70 fs after beginning the excitation, the displacement of the wave
packet clearly corresponds to a bent molecular structure with angle ≈ 10° (e.g. in the xy plane of figure
A3.13.5). This structure will, of course, also change with time for the isolated molecule, and one expects this
change to be oscillatory, like a pendulum, at least initially. Clearly, IVR will play some role, if not at this
early stage, then at some later time. One question is, will it be CIVR or DIVR? When studying this question
with the isotopically substituted compounds CHD₂T and CHDT₂, the y-axis being perpendicular to the C_s
mirror plane, a bent CH chromophore corresponds to a chiral molecular structure with a well defined chirality
quantum number, say R. As time evolves, the wave packet moves to the other side of the symmetry plane,
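Q_b2 = 0, implying a change of chirality. In this context, the enantiomeric excess can be defined by the
probability

$$P_R(t) = \int_{0}^{\infty} \mathrm{d}Q_{b2} \int_{-\infty}^{\infty} \mathrm{d}Q_{b1}\, |\psi_{bb}(Q_{b1}, Q_{b2}, t)|^2 \qquad \text{(A3.13.69)}$$

for right-handed ('R') chiral structures (P_L(t) = 1 − P_R(t) is the probability for left-handed ('L') structures),
where

$$|\psi_{bb}(Q_{b1}, Q_{b2}, t)|^2 = \int_{-\infty}^{\infty} \mathrm{d}Q_s\, |\psi(Q_s, Q_{b1}, Q_{b2}, t)|^2. \qquad \text{(A3.13.70)}$$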

The time evolution of P_R(t) is shown in figure A3.13.12 for the field free motion of wave packets for CHD₂T
and CHDT₂ prepared by a preceding excitation along the y-axis.

In the main part of each figure, the evolution of P_R calculated within the stretching and bending manifold of
states for the CH chromophore is shown (full curve). The dashed curve shows the evolution of P_R within a
one-dimensional model, in which only the Q_b2 bending manifold is considered during the dynamics. Within
this model there is obviously no IVR, and comparison of the full with the dashed curves helps to visualize the
effect of IVR. The insert on the left-hand side shows a survey of the evolution of P_R for the one-dimensional
model during a longer time interval of 2 ps, while the insert on the right-hand side shows the evolution of P_R
for the calculation within the full three-dimensional stretching and bending manifold of states during the same
time interval of 2 ps. The three-dimensional calculations yield a fast, initially nearly periodic, evolution, with
an approximate period of 20 fs, on which is superimposed a slower decay of probability corresponding to an
overall decay of enantiomeric excess |D(t)| = |1 − 2P_R(t)| on a time scale of 300-400 fs for both CHD₂T
and CHDT₂. The decay is clearly more pronounced for CHD₂T (figure A3.13.12(a)). The first type of
evolution corresponds to a stereomutation reaction, while the second can be interpreted as racemization. A
further question is then related to the origin of this racemization.
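The definition of the right-handed probability as an integral of the reduced bending density over one side of the symmetry plane can be illustrated numerically. The sketch below is not the calculation of [154]: it assumes a hypothetical Gaussian density in two dimensionless bending coordinates and takes, as an assumption, the half plane q2 > 0 to represent the 'R' structure.

```python
import math


def gaussian_density(q1, q2, centre2, sigma):
    """Normalised 2D Gaussian probability density centred at (0, centre2)."""
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    return norm * math.exp(-(q1 ** 2 + (q2 - centre2) ** 2) / (2.0 * sigma ** 2))


def right_handed_probability(centre2, sigma=1.0, half_width=8.0, n=400):
    """Midpoint-rule integral of the model density over the half plane q2 > 0."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        q1 = -half_width + (i + 0.5) * h
        for j in range(n):
            q2 = -half_width + (j + 0.5) * h
            if q2 > 0.0:  # assumed convention: q2 > 0 is the 'R' side
                total += gaussian_density(q1, q2, centre2, sigma) * h * h
    return total


def enantiomeric_excess(p_right):
    """Absolute excess |1 - 2 P_R|: 1 for a pure enantiomer, 0 for a racemate."""
    return abs(1.0 - 2.0 * p_right)


if __name__ == "__main__":
    for centre in (0.0, 1.0, 3.0):
        p_r = right_handed_probability(centre)
        print(f"centre {centre}: P_R = {p_r:.4f}, excess = {enantiomeric_excess(p_r):.4f}")
```

A packet centred on the symmetry plane gives P_R = 0.5 and zero excess, while a packet displaced well to one side gives an excess close to 1, which is the sense in which a displaced wave packet corresponds to a chiral structure.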


[Figure A3.13.12: panels (a) and (b); horizontal axis t / fs, from 0 to 500 fs.]

Figure A3.13.12. Evolution of the probability for a right-handed chiral structure P_R(t) (full curve, see
equation (A3.13.69)) of the CH chromophore in CHD₂T (a) and CHDT₂ (b) after preparation of chiral
structures with multiphoton laser excitation, as discussed in the text (see also [ 154 ]). For comparison, the time
evolution of P_R according to a one-dimensional model including only the Q_b2 bending mode (dashed curve) is
also shown. The left-hand side insert shows the time evolution of P_R within the one-dimensional calculations
for a longer time interval; the right-hand insert shows the P_R time evolution within the three-dimensional
calculation for the same time interval (see text).

Figure A3.13.13. Illustration of the time evolution of reduced two-dimensional probability densities |ψ_bb|²
and |ψ_sb|², for the isolated CHD₂T (left-hand side) and CHDT₂ (right-hand side) after 800 fs of free
evolution. At time 0 fs the wave packets corresponded to a localized, chiral molecular structure (from [ 154 ]).
See also text and figure A3.13.11.

Figure A3.13.13 shows the wave packet motion for CHD₂T and CHDT₂, roughly 800 fs after the initially
localized, chiral structure has been generated. Comparison with the wave packet motion allows for the
conclusion that racemization is induced by the presence of DIVR between all vibrational modes of the CH
chromophore. However, while DIVR is quite complete for CHD₂T, after excitation along the y-axis, it is only
two-dimensional for CHDT₂. A localized exchange of vibrational energy in terms of CIVR has not been
observed at any intermediate time step. Racemization is stronger for CHD₂T, for which DIVR occurs in the
full three-dimensional subspace of the CH chromophore, under the present conditions. It is less pronounced
for CHDT₂, which has a higher degree of localization of the wave packet motion. In comparison with the
one-dimensional calculations in figure A3.13.12, it becomes evident that there is a decay of the overall
enantiomeric excess for CHD₂T, as well as for CHDT₂, also in the absence of IVR. The decay takes place on
a time scale of 500-1000 fs and is a consequence of the dephasing of the wave packet due to the diagonal
anharmonicity of the bending motion. This decay may, of course, also be interpreted as racemization.
However, it is much less complete than racemization in the three-dimensional case and clearly of secondary
importance for the enantiomeric decay in the first 200 fs.

Figure A3.13.14. Illustration of the quantum evolution (points) and Pauli master equation evolution (lines) in
quantum level structures with two levels (and 59 states each, left-hand side) and three levels (and 39 states
each, right-hand side) corresponding to a model of the energy shell IVR (horizontal transition in figure
A3.13.1). From [38]. The two-level structure (left) has two models: |V_ij|² = const and random signs (upper
part), random V_ij but −V_m < V_ij < V_m (lower part). The right-hand side shows an evolution with initial diagonal
density matrix (upper part) and a single trajectory (lower part).

A3.13.6 STATISTICAL MECHANICAL MASTER EQUATION
TREATMENT OF INTRAMOLECULAR ENERGY REDISTRIBUTION IN
REACTIVE MOLECULES


The previous sections indicate that the full quantum dynamical treatment of IVR in an intermediate size 
molecule even under conditions of 'coherent' excitation shows phenomena reminiscent of relaxation and
equilibration. This suggests that, in general, at very high excitations in large polyatomic molecules with 
densities of states easily exceeding the order of 10 cm (or about 10 molecular states in an energy interval 
corresponding to 1 J mol⁻¹), a statistical master equation treatment may be possible [ 38 , 122 ]. Such an
approach has been justified by quantum simulations in model systems as well as analytical considerations
[38], following early ideas in the derivation of the statistical mechanical Pauli equation [ 176 ]. Figure
A3.13.14 shows the kinetic behaviour in such model systems. The 'coarse grained' populations of groups of
quantum states ('levels' with fewer than 100 states, indexed by capital letters I and J) at the same total energy
show very similar behaviour if calculated from the Schrödinger equation, e.g. equation (A3.13.43), or the
Pauli equation (A3.13.71),


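$$\mathbf{p}(t) = \mathbf{Y}(t)\,\mathbf{p}(0), \qquad \text{(A3.13.71)}$$

with Y being given by:

$$\mathbf{Y}(t) = \exp(\mathbf{K}t), \qquad \text{(A3.13.72)}$$

and the rate coefficient matrix elements in the limit of perturbation theory

$$K_{IJ} = 2\pi |V_{IJ}|^2 / \delta_I. \qquad \text{(A3.13.73)}$$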


In equation (A3.13.73), δ_I is the average angular frequency distance between quantum states within level I
and |V_IJ|² is the average square coupling matrix element (as angular frequency) between the quantum states
in levels I and J (of total number of states N_I and N_J, respectively) and is given by:

$$|V_{IJ}|^2 = \frac{1}{N_I N_J} \sum_{i=1}^{N_I} \sum_{j=1}^{N_J} |V_{ij}|^2. \qquad \text{(A3.13.74)}$$
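The structure of the Pauli equation can be made concrete with a small numerical sketch. It is not the model calculation of [38]: it assumes a single average squared coupling `v2` for every pair of levels, builds the off-diagonal rate coefficients from the golden-rule form 2π|V|²/δ_I, fixes the diagonal elements by requiring conservation of total population (an assumption of this sketch), and integrates dp/dt = Kp with a fourth-order Runge-Kutta scheme instead of evaluating the matrix exponential directly.

```python
import math


def rate_matrix(v2, delta):
    """Rate matrix K for levels with mean state spacings delta[I].

    Off-diagonal: K[I][J] = 2*pi*v2/delta[I] (population flow from J into I).
    Diagonal: chosen so that every column sums to zero (population is conserved).
    """
    n = len(delta)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                k[i][j] = 2.0 * math.pi * v2 / delta[i]
    for j in range(n):
        k[j][j] = -sum(k[i][j] for i in range(n) if i != j)
    return k


def matvec(k, p):
    """Matrix-vector product K p for nested-list matrices."""
    return [sum(kij * pj for kij, pj in zip(row, p)) for row in k]


def propagate(k, p0, t, steps=2000):
    """Integrate dp/dt = K p from 0 to t with classical fourth-order Runge-Kutta."""
    h = t / steps
    p = list(p0)
    for _ in range(steps):
        k1 = matvec(k, p)
        k2 = matvec(k, [x + 0.5 * h * d for x, d in zip(p, k1)])
        k3 = matvec(k, [x + 0.5 * h * d for x, d in zip(p, k2)])
        k4 = matvec(k, [x + h * d for x, d in zip(p, k3)])
        p = [x + h * (a + 2 * b + 2 * c + d) / 6.0
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


if __name__ == "__main__":
    # Two levels; the second has twice the density of states (half the spacing).
    K = rate_matrix(v2=0.01, delta=[1.0, 0.5])
    for t in (0.0, 5.0, 20.0, 400.0):
        print(t, propagate(K, [1.0, 0.0], t) if t > 0 else [1.0, 0.0])
```

With these assumptions the populations relax exponentially to values proportional to the level densities 1/δ_I, which is the microcanonical equilibrium expected for coarse-grained levels at the same total energy.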

Figure A3. 13. 14 seems to indicate that the Pauli equation is a strikingly good approximation for treating IVR 
under these conditions, involving even relatively few quantum states. This is, however, only true in this 
simple manner because we assume in the model that all the couplings are randomly distributed around their 
'typical' values. This excludes any symmetry selection rules for couplings within the set of quantum states
considered [ 177 ]. More generally, one has to consider only sets of quantum states with the same set of good 
(conserved) quantum numbers in such a master equation treatment. It is now well established that even in 
complex forming collisions leading to maximum energy transfer by IVR ( section A3. 13.3.3 ), conserved 
quantum numbers such as nuclear spin symmetry
and parity lead to considerable restrictions [ 177 , 178 ]. More generally, one has to identify approximate
symmetries on short time scales, which lead to further restrictions on the density of strongly coupled states 
[ 179 ]. Thus, the validity of a statistical master equation treatment for IVR in large polyatomic molecules is 
not obvious a priori and has to be established individually for at least classes of molecular systems, if not on a 
case by case basis. 

Figure A3.13.15 shows a scheme for such a Pauli equation treatment of energy transfer in highly excited
ethane, e.g. equation (A3.13.75), formed at energies above both thresholds for dissociation in chemical
activation:

H + C₂H₅ → C₂H₆* → 2CH₃. (A3.13.75)

The figure shows the migration of energy between excited levels of the ultimately reactive C-C oscillator, the
total energy being constant at E/hc = 41 000 cm⁻¹ with a CC dissociation threshold of 31 000 cm⁻¹. The
energy balance is thus given by: 


(A 3.13.76) 


]? 




The microcanonical equilibrium distributions are governed by the densities ρ_{s−1} in the (s−1) = 17 oscillators
(figure A3.13.15):


PmicroQO = p : 


(ft) 



(A 3.13.77) 


where the 17 remaining degrees of freedom of ethane form essentially a 'heat bath'. The kinetic master 
equation treatment of this model leads to steady-state populations shown in figure A3. 13. 16 . This illustrates 
that the steady-state populations under conditions where reaction equation (A3. 13.75) competes with IVR 
differ from the microcanonical equilibrium populations at high energy, and both differ from thermal 
distributions shown as lines (quantum or classical). Whereas the deviation from a thermal distribution is well 
understood and handled by standard statistical theories such as RRKM ( chapter A3. 12 ) and the statistical 
adiabatic channel model [76], the deviation from the microcanonical distribution would lead to an 
intramolecular nonequilibrium effect on the rates of reaction which so far has not been well investigated 
experimentally [37, 38 and 39].
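To give a feeling for how small the microcanonical population above threshold is in this model, the following sketch uses classical harmonic-oscillator densities of states, for which the density of n oscillators scales as E^(n−1). This classical form is an assumption made here for illustration (the source also uses quantum densities), and the function names are hypothetical. Under this assumption the fraction of the ensemble with at least E₀ in one of s oscillators is (1 − E₀/E)^(s−1).

```python
def micro_density(eps, e_total, s):
    """Classical microcanonical probability density for energy eps in one of s oscillators.

    Follows from rho_1(eps) * rho_{s-1}(E - eps) / rho_s(E) with rho_n(E) ~ E**(n-1).
    """
    return (s - 1) * (e_total - eps) ** (s - 2) / e_total ** (s - 1)


def fraction_above(e0, e_total, s, n=20000):
    """Midpoint-rule integral of the density from the threshold e0 up to e_total."""
    h = (e_total - e0) / n
    return sum(micro_density(e0 + (i + 0.5) * h, e_total, s) for i in range(n)) * h


def fraction_above_analytic(e0, e_total, s):
    """Closed form of the same integral: (1 - e0/E)**(s-1)."""
    return (1.0 - e0 / e_total) ** (s - 1)


if __name__ == "__main__":
    E, E0, S = 41000.0, 31000.0, 18  # wave number units (cm^-1); s - 1 = 17 bath oscillators
    print("numerical:", fraction_above(E0, E, S))
    print("analytic: ", fraction_above_analytic(E0, E, S))
```

For the energies quoted for ethane the classical estimate is of the order of 10⁻¹¹, which shows why the reactive levels of the C-C oscillator are only very weakly populated at equilibrium.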

Figure A3.13.15. Master equation model for IVR in highly excited C₂H₆. The left-hand side shows the
quantum levels of the reactive CC oscillator. The right-hand side shows the levels with a high density of states
from the remaining 17 vibrational (and torsional) degrees of freedom (from [38]).

Figure A3.13.16. Illustration of the level populations (corresponding to the C-C oscillator states) from
various treatments in the model of figure A3.13.15 for C₂H₆ at a total energy E = (hc) 41 000 cm⁻¹ and a
threshold energy of (hc) 31 000 cm⁻¹. The points are microcanonical equilibrium distributions. The crosses
result from the solution of the master equation for IVR at steady state and the lines are thermal populations at
the temperatures indicated (from [38]: quant. is calculated with quantum densities of states, class. with
classical mechanical densities).

A3.13.7 SUMMARIZING OVERVIEW ON ENERGY REDISTRIBUTION
IN REACTING SYSTEMS


It has been understood for more than a century that energy redistribution is a key process in chemical 
reactions, including in particular the oldest process of chemical technology used by mankind: fire or 
combustion, where both radiative and collisional processes are relevant. Thus one might think that this field 
has reached a stage of maturity and saturation. Nothing could be further from the truth. While collisional 
energy transfer is now often treated in reaction systems in some detail, as is to some extent routine in 
unimolecular reactions, there remain plenty of experimental and theoretical challenges. In the master equation 
treatments, which certainly should be valid here, one considers a statistical, macroscopic reaction system 
consisting of reactive molecules in a mixture, perhaps an inert gas heat bath. 

The understanding of the second process considered in this chapter, intramolecular energy redistribution 
within a single molecular reaction system, is still in its infancy. It is closely related to the challenge of finding 
possible schemes to control the dynamics of atoms in molecules and the related change of molecular structure 
during the course of a chemical reaction [ 10 , 117 , 154 , 175 ], typically on the femtosecond time scale, which
has received increasing attention in the last few decades [ 180 , 181 and 182 ]. The border between fully 
quantum dynamical treatments, classical mechanical theories and, finally, statistical master equations for IVR 
type processes needs to be explored further experimentally and theoretically in the future. Unravelling details 
of the competition between energy redistribution and reaction in individual molecules remains an important 
task for the coming decades [37, 38, 39 and 40 ]. 

A3.14 Nonlinear reactions, feedback and self-organizing reactions

Stephen K Scott 


A3.14.1 INTRODUCTION 

A3.14.1.1 NONLINEARITY AND FEEDBACK

In the reaction kinetics context, the term 'nonlinearity' refers to the dependence of the (overall) reaction rate 
on the concentrations of the reacting species. Quite generally, the rate of a (simple or complex) reaction can 
be defined in terms of the rate of change of concentration of a reactant or product species. The variation of 
this rate with the extent of reaction then gives a 'rate-extent' plot. Examples are shown in figure A3.14.1. In
the case of a first-order reaction, curve (i) in figure A3.14.1(a), the rate-extent plot gives a straight line: this is
the only case of 'linear kinetics'. For all other concentration dependences, the rate-extent plot is 'nonlinear':
curves (ii) and (iii) in figure A3.14.1(a) correspond to second-order and half-order kinetics respectively. For
all the cases in figure A3.14.1(a), the reaction rate is maximal at zero extent of reaction, i.e. at the beginning
of the reaction. This is characteristic of 'deceleratory' processes. A different class of reaction types, figure
A3.14.1(b), shows rate-extent plots for which the reaction rate is typically low for the initial composition, but
increases with increasing extent of reaction during an 'acceleratory' phase. The maximum rate is then 
achieved for some non-zero extent, with a final deceleratory stage as the system approaches complete reaction 
(chemical equilibrium). Curves (i) and (ii) in figure A3. 14. 1(b) are idealized representations of rate-extent 
curves observed in isothermal processes exhibiting 'chemical feedback'. Feedback arises when a chemical 
species, typically an intermediate species produced from the initial reactants, influences the rate of (earlier) 
steps leading to its own formation. Positive feedback arises if the intermediate accelerates this process; 
negative feedback (or inhibition) arises if there is a retarding effect. Such feedback may arise chemically 
through 'chain-branching' or 'autocatalysis' in isothermal systems. Feedback may also arise through thermal 
effects: if the heat released through an exothermic process is not immediately lost from the system, the 
temperature of the reacting mixture will rise. A reaction showing a typical overall Arrhenius temperature 
dependence will thus show an increase in overall rate, potentially giving rise to further self-heating. Curve 
(iii) in figure A3. 14. 1(b) shows the rate-extent curve for an exothermic reaction under adiabatic conditions. 
Such feedback is the main driving force for the process known as combustion: endothermic reactions can 
similarly show self-cooling and inhibitory feedback. Specific examples of the origin of feedback in a range of 
chemical systems are presented below. 





[Figure A3.14.1: horizontal axis, extent of reaction [A]₀ − [A].]



Figure A3. 14.1. Rate-extent plots for (a) deceleratory and (b) acceleratory systems. 

The branching cycle involving the radicals H, OH and O in the H₂ + O₂ reaction involves the three
elementary steps

H + O₂ → OH + O (A3.14.1)

O + H₂ → OH + H (A3.14.2)

OH + H₂ → H₂O + H. (A3.14.3)


In step (1) and step (2) there is an increase from one to two 'chain carriers'. (For brevity, step (x) is used to 
refer to equation (A3. 14.x) throughout.) Under typical experimental conditions close to the first and second 
explosion limits (see section A3. 14.2.3 ), step (2) and step (3) are fast relative to the rate determining step (1). 


Combining (1) + (2) + 2 × (3) gives the overall stoichiometry

H + 3H₂ + O₂ → 3H + 2H₂O

so there is a net increase of 2 H atoms per cycle. The rate of this overall step is governed by the rate of step
(1), so we obtain

d[H]/dt = +2k₁[O₂][H]

where the + sign indicates that the rate of production of H atoms increases proportionately with that 
concentration. 
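The bookkeeping behind the combination (1) + (2) + 2 × (3) can be verified mechanically by summing the stoichiometric changes of each step. The sketch below uses a hypothetical helper, `net_change`, and plain-text species labels chosen for this example.

```python
from collections import Counter


def net_change(steps):
    """Sum stoichiometric changes; each step is (multiplier, reactants, products)."""
    net = Counter()
    for mult, reactants, products in steps:
        for species, n in reactants.items():
            net[species] -= mult * n
        for species, n in products.items():
            net[species] += mult * n
    # Drop species that cancel exactly (the intermediates OH and O here).
    return {species: n for species, n in net.items() if n != 0}


STEP1 = ({"H": 1, "O2": 1}, {"OH": 1, "O": 1})    # H + O2 -> OH + O
STEP2 = ({"O": 1, "H2": 1}, {"OH": 1, "H": 1})    # O + H2 -> OH + H
STEP3 = ({"OH": 1, "H2": 1}, {"H2O": 1, "H": 1})  # OH + H2 -> H2O + H

cycle = net_change([(1, *STEP1), (1, *STEP2), (2, *STEP3)])

if __name__ == "__main__":
    print(cycle)
```

Negative entries are net consumption and positive entries net production: the cycle consumes three H₂ and one O₂ and yields two H₂O together with a net gain of two H atoms, in agreement with the overall equation.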


In the bromate-iron clock reaction, there is an autocatalytic cycle involving the intermediate species
HBrO₂. This cycle comprises the following non-elementary steps:

HBrO₂ + BrO₃⁻ + H⁺ → 2BrO₂ + H₂O (A3.14.4)

BrO₂ + Fe²⁺ + H⁺ → HBrO₂ + Fe³⁺. (A3.14.5)

Step (5) is rapid due to the radical nature of BrO₂, so the overall stoichiometric process given by (4) + 2 × (5)
has the form

HBrO₂ + BrO₃⁻ + 2Fe²⁺ + 3H⁺ → 2HBrO₂ + 2Fe³⁺ + H₂O

and an effective rate law

d[HBrO₂]/dt = +k₄[BrO₃⁻][H⁺][HBrO₂]

again showing increasing rate of production as the concentration of HBrO₂ increases.

In 'Landolt'-type reactions, iodate ion is reduced to iodide through a sequence of steps involving a reductant
species such as bisulfite ion (HSO₃⁻) or arsenous acid (H₃AsO₃). The reaction proceeds through two overall
stoichiometric processes. The Dushman reaction involves the reaction of iodate and iodide ions

IO₃⁻ + 5I⁻ + 6H⁺ → 3I₂ + 3H₂O (A3.14.6)

with an empirical rate law R_α = (k₁ + k₂[I⁻])[IO₃⁻][I⁻][H⁺]².

The iodine produced in the Dushman process is rapidly reduced to iodide via the Roebuck reaction

I₂ + H₃AsO₃ + H₂O → 2I⁻ + H₃AsO₄ + 2H⁺. (A3.14.7)


If the initial concentrations are such that [H₃AsO₃]/[IO₃⁻] > 3, the system has excess reductant. In this case,
the overall stoichiometry is given by (6) + 3 × (7) to give

3H₃AsO₃ + IO₃⁻ + 5I⁻ → 3H₃AsO₄ + 6I⁻

i.e. there is a net production of one iodide ion, with an overall rate given by R_α. At constant pH and for
conditions such that k₂[I⁻] ≫ k₁, this can be approximated by

d[I⁻]/dt = k[IO₃⁻][I⁻]²

where k = k₂[H⁺]². This again has the autocatalytic form, but now with the growth proportional to the square of the
autocatalyst (I⁻) concentration. Generic representations of autocatalytic processes in the form

quadratic autocatalysis    A + B → 2B     rate = kab

cubic autocatalysis        A + 2B → 3B    rate = kab²

where a and b are the concentrations of the reactant A and autocatalyst B respectively, are represented in 
figure A3. 14. 1(b) as curves (i) and (ii). The bromate-iron reaction corresponds to the quadratic type and the 
Landolt system to the cubic form. 
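The acceleratory character of the two generic rate laws can be seen by locating the extent of reaction at which the rate is largest. In the sketch below, x denotes the extent, so that a = a0 − x and b = b0 + x; the initial values and the unit rate constant are illustrative assumptions.

```python
def rate_quadratic(x, a0, b0, k=1.0):
    """Rate k*a*b for A + B -> 2B at extent x."""
    return k * (a0 - x) * (b0 + x)


def rate_cubic(x, a0, b0, k=1.0):
    """Rate k*a*b**2 for A + 2B -> 3B at extent x."""
    return k * (a0 - x) * (b0 + x) ** 2


def extent_of_max_rate(rate, a0, b0, n=100000):
    """Grid search over 0 <= x <= a0 for the extent that maximises the rate."""
    xs = (a0 * i / n for i in range(n + 1))
    return max(xs, key=lambda x: rate(x, a0, b0))


if __name__ == "__main__":
    a0, b0 = 1.0, 0.01
    print("quadratic:", extent_of_max_rate(rate_quadratic, a0, b0), "analytic", (a0 - b0) / 2)
    print("cubic:    ", extent_of_max_rate(rate_cubic, a0, b0), "analytic", (2 * a0 - b0) / 3)
```

Setting the derivative of each rate law to zero gives maxima at x = (a0 − b0)/2 for the quadratic case and x = (2a0 − b0)/3 for the cubic case, so with little autocatalyst present initially the maximum rate lies near one half and two thirds conversion respectively, rather than at zero extent as for deceleratory kinetics.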

A3.14.1.2 SELF-ORGANIZING SYSTEMS 

'Self-organization' is a phrase referring to a range of behaviours exhibited by reacting chemical systems in 
which nonlinear kinetics and feedback mechanisms are operating. Examples of such behaviour include 
ignition and extinction, oscillations and chaos, spatial pattern formation and chemical wave propagation. 
There is a formal distinction between thermodynamically closed systems (no exchange of matter with the 
surroundings) and open systems [1]. In the former, the reaction will inevitably attain a unique state of
chemical equilibrium in which the forward and reverse rates of every step in the overall mechanism become 
equal (detailed balance). This equilibrium state is temporally stable (the system cannot oscillate about 
equilibrium) and spatially uniform (under uniform boundary conditions). However, nonlinear responses such 
as oscillation in the concentrations of intermediate species can be exhibited as a transient phenomenon, 
provided the system is assembled with initial species concentrations sufficiently 'far from' the equilibrium 
composition (as is frequently the case). The 'transient' evolution may last for an arbitrarily long (perhaps even
geological), but strictly finite, period.


A3.14.2 CLOCK REACTIONS, CHEMICAL WAVES AND IGNITION 

A3.14.2.1 CLOCK REACTIONS 

The simplest manifestation of nonlinear kinetics is the clock reaction — a reaction exhibiting an identifiable 
'induction period', during which the overall reaction rate (the rate of removal of reactants or production of 
final products) may be practically indistinguishable from zero, followed by a comparatively sharp 'reaction 
event' during which reactants are converted more or less directly to the final products. A schematic evolution 
of the reactant, product and intermediate species concentrations and of the reaction rate is represented in 
figure A3. 14.2 . Two typical mechanisms may operate to produce clock behaviour. 

Figure A3.14.2. Characteristic features of a clock reaction, illustrated for the Landolt reaction, showing (a)
variation of product concentration with induction period followed by sharp 'reaction event'; (b) variation of
overall reaction rate with course of reaction.

The Landolt reaction (iodate + reductant) is prototypical of an autocatalytic clock reaction. During the 
induction period, the absence of the feedback species (here iodide ion, assumed to have virtually zero initial 
concentration and formed from the reactant iodate only via very slow 'initiation' steps) causes the reaction 
mixture to become 'kinetically frozen'. There is reaction, but the intermediate species evolve on concentration 
scales many orders of magnitude less than those of the reactant. The induction period depends on the initial 
concentrations of the major reactants in a manner predicted by integrating the overall cubic autocatalytic
rate law, given in section A3. 14. 1.1 . 
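The dependence of the induction period on the initial composition can be sketched for the generic cubic scheme A + 2B → 3B, for which a = a0 + b0 − b and db/dt = kab². This is a simplified stand-in for the Landolt system, not its full stoichiometry, and the numerical values are illustrative. Separating variables gives a closed-form time, which the code checks against a direct quadrature.

```python
import math


def time_to_reach(b, a0, b0, k):
    """Analytic time for the autocatalyst to grow from b0 to b under db/dt = k*(c - b)*b**2."""
    c = a0 + b0  # conserved total a + b

    def antiderivative(y):
        # Integral of dy / ((c - y) * y**2)
        return math.log(y / (c - y)) / c ** 2 - 1.0 / (c * y)

    return (antiderivative(b) - antiderivative(b0)) / k


def time_to_reach_numeric(b, a0, b0, k, n=200000):
    """Midpoint-rule evaluation of the same integral, as an independent check."""
    c = a0 + b0
    h = (b - b0) / n
    total = 0.0
    for i in range(n):
        y = b0 + (i + 0.5) * h
        total += 1.0 / (k * (c - y) * y * y)
    return total * h


if __name__ == "__main__":
    a0, k = 1.0, 1.0
    for b0 in (1e-2, 1e-3):
        half = b0 + 0.5 * a0  # autocatalyst level at 50 % conversion of A
        print(f"b0 = {b0:g}: t_half = {time_to_reach(half, a0, b0, k):.1f}")
```

For small b0 the result is dominated by the term 1/(k a0 b0), so in this sketch a tenfold decrease in the initial autocatalyst concentration lengthens the clock time by roughly a factor of ten.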

The bromate-ferroin reaction has a quadratic autocatalytic sequence, but in this case the induction period is 
determined primarily by the time required for the concentration of the 'inhibitor' bromide ion to fall to a 
critical low value through the reactions 


BrO₃⁻ + Br⁻ + 2H⁺ → HBrO₂ + HOBr (A3.14.8)

HBrO₂ + Br⁻ + H⁺ → 2HOBr. (A3.14.9)

Bromide ion acts as an inhibitor through step (9), which competes for HBrO₂ with the rate determining step
for the autocatalytic process described previously, step (4) and step (5). Step (8) and step (9) constitute a
pseudo-first-order removal of Br⁻ with HBrO₂ maintained in a low steady-state concentration. Only once
[Br⁻] < [Br⁻]_cr = k₄[BrO₃⁻]/k₉ does step (4) become effective, initiating the autocatalytic growth and oxidation.


Clock-type induction periods occur in the spontaneous ignition of hydrocarbon-oxygen mixtures [2], in the 
setting of concrete and the curing of polymers [3]. A related phenomenon is the induction period exhibited 


during the self-heating of stored material leading to thermal runaway [4]. A wide variety of materials stored in 
bulk are capable of undergoing a slow, exothermic oxidation at ambient temperatures. The consequent self- 
heating (in the absence of efficient heat transfer) leads to an increase in the reaction rate and, therefore, in the 
subsequent rate of heat release. The Semenov and Frank-Kamenetskii theories of thermal runaway address 
the relative rates of heat release and heat loss under conditions where the latter is controlled by Newtonian 
cooling and by thermal conductivity respectively. In the Frank-Kamenetskii form, the heat balance equation 
shows that the following condition applies for a steady-state balance between heat transfer and heat release: 

$$\kappa \nabla^2 T + (-\Delta H)\, A\, c_0^{\,n}\, \mathrm{e}^{-E/RT} = 0$$

where κ is the thermal conductivity and ∇² is the Laplacian operator appropriate to the particular geometry.
The boundary condition specifies that the temperature must have some fixed value equal to the surrounding
temperature T_a at the edge of the reacting mass: the temperature will then exceed this value inside the reacting
mass, varying from point to point and having a maximum at the centre. Steady-state solutions are only
possible if the group of quantities (−ΔH) A c₀ⁿ E a₀² e^(−E/RT_a) / (κ R T_a²), where a₀ is the half-width of the pile, is less
than some critical value. If this group exceeds the critical value, thermal runaway occurs. For marginally
supercritical situations where thermal balance is almost achieved, the runaway is preceded by an induction
period as the temperature evolves on the Fourier time scale, τ = a₀²c/κ, where c is the heat capacity. For
large piles of low thermal conductivity, this may be of the order of months.
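The order of magnitude of the Fourier time can be checked with a short calculation. The sketch assumes the usual definition, half-width squared times volumetric heat capacity divided by thermal conductivity, and uses hypothetical material values chosen only to be typical of a poorly conducting bulk solid.

```python
SECONDS_PER_DAY = 86400.0


def fourier_time_days(half_width_m, vol_heat_capacity, conductivity):
    """Fourier time a0**2 * c / kappa, converted from seconds to days.

    half_width_m       -- half-width a0 of the pile in metres
    vol_heat_capacity  -- volumetric heat capacity c in J m^-3 K^-1
    conductivity       -- thermal conductivity kappa in W m^-1 K^-1
    """
    return half_width_m ** 2 * vol_heat_capacity / conductivity / SECONDS_PER_DAY


if __name__ == "__main__":
    # Illustrative values only: c = 1e6 J m^-3 K^-1, kappa = 0.2 W m^-1 K^-1.
    for a0 in (0.1, 1.0, 3.0):
        print(f"a0 = {a0} m: {fourier_time_days(a0, 1.0e6, 0.2):.1f} days")
```

With these assumed values a pile of 1 m half-width has a Fourier time of about two months, and the quadratic dependence on a₀ pushes larger piles to a year or more.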

A3.14.2.2 REACTION-DIFFUSION FRONTS 

A 'front' is a thin layer of reaction that propagates through a mixture, converting the initial reactants to final 
products. It is essentially a clock reaction happening in space. If the mixture is one of fuel and oxidant, the 
resulting front is known as a flame. In each case, the unreacted mixture is held in a kinetically frozen state due 
to the virtual absence of the feedback species (autocatalyst or temperature). The reaction is initiated locally at
some point; for example, by seeding the mixture with the autocatalyst or providing a 'spark'. This causes the
reaction to occur locally, producing a high autocatalyst concentration/high temperature. Diffusion/conduction
of the autocatalyst/heat then occurs into the surrounding mixture, initiating further reaction there. Fronts/flames
propagate through this combination of diffusion and reaction, typically adopting a constant velocity which
depends on the diffusion coefficient/thermal diffusivity and the rate coefficient for the reaction [5]. In each
case, the speed c has the form c ≈ (Dk)^(1/2). In gravitational fields, convective effects may arise due to density
differences between the reactants ahead and the products behind the front. This difference may arise from 
temperature changes due to an exothermic/endothermic reaction or due to changes in molar volume between 
reactants and products — in some cases the two processes occur and may compete. In solid-phase combustion 
systems, such as those employed in self-propagating high-temperature synthesis (SHS) of materials, the 
steady flame may become unstable and a pulsing or oscillating flame develop — a feature also observed in 
propagating polymerization fronts [6]. 
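The square-root scaling of the front speed can be illustrated with a small simulation. The sketch assumes a dimensionless Fisher-type equation, ∂u/∂t = D ∂²u/∂x² + k u(1 − u), as a stand-in for a quadratic autocatalytic front; for this model the continuum front speed approaches 2(Dk)^(1/2). The explicit finite-difference scheme and all parameter values are choices made for this example.

```python
def front_speed(d=1.0, k=1.0, dx=0.5, dt=0.05, length=200.0, t1=30.0, t2=60.0):
    """Estimate the front speed from the growth of the converted amount between t1 and t2."""
    n = int(length / dx)
    # Reaction is 'seeded' in a small region at the left-hand end.
    u = [1.0 if i * dx < 5.0 else 0.0 for i in range(n)]
    steps1 = round(t1 / dt)
    steps2 = round(t2 / dt)
    amount1 = 0.0
    for step in range(1, steps2 + 1):
        new = u[:]
        for i in range(n):
            left = u[i - 1] if i > 0 else u[i]        # zero-flux boundary
            right = u[i + 1] if i < n - 1 else u[i]   # zero-flux boundary
            lap = (left - 2.0 * u[i] + right) / dx ** 2
            new[i] = u[i] + dt * (d * lap + k * u[i] * (1.0 - u[i]))
        u = new
        if step == steps1:
            amount1 = sum(u) * dx
    amount2 = sum(u) * dx
    # Behind the front u = 1, so the converted amount grows at the front speed.
    return (amount2 - amount1) / (t2 - t1)


if __name__ == "__main__":
    print("measured speed:", front_speed())
    print("2*sqrt(D*k):   ", 2.0)
```

The measured value is expected to lie somewhat below 2 because of the coarse grid and the slow approach of the front to its asymptotic speed, but it should scale with (Dk)^(1/2) when D or k is changed.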


A3.14.2.3 IGNITION, EXTINCTION AND BISTABILITY

In flow reactors there is a continuous exchange of matter due to the inflow and outflow. The species 
concentrations do not now attain the thermodynamic chemical equilibrium state — the system now has steady 
states which constitute a balance between the reaction rates and the flow rates. The steady-state concentrations 
(and temperature if the reaction is exo/endothermic) depend on the operating conditions through experimental
parameters such as the flow rate. A plot of this dependence gives the steady-state locus, see figure A3.14.3.
With feedback reactions, this locus may fold back on itself, the fold points corresponding to critical conditions
for ignition or extinction; the plot is also then known as a 'bifurcation diagram'. Between these points, the
system exhibits bistability, as either the upper or the lower branch can be accessed; so the system may have 
different net reaction rates for identical operating conditions. Starting with a long residence time (low flow 
rates), the system lies on the 'thermodynamic branch', with a steady-state composition close to the 
equilibrium state. As the residence time is decreased (flow rate is increased), so the steady-state extent of 
conversion decreases, but at the turning point in the locus a further decrease in residence time causes the 
system to drop onto the lower, flow branch. This jump is known as 'washout' for solution-phase reactions and 
'extinction' in combustion. The system now remains on the flow branch, even if the residence time is 
increased: there is hysteresis, with the system jumping back to the thermodynamic branch at the 'ignition' 
turning point in the locus. Many reactions exhibiting clock behaviour in batch reactors show ignition and 
extinction in flow systems. The determination of such bifurcation diagrams is a classic problem in chemical 
reactor engineering and of great relevance to the safe and efficient operation of flow reactors in the modern 
chemical industry. More complex steady-state loci, with isolated branches or multiple fold points leading to 
three accessible competing states, have been observed in systems ranging from autocatalytic solution-phase 
reactions and smouldering combustion to catalytic reactors [7]. Bistability has been predicted from certain 
models of atmospheric chemistry [8]. In the H₂ + O₂ and other branched-chain reactions, a balance equation 
for the radical species expresses the condition for a steady-state radical concentration. The condition for an 
'ignition limit', i.e. for the marginal existence of a steady state, is that the branching and termination rates just 
balance. This can be expressed in terms of a 'net branching factor', $\phi = k_b - k_t$, where $k_b$ and $k_t$ are the 
pseudo-first-order rate constants for branching and termination respectively. For the hydrogen-oxygen system at low 
pressures, this has the form 

$$2k_b[\mathrm{O_2}] = k_{t1}[\mathrm{O_2}][\mathrm{M}] + k_{t2},$$

where $k_{t1}$ corresponds to a three-body termination process (with [M] being the total gas concentration) and $k_{t2}$ 
to a surface removal of H-atoms. This condition predicts a folded curve on the pressure-temperature plane: 
the first and second explosion limits, see figure A3.14.4. 
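
Why the balance between branching and the two termination routes gives two limits can be seen from a small sketch. It assumes (this is not stated in the text) that the O₂ concentration is a fixed mole fraction `x_o2` of the total gas concentration [M], so the balance becomes a quadratic in [M]. The function name, argument names and numerical values are hypothetical and in arbitrary units.

```python
import math

def explosion_limits(k_b, k_t1, k_t2, x_o2):
    """Solve 2*k_b*[O2] = k_t1*[O2]*[M] + k_t2 for [M], assuming [O2] = x_o2*[M].

    Returns (lower, upper) values of [M] bounding the ignition region,
    or None when branching can never balance termination.
    All symbols are illustrative; units are arbitrary.
    """
    # Rearranged: (k_t1*x_o2)*M**2 - (2*k_b*x_o2)*M + k_t2 = 0
    a = k_t1 * x_o2
    b = -2.0 * k_b * x_o2
    c = k_t2
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # no real roots: no steady-state balance at any pressure
    root = math.sqrt(disc)
    return ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))

# Hypothetical numbers chosen only to give round answers.
limits = explosion_limits(k_b=1.0, k_t1=1.0, k_t2=0.375, x_o2=0.5)
```

The smaller root is dominated by surface termination and plays the role of a first limit, the larger root by three-body termination and plays the role of a second limit. If the branching coefficient is too small (for example at low temperature) the roots disappear, which is the fold in the limit curve.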

Figure A3.14.3. Example bifurcation diagrams, showing dependence of steady-state concentration in an open 
system on some experimental parameter such as residence time (inverse flow rate): (a) monotonic 
dependence; (b) bistability; (c) tristability; (d) isola and (e) mushroom. 

Figure A3.14.4. p-T ignition limit diagram for the H₂ + O₂ system showing first, second and third limits as 
appropriate to a closed reactor. The first and second limits have similar positions in a typical flow reactor, for 
which there is also a region of oscillatory ignition as indicated. 


A3.14.3 OSCILLATIONS AND CHAOS 

Despite previous worries about restrictions imposed by thermodynamics, the ability of homogeneous 
isothermal chemical systems to support long-lived (although strictly transient) oscillations in the 
concentrations of intermediate species even in closed reactors is now clearly established [9]. The reaction 
system studied in greatest detail is the Belousov-Zhabotinsky (BZ) reaction [10, 11 and 12], although the 
CIMA/CDIMA system involving chlorine dioxide, iodine and malonic acid is also of importance [13]. In flow 
reactors, oscillations amongst the concentrations of all species, including the reactants and products, are 
possible. 

A3.14.3.1 THE BELOUSOV-ZHABOTINSKY REACTION 


The BZ reaction involves the oxidation of an organic molecule (citric acid, malonic acid (MA)) by an 
acidified bromate solution in the presence of a redox catalyst such as the ferroin/ferriin or Ce³⁺/Ce⁴⁺ couples. 
For a relatively wide range of initial reactant concentrations in a well-stirred beaker, the reaction may exhibit 
a short induction period, followed by a series of oscillations in the concentration of several intermediate 
species and also in the colour of the solution. The response of a bromide-ion-selective electrode and of a Pt 
electrode (responding to the redox couple) for such a system is shown in figure A3. 14.5. Under optimal 
conditions, several hundred excursions are observed. In the redox catalyst concentrations, the oscillations are 
of apparently identical amplitude and only minutely varying period: the bromide ion concentration increases 
slowly with each complete oscillation and it is the slow build-up of this inhibitor, coupled with the 
consumption of the initial reactants, that eventually causes the oscillations to cease (well before the system 
approaches its equilibrium concentration). 

Figure A3.14.5. Experimental records from Pt and Br⁻-ion-sensitive electrodes for the BZ reaction in batch 
showing regular oscillatory response. 

The basic features of the oscillatory mechanism of the BZ reaction are given by the Field-Körös-Noyes 
(FKN) model [14]. This involves three 'processes': A, B and C. Process A involves step (8) and step (9) 
from section A3.14.2.1, leading to removal of 'inhibitor' bromide ion. Process B involves step (3) and step (4) 
from section A3.14.1.1 and gives the autocatalytic oxidation of the catalyst. This growth is limited partly by 
the disproportionation reaction 

$$2\,\mathrm{HBrO_2} \rightarrow \mathrm{BrO_3^-} + \mathrm{HOBr} + \mathrm{H^+}. \tag{A3.14.10}$$

The 'clock' is reset through process C. Bromomalonic acid, BrMA, is a by-product of processes A and B 
(possibly from HOBr) which reacts with the oxidized form of the redox catalyst. This can be represented as 


$$2\,\mathrm{M_{ox}} + \mathrm{MA} + \mathrm{BrMA} \rightarrow 2\,\mathrm{M_{red}} + f\,\mathrm{Br^-} \tag{A3.14.11}$$

Here, f is a stoichiometric factor and represents the number of bromide ions produced through this overall 
process for each two catalyst ions reduced. Because of the complex nature of this process, involving various 
radical species, f can lie in a range between 0 and ~3 depending on the [BrO₃⁻]/[MA] ratio and the [H⁺] 
concentration (note that these may change during the reaction). 

The behaviour of the BZ system can be modelled semi-quantitatively by the 'oregonator' model [15]: 


$$\frac{dx}{dt} = \frac{1}{\varepsilon}\left\{x(1-x) - fz\,\frac{(x-q)}{(x+q)}\right\}, \qquad \frac{dz}{dt} = x - z$$

where x and z are (scaled) concentrations of HBrO₂ and M_ox respectively, and ε and q are parameters 
depending on the rate coefficients and the initial concentrations [16]: q is a ratio of products of rate 
coefficients, and ε also involves [Org], the total concentration of organic species (MA + BrMA). Oscillations 
are observed in this model for 0.5 < f < 1 + √2. More advanced models and detailed schemes account for the 
difference between systems with ferroin and cerium ion catalysts and for the effect of oxygen on the reaction [17]. 
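
A minimal numerical sketch of this two-variable model may help. It assumes the form dx/dt = (1/ε){x(1 − x) − fz(x − q)/(x + q)}, dz/dt = x − z, and uses illustrative parameter values (ε = 0.04, q = 8 × 10⁻⁴, f = 1) that are not taken from the text. The integrator is a plain fixed-step fourth-order Runge-Kutta scheme written out by hand, and all function names are hypothetical.

```python
def oregonator_rhs(x, z, f, eps, q):
    """Right-hand sides of the two-variable oregonator (assumed form)."""
    dx = (x * (1.0 - x) - f * z * (x - q) / (x + q)) / eps
    dz = x - z
    return dx, dz

def rk4_step(x, z, dt, f, eps, q):
    """One classical fourth-order Runge-Kutta step."""
    k1x, k1z = oregonator_rhs(x, z, f, eps, q)
    k2x, k2z = oregonator_rhs(x + 0.5 * dt * k1x, z + 0.5 * dt * k1z, f, eps, q)
    k3x, k3z = oregonator_rhs(x + 0.5 * dt * k2x, z + 0.5 * dt * k2z, f, eps, q)
    k4x, k4z = oregonator_rhs(x + dt * k3x, z + dt * k3z, f, eps, q)
    x_new = x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    z_new = z + dt * (k1z + 2.0 * k2z + 2.0 * k3z + k4z) / 6.0
    return x_new, z_new

def simulate(f, eps=0.04, q=8.0e-4, x0=0.1, z0=0.1, t_end=30.0, dt=2.0e-4):
    """Return the x time series; the small dt is needed because the model is stiff."""
    x, z = x0, z0
    xs = [x]
    for _ in range(int(round(t_end / dt))):
        x, z = rk4_step(x, z, dt, f, eps, q)
        xs.append(x)
    return xs

def count_excursions(xs, level=0.5):
    """Count upward crossings of x through the chosen level."""
    return sum(1 for a, b in zip(xs, xs[1:]) if a < level <= b)

xs = simulate(f=1.0)
n_excursions = count_excursions(xs)
```

With f inside the oscillatory window the trace shows repeated large excursions in x separated by long intervals at low x, the relaxation character typical of the BZ records. A stiff solver from a numerical library would normally be preferred to the hand-written fixed-step scheme used here.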

Under some conditions, it is observed that complex oscillatory sequences develop even in batch systems, 
typically towards the end of the oscillatory phase of the reaction. Transient 'chaos' — see section A3. 14.3.3 — 
appears to be established [18]. 

In flow reactors, both simple period-1 oscillations (every oscillation has the same amplitude and period as its 
predecessor) and more complex periodic states can be established and sustained indefinitely. The first report 
of chemical chaos stemmed from the BZ system [19] (approximately contemporaneously with observations in 
the biochemical peroxidase reaction [20]). These observations were made in systems with relatively high flow 
rate and show complexity increasing through a sequence of 'mixed-mode' wave forms comprising one large 
excursion followed by n small peaks, with n increasing as the flow rate is varied. Subsequent period-doubling 
and other routes to chaos have been found in this system at low flow rates [21]. A relatively simple two- 
variable extension of the oregonator model can adequately describe these complex oscillations and chaos. 

A3.14.3.2 THE CIMA/CDIMA SYSTEM 

The reaction involving chlorite and iodide ions in the presence of malonic acid, the CIMA reaction, is another 
that supports oscillatory behaviour in a batch system (the chlorite-iodide reaction being a classic 'clock' 
system: the CIMA system also shows reaction-diffusion wave behaviour similar to the BZ reaction, see 
section A3.14.4). The initial reactants, chlorite and iodide, are rapidly consumed, producing ClO₂ and I₂ which 
subsequently play the role of 'reactants'. If the system is assembled from these species initially, we have the 
CDIMA reaction. The chemistry of this oscillator is driven by the following overall processes, with the 
empirical rate laws as given: 


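$$(1)\quad \mathrm{MA + I_2 \rightarrow IMA + I^- + H^+} \qquad r_1 = k_1[\mathrm{MA}] \tag{A3.14.12a}$$

$$(2)\quad \mathrm{ClO_2 + I^- \rightarrow ClO_2^- + \tfrac{1}{2}I_2} \qquad r_2 = k_2[\mathrm{ClO_2}][\mathrm{I^-}] \tag{A3.14.12b}$$

$$(3)\quad \mathrm{ClO_2^- + 4I^- + 4H^+ \rightarrow 2I_2 + Cl^- + 2H_2O} \qquad r_3 = \frac{k_3[\mathrm{ClO_2^-}][\mathrm{I_2}][\mathrm{I^-}]}{\alpha + [\mathrm{I^-}]^2} \tag{A3.14.12c}$$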


The concentrations of the major reactants ClO₂ and I₂, along with H⁺, are treated as constants, so this is a 
two-variable scheme involving the concentrations of ClO₂⁻ and I⁻. Step (12c) constitutes the main feedback 
process, which here is an inhibitory channel, with the rate decreasing as the concentration of iodide ion 
increases (for large [I⁻] the rate is inversely proportional to the concentration). Again, exploiting 
dimensionless terms, the governing rate equations for u (a scaled [I⁻]) and v (scaled [ClO₂⁻]) can be written as 
[22,23]: 

$$\frac{du}{dt} = a - u - \frac{4uv}{1+u^2}, \qquad \frac{dv}{dt} = b\left(u - \frac{uv}{1+u^2}\right)$$

where a and b are constants depending on the rate coefficients and the initial concentrations of the reactants. 
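
As a hedged illustration, the steady state and oscillation threshold of this two-variable scheme can be worked out by a routine linear stability calculation that is not given in the text. The sketch assumes the standard Lengyel-Epstein form du/dt = a − u − 4uv/(1 + u²), dv/dt = b(u − uv/(1 + u²)). Setting both rates to zero gives u = a/5 and v = 1 + u², and the trace of the Jacobian there changes sign at b = 3a/5 − 25/a. Function names are hypothetical.

```python
def le_rates(u, v, a, b):
    """Rates (du/dt, dv/dt) for the assumed Lengyel-Epstein form."""
    coupling = u * v / (1.0 + u * u)
    return a - u - 4.0 * coupling, b * (u - coupling)

def le_steady_state(a):
    """Steady state: dv/dt = 0 gives v = 1 + u**2, then du/dt = 0 gives u = a/5."""
    u = a / 5.0
    return u, 1.0 + u * u

def le_hopf_threshold(a):
    """Value of b below which the steady state is unstable (trace of Jacobian > 0)."""
    return 3.0 * a / 5.0 - 25.0 / a

u_ss, v_ss = le_steady_state(10.0)   # (2.0, 5.0)
b_crit = le_hopf_threshold(10.0)     # 3.5
```

For a = 10 the steady state is (u, v) = (2, 5), and within this linearized picture oscillations are expected for b below 3.5.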

Another important reaction supporting nonlinear behaviour is the so-called FIS system, which involves a 
modification of the iodate-sulfite (Landolt) system by addition of ferrocyanide ion. The Landolt system alone 
supports bistability in a CSTR: the addition of an extra feedback channel leads to an oscillatory system in a 
flow reactor. (This is a general and powerful technique, exploiting a feature known as the 'cross-shaped 
diagram', that has led to the design of the majority of known solution-phase oscillatory systems in flow 

reactors [25].) The FIS system is one member of the important class of pH oscillators in which H⁺ acts as an 
autocatalyst in the oxidation of a weak acid to produce a strong acid. Elsewhere, oscillations are observed in 
important chemical systems such as heterogeneously catalysed reactions or electrochemical and 
electrodissolution reactions [26]. 

A3.14.3.3 COMBUSTION SYSTEMS 

Oscillatory behaviour occurs widely in the oxidation of simple fuels such as H₂, CO and hydrocarbons. Even 
in closed reactors, the CO + O₂ reaction shows a 'lighthouse effect', with up to 100 periodic emissions of 
chemiluminescence accompanying the production of electronically excited CO₂. Although also described as 
'oscillatory ignition', each 'explosion' is associated with less than 1% fuel consumption and can be 
effectively isothermal, even for this strongly exothermic reaction. Many hydrocarbons exhibit 'cool flame' 
oscillations in closed reactors, with typically between two and seven 'bursts' of light emission (from excited 
HCHO) and reaction, accompanied by self-heating of the reacting mixture. In continuous flow reactors, these 
modes can be sustained indefinitely. Additionally, true periodic ignitions occur for both the CO + O₂ and 
H₂ + O₂ systems [27, 28]. The p-Tₐ 'ignition' diagram for the CO + O₂ reaction under typical experimental 
conditions is shown in figure A3.14.6. Example oscillations observed at various locations on this diagram are 
displayed in figure A3.14.7. Within the region marked 'complex oscillations' the simple period-1 oscillation 
is replaced by waveforms that have different numbers of excursions in the repeating unit. The complexity 
develops through a 'period-doubling' sequence, with the waveform having 2ⁿ oscillations per repeating unit, 
with n increasing with Tₐ. The range of experimental conditions over which the higher-order periodicities 
exist decreases in a geometric progression, with n → ∞ leading to an oscillation with no repeating unit at 
some finite ambient temperature. Such chaotic responses exist over a finite range of experimental conditions 
and differ fundamentally from stochastic responses. Plotting the amplitude of one excursion against the 
amplitude of the next gives rise to a 'next-maximum map' (figure A3.14.8). This has a definite structure, a 
single-humped maximum, characteristic of a wide class of physical systems showing the period-doubling 
route to chaos. 
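
The period-doubling route can be illustrated with the simplest single-humped map. The sketch below uses the logistic map x → rx(1 − x) purely as a stand-in for an experimental next-maximum map (it is not derived from the combustion chemistry) and detects the period of the attractor after discarding a transient. The function names and the control parameter r are illustrative.

```python
def logistic(x, r):
    """One iteration of the logistic map, a generic single-humped map."""
    return r * x * (1.0 - x)

def attractor_period(r, x0=0.3, transient=5000, max_period=64, tol=1e-8):
    """Return the smallest period p <= max_period of the settled orbit, or None."""
    x = x0
    for _ in range(transient):          # discard the initial transient
        x = logistic(x, r)
    orbit = [x]
    for _ in range(max_period):
        orbit.append(logistic(orbit[-1], r))
    for p in range(1, max_period + 1):
        if abs(orbit[p] - orbit[0]) < tol:
            return p
    return None  # no short repeating unit found: candidate aperiodic response

periods = [attractor_period(r) for r in (2.8, 3.2, 3.5)]
```

As r is increased the repeating unit doubles from one to two to four iterations, the analogue of the period-1, period-2 and period-4 traces. At larger r no short period is found, corresponding to an aperiodic trace.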

Figure A3.14.6. p-Tₐ ignition limit diagram for the CO + O₂ system in a flow reactor showing location of 
ignition limits and regions of simple and complex (shaded area) oscillations. 

Figure A3.14.7. Example oscillatory time series for the CO + O₂ reaction in a flow reactor corresponding to 
different p-Tₐ locations in figure A3.14.6: (a) period-1; (b) period-2; (c) period-4; (d) aperiodic (chaotic) 
trace; (e) period-5; (f) period-3. 

Figure A3.14.8. Next-maximum map obtained by plotting maximum temperature in one ignition against 
maximum in next ignition from trace (d) of figure A3.14.7. 

The mechanistic origin of simple and complex oscillation in the H₂ + O₂ system is well established. The basic 
oscillatory clockwork involves the self-acceleration of the reaction rate through the chain-branching cycle, 
step (1), step (2) and step (3) in section A3.14.1.1, and the 'self-inhibitory' effect of H₂O production. Water is 
an inhibitor of the H₂ + O₂ system under these pressure and temperature conditions through its role in the 
main chain-termination step 

$$\mathrm{H + O_2 + M \rightarrow HO_2 + M} \tag{A3.14.13}$$

where M is a 'third-body' species which removes energy, stabilizing the newly-formed HO₂ bond. H₂ and O₂ 
have third body efficiencies of 1 and 0.3 respectively (these are measured relative to H₂), but H₂O is 
substantially more effective, with an efficiency of ~6.3 relative to H₂. Following an ignition, then, the rate of 
step (A3.14.13) is enhanced relative to the branching cycle, due to the now high concentration of H₂O, and 
further reaction is inhibited. The effect of the flow to the reactor, however, is to replace H₂O with H₂ and O₂, 
thus lowering the overall third-body effectiveness of the mixture in the reactor. Eventually, the rate of step 
(A3.14.13) relative to the branching rate becomes sufficiently small for another ignition to occur. Complex 
oscillations require the further feedback associated with the self-heating accompanying ignition. A 'minimal' 
complex oscillator mechanism for this system has been determined [28]. 
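
The size of this inhibitory effect can be gauged with a short calculation. The sketch below weights each species by its third-body efficiency (1 for H₂, 0.3 for O₂ and about 6.3 for H₂O, as quoted above) and compares a fresh stoichiometric mixture with fully burnt gas. The mixture compositions and the assumption of equal total concentration before and after ignition are illustrative simplifications, and the names are hypothetical.

```python
# Third-body efficiencies relative to H2, as quoted in the text.
EFFICIENCY = {"H2": 1.0, "O2": 0.3, "H2O": 6.3}

def effective_third_body(mole_fractions, total_conc=1.0):
    """Efficiency-weighted third-body concentration (arbitrary units)."""
    return total_conc * sum(EFFICIENCY[s] * x for s, x in mole_fractions.items())

fresh = effective_third_body({"H2": 2.0 / 3.0, "O2": 1.0 / 3.0})  # unburnt 2H2 + O2
burnt = effective_third_body({"H2O": 1.0})                        # complete conversion
enhancement = burnt / fresh
```

Under these assumptions the termination step is roughly eight times faster in burnt gas than in fresh mixture at the same radical and O₂ concentrations, which is why a further ignition has to wait for the inflow to flush water from the reactor.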

Surprisingly, the origin of the complex oscillations and chaos in the CO + O₂ system (where trace quantities 
of H-containing species have a major influence on the reaction and, consequently, many of the reactions of the 
H₂ + O₂ system predominate) is far from established to date. 

The 'cool flames' associated with the low-temperature oxidation of hydrocarbons have great relevance, being 
the fundamental cause of knock in internal combustion engines. Their mechanistic origin arises through a 
thermokinetic feedback [2]. The crucial feature is the reaction through which O₂ reacts by addition to an alkyl 
radical R· 

$$\mathrm{R^{\bullet} + O_2 \rightleftharpoons RO_2^{\bullet}}. \tag{A3.14.14}$$

Under typical operating conditions, in the absence of self-heating from the reaction, the equilibrium for this 
step lies in favour of the product RO₂·. This species undergoes a series of intramolecular hydrogen-abstraction 
and further O₂-addition steps before fragmentation of the carbon chain. This final step produces three radical 
species, leading to a delayed, but overall branching of the radical chain ('degenerate branching'). This channel 
is overall an exothermic process, and the acceleration in rate associated with the branching leads to an 
increase in the gas temperature. This increase causes the equilibrium of step (A3.14.14) to shift to the left, in 
favour of the R· radical. The subsequent reaction channel for this species involves H-atom abstraction by O₂, 
producing the conjugate alkene. This is a significantly less exothermic channel, and the absence of branching 
means that the overall reaction rate and the rate of heat release fall. The temperature of the reacting mixture, 
consequently, decreases, causing a shift of the equilibrium back to the right. Complex oscillations have been 
observed in hydrocarbon oxidation in a flow reactor, although chaotic responses have not yet been reported. 

A3.14.3.4 CONTROLLING CHAOS 

The simple shape of the next-maximum map has been exploited in approaches to 'control' chaotic systems. 
The basic idea is that a system in a chaotic state is coexisting with an infinite number of unstable periodic 
states: indeed the chaotic 'strange attractor' is comprised of the period-1, period-2, period-4 and all other 
periodic solutions which have now become unstable. Control methods seek to select one of these unstable 
periodic solutions and to 'stabilize' it by applying appropriate but very small perturbations to the experimental 
operating conditions. Such control methods can also be adapted to allow an unstable state, such as the 
period-1 oscillation, to be 'tracked' through regions of operating conditions for which it would be naturally unstable. 
These techniques have been successfully employed for the BZ chaos as well as for chaos in lasers and various 
other physical and biological systems. For a full review and collection of papers see [29]. 
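
The idea of stabilizing an unstable periodic state with small perturbations can be sketched on a model next-maximum map. The example below is an illustration only and not the procedure of [29]: it uses the logistic map as a stand-in single-humped map, takes its unstable fixed point as the 'period-1' state, and nudges the control parameter r by a small amount computed from a linearization whenever the orbit wanders close enough. All names and numerical values are hypothetical.

```python
def controlled_logistic(r0=3.8, x0=0.3, n_steps=20000, max_dr=0.05):
    """Iterate x -> (r0 + dr) x (1 - x), with dr chosen to steer x onto the fixed point.

    Returns the fixed point, the orbit, and the largest perturbation applied.
    """
    x_star = 1.0 - 1.0 / r0              # unstable period-1 point of the map
    slope = 2.0 - r0                     # d(map)/dx evaluated at x_star
    sens = x_star * (1.0 - x_star)       # d(map)/dr evaluated at x_star
    x = x0
    history = []
    largest = 0.0
    for _ in range(n_steps):
        # Linearized correction that would place the next iterate on x_star.
        dr = -slope * (x - x_star) / sens
        if abs(dr) > max_dr:
            dr = 0.0                     # too far away: wait for the orbit to return
        largest = max(largest, abs(dr))
        x = (r0 + dr) * x * (1.0 - x)
        history.append(x)
    return x_star, history, largest

x_star, history, largest = controlled_logistic()
```

The chaotic orbit is left alone until it visits the neighbourhood of the target state, after which perturbations of at most `max_dr` hold it there. Setting `max_dr` to zero switches the control off and leaves the free-running chaotic orbit.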


A3.14.4 TARGETS AND SPIRAL WAVES 

The BZ and other batch oscillatory systems are capable of supporting an important class of reaction-diffusion 
structures. As mentioned earlier, clock reactions support one-off travelling wave fronts or flames, converting 
reactants to products. In an oscillatory system, the 'resetting' process can be expected to produce a 'wave 
back' following the front, giving rise to a propagating wave pulse. Furthermore, as the system is then returned 
more or less to its initial state, further wave initiation may be possible. A series of wave pulses travelling one 
after the other forms a wave train. If the solution is spread as a thin film, for example in a Petri dish, and 
initiation is from a point source, the natural geometry will be for a series of concentric, circular wave pulses: 
a 'target pattern' [16, 30, 31]. An example of such reaction-diffusion structures in the BZ system is shown in 
figure A3.14.9(a). For such studies, the reactant solution is typically prepared with initial composition such 
that the system lies just outside the range for which it is spontaneously oscillatory (i.e. for f marginally in 
excess of 1 + √2, see section A3.14.3.1) by increasing the initial malonic acid concentration relative to 
bromate. The system then sits in a stable steady state corresponding to the reduced form of the catalyst, and 
has the property of being excitable. An excitable system is characterized by (i) having a steady state; (ii) the 
steady state being stable to small perturbations and (iii) responding with an excitation event if the 
perturbation exceeds some critical or threshold value. For the BZ system, this excitation event is the 
oxidation of the redox catalyst, corresponding to process B with a local colour change in the vicinity of the 
perturbation (initiation) site. 
This response is typically large compared with the critical stimulus, so the system acts as a 'nonlinear 
amplifier' of the perturbing signal. Following the excitation, the system eventually returns to the initial steady 
state and recovers its excitability. There is, however, a finite period, the refractory period, between the 
excitation and the recovery during which the system is unresponsive to further stimuli. These basic 
characteristics are summarized in figure A3.14.10. Excitability is a feature not just of the BZ system, but is 
found widely throughout physical and, in particular, biological systems, with important examples in nerve 
signal transmission and co-ordinated muscle contraction [32]. 
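
The threshold character of an excitable steady state can be sketched with the two-variable oregonator, assuming the form dx/dt = (1/ε){x(1 − x) − fz(x − q)/(x + q)}, dz/dt = x − z with f = 3, outside the oscillatory window. The parameter values, the size of the two stimuli and all function names are illustrative assumptions. The system is started at its steady state with x displaced to a chosen value, and the largest subsequent value of x is recorded.

```python
import math

def rhs(x, z, f, eps, q):
    """Two-variable oregonator rates (assumed form)."""
    return (x * (1.0 - x) - f * z * (x - q) / (x + q)) / eps, x - z

def rk4_step(x, z, dt, f, eps, q):
    """One classical fourth-order Runge-Kutta step."""
    k1x, k1z = rhs(x, z, f, eps, q)
    k2x, k2z = rhs(x + 0.5 * dt * k1x, z + 0.5 * dt * k1z, f, eps, q)
    k3x, k3z = rhs(x + 0.5 * dt * k2x, z + 0.5 * dt * k2z, f, eps, q)
    k4x, k4z = rhs(x + dt * k3x, z + dt * k3z, f, eps, q)
    return (x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0,
            z + dt * (k1z + 2.0 * k2z + 2.0 * k3z + k4z) / 6.0)

def steady_state(f, q):
    """Positive root of x**2 + (f - 1 + q) x - (f + 1) q = 0; at steady state z = x."""
    b = f - 1.0 + q
    c = -(f + 1.0) * q
    return 0.5 * (-b + math.sqrt(b * b - 4.0 * c))

def peak_response(x_kick, f=3.0, eps=0.04, q=8.0e-4, t_end=2.0, dt=2.0e-4):
    """Largest x reached after displacing x from the steady state to x_kick."""
    z = steady_state(f, q)
    x = x_kick
    peak = x
    for _ in range(int(round(t_end / dt))):
        x, z = rk4_step(x, z, dt, f, eps, q)
        peak = max(peak, x)
    return peak

small = peak_response(0.002)   # sub-threshold stimulus
large = peak_response(0.02)    # supra-threshold stimulus
```

The smaller stimulus simply decays, whereas the larger one triggers an excursion many times bigger than the stimulus itself, which is the 'nonlinear amplifier' behaviour described above.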



Figure A3. 14.9. Reaction-diffusion structures for an excitable BZ system showing (a) target and (b) spiral 
waves. (Courtesy of A F Taylor.) 

Figure A3.14.10. Schematic representation of important features of an excitable system (see the text for 
details). 

The target structures shown in figure A3. 14.9(a) reveal several levels of detail. Each 'pacemaker site' at the 
centre of a target typically corresponds to a position at which the system exhibits some heterogeneity, in some 
cases due to the presence of dust particles or defects of the dish surface. It is thought that these alter the local 
pH, so as to produce a composition in that vicinity such that the system is locally oscillatory, the spontaneous 
oscillations then serving as initiation events. Different sites have differing natural oscillatory frequencies, 
leading to the differing observed wavelengths of the various target structures. The speed of the waves also 
depends on the frequency of the pacemaker, through the so-called dispersion relation. The underlying cause of 
this for the BZ system is that the speed of a given front is dependent on the bromide ion (inhibitor) 
concentration into which it is propagating: the higher the pacemaker frequency, the less time the bromide ion 
concentration has to fall, so high-frequency (low-period) structures have lower propagation speeds. 

If a wave pulse is broken, for example through mechanical disturbance, another characteristic feature of 
excitable media is that the two 'ends' then serve as sites around which the wave may develop into a pair of 
counter-rotating spirals, see figure A3.14.9(b). Once created, the spiral core is a persistent structure (in 
contrast to the target, in which case removal of the pacemaker heterogeneity prevents further initiation). The 
spiral structures have a wavelength determined by the composition of the bulk solution rather than the local 
properties at the core (although other features such as the meandering of the core may depend more crucially 
on local properties). 


Targets and spirals have been observed in the CIMA/CDIMA system [13] and also in dilute flames (i.e. 
flames close to their lean flammability limits) in situations of enhanced heat loss [33]. In such systems, 
substantial fuel is left unburnt. Spiral waves have also been implicated in the onset of cardiac arrhythmia [32]: 
the normal contractive events occurring across the atria in the mammalian heart are, in some sense, equivalent 
to a wave pulse initiated from the sino-atrial node, which acts as a pacemaker. If this pulse becomes 
fragmented, perhaps by passing over a region of heart muscle tissue of lower excitability, then spiral 
structures (in 3D, these are scroll waves) or 're-entrant waves' may develop. These have the incorrect 
sequencing of contractions to squeeze blood from the atria to the ventricles and impair the operation of the 
heart. Similar waves have been observed in neuronal tissue and there are suggested links to pathological 
behaviour such as epilepsy and migraine [34]. Spirals and targets have also been observed accompanying the 
oxidation of CO on appropriate single-crystal catalysts, such as Pt(110), and in other heterogeneously 
catalysed systems of technological relevance [35] (see figure A3.14.11). The light-sensitive nature of the 
Ru(bipy)₃²⁺-catalysed BZ system has been exploited in many attempts to 'control' or influence spiral structures 
(for example to remove spirals). The excitable properties of the BZ system have also been used to develop 
generic methods for devising routes through complex mazes or to construct chemical equivalents of logic 
gates [36]. 



Figure A3.14.11. Spiral waves imaged by photoemission electron microscopy for the oxidation of CO by O₂ 
on a Pt(110) single crystal under UHV conditions. (Reprinted with permission from [35], © The American 
Institute of Physics.) 

A3.14.5 TURING PATTERNS AND OTHER STRUCTURES 

A3.14.5.1 TURING PATTERNS 

Diffusive processes normally operate in chemical systems so as to disperse concentration gradients. In a paper 
in 1952, the mathematician Alan Turing produced a remarkable prediction [37] that if selective diffusion were 
coupled with chemical feedback, the opposite situation may arise, with a spontaneous development of 
sustained spatial distributions of species concentrations from initially uniform systems. Turing's paper was set 
in the context of the development of form (morphogenesis) in embryos, and has been adopted in some studies 
of animal coat markings. With the subsequent theoretical work at Brussels [1], it became clear that oscillatory 
chemical systems should provide a fertile ground for the search for experimental examples of these Turing 
patterns. 


The basic requirements for a Turing pattern are: 

(i) the chemical reaction must exhibit feedback kinetics; 


(ii) the diffusivity of the feedback species must be less than those of the other species; 

(iii) for the patterns to be sustained, the system must be open to the inflow and outflow of reactants and products. 
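
Requirements (i) and (ii) can be made concrete with the standard linear test for a diffusion-driven instability in a two-species system. This is a textbook result and is not derived in this section. Given the Jacobian entries of the kinetics at the uniform steady state and the two diffusion coefficients, the state must be stable without diffusion yet unstable to some spatial wavenumber with it. The Jacobian values used below are hypothetical, with u the feedback (self-activating) species.

```python
def turing_unstable(fu, fv, gu, gv, d_u, d_v):
    """Linear test for a Turing instability of a uniform steady state.

    fu, fv, gu, gv are the Jacobian entries of the kinetics (du/dt = f, dv/dt = g);
    d_u and d_v are the diffusion coefficients of the two species.
    """
    trace = fu + gv
    det = fu * gv - fv * gu
    # Without diffusion the steady state must be stable.
    stable_without_diffusion = trace < 0.0 and det > 0.0
    # With diffusion, some wavenumber must make the determinant negative.
    h = d_v * fu + d_u * gv
    diffusion_driven = h > 0.0 and h * h > 4.0 * d_u * d_v * det
    return stable_without_diffusion and diffusion_driven

# Hypothetical kinetics: u activates itself (fu > 0), v inhibits.
equal_diffusion = turing_unstable(1.0, -2.0, 3.0, -4.0, d_u=1.0, d_v=1.0)
slow_activator = turing_unstable(1.0, -2.0, 3.0, -4.0, d_u=1.0, d_v=20.0)
```

With equal diffusivities the uniform state stays stable, while making the feedback species sufficiently less mobile than its partner allows patterns to grow, which is the content of requirement (ii).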

Requirement (i) is met particularly well by the BZ and CIMA/CDIMA reactions, although many chemical 
reactions with feedback are known. Requirements (ii) and (iii) were met almost simultaneously through the 
use of 'continuous flow unstirred reactors' (CFURs) in which the reaction is carried out in a dilute gel or 
membrane, with reactant free flows across the edges or faces of the gel. The incorporation of large indicator 
molecules such as starch into the gel is the key. This indicator is used with the CIMA/CDIMA system for 
which I₃⁻ is formed where the I⁻ concentration is high, and this binds as a complex to the starch to produce the 
characteristic blue colour. The complexed ion is temporarily immobilized compared with the free ion, thus 
reducing the effective diffusion coefficient in a kind of 'reactive chromatography' [24]. In this way the first 
laboratory examples of Turing patterns were produced in Bordeaux [38] and in Texas [39]: examples are 
shown in figure A3.14.12 and figure A3.14.13. Turing patterns have not been unambiguously observed in the 
BZ system as no similar method of reducing the diffusivity of the autocatalytic species HBrO₂ has been 
devised. 

Figure A3.14.12. The first experimental observation of a Turing pattern in a gel strip reactor. Solutions 
containing separate components of the CIMA/CDIMA reaction are flowed along each edge of the strip and a 
spatial pattern along the horizontal axis develops for a range of experimental conditions. (Reprinted with 
permission from [38], © The American Physical Society.) 

Figure A3.14.13. Further examples of the various Turing patterns observable in a 2D gel reactor: (a) and (b) 
spots; (c) and (d) stripes; (e) and (f) wider field of view showing long-range defects in basic structure. The 
scale bar alongside each figure represents 1 mm. (Reprinted with permission from [39], © The American 
Institute of Physics.) 

A3.14.5.2 CELLULAR FLAMES 

Such 'diffusion-driven instabilities' had been observed earlier in combustion systems. As early as 1892, 
Smithells reported the observation of 'cellular flames' in fuel-rich mixtures [40]. An example is shown in 
figure A3.14.14. These were explained theoretically by Sivashinsky in terms of a 'thermodiffusive' 
mechanism [41]. The key feature here involves the role played by the Lewis number, Le, the ratio of the 
thermal to mass diffusivity. If Le < 1, which may arise in a fuel-rich flame (for which H-atoms are the 
relevant species) of relatively low thermal conductivity (owing to the high hydrocarbon content), a planar flame 
is unstable to spatial perturbations along the front. This mechanism has also been shown to operate for simple 
one-off chemical wave fronts, such as the iodate-arsenite system [42] and for various pH-driven fronts [43], if 
the diffusivity of I⁻ or H⁺ is reduced via complexing strategies similar to that described above for the CIMA 
system. 

Figure A3.14.14. A cellular flame in butane oxidation on a burner. (Courtesy of A C McIntosh.) 

A3.14.5.3 OTHER REACTION-DIFFUSION STRUCTURES 

The search for Turing patterns led to the introduction of several new types of chemical reactor for studying 
reaction-diffusion events in feedback systems. Coupled with huge advances in imaging and data analysis 
capabilities, it is now possible to make detailed quantitative measurements on complex spatiotemporal 
behaviour. A few of the reactor configurations of interest will be mentioned here. 

The Turing instability is specific in requiring the feedback species to be selectively immobilized. A related 
instability, the differential flow-induced chemical instability or DIFICI, requires only that one active species 
be immobilized relative to the others [44]. The experimental configuration is simple: a column of ion 
exchange beads is loaded with one chemical component, for example, ferroin for the BZ system. The 
remaining species are prepared in solution and flowed through this column. Above some critical flow rate, a 
travelling spatial structure with narrow bands of oxidized reagent (in the BZ system) separated by a 
characteristic wavelength and propagating with a characteristic velocity (not equal to the liquid flow velocity) 
is established. This effect has been realized experimentally (see figure A3.14.15). 

Figure A3.14.15. The differential flow-induced chemical instability (DIFICI) in the BZ reaction. (Reprinted 
with permission from [44], © The American Physical Society.) 

If a fluid is placed between two concentric cylinders, and the inner cylinder rotated, a complex fluid 
dynamical motion known as Taylor-Couette flow is established. Mass transport is then by exchange between 
eddy vortices which can, under some conditions, be imagined as a substantially enhanced diffusivity 
(typically with 'effective' diffusion coefficients several orders of magnitude above molecular diffusion 
coefficients) that can be altered by varying the rotation rate, and with all species having the same diffusivity. 
Studies of the BZ and CIMA/CDIMA systems in such a Couette reactor [45] have revealed bifurcation 
through a complex sequence of front patterns, see figure A3.14.16. 

Figure A3.14.16. Spatiotemporal complexity in a Couette reactor: space-time plots showing the variation of 

position with time of fronts corresponding to high concentration gradients of .t in the CIMA/CDIMA 
reaction. (Reprinted with permission from Ouyang et al [45], © Elsevier Science Publishers 1989.) 


The FIS reaction (section A3.14.3.2) has been studied in a CFUR and revealed a series of structures known as 
'serpentine patterns'; also, the birth, self-replication and death of 'spots', corresponding to regions of high 
concentration of particular species (see figure A3.14.17), have been observed [46]. 

Figure A3.14.17. Self-replicating spots in the FIS reaction in a CFUR, comparing an experimental time 
sequence with numerical simulation based on a simple autocatalytic scheme. (Reprinted with permission from 
Lee et al [46], © Macmillan Magazines Ltd. 1994.) 


A3.14.6 THEORETICAL METHODS 

Much use has been, and continues to be, made of simplified model schemes representative of general classes 
of chemical or thermal feedback. The oregonator and Lengyel-Epstein models for the BZ and CDIMA 
systems have been given earlier. Pre-eminent among the more abstracted caricature models is the brusselator 
introduced by Prigogine and Lefever [47] which has the following form: 


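$$\mathrm{A \rightarrow X} \tag{A3.14.15a}$$

$$\mathrm{B + X \rightarrow Y + D} \tag{A3.14.15b}$$

$$\mathrm{Y + 2X \rightarrow 3X} \tag{A3.14.15c}$$

$$\mathrm{X \rightarrow E}. \tag{A3.14.15d}$$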


Here, A and B are regarded as 'pool chemicals', with concentrations regarded as imposed constants. The 
concentrations of the intermediate species X and Y are the variables, with D and E being product species 
whose concentrations do not influence the reaction rates. The reaction rate equations for [X] and [Y] can be 
written in the following dimensionless form: 

$$\frac{dx}{dt} = A - Bx + x^2y - x, \qquad \frac{dy}{dt} = Bx - x^2y.$$

Oscillations are found in this model if B > B* = 1 + A². 
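
The oscillation condition for the brusselator follows from a short linear stability calculation, sketched here under the assumption that the dimensionless equations are dx/dt = A − Bx + x²y − x and dy/dt = Bx − x²y. The steady state is (x, y) = (A, B/A). The Jacobian there has trace B − 1 − A² and determinant A², so the steady state loses stability, and oscillations appear, when B exceeds 1 + A². Function names are hypothetical.

```python
def brusselator_rates(x, y, A, B):
    """Dimensionless brusselator rates (assumed form)."""
    return A - B * x + x * x * y - x, B * x - x * x * y

def brusselator_stability(A, B):
    """Trace and determinant of the Jacobian at the steady state (A, B/A)."""
    x, y = A, B / A
    fx = -B + 2.0 * x * y - 1.0   # equals B - 1 at the steady state
    fy = x * x
    gx = B - 2.0 * x * y          # equals -B at the steady state
    gy = -x * x
    return fx + gy, fx * gy - fy * gx

def is_oscillatory(A, B):
    """Unstable steady state with positive determinant: oscillations expected."""
    trace, det = brusselator_stability(A, B)
    return det > 0.0 and trace > 0.0

below = is_oscillatory(1.0, 1.5)   # B < 1 + A**2
above = is_oscillatory(1.0, 2.5)   # B > 1 + A**2
```

Because the determinant is always positive, there is a single threshold in B and no upper limit to the oscillatory range.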

A variation on this theme introduced by Gray and Scott, known as the 'autocatalator', is also widely 
exploited. This is often written in the form 

$$\mathrm{P \rightarrow A} \tag{A3.14.16a}$$

$$\mathrm{A \rightarrow B} \tag{A3.14.16b}$$

$$\mathrm{A + 2B \rightarrow 3B} \tag{A3.14.16c}$$

$$\mathrm{B \rightarrow C} \tag{A3.14.16d}$$

so here A and B are equivalent to the Y and X in the brusselator and the main clockwork again involves the 
cubic autocatalytic step (15c) or step (16c). The dimensionless equations here are 

$$\frac{da}{d\tau} = \mu - \kappa a - ab^2, \qquad \frac{db}{d\tau} = \kappa a + ab^2 - b \tag{A3.14.17}$$

where μ is a scaled concentration of the reactant P and κ is a dimensionless rate coefficient for the 
'uncatalysed' conversion of A to B in step (16b). In this form, the model has oscillatory behaviour over a range 
of experimental conditions: 

$$\mu_1^* < \mu < \mu_2^* \quad \text{with} \quad (\mu_{1,2}^*)^2 = \tfrac{1}{2}\left[1 - 2\kappa \pm (1 - 8\kappa)^{1/2}\right]. \tag{A3.14.18}$$

Outside this range, the system approaches the steady state obtained by setting da/dτ = db/dτ = 0: 

$$(a_{ss}, b_{ss}) = \left(\frac{\mu}{\mu^2 + \kappa},\ \mu\right). \tag{A3.14.19}$$

The existence of an upper and a lower limit to the range of oscillatory behaviour is more typical of observed 
behaviour in chemical systems. 
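The two limits of the oscillatory range can be evaluated numerically. The sketch below assumes the dimensionless autocatalator equations da/dt = mu - kappa a - a b^2 and db/dt = kappa a + a b^2 - b, for which the steady state is a = mu/(mu^2 + kappa), b = mu. It computes the two values of mu at which the trace of the Jacobian vanishes. The helper names are hypothetical.

```python
import math


def autocatalator_steady_state(mu, kappa):
    """Steady state of the dimensionless autocatalator equations."""
    return mu / (mu * mu + kappa), mu


def autocatalator_trace(mu, kappa):
    """Trace of the Jacobian evaluated at the steady state."""
    a, b = autocatalator_steady_state(mu, kappa)
    return -(kappa + b * b) + (2.0 * a * b - 1.0)


def hopf_limits(kappa):
    """Lower and upper limits of mu for oscillations, or None if kappa > 1/8."""
    disc = 1.0 - 8.0 * kappa
    if disc < 0.0:
        return None  # no real roots: the steady state is never unstable
    root = math.sqrt(disc)
    lower = math.sqrt(0.5 * (1.0 - 2.0 * kappa - root))
    upper = math.sqrt(0.5 * (1.0 - 2.0 * kappa + root))
    return lower, upper
```

For kappa = 0.1 this gives limits near mu = 0.42 and mu = 0.79; the trace is positive between them and negative outside, and no oscillatory range exists once kappa exceeds 1/8.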
The autocatalytic driving step (16c) can also be taken on its own, or with the 'decay' step (16d), in models of
open systems such as a CSTR, with an inflow of species A and, perhaps, of B. This system is then one of the
simplest to show bistability and more complex steady-state loci of the type described in section A3.14.2.3.
Also, generic features of wave front propagation can be studied on the basis of this scheme [7]. A
comprehensive account can be found in the book by Gray and Scott (see Further Reading). Essentially, this
model has been used in the context of modelling the broad features of oscillations in glycolysis and, with
some modification, for animal coat patterning through Turing-like mechanisms.

The main theoretical methods have in common the determination of the stability of steady-state or other
simple solutions to the appropriate form of the governing mass balance equations. Bifurcations from simple to
more complex responses occur when such a solution loses its stability. Thus, the steady state given in equation
(A3.14.19) does not cease to exist in the oscillatory region defined by (A3.14.18), but is now unstable, so that,
if the system is perturbed, it departs from the steady state and moves to the (stable) oscillatory state which is
also a solution of the reaction rate equations (A3.14.17).

In the generalized representation of the rate equations 

$$\frac{da}{dt} = f(a, b), \qquad \frac{db}{dt} = g(a, b)$$

where $f$ and $g$ are functions of the species concentrations, a determining role is played by the eigenvalues of
the Jacobian matrix $\mathbf{J}$ defined by

$$\mathbf{J} = \begin{pmatrix} \partial f/\partial a & \partial f/\partial b \\ \partial g/\partial a & \partial g/\partial b \end{pmatrix}$$

evaluated with the steady-state concentrations. (This is readily generalized to $n$-variable systems, with $\mathbf{J}$ then
being an $n \times n$ matrix.) Bifurcations corresponding to a turning or fold point in a steady-state locus (ignition
or extinction point) occur if a real eigenvalue passes through zero. Equivalently, this arises if the determinant
$\det \mathbf{J} = 0$. This is known as a 'saddle-node' bifurcation. The oscillatory instability, or Hopf bifurcation, occurs
if the real part of a complex-conjugate pair of eigenvalues passes through zero (provided all other eigenvalues are
negative or have negative real parts). For a two-variable system, this occurs if the trace $\operatorname{Tr} \mathbf{J} = 0$, and is the
origin of the result in equation (A3.14.18).
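The stability test described above can be sketched for any two-variable scheme. The following Python fragment is an illustration under stated assumptions, not a prescribed method: it approximates the Jacobian by central differences, obtains the two eigenvalues from the trace and determinant, and labels the steady state accordingly. All function names and the tolerance are hypothetical choices.

```python
import cmath


def jacobian_2x2(f, g, a, b, h=1.0e-6):
    """Central-difference approximation to the Jacobian of (f, g) at (a, b)."""
    return (
        ((f(a + h, b) - f(a - h, b)) / (2 * h), (f(a, b + h) - f(a, b - h)) / (2 * h)),
        ((g(a + h, b) - g(a - h, b)) / (2 * h), (g(a, b + h) - g(a, b - h)) / (2 * h)),
    )


def eigenvalues_2x2(J):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    trace = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def classify(J, tol=1.0e-9):
    """Label a steady state using the sign of det J and Tr J."""
    trace = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if abs(det) < tol:
        return "saddle-node point"   # a real eigenvalue is zero
    if det < 0.0:
        return "saddle"              # real eigenvalues of opposite sign
    if abs(trace) < tol:
        return "Hopf point"          # purely imaginary pair
    return "stable" if trace < 0.0 else "unstable"
```

Applied to the autocatalator with kappa = 0.1 and mu = 0.6, a point inside the oscillatory range, the steady state is reported as unstable.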

The autocatalator model is in many ways closely related to the FONI system, which has a single first-order 
exothermic reaction step obeying an Arrhenius temperature dependence and for which the role of the 
autocatalyst is taken by the temperature of the system. An extension of this is the Sal'nikov model which 
supports 'thermokinetic' oscillations in combustion-like systems [48]. This has the form: 

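$$\mathrm{P} \rightarrow \mathrm{A}, \qquad \text{rate} = k_0 p$$

$$\mathrm{A} \rightarrow \mathrm{C} + \text{heat}, \qquad \text{rate} = k_1(T)\,a.$$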
The reactant P is again taken as a pool chemical, so the first step has a constant rate. The rate of the second
step depends on the concentration of the intermediate A and on the temperature T and this step is taken as
exothermic. (In the simplest case, $k_0$ is taken to be independent of T and the first step is thermoneutral.)
Again, the steady state is found to be unstable over a range of parameter values, with oscillations being 
observed. 

The approach to investigating spatial structure is similar: usually some simple solution, such as a spatially
uniform steady state, exists and the condition for instability to spatial perturbations is determined in terms of
eigenvalues of an extended Jacobian matrix. For conditions marginally beyond a bifurcation point (whether
the instability is temporal or spatial), amplitude equations, such as the complex Ginzburg-Landau equation,
are exploited [49]. For conditions far from bifurcation points, however, recourse to numerical integration is
generally required. Frequently this will involve reaction-diffusion (and perhaps advection) equations,
although representations of such systems in terms of cellular automata or lattice-gas models can be
advantageous [50].
B1.1 Electronic spectroscopy

S J Strickler 


B1.1.1 INTRODUCTION 

Optical spectroscopy is the study of the absorption and emission of light by atoms, molecules, or larger 
assemblies. Electronic spectroscopy is the branch of the field in which the change produced by the absorption 
or emission is a rearrangement of the electrons in the system. These changes are interpreted in terms of the 
quantum theory of electronic structure. To a first approximation, the rearrangements usually correspond to an 
electron being transferred from one orbital to another, and a transition will be described in terms of those 
orbitals. The wavelengths or frequencies of transitions help identify atoms and molecules and give 
information about their energy levels and hence their electronic structure and bonding. Intensities of 
absorption or emission give information about the nature of the electronic states and help to determine 
concentrations of species. In the case of molecules, along with the rearrangements of electrons there are 
usually changes in nuclear motions, and the vibrational and rotational structure of electronic bands give 
valuable insights into molecular structure and properties. For all these reasons, electronic spectroscopy is one 
of the most useful tools in chemistry and physics. 

Most electronic transitions of interest fall into the visible and near-ultraviolet regions of the spectrum. This 
range of photon energies commonly corresponds to electrons being moved among valence orbitals. These 
orbitals are important to an understanding of bonding and structure, so are of particular interest in physical 
chemistry and chemical physics. For this reason, most of this chapter will concentrate on visible and near-UV 
spectroscopy, roughly the region between 200 and 700 nm, but there are no definite boundaries to the 
wavelengths of interest. Some of the valence orbitals will be so close in energy as to give spectra in the near- 
infrared region. Conversely, some valence transitions will be at high enough energy to lie in the vacuum 
ultraviolet, below about 200 nm, where air absorbs strongly and instrumentation must be evacuated to allow 
light to pass. In this region are also transitions of electrons to states of higher principal quantum number, 
known as Rydberg states. At still higher energies, in the x-ray region, are transitions of inner-shell electrons, 
and their spectroscopy has become an extremely useful tool, especially for studying solids and their surfaces. 
However, these other regions will not be covered in detail here. 

Section B1.1.2 provides a brief summary of experimental methods and instrumentation, including definitions
of some of the standard measured spectroscopic quantities. Section B1.1.3 reviews some of the theory of
spectroscopic transitions, especially the relationships between transition moments calculated from
wavefunctions and integrated absorption intensities or radiative rate constants. Because units can be so
confusing, numerical factors with their units are included in some of the equations to make them easier to use.
Vibrational effects, the Franck-Condon principle and selection rules are also discussed briefly. In the final
section, B1.1.4, a few applications to particular aspects of electronic spectroscopy are mentioned.


B1.1.2 EXPERIMENTAL METHODS 


B1.1.2.1 STANDARD INSTRUMENTATION


There are two fundamental types of spectroscopic studies: absorption and emission. In absorption 
spectroscopy an atom or molecule in a low-lying electronic state, usually the ground state, absorbs a photon to 
go to a higher state. In emission spectroscopy the atom or molecule is produced in a higher electronic state by 
some excitation process, and emits a photon in going to a lower state. In this section we will consider the 
traditional instrumentation for studying the resulting spectra. They define the quantities measured and set the 
standard for experimental data to be considered. 

(A) EMISSION SPECTROSCOPY 

Historically, emission spectroscopy was the first technique to be extensively developed. An electrical 
discharge will break up most substances into atoms and small molecules and their ions. It also excites these 
species into nearly all possible stable states, and the higher states emit light by undergoing transitions to lower 
electronic states. In typical classical instruments light from the sample enters through a slit, is collimated by a 
lens or mirror, is dispersed by a prism or grating so that different colours or wavelengths are travelling in 
different directions and is focused on a detector. Gratings are generally favoured over prisms. A concave 
grating may do the collimation, dispersion and focusing all in one step. The angle at which the light is 
reflected by a plane grating is simply related to the spacing, d, of the grooves ruled on the grating and the 
wavelength, $\lambda$, of the light:

$$\pm n\lambda = d(\sin\alpha + \sin\beta)$$

where $\alpha$ and $\beta$ are the angles of incidence and reflection measured from the normal, and n is the order of the
reflection, i.e. the number of cycles by which wave fronts from successive grooves differ for constructive 
interference. (The angles may be positive or negative depending on the experimental arrangement.) The first 
determinations of the absolute wavelengths of light and most of our knowledge of energies of excited states of 
atoms and molecules came from measuring these angles. In everyday use now the wavelength scale of an 
instrument can be calibrated using the wavelengths of known atomic lines. Grating instruments do have the 
disadvantage in comparison with prism spectrometers that different orders of light may overlap and need to be 
sorted out. 
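The grating relation is easily evaluated numerically. The sketch below assumes the positive sign convention, n lambda = d (sin alpha + sin beta), and a groove density quoted in grooves per millimetre; the function name and the example numbers are illustrative rather than taken from the text.

```python
import math


def diffraction_angle_deg(wavelength_nm, grooves_per_mm, incidence_deg=0.0, order=1):
    """Angle of reflection beta (degrees) from n*lambda = d*(sin(alpha) + sin(beta))."""
    d_nm = 1.0e6 / grooves_per_mm  # groove spacing in nanometres
    sin_beta = order * wavelength_nm / d_nm - math.sin(math.radians(incidence_deg))
    if abs(sin_beta) > 1.0:
        raise ValueError("no diffracted beam for this order and geometry")
    return math.degrees(math.asin(sin_beta))
```

With a 1200 grooves/mm grating at normal incidence, 500 nm light in first order leaves at about 36.9 degrees. The overlap of orders mentioned above also appears directly: 800 nm in first order and 400 nm in second order emerge at the same angle.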

The earliest detector was the human eye observing the different colours. More versatile is a photographic 
plate, where each wavelength of light emitted shows up as a dark line (an image of the entrance slit) on the 
plate. It has the advantage that many wavelengths are measured simultaneously. Quantitative measurements 
are easier with a photomultiplier tube placed behind an exit slit. The spectrum is obtained as the different 
wavelengths are scanned across the slit by rotating the grating. The disadvantage is that measurements are 
made one wavelength at a time. With the advent of solid-state electronics, array detectors have become 
available that will measure many wavelengths at a time much like a photographic plate, but which can be read 
out quickly and quantitatively into a computer or other data system. 

The design and use of spectrographs or spectrometers involves a compromise between resolution (how close
in wavelength two lines can be and still be seen as separate) and sensitivity (how weak a light can be
observed or how long it takes to make a measurement). Books have been written about the design of such
instruments [1], and the subject cannot be pursued in this work.

Larger molecules generally cannot be studied in quite the same way, as an electric discharge merely breaks 
them up into smaller molecules or atoms. In such a case excited states are usually produced by optical 
excitation using light of the same or higher energy. Many modern fluorimeters are made with two
monochromators, one to select an excitation wavelength from the spectrum of a suitable lamp, and the other to
observe the emission as discussed above. Most studies of large molecules are done on solutions because 
vapour pressures are too low to allow gaseous spectra. 

The fundamental measurements made in emission spectroscopy are the wavelengths or frequencies and the 
intensities of emission lines or bands. The problem with intensity measurements is that the efficiency of a 
dispersing and detecting system varies with wavelength. Relative intensities at a single wavelength are usually 
quite easily measured and can be used as a measure of concentration or excitation efficiency. Relative 
intensities of lines at nearly the same wavelength, say different rotational lines in a given band, can usually be 
obtained by assuming that the efficiency is the same for all. But the absolute intensity of a band or relative 
intensities of well separated bands require a calibration of the sensitivity of the instrument as a function of 
wavelength. In favourable cases this may be done by recording the spectrum of a standard lamp that has in 
turn been calibrated to give a known spectrum at a defined lamp current. For critical cases it may be necessary 
to distinguish between intensities measured in energy flow per unit time or in photons per unit time. 

(B) ABSORPTION SPECTROSCOPY 

Absorption spectroscopy is a common and well developed technique for studying electronic transitions 
between the ground state and excited states of atoms or molecules. A beam of light passes through a sample, 
and the amount of light that is absorbed during the passage is measured as a function of the wavelength or 
frequency of the light. The absorption is measured by comparing the intensity, $I$, of light leaving the sample
with the intensity, $I_0$, entering the sample. The transmittance, $T$, is defined as the ratio

$$T = I/I_0.$$

It is often quoted as a percentage. In measuring the spectra of gases or solutions contained in cells, $I_0$ is
usually taken to be the light intensity passing through an empty cell or a cell of pure solvent. This corrects 
well for reflection at the surfaces, absorption by the solvent or light scattering, which are not usually the 
quantities of interest. 

It is usually convenient to work with the decadic absorbance, A, defined by 

$$A = \log(I_0/I) = -\log T.$$

The unmodified term absorbance usually means this quantity, though some authors use the Napierian
absorbance $B = -\ln T$. The absorbance is so useful because it normally increases linearly with path length, $l$,
through the sample and with the concentration, $c$, of the absorbing species within the sample. The relationship
is usually called Beer's law:

$$A = \varepsilon c l.$$

The quantity $\varepsilon$ is called the absorption coefficient or extinction coefficient, more completely the molar
decadic absorption coefficient; it is a characteristic of the substance and the wavelength and to a lesser extent
the solvent and temperature. It is common to take path length in centimetres and concentration in moles per
litre, so $\varepsilon$ has units of L mol$^{-1}$ cm$^{-1}$. The electronic absorption spectrum of a compound is usually shown as a
plot of $\varepsilon$ versus wavelength or frequency.

Another useful quantity related to extinction coefficient is the cross section, $\sigma$, defined for a single atom or
molecule. It may be thought of as the effective area blocking the beam at a given wavelength, and the value
may be compared with the size of the molecule. The relationship is

$$\sigma = (\ln 10)\,\varepsilon / N_A$$

where $N_A$ is Avogadro's number. If $\varepsilon$ is in L mol$^{-1}$ cm$^{-1}$ and $\sigma$ is desired in cm$^2$ the relationship may be
written

$$\sigma = (3.8235 \times 10^{-21}\ \mathrm{cm^3\ mol\ L^{-1}})\,\varepsilon.$$
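The quantities defined in this section are connected by simple arithmetic, sketched here in Python. The example assumes the decadic absorbance, Beer's law in the form A = epsilon c l with path length in centimetres and concentration in moles per litre, and the factor of 1000 cm^3 per litre needed to express the cross section in cm^2. The function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def absorbance_from_transmittance(transmittance):
    """Decadic absorbance A = -log10(T)."""
    return -math.log10(transmittance)


def concentration_mol_per_litre(absorbance, epsilon, path_cm):
    """Concentration from Beer's law, A = epsilon * c * l."""
    return absorbance / (epsilon * path_cm)


def cross_section_cm2(epsilon):
    """Cross section in cm^2 from epsilon in L mol^-1 cm^-1."""
    return 1000.0 * math.log(10.0) * epsilon / AVOGADRO
```

A transmittance of 10% corresponds to A = 1; for a species with epsilon = 10^4 L mol^-1 cm^-1 in a 1 cm cell this implies a concentration of 10^-4 mol per litre and a cross section of about 3.8 x 10^-17 cm^2.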

The standard instrument for measuring an absorption spectrum is a double-beam spectrophotometer. A typical 
instrument uses a lamp with a continuous spectrum to supply the light, usually a tungsten lamp for the visible, 
near-infrared, and near-ultraviolet regions and a discharge lamp filled with hydrogen or deuterium for farther 
in the ultraviolet. Light from the source passes through a monochromator to give a narrow band of 
wavelengths, and is then split into two beams. One beam passes through a cell containing the sample, the 
other through an identical reference cell filled with solvent. These beams define $I$ and $I_0$, respectively. The
beams are monitored by a detection system, usually a photomultiplier tube. An electronic circuit measures the 
ratio of the two intensities and displays the transmittance or absorbance. The wavelength is varied by scanning 
the monochromator, and the spectrum may be plotted on a chart recorder. 

A different design of instrument called a diode array spectrometer has become popular in recent years. In this 
instrument the light from the lamp passes through the sample, then into a spectrometer to be dispersed, and 
then is focused onto an array of solid-state detectors arranged so that each detector element measures intensity 
in a narrow band of wavelengths — say one detector for each nanometre of the visible and ultraviolet regions. 
The output is digitized and the spectrum displayed on a screen, and it can be read out in digital form and 
processed with a computer. The complete spectrum can be recorded in a few seconds. This is not formally a 
double-beam instrument, but because a spectrum is taken so quickly and handled so easily, one can record the 
spectrum of a reference cell and the sample cell and then compare them in the computer, so it serves the same 
purpose. The available instruments do not give quite the resolution or versatility of the standard 
spectrophotometers, but they are far quicker and easier to use. 

B1.1.2.2 SOME MODERN TECHNIQUES

The traditional instruments for measuring emission and absorption spectra described above set the standard 
for the types of information which can be obtained and used by spectroscopists. In the more recent past,
several new techniques have become available which have extended the range of spectroscopic measurements to higher
resolution, lower concentrations of species, weaker transitions, shorter time scales, etc. Many studies in 
electronic spectroscopy as a branch of physical chemistry or chemical physics are now done using these new 
techniques. The purpose of this section is to discuss some of them. 

(A) LASERS 

The foremost of the modern techniques is the use of lasers as spectroscopic tools. Lasers are extremely 
versatile light sources. They can be designed with many useful properties (not all in the same instrument) such 
as high intensity, narrow frequency bandwidth with high-frequency stability, tunability over reasonable 
frequency ranges, low-divergence beams which can be focused into very small spots, or pulsed beams with
very short time durations. There are nearly as many different experimental arrangements as there are
experimenters, and only a few examples will be mentioned here. 

While a laser beam can be used for traditional absorption spectroscopy by measuring $I$ and $I_0$, the strength of
laser spectroscopy lies in more specialized experiments which often do not lend themselves to such 
measurements. Other techniques are commonly used to detect the absorption of light from the laser beam. A 
common one is to observe fluorescence excited by the laser. The total fluorescence produced is normally 
proportional to the amount of light absorbed. It can be used as a measurement of concentration to detect 
species present in extremely small amounts. Or a measurement of the fluorescence intensity as the laser 
frequency is scanned can give an absorption spectrum. This may allow much higher resolution than is easily 
obtained with a traditional absorption spectrometer. In other experiments the fluorescence may be dispersed 
and its spectrum determined with a traditional spectrometer. In suitable cases this could be the emission from 
a single electronic-vibrational-rotational level of a molecule and the experimenter can study how the 
spectrum varies with level. 

Other methods may also be useful for detecting the absorption of laser radiation. For example, the heat 
generated when radiation is absorbed can be detected in several ways. One way observes the defocusing of the 
laser beam when the medium is heated and its refractive index changes. Another way, called photoacoustic 
spectroscopy, detects sound waves or pressure pulses when light is absorbed from a pulsed laser. Still another 
method useful with high-intensity pulsed lasers is to measure light absorption by the excited states produced. 
This is often useful for studying the kinetics of the excited species as they decay or undergo reactions. 

Another example of a technique for detecting absorption of laser radiation in gaseous samples is to use 
multiphoton ionization with intense pulses of light. Once a molecule has been electronically excited, the 
excited state may absorb one or more additional photons until it is ionized. The electrons can be measured as a 
current generated across the cell, or can be counted individually by an electron multiplier; this can be a very 
sensitive technique for detecting a small number of molecules excited. 

(B) EXCITED-STATE LIFETIMES 

Measurements of the decay rates of excited states are important, both for the fundamental spectroscopic 
information they can give, and for studies of other processes such as energy transfer or photochemistry. The 
techniques used vary greatly depending on the time scale of the processes being studied. For rather long time 
scales, say of the order of a millisecond or longer, it is rather simple to excite the molecules optically, cut off 
the exciting light, and watch the decay of emission or some other measurement of excited-state concentration. 


For fluorescent compounds and for times in the range of a tenth of a nanosecond to a hundred microseconds, 
two very successful techniques have been used. One is the phase-shift technique. In this method the 
fluorescence is excited by light whose intensity is modulated sinusoidally at a frequency $f$ chosen so its
period is not too different from the expected lifetime. The fluorescent light is then also modulated at the same
frequency but with a time delay. If the fluorescence decays exponentially, its phase is shifted by an angle $\Delta\phi$
which is related to the mean life, $\tau$, of the excited state. The relationship is

$$\tan \Delta\phi = 2\pi f \tau.$$

The phase shift is measured by comparing the phase of the fluorescence with the phase of light scattered by a 
cloudy but non-fluorescent solution. 
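The phase-shift relation can be inverted to give the lifetime directly. This short Python sketch assumes single-exponential decay, so that tan(delta phi) = 2 pi f tau applies; the function name and the numerical example are illustrative.

```python
import math


def lifetime_from_phase_shift(phase_shift_deg, modulation_hz):
    """Mean life tau (seconds) from tan(delta_phi) = 2*pi*f*tau."""
    return math.tan(math.radians(phase_shift_deg)) / (2.0 * math.pi * modulation_hz)
```

A phase shift of 45 degrees at a modulation frequency of 10 MHz corresponds to a lifetime of about 15.9 ns. The sensitivity is poor when the shift approaches 0 or 90 degrees, which is why the modulation period is chosen to be comparable with the expected lifetime.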

The other common way of measuring nanosecond lifetimes is the time-correlated single-photon counting
technique [2]. In this method the sample is excited by a weak, rapidly repeating pulsed light source, which
could be a flashlamp or a mode-locked laser with its intensity reduced. The fluorescence is monitored by a 
photomultiplier tube set up so that current pulses from individual photons can be counted. It is usually 
arranged so that at most one fluorescence photon is counted for each flash of the excitation source. A time-to- 
amplitude converter and a multichannel analyser (equipment developed for nuclear physics) are used to 
determine, for each photon, the time between the lamp flash and the photon pulse. A decay curve is built up 
by measuring thousands of photons and sorting them by time delay. The statistics of such counting 
experiments are well understood and very accurate lifetimes and their uncertainties can be determined by 
fitting the resulting decay curves. 
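The way a decay curve is built up from individual photon delays can be illustrated with a small simulation. This is a deliberately simplified sketch: it assumes an ideal single-exponential decay, ignores the instrument response and background counts, and estimates the lifetime from the mean delay (the maximum-likelihood estimate for an untruncated exponential) rather than by fitting the histogram as is done in practice. All names and parameters are hypothetical.

```python
import random


def simulate_delays(lifetime_ns, n_photons, seed=0):
    """Draw photon delay times (ns) from an exponential decay law."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / lifetime_ns) for _ in range(n_photons)]


def histogram(delays_ns, channel_width_ns, n_channels):
    """Sort delays into time channels, as a multichannel analyser does."""
    counts = [0] * n_channels
    for t in delays_ns:
        k = int(t / channel_width_ns)
        if k < n_channels:  # photons beyond the time window are discarded
            counts[k] += 1
    return counts


def estimate_lifetime(delays_ns):
    """Mean delay, the maximum-likelihood lifetime for an exponential decay."""
    return sum(delays_ns) / len(delays_ns)
```

With N detected photons the statistical uncertainty of this estimate is roughly the lifetime divided by the square root of N, which is why many thousands of photons are collected.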

One advantage of the photon counting technique over the phase-shift method is that any non-exponential 
decay is readily seen and studied. It is possible to detect non-exponential decay in the phase-shift method too 
by making measurements as a function of the modulation frequency, but it is more cumbersome. 

At still shorter time scales other techniques can be used to determine excited-state lifetimes, but perhaps not 
as precisely. Streak cameras can be used to measure faster changes in light intensity. Probably the most useful 
techniques are pump-probe methods where one intense laser pulse is used to excite a sample and a weaker 
pulse, delayed by a known amount of time, is used to probe changes in absorption or other properties caused 
by the excitation. At short time scales the delay is readily adjusted by varying the path length travelled by the 
beams, letting the speed of light set the delay. 
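The conversion between path length and delay is a one-line calculation, sketched below. It assumes propagation in vacuum (or air, to a good approximation) and takes the extra path as the total additional distance travelled by the probe beam; the function names are illustrative.

```python
SPEED_OF_LIGHT = 299792458.0  # m/s, exact by definition


def delay_from_path_difference(extra_path_m):
    """Time delay (s) introduced by an additional optical path (m)."""
    return extra_path_m / SPEED_OF_LIGHT


def path_difference_for_delay(delay_s):
    """Additional optical path (m) needed for a given time delay (s)."""
    return delay_s * SPEED_OF_LIGHT
```

A delay of one picosecond corresponds to about 0.3 mm of extra path, so sub-picosecond steps require positioning on the micrometre scale.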

(C) PHOTOELECTRON SPECTROSCOPY 

Only brief mention will be made here of photoelectron spectroscopy. This technique makes use of a beam of 
light whose energy is greater than the ionization energy of the species being studied. Transitions then occur in 
which one of the electrons of the molecule is ejected. Rather than an optical measurement, the kinetic energy 
of the ejected electron is determined. Some of the technology is described in section B1.6 on electron energy-loss
spectroscopy. The ionization energy of the molecule is determined from the difference between the
photon energy and the kinetic energy of the ejected electron. 

A useful light source is the helium resonance lamp which produces light of wavelength 58.4 nm or a photon 
energy of 21.2 eV, enough to ionize any neutral molecule. Often several peaks can be observed in the
photoelectron spectrum corresponding to the removal of electrons from different orbitals. The energies of the peaks give
approximations to the orbital energies in the molecule. They are useful for comparison with theoretical 
calculations. 
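The energy bookkeeping of a photoelectron measurement can be sketched as follows. The example assumes the conversion hc = 1239.84 eV nm and a hypothetical measured kinetic energy; the function names are not from the text.

```python
HC_EV_NM = 1239.841984  # product h*c in eV nm


def photon_energy_ev(wavelength_nm):
    """Photon energy (eV) for a given vacuum wavelength (nm)."""
    return HC_EV_NM / wavelength_nm


def ionization_energy_ev(wavelength_nm, kinetic_energy_ev):
    """Ionization energy as photon energy minus electron kinetic energy."""
    return photon_energy_ev(wavelength_nm) - kinetic_energy_ev
```

For the 58.4 nm helium line the photon energy is 21.2 eV, so an electron detected with, say, 5.0 eV of kinetic energy corresponds to an ionization energy of about 16.2 eV.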

An interesting variation on the method is the use of a laser to photodetach an electron from a negative ion 
produced in a beam of ions. Since it is much easier to remove an electron from a negative ion than from a 
neutral molecule, this can be done with a visible or near-ultraviolet laser. The difference between the photon 
energy and the electron energy, in this case, gives the electron affinity of the neutral molecule remaining after 
the photodetachment, and may give useful energy levels of molecules not easily studied by traditional 
spectroscopy [3]. 

(D) OTHER TECHNIQUES 

Some other extremely useful spectroscopic techniques will only be mentioned here. Probably the most 
important one is spectroscopy in free jet expansions. Small molecules have often been studied by gas-phase
spectroscopy where sharp rotational and vibrational structure gives detailed information about molecular
states and geometries. The traditional techniques will often not work for large molecules because they must be
heated to high temperatures to vaporize them and then the spectra become so broad and congested that 
detailed analysis is difficult or impossible. In jet spectroscopy the gaseous molecules are mixed with an inert 
gas and allowed to expand into a vacuum. The nearly adiabatic expansion may cool them rapidly to
temperatures of a few kelvin while leaving them in the gas phase. The drastic simplification of the
spectrum often allows much more information to be extracted from the spectrum. 

Fourier-transform instruments can also be used for visible and ultraviolet spectroscopy. In this technique, 
instead of dispersing the light with a grating or prism, a wide region of the spectrum is detected 
simultaneously by splitting the beam into two components, one reflected from a stationary mirror and one 
from a movable mirror, and then recombining the two beams before they enter the detector. The detected 
intensity is measured as a function of the position of the movable mirror. Because of interference between the 
two beams, the resulting function is the Fourier transform of the normal spectrum as a function of wavelength. 
This offers some advantages in sensitivity and perhaps resolution because the whole spectrum is measured at 
once rather than one wavelength at a time. The technique is not too common in electronic spectroscopy, but is 
very widely used for the infrared. It is described more fully in the chapter on vibrational spectroscopy, section
B1.2. While the light sources and detectors are different for the visible and ultraviolet region, the principles of
operation are the same. 


B1.1.3 THEORY

The theory of absorption or emission of light of modest intensity has traditionally been treated by time-dependent
perturbation theory [4]. Most commonly, the theory treats the effect of the oscillating electric field
of the light wave acting on the electron cloud of the atom or molecule. The instantaneous electric field is
assumed to be uniform over the molecule but it oscillates in magnitude and direction with a frequency $\nu$. The
energy of a system of charges in a uniform electric field, $\mathbf{E}$, depends on its dipole moment according to

$$E = -\boldsymbol{\mu} \cdot \mathbf{E}$$

where the dipole moment is defined by

$$\boldsymbol{\mu} = \sum_i q_i \mathbf{r}_i$$

and the $q_i$ and $\mathbf{r}_i$ are the charges and positions of the particles, i.e. the electrons and nuclei. The result of the
time-dependent perturbation theory is that the transition probability for a transition between one quantum state
$i$ and another state $j$ is proportional to the absolute value squared of the matrix element of the electric dipole
operator between the two states

$$\boldsymbol{\mu}_{ij} = \int \psi_i^* \, \boldsymbol{\mu} \, \psi_j \, d\tau \tag{B1.1.1}$$

$$\text{transition probability} \propto |\boldsymbol{\mu}_{ij}|^2.$$

The transition occurs with significant probability only if the frequency of the light is very close to the familiar
resonance condition, namely $h\nu = \Delta E$, where $h$ is Planck's constant and $\Delta E$ is the difference in energy of the
two states. However, transitions always occur over a range of frequencies because of various broadening
effects; if nothing else, as required by the uncertainty principle, the states will not have precisely defined
energies if they have finite lifetimes.


B1.1.3.1 ABSORPTION SPECTROSCOPY


(A) INTEGRATED ABSORPTION INTENSITY 


The relationship between the theoretical quantity $\mu_{ij}$ and the experimental parameter $\varepsilon$ of absorption
spectroscopy involves, not the value of $\varepsilon$ at any one wavelength, but its integral over the absorption band. The
relationship is

$$\int \varepsilon \, d\tilde{\nu} = \frac{8\pi^3 N_A e^2}{2303 \cdot 3hc}\, \tilde{\nu}\, |\mu_{ij}|^2 = (2.512 \times 10^{19}\ \mathrm{L\ mol^{-1}\ cm^{-3}})\, \tilde{\nu}\, |\mu_{ij}|^2. \tag{B1.1.2}$$


We will quote a numerical constant in some of these equations to help with actual calculations. The units can
be very confusing because it is conventional to use non-SI units for several quantities. The wavenumber
value, $\tilde{\nu}$, is usually taken to be in cm$^{-1}$. The extinction coefficient is conveniently taken in units of L mol$^{-1}$
cm$^{-1}$. We have inserted the factor $e$ into the equation because values of $\mu_{ij}$ are usually calculated with the
charges measured in units of the electron charge. For the sake of consistency, we have quoted the numerical
factor appropriate for $\mu/e$ taken in centimetres, but the values are easily converted to use other length units
such as Angstroms or atomic units. The value of $\tilde{\nu}$ on the right-hand side of the equation is to be interpreted as
a suitable average frequency for the transition. This causes no difficulty unless the band is very broad. Some
of the difficulties with different definitions of intensity terms have been discussed by Hilborn [5].
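As an illustration of how the numerical constant is used, the following Python sketch inverts the relationship to obtain the transition moment length from a measured integrated intensity. It assumes the factor 2.512 x 10^19 L mol^-1 cm^-3 quoted in equation (B1.1.2), with the integrated extinction coefficient in L mol^-1 cm^-2, the mean wavenumber in cm^-1 and the moment expressed as a length (mu/e) in centimetres. The names and example values are hypothetical.

```python
import math

INTENSITY_FACTOR = 2.512e19  # L mol^-1 cm^-3, numerical constant of (B1.1.2)


def transition_moment_length_cm(integrated_epsilon, mean_wavenumber):
    """|mu_ij / e| in cm from the integrated absorption intensity."""
    return math.sqrt(integrated_epsilon / (INTENSITY_FACTOR * mean_wavenumber))


def transition_moment_length_angstrom(integrated_epsilon, mean_wavenumber):
    """The same quantity converted to angstroms (1 cm = 1e8 angstrom)."""
    return 1.0e8 * transition_moment_length_cm(integrated_epsilon, mean_wavenumber)
```

A band centred at 20 000 cm^-1 with an integrated intensity of about 5 x 10^7 L mol^-1 cm^-2 corresponds to a transition moment length close to 1 angstrom, a typical value for a strongly allowed transition.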


(B) OSCILLATOR STRENGTH 

A related measure of the intensity often used for electronic spectroscopy is the oscillator strength, $f$. This is a
dimensionless ratio of the transition intensity to that expected for an electron bound by Hooke's law forces so 
as to be an isotropic harmonic oscillator. It can be related either to the experimental integrated intensity or to 
the theoretical transition moment integral: 

(B1.1.3) 

or 

(B1.1.4) 

The harmonically bound electron is, in a sense, an ideal absorber since its harmonic motion can maintain a 
perfect phase relationship with the oscillating electric field of the light wave. Strong electronic transitions 
have oscillator strengths of the order of unity, but this is not, as sometimes stated, an upper limit to $f$. For
example, some polyacetylenes have bands with oscillator strengths as high as 5 [6]. There is a theorem, the 
Kuhn-Thomas sum rule, stating that the sum of the oscillator strengths of all electronic transitions must be 
equal to the number of electrons in an atom or molecule [7]. 

In the above discussion we have used the electric dipole operator μ. It is also sometimes possible to observe
electronic transitions occurring due to interaction with the magnetic field of the light wave. These are called 
magnetic dipole transitions. They are expected to be weaker than electric dipole transitions by several orders 
of magnitude. If account is taken of a variation of the field of the light wave over the size of the molecule it is 
possible to treat quadrupole or even higher multipole transitions. These are expected to be even weaker than 
typical magnetic dipole transitions. We will concentrate on the more commonly observed electric dipole
transitions.

Equation (B1.1.1) for the transition moment integral is rather simply interpreted in the case of an atom. The
wavefunctions are simply functions of the electron positions relative to the nucleus, and the integration is over 
the electronic coordinates. The situation for molecules is more complicated and deserves discussion in some 
detail. 

(C) TRANSITION MOMENTS FOR MOLECULES 

Electronic spectra are almost always treated within the framework of the Born-Oppenheimer approximation 
[8] which states that the total wavefunction of a molecule can be expressed as a product of electronic, 
vibrational, and rotational wavefunctions (plus, of course, the translation of the centre of mass which can 
always be treated separately from the internal coordinates). The physical reason for the separation is that the 
nuclei are much heavier than the electrons and move much more slowly, so the electron cloud normally 
follows the instantaneous position of the nuclei quite well. The integral of equation (B1.1.1) is over all
internal coordinates, both electronic and nuclear. Integration over the rotational wavefunctions gives 
rotational selection rules which determine the fine structure and band shapes of electronic transitions in 
gaseous molecules. Rotational selection rules will be discussed below. For molecules in condensed phases the 
rotational motion is suppressed and replaced by oscillatory and diffusional motions.

In this section we concentrate on the electronic and vibrational parts of the wavefunctions. It is convenient to
treat the nuclear configuration in terms of normal coordinates describing the displacements from the
equilibrium position. We call these nuclear normal coordinates Q_i and use the symbol Q without a subscript to
designate the whole set. Similarly, the symbol x_i designates the coordinates of the ith electron and x the whole
set of electronic coordinates. We also use subscripts l and u to designate the lower and upper electronic states
of a transition, and subscripts a and b to number the vibrational states in the respective electronic states. The
total wavefunction Ψ can be written

$$ \Psi_{la}(x,Q) = \psi_l(x,Q)\, \phi_{la}(Q) \quad \text{and} \quad \Psi_{ub}(x,Q) = \psi_u(x,Q)\, \phi_{ub}(Q). $$

Here each φ(Q) is a vibrational wavefunction, a function of the nuclear coordinates Q, in first approximation
usually a product of harmonic oscillator wavefunctions for the various normal coordinates. Each ψ(x,Q) is the
electronic wavefunction describing how the electrons are distributed in the molecule. However, it has the 
nuclear coordinates within it as parameters because the electrons are always distributed around the nuclei and 
follow those nuclei whatever their position during a vibration. The integration of equation (B1.1.1) can be
carried out in two steps — first an integration over the electronic coordinates x, and then integration over the 
nuclear coordinates Q. We define an electronic transition moment integral which is a function of nuclear 
position: 

$$ \mu_{lu}(Q) = \int \psi_l^{*}(x,Q)\, \mu\, \psi_u(x,Q)\, dx. $$    (B1.1.5)

We then integrate this over the vibrational wavefunctions and coordinates: 

$$ \mu_{la,ub} = \int \Psi_{la}^{*}\, \mu\, \Psi_{ub}\, d\tau = \int \phi_{la}^{*}(Q)\, \mu_{lu}(Q)\, \phi_{ub}(Q)\, dQ. $$    (B1.1.6)


This last transition moment integral, if inserted into equation (B1.1.2), will give the integrated intensity of a
vibronic band, i.e. of a transition starting from vibrational state a of electronic state l and ending on
vibrational level b of electronic state u.

(D) THE FRANCK-CONDON PRINCIPLE

The electronic transition moment of equation (B1.1.5) is related to the intensity that the transition would have
if the nuclei were fixed in configuration Q, but its value may vary with that configuration. It is often useful to
expand μ_lu(Q) as a power series in the normal coordinates, Q_i:

$$ \mu_{lu}(Q) = \mu_{lu}(0) + \sum_i \left( \frac{\partial \mu_{lu}}{\partial Q_i} \right)_0 Q_i + \cdots $$    (B1.1.7)

Here μ_lu(0) is the value at the equilibrium position of the initial electronic state.

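In many cases the variation is not very strong for reasonable displacements from equilibrium, and it is
sufficient to use only the zero-order term in the expansion. If this is inserted into equation (B1.1.6) we get

$$ \mu_{la,ub} = \mu_{lu}(0) \int \phi_{la}^{*}(Q)\, \phi_{ub}(Q)\, dQ $$

and using this in equation (B1.1.2) for the integrated intensity of a vibronic band we get the relationship

$$ \int \varepsilon \, d\tilde{\nu} = \frac{2\pi^2 N_A \tilde{\nu}_{la \to ub}}{3 h c \varepsilon_0 \ln 10} \left| \mu_{lu}(0) \right|^2 \left| \int \phi_{la}^{*}(Q)\, \phi_{ub}(Q)\, dQ \right|^2. $$    (B1.1.8)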


The last factor, the square of the overlap integral between the initial and final vibrational wavefunctions, is 
called the Franck-Condon factor for this transition. 

The Franck-Condon principle says that the intensities of the various vibrational bands of an electronic 
transition are proportional to these Franck-Condon factors. (Of course, the frequency factor must be included 
for accurate treatments.) The idea was first derived qualitatively by Franck through the picture that the 
rearrangement of the light electrons in the electronic transition would occur quickly relative to the period of 
motion of the heavy nuclei, so the position and momentum of the nuclei would not change much during the 
transition [9]. The quantum mechanical picture was given shortly afterwards by Condon, more or less as 
outlined above [10]. 

The effects of the principle are most easily visualized for diatomic molecules for which the vibrational 
potential can be represented by a potential energy curve. A typical absorption starts from the lowest 
vibrational level of the ground state (actually a thermal distribution of low-lying levels). A useful qualitative 
statement of the Franck-Condon principle is that vertical transitions should be favoured. Figure B1.1.1(a)
illustrates the case where the potential curve for the excited state lies nearly directly above that for the ground 
state. Then by far the largest overlap of excited state wavefunctions with the lowest level of the ground state
will be for the v = 0 level, and we expect most intensity to be in the so-called 0-0 band, i.e. from v = 0 in the
lower state to v = 0 in the upper state. A case in point is the transition of the O₂ molecule at about 750 nm in
the near-infrared. (This is actually a magnetic dipole transition rather than electric dipole, so it is very weak,
but the vibrational effects are the same.) Both ground and excited state have a (π*)² electron configuration
and nearly the same equilibrium bond length, only 0.02 Å different. The spectrum shows most of the intensity
in the 0-0 band with less than one tenth as much in the 1-0 band [11].

Figure B1.1.1. (a) Potential curves for two states with little or no difference in the equilibrium position of the
upper and lower states. A transition of O₂, with displacement only 0.02 Å, is shown as an example. Data taken
from [11]. Most of the intensity is in the 0-0 vibrational band with a small intensity in the 1-0 band. (b)
Potential curves for two states with a large difference in the equilibrium position of the two states. A transition
in I₂, with a displacement of 0.36 Å, is shown as an example. Many vibrational peaks are observed.

Figure B1.1.1(b) shows a contrasting case, where the potential curve for the excited state is displaced
considerably relative to the ground-state curve. Then a vertical transition would go to a part of the excited-state
curve well displaced from the bottom, and the maximum overlap and greatest intensity should occur for
high-lying levels. There results a long progression of bands to various vibrational levels. The spectrum of I₂ is
shown as an illustration; here the displacement between the two minima is about 0.36 Å. Many vibronic
transitions are seen. One can observe the excited-state levels getting closer together and converging as the 
dissociation limit is approached, and part of the absorption goes to continuum states above the dissociation 
energy. (The long-wavelength part of the spectrum is complicated by transitions starting from thermally 
excited vibrational levels of the ground state.) 
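
The contrast between the two cases can be illustrated with a deliberately simplified model: two harmonic potential curves of equal frequency whose minima are displaced by Δ. For absorption from v = 0 the Franck-Condon factors then follow a Poisson distribution, exp(−S) Sⁿ/n!, with S = mωΔ²/2ħ, where m is the reduced mass. The equal-frequency harmonic model, the function names and the sample inputs below are assumptions for illustration only; real diatomic curves, such as those of I₂, are anharmonic and differ in frequency between the two states.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
C_CM = 2.99792458e10        # speed of light, cm s^-1
AMU = 1.66053906660e-27     # atomic mass unit, kg


def displacement_parameter(reduced_mass_amu, wavenumber_cm, displacement_angstrom):
    """S = m * omega * delta^2 / (2 hbar) for a displaced harmonic oscillator."""
    omega = 2 * math.pi * C_CM * wavenumber_cm   # angular frequency, rad s^-1
    delta = displacement_angstrom * 1e-10        # displacement, m
    return reduced_mass_amu * AMU * omega * delta**2 / (2 * HBAR)


def franck_condon_factors(s, n_max):
    """Franck-Condon factors for 0 -> n, n = 0..n_max (equal-frequency harmonic model)."""
    return [math.exp(-s) * s**n / math.factorial(n) for n in range(n_max + 1)]


if __name__ == "__main__":
    # Hypothetical small-displacement case: 8 u, 1500 cm^-1, 0.02 angstrom
    s_small = displacement_parameter(8.0, 1500.0, 0.02)
    print("S (small displacement) =", round(s_small, 4))
    print("FC factors:", [round(f, 4) for f in franck_condon_factors(s_small, 3)])
    # Hypothetical large-displacement case with S chosen arbitrarily
    strong = franck_condon_factors(8.5, 20)
    print("most intense band: 0 ->", strong.index(max(strong)))
```

For small S nearly all the intensity sits in the 0-0 band; for large S the maximum moves to a high vibrational level and a long progression appears, as described in the text.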

(E) BEYOND FRANCK-CONDON

There are cases where the variation of the electronic transition moment with nuclear configuration cannot be
neglected. Then it is necessary to work with equation (B1.1.6) keeping the dependence of μ_lu on Q and
integrating it over the vibrational wavefunctions. In most such cases it is adequate to use only the terms up to
first-order in equation (B1.1.7). This results in 'modified Franck-Condon factors' for the vibrational
intensities [12].

(F) TOTAL INTENSITY OF AN ELECTRONIC TRANSITION 


Equation (B1.1.8) gives the intensity of one vibronic band in an absorption spectrum. It is also of interest to
consider the total integrated intensity of a whole electronic transition, i.e. the sum of all the vibronic bands
corresponding to the one electronic change. In the most common absorption spectroscopy experiment we can
assume that all transitions originate in the lowest vibrational level of the ground electronic state, which we can
designate as level l0. The transitions can go to various levels ub of the upper electronic state. The total
integrated intensity is then obtained by summing over the index b which numbers the excited state vibrational
levels.


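$$ \int \varepsilon \, d\tilde{\nu} = \frac{2\pi^2 N_A}{3 h c \varepsilon_0 \ln 10} \sum_b \tilde{\nu}_{l0 \to ub} \left| \int \phi_{l0}^{*}(Q)\, \mu_{lu}(Q)\, \phi_{ub}(Q)\, dQ \right|^2. $$    (B1.1.9)

This equation can be simplified if the frequency term ν̃_{l0→ub} is removed from the summation. One way to do
this is to incorporate it into the integral on the left-hand side by writing ∫ ε d ln ν̃. The alternative is to use an
appropriate average ν̃ outside the sum, choosing the proper average by making the expressions equal. Often it
is enough to pick an average by eye, but if high accuracy is important the value to use is given by

$$ \langle \tilde{\nu} \rangle = \frac{\int \varepsilon \, d\tilde{\nu}}{\int \varepsilon \, d\ln\tilde{\nu}}. $$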
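
In practice the average follows from a digitized spectrum by numerical integration, using ∫ ε d ln ν̃ = ∫ (ε/ν̃) dν̃. The sketch below uses the trapezoidal rule; the function names and the flat test band are illustrative assumptions.

```python
def trapezoid(x, y):
    """Trapezoidal-rule integral of y over x (x need not be evenly spaced)."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))


def average_wavenumber(wavenumbers, epsilons):
    """Ratio of the integral of eps d(nu) to the integral of eps d(ln nu)."""
    band_area = trapezoid(wavenumbers, epsilons)
    log_area = trapezoid(wavenumbers, [e / v for e, v in zip(epsilons, wavenumbers)])
    return band_area / log_area


if __name__ == "__main__":
    # Hypothetical flat band from 20 000 to 30 000 cm^-1, sampled every 5 cm^-1
    nu = [20000.0 + 5.0 * k for k in range(2001)]
    eps = [1000.0] * len(nu)
    print(average_wavenumber(nu, eps))
```

For a narrow band the result is indistinguishable from the band centre; the distinction matters only for broad bands, where the average lies slightly to the low-wavenumber side of the arithmetic centre.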

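With the frequency removed from the sum, (B1.1.9) has just a sum over vibrational integrals. Because all the
vibrational wavefunctions for a given potential surface will form a complete set, it is possible to apply a sum
rule to simplify the resulting expression:

$$ \sum_b \left| \int \phi_{l0}^{*}(Q)\, \mu_{lu}(Q)\, \phi_{ub}(Q)\, dQ \right|^2 = \int \phi_{l0}^{*}(Q) \left| \mu_{lu}(Q) \right|^2 \phi_{l0}(Q)\, dQ $$

i.e. the sum is just the mean value of |μ_lu(Q)|² in the initial vibrational state. Then the total integrated intensity
of the electronic band is given by

$$ \int \varepsilon \, d\tilde{\nu} = \frac{2\pi^2 N_A \tilde{\nu}}{3 h c \varepsilon_0 \ln 10} \int \phi_{l0}^{*}(Q) \left| \mu_{lu}(Q) \right|^2 \phi_{l0}(Q)\, dQ. $$    (B1.1.10)

If we can get by with using only the zero-order term of (B1.1.7), we can take μ_lu out of the integral and use
the fact that φ_l0 is normalized. The last equation then simplifies further to

$$ \int \varepsilon \, d\tilde{\nu} = \frac{2\pi^2 N_A \tilde{\nu}}{3 h c \varepsilon_0 \ln 10} \left| \mu_{lu}(0) \right|^2 = (2.512 \times 10^{19}\ \mathrm{l\ mol^{-1}\ cm^{-3}})\, \tilde{\nu} \left| \frac{\mu_{lu}(0)}{e} \right|^2. $$    (B1.1.11)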


Equation (B1.1.10) and equation (B1.1.11) are the critical ones for comparing observed intensities of
electronic transitions with theoretical calculations using the electronic wavefunctions. The transition moment
integral μ_lu calculated from electronic wavefunctions is related to the absorption intensity integrated over the whole
electronic transition. It is found that simple forms of electronic wavefunctions often do not give very good
intensities, and high-quality wavefunctions are required for close agreement with experiment.

B1.1.3.2 EMISSION SPECTROSCOPY

The interpretation of emission spectra is similar to that of absorption spectra, though it differs in some respects. The
intensity observed in a typical emission spectrum is a complicated function of the excitation conditions which 
determine the number of excited states produced, quenching processes which compete with emission, and the 
efficiency of the detection system. The quantities of theoretical interest which replace the integrated intensity 
of absorption spectroscopy are the rate constant for spontaneous emission and the related excited-state 
lifetime. 

(A) EMISSION RATE CONSTANT 

Einstein derived the relationship between spontaneous emission rate and the absorption intensity or stimulated
emission rate in 1917 using a thermodynamic argument [13]. Both absorption intensity and emission rate
depend on the transition moment integral of equation (B1.1.1), so that gives us a way to relate them. The
symbol A is often used for the rate constant for emission; it is sometimes called the Einstein A coefficient. For
emission in the gas phase from a state i to a lower state j we can write

$$ A_{i \to j} = \frac{16\pi^3 \tilde{\nu}^3 e^2}{3 \varepsilon_0 h} \left| \frac{\mu_{ij}}{e} \right|^2 = (7.235 \times 10^{10}\ \mathrm{cm\ s^{-1}})\, \tilde{\nu}^3 \left| \frac{\mu_{ij}}{e} \right|^2. $$    (B1.1.12)

(B) MOLECULAR EMISSION AND THE FRANCK-CONDON PRINCIPLE

For molecules we can use Born-Oppenheimer wavefunctions and talk about emission from one vibronic level
to another. Equation (B1.1.5), equation (B1.1.6) and equation (B1.1.7) can be used just as they were for
absorption. If we have an emission from vibronic state ub to the lower state la, the rate constant for emission
would be given by

$$ A_{ub \to la} = \frac{16\pi^3 \tilde{\nu}_{ub \to la}^3}{3 \varepsilon_0 h} \left| \int \phi_{ub}^{*}(Q)\, \mu_{ul}(Q)\, \phi_{la}(Q)\, dQ \right|^2. $$    (B1.1.13)

If we can use only the zero-order term in equation (B1.1.7) we can remove the transition moment from the
integral and recover an equation involving a Franck-Condon factor:

$$ A_{ub \to la} = \frac{16\pi^3 \tilde{\nu}_{ub \to la}^3}{3 \varepsilon_0 h} \left| \mu_{ul}(0) \right|^2 \left| \int \phi_{ub}^{*}(Q)\, \phi_{la}(Q)\, dQ \right|^2. $$    (B1.1.14)

Now the spectrum will show various transitions originating with state ub and ending on the various
vibrational levels la of the lower electronic state. Equation (B1.1.14) (or (B1.1.13) if we have to worry about
variation of transition moment) gives us a way of comparing the intensities of the bands. The intensities will be
proportional to the A_{ub→la} provided that we measure the intensity in photons per unit time rather than the more
conventional units of energy per unit time. The first part of the expression in (B1.1.14) is the same for all the transitions.
The part that varies between bands is the Franck-Condon factor multiplied by the cube of the frequency.
Equation (B1.1.8) for absorption intensity also had a frequency factor, but the variation in frequency has more
effect in emission spectroscopy because it appears to a higher power. Equation (B1.1.14) embodies the
Franck-Condon principle for emission spectroscopy.
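
The frequency weighting can be made concrete with a short sketch. For bands sharing a common upper level, the relative photon intensities are taken proportional to ν̃³ times the Franck-Condon factor; if the detector instead measures energy per unit time, one further power of ν̃ enters because each photon carries energy hcν̃. The function name, the `unit` argument and the sample band list are hypothetical.

```python
def emission_band_intensities(bands, unit="photons"):
    """Relative intensities of emission bands from a common upper vibronic level.

    bands : list of (wavenumber_cm, franck_condon_factor) pairs
    unit  : "photons" (weight nu^3) or "energy" (weight nu^4)
    Returns the intensities normalized so that they sum to 1.
    """
    power = {"photons": 3, "energy": 4}[unit]
    weights = [fc * nu**power for nu, fc in bands]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two hypothetical bands with equal Franck-Condon factors
    bands = [(20000.0, 0.5), (10000.0, 0.5)]
    print(emission_band_intensities(bands))            # photon units
    print(emission_band_intensities(bands, "energy"))  # energy units
```

Two bands with equal Franck-Condon factors but wavenumbers in the ratio 2:1 therefore differ in photon intensity by a factor of 8, which is why the frequency factor cannot be neglected when Franck-Condon factors are extracted from an emission spectrum.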

(C) EXCITED-STATE LIFETIME 

We now discuss the lifetime of an excited electronic state of a molecule. To simplify the discussion we will
consider a molecule in a high-pressure gas or in solution where vibrational relaxation occurs rapidly. We will
assume that the molecule is in the lowest vibrational level of the upper electronic state, level u0, and we will
further assume that we need only consider the zero-order term of equation (B1.1.7). A number of radiative
transitions are possible, ending on the various vibrational levels la of the lower state, usually the ground state.
The total rate constant for radiative decay, which we will call A_{u0→l}, is the sum of the rate constants
A_{u0→la}. By summing the terms in equation (B1.1.14) we can get an expression relating the radiative lifetime
to the theoretical transition moment μ_ul. Further, by relating the transition moment to integrated absorption
intensity we can get an expression for radiative rate constant involving only experimental quantities and not
dependent on the quality of the electronic wavefunctions:


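$$ A_{u0 \to l} = (7.235 \times 10^{10}\ \mathrm{cm\ s^{-1}})\, n^2 \langle \tilde{\nu}_f^{-3} \rangle^{-1} \left| \frac{\mu_{ul}(0)}{e} \right|^2 $$    (B1.1.15)

or

$$ A_{u0 \to l} = (2.881 \times 10^{-9}\ \mathrm{s^{-1}\ l^{-1}\ mol\ cm^{4}})\, n^2 \langle \tilde{\nu}_f^{-3} \rangle^{-1} \frac{g_l}{g_u} \int \varepsilon \, d\ln\tilde{\nu}. $$    (B1.1.16)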


These equations contain the peculiar average fluorescence frequency ⟨ν̃_f⁻³⟩⁻¹, the reciprocal of the average
value of ν̃⁻³ in the fluorescence spectrum. It arises because the fluorescence intensity measured in photons per
unit time has a ν̃³ dependence. For completeness we have added a term n², the square of the refractive index,
to be used for molecules in solution, and the term g_l/g_u, the ratio of the degeneracies of the lower and upper
electronic states, to allow for degenerate cases [14]. It is also possible to correct for a variation of transition
moment with nuclear configuration if that should be necessary [15].

A_{u0→l} is the first-order rate constant for radiative decay by the molecule. It is the reciprocal of the intrinsic
mean life of the excited state, τ₀:

$$ 1/\tau_0 = A_{u0 \to l}. $$

If there are no competing processes the experimental lifetime τ should equal τ₀. Most commonly, other
processes such as non-radiative decay to lower electronic states, quenching, photochemical reactions or
energy transfer may compete with fluorescence. They will reduce the actual lifetime. As long as all the
processes are first-order in the concentration of excited molecules, the decay will remain exponential and the
mean life τ will be reduced by a factor of the fluorescence quantum yield, Φ_f, the fraction of the excited
molecules which emit:

$$ \tau = \Phi_f \tau_0. $$    (B1.1.17)
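
A minimal sketch of how these quantities are related in practice follows. It assumes that all competing first-order processes can be lumped into a single rate constant k_nr, so that Φ_f = A/(A + k_nr); the function name and the sample numbers are hypothetical.

```python
def excited_state_kinetics(radiative_rate, quantum_yield):
    """Return (tau0, tau, k_nr) for a given radiative rate constant and quantum yield.

    radiative_rate : A, the radiative rate constant in s^-1
    quantum_yield  : fluorescence quantum yield, 0 < yield <= 1
    tau0 : intrinsic (radiative) mean life, 1/A
    tau  : observed mean life, yield * tau0
    k_nr : lumped rate constant of all competing first-order processes (assumption)
    """
    if not 0.0 < quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie in (0, 1]")
    tau0 = 1.0 / radiative_rate
    tau = quantum_yield * tau0
    k_nr = radiative_rate * (1.0 / quantum_yield - 1.0)
    return tau0, tau, k_nr


if __name__ == "__main__":
    # Hypothetical molecule: A = 1e8 s^-1, quantum yield 0.25
    print(excited_state_kinetics(1.0e8, 0.25))
```

The same relations can be run in reverse: a measured lifetime and quantum yield give the radiative rate constant, which can then be compared with the value predicted from the integrated absorption.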

B1.1.3.3 SELECTION RULES

Transition intensities are determined by the wavefunctions of the initial and final states as described in the last 
sections. In many systems there are some pairs of states for which the transition moment integral vanishes 
while for other pairs it does not vanish. The term 'selection rule' refers to a summary of the conditions for
non-vanishing transition moment integrals, and hence observable transitions, or for vanishing integrals and hence no
observable transitions. We discuss some of these rules briefly in this section. Again, we concentrate on
electric dipole transitions. 

(A) ATOMS 

The simplest case arises when the electronic motion can be considered in terms of just one electron: for
example, in hydrogen or alkali metal atoms. That electron will have various values of orbital angular
momentum described by a quantum number l. It also has a spin angular momentum described by a spin
quantum number s of 1/2 and a total angular momentum which is the vector sum of orbital and spin parts with
quantum number j. In the presence of a magnetic field the component of the angular momentum in the field
direction becomes important and is described by a quantum number m. The selection rules can be summarized
as

Δl = ±1    Δj = 0, ±1    Δm = 0, ±1.

This means that one can see the electron undergo transitions from an s orbital to a p orbital, from a p orbital to
s or d, from a d orbital to p or f, etc, but not s to s, p to p, s to d, or such. In terms of state designations, one
can have transitions from ²S₁/₂ to ²P₁/₂ or to ²P₃/₂, etc.

In more complex atoms there may be a strong coupling between the motion of different electrons. The states 
are usually described in terms of the total orbital angular momentum L and the total spin angular momentum 
S. These are coupled to each other by an interaction called spin-orbital coupling, which is quite weak in light 
atoms but gets rapidly stronger as the nuclear charge increases. The resultant angular momentum is given the 
symbol J. States are named using capital letters S, P, D, F, G, ... to designate L values of 0, 1, 2, 3, 4, .... A
left superscript gives the multiplicity (2S + 1), and a right subscript gives the value of J, for example ¹S₀, ³P₁
or ²D₅/₂.

There is a strict selection rule for J:

ΔJ = 0, ±1 with the restriction that J = 0 to J = 0 is forbidden.

There are approximate selection rules for L and S, namely

ΔL = 0, ±1 and ΔS = 0.

These hold quite well for light atoms but become less dependable with greater nuclear charge. The term 
'intercombination bands' is used for spectra where the spin quantum number S changes: for example,
singlet-triplet transitions. They are very weak in light atoms but quite easily observed in heavy ones.
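
The atomic rules quoted above are simple enough to encode directly. The sketch below checks the one-electron rules on l and j and, for many-electron atoms, reports the strict J rule separately from the approximate L and S rules. The function names are hypothetical, the rule on m is omitted, and only the rules stated in the text are implemented.

```python
from fractions import Fraction


def one_electron_allowed(l1, j1, l2, j2):
    """Electric dipole rules for a single electron: delta l = +-1, delta j = 0, +-1."""
    delta_j = abs(Fraction(j2) - Fraction(j1))
    return abs(l2 - l1) == 1 and delta_j <= 1


def many_electron_rules(L1, S1, J1, L2, S2, J2):
    """Return (strict_ok, approximate_ok) for a transition between two atomic terms.

    strict_ok      : delta J = 0, +-1, with J = 0 to J = 0 excluded
    approximate_ok : delta L = 0, +-1 and delta S = 0 (reliable for light atoms only)
    """
    J1, J2 = Fraction(J1), Fraction(J2)
    strict_ok = abs(J2 - J1) <= 1 and not (J1 == 0 and J2 == 0)
    approximate_ok = abs(L2 - L1) <= 1 and Fraction(S1) == Fraction(S2)
    return strict_ok, approximate_ok


if __name__ == "__main__":
    # s(j = 1/2) to p(j = 3/2): allowed
    print(one_electron_allowed(0, Fraction(1, 2), 1, Fraction(3, 2)))
    # singlet S (J = 0) to triplet P (J = 1): an intercombination transition
    print(many_electron_rules(0, 0, 0, 1, 1, 1))
```

A transition that passes the strict test but fails the approximate one, such as the singlet-triplet example, is the kind that is very weak in light atoms and becomes easily observable in heavy ones.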

(B) ELECTRONIC SELECTION RULES FOR MOLECULES 

Atoms have complete spherical symmetry, and the angular momentum states can be considered as different 
symmetry classes of that spherical symmetry. The nuclear framework of a molecule has a much lower 
symmetry. Symmetry operations for the molecule are transformations such as rotations about an axis, 
reflection in a plane, or inversion through a point at the centre of the molecule, which leave the molecule in an 
equivalent configuration. Every molecule has one such operation, the identity operation, which just leaves the 
molecule alone. Many molecules have one or more additional operations. The set of operations for a molecule 
form a mathematical group, and the methods of group theory provide a way to classify electronic and 
vibrational states according to whatever symmetry does exist. That classification leads to selection rules for 
transitions between those states. A complete discussion of the methods is beyond the scope of this chapter, but 
we will consider a few illustrative examples. Additional details will also be found in section A1.4 on
molecular symmetry. 

In the case of linear molecules there is still complete rotational symmetry about the internuclear axis. This
leads to the conservation and quantization of the component of angular momentum in that direction. The
quantum number for the component of orbital angular momentum along the axis (the analogue of L for an
atom) is called Λ. States which have Λ = 0, 1, 2, ... are called Σ, Π, Δ, ... (analogous to S, P, D, ... of
atoms). Σ states are non-degenerate while Π, Δ, and higher angular momentum states are always doubly
degenerate because the angular momentum can be in either direction about the axis. Σ states need an
additional symmetry designation. They are called Σ⁺ or Σ⁻ according to whether the electronic wavefunction
is symmetric or antisymmetric to the symmetry operation of a reflection in a plane containing the internuclear
axis. If the molecule has a centre of symmetry like N₂ or CO₂, there is an additional symmetry classification,
g or u, depending on whether the wavefunction is symmetric or antisymmetric with respect to inversion
through that centre. Symmetries of states are designated by symbols such as Π_g, Π_u, Σ_g⁺, etc. Finally, the
electronic wavefunctions will have a spin multiplicity (2S + 1), referred to as singlet, doublet, triplet, etc. The
conventional nomenclature for electronic states is as follows. The state is designated by its symmetry with a
left superscript giving its multiplicity. An uppercase letter is placed before the symmetry symbol to indicate
where it stands in order of energy: the ground state is designated X, higher states of the same multiplicity are
designated A, B, C, ... in order of increasing energy, and states of different multiplicity are designated a, b, c,
... in order of increasing energy. (Sometimes, after a classification of states as A, B, C, etc has become well
established, new states will be discovered lying between, say, the B and C states. Then the new states may be
designated B', B'' and so on rather
than renaming all the states.) For example, the C₂ molecule has a singlet ground state designated as X ¹Σ_g⁺, a
triplet state designated a ³Π_u lying only 700 cm⁻¹ above the ground state, another triplet 5700 cm⁻¹ higher in
energy designated b ³Σ_g⁻, a singlet state designated A ¹Π_u lying 8400 cm⁻¹ above the ground state and many
other known states [16]. A transition between the ground state and the A state would be designated as
A ¹Π_u - X ¹Σ_g⁺. The convention is to list the upper state first regardless of whether the transition is being
studied in absorption or emission.

The electronic selection rules for linear molecules are as follows: ΔΛ = 0, ±1 and ΔS = 0. Again, these are really
valid only in the absence of spin-orbital coupling and are modified in heavy molecules. For transitions
between Σ states there is an additional rule that Σ⁺ combines only with Σ⁺ and Σ⁻ combines only with Σ⁻ so
that transitions between Σ⁺ states and Σ⁻ states are forbidden. If the molecule has a centre of symmetry then
there is an additional rule requiring that g↔u, while transitions between two g states or between two u states
are forbidden.
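
These rules can be collected into a small checker. The sketch below encodes a state by Λ, the total spin S, the reflection symmetry (for Σ states only) and the inversion parity (for centrosymmetric molecules only). The class and function names are hypothetical, and spin-orbital coupling is ignored, as in the rules themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinearState:
    lam: int                          # Lambda: 0 for Sigma, 1 for Pi, 2 for Delta, ...
    spin: float                       # total spin S (0 singlet, 0.5 doublet, 1 triplet)
    reflection: Optional[str] = None  # "+" or "-" for Sigma states, otherwise None
    parity: Optional[str] = None      # "g" or "u" if there is a centre of symmetry


def electric_dipole_allowed(a, b):
    """Apply the electronic selection rules for linear molecules to states a and b."""
    if abs(a.lam - b.lam) > 1:                 # delta Lambda = 0, +-1
        return False
    if a.spin != b.spin:                       # delta S = 0
        return False
    if a.lam == 0 and b.lam == 0 and a.reflection != b.reflection:
        return False                           # Sigma+ with Sigma+, Sigma- with Sigma-
    if a.parity is not None and b.parity is not None and a.parity == b.parity:
        return False                           # g <-> u only
    return True


if __name__ == "__main__":
    x_state = LinearState(lam=0, spin=0, reflection="+", parity="g")   # singlet Sigma g +
    a_singlet = LinearState(lam=1, spin=0, parity="u")                 # singlet Pi u
    a_triplet = LinearState(lam=1, spin=1, parity="u")                 # triplet Pi u
    print(electric_dipole_allowed(a_singlet, x_state))
    print(electric_dipole_allowed(a_triplet, x_state))
```

In the usage example the singlet Π_u state combines with a singlet Σ_g⁺ ground state, whereas the triplet Π_u state fails only the spin rule, so it would gain intensity as spin-orbital coupling grows.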

We now turn to electronic selection rules for symmetrical nonlinear molecules. The procedure here is to 
examine the structure of a molecule to determine what symmetry operations exist which will leave the 
molecular framework in an equivalent configuration. Then one looks at the various possible point groups to 
see what group would consist of those particular operations. The character table for that group will then 
permit one to classify electronic states by symmetry and to work out the selection rules. Character tables for 
all relevant groups can be found in many books on spectroscopy or group theory. Here we will only pick one 
very simple point group called C₂v and look at some simple examples to illustrate the method.

The C₂v group consists of four symmetry operations: an identity operation designated E, a rotation by one-half
of a full rotation, i.e. by 180°, called a C₂ operation and two planes of reflection passing through the C₂
axis and called σ_v operations. Examples of molecules belonging to this point group are water, H₂O;
formaldehyde, H₂CO; or pyridine, C₅H₅N. It is conventional to choose a molecule-fixed axis system with the
z axis coinciding with the C₂ axis. If the molecule is planar, it is conventionally chosen to lie in the yz plane
with the x axis perpendicular to the plane of the molecule [17]. For example, in H₂CO the C₂ or z axis lies
along the C-O bond. One of the σ_v planes would be the plane of the molecule, the yz plane. The other
reflection plane is the xz plane, perpendicular to the molecular plane.

Table B1.1.1 gives the character table for the C₂v point group as it is usually used in spectroscopy. Because
each symmetry operation leaves the molecular framework and hence the potential energy unchanged, it should
not change the electron density or nuclear position density: i.e. the square of an electronic or vibrational
wavefunction should remain unchanged. In a group with no degeneracies like this one, that means that a
wavefunction itself should either be unchanged or should change sign under each of the four symmetry
operations. The result of group theory applied to such functions is that there are only four possibilities for how
they change under the operations, and they correspond to the four irreducible representations designated as
A₁, A₂, B₁ and B₂. The characters may be taken to describe what happens in each symmetry. For example, a
function classified as B₁ would be unchanged by the identity operation or by σ_v(xz) but would be changed in
sign by the C₂ or σ_v(yz) operations. Every molecular orbital and every stationary state described by a many-electron
wavefunction can be taken to belong to one of these symmetry classes. The same applies to
vibrations and vibrational states.

Table B1.1.1 Character table for the C₂v point group.

C₂v     E     C₂    σ_v(xz)   σ_v(yz)
A₁      1      1       1         1       z          x², y², z²
A₂      1      1      -1        -1       R_z        xy
B₁      1     -1       1        -1       x, R_y     xz
B₂      1     -1      -1         1       y, R_x     yz


The last two columns of the character table give the transformation properties of translations along the x, y,
and z directions, rotations about the three axes represented by R_x, etc, and products of two coordinates or two
translations represented by x², xy, etc. The information in these columns is very useful for working out
selection rules.


Whenever a function can be written as a product of two or more functions, each of which belongs to one of
the symmetry classes, the symmetry of the product function is the direct product of the symmetries of its
constituents. This direct product is obtained in non-degenerate cases by taking the product of the characters
for each symmetry operation. For example, the function xy will have a symmetry given by the direct product
of the symmetries of x and of y. It may be seen that, for each operation, the product of the characters for the
B₁ and B₂ irreducible representations gives the character of the A₂ representation, so xy transforms as A₂.
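
The character multiplication just described is easily mechanized. The sketch below stores the four rows of the C₂v character table in the operation order E, C₂, σ_v(xz), σ_v(yz) and forms direct products by multiplying characters operation by operation; the dictionary and function names are illustrative assumptions.

```python
# Characters in the order E, C2, sigma_v(xz), sigma_v(yz)
C2V_CHARACTERS = {
    "A1": (1, 1, 1, 1),
    "A2": (1, 1, -1, -1),
    "B1": (1, -1, 1, -1),
    "B2": (1, -1, -1, 1),
}


def direct_product(*labels):
    """Direct product of any number of C2v irreducible representations."""
    characters = (1, 1, 1, 1)
    for label in labels:
        characters = tuple(c * d for c, d in zip(characters, C2V_CHARACTERS[label]))
    # In a group with only non-degenerate representations the product
    # is always exactly one of the irreducible representations.
    for name, row in C2V_CHARACTERS.items():
        if row == characters:
            return name
    raise ValueError("product is not an irreducible representation")


if __name__ == "__main__":
    # x transforms as B1 and y as B2, so xy transforms as their product
    print(direct_product("B1", "B2"))
```

The shortcut of looking up the product row works only because every C₂v representation is one-dimensional; groups with degenerate representations require reduction of the product characters instead.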

The applications to selection rules work as follows. Intensities depend on the values of the transition moment
integral of equation (B1.1.1):

$$ \mu_{ij} = \int \Psi_i^{*}\, \mu\, \Psi_j\, d\tau. $$


An integral like this must vanish by symmetry if the integrand is antisymmetric under any symmetry
operation, i.e. it vanishes unless the integrand is totally symmetric. For C₂v molecules that means the
integrand must have symmetry A₁. The symmetry of the integrand is the direct product of the symmetries of
the three components in the integral. The transition moment operator is a vector with three components, μ_x, μ_y
and μ_z, which transform like x, y and z, respectively. To see if a transition between state i and state j is
allowed, one determines the symmetries of the three products containing the three components of μ, i.e.
Ψ_i* μ_x Ψ_j, Ψ_i* μ_y Ψ_j and Ψ_i* μ_z Ψ_j. If any one of them is totally symmetrical, the transition is formally
allowed. If none of the three is totally symmetrical the transition is forbidden. It should be noted that being
allowed does not mean that a transition will be strong. The actual intensity depends on the matrix element μ_ij,
whose value will depend on the details of the wavefunctions.

There is a further item of information in this procedure. If one of the three component integrals is non-zero,
the molecule will absorb light polarized along the corresponding axis. For example, if μ_y,ij = ∫ Ψ_i* μ_y Ψ_j dτ is
non-zero, the transition will absorb light polarized along the y axis. One may be able to observe this
polarization directly in the spectrum of a crystal containing the molecules in known orientations.
Alternatively, in the gas phase one may be able to tell the direction of polarization in the molecular framework
by looking at the intensity distribution among the rotational lines in a high-resolution spectrum.
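
The whole procedure for C₂v, including the polarization information, can be sketched in a few lines. For each dipole component the code multiplies the characters of the two state symmetries and of the component (x as B₁, y as B₂, z as A₁, from Table B1.1.1) and keeps the axes for which the product is totally symmetric. The names used are hypothetical.

```python
# Characters in the order E, C2, sigma_v(xz), sigma_v(yz)
CHARACTERS = {
    "A1": (1, 1, 1, 1),
    "A2": (1, 1, -1, -1),
    "B1": (1, -1, 1, -1),
    "B2": (1, -1, -1, 1),
}

# Symmetry species of the three dipole components in C2v
DIPOLE_COMPONENTS = {"x": "B1", "y": "B2", "z": "A1"}


def allowed_polarizations(state_i, state_j):
    """Axes along which an electric dipole transition between two C2v states is allowed.

    An empty list means the transition is forbidden by symmetry.
    """
    axes = []
    for axis, species in DIPOLE_COMPONENTS.items():
        product = tuple(
            a * b * c
            for a, b, c in zip(CHARACTERS[state_i], CHARACTERS[species], CHARACTERS[state_j])
        )
        if product == CHARACTERS["A1"]:   # integrand totally symmetric
            axes.append(axis)
    return axes


if __name__ == "__main__":
    for pair in [("A1", "A1"), ("A1", "B1"), ("A1", "B2"), ("A1", "A2")]:
        print(pair, allowed_polarizations(*pair))
```

From a totally symmetric ground state, transitions to A₁, B₁ and B₂ states come out as z-, x- and y-polarized respectively, while a transition to an A₂ state returns an empty list and is therefore forbidden for electric dipole radiation.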

Analogous considerations can be used for magnetic dipole and electric quadrupole selection rules. The
magnetic dipole operator is a vector with three components that transform like R_x, R_y and R_z. The electric
quadrupole operator is a tensor with components that transform like x², y², z², xy, yz and xz. These latter
symmetries are also used to get selection rules for Raman spectroscopy. Character tables for spectroscopic use
usually show these symmetries to facilitate such calculations.

When spectroscopists speak of electronic selection rules, they generally mean consideration of the integral 
over only the electronic coordinates for wavefunctions calculated at the equilibrium nuclear configuration of 
the initial state, Q = 0, 

$$ \mu_{lu}(0) = \int \psi_l^{*}(x,0)\, \mu\, \psi_u(x,0)\, dx. $$    (B1.1.18)


If one of the components of this electronic transition moment is non-zero, the electronic transition is said to be 
allowed; if all components are zero it is said to be forbidden. In the case of diatomic molecules, if the 
transition is forbidden it is usually not observed unless as a very weak band occurring by magnetic dipole or 
electric quadrupole interactions. In polyatomic molecules forbidden electronic transitions are still often 
observed, but they are usually weak in comparison with allowed transitions. 

The reason they appear is that symmetric polyatomic molecules always have some non-totally-symmetric 
vibrations. When the nuclear framework is displaced from the equilibrium position along such a vibrational 
coordinate, its symmetry is reduced. It can then be thought of as belonging, for the instant, to a different point 
group of lower symmetry, and it is likely in that group that the transition will be formally allowed. Even 
though $\mu_{lu}(0)$ from equation (B1.1.18) is zero, $\mu_{lu}(Q)$ from equation (B1.1.5) will be non-zero for some
configurations Q involving distortion along antisymmetric normal coordinates. The total integrated intensity
of an electronic band is given by equation (B1.1.10). It involves the square of the electronic transition moment averaged
over the initial vibrational state, including the configurations in which the transition moment is not zero. In
suitable cases it may be possible to calculate the first-order terms of equation (B1.1.7) from electronic
wavefunctions and use them in equation (B1.1.5) to calculate an integrated absorption intensity to compare
with the observed integrated intensity or oscillator strength [18].

The spin selection rule for polyatomic molecules is again $\Delta S = 0$, no change in spin in the absence of spin-
orbital coupling. This rule becomes less valid when heavy atoms are included in the molecule. Spin-changing
transitions can be observed by suitable techniques even in hydrocarbons, but they are quite weak. When spin-
orbital coupling is important it is possible to use the symmetries of the spin wavefunctions, assign symmetries
to total orbital-plus-spin wavefunctions and use group theory as above to get the selection rules.

Most stable polyatomic molecules whose absorption intensities are easily studied have filled-shell, totally
symmetric, singlet ground states. For absorption spectra starting from the ground state the electronic selection 
rules become simple: transitions are allowed to excited singlet states having symmetries the same as one of 
the coordinate axes, x, y or z. Other transitions should be relatively weak. 

(C) VIBRONIC SELECTION RULES 

Often it is possible to resolve vibrational structure of electronic transitions. In this section we will briefly 
review the symmetry selection rules and other factors controlling the intensity of individual vibronic bands. 

In the Born-Oppenheimer approximation the vibronic wavefunction is a product of an electronic 
wavefunction and a vibrational wavefunction, and its symmetry is the direct product of the symmetries of the 
two components. We have just discussed the symmetries of the electronic states. We now consider the 
symmetry of a vibrational state. In the harmonic approximation vibrations are described as independent 
motions along normal modes $Q_i$ and the total vibrational wavefunction is a product of functions, one
wavefunction for each normal mode:

$$\phi(Q) = \phi_{v_1}(Q_1)\,\phi_{v_2}(Q_2)\,\phi_{v_3}(Q_3)\cdots \tag{B1.1.19}$$

Each such normal mode can be assigned a symmetry in the point group of the molecule. The wavefunctions
for non-degenerate modes have the following simple symmetry properties: the wavefunctions with an odd
vibrational quantum number $v_i$ have the same symmetry as their normal mode $Q_i$; the ones with an even $v_i$ are
totally symmetric. The symmetry of the total vibrational wavefunction $\phi(Q)$ is then the direct product of the
symmetries of its constituent normal coordinate functions $\phi_{v_i}(Q_i)$. In particular, the lowest vibrational state,
with all $v_i = 0$, will be totally symmetric. The states with one quantum of excitation in one vibration and zero
in all others will have the symmetry of that one vibration. Once the symmetry of the vibrational wavefunction 
has been established, the symmetry of the vibronic state is readily obtained from the direct product of the 
symmetries of the electronic state and the vibrational state. This procedure gives the correct vibronic 
symmetry even if the harmonic approximation or the Born-Oppenheimer approximation is not quite valid.

The selection rule for vibronic states is then straightforward. It is obtained by exactly the same procedure as 
described above for the electronic selection rules. In particular, the lowest vibrational level of the ground 
electronic state of most stable polyatomic molecules will be totally symmetric. Transitions originating in that 
vibronic level must go to an excited state vibronic level whose symmetry is the same as one of the 
coordinates, x, y, or z. 

One of the consequences of this selection rule concerns forbidden electronic transitions. They cannot occur 
unless accompanied by a change in vibrational quantum number for some antisymmetric vibration. Forbidden 
electronic transitions are not observed in diatomic molecules (unless by magnetic dipole or other interactions) 
because their only vibration is totally symmetric; they have no antisymmetric vibrations to make the 
transitions allowed. 

The symmetry selection rules discussed above tell us whether a particular vibronic transition is allowed or 
forbidden, but they give no information about the intensity of allowed bands. That is determined by equation 
(B1.1.9) for absorption or (B1.1.13) for emission. If only the zero-order term in equation (B1.1.7) is needed,
the intensity is usually governed by the Franck-Condon principle, so we take note of some general principles for
Franck-Condon factors (FCFs).

Usually the normal coordinates of the upper and lower states are quite similar. (When they are not it is called
a Duschinsky effect [19] and the treatment becomes more complicated.) Because of the product form of the
vibrational wavefunctions of equation (B1.1.19) the FCF is itself a product of FCFs for individual normal
modes. If there is little or no change in the geometry of a given normal mode, the FCF for that mode will be 
large only if its vibrational quantum number does not change. But for modes for which there is a significant 
change in geometry, the FCFs may be large for a number of vibrational levels in the final state. The spectrum 
then shows a series of vibronic peaks differing in energy by the frequency of that vibration. Such a series is 
referred to as a progression in that mode. 
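
How a geometry change turns into a progression can be illustrated with a minimal sketch. It rests on assumptions that are not part of the text: a single harmonic mode with the same frequency in both electronic states, absorption from the lowest vibrational level, and a dimensionless displacement parameter S (often called the Huang-Rhys factor). Under those assumptions the FCFs follow a Poisson distribution. The function name and the S values are illustrative only.

```python
import math

def fcf_from_ground(S, n_max):
    """FCFs for 0 -> n in one displaced harmonic mode (equal frequencies assumed).

    S is the dimensionless displacement parameter; S = 0 means no geometry
    change along this mode.
    """
    return [math.exp(-S) * S**n / math.factorial(n) for n in range(n_max + 1)]

# No geometry change: only the band with unchanged quantum number has intensity.
unchanged = fcf_from_ground(0.0, 5)

# Significant geometry change: intensity is spread over several upper-state
# levels, which appears in the spectrum as a progression in this mode.
displaced = fcf_from_ground(2.5, 10)
strongest = max(range(len(displaced)), key=lambda n: displaced[n])
```

For a molecule with several modes, the product form of the vibrational wavefunction means the overall FCF would be the product of one such factor per mode.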

Most commonly, the symmetry point group of the lower and upper states will be the same. Then only totally 
symmetric vibrations can change equilibrium positions — a change in a non-totally-symmetric mode would 
mean that the states have configurations belonging to different point groups. So one may expect to see 
progressions in one or more of the totally symmetric vibrations but not in antisymmetric vibrations. In 
symmetry-forbidden electronic transitions, however, one will see changes of one quantum (or possibly other 
odd numbers of quanta) in antisymmetric vibrations as required to let the transition appear. 

An example of a single-absorption spectrum illustrating many of the effects discussed in this section is the
spectrum of formaldehyde, H$_2$CO, shown in figure B1.1.2 [20]. This shows the region of the lowest singlet-
singlet transition, the $\tilde{A}\,^1A_2 - \tilde{X}\,^1A_1$ transition. This is called an $n \rightarrow \pi^*$ transition; the electronic change is the
promotion of an electron from a non-bonding orbital (symmetry $B_2$) mostly localized on the oxygen atom into
an antibonding $\pi^*$ orbital (symmetry $B_1$) on the C-O bond. By the electronic selection rules, a transition from
the totally symmetric ground state to an $A_2$ state is symmetry forbidden, so the transition is quite weak with an
oscillator strength of $2.4 \times 10^{-4}$. The transition appears with easily measured intensity due to coupling
with antisymmetric vibrations. Most of the intensity is induced by distortion along the out-of-plane
coordinate, $Q_4$. This means that in equation (B1.1.7) the most significant derivative of $\mu_{lu}$ is the one with
respect to $Q_4$. The first peak seen in figure B1.1.2, at 353 nm, has one quantum of vibration $\nu_4$ excited in the
upper state. The band is designated as $4_0^1$, which means that vibration number 4 has 1 quantum of excitation in the
upper state and 0 quanta in the lower state. The symmetry of $Q_4$ is $B_1$, and combined with the $A_2$ symmetry of
the electronic state it gives an upper state of vibronic symmetry $B_2$, the direct product $A_2 \times B_1$. It absorbs
light with its electric vector polarized in the y direction, i.e. in plane and perpendicular to the C-O bond.


Figure B1.1.2. Spectrum of formaldehyde with vibrational resolution. Several vibronic origins are marked.
One progression in $\nu_2$ starting from the origin is indicated on the line along the top. A similar progression is
built on each vibronic origin. Reprinted with permission from [20], Copyright 1982, American Chemical
Society.

If the 0-0 band were observable in this spectrum it would be called the origin of the transition. The $4_0^1$ band is
referred to as a vibronic origin. Starting from it there is a progression in $\nu_2$, the C-O stretching mode. The C-O bond
gets significantly longer in the upper state because the presence of the antibonding $\pi^*$ electron weakens the
bond. Several of the peaks in the progression are marked along the line at the top of the figure.

Several other vibronic origins are also marked in this spectrum. The second major peak is the $4_0^3$ band, with
three quanta of $\nu_4$ in the upper state. This upper state has the same symmetry as the state with one quantum.
Normally, one would not expect much intensity in this peak, but it is quite strong because the excited state
actually has a non-planar equilibrium geometry, i.e. it is distorted along $Q_4$. Every vibronic origin including
this one has a progression in $\nu_2$ built on it. The intensity distribution in a progression is determined by the
Franck-Condon principle and, as far as can be determined, all progressions in this spectrum are the same.

At 321 nm there is a vibronic origin marked $5_0^1$. This has one quantum of $\nu_5$, the antisymmetric C-H
stretching mode, in the upper state. Its intensity is induced by a distortion along $Q_5$. This state has $B_2$
vibrational symmetry. The direct product of $B_2$ and $A_2$ is $B_1$, so it has $B_1$ vibronic symmetry and absorbs x-
polarized light. One can also see a $6_0^1$ vibronic origin which has the same symmetry and intensity induced by
distortion along $Q_6$.

A very weak peak at 348 nm is the $4_0^2$ origin. Since the upper state here has two quanta of $\nu_4$, its vibrational
symmetry is $A_1$ and the vibronic symmetry is $A_2$, so it is forbidden by electric dipole selection rules. It is
actually observed here due to a magnetic dipole transition [21]. By magnetic dipole selection rules the $A_2$-
$A_1$ electronic transition is allowed for light with its magnetic field polarized in the z direction. It is seen here
as having about 1% of the intensity of the symmetry-forbidden electric dipole transition made allowed by
vibronic coupling, or an oscillator strength around $10^{-6}$. This illustrates the weakness of magnetic dipole
transitions.
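
The symmetry bookkeeping used in the formaldehyde example can be checked mechanically. The sketch below assumes the standard $C_{2v}$ character table and the axis convention implied by the text (z along the C-O bond, x out of plane, y in plane); the dictionary and function names are hypothetical. It multiplies characters to form direct products and lists the axes along which a transition from a totally symmetric lower level is allowed.

```python
# C2v characters under the operations (E, C2, sigma_v(xz), sigma_v(yz)).
C2V = {
    "A1": (1, 1, 1, 1),
    "A2": (1, 1, -1, -1),
    "B1": (1, -1, 1, -1),
    "B2": (1, -1, -1, 1),
}
TRANSLATIONS = {"x": "B1", "y": "B2", "z": "A1"}  # electric dipole components
ROTATIONS = {"x": "B2", "y": "B1", "z": "A2"}     # magnetic dipole components

def direct_product(*irreps):
    """Multiply characters element by element and look up the resulting irrep."""
    chars = (1, 1, 1, 1)
    for name in irreps:
        chars = tuple(a * b for a, b in zip(chars, C2V[name]))
    return next(name for name, c in C2V.items() if c == chars)

def vibrational_symmetry(mode_irrep, v):
    """Odd v has the symmetry of the mode; even v is totally symmetric."""
    return mode_irrep if v % 2 else "A1"

def vibronic_symmetry(electronic, mode_irrep, v):
    return direct_product(electronic, vibrational_symmetry(mode_irrep, v))

def allowed_axes(lower, upper, operator=TRANSLATIONS):
    """Axes for which lower x operator x upper contains A1."""
    return [axis for axis, irrep in operator.items()
            if direct_product(lower, irrep, upper) == "A1"]

band_4_1 = allowed_axes("A1", vibronic_symmetry("A2", "B1", 1))  # one quantum of nu_4
band_5_1 = allowed_axes("A1", vibronic_symmetry("A2", "B2", 1))  # one quantum of nu_5
band_4_2 = allowed_axes("A1", vibronic_symmetry("A2", "B1", 2))  # two quanta of nu_4
band_4_2_magnetic = allowed_axes("A1", vibronic_symmetry("A2", "B1", 2), ROTATIONS)
```

Because all irreducible representations of $C_{2v}$ are one-dimensional, each direct product is a single irrep and a simple table lookup suffices; groups with degenerate representations would need the usual reduction formula instead.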

(D) ROTATIONAL SELECTION RULES 

If the experimental technique has sufficient resolution, and if the molecule is fairly light, the vibronic bands 
discussed above will be found to have a fine structure due to transitions among rotational levels in the two 
states. Even when the individual rotational lines cannot be resolved, the overall shape of the vibronic band 
will be related to the rotational structure and its analysis may help in identifying the vibronic symmetry. The 
analysis of the band appearance depends on calculation of the rotational energy levels and on the selection 
rules and relative intensity of different rotational transitions. These both come from the form of the rotational 
wavefunctions and are treated by angular momentum theory. It is not possible to do more than mention a 
simple example here. 


The simplest case is a $^1\Sigma$-$^1\Sigma$ transition in a linear molecule. In this case there is no orbital or spin angular
momentum. The total angular momentum, represented by the quantum number J, is entirely rotational angular
momentum. The rotational energy levels of each state approximately fit a simple formula:

$$E_J = BJ(J+1) - DJ^2(J+1)^2.$$

The second term is used to allow for centrifugal stretching and is usually small but is needed for accurate
work. The quantity B is called the rotation constant for the state. In a rigid rotator picture it would have the
value

$$B = \frac{h}{8\pi^2 c I}$$

and is usually quoted in reciprocal centimetres. I is the moment of inertia. In an actual molecule which is
vibrating, the formula for B must be averaged over the vibrational state, i.e. one must use an average value of
$1/I$. As a result B varies somewhat with vibrational level. The values of B and the moments of inertia obtained
from the spectra are used to get structural information about the molecule. The bonding and hence the
structure will be different in the two states, so the B values will generally differ significantly. They are called
$B'$ and $B''$. The convention is to designate quantities for the upper state with a single prime and quantities for
the lower state with a double prime.
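
As a rough numerical illustration of the rigid-rotator expression, the following sketch evaluates B in reciprocal centimetres for a diatomic molecule. The input values (carbon monoxide with a bond length of 1.128 angstrom) are an assumed example and do not come from the text; the function names are hypothetical.

```python
import math

H = 6.62607015e-34    # Planck constant, J s
C_CM = 2.99792458e10  # speed of light, cm/s (gives B directly in cm^-1)
AMU = 1.66053907e-27  # atomic mass unit, kg

def rotational_constant_cm(reduced_mass_amu, bond_length_angstrom):
    """Rigid-rotor B = h / (8 pi^2 c I) in cm^-1 for a diatomic molecule."""
    inertia = reduced_mass_amu * AMU * (bond_length_angstrom * 1e-10) ** 2  # kg m^2
    return H / (8 * math.pi**2 * C_CM * inertia)

def rotational_energy_cm(B, J, D=0.0):
    """E_J = B J(J+1) - D J^2 (J+1)^2, all in cm^-1."""
    return B * J * (J + 1) - D * J**2 * (J + 1) ** 2

# Assumed example: 12C16O, reduced mass from the two atomic masses.
mu_co = 12.0 * 15.9949 / (12.0 + 15.9949)
B_co = rotational_constant_cm(mu_co, 1.128)
```

Running the calculation in the other direction, from a measured B to a moment of inertia and hence a bond length, is how the structural information mentioned above is obtained.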

The rotational selection rule for a $^1\Sigma$-$^1\Sigma$ transition is $\Delta J = \pm 1$. Lines which have $J' = J'' - 1$ are called P lines
and the set of them is called the P branch of the band. Lines for which $J' = J'' + 1$ are called R lines and the
set of them the R branch. (Although not seen in a $^1\Sigma$-$^1\Sigma$ transition, a branch with $J' = J''$ would be called a Q
branch, one with $J' = J'' - 2$ would be an O branch, or one with $J' = J'' + 2$ would be an S branch, etc.)
Individual lines may be labeled by giving $J''$ in parentheses like R(1), P(2), etc. For lines with low values of J,
the R lines get higher in energy as J increases while the P lines get lower in energy with increasing J. If $B''$
and $B'$ are sufficiently different, which is the usual case in electronic spectra, the lines in one of the two
branches will get closer together as J increases until they pile up on top of each other and then turn around
and start to move in the other direction as J continues to increase. The point at which the lines pile up is called
a band head. It is often the most prominent feature of the band. If $B'' > B'$, this will happen in the R branch
and the band head will mark the high-energy or short-wavelength limit of each vibronic band. Such a band is
said to be shaded to the red because absorption or emission intensity has a sharp onset on the high-energy side
and then falls off gradually on the low-energy or red side. This is the most common situation where the lower
state is bound more tightly and has a smaller moment of inertia than the upper state. But the opposite can
occur as well. If $B'' < B'$ the band head will be formed in the P branch on the low-energy side of the vibronic
band, and the band will be said to be shaded to the violet or shaded to the blue. Note that the terms red for the 
low-energy direction and violet or blue for the high-energy direction are used even for spectra in the 
ultraviolet or infrared regions where the actual visible red and blue colours would both be in the same 
direction. 
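
The formation of a band head can be seen by tabulating line positions. The sketch below is a simplified illustration: it uses rigid-rotor term values only (the centrifugal term is neglected), and the rotational constants passed in are hypothetical numbers chosen to make the effect visible. Lines are indexed by the lower-state quantum number $J''$.

```python
def line_positions(nu0, B_upper, B_lower, j_max):
    """P- and R-branch wavenumbers, keyed by the lower-state J''."""
    def term(B, J):
        return B * J * (J + 1)

    R = {J: nu0 + term(B_upper, J + 1) - term(B_lower, J) for J in range(0, j_max + 1)}
    P = {J: nu0 + term(B_upper, J - 1) - term(B_lower, J) for J in range(1, j_max + 1)}
    return P, R

def band_head(nu0, B_upper, B_lower, j_max=200):
    """Return (branch, J'', wavenumber) of the turning point, or None if B' == B''."""
    P, R = line_positions(nu0, B_upper, B_lower, j_max)
    if B_lower > B_upper:
        J = max(R, key=R.get)  # R lines turn around at their highest energy
        return "R", J, R[J]
    if B_lower < B_upper:
        J = min(P, key=P.get)  # P lines turn around at their lowest energy
        return "P", J, P[J]
    return None

red_shaded = band_head(0.0, B_upper=1.7, B_lower=2.0)     # head in the R branch
violet_shaded = band_head(0.0, B_upper=2.0, B_lower=1.7)  # head in the P branch
```

The closer the two rotational constants are, the higher the J at which the head forms, which is why heads are characteristic of electronic bands where the constants differ appreciably.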

The analysis of rotational structure and selection rules for transitions involving $\Pi$ or $\Delta$ states becomes
considerably more complicated. In general, Q branches will be allowed as well as P and R branches. The 
coupling between different types of angular momenta — orbital angular momentum of the electrons, spin 
angular momentum for states of higher multiplicity, the rotational angular momentum and even nuclear spin 
terms — is a complex subject which cannot be covered here: the reader is referred to the more specialized 
literature. 

B1.1.3.4 PERTURBATIONS


Spectroscopists working with high-resolution spectra of small molecules commonly fit series of rotational 
lines to formulae involving the rotational constants, angular momentum coupling terms, etc. However, 
occasionally they find that some lines in the spectrum are displaced from their expected positions in a 
systematic way. Of course, a displacement of a line from its expected position means that the energy levels of 
one of the states are displaced from their expected energies. Typically, as J increases some lines will be seen
to be displaced in one direction by increasing amounts up to a maximum at some particular J, then for the
next J the line will be displaced in the opposite direction, and then as J increases further the lines will
gradually approach their expected positions. These displacements of lines and of state energies are called 
perturbations [22]. 
They are caused by interactions between states, usually between two different electronic states. One hard and
fast selection rule for perturbations is that, because angular momentum must be conserved, the two interacting
states must have the same J. The interaction between two states may be treated by second-order perturbation
theory which says that the displacement of a state is given by

$$\Delta E_i = \frac{|H'_{ij}|^2}{E_i - E_j}$$

where $H'_{ij}$ is the matrix element between the two states of some small term $H'$ in the Hamiltonian which is
unimportant in determining the expected energies of the states. This interaction always has the effect of
pushing the two states apart in energy by equal and opposite displacements inversely proportional to the zero-
order separation of the two states. The perturbation is observed when the vibronic level of the state with the
larger B value lies slightly lower in energy than a vibronic level of the other state. Then with increasing J the
energy of the rotating level of the first state gets closer and closer to the energy of the second state and finally
passes it and then gets farther away again. The maximum displacements occur at the J values where the two
energies are the closest.
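
The equal and opposite displacements can be illustrated numerically. The sketch below compares the second-order expression with the exact result for two interacting levels, obtained from the standard closed form for a 2 x 2 symmetric matrix. The energies and coupling used in the example are hypothetical, and the function names are not from the text.

```python
import math

def second_order_shifts(E1, E2, H12):
    """Second-order displacements of two interacting levels."""
    return H12**2 / (E1 - E2), H12**2 / (E2 - E1)

def exact_shifts(E1, E2, H12):
    """Displacements from exact diagonalization of the two-level problem."""
    mean = 0.5 * (E1 + E2)
    half_gap = math.hypot(0.5 * (E1 - E2), H12)
    lower, upper = mean - half_gap, mean + half_gap
    if E1 <= E2:
        return lower - E1, upper - E2
    return upper - E1, lower - E2

# Hypothetical zero-order levels 10 cm^-1 apart, coupled by 1 cm^-1.
approx = second_order_shifts(100.0, 110.0, 1.0)
exact = exact_shifts(100.0, 110.0, 1.0)
```

The second-order formula grows without bound as the zero-order levels approach each other, whereas the exact displacement can never exceed the magnitude of the coupling matrix element, so the perturbative estimate is reliable only when the separation is large compared with the coupling.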

The spectral perturbations are observed in a transition involving one of the interacting states. Sometimes it is 
possible also to see an electronic transition involving the other of the interacting states, and then one should 
see equal but opposite displacements of rotational levels with the same J.

An interesting example occurs in the spectrum of the C$_2$ molecule. The usual rule of absorption spectroscopy
is that the transitions originate in the ground electronic state because only it has sufficient population.
However, in C$_2$ transitions were observed starting both from a $^3\Pi_u$ state and from a $^1\Sigma_g^+$ state, so it was not
clear which was the ground state. The puzzle was solved by Ballik and Ramsay [23] who observed
perturbations in a $^3\Sigma_g^-$-$^3\Pi_u$ transition due to displacements of levels in the $^3\Sigma_g^-$ state. They then reinvestigated
a $^1\Pi_u$-$^1\Sigma_g^+$ transition known as the Phillips system, and they observed equal and opposite displacements of
levels in the $^1\Sigma_g^+$ state, thus establishing that the $^1\Sigma_g^+$ and $^3\Sigma_g^-$ states were perturbing each other. For example, in
the v = 4 vibrational level of the $^1\Sigma_g^+$ state, the J = 40 rotational level was displaced to lower energy by 0.26
cm$^{-1}$; correspondingly, in the v = 1 vibrational level of the $^3\Sigma_g^-$ state, the J = 40 level was displaced upwards
by 0.25 cm$^{-1}$. The values have an uncertainty of 0.02 cm$^{-1}$, so the displacements of the levels with the same J
are equal and opposite within experimental error. Similarly, the J = 42 level in the $^1\Sigma_g^+$ state was displaced
upwards by 0.17 cm$^{-1}$ and the J = 42 level of the $^3\Sigma_g^-$ state displaced downwards by 0.15 cm$^{-1}$. These
observations established that these particular levels were very close to each other in energy and the authors
were able to prove that the $^1\Sigma_g^+$ state was lower by about 600 cm$^{-1}$ than the $^3\Pi_u$ state and was the ground state.

Absorption spectra of C$_2$ are typically observed in the vapour over graphite at high temperatures. At 2500 K
the value of kT is about 1700 cm$^{-1}$, much greater than the energy separation of the two states. Since the
$^1\Sigma_g^+$ state is non-degenerate and the $^3\Pi_u$ state has a sixfold degeneracy, most of the molecules are actually in
the upper state. This accounts for the observation of absorptions starting from both states.
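
The population argument can be checked with a short Boltzmann estimate. This is a simplified sketch: it treats each electronic state as a single level with the degeneracy quoted above and ignores any difference between the rotational and vibrational partition functions of the two states. The function name is hypothetical.

```python
import math

K_B_CM = 0.6950348  # Boltzmann constant in cm^-1 per kelvin

def population_ratio(delta_e_cm, g_upper, g_lower, temperature_k):
    """Boltzmann ratio N_upper / N_lower for two levels separated by delta_e_cm."""
    kT = K_B_CM * temperature_k
    return (g_upper / g_lower) * math.exp(-delta_e_cm / kT)

kT_2500 = K_B_CM * 2500.0                       # thermal energy in cm^-1
ratio = population_ratio(600.0, 6, 1, 2500.0)   # sixfold upper, non-degenerate lower
fraction_upper = ratio / (1.0 + ratio)
```

With the degeneracy factor of six outweighing the modest Boltzmann penalty, the ratio exceeds one, consistent with the statement that most molecules are in the upper of the two states.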

The perturbations in this case are between a singlet and a triplet state. The perturbation Hamiltonian, $H'$, of
the second-order perturbation theory is spin-orbital coupling, which has the effect of mixing singlet and
triplet states.
The magnitude of the perturbations can be calculated fairly quantitatively from high-quality electronic
wavefunctions including configuration interaction [24].


B1.1.4 EXAMPLES

B1.1.4.1 PHOTOPHYSICS OF MOLECULES IN SOLUTION

To understand emission spectroscopy of molecules and/or their photochemistry it is essential to have a picture 
of the radiative and non-radiative processes among the electronic states. Most stable molecules other than 
transition metal complexes have all their electrons paired in the ground electronic state, making it a singlet 
state. Figure B1.1.3 gives a simple state energy diagram for reference. Singlet states are designated by the
letter S and numbered in order of increasing energy. The ground state is called $S_0$. Excited singlet states have
configurations in which an electron has been promoted from one of the filled orbitals to one of the empty 
orbitals of the molecule. Such configurations with two singly occupied molecular orbitals will give rise to 
triplet states as well as singlet states. A triplet state results when one of the electrons changes its spin so that 
the two electrons have parallel spin. Each excited singlet state will have its corresponding triplet state. 
Because the electron-electron repulsion is less effective in the triplet state, it will normally be lower in energy 
than the corresponding singlet state. 



Figure B1.1.3. State energy diagram for a typical organic molecule. Solid arrows show radiative transitions;
A: absorption, F: fluorescence, P: phosphorescence. Dotted arrows: non-radiative transitions. 


Spectroscopists observed that molecules dissolved in rigid matrices gave both short-lived and long-lived 
emissions which were called fluorescence and phosphorescence, respectively. In 1944, Lewis and Kasha [25]
proposed that molecular phosphorescence came from a triplet state and was long-lived because of the well-known
spin selection rule $\Delta S = 0$, i.e. interactions with a light wave or with the surroundings do not readily
change the spin of the electrons.
Typical singlet lifetimes are measured in nanoseconds while triplet lifetimes of organic molecules in rigid
solutions are usually measured in milliseconds or even seconds. In liquid media where diffusion is rapid the
triplet states are usually quenched, often by the nearly ubiquitous molecular oxygen. Because of that,
phosphorescence is seldom observed in liquid solutions. In the spectroscopy of molecules the term 
fluorescence is now usually used to refer to emission from an excited singlet state and phosphorescence to 
emission from a triplet state, regardless of the actual lifetimes. 

If a light beam is used to excite one of the higher singlet states, say $S_2$, a very rapid relaxation occurs to $S_1$,
the lowest excited singlet state. This non-radiative process just converts the difference in energy into heat in
the surroundings. A radiationless transition between states of the same multiplicity is called internal
conversion. Relaxation between states of the same multiplicity and not too far apart in energy is usually much
faster than radiative decay, so fluorescence is seen only from the $S_1$ state. These radiationless processes in
large molecules are the analogue of the perturbations observed in small molecules. They are caused by small 
terms in the Hamiltonian such as spin-orbital coupling or Born-Oppenheimer breakdown, which mix 
electronic states. The density of vibrational levels of large molecules can be very high and that makes these 
interactions into irreversible transitions to lower states. 

Once the excited molecule reaches the $S_1$ state it can decay by emitting fluorescence or it can undergo a
further radiationless transition to a triplet state. A radiationless transition between states of different
multiplicity is called intersystem crossing. This is a spin-forbidden process. It is not as fast as internal
conversion and often has a rate comparable to the radiative rate, so some $S_1$ molecules fluoresce and others
produce triplet states. There may also be further internal conversion from $S_1$ to the ground state, though it is
not easy to determine the extent to which that occurs. Photochemical reactions or energy transfer may also
occur from $S_1$.
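
The competition among decay channels from the lowest excited singlet state can be expressed as simple first-order kinetics. The sketch below is an illustration under the assumption that each channel has a single, constant rate constant; the function name, the dictionary keys and the numerical rates are hypothetical and chosen only to represent the case where the radiative and intersystem crossing rates are comparable.

```python
def s1_branching(k_fluorescence, k_isc, k_ic=0.0):
    """Lifetime and quantum yields for parallel first-order decay of S1.

    k_fluorescence: radiative rate, k_isc: intersystem crossing rate,
    k_ic: internal conversion rate to the ground state (all in s^-1).
    """
    total = k_fluorescence + k_isc + k_ic
    return {
        "lifetime": 1.0 / total,
        "fluorescence": k_fluorescence / total,
        "triplet": k_isc / total,
        "internal_conversion": k_ic / total,
    }

# Hypothetical rates: radiative decay and intersystem crossing equally fast.
example = s1_branching(k_fluorescence=1.0e8, k_isc=1.0e8)
```

With these assumed rates half of the excited molecules fluoresce and half form triplets, and the observed lifetime is shorter than the purely radiative lifetime because every additional channel adds to the total decay rate.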

Molecules which reach a triplet state will generally relax quickly to state $T_1$. From there they can emit
phosphorescence or decay by intersystem crossing back to the ground state. Both processes are spin forbidden
and again often have comparable rates. The $T_1$ state is often also important for photochemistry because its
lifetime is relatively long. Both phosphorescence and intersystem crossing are dependent on spin-orbital
coupling and are enhanced by heavy atoms bound to the molecule or in the environment. They are also
enhanced by the presence of species with unpaired electrons such as O$_2$ because electron exchange can effect
such transitions without actually requiring the spin of an electron to be reversed. O$_2$ is found to quench both
fluorescence and phosphorescence, and it is often necessary to remove oxygen from solutions for precise 
emission measurements. 

B1.1.4.2 WIDTHS AND SHAPES OF SPECTRAL LINES

High-resolution spectroscopy used to observe hyperfine structure in the spectra of atoms or rotational 
structure in electronic spectra of gaseous molecules commonly must contend with the widths of the spectral 
lines and how that compares with the separations between lines. Three contributions to the line width will be 
mentioned here: the natural line width due to the finite lifetime of the excited state, collisional broadening of 
lines, and the Doppler effect. 

The most fundamental limitation on sharpness of spectral lines is the so-called natural linewidth. Because an
excited state has a finite lifetime, the intensity of light it emits falls off exponentially as a function of time. A
beam of light whose intensity varies with time cannot have a single frequency. Its spectral distribution is the
Fourier transform of its temporal shape. For an exponential decay the spectral distribution will have the form

$$I(\nu) = I(\nu_0)\,\frac{(1/4\pi\tau)^2}{(\nu-\nu_0)^2 + (1/4\pi\tau)^2} \tag{B1.1.20}$$

where $\nu_0$ is the frequency of the centre of the band and $\tau$ is the mean life of the excited state. The same
formula applies to lines in the absorption spectrum. This shape is called a Lorentzian shape. Its full width at
half maximum (FWHM) is $1/(2\pi\tau)$. The shorter the lifetime, the broader the line. Another way to think about
the width is to say that the energy of a state has an uncertainty related to its lifetime by the uncertainty 
principle. If the transition is coupling two states, both of which have finite lifetimes and widths, it is necessary 
to combine the effects. 
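
The relation between lifetime and width can be made concrete with a few lines of code. The sketch assumes a Lorentzian normalized to unity at the line centre, with a half width at half maximum of $1/(4\pi\tau)$ so that the full width is $1/(2\pi\tau)$ as stated above. The 10 ns lifetime is a hypothetical value typical of the nanosecond singlet lifetimes mentioned earlier, and the function names are illustrative.

```python
import math

def natural_fwhm(tau):
    """Full width at half maximum in Hz for a mean lifetime tau in seconds."""
    return 1.0 / (2.0 * math.pi * tau)

def lorentzian(nu, nu0, tau):
    """Relative intensity I(nu) / I(nu0) of the natural line shape."""
    half_width = 1.0 / (4.0 * math.pi * tau)
    return half_width**2 / ((nu - nu0) ** 2 + half_width**2)

tau = 10e-9                 # hypothetical excited-state lifetime, s
width = natural_fwhm(tau)   # about 16 MHz
at_half_width = lorentzian(width / 2.0, 0.0, tau)
```

A millisecond triplet lifetime inserted in the same function gives a natural width of only a few hundred hertz, which is why natural broadening matters mainly for short-lived states.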

Spectral lines are further broadened by collisions. To a first approximation, collisions can be thought of as just 
reducing the lifetime of the excited state. For example, collisions of molecules will commonly change the 
rotational state. That will reduce the lifetime of a given state. Even if the state is not changed, the collision 
will cause a phase shift in the light wave being absorbed or emitted and that will have a similar effect. The 
line shapes of collisionally broadened lines are similar to the natural line shape of equation (B1.1.20) with a
lifetime related to the mean time between collisions. The details will depend on the nature of the 
intermolecular forces. We will not pursue the subject further here. 

A third source of spectral broadening is the Doppler effect. Molecules moving toward the observer will emit 
or absorb light of a slightly higher frequency than the frequency for a stationary molecule; those moving away 
will emit or absorb a slightly lower frequency. The magnitude of the effect depends on the speed of the 
molecules. To first order the frequency shift is given by

$$\frac{\Delta\nu}{\nu_0} = \frac{v}{c}$$

where $v$ is the component of velocity in the direction of the observer and c is the speed of light.

For a sample at thermal equilibrium there is a distribution of speeds which depends on the mass of the 
molecules and on the temperature according to the Boltzmann distribution. This results in a line shape of the 
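form

$$I(\nu) = I(\nu_0)\exp\left[-\frac{Mc^2(\nu-\nu_0)^2}{2RT\nu_0^2}\right]$$

where M is the atomic or molecular mass and R the gas constant. This is a Gaussian line shape with a width
given by

$$\mathrm{FWHM} = \frac{2\nu_0}{c}\left(\frac{2RT\ln 2}{M}\right)^{1/2}.$$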


The actual line shape in a spectrum is a convolution of the natural Lorentzian shape with the Doppler shape. It
must be calculated for a given case as there is no simple formula for it. It is quite typical in electronic
spectroscopy to have the FWHM determined mainly by the Doppler width. However, the two shapes are quite
different and the Lorentzian shape
does not fall off as rapidly at large $(\nu - \nu_0)$. It is likely that the intensity in the wings of the line will be
determined by the natural line shape.
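
To give a feeling for the magnitudes involved, the sketch below evaluates the Doppler FWHM, $(2\nu_0/c)(2RT\ln 2/M)^{1/2}$, and compares it with a natural width of $1/(2\pi\tau)$. All inputs are assumptions made for illustration: a line near 350 nm, a molar mass of 30 g/mol, a temperature of 300 K and a 10 ns excited-state lifetime. The function name is hypothetical.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1
C = 2.99792458e8     # speed of light, m/s

def doppler_fwhm(nu0, temperature_k, molar_mass_kg):
    """Doppler FWHM in Hz for a line at frequency nu0 (Hz); M in kg/mol."""
    thermal = math.sqrt(2.0 * R_GAS * temperature_k * math.log(2.0) / molar_mass_kg)
    return (2.0 * nu0 / C) * thermal

nu0 = C / 350e-9                               # assumed near-ultraviolet line
doppler = doppler_fwhm(nu0, 300.0, 0.030)      # roughly 2 GHz
natural = 1.0 / (2.0 * math.pi * 10e-9)        # assumed 10 ns lifetime
```

Under these assumptions the Doppler width is about two orders of magnitude larger than the natural width. Because it scales only as the square root of temperature, cooling the sample narrows the line rather slowly.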

Collisional broadening is reduced by the obvious remedy of working at low pressures. Of course, this reduces 
the absorption and may require long path lengths for absorption spectroscopy. Doppler widths can be reduced 
by cooling the sample. For many samples this is not practical because the molecules will have too low a 
vapour pressure. Molecular beam methods and the newer technique of jet spectroscopy can be very effective 
by restricting the motion of the molecules to be at right angles to the light beam. Some other techniques for 
sub-Doppler spectroscopy have also been demonstrated using counter-propagating laser beams to compensate 
for motion along the direction of the beam. The natural linewidth, however, always remains when the other 
sources of broadening are removed. 

B1.1.4.3 RYDBERG SPECTRA

The energies of transitions of a hydrogen atom starting from the ground state fit exactly the equation

$$E = E_I - \frac{R}{n^2}$$

where R is the Rydberg constant, $E_I$ is the ionization energy of the atom (which in hydrogen is equal to the
Rydberg constant) and n is the principal quantum number of the electron in the upper state. The spectrum 
shows a series of lines of increasing n which converge to a limit at the ionization energy. 

Other atoms and molecules also show similar series of lines, often in the vacuum ultraviolet region, which fit 
approximately a similar formula:

$$E = E_I - \frac{R}{(n-\delta)^2}.$$

Such a series of lines is called a Rydberg series [26]. These lines also converge to the ionization energy of the
atom or molecule, and fitting the lines to this formula can give a very accurate value for the ionization energy. 
In the case of molecules there may be resolvable vibrational and rotational structure on the lines as well. 

The excited states of a Rydberg series have an electron in an orbital of higher principal quantum number, n, in 
which it spends most of its time far from the molecular framework. The idea is that the electron then feels 
mainly a Coulomb field due to the positive ion remaining behind at the centre, so its behaviour is much like 
that of the electron in the hydrogen atom. The constant $\delta$ is called the quantum defect and is a measure of the
extent to which the electron interacts with the molecular framework. It has less influence on the energy levels
as n gets larger, i.e. as the electron gets farther from the central ion. The size of $\delta$ will also depend on the
angular momentum of the electron. States of lower angular momentum have more probability of penetrating
the charge cloud of the central ion and so may have larger values of $\delta$. Actual energy levels of Rydberg atoms
and molecules can be subject to theoretical calculations [27]. Sometimes the higher states have orbitals so 
large that other molecules may fall within their volume, causing interesting effects [28]. 
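
As a rough numerical illustration of a Rydberg series with a quantum defect, the following Python sketch evaluates transition energies of the form E = E_I - R/(n - δ)² and shows how the line spacing shrinks towards the ionization limit. The function name `rydberg_series`, the use of the infinite-mass Rydberg constant, and the ionization energy and quantum defect of the "hypothetical molecule" are assumptions chosen only for illustration; they are not taken from the references.

```python
RYDBERG_CM = 109737.316  # infinite-mass Rydberg constant in cm^-1 (assumed value of R)

def rydberg_series(e_ion, delta, n_values, rydberg=RYDBERG_CM):
    """Transition energies E = E_I - R/(n - delta)^2, in the units of e_ion."""
    return [e_ion - rydberg / (n - delta) ** 2 for n in n_values]

# Hydrogen-like check: delta = 0 and E_I = R gives lines from the ground state to n = 2..6.
lyman = rydberg_series(RYDBERG_CM, 0.0, range(2, 7))

# Hypothetical molecule (assumed numbers): E_I = 80000 cm^-1, quantum defect 0.9.
series = rydberg_series(80000.0, 0.9, range(3, 12))

# Spacing between successive lines; it decreases as the series converges on E_I.
gaps = [upper - lower for lower, upper in zip(series, series[1:])]
```

In practice the procedure runs the other way: measured line positions are fitted to this expression with E_I and δ as adjustable parameters.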

B1.1.4.4 MULTIPHOTON SPECTROSCOPY


All the previous discussion in this chapter has been concerned with absorption or emission of a single photon. 
However, it is possible for an atom or molecule to absorb two or more photons simultaneously from a light 
beam to produce an excited state whose energy is the sum of the energies of the photons absorbed. This can 
happen even when there is no intermediate stationary state of the system at the energy of one of the photons. 
The possibility was first demonstrated theoretically by Maria Göppert-Mayer in 1931 [29], but experimental
observations had to await the development of the laser. Multiphoton spectroscopy is now a useful technique 
[30, 31]. 

The transition probability for absorption of two photons can be described in terms of a two-photon cross
section δ by

$$-\mathrm{d}I = \delta I^2 N\,\mathrm{d}l$$

where I is a photon flux in photons cm⁻² s⁻¹, N is the number of molecules per cubic centimetre, and l is
distance through the sample. Measured values of δ are of the order of 10⁻⁵⁰ cm⁴ s photon⁻¹ molecule⁻¹ [32].
For molecules exposed to the intensity of sunlight at the earth's surface this would suggest that the molecule 
might be excited once in the age of the universe. However, the probability is proportional to the square of the 
light intensity. For a molecule exposed to a pulsed laser focused to a small spot, the probability of being 
excited by one pulse may be easily observable by fluorescence excitation or multiphoton ionization 
techniques. 
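
The quadratic dependence on intensity can be made concrete with a short Python sketch. It assumes the attenuation law -dI = δI²N dl, so that δI² photons are removed per molecule per second and each excitation consumes two photons. The function names, the order-of-magnitude cross section of 10⁻⁵⁰ cm⁴ s photon⁻¹ molecule⁻¹, the 500 nm wavelength and the two intensities are illustrative assumptions rather than measured values.

```python
def photon_flux(intensity_w_cm2, wavelength_nm):
    """Photon flux in photons cm^-2 s^-1 for a given intensity and wavelength."""
    h = 6.62607015e-34   # Planck constant, J s
    c = 2.99792458e8     # speed of light, m s^-1
    photon_energy = h * c / (wavelength_nm * 1e-9)  # J per photon
    return intensity_w_cm2 / photon_energy

def two_photon_excitation_rate(delta, flux):
    """Excitations per molecule per second for cross section delta and photon flux."""
    # delta * flux^2 photons are absorbed per molecule per second;
    # two photons are needed for each excitation.
    return 0.5 * delta * flux ** 2

DELTA = 1e-50  # cm^4 s photon^-1 molecule^-1 (order-of-magnitude assumption)

# Hypothetical weak continuous source versus a tightly focused laser pulse.
weak = two_photon_excitation_rate(DELTA, photon_flux(1e-3, 500.0))   # 1 mW cm^-2
strong = two_photon_excitation_rate(DELTA, photon_flux(1e9, 500.0))  # 1 GW cm^-2
```

Raising the intensity by twelve orders of magnitude raises the excitation rate by twenty-four, which is why focused pulsed lasers make the process observable.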

One very important aspect of two-photon absorption is that the selection rules for atoms or symmetrical 
molecules are different from one-photon selection rules. In particular, for molecules with a centre of 
symmetry, two-photon absorption is allowed only for g↔g or u↔u transitions, while one-photon absorption
requires g↔u transitions. Therefore, a whole different set of electronic states becomes allowed for two-photon
spectroscopy. The group-theoretical selection rules for two-photon spectra are obtained from the symmetries
of the x², xy, etc. terms in the character table. This is completely analogous to the selection rules for Raman
spectroscopy, a different two-photon process.

A good example is the spectrum of naphthalene. The two lowest excited states have B2u and B1u symmetries
and are allowed for one-photon transitions. A weak transition to one of these is observable in the two-photon
spectrum [33], presumably made allowed by vibronic effects. Much stronger two-photon transitions are
observable at somewhat higher energies to a B3g and an Ag state lying quite close to the energies predicted by
theory many years earlier [34].

An interesting aspect of two-photon spectroscopy is that some polarization information is obtainable even for 
randomly oriented molecules in solution by studying the effect of the relative polarization of the two photons. 
This is readily done by comparing linearly and circularly polarized light. Transitions to A states will absorb 
linearly polarized light more strongly than circularly polarized light. The reverse is true of transitions to B1,
B2 or B3 states. The physical picture is that the first photon induces an oscillating u-type polarization of the
molecule in one direction. To get to a totally symmetric A state the second photon must reverse that
polarization, so it is favoured for a photon of the same polarization. However, to get to, say, a B3g state, the
second photon needs to act at right angles to the first, and in circularly polarized light that perpendicular
polarization is always strong. Figure B1.1.4 shows the two-photon fluorescence excitation spectrum of
naphthalene in the region of the g states [35]. One peak shows stronger absorption for circularly polarized,
one for linearly polarized light. That confirms the identification as B3g and Ag states respectively.

Figure B1.1.4. Two-photon fluorescence excitation spectrum of naphthalene. Reprinted from [35]. Courtesy,
Tata McGraw-Hill Publishing Company Ltd, 7 West Patel Nagar, New Delhi, 110008, India.

Three-photon absorption has also been observed by multiphoton ionization, giving Rydberg states of atoms or 
molecules [36]. Such states usually require vacuum ultraviolet techniques for one-photon spectra, but can be 
done with a visible or near-ultraviolet laser by three-photon absorption. 

B1.1.4.5 OTHER EXAMPLES

Many of the most interesting current developments in electronic spectroscopy are addressed in special 
chapters of their own in this encyclopedia. The reader is referred especially to sections B2.1 on ultrafast 
spectroscopy, C1.5 on single molecule spectroscopy, C3.2 on electron transfer, and C3.3 on energy transfer.
Additional topics on electronic spectroscopy will also be found in many other chapters. 

B1.2 Vibrational spectroscopy

Charles Schmuttenmaer 


B1.2.1 INTRODUCTION 


B1.2.1.1 OVERVIEW


Vibrational spectroscopy provides detailed information on both structure and dynamics of molecular species. 
Infrared (IR) and Raman spectroscopy are the most commonly used methods, and will be covered in detail in 
this chapter. There exist other methods to obtain vibrational spectra, but those are somewhat more specialized 
and used less often. They are discussed in other chapters, and include: inelastic neutron scattering (INS), 
helium atom scattering, electron energy loss spectroscopy (EELS), photoelectron spectroscopy, among others. 

Vibrational spectra span the frequency range 10-4000 cm⁻¹ (10 cm⁻¹ = 1.2 meV = 0.03 kcal mol⁻¹ = 0.12 kJ
mol⁻¹, and 4000 cm⁻¹ = 496 meV = 11.4 kcal mol⁻¹ = 47.9 kJ mol⁻¹), depending on the strength of the bond
and the reduced mass of the vibrational mode. Very weakly bound species, such as clusters bound only by van
der Waals forces, or condensed phase modes with very large effective reduced masses, such as intermolecular
modes in liquids or solids, or large amplitude motions in proteins or polymers, have very low frequency
vibrations of 10-100 cm⁻¹. Modes involving many atoms in moderately large molecules absorb in the range
300-1200 cm⁻¹. The region 600-1200 cm⁻¹ is referred to as the fingerprint region because while many
organic molecules will all have bands due to vibrations of C-H, C=O, O-H and so on, there will be low
frequency bands unique to each molecule that involve complicated motions of many atoms. The region
1200-3500 cm⁻¹ is where most functional groups are found to absorb. Thus, the presence or absence of absorption
at a characteristic frequency helps to determine the identity of a compound. The frequency of the absorption
can be justified in terms of the masses of the atoms participating, the type of motion involved (stretch versus
bend) and the bond strengths. The H₂ molecule has a reasonably large force constant and the smallest reduced
mass, which results in the highest vibrational frequency at 4400 cm⁻¹. The width and intensity of an
absorption feature provide information in addition to the absorption frequency. In favourable cases in the gas 
phase the width is determined by the vibrational lifetime or even the lifetime of a molecule if the vibrational 
energy is greater than the bond strength. The intensity yields information on the nature of the vibrational 
motion, and can also be used to determine the temperature of the sample. 
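
Because vibrational energies are quoted interchangeably in several units, a small conversion helper is convenient. The Python sketch below converts a wavenumber to meV, kJ mol⁻¹ and kcal mol⁻¹ using E = hcν̃; the function name `wavenumber_to_units` is an assumption, the physical constants are the exact SI defining values, and the thermochemical calorie (4.184 J) is assumed for the kcal conversion.

```python
H = 6.62607015e-34          # Planck constant, J s
C_CM = 2.99792458e10        # speed of light, cm s^-1
E_CHARGE = 1.602176634e-19  # joules per electronvolt
N_A = 6.02214076e23         # Avogadro constant, mol^-1
J_PER_KCAL = 4184.0         # thermochemical kilocalorie, J

def wavenumber_to_units(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to meV, kJ/mol and kcal/mol."""
    energy_j = H * C_CM * wavenumber_cm  # energy of one quantum, in joules
    return {
        "meV": energy_j / E_CHARGE * 1e3,
        "kJ/mol": energy_j * N_A / 1e3,
        "kcal/mol": energy_j * N_A / J_PER_KCAL,
    }

low = wavenumber_to_units(10.0)     # bottom of the vibrational range
high = wavenumber_to_units(4000.0)  # top of the vibrational range
```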

Infrared and Raman spectroscopy each probe vibrational motion, but respond to a different manifestation of it. 
Infrared spectroscopy is sensitive to a change in the dipole moment as a function of the vibrational motion, 
whereas Raman spectroscopy probes the change in polarizability as the molecule undergoes vibrations. 
Resonance Raman spectroscopy also couples to excited electronic states, and can yield further information 
regarding the identity of the vibration. Raman and IR spectroscopy are often complementary, both in the type 
of systems that can be studied, as well as the information obtained. 

Vibrational spectroscopy is an enormously large subject area spanning many scientific disciplines. The 
methodology, both experimental and theoretical, was developed primarily by physical chemists and has 
branched far and wide over the last 50 years. This chapter will mainly focus on its importance with regard to 
physical chemistry. 


B1.2.1.2 INFRARED SPECTROSCOPY 


For many chemists, the most familiar IR spectrometer is the dual beam instrument that covers the region
900-3400 cm⁻¹; it is used for routine analysis and compound identification. Typically, each of the functional
groups of a molecule has unique frequencies, and different molecules have different combinations of
functional groups. Thus, every molecule has a unique absorption spectrum. Of course, there can be situations 
where two molecules are similar enough that their spectra are indistinguishable on a system with moderate 
signal-to-noise ratio, or where there are strong background absorptions due to a solvent or matrix that 
obscures the molecular vibrations, so that it is not possible to distinguish all compounds under all 
circumstances; but it is usually quite reliable, particularly if one is comparing the spectrum of an unknown to 
reference spectra of a wide variety of compounds. Ease of implementation and reasonably unambiguous 
spectra have led to the widespread use of IR spectroscopy outside of physical chemistry. 

Within physical chemistry, the long-lasting interest in IR spectroscopy lies in structural and dynamical 
characterization. High resolution vibration-rotation spectroscopy in the gas phase reveals bond lengths, bond 
angles, molecular symmetry and force constants. Time-resolved IR spectroscopy characterizes reaction 
kinetics, vibrational lifetimes and relaxation processes. 

B1.2.1.3 RAMAN SPECTROSCOPY 

Raman spectrometers are not as widespread as their IR counterparts. This is partially due to the more stringent
requirements on light source (laser) and monochromator. As is the case with IR spectroscopy, every molecule
has a unique Raman spectrum. It is also true that there can be ambiguity because of molecular similarities or
impurities in the sample. Resonance Raman spectroscopy allows interfering bands to be eliminated by
selectively exciting only specific species by virtue of their electronic absorption, or coupling to a nearby
chromophore. This is particularly helpful in discriminating against strong solvent bands. For example, the first
excited electronic state of water is at about 7 eV (~175 nm excitation wavelength), whereas many larger
molecules have electronic transitions at much lower photon energy. By using the resonant enhancement of the
Raman signal from the excited electronic state, it is possible to enhance the signal of the dissolved molecule
by a factor of up to 10⁶.

One of the well known advantages of resonance Raman spectroscopy is that samples dissolved in water can be 
studied since water is transparent in the visible region. Furthermore, many molecules of biophysical interest 
assume their native state in water. For this reason, resonance Raman spectroscopy has been particularly 
strongly embraced in the biophysical community. 


B1.2.2 THEORY

B1.2.2.1 CLASSICAL DESCRIPTION

Both infrared and Raman spectroscopy provide information on the vibrational motion of molecules. The 
techniques employed differ, but the underlying molecular motion is the same. A qualitative description of IR 
and Raman spectroscopies is first presented. Then a slightly more rigorous development will be described. For 
both IR and Raman spectroscopy, the fundamental interaction is between a dipole moment and an 
electromagnetic field. Ultimately, the two can only couple with each other if they have the same frequency;
otherwise the time average of their interaction energy is zero.

The most important consideration for a vibration to be IR active is that its dipole moment changes upon 
vibration. That is to say, its dipole derivative must be nonzero. The time-dependence of the dipole moment for 
a heteronuclear diatomic is shown in figure B1.2.1. Classically, an oscillating dipole radiates energy at the
oscillation frequency. In a sense, this is true for a vibrating molecule in that when it is in an excited 
vibrational state it can emit a photon and lose energy. However, there are two fundamental differences. First, 
it does not continuously radiate energy, as a classical oscillator would. Rather, it vibrates for long periods 
without radiating energy, and then there is an instantaneous jump to a lower energy level accompanied by the 
emission of a photon. It should be noted that vibrational lifetimes in the absence of external perturbations are 
quite long, on the millisecond timescale. The second difference from a classical oscillator is that when a 
molecule is in its ground vibrational state it cannot emit a photon, but is still oscillating. Thus, the dipole can 
oscillate for an indefinitely long period without radiating any energy. Therefore, a quantum mechanical 
description of vibration must be invoked to describe molecular vibrations at the most fundamental level. 



Figure B1.2.1. Schematic representation of the dependence of the dipole moment on the vibrational
coordinate for a heteronuclear diatomic molecule. It can couple with electromagnetic radiation of the same 
frequency as the vibration, but at other frequencies the interaction will average to zero. 

The qualitative description of Raman scattering is closely related. In this case, the primary criterion for a 
Raman active mode is that the polarizability of the molecule changes as a function of vibrational coordinate. 
An atom or molecule placed in an electric field will acquire a dipole moment that is proportional to the size of 
the applied field and the ease with which the charge distribution redistributes, that is, the polarizability. If the
polarizability changes as a function of vibration, then there will be an induced dipole whose magnitude
changes as the molecule vibrates, as depicted in figure B1.2.2, and this can couple to the EM field. Of course,
the applied field is oscillatory, not static, but as we will see below there will still be scattered radiation that is 
related to the vibrational frequency. In fact, the frequency of the scattered light will be the sum and difference 
of the laser frequency and vibrational frequency. 




Figure B1.2.2. Schematic representation of the polarizability of a diatomic molecule as a function of
vibrational coordinate. Because the polarizability changes during vibration, Raman scatter will occur in 
addition to Rayleigh scattering. 

Before presenting the quantum mechanical description of a harmonic oscillator and selection rules, it is
worthwhile presenting the energy level expressions that the reader is probably already familiar with. A
vibrational mode ν, with an equilibrium frequency of $\tilde\nu_e$ (in wavenumbers), has energy levels (also in
wavenumbers) given by $E_v = \tilde\nu_e(v + 1/2)$, where v is the vibrational quantum number, and v ≥ 0. The notation
can become a bit confusing, so note that: ν (Greek letter nu) identifies the vibration, $\tilde\nu_e$ is the vibrational
frequency (in wavenumbers), and v (italic letter v) is the vibrational quantum number. It is trivial to extend
this expression to a molecule with n uncoupled harmonic modes,

$$E_{v_1 v_2 \ldots v_n} = \sum_{i=1}^{n} \tilde\nu_{e,i}\,(v_i + 1/2) \tag{B1.2.1}$$

where $\tilde\nu_{e,i}$ is the equilibrium vibrational frequency of the ith mode.


Of course, real molecules are not harmonic oscillators, and the energy level expression can be expanded in
powers of (v + 1/2). For a single mode we have

$$E_v = \tilde\nu_e(v + 1/2) - \tilde\nu_e x_e(v + 1/2)^2 + \cdots$$

where $x_e$ is the anharmonicity constant. This allows the spacing of the energy levels to decrease as a function
of vibrational quantum number. Usually the expansion converges rapidly, and only the first two terms are
needed. Harmonic oscillator and anharmonic oscillator potential energy curves with their respective energy
levels are compared in figure B1.2.3.


Figure B1.2.3. Comparison of the harmonic oscillator potential energy curve and energy levels (dashed lines)
with those for an anharmonic oscillator. The harmonic oscillator is a fair representation of the true potential 
energy curve at the bottom of the well. Note that the energy levels become closer together with increasing 
vibrational energy for the anharmonic oscillator. The anharmonicity has been greatly exaggerated. 

There usually is rotational motion accompanying the vibrational motion, and for a diatomic, the energy as a 
function of the rotational quantum number, J, is 

$$E_J = B_e J(J + 1) - D_e[J(J + 1)]^2$$

where $B_e$ and $D_e$ are the equilibrium rotational constant and centrifugal distortion constant respectively. The
rotational constant is related to the moment of inertia through $B_e = h/(8\pi^2 I_e c)$, where h is Planck's constant, $I_e$
is the equilibrium moment of inertia, and c is the speed of light in vacuum. As the molecule rotates faster, it
elongates due to centrifugal distortion. This increases the moment of inertia and causes the energy levels to
become closer together. This is accounted for by including the second term with a negative sign. Overall, the
vibration-rotation term energy is given by

$$E_{vJ} = \tilde\nu_e(v + 1/2) - \tilde\nu_e x_e(v + 1/2)^2 + B_e J(J + 1) - D_e[J(J + 1)]^2 - \alpha_e(v + 1/2)J(J + 1). \tag{B1.2.2}$$

The only term in this expression that we have not already seen is $\alpha_e$, the vibration-rotation coupling constant.
It accounts for the fact that as the molecule vibrates, its bond length changes which in turn changes the
moment of inertia. Equation (B1.2.2) can be simplified by combining the vibration-rotation constant with the
rotational constant, yielding a vibrational-level-dependent rotational constant,

$$B_v = B_e - \alpha_e(v + 1/2)$$

so the vibration-rotation term energy becomes

$$E_{vJ} = \tilde\nu_e(v + 1/2) - \tilde\nu_e x_e(v + 1/2)^2 + B_v J(J + 1) - D_e[J(J + 1)]^2.$$
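
The vibration-rotation term energy is easy to evaluate numerically. The Python sketch below implements the expression with a vibrational-level-dependent rotational constant; the function names are assumptions, and the spectroscopic constants are approximate, rounded values for CO in cm⁻¹ used purely as an illustration rather than as reference data.

```python
def rotational_constant(v, b_e, alpha_e):
    """Vibrational-level-dependent rotational constant B_v = B_e - alpha_e (v + 1/2)."""
    return b_e - alpha_e * (v + 0.5)

def term_energy(v, j, nu_e, nu_e_x_e, b_e, d_e, alpha_e):
    """Vibration-rotation term energy in cm^-1 for quantum numbers v and J."""
    vib = nu_e * (v + 0.5) - nu_e_x_e * (v + 0.5) ** 2
    rot = rotational_constant(v, b_e, alpha_e) * j * (j + 1) - d_e * (j * (j + 1)) ** 2
    return vib + rot

# Approximate constants for CO in cm^-1 (illustrative assumptions).
CO = dict(nu_e=2169.8, nu_e_x_e=13.29, b_e=1.9313, d_e=6.1e-6, alpha_e=0.0175)

fundamental = term_energy(1, 0, **CO) - term_energy(0, 0, **CO)     # v = 0 -> 1
hot_band = term_energy(2, 0, **CO) - term_energy(1, 0, **CO)        # v = 1 -> 2
r0_line = term_energy(1, 1, **CO) - term_energy(0, 0, **CO)         # R(0) line
```

The fundamental comes out below the harmonic frequency by twice the anharmonic term, and the v = 1 → 2 spacing is smaller again, which is the narrowing of level spacing described above.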

B1.2.2.2 QUANTUM MECHANICAL DESCRIPTION

The quantum mechanical treatment of a harmonic oscillator is well known. Real vibrations are not harmonic, 
but the lowest few vibrational levels are often very well approximated as being harmonic, so that is a good 
place to start. The following description is similar to that found in many textbooks, such as McQuarrie (1983) 
[2]. The one-dimensional Schrödinger equation is

$$-\frac{\hbar^2}{2\mu}\frac{\mathrm{d}^2\psi(x)}{\mathrm{d}x^2} + U(x)\psi(x) = E\psi(x) \tag{B1.2.3}$$

where μ is the reduced mass, U(x) is the potential energy, ψ(x) is the wavefunction and E is the energy. The
harmonic oscillator wavefunctions which solve this equation, yielding the energies given in equation (B1.2.1),
are orthonormal, and are given by

$$\psi_v(x) = N_v H_v(\alpha^{1/2}x)\,e^{-\alpha x^2/2}$$

where $\alpha = (k\mu/\hbar^2)^{1/2}$, $N_v$ are normalization constants given by

$$N_v = \frac{1}{(2^v v!)^{1/2}}\left(\frac{\alpha}{\pi}\right)^{1/4}$$

and $H_v$ are the Hermite polynomials. The Hermite polynomials are defined as [1]

$$H_v(x) = (-1)^v e^{x^2}\frac{\mathrm{d}^v}{\mathrm{d}x^v}e^{-x^2}.$$

Upon inspection, the first three are seen to be

$$H_0(x) = 1 \qquad H_1(x) = 2x \qquad H_2(x) = 4x^2 - 2$$

and higher degree polynomials are obtained using the recursion relation

$$H_{v+1}(x) = 2xH_v(x) - 2vH_{v-1}(x). \tag{B1.2.4}$$

The first few harmonic oscillator wavefunctions are plotted in figure B1.2.4.

Figure B1.2.4. Lowest five harmonic oscillator wavefunctions ψ and probability densities |ψ|².
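
The Hermite recursion relation lends itself to a direct implementation. The following Python sketch builds the polynomial coefficients of H_0 to H_vmax from H_{v+1}(x) = 2xH_v(x) - 2vH_{v-1}(x); the function names and the coefficient-list representation (lowest power first) are choices made here for illustration.

```python
def hermite_coefficients(v_max):
    """Return coefficient lists (lowest power first) for H_0 .. H_v_max."""
    polys = [[1]]                 # H_0(x) = 1
    if v_max >= 1:
        polys.append([0, 2])      # H_1(x) = 2x
    for v in range(1, v_max):
        prev, cur = polys[v - 1], polys[v]
        nxt = [0] + [2 * c for c in cur]   # 2x * H_v(x): shift up one power, double
        for k, c in enumerate(prev):       # subtract 2v * H_{v-1}(x)
            nxt[k] -= 2 * v * c
        polys.append(nxt)
    return polys

def evaluate(coeffs, x):
    """Evaluate a polynomial given as a coefficient list at the point x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

H = hermite_coefficients(4)
# H[2] is 4x^2 - 2, H[3] is 8x^3 - 12x, H[4] is 16x^4 - 48x^2 + 12
```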

Upon solving the Schrödinger equation, the energy levels are $E_v = \tilde\nu_e(v + 1/2)$, where $\tilde\nu_e$ is related to the force
constant k and reduced mass μ through

$$\tilde\nu_e = \frac{1}{2\pi c}\left(\frac{k}{\mu}\right)^{1/2}.$$

Since the reduced mass depends only on the masses of the atoms (1/μ = 1/m₁ + 1/m₂ for a diatomic), which
are known, measurement of the vibrational frequency allows the force constant to be determined.
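
As a worked example of extracting a force constant from a vibrational frequency, the Python sketch below inverts the harmonic relation to give k = μ(2πcν̃)². The function names are assumptions, the input is the H₂ value of roughly 4400 cm⁻¹ quoted earlier in this chapter, and the atomic mass of hydrogen is taken as 1.00783 u.

```python
import math

C_CM = 2.99792458e10      # speed of light, cm s^-1
AMU = 1.66053906660e-27   # atomic mass unit, kg

def reduced_mass(m1_u, m2_u):
    """Reduced mass in kg from two atomic masses in atomic mass units."""
    return m1_u * m2_u / (m1_u + m2_u) * AMU

def force_constant(wavenumber_cm, mu_kg):
    """Harmonic force constant in N m^-1 from a wavenumber in cm^-1."""
    omega = 2.0 * math.pi * C_CM * wavenumber_cm   # angular frequency, rad s^-1
    return mu_kg * omega ** 2

k_h2 = force_constant(4400.0, reduced_mass(1.00783, 1.00783))  # roughly 575 N m^-1
```

Replacing one hydrogen by deuterium changes only the reduced mass, so the same force constant predicts the isotopically shifted frequency.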

The electric dipole selection rule for a harmonic oscillator is Δv = ±1. Because real molecules are not
harmonic, transitions with |Δv| > 1 are weakly allowed, with Δv = ±2 being more allowed than Δv = ±3 and so
on. There are other selection rules for quadrupole and magnetic dipole transitions, but those transitions are six 
to eight orders of magnitude weaker than electric dipole transitions, and we will therefore not concern 
ourselves with them. 

The selection rules are derived through time-dependent perturbation theory [1, 2]. Two points will be made in
the following material. First, the Bohr frequency condition states that the photon energy of absorption or
emission is equal to the energy level separation of the two states. Second, the importance of the transition
dipole moment is shown and, furthermore, it is also shown that the transition dipole moment for a vibrational
mode is in fact the change in dipole as a function of vibration, that is, the dipole derivative. The
time-dependent Schrödinger equation is


$$\hat{H}\Psi = i\hbar\frac{\partial\Psi}{\partial t}. \tag{B1.2.5}$$

The time- and coordinate-dependent wavefunction for any given state is

$$\Psi_v(x,t) = \psi_v(x)\,e^{-iE_v t/\hbar}$$

where $\psi_v(x)$ is a stationary state wavefunction obtained by solving the time-independent Schrödinger equation
(B1.2.3). The Hamiltonian is broken down into two parts, $\hat{H} = \hat{H}^0 + \hat{H}^{(1)}$, where $\hat{H}^0$ is the Hamiltonian for the
isolated molecule, and $\hat{H}^{(1)}$ describes the interaction of the molecule with the electromagnetic
field. In particular, the interaction energy between a dipole and a monochromatic field is

$$\hat{H}^{(1)} = -\boldsymbol{\mu}\cdot\mathbf{E} = -\boldsymbol{\mu}\cdot\mathbf{E}_0\cos 2\pi\nu t.$$

Consider a two-state system, where

$$\Psi_1(x,t) = \psi_1(x)\,e^{-iE_1 t/\hbar} \quad\text{and}\quad \Psi_2(x,t) = \psi_2(x)\,e^{-iE_2 t/\hbar}.$$

These wavefunctions are orthogonal. Assume that the system is initially in state 1, and that the interaction
begins at t = 0. Since there are only two states, at any later time the wavefunction for the system is

$$\Psi(t) = a_1(t)\Psi_1(t) + a_2(t)\Psi_2(t)$$

where the coefficients $a_1(t)$ and $a_2(t)$ are to be determined. We know that $a_1(0) = 1$ and $a_2(0) = 0$ from the
initial conditions. By substitution into equation (B1.2.5), we have

$$a_1(t)\hat{H}^{(1)}\Psi_1 + a_2(t)\hat{H}^{(1)}\Psi_2 = i\hbar\Psi_1\frac{\mathrm{d}a_1}{\mathrm{d}t} + i\hbar\Psi_2\frac{\mathrm{d}a_2}{\mathrm{d}t}.$$

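We can find the time-dependent coefficient for being in state 2 by multiplying from the left by $\psi_2^*$, and
integrating over spatial coordinates:

$$a_1(t)\int\psi_2^*\hat{H}^{(1)}\Psi_1\,\mathrm{d}\tau + a_2(t)\int\psi_2^*\hat{H}^{(1)}\Psi_2\,\mathrm{d}\tau = i\hbar\frac{\mathrm{d}a_1}{\mathrm{d}t}\int\psi_2^*\Psi_1\,\mathrm{d}\tau + i\hbar\frac{\mathrm{d}a_2}{\mathrm{d}t}\int\psi_2^*\Psi_2\,\mathrm{d}\tau.$$

This expression can be simplified considerably since $\psi_2$ and $\psi_1$ are orthogonal, which implies that $\psi_2$ and $\Psi_1$
are orthogonal as well. Furthermore, $a_1(t) \approx a_1(0) = 1$ and $a_2(t) \approx a_2(0) = 0$ since $\hat{H}^{(1)}$ is a small perturbation:

$$i\hbar\frac{\mathrm{d}a_2}{\mathrm{d}t} = e^{-i(E_1 - E_2)t/\hbar}\,\langle\psi_2|\hat{H}^{(1)}|\psi_1\rangle \tag{B1.2.6}$$

where we have used Dirac bracket notation for the integral, i.e.

$$\langle\psi_2|\hat{H}^{(1)}|\psi_1\rangle = \int\psi_2^*\hat{H}^{(1)}\psi_1\,\mathrm{d}\tau.$$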


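In order to evaluate equation (B1.2.6), we will consider the electric field to be in the z-direction, and express the
interaction Hamiltonian as

$$\hat{H}^{(1)} = -\mu_z E_{0z}\cos 2\pi\nu t = -\frac{\mu_z E_{0z}}{2}\left(e^{i2\pi\nu t} + e^{-i2\pi\nu t}\right).$$

Before substituting everything back into equation (B1.2.6), we define the transition dipole moment between
states 1 and 2 to be the integral

$$(\mu_z)_{21} = \langle\psi_2|\mu_z|\psi_1\rangle. \tag{B1.2.7}$$

Now, we substitute it into equation (B1.2.6) to get

$$\frac{\mathrm{d}a_2}{\mathrm{d}t} \propto (\mu_z)_{21}E_{0z}\left\{\exp\left[\frac{i(E_2 - E_1 + h\nu)t}{\hbar}\right] + \exp\left[\frac{i(E_2 - E_1 - h\nu)t}{\hbar}\right]\right\}$$

and then integrate from 0 to t to obtain

$$a_2(t) \propto (\mu_z)_{21}E_{0z}\left\{\frac{1 - \exp[i(E_2 - E_1 + h\nu)t/\hbar]}{E_2 - E_1 + h\nu} + \frac{1 - \exp[i(E_2 - E_1 - h\nu)t/\hbar]}{E_2 - E_1 - h\nu}\right\}.$$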

There are two important features of this result. The energy difference between states 1 and 2 is $\Delta E = E_2 - E_1$.
When $\Delta E \approx h\nu$, the denominator of the second term becomes very small, and this term dominates. This is the
well known Bohr frequency condition. The second important feature is that $(\mu_z)_{21}$ must be nonzero for an allowed
transition, which is how the selection rules are determined.
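
The dominance of the near-resonant term can be checked numerically. The Python sketch below evaluates the factor (1 - exp[iΔt/ħ])/Δ, where Δ = E₂ - E₁ - hν is the detuning, and its squared modulus as a measure of the transition probability. Working in units with ħ = 1, the function names and the sample detunings are assumptions made for illustration only.

```python
import cmath

HBAR = 1.0  # units with hbar = 1 (an assumption made for this illustration)

def resonant_term(detuning, t):
    """(1 - exp(i*detuning*t/hbar)) / detuning, including the detuning -> 0 limit."""
    if abs(detuning) < 1e-12:
        return -1j * t / HBAR  # first-order expansion of the exponential
    return (1.0 - cmath.exp(1j * detuning * t / HBAR)) / detuning

def probability_shape(detuning, t):
    """Squared modulus of the resonant term; proportional to |a_2(t)|^2."""
    return abs(resonant_term(detuning, t)) ** 2

on_resonance = probability_shape(0.0, 10.0)   # grows as t^2 on exact resonance
off_resonance = probability_shape(5.0, 10.0)  # bounded by 4/detuning^2 off resonance
```

The squared modulus equals 4 sin²(Δt/2ħ)/Δ², which is sharply peaked at Δ = 0, so absorption is appreciable only when the photon energy matches the level separation.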

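The molecular dipole moment (not the transition dipole moment) is given as a Taylor series expansion about
the equilibrium position

$$\mu = \mu_0 + \left(\frac{\mathrm{d}\mu}{\mathrm{d}x}\right)_0 x + \frac{1}{2}\left(\frac{\mathrm{d}^2\mu}{\mathrm{d}x^2}\right)_0 x^2 + \cdots = \mu_0 + \mu_1 x + \mu_2 x^2 + \cdots$$

where x is the displacement from equilibrium, $\mu_0$ is the permanent dipole moment and $\mu_1$ is the dipole
derivative and so on. It is usually fine to truncate the expansion after the second term. Now we need to
evaluate the transition dipole moment for the harmonic oscillator wavefunctions. Using equation (B1.2.7), we
have for states i and j

$$(\mu_z)_{ij} = \int_{-\infty}^{\infty} N_i H_i(\alpha^{1/2}x)\,e^{-\alpha x^2/2}\,\mu\,N_j H_j(\alpha^{1/2}x)\,e^{-\alpha x^2/2}\,\mathrm{d}x$$

which can be expanded by substituting $\mu = \mu_0 + \mu_1 x$:

$$(\mu_z)_{ij} = N_i N_j \mu_0\int_{-\infty}^{\infty} H_i(\alpha^{1/2}x)H_j(\alpha^{1/2}x)\,e^{-\alpha x^2}\,\mathrm{d}x + N_i N_j \mu_1\int_{-\infty}^{\infty} H_i(\alpha^{1/2}x)\,e^{-\alpha x^2/2}\,x\,H_j(\alpha^{1/2}x)\,e^{-\alpha x^2/2}\,\mathrm{d}x. \tag{B1.2.8}$$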


The first term is zero if i ≠ j due to the orthogonality of the Hermite polynomials. The recursion relation in
equation (B1.2.4) is rearranged

$$xH_v(x) = vH_{v-1}(x) + \frac{1}{2}H_{v+1}(x)$$

and substituted into the second term in equation (B1.2.8),

$$\frac{N_i N_j \mu_1}{\alpha}\int_{-\infty}^{\infty} H_j(\xi)\left[iH_{i-1}(\xi) + \frac{1}{2}H_{i+1}(\xi)\right]e^{-\xi^2}\,\mathrm{d}\xi$$

where we have let $\xi = \alpha^{1/2}x$. Clearly, this integral will be nonzero only when j = i ± 1, yielding the familiar
Δv = ±1 harmonic oscillator selection rule. Furthermore, the overtone intensities for an anharmonic oscillator
are obtained in a straightforward manner by determining the eigenfunctions of the energy levels in a harmonic
oscillator basis set, and then summing the weighted contributions from the harmonic oscillator integrals.
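
The selection rule can also be confirmed by brute-force numerical integration. The Python sketch below builds normalized harmonic oscillator wavefunctions from the Hermite recursion and evaluates the matrix element of x between two states with the trapezoidal rule. The choice α = 1, the integration limits, the step count and the function names are assumptions for illustration; the analytic value used for comparison, ⟨v|x|v+1⟩ = [(v + 1)/2α]^(1/2), is the standard harmonic oscillator result.

```python
import math

def hermite(v, xi):
    """Hermite polynomial H_v(xi) from the recursion H_{n+1} = 2 xi H_n - 2 n H_{n-1}."""
    h_prev, h = 1.0, 2.0 * xi
    if v == 0:
        return h_prev
    for n in range(1, v):
        h_prev, h = h, 2.0 * xi * h - 2.0 * n * h_prev
    return h

def psi(v, x, alpha=1.0):
    """Normalized harmonic oscillator wavefunction psi_v(x)."""
    norm = (alpha / math.pi) ** 0.25 / math.sqrt(2.0 ** v * math.factorial(v))
    return norm * hermite(v, math.sqrt(alpha) * x) * math.exp(-alpha * x * x / 2.0)

def x_matrix_element(i, j, alpha=1.0, x_max=10.0, steps=4000):
    """Trapezoidal estimate of the integral of psi_i(x) * x * psi_j(x)."""
    dx = 2.0 * x_max / steps
    total = 0.0
    for k in range(steps + 1):
        x = -x_max + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * psi(i, x, alpha) * x * psi(j, x, alpha)
    return total * dx

allowed = x_matrix_element(0, 1)    # nonzero: j = i + 1
forbidden = x_matrix_element(0, 3)  # vanishes: |j - i| = 3
```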

B1.2.2.3 RAMAN SPECTROSCOPY


NORMAL RAMAN SPECTROSCOPY 


Raman scattering has been discussed by many authors. As in the case of IR vibrational spectroscopy, the
interaction is between the electromagnetic field and a dipole moment; however, in this case the dipole moment
is induced by the field itself. The induced dipole is $\mu_{\mathrm{ind}} = \alpha E$, where α is the polarizability. It can be
expressed in a Taylor series expansion in coordinate displacement

$$\alpha = \alpha_0 + \left(\frac{\mathrm{d}\alpha}{\mathrm{d}x}\right)_0 x + \frac{1}{2}\left(\frac{\mathrm{d}^2\alpha}{\mathrm{d}x^2}\right)_0 x^2 + \cdots = \alpha_0 + \alpha' x + \frac{1}{2}\alpha'' x^2 + \cdots.$$

Here, $\alpha_0$ is the static polarizability, α′ is the change in polarizability as a function of the vibrational
coordinate, α″ is the second derivative of the polarizability with respect to vibration and so on. As is usually
the case, it is possible to truncate this series after the second term. As before, the electric field is
$E = E_0\cos 2\pi\nu_0 t$, where $\nu_0$ is the frequency of the light field. Thus we have

$$\mu_{\mathrm{ind}} = (\alpha_0 + \alpha' x)E_0\cos 2\pi\nu_0 t. \tag{B1.2.9}$$

The time dependence of the displacement coordinate for a mode undergoing harmonic oscillation is given by
$x = x_m\cos 2\pi\nu_v t$, where $x_m$ is the amplitude of vibration and $\nu_v$ is the vibrational frequency. Substitution into
equation (B1.2.9) with use of the product-to-sum identity cos A cos B = ½[cos(A − B) + cos(A + B)] yields

$$\mu_{\mathrm{ind}} = \alpha_0 E_0\cos 2\pi\nu_0 t + \alpha'(x_m\cos 2\pi\nu_v t)E_0\cos 2\pi\nu_0 t$$

$$\phantom{\mu_{\mathrm{ind}}} = \alpha_0 E_0\cos 2\pi\nu_0 t + \frac{\alpha' x_m E_0}{2}\left[\cos 2\pi(\nu_0 - \nu_v)t + \cos 2\pi(\nu_0 + \nu_v)t\right].$$


The first term results in Rayleigh scattering which is at the same frequency as the exciting radiation. The 
second term describes Raman scattering. There will be scattered light at $(\nu_0 - \nu_v)$ and $(\nu_0 + \nu_v)$, that is at sum
and difference frequencies of the excitation field and the vibrational frequency. Since $\alpha' x_m$ is about a factor of
10⁶ smaller than $\alpha_0$, it is necessary to have a very efficient method for dispersing the scattered light.
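
The decomposition of the induced dipole into Rayleigh, Stokes and anti-Stokes components can be verified numerically. The Python sketch below evaluates the induced dipole directly from the truncated polarizability expansion and again as a sum of three cosines; the function names and all parameter values (arbitrary units) are assumptions chosen only to exercise the identity.

```python
import math

def induced_dipole(t, alpha0, alpha1, x_m, e0, nu0, nu_v):
    """Induced dipole (alpha0 + alpha1 * x(t)) * E(t) for a vibrating molecule."""
    x = x_m * math.cos(2.0 * math.pi * nu_v * t)
    return (alpha0 + alpha1 * x) * e0 * math.cos(2.0 * math.pi * nu0 * t)

def scattered_components(t, alpha0, alpha1, x_m, e0, nu0, nu_v):
    """Rayleigh, Stokes and anti-Stokes contributions to the induced dipole."""
    rayleigh = alpha0 * e0 * math.cos(2.0 * math.pi * nu0 * t)
    amplitude = 0.5 * alpha1 * x_m * e0
    stokes = amplitude * math.cos(2.0 * math.pi * (nu0 - nu_v) * t)
    anti_stokes = amplitude * math.cos(2.0 * math.pi * (nu0 + nu_v) * t)
    return rayleigh, stokes, anti_stokes

# Arbitrary illustrative parameters: alpha0, alpha', x_m, E_0, nu_0, nu_v.
PARAMS = (1.0, 0.2, 0.1, 1.0, 20.0, 1.0)
SAMPLE_TIMES = [0.0, 0.013, 0.37, 1.21]
residuals = [
    induced_dipole(t, *PARAMS) - sum(scattered_components(t, *PARAMS))
    for t in SAMPLE_TIMES
]
```

The residuals vanish to rounding error, confirming that the modulated dipole radiates at exactly three frequencies.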

The bands on the low frequency side of the excitation frequency $(\nu_0 - \nu_v)$ are referred to as the Stokes lines,
consistent with the terminology used in fluorescence, whereas those on the high frequency side $(\nu_0 + \nu_v)$ are
the anti-Stokes lines. It is a bit unfortunate that this terminology was chosen, since the Raman process is 
fundamentally different from fluorescence. In particular, fluorescence is the result of a molecule absorbing 
light, undergoing vibrational relaxation in the upper electronic state, and re-emitting a photon at a lower 
frequency. The timescale for fluorescence is typically of the order of nanoseconds. The Raman process, on the 
other hand, is an instantaneous scattering process that occurs on a femtosecond timescale. The photon is never 
absorbed by the molecule. It is usually clear whether fluorescence or Raman scattering is being observed, but 
there are situations where it is ambiguous. We shall not pursue the issue any further here, however. 

It is well known that the intensity of scattered light varies as the fourth power of the frequency, and based on
this alone one would predict the Stokes lines to be less intense than the anti-Stokes by a factor of

$$\frac{I_{\mathrm{Stokes}}}{I_{\mathrm{anti\text{-}Stokes}}} = \frac{(\nu_0 - \nu_v)^4}{(\nu_0 + \nu_v)^4}$$

which is 0.68 for a 1000 cm⁻¹ vibration excited at 488 nm (20 492 cm⁻¹). In reality, the Stokes lines are more
intense, typically by a factor of 2 to 10,000, as the vibrational frequency varies from 200 to 2000 cm⁻¹. This is
easily justified when the Boltzmann population of initial states is taken into account. As seen in figure B1.2.5,
the Stokes transitions correspond to the molecule being initially in a low energy state, usually v = 0, whereas it
must be in an excited vibrational state if it is going to undergo an anti-Stokes process. The ratio of populations
of two energy levels is given by

$$\frac{N_2}{N_1} = e^{-\Delta E_{12}/kT}$$

where $\Delta E_{12}$ is the energy difference between the two levels, k is Boltzmann's constant and T is the
temperature in kelvin. Since the Stokes lines are more intense than anti-Stokes, only the Stokes lines for a
Raman spectrum are typically presented, and the abscissa is labelled with the frequency offset from the
excitation frequency (the Raman shift) rather than the actual frequency of scattered light.
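
The two competing factors, the fourth-power frequency dependence and the Boltzmann population of the initial state, can be combined in a few lines of Python. In this sketch the function names and the sample temperature of 300 K are assumptions; the Boltzmann constant is expressed in cm⁻¹ K⁻¹ so that all energies can be entered as wavenumbers.

```python
import math

K_B_CM = 0.6950348  # Boltzmann constant divided by hc, in cm^-1 K^-1

def frequency_factor(nu0_cm, nu_v_cm):
    """Stokes to anti-Stokes intensity ratio from the nu^4 dependence alone."""
    return ((nu0_cm - nu_v_cm) / (nu0_cm + nu_v_cm)) ** 4

def stokes_to_anti_stokes(nu0_cm, nu_v_cm, temperature_k):
    """Intensity ratio including the population ratio N(v=1)/N(v=0)."""
    population_ratio = math.exp(-nu_v_cm / (K_B_CM * temperature_k))
    return frequency_factor(nu0_cm, nu_v_cm) / population_ratio

EXCITATION = 20492.0  # 488 nm excitation, in cm^-1

nu4_only = frequency_factor(EXCITATION, 1000.0)
ratio_200 = stokes_to_anti_stokes(EXCITATION, 200.0, 300.0)    # assumed T = 300 K
ratio_2000 = stokes_to_anti_stokes(EXCITATION, 2000.0, 300.0)  # assumed T = 300 K
```

Inverting the same expression for a measured Stokes to anti-Stokes ratio gives an estimate of the sample temperature.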

Figure B1.2.5. Comparison of several light scattering processes: (a) Rayleigh scattering, (b) Stokes and
anti-Stokes Raman scattering, (c) pre-resonance Raman scattering, (d) resonance Raman scattering and (e)
fluorescence where, unlike resonance Raman scattering, vibrational relaxation in the excited state takes place.
From [3], used with permission.

Additional information about the vibration can be obtained through the depolarization ratio. This is the ratio
of the intensity of scattered light that is polarized in a plane perpendicular to the incident radiation relative to
that of the scattered light that is polarized parallel to the incident polarization, $\rho = I_\perp/I_\parallel$. For totally symmetric
modes, 0 ≤ ρ < 3/4, while ρ = 3/4 for non-totally symmetric modes [1, 3]. The depolarization ratio can actually be
greater than 3/4 for a resonantly enhanced Raman band [3].

Consistent with the notion that Raman scattering is due to a change in polarizability as a function of vibration, 
some of the general features of Raman spectroscopy [3] are: 


(1) it is more sensitive to stretching modes than bending modes, especially totally symmetric modes; 

(2) the intensity increases with bond order (i.e. double bond vibrations are more intense than single bond 
vibrations); 

(3) for modes involving only one bond, the intensity increases with increasing atomic number of the atoms; 

(4) for cyclic molecules, breathing modes are strongest. 

RESONANCE RAMAN SPECTROSCOPY 


Resonance Raman spectroscopy has been discussed by many authors [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 and
15]. If the excitation frequency is resonant with an excited electronic state, it is possible to dramatically
increase the Raman cross-section. It is still a scattering process without absorption, but the light and
polarizability can now couple much more efficiently. Resonant enhancements of 10⁴ to 10⁶ are achievable. A
second important fact is that the vibrational modes that are coupled to the electronic transition are selectively
enhanced, which greatly aids in structural determination. Another implication is that much lower
concentrations can be used. A typical neat liquid might have a concentration of roughly 5 to 50 mol l⁻¹. It is
possible for resonantly enhanced Raman bands of a molecule to be as strong as or stronger than the solvent bands
even at a concentration of 10⁻⁶ M. This is useful because it is often desirable to maintain low enough
concentrations such that the solute molecules do not interact with each other or aggregate. Finally, excited
state vibrational dephasing rates can be determined.


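The sum-over-states method for calculating the resonant enhancement begins with an expression for the
resonance Raman intensity, $I_{i \to f}$, for the transition from initial state $i$ to final state $f$ in the ground electronic
state, and is given by [14]

$$I_{i \to f} = \frac{2^7 \pi^5}{3^2 c^4}\, I_0\, (\nu_L \pm \nu_v)^4 \sum_{\rho,\sigma} \left| (\alpha_{\rho\sigma})_{if} \right|^2$$

where $I_0$ is the incident intensity, $\nu_L$ is the laser frequency, $\nu_v$ is the vibrational frequency and $(\alpha_{\rho\sigma})_{if}$ is the
$\rho\sigma$th element of the Raman scattering tensor,

$$(\alpha_{\rho\sigma})_{if} = \sum_r \left[ \frac{(M_\rho)_{fr}(M_\sigma)_{ri}}{E_{ri} - E_L - \mathrm{i}\Gamma_r} + \frac{(M_\sigma)_{fr}(M_\rho)_{ri}}{E_{rf} + E_L - \mathrm{i}\Gamma_r} \right]. \qquad \text{(B1.2.10)}$$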

Here, $(M_\rho)_{ab}$ is the $\rho$th component of the electronic transition moment from state $a$ to $b$, $E_{ri}$ and $E_{rf}$ are the
energy differences between the resonant excited state and the initial and final states, respectively, $E_L = h\nu_L$ is the
photon energy of the laser radiation, and $\Gamma_r$ is the excited state vibrational dephasing rate. The Raman
scattering tensor has two contributions, the so-called A and B terms:

$$(\alpha_{\rho\sigma})_{if} = A + B.$$

The B-type enhancement is smaller than the A-type, and is usually, though not always, negligible. Therefore,
we will concentrate on the A term [14],


where $M_\rho$ and $M_\sigma$ are pure electronic transition moments, and $\langle i|r\rangle$ and $\langle r|f\rangle$ are vibrational overlap integrals. In
many cases, a single diagonal component of the scattering tensor dominates, which leads to a simplified 
expression, 

(B1.2.11) 

where $\langle e|M|g\rangle$ is the transition dipole moment for the electronic transition between ground state $g$ and
electronically excited state $e$, and $r$ are the vibrational levels in the excited electronic state with vibrational
energy $E_r$ relative to the initial vibrational level in the ground electronic state. From the form of this
equation, we see that the enhancement depends on the overlap of the ground and excited state wavefunctions 
(Franck-Condon overlap), and is strongest when the laser frequency equals the energy level separation. 
Furthermore, systems with very large dephasing rates will not experience as much enhancement, all other 
factors being equal. 
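
The dependence on detuning and dephasing can be made concrete with a small numerical sketch. It is not the full sum-over-states expression: it assumes a single excited level and keeps only the squared magnitude of a resonance denominator of the form $E_r - E_L - \mathrm{i}\Gamma_r$. The function name and the numerical values (in cm⁻¹) are hypothetical and chosen only for illustration.

```python
def resonance_factor(e_r, e_l, gamma):
    """Return |1 / (E_r - E_L - i*Gamma)|**2 for a single excited level.

    All arguments are energies in the same units (cm^-1 here). This is an
    illustrative single-level model, not the full scattering tensor sum.
    """
    return 1.0 / ((e_r - e_l) ** 2 + gamma ** 2)


# Hypothetical excited level at 20 000 cm^-1 with a 100 cm^-1 dephasing width.
on_resonance = resonance_factor(20000.0, 20000.0, 100.0)
detuned = resonance_factor(20000.0, 19000.0, 100.0)  # laser 1000 cm^-1 off resonance
broad = resonance_factor(20000.0, 20000.0, 500.0)    # five times larger dephasing rate

print(on_resonance / detuned)  # factor lost by detuning the laser
print(on_resonance / broad)    # factor lost by faster dephasing
```

In this simplified model the enhancement falls off as the square of the detuning once the detuning exceeds the dephasing width, and on resonance it scales as the inverse square of the dephasing rate.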

If there is only significant overlap with one excited vibrational state, equation (B1.2.11) simplifies further. In
fact, if the initial vibrational state is $v_i = 0$, which is usually the case, and there is not significant distortion of
the molecule in the excited electronic state, which may or may not hold true, then the intensity is given by


It is also possible to determine the resonant Raman intensities via a time-dependent method [16]. It has the
advantages that the vibrational eigenstates of the excited electronic state need not be known, and that it
provides a physical picture of the dynamics in the excited state. Assuming only one ground vibrational state is
occupied, the Raman cross section is [16, 17]

where $\omega_L$ is the laser frequency, $\omega_S$ is the frequency of the scattered light, $\mu$ is the electronic transition
dipole, $\omega_i$ is the zero point vibrational energy of the initial state, $\langle f|i(t)\rangle$ is the overlap of the final state with
the time-evolving state on the upper electronic state potential energy surface and the $\exp[-g(t)]$ term accounts
for solvent-induced electronic dephasing. Unlike the sum-over-states method, the only excited state
information needed is the potential energy surface in the immediate vicinity of where the initial state is
projected onto the excited state.


B1.2.3 SPECTROMETERS 

B1.2.3.1 INFRARED SPECTROMETERS

In the most general terms, an infrared spectrometer consists of a light source, a dispersing element, a sample 
compartment and a detector. Of course, there is tremendous variability depending on the application. 

LIGHT SOURCES 

Light sources can either be broadband, such as a Globar, a Nernst glower, an incandescent wire or mercury 
arc lamp; or they can be tunable, such as a laser or optical parametric oscillator (OPO). In the former case, a 
monochromator is needed to achieve spectral resolution. In the case of a tunable light source, the spectral 
resolution is determined by the linewidth of the source itself. In either case, the spectral coverage of the light 
source imposes limits on the vibrational frequencies that can be measured. Of course, limitations on the 
dispersing element and detector also affect the overall spectral response of the spectrometer. 

Desirable characteristics of a broadband light source are stability, brightness and uniform intensity over as 
large a frequency range as possible. Desirable characteristics of a tunable light source are similar to those of a 
broadband light source. Furthermore, its wavelength should be as jitter-free as possible. Having a linewidth of
0.001 cm⁻¹ is meaningless if the frequency fluctuates by 0.01 cm⁻¹ on the timescale of scanning over an
absorption feature.

The region 4500-2850 cm⁻¹ is covered nicely by F-centre lasers, and 2800-1000 cm⁻¹ by diode lasers (but
there are gaps in coverage). Difference frequency crystals allow two visible beams whose frequencies differ
by the frequency of interest to be mixed together. Using different laser combinations, coverage from above
5000 cm⁻¹ down to 1000 cm⁻¹ has been demonstrated. The spectral range is limited only by the characteristics of the
difference frequency crystal. The ultimate resolution of a laser spectrometer is dictated by the linewidth of the
tunable light source. If the linewidth of the light source is broader than the absorption linewidth, then
sensitivity is diminished, and the transition will appear broader than it actually is.
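
As a rough illustration of difference frequency generation, the sketch below converts two visible wavelengths to wavenumbers and subtracts them. The 514 nm value is an Ar⁺ laser line; the 600 nm dye laser wavelength and the function names are assumptions made only for this example.

```python
def wavenumber_from_nm(wavelength_nm):
    """Convert a vacuum wavelength in nm to a wavenumber in cm^-1."""
    return 1.0e7 / wavelength_nm


def difference_frequency(lambda1_nm, lambda2_nm):
    """Wavenumber (cm^-1) of the IR beam generated by mixing two visible beams."""
    return abs(wavenumber_from_nm(lambda1_nm) - wavenumber_from_nm(lambda2_nm))


# Ar+ line at 514 nm mixed with an assumed dye laser wavelength of 600 nm.
ir_wavenumber = difference_frequency(514.0, 600.0)
ir_wavelength_um = 1.0e4 / ir_wavenumber

print(round(ir_wavenumber, 1), "cm^-1")
print(round(ir_wavelength_um, 2), "um")
```

Tuning the dye laser over a few tens of nanometres moves the generated IR frequency by several hundred cm⁻¹, which is why a modest visible tuning range translates into broad IR coverage.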

When extremely high resolution is not required, an attractive alternative is found in OPOs. An OPO is based
on a nonlinear crystal that converts an input photon into two output photons whose energies add up to the
input photon's energy. OPOs can be rapidly tuned over a relatively large range, 5000-2200 cm⁻¹, depending
on the nonlinear crystal. In the IR, commonly used crystals are LiNbO₃ and KDP (KH₂PO₄). The
wavelengths of the two output beams are determined by the angle of the nonlinear crystal. That is, the OPO will only
function when the index of refraction of the crystal in the direction of propagation is identical for all three
beams. If narrow linewidths are required, it is necessary to seed the OPO with a weak beam from a diode
laser. The parametric oscillation then does not have to build up out of the noise, and is therefore more stable.

DISPERSING ELEMENTS

Dispersing elements must be used when broadband IR light sources are employed; either diffraction gratings 
or prisms can be used. Gratings are made by depositing a metal film, usually gold, on a ruled substrate. They 
can be used over a broad spectrum because the light never penetrates into the substrate. However, the 
reflectivity of the coating can be wavelength dependent. The usable range is determined by the pitch of the
rulings. A grating suitable for use from 1000 to 3000 cm⁻¹ would have 100-200 lines mm⁻¹. If there are too
many lines per millimetre, then it acts like a mirror rather than a grating. If it has too few, then the efficiency 
is greatly reduced. 
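
The connection between line density and usable range can be checked with a short calculation. The sketch below compares the groove spacing $d$ with the wavelength and uses the condition $\lambda \le 2d$, which follows from the grating equation $\sin\theta_i + \sin\theta_m = m\lambda/d$ for first order; that criterion and the function names are additions for illustration, not taken from the text.

```python
def groove_spacing_um(lines_per_mm):
    """Groove spacing d in micrometres for a given line density."""
    return 1000.0 / lines_per_mm


def wavelength_um(wavenumber_cm1):
    """Wavelength in micrometres for a wavenumber in cm^-1."""
    return 1.0e4 / wavenumber_cm1


def first_order_exists(wavenumber_cm1, lines_per_mm):
    """True if a first-order diffracted beam can propagate (lambda <= 2d)."""
    return wavelength_um(wavenumber_cm1) <= 2.0 * groove_spacing_um(lines_per_mm)


# 150 lines/mm at 2000 cm^-1: wavelength and groove spacing are comparable.
print(wavelength_um(2000.0), groove_spacing_um(150.0), first_order_exists(2000.0, 150.0))

# A visible-type grating with 1200 lines/mm at 1000 cm^-1: only the
# zeroth (mirror-like) order survives.
print(first_order_exists(1000.0, 1200.0))
```

For the 100-200 lines mm⁻¹ gratings quoted above, the groove spacing is 5-10 μm, comparable to the 3.3-10 μm wavelengths being dispersed.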

Prisms can also be used to disperse the light. They are much easier to make than gratings, and are therefore 
less expensive, but that is their only real advantage. Since the light has to pass through them, it is necessary to 
find materials that do not absorb. Greater dispersion is obtained by using materials with as high an index of 
refraction as possible. Unfortunately, materials with high index are also close to an absorption, which leads to 
a large nonlinear variation in index as a function of frequency. 

When dispersing elements are used, the resolution of the spectrometer is determined by the entrance slit 
width, the exit slit width, the focal length and the dispersing element itself. Resolving power is defined as 

$$R = \lambda/\Delta\lambda = \nu/\Delta\nu$$

that is, the central wavelength (or frequency) of the light exiting the spectrometer relative to its linewidth (or 
bandwidth). Higher resolving powers allow closely spaced lines to be distinguished. 

DETECTORS 

The detector chosen is just as important as the light source. If the sample is absorbing light, but the detector is
not responding at that frequency, then changes in absorption will not be recorded. In fact, one of the primary
limitations faced by spectroscopists working in the far-IR region of the spectrum (10-300 cm⁻¹) has been the lack
of highly sensitive detectors. The type of detector used depends on the frequency of the light. Of course, at
any frequency, a bolometer can be used, assuming the detection element absorbs the wavelength of interest. A
bolometer operates on the principle of a temperature-dependent resistance in the detector element. If the
detector is held at a very low temperature, 2-4 K, and has a small heat capacity, then incoming energy will
heat it, causing the resistance to change. While these detectors are general, they are susceptible to missing
weak signals because they are swamped by thermal blackbody radiation, and are not as sensitive as detectors
that have a response tied to the photon energy. At frequencies between 4000 and 1900 cm⁻¹, InSb
photodiodes are used; HgCdTe detectors are favoured for frequencies between 2000 and 700 cm⁻¹, and
copper-doped germanium (Cu:Ge) photoconductors are used for frequencies between 1000 and 300 cm⁻¹
[18]. More recently, HgCdTe array detectors have become available that respond in the range 3500-900 cm⁻¹
[19].

B1.2.3.2 RAMAN SPECTROMETERS

While Raman spectroscopy was first described in a paper by C V Raman and K S Krishnan in 1928, it has
only come into widespread use in the last three decades owing to the ready availability of intense
monochromatic laser light sources. Oddly enough, prior to 1945, Raman spectroscopy using Hg arc lamps was
the method of choice for obtaining vibrational spectra since there was not yet widespread availability of IR
spectrometers [5]. Just as with IR spectrometers, a Raman spectrometer consists of a light source, a dispersing
element, a sample compartment and a detector. Again, there is tremendous variability depending on the application.

LIGHT SOURCES 

The light source must be highly monochromatic so that the Raman scattering occurs at a well-defined
frequency. An inhomogeneously broadened vibrational linewidth might be of the order of 20 cm⁻¹; therefore,
if the excitation source has a wavelength of 500 nm, its linewidth must be less than 0.5 nm to ensure that it
does not further broaden the line and decrease its intensity. This type of linewidth is trivial to achieve with a 
laser source, but more difficult with an arc lamp source. 
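
The 0.5 nm figure follows from converting a linewidth in wavenumbers to a linewidth in wavelength, using $\Delta\lambda \approx \lambda^2 \Delta\bar{\nu}$. The sketch below carries out that conversion; the function name is hypothetical.

```python
def linewidth_nm(wavelength_nm, linewidth_cm1):
    """Convert a linewidth in cm^-1 to nm at a given centre wavelength.

    Uses d(lambda) = lambda**2 * d(nu~), valid when the linewidth is small
    compared with the centre frequency.
    """
    wavelength_cm = wavelength_nm * 1.0e-7
    return wavelength_cm ** 2 * linewidth_cm1 * 1.0e7


# A 20 cm^-1 vibrational linewidth observed with 500 nm excitation.
print(linewidth_nm(500.0, 20.0), "nm")
```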

Fixed frequency laser sources are most commonly used. For example, the Ar⁺ laser has lines at 514, 496, 488,
476 and 458 nm. Sometimes a helium-neon laser is used (633 nm), or a doubled or tripled Nd:YAG laser (532 or 355
nm, respectively). Other wavelengths are generated by employing a Raman shifter with a variety of different
gases. It is also desirable to have tunability, in order to carry out resonance Raman studies wherein one
selectively measures the vibrations most strongly coupled to the electronic state being excited. Tunable lasers
can be either line-tunable or continuously tunable. Thanks to the high sensitivity of photomultiplier tubes, the
light source need only provide moderate power levels, ~10-100 mW for example. In fact, one must be careful
not to use too much power and damage the sample.

DISPERSING ELEMENTS 

Due to the rather stringent requirements placed on the monochromator, a double or triple monochromator is 
typically employed. Because the vibrational frequencies are only several hundred to several thousand cm⁻¹,
and the linewidths are only tens of cm⁻¹, it is necessary to use a monochromator with reasonably high
resolution. In addition to linewidth issues, it is necessary to suppress the very intense Rayleigh scattering. If a 
high resolution spectrum is not needed, however, then it is possible to use narrow-band interference filters to 
block the excitation line, and a low resolution monochromator to collect the spectrum. In fact, this is the 
approach taken with Fourier transform Raman spectrometers. 

DETECTORS 

Because the scattered light being detected is in the visible to near UV region, photomultiplier tubes (PMTs) 
are the detector of choice. By using a cooled PMT, background counts can be reduced to a level of only a few 
per second. For studies with higher signal levels, array detectors such as optical multichannel analysers 
(OMAs), or CCD arrays can be used. This allows a complete spectrum to be obtained without having to scan 
the monochromator. 

RESOLUTION 

The resolution of the Raman spectrum is determined by the monochromator. Furthermore, since the light
being measured is in the visible region, usually around 20 000 cm⁻¹, the resolving power of the monochromator
must be significantly higher than that of its IR counterpart, because the resolving power is given by $\nu/\Delta\nu$.
That is, for a 2000 cm⁻¹ vibration a resolving power of 20 000 is needed to get the same resolution as that
obtained in an IR spectrometer with a resolving power of only 2000.
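
The comparison can be written out explicitly. The sketch below assumes a target resolution of 1 cm⁻¹, which is implied by the numbers in the text but not stated, and evaluates $R = \nu/\Delta\nu$ for light at 2000 cm⁻¹ (direct IR absorption) and at 20 000 cm⁻¹ (Raman scattered visible light).

```python
def resolving_power(wavenumber_cm1, resolution_cm1):
    """Resolving power R = nu / delta_nu, with both quantities in cm^-1."""
    return wavenumber_cm1 / resolution_cm1


resolution = 1.0  # assumed target resolution in cm^-1

r_ir = resolving_power(2000.0, resolution)      # band measured directly in the IR
r_raman = resolving_power(20000.0, resolution)  # same band carried on visible light

print(r_ir, r_raman)
```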

B1.2.3.3 FOURIER TRANSFORM TECHNIQUES

Fourier transform techniques do not change the underlying absorption or scattering mechanisms that couple 
light with matter. The data acquisition and processing are significantly different, however. In short, the 
difference is that the data are collected interferometrically in the time domain and then Fourier transformed 
into the frequency domain for subsequent analysis. These types of instruments are often used in experiments 
where broad spectral coverage with moderate sensitivity and frequency resolution is needed. This is often 
encountered when other aspects of the experiment are more difficult, such as surface studies. There is, 
however, ongoing research directed towards time-resolved FTIR with nanosecond time resolution [20, 21].
The basic requirements are a broadband light source, a beamsplitter, two delay lines (one fixed and one 
variable), a detector, and a computer to run the show and Fourier-transform the data. 

The underlying concept behind an FTIR spectrometer can be understood when considering what happens
when a beam of monochromatic light with wavelength $\lambda$ is sent through a Michelson interferometer, shown
schematically in figure B1.2.6 [22]. If the pathlength difference between the two beams is zero, then they will
constructively interfere, yielding an intensity at the detector of $I_0$. Now consider what happens if the movable
mirror is displaced by a distance $d = \lambda/4$. This will cause the optical path of that beam to change by an amount
$\delta = \lambda/2$, and lead to destructive interference at the detector, with no power being measured. If the mirror is
displaced by another $\lambda/4$, then the two beams will once again constructively interfere, leading to full intensity
at the detector. The intensity as a function of mirror position is shown in figure B1.2.7(a). The intensity as a
function of optical delay, $\delta$, is described by

$$I(\delta) = \frac{I(\bar{\nu})}{2}\left[1 + \cos(2\pi\bar{\nu}\delta)\right]$$


where $\bar{\nu}$ is the frequency of interest and $I(\bar{\nu})$ is its intensity. Similarly, figure B1.2.7(b) shows the intensity
as a function of mirror position when two frequencies with equal amplitudes are present, and $\bar{\nu}_1 = 1.2\bar{\nu}_2$. Figure
B1.2.7(c) depicts these same two frequencies, but with the amplitude of the lower frequency twice as large as
that of the higher frequency. Finally, figure B1.2.7(d) shows the result for a Gaussian distribution of
frequencies. For a discrete distribution of frequencies, the intensity as a function of optical delay is

$$I(\delta) = \frac{1}{2}\sum_i I(\bar{\nu}_i)\left[1 + \cos(2\pi\bar{\nu}_i\delta)\right]$$

and if there is a continuous distribution of frequencies, it is

$$I(\delta) = \frac{1}{2}\int_{-\infty}^{+\infty} I(\bar{\nu})\left[1 + \cos(2\pi\bar{\nu}\delta)\right]\,\mathrm{d}\bar{\nu}.$$

While the data are collected in the time domain by scanning a delay line, they are most easily interpreted in
the frequency domain. It is straightforward to connect the time and frequency domains through a Fourier
transform

$$I(\bar{\nu}) = 4\int_0^{\infty}\left[I(\delta) - \frac{I(0)}{2}\right]\cos(2\pi\bar{\nu}\delta)\,\mathrm{d}\delta.$$
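
The round trip from spectrum to interferogram and back can be sketched numerically. The example below builds the interferogram of two discrete lines, the lower frequency being twice as intense as the higher one as in panel (c) of figure B1.2.7, and recovers the spectrum with a discretized cosine transform. The line positions, step size, scan length and function names are all assumptions chosen for illustration, and the integral is truncated at the maximum delay, so the recovered peak heights are only proportional to the line intensities.

```python
import math


def interferogram(lines, delays):
    """I(delta) = 1/2 * sum_k I_k * [1 + cos(2*pi*nu_k*delta)] for discrete lines.

    lines  : list of (wavenumber in cm^-1, intensity) pairs
    delays : optical delays in cm
    """
    return [
        0.5 * sum(i_k * (1.0 + math.cos(2.0 * math.pi * nu_k * d)) for nu_k, i_k in lines)
        for d in delays
    ]


def cosine_transform(signal, delays, wavenumbers):
    """Riemann-sum estimate of 4 * integral [I(delta) - I(0)/2] cos(2*pi*nu*delta) d(delta)."""
    step = delays[1] - delays[0]
    ac_part = [s - 0.5 * signal[0] for s in signal]  # remove the constant offset
    return [
        4.0 * step * sum(a * math.cos(2.0 * math.pi * nu * d) for a, d in zip(ac_part, delays))
        for nu in wavenumbers
    ]


# Two lines with nu_1 = 1.2 * nu_2; the lower frequency is twice as intense.
lines = [(1000.0, 2.0), (1200.0, 1.0)]
step_cm = 1.0e-4                                  # well below 1 / (2 * 1200 cm^-1)
delays = [n * step_cm for n in range(1000)]       # scan out to about 0.1 cm
wavenumbers = [900.0 + 10.0 * k for k in range(41)]

signal = interferogram(lines, delays)
spectrum = cosine_transform(signal, delays, wavenumbers)

peak = wavenumbers[spectrum.index(max(spectrum))]
ratio = spectrum[wavenumbers.index(1000.0)] / spectrum[wavenumbers.index(1200.0)]
print(peak, round(ratio, 3))
```

A pure-Python loop is used to keep the example dependency-free; a practical implementation would use a fast Fourier transform and an apodization function.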


Two scans are required to obtain an absorption spectrum. First, a blank reference scan is taken that 
characterizes the broadband light source. Then a scan with the sample in place is recorded. The ratio of the 
sample power spectrum to the reference power spectrum is the transmission spectrum. If the source has stable 
output, then a single reference scan can be used with many sample scans. 

Figure B1.2.6. Schematic representation of a Michelson interferometer. From Griffiths P R and de Haseth J A
1986 Fourier transform infrared spectroscopy Chemical Analysis ed P J Elving and J D Winefordner (New
York: Wiley). Reprinted by permission of John Wiley and Sons Inc.


Figure B1.2.7. Time domain and frequency domain representations of several interferograms: (a) single
frequency, (b) two frequencies, one of which is 1.2 times greater than the other, (c) same as (b), except the
high frequency component has only half the amplitude and (d) Gaussian distribution of frequencies. The time
domain panels are plotted against optical delay (arb. units) and the frequency domain panels against frequency
(arb. units).

The interferogram is obtained by continuously scanning the movable mirror and collecting the intensity on the 
detector at regular intervals. Thus, each point corresponds to a different mirror position, which in turn 
corresponds to a different optical delay. Several scans can be averaged together because the scanning 
mechanism is highly reproducible. 

The spectral resolution of the Fourier transformed data is given by the wavelength corresponding to the
maximum optical delay. We have $\lambda_{\max} = \delta_{\max}$, and therefore $\Delta\bar{\nu} = 1/\delta_{\max}$. Therefore, if 0.1 cm⁻¹ spectral
resolution is desired, the optical delay must cover a distance of 10 cm. The high frequency limit for the
transform is the Nyquist frequency, which is determined by the wavelength that corresponds to two time steps:
$\lambda_{\min} = 2\Delta\delta$, which leads to $\bar{\nu}_{\max} = 1/(2\Delta\delta)$. The factor of two arises because a minimum of two data points per
period are needed to sample a sinusoidal waveform. Naturally, the broadband light source will determine the
actual content of the spectrum, but it is important that the step size be small enough to accommodate the
highest frequency components of the source, otherwise they will be folded into lower frequencies, which is
known as 'aliasing'. The step size for an FT-Raman instrument must be roughly ten times smaller than that in
the IR, which is one of the reasons that FT-Raman studies are usually done with near-IR excitation.
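
These two sampling rules are easy to evaluate. The sketch below computes the maximum optical delay needed for a given resolution and the largest step size allowed by the Nyquist limit; the function names are hypothetical, and the 2000 and 20 000 cm⁻¹ upper limits are representative values for IR and visible light rather than figures from the text.

```python
def max_delay_cm(resolution_cm1):
    """Maximum optical delay (cm) needed for a given spectral resolution (cm^-1)."""
    return 1.0 / resolution_cm1


def max_step_cm(highest_wavenumber_cm1):
    """Largest optical delay step (cm) that avoids aliasing (Nyquist limit)."""
    return 1.0 / (2.0 * highest_wavenumber_cm1)


print(max_delay_cm(0.1))  # scan length for 0.1 cm^-1 resolution

step_ir = max_step_cm(2000.0)        # assumed highest IR frequency
step_visible = max_step_cm(20000.0)  # assumed highest visible frequency
print(step_ir * 1.0e4, "um;", step_visible * 1.0e4, "um")
```

With these assumed limits the visible step is ten times smaller than the IR step, in line with the factor quoted above.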


There are several advantages of FT techniques [23]. One is the Jacquinot (or throughput) advantage. Since
there are fewer optical elements and no slits, more power reaches the detector. Assuming that the noise source
is detector noise, which is true in the IR, but not necessarily for visible and UV, the signal-to-noise (S/N) ratio
will increase. A second advantage of FT techniques is that they benefit from the Fellgett (or multiplex)
advantage. That is, when collecting the data, absorptions at all frequencies simultaneously contribute to the
interferogram. This is contrasted with a grating spectrometer where the absorption is measured only at a single
frequency at any given time. Theoretically, this results in an increase in the S/N ratio by a factor of

$$\sqrt{N}, \qquad N = \frac{\bar{\nu}_{\max} - \bar{\nu}_{\min}}{\Delta\bar{\nu}},$$

where $N$ is the number of resolution elements in the spectrum. This assumes that both spectra have the same
resolution, and that it takes the same amount of time to collect the whole interferogram as is required to obtain
one wavelength on the dispersive instrument (which is usually a reasonable assumption). Thus, $N$ interferograms
can be obtained and averaged together in the same amount of time it takes to scan the spectrum with the
dispersive instrument. Since the S/N ratio scales with the square root of the number of scans averaged, the
square root of this number is the actual increase in S/N ratio.
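
As a worked example of the multiplex advantage, the sketch below takes the number of interferograms that fit into one dispersive scan to be the number of resolution elements, (spectral range)/(resolution), and returns its square root. That identification, the function name and the 400-4000 cm⁻¹ range are assumptions made for illustration.

```python
import math


def multiplex_gain(nu_min_cm1, nu_max_cm1, resolution_cm1):
    """Ideal S/N gain: square root of the number of resolution elements."""
    n_elements = (nu_max_cm1 - nu_min_cm1) / resolution_cm1
    return math.sqrt(n_elements)


# Assumed mid-IR survey spectrum: 400-4000 cm^-1 at 1 cm^-1 resolution.
print(multiplex_gain(400.0, 4000.0, 1.0))
```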


B1.2.4 TYPICAL EXAMPLES 

There are thousands of scientists whose work can be classified as vibrational spectroscopy. The following 
examples are meant to show the breadth of the field, but cannot be expected to constitute a complete 
representation of all the fields where vibrational spectroscopy is important. 

B1.2.4.1 LASER IR

For the highest resolution and sensitivity, laser-based spectrometers must be used. These have the advantage
that the resolution depends on the linewidth of the laser, rather than the monochromator. Furthermore, at any
given moment, all of the power is at the frequency of interest, rather than being spread out over the whole IR
spectrum. Because the emission from any given laser typically has only 100-1500 cm⁻¹ of
tunability, and there can be difficulty maintaining narrow linewidths, the spectral coverage can be limited.

High resolution spectroscopic measurements in the gas phase yield the most detailed structural information
possible. For example, measurements of weakly-bound complexes in the far-IR [24, 25] and IR [26, 27] have
provided the most exact information on their structure and steady-state dynamics. Of course, a much higher
level of theory must be used than was presented in section B1.2.2.1. Quite often the modes are so strongly
coupled that all vibrational degrees of freedom must be treated simultaneously. Coriolis coupling and
symmetry-allowed interactions among bands, i.e. Fermi resonances, are also quite significant, and must be
treated explicitly. Direct measurement of the low frequency van der Waals modes in weakly-bound complexes
has been discussed in section B1.4, Microwave and Terahertz Spectroscopy, and will not be repeated here.

A recent review of high-resolution, direct IR laser absorption spectroscopy in supersonic slit jets provides a
prototypical example [26]. Figure B1.2.8 displays the experimental set-up. There are three different IR
sources employed. The first utilizes difference frequency mixing of a single mode tunable ring dye laser with
a single mode Ar⁺ laser in a LiNbO₃ crystal to obtain light in the 2-4 μm region (5000-2500 cm⁻¹) with a
frequency stability of 2 MHz (6.7×10⁻⁵ cm⁻¹). The second uses cryogenically cooled, single mode, lead-salt
diodes to cover 4-10 μm (2500-1000 cm⁻¹) with a frequency resolution of 15 MHz (5×10⁻⁴ cm⁻¹). The third
is a difference frequency scheme between a single mode dye laser and a single mode Nd:YAG laser which
accesses wavelengths below 2 μm (frequencies greater than 5000 cm⁻¹). A long pathlength is achieved by
combining a slit jet expansion with a multipass cell. This enhances the sensitivity by over two orders of
magnitude.
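
The unit conversions used in this description can be checked with a few lines of code. The sketch below converts frequencies in MHz and wavelengths in μm to wavenumbers; the function names are hypothetical.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def mhz_to_wavenumber(frequency_mhz):
    """Convert a frequency (or linewidth) in MHz to cm^-1."""
    return frequency_mhz * 1.0e6 / C_CM_PER_S


def um_to_wavenumber(wavelength_um):
    """Convert a wavelength in micrometres to cm^-1."""
    return 1.0e4 / wavelength_um


print(mhz_to_wavenumber(2.0))   # stability of the difference frequency source
print(mhz_to_wavenumber(15.0))  # resolution of the lead-salt diode source
print(um_to_wavenumber(2.0), um_to_wavenumber(4.0), um_to_wavenumber(10.0))
```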

Figure B1.2.8. Schematic diagram of a slit jet spectrometer for high resolution IR studies of weakly-bound
species. From [26], used with permission.

One of the systems studied is the Ar-HF van der Waals (vdW) dimer. This rare gas-molecule complex has
provided a wealth of information regarding intermolecular interactions. The vdW complex can be thought of
in terms of the monomer subunit being perturbed through its interaction with the rare gas. The rotational
motion is hindered, and for certain modes can be thought of as extremely large amplitude bending motion.
Furthermore, there is low frequency intermolecular stretching motion. In a weakly-bound complex, the bends,
stretches and internal rotations are all coupled with each other. The large spectral coverage has allowed for the
measurement of spectra of Ar-HF with $v_{\mathrm{HF}} = 1$ and 2. By combining these measurements with those made in
the far-IR [28], intermolecular potential energy surfaces for $v_{\mathrm{HF}} = 0$, 1 and 2 have been determined. Quite
small amounts of internal energy (~150 cm⁻¹) allow the complex to fully sample the angular degree of freedom.
Interestingly, the barrier to internal rotation is about the same for $v_{\mathrm{HF}} = 0$ or 1, but significantly larger when
$v_{\mathrm{HF}} = 2$.


In addition to the dependence of the intermolecular potential energy surface on monomer vibrational level, the
red-shifting of the monomer absorption as a function of the number of rare gas atoms in the cluster has been
studied. The band origin for the $v_{\mathrm{HF}} = 1 \leftarrow 0$ vibration in a series of clusters Ar$_n$-HF, with $0 < n < 5$, was
measured and compared to the HF vibrational frequency in an Ar matrix ($n = \infty$). The monomer vibrational
frequency $\nu_{\mathrm{HF}}$ red shifts monotonically, but highly nonlinearly, towards the matrix value as sequential Ar
atoms are added. Indeed, roughly 50% of the shift is already accounted for by $n = 3$.


CAVITY RINGDOWN SPECTROSCOPY 

The relatively new technique of cavity ringdown laser absorption spectroscopy, or CRLAS [29, 30, 31 and 
32], has proven to be exceptionally sensitive in the visible region of the spectrum. Recently, it has been used 
in the IR to measure O-H and O-D vibrations in weakly-bound water and methanol clusters [33]. The concept 
of cavity ringdown is quite straightforward. If a pulse of light is injected into a cavity composed of two very 
highly reflective mirrors, only a very small fraction will escape upon reflection from either mirror. If mirrors 
with 99.996% reflectivity are used, then the photons can complete up to 15 000 round trip passes, depending
on other loss factors in the cavity. This makes the effective pathlength up to 30 000 times longer than the physical
pathlength of the sample. Currently, dielectric coatings for IR wavelengths are not as efficient as those for the 
visible, and the highest reflectivity available is about 99.9-99.99%, which leads to sensitivity enhancements 
of several hundred to several thousand. It is best to use pulses with a coherence length that is less than the 
cavity dimensions in order to avoid destructive interference and cancellation of certain frequencies. 

The light leaking out of the cavity will decay exponentially, with a time constant that reflects the round-trip 
losses. When an absorbing sample is placed in the cavity, there are additional losses and the exponential time 
constant will become shorter. More highly absorbing samples will affect the time constant to a larger extent,
so the absolute absorption can be determined. The experiment is shown schematically in figure B1.2.9. One of
the most important attributes of CRLAS is that it is relatively insensitive to laser pulse intensity fluctuations 
since the ringdown time constant, not the transmitted intensity, is measured. 
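
The relation between the ringdown time constant and the absorption can be sketched as follows. The example assumes the commonly used small-loss expression $\tau = L / \{c[(1 - R) + \alpha L]\}$ for a cavity of length $L$ that is completely filled by a sample with absorption coefficient $\alpha$; the 50 cm cavity length, the value of $\alpha$ and the function names are hypothetical.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def ringdown_time(length_cm, reflectivity, alpha_per_cm=0.0):
    """Ringdown time constant (s) for a cavity filled with an absorber.

    Assumes small losses per pass: mirror loss (1 - R) plus sample loss alpha * L.
    """
    loss_per_pass = (1.0 - reflectivity) + alpha_per_cm * length_cm
    return length_cm / (C_CM_PER_S * loss_per_pass)


def absorption_coefficient(tau_sample_s, tau_empty_s):
    """Absolute absorption coefficient (cm^-1) from two ringdown times."""
    return (1.0 / tau_sample_s - 1.0 / tau_empty_s) / C_CM_PER_S


# Hypothetical 50 cm cavity with 99.996% mirrors and a weak absorber.
tau_empty = ringdown_time(50.0, 0.99996)
tau_sample = ringdown_time(50.0, 0.99996, alpha_per_cm=1.0e-7)

print(tau_empty * 1.0e6, "us (empty cavity)")
print(tau_sample * 1.0e6, "us (with sample)")
print(absorption_coefficient(tau_sample, tau_empty), "cm^-1 recovered")
```

Only the two decay times enter the result, which is why pulse-to-pulse intensity fluctuations do not affect the measured absorption.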

Figure B1.2.9. Schematic representation of the method used in cavity ringdown laser absorption
spectroscopy. From [33], used with permission.

To date, the IR-CRLAS studies have concentrated on water clusters (both H₂O and D₂O), and methanol
clusters. Most importantly, these studies have shown that it is in fact possible to carry out CRLAS in the IR.
In one study, water cluster concentrations in the molecular beam source under a variety of expansion
conditions were characterized [34]. In a second study OD stretching bands in (D₂O)ₙ clusters were measured
[35]. These bands occur between 2300 and 2800 cm⁻¹, a spectral region that has been largely inaccessible with
other techniques. Cooperative effects in the hydrogen-bonded network were measured, as manifested in red
shifts of OD stretches for clusters up to (D₂O)₈. These data for additional isotopes of water are necessary to
fully characterize the intermolecular interactions.


For methanol clusters [36], it was found that the dimer is linear, while clusters of 3 and 4 molecules exist as 
monocyclic ring structures. There also is evidence that there are two cyclic ring trimer conformers in the 
molecular beam. 

B1.2.4.2 RESONANCE RAMAN SPECTROSCOPY


The advantages of resonance Raman spectroscopy have already been discussed in section B1.2.2.3. For these
reasons it is rapidly becoming the method of choice for studying large molecules in solution. Here we will
present one study that exemplifies its attributes. There are two complementary methods for studying proteins.
First, it is possible to excite a chromophore corresponding to the active site, and determine which modes
interact with it. Second, by using UV excitation, the amino acids with aromatic rings (tryptophan and tyrosine,
and a small contribution from phenylalanine) can be selectively excited [4]. The frequency shifts in the
resonance Raman spectrum associated with them provide information on their environment.


There has been extensive work done on myoglobin, haemoglobin, cytochrome c, rhodopsin and
bacteriorhodopsin. In fact, there are literally hundreds of articles on each of the above subjects. Here we will
consider haemoglobin [12]. The first three of these examples are based on the protohaeme unit, shown in
figure B1.2.10.

Figure B1.2.10. Structure of the protohaeme unit found in haemoglobin and myoglobin.

Haemoglobin is made up of four protohaeme subunits; two are designated α and two are designated β. The
protohaeme unit with surrounding protein is shown in figure B1.2.11. The binding curve of O₂ suggests a two
state model, where haemoglobin binds O₂ more strongly when the concentration is higher. Thus, it will
strongly bind O₂ in the lungs where the concentration is high, and then release it in tissues where the O₂
concentration is low. The high affinity and low affinity states are referred to as the R state and T state,
respectively [37]. The R and T states each have a distinct structure, and have been characterized
crystallographically. Therefore, there are three primary issues:


(1) What is the behaviour of the haeme macrocycles? 

(2) What is the interaction between the iron and the nearby histidines? 

(3) What are the structural dynamics of the tetramer as a whole?

The haemoglobin bound to O₂ (HbO₂) is not photoactive, so the CO adduct, HbCO, is used instead.


Figure B1.2.11. Biologically active centre in myoglobin or one of the subunits of haemoglobin. The bound
CO molecule as well as the proximal and distal histidines are shown in addition to the protohaeme unit. From
Rousseau D L and Friedman J M 1988 Biological Applications of Raman Spectroscopy vol 3, ed T G Spiro
(New York: Wiley). Reprinted by permission of John Wiley and Sons Inc.

Information about the haeme macrocycle modes is obtained by comparing the resonance Raman spectra of
deoxyHb with HbCO. The d-d transitions of the metal are too weak to produce large enhancement, so the
Soret band of the macrocycle, a π to π* transition, is excited instead. It has been found that the vinyl groups
do not participate in the conjugated system [4]. This is based on the fact that the vinyl C=C stretch does not
exhibit resonant enhancement when the π to π* transition is excited. On the other hand, it is found that both
totally symmetric and nontotally symmetric modes of the macrocycle are all at a lower frequency in the
HbCO photoproduct spectra relative to deoxyHb. This is interpreted to mean that the photoproduct has a
slightly expanded core relative to the deoxy structure [12, 13]. Given that there is a structural change in the
haeme centre, it is expected that the interaction of the iron with the proximal histidine should also be affected.

The Fe-N(His) mode is at 222 cm⁻¹ in the R state and 207 cm⁻¹ in the T state for the α subunits, but is only
shifted to 218 cm⁻¹ in the T state for the β subunits. This is consistent with the interpretation that the
Fe-imidazole interactions are weakened more in the T state of the α subunits than in that of the β subunits.
Time-resolved resonance Raman studies have shown that the R → T switch is complete on a 10 μs timescale [38].
Finally, UV excitation of the aromatic protein side chains yields frequency shifts that indicate a change in
the quaternary structure, and this occurs on the same timescale as the frequency shifts in the Fe-N modes.


B1.2.4.3 TIME-RESOLVED SPECTROSCOPY 


Time-resolved spectroscopy has become an important field from x-rays to the far-IR. Both IR and Raman 
spectroscopies have been adapted to time-resolved studies. There have been a large number of studies using 
time-resolved Raman [39], time-resolved resonance Raman [7] and higher order two-dimensional Raman 
spectroscopy (which can provide coupling information analogous to two-dimensional NMR studies) [40]. 
Time-resolved IR has probed neutrals and ions in solution [41, 42], gas phase kinetics [43] and vibrational 
dynamics of molecules chemisorbed and physisorbed to surfaces [44]. Since vibrational frequencies are very 
sensitive to the chemical environment, pump-probe studies with IR probe pulses allow structural changes to
be monitored as a reaction proceeds.

As an illustrative example, consider the vibrational energy relaxation of the cyanide ion in water [45]. The 
mechanisms for relaxation are particularly difficult to assess when the solute is strongly coupled to the 
solvent, and the solvent itself is an associating liquid. Therefore, precise experimental measurements are 
extremely useful. By using a diatomic solute molecule, this system is free from complications due to coupling
of vibrational modes in polyatomics. Furthermore, the relatively low frequency stretch of roughly 2000 cm⁻¹
couples strongly to the internal modes of the solvent.

Infrared pulses of 200 fs duration with 150 cm⁻¹ of bandwidth centred at 2000 cm⁻¹ were used in this study.
They were generated in a two-step procedure [46]. First, a β-BaB₂O₄ (BBO) OPO was used to convert the
800 nm photons from the Ti:sapphire amplifier system into signal and idler beams at 1379 and 1905 nm,
respectively. These two pulses were sent through a difference frequency crystal (AgGaS₂) to yield pulses
centred at 2000 cm⁻¹. A 32-element array detector was used to simultaneously detect the entire bandwidth of
the pulse [45].
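
The wavelengths quoted above can be checked against photon energy conservation. The following is a minimal sketch, assuming vacuum wavelengths and ideal conversion; the function and variable names are illustrative and not taken from [45] or [46].

```python
# Energy bookkeeping for the two-step mid-IR generation described above.
NM_PER_CM = 1.0e7  # 1 cm = 1e7 nm


def wavenumber_cm1(wavelength_nm):
    """Convert a vacuum wavelength in nm to a wavenumber in cm^-1."""
    return NM_PER_CM / wavelength_nm


signal = wavenumber_cm1(1379.0)  # OPO signal beam
idler = wavenumber_cm1(1905.0)   # OPO idler beam

# OPO step: signal and idler photon energies should add up to the pump photon energy.
pump_nm = NM_PER_CM / (signal + idler)

# Difference frequency step: the mid-IR output is the signal-idler difference.
mid_ir = signal - idler

print(f"pump wavelength implied by signal + idler: {pump_nm:.1f} nm")
print(f"difference frequency: {mid_ir:.0f} cm^-1")
```

The implied pump wavelength comes out within a fraction of a nanometre of 800 nm, and the difference frequency within a few wavenumbers of 2000 cm⁻¹, consistent with the values in the text.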

Two isotopic combinations of CN⁻ (¹³C¹⁵N⁻, which has a stretching frequency of 2004 cm⁻¹, and ¹²C¹⁴N⁻,
with a stretching frequency of 2079 cm⁻¹) in both H₂O and D₂O yield a range of relaxation times. In
particular, it is found that the vibrational relaxation time decreases from 120 to 71 ps in D₂O as the vibrational
frequency increases from 2004 to 2079 cm⁻¹. However, in H₂O, the relaxation time is roughly 30 ps for both
isotopomers. The vibrational relaxation rate is highly correlated to the IR absorption cross section of the
solvent at the frequency of solute vibration, which indicates that Coulombic interactions have a dominant role
in the vibrational relaxation of CN⁻.
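
The size of the isotope shift can be rationalized with a simple estimate. The sketch below assumes a harmonic diatomic oscillator with an isotope-independent force constant, so that the frequency scales as the inverse square root of the reduced mass; the atomic masses are rounded tabulated values and the helper names are illustrative. Anharmonicity and solvent effects are ignored.

```python
import math

# Atomic masses in unified atomic mass units (rounded tabulated values).
MASS = {"12C": 12.000000, "13C": 13.003355, "14N": 14.003074, "15N": 15.000109}


def reduced_mass(m1, m2):
    """Reduced mass of a diatomic oscillator."""
    return m1 * m2 / (m1 + m2)


def scaled_frequency(nu_ref, mu_ref, mu_new):
    """Harmonic oscillator with a fixed force constant: nu scales as 1/sqrt(mu)."""
    return nu_ref * math.sqrt(mu_ref / mu_new)


mu_light = reduced_mass(MASS["12C"], MASS["14N"])
mu_heavy = reduced_mass(MASS["13C"], MASS["15N"])

# Start from the quoted 2079 cm^-1 stretch of the light isotopomer.
nu_heavy = scaled_frequency(2079.0, mu_light, mu_heavy)
print(f"harmonic estimate for the heavy isotopomer: {nu_heavy:.0f} cm^-1 (quoted: 2004 cm^-1)")
```

The harmonic estimate lands within a couple of wavenumbers of the quoted 2004 cm⁻¹, which suggests that the 75 cm⁻¹ tuning range of the solute frequency is essentially a pure mass effect.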

B1.2.4.4 ACTION SPECTROSCOPY

The term 'action spectroscopy' refers to those techniques that do not directly measure the absorption, but 
rather the consequence of photoabsorption. That is, there is some measurable change associated with the 
absorption process. There are several well known examples, such as photoionization spectroscopy [47],
multi-photon ionization spectroscopy [48], photoacoustic spectroscopy [49], photoelectron spectroscopy [50, 51],
vibrational predissociation spectroscopy [52] and optothermal spectroscopy [53, 54]. These techniques have 
all been applied to vibrational spectroscopy, but only the last one will be discussed here. 

Optothermal spectroscopy is a bolometric method that monitors the energy in a stream of molecules rather 
than in the light beam. A well collimated molecular beam is directed toward a liquid helium cooled bolometer. 
There will be energy deposited in the bolometer from the translational kinetic energy of the molecules as well
as any internal
energy they may have. A narrow linewidth (2 MHz) infrared laser illuminates the molecular beam, typically 
in a multipass geometry. As the laser frequency is scanned the molecules will absorb energy when the 
frequency corresponds to a transition frequency. At that point, the energy deposited in the bolometer will 
change. 

The optothermal spectrum will faithfully represent the absorption spectrum provided that the molecules do not
fluoresce prior to arrival at the detector. Fluorescence is not a problem because infrared fluorescence lifetimes
are of the order of milliseconds, whereas the transit time from the laser-molecular beam interaction region to
the detector is tens of μs. Furthermore, it is possible to determine if the absorbing species is a stable monomer
or a weakly bound cluster. When a stable monomer absorbs a photon, the amount of energy measured by the
bolometer increases. When a weakly bound species absorbs a photon with energy greater than the dissociation
energy, vibrational predissociation will take place and the dissociating fragments will not hit the bolometer
element, leading to a decrease in energy measured by the bolometer. It is also possible to place the bolometer
off-axis from the collimated molecular beam so that only dissociating molecules will register a signal.
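
The argument that fluorescence can be neglected can be made quantitative with a rough estimate. The sketch below assumes single-exponential radiative decay, and the specific numbers (a 30 μs flight time and a 1 ms lifetime) are illustrative choices within the ranges given above, not measured values.

```python
import math


def fraction_emitted(transit_time_s, lifetime_s):
    """Fraction of excited molecules that radiate before reaching the bolometer,
    assuming single-exponential decay of the excited population."""
    return 1.0 - math.exp(-transit_time_s / lifetime_s)


# Assumed illustrative values: 30 us flight time, 1 ms radiative lifetime.
loss = fraction_emitted(30e-6, 1e-3)
print(f"fraction of excitation lost to fluorescence: {loss:.1%}")
```

With these assumed values only a few per cent of the absorbed energy is radiated away in flight, so the bolometer signal tracks the absorption closely.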

A nice example of this technique is the determination of vibrational predissociation lifetimes of (HF)₂ [55].
The HF dimer has a nonlinear hydrogen bonded structure, with nonequivalent HF subunits. There is one 'free'
HF stretch (ν₁), and one 'bound' HF stretch (ν₂), which rapidly interconvert. The vibrational predissociation
lifetime was measured to be 24 ns when exciting the free HF stretch, but only 1 ns when exciting the bound 
HF stretch. This makes sense, as one would expect the bound HF vibration to be most strongly coupled to the 
weak intermolecular bond. 

B1.2.4.5 MICROSCOPY

It is possible to incorporate a Raman or IR spectrometer within a confocal microscope. This allows the spatial
resolution of the microscope and the compound identification of vibrational spectroscopy to be realized
simultaneously. One reason that this is a relatively new development is the tremendous volume of data
generated. For example, if a Raman microscope has roughly 1 μm spatial resolution, an area of
100 μm × 100 μm is to be imaged, and the frequency region from 800 to 3400 cm⁻¹ is covered with
4 cm⁻¹ spectral resolution, then the data set has roughly 6 million elements. Assuming each value is
represented with a 4 byte number, the image would require about 24 MB of storage space. While this is not a
problem for current computers, the capacity of a typical hard drive on a PC from around 1985 (IBM 8088) was
only 20 MB. Also, rapid data transfer is needed to archive and retrieve images. Furthermore, in order to obtain
the spectrum at any spatial position, array detectors (or FT methods) are required. A representative
experimental set-up is shown in figure B1.2.12 [3].

Figure B1.2.12. Schematic diagram of apparatus for confocal Raman microscopy. From [3], used with
permission.
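
The data-volume estimate for a Raman image can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name and its parameters are illustrative, and one spectral point per resolution element is assumed.

```python
def raman_image_size(side_um, pixel_um, nu_min_cm1, nu_max_cm1, resolution_cm1,
                     bytes_per_value=4):
    """Return (number of data elements, storage in bytes) for a square spectral image,
    assuming one spectral point per resolution element at every pixel."""
    pixels_per_side = round(side_um / pixel_um)
    spectral_points = round((nu_max_cm1 - nu_min_cm1) / resolution_cm1)
    elements = pixels_per_side ** 2 * spectral_points
    return elements, elements * bytes_per_value


elements, n_bytes = raman_image_size(100, 1, 800, 3400, 4)
print(f"{elements / 1e6:.1f} million elements, {n_bytes / 1e6:.0f} MB")
```

Under these assumptions the count is 10 000 pixels times 650 spectral points, that is 6.5 million elements, which the text rounds to about 6 million before converting to bytes.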

Raman microscopy is more developed than its IR counterpart. There are several reasons for this. First, the 
diffraction limit for focusing a visible beam is about 10 times smaller than that for an IR beam. Second, Raman
spectroscopy can be done in a backscattering geometry, whereas IR is best done in transmission. A
microscope is most easily adapted to a backscattering geometry, although transmission is also possible.
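
The factor of about 10 follows directly from the wavelength ratio. As a hedged illustration, the sketch below uses the Abbe estimate d = λ/(2 NA) for the smallest resolvable feature; the wavelengths (0.5 μm and 5 μm) and the numerical aperture of 0.9 are assumed for illustration and are not taken from the text.

```python
def abbe_limit_um(wavelength_um, numerical_aperture):
    """Abbe estimate of the diffraction-limited resolution, d = lambda / (2 NA)."""
    return wavelength_um / (2.0 * numerical_aperture)


# Assumed illustrative values: green visible light vs. a mid-IR beam, same objective NA.
visible = abbe_limit_um(0.5, 0.9)
infrared = abbe_limit_um(5.0, 0.9)
ratio = infrared / visible
print(f"visible: {visible:.2f} um, IR: {infrared:.2f} um, ratio: {ratio:.0f}")
```

For a fixed numerical aperture the ratio of the two limits is simply the ratio of the wavelengths, independent of the objective chosen.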


Raman microscopy is particularly adept at providing information on heterogeneous samples, where a
conventional spectrometer would average over many domains. It has found applications in materials science
and catalysis; earth, planetary and environmental sciences; biological and medical sciences; and even in art
history and forensic science [3]. For example, consider a hypothetical situation where someone owned what
they believed to be a mediaeval manuscript from the 12th century. One could easily identify a small region of
green colour to non-destructively analyse using Raman microscopy. If it happens that the dye is identified as
phthalocyanine green, which was not discovered until 1938, then one can be certain that the manuscript is not
authentic, or that it has undergone restoration relatively recently.


B1.2.5 CONCLUSIONS AND FUTURE PROSPECTS 

Vibrational spectroscopy has been, and will continue to be, one of the most important techniques in physical 
chemistry. In fact, the vibrational absorption of a single acetylene molecule on a Cu(100) surface was recently 
reported [56]. Its endurance is due to the fact that it provides detailed information on structure, dynamics and 
environment. It is employed in a wide variety of circumstances, from routine analytical applications, to 
identifying novel (often transient) species, to providing some of the most important data for advancing the 
understanding of intramolecular and intermolecular interactions. 

B1.3 Raman spectroscopy

Darin J Ulness, Jason C Kirkwood and A C Albrecht 


B1.3.1 INTRODUCTION 

Light, made monochromatic and incident upon a sample, is scattered and transmitted at the incident
frequency, a process known as Rayleigh scattering. Early in 1928 C V Raman and K S Krishnan reported [1,
2 and 3] the visual discovery of a new form of secondary radiation. Sunlight, passed through a blue-violet
filter, was used as the light source incident upon many different organic liquids and even vapours. A green 
filter was placed between the sample and the viewer. The filters were sufficiently complementary to suppress 
the strong Rayleigh scattering and leave at longer wavelengths a feeble new kind of scattered radiation. 
Impurity fluorescence was discounted, for the signal was robust under purification and it also was strongly 
polarized. It was suggested that the incident photons had undergone inelastic scattering with the material — 
much as the Compton scattering of x-rays by electrons. Almost simultaneously, G Landsberg and L 
Mandelstam [4] reported a new kind of secondary radiation from crystalline quartz illuminated by the lines of 
the mercury vapour lamp. Spectrograms revealed a feeble satellite line to the red of each of the Rayleigh 
scattered mercury lines. In each case the displacement came close to a characteristic vibrational frequency of
quartz at ~480 cm⁻¹. They wondered whether their new secondary radiation might not be of the same type as
that seen by Krishnan and Raman. Such inelastic scattering of photons soon came to be called (spontaneous) 
Raman scattering or, given its quantized energy displacements, simply (spontaneous) Raman spectroscopy 
[5]. 
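
To give a feeling for the size of such a displacement, the sketch below converts a Raman shift into the wavelength of the Stokes (red-shifted) and anti-Stokes (blue-shifted) satellites. The 435.8 nm mercury line is chosen here purely for illustration; the text does not state which mercury lines were used, and the function names are hypothetical.

```python
NM_PER_CM = 1.0e7  # 1 cm = 1e7 nm


def stokes_wavelength_nm(excitation_nm, shift_cm1):
    """Wavelength of the Stokes line: the scattered photon loses shift_cm1 of energy."""
    return NM_PER_CM / (NM_PER_CM / excitation_nm - shift_cm1)


def anti_stokes_wavelength_nm(excitation_nm, shift_cm1):
    """Wavelength of the anti-Stokes line: the scattered photon gains shift_cm1."""
    return NM_PER_CM / (NM_PER_CM / excitation_nm + shift_cm1)


# Assumed excitation: the blue mercury line near 435.8 nm, with a 480 cm^-1 shift.
stokes = stokes_wavelength_nm(435.8, 480.0)
anti_stokes = anti_stokes_wavelength_nm(435.8, 480.0)
print(f"Stokes: {stokes:.1f} nm, anti-Stokes: {anti_stokes:.1f} nm")
```

Under this assumption the satellite lies less than 10 nm to the red of the exciting line, which illustrates why a spectrograph was needed to resolve it from the much stronger Rayleigh line.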

The 70 years since these first observations have witnessed dramatic developments in Raman spectroscopy, 
particularly with the advent of lasers. By now, a large variety of Raman spectroscopies have appeared, each 
with its own acronym. They all share the common trait of using high energy ('optical') light to probe small 
energy level spacings in matter. 

This chapter is aimed at any scientist who wishes to become acquainted with this broad and interesting field. 
At the start we outline and present the principles and modern theoretical structure that unifies the many 
versions of Raman spectroscopies currently encountered. Then we sketch, briefly, individual examples from 
the contemporary literature of many of the Raman spectroscopies, indicating various applications. Though the 
theoretical structure is intended to stand on its own when discussing any one subject, there is no pretence of 
completeness, nor depth — this is not a review. But it is hoped that through the selected citations the interested 
reader is easily led into the broader literature on any given topic — including investigations yet to appear. 

The study of small energy gaps in matter using the 'optical' spectral region (say the near-IR, visible and UV) 
offers many advantages over direct one-photon spectroscopies in the IR, far IR or even the microwave. First,
it is instrumentally convenient. Second, one can readily avoid the problem of absorbing solvents when
studying dissolved solutes. Finally, in the optical region, additional strong resonances (usually involving 
electronic transitions) are available which can enhance the Raman scattered intensity by many orders of 
magnitude. This has created the very lively field of resonance Raman spectroscopy — arguably one of the most 
popular current Raman based spectroscopies. Resonance Raman spectroscopy not only provides greatly 
amplified signals from specific Raman resonances in the ground state, but it also exposes useful properties of 
the upper electronic potential energy hypersurface that is reached in the optical resonance. 


In general, the coupling of light with matter has as its leading term the electric field component of the 
electromagnetic (EM) radiation — extending from the microwave into the vacuum ultraviolet. In the optical 
region (or near optical), where the Raman spectroscopies are found, light fields oscillate in the 'petahertz'
range (of the order of 10¹⁵ cycles per second). So for technical reasons the signals are detected as photons
('quadrature in the field'), not as the oscillating field itself. As in any spectroscopy, Raman spectroscopies
measure eigenvalue differences, dephasing times (through the bandwidths, or through time-resolved 
measurements) and quantitative details concerning the strength of the light/matter interaction — through the 
scattering cross-sections obtained from absolute Raman intensities. The small energy gap states of matter that 
are explored in Raman spectroscopy include phonon-like lattice modes (where Brillouin scattering can be the 
Raman scattering equivalent), molecular rotations (rotational Raman scattering), internal vibrations of 
molecules (vibrational Raman scattering) and even low lying electronic states (electronic Raman scattering). 
Also spin states may be probed through the magnetic component of the EM field. With the introduction of 
lasers, the Raman spectroscopies have been brought to a new level of sensitivity as powerful analytic tools for 
probing samples from the microscopic level (microprobes) to remote sensing. Not only have lasers inspired 
the development of new techniques and instrumentation, but they also have spawned more than 25 new kinds 
of Raman spectroscopy. As we shall see, this growth in experimental diversity has been accompanied by
increasingly comprehensive theoretical understanding of all the spectroscopies.

The many Raman spectroscopies are found as well defined subgroups among the 'electric field' 
spectroscopies in general. (In this chapter, the magnetic field spectroscopies are mentioned only in passing.) 
First, except for very high intensities, the energy of interaction of light with matter is sufficiently weak to 
regard modern spectroscopies as classifiable according to perturbative orders in the electric field of the light. 
Thus any given spectroscopy is regarded as being linear or nonlinear with respect to the incident light fields. 
In another major classification, a given spectroscopy (linear or nonlinear) is said to be active or passive [6, 7].
The active spectroscopies are those in which the principal event is a change of state population in the material. 
In order to conserve energy this must be accompanied by an appropriate change of photon numbers in the 
light field. Thus net energy is transferred between light and matter in a manner that survives averaging over 
many cycles of the perturbing light waves. We call these the Class I spectroscopies. They constitute all of the 
well known absorption and emission spectroscopies — whether they are one-photon or multi-photon. The 
passive spectroscopies, called Class II, arise from the momentary exchange of energy between light and 
matter that induces a macroscopically coherent, oscillating, electrical polarization (an oscillating electric 
dipole density wave) in the material. As long as this coherence is sustained, such polarization can serve as a 
source term in the wave equation for the electric field. A new (EM) field (the signal field) is produced at the 
frequency of the oscillating polarization in the sample. Provided the polarization wave retains some 
coherence, and matches the signal field in direction and wavelength (a condition called 'phase matching'), the 
new EM field can build up and escape the sample and ultimately be measured 'in quadrature' (as photons). In 
their extreme form, when no material resonances are operative, the Class II events ('spectroscopies' is a 
misnomer in the absence of resonances!) will alter only the states of the EM radiation and none of the 
material. In this case, the material acts passively while 'catalysing' alterations in the radiation. When 
resonances are present, Class II events become spectroscopies; some net energy may be transferred between 
light and matter, even as one focuses experimentally not on population changes in the material, but on 
alterations of the radiation. Class II events include all of the resonant and nonresonant, linear and nonlinear 
dispersions. Examples of Class II spectroscopies are classical diffraction and reflection (strongest at the linear 
level) and a whole array of light-scattering phenomena such as frequency summing (harmonic generation),
frequency differencing, free induction decay and optical echoes.

The Class II (passive) spectroscopies

(i) may or may not contain resonances,

(ii) can appear at all orders of the incident field (but only at odd order for isotropic media (gases, liquids,
amorphous solids)),

(iii) have a cross-section that is quadratic in concentration when the signal wave is homodyne detected and

(iv) require phase-matching through experimental design, because the signal field must be allowed to build
up from the induced polarization over macroscopic distances.

In table B1.3.1 and table B1.3.2, we assemble the more than twenty-five Raman spectroscopies and order
them according to their degree of nonlinearity and by their class (table B1.3.1 for Class I, table B1.3.2 for
Class II).

A diagrammatic approach that can unify the theory underlying these many spectroscopies is presented. The 
most complete theoretical treatment is achieved by applying statistical quantum mechanics in the form of the 
time evolution of the light/matter density operator. (It is recommended that anyone interested in advanced 
study of this topic should familiarize themselves with density operator formalism [8, 9, 10, 11 and 12]. Most
books on nonlinear optics [13, 14, 15, 16 and 17] and nonlinear optical spectroscopy [18, 19] treat this in 
much detail.) Once the density operator is known at any time and position within a material, its matrix in the 
eigenstate basis set of the constituents (usually molecules) can be determined. The ensemble averaged 
electrical polarization, P, is then obtained — the centrepiece of all spectroscopies based on the electric 
component of the EM field. 

Following the section on theory, the chapter goes on to present examples of most of the Raman spectroscopies 
that are organized in table B1.3.1 and table B1.3.2.

The Class I (active) spectroscopies, both linear and nonlinear [6, 7], 

(i) always require resonances, 

(ii) appear only at odd order in the incident fields, which are acting 'maximally in quadrature'. (That is, in 
polarizing the medium all but one of the fields act in pairs of Fourier components — also known as 'in 
quadrature' or as 'conjugate pairs'. When this happens, the electrical polarization must carry the same 
frequency, wavelength, and wave vector as the odd, unpaired field. Since this odd order polarization acts 
conjugately with the odd incident field, the 'photon' picture of the spectroscopy survives.), 

(iii) have a cross-section that is linear in concentration of the resonant species and 

(iv) do not require phase matching by experimental design, for it is automatic. 


B1.3.2 THEORY

The oscillating electric dipole density, P (the polarization), that is induced by the total incident electric field, 
E, is the principal property that generates all of the spectroscopies, both linear and nonlinear. The energy
contained in P may be used in part (or altogether) to shift the population of energy states of the material, or it 
may in part (or altogether) reappear in the form of a new EM field oscillating at the same frequency. When the 
population changes, or energy loss or gain in the light is detected, one is engaged in a Class I spectroscopy. 
On the other hand, when properties of the new field are being measured — such as its frequency, direction 
(wavevector), state of polarization and amplitude (or intensity) — one has a Class II spectroscopy. 


Normally the amplitude of the total incident field (or intensity of the incident light) is such that the 
light/matter coupling energies are sufficiently weak not to compete seriously with the 'dark' matter 
Hamiltonian. As already noted, when this is the case, the induced polarization, P is treated perturbatively in 
orders of the total electric field. Thus one writes 

$$P = P^{(1)} + P^{(2)} + P^{(3)} + \cdots = X^{(1)}E + X^{(2)}EE + X^{(3)}EEE + \cdots \qquad (\mathrm{B1.3.1})$$

where the successive terms clearly appear with increasing order of nonlinearity in the total field, E. At this
point, all properties appearing in equation (B1.3.1) are mathematically pure real. The response function of the
material to the electric field acting at sth order is the electrical susceptibility, $X^{(s)}$ (it is an s + 1 rank tensor).
Each element of this tensor will carry s + 1 subscripts, a notation that is used, understandably, only when
necessary. Furthermore, the events at, say the sth order, are sometimes referred to as 's + 1-wave-mixing',
the additional field being the new EM field derived from $P^{(s)}$.
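
The index structure mentioned above can be made explicit. As a sketch in Cartesian components (the labels a, b, c, d are introduced here for illustration and are not the chapter's notation), the third order term of the expansion would read:

```latex
P^{(3)}_{a} \;=\; \sum_{b,c,d} X^{(3)}_{abcd}\, E_{b}\, E_{c}\, E_{d},
\qquad a, b, c, d \in \{x, y, z\}
```

Each element of the third order susceptibility thus carries s + 1 = 4 subscripts: one for the component of the induced polarization and one for each of the three field actions.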

All nonlinear (electric field) spectroscopies are to be found in all terms of equation (B1.3.1) except for the
first. The latter exclusively accounts for the standard linear spectroscopies: one-photon absorption and
emission (Class I) and linear dispersion (Class II). For example, the term at third order contains by far the
majority of the modern Raman spectroscopies (table B1.3.1 and table B1.3.2).

It is useful to recognize that in the laboratory one normally configures an experiment to concentrate on one 
particular kind of spectroscopy, even while all possible light/matter events must be occurring in the sample. 
How can one isolate from equation (B1.3.1) a spectroscopy of interest? In particular, how does one uncover
the Raman spectroscopies? We shall see how passage to the complex mathematical representation of the
various properties is invaluable in this process. It is useful to start by addressing the issue of distinguishing
Class I and Class II spectroscopies at any given order of nonlinearity.

B1.3.2.1 CLASS I AND CLASS II SPECTROSCOPIES AND THE COMPLEX SUSCEPTIBILITIES

All Class I spectroscopies at any order can be exposed by considering the long-term exchange of energy
between light and matter as judged by the nonvanishing of the induced power density over many cycles of the
field. The instantaneous power density at sth order is given by $\{E \cdot \partial P^{(s)}/\partial t\}$. For this product to survive over
time, we ask that its normalized integral over a time T, which is much longer than the optical period of the
field, should not vanish. This is called the cycle averaged, sth order power density. It is expressed as

$$W^{(s)} = \frac{1}{T}\int_0^T \mathrm{d}t \left\{ E \cdot \frac{\partial P^{(s)}}{\partial t} \right\} \qquad (\mathrm{B1.3.2})$$


For $W^{(s)} > 0$, one has absorption; for $W^{(s)} < 0$, emission. Multiphoton absorption and emission fall into this
class. The Class I Raman spectroscopies clearly exhibit a net absorption of energy in Stokes scattering and a
net emission of energy in anti-Stokes scattering. Though $P^{(s)}$ involves s actions of the total field, the
light/matter energy exchange is always in the language of photons. A net energy in the form of photons is
destroyed (absorbed) as the quantum state population of the material moves upward in energy; a net energy in
the form of photons is created (emitted) as the population moves downward in energy. To survive the
integration, the instantaneous power density should not oscillate rapidly (if at all), certainly not at optical
frequencies. Since E is intended to consist entirely of fields that oscillate at optical frequencies, the power
density can have a non-oscillating term only when the field appears altogether an even number of times. Since
it appears s times in $P^{(s)}$ (equation (B1.3.1)) and once in E, all Class I spectroscopies exist only when
s + 1 is even, or s is odd (see equation (B1.3.2)).
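
The parity argument can be illustrated numerically. The sketch below makes a strong simplifying assumption, namely that every field factor is the same single-frequency cosine, and simply averages a product of n such factors over one optical period; it is meant only to show that a non-oscillating (surviving) component requires an even number of factors, not to model any particular spectroscopy.

```python
import math


def cycle_average(power, samples=10000):
    """Average of cos(phase)**power over one full period, using a uniform-grid sum."""
    total = 0.0
    for n in range(samples):
        phase = 2.0 * math.pi * n / samples
        total += math.cos(phase) ** power
    return total / samples


# n plays the role of s + 1: s field actions in the polarization plus one more field.
averages = {n: cycle_average(n) for n in range(1, 7)}
for n, value in averages.items():
    print(f"{n} field factors: cycle average = {value:+.4f}")
```

Products with an odd number of factors average to zero, while those with an even number leave a constant term, in line with the requirement that s + 1 be even. Whether that constant term actually appears in the power density still depends on the susceptibility having an imaginary part.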

Furthermore, the non-oscillating component of the integrand can best be sorted out by going to the complex
representation of the total field, the polarization, and the susceptibility. The mathematically pure real
quantities in equation (B1.3.2) can be written in their complex representation as follows:

$$E = \tfrac{1}{2}\left(\varepsilon + \varepsilon^{*}\right) \qquad (\mathrm{B1.3.3})$$

$$P^{(s)} = \tfrac{1}{2}\left(p^{(s)} + p^{(s)*}\right) \qquad (\mathrm{B1.3.4})$$

and

$$X^{(s)} = \tfrac{1}{2}\left(\chi^{(s)} + \chi^{(s)*}\right) \qquad (\mathrm{B1.3.5})$$

in which $\varepsilon$, $p^{(s)}$ and $\chi^{(s)}$ are, in general, complex quantities whose real parts are given by E, $P^{(s)}$ and $X^{(s)}$,
respectively. Introducing equation (B1.3.4)-equation (B1.3.5) into equation (B1.3.2), and applying the cycle
average theorem for the integral [20], one finds that for all spectroscopies at sth order involving long-term
light/matter energy exchange (Class I, in particular), the signal measured as a net energy exchange, $S^{(s)}$, is
proportional to the cycle averaged power density, or $S^{(s)} \propto W^{(s)} \propto \mathrm{Im}\,\chi^{(s)}$. If the complex susceptibility, $\chi^{(s)}$,
is pure real, there can be no long term energy exchange between light and matter. There can be no Class I
spectroscopies based on a susceptibility component for which $\mathrm{Im}\,\chi^{(s)} = 0$.
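
The proportionality between the cycle averaged power density and the imaginary part of the susceptibility can be checked numerically in the simplest (linear, s = 1) case. The sketch below is an illustration under stated assumptions: a scalar susceptibility, a single frequency, and the time dependence exp(-iωt), under which absorption corresponds to a positive imaginary part. The opposite sign convention for the exponent would flip that sign. All names are illustrative.

```python
import cmath
import math


def cycle_averaged_power(chi, omega=1.0, amplitude=1.0, samples=2000):
    """Cycle average of E * dP/dt for E = Re[a e^{-i w t}] and P = Re[chi a e^{-i w t}]."""
    period = 2.0 * math.pi / omega
    total = 0.0
    for n in range(samples):
        t = period * n / samples
        phasor = amplitude * cmath.exp(-1j * omega * t)
        field = phasor.real                         # real field E(t)
        dP_dt = (-1j * omega * chi * phasor).real   # time derivative of the real polarization
        total += field * dP_dt
    return total / samples


lossy = cycle_averaged_power(2.0 + 0.5j)   # susceptibility with an imaginary part
lossless = cycle_averaged_power(2.0 + 0j)  # pure real susceptibility
print(f"complex chi: W = {lossy:.4f}   (omega * |a|^2 * Im(chi) / 2 = 0.25)")
print(f"real chi:    W = {lossless:.4f}")
```

Changing the real part of the susceptibility leaves the average untouched, whereas it scales linearly with the imaginary part, which is the content of the statement that a pure real susceptibility supports no Class I spectroscopy.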

Consider an ensemble composed of N constituents (such as molecules) per unit volume. The (complex)
density operator for this system is developed perturbatively in orders of the applied field, and at sth order is
given by $\rho^{(s)}$. The (complex) sth order contribution to the ensemble averaged polarization is given by the trace
over the eigenstate basis of the constituents of the product of the dipole operator, $N\mu$, and $\rho^{(s)}$:
$p^{(s)} = \mathrm{Tr}\{N\mu\rho^{(s)}\}$. In turn, an expression for $\chi^{(s)}$ is obtained, which, in the frequency domain, consists of a numerator
containing a product of (s + 1) transition moment matrix elements and a denominator of s complex energy
factors. These complex energies express light/matter resonances and allow $\chi^{(s)}$ to become a complex quantity,
its $\mathrm{Im}\,\chi^{(s)}$ part (pure real) being responsible for Class I spectroscopies. The light/matter resonances introduce
the imaginary component to $\chi^{(s)}$ and permit a Class I spectroscopy to exist.
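
How a complex energy denominator produces an imaginary part can be seen in a toy model. The sketch below assumes the simplest possible case, a single complex energy factor with a hypothetical resonance at 2000 cm⁻¹ and a damping of 5 cm⁻¹; the functional form and the numbers are illustrative and are not the chapter's expression for any specific susceptibility.

```python
def chi_single_resonance(omega, omega0, gamma, strength=1.0):
    """Illustrative one-factor susceptibility: strength / (omega0 - omega - i*gamma)."""
    return strength / complex(omega0 - omega, -gamma)


# Hypothetical resonance at 2000 cm^-1 with 5 cm^-1 damping.
on_resonance = chi_single_resonance(2000.0, 2000.0, 5.0)
off_resonance = chi_single_resonance(1000.0, 2000.0, 5.0)

print(f"on resonance:  Re = {on_resonance.real:.2e}, Im = {on_resonance.imag:.2e}")
print(f"off resonance: Re = {off_resonance.real:.2e}, Im = {off_resonance.imag:.2e}")
```

On resonance the model susceptibility is purely imaginary, while far from resonance it is almost purely real, which mirrors the statement that resonances are what permit a Class I spectroscopy to exist.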

As noted, the Class II spectroscopies are based on detecting the new EM field that is derived from the induced
polarization, $P^{(s)}$, at sth order. Here $P^{(s)}$, oscillating at optical frequencies, acts as the source term in
Maxwell's equation to create the new optical field, $E_{\mathrm{new}}$, at the same frequency. Again, we recognize
$\varepsilon_{\mathrm{new}} \propto p^{(s)} \propto \chi^{(s)}$.

Since optical fields oscillate too quickly for direct detection, they are measured 'in quadrature', as photons
(see below). There are two ways to achieve quadrature. One is homodyne detection in which the new field is
measured at its quadrature, $\varepsilon_{\mathrm{new}}\varepsilon_{\mathrm{new}}^{*} = |\varepsilon_{\mathrm{new}}|^{2}$. These signals must be proportional to $|\chi^{(s)}|^{2}$. Thus
$S^{(s)}(\mathrm{homodyne}) \propto |\chi^{(s)}|^{2}$ and all phase information in $\chi^{(s)}$ is lost. Such is the case for almost all of the
Class II spectroscopies, especially the Raman events at third order.

The second way to achieve quadrature is to introduce another field, $E_{\mathrm{LO}}$ (called a local oscillator), designed in
frequency and wavevector to conjugate (go into quadrature) in its complex representation with the new field
of interest. Thus in the heterodyne case, the signal photons are derived from $\varepsilon_{\mathrm{new}}\varepsilon_{\mathrm{LO}}^{*}$, or
$S^{(s)}(\mathrm{heterodyne}) \propto \chi^{(s)}$.

In heterodyne detected s + 1 wave mixing, phase information is retained and one can take a full measure of
the complex susceptibility, including its phase. The phase of the complex induced polarization, $p^{(s)}$,
determines how its energy will partition between Class I (absorbed or emitted) and Class II (a new EM wave
is launched) spectroscopies.
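
The difference between the two detection schemes can be sketched with complex arithmetic. The toy model below assumes that the new field is simply proportional to the concentration times a scalar complex susceptibility, and that the heterodyne signal is the real part of the cross term between the new field and the conjugated local oscillator; the numbers and function names are illustrative assumptions.

```python
def homodyne_signal(chi, concentration):
    """Signal proportional to |new field|^2, with new field = concentration * chi."""
    new_field = concentration * chi
    return abs(new_field) ** 2


def heterodyne_signal(chi, concentration, local_oscillator):
    """Cross term Re[new field * conj(local oscillator)]."""
    new_field = concentration * chi
    return (new_field * local_oscillator.conjugate()).real


chi = 3.0 + 4.0j  # illustrative complex susceptibility

# Homodyne: quadratic in concentration, blind to the phase of chi.
print(homodyne_signal(chi, 1.0), homodyne_signal(chi, 2.0), homodyne_signal(chi.conjugate(), 1.0))

# Heterodyne: linear in concentration; the local oscillator phase selects Re or Im of chi.
print(heterodyne_signal(chi, 1.0, 1.0 + 0j), heterodyne_signal(chi, 1.0, 1j))
```

In this model, doubling the concentration quadruples the homodyne signal, and replacing the susceptibility by its complex conjugate leaves it unchanged. Shifting the local oscillator phase by a quarter cycle switches the heterodyne signal from the real to the imaginary part of the susceptibility.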

Consider all of the spectroscopies at third order (s = 3). To be as general as possible, suppose the total incident 
field consists of the combination of three experimentally distinct fields (j = 1,2, 3). These can differ in any 
combination of their frequency, polarization and direction of incidence (wavevector). Thus the total field is 
written as 


$$E = \sum_{j=1}^{3} E_{j}. \qquad (\mathrm{B1.3.6})$$

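In using the complex representation (equation (B1.3.4)), the jth electric field is given as

$$E_{j} = \tfrac{1}{2}\left(\varepsilon_{j} + \varepsilon_{j}^{*}\right) \qquad (\mathrm{B1.3.7})$$

where (using Euler's identity)

$$\varepsilon_{j} = E_{j}^{0}\, e^{-i(\mathbf{k}_{j}\cdot\mathbf{r} - \omega_{j}t)} \qquad (\mathrm{B1.3.8})$$

$$\varepsilon_{j}^{*} = (E_{j}^{0})^{*}\, e^{i(\mathbf{k}_{j}^{*}\cdot\mathbf{r} - \omega_{j}^{*}t)} \qquad (\mathrm{B1.3.9})$$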

Here $E_{j}^{0}$ is the amplitude of the jth field and the real part of $\omega_{j}$ is its (circular) frequency or 'colour'. The real
part of $\mathbf{k}_{j}$ is the product of the unit vector of incidence inside the sample, $\mathbf{e}_{k_j}$, and its amplitude, $2\pi n_{j}/\lambda_{j}$.
Here $\lambda_{j}/n_{j}$ is the wavelength of the jth field inside the sample, $\lambda_{j}$ being the wavelength inside a vacuum and
$n_{j}$ being the (real) index of refraction of the sample at $\omega_{j}$. As implied, all three properties may be complex:
the amplitude because of an added phase to the field and/or a field that is elliptically (or circularly) polarized;
the frequency because the field may be growing or decaying in time; and the wavevector because the field may
be decaying or growing according to its location within the sample.

The total field (equation (B1.3.6) with equation (B1.3.7)) is now

$$E = \frac{1}{2}\sum_{j=1}^{3}\left(\varepsilon_{j} + \varepsilon_{j}^{*}\right). \qquad (\mathrm{B1.3.10})$$


It is only a matter of inserting this 'hexanomial', equation (B1.3.10), into equation (B1.3.1) to organize all
possible three-beam spectroscopies that might appear at any given order.

B1.3.2.2 THE 'GENERATORS' FOR ALL THIRD ORDER SPECTROSCOPIES FROM THE COMPLEX REPRESENTATION OF
THE FIELD

In order to develop the theoretical structure that underlies each of the many Raman spectroscopies at third
order, we use the above complex representation of the incident fields to produce the 'generators' of all
possible electric field spectroscopies at third order. After this exercise, it is a simple matter to isolate the
subset that constitutes the entire family of Raman spectroscopies.

At third order, one must expand
$\frac{1}{8}\sum_{i=1}^{3}(\varepsilon_{i} + \varepsilon_{i}^{*})\sum_{j=1}^{3}(\varepsilon_{j} + \varepsilon_{j}^{*})\sum_{k=1}^{3}(\varepsilon_{k} + \varepsilon_{k}^{*})$
to enumerate the 'generators' of all possible third order spectroscopies. In this case, any given generator
consists of an ordered list of these complex fields, such as $\varepsilon_{i}\varepsilon_{j}^{*}\varepsilon_{k}$. The ordering of the fields in each
generator represents a time ordering of the actions of the applied fields. This can be of physical significance.
Clearly this expansion must give 216 terms ($6^{3}$). These 216 terms or generators can be arranged into 108 pairs
of mutually conjugate generators, since the total electric field is itself a quantity that is pure real. Of these
108 paired terms, exactly 27 are in the category of what is termed nondegenerate four wave mixing (ND4WM),
where the signal frequency must be very far from any of the incident optical frequencies. These 27 pairs can
generate only Class II spectroscopies and (it turns out) none of the Raman spectroscopies. The generic pair for
these ND4WM processes is $\varepsilon_{i}\varepsilon_{j}\varepsilon_{k} + \varepsilon_{i}^{*}\varepsilon_{j}^{*}\varepsilon_{k}^{*}$, where each of i, j or k can be fields 1, 2 or 3. (Henceforth
the factor of 1/8 shall be suppressed since it is common to all 216 generators.) The simple algebra of exponents
is applied to such a product using equation (B1.3.9) and equation (B1.3.10). Thus, one sees how the
polarization wave generated from such a term must oscillate at a frequency much larger than any of the
incident colours, namely at the (real part of) $\omega_{p} = \omega_{i} + \omega_{j} + \omega_{k}$. The polarization wave must have a
wavevector given by

$$\mathrm{Re}(\mathbf{k}_{p}) = \mathrm{Re}(\mathbf{k}_{i} + \mathbf{k}_{j} + \mathbf{k}_{k}) = 2\pi\left\{\left(\frac{n_{i}}{\lambda_{i}}\right)\mathbf{e}_{k_i} + \left(\frac{n_{j}}{\lambda_{j}}\right)\mathbf{e}_{k_j} + \left(\frac{n_{k}}{\lambda_{k}}\right)\mathbf{e}_{k_k}\right\}.$$

The appropriate (complex) susceptibility tensor for this generator is $\chi^{(3)}(\omega_{p} = \omega_{i} + \omega_{j} + \omega_{k})$.

When resonances, or near resonances, are present in the 4WM process, the ordering of the field actions in the
perturbative treatment (equation (B1.3.1)) can be highly significant. Though the three-colour generators
($\varepsilon_i\varepsilon_j\varepsilon_k$, $\varepsilon_i\varepsilon_k\varepsilon_j$, $\varepsilon_k\varepsilon_i\varepsilon_j$, ...) have identical frequency and wavevector algebra, their associated susceptibility
functions ($\chi^{(3)}(\omega_p = \omega_i + \omega_j + \omega_k)$, $\chi^{(3)}(\omega_p = \omega_i + \omega_k + \omega_j)$, $\chi^{(3)}(\omega_p = \omega_k + \omega_i + \omega_j)$, ...) are, in general,
different. As a result of the different colour ordering, two of their three energy denominator factors must 
differ. For this reason, the field ordering in each generator, together with its own response function, must be 
regarded individually. 
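
The counting quoted above (216 generators, 108 conjugate pairs, 27 of them frequency-summing ND4WM pairs) can be checked by brute-force enumeration. The following is a minimal sketch; the tuple encoding `(colour, conjugated)` and the function names are illustrative assumptions, not notation from the text.

```python
from itertools import product

# A field is encoded as (colour, conjugated); colours 1..3 are the incident beams.
# This encoding is an assumption made for illustration only.
FIELDS = [(j, c) for j in (1, 2, 3) for c in (False, True)]

def conjugate(generator):
    """Return the conjugate generator (every field conjugated)."""
    return tuple((j, not c) for j, c in generator)

def count_generators():
    generators = list(product(FIELDS, repeat=3))        # ordered triples of fields
    pairs = {min(g, conjugate(g)) for g in generators}  # one representative per conjugate pair
    # ND4WM pairs: all three fields act in the same sense (none or all conjugated).
    nd4wm = [g for g in pairs if len({c for _, c in g}) == 1]
    return len(generators), len(pairs), len(nd4wm), len(pairs) - len(nd4wm)

print(count_generators())
```

The last number returned is simply the number of pairs left once the 27 ND4WM pairs are set aside.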

The fourth electromagnetic wave, $E_{\mathrm{new}}$, shall henceforth be called the signal wave, $E_s = \frac{1}{2}(\varepsilon_s + \varepsilon_s^*)$. It always
must carry the same frequency as that of the polarization wave ($\omega_p = \omega_s$). It is launched by the collapse of this
wave, provided that the polarization extends coherently over at least a few wavelengths of the incident light
and that $\mathbf{k}_p = \mathbf{k}_s$.



The latter condition corresponds to the phase matching requirement already mentioned — the wavelength and 
direction of the material polarization wave must match those of the new EM wave as closely as possible. 
However, for all Class I spectroscopies, this condition is automatically achieved because of quadrature. In 
fact, this is true for all 'quadrature' spectroscopies — the Class I spectroscopies being the principal such, but, 
as noted, it is a nontrivial requirement in the 'nonquadrature' Class II spectroscopies, particularly in optically 
dispersive media. 

In the complex mathematical representation, 'quadrature' means that, at the (s + 1) wave mixing level, the
product of s input fields constituting the sth order generator and the signal field can be organized as a product
of (s + 1)/2 conjugately paired fields. Such a pair for field $i$ is given by $\varepsilon_i\varepsilon_i^* = |\varepsilon_i|^2$. One sees that the exponent
algebra for such a pair removes all dependence on the Re($\omega_i$) and Re($\mathbf{k}_i$), thus automatically removing all oscillations as
well as satisfying the phase-matching requirement. A necessary (but not sufficient) requirement for full
quadrature is for s to be odd and also that half of the s + 1 fields be found to act conjugately with respect to
the other half. Thus, for s + 1 fields, whenever the number of nonconjugate fields differs from the number of
conjugate fields, one can only have 'nonquadrature'. Phase matching then must become an issue. This must
always be the case for odd-wave mixing (s is even), and it is also true for the above set of 27 frequency
summing generators (for ND4WM). Thus for the currently considered generator, $\varepsilon_i\varepsilon_j\varepsilon_k$, all three fields (i, j
and k) act nonconjugately, so quadrature at the four-wave level simply is not possible. 
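
The pairing argument above can be turned into a small test. The sketch below assumes a generator is encoded as a sequence of `(colour, conjugated)` tuples (an illustrative convention, not the text's notation). It cancels conjugate pairs of the same colour inside the generator and asks whether exactly one field is left over for the signal field to pair with.

```python
from collections import Counter

def allows_full_quadrature(generator):
    """True if the s input fields plus one signal field can form (s + 1)/2 conjugate pairs.

    generator: sequence of (colour, conjugated) tuples (assumed encoding).
    """
    plain = Counter(j for j, c in generator if not c)
    conj = Counter(j for j, c in generator if c)
    # Counter subtraction keeps only positive counts, so this is what survives
    # after cancelling conjugate pairs of the same colour.
    leftover = sum(((plain - conj) + (conj - plain)).values())
    # Exactly one unpaired field must remain for the signal field to pair with.
    return leftover == 1

print(allows_full_quadrature([(1, False), (2, True), (2, False)]))   # one field left over
print(allows_full_quadrature([(1, False), (2, False), (3, False)]))  # frequency-summing case
```

For an even number of input fields the leftover count is always even, so the function returns False, in line with the statement that odd-wave mixing is always nonquadrature.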

B1.3.2.3 THE FIELD GENERATORS FOR ALL THIRD ORDER RAMAN SPECTROSCOPIES

We are now prepared to uncover all of the third order Raman spectroscopies (table B1.3.1 and table B1.3.2)
and, in doing so, indicate how one might proceed to reveal the Raman events at higher order as well. Of the 
108 pairs of conjugate generators, the remaining 81, unlike the above 27, are characterized by having one of 
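the three fields acting conjugately with respect to the remaining two. Nine of these 81 terms generate fully
degenerate 4WM. They constitute the 'one-colour' terms: $\varepsilon_i^*\varepsilon_i\varepsilon_i$ + cc, $\varepsilon_i\varepsilon_i^*\varepsilon_i$ + cc and
$\varepsilon_i\varepsilon_i\varepsilon_i^*$ + cc ($i$ = 1, 2 or 3) with susceptibility $\chi^{(3)}(\omega_s = \omega_i)$, since $\omega_s = -\omega_i + \omega_i + \omega_i = \omega_i$, with wavevector
$\mathbf{k}_s = -\mathbf{k}_i + \mathbf{k}_i + \mathbf{k}_i = \mathbf{k}_i$. Their full degeneracy is evident since all four fields
carry the same frequency (apart from sign). Resonances appear in the electric susceptibilities when, by choice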
of incident colours and their signs, one or more of their energy denominators (s in number at sth order) 
approaches a very small value because the appropriate algebraic colour combination matches material energy 
gaps. All Raman spectroscopies must, by definition, contain at least one low frequency resonance. When 
using only optical frequencies, this can only be achieved by having two fields acting conjugately and 
possessing a difference frequency that matches the material resonance. Further, they must act in the first two 
steps along the path to the third order polarization of the sample. These first two steps together prepare the 
Raman resonant material coherence and can be referred to as the 'doorway' stage of the Raman 4WM event. 

Suppose the incident colours are such that the required Raman resonance ($\omega_R$) appears at $\omega_1 - \omega_2 \approx \omega_R$ (with
$\omega_1 > \omega_2$). Thus the appropriate generators for the two-step doorway stage, common to all of the Raman
spectroscopies, must be $\varepsilon_1\varepsilon_2^*$ and $\varepsilon_2^*\varepsilon_1$ (and their conjugates, $\varepsilon_1^*\varepsilon_2$ and $\varepsilon_2\varepsilon_1^*$). These differ only in the permutation of the ordering
of the actions of fields 1 and 2. The usual algebra of exponents tells us that this doorway stage produces an
intermediate polarization that oscillates at $\omega_1 - \omega_2 \approx \omega_R$. The resonantly produced Raman coherence having
been established, it remains only to probe this intermediate polarization, that is, to convert it into an optical
polarization from which the signal at optical frequencies is prepared. This is accomplished by the action of the
third field which converts the low frequency Raman coherence into an optical one. This in turn, leads to the
signal. This last two-step event can be referred to as the 'window' stage of the 4WM process. Obviously there
are at most only six possible choices for this third field: $\varepsilon_j$ and $\varepsilon_j^*$ with $j$ = 1, 2, 3. In this way we have
isolated from the original 108 4WM generators the
12 that are responsible for all of the Raman spectroscopies. The choice of the probe field determines the
frequency of the signal as well as its wavevector. While all types of Raman signal must be present at some
level of intensity in any given experiment, it is a matter of experimental design (the detected $\omega_s$ and the
aperture selected $\mathbf{k}_s$) as to which one Raman spectroscopy is actually being studied.
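
As an illustration of how the 12 Raman generators arise, the sketch below combines the two doorway orderings with the six possible probe fields and evaluates the signed frequency of the resulting polarization. The `(colour, conjugated)` encoding and the numerical colours (in cm^-1) are assumptions chosen purely for illustration.

```python
from itertools import product

def raman_generators():
    """Doorway (field 1 and conjugate field 2, in either order) followed by one of six probes."""
    doorways = [((1, False), (2, True)), ((2, True), (1, False))]
    probes = [(j, c) for j in (1, 2, 3) for c in (False, True)]
    return [d + (p,) for d, p in product(doorways, probes)]

def signed_frequency(generator, omega):
    """Exponent algebra: +omega_j for a nonconjugate field, -omega_j for a conjugate one."""
    return sum(-omega[j] if c else omega[j] for j, c in generator)

# Assumed colours: omega_1 - omega_2 = 1000 cm^-1 plays the role of the Raman resonance.
omega = {1: 20000.0, 2: 19000.0, 3: 15000.0}
for g in raman_generators():
    print(g, signed_frequency(g, omega))
```

A negative result means only that the member of the conjugate pair carrying the positive frequency is the other one; restricting the probe to colours 1 and 2 leaves the eight two-colour generators.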

This classification by field generators in the complex field representation goes a long way towards organizing 
the nonlinear spectroscopies that carry at least one resonance. However it must be remembered that, in 
emphasizing a field ordering that locates the essential Raman resonance, we have neglected any other possible 
resonant and all nonresonant contributions to the third order polarization. While any additional resonance(s) 
are important to both Class I and Class II spectroscopies, the nonresonant contributions play no role in Class I 
spectroscopies, but must not be ignored in Class II studies. If one starts with the generator $\varepsilon_1\varepsilon_2^*\varepsilon_3$ and then
permutes the ordering of the three distinct complex field amplitudes, one arrives at 6 (3!) generators, only two
of which induce the Raman resonance at $\omega_1 - \omega_2 \approx \omega_R$ ($\varepsilon_1\varepsilon_2^*\varepsilon_3$ and $\varepsilon_2^*\varepsilon_1\varepsilon_3$). The remaining four can only
polarize the material without this Raman resonance ($\omega_3 \neq \omega_1, \omega_2$), but, if otherwise resonant, they can interfere
with the Raman lineshape. Also when the issue of time resolved spectroscopies arises, the time ordering of 
field actions is under experimental control, and the active field generators are limited not only by 
requirements of resonance but by their actual time ordering in the laboratory. 

In any case, the polarizing action upon the material by a given generator must be followed in more detail. In 
density matrix evolution, each specified field action transforms either the 'ket' or the 'bra' side of the density
matrix. Thus for any specified sth order generator, there can be $2^s$ detailed paths of evolution. In addition,
evolution for each of the s! generators corresponding to all possible field orderings must be considered. One
then has altogether $2^s s!$ paths of evolution. For s = 3 there are 48: eight for each of six generators.
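
The path count can be verified by listing every field ordering together with every bra/ket assignment. This is a minimal sketch; the field labels are placeholder strings introduced for illustration.

```python
from itertools import permutations, product

def evolution_paths(fields):
    """All (field ordering, bra/ket choice per step) combinations for distinct fields."""
    return [(order, sides)
            for order in permutations(fields)
            for sides in product(("ket", "bra"), repeat=len(fields))]

paths = evolution_paths(("e1", "e2*", "e3"))  # placeholder labels for three distinct fields
print(len(paths))
```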

B1.3.2.4 TIME EVOLUTION OF THE THIRD ORDER POLARIZATION BY WAVE MIXING ENERGY LEVEL (WMEL)
DIAGRAMS. THE RAMAN SPECTROSCOPIES CLASSIFIED

The general task is to trace the evolution of the third order polarization of the material created by each of the 
above 12 Raman field operators. For brevity, we choose to select only the subset of eight that is based on two 
colours only — a situation that is common to almost all of the Raman spectroscopies. Three-colour Raman 
studies are rather rare, but are most interesting, as demonstrated at both third and fifth order by the work in 
Wright's laboratory [21, 22, 23 and 24]. That work anticipates variations that include infrared resonances and
the birth of doubly resonant vibrational spectroscopy (DOVE) and its two-dimensional Fourier transform 
representations analogous to 2D NMR [25]. 

Interestingly, three-colour spectroscopies at third order can only be of Class II, since the generators cannot 
possibly contain any quadrature. Maximal quadrature is necessary for Class I. 

For the two-colour spectroscopies, maximal quadrature is possible and both Class I and Class II events are 
accounted for. Thus we trace diagrammatically the evolution to produce a signal field caused by the eight 
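generators: (i) the four $(\varepsilon_1\varepsilon_2^*)\varepsilon_1$, $(\varepsilon_1\varepsilon_2^*)\varepsilon_1^*$, $(\varepsilon_1\varepsilon_2^*)\varepsilon_2$ and $(\varepsilon_1\varepsilon_2^*)\varepsilon_2^*$ and (ii) the four with the first two fields
permuted, $(\varepsilon_2^*\varepsilon_1)\varepsilon_1$, $(\varepsilon_2^*\varepsilon_1)\varepsilon_1^*$, $(\varepsilon_2^*\varepsilon_1)\varepsilon_2$ and $(\varepsilon_2^*\varepsilon_1)\varepsilon_2^*$.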

In each case we have indicated the doorway stage using parentheses. We note how in each of the two groups 
((i)and 
chromophore). This matter-bath interaction is an important, usually dominant, source of damping of the
macroscopic coherence. The radiative blackbody (bb) background is another ever-present bath that imposes 
radiative damping of excited states, also destroying coherences. 

For s = 3, the time evolution of the system is tracked by following the stepwise changes in the bra state, $\langle j|$, or
the ket state, $|k\rangle$, of the system caused by each of the three successive field interventions. This perturbative
evolution of the density operator, or of the density matrix, is conveniently depicted diagrammatically using 
double sided Feynman diagrams or, equivalently, the WMEL diagrams. The latter are preferred since 
light/matter resonances are explicitly exposed. In WMEL diagrams, the energy levels of the constituents of 
the matter are laid out as solid horizontal lines to indicate the states (called 'real') that are active in a 
resonance, and as dashed horizontal lines (or no lines) when they serve as nonresonant ('virtual') states. The 
perturbative evolution of the density matrix is depicted using vertically oriented arrows for each of the field 
actions that appears in a given generator. These arrows are placed from left to right in the diagram in the same 
order as the corresponding field action in the generator. The arrow length is scaled to the frequency of the 
acting field. Solid arrows indicate evolution from the old ket (tail of arrow) to the new ket (head of arrow); 
dashed arrows indicate evolution from the old bra (tail of arrow) to the new bra (head of arrow). For a field
acting nonconjugately, like $\varepsilon_i$, the frequency is positively signed, $\omega_i$, and the arrow for a ket change points up
and that for a bra change points down. When the field acts conjugately, $\varepsilon_i^*$, the frequency is negatively signed,
$-\omega_i$, and a ket changing arrow points down, while a bra changing arrow points up. These rules allow one to
depict diagrammatically any and all density matrix evolutions at any order. Given the option of a bra or a ket
change at each field action, one sees how a given sth order generator leads to $2^s$ diagrams, or paths of
evolution. Normally only some (if any) encounter resonances. A recipe has been published [6] that allows one 
to translate any WMEL diagram into the analytic expression for its corresponding electrical susceptibility. 
After s arrows have appeared (for an sth order evolution), the (s + l)th field is indicated for any WMEL 
diagram of the nonquadrature class by a vertical wavy line segment whose vertical length scales to the signal 
frequency. For the WMEL diagrams of the full quadrature sort, the (s + l)th field must be conjugate to one of 
the incident fields, so the wavy segment becomes a wavy arrow; either solid (ket-side action) or dashed (bra- 
side action). 
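
The arrow rules just stated are mechanical enough to be written down as a lookup. The sketch below assumes a generator is represented only by a sequence of booleans marking which fields act conjugately (an illustrative simplification). It returns the line style and direction of each arrow for every bra/ket assignment, without any attempt to identify resonances.

```python
from itertools import product

def wmel_arrow(conjugated, side):
    """Return (line style, direction) for one field action in a WMEL diagram."""
    style = "solid" if side == "ket" else "dashed"
    # Nonconjugate field: ket arrow up, bra arrow down; conjugate field: the reverse.
    points_up = (side == "ket") != conjugated
    return style, "up" if points_up else "down"

def wmel_diagrams(generator):
    """generator: sequence of booleans, True where the field acts conjugately (assumed encoding)."""
    return [[wmel_arrow(c, side) for c, side in zip(generator, sides)]
            for sides in product(("ket", "bra"), repeat=len(generator))]

doorway = (False, True)  # a nonconjugate field followed by a conjugate one
print(wmel_diagrams(doorway)[0])
```

The first diagram listed uses ket-side action at both steps: a solid arrow up followed by a solid arrow down.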

Of the four possible WMEL diagrams for each of the $\varepsilon_1\varepsilon_2^*$ and $\varepsilon_2^*\varepsilon_1$ doorway generators, only one encounters the Raman
resonance in each case. We start with two parallel horizontal solid lines, together representing the energy gap
of a Raman resonance. For ket evolution using $\varepsilon_1\varepsilon_2^*$, we start on the left at the lowest solid line (the ground state,
g) and draw a long solid arrow pointing up ($+\omega_1$), followed just to the right by a shorter solid arrow pointing
down ($-\omega_2$) to reach the upper solid horizontal line, f. The head of the first arrow brings the ket to a virtual
state, from which the second arrow carries the ket to the upper of the two levels of the Raman transition. Since
the bra is until now unchanged, it remains in g ($\langle g|$); this doorway event leaves the density matrix at second
order off-diagonal in $\rho^{(2)}_{fg}$, which is not zero. Thus a Raman coherence has been established. Analogously, the
$\varepsilon_2^*\varepsilon_1$ doorway action on the ket side must be a short solid arrow down ($-\omega_2$) from g to a virtual ket state, then a long
arrow up ($+\omega_1$) to f from the virtual state. This evolution also produces $\rho^{(2)}_{fg}$. Both doorway actions contain the
same Raman resonance denominator, but differ in the denominator appearing at the first step; the downward
action is inherently anti-resonant ('N' for nonresonant) in the first step, the upward action is potentially
resonant ('R' for resonant) in the first step and is therefore stronger. Accordingly, we distinguish these two
doorway events by labels $D_N$ and $D_R$, respectively (see figure B1.3.2). In resonance Raman spectroscopy, this
first step in $D_R$ is fully resonant and overwhelms $D_N$. (The neglect of $D_N$ is known as the rotating wave
approximation.) It is unnecessary to explore the bra-side version of these doorway actions, for they would
appear in the fully conjugate version of these doorway events. Each of the doorway steps, $D_R$
and $D_N$, may be followed by any one of eight window events. The WMEL diagrams for the window events
consist of the arrow for the last step of the third order polarization and the wavy segment for the signal wave.
There are eight such window diagrams since each of the two steps can involve two colours and either bra- or
ket-side evolution. These eight window WMEL diagrams are shown in figure B1.3.2(a) and figure B1.3.2(b)
and are identified alphabetically. These also carry potentially resonant and anti-resonant properties in the third
energy denominator (the first window step) and accordingly are labelled $W_R$ and $W_N$, where, as before, $W_R > W_N$.
If the third step is completely resonant, $W_R \gg W_N$, and $W_N$ may be completely neglected (as with $D_R \gg D_N$).


Figure B1.3.2. The separate WMEL diagrams for the doorway and window stages of the Raman
spectroscopies. Solid and dashed vertical arrows correspond to ket- and bra-side light/matter interactions,
respectively. The signal field is denoted by the vertical wavy line (arrow). The ground and final molecular
levels (solid horizontal lines) are labelled g and f, while the virtual levels (dashed horizontal lines) are labelled
j and k. The associated generators are given below each diagram. The doorway/window stages are classified
as potentially resonant ($D_R$/$W_R$) or certainly nonresonant ($D_N$/$W_N$). In addition, the window stages are
labelled alphabetically in order to distinguish the Raman techniques by their window stage WMEL diagram(s)
(as in table B1.3.1 and table B1.3.2 and figure B1.3.1). (a) The doorway and window stage WMEL diagrams
for SR, SRS and RRS. (b) The doorway and window stage WMEL diagrams for CARS and CSRS.
It is now possible to label every one of the Raman spectroscopies listed in table B1.3.1 and table B1.3.2
according to its essential doorway/window WMEL diagram. This is shown in the third column of those tables. 
Again, the analytic form of the associated susceptibilities is obtained by recipe from the diagrams. When 
additional resonances are present, other WMEL diagrams must be included for both Class I and Class II 
spectroscopies. For the Class II spectroscopies, all of the nonresonant WMEL diagrams must be included as 
well. 

B1.3.2.5 THE MICROSCOPIC HYPERPOLARIZABILITY TENSOR, ORIENTATIONAL AVERAGING, THE KRAMERS- 
HEISENBERG EXPRESSION AND DEPOLARIZATION RATIOS 

As implied by the trace expression for the macroscopic optical polarization, the macroscopic electrical 
susceptibility tensor at any order can be written in terms of an ensemble average over the microscopic 
nonlinear polarizability tensors of the individual constituents. 

(A) MICROSCOPIC HYPERPOLARIZABILITY AND ORIENTATIONAL AVERAGING 

Consider an isotropic medium that consists of independent and identical microscopic chromophores 
(molecules) at number density N. At sth order, each element of the macroscopic susceptibility tensor, given in
laboratory Cartesian coordinates A, B, C, D, must carry s + 1 (laboratory) Cartesian indices (X, Y or Z) and
therefore number altogether $3^{(s+1)}$. Thus the third order susceptibility tensor contains 81 elements. Each tensor
element of the macroscopic susceptibility is directly proportional to the sum over all elements of the 
corresponding microscopic, or molecular, hyperpolarizability tensor. The latter are expressed in terms of the 
four local (molecule based) Cartesian coordinates, a, b, c, d (each can be x, y, or z — accounting for all 81 
elements of the microscopic tensor). To account for the contribution to the macroscopic susceptibility from 
each molecule, one must sum over all molecules in a unit volume, explicitly treating all possible orientations, 
since the projection of a microscopic induced dipole onto the X, Y and Z laboratory coordinates depends very
much on the molecular orientation. This is accomplished by averaging the microscopic hyperpolarizability 
contribution to the susceptibility over a normalized distribution of orientations, and then simply multiplying 
by the number density, N. Let the orientational averaging be denoted by $\langle\ldots\rangle$. For any macroscopic tensor
element, $\chi^{(3)}_{ABCD}$, one finds

$$\chi^{(3)}_{ABCD} = L\,N\sum_{a,b,c,d}\langle (A,a)(B,b)(C,c)(D,d)\rangle\,\gamma_{abcd} \qquad \text{(B1.3.11)}$$

where (A, a) etc are the direction cosines linking the specified local Cartesian axes with specified laboratory
axes and L is a 'local' field factor. (The field experienced by the molecule is the incident field altered by
polarization effects.) Often the L factor is absorbed into the definition of the sth order hyperpolarizability.

The microscopic polarization involves four molecular based transition moment vectors — the induced dipole is 
along a, the first index. The transition moments along b, c and d are coupled to the laboratory axes, B, C and 
D, respectively, along which the successive incident (or black-body) fields are polarized. The four transition 
moment unit vectors have been extracted and projected onto the laboratory axes: A — the direction of the 
induced macroscopic polarization, B — the polarization of the first acting field, C — the polarization of the 
second acting field and D — the polarization of the third acting field. The product of the four direction cosines 
is subjected to the orientational averaging process, as indicated. Each such average belongs to its
corresponding scalar tensor element $\gamma_{abcd}$. It is important to sum over all
local Cartesian indices, since all elements of $\gamma$ can contribute to each tensor element of $\chi^{(3)}$. At third order, the
averaging over the projection of microscopic unit vectors to macroscopic, such as those in equation (B1.3.11),
is identical to that found in two-photon spectroscopy (another Class I spectroscopy at third order). This has 
been treated in a general way (including circularly polarized light) according to molecular symmetry 
considerations by Monson and McClain [26]. 

For an isotropic material, all orientations are equally probable and all such products that have an odd number 
of 'like' direction cosines will vanish upon averaging. This restricts the nonvanishing tensor elements to
those such as $\chi_{AAAA}$, $\chi_{ABBA}$ etc. Similarly for the elements $\gamma_{abcd}$. Such orientational averaging is crucial in
dictating how the signal field in any spectroscopy is polarized. In turn, polarization measurements can lead to 
important quantitative information about the elements of the macroscopic and microscopic tensors. 
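
The vanishing of averages containing an odd number of like direction cosines can be seen numerically in the simplest case, where all four laboratory indices are the same axis. The products then reduce to moments of a unit vector (that laboratory axis expressed in the molecular frame) distributed uniformly over the sphere. The sketch below is an illustrative check under that restriction, not a general evaluation of equation (B1.3.11); the function name and grid sizes are assumptions.

```python
import math

def sphere_average(powers, n_u=400, n_phi=16):
    """Average n_x**p * n_y**q * n_z**r over unit vectors n spread uniformly on the sphere."""
    p, q, r = powers
    total = 0.0
    for i in range(n_u):
        u = -1.0 + (i + 0.5) * 2.0 / n_u      # midpoint rule in u = cos(theta)
        s = math.sqrt(1.0 - u * u)            # sin(theta)
        for k in range(n_phi):
            phi = 2.0 * math.pi * k / n_phi   # uniform grid in the azimuthal angle
            total += (s * math.cos(phi)) ** p * (s * math.sin(phi)) ** q * u ** r
    return total / (n_u * n_phi)

print(sphere_average((0, 0, 4)))  # four like cosines: close to 1/5
print(sphere_average((2, 2, 0)))  # two pairs of like cosines: close to 1/15
print(sphere_average((1, 3, 0)))  # odd numbers of like cosines: vanishes
```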

The passage from microscopic to macroscopic (equation (B1.3.11)) clearly exposes the additivity of the
microscopic hyperpolarizabilities. Significantly, it is seen immediately why $\chi^{(3)}$ is linear in concentration, N.
This brings out one of the major distinctions between the Class I and Class II spectroscopies (see item (iii) in
section B1.3.1). The signals from Class I, being proportional to Im $\chi^{(3)}$, are linear in concentration. Those
(homodyne signals) from Class II are proportional to $|\chi^{(3)}|^2$ and therefore must be quadratic in concentration.
(However, Class II spectroscopies that are heterodyne detected are proportional to $\chi^{(3)}$ and are linear in N.)
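
The different concentration scaling can be made concrete with a toy calculation. In the sketch below the susceptibility is taken as N times a single complex hyperpolarizability value, with the local field factor and orientational average set to one; these simplifications and the numerical value are assumptions for illustration only.

```python
def class_one_signal(number_density, gamma):
    """Class I signal, proportional to Im chi(3) with chi(3) = N * gamma (assumed toy model)."""
    return (number_density * gamma).imag

def class_two_homodyne_signal(number_density, gamma):
    """Class II homodyne signal, proportional to |chi(3)|**2 in the same toy model."""
    return abs(number_density * gamma) ** 2

gamma = 0.3 + 0.4j  # arbitrary illustrative hyperpolarizability
for n in (1.0, 2.0, 4.0):
    print(n, class_one_signal(n, gamma), class_two_homodyne_signal(n, gamma))
```

Doubling N doubles the first quantity and quadruples the second.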

(B) THE MICROSCOPIC HYPERPOLARIZABILITY IN TERMS OF THE LINEAR POLARIZABILITY: THE KRAMERS- 
HEISENBERG EQUATION AND PLACZEK LINEAR POLARIZABILITY THEORY OF THE RAMAN EFFECT 

The original Placzek theory of Raman scattering [30] was in terms of the linear, or first order microscopic
polarizability, $\alpha$ (a second rank tensor), not the third order hyperpolarizability, $\gamma$ (a fourth rank tensor). The
Dirac and Kramers-Heisenberg quantum theory for linear dispersion did account for Raman scattering. It
turns out that this link of properties at third order to those at first order works well for the electronically
nonresonant Raman processes, but it cannot hold rigorously for the fully (triply) resonant Raman
spectroscopies. However, provided one discards the important line shaping phenomenon called 'pure
dephasing', one can show how the third order susceptibility does reduce to the treatment based on the (linear)
polarizability tensor [6, 27].

What is the phenomenon 'pure dephasing' that one cannot formally encounter in the linear polarizability 
theories of Raman spectroscopies? It arises when theory is obliged to treat the environment of the 
spectroscopically active entity as a 'bath' that statistically modulates its states. In simple terms, there are two 
mechanisms for the irreversible decay of a coherent state such as a macroscopically coherent polarization 
wave. One involves the destruction by the bath of the local induced dipoles that make up the wave (a lifetime 
effect); the other involves the bath induced randomization of these induced dipoles (without their destruction). 
This latter mechanism is called pure dephasing. Together, their action is responsible for dephasing of a pure 
coherence. In addition, if the system is inherently inhomogeneous in the distribution of the two-level energy 
gap of the coherence, the local coherences will oscillate at slightly different frequencies causing, as these walk 
off, the macroscopic coherence, and its signal, to decay — even while the individual local coherences might 
not. This is especially important for the Class II spectroscopies. Unlike the first two dephasing mechanisms, 
this third kind can be reversed by attending to signals generated by the appropriate Fourier components of the 
subsequent field actions. The original macroscopic coherence will be reassembled (at least partially) in due 
course, to produce a renewed signal, called an 'echo'. For an electronic coherence made by a single optical 
field, this happens at the four-wave mixing level. However for the Raman spectroscopies, two (conjugately 
acting) optical fields are needed to create the vibrational coherence, and hence the true Raman echo appears at 
the eight-wave mixing level where $\chi^{(7)}$ is the important susceptibility. (We shall see that a quasi Raman echo
can be exploited at the $\chi^{(5)}$ level.)

The important and frequently ignored fact is that Raman theory based on the polarizability tensor cannot 
contain the randomization mechanism for dephasing (pure dephasing). This mechanism is especially 
important in electronically resonant Raman spectroscopy in the condensed phase. The absence of pure 
dephasing in linear polarizability theory arises simply because the perturbative treatment upon which it is 
based involves the independent evolution of the bra and the ket states of the system. Conversely, the third 
order susceptibility approach, based on the perturbative development of the density operator, links together 
the evolution of the bra and ket states and easily incorporates pure dephasing. Furthermore, in resonance 
Raman spectroscopy, it is the pure dephasing mechanism that governs the interesting competition between 
resonance fluorescence and resonance Raman scattering [6]. In the linear polarization theory these are fixed in 
their relation, and the true resonance fluorescence component becomes an indistinguishable part of the Raman 
line shape. (However, interestingly, if the exciting light itself is incoherent, or the exciting light consists of 
sufficiently short pulses, even the linear polarizability theory tells how the resonance fluorescence-like 
component becomes distinguishable from the Raman-like signal [20].) 

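At the linear level, the microscopic induced dipole vector on a single molecule in the local Cartesian
coordinate system is simply written as $\boldsymbol{\mu}^{(1)} = \boldsymbol{\alpha}\mathbf{E}$, where $\mathbf{E}$ is the applied field also expressed in the local
Cartesian system. In full matrix language, in which the local second rank polarizability tensor is exposed, we
can write:

$$\begin{pmatrix}\mu_x\\ \mu_y\\ \mu_z\end{pmatrix} = \begin{pmatrix}\alpha_{xx} & \alpha_{xy} & \alpha_{xz}\\ \alpha_{yx} & \alpha_{yy} & \alpha_{yz}\\ \alpha_{zx} & \alpha_{zy} & \alpha_{zz}\end{pmatrix}\begin{pmatrix}E_x\\ E_y\\ E_z\end{pmatrix}.$$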

If we neglect pure dephasing, the general tensor element of the third order hyperpolarizability relates to those 
of the first order polarizability tensor according to 

(B1.3.12) 

Here, the linear polarizability $\alpha_{bc}(\omega_1, \omega_2)$ corresponds to the doorway stage of the 4WM process, while the
second linear polarizability factor corresponds to the window stage. We also see the (complex) Raman resonant
energy denominator exposed. Of the three energy denominator factors required at third order, the remaining
two appear, one each, in the two linear polarizability tensor elements.

In fact, each linear polarizability itself consists of a sum of two terms, one potentially resonant and the other
anti-resonant, corresponding to the two doorway events, $D_R$ and $D_N$, and the window events, $W_R$ and $W_N$,
described above. The hyperpolarizability chosen in equation (B1.3.12) happens to belong to a three-colour
generator. As noted, such three-colour generators cannot produce Class I spectroscopies (full quadrature with
three colours is not possible). Only the two-colour generators are able to create the Class I Raman
spectroscopies and, in any case, only two colours are normally used for the Class II Raman spectroscopies as well.

For linear polarizability elements that are pure real, we see (from equation (B1.3.12)) that the resulting
expression carries the denominator

$$(\omega_1 - \omega_2 - \omega_R)^2 + \gamma_R^2 .$$

When $\omega_1 = \omega_3$ (two colours), this is relevant to the Class I Raman spectroscopies (see section B1.3.2.1). In
this case we expose a Lorentzian Raman lineshape with an HWHM of $\gamma_R$. At this point, the notation for the
elements of the polarizability tensor suppresses the identity of the Raman transition, so it is now necessary to
be more specific.
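As a quick numerical check of the HWHM statement, the following minimal Python sketch evaluates the Lorentzian factor $1/[(\omega_1 - \omega_2 - \omega_R)^2 + \gamma_R^2]$ on resonance and at a detuning equal to $\gamma_R$. The function name and the numerical value of `gamma_r` are illustrative assumptions and do not come from the text.

```python
def raman_lorentzian(detuning, gamma_r):
    """Lorentzian factor 1 / (detuning^2 + gamma_r^2).

    detuning stands for (omega_1 - omega_2 - omega_R); gamma_r is the HWHM.
    Both must be given in the same (arbitrary) frequency units.
    """
    return 1.0 / (detuning ** 2 + gamma_r ** 2)


gamma_r = 5.0  # hypothetical HWHM, e.g. in cm^-1

peak = raman_lorentzian(0.0, gamma_r)          # value on resonance
at_hwhm = raman_lorentzian(gamma_r, gamma_r)   # value one HWHM away
ratio = at_hwhm / peak                         # half the peak value

print(f"value at detuning = gamma_R relative to peak: {ratio:.3f}")
```

The profile falls to half its peak height when the detuning equals $\gamma_R$, which is what identifies $\gamma_R$ as the half width at half maximum.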

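Consider Raman transitions between thermalized molecular eigenstate $g$ (ground) and molecular eigenstate $f$
(final). The quantum mechanical expression for $\alpha_{bc}$ responding to colours $i$ and $j$ is the famous (thermalized)
Kramers-Heisenberg equation [29]

$$(\alpha_{bc}(\omega_i,\omega_j))_{gf} = \frac{1}{\hbar}\sum_n \left[ \frac{\langle g|\mu_b|n\rangle\langle n|\mu_c|f\rangle}{\omega_{ng} - \omega_i - i\gamma_n} + \frac{\langle g|\mu_c|n\rangle\langle n|\mu_b|f\rangle}{\omega_{nf} + \omega_i + i\gamma_n} \right] \qquad \text{(B1.3.13)}$$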

where the notation on the left-hand side recognizes the Raman transition between molecular eigenstates $g$ and
$f$. The sum on $n$ is over all molecular eigenstates. One should be reminded that the 'ground state' should
actually be a thermal distribution over Boltzmann weighted states. Thus at the hyperpolarizability level, one
would write $\sum_g W_g (\gamma_{abcd})_{gf}$, where $W_g$ is the appropriate Boltzmann weighting factor, $\exp[-\hbar\omega_g/kT]$. This
detail is suppressed in what follows.
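To give a feel for the size of such thermal weights, here is a minimal Python sketch that evaluates normalized Boltzmann weights for the first few levels of a single harmonic mode. The 1000 cm$^{-1}$ wavenumber, the 300 K temperature, the truncation to five levels and the function name are all hypothetical choices made only for illustration.

```python
import math

H = 6.62607015e-34     # Planck constant, J s
C_CM = 2.99792458e10   # speed of light, cm/s
KB = 1.380649e-23      # Boltzmann constant, J/K


def boltzmann_weights(wavenumber_cm, temperature_k, n_levels):
    """Normalized weights W_v for levels v = 0..n_levels-1 of one harmonic mode.

    Energies are measured from v = 0; the normalization is over the
    truncated set of levels only.
    """
    x = H * C_CM * wavenumber_cm / (KB * temperature_k)  # hc*nu/kT
    raw = [math.exp(-v * x) for v in range(n_levels)]
    total = sum(raw)
    return [r / total for r in raw]


weights = boltzmann_weights(1000.0, 300.0, 5)
hot_band_ratio = weights[1] / weights[0]  # equals exp(-hc*nu/kT)

print(f"W_1 / W_0 = {hot_band_ratio:.4f}")
```

For this assumed mode the first excited level carries a weight of a bit under one per cent of that of the ground level, which is why the thermal average is often dominated by the lowest state.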

To branch into electronic, vibrational and rotational Raman spectroscopy, the Born-Oppenheimer (B-O)
approximation must be introduced, as needed, to replace the molecular eigenstates by rovibronic products. For
example, consider vibrational Raman scattering within the ground electronic state (or, analogously, within any
other electronic state). For scattering between vibrational levels $v$ and $v'$ in the ground electronic state, we
expand the molecular eigenstate notation to $|g\rangle = |g)|v\rangle$ and $|f\rangle = |g)|v'\rangle$ (the intermediate states, $|n\rangle$, may be left
as molecular eigenstates). The curved bracket refers to the electronic eigenstate and the straight bracket to the
vibrational states (where until now it referred to the molecular eigenstate). Now equation (B1.3.13) becomes

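We note that the expression in brackets is just the $bc$ tensor element of the electronic polarizability in the
ground electronic state, $\alpha^g_{bc}(\omega_i, \omega_j)$. Thus

$$(\alpha_{bc}(\omega_i,\omega_j))_{gv,gv'} = \langle v|\alpha^g_{bc}(\omega_i,\omega_j)|v'\rangle. \qquad \text{(B1.3.14)}$$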

Since the vibrational eigenstates of the ground electronic state constitute an orthonormal basis set, the
off-diagonal matrix elements in equation (B1.3.14) will vanish unless the ground state electronic polarizability
depends on nuclear coordinates. (This is the Raman analogue of the requirement in infrared spectroscopy that,
to observe a transition, the electronic dipole moment in the ground electronic state must properly vary with
nuclear displacements from equilibrium.) Indeed such electronic properties do depend on nuclear coordinates,
for in the B-O approximation electronic eigenstates are parametrized by the positional coordinates of the
constituent nuclei. For matrix elements in vibrational space, these coordinates become variables and the
electronic polarizability (or the electronic dipole moment) is expanded in a Taylor series in nuclear
displacements. Usually the normal mode approximation is introduced into the vibrational space problem (though
it need not be) and the expansion is in terms of the normal displacement coordinates, $\{\Delta Q_i\}$, of the molecule.
Thus for the electronic polarizability we have:

$$\alpha^g_{bc} = (\alpha^g_{bc})_0 + \sum_i \left(\frac{\partial \alpha^g_{bc}}{\partial Q_i}\right)_0 \Delta Q_i + \frac{1}{2}\sum_{i,j} \left(\frac{\partial^2 \alpha^g_{bc}}{\partial Q_i\,\partial Q_j}\right)_0 \Delta Q_i\,\Delta Q_j + \cdots$$

the leading term of which is unable to promote a vibrational transition because
$\langle v|(\alpha^g_{bc})_0|v'\rangle = (\alpha^g_{bc})_0\langle v|v'\rangle = (\alpha^g_{bc})_0\,\delta_{vv'}$. (The $v = v'$ situation corresponds to Rayleigh scattering, for which
this leading term is the principal contributor.) The $\Delta Q_i$ in the second term is able to promote the scattering of
fundamentals, since $\langle v|\Delta Q_i|v + 1\rangle$ need not vanish. The $\Delta Q_i\,\Delta Q_j$ in the third set of terms can cause scattering
of the first overtones ($i = j$) and combination states ($i \neq j$), and so on for the subsequent terms. As usual in
spectroscopy, point group theory governs the selection rules for such matrix elements. As already noted these
are identical to the two-photon selection rules [26], though here in vibrational space.
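The roles of the linear and quadratic terms can be illustrated with a small Python sketch that builds the matrix of a dimensionless normal coordinate in a truncated harmonic-oscillator basis. The harmonic approximation, the dimensionless coordinate $q = (a + a^\dagger)/\sqrt{2}$, the basis size and the helper names are assumptions made for illustration only.

```python
import math


def q_matrix(n):
    """Matrix of q = (a + a_dagger)/sqrt(2) in the first n harmonic-oscillator states."""
    q = [[0.0] * n for _ in range(n)]
    for v in range(n - 1):
        # Only neighbouring levels are connected: <v|q|v+1> = sqrt((v+1)/2)
        q[v][v + 1] = q[v + 1][v] = math.sqrt((v + 1) / 2.0)
    return q


def matmul(a, b):
    """Plain square-matrix product, sufficient for this small example."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


q = q_matrix(6)
q2 = matmul(q, q)

fundamental = q[0][1]        # <0|q|1>: linear term drives the fundamental
overtone_via_q = q[0][2]     # <0|q|2>: vanishes for the linear term
overtone_via_q2 = q2[0][2]   # <0|q^2|2>: quadratic term drives the first overtone

print(fundamental, overtone_via_q, overtone_via_q2)
```

In this harmonic picture the linear term connects only levels differing by one quantum, while the quadratic term reaches levels two quanta apart, in line with the assignment of fundamentals and first overtones above.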

This linear polarizability theory of Raman scattering [30] forms the basis for bond polarizability theory of the
Raman effect. Here the polarizability derivative is discussed in terms of its projection onto bonds of a
molecule, and the concept of additivity and the transferability of such bond specific polarizability derivatives
can be discussed, and even semiquantitatively supported. Further, the vibronic (vibrational-electronic) theory
of Raman scattering appears at this level. It introduces the Herzberg-Teller development for the nuclear
coordinate dependence of electronic states, therefore that of the electronic transition moments and hence that
of the electronic polarizability. This leads to the so-called 'A', 'B' and 'C' terms for Raman scattering, each
having a different analytical form for the dispersive behaviour of the Raman cross-section as the exciting light
moves from a nonresonant region towards an electronically resonant situation. An early review of these
subjects can be found in [31].

For excitation at a wavenumber $\tilde{\nu}_1$ ($\tilde{\nu}_1 = \omega_1/2\pi c$) and a Raman wavenumber $\tilde{\nu}_R$, the total Raman
cross-section for scattering in isotropic media (onto the full sphere of $4\pi$ steradians, for all analysing
polarizations, for excitation with linearly polarized or unpolarized light, and integrated over the Raman line)
is given in terms of rotational invariants of the linear polarizability by:

The upper sign is for anti-Stokes scattering, the lower for Stokes scattering. The factor in the parentheses is
just the fine-structure constant and $\Sigma^0$, $\Sigma^1$, $\Sigma^2$ are the three rotationally invariant tensor elements of the
hyperpolarizability (or the linear polarizability when pure dephasing is ignored), which are given by:

$$\Sigma^0 = \frac{1}{3|\Omega|^2}\Big|\sum_\rho \alpha_{\rho\rho}\Big|^2 \qquad \text{(B1.3.17)}$$




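$$\Sigma^1 = \frac{1}{2|\Omega|^2}\sum_{\rho<\sigma}|\alpha_{\rho\sigma} - \alpha_{\sigma\rho}|^2 \qquad \text{(B1.3.18)}$$

and

$$\Sigma^2 = \frac{1}{|\Omega|^2}\left[\frac{1}{2}\sum_{\rho<\sigma}|\alpha_{\rho\sigma} + \alpha_{\sigma\rho}|^2 + \frac{1}{3}\sum_{\rho<\sigma}|\alpha_{\rho\rho} - \alpha_{\sigma\sigma}|^2\right] \qquad \text{(B1.3.19)}$$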

where $\Omega$ is the Raman resonant energy denominator, $\omega_1 - \omega_2 - \omega_R + i\gamma_R$. With appropriate algebra, one finds
that their sum, $\Sigma^0 + \Sigma^1 + \Sigma^2$, collapses to a single sum over tensor elements, or in terms of the linear
polarizability tensor elements:

Experimentally, it is these invariants (equations (B1.3.17), (B1.3.18) and (B1.3.19)) that can
be obtained by scattering intensity measurements, though clearly not by measuring the total cross-section
only.

Measurement of the total Raman cross-section is an experimental challenge. More common are reports of the
differential Raman cross-section, $d\sigma_R/d\Omega$, which is proportional to the intensity of the scattered radiation that
falls within the element of solid angle $d\Omega$ when viewing along a direction that is to be specified [15]. Its value
depends on the design of the Raman scattering experiment.

In the appendix, we present the differential Raman scattering cross-section for viewing along any wavevector 
in the scattering sphere for both linearly and circularly polarized excitation. The more conventional 
geometries used for exciting and analysing Raman scattering are discussed next. 

Suppose the exciting beam travels along $X$, and is linearly (l) polarized along $Z$. A popular experimental
geometry is to view the scattered light along $Y$ (at $\pi/2$ radians to the plane defined by the wavevector and the
polarization unit vector, $\hat{e}_Z$, of the exciting light). One analyses the $Z$ polarized component of the scattered
light, called $I_\parallel$ ($I_Z$), and the $X$ polarized component, called $I_\perp$ ($I_X$). (Careful work must properly correct for the
finite solid angle of detection.) The two intensities are directly proportional to the differential cross-sections,
which depend on the invariants through

$$\left(\frac{d\sigma_R}{d\Omega}\right)_\parallel \propto \frac{10\Sigma^0 + 4\Sigma^2}{30} \qquad \text{(B1.3.20)}$$

and

$$\left(\frac{d\sigma_R}{d\Omega}\right)_\perp \propto \frac{5\Sigma^1 + 3\Sigma^2}{30}. \qquad \text{(B1.3.21)}$$


The depolarization ratio is defined as

$$\rho_l = \frac{I_\perp}{I_\parallel} \qquad \text{(B1.3.22)}$$

which for the present case of an orientationally averaged isotropic assembly of Raman scatterers reduces to:

$$\rho_l = \frac{5\Sigma^1 + 3\Sigma^2}{10\Sigma^0 + 4\Sigma^2}. \qquad \text{(B1.3.23)}$$

This result is general, for it includes the case where the tensor elements are complex, regardless of whether or
not the hyperpolarizability tensor is built of the linear polarizability.

Another frequent experimental configuration uses naturally (n) polarized incident light, with the same viewing
geometry and polarization analysis. Such light may be regarded as polarized equally along $Z$ (as before) and
along the viewing axis, $Y$. Given the $I_\perp$ and $I_\parallel$ as defined in the linearly polarized experiment, one can reason
that now with naturally polarized excitation $I_Z \propto I_\parallel + I_\perp$ (where the additional $I_\perp$ term along $Z$ originates from
the $Y$ polarized excitation). Similarly we expect that $I_X \propto 2I_\perp$, one $I_\perp$ from each of the two excitation
polarizations. The differential Raman cross-sections for naturally polarized excitation therefore scale as

$$\left(\frac{d\sigma_R}{d\Omega}\right)_X \propto \frac{10\Sigma^1 + 6\Sigma^2}{60}$$

and

$$\left(\frac{d\sigma_R}{d\Omega}\right)_Z \propto \frac{10\Sigma^0 + 5\Sigma^1 + 7\Sigma^2}{60}.$$

Thus one predicts that the depolarization ratio for excitation with natural light should be

$$\rho_n = \frac{I_X}{I_Z} = \frac{2I_\perp}{I_\parallel + I_\perp} = \frac{10\Sigma^1 + 6\Sigma^2}{10\Sigma^0 + 5\Sigma^1 + 7\Sigma^2} \qquad \text{(B1.3.24)}$$

or

$$\rho_n = \frac{2\rho_l}{1 + \rho_l},$$

two well known relations. Of course, if one does not measure the polarization of the scattered light for either
experiment, the detector collects signal from the sum of the differential cross sections, $(d\sigma_R/d\Omega)_{\parallel(Z)} + (d\sigma_R/d\Omega)_{\perp(X)}$.

Similar reasoning shows that were one to view along the $X$, $Y$ and $Z$ axes and polarization analyse the signal
each time, whether excited by linearly or by naturally polarized light, the total intensity should be given by
$I_{\mathrm{total}} \propto I_\parallel + 2I_\perp$. Given equation (B1.3.23), if we add its denominator to twice the numerator we find that
$I_{\mathrm{total}} \propto (\Sigma^0 + \Sigma^1 + \Sigma^2)$, a reassuring result.

Knowledge of the depolarization ratios allows one to classify easily the Raman modes of a molecule into
symmetric and asymmetric vibrations. If a molecule is undergoing a totally symmetric vibration, the
depolarization ratio, $\rho_l$ ($\rho_n$), will be less than 3/4 (6/7) and we say that the vibration is polarized (p). On the
other hand, for asymmetric vibrations, the depolarization ratio will have a value close to 3/4 (6/7) and we say
these vibrations are depolarized (dp) [34]. It should be stated that these values for $\rho$ are only valid when the
scattered radiation is collected at right angles to the direction of the incident light. If a different geometry is
used, $\rho_l$ and $\rho_n$ are accordingly changed (see the appendix).
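The limiting values 0, 3/4 and 6/7 can be checked numerically with a short Python sketch. It assumes the conventional forms of the three rotational invariants of a 3×3 tensor (isotropic, antisymmetric and symmetric anisotropic parts), drops any common prefactor since it cancels in the ratios, and uses the 90° viewing geometry; the function names and the two test tensors are hypothetical.

```python
def rotational_invariants(a):
    """Return (S0, S1, S2) for a 3x3 tensor a given as nested lists.

    S0: isotropic part, S1: antisymmetric part, S2: symmetric anisotropic part.
    Elements may be complex. A common prefactor is omitted.
    """
    pairs = [(0, 1), (0, 2), (1, 2)]
    s0 = abs(a[0][0] + a[1][1] + a[2][2]) ** 2 / 3.0
    s1 = 0.5 * sum(abs(a[i][j] - a[j][i]) ** 2 for i, j in pairs)
    s2 = (0.5 * sum(abs(a[i][j] + a[j][i]) ** 2 for i, j in pairs)
          + sum(abs(a[i][i] - a[j][j]) ** 2 for i, j in pairs) / 3.0)
    return s0, s1, s2


def depolarization_ratios(a):
    """Return (rho_l, rho_n) for 90-degree viewing of an isotropic sample."""
    s0, s1, s2 = rotational_invariants(a)
    num = 5.0 * s1 + 3.0 * s2           # proportional to I_perp
    den = 10.0 * s0 + 4.0 * s2          # proportional to I_parallel
    rho_l = num / den if den else float("inf")
    rho_n = 2.0 * num / (den + num) if (den + num) else float("nan")
    return rho_l, rho_n


isotropic = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]      # purely isotropic tensor
anisotropic = [[1, 0, 0], [0, -1, 0], [0, 0, 0]]   # traceless symmetric tensor

rho_iso = depolarization_ratios(isotropic)
rho_aniso = depolarization_ratios(anisotropic)
print(rho_iso, rho_aniso)
```

A purely isotropic tensor gives fully polarized scattering (both ratios zero), while a traceless symmetric tensor gives the depolarized limits 3/4 and 6/7. A tensor with a purely antisymmetric part and nothing else has no parallel component in this geometry, which the sketch reports as an infinite $\rho_l$.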

An interesting phenomenon called the 'noncoincidence effect' appears in the Raman spectroscopies. This is
seen when a given Raman band shows a peak position and a bandwidth that differ (slightly) with the
polarization. It can be attributed to varying sensitivity of the different tensor elements to interchromophore
interactions.


B1.3.3 RAMAN SPECTROSCOPY IN MODERN PHYSICS AND 
CHEMISTRY 

Raman spectroscopy is pervasive and ever changing in modern physics and chemistry. In this section of the 
chapter, sources of up-to-date information are given followed by brief discussions of a number of currently 
employed Raman based techniques. It is impractical to discuss every possible technique and impossible to 
predict the many future novel uses of Raman scattering that are sure to come, but it is hoped that this section 
will provide a firm launching point into the modern uses of Raman spectroscopy for present and future 
readers. 

B1.3.3.1 SOURCES OF UP-TO-DATE INFORMATION

There are three very important sources of up-to-date information on all aspects of Raman spectroscopy. 
Although papers dealing with Raman spectroscopy have appeared and will continue to appear in nearly every 
major chemical physics-physical chemistry based serial, The Journal of Raman Spectroscopy [35] is solely 
devoted to all aspects, both theoretical and experimental, of Raman spectroscopy. It originated in 1973 and 
continues to be a constant source of information on modern applications of Raman spectroscopy. 

Advances in Infrared and Raman Spectroscopy [36] provides review articles, both fundamental and applied,
in the fields of both infrared and Raman spectroscopy. This series aims to review the progress in all areas of
science and engineering in which application of these techniques has a significant impact. Thus it provides an
up-to-date account of both the theory and practice of these two complementary spectroscopic techniques.

The third important source of information on modern Raman spectroscopy is the set of books cataloguing the
proceedings of the International Conference on Raman Spectroscopy (ICORS) [37]. ICORS is held every two
years at various international locations and features hundreds of contributions from leading research groups
covering all areas of Raman spectroscopy. Although the published presentations are quite limited in length,
they each contain references to the more substantial works and collectively provide an excellent overview of
current trends in Raman spectroscopy. A 'snapshot' or brief summary of the 1998 conference appears at the
end of this chapter.

Through these three serials, a researcher new to the field, or one working in a specialized area of Raman 
spectroscopy, can quickly gain access to its current status. 

B1.3.3.2 SURVEY OF TECHNIQUES

With the theoretical background presented in the previous sections, it is now possible to examine specific
Raman techniques. Of the list in table B1.3.1, we briefly discuss and provide references to additional
information for the Class I spectroscopies: spontaneous Raman scattering (SR), Fourier transform Raman
scattering (FTRS), resonance Raman scattering (RRS), stimulated Raman scattering (SRS), and surface
enhanced Raman scattering (SERS). From table B1.3.2, we do the same for the Class II spectroscopies: coherent
Raman scattering (CRS), Raman induced Kerr-effect spectroscopy (RIKES), Raman scattering with noisy light,
time resolved coherent Raman scattering (TRCRS), impulsive stimulated Raman scattering (ISRS) and higher
order and higher dimensional Raman scattering.

First we discuss some Class I spectroscopies.

(A) SPONTANEOUS RAMAN SCATTERING (SR)

Conventional spontaneous Raman scattering is the oldest and most widely used of the Raman based 
spectroscopic methods. It has served as a standard technique for the study of molecular vibrational and 
rotational levels in gases, and for both intra- and inter-molecular excitations in liquids and solids. (For 
example, a high resolution study of the vibrons and phonons at low temperatures in crystalline benzene has 
just appeared [38].) 

In this earliest of Raman spectroscopies, there is only one incident field (originally sunlight or lines of the
mercury lamp; today a single laser source). This is field 1 in the above language and it appears in quadrature
in the two generators relevant to SR. Figure B1.3.2(a) shows that these generators lead to four WMEL
diagrams: $D_R W_R$(A), $D_R W_N$(B), $D_N W_R$(A), $D_N W_N$(B). The first is the strongest contributor (it is
potentially resonant, R, in both the first and last steps). The last term is the weakest (being nonresonant, N, in
both the first and last steps). We note that at the 4WM level, in all four terms, not only is field 1 in quadrature,
but field 2 is likewise in quadrature (since for window events A and B the signal field is conjugate to the
action of field 2). Now, since quadrature means photons, the Raman scattering event has destroyed a photon
at $\omega_1$, while it has created a new photon at $\omega_2$.

The unique feature in spontaneous Raman spectroscopy (SR) is that field 2 is not an incident field but (at
room temperature and at optical frequencies) it is resonantly drawn into action from the zero-point field of the
ubiquitous blackbody (bb) radiation. Its active frequency is spontaneously selected (from the infinite colours
available in the blackbody) by the resonance with the Raman transition at $\omega_1 - \omega_2 = \omega_R$ in the material. The
effective bb field intensity may be obtained from its energy density per unit circular frequency (cf. the
Einstein A coefficient at $\omega_2$). When the polarization field at frequency $\omega_s$, $P^{(3)}(\omega_s = \omega_1 - \omega_2 - \omega_1)$, produces
an electromagnetic field which acts conjugately with this selected blackbody field (at $\omega_s = \omega_2$), the scattered
Raman photon is created. Thus, one simply has growth of the blackbody radiation field at $\omega_2$, since full
quadrature removes all oscillatory behaviour in time and all wavelike properties in space.

Unlike the typical laser source, the zero-point blackbody field is spectrally 'white', providing all colours, $\omega_2$,
that seek out all $\omega_1 - \omega_2 = \omega_R$ resonances available in a given sample. Thus all possible Raman lines can be
seen with a single incident source at $\omega_1$. Such multiplex capability is now found in the Class II spectroscopies
where broadband excitation is obtained either by using modeless lasers, or a femtosecond pulse, which on first
principles must be spectrally broad [32]. Another distinction between a coherent laser source and the
blackbody radiation is that the zero-point field is spatially isotropic. By performing the simple wavevector
algebra for SR, we find that the scattered radiation is isotropic as well. This concept of spatial incoherence
will be used to explain a certain 'stimulated' Raman scattering event in a subsequent section.

For SR, a Class I spectroscopy, there must be a net transfer of energy between light and matter which survives
averaging over many cycles of the optical field. Thus, the material must undergo a state population change
such that the overall energy (light and matter) may be conserved. In Stokes vibrational Raman scattering
(figure B1.3.3(a)), the chromophore is assumed to be in the ground vibrational state |g). The launching of the
Stokes signal field creates a population shift from the ground state |g) to an excited vibrational state |f).
Conversely, in anti-Stokes vibrational Raman scattering (figure B1.3.3(b)), the chromophore is assumed to be
initially in an excited vibrational state, |f). Thus, the launching of the anti-Stokes field leaves the chromophore
in the ground vibrational state, |g). This process is typically weaker than the Stokes process since it requires
that an excited vibrational population exist (usually $W_f \ll W_g$). In thermal equilibrium, the intensity of the
anti-Stokes frequencies compared to the Stokes frequencies clearly is reduced by the Boltzmann factor,
$W_f/W_g = \exp[-\hbar\omega_{fg}/kT]$ [17]. Let us now discuss the apparatus used for the production and detection of the
Raman scattered radiation.

Figure B1.3.3. The full WMEL diagram (of the $D_R W_R$ sort) for spontaneous (or stimulated) (a) Stokes
Raman scattering and (b) anti-Stokes Raman scattering. In Stokes scattering, the chromophore is initially in
the ground vibrational state, $g$, and $\omega_1 > \omega_2$. In spontaneous anti-Stokes scattering, the chromophore must be
initially in an excited vibrational state, $f$. Also note that in (b), $\omega_2$ is (arbitrarily) defined as being greater than
$\omega_1$.

(B) RAMAN INSTRUMENTATION


Dramatic advances in Raman instrumentation have occurred since 1928. At the beginning, various elemental
lamps served as the incident light source and photographic plates were used to detect the dispersed scattered
light. Mercury arc lamps were primarily used since they had strong emission lines in the blue region of the
visible spectrum (see equation (B1.3.16)). As in all spectroscopies, detection devices have moved from the
photographic to the photoelectric.

Even while Raman spectrometers today incorporate modern technology, the fundamental components remain 
unchanged. Commercially, one still has an excitation source, sample illuminating optics, a scattered light 
collection system, a dispersive element and a detection system. Each is now briefly discussed. 

Continuous wave (CW) lasers such as Ar$^+$ and He-Ne are employed in commonplace Raman spectrometers.
However, laser sources for Raman spectroscopy now extend from the edge of the vacuum UV to the near
infrared. Lasers serve as an energetic source which at the same time can be highly monochromatic, thus
effectively supplying the single excitation frequency, $\nu_1$. The beams have a small diameter which may be
focused quite easily into the sample and are convenient for remote probing. Finally, almost all lasers are
linearly polarized, which makes measurements of the depolarization ratio, equation (B1.3.22), relatively
straightforward.

The laser beam is typically focused onto the sample with a simple optical lens system (though microscopes 
are also used). The resultant scattered radiation is often collected at 90° from the incident beam propagation 
and focused onto the entrance of a monochromator. The monochromator removes extraneous light from the 
Raman radiation by use of a series of mirrors and optical gratings. Two features of the monochromator for 
maximum resolution are the slit width and, for monochromatic detection, the scanning speed. The resolution 
of the Raman spectrum is optimized by adjusting the slit width and scanning rate, while still maintaining a 
strong signal intensity. 

After passing through the monochromator, the signal is focused onto the detection device. These devices
range from photomultiplier tubes (PMTs), in which the signal is recorded at each frequency and the spectrum
is obtained by scanning over a selected frequency range, to multichannel devices, such as arrays of
photodiodes and charge-coupled devices (CCDs), which simultaneously detect the signal over a full frequency
range. One may choose a specific detection device based on the particulars of an experiment. Sensitive detection of
excited vibrational states that are produced in the Class I Raman spectroscopies is an alternative that can
include acoustic detection of the heat released and resonance enhanced multiphoton ionization (REMPI). For
special applications microspectroscopic techniques and fibre optic probes ('optodes') are used.

This basic instrumentation, here described within the context of spontaneous Raman scattering, may be 
generalized to most of the other Raman processes that are discussed. Specific details can be found in the 
citations. 

(C) FOURIER TRANSFORM RAMAN SCATTERING (FTRS) 

Normal spontaneous Raman scattering suffers from lack of frequency precision and thus good spectral
subtractions are not possible. Another limitation to this technique is that high resolution experiments are often
difficult to perform [39]. These shortcomings have been circumvented by the development of Fourier
transform (FT) Raman spectroscopy [40]. FT Raman spectroscopy employs a long wavelength laser to
achieve viable interferometry, typically a Nd:YAG operating at 1.064 μm. The laser radiation is focused into
the sample where spontaneous Raman scattering occurs. The scattered light is filtered to remove backscatter
and the Raman light is sent into a Michelson interferometer. An interferogram is collected and detected on a
near-infrared detector (typically an N$_2$ cooled Ge photoresistor). The detected signal is then digitized and
Fourier transformed to obtain a spectrum [41]. This technique offers many advantages over conventional Raman
spectroscopy. The 1.064 μm wavelength of the incident laser is normally far from electronic transitions and
reduces the likelihood of fluorescence interference. Also, near-IR radiation decreases sample heating, thus
higher powers can be tolerated [42]. Since the signal is obtained by interferometry, the FT instrument records
the intensity of many wavelengths. Such simultaneous detection of a multitude of wavelengths is known as the
multiplex advantage, and leads to improved resolution, spectral acquisition time and signal to noise ratio over
ordinary spontaneous Raman scattering [39]. The improved wavelength precision gained by the use of an
interferometer permits spectral subtraction, which is effective in removing background features [42]. The
interferogram is then converted to a spectrum by Fourier transform techniques using computer programs. By
interfacing the Raman setup with an FT-IR apparatus, one has both IR and Raman capabilities with one instrument.
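The conversion of an interferogram into a spectrum can be sketched in a few lines of Python. The example below builds a synthetic interferogram from two cosine components and recovers their positions with a naive discrete Fourier transform; the sample count, the two line positions and amplitudes, and the helper name are hypothetical, and a real instrument would add apodization, phase correction and a fast transform.

```python
import cmath
import math


def dft_magnitude(samples):
    """Magnitude of the discrete Fourier transform for bins 0..N/2 (naive O(N^2) version)."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * m / n)
                for m, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


n_points = 64
# Hypothetical spectral lines: {frequency bin: amplitude}
lines = {5: 1.0, 12: 0.5}

# Each spectral line contributes a cosine in optical path difference
interferogram = [
    sum(amp * math.cos(2 * math.pi * k * m / n_points) for k, amp in lines.items())
    for m in range(n_points)
]

spectrum = dft_magnitude(interferogram)

# Indices of the two strongest bins, in ascending order
recovered = sorted(sorted(range(len(spectrum)), key=spectrum.__getitem__)[-2:])
print(recovered)
```

All spectral elements contribute to every point of the interferogram, so the whole spectrum is recorded at once; the transform then separates the components again.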

Beyond the obvious advantages of reduced fluorescence and high resolution, FT Raman is fast, safe and
requires minimal skill, making it a popular analytic tool for the characterization of organic compounds,
polymers, inorganic materials and surfaces; it has also been employed in many biological applications [41].

It should be noted that this technique is not without some disadvantages. The blackbody emission background
in the near IR limits the upper temperature of the sample to about 200°C [43]. Then there is the $\nu^4$ dependence
of the Raman cross-section (equation (B1.3.16) and equations (B1.3.20) and (B1.3.21)), which calls for an
order of magnitude greater excitation intensity when exciting in the near-IR rather than in the visible to
produce the same signal intensity [39].
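The size of this penalty can be estimated with a one-line scaling argument, sketched here in Python. The comparison wavelength of 514.5 nm (a common Ar$^+$ laser line) is our illustrative choice, and the simple $\nu^4$ scaling used below neglects the Raman shift itself, so the number should be read as a rough guide only.

```python
def nu4_intensity_penalty(visible_nm, near_ir_nm):
    """Factor by which a nu^4 cross-section drops on moving from visible_nm to near_ir_nm.

    Frequency is inversely proportional to wavelength, so the ratio of
    cross-sections is (near_ir_nm / visible_nm) ** 4.
    """
    return (near_ir_nm / visible_nm) ** 4


# Hypothetical comparison: Ar+ line at 514.5 nm versus Nd:YAG at 1064 nm
penalty = nu4_intensity_penalty(514.5, 1064.0)
print(f"cross-section reduced by a factor of about {penalty:.0f}")
```

For these two wavelengths the factor is somewhat below twenty, consistent with the order-of-magnitude increase in excitation intensity quoted above.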

(D) RESONANCE RAMAN SCATTERING (RRS) 

As the incident frequencies in any Raman spectroscopy approach an electronic transition in the material, the 
D R W R term in Raman scattering is greatly enhanced. One then encounters an extremely fruitful and versatile 
branch of spectroscopy called resonance Raman scattering (RRS). In fact, it is fair to say that in recent years 
RRS (Class I or Class II) has become the most popular form of Raman based spectroscopy. On the one hand, 
it offers a diagnostic approach that is specific to those subsystems (even minority components) that exhibit the 
resonance, even the very electronic transition to which the experiment is tuned. On the other hand, it offers a 
powerful tool for exploring potential-energy hypersurfaces in polyatomic systems. It forms the basis for many 
time resolved resonant Raman spectroscopies (TRRRSs) that exploit the non-zero vibronic memory implied 
by an electronic resonance. It has inspired the time-domain theoretical picture of RRS which is formally the
appropriate transform of the frequency domain picture [44, 45]. Here the physically appealing picture arises in
which the two-step doorway event ($D_R$) prepares (vertically upward) a vibrational wave-packet that moves
(propagates) on the upper electronic state potential energy hypersurface. In the window stage ($W_R$) of the
4WM event, this packet projects (vertically downward), according to its lingering time on the upper surface,
back onto the ground state to complete the third order induced optical polarization that leads to the new fourth
wave.

RRS has also introduced the concept of a 'Raman excitation profile' (REP$_j$ for the $j$th mode) [46, 47, 48, 49,
50, 51]. An REP$_j$ is obtained by measuring the resonance Raman scattering strength of the $j$th mode as a
function of the excitation frequency [52, 53]. How does the scattering intensity for a given (the $j$th) Raman
active vibration vary with excitation frequency within an electronic absorption band? In turn, this has led to
transform theories that try to predict the REP$_j$ from the ordinary absorption band (ABS), or the reverse. Thus
one has the so-called forward transform, ABS → REP$_j$, and the inverse transform, REP$_j$ → ABS [54, 55, 56]. The
inverse transform is a formal method that transforms an observed REP$_j$ into the electronic absorption band that
is responsible for resonantly scattering mode $j$. This inverse transform raises theoretical issues concerning the
frequently encountered problem of phase recovery of a complex function (in this case the complex Raman
susceptibility), knowing only its amplitude [57].

One group has successfully obtained information about potential energy surfaces without measuring REPs. 
Instead, easily measured second derivative absorption profiles are obtained and linked to the full RRS 
spectrum taken at a single incident frequency. In this way, the painstaking task of measuring a REP is 
replaced by carefully recording the second derivative of the electronic absorption spectrum of the resonant 
transition [58, 59 ]. 

The fitting parameters in the transform method are properties related to the two potential energy surfaces that
define the electronic resonance. These curves are obtained when the two hypersurfaces are cut along the $j$th
normal mode coordinate. In order of increasing theoretical sophistication these properties are: (i) the relative
position of their minima (often called the displacement parameters), (ii) the force constant of the vibration (its
frequency), (iii) nuclear coordinate dependence of the electronic transition moment and (iv) the issue of mode
mixing upon excitation (known as the Duschinsky effect), requiring a multidimensional approach.

We have seen how, by definition, all Raman spectroscopies must feature the difference frequency resonance 
that appears following the two-step doorway stage of the 4WM process. Basically, RRS takes advantage of 
achieving additional resonances available in the two remaining energy denominator factors found at third 
order (and, of course, still more at higher order). The two remaining energy factors necessarily involve an 
algebraic sum of an odd number of optical frequencies (one for the first step in the doorway stage, and three 
for the initial step in the window event). Since the algebraic sum of an odd number of optical frequencies 
must itself be optical, these additional resonances must be at optical frequencies. Namely, they must 
correspond to electronic transitions, including (in molecules) their dense rotational- (or librational-) 
vibrational substructure. The literature is filled with a great many interesting RRS applications, extending 
from resonances in the near-IR (dyes and photosynthetic pigments for example) to the deep UV (where 
backbone electronic resonances in proteins and nucleic acids are studied). These increasingly include TRRRS 
in order to follow the folding/unfolding dynamics of substructures (through the chromophore specificity of 
RRS) in biologically important molecules [60, 61 and 62 ]. 

The reader must turn to the literature to amplify upon any of these topics. Here we return to the two-colour 
generator/WMEL scheme to see how it easily can be adapted to the RRS problem. 

Let us consider RRS that contains both of the available additional resonances, as is normally the case (though 
careful choices of colours and their time sequence can isolate one or the other of these). First, we seek out the 
doorway events that contain not only the usual Raman resonance after the two fields, 1 and 2, have acted 
conjugately, but also the new resonance that appears after the first field has acted. The appropriate doorway 
generators remain ε₁ε₂* and ε₂*ε₁, in order to retain the Raman resonance. There are now two fully resonant
doorway WMEL diagrams, which we shall call D_RR(A_{12*}) and D_RR(B_{12*}). These diagrams are shown in
figure B1.3.4, in which the full manifold of sublevels for each electronic state is intimated. In doorway channel
A, vibrational coherences are produced in the ground electronic state g, as usual, but now they are enhanced
by the electronic resonance in step 1 (ε₁). However, in doorway channel B, vibrational coherences are
produced in the excited electronic state, e, as well. Interestingly, since A is a ket-ket event and B is a ket-bra
event (which differ by an overall sign), these two coherences must differ in
phase by π (180°). This may be important in any 4WM experiment that is phase sensitive (heterodyned Class
II) and in which the window event does not reverse this phase difference.


Figure B1.3.4. The two fully resonant doorway stages for resonance Raman scattering (RRS), in which the
manifold of vibrational sublevels for each electronic state is indicated. (a) Doorway stage D_RR(A_{12*}), in
which a vibrational coherence is produced in the ground electronic state, g. It is a ket-ket evolution. (b)
Doorway stage D_RR(B_{12*}), where a vibrational coherence is created in the excited electronic state, e. It is a
ket-bra evolution. The coherences in both doorway stages are enhanced by the electronic resonance in their
identical step 1 (generated by ε₁).

Frequently, femtosecond pulses are used in such electronically resonant spectroscopy. Such pulses usually 
have near-transform limited bandwidths and can spectrally embrace a fair range of vibrational coherences. 
Thus, even when a single central colour is chosen to define the femtosecond exciting pulse, a broad
band of colours is actually available to provide a range for both ω₁ and ω₂. Instead of preparing single well-defined
vibrational coherences using sharply defined colours, vibrational wavepackets are now made in both the
ground (channel A) and excited states
(channel B). These must evolve in time, for they are not eigenstates in either potential energy hypersurface. 
The nature of the 4WM signal is sensitive to the location of the wavepacket in either hypersurface at the time 
the window event takes place. Such wavepackets, prepared from a spectrum of colours contained coherently 
within a femtosecond pulse, are termed 'impulsively' prepared (see impulsive Raman scattering for further 
details). 
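As a rough numerical illustration of what it means for a pulse to 'spectrally embrace' a vibrational coherence, the following sketch estimates the bandwidth of a transform-limited Gaussian pulse from the standard time-bandwidth product (FWHM duration times FWHM bandwidth = 2 ln 2/π ≈ 0.441) and compares it with a Raman shift. The function names, the Gaussian pulse shape and the simple 'bandwidth at least as large as the shift' criterion are illustrative assumptions, not part of the treatment above.

```python
import math

C_CM_PER_S = 2.99792458e10                    # speed of light in cm/s
TBP_GAUSSIAN = 2.0 * math.log(2.0) / math.pi  # ~0.441, Gaussian FWHM time-bandwidth product


def bandwidth_cm1(pulse_fwhm_fs: float) -> float:
    """FWHM spectral width (in cm^-1) of a transform-limited Gaussian pulse."""
    duration_s = pulse_fwhm_fs * 1e-15
    bandwidth_hz = TBP_GAUSSIAN / duration_s
    return bandwidth_hz / C_CM_PER_S


def embraces(pulse_fwhm_fs: float, raman_shift_cm1: float) -> bool:
    """Crude criterion (an assumption): the pulse bandwidth must be at least
    as large as the Raman shift for both colours to be found within it."""
    return bandwidth_cm1(pulse_fwhm_fs) >= raman_shift_cm1


if __name__ == "__main__":
    for width_fs in (20.0, 100.0):
        print(width_fs, "fs ->", round(bandwidth_cm1(width_fs), 1), "cm^-1")
```

Under these assumptions a 20 fs pulse spans several hundred wavenumbers, whereas a 100 fs pulse spans only about a fifth of that, which is why only the shorter pulse can impulsively prepare mid-frequency vibrational wavepackets.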

Whenever coherences in the upper manifold are particularly short lived, doorway channel A will dominate the 
evolution of the polarization at least at later times. In that case, the fringes seen in the 4WM signal as the time 
between the doorway and window stages is altered (with a delay line) reflect those Raman frequencies in the 
ground state that can be spectrally embraced by the femtosecond pulse. Then the Fourier transform of the 
fringes leads to the conventional spontaneous RRS of the ground state. Indeed, in the absence of electronic 
resonance, channel B reverts to a purely nonresonant doorway event (D_N) and only channel A reveals Raman
resonances, namely those in the electronic ground state [62].

Whatever the detection technique, the window stage of the 4WM event must convert these evolved vibrational 
wavepackets into the third order polarization field that oscillates at an ensemble distribution of optical 
frequencies. One must be alert to the possibility that the window event after doorway channel B may involve 
resonances from electronic state manifold e to some higher manifold, say r. Thus channel B followed by an ε₃
(ket) or an ε₃* (bra) event might be enhanced by an e-to-r resonance. It is normal, however, to confine the
window event to the e-to-g resonances, often simply for lack of substantive e-to-r information.
Given that the third field action can be of any available colour, and considering only g-to-e resonances, one
has, for any colour ω₃, only two possibilities following each doorway channel. Channel A should be followed
by an ε₃ (ket up) or an ε₃* (bra up) event; channel B should be followed by an ε₃ (bra down) or an ε₃* (ket down)
event.

Before looking more closely at these, it is important to recognize another category of pump-probe Raman 
experiments. These are often referred to as 'transient Raman' pump-probe studies. In these, a given system is 
'pumped' into a transient condition such as an excited vibronic state, or a photochemical event such as 
dissociation or radical formation [63, 64 and 65]. Such pumping can be achieved by any means — even by high 
energy radiation [66, 67 and 68] — though normally laser pumping is used. The product(s) formed by the 
pump step is then studied by a Raman probe (often simply spontaneous Raman, sometimes CARS). Since the 
transient state is normally at low concentrations, the Raman probing seeks out resonant enhancement, as we 
are describing; means must also be taken to stay away from the luminescence background that is
invariably caused by the pump event. Often, time gated Fourier transform Raman in the near-IR is employed
(that spectral region being relatively free of interfering fluorescence) and yet upper e-to-r type resonances
may still be available for RRS. Since transient systems are 'hot' by their very nature, both anti-Stokes as well 
as Stokes spontaneous Raman scattering can be followed to time the vibrational relaxation in transient excited 
states (see [69, 70]). 

Returning to the original pump-probe RRS, it is a simple matter to complete the 4WM WMEL diagrams for 
any proposed RRS. Usually RRS experiments are of the full quadrature sort, both spontaneous RRS as well as 
homodyne detected femtosecond RRS. The latter fit most pump-probe configurations. 

Let us, for example, present the full WMEL diagrams for full quadrature RRS with two colours, 1 and 2.
(Recall that three colours cannot lead to full Q at the 4WM level.) Given the ε₁ε₂* doorway generator for
channels A and B, the generator for the first step of the window event must be either ε₁* or ε₂, and the
corresponding signal must conjugate either with ε₂* (ε_s = ε₂) or with ε₁ (ε_s* = ε₁*), respectively. In the
former case, field 1 will have acted twice (and conjugately) to help produce the third order polarization and
signal field directed along k₂. In the latter case,
field 2 has acted twice (and conjugately) to help produce a third order polarization along −k₁. Of the two
fields, the more intense is clearly the candidate for the twice acting field. The weaker field, then, is examined
for the signal.

In addition, we have asked that a third resonance exist in the window stage. For channel A this requires that
the window event begin either with ket up using ε₂, or with bra up using ε₁*, while the window event
following channel B should begin either with bra down using ε₂ or with ket down using ε₁*. The corresponding
four WMEL diagrams are shown in figure B1.3.5.

Figure B1.3.5. Four WMEL diagrams for fully resonant Raman scattering (RRS). Diagrams (a) and (b) both
have doorway stage D_RR(A_{12*}) (figure B1.3.4(a)), in which a vibrational coherence is created in the ground
electronic state, g. For the window event in (a), field 1 promotes the bra from the ground electronic state, g, to
an excited electronic bra state, e. In this window stage, W_R(A_{1*2}), the third field action helps produce the third
order polarization, which in turn gives rise to a signal field with frequency ω₂. For the window event
W_R(A_{21*}) in (b), field 2 acts to promote the ket in the ground electronic state into one in the excited electronic
state. Now a signal field with frequency −ω₁ is created. Both diagrams (c) and (d) have doorway stage
D_RR(B_{12*}) (figure B1.3.4(b)), in which a vibrational coherence is created in the excited electronic state, e. At the
window stage W_R(B_{1*2}) (c), field 1 demotes the ket from one in electronic state e to one in the ground
electronic state g. A consequent third order polarization leads to a signal field at frequency ω₂. At the window
stage W_R(B_{21*}) (d), the bra is demoted by field 2 to the ground electronic state to produce a signal field at
frequency −ω₁. Full quadrature is achieved in all four diagrams.

In these WMEL diagrams, the outcome of the collapse of the polarization, the second step of the window
stage (the curvy line segment), depends on the phase difference between the induced sth order polarization
and the new (s + 1)th electromagnetic field [7]. If this difference is 0°, then the energy contained in the
polarization is fully converted into population change in the medium (pure Class I spectroscopies). At third
order, channel A populates vibrational levels in the ground electronic state; channel B populates vibrational
levels in the excited electronic state. However, if this phase difference is 90°, the energy is converted fully into
the new (s + 1)th electromagnetic field and the material is unchanged (pure Class II spectroscopies). If the
phase difference lies between 0° and 90°, as is almost always the case, then both outcomes occur to some
extent and the experimentalist may perform either a Class I or a Class II spectroscopy.

As we have already seen, in such a full quadrature situation, phase matching is automatic (the signal is 
collinear with either of the incident fields), so the experiment then measures changes in one or more 
properties of one of the incident fields — either the first appearing light pulse, or the second appearing light 
pulse. These are distinguished both in order of appearance and by their wavevector of incidence. At full 
quadrature, the obvious property to measure is simply the intensity (Class I) as one (or more) of the time 
parameters is changed. With this example, it should be a simple matter to explore WMEL diagrams for any 
other RRS spectroscopies, in particular the nonquadrature ones. 

(E) STIMULATED RAMAN SCATTERING (SRS) 

Since the inception of the laser, the phenomenon called stimulated Raman scattering (SRS) [71, 72, 73 and
74] has been observed while performing spontaneous Raman experiments. Stimulated Raman scattering is
inelastic light scattering by molecules caused by the presence of a light field that is stronger than the zero-point
field of the blackbody. Thus SRS overtakes the spontaneous counterpart. SRS, like SR, is an active Class I
spectroscopy. For the case of Stokes scattering, this light field may be a second laser beam at frequency ω₂, or
it may be from the polarization field that has built up from the spontaneous scattering event. Typically in an
SRS experiment one also observes frequencies other than the Stokes frequency: those at ω₁ − 2ω_R, ω₁ − 3ω_R,
ω₁ − 4ω_R etc, which are referred to as higher order Stokes stimulated Raman scattering, and those at ω₁ + ω_R
(anti-Stokes) and ω₁ + 2ω_R, ω₁ + 3ω_R, ω₁ + 4ω_R (higher order anti-Stokes stimulated Raman scattering) [19]. This
type of scattering may arise only after an appreciable amount of Stokes scattering is produced (unless the
system is not at thermal equilibrium).
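The ladder of frequencies just listed is simply the pump frequency shifted by integer multiples of the Raman frequency. A minimal sketch that tabulates them is given below; the function name, the use of wavenumber units and the sample values are illustrative assumptions.

```python
def srs_orders(omega_1: float, omega_r: float, max_order: int = 4) -> dict:
    """Return the Stokes (omega_1 - n*omega_R) and anti-Stokes
    (omega_1 + n*omega_R) frequencies for n = 1 .. max_order.

    Any consistent unit may be used (here, hypothetically, cm^-1).
    n = 1 is the first order line; n >= 2 are the higher orders.
    """
    orders = range(1, max_order + 1)
    return {
        "stokes": [omega_1 - n * omega_r for n in orders],
        "anti_stokes": [omega_1 + n * omega_r for n in orders],
    }


if __name__ == "__main__":
    # Hypothetical pump at 20000 cm^-1 and Raman mode at 1000 cm^-1.
    lines = srs_orders(20000.0, 1000.0, max_order=3)
    print(lines["stokes"])       # first, second and third order Stokes
    print(lines["anti_stokes"])  # first, second and third order anti-Stokes
```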

First order stimulated Stokes scattering experiences an exponential gain in intensity as the fields propagate
through the scattering medium. This is given by the expression [75]

$$ I_S(z) = I_S(0)\, e^{g_S I_L z} $$

where z is the path length, I_L is the intensity of the incident radiation and I_S(0) is the intensity of the quantum
noise associated with the vacuum state [15], related to the zero-point blackbody field. g_S is known as the stimulated
Raman gain coefficient, whose maximum occurs at exact resonance, ω_L − ω_S = ω_R. For a Lorentzian
lineshape, the maximum gain coefficient is given by

where ω_S is the frequency of the Stokes radiation and γ_R is the HWHM of the Raman line (in units of circular
frequency) [75]. This gain coefficient is seen to be proportional to the spontaneous differential Raman cross
section (dσ_R/dΩ) (the exact nature of which depends on experimental design; see equation (B1.3.20) and
equation (B1.3.21)) and also to the number density of scatterers, N. This frequency dependent gain coefficient
may also be written in terms of the third order nonlinear Raman susceptibility, χ⁽³⁾(ω_S). As with SR, only the
imaginary part of χ⁽³⁾(ω_S) contributes to Stokes amplification (the real part accounts for intensity dependent
(nonlinear) refractive indices). For the definition of χ⁽³⁾ used here (equation (B1.3.11) and equation (B1.3.12)),
their relation is given by

where the value of Im χ⁽³⁾(ω_S) is negative, thus leading to a positive gain coefficient [33]. For this
expression, only the component of the scattered radiation parallel to the incident light is analysed.
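To make the exponential growth law for first order Stokes scattering concrete, the sketch below evaluates I_S(z) = I_S(0) exp(g_S I_L z) and the path length needed to reach a chosen amplification. The function names and the numerical values are hypothetical; only the functional form is taken from the discussion above, and pump depletion is ignored (an assumption that fails once the Stokes field becomes comparable to the pump).

```python
import math


def stokes_intensity(i_s0: float, g_s: float, i_l: float, z: float) -> float:
    """Stokes intensity after path length z in the undepleted-pump limit.

    i_s0 : seed (noise) Stokes intensity I_S(0)
    g_s  : stimulated Raman gain coefficient
    i_l  : incident (pump) intensity I_L
    z    : path length
    Units must be mutually consistent so that g_s * i_l * z is dimensionless.
    """
    return i_s0 * math.exp(g_s * i_l * z)


def length_for_amplification(factor: float, g_s: float, i_l: float) -> float:
    """Path length at which I_S(z) / I_S(0) equals `factor`."""
    return math.log(factor) / (g_s * i_l)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the scaling.
    print(stokes_intensity(1.0, 2.0, 0.5, 3.0))      # exp(3)
    print(length_for_amplification(1.0e6, 2.0, 0.5))  # ln(1e6) / 1.0
```

Because the exponent is the product g_S I_L z, doubling either the pump intensity or the interaction length has the same effect on the gain, which is the reason the forward and backward directions, with their long interaction lengths, dominate.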

Once an appreciable amount of Stokes radiation is generated, enough scatterers are left in an excited 
vibrational state for the generation of anti-Stokes radiation. Also, the Stokes radiation produced may now act 
as incident radiation in further stimulated Raman processes to generate the higher order Stokes fields. 
Although the Stokes field is spatially isotropic, scattered radiation in the forward and backward directions
with respect to the incident light traverses the longest interaction length and thus experiences a significantly
larger gain (typically several orders of magnitude larger than in the other directions) [33]. Thus the first and
higher order Stokes frequencies lie along the direction of the incident beam.

This is not the case for stimulated anti-Stokes radiation. There are two sources of polarization for anti-Stokes
radiation [17]. The first is analogous to that in figure B1.3.3(b), where the action of the blackbody (−ω₂) is
replaced by the action of a previously produced anti-Stokes wave, with frequency ω_A. This radiation actually
experiences an attenuation since the value of Im χ⁽³⁾(ω_A) is positive (leading to a negative 'gain' coefficient).
This is known as stimulated Raman loss (SRL) spectroscopy [76]. However, the second source of anti-Stokes
polarization relies on the presence of Stokes radiation [17]. This anti-Stokes radiation will emerge
from the sample in a direction given by the wavevector algebra: k_A = 2k₁ − k_S. Since the Stokes radiation is
isotropic (−k_S), the anti-Stokes radiation (and subsequent higher order radiation) is emitted in the form of
concentric rings.

(F) SURFACE ENHANCED RAMAN SCATTERING (SERS) 

We have seen that the strength of Raman scattered radiation is directly related to the Raman scattering
cross-section (σ_R). The fact that this cross-section for Raman scattering is typically much smaller than that for
absorption (σ_R ≪ σ_abs) limits conventional SR as a sensitive analytical tool compared to (linear) absorption
techniques. The complication of fluorescence in the usual Raman techniques of course tends to decrease the
signal-to-noise ratio.


It was first reported in 1974 that the Raman spectrum of pyridine is enhanced by many orders of magnitude
when the pyridine is adsorbed on silver metal [77]. This dramatic increase in the apparent Raman cross-section in
the pyridine/silver system was subsequently studied in more detail [78, 79]. The enhancement of the Raman
scattering intensity from molecules adsorbed to surfaces (usually, though not exclusively, noble metals) has
come to be called surface enhanced Raman spectroscopy (SERS). Since these early discoveries, SERS has
been intensively studied for a wide variety of adsorbate/substrate systems [80, 81, 82 and 83]. The 
pyridine/silver system, while already thoroughly studied, still remains a popular choice among investigators 
both for elucidating enhancement mechanisms and for analytical purposes. The fluorescence of the adsorbed 
molecules does not experience similarly strong enhancement, and often is actually quenched. So, since the 
signal is dominated by the adsorbate molecules, fluorescence contamination is relatively suppressed. 

The metal substrate evidently affords a huge (~10 10 and even as high as 10 [84, 85]) increase in the cross- 
section for Raman scattering of the adsorbate. There are two broad classes of mechanisms which are said to 
contribute to this enhancement [86, 87 and 88]. The first is based on electromagnetic effects and the second 
on 'chemical' effects. Of these two classes the former is better understood and, for the most part, the specific 
mechanisms are agreed upon; the latter is more complicated and is less well understood. SERS enhancement 
can take place in either physisorbed or chemisorbed situations, with the chemisorbed case typically 
characterized by larger Raman frequency shifts from the bulk phase. 

The substrate is, of course, a necessary component of any SERS experiment. A wide variety of substrate
surfaces have been prepared for SERS studies by an equally wide range of techniques [87]. Two important
substrates are electrochemically prepared electrodes and colloidal surfaces (either deposited or in solution).

Aside from the presence of a substrate, the SERS experiment is fundamentally similar to the standard
conventional Raman scattering experiment. Often a continuous wave laser, such as an argon ion laser, is used 
as the excitation source, but pulsed lasers can also be used to achieve time resolved SERS. Also, as in 
conventional Raman scattering, one can utilize pre-resonant or resonant conditions to perform resonant SERS 
(often denoted SERRS for surface enhanced resonant Raman scattering). SERRS combines the cross-section 
enhancement of SERS with the electronic resonance enhancement of resonance Raman scattering. In fact, 
through SERRS, one can achieve extraordinary sensitivity, with reports appearing of near-single-molecule- 
based signals [84, 85, 89]. 

We now move on to some Class II spectroscopies. 

(G) COHERENT RAMAN SCATTERING (CRS) 

The major Class II Raman spectroscopy is coherent Raman scattering (CRS) [90, 91, 92 and 93]. It is an
extremely important class of nearly degenerate four-wave mixing spectroscopies in which the fourth wave (or
signal field) is a result of the coherent stimulated Raman scattering. There are two important kinds of CRS
distinguished by whether the signal is anti-Stokes shifted (to the blue) or Stokes shifted (to the red). The
former is called CARS (coherent anti-Stokes Raman scattering) and the latter is called CSRS (coherent Stokes
Raman scattering). Both CARS and CSRS involve the use of two distinct incident laser frequencies, ω₁ and
ω₂ (ω₁ > ω₂). In the typical experiment ω₁ is held fixed while ω₂ is scanned. When ω₁ − ω₂ matches a Raman
frequency of the sample a resonant condition results and there is a strong gain in the CARS or CSRS signal
intensity. The complete scan of ω₂ then traces out the CARS or CSRS spectrum of the sample. (Figure B1.3.2(b)
shows representative WMEL diagrams for the CARS and CSRS processes.) There are, in actuality, 48
WMEL diagrams (including the nonresonant contributions) that one must
consider for either of these two processes. These have been displayed in the literature ([98] (CARS) and [99]
(CSRS)). For both processes, a pair of field-matter interactions produces a vibrational coherence between
states |g⟩ and |f⟩ (see D_R and D_N of figure B1.3.2(b)). For the CARS process, the third field, having frequency
ω₁, acts in phase (the same Fourier component) with the first action of ω₁ to produce a polarization that is
anti-Stokes shifted from ω₁ (see W_R(E) and W_N(F) of figure B1.3.2(b)). For the case of CSRS the third field
action has frequency ω₂ and acts in phase with the earlier action of ω₂ (W_R(C) and W_N(D) of figure B1.3.2(b)).
Unlike the Class I spectroscopies, no fields in CARS or CSRS (or any homodyne detected Class II
spectroscopies) are in quadrature at the polarization level. Since homodyne detected CRS is governed by the
modulus square of χ⁽³⁾ (I_CRS ∝ |χ⁽³⁾|²), its lineshape is not a symmetric lineshape like those in the Class I
spectroscopies, but it depends on both the resonant and nonresonant components of χ⁽³⁾, χ_R and χ_NR,
respectively. Thus

$$ |\chi^{(3)}|^2 = |\chi_R + \chi_{NR}|^2 = |\chi_R|^2 + \chi_{NR}(\chi_R + \chi_R^*) + \chi_{NR}^2 $$

and one is faced with both an absorptive component, |χ_R|², and a dispersive component, χ_NR(χ_R + χ_R*).
(χ_NR can, to a very good approximation, be taken to be a (pure real) constant over the width of the Raman line.)

As a result, the CRS lineshape is asymmetric and more complicated due to this nonresonant background
interference.
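The asymmetry produced by the nonresonant background can be seen in a few lines of arithmetic. The sketch below assumes a single Lorentzian resonance, χ_R = A/(δ − iΓ), with δ the detuning of ω₁ − ω₂ from the Raman frequency, and a real constant χ_NR; the sign convention for δ, the parameter names and the numerical values are illustrative assumptions rather than the definitions used in this chapter.

```python
def crs_intensity(detuning: float, amplitude: float, gamma: float,
                  chi_nr: float) -> float:
    """Homodyne CRS signal, proportional to |chi_R + chi_NR|^2.

    chi_R = amplitude / (detuning - i*gamma) is a Lorentzian resonance
    (assumed form); chi_nr is a real, frequency independent background.
    """
    chi_r = amplitude / complex(detuning, -gamma)
    return abs(chi_r + chi_nr) ** 2


if __name__ == "__main__":
    gamma = 1.0
    for chi_nr in (0.0, 0.5):
        blue = crs_intensity(+gamma, 1.0, gamma, chi_nr)
        red = crs_intensity(-gamma, 1.0, gamma, chi_nr)
        # With chi_nr = 0 the two sides are equal (pure |chi_R|^2 term);
        # with chi_nr != 0 the dispersive cross-term makes them differ.
        print(chi_nr, blue, red)
```

With no background the profile is the symmetric |χ_R|² term alone; with a background, the cross-term changes sign across the resonance and produces the characteristic dispersive distortion.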

The primary advantages of CARS and CSRS include an inherently stronger signal than spontaneous Raman 
scattering (the incident fields are stronger than the zero-point blackbody fields) and one that is directional 
(phase matched). These characteristics combine to give the technique a much lower vulnerability to sample 
fluorescence and also an advantage in remote sensing. For CARS, fluorescence is especially avoided since the 
signal emerges to the blue of the incident laser frequencies and fluorescence must be absent (unless it were 
biphotonically induced). The primary disadvantage of CARS and CSRS is the interference of the nonresonant
part of χ⁽³⁾ in the form of the dispersive cross-term. A class of techniques called polarization CRS utilizes the
control over the polarization of the input beams to suppress the nonresonant background interference [94, 95].
On the other hand, the background interference necessarily carries information about the nonresonant 
component of the electric susceptibility which is sometimes a sought after quantity. 

(H) RAMAN INDUCED KERR EFFECT SPECTROSCOPY (RIKES) 

The nonresonant background prevalent in CARS experiments (discussed above), although much weaker than 
the signals due to strong Raman modes, can often obscure weaker modes. Another technique which can 
suppress the nonresonant background signal is Raman induced Kerr-effect spectroscopy or RIKES [96, 97].

A RIKES experiment is essentially identical to that of CW CARS, except the probe laser need not be tunable. 
The probe beam is linearly polarized at 0° (→), while the polarization of the tunable pump beam is controlled
by a linear polarizer and a quarter waveplate. The pump and probe beams, whose frequency difference must
match the Raman frequency, are overlapped in the sample (just as in CARS). The strong pump beam
propagating through a nonlinear medium induces an anisotropic change in the refractive indices seen by the
weaker probe wave, which alters the polarization of the probe beam [96]. The signal field is polarized
orthogonally to the probe laser and any altered polarization may be detected as an increase in intensity
transmitted through a crossed polarizer. When the pump beam is linearly polarized at 45° (/*), contributions 

from the nonlinear background susceptibility exist (y ^ = 3[v ( "^ + y f3j ])• If the quarter- wave plate is 

adjusted to give circularly polarized light (Q), the nonresonant background will disappear, provided [19].

A unique feature of this Class II spectroscopy is that it occurs in full quadrature, and thus the phase-matching 
condition is automatically fulfilled for every propagation direction and frequency combination (for isotropic 
media). Characteristic WMEL diagrams for this process are given by diagrams B and G of figure B1.3.2(a).
From these diagrams, one may notice that the frequency of the signal field is identical to that of one of the 
incident fields, thus one must carefully align the crossed polarizer to eliminate contamination by the probe 
beam. 

A common technique used to enhance the signal-to-noise ratio for weak modes is to inject a local oscillator 
field polarized parallel to the RIKE field at the detector. This local oscillator field is derived from the probe 
laser and will add coherently to the RIKE field [96]. The relative phase of the local oscillator and the RIKE
field is an important parameter in describing the optical heterodyne detected (OHD)-RIKES spectrum. If the
local oscillator at the detector is in phase with the probe wave, the heterodyne intensity is proportional to
Re{χ⁽³⁾}. If the local oscillator is in phase quadrature with the probe field, the heterodyne intensity becomes
proportional to Im{χ⁽³⁾}. Thus, in addition to signal-to-noise improvements, OHD-RIKES, being a
heterodyne method, demonstrates a phase sensitivity not possible with more conventional homodyne
techniques.

Still another spectroscopic technique used to suppress the nonresonant background is ASTERISK. The setup
is identical to a conventional CARS experiment, except three independent input fields of frequencies ω₁, ω₂
and ω₃ are used. The relative polarization configuration (not wavevectors) for the three incident fields and the
analyser (e_S) is shown in figure B1.3.6, where the signal generated at ω_S will be polarized in the x-direction.
(Both and § are defined to be positive angles, as denoted in figure Bl.3.6 .) The recorded spectra will be 
relatively free of the nonresonant background (for = § ^45°, the detected intensity will be proportional to 
IXabad "" *abbaI 2 ' however one must satisfy the phase matching condition: Ak = k 1 - k 2 + k 3 - k § . In 
transparent materials, a greater than three-orders-of-magnitude reduction of the nonresonant background
occurs as compared to its 25-100-fold suppression by RIKES [96].

Figure B1.3.6. The configuration of the unit polarization vectors e₁, e₂, e₃ and e_S in the laboratory Cartesian
basis as found in the ASTERISK technique.

(I) RAMAN SCATTERING WITH NOISY LIGHT 

In the early 1980s, it was shown that noisy light offers a time resolution of the order of the noise correlation
time, τ_c, of the light (typically tens to several hundreds of femtoseconds), many orders of magnitude faster
than the temporal profile of the light (which is often several nanoseconds, but in principle can be CW) [100,
101 and 102]. A critical review of many applications of noisy light (including CRS) is given by Kobayashi
[103]. A more recent review by Kummrow and Lau [104] contains an extensive listing of references.

A typical noisy light based CRS experiment involves the splitting of a noisy beam (short autocorrelation time, 
broadband) into identical twin beams, B and B', through the use of a Michelson interferometer. One arm of 
the interferometer is computer controlled to introduce a relative delay, τ, between B and B'. The twin beams
exit the interferometer and are joined by a narrowband field, M, to produce the CRS-type third order 
polarization in the sample (w^ - (y H ss ^). The delay between B and B' is then scanned and the frequency- 
resolved signal of interest is detected as a function of τ to produce an interferogram. As an interferometric
spectroscopy, it has come to be called I⁽²⁾CRS (I⁽²⁾CARS and I⁽²⁾CSRS), in which the 'I⁽²⁾' refers to the two
twin incoherent beams that are interferometrically treated [105]. The theory of I⁽²⁾CRS [106, 107] predicts
that the so-called radiation difference oscillations (RDOs) should appear in the 'monochromatically' detected
I⁽²⁾CRS interferogram with a frequency of Δ = ω_M + 2ω_R − ω_D, where ω_D is the detected frequency, ω_M is
the narrowband frequency and ω_R the Raman (vibrational) frequency. Since ω_D and ω_M are known, ω_R may
be extracted from the experimentally measured RDOs. Furthermore, the dephasing rate constant, γ_R, is
determined from the observed decay rate constant, γ, of the I⁽²⁾CRS interferogram. Typically for the I⁽²⁾CRS
signal ω_D ≈ ω_M + 2ω_R and thus Δ ≈ 0. That is, the RDOs represent strongly down-converted (even to zero
frequency) Raman frequencies. This down-conversion is one of the chief advantages of the I⁽²⁾CRS technique,
because it allows for the characterization of vibrations using optical fields but with a much smaller
interferometric sampling rate than is needed in FT-Raman or in FT-IR. More explicitly, the Nyquist sampling
rate criterion for the RDOs is much smaller than that for the vibration itself, not to mention that for the
near-IR FT-Raman technique already discussed. This is particularly striking for high energy modes such as the
C-H vibrations [108]. Modern applications of I⁽²⁾CRS now utilize a 'two-dimensional' time-frequency detection
scheme which involves the use of a CCD camera to detect an entire I⁽²⁾CARS spectrum at every delay time [109].
These are called Raman spectrograms and allow for a greatly enhanced level of precision in the extraction of
the Raman parameters, a precision that considerably exceeds the instrumental uncertainties.
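The bookkeeping behind the down-conversion argument can be sketched as follows, using the relation Δ = ω_M + 2ω_R − ω_D quoted above together with the Nyquist criterion (at least two samples per period). The function names, the use of wavenumbers and the sample values are hypothetical and serve only to show the orders of magnitude involved.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def rdo_frequency(omega_m: float, omega_r: float, omega_d: float) -> float:
    """Radiation difference oscillation frequency, Delta = w_M + 2 w_R - w_D."""
    return omega_m + 2.0 * omega_r - omega_d


def raman_from_rdo(delta: float, omega_d: float, omega_m: float) -> float:
    """Invert the relation above for the Raman frequency w_R."""
    return (delta + omega_d - omega_m) / 2.0


def nyquist_step_fs(wavenumber_cm1: float) -> float:
    """Largest delay step (fs) that still samples an oscillation of the given
    wavenumber at two points per period."""
    return 1.0e15 / (2.0 * C_CM_PER_S * abs(wavenumber_cm1))


if __name__ == "__main__":
    # Hypothetical values in cm^-1: narrowband field, Raman mode, detection.
    delta = rdo_frequency(18000.0, 1000.0, 19990.0)
    print(delta, raman_from_rdo(delta, 19990.0, 18000.0))
    print(nyquist_step_fs(3000.0), nyquist_step_fs(delta))
```

In this example a C-H stretch near 3000 cm⁻¹ would demand delay steps of a few femtoseconds if sampled directly, whereas an RDO of 10 cm⁻¹ tolerates steps some 300 times coarser.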

The understanding of the underlying physical processes behind I⁽²⁾CRS (and noisy light spectroscopies in
general) has been aided by the recent development of a diagrammatic technique called factorized time
correlation (FTC) diagram analysis for properly averaging over the noise components in the incident light
[110, 111 and 112] in any noisy light based spectroscopy (linear or nonlinear).

(J) TIME RESOLVED COHERENT RAMAN SCATTERING (TRCRS) 

With the advent of short pulsed lasers, investigators were able to perform time resolved coherent Raman 
scattering. In contrast to using femtosecond pulses whose spectral width provides the two colours needed to 
produce Raman coherences, discussed above, here we consider pulses having two distinct centre frequencies 
whose difference drives the coherence. Since the 1970s, picosecond lasers have been employed for this 
purpose [ 113 , 114 ], and since the late 1980s femtosecond pulses have also been used [ 115 ]. Here we shall 
briefly focus on the two-colour femtosecond pulsed experiments since they and the picosecond experiments 
are very similar in concept. 

The TR-CRS experiment requires a femtosecond scale light source (originally a rhodamine 6G ring dye laser
[115, 116]) and a second longer pulsed (typically several picoseconds) laser operating at a different frequency.
The femtosecond source at one colour is split into two pulses having a relative and controllable delay, τ,
between them. Each of these two pulses acts once and with the same Fourier component, one in the doorway
stage, the other in the window stage. The third, longer pulsed field at the second colour and in a conjugate
manner participates with one of the femtosecond pulses in the doorway event to produce the Raman
coherence. This polarization then launches the TR-CRS signal field which can be either homodyne or
heterodyne detected. This signal must decay with increasing τ as the Raman coherence is given time to decay
before the window event takes place.

For homodyne detection, the TR-CRS intensity (for Lorentzian Raman lines) is of the form [115]

$$ I_{\mathrm{TR\text{-}CRS}}(\tau) \propto \Bigl| \sum_j e^{-\gamma_j \tau}\, e^{i(\omega_j \tau + \phi_j)} \Bigr|^2 \qquad \text{(B1.3.25)} $$

where j runs over all Raman active modes contained within the bandwidth of the femtosecond scale pulse.
The parameters γ_j, ω_j and φ_j are the dephasing rate constant, the Raman frequency and phase for the jth mode.
One can see from equation (B1.3.25) that for a single mode I_TR-CRS is a simple exponential decay, but when
more modes are involved I_TR-CRS will reveal a more complicated beat pattern due to the cross-terms.
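The single-mode decay and the multi-mode beating can be checked numerically. The sketch below evaluates the modulus square of a sum of damped complex exponentials, exp(−γ_j τ) exp(i(ω_j τ + φ_j)), with equal weights for all modes; the equal weighting, the function name and the mode parameters are assumptions made for illustration.

```python
import cmath
import math


def tr_crs_intensity(tau: float, modes) -> float:
    """Homodyne TR-CRS signal (arbitrary units) at delay tau.

    modes: iterable of (gamma_j, omega_j, phi_j) tuples giving the dephasing
    rate, angular Raman frequency and phase of each mode, in units
    consistent with tau (for example rad/ps and ps).
    """
    amplitude = sum(
        math.exp(-gamma * tau) * cmath.exp(1j * (omega * tau + phi))
        for gamma, omega, phi in modes
    )
    return abs(amplitude) ** 2


if __name__ == "__main__":
    one_mode = [(0.5, 10.0, 0.3)]
    two_modes = [(0.5, 10.0, 0.0), (0.5, 12.0, 0.0)]
    for tau in (0.0, 0.5, 1.0, math.pi / 2.0):
        # One mode: pure exp(-2*gamma*tau) decay. Two modes: decay modulated
        # by a beat at the difference frequency (2.0 here), with a node at
        # tau = pi / 2.
        print(tau, tr_crs_intensity(tau, one_mode), tr_crs_intensity(tau, two_modes))
```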

TR-CRS has been used to study many molecules from benzene [ 115 , 116 , 117 and 118 ] to betacarotene [ 119 ]. 

(K) IMPULSIVE STIMULATED RAMAN SCATTERING (ISRS) 


In discussing RRS above, mention is made of the 'impulsive' preparation of wavepackets in both the excited 
electronic potential surface and the ground state surface. In the absence of electronic resonance only the latter 
channel is operative and ground state wavepackets can be prepared in transparent materials using the spectral 
width of femtosecond light to provide the necessary colours. Such impulsive stimulated Raman scattering 
(ISRS) was first performed by Nelson et al on a variety of systems including acoustic modes in glasses [ 120 ] 
and librational and intramolecular vibrations in liquids [ 121 ]. 

To date, there are two types of configuration employed in ISRS: three-pulse [ 121 ] and two-pulse [ 121 ]. In 
both cases, (an) excitation pulse(s) provide(s) the necessary frequencies to create a vibrational wavepacket 
which proceeds to move within the potential surface. After a delay time τ, a probe pulse having the same 
central frequency enters the sample and converts the wavepacket into an optical polarization and the coherent 
fourth wave is detected. For the case of two-pulse ISRS, the transmitted intensity along the probe pulse is 
followed. In three-pulse ISRS (defined by three wavevectors), the coherently scattered radiation is detected 
along its unique wavevector. The intensity of the scattered (transmitted) pulse as a function of τ shows 
damped oscillations at the frequency of the Raman mode (roughly the reciprocal of the recurrence time as the 
packet oscillates between the two walls of the potential curve). If the pulse durations are longer than the 
vibrational period (the spectral width is too small to embrace the resonance), no such oscillations can occur. 
Since in ISRS the spectral width of each pulse is comparable to the Raman frequency, each pulse contains 
spectral components that produce Stokes and anti-Stokes scattering. The oscillations occur due to the 
interference between the Stokes and anti-Stokes scattering processes [ 122 ]. These processes differ in phase by 
180° (the WMEL rules can show this). This expected phase difference has been demonstrated when 
heterodyne detection is used (the optical Kerr effect probed by a local oscillator field E_LO) and the signal is 
frequency resolved [ 123 ]. 
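
The requirement that the pulse be shorter than the vibrational period can be made concrete with a rough estimate. The sketch below assumes transform-limited Gaussian pulses (time-bandwidth product of about 0.441 for intensity FWHM); the function names are hypothetical and the numbers are illustrative only.

```python
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s
GAUSSIAN_TBP = 0.441         # FWHM time-bandwidth product, transform-limited Gaussian (assumed pulse shape)


def bandwidth_cm1(pulse_fwhm_fs):
    """Spectral FWHM (in cm^-1) of a transform-limited Gaussian pulse."""
    return GAUSSIAN_TBP / (C_CM_PER_S * pulse_fwhm_fs * 1e-15)


def vibrational_period_fs(wavenumber_cm1):
    """Period (in fs) of a vibration with the given Raman wavenumber."""
    return 1e15 / (C_CM_PER_S * wavenumber_cm1)


def is_impulsive(pulse_fwhm_fs, wavenumber_cm1):
    """Crude criterion from the text: pulse duration shorter than the vibrational period."""
    return pulse_fwhm_fs < vibrational_period_fs(wavenumber_cm1)


# A 50 fs pulse spans roughly 294 cm^-1, so a 100 cm^-1 mode
# (period about 334 fs) can be driven impulsively; a 500 fs pulse cannot.
print(bandwidth_cm1(50.0), vibrational_period_fs(100.0))
print(is_impulsive(50.0, 100.0), is_impulsive(500.0, 100.0))
```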

As already mentioned, electronically resonant, two-pulse impulsive Raman scattering (RISRS) has recently 
been performed on a number of dyes [ 124 ]. The main difference between resonant and nonresonant ISRS is 
that the beats occur in the absorption of the probe rather than the spectral redistribution of the probe pulse 
energy [ 124 ]. These beats are 90° out of phase with respect to the beats that occur in nonresonant ISRS (cosine-like 
rather than sine-like). RISRS has also been shown to have the phase of oscillation depend on the detuning 
from electronic resonance and it has been shown to be sensitive to the vibrational dynamics in both the ground 
and excited electronic states [ 122 , 124 ]. 

(L) HIGHER ORDER AND HIGHER DIMENSIONAL TIME RESOLVED TECHNIQUES 

Of great interest to physical chemists and chemical physicists are the broadening mechanisms of Raman lines 
in the condensed phase. Characterization of these mechanisms provides information about the microscopic 
dynamical behaviour of material. The line broadening is due to the interaction between the Raman active 
chromophore and its environment. 

It has been shown that spontaneous or even coherent Raman scattering cannot be used to distinguish between 
the fast homogeneous and the slow inhomogeneous broadening mechanisms in vibrational transitions [ 18 , 
125 ]. One must use higher order (at least fifth order) techniques if one wishes to resolve the nature of the 
broadening mechanism. The ability of these higher order techniques to make this distinction is based on the 
echo phenomena very well known for NMR and mentioned above for D4WM with electronic resonances. The 
true Raman echo experiment is a time resolved seventh order technique which has recently been reported by 
Berg et al [126, 127, 128 and 129]. It is thus an 8WM 
process in which two fields are needed for each step in the normal 4WM echo. A Raman echo WMEL 
diagram is shown in figure B1.3.7. It is seen that, as in CRS, the first two pulsed field actions create a 
vibrational coherence. This dephases until the second pair of field actions creates a vibrational population. 
This is followed by two field actions which again create a vibrational coherence but, now, with opposite phase 
to the first coherence. Hence one obtains a partial rephasing, or echo, of the macroscopic polarization. The 
final field action creates the seventh order optical polarization which launches the signal field (the eighth 
field). Just as for the spin echo in NMR or the electronic echo in 4WM, the degree of rephasing (the 
magnitude of the echo) is determined by the amount of slow time scale (inhomogeneous) broadening of the 
two-level system that is present. Spectral diffusion (the exploration of the inhomogeneity by each 
chromophore) destroys the echo. 
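
The rephasing idea can be illustrated with a toy ensemble. This is a minimal sketch, not a simulation of the actual eight-wave-mixing experiment: it assumes a static Gaussian distribution of frequency offsets (width `sigma`), a single homogeneous dephasing rate `gamma`, and an idealized, instantaneous phase reversal; all names and parameter values are hypothetical.

```python
import cmath
import math


def ensemble_polarization(t1, t2, sigma, gamma, n=2001, rephase=True):
    """Magnitude of the ensemble-averaged coherence.

    Each chromophore has a static frequency offset delta drawn from a Gaussian
    of width sigma (rad/ps), sampled here on a deterministic grid. The
    coherence evolves for t1, its phase is (optionally) reversed, and it then
    evolves for t2. Homogeneous dephasing exp(-gamma * t) is never reversed.
    """
    span = 5.0 * sigma
    sign = -1.0 if rephase else 1.0
    numerator = 0j
    norm = 0.0
    for k in range(n):
        delta = -span + 2.0 * span * k / (n - 1)
        weight = math.exp(-0.5 * (delta / sigma) ** 2)
        numerator += weight * cmath.exp(1j * delta * (sign * t1 + t2))
        norm += weight
    return abs(numerator / norm) * math.exp(-gamma * (t1 + t2))


# Inhomogeneous dephasing kills the coherence within a few ps ...
print(ensemble_polarization(5.0, 0.0, sigma=1.0, gamma=0.1))
# ... but after the phase reversal it reassembles at t2 = t1 (the echo),
# attenuated only by the homogeneous factor exp(-gamma * (t1 + t2)).
print(ensemble_polarization(5.0, 5.0, sigma=1.0, gamma=0.1))
# Without the reversal there is no echo.
print(ensemble_polarization(5.0, 5.0, sigma=1.0, gamma=0.1, rephase=False))
```

In this static model the echo amplitude depends only on the homogeneous rate; spectral diffusion would correspond to letting each offset change during the experiment, which spoils the cancellation.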

Figure B1.3.7. A WMEL diagram for the seventh order Raman echo. The first two field actions create the 
usual Raman vibrational coherence which dephases and, to the extent that inhomogeneity is present, also 
weakens as the coherence from different chromophores 'walks off'. Then such dephasing is stopped when a 
second pair of field actions converts this coherence into a population of the excited vibrational state. This is 
followed by yet another pair of field actions which reconvert the population into a vibrational coherence, but 
now one with phase opposite to the first. Now, with time, the 'walked-off' component of the original 
coherence can reassemble into a polarization peak that produces the Raman echo at frequency ω_s = 2ω_1 − ω_2. 

An alternative fifth order Raman quasi-echo experiment can also be performed [ 130 , 131 , 132 , 133 and 134 ]. 
Unlike the true Raman echo which involves only two vibrational levels, this process requires the presence of 
three very nearly evenly spaced levels. A WMEL diagram for the Raman quasi-echo process is shown in 
figure B1.3.8. Here again the first two field actions create a vibrational coherence which is allowed to 
dephase. This is followed by a second pair of 
field actions which, instead of creating a population, create a different vibrational coherence which is of 
opposite phase and roughly the same order of magnitude as the initial coherence. This serves to allow a 
rephasing of sorts, the quasi-echo, provided the levels are in part inhomogeneously spread. The final field 
action creates the fifth order optical polarization that launches the signal field (the sixth field in this overall 
six-wave mixing process). 

Figure B1.3.8. A WMEL diagram for the three-colour fifth order 'quasi-Raman echo'. As usual, the first pair 
of field actions creates the fg Raman coherence which is allowed both to dephase and 'walk off' with time. 
This is followed by a second pair of field actions, which creates a different but oppositely phased Raman 
coherence (now hf) to the first. Its frequency is at ω_3 − ω_1 = ω_hf. Provided that the ω_hf frequencies are identified 
with an inhomogeneous distribution that is similar to that of the ω_fg frequencies, then a quasi-rephasing is 
possible. The fifth field action converts the newly rephased Raman polarization into the quasi-echo at frequency ω_s. 



As one goes to higher orders, there are many other processes that can and do occur. Some are true fifth or 
seventh order processes and others are 'cascaded' events arising from the sequential actions of lower order 
process [ 135 ]. Many of these cascaded sources of polarization interfere with the echo and quasi-echo signal 
and must be handled theoretically and experimentally. 

The key for optimally extracting information from these higher order Raman experiments is to use two time 
dimensions. This is completely analogous to standard two-dimensional NMR [ 136 ] or two-dimensional 4WM 
echoes. As in NMR, the extra dimension gives information on coherence transfer and the coupling between 
Raman modes (as opposed to spins in NMR). 

With the wealth of information contained in such two-dimensional data sets and with the continued 
improvements in technology, the Raman echo and quasi-echo techniques will be the basis for much activity 
and will undoubtedly provide very exciting new insights into condensed phase dynamics in systems ranging 
from simple molecular materials to those of biological interest. 


(M) SOME OTHER TECHNIQUES 


This survey of Raman spectroscopies, with direct or implicit use of WMEL diagrams, has by no means 
touched upon all of the Raman methods (table B1.3.1 and table B1.3.2). We conclude by mentioning a few 
additional methods without detailed discussion but with citation. The first is Raman microscopy/imaging 
[ 137 ]. This technique combines microscopy and imaging with Raman spectroscopy to add an extra dimension 
to the optical imaging process and hence to provide additional insight into sample surface morphology. 
(Microscopy is the subject of another chapter in this encyclopedia.) The second is Raman optical activity 
[ 138 , 139 ]. This technique measures small differences in Raman scattering intensity between left and right 
circularly polarized light. Such a technique is useful for the study of chiral molecules. The third technique is 
Raman-SPM [ 140 ]. Here Raman spectroscopy is combined with scanning probe microscopy (SPM) (a subject 
of another chapter in this encyclopedia) to form a complementary and powerful tool for studying surfaces and 
interfaces. The fourth is photoacoustic Raman spectroscopy (PARS), which combines CRS with photoacoustic 
absorption spectroscopy. Class I Raman scattering produces excited vibrational or rotational states in a gas 
whose energies are converted to bulk translational heating. A pressure wave is produced that is detected as an 
acoustic signal [ 141 , 142 ]. The fifth is a novel 5WM process (χ^(4)) which can occur in noncentrosymmetric 
isotropic solutions of chiral macromolecules [ 143 ]. This technique has been given the acronym BioCARS 
since it has the potential to selectively record background free vibrational spectra of biological molecules 
[143, 144]. It could be generalized to BioCRS (to include both BioCSRS and BioCARS). The signal is quite 
weak and can be enhanced with electronic resonance. Finally, we touch upon the general class of 
spectroscopies known as hyper-Raman spectroscopies. The spontaneous version is called simply hyper-Raman 
scattering, HRS, a Class I spectroscopy, but there is also the coherent Class II version called CHRS 
(CAHRS and CSHRS) [145, 146 and 147]. These 6WM spectroscopies depend on χ^(5) (Im χ^(5) for HRS) and 
obey the three-photon selection rules. Their signals are always to the blue of the incident beam(s), thus 
avoiding fluorescence problems. The selection rules allow one to probe, with optical frequencies, the usual IR 
spectrum (one photon), though not the conventional Raman active vibrations (two photon), and also new vibrations 
that are symmetry forbidden in both IR and conventional Raman methods. 

Although the fifth order hyper- (H-) Raman analogues exist for most of the Raman spectroscopies at third 
order (HRS [148], RHRS [149, 150], CHRS [145, 146 and 147], SEHRS [151, 152 and 153], SERHRS 
[ 154 ]), let us illustrate the hyper-Raman effect with HRS as the example. In this three-photon process, the 
scattered radiation contains the frequencies 2ω_1 ± ω_R (Stokes and anti-Stokes hyper-Raman scattering), at 
almost twice the frequency of the incident light. The WMEL diagrams are identical to those of SR, except 
each single action of the incident laser field (ω_1) must be replaced by two simultaneous actions of the laser 
field. (The WMEL diagrams for any other 'hyper' process may be obtained in a similar manner.) 
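
As a small worked example of where the hyper-Raman lines fall, the sketch below converts an incident wavelength and a Raman shift into the Stokes and anti-Stokes hyper-Raman wavelengths (two incident quanta minus or plus one vibrational quantum). The 1064 nm excitation and 1000 cm^-1 shift are illustrative values only, and the function name is hypothetical.

```python
def hyper_raman_wavelengths_nm(incident_nm, raman_shift_cm1):
    """Return (stokes_nm, anti_stokes_nm) for hyper-Raman scattering.

    The scattered wavenumbers are 2 * incident wavenumber minus (Stokes)
    or plus (anti-Stokes) the Raman shift.
    """
    incident_cm1 = 1.0e7 / incident_nm          # nm -> cm^-1
    stokes_cm1 = 2.0 * incident_cm1 - raman_shift_cm1
    anti_stokes_cm1 = 2.0 * incident_cm1 + raman_shift_cm1
    return 1.0e7 / stokes_cm1, 1.0e7 / anti_stokes_cm1


stokes_nm, anti_stokes_nm = hyper_raman_wavelengths_nm(1064.0, 1000.0)
# Both lines lie near half the incident wavelength, i.e. far to the blue of the laser.
print(round(stokes_nm, 1), round(anti_stokes_nm, 1))
```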

Experimentally, this phenomenon is difficult to observe (I_HRS/I_RS ≈ 10^-5); however, electronic resonance 
enhancement is again seen to greatly increase the signal intensity [ 148 ]. 

B1.3.4 APPLICATIONS 


To emphasize the versatility of Raman spectroscopy we discuss just a few selected applications of Raman 
based spectroscopy to problems in chemical physics and physical chemistry. 

B1.3.4.1 APPLICATIONS IN SURFACE PHYSICS 

In addition to the many applications of SERS, Raman spectroscopy is, in general, a useful analytical tool 
having many applications in surface science. One interesting example is that of carbon surfaces which do not 
support SERS. Raman spectroscopy of carbon surfaces provides insight into two important aspects. First, 
Raman spectral features correlate with the electrochemical reactivity of carbon surfaces; this allows one to 
study surface oxidation [ 155 ]. Second, Raman spectroscopy can probe species at carbon surfaces which may 
account for the highly variable behaviour of carbon materials [ 155 ]. Another application to surfaces is the use 
of Raman microscopy in the nondestructive assessment of the quality of ceramic coatings [ 156 ]. Finally, an 
interesting type of surface which does allow for SERS is the Mellf (metal liquid-like film) [ 157 ]. Mellfs form 
at organic-aqueous interfaces when colloids of organic and aqueous metal sols are made. Comparisons with 
resonance Raman spectra of the bulk solution can give insight into the molecule-surface interaction and 
adsorption [ 157 ]. 

B1.3.4.2 APPLICATIONS IN COMBUSTION CHEMISTRY 

Laser Raman diagnostic techniques offer remote, nonintrusive, nonperturbing measurements with high spatial 
and temporal resolution [ 158 ]. This is particularly advantageous in the area of combustion chemistry. Physical 
probes for temperature and concentration measurements can be problematic in many combustion systems, such 
as furnaces, internal combustors etc., since they may disturb the medium or, even worse, not withstand the 
hostile environments [ 159 ]. Laser Raman techniques are employed since two of the dominant molecules 
associated with air-fed combustion are O2 and N2. Homonuclear diatomic molecules, which cannot have a nuclear 
coordinate-dependent dipole moment, cannot be diagnosed by infrared spectroscopy. Other combustion 
species include CH4, CO2, H2O and H2 [ 160 ]. These molecules are probed by Raman spectroscopy to 
determine the temperature profile and species concentration in various combustion processes. 

For most practical applications involving turbulent flames and combustion engines, CRS is employed. 
Temperatures are derived from the spectral distribution of the CRS radiation. This may either be determined 
by scanning the Stokes frequency through the spectral region of interest or by exciting the transition in a 
single laser shot with a broadband Stokes beam, thus accessing all Raman resonances in a broad spectral 
region (multiplexing) [ 161 ]. The spectrum may then be observed by a broadband detector such as an optical 
multichannel analyser [ 162 ]. This broadband approach leads to weaker signal intensities, but the entire CRS 
spectrum is generated with each pulse, permitting instantaneous measurements [ 163 ]. Concentration 
measurements can be carried out in certain ranges (0.5-30% [ 161 ]) by using the nonresonant susceptibility as 
an in situ reference standard [ 158 ]. Thus fractional concentration measurements are obtained from the spectral 
profile. 
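
The temperature sensitivity of the spectral distribution ultimately rests on the Boltzmann populations of the probed levels. The sketch below shows only that underlying relation for a single vibrational hot band; it is not the spectral-fitting procedure used in practical CRS thermometry. The N2 vibrational wavenumber of about 2331 cm^-1 is an approximate value assumed here, the harmonic two-level treatment is a simplification, and the function names are hypothetical.

```python
import math

HC_OVER_K_CM_K = 1.438777   # second radiation constant hc/k, in cm K
N2_VIB_CM1 = 2331.0         # approximate N2 vibrational wavenumber (assumed value)


def population_ratio(wavenumber_cm1, temperature_k):
    """Boltzmann ratio n(v=1)/n(v=0) for a non-degenerate vibrational level pair."""
    return math.exp(-HC_OVER_K_CM_K * wavenumber_cm1 / temperature_k)


def temperature_from_ratio(wavenumber_cm1, ratio):
    """Invert the Boltzmann ratio to recover the temperature in kelvin."""
    return -HC_OVER_K_CM_K * wavenumber_cm1 / math.log(ratio)


# At room temperature the N2 hot band is negligible; in a 2000 K flame
# nearly a fifth as many molecules sit in v = 1 as in v = 0.
for temperature in (300.0, 1000.0, 2000.0):
    print(temperature, population_ratio(N2_VIB_CM1, temperature))
```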

The diagnostic themes in art and archeology are to identify pigments and materials that are used in a given 
sample, to date them (and determine changes with time) and to authenticate the origins of a given specimen. 
Diagnosis for forensic purposes may also be considered. In medicine, the principal aim is to provide fast (tens 
of seconds or faster for in vivo studies), noninvasive, nondestructive Raman optical methods for the earliest 
detection of disease, malignancies being those of immediate interest. In a recent review [ 173 ], reference is 
made to the large variety of subjects for Raman probing of diseased tissues and cells. These include plaques in 
human arteries and malignancies in breasts, lungs, brain, skin and the intestine. Hair and nails have been 
studied for signs of disease — metabolic or toxic. Teeth, as well as implants and prostheses, have been 
examined. Foreign inclusions and chemical migration from surgical implants have been studied. 


Another application to biomedicine is to use Raman probing to study DNA biotargets and to identify sequence 
genes. In fact SERS has been applied to such problems [ 174 ]. 


B1.3.4.5 ANALYTICAL/INDUSTRIAL APPLICATIONS 

Examples of the use of Raman spectroscopy in the quantitative analysis of materials are numerous. Technology 
that takes Raman based techniques outside the basic research laboratory has made these spectroscopies also 
available to industry and engineering. It is not possible here to recite even a small portion of applications. 
Instead we simply sketch one specific example. 

Undeniably, one of the most important technological achievements in the last half of this century is the 
microelectronics industry, the computer being one of its outstanding products. Essential to current and future 
advances is the quality of the semiconductor materials used to construct vital electronic components. For 
example, ultra-clean silicon wafers are needed. Raman spectroscopy contributes to this task as a monitor, in 
real time, of the composition of the standard SC-1 cleaning solution (a mixture of water, H2O2 and NH4OH) 
[ 175 ] that is essential to preparing the ultra-clean wafers. 


B1.3.5 A SNAPSHOT OF RAMAN ACTIVITY IN 1998 

In conclusion, we attempt to provide a 'snapshot' of current research in Raman spectroscopy. Since any 
choice of topics must be necessarily incomplete, and certainly would reflect our own scientific bias, we 
choose, instead, an arbitrary approach (at least one that is not biased by our own specialization). Thus an 
abbreviated summary of the topics just presented in the keynote/plenary lectures at ICORS XVI in Cape Town, 
South Africa, is presented. Each of the 22 lectures appears in the Proceedings (and Supplement) of the 16th 
International Conference on Raman Spectroscopy (1998), edited by A M Heyns (Chichester: Wiley) in a four- 
page format, almost all containing a short list of references. Rather than ourselves searching for seminal 
citations, we instead give the e-mail address of the principal author, when available. Though the intent in this 
procedure is to expose the wide scope of current Raman activity, it is hoped that the reader who is looking for 
more details will not hesitate to seek out the author in this fashion, and that the authors will not feel put upon 
by this manner of directing people to their work. To relate to table B1.3.1 and table B1.3.2, the acronym is 
given for the principal Raman spectroscopy that is used for each entry. 

Keynote lecture. T G Spiro, e-mail address: spiro@princeton.edu (RRS and TRRRS). Review of protein 
dynamics followed by TRRRS selective to specific structural and prosthetic elements. 

Plenary 1. D A Long, tel.: + 44 - 1943 608 472. Historical review of the first 70 years of Raman spectroscopy. 

Plenary 2. S A Asher et al, e-mail address: asher@vms.cis.pitt.edu/asher+ (RRS, TRRRS). UV RRS is used 
to probe methodically the secondary structure of proteins and to follow unfolding dynamics. Developing a 
library based approach to generalize the method to any protein. 

Plenary 3. Ronald E Hester et al, e-mail address: reh@york.ac.uk (SERS). Use of dioxane envelope to bring 
water insoluble chromophores (chlorophylls) into contact with aqueous silver colloids for SERS enhancement. 
PSERRS — 'protected surface-enhanced resonance Raman spectroscopy'. 

Plenary 4. George J Thomas Jr et al, e-mail address: thomasgj@cctr.umkc.edu (RS). Protein folding and 
assembly into superstructures. (Slow) time resolved RS probing of virus construction via protein assembly 
into an icosahedral (capsid) shell. 

Plenary 5. Manuel Cardona, e-mail address: cardona@cardix.mpi-stuttgart.de (RS). Studies of high T_c 
superconductors. These offer all possible Raman transitions: phonons, magnons, free carrier excitations, pair 
breaking excitations and mixed modes. 

Plenary 6. Shu-Lin Zhang et al, e-mail address: slzhang@pku.edu.cn (RS). Studies of phonon modes of 
nanoscale one-dimensional materials. Confinement and defect induced Raman transitions. 

Plenary 7. S Lefrant, e-mail address: lefrant@cnrs-imn.fr (RRS and SERS). Raman studies of electronic 
organic materials from conjugated polymers to carbon nanotubes. New insight into chain length distribution, 
charge transfer and diameter distribution in carbon nanotubes offered by Raman probing. 

Plenary 8. J Greve et al, e-mail address: J.Greve@tn.utwente.nl (RS). Confocal direct imaging Raman 
microscope (CDIRM) for probing of the human eye lens. High spatial resolution of the distribution of water 
and cholesterol in lenses. 

Plenary 9. J W Nibler et al, e-mail address: niblerj@chem.orst.edu (CARS and SRS). High resolution studies 
of high lying vibration-rotational transitions in molecules excited in electrical discharges and low density 
monomers and clusters in free jet expansions. Ionization detected (REMPI) SRS or IDSRS. Detect Raman 
lines having an FWHM of 30 MHz (10^-3 cm^-1), possibly the sharpest lines yet recorded in RS. Line 
broadening due to saturation and the ac Stark effect is demonstrated. 

Plenary 10. Hiro-o Hamaguchi, e-mail address: hhama@chem.s.u-tokyo.ac.jp (time and polarization resolved 
multiplex 2D-CARS). Two-dimensional (time and frequency) CARS using broadband dye source and streak 
camera timing. Studies dynamic behaviour of excited (pumped) electronic states. Follows energy flow within 
excited molecules. Polarization control of phase of signal (NR background suppression). 

Plenary 11. W Kiefer et al, e-mail address: wolfgang.kiefer@mail.uni-wue.de (TR CARS). Ultrafast 
impulsive preparation of ground state and excited state wavepackets by impulsive CARS with REMPI 
detection in potassium and iodine dimers. 

Plenary 12. Soo-Y Lee, e-mail address: scileesy@nus.edu.sg (RRS). Addresses fundamental theoretical 
questions in the phase recovery problem in the inverse transform (REP to ABS). See above. 

Plenary 13. Andreas Otto, e-mail address: otto@rz.uni-duesseldorf.de (SERS). A survey of problems and 
models that underlie the SERS effect, now two decades old. Understanding the role of surface roughness in 
the enhancement. 

Plenary 14. A K Ramdas et al, e-mail address: akr@physics.purdue.edu (RS). Electronic RS studies of doped 
diamond as potential semiconducting materials. A Raman active 1s(p_3/2)-1s(p_1/2) transition of a hole 
trapped on a boron impurity both in natural and ^13C diamond. A striking sensitivity of the transition energy to 
the isotopic composition of the host lattice. 

Plenary 15. B Schrader et al, e-mail address: bernhard.schrader@uni-essen.de (NIR-FTRS). A review of the 
use of Raman spectroscopy in medical diagnostics. Its possibilities, limitations and expectations. Emphasizes 
the need for a library of reference spectra and the applications of advanced analysis (chemometry) for 
comparing patient/library spectra. 

Plenary 16. N I Koroteev et al, e-mail address: Koroteev@nik.phys.msu.su (CARS/CSRS, CAHRS, 
BioCARS). A survey of the many applications of what we call the Class II spectroscopies from third order 
and beyond. 2D and 3D Raman imaging. Coherence as stored information, quantum information (the 'qubit'). 
Uses terms CARS/CSRS regardless of order. BioCARS is fourth order in optically active solutions. 


Plenary 17. P M Champion et al, e-mail address: champ@neu.edu (TRRRS). Femtosecond impulsive 
preparation and timing of ground and excited state Raman coherences in heme proteins. Discovery of 
coherence transfer along a de-ligation coordinate. See above for further comment. 

Plenary 18. Robin J H Clark, e-mail address: r.j.h.clark@ucl.ac.uk (RS). Reports on recent diagnostic probing 
of art works ranging from illuminated manuscripts, paintings and pottery to papyri and icons. Nondestructive 
NIR microscopic RS is now realistic using CCD detection. Optimistic about new developments. 

Plenary 19. H G M Edwards, e-mail address: h.g.m.edwards@bradford.ac.uk (NIR-FTRS). A review of 
recent applications of RS to archeology — characterizing ancient pigments, human skin, bone, ivories, teeth, 
resins, waxes and gums. Aging effects and dating possibilities. Emphasizes use of microscopic Raman. 

Plenary 20. M Grimsditch, e-mail address: marcos_grimsditch@qmgate.anl.gov (magnetic field based RS). 
Low frequency Raman scattering from acoustic phonons is known as Brillouin scattering (BS). However any 
kind of small quantum Raman scattering is likewise called BS. Ferromagnetic materials offer spin that 
precesses coherently in the presence of an applied field. Such spin waves or magnons can undergo quantum 
jumps by the inelastic scattering of light. Experiments (and energy level spacing theory) involving surface 
magnons in very thin multilayer slabs (such as Fe/Cr/Fe) are discussed. The energy spacing (the Brillouin 
spectrum) depends on the applied magnetic field, and the RS (or BS) theory is driven by the magnetic 
component of the electromagnetic field, not the electric (as discussed exclusively in the present chapter). 

Plenary 21A. Alian Wang et al, e-mail address: alianw@levee.wustl.edu (RS). (Unable to attend ICORS, but 
abstract is available in proceedings.) With technological advances, Raman spectroscopy now has become a 
field tool for geologists. Mineral characterization for terrestrial field work is feasible and a Raman instrument 
is being designed for the next rover to Mars, scheduled for 2003. 

Plenary 21B. A C Albrecht et al, e-mail address: aca7@cornell.edu (I^(2)CRS) (substituting for plenary 21A). 
Discusses four new applications using a 'third' approach to the Class II spectroscopies (see above). Raman 
spectrograms from I^(2)CARS and I^(3)CARS are seen to (i) decisively discriminate between proposed 
mechanisms for dephasing of the ring breathing mode in liquid benzene, (ii) detect the presence of memory in 
the Brownian oscillator model of dephasing, (iii) determine with very high accuracy Raman frequency shifts 
and bandwidths with changing composition in binary mixtures (moreover, these are successfully related to the 
partial pressures as they change with composition) and (iv) provide a new, definitive way to discriminate 
between two competing processes at fifth order (6WM): cascaded third order or true fifth order. 

Clearly the broad survey of current activity in Raman spectroscopy revealed by this simple snapshot promises 
an exciting future that is likely to find surprising new applications, even as present methods and applications 
become refined. 


APPENDIX 

Here we examine the viewing angle dependence of the differential Raman cross-section for the cases of 
linearly polarized and circularly polarized incident light 5 . The angles used in such experiments are shown in 
figure B1.3.A.9. Experiments involving circularly polarized light are entirely defined in terms of the 
scattering angle, $\xi$, the angle between the wavevectors of the incident ($\mathbf{k}_i$) and scattered ($\mathbf{k}_s$) light. For 
experiments involving linearly polarized light, two angles are needed: $\xi$, the angle between $\mathbf{k}_s$ and the unit 
vector along the direction of polarization of the incident light ($\mathbf{e}_i$), and $\eta$, the polar angle of $\mathbf{k}_s$ in the plane 
perpendicular to $\mathbf{e}_i$. The polarization of the scattered light is analysed along the two axes $\mathbf{e}_a$ and $\mathbf{e}_b$, where $\mathbf{e}_b$ is 
chosen to be perpendicular to $\mathbf{e}_i$. The differential Raman cross-sections for the two analysing directions are 


[176] 


and 




fji) = ^f-^-y,,( g1± E^f 5a:1 - 3i:i y 

The total differential cross-section (equation (B1.3.A1) + equation (B1.3.A2)) is then 


(B1.3.A1) 


(B1.3.A2) 




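$$\left(\frac{d\sigma}{d\Omega}\right)_{\xi,\eta} = \left(\frac{d\sigma}{d\Omega}\right)_{90^\circ} \frac{1 + \rho_l - (1 - \rho_l)\cos^2\xi}{1 + \rho_l} \qquad \text{(B1.3.A3)}$$

and the depolarization ratio 

$$\rho_l(\xi, \eta) = \frac{\rho_l}{1 - (1 - \rho_l)\cos^2\xi} \qquad \text{(B1.3.A4)}$$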


where $\rho_l$ is given by equation (B1.3.23) and $(d\sigma/d\Omega)_{90^\circ}$ is the total differential cross section at 90°, 
$(d\sigma/d\Omega)_\parallel + (d\sigma/d\Omega)_\perp$ (equation (B1.3.20) and equation (B1.3.21)). From expression (B1.3.A3) and expression (B1.3.A4), 
one can see that no new information is gained from a linearly polarized light scattering experiment performed 
at more than one angle, since the two measurables at an angle $(\xi, \eta)$ are given in terms of the corresponding 
quantities at 90°. 
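
The statement that the off-angle measurables carry no new information can be made explicit numerically. The sketch below assumes the standard isotropically averaged form in which the scattered intensity depends on the polarization vectors only through $(\mathbf{e}_i \cdot \mathbf{e}_s)^2$; under that assumption both the depolarization ratio and the total cross-section at angle $\xi$ follow from the 90° depolarization ratio alone. The function names are hypothetical.

```python
import math


def depolarization_ratio(rho_90, xi_deg):
    """Depolarization ratio at angle xi, given its value rho_90 at 90 degrees."""
    cos2 = math.cos(math.radians(xi_deg)) ** 2
    return rho_90 / (1.0 - (1.0 - rho_90) * cos2)


def relative_total_cross_section(rho_90, xi_deg):
    """Total cross-section at angle xi divided by its value at 90 degrees.

    Obtained by summing the two analyser directions under the assumed
    isotropic-average form; not copied from the source equations.
    """
    cos2 = math.cos(math.radians(xi_deg)) ** 2
    return (1.0 + rho_90 - (1.0 - rho_90) * cos2) / (1.0 + rho_90)


# For a depolarized line (rho = 3/4) the ratio rises to 1 when viewing
# along the incident polarization direction (xi = 0).
for xi in (0.0, 45.0, 90.0):
    print(xi, depolarization_ratio(0.75, xi), relative_total_cross_section(0.75, xi))
```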



Figure B1.3.A.9. Diagram depicting the angles used in scattering experiments employing linearly and 
circularly polarized light. The subscripts i and s refer to the incident and scattered beam respectively. 

For a circularly polarized light experiment, one can measure the cross sections for either right (r) or left (l) 
polarized scattered light. Suppose that right polarized light is made incident on a Raman active sample. The 
general expressions for the Raman cross sections are [ 176 ] 

and 


In analogy with the depolarization ratio for linearly polarized light, the ratio of the two above quantities is 
known as the reversal coefficient, $R(\xi)$, given by 

where the zero angle reversal coefficient, $R(0)$, is $6\Sigma^2/(10\Sigma^0 + 5\Sigma^1 + \Sigma^2)$. Measurement of the reversal 
coefficient (equation (B1.3.A5)) at two appropriate scattering angles permits one to determine both $\rho_l$ and 
$R(0)$. Thus, only with circularly polarized light is one able to quantify all three rotational tensor invariants 
[176]. 
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applications to Raman scattering J. Phys. Chem. A 104 4167-73. 

In fact averaging over an odd number of direction cosines need not always vanish for an isotropic system. This is the 
;ase for solutions containing chiral centres which may exhibit even order signals such as 'BioCARS' in table B1.3.2 . 
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RS (or BS) theory is driven by the magnetic component of the electromagnetic field, not the electric (as 
discussed exclusively in the present chapter). 

Plenary 21A. Alian Wang et al, e-mail address: alianw@levee.wustl.edu (RS). (Unable to attend ICORS, but
abstract is available in proceedings.) With technological advances, Raman spectroscopy has now become a
field tool for geologists. Mineral characterization for terrestrial field work is feasible and a Raman instrument
is being designed for the next rover to Mars, scheduled for 2003.

Plenary 21B. A C Albrecht et al, e-mail address: aca7@cornell.edu (I(2)CRS) (substituting for plenary 21A).
Discusses four new applications using a 'third' approach to the Class II spectroscopies (see above). Raman
spectrograms from I(2)CARS and I(3)CARS are seen to (i) decisively discriminate between proposed
mechanisms for dephasing of the ring breathing mode in liquid benzene, (ii) detect the presence of memory in
the Brownian oscillator model of dephasing, (iii) determine with very high accuracy Raman frequency shifts
and bandwidths with changing composition in binary mixtures (moreover, these are successfully related to the
partial pressures as they change with composition) and (iv) provide a new, definitive way to discriminate
between two competing processes at fifth order (6WM): cascaded third order or true fifth order.

Clearly the broad survey of current activity in Raman spectroscopy revealed by this simple snapshot promises 
an exciting future that is likely to find surprising new applications, even as present methods and applications 
become refined. 


APPENDIX 

Here we examine the viewing angle dependence of the differential Raman cross-section for the cases of
linearly polarized and circularly polarized incident light⁵. The angles used in such experiments are shown in
figure B1.3.A.9. Experiments involving circularly polarized light are entirely defined in terms of the scattering
angle, ζ, the angle between the wavevectors of the incident (k_i) and scattered (k_s) light. For experiments
involving linearly polarized light, two angles are needed: ξ, the angle between k_s and the unit vector along the
direction of polarization of the incident light (e_i), and η, the polar angle of k_s in the plane perpendicular to e_i.
The polarization of the scattered light is analysed along the two axes e_a and e_b, where e_b is chosen to be
perpendicular to e_i. The differential Raman cross-sections for the two analysing directions are [176]
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Figure B1.3.A.9. Diagram depicting the angles used in scattering experiments employing linearly and 
circularly polarized light. The subscripts i and s refer to the incident and scattered beam respectively. 




(B1.3.A1) 


and 


,2 \2 


mM^)'^i^} 


(B1.3.A2) 


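The total differential cross-section (equation (B1.3.A1) + equation (B1.3.A2)) is then

\[
\left(\frac{d\sigma}{d\Omega}\right) = \left(\frac{d\sigma}{d\Omega}\right)_{90^\circ}\left[1 - \frac{1-\rho_l}{1+\rho_l}\cos^2\xi\right] \qquad \text{(B1.3.A3)}
\]

and the depolarization ratio

\[
\rho(\xi,\eta) = \frac{\rho_l}{1-(1-\rho_l)\cos^2\xi} \qquad \text{(B1.3.A4)}
\]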


where ρ_l is given by equation (B1.3.23) and (dσ/dΩ)_90° is the total differential cross section at 90°, (dσ/dΩ)_∥ +
(dσ/dΩ)_⊥ (equation (B1.3.20) and equation (B1.3.21)). From expression (B1.3.A3) and expression (B1.3.A4),
one can see that no new information is gained from a linearly polarized light scattering experiment performed
at more than one angle, since the two measurables at an angle (ξ, η) are given in terms of the corresponding
quantities at 90°.


For a circularly polarized light experiment, one can measure the cross sections for either right (r) or left (l)
polarized scattered light. Suppose that right polarized light is made incident on a Raman active sample. The
general expressions for the Raman cross sections are [176]

/r v v / \+^[(l + cos C) 2 + 12(1 -cos- 


and 


{^--Gi)^™ 3 


, + ]?<;[(- 1 + cos{) 2 + 1 2(1+ cos f)]2 2 


In analogy with the depolarization ratio for linearly polarized light, the ratio of the two above quantities is
known as the reversal coefficient, R(ζ), given by

r,,^ _ \Mth 2<Hvn) M " *> I -/Hill LOh * , R1U « 

* { ° = TE) = i_j=^^ c + i=Em cos , (B13A5) 

where the zero angle reversal coefficient, R(0), is 6Σ²/(10Σ⁰ + 5Σ¹ + Σ²). Measurement of the reversal
coefficient (equation (B1.3.A5)) at two appropriate scattering angles permits one to determine both ρ_l and
R(0). Thus, only with circularly polarized light is one able to quantify all three rotational tensor invariants
[176].
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1 A version of this material appears in a special issue of the Journal of Physical Chemistry dedicated to the Proceedings of 
the International Conference on Time-Resolved Vibrational Spectroscopy (TRVS IX), May 16-22 1999, Tucson, Arizona. 
See: Kirkwood J C, Ulness D J and Albrecht A C 2000 On the classification of the electric field spectroscopies: 
applications to Raman scattering J. Phys. Chem. A 104 4167-73. 

2 In fact, averaging over an odd number of direction cosines need not always vanish for an isotropic system. This is the
case for solutions containing chiral centres which may exhibit even order signals such as 'BioCARS' in table B1.3.2 . 


3 Here, we have averaged over all possible orientations of the molecules. (See [26].) 

4 Raman cross-sections, based on the linear polarizability, are now routinely subject to quantum chemical calculations. 
These may be found as options in commercial packages such as 'Gaussian 98' (Gaussian Inc., Pittsburgh, PA). 

5 This treatment is essentially that given in [ 176 ]. 
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B1.4 Microwave and terahertz spectroscopy 

Geoffrey A Blake 


B1.4.1 INTRODUCTION 

Spectroscopy, or the study of the interaction of light with matter, has become one of the major tools of the
natural and physical sciences during this century. As the wavelength of the radiation is varied across the
electromagnetic spectrum, characteristic properties of atoms, molecules, liquids and solids are probed. In the
optical and ultraviolet regions (λ ~ 1 μm down to 100 nm) it is the electronic structure of the material that is
investigated, while at infrared wavelengths (~1-30 μm) the vibrational degrees of freedom dominate.

Microwave spectroscopy began in 1934 with the observation of the ~20 GHz absorption spectrum of
ammonia by Cleeton and Williams. Here we will consider the microwave region of the electromagnetic
spectrum to cover the 1 to 100 × 10⁹ Hz, or 1 to 100 GHz (λ ~ 30 cm down to 3 mm), range. While the
ammonia microwave spectrum probes the inversion motion of this unique pyramidal molecule, more typically
microwave spectroscopy is associated with the pure rotational motion of gas phase species.
The section of the electromagnetic spectrum extending roughly from 0.1 to 10 × 10¹² Hz (0.1-10 THz, 3-300
cm⁻¹) is commonly known as the far-infrared (FIR), submillimetre or terahertz (THz) region, and therefore
lies between the microwave and infrared windows. Accordingly, THz spectroscopy shares both scientific and 
technological characteristics with its longer- and shorter-wavelength neighbours. While rich in scientific 
information, the FIR or THz region of the spectrum has, until recently, been notoriously lacking in good 
radiation sources — earning the dubious nickname 'the gap in the electromagnetic spectrum'. At its high- 
frequency boundary, most coherent photonic devices (e.g. diode lasers) cease to radiate due to the long 
lifetimes associated with spontaneous emission at these wavelengths, while at its low-frequency boundary 
parasitic losses reduce the oscillatory output from most electronic devices to insignificant levels. As a result, 
existing coherent sources suffer from a number of limitations. This situation is unfortunate since many 
scientific disciplines — including chemical physics, astrophysics, cosmochemistry and planetary/atmospheric 
science to name but a few — rely on high-resolution THz spectroscopy (both in a spectral and temporal sense). 
In addition, technological applications such as ultrafast signal processing and massive data transmission 
would derive tremendous enhancements in rate and volume throughput from frequency-agile THz 
synthesizers. 
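
The band edges quoted above for the microwave and THz regions follow from λ = c/ν and the wavenumber ν/c. The short Python sketch below reproduces them as a consistency check; the function names are illustrative choices and do not come from the original text.

```python
C = 299_792_458.0  # speed of light in m/s (exact SI value)

def wavelength_m(freq_hz):
    """Vacuum wavelength in metres for a frequency in hertz."""
    return C / freq_hz

def wavenumber_cm(freq_hz):
    """Wavenumber in cm^-1 for a frequency in hertz."""
    return freq_hz / (C * 100.0)

# Band edges quoted in the text: 1-100 GHz (microwave) and 0.1-10 THz (THz/FIR).
for label, freq in [("1 GHz", 1e9), ("100 GHz", 100e9),
                    ("0.1 THz", 0.1e12), ("10 THz", 10e12)]:
    print(f"{label:>8}: wavelength = {wavelength_m(freq) * 1e3:.3g} mm, "
          f"wavenumber = {wavenumber_cm(freq):.3g} cm^-1")
```

The output should land close to the rounded figures in the text: about 30 cm and 3 mm for the microwave limits, and about 3 and 330 cm⁻¹ for the THz limits.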

In general, THz frequencies are suitable for probing low-energy light-matter interactions, such as rotational 
transitions in molecules, phonons in solids, plasma dynamics, electronic fine structure in atoms, thermal 
imaging of cold sources and vibrational-rotation-tunnelling behaviour in weakly bound clusters. Within the 
laboratory, THz spectroscopy of a variety of molecules, clusters and condensed phases provides results that 
are critical to a proper interpretation of the data acquired on natural sources, and also leads to a better 
understanding of important materials — particularly hydrogen-bonded liquids, solids and polymers that 
participate in a variety of essential (bio)chemical processes. 

Spectroscopy at THz frequencies also holds the key to our ability to remotely sense
environments as diverse as primaeval galaxies, star and planet-forming molecular cloud cores, comets and
planetary atmospheres.


In the dense interstellar medium characteristic of sites of star formation, for example, scattering of visible/UV
light by sub-micron-sized dust grains makes molecular clouds optically opaque and lowers their internal
temperature to only a few tens of kelvin. The thermal radiation from such objects therefore peaks in the FIR
and only becomes optically thin at even longer wavelengths. Rotational motions of small molecules and
rovibrational transitions of larger species and clusters thus provide, in many cases, the only or the most
powerful probes of the dense, cold gas and dust of the interstellar medium.
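
The statement that thermal emission from gas at a few tens of kelvin peaks in the FIR can be checked with Wien's displacement law. The sketch below assumes an ideal blackbody (real dust emission is modified by the grain emissivity) and uses the CODATA displacement constants; the temperatures are illustrative.

```python
WIEN_B_M_K = 2.897771955e-3       # Wien displacement constant (wavelength form), m K
WIEN_NU_HZ_PER_K = 5.878925757e10  # Wien displacement constant (frequency form), Hz/K

def peak_wavelength_um(temperature_k):
    """Wavelength (micrometres) at which B_lambda of a blackbody peaks."""
    return WIEN_B_M_K / temperature_k * 1e6

def peak_frequency_thz(temperature_k):
    """Frequency (THz) at which B_nu of a blackbody peaks."""
    return WIEN_NU_HZ_PER_K * temperature_k / 1e12

for temperature in (10.0, 30.0, 100.0):
    print(f"T = {temperature:5.1f} K: peak wavelength = {peak_wavelength_um(temperature):6.1f} um, "
          f"peak frequency = {peak_frequency_thz(temperature):.2f} THz")
```

For a 30 K source both measures of the peak (roughly 97 μm, or about 1.8 THz in the frequency form) fall inside the 0.1-10 THz window defined earlier.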

Since the major drivers of THz technology have been scientists, particularly physicists and astrophysicists 
seeking to carry out fundamental research, and not commercial interests, a strong coupling of technology 
development efforts for remote sensing with laboratory studies has long characterized spectroscopy at 
microwave and THz frequencies. In many respects the field is still in its infancy, and so this chapter will 
present both an overview of the fundamentals of microwave and THz spectroscopy as well as an assessment 
of the current technological state of the art and the potential for the future. We will begin with a brief 
overview of the general characteristics of THz spectrometers and the role of incoherent sources and detection
strategies in the THz region, before turning to a more detailed description of the various coherent THz sources
developed over the past decade and their applications to both remote sensing and laboratory studies.


B1.4.2 INCOHERENT THZ SOURCES AND BROADBAND 
SPECTROSCOPY 

B1.4.2.1 PRINCIPLES AND INSTRUMENTATION

Like most other fields of spectroscopy, research at THz frequencies in the first half of the twentieth century 
was carried out with either dispersive (i.e. grating-based) or Fourier transform spectrometers. The much 
higher throughput of Fourier transform spectrometers compared to those based on diffraction gratings has 
made THz Fourier transform spectroscopy, or THz FTS, the most popular incoherent technique for acquiring 
data over large regions of the THz spectrum. This is especially true for molecular line work where THz FTS 
resolutions of order 50-100 MHz or better have been obtained [1]. With large-format detector arrays, such as
those available at optical through near- to mid-infrared wavelengths, grating- or Fabry-Perot-based 
instruments can provide superior sensitivity [2], but have not yet been widely utilized at THz frequencies due 
to the great difficulty of fabricating arrays of THz detectors. 

The components of THz spectrometers can be grouped into three main categories: sources (e.g. lasers, Gunn 
oscillators, mercury-discharge lamps), propagating components (e.g. lenses, sample cells, filters) and 
detectors (e.g. bolometers, pyroelectric detectors, photoacoustic cells). Propagating components in the FIR are 
well established (see [3] for an excellent overview of technical information). In the area of detectors, recent 
progress has placed them ahead of source technology. For example, spider-web Si bolometers developed by
Bock et al [4] have an electrical NEP (noise-equivalent power) of 4 × 10⁻¹⁷ W Hz^(-1/2) when cooled to 300 mK.
For those who desire less exotic cryogenic options, commercially available Si-composite bolometers offer an 

1 O 1 / 

electrical NEP of 1 x 10 W Hz , operating at liquid helium temperature (4.2 K), and an electrical NEP of 
3 × 10⁻¹⁵ W Hz^(-1/2), operating at 1.2 K (pumped LHe). Combining these into large-format arrays remains a
considerable technological challenge, although arrays of several tens of pixels on a side are now beginning to 
make their way into various telescopes such as the Caltech Submillimeter Observatory and the James Clerk 
Maxwell Telescope [5, 6]. In addition, their electrical bandwidths are typically only between a few hundred 
hertz and 1 kHz. Fast-modulation schemes cannot therefore be used, and careful attention must be paid to 1/f
noise in experiments with Si bolometers. Hot-electron bolometers based on InSb offer electrical
bandwidths of 1 MHz, but without cyclotron-assisted resonance, InSb THz bolometers cannot be used above
ν = 25-30 cm⁻¹ [7]. Similarly, photoconductors based on Ga:Ge offer high electrical speed and good quantum
efficiency, but due to the bandgap of the material are unusable below 50-60 cm⁻¹ [8].

In practice, the NEP of a room-temperature THz spectrometer is usually limited by fluctuations (shot-noise) in
the ambient blackbody radiation. Using an optical bandwidth Δν = 3 THz (limited by, for example, a
polyethylene/diamond dust window), a field of view (at normal incidence) = 9 and a detecting diameter
(using a so-called Winston cone, which condenses the incident radiation onto the detecting element) d = 1.1
cm, values that are typical for many laboratory applications, the background-limited NEP of a bolometer is
given by

(B1.4.1) 

where k is the Boltzmann constant, T = 300 K, Δf (the electronic amplified bandwidth) = 1 Hz, and λ (the
band-centre wavelength) ≈ 200 μm. The equation above uses the Rayleigh-Jeans law, which is valid for = 17
THz. Therefore, for laboratory absorption experiments, a typical FIR detector provides an estimated detection
limit (NEP/source power) of 10⁻⁴ with a source output of 20 nW. In general, high-sensitivity bolometers
saturate at an incident-power level of ≈1 μW or less, resulting in an ultimate detection limit of 10⁻⁷. For yet
higher dynamic range, a filter element (e.g. cold grating, prism, or etalon) must be placed before the detector
to reduce background noise, or the background temperature must be lowered. Note the (Δν)^(1/2) dependence in
equation (B1.4.1), which means that the optical bandwidth must be reduced to ~30 GHz to drop the NEP by a
factor of ten. Thus, unlike shorter wavelength regions of the electromagnetic spectrum, due to the high
background luminosity in the THz, spectroscopic sensitivity in the laboratory is limited by the source power,
in comparison to the background power, incident on the detector, not by shot-noise of the available
spectroscopic light sources. As higher-power light sources are developed at THz frequencies, lower-NEP
detectors can be utilized that are less prone to saturation, and shot-noise will become the limiting factor, as it
is in other regions of the electromagnetic spectrum.
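
The bandwidth argument above rests only on the square-root scaling of the background-limited NEP with optical bandwidth. The sketch below illustrates that scaling; it does not reproduce equation (B1.4.1) itself, and the function names are hypothetical.

```python
import math

def nep_scale_factor(new_bandwidth_hz, old_bandwidth_hz):
    """Ratio NEP_new / NEP_old, assuming NEP is proportional to sqrt(optical bandwidth)."""
    return math.sqrt(new_bandwidth_hz / old_bandwidth_hz)

def bandwidth_for_nep_reduction(old_bandwidth_hz, reduction_factor):
    """Optical bandwidth needed to lower the NEP by reduction_factor under the same scaling."""
    return old_bandwidth_hz / reduction_factor ** 2

start_bandwidth = 3e12  # the 3 THz optical bandwidth quoted in the text
needed = bandwidth_for_nep_reduction(start_bandwidth, 10.0)
print(f"Bandwidth for a tenfold lower NEP: {needed / 1e9:.0f} GHz")
print(f"NEP ratio at that bandwidth: {nep_scale_factor(needed, start_bandwidth):.2f}")
```

Because the dependence is only a square root, a hundredfold narrower filter buys just one order of magnitude in NEP, which is why the text treats source power as the more effective lever.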

For broadband THz FTS instruments this large background actually leads to a 'multiplex disadvantage' in that 
the room-temperature laboratory background can easily saturate the sensitive THz detectors that are needed to 
detect the feeble output of incoherent THz blackbody sources, which drop rapidly as the wavelength 
increases. The resulting sensitivity is such that signal-to-noise ratios in excess of 100 are difficult to generate 
at the highest feasible resolutions of 50-100 MHz [1], which is still quite large compared to the 1-2 MHz
Doppler-limited line widths at low pressure. For low-resolution work on condensed phases, or for the 
acquisition of survey spectra, however, THz FTS remains a popular technique. Beyond wavelengths of ~1 
mm, the sensitivity of FTS is so low that the technique is no longer competitive with the coherent approaches 
described below. 

B1.4.2.2 THZ FTS STUDIES OF PLANETARY ATMOSPHERES

Rather different circumstances are encountered when considering THz remote sensing of 
extraterrestrial sources. The major source of THz opacity in the Earth's atmosphere is water vapour, 
and from either high, dry mountain sites or from space there are windows in which the background 
becomes very small. Incoherent instruments which detect the faint emission from astronomical 
sources can therefore be considerably more sensitive than their laboratory
counterparts. Again, grating- or etalon-based and FTS implementations can be considered, with the former
being preferred if somewhat coarse spectral resolution is desired or if large-format detector arrays are 
available. 

In planetary atmospheres, a distinct advantage of THz studies over those at optical and infrared wavelengths is 
the ability to carry out spectroscopy without a background or input source such as the sun. Global maps of a 
wide variety of species can therefore be obtained at any time of day or night and, when taken at high enough 
spectral resolution, the shapes of the spectral lines themselves also contain additional information about 
vertical abundance variations and can be used to estimate the atmospheric temperature profile. The 'limb 
sounding' geometry, in which microwave and THz emission from the limb of a planetary atmosphere is 
imaged by an orbiting spacecraft or a balloon, is particularly powerful in this regard, and excellent reviews are 
available on this subject [9]. The observing geometry is illustrated in figure B1.4.1, which also presents a
portion of the Earth's stratospheric emission spectrum near 118 cm⁻¹ obtained by the balloon-borne
Smithsonian Astrophysical Observatory limb-sounding THz FTS [10].


B1.4.3 COHERENT THZ SOURCES AND HETERODYNE
SPECTROSCOPY

B1.4.3.1 PRINCIPLES AND INSTRUMENTATION

The narrow cores of atmospheric transitions shown in figure B1.4.1 can be used, among other things, to trace
the wind patterns of the upper atmosphere. For such work, or for astronomical remote-sensing efforts,
resolutions of the order of 30-300 m s⁻¹ are needed to obtain pressure broadening or kinematic information,
which correspond to spectral resolutions of (ν/Δν) ~ 1-10 × 10⁶. Neither FTS nor grating-based spectrometers
can provide resolution at this level and so other techniques based on coherent radiation sources must be used.
The most important of these is called heterodyne spectroscopy. Heterodyne spectroscopy uses nonlinear
detectors called mixers in order to downconvert the high-frequency THz radiation into radiofrequency or
microwave signals that can be processed using commercial instrumentation. An outline of a heterodyne
receiver is presented in figure B1.4.2. In such a receiver, an antenna (dish) collects radiation from space, and
this radiation is focused onto a detector operating in heterodyne mode, which means that the incoming signal
is mixed with the output of a coherent source (called the local oscillator, or LO). Now, if a device can be
constructed that responds quadratically, rather than linearly, to the two input beams, the output, S(t), of such a
device is given by
where the identities 2 cos α cos β = cos(α + β) + cos(α − β) and cos²α = ½(1 + cos 2α) are invoked. The last term is
a field oscillating at a frequency equal to the difference between the two incident fields, representing the
beat-note between ν₁ and ν₂.
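
The square-law mixing described above can be illustrated numerically. The sketch below squares the sum of two cosines and measures the amplitude of selected spectral components with a single-bin Fourier projection. The frequencies are scaled-down, hypothetical stand-ins for the LO and sky signals (kHz rather than THz), and the function name is an illustrative choice. For field amplitudes E_LO and E_sky, the standard expansion of the squared sum predicts sum- and difference-frequency terms of amplitude E_LO·E_sky each.

```python
import cmath
import math

def tone_amplitude(samples, freq_hz, sample_rate_hz):
    """Amplitude of the cosine component at freq_hz (single-bin DFT projection).

    Exact when the record holds a whole number of cycles of every tone present.
    """
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
              for k, s in enumerate(samples))
    return 2.0 * abs(acc) / n

# Hypothetical scaled-down parameters (Hz); a real receiver works at THz.
SAMPLE_RATE = 8000.0
N_SAMPLES = 8000              # one second of data, so all tones fit whole cycles
F_LO, F_SKY = 1000.0, 1010.0
E_LO, E_SKY = 1.0, 0.01       # strong local oscillator, weak remote signal

field = [E_LO * math.cos(2 * math.pi * F_LO * k / SAMPLE_RATE)
         + E_SKY * math.cos(2 * math.pi * F_SKY * k / SAMPLE_RATE)
         for k in range(N_SAMPLES)]
output = [e * e for e in field]  # quadratic (square-law) detector response

if_amplitude = tone_amplitude(output, F_SKY - F_LO, SAMPLE_RATE)   # difference frequency
sum_amplitude = tone_amplitude(output, F_SKY + F_LO, SAMPLE_RATE)  # sum frequency
off_amplitude = tone_amplitude(output, 25.0, SAMPLE_RATE)          # no tone expected here

print(f"difference-frequency amplitude: {if_amplitude:.6f}")
print(f"sum-frequency amplitude:        {sum_amplitude:.6f}")
print(f"amplitude at an unrelated bin:  {off_amplitude:.2e}")
```

The difference-frequency amplitude scales with the product of the two field amplitudes, so raising the LO level raises the downconverted signal in proportion.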
Figure B1.4.1. Top: schematic illustration of the observing geometry used for limb sounding of the Earth's
atmosphere. Bottom: illustrative stratospheric OH emission spectra acquired by the SAO FIRS-2 far-infrared
balloon-borne FTS in autumn 1989. The spectra are from a range of tangent heights (h = tangent height in the
drawing), increasing toward the bottom, where the data are represented by solid curves; nonlinear least-square
fits to the measurements, based on a combination of laboratory data, the physical structure of the stratosphere
and a detailed radiative transfer calculation, are included as dashed curves. The OH lines are F₁ (²Π₃/₂), 7/2⁻
→ 5/2⁺ and 7/2⁺ → 5/2⁻ (the hyperfine structure is unresolved in these measurements). Other major
contributing lines are also identified [10].
Figure B1.4.2. (A) Basic components of an astronomical heterodyne receiver. The photomicrograph in (B)
presents the heart of a quasi-optical SIS mixer and its associated superconducting tuning circuits, while the 
image in (C) shows the fully assembled mixer, as it would be incorporated into a low-temperature cryostat (J 
Zmuidzinas, private communication). 


If ν₁ = ν_LO and ν₂ = ν_sky ≈ ν_LO, the sum frequency lies at THz frequencies, while the difference lies at radio
or microwave frequencies, and is called the intermediate frequency, or IF. The IF can be amplified and
recorded (for example, on a spectrum analyser or by a digital correlator or filterbank) across a range of
frequencies simultaneously. The fact that the IF power is proportional to the product of the remote-signal
power and the LO power results in two main advantages: (a) the signal-to-noise ratio is enhanced by using an
LO power that is much higher than that of the remote signal and (b) the spectral resolution is set by the
linewidth of the LO, which can be as narrow as desired.
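
To see what "as narrow as desired" has to mean in practice, the 30-300 m s⁻¹ velocity resolutions quoted earlier can be converted to a resolving power and to a frequency width through the first-order Doppler relation Δν/ν = Δv/c. The sketch below does this; the 1 THz observing frequency is an assumed example value.

```python
C = 299_792_458.0  # speed of light in m/s (exact SI value)

def resolving_power(velocity_resolution_m_s):
    """Resolving power nu/delta_nu for a given velocity resolution (first-order Doppler)."""
    return C / velocity_resolution_m_s

def frequency_width_hz(freq_hz, velocity_resolution_m_s):
    """Frequency interval corresponding to a velocity interval at freq_hz."""
    return freq_hz * velocity_resolution_m_s / C

OBSERVING_FREQ_HZ = 1.0e12  # assumed example: a line observed near 1 THz
for dv in (30.0, 300.0):
    print(f"dv = {dv:5.0f} m/s: nu/dnu = {resolving_power(dv):.2e}, "
          f"dnu at 1 THz = {frequency_width_hz(OBSERVING_FREQ_HZ, dv) / 1e3:.0f} kHz")
```

At 1 THz the LO linewidth and the backend channel width would therefore need to be of the order of 0.1-1 MHz or better to deliver the quoted velocity resolution.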
Among the first THz mixers to be constructed were those based on room-temperature Schottky diodes [11].
Over the past decade, new mixers based on superconducting tunnel junctions have been developed that have
effective noise levels only a few times the quantum limit of T_mixer = hν/k [12]. However, certain conditions
must be met in order to exploit these advantages. For example, while the LO power should be strong
compared to the remote signal and to the noise, it must not be so strong as to saturate the detecting element.
Roughly speaking, the optically coupled LO power level is about 1 μW for SIS (superconductor-insulator-
superconductor) mixers and 100 nW for superconductive Nb HEBs (hot-electron bolometers, or transition
edge bolometers, TEBs). In addition, since the overlap between the remote signal and the LO is important, the
spatial distribution of the LO output must be well coupled to the receiver's antenna mode. Although this
requirement imposes experimental complexity, it also provides excellent rejection of ambient background
radiation.

As noted above, at THz frequencies the Rayleigh-Jeans approximation is a good one, and it is typical to
report line intensities and detector sensitivities in terms of the Rayleigh-Jeans equivalent temperatures. In
frequency ranges where the atmospheric transmission is good, or from airborne or space-borne platforms, the
effective background temperature is only a few tens of kelvin. Under such conditions, SIS mixers based on
Nb, a particular implementation of which is pictured in figure B1.4.2, can now perform up to 1.0 THz with
T_mixer = 130 K [13]. The earliest SIS microwave and millimetre-wave receivers utilized waveguide
components, but as the operating frequencies have been pushed into the THz region, quasi-optical designs
such as those shown in figure B1.4.2 become attractive. Such designs may also be easier to incorporate into
THz receiver arrays. Recently, alternative superconducting Nb hot-electron mixers that rely on a diffusion-
based relaxation mechanism have been demonstrated with T_mixer = 750 K between 1 and 2.5 THz [14]. These
devices are expected to operate up to at least several tens of THz, if coherent sources are available as LOs.
Thus, THz source technology plays a key role in setting the spectroscopic sensitivity for both laboratory and
remote-sensing experiments.
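
The mixer noise temperatures quoted above can be compared with the quantum limit T = hν/k mentioned earlier. The sketch below evaluates that limit with the exact SI values of h and k; pairing each quoted T_mixer with a single frequency is an assumption made for illustration, since the text gives a frequency range for the hot-electron devices.

```python
H_PLANCK = 6.62607015e-34  # Planck constant, J s (exact SI value)
K_BOLTZ = 1.380649e-23     # Boltzmann constant, J/K (exact SI value)

def quantum_limit_k(freq_hz):
    """Quantum-limited noise temperature h*nu/k in kelvin."""
    return H_PLANCK * freq_hz / K_BOLTZ

# (label, frequency in Hz, quoted mixer noise temperature in K)
cases = [("SIS at 1.0 THz", 1.0e12, 130.0),
         ("Nb HEB at 1.0 THz", 1.0e12, 750.0),
         ("Nb HEB at 2.5 THz", 2.5e12, 750.0)]
for label, freq, t_mixer in cases:
    limit = quantum_limit_k(freq)
    print(f"{label:>18}: h*nu/k = {limit:6.1f} K, T_mixer / limit = {t_mixer / limit:.1f}")
```

On this estimate the 130 K SIS figure is under three times the quantum limit at 1 THz, consistent with the "only a few times" statement in the text.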

B1.4.3.2 THZ REMOTE SENSING WITH HETERODYNE RECEIVERS

Heterodyne spectroscopy has been particularly critical to the study of the Earth's stratosphere, where the
improved resolution and sensitivity compared to the FTS spectra shown in figure B1.4.1 have led to the
collection of global maps of species important to ozone chemistry and atmospheric dynamics (O₃, ClO, SO₂,
H₂O, O₂, etc: see [9] for an extended overview), and of the dense interstellar medium. Although human
beings have been systematically observing astronomical objects for thousands of years, until the advent of
radioastronomy in the 1960s we possessed little knowledge of what, if anything, exists in the space between
stars. Optical observations revealed only stars, galaxies and nebulae; if matter existed in the vast, dark
interstellar medium, it was not detectable. However, the discovery of the first polyatomic molecules in the
interstellar medium, water (H₂O), ammonia (NH₃) and formaldehyde (H₂CO), by microwave remote sensing
(in 1968 [15]) set off an exciting era of discovery.

To date, researchers have identified more than 100 different molecules, composed of up to 13 atoms, in the 
interstellar medium [16]. Most were initially detected at microwave and (sub)millimetre frequencies, and the 
discoveries have reached far beyond the mere existence of molecules. Newly discovered entities such as 
diffuse interstellar clouds, dense (or dark) molecular clouds and giant molecular cloud complexes were 
characterized for the first time. Indeed, radioastronomy (which includes observations ranging from radio to 
submillimetre frequencies) has dramatically changed our perception of the composition of the universe. 
Radioastronomy has shown that most of the mass in the interstellar medium is contained in so-called dense 

molecular clouds, which have tremendous sizes of 1-100 light years, average gas densities of 10—10 cm -3 , 
and temperatures in the range of 10-600 K. An overview of the THz emission from a cold, dense interstellar 
cloud is presented in figure B1.4.3 [17].
Figure B1.4.3. (a) A schematic illustration of the THz emission spectrum of a dense molecular cloud core at
30 K and the atmospheric transmission from ground and airborne altitudes (adapted, with permission, from 
[17]). (b) The results of 345 GHz molecular line surveys of three cores in the W3 molecular cloud; the 
graphics at left depict the evolutionary state of the dense cores inferred from the molecular line data [21]. 

In addition to striking differences from one cloud to another, each dense molecular cloud is inhomogeneous, 
containing clumps, or cores, of higher-density material situated within envelopes of somewhat lower density. 
Many of these higher-density cores are active sites of star formation, with the youngest stars being detectable 
only in the IR or FIR. Star formation is of major interest in astrophysics, and it contains a wealth of interesting 
chemical reactions and physical phenomena (for excellent reviews, see [18, 19 and 20]). Optical observations 
are unable to characterize interstellar clouds due to absorption and scattering, both of which have an inverse 
wavelength dependence, by the pervasive dust particles inside these clouds. Thus, microwave and THz 
spectroscopy is responsible for identifying most of the hundred or so interstellar molecules to date, and
continues to dominate the fields of molecular astrophysics and interstellar chemistry. The power of
heterodyne spectroscopy in examining the differences between dense clouds undergoing star (and presumably
planet) formation is shown in the right panel of figure B1.4.3, which depicts the
345 GHz spectral line surveys of three regions of the W3 giant molecular cloud complex [21]. From such
studies, which reveal dramatic differences in the THz spectrum of various objects, molecular astrophysicists
hope to classify the evolutionary state of the cloud, just as optical spectra are used to classify stars.


High angular resolution studies with modern THz telescopes and interferometric arrays can even probe the
material destined to become part of planetary systems. In the accretion discs around young stars, for example,
a range of simple organic species have now been detected at (sub)millimetre wavelengths [22]. Such accretion
discs are the assembly zones of planets, and the first steps in THz imaging their outer regions and in
understanding the means by which they evolve have been taken [23]. The physical and chemical conditions in
these objects can now be compared to those observed in primitive solar system objects such as comets and icy
satellites. The recent apparitions of comets Hyakutake and Hale-Bopp, for example, have provided a wealth
of new observations at IR, THz and microwave frequencies that have led to a much improved understanding
of the origin and evolution of planetesimals in the outer solar system [24]. Future work in high-resolution THz
imaging will be dramatically enhanced by the Atacama Large Millimeter Array (ALMA), which will operate
over the 1 cm to 350 μm interval at an altitude of 16 000 ft in the Atacama desert of northern Chile [25].

Ultimately, studies from ground-based observatories are limited by absorption in the Earth's atmosphere. For
example, no studies above ≈1 THz are possible even from mountain-top observatories. Two major
instruments, SOFIA and FIRST, are poised to change this situation dramatically. SOFIA (for Stratospheric
Observatory For Infrared Astronomy [26]) will carry a 2.7 m telescope in a 747SP aircraft to altitudes of
41 000-45 000 feet. At these altitudes, nearly 60-70% of the THz spectrum up to the mid-IR is accessible (figure
B1.4.3), and both incoherent and heterodyne spectrometers are being constructed as part of the initial
instrument suite. SOFIA will become operational in 2002. On somewhat longer timescales, FIRST (the Far-
InfraRed Space Telescope [27]) will carry 0.4-1.2 THz SIS and 1.2-1.9/2.4-2.7 THz antenna coupled HEB
receivers at the focal plane of a 4 m telescope into space. Like SOFIA, FIRST will also include incoherent
spectrometers and imagers to take advantage of the low background flux and to survey larger regions of the
THz spectrum and sky.


B1.4.4 SPECTROSCOPY WITH TUNABLE MICROWAVE AND THZ
SOURCES

B1.4.4.1 PRINCIPLES AND BACKGROUND

Long before the invention of the laser, coherent radiation sources such as electron beam tubes (e.g. klystrons 
and backward wave oscillators, or BWOs) were in use by microwave spectroscopists to examine the direct 
absorption rotational spectroscopy of molecules. Being the interface between microwave spectroscopy 
(generally associated with low-energy rotational transitions of molecules) and mid-IR spectroscopy (generally 
associated with vibrational transitions of molecules), THz spectra such as those outlined in figure B1.4.3 
probe the high-frequency rotations of molecules and certain large-amplitude vibrational motions. As the 
rotational energy levels of a molecule depend largely on its moment of inertia, which is determined by the 
molecular structure, high-resolution spectroscopy of gas phase molecules provides the most precise 
information available on molecular geometries. 
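
The link between geometry and rotational spectrum can be made concrete with a minimal sketch for a rigid diatomic rotor, for which B = h/(8π²I) and the J → J+1 line lies at 2B(J+1). The masses and the 1.128 Å bond length used for CO are assumed, approximate inputs chosen for illustration, and the function names are hypothetical.

```python
import math

H = 6.62607015e-34      # Planck constant, J s
AMU = 1.66053907e-27    # atomic mass unit, kg

def rotational_constant_hz(m1_u, m2_u, bond_m):
    """Rigid-rotor rotational constant B = h / (8 pi^2 I) of a diatomic, in Hz."""
    mu = (m1_u * m2_u) / (m1_u + m2_u) * AMU   # reduced mass, kg
    inertia = mu * bond_m ** 2                  # moment of inertia, kg m^2
    return H / (8.0 * math.pi ** 2 * inertia)

def rigid_rotor_line_hz(b_hz, j_lower):
    """Frequency of the J -> J+1 transition of a rigid linear rotor."""
    return 2.0 * b_hz * (j_lower + 1)

# Assumed inputs: 12C16O with a bond length of about 1.128 angstrom.
b_co = rotational_constant_hz(12.0, 15.9949, 1.128e-10)
nu_6_5 = rigid_rotor_line_hz(b_co, 5)
print(f"B = {b_co / 1e9:.2f} GHz, J = 6-5 line near {nu_6_5 / 1e9:.0f} GHz")
```

Run in reverse, the same relation is what allows a measured line frequency to be turned into a bond length; the rigid-rotor estimate lands within a fraction of a per cent of the observed CO line positions.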

Rotational transition frequencies acquired in the THz region expand upon and complement those acquired in 
the microwave. Two types of molecules undergo rotational transitions that fall in the FIR: molecules with 
rotation about an axis having a small moment of inertia, and molecules in high-J states. FIR spectra of the first 
type of molecules are important for determining their equilibrium geometry, as many light molecules (H₂O, NH₃, HF, etc) only 
have transitions in the submillimetre and FIR regions. Due to the high rotational energy in the second type of 
species (high-J molecules), interactions between the vibrational and rotational motions, namely, centrifugal 
distortion and Coriolis perturbations, become important. Given high enough spectral resolution and accuracy 
(Δν/ν < 10⁻⁵), shifts in rotational frequencies and changes in selection rules resulting from these interactions 
become significant. Thus, FIR spectroscopy of high-J transitions enables detailed characterization of 
molecular Hamiltonians far beyond the rigid rotor approximation, giving more accurate zero-point rotational 
constants and rough estimates of the shapes of potential energy surfaces. Finally, for very large molecules or 
weakly bound clusters, the softest vibrational degrees of freedom can be probed at THz frequencies, as is 
outlined in greater detail below. 
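
To illustrate how quickly centrifugal distortion exceeds the 10⁻⁵ fractional accuracy quoted above, the following sketch evaluates the standard linear-rotor expression ν(J → J+1) = 2B(J+1) − 4D(J+1)³. The CO constants are approximate literature values entered here as assumed inputs, and the function names are hypothetical.

```python
def linear_rotor_line_hz(b_hz, d_hz, j_lower):
    """J -> J+1 frequency with quartic centrifugal distortion: 2B(J+1) - 4D(J+1)^3."""
    n = j_lower + 1
    return 2.0 * b_hz * n - 4.0 * d_hz * n ** 3

def fractional_distortion_shift(b_hz, d_hz, j_lower):
    """Relative shift of the line away from the rigid-rotor position."""
    rigid = linear_rotor_line_hz(b_hz, 0.0, j_lower)
    return (rigid - linear_rotor_line_hz(b_hz, d_hz, j_lower)) / rigid

B_CO = 57.636e9   # Hz, approximate ground-state B of 12C16O (assumed input)
D_CO = 183.5e3    # Hz, approximate quartic distortion constant (assumed input)

for j in (0, 5, 20):
    shift = fractional_distortion_shift(B_CO, D_CO, j)
    print(f"J = {j + 1}-{j}: fractional shift {shift:.2e}")
```

Because the fractional shift grows as (J+1)², it is only marginally detectable for the lowest line but is far larger than 10⁻⁵ for the high-J transitions that fall in the FIR.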

B1.4.4.2 FOURIER TRANSFORM MICROWAVE SPECTROSCOPY 

At microwave frequencies, direct absorption techniques become less sensitive than those in the THz region 
due to the steep dependence of the transition intensities with frequency. A variant of heterodyne spectroscopy, 
pioneered by Flygare and Balle [28], has proven to be much more sensitive. In this approach, molecules are 
seeded into or generated by a pulsed molecular beam which expands into a high-Q microwave cavity. The 
adiabatic expansion cools the rotational and translational degrees of freedom to temperatures near 1-10 K, and 
thus greatly simplifies the rotational spectra of large molecules. In addition, the low-energy collisional 
environment of the jet can lead to the growth of clusters held together by weak intermolecular forces. 

A microwave pulse from a tunable oscillator is injected into the cavity by an antenna, and creates a coherent 
superposition of rotational states. In the absence of collisions, this superposition emits a free-induction decay 
signal, which is detected with an antenna-coupled microwave mixer similar to those used in molecular 
astrophysics. The data are collected in the time domain and Fourier transformed to yield the spectrum whose 
bandwidth is determined by the quality factor of the cavity. Hence, such instruments are called Fourier 
transform microwave (FTMW) spectrometers (or Flygare-Balle spectrometers, after the inventors). FTMW 
instruments are extraordinarily sensitive, and can be used to examine a wide range of stable molecules as well 
as highly transient or reactive species such as hydrogen-bonded or refractory clusters [29, 30]. 
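
The time-domain acquisition described above can be sketched numerically: a synthetic free-induction decay containing two damped lines is Fourier transformed and the line positions are read off the power spectrum. All numbers (sampling interval, decay time, the two offset frequencies after down-conversion) are assumptions for illustration only, and a naive discrete Fourier transform is used so that the example needs nothing beyond the standard library.

```python
import cmath
import math

def free_induction_decay(freqs_hz, t2_s, dt_s, n_samples):
    """Sum of exponentially damped cosines sampled every dt_s seconds."""
    return [
        sum(math.cos(2.0 * math.pi * f * k * dt_s) for f in freqs_hz)
        * math.exp(-k * dt_s / t2_s)
        for k in range(n_samples)
    ]

def power_spectrum(signal):
    """Naive discrete Fourier transform; returns |X_m|^2 for m = 0 .. n/2 - 1."""
    n = len(signal)
    spectrum = []
    for m in range(n // 2):
        x = sum(s * cmath.exp(-2j * math.pi * m * k / n) for k, s in enumerate(signal))
        spectrum.append(abs(x) ** 2)
    return spectrum

def find_peaks(spectrum, bin_width_hz, threshold=0.25):
    """Frequencies of local maxima stronger than threshold times the largest value."""
    top = max(spectrum)
    return [
        m * bin_width_hz
        for m in range(1, len(spectrum) - 1)
        if spectrum[m] > spectrum[m - 1]
        and spectrum[m] > spectrum[m + 1]
        and spectrum[m] > threshold * top
    ]

DT_S = 1.0e-6                 # assumed sampling interval (1 MHz digitizer)
N_SAMPLES = 250               # assumed record length
LINES_HZ = (100.0e3, 132.0e3) # assumed offsets from the local oscillator

fid = free_induction_decay(LINES_HZ, t2_s=100.0e-6, dt_s=DT_S, n_samples=N_SAMPLES)
spectrum = power_spectrum(fid)
peaks = find_peaks(spectrum, 1.0 / (N_SAMPLES * DT_S))
print([round(p) for p in peaks])
```

The frequency spacing of the transformed spectrum is the reciprocal of the record length, which is why long-lived decays in a collision-free jet translate into narrow lines.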

An outline of an FTMW instrument used in the study of large, polar, carbonaceous species is shown in figure 
B1.4.4. In this instrument, the FTMW cavity is mated to a pulsed electric discharge/supersonic expansion 
nozzle [31, 32]. Long-chain carbon species, up to HC₁₇N, as shown in figure B1.4.4, can be studied 
with this technique, as can a wide variety of other molecules and clusters. With the jet directed along the axis 
of the cavity, the resolution is highly sub-Doppler, with the slight complication that a Doppler doublet is 
formed by the difference between the laboratory and molecular beam reference frames. Studies of the 
rotational spectra of hydrogen-bonded clusters have also been carried out by several groups using FTMW 
instruments, a topic we shall return to later. FT-THz instruments can, in principle, be built using the highly 
sensitive THz SIS or HEB mixers outlined above, and would have extraordinary sensitivities. In order to 
saturate the rotational or rovibrational transitions, however, high-power THz oscillators are needed, but are 
not yet available. 

Figure B1.4.4. (a) An outline of the Harvard University electric discharge supersonic nozzle/Fourier 
transform microwave spectrometer. (b) The rotational states of HC₁₇N observed with this apparatus [31]. 

B1.4.4.3 CW THZ SOURCES AND MOLECULAR SPECTROSCOPY 

At frequencies up to ~150-200 GHz, solid-state sources such as YIG-tuned oscillators or Gunn diode 
oscillators are now available with power outputs of up to 100 mW. The harmonic generation of such 
millimetre-wave sources is relatively efficient for doubling and tripling (>10-15%), but for higher harmonics 
the power drops rapidly (P_out(1 THz) < 0.1-10 μW). Nevertheless, harmonic generation was used as early as 
the 1950s to record the submillimetre wave spectra of stable molecules [33]. Harmonics from optimized solid-state 
millimetre-wave sources are now used to drive astronomical heterodyne receivers up to 900-1100 GHz 
[34], and the prospects for operation up to 2-3 THz are promising. 


Even higher output power (~1-10 mW) is available from rapidly tunable BWOs up to 1-15 THz. BWOs are 
capable laboratory sources where they operate, and offer wide tunability and excellent spectral purity, 
especially when phase locked to the harmonics of lower-frequency microwave or millimetre-wave oscillators 
[35]. The high output power of BWOs and the relatively strong intrinsic strengths of pure rotational 
transitions of polar molecules give BWO spectrometers very high sensitivity, and also enable them to utilize 
nonlinear methods such as Lamb dip spectroscopy. An example for the CO molecule is presented in figure 
B1.4.5 [36]. The resulting resolution is truly exceptional and leads to among the most precise molecular 
constants ever determined. Pioneered in the former Soviet Union, THz BWOs are finding increased 
applications in a number of laboratories. 

Figure B1.4.5. The Lamb dip spectrum of the CO 6-5 transition obtained with the Cologne THz BWO 
spectrometer. The dip is of order 30-40 kHz in width and the transition frequency is determined to 0.5 kHz 
[36]. 
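
To put the 30-40 kHz Lamb dip in context, the sketch below estimates the ordinary Doppler width of the same CO line from the usual Gaussian expression, FWHM = ν₀ [8 k T ln 2 / (m c²)]^(1/2). The 300 K cell temperature, the rounded line frequency and the 35 kHz dip width taken from the middle of the quoted range are assumptions made for the comparison.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
C = 2.99792458e8        # speed of light, m/s
AMU = 1.66053907e-27    # atomic mass unit, kg

def doppler_fwhm_hz(nu0_hz, mass_u, temp_k):
    """Full width at half maximum of a thermally Doppler-broadened line."""
    return nu0_hz * math.sqrt(
        8.0 * K_B * temp_k * math.log(2.0) / (mass_u * AMU * C ** 2)
    )

NU_CO_6_5 = 691.473e9   # Hz, approximate CO J = 6-5 frequency (assumed input)
LAMB_DIP_HZ = 35.0e3    # Hz, midpoint of the quoted 30-40 kHz range

fwhm = doppler_fwhm_hz(NU_CO_6_5, 28.0, 300.0)   # 300 K is an assumed temperature
narrowing = fwhm / LAMB_DIP_HZ
print(f"Doppler FWHM {fwhm / 1e6:.2f} MHz, about {narrowing:.0f} times the dip width")
```

Under these assumptions the saturation dip is several tens of times narrower than the Doppler profile on which it sits, which is the origin of the sub-kHz frequency precision quoted in the caption.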

BWOs are typically placed in highly overmoded waveguides, and the extension of this technology to higher 
frequencies will, by necessity, require a number of innovative solutions to the very small-scale structures that 
must be fabricated. Their size and weight also preclude them from space-based applications (e.g. FIRST). 
Thus, electronic oscillators are unlikely to cover all THz spectroscopy and/or remote sensing applications over 
the full range of 1-10 THz in the short term, and a number of alternative THz radiation sources are therefore 
under investigation. Among the most promising of these, over the long term, are engineered materials such as 
quantum wells that either possess high nonlinearity for THz mixing experiments [37] or that can be tailored to 
provide direct emission in the FIR [38]. Tunable laser sources are the ultimate goal of such development 
programmes, but while THz spontaneous emission has been observed, laser action is still some time in the 
future. 

A number of mixing experiments have therefore been used to generate both pulses and CW THz radiation. 
Among these, diode-based mixers used as upconvertors (that is, heterodyne spectroscopy 'in reverse') have 
been the workhorse FIR instruments. Two such techniques have produced the bulk of the spectroscopic 
results: (1) GaAs-based diode mixers that generate tunable sidebands on line-tunable FIR molecular gas lasers [39, 
40], and (2) CO₂ laser-based THz difference frequency generation in ultrafast metal-insulator-metal (MIM) 
diodes [41, 42]. Both types of mixers have sufficient instantaneous bandwidth to place any desired millimetre-wave 
frequency on the carrier radiation, and respond well at THz frequencies. Having been used for many 
years in astronomical receiver applications, GaAs mixer technology is more mature than that for MIM diodes, 
and its conversion, noise and coupling mechanisms are better understood at present. In addition, the 
conversion efficiencies of GaAs mixers are good up to at least 4-5 THz, and several to several tens of microwatts are available 
from GaAs Schottky diode laser sideband generators. It is thus possible to construct THz spectrometers based 
on laser sideband generation that operate at or near the shot-noise limit, and this sensitivity has been used to 
investigate a wide range of interesting reactive and/or transient species, as is described in section B1.4.4.6. 

While the conversion efficiency of MIM diodes is not as good as that of their GaAs counterparts, they are 
considerably faster, having also been used at IR and even visible wavelengths! Thus, MIM-based THz 
spectrometers work over wider frequency ranges than do GaAs FIR laser sideband generators, but with less 
output power. As described above, this translates directly into spectrometer sensitivity due to the high 
laboratory background in the THz region. Thus, where intense FIR gas laser lines are available, sideband 
generators are to be preferred, but at present only MIM-diode spectrometers can access the spectroscopically 
important region above 200 cm⁻¹ [42]. Also, because it is easy to block CO₂ laser radiation with a variety of 
reststrahlen solid-state filters, there is no fixed-frequency FIR gas laser carrier to reject in MIM spectrometers, 
and this simplifies the overall experimental design. MIM spectrometers that perform third-order mixing of 
two CO₂ laser lines and a tunable microwave source have also been constructed. This approach leads to very 
wide tunability and eliminates the need to scan the CO₂ lasers, and only decreases the output power by a small 
amount. Thus, it is possible to phase lock the CO₂ and microwave sources, leading to a direct synthesis 
approach to THz radiation to very high frequencies indeed. This is not feasible for the FIR laser sideband 
generators, and so the MIM approach has provided extremely accurate THz frequency standards for 
calibration gases such as CO, HF and HCl, which can then be used as secondary standards in a number of 
other techniques [41]. 
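
The frequency bookkeeping behind the two mixing schemes is simple enough to sketch. In a sideband generator the tunable output sits at the FIR laser carrier plus or minus the microwave frequency; in the third-order MIM scheme it sits at the difference of two CO₂ laser lines plus or minus the microwave frequency. The carrier, laser and microwave values below are round illustrative numbers, not measured line positions, and the function names are hypothetical.

```python
def sideband_frequencies_hz(carrier_hz, microwave_hz):
    """Lower and upper sidebands placed on a fixed FIR laser carrier."""
    return (carrier_hz - microwave_hz, carrier_hz + microwave_hz)

def third_order_mim_frequencies_hz(co2_a_hz, co2_b_hz, microwave_hz):
    """Outputs of third-order mixing: |nu_a - nu_b| -/+ nu_microwave."""
    difference = abs(co2_a_hz - co2_b_hz)
    return (difference - microwave_hz, difference + microwave_hz)

# Illustrative values only (assumptions, not catalogue line frequencies).
print(sideband_frequencies_hz(2.5e12, 50.0e9))
print(third_order_mim_frequencies_hz(31.0e12, 28.5e12, 10.0e9))
```

Since every input in the second function can be phase locked, the output frequency is known as accurately as the references themselves, which is the sense in which the text speaks of direct synthesis.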

While extremely useful in the laboratory, the size and power requirements of both FIR laser sideband 
generators and MIM-based CO₂ laser spectrometers are excessive for space-borne applications. Research on 
other THz generation approaches by mixing has therefore continued. One particularly interesting approach 
from a technology and miniaturization point of view is optical heterodyne conversion, in which optical 
radiation is converted to THz light by semiconducting materials pumped above their band gaps. The use of 
optical or near-IR lasers to drive the process results in wide tunability and spectral coverage of the THz 
spectrum. Such approaches, described next, also have the considerable advantage of levering the rapid 
technological innovations in diode-pumped lasers and fast optoelectronic devices required for emerging 
industries such as optical telecommunications and optical computing. They therefore 'break the mould' of 
traditional THz LO development by small groups focused on scientific problems, and rapid developments can 
be expected with little or no investment by the THz community. Finally, as described next, both ultrafast 
time-resolved and CW high-resolution spectroscopies can be carried out using these approaches. 

B1.4.4.4 TIME DOMAIN THZ SPECTROSCOPY 

Free-electron lasers have long enabled the generation of extremely intense, sub-picosecond THz pulses that 
have been used to characterize a wide variety of materials and ultrafast processes [43]. Due to their massive 
size and great expense, however, only a few research groups have been able to operate them. Other 
approaches to the generation of sub-picosecond THz pulses have therefore been sought, and one of the earliest 
and most successful involved semiconducting materials. In a photoconductive semiconductor, carriers (for n-type 
material, electrons) in the valence band absorb the incident radiative power (if hν₀ > E_bandgap) and are injected into the conduction 
band. Once they are in the conduction band, the electrons become mobile, and, if there is an applied bias, they 
begin to drift toward the photoconductor electrodes. Some of the electrons will reach the electrodes while 
some will encounter sites of ionic impurities. The latter electrons are trapped by the impurity sites and 
removed from the conduction band. As early as two decades ago, pulsed optical lasers were used by Auston 
and co-workers to generate and detect electrical pulses in DC biased voltage transmission lines [44]. For the 
earliest used materials such as GaAs and Si, the pulse widths were of order nanoseconds due to their long 
recombination times. The discovery of materials such as radiation-damaged silicon-on-sapphire and of low- 
temperature-grown (LTG) GaAs changed this situation dramatically. 


Especially with LTG GaAs, materials became available that were nearly ideal for time-resolved THz 
spectroscopy. Due to the low growth temperature and the slight As excess incorporated, clusters are formed 
which act as recombination sites for the excited carriers, leading to lifetimes of <250 fs [45]. With such 
recombination lifetimes, THz radiators such as dipole antennae or log-periodic spirals placed onto 
optoelectronic substrates and pumped with ultrafast lasers can be used to generate sub-picosecond pulses with 
optical bandwidths of 2-4 THz. Moreover, coherent sub-picosecond detection is possible, which enables both 
the real and imaginary refractive indices of materials to be measured. The overall sensitivity is >10 , and a 
variety of solid-state and gas phase THz spectra have been acquired with such systems [46, 47], an excellent 
overview of which may be found in [48]. 
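
Two of the numbers in this discussion are easy to check with a short sketch: whether a pump photon exceeds the band gap, and what bandwidth a given carrier lifetime supports. The room-temperature GaAs gap of about 1.42 eV and the two pump wavelengths are assumed inputs, and treating the lifetime as a single-pole response with corner frequency 1/(2πτ) is a simplification adopted for illustration.

```python
import math

H = 6.62607015e-34       # Planck constant, J s
C = 2.99792458e8         # speed of light, m/s
EV = 1.602176634e-19     # joules per electronvolt

def photon_energy_ev(wavelength_m):
    """Photon energy h*c/lambda expressed in eV."""
    return H * C / (wavelength_m * EV)

def can_photoexcite(wavelength_m, gap_ev):
    """True if the photon energy exceeds the band gap."""
    return photon_energy_ev(wavelength_m) > gap_ev

def lifetime_corner_hz(tau_s):
    """Corner frequency 1/(2 pi tau) of a single-pole response (an assumed model)."""
    return 1.0 / (2.0 * math.pi * tau_s)

GAAS_GAP_EV = 1.42       # approximate room-temperature value (assumed input)

for wavelength in (850e-9, 1550e-9):
    print(wavelength, round(photon_energy_ev(wavelength), 3),
          can_photoexcite(wavelength, GAAS_GAP_EV))
print(f"250 fs lifetime: corner near {lifetime_corner_hz(250e-15) / 1e9:.0f} GHz")
```

The comparison shows why near-IR pump light around 850 nm suits GaAs while telecommunications wavelengths do not, and why sub-picosecond lifetimes are a prerequisite for useful response at THz frequencies.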

Recently, it has been shown that both the detection and generation of ultrafast THz pulses can be carried out 
using the electro-optic effect in thin films of materials such as ZnTe, GaAs and InP that are pumped in the 
near-IR [49]. The generation efficiency is similar to that of the photoconducting antenna approach, but the 
electro-optic scheme offers two extremely significant advantages. First, the detection bandwidth can be 
extremely large, up to 30-40 THz under optimum conditions [49]. Second, it is possible to directly image the 
THz field with such spectrometers. Such approaches therefore make possible the THz imaging of optically 
opaque materials with a compact, all solid-state, room-temperature system [50]. 

The great sensitivity and bandwidth of electro-optic approaches to optical-THz conversion also enable a 
variety of new experiments in condensed matter physics and chemistry to be conducted, as is outlined in 
figure B1.4.6. The left-hand side of this figure outlines the experimental approach used to generate ultrafast 
optical and THz pulses with variable time delays between them [51]. A mode-locked Ti:sapphire laser is 
amplified to provide approximately 1 W of 100 fs near-IR pulses at a repetition rate of 1 kHz. The ~850 nm 
light is divided into three beams, two of which are used to generate and detect the THz pulses, and the third of 
which is used to optically excite the sample with a suitable temporal delay. The right-hand panel presents the 
measured relaxation of an optically excited TBNC molecule in liquid toluene. In such molecules, the charge 
distribution differs markedly between the ground and electronically excited states. In TBNC, for example, the 
excess negative charge on the central porphyrin ring becomes more delocalized in the excited state. The 
altered charge distribution must be accommodated by changes in the surrounding solvent. This so-called 
solvent reorganization could only be indirectly probed by Stokes shifts in previous optical-optical pump- 
probe experiments, but the optical-THz approach enables the solvent response to be directly investigated. In 
this case, at least three distinct temporal response patterns of the toluene solvent can be seen that span several 
temporal decades [51]. For solid-state spectroscopy, ultrafast THz studies have enabled the investigation of 
coherent oscillation dynamics in the collective (phonon) modes of a wide variety of materials for the first time 
[49]. 

Figure B1.4.6. Left: an experimental optical-THz pump-probe set-up using sub-picosecond THz pulse 
generation and detection by the electro-optic effect. Right: the application of such pulses to the relaxation of 
optically excited TBNC in toluene. The THz electric field used for these experiments is shown in the upper-right 
inset. Three exponential decay terms, of order 2, 50 and 700 ps, are required to fit the observed temporal 
relaxation of the solvent [51]. 
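
A three-term decay of the kind quoted in the caption can be written down directly. The sketch below uses the three time constants from the caption; the relative amplitudes are not given in the text and are purely assumed here, so the curve indicates the functional form rather than the measured data.

```python
import math

def triexponential(t_ps, amplitudes, taus_ps):
    """Sum of exponential decays, sum_i a_i * exp(-t / tau_i), with t in ps."""
    return sum(a * math.exp(-t_ps / tau) for a, tau in zip(amplitudes, taus_ps))

TAUS_PS = (2.0, 50.0, 700.0)     # orders of magnitude quoted in the caption
AMPLITUDES = (0.5, 0.3, 0.2)     # assumed weights, normalized to unit sum

for t in (0.0, 1.0, 10.0, 100.0, 1000.0):
    print(f"{t:7.1f} ps  {triexponential(t, AMPLITUDES, TAUS_PS):.4f}")
```

Because the three time constants are separated by more than an order of magnitude each, sampling the delay axis logarithmically, as in the loop above, is a natural way to resolve all of them in one scan.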

B1.4.4.5 THZ OPTICAL-HETERODYNE CONVERSION IN PHOTOCONDUCTORS 

For CW applications of optical-heterodyne conversion, two laser fields are applied to the optoelectronic 
material. The non-linear nature of the electro-optic effect strongly suppresses continuous emission relative to 
ultrashort pulse excitation, and so most of the CW research carried out to date has used photoconductive 
antennae. The CW mixing process is characterized by the average drift velocity $\bar{v}$ and carrier lifetime $\tau_0$ of the 
mixing material, typically LTG GaAs. If $\tau_0 \bar{v} \sim \delta$, the electrode spacing, then a significant amount of current will be generated by the 
photo-excitation. That is 

$$ i(t) = \frac{N_c(t)\, e\, \bar{v}}{\delta} $$

(B1.4.3) 

where $N_c$ is the number of carriers. The rate equation for the photo-excitation-recombination process can be 
written as 

$$ \frac{\mathrm{d}N_c(t)}{\mathrm{d}t} = a P_0 - \frac{N_c(t)}{\tau_0} $$

(B1.4.4) 

where $a$ is a proportionality constant and $P_0$ is the average power of the incident field over a few optical 
periods. The time-average is of interest here because $\tau_0$ in a semiconductor is always longer than an optical 
cycle and, therefore, the output current will not respond directly to the optical oscillations. Since the THz 
waves are generated with optical light, $\omega = (\omega_1 - \omega_2) \ll \omega_1 \approx \omega_2$. Thus, integrating the right-hand side of the 
expression above over a few cycles of $\omega_1 \approx \omega_2$ yields 

$$ P_0 = E_1^2 + E_2^2 + 2 E_1 E_2 \cos(\omega t + \phi_1 - \phi_2). $$

(B1.4.5) 

Substitution yields 

$$ \frac{\mathrm{d}N_c(t)}{\mathrm{d}t} = a \left[ E_1^2 + E_2^2 + 2 E_1 E_2 \cos(\omega t + \phi_1 - \phi_2) \right] - \frac{N_c(t)}{\tau_0}. $$

(B1.4.6) 

By solving the differential equation above and using the single-pump case ($P_2 = 0$) to determine $a$, it can be 
shown that the photo-current is 

$$ i(t) = \frac{\eta\, e\, \bar{v}\, \tau_0}{h \nu_0\, \delta} \left[ P_1 + P_2 + \frac{2\sqrt{P_1 P_2}}{\sqrt{1 + \omega^2 \tau_0^2}} \cos(\omega t - \phi) \right] $$

(B1.4.7) 

where $\nu_0 = \nu_1 \approx \nu_2$, $P_1$ and $P_2$ are the incident optical powers, $\phi = \tan^{-1}(\omega \tau_0)$ and $\eta$, the quantum efficiency, 
is the number of carriers excited per incident photon; the constant relative phase $\phi_1 - \phi_2$ has been absorbed into the origin of time. 
Recognizing that $\bar{v} = \mu E$ (where $\mu$ is the carrier mobility and $E$ is the applied electric field), 

$$ i(t) = \frac{\eta\, e\, \mu E\, \tau_0}{h \nu_0\, \delta} \left[ P_1 + P_2 + \frac{2\sqrt{P_1 P_2}}{\sqrt{1 + \omega^2 \tau_0^2}} \cos(\omega t - \phi) \right]. $$

(B1.4.8) 

Separating the DC and oscillatory parts of the above equation gives 

$$ i_{\mathrm{dc}} = \frac{\eta\, e\, \mu E\, \tau_0}{h \nu_0\, \delta} \left( P_1 + P_2 \right) $$

(B1.4.9) 

$$ i_{\mathrm{ac}} = \frac{2\, \eta\, e\, \mu E\, \tau_0 \sqrt{P_1 P_2}}{h \nu_0\, \delta \sqrt{1 + \omega^2 \tau_0^2}} \cos(\omega t - \phi). $$

(B1.4.10) 

Thus, the beating of the two incident optical fields generates a modulation in the photo-excitation of the 
carriers, which in turn results in an oscillating electrical signal. Initial microwave experiments using an 
interdigitated electrode geometry by Brown et al [52] showed a flat frequency response up to 25 GHz with a 
conversion efficiency of 0.14%, in agreement with the signal level predicted by the theoretical analysis 
outlined above. At THz frequencies, the power decays rapidly from such structures due to the parasitic 
capacitance of the electrode structure and the finite carrier lifetime. Free-space radiation is generated by 
coupling the electrodes to a planar THz antenna. At 1 THz, the observed conversion efficiency is roughly 3 x 

10 , and the damage threshold is of order 1 mW μm . To alleviate these limitations, travelling-wave 
structures have now been developed that eliminate the capacitive roll-off and allow large-device active areas 
to be pumped. Powers in excess of 1 μW can now be achieved above 2 THz for input drive levels of 300-400 
mW [51]. 
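
The lifetime-limited part of the roll-off can be estimated from the single-pole solution of the carrier rate equation, in which the oscillating photocurrent is reduced by a factor 1/(1 + ω²τ₀²)^(1/2) relative to its low-frequency value. The sketch below assumes a 250 fs lifetime, taken from the upper bound quoted earlier for LTG GaAs, and deliberately ignores the electrode capacitance, so it describes only one of the two mechanisms mentioned above. The function names are hypothetical.

```python
import math

def lifetime_rolloff(f_hz, tau_s):
    """Amplitude factor 1/sqrt(1 + (omega*tau)^2) of the oscillating photocurrent."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_hz * tau_s) ** 2)

def modulation_depth(p1_w, p2_w, f_hz, tau_s):
    """Ratio of AC amplitude to DC photocurrent for pump powers p1_w and p2_w."""
    return 2.0 * math.sqrt(p1_w * p2_w) / (p1_w + p2_w) * lifetime_rolloff(f_hz, tau_s)

TAU_S = 250e-15   # assumed carrier lifetime

for f in (25e9, 0.3e12, 1.0e12, 3.0e12):
    amp = lifetime_rolloff(f, TAU_S)
    print(f"{f / 1e12:5.3f} THz  amplitude {amp:.3f}  power {amp ** 2:.3f}")
```

With these assumptions the response is essentially flat at 25 GHz, consistent with the microwave measurements cited above, while the radiated power (proportional to the square of the current) has dropped to roughly a third of its low-frequency value by 1 THz before any capacitive loss is counted. Equal pump powers maximize the modulation depth.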

This power level is sufficient for laboratory spectroscopy or for use as a THz local oscillator, and such 
travelling-wave structures can be used over at least a decade of frequency (0.3-3 THz, for example) without 
moving parts. Further, compact, all solid-state spectrometers, such as that outlined in figure B1.4.7, can now 
be constructed using CW diode laser and optical tapered amplifier technology [54]. The major challenge in 
working with diode lasers is, in fact, their instantaneous line widths (>15 MHz) and long-term frequency 
stability (~100-200 MHz), both of which need considerable improvement to be useful as THz LOs. The main 
source of both instabilities is the notorious susceptibility of diode lasers to optical feedback. However, this 
susceptibility can be used to one's advantage by sending a small fraction of the laser output into a high-finesse 
(F > 60) optical cavity such that the diode laser 'sees' optical feedback only at cavity resonances. By locking 
the diode lasers to different longitudinal modes of an ultrastable reference cavity, it is possible to construct 

Q 

direct synthesis spectrometers that can be absolutely calibrated to Δν/ν < 10 or better. To demonstrate the 
continuous tunability and frequency stability of such an instrument, the lower panel of figure B1.4.7 presents 
a submillimetre spectrum of acetonitrile in which the transition frequencies are measured to better than 50 
kHz. Future improvements to such systems should allow similar measurements on both stable and transient 
species up to at least 5-6 THz. 

Figure B1.4.7. Top: THz generation by optical-heterodyne conversion in low-temperature GaAs. (a) The 
three DBR laser system that synthesizes a precise difference frequency for the THz photomixer spectrometer. 
(b) The MOPA system and the set-up for spectroscopy. Bottom: second-derivative absorption spectrum of the 
CH₃CN J_K = 16_K → 17_K rotational transitions near 312 GHz. (a) The spectrum for ordinary ¹²CH₃¹²CN. 
The inset is an expanded view of the K = 0-2 lines. (b) The K = 0-3 lines of CH₃¹³CN [54]. 
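
The arithmetic of optical-heterodyne synthesis is sketched below: the THz output is the beat between two lasers, and when both are locked to longitudinal modes of one reference cavity the beat is an integer multiple of the cavity free spectral range. The wavelengths, the 0.5 m cavity length and the assumption of a two-mirror standing-wave cavity (free spectral range c/2L) are illustrative choices, not parameters of the instrument in the figure.

```python
C = 2.99792458e8   # speed of light, m/s

def beat_frequency_hz(lambda1_m, lambda2_m):
    """Difference frequency of two lasers with vacuum wavelengths lambda1, lambda2."""
    return abs(C / lambda1_m - C / lambda2_m)

def free_spectral_range_hz(cavity_length_m):
    """Mode spacing c / (2 L) of a two-mirror standing-wave cavity (assumed geometry)."""
    return C / (2.0 * cavity_length_m)

def locked_difference_hz(mode_gap, cavity_length_m):
    """Beat frequency of two lasers locked mode_gap longitudinal modes apart."""
    return mode_gap * free_spectral_range_hz(cavity_length_m)

# Illustrative numbers only.
print(f"{beat_frequency_hz(850.0e-9, 852.4e-9) / 1e12:.3f} THz")
print(f"{locked_difference_hz(1000, 0.5) / 1e9:.3f} GHz")
```

A wavelength separation of only a couple of nanometres near 850 nm already corresponds to about 1 THz, which is why modest diode-laser tuning covers a decade of THz frequency; continuous tuning between cavity modes then requires an additional offset source, which is not modelled here.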
B1.4.4.6 THZ SPECTROSCOPY OF HYDROGEN-BONDED AND REFRACTORY CLUSTERS 


Among the most interesting transient species that can be studied at THz frequencies are those involving 
collections of molecules held together by van der Waals or hydrogen bonding forces. In no small measure this 
is true because hydrogen bonds are ubiquitous in nature. From the icy mantles covering interstellar dust to the 
nuclei of living cells, hydrogen bonds play crucial roles in the regulation and evolution of both inorganic and 
living systems. Accurate, fully anisotropic, descriptions of the intermolecular forces involved in these and 
other weak interactions are therefore assuming an increasingly pivotal role in modern molecular science, 
particularly in molecular biology [54]. Within chemical physics, the anisotropy of intermolecular forces plays 
a central role in understanding the dynamics associated with photoinitiated reactions in clusters [55], to name 
but one example. 

Over the past century, much of the data underlying current descriptions of van der Waals and hydrogen bonds 
were obtained from measurements of second virial coefficients, pressure broadening, and other classical 
properties. Experimental advances during the past three decades have led to many techniques capable of 
interrogating intermolecular forces, most notably scattering experiments and the spectroscopic study of 
isolated clusters. Despite this long-standing interest, however, truly quantitative, microscopic models of these 
forces have only become available in recent years as advances in ab initio theory, high-resolution 
spectroscopy and eigenvalue generation on complex potential energy surfaces have converged, to enable the 
fitting of fully anisotropic force fields to experimental data for systems with two, three and four degrees of 
freedom [57]. 

The weak interactions in clusters are mathematically modelled by means of a multi-dimensional 
intermolecular potential energy surface, or IPS. Microwave spectroscopy, carried out primarily with the 
elegant, extraordinarily sensitive and sub-Doppler resolution FTMW technique outlined previously, probes 
the very lowest region of the ground-state surface; visible and IR spectroscopy probes states above the 
dissociation energy of the adduct (the latter of which are described elsewhere in this encyclopaedia). None 
are, in themselves, direct probes of the total ground-state IPS. Indeed, while microwave, IR and UV/Vis 
instruments have produced structural parameters and dynamical lifetimes for literally dozens of binary (and 
larger) weakly bound complexes (WBCs) over the past two decades [58, 59, 60, 61], recent calculations 
which explicitly allow coupling between all the degrees of freedom present in the cluster reveal that structural 
parameters alone are not sufficient to accurately characterize the IPS [62]. 

Weak interactions are characterized by binding energies of at most a few kcal/mole and by IPSs with a very 
rich and complex topology connected by barriers of at most a few hundred cm⁻¹. Rotational, tunnelling and 
intermolecular vibrational states can therefore become quite strongly mixed, hence the general term of 
vibration-rotation-tunnelling (VRT) spectroscopy for the study of eigenvalues supported by an IPS [63, 64]. 
The VRT states in nearly all systems lie close to or above the tunnelling barriers, and therefore sample large 
regions of the potential surface. In addition, as they become spectroscopically observable, the number, 
spacings and intensities of the tunnelling splittings are intimately related to the nature of the tunnelling paths 
over the potential surface. 

Thus, by measuring the intermolecular vibrations of a WBC, ultimately with resolution of the rotational, 
tunnelling and hyperfine structure, the most sensitive measure of the IPS is accessed directly. The difficulty in 
measuring these VRT spectra is that they lie nearly exclusively at THz frequencies. As expected, the 
'stiffer' the interaction, the higher in frequency these modes are found. In general, the total 0.3-30 THz 
interval must be accessed, although for the softest or heaviest species the modes rarely lie above 10-15 THz. 
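
Since this section quotes mode positions both in THz and in cm⁻¹, a small conversion sketch may be helpful; it relies only on the definition of the wavenumber, ν̃ = ν/c with c in cm s⁻¹. The function names are hypothetical.

```python
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s

def thz_to_wavenumber(f_thz):
    """Convert a frequency in THz to a wavenumber in cm^-1."""
    return f_thz * 1.0e12 / C_CM_PER_S

def wavenumber_to_thz(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to a frequency in THz."""
    return wavenumber_cm * C_CM_PER_S / 1.0e12

for f in (0.3, 1.0, 10.0, 30.0):
    print(f"{f:5.1f} THz = {thz_to_wavenumber(f):7.1f} cm^-1")
print(f"200 cm^-1 = {wavenumber_to_thz(200.0):.2f} THz")
```

The 0.3-30 THz interval thus corresponds to roughly 10-1000 cm⁻¹, and the 200 cm⁻¹ limit mentioned for laser sideband spectrometers lies near 6 THz.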


For WBCs composed of stable molecules, planar jet expansions produce sufficiently high concentrations that 
direct absorption THz studies can be pursued for clusters containing ≤6 small molecules using FIR laser 
sidebands. Research on water clusters has been particularly productive, and has been used to investigate the 
structures and large-amplitude dynamics of the clusters outlined in the top panel of figure B1.4.8 [65, 66, 67, 
68]. In addition, as the bottom panel illustrates, not only are the VRT modes directly sampled by such 
work, but the available spectral resolution of ≲1 MHz enables full rotational resolution along with a detailed 
investigation of the VRT and hyperfine splittings. The high resolution is also essential in untangling the often 
overlapping bands from the many different clusters formed in the supersonic expansion. 


For clusters beyond the dimer, each of the monomers can both accept and donate hydrogen bonds, which 
leads to a rich suite of large-amplitude motions. Their spectroscopic manifestations are illustrated for the 
water trimer in figure B1.4.9. The most facile motion in this system is the 'flipping' of one of the non-bonded 
hydrogen atoms through the plane of the oxygen atoms. This motion is sufficiently fast that it produces 
symmetric top rovibrational spectra even though at any one instant the molecule is always asymmetric. Six of 
these flipping motions lead to the same structure, a process known as 'pseudorotation', which leads to the 
manifold of states shown in the bottom panel of figure B1.4.9. The exchange of bound versus free 
hydrogen atoms in a monomer leads to the hyperfine splittings of the individual transitions, as is illustrated in 
the top panel of the same figure. From a comparison of such spectra with detailed calculations, a variety of 
IPS properties can be extracted to experimental precision. 

Molecules like those presented in figure B1.4.4 form another interesting suite of targets from a THz 
perspective. Such chains can be treated as rigid rods, and as they get longer their lowest bending frequencies 
move rapidly into the FIR. For example, the lowest frequencies of a variety of chains are as follows: 

Cyanopolyynes, HC₃N → HC^N: 222 → 8 cm⁻¹ 
Polyacetylenes, HC₄H → HC W H: 220 → 19 cm⁻¹ 
Carbon clusters, C₃ → Cjq: 63 → 17 cm⁻¹ 
CₙN radicals, C₃N → C| 9 N: 144 → 4 cm⁻¹ 

A real advantage of working in the FIR is that both polar and non-polar chains may be searched for. Indeed, 
the lowest bending frequency of C 3 has been studied in the laboratory [69], and tentatively detected toward 
the galactic centre source Sgr B2 [70]. Other large molecules such as polycyclic aromatic hydrocarbons 
(anthracene, pyrene, perylene, etc) or 'biomolecules' such as glycine or uracil also possess low-frequency FIR 
vibrations, and can be produced in sizable quantities in supersonic expansions through heated planar nozzles 
[71]. The study of such species is important cosmochemically, but is quite difficult at microwave frequencies 
where the rotational spectra are weak, and nearly impossible at IR or optical wavelengths due to the extinction 
present in dense molecular clouds and young stellar objects. 
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Figure Bl.4.8. Top: the lowest-energy structures of water clusters, (H 2 0)^, from n = 2 - 6. Bottom: a sample 
-2.5 THz spectrum of such clusters formed in a pulsed planar supersonic expansion [65]. 

Figure B1.4.9. Top: rotation-tunnelling hyperfine structure in one of the 'flipping' modes of (D2O)3 near 3 
THz. The small splittings seen in the Q-branch transitions are induced by the bound-free hydrogen atom 
tunnelling by the water monomers. Bottom: the low-frequency torsional mode structure of the water dimer 
spectrum, including a detailed comparison of theoretical calculations of the dynamics with those observed 
experimentally [65]. The symbols next to the arrows depict the parallel ($\Delta K = 0$) versus perpendicular ($\Delta K = \pm 1$) 
nature of the selection rules in the pseudorotation manifold. 

B1.4.5 OUTLOOK 


Technology developments are revolutionizing the spectroscopic capabilities at THz frequencies. While no one 
technique is ideal for all applications, both CW and pulsed spectrometers operating at or near the fundamental 
limits imposed by quantum mechanics are now within reach. Compact, all-solid-state implementations will 
soon allow such spectrometers to move out of the laboratory and into a wealth of field and remote-sensing 
applications. From the study of the rotational motions of light molecules to the large-amplitude vibrations of 
clusters and the collective motions of condensed phases, microwave and THz spectroscopy opens up new 
windows to a wealth of scientifically and technologically important fields. Over the coming decade, truly 
user-friendly and extraordinarily capable instruments should become affordable and widely available, 
enabling this critical region of the electromagnetic spectrum to be fully exploited for the first time. 

B1.5 Nonlinear optical spectroscopy of surfaces 
and interfaces 

Jerry I Dadap and Tony F Heinz 


B1.5.1 INTRODUCTION 

B1.5.1.1 NONLINEAR OPTICS AND SPECTROSCOPY 

Nonlinear optics is the study of the interaction between intense electromagnetic radiation and matter. It 
describes phenomena arising when the response of a medium to the electric field of light leaves the linear 
regime associated with the familiar and ubiquitous effects, such as reflection, refraction and absorption, 
comprising classical optics. In the presence of a sufficiently intense light source, the approximation of 
linearity breaks down. A new and much broader class of optical phenomena may be observed. Prototypical 
among these nonlinear optical effects is the production of light at new frequencies. Indeed, nonlinear optics is 
generally considered to have begun in 1961 when Franken and coworkers demonstrated optical second-harmonic 
generation (SHG) by insertion of a quartz crystal along the path of a laser beam [1]. In addition to 
the generation of new frequencies from excitation of a monochromatic source, nonlinear optical effects lead to 
the coupling between beams of identical and disparate frequencies, as well as to the action of a beam of light 
on itself. 

Given the complexity of materials, it is perhaps surprising that a linear response to an applied optical field 
should be so common. This situation reflects the fact that the strength of electric fields for light encountered 
under conventional conditions is minute compared to that of the electric fields binding atoms and solids 
together. The latter may, for example, be estimated as $E \sim 1$ V Å$^{-1}$ = $10^{8}$ V cm$^{-1}$. Since the irradiance of a 
light beam required to reproduce this electric field strength is $\sim 10^{13}$ W cm$^{-2}$, we may understand why a linear 
approximation of the material response is adequate for conventional light sources. With the advent of the 
laser, with its capability for producing high optical power and a high degree of coherence, this situation has 
changed. Under laser radiation, nonlinear optical effects are readily observed and widely exploited. 
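
The order of magnitude of the irradiance quoted above can be checked with a short calculation. The sketch below assumes a plane wave in vacuum with peak field amplitude $E$, for which the cycle-averaged irradiance is $I = c\,\varepsilon_0 E^2/2$ in SI units; the function name and the use of SI constants are illustrative choices, not part of the chapter.

```python
C = 2.99792458e8          # speed of light in vacuum, m/s
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m


def irradiance_w_per_cm2(field_v_per_cm: float) -> float:
    """Cycle-averaged irradiance (W/cm^2) of a vacuum plane wave.

    field_v_per_cm is the peak electric field amplitude in V/cm.
    """
    e_si = field_v_per_cm * 100.0        # V/cm -> V/m
    i_si = 0.5 * C * EPS0 * e_si ** 2    # W/m^2
    return i_si / 1.0e4                  # W/m^2 -> W/cm^2


atomic_scale_field = 1.0e8  # 1 V per angstrom, expressed in V/cm
print(f"{irradiance_w_per_cm2(atomic_scale_field):.2e} W/cm^2")
```

Because the irradiance scales as the square of the field, a conventional source that is weaker by many orders of magnitude in irradiance still produces a field far below the binding fields, which is why the linear approximation holds so widely.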

Over the past decades, nonlinear optics has come to have a broad impact on science and technology, playing a 
role in areas as diverse as telecommunications, materials processing and medicine. Within the context of 
chemical science, nonlinear optics is significant in providing new sources of coherent radiation, in permitting 
chemical processes to be induced under intense electromagnetic fields and in allowing matter to be probed by 
many powerful spectroscopic techniques. In this chapter, we shall be concerned only with the spectroscopic 
implications of nonlinear optics. In particular, our attention will be restricted to the narrower range of 
spectroscopic techniques and applications related to probing surfaces and interfaces. This subject is a 
significant one. Surfaces and interfaces have been, and remain, areas of enormous scientific and technological 
importance. Sensitive and flexible methods of interface characterization are consequently of great value. As 
the advances discussed in this chapter reveal, nonlinear optics offers unique capabilities to address surface and 
interface analysis. 


B1.5.1.2 PROBING SURFACES AND INTERFACES 

The distinctive chemical and physical properties of surfaces and interfaces typically are dominated by the 
nature of one or two atomic or molecular layers [2, 3]. Consequently, useful surface probes require a very 
high degree of sensitivity. How can this sensitivity be achieved? For many of the valuable traditional probes 
of surfaces, the answer lies in the use of particles that have a short penetration depth through matter. These 
particles include electrons, atoms and ions, of appropriate energies. Some of the most familiar probes of solid 
surfaces, such as Auger electron spectroscopy (AES), low-energy electron diffraction (LEED), electron 
energy loss spectroscopy (EELS) and secondary ion mass spectroscopy (SIMS), exploit massive particles both 
approaching and leaving the surface. Other techniques, such as photoemission spectroscopy and inverse 
photoemission spectroscopy, rely on electrons for only half of the probing process, with photons serving for 
the other half. These approaches are complemented by those that directly involve the adsorbate of interest, 
such as molecular beam techniques and temperature programmed desorption (TPD). While these methods are 
extremely powerful, they are generally restricted to — or perform best for — probing materials under high 
vacuum conditions. This is a significant limitation, since many important systems are intrinsically 
incompatible with high vacuum (such as the surfaces of most liquids) or involve interfaces between two dense 
media. Scanning tunnelling microscopy (STM) is perhaps the electron-based probe best suited for 
investigations of a broader class of interfaces. In this approach, the physical proximity of the tip and the sample 
permits the method to be applied at certain interfaces between dense media. 

Against this backdrop, the interest in purely optical probes of surfaces and interfaces can be easily understood. 
Since photons can penetrate an appreciable amount of material, photon-based methods are inherently 
appropriate to probing a very wide class of systems. In addition, photons, particularly those in the optical and 
infrared part of the spectrum, can be produced with exquisite control. Both the spatial and temporal properties 
of light beams can be tailored to the application. Particularly noteworthy is the possibility afforded by laser 
radiation of having either highly monochromatic radiation or radiation in the form of ultrafast (femtosecond) 
pulses. In addition, sources of high brightness are available, together with excellent detectors. As a 
consequence, within the optical spectral range several surface and interface probes have been developed. 
(Complementary approaches also exist within the x-ray part of the spectrum.) For optical techniques, one of 
the principal issues that must be addressed is the question of surface sensitivity. The ability of light to 
penetrate through condensed media was highlighted earlier as an attractive feature of optics. It also represents 
a potential problem in achieving the desired surface sensitivity, since one expects the bulk contribution to 
dominate that from the smaller region of the surface or interface. 

Depending on the situation at hand, various approaches to achieving surface sensitivity may be appropriate for 
an optical probe. Some schemes rely on the presence of distinctive spectroscopic features in the surface region 
that may be distinguished from the response of the bulk media. This situation typically prevails in surface 
infrared spectroscopy [4] and surface-enhanced Raman scattering (SERS) [5, 6 and 7]. These techniques 
provide very valuable information about surface vibrational spectra, although the range of materials is 
somewhat restricted in the latter case. Ellipsometry [8, 9 and 10] permits a remarkably precise determination 
of the reflectivity of an interface through measurements of the polarization properties of light. It is a powerful 
tool for the analysis of thin films. Under appropriate conditions, it can be pushed to the limit of monolayer 
sensitivity. Since the method has, however, no inherent surface specificity, such applications generally require 
accurate knowledge and control of the relevant bulk media. A relatively recent addition to the set of optical 
probes is reflection difference absorption spectroscopy (RDAS) [10, 11]. In this scheme, the lowered 
symmetry of certain surfaces of crystalline materials is exploited in a differential measurement of reflectivity 
that cancels out the optical response of the bulk media. 


In this chapter, we present a discussion of the nonlinear spectroscopic methods of second-harmonic generation 
(SHG) and sum-frequency generation (SFG) for probing surfaces and interfaces. While we have previously 
described the relative ease of observing nonlinear optical effects with laser techniques, it is still clear that 
linear optical methods will always be more straightforward to apply. Why then is nonlinear spectroscopy an 
attractive option for probing surfaces and interfaces? First, we may note that the method retains all of the 
advantages associated with optical methods. In addition, however, for a broad class of material systems the 
technique provides an intrinsic sensitivity to interfaces on the level of a single atomic layer. This is a very 
desirable feature that is lacking in linear optical probes. The relevant class of materials for which the 
nonlinear approach is inherently surface sensitive is quite broad. It consists of interfaces between all pairs of 
centrosymmetric materials. (Centrosymmetric materials, which include most liquids, gases, amorphous solids 
and elemental crystals, are those that remain unchanged when the position of every point is inverted through 
an appropriate origin.) The surface sensitivity of the SHG and SFG processes for these systems arises from a 
simple symmetry property: the second-order nonlinear optical processes of SHG and SFG are forbidden in 
centrosymmetric media. Thus, the bulk of the material does not exhibit a significant nonlinear optical 
response. On the other hand, the interfacial region, which necessarily breaks the inversion symmetry of the 
bulk, provides the desired nonlinear optical response. 

Because of the generality of the symmetry principle that underlies the nonlinear optical spectroscopy of 
surfaces and interfaces, the approach has found application to a remarkably wide range of material systems. 
These include not only the conventional case of solid surfaces in ultrahigh vacuum, but also gas/solid, 
liquid/solid, gas/liquid and liquid/liquid interfaces. The information attainable from the measurements ranges 
from adsorbate coverage and orientation to interface vibrational and electronic spectroscopy to surface 
dynamics on the femtosecond time scale. 

B1.5.1.3 SCOPE OF THE CHAPTER 

In view of the diversity of material systems to which the SHG/SFG method has been applied and the range of 
the information that the method has yielded, we cannot give a comprehensive account of the technique in this 
chapter. For such accounts, we must refer the reader to the literature, particularly as summarized in various 
review articles [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 and 27] and monographs [10, 28, 29]. 
Our aim here is only to present an overview of the subject in which we attempt to describe the basic 
principles. The chapter is organized in the following fashion. We first outline basic theoretical considerations 
relevant to the technique, both in a brief general discussion of nonlinear optics and in a specific description of 
the nonlinear response of interfaces. After a few words about experimental techniques for surface SHG/SFG 
measurement, we devote the remainder of the chapter to describing the type of information that may be 
extracted from the nonlinear measurements. We have attempted at least to mention the different classes of 
information that have been obtained, such as adsorbate coverage or vibrational spectroscopy. In most cases, 
the corresponding approach has been widely and fruitfully applied in many experimental studies. Although we 
offer some representative examples, space does not permit us to discuss these diverse applications in any 
systematic way. 


B1.5.2 THEORETICAL CONSIDERATIONS 


B1.5.2.1 GENERAL BACKGROUND ON NONLINEAR OPTICS 


(A) ANHARMONIC OSCILLATOR MODEL 


In order to illustrate some of the basic aspects of the nonlinear optical response of materials, we first discuss 
the anharmonic oscillator model. This treatment may be viewed as the extension of the classical Lorentz 
model of the response of an atom or molecule to include nonlinear effects. In such models, the medium is 
treated as a collection of electrons bound about ion cores. Under the influence of the electric field associated 
with an optical wave, the ion cores move in the direction of the applied field, while the electrons are displaced 
in the opposite direction. These motions induce an oscillating dipole moment, which then couples back to the 
radiation fields. Since the ions are significantly more massive than the electrons, their motion is of secondary 
importance for optical frequencies and is neglected. 

While the Lorentz model only allows for a restoring force that is linear in the displacement of an electron 
from its equilibrium position, the anharmonic oscillator model includes the more general case of a force that 
varies in a nonlinear fashion with displacement. This is relevant when the displacement of the electron 
becomes significant under strong driving fields, the regime of nonlinear optics. Treating this problem in one 
dimension, we may write an appropriate classical equation of motion for the displacement, x, of the electron 
from equilibrium as 

$$\frac{d^2x}{dt^2} + 2\gamma\frac{dx}{dt} + \omega_0^2 x - \left(b^{(2)}x^2 + b^{(3)}x^3 + \cdots\right) = -\frac{e}{m}E(t). \tag{B1.5.1}$$


Here $E(t)$ denotes the applied optical field, and $-e$ and $m$ represent, respectively, the electronic charge and 
mass. The (angular) frequency $\omega_0$ defines the resonance of the harmonic component of the response, and $\gamma$ 
represents a phenomenological damping rate for the oscillator. The nonlinear restoring force has been written 
in a Taylor expansion; the terms $m(b^{(2)}x^2 + b^{(3)}x^3 + \cdots)$ correspond to the corrections to the harmonic 
restoring force, with parameters $b^{(2)}, b^{(3)}, \ldots$ taken as material-dependent constants. In this equation, we have 
recognized that the excursion of the electron is typically small compared to the optical wavelength and have 
omitted any dependence of the driving term $-eE$ on the position of the electron. 

Here we consider the response of the system to a monochromatic pump beam at a frequency $\omega$, 

$$E(t) = E(\omega)\exp(-i\omega t) + \text{c.c.} \tag{B1.5.2}$$


where the complex conjugate (c.c.) is included to yield an electric field $E(t)$ that is a real quantity. We use the 
symbol $E$ to represent the electric field in both the time and frequency domain; the different arguments should 
make the interpretation clear. Note also that $E(\omega)$ and analogous quantities introduced later may be complex. 
As a first approximation to the solution of equation B1.5.1, we neglect the anharmonic terms to obtain the 
steady-state motion of the electron 

$$x(t) = -\frac{e}{m}\,\frac{E(\omega)\exp(-i\omega t)}{\omega_0^2 - \omega^2 - 2i\gamma\omega} + \text{c.c.} \tag{B1.5.3}$$


This solution is appropriate for the regime of a weak driving field $E(t)$. If we now treat the material as a 
collection of non-interacting oscillators, we may write the induced polarization as a sum of the individual 
dipole moments over a unit volume, i.e. $P(t) = -Nex(t)$, where $N$ denotes the density of dipoles and local-field 
effects have been omitted. Following equation B1.5.2, we express $P(t)$ in the frequency domain as 

$$P(t) = P(\omega)\exp(-i\omega t) + \text{c.c.} \tag{B1.5.4}$$


We may then write the amplitude for the harmonically varying polarization as proportional to the 
corresponding quantity for the driving field, $E(\omega)$: 

$$P(\omega) = \chi^{(1)}(\omega)E(\omega). \tag{B1.5.5}$$


The constant of proportionality, 

$$\chi^{(1)}(\omega) = \frac{Ne^2}{m}\,\frac{1}{\omega_0^2 - \omega^2 - 2i\gamma\omega} \tag{B1.5.6}$$

represents the linear susceptibility of the material. It is related to the dielectric constant $\varepsilon(\omega)$ by 
$\varepsilon(\omega) = 1 + 4\pi\chi^{(1)}(\omega)$. 

Up to this point, we have calculated the linear response of the medium, a polarization oscillating at the 
frequency $\omega$ of the applied field. This polarization produces its own radiation field that interferes with the 
applied optical field. Two familiar effects result: a change in the speed of the light wave and its attenuation as 
it propagates. These properties may be related directly to the linear susceptibility $\chi^{(1)}(\omega)$. The index of 
refraction, $n = \mathrm{Re}\left[\sqrt{1 + 4\pi\chi^{(1)}}\right]$, is associated primarily with the real part of $\chi^{(1)}$. It describes the reduced 
(phase) velocity, $c/n$, of the optical wave travelling through the medium compared to its speed, $c$, in vacuum. 
The imaginary part, $\mathrm{Im}[\chi^{(1)}]$, on the other hand, gives rise to absorption of the radiation in the medium. 
The frequency dependence of the quantities $\mathrm{Re}[\chi^{(1)}(\omega)]$ and $\mathrm{Im}[\chi^{(1)}(\omega)]$ is illustrated in figure B1.5.1. They 
exhibit a resonant response for optical frequencies $\omega$ near $\omega_0$, and show the expected dispersive and 
absorptive lineshapes. 


Figure B1.5.1 Anharmonic oscillator model: the real and imaginary parts of the linear susceptibility $\chi^{(1)}$ are 
plotted as a function of angular frequency $\omega$ in the vicinity of the resonant frequency $\omega_0$. 
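
The dispersive and absorptive lineshapes described above can be reproduced in a few lines. The following is a minimal sketch in dimensionless units: `strength` stands in for $Ne^2/m$, and the default values of `omega0`, `gamma` and `strength` are arbitrary assumptions chosen only for illustration.

```python
import cmath
import math


def chi1(omega, omega0=1.0, gamma=0.05, strength=0.01):
    """Lorentz-model linear susceptibility, strength / (omega0^2 - omega^2 - 2i gamma omega).

    `strength` is a stand-in for N e^2 / m; all quantities are in arbitrary units.
    """
    return strength / (omega0 ** 2 - omega ** 2 - 2j * gamma * omega)


def refractive_index(omega, **kwargs):
    """n = Re sqrt(1 + 4 pi chi1), the Gaussian-unit form used in the text."""
    return cmath.sqrt(1.0 + 4.0 * math.pi * chi1(omega, **kwargs)).real


# Scan the region around the resonance (frequencies in units of omega0).
omegas = [0.5 + 0.001 * k for k in range(1001)]
absorptive = [chi1(w).imag for w in omegas]
peak = omegas[absorptive.index(max(absorptive))]

print(f"Im chi1 is largest at omega = {peak:.3f}")
print(f"Re chi1 at 0.9 and 1.1: {chi1(0.9).real:+.4f}, {chi1(1.1).real:+.4f}")
print(f"n at 0.9 and 1.1: {refractive_index(0.9):.3f}, {refractive_index(1.1):.3f}")
```

The imaginary part is single-signed and peaks close to $\omega_0$ (the absorptive lineshape), while the real part changes sign on passing through the resonance (the dispersive lineshape).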

If we now include the anharmonic terms in equation B1.5.1, an exact solution is no longer possible. Let us, 
however, consider a regime in which we do not drive the oscillator too strongly, and the anharmonic terms 
remain small compared to the harmonic ones. In this case, we may solve the problem perturbatively. For our 
discussion, let us assume that only the second-order term in the nonlinearity is significant, i.e. $b^{(2)} \neq 0$ and 
$b^{(i)} = 0$ for $i > 2$ in equation B1.5.1. To develop a perturbational expansion formally, we replace $E(t)$ by $\lambda E(t)$, 
where $\lambda$ is the expansion parameter characterizing the strength of the field $E$. Thus, equation B1.5.1 becomes 

$$\ddot{x} + 2\gamma\dot{x} + \omega_0^2 x - b^{(2)}x^2 = -\lambda eE(t)/m. \tag{B1.5.7}$$

We then write the solution of equation B1.5.7 as a power series expansion in terms of the strength $\lambda$ of the 
perturbation: 

$$x = \lambda x^{(1)} + \lambda^2 x^{(2)} + \lambda^3 x^{(3)} + \cdots. \tag{B1.5.8}$$

If we substitute the expression B1.5.8 back into B1.5.7 and require that the terms proportional to $\lambda$ and $\lambda^2$ 
on both sides of the resulting equation are equal, we obtain the equations 

$$\ddot{x}^{(1)} + 2\gamma\dot{x}^{(1)} + \omega_0^2 x^{(1)} = -eE(t)/m \tag{B1.5.9}$$

$$\ddot{x}^{(2)} + 2\gamma\dot{x}^{(2)} + \omega_0^2 x^{(2)} - b^{(2)}\left(x^{(1)}\right)^2 = 0. \tag{B1.5.10}$$


We immediately observe that the solution, $x^{(1)}$, to equation B1.5.9 is simply that of the original harmonic 
oscillator problem given by equation B1.5.3. Substituting this result for $x^{(1)}$ into the last term of equation 
B1.5.10 and solving for the second-order term $x^{(2)}$, we obtain two solutions: one oscillating at twice the 
frequency of the applied field and a static part at a frequency of $\omega = 0$. This behaviour arises from the fact that 
the square of the term $x^{(1)}(t)$, which now acts as a source term in equation B1.5.10, possesses frequency 
components at frequencies $2\omega$ and zero. The material response at the frequency $2\omega$ corresponds to SHG, 
while the response at frequency zero corresponds to the phenomenon of optical rectification. The effect of a 
linear and nonlinear relation between the driving field and the material response is illustrated in figure B1.5.2. 

Figure B1.5.2 Nonlinear dependence of the polarization $P$ on the electric field $E$. (a) For small sinusoidal 
input fields, $P$ depends linearly on $E$; hence its harmonic content is mainly that of $E$. (b) For a stronger driving 
electric field $E$, the polarization waveform becomes distorted, giving rise to new harmonic components. The 
second-harmonic and DC components are shown. 

In analogy to equation B1.5.3, we can write the steady-state solution to equation B1.5.10 for the SHG process 
as 

$$x^{(2)}(t) = x^{(2)}(2\omega)\exp(-2i\omega t) + \text{c.c.} \tag{B1.5.11}$$

The amplitude of the response, $x^{(2)}(2\omega)$, is given by the steady-state solution of equation B1.5.10 as 

$$x^{(2)}(2\omega) = \frac{(e/m)^2\, b^{(2)} E(\omega)^2}{D(2\omega)\,D^2(\omega)} \tag{B1.5.12}$$

where the quantity $D(\omega)$ is defined as $D(\omega) \equiv \omega_0^2 - \omega^2 - 2i\gamma\omega$. Similarly, the amplitude of the response at 
frequency zero for the optical rectification process, with $x^{(2)}(0)$ denoting the total static displacement, is given by 

$$x^{(2)}(0) = \frac{2(e/m)^2\, b^{(2)} E(\omega)E(-\omega)}{D(0)\,D(\omega)\,D(-\omega)}. \tag{B1.5.13}$$

Following the derivation of the linear susceptibility, we may now readily deduce the second-order 
susceptibility $\chi^{(2)}(2\omega = \omega + \omega)$ for SHG, as well as $\chi^{(2)}(0 = \omega - \omega)$ for the optical rectification process. 
Defining the second-order nonlinear susceptibility for SHG as the relation between the square of the relevant 
components of the driving fields and the nonlinear source polarization, 

$$P(2\omega) = \chi^{(2)}(2\omega = \omega + \omega)E(\omega)E(\omega) \tag{B1.5.14}$$

we obtain 

$$\chi^{(2)}(2\omega = \omega + \omega) = -\frac{N(e^3/m^2)\, b^{(2)}}{D(2\omega)\,D^2(\omega)}. \tag{B1.5.15}$$

As we shall discuss later in a detailed fashion, the nonlinear polarization associated with the nonlinear 
susceptibility of a medium acts as a source term for radiation at the second harmonic (SH) frequency $2\omega$. 
Since there is a definite phase relation between the fundamental pump radiation and the nonlinear source term, 
coherent SH radiation is emitted in well-defined directions. From the quadratic variation of $P(2\omega)$ with $E(\omega)$, 
we expect that the SH intensity $I_{2\omega}$ will also vary quadratically with the pump intensity $I_{\omega}$. 
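
The perturbative amplitudes of the anharmonic oscillator model can be checked against a direct numerical integration of the equation of motion. The sketch below is illustrative only: it works in dimensionless units, takes the real quantity `A` to stand for $(e/m)E(\omega)$ (so that the driving term $-eE(t)/m$ becomes $-2A\cos\omega t$), keeps only the $b^{(2)}$ term, and uses a hand-written fourth-order Runge-Kutta step. All parameter values are arbitrary assumptions.

```python
import cmath
import math


def D(w, w0, gamma):
    """Resonance denominator D(w) = w0^2 - w^2 - 2 i gamma w."""
    return w0 ** 2 - w ** 2 - 2j * gamma * w


def harmonics(A, w=0.4, w0=1.0, gamma=0.1, b2=0.2, settle_periods=20, steps=2000):
    """Integrate x'' + 2 gamma x' + w0^2 x - b2 x^2 = -2 A cos(w t) with RK4.

    Returns the coefficients (c0, c1, c2) of exp(-i n w t) in the steady-state
    motion, obtained by projecting the last drive period onto exp(+i n w t).
    """
    period = 2.0 * math.pi / w
    dt = period / steps

    def accel(t, x, v):
        return -2.0 * gamma * v - w0 ** 2 * x + b2 * x ** 2 - 2.0 * A * math.cos(w * t)

    x, v = 0.0, 0.0
    coeffs = [0j, 0j, 0j]
    for step in range((settle_periods + 1) * steps):
        t = step * dt
        if step >= settle_periods * steps:  # transients have decayed: accumulate
            for n in range(3):
                coeffs[n] += x * cmath.exp(1j * n * w * t) / steps
        # Classical fourth-order Runge-Kutta step for the pair (x, v).
        k1x, k1v = v, accel(t, x, v)
        k2x = v + 0.5 * dt * k1v
        k2v = accel(t + 0.5 * dt, x + 0.5 * dt * k1x, k2x)
        k3x = v + 0.5 * dt * k2v
        k3v = accel(t + 0.5 * dt, x + 0.5 * dt * k2x, k3x)
        k4x = v + dt * k3v
        k4v = accel(t + dt, x + dt * k3x, k4x)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return tuple(coeffs)


def perturbative(A, w=0.4, w0=1.0, gamma=0.1, b2=0.2):
    """Static, first-order and second-harmonic amplitudes from perturbation theory."""
    x1 = -A / D(w, w0, gamma)
    x2_shg = b2 * A ** 2 / (D(2.0 * w, w0, gamma) * D(w, w0, gamma) ** 2)
    x2_dc = 2.0 * b2 * A ** 2 / (w0 ** 2 * abs(D(w, w0, gamma)) ** 2)
    return x2_dc, x1, x2_shg


numeric = harmonics(0.05)
predicted = perturbative(0.05)
for label, num, pred in zip(("static", "fundamental", "second harmonic"), numeric, predicted):
    print(f"{label:16s} numeric |c| = {abs(num):.4e}   perturbative = {abs(pred):.4e}")
```

For a weak drive the numerical coefficients should agree closely with the perturbative expressions, and doubling `A` should roughly quadruple the second-harmonic amplitude, reflecting the quadratic dependence on the field discussed above.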

If we compare the nonlinear response of $\chi^{(2)}(2\omega = \omega + \omega)$ with the linear material response of $\chi^{(1)}(\omega)$, we 
find both similarities and differences. As is apparent from equation B1.5.15 and shown pictorially in figure 
B1.5.3, $\chi^{(2)}(2\omega = \omega + \omega)$ exhibits a resonant enhancement for frequencies $\omega$ near $\omega_0$, just as in the case for 
$\chi^{(1)}(\omega)$. However, in addition to this so-called one-photon resonance, $\chi^{(2)}(2\omega = \omega + \omega)$ also displays a resonant 
response when $2\omega$ is near $\omega_0$ or $\omega \approx \omega_0/2$. This feature is termed a two-photon resonance and has no analogue 
in linear spectroscopy. Despite these differences between $\chi^{(1)}$ and $\chi^{(2)}$, one can see that both types of response 
provide spectroscopic information about the material system. A further important difference concerns 
symmetry characteristics. The linear response $\chi^{(1)}$ may be expected to be present in any material. The 
second-order nonlinear response $\chi^{(2)}$ requires the material to exhibit a non-centrosymmetric character, i.e. in the 
one-dimensional model, the $+x$ and $-x$ directions must be distinguishable. To understand this property, consider 
the potential energy associated with the restoring force on the electron. This potential may be written in the 
notation previously introduced as $V(x) = \tfrac{1}{2}m\omega_0^2x^2 - \tfrac{1}{3}mb^{(2)}x^3 + \cdots$. The first term is allowed for all materials; 
the second term, however, cannot be present in centrosymmetric materials since it clearly differentiates 
between a displacement of $+x$ and $-x$. Thus $b^{(2)} = 0$ except in non-centrosymmetric materials. This symmetry 
distinction is the basis for the remarkable surface and interface sensitivity of SHG and SFG for bulk materials 
with inversion symmetry. 

Figure B1.5.3 Magnitude of the second-order nonlinear susceptibility $\chi^{(2)}$ versus frequency $\omega$, obtained from 
the anharmonic oscillator model, in the vicinity of the single- and two-photon resonances at frequencies $\omega_0$ 
and $\omega_0/2$, respectively. 
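
The two resonances of the model susceptibility can be located numerically. This is a minimal sketch in dimensionless units: `prefactor` stands in for $N(e^3/m^2)b^{(2)}$, and the damping value and scan range are arbitrary assumptions.

```python
def D(w, w0=1.0, gamma=0.02):
    """Resonance denominator D(w) = w0^2 - w^2 - 2 i gamma w."""
    return w0 ** 2 - w ** 2 - 2j * gamma * w


def chi2_shg(w, w0=1.0, gamma=0.02, prefactor=1.0):
    """Model chi2(2w = w + w) = -prefactor / (D(2w) D(w)^2).

    `prefactor` is a stand-in for N e^3 b2 / m^2 (arbitrary units).
    """
    return -prefactor / (D(2.0 * w, w0, gamma) * D(w, w0, gamma) ** 2)


def local_maxima(xs, ys):
    """Return the grid points at which ys exceeds both of its neighbours."""
    return [xs[i] for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1]]


omegas = [0.2 + 0.001 * k for k in range(1201)]  # 0.2 ... 1.4 in units of w0
magnitudes = [abs(chi2_shg(w)) for w in omegas]
peaks = local_maxima(omegas, magnitudes)
print("resonances found near:", [round(p, 3) for p in peaks])
```

One maximum lies close to $\omega_0/2$, where $D(2\omega)$ becomes small (the two-photon resonance), and the other close to $\omega_0$, where $D(\omega)$ becomes small (the one-photon resonance).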

(B) MAXWELL'S EQUATIONS 

We now embark on a more formal description of nonlinear optical phenomena. A natural starting point for 
this discussion is the set of Maxwell equations, which are just as valid for nonlinear optics as for linear optics. 


In the absence of free charges and current densities, we have in cgs units: 


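$$\nabla \times \mathbf{H} = \frac{1}{c}\frac{\partial \mathbf{D}}{\partial t} \tag{B1.5.16}$$

$$\nabla \times \mathbf{E} = -\frac{1}{c}\frac{\partial \mathbf{B}}{\partial t} \tag{B1.5.17}$$

$$\nabla \cdot \mathbf{B} = 0 \tag{B1.5.18}$$

$$\nabla \cdot \mathbf{D} = 0 \tag{B1.5.19}$$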

where E and H are the electric and magnetic field intensities, respectively; D and B are the electric 
displacement and magnetic induction, respectively. In the optical regime, we generally neglect the magnetic 
response of the material and take B = H. The material response is then incorporated into the Maxwell 
equations through the displacement vector D. This quantity is related to the electric field E and the 
polarization P (the electric dipole moment per unit volume) by 

$$\mathbf{D} = \mathbf{E} + 4\pi\mathbf{P}. \tag{B1.5.20}$$

The polarization P is given in terms of E by the constitutive relation of the material. For the present 
discussion, we assume that the polarization P(r) depends only on the field E evaluated at the same position r. 
This is the so-called dipole approximation. In later discussions, however, we will consider, in some specific 
cases, the contribution of a polarization that has a non-local spatial dependence on the optical field. Once we 
have augmented the system of equations B1.5.16 to B1.5.20 with the constitutive relation for the dependence 
of P on E, we may solve for the radiation fields. This relation is generally characterized through the use of 
linear and nonlinear susceptibility tensors, the subject to which we now turn. 

(C) NONLINEAR OPTICAL SUSCEPTIBILITIES 

If the polarization at a given point in space and time $(\mathbf{r}, t)$ depends only on the driving electric field at the 
same coordinates, we may write the polarization as $\mathbf{P} = \mathbf{P}(\mathbf{E})$. In this case, we may develop the polarization in 
a power series as $\mathbf{P} = \mathbf{P}_L + \mathbf{P}_{NL} = \mathbf{P}^{(1)} + \mathbf{P}^{(2)} + \mathbf{P}^{(3)} + \cdots$, where the linear term is $P_i^{(1)} = \sum_j \chi_{ij}^{(1)}E_j$ and the 
nonlinear terms include the second-order response $P_i^{(2)} = \sum_{jk}\chi_{ijk}^{(2)}E_jE_k$, the third-order response 
$P_i^{(3)} = \sum_{jkl}\chi_{ijkl}^{(3)}E_jE_kE_l$, and so forth. The coefficients $\chi_{ij}^{(1)}$, $\chi_{ijk}^{(2)}$ and $\chi_{ijkl}^{(3)}$ are, respectively, the linear, 
second-order nonlinear and third-order nonlinear susceptibilities of the material. The quantity $\chi^{(n)}$, it 
should be noted, is a tensor of rank $n + 1$, and $ijkl$ refer to indices of Cartesian coordinates. The simple 
formulation just presented does not allow for the variation of the optical response with frequency, a behaviour 
of critical importance for spectroscopy. We now briefly discuss how to incorporate frequency-dependent 
behaviour into the polarization response. To treat the frequency response of the material, we consider an 
excitation electric field of the form of a superposition of monochromatic fields 

$$\mathbf{E}(t) = \sum_m \mathbf{E}(\omega_m)\,e^{-i\omega_m t} \tag{B1.5.21}$$

where the summation extends over all positive and negative frequency components. Since $\mathbf{E}(t)$ represents a 
physical field, it is constrained to be real and $\mathbf{E}(-\omega_m) = \mathbf{E}(\omega_m)^*$. In the same manner, we can write the 
polarization P as 

$$\mathbf{P}(t) = \sum_n \mathbf{P}(\omega_n)\,e^{-i\omega_n t}. \tag{B1.5.22}$$

Here the collection of frequencies in the summation may include new frequencies $\omega_n$ in addition to those in 
the summation of equation B1.5.21 for the applied field. The total polarization can be separated into linear, $\mathbf{P}_L$, 
and nonlinear, $\mathbf{P}_{NL}$, parts: 

$$\mathbf{P}(t) = \mathbf{P}_L(t) + \mathbf{P}_{NL}(t) = \mathbf{P}^{(1)}(t) + \mathbf{P}^{(2)}(t) + \mathbf{P}^{(3)}(t) + \cdots \tag{B1.5.23}$$

where $\mathbf{P}_L(t) = \mathbf{P}^{(1)}(t)$ and $\mathbf{P}_{NL}(t) = \mathbf{P}^{(2)}(t) + \mathbf{P}^{(3)}(t) + \cdots$, and the terms in P correspond to an expansion in 
powers of the field E. 

The linear susceptibility, $\chi_{ij}^{(1)}$, is the factor that relates the induced linear polarization to the applied field: 

$$P_i^{(1)}(\omega_k) = \sum_j \chi_{ij}^{(1)}(\omega_k)\,E_j(\omega_k). \tag{B1.5.24}$$

$\chi_{ij}^{(1)}(\omega_k)$ in this formulation gives rise, as one expects, to a polarization oscillating at the applied frequency $\omega_k$, 
but may now incorporate a strength that varies with $\omega_k$, as illustrated earlier in the harmonic oscillator model. 
Similarly, we can define the corresponding frequency-dependent second-order, $\chi_{ijk}^{(2)}$, and third-order, $\chi_{ijkl}^{(3)}$, 
susceptibility tensors by 

$$P_i^{(2)}(\omega_q + \omega_r) = p\sum_{jk}\chi_{ijk}^{(2)}(\omega_q + \omega_r; \omega_q, \omega_r)\,E_j(\omega_q)E_k(\omega_r) \tag{B1.5.25}$$

$$P_i^{(3)}(\omega_q + \omega_r + \omega_s) = p\sum_{jkl}\chi_{ijkl}^{(3)}(\omega_q + \omega_r + \omega_s; \omega_q, \omega_r, \omega_s)\,E_j(\omega_q)E_k(\omega_r)E_l(\omega_s) \tag{B1.5.26}$$


where the quantity $p$ is called the degeneracy factor and is equal to the number of distinct permutations of the 
applied frequencies $\{\omega_q, \omega_r\}$ and $\{\omega_q, \omega_r, \omega_s\}$ for the second- and third-order processes, respectively. The 
inclusion of the degeneracy factor $p$ ensures that the nonlinear susceptibility is not discontinuous when two of 
the fields become degenerate, e.g. the nonlinear susceptibility $\chi_{ijk}^{(2)}(\omega_1 + \omega_2; \omega_1, \omega_2)$ approaches the value 
$\chi_{ijk}^{(2)}(2\omega_1; \omega_1, \omega_1)$ as $\omega_2$ approaches the value $\omega_1$ [32]. As can be seen from equation B1.5.25 and equation 
B1.5.26, the first frequency argument of the nonlinear susceptibility is equal to the sum of the rest of its 
frequency arguments. Note also that these frequencies are not constrained to positive values and that the 
complete material response involves both the positive and negative frequency components. We now consider 
some of the processes described by the nonlinear susceptibilities. For the case of the second-order nonlinear 
optical effects (equation B1.5.25), three processes can occur when the frequencies $\omega_1$ and $\omega_2$ are distinct. This 
can be seen by expanding equation B1.5.25: 

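$$P_i^{(2)}(\omega_3) = 2\sum_{jk}\chi_{ijk}^{(2)}(\omega_3; \omega_1, \omega_2)\,E_j(\omega_1)E_k(\omega_2), \qquad \omega_3 = \omega_1 + \omega_2 \tag{B1.5.27}$$

$$P_i^{(2)}(2\omega_\alpha) = \sum_{jk}\chi_{ijk}^{(2)}(2\omega_\alpha; \omega_\alpha, \omega_\alpha)\,E_j(\omega_\alpha)E_k(\omega_\alpha), \qquad \alpha = 1, 2 \tag{B1.5.28}$$

$$P_i^{(2)}(0) = 2\left[\sum_{jk}\chi_{ijk}^{(2)}(0; \omega_1, -\omega_1)\,E_j(\omega_1)E_k(-\omega_1) + \sum_{jk}\chi_{ijk}^{(2)}(0; \omega_2, -\omega_2)\,E_j(\omega_2)E_k(-\omega_2)\right] \tag{B1.5.29}$$

These effects correspond, respectively, to the processes of sum-frequency generation (SFG), SHG and optical 
rectification. 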
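
The origin of the degeneracy factor can be seen by simple bookkeeping. The sketch below assumes a scalar, frequency-independent second-order response (set to unity) and a two-colour field written as a sum over positive and negative frequency components; it collects every ordered pair of components according to the output frequency. The frequencies 3 and 5 and the complex amplitudes are arbitrary illustrative values.

```python
from collections import defaultdict


def real_field(positive_components):
    """Add the negative-frequency partners E(-w) = E(w)* to a set of amplitudes."""
    field = {}
    for w, amplitude in positive_components.items():
        field[w] = complex(amplitude)
        field[-w] = complex(amplitude).conjugate()
    return field


def second_order_components(field):
    """Sum E(w_q) E(w_r) over all ordered pairs, grouped by w_q + w_r."""
    out = defaultdict(complex)
    for wq, eq in field.items():
        for wr, er in field.items():
            out[wq + wr] += eq * er
    return dict(out)


E1, E2 = 1.0 + 0.5j, 0.3 - 0.2j
p2 = second_order_components(real_field({3: E1, 5: E2}))

for freq in sorted(p2):
    print(f"output frequency {freq:+3d}: amplitude {p2[freq]:.3f}")
```

The sum-frequency term at $\omega_1 + \omega_2$ collects two ordered pairs and so carries a factor of 2, whereas each second-harmonic term collects only one, which is the counting expressed by $p$. The same bookkeeping also produces a component at the difference frequency $\omega_2 - \omega_1$, a process not listed among the three above.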

For the case of third-order nonlinear optical effects (equation B1.5.26), a wide variety of processes are 
described by the different possible combinations of applied frequencies. We shall not attempt to catalogue 
them here. The most intuitive case is that of third-harmonic generation ($3\omega = \omega + \omega + \omega$), corresponding to 
addition of three equal frequencies. When one of the frequencies is zero (i.e. a DC field) one obtains the 
so-called electric-field induced SHG (EFISH) process ($2\omega = \omega + \omega + 0$). Several third-order effects have found 
significant use in both frequency and time-domain spectroscopy. These include notably coherent anti-Stokes 
Raman scattering, stimulated Raman scattering, general and degenerate four-wave mixing and two-photon 
absorption [30, 31]. 

(D) SYMMETRY PROPERTIES 

The second-order nonlinear susceptibility tensor $\chi^{(2)}(\omega_3, \omega_2, \omega_1)$ introduced earlier will, in general, consist 
of 27 distinct elements, each displaying its own dependence on the frequencies $\omega_1$, $\omega_2$ and $\omega_3$ ($= \pm\omega_1 \pm \omega_2$). 
There are, however, constraints associated with spatial and time-reversal symmetry that may reduce the 
complexity of $\chi^{(2)}$ for a given material [32, 33 and 34]. Here we examine the role of spatial symmetry. 

The most significant symmetry property for second-order nonlinear optics is inversion symmetry. A
material possessing inversion symmetry (or centrosymmetry) is one that, for an appropriate origin, remains
unchanged when all spatial coordinates are inverted via r → −r. For such materials, the second-order
nonlinear response vanishes. This fact is of sufficient importance that we shall explain its origin briefly. For a
centrosymmetric material, χ^(2) should remain unchanged under an inversion operation, since the material by
hypothesis does not change. On the other hand, the nature of the physical quantities E and P implies that they
must obey E → −E and P → −P under the inversion operation [35]. The relation P_i^(2) = Σ_jk χ^(2)_ijk E_j E_k then
yields Σ_jk χ^(2)_ijk E_j E_k = −Σ_jk χ^(2)_ijk E_j E_k, whence χ^(2)_ijk = 0. A further useful symmetry relation applies
specifically to the SHG process. Since the incident fields E_j(ω) and E_k(ω) are identical, we may take
χ^(2)_ijk = χ^(2)_ikj without loss of information.
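
The inversion argument can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the random tensor and the helper name `transform_rank3` are purely illustrative and not part of the original treatment. It applies the transformation rule for a rank-3 polar tensor to the inversion operation and confirms that the only tensor left invariant is the zero tensor.

```python
import numpy as np

def transform_rank3(chi, R):
    """Transform a rank-3 polar tensor: chi'_ijk = R_il R_jm R_kn chi_lmn."""
    return np.einsum("il,jm,kn,lmn->ijk", R, R, R, chi)

rng = np.random.default_rng(0)
chi = rng.normal(size=(3, 3, 3))   # arbitrary (hypothetical) second-order susceptibility
inversion = -np.eye(3)             # the operation r -> -r

chi_inverted = transform_rank3(chi, inversion)

# Inversion flips the sign of every element of a rank-3 tensor.
sign_flipped = np.allclose(chi_inverted, -chi)

# Averaging chi with its inverted image keeps only the inversion-invariant part,
# which is the part a centrosymmetric material is allowed to have.
chi_centrosymmetric = 0.5 * (chi + chi_inverted)
max_element = float(np.max(np.abs(chi_centrosymmetric)))
```

Here `sign_flipped` is true and `max_element` is zero to rounding error, mirroring the statement that the second-order response of a centrosymmetric medium vanishes in the dipole approximation.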

For materials that exhibit other classes of spatial symmetry not including centrosymmetry, we expect that
χ^(2)_ijk will be non-vanishing but will display a simplified form [37]. To see how this might work, consider a
crystal in which the x and y directions are equivalent. The nonlinear response of the medium would
consequently be equivalent for an applied optical field polarized along either the x or y direction. It then
follows, for example, that the nonlinear susceptibilities χ^(2)_zxx and χ^(2)_zyy are equal, as are other elements in which
the indices x and y are exchanged. This reduces the complexity of χ^(2) significantly. Table B1.5.1 presents the
form of the second-order nonlinear susceptibility relevant for the classes of symmetry encountered at isotropic
and crystalline surfaces and interfaces.

(E) QUANTUM MECHANICAL DESCRIPTION 

Having now developed some of the basic notions for the macroscopic theory of nonlinear optics, we would
like to discuss how the microscopic treatment of the nonlinear response of a material is handled. While the
classical nonlinear oscillator model provides us with a qualitative feeling for the phenomenon, a quantitative
theory must generally begin with a quantum mechanical description. As in the case of linear optics, quantum
mechanical calculations in nonlinear optics are conveniently described by perturbation theory and the density
matrix formalism [36, 37]. The heart of this microscopic description is the interaction Hamiltonian
H_int = −μ·E(t), which characterizes the interaction of the system with the radiation and is treated as a
perturbation. Here μ = −er is the electric dipole operator and E(t) is the applied optical field. Applying this
formalism with first-order perturbation theory to calculate the induced dipole moment μ, we obtain a linear
susceptibility that is proportional to the product of two matrix elements of μ, (μ_i)_gn (μ_j)_ng, where
(μ_i)_gn = ⟨g|μ_i|n⟩. This product may be viewed as arising from a process in which a photon is first destroyed in
a real or virtual transition from a populated energy eigenstate |g⟩ to an empty state |n⟩; a photon is then
emitted in the transition back to the initial state |g⟩.

The second-order nonlinear optical processes of SHG and SFG are described correspondingly by second-order
perturbation theory. In this case, two photons at the driving frequency or frequencies are destroyed and a
photon at the SH or SF is created. This is accomplished through a succession of three real or virtual
transitions, as shown in figure B1.5.4. These transitions start from an occupied initial energy eigenstate |g⟩,
pass through intermediate states |n'⟩ and |n⟩ and return to the initial state |g⟩. A full calculation of the
second-order response for the case of SFG yields [37]

\[ \chi^{(2)}_{ijk}(\omega_3 = \omega_1 + \omega_2) = -\frac{N}{\hbar^2} \sum_{g,n,n'} \rho^{(0)}_g \left[ \frac{(\mu_i)_{gn} (\mu_j)_{nn'} (\mu_k)_{n'g}}{(\omega_3 - \omega_{ng} + i\Gamma_{ng})(\omega_2 - \omega_{n'g} + i\Gamma_{n'g})} + \text{seven similar terms} \right] \] (B1.5.30)

As for the linear response, the transitions occur through the electric-dipole operator μ and are characterized by
the matrix elements (μ_i)_gn. In equation B1.5.30, the energy denominators involve the energy differences
ħω_ng = E_n − E_g and widths ħΓ_ng for transitions between eigenstates |n⟩ and |g⟩. The formula includes a sum over
different possible ground states weighted by the factor ρ_g^(0) representing the probability that state |g⟩ is
occupied. It is assumed that the material can be treated as having localized electronic states of density N per
unit volume and that their interaction and local-field effects may be neglected. Corresponding expressions
result for delocalized electrons in crystalline solids [36, 37].

The frequency denominators in the eight terms of equation B1.5.30 introduce a resonant enhancement in the
nonlinearity when any of the three frequencies (ω_1, ω_2, ω_3) coincides with a transition from the ground state
|g⟩ to one of the intermediate states |n'⟩ and |n⟩. The numerator of each term, which consists of the product of
the three dipole matrix elements (μ_i)_gn (μ_j)_nn' (μ_k)_n'g, reflects, through its tensor character, the structural
properties of the material, as well as the details of the character of the relevant energy eigenstates.




Figure B1.5.4 Quantum mechanical scheme for the SFG process with ground state |g⟩ and excited states |n'⟩
and |n⟩.

B1.5.2.2 NONLINEAR OPTICS OF THE INTERFACE

The focus of the present chapter is the application of second-order nonlinear optics to probe surfaces and 
interfaces. In this section, we outline the phenomenological or macroscopic theory of SHG and SFG at the 
interface of centrosymmetric media. This situation corresponds, as discussed previously, to one in which the 
relevant nonlinear response is forbidden in the bulk media, but allowed at the interface. 

(A) INTERFACIAL CONTRIBUTION 


In order to describe the second-order nonlinear response from the interface of two centrosymmetric media, the 
material system may be divided into three regions: the interface and the two bulk media. The interface is 
defined to be the transitional zone where the material properties — such as the electronic structure or molecular 
orientation of adsorbates — or the electromagnetic fields differ appreciably from the two bulk media. For most 
systems, this region occurs over a length scale of only a few Angstroms. With respect to the optical radiation, 
we can thus treat the nonlinearity of the interface as localized to a sheet of polarization. Formally, we can 
describe this sheet by a nonlinear dipole moment per unit area, P_s^(2), which is related to a second-order bulk
polarization P^(2) by P^(2)(x, y, z, t) = P_s^(2)(x, y, t) δ(z). Here z is the surface normal direction, and the
x and y axes represent the in-plane coordinates (figure B1.5.5).

Figure B1.5.5 Schematic representation of the phenomenological model for second-order nonlinear optical
effects at the interface between two centrosymmetric media. Input waves at frequencies ω_1 and ω_2, with
corresponding wavevectors k_1(ω_1) and k_1(ω_2), are approaching the interface from medium 1. Nonlinear
radiation at frequency ω_3 is emitted in directions described by the wavevectors k_1(ω_3) (reflected in medium 1)
and k_2(ω_3) (transmitted in medium 2). The linear dielectric constants of media 1, 2 and the interface are
denoted by ε_1, ε_2, and ε', respectively. The figure shows the xz-plane (the plane of incidence) with z
increasing from top to bottom and z = 0 defining the interface.

The nonlinear response of the interface may then be characterized in terms of a surface (or interface)
nonlinear susceptibility tensor χ_s^(2). This quantity relates the applied electromagnetic fields to the induced
surface nonlinear polarization P_s^(2):

\[ \mathbf{P}^{(2)}_s(\omega_3) = \chi^{(2)}_s(\omega_3 = \omega_1 + \omega_2) : \mathbf{E}(\omega_1)\mathbf{E}(\omega_2) \] (B1.5.31)


In this equation as well as in the succeeding discussions, we have suppressed, for notational simplicity, the 
permutation or degeneracy factor of two, required for SFG. 
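
As an illustration of the tensor contraction in equation B1.5.31, the sketch below builds the susceptibility of an isotropic surface and evaluates the induced surface polarization for two choices of pump polarization. It assumes NumPy; the function names and the numerical values of the tensor elements are hypothetical, chosen only to make the index bookkeeping visible, and the degeneracy factor is suppressed as in the text.

```python
import numpy as np

X, Y, Z = 0, 1, 2   # z is the surface normal

def isotropic_surface_chi(chi_zzz, chi_zxx, chi_xzx, chi_xxz):
    """Rank-3 tensor of an isotropic surface with mirror symmetry."""
    chi = np.zeros((3, 3, 3))
    chi[Z, Z, Z] = chi_zzz
    for a in (X, Y):              # x and y are equivalent in-plane directions
        chi[Z, a, a] = chi_zxx    # perpendicular-parallel-parallel
        chi[a, Z, a] = chi_xzx    # parallel-perpendicular-parallel
        chi[a, a, Z] = chi_xxz    # parallel-parallel-perpendicular
    return chi

def surface_polarization(chi, field_1, field_2):
    """P_i = sum_jk chi_ijk E_j(omega_1) E_k(omega_2)."""
    return np.einsum("ijk,j,k->i", chi, field_1, field_2)

chi_s = isotropic_surface_chi(1.0, 0.5, 0.2, 0.3)   # illustrative values only

# Both pumps polarized in-plane along y: the response points along the normal.
p_in_plane = surface_polarization(chi_s, np.array([0.0, 1.0, 0.0]), np.array([0.0, 2.0, 0.0]))

# One pump along y, the other along z: the response lies in the plane.
p_mixed = surface_polarization(chi_s, np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0]))
```

With these inputs `p_in_plane` has only a z-component (set by the perpendicular-parallel-parallel element), while `p_mixed` has only a y-component (set by the parallel-parallel-perpendicular element).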

To define this model fully, we must specify the linear dielectric response in the vicinity of our surface
nonlinear susceptibility. We do this in a general fashion by introducing a (frequency-dependent) linear
dielectric response of the interfacial region ε', which is bounded by the bulk media with dielectric constants ε_1
and ε_2. For simplicity, we consider all of these quantities to be scalar, corresponding to isotropic linear optical
properties. The phenomenological model for second-order nonlinear optical effects is summarized in figure
B1.5.5. (An alternative convention [38, 39] for defining the surface nonlinear susceptibility is one in which the
fundamental fields E(ω_1) and E(ω_2) are taken as their value in the lower medium and the polarized sheet is
treated as radiating in the upper medium. This convention corresponds in our model to the assignments of
ε'(ω_1) = ε_2(ω_1), ε'(ω_2) = ε_2(ω_2), and ε'(ω_3) = ε_1(ω_3).)

From the point of view of tensor properties, the surface nonlinear susceptibility χ_s^(2) is quite analogous to the
bulk nonlinear response χ^(2) in a non-centrosymmetric medium. Consequently, in the absence of any
symmetry constraints, χ_s^(2) will exhibit 27 independent elements for SFG and, because of symmetry for the last
two indices, 18 independent elements for SHG. If the surface exhibits certain in-plane symmetry properties,
then the form of χ_s^(2) will be simplified correspondingly. For the common situation of an isotropic surface, for
example, the allowed elements of χ_s^(2) may be denoted as χ^(2)_s,⊥⊥⊥, χ^(2)_s,⊥∥∥, χ^(2)_s,∥⊥∥ and χ^(2)_s,∥∥⊥, where ⊥
corresponds to the z direction and ∥ refers to either x or y. For the case of SHG, where the fundamental
frequencies are equal, ω_1 = ω_2 = ω, the tensor elements χ^(2)_s,∥⊥∥ and χ^(2)_s,∥∥⊥ are likewise equivalent. The
non-vanishing elements of χ_s^(2) for other commonly encountered surface symmetries are summarized in
table B1.5.1.

Table B1.5.1 Independent non-vanishing elements of the nonlinear susceptibility χ_s^(2) for an interface in the
xy-plane for various symmetry classes. When mirror planes are present, at least one of them is perpendicular
to the y-axis. For SHG, elements related by the permutation of the last two elements are omitted. For SFG,
these elements are generally distinct; any symmetry constraints are indicated in parentheses. The terms
enclosed in parentheses are antisymmetric elements present only for SFG. (After [71])

Symmetry class      Independent non-vanishing elements

1                   xxx, xxy, xyy, xyz, xxz, xzz, yxx, yxy, yxz, yyy, yyz, yzz, zxx,
                    zxy, zxz, zyy, zyz, zzz
m                   xxx, xyy, xzx, xzz, yxy, yyz, zxx, zxz, zyy, zzz
2                   xzx, xyz, yxz, yzy, zxx, zyy, zxy, zzz
2mm                 xzx, yzy, zxx, zyy, zzz
3                   xxx = −xyy = −yyx (= −yxy), yyy = −yxx = −xyx (= −xxy), yzy = xzx,
                    zxx = zyy, xyz = −yxz, zzz, (zxy = −zyx)
3m                  xxx = −xyy = −yxy (= −yyx), yzy = xzx, zxx = zyy, zzz
4, 6, ∞             xxz = yyz, zxx = zyy, xyz = −yxz, zzz, (zxy = −zyx)
4mm, 6mm, ∞m        xxz = yyz, zxx = zyy, zzz
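
Entries of this kind can be generated by averaging an arbitrary tensor over the operations of the surface symmetry group, since only symmetry-allowed elements survive the average. The sketch below is one way to do this, assuming NumPy; the helper names are illustrative, and a general SFG tensor is used, so elements related by permutation of the last two indices (for example xxz and xzx) appear separately rather than being merged as in the SHG convention of the table.

```python
import itertools
import numpy as np

def rotation_z(angle):
    """Rotation by `angle` (radians) about the surface normal z."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Mirror plane perpendicular to the y-axis (y -> -y), as in the table caption.
MIRROR_Y = np.diag([1.0, -1.0, 1.0])

def generate_group(generators, max_size=48):
    """Close a list of 3x3 generator matrices under multiplication."""
    group = [np.eye(3)]
    frontier = [np.eye(3)]
    while frontier:
        next_frontier = []
        for g in frontier:
            for h in generators:
                candidate = h @ g
                if not any(np.allclose(candidate, k) for k in group):
                    if len(group) >= max_size:
                        raise ValueError("group did not close; check the generators")
                    group.append(candidate)
                    next_frontier.append(candidate)
        frontier = next_frontier
    return group

def symmetrize(chi, group):
    """Average chi over the group: chi'_ijk = R_il R_jm R_kn chi_lmn for each R."""
    images = [np.einsum("il,jm,kn,lmn->ijk", R, R, R, chi) for R in group]
    return sum(images) / len(group)

def allowed_elements(generators, seed=0, tol=1e-10):
    """Map labels such as 'zxx' to the surviving values of a symmetrized random tensor."""
    rng = np.random.default_rng(seed)
    chi = symmetrize(rng.normal(size=(3, 3, 3)), generate_group(generators))
    axes = "xyz"
    return {
        axes[i] + axes[j] + axes[k]: chi[i, j, k]
        for i, j, k in itertools.product(range(3), repeat=3)
        if abs(chi[i, j, k]) > tol
    }

# Class 4mm: four-fold axis along z plus a mirror plane perpendicular to y.
elements_4mm = allowed_elements([rotation_z(np.pi / 2), MIRROR_Y])
```

For 4mm the surviving labels are xxz, yyz, xzx, yzy, zxx, zyy and zzz, with xxz = yyz and zxx = zyy, in agreement with the last row of the table. Replacing the rotation angle by 2π/3 gives the corresponding result for class 3m.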


The linear and nonlinear optical responses for this problem are defined by ε_1, ε_2, ε' and χ_s^(2), respectively, as
indicated in figure B1.5.5. In order to determine the nonlinear radiation, we need to introduce appropriate
pump radiation fields E(ω_1) and E(ω_2). If these pump beams are well-collimated, they will give rise to
well-collimated radiation emitted through the surface nonlinear response. Because the nonlinear response is
present only in a thin layer, phase matching [37] considerations are unimportant and nonlinear emission will be
present in both transmitted and reflected directions.

Here we model the pump beams associated with fields E(ω_1) and E(ω_2) as plane waves with wavevectors
k_1(ω_1) and k_1(ω_2), of magnitudes ω_1 √ε_1(ω_1)/c and ω_2 √ε_1(ω_2)/c. The directions of the reflected and
transmitted beams can then be obtained simply through conservation of the in-plane component of the
wavevector, i.e. k_1x(ω_1) + k_1x(ω_2) = k_1x(ω_3) = k_2x(ω_3). This is the nonlinear optical analogue of Snell's law.
For the case of SHG, this equation may be reduced to n(ω) sin θ_ω = n(2ω) sin θ_2ω for the angle of the incident
pump radiation, θ_ω, and the angle of the emitted nonlinear beams, θ_2ω. The refractive indices in this equation
correspond to those of the relevant bulk medium through which the beams propagate. For reflection in a
non-dispersive medium, we obtain simply θ_ω = θ_2ω, as for the law of reflection. For the transmitted beam, the
relation in the absence of dispersion reduces to the usual Snell's law for refraction.
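
The conservation of the in-plane wavevector translates directly into a short calculation of the emission angle. The sketch below is a minimal illustration using only the Python standard library; the function name and argument names are hypothetical, both pump beams are assumed to lie in the same plane of incidence, and real (non-absorbing) refractive indices are assumed.

```python
import math

def emission_angle(omega_1, theta_1, omega_2, theta_2, n_in_1, n_in_2, n_out):
    """Angle (radians, from the surface normal) of the sum-frequency beam.

    n_in_1, n_in_2: refractive indices of the incident medium at omega_1 and omega_2.
    n_out: refractive index at omega_1 + omega_2 of the medium in which the
           emitted beam propagates (medium 1 for reflection, medium 2 for transmission).
    """
    omega_3 = omega_1 + omega_2
    # k_x = n * omega * sin(theta) / c; the common factor 1/c cancels.
    k_x = n_in_1 * omega_1 * math.sin(theta_1) + n_in_2 * omega_2 * math.sin(theta_2)
    sine = k_x / (n_out * omega_3)
    if abs(sine) > 1.0:
        raise ValueError("no propagating beam: the emitted wave is evanescent")
    return math.asin(sine)

theta = math.radians(45.0)

# SHG reflected into a non-dispersive medium: the law of reflection is recovered.
reflected = emission_angle(1.0, theta, 1.0, theta, 1.0, 1.0, 1.0)

# SHG transmitted into a non-dispersive medium of index 1.5: ordinary Snell's law.
transmitted = emission_angle(1.0, theta, 1.0, theta, 1.0, 1.0, 1.5)
```

For two non-collinear pumps of different frequency, the same function gives the distinct direction of the sum-frequency beam that is exploited for spatial filtering in SFG experiments.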

A full solution of the nonlinear radiation follows from the Maxwell equations. The general case of radiation 
from a second-order nonlinear material of finite thickness was solved by Bloembergen and Pershan in 1962 
[40]. That problem reduces to the present one if we let the interfacial thickness approach zero. Other 
equivalent solutions involved the application of the boundary conditions for a polarization sheet [14] or the
use of a Green's function formalism for the surface [38, 39].

From such a treatment, we may derive explicit expressions for the nonlinear radiation in terms of the linear
and nonlinear response and the excitation conditions. For the case of nonlinear reflection, we obtain an
irradiance for the radiation emitted at the nonlinear frequency ω_3 of

\[ I(\omega_3) = \frac{8\pi^3 \omega_3^2 \sec^2\theta}{c^3 [\epsilon_1(\omega_3)\epsilon_1(\omega_1)\epsilon_1(\omega_2)]^{1/2}} \left| \mathbf{e}'(\omega_3) \cdot \chi^{(2)}_s : \mathbf{e}'(\omega_1)\mathbf{e}'(\omega_2) \right|^2 I(\omega_1) I(\omega_2) \] (B1.5.32)

where I(ω_1) and I(ω_2) denote the intensities of the pump beams at frequencies ω_1 and ω_2 incident from
medium 1; c is the speed of light in vacuum; θ is the angle of propagation direction of the nonlinear radiation
relative to the surface normal. The vectors e'(ω_1), e'(ω_2) and e'(ω_3) represent the unit polarization vectors
ê_1(ω_1), ê_1(ω_2) and ê_1(ω_3), respectively, adjusted to account for the linear optical propagation of the waves.

More specifically, we may write

\[ \mathbf{e}'(\omega) = F_{1 \to 2}\, \hat{\mathbf{e}}_1(\omega). \] (B1.5.33)

Here the 'Fresnel transformation' F_{1→2} describes the relationship between the electric field E ê_1 in medium 1
(propagating towards medium 2) and the resulting field E e' at the interface. For light incident in the x-z plane
as shown in figure B1.5.5, F_{1→2} is a diagonal matrix whose elements are

\[ F^{xx}_{1 \to 2} = \frac{2 \epsilon_1 k_{2,z}}{\epsilon_2 k_{1,z} + \epsilon_1 k_{2,z}} \] (B1.5.34)

\[ F^{yy}_{1 \to 2} = \frac{2 k_{1,z}}{k_{1,z} + k_{2,z}} \] (B1.5.35)

\[ F^{zz}_{1 \to 2} = \frac{2 (\epsilon_1 \epsilon_2 / \epsilon')\, k_{1,z}}{\epsilon_2 k_{1,z} + \epsilon_1 k_{2,z}} \] (B1.5.36)

where the quantity k_{i,z} denotes the magnitude of the z-component of the wavevector in medium i at the
relevant wavelength (ω_1, ω_2 or ω_3).
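
The Fresnel transformation is straightforward to evaluate numerically. The following sketch, assuming NumPy, computes the three diagonal elements from the dielectric constants and the angle of incidence in medium 1. The function names are illustrative, the wavevector is expressed in units of ω/c, and the principal branch of the square root is used, which is an assumption that should be revisited for strongly absorbing media.

```python
import numpy as np

def k_z(eps, k_x, omega_over_c=1.0):
    """z-component of the wavevector in a medium of dielectric constant eps."""
    return np.sqrt(complex(eps) * omega_over_c**2 - k_x**2)

def fresnel_transformation(eps_1, eps_2, eps_interface, theta_1, omega_over_c=1.0):
    """Diagonal matrix relating the field in medium 1 to the field at the interface."""
    k_x = np.sqrt(complex(eps_1)) * omega_over_c * np.sin(theta_1)  # conserved in-plane component
    k_1z = k_z(eps_1, k_x, omega_over_c)
    k_2z = k_z(eps_2, k_x, omega_over_c)
    p_denominator = eps_2 * k_1z + eps_1 * k_2z
    f_xx = 2.0 * eps_1 * k_2z / p_denominator
    f_yy = 2.0 * k_1z / (k_1z + k_2z)
    f_zz = 2.0 * (eps_1 * eps_2 / eps_interface) * k_1z / p_denominator
    return np.diag([f_xx, f_yy, f_zz])

# With no dielectric contrast the transformation reduces to the identity.
f_uniform = fresnel_transformation(1.0, 1.0, 1.0, theta_1=0.5)

# Normal incidence from vacuum onto a medium of index 1.5 (eps_2 = 2.25).
f_normal = fresnel_transformation(1.0, 2.25, 1.0, theta_1=0.0)
```

At normal incidence both in-plane elements reduce to 2/(1 + n), the familiar transmission coefficient, which gives 0.8 for n = 1.5.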

The treatment of this section has been based on an assumed nonlinear surface response χ_s^(2) and has dealt
entirely with electromagnetic considerations of excitation and radiation from the interface. A complete
theoretical picture, however, includes developing a microscopic description of the surface nonlinear
susceptibility. In the discussion in section B1.5.4, we will introduce some simplified models. In this context,
an important first approximation for many systems of chemical interest may be obtained by treating the
surface nonlinearity as arising from the composite of individual molecular contributions. The molecular
response is typically assumed to be that of the isolated molecule, but in the summation for the surface
nonlinear response, we take into account the orientational distribution appropriate for the surface or interface,
as we discuss later. Local-field corrections may also be included [41, 42]. Such analyses may then draw on the
large and well-developed literature concerning the second-order nonlinearity of molecules [43, 44]. If we are
concerned with the response of the surface of a clean solid, we must typically adopt a different approach: one
based on delocalized electrons. This is a challenging undertaking, as a proper treatment of the linear optical
properties of surfaces of solids is already difficult [45]. Nonetheless, in recent years significant progress has
been made in developing a fundamental theory of the nonlinear response of surfaces of both metals [46, 47,
48, 49, 50 and 51] and semiconductors [52, 53, 54 and 55].

(B) BULK CONTRIBUTION

For centrosymmetric media the spatially local contribution to the second-order nonlinear response vanishes, 
as we have previously argued, providing the interface specificity of the method. This spatially local 
contribution, which arises in the quantum mechanical picture from the electric-dipole terms, represents the 
dominant response of the medium. However, if we consider the problem of probing interfaces closely, we 
recognize that we are comparing the nonlinear signal originating from an interfacial region of monolayer 
thickness with that of the bulk media. In the bulk media, the signal can build up over a thickness on the scale 
of the optical wavelength, as dictated by absorption and phase-matching considerations. Thus, a bulk 
nonlinear polarization that is much weaker than that of the dipole-allowed contribution present at the interface 
may still prove to be significant because of the larger volume contributing to the emission. Let us examine this 
point in a somewhat more quantitative fashion. 

The higher-order bulk contribution to the nonlinear response arises, as just mentioned, from a spatially
non-local response in which the induced nonlinear polarization does not depend solely on the value of the
fundamental electric field at the same point. To leading order, we may represent these non-local terms as
being proportional to a nonlinear response incorporating a first spatial derivative of the fundamental electric
field. Such terms correspond in the microscopic theory to the inclusion of electric-quadrupole and
magnetic-dipole contributions. The form of these bulk contributions may be derived on the basis of symmetry
considerations. As an example of a frequently encountered situation, we indicate here the non-local
polarization for SHG in a cubic material excited by a plane wave E(ω):

\[ P^{(2)}_i(2\omega) = \gamma \nabla_i [\mathbf{E}(\omega) \cdot \mathbf{E}(\omega)] + \zeta E_i(\omega) \nabla_i E_i(\omega). \] (B1.5.37)

The two coefficients γ and ζ describe the material response and the Cartesian coordinate i must be chosen as a
principal axis of the material.

From consideration of the quantum mechanical expression of such a non-local response, one may argue that
the dipole-forbidden bulk nonlinear polarization will have a strength reduced from that of the dipole-allowed
response by a factor of the order of (a/λ), with a denoting a typical atomic dimension and λ representing the
wavelength of light. On the other hand, the relevant volume for the bulk contribution typically exceeds that of
the interface by a factor of the order of (λ/a). Consequently, one estimates that the net bulk and surface
contributions to the nonlinear radiation may be roughly comparable in strength. In practice, the interfacial
contribution often dominates that of the bulk. Nonetheless, one should not neglect a priori the possible role of
the bulk nonlinear response. This situation of a possible bulk background signal comparable to that of the
interface should be contrasted to the expected behaviour for a conventional optical probe lacking interface
specificity. In the latter case, the bulk contribution would be expected to dominate that of the interface by
several orders of magnitude.

(C) OTHER SOURCES 

As we have discussed earlier in the context of surfaces and interfaces, the breaking of the inversion symmetry 
strongly alters the SHG from a centrosymmetric medium. Surfaces and interfaces are not the only means of 
breaking the inversion symmetry of a centrosymmetric material. Another important perturbation is that 
induced by (static) electric fields. Such electric fields may be applied externally or may arise internally from a 
depletion layer at the interface of a semiconductor or from a double-charge layer at the interface of a liquid. 


Since the electric field is a polar vector, it acts to break the inversion symmetry and gives rise to
dipole-allowed sources of nonlinear polarization in the bulk of a centrosymmetric medium. Assuming that the DC
field, E_DC, is sufficiently weak to be treated in a leading-order perturbation expansion, the response may be
written as

\[ \mathbf{P}^{(2)}(2\omega) = \chi^{(3)} \,\vdots\, \mathbf{E}(\omega)\mathbf{E}(\omega)\mathbf{E}_{DC} \] (B1.5.38)

where χ^(3) is the effective third-order response. This process is called electric-field-induced SHG or EFISH.

A different type of external perturbation is the application of a magnetic field. In contrast to the case of an
electric field, an applied magnetic field does not lift the inversion symmetry of a centrosymmetric medium.
Hence, it does not give rise to a dipole-allowed bulk polarization for SHG or SFG [23, 56]. A magnetic field
can, however, modify the form and strength of the interfacial nonlinear response, as well as the bulk
quadrupole nonlinear susceptibilities. This process is termed magnetization-induced SHG or MSHG.
Experiments exploiting both EFISH and MSHG phenomena are discussed in section B1.5.4.7.


B1.5.3 EXPERIMENTAL CONSIDERATIONS 

In this section, we provide a brief overview of some experimental issues relevant in performing surface SHG 
and SFG measurements. 

B1.5.3.1 EXPERIMENTAL GEOMETRY

The main panel of figure B1.5.6 portrays a typical setup for SHG. A laser source of frequency ω is directed to
the sample, with several optical stages typically being introduced for additional control and filtering. The
combination of a half-wave plate and a polarizer is used to specify the orientation of the polarization of the
pump beam. It can also serve as a variable attenuator to adjust the intensity of the incoming beam. A lens then
focuses the beam onto the sample. A low-pass filter is generally needed along the path of the fundamental
radiation prior to the sample to remove any unwanted radiation at the frequency of the nonlinear radiation.
This radiation may arise from previous optical components, including the laser source itself.

Figure B1.5.6 Experimental geometry for typical SHG and SFG measurements.

The reflected radiation consists of a strong beam at the fundamental frequency ω and a weak signal at the SH
frequency. Consequently, a high-pass filter is introduced, which transmits the nonlinear radiation, but blocks
the fundamental radiation. By inserting this filter immediately after the sample, we minimize the generation of
other nonlinear optical signals from succeeding optical components by the fundamental light reflected from
the sample. After this initial filtering stage, a lens typically recollimates the beam and an analyser is used to
select the desired polarization. Although not always essential, a monochromator or bandpass filter is often
desirable to ensure that only the SH signal is measured. Background signals near the SH frequency, but not
associated with the SHG process, may arise from multiphoton fluorescence, hyper-Raman scattering and other
nonlinear processes described by higher-order nonlinear susceptibilities. Detection is usually accomplished
through a photomultiplier tube. Depending on the nature of the laser source, various sensitive schemes for
electronic detection of the photomultiplier may be employed, such as photon counting and gated integration.
In conjunction with an optical chopper in the beam path, lock-in amplification techniques may also be
advantageous.

One of the key factors for performing surface SHG/SFG measurements is to reduce all sources of background
signals, since the desired nonlinear signal is always relatively weak. This goal is accomplished most
effectively by exploiting the well-defined spectral, spatial and temporal characteristics of the nonlinear
radiation. The first of these is achieved by spectral filtering, as previously discussed. The second may be
achieved through the use of appropriate apertures; and the last, particularly for low-repetition rate systems,
can be incorporated into the electronic detection scheme. When the excitation is derived from two
non-collinear beams, the nonlinear emission will generally travel in a distinct direction (figure B1.5.6, inset).
In this case, one can also exploit spatial filters to enhance spectral selectivity, since the reflected pump beams
will travel in different directions. This property is particularly useful for SFG experiments in which the
frequency of the visible beam and the SF signal are relatively similar.

For some experiments, it may be helpful to obtain a reference signal to correct for fluctuations and long-term
drift in the pump laser. This correction is best accomplished by performing simultaneous measurements of the
SHG or SFG from a medium that has a strong χ^(2) response in a separate detection arm. By this means, one
may fully compensate for variations not only in pulse energy, but also in the temporal and spatial substructure
of the laser pulses. Some experiments may require measurement of the phase of the nonlinear signal [57].
Such phase measurements rely on interference with radiation from a reference nonlinear source. The required 
interference can be achieved by placing a reference nonlinear crystal along the path of the laser beam 
immediately either before or after the sample [58]. For effective interference, we must control both the 
amplitude and polarization of the reference signal. This may be achieved by appropriate focusing conditions 
and crystal alignment. The phase of the reference signal must also be adjustable. Phase control may be 
obtained simply by translating the reference sample along the path of the laser, making use of the dispersion 
of the air (or other medium) through which the beams propagate. 

B1.5.3.2 LASER SOURCES

In order to achieve a reasonable signal strength from the nonlinear response of approximately one atomic
monolayer at an interface, a laser source with high peak power is generally required. Common sources include
Q-switched (~10 ns pulsewidth) and mode-locked (~100 ps) Nd:YAG lasers, and mode-locked (~10 fs to 1 ps)
Ti:sapphire lasers. Broadly tunable sources have traditionally been based on dye lasers. More recently, optical
parametric oscillator/amplifier (OPO/OPA) systems are coming into widespread use for tunable sources of
both visible and infrared radiation.

In typical experiments, the laser fluence, or the energy per unit area, is limited to the sample's damage
threshold. This generally lies in the range ≲1 J cm^-2 and constrains our ability to increase signal strength by
increasing the pump energy. Frequently, the use of femtosecond pulses is advantageous, as one may obtain a
higher intensity (and, hence, higher nonlinear conversion efficiency) at lower fluence. In addition, such
sources generally permit one to employ lower average intensity, which reduces average heating of the sample
and other undesired effects [59]. Independently of these considerations, femtosecond lasers are, of course,
also attractive for the possibilities that they offer for measurements of ultrafast dynamics.

B1.5.3.3 SIGNAL STRENGTHS

We now consider the signal strengths from surface SHG/SFG measurements. For this purpose, we may recast
expression B1.5.32 for the reflected nonlinear radiation in terms of the number of emitted photons per unit time
as

\[ S = \frac{8\pi^3 \omega_3 \sec^2\theta}{\hbar c^3 [\epsilon_1(\omega_3)\epsilon_1(\omega_1)\epsilon_1(\omega_2)]^{1/2}} \left| \mathbf{e}'(\omega_3) \cdot \chi^{(2)}_s : \mathbf{e}'(\omega_1)\mathbf{e}'(\omega_2) \right|^2 \frac{P_{avg}(\omega_1) P_{avg}(\omega_2)}{t_p R_{rep} A}. \] (B1.5.39)

The quantities in this formula are defined as in equation B1.5.32, but with the laser parameters translated into
more convenient terms: P_avg is the average power at the indicated frequency; t_p is the laser pulse duration;
R_rep is the pulse repetition rate; and A is the irradiated area at the interface. The last three defined quantities
are assumed to be equal for both excitation beams in an SFG measurement. If this is not the case, then t_p,
R_rep, A, as well as the average power P_avg(ω_i), have to be replaced by the corresponding quantities within the
window of spatial and temporal overlap.

From this expression, we may estimate typical signals for a surface SHG measurement. We assume the
following as representative parameters: χ_s^(2) = 10^-15 esu, ε_1 = 1, and sec²θ = 4. For typical optical frequencies,
we then obtain S ≈ 10^-2 (P_avg)²/(t_p R_rep A), where P_avg, A, R_rep and t_p are expressed, respectively, in W, cm²,
Hz and s. Many recent SHG studies have been performed with mode-locked Ti:sapphire lasers. For typical
laser parameters of P_avg = 100 mW, A = 10^-4 cm², R_rep = 100 MHz, and t_p = 100 fs, one then obtains S ≈ 10^5
counts per second as a representative nonlinear signal.
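
The arithmetic of this estimate can be reproduced in a few lines. The sketch below uses only the Python standard library. The scaling S ≈ 10^-2 (P_avg)²/(t_p R_rep A) is the one implied by the quoted parameters and result. The function `shg_prefactor` additionally assumes a Gaussian-unit prefactor of the form 8π³ω sec²θ |χ_s^(2)|²/(ħc³ ε_1^(3/2)) and a second-harmonic wavelength of 400 nm; both are assumptions made here for illustration, not values stated in the text.

```python
import math

HBAR = 1.0546e-27           # reduced Planck constant in erg s (Gaussian units)
C = 2.998e10                # speed of light in cm/s
ERG_PER_S_PER_WATT = 1e7    # 1 W = 1e7 erg/s

def shg_prefactor(chi_esu=1e-15, eps_1=1.0, sec2_theta=4.0, sh_wavelength_cm=400e-7):
    """Numerical prefactor of S when powers are given in W (assumed formula, see lead-in)."""
    omega_sh = 2.0 * math.pi * C / sh_wavelength_cm
    gaussian = (8.0 * math.pi**3 * omega_sh * sec2_theta * chi_esu**2
                / (HBAR * C**3 * eps_1**1.5))
    return gaussian * ERG_PER_S_PER_WATT**2   # convert (erg/s)^2 to W^2

def shg_count_rate(p_avg_w, area_cm2, rep_rate_hz, pulse_s, prefactor=1e-2):
    """Photons per second from S = prefactor * P_avg^2 / (t_p * R_rep * A)."""
    return prefactor * p_avg_w**2 / (pulse_s * rep_rate_hz * area_cm2)

prefactor_estimate = shg_prefactor()   # of order 1e-2 for the assumed wavelength

# Representative Ti:sapphire parameters from the text: 100 mW, 1e-4 cm^2, 100 MHz, 100 fs.
rate = shg_count_rate(p_avg_w=0.1, area_cm2=1e-4, rep_rate_hz=100e6, pulse_s=100e-15)
```

The same function makes the trade-offs explicit: at fixed average power, the count rate grows as the pulse duration, repetition rate or irradiated area is reduced, subject to the damage threshold of the sample.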

B1.5.3.4 DETERMINATION OF NONLINEAR SUSCEPTIBILITY ELEMENTS

The basic physical quantities that define the material for SHG or SFG processes are the nonlinear
susceptibility elements χ^(2)_s,ijk. Here we consider how one may determine these quantities experimentally. For
simplicity, we treat the case of SHG and assume that the surface is isotropic. From symmetry considerations,
we know that χ_s^(2) has three independent and non-vanishing elements: χ^(2)_s,⊥⊥⊥, χ^(2)_s,⊥∥∥, and
χ^(2)_s,∥⊥∥ = χ^(2)_s,∥∥⊥. The individual elements χ^(2)_s,⊥∥∥ and χ^(2)_s,∥⊥∥ can be extracted directly by an appropriate
choice of input and output polarizations. Response from the χ^(2)_s,⊥∥∥ element requires s-polarized pump radiation
and produces p-polarized SH emission; excitation of the χ^(2)_s,∥⊥∥ element requires a mixed-polarized pump
radiation and can be isolated by detection of s-polarized radiation. The measurement of χ^(2)_s,⊥⊥⊥ is a bit more
complicated: to isolate it would require a pump electric field aligned normal to the surface, thus implying a
pump beam travelling parallel to the surface (the limit of grazing incidence).

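An alternative scheme for extracting all three isotropic nonlinear susceptibilities can be formulated by
examining equation B1.5.39. By choosing an appropriate configuration and the orientation of the polarization
of the SH radiation e'(2ω) such that the SHG signal vanishes, one obtains, assuming only surface contribution
with real elements χ^(2)_s,ijk,

\[ \mathbf{e}'(2\omega) \cdot \chi^{(2)}_s : \mathbf{e}'(\omega)\mathbf{e}'(\omega) = 0. \] (B1.5.40)

Expanding this equation, we deduce that

\[ \chi^{(2)}_{s,\perp\perp\perp}\, e'_\perp(2\omega)\, [e'_\perp(\omega)]^2 + \chi^{(2)}_{s,\perp\parallel\parallel}\, e'_\perp(2\omega)\, [\mathbf{e}'_\parallel(\omega) \cdot \mathbf{e}'_\parallel(\omega)] + 2 \chi^{(2)}_{s,\parallel\perp\parallel}\, [\mathbf{e}'_\parallel(2\omega) \cdot \mathbf{e}'_\parallel(\omega)]\, e'_\perp(\omega) = 0. \]
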
The magnitudes of e'_i (i = ⊥, ∥) contain the Fresnel factors from equation B1.5.34, equation B1.5.35 and
equation B1.5.36, which depend on the incident, reflected and polarization angles. Experimentally, one
approach is to fix the input polarization and adjust the analyser to obtain a null in the SH signal [60]. By
choosing distinct configurations such that the corresponding three equations from equation B1.5.40 are
linearly independent, the relative values of χ^(2)_s,⊥⊥⊥, χ^(2)_s,⊥∥∥, and χ^(2)_s,∥⊥∥ = χ^(2)_s,∥∥⊥ may be inferred. This
method has been implemented, for example, in determining the three susceptibility elements for the air/water
interface [61]. The procedure just described is suitable for ascertaining the relative magnitudes of the allowed
elements of χ_s^(2).
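
One possible way to turn a set of null conditions into relative susceptibility values is sketched below, assuming NumPy. Each null measurement contributes one row of coefficients multiplying (χ^(2)_s,⊥⊥⊥, χ^(2)_s,⊥∥∥, χ^(2)_s,∥⊥∥) for an isotropic surface, and the right-singular vector belonging to the smallest singular value gives the common direction that best satisfies all conditions. The function names, the synthetic susceptibility values and the idealized analyser directions are all hypothetical; in a real experiment the vectors e'(ω) and e'(2ω) would include the Fresnel factors and the analyser would be constrained to be transverse to the emitted beam.

```python
import numpy as np

def null_row(e_out, e_in):
    """Coefficients (a, b, c) of a*chi_zzz + b*chi_zxx + c*chi_xzx = 0 for one null."""
    a = e_out[2] * e_in[2] ** 2
    b = e_out[2] * (e_in[0] ** 2 + e_in[1] ** 2)
    c = 2.0 * (e_out[0] * e_in[0] + e_out[1] * e_in[1]) * e_in[2]
    return np.array([a, b, c])

def relative_susceptibilities(rows):
    """Least-squares direction of (chi_zzz, chi_zxx, chi_xzx), normalized to chi_zzz."""
    _, _, vt = np.linalg.svd(np.asarray(rows))
    solution = vt[-1]               # singular vector of the smallest singular value
    return solution / solution[0]

def induced_polarization(e_in, chi):
    """Second-harmonic polarization of an isotropic surface (degeneracy factor included)."""
    ex, ey, ez = e_in
    chi_zzz, chi_zxx, chi_xzx = chi
    return np.array([2 * chi_xzx * ex * ez,
                     2 * chi_xzx * ey * ez,
                     chi_zzz * ez ** 2 + chi_zxx * (ex ** 2 + ey ** 2)])

# Synthetic test data: pick "true" values, then construct analyser directions that null the signal.
chi_true = np.array([1.0, 0.4, 0.25])
configurations = [
    (np.array([0.6, 0.0, 0.8]), np.array([0.0, 1.0, 0.0])),
    (np.array([0.8, 0.0, 0.6]), np.array([0.0, 1.0, 0.0])),
    (np.array([0.0, 0.6, 0.8]), np.array([1.0, 0.0, 0.0])),
]
rows = []
for e_in, helper in configurations:
    p = induced_polarization(e_in, chi_true)
    e_null = np.cross(p, helper)    # any direction orthogonal to p gives zero signal
    rows.append(null_row(e_null / np.linalg.norm(e_null), e_in))

recovered = relative_susceptibilities(rows)
```

On this noise-free synthetic data `recovered` reproduces the ratios 1 : 0.4 : 0.25. With measured data the same singular-vector step acts as a least-squares fit over all null configurations.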

A determination of the absolute magnitude of the elements of χ_s^(2) may also be of value. In principle, this might
be accomplished by careful measurements of signal strengths from equation (B1.5.32) or equation (B1.5.39).
In practice, such an approach is very difficult, as it would require precise calibration of the parameters of the
laser radiation and absolute detection sensitivity. The preferred method is consequently to compare the surface
SHG or SFG response to that of a reference crystal with a known bulk nonlinearity inserted in place of the
sample. Since the expected signal can be calculated for this reference material, we may then infer the absolute
calibration of χ_s^(2) by comparison. It should be further noted that the phase of elements of χ_s^(2) may also be
established by interference measurements as indicated in section B1.5.3.1.


B1.5.4 APPLICATIONS

The discussion of applications of the SHG and SFG methods in this section is directed towards an exposition 
of the different types of information that may be obtained from such measurements. The topics have been 
arranged accordingly into seven general categories that arise chiefly from the properties of the nonlinear 
susceptibility: surface symmetry and order, adsorbate coverage, molecular orientation, spectroscopy, 
dynamics, spatial resolution and perturbations induced by electric and magnetic fields. Although we have 
included some illustrative examples, a comprehensive description of the broad range of materials probed by 
these methods, and what has been learned about them, is clearly beyond the scope of this chapter. 

B1.5.4.1 SURFACE SYMMETRY AND ORDER 

Spatial symmetry is one of the basic properties of a surface or interface. If the symmetry of the surface is 
known a priori, then this knowledge may be used to simplify the form of the surface nonlinear susceptibility 
$\chi^{(2)}_s$, as discussed in section B1.5.2.2. Conversely, in the absence of knowledge of the surface symmetry, we
may characterize the form of $\chi^{(2)}_s$ experimentally and then make inferences about the symmetry of the surface
or interface. This provides some useful information about the material system under study, as we shall
illustrate in this section. Before doing so, we should remark that the spatial properties being probed are 
averaged over a length scale that depends on the precise experimental geometry, but exceeds the scale of an 
optical wavelength. In the following paragraphs, we consider two of the interesting cases of surface 
symmetry: that corresponding to the surface of a crystalline material and that of a surface with chiral 
character. 

(A) CRYSTALLINE SURFACES 


All of the symmetry classes compatible with the long-range periodic arrangement of atoms comprising 
crystalline surfaces and interfaces have been enumerated in table B1.5.1. For each of these symmetries, we
indicate the corresponding form of the surface nonlinear susceptibility $\chi^{(2)}_s$. With the exception of surfaces
with four-fold or six-fold rotational symmetries, all of these symmetry classes give rise to a $\chi^{(2)}_s$ that may be
distinguished from that of an isotropic surface with mirror symmetry, the highest possible surface symmetry.

An experimental analysis of the surface symmetry may be carried out in various ways. For a fixed crystal
orientation, the surface symmetry may be probed by modifying the angle of incidence and polarization of the
input and output beams. This approach is often employed for samples, such as those in ultrahigh vacuum, that
are difficult to manipulate. An attractive alternative method is to probe the rotational anisotropy simply by
recording the change in the nonlinear signal as the sample is rotated about its surface normal. The resulting
data reflect directly the symmetry of the surface or interface. Thus, the rotational pattern for the (111) surface
of a cubic centrosymmetric crystal with 3m symmetry will also have three-fold symmetry with mirror planes.
A surface with four-fold symmetry, as in the case of the (001) surface of a cubic material, will give rise to a
rotational anisotropy pattern that obeys four-fold symmetry. From table B1.5.1, we see that it will, however,
do so in a trivial fashion by giving a response equivalent to that of an isotropic material, i.e. lacking any
variation with rotation of the crystal surface.

As an illustration, we consider the case of SHG from the (111) surface of a cubic material (3m symmetry).
More general treatments of rotational anisotropy in centrosymmetric crystals may be found in the literature
[62, 63 and 64]. For the case at hand, we may determine the anisotropy of the radiated SH field from equation
B1.5.32 in conjunction with the form of $\chi^{(2)}_s$ from table B1.5.1. We find, for example, for the p-in/p-out and
s-in/s-out polarization configurations:

$$E_{p\text{-in}/p\text{-out}}(2\omega) = a_0 + a_3 \cos(3\psi)$$ (B1.5.41)

$$E_{s\text{-in}/s\text{-out}}(2\omega) = b_3 \sin(3\psi)$$ (B1.5.42)

where $a_0$, $a_3$ and $b_3$ are constants. The angle $\psi$ corresponds to the rotation of the sample about its surface
normal and is measured between the plane of incidence and the $[11\bar{2}]$ direction in the surface plane.
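
As a rough numerical illustration of the three-fold anisotropy described above, the following Python sketch
assumes SH fields of the form a0 + a3 cos(3ψ) for p-in/p-out and b3 sin(3ψ) for s-in/s-out, with the detected
intensity taken as |E|^2. The function names and the coefficient values are hypothetical, chosen only for
illustration, and are not fitted to any data.

```python
import math


def sh_intensity_pp(psi, a0=1.0, a3=0.5):
    """p-in/p-out SH intensity, I ~ |a0 + a3*cos(3*psi)|^2 (arbitrary units).

    psi is the azimuthal rotation angle in radians; a0 and a3 are
    illustrative constants (they may also be complex).
    """
    return abs(a0 + a3 * math.cos(3.0 * psi)) ** 2


def sh_intensity_ss(psi, b3=1.0):
    """s-in/s-out SH intensity, I ~ |b3*sin(3*psi)|^2 (arbitrary units)."""
    return abs(b3 * math.sin(3.0 * psi)) ** 2


if __name__ == "__main__":
    # Tabulate both patterns over one full rotation of the sample.
    for deg in range(0, 361, 30):
        psi = math.radians(deg)
        print(f"{deg:4d}  {sh_intensity_pp(psi):8.4f}  {sh_intensity_ss(psi):8.4f}")
```

In this sketch the p-in/p-out pattern repeats every 120° and is symmetric under ψ → -ψ, whereas the s-in/s-out
intensity vanishes at ψ = 0 and shows six equal lobes, since squaring the field removes the sign of sin(3ψ).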

Figure B1.5.7 displays results of a measurement of the rotational anisotropy for an oxidized Si(111) surface
[65]. For the case shown in the top panel, the results conform to the predictions of equation B1.5.42 (with
$I(2\omega) \propto |E(2\omega)|^2$) for ideal 3m symmetry. The data clearly illustrate the strong influence of anisotropy on
measured nonlinear signals. The lower panels of figure B1.5.7 are perturbed rotational anisotropy patterns.
They correspond to data for vicinal Si(111) surfaces cut at 3° and 5°, respectively, away from the true (111)
orientation. The full lines fitting these data are obtained from an analysis in which the lowered symmetry of
these surfaces is taken into account. The results show the sensitivity of this method to slight changes in the
surface symmetry.

Figure B1.5.7 Rotational anisotropy of the SH intensity from oxidized Si(111) surfaces. The samples have
either ideal orientation or small offset angles of 3° and 5° toward the $[11\bar{2}]$ direction. Top panel illustrates the
step structure. The points correspond to experimental data and the full lines to the prediction of a symmetry
analysis. (From [65].)

SH anisotropy measurements of this kind are of use in establishing the orientation of a crystal face of a
material, as suggested by figure B1.5.7. The method is also of value for monitoring and study of crystal
growth processes [66]. Consider, for example, the growth of Si on a Si(111) surface. The crystalline surface
exhibits strong rotational anisotropy, corresponding to the 3m symmetry of the surface. This will also be the
case when a crystalline Si layer is grown on the sample. If, however, the overlayer is grown in an amorphous
state, as would occur for Si deposition at room temperature, then the anisotropy will be reduced: the disordered
overlayer will exhibit isotropic symmetry on the length scale of the optical wavelength. A further application of
rotational anisotropy measurements has been found in the characterization of surface roughness [67].


(B) CHIRAL INTERFACES 

An important distinction among surfaces and interfaces is whether or not they exhibit mirror symmetry about 
a plane normal to the surface. This symmetry is particularly relevant for the case of isotropic surfaces
(∞-symmetry), i.e. ones that are equivalent in every azimuthal direction. Those surfaces that fail to exhibit mirror
symmetry may be termed chiral surfaces. They would be expected, for example, at the boundary of a liquid
composed of chiral molecules. Magnetized surfaces of isotropic media may also exhibit this symmetry. (For a
review of SHG studies of chiral interfaces, the reader is referred to [68].) 

Given the interest and importance of chiral molecules, there has been considerable activity in investigating the
corresponding chiral surfaces [68, 69 and 70]. From the point of view of performing surface and interface
spectroscopy with nonlinear optics, we must first examine the nonlinear response of the bulk liquid. Clearly, a
chiral liquid lacks inversion symmetry. As such, it may be expected to have a strong (dipole-allowed)
second-order nonlinear response. This is indeed true in the general case of SFG [71]. For SHG, however, the
permutation symmetry for the last two indices of the nonlinear susceptibility tensor combined with the
requirement of isotropic symmetry in three dimensions implies that $\chi^{(2)} = 0$. Thus, for the case of SHG, the
surface/interface specificity of the technique is preserved even for chiral liquids.
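
The symmetry argument can be checked numerically. The sketch below assumes that the dipole-allowed bulk
susceptibility of an isotropic chiral liquid is proportional to the Levi-Civita symbol, which is the only isotropic
third-rank tensor, so that the induced polarization is proportional to the cross product of the two driving fields.
The function names and the constant chi0 are hypothetical and serve only as an illustration.

```python
def levi_civita(i, j, k):
    """Levi-Civita symbol for indices taken from {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def bulk_polarization(chi0, e1, e2):
    """P_i = chi0 * sum_jk eps_ijk * e1_j * e2_k for an isotropic chiral medium.

    e1 and e2 are the Cartesian components of the two driving fields;
    for SHG both fields are the same, for SFG they may differ.
    """
    return [
        chi0 * sum(levi_civita(i, j, k) * e1[j] * e2[k]
                   for j in range(3) for k in range(3))
        for i in range(3)
    ]


if __name__ == "__main__":
    pump = [0.3, -1.2, 0.7]
    # SHG: the same field appears twice, so the antisymmetric sum cancels.
    print("SHG:", bulk_polarization(1.0, pump, pump))
    # SFG: two distinct, non-parallel fields give a finite polarization.
    print("SFG:", bulk_polarization(1.0, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

Because the tensor is antisymmetric in its last two indices while SHG requires symmetry in those indices, the
SHG polarization cancels for any pump field, whereas SFG with non-parallel fields survives.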

A schematic diagram of the surface of a liquid of non-chiral (a) and chiral molecules (b) is shown in figure
B1.5.8. Case (a) corresponds to ∞m-symmetry (isotropic with a mirror plane) and case (b) to ∞-symmetry
(isotropic). For the ∞m-symmetry, the SH signals for the polarization configurations of s-in/s-out and
p-in/s-out vanish. From table B1.5.1, we find, however, that for the ∞-symmetry, an extra independent nonlinear
susceptibility element, $\chi^{(2)}_{s,xyz} = -\chi^{(2)}_{s,yxz}$, is present for SHG. Because of this extra element, the SH signal for
the p-in/s-out configuration is no longer forbidden and, consequently, the SH radiation need no longer be
strictly p-polarized. Figure B1.5.8(c) shows the SH signal passing through an analyser as a function of its
orientation for a racemic mixture (squares) and for a non-racemic mixture (open circles) of molecules in a
Langmuir-Blodgett film [70]. For the racemic mixture, which contains equal amounts of both
enantiomers, the effective symmetry is ∞m. Hence, the p-in/s-out signal vanishes and the response curve of
figure B1.5.8(c) is centred at 90°. For the case of the non-racemic mixture, the effective symmetry is ∞. A
p-in/s-out SH signal is present and leads, as shown in figure B1.5.8(c), to a displacement in the curve of the SH
response versus analyser setting.

Figure B1.5.8 Random distribution of (a) non-chiral adsorbates that gives rise to a surface having effective
∞m-symmetry; (b) chiral molecules that gives rise to effective ∞-symmetry. (c) SH intensity versus the angle of
an analyser for a racemic (squares) and a non-racemic (open circles) monolayer of chiral molecules. The
pump beam was p-polarized; the SH polarization angles of 0° and 90° correspond to s- and p-polarization,
respectively. (From [70].)
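
The displacement of the analyser curve can be illustrated with a minimal Python sketch. It assumes a linear
analyser that transmits the projection of the SH field onto its axis, with 0° corresponding to s- and 90° to
p-polarization, and it treats the s- and p-polarized field amplitudes as given numbers (real and in phase in the
example; in general they are complex). The function names and amplitudes are hypothetical.

```python
import math


def analyser_signal(gamma_deg, e_s, e_p):
    """SH intensity transmitted by an analyser set at gamma_deg.

    gamma_deg = 0 passes s-polarized light, gamma_deg = 90 passes p-polarized
    light; e_s and e_p are the (possibly complex) SH field amplitudes.
    """
    g = math.radians(gamma_deg)
    return abs(e_s * math.cos(g) + e_p * math.sin(g)) ** 2


def peak_angle(e_s, e_p, n=1800):
    """Analyser angle in [0, 180] degrees that maximizes the signal (grid search)."""
    angles = [180.0 * i / n for i in range(n + 1)]
    return max(angles, key=lambda a: analyser_signal(a, e_s, e_p))


if __name__ == "__main__":
    print("racemic (no s-component):  ", peak_angle(0.0, 1.0))
    print("non-racemic (s-component): ", peak_angle(0.2, 1.0))
    print("opposite enantiomer excess:", peak_angle(-0.2, 1.0))
```

With no s-polarized component the curve is centred at 90°. A finite chiral (s-polarized) contribution shifts the
maximum, and reversing its sign, as would be expected for the opposite enantiomeric excess, shifts it the other way.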

This effect of a change in the SH output polarization depending on the enantiomer or mixture of enantiomers is
somewhat analogous to the linear optical phenomenon of optical rotatory dispersion (ORD) in bulk chiral
liquids. As such, the process for SH radiation is termed SHG-ORD [70]. In general, chiral surfaces will also
exhibit distinct radiation characteristics for left- and right-circularly polarized pump beams. Again, by analogy with the
linear optical process of circular dichroism (CD), this effect has been termed SHG-CD [69].

B1.5.4.2 ADSORBATE COVERAGE

A quantity of interest in many studies of surfaces and interfaces is the concentration of adsorbed atomic or 
molecular species. The SHG/SFG technique has been found to be a useful probe of adsorbate density for a 
wide range of interfaces. The surface sensitivity afforded by the method is illustrated by the results of figure
B1.5.9 [72]. These data show the dramatic change in SH response from a clean surface of silicon upon
adsorption of a fraction of a monolayer of atomic hydrogen.

Figure B1.5.9 Dependence of the magnitude (full circles) and the phase (open squares) of the nonlinear
susceptibility $\chi^{(2)}_s$ of Si(111)-7×7 on the coverage of adsorbed atomic hydrogen for an excitation wavelength
of 1064 nm. (From [72].)

We now consider how one extracts quantitative information about the surface or interface adsorbate coverage 
from such SHG data. In many circumstances, it is possible to adopt a purely phenomenological approach: one 
calibrates the nonlinear response as a function of surface coverage in a preliminary set of experiments and 
then makes use of this calibration in subsequent investigations. Such an approach may, for example, be 
appropriate for studies of adsorption kinetics where the interest lies in the temporal evolution of the surface 
adsorbate density $N_s$.

For other purposes, obtaining a measure of the adsorbate surface density directly from the experiment is 
desirable. From this perspective, we introduce a simple model for the variation of the surface nonlinear 
susceptibility with adsorbate coverage. An approximation that has been found suitable for many systems is

$$\chi^{(2)}_s(N_s) = \chi^{(2)}_{s,0} + N_s \alpha^{(2)}.$$ (B1.5.43)

From a purely phenomenological perspective, this relationship describes a constant rate of change in the
nonlinear susceptibility of the surface with increasing adsorbate surface density $N_s$. Within a picture of
adsorbed molecules, $\alpha^{(2)}$ may be interpreted as the nonlinear polarizability of the adsorbed species. The
quantity $\chi^{(2)}_{s,0}$ represents the nonlinear response in the absence of the adsorbed species.

If we consider the optical response of a molecular monolayer of increasing surface density, the form of
equation B1.5.43 is justified in the limit of relatively low density where local-field interactions between the
adsorbed species may be neglected. It is difficult to produce any rule for the range of validity of this
approximation, as it depends strongly on the system under study, as well as on the desired level of accuracy
for the measurement. The relevant corrections, which may be viewed as analogous to the Clausius-Mossotti
corrections in linear optics, have been the subject of some discussion in the literature [41, 42]. In addition to
the local-field effects, the simple proportionality of variation in $\chi^{(2)}_s(N_s)$ with $N_s$ frequently breaks down for
reasons related to the physical and chemical nature of the surface or interface. In particular, inhomogeneous
surfaces in which differing binding sites fill in different proportions may give rise to a variation in
$\chi^{(2)}_s(N_s)$ that would, to leading order, vary with the relative populations of the different sites. Also,
inter-adsorbate interactions that lead to shifts in energy levels
or molecular orientation, as will be discussed later, will influence the nonlinear response in a manner beyond
that captured in equation B1.5.43.

Despite these caveats in the application of equation B1.5.43, one finds that it provides reasonable accuracy in
many experimental situations. The SH response for the H/Si system of figure B1.5.9, for example, is seen to
obey the simple linear variation of $\chi^{(2)}_s(N_s)$ with $N_s$ of equation B1.5.43 rather well up to an adsorbate
coverage of about 0.5 monolayers. These data are also interesting because they show how destructive
interference between the terms $\chi^{(2)}_{s,0}$ and $N_s \alpha^{(2)}$ can cause the SH signal to decrease with increasing $N_s$. In this
particular example, the physical interpretation of this effect is based on the strong nonlinear response of the
bare surface from the Si dangling bonds. With increasing hydrogen coverage, the concentration of dangling
bonds is reduced and the surface nonlinearity decreases. For a system where $\chi^{(2)}_{s,0}$ is relatively small and the
nonlinear response of the adsorbed species is significant, just the opposite trend for the variation of
$\chi^{(2)}_s(N_s)$ with $N_s$ would occur.
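
The linear coverage model and the interference it implies can be sketched in a few lines of Python. The sketch
assumes a susceptibility of the form chi0 + N_s * alpha, with a detected SH signal proportional to its squared
modulus. The numerical values are arbitrary assumptions (coverage in monolayers, susceptibilities in arbitrary
units) and are not taken from the H/Si data; the function names are hypothetical.

```python
def chi_surface(n_s, chi0, alpha):
    """Linear coverage model: chi_s(N_s) = chi0 + N_s * alpha (may be complex)."""
    return chi0 + n_s * alpha


def sh_signal(n_s, chi0, alpha):
    """Detected SH signal, proportional to |chi_s(N_s)|^2."""
    return abs(chi_surface(n_s, chi0, alpha)) ** 2


def coverage_from_chi(chi_measured, chi0, alpha):
    """Invert the linear model for N_s, given calibrated chi0 and alpha.

    The real part is returned; a sizeable imaginary part of the quotient
    would indicate that the linear model does not describe the data.
    """
    return ((chi_measured - chi0) / alpha).real


if __name__ == "__main__":
    # Adsorbate term out of phase with a strong bare-surface term:
    # the signal drops as the coverage increases.
    for n in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
        print(n, round(sh_signal(n, 1.0, -1.5), 4))
```

With a weak bare-surface term and an in-phase adsorbate term, for example chi0 = 0.1 and alpha = 1.5, the same
functions give a signal that rises with coverage, which is the opposite trend mentioned in the text.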

The applications of this simple measure of surface adsorbate coverage have been quite widespread and 
diverse. It has been possible, for example, to measure adsorption isotherms in many systems. From these 
measurements, one may obtain important information such as the adsorption free energy, $\Delta G^\circ = -RT \ln K$
[21]. One can also monitor the kinetics of adsorption and desorption to obtain rates. In conjunction with
temperature-dependent data, one may further infer activation energies and pre-exponential factors [73, 74].
Knowledge of such kinetic parameters is useful for technological applications, such as semiconductor growth 
and synthesis of chemical compounds [75]. Second-order nonlinear optics may also play a role in the 
investigation of physical kinetics, such as the rates and mechanisms of transport processes across interfaces 
[76]. 
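
For orientation, the conversion from an equilibrium constant to an adsorption free energy can be written as a
short Python function. This is only a sketch of the relation ΔG° = -RT ln K; it assumes that K has already been
made dimensionless with respect to a chosen standard state, and the function name and example values are
hypothetical.

```python
import math

R = 8.314462618  # molar gas constant in J mol^-1 K^-1


def adsorption_free_energy(k_eq, temperature):
    """Standard adsorption free energy in J/mol from Delta G = -R*T*ln(K).

    k_eq is a dimensionless equilibrium constant (standard state assumed
    to be fixed by the user); temperature is in kelvin.
    """
    return -R * temperature * math.log(k_eq)


if __name__ == "__main__":
    # Example with an arbitrary, illustrative equilibrium constant.
    print(adsorption_free_energy(1.0e4, 298.15) / 1000.0, "kJ/mol")
```

An equilibrium constant larger than one gives a negative free energy, i.e. favourable adsorption under the
assumed standard state.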

Before leaving this topic, we would like to touch on two related points. The first concerns the possibility of an 
absolute determination of the surface adsorbate density. Equation B1.5.43 would suggest that one might use
knowledge, either experimental or theoretical, of $\alpha^{(2)}$ and an experimental determination of $\chi^{(2)}_s(N_s)$ and
$\chi^{(2)}_{s,0}$ to infer $N_s$ in absolute terms. In practice, this is problematic. One experimental issue is that a correct
measurement of $\chi^{(2)}_s$ in absolute terms may be difficult. However, through appropriate comparison with the
response of a calibrated nonlinear reference material, we may usually accomplish this task. More problematic
is obtaining knowledge of $\alpha^{(2)}$ for the adsorbed species. The determination of $\alpha^{(2)}$ in the gas or liquid phase is
already difficult. In addition, the perturbation induced by the surface or interface is typically significant.
Moreover, as discussed later, molecular orientation is a critical factor in determining the surface nonlinear
response. For these reasons, absolute surface densities can generally be found from surface SHG/SFG
measurements only if we can calibrate the surface nonlinear response at two or more coverages determined by
other means. This situation, it should be noted, is not dissimilar to that encountered for many other common 
surface probes. 

The second issue concerns molecular specificity. For a simple measurement of SHG at an arbitrary laser 
frequency, one cannot expect to extract information on the behaviour of a system with several possible
adsorbed species. To make the technique appropriate for such cases, one needs to rely on spectroscopic
information. In the simplest implementation, one chooses a frequency for which the nonlinear response of the
species of interest is large or dominant. As will be discussed in section B1.5.4.4, this capability is significantly
enhanced with SFG and the selection of a frequency corresponding to a vibrational resonance.

B1.5.4.3 MOLECULAR ORIENTATION


The nonlinear response of an individual molecule depends on the orientation of the molecule with respect to 
the polarization of the applied and detected electric fields. The same situation prevails for an ensemble of 
molecules at an interface. It follows that we may garner information about molecular orientation at surfaces 
and interfaces by appropriate measurements of the polarization dependence of the nonlinear response, taken 
together with a model for the nonlinear response of the relevant molecule in a standard orientation. 

We now consider this issue in a more rigorous fashion. The inference of molecular orientation can be
explained most readily from the following relation between the surface nonlinear susceptibility tensor and the
molecular nonlinear polarizability $\alpha^{(2)}_{i'j'k'}$:

$$\chi^{(2)}_{s,ijk} = N_s \sum_{i'j'k'} \langle T_{ii'} T_{jj'} T_{kk'} \rangle \, \alpha^{(2)}_{i'j'k'}.$$ (B1.5.44)

Here the ijk coordinate system represents the laboratory reference frame; the primed coordinate system i'j'k'
corresponds to coordinates in the molecular system. The quantities $T_{ii'}$ are the matrices describing the
coordinate transformation between the molecular and laboratory systems. In this relationship, we have
neglected local-field effects and expressed $\chi^{(2)}_s$ in a form equivalent to summing the molecular response over
all the molecules in a unit surface area (with surface density $N_s$). (For simplicity, we have omitted any
contribution to $\chi^{(2)}_s$ not attributable to the dipolar response of the molecules. In many cases, however, it is
important to measure and account for the background nonlinear response not arising from the dipolar
contributions from the molecules of interest.) In equation B1.5.44, we allow for a distribution of molecular
orientations and have denoted by $\langle\ \rangle$ the corresponding ensemble average:

$$\langle T_{ii'} T_{jj'} T_{kk'} \rangle = \int T_{ii'} T_{jj'} T_{kk'} \, f(\theta, \varphi, \psi) \, d\Omega.$$ (B1.5.45)

Here $f(\theta, \varphi, \psi)$ is the probability distribution of finding a molecule oriented at $(\theta, \varphi, \psi)$ within an element $d\Omega$ of
solid angle with the molecular orientation defined in terms of the usual Euler angles (figure B1.5.10).

Figure B1.5.10 Euler angles and reference frames for the discussion of molecular orientation: laboratory
frame (x, y, z) and molecular frame (x', y', z').

Equation B1.5.44 indicates that if we know $\chi^{(2)}_{s,ijk}$ and $\alpha^{(2)}_{i'j'k'}$, we may infer information about the third-order
orientational moments $\langle T_{ii'} T_{jj'} T_{kk'} \rangle$. Since calibration of absolute magnitudes is difficult, we are generally
concerned with a comparison of the relative magnitudes of the appropriate molecular ($\alpha^{(2)}_{i'j'k'}$) and macroscopic
($\chi^{(2)}_{s,ijk}$) quantities. In practice, the complexity of the general relationship between $\alpha^{(2)}$ and $\chi^{(2)}_s$ means that
progress requires the introduction of certain simplifying assumptions. These usually follow from symmetry
considerations or, for the case of $\alpha^{(2)}$, from previous experimental or theoretical insight into the nature of the
expected molecular response.

The approach may be illustrated for molecules with a nonlinear polarizability $\alpha^{(2)}$ dominated by a single axial
component $\alpha^{(2)}_{z'z'z'}$, corresponding to a dominant nonlinear response from transitions along a particular
molecular axis. Let us further assume that all in-plane directions of the surface or interface are equivalent. This
would naturally be the case for a liquid or amorphous solid, but would not necessarily apply to the surface of a
crystal. One then obtains from equation B1.5.44 the following relations between the molecular quantities and
surface nonlinear susceptibility:

$$\chi^{(2)}_{s,\perp\perp\perp} = \chi^{(2)}_{s,zzz} = N_s \langle \cos^3\theta \rangle \, \alpha^{(2)}_{z'z'z'}$$ (B1.5.46)

$$\chi^{(2)}_{s,\perp\parallel\parallel} = \chi^{(2)}_{s,zxx} = \tfrac{1}{2} N_s \langle \cos\theta \sin^2\theta \rangle \, \alpha^{(2)}_{z'z'z'}$$ (B1.5.47)

$$\chi^{(2)}_{s,\parallel\perp\parallel} = \chi^{(2)}_{s,\parallel\parallel\perp} = \chi^{(2)}_{s,xzx} = \tfrac{1}{2} N_s \langle \cos\theta \sin^2\theta \rangle \, \alpha^{(2)}_{z'z'z'}.$$ (B1.5.48)

Notice that $\chi^{(2)}_{s,\perp\parallel\parallel} = \chi^{(2)}_{s,\parallel\perp\parallel} = \chi^{(2)}_{s,\parallel\parallel\perp}$, so that only two of the three nonlinear susceptibility tensor elements
allowed for an isotropic surface are independent. From equation B1.5.46, equation B1.5.47 and equation
B1.5.48, we may form the ratio

$$\frac{\langle \cos\theta \rangle}{\langle \cos^3\theta \rangle} = \frac{\chi^{(2)}_{s,zzz} + 2\chi^{(2)}_{s,zxx}}{\chi^{(2)}_{s,zzz}}.$$ (B1.5.49)

Thus, a well-defined measure of molecular orientation is inferred from the measurement of the macroscopic
quantities $\chi^{(2)}_{s,ijk}$. For the case of a narrow and isotropic distribution, i.e. $f(\theta) = \delta(\theta - \theta_0)$, the left-hand side
of equation B1.5.49 becomes $\cos\theta_0 / \cos^3\theta_0 = \sec^2\theta_0$, from which the mean orientation $\theta_0$ is directly obtained.
For a broad distribution, one may extract the mean orientation from such an expression for an assumed
functional form.
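
The narrow-distribution case can be sketched numerically. The Python example below assumes a single dominant
axial polarizability element and a delta-function distribution of tilt angles, so that the susceptibility ratio
(chi_zzz + 2 chi_zxx)/chi_zzz equals sec^2 of the tilt angle. It further assumes that the measured elements are
real and share a common phase. The function names and input values are hypothetical.

```python
import math


def chi_elements(theta0_deg, n_s=1.0, alpha_zzz=1.0):
    """Forward model for a delta-function tilt distribution.

    Returns (chi_zzz, chi_zxx) with chi_zzz = N*cos^3(theta)*alpha and
    chi_zxx = 0.5*N*cos(theta)*sin^2(theta)*alpha.
    """
    t = math.radians(theta0_deg)
    chi_zzz = n_s * alpha_zzz * math.cos(t) ** 3
    chi_zxx = 0.5 * n_s * alpha_zzz * math.cos(t) * math.sin(t) ** 2
    return chi_zzz, chi_zxx


def tilt_angle_deg(chi_zzz, chi_zxx):
    """Tilt angle from the ratio (chi_zzz + 2*chi_zxx)/chi_zzz = sec^2(theta0)."""
    ratio = (chi_zzz + 2.0 * chi_zxx) / chi_zzz
    if ratio < 1.0:
        raise ValueError("ratio below 1 is incompatible with the assumed model")
    return math.degrees(math.acos(1.0 / math.sqrt(ratio)))


if __name__ == "__main__":
    zzz, zxx = chi_elements(35.0)
    print(tilt_angle_deg(zzz, zxx))
```

Only the ratio of the two elements enters, so the surface density and the magnitude of the polarizability cancel,
which is why relative measurements suffice for this estimate.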

As an example, the model described earlier for a molecule having a dominant nonlinear polarizability element
$\alpha^{(2)}_{z'z'z'}$ has been applied to the determination of the molecular inclination between the molecular axis of a
surfactant molecule, sodium-dodecylnaphthalene-sulphonate (SDNS), and the surface normal at the air/water
interface [77]. This tilt angle $\theta$, shown in figure B1.5.11, was determined according to equation B1.5.46,
equation B1.5.47 and equation B1.5.48 under the assumption of a narrow orientational distribution. As the
figure shows, the mean molecular orientation changes with increasing surface pressure $\pi$ as the molecules are
forced into a more nearly vertical orientation.

Figure B1.5.11 Tilt angle between the molecular axis of sodium-dodecylnaphthalene-sulphonate (SDNS) and
the surface normal as a function of the surface pressure $\pi$ at the air/water interface. (From [77].)

In the literature, the interested reader may find treatment of molecules with differing, and more complex,
behaviour for $\alpha^{(2)}$ [14]. Also of importance is the role of SFG for orientational analysis. Surface SFG may
provide more orientational information, since there are additional independent elements of the surface
nonlinear susceptibility. For example, an isotropic surface is characterized by three independent elements for
SHG, but is described by four independent elements for SFG. The principal advantage of SFG over SHG,
however, lies in its molecular specificity. As will be discussed in section B1.5.4.4, we may enhance the
response of a molecular species of interest by choosing the appropriate infrared frequency for the SFG
process. This behaviour helps to eliminate background signal and is useful in more complex systems with two
or more molecular species present. Equally important, the excitation of a given vibration helps to define the
form, and reduce the generality, of the molecular response $\alpha^{(2)}$. Further, the method may be applied for
different vibrational resonances to deduce the orientation of different moieties of larger molecules [78].

B1.5.4.4 SPECTROSCOPY

The second-order nonlinear susceptibility describing a surface or interface, as indicated by the microscopic 
form of equation B1.5.30, is resonantly enhanced whenever an input or output photon energy matches a
transition energy in the material system. Thus, by scanning the frequency or frequencies involved in the 
surface nonlinear process, we may perform surface-specific spectroscopy. This method has been successfully 
applied to probe both electronic transitions and vibrational transitions at interfaces. 


For studying electronic transitions at surfaces and interfaces, both SHG and SFG have been employed in a
variety of systems. One particular example is that of the buried CaF$_2$/Si(111) interface [79]. Figure B1.5.12(a)
displays the experimental SH signal as a function of the photon energy of a tunable pump laser. An interface
resonance for this system is found to occur for a photon energy near 2.4 eV. This value is markedly different
from that of the energies of transitions in either of the bulk materials and clearly illustrates the capability of
nonlinear spectroscopy to probe distinct electronic excitations of the interfacial region. The sharp feature
appearing at 2.26 eV has been attributed to the formation of a two-dimensional exciton. It is important to
point out that the measurement of the SHG signal alone does not directly show whether an observed
resonance corresponds to a single- or a two-photon transition. To verify that the resonance enhancement does,
in fact, correspond to a transition energy of 2.4 eV, a separate SF measurement (figure B1.5.12(b)) was
performed. In this measurement, the tunable laser photon was mixed with another photon at a fixed photon
energy (1.17 eV). By comparing the two sets of data, one finds that the resonance must indeed lie at the
fundamental frequency of the tunable laser for this system.

Figure B1.5.12 SH and SF spectra (full dots) for the CaF$_2$/Si(111) interface: (a) SH intensity as a function of
the photon energy of the tunable laser; (b) SF intensity obtained by mixing the tunable laser with radiation at a
fixed photon energy of 1.17 eV. For comparison, the open circles in (a) are signals obtained for a native-oxide
covered Si(111). The full line is a fit to the theory as discussed in [79].

The SHG/SFG technique is not restricted to interface spectroscopy of the delocalized electronic states of
solids. It is also a powerful tool for spectroscopy of electronic transitions in molecules. Figure B1.5.13
presents such an example for a monolayer of the R-enantiomer of the molecule 2,2'-dihydroxyl-1,1'-binaphthyl,
(R)-BN, at the air/water interface [80]. The spectra reveal two-photon resonance features near
wavelengths of 332 and 340 nm that are assigned to the two lowest exciton-split transitions in the naphth-2-ol
monomer of BN. An increase in signal at higher photon energies is also seen as a resonance as the $B_b$ state of
the molecules is approached. The spectra in figure B1.5.13 have been obtained for differing polarization
configurations. The arrangements of p-in/p-out and s-in/p-out will yield SH signals for any isotropic surface.
In this case, however, signal is also observed for the p-in/s-out configuration. This response arises from the
$\chi^{(2)}_{s,xyz}$ element of the surface nonlinear response that is present because of the chiral character of the molecules
under study, as previously discussed in section B1.5.4.1.

Figure B1.5.13 Spectra of the various non-chiral [p-in/p-out (filled circles) and s-in/p-out (filled diamonds)]
and chiral [p-in/s-out (triangles)] SHG signals of (R)-BN molecules adsorbed at the air/water interface. (From
[80].)

In addition to probing electronic transitions, second-order nonlinear optics can be used to probe vibrational 
resonances. This capability is of obvious importance and value for identifying chemical species at interfaces 
and probing their local environment. In contrast to conventional spectroscopy of vibrational transitions, which 
can also be applied to surface problems [4], nonlinear optics provides intrinsic surface specificity and is of 
particular utility in problems where the same or similar vibrational transitions occur at the interface as in the 
bulk media. In order to access the infrared region corresponding to vibrational transitions while maintaining 
an easily detectable signal, Shen and coworkers developed the technique of infrared-visible sum-frequency
generation [81]. In this scheme, a tunable IR source is mixed on the surface with visible light at a
fixed frequency to produce readily detectable visible radiation. As the IR frequency is tuned through the
frequency of a vibrational transition, the SF signal is resonantly enhanced and the surface vibration spectrum
is recorded. In order to examine the IR-visible SFG process more closely, let us consider an appropriate
formula for the surface nonlinear susceptibility when the IR frequency $\omega_1 = \omega_{IR}$ is near a single vibrational
resonance and the visible frequency $\omega_2 = \omega_{vis}$ is not resonant with an electronic transition. We may then write
[81, 82]

$$\chi^{(2)}_{s,ijk} = \chi^{(2),NR}_{s,ijk} + \chi^{(2),R}_{s,ijk} = \chi^{(2),NR}_{s,ijk} + \frac{A_{l,ijk}}{\omega_{IR} - \omega_l + i\Gamma_l}$$ (B1.5.50)

where $\chi^{(2),NR}_{s,ijk}$ and $\chi^{(2),R}_{s,ijk}$ are the non-resonant and resonant contributions to the signal, respectively; $A_{l,ijk}$, $\omega_l$ and
$\Gamma_l$ are the strength, resonant frequency and linewidth of the lth vibrational mode. The quantity $A_{l,ijk}$ is proportional
to the product of the first derivatives of the molecular dipole moment $\mu_j$ and of the electronic polarizability $\alpha_{ik}$
with respect to the lth normal coordinate $Q_l$:

$$A_{l,ijk} \propto \frac{\partial \mu_j}{\partial Q_l} \, \frac{\partial \alpha_{ik}}{\partial Q_l}.$$

Consequently, in order for a vibrational mode to be observed in infrared-visible SFG, the molecule in its
adsorbed state has to be both IR [$(\partial \mu_j / \partial Q_l) \neq 0$] and Raman [$(\partial \alpha_{ik} / \partial Q_l) \neq 0$] active.


The form of equation B1.5.50 also allows us to make some remarks about the measured lineshapes in surface
nonlinear spectroscopy. From this point of view, we may regard equation B1.5.50 as being representative of
the surface nonlinear response typically encountered near any resonance: it has a strongly varying resonant
contribution together with a spectrally flat non-resonant background. The interesting aspect of this situation
arises from the fact that we generally detect the intensity of the SH or SF signal, which is proportional to
$|\chi^{(2)}_s|^2$. Consequently, interference between the resonant and non-resonant contributions is expected.
Depending on the relative phase difference between these terms, one may observe various experimental
spectra, as illustrated in figure B1.5.14. This type of behaviour, while potentially a source of confusion, is
familiar for other types of nonlinear spectroscopy, such as CARS (coherent anti-Stokes Raman scattering)
[30, 31] and can be readily incorporated into modelling of measured spectral features.



Figure B1.5.14 Possible lineshapes for an SFG resonance as a function of the infrared frequency $\omega_{IR}$. The
measured SFG signal is proportional to $|\chi_{NR} + A/(\omega_{IR} - \omega_0 + i\Gamma)|^2$. Assuming both $\chi_{NR}$ and $\Gamma$ are real and
positive, we obtain the lineshapes for various cases: (a) $\chi_{NR} \ll |A|/\Gamma$; (b) A is purely imaginary and negative
with $\chi_{NR} > |A|/\Gamma$; (c) A is real and positive; and (d) A is real and negative. Note the apparent blue and red
shifts of the peaks in cases (c) and (d), respectively.
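
The lineshape cases can be reproduced with a short Python sketch. It assumes a signal proportional to
|chi_nr + A/(w_ir - w0 + i*gamma)|^2 with a real, positive non-resonant term and linewidth, and locates the
maximum by a simple grid search. The resonance position, linewidth and amplitudes are arbitrary illustrative
values (in cm^-1 and arbitrary units), and the function names are hypothetical.

```python
def sfg_intensity(w_ir, chi_nr, amp, w0, gamma):
    """SFG signal |chi_nr + amp/(w_ir - w0 + i*gamma)|^2 (arbitrary units)."""
    return abs(chi_nr + amp / (w_ir - w0 + 1j * gamma)) ** 2


def peak_position(chi_nr, amp, w0=3000.0, gamma=10.0, span=100.0, n=4000):
    """IR frequency of maximum signal on a grid covering w0 +/- span."""
    grid = [w0 - span + 2.0 * span * i / n for i in range(n + 1)]
    return max(grid, key=lambda w: sfg_intensity(w, chi_nr, amp, w0, gamma))


if __name__ == "__main__":
    print("no background:      ", peak_position(0.0, 20.0))   # peak at w0
    print("A real and positive:", peak_position(1.0, 20.0))   # peak above w0
    print("A real and negative:", peak_position(1.0, -20.0))  # peak below w0
    # A purely imaginary and negative with chi_nr > |A|/gamma: a dip at w0.
    print("dip depth:", sfg_intensity(3000.0, 3.0, -20.0j, 3000.0, 10.0))
```

In this sketch a real, positive amplitude interferes constructively with the background on the high-frequency
side of the resonance, which moves the apparent peak to the blue; a real, negative amplitude moves it to the red.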


We now present one of the many examples of interfacial vibrational spectroscopy using SFG. Figure B 1.5. 15 
shows the surface vibrational spectrum of the water/air interface at a temperature of 40 °C [83], Notice that 

the spectrum exhibits peaks at 3680, 3400 and 3200 cm . These features arise from the OH stretching mode 
of water molecules in different environments. The highest frequency peak is assigned to free OH groups, the 
next peak to water molecules with hydrogen-bonding to neighbours in a relatively disordered structure and the 
lowest frequency peak to water molecules in a well-ordered tetrahedrally bonded (ice-like) structure. In 
addition to the analogy of these assignments to water molecules in different bulk environments, the 
assignments are compatible with the measured temperature dependence of the spectra. The strong and narrow
peak at 3680 cm⁻¹ provides interesting new information about the water surface. It indicates that a substantial
fraction of the surface water molecules have unbonded OH groups protruding from the surface of the water
into the vapour phase. This study exemplifies the unique capabilities of surface SFG, as there is no other
technique that could probe the liquid/vapour interface in the presence of strong features of the vibrational
modes of the bulk water molecules.

Figure B1.5.15 SFG spectrum for the water/air interface at 40 °C using the ssp polarization combination (s-,
s- and p-polarized sum-frequency signal, visible input and infrared input beams, respectively). The peaks
correspond to OH stretching modes. (After [83].)

An important consideration in spectroscopic measurements concerns the bandwidth of the laser sources. In
order to resolve the vibrational resonances in the conventional scheme, one needs a tunable source that has
a narrow bandwidth compared to the resonance being studied. For typical resolutions, this requirement
implies, by the uncertainty principle, that IR pulses of picosecond or longer duration must be used. On the
other hand, ultrafast pulsed IR sources with broad bandwidths are quite attractive
from the experimental standpoint. In order to make use of these sources, two types of new experimental
techniques have been introduced. One technique involves mixing the broadband IR source (~300 cm⁻¹) with a
narrowband visible input (~5 cm⁻¹). By spectrally resolving the SF output, we may then obtain resolution of
the IR spectrum limited only by the linewidth of the visible source [84, 85]. This result follows from the fact
that $\omega_{IR} = \omega_{SF} - \omega_{vis}$ must be satisfied for the SFG process. The second new approach involves the
application of a Fourier transform scheme [86]. This is accomplished by passing the IR pulses through an
interferometer and then mixing these pairs of pulses with visible radiation at the surface.
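
The connection between bandwidth and pulse duration can be made concrete with a short calculation. The sketch below is only an estimate: it assumes transform-limited Gaussian pulses, for which the product of the intensity FWHM in time and in frequency is about 0.441, and the function names and the example wavenumbers in the last lines are hypothetical.

```python
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s
GAUSSIAN_TBP = 0.441         # FWHM time-bandwidth product, transform-limited Gaussian


def min_pulse_duration_ps(bandwidth_cm1):
    """Shortest Gaussian pulse (FWHM, ps) compatible with a bandwidth in cm^-1."""
    bandwidth_hz = bandwidth_cm1 * C_CM_PER_S
    return GAUSSIAN_TBP / bandwidth_hz * 1e12


def ir_wavenumber_cm1(sf_cm1, vis_cm1):
    """IR wavenumber from energy conservation, omega_IR = omega_SF - omega_vis."""
    return sf_cm1 - vis_cm1


if __name__ == "__main__":
    # A 5 cm^-1 bandwidth needs pulses of a few picoseconds.
    print(f"5 cm^-1   -> {min_pulse_duration_ps(5.0):.2f} ps")
    # A 300 cm^-1 bandwidth is compatible with pulses of tens of femtoseconds.
    print(f"300 cm^-1 -> {min_pulse_duration_ps(300.0) * 1000:.0f} fs")
    # Each SF spectral component maps back to one IR frequency
    # (the wavenumbers here are hypothetical).
    print(ir_wavenumber_cm1(15500.0, 12500.0), "cm^-1")
```

Because a narrowband visible pulse fixes $\omega_{vis}$, every resolved component of the SF spectrum corresponds to a single IR frequency, which is why the visible linewidth sets the resolution in the broadband scheme.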

B1.5.4.5 DYNAMICS

Many of the fundamental physical and chemical processes at surfaces and interfaces occur on extremely fast 
time scales. For example, atomic and molecular motions take place on time scales as short as 100 fs, while 
surface electronic states may have lifetimes as short as 10 fs. With the dramatic recent advances in laser 
technology, however, such time scales have become increasingly accessible. Surface nonlinear optics provides 
an attractive approach to capture such events directly in the time domain. Some examples of application of the 
method include probing the dynamics of melting on the time scale of phonon vibrations [87], 
photoisomerization of molecules [88], molecular dynamics of adsorbates [89, 90], interfacial solvent
dynamics [91], transient band-flattening in semiconductors [92] and laser-induced desorption [93]. A review
article discussing such time-resolved studies in metals can be found in [94]. The SHG and SFG techniques are
also suitable for studying dynamical processes occurring on slower
time scales. Indeed, many valuable studies of adsorption, desorption, diffusion and other surface processes 
have been performed on time scales of milliseconds to seconds. 


In a typical time-resolved SHG (SFG) experiment using femtosecond to picosecond laser systems, two (three)
input laser beams are necessary. The pulse from one of the lasers, usually called the pump laser, induces the
reaction or surface modification. This defines the starting point ($t = 0$). A second pulse (or a second set of
synchronized pulses, for the case of SFG) delayed relative to the first pulse by a specified time $\Delta t$ is used to
probe the reaction as it evolves. By varying this time delay $\Delta t$, the temporal evolution of the reaction can be
followed. In order to preserve the inherent time resolution of an ultrafast laser, the relevant pulses are 
generally derived from a common source. For instance, in a basic time-resolved SHG experiment, where both 
the pump and probe pulses are of the same frequency, one simply divides the laser beam into two sets of 
pulses with a beam splitter. One of these pulses travels a fixed distance to the sample, while the other passes 
through a variable delay line to the sample. This approach provides a means of timing with sub-femtosecond 
accuracy, if desired. In some cases, at least one of the input beams has a different frequency from the others. 
Such pulses can be produced through processes such as harmonic generation or optical parametric generation 
from the main laser pulse. 
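
The timing accuracy of an optical delay line follows directly from the speed of light. The following sketch assumes a retroreflector on a translation stage, so that the beam traverses the displacement twice, and propagation at the vacuum speed of light; the function names and the `passes` parameter are illustrative assumptions.

```python
C_M_PER_S = 299792458.0  # speed of light in vacuum, m/s


def delay_fs(stage_displacement_um, passes=2):
    """Time delay (fs) produced by moving the delay stage by a given distance.

    passes=2 assumes a retroreflector geometry, in which the optical path
    changes by twice the stage displacement.
    """
    path_m = passes * stage_displacement_um * 1e-6
    return path_m / C_M_PER_S * 1e15


def displacement_um_for_delay(delay, passes=2):
    """Stage displacement (micrometres) needed for a given delay in fs."""
    return delay * 1e-15 * C_M_PER_S / passes * 1e6


if __name__ == "__main__":
    print(f"1 um step    -> {delay_fs(1.0):.2f} fs")
    print(f"0.1 um step  -> {delay_fs(0.1):.2f} fs")
    print(f"100 fs delay -> {displacement_um_for_delay(100.0):.2f} um")
```

A stage step of 0.1 µm thus corresponds to a delay below 1 fs in this geometry, which is consistent with the sub-femtosecond timing accuracy mentioned above.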

As an example of this class of experiment, we consider an experimental study of the dynamics of molecular 
orientational relaxation at the air/water interface [90]. Such investigations are of interest as a gauge of the 
local environment at the surface of water. The measurements were performed with time-resolved SHG using 
Coumarin 314 dye molecules as the probe. In order to examine orientational motion, an anisotropic 
orientational distribution of molecules must first be produced. This is accomplished through a photoselection 
process in which the interface is irradiated by a linearly polarized laser pulse that is resonant with an 
electronic transition in the dye molecules. Those molecules that are oriented with their transition dipole 
moments parallel to the polarization of the pump beam are preferentially excited, producing an orientational 
anisotropy in the ground- and excited-state population. Subsequently, these anisotropic orientational 
distributions relax to the equilibrium configuration. The time evolution of the rotational anisotropy was 
followed by detecting the SH of a probe laser pulse as a function of the delay time, as shown in figure
B1.5.16. Through a comparison of the results for different initial anisotropic distributions (produced by two
orthogonal linearly-polarized pump beams, as shown in the figure, as well as by circularly-polarized pump 
radiation), one may deduce rates for both in-plane and out-of-plane orientational relaxation. The study yielded 
the interesting result that the orientational relaxation times at the liquid/vapour interface significantly 
exceeded those for the Coumarin molecules in the bulk of water. This finding was interpreted as reflecting the 
increased friction encountered in the surface region where the water molecules are more highly ordered than 
in the bulk liquid. 

Figure B1.5.16 Rotational relaxation of Coumarin 314 molecules at the air/water interface. The change in the
SH signal is recorded as a function of the time delay between the pump and probe pulses. Anisotropy in the 
orientational distribution is created by linearly polarized pump radiation in two orthogonal directions in the 
surface. (After [90].) 


B1.5.4.6 SPATIAL RESOLUTION


Another application of surface SHG or SFG involves the exploitation of the lateral resolution afforded by 
these optical processes. While the dimension of the optical wavelength obviously precludes direct access to 
the length scale of atoms and molecules, one can examine the micrometre and submicrometre length scale that 
is important in many surface and interface processes. Spatial resolution may be achieved simply by detecting 
the nonlinear response with a focused laser beam that is scanned across the surface [95]. Alternatively, one 
may illuminate a large area of the surface and image the emitted nonlinear radiation [96]. Applications of this 
imaging capability have included probing of magnetic domains [97] and spatially varying electric fields [98].
The application of near-field techniques may permit, as in linear optics, the attainment of spatial resolution
below the diffraction limit. In a recent work [99], submicrometre spatial resolution was indeed reported by 
collecting the emitted SH radiation for excitation with a near-field fibre probe. 

Diffraction measurements offer a complementary approach to the real-space imaging described earlier. In 
such schemes, periodically modulated surfaces are utilized to produce well-defined SH (or SF) radiation at 
discrete angles, as dictated by the conservation of the in-plane component of the wavevector. As an example 
of this approach, a grating in the surface adsorbate density may be produced through laser-induced desorption 
in the field of two interfering beams. This monolayer grating will readily produce diffracted SH beams in 
addition to the usual reflected beam. In addition to their intrinsic interest, such structures have permitted 
precise measurements of surface diffusion. One may accomplish this by observing the temporal evolution of 
SH diffraction efficiency, which falls as surface diffusion causes the modulation depth of the adsorbate 
grating to decrease. This technique has been applied to examine diffusion of adsorbates on the surface of 
metals [100] and semiconductors [101].
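
Both the diffraction geometry and the diffusion measurement lend themselves to a short numerical illustration. The sketch below rests on stated assumptions: the beams propagate in air or vacuum, so that conservation of the in-plane wavevector gives $\sin\theta_n = \sin\theta_i + n\lambda_{SH}/\Lambda$ for a grating of period $\Lambda$; the coverage grating is sinusoidal and relaxes by Fickian diffusion with a constant coefficient $D$; and the diffracted SH field is taken to be linear in the modulation depth, so that the first-order intensity decays as $\exp(-8\pi^2 D t/\Lambda^2)$. All names and numerical values are hypothetical.

```python
import math


def sh_diffraction_angle_deg(order, incidence_deg, sh_wavelength_um, period_um):
    """Angle of the n-th order SH beam from in-plane wavevector conservation.

    Returns None when the order cannot propagate (|sin(theta)| > 1).
    """
    s = math.sin(math.radians(incidence_deg)) + order * sh_wavelength_um / period_um
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


def first_order_decay_time_s(diffusion_cm2_per_s, period_um):
    """1/e decay time of the first-order SH intensity.

    Assumes the modulation depth decays as exp(-D * (2*pi/period)**2 * t)
    and that the diffracted intensity scales as the modulation depth squared.
    """
    period_cm = period_um * 1e-4
    return period_cm ** 2 / (8.0 * math.pi ** 2 * diffusion_cm2_per_s)


if __name__ == "__main__":
    for n in (-1, 0, 1):
        print(n, sh_diffraction_angle_deg(n, 45.0, 0.532, 5.0))
    print("decay time (s):", first_order_decay_time_s(1e-8, 5.0))
```

In such a picture, measuring the decay time for a known grating period yields the diffusion coefficient, and the quadratic dependence on $\Lambda$ offers a consistency check.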

B1.5.4.7 ELECTRIC AND MAGNETIC FIELD PERTURBATION

Probing electric and magnetic fields, and the effects induced by them, is of obvious interest in many areas of 
science and technology. We considered earlier the influence of such perturbations in a general fashion in 
section B1.5.2.2. Here we describe some related experimental measurements and applications. Electric fields
act to break inversion symmetry and thus may yield bulk SHG and SFG signals from centrosymmetric media
surrounding the interface, in addition to
any field-dependent contribution of the interface itself. The high sensitivity of SHG toward applied electric
fields was first demonstrated in calcite by Terhune et al as early as 1962 [102]. Subsequent investigations of
EFISH from centrosymmetric media have involved semiconductor/electrolyte and metal/electrolyte interfaces
[103, 104 and 105], as well as metal-oxide-semiconductor interfaces [106, 107].

The generality of the EFISH process has led to a variety of applications. These include probing the surface 
potential at interfaces involving liquids. Such measurements rely on the fact that the EFISH field is 
proportional to the voltage drop across the polarized layer provided, as is generally the case, that this region is 
thin compared to the scale of an optical wavelength [108]. This effect also serves as a basis for probing
surface reactions involving changes of charge state, such as acid/base equilibria [109]. Another related set of
applications involves probing electric fields in semiconductors, notably in the centrosymmetric material silicon
[98, 110, 111]. These studies have demonstrated the capability for spatial resolution, vector analysis of the
electric field and, significantly, ultrafast (subpicosecond) time resolution. These capabilities of SHG 
complement other optical schemes, such as electro-optical and photoconductive sampling, for probing the 
dynamics of electric fields on very fast time scales. 

The influence of an applied magnetic field, as introduced in section B1.5.2.2, is quite different from that of an
applied electric field. A magnetic field may perturb the interfacial nonlinear response (and that of the weak 
bulk terms), but it does not lead to any dipole-allowed bulk nonlinear response. Thus, in the presence of 
magnetic fields and magnetization, SHG remains a probe that is highly specific to surfaces and interfaces. It
may be viewed as the interface-sensitive analogue of linear magneto-optical effects. The first demonstration
of the influence of magnetization on SHG was performed on an Fe(110) surface [112]. Subsequent
applications have included examination of other materials for which both the bulk and surface exhibit 
magnetization. For these systems, surface specificity is of key importance. In addition, the technique has been 
applied to examine buried magnetic interfaces [113]. Excellent review articles on this subject matter are
presented in [23] and [114].

B1.5.4.8 RECENT DEVELOPMENTS

Up to this point, our discussion of surface SHG and SFG has implicitly assumed that we are examining a 
smooth planar surface. This type of interface leads to well-defined and highly collimated transmitted and 
reflected beams. On the other hand, many material systems of interest in probing surfaces or interfaces are not 
planar in character. From the point of view of symmetry, the surface sensitivity for interfaces of 
centrosymmetric media should apply equally well for such non-planar interfaces, although the nature of the 
electromagnetic wave propagation may be modified to a significant degree. One case of particular interest 
concerns appropriately roughened surfaces of noble metals. These were shown as early as 1974 [115] to give
rise to strong enhancements in Raman scattering of adsorbed species and led to extensive investigation of the
phenomenon of surface-enhanced Raman scattering or SERS [5]. Significant enhancements in the SHG
signals from such surfaces have also been found [116]. The resulting SH radiation is diffuse, but has been
shown to preserve a high degree of surface sensitivity. Carrying this progression from planar surfaces one step 
further, researchers have recently demonstrated the possibility of probing the surfaces of small particles by 
SHG. 

Experimental investigations of the model system of dye molecules adsorbed onto surfaces of polystyrene 
spheres have firmly established the sensitivity and surface specificity of the SHG method even for particles of 
micrometre size [117]. The surface sensitivity of the SHG process has been exploited for probing molecular
transport across the bilayer in liposomes [118], for measurement of electrostatic potentials at the surface of
small particles [119] and for imaging membranes in living cells [120]. The corresponding theoretical
description of SHG from the surfaces of small
spheres has been examined recently using the type of formalism presented earlier in this chapter [121]. Within
this framework, the leading-order contributions to the SH radiation arise from the non-local excitation of the 
dipole and the local excitation of the quadrupole moments. This situation stands in contrast to linear optical 
(Rayleigh) scattering, which arises from the local excitation of the dipole moment. 


B1.5.5 CONCLUSION 

In this brief chapter, we have attempted to describe some of the underlying principles of second-order 
nonlinear optics for the study of surfaces and interfaces. The fact that the technique relies on a basic symmetry 
consideration to obtain surface specificity gives the method a high degree of generality. As a consequence, 
our review of some of the applications of the method has necessarily been quite incomplete. Still, we hope 
that the reader will gain some appreciation for the flexibility and power of the method. Over the last few 
years, many noteworthy applications of the method have been demonstrated. Further advances may be 
anticipated from on-going development of the microscopic theory, as well as from adaptation of the 
macroscopic theory to new experimental conditions and geometries. At the same time, we see continual 
progress in the range and ease of use of the technique afforded by the impressive improvement of the 
performance and reliability of high-power laser sources. 

B1.6 Electron-impact spectroscopy


John H Moore 


B1.6.0 INTRODUCTION

When one speaks of a 'spectrum', the dispersed array of colours from a luminous body comes to mind; 
however, in the most general sense, a spectrum is a record of the energy and probability of transitions between 
states of a substance. In electron spectroscopy the 'spectrum' takes the form of the energy distribution of 
electrons emanating from a sample. Electron spectroscopies are classified according to the phenomena giving 
rise to these electrons; historically, each technique has acquired an acronym until today one finds a veritable 
alphabet soup of electron spectroscopies in the scientific literature. For example, PES refers to photoelectron 
spectroscopy, a technique in which the detected electrons are emitted after the absorption of a photon induces 
transitions into the continuum beyond the first ionization potential of the sample. Electron-impact 
spectroscopies, the subject of this entry, entail the excitation of a transition by an electron impinging upon a 
sample with the subsequent measurement of the energy of the scattered electron. The spectrum is the 
scattered-electron intensity as a function of the difference between the incident- and scattered-electron 
energies — the energy loss. 

The technologies of the various electron spectroscopies are similar in many ways. The techniques for 
measuring electron energies and the devices used to detect electrons are the same. All electron spectrometers
must be housed in an evacuated container at pressures less than about 10⁻⁶ mbar since electrons cannot be
transported through the atmosphere. Stray fields that perturb electron trajectories are a potential problem. To 
function correctly, an electron spectrometer must be shielded from the earth's magnetic field. 


Electron-impact energy-loss spectroscopy (EELS) differs from other electron spectroscopies in that it is 
possible to observe transitions to states below the first ionization edge; electronic transitions to excited states 
of the neutral, vibrational and even rotational transitions can be observed. This is a consequence of the 
detected electrons not originating in the sample. Conversely, there is a problem when electron impact induces 
an ionizing transition. For each such event there are two outgoing electrons. To precisely account for the 
energy deposited in the target, the two electrons must be measured in coincidence. 

In comparison to optical spectroscopy, electron-impact spectroscopy offers a number of advantages. Some of 
these are purely technological while others are a result of physical differences in the excitation mechanism. 
The energy of an electron can be varied simply and smoothly by scanning the voltage applied to, and hence 
the potential difference between, electrodes in the spectrometer. The same technology is applicable to 
electrons with energies, and energy losses, in the millielectronvolt (meV) range as in the kiloelectronvolt 
(keV) range. At least in principle, measurements analogous to IR spectroscopy can be carried out in the same 
instrument as measurements akin to x-ray spectroscopy. Unlike an optical instrument, the source intensity and 
the transmission of an electron spectrometer are nearly independent of energy, making the electron instrument 
more suitable for absolute intensity measurements; however, an electron spectrometer cannot always provide 
resolution comparable to that of an optical instrument. Studies of rotational and vibrational excitation, 
particularly in surface adsorbates, are routinely carried out by electron spectroscopy with a resolution of 2 to 5
meV (16 to 40 cm⁻¹). A resolution of 5 to 30 meV is typically obtained for
electron-impact excitation of valence-electron transitions in atoms and molecules in the gas phase. Analogous
studies by vis-UV spectroscopy easily provide 1 cm⁻¹ resolution. For inner-shell-electron excitation, electron
spectroscopy provides resolution comparable or superior to x-ray spectroscopy with discrete-line-source x-ray 
tubes. X-ray synchrotron sources now becoming available will provide better resolution and intensity than can 
be achieved with an electron spectrometer, but it must be borne in mind that an electron spectrometer is a 
relatively inexpensive, table-top device, whereas a synchrotron is a remote, multimillion-dollar facility. In 
many applications, electron-scattering experiments are more sensitive than optical experiments. This is in part 
due to the superior sensitivity of electron detectors. An electron multiplier has essentially unit efficiency 
(100%); a photomultiplier or photodiode may have an efficiency of only a few per cent. For surface analysis, 
electron spectroscopies have a special advantage over optical techniques owing to the short range of electrons 
in solids. 
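
The resolution figures quoted above mix electron-energy units (meV) with optical units (cm⁻¹). As a small worked example, the following sketch converts between the two using 1 eV ≈ 8065.544 cm⁻¹; the function names are illustrative.

```python
CM1_PER_EV = 8065.544  # wavenumber equivalent of 1 eV, in cm^-1 (rounded)


def mev_to_cm1(energy_mev):
    """Convert an energy in meV to the equivalent wavenumber in cm^-1."""
    return energy_mev * CM1_PER_EV / 1000.0


def cm1_to_mev(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to the equivalent energy in meV."""
    return wavenumber_cm1 * 1000.0 / CM1_PER_EV


if __name__ == "__main__":
    for mev in (2.0, 5.0, 30.0):
        print(f"{mev:5.1f} meV = {mev_to_cm1(mev):6.1f} cm^-1")
    print(f"1 cm^-1 = {cm1_to_mev(1.0):.3f} meV")
```

The conversion reproduces the correspondence of 2 to 5 meV with roughly 16 to 40 cm⁻¹, and shows that the 1 cm⁻¹ resolution of optical spectroscopy corresponds to about 0.12 meV.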

The mechanism by which a transition is induced by electron impact depends on the nature of the coupling 
between the projectile electron and the target; this in turn is influenced by the velocity and closeness of 
approach of the projectile to the target. There is a wide range of possibilities. A high-energy projectile 
electron may pass quickly by, delivering only a photon-like electric-field pulse to the target at the instant of 
closest approach. Less probable are hard, billiard-ball-like collisions between the projectile and one target 
electron. At low energies, slower, more intimate collisions are characterized by many-electron interactions. 
Depending upon the mechanism, the momentum transferred from projectile to target can vary from the 
minimum necessary to account for the transition energy to many times more. The interaction influences the 
type of transition that can be induced and the way in which the projectile is scattered. It is even possible for 
the projectile electron to be exchanged for a target electron, thus allowing for electron-spin-changing 
transitions. This state of affairs stands in contrast to optical excitation where the momentum transfer is a constant
and only 'dipole-allowed' transitions occur with significant probability. 


B1.6.1 TECHNOLOGY 


B1.6.1.1 CROSS SECTION AND SIGNAL INTENSITY

The quantities to be measured in electron-impact spectroscopy are the probability of an electron impact's 
inducing a transition and the corresponding transition energy. The energy for the transition is taken from the 
kinetic energy of the projectile electron. Unlike the situation in optical spectroscopy, the exciting particle is 
not annihilated, but is scattered from the target at some angle to its initial direction. The scattering angle is a 
measure of the momentum transferred to the target, and, as such, is also an important variable. 

The probability of a collision between an electron and a target depends upon the impact parameter, $b$, which
is the perpendicular distance between the line of travel of the electron and the centre of force exerted by the
target on the electron. The impact parameter is equivalent to the distance of closest approach if no potential is
present between the electron and the target. For a hard-sphere collision between an infinitesimally small
projectile and a target of radius $r$, the impact parameter must be less than $r$ for a collision to occur, and, from
simple geometry, the scattering angle $\theta = 2 \arccos(b/r)$. The scattering angle is large for small impact
parameters, while for 'grazing' collisions, where $b$ approaches $r$, the scattering angle is small. The probability
of a collision is proportional to the cross sectional area of the target, $\pi r^2$. Real collisions between an electron
and an atom involve at least central-force-field potentials and, frequently, higher-multipole potentials, but the
billiard-ball scattering model is so pervasive that collision probabilities are almost always expressed as cross
sections, often denoted by the symbol $\sigma$ with units of the order of the cross
section of an atom, such as 10⁻¹⁶ cm² or square ångströms (Å²). The atomic unit of cross section is the Bohr
radius squared ($r_0^2 = 0.28 \times 10^{-16}$ cm²).
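
The hard-sphere relations can be checked with a few lines of code. The sketch below evaluates the scattering angle $\theta = 2\arccos(b/r)$ and the geometric cross section $\pi r^2$, and confirms the value of the Bohr radius squared in cm². The function names are illustrative, and the Bohr radius is entered as the rounded value 0.529177 Å.

```python
import math

BOHR_RADIUS_CM = 0.529177e-8  # Bohr radius in cm (rounded)


def hard_sphere_scattering_angle(b, r):
    """Scattering angle (radians) of a point projectile from a hard sphere.

    Returns 0.0 when the impact parameter b is not smaller than the
    sphere radius r, i.e. when no collision occurs.
    """
    if b >= r:
        return 0.0
    return 2.0 * math.acos(b / r)


def geometric_cross_section(r):
    """Cross-sectional area pi * r**2, in the square of the units of r."""
    return math.pi * r * r


if __name__ == "__main__":
    for b in (0.0, 0.5, 0.9, 1.0):
        theta = math.degrees(hard_sphere_scattering_angle(b, 1.0))
        print(f"b/r = {b:.1f} -> theta = {theta:6.1f} deg")
    print(f"Bohr radius squared = {BOHR_RADIUS_CM ** 2:.3e} cm^2")
```

A head-on collision ($b = 0$) gives backscattering through 180°, whereas the angle falls continuously to zero as $b$ approaches $r$.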

In a very simple form of electron spectroscopy, known as electron transmission spectroscopy, the attenuation
of an essentially monoenergetic beam of electrons is measured after passage through a sample. If the target is
very thin or of such low density that most electrons pass through unscattered, the attenuation is small and the
transmitted current, $I$ (in units of electrons per unit time, s⁻¹), compared to the incident current, $I_0$ (s⁻¹), is
given by

$$I = I_0 \exp(-n \sigma \ell)$$

where $n$ is the number density of particles in the target (typically in units of cm⁻³), $\ell$ (cm) is the thickness of
the target and $\sigma$ is the total electron scattering cross section. The cross section depends upon the energy, $E_0$,
of the incident electrons: $\sigma = \sigma(E_0)$. The total electron scattering spectrum presented as $I/I_0$ as a function of
$E_0$ bears an inverse relation to the cross section, the transmitted current decreasing as the cross section
increases.
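
As a worked example of a transmission measurement, the sketch below assumes exponential attenuation, $I/I_0 = \exp(-n\sigma\ell)$, and an ideal-gas target whose number density follows from its pressure and temperature. The function names and the example values (a gas at 10⁻³ mbar and 300 K, a cross section of 10⁻¹⁵ cm² and a 1 cm path) are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def number_density_cm3(pressure_mbar, temperature_k):
    """Ideal-gas number density in cm^-3 from pressure (mbar) and temperature (K)."""
    pressure_pa = pressure_mbar * 100.0
    per_m3 = pressure_pa / (BOLTZMANN_J_PER_K * temperature_k)
    return per_m3 * 1e-6


def transmitted_fraction(n_cm3, sigma_cm2, length_cm):
    """Fraction I/I0 of the beam transmitted without scattering."""
    return math.exp(-n_cm3 * sigma_cm2 * length_cm)


def cross_section_from_transmission(fraction, n_cm3, length_cm):
    """Total cross section (cm^2) recovered from a measured I/I0."""
    return -math.log(fraction) / (n_cm3 * length_cm)


if __name__ == "__main__":
    n = number_density_cm3(1e-3, 300.0)
    t = transmitted_fraction(n, 1e-15, 1.0)
    print(f"n = {n:.3e} cm^-3, I/I0 = {t:.4f}")
    print(f"recovered sigma = {cross_section_from_transmission(t, n, 1.0):.2e} cm^2")
```

Under these conditions only a few per cent of the beam is scattered, which is the thin-target regime described in the text.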

An electron-energy-loss spectrometer consists of an 'electron gun' that directs a collimated beam of electrons
upon a sample, and an 'analyser' that collects electrons scattered in a particular direction (specified by $\theta$ and $\phi$
in spherical coordinates) and transmits to a detector those electrons with energy $E$. The electron-energy-loss
spectrum is a plot of the scattered-electron current, $I_s$, arriving at the detector as a function of the energy loss,
$(E_0 - E)$. The cross section for the inelastic scattering process giving rise to the observed signal depends upon
the scattering angle: it is a differential cross section, $d\sigma/d\Omega$, where $d\Omega = \sin\theta\, d\theta\, d\phi$ in spherical coordinates.
The magnitude of the cross section depends upon the incident electron energy: there will be a threshold below
which the cross section is zero when the incident electron has insufficient energy to excite the transition, and
there will be an incident electron energy for which the coupling between projectile and target is greatest and
the cross section passes through a maximum. The instrument parameters, as well as the cross section,
determine the actual signal level:

$$I_s = I_0 n \ell \frac{d\sigma}{d\Omega} \Delta\Omega$$

where $\Delta\Omega$ is the solid-angle field of view of the scattered-electron analyser.

The foregoing description of an electron energy-loss spectrometer assumes a monoenergetic incident electron
beam and the excitation of a transition of negligible energy width. It is often the case that the transition
intensity spans a range in energy, and, in addition, the incident beam has some energy spread and the analyser
a finite bandpass. One must consider a cross section differential in both angle and energy, $d^2\sigma/d\Omega\,dE$. The
signal intensity is then

$$I_s = I_0 n \ell \frac{d^2\sigma}{d\Omega\,dE} \Delta\Omega\,\Delta E$$

where $\Delta E$ represents a convolution of the energy spread of the source and the passband of the analyser.
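
An order-of-magnitude estimate of the detected signal can be sketched as follows. The sketch assumes the thin-target form in which the scattered-electron rate is the product of the incident rate, the target column density $n\ell$, the differential cross section and the analyser solid angle, and it models the analyser acceptance as a cone of given half-angle, for which $\Delta\Omega = 2\pi(1 - \cos\alpha)$. The function names and all numerical values are hypothetical.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19


def cone_solid_angle_sr(half_angle_deg):
    """Solid angle (sr) of a cone with the given half-angle."""
    return 2.0 * math.pi * (1.0 - math.cos(math.radians(half_angle_deg)))


def scattered_rate_per_s(beam_current_a, n_cm3, length_cm,
                         dcs_cm2_per_sr, solid_angle_sr):
    """Scattered-electron rate (s^-1) reaching the analyser, thin-target limit."""
    incident_rate = beam_current_a / ELEMENTARY_CHARGE_C  # electrons per second
    return incident_rate * n_cm3 * length_cm * dcs_cm2_per_sr * solid_angle_sr


if __name__ == "__main__":
    d_omega = cone_solid_angle_sr(2.0)
    # Hypothetical conditions: 1 nA beam, 1e13 cm^-3 target, 1 mm path,
    # differential cross section of 1e-17 cm^2/sr.
    rate = scattered_rate_per_s(1e-9, 1e13, 0.1, 1e-17, d_omega)
    print(f"solid angle = {d_omega:.3e} sr, signal = {rate:.0f} electrons/s")
```

For a transition of finite width, the same estimate applies with the doubly differential cross section multiplied by the effective energy window $\Delta E$.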


B1.6.1.2 ELECTRON OPTICS 

The two essential elements of an electron spectrometer are the electrodes that accelerate electrons and focus 
them into a beam and the dispersive elements that sort electrons according to their energies. These serve the 
functions of lenses and prisms in an optical spectrometer. The same parameters are used to describe these 
elements in an electron spectrometer as in an optical spectrometer; the technology is referred to as electron 
optics. 

(A) ELECTRON LENSES 

The typical electron-optical lens consists of a closely spaced pair of coaxial cylindrical tubes biased at 
different electrical potentials. The equipotential surfaces in the gap between the tubes assume shapes similar 
to those of optical lenses as illustrated in figure B1.6.1. An electron passing across these surfaces will be
accelerated or decelerated, and its path will be curved to produce focusing. The main difference between the 
electron lens and an optical lens is that the quantity analogous to the refractive index, namely the electron 
velocity, varies continuously across an electrostatic lens, whereas a discontinuous change of refractive index 
occurs at the surface of an optical lens. Electron lenses are 'thick' lenses, meaning that their axial dimensions 
are comparable to their focal lengths. An important consequence is that the principal planes employed in ray 
tracing are separated from the midplane of the lens and lie to the low-velocity side of the lens gap, as shown
in the figure. The design of these lenses is facilitated by tables of electron lens optical properties [1], and by 
computer programs that calculate the potential array for an arbitrary arrangement of electrodes, and trace 
electron trajectories through the resultant field [2]. In addition to cylindrical electrodes, electron lenses are 
sometimes created by closely spaced planar electrodes with circular apertures or slits. Shaped magnetic fields 
are also used to focus electrons, especially for electron energies much in excess of 10 keV where electrostatic 
focusing requires inconveniently high voltages. 



Figure B1.6.1 Equipotential surfaces have the shape of lenses in the field between two cylinders biased at
different voltages. The focusing properties of the electron optical lens are specified by focal points located at
focal lengths $f_1$ and $f_2$, measured relative to the principal planes, $H_1$ and $H_2$. The two principal rays emanating
from an object on the left and focused to an image on the right are shown.

(B) ELECTRON ANALYSERS 

An electron 'prism', known as an analyser or monochromator, is created by the field between the plates of a 
capacitor. The plates may be planar, simply curved, spherical, or toroidal as shown in figure B1.6.2. The
trajectory of an electron entering the gap between the plates is curved as the electron is attracted to the
positively biased (inner) plate and
repelled by the negatively biased (outer) plate. The curvature of the trajectory is a function of the electron's
kinetic energy so that the electrons in a beam projected between the plates are dispersed in energy. These 
devices are not only dispersive, but focusing; electrons of the same energy originating from a point source are 
brought to a point on a focal plane at the output side of the analyser. The energy passband, or resolution, of an 
electrostatic analyser is the range of energies, $\Delta E$, of electrons which, entering through a slit, are transmitted
through the analyser to an exit slit. This quantity depends upon the width of the slits as well as the physical
dimensions of the analyser. Fixing the slit widths and analyser dimension fixes the relative resolution, $\Delta E/E$,
where $E$ is the nominal energy of transmitted electrons. The resolving power of an analyser is specified as
$E/\Delta E$, the inverse of the relative resolution. For each type of analyser there is a simple relation between
analyser dimensions and resolution; for example, the analyser with hemispherical plates has relative
resolution $\Delta E/E = w/2R$, where $w$ is the diameter of the entrance and exit apertures and $R$ is the mean radius of
the plates. 



Figure B1.6.2 Electron analysers consisting of a pair of capacitor plates of various configurations: (a) the
parallel-plate analyser, (b) the 127° cylindrical analyser and (c) the 180° spherical analyser. Trajectories for 
electrons of different energies are shown. 

For an analyser of fixed dimensions, the absolute resolution, $\Delta E$, can be improved, that is, made smaller, by
reducing the pass energy, $E$. This is accomplished by decelerating the electrons to be analysed with a
decelerating lens system at the input to the analyser. By this means, an absolute resolution as small as 2 meV
has been achieved. For most practical analysers, $\Delta E/E$ is of the order of 0.01, and the pass energy in the
highest-resolution spectrometers is of the order of 1 eV. The transmission of electrons of energy less than
about 1 eV is generally not practical since unavoidable stray electric and magnetic fields produce
unpredictable deflection of electrons of lower energies; even spectrometers of
modest resolution require magnetic shielding to reduce the magnetic field of the earth by two to three orders
of magnitude.
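
The relation between analyser geometry, pass energy and resolution can be illustrated with a short numerical sketch. It uses only the hemispherical-analyser relation $\Delta E/E = w/2R$ quoted above; the function names and the 1 mm / 50 mm geometry are hypothetical values chosen for illustration, not the dimensions of any particular instrument.

```python
def relative_resolution(aperture_diameter: float, mean_radius: float) -> float:
    """Relative resolution dE/E = w / (2R) for a hemispherical analyser.

    Both lengths must be given in the same unit (for example millimetres).
    """
    if aperture_diameter <= 0 or mean_radius <= 0:
        raise ValueError("aperture diameter and mean radius must be positive")
    return aperture_diameter / (2.0 * mean_radius)


def absolute_resolution_meV(pass_energy_eV: float, aperture_diameter: float,
                            mean_radius: float) -> float:
    """Absolute resolution dE (in meV) at a given pass energy E (in eV)."""
    return 1000.0 * pass_energy_eV * relative_resolution(aperture_diameter, mean_radius)


# Hypothetical geometry: 1 mm apertures on plates of 50 mm mean radius.
w_mm, r_mm = 1.0, 50.0
for pass_energy_eV in (10.0, 2.0, 1.0):
    # Lowering the pass energy shrinks dE while dE/E stays fixed by the geometry.
    print(pass_energy_eV, absolute_resolution_meV(pass_energy_eV, w_mm, r_mm))
```

With this assumed geometry the relative resolution is 0.01, so decelerating the beam from 10 eV to 1 eV improves the absolute resolution tenfold.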


Magnetic fields are employed in several electron-energy analysers and filters (figure B1.6.3). For very-low-energy
electrons (0 to 10 eV), the 'trochoidal analyser' has proven quite useful. This device employs a
magnetic field aligned to the direction of the incident electrons and an electric field perpendicular to this
direction. The trajectory of an electron injected into this analyser describes a spiral and the guiding centre of
the spiral drifts in the remaining perpendicular direction. The drift rate depends upon the electron energy so
that a beam of electrons entering the device is dispersed in energy at the exit. The projection of the trajectory
on a plane perpendicular to the electric field direction is a trochoid, hence the name trochoidal analyser. The
Wien filter is similar in that it uses crossed electric and magnetic fields; however, the fields are perpendicular
to one another and both are perpendicular to the injected electron beam direction. The Coulomb force induced
by the electric field, $\mathbf{E}$, deflects electrons in one direction and the Lorentz force associated with the magnetic
field, $\mathbf{B}$, tends to deflect electrons in the opposite direction. The forces balance for one velocity, $v = |\mathbf{E}|/|\mathbf{B}|$,
and electrons of this velocity are transmitted straight through the filter to the exit aperture. A magnetic field
alone, perpendicular to the direction of an electron beam, will disperse electrons in energy. Sector magnets
such as those used in mass spectrometers are used in electron spectrometers for very-high-energy electrons,
the advantage over electrostatic deflectors being that large electrical potentials are not required. Another
advantage is that deflection is in the direction parallel to the magnet pole face making it possible to view the
entire dispersed spectrum at one time. By contrast, energy dispersion of an electron beam in an electrostatic
device results in a significant portion of the dispersed electrons striking one or the other electrode.

Figure B1.6.3 Electron energy analysers that use magnetic fields: (a) the trochoidal analyser employing an
electromagnet, (b) the Wien filter and (c) the sector magnet analyser. Trajectories for electrons of different 
energies are shown. 
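
The velocity-selection condition of the Wien filter, $v = |\mathbf{E}|/|\mathbf{B}|$, can be turned into a small calculation of the transmitted electron energy. The sketch below assumes nonrelativistic electrons and SI units; the function names and the field strengths in the example are hypothetical illustration values.

```python
ELECTRON_MASS_KG = 9.1093837e-31
ELEMENTARY_CHARGE_C = 1.602176634e-19


def wien_pass_velocity(e_field_V_per_m: float, b_field_T: float) -> float:
    """Velocity (m/s) at which the electric and magnetic forces cancel: v = E/B."""
    if b_field_T <= 0:
        raise ValueError("magnetic field must be positive")
    return e_field_V_per_m / b_field_T


def kinetic_energy_eV(velocity_m_per_s: float) -> float:
    """Nonrelativistic kinetic energy (eV) of an electron with the given speed."""
    return 0.5 * ELECTRON_MASS_KG * velocity_m_per_s ** 2 / ELEMENTARY_CHARGE_C


def net_transverse_force_N(velocity_m_per_s: float, e_field_V_per_m: float,
                           b_field_T: float) -> float:
    """Signed net deflecting force e(E - vB); zero only at the pass velocity."""
    return ELEMENTARY_CHARGE_C * (e_field_V_per_m - velocity_m_per_s * b_field_T)


# Hypothetical fields: 10 kV/m crossed with 10 mT.
v_pass = wien_pass_velocity(1.0e4, 1.0e-2)
print(v_pass, kinetic_energy_eV(v_pass))
```

For these assumed fields the filter passes electrons moving at about $10^6$ m/s, that is, with a kinetic energy of roughly 2.8 eV; faster and slower electrons experience a nonzero net force and are deflected away from the exit aperture.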

(C) ELECTRON GUNS 

Thermionic emission from an electrically heated filament is the usual source of electrons in an electron 
spectrometer. Field emission induced by a large electric-field gradient at a sharply pointed electrode may be 
used when fine spatial resolution is required. For very special applications, laser-induced photoemission has 
been used to produce nearly monoenergetic electrons and electrons with spin polarization. The filament in a 
thermionic source is a wire or ribbon of tungsten or some other refractory metal, sometimes coated or 
impregnated with thoria to reduce the work function. The passage of an electrical current of a few amps heats 
the filament to 1500 to 2500°C. As shown in figure B1.6.4, the filament is typically mounted in a diode
arrangement, protruding through a cathode and a small distance from an anode with an aperture through 
which electrons are extracted. The cathode is biased 30 to 50 V negative with respect to the source and the 
anode 50 to 100 V positive. An approximately Maxwellian energy distribution is produced. Depending on the 
filament temperature and the potential drop across the hot portion of the filament, this distribution is 0.3 to 0.7 
eV wide. For most applications, the electron beam extracted from a thermionic source is passed through a 
monochromator to select a narrower band of energies from that emanating from the source.

Figure B1.6.4 Diode electron source.

(D) ELECTRON SPECTROMETERS 

A typical electron energy-loss spectrometer is shown in figure B1.6.5. The major components are an electron
source, a premonochromator, a target, an analyser and an electron detector. For gaseous samples, the target 
may be a gas jet or the target may be a gas confined in a cell with small apertures for the incident beam and 
for the scattered electrons. The target may be a thin film to be viewed in transmission or a solid surface to be 
viewed in reflection. The analyser may be rotatable about the scattering centre so the angularly differential 
scattering cross section can be measured. Most often the detector is an electron multiplier that permits 
scattered electrons to be counted and facilitates digital processing of the scattered-electron spectrum. In low- 
resolution instruments, the scattered-electron intensity may be sufficient to be measured with a sensitive 
electrometer as an electron current captured in a 'Faraday cup'.

Figure B1.6.5 Typical electron energy-loss spectrometer.

Electron lens systems between each component serve a number of functions. A lens following the source 
focuses electrons on the entrance aperture of the premonochromator and decelerates these electrons to the pass
energy required to obtain the desired resolution. The lens following the premonochromator focuses electrons on the target and provides a variable
amount of acceleration to permit experiments with different incident energies. The lens system at the input to 
the analyser again decelerates electrons and focuses on the entrance aperture of the analyser; however, this 
lens system has an additional function. The scattered-electron energy spectrum must be scanned. This is 
accomplished with an 'adder lens' that progressively adds back energy to the scattered electrons. The analyser 
is set to transmit electrons which have lost no energy; the energy-loss spectrum is then a plot of the detector 
signal as a function of the energy added by the adder lens. The alternative method of scanning is to vary the 
pass energy of the analyser; this has the disadvantage of changing the analyser resolution and transmission as 
the spectrum is scanned. The lens following the analyser accelerates transmitted electrons to an energy at 
which the detector is most sensitive, typically several hundred eV. 
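
The adder-lens scanning scheme described above can be sketched as a toy simulation. The analyser is modelled as an ideal rectangular passband centred on zero net energy loss; the function name, the peak positions and the intensities are hypothetical and serve only to illustrate the bookkeeping.

```python
def adder_lens_scan(loss_peaks_eV: dict, added_energies_eV: list,
                    passband_eV: float) -> list:
    """Detector signal for each value of the energy added by the adder lens.

    loss_peaks_eV maps an energy loss (eV) to a relative intensity.
    An electron that lost `loss` and regained `added` is transmitted when its
    remaining deficit |loss - added| lies within half the analyser passband.
    """
    spectrum = []
    for added in added_energies_eV:
        signal = sum(intensity for loss, intensity in loss_peaks_eV.items()
                     if abs(loss - added) <= passband_eV / 2.0)
        spectrum.append(signal)
    return spectrum


# Hypothetical target: elastic peak plus two inelastic features.
peaks = {0.0: 100.0, 2.0: 5.0, 4.5: 1.0}
scan = [0.5 * step for step in range(11)]      # energy added: 0.0 ... 5.0 eV
print(list(zip(scan, adder_lens_scan(peaks, scan, passband_eV=0.1))))
```

Because the analyser pass energy never changes in this scheme, the passband (and hence the resolution and transmission) is the same at every point of the scan, which is the advantage noted in the text.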


B1.6.2 THEORY

Inelastic electron collisions can be roughly divided into two regimes: those in which the kinetic energy of the 
projectile electron greatly exceeds the energy of the target atom or molecule's electrons excited by the 
collision, and those in which the projectile and target electron energies are comparable. In the higher-energy 
region the target electrons are little disturbed by the approach and departure of the projectile; the excitation 
occurs suddenly when the projectile is very close to the target. In the lower-energy region, the interaction 
proceeds on a time scale comparable to the orbital period of the target electrons; both projectile and target 
electrons make significant adjustments to one another's presence. In some such cases, it may even make sense 
to consider the electron-target complex as a transient negative ion. 

B1.6.2.1 BETHE-BORN THEORY FOR HIGH-ENERGY ELECTRON SCATTERING

Bethe provided the theoretical basis for understanding the scattering of fast electrons by atoms and molecules 
[3, 4]. We give below an outline of the quantum-mechanical approach to calculating the scattering cross 
section. 

The Schrödinger equation for the projectile-target system is

$$\left[-\frac{\hbar^2}{2m}\nabla_r^2 - \frac{\hbar^2}{2m}\sum_j \nabla_j^2 + V(\mathbf{r},\mathbf{r}_j,\mathbf{r}_N)\right]\Phi(\mathbf{r},\mathbf{r}_j) = \left[\varepsilon_0 + \frac{\hbar^2 k_0^2}{2m}\right]\Phi(\mathbf{r},\mathbf{r}_j) \tag{B1.6.1}$$

where $\mathbf{r}$ gives the position of the projectile electron, the $\mathbf{r}_j$ are the coordinates of the electrons in the target
and the $\mathbf{r}_N$ are the coordinates of the nuclei; $V(\mathbf{r},\mathbf{r}_j,\mathbf{r}_N)$ is the potential energy of interaction between the
projectile and the particles (electrons and nuclei) that make up the target, as well as interactions between
particles in the target; $\varepsilon_0$ is the energy of the target in its ground state; and $\hbar^2 k_0^2/2m$ is the initial kinetic
energy of the projectile with wavevector $\mathbf{k}_0$ and momentum $\hbar\mathbf{k}_0$. The wave equation (B1.6.1) is inseparable
because of terms in the potential-energy operator that go as $|\mathbf{r}-\mathbf{r}_j|^{-1}$. It is thus impossible to obtain an analytic
solution for the wave function of the scattered electron. An approximate solution can be obtained by
expanding the wave function, $\Phi(\mathbf{r},\mathbf{r}_j)$, in the complete set of eigenfunctions of the target, $\chi_m(\mathbf{r}_j)$, and of the
projectile, $\psi(\mathbf{r})$. This separates the wave equation into a set of coupled differential equations each of which
manifests a discrete interaction coupling two states ($n$ and $m$) of the target. The interaction is
described by a matrix element:

$$V_{mn} = \int \chi_n^*(\mathbf{r}_j)\, V\, \chi_m(\mathbf{r}_j)\, d\mathbf{r}_j .$$


Approximate methods may be employed in solving this set of equations for the $\psi(\mathbf{r})$; however, the asymptotic
forms of the solutions are obvious. For the case of elastic scattering

$$\psi_0(\mathbf{r}) \rightarrow e^{i k_0 z} + f(\theta,\phi)\,\frac{e^{i k_0 r}}{r},$$

the first term representing an incident plane wave moving in the z-direction in a spherical coordinate system
and the second term an outgoing spherical wave modulated by a scattering amplitude, $f(\theta,\phi)$. For inelastic
scattering, the solutions describe an outgoing wave with momentum $\hbar k$,

$$\psi_n(\mathbf{r}) \rightarrow f_n(\theta,\phi)\,\frac{e^{i k r}}{r}.$$

In this case the projectile has imparted energy $\hbar^2(k_0^2 - k^2)/2m$ to the target. Assuming the target is initially in
its ground state ($m = 0$), the collision has excited the target to a state of energy $E_n = \hbar^2(k_0^2 - k^2)/2m$.

The cross section for scattering into the differential solid angle $d\Omega$ centred in the direction $(\theta,\phi)$ is
proportional to the square of the scattering amplitude:

$$\frac{d\sigma_n}{d\Omega} = \frac{k}{k_0}\,\left|f_n(\theta,\phi)\right|^2 ,$$

where the ratio $k/k_0 = v/v_0$ accounts for the fact that, all other things being equal, the incident and scattered
flux differ owing to the difference in velocity, $v_0$, of the incident electron compared to the velocity, $v$, of the
scattered electron. As a consequence of the expansion of the total wave function, the scattering amplitude can
also be decomposed into terms each of which refers to an interaction coupling specific states of the target:

$$f_n(\theta,\phi) = -\frac{m}{2\pi\hbar^2}\int e^{-i\mathbf{k}\cdot\mathbf{r}}\, V_{mn}\, e^{i\mathbf{k}_0\cdot\mathbf{r}}\, d\mathbf{r} = -\frac{m}{2\pi\hbar^2}\int V_{mn}\, e^{i\mathbf{K}\cdot\mathbf{r}}\, d\mathbf{r} \tag{B1.6.2}$$

where $\hbar\mathbf{K} = \hbar(\mathbf{k}_0 - \mathbf{k})$ is the momentum transfer in the collision.

B1.6.2.2 THE BORN APPROXIMATION


In the high-energy regime it is appropriate to employ the Born approximation. There are three assumptions: (i)
The incident wave is undistorted by the target. For a target in its ground state (specified by $m = 0$) this is
equivalent to setting $V_{00} = 0$. (ii) There is no interaction between the outgoing electron and the excited target.
For inelastic scattering with excitation of the final state $n$, this is equivalent to setting $V_{nn} = 0$. (iii) The
excitation is a direct process with no involvement of intermediate states, thus $V_{mn} = 0$ unless $m = 0$. The
scattering amplitude (equation (B1.6.2)) thus contains but one term:

$$f_n(\theta,\phi) = -\frac{m}{2\pi\hbar^2}\int V_{0n}\, e^{i\mathbf{K}\cdot\mathbf{r}}\, d\mathbf{r}. \tag{B1.6.3}$$

When the potential consists of electron-electron and electron-nucleus Coulombic interactions,

$$V = \sum_j \frac{e^2}{|\mathbf{r}-\mathbf{r}_j|} - \sum_N \frac{Z_N e^2}{|\mathbf{r}-\mathbf{r}_N|},$$

substitution in (B1.6.3) yields

$$f_n(\theta,\phi) = -\frac{m e^2}{2\pi\hbar^2}\int\!\!\int \chi_n^*(\mathbf{r}_j) \sum_j \frac{e^{i\mathbf{K}\cdot\mathbf{r}}}{|\mathbf{r}-\mathbf{r}_j|}\, \chi_0(\mathbf{r}_j)\, d\mathbf{r}_j\, d\mathbf{r},$$

the electron-nucleus terms having been lost owing to the orthonormality of the target wave functions. The
important physical implication of the Born approximation becomes clear if one first performs the integration
with respect to $\mathbf{r}$, taking advantage of the transformation

$$\int \frac{e^{i\mathbf{K}\cdot\mathbf{r}}}{|\mathbf{r}-\mathbf{r}_j|}\, d\mathbf{r} = \frac{4\pi}{K^2}\, e^{i\mathbf{K}\cdot\mathbf{r}_j}.$$

The differential cross section for inelastic collisions exciting the $n$th state of the target then takes the form

$$\frac{d\sigma_n}{d\Omega} = \frac{4 m^2 e^4}{\hbar^4 K^4}\,\frac{k}{k_0}\,\left|\int \chi_n^*(\mathbf{r}_j) \sum_j e^{i\mathbf{K}\cdot\mathbf{r}_j}\, \chi_0(\mathbf{r}_j)\, d\mathbf{r}_j\right|^2 = \frac{4 m^2 e^4}{\hbar^4 K^4}\,\frac{k}{k_0}\,\left|\varepsilon_n(K)\right|^2 .$$

In this expression, factors that describe the incident and scattered projectile are separated from the square
modulus of an integral that describes the role of the target in determining the differential cross section. The
term preceding the
integral, $4e^4m^2/\hbar^4K^4$, with units of area, is the Rutherford cross section for electron-electron scattering. The
integral, represented by the quantity $\varepsilon_n(K)$, is known as the inelastic scattering form factor.

In the discussion above, scattering from molecules is treated as a superposition of noninteracting electron
waves scattered from each atomic centre. In fact, there is a weak but observable interference between these
waves giving rise to phase shifts associated with the different positions of the atoms in a molecule. This 
diffraction phenomenon produces oscillations in the differential cross section from which molecular structure 
information can be derived. 

The interaction of the target with the incoming and outgoing electron wave must be considered at lower 
impact energies. This is achieved in the distorted-wave approximation by including $V_{00}$ and $V_{nn}$ in the
calculation of the scattering amplitude. Higher-level calculations must also account for electron spin since 
spin exchange becomes important as the collision energy decreases. 

B1.6.2.3 THE GENERALIZED OSCILLATOR STRENGTH 

The Born approximation for the differential cross section provides the basis for the interpretation of many 
experimental observations. The discussion is often couched in terms of the generalized oscillator strength,

$$f_n(K) = \frac{2 m E_n}{\hbar^2 K^2}\left|\int \chi_n^*(\mathbf{r}_j)\sum_j e^{i\mathbf{K}\cdot\mathbf{r}_j}\,\chi_0(\mathbf{r}_j)\,d\mathbf{r}_j\right|^2. \tag{B1.6.5}$$


Assuming the validity of the Born approximation, an 'effective' generalized oscillator strength can be derived
in terms of experimentally accessible quantities:

$$f_n(K) = \frac{\hbar^2}{2 m e^4}\,\frac{k_0}{k}\,E_n K^2\left(\frac{d\sigma_n}{d\Omega}\right). \tag{B1.6.6}$$


All the quantities on the right can be measured ($k_0$, $k$ and $K$ calculated from measurements of the incident
energy, the energy loss, and the scattering angle). For inelastic collisions resulting in transitions into the
continuum beyond the first ionization potential, the cross section is measured per unit energy loss and the
generalized oscillator strength density is determined:

$$\frac{df(K)}{dE} = \frac{\hbar^2}{2 m e^4}\,\frac{k_0}{k}\,E K^2\left(\frac{d^2\sigma}{d\Omega\, dE}\right). \tag{B1.6.7}$$


The generalized oscillator strength provides the basis of comparison between electron energy-loss spectra and 
optical spectra; however, there is a problem with the determination of the absolute value of the generalized 
oscillator strength since measurements of the differential cross section can rarely be made on an absolute basis 
owing to the difficulty of accurately determining the target dimension and density. The problem is overcome 
by a normalization of
experimentally determined generalized oscillator strengths according to the Bethe sum rule which requires
that the sum of the $f_n(K)$ (equation (B1.6.6)) for all discrete transitions plus the integral of $df(K)/dE$ (equation
(B1.6.7)) over the continuum adds up to the number of electrons in the target.
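
The conversion from measured quantities to an effective generalized oscillator strength can be sketched numerically. The example below works in atomic units ($\hbar = m = e = 1$), in which the effective generalized oscillator strength reduces to $f_n(K) = (E_n/2)(k_0/k)K^2\,(d\sigma_n/d\Omega)$ with the cross section in units of $a_0^2$ per steradian. It assumes a target so heavy that its recoil energy is negligible, so that $K^2 = k_0^2 + k^2 - 2k_0k\cos\theta$; all function names are hypothetical.

```python
import math

HARTREE_EV = 27.211386  # one atomic unit of energy in eV


def wavenumber_au(kinetic_energy_eV: float) -> float:
    """Electron wavenumber in atomic units: k = sqrt(2E), E in hartree."""
    if kinetic_energy_eV <= 0:
        raise ValueError("kinetic energy must be positive")
    return math.sqrt(2.0 * kinetic_energy_eV / HARTREE_EV)


def momentum_transfer_sq_au(incident_eV: float, loss_eV: float,
                            angle_deg: float) -> float:
    """K^2 = k0^2 + k^2 - 2 k0 k cos(theta), neglecting target recoil."""
    k0 = wavenumber_au(incident_eV)
    k = wavenumber_au(incident_eV - loss_eV)
    return k0 * k0 + k * k - 2.0 * k0 * k * math.cos(math.radians(angle_deg))


def effective_gos(incident_eV: float, loss_eV: float, angle_deg: float,
                  dcs_a0sq_per_sr: float) -> float:
    """Effective generalized oscillator strength in atomic units."""
    k0 = wavenumber_au(incident_eV)
    k = wavenumber_au(incident_eV - loss_eV)
    k_sq = momentum_transfer_sq_au(incident_eV, loss_eV, angle_deg)
    return 0.5 * (loss_eV / HARTREE_EV) * (k0 / k) * k_sq * dcs_a0sq_per_sr


# Minimum momentum transfer (forward scattering) for a 10 eV loss
# at two hypothetical incident energies.
for e0 in (200.0, 10000.0):
    print(e0, math.sqrt(momentum_transfer_sq_au(e0, 10.0, 0.0)))
```

The loop shows the trend noted later in the text: for a fixed energy loss, the minimum momentum transfer at 0° shrinks as the incident energy rises, which is why the dipole limit is approached most closely in high-energy forward scattering.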


A particularly important property of the generalized oscillator strength is that, for high-energy, small-angle
scattering, the generalized oscillator strength is approximately equal to the optical oscillator strength, $f_n^{\mathrm{opt}}$, for
electric-dipole transitions induced by photon absorption. That this is so can be seen from a power-series
expansion of the form factor that appears in the expression for the generalized oscillator strength (equation
(B1.6.5)), in which the zeroth-order term vanishes by the orthogonality of the target wave functions:

$$\varepsilon_n(K) = \int \chi_n^*(\mathbf{r}_j) \sum_j \left[\, i\mathbf{K}\cdot\mathbf{r}_j + \frac{(i\mathbf{K}\cdot\mathbf{r}_j)^2}{2!} + \cdots \right] \chi_0(\mathbf{r}_j)\, d\mathbf{r}_j .$$

The operator in the first term goes as $\mathbf{r}_j$, and is thus proportional to the optical dipole transition moment.
The second term is proportional to the optical quadrupole transition moment, and so on. For small values of
momentum transfer, only the first term is significant, thus

$$\lim_{K \to 0} f_n(K) = f_n^{\mathrm{opt}}.$$

The result is that the small-angle scattering intensity as a function of energy loss (the energy-loss spectrum) 
looks like the optical absorption spectrum. In fact, oscillator strengths are frequently more accurately and 
conveniently measured from electron-impact energy-loss spectra than from optical spectra, especially for 
higher-energy transitions, since the source intensity, transmission and detector sensitivity for an electron 
scattering spectrometer are much more nearly constant than in an optical spectrometer. The proportionality of 
the generalized oscillator strength and the optical dipole oscillator strength appears to be valid even for 
incident-electron energies as low as perhaps 200 eV, but it is strictly limited to small-angle, forward scattering 
that minimizes momentum transfer in a collision [5]. Of course $K = 0$ is inaccessible in an inelastic collision;
there must be at least sufficient momentum transferred to account for the kinetic energy lost by the projectile 
in exciting the target. For very-high-energy, small-angle scattering, the minimum momentum transfer is 
relatively small and can be ignored. At lower energies, an extrapolation technique has been employed in very 
accurate work. 

On the other hand, there are unique advantages to electron-scattering measurements in the lower-energy, 
larger-scattering-angle regime in which the momentum transfer is larger than the minimum required to induce 
a transition. In this case, higher-order multipoles in the transition moment become significant, with the result 
that the cross section for the excitation of optically forbidden transitions increases relative to that for dipole- 
allowed transitions. The ability to vary the momentum transfer in electron-energy-loss spectroscopy yields a 
spectrum much richer than the optical spectrum.

B1.6.2.4 THE BETHE SURFACE: BINARY VERSUS DIPOLE COLLISIONS

An important feature of electron-impact spectroscopy in comparison to photoabsorption is that the momentum 
transfer can be varied. As the scattering angle increases and the incident energy decreases, higher-order terms 
in the expansion of $\varepsilon_n(K)$ become relatively more important. Large-angle, high-momentum-transfer scattering
results from small impact parameters. In this case the target experiences a nonuniform electric field as the
electron passes by. Significant amplitude of higher-order multipoles in a nonuniform field permits the field to
couple with higher-order multipoles of the target. Thus, for example, optically forbidden electric-quadrupole
transitions are a significant feature of the low-energy, large-scattering-angle electron energy-loss spectrum. 
Selection rules for electronic excitation by electron impact have been treated in detail in the review by Hall 
and Read [6]. 

From the discussion above it appears that small-angle scattering events might better not be thought of as 
collisions at all. The excitation, which is photon-like, appears to be a consequence of the high-frequency 
electric pulse delivered to the target as the projectile electron passes rapidly by. These energetic, but glancing 
collisions are referred to as dipole collisions, in contradistinction to the larger-angle scattering regime of 
binary collisions where the projectile electron appears to undergo a hard collision with one of the target 
electrons. 

A succinct picture of the nature of high-energy electron scattering is provided by the Bethe surface [4], a
three-dimensional plot of the generalized oscillator strength as a function of the logarithm of the square of the
momentum transfer, $\ln(K^2)$, and the energy loss, $E$. To see how this works, consider the form of the Bethe
surface for a 3D billiards game with a 'projectile' cue ball incident on a stationary billiard ball. This is a two-body
problem so the energy loss is uniquely determined by the momentum transfer: $E = |\hbar\mathbf{K}|^2/2m$. For each
value of $K$, all the oscillator strength appears at a single value of $E$. The Bethe surface displays a sharp ridge
extending from low values of $K$ and $E$ (corresponding to large-impact-parameter, glancing collisions) to
high values (corresponding to near 'head on' collisions).
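
The position of this ridge follows directly from $E = |\hbar\mathbf{K}|^2/2m$. As a small sketch, the functions below evaluate the relation with $K$ in atomic units (inverse bohr), where it becomes $E = K^2$ rydberg; the function names are hypothetical.

```python
import math

RYDBERG_EV = 13.605693  # (hbar^2 / 2 m a0^2) expressed in eV


def bethe_ridge_energy_loss_eV(momentum_transfer_au: float) -> float:
    """Energy loss on the ridge for a free, stationary target electron.

    E = (hbar K)^2 / 2m, which equals K^2 rydberg when K is in inverse bohr.
    """
    return RYDBERG_EV * momentum_transfer_au ** 2


def bethe_ridge_momentum_transfer_au(energy_loss_eV: float) -> float:
    """Inverse relation: the K (inverse bohr) at which the ridge sits for a given loss."""
    if energy_loss_eV < 0:
        raise ValueError("energy loss must be non-negative")
    return math.sqrt(energy_loss_eV / RYDBERG_EV)


for k_au in (0.5, 1.0, 2.0, 5.0):
    print(k_au, bethe_ridge_energy_loss_eV(k_au))
```

A momentum transfer of one inverse bohr thus corresponds to a loss of about 13.6 eV on the ridge, and the loss grows quadratically with $K$ towards the 'head on' end of the surface.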

A schematic of the Bethe surface for electron scattering from an atom or molecule is shown in figure B1.6.6
superimposed on the surface for the two-body problem with a stationary target. The sharp ridge is broadened 
in the region of large momentum transfer and energy loss. These are hard collisions in which the projectile 
ejects an electron from the target. This is at least a three-body problem: because the recoil momentum of the 
ionic core is not accounted for, or, alternatively, since the target electron is not stationary, K and E are not 
uniquely related. The breadth and the shape of the ridge in the Bethe surface in the high-momentum region are 
a reflection of the momentum distribution of the target electron. Electron Compton scattering experiments [7, 
8] and (e, 2e) experiments [9] are carried out in the high-momentum-transfer, large-energy-loss, region for the 
purpose of investigating the electron momentum distribution in atoms and molecules.

Figure B1.6.6 The Bethe surface. The sharp ridge corresponds to scattering from a single stationary target
electron; the broadened ridge to scattering from the electrons in an atom or molecule. 

The dipole region is approached as one proceeds to the low-momentum-transfer portion of the Bethe surface. 
For sufficiently small momentum transfer, where $|\hbar\mathbf{K}|^2/2m$ approaches the binding energy of valence electrons
in the target, a section through the Bethe surface has the appearance of the optical absorption spectrum. For 
values of the energy loss less than the first ionization energy of the target, sharp structure appears 
corresponding to the excitation of discrete, dipole-allowed transitions in the target. The ionization continuum 
extends from the first ionization potential, but, as in the photoabsorption spectrum, one typically sees sharp 
'edges' as successive ionization channels become energetically possible. Also, as in the photoabsorption 
spectrum, resonance structures may appear corresponding to the excitation of metastable states imbedded in 
the continuum. 

B1.6.2.5 LOW-ENERGY ELECTRON SCATTERING

Theoretically, the asymptotic form of the solution for the electron wave function is the same for low-energy
projectiles as it is at high energy; however, one must account for the protracted period of interaction between
projectile and target at the intermediate stages of the process. The usual procedure is to separate the incident-electron
wave function into partial waves

$$e^{ikz} = \sum_{l=0}^{\infty} (2l+1)\, i^l\, P_l(\cos\theta)\, j_l(kr)$$

($j_l$ a spherical Bessel function; $P_l$ a Legendre polynomial) and expand the wave function of the target in some
complete basis set, typically the complete set of eigenfunctions for the unperturbed target. This approach
allows for any distortion of the incoming and scattered electron waves, as well as any perturbation of the
target caused by the
approaching charge of the projectile. Each term in the expansion for the incident wave represents an angular
momentum component: $l = 0$, the s-wave component; $l = 1$, the p-wave component; and so on. The scattering
of the incident wave is treated component by component, the interaction with each giving rise to a phase shift
in that component of the outgoing wave.

To see how this works, consider elastic scattering in a situation where the electron-target interaction can be
described by a simple central-force-field potential, $V(r)$, that falls off faster than $r^{-1}$ at large $r$. In this
case the wave equation for the projectile electron can be separated from the Schrödinger equation for the total
electron-target system given above (equation (B1.6.1)). The wave equation for the projectile is


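$$\left[-\frac{\hbar^2}{2m}\nabla^2 + V(r)\right]\psi(\mathbf{r}) = \frac{\hbar^2 k^2}{2m}\,\psi(\mathbf{r}).$$

Since the potential depends only upon the scalar $r$, this equation, in spherical coordinates, can be separated
into two equations, one depending only on $r$ and one depending on $\theta$ and $\phi$. The wave equation for the
$r$-dependent part of the solution, $R(r)$, is

$$-\frac{\hbar^2}{2m}\,\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + \left[\frac{\hbar^2\, l(l+1)}{2 m r^2} + V(r)\right] R = \frac{\hbar^2 k^2}{2m}\, R \tag{B1.6.8}$$

where $\hbar\sqrt{l(l+1)}$ is the orbital angular momentum associated with the $l$th partial wave. The solutions have
the asymptotic form

$$R(r) \rightarrow \frac{1}{kr}\,\sin\left(kr - \frac{l\pi}{2} + \eta_l\right) \tag{B1.6.9}$$

and the calculation of the cross section is reduced to calculating the phase shift, $\eta_l$, for each partial wave. The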
phase shift is a measure of the strength of the interaction of a partial wave in the field of the target, as well as 
a measure of the time period of the interaction. High-angular-momentum components correspond to large 
impact parameters for which the interaction can generally be expected to be relatively weak. The exceptions 
are for the cases of long-range potentials, as when treating scattering from highly polarizable targets or from 
molecules with large dipole moments. In any event, only a limited number of partial waves need be 
considered in calculating the cross section — sometimes only one or two. 
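
How a handful of phase shifts determines the cross section can be shown with a short sketch. It uses the standard partial-wave results, which are not written out in the text above: the elastic scattering amplitude $f(\theta) = k^{-1}\sum_l (2l+1)\,e^{i\eta_l}\sin\eta_l\,P_l(\cos\theta)$ and the integrated cross section $\sigma = (4\pi/k^2)\sum_l (2l+1)\sin^2\eta_l$. The function names and the phase-shift values are hypothetical.

```python
import cmath
import math


def legendre(l: int, x: float) -> float:
    """Legendre polynomial P_l(x) by the three-term recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def scattering_amplitude(k: float, phase_shifts: list, theta: float) -> complex:
    """f(theta) = (1/k) * sum_l (2l+1) exp(i eta_l) sin(eta_l) P_l(cos theta)."""
    x = math.cos(theta)
    total = sum((2 * l + 1) * cmath.exp(1j * eta) * math.sin(eta) * legendre(l, x)
                for l, eta in enumerate(phase_shifts))
    return total / k


def elastic_cross_section(k: float, phase_shifts: list) -> float:
    """sigma = (4 pi / k^2) * sum_l (2l+1) sin^2(eta_l)."""
    if k <= 0:
        raise ValueError("wavenumber must be positive")
    total = sum((2 * l + 1) * math.sin(eta) ** 2
                for l, eta in enumerate(phase_shifts))
    return 4.0 * math.pi * total / k ** 2


# Hypothetical s-, p- and d-wave phase shifts (radians) at k = 0.8 inverse bohr.
etas = [0.5, 0.2, 0.05]
print(elastic_cross_section(0.8, etas))
print(abs(scattering_amplitude(0.8, etas, math.radians(30.0))) ** 2)
```

Truncating the list of phase shifts after the first few entries corresponds to the statement that only a limited number of partial waves need be considered; the cross section is in units of the inverse square of the unit chosen for $k$.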

B1.6.2.6 RESONANCES

The partial wave decomposition of the incident-electron wave provides the basis of an especially appealing 
picture of strong, low-energy resonant scattering wherein the projectile electron spends a sufficient period of 
time in the vicinity
of the target that the electron-target complex is describable as a temporary negative ion. With the radial wave
equation (B1.6.8) for the projectile in a central-force field as a starting point, define a fictitious potential

$$V_{\mathrm{eff}}(r) = \frac{\hbar^2\, l(l+1)}{2 m r^2} + V(r).$$

This is a fictitious potential because it includes not only the true potential, $V(r)$, that contains the screened
Coulomb potential and the polarization potential of the target, but also the term $(\hbar^2/2m)\,l(l+1)/r^2$ that arises
from the centrifugal force acting on the projectile, a fictitious force associated with curvilinear motion. As
shown in figure B1.6.7 this repulsive term may give rise to an angular momentum barrier. Some part of the
incident-electron-wave amplitude may tunnel through the barrier to impinge upon the repulsive part of the 
true potential from which it is reflected to tunnel back out to join the incident wave. The superposition of 
these two waves produces the phase shift in the scattered wave (see equation (B1.6.9)). More interestingly, as
shown in the figure, there are special incident electron energies for which the width of the well behind the 
barrier is equal to some integral multiple of the electron wavelength. A standing wave then persists, 
corresponding to an electron being temporarily trapped in the field of the target. This model describes the 
resonant formation of a metastable negative ion, or, more simply, a 'resonance'.

Figure B1.6.7 An angular momentum barrier created by the addition of the centrifugal potential to the
electron-atom potential. 
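
The size of such a barrier can be estimated with a deliberately simplified model. The sketch below assumes that, outside the target, the true potential is dominated by the long-range polarization term $-\alpha/2r^4$ (atomic units, $\alpha$ the dipole polarizability); the short-range screened Coulomb part is ignored, so the numbers indicate orders of magnitude only. The function names and the value of $\alpha$ are hypothetical.

```python
import math


def effective_potential_au(r: float, l: int, polarizability: float) -> float:
    """Centrifugal term plus polarization potential, in hartree.

    V_eff(r) = l(l+1)/(2 r^2) - alpha/(2 r^4), with r in bohr (assumed model).
    """
    return l * (l + 1) / (2.0 * r ** 2) - polarizability / (2.0 * r ** 4)


def barrier_top_au(l: int, polarizability: float) -> tuple:
    """Position (bohr) and height (hartree) of the maximum of V_eff.

    Setting dV_eff/dr = 0 gives r^2 = 2 alpha / [l(l+1)] and
    V_max = [l(l+1)]^2 / (8 alpha).
    """
    if l < 1:
        raise ValueError("there is no centrifugal barrier for l = 0")
    ll1 = l * (l + 1)
    r_max = math.sqrt(2.0 * polarizability / ll1)
    return r_max, ll1 ** 2 / (8.0 * polarizability)


# Hypothetical target with a polarizability of 10 atomic units.
for l in (1, 2, 3):
    print(l, barrier_top_au(l, 10.0))
```

In this model the barrier height grows as $[l(l+1)]^2$, which is consistent with the picture that electrons of higher angular momentum are more readily held behind the barrier once they have tunnelled in.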

Finally, it must be recognized that all the above discussion assumes an isolated atomic or molecular target. To 
describe electron scattering in a complex target such as a solid, one must consider the extended nature of the 
valence-electron
density that constitutes a kind of electron gas enveloping the array of positively charged atomic cores. Most
calculations employ a so-called 'jellium' model in which the mobile electrons move in the field of a positive 
charge smeared out into a homogeneous neutralizing density. 


B1.6.3 APPLICATIONS

Electron energy-loss spectroscopy is used for obtaining spectroscopic data as a convenient substitute for 
optical spectroscopy, and, taking advantage of differences in selection rules, as an adjunct to optical 
spectroscopy. In addition, electron spectroscopy has many applications to chemical and structural analysis of 
samples in the gas phase, in the solid phase, and at the solid-gas interface. 

B1.6.3.1 VALENCE-SHELL-ELECTRON SPECTROSCOPY


Electronic transitions within the valence shell of atoms and molecules appear in the energy-loss spectrum 
from a few electron volts up to, and somewhat beyond, the first ionization energy. Valence-shell electron 
spectroscopy employs incident electron energies from the threshold required for excitation up to many 
kiloelectron volts. The energy resolution is usually sufficient to observe vibrational structure within the 
Franck-Condon envelope of an electronic transition. The sample in valence-shell electron energy-loss 
spectroscopy is most often in the gas phase at a sufficiently low pressure to avoid multiple scattering of the 

projectile electrons, typically about 10 mbar. Recently, electronic excitation in surface adsorbates has been 
observed in the energy-loss spectrum of electrons reflected from metallic substrates. When the measurements
are carried out with relatively high incident-electron energies (many times the excitation and ionization
energies of the target electrons) and with the scattered electrons detected in the forward direction (0° 
scattering angle), the energy-loss spectrum is essentially identical to the optical spectrum. As described above, 
this is the arrangement employed to determine oscillator strengths since forward scattering corresponds to 
collisions with the lowest momentum transfer. If the incident energy is reduced to about 100 eV (just a few 
times the target electron energies), symmetry-forbidden transitions can be uncovered and distinguished from 
optically allowed transitions by measuring the energy-loss spectrum at different scattering angles. An example 
is shown in figure B1.6.8. As the incident energy approaches threshold, it becomes possible to detect
electron-spin-changing transitions.

Figure B1.6.8 Energy-loss spectra of 200 eV electrons scattered from chlorine at scattering angles of 3° and
9° [10]. Optically forbidden transitions are responsible for the intensity in the 9° spectrum that does not 
appear in the 3° spectrum. 


B1.6.3.2 INNER-SHELL-ELECTRON ENERGY-LOSS SPECTROSCOPY


Inner-shell-electron energy-loss spectroscopy (ISEELS) refers to measurements of the energy lost by 
projectile electrons that have promoted inner-shell electrons into unfilled valence orbitals or into the
ionization continuum beyond the valence shell. Inner-shell excitation and ionization energies fall in the region 
between 100 eV and several kiloelectron volts. The corresponding features in the energy-loss spectrum tend to 
be broad and diffuse. Inner-shell-hole states of neutral atoms and molecules are very short lived; the transition 
energies to these states are correspondingly uncertain. The energy of a transition into the continuum is 
completely variable. Inner-shell-electron ionization energies cannot be uniquely determined from a 
measurement of the energy lost by the scattered electron since an unknown amount of energy is carried away 
by the undetected ejected electron. To be more precise, the electron-impact ionization cross section depends
upon the energy ($E_s$) and angle ($\Omega_s$) of the scattered electron as well as the energy ($E_e$) and angle ($\Omega_e$) of the
ejected electron; it is a fourfold differential cross section: $d^4\sigma/d\Omega_s\,dE_s\,d\Omega_e\,dE_e$. The intensity in the
inner-shell energy-loss spectrum is proportional to this cross section integrated over $E_e$ and $\Omega_e$. For collisions in
which the scattered electron is detected in the forward direction, the momentum transfer to the target is small 
and it is highly probable that the ejected-electron energy is very small. As a consequence, the ionization cross 
section for forward scattering is largest at the energy-loss threshold for ionization of each inner-shell electron 
and decreases monotonically, approximately as the inverse square of the energy loss. The basic appearance of 
each feature in the inner-shell electron energy-loss spectrum is that of a sawtooth that rises sharply at the 
low-energy 'edge' and falls slowly over many tens of electron volts. In addition, sharp structures may appear near 
the edge (figure B1.6.9). 
Figure B1.6.9 Energy-loss spectrum of La$_2$O$_3$ showing O K and La M$_{45}$ ionization edges with prominent 
'white line' resonances at the La edge [11]. 

In contrast to the broadly distributed valence-shell electron density in molecules and solids, inner-shell 
electron density is localized on a single atom. Molecular configuration and solid-state crystal structure have 
little effect upon inner-shell ionization energies. Each ionization edge in the energy-loss spectrum is 
characteristic of a particular type of atom; consequently, ISEELS has become an important technique for 
qualitative and quantitative elemental analysis. An especially useful elaboration of this technique is carried 
out in the transmission electron microscope where energy-loss analysis of electrons passing through a thin 
sample makes elemental analysis possible with the spatial resolution of the microscope [12]. A precision of 
the order of 100 ppm with nanometre spatial resolution has been achieved; this is close to single-atom 
sensitivity. A difficulty with ISEELS for analytical purposes is that each ionization edge is superimposed on 
the continuum associated with lower-energy ionizations of all the atoms in a sample. This continuum 
comprises an intense background signal which must be subtracted if the intensity at a characteristic edge is to 
be used as a quantitative measure of the concentration of a particular element in a sample. 
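
The text does not say how the background is removed. One common choice, assumed here purely for illustration, is to fit a power law A·E^(-r) to a window just below the edge and subtract its extrapolation. The sketch below does this on a synthetic spectrum; the function names, fit window and all numbers are invented for the example.

```python
import math

def fit_power_law(energies, counts):
    """Fit counts = A * E**(-r) by linear least squares in log-log space."""
    xs = [math.log(e) for e in energies]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    amplitude = math.exp(mean_y - slope * mean_x)
    return amplitude, -slope  # (A, r)

def subtract_background(energies, counts, fit_lo, fit_hi):
    """Fit the pre-edge window [fit_lo, fit_hi] and subtract the extrapolation."""
    window = [(e, c) for e, c in zip(energies, counts) if fit_lo <= e <= fit_hi]
    amplitude, exponent = fit_power_law([e for e, _ in window],
                                        [c for _, c in window])
    return [c - amplitude * e ** (-exponent) for e, c in zip(energies, counts)]

# Synthetic spectrum (invented numbers): a power-law background plus a
# sawtooth edge at 532 eV that falls as the inverse square of the energy loss.
EDGE_EV = 532.0
energies = [450.0 + 2.0 * i for i in range(101)]
counts = [1.0e9 * e ** -3.0 + (50.0 * (EDGE_EV / e) ** 2 if e >= EDGE_EV else 0.0)
          for e in energies]
net = subtract_background(energies, counts, 450.0, 520.0)
print(round(net[energies.index(EDGE_EV)], 3))  # net edge height after subtraction
```

In a real spectrum the fit window has to be chosen so that it contains no structure from lower-energy edges; the quality of the extrapolation limits the accuracy of the quantitative analysis.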


Electrons arising from near-threshold inner-shell ionization have very little kinetic energy as they pass 
through the valence shell and may become trapped behind an angular momentum barrier in the exterior 
atomic potential, much as do low-energy incident electrons. This phenomenon, a wave-mechanical resonance 
as described above (section B1.6.2.6), gives rise to structure in the vicinity of an ionization edge in the 
energy-loss spectrum. For isolated molecular targets in the gas phase, the energies of near-edge resonances 
(relative to threshold) can be correlated with the eigenenergies of low-lying, unoccupied molecular orbitals 
(relative to the first ionization energy). Resonances in solid-state targets are especially prominent for fourth- 
and fifth-period elements where sharp threshold peaks known as 'white lines' are associated with electrons 
being trapped in vacant d and f bands (see figure B1.6.9). Resonance peaks have the effect of concentrating 
transition intensity into a narrow band of energies, thereby increasing the analytical sensitivity for these 
elements. Near-edge structure, being essentially a valence-shell or valence-band phenomenon, can provide 
important spectroscopic information about the chemical environment of the atoms in a sample. 
B1.6.3.3 REFLECTED-ELECTRON ENERGY-LOSS SPECTROSCOPY 


Vibrational spectroscopy of atoms and molecules near or on the surface of a solid has become an essential 
tool for the microscopic description of surface processes such as catalysis and corrosion. The effect of a 
surface on bonding is sensitively reflected by the frequencies of vibrational motions. Furthermore, since 
vibrational selection rules are determined by molecular symmetry that in turn is profoundly modified by the 
presence of a surface, it is frequently possible to describe with great accuracy the orientation of molecular 
adsorbates and the symmetry of adsorption sites from a comparison of spectral intensities for surface-bound 
molecules to those for free molecules [13]. Electron spectroscopy has an advantage over optical methods for 
studying surfaces since electrons with energies up to several hundred electron volts penetrate only one or two 
atomic layers in a solid before being reflected, while the dimension probed by photons is of the order of the 
wavelength. Reflected-electron energy-loss spectroscopy (REELS) applied to the study of vibrational motion 
on surfaces represents the most highly developed technology of electron spectroscopy [14]. Incident electron 
energies are typically between 1 and 10 eV and sensitivity as low as a few per cent of a monolayer is routinely 
achieved (figure B1.6.10). 
Figure B1.6.10 Energy-loss spectrum of 3.5 eV electrons specularly reflected from benzene adsorbed on the 
rhodium(111) surface [15]. Excitation of C-H vibrational modes appears at 100, 140 and 372 meV. Only 
modes with a changing electric dipole perpendicular to the surface are allowed for excitation in specular 
reflection. The great intensity of the out-of-plane C-H bending mode at 100 meV confirms that the plane of 
the molecule is parallel to the metal surface. Transitions at 43, 68 and 176 meV are associated with Rh-C and 
C-C vibrations. 

B1.6.3.4 ELECTRON TRANSMISSION SPECTROSCOPY 

An important feature of low-energy electron scattering is the formation of temporary negative ions by the 
resonant capture of incident electrons (see B1.6.2.6, above). These processes lead to sharp enhancements of 
the elastic-scattering cross section and often dominate the behaviour of the cross section for inelastic 
processes with thresholds lying close to the energy of a resonance [16]. Elastic-electron-scattering resonances 
are observed by electron transmission spectroscopy. Two types of resonance are distinguished: shape 
resonances and Feshbach resonances. Shape resonances arise when an electron is temporarily trapped in a 
well created in the 'shape' of the electron-target potential by a centrifugal barrier (figure B1.6.7). Feshbach 
resonances involve the simultaneous trapping of the projectile 




π-electron systems. In order to emphasize the abrupt change in cross section characteristic of a resonance, the 
spectra are presented as the first derivative of the transmitted current as a function of incident-electron energy. 
This is accomplished by modulating the incident-electron energy and detecting the modulated component of 
the transmitted current. An example is shown in figure B1.6.11. Feshbach resonances fall in the 0 to about 30 
eV range. They are found in electron scattering from atoms, but rarely for molecules. The study of these 
resonances has contributed to the understanding of optically inaccessible excited states of atoms and ions. 
Figure B1.6.11 Electron transmission spectrum of 1,3-cyclohexadiene presented as the derivative of 
transmitted electron current as a function of the incident electron energy [17]. The prominent resonances 
correspond to electron capture into the two unoccupied, antibonding $\pi^*$-orbitals. The $\pi_3^*$ negative ion state is 
sufficiently long lived that discrete vibronic components can be resolved. 

B1.6.3.5 DIPOLE (E, 2E) SPECTROSCOPY 

The information from energy-loss measurements of transitions into the continuum, that is, ionizing 
excitations, is significantly diminished because the energy of the ionized electron is not known. The problem 
can be overcome by 
measuring simultaneously the energies of the scattered and ejected electrons. This is known as the (e, 2e) 
technique — the nomenclature is borrowed from nuclear physics to refer to a reaction with one free electron in 
the initial state and two in the final state. For spectroscopic purposes the experiment is carried out in the 
dipole scattering regime (see section B1.6.2.4). Two analyser/detector systems are used: one in the forward 
direction detects fast scattered electrons and the second detects slow electrons ejected at a large angle to the 
incident-electron direction (typically at the 'magic angle' of 54.7°). In order to ensure that pairs of electrons 
originate from the same ionizing collision, the electronics are arranged to record only those events in which a 
scattered and ejected electron are detected in coincidence (see B1.11, 'coincidence techniques'). The 
ionization energy, or binding energy, is unambiguously given by the difference between the incident electron 
energy and the sum of the energies of the scattered and ejected electrons detected in coincidence. Dipole (e, 
2e) spectra (figure B1.6.12) are analogous to photoabsorption or photoelectron spectra obtained with tunable 
UV or x-ray sources. 
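
The energy bookkeeping of a coincidence event can be sketched in a few lines. This is only an illustration of the relation stated above (binding energy = incident energy minus the sum of the two outgoing energies); the function name and the sample energies are hypothetical and are not taken from reference [18].

```python
def binding_energy_ev(incident_ev, scattered_ev, ejected_ev):
    """Binding (ionization) energy from one (e, 2e) coincidence event, in eV."""
    if min(incident_ev, scattered_ev, ejected_ev) < 0:
        raise ValueError("electron energies must be non-negative")
    binding = incident_ev - (scattered_ev + ejected_ev)
    if binding < 0:
        raise ValueError("outgoing electrons carry more energy than the incident one")
    return binding

# Hypothetical event: 3500 eV incident electron, 40 eV energy loss,
# slow electron detected in coincidence with 28.8 eV.
print(round(binding_energy_ev(3500.0, 3460.0, 28.8), 3))
```

Scanning the two analysers so that the energy loss (incident minus scattered energy) stays fixed, while the ejected-electron energy varies, maps out the binding-energy spectrum at a single equivalent photon energy.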
Figure B1.6.12 Ionization-energy spectrum of carbonyl sulphide obtained by dipole (e, 2e) spectroscopy [18]. 
The incident-electron energy was 3.5 keV, the scattered incident electron was detected in the forward 
direction and the ejected (ionized) electron detected in coincidence at 54.7° (angular anisotropies cancel at 
this 'magic angle'). The energy of the two outgoing electrons was scanned keeping the net energy loss fixed at 
40 eV so that the spectrum is essentially identical to the 40 eV photoabsorption spectrum. Peaks are identified 
with ionization of valence electrons from the indicated molecular orbitals. 
B1.7 Mass spectrometry 

Paul M Mayer 


B1.7.1 INTRODUCTION 

Mass spectrometry is one of the most versatile methods discussed in this encyclopedia. Ask a chemist 
involved in synthesis about mass spectrometry and they will answer that it is one of their most useful tools for 
identifying reaction products. An analytical chemist will indicate that mass spectrometry is one of the most 
sensitive detectors available for quantitative and qualitative analysis and is especially powerful when coupled 
to a separation technique such as gas chromatography. A physicist may note that high resolution mass 
spectrometry has been responsible for the accurate determination of the atomic masses listed in the periodic 
table. Biologists use mass spectrometry to identify high molecular weight proteins and nucleic acids and even 
for sequencing peptides. Materials scientists use mass spectrometry for characterizing the composition and 
properties of polymers and metal surfaces. 

The mass spectrometer tends to be a passive instrument in these applications, used to record mass spectra. In 
chemical physics and physical chemistry, however, the mass spectrometer takes on a dynamic function as a 
tool for the investigation of the physico-chemical properties of atoms, molecules and ions. It is this latter 
application that is the subject of this chapter, and it is hoped that it will bring the reader to a new 
understanding of the utility of mass spectrometry in their research. 

The chapter is divided into sections, one for each general class of mass spectrometer: magnetic sector, 
quadrupole, time-of-flight and ion cyclotron resonance. The experiments performed by each are quite often 
unique and so have been discussed separately under each heading. 


B1.7.2 ION SOURCES 

A common feature of all mass spectrometers is the need to generate ions. Over the years a variety of ion 
sources have been developed. The physical chemistry and chemical physics communities have generally 
worked on gaseous and/or relatively volatile samples and thus have relied extensively on the two traditional 
ionization methods, electron ionization (EI) and photoionization (PI). Other ionization sources, developed 
principally for analytical work, have recently started to be used in physical chemistry research. These include 
fast-atom bombardment (FAB), matrix-assisted laser desorption ionization (MALDI) and electrospray 
ionization (ES). 

B1.7.2.1 ELECTRON IONIZATION (EI) 

A schematic diagram of an electron ionization (EI) ion source is shown in figure B1.7.1. A typical source will 
consist of a block, filament, trap electrode, repeller electrode, acceleration region and a focusing lens. Sample 
vapour, introduced into the ion source (held at the operating potential of the instrument) through a variable 
leak valve or capillary interface, is ionized by electrons that have been accelerated towards the block by a 
potential gradient and collected at the trap electrode. A repeller electrode nudges the newly formed ions out of 
the source through an exit slit, 
and they are accelerated to the operating kinetic energy of the instrument. A series of ion lenses is used to 
focus the ion beam onto the entrance aperture of the mass spectrometer. 
Figure B1.7.1. Schematic diagram of an electron ionization ion source: source block (1); filament (2); trap 
electrode (3); repeller electrode (4); acceleration region (5); focusing lens (6). 


Ionization with energetic electrons does not deposit a fixed amount of energy into a molecule. Rather, a 10 eV 
beam of electrons can deposit anywhere from 0 to 10 eV of energy. For this reason, most instruments relying 
on electron ionization (often referred to as 'electron impact' ionization, a misleading expression as no actual 
collision between a molecule and an electron takes place) use electron beams of fairly high energy. Most 
analytical instruments employ electron energies of ~70 eV as it has been found that a maximum in the ion 
yield for most organic molecules occurs around this value. The resulting ion internal energies can be 
described by the Wannier threshold law [1], but at an electron energy of 70 eV this corresponds almost 
exactly to the photoelectron spectrum. Superimposed on this is the internal energy distribution of the neutral 
molecules prior to ionization. Since most ion sources operate at very low pressure (~10 -7 to 10 Torr), the 
resulting ion population has a non-Boltzmann distribution of internal energies and thus it is difficult to discuss 
the resulting ion chemistry in terms of a thermodynamic temperature. 

It is fairly difficult to obtain energy-selected beams of electrons (see chapter B1.7). Thus, electron beams 
employed in most mass spectrometers have broad energy distributions. One advantage of EI over 
photoionization, though, is that it is relatively simple to produce high energy electrons. All that is required is 
the appropriate potential drop between the filament and the ion source block. This potential drop is also 
continuously adjustable and the resulting electron flux is often independent of energy. 

B1.7.2.2 PHOTOIONIZATION (PI) 

Photoionization with photons of selected energies can be a precise method for generating ions with known 
internal energies, but because it is bound to a continuum process (the ejected electron can take on any energy), 
ions are usually generated in a distribution of internal energy states, according to the equation 
$h\nu + E_{\mathrm{therm}} = \mathrm{IE}_{\mathrm{AB}} + E_{\mathrm{AB}^+} + E_{\mathrm{e}}$, where $h\nu$ is the photon energy, $E_{\mathrm{therm}}$ is the average thermal energy of the 
molecule AB, $\mathrm{IE}_{\mathrm{AB}}$ is the ionization energy of the molecule, $E_{\mathrm{AB}^+}$ is the ion internal energy and $E_{\mathrm{e}}$ is 
the kinetic energy of the departing electron. The deposition 
of energy into the ion typically follows the photoelectron spectrum up to the photon energy. Superimposed on 
this is the internal energy distribution of the original neutral molecules. 
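
As a small worked illustration of this energy balance (photon energy plus thermal energy equals ionization energy plus ion internal energy plus electron kinetic energy), the sketch below solves for the ion internal energy. The function names and the sample values are hypothetical.

```python
def ion_internal_energy_ev(photon_ev, thermal_ev, ionization_ev, electron_ke_ev):
    """Ion internal energy from h*nu + E_therm = IE + E_ion + E_e (all in eV)."""
    e_ion = photon_ev + thermal_ev - ionization_ev - electron_ke_ev
    if e_ion < 0:
        raise ValueError("electron kinetic energy exceeds the available energy")
    return e_ion

def max_ion_internal_energy_ev(photon_ev, thermal_ev, ionization_ev):
    """Upper limit, reached when the departing electron has zero kinetic energy."""
    return ion_internal_energy_ev(photon_ev, thermal_ev, ionization_ev, 0.0)

# Hypothetical molecule: IE = 9.5 eV, 0.1 eV of thermal energy, 12.0 eV photon.
print(round(ion_internal_energy_ev(12.0, 0.1, 9.5, 1.0), 3))
print(round(max_ion_internal_energy_ev(12.0, 0.1, 9.5), 3))
```

Because the electron can carry off any kinetic energy between zero and the maximum available, a single photon energy produces ions across the whole range up to the second value; selecting only events with near-zero-energy electrons narrows that range.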

There are three basic light sources used in mass spectrometry: the discharge lamp, the laser and the 
synchrotron light source. Since ionization of an organic molecule typically requires more than 9 or 10 eV, 
light sources for photoionization must generate photons in the vacuum-ultraviolet region of the 
electromagnetic spectrum. A common experimental difficulty with any of these methods is that there can be 
no optical windows or lenses, the light source being directly connected to the vacuum chamber holding the 
ion source and mass spectrometer. This produces a need for large capacity vacuum pumping to keep the mass 
spectrometer at operating pressures. Multiphoton ionization with laser light in the visible region of the 
spectrum overcomes this difficulty. 

B1.7.2.3 CHEMICAL IONIZATION 

A third method for generating ions in mass spectrometers that has been used extensively in physical chemistry 
is chemical ionization (CI) [2]. Chemical ionization can involve the transfer of an electron (charge transfer), 
proton (or other positively charged ion) or hydride anion (or other anion). 


$$\mathrm{R^{+\bullet} + M \rightarrow R + M^{+\bullet}}$$

$$\mathrm{RH^{+} + M \rightarrow R + MH^{+}}$$

$$\mathrm{RH^{-} + M \rightarrow R + MH^{-}}.$$

The above CI reactions will occur if they are exothermic. In order for these reactions to occur with high 
efficiency, the pressure in the ion source must be raised to the milliTorr level. Also, the reagent species are 
often introduced in large excess so that they are preferentially ionized by the electron beam. 

B1.7.2.4 OTHER IONIZATION METHODS 

One feature common to all of the above ionization methods is the need to thermally volatilize liquid and solid 
samples into the ion source. This presents a problem for large and/or involatile samples which may 
decompose upon heating. Ionization techniques that have been developed to get around this problem include 
fast-atom bombardment (FAB) [3], matrix-assisted laser desorption ionization (MALDI) [4] and electrospray 
ionization (ES) [5] ( figure Bl.7.2 ). FAB involves bombarding a sample that has been dissolved in a matrix 
such as glycerol with a high energy beam of atoms. Sample molecules that have been protonated by the 
glycerol matrix are sputtered off the probe tip, resulting in gas-phase ions. If high energy ions are used to 
desorb the sample, the technique is called SIMS (secondary ion mass spectrometry). MALDI involves 
ablating a sample with a laser. A matrix absorbs the laser light, resulting in a plume of ejected material, 
usually containing molecular ions or protonated molecules. In electrospray, ions are formed in solution by 
adding protons or other ions to molecules. The solution is sprayed through a fine capillary held at a high 
potential relative to ground (several kV are common). The sprayed solution consists of tiny droplets that 
evaporate, leaving gas-phase adduct ions which are then introduced into a mass spectrometer for analysis. 


Figure B1.7.2. Schematic representations of alternative ionization methods to EI and PI: (a) fast-atom 
bombardment, in which a beam of keV atoms desorbs solute from a matrix; (b) matrix-assisted laser desorption 
ionization; and (c) electrospray ionization. 

B1.7.2.5 MOLECULAR BEAM SOURCES 

Sample can be introduced into the ion source in the form of a molecular beam [6, 7] (figure B1.7.3). 
Molecular beams are most often coupled to time-of-flight instruments for reasons that are discussed in section 
B1.7.5. The important advantage that molecular beams have over the other methods discussed in this section 
is their ability to cool the internal degrees of freedom of the sample. Collisions between a carrier gas (such as 
helium or argon) and the sample molecule in the rapidly expanding gas mixture result in rotational and 
vibrational cooling. Using this approach, the effective internal 'temperature' of the sample can be 
significantly less than ambient. One example of the benefits of using molecular beams is in photoionization. 
The photon energy can be more readily equated to the ion internal energy if the initial internal energy 
distribution of the neutral molecule is close to 0 K. This cooling also allows weakly bound species such as 
neutral clusters to be generated and their resulting PI or EI mass spectrum obtained. 
Figure B1.7.3. Schematic diagram of a molecular beam generator: nozzle (1); expansion region (2); skimmer 
(3) and molecular beam (4). 

B1.7.2.6 HIGH PRESSURE SOURCES 

There are two significant differences between a high pressure ion source and a conventional mass 
spectrometer ion source. High pressure sources typically have only two very small orifices (aside from the 
sample inlet), one to permit ionizing electrons into the source and one to permit ions to leave. Total ion source 
pressures of up to 5-10 Torr can be obtained allowing the sample vapour to reach thermal equilibrium with 
the walls of the source. Because the pressures in the source are large, a 70 eV electron beam is insufficient to 
effectively penetrate and ionize the sample. Rather, electron guns are used to generate electron translational 
energies of up to 1-2 keV. 


B1.7.3 MAGNETIC SECTOR INSTRUMENTS 

The first mass spectrometers to be widely used as both analytical and physical chemistry instruments were 
based on the deflection of a beam of ions by a magnetic field, a method first employed by J J Thomson in 
1913 [8] for separating isotopes of noble gas ions. Modern magnetic sector mass spectrometers usually consist 
of both magnetic and electrostatic sectors, providing both momentum and kinetic energy selection. The term 
'double-focusing' mass spectrometer refers to such a configuration and relates to the fact that the ion beam is 
focused at two places between the ion source of the instrument and the detector. It is also possible to add 
sectors to make three-, four-, five- and even six-sector instruments, though the larger of these are typically 
used for large molecule analysis. One of the staple instruments used in physical chemistry has been the 
reverse-geometry tandem sector mass spectrometer ('BE' configuration), which will be described below. The 
basic principles apply to any magnetic sector instrument configuration. 


B1.7.3.1 INSTRUMENTATION 


A schematic diagram of a reverse geometry mass spectrometer is shown in figure B1.7.4. 

Figure B1.7.4. Schematic diagram of a reverse geometry (BE) magnetic sector mass spectrometer: ion source 
(1); focusing lens (2); magnetic sector (3); field-free region (4); beam resolving slits (5); electrostatic sector 
(6); electron multiplier detector (7). Second field-free region components: collision cells (8) and beam 
deflection electrodes (9). 

(A) THE MAGNETIC SECTOR 

The magnetic sector consists of two parallel electromagnets surrounding an iron core. The ion beam travels 
through the flight tube perpendicular to the direction of the imposed magnetic field. The path of an ion 
travelling orthogonal to a magnetic field is described by a simple mathematical relationship: 


$$r = \frac{mv}{Bze} \tag{B1.7.1}$$


where r is the radius of curvature of the path of the ion, m is the ion's mass, v is the velocity, B is the magnetic 
field strength, z is the number of charges on the ion and e is the unit of elementary charge. Rearranged, 


$$mv = rBze. \tag{B1.7.2}$$


Most instruments are configured with a fixed value for the radius of curvature, r, so changing the value of B 
selectively passes ions of particular values of momentum, mv, through the magnetic sector. Thus, it is really 
the momentum that is selected by a magnetic sector, not mass. We can convert this expression to one 
involving the accelerating potential. 


Magnetic sector instruments typically operate with ion sources held at a potential of between 6 and 10 kV. 
This results in ions with keV translational kinetic energies. The ion kinetic energy can be written as 
$zeV = \tfrac{1}{2}mv^2$ and thus the ion velocity is given by the relationship 

$$v = \left(\frac{2zeV}{m}\right)^{1/2}$$

and equation (B1.7.2) becomes 

$$\frac{m}{z} = \frac{r^2B^2e}{2V}. \tag{B1.7.3}$$

In other words, ions with a particular mass-to-charge ratio, m/z, can be selectively passed through the 
magnetic sector by appropriate choice of a value of V and B (though normally V is held constant and only B is 
varied). 
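
The relation m/z = r²B²e/(2V) can be turned into a short numerical sketch. The sector radius used below is a hypothetical value chosen only for illustration; m/z is expressed in atomic mass units per elementary charge, and the constants are the SI defining and CODATA values.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
AMU = 1.66053906660e-27      # atomic mass constant, kg

def field_for_mz(mz, radius_m, accel_v):
    """Magnetic field (tesla) that passes ions of a given m/z (u per charge)."""
    return math.sqrt(2.0 * accel_v * mz * AMU / E_CHARGE) / radius_m

def mz_transmitted(field_t, radius_m, accel_v):
    """m/z (u per charge) transmitted at a given field: m/z = r^2 B^2 e / (2V)."""
    return (radius_m * field_t) ** 2 * E_CHARGE / (2.0 * accel_v * AMU)

# Hypothetical geometry: 0.30 m radius of curvature, 8 kV accelerating potential.
b_56 = field_for_mz(56.0, 0.30, 8000.0)
print(f"B for m/z 56: {b_56:.4f} T")
print(f"m/z passed at that field: {mz_transmitted(b_56, 0.30, 8000.0):.2f}")
```

At fixed V and r the required field grows as the square root of m/z, so a linear scan of B produces a mass scale that is quadratic in B.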

Magnetic sectors can be used on their own, or in conjunction with energy analysers to form a tandem mass 
spectrometer. The unique features of the reverse geometry instrument are presented from this point. 

(B) THE FIELD-FREE REGION 

The momentum-selected ion beam passes through the field-free region (FFR) of the instrument on its way to 
the electrostatic sector. The FFR is the main experimental region of the magnetic sector mass spectrometer. 
Significant features of the FFR can be collision cells and ion beam deflection electrodes. One particular 
arrangement is shown in figure B1.7.4. A collision cell consists of a 2-3 cm long block of steel with a groove 
to pass the ion beam. A collision (target) gas can be introduced into the groove, prompting projectile-target 
gas collisions. The beam deflecting electrode assembly allows the ion beam to be deflected out of the beam 
path by the application of a potential difference across the assembly (see section B1.7.3.2). 

(C) THE ELECTROSTATIC SECTOR (ESA) 

The electrostatic sector consists of two curved parallel plates between which is applied a potential difference 
producing an electric field of strength E. Transmission of an ion through the sector is governed by the 
following relationship 

$$\tfrac{1}{2}mv^2 = zeV = \tfrac{1}{2}zeEr.$$

This relationship gives an expression similar to equation (B1.7.1): 

$$r = \frac{2V}{E} \tag{B1.7.4}$$


Adjusting the potential across the ESA plates allows ions of selected translational kinetic energy to pass 
through and be focused, at which point a detector assembly is present to monitor the ion flux. Note that in 
equation (B1.7.4) neither the mass nor charge of the ion is present. So, isobaric ions with one, two, three etc. 
charges, accelerated by a potential drop, V, will pass through the ESA at a common value of E. An ion with 
+1 charge accelerated across a potential difference of 8000 V will have 8 keV translational kinetic energy and 
be transmitted through the ESA by a field $E_1$. An ion with +2 charges will have 16 keV translational energy 
but will experience a field strength equivalent to $2E_1$ and so forth. 
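
The following sketch checks the statement numerically: the field needed for transmission depends only on the accelerating potential and the sector radius, so a +1 ion at 8 keV and a +2 ion at 16 keV pass at the same setting. The radius is a hypothetical value and the function names are illustrative.

```python
def esa_field_v_per_m(accel_v, radius_m):
    """Field for transmission from r = 2V/E, i.e. E = 2V/r (V per metre)."""
    return 2.0 * accel_v / radius_m

def esa_field_from_energy(kinetic_ev, charge_number, radius_m):
    """Same condition from (1/2)mv^2 = (1/2)zeEr, with kinetic energy in eV."""
    return 2.0 * kinetic_ev / (charge_number * radius_m)

RADIUS_M = 0.40  # hypothetical ESA radius
singly = esa_field_from_energy(8000.0, 1, RADIUS_M)    # +1 ion, 8 keV
doubly = esa_field_from_energy(16000.0, 2, RADIUS_M)   # +2 ion, 16 keV
print(singly, doubly, esa_field_v_per_m(8000.0, RADIUS_M))
```

The ESA therefore selects kinetic energy per unit charge, which is why it is combined with a momentum-selecting magnet when a mass-to-charge assignment is needed.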

B1.7.3.2 EXPERIMENTS USING MAGNETIC SECTOR INSTRUMENTS 

A single magnetic sector can be used as a mass filter for other apparatus. However, much more information than 
the simple mass spectrum of a species can be obtained using the tandem mass spectrometer. 

(A) MASS-ANALYSED ION KINETIC ENERGY SPECTROMETRY (MIKES) 

Ions accelerated out of the ion source with keV translational kinetic energies (and m/z selected with the 
magnetic sector) will arrive in the FFR of the instrument in several microseconds. Ions dissociating on this 

9 S 1 

timescale (with unimolecular decay rate constants between 10 and 10 s , depending on the physical 
geometry of the instrument) have been given the name 'metastable ions' [9]. 
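
The microsecond timescale follows directly from the ion velocity, v = (2zeV/m)^(1/2). The sketch below estimates the transit time for a hypothetical 1 m flight path from the source to the field-free region; the path length and the function names are assumptions made for illustration only.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
AMU = 1.66053906660e-27      # atomic mass constant, kg

def ion_speed_m_per_s(mass_u, charge_number, accel_v):
    """Speed of an ion accelerated from rest through accel_v volts."""
    return math.sqrt(2.0 * charge_number * E_CHARGE * accel_v / (mass_u * AMU))

def flight_time_s(mass_u, charge_number, accel_v, path_m):
    """Time to cover path_m at constant speed after acceleration."""
    return path_m / ion_speed_m_per_s(mass_u, charge_number, accel_v)

# Singly charged m/z 56 ion, 8 kV acceleration, hypothetical 1.0 m path.
t = flight_time_s(56.0, 1, 8000.0, 1.0)
print(f"{t * 1e6:.2f} microseconds")
```

As a rough guide only, ions whose unimolecular rate constants are of the order of the inverse of this transit time are the ones most likely to dissociate inside the field-free region.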

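In the FFR of the sector mass spectrometer, the unimolecular decomposition fragments, A$^+$ and B, of the mass 
selected metastable ion AB$^+$ will, by the conservation of energy and momentum, have lower translational 
kinetic energy, T, than their precursor. Since the fragments continue with essentially the velocity, v, of the precursor, 

$$T_{\mathrm{AB}^+} = \tfrac{1}{2}m_{\mathrm{AB}^+}v^2, \qquad T_{\mathrm{A}^+} = \tfrac{1}{2}m_{\mathrm{A}^+}v^2.$$

Thus we find 

$$T_{\mathrm{A}^+} = \frac{m_{\mathrm{A}^+}}{m_{\mathrm{AB}^+}}\,T_{\mathrm{AB}^+}.$$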

By scanning the ESA to pass ions with lower translational energies, the fragment ions will sequentially pass 
through to the detector (this is the so-called MS/MS, or MS$^2$, experiment). The final ion abundance kinetic 
energy spectrum (figure B1.7.5(a)) is converted to an ion abundance fragment m/z spectrum by the above 
relationships. The MIKE spectrum is the end result of all low energy unimolecular processes of the selected 
ions, including isomerization. Thus, isomeric ions which interconvert on the μs timescale often have closely 
related, if not identical, MIKE spectra. There 
are several characteristic peak shapes expected in a MIKE spectrum that are summarized in figure B1.7.6. 
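
The conversion from the kinetic-energy axis to a fragment m/z axis amounts to scaling by the precursor energy, since a fragment's translational energy is the precursor's energy multiplied by the ratio of fragment to precursor mass. A minimal sketch, assuming the fragment keeps the charge state of the precursor; the function names and the example fragment are illustrative.

```python
def fragment_energy_ev(precursor_mz, fragment_mz, precursor_energy_ev):
    """Translational energy of a fragment ion formed in the field-free region."""
    return precursor_energy_ev * fragment_mz / precursor_mz

def fragment_mz_from_energy(precursor_mz, precursor_energy_ev, observed_energy_ev):
    """Fragment m/z assigned to a peak observed at a given ESA energy."""
    return precursor_mz * observed_energy_ev / precursor_energy_ev

# m/z 56 precursor at 8 keV; where would an m/z 41 fragment appear?
energy_41 = fragment_energy_ev(56.0, 41.0, 8000.0)
print(f"{energy_41:.1f} eV")
print(f"{fragment_mz_from_energy(56.0, 8000.0, energy_41):.1f}")
```

The same scaling fixes the mass resolution of the experiment: any spread in fragment kinetic energy, for example from kinetic energy release, appears directly as peak width on the converted m/z axis.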
Figure B1.7.5. (a) MIKE spectrum of the unimolecular decomposition of 1-butene ions (m/z 56). This 
spectrum was obtained in the second field-free region of a reverse geometry magnetic sector mass 
spectrometer (VG ZAB-2HF). (b) Collision-induced dissociation mass spectrum of 1-butene ions. Helium 
target gas was used to achieve 10% beam reduction (single collision conditions) in the second field-free 
region of a reverse geometry magnetic sector mass spectrometer (VG ZAB-2HF). 







Figure B1.7.6. Fragment ion peak shapes expected in MIKE spectra: (a) typical Gaussian energy profile; (b) 
large average kinetic energy release causing z-axial discrimination of the fragment ions, resulting in a 
'dished-top' peak; (c) competing fragmentation channels, each with its own distinct kinetic energy release, producing 
a 'composite' peak; (d) fragmentation occurring from a dissociative excited state. 

(B) COLLISION-INDUCED DISSOCIATION (CID) MASS SPECTROMETRY 

A collision-induced dissociation (CID) mass spectrum [10, 11] of mass selected ions is obtained by 
introducing a target gas into a collision cell in one of the field-free regions. The resulting high energy (keV) 
CID mass spectrum, obtained and analysed in the same way as a MIKE spectrum, contains peaks due to ions 
formed in virtually all possible unimolecular dissociation processes of the precursor ion (figure B1.7.5(b)). 
The timescale of the collision-induced fragmentation reactions is quite different from the MIKE experiment, 
ranging from the time of the collision event ($t \approx 10^{-15}$ s) to the time the ions exit the FFR. For this reason, 
isomerization reactions tend not to play a significant role in collision-induced reactions and thus the CID mass 
spectra are often characteristic of ion connectivity. 

In collisional excitation, translational energy of the projectile ion is converted into internal energy. Since the 
excited states of the ions are quantized, so will the translational energy loss be. Under conditions of high 
energy resolution, it is 
possible to obtain a translational energy spectrum of the precursor ions exhibiting peaks that correspond to the 
formation of discrete excited states (translational energy spectroscopy [12]). 


It is possible in a sector instrument to perform a variety of other experiments on the projectile ions. Many 
involve examining the products of charge exchange with the target gas, while others allow neutral species to 
be studied. Some of the more common experiments are summarized in figure B1.7.7. All of the experiments 
have been described for projectile cations, but anions can be studied in an analogous manner. 
Figure B1.7.7. Summary of the other collision based experiments possible with magnetic sector instruments: 
(a) collision-induced dissociation ionization (CIDI) records the CID mass spectrum of the neutral fragments 
accompanying unimolecular dissociation; (b) charge stripping (CS) of the incident ion beam can be observed; 
(c) charge reversal (CR) requires the ESA polarity to be opposite that of the magnet; (d) neutralization-reionization 
(NR) probes the stability of transient neutrals formed when ions are neutralized by collisions in 
the first collision cell. Neutrals surviving to be collisionally reionized in the second cell are recorded as 
'recovery' ions in the NR mass spectrum. 

(C) KINETIC ENERGY RELEASE (KER) MEASUREMENTS 

In a unimolecular dissociation, excess product energy is typically distributed among the translational, 
rotational and vibrational modes in a statistical fashion. The experimentally observed phenomenon is the 
distribution of translational kinetic energies of the departing fragment ions (the kinetic energy release, KER) 
[9]. In magnetic sector instruments, the result of x-axial (i.e. along the beam path) KER is the observation of 
fragment ion peaks in the MIKE or CID spectra which have a broader kinetic energy spread than the precursor 
ion peak. In a CID, however, this spread is complicated by collisional scattering and so KER is most often 
discussed for peaks in MIKE spectra. If the mass 
spectrometer is operated under conditions of high resolution, obtained by narrowing the y-axis beam 
collimating slits throughout the instrument, the widths of the fragment ion peaks are indicative of this kinetic 
energy release. The measured value for the KER is typically expressed as the value at half-height of the 
fragment ion peak, $T_{0.5}$, and is calculated with the following equation 


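\[ T_{0.5}\,(\mathrm{in\ meV}) = \frac{m_{\mathrm{prec}}^{2}}{16\, m_{\mathrm{frag}}\, m_{\mathrm{neut}}} \cdot \frac{(\Delta V_{0.5,\mathrm{frag}})^{2} - (m_{\mathrm{frag}}/m_{\mathrm{prec}})^{2}\,(\Delta V_{0.5,\mathrm{prec}})^{2}}{(8)} \]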

where m is the mass of the various species, ΔV_0.5 is the full width energy spread of the fragment and precursor
ion peaks at half height and (8) represents the typically 8 keV translational kinetic energy of the precursor [9].
The resulting T_0.5 is in meV. Note that since knowledge of the internal energy distribution of the dissociating
ions is lacking, the relationship between T_0.5 and the average KER is strictly qualitative, i.e. a large T_0.5
indicates a large average KER value. How statistical the distribution of product excess energies is will depend
on the dynamics of the dissociation.


B1.7.4 QUADRUPOLE MASS FILTERS, QUADRUPOLE ION TRAPS 
AND THEIR APPLICATIONS 

Another approach to mass analysis is based on stable ion trajectories in quadrupole fields. The two most 
prominent members of this family of mass spectrometers are the quadrupole mass filter and the quadrupole 
ion trap. Quadrupole mass filters are among the most common mass spectrometers, being extensively used as
detectors in analytical instruments, especially gas chromatographs. The quadrupole ion trap (which also goes
by the names 'quadrupole ion store' (QUISTOR), Paul trap, or just ion trap) is fairly new to the physical
chemistry laboratory. Its early development was driven by its use as an inexpensive alternative to tandem
magnetic sector and quadrupole filter instruments for chemical analysis. It has, however, started to be used
more in the chemical physics and physical chemistry domains, and so it will be described in some detail in
this section.

The principles of operation of quadrupole mass spectrometers were first described in the late 1950s by
Wolfgang Paul, who shared the 1989 Nobel Prize in Physics for this development. The equations governing
the motion of an ion in a quadrupole field are quite complex and it is beyond the scope of the present article to
provide the reader with a complete treatment. Rather, the basic principles of operation will be described, the
reader being referred to several excellent sources for more complete information [13, 14 and 15].

B1.7.4.1 THE QUADRUPOLE MASS FILTER 

A schematic diagram of a quadrupole mass filter is shown in figure B1.7.8. In an ideal, three-dimensional,
quadrupole field, the potential φ at any point (x, y, z) within the field is described by equation (B1.7.5):

\[ \phi = \frac{\phi_0}{r_0^{2}}\,(a x^{2} + b y^{2} + c z^{2}) \tag{B1.7.5} \]

where φ_0 is the applied potential, r_0 is half the distance between the hyperbolic rods and a, b and c are
coefficients. The applied potential is a combination of a radio-frequency (RF) potential, V cos ωt, and a direct
current (DC) potential, U. The two can be expressed in the following relationship:

\[ \phi_0 = U - V\cos\omega t \]

where ω is the angular frequency of the RF field (in rad s^-1) and is 2π times the frequency in hertz. The
potential applied across the rods (see figure B1.7.8) is flipped at radio frequencies.

Figure B1.7.8. Quadrupole mass filter consisting of four cylindrical rods, spaced by a radius r (reproduced
with permission of Professor R March, Trent University, Peterborough, ON, Canada).

Ideally, the rods in a quadrupole mass filter should have a hyperbolic geometry, but more common is a set of
four cylindrical rods separated by a distance 2r, where r ≈ 1.16r_0 (figure B1.7.8). This arrangement provides
for an acceptable field geometry along the axis of the mass filter. Since there is no field along the axis of the
instrument, equation (B1.7.5) simplifies to φ = (φ_0/r_0^2)(ax^2 + by^2).

The values of a and b that satisfy this relationship (Laplace's equation requires a + b = 0) are a = 1 and
b = -1, so the potential inside the quadrupole mass filter takes on the form:

\[ \phi = \frac{\phi_0}{r_0^{2}}\,(x^{2} - y^{2}) \]

The equations of motion within such a field are given by:

\[ \frac{d^{2}x}{dt^{2}} + \frac{2e}{m r_0^{2}}\,(U - V\cos\omega t)\,x = 0 \]

\[ \frac{d^{2}y}{dt^{2}} - \frac{2e}{m r_0^{2}}\,(U - V\cos\omega t)\,y = 0. \]

Now, we can make three useful substitutions

\[ a_u = a_x = -a_y = \frac{8eU}{m\omega^{2} r_0^{2}}, \qquad q_u = q_x = -q_y = \frac{4eV}{m\omega^{2} r_0^{2}}, \qquad \xi = \frac{\omega t}{2} \]

and the result is in the general form of the Mathieu equation [16]:

\[ \frac{d^{2}u}{d\xi^{2}} + (a_u - 2q_u\cos 2\xi)\,u = 0. \tag{B1.7.6} \]


Equation (B1.7.6) describes the ion trajectories in the quadrupole field (where u can be either x or y). The
stable, bounded solutions to these equations represent conditions of stable, bounded trajectories in the
quadrupole mass filter. A diagram representing the stable solutions to the equations for both the x- and y-axes
(really the intersection of two sets of stability diagrams, one for the x-axis, one for the y-axis) is shown in
figure B1.7.9(a). This figure represents the stability region closest to the axis of the instrument and is the most
appropriate for the operation of the quadrupole. This stability region is also unique for a given m/z ratio.
Figure B1.7.9(b) represents the stability regions (transformed into axes of the applied DC and RF potential) for
a series of ions with different m/z values. The line running through the apex of each region is called the
operating line and represents the conditions (U and V) for the selective filtering of different mass ions through
the instrument. The ratio of U to V is a constant along the operating line. It is apparent from figure B1.7.9(b)
that the resolution of the quadrupole mass filter can be altered by changing the slope of the operating line. A
greater slope means there is greater separation of the ions, but at the expense of sensitivity (fewer ions will
have stable trajectories).

Figure B1.7.9. (a) Stability diagram for ions near the central axis of a quadrupole mass filter. Stable
trajectories occur only if the a_u and q_u values lie beneath the curve. (b) Stability diagram (now as a function of
U and V) for six ions with different masses. The straight line running through the apex of each set of curves is
the 'operating' line, and corresponds to values of U/V that will produce mass resolution (reproduced with
permission of Professor R March, Trent University, Peterborough, ON, Canada).
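
The stability argument can be checked numerically. The following is a minimal sketch, not the method used to construct figure B1.7.9: it integrates the Mathieu equation d^2u/dξ^2 + (a_u - 2q_u cos 2ξ)u = 0 with a fixed-step Runge-Kutta scheme and treats a trajectory as stable if its amplitude stays below an arbitrary limit. The function names, the step size, the number of RF cycles and the amplitude limit are all assumptions made for illustration.

```python
import math


def max_amplitude(a, q, n_cycles=100, steps_per_cycle=300):
    """Integrate u'' + (a - 2 q cos 2xi) u = 0 from u = 1, u' = 0 (RK4).

    Returns the largest |u| reached over n_cycles RF cycles.
    One RF cycle corresponds to an interval of pi in xi (xi = omega t / 2).
    """
    h = math.pi / steps_per_cycle
    xi, u, v = 0.0, 1.0, 0.0
    peak = abs(u)

    def accel(x, y):
        return -(a - 2.0 * q * math.cos(2.0 * x)) * y

    for _ in range(n_cycles * steps_per_cycle):
        k1u, k1v = v, accel(xi, u)
        k2u, k2v = v + 0.5 * h * k1v, accel(xi + 0.5 * h, u + 0.5 * h * k1u)
        k3u, k3v = v + 0.5 * h * k2v, accel(xi + 0.5 * h, u + 0.5 * h * k2u)
        k4u, k4v = v + h * k3v, accel(xi + h, u + h * k3u)
        u += h * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        xi += h
        peak = max(peak, abs(u))
        if peak > 1e6:          # clearly unbounded; stop early
            break
    return peak


def transmitted(a, q, limit=1e3):
    """True if both the x and the y motion stay bounded (assumed limit).

    The x motion uses (a, q); the y motion uses (-a, -q), because
    a_x = -a_y and q_x = -q_y.
    """
    return max_amplitude(a, q) < limit and max_amplitude(-a, -q) < limit
```

With these assumed settings, a point well inside the first stability region such as (a, q) = (0.1, 0.6) should be reported as transmitted, while raising a to 0.3 at the same q, or raising q to 1.0 at a = 0, should not. The amplitude limit is a crude stand-in for the rod spacing, so points very close to the stability boundary may be misclassified.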


For the filter to be effective, the ions traversing the instrument must experience several RF cycles. Thus,
unlike magnetic sector instruments, the ions formed in the ion source of a quadrupole mass filter apparatus are
accelerated to only a few eV kinetic energy (typically 5-10 eV). The timescale of the experiment is therefore
much longer than for magnetic sectors, ions having to survive milliseconds rather than microseconds to be
detected.

B1.7.4.2 EXPERIMENTS USING QUADRUPOLE MASS FILTERS 

Aside from the single mass filter, the most common configuration for quadrupole mass spectrometers is the
triple-quadrupole instrument. This is the simplest tandem mass spectrometer using quadrupole mass filters.
Typically, the first and last quadrupoles are operated in mass selective mode as described above. The central
quadrupole is usually an RF-only quadrupole. The lack of a DC voltage means that ions of all m/z values will
have stable trajectories through the filter.

(A) COLLISION-INDUCED DISSOCIATION 

Adding a collision gas to the RF-only quadrupole of a triple-quadrupole instrument permits collision-induced
dissociation experiments to be performed. Unlike magnetic sector instruments, the low accelerating potential
in quadrupole instruments means that low energy collisions occur. In addition, the time taken by the ions to
traverse the RF-only quadrupole results in many of these low energy collisions taking place. Collisional
excitation therefore occurs in a multi-step process, rather than in a single, high-energy process as in magnetic
sector instruments.

(B) REACTIVE COLLISIONS 

Since ions analysed with a quadrupole instrument have low translational kinetic energies, it is possible for
them to undergo bimolecular reactions with species inside an RF-only quadrupole. These bimolecular
reactions are often useful for the structural characterization of isomeric species. An example of this is the
work of Harrison and co-workers [17]. They probed the reactions of [CH3NH2]+• ions with isomeric butenes and
pentenes in the RF-only quadrupole collision cell of a hybrid BEqQ instrument. The mass spectra of the
products of the ion-molecule reactions were distinct for the various isomers probed. Addition of the amine to
the olefin followed by fragmentation produced characteristic iminium ions only for terminal olefins without
substitution at the olefinic carbons (figure B1.7.10).

Figure B1.7.10. Three mass spectra showing the results of reactive collisions between a projectile ion
[CH3NH2]+• and three isomeric butenes. (Taken from Usypchuk L L, Harrison A G and Wang J 1992 Reactive
collisions in quadrupole cells. Part 1. Reaction of [CH3NH2]+• with isomeric butenes and pentenes Org. Mass
Spectrom. 27 777-82. Copyright John Wiley & Sons Limited. Reproduced with permission.)

(C) EQUILIBRIUM MEASUREMENTS


It is possible to determine the equilibrium constant, K, for the bimolecular reaction involving gas-phase ions 
and neutral molecules in the ion source of a mass spectrometer [18]. These measurements have generally 
focused on three properties, proton affinity (or gas-phase basicity) [19, 20], gas-phase acidity [ 21 ] and 
solvation enthalpies (and free energies) [22, 23]: 


proton affinity:    B_1H^+ + B_2 ⇌ B_1 + B_2H^+
gas-phase acidity:  A_1H + A_2^- ⇌ A_1^- + A_2H
solvation:          M^+ + N ⇌ MN^+.

A common approach has been to measure the equilibrium constant, K, for these reactions as a function of
temperature with the use of a variable temperature high pressure ion source (see section (B1.7.2)). The ion
concentrations are approximated by their abundance in the mass spectrum, while the neutral concentrations
are known from the sample inlet pressure. A van't Hoff plot of ln K versus 1/T should yield a straight line
whose slope, -ΔH/R, gives the reaction enthalpy (figure B1.7.11). Combining the PA with a value for Δ_basicity G at
one temperature yields a value for ΔS for the half-reaction involving addition of a proton to a species. While
quadrupoles have been the instruments of choice for many of these studies, other mass spectrometers can act
as suitable detectors [19, 20].
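
As an illustration of the van't Hoff analysis, the sketch below fits ln K against 1/T by ordinary least squares and converts the slope (-ΔH/R) and intercept (ΔS/R) into an enthalpy and an entropy. The function name and the numbers in the usage example are hypothetical; they are not data from [19] or from figure B1.7.11.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def vant_hoff_fit(temps_kelvin, equilibrium_constants):
    """Least-squares fit of ln K = -dH/(R T) + dS/R.

    Returns (dH in J mol^-1, dS in J mol^-1 K^-1).
    """
    xs = [1.0 / t for t in temps_kelvin]
    ys = [math.log(k) for k in equilibrium_constants]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -R_GAS * slope, R_GAS * intercept


# Hypothetical, noise-free data generated from assumed values of dH and dS.
TRUE_DH, TRUE_DS = -50_000.0, -100.0
temps = [400.0, 450.0, 500.0, 550.0, 600.0]
k_values = [math.exp(-TRUE_DH / (R_GAS * t) + TRUE_DS / R_GAS) for t in temps]
fit_dh, fit_ds = vant_hoff_fit(temps, k_values)
```

With real data the scatter of the points about the line, and the limited temperature range available in the ion source, would determine how well ΔH and ΔS are defined.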


Figure B1.7.11. Van't Hoff plot for equilibrium data obtained for the reaction of isobutene with ammonia in a
high pressure ion source (reproduced from data in [19]).

(D) SELECTED ION FLOW TUBE

Another example of the use of quadrupole filters in studying reactive collisions of gaseous ions is the selected 
ion flow tube (SIFT) [24]. This has been perhaps the most widely employed instrument for studying the 
kinetics of bimolecular reactions involving ions. Its development by N G Adams and D Smith sprang from the 
utility of the flowing afterglow (FA) technique (developed in the early 1960s by Ferguson, Fehsenfeld and 
Schmeltekopf [25, 26]) in the study of atmospheric reactions [27].


A schematic diagram of a SIFT apparatus is shown in figure B1.7.12. The instrument consists of five basic
regions: the ion source, initial quadrupole mass filter, flow tube, second mass filter and finally the detector.
The heart of the instrument is the flow tube, which is a steel tube approximately 1 m long and 10 cm in
diameter. The pressure in the flow tube is kept of the order of 0.5 Torr, resulting in carrier gas flow velocities of
~100 m s^-1. Along the flow tube there are orifices that are used to introduce neutral reagents into the flow
stream. Product ions arriving at the end of the flow tube are skimmed through a small orifice and mass
analysed with a second quadrupole filter before being detected. The reactions occurring in the flow tube can
be monitored as a function of carrier gas flow rate, and hence timescale. A detailed description of the
extraction of rate constants from SIFT experiments is given by Smith and Adams [24]. Examples of the type
of information obtained with the SIFT technique can be found in a recent series of articles by D Smith and
co-workers [28, 29 and 30].

Figure B1.7.12. A schematic diagram of a typical selected-ion flow tube (SIFT) apparatus. (Smith D and Adams N
G 1988 The selected ion flow tube (SIFT): studies of ion-neutral reactions Advances in Atomic and Molecular
Physics vol 24, ed D Bates and B Bederson p 4. Copyright Academic Press, Inc. Reproduced with
permission.)
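
To give a feel for how a rate constant might be extracted from flow tube data, the sketch below assumes the simplest pseudo-first-order picture: the reactant ion signal decays as I = I_0 exp(-k[N]t), where [N] is the neutral reagent number density and t the reaction time. This model, the function name and the numbers are assumptions for illustration only; the full treatment, including corrections not considered here, is the one given in [24].

```python
import math


def rate_constant(neutral_densities, ion_signals, reaction_time):
    """Estimate k from the slope of ln(signal) versus neutral density.

    neutral_densities: reagent number densities (cm^-3)
    ion_signals: reactant ion count rates at each density
    reaction_time: assumed fixed reaction time (s)
    Returns k in cm^3 s^-1 under the pseudo-first-order assumption.
    """
    xs = list(neutral_densities)
    ys = [math.log(s) for s in ion_signals]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return -(sxy / sxx) / reaction_time


# Hypothetical example: 1 m reaction length at 100 m/s gives t = 0.01 s.
ASSUMED_K = 1.0e-9                       # cm^3 s^-1, invented for the demo
t_react = 1.0 / 100.0
densities = [0.0, 1e10, 2e10, 3e10, 4e10]
signals = [1e5 * math.exp(-ASSUMED_K * n * t_react) for n in densities]
k_fit = rate_constant(densities, signals, t_react)
```

In practice the reaction time is not simply the tube length divided by the carrier gas velocity, which is one reason the detailed analysis in [24] is needed.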

(E) ION-GUIDE INSTRUMENTS 

Another instrument used in physical chemistry research that employs quadrupole mass filters is the guided ion
beam mass spectrometer [31]. A schematic diagram of an example of this type of instrument is shown in
figure B1.7.13. A mass selected beam of ions is introduced into an ion guide, in which their translational
energy can be precisely controlled down to a fraction of an electron volt. Normally at these energies,
divergence of the ion beam would preclude the observation of any reaction products. To overcome this, the ions
are trapped in the beam path with either a quadrupole or octapole filter operated in RF-only mode (see above).
The trapping characteristics (i.e., how efficiently the ions are 'guided' along the flight path) of the octapole
filter are superior to those of the quadrupole filter and so octapoles are often employed in this type of
apparatus. The operational principles of an octapole mass filter are analogous to those of the quadrupole mass
filter described above.

Figure B1.7.13. A schematic diagram of an ion-guide mass spectrometer. (Ervin K M and Armentrout P B
1985 Translational energy dependence of Ar+ + XY → ArX+ + Y from thermal to 30 eV c.m. J. Chem. Phys.
83 166-89. Copyright American Institute of Physics Publishing. Reproduced with permission.)

Using a guided ion beam instrument, the translational energy dependent reaction cross sections of endothermic
fragmentation processes can be determined [32]. Modelling these cross sections ultimately yields their energy
thresholds and a great deal of valuable thermochemical information has been derived with this technique.
A precision of ±0.2 eV can be obtained for reaction thresholds. Bimolecular reactions can also be studied and
reaction enthalpies derived from the analysis of the cross section data.

B1.7.4.3 THE QUADRUPOLE ION TRAP

The quadrupole ion trap is the three-dimensional equivalent of the quadrupole mass filter. A typical geometry
consists of two hyperbolic endcap electrodes and a single ring electrode (figure B1.7.14). Unlike the
quadrupole mass filter, however, the ion trap can be used either as a mass selective device or as an ion storage
device. It is this latter ability that has led to the popularity and versatility of the ion trap. The theoretical
treatment of ion trajectories inside the ion trap is similar to that presented above for the mass filter, except that
now the field is no longer zero along the z-axis. It is convenient to use cylindrical coordinates rather than
Cartesian coordinates and the resulting relationship describing the potential inside the ion trap is given as:


\[ \phi = \frac{\phi_0}{r_0^{2}}\,(r^{2} - 2z^{2}) \]

where r_0 and z_0 are defined as in figure B1.7.14. The relationship r_0^2 = 2z_0^2 has usually governed the geometric
arrangement of the electrodes. The equations of motion for an ion in the ion trap are analogous to those for the
quadrupole mass filter:

\[ \frac{d^{2}r}{dt^{2}} + \frac{2e}{m r_0^{2}}\,(U - V\cos\omega t)\,r = 0 \]

\[ \frac{d^{2}z}{dt^{2}} - \frac{4e}{m r_0^{2}}\,(U - V\cos\omega t)\,z = 0 \]

and with the analogous substitutions,

\[ a_z = -2a_r = -\frac{16eU}{m\omega^{2} r_0^{2}}, \qquad q_z = -2q_r = -\frac{8eV}{m\omega^{2} r_0^{2}}, \qquad \xi = \frac{\omega t}{2} \]

the Mathieu equation is obtained.

Figure B1.7.14. Schematic cross-sectional diagram of a quadrupole ion trap mass spectrometer. The distance
between the two endcap electrodes is 2z_0, while the radius of the ring electrode is r_0 (reproduced with
permission of Professor R March, Trent University, Peterborough, ON, Canada).

The Mathieu equation for the quadrupole ion trap again has stable, bounded solutions corresponding to stable,
bounded trajectories inside the trap. The stability diagram for the ion trap is quite complex, but a subsection of
the diagram, corresponding to stable trajectories near the physical centre of the trap, is shown in figure
B1.7.15. The interpretation of the diagram is similar to that for the quadrupole mass filter.

Figure B1.7.15. Stability diagram for ions near the centre of a quadrupole ion trap mass spectrometer. The
enclosed area reflects values for a_z and q_z that result in stable trapping trajectories (reproduced with
permission of Professor R March, Trent University, Peterborough, ON, Canada).

B1.7.4.4 EXPERIMENTS USING QUADRUPOLE ION TRAPS 

The ion trap has three basic modes in which it can be operated. The first is as a mass filter. By adjusting U and
V to reside near the apex of the stability diagram in figure B1.7.15, only ions of a particular m/z ratio will be
selected by the trap. An operating line that intersects the stability diagram near the apex (as was done for the
mass filter) with a fixed U/V ratio describes the operation of the ion trap in a mass scanning mode. A second
mode of operation is with the potential φ_0 applied only to the ring electrode, the endcaps being grounded. This
allows ions to be selectively stored. The application of an extraction pulse to the endcap electrodes ejects ions
out of the trap for detection. A third mode is the addition of an endcap potential, -U. This mode permits mass
selective storage in the trap, followed by storage of all ions. This latter method permits the ion trap to be used
as a tandem mass spectrometer.

(A) ION ISOLATION 

One of the principal uses of the ion trap is as a tandem-in-time mass spectrometer. Ions with a particular m/z
ratio formed in the ion trap, or injected into the trap from an external source, can be isolated by resonantly
ejecting all other ions. This can be accomplished in a variety of ways. One method involves applying a
broad-band noise field between the endcap electrodes to resonantly excite (in the axial direction) and eject all
ions. The secular frequency of the ions to be stored is notched out of this noise field, leaving them in the trap.
In another method, ions with lower and higher m/z ratios can be ejected by adjusting the amplitudes of the RF
and DC potentials so that the (a, q) value for the ion of interest lies just below the apex in figure B1.7.15.

(B) COLLISION-INDUCED DISSOCIATION 

A unique aspect of the ion trap is that the trapping efficiency is significantly improved by the presence of
helium damping gas. Typical pressures of helium in the trap are ~1 mTorr. Collisions between the trapped
ions and helium gas effectively cause the ions to migrate towards the centre of the trap where the trapping
field is most perfect. This causes significant improvements in sensitivity in analytical instruments. The
presence of the damping gas also permits collision-induced dissociation to be performed. In the manner
described above, ions with a particular m/z ratio can be isolated in the trap. The axial component of the ion
motion is then excited resonantly by applying a potential across the endcap electrodes. The amplitude of this
potential is controlled to prevent resonant ejection of the ions, but otherwise this is a similar experiment to
that described above for mass selective ejection. The potential (often called the 'tickle' potential) increases
the kinetic energy of the selected ions, which in turn increases the centre-of-mass collision energy with the
helium damping gas. These collisions now excite the mass selected ions, causing them to dissociate. After the
'tickle' period, the trap is returned to a state where all ions can be trapped and thus the mass spectrum of the
fragmentation product ions can be obtained. The CID mass spectra that are obtained in this manner are similar
to those obtained with triple quadrupole instruments, in that they are the result of many low energy collisions
(figure B1.7.16(a)). There have been attempts to measure unimolecular reaction kinetics by probing fragment
ion intensities as a function of time after excitation [33].

Figure B1.7.16. Mass spectra obtained with a Finnigan GCQ quadrupole ion trap mass spectrometer. (a)
Collision-induced dissociation mass spectrum of the proton-bound dimer of isopropanol, [(CH3)2CHOH]2H+.
The m/z 121 ions were first isolated in the trap, followed by resonant excitation of their trajectories to produce
CID. Fragment ions include water loss (m/z 103), loss of isopropanol (m/z 61) and loss of 42 amu (m/z 79). (b)
Ion-molecule reactions in an ion trap. In this example the m/z 103 ion was first isolated and then resonantly
excited in the trap. Endothermic reaction with water inside the trap produces the proton-bound cluster at m/z
121, while CID produces the fragment with m/z 61.

With the right software controlling the instrument, it is possible for the above process to be repeated n times,
i.e., the ion trap is theoretically capable of MS^n experiments, though the ion concentration in the trap is
always the limiting factor in these experiments.

(C) BIMOLECULAR REACTIONS 

The same procedure as outlined above can be used to study ion-molecule reactions [15, 34]. Mass-selected
ions will react with neutral species inside the trap. The presence of the damping gas means that stable
(thermodynamic and kinetic) complexes may be formed. Allowing the mass-selected ions to react in the trap,
while storing all reaction products, allows the course of the reactions to be followed (figure B1.7.16(b)).
Changing the storage time allows the relative abundance of reactant and product ions to be monitored as a
function of time. This introduces the possibility of measuring ion-molecule reaction rate constants with the ion
trap. Since the ion internal energy distribution in such an experiment is described by a temperature near the
ambient temperature of the damping gas, ion-molecule reactions can be probed as a function of temperature by
raising the trap temperature. Resonant excitation of the mass-selected ions (see CID section) effectively raises
their internal energy, allowing endothermic reactions with neutral species to take place (there will be a limit on
the endothermicity arising from the limit on internal excitation occurring in the resonant excitation process).
The degree of axial resonant excitation can be controlled and there have been attempts to relate this to an
effective 'temperature' [35].


B1.7.5 TIME-OF-FLIGHT MASS SPECTROMETERS 

Probably the simplest mass spectrometer is the time-of-flight (TOF) instrument [36]. Aside from magnetic
deflection instruments, these were among the first mass spectrometers developed. The mass range is
theoretically infinite, though in practice there are upper limits that are governed by electronics and ion source
considerations. In chemical physics and physical chemistry, TOF instruments often are operated at lower
resolving power than analytical instruments. Because of their simplicity, they have been used in many
spectroscopic apparatus as detectors for electrons and ions. Many of these techniques are included as chapters
unto themselves in this book, and they will be described only briefly here.

B1.7.5.1 TIME-OF-FLIGHT EQUATIONS

The basic principle behind TOF mass spectrometry [36] is the equation for kinetic energy, zeV = (1/2)mv^2,
where the translational kinetic energy of an ion accelerated out of the ion source by a potential drop, V, is zeV.
If ions of mass m are given zeV kinetic energy, then the time, t_d, the ions take to travel a distance d is given
by:

\[ t_d = d\left(\frac{m}{2zeV}\right)^{1/2} \tag{B1.7.7} \]

In the simplest form, t_d reflects the time of flight of the ions from the ion source to the detector. This time is
proportional to the square root of the mass, i.e., as the masses of the ions increase, they become closer
together in flight time. This is a limiting parameter when considering the mass resolution of the TOF
instrument.
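
A short numerical sketch makes the square-root dependence concrete. It evaluates the drift time that follows from zeV = (1/2)mv^2, namely t_d = d(m/2zeV)^(1/2), for an ion of a given mass. The function name and the example values (3 kV acceleration, 1 m drift length) are assumptions chosen for illustration and do not describe any particular instrument.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
AMU = 1.66053906660e-27      # atomic mass constant, kg


def drift_time(mass_u, z, accel_volts, drift_length_m):
    """Field-free flight time t_d = d * sqrt(m / (2 z e V)), in seconds."""
    mass_kg = mass_u * AMU
    return drift_length_m * math.sqrt(mass_kg / (2.0 * z * E_CHARGE * accel_volts))


# Assumed conditions: singly charged ions, 3000 V, 1.0 m drift tube.
t_100 = drift_time(100.0, 1, 3000.0, 1.0)      # about 13 microseconds
t_400 = drift_time(400.0, 1, 3000.0, 1.0)      # twice t_100 (sqrt of 4)

# Spacing between neighbouring masses shrinks as the mass increases.
gap_low = drift_time(101.0, 1, 3000.0, 1.0) - t_100
gap_high = drift_time(1001.0, 1, 3000.0, 1.0) - drift_time(1000.0, 1, 3000.0, 1.0)
```

Under these assumed conditions the one-mass-unit spacing near m/z 1000 is roughly a third of that near m/z 100, which is the sense in which heavier ions become closer together in flight time.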

The ion time of flight, as given by equation (B1.7.7), is oversimplified, however. There are a number of
factors which change the final measured TOF. These are considered below. 

(A) ION SOURCE RESIDENCE TIME 

A schematic diagram of a simple TOF instrument is shown in figure B1.7.17(a). Since the ion source region
of any instrument has a finite size, the ions will spend a certain amount of time in the source while they are
accelerating. If the initial velocity of the ions, v_0, is zero, the time spent in the source, t_s, is given by

\[ t_s = \frac{vm}{zeV/d_s} \]

where v is the velocity of the ion, d_s is the width of the ion source and V/d_s represents the electric field
strength inside the source. One obvious way to minimize this effect is to make the field strength as large as
possible (by increasing V or decreasing d_s).

Figure B1.7.17. (a) Schematic diagram of a single acceleration zone time-of-flight mass spectrometer. (b)
Schematic diagram showing the time focusing of ions with different initial velocities (and hence initial kinetic
energies) onto the detector by the use of a reflecting ion mirror. (c) Wiley-McLaren type two stage
acceleration zone time-of-flight mass spectrometer.

Ions generated in the ion source region of the instrument may have initial velocities isotropically distributed in
three dimensions (for gaseous samples, this initial velocity is the predicted Maxwell-Boltzmann distribution
at the sample temperature). The time the ions spend in the source will now depend on the direction of their
initial velocity. At one extreme, the ions may have a velocity v_0 in the direction of the extraction grid. The
time spent in the source will be shorter than for those with no component of initial velocity in this direction:

\[ t_s = \frac{(v - v_0)\,m}{zeV/d_s}. \]

At the other extreme, ions with initial velocities in the direction opposite to the accelerating potential must
first be turned around and brought back to their initial position. From this point their behaviour is the same as
described above. The time taken to turn around in the ion source and return to the initial position (t_r) is given
by:

\[ t_r = \frac{2v_0 m}{zeV/d_s}. \]

The final velocity of these two ions will be the same, but their final flight times will differ by the above
turn-around time, t_r. This results in a broadening of the TOF distributions for each ion mass, and is another
limiting factor when considering the mass (time) resolution of the instrument.
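
The size of the turn-around effect can be estimated with a few lines of code. The sketch below assumes a uniform source field V/d_s, so that an ion moving away from the extraction grid at speed v_0 is decelerated at a constant rate zeV/(m d_s) and takes 2mv_0/(zeV/d_s) to return to its starting point. The function name and the example values are hypothetical.

```python
E_CHARGE = 1.602176634e-19   # elementary charge, C
AMU = 1.66053906660e-27      # atomic mass constant, kg


def turnaround_time(mass_u, z, v0, source_volts, source_width_m):
    """Time for an ion initially moving against a uniform field to return.

    Assumes constant deceleration a = z e (V / d_s) / m, so t_r = 2 v0 / a.
    """
    field = source_volts / source_width_m          # V m^-1
    accel = z * E_CHARGE * field / (mass_u * AMU)  # m s^-2
    return 2.0 * v0 / accel


# Assumed example: m/z 100 ion, v0 = 300 m/s, 100 V across a 1 cm source.
t_r = turnaround_time(100.0, 1, 300.0, 100.0, 0.01)        # about 62 ns
t_r_strong = turnaround_time(100.0, 1, 300.0, 200.0, 0.01)  # field doubled
```

Doubling the source field halves the turn-around time, in line with the earlier remark that a large field strength minimizes the time spent in the source.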

The final total ion time of flight in the TOF mass spectrometer with a single accelerating region can be written
in a single equation, taking all of the above factors into account.

\[ t_{\mathrm{TOF}} = \frac{(2m)^{1/2}}{zeV/d_s}\left[(U_0 + zeV)^{1/2} \pm U_0^{1/2}\right] + \frac{(2m)^{1/2}\,d}{2\,(U_0 + zeV)^{1/2}} \]

where now the initial velocity has been replaced by the initial translational energy, U_0. This is the equation
published in 1955 by Wiley and McLaren [37] in their seminal paper on TOF mass spectrometry.

(B) ENERGY FOCUSING AND THE REFLECTRON TOF INSTRUMENT 

The resolution of the TOF instrument can be improved by applying energy focusing conditions that serve to
overcome the above-stated spread in initial translational energies of the generated ions. While there have been
several methods developed, the most successful and the most commonly used method is the reflectron. The
reflectron is an ion mirror positioned at the end of the drift tube that retards the ions and reverses their
direction. Ions with a higher kinetic energy penetrate into the mirror to a greater extent than those with lower
kinetic energies. The result is a focusing (in time) at the detector of ions having an initial spread of kinetic
energies (figure B1.7.17(b)). The mirror also has the effect of increasing the drift length without increasing
the physical length of the instrument.

(C) SPATIAL FOCUSING 

Another consideration when gaseous samples are ionized is the variation in where the ions are formed in the
source. The above arguments assumed that the ions were all formed at a common initial position, but in
practice they may be formed anywhere in the acceleration zone. The result is an additional spread in the final
TOF distributions, since ions made in different locations in the source experience different accelerating
potentials and thus spend different times in the source and drift tube.

Spatial focusing in a single acceleration zone linear TOF instrument naturally occurs at a distance of 2d_s along
the drift tube, which is seldom a practical distance for a detector. Wiley and McLaren described a two stage
accelerating zone that allows spatial focusing to be moved to longer distances. An example of such an
instrument is shown in figure B1.7.17(c). The main requirement is for the initial acceleration region to have a
much weaker field than the second. The equation relating the instrumental parameters in figure B1.7.17(c) is:

\[ d = 2d_1 k^{3/2}\left[1 - \frac{d_2}{(k + k^{1/2})\,d_1}\right] \]

where k = (d_1E_1 + d_2E_2)/(d_1E_1), and E_1 and E_2 are the field strengths in regions 1 and 2. So, if the
physical dimensions of the instrument d_1, d_2 and d are fixed, a solution can be obtained for the relative field
strengths necessary for spatial focusing.
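
The space-focusing condition can be explored numerically. The sketch below assumes the Wiley-McLaren form d = 2d_1 k^(3/2) [1 - d_2/((k + k^(1/2)) d_1)] with k = (d_1E_1 + d_2E_2)/(d_1E_1), where d_1 is taken as the distance an ion travels in the first region. It also computes the total flight time of an ion starting at rest, so that the focus can be checked directly: at the focal distance, a small change in starting position should barely change the flight time. All names and numerical values are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
AMU = 1.66053906660e-27      # atomic mass constant, kg


def focus_distance(d1, d2, e1, e2):
    """Drift length at which first-order space focusing occurs (assumed form)."""
    k = (d1 * e1 + d2 * e2) / (d1 * e1)
    return 2.0 * d1 * k ** 1.5 * (1.0 - d2 / ((k + math.sqrt(k)) * d1))


def flight_time(s, d2, e1, e2, drift, mass_u=100.0, z=1):
    """Total flight time for an ion starting at rest a distance s from grid 1."""
    m, q = mass_u * AMU, z * E_CHARGE
    v1 = math.sqrt(2.0 * q * e1 * s / m)               # speed leaving region 1
    v2 = math.sqrt(2.0 * q * (e1 * s + e2 * d2) / m)   # speed leaving region 2
    t1 = m * v1 / (q * e1)                             # time in region 1
    t2 = m * (v2 - v1) / (q * e2)                      # time in region 2
    t3 = drift / v2                                    # field-free drift time
    return t1 + t2 + t3


# Assumed geometry and fields (SI units): 5 mm, 12 mm, 320 V/cm, 1280 V/cm.
D1, D2, E1, E2 = 5.0e-3, 1.2e-2, 3.2e4, 1.28e5
d_focus = focus_distance(D1, D2, E1, E2)               # about 0.285 m

delta = 1.0e-3 * D1                                    # small shift in start position
spread_at_focus = abs(flight_time(D1 + delta, D2, E1, E2, d_focus)
                      - flight_time(D1 - delta, D2, E1, E2, d_focus))
spread_off_focus = abs(flight_time(D1 + delta, D2, E1, E2, 1.5 * d_focus)
                       - flight_time(D1 - delta, D2, E1, E2, 1.5 * d_focus))
```

Setting d_2 = 0 gives k = 1 and recovers the single-field result of twice the source distance. In the sketch a change in starting position moves the ion's distance to both grids, which is a simplification of a real source.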

(D) OTHER IONIZATION SOURCES 

Other methods of sample introduction that are commonly coupled to TOF mass spectrometers are MALDI,
SIMS/FAB and molecular beams (see section (B1.7.2)). In many ways, the ablation of sample from a surface
simplifies the TOF mass spectrometer since all ions originate in a narrow space above the sample surface.
This reduces many of the complications arising from the need for spatial focusing. Also, the initial velocities
of the ions generated are invariably in the TOF direction.

Molecular beam sample introduction (described in section (B1.7.2)), followed by the orthogonal extraction of
ions, results in improved resolution in TOF instruments over effusive sources. The particles in the molecular
beam typically have translational temperatures orthogonal to the beam path of only a few kelvin. Thus, there
is less concern with both the initial velocity of the ions once they are generated and with where in the ion
source they are formed (since the particles are originally confined to the beam path).

B1.7.5.2 EXPERIMENTS USING TOF MASS SPECTROMETERS

Time-of-flight mass spectrometers have been used as detectors in a wider variety of experiments than any
other mass spectrometer. This is especially true of spectroscopic applications, many of which are discussed in
this encyclopedia. Unlike the other instruments described in this chapter, the TOF mass spectrometer is
usually used for one purpose: to acquire the mass spectrum of a compound. TOF instruments cannot generally
be used for the kinds of ion-molecule chemistry discussed in this chapter, or for structural characterization
experiments such as collision-induced dissociation. However, they are easily used as detectors for
spectroscopic applications such as multiphoton ionization (for the spectroscopy of molecular excited states)
[38], zero kinetic energy electron spectroscopy [39] (ZEKE, for the precise measurement of ionization
energies) and coincidence measurements (such as photoelectron-photoion coincidence spectroscopy [40] for
the measurement of ion fragmentation breakdown diagrams).

B1.7.6 FOURIER TRANSFORM ION CYCLOTRON RESONANCE MASS SPECTROMETERS


Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometry is another in the class of trapping
mass spectrometers and, as such, is related to the quadrupole ion trap. The progenitor of FT-ICR, the ICR mass
spectrometer, originated just after the Second World War when the cyclotron accelerator was developed into a
means for selectively detecting ions other than protons. At the heart of ICR is the presence of a magnetic field
that confines ions into orbital trajectories about their flight axis. Early ICR experiments mainly took
advantage of this trapping and were focused on ion-molecule reactions. The addition of the three-dimensional
trapping cell by McIver in 1970 [41, 42] led to improved storage of ions. In 1974 Comisarow and Marshall
introduced the Fourier transform detection scheme that paved the way for FT-ICR [43, 44], which is now
employed in virtually all areas in physical chemistry and chemical physics that use mass spectrometry.

B1.7.6.1 ION MOTION IN MAGNETIC AND ELECTRIC FIELDS

Figure B1.7.18(a) shows a typical FT-ICR mass spectrometer cubic trapping cell. The principal axes are
shown in the diagram, along with the direction of the imposed magnetic field. To understand the trajectory of
an ion in such a field, the electrostatic and magnetic forces acting on the ion must be described [45]. If only a
magnetic field is present, the field acts on the ions such that they take up circular orbits with a frequency
defined by the ion mass:

$$\omega_c = \frac{zeB}{m}$$

where $ze$ is the charge on the ion, $B$ is the magnetic field strength (in tesla), $m$ is the ion mass and
$\omega_c$ is the ion's cyclotron frequency (in rad s$^{-1}$). In modern FT-ICR instruments, magnetic fields of
6 T or more are common, with the latest being upwards of 20 T. These high field magnets are usually
superconducting magnets, not unlike those in modern NMR instruments. Without trapping electrodes, the ions
would describe a spiral trajectory and be lost from the trap.
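
As a rough numerical illustration of the relation $\omega_c = zeB/m$, the following Python sketch converts an ion
mass (in unified atomic mass units) and a charge number into a cyclotron frequency. The function name and the
7 T example field are illustrative assumptions rather than values taken from the text; the physical constants
are CODATA values.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge e, in C
AMU = 1.66053906660e-27      # unified atomic mass unit, in kg


def cyclotron_frequency(mass_u, charge_number, b_tesla):
    """Return (omega_c in rad/s, nu_c in Hz) from omega_c = z*e*B/m."""
    omega_c = charge_number * E_CHARGE * b_tesla / (mass_u * AMU)
    return omega_c, omega_c / (2.0 * math.pi)


if __name__ == "__main__":
    # Illustrative values only: singly charged ions in an assumed 7 T magnet.
    for mass_u in (100.0, 1000.0):
        omega_c, nu_c = cyclotron_frequency(mass_u, 1, 7.0)
        print(f"m = {mass_u:6.0f} u  nu_c = {nu_c / 1e3:8.1f} kHz")
```

Because $\omega_c$ is inversely proportional to m/z, an ion ten times heavier orbits at one tenth of the
frequency, so light ions fall at the high-frequency end of the spectrum.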

Figure B1.7.18. (a) Schematic diagram of the trapping cell in an ion cyclotron resonance mass spectrometer:
excitation plates (E); detector plates (D); trapping plates (T). (b) The magnetron motion ($\omega_m$) due to the
crossing of the magnetic and electric trapping fields is superimposed on the circular cyclotron motion ($\omega_c$)
taken up by the ions in the magnetic field. Excitation of the cyclotron frequency results in an image current
being detected by the detector electrodes which can be Fourier transformed into a secular frequency related to
the m/z ratio of the trapped ion(s).

The circular orbits described above are perturbed by an electrostatic field applied to the two endcap trapping
electrodes (figure B1.7.18(b)). In addition to the trapping motion, the crossed electric and magnetic fields
superimpose a magnetron motion, $\omega_m$, on the ions (figure B1.7.18(b)). An idealized trapping cell would
produce a DC quadrupolar potential in three dimensions. The resulting ion motion is independent of the axial
and radial position in the cell. In practice, however, the finite size of the trapping cell produces irregularities
in the potential that affect ion motion.

(A) ION TRAPPING 

The component of the DC quadrupolar potential in the z-axis direction is described by the following equation. 
where $k$ is a constant and $V_T$ is the trapping potential applied to the endcap electrodes of the trapping cell. The
derivative of this relationship yields a linear electric field along the z-axis:

$$E(z) = -kz.$$

From this relationship, an expression can be derived for the trapping frequency, $\omega_t$,

$$\omega_t = \left( \frac{kze}{m} \right)^{1/2}$$

which again is a function of ion charge and mass. In theory, this trapping frequency is harmonic and
independent of the ion's position in the trapping cell, but in practice, the finite size of the trapping cell
produces irregularities in $\omega_t$. The efficiency with which ions are trapped and stored in the FT-ICR cell
diminishes as the pressure in the cell increases (as opposed to the quadrupole ion trap, which requires helium
buffer gas for optimal trapping). For this reason, FT-ICR instruments are typically operated below $10^{-5}$ Torr
(and usually closer to $10^{-8}$ Torr).

(B) ION DETECTION 

In the other types of mass spectrometer discussed in this chapter, ions are detected by having them hit a
detector such as an electron multiplier. In early ICR instruments, the same approach was taken, but FT-ICR
uses a very different technique. If an RF potential with a frequency equal to the cyclotron frequency of a
particular ion m/z ratio is applied to the excitation plates of the trapping cell (figure B1.7.18(b)), resonant
excitation of the ion trajectories takes place (without changing the cyclotron frequency). The result is ion
trajectories of higher kinetic energy and larger radii inside the trapping cell. In addition, all of the ions with
that particular m/z ratio take up orbits that are coherent (whereas they were all out of phase prior to resonant
excitation). This coherent motion induces an image current on the detector plates of the trapping cell that has
the same frequency as the cyclotron frequency (figure B1.7.18(b)). This image current is acquired over a period
of time as the ion packet decays back to incoherent motion. The digitized time-dependent signal can be Fourier
transformed to produce a frequency spectrum with one component, a peak at the cyclotron frequency of the
ions. It is possible to resonantly excite the trajectories of all ions in the trapping cell by the application of a
broad-band RF excitation pulse to the excitation electrodes. The resulting time-dependent image current, once
Fourier transformed, yields a frequency spectrum with peaks due to each ion m/z in the trapping cell, and
hence a mass spectrum. The intensities of the peaks are proportional to the concentrations of the ions in the cell.
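
The detection scheme described above can be mimicked with a toy calculation. The sketch below, which assumes
NumPy is available, builds a synthetic image-current transient as a sum of exponentially damped cosines (one
per ion m/z) and Fourier transforms it to recover the cyclotron frequencies. The frequencies, abundances,
sampling rate and decay time are hypothetical values chosen for illustration, and the function names are not
taken from any instrument software.

```python
import numpy as np


def simulate_transient(freqs_hz, amplitudes, n_points, sample_rate_hz, decay_s):
    """Sum of exponentially damped cosines, one per ion m/z (toy image current)."""
    t = np.arange(n_points) / sample_rate_hz
    signal = np.zeros(n_points)
    for freq, amp in zip(freqs_hz, amplitudes):
        signal += amp * np.cos(2.0 * np.pi * freq * t) * np.exp(-t / decay_s)
    return t, signal


def magnitude_spectrum(signal, sample_rate_hz):
    """Fourier transform the real transient and return (frequencies, |spectrum|)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    return freqs, spectrum


def strongest_peaks(freqs, spectrum, n_peaks):
    """Frequencies of the n_peaks largest local maxima, in ascending order."""
    is_max = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])
    candidates = np.where(is_max)[0] + 1
    order = np.argsort(spectrum[candidates])[::-1][:n_peaks]
    return np.sort(freqs[candidates[order]])


if __name__ == "__main__":
    # Hypothetical cyclotron frequencies (Hz) and relative ion abundances.
    _, transient = simulate_transient([1.0e5, 1.5e5], [1.0, 0.5],
                                      n_points=10000, sample_rate_hz=1.0e6,
                                      decay_s=5.0e-3)
    freqs, spectrum = magnitude_spectrum(transient, 1.0e6)
    print(strongest_peaks(freqs, spectrum, 2))
```

Each frequency found in this way maps onto an m/z value through $\omega_c = zeB/m$, and in this idealized model
the peak heights scale with the assumed abundances.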

B1.7.6.2 EXPERIMENTS USING FT-ICR

In many respects, the applications of FT-ICR are similar to those of the quadrupole ion trap, as they are both
trapping instruments. The major difference is in the ion motion inside the trapping cell and the waveform
detection. In recent years there have been attempts to use waveform detection methods with quadrupole ion
traps [46].

(A) COLLISION-INDUCED DISSOCIATION AND ION-MOLECULE REACTIONS 

As with the quadrupole ion trap, ions with a particular m/z ratio can be selected and stored in the FT-ICR cell
by the resonant ejection of all other ions. Once isolated, the ions can be stored for variable periods of time
(even hours) and allowed to react with neutral reagents that are introduced into the trapping cell. In this
manner, the products of bimolecular reactions can be monitored and, if this is done as a function of trapping
time, it is possible to derive rate constants for the reactions [47]. Collision-induced dissociation can also be
performed in the FT-ICR cell by the isolation and subsequent excitation of the cyclotron frequency of the ions.
The extra translational kinetic energy of the ion packet results in energetic collisions between the ions and
background gas in the cell. Since the cell in FT-ICR is nominally held at very low pressures ($10^{-8}$ Torr), CID
experiments using the background gas tend not to be very efficient. One common procedure is to pulse a target
gas (such as Ar) into the trapping cell and then record the CID mass spectrum once the gas has been pumped
away. CID mass spectra obtained in this way are similar to those obtained on triple quadrupole and ion trap
instruments.
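
Returning to the ion-molecule kinetics mentioned above, the step from ion intensities recorded as a function
of trapping time to a bimolecular rate constant can be sketched as follows. This is a minimal illustration, not
a description of any published procedure: it assumes that the neutral reagent is in large excess at constant
pressure (pseudo-first-order decay of the reactant ion), that the neutral behaves as an ideal gas, and that no
pressure-gauge corrections are needed. The function names and the synthetic data are hypothetical.

```python
import numpy as np

K_B = 1.380649e-23        # Boltzmann constant, J/K
PA_PER_TORR = 133.322368  # pascals per Torr


def number_density_cm3(pressure_torr, temperature_k):
    """Ideal-gas number density of the neutral reagent, in molecules per cm^3."""
    per_m3 = pressure_torr * PA_PER_TORR / (K_B * temperature_k)
    return per_m3 * 1.0e-6


def bimolecular_rate_constant(times_s, intensities, pressure_torr, temperature_k):
    """Fit ln(I) against t; the slope is -k*[N] under pseudo-first-order conditions."""
    slope, _intercept = np.polyfit(times_s, np.log(intensities), 1)
    return -slope / number_density_cm3(pressure_torr, temperature_k)


if __name__ == "__main__":
    # Synthetic, noise-free data generated from an assumed k = 1e-9 cm^3 s^-1.
    n = number_density_cm3(1.0e-8, 298.0)
    t = np.linspace(0.0, 10.0, 21)
    intensity = np.exp(-1.0e-9 * n * t)
    k_fit = bimolecular_rate_constant(t, intensity, 1.0e-8, 298.0)
    print(f"[N] = {n:.3e} cm^-3")
    print(f"k   = {k_fit:.3e} cm^3 s^-1")
```

With real data the intensities would be normalized to the total ion signal at each trapping time before
fitting; that step is omitted here.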

(B) KINETIC STUDIES 

Aside from the bimolecular reaction kinetics described above, it is possible to measure other types of kinetics
with FT-ICR. Typically, when two species come together in the gas phase to form a complex, the resulting
complex will only be stable if a three-body collision occurs. The third body is necessary to lower the internal
energy of the complex below its dissociation threshold. Thus, complexes are generally made in high pressure
ion sources. It is possible, however, for the complex to radiatively release excess internal energy. Dunbar and
others [48, 49] have studied and modelled the kinetics of such 'radiative association' reactions in FT-ICR
trapping cells because of the long time scales of the experiments (reaction progress is usually probed for many
minutes). The rate constants for photon emission derived from the experimentally observed rate constants tend
to be between 10 and 100 s$^{-1}$. It has also been found that ions can be dissociated by the absorption of
blackbody radiation in the trapping cell (BIRD, blackbody infrared radiative dissociation) [50]. This
technique, which is only feasible at the low pressures ($<10^{-8}$ Torr) and long trapping times inside the FT-ICR
cell, allows the investigator to measure unimolecular decay rate constants of the order of 10 s$^{-1}$. Another
approach to dissociation kinetics is time-resolved photodissociation [51]. Ions are photodissociated with laser
light in the visible and near UV and the product ion intensity is monitored as a function of time. Rate
constants from $10^{-3}$ s$^{-1}$ and higher can be measured with good precision using this technique.

(C) THERMOCHEMICAL STUDIES: THE BRACKETING METHOD 

In an earlier section, measurements were described in which the equilibrium constant, K, for bimolecular
reactions involving gas-phase ions and neutral molecules was determined. Another method for determining
the proton or other affinity of a molecule is the bracketing method [52]. The principle of this approach is quite
straightforward. Let us again take the case of a proton affinity (PA) determination as an example. In a reaction
between a protonated base, B$_1$H$^+$, and a neutral molecule, B$_2$, proton transfer from B$_1$ to B$_2$ will
presumably occur only if the reaction is exothermic (in other words, if the PA of B$_2$ is greater than that of
B$_1$). So, by choosing a range of bases B$_1$ covering a range of PA values and reacting them with a molecule of
unknown PA, B$_2$, the reactions leading to B$_2$H$^+$ can be monitored by the presence of this latter ion in the
mass spectrum. The PA of B$_2$ can quickly be narrowed down provided the reference values are well established.

The nature of this experiment requires that a bimolecular reaction takes place between the reference
base and unknown and thus these experiments are most commonly carried out in FT-ICR mass spectrometers,
though quadrupole ion trap instruments have also been used.

B1.8 Diffraction: x-ray, neutron and electron

Edward Prince 


B1.8.1 INTRODUCTION 

Diffraction is the deflection of beams of radiation due to interference of waves that interact with objects
whose size is of the same order of magnitude as the wavelengths. Molecules and solids typically have
interatomic distances in the neighbourhood of a few ångströms (1 Å = $10^{-10}$ m), comparable to the
wavelengths of x-rays with energies of the order of 10 keV. Neutrons and electrons also have wave properties,
with wavelengths given by the de Broglie relation, $\lambda = h/mv$, where $h$ is Planck's constant, $m$ is the mass of
the particle and $v$ is its velocity. Neutron diffraction applications use neutrons with wavelengths in the range
from 1 to 10 Å; most electron diffraction applications use wavelengths of the order of 0.05 Å, although low
energy electron diffraction (LEED), used for studies of surfaces, employs electrons with wavelengths in the
neighbourhood of 1 Å. All three techniques have extensive applications in physics, chemistry, materials
science, mineralogy and molecular biology. Although perhaps the most familiar application is determination
of the structures of crystalline solids, there are also applications to structural studies of amorphous solids,
liquids and gases. Diffraction also plays an important role in imaging techniques such as electron microscopy.
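
The orders of magnitude quoted above can be checked with a few lines of Python. The sketch below evaluates
$\lambda = h/p$ for neutrons and electrons of a given kinetic energy, and $\lambda = hc/E$ for x-ray photons. The
example energies (a 25 meV thermal neutron, 100 eV and 100 keV electrons) are illustrative assumptions, and the
relativistic momentum used for fast electrons goes slightly beyond the non-relativistic form $h/mv$ given in the
text.

```python
import math

H = 6.62607015e-34             # Planck constant, J s
C = 299792458.0                # speed of light, m/s
EV = 1.602176634e-19           # joules per electronvolt
M_ELECTRON = 9.1093837015e-31  # electron mass, kg
M_NEUTRON = 1.67492749804e-27  # neutron mass, kg


def de_broglie_wavelength_angstrom(mass_kg, kinetic_energy_ev, relativistic=False):
    """Wavelength lambda = h/p in angstroms for a particle of given kinetic energy."""
    e_kin = kinetic_energy_ev * EV
    if relativistic:
        # p*c = sqrt(E_k^2 + 2*E_k*m*c^2)
        momentum = math.sqrt(e_kin ** 2 + 2.0 * e_kin * mass_kg * C ** 2) / C
    else:
        # p = m*v = sqrt(2*m*E_k), the form implied by lambda = h/(m*v)
        momentum = math.sqrt(2.0 * mass_kg * e_kin)
    return H / momentum * 1.0e10


def photon_wavelength_angstrom(energy_ev):
    """Wavelength lambda = h*c/E in angstroms for a photon of given energy."""
    return H * C / (energy_ev * EV) * 1.0e10


if __name__ == "__main__":
    print("10 keV x-ray      :", round(photon_wavelength_angstrom(1.0e4), 3), "A")
    print("25 meV neutron    :", round(de_broglie_wavelength_angstrom(M_NEUTRON, 0.025), 3), "A")
    print("100 eV electron   :", round(de_broglie_wavelength_angstrom(M_ELECTRON, 100.0), 3), "A")
    print("100 keV electron  :", round(de_broglie_wavelength_angstrom(M_ELECTRON, 1.0e5, True), 4), "A")
```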


B1.8.2 PRINCIPLES OF DIFFRACTION 


B1.8.2.1 THE ATOMIC SCATTERING FACTOR

We shall first discuss the diffraction of x-rays from isolated atoms, because this case illustrates principles that
can be generalized to most practical applications. Consider [1] an atom consisting of a nucleus surrounded by
a spherically symmetric cloud of electrons that can be represented by a density function $\rho(r)$ and a plane wave
that can be described by a vector $\mathbf{s}_i$ normal to the wave front with magnitude $1/\lambda$, where $\lambda$ is the
wavelength. According to Huygens's principle, each point within the electron cloud is the source of a spherical
wavelet whose amplitude is proportional to $\rho(r)$. At a distance large compared with the dimensions of the atom
this spherical wavelet will approximate a new plane wave and we are interested in the amplitude of the wave
formed by the interference of the wavelets originating at all points within the electron cloud. Referring to
figure B1.8.1, the difference in pathlength for a wave propagating in the direction of $\mathbf{s}_f$
($|\mathbf{s}_f| = |\mathbf{s}_i|$) and originating at point $\mathbf{r}$, relative to the wave originating at the origin, is
$\Delta l = \lambda \mathbf{r} \cdot (\mathbf{s}_f - \mathbf{s}_i)$, and the difference in phase is therefore
$\Delta\varphi = 2\pi \mathbf{r} \cdot (\mathbf{s}_f - \mathbf{s}_i)$. The atomic scattering factor, $f(\mathbf{s}_f)$, is the
amplitude of the resultant wave in the direction parallel to $\mathbf{s}_f$, which is the vector sum of all
contributions. This is given by

$$f(\mathbf{s}_f) = C \int \rho(\mathbf{r}) \exp[2\pi i \mathbf{r} \cdot (\mathbf{s}_f - \mathbf{s}_i)] \, d\mathbf{r} \qquad \text{(B1.8.1)}$$

where the integral is over the volume of the atom and $C$ is a proportionality constant. $f(\mathbf{s}_f)$ is therefore
proportional to the Fourier transform of the electron density distribution, $\rho(\mathbf{r})$.


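To evaluate this integral, first let $\mathbf{Q} = 2\pi(\mathbf{s}_f - \mathbf{s}_i)$, let $Q = |\mathbf{Q}|$, let
$r = |\mathbf{r}|$ and let $\alpha$ be the angle between $\mathbf{r}$ and $\mathbf{Q}$. Now
$|\mathbf{s}_f - \mathbf{s}_i| = 2|\mathbf{s}_i| \sin\theta = 2 \sin\theta/\lambda$, so that

$$\mathbf{r} \cdot (\mathbf{s}_f - \mathbf{s}_i) = 2r|\mathbf{s}_i| \sin\theta \cos\alpha = 2r \sin\theta \cos\alpha/\lambda.$$

The area of a ring around $\mathbf{Q}$ with width $d\alpha$ at radius $r$ is $2\pi r^2 \sin\alpha \, d\alpha$, so

$$d\mathbf{r} = 2\pi r^2 \sin\alpha \, d\alpha \, dr.$$

Because $\rho(r)$ is spherically symmetric, the number of electrons in this volume element is $\rho(r) \, d\mathbf{r}$.
Letting $x = Qr \cos\alpha$, $d\alpha = -dx/(Qr \sin\alpha)$. Then, making all substitutions,

$$f(Q) = C \int_0^\infty \frac{2\pi r}{Q} \rho(r) \, dr \int_{-Qr}^{Qr} \exp(ix) \, dx \qquad \text{(B1.8.2a)}$$

$$\phantom{f(Q)} = 4\pi C \int_0^\infty r^2 \rho(r) \frac{\sin Qr}{Qr} \, dr. \qquad \text{(B1.8.2b)}$$

If $\theta = 0$, so that $Q = 0$, this reduces to

$$f(0) = 4\pi C \int_0^\infty r^2 \rho(r) \, dr \qquad \text{(B1.8.3)}$$

so that the integral is the total charge in the electron cloud. The constant $C$ has the units of a length, and is
conventionally chosen so that $f(Q)$ is a multiple of the 'classical electron radius', $2.818 \times 10^{-15}$ m.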
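
The radial integral $f(Q) = 4\pi C \int r^2 \rho(r) [\sin(Qr)/(Qr)] \, dr$ is easy to evaluate numerically. The
sketch below, which assumes NumPy is available, does so with $C = 1$ for the hydrogen 1s density
$\rho(r) = \exp(-2r/a_0)/(\pi a_0^3)$ and compares the result with the closed form $16/(4 + Q^2 a_0^2)^2$ that
follows for this particular density. The choice of hydrogen, the radial grid and the function names are
assumptions made for illustration.

```python
import numpy as np


def scattering_factor(q, r, rho):
    """Evaluate f(Q) = 4*pi * integral r^2 rho(r) sin(Qr)/(Qr) dr with C = 1."""
    # np.sinc(x) is sin(pi*x)/(pi*x), so sin(Qr)/(Qr) = np.sinc(Q*r/pi);
    # this also handles Q = 0 and r = 0 without a division by zero.
    integrand = 4.0 * np.pi * r ** 2 * rho * np.sinc(q * r / np.pi)
    # Trapezoidal rule on the radial grid.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)))


def hydrogen_1s_density(r, a0=1.0):
    """Electron density of a hydrogen 1s orbital, normalized to one electron."""
    return np.exp(-2.0 * r / a0) / (np.pi * a0 ** 3)


if __name__ == "__main__":
    r = np.linspace(0.0, 40.0, 40001)        # radii in units of a0
    rho = hydrogen_1s_density(r)
    for q in (0.0, 1.0, 2.0, 4.0):           # Q in units of 1/a0
        exact = 16.0 / (4.0 + q ** 2) ** 2   # closed form for the 1s density
        numeric = scattering_factor(q, r, rho)
        print(f"Q = {q:3.1f}  numeric = {numeric:.5f}  exact = {exact:.5f}")
```

At $Q = 0$ the integral returns the number of electrons (one for hydrogen), and it falls monotonically as $Q$
increases.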



Figure B1.8.1. The atomic scattering factor from a spherically symmetric atom. The volume element is a ring
subtending angle $\alpha$ with width $d\alpha$ at radius $r$ and thickness $dr$.

The electron distribution, $\rho(r)$, has been computed by quantum mechanics for all neutral atoms and many ions
and the values of $f(Q)$, as well as coefficients for a useful empirical approximation, are tabulated in the
International Tables for Crystallography vol C [2]. In general, $f(Q)$ is a maximum equal to the nuclear charge,
$Z$, for $Q = 0$ and decreases monotonically with increasing $Q$.


Because the neutron has a magnetic moment, it has a similar interaction with the clouds of unpaired d or f
electrons in magnetic ions and this interaction is important in studies of magnetic materials. The magnetic
analogue of the atomic scattering factor is also tabulated in the International Tables [3]. Neutrons also have
direct interactions with atomic nuclei, whose mass is concentrated in a volume whose radius is of the order of
$10^{-5}$ times the characteristic neutron wavelength. Thus $\rho(r)$ differs from zero only when $\sin(Qr)/Qr$ is
effectively equal to one, so that $f(Q)$ is a constant independent of $Q$. Whereas the x-ray interaction depends on
the total number of electrons in the cloud, and therefore on the nuclear charge, the neutron's interaction with a
nucleus results from nuclear forces that vary in a haphazard manner from one isotope to another, lie within a
rather narrow range and can even be negative, meaning that the Huygens wavelet from such a nucleus has a
phase differing by $\pi$ from the phase of one from a nucleus whose scattering factor is positive. The neutron
scattering factors, or scattering lengths, conventionally denoted by $b$, have magnitudes in the range
1-10 $\times 10^{-15}$ m [4].

The atomic scattering factor for electrons is somewhat more complicated. It is again a Fourier transform of a
density of scattering matter, but, because the electron is a charged particle, it interacts with the nucleus as well
as with the electron cloud. Thus $\rho(r)$ in equation (B1.8.2b) is replaced by $\varphi(r)$, the electrostatic potential
of an electron situated at radius $r$ from the nucleus. Under a range of conditions the electron scattering factor,
$f_e(Q)$, can be represented in terms of the x-ray atomic scattering factor, $f_x(Q)$, by the so-called Mott-Bethe
formula,

$$f_e(Q) = \frac{2\pi m e^2}{h^2 \varepsilon_0} \, \frac{Z - f_x(Q)}{Q^2} \qquad \text{(B1.8.4)}$$

where $m$ is the mass of the electron, $e$ is its charge and $\varepsilon_0$ is the permittivity of free space.
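
A short numerical check of the Mott-Bethe relation, taken here in the SI form
$f_e(Q) = (2\pi m e^2/h^2 \varepsilon_0)[Z - f_x(Q)]/Q^2$ with $Q = 4\pi \sin\theta/\lambda$, is sketched below. The
hydrogen 1s x-ray scattering factor $16/(4 + Q^2 a_0^2)^2$ is used as a convenient closed-form input; that
choice, the sample value of $Q$ and the function names are assumptions made for illustration.

```python
import math

H = 6.62607015e-34             # Planck constant, J s
M_ELECTRON = 9.1093837015e-31  # electron mass, kg
E_CHARGE = 1.602176634e-19     # elementary charge, C
EPS0 = 8.8541878128e-12        # permittivity of free space, F/m
A0 = 5.29177210903e-11         # Bohr radius, m

# Prefactor 2*pi*m*e^2 / (h^2 * eps0), in 1/m; numerically equal to 2/a0.
MOTT_BETHE_PREFACTOR = 2.0 * math.pi * M_ELECTRON * E_CHARGE ** 2 / (H ** 2 * EPS0)


def electron_scattering_factor(q, z, f_x):
    """Mott-Bethe estimate of f_e(Q) in metres; q in 1/m, f_x in electrons."""
    if q <= 0.0:
        raise ValueError("the formula is singular at Q = 0; use a small positive Q")
    return MOTT_BETHE_PREFACTOR * (z - f_x) / q ** 2


def hydrogen_xray_factor(q):
    """Closed-form x-ray scattering factor of a hydrogen 1s density."""
    return 16.0 / (4.0 + (q * A0) ** 2) ** 2


if __name__ == "__main__":
    q = 1.0e9  # 0.1 reciprocal angstroms, an arbitrary small-Q example
    f_e = electron_scattering_factor(q, 1, hydrogen_xray_factor(q))
    print(f"f_e = {f_e * 1e10:.4f} A (approaches a0 = {A0 * 1e10:.4f} A as Q -> 0)")
```

Although the formula divides by $Q^2$, the numerator $Z - f_x(Q)$ also vanishes as $Q^2$ for a neutral atom, so the
electron scattering factor stays finite at small $Q$.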

B1.8.2.2 DIFFRACTION FROM CLUSTERS OF ATOMS

The derivation of equation (B1.8.1) makes no use of the assumption of spherical symmetry and it is, in fact, a
very general result that the amplitude of a scattered wave is the Fourier transform of a density of scattering
matter. Although there are examples of experimental observations of scattering from isolated atoms, one
being the scattering of neutrons by a dilute solid solution of paramagnetic atoms in a diamagnetic matrix,
diffraction from molecules, such as that of electrons in a gas, or from particles of contrasting density in a
uniform medium is much more important. Examples of the latter are colloidal suspensions in a fluid and
precipitates in an alloy. Note that the particles of contrasting density can also be voids: Babinet's principle
requires that the amplitude of a scattered wave due to a negative difference be the complex conjugate of that
due to a positive difference and, because the intensity of scattered radiation is proportional to the square of the
modulus of the amplitude, the diffraction patterns are indistinguishable.

If individual scattering particles are far enough apart and their spatial distribution is such that the relative 
phases of their contributions to a scattered wave are random, the intensity distribution in the diffraction 
pattern will be the sum of contributions from all particles [5]. If the particles are identical (monodisperse) but 
have random orientations, or if they differ in size and shape (polydisperse), the resulting pattern will reflect an
ensemble average over the sample. In either case there will be a spreading of the incident beam, so-called 
small-angle scattering. How small the angles are depends on the wavelength of the radiation and the size of 
the particles: long wavelengths give larger angles, but they also tend to be more strongly absorbed by the 
sample, so that there is a trade-off between resolution and intensity. 

B1.8.2.3 DIFFRACTION FROM CRYSTALLINE SOLIDS

(A) BRAGG'S LAW 

The diffraction of x-rays was first observed in 1912 by Laue and coworkers [6]. A plausible, though 
undocumented, story says that the classic experiment was inspired by a seminar given by P P Ewald, whose 
doctoral thesis was a purely theoretical study of the interaction of electromagnetic waves with an array of 
dipoles located at the nodes of a three-dimensional lattice. At the time it was hypothesized that crystals were 
composed of parallelepipedal building blocks, unit cells, fitted together in three dimensions and that x-rays 
were short-wavelength, electromagnetic radiation, but neither hypothesis had been confirmed experimentally.
The Laue experiment confirmed both, but the application of x-ray diffraction to the determination of crystal 
structure was introduced by the Braggs. 

W L Bragg [7] observed that if a crystal was composed of copies of identical unit cells, it could then be
divided in many ways into slabs with parallel, plane faces whose distributions of scattering matter were
identical and that if the pathlengths travelled by waves reflected from successive, parallel planes differed by
integral multiples of the wavelength there would be strong, constructive interference. Figure B1.8.2 shows a
projection of parallel planes separated by a distance $d$ and a plane wave with wavelength $\lambda$ whose normal
makes an angle $\theta$ with these reflecting planes. It is evident that the crests of waves reflected from successive
planes will be in phase if $\lambda = 2d \sin\theta$, a relation that is known as Bragg's law. (It appears that W H Bragg
played no role in the formulation of this relation, so it is correctly Bragg's law, not Braggs' law. In textbooks
the relation is often stated in the form $n\lambda = 2d \sin\theta$, where $n$ is the 'order' of the reflection, but in
crystallography the order is conventionally incorporated in the definition of $d$.)
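
Bragg's law, $\lambda = 2d \sin\theta$, can be turned into a one-line calculation of the diffraction angle. In the
sketch below the function name and the example inputs (Cu K-alpha1 radiation at 1.5406 Å and the (111)
planes of silicon with a cubic cell edge of 5.431 Å) are illustrative choices that do not come from the text.

```python
import math


def bragg_angle_deg(d_spacing, wavelength):
    """Return theta in degrees from lambda = 2 d sin(theta), or None if no solution."""
    sin_theta = wavelength / (2.0 * d_spacing)
    if sin_theta > 1.0:
        return None  # wavelength too long for these planes to diffract
    return math.degrees(math.asin(sin_theta))


if __name__ == "__main__":
    # Illustrative inputs: Cu K-alpha1 radiation (1.5406 A) and the (111) planes
    # of silicon, d = a/sqrt(3) with a = 5.431 A.
    d_111 = 5.431 / math.sqrt(3.0)
    theta = bragg_angle_deg(d_111, 1.5406)
    print(f"theta = {theta:.2f} deg, 2*theta = {2.0 * theta:.2f} deg")
```

The function returns None when $\lambda > 2d$, which is the condition under which no reflection from those planes
is possible.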



Figure B1.8.2. Bragg's law. When $\lambda = 2d \sin\theta$, there is strong, constructive interference.


(B) THE RECIPROCAL LATTICE 


The vertices of the unit cells form an array of points in three-dimensional space, a space lattice. The edges of
the parallelepiped can be defined by three non-coplanar vectors, $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$, and any lattice
point can then be defined by a vector $\mathbf{r} = u\mathbf{a} + v\mathbf{b} + w\mathbf{c}$, where $u$, $v$ and $w$ are integers.
Any point in the crystal can be specified by a vector $\mathbf{r} + \mathbf{x}$, where $\mathbf{x}$ represents a vector within
the unit cell. The periodicity of the crystal specifies that $\rho(\mathbf{r} + \mathbf{x}) = \rho(\mathbf{x})$. The families of
parallel planes are specified by their Miller indices, conventionally denoted by $h$, $k$ and $l$. The three points
that define the plane closest to the origin are $\mathbf{a}/h$, $\mathbf{b}/k$ and $\mathbf{c}/l$, with the understanding that
if any of the indices is equal to zero, the plane is parallel to the corresponding vector. To find solutions to the
Bragg equation it is necessary to determine the value of $d$. If $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are orthogonal, this
is easy, but for most crystals they are not, and the computation is greatly simplified by use of the reciprocal
lattice. The reciprocal lattice, which was introduced by J W Gibbs [8] and applied to the crystallographic
problem by Ewald [9], is defined by three vectors, $\mathbf{a}^* = (\mathbf{b} \times \mathbf{c})/V$,
$\mathbf{b}^* = (\mathbf{c} \times \mathbf{a})/V$ and $\mathbf{c}^* = (\mathbf{a} \times \mathbf{b})/V$, where
$V = \mathbf{a} \cdot \mathbf{b} \times \mathbf{c}$ is the volume of the unit cell. It can easily be shown [10] that the
vector $\mathbf{d}^* = h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*$ is perpendicular to the planes defined by the Miller
indices $h$, $k$ and $l$, and that $|\mathbf{d}^*| = 1/d$. Bragg's law then becomes $\sin\theta = \lambda|\mathbf{d}^*|/2$. Note
that if $\mathbf{d}^* = \mathbf{s}_f - \mathbf{s}_i$, as shown in figure B1.8.3, this condition will be satisfied on the
surface of a sphere (the Ewald sphere) passing through the origin of the reciprocal lattice whose centre is at
the point $-\mathbf{s}_i$ in reciprocal space.
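
The reciprocal-lattice recipe for the interplanar spacing can be written out directly. The sketch below, which
assumes NumPy is available, builds $\mathbf{a}^*$, $\mathbf{b}^*$ and $\mathbf{c}^*$ from the cell vectors using the cross-product
definitions with $V = \mathbf{a} \cdot \mathbf{b} \times \mathbf{c}$, and then evaluates $d = 1/|h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*|$.
The hexagonal cell used as an example is hypothetical, chosen only because its axes are not orthogonal.

```python
import numpy as np


def reciprocal_vectors(a, b, c):
    """Return a*, b*, c* defined by a* = (b x c)/V etc., with V = a . (b x c)."""
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    volume = np.dot(a, np.cross(b, c))
    return (np.cross(b, c) / volume,
            np.cross(c, a) / volume,
            np.cross(a, b) / volume)


def d_spacing(hkl, a, b, c):
    """Interplanar spacing d = 1/|h a* + k b* + l c*| for Miller indices hkl."""
    a_star, b_star, c_star = reciprocal_vectors(a, b, c)
    h, k, l = hkl
    d_star = h * a_star + k * b_star + l * c_star
    return 1.0 / np.linalg.norm(d_star)


if __name__ == "__main__":
    # Hypothetical hexagonal cell (a = 3 A, c = 5 A) with a 120 degree angle
    # between a and b, so the axes are not orthogonal.
    a = [3.0, 0.0, 0.0]
    b = [-1.5, 1.5 * np.sqrt(3.0), 0.0]
    c = [0.0, 0.0, 5.0]
    print(d_spacing((1, 0, 0), a, b, c))   # equals a*sqrt(3)/2 for this geometry
```

The resulting $d$ can be inserted into Bragg's law in the form $\sin\theta = \lambda/(2d)$ to obtain the diffraction
angle for any set of Miller indices.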

Figure B1.8.3. Ewald's reciprocal lattice construction for the solution of the Bragg equation. If
$\mathbf{s}_f - \mathbf{s}_i$ is a vector of the reciprocal lattice, Bragg's law is satisfied for the corresponding planes.
This occurs if a reciprocal lattice point lies on the surface of a sphere with radius $1/\lambda$ whose centre is at
$-\mathbf{s}_i$.

(C) THE STRUCTURE AMPLITUDE


The amplitude, and therefore the intensity, of the scattered radiation is determined by extending the Fourier
transform of equation (B1.8.1) over the entire crystal and Bragg's law expresses the fact that this transform
has values significantly different from zero only at the nodes of the reciprocal lattice. The amplitude varies,
however, from node to node, depending on the transform of the contents of the unit cell. This leads to an
expression for the structure amplitude, denoted by $F(hkl)$, of the form

$$F(hkl) = C \int_0^c dz \int_0^b dy \int_0^a \rho(x, y, z) \exp\left[2\pi i \left( \frac{hx}{a} + \frac{ky}{b} + \frac{lz}{c} \right)\right] dx \qquad \text{(B1.8.5)}$$

where $a = |\mathbf{a}|$, $b = |\mathbf{b}|$, $c = |\mathbf{c}|$ and $x$, $y$ and $z$ are the coordinates of a point in a (not
necessarily orthogonal) Cartesian system defined by $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$. Making use of the fact that the
unit cell contents consist of atoms, each of which has its own atomic scattering factor, $f_j(d^*)$, this can be
written

$$F(hkl) = C \sum_{j=1}^{N} f_j(d^*) \exp\left[2\pi i \left( \frac{hx_j}{a} + \frac{ky_j}{b} + \frac{lz_j}{c} \right)\right] \qquad \text{(B1.8.6)}$$

where $d^* = |\mathbf{d}^*|$ and the sum is over $N$ atoms in the unit cell.
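
The sum over atoms in the structure amplitude is straightforward to evaluate. The sketch below computes
$F(hkl) = \sum_j f_j \exp[2\pi i(hx_j + ky_j + lz_j)]$ with $C = 1$, where the coordinates are fractional (that is,
$x_j$ in the code stands for $x_j/a$ in the notation of the text). The body-centred cell of two identical atoms and
the constant, $Q$-independent scattering factor are hypothetical simplifications.

```python
import cmath


def structure_amplitude(hkl, atoms):
    """Sum f_j * exp[2*pi*i*(h*x_j + k*y_j + l*z_j)] over the atoms in one cell.

    atoms is a list of (f_j, (x_j, y_j, z_j)) with fractional coordinates,
    i.e. x_j here stands for x_j/a in the notation of the text, and C = 1.
    """
    h, k, l = hkl
    total = 0j
    for f_j, (x, y, z) in atoms:
        total += f_j * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
    return total


if __name__ == "__main__":
    # Hypothetical body-centred cell: two identical atoms with a constant f = 1.
    bcc = [(1.0, (0.0, 0.0, 0.0)), (1.0, (0.5, 0.5, 0.5))]
    for hkl in [(1, 0, 0), (1, 1, 0), (1, 1, 1), (2, 0, 0)]:
        print(hkl, round(abs(structure_amplitude(hkl, bcc)), 6))
```

For this cell the amplitude vanishes whenever $h + k + l$ is odd, which illustrates how the amplitude varies from
node to node of the reciprocal lattice according to the contents of the unit cell.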


Equation (B1.8.6) assumes that all unit cells really are identical and that the atoms are fixed in their
equilibrium positions. In real crystals at finite temperatures, however, atoms oscillate about their mean
positions and also may be displaced from their average positions because of, for example, chemical
inhomogeneity. The effect of this is, to a first approximation, to modify the atomic scattering factor by a
convolution of $\rho(r)$ with a trivariate Gaussian density function, resulting in the multiplication of $f_j(d^*)$ by
$\exp(-M)$, where

$$M = 8\pi^2 \langle u_j^2 \rangle \sin^2\theta/\lambda^2 \qquad \text{(B1.8.7)}$$

and $\langle u_j^2 \rangle$ is the mean square displacement of the centre of atom $j$ parallel to $\mathbf{d}^*$. The factor
$\exp(-M)$ is the atomic displacement factor, or, in older literature, the temperature factor or, for early workers
in the field, the Debye-Waller factor.
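
To give a feeling for the size of this correction, the sketch below evaluates $\exp(-M)$ assuming the standard
form $M = 8\pi^2 \langle u^2 \rangle \sin^2\theta/\lambda^2$. The mean square displacement of 0.01 Å$^2$ and the value
$\sin\theta/\lambda = 0.5$ Å$^{-1}$ are hypothetical inputs, and the function name is an illustrative choice.

```python
import math


def displacement_factor(mean_square_u, sin_theta_over_lambda):
    """Return exp(-M) with M = 8*pi^2 * <u^2> * (sin(theta)/lambda)^2."""
    m = 8.0 * math.pi ** 2 * mean_square_u * sin_theta_over_lambda ** 2
    return math.exp(-m)


if __name__ == "__main__":
    # Hypothetical values: <u^2> = 0.01 A^2 and sin(theta)/lambda = 0.5 A^-1.
    print(round(displacement_factor(0.01, 0.5), 4))
```

The factor multiplies the amplitude, so the corresponding intensity is reduced by $\exp(-2M)$, and the
attenuation grows rapidly with scattering angle.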

(D) DIFFRACTION OF NEUTRONS FROM NONMAGNETIC AND MAGNETIC CRYSTALS 

Diffraction of neutrons [11] from nonmagnetic crystals is similar to that of x-rays, with the neutron atomic
scattering factor substituted for the x-ray one. For magnetic crystals below their ordering temperatures,
however, the neutron's magnetic moment interacts with an ordered array of electron spins, with the strength
of the interaction being proportional to $\sin\alpha$, where $\alpha$ is the angle between $\mathbf{d}^*$ and the electron spin
axis, and the phases of the wavelets originating at different atoms depend on the relative orientations of their
magnetic moments as well as on the path length. The electron spins may all point in the same direction, a
ferromagnet, or those on different atoms may point in opposite directions, in equal numbers and in an ordered
arrangement, an antiferromagnet, or they may be arranged in more complicated ways having a net magnetic
moment, a so-called ferrimagnet. In the absence of an applied magnetic field the crystal tends to divide into
domains in which the electron spins point along different, symmetry-equivalent directions and the diffracted
intensities are averaged over the various possible values of the angle between the magnetic moment and
$\mathbf{d}^*$. Furthermore, the magnetic diffraction and the nuclear diffraction do not interfere with one another,
and the nuclear and magnetic intensities simply add together, although in many cases the magnetic unit cell is
larger than the nuclear unit cell, which produces additional diffraction peaks.

If a magnetic field is applied to the crystal, the domains become aligned and the nuclear and magnetic 
wavelets do interfere with one another. Then the amplitude of the diffracted wave depends on the orientation 
of the neutron spin. In special cases, of which Co$_{0.92}$Fe$_{0.08}$ is an example, the interference may be totally destructive
for one neutron spin state and the diffracted beam becomes polarized. If a crystal of one of these materials is 
used as a monochromator, the diffraction of this polarized beam is a particularly sensitive probe for the study 
of magnetic structures. 

(E) DIFFRACTION OF ELECTRONS FROM CRYSTALS 

Diffraction of electrons from single crystals [12] differs from the diffraction of x-rays or neutrons because the
interaction of electrons with matter is much stronger. Conventional electron diffraction is performed as an
adjunct to electron microscopy. In fact, the same instrument, a transmission electron microscope, commonly
serves for both, with the configuration of the electron optics determining whether a diffraction pattern is
magnified, or the diffracted beams are recombined to form an image. Because of the strong interaction,
specimens must be thin, and accelerating voltages must be large, of the order of 100 kV, so that the
corresponding wavelength is much shorter than interatomic distances in a crystal, and the Ewald sphere is
large compared with the spacing between points of the reciprocal lattice. As a result, when the direction of the
incident beam is perpendicular to a reciprocal lattice plane, the small spread of the reciprocal lattice due to
mosaic spread in the crystal produces a diffraction pattern that consists of spots with the structure of the
reciprocal lattice plane.

Another mode of electron diffraction, low energy electron diffraction or LEED [13], uses incident beams of
electrons with energies below about 100 eV, with corresponding wavelengths of the order of 1 Å. Because of
the very strong interactions between the incident electrons and the atoms in the crystal, there is very little
penetration of the electron waves into the crystal, so that the diffraction pattern is determined entirely by the
arrangement of atoms close to the surface. Thus, in contrast to high energy diffraction, where the pattern is
formed by transmission through a thin, crystalline film, and the diffracted beams make angles of only a
fraction of a degree with the incident beam, the pattern in LEED is formed by reflection from a surface, and
the diffracted beam may be in any direction away from the surface. Furthermore, because there is no
significant interference with scattered wavelets coming from below the surface, the reciprocal lattice can be
considered to consist not of points but of lines perpendicular to the surface that will always intersect the
Ewald sphere, so that even with a monochromatic incident beam there will always be a pattern of spots on a
photographic plate or a fluorescent screen.

Although the structure of the surface that produces the diffraction pattern must be periodic in two dimensions, 
it need not be the same substance as the bulk material. Thus LEED is a particularly sensitive tool for studying 
the structures and properties of thin layers adsorbed epitaxially on the surfaces of crystals. 

B1.8.2.4 DIFFRACTION FROM NONCRYSTALLINE SOLIDS

We have seen that the intensities of diffraction of x-rays or neutrons are proportional to the squared moduli of
the Fourier transform of the scattering density of the diffracting object. This corresponds to the Fourier
transform of a convolution, $P(\mathbf{s})$, of the form

$$P(\mathbf{s}) = \int \rho(\mathbf{r}) \rho(\mathbf{r} + \mathbf{s}) \, d\mathbf{r}. \qquad \text{(B1.8.8)}$$

The integrand in this expression will have a large value at a point $\mathbf{r}$ if $\rho(\mathbf{r})$ and
$\rho(\mathbf{r} + \mathbf{s})$ are both large, and $P(\mathbf{s})$ will be large if this condition is satisfied systematically over
all space. It is therefore a self- or autocorrelation function of $\rho(\mathbf{r})$. If $\rho(\mathbf{r})$ is periodic, as in a
crystal, $P(\mathbf{s})$ will also be periodic, with a large peak when $\mathbf{s}$ is a vector of the lattice and also will
have a peak when $\mathbf{s}$ is a vector between any two atomic positions. The function $P(\mathbf{s})$ is known as the
Patterson function, after A L Patterson [14], who introduced its application to the problem of crystal structure
determination.
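
The link between diffracted intensities and the autocorrelation function can be demonstrated in one dimension.
The sketch below, which assumes NumPy is available, samples one period of a hypothetical density containing two
point-like atoms, computes $P(s) = \sum_r \rho(r)\rho(r + s)$ directly, and confirms that the same function is
obtained by inverse Fourier transforming the squared moduli of the Fourier coefficients. The grid size, atom
positions and weights are arbitrary illustrative choices.

```python
import numpy as np


def patterson_direct(rho):
    """P(s) = sum over r of rho(r)*rho(r+s) for one period of a sampled density."""
    n = len(rho)
    return np.array([np.sum(rho * np.roll(rho, -s)) for s in range(n)])


def patterson_from_intensities(rho):
    """The same function obtained by inverse Fourier transforming |F|^2."""
    amplitudes = np.fft.fft(rho)
    return np.fft.ifft(np.abs(amplitudes) ** 2).real


if __name__ == "__main__":
    # Hypothetical one-dimensional cell of 100 grid points with two point atoms.
    rho = np.zeros(100)
    rho[10] = 1.0
    rho[40] = 2.0
    p = patterson_from_intensities(rho)
    print(np.nonzero(p > 1e-9)[0])   # origin peak plus the interatomic vectors
```

The peaks fall at the origin and at plus and minus the interatomic vector (grid points 30 and 70 in the
periodic cell), but not at the atomic positions themselves, since the phases of the amplitudes are lost when
only intensities are used.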

(A) DIFFRACTION FROM GLASSES 

There are two classes of solids that are not crystalline, that is, $\rho(\mathbf{r})$ is not periodic. The more familiar one
is a glass, for which there are again two models, which may be called the random network and the random
packing of hard spheres. An example of the first is silica glass or fused quartz. It consists of tetrahedral SiO$_4$
groups that are linked at their vertices by Si-O-Si bonds, but, unlike the various crystalline phases of SiO$_2$,
there is no systematic relation between the orientations of neighbouring tetrahedra. In the random packing of
spheres there is no regular arrangement of atoms even at short range and the coordination of any particular
atom may have a wide variety of configurations. The two types of glass have similar diffraction properties, so
we do not need to discuss them separately.

If the material is not periodic (but is isotropic), the integral in equation (B1.8.8) becomes spherically 
symmetric, and reduces for large values of s (= |s|) to a constant equal to the average value of ρ(r). In either 
the sphere-packing model or the random-network model, however, there is always a shortest interatomic 
distance and ρ(r) falls to a small value between the atoms. The integrand will then have, on average, small 
values when s is equal to an atomic radius and large values when s is equal to a typical interatomic distance. 
The integrand, and therefore P(s), will have smaller ripples as s increases through additional coordination 
shells. Because the diffracted intensity is proportional to the Fourier transform of P(s), it will also have broad 
maxima and minima as sin θ/λ increases. 

(B) QUASICRYSTALS 

The other type of noncrystalline solid was discovered in the 1980s in certain rapidly cooled alloy systems. D 
Shechtman and coworkers [15] observed electron diffraction patterns with sharp spots with fivefold rotational 
symmetry, a symmetry that had been, until that time, assumed to be impossible. It is easy to show that it is 
impossible to fill two- or three-dimensional space with identical objects that have rotational symmetries of 
orders other than two, three, four or six, and it had been assumed that the long-range periodicity necessary to 
produce a diffraction pattern with sharp spots could only exist in materials made by the stacking of identical 
unit cells. The materials that produced these diffraction patterns, but clearly could not be crystals, became 
known as quasicrystals. 

Although details of quasicrystal structure remain uncertain, the circumstances under which diffraction patterns 
with 'impossible' symmetries can occur have become clear [16]. It is impossible to construct an object that 
has long-range periodicity using identical units with these symmetries, but it is not necessary for the object 
itself to have that symmetry. It is only necessary that its Patterson function be symmetric. The electron 
diffraction patterns observed by Shechtman actually have the symmetry of a regular icosahedron, and it is 
possible to build a structure with this symmetry using two rhombohedra, each having faces whose acute angle 
corners have an angle, α, equal to 2 arctan[2/(1 + √5)] = 63.435°. One of them has three acute angle corners 
meeting at a vertex, making a prolate rhombohedron, while the other has three obtuse angle corners meeting at 
a vertex, making an oblate rhombohedron. Large objects made from these two rhombohedra contain vectors 
parallel to all of the fivefold axes of the regular icosahedron, although different subsets of them appear in 
different, finite regions. More importantly, although there is no long range periodicity, departures from 
periodicity are bounded, which produces, as in a crystal, families of parallel planes with alternately higher and 
lower density. This in turn produces the observed, sharp diffraction spots. 
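
The face angle quoted for the two rhombohedra is easy to verify. The short sketch below only evaluates the expression 2 arctan[2/(1 + √5)]; the variable names are illustrative.

```python
import math

# Acute face angle of the two rhombohedra, in degrees.
alpha = 2 * math.degrees(math.atan(2 / (1 + math.sqrt(5))))
# The obtuse corners of the same rhombic face are the supplement.
obtuse = 180.0 - alpha

print(round(alpha, 3), round(obtuse, 3))
```

Because 2/(1 + √5) is the reciprocal of the golden ratio, the same angle can be written as arctan 2, which gives a convenient cross-check.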


B1.8.3 STRUCTURE DETERMINATION 

We have thus far discussed the diffraction patterns produced by x-rays, neutrons and electrons incident on 
materials of various kinds. The experimentally interesting problem is, of course, the inverse one: given an 
observed diffraction pattern, what can we infer about the structure of the object that produced it? Diffraction 
patterns depend on the Fourier transform of a density distribution, but computing the inverse Fourier 
transform in order to determine the density distribution is difficult for two reasons. First, as can be seen from 
equation (B1.8.1), the Fourier transform is defined for all values of s, but it can be measured only for values of 
|s_i − s_f| less than 2/λ. For practical reasons λ cannot be arbitrarily small, so that the Fourier transform can 
never be measured over its entire range. Second, the value of the Fourier transform is in general a complex 
number. Denoting −h, −k and −l by h̄, k̄ and l̄, respectively, equation (B1.8.6) shows that, for a crystal, 
F(h̄k̄l̄) = F(hkl)*. Because the intensity is proportional to |F(hkl)|² = F(hkl)F(hkl)*, it will be the same 
independent of the phase of F(hkl). As a result, the structural information 
that can be determined is either restricted to averaged properties that do not depend on phase information, or 
the phase must be determined by methods other than the simple measurement of diffraction intensities. 
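
The loss of phase information can be made concrete with a hedged one-dimensional sketch. The scattering factors and coordinates below are hypothetical, and the point-atom structure factor used is the usual sum of f_j exp(2πihx_j), assumed here to match the form of equation (B1.8.6). A structure and its inverted image give identical intensities although their phases differ in sign.

```python
import cmath


def structure_factor(h, atoms):
    """1D point-atom structure factor: sum of f_j * exp(2*pi*i*h*x_j)."""
    return sum(f * cmath.exp(2j * cmath.pi * h * x) for f, x in atoms)


atoms = [(6.0, 0.10), (8.0, 0.37), (16.0, 0.62)]   # hypothetical (f_j, x_j)
inverted = [(f, -x) for f, x in atoms]             # the same atoms at -x_j

for h in range(1, 4):
    F = structure_factor(h, atoms)
    F_inv = structure_factor(h, inverted)
    print(h, round(abs(F) ** 2, 6), round(abs(F_inv) ** 2, 6),
          round(cmath.phase(F), 3), round(cmath.phase(F_inv), 3))
```

Each printed row shows equal intensities and opposite phases, so a measurement of intensities alone cannot tell the two arrangements apart.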

B1.8.3.1 SMALL-ANGLE SCATTERING 

Materials have many properties that are important, scientifically and technologically, that do not depend on 
the details of long-range structure. For example, consider a solution of globular macromolecules in a solvent 
of contrasting scattering density. If the solution is not too highly concentrated, so that intermolecular 
interactions can be neglected, the diffraction pattern will be the sum of the diffraction patterns of all 
individual molecules. Under these conditions all diffracted radiation makes a small angle with the incident 
beam. Although all molecules are identical, they can have all possible orientations relative to the incident 
beam, so the diffraction will be that from a spherically averaged distribution. The intensity of diffraction is 
proportional to the squared modulus of the Fourier transform of the density distribution, which is the Fourier 
transform of its Patterson function. An expression for the intensity, I(Q), can be derived by substituting P(r) 
for ρ(r) in equation (B1.8.2b), giving 

(B1.8.9a) 
where C is a scale factor dependent on the conditions of the experiment. This can be rewritten 

(B1.8.9b) 

The inverse of this Fourier sine transform is 

(B1.8.10a) 

It is conventional to express the structural information in terms of a pair distance distribution function, or 
PDDF [5], which is defined by p(r) = r P(r). Using this, equation (B1.8.10a) becomes 

(B1.8.10b) 

Equations (B1.8.9a) and (B1.8.10b) both involve integrals whose upper limits are infinite, but this 
does not present a serious problem, because P(r) is zero for r greater than the largest diameter of the molecule, 
and I(Q) has a value significantly different from zero only for small angles. The functions p(r) and I(Q) both 
contain information about the sizes and shapes of the molecules. It is customary to plot the logarithm of I(Q) 
as a function of Q and such a plot for a spherical molecule has broad maxima with sharp minima between 
them, while less symmetric molecules produce curves with smaller ripples or a smooth falloff with increasing 
angle. A useful property of the molecule is the radius of gyration, R_g, which is a measure of the distribution of 
scattering density. This may be determined from the relation 


$$R_g = \left[\frac{\int r^2\,p(r)\,\mathrm{d}r}{2\int p(r)\,\mathrm{d}r}\right]^{1/2}. \tag{B1.8.11}$$
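
As a hedged check on the radius-of-gyration relation, the sketch below assumes the widely used form R_g² = ∫ r² p(r) dr / (2 ∫ p(r) dr) and applies it to a uniform sphere. The sphere PDDF used here is the standard textbook expression with its usual r² weighting and arbitrary scale; both it and the function names are assumptions for illustration and are not taken from this section.

```python
import math


def sphere_pddf(r, radius):
    """PDDF of a uniform sphere (standard result, arbitrary scale)."""
    diameter = 2.0 * radius
    if r < 0.0 or r > diameter:
        return 0.0
    x = r / diameter
    return x * x * (1.0 - 1.5 * x + 0.5 * x ** 3)


def radius_of_gyration(pddf, r_max, steps=2000):
    """R_g from the two integrals over p(r), using the midpoint rule."""
    dr = r_max / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        p = pddf(r)
        numerator += r * r * p * dr
        denominator += p * dr
    return math.sqrt(numerator / (2.0 * denominator))


R = 25.0   # hypothetical sphere radius, in any length unit
rg = radius_of_gyration(lambda r: sphere_pddf(r, R), 2.0 * R)
print(rg, math.sqrt(3.0 / 5.0) * R)
```

The numerical value agrees with the analytical result for a sphere, R_g = (3/5)^(1/2) R. Note that the PDDF vanishes beyond the sphere diameter, which is why the integrals can be cut off at a finite upper limit.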


The volume of a uniform density molecule may be found from the relation 

$$V = C'\,\frac{I(0)}{\int_0^\infty Q^2 I(Q)\,\mathrm{d}Q} \tag{B1.8.12}$$

where C′ = 8π if the measurements are on an absolute scale. In practice, measurements are relative to a 
standard sample of known structure. 

Although this discussion has been in terms of molecules in solution, the same principles apply to other cases, 
such as precipitates in an alloy or composites of ceramic particles dispersed in a polymer. The density, ρ(r), is 
not relative to a vacuum, but is rather relative to a uniform medium. For x-rays this means electron densities, 
but for neutrons, because the atomic scattering factor is different from one isotope to another, an effect that is 
very large for the two stable isotopes of hydrogen, there can be wide variations in contrast depending on the 
isotopic compositions of the different components of the sample. This 'contrast variation' makes small-angle 
neutron scattering (SANS) a very versatile tool for the study of microstructure. 

It has been shown that spherical particles with a distribution of sizes produce diffraction patterns that are 
indistinguishable from those produced by triaxial ellipsoids. It is therefore possible to assume a shape and 
determine a size distribution, or to assume a size distribution and determine a shape, but not both 
simultaneously. 

B1.8.3.2 PAIR DISTRIBUTION FUNCTIONS 

Another application in which useful information can be obtained in the absence of knowledge of the phase of 
the Fourier transform is the study of glasses and of crystals that contain short-range order but are disordered 
over long ranges. Here the objective is to determine a pair distribution function (PDF) [17], which is a 
generalization of the Patterson function that describes the probability of finding pairs of atoms separated by a 
vector r_j − r_k. For various reasons these studies are most easily done with neutrons and most of them have 
been done with glasses. In a glass the long-range structure may be assumed to be isotropic. Setting 
r = |r_j − r_k|, the particle density, ρ(r), at distance r from another particle, can be represented to a good 
approximation by 

$$\rho(r) - \rho_0 = C \int_0^\infty Q\,[I(Q) - I_i(Q)]\,\sin(Qr)\,\mathrm{d}Q \tag{B1.8.13}$$

where ρ_0 is the mean overall density and I_i is an isotropic incoherent scattering that is the only source of 
scattering at sufficiently large Q. 

B1.8.3.3 CRYSTAL STRUCTURE DETERMINATION 

(A) TRIAL AND ERROR 

Laue's original experiment established that x-rays were short-wavelength electromagnetic radiation and that 
crystals were composed of periodically repeated arrays of identical units, but it did not establish any scale for 
the wavelength or the sizes of the crystalline units. W L Bragg [18] observed that the positions of the spots in 
a diffraction photograph produced by zincblende, ZnS, could be explained by a model (see figure B1.8.4) in 
which the fundamental units were arranged on a face-centred cubic (f.c.c.) lattice. The same model explained 
the patterns of sodium chloride, potassium bromide and potassium iodide. (Interestingly, Bragg's initial model 
for potassium chloride was based on what is now called a primitive cubic lattice. This was an artifact resulting 
from the near identity of the atomic scattering factors of potassium and chlorine.) By observing the relative 
intensities of the diffraction spots and applying elementary principles of group theory, Bragg proposed models 
for the arrangements of atoms that turned out to be correct. 


Figure B1.8.4. Two of the crystal structures first solved by W L Bragg. On the left is the structure of 
zincblende, ZnS. Each sulphur atom (large grey spheres) is surrounded by four zinc atoms (small black 
spheres) at the vertices of a regular tetrahedron, and each zinc atom is surrounded by four sulphur atoms. On 
the right is the structure of sodium chloride. Each chlorine atom (grey spheres) is surrounded by six sodium 
atoms (black spheres) at the vertices of a regular octahedron, and each sodium atom is surrounded by six 
chlorine atoms. 

At about the same time R A Millikan [19] measured the charge on the electron. Dividing this into the electric 
charge required to electroplate a gram atomic weight of silver yielded a value, good to within 1%, for 
Avogadro's number, the number of formula units in a mole. Knowledge of the density and of Avogadro's 
number leads immediately to knowledge of the size of the unit cell, and thence to knowledge of the 
wavelength of the radiation producing the diffraction pattern. Bragg's models for the first crystal structures 
were deduced from Laue photographs, which use continuum radiation, but discovery of the characteristic 
spectral lines of the elements followed soon after, and H G J Moseley [20] used them to straighten out several 
anomalies in the periodic table of the elements and to predict the existence of several elements that had not 
previously been observed. 

With the discovery of the x-ray line spectra it became possible to determine, at least relative to the slightly 
uncertain wavelengths, the sizes and also the symmetries, of the unit cells and the approximate sizes of the 
atoms. The theory of space groups, which had been worked out by mathematicians, principally A M 
Schönflies, in the 19th century, has always played a vital role in structural crystallography. With the structures 
of most of the solid elements and many of their binary compounds it was necessary only to calculate how 
many atoms would fit into the unit cell and to choose from a limited set of possible positions the ones that best 
accounted for the relative intensities of the diffraction spots. In sodium chloride, for example, the symmetry of 
the diffraction pattern shows that the unit cell is a cube and the fact that the indices, h, k and l, are either all 
odd or all even shows that the cube is face centred. There is room for only four atoms each of sodium and 
chlorine and it is observed that those reflections with the indices all even are much stronger than those with 
the indices all odd. This is consistent with a model that has sodium atoms at the corners and at the centres of 
the faces of the cube and chlorine atoms at the centres of the edges and at the body centre. 

Potassium chloride actually has the same structure as sodium chloride, but, because the atomic scattering 
factors of potassium and chlorine are almost equal, the reflections with the indices all odd are extremely weak, 
and could easily have been missed in the early experiments. The zincblende form of zinc sulphide, by 
contrast, has the same pattern of all odd and all even indices, but the pattern of intensities is different. This 
pattern is consistent with a model that again has zinc atoms at the corners and the face centres, but the sulphur 
positions are displaced by a quarter of the body diagonal from the zinc positions. 
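
The intensity pattern described for the sodium chloride model can be reproduced with a minimal sketch. The site lists follow the description above (sodium at the corners and face centres, chlorine at the edge centres and body centre); the constant scattering factors of 10 and 18 are rough, hypothetical stand-ins for the ions, and the point-atom sum is the usual structure-factor expression.

```python
import cmath

NA_SITES = [(0, 0, 0), (0.5, 0.5, 0), (0.5, 0, 0.5), (0, 0.5, 0.5)]
CL_SITES = [(0.5, 0, 0), (0, 0.5, 0), (0, 0, 0.5), (0.5, 0.5, 0.5)]


def structure_factor(h, k, l, f_na=10.0, f_cl=18.0):
    """Point-atom structure factor for the rock-salt arrangement."""
    total = 0j
    for f, sites in ((f_na, NA_SITES), (f_cl, CL_SITES)):
        for x, y, z in sites:
            total += f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
    return total


for hkl in [(2, 0, 0), (1, 1, 1), (1, 0, 0), (1, 1, 0)]:
    print(hkl, round(abs(structure_factor(*hkl)), 6))
```

All-even indices give the sum 4(f_Na + f_Cl), all-odd indices give the difference 4(f_Na − f_Cl), and mixed indices vanish. Setting the two scattering factors equal, as is nearly the case for potassium chloride, makes the all-odd reflections vanish as well.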

In all of these structures the atomic positions are fixed by the space group symmetry and it is only necessary 
to determine which of a small set of choices of positions best fits the data. According to the theory of space 
groups, all structures composed of identical unit cells repeated in three dimensions must conform to one of 
230 groups that are formed by combining one of 14 distinct Bravais lattices with other symmetry operations. 
These include rotation axes of orders two, three, four and six and mirror planes. They also include screw 
axes, in which a rotation operation is combined with a translation parallel to the rotation axis in such a way 
that repeated application becomes a translation of the lattice, and glide planes, where a mirror reflection is 
combined with a translation parallel to the plane of half of a lattice translation. Each space group has a 
general position in which the three position coordinates, x, y and z, are independent, and most also have 
special positions, in which one or more coordinates are either fixed or constrained to be linear functions of 
other coordinates. The properties of the space groups are tabulated in the International Tables for 
Crystallography vol A [21]. 

The first crystal structure to be determined that had an adjustable position parameter was that of pyrite, FeS2. 
In this structure the iron atoms are at the corners and the face centres, but the sulphur atoms are further away 
than in zincblende along a different threefold symmetry axis for each of the four iron atoms, which makes the 
unit cell primitive. 

Unfortunately for modern crystallographers, all of the crystal structures that could be solved by the 
choose-the-best-of-a-small-number-of-possibilities procedure had been solved by 1920. Bragg has been quoted as 
saying that the pyrite structure was 'very complicated', but he wrote, in about 1930, 'It must be realized, 
however, that (cases having one or two parameters) are still extremely simple. The more typical crystal may 
have ten, twenty, or forty parameters, to all of which values must be assigned before the analysis of the 
structure is complete.' This statement is read with amusement by a modern crystallographer, who routinely 
works with hundreds and frequently with thousands of parameters. 

(B) PATTERSON METHODS 

We have seen that the intensities of diffraction are proportional to the Fourier transform of the Patterson 
function, a self-convolution of the scattering matter, and that, for a crystal, the Patterson function is periodic in 
three dimensions. Because the intensity is a positive, real number, the Patterson function is not dependent on 
phase and it can be computed directly from the data. The squared structure amplitude is 

$$|F(hkl)|^2 = \frac{I(hkl)}{Lp} \tag{B1.8.14}$$

where I(hkl) is the integrated intensity of the hkl reflection, L is the so-called Lorentz factor, which depends 
on the experimental geometry and p is a polarization factor, which is equal to one for nuclear scattering of 
neutrons and depends on the scattering angle, 2θ, for x-rays. From this the Patterson function is 

$$P(x, y, z) = \sum_{hkl} |F(hkl)|^2 \cos[2\pi(hx + ky + lz)] \tag{B1.8.15}$$

where the sum is over all values of h, k and l. In practice I(hkl) can be measured only over a finite range of h, 
k and l and the resulting truncation introduces ripples into the Patterson function. 
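
A one-dimensional analogue of the cosine series shows both the interatomic peaks and the truncation ripples. In this sketch the two point atoms, the truncation limit H_MAX and the sampling grid are all hypothetical choices; the intensities are computed from the model rather than measured.

```python
import math

ATOMS = [(1.0, 0.10), (3.0, 0.40)]   # hypothetical (f_j, x_j) pairs
H_MAX = 20                           # assumed truncation limit on |h|
GRID = 100                           # samples of x across one unit cell


def intensity(h):
    """|F(h)|^2 for the point-atom model."""
    re = sum(f * math.cos(2 * math.pi * h * x) for f, x in ATOMS)
    im = sum(f * math.sin(2 * math.pi * h * x) for f, x in ATOMS)
    return re * re + im * im


def patterson(x):
    """Truncated 1D cosine series: sum of |F(h)|^2 cos(2*pi*h*x)."""
    return sum(intensity(h) * math.cos(2 * math.pi * h * x)
               for h in range(-H_MAX, H_MAX + 1))


P = [patterson(i / GRID) for i in range(GRID)]
# Local maxima on the periodic grid (P[-1] wraps around to the last sample).
maxima = [i for i in range(GRID) if P[i] > P[i - 1] and P[i] > P[(i + 1) % GRID]]
strongest = sorted(sorted(maxima, key=lambda i: P[i], reverse=True)[:3])
print(strongest)
```

The three strongest maxima are the origin peak and the pair at ±0.30, the vector between the two atoms. The many weaker local maxima in the list `maxima` are the ripples produced by cutting the sum off at H_MAX.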

The Patterson function has peaks corresponding to all interatomic vectors in the density function, the height of 
a peak being proportional to the product of the atomic scattering factors of the two atoms. Thus, although the 
Patterson function contains superpositions of the structure as if each atom is in turn placed at the origin and, 
therefore, has so many peaks that it is difficult to interpret except for very simple structures, there are several 
features that give important information about the underlying density function. If one or two of the atoms in 
the unit cell have much higher atomic numbers and, therefore, large values of the atomic scattering factors for 
x-rays, the peaks in the Patterson function that correspond to vectors between them will stand out from the 
rest. Peaks corresponding to vectors between the heavy atoms and lighter ones will also be higher than those 
corresponding to vectors between lighter atoms, which may reveal features of the environment of the heavy 
atom. If neutron diffraction is used to study crystals that contain atoms with negative scattering factors, 
especially hydrogen, but also manganese and titanium, the Patterson function will have negative regions 
corresponding to vectors between the negative scatterers and other atoms. 

If the space group contains screw axes or glide planes, the Patterson function can be particularly revealing. 
Suppose, for example, that parallel to the c axis of the crystal there is a 2₁ screw axis, one that combines a 
180° rotation with a translation of c/2. Then for an atom at position (x, y, z) there will be another at 
(−x, −y, z + 1/2). The section of the Patterson function at z = 1/2 will therefore contain a peak at position 
(2x, 2y) for every atom in the unique part of the cell, the asymmetric unit. Because this property was first 
applied to structure determination by D Harker, these special sections of Patterson functions are known as 
Harker sections [22]. If there is a single heavy atom in the asymmetric unit, the Harker section can completely 
determine the position of the atom. This plays a critical role in the method of isomorphous replacement, which 
we discuss below. 
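
The Harker-section argument amounts to a small piece of vector arithmetic, sketched below under the assumption of a 2₁ screw axis along c passing through the origin. The fractional coordinates are hypothetical and the function names are illustrative.

```python
def screw_mate(x, y, z):
    """Image of (x, y, z) under a 2_1 screw axis along c, reduced to the cell."""
    return (-x % 1.0, -y % 1.0, (z + 0.5) % 1.0)


def harker_vector(x, y, z):
    """Interatomic vector from the screw-related mate to the atom itself."""
    xm, ym, zm = screw_mate(x, y, z)
    return ((x - xm) % 1.0, (y - ym) % 1.0, (z - zm) % 1.0)


atom = (0.12, 0.31, 0.77)            # hypothetical heavy-atom position
u, v, w = harker_vector(*atom)
print(round(u, 6), round(v, 6), round(w, 6))
```

The vector lies in the section w = 1/2 at (2x, 2y), so halving the peak coordinates recovers x and y, up to the ambiguity of adding 1/2 that follows from working modulo the lattice. The z coordinate is not fixed by this section, consistent with the free choice of origin along a screw axis.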

(C) MORE TRIAL AND ERROR 

With diffraction data alone, in the absence of phase information, it is always possible to put restrictions on the 
choice of space group and in many cases it is possible to determine the space group uniquely. Careful 
measurement of the positions of diffraction spots determines the dimensions of the unit cell and assigns it to 
one of seven symmetry systems, triclinic, monoclinic, orthorhombic, trigonal, tetragonal, hexagonal, and 
cubic. The 14 Bravais lattices divide into five basic types, designated primitive, single-face centred, all-face 
centred, body centred and rhombohedral, which can be distinguished by special patterns of observed and 
unobserved reflections. We have already discussed the all-face centred lattice, in which the indices are either 
all odd or all even. In a body centred cell the sum of the indices is always even, while in a primitive cell there 
are no restrictions. 

If one or two of the indices are zeros, there may be additional restrictions. We have seen that a 2₁ screw axis 
parallel to the c axis of the unit cell produces pairs of atoms at (x, y, z) and (−x, −y, z + 1/2). From equation 
(B1.8.6) we can write 

$$F(00l) = C\sum_{j=1}^{N/2} f_j(00l)\,\{\exp(2\pi i l z_j) + \exp[2\pi i l (z_j + \tfrac{1}{2})]\} \tag{B1.8.16a}$$

which can be written 

$$F(00l) = C\sum_{j=1}^{N/2} f_j(00l)\,[1 + (-1)^l]\,\exp(2\pi i l z_j). \tag{B1.8.16b}$$

All terms in the sum vanish if l is odd, so (00l) reflections will be observed only if l is even. Similar 
restrictions apply to classes of reflections with two indices equal to zero for other types of screw axis and to 
classes with one index equal to zero for glide planes. These systematic absences, which are tabulated in the 
International Tables for Crystallography vol A, may be used to identify the space group, or at least limit the 
choices. 
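
The systematic absence produced by a 2₁ screw axis can be checked numerically. The sketch below sums over hypothetical atoms in the asymmetric unit together with their screw-related mates at z + 1/2; scattering factors are treated as constants and the scale factor is omitted, both simplifying assumptions.

```python
import cmath


def f_00l(l, atoms):
    """F(00l) summed over each atom and its 2_1 screw mate at z + 1/2."""
    total = 0j
    for f, (x, y, z) in atoms:
        total += f * (cmath.exp(2j * cmath.pi * l * z)
                      + cmath.exp(2j * cmath.pi * l * (z + 0.5)))
    return total


# Hypothetical asymmetric unit: (f_j, (x_j, y_j, z_j)).
atoms = [(6.0, (0.10, 0.20, 0.30)), (8.0, (0.40, 0.15, 0.05))]

for l in range(1, 7):
    print(l, round(abs(f_00l(l, atoms)), 6))
```

The odd-l amplitudes vanish to rounding error whatever the z coordinates are, because each pair contributes the factor 1 + (−1)^l, while the even-l amplitudes depend on the structure.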

The presence of a 2₁ screw axis and a glide plane perpendicular to it implies also the existence of a centre of 
symmetry, so that, for an atom at (x, y, z), there is another one at (−x, −y, −z). Equation (B1.8.6) can then be 
written 

$$F(hkl) = C\sum_{j=1}^{N/2} f_j(hkl)\,\{\exp[2\pi i(hx_j + ky_j + lz_j)] + \exp[-2\pi i(hx_j + ky_j + lz_j)]\}. \tag{B1.8.17}$$

The two exponential terms are complex conjugates of one another, so that all structure amplitudes must be 
real and their phases can therefore be only zero or π. (Nearly 40% of all known structures belong to 
monoclinic space group P2₁/c. The systematic absences of (0k0) reflections when k is odd and of (h0l) 
reflections when l is odd identify this space group and show that it is centrosymmetric.) Even in the absence 
of a definitive set of systematic absences it is still possible to infer the (probable) presence of a centre of 
symmetry. A J C Wilson [23] first observed that the probability distribution of the magnitudes of the structure 
amplitudes would be different if the amplitudes were constrained to be real from that if they could be 
complex. Wilson and co-workers established a procedure by which the frequencies of suitably scaled values 
of |F| could be compared with the theoretical distributions for centrosymmetric and noncentrosymmetric 
structures. (Note that Wilson named the statistical distributions centric and acentric. These were not intended 
to be synonyms for centrosymmetric and noncentrosymmetric, but they have come to be used that way.) 

The knowledge that a crystal structure is centrosymmetric reduces the phase problem to one of determining 
signs, but it is still a formidable one. An extended trial-and-error method uses all available information, 
including that derived from Patterson methods, numbers of special positions in the unit cell, known 
interatomic distances, likely group configurations etc, to guess a trial structure and compute from it a set of 
signs, which are then used to compute a density map, or, more likely, a difference map, in which the Fourier 
coefficients are the differences between the values of F(hkl) computed from the trial structure and their 
observed values. Features of the difference map suggest modifications to the trial structure and a new set of 
signs is used to compute an updated map. With luck, this procedure will converge in a few iterations to a 
reasonable structure. 

(D) DIRECT METHODS 

As the number of atoms in the asymmetric unit increases, the solution of a structure by any of these phase- 
independent methods becomes more difficult, and by 1950 a PhD thesis could be based on a single crystal 
structure. At about that time, however, several groups observed that the fact that the electron density must be 
non-negative everywhere could be exploited to place restrictions on possible phases. The first use of this fact 
was by D Harker and J S Kasper [24], but their relations were special cases of more general relations 
introduced by J Karle and H Hauptman [25]. Denoting by h_i the set of indices h_i, k_i, l_i, the Karle-Hauptman 
condition states that all matrices of the form 

$$\begin{pmatrix}
F(0) & F(h_1) & F(h_2) & \cdots & F(h_n) \\
F^*(h_1) & F(0) & F(h_2 - h_1) & \cdots & F(h_n - h_1) \\
F^*(h_2) & F^*(h_2 - h_1) & F(0) & \cdots & F(h_n - h_2) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
F^*(h_n) & F^*(h_n - h_1) & F^*(h_n - h_2) & \cdots & F(0)
\end{pmatrix}$$

must be positive definite. Defining U(h_i) by U(h_i) = F(h_i)/F(0) and taking a 3 x 3 matrix for an example, this 
condition implies that the determinant 

$$D(h_1, h_2) = \begin{vmatrix}
1 & U(h_1) & U(h_2) \\
U^*(h_1) & 1 & U(h_2 - h_1) \\
U^*(h_2) & U^*(h_2 - h_1) & 1
\end{vmatrix} \geq 0. \tag{B1.8.18}$$

From this some tedious but straightforward algebra leads to 

$$\left|U(h_2 - h_1) - U^*(h_1)\,U(h_2)\right|^2 \leq \left[1 - |U(h_1)|^2\right]\left[1 - |U(h_2)|^2\right]. \tag{B1.8.19}$$

The two factors on the right are both positive, real numbers less than one. If the magnitudes of U(h_1) and 
U(h_2) are both close to one, therefore, the magnitude of the difference between the terms within the brackets on 
the left (complex numbers in general) must be small. 

Karle and Hauptman showed that the fact that the crystal is composed of discrete atoms implies that large 
enough determinants of the type in inequality (B1.8.18) must vanish, leading to exact relations among sets of 
phases. For structures with moderately large numbers of atoms in the asymmetric unit 'large enough' may be 
very large indeed, and relations such as inequality (B1.8.19) may not represent much of a restriction on the 
phase of U(h_2 − h_1). Further development of these principles by Hauptman, Karle, I L Karle, M M Woolfson 
and many others [26] showed that, although no one of these relations would put a significant restriction on a 
phase, reasonable assumptions about probability distributions would lead to statistical tests that assigned high 
probabilities to sufficient numbers of phases so that the correct structure could be identified in a density map. 
These developments, together with the revolution in computing power, have made the solution of structures 
with up to several hundred atoms in the asymmetric unit a matter of routine. 
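
The 3 x 3 Karle-Hauptman determinant can be explored with a one-dimensional point-atom sketch. The atoms and weights below are hypothetical, and the determinant is written out in its expanded form, 1 − |a|² − |b|² − |c|² + 2 Re(a c b*), with a = U(h_1), b = U(h_2) and c = U(h_2 − h_1).

```python
import cmath


def unitary_sf(h, atoms):
    """U(h) = F(h)/F(0) for 1D point atoms given as (f_j, x_j) pairs."""
    f0 = sum(f for f, _ in atoms)
    return sum(f * cmath.exp(2j * cmath.pi * h * x) for f, x in atoms) / f0


def kh_determinant(h1, h2, atoms):
    """Expanded 3 x 3 determinant D(h1, h2) built from U(h1), U(h2), U(h2 - h1)."""
    a = unitary_sf(h1, atoms)
    b = unitary_sf(h2, atoms)
    c = unitary_sf(h2 - h1, atoms)
    return (1 - abs(a) ** 2 - abs(b) ** 2 - abs(c) ** 2
            + 2 * (a * c * b.conjugate()).real)


three_atoms = [(6.0, 0.10), (8.0, 0.37), (16.0, 0.62)]   # hypothetical
two_atoms = [(6.0, 0.10), (8.0, 0.37)]                   # hypothetical

print(kh_determinant(1, 3, three_atoms))
print(kh_determinant(1, 3, two_atoms))
```

With three atoms the determinant is non-negative for every pair of indices tried. With only two atoms the 3 x 3 determinant vanishes to rounding error, a small-scale instance of the statement that large enough determinants must vanish for a structure made of discrete atoms.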

(E) ISOMORPHOUS REPLACEMENT 

While direct methods have opened up structural chemistry with hundreds of atoms in the asymmetric unit, 
many of the most interesting studies are of biological macromolecules, particularly proteins, which may have 
thousands of atoms in the asymmetric unit. Furthermore, all biological molecules are chiral, which means that 
the space groups in which they crystallize can never possess centres of symmetry or mirror (or glide) planes. 
Although the phases of some sets of reflections may be restricted by symmetry, most structure amplitudes are 
complex and with large structures the statistical techniques do not supply sufficient information to be useful. 
The first successful method of determining phases in macromolecular structure studies was the method of 
isomorphous replacement, in which a crystal of a protein is treated chemically to incorporate a small number 
of heavy atoms into the crystal without disturbing very much the arrangement of the protein molecules. In 
favourable cases two or more heavy-atom derivatives can be prepared in which the arrangements of the heavy 
atoms are different. The contribution of the protein molecule to the structure amplitude is assumed to be the 
same in the derivatives as in the native protein and the interatomic vectors of the heavy atoms stand out 
sufficiently in a Patterson map to allow the heavy-atom positions to be determined. 

Referring to figure B1.8.5, the radii of the three circles are the magnitudes of the observed structure amplitudes 
of a reflection from the native protein, F, and of the same reflection from two heavy-atom derivatives, F_d1 
and F_d2. We assume that we have been able to determine the heavy-atom positions in the derivatives and F_h1 
and F_h2 are the calculated heavy-atom contributions to the structure amplitudes of the derivatives. The centres 
of the derivative circles are at points −F_h1 and −F_h2 in the complex plane, and the three circles intersect at one 
point, which is therefore the complex value of F. The phases for as many reflections as possible can then be 
used to compute a density map. The protein molecule is a chain of amino acid residues, whose sequence can 
be determined by biochemical means and each residue consists of a backbone portion, whose length is 
essentially the same for all amino acids, and a side chain. With lots of both skill and luck, and sophisticated 
computer hardware and software, a model of the chain can be fitted into the density map to obtain a trial 
structure that can be refined. 

Figure B1.8.5. F, F_d1 and F_d2 are the measured structure amplitudes of a reflection from a native protein 
and from two heavy-atom derivatives. F_h1 and F_h2 are the heavy atom contributions. The point at which the 
three circles intersect is the complex value of F. 
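
The three-circle construction can be sketched as a short calculation in the complex plane. The code assumes error-free amplitudes and the relation F_d = F + F_h for each derivative; the function names and all numerical values are hypothetical. It intersects the native circle with the first derivative circle and uses the second derivative to choose between the two candidate points.

```python
import cmath
import math


def circle_intersections(c0, r0, c1, r1):
    """Intersection points of two circles in the complex plane (possibly none)."""
    d = abs(c1 - c0)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []
    a = (r0 ** 2 - r1 ** 2 + d ** 2) / (2 * d)    # distance from c0 to the chord
    half_chord = math.sqrt(max(r0 ** 2 - a ** 2, 0.0))
    unit = (c1 - c0) / d
    mid = c0 + a * unit
    return [mid + 1j * half_chord * unit, mid - 1j * half_chord * unit]


def solve_native(amp_native, amp_d1, amp_d2, f_h1, f_h2):
    """Complex native amplitude consistent with both derivative circles."""
    candidates = circle_intersections(0j, amp_native, -f_h1, amp_d1)
    return min(candidates, key=lambda f: abs(abs(f + f_h2) - amp_d2))


# Hypothetical "true" values used to generate consistent amplitudes.
f_true = cmath.rect(40.0, 1.1)
f_h1, f_h2 = 12 + 5j, -7 + 9j
f_est = solve_native(abs(f_true), abs(f_true + f_h1), abs(f_true + f_h2), f_h1, f_h2)
print(round(abs(f_est), 6), round(cmath.phase(f_est), 6))
```

With a single derivative the two intersection points leave a twofold phase ambiguity; the second derivative removes it provided its circle centre is not collinear with the other two. With real, noisy data the circles do not meet exactly, and `solve_native` would raise an error if the first two circles failed to intersect, so this is only a geometric illustration.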

(F) MULTIPLE-WAVELENGTH ANOMALOUS DIFFRACTION 

A technique that employs principles similar to those of isomorphous replacement is multiple-wavelength 
anomalous diffraction (MAD) [27]. The expression for the atomic scattering factor in equation (B1.8.2b) is 
strictly accurate only if the x-ray wavelength is well away from any characteristic absorption edge of the 
element, in which case the atomic scattering factor is real and F(h̄k̄l̄) = F(hkl)*. Since the diffracted 
intensity is proportional to |F(hkl)|², the diffraction process in effect introduces a centre of symmetry into all 
data, a fact that is known as Friedel's law. If the wavelength is near an absorption edge, however, the atomic 
scattering factor becomes complex and the phases of the contributions of an atom to F(hkl) and F(h̄k̄l̄) differ. 
The increasingly widespread availability of synchrotron radiation has made it possible to collect diffraction 
data at several wavelengths, including some near an absorption edge of one or more of the elements in the 
crystal. The differences between the intensities of F(hkl) and F(h̄k̄l̄) then serve in a role similar to that played by 
the differences between the intensities of native and derivative data in isomorphous replacement. 

B1.8.4 EXPERIMENTAL TECHNIQUES 

There are many experimental techniques for diffraction studies, depending on whether the materials 
producing the diffraction are crystalline or amorphous solids, liquids or gases. Crystalline materials are further 
subdivided according to whether the sample is a single crystal or a powder composed of many small crystals, 
frequently of more than one phase. All techniques include a source of radiation, a system for holding and 
manipulating the sample and a means of detecting the scattered radiation. 

B1.8.4.1 SOURCES OF RADIATION 

(A) X-RAYS 

X-rays for diffraction are generated in two ways. The most common is to bombard a metallic anode in a
vacuum tube with electrons emitted thermionically from a hot cathode, thereby exciting the characteristic
radiation from the anode material, which is usually copper or molybdenum, although some other metals are
used for special purposes. If the accelerating voltage in the tube is well above that required to eject a K shell
electron from an atom of the anode material, most of the x-radiation emitted will be in the characteristic lines
of the K series on top of a continuous Bremsstrahlung spectrum. Kβ and higher energy lines may be filtered
out using a suitable metallic filter, or the characteristic line may be selected by reflection from a
monochromator crystal.

The other type of x-ray source is an electron synchrotron, which produces an extremely intense, highly
polarized and, in the direction perpendicular to the plane of polarization, highly collimated beam. The energy
spectrum is continuous up to a maximum that depends on the energy of the accelerated electrons, so that
x-rays for diffraction experiments must either be reflected from a monochromator crystal or used in the Laue
mode. Whereas diffraction instruments using vacuum tubes as the source are available in many institutions
worldwide, there are synchrotron x-ray facilities only in a few major research institutions. There are
synchrotron facilities in the United States, the United Kingdom, France, Germany and Japan.

(B) NEUTRONS 

Neutrons for diffraction experiments are also produced in two ways. Thermal neutrons from a nuclear reactor
are reflected from a monochromator crystal and Bragg's law is satisfied for neutrons scattered from the
sample by measuring the scattering angle, 2θ. In a spallation source short pulses of protons bombard a heavy
metal target and high energy neutrons are produced by nuclear reactions. These neutrons interact with a
moderator, giving a somewhat longer pulse of neutrons with a spectrum that extends down to thermal energies
and therefore to wavelengths up to a few ångströms. Diffraction from a sample a few metres away from the
moderator is observed at a fixed angle and the relation between wavelength and velocity causes Bragg's law
to be satisfied at some time after the initial pulse.
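
The time-of-flight relation can be made concrete with a short Python sketch. It assumes the de Broglie relation λ = h/(m v) with v = L/t, and first-order Bragg reflection λ = 2d sin θ. The flight path, arrival time and detector angle in the example are arbitrary illustrative numbers, not parameters of any particular instrument, and the function names are hypothetical.

```python
import math

PLANCK = 6.62607015e-34           # Planck constant, J s
NEUTRON_MASS = 1.67492749804e-27  # neutron rest mass, kg

def tof_wavelength(flight_path_m, time_s):
    """de Broglie wavelength (m) of a neutron that covers
    flight_path_m in time_s: lambda = h * t / (m * L)."""
    return PLANCK * time_s / (NEUTRON_MASS * flight_path_m)

def tof_d_spacing(flight_path_m, time_s, two_theta_deg):
    """Interplanar spacing (m) that satisfies first-order Bragg's law,
    lambda = 2 d sin(theta), for a detector fixed at two_theta_deg."""
    theta = math.radians(two_theta_deg) / 2.0
    return tof_wavelength(flight_path_m, time_s) / (2.0 * math.sin(theta))

# Assumed example: 10 m total flight path, neutron detected 5 ms after the pulse,
# detector fixed at 2-theta = 90 degrees.
wavelength = tof_wavelength(10.0, 5.0e-3)
d_spacing = tof_d_spacing(10.0, 5.0e-3, 90.0)
print(f"wavelength = {wavelength * 1e10:.3f} angstrom")
print(f"d-spacing  = {d_spacing * 1e10:.3f} angstrom")
```

Each arrival time at the fixed detector therefore maps onto one value of d, which is why a whole pattern is recorded without moving the detector.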

As with synchrotron x-rays, neutron diffraction facilities are available at only a few major research
institutions. There are research reactors with diffraction facilities in many countries, but the major ones are in
North America, Europe and Australia. There are fewer spallation sources, but there are major ones in the United
States and the United Kingdom.

(C) ELECTRONS


As noted earlier, most electron diffraction studies are performed in a mode of operation of a transmission 
electron microscope. The electrons are emitted thermionically from a hot cathode and accelerated by the 
electric field of a conventional electron gun. Because of the very strong interactions between electrons and 
matter, significant diffracted intensities can also be observed from the molecules of a gas. Again, the source of 
electrons is a conventional electron gun. 

B1.8.4.2 DETECTORS

Detectors for the three types of radiation are similar and may be classified in two categories, photographic and 
electronic. In addition to photographic films and plates, photographic detectors also include fluorescent 
screens and image plates, in which x-rays produce a latent image in a storage phosphor. In the dark the 
phosphor emits radiation very slowly, but exposure to light from a laser stimulates fluorescence, which then 
can be observed by a photomultiplier tube and converted to an electronic signal. Because neutrons interact 
weakly with most materials, image plates and fluorescent screens must contain one of the elements, such as 
gadolinium, that have isotopes with high absorption cross-sections. Photographic detection of neutrons 
usually uses a fluorescent screen to enhance the image. 

There are many types of electronic detector. The original form of electronic detector was the Geiger counter,
but it was replaced many years ago by the proportional counter, which allows selection of radiation of a
particular type or energy. Proportional counters for x-rays are filled with a gas such as xenon, and those for
neutrons are filled with a gas containing a neutron-absorbing isotope, usually ³He. Recently these gases have
been used to construct position-sensitive area detectors for both x-rays and neutrons.

B1.8.4.3 SINGLE-CRYSTAL DIFFRACTION

Many different geometrical arrangements are commonly used for measurements of diffraction of x-rays and
neutrons from single crystals. All have a mechanism for setting the crystal so that Bragg's law is satisfied for
some set of crystal planes and for placing a detector in the proper position to observe the reflection. In the
original x-ray diffraction experiments of Laue and co-workers the x-rays had a broad spectral distribution, so
that for any angular position of a crystal and any interplanar spacing there were x-rays with the proper
wavelength to satisfy Bragg's law. Laue photographs reveal the internal symmetry of the crystal and are
therefore used to determine the symmetry and orientation of the crystal. For crystal structure determination it
is necessary to measure accurate intensities and it is usual to use a monochromatic beam of x-rays or neutrons.
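
The way a broad spectrum selects its own wavelengths can be sketched in a few lines of Python. The sketch assumes Bragg's law in the form nλ = 2d sin θ and lists every order n whose wavelength falls inside the available spectral band. The spacing, angle and band limits are arbitrary illustrative values, and the function name is hypothetical.

```python
import math

def laue_orders(d_angstrom, theta_deg, lambda_min, lambda_max):
    """Return [(n, wavelength)] for every order n with
    n * wavelength = 2 d sin(theta) and lambda_min <= wavelength <= lambda_max."""
    path = 2.0 * d_angstrom * math.sin(math.radians(theta_deg))
    n_low = max(1, math.ceil(path / lambda_max))   # longest usable wavelength
    n_high = math.floor(path / lambda_min)         # shortest usable wavelength
    return [(n, path / n) for n in range(n_low, n_high + 1)]

# Assumed example: d = 4 angstrom planes at a Bragg angle of 15 degrees.
# A broad band (0.5 to 2.0 angstrom) superposes several orders on one spot ...
print(laue_orders(4.0, 15.0, 0.5, 2.0))
# ... whereas a narrow band (0.9 to 1.1 angstrom) admits a single order.
print(laue_orders(4.0, 15.0, 0.9, 1.1))
```

With a broad band several orders land on the same spot, which is why accurate intensity measurements usually rely on monochromatic radiation.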

For diffraction studies with monochromatic radiation, the crystal is commonly mounted on an Eulerian cradle, 
which can rotate the crystal so that the normal to any set of planes bisects the angle between the incident and 
reflected beams, which is set for reflection from planes with a particular value of the interplanar spacing, d. 

If the detection system is an electronic area detector, the crystal may be mounted with a convenient crystal
direction parallel to an axis about which it may be rotated under the control of a computer that also records the
diffracted intensities. Because the orientation of the crystal is known at the time an x-ray photon or neutron is
detected at a particular point on the detector, the indices of the crystal planes causing the diffraction are
uniquely determined. If films are used, additional information is needed to index the pattern. The crystal is
either oscillated through a narrow angular range, so that only a small number of planes can come into the
reflecting position, or the film is moved, with a mask covering all but a small part of it, so that the exposed
part of it is coordinated with the angular position of the crystal. There are two common moving-film methods:
the Weissenberg method, in which a cylindrical film is moved parallel to the rotation axis of the crystal, and
the precession method, in which the normal to a reciprocal lattice plane is moved along a circular cone while a
flat film and a circular slit are both moved in such a way that the positions of the spots on the film correspond
to the points of the reciprocal lattice plane.

One form of electron diffraction is similar to the precession method, except that the 'single crystal' is a grain
of a polycrystalline foil. Figure B1.8.6 shows an electron diffraction pattern produced when the beam is
directed down a fivefold symmetry axis of a quasicrystal. Because of the very short wavelength the cone
angle is so small that it lies within the mosaic spread of the grain, and the resulting diffraction pattern, after
magnification by the electron optics, closely resembles a precession pattern made with x-rays. In this
technique the divergence of the electron beam is extremely small, and the diffraction spots correspond to
lattice points in a plane of the reciprocal lattice that passes through the origin. The diffraction pattern therefore
has a centre of symmetry. In convergent beam electron diffraction (CBED) [28] (see figure B1.8.7) the
divergence of the electron beam is still only a few tenths of a degree, but the resultant smearing of the Ewald
sphere allows it to intersect layers of the reciprocal lattice adjacent to the one passing through the origin, so
that a region of broadened diffraction spots is surrounded by one or more rings of additional spots
corresponding to points in these adjacent planes. Because Friedel's law does not apply in those planes, the
pattern more closely reflects the true symmetry of the crystal.



Figure B1.8.6. An electron diffraction pattern looking down the fivefold symmetry axis of a quasicrystal.
Because Friedel's law introduces a centre of symmetry, the symmetry of the pattern is tenfold. (Courtesy of L
Bendersky.)

Figure B1.8.7. A convergent beam diffraction pattern of the fivefold axis of a quasicrystal, as in figure
B1.8.6. The diffraction rings show that the symmetry is fivefold, not tenfold. (Courtesy of L Bendersky.)

An experimental technique that is useful for structure studies of biological macromolecules and other crystals
with large unit cells uses neither the broad, 'white', spectrum characteristic of Laue methods nor a sharp,
monochromatic spectrum, but rather a spectral band with Δλ/λ ≈ 20%. Because of its relation to the Laue
method, this technique is called quasi-Laue. It was believed for many years that the Laue method was not
useful for structure studies because reflections of different orders would be superposed on the same point of a
film or an image plate. It was realized recently, however, that, if there is a definite minimum wavelength in
the spectral band, more than 80% of all reflections would contain only a single order. Quasi-Laue methods are
now used with both neutrons and x-rays, particularly x-rays from synchrotron sources, which give an intense,
white spectrum.

B1.8.4.4 POWDER DIFFRACTION

Many scientifically and technologically important substances cannot be prepared as single crystals large
enough to be studied by single-crystal diffraction of x-rays and, especially, neutrons. If a sample composed of
a very large number (of order 10¹⁰ or more) of very small (10 μm or smaller) crystals is irradiated by a
monochromatic beam of x-rays or neutrons, there will be some crystals with the right orientation to reflect
from all possible sets of crystal planes with interplanar spacings greater than λ/2. The resulting diffraction
pattern contains intensity peaks that are characteristic for any crystalline compound. The pattern corresponds
to a uniform distribution of reciprocal lattice points on the surface of a sphere and analysis of the pattern to
determine the size and shape of the unit cell can be a difficult (but generally computationally tractable)
problem. Nevertheless, in addition to structure determination, powder diffraction is an extremely powerful
tool for phase identification and, because a mixture of crystalline phases will give the characteristic patterns of
all phases present, quantitative phase analysis. Also if, because of mechanical deformation, for example, the
powder sample is not spherically uniform, powder diffraction can reveal the nature of the preferred
orientation. Because of the weak interaction and, therefore, high penetration of the neutron, neutron powder
diffraction is particularly useful for preferred orientation (texture) studies of bulk materials.
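
The condition that only spacings greater than λ/2 can reflect is easy to see in a small Python sketch. It assumes a hypothetical primitive cubic cell, for which d = a/√(h² + k² + l²), and lists the 2θ positions of the allowed powder lines. The cell edge is an arbitrary illustrative value, the wavelength is that of Cu Kα1 radiation, and systematic absences of real lattice types are ignored.

```python
import math

def cubic_powder_peaks(a, wavelength, max_index=4):
    """Return [(d, two_theta_deg)] for a primitive cubic cell of edge a.

    Only spacings with d > wavelength / 2 can satisfy Bragg's law
    (lambda = 2 d sin(theta)); symmetry-equivalent planes are merged.
    """
    peaks = {}
    for h in range(max_index + 1):
        for k in range(h + 1):
            for l in range(k + 1):
                s = h * h + k * k + l * l
                if s == 0:
                    continue
                d = a / math.sqrt(s)
                if d <= wavelength / 2.0:
                    continue  # no Bragg angle exists for this spacing
                sin_theta = wavelength / (2.0 * d)
                peaks.setdefault(s, (d, 2.0 * math.degrees(math.asin(sin_theta))))
    return [peaks[s] for s in sorted(peaks)]

# Assumed example: a = 4.0 angstrom, Cu K-alpha1 wavelength 1.5406 angstrom.
for d, two_theta in cubic_powder_peaks(4.0, 1.5406):
    print(f"d = {d:.4f} angstrom   2-theta = {two_theta:7.3f} deg")
```

Indexing an unknown pattern is the inverse of this calculation, which is why it can be difficult for cells of low symmetry.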

X-ray powder diffraction studies are performed both with films and with counter diffractometers. The powder
photograph was developed by P Debye and P Scherrer and, independently, by A W Hull. The Debye-Scherrer
camera has a cylindrical specimen surrounded by a cylindrical film. In another commonly used powder
camera, developed by A Guinier, a convergent beam from a curved crystal monochromator passes through a
thin, flat sample and is focused on the film. The common x-ray powder diffractometer uses so-called
Bragg-Brentano (although it was apparently developed by W Parrish) focusing. A divergent beam from the line
focus of an x-ray tube is reflected from a flat sample and comes to an approximate focus at a receiving slit.

Powder diffraction studies with neutrons are performed both at nuclear reactors and at spallation sources. In
both cases a cylindrical sample is observed by multiple detectors or, in some cases, by a curved,
position-sensitive detector. In a powder diffractometer at a reactor, collimators and detectors at many different 2θ
angles are scanned over small angular ranges to fill in the pattern. At a spallation source, pulses of neutrons of
different wavelengths strike the sample at different times and detectors at different angles see the entire
powder pattern, also at different times. These slightly displaced patterns are then 'time focused', either by
electronic hardware or by software in the subsequent data analysis.


B1.8.5 FRONTIERS

Starting from the truly heroic solution of the structure of penicillin by D C Hodgkin (née Crowfoot) and
coworkers [29], x-ray diffraction has been the means of molecular structure determination (and the basis of
many Nobel prizes in addition to Hodgkin's) of important compounds, including natural products, of which
minute quantities were available for analysis, enabling chemical synthesis and further study. Knowledge of
the molecular structure leads in turn to an understanding of reaction mechanisms and, in the case of biological
molecules in particular, to an understanding of enzyme function and how drugs can be designed to promote
desirable reactions and inhibit undesirable ones.

The development of neutron diffraction by C G Shull and coworkers [30] led to the determination of the 
existence, previously only a hypothesis, of antiferromagnetism and ferrimagnetism. More recently neutron 
diffraction, because of its sensitivity to light elements in the presence of heavy ones, played a crucial role in 
demonstrating the importance of oxygen content in high-temperature superconductors. 

The development of synchrotron x-ray sources has resulted in a vast expansion of the capability of x-ray 
diffraction for determining macromolecular structure, but advances are still limited by the rarity and expense 
of synchrotron facilities. Correspondingly, the use of neutron diffraction has always been inhibited by the 
relatively low intensities available and the resulting need for large samples and long data collection times. 
With both synchrotrons and neutron sources observation time at existing facilities is chronically 
oversubscribed. Thus there is a need to develop both instruments and methodologies for maximum utilization 
of the sources. 

B1.9 Scattering: light, neutrons, X-rays

Benjamin S Hsiao and Benjamin Chu 


B1.9.1 INTRODUCTION 

Scattering techniques using light, neutrons and x-rays are extremely useful to study the structure, size and 
shape of large molecules in solids, liquids and solutions. The principles of the scattering techniques, which 
involve the interaction of radiation with matter, are the same. However, the data treatment for scattering from 
light, neutrons or x-rays can be quite different because the intrinsic properties of each radiation and its
interactions with matter are different. One major difference in the data treatment arises from the states of
matter. Although the general equations describing the interaction between radiation and matter are valid for 
all classes of materials, unique analytical treatments have been made to suit the different states of matter. 

In this chapter, the general principles of the scattering phenomenon and specific data treatments for the
material (isotropic and anisotropic) in both solid and solution states are presented. These treatments are useful
for the analysis of scattering data by light, neutrons or x-rays from different material systems such as
crystalline polymers, complex fluids (including colloidal suspensions and solutions of biological species),
multicomponent systems (including microemulsions and nanocomposites) and oriented polymers. For detailed
theoretical derivations, the reader should refer to the many excellent textbooks and review articles that deal
with the subjects of scattering from light [1, 2, 3, 4, 5, 6 and 7], neutrons [7, 8, 9, 10 and 11] and x-rays [7, 11,
12, 13, 14 and 15]. We will not, however, discuss the detailed instrumentation for different scattering
experiments as this topic has been well illustrated in some of the above references [3, 4, 5, 6 and 7, 9, 10, 11,
12, 13, 14 and 15]. Also absent will be the analysis for thin films and interfaces, which warrants a separate
chapter by itself. The main focus of this chapter is to provide a comprehensive overview of the field of
scattering from materials (with emphasis on polymers) including an appropriate comparison between the
different techniques. Selected example studies will also be included at the end of each section to illustrate the
applications of some advanced scattering techniques.


B1.9.2 INTERACTION OF RADIATION AND MATTER 

In free space, electromagnetic radiation consists of simultaneous electric and magnetic fields, which vary
periodically with position and time. These fields are perpendicular to each other and to the direction of wave
propagation. Electromagnetic radiation spans a wide range of wavelengths, from the order of 10⁻¹⁰ m (x-rays) to
kilometres or more (low frequency radio waves). Visible light has wavelengths from 400 nm (violet) to 700 nm
(red), which constitutes a very small fraction of the electromagnetic spectrum.

As the electromagnetic radiation interacts with matter, the resultant radiation may follow several pathways
depending on the wavelength and the material characteristics. Scattering without loss of energy is termed
elastic. Elastic scattering of radiation by a periodic structure whose spacing is of the same order of magnitude
as the wavelength leads to diffraction phenomena. The radiation may be slowed down by the refraction
phenomenon. The radiation may also be absorbed by the material. The absorbed energy may be transferred to
different modes of motion, dissipated as heat or re-emitted as radiation at a different frequency. Energetic
photons from x-rays or ultra-violet (UV) radiation can dissociate chemical bonds, leading to chemical
reactions, or eject photoelectrons (the basis of x-ray photoelectron spectroscopy, XPS, also known as electron
spectroscopy for chemical analysis, ESCA). Fluorescence occurs when the transfer of the residual energy to
electronic modes takes place. Raman scattering occurs when the energy is transferred to or from rotational or
vibrational modes. The characterization techniques based on these different phenomena are described in other
chapters of section B.

Light scattering arises from fluctuations in refractive index or polarizability and x-ray scattering arises from
fluctuations in electron density. Both are dependent on interactions of radiation with extra-nuclear electrons
(figure B1.9.1) and will be discussed together. Consider an incident beam having an electric field
$E = E_0 \cos(2\pi r/\lambda - \omega t)$, where $E_0$ is the magnitude, $\lambda$ is the wavelength in vacuum, $r$ is the distance
of the observer from the scatterer and $\omega$ is the angular frequency. When electromagnetic radiation is incident
upon an atom (with polarizability $\alpha$), a dipole moment $m = \alpha E$ will be induced. The oscillating dipole will
serve as a source of secondary radiation (this is scattering) with amplitude $E_s$ [16],

$$E_s = \frac{1}{c^2 d}\,\frac{\mathrm{d}^2 m}{\mathrm{d}t^2}\,\cos\varphi \tag{B1.9.1}$$

where $c$ is the velocity of light, $d$ is the distance from the dipole to the observer and $\varphi$ is the angle between
the plane of the polarization and the dipole moment. Thus we obtain

$$E_s = -\frac{\alpha E_0 \omega^2}{c^2 d}\,\cos\varphi\,\cos(\omega t - \phi) \tag{B1.9.2}$$

where $\phi$ is a phase angle which takes into account that the wave must travel a distance ($r = d$) to reach the
observer ($\phi = 2\pi d/\lambda$). These equations presume that the electric field at the scattering position is not modified
by the induced-dielectric environment (the Rayleigh-Gans approximation). Equation (B1.9.2) thus describes
Rayleigh scattering, which holds true for light scattering provided that the light frequency is small when
compared with the resonance frequency of the electrons. For x-ray scattering, the frequency of
electromagnetic radiation is higher than the resonance frequency of the electrons. In this case, Thomson
scattering prevails and the scattering amplitude becomes


$$E_s = \frac{e^2 E_0}{m_e c^2 d}\,\cos\varphi\,\cos(\omega t - \phi) \tag{B1.9.3}$$

where $e$ is the electron charge and $m_e$ is the electron mass (this corresponds to equation (B1.9.2) with the
free-electron polarizability $\alpha = -e^2/(m_e\omega^2)$). Thus, all electrons scatter x-rays equally and the x-ray
scattering ability of an atom depends on the number of electrons, which is proportional to the atomic
number, Z, in the atom. It should be noted that Rayleigh scattering is dependent on frequency and
polarizability, but Thomson scattering is not. As a result, more polarizable molecules (larger, conjugated,
more aromatic) are better Rayleigh scatterers than others. Neutron scattering depends upon nuclear properties,
being related to fluctuations in the neutron scattering cross section $\sigma$ between the scatterer and the
surroundings. Hence, hydrogen can be a strong neutron scatterer in an isotope environment, but it is a weak
electron scatterer.

Figure B1.9.1. Diagrams showing that x-ray and light scattering involve extra-nuclear electrons, while
neutron scattering depends on the nature of the atomic nucleus.

The generalized scattering equation can be expressed by a complex exponential form

$$(E_s)_j = E_0 K_j \exp[\mathrm{i}(\omega t - \phi_j)] \tag{B1.9.4}$$

where the subscript $j$ refers to the scattering from the $j$th element, $(E_s)_j$ represents the amplitude of the
scattering of the $j$th scatterer and $K_j$ is proportional to the scattering power of the $j$th scatterer. For a
collection of scattering elements, the total field strength (amplitude) of the scattered waves is

$$E_s = \sum_j (E_s)_j = E_0 \sum_j K_j \exp[\mathrm{i}(\omega t - \phi_j)]. \tag{B1.9.5}$$

All scattering phenomena (light, x-rays and neutrons) can be interpreted in terms of this equation (B1.9.5).
These techniques differ mainly in the structural entities that contribute to the $K_j$ term. For light, the refractive
index or polarizability is the principal contributor; for x-rays, the electron density is the contributor; and for
neutrons, the nature of the scattering nucleus is the contributor. Equation (B1.9.5) thus represents a starting
point for the discussion of the interference problem presented below.


B1.9.3 LIGHT SCATTERING 

The intensity of light scattering, $I_s$, for an isolated atom or molecule is proportional to the mean squared
amplitude

$$I_s = K\langle E_s^2\rangle \tag{B1.9.6}$$

where the constant $K$ is equal to $c/4\pi$ for electromagnetic radiation and $\langle\;\rangle$ represents an average operation.
Combining equations (B1.9.2) and (B1.9.6), we have

$$I_s = K\,\frac{\alpha^2 E_0^2 \omega^4}{c^4 d^2}\,\langle\cos^2\varphi\,\cos^2(\omega t - \phi)\rangle. \tag{B1.9.7}$$

As the average is over all values of time,

$$\langle\cos^2(\omega t - \phi)\rangle = \langle\cos^2 x\rangle = \left(\int_0^{2\pi}\cos^2 x\,\mathrm{d}x\right)\Big/\left(\int_0^{2\pi}\mathrm{d}x\right) = \frac{1}{2} \tag{B1.9.8}$$

so equation (B1.9.7) can be simplified to

$$I_s = K\,\frac{\alpha^2 E_0^2 \omega^4}{2 c^4 d^2}\,\cos^2\varphi. \tag{B1.9.9}$$

The incident intensity, $I_0$, is given by

$$I_0 = K\langle E^2\rangle = K E_0^2\langle\cos^2(\omega t - \phi)\rangle = \tfrac{1}{2} K E_0^2. \tag{B1.9.10}$$

The ratio of the scattered to the incident intensity is given by

$$\frac{I_s}{I_0} = \frac{\alpha^2\omega^4}{c^4 d^2}\,\cos^2\varphi. \tag{B1.9.11}$$

Equation (B1.9.11) is valid only for plane polarized light. For unpolarized incident light, the beam can be
resolved into two polarized components at right angles to each other. The scattered intensity can thus be
expressed as (figure B1.9.2)

$$I_s = \tfrac{1}{2}\left(I_{s,1} + I_{s,2}\right) \tag{B1.9.12}$$

where $I_{s,1}$ and $I_{s,2}$ are the intensities that equation (B1.9.11) gives for plane polarized light of intensity $I_0$
with polarization angles $\varphi_1$ and $\varphi_2$, respectively. As a result, equation (B1.9.11) becomes

$$\frac{I_s}{I_0} = \frac{\alpha^2\omega^4}{2 c^4 d^2}\left(\cos^2\varphi_1 + \cos^2\varphi_2\right) = \frac{\alpha^2\omega^4}{2 c^4 d^2}\left(1 + \cos^2 2\theta\right) \tag{B1.9.13}$$

where we let $\varphi_1 = 0$ and $2\theta = \varphi_2$. Note that in light scattering, one often defines $\theta = \varphi_2$ with $\theta$ being the
scattering angle. Herein, we define $2\theta$ as the scattering angle to be consistent with x-ray scattering described
in B1.9.4. Since the frequency term may be converted to wavelength (in vacuum)

$$\omega = 2\pi\nu = \frac{2\pi c}{\lambda} \tag{B1.9.14}$$

where $\nu$ is the frequency, equation (B1.9.13) becomes

$$\frac{I_s}{I_0} = \frac{8\pi^4\alpha^2}{\lambda^4 d^2}\left(1 + \cos^2 2\theta\right). \tag{B1.9.15}$$

Figure B1.9.2. Resolution of a plane unpolarized incident beam into polarized scattering components.

This equation has the following implications.

(1) At 2θ = 0°, the scattering comprises both components of polarization of the incident beam; at 2θ = 90°,
the scattering comprises only one of the two components, i.e. one half of the incident beam.
Consequently, the scattered light at 90° will be plane polarized.

(2) Since $I_s \propto 1/\lambda^4$, the shorter wavelengths (such as blue) scatter more than longer ones (such as red).
The blue colour of the light from a clear sky is therefore due to scattering by gas molecules in the
atmosphere.
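
Both implications can be checked numerically with a short Python sketch of the intensity ratio for unpolarized light, $I_s/I_0 = 8\pi^4\alpha^2(1 + \cos^2 2\theta)/(\lambda^4 d^2)$. The polarizability and observation distance used here are arbitrary illustrative values (Gaussian units), and the function name is hypothetical.

```python
import math

def scattered_fraction(alpha_cm3, wavelength_cm, distance_cm, two_theta_deg):
    """I_s / I_0 for unpolarized light scattered by one small particle:
    8 pi^4 alpha^2 (1 + cos^2(2 theta)) / (lambda^4 d^2)."""
    angular = 1.0 + math.cos(math.radians(two_theta_deg)) ** 2
    return (8.0 * math.pi ** 4 * alpha_cm3 ** 2 * angular
            / (wavelength_cm ** 4 * distance_cm ** 2))

ALPHA = 1.7e-24   # assumed molecular polarizability, cm^3 (illustrative)
DISTANCE = 10.0   # assumed observation distance, cm (illustrative)

# Implication (1): at 90 degrees only one polarization component is scattered.
ratio_90_to_0 = (scattered_fraction(ALPHA, 550e-7, DISTANCE, 90.0)
                 / scattered_fraction(ALPHA, 550e-7, DISTANCE, 0.0))
print(f"I(90 deg) / I(0 deg) = {ratio_90_to_0:.3f}")

# Implication (2): violet light (400 nm) versus red light (700 nm).
blue_to_red = (scattered_fraction(ALPHA, 400e-7, DISTANCE, 90.0)
               / scattered_fraction(ALPHA, 700e-7, DISTANCE, 90.0))
print(f"I(400 nm) / I(700 nm) = {blue_to_red:.2f}")
```

The wavelength ratio depends only on $(700/400)^4$, so it does not change with the assumed polarizability or distance.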

B1.9.3.1 SCATTERING FROM A COLLECTION OF OBJECTS

For random locations of $N$ scattering objects in volume $V$, the scattered intensity can be found by summing the
scattering from each object:

$$\frac{I_s}{I_0} = N\,\frac{8\pi^4\alpha^2}{\lambda^4 d^2}\left(1 + \cos^2 2\theta\right). \tag{B1.9.16}$$

A modified form of equation (B1.9.16) is usually used to express the scattering power of a system in terms of
the 'Rayleigh ratio' defined as

$$R = \frac{I_s d^2}{I_0 V\left(1 + \cos^2 2\theta\right)} = \frac{8\pi^4\alpha^2}{\lambda^4}\,\frac{N}{V}. \tag{B1.9.17}$$

In this case, the scattering serves as a means for counting the number of molecules (or particles, or objects)
per unit volume ($N/V$). It is seen that the polarizability, $\alpha$, will be greater for larger molecules, which will
scatter more. If we take the Clausius-Mosotti equation [16]:

$$\frac{n^2 - 1}{n^2 + 2} = \frac{4}{3}\pi\left(\frac{N}{V}\right)\alpha \tag{B1.9.18}$$

and consider $n \approx 1$ ($n$ is the refractive index), then

$$\alpha \approx \frac{n - 1}{2\pi}\left(\frac{V}{N}\right). \tag{B1.9.19}$$

If the scattering particles are in a dielectric solvent medium with solvent refractive index $n_0$, we can define the
excess polarizability ($\alpha_{ex} = \alpha(\text{solution}) - \alpha(\text{solvent})$) as

$$\alpha_{ex} = \frac{n^2 - n_0^2}{4\pi}\left(\frac{V}{N}\right). \tag{B1.9.20}$$

If the weight concentration $C$ is used, $C = MN/(N_A V)$, where $M$ is the molecular weight of the particle, $N_A$ is
Avogadro's number and $n \approx n_0$, then the above relationship becomes

$$\alpha_{ex} = \frac{(n + n_0)(n - n_0)}{4\pi C}\,\frac{M}{N_A} \approx \frac{n_0}{2\pi}\left(\frac{\mathrm{d}n}{\mathrm{d}C}\right)\left(\frac{M}{N_A}\right) \tag{B1.9.21}$$

where $\mathrm{d}n/\mathrm{d}C \approx (n - n_0)/C$ at constant temperature. The excess Rayleigh ratio
$R_{ex} = R(\text{solution}) - R(\text{solvent})$ has the form

$$R_{ex} = \frac{2\pi^2 n_0^2}{\lambda^4 N_A}\left(\frac{\mathrm{d}n}{\mathrm{d}C}\right)^2 CM = HCM \tag{B1.9.22}$$

where $H$ is the optical constant for unpolarized incident light. For polarized light, the constant $H$ can be twice
as large due to the factor $(1 + \cos^2 2\theta)$. It is noted that this factor of 2 depends on the definition of $R$, i.e.,
whether the $1 + \cos^2 2\theta$ term is absorbed by $R$.

$$R_{ex} = \frac{4\pi^2 n_0^2}{\lambda^4 N_A}\left(\frac{\mathrm{d}n}{\mathrm{d}C}\right)^2 CM = HCM \tag{B1.9.23}$$

where $R_{ex}$ is the excess Rayleigh ratio for polarized light. The above treatment is valid for molecules that
are small compared to the wavelength of the incident beam.
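
As a minimal sketch of how the relation $R_{ex} = HCM$ is used, the following Python fragment evaluates the optical constant for unpolarized light, $H = 2\pi^2 n_0^2 (\mathrm{d}n/\mathrm{d}C)^2/(N_A\lambda^4)$, and inverts the relation to recover a molecular weight. The refractive index, refractive index increment, wavelength and concentration are assumed illustrative values in cgs units, and the function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1

def optical_constant(n0, dn_dc, wavelength_cm, polarized=False):
    """H in mol cm^2 g^-2; dn_dc in cm^3 g^-1, vacuum wavelength in cm.

    For polarized incident light the prefactor is twice as large.
    """
    prefactor = 4.0 if polarized else 2.0
    return (prefactor * math.pi ** 2 * n0 ** 2 * dn_dc ** 2
            / (AVOGADRO * wavelength_cm ** 4))

def excess_rayleigh_ratio(h, concentration, molar_mass):
    """R_ex = H C M (cm^-1), valid for small particles at high dilution."""
    return h * concentration * molar_mass

def molar_mass_from_rayleigh(h, concentration, r_excess):
    """Invert R_ex = H C M to obtain M in g mol^-1."""
    return r_excess / (h * concentration)

# Assumed example: aqueous solution, n0 = 1.33, dn/dC = 0.15 cm^3/g, 633 nm light.
H = optical_constant(1.33, 0.15, 633e-7)
R_ex = excess_rayleigh_ratio(H, 2.0e-3, 1.0e5)  # C = 2 mg/cm^3, M = 1e5 g/mol
print(f"H    = {H:.3e} mol cm^2 g^-2")
print(f"R_ex = {R_ex:.3e} cm^-1")
print(f"M    = {molar_mass_from_rayleigh(H, 2.0e-3, R_ex):.3e} g/mol")
```

Because $H$ scales as $1/\lambda^4$, the same solution scatters far more strongly at shorter wavelengths, in line with the implications listed earlier.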

B1.9.3.2 SCATTERING FROM A SOLUTION OF LARGE MOLECULES

For molecules having dimensions comparable with the wavelength, phase differences will occur between
waves scattered from different regions of the molecule. These phase differences result in an angular
dependence of the scattered intensity. The reduction may be expressed in terms of a particle interference
factor $P(2\theta)$ such that

$$P(2\theta) = \frac{R_{ex}(\text{experimental})}{R_{ex}(\text{no interference})} \tag{B1.9.24}$$

where $2\theta$ is the scattering angle. We again caution the reader that the conventional symbol for the scattering
angle by light is $\theta$. Herein we use the symbol $2\theta$ to be consistent with x-ray and neutron scattering described
later. The interference factor also follows the expression

$$P(2\theta) = \left\langle\Big[\sum_j \alpha_j\cos(\omega t - \phi_j)\Big]^2\right\rangle\Big/\left\langle\Big[\sum_j \alpha_j\cos(\omega t)\Big]^2\right\rangle \tag{B1.9.25}$$

where the summation is over all parts of a molecule. The nature of $P(2\theta)$ may be qualitatively deduced from
figure B1.9.3. For scattering in the forward direction ($2\theta_1$) the path difference between the rays from elements
A and B of the molecule ($d_B - d_A$) is less than that at the backward scattering angle ($2\theta_2$) to observer 2,
($d'_B - d'_A$). So, a greater phase difference occurs at $2\theta_2$. If the dimensions of the molecule (or particle) are less
than the wavelength, destructive interference occurs and $P(2\theta)$ will decrease. If the molecular (or particle)
dimensions are much greater than the probing wavelength, both destructive and constructive interference can
occur, leading to maxima and minima in $P(2\theta)$. For $2\theta = 0°$, no path difference exists and $P(2\theta) = 1$. Thus, the
scattering technique can be used to estimate the size of the molecule (or particle).



Figure B1.9.3. Variation of P(2θ) as a function of scattering angle.

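Equation (B1.9.5) gives the total amplitude of scattering from a collection of objects and is a good starting
point for the derivation of interference phenomena associated with molecular size:

$$E_s = E_0\sum_j K_j\exp[\mathrm{i}(\omega t - 2\pi d_j/\lambda)] \tag{B1.9.26}$$

where $d_j$ is the distance between the $j$th scattering element and the observation point P (shown in figure
B1.9.4). In figure B1.9.4, the following relationships can be approximated as

$$d_j = \mathbf{r}_j\cdot\mathbf{s}_0 + D - \mathbf{r}_j\cdot\mathbf{s}_1 = D + \mathbf{r}_j\cdot(\mathbf{s}_0 - \mathbf{s}_1) = D + (\mathbf{r}_j\cdot\mathbf{s}) \tag{B1.9.27}$$

where $D$ represents the distance between the observation point P and the origin O, $\mathbf{s}_0$ is the unit vector in the
incident beam direction, $\mathbf{s}_1$ is the unit vector in the scattered beam direction, $\mathbf{s} = \mathbf{s}_0 - \mathbf{s}_1$ and $\mathbf{r}_j$ is the vector
to the $j$th scattering element. Equation (B1.9.26) thus becomes

$$E_s = E_0\sum_j K_j\,\mathrm{e}^{\mathrm{i}\omega t}\,\mathrm{e}^{-\mathrm{i}kD}\,\mathrm{e}^{-\mathrm{i}k(\mathbf{r}_j\cdot\mathbf{s})} = E_0\,\mathrm{e}^{\mathrm{i}\omega t}\,\mathrm{e}^{-\mathrm{i}kD}\,F \tag{B1.9.28}$$

where $F = \sum_j K_j\,\mathrm{e}^{-\mathrm{i}k(\mathbf{r}_j\cdot\mathbf{s})}$, which is defined as the structure or form factor of the object, and
$k = \omega/c = 2\pi/\lambda$. For a system with a large number of scattering elements, the summation in equation
(B1.9.28) may be replaced by an integral

$$E_s = E_0\,\mathrm{e}^{\mathrm{i}\omega t}\,\mathrm{e}^{-\mathrm{i}kD}\int_V \rho(\mathbf{r})\,\mathrm{e}^{-\mathrm{i}k(\mathbf{r}\cdot\mathbf{s})}\,\mathrm{d}\mathbf{r}. \tag{B1.9.29}$$

The prefactor $E_0\,\mathrm{e}^{\mathrm{i}\omega t}$ (together with the constant phase factor $\mathrm{e}^{-\mathrm{i}kD}$) is related to the incident beam and is
often omitted in the theoretical derivation (as follows). The integral in (B1.9.29) represents the amplitude
scattered by a three dimensional element with a volume element $\mathrm{d}\mathbf{r}$ and $\rho(\mathbf{r})$ being the density profile. From
now on, we will write the dot product of two vectors simply as $\mathbf{r}\cdot\mathbf{s}$, and similarly for other products. If we
consider the spherical polar coordinates (figure B1.9.5), with $\varphi$ and $\phi$ here denoting the polar and azimuthal
angles, the integral over this volume element becomes

$$\int \mathrm{d}V = \int_0^{2\pi}\!\!\int_0^{\pi}\!\!\int_0^{\infty} r^2\sin\varphi\,\mathrm{d}r\,\mathrm{d}\varphi\,\mathrm{d}\phi \tag{B1.9.30}$$

so that more generally,

$$F = \int\!\!\int\!\!\int \rho(r,\varphi,\phi)\,\mathrm{e}^{-\mathrm{i}k(\mathbf{r}\cdot\mathbf{s})}\,r^2\sin\varphi\,\mathrm{d}\varphi\,\mathrm{d}\phi\,\mathrm{d}r. \tag{B1.9.31}$$

For spherically symmetric systems, we can derive the following expression from (B1.9.31) [17]:

$$F = 4\pi\int_0^{\infty}\rho(r)\,\frac{\sin(qr)}{qr}\,r^2\,\mathrm{d}r \tag{B1.9.32}$$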
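
where $q = 4\pi(\sin\theta)/\lambda$, with $2\theta$ being the scattering angle. The interference factor described previously
(equation (B1.9.25)) may be expressed in terms of the intramolecular particle scattering factor as

$$P(2\theta) = \left(\int\rho(r)\,\frac{\sin(qr)}{qr}\,r^2\,\mathrm{d}r\right)^2\Big/\left(\int\rho(r)\,r^2\,\mathrm{d}r\right)^2. \tag{B1.9.33}$$

If we define the radius of gyration, $R_g$, by

$$R_g^2 = \left(\int\rho(\mathbf{r})\,r^2\,\mathrm{d}\mathbf{r}\right)\Big/\left(\int\rho(\mathbf{r})\,\mathrm{d}\mathbf{r}\right) \tag{B1.9.34}$$

(with $\mathrm{d}\mathbf{r}$ the volume element, equal to $4\pi r^2\,\mathrm{d}r$ for a spherically symmetric particle), then equation
(B1.9.33) can be expressed as [18]

$$P(2\theta) = 1 - \frac{q^2R_g^2}{3} + \cdots = 1 - \frac{16\pi^2R_g^2}{3\lambda^2}\sin^2(2\theta/2) + \cdots. \tag{B1.9.35}$$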
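
The small-angle expansion can be checked numerically. The Python sketch below evaluates the interference factor of equation (B1.9.33) and the radius of gyration of equation (B1.9.34) by Simpson integration for an assumed uniform sphere (for which $R_g^2 = 3r_0^2/5$) and compares $P$ with $1 - q^2R_g^2/3$. The sphere radius, the value of $q$ and the function names are illustrative assumptions.

```python
import math

def sinc(x):
    """sin(x)/x with the limiting value 1 at x = 0."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x

def radial_integral(func, r_max, steps=2000):
    """Composite Simpson's rule for the integral of func on [0, r_max]."""
    h = r_max / steps
    total = func(0.0) + func(r_max)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * func(i * h)
    return total * h / 3.0

def interference_factor(q, density, r_max):
    """P = (int rho sin(qr)/(qr) r^2 dr)^2 / (int rho r^2 dr)^2."""
    num = radial_integral(lambda r: density(r) * sinc(q * r) * r * r, r_max)
    den = radial_integral(lambda r: density(r) * r * r, r_max)
    return (num / den) ** 2

def radius_of_gyration(density, r_max):
    """R_g from the radial form: R_g^2 = int rho r^4 dr / int rho r^2 dr."""
    num = radial_integral(lambda r: density(r) * r ** 4, r_max)
    den = radial_integral(lambda r: density(r) * r * r, r_max)
    return math.sqrt(num / den)

# Assumed example: uniform sphere of radius 10 (arbitrary length unit).
R0 = 10.0
uniform = lambda r: 1.0          # density inside the sphere; integration stops at R0
rg = radius_of_gyration(uniform, R0)
q = 0.02                         # small q, so that q * rg << 1
p_exact = interference_factor(q, uniform, R0)
p_guinier = 1.0 - (q * rg) ** 2 / 3.0
print(f"R_g = {rg:.4f}, P = {p_exact:.6f}, 1 - q^2 R_g^2 / 3 = {p_guinier:.6f}")
```

The two values agree closely only while $qR_g$ is small; at larger $q$ the neglected higher-order terms become significant.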

To measure the molecular weight of the molecule, we can modify equation (B1.9.23) to take into account the
intramolecular interference in the dilute solution range,

$$\frac{HC}{R_{ex}} = \frac{1}{MP(2\theta)} + 2A_2C \tag{B1.9.36}$$

where $A_2$ is the second virial coefficient. By combining (B1.9.35) with (B1.9.36), we obtain

$$\frac{HC}{R_{ex}} = \frac{1}{M}\left(1 + \frac{16\pi^2R_g^2}{3\lambda^2}\sin^2(2\theta/2)\right) + 2A_2C. \tag{B1.9.37}$$

This is the basic equation for monodisperse particles in light scattering experiments. We can derive three
relationships by extrapolation.

$$\lim_{C\to 0,\;2\theta\to 0}\frac{HC}{R_{ex}} = \frac{1}{M} \tag{B1.9.38}$$

$$\lim_{2\theta\to 0}\frac{HC}{R_{ex}} = \frac{1}{M} + 2A_2C \tag{B1.9.39}$$

$$\lim_{C\to 0}\frac{HC}{R_{ex}} = \frac{1}{M}\left(1 + \frac{16\pi^2R_g^2}{3\lambda^2}\sin^2(2\theta/2)\right) \tag{B1.9.40}$$

Figure B1.9.4. Geometrical relations between vectors associated with incident and scattered light.

Figure B1.9.5. Geometrical relations between the Cartesian coordinates in real space, the spherical polar
coordinates and the cylindrical polar coordinates.


A graphical method, proposed by Zimm (thus termed the Zimm plot), can be used to perform this double extrapolation to determine the molecular weight, the radius of gyration and the second virial coefficient. An example of a Zimm plot is shown in figure B1.9.6, where the light scattering data from a solution of poly(tetrafluoroethylene) (PTFE) ($M_w = (2.9 \pm 0.2)\times10^5$ g mol$^{-1}$; $A_2 = -(6.7 \pm 1.3)\times10^{-5}$ mol cm$^3$ g$^{-2}$ and $R_g = 17.8 \pm 2.4$ nm) in oligomers of poly(chlorotrifluoroethylene) (as solvents) at 340 °C are shown [19]. The dashed lines represent the extrapolated values at $C = 0$ and $2\theta = 0$.

Figure B1.9.6. A typical Zimm plot (abscissa: $\sin^2(\theta) + 250C$); data obtained from a solution of poly(tetrafluoroethylene) (PTFE) ($M_w = (2.9 \pm 0.2)\times10^5$ g mol$^{-1}$; $A_2 = -(6.7 \pm 1.3)\times10^{-5}$ mol cm$^3$ g$^{-2}$ and $R_g = 17.8 \pm 2.4$ nm) in oligomers of poly(chlorotrifluoroethylene) (as solvents) at 340 °C. (Reprinted with permission from Chu et al [19].)
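The double extrapolation can also be carried out numerically as a plane fit. The sketch below is illustrative only: the Zimm equation itself is not legible in the text above, so the code assumes the commonly used form $y = (1/M)(1 + q^2R_g^2/3) + 2A_2C$, where $y$ stands for the reduced scattering ratio (optical constant times $C$ divided by the Rayleigh ratio). The function names and the synthetic data grid are hypothetical; only the PTFE parameter values are taken from the figure caption.

```python
import math

def least_squares(columns, y):
    """Solve y ~ sum_k coef[k] * columns[k] via column-scaled normal equations."""
    n = len(columns)
    scale = [max(abs(v) for v in col) for col in columns]
    cols = [[v / s for v in col] for col, s in zip(columns, scale)]
    a = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(n)] for i in range(n)]
    b = [sum(u * v for u, v in zip(cols[i], y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
            b[r] -= f * b[k]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (b[k] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return [xi / s for xi, s in zip(x, scale)]

def zimm_fit(q_squared, conc, y):
    """Fit y = a + b*q^2 + c*C (assumed Zimm form) and return (M, R_g, A_2)."""
    ones = [1.0] * len(y)
    a, b, c = least_squares([ones, q_squared, conc], y)
    return 1.0 / a, math.sqrt(3.0 * b / a), c / 2.0

# Synthetic, noise-free data built from the PTFE values quoted in the caption
# (M in g/mol, R_g in nm, A_2 in mol cm^3 g^-2, C in g/cm^3, q^2 in nm^-2).
M_TRUE, RG_TRUE, A2_TRUE = 2.9e5, 17.8, -6.7e-5
q2_pts, c_pts, y_pts = [], [], []
for conc in (0.5e-3, 1.0e-3, 1.5e-3, 2.0e-3):
    for q2 in (1e-4, 2e-4, 3e-4, 4e-4, 5e-4, 6e-4):
        q2_pts.append(q2)
        c_pts.append(conc)
        y_pts.append((1.0 + q2 * RG_TRUE ** 2 / 3.0) / M_TRUE + 2.0 * A2_TRUE * conc)

m_fit, rg_fit, a2_fit = zimm_fit(q2_pts, c_pts, y_pts)
```

With real data the two extrapolated lines of the Zimm plot correspond to the fitted plane evaluated at $C = 0$ and at $q^2 = 0$; a plane fit is only a reasonable substitute when the data are linear in both variables.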

B1.9.3.3 CALCULATE SCATTERED INTENSITY

There are two ways to calculate the scattered intensity. One is to first calculate the magnitude of the structure factor $F$ by summing the amplitudes of the scattering elements in the system and then multiply it by $F^*$ (the conjugate of $F$). This method is best for the calculation of the scattered intensity from discrete particles such as spheres or rods. Two examples are illustrated as follows.

(1) The scattering from an isolated sphere may be calculated from equation (B1.9.32). This derivation assumes that the sphere is uniform, with its density profile $\rho(r) = \rho_0$ if $r < r_0$ and $\rho(r) = 0$ if $r > r_0$ (surrounded by a non-scattering material). With this assumption, equation (B1.9.32) becomes

$$F_{\mathrm{sphere}} = 4\pi\rho_0\int_0^{r_0}\frac{\sin(qr)}{qr}\,r^2\,\mathrm{d}r. \tag{B1.9.41}$$

By changing the variable, $x = qr$, we obtain

$$F_{\mathrm{sphere}} = \frac{4\pi\rho_0}{q^3}\int_{x=0}^{x=qr_0}x\sin(x)\,\mathrm{d}x. \tag{B1.9.42}$$

We can integrate the above equation by parts and derive

$$F_{\mathrm{sphere}} = \frac{4\pi\rho_0}{q^3}[\sin U - U\cos U] = V_{\mathrm{sphere}}\,\rho_0\,\Phi(U) \tag{B1.9.43}$$

where $V_{\mathrm{sphere}}$ is the sphere volume and

$$\Phi(U) = \text{the sphere scattering function} = \frac{3}{U^3}[\sin U - U\cos U] \tag{B1.9.44}$$

with the parameter $U = qr_0$. The intensity then is proportional to $F^2$ ($R = K_1FF^*$, where $K_1$ is a calibration constant for light scattering).
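As a numerical cross-check of the sphere result, the following sketch integrates (B1.9.41) with Simpson's rule and compares it with the closed form $V_{\mathrm{sphere}}\rho_0\Phi(U)$ of (B1.9.43) and (B1.9.44). The function names and the chosen values of $q$ and $r_0$ are illustrative assumptions, not part of the original treatment.

```python
import math

def sphere_amplitude_numeric(q, r0, rho0=1.0, n=2000):
    """Simpson's rule for F = 4*pi*rho0 * integral_0^r0 sin(qr)/(qr) r^2 dr (n must be even)."""
    h = r0 / n

    def integrand(r):
        # sin(qr)/(qr) * r^2 tends to 0 as r -> 0
        return 0.0 if r == 0.0 else math.sin(q * r) / (q * r) * r * r

    total = integrand(0.0) + integrand(r0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return 4.0 * math.pi * rho0 * total * h / 3.0

def phi(u):
    """Sphere scattering function, Phi(U) = 3 [sin U - U cos U] / U^3."""
    return 3.0 * (math.sin(u) - u * math.cos(u)) / u ** 3

def sphere_amplitude_analytic(q, r0, rho0=1.0):
    """Closed form F = V_sphere * rho0 * Phi(q r0)."""
    volume = 4.0 / 3.0 * math.pi * r0 ** 3
    return volume * rho0 * phi(q * r0)

f_num = sphere_amplitude_numeric(q=0.05, r0=50.0)
f_ana = sphere_amplitude_analytic(q=0.05, r0=50.0)
```

The scattered intensity of the sphere then follows as a constant times `f_ana ** 2`.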

(2) This treatment may be extended to spheres 

core-shell structure. If the core density is p 
to r 1? the shell density is p 2 in the range o 
density of the surrounding medium is p Q? th 
of the structure factor becomes 

Fa = Vi(pi - Pi)<b{U\) +■ V 2 (ft - Pv\ 

The second method to calculate the scattered intensity (or R: the Rayleigh ratio) is to square the sum in 1 


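$$R(q) = K_1\sum_i\sum_j\rho_i\rho_j\exp[\mathrm{i}(qr_{ij})]. \tag{B1.9.46}$$

For a continuous system with assorted scattering elements, the sum can be replaced by integration and be expr*

$$R(q) = K_1V\langle\eta^2\rangle\int\gamma(r)\exp[\mathrm{i}(qr)]\,\mathrm{d}^3r \tag{B1.9.47}$$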

where $V$ is the scattering volume, $\eta$ is the heterogeneity of the fluctuation parameter defined as $\eta = \rho - \langle\rho\rangle$ and
the correlation function defined as

Equation (B1.9.47) applies to the general scattering expression of any system. With spherical symmetry, the scattered intensity becomes [20]

$$R(q) = 4\pi K_1V\langle\eta^2\rangle\int_0^\infty\gamma(r)\,r^2\,\frac{\sin(qr)}{qr}\,\mathrm{d}r. \tag{B1.9.49}$$


In the case of anisotropic systems with a cylindrical symmetry (such as rods or fibres), the scattered intensity can be expressed as (derivation to be made later in section B1.9.4):

$$R(q) = K_1V\langle\eta^2\rangle\int_0^\infty\gamma(r)\,2\pi J_0(q_rr)\,r\,\mathrm{d}r\int_0^\infty 2\gamma(z)\cos(q_zz)\,\mathrm{d}z \tag{B1.9.50}$$

where the subscript $r$ represents the operation along the radial direction, the subscript $z$ represents the operation along the cylinder axis, $J_0$ is the zero-order Bessel function and $q_r$ and $q_z$ are scattering vectors along the $r$ (equator) and $z$ (meridian) directions, respectively.

B1.9.3.4 ESTIMATE OBJECT SIZE

One of the most important functions in the application of light scattering is the ability to estimate the object dimensions. As we have discussed earlier for dilute solutions containing large molecules, equation (B1.9.38) can be used to calculate the 'radius of gyration', $R_g$, whose square is defined as the mean square distance from the centre of gravity [12]. The combined use of equations (B1.9.38), (B1.9.39) and (B1.9.40) (the Zimm plot) will yield information on $R_g$, $A_2$ and molecular weight.

The above approximation, however, is valid only for dilute solutions and with assemblies of molecules of similar structure. When the concentration is high enough that intermolecular interactions are very strong, or when the system has a less well defined morphology, a different data analysis approach must be taken. One such approach was derived by Debye et al [21]. They have shown that for a random two-phase system with sharp boundaries, the correlation function may take an exponential form,

$$\gamma(r) = \mathrm{e}^{-r/a_c} \tag{B1.9.51}$$

where $a_c$ is a correlation length describing the dimension of heterogeneities. Substitution of (B1.9.51) into (B1.9.47) gives rise to the expression

$$R(q) = \frac{8\pi K_1V\langle\eta^2\rangle a_c^3}{(1 + q^2a_c^2)^2}. \tag{B1.9.52}$$

Based on this equation, one can make a 'Debye-Bueche' plot by plotting $[R(q)]^{-1/2}$ versus $q^2$ and determine the slope and the intercept of the curve. The correlation length thus can be calculated as [21]

$$a_c = \left(\frac{\text{slope}}{\text{intercept}}\right)^{1/2}. \tag{B1.9.53}$$
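The Debye-Bueche analysis amounts to a straight-line fit. The sketch below assumes the intensity has the form $R(q) \propto a_c^3/(1 + q^2a_c^2)^2$, which is the three-dimensional Fourier transform of the exponential correlation function (B1.9.51); it generates synthetic data for a chosen $a_c$ and recovers it from the slope and intercept as in (B1.9.53). The function names, the prefactor and the $q$ range are illustrative assumptions.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares line y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def debye_bueche_intensity(q, a_c, prefactor=1.0):
    """Assumed model: R(q) = prefactor * 8*pi*a_c^3 / (1 + q^2 a_c^2)^2."""
    return prefactor * 8.0 * math.pi * a_c ** 3 / (1.0 + (q * a_c) ** 2) ** 2

def correlation_length(q_values, intensities):
    """Plot R^(-1/2) against q^2; a_c = sqrt(slope / intercept)."""
    x = [q * q for q in q_values]
    y = [1.0 / math.sqrt(r) for r in intensities]
    slope, intercept = linear_fit(x, y)
    return math.sqrt(slope / intercept)

A_C_TRUE = 25.0                                  # arbitrary length units
q_values = [0.01 * k for k in range(1, 11)]      # reciprocal length units
intensities = [debye_bueche_intensity(q, A_C_TRUE, prefactor=3.2) for q in q_values]
a_c_fit = correlation_length(q_values, intensities)
```

Because only the ratio of slope to intercept enters, the unknown calibration prefactor cancels, which is why the plot can be made with relative intensities.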


B1.9.4 X-RAY SCATTERING 

X-ray scattering arises from fluctuations in electron density. The general expression of the absolute scattered intensity $I_{\mathrm{abs}}(q)$ (simplified as $I(q)$ from now on) from three-dimensional objects immersed in a medium of different density, similar to (B1.9.47), can be expressed as:

$$I(q) = K_xV\langle\eta^2\rangle\int\gamma(r)\,\mathrm{e}^{\mathrm{i}(qr)}\,\mathrm{d}^3r \tag{B1.9.54}$$

where $V$ is the volume of the scatterer, $\langle\eta^2\rangle$ is the mean square of the electron density fluctuations, $\gamma(r)$ is the correlation function and $K_x$ is a calibration constant depending on the incident beam intensity and the optical apparatus geometry (e.g. polarization factor) given by

$$K_x = I_0\left(\frac{e^2}{m_0c^2}\right)^2\frac{1}{D_s^2}\,\frac{1 + \cos^22\theta}{2} \tag{B1.9.55}$$

where $e$, $m_0$ are the charge and mass of an electron, $c$ is the velocity of light, $D_s$ is the sample to detector distance and $2\theta$ is the scattering angle.

If the scattering system is isotropic, equation (B1.9.54) can be expressed in spherical polar coordinates (the derivation is similar to equation (B1.9.32)):

$$I(q) = 4\pi K_xV\langle\eta^2\rangle\int_0^\infty\gamma(r)\,r^2\,\frac{\sin(qr)}{qr}\,\mathrm{d}r. \tag{B1.9.56}$$

This expression is very similar to (B1.9.49). If the scattering system is anisotropic, equation (B1.9.54) can then be expressed in cylindrical polar coordinates (see figure B1.9.5):

$$I(q) = I(q_r, \psi, q_z) = K_xV\langle\eta^2\rangle\iiint\gamma(r,\varphi,z)\,\mathrm{e}^{\mathrm{i}(rq_r\cos(\varphi-\psi) + zq_z)}\,r\,\mathrm{d}r\,\mathrm{d}\varphi\,\mathrm{d}z \tag{B1.9.57}$$

where $r$ represents the distance along the radial direction and $z$ represents the distance along the cylindrical axis (in real space), $q_r$ and $q_z$ are the corresponding scattering momenta along the $r$ and $z$ directions in reciprocal space, $\varphi$ is the polar angle in real space and $\psi$ is the polar angle in reciprocal space. Equation (B1.9.57) is a general expression for cylinders without any assumptions. If we consider the scatterers having the geometry of a cylinder, it is reasonable to assume $\gamma(r,\varphi,z) = \gamma(r,\varphi)\gamma(z)$, which indicates that the two correlation functions along the radial (equatorial) direction and the cylindrical (meridional) direction are independent. In addition, the term $\gamma(\varphi) = 1$ can be applied, which represents the symmetry of a cylinder. Equation (B1.9.57) thus can be rewritten as

$$I(q) = K_xV\langle\eta^2\rangle\int_0^\infty\gamma(r)\left(\int_0^{2\pi}\mathrm{e}^{\mathrm{i}rq_r\cos(\varphi-\psi)}\,\mathrm{d}\varphi\right)r\,\mathrm{d}r\int_{-\infty}^{\infty}\gamma(z)\,\mathrm{e}^{\mathrm{i}q_zz}\,\mathrm{d}z. \tag{B1.9.58}$$

If we define $\delta = \varphi - \psi$ and $rq_r = u$, then

$$\int_0^{2\pi}\mathrm{e}^{\mathrm{i}rq_r\cos(\varphi-\psi)}\,\mathrm{d}\varphi = \int_0^{2\pi}\mathrm{e}^{\mathrm{i}u\cos\delta}\,\mathrm{d}\delta = 2\pi J_0(u) \tag{B1.9.59}$$

where $J_0(u)$ is the zeroth order Bessel function of the first kind. Also, we have

$$\int_{-\infty}^{\infty}\gamma(z)\,\mathrm{e}^{\mathrm{i}q_zz}\,\mathrm{d}z = \int_{-\infty}^{\infty}\gamma(z)\left(\cos(zq_z) + \mathrm{i}\sin(zq_z)\right)\mathrm{d}z = 2\int_0^\infty\gamma(z)\cos(zq_z)\,\mathrm{d}z \tag{B1.9.60}$$

because both $\gamma(z)$ and $\cos(zq_z)$ are even functions (i.e., $\gamma(z) = \gamma(-z)$ and $\cos(zq_z) = \cos(-zq_z)$) and $\sin(zq_z)$ is an odd function. The product $\gamma(z)\sin(zq_z)$ is therefore odd, and its integral from $-\infty$ to $\infty$ vanishes. Combining equations (B1.9.59) and (B1.9.60) into equation (B1.9.58), we obtain

$$I(q) = K_xV\langle\eta^2\rangle\int_0^\infty\gamma(r)\,2\pi J_0(q_rr)\,r\,\mathrm{d}r\int_0^\infty 2\gamma(z)\cos(q_zz)\,\mathrm{d}z. \tag{B1.9.61}$$

This equation is the same as (B1.9.50) for light scattering.

B1.9.4.1 PARTICLE SCATTERING

The general expression for particle scattering can best be described by the correlation function $\gamma(r)$. Using the definition in (B1.9.48), we have

$$\gamma(r) = (\Delta\rho)^2\gamma_0(r) \tag{B1.9.62}$$

where $\Delta\rho$ is the electron density difference between the particle and the surrounding medium and is assumed to be constant, $\gamma_0(0) = 1$ and $\gamma_0(r) = 0$ for $r > D$ (the diameter of the particle). The function $\gamma_0(r)$ can be expressed by a distribution function $G(l)$, with $l$ being the intersection length or the chord length of the particle (figure B1.9.7) [12, 13]

$$\gamma_0(r) = \frac{1}{\bar{l}}\int_r^D(l - r)\,G(l)\,\mathrm{d}l \tag{B1.9.63}$$

with

$$\bar{l} = \int_0^D l\,G(l)\,\mathrm{d}l. \tag{B1.9.64}$$

By differentiation, we can derive

$$\frac{\mathrm{d}\gamma_0(r)}{\mathrm{d}r} = -\frac{1}{\bar{l}}\int_r^D G(l)\,\mathrm{d}l. \tag{B1.9.65}$$

The distance distribution function $p(r)$ has a clear geometrical definition. It is defined as

$$p(r) = \gamma(r)\,r^2. \tag{B1.9.66}$$

For homogeneous particles, it represents the number of distances within the particle. For inhomogeneous 
particles, it has to take into account the different electron density of the volume elements. Thus it represents 
the number of pairs of difference in electrons separated by the distance r. A qualitative description of the shape and internal structure of the particle can be obtained directly from p(r). In addition, several structural parameters can be determined quantitatively [22]. We can describe several analytical forms of the distance distribution function for different shapes of homogeneous particles as follows.



Figure B1.9.7. Diagram to illustrate the relationship between chord length ($l$) and $r$ in a particle.

(1) Globular particles

$$p(r) = 12x^2(2 - 3x + x^3) \tag{B1.9.67}$$

where $x = r/D$.

(2) Rodlike particles

$$p(r) = \frac{1}{2\pi}\rho_c^2A^2(L - r) \tag{B1.9.68}$$

where $\rho_c$ is the particle electron density, $A$ is the cross-section of the rod and $L$ is the length of the rod particle.

(3) Flat particles, i.e. particles elongated in two dimensions (discs, flat prisms)

$$p(r) = \frac{16}{\pi}x\left(\arccos(x) - x\sqrt{1 - x^2}\right) \tag{B1.9.69}$$

where $x = r/D$.

The radius of gyration of the whole particle, $R_g$, can be obtained from the distance distribution function $p(r)$ as

$$R_g^2 = \frac{\int_0^\infty p(r)\,r^2\,\mathrm{d}r}{2\int_0^\infty p(r)\,\mathrm{d}r}. \tag{B1.9.70}$$

This value can also be obtained from the innermost part of the scattering curve,

$$I(q) = I(0)\exp\left(-\frac{q^2R_g^2}{3}\right) \tag{B1.9.71}$$

where $I(0)$ is the scattered intensity at zero scattering angle. A plot of $\log I(q)$ versus $q^2$, which is known as the Guinier plot, should show a linear descent with a negative slope ($= -R_g^2/3$) related to the radius of gyration.
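A Guinier analysis can be sketched in a few lines. The example below is an illustration under stated assumptions: it uses the uniform-sphere scattering function $\Phi(U) = 3[\sin U - U\cos U]/U^3$ as synthetic data, fits the natural logarithm of the intensity against $q^2$ at small $q$, and compares the resulting $R_g$ with the value $\sqrt{3/5}\,r_0$ that follows from (B1.9.67) and (B1.9.70) for a sphere of radius $r_0 = D/2$. The function names and the $q$ range are hypothetical choices.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares line y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def phi(u):
    """Uniform-sphere scattering function."""
    return 3.0 * (math.sin(u) - u * math.cos(u)) / u ** 3

def guinier_radius(q_values, intensities):
    """Slope of ln I(q) versus q^2 equals -R_g^2 / 3."""
    x = [q * q for q in q_values]
    y = [math.log(i) for i in intensities]
    slope, _ = linear_fit(x, y)
    return math.sqrt(-3.0 * slope)

R0 = 10.0                                             # sphere radius, arbitrary units
q_values = [0.005 + 0.0025 * k for k in range(11)]    # q*R0 from 0.05 to 0.30
intensities = [phi(q * R0) ** 2 for q in q_values]

rg_fit = guinier_radius(q_values, intensities)
rg_exact = math.sqrt(3.0 / 5.0) * R0
```

The fit is restricted to small $qR_g$ because the Guinier form is only the leading term of the expansion; extending the range biases the slope.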

In practical data analysis, we are interested in extracting information about the size, shape and distribution of the scattering particles. The most widely used approach for this purpose is the indirect Fourier transformation method, pioneered by O Glatter [23, 24, 25]. This approach can be briefly illustrated as follows. In dilute solutions with spherical particles, where the interparticle scattering is negligible, the distance distribution function $p(r)$ can be directly calculated from the scattering data through Fourier transformation (combining equations (B1.9.56) and (B1.9.66))

$$I(q) = 4\pi K_xV\langle\eta^2\rangle\int_0^\infty p(r)\,\frac{\sin(qr)}{qr}\,\mathrm{d}r. \tag{B1.9.72}$$

This process is shown in figure B1.9.8, where $T_1$ represents the Fourier transformation. If there are additional instrumentation effects smearing the data (such as the slit geometry, wavelength distribution etc), appropriate inverse mathematical transformations ($T_2$, $T_3$, $T_4$) can be used to calculate $p(r)$. If the particle has a certain shape, the $\sin(x)/x$ term in (B1.9.72) must be replaced by the form factor according to the shape assumed. If concentrations increase, the interparticle scattering (the so-called structure factor) should be considered, which will be discussed later in section B1.9.5. A unique example to illustrate the usefulness of the distance distribution function is the study of a DNA-dependent RNA polymerase core enzyme [26, 27] (figure B1.9.9). The RNA polymerase enzyme is known to have four subunits but in two possible configurations (model 1 and model 2). Model 1 has a configuration with the two larger subunits having a centre-to-centre distance of 5 nm, and model 2 has a more open configuration having a centre-to-centre distance of 7 nm. From the experimental data (open circles), it is clear that model 1 gives a better fit to the data.


Figure B1.9.8. Schematic diagram of the relationship between a particle distribution and the measured experimental scattering data. This figure is duplicated from [14], with permission from Academic Press.

Figure B1.9.9. Comparison of the distance distribution function p(r) of an RNA-polymerase core enzyme from the experimental data (open circles) and the simulation data (using two different models); abscissa: r (nm). This figure is duplicated from [27], with permission from Elsevier Science.

B1.9.4.2 NONPARTICULATE SCATTERING


If we consider the scattering from a general two-phase system (figure B1.9.10, with the phases distinguished by indices 1 and 2) containing constant electron density in each phase, we can define an average electron density $\bar{\rho}$ and a mean square density fluctuation $\langle\eta^2\rangle$ as:

$$\bar{\rho} = \varphi_1\rho_1 + \varphi_2\rho_2 \tag{B1.9.73}$$

$$\langle\eta^2\rangle = (\rho_1 - \rho_2)^2\varphi_1\varphi_2 \tag{B1.9.74}$$

where $\varphi_1$ and $\varphi_2$ are the fractions of the total volume $V$ occupied by the two phases. Equation (B1.9.74) is directly related to the invariant $Q$ of the system

$$Q = 2\pi^2V\langle\eta^2\rangle = 2\pi^2V(\rho_1 - \rho_2)^2\varphi_1\varphi_2. \tag{B1.9.75}$$


In this case, the correlation function $\gamma(r)$ becomes

$$\gamma(r) = (\rho_1 - \rho_2)^2\varphi_1\varphi_2\,\gamma_0(r) \tag{B1.9.76}$$

where $\gamma_0(r)$ is the normalized correlation function, which is related to the geometry of the particle.

Figure B1.9.10. Non-particulate random two-phase system.

There are several geometrical variables that one can extract from the correlation function approach. First, a correlation volume $v_c$ can be defined, which is related to the extrapolated intensity at zero angle $I(0)$.

$$I(0) = V(\rho_1 - \rho_2)^2\varphi_1\varphi_2\,v_c. \tag{B1.9.77}$$

With the use of the invariant $Q$, we obtain

$$v_c = 2\pi^2\frac{I(0)}{Q}. \tag{B1.9.78}$$

If we consider the chord length distribution, we can express the alternating chords $l_1$ and $l_2$ as

$$l_1 = 4\frac{V}{S}\varphi_1, \qquad l_2 = 4\frac{V}{S}\varphi_2 \tag{B1.9.79}$$

where $S$ is the total surface area. The mean chord length thus becomes

$$\bar{l} = l_1\varphi_2 = l_2\varphi_1 = 4\frac{V}{S}\varphi_1\varphi_2. \tag{B1.9.80}$$

As we will discuss in the next section, the scattered intensity $I(q)$ at very large $q$ values will be proportional to the $q^{-4}$ term. This is the well known Porod approximation, which has the relationship

$$\lim_{q\to\infty}I(q) = V\varphi_1\varphi_2(\rho_1 - \rho_2)^2\,\frac{8\pi}{\bar{l}q^4}. \tag{B1.9.81}$$

It is sometimes more convenient to normalize the absolute scattered intensity $I(q)$ and use the following expression:

$$\lim_{q\to\infty}I(q)q^4/Q = \frac{1}{\pi\varphi_1\varphi_2}\,\frac{S}{V}. \tag{B1.9.82}$$

Thus the ratio of $S/V$ can be determined by the limiting value of $I(q)q^4$. We will discuss the Porod approximation next.

B1.9.4.3 POROD APPROXIMATION

According to the Porod law [28], the intensity in the tail of a scattering curve from an isotropic two-phase structure having sharp phase boundaries can be given by equation (B1.9.81). In fact, this equation can also be derived from the general expression of scattering (B1.9.56). The derivation is as follows. If we assume $qr = u$ and use the Taylor expansion at large $q$, we can rewrite (B1.9.56) as

$$I(q) = \frac{4\pi K_xV\langle\eta^2\rangle}{q^3}\left[\int_0^\infty\gamma(0)\,u\sin(u)\,\mathrm{d}u + \int_0^\infty\frac{\gamma'(0)}{q}\,u^2\sin(u)\,\mathrm{d}u + \int_0^\infty\frac{\gamma''(0)}{2q^2}\,u^3\sin(u)\,\mathrm{d}u + \cdots\right] \tag{B1.9.83}$$

where $\gamma'$ and $\gamma''$ are the first and second derivatives of $\gamma$. The following expressions can be derived to simplify the above equation.

$$\int_0^\infty x\sin(x)\,\mathrm{d}x = 0, \qquad \int_0^\infty x^2\sin(x)\,\mathrm{d}x = -2, \qquad \int_0^\infty x^3\sin(x)\,\mathrm{d}x = 0. \tag{B1.9.84}$$

Then

$$I(q) = -\frac{8\pi K_xV\langle\eta^2\rangle}{q^4}\gamma'(0) + O\left(q^{-2n}, \mathrm{d}^{2n-3}\gamma(0)/\mathrm{d}r^{2n-3}\right) \quad (n = 3, 4, 5, \ldots). \tag{B1.9.85}$$

The second term in equation (B1.9.85) rapidly approaches zero in the large $q$ region, thus

$$\lim_{q\to\infty}I(q) = -\frac{8\pi K_xV\langle\eta^2\rangle}{q^4}\gamma'(0). \tag{B1.9.86}$$

This equation is the same as equation (B1.9.81), which is termed the Porod law. Thus, the scattered intensity $I(q)$ at very large $q$ values will be proportional to the $q^{-4}$ term; this relationship is valid only for sharp interfaces.
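The integrals in (B1.9.84) do not converge in the ordinary sense; the quoted values are those obtained with a convergence factor. A minimal sketch, assuming an exponential damping $\mathrm{e}^{-\varepsilon x}$ with $\varepsilon \to 0$, uses the closed form $\int_0^\infty x^n\mathrm{e}^{-\varepsilon x}\mathrm{e}^{\mathrm{i}x}\,\mathrm{d}x = n!/(\varepsilon - \mathrm{i})^{n+1}$: the imaginary part gives the sine moments used here and the real part gives the corresponding cosine moments. The function name and the value of $\varepsilon$ are illustrative choices.

```python
import math

def damped_moment(n, eps):
    """Closed form of integral_0^inf x^n exp(-eps*x) exp(i*x) dx for eps > 0."""
    return math.factorial(n) / (eps - 1j) ** (n + 1)

EPS = 1e-6

# Sine moments: imaginary parts for n = 1, 2, 3 approach 0, -2, 0.
sine_moments = [damped_moment(n, EPS).imag for n in (1, 2, 3)]

# Cosine moments: real parts for n = 0, 1, 2 approach 0, -1, 0.
cosine_moments = [damped_moment(n, EPS).real for n in (0, 1, 2)]
```

Only the $u^2\sin(u)$ term survives in (B1.9.83), which is where the factor $4\pi\times(-2) = -8\pi$ in (B1.9.85) comes from.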


The least recognized forms of the Porod approximation are those for anisotropic systems. If we consider the cylindrical scattering expression of equation (B1.9.61), there are two principal axes ($z$ and $r$ directions) to be discussed:

$$I(q) = K_xV\langle\eta^2\rangle\int_0^\infty\gamma(r)\,2\pi J_0(q_rr)\,r\,\mathrm{d}r\int_0^\infty 2\gamma(z)\cos(q_zz)\,\mathrm{d}z \tag{B1.9.61}$$

where $q_r$ and $q_z$ are the scattering vectors along the $r$ and $z$ directions, respectively.
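The Bessel-function factor in (B1.9.61) originates from the azimuthal integral $\int_0^{2\pi}\mathrm{e}^{\mathrm{i}u\cos\delta}\,\mathrm{d}\delta = 2\pi J_0(u)$. The sketch below checks this identity numerically, using a trapezoidal sum over one full period and a power series for $J_0$. The function names, the test value of $u$ and the number of quadrature points are illustrative assumptions.

```python
import cmath
import math

def j0_series(u, terms=40):
    """Power series J0(u) = sum_k (-1)^k (u/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -((u / 2.0) ** 2) / ((k + 1) ** 2)
    return total

def azimuthal_integral(u, n=256):
    """Trapezoidal rule for integral_0^{2 pi} exp(i u cos(delta)) d(delta)."""
    h = 2.0 * math.pi / n
    return h * sum(cmath.exp(1j * u * math.cos(k * h)) for k in range(n))

U_TEST = 3.7
lhs = azimuthal_integral(U_TEST)
rhs = 2.0 * math.pi * j0_series(U_TEST)
```

The imaginary part of `lhs` vanishes to rounding error, consistent with the result being a real Bessel function.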

(1) For the component of the scattered intensity along the equatorial direction (i.e., perpendicular to the cylinder direction, $q_z = 0$), equation (B1.9.61) can be simplified as

$$I(q_r) = K_x'V\langle\eta^2\rangle\int_0^\infty\gamma(r)\,2\pi J_0(q_rr)\,r\,\mathrm{d}r \tag{B1.9.87}$$

where $K_x'$ $(= K_x\int_0^\infty 2\gamma(z)\,\mathrm{d}z)$ is a new constant, because the integral of $\gamma(z)$ is a constant. In this equation, the correlation function $\gamma(r)$ can again be expanded by the Taylor series in the region of small $r$, which becomes (with $q_rr = u$)

$$I(q_r) = \frac{2\pi K_x'V\langle\eta^2\rangle}{q_r^2}\left[\int_0^\infty\gamma(0)\,uJ_0(u)\,\mathrm{d}u + \int_0^\infty\frac{\gamma'(0)}{q_r}\,u^2J_0(u)\,\mathrm{d}u + \int_0^\infty\frac{\gamma''(0)}{2q_r^2}\,u^3J_0(u)\,\mathrm{d}u + \cdots\right]. \tag{B1.9.88}$$

The following relationships can be used to simplify the above equation

$$\int_0^\infty xJ_0(x)\,\mathrm{d}x = 0, \qquad \int_0^\infty x^2J_0(x)\,\mathrm{d}x = -1, \qquad \int_0^\infty x^3J_0(x)\,\mathrm{d}x = 0. \tag{B1.9.89}$$

Thus, we obtain

$$I(q_r) = -\frac{2\pi K_x'V\langle\eta^2\rangle}{q_r^3}\gamma'(0) + O\left(q_r^{-2n-1}, \mathrm{d}^{2n-1}\gamma(0)/\mathrm{d}r^{2n-1}\right) \quad (n = 2, 3, 4, \ldots). \tag{B1.9.90}$$

This expression holds true only in the large $q_r$ region in reciprocal space (or small $r$ in real space). Since the second term in equation (B1.9.90) rapidly approaches zero at large $q_r$, we have

$$\lim_{q_r\to\infty}I(q_r) = -\frac{2\pi K_x'V\langle\eta^2\rangle}{q_r^3}\gamma'(0). \tag{B1.9.91}$$

This equation is the Porod law for the large-angle tail of the scattering curve along the equatorial direction, which indicates that the equatorial scattered intensity $I(q_r)$ is proportional to $q_r^{-3}$ in the Porod region of an anisotropic system. Cohen and Thomas have derived the following relationships for the two-dimensional two-phase system (with sharp interfaces) such as fibres [29].

$$\langle\eta^2\rangle = \varphi_1(1 - \varphi_1)(\rho_1 - \rho_2)^2 \tag{B1.9.92}$$

$$\gamma'(0) = -\frac{L_I}{\pi A\,\varphi_1(1 - \varphi_1)} \tag{B1.9.93}$$

where $\varphi_1$ represents the area fraction of a phase, $L_I$ is the length of the interface between the two phases and $A$ is the total cross sectional area.

(2) For the component of the scattered intensity along the meridional direction (i.e., parallel to the cylinder direction, $q_r = 0$), equation (B1.9.61) can be rewritten as

$$I(q_z) = K_x''V\langle\eta^2\rangle\int_0^\infty 2\gamma(z)\cos(q_zz)\,\mathrm{d}z \tag{B1.9.94}$$

where $K_x''$ $(= K_x\int_0^\infty 2\pi\gamma(r)\,r\,\mathrm{d}r)$ is a constant, because the integral of $\gamma(r)r$ is a constant. Again, we can expand the term $\gamma(z)$ using the Taylor series in the small $z$ region to derive the Porod law. Equation (B1.9.94) thus becomes ($q_zz = v$)

$$I(q_z) = \frac{2K_x''V\langle\eta^2\rangle}{q_z}\left[\int_0^\infty\gamma(0)\cos(v)\,\mathrm{d}v + \int_0^\infty\frac{\gamma'(0)}{q_z}\,v\cos(v)\,\mathrm{d}v + \int_0^\infty\frac{\gamma''(0)}{2q_z^2}\,v^2\cos(v)\,\mathrm{d}v + \cdots\right]. \tag{B1.9.95}$$

The following expressions can be derived to simplify (B1.9.95)

$$\int_0^\infty\cos(x)\,\mathrm{d}x = 0, \qquad \int_0^\infty x\cos(x)\,\mathrm{d}x = -1, \qquad \int_0^\infty x^2\cos(x)\,\mathrm{d}x = 0. \tag{B1.9.96}$$

Thus,

$$I(q_z) = -\frac{2K_x''V\langle\eta^2\rangle}{q_z^2}\gamma'(0) + O\left(q_z^{-2n}, \mathrm{d}^{2n-1}\gamma(0)/\mathrm{d}z^{2n-1}\right) \quad (n = 2, 3, 4, \ldots). \tag{B1.9.97}$$

Again, the second term in (B1.9.97) rapidly approaches zero at large $q_z$, thus we obtain

$$\lim_{q_z\to\infty}I(q_z) = -\frac{2K_x''V\langle\eta^2\rangle}{q_z^2}\gamma'(0). \tag{B1.9.98}$$

If the scatterers are elongated along the fibre direction and the two phases and their interfaces have the same consistency in both radial and $z$ directions, a different expression of the Porod law can be derived by substituting (B1.9.92) and (B1.9.93).

$$\lim_{q_z\to\infty}I(q_z) = \frac{2K_x''V(\rho_1 - \rho_2)^2L_I}{\pi A\,q_z^2}. \tag{B1.9.99}$$

This is the Porod law for the large angle tail of the scattering curve in the meridional direction. In this case, the scattered intensity is proportional to $q_z^{-2}$ at large scattering angles.

B1.9.4.4 SCATTERING FROM SEMICRYSTALLINE POLYMERS 

Semicrystalline polymers are ideal objects to be studied by small-angle x-ray scattering (SAXS), because electron density variations of the semicrystalline morphology (with alternating crystalline and amorphous structures) have a correlation length of several hundred ångströms, which falls in the resolution range of SAXS (1-100 nm). In addition, the semicrystalline structures can usually be described by assuming electron density variations to occur in one coordinate only. In this case, the scattered intensity $I(q)$ can be described by a one-dimensional correlation function $\gamma_1(r)$.

The scattered intensity measured from the isotropic three-dimensional object can be transformed to the one-dimensional intensity function $I_1(q)$ by means of the Lorentz correction [15]

$$I_1(q) = I(q)\,4\pi q^2 \tag{B1.9.100}$$

where the term $4\pi q^2$ represents the scattering volume correction in space. In this case, the correlation and interface distribution functions become

$$\gamma_1(r) = \left[\int_0^\infty I_1(q)\cos(qr)\,\mathrm{d}q\right]\Big/Q \tag{B1.9.101}$$


qdr) = d 2 {Y\ir))fdr 


= K 


Ii{qWcQsiqr)dtt]fQ 


)/< 


(B1.9.102) 


where $Q$ $(= \int_0^\infty I_1(q)\,\mathrm{d}q = 4\pi\int_0^\infty I(q)q^2\,\mathrm{d}q)$ is the invariant. The above two equations are valid only for lamellar structures.

Lamellar morphology variables in semicrystalline polymers can be estimated from the correlation and interface distribution functions using a two-phase model. The analysis of a correlation function by the two-phase model has been demonstrated in detail before [30, 31]. The thicknesses of the two constituent phases (crystal and amorphous) can be extracted by several approaches described by Strobl and Schneider [32]. For example, one approach is based on the following relationship:

$$x_1x_2 = \frac{B}{L} \tag{B1.9.103}$$

where $x_1$ and $x_2$ are the linear fractions of the two phases within the lamellar morphology, $B$ is the value of the abscissa when the ordinate is first equal to zero in $\gamma_1(r)$ and $L$ represents the long period determined as the first maximum of $\gamma_1(r)$ (figure B1.9.11).
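Given $B$ and $L$ read from a correlation function, the linear fractions follow from the quadratic $x_1(1 - x_1) = B/L$, using $x_1 + x_2 = 1$. The sketch below is illustrative; the function names and the example values of $B$ and $L$ are hypothetical.

```python
import math

def linear_fractions(b, long_period):
    """Solve x1 * x2 = B / L with x1 + x2 = 1; returns (smaller, larger) fraction."""
    ratio = b / long_period
    if not 0.0 < ratio <= 0.25:
        raise ValueError("B/L must lie in (0, 0.25] for real linear fractions")
    root = math.sqrt(1.0 - 4.0 * ratio)
    x_minor = 0.5 * (1.0 - root)
    return x_minor, 1.0 - x_minor

def layer_thicknesses(b, long_period):
    """Thicknesses of the two layers, x1*L and x2*L."""
    x1, x2 = linear_fractions(b, long_period)
    return x1 * long_period, x2 * long_period

# Example: first zero at B = 2.1 nm and long period L = 10 nm (made-up numbers).
x1, x2 = linear_fractions(2.1, 10.0)
l1, l2 = layer_thicknesses(2.1, 10.0)
```

The relationship is symmetric in $x_1$ and $x_2$, so the correlation function alone does not say which of the two thicknesses belongs to the crystalline layer; that assignment needs independent information such as the degree of crystallinity.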


Figure B1.9.11. The analysis of correlation function using a lamellar model.

The analysis of the interface distribution function $g_1(r)$ is relatively straightforward [33]. The profile of $g_1(r)$ can be directly calculated from the Fourier transformation of the interference function or by taking the second derivative of the correlation function (B1.9.102). In the physical sense, the interface distribution function represents the probability of finding an interface along the density profile. A positive value indicates an even number of interfaces within a real space distance with respect to the origin. A negative value indicates an odd number of interfaces within the corresponding distance. With lamellar morphology, odd numbers of interfaces correspond to integral numbers of long periods. The shape of the probability distribution with distance for a given interface manifests as the shape of the corresponding peak on the interface distribution function. These distributions can be deconvoluted to reveal more detailed morphological parameters [34]. The schematic diagram of the relationships between the one-dimensional electron density profile, $\rho(r)$, correlation function, $\gamma_1(r)$, and interface distribution function, $g_1(r)$, is shown in figure B1.9.12. In general, we find the values of the long period calculated from different methods, such as a conventional analysis by using Bragg's law, the correlation function and the interface distribution function, to be quite different. However, their trends as functions of time and temperature are usually similar. The ordering of these long periods indicates the heterogeneity of the lamellar distributions in the morphology [35].

Figure B1.9.12. The schematic diagram of the relationships between the one-dimensional electron density profile, $\rho(r)$, correlation function $\gamma_1(r)$ and interface distribution function $g_1(r)$.

In oriented systems (fibres or stretched films), the scattered image often appears as a two-bar or a four-point pattern with the scattering maximum at or near the meridian (fibre axis). The one-dimensional scattered intensity along the meridian must be calculated by the projection method using the following formalism

$$I_1(q_z) = \int_0^\infty I(q_r, q_z)\,q_r\,\mathrm{d}q_r. \tag{B1.9.104}$$

This intensity can be used to calculate the correlation function (B1.9.101) and the interface distribution function (B1.9.102) and to yield the lamellar crystal and amorphous layer thicknesses along the fibre.

Recently, a unique approach for using the correlation function method has been demonstrated to extract morphological variables in crystalline polymers from time-resolved synchrotron SAXS data. The principle of the calculation is based on two alternative expressions of Porod's law using the form of the interference function [33, 36]. This approach enables a continuous estimate of the Porod constant, with corrections for liquid scattering and for the finite interface between the two phases, from the time-resolved data. Many detailed morphological variables such as lamellar long period, thicknesses of crystal and amorphous phases, interface thickness and scattering invariant can be estimated. An example analysis of isothermal crystallization in poly(ethylene terephthalate) (PET) at 230 °C measured by synchrotron SAXS is illustrated here. Time-resolved synchrotron SAXS profiles after the removal of background scattering (air and windows), calculated correlation function profiles and morphological variables extracted by using the two-phase crystal lamellar model are shown in figure B1.9.13. Two distinguishable stages are seen in this figure (the first stage was collected at 5 seconds per scan; the later stage was collected at 30 seconds per scan). It is seen that the long period $L$ and crystal lamellar thickness $l_c$ decrease with time. This behaviour can be explained by the space filling of thinner secondary crystal lamellae after the initial formation of thicker primary lamellae during isothermal crystallization [36].

Figure B1.9.13. Time-resolved SAXS profiles during isothermal crystallization (230 °C) of PET (the first 48 scans were collected with 5 seconds scan time, the last 52 scans were collected with 30 seconds scan time); calculated correlation functions $\gamma(r)$ (normalized by the invariant $Q$) and lamellar morphological variables extracted from the correlation functions (invariant $Q$, long period $L$, crystal lamellar thickness $l_c$ and interlamellar amorphous thickness $l_a$).

B1.9.5 NEUTRON SCATTERING

Neutron scattering depends upon nuclear properties, which are related to fluctuations in the neutron scattering cross section $\sigma$ between the scatterer and the surroundings. The scattered amplitude from a collection of scatterers can thus be written as (similar to (B1.9.29)):

$$F(q) = \sum_jb_j\exp(\mathrm{i}(qr_j)) \tag{B1.9.105}$$

where $b_j$ is referred to as the 'scattering length' of the object $j$; its square is related to the scattering cross section ($\sigma_j = 4\pi b_j^2$) of the $j$th object. The value of $b_j$ depends on the property of the nucleus and is generally different for different isotopes of the same element. As neutron scattering by nature is a nuclear event and the wavelength used in neutron scattering is much larger than the nuclear dimensions, the intranuclear interference of waves due to neutron scattering need not be considered, so that $b_j$ is normally independent of scattering angle.

In neutron scattering, the scattered intensity is often expressed in terms of the differential scattering cross section:

$$\frac{\partial\sigma}{\partial\Omega} = \sum_{i,j}^N\left\langle b_ib_j\exp(\mathrm{i}q(r_j - r_i))\right\rangle \tag{B1.9.106}$$

where $\mathrm{d}\Omega$ is the small solid angle into which the scattered neutrons are accepted. We can mathematically divide the above equation into two parts:

$$\frac{\partial\sigma}{\partial\Omega} = \sum_{i,j}^N\langle b_ib_j\rangle\left\langle\exp(\mathrm{i}q(r_j - r_i))\right\rangle. \tag{B1.9.107}$$

If we define

$$\Delta b^2 = \langle b^2\rangle - \langle b\rangle^2 \tag{B1.9.108}$$

we can obtain the following relationship:

$$\frac{\partial\sigma}{\partial\Omega} = N\Delta b^2 + \langle b\rangle^2\sum_{i,j}^N\left\langle\exp(\mathrm{i}q(r_j - r_i))\right\rangle \tag{B1.9.109}$$

where $\langle b\rangle = \bar{b}$. The first term represents the incoherent scattering, which depends only on the fluctuations of the scattering length $b$ occupying the different positions. The second term is the coherent scattering, which depends on the positions of all the scattering centres and is responsible for the angular dependence of the scattered intensity. In this chapter, we shall only focus on the phenomenon of coherent scattering. Using the concept of correlation function (as in equation (B1.9.54)), we obtain


$$I(q) = \frac{\partial\sigma}{\partial\Omega} = K_nb^2\int\gamma(r)\,\mathrm{e}^{-\mathrm{i}qr}\,\mathrm{d}V \tag{B1.9.110}$$

where $K_n$ is a calibration constant depending on the incident flux and apparatus geometry. The expressions of scattered intensity for isotropic and anisotropic systems can be obtained similarly to equations (B1.9.49) and (B1.9.50).


B1.9.5.1 SCATTERING FROM MULTICOMPONENT SYSTEMS


Equation (B1.9.110) is for a system containing only one type of scattering length. Let us consider a system containing more than one species, such as a two-species mixture with $N_1$ molecules of scattering length $b_1$ and $N_2$ of scattering length $b_2$. The coherent scattered intensity $I(q)$ becomes


AS /Vi 


-V| N> 




'I /3 

ft m 


Ft? /V: ^ 


(B1.9.111) 


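or

$$I(q) = K_n\left[b_1^2\int\gamma_{11}(r)\,\mathrm{e}^{-\mathrm{i}qr}\,\mathrm{d}V + 2b_1b_2\int\gamma_{12}(r)\,\mathrm{e}^{-\mathrm{i}qr}\,\mathrm{d}V + b_2^2\int\gamma_{22}(r)\,\mathrm{e}^{-\mathrm{i}qr}\,\mathrm{d}V\right] \tag{B1.9.112}$$

where $\gamma_{ij}$ is the correlation function between the local density of constituents $i$ and $j$. We can define three partial structure factors, $S_{11}$, $S_{12}$ and $S_{22}$,

$$S_{ij}(q) = K_n\int\gamma_{ij}(r)\,\mathrm{e}^{-\mathrm{i}qr}\,\mathrm{d}V. \tag{B1.9.113}$$

Thus

$$I(q) = b_1^2S_{11}(q) + 2b_1b_2S_{12}(q) + b_2^2S_{22}(q). \tag{B1.9.114}$$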

If the two constituting molecular volumes are identical in a two component system, we can obtain [37, 38]

$$S_{11}(q) = S_{22}(q) = -S_{12}(q) \tag{B1.9.115}$$

or

$$I(q) = (b_1 - b_2)^2S_{11}(q). \tag{B1.9.116}$$

Let us assume that we have $p + 1$ different species; equation (B1.9.111) can be generalized as

$$I(q) = \sum_{i=1}^pA_i^2S_{ii}(q) + 2\sum_{i<j}A_iA_jS_{ij}(q). \tag{B1.9.117}$$

B1.9.5.2 PROPERTIES OF S(Q)

As we have introduced the structure factor $S(q)$ (B1.9.113), it is useful to separate this factor into two categories of interferences for a system containing $N$ scattering particles [9]:

$$S(q) = N[P(q) + NQ(q)]. \tag{B1.9.118}$$

The first term, $P(q)$, represents the interferences within particles and its contribution is proportional to the number of particles, $N$. The second term, $Q(q)$, involves interparticle interferences and is proportional to the number of pairs of particles, $N^2$.

Let us consider the scattered intensity from a binary incompressible mixture of two species (containing $N_1$ molecules of particle 1 and $N_2$ molecules of particle 2) as in (B1.9.112); we can rewrite the relationship as

$$I(q) = b_1^2N_1[P_1(q) + N_1Q_{11}(q)] + b_2^2N_2[P_2(q) + N_2Q_{22}(q)] + 2b_1b_2N_1N_2Q_{12}(q) \tag{B1.9.119}$$

where $P_i(q)$ is the intramolecular interference of species $i$ and $Q_{ij}(q)$ is the intermolecular interference between species $i$ and $j$.

Two of the most important functions in the application of neutron scattering are the use of deuterium labelling for the study of molecular conformation in the bulk state and the use of deuterated solvent in polymer solutions. In the following, we will consider several different applications of the general formula to deuteration.

(1) Let us first consider two identical polymers, one deuterated and the other not, in a melt or a glassy state.
The two polymers (degree of polymerization $d$) differ from each other only by scattering lengths $b_H$ and
$b_D$. If the total number of molecules is $N$, $x$ is the volume fraction of the deuterated species ($x = N_D/N$,
with $N_D + N_H = N$). According to equation (B1.9.116), we obtain


$$I(q) = (b_D - b_H)^2 S_{DD}(q) \tag{B1.9.120}$$


where $S_{DD}$ may equally be replaced by $S_{HH}$ or $-S_{HD}$. The coherent scattering factor can further be expressed
in terms of $P(q)$ and $Q(q)$ as (see equation (B1.9.117))


$$S_{DD} = xNd^2P(q) + x^2N^2d^2Q(q) = (1-x)Nd^2P(q) + (1-x)^2N^2d^2Q(q) = -x(1-x)N^2d^2Q(q). \tag{B1.9.121}$$

Thus

$$NQ(q) = -P(q). \tag{B1.9.122}$$

By combining equations (B1.9.120) and (B1.9.122), we have

$$I(q) = (b_D - b_H)^2 x(1-x) N d^2 P(q). \tag{B1.9.123}$$

(2) Next, let us consider the case of a system made up of two polymers with different degrees of
polymerization, $d_D$ for the deuterated species and $d_H$ for the other. The generalized expression of
(B1.9.122) becomes:


$$-N_D d_D^2 P_D(q) = N_D^2 d_D^2 Q_{DD}(q) + N_D N_H d_D d_H Q_{DH}(q). \tag{B1.9.124}$$

(3) Let us take two polymers (one deuterated and one hydrogenated) and dissolve them in a solvent (or
another polymer) having a scattering length $b_0$. The coherent scattered intensity can be derived from
(B1.9.117), which gives


$$I(q) = (b_D - b_0)^2 S_{DD}(q) + (b_H - b_0)^2 S_{HH}(q) + 2(b_D - b_0)(b_H - b_0) S_{HD}(q) \tag{B1.9.125}$$


where


$$S_{DD} = xNd^2P(q) + x^2N^2d^2Q(q), \quad S_{HH} = (1-x)Nd^2P(q) + (1-x)^2N^2d^2Q(q), \quad S_{HD} = x(1-x)N^2d^2Q(q). \tag{B1.9.126}$$

Thus, we have
$$I(q) = (b_D - b_H)^2 x(1-x) N d^2 P(q) + \left[x b_D + (1-x) b_H - b_0\right]^2 N d^2 \left[P(q) + N Q(q)\right]. \tag{B1.9.127}$$

The second term of the above equation gives an important adjustment to contrast variation between the
solvent and the polymer. If one adjusts the scattering length $b_0$ of the solvent by using a mixture of deuterated
and hydrogenated solvent such that

$$x b_D + (1-x) b_H - b_0 = 0 \tag{B1.9.128}$$


then the second term vanishes and we obtain (B1.9.123). This experiment thus yields directly the form factor $P(q)$ of the polymer
molecules in solution even at high polymer concentrations.
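
The algebra behind the contrast-matching argument can be checked numerically. The following is a minimal Python sketch, not part of the original treatment: it assumes the partial structure factors $S_{DD} = xNd^2P + x^2N^2d^2Q$, $S_{HH} = (1-x)Nd^2P + (1-x)^2N^2d^2Q$ and $S_{HD} = x(1-x)N^2d^2Q$, evaluates the intensity both from (B1.9.125) and from the two-term form (B1.9.127), and then applies the matching condition (B1.9.128). The function names and all numerical values are hypothetical and chosen only for illustration.

```python
def partial_structure_factors(x, N, d, P, Q):
    """Return (S_DD, S_HH, S_HD) for a labelled/unlabelled pair in a solvent."""
    S_DD = x * N * d**2 * P + x**2 * N**2 * d**2 * Q
    S_HH = (1 - x) * N * d**2 * P + (1 - x)**2 * N**2 * d**2 * Q
    S_HD = x * (1 - x) * N**2 * d**2 * Q  # cross term, taken with a positive sign
    return S_DD, S_HH, S_HD


def intensity_from_partials(b_D, b_H, b_0, x, N, d, P, Q):
    """Three-term intensity, equation (B1.9.125)."""
    S_DD, S_HH, S_HD = partial_structure_factors(x, N, d, P, Q)
    return ((b_D - b_0)**2 * S_DD + (b_H - b_0)**2 * S_HH
            + 2 * (b_D - b_0) * (b_H - b_0) * S_HD)


def intensity_contrast_form(b_D, b_H, b_0, x, N, d, P, Q):
    """Two-term intensity, equation (B1.9.127)."""
    mean_contrast = x * b_D + (1 - x) * b_H - b_0
    return ((b_D - b_H)**2 * x * (1 - x) * N * d**2 * P
            + mean_contrast**2 * N * d**2 * (P + N * Q))


def matched_solvent_length(b_D, b_H, x):
    """Solvent scattering length that satisfies equation (B1.9.128)."""
    return x * b_D + (1 - x) * b_H


# Arbitrary illustrative numbers (hypothetical, not measured values).
params = dict(b_D=10.0, b_H=2.0, x=0.3, N=1000, d=100, P=0.8, Q=-2.0e-4)

i_partials = intensity_from_partials(b_0=5.0, **params)
i_contrast = intensity_contrast_form(b_0=5.0, **params)

b_match = matched_solvent_length(params["b_D"], params["b_H"], params["x"])
i_matched = intensity_contrast_form(b_0=b_match, **params)
single_chain = ((params["b_D"] - params["b_H"])**2 * params["x"] * (1 - params["x"])
                * params["N"] * params["d"]**2 * params["P"])

print("three-term form:", i_partials)
print("two-term form:  ", i_contrast)
print("matched b_0:    ", b_match)
print("matched I(q):   ", i_matched, "single-chain term:", single_chain)
```

Under these assumptions the two forms agree for any solvent scattering length, and at the matched value only the single-chain term proportional to $P(q)$ survives.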

B1.9.5.4 ANALYSIS OF MOLECULAR PARAMETERS

There are many different data analysis schemes to estimate the structure and molecular parameters of
polymers from the neutron scattering data. Herein, we will present several common methods for
characterizing the scattering profiles, depending only on the applicable $q$ range. These methods, which were
derived based on different assumptions, have different limitations. We caution the reader to check the
limitations of each method before its application.

(1) If we deal with a solution at very low concentrations, we can ignore the interactions between the
particles and express the scattered intensity as


$$\log I(q) = \log I(0) - \frac{q^2 R_g^2}{3} + \cdots \tag{B1.9.129}$$


where $R_g$ is the radius of gyration and log denotes the natural logarithm. This equation is similar to
(B1.9.35). If one plots $\log[I(q)]$ as a function of $q^2$, the initial part is a straight line with a negative
slope proportional to $R_g^2$, which is called the Guinier plot. This approach is only suitable for scattering in
the low $qR_g$ range and in dilute concentrations. A similar expression proposed by Zimm has a slightly
different form:


$$\frac{1}{S(q)} = \frac{1}{N d^2}\left(1 + \frac{q^2 R_g^2}{3} + \cdots\right) \tag{B1.9.130}$$


where $N$ is the number of scattering objects and $d$ is the degree of polymerization. This equation is
similar to equation (B1.9.35). Thus if one plots $1/S(q)$ as a function of $q^2$, the initial slope relative to the intercept is $R_g^2/3$.

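The above radius of gyration is for an isotropic system. If the system is anisotropic, the mean square
radius of gyration is equal to


$$\langle R_g^2\rangle = \langle R_x^2\rangle + \langle R_y^2\rangle + \langle R_z^2\rangle \tag{B1.9.131}$$


where $R_x$, $R_y$ and $R_z$ are the components of the radius of gyration along the $x$, $y$, $z$ axes. For the isotropic
system


$$\langle R_x^2\rangle = \langle R_y^2\rangle = \langle R_z^2\rangle = \tfrac{1}{3}\langle R_g^2\rangle. \tag{B1.9.132}$$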

(2) In the intermediate and high $q$ range, the analysis becomes quite different. The qualitative interpretation
of the scattering profile in the high $q$ range may make use of the scaling argument proposed by de
Gennes [39]. If we neglect the intermolecular interactions, we can write

$$S(q) = N d^2 P(qR_g) = V \varphi\, d\, P(qR_g) \tag{B1.9.133}$$

where $V$ is the volume of the sample, $\varphi$ is the volume fraction occupied by the scattering units in a
polymer with a degree of polymerization $d$ and $qR_g$ is a dimensionless quantity which is associated with
a characteristic dimension. Typically, the term $P(qR_g)$ can be approximated by $(qR_g)^{-p}$. Since the
relationships between $R_g$ and $d$ are known as follows


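$$
\begin{array}{ll}
\text{a Gaussian chain} & R_g \sim d^{0.5} \\
\text{a chain with excluded volume (Flory)} & R_g \sim d^{0.6} \\
\text{a rod} & R_g \sim d^{1.0} \\
\text{general expression} & R_g \sim d^{\alpha}
\end{array}
\tag{B1.9.134}
$$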
This leads to the following equation:

$$S(q) = V\varphi\, d\,(q d^{\alpha})^{-p} = V\varphi\, q^{-p} d^{\,1-\alpha p}. \tag{B1.9.135}$$


In order to have $S(q)$ independent of $d$, the power of $d$ must be zero, giving $\alpha = 1/p$. This gives rise to the
following relationships:

$$
\begin{array}{ll}
\text{a Gaussian chain} & S(q) \sim q^{-2} \\
\text{a chain with excluded volume (Flory)} & S(q) \sim q^{-5/3} \\
\text{a rod} & S(q) \sim q^{-1}
\end{array}
\tag{B1.9.136}
$$


(3) The quantitative analysis of the scattering profile in the high $q$ range can be made by using the approach
of Debye et al as in equation (B1.9.52). If we assume that the correlation function $\gamma(r)$ has a simple
exponential form, $\gamma(r) = \exp(-r/a_c)$, where $a_c$ is the correlation length, the scattered intensity can be
expressed as


$$I(q) = \frac{8\pi a_c^3\, \varphi(1-\varphi)\, b_v^2}{(1 + q^2 a_c^2)^2} \tag{B1.9.137}$$


where $\varphi$ is the volume fraction of the component (with scattering length $b_1$ and volume of the monomer
of the polymer $v_1$), $(1 - \varphi)$ is the volume fraction of the solvent (with scattering length $b_0$ and volume of
the monomer of the other polymer $v_0$) and $b_v = b_1/v_1 - b_0/v_0$. Thus the correlation length $a_c$ can be
calculated by plotting $I(q)^{-1/2}$ versus $q^2$, using equation (B1.9.53) (as in light scattering).


A more general case of continuously varying density was treated by Ornstein and Zernike for scattering of
opalescence from a two-phase system [40]. They argued that


$$\gamma(r) = \text{constant} \times \frac{e^{-r/\xi}}{r} \tag{B1.9.138}$$


where $\xi$ is a characteristic length. This leads to

$$S(q) = \frac{B}{1 + q^2\xi^2} \tag{B1.9.139}$$

where $B$ is a constant.
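
Both the Guinier analysis of item (1) and the Ornstein-Zernike form above reduce to straight-line fits against $q^2$. The sketch below is one possible way to carry them out in plain Python, assuming noise-free synthetic data, natural logarithms in the Guinier law, and an unweighted least-squares fit; the function names and the numerical values ($R_g = 50$, $\xi = 30$, in arbitrary length units) are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Unweighted least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_mean - slope * x_mean


def guinier_fit(q_values, intensities):
    """Fit ln I(q) against q^2; the slope is -R_g^2/3. Returns (R_g, I(0))."""
    slope, intercept = linear_fit([q * q for q in q_values],
                                  [math.log(i) for i in intensities])
    return math.sqrt(-3.0 * slope), math.exp(intercept)


def ornstein_zernike_fit(q_values, s_values):
    """Fit 1/S(q) against q^2; slope/intercept is xi^2. Returns (xi, B)."""
    slope, intercept = linear_fit([q * q for q in q_values],
                                  [1.0 / s for s in s_values])
    return math.sqrt(slope / intercept), 1.0 / intercept


# Synthetic, noise-free data restricted to q * R_g <= 1 (illustrative only).
q_values = [0.002 * k for k in range(1, 11)]
guinier_data = [100.0 * math.exp(-(q * 50.0) ** 2 / 3.0) for q in q_values]
oz_data = [5.0 / (1.0 + (q * 30.0) ** 2) for q in q_values]

rg_fit, i0_fit = guinier_fit(q_values, guinier_data)
xi_fit, b_fit = ornstein_zernike_fit(q_values, oz_data)
print("R_g =", rg_fit, " I(0) =", i0_fit)
print("xi  =", xi_fit, " B    =", b_fit)
```

With real data the fit range would need to be restricted to the region where each approximation holds, and the points would normally be weighted by their statistical uncertainties.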

B1.9.5.5 CALCULATE THERMODYNAMIC PARAMETER 

In polymer solutions or blends, one of the most important thermodynamic parameters that can be calculated
from the (neutron) scattering data is the enthalpic interaction parameter $\chi$ between the components. Based on
the Flory-Huggins theory [41, 42], the scattering intensity from a polymer in a solution can be expressed as

$$\frac{1}{s(q)} = \frac{1}{\varphi z P(q)} + \frac{1}{\varphi_0} - 2\chi \tag{B1.9.140}$$

where $s(q) = S(q)/N_T$ corresponds to the scattered intensity from a volume equal to that of a molecule of
solvent, $N_T$ is the ratio of the solution volume to the volume of one solvent molecule ($N_T = V/v_s$), $\varphi$ is the
volume fraction occupied by the polymer, $\varphi_0$ is the volume fraction occupied by the solvent and $v_s$ is the
solvent specific volume. For a binary polymer blend, equation (B1.9.140) can be generalized as:

$$\frac{1}{s(q)} = \frac{1}{\varphi_1 z_1 P_1(q)} + \frac{1}{\varphi_2 z_2 P_2(q)} - 2\chi \tag{B1.9.141}$$

where the subscripts 1 and 2 represent a solution of polymer 1 in polymer 2. The generalized Flory-Huggins
model in the de Gennes formalism with the random phase approximation has the form:

$$\frac{V}{I(q)} = \frac{1}{b_v^2}\left[\frac{1}{(C_1 M_1/N_A)\, P_1(q)\, (v_1^s)^2} + \frac{1}{(C_2 M_2/N_A)\, P_2(q)\, (v_2^s)^2} - \frac{2\chi}{v_0}\right] \tag{B1.9.142}$$

where $b_v$ is the contrast factor as described before, $N_A$ is Avogadro's number, $C$ is the concentration, $M$ is the
molecular weight, $v^s$ is the specific volume ($C_1 v_1^s + C_2 v_2^s = 1$) and $v_0$ is an arbitrary reference volume.




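In the case of low concentration and low $q$ expansion, equation (B1.9.142) can be expressed by replacing $P_i(q)$
with the Guinier approximation (B1.9.130)


$$\frac{1}{s(q)} = \left(\frac{1}{\varphi_1 z_1} + \frac{1}{\varphi_2 z_2} - 2\chi\right) + \frac{q^2}{3}\left(\frac{R_{g1}^2}{\varphi_1 z_1} + \frac{R_{g2}^2}{\varphi_2 z_2}\right). \tag{B1.9.143}$$


Thus the slope of the $I(q)^{-1}$ versus $q^2$ plot is related to the values of the two radii of gyration.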


B1.6.1 CONCLUDING REMARKS 

In this chapter, we have reviewed the general scattering principles from matter by light, neutrons and x-rays 
and the data treatments for the different states of matter. The interaction between radiation and matter has the 
same formalism for all three cases of scattering, but the difference arises from the intrinsic property of each 
radiation. The major difference in data treatments results from the different states of matter. Although we 
have provided a broad overview of the different treatments, the content is by no means complete. Our 
objective in this chapter is to provide the reader a general background for the applications of scattering 
techniques to materials using light, neutrons and x-rays.

B1.10 Coincidence techniques

Michael A Coplan 


B1.10.1 INTRODUCTION

Time is a fundamental variable in almost all experimental measurements. For some experiments, time is
measured directly, as in the determination of radioactive half-lives and the lifetimes of excited states of atoms
and molecules. Velocity measurements require the measurement of the time required for an object to travel a 
fixed distance. Time measurements are also used to identify events that bear some correlation to each other 
and as a means for reducing background noise in experiments that would otherwise be difficult or impossible 
to perform. Examples of time correlation measurements are so-called coincidence measurements where two or 
more separate events can be associated with a common originating event by virtue of their time correlation. 
Positron annihilation, in which a positron and an electron combine to yield two gamma rays, is an example of
this kind of coincidence measurement. Electron impact ionization of atoms, in which an incident electron
strikes an atom, ejects an electron and is simultaneously scattered, is another example. In such an experiment
the ejected and scattered electrons originate from the same event and their arrival at two separate detectors is 
correlated in time. In one of the very first coincidence experiments Bothe and Geiger [1] used the time 
correlation between the recoil electron and inelastically scattered x-ray photon, as recorded on a moving film 
strip, to identify Compton scattering of an incident x-ray and establish energy conservation on the 
microscopic level. An example of the use of time correlation to enhance signal-to-noise ratios can be found in
experiments where there is a background signal that is uncorrelated with the signal of interest. The effect of
penetrating high-energy charged particles from cosmic rays can be eliminated from a gamma ray detector by
construction of an anti-coincidence shield. Because a signal from the shield will also be accompanied by a
signal at the detector, these spurious signals can be eliminated.

Experiments in almost all subjects from biophysics to high-energy particle physics and cosmic ray physics use 
time correlation methods. There are a few general principles that govern all time correlation measurements 
and these will be discussed in sufficient detail to be useful in not only constructing experiments where time is 
a parameter, but also in evaluating the results of time measurements and optimizing the operating parameters. 
In a general way, all physical measurements either implicitly or explicitly have time as a variable. 
Recognizing this is essential in the design of experiments and analysis of the results. The rapid pace of
improvement and innovation in electronic devices and computers has provided the experimenter with
electronic solutions to experimental problems that in the past could only be solved with custom hardware.


B1.10.2 STATISTICS

B1.10.2.1 CORRELATED AND RANDOM EVENTS

Correlated events are related in time and this time relation can be measured either with respect to an external
clock or to the events themselves. Random or uncorrelated events bear no fixed time relation to each other
but, on the other hand, their very randomness allows them to be quantified. Consider the passing of cars on a
busy street. It is possible to calculate the probability, $P_{\bar n}(n)$, that $n$ cars pass within a given time interval
in terms of the average number of cars, $\bar n$, in that interval, where $P_{\bar n}(n)$, the Poisson distribution, is
given by [2]

$$P_{\bar n}(n) = \frac{\bar n^{\,n}}{n!}\, e^{-\bar n}.$$

For example, if one finds that 100 cars pass a fixed position on a highway in an hour, then the average number
of cars per minute is 100/60 = 1.67. The probability that two cars pass in a minute is given by
$(1.67^2/2!)\,e^{-1.67} = 0.26$. The probability that three times the average number of cars (five cars) pass per
unit time is 0.02. $P_{\bar n}(n)$ and the integral of $P_{\bar n}(n)$ for $\bar n = 1$ are shown in figure B1.10.1. It is
worthwhile to note that only a single parameter, $\bar n$, the average value, is sufficient to define the function;
moreover, the function is not symmetric about $\bar n$.

Figure B1.10.1. Poisson distribution for $\bar n = 1$ (left vertical axis). Cumulative Poisson distribution for $\bar n = 1$ (right
vertical axis). The cumulative distribution is the sum of the values of the distribution from 0 to $n$, where $n$ = 1,
2, 3, 4, 5 on this graph.
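
The Poisson probabilities for the passing-cars example can be evaluated directly. The short Python sketch below is only an illustration, with a hypothetical helper name `poisson`; it uses an average of 100/60 cars per minute and evaluates the probability of two cars and of five cars (three times the average) in one minute.

```python
import math


def poisson(n, mean):
    """Poisson probability of observing n events when the average is `mean`."""
    return mean**n / math.factorial(n) * math.exp(-mean)


mean_per_minute = 100 / 60              # 100 cars per hour
p_two = poisson(2, mean_per_minute)     # two cars in one minute
p_five = poisson(5, mean_per_minute)    # three times the average, i.e. five cars
total = sum(poisson(n, mean_per_minute) for n in range(50))

print(f"P(2) = {p_two:.3f}")
print(f"P(5) = {p_five:.3f}")
print(f"sum over n = {total:.6f}")
```

Evaluated this way, the probability of two cars in a minute is about 0.26 and that of five cars about 0.02, and the probabilities sum to one as they should.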

This example of passing cars has implications for counting experiments. An arrangement for particle counting 
is shown in figure B 1.10.2 . It consists of the source of particles, a detector, preamplifier, 
amplifier/discriminator, counter, and a storage device for recording the results. The detector converts the 
energy of the particle to an electrical signal that is amplified by a low-noise preamplifier to a level sufficient 
to be amplified and shaped by the amplifier. The discriminator converts the signal from the amplifier to a 
standard electrical pulse of fixed height and width, provided that the amplitude of the signal from the 
amplifier exceeds a set threshold. The counter records the number of pulses from the discriminator for a set 
period of time. The factors that affect the measurement are counting rate, signal durations, and processing 
times.

Figure B1.10.2. Schematic diagram of a counting experiment. The detector intercepts signals from the source.
The output of the detector is amplified by a preamplifier and then shaped and amplified further by an 
amplifier. The discriminator has variable lower and upper level thresholds. If a signal from the amplifier 
exceeds the lower threshold while remaining below the upper threshold, a pulse is produced that can be 
registered by a preprogrammed counter. The contents of the counter can be periodically transferred to an on- 
line storage device for further processing and analysis. The pulse shapes produced by each of the devices are 
shown schematically above them. 


In counting experiments, the instantaneous rate at which particles arrive at the detector can be significantly
different from the average rate. In order to assess the rate at which the system can accept data, it is necessary
to know how the signals from the detector, preamplifier, amplifier and discriminator each vary with time. For
example, if signals are arriving at an average rate of 1 kHz at the detector, the average time between the start
of each signal is $10^{-3}$ s. If we consider a time interval of $10^{-3}$ s, the average number of signals arriving in that
interval is one; if as many as three pulses can be registered in the $10^{-3}$ s then 99% of all pulses will be
counted. To accommodate the case of three pulses in $10^{-3}$ s it is necessary to recognize that the pulses will be
randomly distributed in the $10^{-3}$ s interval. To register the three randomly distributed pulses in the $10^{-3}$ s
interval requires approximately another factor of three in time resolution, with the result that the system must
have the capability of registering events at a 10 kHz uniform rate in order to be able to register with 99%
efficiency events arriving randomly at a 1 kHz average rate. For 99.9% efficiency, the required
system bandwidth increases to 100 kHz. Provided that the discriminator and counter are capable of handling
the rates, it is then necessary to be sure that the durations of all of the electronic signals are consistent with the
required time resolution. For 1 kHz, 10 kHz and 100 kHz bandwidths this means signal durations of 0.1, 0.01,
and 0.001 ms, respectively. An excellent discussion of the application of statistics to physics experiments is 
given by Melissinos [3]. 
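
The statement about registering up to three pulses per interval can be explored with the cumulative Poisson distribution. The following Python sketch, with a hypothetical helper `poisson_cdf`, tabulates the probability that no more than $k$ pulses fall in an interval that contains one pulse on average; it says nothing about the additional factor needed for the random spacing of pulses within the interval.

```python
import math


def poisson_cdf(k, mean):
    """Probability of observing at most k events when the average is `mean`."""
    return math.exp(-mean) * sum(mean**n / math.factorial(n) for n in range(k + 1))


# One pulse on average per interval (1 kHz average rate, 1 ms interval).
cumulative = {k: poisson_cdf(k, 1.0) for k in range(6)}
for k, value in cumulative.items():
    print(f"P(n <= {k}) = {value:.4f}")
```

For an average of one pulse per interval, the probability of three or fewer pulses comes out at about 0.98, close to the figure quoted above, and it rises to about 0.996 if four pulses can be accommodated.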

The processing times of the electronic units must also be taken into consideration. There are propagation 
delays associated with the active devices in the electronics circuits and delays associated with the actual 
registering of events by the counter. Processing delays are specified by the manufacturers of counters in terms 
of maximum count rate: a 10 MHz counter may only count at a 10 MHz rate if the input signals arrive at a
uniform rate. For randomly arriving signals with a 10 MHz average rate, a system with a 100 MHz bandwidth
is required to record 99% of the incoming events. 

B1.10.2.2 STATISTICAL UNCERTAINTIES

In counting experiments the result that is sought is a rate or the number of events registered per unit time. For
convenience we divide the total time of the measurement, $T$, into $n$ equal time intervals, each of length $T/n$. If
there are $N_i$ counts registered in interval $i$, then the mean or average rate is given by

$$\frac{1}{n}\sum_{i=1}^{n}\frac{N_i}{T/n} = \frac{N}{T}$$

where $N = \sum_i N_i$ is the total number of counts registered during the course of the experiment. To assess the
uncertainty in the overall measured rate, we assume that the individual rate measurements are statistically 
distributed about the average value, in other words, they arise from statistical fluctuations in the arrival times 
of the events and not from uncertainties in the measuring instruments. Here the assumption is that all events 
are registered with 100% efficiency and that there is negligible uncertainty in the time intervals over which 
the events are registered. 

When the rate measurement is statistically distributed about the mean, the distribution of events can be
described by the Poisson distribution, $P_{\bar n}(n)$, given by

$$P_{\bar n}(n) = \frac{\bar n^{\,n}}{n!}\, e^{-\bar n}$$

where $n$ is the number of events per unit time and the average number of events per unit time is $\bar n$. The
uncertainty in the rate is given by the standard deviation and is equal to the square root of the average rate,
$\sqrt{\bar n}$. Most significant is the relative uncertainty, $\sqrt{\bar n}/\bar n = 1/\sqrt{\bar n}$. The relative uncertainty decreases as the
number of counts per counting interval increases. For a fixed experimental arrangement the number of counts
can be increased by increasing the time of the measurement. As can be seen from the formula, the relative 
uncertainty can be made arbitrarily small by increasing the measurement time; however, improvement in 
relative uncertainty is proportional to the reciprocal of the square root of the measurement time. To reduce the 
relative uncertainty by a factor of two, it is necessary to increase the measurement time by a factor of four. 
One soon reaches a point where the fractional improvement in relative uncertainty requires a prohibitively 
long measurement time. 
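
The square-root dependence of the relative uncertainty on measurement time can be made concrete with a few lines of Python. This is a minimal sketch assuming pure Poisson counting statistics and a constant average rate; the function names and the rate of 10 counts per second are hypothetical.

```python
import math


def relative_uncertainty(rate, measurement_time):
    """Relative uncertainty 1/sqrt(N) for N = rate * measurement_time counts."""
    return 1.0 / math.sqrt(rate * measurement_time)


def time_for_relative_uncertainty(rate, target):
    """Measurement time needed to reach a target relative uncertainty."""
    return 1.0 / (rate * target**2)


rate = 10.0  # counts per second (illustrative value)
t_one_percent = time_for_relative_uncertainty(rate, 0.01)
t_half_percent = time_for_relative_uncertainty(rate, 0.005)

print("time for 1.0%:", t_one_percent, "s")
print("time for 0.5%:", t_half_percent, "s")
print("ratio of times:", t_half_percent / t_one_percent)
print("check:", relative_uncertainty(rate, t_one_percent))
```

Halving the relative uncertainty from 1% to 0.5% raises the required time from 1000 s to 4000 s at this rate, the factor of four described above.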

In practice, the length of time for an experiment depends on the stability and reliability of the components. 
For some experiments, the solar neutrino flux and the rate of decay of the proton being extreme examples, the 
count rate is so small that observation times of months or even years are required to yield rates of sufficiently 
small relative uncertainty to be significant. For high count rate experiments, the limitation is the speed with 
which the electronics can process and record the incoming information. 

In this section we have examined the issue of time with respect to the processing and recording of signals and
also with regard to statistical uncertainty. These considerations are the basis for the optimization of
more complex experiments where the time correlation between sets of events or among several different
events is sought.


B1.10.3 TIME-OF-FLIGHT EXPERIMENTS 

Time-of-flight experiments are used to measure particle velocities and particle mass per charge. The typical
experiment requires start and stop signals from detectors located at the beginning and end of the flight path,
see figure B1.10.3. The time-of-flight is then the time difference between the signals from the stop and start
detectors. The start signal is often generated by the opening of a shutter and the stop signal by the arrival of
the particle at a stop detector. Alternatively, the start signal may be generated by the particle passing through
a start detector that registers the passing of the particle without altering its motion in a significant way. The
result of accumulating thousands of time-of-flight signals is shown in figure B1.10.4, where the number of
events with a given time-of-flight is plotted against time-of-flight specified as channel number. The data for
the figure were acquired for a gas sample of hydrogen mixed with air. Because the ions all have the same
kinetic energy, the different flight times reflect their different masses per charge.

Figure B1.10.3. Time-of-flight experiment. Detectors at the beginning and end of the flight path sense the
passage of a particle through the entrance and exit apertures. The width of the exit aperture, W, determines the 
amount of transverse velocity particles can have and still be detected at the end of the flight path. Transverse 
velocity contributes to the dispersion of flight times for identical particles with the same kinetic energies. 
Detector signals are amplified by a preamplifier; threshold discriminators produce standard pulses whenever 
the incoming signals exceed an established threshold level. Signals from the start and stop discriminators 
initiate the operation of the TAC (time-to-amplitude converter). At the end of the TAC cycle, a pulse is 
produced with amplitude proportional to the time between the start and stop signals. The TAC output pulse 
amplitude is converted to a binary number by a pulse height analyser (PHA) or analogue-to-digital converter 
(ADC). The binary number serves as an address for the multichannel analyser (MCA) that adds one to the 
number stored at the specified address. In this way, a collection of flight times, a time spectrum, is built up in 
the memory of the MCA. The contents of the MCA can be periodically transferred to a storage device for 
analysis and display. 

B1.10.3.1 SOURCES OF UNCERTAINTIES

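The velocity is given by $v = L/(t_2 - t_1)$, where $L$ is the effective length of the flight path and $t_2 - t_1$ is
the difference in time between the arrival of the particle at the stop detector ($t_2$) and the start detector ($t_1$). The
uncertainty in the velocity is


$$\delta v = \frac{\delta L}{\Delta t} - \frac{L\,\delta(\Delta t)}{(\Delta t)^2}$$


where $\Delta t = t_2 - t_1$.


The relative uncertainty is


$$\frac{\delta v}{v} = \frac{\delta L}{L} - \frac{\delta(\Delta t)}{\Delta t}.$$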


In a conventional time-of-flight spectrometer, the transverse velocities of the particles and the angular
acceptance of the flight path and stop detector determine the dispersion in L. The dispersion in flight times is 
given by the time resolution of the start and stop detectors and the associated electronics. When a shutter is 
used there is also a time uncertainty associated with its opening and closing. A matter that cannot be 
overlooked in time-of-flight measurements is the rate at which measurements can be made. In the discussion 
the implicit assumption has been that only one particle is in the flight path at a time. If there is more than one, 
giving rise to multiple start and stop signals, it is not possible to associate a unique start and stop signal with 
each particle. It cannot be assumed that the first start signal and the first stop signals have been generated by 
the same particle, because the faster particles can overtake the slower ones in the flight path. When shutters 
are used, the opening time of the shutter must be sufficiently short to allow no more than one particle to enter 
the flight path at a time. These considerations give rise to constraints on the rate at which measurements can be
made. If the longest time-of-flight is $t_{\max}$, the particles must not be allowed to enter the flight path at a rate
that exceeds $0.1/t_{\max}$. Detector response time and the processing time of the electronics should be taken into
consideration when calculating $t_{\max}$. Time-of-flight experiments are inherently inefficient because the rate at
which the shutter is opened is set by the time-of-flight of the slowest particle while the duration of the shutter
opening is set by the total flux of particles incident on the shutter. The maximum uncertainty in the measured
time-of-flight is, on the other hand, determined by $t_{\min}$, the time for the fastest particle to traverse the flight
path.

Many of the electronics in the time-of-flight system are similar to those in the counting experiment, with the 
exception of the time-to-amplitude converter (TAC) and analogue-to-digital converter (ADC). The TAC has 
start and stop inputs and an output. The output is a pulse of amplitude (height) proportional to the time 
difference between the stop and start pulses. Different scale settings allow adjustment of both the pulse height 
and time range. Typical units also include true start and busy outputs that allow monitoring of the input start 
rate and the interval during which the unit is busy processing input signal and therefore unavailable for 
accepting new signals. 

The output of the TAC is normally connected to the input of a pulse height analyser (PHA), or ADC and a
multichannel analyser (MCA). The PHA/ADC assigns a binary number to the height of the input pulse: 0 for a
zero amplitude pulse and $2^n - 1$ for the maximum amplitude pulse, where $n$ is an integer that can be selected
according to the application. The binary number is used as the address for a multichannel analyser with $2^n$
address locations in the MCA. For each address from the PHA/ADC, a one is added to the contents of the
address location. If, for example, a TAC has a range of 0 to 4 V output amplitude for stop/start time
differences of 0 to 200 ns and the PHA/ADC assigns binary 0 to the 0 V amplitude signal and binary 255 to the 4
V amplitude signal, each of the 256 channels will correspond to 0.78 ns. With time, a histogram of flight
times is built up in the memory of the MCA. Figure B1.10.4 is an illustration of a time-of-flight spectrum for
a sample of air that has been ionized and accelerated to 2000 eV.


The different flight times are a measure of the mass per charge of the ions.

Figure B1.10.4. Time-of-flight histogram for ions resulting from the ionization of a sample of air with added
hydrogen. The ions have all been accelerated to the same energy (2 keV) so that their time of flight is directly
proportional to the square root of their mass.
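
The numbers in the time-of-flight discussion can be reproduced with a short calculation. The Python sketch below is illustrative only: it assumes singly charged ions with kinetic energy $E = \tfrac{1}{2}mv^2$, a hypothetical flight path of 0.5 m, and approximate integer masses for the ions; the function names are not taken from any instrument library. It evaluates the channel width for a 200 ns range digitized into 256 channels, the flight times at 2000 eV, and the start-rate limit of one tenth of the reciprocal of the longest flight time.

```python
import math

AMU_KG = 1.66053906660e-27   # atomic mass unit in kg
EV_J = 1.602176634e-19       # electron volt in J


def flight_time(mass_amu, energy_ev, path_m):
    """Flight time over path_m for an ion of given mass and kinetic energy."""
    speed = math.sqrt(2.0 * energy_ev * EV_J / (mass_amu * AMU_KG))
    return path_m / speed


def channel_width(time_range_s, n_bits):
    """Time per channel when time_range_s is digitized into 2**n_bits channels."""
    return time_range_s / 2**n_bits


def channel_number(t_s, time_range_s, n_bits):
    """Channel address for a time difference t_s (clipped to the last channel)."""
    return min(int(t_s / channel_width(time_range_s, n_bits)), 2**n_bits - 1)


def max_start_rate(t_max_s):
    """Upper limit on the start rate, one tenth of 1/t_max."""
    return 0.1 / t_max_s


PATH_M = 0.5        # hypothetical flight path length
ENERGY_EV = 2000.0  # ion energy
ions = {"H2+": 2.0, "N2+": 28.0, "O2+": 32.0}  # approximate masses in amu

width = channel_width(200e-9, 8)
times = {name: flight_time(m, ENERGY_EV, PATH_M) for name, m in ions.items()}
rate_limit = max_start_rate(max(times.values()))

print(f"channel width: {width * 1e9:.2f} ns")
for name, t in times.items():
    print(f"{name}: {t * 1e6:.2f} microseconds")
print(f"maximum start rate: {rate_limit:.0f} per second")
```

The ratio of flight times for two ions of equal energy equals the square root of their mass ratio, which is what allows the time axis of the histogram to be read as a mass scale.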

An alternative to the TAC and PHA/ADC is the time-to-digital converter (TDC), a unit that combines the 
functions of the TAC and PHA/ADC. There are start and stop inputs and an output that provides a binary 
number directly proportional to the time difference between the stop and start signals. The TDC can be
directly connected to an MCA or PC with the appropriate digital interface.

B1.10.3.2 ELECTRONICS LIMITATIONS

By storing the binary outputs of a PHA/ADC or a TDC in the form of a histogram in the memory of a MCA, 
computer analyses can be performed on the data that take full account of the limitations of the individual 
components of the measuring system. For the preamplifiers and discriminators, time resolution is the principal 
consideration. For the TAC, resolution and processing time can be critical. The PHA/ADC or a TDC provide 
the link between the analogue circuits of the preamplifiers and discriminators and the digital input of the 
MCA or PC. The conversion from analogue-to-digital by a PHA/ADC or TDC is not perfectly linear and the 
deviations from linearity are expressed in terms of differential and integral nonlinearities. Differential 
nonlinearity is the variation of input signal amplitude over the width of a single time channel. Integral 
nonlinearity is the maximum deviation of the measured time channel number from a least squares straight line 
fit to a plot of signal amplitude as a function of channel number. 


B1.10.4 LIFETIME MEASUREMENTS 


B1.10.4.1 GENERAL CONSIDERATIONS


Lifetime measurements have elements in common with both counting and time-of-flight experiments [4, 5]. In
a lifetime experiment there is an initiating event that produces the system that subsequently decays with the
emission of radiation, particles or both. Decay is statistical in character; taking as an example nuclear decay,
at any time $t$, each nucleus in a sample of $N$ nuclei has the same probability of decay in the interval $dt$. Those
nuclei that remain then have the same probability of decaying in the next interval $dt$. The rate of decay is
given by $dN/dt = -kN$. The constant $k$, with units 1/time, is the decay constant; its reciprocal, $\tau = 1/k$, is
called the lifetime and depends on the system and the nature of the decay. Integration of the first-order
differential equation gives the exponential decay law, $N(t) = N_0 e^{-kt}$, where $N_0$ is the number of systems
(atoms, molecules, nuclei) initially created and $N(t)$ is the number that remain after a time $t$. The constant
$\tau = 1/k$ can be obtained by measuring the time for the sample to decay to $1/e$ of the initial size.

A more direct method for lifetime measurements is the delayed coincidence technique [6] in which the time
between an initiation event and the emission of a decay product is measured. A schematic diagram of an
apparatus used for the measurement of atomic lifetimes is shown in figure B1.10.5. The slope of the graph of
the natural log of the number of decay events as a function of time delay is $-1/\tau$ and so gives the lifetime
directly. The precision with which the slope can be determined increases with the number of measurements.
With $10^5$ separate time determinations, times to $10\tau$ can be sampled, providing a range sufficient for a
determination of $\tau$ to a few per cent. Enough time must be allowed between each individual measurement to
allow for long decay times. This requires that the experimental conditions be adjusted so that on average one
decay event is recorded for every 100 initiation events. The delayed coincidence method can routinely measure
lifetimes from a few nanoseconds to microseconds. The lower limit is set by the excitation source and the time
resolution of the detector and electronics. Lifetimes as short as 10 ps can be measured with picosecond pulsed
laser excitation, microchannel plate photomultipliers, a GHz preamplifier and a fast timing discriminator.
Instrument stability and available time set the upper limit for lifetime measurements.

Figure B1.10.5. Lifetime experiment. A pulser triggers an excitation source that produces excited species in
the sample. At the same time the pulser provides the start signal for a time-to-amplitude converter (TAC). 
Radiation from the decay of an excited species is detected with a photomultiplier at the output of a 
monochromator. The signal from the photomultiplier is amplified and sent to a discriminator. The output of 
the discriminator is the stop signal for the TAC. The TAC output pulse amplitude is converted to a binary 
number by a pulse height analyser (PHA) or analogue-to-digital converter (ADC). The binary number serves 
as an address for the multichannel analyser (MCA) that adds one to the number stored at the specified address. 
The pulser rate must accommodate decay times at least a factor of 10 longer than the lifetime of the species 
under study. The excitation source strength and sample density are adjusted to have at most one detected 
decay event per pulse. 


The delayed coincidence method has been applied to the fluorescent decay of laser excited states of biological 
molecules. By dispersing the emitted radiation from the decaying molecules with a polychromator and using 
an array of photodetectors for each wavelength region it is possible to measure fluorescent lifetimes as a 
function of the wavelength of the emitted radiation. The information is used to infer the conformation of the 
excited molecules [7]. 

B1.10.4.2 MULTIPLE HIT TIME-TO-DIGITAL CONVERSION 

Both lifetime and time-of-flight measurements have low duty cycles. In the case of the lifetime measurements 
sufficient time between initiation events must be allowed to accommodate the detection of long decays. 
Moreover, the signal rate has to be adjusted to allow for the detection of at most one decay event for every 
initiation event. In the case of time-of-flight measurements enough time must be allowed between 
measurements to allow for the slowest particle to traverse the flight path. As with the case of lifetime 
measurements, each initiation event must give rise to no more than one particle in the flight path. 
In order to circumvent the signal limitations of lifetime and time-of-flight measurements, multiple hit 
time-to-digital converters (TDCs) can be used. Typically, these instruments have from eight to sixteen channels and can be 
used in the 'common start' or 'common stop' mode. When used in the common start mode a single start signal 
arms all of the channels with successive 'stop' signals directed to each of the channel inputs in sequence. In 
this way the signal rate can be increased by a factor equal to the number of channels in the TDC. A counter 
and demultiplexer/data selector [8] perform the function of routing the stop signals to the different channel 
inputs in sequence. A schematic diagram of the arrangement is shown in figure B1.10.6. 

Figure B1.10.6. Multiple hit time-to-digital conversion scheme. Signals from the detector are routed to eight 
different TDC channels by a demultiplexer/data selector according to the digital output of the 3-bit counter 
that tracks the number of detector output signals following the initiation signal. The initiation signal is used as 
the common start. Not shown are the delays and signal conditioning components that are necessary to ensure 
the correct timing between the output of the counter and the arrival of the pulses on the common data line. 
Control logic to provide for the counter to reset after eight detector signals is also required. 
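
The routing performed by the counter and demultiplexer in common start mode can be sketched in a few lines of Python. This is an illustrative model only: the function name `route_stops` is hypothetical, and the choice to discard hits that arrive after all channels are filled is an assumption standing in for the reset logic mentioned in the caption.

```python
def route_stops(stop_times, n_channels=8):
    """Assign successive stop signals after one common start to TDC channels in sequence."""
    channels = [None] * n_channels
    counter = 0  # plays the role of the 3-bit counter when n_channels is 8
    for t in stop_times:
        if counter >= n_channels:
            break  # assumption: hits beyond the last channel are lost until the next start
        channels[counter] = t  # the demultiplexer sends this stop to the selected channel
        counter += 1
    return channels

# Three hypothetical stop times (ns) following a single start signal.
recorded = route_stops([5.0, 12.0, 40.0])
print(recorded)
```

Each start signal can therefore yield up to eight recorded times instead of one, which is the source of the increase in usable signal rate.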


B1.10.5 COINCIDENCE EXPERIMENTS 

Coincidence experiments explicitly require knowledge of the time correlation between two events. Consider 
the example of electron impact ionization of an atom, figure B1.10.7. A single incident electron strikes a 
target atom or molecule and ejects an electron from it. The incident electron is deflected by the collision and 
is identified as the scattered electron. Since the scattered and ejected electrons arise from the same event, there 
is a time correlation between their arrival times at the detectors. 

Figure B1.10.7. Electron impact ionization coincidence experiment. The experiment consists of a source of 
incident electrons, a target gas sample and two electron detectors, one for the scattered electron, the other for 
the ejected electron. The detectors are connected through preamplifiers to the inputs (start and stop) of a 
time-to-amplitude converter (TAC). The output of the TAC goes to a pulse height analyser (PHA) and then to a 
multichannel analyser (MCA) or computer. 


Coincidence experiments have been common in nuclear physics since the 1930s. The widely used coincidence 
circuit of Rossi [9] allowed experimenters to determine, within the resolution time of the electronics of 
the day, whether two events were coincident in time. The early circuits were capable of submicrosecond 
resolution, but lacked the flexibility of today's equipment. The most important distinction between modern 
coincidence methods and those of the earlier days is the availability of semiconductor memories that allow 
one to now record precisely the time relations between all particles detected in an experiment. We shall see 
the importance of this in the evaluation of the statistical uncertainty of the results. 

In a two detector coincidence experiment, of which figure B1.10.7 is an example, pulses from the two 
detectors are amplified and then sent to discriminators, the outputs of which are standard rectangular pulses of 
constant amplitude and duration. The outputs from the two discriminators are then sent to the start and stop 
inputs of a TAC or TDC. Even though a single event is responsible for ejected and scattered electrons, the two 
electrons will not arrive at the detectors at identical times because of differences in path lengths and electron 
velocities. Electronic propagation delays and cable delays also contribute to the start and stop signals not 
arriving at the inputs of the TAC or TDC at identical times; sometimes the start signal arrives first, sometimes 
the stop signal is first. If the signal to the start input arrives after the signal to the stop, that pair of events will 
not result in a TAC/TDC output. The result can be a 50% reduction in the number of recorded coincidences. 
To overcome this limitation a time delay is inserted in the stop line between the discriminator and the 
TAC/TDC. This delay can be a length of coaxial cable (typical delays are of the order of 1 ns/ft) or an 
electronic circuit. The purpose is to ensure that the stop signal always arrives at the stop input of the 
TAC/TDC after the start signal. A perfect coincidence will be recorded at a time difference approximately 
equal to the delay time. 
When a time window twice the duration of the delay time is used, perfect coincidence is at the centre of the 
time window and it is possible to make an accurate assessment of the background by considering the region to 
either side of the perfect coincidence region. An example of a time spectrum is shown in figure B1.10.8. 

Figure B1.10.8. Time spectrum from a double coincidence experiment. Through the use of a delay in the lines 
of one of the detectors, signals that occur at the same instant in both detectors are shifted to the middle of the 
time spectrum. Note the uniform background upon which the true coincidence signal is superimposed. In 
order to decrease the statistical uncertainty in the determination of the true coincidence rate, the background is 
sampled over a time $\Delta\tau_B$ that is much larger than the width of the true coincidence signal, $\Delta\tau$. 
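
The effect of the delay in the stop line can be illustrated with a short Python sketch. It assumes an idealized TAC/TDC that records a pair only when the (delayed) stop arrives after the start; the arrival times, the 50 ns delay and the function name `recorded_differences` are hypothetical values chosen for illustration.

```python
def recorded_differences(pairs, stop_delay=0.0):
    """Return delayed-stop minus start for the pairs an idealized TAC/TDC would record."""
    out = []
    for t_start, t_stop in pairs:
        dt = (t_stop + stop_delay) - t_start
        if dt > 0:  # a stop that precedes the start produces no output
            out.append(dt)
    return out

# Hypothetical (start, stop) arrival times in ns; in half of them the stop comes first.
pairs = [(0.0, 3.0), (10.0, 8.0), (20.0, 20.5), (30.0, 27.5)]
no_delay = recorded_differences(pairs)
with_delay = recorded_differences(pairs, stop_delay=50.0)
print(len(no_delay), len(with_delay))
```

With the delay in place every pair is recorded, and the time differences cluster around the delay time, so that the coincidence peak sits in the middle of a window twice the delay wide.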

B1.10.5.1 SIGNAL AND BACKGROUND 

Referring to figure B1.10.7, consider electrons from the event under study as well as from other events all 
arriving at the two detectors. The electrons from the event under study are correlated in time and result in a 
peak in the time spectrum centred approximately at the delay time. There is also a background level due to 
events that bear no fixed time relation to each other. If the average rates of the background events in the two 
detectors are $R_1$ and $R_2$, then the rate at which two such events will be recorded within time $\Delta\tau$ is given by $R_B$, where 

$$R_B = R_1 R_2\,\Delta\tau.$$

Let the rate of the event under study be $R_A$. It will be proportional to the cross section for the process under 
study, $\sigma_A$, the incident electron current, $I$, the target density, $n$, the length of the target viewed by the 
detectors, $\ell$, the solid angles subtended by the detectors, $\Delta\omega_1$ and $\Delta\omega_2$, and the efficiencies of the detectors, $\varepsilon_1$ and 
$\varepsilon_2$: 

$$R_A = \sigma_A I n \ell\,\Delta\omega_1\Delta\omega_2\,\varepsilon_1\varepsilon_2.$$

The product $I n \ell$ depends on the properties of the region from which the two electrons originate and is called 
the source function, $S$. The properties of the detectors are described by the product $\Delta\omega_1\Delta\omega_2\varepsilon_1\varepsilon_2$. 

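For the background, each of the rates, $R_1$ and $R_2$, will be proportional to the source function, the cross sections 
for single electron production and the properties of the individual detectors, 

$$R_1 = S\sigma_1\Delta\omega_1\varepsilon_1, \qquad R_2 = S\sigma_2\Delta\omega_2\varepsilon_2.$$

Combining the two expressions for $R_1$ and $R_2$, 

$$R_B = S^2\sigma_1\sigma_2\,\Delta\omega_1\Delta\omega_2\,\varepsilon_1\varepsilon_2\,\Delta\tau.$$

Comparing the expressions for the background rate and the signal rate one sees that the background increases 
as the square of the source function while the signal rate is proportional to the source function. The 
signal-to-background rate, $R_{AB}$, is then 

$$R_{AB} = \frac{R_A}{R_B} = \frac{\sigma_A}{S\sigma_1\sigma_2\,\Delta\tau}.$$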

It is important to note that the signal is always accompanied by background. We now consider the signal and 
background after accumulating counts over a time $T$. For this it is informative to refer to figure B1.10.8, the 
time spectrum. The total number of counts within an arrival time difference $\Delta\tau$ is $N_T$ and this number is the 
sum of the signal counts, $N_A = R_A T$, and the background counts, $N_B = R_B T$, 

$$N_T = N_A + N_B.$$

The determination of the background counts must come from an independent measurement, typically in a 
region of the time spectrum outside of the signal region, yet representative of the background within the signal 
region. The uncertainty in the determination of the signal counts is given by the square root of the sum of the squares of the 
uncertainties in the total counts and the background counts, 

$$\delta N_A = \sqrt{N_T + N_B} = \sqrt{N_A + 2N_B}.$$

The essential quantity is the relative uncertainty in the signal counts, $\delta N_A/N_A$. This is given by 

$$\frac{\delta N_A}{N_A} = \sqrt{\frac{1 + 2/R_{AB}}{R_A T}}.$$

Expressing $R_A$ in terms of $R_{AB}$ results in the following formula: 

$$\frac{\delta N_A}{N_A} = \frac{1}{\sigma_A}\sqrt{\frac{\sigma_1\sigma_2\,(R_{AB} + 2)\,\Delta\tau}{\Delta\omega_1\Delta\omega_2\,\varepsilon_1\varepsilon_2\,T}}.$$


There are a number of observations to be drawn from the above formula: the relative uncertainty can be 
reduced to an arbitrarily small value by increasing $T$, but because the relative uncertainty is proportional to 
$1/\sqrt{T}$, a reduction in relative uncertainty by a factor of two requires a factor of four increase in collection 
time. The relative uncertainty can also be reduced by reducing $\Delta\tau$. Here, it is understood that $\Delta\tau$ is the 
smallest time window that just includes all of the signal. $\Delta\tau$ can be decreased by using the fastest possible 
detectors, preamplifiers and discriminators and minimizing time dispersion in the section of the experiment 
ahead of the detectors. 

The signal and background rates are not independent, but are coupled through the source function, $S$. As a 
consequence, the relative uncertainty in the signal decreases as the signal-to-background rate, $R_{AB}$, decreases, a 
somewhat unanticipated result. Dividing $\delta N_A/N_A$ by its value at $R_{AB} = 1$ gives a reduced relative uncertainty, 
$[\delta N_A/N_A]_R$, equal to $\sqrt{(R_{AB} + 2)/3}$. A plot of $[\delta N_A/N_A]_R$ as a function of $R_{AB}$ is shown in figure B1.10.9. 





Figure B1.10.9. Plot of the reduced relative uncertainty of a double coincidence experiment as a function of 
the signal-to-background ratio. Note that the relative uncertainty decreases as the signal-to-background rate 
decreases. 

To illustrate this result, consider a case where there is one signal count and one background count in the time 
window $\Delta\tau$. The signal-to-background ratio is 1 and the relative uncertainty in the signal is $\sqrt{2 + 1}/1 = 1.7$. 
By increasing the source strength by a factor of 10 the signal will be increased by a factor of 10 and the 
background by a factor of 100. The signal-to-background ratio is now 0.1, but the relative uncertainty in the 
signal is $\sqrt{110 + 100}/10 = 1.45$, a clear improvement over the larger signal-to-background case. 
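
The arithmetic of this example can be checked with a short Python sketch. It assumes only what the text states: the uncertainty in the signal is the square root of the total counts plus the independently measured background counts, the signal scales linearly with source strength and the background scales quadratically. The function names are illustrative.

```python
import math

def relative_uncertainty(n_signal, n_background):
    """sqrt(N_T + N_B) / N_A, with N_T = N_A + N_B the total counts in the window."""
    total = n_signal + n_background
    return math.sqrt(total + n_background) / n_signal

def scale_source(n_signal, n_background, factor):
    """Signal grows linearly with source strength, background quadratically."""
    return n_signal * factor, n_background * factor ** 2

low = relative_uncertainty(1, 1)          # signal-to-background ratio of 1
n_a, n_b = scale_source(1, 1, 10)         # ten times stronger source
high = relative_uncertainty(n_a, n_b)     # signal-to-background ratio of 0.1
print(f"{low:.2f} {high:.2f}")
```

The second value is smaller than the first even though the signal-to-background ratio has fallen by a factor of ten, which is the point made above.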

Another method for reducing the relative uncertainty is to increase the precision of the measurement of 
background. The above formulae are based on an independent measurement of background over a time 
window, $\Delta\tau$, that is equivalent to the time window within which the signal appears. If a larger time window 
for the background is used, the uncertainty in background determination can be correspondingly reduced. Let 
a time window of width $\Delta\tau_B$ be used for the determination of background, where $\Delta\tau/\Delta\tau_B = \rho < 1$. If the rate 
at which counts are accumulated in time window $\Delta\tau_B$ is $R_{BB}$, the background counts to be subtracted from the 
total counts in time window $\Delta\tau$ become $\rho R_{BB}T = \rho N_{BB}$. The uncertainty in the number of background 
counts to be subtracted from the total of signal plus background is $\rho\sqrt{N_{BB}}$. The relative uncertainty of the 
signal is then 

$$\frac{\delta N_A}{N_A} = \frac{\sqrt{N_T + \rho^2 N_{BB}}}{N_A}.$$


The expression for the reduced relative uncertainty is then 

B1.10.5.2 DETECTOR ALIGNMENT AND SOURCE STRENGTH 


For coincidence experiments where the detectors record time correlated events, maximum efficiency and 
sensitivity are attained when the detectors accept particles originating from the same source volume. In other 
words, the length, $\ell$, that defines the common volume of the source region seen by the detectors should be the 
same as the lengths and corresponding volumes for each of the detectors separately. If the two volumes are 
different, only the common volume seen by the detectors is used in the calculation of coincidence rate. Events 
that take place outside the common volume, but within the volume accepted by one of the detectors will only 
contribute to the background. This is shown schematically in figure B1.10.10. The situation is potentially 
complex if the source region is not uniform over the common volume viewed by the detectors and if the 
efficiencies of the detectors vary over the volume they subtend. Full knowledge of the geometric 
acceptances of the detectors and the source volume is necessary to accurately evaluate the size and strength of 
the source. Such an analysis is typically done numerically using empirical values of the view angles of the 
detectors and the spatial variation of target density and incident beam intensity. 

Figure B1.10.10. Schematic diagram of the effect of detector view angles on coincidence rate. The view 
angles of two detectors are shown along with the common view angle. Maximum signal collection efficiency 
is achieved when the individual view angles have the maximum overlap and when the overlap coincides with 
the maximum density of the incident beam. 

B1.10.5.3 MULTIPLE PARAMETER MEASUREMENTS 

It is often the case that a time correlation measurement alone is not sufficient to identify a particular event of 
interest. For this reason, coincidence measurements are often accompanied by the measurement of another 
parameter such as energy, spin, polarization, wavelength, etc. For the case of the electron impact ionization 
experiment of figure B1.10.7, electrostatic energy analysers can be placed ahead of the detectors so that only 
electrons of a preselected energy arrive at the detectors. In this case the coincidence rate must take into 
account the energy bandpass of the analysers and the fact that in this example the energies of the scattered and 
ejected electrons are not independent, but are coupled through the relation that the energy of the incident 
electron equals the sum of the energies of the scattered and ejected electrons and the binding energy of the 
ejected electron 

$$E_0 = E_1 + E_2 + \mathrm{BE},$$

where $E_0$ is the energy of the incident electron, $E_1$ is the energy of the scattered electron, $E_2$ is the energy of 
the ejected electron and BE is the binding energy of the ejected electron. The coincidence rate now has an 
additional term, $\Delta E$, that is the overlap of the energy bandwidths of the two detectors, $\Delta E_1$ and $\Delta E_2$, subject to 
the energy conservation constraint. An often used approximation to $\Delta E$ is $\Delta E = \sqrt{\Delta E_1^2 + \Delta E_2^2}$. The 
background rates in the detectors depend only on the individual energy bandwidths, and are independent of 
any additional constraints. Maximum efficiency is achieved when the energy bandwidths are only just wide 
enough to accept the coincident events of interest. 
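
As a small worked illustration of the energy constraint and of the quadrature approximation for the overlap bandwidth, the following Python sketch uses hypothetical energies and bandwidths (in eV); neither the numbers nor the function names come from the text.

```python
import math

def ejected_energy(e0, e1, binding_energy):
    """Energy conservation E0 = E1 + E2 + BE, solved for the ejected-electron energy E2."""
    return e0 - e1 - binding_energy

def coincidence_bandwidth(de1, de2):
    """Quadrature approximation to the overlap bandwidth of the two analysers."""
    return math.sqrt(de1 ** 2 + de2 ** 2)

# Hypothetical settings: 500 eV incident, 400 eV scattered, 24.6 eV binding energy.
e2 = ejected_energy(500.0, 400.0, 24.6)
# Hypothetical analyser bandwidths of 3 eV and 4 eV.
de = coincidence_bandwidth(3.0, 4.0)
print(round(e2, 3), de)
```

Setting the second analyser to pass energies near the computed value of the ejected-electron energy selects the events that satisfy the constraint, while the background in each detector still scales with its own bandwidth.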


B1.10.5.4 MULTIPLE DETECTORS 


The arrangement for a single coincidence measurement can be expanded to a multiple detector arrangement. 
If, for example, it is necessary to measure coincidence rates over an angular range, detectors can be placed at 
the angles of interest and coincidence rates measured between all pairs of detectors. This is a much more efficient way of 
collecting data than having two detectors that are moved to different angles after each coincidence 
determination. Depending on the signal and background rates, detector outputs can be multiplexed and a 
detector identification circuit used to identify those detectors responsible for each coincidence signal. If 
detectors are multiplexed, it is well to remember that the overall count rate is the sum of the rates for all of the 
detectors. This can be an important consideration when rates become comparable to the reciprocal of system 
dead time. 
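
A brief Python sketch makes the bookkeeping concrete: it enumerates the detector pairs available from a fixed array and evaluates the product of the summed multiplexed rate and the dead time. The angles, rates, dead time and function names are hypothetical.

```python
from itertools import combinations

def detector_pairs(angles_deg):
    """All unordered detector pairs between which coincidence rates can be measured."""
    return list(combinations(angles_deg, 2))

def dead_time_load(rates_hz, dead_time_s):
    """Summed multiplexed count rate multiplied by the dead time (dimensionless)."""
    return sum(rates_hz) * dead_time_s

angles = [30, 45, 60, 75]                      # hypothetical detector angles in degrees
pairs = detector_pairs(angles)                 # n(n-1)/2 pairs for n detectors
load = dead_time_load([2.0e3] * 4, 1.0e-5)     # hypothetical rates (Hz) and dead time (s)
print(len(pairs), load)
```

Four fixed detectors give six simultaneous pair measurements, but the multiplexed electronics must handle the sum of all four rates; a load that approaches unity signals significant dead-time losses.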

B1.10.5.5 PREPROCESSING 

It is often the case that signal rates are limited by the processing electronics. Consider a coincidence time 
spectrum of 100 channels covering 100 ns with a coincidence time window of 10 ns. Assume a signal rate of 1 
Hz and a background rate of 1 Hz within the 10 ns window. This background rate implies an uncorrelated event 
rate of 10 kHz in each of the detectors. To register 99% of the incoming events, the dead time of the system 
can be no larger than 10 μs. The dead time limitation can be substantially reduced with a preprocessing circuit 
that only accepts events falling within 100 ns of each other (the width of the time spectrum). One way to 
accomplish this is with a circuit that incorporates delays and gates to only pass signals that fall within a 100 ns 
time window. With this circuit the number of background events that need to be processed each second is 
reduced from 10,000 to 10, and dead times as long as 10 ms can be accommodated while maintaining a 
collection efficiency of 99%. In a way this is an extension of the multiparameter processing in which the 
parameter is the time difference between processed events. A preprocessing circuit is discussed by 
Goruganthu et al [10]. 
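
The rates quoted in this example follow from the accidental coincidence relation used earlier, in which the background rate is the product of the two singles rates and the time window. The Python sketch below reproduces the 10 kHz singles rate and the figure of 10 events per second within the 100 ns spectrum, assuming equal uncorrelated rates in the two detectors; the function names are illustrative.

```python
import math

def singles_rate(background_rate_hz, window_s):
    """Equal uncorrelated rate R in each detector for which R * R * window equals the background rate."""
    return math.sqrt(background_rate_hz / window_s)

def accidental_pairs(rate_hz, window_s):
    """Rate of uncorrelated pairs that fall within the given time window."""
    return rate_hz * rate_hz * window_s

r = singles_rate(1.0, 10e-9)           # 1 Hz of background in a 10 ns window
pairs = accidental_pairs(r, 100e-9)    # pairs passed by a 100 ns preprocessing gate
print(round(r), round(pairs))
```

The preprocessing gate thus reduces the number of events the slower electronics must handle by three orders of magnitude without discarding any event that could appear in the time spectrum.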

B1.10.5.6 TRIPLE COINCIDENCE MEASUREMENTS 

A logical extension of the coincidence measurements described above is the triple coincidence measurement 
shown schematically in figure B1.10.11. Taking as an example electron impact double ionization, the two 
ejected electrons and the scattered electron are correlated in time. On a three-dimensional graph with the vertical 
axis representing the number of detected events and the horizontal axes representing the time differences 
between events at detectors 1 and 2 and 1 and 3, a time correlated signal is represented as a three-dimensional 
peak with a fixed base width in the horizontal plane. A triple coincidence time spectrum is shown in figure 
B1.10.12. Unlike the situation for double coincidence measurements, the background has four sources: random 
rates in each of the three detectors, correlated events in detectors 1 and 2 with an uncorrelated signal in 3, 
correlated events in detectors 1 and 3 with an uncorrelated signal in 2 and correlated events in detectors 2 and 
3 with an uncorrelated signal in detector 1. The first source gives a background that is uniform over the full 
horizontal plane. The three other sources produce two walls that are parallel to the two time axes and a third 
wall that lies at 45° to the time axes. These can be seen in figure B1.10.12. 

Figure B1.10.11. Electron impact double ionization triple coincidence experiment. Shown are the source of 
electrons, target gas, three electron detectors, one for the scattered electron and one for each of the ejected 
electrons. Two time differences, $t_{12}$ and $t_{13}$, are recorded for each triple coincidence: $t_{12}$ is the difference in 
arrival times of ejected electron 1 and the scattered electron; $t_{13}$ is the difference in arrival times of ejected 
electron 2 and the scattered electron. Two sets of time-to-amplitude converters (TACs) and pulse height 
analysers/analogue-to-digital converters (PHA/ADC) convert the times to binary encoded numbers that are 
stored in the memory of a computer. The data can be displayed in the form of a two-dimensional histogram 
(see figure B1.10.12). 

Figure B1.10.12. Schematic diagram of a two-dimensional histogram resulting from the triple coincidence 
experiment shown in figure B1.10.11. True triple coincidences are superimposed on a uniform background 
and three walls corresponding to two electron correlated events with a randomly occurring third electron. 
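
For any region of the horizontal plane with dimensions $\Delta\tau_{12}\,\Delta\tau_{13}$, the uniform background rate is 

$$R_{123} = R_1 R_2 R_3\,\Delta\tau_{12}\Delta\tau_{13} = [S\sigma_1\Delta\omega_1\varepsilon_1][S\sigma_2\Delta\omega_2\varepsilon_2][S\sigma_3\Delta\omega_3\varepsilon_3]\,\Delta\tau_{12}\Delta\tau_{13},$$

where $R_1$, $R_2$ and $R_3$ are the random background rates in detectors 1, 2 and 3. The background due to two time 
correlated events and a single random event is $R_{12} + R_{13} + R_{23}$, where 

$$R_{12} = [S\sigma_{12}\Delta\omega_1\Delta\omega_2\varepsilon_1\varepsilon_2][S\sigma_3\Delta\omega_3\varepsilon_3]\,\Delta\tau_{13},$$

$$R_{13} = [S\sigma_{13}\Delta\omega_1\Delta\omega_3\varepsilon_1\varepsilon_3][S\sigma_2\Delta\omega_2\varepsilon_2]\,\Delta\tau_{12},$$

$$R_{23} = [S\sigma_{23}\Delta\omega_2\Delta\omega_3\varepsilon_2\varepsilon_3][S\sigma_1\Delta\omega_1\varepsilon_1]\,\Delta\tau_{13}.$$

The signal rate, $R_A$, is 

$$R_A = S\sigma_A\,\Delta\omega_1\Delta\omega_2\Delta\omega_3\,\varepsilon_1\varepsilon_2\varepsilon_3,$$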


where the symbols have the same meaning as in the treatment of double coincidences. If the signal falls within 
a two-dimensional time window $\Delta\tau_{12}\Delta\tau_{13}$, then the signal-to-background rate ratio is 
$R_A/[R_{123} + R_{12} + R_{13} + R_{23}] = \chi$ and the statistical uncertainty in the number of signal counts accumulated in time $T$ is 

$$\frac{\delta N_A}{N_A} = \sqrt{\frac{1 + 2/\chi}{K_A S T}},$$

where $K_A = \sigma_A\Delta\omega_1\Delta\omega_2\Delta\omega_3\varepsilon_1\varepsilon_2\varepsilon_3$. In contrast to simpler double coincidence experiments, $S$, the source 
function, is not directly proportional to $1/\chi$. The full expression for $\delta N_A/N_A$ is 

$$\frac{\delta N_A}{N_A} = \sqrt{\frac{R_A + 2\,(R_{123} + R_{12} + R_{13} + R_{23})}{R_A^2\,T}}.$$

As is the case for the double coincidence arrangement, $\delta N_A/N_A$ is inversely proportional to the square root of 
the acquisition time; however, $\delta N_A/N_A$ does not approach a minimum value in the limit of $\chi = 0$, but rather 
has a minimum in the region between $\chi = 0$ and $\chi = \infty$, the precise value of $\chi$ depending on the experimental 
conditions and the magnitudes of the cross sections for the different electron producing events. A detailed 
treatment of this topic with examples is given by Dupre et al [11]. 
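
The existence of an optimum source strength can be seen in a simple numerical sketch. The Python fragment below assumes, as in the rate expressions above, a signal rate linear in $S$, a uniform background cubic in $S$ and wall backgrounds quadratic in $S$; all detector, cross-section and time-window factors are lumped into three hypothetical coefficients (`k_signal`, `a_cubic`, `b_quadratic`) whose values are chosen only for illustration.

```python
import math

def relative_uncertainty(s, k_signal, a_cubic, b_quadratic, t_collect):
    """sqrt((R_A + 2 R_bkg) / (R_A^2 T)) with R_A = k S and R_bkg = a S^3 + b S^2."""
    r_signal = k_signal * s
    r_background = a_cubic * s ** 3 + b_quadratic * s ** 2
    return math.sqrt((r_signal + 2.0 * r_background) / (r_signal ** 2 * t_collect))

def best_source_strength(k_signal, a_cubic):
    """Minimum of 1/(k S) + 2 a S / k^2 + 2 b / k^2 with respect to S: S = sqrt(k / (2 a))."""
    return math.sqrt(k_signal / (2.0 * a_cubic))

# Hypothetical lumped coefficients and collection time (arbitrary consistent units).
k, a, b, t = 1.0e-3, 1.0e-9, 1.0e-6, 3600.0
s_opt = best_source_strength(k, a)
u_opt = relative_uncertainty(s_opt, k, a, b, t)
u_low = relative_uncertainty(0.5 * s_opt, k, a, b, t)
u_high = relative_uncertainty(2.0 * s_opt, k, a, b, t)
print(s_opt, u_opt < u_low, u_opt < u_high)
```

Under these assumptions the quadratic (wall) terms only add a constant to the squared relative uncertainty, so the position of the minimum is set by the balance between the signal term and the cubic uniform background.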


B1.10.6 ANTI-COINCIDENCE 

In high-energy physics experiments there can be many interfering events superimposed on the events of 
interest. An example is the detection of gamma rays in the presence of high-energy electrons and protons. The 
electrons, protons and gamma rays all produce very similar signals in the solid state detectors that are used, 
and it is not possible to distinguish the gamma rays from the charged particles. A technique that is frequently 
used is to surround the gamma ray detectors with a plastic scintillation shield that produces a flash of light 
when traversed by a charged particle, but is transparent to the gamma rays. Light from the shield is coupled to 
photomultiplier detectors via light pipes. Signals that occur simultaneously in the photomultipliers and solid 
state detector are due to high-energy charged particles entering the instrument and are excluded from any 
analysis. An example of such an anti-coincidence circuit can be found in the Energetic Gamma Ray Experiment 
Telescope (EGRET) on the Gamma Ray Observatory (GRO) spacecraft that was launched in 1991 and 
continues to provide information on the energies and sources of gamma rays in the 20 MeV to 30 GeV energy 
range. Figure B1.10.13 is a schematic diagram of the EGRET experiment showing the anticoincidence shield. 

Figure B1.10.13. EGRET experiment showing spark chambers and photomultiplier tubes for the detection of 
gamma rays. The gamma ray detectors are surrounded by an anticoincidence scintillator dome that produces 
radiation when traversed by a high-energy charged particle but is transparent to gamma rays. Light pipes 
transmit radiation from the dome to photomultiplier tubes. A signal in an anticoincidence photomultiplier tube 
causes any corresponding signal in the gamma ray detectors to be ignored. The NaI crystal detector and 
photomultiplier tubes at the bottom of the unit provide high-resolution energy analysis of gamma rays passing 
through the spark chambers. 
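
The veto logic of an anti-coincidence arrangement can be sketched in Python as a filter on event times. This is an illustrative model, not the EGRET electronics: the function name `veto`, the symmetric coincidence window and the example times are assumptions.

```python
from bisect import bisect_left

def veto(detector_times, shield_times, window):
    """Keep detector events that have no shield signal within +/- window of their time."""
    shield = sorted(shield_times)
    kept = []
    for t in detector_times:
        # Index of the first shield signal at or after the start of the veto window.
        i = bisect_left(shield, t - window)
        vetoed = i < len(shield) and shield[i] <= t + window
        if not vetoed:
            kept.append(t)
    return kept

# Hypothetical event times in microseconds; the event at 5.0 coincides with a shield hit.
accepted = veto([1.0, 5.0, 9.0], [5.02, 20.0], window=0.1)
print(accepted)
```

Events accompanied by a shield signal are attributed to charged particles and dropped; the remainder are treated as gamma ray candidates.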

B1.11 NMR of liquids 

Oliver W Howarth 


B1.11.1 INTRODUCTION 

Nuclear magnetic resonance (NMR) was discovered by Bloch, Purcell and Pound in 1945, as a development 
of the use of nuclear spins in low-temperature physics. Its initial use was for the accurate measurement of 
nuclear magnetic moments. However, increases in instrumental precision led to the detection of chemical 
shifts (B1.11.5) and then of spin-spin couplings (B1.11.6). This stimulated use by chemists. There have been 
spectacular improvements in sensitivity, resolution and computer control since then, so that NMR equipment 
is now essential in any laboratory for synthetic chemistry. Within moments or hours, it can determine the 
structure and, if desired, conformation of most medium-sized molecules in the solution phase. For this reason, 
a large pharmaceutical company will typically generate several hundred NMR spectra in one working day. 
NMR is also widely used in biochemistry for the much more challenging problem of determining the 
structures of smaller proteins and other biomolecules. The rates and extents of molecular motions can also be 
measured, through measuring the rates of energy transfer to, from and between nuclei (relaxation, B1.13). 
Outside of chemistry, it is used in a different mode for medical and other imaging, and for the detection of 
flow and diffusion in liquids (B1.14). It can also be used for clinical and in vivo studies, as the energies 
involved present no physiological dangers. 


B1.11.2 NUCLEAR SPINS 

NMR depends on manipulating the collective motions of nuclear spins, held in a magnetic field. As with 
every rotatable body in nature, every nucleus has a spin quantum number $I$. If $I = 0$ (e.g. $^{12}$C, $^{16}$O) then the 
nucleus is magnetically inactive, and hence 'invisible'. If, as with most nuclei, $I > 0$, then the nucleus must 
possess a magnetic moment, because of its charge in combination with its angular momentum $\hbar\sqrt{I(I+1)}$. This 
makes it detectable by NMR. The most easily detected nuclei are those with $I = 1/2$ and with large magnetic 
moments, e.g. $^{1}$H, $^{13}$C, $^{19}$F, $^{31}$P. These have two allowed states in a magnetic (induction) field $B_0$: $m_{1/2}$ and 
$m_{-1/2}$, with angular momentum components $\pm\hbar/2$. Thus these nuclear magnets lie at angles 
$\pm\cos^{-1}[(1/2)/(3/4)^{1/2}] = 54.7^\circ$ to $B_0$. They also precess rapidly, like all gyroscopes, at a rate $\nu_L$. This is called the Larmor frequency, 
after its discoverer. The $2I + 1$ permitted angles for other values of $I$ differ from the above angles, but they can 
never be zero or 180°, because of the uncertainty principle. 

The energy difference $\Delta E$ corresponding to the permitted transitions, $\Delta m = \pm 1$, is given by 

$$\Delta E = \gamma h B_0/2\pi = h\nu_L.$$

Therefore, in NMR, one observes collective nuclear spin motions at the Larmor frequency. Thus the 
frequency of NMR detection is proportional to $B_0$. Nuclear magnetic moments are commonly measured either 
by their magnetogyric ratio $\gamma$, or simply by their Larmor frequency $\nu_L$ in a field where $^{1}$H resonates at 100 
MHz (symbol $\Xi$). 
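
As a numerical illustration of these relations, the Python sketch below evaluates the Larmor frequency and the angle between a spin-1/2 magnetic moment and the field. The magnetogyric ratios are approximate standard tabulated values quoted here as assumptions rather than taken from the text, and the 14.1 T field is chosen only as an example.

```python
import math

# Approximate magnetogyric ratios in rad s^-1 T^-1 (standard tabulated values,
# quoted here as assumptions rather than taken from the text).
GAMMA = {"1H": 2.6752e8, "13C": 6.7283e7}

def larmor_frequency_hz(gamma, b0_tesla):
    """nu_L = gamma * B0 / (2 pi), from Delta E = gamma h B0 / (2 pi) = h nu_L."""
    return gamma * b0_tesla / (2.0 * math.pi)

def spin_angle_deg(spin, m):
    """Angle between the spin vector and B0: cos(angle) = m / sqrt(I (I + 1))."""
    return math.degrees(math.acos(m / math.sqrt(spin * (spin + 1.0))))

nu_h = larmor_frequency_hz(GAMMA["1H"], 14.1)
nu_c = larmor_frequency_hz(GAMMA["13C"], 14.1)
angle = spin_angle_deg(0.5, 0.5)
print(f"{nu_h / 1e6:.1f} MHz, {nu_c / 1e6:.1f} MHz, {angle:.1f} degrees")
```

Because the frequency is proportional to the field, the ratio of the two computed frequencies is independent of the field, which is why a frequency quoted at a reference field serves as a measure of the magnetic moment.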


B1.11.2.1 PRACTICABLE ISOTOPES 

Almost all the stable elements have at least one isotope that can be observed by NMR. However, in some 
cases the available sensitivity may be inadequate, especially if the compounds are not very soluble. This may 
be because the relevant isotope has a low natural abundance, e.g. $^{17}$O, or because its resonances are extremely 
broad, e.g. $^{33}$S. Tables of such nuclear properties are readily available [1], and isotopic enrichment may be 
available as an expensive possibility. Broad resonances are common for nuclei with $I > 1/2$, because these 
nuclei are necessarily non-spherical, and thus have electric quadrupole moments that interact strongly with the 
electric field gradients present at the nuclei in most molecules. Line widths may also be greatly increased by 
chemical exchange processes at appropriate rates (B2.7), by the presence of significant concentrations of 
paramagnetic species and by very slow molecular tumbling, as in polymers and colloids. For nuclei with 
$I = 1/2$, a major underlying cause of such broadening is the magnetic dipolar interaction of the nucleus under 
study with nearby spins. This and other interactions also lead to very large linewidths in the NMR spectra of 
solids and of near-solid samples, such as gels, pastes or the bead-attached molecules used in combinatorial 
chemistry. However, their effect may be reduced by specialized techniques (B1.13). 

Fortunately, the worst broadening interactions are also removed naturally in most liquids and solutions, or at 
least greatly reduced in their effect, by the tumbling motions of the molecules, for many of the broadening 
interactions vary as $(3\cos^2\theta - 1)$, where $\theta$ is the angle between the H-H vector and $B_0$, and so they average to 
zero when $\theta$ covers the sphere isotropically. As a result, the NMR linewidths for the lighter spin-1/2 nuclei in 
smallish molecules will commonly be less than 1 Hz. Resolution of this order is necessary for the adequate 
resolution of the shifts and couplings described below. It requires expensive magnets and physically 
homogeneous samples. The presence of irregular interfaces degrades resolution by locally distorting the 
magnetic field, although the resulting spectra may still have enough resolution for some purposes, such as in 
vivo NMR. 
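
The vanishing isotropic average of the angular factor can be verified numerically. The Python sketch below integrates $(3\cos^2\theta - 1)$ over the sphere with the usual weight $\sin\theta/2$ using a midpoint rule; the function name and the number of steps are arbitrary choices made for illustration.

```python
import math

def isotropic_average(n_steps=10000):
    """Midpoint-rule average of 3 cos^2(theta) - 1 over the sphere (weight sin(theta) / 2)."""
    d = math.pi / n_steps
    total = 0.0
    for i in range(n_steps):
        theta = (i + 0.5) * d
        total += (3.0 * math.cos(theta) ** 2 - 1.0) * 0.5 * math.sin(theta) * d
    return total

average = isotropic_average()
print(abs(average) < 1e-6)
```

The result is zero to within the integration error, whereas a restricted range of angles, as in a partially oriented sample, leaves a non-zero residual interaction.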

It is occasionally desirable to retain a small proportion of molecular orientation, in order to quantitate the 
dipolar interactions present, whilst minimizing their contribution to the linewidth. Partial orientation may be 
achieved by using a nematic solvent. In large, magnetically anisotropic molecules it may occur naturally at the 
highest magnetic fields. 

Figure B1.11.1 shows the range of radiofrequencies where resonances may be expected, between 650 and 140 
MHz, when $B_0$ = 14.1 T, i.e. when the $^{1}$H resonance frequency is 600 MHz. There is one bar per stable 
isotope. Its width is the reported chemical shift range (B1.11.5) for that isotope, and its height corresponds to 
the log of the sensitivity at the natural abundance of the isotope, covering about six orders of magnitude. The 
radioactive nucleus $^{3}$H is also included, as it is detectable at safe concentrations and useful for chemical 
labelling. It is evident that very few ranges overlap. This, along with differences in linewidth, means that a 
spectrometer set to detect one nucleus is highly unlikely to detect any other in the same experiment. 





Figure B1.11.1. Resonance frequencies for different nuclei in a field of 14.1 T. Widths indicate the quoted 
range of shifts for each nucleus, and heights indicate relative sensitivities at the natural isotopic abundance, on 
a log scale covering approximately six orders of magnitude. Nuclei resonating below 140 MHz are not shown. 

B1.11.2.2 PRACTICABLE SAMPLES 

Once the above restrictions on isotope, solubility, chemical lability and paramagnetism are met, then a very 
wide range of samples can be investigated. Gases can be studied, especially at higher pressures. Solutions for
1H or 13C NMR are normally made in deuteriated solvents. This minimizes interference from the solvent
resonances and also permits the field to be locked to the frequency as outlined below. However, it is possible 
to operate with an unavoidably non-deuteriated solvent, such as water in a biomedical sample, by using a 
range of solvent-suppression techniques. An external field-frequency lock may be necessary in these cases. 

NMR spectra from a chosen nucleus will generally show all resonances from all such nuclei present in 
solution. Therefore, if at all possible, samples should be chemically pure, to reduce crowding in the spectra, 
and extraneous compounds such as buffers should ideally not contain the nuclei under study. When 
chromatographic separations are frequently and routinely essential, then on-line equipment that combines 
liquid chromatography with NMR is available [2]. Some separation into subspectra is also possible within a 
fixed sample, using specialized equipment, when the components have different diffusion rates. The technique 
is called diffusion ordered spectroscopy, or DOSY [3]. If the spectral crowding arises simply from the 
complexity of the molecule under study, as in proteins, then one can resort to selective isotopic labelling: for 
example, via gene manipulation. A wide range of experiments is available that are selective for chosen pairs 
of isotopes, and hence yield greatly simplified spectra. 


The available sensitivity depends strongly on the equipment as well as the sample. 1H is the nucleus of choice
for most experiments. 1 mg of a sample of a medium-sized molecule is adequate for almost all types of 1H-only
spectra, and with specialized equipment one can work with nanogram quantities. At this lower level, the
problem is not so much sensitivity as purity of sample and solvent. 13C NMR at the natural isotopic
abundance of 1.1% typically requires 30 times this quantity of material, particularly if non-protonated carbon
atoms are to be studied. Most other nuclei necessitate larger amounts of sample, although 31P [4] and 19F are
useful exceptions. In vivo spectroscopy generally calls for custom-built probes suited for the organ or 
organism under study. Human in vivo spectra usually require a magnet having an especially wide bore, along 
with a moveable detection coil. They can reveal abnormal metabolism and internal tissue damage. 


NMR can be carried out over a wide range of temperatures, although there is a time and often a resolution 
penalty in using temperatures other than ambient. An effective lower limit of -150 °C is set by the lack of
solvents that are liquid below this. Temperatures above 130 °C require special thermal protection devices,
although measurements have even been made on molten silicates. 


B1.11.3 THE NMR EXPERIMENT 

B1.11.3.1 EQUIPMENT AND RESONANCE 

Figure B1.11.2 represents the essential components of a modern high-resolution NMR spectrometer, suitable
for studies of dissolved samples. The magnet has a superconducting coil in a bath of liquid He, jacketed by
liquid N2. The resulting, persistent field B0 ranges from 5.9 to 21.1 T, corresponding to 1H NMR at 250 to
900 MHz. This field can be adjusted to a homogeneity of better than 1 ppb over the volume of a typical
sample. The sample is commonly introduced as 0.5 ml of solution held in a 5 mm OD precision glass tube.
Microcells are available for very small samples, and special tubes are available for pressurized or
flow-through samples, for example. The sample can be spun slowly in order to average the field
inhomogeneity normal to the tube's axis, but this is not always necessary, for the reduction in linewidth may
only be a fraction of a hertz. The field aligns the spins as described above, and a Boltzmann equilibrium then
develops between the various m_I Zeeman states. A typical population difference between the two 1H levels is
1 part in 10^5. Thus 10^-5 of the 1H spins in the sample are not paired. These are collectively called the bulk
nuclear magnetization, and in many ways they behave in combination like a classical, magnetized gyroscope.
The time required for the establishment of the Boltzmann equilibrium is approximately five times the
spin-lattice relaxation time, T1 (B1.14). Because the population difference between the two 1H levels is
proportional to the field, the sensitivity of NMR also rises with the field.

Figure B1.11.2. Simplified representation of an NMR spectrometer with pulsed RF and superconducting
magnet. The main magnetic field B0 is vertical and centred on the sample.
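
The small population difference between the two 1H levels, and its proportionality to the field, can be checked with a short calculation. The sketch below assumes an isolated two-level (spin-1/2) system at thermal equilibrium and a sample temperature of 298 K; both the temperature and the function name are assumptions made for illustration.

```python
import math

H_PLANCK = 6.62607015e-34    # Planck constant, J s
K_BOLTZMANN = 1.380649e-23   # Boltzmann constant, J / K


def population_excess(freq_hz: float, temp_k: float = 298.0) -> float:
    """Fractional excess (N_lower - N_upper) / (N_lower + N_upper).

    For a two-level system with energy gap h * nu, the Boltzmann ratio
    N_upper / N_lower = exp(-h nu / k T) gives an excess of tanh(h nu / 2 k T).
    """
    return math.tanh(H_PLANCK * freq_hz / (2.0 * K_BOLTZMANN * temp_k))


if __name__ == "__main__":
    for mhz in (250.0, 600.0, 900.0):
        print(f"{mhz:5.0f} MHz: excess = {population_excess(mhz * 1e6):.2e}")
```

Because the argument of the tanh is so small, the excess is linear in frequency to a very good approximation, which is the sense in which sensitivity rises with the field.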


The bulk magnetization is stimulated into precessional motion around B0 by a radiofrequency (RF) pulse at
ν_L, applied through a solenoid-like coil whose axis is perpendicular to B0. This motion amounts to a nuclear
magnetic resonance. Typically, 10^-5 s of RF power tilts the magnetization through 90° and thus constitutes a
90° pulse. The coil is then computer-switched from being a transmitter to becoming a receiver, with the free
precessional motion of the magnetization generating a small RF voltage in the same coil, at or near the
frequency ν_L. The decay time of this oscillating voltage is of the order of T2, unless speeded by contributions
from exchange or field inhomogeneity. After appropriate amplification and heterodyne detection it is reduced
to an audiofrequency and digitized. These raw data are commonly called the free induction decay, or FID,
although the term applies more properly to the original bulk precession.

One further consequence of the use of a pulse, as against the older method of applying a single RF frequency, 
is that its underlying frequency is broadened over a range of the order of the reciprocal of the pulse length. 
Indeed, the RF power is in many cases virtually uniform or 'white' across a sufficient range to stimulate all 
the nuclei of a given isotope at once, even though they are spread out in frequency by differences of chemical 
shift. Thus, the repeat time for the NMR measurement is approximately T1, typically a few seconds, rather
than the considerably longer time necessary for sweeping the frequency or the field. Most FIDs are built up by 
gathering data after repeated pulses, each usually less than 90° in order to permit more rapid repetition. This 
helps to average away noise and other imperfections, relative to the signal, and it also permits more elaborate 
experiments involving sequences of multiple pulses and variable interpulse delays. 
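
The claim that a short pulse excites a whole spectrum at once can be made concrete with two one-line estimates. The sketch below takes the excitation range simply as the reciprocal of the pulse length, as stated above, and compares it with the spread of resonances in hertz. The 10 microsecond pulse, the 10 ppm spread and the 600 MHz spectrometer are illustrative assumptions.

```python
def excitation_range_hz(pulse_length_s: float) -> float:
    """Order-of-magnitude frequency spread of a rectangular RF pulse."""
    return 1.0 / pulse_length_s


def spectrum_width_hz(shift_range_ppm: float, spectrometer_mhz: float) -> float:
    """Spread of resonances in Hz: 1 ppm of an X MHz frequency is X Hz."""
    return shift_range_ppm * spectrometer_mhz


if __name__ == "__main__":
    pulse = excitation_range_hz(10e-6)      # 10 microsecond pulse
    width = spectrum_width_hz(10.0, 600.0)  # 10 ppm at 600 MHz
    print(f"pulse covers ~{pulse:.0f} Hz, spectrum spans {width:.0f} Hz")
```

On these assumptions the pulse covers a range more than ten times wider than the spectrum, so the excitation is close to uniform across it.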

However, it also necessitates a strictly constant ratio of field to frequency, over the duration of the 
experiment. Although the master frequency source can be held very constant by a thermostatted source, the 
field is always vulnerable to local movements of metal, and to any non-persistence of the magnet current. 
Therefore the field is locked to the frequency through a feedback loop that uses continuous, background
monitoring of the 2H solvent resonance. The probe containing the sample and coil will also normally have at
least one further frequency channel, for decoupling experiments (B1.11.6). The lock signal is also
simultaneously employed to maximize the field homogeneity across the sample, either manually or 
automatically, via low-current field correction 'shim' coils. A feedback loop maximizes the height of the lock 
signal and, because the peak area must nevertheless remain constant, thereby minimizes the peak's width. 


The digitized FID can now be handled by standard computer technology. Several 'spectrum massage' 
techniques are available for reducing imperfections and noise, and for improving resolution somewhat. The 
FID is then converted into a spectrum by a discrete Fourier transformation. Essentially, digital sine and cosine 
waves are generated for each frequency of interest, and the FID is multiplied by each of these in turn. If the 
product of one particular multiplication does not average to zero across the FID, then that frequency, with that 
phase, is also present in the FID. The resulting plot of intensity versus frequency is the spectrum. It is 
normally 'phased' by appropriate combination of the sine and cosine components, so as to contain only 
positive-going peaks. This permits the measurement of peak areas by digital integration, as well as giving the 
clearest separation of peaks. 
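
The multiplication by sine and cosine waves described above can be sketched in a few lines. The example below builds a toy FID from two decaying cosines and correlates it with trial frequencies one at a time. All numerical values (frequencies, amplitudes, decay time, sampling interval) are invented for illustration, and the magnitude is plotted rather than a phased spectrum, so this is a simplified stand-in for real spectrometer software.

```python
import math


def synthetic_fid(freqs_hz, amplitudes, t2_s, dwell_s, n_points):
    """Sum of decaying cosines, sampled every dwell_s seconds (a toy FID)."""
    fid = []
    for k in range(n_points):
        t = k * dwell_s
        signal = sum(a * math.cos(2.0 * math.pi * f * t)
                     for f, a in zip(freqs_hz, amplitudes))
        fid.append(signal * math.exp(-t / t2_s))
    return fid


def correlate(fid, trial_hz, dwell_s):
    """Average of FID x cosine and FID x sine at one trial frequency."""
    n = len(fid)
    cos_part = sum(v * math.cos(2.0 * math.pi * trial_hz * k * dwell_s)
                   for k, v in enumerate(fid)) / n
    sin_part = sum(v * math.sin(2.0 * math.pi * trial_hz * k * dwell_s)
                   for k, v in enumerate(fid)) / n
    return cos_part, sin_part


DWELL_S = 0.002  # sampling interval, i.e. a 500 Hz sampling rate
fid = synthetic_fid([50.0, 120.0], [1.0, 0.5], t2_s=0.5,
                    dwell_s=DWELL_S, n_points=512)
trial_freqs = [5.0 * i for i in range(50)]  # 0 to 245 Hz in 5 Hz steps
# Magnitude of the (cosine, sine) pair at each trial frequency.
spectrum = [math.hypot(*correlate(fid, f, DWELL_S)) for f in trial_freqs]
strongest = trial_freqs[spectrum.index(max(spectrum))]
```

Only the trial frequencies that are actually present in the FID give products that fail to average to zero. Production software obtains the same result far more quickly with a fast Fourier transform.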

The information within the spectrum can then be presented in many possible ways. In a few cases, it is 
possible to identify the sample by a fully automatic analysis: for example, by using comparisons with an 
extensive database. However, most analyses require the knowledge outlined in the following sections. 

Many other pulsed NMR experiments are possible, and some are listed in the final sections. Most can be 
carried out using the standard equipment described above, but some require additions such as highly 
controllable, pulsed field gradients, shaped RF pulses for (for example) single-frequency irradiations, and the 
combined use of pulses at several different frequencies. 


B1.11.4 QUANTITATION

The simplest use of an NMR spectrum, as with many other branches of spectroscopy, is for quantitative 
analysis. Furthermore, in NMR all nuclei of a given type have the same transition probability, so that their 
resonances may be readily compared. The area underneath each isolated peak in an NMR spectrum is 
proportional to the number of nuclei giving rise to that peak alone. It may be measured to ~1% accuracy by
digital integration of the NMR spectrum, followed by comparison with the area of a peak from an added 
standard. 


The absolute measurement of areas is not usually useful, because the sensitivity of the spectrometer depends 
on factors such as temperature, pulse length, amplifier settings and the exact tuning of the coil used to detect 
resonance. Peak intensities are also less useful, because linewidths vary, and because the resonance from a 
given chemical type of atom will often be split into a pattern called a multiplet. However, the relative overall 
areas of the peaks or multiplets still obey the simple rule given above, if appropriate conditions are met. Most 
samples have several chemically distinct types of (for example) hydrogen atoms within the molecules under 
study, so that a simple inspection of the number of peaks/multiplets and of their relative areas can help to 
identify the molecules, even in cases where no useful information is available from shifts or couplings. 

This is illustrated in figure B1.11.3, the integrated 1H NMR spectrum of commercial paracetamol in
deuteriodimethylsulfoxide solvent. The paracetamol itself gives five of the integrated peaks or multiplets.
Two other integrated peaks at 3.4 and 1.3 ppm, plus the smaller peaks, arise from added substances. The five
paracetamol peaks have area ratios (left to right) of 1:1:2:2:3. These tally with the paracetamol molecule (see
diagram). The single H atoms are OH and NH respectively, the double ones are the two distinct pairs of
hydrogens on the aromatic ring and the triple ones are the methyl group. Few other molecules will give these
ratios, irrespective of peak position.

Figure B1.11.3. 400 MHz 1H NMR spectrum of paracetamol (structure shown) with added integrals for each
singlet or multiplet arising from the paracetamol molecule.
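
The step from raw integrals to whole-number ratios, and from there to a molar proportion, is simple arithmetic. A minimal sketch follows. The raw areas are hypothetical numbers chosen to resemble a 1:1:2:2:3 pattern; they are not read from the figure, and the function names are illustrative.

```python
def relative_nuclei(areas, reference_index=0, reference_count=1):
    """Scale integrated areas so that one chosen peak equals reference_count nuclei."""
    scale = reference_count / areas[reference_index]
    return [a * scale for a in areas]


def molar_ratio(area_a, nuclei_a, area_b, nuclei_b):
    """Moles of A per mole of B, from the area per contributing nucleus."""
    return (area_a / nuclei_a) / (area_b / nuclei_b)


# Hypothetical integrals in arbitrary units (assumed, not measured).
raw_areas = [10.2, 9.9, 20.3, 19.8, 30.1]
ratios = [round(x) for x in relative_nuclei(raw_areas)]

# Hypothetical impurity peak from 3 H atoms, compared with a 3 H methyl peak.
impurity_fraction = molar_ratio(0.6, 3, 30.0, 3)
```

Dividing each area by the number of nuclei that produce it is what allows peaks from different compounds in the same spectrum to be compared directly.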


The other peaks demonstrate the power of NMR to identify and quantitate all the components of a sample. 
This is very important for the pharmaceutical industry. Most of the peaks, including a small one accidentally 
underlying the methyl resonance of paracetamol, arise from stearic acid, which is commonly added to 
paracetamol tablets to aid absorption. The integrals show that it is present in a molar proportion of about 2%. 
The broader peak at 3.4 ppm is from water, present because no attempt was made to dry the sample. Such 
peaks may be identified either by adding further amounts of the suspected substance, or by the more 
fundamental methods to be outlined below. If the sample were less concentrated, then it would also be
possible to detect the residual hydrogen atoms in the solvent, i.e. from its deuteriodimethylsulfoxide-d5
impurity, which resonates in this case at 2.5 ppm.


It is evident from the figure that impurities can complicate the use of NMR integrals for quantitation. Further 
complications arise if the relevant spins are not at Boltzmann equilibrium before the FID is acquired. This 
may occur either because the pulses are repeated too rapidly, or because some other energy input is present, 
such as decoupling. Both of these problems can be eliminated by careful timing of the energy inputs, if strictly 
accurate integrals are required. 

Their effects are illustrated in figure B1.11.4, which is a 1H-decoupled 13C NMR spectrum of the same sample
of paracetamol, obtained without such precautions. The main peak integrals are displayed both as steps and
also as numbers below the peaks. The peaks from stearic acid are scarcely visible, but the dmso-d6 solvent
multiplet at 30 ppm is prominent, because 13C was also present at natural abundance in the solvent. The
paracetamol integrals should ideally be in the ratios 1:1:1:2:2:1, corresponding to the carbonyl carbon, the
four chemically distinct ring carbons and the methyl carbon to the right. However, the first three peaks
correspond to carbons that have no attached H, and their integrals are reduced by factors of between 2 and 3.
The methyl peak is also slightly reduced.

Figure B1.11.4. Hydrogen-decoupled 100.6 MHz 13C NMR spectrum of paracetamol. Both graphical and
numerical peak integrals are shown.

These reductions arose because the spectrum was obtained by the accumulation of a series of FIDs, generated 
by pulses spaced by less than the 5 × T1 interval mentioned above, so that a full Boltzmann population
difference was not maintained throughout. The shortfall was particularly acute for the unprotonated carbons, 
for these have long relaxation times, but it was also significant for the methyl group because of its rapid 
rotational motion. These losses of intensity are called 'saturation'. If they are intentionally introduced by 
selective irradiation just prior to the pulse, then they become the analogues of hole-bleaching in laser 
spectroscopy. They are useful for the selective reduction of large, unwanted peaks, as in solvent suppression. 
Also, because they are a property of the relevant nuclei rather than of their eventual chemical environment 
they can be used as transient labels in kinetic studies. 
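
The origin of the five-times-T1 rule, and of the unequal losses for different carbons, can be illustrated with the simplest possible recovery model. The sketch below assumes repeated 90° pulses and a single-exponential return to equilibrium; real experiments with smaller flip angles or other relaxation pathways will differ. The T1 values in the usage example are hypothetical.

```python
import math


def recovered_fraction(delay_over_t1: float) -> float:
    """Fraction of the equilibrium magnetization regained after a 90 degree pulse.

    Assumes single-exponential spin-lattice relaxation; the delay is
    expressed in units of T1.
    """
    return 1.0 - math.exp(-delay_over_t1)


if __name__ == "__main__":
    print(f"after 5 x T1: {recovered_fraction(5.0):.3f}")
    repeat_s = 4.0  # hypothetical pulse spacing
    for label, t1_s in (("protonated C, T1 = 2 s", 2.0),
                        ("unprotonated C, T1 = 20 s", 20.0)):
        print(f"{label}: {recovered_fraction(repeat_s / t1_s):.2f}")
```

A delay of five times T1 restores more than 99% of the magnetization, whereas a nucleus whose T1 is long compared with the pulse spacing recovers only a small fraction and is correspondingly under-represented in the integral.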


The integrals in figure B1.11.4 are, however, also distorted by a quite different mechanism. The spectrum was
obtained in the standard way, with irradiation at all the relevant 1H frequencies so as to remove any couplings
from 1H. This indirect input of energy partly feeds through to the 13C spins via dipolar coupling, to produce
intensity gains at all peaks, but particularly at protonated carbons, of up to ×3, and is an example of the
nuclear Overhauser enhancement (NOE). The phenomenon is quite general wherever T1 is dominated by dipolar
interactions, and when any one set of spins is saturated, whether or not decoupling also takes place. It was
discovered by A W Overhauser in the context of the dipolar interactions of electrons with nuclei [5]. The
NOEs between 1H nuclei are often exploited to demonstrate their spatial proximity, as described in the final
section. It is possible to obtain decoupled 13C NMR spectra without the complications of the NOE, by
confining the decoupling irradiation to the period of the FID alone, and then waiting for 10 × T1 before
repeating the process. However, the ×3 enhancement factor is useful and also constant for almost all
protonated carbons, so that such precautions are often superfluous. Both carbon and hydrogen integrals are
particularly valuable for identifying molecular symmetry.
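
The figure of about ×3 for protonated carbons agrees with the standard textbook limit for a 13C spin that relaxes purely through dipolar coupling to saturated 1H in a rapidly tumbling molecule. That limit is not derived in this section and is quoted here as an assumption, with rounded gyromagnetic ratios in units of 10^7 rad s^-1 T^-1:

```latex
\frac{I_{\text{sat}}}{I_{0}}
  \;=\; 1 + \frac{\gamma_{\mathrm{H}}}{2\,\gamma_{\mathrm{C}}}
  \;\approx\; 1 + \frac{26.75}{2 \times 6.73}
  \;\approx\; 2.99
```

Carbons with other relaxation pathways, or without nearby hydrogens, gain less than this limit.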


Figure B1.11.5 is an example of how relative integrals can determine structure even if the peak positions are
not adequately understood. The decavanadate anion has the structure shown, where oxygens lie at each vertex
and vanadiums at the centre of each octahedron. An aqueous solution of decavanadate was mixed with about
8 mol% of molybdate, and the three peaks from the remaining decavanadate were then computer-subtracted
from the 51V NMR spectrum of the resulting equilibrium mixture [6]. The remaining six peaks arise from a
single product and, although their linewidths vary widely, their integrals are close to being in a 2:2:2:1:1:1
ratio. This not only suggests that just one V has been replaced by Mo, but also identifies the site of
substitution as being one of the four MO6 octahedra not lying in the plane made by vanadiums 1, 2 and 3. No
other site or extent of substitution would give these integral ratios.

Figure B1.11.5. 105 MHz 51V NMR subtraction spectrum of the [MoV9O28]5- anion (structure shown). The
integrals are sufficient to define the position of the Mo atom.


When T1 is very short, which is almost always true with nuclei having I > 1/2, the dipolar contribution to
relaxation will be negligible and, hence, there will be no contributions to the integral from either NOE or
saturation. However, resonances more than about 1 kHz wide may lose intensity simply because part of the
FID will be lost before it can be digitized, and resonances more than 10 kHz wide may be lost altogether. It is
also hard to correct for minor baseline distortions when the peaks themselves are very broad.

B1.11.5 CHEMICAL SHIFTS

Strictly speaking, the horizontal axis of any NMR spectrum is in hertz, as this axis arises from the different 
frequencies that make up the FID. However, whilst this will be important in the section that follows (on 
coupling), it is generally more useful to convert these frequency units into fractions of the total spectrometer 
frequency for the nucleus under study. The usual units are parts per million (ppm). Because frequency is 
strictly proportional to field in NMR, the same ppm units also describe the fractional shielding or deshielding 
of the main magnetic field, by the electron clouds around or near the nuclei under study. The likely range of 
such shieldings is illustrated in figure B1.11.1, figure B1.11.3 and figure B1.11.4.

B1.11.5.1 CALIBRATION 

The ppm scale is always calibrated relative to the appropriate resonance of an agreed standard compound, 
because it is not possible to detect the NMR of bare nuclei, even though absolute shieldings can be calculated
with fair accuracy for smaller atoms. Added tetramethylsilane serves as the standard for 1H, 2H, 13C and 29Si
NMR, because it is a comparatively inert substance, relatively immune to changes of solvent, and its 
resonances fall conveniently towards one edge of most spectra. Some other standards may be unavoidably less 
inert. They are then contained in capillary tubes or in the annulus between an inner and an outer tube, where 
allowance must be made for the jump in bulk magnetic susceptibility between the two liquids. For dilute 
solutions where high accuracy is not needed, a resonance from the deuteriated solvent may be adequate as a 
secondary reference. The units of chemical shift are δ. This unit automatically implies ppm from the standard,
with the same sign as the frequency, and so measures deshielding. Hence δ = 10^6 (ν - ν_ref)/ν_ref.
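
The conversion between the frequency axis and the shift scale is a one-line calculation in each direction. The sketch below applies the definition just given. The 400 MHz reference frequency and the 2800 Hz offset are illustrative assumptions, as are the function names.

```python
def delta_ppm(nu_hz: float, nu_ref_hz: float) -> float:
    """Chemical shift in ppm: 1e6 * (nu - nu_ref) / nu_ref."""
    return 1e6 * (nu_hz - nu_ref_hz) / nu_ref_hz


def offset_hz(delta: float, nu_ref_hz: float) -> float:
    """Inverse conversion: frequency offset from the reference, in Hz."""
    return delta * nu_ref_hz * 1e-6


if __name__ == "__main__":
    nu_ref = 400.0e6  # hypothetical reference resonance frequency, Hz
    print(delta_ppm(nu_ref + 2800.0, nu_ref))  # a peak 2800 Hz above the reference
    print(offset_hz(7.0, 800.0e6))             # the same shift at twice the field
```

A given shift in ppm corresponds to twice as many hertz at twice the field, which is why the ppm scale is preferred when comparing spectra from different spectrometers.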

B1.11.5.2 LOCAL CONTRIBUTIONS TO THE CHEMICAL SHIFT 

The shielding at a given nucleus arises from the virtually instantaneous response of the nearby electrons to the 
magnetic field. It therefore fluctuates rapidly as the molecule rotates, vibrates and interacts with solvent 
molecules. The changes of shift with rotation can be large, particularly when double bonds are present. For
example, the 13C shift of a carbonyl group has an anisotropy comparable to the full spectrum width in figure
B1.11.4. Fortunately, these variations are averaged in liquids, although they are important in the NMR of
solids. This averaging process may be visualized by imagining the FID emitted by a single spin. If the 
emission frequency keeps jumping about by small amounts, then this FID will be made up from a series of 
short segments of sine waves. However, if the variations in frequency are small compared to the reciprocal of 
the segment lengths, then the composite wave will be close to a pure sine wave. Its frequency will be the 
weighted average of its hidden components and the rapid, hidden jumps will not add to the linewidth. This is 
described as 'fast exchange on the NMR timescale' and it is discussed more fully in B.2.7. The same 
principles apply to rapid chemical changes, such as aqueous protonation equilibria. In contrast, somewhat 
slower exchange processes increase linewidths, as seen for the OH and NH resonances in figure B1.11.3. In
this sample, the individual molecules link transiently via hydrogen bonding, and are thus in exchange between 
different H-bonded oligomers. 
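
In the fast-exchange limit the observed shift is simply the population-weighted average of the shifts of the exchanging forms. A minimal sketch of that average is given below. The two shifts and the 25% population are hypothetical values for an acid in rapid equilibrium with its conjugate base, chosen only to show the arithmetic.

```python
def fast_exchange_shift(populations, shifts_ppm):
    """Population-weighted mean shift for species in fast exchange."""
    total = sum(populations)
    return sum(p * d for p, d in zip(populations, shifts_ppm)) / total


if __name__ == "__main__":
    # Hypothetical: 25% protonated form at 3.0 ppm, 75% deprotonated at 2.0 ppm.
    print(fast_exchange_shift([0.25, 0.75], [3.0, 2.0]))
```

This is why the position of a single averaged resonance can be used to follow a protonation equilibrium as the pH is changed.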

The two primary causes of shielding by electrons are diamagnetism and temperature-independent
paramagnetism (TIP). Diamagnetism arises from the slight unpairing of electron orbits under the influence of
the magnetic field. This always occurs so as to oppose the field and was first analysed by Lamb [7]. A
simplified version of his formula, appropriate for a single orbital in isolated atoms, shows that the
diamagnetic shielding contribution to δ is proportional to -ρ_e⟨r^2⟩, where ρ_e is the electron density and
⟨r^2⟩ is the average squared radius of the electron orbit [8]. Thus diamagnetic shielding is lowered and,
hence, δ is increased, by the attachment of an electronegative group to the atom under study, for this
decreases both ρ_e and ⟨r^2⟩. Similarly, in conjugated systems, δ will often approximately map the
distribution of electronic charge between the atoms.

The TIP contribution has the opposite effect on δ. It should not be confused with normal paramagnetism, for it
does not require unpaired electrons. Instead, it arises from the physical distortion of the electron orbitals by
the magnetic field. A highly simplified derivation of the effect of TIP upon δ shows it to be proportional to
+⟨r^-3⟩/ΔE, with r as above. ΔE is a composite weighting term representing the energy gaps between the
occupied and the unoccupied orbitals. A small ΔE implies orbitals that are accessible and hence vulnerable to
magnetic distortion. ΔE is particularly large for H, making the TIP contribution small in this case, but for all
other atoms TIP dominates δ. The effect of an electronegative group on the TIP of an atom is to shrink the
atom slightly and, thus, to increase δ, i.e. it influences chemical shift in the same direction as described above
in the case of diamagnetism, but for a different reason.

These effects are clearly seen for saturated molecules in table B1.11.1, which correlates the 1H and 13C NMR
chemical shifts of a series of compounds X-CH3 with the approximate Pauling electronegativities of X. Both
δH and δC increase with the electronegativity of X as predicted, although the sequence is slightly disordered
by additional, relativistic effects at the C atoms, when X is a heavy atom such as I. The correlations of shift
with changes in the local electric charge and, hence, with orbital radius, are also seen for an aromatic system
in figure B1.11.6. Here the hydrogen and carbon chemical shifts in phenol, quoted relative to benzene,
correlate fairly well both with each other and also with the charge distribution deduced by other means.

Figure B1.11.6. 1H and 13C chemical shifts in phenol, relative to benzene in each case. Note that δ (H or C)
approximately follows δ (the partial charge at C).

Table B1.11.1. Effect of an electronegative substituent upon methyl shifts in X-CH3.

X           δH      δC         Electronegativity of X
Si(CH3)3    0.0     0.0        1.8
H           0.13    -2.3       2.1
CH3         0.88    5.7        2.5
I           2.16    -20.7 a    2.5
NH2         2.36    28.3       2.8
Br          2.68    10.0 a     2.8
Cl          3.05    25.1       3.0
OH          3.38    49.3       3.5
F           4.26    75.4       4.0

a Heavy-atom relativistic effects influence these shifts.
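
The strength of the trend in table B1.11.1 can be put on a numerical footing with a simple correlation coefficient. The sketch below uses the shifts and electronegativities listed in the table. The choice of the Pearson coefficient is an assumption made here for illustration and is not part of the original analysis.

```python
import math

# (X, delta_H / ppm, delta_C / ppm, electronegativity of X), from table B1.11.1.
ROWS = [
    ("Si(CH3)3", 0.0, 0.0, 1.8),
    ("H", 0.13, -2.3, 2.1),
    ("CH3", 0.88, 5.7, 2.5),
    ("I", 2.16, -20.7, 2.5),
    ("NH2", 2.36, 28.3, 2.8),
    ("Br", 2.68, 10.0, 2.8),
    ("Cl", 3.05, 25.1, 3.0),
    ("OH", 3.38, 49.3, 3.5),
    ("F", 4.26, 75.4, 4.0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


electronegativity = [row[3] for row in ROWS]
r_h = pearson(electronegativity, [row[1] for row in ROWS])
r_c = pearson(electronegativity, [row[2] for row in ROWS])
```

Both coefficients are strongly positive. The carbon value is the lower of the two, consistent with the disordering of the 13C sequence by the heavy-atom entries for I and Br.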


The effects of TIP also appear in figure B1.11.3 and figure B1.11.4. In the 13C NMR spectrum, all the
resonances of the sp2 carbons lie above 100 ppm (a useful general rule of thumb) because ΔE is smaller for
multiple bonds. The highest shifts are for the carbonyl C at 169 ppm and the ring C attached to oxygen at 155
ppm, because of the high electronegativity of O. In the 1H spectrum, H atoms attached to sp2 carbons
generally lie above 5 ppm, and those attached to sp3 carbons below 5 ppm. However, the NH and OH resonances
have much less predictable shifts, largely governed by the average strength of the associated hydrogen bonds.
When these are exceptionally strong, as in nucleic acids, δH can be as high as 30 ppm, whereas most normal
1H shifts lie in the range 0-10 ppm.

Table B1.11.2 gives a different example of the effects of TIP. In many fluorine compounds, ΔE is largely
determined by the energy difference between the bonding and antibonding orbitals in the bond to F, and hence
by the strength of this bond. The 19F shifts correlate nicely in this way. For example, C-F bonds are
notoriously strong and so give low values of δF, whereas the remarkable reactivity of F2 depends on the
weakness of the F-F bond, which also gives F2 a much higher chemical shift. The weakness of the bond to F
is even more apparent in the chemical shifts of the explosive compounds XeF6 and FOOF. In some
compounds the spectra also reflect the presence of distinct types of fluorine within the molecule. For example,
ClF3 has a T-shaped structure with the Cl-F bond stronger for the stem of the T, giving this single F atom a
lower shift.

Table B1.11.2. Fluorine-19 chemical shifts.

Organic compounds   δ               Inorganic compounds     δ                     Metal fluorides   δ
CH3F                -272            HF                      -221                  MoF6              -278
CH3CH2F             -213            LiF                     -210                  SbF5              -108
C6F6                -163            [BF4]-                  -163                  WF6               +166
CH2F2               -143            BF3                     -131                  ReF7              +345
F2C=CF2             -135            ClF3                    -4 (1), +116 (2)
H2C=CF2             -81             IF7                     +170
CF3R                -60 to -70      [illegible in source]   +174 (1), +222 (4)
CF2Cl2              -8              ClF5                    +247 (1), +412 (4)
CFCl3               0 (reference)   XeF2                    +258
CF2Br2              +7              XeF4                    +438
CFBr3               +7              XeF6                    +550
                                    F2                      +421.5
                                    ClF                     +448.4
                                    FOOF                    +865


The shifts for other nuclei are not usually so simple to interpret, because more orbitals are involved, and
because these may have differing effects on ⟨r^-3⟩ and on ΔE. Nevertheless, many useful correlations have
been recorded [9]. A general rule is that the range of shifts increases markedly with increasing atomic number,
mainly because of decreases in ΔE and increases in the number of outer-shell electrons. The shift range is
particularly large when other orbital factors also lower and vary ΔE, as in cobalt(III) complexes. However, the
chemical shifts for isotopes having the same atomic number, such as 1H, 2H and 3H, are almost identical.
Each nucleus serves merely to report the behaviour of the same electron orbitals, except for very small effects
of isotopic mass on these orbitals.

B1.11.5.3 CHEMICAL SHIFTS ARISING FROM MORE DISTANT MOIETIES


Hydrogen shifts fall within a narrow range and, hence, are proportionally more susceptible to the influence of 
more distant atoms and groups. As an example, aromatic rings have a large diamagnetic anisotropy, because 
the effective radius of the electron orbit around the ring is large. This affects the chemical shifts of nearby 
atoms by up to 5 ppm, depending on their position relative to the ring. It also explains the relatively high 
shifts of the hydrogens directly attached to the ring. The diamagnetism of the ring is equivalent to a solenoid
opposing B0. Its lines of force necessarily loop backwards outside the ring and therefore deshield atoms in or
near the ring plane. The effect is illustrated in figure B1.11.7 for a paracyclophane. The methylene shift values
fall both above and below the value of 1.38, expected in the absence of the aromatic moiety, according to the
average positioning of the methylene hydrogens relative to the ring plane. Similar but smaller shifts are
observed with simple double and triple bonds, because these also generate a significant circulation of
electrons. Polycyclic aromatic systems generate even larger shifts, in regions near to more than one ring.

Figure B1.11.7. 1H chemical shifts in [10]-paracyclophane. They have values on either side of the 1.38 ppm
found for large polymethylene rings and, thus, map the local shielding and deshielding near the aromatic
moiety, as depicted in the upper part of the figure.


Shifts are also affected by steric compression of any kind on the atom under study. The effect on a C atom can
reduce δC by up to 10 ppm. For example, the 13C chemical shifts of the methyl carbons in but-2-ene are 5
ppm lower in the cis isomer than in the trans, because in the cis case the methyl carbons are only about 3 Å
apart. Steric shifts are particularly important in the 13C NMR spectra of polymers, for they make the peak
positions dependent on the local stereochemistry and, hence, on the tacticity of the polymer [10].


B1.11.5.4 SHIFT REAGENTS 

Nearby paramagnetic molecules can also have a similar effect on shifts. Their paramagnetic centre will often
possess a substantially anisotropic g-factor. If the paramagnetic molecule then becomes even loosely and
temporarily attached to the molecule under study, this will almost always lead to significant shift changes, for
the same reasons as with nearby multiple bonds. Furthermore, if the molecule under study and its
paramagnetic attachment are both chiral, the small variations in the strength and direction of the attachment
can effect a separation of the shifts of the two chiral forms and thus detect any enantiomeric excess. The
paramagnetic species thus acts as a chiral shift reagent. In practice, the effect of the paramagnetic additive
will be complicated, because there may be further shifts arising from direct leakage of free electron density
between the paired molecules. There will also be broadening due to the large magnetic dipole of the unpaired
electrons. These complications can be reduced by a judicious choice of shift reagent. It may also be possible
to use a diamagnetic shift reagent such as 2,2'-trifluoromethyl-9-anthrylethanol and thus avoid the
complications of paramagnetic broadening altogether, whilst hopefully retaining the induced chiral separation
of shifts [11].

B1.11.5.5 THE PREDICTION OF CHEMICAL SHIFTS

Enormous numbers of chemical shifts have been recorded, particularly for 1H and 13C. Many algorithms for
the prediction of shifts have been extracted from these, so that the spectra of most organic compounds can be
predicted at a useful level of accuracy, using data tables available in several convenient texts [12, 13, 14 and
15]. Alternatively, computer programs are available that store data from up to 10^5 spectra and then use direct
structure-comparison methods to predict the 1H and 13C chemical shifts of new species.

Shifts can also be predicted from basic theory, using higher levels of computation, if the molecular structure is 
precisely known [16]. The best calculations, on relatively small molecules, vary from observation by little 
more than the variations in shift caused by changes in solvent. In all cases, it is harder to predict the shifts of 
less common nuclei, because of the generally greater number of electrons in the atom, and also because fewer 
shift examples are available. 


B1.11.6 THE DETECTION OF NEIGHBOURING ATOMS-COUPLINGS 

The ¹H NMR spectrum in figure B1.11.2 shows two resonances in the region around 7 ppm, from the two 
types of ring hydrogens. Each appears as a pair of peaks rather than as a single peak. This splitting into a pair, 
or 'doublet', has no connection with chemical shift, because if the same sample had been studied at twice the 
magnetic field, then the splitting would have appeared to halve, on the chemical shift scale. However, the 
same splitting does remain strictly constant on the concealed frequency scale mentioned previously, because 
it derives not from B₀, but instead from a fixed energy of interaction between the neighbouring hydrogen 
atoms on the ring. Each pair of peaks in the present example is called a doublet: a more general term for such 
peak clusters is a multiplet. The chemical shift of any symmetrical multiplet is the position of its centre, on the 
ppm scale, whether or not any actual peak arises at that point. The area of an entire multiplet, even if it is not 
fully symmetric, obeys the principles of section B1.11.4 above. It follows that the individual peaks in a 
multiplet may be considerably reduced in intensity compared with unsplit peaks, especially if many splittings 
are present. 
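
The field dependence described above can be illustrated with a minimal sketch. A splitting is fixed in hertz, and one ppm corresponds to a number of hertz equal to the spectrometer frequency in MHz, so the same splitting occupies half as much of the ppm scale at twice the field. The function name and the 8 Hz, 250 MHz and 500 MHz values are illustrative assumptions, not data taken from the figure.

```python
def splitting_in_ppm(j_hz: float, spectrometer_mhz: float) -> float:
    """Width of a J splitting on the ppm scale.

    One ppm corresponds to `spectrometer_mhz` hertz, so a splitting that is
    constant in hertz shrinks on the ppm scale as the field is raised.
    """
    return j_hz / spectrometer_mhz


# Illustrative values only: an 8 Hz doublet observed at 250 MHz and at 500 MHz.
low_field = splitting_in_ppm(8.0, 250.0)
high_field = splitting_in_ppm(8.0, 500.0)
print(low_field, high_field)
```

The splitting in hertz is the same in both calls; only its apparent width in ppm changes.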

The splittings are called J couplings, scalar couplings or spin-spin splittings. They are also closely related to 
hyperfine splittings in ESR. Their great importance in NMR is that they reveal interactions between atoms 
that are chemically nearby and linked by a bonding pathway. The J value of a simple coupling is the 
frequency difference in hertz between the two peaks produced by the coupling. It is often given by the symbol 
ⁿJ, where n is the number of bonds that separate the coupled atoms. Thus, in combination with chemical shifts 
and integrals, couplings will often allow a chemical structure to be determined from a single ¹H NMR spectrum, 
even of a previously unknown molecule of some complexity. Their presence also permits a wide range of 
elegant experiments where spins are manipulated by carefully timed pulses. However, they also complicate 
spectra considerably if more than a few interacting atoms are present, 
particularly at low applied fields, when the chemical shift separations, converted back to hertz, are not large in 
comparison with the couplings. Fortunately some of the aforesaid elegant experiments can extract information 
on the connectivities of atoms, even when the multiplets are too complex or overlapped to permit a 
peak-by-peak analysis. 

B1.11.6.1 COUPLING MECHANISMS 

The simplest mechanism for spin-spin couplings is described by the Fermi contact model. Consider two 
nuclei linked by a single electron-pair bond, such as ¹H and ¹⁹F in the hydrogen fluoride molecule. The 
magnetic moment of ¹H can be either 'up' or 'down' relative to B₀, as described earlier, with either possibility 
being almost equally probable at normal temperatures. If the ¹H is 'up' then it will have a slight preferential 
attraction for the 'down' electron in the bond pair, i.e. the one whose magnetic moment lies antiparallel to it, 
for magnets tend to pair in this way. The effect will be to unpair the two electrons very slightly and, thus, to 
make the ¹⁹F nucleus more likely to be close to the 'up' electron and thus slightly favoured in energy, if it is 
itself 'down'. The net result is that the ¹H and ¹⁹F nuclei gain slightly in energy when they are mutually 
antiparallel. The favourable and unfavourable arrangements are summarized as follows: 


energetically favourable:      H↑ e↓e↑ F↓      H↓ e↑e↓ F↑      energy gain ¼J 
energetically unfavourable:    H↑ e↓e↑ F↑      H↓ e↑e↓ F↓      energy loss ¼J 

These energy differences then generate splittings as outlined in figure B1.11.8. If the energies of the two 
spins, here generalized to I and S, in the magnetic field are corrected by the above ±¼J quantities, then the 
transitions will move from their original frequencies of ν_I and ν_S to ν_I ± ½J and ν_S ± ½J. Thus, both the I and 
the S resonances will each be symmetrically split by J Hz. In the case of ¹H¹⁹F, ¹J_HF = +530 Hz. The positive 
sign applies when the antiparallel nuclear spin configuration is the favoured one, and the '1' superscript refers 
to the number of bonds separating the two spins. 
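
The way in which ±¼J corrections to the energies become ±½J shifts of the transitions can be checked with a short first-order sketch. It assumes energies expressed in hertz, with the 'up' state (m = +1/2) lower in energy and a coupling term J·m_I·m_S. The function names and the Larmor frequencies of 1000 and 400 Hz are hypothetical placeholders; only the 530 Hz coupling comes from the text.

```python
from itertools import product


def first_order_levels(nu_i, nu_s, j):
    """Energies (Hz) of the four (m_I, m_S) states of a weakly coupled pair.

    The J * m_I * m_S term lowers the antiparallel states by J/4 and raises
    the parallel states by J/4 when J is positive.
    """
    return {
        (m_i, m_s): -nu_i * m_i - nu_s * m_s + j * m_i * m_s
        for m_i, m_s in product((0.5, -0.5), repeat=2)
    }


def i_transitions(levels):
    """Frequencies of the two I-spin flips (m_I: +1/2 to -1/2), one per m_S."""
    return sorted(levels[(-0.5, m_s)] - levels[(0.5, m_s)] for m_s in (0.5, -0.5))


# Placeholder Larmor frequencies (offsets in Hz); J = 530 Hz as quoted for HF.
levels = first_order_levels(nu_i=1000.0, nu_s=400.0, j=530.0)
lines = i_transitions(levels)
print(lines)  # the two components lie J/2 either side of nu_i
```

The two I transitions are separated by exactly J, which is the doublet splitting described above; the same holds for the S transitions.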

Figure B1.11.8 does not apply accurately in the not uncommon cases when I and S have Larmor frequencies 
whose separation Δν is not large compared with J_IS. In these cases the two spin states where I and S are 
antiparallel have very similar energies, so that the coupling interaction mixes their spin states, analogously to 
the non-crossing rule in UV-visible spectroscopy. As Δν falls, the peak intensities in the multiplet alter so as 
to boost the component(s) nearest to the other multiplet, at the expense of the other components, whilst 
keeping the overall integral of the multiplet constant. This 'roofing' or 'tenting' is an example of a 
second-order effect on couplings. It is actually useful up to a point, in that it helps the analysis of couplings in a 
complex molecule, by indicating how they pair off. Some examples are evident in subsequent figures. Figure 
B1.11.10, for example, shows mutual roofing of the H-6a and H-6b multiplets, and also of H-3 with H-2. 
Roofing is also seen, more weakly, in figure B1.11.9, where every multiplet is slightly tilted towards the 
nearest multiplet with which it shares a coupling. Indeed, it is unusual not to see second-order distortions in 
¹H NMR spectra, even when these are obtained at very high field. 

Figure B1.11.8. Combined energy states for two spins I and S (e.g. ¹H and ¹³C) with an exaggerated 
representation of how their mutual alignment via coupling affects their combined energy. 


As Δν gets even smaller, the outer components of the multiplet become invisibly small. Further splittings may 
also appear in complex multiplets. In the extreme case that Δν = 0, the splittings disappear altogether, even 
though the physical coupling interaction is still operative. This is why the methyl resonance in figure B1.11.3 
appears as a single peak. It is occasionally necessary to extract accurate coupling constants and chemical 
shifts from complex, second-order multiplets, and computer programs are available for this. 
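
For the simplest case of two coupled spin-1/2 nuclei, the roofing and the collapse at Δν = 0 can be reproduced with the standard textbook result for an 'AB' spin system. The sketch below is not taken from this section: it assumes the usual AB expressions, with line positions measured in hertz from the midpoint of the two shifts, and the function name is hypothetical. It also assumes that Δν and J are not both zero.

```python
import math


def ab_quartet(delta_nu, j):
    """Positions (Hz from the midpoint) and relative intensities of the four
    lines of two coupled spin-1/2 nuclei (standard AB-system expressions)."""
    j = abs(j)  # the sign of J does not change the appearance of the spectrum
    c = 0.5 * math.hypot(delta_nu, j)
    s = j / (2.0 * c)  # near 0 for weak coupling, exactly 1 when delta_nu = 0
    outer, inner = c + j / 2.0, c - j / 2.0
    return [(-outer, 1 - s), (-inner, 1 + s), (inner, 1 + s), (outer, 1 - s)]


weak = ab_quartet(delta_nu=1000.0, j=10.0)   # two almost undistorted doublets
strong = ab_quartet(delta_nu=20.0, j=10.0)   # pronounced roofing
collapsed = ab_quartet(delta_nu=0.0, j=10.0)  # outer lines vanish entirely
for label, lines in (("weak", weak), ("strong", strong), ("collapsed", collapsed)):
    print(label, [(round(p, 2), round(i, 3)) for p, i in lines])
```

In every case the two lines of each doublet remain J apart and the four intensities sum to 4, so the integral is conserved while intensity moves from the outer to the inner lines.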

B1.11.6.2 FACTORS THAT DETERMINE COUPLING CONSTANTS 

Because J arises from the magnetic interactions of nuclei, the simplest factor affecting it is the product γ_Iγ_S of 
the two nuclear magnetogyric ratios involved. For example, ¹J_DF in ²H¹⁹F is 82 Hz, i.e. ¹J_HF × γ_D/γ_H. This 
totally predictable factor is sometimes discounted by quoting the reduced coupling constant 
K_IS = 4π²J_IS/(hγ_Iγ_S). 

A second determining factor in the Fermi contact mechanism is the requirement that the wavefunction of the 
bonding orbital has a significant density at each nucleus, in order for the nuclear and the electron magnets to 
interact. One consequence of this is that K correlates with nuclear volume and therefore rises sharply for 
heavier nuclei. Thus the ¹K_XH constants in the XH₄ series with X = ¹³C, ²⁹Si, ⁷³Ge, ¹¹⁹Sn and ²⁰⁷Pb are 
respectively +41.3, +84.9, +232, +430 and +938 × 10¹⁹ N A⁻² m⁻³. Here the average value of (¹K/nuclear mass 
number) = 3.5 ± 1. 

The presence of a spin-spin splitting therefore means that the interacting atomic orbitals must each possess 
significant s character, because all orbitals other than s have zero density at the nucleus. The ¹J_CH values for 
CH₄, H₂C=CH₂ and HC≡CH illustrate the dependence of J upon s character. They are respectively +124.9, 
+156.2 and +249 Hz, and so they correlate unusually precisely with the s characters of the carbon orbitals 
bonding to H, these being respectively 25, 33.3 and 50%. A coupling also shows that the bond that carries the 
coupling cannot be purely ionic. A significant J value demonstrates covalency, or at least orbital overlap of 
some kind. 
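
The reduced coupling constant and the quoted trend with nuclear mass can be checked numerically. The sketch below converts the methane ¹J_CH value into K using the definition K = 4π²J/(hγ_Iγ_S), and then averages K divided by mass number over the XH₄ series. The Planck constant and the two magnetogyric ratios are standard values supplied here as assumptions; they are not given in the text, and the function and variable names are illustrative only.

```python
import math

# Physical constants supplied for this sketch (not quoted in the text).
PLANCK = 6.62607015e-34    # J s
GAMMA_1H = 2.6752218744e8  # rad s^-1 T^-1
GAMMA_13C = 6.728284e7     # rad s^-1 T^-1


def reduced_coupling(j_hz, gamma_i, gamma_s):
    """Reduced coupling constant K = 4*pi^2*J / (h*gamma_I*gamma_S), in N A^-2 m^-3."""
    return 4 * math.pi ** 2 * j_hz / (PLANCK * gamma_i * gamma_s)


# Methane: 1J(CH) = +124.9 Hz, expressed in units of 10^19 N A^-2 m^-3.
k_ch = reduced_coupling(124.9, GAMMA_1H, GAMMA_13C) / 1e19

# K values (same units) and mass numbers for the XH4 series quoted in the text.
series = {"13C": (41.3, 13), "29Si": (84.9, 29), "73Ge": (232.0, 73),
          "119Sn": (430.0, 119), "207Pb": (938.0, 207)}
ratios = {x: k / mass for x, (k, mass) in series.items()}
mean_ratio = sum(ratios.values()) / len(ratios)
print(round(k_ch, 1), round(mean_ratio, 2))
```

The computed K for methane agrees with the tabulated +41.3 to within rounding, and the mean ratio reproduces the quoted value of about 3.5.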

The next large influence on ⁿJ is the number of bonds n in the bonding pathway linking the two spins and 
their stereochemistry. When n > 1, then there will be at least two electron pairs in the pathway, and J will thus 
be attenuated by the extent to which one electron pair can influence the next. In simple cases, the insertion of 
a further electron pair will reverse the sign of J, because parallel electrons in different, interacting orbitals 
attract each other slightly, following Hund's rules. Thus, for a two-bond pathway, a favoured spin 
configuration will now be 


energetically favourable:      X↑ e↓e↑ e↑e↓ Y↑ 

Some other general rules have been extracted; they come particularly from ⁿJ_HH data, but also have a wider 
validity in many cases. 

• n = 2. ²J is often low, because of competing pathways involving different orbitals. A typical value for 
chemically distinct H atoms in a methylene group is -14 Hz. However, it can rise to zero, or even to a 
positive value, in C=CH₂ moieties. 

• n = 3. ³J is almost always positive and its magnitude often exceeds that of ²J. It always depends in a 
predictable way on the dihedral angle φ between the outer two of the three bonds in the coupling 
pathway. Karplus first showed theoretically that ³J varies to a good approximation as A cos²φ + B cos φ, 
where A and B are constants, and also that A ≫ B [17]. His equation has received wide-ranging 
experimental verification. For a typical HCCH pathway, with not particularly electronegative substituents, 
A = 13 Hz and B = -1 Hz. This predicts ³J = 14.0 Hz for φ = 180° and 2.75 Hz for φ = 60°. Thus, a 
typical HCCH₃ coupling, where the relevant dihedral angles are 60° (twice) and 180° (once), will 
average to (2 × 2.75 + 14)/3 = 6.5 Hz, as the C-C bond rotates. Almost all other ³J couplings follow a 
similar pattern, with ³J being close to zero if φ = 90°, even though the values of A vary widely 
according to the atoms involved. The pattern may be pictured as a direct, hyperconjugative interaction 
between the outer bonding orbitals, including their rear lobes. It is only fully effective when these 
orbitals lie in the same plane. The Karplus equation offers a valuable way of estimating dihedral angles 
from couplings, especially in unsymmetrical molecular fragments such as CH-CH₂, where the 
presence of two ³J couplings eliminates any ambiguities in the values of φ. 

• n = 4. ⁴J couplings are often too small to resolve. However, they are important in cases where the 
relevant orbitals are aligned appropriately. If the molecular fragment HC-C=CH has the first CH 
bond approximately parallel to the C=C π orbital, the resulting hyperconjugative interaction will give 
rise to an 'allylic' coupling of about -2 Hz. A similar coupling arises in a saturated HC-C-CH 
fragment if the bonds lie approximately in a W configuration. In such cases the rear lobes of the CH 
bonding orbitals touch, thus offering an extra electronic pathway for coupling. Indeed, all such 
contacts give rise to couplings, even if the formal bonding pathway is long. 

Although the Fermi contact mechanism dominates most couplings, there are smaller contributions where a 
nuclear dipole physically distorts an orbital, not necessarily of s type [18]. There are many useful 
compilations of J and K values, especially for HH couplings (see [9], ch 4, 7-21 and [12, 13, 14 and 15]). 
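
The Karplus relation quoted above for three-bond couplings is easily evaluated. The following minimal sketch uses the A = 13 Hz and B = -1 Hz values given for a typical HCCH pathway and reproduces the worked figures for the anti, gauche and rotationally averaged methyl cases; the function and variable names are illustrative only.

```python
import math


def karplus(phi_deg, a=13.0, b=-1.0):
    """Three-bond coupling (Hz) as A*cos^2(phi) + B*cos(phi), phi in degrees."""
    c = math.cos(math.radians(phi_deg))
    return a * c * c + b * c


j_anti = karplus(180.0)    # dihedral angle of 180 degrees
j_gauche = karplus(60.0)   # dihedral angle of 60 degrees
j_perp = karplus(90.0)     # close to zero, as stated in the text

# A freely rotating HC-CH3 fragment samples 60 degrees twice and 180 degrees once.
j_methyl = (2 * j_gauche + j_anti) / 3
print(round(j_anti, 2), round(j_gauche, 2), round(j_methyl, 2))
```

Because cos²φ is symmetric about 90°, a single measured ³J does not always fix φ uniquely, which is why the text notes the value of having two ³J couplings in a CH-CH₂ fragment.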

B1.11.6.3 MULTIPLE COUPLINGS 

In principle, every nucleus in a molecule, with spin quantum number I, splits every other resonance in the 
molecule into 2I + 1 equal peaks, i.e. one for each of its allowed values of m_I. This could make the NMR 
spectra of most molecules very complex indeed. Fortunately, many simplifications exist. 

(i) As described above, most couplings fall off sharply when the number of separating bonds increases. 

(ii) Nuclei with a low isotopic abundance will contribute correspondingly little to the overall spectra of the 
molecule. Thus, splittings from the 1.1% naturally abundant ¹³C isotope are only detectable at low 
level in both ¹H and ¹³C NMR spectra, where they are called ¹³C sidebands. 

(iii) Nuclei with I > 1/2 often relax so rapidly that the couplings from them are no longer detected, for 
reasons analogous to shift averaging by chemical exchange. For example, couplings from Cl, Br and I 
atoms are invisible, even though these halogens all have significant magnetic moments. 

(iv) Chemical exchange will similarly average couplings to zero, if it takes place much faster than the 
value of the coupling in hertz. 

(v) Selected nuclei can also have their couplings removed deliberately (decoupled) by selective irradiation 
during the acquisition of the FID. 

Despite these simplifications, a typical ¹H or ¹⁹F NMR spectrum will normally show many couplings. Figure 
B1.11.9 is the ¹H NMR spectrum of propan-1-ol in a dilute solution where the exchange of OH hydrogens 
between molecules is slow. The underlying frequency scale is included with the spectrum, in order to 
emphasize how the couplings are quantified. Conveniently, the shift order matches the chemical order of the 
atoms. The resonance frequencies of each of the 18 resolved peaks can be quantitatively explained by the four 
chemical shifts at the centre of each multiplet, plus just three values of J, two of these being in fact almost 
the same. If the hydrogen types are labelled in chemical order from the methyl hydrogens H₃-3 to the 
hydroxyl hydrogen, then the three coupling types visibly present are J_23 = 2a, J_12 = 2b and J_1,OH = 2c Hz, 
with a ≈ b because of the strong chemical similarity of all the HCCH bonding pathways. Consider first the 
methyl resonance H₃-3, and its interaction with the two equivalent hydrogens H-2 and H-2'. Each of the three 
equivalent hydrogens of H₃-3 will first be split into a 1:1 doublet by H-2, with peak positions δ₃ ± a, i.e. a Hz 
on either side of the true chemical shift δ₃, reconverted here into frequency units. H-2' will further split each 
component, giving three peaks with positions δ₃ ± a ± a. When all the possible ± combinations are considered, 
they amount to peaks at δ₃ (twice) plus δ₃ ± 2a (once each) and are conveniently called a (1:2:1) triplet. The 
OH multiplet is similarly a (1:2:1) triplet with peaks at δ_OH (twice) plus δ_OH ± 2c (once each). 

Figure B1.11.9. Integrated 250 MHz ¹H NMR spectrum of dilute propan-1-ol in dimethylsulfoxide solvent. 
Here, the shift order parallels the chemical order. An expansion of the H₂-1 multiplet is included, as is the 
implicit frequency scale, also referenced here to TMS = 0. 

Figure B1.11.10. 400 MHz ¹H NMR spectrum of methyl-α-glucopyranose (structure as in figure B1.11.12) 
together with the results of decoupling at H-1 (centre trace) and at H-4 (upper trace). 

Each H-1 hydrogen contributes to a more complex multiplet, with peaks at all combinations of the frequencies 
δ₁ ± b ± b ± c, thus making a triplet of doublets whose peak positions are δ₁ ± c (twice each) and δ₁ ± 2b ± c 
(once each). Finally, each H-2 hydrogen contributes to the sextet at δ₂ = 1.42 ppm, whose individual 
components appear at δ₂ ± a ± a ± a ± b ± b. If we take b = a, then the possible combinations of these 
frequencies amount to the following peak positions, with their relative heights given in brackets: δ₂ ± 5a (1), 
δ₂ ± 3a (5), δ₂ ± a (10). Note that the relative intensities follow Pascal's triangle and also that the separation of 
the outermost lines of a multiplet must always equal the sum of the underlying couplings, when only spin-1/2 
nuclei are involved. 

The same principles apply to couplings from spins with I > 1/2, where these are not seriously affected by 
relaxation. Figure B1.11.4 illustrates a common case. The solvent resonance at 30 ppm is a 1:3:6:7:6:3:1 
multiplet arising from the C²H₃ carbons in the solvent deuterioacetone. Each of the three deuterium nuclei 
splits the carbon resonance into a 1:1:1 triplet, corresponding to the three possible Zeeman states of an I = 1 
nucleus. Thus the overall peak positions are the possible combinations of δ_C + (a or 0 or -a) + (a or 0 or -a) 
+ (a or 0 or -a), where the coupling constant a is about 20 Hz. 
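
The counting of ± combinations used above for the sextet and for the deuterioacetone septet can be automated. The sketch below simply enumerates every combination of offsets, one offset per coupled nucleus, and tallies how many combinations land on each position. Offsets are given in arbitrary integer units of a; the function name is illustrative.

```python
from collections import Counter
from itertools import product


def multiplet(offsets_per_nucleus):
    """First-order multiplet from a list of per-nucleus offset tuples.

    A spin-1/2 neighbour with J = 2a contributes (-a, +a); a spin-1 neighbour
    with J = a contributes (-a, 0, +a). Returns sorted (position, weight) pairs.
    """
    lines = Counter()
    for combo in product(*offsets_per_nucleus):
        lines[sum(combo)] += 1
    return sorted(lines.items())


# H-2 of propan-1-ol with b = a: five spin-1/2 neighbours, each giving +/- a.
sextet = multiplet([(-1, 1)] * 5)

# 13C of a C(2H)3 group: three spin-1 deuterons, each giving -a, 0 or +a.
septet = multiplet([(-1, 0, 1)] * 3)

print([w for _, w in sextet])  # relative heights of the sextet
print([w for _, w in septet])  # relative heights of the septet
```

The same function handles the H-1 triplet of doublets if the offsets are given as two (-b, +b) pairs and one (-c, +c) pair.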

With relatively simple spectra, it is usually possible to extract the individual coupling constants by inspection, 
and to pair them by size in order to discover what atoms they connect. However, the spectra of larger 
molecules present more of a challenge. The multiplets may overlap or be obscured by the presence of several 
unequal but similarly sized couplings. Also, if any chiral centres are present, then the two hydrogens in a 
methylene group may no longer have the same chemical shift, and in this case they will also show a mutual ²J 
coupling. Fortunately, several powerful aids exist to meet this challenge: decoupling and a range of 
multidimensional spectra. 

B1.11.6.4 DECOUPLING 

Several examples have already been given of resonances that merge because the underlying molecular or spin 
states interchange more rapidly than their frequency separation. Similar interchanges can also be imposed so 
as to remove a coupling in a controllable way. One irradiates a selected resonance so as to give its 
magnetization no preferred direction, within the timescale set by the coupling to be removed. This can be 
achieved selectively for any resonance removed from others by typically 20 Hz, although multiplets nearly as 
close as this will unavoidably be perturbed in a noticeable and predictable way by the irradiation process. 

Figure B1.11.10 offers an example. It shows the 400 MHz ¹H NMR spectrum of α-1-methylglucopyranose, 
below two further spectra where ¹H-decoupling has been applied at H-1 and H-4 respectively. The main 
results of the decouplings are arrowed. Irradiation at the chemical shift position of H-1 removes the smaller of 
the two couplings to H-2. This proves the saccharide to be in its α form, i.e. with φ ≈ 60° rather than 180°, 
according to the Karplus relationship given previously. Note that both the H-1 resonance and the overlapping 
solvent peak are almost totally suppressed by the saturation, caused by the decoupling irradiation. The H-4 
resonance is a near-triplet, created by the two large and nearly equal couplings to H-3 and H-5. In both these 
cases, φ ≈ 180°. The spectrum in this shift region is complicated by the methyl singlet and by some minor 
peaks from impurities. However, these do not affect the decoupling process, beyond being severely distorted 
by it. Genuine effects of decoupling are seen at the H-3 and the H-5 resonances only. 

Single-frequency decoupling is easily and rapidly carried out. However, it may be limited by the closeness of 
different multiplets. Also, it will not normally be possible to apply more than one frequency of decoupling 
irradiation at a time. Fortunately, these disadvantages do not apply to the equivalent multidimensional 
methods. 

It is also usually possible to remove all the couplings from a particular isotope, e.g. ¹H, provided that one only 
wishes to observe the spectrum from another isotope, e.g. ¹³C. Either the decoupling frequency is 
noise-modulated to cover the relevant range of chemical shifts, or else the same decoupling is achieved more 
efficiently, and with less heating of the sample, by using a carefully designed, continuous sequence of 
composite pulses. Figure B1.11.4 is a ¹H-decoupled ¹³C spectrum: this is sometimes abbreviated to ¹³C{¹H}. 
In the absence of decoupling, the resonances would each be split by at least four short- and long-range 
couplings from ¹H atoms, and the signal-to-noise ratio would drop accordingly. The largest couplings would 
arise from the directly attached hydrogens. Thus, one might use a ¹³C NMR spectrum without decoupling to 
distinguish CH₃ (broadened 1:3:3:1 quartets) from CH₂ (1:2:1 triplets) from CH (1:1 doublets) from C 
(singlets). However, more efficient methods are available for this, such as the DEPT and INEPT pulse 
sequences outlined below. Two-dimensional methods are also available if one needs to detect the 
connectivities revealed by the various ¹³C-¹H couplings. 

B1.11.6.5 POLARIZATION TRANSFER 

Couplings can also be exploited in a quite different way, that lies behind a wide range of valuable NMR 
techniques. If just one component peak in a multiplet X can be given a non-equilibrium intensity, e.g. by 
being selectively inverted, then this necessarily leads to large changes of intensity in the peaks of any other 
multiplet that is linked via couplings to X. Figure B1.11.11 attempts to explain this surprising phenomenon. 

Part A shows the four possible combined spin states of a ¹³C¹H molecular fragment, taken as an example. 
These are the same states as in figure B1.11.8, but attention is now drawn to the populations of the four spin 
states, each reduced by subtracting the 25% population that would exist at very low field, or alternatively at 
infinite temperature. The figures above each level are these relative differences, in convenient units. The 
intensity of any one transition, i.e. of the relevant peak in the doublet, is proportional to the difference of these 
differences. In relative terms it is therefore unity for any ¹³C transition at Boltzmann equilibrium, 
and 4 for any ¹H transition. 

The only alteration in part B is that the right hand ¹H transition has been altered so as to interchange the 
relevant populations. In practice, this might be achieved either by the use of a highly selective soft pulse, or 
by a more elaborate sequence of two pulses, spaced in time so as to allow the Larmor precessions of the two 
doublet components to dephase by 180°. The result is to produce a ¹³C doublet whose relative peak intensities 
are 5:-3, in place of the original 1:1. Further, similar manipulations of the ¹³C spins can follow, so as to 
re-invert the -3 component. This results in a doublet, or alternatively a singlet after decoupling, that has four 
times the intensity it had originally. The underlying physical process is called polarization transfer. More 
generally, any nucleus with magnetogyric ratio γ_A, coupled to one with γ_B, will have its intensity altered by 
γ_B/γ_A. This gain is valuable if γ_B ≫ γ_A. The technique is then called INEPT (insensitive nucleus enhancement 
by polarization transfer) [19]. It has an added advantage. The relaxation time of a high-γ nucleus is usually 
much shorter than that of a low-γ nucleus, because it has a bigger magnetic moment to interact with its 
surroundings. In the INEPT experiment, the spin populations of the low-γ nucleus are driven by the high-γ 
nucleus, rather than by natural relaxation. Hence the repeat time for accumulating the FID is shortened to that 
appropriate for the high-γ nucleus. 

Figure B1.11.11. Polarization transfer from ¹H to ¹³C (see the text). The inversion of one ¹H transition also 
profoundly alters the ¹³C populations. 
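
The population bookkeeping behind polarization transfer can be followed with a small sketch. It assumes, as in the text, that the ¹H and ¹³C contributions to the population differences are in the ratio 4:1 (the true γ ratio is slightly below 4), and it labels the four states by the ¹H and ¹³C orientations. All function names and the 'up'/'down' labels are illustrative choices, not notation from the figure.

```python
def equilibrium_populations(gamma_h=4.0, gamma_c=1.0):
    """Population offsets of the four (1H, 13C) states at Boltzmann equilibrium,
    in units chosen so that a 13C transition has intensity 1 and a 1H one 4."""
    sign = {"up": +1, "down": -1}
    return {(h, c): 0.5 * (sign[h] * gamma_h + sign[c] * gamma_c)
            for h in sign for c in sign}


def carbon_doublet(pop):
    """Intensities of the two 13C transitions (one for each 1H orientation)."""
    return [pop[(h, "up")] - pop[(h, "down")] for h in ("up", "down")]


def proton_doublet(pop):
    """Intensities of the two 1H transitions (one for each 13C orientation)."""
    return [pop[("up", c)] - pop[("down", c)] for c in ("up", "down")]


def invert_proton_transition(pop, c_state):
    """Interchange the populations linked by the 1H transition with 13C = c_state."""
    new = dict(pop)
    new[("up", c_state)], new[("down", c_state)] = pop[("down", c_state)], pop[("up", c_state)]
    return new


start = equilibrium_populations()
after = invert_proton_transition(start, "down")
print(carbon_doublet(start), proton_doublet(start))  # equilibrium intensities
print(carbon_doublet(after))                         # after selective inversion
```

Re-inverting the negative component would give intensities of 5 and 3, a total of 8 compared with the original 2, which is the fourfold gain described above.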

The same general methodology can also be applied to edit (for example) a decoupled ¹³C NMR spectrum into 
four subspectra, for the CH₃, CH₂, CH and C moieties separately. A common variant method called DEPT 
(distortionless enhancement by polarization transfer) uses non-standard pulse angles, and is a rapid and 
reliable way for assigning spectra of medium complexity [20]. 

More generally, note that the application of almost any multiple pulse sequence, where at least two pulses are 
separated by a time comparable to the reciprocal of the coupling constants present, will lead to exchanges of 
intensity between multiplets. These exchanges are the physical method by which coupled spins are correlated 
in 2D NMR methods such as correlation spectroscopy (COSY) [21]. 

B1.11.7 TWO-DIMENSIONAL METHODS 


The remarkable stability and controllability of NMR spectrometers permits not only the precise accumulation 
of FIDs over several hours, but also the acquisition of long series of spectra differing only in some stepped 
variable such as an interpulse delay. A peak at any one chemical shift will typically vary in intensity as this 
series is traversed. All the sinusoidal components of this variation with time can then be extracted, by Fourier 
transformation of the variations. For example, suppose that the normal 1D NMR acquisition sequence 
(relaxation delay, 90° pulse, collect FID)_n is replaced by the 2D sequence (relaxation delay, 90° pulse, delay 
τ, 90° pulse, collect FID)_n and that τ is increased linearly from a low value to create the second dimension. 
The polarization transfer process outlined in the previous section will then cause the peaks of one multiplet to 
be modulated in intensity, at the frequencies of any other multiplet with which it shares a coupling. 


The resulting data set constitutes a rectangular or square array of data points, having a time axis in both 
dimensions. This is converted via Fourier transformation in both dimensions, giving the corresponding array 
of points in a 2D spectrum, with each axis being in δ units for convenience. These are most conveniently 
plotted as a contour map. In the above example the experiment is COSY-90, '90' referring to the second 
pulse, and the map should be at least approximately symmetrical about its diagonal. Any off-diagonal or 
'cross' peaks then indicate the presence of couplings. Thus, a cross-peak with coordinates (δ_A, δ_B) indicates a 
coupling that connects the multiplets A and B. The spectrum is normally simplified by eliminating 
superfluous, mirror-image peaks, either with phase-cycled pulses and appropriate subtractions or by the use of 
carefully controlled, linear pulsed field gradients. No special equipment is needed in a modern spectrometer, 
although the data sets are typically 1 Mbyte or larger. The time requirement is only about 16 times that for a 
1D spectrum, in favourable cases, and may be less if pulsed field gradients are used. 
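
The double Fourier transformation can be pictured with a deliberately small toy model. It is only a sketch under simplifying assumptions: a single signal that precesses at frequency f_b during acquisition and whose amplitude is modulated at f_a as the stepped delay grows, with no relaxation or noise, and with frequencies expressed as integer bins of a 16 × 16 grid. The plain-Python transform and all names are illustrative, not part of any spectrometer software.

```python
import cmath
import math


def dft(x):
    """Plain discrete Fourier transform of a sequence of complex numbers."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def dft2(data):
    """Transform along the acquisition axis (rows), then along the delay axis."""
    rows = [dft(row) for row in data]               # rows[k1][f2]
    cols = [dft(list(col)) for col in zip(*rows)]   # cols[f2][f1]
    return [list(r) for r in zip(*cols)]            # spectrum[f1][f2]


n, f_a, f_b = 16, 3, 5
# Amplitude modulation at f_a in the stepped delay, precession at f_b in the FID.
data = [[math.cos(2 * math.pi * f_a * k1 / n) * cmath.exp(2j * math.pi * f_b * k2 / n)
         for k2 in range(n)] for k1 in range(n)]

spectrum = dft2(data)
top = max(abs(v) for row in spectrum for v in row)
peaks = sorted((f1, f2) for f1, row in enumerate(spectrum)
               for f2, v in enumerate(row) if abs(v) > 0.5 * top)
print(peaks)
```

The toy spectrum contains a peak at (f_a, f_b) and a second one at the bin corresponding to -f_a in the first dimension. This is one way to picture the mirror-image peaks mentioned above, which arise because a pure amplitude modulation cannot distinguish the sign of the frequency.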

B1.11.7.1 HOMONUCLEAR COSY SPECTRA 

Figure B1.11.12 shows the 2D COSY-45 contour plot of the same α-1-methylglucopyranose compound as in 
a previous figure. The corresponding 1D spectrum, plotted directly above, is in fact the projection of the 2D 
spectrum onto the horizontal shift axis. The hydrogen assignments are added underneath the multiplets. The 
analysis of such a spectrum begins by selecting one peak, identifiable either by its distinctive chemical shift or 
by mapping its coupling pattern onto the expected pattern of molecular connectivity. Here, the doublet at 4.73 
ppm uniquely has the high shift expected when a CH group bears two O substituents. The coupling pattern 
can now be identified by noting the horizontal and vertical alignments of all the strongest diagonal and 
off-diagonal resonances. These alignments must be exact, within the limits of the digitization, as the COSY 
process does not cause any shifts. The H-1 to H-2 correlation exemplifies this precision, in that the cross-peak 
whose centre has coordinates at δ (3.47, 4.73) is precisely aligned with the centres of the H-2 and H-1 
multiplets respectively. The same principle allows one to see that the cross-peak at δ (3.47, 3.59) arises from 
the (H-2 to H-3) coupling, whereas the more complex multiplet to the right of it must come from the overlap 
of the (H-3 to H-4) and the (H-5 to H-4) cross-peaks, respectively at δ (3.59, 3.33) and δ (3.56, 3.33). In this 
way it is possible to distinguish the H-3 and H-5 multiplets, respectively at 3.59 and 3.56 ppm, even though 
they overlap in the 1D spectrum. This illustrates the power of multidimensional NMR to separate overlapping 
resonances. In a similar way, the 2D spectrum makes it clear that the OCH₃ resonance at 3.35 ppm has no 
coupling connection with any other hydrogen in the saccharide, so that its shift overlap with H-4 is purely 
accidental. 

Figure B1.11.12. ¹H-¹H COSY-45 2D NMR spectrum of methyl-α-glucopyranose (structure shown). The 
coupling links and the approximate couplings can be deduced by inspection. 

Several other features of the figure merit attention. The two hydroxymethyl resonances, H-6a at 3.79 ppm and 
H-6b at 3.68 ppm, are separated in shift because of the chiral centres present, especially the nearby one at C-5. 
(One should note that a chiral centre will always break the symmetry of a molecule, just as an otherwise 
symmetrical coffee mug loses its symmetry whilst grasped by a right hand.) H-5 is thus distinctive in showing 
three coupling connections, and could be assigned by this alone, in the absence of other information. Also, the 
(H-6a to H-6b) cross-peak gives the general impression of a tilt parallel to the diagonal, whereas some of the 
other cross-peaks show the opposite tilt. This appearance of a tilt in the more complex cross-peaks is the 
deliberate consequence of completing the COSY pulse sequence with a 45° rather than a 90° pulse. It actually 
arises from selective changes of intensity in the component peaks of the off-diagonal multiplet. The 'parallel' 
pattern arises from negative values of the coupling constant responsible for the cross-peak, i.e. the 'active' 
coupling. It therefore usually shows that the active coupling is of ²J or ⁴J type, whereas an 'antiparallel' 
pattern usually arises from a ³J active coupling. 
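
The assignment procedure described for figure B1.11.12, in which one starts from a recognizable resonance and follows the cross-peak alignments, amounts to walking along a connectivity graph. The sketch below uses only the four cross-peak coordinates quoted in the text (the H-6 cross-peaks are omitted because their coordinates are not given). The 0.005 ppm alignment tolerance and the function names are assumptions made for illustration.

```python
# Cross-peak coordinates (ppm) quoted in the text for figure B1.11.12.
cross_peaks = [(3.47, 4.73), (3.47, 3.59), (3.59, 3.33), (3.56, 3.33)]


def neighbours(shift, peaks, tol=0.005):
    """Shifts that share a cross-peak with `shift`, within an assumed tolerance."""
    found = set()
    for a, b in peaks:
        if abs(a - shift) <= tol:
            found.add(b)
        if abs(b - shift) <= tol:
            found.add(a)
    return found


def walk(start, peaks):
    """Follow the cross-peaks outwards from `start`, listing each shift once."""
    order, frontier = [start], [start]
    while frontier:
        current = frontier.pop(0)
        for nxt in sorted(neighbours(current, peaks), reverse=True):
            if nxt not in order:
                order.append(nxt)
                frontier.append(nxt)
    return order


# Start from the distinctive H-1 doublet at 4.73 ppm.
chain = walk(4.73, cross_peaks)
print(chain)
```

Read in order, the chain corresponds to H-1, H-2, H-3, H-4 and H-5, with H-3 (3.59 ppm) and H-5 (3.56 ppm) separated even though they overlap in the 1D spectrum.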

Another advantage of a COSY spectrum is that it can yield cross-peaks even when the active coupling is not 
fully resolved. This can be useful for assigning methyl singlets, for example in steroids, and also for 
exploiting couplings comparable with the troublesomely large linewidths in protein spectra, for example, or 
the ¹¹B{¹H} spectra of boranes. However, it does also mean that longer-range couplings may appear 
unexpectedly, such as the weak (H-1 to H-3), (H-1 to H-6b) and (H-6a to H-4) couplings in figure B1.11.12. 
Their appearance can yield stereochemical information such as the existence of bonding pathways having 
appropriate conformations, and they are usually recognizable by their comparatively weak intensities. 

Many variations of the basic homonuclear COSY experiment have been devised to extend its range. A brief 
guide to some classes of experiment follows, along with a few of the common acronyms. 

(i) Experiments using pulsed field gradients [23]. These can be very rapid if concentration is not a 
limiting factor, but they require added equipment and software. Acronym: a prefix of 'g'. 

(ii) Multiple quantum methods. These employ more complex pulse sequences, with the aim of suppressing 
strong but uninteresting resonances such as methyl singlets. Acronyms include the letters MQ, DQ, 
etc. They are also useful in isotope-selective experiments, such as INADEQUATE [24]. For example, 
the ¹³C-¹³C INADEQUATE experiment amounts to a homonuclear ¹³C COSY experiment, with 
suppression of the strong singlets from uncoupled carbons. 

(iii) Extended or total correlation methods. These reveal linked couplings, involving up to every atom in a 
group, where at least one reasonably large coupling exists for any one member atom. They are 
valuable for identifying molecular moieties such as individual amino acids in a peptide, for they can 
remove ambiguities arising from peak overlaps. Acronym: TOCSY (total correlation spectroscopy). 

(iv) Correlations exploiting nuclear Overhauser enhancements (NOEs) in place of couplings, such as 
NOESY. These are discussed in section B1.11.8. 

(v) Correlations that emphasize smaller couplings. These are simply achieved by the addition of 
appropriate, fixed delays in the pulse sequence. 

Homonuclear techniques such as J-resolved spectroscopy also exist for rotating all multiplets through 90°, to 
resolve overlaps and also give a 1D spectrum from which all homonuclear couplings have been removed [26]. 

The theory of these and other multidimensional NMR methods requires more than can be visualized using 
magnetization vectors. The spin populations must be expressed as the diagonal elements of a density matrix, 
which not only has off-diagonal elements indicating ordinary, single-quantum transitions such as those 
described earlier, but also other off-diagonal elements corresponding to multiple-quantum transitions or 
'coherences'. The pulses and time developments must be treated as operators. The full theory also allows 
experiments to be designed so as to minimize artefacts [27, 28 and 29]. 

B1.11.7.2 HETERONUCLEAR CORRELATION SPECTRA 

Similar experiments exist to correlate the resonances of different types of nucleus, e.g. ¹³C with ¹H, provided 
that some suitable couplings are present, such as ¹J_CH. It is necessary to apply pulses at both the relevant 
frequencies and it is also desirable to be able to detect either nucleus, to resolve different peak clusters. 
Detection through the nucleus with the higher frequency is usually called reverse-mode detection and 
generally gives better sensitivity. The spectrum will have the two different chemical shift scales along its axes 
and therefore will not be symmetrical. 

A ¹H(detected)-¹³C shift correlation spectrum (common acronym HMQC, for heteronuclear multiple 
quantum coherence, but sometimes also called COSY) is a rapid way to assign peaks from protonated 
carbons, once the hydrogen peaks are identified. With changes in pulse timings, this can also become the 
HMBC (heteronuclear multiple bond connectivity) experiment, where the correlations are made via the 
smaller ²J_CH and ³J_CH couplings. This helps to assign quaternary carbons and also to identify coupling and, 
hence, chemical links, where H-H couplings are not available. Similar experiments exist for almost any useful 
pairing of nuclei: those to ¹⁵N are particularly useful in the spectra of suitably labelled peptides. Figure 
B1.11.13 shows a simple ¹³C-¹H shift correlation spectrum of the same saccharide as in the previous two 
figures, made by exploiting the ¹J_CH couplings. The detection was via ¹³C and so the spectrum has good 
resolution in the ¹³C dimension, here plotted as the horizontal axis, but rather basic resolution in the vertical 
¹H dimension. The ¹H resolution is nonetheless sufficient to show that C-6 is a single type of carbon attached 
to two distinct types of hydrogen. Inspection of the ¹H axis shows that the approximate centre of each 2D 
multiplet matches the shifts in the previous spectra; this affords an unambiguous assignment of the carbons. 

Even more complex sequences can be applied for the simultaneous correlation of, for example, ¹⁵N shifts 
with ¹H-¹H COSY spectra. Because the correlation of three different resonances is highly specific, the chance 
of an accidental overlap of peaks is greatly reduced, so that very complex molecules can be assigned. 
Furthermore, the method once again links atoms where no simple H-H couplings are available, such as across 
a peptide bond. Such 3D NMR methods generate enormous data sets and, hence, very long accumulation 
times, but they allow the investigation of a wide range of labelled biomolecules. 

Figure B1.11.13. ¹³C-¹H shift correlation via ¹J_CH. This spectrum of methyl-α-glucopyranose 
(structure as in figure B1.11.12) permits unambiguous ¹³C assignments 
once the ¹H assignments have been determined as in figure B1.11.12. 


B1.11.8 SPATIAL CORRELATIONS 


J couplings are not the only means in NMR for showing that two atoms lie close together. Pairs of atoms, 
particularly ¹H, also affect each other through their dipolar couplings. Even though the dipolar splittings are 
averaged away when a molecule rotates isotropically, the underlying magnetic interactions are still the major 
contributors to the ¹H T₁. Each pairwise interaction of a particular H_A with any other H_B, separated by a 
distance r_AB in a rigid molecule, will contribute to 1/T₁(A) in proportion to (r_AB)⁻⁶. 

Although it is possible to detect this interaction by the careful measurement of all the T₁s in the molecule, it 
may be detected far more readily and selectively via the mutual NOE. When a single multiplet in a ¹H NMR 
spectrum is selectively saturated, the resulting input of energy leaks to other nuclei, through all the dipolar 
couplings that are present in the molecule [29]. This alters the intensities and likewise the integrals of their 
resonances. The A spin of an isolated pair of spins, A and B, in a molecule tumbling fairly rapidly, will have 
its intensity increased by the multiplicative factor 1 + γ_B/2γ_A. This amounts to a gain of 50% when A and B 
are both ¹H, although it will normally be less in practice, because of the competing interactions of other spins 
and other relaxation mechanisms. If B is ¹H and A is ¹³C, the corresponding enhancement is 199%, i.e. an 
intensity factor of 2.99. In this case there need be no competition, for all the ¹H resonances can be saturated 
simultaneously, using the techniques of broadband decoupling. Also, dipolar coupling usually dominates the 
relaxation of all carbons bearing hydrogens. The threefold intensity gain is a valuable bonus in ¹³C{¹H} NMR. 
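
The multiplicative factor 1 + γ_B/2γ_A quoted above is easy to evaluate. The following is a minimal sketch, not part of the original text: the gyromagnetic ratios are standard tabulated values supplied here as an assumption, and the function name is purely illustrative. It gives the maximum factor for an isolated, rapidly tumbling spin pair relaxing only through their mutual dipolar coupling.

```python
# Gyromagnetic ratios in rad s^-1 T^-1 (standard tabulated values, assumed here;
# they are not quoted in the text).
GAMMA = {"1H": 267.522e6, "13C": 67.283e6}


def noe_intensity_factor(gamma_observed, gamma_saturated):
    """Maximum intensity factor 1 + gamma_B / (2 gamma_A) for observed spin A
    when spin B is saturated (isolated pair, rapid tumbling)."""
    return 1.0 + gamma_saturated / (2.0 * gamma_observed)


if __name__ == "__main__":
    hh = noe_intensity_factor(GAMMA["1H"], GAMMA["1H"])
    ch = noe_intensity_factor(GAMMA["13C"], GAMMA["1H"])
    print(f"1H observed, 1H saturated: factor {hh:.2f}, gain {100 * (hh - 1):.0f}%")
    print(f"13C observed, 1H saturated: factor {ch:.2f}, gain {100 * (ch - 1):.0f}%")
```

The homonuclear case returns a factor of exactly 1.5, and the ¹³C{¹H} case a factor close to 3, in line with the 'threefold intensity gain' mentioned in the text.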

B1.11.8.1 NOE-DIFFERENCE SPECTRA 


Even if the intensity changes are only of the order of 1%, they may nevertheless be reliably detected using 
difference spectra [30]. Figure B1.11.14 shows an NOE-difference ¹H NMR spectrum of slightly impure 
aspirin, over a normal spectrum of the same sample. The difference spectrum was obtained by gently 
saturating the methyl singlet at 2.3 ppm, for a period of 2 s just before collecting the spectrum, and then by 
precisely subtracting the corresponding spectrum acquired without this pre-irradiation. As the pre-irradiation 
selectively whitewashes the methyl peak, the result of the subtraction is to create a downwards-going resultant 
peak. The other peaks subtract away to zero, unless the energy leakage from the pre-irradiation has altered 
their intensity. In the figure, the four aromatic H resonances to the left subtract to zero, apart from minor 
errors arising from slight instabilities in the spectrometer. In contrast, the appearance of the broad OH 
resonance at 5 ppm in the upper spectrum proves that this hydrogen lies close to the CH₃ group. Thus, aspirin 
must largely possess the conformation shown on the right. If the left-hand conformation had been significant, 
then the aromatic H at 7.2 ppm would have received a significant enhancement. 

Figure B1.11.14. ¹H NOE-difference spectrum (see the text) of aspirin, with pre-saturation at the methyl 
resonance, proving that the right-hand conformation is dominant. 

NOE-difference spectroscopy is particularly valuable for distinguishing stereoisomers, for it relies solely on 
internuclear distances, and thus avoids any problems of ambiguity or absence associated with couplings. With 
smallish molecules, it is best carried out in the above 1D manner, because ~2 s are necessary for the 
transmission of the NOE. The transmission process becomes more efficient with large molecules and is 
almost optimal for proteins. However, problems can occur with molecules of intermediate size [31]. A 2D 
version of the NOE-difference experiment exists, called NOESY. 

B1.11.8.2 PROTEIN STRUCTURES 

If multidimensional spectra of both COSY and NOESY types can be obtained for a protein, or any 
comparable structured macromolecule, and if a reasonably complete assignment of the resonances is achieved, 
then the NOESY data can be used to determine its structure [31, 32 and 33]. Typically, several hundred 
approximate H-H distances will be found via the NOESY spectrum of a globular protein having mass around 
20 000 Da. These can then be used as constraints in a molecular modelling calculation. The resulting 
structures can compare in quality with those from x-ray crystallography, but do not require the preparation of 
crystals. Related spectra can also elucidate the internal flexibility of proteins, their folding pathways and their 
modes of interaction with other molecules. Such information is vital to the pharmaceutical industry in the 
search for new drugs. It also underpins much biochemistry. 

B1.12 NMR of solids 

R Dupree and M E Smith 


B1.12.1 INTRODUCTION 

Solid-state NMR has long been used by physicists to study a wide range of problems such as 
superconductivity, magnetism, the electronic properties of metals and semiconductors, ionic motion etc. The 
early experiments mostly used 'wide line' NMR where high resolution was not required but, with the 
development of the technique, particularly the improvements in resolution and sensitivity brought about by 
magic angle spinning (B1.12.4.3), and decoupling and cross polarization (B1.12.4.4), solid-state NMR has 
become much more widely used throughout the physical and, most recently, biological sciences. Although 
organic polymers were the first major widespread application of high-resolution solid-state NMR, it has found 
application to many other types of materials, from inorganics such as aluminosilicate microporous materials, 
minerals and glasses to biomembranes. Solid-state NMR has become increasingly multinuclear and the utility of 
the technique is evidenced by the steady and continued increase in papers that use the technique to 
characterize materials. There is no doubt that the solid-state NMR spectrometer has become a central piece of 
equipment in the modern materials physics and chemistry laboratories. 


The principal difference from liquid-state NMR is that the interactions which are averaged by molecular 
motion on the NMR timescale in liquids lead, because of their anisotropic nature, to much wider lines in 
solids. Extra information is, in principle, available but is often masked by the lower resolution. Thus, many of 
the techniques developed for liquid-state NMR are not currently feasible in the solid state. Furthermore, the 
increased linewidth and the methods used to achieve high resolution put more demands on the spectrometer. 
Nevertheless, the field of solid-state NMR is advancing rapidly, with a steady stream of new experiments 
forthcoming. 

This chapter summarizes the interactions that affect the spectrum, describes the type of equipment needed and 
the performance that is required for specific experiments. As well as describing the basic experiments used in 
solid-state NMR, and the more advanced techniques used for distance measurement and correlation, some 
emphasis is given to nuclei with spin I > 1/2, since the study of these is most different from liquid-state NMR. 


B1.12.2 FUNDAMENTALS 

B1.12.2.1 INTERACTION WITH EXTERNAL MAGNETIC FIELDS 

NMR is accurately described as a coherent radiofrequency (RF) spectroscopy of the nuclear magnetic energy 
levels. The physical basis of the technique is the lifting of the degeneracy of the different m nuclear spin 
states through interaction with an external magnetic field, creating a set of energy levels. The total energy 
separation between these levels is determined by a whole range of interactions. The nuclear spin Hamiltonian 
has parts corresponding to the experimental conditions, which are termed external, and parts that result from 
the sample itself, which are called internal. The internal part provides information about the physical and 
electronic structure of the sample. 


The total interaction energy of the nucleus may be expressed as a sum of the individual Hamiltonians given in 
equation (B1.12.1); these are listed in table B1.12.1 and are discussed in detail in several excellent books 
[1, 2, 3 and 4]. 

\[ H_{total} = H_Z + H_{RF} + H_D + H_{CS} + H_K + H_J + H_P + H_Q^{(1)} + H_Q^{(2)} + \cdots \]    (B1.12.1) 

The large static applied magnetic field (B₀) produces the Zeeman interaction (= −γħB₀I_z, where I_z is the 
z-component of I (the nuclear spin) with eigenvalues m_z (−I ≤ m_z ≤ I), figure B1.12.1(a)) with the nuclear 
magnetic dipole moment μ (= γħI, where γ is the gyromagnetic ratio of the nucleus). The B₀ field is taken to 
define the z-axis in the laboratory frame and gives an interaction energy of 

\[ H_Z = -\boldsymbol{\mu}\cdot\mathbf{B}_0 = -\gamma\hbar B_0 m_z = -\hbar\omega_0 m_z \]    (B1.12.2) 

where ω₀ is the Larmor frequency expressed as an angular frequency. In this chapter only the high field limit is considered 
whereby the nuclear spin states are well described by the Zeeman energy levels, and all the other interactions 
can be regarded as perturbations of these spin states. Any nucleus that possesses a magnetic moment is 
technically accessible to study by NMR, thus only argon and cerium of the stable elements are excluded as 
they possess only even-even isotopes. 
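
As a worked illustration of equation (B1.12.2), the sketch below evaluates the Zeeman level energies −γħB₀m_z and the Larmor frequency ω₀/2π = γB₀/2π. It is only a sketch: the gyromagnetic ratios are standard tabulated values assumed here rather than taken from the text, and the function names are illustrative.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s

# Gyromagnetic ratios in rad s^-1 T^-1 (standard tabulated values, assumed here).
GAMMA = {"1H": 267.522e6, "13C": 67.283e6, "29Si": -53.190e6}


def larmor_frequency_hz(gamma, b0):
    """Magnitude of the Larmor frequency, |gamma| B0 / (2 pi), in Hz."""
    return abs(gamma) * b0 / (2.0 * math.pi)


def zeeman_energies(gamma, b0, spin):
    """Map each m_z (from +I down to -I) to its energy -gamma hbar B0 m_z in J."""
    n_levels = int(round(2 * spin)) + 1
    return {spin - k: -gamma * HBAR * b0 * (spin - k) for k in range(n_levels)}


if __name__ == "__main__":
    for nucleus, b0 in (("1H", 9.4), ("29Si", 8.45)):
        mhz = larmor_frequency_hz(GAMMA[nucleus], b0) / 1e6
        print(f"{nucleus} at {b0} T: Larmor frequency {mhz:.1f} MHz")
```

For a spin-1/2 nucleus the two returned energies differ by γħB₀, which is ħω₀, so the level separation and the Larmor frequency carry the same information.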

Table B1.12.1 Summary of main interactions important to NMR. 

H_m    Interaction          T_m       A_m   Typical size (Hz)   Comments 
H_Z    Zeeman               Unitary   B₀    10⁷-10⁹             Interaction with main magnetic field 
H_RF   RF                   Unitary   B₁    10³-10⁵             Interaction with RF field 
H_D    Dipolar              D         I     10³-10⁴             Through-space spin-spin interaction, axially symmetric traceless tensor 
H_CS   Chemical shielding   σ         B₀    10²-10⁵             Alteration of the magnetic field by the electrons 
H_J    Indirect spin        J         I     1-10³               Spin-spin interaction mediated via the bonding electrons through the contact interaction 
H_P    Paramagnetic                   S     10²-10⁵             Interaction with isolated unpaired electrons 
H_K    Knight shift         K         S     10²-10⁵             Interaction with conduction electrons via the contact interaction 
H_Q    Quadrupolar          eq        I     10³-10⁷             Interaction of nuclear quadrupolar moment with the electric field gradient (q) and I twice to produce effectively an I² interaction 

Figure B1.12.1. (a) Energy level diagram for a half-integer spin quadrupolar nucleus showing the effects of the 
Zeeman interaction and first- and second-order quadrupolar effect. The resulting spectra show static powder 
spectra for (b) first-order perturbation for all transitions and (c) second-order broadening of the central 
transition. (d) The MAS spectrum for the central transition. 


For a spin-1/2 nucleus the two m_z values ±1/2 have energies of ∓γħB₀/2, giving an energy separation of γħB₀. 
Thermal equilibrium produces a Boltzmann distribution between these energy levels and produces the bulk 
nuclear magnetization of the sample through the excess population, which for a sample containing a total of N 
spins is ~NγħB₀/2kT. For example, for ²⁹Si in an applied magnetic field of 8.45 T the excess in the 
populations at room temperature is only 1 in 10⁵, so that only a small number of the total number of spins 
contribute to the signal, and this is possibly the greatest weakness of NMR. As the magnetization is directly 
proportional to the applied magnetic field, sensitivity provides one of the drives to higher applied magnetic 
fields. The dependence on γ explains why some nuclei are more favoured, as the ease with which a nucleus can 
be observed depends upon the receptivity (R_x) 

\[ R_x = \gamma_x^3 C_x I_x (I_x + 1) \]    (B1.12.3) 

where C_x is the natural abundance of the isotope being considered [3]. 
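
The population excess and the receptivity of equation (B1.12.3) can be checked numerically. The sketch below is illustrative only: the gyromagnetic ratios and natural abundances are standard tabulated values assumed here (they are not given in the text), 298 K is assumed for 'room temperature', and the function names are hypothetical.

```python
HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J K^-1

# (gyromagnetic ratio / rad s^-1 T^-1, natural abundance / %, spin I);
# standard tabulated values, assumed for illustration.
NUCLEI = {
    "1H": (267.522e6, 99.9885, 0.5),
    "13C": (67.283e6, 1.07, 0.5),
    "29Si": (-53.190e6, 4.685, 0.5),
}


def fractional_excess(gamma, b0, temperature):
    """High-temperature estimate of the spin-1/2 population excess,
    gamma hbar B0 / (2 k T), as a fraction of the total number of spins."""
    return abs(gamma) * HBAR * b0 / (2.0 * K_B * temperature)


def receptivity(gamma, abundance, spin):
    """Receptivity |gamma|^3 C I (I + 1) in arbitrary units (equation B1.12.3)."""
    return abs(gamma) ** 3 * abundance * spin * (spin + 1.0)


def relative_receptivity(nucleus, reference="13C"):
    """Receptivity of a nucleus relative to a reference nucleus."""
    return receptivity(*NUCLEI[nucleus]) / receptivity(*NUCLEI[reference])


if __name__ == "__main__":
    excess = fractional_excess(NUCLEI["29Si"][0], 8.45, 298.0)
    print(f"29Si at 8.45 T, 298 K: fractional excess {excess:.1e}")
    for name in ("1H", "29Si"):
        print(f"{name} receptivity relative to 13C: {relative_receptivity(name):.3g}")
```

With these inputs the ²⁹Si excess comes out at a few parts per million, the same order of magnitude as the figure quoted in the text.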

In spectroscopy involving electromagnetic (em) radiation both spontaneous and stimulated events can occur. 

NMR is a relatively low-frequency em spectroscopy so that spontaneous events are very unlikely (-10 s ) 
and stimulated events therefore dominate. This means that NMR is a coherent spectroscopy so that the excess 
in population can be made to work in a concerted way. The NMR experiment involves measurement of the 
energy level separation by application of a time-varying magnetic field B₁ orthogonal to B₀. B₁ excites 
transitions (through I₊ and I₋, the conventional spin raising and lowering operators) when its frequency (ω) is 
close to ω₀, typically in the RF region of 10 MHz to 1 GHz. The Hamiltonian for this interaction is 

\[ H_{RF} = (-\gamma\hbar B_1/2)\,(I_+ e^{-i\omega t} + I_- e^{i\omega t}). \]    (B1.12.4) 

B1.12.2.2 LOCAL INTERACTIONS 

In diamagnetic insulating solids spin-1/2 nuclei experience a range of interactions that include the magnetic 
dipolar interaction (H_D), acting through space with nearby nuclear magnetic moments; chemical shielding 
(H_CS), the modification of the magnetic field at the nucleus by the surrounding electrons; and indirect 
spin-spin coupling (H_J), the interaction of magnetic moments mediated by intermediate electron spins. In 
materials that contain paramagnetic centres the unpaired electrons can interact strongly with the nuclei (H_P) 
and possibly cause very large shifts and severe broadening of the NMR signal. The fluctuating magnetic fields 
produced by the electron spins can produce very efficient relaxation. Hence, for solids in which the nuclei 
relax slowly and which will dissolve paramagnetic ions, small amounts (~0.1 mol%) are added to aid 
relaxation. In materials containing conduction electrons these can also interact strongly with the nuclear spin 
via a contact interaction H_K that produces relaxation and a change in resonance position termed the Knight 
shift, both of which provide important information on the nature of the density of states at the Fermi surface. 
Nuclei with spin I > 1/2 are also affected by the electric quadrupole interaction (H_Q), an interaction between 
the nuclear electric quadrupole moment and the gradient in the electric field at the nucleus. Although this is 
an electrical interaction it depends on the magnetic quantum number (m_z) and so affects the NMR spectrum 
[2]. The background of the quadrupole interaction is given in the classic paper by Cohen and Reif [5]. 

All interactions associated with NMR can be expressed as tensors and may be represented by a general 
expression 

\[ H_m = k\, \mathbf{I} \cdot \mathbf{T}_m \cdot \mathbf{A}_m \]    (B1.12.5) 

where H_m is one of the component Hamiltonians in equation (B1.12.1). For each interaction there is a constant 
k, a 3 × 3 second-rank tensor T_m and then another vector quantity A_m that the spin (I) interacts with: this is 
either a field or a spin. Normally three numbers are needed to describe a 3 × 3 tensor relating two vectors and 
these numbers are usually the isotropic value, the anisotropy and the asymmetry [6]. Their exact definition 
can vary even though there are conventions that are normally used, so that any paper should be examined 
carefully to see how the quantities are being defined. Note that the chemical shielding is fundamentally 
described by the tensor σ although in experimental data it is the chemical shift δ that is normally reported. The 
shielding and the shift are related by δ = 1 − σ [3, 6]. 

In a typical isotropic powder the random distribution of particle orientations means the principal axes systems 
(where the tensor only has diagonal elements) will be randomly distributed in space. In the presence of a large 
magnetic field, this random distribution gives rise to broadening of the NMR spectrum since the exact 
resonance frequency of each crystallite will depend on its orientation relative to the main magnetic field. 
Fortunately, to first order, all these interactions have similar angular dependences of (3cos²θ − 1 + η sin²θ 
cos 2φ), where η is the asymmetry parameter of the interaction tensor (η = 0 for axial symmetry). Lineshapes 
can provide very important information constraining the local symmetry of the interaction that can often be 
related to some local structural symmetry. 
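
The first-order angular factor (3cos²θ − 1 + η sin²θ cos 2φ) has two properties that are easily verified numerically: its average over all crystallite orientations is zero, so a powder pattern is broadened about the isotropic position rather than shifted, and for axial symmetry it vanishes at θ = arccos(1/√3), the 'magic angle' of the MAS technique. The sketch below is illustrative only; the function names and grid sizes are arbitrary choices, not taken from the text.

```python
import math


def angular_factor(theta, phi, eta=0.0):
    """First-order angular dependence 3cos^2(theta) - 1 + eta sin^2(theta) cos(2 phi)."""
    return (3.0 * math.cos(theta) ** 2 - 1.0
            + eta * math.sin(theta) ** 2 * math.cos(2.0 * phi))


def powder_average(eta, n_theta=200, n_phi=200):
    """Average of the angular factor over the sphere (midpoint rule, sin(theta) weights)."""
    total = 0.0
    weight = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * math.pi / n_theta
        w = math.sin(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * 2.0 * math.pi / n_phi
            total += w * angular_factor(theta, phi, eta)
            weight += w
    return total / weight


# Angle at which the axially symmetric factor 3cos^2(theta) - 1 vanishes.
MAGIC_ANGLE = math.acos(1.0 / math.sqrt(3.0))

if __name__ == "__main__":
    print(f"powder average, eta = 0.0: {powder_average(0.0):.1e}")
    print(f"powder average, eta = 0.7: {powder_average(0.7):.1e}")
    print(f"magic angle: {math.degrees(MAGIC_ANGLE):.2f} degrees")
```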

Of the NMR-active nuclei around three-quarters have I > 1/2 so that the quadrupole interaction can affect their 
spectra. The quadrupole interaction can be significant relative to the Zeeman splitting. The splitting of the 
energy levels by the quadrupole interaction alone gives rise to pure nuclear quadrupole resonance (NQR) 
spectroscopy. This chapter will only deal with the case when the quadrupole interaction can be regarded as 
simply a perturbation of the Zeeman levels. 

The electric field gradient is again a tensor interaction that, in its principal axis system (PAS), is described by 
the three components V_x'x', V_y'y' and V_z'z', where ' indicates that the axes are not necessarily coincident with 
the laboratory axes defined by the magnetic field. Although the tensor is completely defined by these 
components it is conventional to recast these into 'the electric field gradient' eq = V_z'z', the largest component, 
and the asymmetry parameter η_Q = |V_y'y' − V_x'x'|/V_z'z'. The electric field gradient is set up by the charge 
distribution outside the ion (e.g. Al³⁺) but the initially spherical inner shells of electrons of an ion will 
themselves become polarized by the presence of the electric field gradient to lower their energy in the electric 
field. This polarization produces an electric field gradient at the nucleus itself of eq_n = eq(1 − γ_∞), where 
(1 − γ_∞) is the Sternheimer antishielding factor, which is a measure of the magnification of eq due to this 
polarization of the core electron shells [7]. Full energy band structure calculations of electric field gradients 
show how important the contribution of the electrons on the ion itself is compared to that of the lattice. The 
quadrupole Hamiltonian (considering axial symmetry for simplicity) in the laboratory frame, with θ the angle 
between the z'-axis of the quadrupole PAS and B₀, is given by equation (B1.12.6), 
where C_Q = (e²qQ/h)(1 − γ_∞) and a = I(I + 1). In the limit H_Z ≫ H_Q a standard perturbation expansion using 
the eigenstates of H_Z is applicable. The first-order term splits the spectrum into 2I components (figure 
B1.12.1(a)) of intensity |⟨m − 1|I_x|m⟩|² (∝ a − m(m − 1)) at frequency 


\[ \nu_m^{(1)} = \frac{3C_Q}{4I(2I-1)}\,(3\cos^2\theta - 1)\,(m - 1/2). \]    (B1.12.7) 

This perturbation can cause the non-central transitions (i.e. m ≠ 1/2) to be shifted (figure B1.12.1(b)) 
sufficiently far from the Larmor frequency such that these transitions become difficult to observe with 
conventional pulse techniques. This is particularly important for spin-1 nuclei (of which the most important 
ones are ²H and ¹⁴N) as there is no central transition (m_z = 1/2) and all transitions are broadened to first order. 
Fortunately, for non-integer quadrupolar nuclei, the central transition has ν_m^(1) = 0 and the dominant 
perturbation is second order only (equation (B1.12.8)), which gives a characteristic lineshape (figure B1.12.1(c) 
for axial symmetry): 

\[ \nu_{1/2}^{(2)} = -\frac{9C_Q^2}{64\nu_0 I^2(2I-1)^2}\,(a - 3/4)\,(1-\cos^2\theta)\,(9\cos^2\theta - 1). \]    (B1.12.8) 

This angular dependence is different from the first-order perturbations so that the conventional technique of 
removing linebroadening in solids, MAS (see below), cannot completely remove this interaction at the same 
time as removing the first-order broadening. Hence, the resolution of MAS spectra from quadrupolar nuclei is 
usually worse than for spin-1/2 nuclei and often characteristic lineshapes are observed. If this is the case, it is 
usually possible to deduce the NMR interaction parameters C_Q, η and δ_iso, providing valuable information 
about the sample. 
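
One way to see why spinning at a single angle cannot remove the second-order term is to expand the angular factor (1 − cos²θ)(9cos²θ − 1) in Legendre polynomials. The sketch below does this under an assumption that is not stated in the text: the standard result that rapid sample spinning at an angle β scales each rank-l anisotropic term by P_l(cos β). The expansion coefficients were worked out by hand for this illustration and are checked numerically in the code.

```python
import math


def p2(x):
    """Legendre polynomial P2(x)."""
    return 0.5 * (3.0 * x * x - 1.0)


def p4(x):
    """Legendre polynomial P4(x)."""
    return (35.0 * x ** 4 - 30.0 * x * x + 3.0) / 8.0


def second_order_factor(x):
    """Angular factor (1 - x^2)(9x^2 - 1) with x = cos(theta)."""
    return (1.0 - x * x) * (9.0 * x * x - 1.0)


# Hand-derived expansion: (1 - x^2)(9x^2 - 1) = C0 + C2*P2(x) + C4*P4(x)
C0, C2, C4 = 8.0 / 15.0, 32.0 / 21.0, -72.0 / 35.0


def legendre_form(x):
    """The same angular factor written as a Legendre series."""
    return C0 + C2 * p2(x) + C4 * p4(x)


COS_MAGIC = 1.0 / math.sqrt(3.0)  # cosine of the magic angle

if __name__ == "__main__":
    print(f"P2 at the magic angle: {p2(COS_MAGIC):.3f}")
    print(f"P4 at the magic angle: {p4(COS_MAGIC):.3f}")
```

At the magic angle P2 vanishes, which removes the first-order (rank-2) broadening, but P4 does not, so under this assumption a scaled rank-4 anisotropy survives together with the orientation-independent term C0.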

B1.12.2.3 BASIC EXPERIMENTAL PRINCIPLES OF FT NMR 

The essence of NMR spectroscopy is to measure the separation of the magnetic energy levels of a nucleus. 
The original method employed was to scan either the frequency of the exciting oscillator or to scan the 
applied magnetic field until resonant absorption occurred. However, compared to simultaneous excitation of a 
wide range of frequencies by a short RF pulse, the scanned approach is a very time-inefficient way of 
recording the spectrum. Hence, with the advent of computers that could be dedicated to spectrometers and 
efficient Fourier transform (FT) algorithms, pulsed FT NMR became the normal mode of operation. 
Operating at constant field and frequency also produced big advantages in terms of reproducibility of results 
and stability of the applied magnetic field. In an FT NMR experiment a pulse of RF close to resonance, of 
duration T_p, is applied to a coil. If the pulse is applied exactly on resonance (i.e. the frequency of the applied 
em radiation exactly matches that required for a transition) it produces a resultant magnetic field orthogonal to 
B₀ in the rotating frame which causes a coherent oscillation of the magnetization. The magnetization is 
consequently tipped by an angle θ (= γB₁T_p) away from the direction defined by B₀. After a π/2-pulse all the 
magnetization is in a plane transverse to the direction of B₀ and, hence, is termed transverse magnetization. B₀ 
exerts a torque on the transverse magnetization, which will consequently Larmor precess about B₀. The 
rotating magnetization then provides an alternating flux linkage with the NMR coil that, through Faraday's 
law of electromagnetic induction, will produce a voltage in the NMR coil. The transverse magnetization 
decays through relaxation processes (see chapter B1.13). The observed signal is termed the free induction 
decay (FID). Adding n FIDs together coherently improves S/N by n^(1/2) compared to a single FID. 
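
The n^(1/2) gain from coherent addition can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not in the text: the 'FID' is replaced by a constant unit signal at every point, the noise is Gaussian and uncorrelated between scans, and the function name and parameters are hypothetical.

```python
import math
import random


def averaged_snr(n_scans, n_points=2000, noise_sd=1.0, seed=0):
    """Co-add n_scans noisy copies of a constant unit signal and return the
    ratio of the accumulated signal to the standard deviation of the
    accumulated noise."""
    rng = random.Random(seed)
    acc = [0.0] * n_points
    for _ in range(n_scans):
        for i in range(n_points):
            acc[i] += 1.0 + rng.gauss(0.0, noise_sd)
    mean = sum(acc) / n_points
    variance = sum((a - mean) ** 2 for a in acc) / (n_points - 1)
    return mean / math.sqrt(variance)


if __name__ == "__main__":
    single = averaged_snr(1)
    for n in (4, 16):
        gain = averaged_snr(n) / single
        print(f"{n} scans: S/N gain {gain:.2f} (sqrt(n) = {math.sqrt(n):.2f})")
```

The signal adds linearly with the number of scans while the noise adds in quadrature, so the simulated gain should track n^(1/2) to within the statistical scatter of the estimate.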


In the linear approximation there is a direct Fourier relationship between the FID and the spectrum and, in the 
great majority of experiments, the spectrum is produced by Fourier transformation of the FID. It is a tacit 
assumption that everything behaves in a linear fashion with, for example, uniform excitation (or effective RF 
field) across the spectrum. For many cases this situation is closely approximated but distortions may occur for 
some of the broad lines that may be encountered in solids. The power spectrum P(ν) of a pulse applied at ν₀ is 
given by a sinc² function [8] 

\[ P(\nu) = \frac{\sin^2[\pi T_p(\nu_0 - \nu)]}{(\nu_0 - \nu)^2}. \]    (B1.12.9) 


The spectral frequency range covered by the central lobe of this sinc² function increases as the pulse length 
decreases. For a spectrum to be undistorted it should really be confined to the middle portion of this central 
lobe (figure B1.12.2). There are a number of examples in the literature of solid-state NMR where the 
resonances are in fact broader than the central lobe so that the 'spectrum' reported is only effectively 
providing information about the RF-irradiation envelope, not the shape of the signal from the sample itself. 
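
The width of the central lobe follows directly from the sinc² form of the pulse power spectrum, sin²[πT_p(ν₀ − ν)]/(ν₀ − ν)², whose first nulls lie at offsets of ±1/T_p from ν₀. The sketch below evaluates the excitation power relative to its on-resonance value; the 2 µs pulse length is a hypothetical example value and the function names are illustrative.

```python
import math


def pulse_power(offset_hz, t_p):
    """sin^2(pi T_p x) / x^2 at offset x = nu_0 - nu (Hz); the x -> 0 limit
    is (pi T_p)^2."""
    if offset_hz == 0.0:
        return (math.pi * t_p) ** 2
    return math.sin(math.pi * t_p * offset_hz) ** 2 / offset_hz ** 2


def relative_power(offset_hz, t_p):
    """Excitation power at a given offset relative to that on resonance."""
    return pulse_power(offset_hz, t_p) / pulse_power(0.0, t_p)


def central_lobe_width_hz(t_p):
    """Full width between the first nulls at offsets of +/- 1/T_p."""
    return 2.0 / t_p


if __name__ == "__main__":
    t_p = 2.0e-6  # hypothetical 2 microsecond pulse
    print(f"central lobe width: {central_lobe_width_hz(t_p) / 1e6:.1f} MHz")
    for offset in (50e3, 100e3, 250e3):
        print(f"offset {offset / 1e3:.0f} kHz: relative power {relative_power(offset, t_p):.2f}")
```

For this pulse length the excitation has already dropped noticeably at an offset of 100 kHz, which illustrates why only the middle portion of the central lobe gives an essentially undistorted spectrum.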

The sinc² function describes the best possible case; there is often a much stronger frequency dependence of 
the power output delivered at the probe-head. (It should be noted here that other excitation schemes are 
possible, such as adiabatic passage [9] and stochastic excitation [10], but these are only infrequently applied.) 
The excitation/recording of the NMR signal is further complicated as the pulse is then fed into the probe 
circuit, which itself has a frequency response. As a result, a broad line will not only experience non-uniform 
irradiation but also the intensity detected per spin at different frequency offsets will depend on this probe 
response, which depends on the quality factor (Q). The quality factor is a measure of the sharpness of the 
resonance of the probe circuit and one definition is the resonance frequency/halfwidth of the resonance 
response of the circuit (also = ω₀L/R, where L is the inductance and R is the probe resistance). Hence, the 
width of the frequency response decreases as Q increases so that, typically, for a Q of 100, the halfwidth of 
the frequency response at 100 MHz is about 1 MHz. Hence, direct observation of broad spectral lines by 
FT-pulse techniques becomes impractical for linewidths greater than ~200 kHz. For a great majority of 
NMR studies carried out on nuclei such as ¹H, ¹³C and ²⁹Si this does not really impose any limitation on their 
observation. Broader spectral lines can be reproduced by pulse techniques, provided that corrections are made 
for the RF-irradiation and probe responses, but this requires careful calibration. Such corrections have been 
most extensively used for examining satellite transition spectra from quadrupolar nuclei [11]. 
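
The two definitions of the quality factor given above can be put side by side in a short calculation. This is a minimal sketch: the function names are illustrative, and the inductance and resistance used in the example are hypothetical values chosen only to give Q = 100, not figures from the text.

```python
import math


def probe_halfwidth_hz(resonance_hz, q):
    """Halfwidth of the probe response, resonance frequency / Q."""
    return resonance_hz / q


def quality_factor(resonance_hz, inductance_h, resistance_ohm):
    """Q = omega_0 L / R for a coil of inductance L and resistance R."""
    return 2.0 * math.pi * resonance_hz * inductance_h / resistance_ohm


if __name__ == "__main__":
    # Example from the text: Q of 100 at 100 MHz.
    print(f"halfwidth: {probe_halfwidth_hz(100e6, 100) / 1e6:.1f} MHz")
    # Hypothetical coil: 0.1 microhenry with about 0.63 ohm resistance.
    q = quality_factor(100e6, 1.0e-7, 2.0 * math.pi * 0.1)
    print(f"Q from omega_0 L / R: {q:.0f}")
```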

Figure B1.12.2. Power distribution for an RF pulse of duration T_p applied at frequency ν₀. 


Another problem in many NMR spectrometers is that the start of the FID is corrupted due to various 
instrumental deadtimes that lead to intensity problems in the spectrum. The spectrometer deadtime is made up 
of a number of sources that can be apportioned to either the probe or the electronics. The loss of the initial 
part of the FID is manifest in a spectrum as a rolling baseline and the preferential loss of broad components of 
the spectrum. In the best cases the deadtime is <2 µs, but even this can still lead to severe distortion of broad 
spectral lines. Baseline correction can be achieved by use of either simple spline fits using spectrometer 
software (including back-prediction) or the use of analytical functions which effectively amount to 
full-intensity reconstruction. Many spectrometer software packages now contain correction routines, but all 
such procedures should be used with extreme caution. 


B1.12.3 INSTRUMENTATION 


B1.12.3.1 OVERVIEW OF A PULSE FT NMR SPECTROMETER 


The basic components of a pulse FT NMR spectrometer are shown schematically in figure B1.12.3. It can be 
seen that, in concept, an NMR spectrometer is quite simple. There is a high-field magnet, which these days is 
nearly always a superconducting solenoid magnet, that provides the basic Zeeman states on which to carry out 
the NMR experiment. The probe circuit containing the sample in the NMR coil is placed in the magnetic field. 
The probe is connected to the transmitter that is gated to form the pulses that produce the excitation. The 
probe is also connected to the receiver and it requires some careful design to ensure that the receiver, which is 
sensitive to µV signals, does not see any of the large excitation voltages produced by the transmitter. The 
relatively simple concept of the experiment belies the extensive research and development effort that has gone 
into developing the components of NMR spectrometers. Although all components are important, emphasis is 
placed on three central parts of the spectrometer: namely the magnet, the probe and signal detection. 

Figure B1.12.3. Schematic representation showing the components of a pulse FT NMR spectrometer. 


B1.12.3.2 MAGNETS 


Much effort goes into producing ever higher magnetic fields, and the highest currently commercially available
for solid-state NMR is 18.8 T. Standard instruments are now considered to be 4.7-9.4 T. The drive for higher
fields is based on the increased chemical shift dispersion (in hertz) and the increase in sensitivity via both the
Boltzmann factor and higher frequency of operation. For solid-state NMR of half-integer spin quadrupolar
nuclei there is the additional advantage that the second-order quadrupolar broadening of the central transition
decreases in inverse proportion to $B_0$. Superconducting solenoids dominate, based on Nb₃Sn or NbTi
multifilament wire kept in liquid helium. However, the fields and current densities now used are close to the
critical limits of these materials, demanding improved materials technology [12]. The principle of operation is
very simple: a high current is passed through a long coil of wire, with typically 40-100 A of current
circulating around several kilometres of wire. This means that the magnet stores a significant amount of energy
in its field ($E = \tfrac{1}{2}LI^2$, where L is the solenoid's inductance and I is the current flowing), up to 10 MJ.
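
The stored-energy relation can be made concrete with a short calculation. The following is a minimal sketch, assuming the ideal-inductor formula E = (1/2)LI²; the function names are illustrative, and the pairing of 100 A with 10 MJ is only an example built from the upper values quoted above, not a specification of any real magnet.

```python
def stored_energy_joules(inductance_h: float, current_a: float) -> float:
    """Magnetic energy E = (1/2) L I^2 stored in an ideal solenoid."""
    return 0.5 * inductance_h * current_a ** 2


def inductance_for_energy(energy_j: float, current_a: float) -> float:
    """Inductance that stores energy_j at the given current (inverse of the above)."""
    return 2.0 * energy_j / current_a ** 2


# Illustrative only: what inductance would store 10 MJ at 100 A?
example_inductance_h = inductance_for_energy(10e6, 100.0)
print(f"L = {example_inductance_h:.0f} H")
```

An inductance of this order shows why the current must be introduced slowly and why a quench releases so much heat.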

A superconducting magnet consists of a cryostat, main coil, superconducting shim set and a means for
attaching the current supply to the main coil (figure B1.12.4). The cryostat consists of two vessels for the
liquid cryogens, an inner one for helium and an outer one for nitrogen. To insulate these, there are
several vacuum jackets with a radiation shield. The aim is to reduce heat leakage to the inner chamber to
conserve helium. Superconducting magnets in NMR are usually operated in persistent mode, which means
that, after a current is introduced, the start and end of the main coil are effectively connected so that the
current has a continuous path within the superconductor and the power supply can then be disconnected. To
achieve this the circuits within the cryostat have a superconducting switch. The coil circuits are also designed
to cope with a sudden, irreversible loss of superconductivity, termed a quench. Resistors (called dump
resistors) are present to disperse the heating effect and prevent damage to the main coil when a quench
occurs.

[Figure legend: A - liquid helium (−269 °C); B - superconducting coil; C - liquid nitrogen (−196 °C);
D - vacuum space; E - room-temperature bore]

Figure B1.12.4. Construction of a high-field superconducting solenoid magnet.

Higher magnetic fields exist than those used in NMR, but the NMR experiment imposes constraints in addition
to magnitude alone: an NMR spectroscopy experiment also demands homogeneity and stability of the magnetic
field. Long-term stability is aided by persistent-mode operation and the drift should be <2 x 10 a day.
Homogeneity requirements for solid-state NMR experiments are typically up to 2 x 10 over a volume of ~1
cm³. The main coil alone is unable to produce this level of homogeneity, so there is also a set of smaller
superconducting coils called cryoshims. The number of these cryoshims depends on the design and also the
purpose of the magnet (e.g. solid-state NMR, high-resolution NMR, imaging) but typically varies between
three and eight. Although for many wide-line solid-state NMR experiments the homogeneity produced by the
cryoshims is sufficient, most commercial spectrometers also have a room-temperature shim set which further
improves the homogeneity of the magnetic field. A final consideration is the accessible room-temperature
bore of the magnet. A standard liquids magnet has a bore of 52 mm diameter. However, most
solid-state NMR spectroscopists prefer 89 mm as this gives much more room for the probe, allowing the use
of larger and more robust electrical components for handling high powers and accommodating some of the
more specialist probe designs (e.g. double angle rotation and dynamic angle spinning; see section B1.12.4.5).

B1.12.3.3 PROBES

The heart of an NMR spectrometer is the probe, which is essentially a tuned resonant circuit with the sample
contained within the main inductance (the NMR coil) of that circuit. Usually a parallel tuned circuit is used,
with a resonant frequency of $\omega_0 = (LC)^{-1/2}$. The resonant frequency is obviously the most important
probe parameter, but the input impedance, which should be 50 Ω, and the quality factor Q are also extremely
important. Several designs of coil exist, each having advantages in specific applications. The traditional coil
design, particularly applicable at lower frequencies and for solids, is the conventional solenoid. For external
access of large samples, Helmholtz or saddle coils are used, such as are in widespread use for high-resolution
liquid-state NMR studies. For large samples, again with external access, birdcage coils are finding increasing
use, especially in magnetic resonance imaging. There are competing requirements for the probe which can be
characterized in terms of Q: pulse length ($\propto Q^{-1/2}$), deadtime ($\propto Q$), maximum voltage
($\propto Q^{1/2}$), bandwidth ($\propto Q^{-1}$) and sensitivity ($\propto Q^{1/2}$). A probe cannot be designed
that optimizes all of these simultaneously because of their differing Q-dependence, so compromise and a focus
on the most important aspects for a specific application are required. The probe needs to be constructed from
robust electronic components, as they often have to withstand high voltages (many kilovolts).
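
The tuning condition and the Q trade-off can be illustrated numerically. The sketch below assumes an ideal lossless LC circuit obeying ω₀ = (LC)^(-1/2) and the usual definition Q = f₀/Δf for the bandwidth; the coil inductance, frequency and Q values are hypothetical and chosen only for illustration.

```python
import math


def tuning_capacitance(freq_hz: float, inductance_h: float) -> float:
    """Capacitance C that satisfies omega_0 = (L C)^(-1/2) for an ideal LC circuit."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / (omega ** 2 * inductance_h)


def resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency in hertz of an ideal LC circuit."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def bandwidth_hz(freq_hz: float, q: float) -> float:
    """Half-power bandwidth, taking Q = f0 / bandwidth (bandwidth scales as 1/Q)."""
    return freq_hz / q


# Hypothetical probe: 100 nH coil tuned to 100 MHz.
c = tuning_capacitance(100e6, 100e-9)
print(f"tuning capacitance: {c * 1e12:.2f} pF")
for q in (50, 100, 200):
    print(f"Q = {q:3d}: bandwidth = {bandwidth_hz(100e6, q) / 1e6:.2f} MHz")
```

Doubling Q halves the bandwidth while the sensitivity gains only a factor of √2, which is the kind of compromise described above.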

Often linear circuit analysis is applied, but in probes designed for solids, and therefore high-power operation,
nonlinear effects can occur. Furthermore, in doubly and triply tuned probes used for decoupling,
cross-polarization and some of the more sophisticated pulse sequences such as REDOR and TEDOR, even small
voltages generated at the second resonant frequency are unacceptable: because the coil is part of more than
one resonant circuit, they can swamp the NMR signal. Detailed consideration of the design criteria for
double-tuned CP-MAS probes has been given [13]. These days, a number of sequences demand triply
resonant circuits, with two channels tunable over the lower frequency ranges and the third tunable to the
high-γ nuclei (e.g. ¹H, ¹⁹F).

B1.12.3.4 SIGNAL DETECTION

In addition to the deadtime mentioned earlier, magnetoacoustic ringing (up to 200 μs) can be very significant
without careful probe design. Samples that exhibit piezoelectric behaviour can produce very long response
times of up to 10 ms. The microvolt NMR signal generated in the coil needs amplification prior to detection
and digitization. The first stage is about 30 dB of amplification in the preamplifier, whose most important
characteristic is the noise figure (NF), essentially a measure of the noise added to the signal by the
amplifier. Careful design is necessary to achieve both a low noise figure and rapid recovery from
saturation and, again, some compromise is required, with recovery times of <2 μs. The preamplifier should
also have good linearity.


The amplified signal is passed to a double-balanced mixer configured as a phase-sensitive detector, where the
two inputs are the NMR signal ($\omega_0$) and the frequency of the synthesizer ($\omega_{\mathrm{ref}}$), with
the output proportional to
$\cos((\omega_0 - \omega_{\mathrm{ref}})t + \theta) + \cos((\omega_0 + \omega_{\mathrm{ref}})t + \theta)$.
The sum frequency is much larger than the total bandwidth of the spectrometer so it is lost, leaving only the
difference frequency. Phase-sensitive detection is equivalent to examining the NMR signal in the rotating
reference frame, and if the frequencies $\omega_0$ and $\omega_{\mathrm{ref}}$ are equal a
constant output is obtained (neglecting relaxation effects). Most modern spectrometers employ two
phase-sensitive detectors which have reference signals that differ in phase by 90°, termed quadrature
phase-sensitive detection [14]. This scheme can distinguish whether a signal is above or below the reference
frequency and allows the transmitter frequency to be placed at the centre of the range of interest, improving
pulse power efficiency and signal-to-noise. (This is not possible with a single detector, which can only provide
the magnitude of the offset.) Imbalance in the two channels and non-90° angles between them can give rise to
quadrature images and should be minimized in the spectrometer set-up. Phase cycling involves applying
pulses of different phase, and normally four phases 90° apart are available. However, much more sophisticated
phase cycles are now required and variable phases can be generated by using a digital synthesizer. In
particular, there is increasing demand for much smaller phase shifts than 90°.
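
The difference between single and quadrature detection can be demonstrated with a short simulation. This is a minimal sketch, assuming ideal, perfectly balanced detectors whose low-pass-filtered outputs are the cosine and sine of the offset phase; the function names, offset and dwell time are hypothetical.

```python
import cmath
import math


def detect(offset_hz: float, dwell_s: float, n_points: int, phase: float = 0.0):
    """Return (single-detector, quadrature) outputs after the sum frequency is filtered out."""
    single, quad = [], []
    for k in range(n_points):
        arg = 2.0 * math.pi * offset_hz * k * dwell_s + phase
        single.append(math.cos(arg))                         # one phase-sensitive detector
        quad.append(complex(math.cos(arg), math.sin(arg)))   # two detectors, references 90 degrees apart
    return single, quad


def estimate_offset(quad, dwell_s: float) -> float:
    """Signed offset in hertz from the average phase step between successive complex points."""
    step = sum(b * a.conjugate() for a, b in zip(quad, quad[1:]))
    return cmath.phase(step) / (2.0 * math.pi * dwell_s)


# Signals 1 kHz above and below the reference, sampled every 100 microseconds.
single_pos, quad_pos = detect(+1000.0, 1e-4, 64)
single_neg, quad_neg = detect(-1000.0, 1e-4, 64)
same_single = all(abs(a - b) < 1e-9 for a, b in zip(single_pos, single_neg))
print("single detector distinguishes sign:", not same_single)
print("quadrature estimates:", estimate_offset(quad_pos, 1e-4), estimate_offset(quad_neg, 1e-4))
```

The single-channel records for the two offsets are identical, whereas the complex record rotates in opposite senses, which is what allows the reference to sit in the middle of the spectrum.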

The signal is then digitized ready for storage in the computer memory. Digitizers are characterized by the 
number of bits (usually 12 or 16, determining the dynamic range), the rate of digitization (determining the 
dwell time) and the size of memory capable of storing the data points. Until very recently the ability to record 
narrow spectral objects over a broad range of frequencies has been limited, usually by the on-board computer 
memory, but commercial spectrometers have now addressed this problem. 
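
The three digitizer parameters translate into simple numbers. The sketch below assumes an ideal converter (dynamic range taken as the ratio of full scale to one least-significant bit), quadrature sampling with one complex point per dwell period, and a digital resolution equal to the reciprocal of the acquisition time; the spectral width and resolution are hypothetical.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ideal dynamic range of an n-bit digitizer, 20*log10(2^n), in decibels."""
    return 20.0 * math.log10(2 ** bits)


def dwell_time_s(spectral_width_hz: float) -> float:
    """Dwell time for complex (quadrature) sampling of the given spectral width."""
    return 1.0 / spectral_width_hz


def points_needed(spectral_width_hz: float, resolution_hz: float) -> int:
    """Complex points needed to resolve resolution_hz across spectral_width_hz."""
    return math.ceil(spectral_width_hz / resolution_hz)


for bits in (12, 16):
    print(f"{bits}-bit digitizer: {dynamic_range_db(bits):.1f} dB")

# Hypothetical broad-line case: 1 MHz spectral width, 100 Hz features.
print("dwell time:", dwell_time_s(1e6), "s")
print("complex points:", points_needed(1e6, 100.0))
```

The point count grows as the ratio of spectral width to feature width, which is why narrow features in a broad spectrum were memory-limited.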

B1.12.4 EXPERIMENTAL TECHNIQUES

B1.12.4.1 CLASSIFICATION OF NUCLEI

The sensitivity in an NMR experiment is directly proportional to the number of spins, making quantification
of the amount of a particular element present straightforward, at least for spin I = 1/2 nuclei. Furthermore, the
large variation in the gyromagnetic ratio, γ, means that, except for pathological cases, the resonant frequencies
are sufficiently different from one element to another that there is no possibility of confusing different
elements. The ease of obtaining a spectrum, and to a certain extent the type of experiment undertaken,
depends upon γ, the nuclear spin and the natural abundance of the isotope concerned. The large variation in γ
means that the sensitivity, which rises steeply with γ (see section B1.12.2.1), varies by several orders of
magnitude from (say) ¹H and ¹⁹F, which have the largest γ, to ¹⁰⁹Ag, which has one of the smallest. For spin
I = 1/2 nuclei the ability to obtain a useful signal also depends on the spin-lattice relaxation time, $T_1$, as
well as the sensitivity. As the principal cause of relaxation in spin-1/2 systems is usually dipolar, and thus
proportional to (at least) γ², relaxation times can be very long for low-γ nuclei, making experiments still more
difficult.

Nuclei with spin I > 1/2, about three-quarters of the periodic table, have a quadrupole moment and as a
consequence are affected by any electric field gradient present (see section B1.12.2.2). The lines can be very
broad even when techniques such as MAS (see section B1.12.4.3) are used, so it is convenient to divide
nuclei into groups depending upon ease of observation and likely width of the line. This is done in table
B1.12.2 for the most commonly studied elements. These have been divided into six categories: (a) spin
I = 1/2 nuclei which are readily observable; (b) spin I = 1/2 nuclei which are observable with difficulty because
either the isotopic abundance is low, in which case isotopic enrichment is sometimes used (e.g. ¹⁵N), or
γ is very small (e.g. ¹⁸³W); (c) spin I = 1 nuclei where the quadrupolar interaction can lead to very broad lines;
(d) non-integer spin I > 1/2 nuclei which are readily observable; (e) non-integer spin I > 1/2 nuclei which are
readily observable only in relatively symmetric environments because the quadrupolar interaction can
strongly broaden the line; and (f) those spin I > 1/2 nuclei whose quadrupole moment is sufficiently large that
they can only be observed in symmetric environments where the electric field gradient is small. Of course the
boundaries between the categories are not rigid; as technology improves and magnetic fields increase there
is movement from (f) to (e) and (e) to (d). In each case the sensitivity at natural abundance relative to ²⁹Si (for
which resonance is readily observed) is given. More complete tables of nuclear properties relevant to NMR
are given in several books on NMR [1, 2].

Table B1.12.2 Classification of nuclei according to spin and ease of observation.

             Sensitivity at natural                Sensitivity at natural
             abundance relative to                 abundance relative to
Isotope      ²⁹Si                     Isotope      ²⁹Si

(a) Readily observable
¹H           2700                     ¹¹³Cd        3.6
¹³C          0.48                     ¹¹⁹Sn        12
¹⁹F          2200                     ¹²⁵Te        6.0
²⁹Si         1.00                     ¹⁹⁵Pt        9.1
³¹P          180                      ¹⁹⁹Hg        2.6
⁷⁷Se         1.4                      ²⁰⁵Tl        35
⁸⁹Y          0.32                     ²⁰⁷Pb        5.6

(b) Observable with difficulty or requiring isotopic enrichment
¹⁵N*         1.0 × 10⁻²               ¹⁰⁹Ag        0.13
⁵⁷Fe*        1.0 × 10⁻³               ¹⁸³W         2.8 × 10⁻²
¹⁰³Rh        8.5 × 10⁻²

(c) Integer I = 1
²H*          3.9 × 10⁻³               ¹⁴N          2.7

Non-integer I > 1/2

(d) Readily observable
⁷Li          730                      ²³Na         250
⁹Be          37                       ²⁷Al         560
¹¹B          360                      ⁵¹V          1030
¹⁷O*         2.9 × 10⁻²               ¹³³Cs        130

(e) Readily observable only in relatively symmetric environments
²⁵Mg         0.73                     ⁵⁹Co         750
³³S*         4.7 × 10⁻²               ⁶⁵Cu         96
³⁷Cl         1.7                      ⁶⁷Zn         0.32
³⁹K          1.3                      ⁷¹Ga         150
⁴³Ca*        2.3 × 10⁻²               ⁸¹Br         133
⁴⁵Sc         820                      ⁸⁷Rb         133
⁵⁵Mn         470                      ⁹³Nb         1300

(f) Observable in very symmetric environments
⁴⁹Ti         0.44                     ¹⁰⁵Pd        0.68
⁵³Cr         0.23                     ¹¹⁵In        910
⁶¹Ni         0.11                     ¹²¹Sb        250
⁷³Ge         0.30                     ¹²⁷I         260
⁷⁵As         69                       ¹³⁵Ba        0.89
⁸⁷Sr         0.51                     ¹³⁹La        160
⁹¹Zr         2.9                      ²⁰⁹Bi        380
⁹⁵Mo         1.4

* Isotopically enriched.

B1.12.4.2 STATIC BROAD-LINE EXPERIMENTS


Static NMR powder patterns offer one way of characterizing a material, and if spectral features can be
observed and the line simulated then accurate determination of the NMR interaction parameters is possible.
The simplest experiment to carry out is one-pulse acquisition. Despite the effects outlined above that
corrupt the start of the time domain signal, the singularities are relatively narrow spectral features and their
positions can be recorded, so the magnitude of the interaction can be estimated. A common way to
overcome deadtime problems is to form a signal with an effective time zero point outside the deadtime, i.e. an
echo. There is a huge multiplicity of methods for forming such echoes. Most echo methods are two-pulse
sequences, with the classic spin echo consisting of 90° - τ - 180°, which refocuses at a time τ after the second
pulse. The echo decay shape is a good replica of the original FID and its observation can be used to obtain more
reliable and quantitative information about solids than from the one-pulse experiment.
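
The refocusing at a time τ after the second pulse can be checked with a toy ensemble. The following is a minimal sketch, assuming ideal instantaneous pulses, a static distribution of resonance offsets (inhomogeneous broadening only) and no relaxation; the offsets and τ are hypothetical values chosen so that the FID has fully dephased before the 180° pulse.

```python
import cmath
import math


def transverse_signal(offsets_hz, t: float, tau: float) -> float:
    """Normalized net transverse magnetization at time t after the 90 degree pulse,
    with an ideal 180 degree refocusing pulse applied at time tau."""
    total = 0j
    for f in offsets_hz:
        if t < tau:
            phase = 2.0 * math.pi * f * t
        else:
            # The 180 degree pulse inverts the phase accumulated up to tau,
            # after which each spin keeps precessing at its own offset.
            phase = -2.0 * math.pi * f * tau + 2.0 * math.pi * f * (t - tau)
        total += cmath.exp(1j * phase)
    return abs(total) / len(offsets_hz)


# Hypothetical broad line: offsets spread uniformly over +/-50 kHz, tau = 100 microseconds.
offsets = [-50e3 + 1000.0 * k for k in range(101)]
tau = 100e-6
for t in (0.0, tau, 1.5 * tau, 2.0 * tau):
    print(f"t = {t * 1e6:6.1f} us  signal = {transverse_signal(offsets, t, tau):.3f}")
```

The signal has vanished by the time of the 180° pulse but recovers fully at t = 2τ, well outside any deadtime that follows the pulses.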

To determine broad spectral lineshapes accurately from echoes, hard RF pulses are preferred for uniform
excitation. An echo sequence first proposed by Kunwar et al [15] has often been used, which combines phase
cycling to remove quadrature effects and to cancel direct magnetization (the remaining FID) and ringing
effects. The rotation produced by the second pulse in the two-pulse echo experiment is not critical. In
practice, the best choice is to make the second pulse twice the length of the first, with the actual length a
trade-off between sensitivity and uniformity of the irradiation. In recording echoes there is an important
practical consideration: the point of applying the echo is to move the effective t = 0 position for the FID
outside the region where the signal is corrupted. However, in order that phasing problems do not re-emerge,
the data sampling rate should be sufficient to allow the effective t = 0 point to be accurately defined. If
$T_2$ allows the whole echo (both before and after the maximum, i.e. t = 0) to be accurately recorded without
an unacceptably large loss of intensity, there is then no need to define the new t = 0 position accurately.
Fourier transformation of the whole echo (which effectively amounts to integration between ±∞) followed by
magnitude calculation removes phasing errors, producing a pure absorption lineshape with a signal-to-noise
√2 larger than that obtained by transforming from the echo maximum.

Even if echoes are used, there are still difficulties in recording complete broad spectral lines with pulsed
excitation. Several approaches have been adopted to overcome these difficulties, based on the philosophy that
although the line is broad it can be recorded using a series of narrow-banded experiments. One of these
approaches is to carry out a spin-echo experiment using relatively weak RF pulses, recording only the
intensity of the on-resonance magnetization and repeating the experiment at many frequencies to map out the
lineshape. This approach has been successfully used in a series of studies. An example is ⁹¹Zr in the
polymorphs of ZrO₂ (figure B1.12.5), where mapping out the lineshape clearly shows differences in NMR
interaction [16]. This approach works but is extremely tedious because each frequency step requires accurate
retuning of the probe. An alternative is to sweep the main magnetic field. There are several examples of
sweeping the main magnetic field for solids dating from the earliest days of NMR, but only a limited number
have been reported using superconducting magnets, a recent example being ²⁷Al in α-Al₂O₃ [17]. It is now
possible to have a single NMR spectrometer that is capable of both conventional high-resolution spectroscopy
and field-sweep operation. As with the stepped-frequency experiment, relatively soft pulses are applied,
and although strictly the on-resonance part of the magnetization should be used, it has been shown
experimentally that using the spin-echo intensity directly reproduces the lineshape accurately (figure B1.12.6)
[18]. Recording the full distortion-free lineshape including the satellite transitions then allows accurate
determination of the asymmetry parameter.

Figure B1.12.5. ⁹¹Zr static NMR lineshapes from ZrO₂ polymorphs using frequency-stepped spin echoes.

For quadrupolar nuclei, the dependence of the pulse response on $\nu_Q/\nu_1$ has led to the development of
quadrupolar nutation, which is a two-dimensional (2D) NMR experiment. The principle of 2D experiments is
that a series of FIDs is acquired as a function of a second time parameter (here, the length of the applied pulse).
A double Fourier transformation can then be carried out to give a 2D data set (F1, F2). For quadrupolar nuclei,
while the pulse is on, the experiment is effectively being carried out at low field with the spin states
determined by the quadrupolar interaction. In the limits $\nu_Q \ll \nu_1$ and $\nu_Q \gg \nu_1$ the pulse
response lies at $\nu_1$ and $(I + \tfrac{1}{2})\nu_1$ respectively, so is not very discriminatory. However, for
$\nu_Q \sim \nu_1$ the pulse response is complex and allows $C_Q$ and η to be determined by comparison with
theoretical simulation. Nutation NMR of quadrupolar nuclei has largely been limited by the range of RF fields
that puts $\nu_Q$ in the intermediate region. This approach has been extended by Kentgens by irradiating
off-resonance, which produces a larger effective nutation field and so matches $\nu_1$ to $\nu_Q$ [19].

Figure B1.12.6. Field-swept ²⁷Al NMR spectrum from α-Al₂O₃.

B1.12.4.3 MAGIC ANGLE SPINNING

All of the anisotropic interactions described in section B1.12.2 are second-rank tensors so that, in
polycrystalline samples, lines in solids can be very broad. In liquids, the rapid random molecular tumbling
brought about by thermal motion averages the angular dependences to zero, leaving only the isotropic part of
the interaction. The essence of the spinning technique is to impose a time dependence on these interactions
externally so as to reduce the anisotropic part, which to first order has the same Legendre polynomial,
$P_2(\cos\theta)$, dependence for all the interactions, in a similar but not identical manner to that in liquids.
This time dependence is imposed by rapidly rotating the whole sample container (termed the rotor) at the
so-called 'magic angle', where $3\cos^2\theta = 1$, i.e. θ = 54° 44'. 'Rapid' means that the spinning speed
should be faster than the homogeneous linewidth. (A spectral line is considered homogeneous when all nuclei
can be considered as contributing intensity to all parts of the line, so that the intrinsic linewidth associated
with each spin is the same as the total linewidth.) Essentially, this is determined by the dipolar coupling: in
proton-rich systems the proton-proton coupling can be >40 kHz, beyond the range of current technology, where
the maximum commercially available spinning speed is ~35 kHz. Fortunately, in most other systems the line is
inhomogeneously broadened, i.e. it is made up of distinct contributions from individual spins in differently
oriented crystallites which merge to give the composite line, the intrinsic width associated with each spin
being considerably narrower than the total linewidth. To cause effective narrowing of these lines the spinning
rate needs only to exceed the intrinsic linewidth, and typically a few kilohertz is sufficient to narrow lines
where the chemical shift is the dominant line-broadening mechanism. In figure B1.12.7 both the static and
MAS ²⁹Si spectra of a sample of sodium disilicate (Na₂Si₂O₅) crystallized from a glass are shown as an
example. Whilst the static spectrum clearly indicates an axial chemical shift powder pattern, it gives no
evidence of more than one silicon site. The MAS spectrum clearly shows four resolved lines from the
different polymorphs present in the material, whose widths are ~100 times less than the chemical shift
anisotropy.
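
The value of the magic angle follows directly from setting the second-rank Legendre polynomial to zero. A minimal sketch of that arithmetic, with illustrative function names, is:

```python
import math


def p2(x: float) -> float:
    """Second-rank Legendre polynomial P2(x) = (3x^2 - 1)/2."""
    return 0.5 * (3.0 * x * x - 1.0)


def magic_angle_deg() -> float:
    """Angle at which 3*cos^2(theta) = 1, in degrees."""
    return math.degrees(math.acos(1.0 / math.sqrt(3.0)))


def to_deg_min(angle_deg: float):
    """Split a decimal angle into whole degrees and minutes of arc."""
    degrees = int(angle_deg)
    minutes = (angle_deg - degrees) * 60.0
    return degrees, minutes


theta = magic_angle_deg()
d, m = to_deg_min(theta)
print(f"magic angle = {theta:.4f} degrees = {d} deg {m:.1f} min")
print("P2 at the magic angle:", p2(math.cos(math.radians(theta))))
# Residual scaling of the anisotropy if the angle is misset by half a degree.
print("P2 at 54.2 degrees:", p2(math.cos(math.radians(54.2))))
```

The last line gives a feel for how a small missetting leaves a scaled-down but non-zero anisotropic contribution.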

Note that if the spinning speed is less than the static width then a series of spinning sidebands is produced,
separated from the isotropic line by integer multiples of the spinning speed (these are also visible in figure
B1.12.7). For multisite systems, spinning sidebands can make interpretation of the spectrum more difficult and
it may be necessary to do experiments at more than one spinning speed in order to determine the isotropic peak
and to eliminate overlap between sidebands and main peaks. However, the presence of sidebands can be useful
since from the amplitude of the spinning sideband envelope one can deduce the complete chemical shift tensor,
giving additional information about the local site environment [20].

Figure B1.12.7. Static and MAS ²⁹Si NMR spectra of crystalline Na₂Si₂O₅.

The interpretation of MAS experiments on nuclei with spin I > 1/2 in non-cubic environments is more complex
than for I = 1/2 nuclei, since the effect of the quadrupolar interaction is to spread the ±m ↔ ±(m − 1) transition
over a frequency range $(2m - 1)\nu_Q$. This usually means that for non-integer nuclei only the 1/2 ↔ −1/2
transition is observed since, to first order in the quadrupolar interaction, it is unaffected. However,
second-order effects are usually important and the angular dependence of the 1/2 ↔ −1/2 transition has both
$P_2(\cos\theta)$ and $P_4(\cos\theta)$ terms, only the first of which is cancelled by MAS. As a result, the line
is narrowed by only a factor of ~3.6, and it is necessary to spin faster than the residual linewidth
$\Delta\nu_Q$, where

$$\Delta\nu_Q = f(I)\,\frac{C_Q^2\,(6+\eta)^2}{224\,\nu_0} \qquad \text{(B1.12.10)}$$

to narrow the line. The resulting lineshape depends upon both $C_Q$ and η and is shown in figure B1.12.8 for
several different asymmetry parameters. The centre of gravity of the line does not occur at the isotropic
chemical shift (see figure B1.12.1(d)); there is a quadrupolar shift

$$\delta_Q = -\frac{3}{40}\,f(I)\,\frac{C_Q^2}{\nu_0^2}\left(1+\frac{\eta^2}{3}\right) \qquad \text{(B1.12.11)}$$

(expressed as a fraction of $\nu_0$; multiply by 10⁶ for ppm). If the lineshape can be clearly distinguished,
then $C_Q$, η and $\delta_{\mathrm{iso}}$ can be readily determined even for overlapping lines, although it should
be noted that most computer packages used to simulate quadrupolar lineshapes assume 'infinite' spinning speed.
Significant differences between the experimental lineshape and simulation can occur if one is far from this
limit and caution is required. A further problem for MAS of quadrupolar nuclei is that the angle must be set
very accurately (which can be difficult and time consuming with some commercial probes) to obtain the true
lineshape of broad lines. In many samples (e.g. glasses and other disordered systems) featureless lines are
observed and the centre of gravity must then be used to estimate $\delta_{\mathrm{iso}}$ and the electric field
gradient since

$$\delta_{\mathrm{cg}} = \delta_{\mathrm{iso}} - \frac{3}{40}\,f(I)\,\frac{C_Q^2}{\nu_0^2}\left(1+\frac{\eta^2}{3}\right) \qquad \text{(B1.12.12)}$$

and a plot of $\delta_{\mathrm{cg}}$ against $1/\nu_0^2$ will give $\delta_{\mathrm{iso}}$ and
$C_Q^2(1 + \eta^2/3)$. $f(I)$ is the spin-dependent factor $[I(I+1) - 3/4]/[I^2(2I-1)^2]$ and is given in
table B1.12.3.
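
The spin-dependent factor and the quantities that depend on it are easy to evaluate. The sketch below computes f(I) exactly from the expression given in the text; for the linewidth and shift it assumes the commonly quoted forms Δν = f(I)·C_Q²(6 + η)²/(224ν₀) and δ = −(3/40)·f(I)·(C_Q/ν₀)²(1 + η²/3), which are assumptions as far as this section is concerned. The ²⁷Al parameters (C_Q = 5 MHz, η = 0, ν₀ = 104.3 MHz) are hypothetical.

```python
from fractions import Fraction


def spin_factor(spin: Fraction) -> Fraction:
    """f(I) = [I(I+1) - 3/4] / [I^2 (2I-1)^2], evaluated exactly."""
    return (spin * (spin + 1) - Fraction(3, 4)) / (spin ** 2 * (2 * spin - 1) ** 2)


def mas_linewidth_hz(spin: Fraction, cq_hz: float, eta: float, nu0_hz: float) -> float:
    """Assumed residual second-order MAS linewidth of the central transition."""
    return float(spin_factor(spin)) * cq_hz ** 2 * (6.0 + eta) ** 2 / (224.0 * nu0_hz)


def quadrupolar_shift_ppm(spin: Fraction, cq_hz: float, eta: float, nu0_hz: float) -> float:
    """Assumed isotropic second-order quadrupolar shift, in ppm."""
    factor = float(spin_factor(spin)) * (cq_hz / nu0_hz) ** 2 * (1.0 + eta ** 2 / 3.0)
    return -3.0 / 40.0 * factor * 1e6


for two_i in (3, 5, 7, 9):
    spin = Fraction(two_i, 2)
    print(f"I = {spin}: f(I) = {spin_factor(spin)}")

# Hypothetical 27Al site (I = 5/2).
al = Fraction(5, 2)
print("linewidth / Hz:", mas_linewidth_hz(al, 5e6, 0.0, 104.3e6))
print("shift / ppm:", quadrupolar_shift_ppm(al, 5e6, 0.0, 104.3e6))
```

For these illustrative parameters the residual width is roughly 3 kHz, so spinning at a few kilohertz would be marginal, and the line is displaced by more than 10 ppm from the isotropic chemical shift.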

Table B1.12.3 Spin-dependent factor $f(I)$ for the isotropic second-order quadrupole shift.

Figure B1.12.8. MAS NMR lineshapes from the central transition for non-integer quadrupolar nuclei with
various η ($A = (I(I+1) - 3/4)\,\nu_Q^2/\nu_0$).

The second-order quadrupolar broadening of the 1/2 ↔ −1/2 transition can be further reduced by spinning at an
angle other than 54.7° (variable angle spinning, VAS), the width being a minimum between 60° and 70°. The
reduction is only a factor of ~2, however, and dipolar and shift anisotropy broadening will be reintroduced, so
VAS has found only limited application.

B1.12.4.4 DECOUPLING AND CROSS-POLARIZATION


In proton-containing systems such as organic materials the dipole-dipole interaction is usually sufficiently
large that current spinning speeds are not sufficient to narrow the line. However, in a heteronuclear (I, S) spin
system, if a large RF magnetic field is applied at the resonance frequency of the spin, I, that one wishes to
decouple, the magnetization will precess around the effective field in the rotating frame and the average IS
dipolar coupling will tend to zero. This technique is most commonly applied to remove the ¹H-¹³C dipolar
coupling, but can be used for any system. Decoupling is usually combined with cross-polarization (CP) and
the pulse sequence is shown in figure B1.12.9. A 90° pulse is applied to the I spin system and the phase of the
RF is then shifted by 90° to spin lock the magnetization in the rotating frame. The S spin system RF is now
turned on with an amplitude such that $\gamma_I B_{1I} = \gamma_S B_{1S}$, i.e. in the rotating frame both spin
systems are precessing at the same rate (the Hartmann-Hahn condition), and thus magnetization can be
transferred via the flip-flop term in the dipolar Hamiltonian. The length of time that the S RF is on is called
the contact time and must be adjusted for optimum signal. Since magnetization transfer is via the dipole
interaction, in general the signal from S spin nuclei closest to the I spins will appear first, followed by the
signal from S spins further away. Thus one use of CP is in spectral editing. However, the main use is in signal
enhancement, since the S spin magnetization will be increased by $(\gamma_I/\gamma_S)$, which for ¹H-¹³C is
~4 and for ¹H-¹⁵N is ~9. In addition, the S spin relaxation time $T_1$ is usually much longer than that of the
I spin system (because γ is smaller); as a consequence, in CP one can repeat the experiment at a rate
determined by $T_{1I}$ rather than $T_{1S}$, producing a further significant gain in signal to noise. For spin
I, S = 1/2 systems the equation describing the signal is

$$S(t) = S_0\left(\frac{\gamma_I}{\gamma_S}\right)\left[\exp\left(-\frac{t}{T_{1\rho I}}\right) - \exp\left(-t\left(\frac{1}{T_{IS}} + \frac{1}{T_{1\rho S}}\right)\right)\right] \qquad \text{(B1.12.13)}$$

where $T_{1\rho I}$ and $T_{1\rho S}$ are the relaxation times in the rotating frame of the I and S spin systems
respectively and $T_{IS}$, which is dependent upon the strength of the dipole coupling between the I and S spin
systems, is the characteristic time for the magnetization of the S spin system to increase via contact with the
I spins. Generally, $T_{1\rho S}$ is very much longer than $T_{IS}$ and can be ignored and, although
$T_{1\rho I}$ is shorter, it too is usually much longer than $T_{IS}$. In this case maximum signal will occur for
a contact time $t \approx T_{IS}\ln(T_{1\rho I}/T_{IS})$. If one wishes to do quantitative measurements using CP
it is necessary to plot signal amplitude versus contact time and then fit to equation (B1.12.13). A typical plot
is shown for glycine in figure B1.12.10. It can be seen from the inset that $T_{IS}$ for the CH₂ peak at ~40 ppm
is much shorter (20 μs) than that for the COOH peak at 176 ppm (570 μs). Carbon spectra can be very complex,
and various pulse sequences have been used to simplify them. The most common is dipolar dephasing, in which
there is a delay after the contact time (during which no decoupling takes place) before the signal is acquired.
The signal from strongly coupled carbons (e.g. CH₂) will decay rapidly during this time, leaving only weakly
coupled carbons visible in the spectrum.
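
The dependence of the CP signal on contact time can be explored numerically. The sketch below assumes the two-exponential form S(t) ∝ exp(−t/T1ρ(I)) − exp(−t(1/T_IS + 1/T1ρ(S))), with T1ρ(S) taken as infinite by default; the function names are illustrative, T_IS = 570 μs is the COOH value quoted above, and the proton T1ρ of 20 ms is a purely hypothetical value.

```python
import math


def cp_signal(t, t_is, t1rho_i, t1rho_s=math.inf, s0=1.0, gamma_ratio=1.0):
    """CP signal versus contact time t (assumed two-exponential model)."""
    build_up = math.exp(-t * (1.0 / t_is + 1.0 / t1rho_s))
    decay = math.exp(-t / t1rho_i)
    return s0 * gamma_ratio * (decay - build_up)


def optimum_contact_time(t_is, t1rho_i):
    """Exact maximum of cp_signal when T1rho(S) is infinite."""
    return (t_is * t1rho_i / (t1rho_i - t_is)) * math.log(t1rho_i / t_is)


t_is = 570e-6      # COOH carbon in glycine, from the text
t1rho_i = 20e-3    # hypothetical proton rotating-frame relaxation time

exact = optimum_contact_time(t_is, t1rho_i)
approx = t_is * math.log(t1rho_i / t_is)   # limiting form for T1rho(I) >> T_IS
print(f"optimum contact time: exact {exact * 1e3:.2f} ms, approximate {approx * 1e3:.2f} ms")
print(f"signal at optimum: {cp_signal(exact, t_is, t1rho_i):.3f} of S0 * gamma ratio")
```

Because the optimum differs from one carbon to another, a single contact time generally under-represents some sites, which is why a full contact-time curve is needed for quantitative work.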


[Pulse-sequence diagram. ¹H channel: 90° pulse, spin-lock (y), decoupling. X channel: CP contact pulse, FID.]

Figure B1.12.9. Pulse sequence used for CP between two spins (I, S).

Figure B1.12.10. ¹³C CP data from the two carbons in glycine as a function of contact time. The signal for
short contact times is shown in the inset, where the effect of the different $T_{IS}$ values can be clearly seen.

Whilst MAS is effective (at least for spin I = 1/2) in narrowing inhomogeneous lines, for abundant spin systems
with a large magnetic moment such as protons (or fluorine) the homonuclear dipole coupling can be >30 kHz,
too strong to be removed by MAS at currently available spinning speeds. Thus many multiple-pulse sequences
have been designed to remove homonuclear coupling. When combined with MAS they are called
CRAMPS (combined rotation and multiple pulse spectroscopy). The MREV-8 [6] sequence is one of the more
commonly used and robust sequences; however, they all require a special probe because of the high powers
needed, together with a very well tuned and set up spectrometer. Furthermore, to be effective the period of
rotation needs to be much greater than the cycle time of the sequence, so their use is somewhat restricted.

B1.12.4.5 HIGH-RESOLUTION SPECTRA FROM QUADRUPOLAR NUCLEI


Although MAS is very widely applied to non-integer spin quadrupolar nuclei to probe atomic-scale structure
in solids, such as distinguishing AlO₄ and AlO₆ environments [21], simple MAS about a single axis cannot
produce a completely averaged isotropic spectrum. As the second-order quadrupole interaction contains both
second-rank ($\propto 3\cos^2\theta - 1$, the second-order Legendre polynomial $P_2(\cos\theta)$) and
fourth-rank ($\propto 35\cos^4\theta - 30\cos^2\theta + 3$, the fourth-order Legendre polynomial
$P_4(\cos\theta)$) terms, it can be seen from figure B1.12.11 that spinning at 54.7° can only
partially average $35\cos^4\theta - 30\cos^2\theta + 3$. If a characteristic well defined lineshape can be
resolved the NMR interaction parameters can be deduced. Even if overlapping lines are observed, provided a
sufficient number of features can be discerned, fitting, especially when constrained by field variation, will
allow the interactions to be accurately deduced. However, there are still many cases where better resolution
from such nuclei would be extremely helpful, and over the last decade or so there have been many ingenious
approaches to achieve this. Here, four of the main approaches will be briefly examined, namely satellite
transition MAS spectroscopy, double-angle rotation (DOR), dynamic angle spinning (DAS) and multiple quantum
(MQ) MAS. The latter three produce more complete averaging of the interactions by imposing more complex time
dependences on the interactions. DOR and DAS do this directly on the spatial coordinates, whereas MQ also
manipulates the spin part of the Hamiltonian. For each method the background to producing better-resolution
solid-state NMR spectra will be given, and the pros and cons of each detailed. Extensive referencing to general
reviews [22, 23, 24] and reviews of specific techniques is given where more details can be found. Then, by way
of illustration, comparison will be made of these methods applied to ²⁷Al NMR of kyanite.
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Figure Bl.12.11. Angular variation of the second- and fourth-rank Legendre polynomials. 
(A) SATELLITE TRANSITION MAS 

Physical background. MAS will narrow the inhomogeneously broadened satellite transitions to give a series 
of sharp sidebands whose intensity envelopes closely follow the static powder pattern so that the quadrupole 
interaction can be deduced. The work of Samoson [25] gave real impetus to satellite transition spectroscopy 
by showing that both the second-order quadrupolar linewidths and isotropic shifts are functions of $I$ and $m_z$.
Some combinations of $I$ and $m_z$ produce smaller second-order quadrupolar effects on the satellite lines than
for the central transition, thus offering better resolution and more accurate determination of $\delta_{\mathrm{iso}}$. The two
cases where there are distinct advantages of this approach over using the central transition are
$(\pm\frac{3}{2}, \pm\frac{1}{2})$ for $I = \frac{5}{2}$ and $(\pm\frac{5}{2}, \pm\frac{3}{2})$ for $I = \frac{9}{2}$. $^{27}$Al has been the focus of much of the satellite transition work and has been used for a
range of compounds including ordered crystalline, atomically disordered and amorphous solids. The work of 
the group of Jaeger has greatly extended the practical implementation and application of this technique and a 
comprehensive review is given in [26]. 
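
The dependence on $I$ and $m_z$ can be illustrated with a short calculation. The sketch below is not taken from [25]; it assumes that the fourth-rank second-order term of level $m$ scales as $m[18I(I+1) - 34m^2 - 5]$, the form that reproduces the coefficient $C_4(p)$ of equation B1.12.19 for symmetric transitions, and compares each $(m, m-1)$ satellite with the central transition. The function names are hypothetical.

```python
from fractions import Fraction

def level_term(spin, m):
    """Fourth-rank second-order term of level m (common prefactor omitted; assumed form)."""
    return m * (18 * spin * (spin + 1) - 34 * m * m - 5)

def rank4_coefficient(spin, m):
    """Fourth-rank coefficient of the (m, m - 1) transition."""
    return level_term(spin, m) - level_term(spin, m - 1)

def width_ratio(spin, m):
    """Fourth-rank coefficient of the (m, m - 1) satellite relative to the central transition."""
    return rank4_coefficient(spin, m) / rank4_coefficient(spin, Fraction(1, 2))

# Tabulate every satellite transition for the half-integer spins 3/2 to 9/2
for spin in (Fraction(3, 2), Fraction(5, 2), Fraction(7, 2), Fraction(9, 2)):
    m = Fraction(3, 2)
    while m <= spin:
        print(f"I = {spin}: ({m}, {m - 1}) satellite, ratio = {float(width_ratio(spin, m)):+.3f}")
        m += 1
```

Under this assumption the ratio is 7/24 for the $(\frac{3}{2}, \frac{1}{2})$ satellite of $I = \frac{5}{2}$ and 1/18 for the $(\frac{5}{2}, \frac{3}{2})$ satellite of $I = \frac{9}{2}$; for the other satellites listed the magnitude exceeds one half.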

Advantages. The experiment can be carried out with a conventional fast-spinning MAS probe so that it is 
straightforward to implement. For recording the satellite transition lineshapes it offers better signal-to-noise 
and is less susceptible to deadtime effects than static measurements. As the effects differ for each m value, a 
single satellite transition experiment is effectively the same as carrying out multiple field experiments on the 
central transition. 


Disadvantages. The magic angle must be extremely stable and accurately set. The spinning speed must show
good stability over the duration of the experiment. The probe needs to be accurately tuned, and careful
correction for irradiation and detection variations with frequency, and for baseline effects, is necessary. The gain
in resolution only applies to $I = \frac{5}{2}$ and $\frac{9}{2}$.

(B) DOUBLE ANGLE ROTATION

Physical background. DOR offers the most direct approach to averaging the $P_2(\cos\theta)$ and $P_4(\cos\theta)$ terms
simultaneously, by making the spinning axis a continually varying function of time [26]. To achieve this,
DOR uses a spinning rotor (termed the inner rotor) which moves bodily by enclosing it in a spinning outer
rotor, thereby forcing the axis of the inner rotor to describe a complicated but continuous trajectory as a
function of time. The effect of this double rotation is to introduce modulation of the second-order quadrupolar
frequency of the central transition in the laboratory frame of the form

$$
\begin{aligned}
\nu_Q^{(2)}(t) = \frac{C_Q^2}{I^2(2I-1)^2\,\nu_0}\Big[ & A^0(I,m,\eta_Q) + A^2(I,m,\eta_Q)\,P_2(\cos\beta_1)\,P_2(\cos\beta_2)\,F_2(\theta,\phi) \\
 & + A^4(I,m,\eta_Q)\,P_4(\cos\beta_1)\,P_4(\cos\beta_2)\,F_4(\theta,\phi) + \text{terms} \propto \cos(\omega_1 t + \gamma_2)\Big]
\end{aligned}
\tag{B1.12.14}
$$

where $\beta_1$ and $\beta_2$ are the angle between the outer rotor and the magnetic field and the angle between the axes
of the two rotors, respectively, and $\theta$ and $\phi$ describe the orientation of the principal axis system of a crystallite
in the inner rotor. $\omega_1$ is the angular frequency of the outer rotor and $\gamma_2$ an angle describing the position of the
outer rotor in the laboratory frame. More details of the functions $A$ and $F$ can be found by comparison with
equation B1.12.10, equation B1.12.11, equation B1.12.12 and equation B1.12.13 in [24]. $\beta_1$ and $\beta_2$ can be
chosen so that $P_2(\cos\beta_1) = 0$ (54.74°) and $P_4(\cos\beta_2) = 0$ (30.56° or 70.12°).
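
The angles quoted for the two rotor axes are simply the zeros of the Legendre polynomials. As a minimal check, assuming only the standard definitions $P_2(x) = (3x^2 - 1)/2$ and $P_4(x) = (35x^4 - 30x^2 + 3)/8$, the sketch below evaluates the closed-form roots and the residual value of $P_4$ at the magic angle, which is why single-axis MAS leaves the fourth-rank term only partially averaged. The helper names are illustrative.

```python
import math

def p2(x):
    """Second-order Legendre polynomial P2(x)."""
    return 0.5 * (3.0 * x**2 - 1.0)

def p4(x):
    """Fourth-order Legendre polynomial P4(x)."""
    return (35.0 * x**4 - 30.0 * x**2 + 3.0) / 8.0

# P2 vanishes when cos^2(theta) = 1/3
magic = math.degrees(math.acos(math.sqrt(1.0 / 3.0)))

# P4 vanishes when cos^2(theta) = (15 +/- 2*sqrt(30)) / 35
p4_roots = [
    math.degrees(math.acos(math.sqrt((15.0 + sign * 2.0 * math.sqrt(30.0)) / 35.0)))
    for sign in (+1.0, -1.0)
]

print(f"P2 zero: {magic:.2f} deg")
print("P4 zeros:", ", ".join(f"{angle:.2f} deg" for angle in p4_roots))
print(f"P4 at the magic angle: {p4(math.cos(math.radians(magic))):.4f}")
```

At the magic angle $P_4$ equals $-7/18$ rather than zero, so no single spinning axis removes both terms.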

Simulation of the complete DOR spectrum (centreband plus the spinning sidebands) will yield the NMR 
interaction parameters. However, it is most usual to perform the experiment to give improved resolution and 
simply quote the measured peak position, which appears at the sum of the isotropic chemical and second- 
order quadrupole shifts. DOR experiments at more than one applied magnetic field will allow these different 
isotropic contributions to be separated and hence provide an estimate of the quadrupole interaction. This 
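approach is similar to that using the field variation of the centre of gravity of the MAS centreband (equation
B1.12.12) but has the advantage that the narrower, more symmetric line makes determination of the correct
position more precise. For experiments carried out at two magnetic fields where the Larmor frequencies are
$\nu_{01}$ and $\nu_{02}$, with measured DOR peak positions (in ppm) of $\delta_{\mathrm{dor}1}$ and $\delta_{\mathrm{dor}2}$ at the two fields, then

$$\delta_{\mathrm{iso,cs}} = \frac{\nu_{01}^2\,\delta_{\mathrm{dor}1} - \nu_{02}^2\,\delta_{\mathrm{dor}2}}{\nu_{01}^2 - \nu_{02}^2} \tag{B1.12.15}$$

and

$$\frac{3\,[I(I+1) - \frac{3}{4}]}{40\,I^2(2I-1)^2}\,C_Q^2\left(1 + \frac{\eta_Q^2}{3}\right) = \frac{\nu_{01}^2\,\nu_{02}^2\,(\delta_{\mathrm{dor}1} - \delta_{\mathrm{dor}2})}{(\nu_{01}^2 - \nu_{02}^2)\times 10^{6}}. \tag{B1.12.16}$$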
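
The two-field separation can be sketched numerically. The example assumes that the DOR peak position is the isotropic chemical shift plus a second-order quadrupole shift of $-\frac{3}{40}\,[I(I+1)-\frac{3}{4}]\,C_Q^2(1+\eta_Q^2/3)/[I^2(2I-1)^2\nu_0^2]$ (expressed in ppm), which is the relation underlying equations B1.12.15 and B1.12.16. The function names, the combined parameter `p_q` $= C_Q(1+\eta_Q^2/3)^{1/2}$ and the numerical values are hypothetical and serve only as a round-trip check.

```python
def _factor(spin):
    """Spin-dependent factor 3[I(I+1) - 3/4] / (40 I^2 (2I - 1)^2)."""
    return 3.0 * (spin * (spin + 1) - 0.75) / (40.0 * spin**2 * (2 * spin - 1) ** 2)

def dor_position(delta_cs_ppm, p_q_hz, nu0_hz, spin):
    """Forward model: DOR peak position in ppm at Larmor frequency nu0_hz."""
    return delta_cs_ppm - _factor(spin) * (p_q_hz / nu0_hz) ** 2 * 1e6

def dor_two_field(delta1_ppm, nu01_hz, delta2_ppm, nu02_hz, spin):
    """Return (delta_iso_cs in ppm, p_q in Hz) from DOR positions at two fields."""
    a, b = nu01_hz**2, nu02_hz**2
    delta_cs = (a * delta1_ppm - b * delta2_ppm) / (a - b)
    # Right-hand side of the second relation; 1e-6 converts ppm to a fraction
    rhs = a * b * (delta1_ppm - delta2_ppm) * 1e-6 / (a - b)
    return delta_cs, (rhs / _factor(spin)) ** 0.5

# Synthetic I = 5/2 site (made-up values): 10 ppm shift, p_q = 5 MHz
d1 = dor_position(10.0, 5.0e6, 104.26e6, 2.5)
d2 = dor_position(10.0, 5.0e6, 156.39e6, 2.5)
print(dor_two_field(d1, 104.26e6, d2, 156.39e6, 2.5))
```

Because only the product $C_Q^2(1+\eta_Q^2/3)$ enters, two fields give the combined parameter but not $C_Q$ and $\eta_Q$ separately.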

Advantages. DOR works well if the quadrupolar interaction is dominant and the sample is highly crystalline, 
with some extremely impressive gains in resolution. Provided that the correct RF-excitation conditions are 
employed the spectral information is directly quantitative.

Disadvantages. The technique requires investment in a specialized, complex probe-head that requires
considerable experience to use effectively. The relatively slow rotation speed of the large outer rotor can lead
to difficulties in averaging strong homogeneous interactions and produces many closely spaced spectral
sidebands. In disordered solids where there is a distribution of isotropic chemical shifts, quite broad sidebands
can result that may coalesce at the slow rotation rates used. Currently, the maximum actual spinning speed
that can be routinely obtained in the latest system with active computer control of the gas pressures is ~1500
Hz. By the use of synchronous triggering [28] this effectively amounts to a spinning speed of 3 kHz.
Undoubtedly the technology associated with the technique will continue to improve leading to increased 
spinning speeds and thus expanding the application of the technique. Also, the RF coil encloses the whole 
system, and the filling factor is consequently small leading to relatively low sensitivity. The large coil size 
also means that the RF generated is quite low and that double tuning for CP is difficult, although such an 
experiment has been performed. 

(C) DYNAMIC ANGLE SPINNING 

Physical background. DAS is a 2D NMR experiment where the evolution of the magnetization is divided into 
two periods and the sample is spun about a different axis during each period [27, 29]. During the first 
evolution time $t_1$ the sample is spun at an angle of $\theta_1$. The magnetization is then stored along the z-axis and
the angle of the spinning axis is changed to $\theta_2$. After the rotor is stabilized (~30 ms) the magnetization is
brought into the xy-plane again and a signal is acquired. The second-order quadrupole frequency of an
individual crystallite depends on the angle of the spinning axis. So during $t_1$ the quadrupole frequency will be
$\nu_1$, and $\nu_2$ during $t_2$. If $\nu_2$ is of opposite sign to $\nu_1$ the magnetization from the crystallite will be at its starting
position again at some time during $t_2$. One can choose both angles in such a way that the signals from each
individual crystallite will be at the starting position at exactly the same time. In other words, an echo will form
and the effect of the second-order quadrupolar broadening is removed at the point of echo formation. To
achieve this cancellation and form an echo,

$$P_2(\cos\theta_1) = -k\,P_2(\cos\theta_2) \quad\text{and}\quad P_4(\cos\theta_1) = -k\,P_4(\cos\theta_2) \tag{B1.12.17}$$

must both be satisfied, where $k$ is the scaling factor. There is a continuous set of solutions for $\theta_1$ and $\theta_2$, the
so-called DAS complementary angles, and each set has a different scaling factor. For these solutions, the
second-order quadrupole powder pattern at $\theta_1$ is exactly the scaled mirror image of the pattern at $\theta_2$ and an
echo will form at $t_2 = kt_1$. For the combination $\theta_1$ = 30.56°, $\theta_2$ = 70.12° the $P_4(\cos\theta)$ terms are zero and the
scaling factor $k$ = 1.87. A practically favoured combination is $\theta_1$ = 37.38°, $\theta_2$ = 79.19° as the scaling factor $k$
= 1 and the spectra are exact mirror images so that an echo will form at $t_2 = t_1$. There are several ways in
which the DAS spectra can be acquired; one can acquire the entire echo, which means that the resulting 2D
spectrum will be sheared. Some additional processing is then required to obtain an isotropic spectrum in $F_1$.
The acquisition could also start at the position of the echo so that an isotropic spectrum in $F_1$ is obtained
directly. A third possibility is to carry the experiment out as a pseudo 1D experiment where only the top of the
echo is acquired as a function of $t_1$. In this case the isotropic spectrum is acquired directly but there is no
saving in the duration of the experiment. 
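
The complementary angles quoted above can be checked directly against the two conditions of equation B1.12.17. The sketch below assumes the standard Legendre polynomials and, for the $k = 1$ case, uses a closed form worked out here rather than taken from the references: adding the two conditions with $k = 1$ gives $\cos^2\theta_1 + \cos^2\theta_2 = 2/3$ and $\cos^4\theta_1 + \cos^4\theta_2 = 2/5$, so that $\cos^2\theta = 1/3 \pm \sqrt{4/45}$. The function names are illustrative.

```python
import math

def p2(x):
    """Second-order Legendre polynomial P2(x)."""
    return 0.5 * (3.0 * x**2 - 1.0)

def p4(x):
    """Fourth-order Legendre polynomial P4(x)."""
    return (35.0 * x**4 - 30.0 * x**2 + 3.0) / 8.0

def das_check(theta1_deg, theta2_deg):
    """Return (k, residual): k from the P2 condition and P4(cos t1) + k * P4(cos t2)."""
    c1 = math.cos(math.radians(theta1_deg))
    c2 = math.cos(math.radians(theta2_deg))
    k = -p2(c1) / p2(c2)
    return k, p4(c1) + k * p4(c2)

def das_pair_k1():
    """Complementary angles (degrees) for k = 1 from cos^2(theta) = 1/3 +/- sqrt(4/45)."""
    root = math.sqrt(4.0 / 45.0)
    return tuple(
        math.degrees(math.acos(math.sqrt(1.0 / 3.0 + sign * root))) for sign in (+1.0, -1.0)
    )

theta1, theta2 = das_pair_k1()
print(f"k = 1 pair: {theta1:.2f} deg, {theta2:.2f} deg, check = {das_check(theta1, theta2)}")
print(f"P4-zero pair: k = {das_check(30.56, 70.12)[0]:.2f}")
```

A vanishing residual confirms that both conditions hold with the same $k$, which is what makes the second powder pattern a scaled mirror image of the first.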

Advantages. Compared to DOR, a small rotor can be used allowing relatively fast spinning speeds; high RF 
powers can be attained and if the coil is moved with the rotor a good filling factor can be obtained. In the 
isotropic dimension high-resolution spectra are produced and the second dimension retains the anisotropic 
information.

Disadvantages. Again, a specialized probe-head is necessary to allow the rotor axis to be changed, usually
using a stepper motor and a pulley system. Angle switching needs to be as fast as possible and reproducible.
For an RF coil that changes orientation with the rotor the RF field will vary with the angle and the RF leads
need to be flexible and resistant to metal fatigue. A major limitation of the DAS technique is that it cannot be
used on compounds with a short $T_1$ because of the time needed to reorient the spinning axis. If magnetization
is lost through $T_1$ and/or spin diffusion effects the signal can become very weak. This factor has meant that
DAS has been most useful on $^{17}$O where the dipole-dipole interaction is weak but has not had the impact on
NMR of nuclei such as $^{27}$Al that it might otherwise have had.

(D) MQMAS 

Physical background. Relatively recently (1995) a new experiment emerged that has already had a great 
impact on solid-state NMR spectroscopy of quadrupolar nuclei [30]. The 2D multiple quantum magic angle 
spinning (2D MQMAS) experiment greatly enhances resolution of the spectra of half-integer spin quadrupolar
nuclei. Basically, this experiment correlates the $(m, -m)$ multiple quantum transition to the $(\frac{1}{2}, -\frac{1}{2})$ transition.

The resolution enhancement stems from the fact that the quadrupole frequencies for both transitions are 
correlated. At specific times the anisotropic parts of the quadrupole interaction are refocused and an echo 
forms. The frequency of an $(m, -m)$ transition is given by

$$\nu_p = C_0(p)\,\nu_0^Q - \frac{7}{18}\,C_4(p)\,\nu_4^Q(\theta,\phi) \tag{B1.12.18}$$

where $p = 2m$ is the order of the coherence, $p = 1$ for the $(\frac{1}{2}, -\frac{1}{2})$ transition, $p = 3$ for the $(\frac{3}{2}, -\frac{3}{2})$ transition, etc.
The coefficients are defined as

$$C_0(p) = p\left[I(I+1) - \tfrac{3}{4}p^2\right], \qquad C_4(p) = p\left[18I(I+1) - \tfrac{17}{2}p^2 - 5\right]. \tag{B1.12.19}$$

The isotropic part $\nu_0^Q$ of the quadrupole frequency is

$$\nu_0^Q = -\frac{\nu_Q^2\,(3 + \eta_Q^2)}{90\,\nu_0}\ \text{[in Hz]} \tag{B1.12.20}$$

and the anisotropic part $\nu_4^Q(\theta,\phi)$ is given by

$$\nu_4^Q(\theta,\phi) = \frac{\nu_Q^2}{112\,\nu_0}\left[\frac{7}{18}(3 - \eta_Q\cos 2\phi)^2\sin^4\theta + \left(2\eta_Q\cos 2\phi - 4 - \frac{2}{9}\eta_Q^2\right)\sin^2\theta + \frac{2}{45}\eta_Q^2 + \frac{4}{5}\right]. \tag{B1.12.21}$$

Numerous schemes exist that are used to obtain 2D MQMAS spectra [24]. The simplest form of the
experiment is when the MQ transition is excited by a single, high-power RF pulse, after which the MQ 
coherence is allowed to evolve for a time $t_1$ (figure B1.12.12). After the evolution time, a second pulse is
applied which converts the MQ coherence into a $p = -1$ coherence which is observed during $t_2$. The signal is
then acquired immediately after the second pulse and the echo will form at a time $t_2 = |QA|\,t_1$. Both pulses are
non-selective and will excite all coherences to a varying degree. Selection of the coherences of interest is 
achieved by cycling the pulse sequence through the appropriate phases. Phase cycles can be easily worked out
by noting that an RF phase shift of $\phi$ degrees is seen as a phase shift of $p\phi$ degrees for a $p$-quantum coherence
[31]. After a 2D Fourier transformation the resonances will show up as ridges lying along the quadrupole
anisotropy (QA) axis. The isotropic spectrum can be obtained by projection of the entire 2D spectrum on a
line through the origin ($\nu_1 = \nu_2 = 0$) perpendicular to the QA axis. Figure B1.12.12 shows some of the many
different pulse sequences and their coherence pathways that can be used for the 2D MQMAS experiment.

Figure B1.12.12. Pulse sequences used in multiple quantum MAS experiments and their coherence pathways
for (a) two-pulse, (b) z-filter, (c) split-$t_1$ with z-filter and (d) RIACT (II).
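
The position of the echo can be worked out from the coefficients alone. The sketch below assumes the standard forms $C_0(p) = p[I(I+1) - \frac{3}{4}p^2]$ and $C_4(p) = p[18I(I+1) - \frac{17}{2}p^2 - 5]$, and it assumes that QA is the ratio $C_4(p)/C_4(1)$, which is what cancellation of the anisotropic term accumulated during $t_1$ and $t_2$ implies. The function names are hypothetical.

```python
from fractions import Fraction

def c0(spin, p):
    """Isotropic coefficient C0(p) of a symmetric p-quantum transition."""
    return p * (spin * (spin + 1) - Fraction(3, 4) * p * p)

def c4(spin, p):
    """Fourth-rank coefficient C4(p) of a symmetric p-quantum transition."""
    return p * (18 * spin * (spin + 1) - Fraction(17, 2) * p * p - 5)

def echo_ratio(spin, p):
    """Ratio C4(p)/C4(1); its magnitude is t2/t1 at the point of refocusing."""
    return c4(spin, p) / c4(spin, 1)

for spin, p in ((Fraction(3, 2), 3), (Fraction(5, 2), 3), (Fraction(5, 2), 5)):
    print(f"I = {spin}, p = {p}: C4(p)/C4(1) = {echo_ratio(spin, p)}")
```

The sign of the ratio indicates which coherence pathway ($+p$ or $-p$) gives an echo rather than an anti-echo; its magnitude is 7/9 for triple-quantum $I = \frac{3}{2}$ and 19/12 for triple-quantum $I = \frac{5}{2}$.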

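The isotropic shift and the quadrupole-induced shift (QIS) can easily be obtained from the data. The QIS is
different in the two dimensions, however, and is given by

$$\delta_{\mathrm{qis}}^{p} = \frac{C_0(p)\,\nu_0^Q}{p\,\nu_0}\times 10^{6}\ \text{[in ppm]}. \tag{B1.12.22}$$

$\delta_{\mathrm{cg}}^{p}$, the position of the centre of gravity with $p = 1$ for the single quantum dimension and $p = \Delta m$ for the
multiple quantum dimension, is given by

$$\delta_{\mathrm{cg}}^{p} = \delta_{\mathrm{iso}} + \delta_{\mathrm{qis}}^{p}. \tag{B1.12.23}$$

Hence the isotropic shift can be easily retrieved using

$$\delta_{\mathrm{iso}} = \frac{C_0(p)\,\delta_{\mathrm{cg}}^{p=1} - p\,C_0(1)\,\delta_{\mathrm{cg}}^{p}}{C_0(p) - p\,C_0(1)}\ \text{[in ppm]} \tag{B1.12.24}$$

and the isotropic quadrupolar shift by

$$\nu_0^Q = \frac{(\delta_{\mathrm{cg}}^{p=1} - \delta_{\mathrm{cg}}^{p})\,\nu_0\times 10^{-6}}{C_0(1) - C_0(p)/p}\ \text{[in Hz]}. \tag{B1.12.25}$$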
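
As a consistency check on this extraction, the following sketch generates centres of gravity for a synthetic site and then recovers the input parameters. It assumes that the centre of gravity in each dimension is $\delta_{\mathrm{iso}} + C_0(p)\,\nu_0^Q \times 10^6/(p\,\nu_0)$ with $C_0(p) = p[I(I+1) - \frac{3}{4}p^2]$; the function names and numerical values are hypothetical.

```python
def c0(spin, p):
    """Isotropic coefficient C0(p) = p[I(I+1) - (3/4) p^2]."""
    return p * (spin * (spin + 1) - 0.75 * p * p)

def centre_of_gravity(delta_iso_ppm, nu0q_hz, spin, p, nu0_hz):
    """Forward model: centre of gravity (ppm) in the p-quantum dimension."""
    return delta_iso_ppm + c0(spin, p) * nu0q_hz / (p * nu0_hz) * 1e6

def mqmas_extract(delta_cg_sq, delta_cg_mq, spin, p, nu0_hz):
    """Return (delta_iso in ppm, nu_0^Q in Hz) from the two centres of gravity (ppm)."""
    delta_iso = (c0(spin, p) * delta_cg_sq - p * c0(spin, 1) * delta_cg_mq) / (
        c0(spin, p) - p * c0(spin, 1)
    )
    nu0q = (delta_cg_sq - delta_cg_mq) * nu0_hz * 1e-6 / (c0(spin, 1) - c0(spin, p) / p)
    return delta_iso, nu0q

# Synthetic I = 5/2 triple-quantum example (made-up values)
sq = centre_of_gravity(8.0, -500.0, 2.5, 1, 104.26e6)
mq = centre_of_gravity(8.0, -500.0, 2.5, 3, 104.26e6)
print(mqmas_extract(sq, mq, 2.5, 3, 104.26e6))
```

The round trip returns the 8.0 ppm and -500 Hz used to build the data, to within floating-point rounding, which shows that the two expressions are mutually consistent under the stated assumption.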

Shearing of the data is performed to obtain isotropic spectra in the $F_1$ dimension and to facilitate easy
extraction of the 1D slices for different peaks. Shearing is a projection of points that lie on a line with a slope
equal to the anisotropy axis onto a line that is parallel to the $F_2$ axis [24]. Shearing essentially achieves the
same as the split-$t_1$ experiment or delayed acquisition of the echo. Although sheared spectra may look more
attractive, they do not add any extra information and they are certainly not necessary for the extraction of QIS
and $\delta_{\mathrm{iso}}$ values.

Advantages. The experiment can be readily carried out with a conventional probe-head, although the fastest 
spinning and highest RF powers available are useful. The pulse sequences are relatively easy to set up 
(compared to DAS and DOR) and the results are usually quite straightforward to interpret in terms of the 
number of sites and determination of the interactions. 

Disadvantages. A researcher new to the subject will be confronted with a large number of schemes for 
collecting, processing and presenting the data which can be very confusing. The relationship between the 
measured peak positions and the NMR interaction parameters crucially depends on the processing and 
referencing conventions adopted. There is one clear distinction between the two main approaches: either the 
MQ evolution is regarded as having taken place only in the evolution time ($t_1$), or the period up to the echo is
also regarded as being part of the evolution time which is then $(1 + QA)t_1$. A detailed critique of these two
approaches and the consequences of adopting each has recently been given by Man [32]. Shearing data 
introduces an extra processing step, which may introduce artefacts. The key point in determining the 
quadrupole parameters is the accuracy of measuring the position of the centre of gravity of the resonance. 
Both the excitation efficiency as well as significant intensity in the spinning sidebands can adversely affect 
the accuracy of determining the centre of gravity. MQ spectra show a strong dependence on excitation 
efficiency that is strongly dependent on the value of $C_Q$ relative to the RF field strength. This means that often
the spectra are non-quantitative and sites with a large $C_Q$ can be lost completely from the spectrum.

(E) APPLICATION OF HIGH-RESOLUTION METHODS TO $^{27}$Al IN KYANITE

Kyanite, a polymorph of Al$_2$SiO$_5$, has four distinct AlO$_6$ sites. The crystal structure and site geometries (e.g.
Al-O bond lengths) are well characterized. The quadrupole interactions vary from 3.6 to 10.1 MHz, providing
a good test of the different approaches to achieving high-resolution NMR spectra from quadrupolar nuclei.
Both single-crystal studies [33] and NQR [34] have accurately determined the quadrupole interaction
parameters. MAS studies of the central transition have been reported from 7 to 18.8 T [35, 36, 37 and 38]. An
example of a 17.55 T spectrum is shown in figure B1.12.13(a) along with a simulation of the individual
components and their sum. It can be clearly seen that at even this high field there is extensive overlap of the 
four components. However, many distinct spectral features exist that constrain the simulation and these can be 
followed as a function of applied magnetic field. As the field is increased the second-order quadrupole effects 
decrease. Comparing simulations at 4.7 and 9.4 T (figure B1.12.13(b)) the field variation simply scales the
width of the spectrum, being a factor of two narrower (in hertz) at the higher field. By extending the 
simulation to higher fields, 18 T can be seen to be a poor choice as there is considerable overlap of the 
resonances with virtually no shift dispersion between the sites. 135.3 T would provide a completely resolved 
spectrum under simple MAS.

Figure B1.12.13. $^{27}$Al MAS NMR spectra from kyanite (a) at 17.55 T along with the complete simulation and
the individual components, (b) simulation of centreband lineshapes of kyanite as a function of applied
magnetic field, and the satellite transitions showing (c) the complete spinning sideband manifold and (d) an
expansion of individual sidebands and their simulation.

Satellite transition MAS NMR provides an alternative method for determining the interactions. The intensity
envelope of the spinning sidebands is dominated by site A2 (using the crystal structure nomenclature) which
has the smallest $C_Q$, resulting in the intensity for the transitions of this site being spread over the smallest
range ($\propto C_Q$), and will have the narrowest sidebands ($\propto C_Q^2/\nu_0$) of all the sites (figure B1.12.13(c)) [37]. The
simulation of this envelope provides additional constraints on the quadrupole interaction parameters for this
site. Expanding the sidebands (shown for the range 650 to 250 ppm in figure B1.12.13(d)) reveals distinct
second-order lineshapes with each of the four sites providing contributions from the $(\pm\frac{3}{2}, \pm\frac{1}{2})$ and $(\pm\frac{5}{2}, \pm\frac{3}{2})$
transitions. The improved resolution provided by the $(\pm\frac{3}{2}, \pm\frac{1}{2})$ transition compared to the central transition is
clear. For the A2 site both of the contributing transitions are clearly seen while for the other three only the
$(\pm\frac{3}{2}, \pm\frac{1}{2})$ contribution is really observable until the vertical scale is increased by six, which then shows the outer
transition for all sites, especially A3. The simulation of all three transitions provides an internal check on the
interaction parameters for each site.

DOR provides significantly higher resolution than MAS [37, 39]. At 11.7 T a series of relatively narrow
resonances and accompanying sidebands are observed under DOR (figure B1.12.14(a)). The relatively slow
spinning speed of the outer rotor results in numerous sidebands and the isotropic line is identified by
collecting spectra at several different spinning speeds. If the isotropic position is then collected as a function
of $B_0$ the quadrupole interaction parameters can then be deduced. MQ MAS provides an alternative approach
for producing high resolution [38, 40], with the whole 2D data set shown at 9.4 T along with the isotropic
projections at 11.7 and 18.8 T (figure B1.12.14(b)) [38]. All three isotropic triple quantum (3Q) projections
show only three resolved lines, as the NMR parameters of the two sites (A1, A4) with the largest $C_Q$ mean
that their resonances are superimposed at all fields. This is confirmed by the 9.4 T 3Q data where an RF field
of 280 kHz was employed making the data more quantitative: the three resonances with isotropic shifts of
43.0, 21.1 and 8.0 ppm had intensities of 2.1:1.1:1.0 respectively. At 11.7 and 18.8 T the MQ MAS NMR
spectra collected are not quantitative since the RF fields to excite the 3Q transitions were not strong enough.
For $I = \frac{5}{2}$ the isotropic quadrupolar shifts are $-\frac{10}{17}$ of the value compared to direct MAS at the same field, so MQ data
effectively produce results at a 'negative' applied magnetic field, thereby more strongly constraining the NMR
interaction parameters deduced from isotropic shift against $B_0^{-2}$ plots.

Figure B1.12.14. (a) $^{27}$Al DOR NMR spectrum of kyanite at 11.7 T at various spinning speeds and (b) the
$^{27}$Al MQ MAS NMR spectrum of kyanite at 9.4 T (top) along with the isotropic projections at 11.7 and 18.8
T. (The MQ MAS NMR data are taken from [38] with the permission of Academic Press.)

When $^{27}$Al MAS NMR spectra were first collected at moderate $B_0$ with relatively slow MAS rates (<4 kHz)
there was much confusion about the quantitative integrity of such spectra. In fact, provided the correct
excitations are employed (i.e. $\nu_1 \gg \nu_r$ and $2\pi\nu_1\tau \ll 1$) all that is necessary to make MAS NMR spectra
quantitative is to know what fraction of the different transitions are contributing to the centreband. In practice
this usually amounts to estimating the fraction of the $(\frac{1}{2}, -\frac{1}{2})$ transition that contributes to the centreband,
which depends on $\nu_Q^2/\nu_0\nu_r$ [41]. At 17.55 T the intensity distribution between the sites in the centreband was
23%:27%:26%:24% (A1:A2:A3:A4). Once the correction factors are taken into account, within experimental
error all sites are equally populated. DAS is not reported for kyanite because of its shortcomings for
fast-relaxing nuclei such as $^{27}$Al. The data on kyanite demonstrate how accurately the interaction parameters and
intensities can be extracted for quadrupole nuclei by using a combination of advanced techniques.
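
The 'negative field' behaviour of the triple-quantum isotropic projection for $I = \frac{5}{2}$ can be reproduced with a few lines of arithmetic. The sketch assumes the standard coefficients $C_0(p) = p[I(I+1) - \frac{3}{4}p^2]$ and $C_4(p) = p[18I(I+1) - \frac{17}{2}p^2 - 5]$ and forms the anisotropy-free combination of the $p$-quantum and single-quantum frequencies; this construction and the function names are illustrative and are not taken from [38].

```python
from fractions import Fraction

def c0(spin, p):
    """Isotropic coefficient C0(p)."""
    return p * (spin * (spin + 1) - Fraction(3, 4) * p * p)

def c4(spin, p):
    """Fourth-rank coefficient C4(p)."""
    return p * (18 * spin * (spin + 1) - Fraction(17, 2) * p * p - 5)

def isotropic_qis_factor(spin, p):
    """Quadrupolar shift of the isotropic projection relative to the MAS value.

    The combination nu(p) - R * nu(1), with R = C4(p)/C4(1), has no fourth-rank
    term; it carries (p - R) times the chemical shift and (C0(p) - R * C0(1))
    times the isotropic quadrupole frequency.
    """
    r = c4(spin, p) / c4(spin, 1)
    return (c0(spin, p) - r * c0(spin, 1)) / ((p - r) * c0(spin, 1))

print(isotropic_qis_factor(Fraction(5, 2), 3))
```

The negative sign means that the quadrupolar contribution moves the isotropic MQ peak in the opposite sense to the MAS centre of gravity, so the two data sets bracket the chemical shift in a plot against $B_0^{-2}$.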

B1.12.4.6 DISTANCE MEASUREMENTS, DIPOLAR SEQUENCES AND CORRELATION EXPERIMENTS

The dipolar coupling between two nuclei $I$, $S$ is $(\mu_0/4\pi)(h/2\pi)\gamma_I\gamma_S r_{IS}^{-3}$; thus in systems containing pairs of
nuclei measurement of the dipolar coupling will give the distance between them. The simplest pulse sequence
for doing this, SEDOR (spin echo double resonance) [2], is shown in figure B1.12.15(a). A normal 'spin
echo' sequence is applied to spin $I$ which refocuses the heteronuclear dipole coupling and the chemical shift
anisotropy producing an echo at $2\tau$. However, if a 180° pulse is applied to the $S$ spin system it will invert the
sign of any $IS$ dipole coupling and reduce the echo intensity. By varying the 180° pulse position a set of
difference signals can be used to determine the dipolar coupling and hence the internuclear distance for an
isolated spin pair system. The SEDOR sequence is only useful for static samples and the range of distances
accessible is restricted by $T_2$; however, it can give useful qualitative information in systems where the spins
are not isolated pairs.

MAS averages the heteronuclear dipole interaction giving a much increased resolution and increased $T_2$. The
REDOR (rotational echo double resonance) [42] sequence enables the heteronuclear dipole coupling to be
measured for a spinning sample. There are several versions of this experiment; in one (figure B1.12.15(b)) a
rotor-synchronized echo is applied to $I$ whilst a series of rotor-synchronized 180° pulses are applied to $S$. As
with SEDOR, the attenuated signal is subtracted from the normal echo signal to determine the REDOR
fraction. In suitably labelled systems dipolar couplings as small as 25 Hz have been measured corresponding
to a distance of 6.7 Å [43].
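
The inverse-cube relation between coupling and distance is easy to evaluate. The sketch below computes the dipolar coupling constant in hertz, $(\mu_0/4\pi)\hbar\gamma_I\gamma_S/(2\pi r^3)$, and inverts it to give a distance. The nucleus pair behind the 25 Hz figure is not stated in the text, so the $^{13}$C-$^{15}$N pair used here is purely illustrative; the small table of gyromagnetic ratio magnitudes and the function names are assumptions of this example.

```python
import math

MU0_OVER_4PI = 1.0e-7      # T m A^-1
HBAR = 1.054571817e-34     # J s

# Magnitudes of gyromagnetic ratios in rad s^-1 T^-1 (illustrative table)
GAMMA = {"1H": 2.6752219e8, "13C": 6.728284e7, "15N": 2.7126e7, "31P": 1.08394e8}

def dipolar_coupling_hz(nuc_i, nuc_s, r_angstrom):
    """Dipolar coupling constant in Hz for two nuclei separated by r (in angstrom)."""
    r = r_angstrom * 1.0e-10
    return MU0_OVER_4PI * HBAR * GAMMA[nuc_i] * GAMMA[nuc_s] / (2.0 * math.pi * r**3)

def distance_angstrom(nuc_i, nuc_s, coupling_hz):
    """Invert the r^-3 dependence to obtain the internuclear distance in angstrom."""
    return (dipolar_coupling_hz(nuc_i, nuc_s, 1.0) / coupling_hz) ** (1.0 / 3.0)

d_cn = dipolar_coupling_hz("13C", "15N", 1.5)
print(f"13C-15N at 1.5 angstrom: {d_cn:.0f} Hz")
print(f"distance recovered: {distance_angstrom('13C', '15N', d_cn):.2f} angstrom")
```

Because the coupling falls off as $r^{-3}$, halving the smallest measurable coupling extends the accessible distance by only about 26%.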

TEDOR (transferred echo double resonance) [44, 45] is another experiment for measuring internuclear 
distances whilst spinning. In this (figure B1.12.15(c)) the $S$ spin is observed. There is first a REDOR sequence
on the $I$ spin, the magnetization is then transferred by applying 90° pulses on both spins and this is followed
by a REDOR sequence on the S spins. Both REDOR and TEDOR require accurate setting of the 180° pulse 
and good long-term spectrometer stability to be effective. An advantage of the TEDOR sequence is that there 
is no background signal from uncoupled spins; however, the theoretical maximum efficiency is 50% giving a 
reduced signal. Whilst both REDOR and TEDOR can work well for spin $I$, $S = \frac{1}{2}$ pairs, for quadrupolar nuclei
the situation is more complex [24]. The echo sequence must be applied to the quadrupolar nucleus if only one
of the pair has spin $I > \frac{1}{2}$ since accurate inversion is rarely possible for spins $> \frac{1}{2}$. A sequence designed
specifically for a quadrupolar nucleus is TRAPDOR (figure B1.12.15(d)) in which the dephasing spin is
always quadrupolar. One cannot obtain accurate values of the dipolar coupling with this sequence but it can
be used to give qualitative information about spatial proximity. There are many other pulse sequences
designed to give distance information via the dipole-dipole interaction, most of limited applicability. One,
designed specifically for homonuclei, that works well in labelled compounds is DRAMA [46] and a
comparison of the more popular homonuclear sequences is given in [47].

Figure B1.12.15. Some double-resonance pulse sequences for providing distance information in solids: (a)
SEDOR, (b) REDOR, (c) TEDOR and (d) TRAPDOR. In all sequences the narrow pulses are 90° and the
wide pulses 180°. For sequences that employ MAS the number of rotor cycles ($N_c$) is shown along the
bottom.

In liquid-state NMR two-dimensional correlation of spectra has provided much useful information. However,
in solids the linewidths ($T_2$), even under MAS, are usually such that experiments like COSY and
INADEQUATE which rely on through-bond J coupling cannot be used (although there are some notable
exceptions). The most commonly used heteronuclear correlation (HETCOR) experiments in solids all rely on
dipolar coupling and can thus be complicated by less local effects. Nevertheless, they are able to correlate
specific sites in the MAS spectrum of one nucleus with sites of the second nucleus which are nearby. The
simplest versions of the experiment are just a two-dimensional extension of CP in which the pulse that
generates magnetization is separated from the matching pulses by a time which is incremented to give the
second dimension. An example of the usefulness of the technique is shown in figure B1.12.16 where the
$^1$H-$^{31}$P correlations of two inorganic hydrated phosphates, (a) brushite and (b) a bone, are shown [48]. In both
cases two phosphorus sites that completely overlap in the 1D MAS spectrum are clearly visible in the 2D
spectrum.

Figure B1.12.16. $^1$H-$^{31}$P HETCOR NMR spectra from (a) brushite and (b) bone. (Adapted from [48] with
the permission of Academic Press.)

B1.13 NMR relaxation rates

Jozef Kowalewski 


B1.13.1 INTRODUCTION 

Nuclear magnetic resonance (NMR) spectroscopy deals with the interactions among nuclear magnetic 
moments (nuclear spins) and between the spins and their environment in the presence of a static magnetic 
field $B_0$, probed by radiofrequency (RF) fields. The couplings involving the spins are relatively weak which
results, in particular in low-viscosity liquids, in narrow lines and spectra with a rich structure. The NMR 
experiments are commonly carried out in the time domain: the spins are manipulated by sequences of RF 
pulses and delays, creating various types of non-equilibrium spin state, and the NMR signal corresponding to 
magnetization components perpendicular to $B_0$ is detected as a function of time (the free induction decay,
FID). In terms of the kinetics, the weakness of the interactions results in slow decays (typically milliseconds 
to seconds) of non-equilibrium states. The recovery processes taking the ensembles of nuclear spins back to 
equilibrium and some related phenomena, are called nuclear spin relaxation. The relaxation behaviour of 
nuclear spin systems is both an important source of information about the molecular structure and dynamics 
and a factor to consider in optimizing the design of experiments. 

The concept of 'relaxation time' was introduced to the vocabulary of NMR in 1946 by Bloch in his famous
equations of motion for nuclear magnetization vector $\mathbf{M}$ [1]:

$$\frac{\mathrm{d}M_{x,y}}{\mathrm{d}t} = \gamma(\mathbf{M}\times\mathbf{B})_{x,y} - \frac{M_{x,y}}{T_2}, \qquad \frac{\mathrm{d}M_z}{\mathrm{d}t} = \gamma(\mathbf{M}\times\mathbf{B})_z - \frac{M_z - M_0}{T_1}. \tag{B1.13.1}$$

The phenomenological Bloch equations assume the magnetization component along $B_0$ (the longitudinal
magnetization $M_z$) to relax exponentially to its equilibrium value, $M_0$. The time constant for the process is
called the spin-lattice or longitudinal relaxation time, and is denoted $T_1$. The magnetization components
perpendicular to $B_0$ (the transverse magnetization, $M_x$ and $M_y$) are also assumed to relax in an exponential
manner to their equilibrium value of zero. The time constant for this process is called the spin-spin or
transverse relaxation time and is denoted $T_2$. The inverse of a relaxation time is called a relaxation rate.
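
The exponential behaviour assumed by the Bloch equations can be written out explicitly. The following is a minimal sketch for free relaxation only, i.e. with no RF field applied and with the precession about $B_0$ ignored (equivalently, viewed in the rotating frame on resonance); the function names and the numerical values are illustrative.

```python
import math

def longitudinal(t, t1, m0=1.0, mz_start=0.0):
    """M_z(t): exponential recovery from mz_start towards the equilibrium value m0."""
    return m0 + (mz_start - m0) * math.exp(-t / t1)

def transverse(t, t2, mxy_start=1.0):
    """Magnitude of the transverse magnetization: exponential decay towards zero."""
    return mxy_start * math.exp(-t / t2)

# After inversion (M_z = -M_0), follow the recovery for T1 = 2 s and T2 = 0.5 s
for t in (0.0, 1.0, 2.0, 5.0):
    print(f"t = {t:.1f} s: Mz = {longitudinal(t, 2.0, mz_start=-1.0):+.3f}, "
          f"Mxy = {transverse(t, 0.5):.3f}")
```

The relaxation rates are simply the reciprocals, here $1/T_1 = 0.5$ s$^{-1}$ and $1/T_2 = 2$ s$^{-1}$.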

The first microscopic theory for the phenomenon of nuclear spin relaxation was presented by Bloembergen, 
Purcell and Pound (BPP) in 1948 [2]. They related the spin-lattice relaxation rate to the transition 
probabilities between the nuclear spin energy levels. The BPP paper constitutes the foundation on which most 
of the subsequent theory has been built, but contains some faults which were corrected by Solomon in 1955 
[3]. Solomon noted also that a correct description of even a very simple system containing two interacting 
spins, requires introducing the concept of cross-relaxation, or magnetization exchange between the spins. The 
subsequent development has been rich and the goal of this entry is to provide a flavour of relaxation theory 
(section B1.13.2), experimental techniques (section B1.13.3) and applications (section B1.13.4).


The further reading list for this entry contains five monographs, a review volume and two extensive reviews 
from the early 1990s. The monographs cover the basic NMR theory and the theoretical aspects of NMR 
relaxation. The review volume covers many important aspects of modern NMR experiments in general and 
relaxation measurements in particular. The two reviews contain more than a thousand references to 
application papers, mainly from the eighties. The number of literature references provided in this entry is 
limited and, in particular in the theory and experiments sections, priority is given to reviews rather than to 
original articles. 


B1.13.2 RELAXATION THEORY 

We begin this section by looking at the Solomon equations, which are the simplest formulation of the 
essential aspects of relaxation as studied by NMR spectroscopy of today. A more general Redfield theory is 
introduced in the next section, followed by the discussion of the connections between the relaxation and 
molecular motions and of physical mechanisms behind the nuclear relaxation. 

B1.13.2.1 THE SOLOMON THEORY FOR A TWO-SPIN SYSTEM

Let us consider a liquid consisting of molecules containing two nuclei with the spin quantum number 1/2. We
denote the two spins $I$ and $S$ and assume that they are distinguishable, i.e. they either belong to different nuclear
species or have different chemical shifts. A system of two such spins is characterized by four energy levels
and by a set of transition probabilities between the levels, cf. figure B1.13.1. We assume at this stage that the
two spins interact with each other only by the dipole-dipole (DD) interaction. The DD interaction depends on
the orientation of the internuclear axis with respect to $B_0$ and is thus changed (and averaged to zero on a
sufficiently long time scale) by molecular tumbling. Taking this motional variation into consideration, it is
quite straightforward to use time-dependent perturbation theory to derive transition probabilities between the
pairs of levels in figure B1.13.1. Briefly, the transition probabilities are related to spectral density functions
(vide infra), which measure the intensity of the local magnetic fields fluctuating at the frequencies
corresponding to the energy level differences in figure B1.13.1. Solomon [3] showed that the relaxation of
the longitudinal magnetization components, proportional to the expectation values of the $I_z$ and $S_z$ operators, is
related to the populations of the four levels and can be described by a set of two coupled equations:


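$$
\begin{aligned}
\frac{\mathrm{d}\langle I_z\rangle}{\mathrm{d}t} &= -(W_0 + 2W_{1I} + W_2)(\langle I_z\rangle - I_z^0) - (W_2 - W_0)(\langle S_z\rangle - S_z^0) \\
\frac{\mathrm{d}\langle S_z\rangle}{\mathrm{d}t} &= -(W_2 - W_0)(\langle I_z\rangle - I_z^0) - (W_0 + 2W_{1S} + W_2)(\langle S_z\rangle - S_z^0)
\end{aligned}
\tag{B1.13.2}
$$

or

$$
\begin{aligned}
\frac{\mathrm{d}\langle I_z\rangle}{\mathrm{d}t} &= -\rho_I(\langle I_z\rangle - I_z^0) - \sigma_{IS}(\langle S_z\rangle - S_z^0) \\
\frac{\mathrm{d}\langle S_z\rangle}{\mathrm{d}t} &= -\sigma_{IS}(\langle I_z\rangle - I_z^0) - \rho_S(\langle S_z\rangle - S_z^0).
\end{aligned}
\tag{B1.13.3}
$$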


$I_z^0$ and $S_z^0$ are the equilibrium magnetizations of the two spins and $\rho_I$ and $\rho_S$ are the corresponding decay rates
(spin-lattice relaxation rates). The symbol $\sigma_{IS}$ denotes the cross-relaxation rate. The general solutions of
equation B1.13.2 or equation B1.13.3 for $\langle I_z\rangle$ or $\langle S_z\rangle$ are sums of two exponentials, i.e. the longitudinal
magnetizations in a two-spin system do not follow the Bloch equations. Solomon also demonstrated that the
simple exponential relaxation behaviour of the longitudinal magnetization is recovered under certain
limiting conditions.

Figure B1.13.1. Energy levels and transition probabilities for an IS spin system. (Reproduced by permission
of Academic Press from Kowalewski J 1990 Annu. Rep. NMR Spectrosc. 22 308-414.)

(i) The two spins are identical (e.g. the two proton spins in a water molecule). We then have $W_{1I} = W_{1S} = W_1$
and the spin-lattice relaxation rate is given by:

$$
T_1^{-1} = 2(W_1 + W_2). \tag{B1.13.4}
$$


(ii) One of the spins, say $S$, is characterized by another, faster, relaxation mechanism. We can then say that the
$S$ spin remains in thermal equilibrium on the time scale of the $I$-spin relaxation. This situation occurs in
paramagnetic systems, where $S$ is an electron spin. The spin-lattice relaxation rate for the $I$ spin is then given
by:

$$
\rho_I = T_{1I}^{-1} = W_0 + 2W_{1I} + W_2. \tag{B1.13.5}
$$

(iii) One of the spins, say $S$ again, is saturated by an intense RF field at its resonance frequency. These are the
conditions applying for e.g. carbon-13 (treated as the $I$ spin) under broad-band decoupling of protons ($S$
spins). The relaxation rate is also in this case given by equation B1.13.5. In addition, we then observe the
phenomenon referred to as the (hetero)nuclear Overhauser enhancement (NOE), i.e. the steady-state solution
for $\langle I_z\rangle$ is modified to

$$
\langle I_z\rangle_{\text{steady state}} = I_z^0 + \frac{\sigma_{IS}}{\rho_I} S_z^0. \tag{B1.13.6}
$$
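
The biexponential behaviour of the Solomon equations and limiting case (i) can be checked numerically. The following is a minimal sketch, not part of the original treatment: the function names are hypothetical and the transition probabilities are made-up illustrative numbers. It builds the 2x2 relaxation matrix of equation B1.13.3 from the $W$ values and diagonalizes it to obtain the two decay rates.

```python
import numpy as np

def solomon_matrix(w0, w1i, w1s, w2):
    """Relaxation matrix acting on the deviations (<Iz> - Iz0, <Sz> - Sz0)."""
    rho_i = w0 + 2.0 * w1i + w2   # decay rate of the I spin
    rho_s = w0 + 2.0 * w1s + w2   # decay rate of the S spin
    sigma = w2 - w0               # cross-relaxation rate
    return np.array([[rho_i, sigma], [sigma, rho_s]])

def propagate(relaxation_matrix, deviation0, t):
    """Solve d(dev)/dt = -R dev for a symmetric R by eigen-decomposition."""
    rates, vectors = np.linalg.eigh(relaxation_matrix)
    coefficients = vectors.T @ deviation0
    return vectors @ (coefficients * np.exp(-rates * t))

# Illustrative transition probabilities in s^-1 (assumed, identical spins)
R = solomon_matrix(w0=0.1, w1i=0.15, w1s=0.15, w2=0.6)
decay_rates = np.linalg.eigvalsh(R)   # rates of the two exponentials

# Selective inversion of I: <Iz> = -Iz0, i.e. a deviation of -2 in units of Iz0
deviation_after_1s = propagate(R, np.array([-2.0, 0.0]), t=1.0)
print(decay_rates, deviation_after_1s)
```

For identical spins the two rates are $\rho - \sigma$ and $\rho + \sigma$; the latter equals $2(W_1 + W_2)$, the rate of equation B1.13.4, and governs the decay of the sum $\langle I_z\rangle + \langle S_z\rangle$, which is what is observed when the two spins cannot be addressed separately.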


B1.13.2.2 THE REDFIELD THEORY

A more general formulation of relaxation theory, suitable for systems with scalar spin-spin couplings (J
couplings), is known as the Wangsness, Bloch and Redfield (WBR) theory or the Redfield theory [4]. Like
the Solomon theory, the Redfield theory is based on second-order perturbation theory,
which in certain situations (unusual for nuclear spin systems in liquids) can be a limitation. Rather than
dealing with magnetizations or energy level populations, as in the Solomon formulation, the
Redfield theory is formulated in terms of the density operator (for a general review of the density operator formalism,
see 'Further reading'). Briefly, the density operator describes the average behaviour of an ensemble of
quantum mechanical systems. It is usually expressed by expansion in a suitable operator basis set. For the
discussion of the IS system above, the appropriate 16-dimensional basis set can e.g. consist of the unit
operator, $E$, the operators corresponding to the Cartesian components of the two spins, $I_x$, $I_y$, $I_z$, $S_x$, $S_y$, $S_z$, and
the products of the components of $I$ and the components of $S$. These 16 operators span the Liouville space for
our two-spin system. If we concentrate on the longitudinal relaxation (the relaxation connected to the
distribution of populations), the Redfield theory predicts the relaxation to follow a set of three coupled
differential equations:

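$$
\frac{\mathrm{d}}{\mathrm{d}t}
\begin{pmatrix} \langle I_z\rangle \\ \langle S_z\rangle \\ \langle 2I_zS_z\rangle \end{pmatrix}
= -\begin{pmatrix}
\rho_I & \sigma_{IS} & \delta_{I,IS} \\
\sigma_{IS} & \rho_S & \delta_{S,IS} \\
\delta_{I,IS} & \delta_{S,IS} & \rho_{IS}
\end{pmatrix}
\begin{pmatrix} \langle I_z\rangle - I_z^0 \\ \langle S_z\rangle - S_z^0 \\ \langle 2I_zS_z\rangle \end{pmatrix}
\tag{B1.13.7}
$$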

The difference compared to equation B1.13.2 or equation B1.13.3 is the occurrence of the expectation value
of the $2I_zS_z$ operator (the two-spin order), characterized by its own decay rate $\rho_{IS}$ and coupled to the one-spin
longitudinal operators by the terms $\delta_{I,IS}$ and $\delta_{S,IS}$. We shall come back to the physical origin of these terms
below.

The matrix on the rhs of equation B1.13.7 is called the relaxation matrix. To be exact, it represents one block
of a larger, block-diagonal relaxation matrix, defined in the Liouville space for the two-spin system. The
remaining part of the large matrix (sometimes also called the relaxation supermatrix) describes the relaxation
of coherences, which can be seen as generalizations of the transverse components of the magnetization vector.
In systems without degeneracies, each of the coherences decays exponentially, with its own $T_2$. The Redfield
theory can be used to obtain expressions for the relaxation matrix elements for arbitrary spin systems and for
any type of relaxation mechanism. In analogy with the Solomon theory, these more general relaxation
rates are also expressed in terms of various spectral density functions.
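
How the two-spin order enters the longitudinal relaxation can be illustrated numerically. The sketch below assumes a symmetric 3x3 longitudinal block with diagonal rates $\rho_I$, $\rho_S$, $\rho_{IS}$, the cross-relaxation rate $\sigma_{IS}$ and the two coupling terms $\delta_{I,IS}$ and $\delta_{S,IS}$ described in the text; the numerical values and function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def longitudinal_block(rho_i, rho_s, rho_is, sigma, delta_i, delta_s):
    """Relaxation matrix for (<Iz> - Iz0, <Sz> - Sz0, <2IzSz>)."""
    return np.array([
        [rho_i,   sigma,   delta_i],
        [sigma,   rho_s,   delta_s],
        [delta_i, delta_s, rho_is],
    ])

def evolve(relaxation_matrix, deviation0, t):
    """Propagate d(dev)/dt = -R dev using the eigen-decomposition of the symmetric R."""
    rates, vectors = np.linalg.eigh(relaxation_matrix)
    return vectors @ (np.exp(-rates * t) * (vectors.T @ deviation0))

# Start from an inverted I spin; there is no two-spin order at equilibrium.
start = np.array([-2.0, 0.0, 0.0])

# Assumed rates in s^-1, with and without the coupling to the two-spin order
with_coupling = longitudinal_block(1.0, 1.2, 1.5, 0.3, 0.1, 0.05)
without_coupling = longitudinal_block(1.0, 1.2, 1.5, 0.3, 0.0, 0.0)

print(evolve(with_coupling, start, 0.5)[2])     # two-spin order has built up
print(evolve(without_coupling, start, 0.5)[2])  # stays at zero
```

When both coupling terms vanish, the third row decouples and the equations reduce to the Solomon equations for the first two components.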

B1.13.2.3 MOLECULAR MOTIONS AND SPIN RELAXATION

Nuclear spin relaxation is caused by fluctuating interactions involving nuclear spins. We write the 
corresponding Hamiltonians (which act as perturbations to the static or time-averaged Hamiltonian, 
determining the energy level structure) in terms of a scalar contraction of spherical tensors: 


$$
H_1(t) = \sum_{q=-j}^{j} (-1)^q F^{(q)}(t) A^{(-q)}. \tag{B1.13.8}
$$


$j$ is the rank of the tensor describing the relevant interactions, which can be 0, 1 or 2. $A^{(-q)}$ are spin operators
and $F^{(q)}(t)$ represent classical functions related to the lattice, i.e. to the classically described environment of
the spins. Because of random molecular motions, the functions $F^{(q)}(t)$ are stochastic functions of time. A
fluctuating (stochastic) interaction can cause transitions between the energy levels of a spin system (and thus
transfer energy between the spin system and its environment) if the power spectrum of the fluctuations
contains Fourier components at frequencies corresponding to the relevant energy differences. In this sense, the
transitions contributing to the spin relaxation and originating from randomly fluctuating interactions are not
fundamentally different from the transitions caused by coherent interactions with electromagnetic
radiation. According to the theory of stochastic processes (the Wiener-Khinchin theorem) [5], the power
available at a certain frequency, or the spectral density function $J(\omega)$ at that frequency, is obtained as a Fourier
transform of a time correlation function (tcf) characterizing the stochastic process. A tcf $G(\tau)$ for a stochastic
function of time $F^{(q)}(t)$ is defined as:

$$
G(\tau) = \langle F^{(q)}(t) F^{(q)*}(t + \tau) \rangle \tag{B1.13.9}
$$

with the corresponding spectral density: 

$$
J(\omega) = \int_{-\infty}^{\infty} G(\tau) \exp(\mathrm{i}\omega\tau)\,\mathrm{d}\tau \tag{B1.13.10}
$$

where the symbol $\langle\,\rangle$ denotes the ensemble average. The function on the lhs of equation B1.13.9 is called the
auto-correlation function. The prefix 'auto' refers to the fact that we deal with an ensemble average of the product
of a stochastic function taken at one point in time and the same function at another point in time, the
difference between the two time points being $\tau$. In certain situations in relaxation theory, we also need
cross-correlation functions, where we average the product of two different stochastic functions, corresponding to
different Hamiltonians of equation B1.13.8.

We now come back to the important example of two spin 1/2 nuclei with the dipole-dipole interaction
discussed above. In simple physical terms, we can say that one of the spins senses a fluctuating local magnetic
field originating from the other one. In terms of the Hamiltonian of equation B1.13.8, the stochastic function
of time $F^{(q)}(t)$ is proportional to $Y_{2,q}(\theta,\phi)/r_{IS}^3$, where $Y_{2,q}$ is an $l = 2$ spherical harmonic and $r_{IS}$ is the
internuclear distance. If the two nuclei are directly bonded, $r_{IS}$ can be considered constant and the random
variation of $F^{(q)}(t)$ originates solely from the reorientation of the molecule-fixed internuclear axis with respect
to the laboratory frame. The auto-tcf for $Y_{2,q}$ can be derived, assuming that the stochastic time dependence can
be described by the isotropic rotational diffusion equation. We then obtain:


$$
G(\tau) = G(0)\exp(-\tau/\tau_c) \tag{B1.13.11}
$$

with the corresponding spectral density:

$$
J(\omega) = G(0)\,\frac{2\tau_c}{1 + \omega^2\tau_c^2}. \tag{B1.13.12}
$$
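
The step from the exponential tcf to the Lorentzian spectral density can be verified by direct numerical integration. The sketch below is only a consistency check under stated assumptions: the tcf is taken as an even function of the time difference, $G(0)\exp(-|\tau|/\tau_c)$, the infinite integration range is truncated at 40 correlation times, and the function names are hypothetical.

```python
import numpy as np

def spectral_density_numeric(omega, tau_c, g0=1.0, span=40.0, n=200001):
    """Trapezoidal estimate of the Fourier integral of G(0) exp(-|tau|/tau_c)."""
    tau = np.linspace(-span * tau_c, span * tau_c, n)
    integrand = g0 * np.exp(-np.abs(tau) / tau_c) * np.exp(1j * omega * tau)
    step = tau[1] - tau[0]
    # Trapezoidal rule: full weight inside, half weight at the two end points
    return step * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def spectral_density_lorentzian(omega, tau_c, g0=1.0):
    """Closed-form Lorentzian G(0) * 2 tau_c / (1 + omega^2 tau_c^2)."""
    return g0 * 2.0 * tau_c / (1.0 + (omega * tau_c) ** 2)

tau_c = 1.0e-9  # assumed correlation time in seconds
for omega_tau in (0.0, 1.0, 3.0):
    omega = omega_tau / tau_c
    numeric = spectral_density_numeric(omega, tau_c)
    exact = spectral_density_lorentzian(omega, tau_c)
    print(omega_tau, numeric.real / exact)
```

Because the tcf is real and even, the imaginary part of the integral vanishes and the sign convention chosen in the exponent does not affect the result.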


We call $\tau_c$ the correlation time: it is equal to $1/(6D_R)$, where $D_R$ is the rotational diffusion coefficient. The
correlation time increases with increasing molecular size and with increasing solvent viscosity. Equation
B1.13.11 and equation B1.13.12 describe the rotational Brownian motion of a rigid sphere in a continuous and
isotropic medium. With the Lorentzian spectral densities of equation B1.13.12, it is simple to calculate the
relevant transition probabilities. In this way, we can use e.g. equation B1.13.5 to obtain $T_1^{-1}$ for a carbon-13
bonded to a proton as well as the corresponding $T_2^{-1}$, as a function of the correlation time for a given magnetic
field, cf. figure B1.13.2. The two rates are equal in the region $\omega\tau_c \ll 1$, referred to as 'extreme narrowing'.
In the extreme narrowing regime, the relaxation rates are independent of the magnetic field and are simply
given by a product of the square of an interaction strength constant (the dipole coupling constant, DCC) and
the correlation time.

Figure B1.13.2. Spin-lattice and spin-spin relaxation rates ($R_1$ and $R_2$, respectively) for a carbon-13 spin
directly bonded to a proton as a function of correlation time at the magnetic fields of 7 and 14 T.
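
The behaviour described above can be reproduced with a short calculation. The sketch below is an illustration only: it uses the standard textbook expressions for dipolar $R_1$ and $R_2$ of a heteronuclear spin pair with the reduced Lorentzian $(2/5)\tau_c/(1+\omega^2\tau_c^2)$, which are not written out in this entry, and it assumes a C-H distance of 1.09 Å. The function names are hypothetical.

```python
MU0_OVER_4PI = 1.0e-7      # T m A^-1
HBAR = 1.054571817e-34     # J s
GAMMA_H = 2.6752218744e8   # proton magnetogyric ratio, rad s^-1 T^-1
GAMMA_C = 6.728284e7       # carbon-13 magnetogyric ratio, rad s^-1 T^-1

def reduced_lorentzian(omega, tau_c):
    """Reduced spectral density (2/5) tau_c / (1 + omega^2 tau_c^2)."""
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)

def dipolar_rates(tau_c, b0, r_ch=1.09e-10):
    """Return (R1, R2) in s^-1 for a 13C spin relaxed by one bonded proton."""
    dcc = MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_C / r_ch ** 3  # rad s^-1
    w_h, w_c = GAMMA_H * b0, GAMMA_C * b0                       # Larmor frequencies

    def j(omega):
        return reduced_lorentzian(omega, tau_c)

    r1 = 0.25 * dcc ** 2 * (j(w_h - w_c) + 3.0 * j(w_c) + 6.0 * j(w_h + w_c))
    r2 = 0.125 * dcc ** 2 * (4.0 * j(0.0) + j(w_h - w_c) + 3.0 * j(w_c)
                             + 6.0 * j(w_h) + 6.0 * j(w_h + w_c))
    return r1, r2

for tau_c in (1.0e-12, 1.0e-10, 1.0e-8):
    for b0 in (7.0, 14.0):
        r1, r2 = dipolar_rates(tau_c, b0)
        print(f"tau_c = {tau_c:.0e} s, B0 = {b0:4.1f} T: R1 = {r1:.4g}, R2 = {r2:.4g}")
```

With these expressions both rates reduce to DCC$^2\tau_c$ in the extreme narrowing limit, independent of the field, while for long correlation times $R_2$ keeps growing and $R_1$ passes through a maximum and decreases.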

In order to obtain a more realistic description of reorientational motion of internuclear axes in real molecules
in solution, many improvements of the tcf of equation B1.13.11 have been proposed [6]. Some of these
models are characterized in table B1.13.1. The entry 'number of terms' refers to the number of exponential
functions in the relevant tcf or, correspondingly, the number of Lorentzian terms in the spectral density
function.


Table B1.13.1 Selected dynamic models used to calculate spectral densities.

| Dynamic model | Parameters influencing the spectral densities | Number of terms | Comment | Ref. |
|---|---|---|---|---|
| Isotropic rotational diffusion | Rotational diffusion coefficient, $D_R = 1/(6\tau_c)$ | 1 | Useful as a first approximation | Further reading |
| Rotational diffusion, symmetric top | Two rotational diffusion coefficients, $D_\parallel$ and $D_\perp$, the angle $\theta$ between the symmetry axis and the internuclear axis | 3 | Rigid molecule, requires the knowledge of geometry | [7] |
| Rotational diffusion, asymmetric top | Three rotational diffusion coefficients, two polar angles $\theta$ and $\phi$ | 5 | Rigid molecule, rather complicated | [8] |
| Isotropic rotational diffusion with one internal degree of freedom | Rotational diffusion coefficient, $D_R$, internal motion rate parameter, angle between the internal rotation axis and the internuclear axis | 3 | Useful for e.g. methyl groups | [9] |
| 'Model-free' | Global and local correlation times, generalized order parameter, $S$ | 2 | Widely used for non-rigid molecules | |
| Anisotropic model-free | $D_\parallel$ and $D_\perp$, local correlation time, generalized order parameter, $S$ | 4 | Allows the interpretation of data for non-spherical, non-rigid molecules | |
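
As an illustration of what 'number of terms' means in the table, the two-term 'model-free' spectral density can be compared with the single Lorentzian of isotropic rotational diffusion. The sketch below assumes the commonly used Lipari-Szabo form, in which the second Lorentzian has an effective correlation time combining the global and local ones; this explicit form, the $2/5$ normalization and the function names are assumptions that do not appear in the text.

```python
import math

def j_isotropic(omega, tau_m):
    """One Lorentzian term: (2/5) tau_m / (1 + omega^2 tau_m^2)."""
    return 0.4 * tau_m / (1.0 + (omega * tau_m) ** 2)

def j_model_free(omega, tau_m, tau_e, s2):
    """Two Lorentzian terms weighted by the squared order parameter s2.

    tau_m: global correlation time, tau_e: local correlation time (seconds).
    """
    tau_eff = 1.0 / (1.0 / tau_m + 1.0 / tau_e)
    global_term = s2 * tau_m / (1.0 + (omega * tau_m) ** 2)
    local_term = (1.0 - s2) * tau_eff / (1.0 + (omega * tau_eff) ** 2)
    return 0.4 * (global_term + local_term)

omega_c = 6.728284e7 * 14.0  # carbon-13 Larmor frequency at 14 T, rad s^-1
rigid = j_model_free(omega_c, tau_m=5.0e-9, tau_e=50.0e-12, s2=1.0)
flexible = j_model_free(omega_c, tau_m=5.0e-9, tau_e=50.0e-12, s2=0.8)
print(rigid, flexible, j_isotropic(omega_c, 5.0e-9))
```

For a squared order parameter of one the local term disappears and the isotropic result is recovered; smaller values scale down the contribution of the global motion.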


B1.13.2.4 RELAXATION MECHANISMS

The DD interaction discussed above is just one (admittedly a very important one) among many possible
sources of nuclear spin relaxation, collected in table B1.13.2 (not meant to be fully comprehensive). The
interactions listed there are often used synonymously with 'relaxation mechanisms' and more detailed
descriptions of various mechanisms can be found in 'Further reading'. We can note in table B1.13.2 that one
particular type of motion, reorientation of an axis, can cause random variations of many $j = 2$ interactions:
DD for different intramolecular axes, CSA, quadrupolar interaction. In fact, all the interactions with the same
tensor rank can give rise to interference or cross-terms with each other. In an important paper by
Szymanski et al [20], the authors point out that the concept of a 'relaxation mechanism' is more appropriately
used when referring to a pair of interactions, rather than to a single one. The role of interference terms has been
reviewed by Werbelow [21]. They often contribute to relaxation matrices through coherence or polarization
transfer and through higher forms of order or coherence. The relevant spectral densities are of
cross-correlation (rather than auto-correlation) type. For example, the off-diagonal $\delta$-terms in the relaxation matrix
of equation B1.13.7, connecting the one- and two-spin longitudinal order, are caused by cross-correlation
between the DD and CSA interactions, which explains why they are called 'cross-correlated relaxation rates'.


Table B1.13.2 Interactions giving rise to nuclear spin relaxation.

| Interaction | Tensor rank | Process causing the random changes | Comment | Ref. |
|---|---|---|---|---|
| Intramolecular dipole-dipole (DD) | 2 | Reorientation of the internuclear axis | Very common for $I = 1/2$ | Further reading |
| Intermolecular DD | 2 | Distance variation by translational diffusion | Less common | [12] |
| Chemical shift anisotropy (CSA) | 2 | Reorientation of the CSA principal axis | Increases with the square of the magnetic field | [13] |
| Intramolecular quadrupolar | 2 | Reorientation of the electric field gradient principal axis | Dominant for $I \geq 1$ (covalently bonded) | [14] |
| Intermolecular quadrupolar | 2 | Fluctuation of the electric field gradient, moving multipoles | Common for $I \geq 1$ in free ions in solution | [15] |
| Antisymmetric CSA | 1 | Reorientation of a pseudo-vector | Very uncommon | [13] |
| Spin-rotation | 1 | Reorientation and time dependence of angular momentum | Small molecules only | [16] |
| Scalar coupling | 0 | Relaxation of the coupled spin or exchange | Can be important for $T_2$ | Further reading |
| Hyperfine interaction (dipolar and scalar) | 2, 0 | Electron relaxation, may be complicated | Paramagnetic systems and impurities | [17-19] |


Before leaving the subsection on relaxation mechanisms, I wish to mention the connections between
relaxation and chemical exchange (exchange of magnetic environments of spins by a chemical process).
Chemical exchange and relaxation together determine the NMR lineshapes; the exchange affects the measured
relaxation rates and it can also act as a source of random variation of spin interactions. The relaxation effects
of chemical exchange have been reviewed by Woessner [22].


B1.13.3 EXPERIMENTAL METHODS 

Relaxation experiments were among the earliest applications of time-domain high-resolution NMR 
spectroscopy, invented more than 30 years ago by Ernst and Anderson [23]. The progress of the experimental 
methodology has been enormous and only some basic ideas of the experiment design will be presented here. 
This section is divided into three subsections. The first one deals with Bloch equation-type experiments, 
measuring $T_1$ and $T_2$ when such quantities can be defined, i.e. when the relaxation is monoexponential. As a
slightly oversimplified rule of thumb, we can say that this happens in the case of isolated spins. The two 
subsections to follow cover multiple-spin effects. 

B1.13.3.1 SPIN-LATTICE AND SPIN-SPIN RELAXATION RATES


Measurements of the spin-lattice relaxation time, $T_1$, are the simplest relaxation experiments. A straightforward
method to measure $T_1$ is the inversion-recovery experiment, the principle of which is illustrated in figure
B1.13.3. The equilibrium magnetization $M_0$ or $M_z(\infty)$ (cf. figure B1.13.3(A)), directed along $B_0$ (the z-axis),
is first inverted by a 180° pulse (a $\pi$-pulse), an RF pulse with the duration $t_{180^\circ}$ and the RF magnetic field $B_1$
(in the direction perpendicular to $B_0$) chosen so that the magnetization is nutated by 180° around the $B_1$
direction. The magnetization immediately after the 180° pulse is directed along the -z direction (cf. figure
B1.13.3(B)) and starts to relax following equation B1.13.1. After a variable delay $\tau$, when $M_z(\tau)$ has
reached the stage depicted in figure B1.13.3(C), a 90° pulse is applied. This pulse nutates the magnetization
along the z-axis to the x,y-plane (cf. figure B1.13.3(D)), where it can be detected in the form of an FID. If
required, the experiment can be repeated a number of times to improve the signal-to-noise ratio, waiting for
about $5T_1$ (recycle delay) between scans to allow for return to equilibrium. The subsequent Fourier
transformation (FT) of the FID gives a spectrum. The experiment is repeated for different values of the delay
$\tau$ and the measured line intensities are fitted to an exponential expression $S(\tau) = A + B\exp(-\tau/T_1)$.
Inversion-recovery experiments are often performed for multiline spectra of low-natural-abundance nuclei,
such as $^{13}$C or $^{15}$N, under the conditions of broadband saturation (decoupling) of the abundant proton spins.
The proton ($S$-spin) decoupling renders the relaxation of the $I$ spin of the dipolarly coupled $IS$-spin system
monoexponential; we may say that the decoupling results in 'pseudo-isolated' $I$ spins. An example of a $^{13}$C
inversion-recovery experiment for a trisaccharide, melezitose, is shown in figure B1.13.4.

Figure B1.13.3. The inversion-recovery experiment. (Reproduced by permission of VCH from Banci L,
Bertini I and Luchinat C 1991 Nuclear and Electron Relaxation (Weinheim: VCH).)

Figure B1.13.4. The inversion-recovery determination of the carbon-13 spin-lattice relaxation rates in
melezitose. (Reproduced by permission of Elsevier from Kowalewski J and Maler L 1997 Methods for
Structure Elucidation by High-Resolution NMR ed Gy Batta, K E Kover and Cs Szantay (Amsterdam:
Elsevier) pp 325-47.)
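
The three-parameter fit of inversion-recovery intensities to $A + B\exp(-\tau/T_1)$ can be sketched as follows. This is one possible approach and not the procedure used for the data in the figures: for each trial $T_1$ on a grid, the linear parameters $A$ and $B$ are found by least squares and the trial value with the smallest residual is kept. The delays, noise level and 'true' parameters are synthetic, assumed values, and the function name is hypothetical.

```python
import numpy as np

def fit_inversion_recovery(delays, intensities, t1_grid):
    """Return (T1, A, B) minimizing the squared residual of A + B exp(-delay/T1)."""
    best = None
    for t1 in t1_grid:
        # For a fixed T1 the model is linear in A and B
        design = np.column_stack([np.ones_like(delays), np.exp(-delays / t1)])
        coefficients, *_ = np.linalg.lstsq(design, intensities, rcond=None)
        residual = intensities - design @ coefficients
        sse = float(residual @ residual)
        if best is None or sse < best[0]:
            best = (sse, float(t1), float(coefficients[0]), float(coefficients[1]))
    return best[1], best[2], best[3]

# Synthetic data: full inversion (B = -2A), T1 = 0.8 s, Gaussian noise
rng = np.random.default_rng(0)
delays = np.array([0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4])  # seconds
true_t1 = 0.8
intensities = 1.0 - 2.0 * np.exp(-delays / true_t1) + rng.normal(0.0, 0.01, delays.size)

t1_fit, a_fit, b_fit = fit_inversion_recovery(delays, intensities, np.linspace(0.1, 3.0, 2901))
print(t1_fit, a_fit, b_fit)
```

Keeping $A$ and $B$ as free parameters, rather than fixing $B = -2A$, allows for imperfect inversion by the 180° pulse.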

NMR spectroscopy is always struggling for increased sensitivity and resolution, as well as more efficient use 
of the instrument time. To this end, numerous improvements of the simple inversion-recovery method have 
been proposed over the years. An early and important modification is the so-called fast inversion recovery
(FIR) [25], where the recycle delay is made shorter than $5T_1$ and the experiment is carried out under the
steady-state rather than equilibrium conditions. A still more time-saving variety, the super-fast inversion
recovery (SUFIR), has also been proposed [26].

Several other improvements of the inversion-recovery scheme employ advanced tools of modern NMR
spectroscopy: polarization transfer and two-dimensional spectroscopy (see further reading). The basic design
of selected pulse sequences is compared with the simple inversion-recovery scheme in figure B1.13.5, taken
from Kowalewski and Maler [24], where references to original papers can be found. Figure B1.13.5(a),
where thick rectangular boxes denote the 180° $I$-spin pulses and thin boxes the corresponding 90° pulses, is a
representation of the inversion-recovery sequence with the continuous saturation of the protons. In figure
B1.13.5(b), the inverting $I$-spin pulse is replaced by a series of pulses, separated by constant delays and
applied at both the proton and the $I$-spin resonance frequencies, which creates a more strongly polarized initial
$I$-spin state (the polarization transfer technique). In figure B1.13.5(c), a two-dimensional (2D) NMR
technique is employed. This type of approach is particularly useful when the sample contains many
heteronuclear $IS$ spin pairs, with different $I$s and different $S$s characterized by slightly different resonance
frequencies (chemical shifts), resulting in crowded spectra. In a generic 2D experiment, the NMR signal is
sampled as a function of two time variables: $t_2$ is the running time during which the FID is acquired (different
points in the FID have different $t_2$). In addition, the pulse sequence contains an evolution time $t_1$, which is
systematically varied in the course of the experiment. The double Fourier transformation of the data matrix
$S(t_1,t_2)$ yields a two-dimensional spectrum. In the example of figure B1.13.5(c), the polarization is transferred
first from protons to the $I$ spins, in the same way as in figure B1.13.5(b). This is followed by the evolution
time, during which the information on the various $I$-spin resonance frequencies is encoded. The next period is
the analogue of the delay $\tau$ of the simple inversion-recovery experiment. The final part of the sequence
contains an inverse polarization transfer, from $I$ spins to protons, followed by the proton detection. The
resulting 2D spectrum, for a given delay $\tau$, has the proton chemical shifts on one axis and the shifts of the
J-coupled $I$ spin on the other one. We can thus call the experiment the proton-$I$-spin correlation experiment.
This greatly improves the spectral resolution. Spectra with several different $\tau$ delays are acquired and the
$I$-spin $T_1$ is determined by fitting the intensity decay for a given peak in the 2D spectrum.

Figure B1.13.5. Some basic pulse sequences for $T_1$ measurements for carbon-13 and nitrogen-15.
(Reproduced by permission of Elsevier from Kowalewski J and Maler L 1997 Methods for Structure
Elucidation by High-Resolution NMR ed Gy Batta, K E Kover and Cs Szantay (Amsterdam: Elsevier) pp 325-47.)

The discussion above is concerned with $T_1$ experiments under high-resolution conditions at high magnetic
field. In studies of complex liquids (polymer solutions and melts, liquid crystals), one is often interested in
obtaining information on rather slow (micro- or nanosecond time scale) motions by measuring $T_1$ at low magnetic
fields using the field-cycling technique [27]. The same technique is also invaluable in studies of paramagnetic
solutions. Briefly, the non-equilibrium state is created by keeping the sample at a certain, moderately high
polarizing field $B_0$ and then rapidly switching $B_0$ to another value, at which we wish to measure $T_1$. After the
variable delay $\tau$, the field is switched again and the signal is detected.

The spin-spin relaxation time, $T_2$, defined in the Bloch equations, is simply related to the width $\Delta\nu_{1/2}$ of the
Lorentzian line at half-height: $\Delta\nu_{1/2} = 1/(\pi T_2)$. Thus, it is in principle possible to determine $T_2$ by measuring
the linewidth. This simple approach is certainly useful for rapidly relaxing quadrupolar nuclei and has also
been demonstrated to work for $I = 1/2$ nuclei, provided the magnetic field homogeneity is ascertained. The
more usual practice, however, is to suppress the inhomogeneous broadening caused by the spread of the
magnetic field values (and thus resonance frequencies) over the sample volume, by the spin-echo technique.
Such experiments are in general more difficult to perform than the spin-lattice relaxation time measurements.
The most common echo sequence, the 90°-$\tau$-180°-$\tau$-echo, was originally proposed by Carr and Purcell [28]
and modified by Meiboom and Gill [29]. After the initials of the four authors, the modified sequence is widely
known as the CPMG method. The details of the behaviour of spins under the spin-echo sequence can be found
in modern NMR monographs (see further reading) and will not be repeated here. We note, however, that
complications can arise in the presence of scalar spin-spin couplings.
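
The way a spin echo removes inhomogeneous broadening can be illustrated with a simple ensemble model. The sketch below is a deliberately idealized picture, assuming ideal pulses, no scalar couplings, no diffusion and a Gaussian spread of resonance offsets of 50 Hz; all numerical values and the function name are hypothetical.

```python
import numpy as np

def echo_signal(t, tau, offsets, t2):
    """Magnitude of the transverse magnetization in a 90-tau-180-tau experiment.

    Each isochromat precesses at its own offset (rad/s). The 180 degree pulse at
    time tau negates the phase accumulated so far, so the phase at a later time
    t is offset * (t - 2 * tau), which vanishes at the echo, t = 2 * tau.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    effective_time = np.where(t < tau, t, t - 2.0 * tau)
    phases = np.outer(effective_time, offsets)
    dephasing = np.abs(np.exp(1j * phases).mean(axis=1))
    return dephasing * np.exp(-t / t2)

rng = np.random.default_rng(1)
offsets = rng.normal(0.0, 2.0 * np.pi * 50.0, size=2000)  # assumed field spread
t2, tau = 0.5, 0.01                                        # seconds, assumed

at_pulse, at_echo = echo_signal([tau, 2.0 * tau], tau, offsets, t2)
print(at_pulse, at_echo, np.exp(-2.0 * tau / t2))
```

At the time of the 180° pulse the signal has almost vanished through dephasing, while at the echo its amplitude is reduced only by the factor $\exp(-2\tau/T_2)$, so repeating the measurement for several $\tau$ values gives $T_2$ free from inhomogeneous broadening.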

An alternative procedure for determining the transverse relaxation time is the so-called $T_{1\rho}$ experiment. The
basic idea of this experiment is as follows. The initial 90° $I$-spin pulse is applied with $B_1$ in the x-direction,
which turns the magnetization from the z-direction to the y-direction. Immediately after the initial pulse, the
$B_1$ RF field is switched to the y-direction so that $M$ and $B_1$ become collinear. The notation $T_{1\rho}$ ($T_1$ in the
rotating frame) alludes to the fact that the decay of $M$ along $B_1$ is similar to the relaxation of longitudinal
magnetization along $B_0$. The $T_{1\rho}$ in liquids is practically identical to $T_2$. The measurements of $T_2$ or $T_{1\rho}$ can,
in analogy with $T_1$ studies, also utilize the modern tools increasing the sensitivity and resolution, such as
polarization transfer and 2D techniques.

B1.13.3.2 CROSS-RELAXATION AND NUCLEAR OVERHAUSER ENHANCEMENT

Besides measuring $T_1$ and $T_2$ for nuclei such as $^{13}$C or $^{15}$N, relaxation studies for these nuclei also include
measurements of the NOE factor, cf. equation B1.13.6. Knowing $T_{1I}^{-1}$ ($\rho_I$) and the steady-state NOE
(measured by comparing the signal intensities in the presence and in the absence of the saturating field), we
can derive the cross-relaxation rate, $\sigma_{IS}$, which provides an additional combination of spectral densities,
useful for e.g. molecular dynamics studies.
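
The derivation of the cross-relaxation rate from $T_1$ and the steady-state NOE amounts to rearranging equation B1.13.6. In the sketch below the NOE factor is defined as the fractional intensity change, $\eta = (\langle I_z\rangle_{\text{steady state}} - I_z^0)/I_z^0$, and the ratio of equilibrium magnetizations $S_z^0/I_z^0$ is taken equal to the ratio of the magnetogyric ratios, which assumes equal numbers of $I$ and $S$ spins 1/2. The function name and the numerical input are hypothetical.

```python
GAMMA_H = 2.6752218744e8  # proton magnetogyric ratio, rad s^-1 T^-1
GAMMA_C = 6.728284e7      # carbon-13 magnetogyric ratio, rad s^-1 T^-1

def cross_relaxation_rate(t1_i, noe_factor, gamma_i=GAMMA_C, gamma_s=GAMMA_H):
    """Cross-relaxation rate (s^-1) from the I-spin T1 (s) and the NOE factor.

    Uses eta = (sigma / rho_I) * (gamma_S / gamma_I), i.e. equation B1.13.6
    with Sz0 / Iz0 = gamma_S / gamma_I (an assumption stated in the lead-in).
    """
    rho_i = 1.0 / t1_i
    return noe_factor * rho_i * gamma_i / gamma_s

# Assumed measurement: 13C T1 of 2 s and an NOE factor of 1.988
sigma = cross_relaxation_rate(t1_i=2.0, noe_factor=1.988)
print(sigma)
```

With this input the result is close to half of $\rho_I$, the value expected for purely dipolar relaxation in the extreme narrowing regime; a smaller measured NOE factor would indicate slower motion or additional relaxation mechanisms contributing to $\rho_I$.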

The most important cross-relaxation rate measurements are, however, performed in homonuclear networks of
chemically shifted and dipolarly coupled proton spins. The subject has been discussed in two books [30, 31].
There is a large variety of experimental procedures [32], of which I shall only mention a few. A simple method
to measure the homonuclear NOE is the NOE-difference experiment, in which one measures spectral 
intensities using low power irradiation at selected narrow regions of the spectrum, before applying the observe 
pulse. This corresponds to different individual protons acting as the saturated S spins. The difference spectrum 
is obtained by subtracting the spectrum obtained under identical conditions, but with the irradiation frequency 
applied in a region without any peaks to saturate. The NOE-difference experiments are most often applied in a 
semi-quantitative way in studies of small organic molecules. 

For large molecules, such as proteins, the main method in use is a 2D technique, called NOESY (nuclear
Overhauser effect spectroscopy). The basic experiment [33, 34] consists of three 90° pulses. The first pulse
converts the longitudinal magnetizations for all protons, present at equilibrium, into transverse magnetizations
which evolve during the subsequent evolution time $t_1$. In this way, the transverse magnetization components
for different protons become labelled by their resonance frequencies. The second 90° pulse rotates the
magnetizations to the -z-direction.
The interval between the second and third pulse is called the mixing time, during which the spins evolve
according to the multiple-spin version of equation B1.13.2 and equation B1.13.3 and the NOE builds up. The
final pulse converts the longitudinal magnetizations, present at the end of the mixing time, into detectable
transverse components. The detection of the FID is followed by a recycle delay, during which the equilibrium
is recovered, and by the next experiment, e.g. with another $t_1$. After acquiring the 2D data matrix $S(t_1,t_2)$ (for a
given mixing time) and the double Fourier transformation, one obtains a 2D spectrum shown schematically in
figure B1.13.6. The individual cross-relaxation rates for pairs of spins can be obtained by following the
build-up of the cross-peak intensities as a function of the mixing time for short mixing times (the so-called initial
rate approximation). For longer mixing times and large molecules, the cross-peaks show up in a large number
of positions, because of multiple transfers called spin-diffusion. The analysis then becomes more complicated,
but can be handled based on a generalization of the Solomon equation to many spins (the complete relaxation
matrix treatment) [35].


Figure B1.13.6. The basic elements of a NOESY spectrum. (Reproduced by permission of Wiley from
Williamson M P 1996 Encyclopedia of Nuclear Magnetic Resonance ed D M Grant and R K Harris
(Chichester: Wiley) pp 3262-71).
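
The build-up of NOESY cross-peaks and the initial rate approximation can be illustrated for the simplest case. The sketch below assumes a two-spin system with equal decay rates, $\rho_I = \rho_S = \rho$, for which the elements of $\exp(-R\tau_m)$ have closed forms, and normalizes the diagonal peak to one at zero mixing time. The rates are illustrative numbers and the function name is hypothetical.

```python
import numpy as np

def noesy_intensities(rho, sigma, t_mix):
    """Diagonal and cross-peak intensities for a symmetric two-spin system.

    These are the elements of exp(-R t_mix) with R = [[rho, sigma], [sigma, rho]].
    """
    fast = np.exp(-(rho + sigma) * t_mix)
    slow = np.exp(-(rho - sigma) * t_mix)
    diagonal = 0.5 * (slow + fast)
    cross = 0.5 * (fast - slow)
    return diagonal, cross

rho, sigma = 1.0, 0.3  # s^-1, assumed (positive sigma: small-molecule regime)
for t_mix in (0.001, 0.05, 0.2, 1.0):
    diagonal, cross = noesy_intensities(rho, sigma, t_mix)
    # In the initial rate approximation the cross-peak is about -sigma * t_mix
    print(t_mix, diagonal, cross, -sigma * t_mix)
```

For short mixing times the cross-peak grows linearly with slope $-\sigma_{IS}$, which is how individual cross-relaxation rates are extracted; at longer mixing times the build-up levels off and the cross-peak eventually decays.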

As seen in equation B1.13.2 and equation B1.13.3, the cross-relaxation rate $\sigma_{IS}$ is given by $W_2 - W_0$, the
difference between two transition probabilities. Assuming the simple isotropic rotational diffusion model,
each of the transition probabilities is proportional to a Lorentzian spectral density (cf. equation B1.13.12),
taken at the frequency of the corresponding transition. For the homonuclear case, $W_2$ corresponds to a
transition at high frequency $(\omega_I + \omega_S) \approx 2\omega_I$, while $W_0$ is proportional to a Lorentzian at $(\omega_I - \omega_S) \approx 0$. When the
product $\omega_I\tau_c$ is small, $W_2$ is larger than $W_0$ and the cross-relaxation rate is positive. When the product $\omega_I\tau_c$ is
large, the Lorentzian function evaluated at $2\omega_I$ is much smaller than at zero frequency and $\sigma_{IS}$ changes sign.
The corresponding NOESY peak intensities in a two-spin
system are shown as a function of the mixing time in figure B1.13.7. Clearly, the intensities of the cross-peaks
for small molecules ($\omega^2\tau_c^2 \ll 1$) have one sign, while the opposite sign pertains for large molecules ($\omega^2\tau_c^2 \gg 1$).
At a certain critical correlation time, we obtain no NOESY cross-peaks. In such a situation, and as a complement
to the NOESY experiments, one can perform an experiment called ROESY (rotating frame Overhauser effect
spectroscopy) [36]. The relation between NOESY and ROESY is similar to that between $T_1$ and $T_{1\rho}$.

Figure B1.13.7. Simulated NOESY peak intensities in a homonuclear two-spin system as a function of the
mixing time for two different motional regimes. (Reproduced by permission of Wiley from Neuhaus D 1996
Encyclopedia of Nuclear Magnetic Resonance ed D M Grant and R K Harris (Chichester: Wiley) pp 3290-301.)
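
The sign change of the homonuclear cross-relaxation rate can be made quantitative with a short calculation. The sketch below assumes the usual Solomon expressions for a dipolar-coupled pair of like spins under isotropic rotational diffusion, with relative weights 1/10, 3/20 and 3/5 for $W_0$, $W_1$ and $W_2$; these weights are standard textbook values that are not written out in this entry, and the squared dipolar coupling constant is set to one since only the sign and the zero crossing are of interest. The function name is hypothetical.

```python
import math

def homonuclear_rates(omega, tau_c, dcc_squared=1.0):
    """Return (rho, sigma) for two like spins with Larmor frequency omega (rad/s)."""
    w0 = 0.1 * dcc_squared * tau_c                                       # zero-quantum, near zero frequency
    w1 = 0.15 * dcc_squared * tau_c / (1.0 + (omega * tau_c) ** 2)       # single-quantum
    w2 = 0.6 * dcc_squared * tau_c / (1.0 + (2.0 * omega * tau_c) ** 2)  # double-quantum
    return w0 + 2.0 * w1 + w2, w2 - w0

omega = 2.0 * math.pi * 500.0e6  # protons at 500 MHz, assumed
for omega_tau in (0.1, 1.0, math.sqrt(5.0) / 2.0, 2.0, 10.0):
    rho, sigma = homonuclear_rates(omega, omega_tau / omega)
    print(f"omega*tau_c = {omega_tau:6.3f}: sigma/rho = {sigma / rho:+.3f}")
```

Under these assumptions the cross-relaxation rate vanishes at $\omega\tau_c = \sqrt{5}/2 \approx 1.12$, which is the critical correlation time at which no NOESY cross-peaks are obtained; the ratio $\sigma/\rho$ goes from $+1/2$ for small molecules to $-1$ for very large ones.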

B1.13.3.3 CROSS-CORRELATED RELAXATION

Studies of cross-correlated relaxation have received increasing attention during the last decades. The general
strategies for creating and detecting different types of spin-ordered state and for measuring the transfer rates
have been discussed by Canet [37]. I shall concentrate here on measurements of the DD-CSA interference
terms in two-spin systems, the $\delta_{I,IS}$ and $\delta_{S,IS}$ terms in equation B1.13.7. Let us consider a system where the
two spins have their resonances sufficiently far apart that we can construct pulses selective enough to
manipulate one of them at a time (this is automatically fulfilled for a heteronuclear case; in the homonuclear
case this requires specially shaped low-power RF pulses). One way to measure the longitudinal cross-correlated
relaxation rates is to invert one of the spins by a 180° pulse and to detect the build-up of the two-spin order.
The two-spin order $\langle 2I_zS_z\rangle$ is not detectable directly but, if one of the spins is exposed to a 90° pulse, the
two-spin order becomes converted into a detectable signal in the form of an antiphase doublet, cf. figure B1.13.8
(the corresponding one-spin order subjected to a 90° pulse gives rise to an in-phase doublet). To separate the
two types of order in a clean way, one can use an RF pulse trick called the double-quantum filter. There are
many ways to optimize the above method as well as other schemes to measure the $\langle I_z\rangle$ to $\langle 2I_zS_z\rangle$ transfer
rates. One such scheme uses a set of three selective NOESY experiments, where three 90° pulses strike the
spins in the sequence III, SSS and IIS [38]. Another scheme uses an extended sequence of 180° pulses,
followed by a detecting pulse [39].
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Figure Bl.13.8. Schematic illustration of (a) an antiphase doublet, (b) an in-phase doublet and (c) a 
differentially broadened doublet. The splitting between the two lines is in each case equal to J, the indirect 
spin-spin coupling constant. 

The cross-correlation effects between the DD and CSA interactions also influence the transverse relaxation 
and lead to the phenomenon known as differential line broadening in a doublet [40], cf. figure B1.13.8. There
is a recent experiment, designed for protein studies, that I wish to mention at the end of this section. It was
proposed by Pervushin et al [41], is called TROSY (transverse relaxation optimized spectroscopy) and
employs the differential line broadening in a sophisticated way. One works with the $^{15}$N-proton J-coupled
spin system. When the system is subjected to a 2D nitrogen-proton correlation experiment, the spin-spin 
coupling gives rise to four lines (a doublet in each dimension). The differential line broadening results in one 
of the four lines being substantially narrower than the other ones. Pervushin et al [41] demonstrated that the 
broader lines can be suppressed, resulting in greatly improved resolution in the spectra of large proteins. 


B1.13.4 APPLICATIONS 

In this section, I present a few illustrative examples of applications of NMR relaxation studies within different 
branches of chemistry. The three subsections cover one 'story' each, in order of increasing molecular size and 
complexity of the questions asked. 


B1.13.4.1 SMALL MOLECULES: THE DUAL SPIN PROBE TECHNIQUE

Small molecules in low-viscosity solutions typically have rotational correlation times of a few tens of
picoseconds, which means that the extreme narrowing conditions usually prevail. As a consequence, the
interpretation of certain relaxation parameters, such as carbon-13 $T_1$ and NOE for proton-bearing carbons, is
very simple. Basically, the DCC for a directly bonded CH pair can be assumed to be known and the
experiments yield a value of the correlation time, $\tau_c$. One interesting application of the measurement of
$\tau_c$ is to follow its variation with the site in the molecule (motional anisotropy), with temperature (the
correlation time often increases with decreasing temperature, following an Arrhenius-type equation) or with
the composition of the solution. The latter two types of measurement can provide information on
intermolecular interactions.

Another application of the knowledge of $\tau_c$ is to employ it for the interpretation of another relaxation
measurement in the same system, an approach referred to as the dual spin probe technique. A rather old, but
illustrative, example is the case of tris(pentane-2,4-dionato)aluminium(III), Al(acac)$_3$. Dechter et al [42]
reported measurements of carbon-13 spin-lattice relaxation for the methine carbon and of the aluminium-27
linewidth in Al(acac)$_3$ in toluene solution. $^{27}$Al is a quadrupolar nucleus ($I = 5/2$) and the linewidth
gives directly the $T_2$, which depends on the rotational correlation time (which can be assumed the same as
for the methine CH axis) and the strength of the quadrupolar interaction (the quadrupolar coupling constant,
QCC). Thus, the combination of the carbon-13 and aluminium-27 measurements yields the QCC. The QCC in this
particular case was also determined by NMR in a solid sample. The two measurements agree very well with
each other (in fact, there is a small error in the paper [42]: the QCC from the linewidth in solution should be a
factor of $2\pi$ larger than what is stated in the article). More recently, Champmartin and Rubini [43] studied
carbon-13 and oxygen-17 (another $I = 5/2$ quadrupolar nucleus) relaxation in pentane-2,4-dione (Hacac) and
Al(acac)$_3$ in solution. The carbon measurements were performed as a function of the magnetic field. The
methine carbon relaxation showed, for both compounds, no field dependence, while the carbonyl carbon
$T_1^{-1}$ increased linearly with $B_0^2$. This indicates the CSA mechanism and allows an estimate of its
interaction strength, the anisotropy of the shielding tensor. This quantity could also be compared with solid
state measurements on Al(acac)$_3$ and, again, the agreement was good. From the oxygen-17 linewidth, the
authors also obtained the oxygen-17 QCC. The chemically interesting piece of information is the observation
that the QCC changes only slightly between the free acid and the trivalent metal complex.
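
The arithmetic of the dual spin probe approach can be sketched as follows. The sketch assumes the standard extreme-narrowing expression for quadrupolar relaxation, $1/T_2 = (3\pi^2/10)\,[(2I+3)/(I^2(2I-1))]\,(1+\eta^2/3)\,\chi^2\tau_c$ with the QCC $\chi$ in hertz, and a Lorentzian line of full width $\Delta\nu_{1/2} = 1/(\pi T_2)$. The numerical inputs (a 100 Hz linewidth and a 50 ps correlation time) and the function names are invented for illustration and are not the values reported in [42].

```python
import math

def quadrupolar_rate_factor(spin, tau_c, eta=0.0):
    """Return (1/T2) / chi^2 in the extreme-narrowing limit, with chi in Hz."""
    spin_factor = (2.0 * spin + 3.0) / (spin ** 2 * (2.0 * spin - 1.0))
    return (3.0 * math.pi ** 2 / 10.0) * spin_factor * (1.0 + eta ** 2 / 3.0) * tau_c

def qcc_from_linewidth(linewidth_hz, spin, tau_c, eta=0.0):
    """Estimate the QCC (Hz) from a Lorentzian linewidth (Hz) and a known tau_c (s)."""
    rate = math.pi * linewidth_hz          # 1/T2 from the full width at half maximum
    return math.sqrt(rate / quadrupolar_rate_factor(spin, tau_c, eta))

# Hypothetical inputs: in practice tau_c would come from the carbon-13 measurement.
qcc = qcc_from_linewidth(linewidth_hz=100.0, spin=2.5, tau_c=50e-12)
print(f"QCC = {qcc / 1e6:.2f} MHz")
```

Quoting the coupling constant in rad s$^{-1}$ instead of hertz changes its numerical value by $2\pi$, which is the kind of factor mentioned above in connection with [42].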

B1.13.4.2 OLIGOSACCHARIDES: HOW FLEXIBLE ARE THEY? 

Oligosaccharides are a class of small and medium-sized organic molecules, subject to intense NMR work. I
present here the story of two disaccharides, sucrose and α-D-Manp-(1→3)-β-D-Glcp-OMe, and a trisaccharide,
melezitose. McCain and Markley [44] published a carbon-13 $T_1$ and NOE investigation of sucrose in aqueous
solution as a function of temperature and magnetic field. At low temperatures, a certain field dependence of
the parameters could be observed, indicating that the extreme narrowing conditions might not be fulfilled
under these circumstances. Taking the system out of extreme narrowing (which corresponds to correlation
times of a few hundred picoseconds) renders the relaxation rates field dependent and allows the investigators
to ask more profound questions concerning the interaction strength and dynamics. Kovacs et al [45] followed
up the McCain-Markley study by performing similar experiments on sucrose in a 7:3 molar D$_2$O/DMSO-d$_6$
solvent mixture. The solvent mixture has about four times higher viscosity than water and is a cryo-solvent.
Thus, it was possible to obtain the motion of the solute molecule far from the extreme narrowing region and to
make a quantitative determination of the effective DCC, in a way which is related to the Lipari-Szabo method
[10]. Maler et al [46] applied a similar experimental approach (but extended also to include the $T_2$
measurements) and the Lipari-Szabo analysis to study the molecular dynamics of melezitose in the same
mixed solvent. Somewhat different dynamic behaviour of different sugar residues and the exocyclic
hydroxymethyl groups was reported.

The question of the flexibility (or rigidity) of the glycosidic linkage connecting the different sugar units in
oligosaccharides is a hot issue in carbohydrate chemistry. The findings of Maler et al [46] could be interpreted
as indicating a certain flexibility of the glycosidic linkages in melezitose. The question was posed more
directly by Poppe and van Halbeek [47], who measured the intra- and interresidue proton-proton NOEs and
rotating frame NOEs in sucrose in aqueous solution. They interpreted the results as proving the non-rigidity of
the glycosidic linkage in sucrose. Maler et al [48] investigated the disaccharide α-D-Manp-(1→3)-β-D-Glcp-OMe
by a combination of a carbon-13 spin-lattice relaxation study with measurements of the intra- and
interresidue proton-proton cross-relaxation rates in a water-DMSO mixture. They interpreted their data in
terms of the dynamics of the CH axes, the intra-ring HH axis and the trans-glycosidic HH axis being
described by a single set of Lipari-Szabo parameters. This indicated that the inter-residue HH axis did not
sense any additional mobility as compared to the other axes. We shall probably encounter a continuation of
this story in the future.

B1.13.4.3 HUMAN UBIQUITIN: A CASE HISTORY FOR A PROTEIN

Human ubiquitin is a small (76 amino acid) and well characterized protein. I choose to illustrate the 
possibilities offered by NMR relaxation studies of proteins [49] through this example. The 2D NMR studies 
for the proton resonance assignment and a partial structure determination through NOESY measurements 
were reported independently by Di Stefano and Wand [50] and by Weber et al [51]. Schneider et al [52]
studied a uniformly nitrogen-15 labelled species of human ubiquitin and reported nitrogen-15 $T_1$ (at two
magnetic fields) and NOE values for a large majority of amide nitrogen sites in the molecule. The data were
interpreted in terms of the Lipari-Szabo model, determining the generalized order parameter for the amide
NH axes. It was thus possible to identify the more flexible and more rigid regions in the protein backbone.
Tjandra et al [53] extended this work, both in terms of experiments (adding $T_2$ measurements) and in terms of
interpretation (allowing for the anisotropy of the global reorientation, by means of the anisotropic
Lipari-Szabo model [10]). This provided a more quantitative interpretation of the molecular dynamics. During
the last two years, Bax and coworkers have, in addition, determined the CSA for the nitrogen-15 and proton for
most of the amide sites, as well as for the α-carbons, through measurements of the cross-correlated relaxation
rates in the nitrogen-15-amide proton or the C$^\alpha$-H$^\alpha$ spin pairs [54, 55 and 56]. The CSA values could,
in turn, be correlated with the secondary structure, hydrogen bond length etc. It is not likely that the
ubiquitin story is finished either.

B1.14 NMR imaging (diffusion and flow)

Ute Goerke, Peter J McDonald and Rainer Kimmich 


B1.14.1 INTRODUCTION 

Nuclear magnetic resonance imaging and magnetic resonance determinations of flow and diffusion are 
increasingly coming to the fore as powerful means for characterizing dynamic processes in diverse areas of 
materials science covering physics, chemistry, biology and engineering. In this chapter, the basic concepts of 
the methods are introduced and a few selected examples presented to show the power of the techniques. 
Although flow and diffusion through bulk samples can be measured, they are primarily treated here as 
parameters to be mapped in an imaging experiment. To that end, imaging is dealt with first, followed by flow 
and diffusion, along with other contrast parameters such as spin-relaxation times and chemical shift. 

The great diversity of applications of magnetic resonance imaging (MRI) has resulted in a plethora of
techniques which at first sight can seem bemusing. However, at heart they are built on a series of common
building blocks which the reader will progressively come to recognize. The discussion of imaging is focused
very much on just three of these, namely slice selection, phase encoding and frequency encoding, which are
brought together in perhaps the most common imaging experiment of all, the spin warp sequence [1, 2 and 3].
This sequence is depicted in figure B1.14.1. Its building blocks are discussed in the following sections. Blocks
for contrast enhancement and parameter selectivity can be added to the sequence.


[Figure: pulse-sequence diagram with rows labelled RF, slice, phase and frequency.]

Figure B1.14.1. Spin warp spin-echo imaging pulse sequence. A spin echo is refocused by a non-selective
180° pulse. A slice is selected perpendicular to the z-direction. To frequency-encode the x-coordinate the echo
SE is acquired in the presence of the readout gradient. Phase-encoding of the y-dimension is achieved by
incrementing the gradient pulse $G_y$.


B1.14.2 FUNDAMENTALS OF SPATIAL ENCODING 

The classical description of magnetic resonance suffices for understanding the most important concepts of
magnetic resonance imaging. The description is based upon the Bloch equation, which, in the absence of
relaxation, may be written as

$$\frac{d\mathbf{M}(\mathbf{r},t)}{dt} = \gamma\,\mathbf{M}(\mathbf{r},t)\times\mathbf{B}(\mathbf{r},t).$$

The equation describes the manner in which the nuclear magnetization, $\mathbf{M}$, at position $\mathbf{r}$ and
time $t$ precesses about the magnetic flux density, $\mathbf{B}$, in which it is found. The constant $\gamma$ is the
magnetogyric ratio of the nuclides under study. The precessional frequency, $\omega$, is given by the Larmor
equation,

$$\omega = \gamma B. \tag{B1.14.1}$$
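
As a quick numerical illustration of the Larmor equation, the sketch below evaluates the precession frequency of protons at a few field strengths. The proton magnetogyric ratio is the CODATA value; the chosen fields and the function names are illustrative assumptions.

```python
import math

GAMMA_1H = 2.6752218744e8  # proton magnetogyric ratio in rad s^-1 T^-1

def larmor_angular_frequency(b_tesla, gamma=GAMMA_1H):
    """Larmor equation (B1.14.1): omega = gamma * B, in rad/s."""
    return gamma * b_tesla

def larmor_frequency_hz(b_tesla, gamma=GAMMA_1H):
    """The same frequency expressed in Hz (omega / 2 pi)."""
    return larmor_angular_frequency(b_tesla, gamma) / (2.0 * math.pi)

for b0 in (1.0, 4.7, 9.4):   # example flux densities in tesla
    print(f"B0 = {b0:4.1f} T -> {larmor_frequency_hz(b0) / 1e6:7.2f} MHz")
```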


Magnetic resonance imaging, flow and diffusion all rely upon manipulating spatially varying magnetic fields
in such a manner as to encode spatial information within the accumulated precession of the magnetization. In
most MRI implementations, this is achieved with three orthogonal, constant, pulsed magnetic field gradients
which are produced using purpose-built current windings carrying switched currents. The gradient fields are
superimposed on the normal static, applied field $B_0$. Although the current windings produce gradient fields
with components in all directions, for sufficiently high applied flux densities and small enough gradients, it is
sufficient to consider only the components in the direction of the applied field, conventionally assigned the
z-direction. Accordingly, the gradients are referred to as $G_x = \partial B_z/\partial x$,
$G_y = \partial B_z/\partial y$ and $G_z = \partial B_z/\partial z$. The local polarizing field at position
$\mathbf{r}$ is then given by

$$\mathbf{B}(\mathbf{r}) = (B_0 + \mathbf{G}\cdot\mathbf{r})\,\mathbf{u}_z.$$

The Bloch equation is simplified, and the experiment more readily understood, by transformation into a frame
of reference rotating at the frequency $\omega_0 = \gamma B_0$ about the z-axis, whereupon:

$$\frac{d\mathbf{M}'(\mathbf{r},t)}{dt} = \gamma\,\mathbf{M}'(\mathbf{r},t)\times(\mathbf{G}\cdot\mathbf{r})\,\mathbf{u}_z.$$

The transverse magnetization may be described in this frame by a complex variable, $m$, the real and imaginary
parts of which represent the real and imaginary components of observable magnetization respectively:

$$m(\mathbf{r},t) = M'_x(\mathbf{r},t) + iM'_y(\mathbf{r},t).$$

The components of the Bloch equation are hence reduced to

$$\frac{dM'_z(\mathbf{r},t)}{dt} = 0, \qquad \frac{dm(\mathbf{r},t)}{dt} = -i\gamma(\mathbf{G}\cdot\mathbf{r})\,m(\mathbf{r},t).$$

The z-component of the magnetization is constant. The evolution of the transverse magnetization is given by

$$m(\mathbf{r},t) = m(\mathbf{r},0)\exp[-i\Omega(\mathbf{r})t] \tag{B1.14.2}$$

where $\Omega(\mathbf{r}) = \gamma\,\mathbf{G}\cdot\mathbf{r}$ is the offset frequency relative to the resonance frequency
$\omega_0$. Spin position is encoded directly in the offset frequency $\Omega$. Measurement of the offset frequency
forms the basis of the frequency encoding of spatial information, the first building block of MRI. It is
discussed further in subsequent sections. In a given time $t = \tau$ the transverse magnetization accumulates a
spatially varying phase shift, $\Omega(\mathbf{r})\tau$. Measurement of the phase shift forms the basis of another
building block, the second encoding technique, phase encoding, which is also further discussed below.
However, before proceeding with further discussion of either of these two, we turn our attention to the third
key component of an imaging experiment: slice selection.
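
The way position is mapped onto offset frequency and accumulated phase can be illustrated with a few lines of code. This is a minimal sketch of equation B1.14.2 for protons; the 10 mT m$^{-1}$ gradient, the positions and the function names are assumptions made for the example.

```python
import cmath
import math

GAMMA_1H = 2.6752218744e8  # proton magnetogyric ratio in rad s^-1 T^-1

def offset_frequency(gradient, position, gamma=GAMMA_1H):
    """Omega(r) = gamma * G . r for a 3-component gradient (T/m) and position (m)."""
    return gamma * sum(g * r for g, r in zip(gradient, position))

def evolve_transverse(m0, gradient, position, t, gamma=GAMMA_1H):
    """Equation B1.14.2: m(r, t) = m(r, 0) exp(-i Omega(r) t)."""
    return m0 * cmath.exp(-1j * offset_frequency(gradient, position, gamma) * t)

gradient = (0.01, 0.0, 0.0)          # 10 mT/m along x (illustrative)
for x_mm in (-1.0, 0.0, 1.0):
    r = (x_mm * 1e-3, 0.0, 0.0)
    omega = offset_frequency(gradient, r)
    m = evolve_transverse(1.0, gradient, r, t=1e-3)
    # cmath.phase returns the phase wrapped into (-pi, pi]
    print(f"x = {x_mm:+.1f} mm: offset = {omega / (2 * math.pi):+8.1f} Hz, "
          f"wrapped phase after 1 ms = {cmath.phase(m):+.3f} rad")
```

Reading the offset frequency corresponds to frequency encoding, while reading the phase accumulated in a fixed time corresponds to phase encoding.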

B1.14.2.1 SLICE SELECTION

A slice is selected in NMR imaging by applying a radio frequency (RF) excitation pulse to the sample in the
presence of a magnetic field gradient. This is in contrast to spatial encoding, where the magnetization
following excitation is allowed to evolve freely in the presence of a gradient. A simple appreciation of how
slice selection works is afforded by comparing the spread of resonance frequencies of the nuclei in the sample
with the frequency bandwidth of the RF pulse. The resonance frequencies of nuclear spins in a sample placed
centrally in a magnetic field, $B_0$, with a superimposed constant gradient $G$ vary linearly across the sample
between $\omega_0 \pm \gamma G d_s/2$, where $d_s$ is the dimension of the sample in the gradient direction. To a first
approximation, the radio-frequency excitation pulse (of duration $t_w$ and carrier frequency $\omega_0$) contains
frequency components in the range $\omega_0 \pm \pi/t_w$. If the pulse bandwidth is significantly less than the
spread of frequencies within the sample, the condition for a so-called soft pulse, then the pulse excites nuclei
only in a central slice of the sample, perpendicular to the gradient, where the frequencies match. A slice at a
position other than the centre of the sample can be chosen by offsetting the excitation carrier frequency,
$\omega_0$.
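
The bandwidth argument above translates directly into a slice thickness. The sketch below, for protons, divides the pulse bandwidth $2\pi/t_w$ by $\gamma G$ and also converts a carrier offset into a slice displacement. The numerical values and function names are illustrative assumptions, and the estimate inherits the first-approximation character of the argument.

```python
import math

GAMMA_1H = 2.6752218744e8  # proton magnetogyric ratio in rad s^-1 T^-1

def slice_thickness(gradient, pulse_duration, gamma=GAMMA_1H):
    """Thickness (m) of the excited slice: bandwidth 2*pi/t_w divided by gamma*G."""
    return 2.0 * math.pi / (gamma * gradient * pulse_duration)

def slice_position(carrier_offset_hz, gradient, gamma=GAMMA_1H):
    """Displacement (m) of the slice centre for a carrier offset given in Hz."""
    return 2.0 * math.pi * carrier_offset_hz / (gamma * gradient)

g_slice = 0.01     # 10 mT/m slice gradient (illustrative)
t_w = 1e-3         # 1 ms soft pulse (illustrative)
print(f"slice thickness: {slice_thickness(g_slice, t_w) * 1e3:.2f} mm")
print(f"shift for a 5 kHz carrier offset: {slice_position(5e3, g_slice) * 1e3:.2f} mm")
```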

A more detailed description of the action of an arbitrary pulse on a sample in a gradient can be obtained from 
a solution of the Bloch equations, either numerically or using advanced analytic techniques. In general this is 
complicated since the effective field in the rotating frame is composed not only of the spatially varying 
gradient field in the z-direction but also the transverse excitation field which, in general, varies with time. 
What follows is therefore an approximate treatment which nonetheless provides a surprisingly accurate 
description of many of the more commonly used slice selection pulses [4]. 

Suppose that a gradient, $\mathbf{G} = G_z\mathbf{u}_z$, is applied in the z-direction. In a local frame of reference
rotating about the combined polarizing and gradient fields at the frequency
$\omega = \omega_0 + \gamma(\mathbf{G}\cdot\mathbf{r})$, an excitation pulse $B_1(t)$ applied at the central resonance
frequency $\omega_0$ and centred on $t = 0$ is seen to rotate at the offset frequency $\gamma(\mathbf{G}\cdot\mathbf{r})$
and therefore to have components given by

$$B_{1x'}(t) = B_1(t)\cos[\gamma(\mathbf{G}\cdot\mathbf{r})t], \qquad B_{1y'}(t) = -B_1(t)\sin[\gamma(\mathbf{G}\cdot\mathbf{r})t].$$

The excitation field is the only field seen by the magnetization in the rotating frame. The magnetization
precesses about it. Starting from equilibrium ($\mathbf{M} = M_0\mathbf{u}_z$), transverse components are created and
develop according to

$$\frac{dm(\mathbf{r},t)}{dt} = i\gamma M'_z(\mathbf{r},t)\,B_1(t)\exp[-i\gamma(\mathbf{G}\cdot\mathbf{r})t].$$

The simplifying approximation of a linear response is now made, by which it is assumed that rotations about
different axes may be decoupled. This is only strictly valid for small rotations, but is surprisingly good for
larger rotations too. This means that $M'_z(\mathbf{r},t) \approx M_0 = \text{constant}$. Accordingly, at the end of the
pulse the transverse magnetization is given by

$$m(\mathbf{r}, t_w/2) = i\gamma M_0\int_{-t_w/2}^{t_w/2} B_1(t)\exp[-i\gamma(\mathbf{G}\cdot\mathbf{r})t]\,dt. \tag{B1.14.3}$$

The integral describes the spatial amplitude modulation of the excited magnetization. It represents the
excitation or slice profile, $g(z)$, of the pulse in real space. As $B_1$ drops to zero for $t$ outside the pulse, the
integration limits can be extended to infinity, whereupon it is seen that the excitation profile is the Fourier
transform of the pulse shape envelope:

$$g(z) \propto \int_{-\infty}^{\infty} B_1(t)\exp(-i\gamma G_z z t)\,dt.$$


For a soft pulse with a rectangular envelope

$$B_1(t) = \begin{cases} B_1 & \text{for } -t_w/2 < t < t_w/2 \\ 0 & \text{otherwise} \end{cases}$$

and a carrier frequency resonant at the position $z = z_0$ the excitation profile is sinc shaped. In normalized
form, it is:

$$\frac{g(z - z_0)}{g(0)} = \mathrm{sinc}\left(\frac{\gamma G_z (z - z_0)\,t_w}{2\pi}\right)$$

where

$$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}.$$


A sinc shape has side lobes which impair the excitation of a distinct slice. Other pulse envelopes are therefore
more commonly used. Ideally, one would like a rectangular excitation profile, which results from a
sinc-shaped pulse with an infinite number of side lobes. In practice, a finite pulse duration is required and
therefore the pulse has to be truncated, which causes oscillations in the excitation profile. Another frequently
used pulse envelope is a Gaussian function:

$$B_1(t) = B_1(0)\,e^{-at^2}$$

which produces a Gaussian slice profile:

$$\frac{g(z - z_0)}{g(0)} = e^{-\Omega_z^2/(4a)}$$

where $\Omega_z = \gamma G_z(z - z_0)$ and $a$ is a pulse-width parameter. The profile is centred on $z_0$, at which
position the transmitter frequency is resonant. The full width at half maximum of the slice, $\Delta z_{1/2}$, is
determined by the constant $a$:

$$\Delta z_{1/2} = \frac{4}{\gamma G_z}\sqrt{a\ln 2}.$$

For a given full width at half maximum in the time domain, $\Delta t_{1/2} = 2\sqrt{\ln 2/a}$, the slice width
$\Delta z_{1/2}$ decreases with increasing gradient strength $G_z$.
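
The Fourier relation between pulse envelope and slice profile can be verified numerically. The sketch below integrates $B_1(t)\exp(-i\Omega t)$ with a simple midpoint rule for a rectangular and a Gaussian envelope and normalizes the result to its on-resonance value. The pulse duration, the number of integration steps and the function names are illustrative assumptions.

```python
import cmath
import math

def excitation_profile(envelope, omega, t_min, t_max, steps=4000):
    """Midpoint-rule estimate of the integral of B1(t) exp(-i omega t) dt."""
    dt = (t_max - t_min) / steps
    total = 0j
    for n in range(steps):
        t = t_min + (n + 0.5) * dt
        total += envelope(t) * cmath.exp(-1j * omega * t)
    return total * dt

def rectangular_profile(omega, t_w):
    """Normalized profile g(omega)/g(0) of a rectangular pulse of duration t_w."""
    box = lambda t: 1.0
    on_resonance = excitation_profile(box, 0.0, -t_w / 2, t_w / 2)
    return (excitation_profile(box, omega, -t_w / 2, t_w / 2) / on_resonance).real

def gaussian_profile(omega, a):
    """Normalized profile of the envelope exp(-a t^2), integrated over +/- 6/sqrt(a)."""
    bell = lambda t: math.exp(-a * t * t)
    limit = 6.0 / math.sqrt(a)
    on_resonance = excitation_profile(bell, 0.0, -limit, limit)
    return (excitation_profile(bell, omega, -limit, limit) / on_resonance).real

t_w = 1e-3                              # illustrative pulse duration in seconds
a = 4.0 * math.log(2.0) / t_w ** 2      # Gaussian whose full width at half maximum is t_w
for multiple in (0.0, 1.0, 2.0, 3.0, 4.0):
    omega = multiple * math.pi / t_w    # offset frequency gamma * G_z * (z - z_0)
    print(f"offset = {multiple:.0f} pi/t_w: rect = {rectangular_profile(omega, t_w):+.4f}, "
          f"gauss = {gaussian_profile(omega, a):+.4f}")
```

The rectangular pulse shows the oscillating side lobes of the sinc function, whereas the Gaussian profile decays monotonically, which is why smoother envelopes are preferred for slice selection.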

Closer examination of equation B1.14.3 reveals that, after the slice selection pulse, the spin isochromats at
different positions in the gradient direction are not in phase. Rather they are rotated by
$i\exp(-i\gamma G_z z t_w/2)$ and therefore have a phase $(\gamma G_z z t_w - \pi)/2$. Destructive interference of
isochromats at different locations leads to cancellation of the total transverse magnetization.

The dephased isochromats may be refocused by applying one or more RF and trimming gradient pulses after
the excitation. In the pulse sequence shown in figure B1.14.1 the magnetization is refocused by a negative
trimming z-gradient pulse of the same amplitude as, but half the duration of, the original slice selection
gradient pulse. The trimming gradient causes a precession of the magnetization at every location which exactly
compensates that occurring during the initial excitation (save for the constant phase term $\pi/2$). A similar
effect may be achieved using a (non-selective) 180° pulse and a trimming gradient pulse of the same sign.
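
The effect of the trimming gradient can be illustrated by summing isochromats across the slice. The sketch below assumes protons, a uniformly excited slice whose width equals the bandwidth estimate $2\pi/(\gamma G t_w)$, and unit-magnitude isochromats; the constant phase term is left out because it does not affect the magnitude of the sum. All names and numbers are illustrative.

```python
import cmath
import math

GAMMA_1H = 2.6752218744e8  # proton magnetogyric ratio in rad s^-1 T^-1

def gradient_phase(z, gradient, duration, gamma=GAMMA_1H):
    """Phase (rad) accumulated at position z in a gradient applied for `duration`."""
    return gamma * gradient * z * duration

def net_transverse_signal(phases):
    """Magnitude of the normalized vector sum of unit isochromats with these phases."""
    return abs(sum(cmath.exp(-1j * p) for p in phases)) / len(phases)

g_slice, t_w = 0.01, 1e-3                                  # 10 mT/m, 1 ms (illustrative)
width = 2.0 * math.pi / (GAMMA_1H * g_slice * t_w)         # assumed slice width
positions = [(-0.5 + n / 200.0) * width for n in range(201)]

# Position-dependent phase left by the pulse: gamma * G * z * t_w / 2.
after_pulse = [gradient_phase(z, g_slice, t_w / 2) for z in positions]
# Negative trimming gradient of the same amplitude and half the pulse duration.
after_trim = [p + gradient_phase(z, -g_slice, t_w / 2)
              for p, z in zip(after_pulse, positions)]

print(f"net signal after pulse: {net_transverse_signal(after_pulse):.3f}")
print(f"net signal after trim:  {net_transverse_signal(after_trim):.3f}")
```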


B1.14.2.2 FREQUENCY ENCODING

Once a slice has been selected and excited, it is necessary to encode the ensuing NMR signal with the 
coordinates of nuclei within the slice. For each coordinate (x and y) this is achieved by one of two very closely
related means, frequency encoding or phase encoding [JJ. In this section we consider the former and in the 
next, the latter. In the section after that we show how the two are combined in the most common imaging 
experiment. 

As before, we note that the resonance frequency of a nucleus at position $\mathbf{r}$ is directly proportional to the
combined applied static and gradient fields at that location. In a gradient $\mathbf{G} = G_x\mathbf{u}_x$, orthogonal to
the slice selection gradient, the nuclei precess (in the usual frame rotating at $\omega_0$) at a frequency
$\omega = \gamma G_x x$. The observed signal therefore contains a component at this frequency with an amplitude
proportional to the local spin density. The total signal is of the form

$$S(t_x) = \int_{-\infty}^{\infty} dx\, m(x)\,e^{-i\gamma G_x x t_x}$$

from which it is seen that the spin density in the x-direction is recovered by a Fourier transform of the signal
with respect to time:

$$m(x) \propto \int_{-\infty}^{\infty} dt_x\, S(t_x)\exp(i\gamma G_x x t_x).$$

In practice, it is generally preferable to create and record the signal in the form of an echo well clear of pulse
ring-down and other artifacts associated with defining the zero of time. This can be done either by first
applying a negative gradient lobe followed by a positive gradient (a gradient echo) or by including a 180°
inversion pulse between two positive gradients (a spin echo). Figure B1.14.1 demonstrates the spin echo. A
trimming x-gradient is placed before the 180° pulse, which inverts the phase of the magnetization, so that,
with reference to figure B1.14.1,

$$m(\mathbf{r}, t_x) = m(\mathbf{r}, 0)\,e^{i\gamma G_x x t_{x0}}\,e^{-i\gamma G_x x t_x}.$$

The maximum signal appears at the echo centre, when the exponent vanishes for all $\mathbf{r}$ at time
$t_x = t_{x0}$. If the magnitudes of the read gradient and the corresponding trimming gradient are the same, then
$t_{x0}$ equals the duration of the trimming gradient. The magnetization profile is obtained by Fourier
transforming the echo.
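
A minimal simulation shows how a one-dimensional profile is recovered from a frequency-encoded echo. The sketch assumes protons, an echo sampled symmetrically about its centre, and a pixel size chosen as $2\pi/(\gamma G_x N \Delta t)$ so that the discrete Fourier sum inverts exactly. The density values, the gradient, the dwell time and the function names are all illustrative.

```python
import cmath
import math

GAMMA_1H = 2.6752218744e8  # proton magnetogyric ratio in rad s^-1 T^-1

def acquire_echo(density, gradient, dwell, gamma=GAMMA_1H):
    """Simulate S(t) = sum_x m(x) exp(-i gamma G x t), with t = 0 at the echo centre."""
    n = len(density)
    pixel = 2.0 * math.pi / (gamma * gradient * n * dwell)
    positions = [(j - n // 2) * pixel for j in range(n)]
    times = [(k - n // 2) * dwell for k in range(n)]
    signal = [sum(m * cmath.exp(-1j * gamma * gradient * x * t)
                  for m, x in zip(density, positions)) for t in times]
    return signal, positions, times

def reconstruct_profile(signal, positions, times, gradient, gamma=GAMMA_1H):
    """Recover m(x) by the inverse Fourier sum over the sampled echo."""
    n = len(signal)
    return [sum(s * cmath.exp(1j * gamma * gradient * x * t)
                for s, t in zip(signal, times)) / n for x in positions]

density = [0, 0, 0, 1, 1, 2, 2, 2, 1, 0, 0, 0, 0, 3, 0, 0]   # made-up spin density
signal, positions, times = acquire_echo(density, gradient=0.01, dwell=1e-5)
profile = reconstruct_profile(signal, positions, times, gradient=0.01)
print([round(abs(p), 6) for p in profile])
```

The largest signal occurs at the echo centre, where all isochromats are in phase and the signal equals the integrated spin density.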

B1.14.2.3 PHASE ENCODING

If a gradient pulse is applied for a fixed evolution time $t_y$ the magnetization is dephased by an amount
dependent on the gradient field. The signal phase immediately after the gradient varies linearly in the direction
of the gradient. For a y-gradient $\mathbf{G} = G_y\mathbf{u}_y$ of duration $t_y$ as shown in figure B1.14.1 we have:

$$m(\mathbf{r}, G_y) = m(\mathbf{r}, 0)\,e^{-i\gamma G_y y t_y}.$$

For phase encoding the phase twist is most commonly varied by incrementing $G_y$ in a series of subsequent
transients as this results in a constant transverse relaxation attenuation of the signal at the measurement
position. The signal intensity as a function of $G_y$ is

$$S(G_y) = \int_{-\infty}^{\infty} dy\, m(y)\,e^{-i\gamma G_y y t_y}.$$

The magnetization profile in the y-direction is recovered by Fourier transformation with respect to $G_y$.

B1.14.2.4 2D SPIN-ECHO FT IMAGING AND K-SPACE 

We now bring all the elements of the imaging experiment together within the typical spin-warp imaging
sequence [10] previously depicted in figure B1.14.1. A soft 90° pulse combined with slice-selection gradient
$G_z$ excites a slice in the x/y-plane. The spin isochromats are refocused by the negative z-gradient lobe. A
subsequent spin echo, SE, is formed by a hard 180° pulse inserted after the slice selection pulse. The
$G_x$-gradient either side of the 180° pulse first dephases and then rephases the magnetization so that an echo
forms at twice the time separation, $\tau$, between the two pulses. The x-dimension is encoded by acquiring the
echo in the presence of this gradient, often known as a readout gradient. The y-dimension is phase encoded
using the gradient $G_y$, which is incremented in subsequently measured transients. The signal intensity $S$ of
the echo is the superposition of all the transverse magnetization originating from the excited slice. The
acquired data set is described by:

$$S(t_x, G_y) = \iint dx\,dy\; m(x,y)\,e^{-i\gamma G_x x (t_x - t_{x0})}\,e^{i\gamma G_y y t_y}.$$

The magnetization density is recovered by a two-dimensional Fourier transform of the data with respect to
$t_x$ and $G_y$.

From a more general point of view, components $k_j$, $j = x, y, z$, of a wave vector $\mathbf{k}$ which describes the
influence of all gradient pulses may be defined as follows: $k_j(t) = \int_0^t \gamma G_j(t')\,dt'$. For the 2D imaging
pulse sequence discussed here, $k_x = \gamma G_x(t_x - t_{x0})$ and $k_y = -\gamma G_y t_y$ (the negative signs resulting
from the inversion pulse). Spatial encoding is the sampling of k-space and the acquired data set is then:

$$S(k_x, k_y) = \iint dx\,dy\; m(x,y)\,e^{-ik_x x}\,e^{-ik_y y}.$$

The image, i.e., the spatially resolved distribution of the magnetization $m(x,y)$, is reconstructed by
two-dimensional Fourier transformation with respect to $k_x$ and $k_y$. In the absence of other interactions and
encodings discussed below, it represents the spin density distribution of the sample.
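
The k-space picture can be made concrete with a very small two-dimensional example. The sketch below samples $S(k_x, k_y)$ on a Cartesian grid for a made-up square $4 \times 4$ magnetization map and recovers the map by the inverse two-dimensional Fourier sum. It uses plain loops rather than a fast Fourier transform to keep the correspondence with the integral obvious; the grid spacing and function names are assumptions.

```python
import cmath
import math

def sample_k_space(image, pixel=1.0):
    """Return S(kx, ky) = sum m(x, y) exp(-i (kx x + ky y)) on an n x n Cartesian grid."""
    n = len(image)                      # the image is assumed to be square
    coords = [(j - n // 2) * pixel for j in range(n)]
    ks = [(u - n // 2) * 2.0 * math.pi / (n * pixel) for u in range(n)]
    data = [[sum(image[iy][ix] * cmath.exp(-1j * (kx * coords[ix] + ky * coords[iy]))
                 for iy in range(n) for ix in range(n))
             for kx in ks] for ky in ks]
    return data, ks, coords

def reconstruct_image(data, ks, coords):
    """Inverse two-dimensional Fourier sum over the sampled k-space points."""
    n = len(ks)
    return [[sum(data[v][u] * cmath.exp(1j * (ks[u] * x + ks[v] * y))
                 for v in range(n) for u in range(n)).real / n ** 2
             for x in coords] for y in coords]

image = [[0, 0, 0, 0],
         [0, 1, 2, 0],
         [0, 3, 4, 0],
         [0, 0, 0, 0]]                  # made-up magnetization map m(x, y)
data, ks, coords = sample_k_space(image)
recovered = reconstruct_image(data, ks, coords)
for row in recovered:
    print([round(value, 6) for value in row])
```

In the spin warp experiment each row of `data` corresponds to one transient (one value of the phase-encoding gradient) and each column to one sampling instant of the echo.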

There is of course no requirement to confine the slice selection to the z-gradient. The gradients may be used in 
any combination and an image plane selected in any orientation without recourse to rotating the sample. 


Another frequently used imaging method is gradient-recalled spin-echo imaging (figure B1.14.2). In this
method, the 180° pulse of the spin warp experiment is omitted and the first lobe of the $G_x$ gradient is instead
inverted. Otherwise the experiment is the same. As the refocusing 180° pulse is omitted, the echo time $T_E$ can
be adjusted to be shorter than in the spin-echo version. Therefore gradient-recalled echo and variants of this
technique are used when samples with shorter $T_2$ are to be imaged. On the other hand, this method is more
susceptible to off-resonance effects, e.g., due to chemical shift or magnetic field inhomogeneities.

Figure B1.14.2. Gradient-recalled echo pulse sequence. The echo is generated by deliberately dephasing and
refocusing transverse magnetization with the readout gradient. A slice is selected in the z-direction and x- and
y-dimension are frequency and phase encoded, respectively.


The imaging methods just described require $n$ transients each containing $l$ data points to be acquired so as to
construct a two-dimensional image of a matrix of $n \times l$ pixels. Since it is necessary to wait a time of the
order of the spin-lattice relaxation time, $T_1$, for the spin system to recover between the collection of
transients, the total imaging time is in excess of $nT_1$ for a single average. This may mean imaging times of the
order of minutes. The k-space notation and description of imaging makes it easy to conceive of single transient
imaging experiments in which, by judicious switching of the gradients so as to form multiple echoes, the
whole of k-space can be sampled in a single transient.
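
The imaging-time estimate is simple arithmetic, sketched below. The matrix size, relaxation time and number of averages are illustrative assumptions, and the function name is hypothetical.

```python
def minimum_imaging_time(n_transients, t1, averages=1):
    """Lower bound (s) on the imaging time: about T1 of recovery per transient."""
    return n_transients * t1 * averages

# Illustrative case: 256 phase-encoding steps and T1 = 1 s.
seconds = minimum_imaging_time(256, 1.0)
print(f"single average: {seconds:.0f} s ({seconds / 60.0:.1f} min)")
print(f"four averages:  {minimum_imaging_time(256, 1.0, averages=4) / 60.0:.1f} min")
```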

Techniques of this kind go by the generic title of echo-planar imaging methods [5, 6, 7 and 8] and in the case
of full three-dimensional imaging, echo-volumar imaging. A common echo-planar imaging pulse sequence,
that for blipped echo-planar imaging, is shown in figure B1.14.3. Slice selection is as before. The alternate
positive and negative pulses of x-gradient form repeated gradient echoes as k-space is repeatedly traversed in
the positive and negative x-directions. These echoes are used for frequency encoding. The initial negative
pulse of y-gradient followed by the much smaller pulses of positive y-gradient ensure that each traverse in the
x-direction is for a different value of $k_y$, starting from an extreme negative value.

Figure B1.14.3. Echo-planar imaging (EPI) pulse sequence. In analogy to the pure gradient-echo recalled
pulse sequence a series of echoes GE1-GE6 is refocused by alternatingly switching between positive and
negative readout gradients. During the readout gradient switching a small phase-encoding gradient pulse
(blip) is applied. The spatial phase encoding is hence stepped through the acquired echo train.
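
The order in which blipped echo-planar imaging visits k-space can be written down as a short routine. This is a sketch on an integer grid of k-space indices, not a description of any particular scanner implementation; the function name and grid size are hypothetical.

```python
def blipped_epi_trajectory(n_kx, n_ky):
    """Return the (kx, ky) grid indices in the order in which they are sampled."""
    kx_start = -(n_kx // 2)
    ky_start = -(n_ky // 2)           # set by the initial negative y-gradient pulse
    trajectory = []
    for line in range(n_ky):
        ky = ky_start + line          # each blip steps ky by one grid unit
        kx_values = list(range(kx_start, kx_start + n_kx))
        if line % 2 == 1:             # negative x-gradient: traverse the line backwards
            kx_values.reverse()
        trajectory.extend((kx, ky) for kx in kx_values)
    return trajectory

for point in blipped_epi_trajectory(4, 3):
    print(point)
```

Every second line is acquired in the reverse direction, so in a reconstruction those echoes have to be time-reversed before the two-dimensional Fourier transform is applied.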

In cases where it is not possible to rapidly switch the gradients, frequency-encoded profiles may be acquired
in different directions by rotating the gradient and/or sample orientation between transients. The image is
reconstructed using filtered back-projection algorithms [9, 10]. The two-dimensional raster of k-space for the
spin warp experiment shown in figure B1.14.1 is shown in figure B1.14.4(a) and that for the blipped
echo-planar method in figure B1.14.4(b). The raster for back-projection methods is shown in figure B1.14.4(c).

Figure B1.14.4. k-space representations of the (a) spin-echo imaging pulse sequence in figure B1.14.1, (b)
echo-planar imaging sequence in figure B1.14.3 and (c) back-projection imaging. In (a) and (b) the
components of the wave vectors $k_x$ and $k_y$ represent frequency and phase encoding, respectively. For
back-projection (c) solely a readout gradient for frequency encoding is used. The direction of this gradient is
changed stepwise from 0 to 180° in the x/y-plane (laboratory frame) by appropriate superposition of the x- and
y-gradient. The related wave vector $k_R$ then rotates around the origin sampling the $k_x/k_y$-plane. As the
sampled points are not equidistant in k-space, an image reconstruction algorithm different from the
two-dimensional Fourier transform has to be used [9, 10].

B1.14.2.5 RESOLUTION 

The achievable spatial resolution is limited by several effects. The first is the maximum gradient strength and
encoding times available. Bearing in mind equation B1.14.1 and $\Delta f = 1/t_x^{\max}$, the pixel size resulting from
the Fourier transform of the frequency encoded echo is given by

$$\Delta x = \frac{2\pi}{\gamma G_x t_x^{\max}} \tag{B1.14.4}$$

where $t_x^{\max}$ is the total data acquisition time. The maximum useful encoding time is limited by the transverse
relaxation time of the nuclei, $T_2$. Therefore the best resolution which can be achieved is of the order of [14]

$$\Delta x_{\min} = \frac{2\pi}{\gamma G_x T_2}. \tag{B1.14.5}$$


A similar result holds for phase encoding where the gradient strength appearing in the expression for the
resolution is the maximum gradient strength available. According to these results the resolution can be
improved by raising the gradient strength. However, for any spectrometer system, there is a maximum
gradient strength which can be switched within a given rise time due to technical limitations. Although
modern gradient coil sets are actively shielded to avoid eddy currents in the magnet, in reality systems with
pulsed gradients in excess of 1 T m$^{-1}$ are rare. Since the $T_2$ of many commonly imaged, more mobile, samples
is of the order of 10 ms, resolution limits in $^1$H imaging are generally in the range one to ten micrometres. At
this resolution, considerable signal averaging is generally required in order to obtain a sufficient signal to
noise ratio and the imaging time may extend to hours. Moreover, a slice thickness of typically 500 µm, which
is significantly greater than the lateral resolution, is frequently used to improve signal to noise. In solids, $T_2$ is
generally very much shorter than in soft matter and high resolution imaging is not possible without recourse
either to sophisticated line narrowing techniques [11], to magic-echo refocusing variants [2] or to very high
gradient methods such as stray field imaging [13].

Motion, and in particular diffusion, causes a further limit to resolution [14, 15]. First, there is a physical
limitation caused by spins diffusing into adjacent voxels during the acquisition of a transient. For water
containing samples at room temperature the optimal resolution on these grounds is about 5 µm. However, as
will be seen in subsequent sections, diffusion of nuclei in a magnetic field gradient causes an additional
attenuation of the signal in the time domain. In the presence of a steady gradient, it is $\exp(-\gamma^2 G^2 D t^3/3)$.
Hence, the linewidth for spins diffusing in a gradient is of the order of

$$\Delta f_{\lim} \approx \left(\frac{\gamma^2 G^2 D}{3}\right)^{1/3}$$

where D is the diffusion coefficient, so that the best resolution becomes, in analogy to the derivation of
equation (B1.14.4) and equation (B1.14.5),

$$\Delta x_{\lim} \approx 2\pi \left(\frac{D}{3\gamma G}\right)^{1/3}$$

In practice, internal gradients inherent to the sample resulting from magnetic susceptibility changes at internal
interfaces can dominate the applied gradients and lead to strong diffusive broadening just where image
resolution is most required. Again the resolution limits tend to be on the ten micrometre scale.
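
As a rough numerical check of the limits quoted in this section, the following Python sketch evaluates the relaxation-limited pixel size $2\pi/(\gamma G T_2)$ and a diffusion-limited pixel size. The latter is obtained by taking the linewidth as $(\gamma^2 G^2 D/3)^{1/3}$, the inverse of the time at which the exponent $\gamma^2 G^2 D t^3/3$ reaches unity. The function names, the rounded proton gyromagnetic ratio and the water diffusion coefficient of 2 × 10^-9 m^2 s^-1 are assumptions made for illustration only.

```python
import math

# Rounded proton gyromagnetic ratio in rad s^-1 T^-1 (assumed value for illustration).
GAMMA_1H = 2.675e8


def resolution_t2_limit(gradient, t2, gamma=GAMMA_1H):
    """Relaxation-limited pixel size 2*pi / (gamma * G * T2), in metres."""
    return 2.0 * math.pi / (gamma * gradient * t2)


def resolution_diffusion_limit(gradient, diffusion, gamma=GAMMA_1H):
    """Diffusion-limited pixel size 2*pi * (D / (3 * gamma * G))**(1/3), in metres."""
    return 2.0 * math.pi * (diffusion / (3.0 * gamma * gradient)) ** (1.0 / 3.0)


# Illustrative inputs: 1 T/m gradient, T2 = 10 ms, D roughly that of water (assumed).
t2_limit = resolution_t2_limit(gradient=1.0, t2=10e-3)
diff_limit = resolution_diffusion_limit(gradient=1.0, diffusion=2.0e-9)
print(f"T2-limited pixel size:        {t2_limit * 1e6:.1f} micrometres")
print(f"diffusion-limited pixel size: {diff_limit * 1e6:.1f} micrometres")
```

With these inputs both estimates fall between one and ten micrometres, in line with the range stated in the text.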

B1.14.3 CONTRASTS IN MR IMAGING

Almost without exception, magnetic resonance 'images' are more than a simple reflection of nuclear spin 
density throughout the sample. They crudely visualize one or more magnetic resonance measurement 
parameters with which the NMR signal intensity is weighted. A common example of this kind is spin- 
relaxation-weighted image contrast. Mapping refers to encoding of the value of an NMR measurement 
parameter within each image pixel and thereby the creation of a map of this parameter. An example is a 
velocity map. The power of MRI compared to other imaging modalities is the large range of dynamic and 
microscopic structural contrast parameters which the method can encode, 'visualise' and map. 

B1.14.3.1 RELAXATION

Transverse relaxation weighting is perhaps the most common form of contrast imposed on a magnetic
resonance image. It provides a ready means of differentiating between more mobile components of the sample,
such as low-viscosity liquids, which are generally characterized by long $T_2$ values of the order of seconds, and
less mobile components such as elastomers, fats and adsorbed liquids with shorter $T_2$ values of the order of
tens of milliseconds [6]. Transverse relaxation contrast is, in fact, a natural consequence of the spin warp
imaging technique described in the previous section. As already seen, data are recorded in the form of a
spatially encoded spin echo. Only those nuclides in the sample with a $T_2$ of the order of, or greater than, the
echo time, $T_E = 2\tau$, contribute significantly to the echo signal. Consequently, the image reflects the
distribution of nuclides for which $T_2 > T_E$. Often, a crude distinction between two components in a sample,
one more mobile than the other, is made on the basis of a single $T_2$-weighted image in which the echo time is
chosen intermediate between their respective $T_2$ values. For quantitative $T_2$-mapping, images are recorded at a
variety of echo times and subsequently analysed by fitting single- or multi-modal relaxation decays to the
image intensity on a pixel by pixel basis. The fit parameters are then used to generate a true $T_2$-map in its own
right.
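
The pixel-by-pixel fitting step can be sketched as follows, under the simplifying assumptions of a single-exponential decay $S(T_E) = S_0 \exp(-T_E/T_2)$, noise-free positive intensities and a log-linear least-squares fit. The function names and the nested-list image layout are hypothetical; a multi-modal decay would need a non-linear fit instead.

```python
import math


def fit_t2(echo_times, intensities):
    """Fit ln S = ln S0 - TE/T2 by linear least squares; return (S0, T2)."""
    xs = list(echo_times)
    ys = [math.log(s) for s in intensities]  # requires strictly positive intensities
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -1.0 / slope


def t2_map(echo_times, image_stack):
    """image_stack[k][row][col] is the intensity at echo_times[k]; return a 2D list of T2."""
    rows = len(image_stack[0])
    cols = len(image_stack[0][0])
    return [
        [fit_t2(echo_times, [image[r][c] for image in image_stack])[1] for c in range(cols)]
        for r in range(rows)
    ]


# Synthetic 1 x 2 image: one pixel with T2 = 50 ms, one with T2 = 500 ms.
echo_times = [0.01, 0.02, 0.04, 0.08]
stack = [[[math.exp(-te / 0.05), 2.0 * math.exp(-te / 0.5)]] for te in echo_times]
print(t2_map(echo_times, stack))
```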

An example of the application of $T_2$-weighted imaging is afforded by the imaging of the dynamics of
chemical waves in the Belousov-Zhabotinsky reaction shown in figure B1.14.5 [16]. In these images, bright
bands correspond to an excess of Mn$^{3+}$ ions with a long $T_2$ and dark bands to an excess of Mn$^{2+}$ ions with a
short $T_2$.

Figure B1.14.5. $T_2$-weighted images of the propagation of chemical waves in an Mn catalysed
Belousov-Zhabotinsky reaction. The images were acquired in 40 s intervals (a) to (f) using a standard spin echo pulse
sequence. The slice thickness is 2 mm. The diameter of the imaged pill box is 39 mm. The bright bands
correspond to an excess of Mn$^{3+}$ ions with long $T_2$, dark bands to an excess of Mn$^{2+}$ with a short $T_2$. (From
[16].)

Another powerful contrast parameter is spin-lattice, or $T_1$, relaxation. Spin-lattice relaxation contrast can
again be used to differentiate different states of mobility within a sample. It can be encoded in several ways.
The simplest is via the repetition time, $T_R$, between the different measurements used to collect the image data.
If the repetition time is sufficiently long such that $T_R \gg T_1$ for all nuclei in the sample, then all nuclei will
recover to thermal equilibrium between measurements and will contribute equally to the image intensity.
However, if the repetition time is reduced, then those nuclei for which $T_R < T_1$ will not recover between
measurements and so will not contribute to the subsequent measurement. A steady state rapidly builds up in
which only those nuclei with $T_1 \ll T_R$ contribute in any significant manner. As with $T_2$-contrast, single
images recorded with a carefully selected $T_R$ may be used to select crudely a short $T_1$ component of a sample.

The mathematical description of the echo intensity as a function of $T_2$ and $T_1$ for a repeated spin-echo
measurement has been calculated on the basis that the signal before one measurement cycle is exactly that at
the end of the previous cycle. Under steady state conditions of repeated cycles, this must therefore equal the
signal at the end of the measurement cycle itself. For a spin-echo pulse sequence such as that depicted in
figure B1.14.1 the echo magnetization is given by [17]

$$M_{\text{echo}} = M_0 \, \frac{1 - 2\exp\left(-\frac{T_R - T_E/2}{T_1}\right) + \exp\left(-\frac{T_R}{T_1}\right)}{1 + \cos\theta \, \exp\left(-\frac{T_R}{T_1}\right)} \, \exp\left(-\frac{T_E}{T_2}\right) \sin\theta$$

where $M_0$ is the equilibrium magnetization and $\theta$ is the tip angle of the radio-frequency excitation pulse and
where it is assumed that there is total dephasing of the magnetization between cycles. Other expressions
applicable to other situations are to be found in the literature. In practice, some kind of relaxation-weighting
of image contrast is always present and can hardly be avoided.
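
The steady-state argument can be checked numerically. The sketch below is an illustration under stated assumptions, not the calculation of [17]: ideal pulses, an excitation pulse of tip angle θ followed by a 180° pulse at $T_E/2$, and complete dephasing of transverse magnetization between cycles. The closed form in `echo_steady_state` follows from these assumptions alone and need not match the literature expression term for term. All function and parameter names are hypothetical.

```python
import math


def echo_steady_state(m0, theta, t_r, t_e, t1, t2):
    """Closed-form steady-state echo amplitude under the assumptions in the lead-in."""
    e_r = math.exp(-t_r / t1)
    numerator = 1.0 - 2.0 * math.exp(-(t_r - t_e / 2.0) / t1) + e_r
    mz_before_pulse = m0 * numerator / (1.0 + math.cos(theta) * e_r)
    return mz_before_pulse * math.sin(theta) * math.exp(-t_e / t2)


def echo_by_iteration(m0, theta, t_r, t_e, t1, t2, cycles=200):
    """Repeat the measurement cycle until the longitudinal magnetization settles."""
    tau = t_e / 2.0
    mz = m0
    mz_before_pulse = m0
    for _ in range(cycles):
        mz_before_pulse = mz
        mz = mz_before_pulse * math.cos(theta)                 # excitation pulse
        mz = m0 + (mz - m0) * math.exp(-tau / t1)              # recovery until the 180° pulse
        mz = -mz                                               # 180° pulse inverts M_z
        mz = m0 + (mz - m0) * math.exp(-(t_r - tau) / t1)      # recovery to end of cycle
    return mz_before_pulse * math.sin(theta) * math.exp(-t_e / t2)


args = dict(m0=1.0, theta=math.radians(60.0), t_r=0.5, t_e=0.02, t1=1.0, t2=0.1)
print(echo_steady_state(**args), echo_by_iteration(**args))
```

The two values agree once the iteration has converged, which illustrates the condition that the signal before one cycle equals that at the end of the previous one.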

Other methods to encode $T_1$ contrast include saturation recovery [18, 19] and $T_1$ nulling techniques [20]. In
the latter, a 180° pulse is applied some time $T_I$ before the image data acquisition. This pulse inverts the
magnetization. In the interval $T_I$ the magnetization recovers according to $[1 - 2\exp(-t/T_1)]$ so that at the time
of image data excitation it is $[1 - 2\exp(-T_I/T_1)]$. Nuclei for which $T_I = 0.693\,T_1$ have zero magnetization at
this time and so do not contribute to the signal intensity. This method may be used to suppress a large
background component in an image, such as that due to bulk water. With saturation recovery, a train of
radio-frequency pulses is applied to the sample some time $T_{SR}$ prior to the data acquisition sequence. The train of
pulses, which are often applied with ever decreasing spacing between them, serves to saturate the equilibrium
magnetization. Only those nuclides for which $T_1 < T_{SR}$ recover to equilibrium prior to the image acquisition
proper and so only these nuclei contribute to the image intensity. Multiple images recorded as a function of
$T_{SR}$ may be used to build a true $T_1$-map by fitting a relaxation recovery curve to the data on a pixel by pixel
basis. An example of brine in a sandstone core is depicted in figure B1.14.6 [21]. Both $M_0$- and $T_1$-maps,
which were obtained from fitting with a stretched exponential, clearly show layers in the stone. The
spin-lattice relaxation presumably correlates with spatially varying pore sizes and surface relaxivity.
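
The two recovery laws used here can be written out as a short sketch. It assumes ideal inversion and ideal saturation, mono-exponential recovery, and uses hypothetical function names; the factor 0.693 in the nulling condition is ln 2.

```python
import math


def inversion_recovery(delay, t1, m0=1.0):
    """Longitudinal magnetization a time `delay` after an ideal 180° inversion pulse."""
    return m0 * (1.0 - 2.0 * math.exp(-delay / t1))


def null_delay(t1):
    """Delay after inversion at which the magnetization passes through zero (T1 * ln 2)."""
    return t1 * math.log(2.0)


def saturation_recovery(delay, t1, m0=1.0):
    """Longitudinal magnetization a time `delay` after ideal saturation."""
    return m0 * (1.0 - math.exp(-delay / t1))


# Null a component with an assumed T1 of 2 s; a 0.3 s component has almost fully recovered.
delay = null_delay(2.0)
print(delay, inversion_recovery(delay, 2.0), inversion_recovery(delay, 0.3))
```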

Figure B1.14.6. $T_1$-maps of a sandstone reservoir core which was soaked in brine. (a), (b) and (c), (d)
represent two different positions in the core. For $T_1$-contrast a saturation pulse train was applied before a
standard spin-echo imaging pulse sequence. A full $T_1$-relaxation recovery curve for each voxel was obtained
by incrementing the delay between pulse train and imaging sequence. $M_0$- ((a) and (c)) and $T_1$-maps ((b) and
(d)) were calculated from stretched exponentials which are fitted to the magnetization recovery curves. The
maps show the layered structure of the sample. Presumably $T_1$-relaxation varies spatially due to
inhomogeneous size distribution as well as surface relaxivity of the pores. (From [21].)

B1.14.3.2 CHEMICALLY RESOLVED IMAGING

In many instances, it is important that some form of chemical selectivity be applied in magnetic resonance 
imaging so as to distinguish nuclei in one or more specific molecular environment(s). There are many ways of 
doing this and we discuss here just three. The first option is to ensure that one of the excitation RF pulses is a 
narrow bandwidth, frequency selective pulse applied in the absence of any gradient [22]. Such a pulse can be 
made specific to one particular value of the chemical shift and thereby affects only nuclei with that chemical 
shift. In practice this can be a reasonable method for the specific selection of fat or oil or water in a mixed
hydrocarbon/water system.

A higher level of sophistication involves obtaining a full chemical shift spectrum within each image pixel. 
The chemical information can be encoded either before or after the image encoding. The important 
requirement is that the spin system is allowed to evolve without gradients and that data are recorded as a 
function of this chemical shift evolution time. The data can then be Fourier transformed with respect to the 
evolution time as well as the standard imaging variables so as to yield the spatially resolved chemical shift 
spectrum. In this respect chemical shift imaging is like a four-dimensional imaging experiment [23]. A
post-encoding sequence suitable for this purpose is shown in figure B1.14.7. The chemical shift encoding part (a
spin echo in the absence of gradients) comes after the image encoding part (a two-dimensional phase
encoding experiment). The 180° pulse refocuses all chemical-shift-induced dephasing occurring during the
spatial encoding.

Figure B1.14.7. Chemical shift imaging sequence [23]. Both x- and y-dimensions are phase encoded. Since
line-broadening due to acquiring the echo in the presence of a magnetic field gradient is avoided, chemical 
shift information is retained in the echo. 

The third alternative is a more robust, sensitive and specialized form of the first, in that only hydrogen nuclei
indirectly spin-spin coupled to $^{13}$C in a specific molecular configuration are imaged. In achieving selectivity,
the technique exploits the much wider chemical shift dispersion of $^{13}$C compared to $^{1}$H. The method involves
cyclic transfer from selected $^{1}$H nuclei to indirectly spin-spin coupled $^{13}$C nuclei and back according to the
sequence

$$^{1}\text{H} \rightarrow {}^{13}\text{C} \rightarrow {}^{1}\text{H}$$

Called CYCLCROP (cyclic cross polarization) [24], the method works by first exciting all $^{1}$H magnetization.
Cross polarization pulses are then applied at the specific Larmor frequencies of the $^{1}$H-$^{13}$C pair of interest so
as to transfer coherence from $^{1}$H to $^{13}$C. The transfer pulses must satisfy the Hartmann-Hahn condition

$$\gamma_H B_{1H} = \gamma_C B_{1C}$$

where $\gamma_H$ and $\gamma_C$ are the gyromagnetic ratios of the two nuclei and $B_{1H}$ and $B_{1C}$ are the excitation magnetic
field strengths, and must be applied for a time of the order of $1/J$,
where J is the spin-spin coupling constant. Following magnetization transfer, the $^{13}$C magnetization is stored
along the z-axis and the remaining $^{1}$H magnetization from other molecular groupings is destroyed by a series
of pulses and homospoil gradients. The stored magnetization specific to the coupled $^{1}$H-$^{13}$C pair is then
returned to the $^{1}$H by a second pair of cross polarization pulses. CYCLCROP chemical selective excitation
may replace the initial excitation in a standard
2DFT imaging experiment with the slice selection moved to the 180° pulse in order to yield a $^{13}$C-edited
image. A number of pulse schemes for the cross coupling are known, each with various advantages in terms
of low radio-frequency power deposition, tolerance of pulse artifact, breadth of the spectral bandwidth etc. In
figure B1.14.8, CYCLCROP has been used to map $^{13}$C-labelled sucrose in the stem of a castor bean seedling
[25]. Its arrival and accumulation are visualized in a series of subsequently acquired $^{13}$C-selective images.
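
For orientation, the Hartmann-Hahn matching condition $\gamma_H B_{1H} = \gamma_C B_{1C}$ and the transfer time of order $1/J$ can be put into numbers. The sketch below uses rounded gyromagnetic ratios and an assumed coupling constant of 140 Hz purely for illustration; neither the function names nor these values come from the text.

```python
# Rounded gyromagnetic ratios in rad s^-1 T^-1 (assumed values for illustration).
GAMMA_1H = 267.52e6
GAMMA_13C = 67.28e6


def matched_b1c(b1h, gamma_h=GAMMA_1H, gamma_c=GAMMA_13C):
    """13C RF field strength satisfying gamma_H * B1H = gamma_C * B1C."""
    return gamma_h * b1h / gamma_c


def transfer_time(j_hz):
    """Order-of-magnitude cross polarization time 1/J, in seconds."""
    return 1.0 / j_hz


b1h = 1.0e-4                 # 1H RF field strength in tesla (assumed)
b1c = matched_b1c(b1h)
print(f"B1C = {b1c:.3e} T, about {b1c / b1h:.2f} times B1H")
print(f"transfer time for J = 140 Hz (assumed): {transfer_time(140.0) * 1e3:.1f} ms")
```

Because the gyromagnetic ratio of $^{13}$C is roughly a quarter of that of $^{1}$H, the $^{13}$C channel needs about four times the RF field strength to match.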

Figure B1.14.8. Time course study of the arrival and accumulation of $^{13}$C-labelled sucrose in the stem of a
castor bean seedling. The labelled tracer was chemically selectively edited using CYCLCROP (cyclic cross
polarization). The first image in the upper left corner was taken before the incubation of the seedling with
enriched hexoses. The time given in each image represents the time elapsed between the start of the
incubation and the acquisition. The spectrum in the lower right corner of each image shows the total intensity
of $^{13}$C nuclei. At later times, enriched sucrose is visible in the periphery of the stem. Especially high
intensities are detected in the vascular bundles. The last image represents a micrograph of the stem structure
showing the position of the vascular bundles (dark features). (From [25].)

B1.14.4 FLOW AND DIFFUSION

NMR is an important technique for the study of flow and diffusion, since the measurement may be made 
highly sensitive to motion without in any way influencing the motion under study. In analogy to many non- 
NMR-methods, mass transport can be visualized by imaging the distribution of magnetic tracers as a function 
of time. Tracers may include paramagnetic contrast agents which, in particular, reduce the transverse
relaxation time of neighbouring nuclei and therefore appear as $T_2$-contrast in an image. The $^{13}$C cross
polarization method with enriched compounds may also be used as a tracer experiment. More sophisticated
tracer methods include so called 'tagging' experiments in which the excitation of nuclei is spatially selective.
The spatial evolution of the selected nuclei is followed. This example is discussed in section B1.14.4.5.


Generally, however, the application of tracer methods remains a rarity compared to methods which directly
exploit the motion sensitivity of the NMR signal. The detection of motion is based on the sensitivity of the
signal phase to translational movements of nuclei in the presence of magnetic field gradients [26, 27]. Using
the large magnetic field gradient in the stray field of superconducting magnets, displacements as small as 10
nm in slowly diffusing polymer melts can be detected. At the other extreme, velocities of the order of m s$^{-1}$,
such as occur in blood-filled arteries, can be measured.


B1.14.4.1 COHERENT AND STATIONARY FLOW


The displacement of a spin can be encoded in a manner very similar to that used for the phase encoding of
spatial information [28, 29 and 30]. Consider a spin j with position $\mathbf{r}_j(t)$ moving in a magnetic field gradient $\mathbf{G}$.
The accumulated phase, $\varphi_j$, of the spin at time t is given by

$$\varphi_j(t) = \gamma \int_0^t \mathbf{G}(t') \cdot \mathbf{r}_j(t') \, dt' \qquad \text{(B1.14.6)}$$

In order to encode displacement as opposed to average position, the gradient is applied in such a manner as to
ensure that $\int_0^t \mathbf{G}(t')\,dt' = 0$. Generally, this means applying gradient pulses in bipolar pairs or applying
unipolar gradient pairs either side of a 180° RF inversion pulse, with the advantage that the necessary careful
balancing of the gradient amplitudes is more straightforward.

We first examine how this works for the case of coherent flow. A typical pulse sequence is shown in figure
B1.14.9. This sequence creates a spin echo using two unipolar gradient pulses on either side of a 180° pulse.
The duration of each gradient pulse of strength $G_v$ is $\delta$. The centres of the gradient pulses are separated by $\Delta$.

Figure B1.14.9. Imaging pulse sequence including flow and/or diffusion encoding. Gradient pulses $G_D$
before and after the inversion pulse are supplemented in any of the spatial dimensions of the standard
spin-echo imaging sequence. Motion weighting is achieved by switching a strong gradient pulse pair $G_{yD}$ (see
solid black line). The steady-state distribution of flow (coherent motion) as well as diffusion (spatially
incoherent motion) in a sample is encoded by incrementing $G_{yD}$ (see dashed grey lines). The measured data
set then consists of two spatial and a motion-encoded dimension. Velocity and/or diffusion maps can be
rendered by three-dimensional Fourier transformation.

Under steady-state flow conditions (coherent motion), a Taylor series can be applied to describe the
time-dependent position of the fluid molecules:

$$\mathbf{r}(t) = \mathbf{r}_0 + \mathbf{v} t + \tfrac{1}{2}\mathbf{a} t^2 + \cdots$$

If terms of higher order than linear in t are neglected, the transverse magnetization evolves in the presence of
the first bipolar gradient pulse according to (equation (B1.14.2) and equation (B1.14.6)):

$$m(t) = m(0) \exp\left(-\mathrm{i}\gamma\, \mathbf{G}_v \cdot \mathbf{r}_0\, t - \tfrac{1}{2}\mathrm{i}\gamma\, \mathbf{G}_v \cdot \mathbf{v}\, t^2\right), \qquad 0 < t < \delta.$$

The phase of the transverse magnetization is inverted by the 180° pulse and the magnetization after the second
gradient pulse and therefore at the echo centre is:

$$m(T_E) = m(0) \exp\left(-\mathrm{i}\gamma \delta \Delta\, \mathbf{G}_v \cdot \mathbf{v}\right)$$

The echo phase does not depend on the initial position of the nuclei, only on their displacement, $\mathbf{v}\Delta$, occurring
in the interval between the gradient pulses. Analysis of the phase of the echo yields a measure of flow velocity
in a bulk sample. Spatial resolution is easily obtained by the incorporation of additional imaging gradients.
One way of doing this is illustrated in figure B1.14.9. The first part of the experiment is the same flow
encoding experiment as just discussed. The velocity-encoded echo is the excitation for the subsequent 2DFT
experiment which is as previously discussed. Where both stationary and moving spins are present, these
superimpose in the image. A variety of methods exist for separating the two, including cycling the phase of
the velocity encoding gradients or making measurements at two or more strengths of the velocity encoding
gradient. In the latter, a wave number $k_v = \gamma \delta \Delta G_v$ can be defined, adding an additional dimension to spatial
k-space. Fourier transformation of this dimension directly produces the velocity spectrum of each voxel. As an
example, velocity maps of flow through an extruder are shown in figure B1.14.10 [31].
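
The relation between echo phase and velocity can be sketched numerically. The example assumes rectangular gradient pulses of duration δ separated by Δ, a velocity component v along the gradient, the sign convention m = m(0) exp(-iφ) with φ = γδΔGv, and a reference echo acquired without velocity encoding. The function names, the rounded gyromagnetic ratio and the parameter values are hypothetical.

```python
import cmath
import math

# Rounded proton gyromagnetic ratio in rad s^-1 T^-1 (assumed value for illustration).
GAMMA_1H = 2.675e8


def flow_phase(velocity, gradient, delta, big_delta, gamma=GAMMA_1H):
    """Echo phase gamma * delta * Delta * G * v for constant velocity along the gradient."""
    return gamma * delta * big_delta * gradient * velocity


def velocity_from_echo(echo, reference, gradient, delta, big_delta, gamma=GAMMA_1H):
    """Velocity from the phase difference between encoded and reference echoes.

    Valid only while the phase stays within (-pi, pi]; larger phases alias.
    """
    phase = cmath.phase(echo * reference.conjugate())
    return -phase / (gamma * delta * big_delta * gradient)


# 1 mm/s flow, 0.05 T/m gradient pulses of 2 ms separated by 20 ms (assumed values).
params = dict(gradient=0.05, delta=2e-3, big_delta=20e-3)
phi = flow_phase(1e-3, **params)
echo = cmath.exp(-1j * phi)          # ideal, relaxation-free echo
print(phi, velocity_from_echo(echo, 1.0 + 0.0j, **params))
```

The largest velocity that can be assigned unambiguously from a single phase measurement corresponds to a phase of π, which is one reason for stepping the gradient strength and Fourier transforming along the velocity dimension instead.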

Figure B1.14.10. Flow through a KENICS mixer. (a) A schematic drawing of the KENICS mixer in which
the slices selected for the experiment are marked. The arrows indicate the flow direction. Maps of the
z-component of the velocity at position 1 and position 2 are displayed in (b) and (c), respectively. (d) and (e)
Maps of the x- and the y-velocity component at position 1. The FOV (field of view) is 10 mm. (From [31].)

B1.14.4.2 PULSATING MOTION


Flow which fluctuates with time, such as pulsating flow in arteries, is more difficult to quantify
experimentally than steady-state motion because phase encoding of spatial coordinate(s) and/or velocity requires the
acquisition of a series of transients. A different velocity is then detected in each transient. Hence the
phase twist caused by the motion in the presence of magnetic field gradients varies from transient to transient.
However, if the motion is periodic, e.g., $v(\mathbf{r},t) = v_0 \sin(\omega_v t + \varphi_0)$ with a spatially varying amplitude $v_0 = v_0(\mathbf{r})$, a
pulsation frequency $\omega_v = \omega_v(\mathbf{r})$ and an arbitrary phase $\varphi_0$, the phase modulation of the acquired data set is
described as follows:

$$\int\!\!\int\!\!\int_{-\infty}^{\infty} dx\, dy\, dv_0 \; m(x, y, v_0) \, \exp\left\{-\mathrm{i}\left[k_x x + k_y y + k_v v_0 \sin(\omega_v t_p + \varphi_0)\right]\right\}$$

where $t_p = l_p T_R$ is the time at which the $l_p$th transient is acquired. Since $k_y$ and $k_v$ are 'incremented' for phase
encoding in subsequent transients, they are linked to the phase twist caused by v(t) via the parameter $l_p$. The
reconstruction of the dimensions y and v by Fourier transformation is therefore affected, making an
interpretation difficult.


Nevertheless, averaging provides information about the motional parameters such as velocity amplitude and
pulsation frequency. If the repetition time, the delay between subsequent transients, is not equal to a multiple
of the pulsation period, the motion appears to be temporally uncorrelated. In this case, a temporal average over
all velocity values is obtained by accumulating a sufficient number of transients. This causes a broad phase
distribution resulting in signal attenuation similar to the one caused by diffusion, a spatially incoherent
phenomenon. The images provide quantitative information about the distribution of motion and velocity
amplitude. The temporal characteristics of the pulsation are detected by omitting phase encoding. Using a
gradient pulse pair of constant magnitude for motion weighting, the signal phase is then solely a function of
the velocity. The pulsation frequency can then be determined from the intensity modulation due to the motion
in a series of transients.

Figure B1.14.11 shows the application of averaging techniques for the characterization of pulsating motions
which start on about the fourth incubation day in quail eggs [32]. Dark areas in the incoherent
motion-weighted images in figure B1.14.11 represent strong motion, white no uncorrelated motion. They are
localized at the suspected position of the embryo above the egg yolk, which is the black region in the middle
of the egg. The signal attenuation strongly depends on the probed velocity component, indicating the
anisotropic nature of the motion. In figure B1.14.12(a) and (c) a time series of profiles through the region of
motion with spatially incoherent motion weighting is shown, acquired before and after the start of pulsations. At
the later incubation stage the modulation of the signal intensity due to temporally periodic motion is clearly
visible. Fourier transformation (figure B1.14.12(d)) reveals a pulsation frequency of about 0.4 Hz.
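
The frequency analysis of such an intensity time series can be sketched with a plain discrete Fourier transform. The example assumes equally spaced single-scan profiles, subtracts the mean so that the constant baseline (the 0 Hz line) does not mask the modulation, and uses a synthetic series with a 0.4 Hz modulation sampled every 0.25 s. The sampling interval, modulation depth and function name are hypothetical.

```python
import cmath
import math


def dominant_frequency(intensities, sampling_interval):
    """Return the frequency (Hz) of the strongest non-zero DFT component."""
    n = len(intensities)
    mean = sum(intensities) / n
    centred = [s - mean for s in intensities]      # remove the constant baseline offset
    best_index, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):                 # positive frequencies up to Nyquist
        coeff = sum(
            value * cmath.exp(-2j * math.pi * k * i / n)
            for i, value in enumerate(centred)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_index, best_power = k, power
    return best_index / (n * sampling_interval)


# Synthetic integral intensities: baseline 1.0 with a 20 % modulation at 0.4 Hz.
interval = 0.25
series = [1.0 + 0.2 * math.sin(2.0 * math.pi * 0.4 * i * interval) for i in range(100)]
print(dominant_frequency(series, interval))
```

The frequency resolution is the inverse of the total observation time, and the sampling interval must be shorter than half the pulsation period for the modulation to be resolved without aliasing.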






Figure B1.14.11. Amplitude-weighted images of (temporally) uncorrelated motions in a quail egg at an
incubation period of about 140 h. A standard gradient echo sequence supplemented with strong bipolar
gradient pulses for motion weighting was used. A high number of transients (K = 490) was acquired for each
phase-encoding step to adequately average out temporal fluctuation of the motion. The intensity in the images
shown corresponds to the signal ratio with and without motion weighting. Light grey shades hence represent
no signal attenuation, darker shades strong signal attenuation due to uncorrelated motion. Pixels with signal
below the noise level are set black, as is the case in the egg yolk (black region in the middle of the egg) due to
comparatively short $T_2$. The white double arrows indicate the probed velocity component. Both images show
signal attenuation due to strong motion in the region above the egg yolk where the embryo presumably is
located. Furthermore, the attenuation of the signal appears to be much stronger for the y-velocity component
than for the x-component, indicating strongly anisotropic motion. The white bar represents 2 mm. (From [32].)

Figure B1.14.12. Study of the temporal fluctuation of motion in a quail egg at the incubation times 119 h and
167 h. Spatial phase encoding of the y-dimension was omitted to increase the rapidity of the imaging
experiment. Profiles were obtained by 1D Fourier transformation of echoes which were acquired in the
presence of a readout gradient for frequency encoding of the x-coordinate. A strong gradient pulse pair for
(spatially) incoherent motion weighting was applied during the evolution period of the magnetization. A series
of subsequent single-scan profiles were measured at the two different incubation times 119 h (a) and 167 h
(c). Temporal fluctuations of the signal intensity which are not visible in (a) reveal themselves at the later
incubation stage. The intensity modulation which was caused by temporally fluctuating motion was analysed
by Fourier transformation. The spectra which were calculated from the integral intensities of the pixels
between the two dashed lines in (a) and (c) are displayed in (b) and (d) for the two incubation times. The line
at 0 Hz is due to the constant baseline offset. The arrows in (d) mark the peak representing a frequency of 0.4
Hz of the pulsating motion. (From [32].)

B1.14.4.3 DIFFUSION AND PSEUDO-DIFFUSION

If magnitude and/or direction of velocity vary on a length scale below spatial resolution, the detected motion 
is incoherent. (Self-) diffusion certainly falls into this category, but randomly oriented flow is also spatially 
incoherent. A well known example is the blood flow through brain capillaries, which are smaller than a voxel 
[33]. Incoherent flow is often referred to as pseudo-diffusion. An apparent diffusion coefficient which can be 
significantly bigger than the self-diffusion coefficient is then defined. Pulse sequences to measure coherent 
flow (figure B1.14.9) can also be used for (spatially) incoherent motion although the theory has to be
reconsidered at this point [34, 35 and 36].

The observed magnetization in the echo is the superposition of all the spins j at different positions $\mathbf{r}_j$:

$$m(T_E) = \sum_j m_j(0) \, \mathrm{e}^{-\mathrm{i}\varphi_j}$$

It is expanded into:

$$m(T_E) = m(0) \sum_j P(\varphi_j) \, \mathrm{e}^{-\mathrm{i}\varphi_j}$$

where $P(\varphi_j)$ is the probability of phase $\varphi_j$. The underlying motional process influences this phase distribution.
To demonstrate the principle, the simple case of normal isotropic diffusion will be discussed [27]. The
solution of Fick's diffusion equation together with the central limit theorem implies that, for a constant
gradient, a Gaussian phase distribution function with a mean squared phase twist $\langle\varphi^2(t)\rangle$ applies at any instant,
leading to the transverse magnetization

$$m(t) = m(0) \, \mathrm{e}^{-\langle\varphi^2(t)\rangle/2}. \qquad \text{(B1.14.7)}$$


The relationship between mean squared phase shift and mean squared displacement can be modelled in a
simple way as follows. The motion is mediated by small, random jumps in position occurring with a mean
interval $\tau_j$. If the jump size in the gradient direction is $\varepsilon$, then after n jumps at time $t = n\tau_j$, the displacement of
a spin is

$$E(n\tau_j) = \varepsilon \sum_{i=1}^{n} a_i$$

where $a_i$ is randomly either +1 or -1. Hence, from the relation $\varphi = \gamma G_D \delta E(n\tau_j)$, the mean squared phase shift
imposed by the diffusion measurement gradients $G_D$ of duration $\delta$ is

$$\langle\varphi^2\rangle = \gamma^2 \delta^2 G_D^2 \varepsilon^2 \left\langle \left( \sum_{i=1}^{n} a_i \right)^2 \right\rangle.$$

The summation averages to n. Using the definition of the diffusion coefficient, $D = \varepsilon^2/(2\tau_j)$, and the diffusion
time, $\Delta = n\tau_j$, equation (B1.14.7) gives

$$m(t) = m(0) \exp\left(-\gamma^2 \delta^2 G_D^2 D \Delta\right).$$

This is the factor by which the echo magnetization is attenuated as a result of diffusion. More elaborate
calculations, which account for phase displacements due to diffusion occurring during the application of the
gradient pulses, yield

$$m(t) = m(0) \exp\left[-\gamma^2 \delta^2 G_D^2 D (\Delta - \delta/3)\right].$$

This expression can be used for pulsed field gradient spin-echo experiments and also for spin-echo
experiments in which the gradient is applied continuously.
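
The attenuation expression lends itself to a simple way of extracting D from echo amplitudes measured at several gradient strengths. The sketch below assumes rectangular gradient pulses, noise-free data normalized to the echo without diffusion weighting, and a least-squares line through the origin of ln(m/m(0)) against $b = \gamma^2\delta^2 G_D^2(\Delta - \delta/3)$. The function names, the symbol b and the parameter values are hypothetical.

```python
import math

# Rounded proton gyromagnetic ratio in rad s^-1 T^-1 (assumed value for illustration).
GAMMA_1H = 2.675e8


def b_factor(gradient, delta, big_delta, gamma=GAMMA_1H):
    """Diffusion weighting gamma^2 * delta^2 * G^2 * (Delta - delta/3), in s m^-2."""
    return (gamma * delta * gradient) ** 2 * (big_delta - delta / 3.0)


def echo_attenuation(gradient, delta, big_delta, diffusion, gamma=GAMMA_1H):
    """Ratio m(t)/m(0) for free isotropic diffusion."""
    return math.exp(-b_factor(gradient, delta, big_delta, gamma) * diffusion)


def fit_diffusion(gradients, attenuations, delta, big_delta, gamma=GAMMA_1H):
    """Least-squares slope through the origin of ln(attenuation) against b; returns D."""
    bs = [b_factor(g, delta, big_delta, gamma) for g in gradients]
    logs = [math.log(e) for e in attenuations]
    return -sum(b * y for b, y in zip(bs, logs)) / sum(b * b for b in bs)


# Synthetic data for an assumed D = 2e-9 m^2/s, delta = 2 ms, Delta = 20 ms.
gradients = [0.05, 0.1, 0.2, 0.3]
data = [echo_attenuation(g, 2e-3, 20e-3, 2.0e-9) for g in gradients]
print(fit_diffusion(gradients, data, 2e-3, 20e-3))
```

Applying the same fit to the echo amplitudes of every pixel gives a diffusion map in the sense used in this section.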


A measure of the echo attenuation within each pixel of an image created using the pulse sequence of figure
B1.14.9, perhaps by repeating the experiment with different values of $G_D$ and/or $\delta$, gives data from which a
true diffusion map can be constructed [37, 38].

In principle, it is possible to measure both flow and diffusion in a single experiment: the echo is both phase 
shifted and attenuated. It is also possible to account for the presence of background gradients arising from the 
sample itself which can be significant. The exact form of the echo in these circumstances has been calculated 
and the results are to be found in the literature [39, 40, 41 and 42]. Furthermore, in systems in which $T_2$ is
relatively short compared to $T_1$, the stimulated echo comprising three 90° pulses can be used instead of a
90°-$\tau$-180° spin-echo sequence [43]. In this case, the fact that the velocity encoding time, $\Delta$, can be made
significantly longer outweighs the fact that only half of the magnetization is refocused after the third pulse.

B1.14.4.4 RESTRICTED DIFFUSION PULSED FIELD GRADIENT MICROSCOPY

Diffusometry and spatially resolved magnetic resonance are usefully combined in an alternate technique to
imaging which is increasingly coming to the fore. It has been dubbed both q-space microscopy [4] and
restricted diffusion pulsed field gradient (PFG) microscopy. The method probes the microstructure of a
sample on the micrometre scale (significantly smaller than conventional MRI permits) by measuring the
effects of restricted diffusion of a translationally mobile species. The technique yields parameters
characteristic of the average structure of the whole sample (the experiment done without spatial resolution) or
of the sample within an image pixel (sub-millimetre scale) if the experiment is done in combination with
conventional imaging. The method works because of a Fourier relationship which exists between the observed
echo attenuation in a pulsed field gradient diffusometry experiment and the propagator, which describes the
molecular motion, $P(\mathbf{r}; \mathbf{r}_0, t)$. $P(\mathbf{r}; \mathbf{r}_0, t)$ is the conditional probability of finding a diffusing spin at location $\mathbf{r}$ at
time t given its initial location, $\mathbf{r}_0$. This propagator depends intimately on the microstructure of the sample.
Following on from the above analysis, the echo attenuation is

$$E(\mathbf{G}_D, \Delta) = \int\!\!\int \rho(\mathbf{r}_0) \, P(\mathbf{r}; \mathbf{r}_0, \Delta) \, \exp\left[\mathrm{i}\gamma\delta \, \mathbf{G}_D \cdot (\mathbf{r} - \mathbf{r}_0)\right] d\mathbf{r} \, d\mathbf{r}_0$$

where $\rho(\mathbf{r})$ is the spin density defined by the microstructure. In the long timescale limit, the diffusing spins
sample the whole of the available space and $P(\mathbf{r}; \mathbf{r}_0, t)$ becomes independent of $\mathbf{r}_0$ such that

$$P(\mathbf{r}; \mathbf{r}_0, \infty) = \rho(\mathbf{r})$$

then

$$E(\mathbf{G}_D, \infty) = \left| \int \rho(\mathbf{r}) \exp\left[\mathrm{i}\gamma\delta \, (\mathbf{G}_D \cdot \mathbf{r})\right] d\mathbf{r} \right|^2 = |S(\mathbf{q})|^2$$

where $\mathbf{q} = (2\pi)^{-1}\gamma\delta\mathbf{G}_D$ and $S(\mathbf{q})$ is a structure factor, defined by the above expression with direct analogies in
optics and neutron scattering. Measurement of echo attenuation and hence $S(\mathbf{q})$ and calculations of
microstructure have been reported for both model and real systems including porous media and emulsions.

As an example, figure B1.14.13 shows the droplet size distribution of oil drops in the cream layer of a
decane-in-water emulsion as determined by PFG [45]. Each curve represents the distribution at a different height in
the cream with large drops at the top of the cream. The inset shows the PFG echo decay trains as a function of
height and the curves to which the data were fitted using a Stokes-velocity-based model of the creaming
process.

Figure B1.14.13. Derivation of the droplet size distribution in a cream layer of a decane/water emulsion from
PGSE data. The inset shows the signal attenuation as a function of the gradient strength for diffusion 
weighting recorded at each position (top trace = bottom of cream). A Stokes-based velocity model (solid 
lines) was fitted to the experimental data (solid circles). The curious horizontal trace in the centre of the plot is 
due to partial volume filling at the water/cream interface. The droplet size distribution of the emulsion was 
calculated as a function of height from these NMR data. The most intense narrowest distribution occurs at the 
base of the cream and the curves proceed logically up through the cream in steps of 0.041 cm. It is concluded 
from these data that the biggest droplets are found at the top and the smallest at the bottom of the cream. 
(From [45].)

B1.14.4.5 PLANAR TAGGING


A more qualitative means of visualising flows is by 'multi-plane tagging' [46, 47]. The principle of tagging
experiments is to excite magnetic resonance only in stripes across the image plane and to observe the spatial
evolution of these stripes with time (figure B1.14.14). The excitation can be achieved using a comb of
radiofrequency pulses, each of small flip angle, applied to a sample in a magnetic field gradient (figure
B1.14.15(a)) [48]. The Fourier transform of this excitation shows distinct maxima occurring at frequency
intervals given by the reciprocal of the pulse spacing. The overall excitation bandwidth, which must be
sufficient to cover the frequency spread of all nuclei in the sample, is determined by the individual pulse
widths, and the sharpness of the maxima by the number of pulses. This frequency response is illustrated in
figure B1.14.15(b), from which it is clear how the excitation is achieved. The action of the comb can be
understood more qualitatively as follows. Each pulse tips all spins by a small angle. Between the pulses the
spins precess by an amount dependent on their position in the gradient. Only at positions where the
precessional phase accumulated between pulses is equal to $2n\pi$ do the pulses have a cumulative effect in
tipping the magnetization through a large angle.
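
The cumulative tipping described above can be illustrated with a small classical rotation model. The sketch
below is an illustration under stated assumptions only: relaxation and finite pulse width are ignored, each
pulse is treated as an ideal rotation about x, and the function names, the pulse count (30) and the flip angle
per pulse (3 degrees) are chosen here rather than taken from [46, 47, 48].

```python
import math


def rotate_x(m, angle):
    """Rotate the vector m = (mx, my, mz) about the x axis (an ideal RF pulse)."""
    mx, my, mz = m
    c, s = math.cos(angle), math.sin(angle)
    return (mx, c * my - s * mz, s * my + c * mz)


def rotate_z(m, angle):
    """Rotate m about the z axis (free precession in the field gradient)."""
    mx, my, mz = m
    c, s = math.cos(angle), math.sin(angle)
    return (c * mx - s * my, s * mx + c * my, mz)


def dante_mz(n_pulses, flip_per_pulse, phase_between_pulses):
    """Longitudinal magnetization left after a comb of small-angle pulses."""
    m = (0.0, 0.0, 1.0)  # start at equilibrium along z
    for _ in range(n_pulses):
        m = rotate_x(m, flip_per_pulse)
        m = rotate_z(m, phase_between_pulses)
    return m[2]


n, alpha = 30, math.radians(3.0)  # assumed comb: 30 pulses of 3 degrees
for phase_deg in (0, 45, 90, 180, 360):
    mz = dante_mz(n, alpha, math.radians(phase_deg))
    print(f"phase between pulses = {phase_deg:3d} deg   Mz = {mz:+.3f}")
```

Only the positions whose inter-pulse phase is a multiple of $2\pi$ accumulate the full 90 degree rotation in
this model; elsewhere the magnetization stays close to equilibrium, which is what produces the stripes.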
Figure B1.14.14. Pulse sequence for multi-plane tagging. A magnetization saturation grid is prepared in the
multi-plane tagging section. After a certain time of flight $T$, this grid is imaged using a standard imaging
pulse sequence. The motion of tagged spins is visualized by displacements of the grid lines.
Figure B1.14.15. Preparation of a magnetization grid by means of DANTE pulse trains. The pulse sequence
for one-dimensional tagging is shown in (a). As depicted in (b) the spectrum of a pulse train consists of a
comb of peaks. To tag spins in certain regions of the sample the DANTE pulse train is applied while a
magnetic field gradient $G_x$ is switched on. The magnetization is then saturated in planes which are located at
equidistant positions corresponding to the spectral peaks of the DANTE pulse train and which are orthogonal
to the spatial coordinate $x=\omega/(\gamma G_x)$. A magnetization saturation grid is prepared by subsequently using the
sequence (a) with magnetic field gradients in the two directions of the imaging plane coordinates. Saturated
spins (bearing transverse magnetization) are visualized as black grid lines in an image. These lines of
thickness $\Delta x=2\pi\nu_{1/2}/(\gamma G_x)$ are separated by $d=2\pi/(\gamma G_x\tau)$ where $\tau$ is the delay from one RF pulse to the next.


Following excitation of this kind, a standard spin warp imaging protocol can be used to create the image. By
varying the delay between excitation and image acquisition, flow is visualized by the varying degree of
distortion of the saturated magnetization grid. An example of a rotating cylinder filled with a water-oil
mixture is shown in figure B1.14.16 [49].
Figure B1.14.16. Multi-plane tagging experiment for a 1:1 water (bottom) and oil (top) mixture in the
cylinder, which rotates clockwise. The rotation rate was 0.5 rev s$^{-1}$, and the tagging delay times $T$ are (a) 1
ms, (b) 25 ms, (c) 50 ms and (d) 100 ms. The interface between the fluids is clearly shown. The misalignment
in the horizontal direction at the interface is caused by chemical shift between water and oil. Flow was mainly
detected in a thin layer near the cylinder and a layer along the water-oil interface in the centre that flows to
the right. (From [49].)

B1.15 EPR methods

Stefan Weber 


B1.15.1 INTRODUCTION 

Systems containing unpaired electron spins, such as free radicals, biradicals, triplet states, most transition 
metal and rare-earth ions and some point defects in solids form the playground for electron paramagnetic 
resonance, EPR, also called electron spin resonance, ESR, or electron magnetic resonance, EMR. The 
fundamentals of EPR spectroscopy are very similar to the more familiar nuclear magnetic resonance (NMR) 
technique. Both deal with interactions of electromagnetic radiation with magnetic moments, which in the case 
of EPR arise from electrons rather than nuclei. With few exceptions, unpaired electrons lead to a
non-vanishing spin of a particle that can be used as a spectroscopic probe. In EPR spectroscopy such molecules are
studied by observing the magnetic fields at which they come into resonance with monochromatic 
electromagnetic radiation. Since species with unpaired electron spins are relatively rare compared to the 
multitude of species with magnetic nuclei, EPR is less widely applicable than NMR or even optical 
spectroscopy which has clear advantages with its ability to detect diamagnetic as well as paramagnetic states. 
What appears to be a drawback, however, can turn into an invaluable advantage, for instance, when 
selectively studying paramagnetic ions or molecules buried in a large protein environment. With its inherent 
specificity for those reactants, intermediates or products that carry unpaired electron spins, together with its 
high spectral resolution, EPR has excelled over many other techniques in, for example, unravelling the 
primary events of photosynthesis. Similarly, many key intermediates in this process have been identified by
EPR. By appending a paramagnetic fragment (a so-called 'spin-label') to a molecule of biological
importance, one in effect acquires a probe that supplies data on the interactions and dynamics of biological
molecules. Very many systems of biomedical interest have had their structure and function elucidated by
application of modern EPR techniques. EPR has also allowed chemists to probe the details of reaction
mechanisms by using the technique of spin trapping to identify reactive radical intermediates. As one last
example of the many successes of EPR, the identification of paramagnetic species in insulators and
semiconductors is worth mentioning.

More than 50 years after its invention by the Russian physicist Zavoisky (for a review of the EPR history see 
[1]), advanced EPR techniques presently applied in the above mentioned areas of physics, chemistry and 
biology include time-resolved continuous wave (CW) and pulsed EPR (Fourier transform (FT) EPR and 
electron-spin echo (ESE) detected EPR) at various microwave (MW) frequencies and multiple-resonance EPR 
methods such as electron-nuclear double resonance (ENDOR) and electron-nuclear-nuclear TRIPLE 
resonance in the case of electron and nuclear transitions and electron-electron double resonance (ELDOR) in 
the case of different electron spin transitions. High-field/high-frequency EPR and ENDOR have left the 
developmental stage, and a wide range of significant applications continues to emerge. The range of
multi-frequency EPR spectroscopy now extends from radiofrequencies (RFs) in near-zero fields up to several
hundred gigahertz in superconducting magnets or Bitter magnets. 

In this article only the most important and frequently applied EPR methods will be introduced. For more 
extensive treatments of CW and pulsed EPR the reader is referred to some excellent review articles that will 
be specified in the respective sections of this article. A good starting point for further reading is provided by a 
number of outstanding textbooks which have been written on the various aspects of EPR in general [2, 3, 4, 5, 
6, 7 and 8]. Interested readers might also appreciate the numerous essays on various magnetic resonance
topics that are published on a bimonthly basis in the educational journal 'Concepts in Magnetic Resonance'.


B1.15.2 EPR BACKGROUND 

B1.15.2.1 SPINS AND MAGNETIC MOMENTS

Historically, the recognition of electron spin can be traced back to the famous Stern-Gerlach experiment in 
the early 1920s. Stern and Gerlach observed that a beam of silver atoms was split into two components 
deflected in different directions when passing through an inhomogeneous magnetic field. The observation 
could only be explained with the concept of a half-integral angular momentum ascribed to an intrinsic spin of
the electron. EPR spectroscopy relies on the behaviour of the electron angular momentum and its associated 
magnetic moment in an applied magnetic field. 

If the angular momentum of a free electron is represented by a spin vector $\mathbf{S}=(S_x,S_y,S_z)$, the magnetic moment
$\boldsymbol{\mu}_S$ is related to $\mathbf{S}$ by

\[ \boldsymbol{\mu}_S = -g_e\beta_e\mathbf{S} \]    (B1.15.1)

where $g_e$ is a dimensionless number called the electron g-factor and $\beta_e = |e|\hbar/(2m_e) = 9.2740154\times10^{-24}$ J T$^{-1}$ is
the Bohr magneton; $e$ is the electronic charge, $\hbar = h/(2\pi)$ is the reduced Planck constant and $m_e$ is the electron
mass. The negative sign in equation (B1.15.1) indicates that, because of the negative charge of the electron, the
magnetic moment vector is antiparallel to the spin (since $g_e>0$). In the quantum theory $\boldsymbol{\mu}_S$ and $\mathbf{S}$ are treated as
(vector) operators. Suppose that the angular momentum operator $\mathbf{S}$ is defined in units of $\hbar$, then $\mathbf{S}^2$ has the
eigenvalues $S(S+1)$, where $S$ is either integer or half integer. The magnitude of the angular momentum itself is
given by the square root of the eigenvalue of $\mathbf{S}^2$, which is $\sqrt{S(S+1)}$. Any component of $\mathbf{S}$ (for example $S_z$)
commutes with $\mathbf{S}^2$, so that simultaneously eigenvalues of both $\mathbf{S}^2$ and $S_z$ may be specified, which are $S(S+1)$ and
$M_S$, respectively. $M_S$ has $(2S+1)$ allowed values running in integral steps from $-S$ to $+S$.


Classically, the interaction energy of a magnetic moment $\boldsymbol{\mu}_S$ in an applied magnetic field $\mathbf{B}$ is

\[ E = -\boldsymbol{\mu}_S\cdot\mathbf{B}. \]    (B1.15.2)

For a quantum mechanical system $\boldsymbol{\mu}_S$ is replaced by the appropriate operator, equation (B1.15.1), to obtain the
Hamiltonian for a free electron in a magnetic field,

\[ \mathcal{H} = g_e\beta_e\,\mathbf{B}\cdot\mathbf{S}. \]    (B1.15.3)

If the magnetic field is $B_0$ in the z-direction, $\mathbf{B} = (0,0,B_0)$, the scalar product simplifies and the Hamiltonian
becomes

\[ \mathcal{H} = g_e\beta_eB_0S_z. \]    (B1.15.4)


The eigenvalues of this Hamiltonian are simple, being only multiples $g_e\beta_eB_0$ of the eigenvalues of $S_z$.
Therefore, the allowed energies are $E_{M_S} = g_e\beta_eM_SB_0$. For a simple system of one unpaired electron, $S=\frac12$ and
$M_S=\pm\frac12$, which results in two energy states which are degenerate in zero field and whose energy separation
increases linearly with $B_0$. This is summarized in figure B1.15.1 where the two states are also labelled with
their eigenfunctions $|+\frac12\rangle=|\alpha\rangle$ and $|-\frac12\rangle=|\beta\rangle$ to indicate the $M_S=+\frac12$ and $M_S=-\frac12$ eigenstates for $S=\frac12$,
respectively. The lowest state has $M_S=-\frac12$ (since $g_e>0$), so that the projection of $\mathbf{S}$ along the z-axis, $S_z$, is
antiparallel to the field, but in accordance with physical expectation the z-component of the magnetic moment
is parallel to the field (see equation (B1.15.1)). The splitting of the electron spin energy levels by a magnetic
field is referred to as the Zeeman effect.

Figure B1.15.1. Energy levels for an electron ($S=\frac12$) as a function of the applied magnetic field $B_0$.
$E_{1/2}=+\frac12g_e\beta_eB_0$ and $E_{-1/2}=-\frac12g_e\beta_eB_0$ represent the energies of the $M_S=+\frac12$ and $M_S=-\frac12$ states, respectively.

Under the influence of the external magnetic field $\mathbf{B}$, the spin $\mathbf{S}$ and the magnetic moment $\boldsymbol{\mu}_S$ perform a
precessional motion about the axis pointing along the direction of $\mathbf{B}$. In the absence of additional magnetic
fields, the angle between $\mathbf{S}$ and $\mathbf{B}$ does not change and the motion of the spin about the $\mathbf{B}$ axis may be
illustrated as cones, as shown on the right-hand side of figure B1.15.1 for the two possible orientations of $\mathbf{S}$.
The z-component of the magnetic moment is sharply defined, while the x- and y-components are not, since
they oscillate in the xy-plane. The precession frequency of $\boldsymbol{\mu}_S$ and $\mathbf{S}$ about $\mathbf{B}$ is called the Larmor frequency,

\[ \omega_0 = g_e\beta_eB_0/\hbar. \]


The transition between the two eigenstates can be induced by the application of microwave (MW) radiation
with frequency $\omega$ and its magnetic field vector linearly polarized in the xy-plane. The oscillating $B_1$-field of
this radiation can be formally decomposed into two counter-rotating circularly polarized components, one of
which is rotating in the same direction as the Larmor precession of the spins. The other component of the MW
field does not interact significantly with the electron spins and can be neglected. If $\omega$ is different from $\omega_0$ the
precessing magnetic moment will not seriously be affected by $B_1$, for its component in the xy-plane will pass
in and out of phase with $B_1$ and there will be no resultant interaction. Transitions occur only near the
resonance condition of

\[ \omega = \omega_0 = g_e\beta_eB_0/\hbar. \]    (B1.15.5)

On exact resonance, $\boldsymbol{\mu}_S$ and $B_1$ can remain in phase so the precessing magnetic moment experiences a
constant field $B_1$ in the xy-plane (see figure B1.15.1). It will respond to this by precessing about it with
frequency $\omega_1 = g_e\beta_eB_1/\hbar$.

In the language of quantum mechanics, the time-dependent $B_1$-field provides a perturbation with a
nonvanishing matrix element joining the stationary states $|\alpha\rangle$ and $|\beta\rangle$. If the rotating field is written in terms of
an amplitude $B_1$, a perturbing term in the Hamiltonian is obtained

\[ \mathcal{H}' = g_e\beta_eB_1\,(S_x\cos(\omega t) + S_y\sin(\omega t)). \]    (B1.15.6)

The operators $S_x$ and $S_y$ have matrix elements between states $|\alpha\rangle$ and $|\beta\rangle$ or, in general, between states $|M_S\rangle$
and $|M_S\pm1\rangle$ and consequently induce transitions between levels adjacent in energy.

If $B_1\ll B_0$, first-order perturbation theory can be employed to calculate the transition rate for EPR (at
resonance)

\[ W_{\beta\leftrightarrow\alpha} = \frac{\pi}{2}\,\omega_1^2\,\rho(\omega). \]    (B1.15.7)


In equation (B1.15.7) $\rho(\omega)$ is the frequency distribution of the MW radiation. This result, obtained with
explicit evaluation of the transition matrix elements occurring for simple EPR, is just a special case of a much
more general result, Fermi's golden rule, which is the basis for the calculation of transition rates in general:

\[ W_{M_S\leftrightarrow M_S+1} = 2\pi\,|\langle M_S|\mathcal{H}'|M_S+1\rangle|^2\,\rho(\omega). \]    (B1.15.8)

Using the selection rule for allowed transitions the relative intensity for the transition from the state $|M_S\rangle$ to
$|M_S+1\rangle$ is given by

\[ W_{M_S\leftrightarrow M_S+1} \propto (S(S+1) - M_S(M_S+1)). \]    (B1.15.9)


The transition between levels coupled by the oscillating magnetic field $B_1$ corresponds to the absorption of the
energy required to reorient the electron magnetic moment in a magnetic field. EPR measurements are a study
of the transitions between electronic Zeeman levels with $\Delta M_S=\pm1$ (the selection rule for EPR).
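
The counting of $M_S$ levels and the relative intensities of equation (B1.15.9) are easy to tabulate. The
following is a minimal sketch using exact fractions; the function names `ms_values` and
`relative_intensity` are hypothetical helpers introduced here, not part of any EPR software.

```python
from fractions import Fraction


def ms_values(spin):
    """Allowed M_S values, from -S to +S in integral steps (2S + 1 of them)."""
    s = Fraction(spin)
    return [-s + k for k in range(int(2 * s) + 1)]


def relative_intensity(spin, ms):
    """Relative intensity of the M_S <-> M_S + 1 transition, equation (B1.15.9)."""
    s, m = Fraction(spin), Fraction(ms)
    return s * (s + 1) - m * (m + 1)


for spin in (Fraction(1, 2), Fraction(1), Fraction(5, 2)):
    levels = ms_values(spin)
    # One allowed transition between each pair of adjacent levels.
    weights = [relative_intensity(spin, m) for m in levels[:-1]]
    print(f"S = {spin}: M_S = {[str(m) for m in levels]}, "
          f"relative intensities = {[str(w) for w in weights]}")
```

For $S=\frac52$ the five allowed transitions come out in the ratio 5:8:9:8:5, so the central transition is the
strongest.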

The g-factor for a free electron, $g_e$ = 2.002 319 304 386(20), is one of the most accurately known physical
constants. For a magnetic field of 1 T ($=10^4$ G) the resonance frequency $\omega/(2\pi)$ is 28.024 945 GHz,
approximately three orders of magnitude larger than is required for any nuclear resonance (because
$\beta_e/\beta_N\approx1836$). This corresponds to a wavelength of about 10 mm and is in the microwave (MW) region of the
electromagnetic spectrum.
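
The arithmetic behind these numbers can be checked with a few lines. The sketch below assumes recent
CODATA values for the constants (slightly different from the older Bohr magneton value quoted above), and
the function names are hypothetical.

```python
H_PLANCK = 6.62607015e-34         # Planck constant, J s
BOHR_MAGNETON = 9.2740100783e-24  # J/T (assumed CODATA 2018 value)
G_E = 2.00231930436               # free-electron g-factor
C_LIGHT = 299792458.0             # speed of light, m/s


def zeeman_energy(ms, b0, g=G_E):
    """Energy g * beta_e * M_S * B0 of an electron spin level, in joules."""
    return g * BOHR_MAGNETON * ms * b0


def resonance_frequency(b0, g=G_E):
    """Frequency (Hz) matching the splitting between M_S = +1/2 and -1/2."""
    delta_e = zeeman_energy(+0.5, b0, g) - zeeman_energy(-0.5, b0, g)
    return delta_e / H_PLANCK


freq = resonance_frequency(1.0)  # field of 1 T
print(f"resonance frequency at 1 T: {freq / 1e9:.3f} GHz")
print(f"free-space wavelength:      {C_LIGHT / freq * 1e3:.1f} mm")
```

With these constants the result is close to 28.0 GHz and a wavelength a little above 10 mm, in line with the
values quoted in the text.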

B1.15.2.2 THERMAL EQUILIBRIUM, MAGNETIC RELAXATION AND LORENTZIAN LINESHAPE

Application of an oscillating magnetic field at the resonance frequency induces transitions in both directions
between the two levels of the spin system. The rate of the induced transitions depends on the MW power,
which is proportional to the square of $\omega_1=\gamma_eB_1$ (the amplitude of the oscillating magnetic field) (see equation
(B1.15.7)), and also depends on the number of spins in each level. Since the probabilities of upward ($|\beta\rangle\to|\alpha\rangle$)
and downward ($|\alpha\rangle\to|\beta\rangle$) transitions are equal, resonance absorption can only be detected when there is a
population difference between the two spin levels. This is the case at thermal equilibrium where there is a
slight excess of spins in the energetically lower $|\beta\rangle$-state. The relative population of the two-level system in
thermal equilibrium is given by the Boltzmann distribution

\[ \frac{N_\alpha}{N_\beta} = \exp\left(-\frac{\Delta E}{k_BT}\right) = \exp\left(-\frac{g_e\beta_eB_0}{k_BT}\right) \]    (B1.15.10)

where $N_\alpha$ and $N_\beta$ are the populations of the upper and lower spin states, respectively, $\Delta E$ is the energy
difference separating the states, $k_B$ is the Boltzmann constant and $T$ is the temperature in kelvin. The total
number of spins is, of course, $N=N_\alpha+N_\beta$.

Computation of the fractional excess of the lower level,

\[ \frac{N_\beta-N_\alpha}{N} = \frac{1-\exp(-\Delta E/(k_BT))}{1+\exp(-\Delta E/(k_BT))} \]    (B1.15.11)


yields, for electrons in a magnetic field of 0.3 T at 300 K, a value of $7.6\times10^{-4}$, while for protons under the
same conditions the value is only $1.2\times10^{-6}$. Thus, at thermal equilibrium, in EPR experiments one can
virtually always take all nuclear spin states belonging to the same electron spin state to be equally populated.
Because of the slightly larger number of spins occupying the lower energy level, there will be a net absorption
of energy which results in an exponential decay of the initial population difference of the spin states.
Eventually the levels would be equally populated (the spin system is then said to be saturated) if there were no
radiationless processes that restored the thermal equilibrium distribution of the population by dissipating the
energy absorbed by the spin system to other degrees of freedom. These nonradiative transitions between the
two states $|\alpha\rangle$ and $|\beta\rangle$ are called spin-lattice relaxation. Spin-lattice relaxation is possible because the spin
system is coupled to fluctuating magnetic fields driven by the thermal motions of the surroundings, which are
at thermal equilibrium. These fluctuations can stimulate spin flips and, therefore, this process leads to unequal
probabilities of spontaneous transitions $|\alpha\rangle\to|\beta\rangle$ and $|\beta\rangle\to|\alpha\rangle$ and unequal populations at thermal
equilibrium. In a magnetic resonance experiment one always has a competition between spin-lattice relaxation
and the radiation field, whose nature is to equalize the population of the levels. Qualitatively, the spin-lattice
relaxation time $T_1$ is the time for the deviation of the population difference from its equilibrium value to
decay to 1/e of its initial size after the perturbation (which in the case of magnetic resonance is the radiation
field) is removed.
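
To get a feeling for the size of the population difference, equation (B1.15.11) can be evaluated directly.
The sketch below is illustrative only: it assumes recent CODATA constants and the free-proton g-factor, and
the function name `fractional_excess` is hypothetical. With these assumptions the excess comes out of order
$10^{-4}$ for electrons and $10^{-6}$ for protons at 0.3 T and 300 K; the last digits depend on the constants
used.

```python
import math

K_B = 1.380649e-23                 # Boltzmann constant, J/K
BOHR_MAGNETON = 9.2740100783e-24   # J/T (assumed CODATA 2018 value)
NUCLEAR_MAGNETON = 5.0507837461e-27  # J/T (assumed CODATA 2018 value)
G_E = 2.00231930436                # free-electron g-factor
G_PROTON = 5.5856946893            # proton g-factor


def fractional_excess(delta_e, temperature):
    """(N_beta - N_alpha) / N for a two-level system, equation (B1.15.11)."""
    boltzmann = math.exp(-delta_e / (K_B * temperature))
    return (1.0 - boltzmann) / (1.0 + boltzmann)


b0, temperature = 0.3, 300.0  # tesla, kelvin
electron_excess = fractional_excess(G_E * BOHR_MAGNETON * b0, temperature)
proton_excess = fractional_excess(G_PROTON * NUCLEAR_MAGNETON * b0, temperature)
print(f"electrons: {electron_excess:.2e}")
print(f"protons:   {proton_excess:.2e}")
print(f"ratio:     {electron_excess / proton_excess:.0f}")
```

The ratio of the two values is essentially the ratio of the electron and proton Zeeman splittings, which is
one reason why EPR is intrinsically more sensitive per spin than NMR.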

A second type of relaxation mechanism, the spin-spin relaxation, will cause a decay of the phase coherence of 
the spin motion introduced by the coherent excitation of the spins by the MW radiation. The mechanism 
involves slight perturbations of the Larmor frequency by stochastically fluctuating magnetic dipoles, for 
example those arising from nearby magnetic nuclei. Due to the randomization of spin directions and the 
concomitant loss of phase coherence, the spin system approaches a state of maximum entropy. The spin-spin 
relaxation disturbing the phase coherence is characterized by $T_2$.

A result of the relaxation processes is a shortened lifetime of the spin states giving rise to a broadening of the
EPR line, which for most magnetic resonance lines dominated by homogeneous linewidth can be written as

\[ f(\omega) = \frac{A\gamma M_0B_1T_2}{1+(\omega-\omega_0)^2T_2^2+\gamma^2B_1^2T_1T_2}. \]    (B1.15.12)

In equation (B1.15.12), $M_0$ is the z-component of the bulk magnetization vector, $\mathbf{M}=(1/V)\sum_i\boldsymbol{\mu}_i$ (unit J T$^{-1}$
m$^{-3}$), for an ensemble of $N$ spin magnetic moments at thermal equilibrium (in the absence of any resonant
radiation), or in other words the net magnetic moment per unit volume, $\gamma=g_e\beta_e/\hbar$ is the gyromagnetic ratio and
$A$ is a proportionality constant to include instrumental factors. The lineshape function $f(\omega)$ has a maximum at
$\omega=\omega_0$ and it decreases for high power levels (i.e. for large $B_1$) and when the spin-lattice relaxation is not fast
enough to maintain the population difference. This decrease is called saturation. If the saturation factor $s$ is
defined by

\[ s = \frac{1}{1+\gamma^2B_1^2T_1T_2}, \]    (B1.15.13)

then $f(\omega)$ has the form

\[ f(\omega) = \frac{As\gamma M_0B_1T_2}{1+s(\omega-\omega_0)^2T_2^2}. \]    (B1.15.14)

Well below saturation $s\approx1$, and so the lineshape function becomes

\[ f(\omega) = \frac{A\gamma M_0B_1T_2}{1+(\omega-\omega_0)^2T_2^2}. \]    (B1.15.15)


This is the famous Lorentzian function which is very often found for spectra of radicals in solution. In order to
determine the relaxation times $T_1$ and $T_2$, a series of EPR spectra is recorded with the MW power varying
from a condition of negligible saturation ($B_1^2\gamma^2T_1T_2\ll1$; $s\approx1$) to one of pronounced saturation ($B_1^2\gamma^2T_1T_2>1$;
$s<1$). $T_2$ is then calculated from the linewidth below saturation by means of the expression

\[ T_2 = \frac{1}{\Delta\omega_{1/2}} \]    (B1.15.16)

where $\Delta\omega_{1/2}$ is the half width at half height of the magnetic resonance absorption line in the limit $B_1\to0$
($s\to1$). For $T_1$ one obtains

\[ T_1 = \frac{\Delta\omega_{1/2}}{\gamma^2}\left(\frac{1/s-1}{B_1^2}\right). \]    (B1.15.17)

One of the principal experimental advantages of this method of determining relaxation times is that it may be
carried out with standard EPR spectrometers using CW-detected EPR lines [9, 10]. A discussion of more
direct measurements of $T_1$ and $T_2$ using time-resolved EPR techniques is deferred to a later point (see sections
B1.15.4 and B1.15.6.3(b)).
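
The relations (B1.15.13) to (B1.15.17) can be checked for internal consistency with a short numerical
round trip. The sketch below is not a fitting procedure for real spectra: the saturation factor is simply
computed from assumed relaxation times and then used to recover them, and all numerical values (γ, $B_1$,
$T_1$, $T_2$) as well as the function names are assumptions made for illustration.

```python
import math


def saturation_factor(gamma, b1, t1, t2):
    """s = 1 / (1 + gamma^2 B1^2 T1 T2), equation (B1.15.13)."""
    return 1.0 / (1.0 + gamma**2 * b1**2 * t1 * t2)


def lineshape(omega, omega0, gamma, b1, t1, t2, m0=1.0, a=1.0):
    """Absorption lineshape in the form of equation (B1.15.14)."""
    s = saturation_factor(gamma, b1, t1, t2)
    return a * s * gamma * m0 * b1 * t2 / (1.0 + s * (omega - omega0)**2 * t2**2)


def relaxation_times(half_width_unsaturated, s, gamma, b1):
    """T2 from equation (B1.15.16) and T1 from equation (B1.15.17)."""
    t2 = 1.0 / half_width_unsaturated
    t1 = half_width_unsaturated * (1.0 / s - 1.0) / (gamma**2 * b1**2)
    return t1, t2


# Assumed illustrative values: gamma in rad s^-1 T^-1, B1 in T, times in s.
gamma, b1, t1_true, t2_true = 1.76e11, 1.0e-5, 1.0e-6, 1.0e-7
s = saturation_factor(gamma, b1, t1_true, t2_true)
t1, t2 = relaxation_times(1.0 / t2_true, s, gamma, b1)
print(f"s = {s:.4f}, recovered T1 = {t1:.3e} s, T2 = {t2:.3e} s")

# Under saturation the half width at half height grows to 1 / (T2 * sqrt(s)).
half_width = 1.0 / (t2_true * math.sqrt(s))
ratio = lineshape(half_width, 0.0, gamma, b1, t1_true, t2_true) / \
    lineshape(0.0, 0.0, gamma, b1, t1_true, t2_true)
print(f"relative height at the saturated half width: {ratio:.3f}")
```

The second part of the sketch shows the power broadening implied by equation (B1.15.14): the line stays
Lorentzian but its half width increases by a factor $1/\sqrt{s}$.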

B1.15.2.3 SPIN HAMILTONIAN

To characterize and interpret EPR spectra one needs to obtain transition frequencies and transition 
probabilities between the (2S + 1) spin states. All interactions of the spins of electrons and nuclei with the 
applied magnetic field and with each other that lead to energy differences between states with different 
angular momenta have to be considered. The interactions are expressed in terms of operators representing the 
spins, with various coupling coefficients for the different interactions. The contributions of all these 
interactions make up the spin Hamiltonian that will be given in energy units throughout this text. Since, in 
principle, the spin Hamiltonian has no effect on the spatial part of the electronic wavefunction, the energy of 
the spin system in a certain state characterized by the quantum numbers $M_S$ and $M_I$ can be derived from the
time-independent Schrödinger equation. The EPR spectrum is then interpreted as the allowed transitions
between the eigenvalues of the spin Hamiltonian. 

(A) ELECTRON ZEEMAN INTERACTION 

The first contribution considered here is the electron Zeeman interaction, i.e. the coupling of the magnetic 
dipole moment of the electron spin to the external magnetic field. For symmetry reasons the electron Zeeman 
interaction is isotropic for a free electron spin and is characterized by the Zeeman splitting constant $g_e$. The
g-value of an unpaired electron in an atomic or molecular environment is very often different from $g_e$ and may
also be anisotropic, i.e. dependent on the orientation of the system relative to the magnetic field $\mathbf{B}$. The
deviation from the spin-only value of the g-factor and the anisotropy result from the contribution of the orbital
angular momentum to the total angular momentum of the electron. This phenomenon is called spin-orbit
coupling. It leads to an anisotropic electron Zeeman interaction (EZI) which is usually formulated as

\[ \mathcal{H}_{\rm EZI} = \beta_e\,\mathbf{B}\,\mathbf{g}\,\mathbf{S} \]    (B1.15.18)


where the field and angular momentum vectors are coupled through a symmetric matrix g of dimension 3x3. 
In organic radicals orbital momenta are almost completely quenched by chemical bonding (with the exception 
of cases where the energies of the two orbitals are nearly degenerate), leading to only small deviations
$\Delta g=|g_{ii}-g_e|$, $i=X,Y,Z$, of the principal values of $\mathbf{g}$ from the free-electron value $g_e$ (typically in a range from
$10^{-4}$ to $5\times10^{-3}$). g-values very different from $g_e$ are expected for first-row transition metal ions and for
rare-earth ions where spin-orbit coupling is more complete. The g-matrix of organic radicals reflects certain
features of the electronic wavefunction of the paramagnetic species. The spatial distribution of the orbital
carrying the unpaired spin can be influenced by interactions with other molecules, e.g. via hydrogen bonding.
Therefore, a determination of the g-values and the orientation of the main axes of $\mathbf{g}$ with respect to the
molecular axis frame can give highly specific information on the interaction of the molecule with its
surroundings.
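
For an anisotropic g-matrix the resonance position depends on the direction of the field. The sketch below
uses the standard expression $g_{\rm eff}=(g_X^2l_X^2+g_Y^2l_Y^2+g_Z^2l_Z^2)^{1/2}$ for a field with direction cosines
$(l_X,l_Y,l_Z)$ in the principal axis frame. This expression is not written out in the text above and is
supplied here as an assumption; the principal values are illustrative numbers of the size typical for an
organic radical, and the function name `effective_g` is hypothetical.

```python
import math


def effective_g(principal_values, direction):
    """Effective g-value for a field along `direction` in the principal axis frame.

    principal_values: (gX, gY, gZ); direction: any non-zero 3-vector,
    normalized internally to direction cosines.
    """
    gx, gy, gz = principal_values
    norm = math.sqrt(sum(c * c for c in direction))
    lx, ly, lz = (c / norm for c in direction)
    return math.sqrt((gx * lx)**2 + (gy * ly)**2 + (gz * lz)**2)


g_principal = (2.0088, 2.0061, 2.0027)  # assumed illustrative values
for label, direction in (("X", (1, 0, 0)), ("Z", (0, 0, 1)), ("body diagonal", (1, 1, 1))):
    print(f"B along {label}: g_eff = {effective_g(g_principal, direction):.5f}")
```

In a single crystal the observed line therefore shifts as the sample is rotated, whereas a powder or frozen
solution shows the superposition of all orientations.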

(B) ELECTRON SPIN-SPIN INTERACTION 

An atom or a molecule with the total spin of the electrons $S=1$ is said to be in a triplet state. The multiplicity
of such a state is $(2S+1)=3$. Triplet systems occur in both excited and ground state molecules, in some
compounds containing transition metal ions, in radical pair systems, and in some defects in solids. 

For a system with $S=1$, there are three sublevels characterized by $M_S=\pm1$ and $M_S=0$. In contrast to systems
with $S=\frac12$ these sublevels may not be degenerate in the absence of an external magnetic field (see figure
B1.15.2). The lifting of degeneracy of the spin states at zero field is called zero-field splitting (ZFS) and it is
common for systems with $S\ge1$. For triplet states of organic molecules ($S=1$) the ZFS arises from the dipolar
interaction of the two magnetic moments of the electron spins with each other. The interaction is described by
an additional term

\[ \mathcal{H}_{\rm ZFS} = \mathbf{S}\,\mathbf{D}\,\mathbf{S} \]    (B1.15.19)

that must be included in a spin Hamiltonian when $S\ge1$. The spin-spin coupling (or ZFS) tensor $\mathbf{D}$ is a
symmetric and traceless ($\sum_{i=X,Y,Z}D_{ii}=0$) second-rank tensor. Therefore, $\mathbf{D}$ can be written in its principal axis
frame with only two parameters

\[ \mathcal{H}_{\rm ZFS} = D\left(S_Z^2-\tfrac13\mathbf{S}^2\right)+E\left(S_X^2-S_Y^2\right) \]    (B1.15.20)

where $D=\frac32D_{ZZ}$ is the axial and $E=\frac12(D_{XX}-D_{YY})$ is the rhombic zero-field parameter. One may define an
asymmetry parameter $\eta_D=E/D$ of the D-tensor. The case of $\eta_D=0$ (or $E=0$) corresponds to an axially
symmetric ZFS tensor ($D_{XX}=D_{YY}$) and two of the states will remain degenerate at zero magnetic field. The
ZFS parameters can in general be determined from the EPR spectrum (for $\mathcal{H}_{\rm ZFS}<\mathcal{H}_{\rm EZI}$). In liquids the ZFS
is averaged out to zero.
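
For the special case of the field along the principal Z axis, the $S=1$ problem can be solved in closed form.
The sketch below is derived here (it is not quoted from the text) by adding the Zeeman term $g\beta_eB_0S_Z$ to
equation (B1.15.20): the $M_S=0$ level stays at $-\frac23D$, while $E$ mixes $M_S=\pm1$ and gives
$\frac13D\pm[(g\beta_eB_0)^2+E^2]^{1/2}$. The numerical values of $D$ and $E$ and the function name are
assumptions for illustration.

```python
import math


def triplet_energies(d, e, zeeman):
    """Energies of the three S = 1 sublevels for B parallel to the principal Z axis.

    d, e:    axial and rhombic ZFS parameters
    zeeman:  g * beta_e * B0, in the same energy units as d and e
    Returns (upper, middle-at-zero-field M_S = 0 level, lower) as a tuple.
    """
    root = math.sqrt(zeeman**2 + e**2)
    return (d / 3.0 + root, -2.0 * d / 3.0, d / 3.0 - root)


d, e = 3.0, 0.5  # assumed ZFS parameters, e.g. in GHz
for zeeman in (0.0, 0.5, 2.0, 10.0):
    levels = triplet_energies(d, e, zeeman)
    print(f"g*beta*B0 = {zeeman:5.1f}: " + ", ".join(f"{x:+8.3f}" for x in levels))
```

At zero field the three levels sit at $\frac13D+E$, $\frac13D-E$ and $-\frac23D$; for $E=0$ the first two coincide,
which is the axial degeneracy mentioned above. The square root also shows why the levels vary nonlinearly
with field as long as $g\beta_eB_0$ is comparable to $E$.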
Figure B1.15.2. The state energies and corresponding eigenfunctions (high-field labels) as a function of the
applied magnetic field $B_0$ for a system of spin $S=1$ and $\mathbf{B}\parallel z$, shown for $D>0$ and $E\ne0$. The two primary
transitions ($\Delta M_S=\pm1$) are indicated for a constant frequency spectrum. Note that, because $E\ne0$, the state
energies vary nonlinearly with $B_0$ at low $B_0$.

(C) ELECTRON SPIN EXCHANGE INTERACTION 

For systems with two unpaired electrons, such as radical pairs, biradicals or triplets, there are four spin states
that can be represented by the product functions $|M_{S1}\rangle\otimes|M_{S2}\rangle$, i.e. $|\alpha_1\alpha_2\rangle$, $|\alpha_1\beta_2\rangle$, $|\beta_1\alpha_2\rangle$ and $|\beta_1\beta_2\rangle$, where
the subscripts indicate electron spin 1 and electron spin 2, respectively. In a paramagnetic centre of moderate
size, however, it is more advantageous to combine these configurations into combination states because, in
addition to the dipolar coupling between the spin magnetic moments, there is also an electrostatic interaction
between the electron spins, the so-called exchange interaction, which gives rise to an energy separation
between the singlet state, $|S\rangle=\frac{1}{\sqrt2}(|\alpha_1\beta_2\rangle-|\beta_1\alpha_2\rangle)$, and the triplet states ($|T_+\rangle=|\alpha_1\alpha_2\rangle$,
$|T_0\rangle=\frac{1}{\sqrt2}(|\alpha_1\beta_2\rangle+|\beta_1\alpha_2\rangle)$, $|T_-\rangle=|\beta_1\beta_2\rangle$). The magnitude of the (isotropic) exchange interaction can be
derived from the overlap of the wavefunctions and is described by the Hamiltonian

\[ \mathcal{H}_{\rm ex} = -2J\,\mathbf{S}_1\cdot\mathbf{S}_2 \]    (B1.15.21)

which denotes a scalar coupling between the spins $\mathbf{S}_1=(S_{1x},S_{1y},S_{1z})$ and $\mathbf{S}_2=(S_{2x},S_{2y},S_{2z})$. The energy
separation between the $|S\rangle$ and $|T_0\rangle$ wavefunctions is determined by the exchange coupling constant $J$. For $J>0$
the singlet state is higher in energy than the $|T_0\rangle$-state. The observed properties of the system depend on the
magnitude of $J$. If it is zero the two spins behave completely independently and one would have a true
biradical. At the other extreme, when $J$ is large the singlet lies far above the triplet and the magnetic resonance
properties are solely determined by the interaction within the triplet manifold.
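
The size of the singlet-triplet separation follows from equation (B1.15.21) in a few lines. The worked step
below is added here as an illustration, for two spins $S_1=S_2=\frac12$ coupled to a total spin $S$; it uses only the
identity for the square of the total spin.

```latex
\begin{align*}
\mathbf{S}_1\cdot\mathbf{S}_2 &= \tfrac{1}{2}\left[\mathbf{S}^2-\mathbf{S}_1^2-\mathbf{S}_2^2\right],
  \qquad \mathbf{S}=\mathbf{S}_1+\mathbf{S}_2,\\
\langle\mathbf{S}_1\cdot\mathbf{S}_2\rangle &= \tfrac{1}{2}\left[S(S+1)-\tfrac{3}{4}-\tfrac{3}{4}\right]
  = \begin{cases} +\tfrac{1}{4}, & S=1 \ \text{(triplet)}\\[2pt] -\tfrac{3}{4}, & S=0 \ \text{(singlet)} \end{cases}\\
E_T &= -2J\cdot\tfrac{1}{4} = -\tfrac{1}{2}J, \qquad
E_S = -2J\cdot\left(-\tfrac{3}{4}\right) = +\tfrac{3}{2}J,\\
E_S-E_T &= 2J .
\end{align*}
```

With the sign convention of equation (B1.15.21) a positive $J$ therefore places the singlet $2J$ above the
triplet. Other sign and factor conventions for the exchange Hamiltonian are in use, so quoted $J$ values
should always be read together with the Hamiltonian they refer to.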


(D) HYPERFINE INTERACTION 


The interaction of the electron spin's magnetic dipole moment with the magnetic dipole moments of nearby 
nuclear spins provides another contribution to the state energies and the number of energy levels, between 
which transitions may occur. This gives rise to the hyperfine structure in the EPR spectrum. The so-called 
hyperfine interaction (HFI) is described by the Hamiltonian 

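\[ \mathcal{H}_{\rm HFI} = \mathbf{S}\,\mathbf{A}\,\mathbf{I} \]    (B1.15.22)

where $\mathbf{A}$ is the HFI matrix and $\mathbf{I}=(I_x,I_y,I_z)$ is the vector representation of the nuclear spin. The HFI consists
of two parts and therefore equation (B1.15.22) can be separated into the sum of two terms

\[ \mathcal{H}_{\rm HFI} = \mathbf{S}\,\mathbf{A}_D\,\mathbf{I} + a\,\mathbf{S}\cdot\mathbf{I} \]    (B1.15.23)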

where the first term describes the anisotropic dipolar coupling through space between the electron spin and
the nuclear spin. $\mathbf{A}_D$ is the symmetric and traceless dipolar HFI matrix. In the so-called point-dipole
approximation, where both spins are assumed to be localized, this part is given by

\[ \mathbf{S}\,\mathbf{A}_D\,\mathbf{I} = \frac{\mu_0}{4\pi}\,g_e\beta_eg_n\beta_N\left[\frac{3(\mathbf{S}\cdot\mathbf{r})(\mathbf{r}\cdot\mathbf{I})}{r^5}-\frac{\mathbf{S}\cdot\mathbf{I}}{r^3}\right]. \]    (B1.15.24)


In equation (B1.15.24), $\mathbf{r}$ is the vector connecting the electron spin with the nuclear spin, $r$ is the length of this
vector and $g_n$ and $\beta_N$ are the g-factor and the magneton of the nucleus, respectively. The dipolar
coupling is purely anisotropic, arising from the spin density of the unpaired electron in an orbital of
non-spherical symmetry (i.e. in p, d or f orbitals) with a vanishing electron density at the nucleus. Since $\mathbf{A}_D$ is
traceless the dipolar interactions are averaged out in isotropic fluid solution and only the
orientation-independent isotropic coupling represented by the second term in equation (B1.15.23) gives rise to the
observed hyperfine coupling in the spectrum. This isotropic contribution is called the (Fermi) contact
interaction arising from electrons in s orbitals (spherical symmetry) with a finite probability ($|\psi(0)|^2$) of
finding the electron at the nucleus. The general expression for the isotropic hyperfine coupling constant is

\[ a = \frac{2\mu_0}{3}\,g_e\beta_eg_n\beta_N|\psi(0)|^2. \]    (B1.15.25)

Hence, a measurement of hyperfine coupling constants provides information on spin densities at certain 
positions in the molecule and thus renders a map of the electronic wavefunction. 

The simplest system exhibiting a nuclear hyperfine interaction is the hydrogen atom with a coupling constant 
of 1420 MHz. If different isotopes of the same element exhibit hyperfine couplings, their ratio is determined 
by the ratio of the nuclear g-values. Small deviations from this ratio may occur for the Fermi contact
interaction, since the electron spin probes the inner structure of the nucleus if it is in an s orbital. However, 
this so-called hyperfine anomaly is usually smaller than 1%. 
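
The hydrogen-atom value can be estimated from the contact expression, equation (B1.15.25), taken in SI
units with the prefactor $2\mu_0/3$. The sketch below is an estimate under stated assumptions: the 1s density at
the nucleus is taken as $1/(\pi a_0^3)$ for an infinitely heavy nucleus, recent CODATA constants are assumed, and
reduced-mass and higher-order corrections are ignored. The function name `contact_coupling_hz` is
hypothetical. The deuterium value is then obtained purely from the ratio of nuclear g-values, as described
above.

```python
import math

MU_0 = 1.25663706212e-6              # vacuum permeability, N A^-2
H_PLANCK = 6.62607015e-34            # Planck constant, J s
BOHR_MAGNETON = 9.2740100783e-24     # J/T
NUCLEAR_MAGNETON = 5.0507837461e-27  # J/T
BOHR_RADIUS = 5.29177210903e-11      # m
G_E = 2.00231930436                  # free-electron g-factor
G_PROTON = 5.5856946893              # proton g-factor
G_DEUTERON = 0.8574382338            # deuteron g-factor


def contact_coupling_hz(g_n, density_at_nucleus):
    """Isotropic hyperfine coupling a / h in Hz from the Fermi contact formula."""
    a_joule = (2.0 * MU_0 / 3.0) * G_E * BOHR_MAGNETON \
        * g_n * NUCLEAR_MAGNETON * density_at_nucleus
    return a_joule / H_PLANCK


density_1s = 1.0 / (math.pi * BOHR_RADIUS**3)  # |psi_1s(0)|^2 in m^-3
a_h = contact_coupling_hz(G_PROTON, density_1s)
a_d = a_h * G_DEUTERON / G_PROTON              # scale by the nuclear g-value ratio
print(f"hydrogen:  a/h = {a_h / 1e6:.1f} MHz")
print(f"deuterium: a/h = {a_d / 1e6:.1f} MHz")
```

The hydrogen estimate lands within a fraction of a per cent of the 1420 MHz quoted above; the small
remainder is due to the corrections neglected in this sketch.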

(E) NUCLEAR ZEEMAN AND NUCLEAR QUADRUPOLE INTERACTION 

While all contributions to the spin Hamiltonian so far involve the electron spin and cause first-order energy 
shifts or splittings in the EPR spectrum, there are also terms that involve only nuclear spins. Aside from their 
importance for the calculation of ENDOR spectra, these terms may influence the EPR spectrum significantly 
in situations where the high-field approximation breaks down and second-order effects become important. 
The first of these interactions is the coupling of the nuclear spin to the external magnetic field, called the 


nuclear Zeeman interaction (NZI). Neglecting chemical shift anisotropics that are usually small and not 
resolved in ENDOR spectra it can be considered isotropic and written as 


Ww/,1 = -tfpftili' L 


(B1. 15.26) 


The negative sign in equation (B1.15.26) implies that, unlike the case for electron spins, states with larger
magnetic quantum number have smaller energy for $g_n > 0$. In contrast to the g-value in EPR experiments, $g_n$ is
an inherent property of the nucleus. NMR resonances are not easily detected in paramagnetic systems because 
of sensitivity problems and increased linewidths caused by the presence of unpaired electron spins. 

Since atomic nuclei are not perfectly spherical, nuclei with spin $I \geq 1$ possess an electric quadrupole moment, which
interacts with the gradient of the electric field due to all surrounding electrons. The Hamiltonian of the nuclear
quadrupole interaction (NQI) can be written as a tensorial coupling of the nuclear spin with itself,

$$\mathcal{H}_{\mathrm{NQI}} = \mathbf{I} \cdot \mathbf{P} \cdot \mathbf{I},$$ (B1.15.27)


where $\mathbf{P}$ is the quadrupole coupling tensor. Comparison with equation (B1.15.19) shows that the NQI can be
formally treated in a way analogous to that for the ZFS. In liquids the NQI is averaged out to zero. 

(F) THE COMPLETE SPIN HAMILTONIAN 

The complete spin Hamiltonian for a description of EPR and ENDOR experiments is given by 

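$$\mathcal{H} = \mathcal{H}_{\mathrm{EZI}} + \mathcal{H}_{\mathrm{ZFS}} + \mathcal{H}_{\mathrm{EX}} + \mathcal{H}_{\mathrm{HFI}} + \mathcal{H}_{\mathrm{NZI}} + \mathcal{H}_{\mathrm{NQI}}.$$ (B1.15.28)

The approximate magnitudes of the terms in equation (B1.15.28) are shown in an overview in figure B1.15.3
(see also [2, 3, 11]).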


Figure B1.15.3. Typical magnitudes of interactions of electron and nuclear spins in the solid state
(logarithmic scale).


B1.15.3 EPR INSTRUMENTATION 

A typical CW-detected EPR spectrum is recorded by holding the frequency constant and varying the external 
magnetic field. In doing so one varies the separation between the energy levels to match it to the quantum of 
the radiation. Even though magnetic resonance could also be achieved by sweeping the frequency at a fixed 
magnetic field, high-frequency sources with a broad frequency range and simultaneously low enough noise 
characteristics have not yet been devised to be practical for frequency-swept EPR. 

EPR absorption has been detected from zero magnetic field up to fields as high as 30 T, corresponding to a
MW frequency of $10^{12}$ Hz. There are various considerations that influence the choice of the radiation
frequency. Higher frequencies, which require higher magnetic fields, give inherently greater sensitivity by
virtue of a more favourable Boltzmann factor (see equation (B1.15.11)). However, several factors place limits
on the frequency employed, so that frequencies in the MW region of the electromagnetic spectrum remain
favoured. One limitation is the sample size; at frequencies around 40 GHz the dimensions of a typical
resonant cavity are of the order of a few millimetres, thus restricting the sample volume to about 0.02 cm$^3$.
The requirement to reduce the sample size roughly compensates for the sensitivity enhancement in going to 
higher fields and frequencies in EPR. However, the sensitivity advantage persists if only small quantities of 
the sample are available. This is often the case for biological samples, particularly when single-crystal studies 
are intended. Second, high frequencies require high magnetic fields that are homogeneous over the sample 
volume. Sufficiently homogeneous magnetic fields above 2.5 T are difficult to produce with electromagnets. 
Superconducting magnets are commercially available for magnetic fields up to 22 T, but they are expensive 
and provide only small room-temperature bores, thus limiting the space available for the resonator. Third, the 
small size of MW components for high frequencies makes their fabrication technically difficult and costly. 
These and other factors have resulted in a choice of the frequency region around 10 GHz (usually 
denominated the X-band region) as the resonance frequency of most commercially available spectrometers. 
The most common frequency bands for high-field/high-frequency EPR are Q-band (35 GHz) and W-band (95 GHz), where the wavelengths
are 8 mm and 3 mm, respectively. In order to carry out EPR experiments in larger objects such as intact
animals it appears necessary to use lower frequencies because of the large dielectric losses of aqueous
samples. At L-band frequencies (1-2 GHz), with appropriate configurations of EPR resonators, whole animals
the size of mice can be studied by insertion into the resonator. Table B1.15.1 lists typical frequencies and
wavelengths, together with the fields required for resonance of a free electron.

Table B1.15.1. Some frequencies and resonance fields (for g = 2) used in EPR.

Typical EPR frequency    Vacuum wavelength          Typical EPR field
$\nu$ (GHz)              $\lambda$ (m)              $B_0$ (T)

1.5                      $2 \times 10^{-1}$         0.054
3.0                      $1 \times 10^{-1}$         0.107
6.0                      $5 \times 10^{-2}$         0.214
9.5                      $3.2 \times 10^{-2}$       0.339
24                       $1.2 \times 10^{-2}$       0.856
36                       $8.3 \times 10^{-3}$       1.285
50                       $6.0 \times 10^{-3}$       1.784
95                       $3.2 \times 10^{-3}$       3.390
140                      $2.1 \times 10^{-3}$       4.996
250                      $1.2 \times 10^{-3}$       8.921
360                      $8.3 \times 10^{-4}$       12.846
604                      $5.0 \times 10^{-4}$       21.552
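
The entries of table B1.15.1 follow from the resonance condition $h\nu = g\beta_e B_0$ and from $\lambda = c/\nu$. The short sketch below recomputes a few rows; it assumes the free-electron value g = 2.0023, which appears to be the value used for the tabulated fields, and the function names are illustrative.

```python
H_PLANCK = 6.62607015e-34   # Planck constant, J s
MU_B = 9.2740100783e-24     # Bohr magneton (beta_e), J T^-1
C_LIGHT = 299792458.0       # speed of light in vacuum, m s^-1
G_FREE = 2.0023             # free-electron g-value (assumed for the table)


def resonance_field_tesla(freq_hz, g=G_FREE):
    """Field B0 satisfying the resonance condition h*nu = g*beta_e*B0."""
    return H_PLANCK * freq_hz / (g * MU_B)


def vacuum_wavelength_m(freq_hz):
    """Vacuum wavelength of radiation of frequency freq_hz."""
    return C_LIGHT / freq_hz


for f_ghz in (1.5, 9.5, 36.0, 95.0, 250.0, 604.0):
    f_hz = f_ghz * 1e9
    print(f"{f_ghz:6.1f} GHz   lambda = {vacuum_wavelength_m(f_hz):.1e} m"
          f"   B0 = {resonance_field_tesla(f_hz):.3f} T")
```

The same relation, read in the other direction, gives a frequency of roughly 840 GHz for a free electron in a field of 30 T.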

The components of a typical EPR spectrometer operating at X-band frequencies [3, 4, 6] are shown in figure
B1.15.4.

Up to the present the MW radiation has usually been provided by reflex klystrons, which essentially consist of 
a vacuum tube and a pair of electrodes to produce an electron beam that is velocity modulated by a radio-frequency
(RF) electric field. The net effect is the formation of groups or bunches of electrons. Using a
reflector the bunched electron beam is turned around and will debunch, giving up energy to the cavity,
provided the RF frequency and the beam and reflector voltages are properly adjusted. This will set up one of
several stable klystron modes at the cavity frequency $f_0$; the mode corresponding to the highest output of
power is usually the one utilized. A typical klystron has a mechanical tuning range allowing the klystron
cavity frequency to be tuned over a range of 5-50% around $f_0$. The adjustment of the reflector voltage allows
one to vary the centre frequency of a given mode over a very limited range of 0.2-0.8%. Often a low- 
amplitude sine-wave reflector voltage modulation is employed as an integral part of an automatic frequency 
control (AFC). It is desirable that the klystron frequency be very stable; hence, fluctuations of the klystron 
temperature or of applied voltages must be minimized and mechanical vibrations suppressed. Klystrons are 
employed as generators of nearly monochromatic output radiation in the frequency range from 1 to 100 GHz. 
In commercial EPR spectrometers the klystron normally provides less than 1 W of continuous output power. 
Increasingly, however, solid-state devices, such as Gunn-effect oscillators and IMPATTs, are superseding 
klystron tubes. 

Figure B1.15.4. Block diagram of a typical EPR spectrometer operating at X-band frequencies.


Waveguides are commonly used to transmit microwaves from the source to the resonator and subsequently to 
the receiver. For not-too-high-frequency radiation (<10 GHz) low-loss MW transmission can also be achieved 
using strip-lines and coaxial cables. At the output of a klystron an isolator is often used to prevent
back-reflected microwaves from perturbing the on-resonant klystron mode. An isolator is a microwave-ferrite device that
permits the transmission of microwaves in one direction and strongly attenuates their propagation in the other
direction. The principle of this device involves the Faraday effect, that is, the rotation of the polarization
planes of the microwaves.

The amount of MW power reaching the sample in the resonator is controlled by a variable attenuator. Like the 
isolator, the circulator is a non-reciprocal device that serves to direct the MW power to the resonator (port 1 →
port 2) and simultaneously allows the signal reflected at resonance to go from the resonator directly to the
receiver (port 2 → port 3).

Although achievement of resonance in an EPR experiment does not require the use of a resonant cavity, it is 
an integral part of almost all EPR spectrometers. A resonator dramatically increases the sensitivity of the 
spectrometer and greatly simplifies sample access. A resonant cavity is the MW analogue of a RF-tuned RLC 
circuit and many expressions derived for the latter may also be applied to MW resonators. A typical resonant 
cavity for microwaves is a box or a cylinder fabricated from high-conductivity metal and with dimensions 
comparable to the wavelength of the radiation. Each particular cavity size and shape can sustain oscillations in 
a number of different standing wave patterns, called resonator modes. Visual images and mathematical 
expressions of the distributions of the electric and magnetic field vectors within the cavity can be derived 
from Maxwell's equations with suitable boundary conditions. 

The locations of the maxima of the $B_1$ field and the E-field are different depending on the mode chosen for
the EPR experiment. It is desirable to design the cavity in such a way that the $B_1$ field is perpendicular to the
external field $B_0$, as required by the nature of the resonance condition. Ideally, the sample is located at a
position of maximum $B_1$, because below saturation the signal-to-noise ratio is proportional to $B_1$.
Simultaneously, the sample should be placed at a position where the E-field is a minimum in order to
minimize dielectric power losses which have a detrimental effect on the signal-to-noise ratio.

The sharpness of the frequency response of a resonant system is commonly described by a figure of merit,
called the quality factor, $Q = \nu/\Delta\nu$. It may be obtained from a measurement of the full width at half maximum,
$\Delta\nu$, of the resonator frequency response curve obtained from a frequency sweep covering the resonance. The
sensitivity, $S$, of a system (proportional to the inverse of the minimum detectable number of paramagnetic centres
in an EPR cavity) critically depends on the quality factor:

$$S \propto Q\eta,$$ (B1.15.29)

where $\eta = \int_{\mathrm{sample}} B_1^2\,\mathrm{d}V \big/ \int_{\mathrm{cavity}} B_1^2\,\mathrm{d}V$ is the filling factor. The cavity types most commonly employed in EPR are
the rectangular-parallelepiped cavity and the cylindrical cavity. The rectangular cavity is typically operated in
a transverse electric mode, TE$_{102}$, which permits the insertion of large samples with low dielectric constants.
It is especially useful for liquid samples in flat cells, which may extend through the entire height of the cavity.
In the cylindrical cavity a TE$_{011}$ mode is frequently used because of its fairly high Q-factor and the very
strong $B_1$ along the sample axis.

Microwaves from the waveguide are coupled into the resonator by means of a small coupling hole in the 
cavity wall, called the iris. An adjustable dielectric screw (usually machined from Teflon) with a metal tip 
adjacent to the iris permits optimal impedance matching of the cavity to the waveguide for a variety of 
samples with different dielectric properties. With an appropriate iris setting the energy transmission into the 
cavity is a maximum and simultaneously reflections are minimized. The optimal adjustment of the iris screw 
depends on the nature of the sample and is found empirically. 

Other frequently used resonators are dielectric cavities and loop-gap resonators (also called split-ring 
resonators) [12]. A dielectric cavity contains a diamagnetic material that serves as a dielectric to raise the 
effective filling factor by concentrating the $B_1$ field over the volume of the sample. Hollow cylinders
machined from fused quartz or sapphire that host the sample along the cylindrical axis are commonly used. 


Loop-gap resonators consist of one or a series of cylindrical loops interrupted by one or more gaps.
Loops and gaps act as inductive and capacitive elements, respectively. With a suitable choice of loop and gap
dimensions, resonators operating at different resonance frequencies over a wide range of the MW spectrum
can be constructed. Loop-gap resonators typically have low Q-factors. Their broad-bandwidth frequency
response, $\Delta\nu$, makes them particularly useful in EPR experiments where high time resolution, $\tau_{\mathrm{res}} = 1/(2\pi\Delta\nu)$,
is required because the signal changes rapidly. Excellent filling factors, $\eta$, may be obtained with loop-gap devices;
the high $\eta$ makes up for the typically low Q to yield high sensitivity (see equation (B1.15.29)), valuable for
small sample sizes and in pulsed EPR experiments. Coupling of microwaves into these cavities is most 
conveniently accomplished by a coupling loop that acts as an antenna. Typically, the distance between the 
antenna and the loop-gap resonator is varied in order to obtain optimal impedance matching. 
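
The trade-off between quality factor, bandwidth and time resolution described above can be made concrete with a few lines of arithmetic. The sketch below uses $Q = \nu/\Delta\nu$, $\tau_{\mathrm{res}} = 1/(2\pi\Delta\nu)$ and the proportionality $S \propto Q\eta$; the Q and $\eta$ values are purely illustrative assumptions, not data for any particular resonator.

```python
import math


def bandwidth_hz(freq_hz, q):
    """Resonator bandwidth (FWHM) from Q = nu / delta_nu."""
    return freq_hz / q


def time_resolution_s(freq_hz, q):
    """Resonator-limited time resolution, tau_res = 1 / (2 * pi * delta_nu)."""
    return 1.0 / (2.0 * math.pi * bandwidth_hz(freq_hz, q))


def relative_sensitivity(q, eta):
    """Quantity proportional to the sensitivity, S ~ Q * eta."""
    return q * eta


X_BAND = 9.5e9  # Hz

# Hypothetical resonators: (label, Q, filling factor eta)
resonators = [("high-Q cavity", 3000.0, 0.01), ("loop-gap resonator", 300.0, 0.10)]

for label, q, eta in resonators:
    print(f"{label:20s} bandwidth = {bandwidth_hz(X_BAND, q) / 1e6:6.1f} MHz"
          f"   tau_res = {time_resolution_s(X_BAND, q) * 1e9:5.1f} ns"
          f"   Q*eta = {relative_sensitivity(q, eta):.0f}")
```

With these assumed numbers the low-Q device responds ten times faster while $Q\eta$, and hence the expected sensitivity, is unchanged.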

When the applied magnetic field is swept to bring the sample into resonance, MW power is absorbed by the 
sample. This changes the matching of the cavity to the waveguide and some power is now reflected and 
passes via the circulator to the detector. This reflected radiation is thus the EPR signal. 

The most commonly used detector in EPR is a semiconducting silicon crystal in contact with a tungsten wire,
which acts as an MW rectifier. At microwatt powers, crystal detectors are typically non-linear and render a
rectified current that is proportional to the MW power (i.e. proportional to $B_1^2$). In the milliwatt region, the
rectified crystal current becomes proportional to the square root of the MW power (i.e. proportional to $B_1$),
and the crystal behaves as a linear detector. In EPR spectroscopy it is preferred to operate the crystal rectifier
in its linear regime. However, since the EPR signal is typically rather small, the diode needs to be biased to
operate it at higher MW power levels. This can be done by slightly mismatching the cavity to the waveguide
in order to increase the MW power back-reflected from the cavity, or by adding microwaves at a constant
power level guided through the reference arm (often called the bypass arm) of the spectrometer. The reference
arm takes microwaves from the waveguide ahead of the circulator and returns them with adjusted phase and
power behind the circulator. When properly adjusted, the reference arm can also be used to detect the in-phase
($\chi'$) and out-of-phase ($\chi''$) components of the EPR signal with respect to the phase of the microwaves.

When sweeping the magnetic field through resonance, a crystal detector renders a slowly varying DC signal 
which is not readily processed and which is superimposed by low-frequency noise contributions. To overcome 
this, a phase-sensitive detection technique utilizing small-amplitude magnetic field modulation is employed in 
most EPR spectrometers. Modulation of the magnetic field is achieved by placing a pair of Helmholtz coils on 
each side of the cavity along the axis of the external magnetic field. An alternating current is fed through them 
and a small oscillating magnetic field is induced which is superimposed on the external magnetic field. The 
effect of the modulation is depicted in figure B1.15.5. Provided the amplitude of the modulation field is small
compared to the linewidth of the absorption signal, $\Delta B_{1/2}$, the change in MW power at the detector will
contain an oscillatory component at the modulation frequency whose amplitude will be proportional to the 
slope of the EPR line. A lock-in detector compares the modulated EPR signal from the crystal with a 
reference and only passes the components of the signal that have the proper frequency and phase. 

The reference voltage comes from the same frequency generator that produces the field modulation voltage 
and this causes the EPR signal to pass through while most noise at frequencies other than the modulation 
frequency is suppressed. As a result of phase-sensitive detection using lock-in amplification one typically 
obtains the first derivative of the absorption line EPR signal. The application of field modulation, however, 
can cause severe lineshape distortion: limiting modulation-induced line broadening to below 1% of the
undistorted linewidth, $\Delta B^0_{1/2}$, requires small modulation amplitudes ($B_{\mathrm{mod}} < 0.15\,\Delta B^0_{1/2}$ for Lorentzian
lineshapes and $B_{\mathrm{mod}} < 0.3\,\Delta B^0_{1/2}$ for Gaussian lineshapes).
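
The statement that field modulation with phase-sensitive detection yields the first derivative of the absorption line can be checked numerically. The following sketch is an idealized model, not a description of any real spectrometer: it assumes a noise-free Lorentzian absorption, a cosine modulation of zero-to-peak amplitude `b_mod`, and a lock-in that multiplies the detector signal by the reference and averages over one modulation period. All names are illustrative.

```python
import math


def lorentzian(b, b_res, half_width):
    """Lorentzian absorption line of unit height centred at b_res."""
    x = (b - b_res) / half_width
    return 1.0 / (1.0 + x * x)


def lorentzian_derivative(b, b_res, half_width):
    """Analytic first derivative of the Lorentzian with respect to field."""
    x = (b - b_res) / half_width
    return -2.0 * x / (half_width * (1.0 + x * x) ** 2)


def lock_in_output(b0, b_mod, b_res, half_width, n_samples=256):
    """First-harmonic amplitude of the modulated signal, divided by b_mod."""
    acc = 0.0
    for k in range(n_samples):
        phase = 2.0 * math.pi * k / n_samples
        field = b0 + b_mod * math.cos(phase)            # static field plus modulation
        detector = lorentzian(field, b_res, half_width)
        acc += detector * math.cos(phase)               # multiply by the reference
    # Averaging over one period acts as the low-pass filter; the factor 2
    # converts the average into the amplitude of the first harmonic.
    return 2.0 * acc / (n_samples * b_mod)


b_res, half_width = 0.0, 1.0   # arbitrary field units
for b_mod in (0.01, 0.5, 2.0):
    out = lock_in_output(0.3, b_mod, b_res, half_width)
    print(f"b_mod = {b_mod:4.2f}: lock-in = {out:+.4f}, "
          f"derivative = {lorentzian_derivative(0.3, b_res, half_width):+.4f}")
```

For small `b_mod` the lock-in output agrees with the analytic derivative; for modulation amplitudes comparable to the linewidth the two diverge, which is the modulation-induced distortion referred to in the text.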

After the signal emerges from the lock-in amplifier it still contains a considerable amount of noise. Most of 
the noise contributions to the signal can be eliminated by passing the signal through a low-pass filter. The 
filter time constant is a measure of the cutoff frequency of the filter. If accurate linewidth and g-factor
measurements are intended, one must be careful to employ a sufficiently short response time because
lineshape distortions may occur as a result of excessive filtering.

Figure B1.15.5. Effect of small-amplitude 100 kHz field modulation on the detector output current. The static
magnetic field is modulated between the limits $B_a$ and $B_b$. The corresponding detector current varies between
the limits $I_a$ and $I_b$. The upper diagram shows the recorded 100 kHz signal as a function of $B_0$. After [3].

Figure B1.15.6 presents a liquid-phase EPR spectrum of an organic radical measured using a conventional
EPR spectrometer like the one depicted in figure B1.15.4. As is usual, the lines are presented as first
derivatives, $\mathrm{d}\chi''/\mathrm{d}B_0$, of the power absorbed by the spins. The spectrum shows a pronounced pattern of
hyperfine lines arising from two different groups of protons (see also figure B1.15.9). The number, spacing
and intensity of the lines provide information on the molecular and electronic structure of the molecule
carrying the unpaired electron spin. The individual lines have a Lorentzian lineshape with a homogeneous
linewidth determined by $T_2$. The most common cause of inhomogeneously broadened lines giving rise to a
Gaussian lineshape is unresolved hyperfine interactions arising from a large number of nonequivalent nuclei
and anisotropies of the hyperfine coupling, which persist when recording EPR spectra of radicals in solids.

Figure B1.15.6. The EPR spectrum of the perinaphthenyl radical in mineral oil taken at room temperature.
(A) First derivative of the EPR absorption $\chi''$ with respect to the external magnetic field, $B_0$. (B) Integrated
EPR spectrum.

EPR has been successfully applied to radicals in the solid, liquid and gaseous phase. Goniometer techniques 
have been adopted to measure anisotropic magnetic interactions in oriented (e.g. single-crystal) and partially 
oriented (e.g. film) samples as a function of the sample orientation with respect to the external field. Variable 
temperature studies can provide a great deal of information about a spin system and its interactions with its 
environment. Therefore, low-temperature as well as high-temperature EPR experiments can be conducted by 
either heating or cooling the entire cavity in a temperature-controlled cryostat or by heating or cooling the 
sample in a jacket inserted into the cavity. Specialized cavity designs have also been worked out to perform 
EPR studies under specific conditions (e.g. high pressures). Sample irradiation is facilitated through shielded 
openings in the cavity. 

The practical goal of EPR is to measure a stationary or time-dependent EPR signal of the species under
scrutiny and subsequently to determine magnetic interactions that govern the shape and dynamics of the EPR
response of the spin system. The information obtained from a thorough analysis of the EPR signal, however,
may comprise not only the parameters listed in the previous section but also a wide range of other physical
parameters, for example reaction rates or orientation order parameters.


B1.15.4 TIME-RESOLVED CW EPR METHODS 

Although EPR in general has the potential to follow the concentration changes of short-lived paramagnetic 
intermediates, standard CW EPR using field modulation for narrow-band phase-sensitive detection is geared 
for high sensitivity and correspondingly has only a mediocre time resolution. Nevertheless, transient free 
radicals in the course of (photo-)chemical processes can be studied by measuring the EPR line intensity of a 
spectral feature as a function of time at a fixed value of the external magnetic field. Typically, the optimum 
time response of a commercial spectrometer which uses CW fixed-frequency lock-in detection is of the
order of 20 μs. By use of field modulation frequencies higher than the 100 kHz usually employed in
commercial instruments, the time resolution can be increased by about an order of magnitude, which makes 
this method well suited for the study of transient free radicals on a microsecond timescale. 

B1.15.4.1 TRANSIENT EPR SPECTROSCOPY

The time resolution of CW EPR can be considerably improved by removing the magnetic field modulation 
completely. Rather, a suitably fast data acquisition system is employed to directly detect the transient EPR 
signal as a function of time at a fixed magnetic field. In transient EPR spectroscopy (TREPR) [13, 14]
paramagnetic species (e.g. free radicals, radical pairs, triplets or higher multiplet states) are generated on a
nanosecond timescale by a short laser flash or radiolysis pulse and the arising time-dependent signals are
detected in the presence of a weak MW magnetic field. For this purpose the standard EPR spectrometer
shown in figure B1.15.4 needs to be modified. The components for the field modulation may be removed, and
the lock-in amplifier and digitizer are replaced by a fast transient recorder or a digital oscilloscope triggered
by a photodiode in the light path of the laser. The response time of such a spectrometer is potentially
controlled by the bandwidth of each individual unit. Provided that the MW components are adequately
broadbanded and the laser flash is sufficiently short (typically a few nanoseconds), the time resolution can be
pushed to the $10^{-8}$ s range (which is not far from the physical limit given by the inverse MW frequency) if a
resonator with a low Q-value (and hence wide bandwidth) is used. Fortunately, the accompanying sensitivity
loss (see equation (B1.15.29)) can be compensated to a large extent by using resonators with high filling
factors [12]. Furthermore, excellent sensitivity is obtained in studies of photoprocesses where the light-generated
paramagnetic species are typically produced in a state of high electron spin polarization (or, in other
words, far removed from the thermal equilibrium population of the electron spin states). Nevertheless, the EPR
time profiles at a fixed magnetic field position are repeatedly measured in order to improve the signal-to-noise
ratio by a factor $\sqrt{N}$, where $N$ is the number of time traces averaged.
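
The $\sqrt{N}$ improvement from averaging repeated time traces can be illustrated with synthetic data. The sketch below assumes uncorrelated Gaussian noise of equal variance in every trace, which is an idealization; the trace length, the number of traces and the random seed are arbitrary choices.

```python
import random
import statistics


def noisy_trace(rng, n_points, sigma=1.0):
    """One simulated time trace containing only Gaussian noise."""
    return [rng.gauss(0.0, sigma) for _ in range(n_points)]


def average_traces(traces):
    """Point-by-point average of a list of equally long traces."""
    n = len(traces)
    return [sum(column) / n for column in zip(*traces)]


rng = random.Random(1)            # fixed seed so the run is repeatable
n_traces, n_points = 64, 1000
traces = [noisy_trace(rng, n_points) for _ in range(n_traces)]

noise_single = statistics.pstdev(traces[0])
noise_averaged = statistics.pstdev(average_traces(traces))
noise_reduction = noise_single / noise_averaged

print(f"noise reduction = {noise_reduction:.1f}, sqrt(N) = {n_traces ** 0.5:.1f}")
```

Because a signal that is identical in every trace is unaffected by averaging, the reduction of the noise by about $\sqrt{N}$ translates directly into the signal-to-noise gain.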

Following the pioneering work by Kim and Weissman [15], it has been demonstrated that TREPR works for a 
broad range of resonance frequencies from 4 GHz (S-band) up to 95 GHz (W-band). As an example the time-dependent
EPR signal of the photo-generated triplet state of pentacene in a para-terphenyl single crystal
obtained by TREPR at X-band is shown in figure B1.15.7. Note that EPR signals taken in direct detection
appear in the absorption (or dispersion)
mode, not in the usual derivative form associated with field modulation and phase-sensitive detection.
Therefore, positive signals indicate absorptive (A) and negative signals emissive (E) EPR transitions.

Figure B1.15.7. Transient EPR. Bottom: time-resolved EPR signal of the laser-flash-induced triplet state of
pentacene in p-terphenyl. $B_1 = 0.085$ mT. Top: initially, the transient magnetization $\mathbf{M}$ is aligned along $\mathbf{B}_0 \parallel z$. In
the presence of a MW magnetic field $\mathbf{B}_1$ the magnetization precesses about $\mathbf{B}_1 \parallel x'$ (rotating frame
representation).

The following discussion of the time dependence of the EPR response in a TREPR experiment is based on the 
assumption that the transient paramagnetic species is long lived with respect to the spin relaxation parameters. 

When $\omega_1$ is large compared to the inverse relaxation times, i.e. $\omega_1 = \gamma B_1 \gg T_1^{-1}, T_2^{-1}$, or, in other words, for
high MW powers, the signal exhibits oscillations with a frequency proportional to the MW magnetic field $B_1$.
These so-called transient nutations are observed if resonance between an electron spin transition and a
coherent radiation field is suddenly achieved. The phenomenon can be understood when viewing the motion
of the magnetization vector, $\mathbf{M}$, in a
reference frame, $(x', y', z)$, rotating at the frequency $\omega$ of the MW field around the static external field ($\mathbf{B}_0 \parallel z$).
Hence, in the rotating frame $\mathbf{B}_1$ is a stationary field defined along the $x'$-axis as indicated in figure B1.15.7.
Paramagnetic species are created during the laser flash at $t = 0$ with their spins aligned with the applied
magnetic field, which causes an initial magnetization $M_z(0)$ in this direction. This is acted upon by the
radiation field at or near resonance, which rotates it to produce an orthogonal component, $M_{y'}(t)$, to which the
observed signal is proportional. The signal initially increases as $M_{y'}(t)$ grows under the rotation of $\mathbf{M}$ about $\mathbf{B}_1$
(at exact resonance) or $\mathbf{B}_0 + \mathbf{B}_1$ (near resonance), but, while the system approaches a new state via oscillations,
$M_{y'}$ continually decreases under the influence of spin-spin relaxation, which destroys the initial phase
coherence of the spin motion within the $y'z$-plane. In solid-state TREPR, where large inhomogeneous EPR
linewidths due to anisotropic magnetic interactions persist, the long-time behaviour of the spectrometer
output, $S(t)$, is given by


$$S(t) \propto \omega_1 J_0(\omega_1 t)\, e^{-t/(2T_2)},$$ (B1.15.30)

where the oscillation of the transient magnetization is described by a Bessel function $J_0(\omega_1 t)$ of zeroth order,
damped by the spin-spin relaxation time $T_2$. At low MW powers ($\omega_1^2 T_1 T_2 \ll 1$) an exponential decay of the
EPR signal is observed, governed by spin-lattice relaxation:

$$S(t) \propto \omega_1 e^{-t/T_1}.$$ (B1.15.31)

The rise time of the signals — independent of the chosen MW power — is proportional to the inverse 
inhomogeneous EPR linewidth. As can be seen from equations (B1.15.30) and (B1.15.31), a measurement of
the $\omega_1$ dependence of the transient EPR signals provides a straightforward method to determine not only the
relaxation parameters of the spin system but also the strength of the MW magnetic field $B_1$ at the sample.
Spectral information can be obtained from a series of TREPR signals taken at equidistant magnetic field 
points covering the total spectral width. This yields a two-dimensional variation of the signal intensity with 
respect to both the magnetic field and the time axis. Transient spectra can be extracted from such a plot at any 
fixed time after the laser pulse as slices parallel to the magnetic field axis. 
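
The two limiting forms of the transient signal can be evaluated with a short script. The sketch below assumes the high-power form $S(t) \propto \omega_1 J_0(\omega_1 t)\,e^{-t/(2T_2)}$ and the low-power form $S(t) \propto \omega_1 e^{-t/T_1}$, computes $J_0$ from its integral representation so that only the standard library is needed, and takes $\omega_1 = \gamma B_1$ with the free-electron gyromagnetic ratio. The latter holds for an $S = 1/2$ spin with g = 2; for transitions within a triplet the nutation frequency differs by a numerical factor, so the numbers are illustrative only.

```python
import math

GAMMA_E = 1.76085963023e11   # electron gyromagnetic ratio, rad s^-1 T^-1 (CODATA 2018)
J0_FIRST_ZERO = 2.404825557695773


def bessel_j0(x, n=2000):
    """J0(x) = (1/pi) * integral from 0 to pi of cos(x sin(theta)), midpoint rule."""
    h = math.pi / n
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(n)) / n


def signal_high_power(t, omega1, t2):
    """Transient nutation signal (arbitrary units), high MW power limit."""
    return omega1 * bessel_j0(omega1 * t) * math.exp(-t / (2.0 * t2))


def signal_low_power(t, omega1, t1):
    """Exponential decay (arbitrary units), low MW power limit."""
    return omega1 * math.exp(-t / t1)


omega1 = GAMMA_E * 0.085e-3               # B1 = 0.085 mT, assuming S = 1/2 and g = 2
t_first_zero = J0_FIRST_ZERO / omega1     # first zero crossing of the nutation

print(f"omega1/(2 pi) = {omega1 / (2.0 * math.pi) / 1e6:.2f} MHz")
print(f"first zero crossing at t = {t_first_zero * 1e9:.0f} ns")
```

Reading off the position of the first zero crossing (or the nutation period) from a measured trace and inverting this relation is one way to estimate $B_1$ at the sample.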

B1.15.4.2 MW-SWITCHED TIME INTEGRATION METHOD (MISTI)

An alternative method to obtain accurate values of the spin-lattice relaxation time $T_1$ is provided by the
TREPR technique with gated MW irradiation, also called the MW-switched time integration method (MISTI)
[13, 14]. The principle is quite simple. The MW field is switched on with a variable delay $\tau$ after the laser
flash. The amplitude of the transient signal plotted as a function of $\tau$ renders the decay of the spin-polarized
initial magnetization towards its equilibrium value. This method is preferred over the TREPR technique at
low MW power (see equation (B1.15.31)) since the spin system is allowed to relax in the absence of any
resonant MW field in a true spin-lattice relaxation process. The experiment is carried out by adding a PIN
diode MW switch between the MW source and the circulator (see figure B1.15.4), set between a pair of
isolators. Since only low levels of MW power are switched (typically less than 1 W), as opposed to those in 
ESE and FT EPR, the detector need not be protected against high incident power levels. 
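
Extracting $T_1$ from such a delay series amounts to fitting an exponential decay. The sketch below assumes a single-exponential relaxation of the signal amplitude towards a known equilibrium value and uses a linear least-squares fit of $\ln|A(\tau) - A_{\mathrm{eq}}|$ against $\tau$; the data are synthetic and noise-free, and the function name `fit_t1` is hypothetical.

```python
import math


def fit_t1(delays, amplitudes, equilibrium=0.0):
    """Estimate T1 from A(tau) = A_eq + (A_0 - A_eq) * exp(-tau / T1).

    A straight line is fitted to ln|A - A_eq| versus tau; T1 = -1 / slope.
    """
    ys = [math.log(abs(a - equilibrium)) for a in amplitudes]
    n = len(delays)
    mean_x = sum(delays) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in delays)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delays, ys))
    return -sxx / sxy


# Synthetic example: emissive (negative) initial polarization relaxing to ~0
true_t1 = 5.0e-6                                   # s, assumed for illustration
delays = [i * 1.0e-6 for i in range(1, 11)]        # MW switch-on delays tau
amplitudes = [-3.0 * math.exp(-tau / true_t1) for tau in delays]

fitted_t1 = fit_t1(delays, amplitudes)
print(f"fitted T1 = {fitted_t1 * 1e6:.2f} microseconds")
```

With real data the equilibrium amplitude is often small but not zero, and noise makes a non-linear fit of all three parameters preferable to the log-linear shortcut used here.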

As a summary it may be of interest to point out why TREPR spectroscopy and related methods remain 
important in the EPR regime, even though pulsed EPR methods are becoming more and more widespread. (1) 
For the case of an inhomogeneously broadened EPR line the time resolution of TREPR compares favourably 
with pulsed techniques. 
(2) The low MW power levels commonly employed in TREPR spectroscopy do not require any precautions to
avoid detector overload and, therefore, the full time development of the transient magnetization is obtained 
undiminished by any MW detection deadtime. (3) Standard CW EPR equipment can be used for TREPR 
requiring only moderate efforts to adapt the MW detection part of the spectrometer for the observation of the 
transient response to a pulsed light excitation with high time resolution. (4) TREPR spectroscopy proved to be 
a suitable technique for observing a variety of spin coherence phenomena, such as transient nutations [16], 
quantum beats [17] and nuclear modulations [18], that have been useful to interpret EPR data on light-induced 
spin-correlated radical pairs. 


B1.15.5 MULTIPLE RESONANCE TECHNIQUES 

In the previous sections experiments have been discussed in which one frequency is applied to excite and
detect an EPR transition. In multiple resonance experiments two or more radiation fields are used to induce
different transitions simultaneously [19, 20, 21, 22, 23]. These experiments represent elaborations of
standard CW and pulsed EPR spectroscopy, and are often carried out to complement conventional EPR 
studies, or to refine the information which can in principle be obtained from them. 

B1.15.5.1 ELECTRON-NUCLEAR DOUBLE RESONANCE SPECTROSCOPY (ENDOR)

It was noted earlier that EPR may at times be used to characterize the electronic structure of radicals through a 
measurement of hyperfine interactions arising from nuclei that are coupled to the unpaired electron spin. In 
very large radicals with low symmetry, where the presence of many magnetic nuclei results in a complex 
hyperfine pattern, however, the spectral resolution of conventional EPR is very often not sufficient to resolve 
or assign all hyperfine couplings. As early as 1956, George Feher demonstrated that electron-nuclear
double resonance (ENDOR) can greatly improve the spectral resolution [24]. In ENDOR
spectroscopy the electron spin transitions are still used as means of detection because the sensitivity of the 
electron resonance measurement is far greater than that of the nuclear resonance. In brief, an EPR transition is 
saturated, which leads to a collapse of the observed EPR signal as the corresponding state populations 
equalize. If one now simultaneously irradiates the spin system with an RF field in order to induce transitions 
between the nuclear sublevels, the condition of saturation in the EPR transition is lifted as the nuclear sublevel 
populations shift, and there is a partial recovery of the EPR signal. 

ENDOR transitions can be easily understood in terms of a simple system consisting of a single unpaired 
electron spin (S=|) coupled to a single nuclear spin (I=£). The interactions responsible for the various 

splittings are summarized in the following static Hamiltonian: 


H = Hm + Knzi +^hh = &BgS - g n p a B - 1 + SAL (B1.15.32) 

The coupling constants of the hyperfine and the electron Zeeman interactions are scalar as long as radicals in 
isotropic solution are considered, leading to the Hamiltonian 

$$\hat{H} = g_e \beta_e \mathbf{B}_0 \cdot \hat{\mathbf{S}} - g_n \beta_N \mathbf{B}_0 \cdot \hat{\mathbf{I}} + a\, \hat{\mathbf{S}} \cdot \hat{\mathbf{I}}.$$ (B1.15.33)

In the high-field approximation with $B_0 \parallel z$, the energy eigenvalues, classified by the magnetic spin quantum
numbers $M_S$ and $M_I$, are given by

$$E_{M_S, M_I} = g_e \beta_e B_0 M_S - g_n \beta_N B_0 M_I + a M_S M_I$$ (B1.15.34)

where $g_n$ and a may be positive or negative, thus leading to a different ordering of the levels. The energy level
diagram for the case a < 0 and $|a|/2 < g_n \beta_N B_0$ is shown in figure B1.15.8. Adopting the notation $\hbar\omega_e = g_e \beta_e B_0$ and
$\hbar\omega_N = g_n \beta_N B_0$, two EPR transitions are obtained,

$$\omega_{EPR} = \omega_e \pm a/(2\hbar)$$ (B1.15.35)


which obey the selection rule $\Delta M_S = \pm 1$ and $\Delta M_I = 0$. The two ENDOR transitions are

$$\omega_{ENDOR} = |\omega_N \pm a/(2\hbar)|$$ (B1.15.36)

which satisfy the selection rule $\Delta M_S = 0$ and $\Delta M_I = \pm 1$. The absolute value is used in equation (B1.15.36) to
take into account the two cases $|a|/(2\hbar) < |\omega_N|$ and $|a|/(2\hbar) > |\omega_N|$. The corresponding ENDOR spectra are
shown schematically in figures B1.15.8(B) and (C). For $|a|/(2\hbar) < |\omega_N|$, irrespective of the EPR line monitored, two ENDOR lines,
separated by $|a|/\hbar$ and centred at $|\omega_N|$, are observed. For $|a|/(2\hbar) > |\omega_N|$ the two ENDOR transitions are given
by $|a|/(2\hbar) \pm |\omega_N|$: again, two lines are observed, now separated by $2|\omega_N|$ and centred at $|a|/(2\hbar)$.
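
The two coupling regimes described above can be checked numerically. The following is a minimal sketch, assuming the isotropic high-field level scheme of the S = I = 1/2 system with all quantities expressed in frequency units (E/h and ν = ω/2π, here in MHz). The function names and the numerical values (an X-band electron frequency of 9500 MHz, a proton frequency of 14.4 MHz and the two trial couplings) are illustrative assumptions, not values taken from the text.

```python
# Energies and frequencies are expressed as E/h and nu = omega/(2*pi), e.g. in MHz.

def level_energies(nu_e, nu_n, a):
    """Return E/h for the four |M_S, M_I> levels of an S = I = 1/2 system.

    Implements E = nu_e*M_S - nu_n*M_I + a*M_S*M_I (high-field, isotropic case).
    """
    half = (0.5, -0.5)
    return {(ms, mi): nu_e * ms - nu_n * mi + a * ms * mi
            for ms in half for mi in half}


def epr_lines(nu_e, nu_n, a):
    """EPR transitions: Delta M_S = +/-1, Delta M_I = 0."""
    e = level_energies(nu_e, nu_n, a)
    return sorted(abs(e[(0.5, mi)] - e[(-0.5, mi)]) for mi in (0.5, -0.5))


def endor_lines(nu_e, nu_n, a):
    """ENDOR (NMR) transitions: Delta M_S = 0, Delta M_I = +/-1."""
    e = level_energies(nu_e, nu_n, a)
    return sorted(abs(e[(ms, 0.5)] - e[(ms, -0.5)]) for ms in (0.5, -0.5))


if __name__ == "__main__":
    nu_e, nu_n = 9500.0, 14.4          # hypothetical values in MHz
    for a in (-5.12, 40.0):            # weak and strong coupling (hypothetical)
        lo, hi = endor_lines(nu_e, nu_n, a)
        print(f"a/h = {a:6.2f} MHz: ENDOR lines at {lo:.2f} and {hi:.2f} MHz, "
              f"centre {(lo + hi) / 2:.2f} MHz, splitting {hi - lo:.2f} MHz")
```

For the weak coupling the pair is centred at the nuclear frequency and split by |a|/h; for the strong coupling it is centred at |a|/(2h) and split by twice the nuclear frequency.
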

Figure B1.15.8. (A) Left side: energy levels for an electron spin coupled to one nuclear spin in a magnetic
field, S = I = 1/2, $g_n > 0$, a < 0, and $|a|/(2\hbar) < \omega_N$. Right side: schematic representation of the four energy levels with
$|\pm\pm\rangle = |M_S = \pm\tfrac{1}{2}, M_I = \pm\tfrac{1}{2}\rangle$. |+−⟩ = 1, |++⟩ = 2, |−−⟩ = 3 and |−+⟩ = 4. The possible relaxation paths are characterized by the
respective relaxation rates W. The energy levels are separated horizontally to distinguish between the two
electron spin transitions. Bottom: ENDOR spectra shown when $|a|/(2\hbar) < |\omega_N|$ (B) and when $|\omega_N| < |a|/(2\hbar)$ (C).


For the simple system discussed above the advantages of performing double resonance are not so
apparent: two lines are observed using either method, EPR or ENDOR. The situation changes dramatically
when there are several groups of nuclei present, each group i consisting of $n_i$ magnetically equivalent nuclei with
nuclear spin quantum number $I_i$, each one coupling to the unpaired electron spin with the hyperfine constant
$a_i$. While the EPR spectrum will consist
of $\prod_i (2 n_i I_i + 1)$ lines, for each group of equivalent nuclei, no matter how many nuclei there are or what their
spin quantum number is, there will still be only two ENDOR lines separated by $|a_i|/\hbar$ or $2\omega_{N_i}$. Hence, with
increasing number of groups of nuclei the number of ENDOR lines increases only in an additive way. Since 
the ENDOR spectral lines are comparable in width to EPR lines, the reduced number of lines in the ENDOR 
spectrum results in a much greater effective resolution. Therefore, accurate values of the hyperfine couplings 
may be obtained from an ENDOR experiment even under conditions where the hyperfine pattern is not 
resolved in the EPR spectrum. In addition, ENDOR spectra become easier to interpret when there are nuclei 
with different magnetic moments involved. Their ENDOR lines normally appear in different frequency ranges 
and, from their Larmor frequencies, these nuclei can be immediately identified. ENDOR is also well suited
to measuring anisotropic hyperfine and nuclear quadrupole (for nuclei with I ≥ 1) couplings in solids.
As an example, the ENDOR spectrum of the perinaphthenyl radical in liquid solution is
depicted in figure B1.15.9 (see also figure B1.15.6 for a comparison with the CW EPR spectrum).
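
The multiplicative growth of EPR lines versus the additive growth of ENDOR lines is easy to tabulate. The sketch below assumes first-order spectra with no accidental overlap of lines; the function names are illustrative. The example groups (six and three equivalent protons) are the assumed proton sets of the perinaphthenyl radical and are not stated explicitly in the text.

```python
from fractions import Fraction
from math import prod


def epr_line_count(groups):
    """Number of first-order EPR hyperfine lines, prod_i (2*n_i*I_i + 1).

    groups: list of (n_i, I_i) pairs, one per set of equivalent nuclei.
    Assumes that no lines overlap accidentally.
    """
    return prod(int(2 * n * Fraction(spin) + 1) for n, spin in groups)


def endor_line_count(groups):
    """Two ENDOR lines per set of equivalent nuclei, whatever n_i and I_i are."""
    return 2 * len(groups)


if __name__ == "__main__":
    # Assumed example: two sets of equivalent protons (I = 1/2), 6 and 3 nuclei.
    groups = [(6, Fraction(1, 2)), (3, Fraction(1, 2))]
    print("EPR lines:  ", epr_line_count(groups))
    print("ENDOR lines:", endor_line_count(groups))
```

Adding a third set of nuclei multiplies the EPR count by a further factor of (2 n I + 1) but adds only two ENDOR lines.
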

Figure B1.15.9. The ENDOR spectrum of the perinaphthenyl radical in mineral oil taken at room
temperature. |a|/h = 17.68 MHz and |a|/h = 5.12 MHz are the hyperfine coupling constants for the protons in
the two inequivalent ring positions; in field units these correspond to 0.631 mT and 0.183 mT, respectively.


The ENDOR experiment is performed at a constant external magnetic field by applying MW and RF fields in 
a continuous fashion. This technique is called CW ENDOR spectroscopy. The design of an ENDOR 
spectrometer differs only slightly from a basic CW or pulsed EPR spectrometer. A coil located around the 
sample tube within the resonant EPR cavity is used as an element in a RF transmitter circuit and is the source 
of a RF field. The basic elements of the RF circuit include a low-power signal source or sweeper and a high- 
power amplifier to produce an RF output signal that can be scanned over a wide frequency range (e.g. for 
proton ENDOR at X-band from approximately 4 to 30 MHz) at a power level up to 1 kW. To carry out an 
ENDOR experiment, the magnetic field $B_0$ is set at the resonance of one of the observed EPR transitions.
Then, the MW power is increased in order to partially saturate the EPR transition. The degree of saturation is
given by the saturation factor s defined earlier (see equation (B1.15.13)).
Finally, a strong RF oscillating field of varying frequency is applied to induce and saturate a transition within
the nuclear sublevels. When the resonance condition for nuclear transitions is fulfilled, the saturated EPR
transition can be desaturated by the ENDOR transition provided both transitions have energy levels in
common. This desaturation of the EPR transition is detected as a change in the EPR absorption at
characteristic frequencies $\omega_{ENDOR}$ and constitutes the ENDOR response.

Phenomenologically, the ENDOR experiment can be described as the creation of alternative relaxation paths 
for the electron spins, which are excited with microwaves. In the four-level diagram of the S = I = 1/2 system
described earlier (see figure B1.15.8) relaxation can occur via several mechanisms: $W_e$ and $W_N$ describe the
relaxation rates of the electron spins and nuclear spins, respectively. $W_{x1}$ and $W_{x2}$ are cross-relaxation rates
in which electron and nuclear spin flips occur simultaneously. Excitation, for example, of the EPR transition
|−+⟩ ↔ |++⟩ (i.e. 4↔2) will equalize the population of both levels, 4 and 2, if the direct relaxation (characterized
by the relaxation rate $W_e$) cannot compete with the transitions induced by the resonant microwaves.
Simultaneous application of an RF field at a frequency corresponding to the |++⟩ ↔ |+−⟩ (i.e. 2↔1) transition
then opens a relaxation path via $W_e$ and $W_N$ or, more directly, via $W_{x1}$. The extent to which these relaxation
bypasses can compete with the direct $W_e$ route controls the degree of desaturation of the EPR line, and,
therefore, determines the ENDOR signal intensity, which, consequently, does not generally reflect the number
of contributing nuclei (in contrast to EPR and NMR). The signal intensity observed depends very critically on
the balance between the various relaxation rates and the magnitude of the MW and RF fields, $B_1$ and $B_2$,
respectively. Additional parameters to be varied in order to optimize the ENDOR signal-to-noise ratio are the 
radical concentration, the solvent viscosity and the temperature. The amplitude of ENDOR signals is 
furthermore influenced by the enhancement effect which occurs because the nucleus does not only experience 
the time-dependent magnetic field $B_2$ at the RF, but also an additional magnetic field component (the
hyperfine field) due to the magnetic moment of the electron. Therefore, the effective field at the nucleus can
be described as

$$B_2^{\mathrm{eff}} = k B_2$$ (B1.15.37)

where k is the hyperfine enhancement factor. For isotropic HFI, $k = |1 - M_S a/(\hbar\omega_N)|$. The hyperfine enhancement
is one reason for the different intensities of the individual lines of an ENDOR line pair; at the same RF power
the high-frequency line is usually more intense than the low-frequency one. Another reason for asymmetrical 
ENDOR line patterns is the effectiveness of the cross-relaxation paths: $W_{x1}$ is in general different from $W_{x2}$,
thus leading to an asymmetrical relaxation network and, as a consequence, to unequal signal intensities. In 
spectra of single crystals, powders and noncrystalline solids, however, the enhancement factor is governed by 
different hyperfine tensor components. This often leads to unexpected intensity patterns within ENDOR line 
pairs. 
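
For the isotropic case, the unequal enhancement of the two lines of an ENDOR pair can be made concrete. The sketch below assumes the enhancement factor in the form k = |1 − M_S a/(ħω_N)|, written in frequency units as |1 − M_S (a/h)/ν_N|, together with the ENDOR frequency |ν_N − M_S a/h| that follows from the isotropic level scheme. Function names and numerical values are illustrative assumptions.

```python
def enhancement_factor(ms, a, nu_n):
    """Hyperfine enhancement k = |1 - M_S*a/nu_N| (a and nu_N in the same frequency units)."""
    return abs(1.0 - ms * a / nu_n)


def endor_frequency(ms, a, nu_n):
    """ENDOR frequency within the M_S manifold, |nu_N - M_S*a|."""
    return abs(nu_n - ms * a)


if __name__ == "__main__":
    a, nu_n = 17.68, 14.4      # hypothetical proton coupling and Larmor frequency, MHz
    for ms in (0.5, -0.5):
        print(f"M_S = {ms:+.1f}: nu_ENDOR = {endor_frequency(ms, a, nu_n):6.2f} MHz, "
              f"k = {enhancement_factor(ms, a, nu_n):.3f}")
```

Under these assumptions k equals the ratio of the ENDOR frequency to the free nuclear frequency, so the high-frequency line of a pair always experiences the larger effective RF field.
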

Despite the increased resolution of ENDOR compared to EPR, some restrictions concerning the information
content of ENDOR spectra persist: (1) unlike EPR, the relative signal intensities of ENDOR line pairs
belonging to different groups of nuclei do not give any indication of the relative number of nuclei belonging 
to the individual groups; (2) the ENDOR spectrum does not give the sign of the hyperfine couplings, that is, 
one does not know which ENDOR transition belongs to which electron spin state. Both problems are 
addressed in triple resonance, which can be seen as an extension to ENDOR spectroscopy. Therefore, triple 
resonance experiments are very often carried out in order to supplement ENDOR data. 

B1.15.5.2 ELECTRON-NUCLEAR-NUCLEAR TRIPLE RESONANCE

A refinement of the ENDOR experiment is electron-nuclear-nuclear triple resonance, now commonly 
denoted TRIPLE. In TRIPLE experiments one monitors the effect of a simultaneous excitation of two nuclear 
spin transitions on the level of the EPR absorption. Two versions, known as special TRIPLE (ST) and general 
TRIPLE (GT), are routinely performed on commercially available spectrometers. 

(A) SPECIAL TRIPLE (ST) 

The special TRIPLE technique [25, 26] is used for hyperfine couplings $|a|/(2\hbar) < |\omega_N|$. In a typical experiment,
RF is generated at two frequencies: one is fixed at the free nuclear frequency $\omega_N$ appropriate to the sort of
nuclei under scrutiny and the second is swept. These two frequencies are multiplied to obtain the sum and the
difference frequencies, which are used to irradiate the sample. The experiment can be understood
using the energy level and relaxation scheme of figure B1.15.8. Both ENDOR transitions, $\omega_{ENDOR}$ (i.e.
transitions 3↔4 and 1↔2), associated with the same nucleus are simultaneously excited. In cases of
vanishing cross-relaxation the second saturating RF field enhances the efficiency of the relaxation bypass,
thus increasing the signal intensity, particularly in cases where $W_N$ is the rate-limiting step (because
$W_N \ll W_e$). A second advantage of ST resonance over ENDOR is that when both RF fields are sufficiently strong to
completely saturate nuclear transitions the EPR desaturation becomes independent of $W_N$. Consequently, the
line intensities are no longer determined by the relaxation behaviour of the various nuclei, but rather reflect 
the number of nuclei involved in the transition. Finally, ST also has the advantage of higher resolution 
because the effective saturation of nuclear transitions results in smaller observed linewidths compared to 
ENDOR. 

(B) GENERAL TRIPLE (GT) 

In a general TRIPLE (GT) experiment one particular ENDOR transition is pumped with the first RF while the 
second RF is scanned over the whole range of nuclear resonances [27]. Therefore, nuclear transitions of 
different sets of nuclei of the same kind or of different kinds are saturated simultaneously and the effect on the 
ENDOR transitions for all the hyperfine couplings in the system is measured. Clearly, ST is included within 
GT. From the characteristic intensity changes of the high-frequency and low-frequency signals compared with 
those of the ENDOR signals the relative signs of the hyperfine coupling constants can easily be determined. 

B1.15.5.3 ELECTRON-ELECTRON DOUBLE RESONANCE (ELDOR)

ELDOR is the acronym for electron-electron double resonance. In an ELDOR experiment [28] one observes a 
reduction in the EPR signal intensity of one hyperfine transition that results from the saturation of another 
EPR transition within the spin system. ELDOR measurements are still relatively rare but the experiment is 
firmly established in the EPR repertoire. 

With the help of the four-level diagram of the S = I = 1/2 system (see figure B1.15.8), two common ways of recording
ELDOR spectra will be illustrated. In frequency-swept ELDOR the magnetic field is set at a value that
satisfies the resonance condition for one of the two EPR transitions, e.g. 4↔2, at the fixed observe klystron
frequency, $\omega_{obs}$. The pump klystron is then turned on and its frequency, $\omega_{pump}$, is swept. When the pump
frequency passes through the value
that satisfies the resonance condition of the 3↔1 transition, there is a decrease in the signal at the frequency
$\omega_{obs}$ which constitutes the ELDOR signal. In field-swept ELDOR the pumping and observing MW
frequencies are held fixed at predetermined values and the magnetic field is swept through the region of 
resonance. ELDOR experiments are technically more difficult than ENDOR: simultaneous EPR in one 
magnetic field for two different transitions requires irradiation simultaneously at two MW frequencies. That 
is, one requires a resonator tunable to two MW frequencies separated by a multiple of the hyperfine coupling. 
The development of loop-gap and split-ring resonators has, because of their wide bandwidth and the 
feasibility of high filling factors, made ELDOR a truly practical technique. 

To analyse ELDOR responses, the reduction of the observed EPR transition at $\omega_{obs}$ is expressed
quantitatively in terms of the ELDOR reduction factor

$$R = \frac{I(\omega_{pump} = 0) - I(\omega_{pump})}{I(\omega_{pump} = 0)}$$ (B1.15.38)

where $I(\omega_{pump} = 0)$ is the observed EPR intensity with pump power off, and $I(\omega_{pump})$ is the intensity with
pump power on. The ELDOR technique is very sensitive to the various relaxation mechanisms involved. For
the S = I = 1/2 system R may be expressed in terms of the six relaxation rates between the four energy levels that
are indicated in figure B1.15.8. With the assumption $W_{e13} = W_{e24} = W_e$ and $W_{N12} = W_{N34} = W_N$ the ELDOR
reduction factor is given by

$$R = \frac{W_N^2 - W_{x1} W_{x2}}{W_e (2 W_N + W_{x1} + W_{x2}) + (W_N + W_{x1})(W_N + W_{x2})}$$ (B1.15.39)

which shows that the ELDOR response will be a reduction if $W_N^2 > W_{x1} W_{x2}$. If modulation of dipolar hyperfine
couplings is the dominant relaxation mechanism, this condition can be fulfilled for dilute radical 
concentrations at low temperatures. At high concentrations or sufficiently high temperatures, Heisenberg spin 
exchange or chemical exchange, which tends to equalize the population of all spin levels, is the dominant 
ELDOR mechanism. 
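
The dependence of the ELDOR reduction factor on the relaxation rates can be explored numerically. The following sketch assumes the commonly quoted four-level result R = (W_N² − W_x1 W_x2) / [W_e(2W_N + W_x1 + W_x2) + (W_N + W_x1)(W_N + W_x2)]; the numerator in particular is an assumption chosen to be consistent with the stated condition for a reduction, and the rate values are arbitrary illustrative numbers.

```python
def eldor_reduction(w_e, w_n, w_x1, w_x2):
    """ELDOR reduction factor R for the S = I = 1/2 four-level scheme.

    Assumed form: R = (W_N^2 - W_x1*W_x2) /
                      (W_e*(2*W_N + W_x1 + W_x2) + (W_N + W_x1)*(W_N + W_x2)).
    All rates must be given in the same units.
    """
    numerator = w_n ** 2 - w_x1 * w_x2
    denominator = w_e * (2 * w_n + w_x1 + w_x2) + (w_n + w_x1) * (w_n + w_x2)
    return numerator / denominator


if __name__ == "__main__":
    # Arbitrary rates in units of W_e.
    print("no cross-relaxation:   ", eldor_reduction(1.0, 1.0, 0.0, 0.0))
    print("strong cross-relaxation:", eldor_reduction(1.0, 1.0, 2.0, 2.0))
```

In this form R is positive (a reduction) only when nuclear relaxation outweighs the product of the cross-relaxation rates, and it approaches unity when nuclear relaxation is much faster than every other process.
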

ELDOR has been employed to study a number of systems such as inorganic compounds, organic compounds, 
biologically important compounds and glasses. The potential of ELDOR for studying slow molecular motions 
has been recognized by Freed and coworkers [29, 30].


B1.15.6 PULSED EPR SPECTROSCOPY 

By far the greatest advantage of pulsed EPR [31, 32] lies in its ability to manipulate the spin system nearly at
will and, thus, to measure properties that are not readily available from the CW EPR spectra. Nevertheless, 
EPR has long remained a domain of CW methods. In contrast to the rapid development of pulsed NMR 
spectroscopy, the utilization of the time domain in EPR took a much longer time, even though the underlying 
principles are essentially the same. There are several reasons for this slow development of pulsed EPR. (1) 
The large energies involved in electron spin interactions (see figure B1.15.3) can give rise to spectral widths
of the order of 10-25% of the carrier frequency
(at X-band) as opposed to the ppm scale which applies to NMR. Consequently, with the exception of some
organic radicals in solution and a few defect centres in single crystals, it is technically impossible to excite the
entire EPR spectrum by a pulse of electromagnetic radiation. (2) CW EPR records derivatives of absorption
spectra by using magnetic field modulation in a range between 10 kHz and 100 kHz, a method that takes
advantage of narrowband detection at the modulation frequency (see figure B1.15.5) and of better resolution
of the derivative as compared to the absorption lineshape (see figure B1.15.6). Calculating the derivative from
the absorption lineshape obtained with pulsed methods results in a decrease in the signal-to-noise ratio. For
these reasons, CW EPR is typically more sensitive than pulsed EPR at a given resolution. (3) Finally, the fact
that electron spin relaxation times are orders of magnitude shorter than the nuclear spin relaxation times
encountered in NMR makes the technology required to perform pulsed EPR experiments much more 
demanding. 

In recent years, however, enormous progress has been made and with the availability of the appropriate MW 
equipment pulsed EPR has now emerged from its former shadowy existence. Fully developed pulse EPR 
instrumentation is nowadays commercially available [31, 33]. 

The practical goal for pulsed EPR is to devise and apply pulse sequences in order to isolate pieces of 
information about a spin system and to measure that information as precisely as possible. To achieve this goal 
it is necessary to understand how the basic instrumentation works and what happens to the spins during the 
measurement. 

B1.15.6.1 PULSES AND THEIR EFFECTS

For an understanding of pulsed excitation of spin ensembles it is of fundamental importance to realize that 
radiation pulses actually contain ranges of frequencies: a burst of monochromatic microwaves at frequency
$\omega_{MW}$ and of pulse duration $t_p$ translates into a frequency spectrum of the pulse that has field components at all
frequencies. The amplitude of the field drops off as one moves away from the carrier frequency $\omega_{MW}$
according to $B_1(\omega) \propto B_1 \sin(\Delta\omega\, t_p/2)/(\Delta\omega\, t_p/2)$, where $\Delta\omega = \omega - \omega_{MW}$. The excitation bandwidth of a specific pulse depends only on
the pulse duration, $t_p$.
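
The inverse relation between pulse length and excitation bandwidth can be illustrated with a short calculation. The sketch below assumes an ideal rectangular pulse, whose field amplitude at an offset Δν from the carrier is proportional to sin(πΔν t_p)/(πΔν t_p), i.e. the Fourier transform of a rectangular envelope. The function names and the 5 ns pulse length are illustrative.

```python
import math


def excitation_profile(offset_hz, t_p):
    """Relative field amplitude at a frequency offset from the carrier.

    Assumes an ideal rectangular pulse of duration t_p (seconds):
    profile = sin(x)/x with x = pi * offset * t_p = (delta omega) * t_p / 2.
    """
    x = math.pi * offset_hz * t_p
    return 1.0 if x == 0 else math.sin(x) / x


def first_null_offset(t_p):
    """Offset (Hz) of the first zero of the profile, 1/t_p."""
    return 1.0 / t_p


if __name__ == "__main__":
    t_p = 5e-9  # hypothetical 5 ns pulse
    for offset_mhz in (0, 50, 100, 200):
        amp = excitation_profile(offset_mhz * 1e6, t_p)
        print(f"offset {offset_mhz:4d} MHz: relative amplitude {amp:+.3f}")
```

Halving the pulse length doubles the offset of the first null, which is the sense in which the bandwidth depends only on the pulse duration.
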

The effect of an MW pulse on the macroscopic magnetization can be described most easily using a coordinate 
system (x', y', z) which rotates with the frequency $\omega_{MW}$ about the z-axis defined by the applied field $B_0$.
Initially, the net magnetic moment vector M is in its equilibrium position oriented parallel to the direction of
the strong external field. In the rotating frame, $B_1$ is a stationary field, which is assumed to be oriented
parallel to the x'-axis of the rotating coordinate system. The result of applying a short intense MW pulse is to
rotate the magnetization M about the axis defined by $B_1$, i.e. the x'-axis, through the flip angle
$\theta = \gamma_e B_1 t_p = \omega_1 t_p$, expressed in radians. When the duration of the MW radiation at a given MW power level is
just long enough to flip M into the x'y'-plane, the pulse is defined as a π/2-pulse. Immediately after the
cessation of the pulse, M has been rotated with its magnitude unaltered (if relaxation phenomena are
negligible during the excitation pulse) to an orientation perpendicular to $B_1$ at angle θ with respect to $B_0 \parallel z$.
After this perturbation the system is then allowed to return to its equilibrium, or, after an appropriate delay,
additional pulses with specific flip angles and phases are applied to further manipulate the spin system. With
suitable apparatus, that is, the detection system aligned in the direction of the y'-axis of the rotating axis
system, the temporal behaviour of the y'-component of the magnetization can be followed. The normalized FT
of the function $M_{y'}(t)$ provides the lineshape which is analogous to that obtained from CW EPR experiments
under nonsaturating conditions. 
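
The flip-angle relation θ = γ_e B_1 t_p fixes the MW field strength needed for a given pulse length. The sketch below assumes the free-electron value γ_e/2π ≈ 28.025 GHz/T and an on-resonance rectangular pulse with relaxation neglected; the sense of rotation about x' (chosen here so that a π/2-pulse carries M from z to +y') is a convention adopted for illustration only, as are the function names.

```python
import math

GAMMA_E_OVER_2PI = 28.025e9  # Hz/T, approximate free-electron value (assumed)


def flip_angle(b1_tesla, t_p):
    """Flip angle theta = gamma_e * B1 * t_p in radians."""
    return 2 * math.pi * GAMMA_E_OVER_2PI * b1_tesla * t_p


def b1_for_flip(theta, t_p):
    """MW field amplitude (tesla) needed to reach flip angle theta in time t_p."""
    return theta / (2 * math.pi * GAMMA_E_OVER_2PI * t_p)


def rotate_about_x(m, theta):
    """Rotate (Mx', My', Mz) about x' by theta.

    Convention (an assumption): theta = pi/2 takes +z into +y'.
    The length of the vector is unchanged.
    """
    mx, my, mz = m
    c, s = math.cos(theta), math.sin(theta)
    return (mx, my * c + mz * s, -my * s + mz * c)


if __name__ == "__main__":
    b1 = b1_for_flip(math.pi / 2, 5e-9)
    print(f"B1 for a 5 ns pi/2-pulse: {b1 * 1e3:.2f} mT")
    print("M after the pulse:", rotate_about_x((0.0, 0.0, 1.0), flip_angle(b1, 5e-9)))
```

Because the required B_1 scales inversely with t_p, shortening the pulse by a factor of two calls for twice the field amplitude and hence four times the MW power.
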

B1.15.6.2 INSTRUMENTATION

The design of a pulsed EPR spectrometer depends heavily on the required pulse length and pulse power which 
in turn are mainly dictated by the relaxation times of the paramagnetic species to be studied, but also by the 
type of experiment performed. When pulses of the order of a few nanoseconds are required (either to compete
with the relaxation times or to excite a broad spectral range) not only is high MW power needed to fulfill the
condition $\gamma_e B_1 t_p = \pi/2$, but also the whole design of such a high power spectrometer becomes much more
complex and the construction is more expensive. In most of today's pulsed EPR spectrometers the MW pulses 
are formed on a low-power level by fast switching diodes after the CW source. These low-power pulses are 
then fed into a pulsed high-power MW amplifier (typically a travelling-wave tube amplifier) capable of giving 
the requisite high power up to a few kW. The amplified pulses are then directed into a resonant cavity to 
excite the EPR transitions of the system. The MW power in the resonator grows as $[1 - \exp(-\omega_{MW} t/Q)]$ and
decays as $\exp(-\omega_{MW} t/Q)$ in response to a square, resonant pulse. An important consideration in a pulsed EPR
spectrometer is the detection deadtime, or how soon after a pulse the signal $M_{y'}(t)$ can be measured. Typically,
the deadtime is taken to be the time when ringing from the resonator equals thermal noise. The choice of an 
appropriate resonator for pulsed EPR experiments is therefore always influenced by the conflicting demands 
for a short deadtime and good sensitivity: a low quality factor Q for bandwidth coverage and fast instrumental 
response should be combined with concentrated MW fields for short pulses and high filling factor, the latter in 
partial compensation for the loss in sensitivity in favour of fast time resolution. The spin system responds to 
the exciting MW pulses by producing a signal at a later time when the incident pulses are off. A low-noise 
amplifier amplifies the signal to a level well above the noise floor of the detector. Standard Schottky barrier 
diodes can be used as detectors up to a bandwidth of 5 MHz. For broader bandwidths, multiplying mixers can 
be employed to downconvert the signal from the sample to a video signal centred at zero frequency. (The 
mixer output is the sum or the difference of two input frequencies and the signal amplitudes are proportional 
to the input amplitudes.) A quadrature mixer has two inputs, one for the signal and one for a reference from 
the master MW oscillator. Its two outputs are in quadrature with each other: one is out of phase by 90° and the 
other in phase with the pulse phase. Quadrature detection means that, unlike the case in other spectrometers, 
the reference arm needs no phase shifter since the phase of the recorded signal can be adjusted digitally by 
taking a linear combination of the two quadrature components. Typically, the low-noise MW amplifier and 
the mixer detector are very sensitive to high-power reflections from the cavity and, therefore, have to be 
protected during the excitation pulse. A gated PIN-diode switch in front of the amplifier strongly attenuates 
the input of the detection system during the excitation pulses and, therefore, avoids saturation or permanent 
damage. 
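
The trade-off between quality factor and deadtime follows directly from the exponential ring-down of the resonator. The sketch below assumes that the stored power decays as exp(−ω_MW t/Q) after the pulse and takes the deadtime as the moment the ringing reaches the thermal noise power kTB; the noise model, the function names and all numerical values are illustrative assumptions.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def thermal_noise_power(temperature_k, bandwidth_hz):
    """Thermal noise power k*T*B in watts (assumed noise model)."""
    return BOLTZMANN * temperature_k * bandwidth_hz


def ringdown_deadtime(q, nu_mw_hz, pulse_power_w, noise_power_w):
    """Time for power decaying as exp(-omega_MW*t/Q) to fall from pulse to noise level."""
    omega = 2 * math.pi * nu_mw_hz
    return (q / omega) * math.log(pulse_power_w / noise_power_w)


if __name__ == "__main__":
    noise = thermal_noise_power(300.0, 200e6)      # hypothetical 200 MHz bandwidth
    for q in (100, 500, 2000):                      # hypothetical quality factors
        t_d = ringdown_deadtime(q, 9.5e9, 1e3, noise)
        print(f"Q = {q:5d}: ring-down to the noise floor takes {t_d * 1e9:7.1f} ns")
```

The deadtime grows linearly with Q but only logarithmically with pulse power, which is why lowering Q is the more effective way to shorten it.
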

B1.15.6.3 PULSE EPR METHODS 

(A) FOURIER TRANSFORM EPR (FT EPR) 

All operating principles are the same as in FT NMR. A single short and intense MW pulse (typically a π/2-pulse
along x') is applied to flip the magnetizations into the x'y'-plane of the rotating frame (see figure
B1.15.10(A)). The induced signal proportional to $M_{y'}$ will decay due to transverse relaxation or sample
inhomogeneities. This process is called free induction decay (FID). The complete spectrum is obtained
without the need of a field sweep via FT of the FID. Under most conditions, the FT EPR spectrum measured
using a single excitation pulse corresponds exactly to the CW EPR spectrum. Today's state-of-the-art pulsed
EPR spectrometers feature π/2-pulse lengths of $t_p \approx 5$ ns or less, corresponding to an excitation bandwidth of
roughly 200 MHz. Therefore, FT EPR is applicable to not too wide spectral patterns consisting of narrow
lines (as typical for free radicals in solution) with long enough $T_2$ so that the
FID does not die away before the deadtime has elapsed. In the case of inhomogeneously broadened EPR lines
(as typical for free radicals in solids) the dephasing of the magnetizations of the individual spin packets
(which all possess slightly different resonance frequencies) will be complete within the detection deadtime
and, therefore, the FID signal will usually be undetectable.
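
How a spectrum emerges from an FID can be sketched with a small synthetic example. The code below builds a complex rotating-frame FID from a few exponentially decaying lines and applies a plain discrete Fourier transform; it assumes ideal quadrature detection, no deadtime and a single T_2 for all lines. The function names, line positions, sampling interval and T_2 are hypothetical.

```python
import cmath
import math


def synth_fid(lines, n, dt, t2):
    """Complex rotating-frame FID: a sum of exponentially decaying oscillations.

    lines: list of (offset_hz, amplitude) relative to the MW carrier.
    """
    return [math.exp(-k * dt / t2)
            * sum(amp * cmath.exp(2j * math.pi * f * k * dt) for f, amp in lines)
            for k in range(n)]


def dft_spectrum(fid, dt):
    """Plain O(n^2) discrete Fourier transform; returns sorted (offset_hz, magnitude) pairs."""
    n = len(fid)
    out = []
    for j in range(n):
        s = sum(x * cmath.exp(-2j * math.pi * j * k / n) for k, x in enumerate(fid))
        offset = (j if j < n // 2 else j - n) / (n * dt)   # map bins to +/- offsets
        out.append((offset, abs(s)))
    return sorted(out)


def strongest_line(spectrum):
    """Offset (Hz) of the largest magnitude in the spectrum."""
    return max(spectrum, key=lambda p: p[1])[0]


if __name__ == "__main__":
    n, dt, t2 = 256, 2e-9, 200e-9          # hypothetical sampling and T2
    step = 1.0 / (n * dt)                  # frequency resolution of the transform
    lines = [(10 * step, 1.0), (-20 * step, 0.5)]
    spec = dft_spectrum(synth_fid(lines, n, dt, t2), dt)
    print(f"strongest line at {strongest_line(spec) / 1e6:.2f} MHz offset")
```

A production analysis would use a fast Fourier transform and phase correction; the direct sum is used here only to keep the sketch dependency-free.
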

Figure B1.15.10. FT EPR. (A) Evolution of the magnetization during an FT EPR experiment (rotating frame
representation). (B) The COSY FT EPR experiment.

In complete analogy to NMR, FT EPR has been extended into two dimensions. Two-dimensional correlation 
spectroscopy (COSY) is essentially subject to the same restrictions regarding excitation bandwidth and 
detection deadtime as was described for one-dimensional FT EPR. In 2D-COSY EPR a second time 
dimension is added to the FID collection time by a preparatory pulse in front of the FID detection pulse and 
by variation of the evolution time $t_1$ between them (see figure B1.15.10(B)). The FID is recorded during the
detection period of duration $t_2$, which begins with the second π/2-pulse. For each $t_1$ the FID is collected, then
the phase of the first pulse is advanced by 90°, and a second set of FIDs is collected. The two sets of FIDs,
whose amplitudes oscillate as functions of $t_1$, then undergo a two-dimensional complex Fourier
transformation, generating a spectrum over the two frequency variables $\omega_1$ and $\omega_2$.
The peaks along the leading diagonal $\omega_1 = \omega_2$ correspond to the usual absorption spectrum, whereas the
cross-peaks (peaks removed from the diagonal) provide evidence for cross-correlations.

(B) ELECTRON-SPIN ECHO (ESE) METHODS 


Under conditions where the rapid decay of the FID following a single excitation pulse is governed by 
inhomogeneous broadening the dephasing of the individual spin packets in the x'y'-plane can be reversed by
the application of a second MW pulse in an ESE experiment (see figure B1.15.11) [34]. As before, the
experiment begins with the net electron spin magnetization M aligned along the magnetic field direction z. At
the end of the first π/2-pulse, which is applied at the Larmor precession frequency $\omega_0$, with the amplitude $B_1$
pointing along the x'-direction, the net magnetic moment is in the equatorial plane. Immediately, the
magnetization starts to decay as different spin packets precess about z at their individual Larmor frequencies
$\omega_i \neq \omega_0$. In the rotating frame the contributions of different spin packets to the magnetization $M_{y'}$ appear to
fan out as shown in figure B1.15.11(A): viewed in the laboratory frame some spins would appear to precess
faster ($\omega_i > \omega_0$) and some slower ($\omega_i < \omega_0$) than the average. As a result, the FID decays rapidly and after a
short time there is no detectable signal. From this FID an echo can be generated by means of a second MW
pulse, applied at time τ after the first pulse. The second pulse is just long enough to turn the magnetization
vectors through 180° about the x'-axis. The original precession frequencies and the directions of rotation of
the individual components will remain unaltered and, therefore, the magnetizations will rotate toward each
other in the x'y'-plane until they refocus after the same time τ into a macroscopic magnetic moment along the
−y'-axis. At this point the spin alignment produces a microwave field $B_1$ in the cavity corresponding to an
emission signal that is referred to as an echo [34].
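
The refocusing described above can be reproduced with a toy ensemble of spin packets. The sketch below represents the transverse magnetization of each packet as a complex number m = M_x' + i M_y', assumes ideal, infinitely short pulses and no relaxation, and places the magnetization along +y' after the π/2-pulse (a convention chosen so that the echo forms along −y'). The packet offsets, the pulse spacing and the function name are hypothetical.

```python
import cmath
import math


def two_pulse_signal(offsets_hz, tau, t):
    """Net transverse magnetization (normalized) at time t after the pi/2-pulse.

    Each packet precesses at its own offset; at t = tau an ideal pi-pulse about x'
    reverses the y'-component, i.e. m -> conj(m) for m = Mx' + i*My'.
    """
    total = 0j
    for f in offsets_hz:
        w = 2 * math.pi * f
        if t < tau:
            m = 1j * cmath.exp(1j * w * t)                 # free dephasing from +y'
        else:
            m_tau = 1j * cmath.exp(1j * w * tau)           # state just before the pi-pulse
            m = m_tau.conjugate() * cmath.exp(1j * w * (t - tau))
        total += m
    return total / len(offsets_hz)


if __name__ == "__main__":
    offsets = [-50e6 + 0.5e6 * k for k in range(201)]      # hypothetical 100 MHz wide line
    tau = 200e-9
    for t in (0.0, tau, 1.5 * tau, 2 * tau):
        m = two_pulse_signal(offsets, tau, t)
        print(f"t = {t * 1e9:5.0f} ns: |M_xy| = {abs(m):.3f}, M_y' = {m.imag:+.3f}")
```

The net signal vanishes well before the second pulse and is fully restored at t = 2τ with the opposite sign of M_y', because the phase accumulated by each packet before the π-pulse is exactly undone afterwards.
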

As the spins precess in the equatorial plane, they also undergo random relaxation processes that disturb their 
movement and prevent them from coming together fully realigned. The longer the time τ between the pulses
the more spins lose coherence and consequently the weaker the echo. The decay rate of the two-pulse echo
amplitude is described by the phase memory time, $T_M$, which is the time span during which a spin can
remember its position in the dephased pattern after the first MW pulse. $T_M$ is related to the homogeneous
linewidth of the individual spin packets and is usually only a few microseconds, even at low temperatures. 
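
In practice the phase memory time is obtained by fitting the echo amplitude recorded at a series of pulse separations. The sketch below assumes a single-exponential decay E(τ) = E_0 exp(−2τ/T_M), which is a common but not universal model (stretched exponentials are often needed in solids), and uses a log-linear least-squares fit. The function name and the synthetic data are hypothetical.

```python
import math


def fit_phase_memory_time(taus, amplitudes):
    """Estimate T_M from two-pulse echo amplitudes.

    Assumes E(tau) = E0 * exp(-2*tau/T_M); the slope of ln E versus tau is -2/T_M.
    """
    ys = [math.log(a) for a in amplitudes]
    n = len(taus)
    mean_x = sum(taus) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(taus, ys))
             / sum((x - mean_x) ** 2 for x in taus))
    return -2.0 / slope


if __name__ == "__main__":
    true_tm = 2e-6                                        # hypothetical T_M of 2 microseconds
    taus = [0.2e-6 * k for k in range(1, 11)]
    echoes = [math.exp(-2 * t / true_tm) for t in taus]   # noise-free synthetic data
    print(f"fitted T_M = {fit_phase_memory_time(taus, echoes) * 1e6:.3f} microseconds")
```

With real data, points close to the noise floor should be excluded or down-weighted, since taking the logarithm amplifies their scatter.
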

The two-pulse sequence π/2-τ-π-τ is not the only sequence which leads to the formation of an echo. A pulse
sequence which has proven to have particular value consists of three π/2-pulses as depicted in figure
B1.15.11(B). In this three-pulse sequence with pulse intervals τ and T a so-called stimulated echo is formed after an
interval τ following the third pulse. The mechanism of formation of the stimulated echo is a little more
complicated than that of the primary echo and the reader is referred to some excellent review articles [32, 35,
36] for a comprehensive discussion of this topic. Here it is sufficient to mention that with the second π/2-pulse
the y'-components of the dephased magnetization pattern are temporarily stored in the x'z-plane where they
remain during the waiting time T. The third MW pulse brings the $M_z$ magnetizations back into the x'y'-plane,
where they continue their time evolution and give rise to the stimulated echo at time τ after the third pulse.
The characteristic time of the three-pulse echo decay as a function of the waiting time T is much longer than
the phase memory time $T_M$ (which governs the decay of a two-pulse echo as a function of τ), since the phase
information is stored along the z-axis where it can only decay via spin-lattice relaxation processes or via spin 
diffusion. 

Figure B1.15.11. Formation of electron spin echoes. (A) Magnetization of spin packets i, j, k and l during a
two-pulse experiment (rotating frame representation). (B) The pulse sequence used to produce a stimulated
echo. In addition to this echo, which appears at τ after the third pulse, all possible pairs of the three pulses
produce primary echoes. These occur at times 2τ, 2(τ+T) and (τ+2T).
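
When setting detection gates it helps to list where each echo of the three-pulse sequence falls. The sketch below assumes ideal pulses at times 0, τ and τ + T and measures all echo positions from the first pulse; the function name and dictionary keys are illustrative.

```python
def three_pulse_echo_times(tau, big_t):
    """Echo positions, measured from the first pulse, for pulses at 0, tau and tau + T.

    A primary echo from a pair of pulses separated by d appears a time d after
    the second pulse of the pair; the stimulated echo appears tau after pulse 3.
    """
    return {
        "primary, pulses 1 and 2": 2 * tau,
        "stimulated": 2 * tau + big_t,
        "primary, pulses 2 and 3": tau + 2 * big_t,
        "primary, pulses 1 and 3": 2 * (tau + big_t),
    }


if __name__ == "__main__":
    # Hypothetical timings in nanoseconds.
    for name, t in three_pulse_echo_times(200, 1000).items():
        print(f"{name:26s} at {t} ns")
```

The listing makes it easy to spot values of τ and T for which an unwanted primary echo would coincide with the stimulated echo.
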

In electron-spin-echo-detected EPR spectroscopy, spectral information may, in principle, be obtained from a
Fourier transformation of the second half of the echo shape, since it represents the FID of the refocused
magnetizations, now recorded, however, with much reduced deadtime problems. For the inhomogeneously
broadened EPR lines considered here, however, the FID, and therefore also the spin echo, shows little structure.
For this reason, the amplitude of the echo is used as the main source of information in ESE experiments.
Recording the intensity of the two-pulse or three-pulse echo amplitude as a function of the external magnetic
field defines electron-spin-echo- (ESE-) detected EPR spectroscopy. Such a field-swept ESE spectrum is similar
to the conventional CW EPR spectrum except for the fact that the lines appear in absorption and not in the
more familiar first derivative form.


ESE-detected EPR spectroscopy has been used advantageously for the separation of spectra arising from
different paramagnetic species according to their different echo decay times. Furthermore, field-swept ESE
spectroscopy is superior to conventional CW EPR when measuring very broad spectral features. This is
because the field modulation amplitudes used in CW EPR to detect the first derivative of the signal are often 
too small compared to the width of the EPR line, so that the gradient of the absorption signal becomes very 
small. Such features are either invisible in CW EPR or obscured by baseline drifts, while they can be well 
distinguished in the absorption spectra with ESE-detected EPR. 

In electron spin echo relaxation studies, the two-pulse echo amplitude, as a function of the pulse separation
time τ, gives a measure of the phase memory relaxation time $T_M$ from which $T_2$ can be extracted if $T_1$-effects
are taken into consideration. Problems may arise from spectral diffusion due to incomplete excitation of the
EPR spectrum. In this case some of the transverse magnetization may leak into adjacent parts of the spectrum
that have not been excited by the MW pulses. Spectral diffusion effects can be suppressed by using the
Carr-Purcell-Meiboom-Gill pulse sequence, which is also well known in NMR. The experiment involves using a
sequence of π-pulses separated by 2τ and can be denoted as [π/2-(τ-π-τ-echo)_n]. A series of echoes separated
by 2τ is generated and the decay in their amplitudes is characterized by $T_M$.

The other important (spin-lattice) relaxation time T_1 is accessible with the help of an additional preparation, 
e.g. an inversion (i.e. a π-pulse) or saturation pulse (a single long pulse or a chain of short π/2-pulses) placed 
with a variable delay time T in front of a two-pulse ESE sequence. The T_1 information is then extracted from 
the dependence of the echo amplitude on the interval T. Experiments of this type are generally called 
'inversion recovery' or 'saturation recovery' experiments. In principle T_1 can also be estimated from the 
amplitude of the stimulated echo, as the z-magnetization relaxes towards thermal equilibrium during the 
variable pulse delay time T. Here, inaccuracies in measuring T_1 may again originate from spectral diffusion 
and the interaction between the electron spin and the nuclear spins which can affect the amplitude of the echo. 
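
As a rough illustration of how T_1 could be read from an inversion recovery trace, the sketch below assumes an 
ideal inversion pulse and a mono-exponential recovery, M(T) = M_0 (1 - 2 exp(-T/T_1)). This idealized model is an 
assumption, not a result from the text above, and it ignores the spectral diffusion and nuclear modulation 
effects just mentioned. The function names and the numerical value of T_1 are hypothetical. 

```python
import math


def inversion_recovery(delay, t1, m0=1.0):
    """Echo amplitude after an ideal inversion pulse and a recovery delay.

    Assumes mono-exponential recovery; delay and t1 share the same time unit.
    """
    return m0 * (1.0 - 2.0 * math.exp(-delay / t1))


def t1_from_zero_crossing(delay_null):
    """In the ideal model the echo vanishes at delay_null = t1 * ln(2)."""
    return delay_null / math.log(2.0)


if __name__ == "__main__":
    t1 = 50.0  # hypothetical T_1 in microseconds
    delay_null = t1 * math.log(2.0)
    print(inversion_recovery(0.0, t1))         # fully inverted echo
    print(inversion_recovery(delay_null, t1))  # echo passes through zero
    print(t1_from_zero_crossing(delay_null))   # recovers the assumed T_1
```

In practice the full recovery curve would be fitted rather than a single zero crossing, because incomplete 
inversion shifts the null point. 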

The electron-spin echo envelope modulation (ESEEM) phenomenon [37, 38] is of primary interest in pulsed 
EPR of solids, where anisotropic hyperfine and nuclear quadrupole interactions persist. The effect can be 
observed as modulations of the echo intensity in two-pulse and three-pulse experiments in which τ or T is 
varied. In liquids the modulations are averaged to zero by rapid molecular tumbling. The physical origin of 
ESEEM can be understood in terms of the four-level spin energy diagram for the S = I = 1/2 model system 
introduced earlier to describe ENDOR (see figure B1.15.8). So far, however, only isotropic hyperfine 
couplings have been considered, leading to an EPR spectrum of this system that comprises the two allowed 
transitions 1↔3 and 2↔4 with ΔM_S = ±1 and ΔM_I = 0. The situation is different for the case where the 
hyperfine couplings are anisotropic and, in particular, of the same order of magnitude as the nuclear Zeeman 
couplings. Because of the anisotropic nature of the interactions, the energy levels of the spin system are 
modified and the nuclear spin states are mixed. As a consequence, the transitions 1↔4 and 2↔3 involving a 
simultaneous nuclear spin transition (both forbidden for the isotropic case) are now also allowed to some 
extent. In figure B1.15.12(A) the evolution of this spin system in a two-pulse echo experiment (see figure 
B1.15.11(A)) is considered. Since there are four transitions, there are four components of the magnetization 
to keep track of. For simplicity, only the magnetizations of two transitions, ω_24 and ω_14 (labelled a and f, 
respectively), originating from the same nuclear spin level in the lower electron spin manifold, are considered. 
By applying sufficiently short MW pulses both allowed and forbidden transitions are excited simultaneously. 
After the first π/2-pulse, the two sets of electrons are precessing at two different frequencies separated by 
the nuclear frequency. If spin packet a is on resonance (ω_24 = ω_mw), its component of the magnetization is 
fixed in the rotating frame, whereas the magnetization component f precesses with frequency 
ω_14 - ω_mw = ω_12. After the time τ, the π-pulse inverts the vector a into the -y′-direction and, at the same 
time, because of the branching of transitions, gives rise to a new component f′. The effect of the π-pulse on 
the magnetization component f is a rotation about x′ by 180° and the formation of a component a′ according to 
the ratio of the transition probabilities for allowed and forbidden transitions. The components a and a′ will 
remain unaltered in the rotating frame (because they are on resonance), whereas the two vectors f and f′ 
continue to precess with the off-resonance frequency ω_12. At time τ after the π-pulse, f will refocus with a 
at the -y′-axis to form an echo, but the vectors a′ and f′ will not contribute, because they are no longer 
oriented along -y′. This results in a reduction of the echo intensity at time τ. The component f′ will 
only contribute to the echo if, in time τ, it precesses an integral number of times in the x′y′-plane. Therefore, 
the echo amplitude oscillates in proportion to cos(ω_12 τ). The same holds for any combination of transitions 
with energy levels in common. Therefore, one expects the echo intensity to oscillate not only with the nuclear 
frequencies ω_12 and ω_34 but also with the sum and the difference of these frequencies. By FT of the echo 
envelope an ENDOR-like spectrum is obtained. The amplitudes of the modulation frequencies are determined 
by the depth parameter k = 4 I_a I_f = (B ω_N/(ω_12 ω_34))^2, where I_a and I_f denote the intensities of the 
allowed and forbidden transitions, respectively. B is a measure of the anisotropy of the hyperfine coupling 
tensor. Large modulation amplitudes are expected for ω_12, ω_34 → 0. This is in contrast to ENDOR spectroscopy, 
where the enhancement factor and, therefore, the ENDOR line intensities, decrease for small nuclear transition 
frequencies. For small hyperfine coupling constants ω_12 ≈ ω_34 ≈ ω_N and k ∝ (B/ω_N)^2. Again in contrast to 
ENDOR, the ESEEM modulation depth will increase for nuclei with smaller γ_n. 

The stimulated (three-pulse) echo decay may also be modulated, but only by the nuclear frequencies ω_12 and 
ω_34 and not by their sum and difference frequencies. The qualitative reason for this is that the first pulse 
generates modulation at the nuclear frequencies; the second pulse additionally incorporates the sum and 
difference frequencies and the third pulse causes interference of the sum and difference frequencies to leave 
only the ENDOR frequencies. Apart from the depth parameter k, the modulation amplitudes of the ENDOR 
frequencies ω_12 and ω_34 are determined by sin^2(ω_34 τ/2) and sin^2(ω_12 τ/2), respectively. As a consequence, 
so-called blind spots can occur. For example, if ω_34 τ is an integral multiple of 2π, i.e. τ = 2πn/ω_34, then the 
modulation at ω_12 is completely suppressed. Therefore, the dependence of the response on τ should also be 
examined. 
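
The blind-spot condition can be checked numerically. The sketch below evaluates only the two τ-dependent 
factors quoted above, sin^2(ω_34 τ/2) for the ω_12 line and sin^2(ω_12 τ/2) for the ω_34 line, and lists the τ 
values at which one of them vanishes. The function names and the two nuclear frequencies are hypothetical 
choices made for illustration; the depth parameter k and all relaxation effects are left out. 

```python
import math


def three_pulse_factors(omega_12, omega_34, tau):
    """Return the tau-dependent amplitude factors of the two nuclear lines.

    The omega_12 modulation is scaled by sin^2(omega_34 * tau / 2) and the
    omega_34 modulation by sin^2(omega_12 * tau / 2).
    Angular frequencies in rad/us, tau in us.
    """
    factor_12 = math.sin(omega_34 * tau / 2.0) ** 2
    factor_34 = math.sin(omega_12 * tau / 2.0) ** 2
    return factor_12, factor_34


def blind_spot_taus(omega, n_max):
    """tau values with sin^2(omega * tau / 2) = 0, i.e. tau = 2*pi*n/omega."""
    return [2.0 * math.pi * n / omega for n in range(1, n_max + 1)]


if __name__ == "__main__":
    omega_12 = 2.0 * math.pi * 3.0   # hypothetical, 3 MHz expressed in rad/us
    omega_34 = 2.0 * math.pi * 12.0  # hypothetical, 12 MHz expressed in rad/us
    for tau in blind_spot_taus(omega_34, 3):
        f12, f34 = three_pulse_factors(omega_12, omega_34, tau)
        print(f"tau = {tau:.4f} us: factor(omega_12) = {f12:.3e}, factor(omega_34) = {f34:.3f}")
```

Scanning τ in this way before an experiment indicates which pulse spacings would hide a given line, which is why 
spectra are usually recorded at more than one τ. 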

The main advantage of the three-pulse ESEEM experiment as compared to the two-pulse approach lies in the 
slow decay of the stimulated echo intensity determined by T_1, which is usually much longer than the phase 
memory time T_M that limits the observation of the two-pulse ESE. 

More sophisticated pulse sequences have been developed to detect nuclear modulation effects. With a five- 
pulse sequence it is theoretically possible to obtain modulation amplitudes up to eight times greater than in a 
three-pulse experiment, while at the same time the unmodulated component of the echo is kept close to zero. 
A four-pulse ESEEM experiment has been devised to greatly improve the resolution of sum-peak spectra. 

Figure B1.15.12. ESEEM spectroscopy. (A) Top: energy level diagram and the corresponding stick spectrum 
for the two allowed (a) and two forbidden (f) transitions. Bottom: time behaviour of the magnetization of an 
allowed (a) spin packet and a forbidden (f) spin packet during a two-pulse ESE sequence (see figure 
B1.15.11(A)). (B) The HYSCORE pulse sequence. 

In the 2D three-pulse ESEEM technique both the time intervals t_1 = τ and t_2 = T of a stimulated echo sequence 
are independently increased in steps (see figure B1.15.11(B)) [39]. If the spacing between the first two pulses, 
τ, is varied over a sufficiently broad range, blind spots, which caused problems in one-dimensional spectra, do 
not arise in this 2D ESEEM method. Combination cross peaks arising from couplings with several 
inequivalent nuclei can be used to determine the relative signs of the hyperfine splittings. A disadvantage of 
the 2D three-pulse ESEEM technique is that the echo intensities decay at different rates along the two time 
axes, with T_M relaxation along the t_1-axis and T_1 relaxation along the t_2-axis. As a result, the linewidths in 
the two frequency dimensions can differ by orders of magnitude. 

An alternative 2D ESEEM experiment based on the four-pulse sequence depicted in figure B1.15.12(B) has 
been proposed by Mehring and coworkers [40]. In the hyperfine sublevel correlation (HYSCORE) 
experiment, the decay of the echo intensity as a function of t_1 is governed by T_1 relaxation, whereas the echo 
decay along the t_2-axis is determined by the T_2 relaxation of the nuclei. Since both relaxation processes are 
fairly slow, the resolution along both frequency dimensions is much increased compared to the 2D three-pulse 
ESEEM experiment. In the HYSCORE experiment, too, the positions of the cross-peaks can be used to determine 
the relative signs of the hyperfine coupling constants. 


The ESEEM methods are best suited for the measurement of small hyperfine couplings, e.g. for the case of 
nuclear spins with small magnetic moment. Larger hyperfine interactions can be measured best by pulsed 
versions of ENDOR spectroscopy. These methods will be introduced as a final application of the pulsed 
excitation scheme introduced earlier. Pulsed ENDOR methods are double-resonance techniques wherein, at 
some particular time in an ESE pulse sequence, an RF pulse is applied that is swept in frequency to match 
resonance with the hyperfine-coupled nuclei. The typical pulse schemes for the most commonly used versions 
of pulsed ENDOR, termed Mims- [41] and Davies-type [42] ENDOR to acknowledge those who originally 
introduced them, are depicted in figure B1.15.13. In both experiments the ENDOR effect is manifested in a 
change of the ESE intensity when the RF field is on nuclear resonance. The ENDOR spectrum can thus be 
recorded by detecting the echo amplitude as a function of the frequency of the RF pulse. 

The Davies-ENDOR technique is based on an inversion recovery sequence (see figure B1.15.13(A)). The 
experiment starts by interchanging the populations of levels 1 and 3 of one of the EPR transitions of the 
S = I = 1/2 model spin system by means of a first selective MW π-pulse (with a strength |ω_1| ≪ |A|/ħ). Neglecting 
relaxation during the time span T, a two-pulse ESE sequence performed after time t = T produces an echo 
which is inverted with respect to a two-pulse ESE applied to the same spin system at thermal equilibrium. 
When, during the time span T, a selective RF π-pulse is applied, the two-pulse ESE will disappear as soon as 
the RF field is on resonance with one of the two transitions 1↔2 or 3↔4. This is because the populations of 
the nuclear sublevels are interchanged by the RF pulse, which simultaneously equalizes the populations of the 
on-resonant EPR transition. The ENDOR effect will not be observable if the preparation pulse (i.e. MW pulse 
1) and/or the two-pulse ESE sequence are non-selective. 
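
The population bookkeeping behind the Davies experiment can be followed with a few lines of code. The sketch 
below is a simplified picture under stated assumptions: ideal, fully selective π-pulses that simply swap two 
populations, no relaxation, no nuclear polarization, and an echo on the 1↔3 transition that is proportional to 
the population difference of levels 1 and 3. The function names and the polarization value eps are 
hypothetical. 

```python
def swap(populations, i, j):
    """Return a copy of the populations with levels i and j interchanged (ideal pi-pulse)."""
    swapped = dict(populations)
    swapped[i], swapped[j] = populations[j], populations[i]
    return swapped


def davies_echo(rf_transition=None, eps=0.01):
    """Population difference p1 - p3 seen by the detection echo on transition 1<->3.

    Levels 1 and 2 form the lower electron spin manifold, levels 3 and 4 the upper one.
    rf_transition is None (RF off resonance) or a pair of levels such as (1, 2) or (3, 4).
    """
    populations = {1: 1.0 + eps, 2: 1.0 + eps, 3: 1.0 - eps, 4: 1.0 - eps}
    populations = swap(populations, 1, 3)   # selective MW pi-pulse on 1<->3
    if rf_transition is not None:
        populations = swap(populations, *rf_transition)  # selective RF pi-pulse
    return populations[1] - populations[3]


if __name__ == "__main__":
    print("RF off resonance:", davies_echo())        # negative, i.e. inverted echo
    print("RF on 1<->2:     ", davies_echo((1, 2)))   # echo vanishes
    print("RF on 3<->4:     ", davies_echo((3, 4)))   # echo vanishes
```

Replacing the selective MW swap by one that inverts both EPR transitions removes the effect in this model, in 
line with the requirement for selective pulses stated above. 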

Mims ENDOR involves observation of the stimulated echo intensity as a function of the frequency of an RF 
π-pulse applied between the second and third MW pulse. In contrast to the Davies ENDOR experiment, the 
Mims-ENDOR sequence does not require selective MW pulses. For a detailed description of the polarization 
transfer in a Mims-type experiment the reader is referred to the literature [43]. Just as with three-pulse 
ESEEM, blind spots can occur in ENDOR spectra measured using Mims' method. To avoid the possibility of 
missing lines it is therefore essential to repeat the experiment with different values of the pulse spacing τ. 
Detection of the echo intensity as a function of the RF frequency and τ yields a real two-dimensional 
experiment. An FT of the τ-domain will yield cross-peaks in the 2D-FT-ENDOR spectrum which correlate 
different ENDOR transitions belonging to the same nucleus. One advantage of Mims ENDOR over Davies 
ENDOR is its larger echo intensity, because the nonselective excitation involves more spins in the 
formation of the echo. 

Pulsed ENDOR offers several distinct advantages over conventional CW ENDOR spectroscopy. Since there 
is no MW power during the observation of the ESE, klystron noise is largely eliminated. Furthermore, there is 
an additional advantage in that, unlike the case in conventional CW ENDOR spectroscopy, the detection of 
ENDOR spin echoes does not depend on a critical balance of the RF and MW powers and the various 
relaxation times. Consequently, the temperature is not such a critical parameter in pulsed ENDOR 
spectroscopy. Additionally, the pulsed technique permits a study of transient radicals. 

Figure B1.15.13. Pulsed ENDOR spectroscopy. (A) Top: energy level diagram of an S = I = 1/2 spin system (see 
also figure B1.15.8(A)). The size of the filled circles represents the relative population of the four levels at 
different times during the (3+1) Davies ENDOR sequence (bottom). (B) The Mims ENDOR sequence. 

More advanced pulsed techniques have also been developed. For a review of pulsed ENDOR techniques the 
reader is referred to [43, 44 and 45]. 

B1.15.7 HIGH-FIELD EPR SPECTROSCOPY 

Since its discovery in 1944 by Zavoisky, EPR has typically been performed at frequencies below 40 GHz. 
This limitation was a technical one, but recent developments in millimetre and submillimetre wave frequency 
technology and magnetic field technology have enabled the exploration of ever higher EPR frequencies [46, 
47 and 48]. High-field/high-frequency EPR spectroscopy has a number of inherent advantages [47]. (1) The 
spectral resolution of g-factor differences and anisotropies greatly improves since the electron Zeeman 
interaction scales linearly with the magnetic field (see equation (B1.15.18)). If paramagnetic centres with 
different g-values or different magnetic sites of rather similar g-values are present, the difference in the 
spectral field positions of the resonances is proportional to the MW frequency ω: 

\Delta B_0 = \frac{\hbar\omega}{\mu_B}\left(\frac{1}{g_1} - \frac{1}{g_2}\right)    (B1.15.40) 

Even for a single radical the spectral resolution can be enhanced for disordered solid samples if the 
inhomogeneous linewidth is dominated by unresolved hyperfine interactions. Whereas the hyperfine line 
broadening is not field dependent, the anisotropic g-matrix contribution scales linearly with the external field. 
Thus, if the magnetic field is large enough, i.e. when the condition 

\frac{\Delta g}{g_{\mathrm{iso}}} B_0 > \Delta B_{1/2}^{\mathrm{hf}}    (B1.15.41) 

is fulfilled, the powder spectrum is dominated by the anisotropic g-matrix. Equation (B1.15.41) may be 
considered as the high-resolution condition for solid-state EPR spectra, to be fulfilled only at high enough B_0. 
From figure B1.15.14 one sees that for a nitroxide spin label in a protein this is fulfilled almost completely at 
95 GHz, but not at 10 GHz. In the case of well resolved g-anisotropy the extension to high-field ENDOR and 
ESEEM has the additional advantage of providing single-crystal-like hyperfine information when transitions 
are excited at field positions where only specific orientations of the g-matrix with respect to the external 
magnetic field contribute to the spectrum. (2) Relaxation times become longer for many systems at higher 
frequencies. (3) Particularly in studies of small samples, high-field/high-frequency EPR is typically more 
sensitive compared to EPR at X-band frequencies, by virtue of the increased Boltzmann factor (see equation 
(B1.15.10)). (4) For high-spin systems with zero-field splittings larger than the MW quantum it is impossible 
to observe all EPR transitions. Here, higher-frequency experiments are essential for recording the whole 
spectrum. (5) At high frequencies it becomes possible to violate the high-temperature approximation with 
standard cryogenic systems. This effect can be exploited to gain information on the absolute sign of 
parameters of the spin Hamiltonian. 

Figure B1.15.14. Comparison of 95.1 GHz (A) and 9.71 GHz (B) EPR spectra for a frozen solution of a 
nitroxide spin label attached to insulin measured at 170 K. 
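
To give a feeling for the size of the resolution gain described under advantage (1), the following sketch 
evaluates the field separation of two resonances from the resonance condition hν = g μ_B B_0, using hν in place 
of ħω. The two g-values and the function names are hypothetical, chosen only for illustration; the physical 
constants are the CODATA 2018 values. 

```python
H_PLANCK = 6.62607015e-34      # Planck constant in J s
MU_BOHR = 9.2740100783e-24     # Bohr magneton in J/T


def resonance_field(freq_hz, g):
    """Resonance field in tesla from h*nu = g * mu_B * B_0."""
    return H_PLANCK * freq_hz / (g * MU_BOHR)


def field_separation(freq_hz, g1, g2):
    """Difference of the resonance fields of two species, (h*nu/mu_B) * (1/g1 - 1/g2)."""
    return (H_PLANCK * freq_hz / MU_BOHR) * (1.0 / g1 - 1.0 / g2)


if __name__ == "__main__":
    g1, g2 = 2.0023, 2.0060  # hypothetical g-values of two radicals
    for label, freq in (("X-band, 9.5 GHz", 9.5e9), ("W-band, 95 GHz", 95e9)):
        b0_mt = 1e3 * resonance_field(freq, g1)
        delta_mt = 1e3 * field_separation(freq, g1, g2)
        print(f"{label}: B_0 = {b0_mt:.1f} mT, separation = {delta_mt:.3f} mT")
```

Because the separation grows linearly with frequency while a hyperfine-dominated linewidth stays constant, a 
tenfold increase in frequency improves the separation relative to the linewidth by the same factor. 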

Disadvantages of high-field/high-frequency EPR are mainly technical ones due to the limited availability of 
MW components operating at millimetre and submillimetre wavelengths and the high costs of spectrometer 
development. Furthermore, low-frequency EPR will not be completely superseded by high-field/high-frequency 
EPR because some experiments that rely on the violation of the high-field approximation no longer 
work when increasing the EPR frequency. The largest disadvantages occur in studies of proton interactions, 
where ESEEM is a convenient tool at X-band but cannot be used at W-band frequencies and above because of 
too small modulation depths for most systems. Nevertheless, these drawbacks are outweighed by the 
advantages to such an extent that high-field/high-frequency EPR methods will become more and more 
widespread as one overcomes the technical hurdles. 

The early high-field/high-frequency EPR spectrometers were developed mostly on the basis of klystron MW 
generators. In the past few years, however, solid-state MW sources such as Gunn oscillators or IMPATT 
diodes have been applied more frequently. To improve their mediocre frequency stability they need to be phase 
locked to a low-frequency stable reference source. If frequencies higher than approximately 150 GHz are 
required, the output frequency of the solid-state MW generator can be multiplied in a Schottky diode 
harmonic generator (multiplication factors between 2 and 5). To generate even higher frequencies up to 1 THz 
more exotic high-power pulsed and CW tube sources such as gyrotrons, extended interaction oscillators 
(EIOs), backward wave oscillators (BWOs) or magnetrons are available. Their spectral characteristics may be 
favourable; however, they typically require highly stabilized high-voltage power supplies. Still higher 
frequencies may be obtained using far-infrared gas lasers pumped for example by a CO_2 laser [49]. 

All the waveguide elements become very small at high frequencies as compared to the standard X-band or 
even Q-band. This produces high losses up to several decibels per metre due to waveguide imperfections. 
Therefore, if millimetre and submillimetre waves have to be transmitted over long (and straight) distances, 
oversized or corrugated waveguides are normally used because of their smaller ohmic losses. Corrugated 
waveguides have narrow grooves, each a quarter-wavelength deep, cut into the guide walls. The effect of the 
grooves is to destructively average the E field near the wall surface which cannot now have a non-zero 
component perpendicular to the surface. To couple to a resonant cavity, however, these waveguides need to be 
tapered back to the fundamental-mode waveguide. Very recently developed EPR spectrometers operating at 
130 GHz [50], 250 GHz [51] and 360 GHz [52] (see figure B1.15.15) forgo waveguides for millimetre-wave 
transmission for the most part. Instead quasi-optic techniques are used [53]. Once millimetre-waves have been 
converted into Gaussian beams by means of corrugated feedhorns they can be transported and manipulated in 
free space using quasi-optical elements such as lenses (constructed from Teflon or high-density polyethylene) 
and off-axis mirrors. The losses in these elements are virtually negligible. 

The mechanical specifications of cavities for millimetre waves are highly demanding regarding cavity 
dimensions, cavity surfaces and precision of coupling mechanisms due to the reduced dimensions at 
millimetre wavelengths. Furthermore, with increasing MW frequency resonators become more and more 
difficult to handle. Therefore, high-field/high-frequency EPR measurements are very often carried out without 
a cavity with the sample placed directly in a transmission waveguide. From the point of view of absolute 
sensitivity, however, small-volume cavities are evidently preferable. Typically used cavities for millimetre- 
and submillimetre-wave EPR are multi-mode Fabry-Perot resonators [54] consisting of a confocal or 
semiconfocal arrangement of two mirrors placed at a particular distance apart. Fabry-Perot resonators have 
been used successfully in the reflection and the transmission mode. A typical Fabry-Perot resonator is 
extremely sensitive to displacements of the mirrors by as little as 0.1 μm. Also, the mechanical isolation of the 
cavity from the modulation coils has to be improved compared to lower frequency designs because of the 
larger modulation amplitudes and increased interaction forces with the larger static field. 

Figure B1.15.15. High-field/high-frequency EPR spectrometer operating at 360.03 GHz [52]. 
Microwaves at 360.03 GHz are produced by frequency multiplication of the output of a Gunn oscillator and 
are then passed into a corrugated feedhorn to set up a fundamental Gaussian beam. Refocusing and redirection 
of the beam is accomplished using off-axis mirrors. The microwaves then travel through a corrugated 
waveguide and are coupled into a Fabry-Perot-type open cavity. Incident linearly polarized microwaves and 
circularly polarized EPR signals emitted from the resonator are separated by a wiregrid polarizer. Heterodyne 
phase-sensitive detection of the EPR signal is achieved in a subharmonic mixer using microwaves of 180.62 
GHz to produce a 1.21 GHz intermediate frequency signal which is further down-converted in a quadrature 
mixer. All MW sources are phase locked to one common reference oscillator. 




Many different sorts of millimetre-wave detectors have been developed, each offering its own combination of 
advantages and drawbacks. For convenience, they may be divided into two general categories: bolometers and 
mixers (heterodyne detectors). A bolometer is a device which responds to a change in temperature produced 
when it absorbs incident radiation. The noise figure of an He-cooled bolometer is excellent; however, its small 
bandwidth limits its application to CW EPR experiments. Heterodyne detection systems transfer a signal band 
from a high frequency to a lower frequency where low-noise amplifiers are available. This is accomplished, 
for example, in a mixer by overlaying the signal with a single-frequency, stable local oscillator (LO) 
frequency to produce a (difference) intermediate frequency (IF) in the lower GHz range. With increasing 
frequency, however, the Schottky barrier diodes used in mixers become very sensitive to static electricity and 
mechanical stress, thus limiting their reliability. 
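
The frequency arithmetic of heterodyne detection is easily reproduced. The sketch below computes the 
intermediate frequency as the difference between the signal and a harmonic of the local oscillator, and applies 
it to the numbers quoted in the caption of figure B1.15.15, assuming that the subharmonic mixer works on the 
second harmonic of the 180.62 GHz oscillator. The function name and the harmonic argument are illustrative 
choices. 

```python
def intermediate_frequency(signal_ghz, lo_ghz, harmonic=1):
    """Difference frequency between the signal and the given harmonic of the local oscillator."""
    return abs(harmonic * lo_ghz - signal_ghz)


if __name__ == "__main__":
    # Values from the caption of figure B1.15.15; harmonic=2 is an assumption
    # about how the subharmonic mixer is operated.
    if_ghz = intermediate_frequency(360.03, 180.62, harmonic=2)
    print(f"intermediate frequency: {if_ghz:.2f} GHz")
```

With these inputs the difference comes out at 1.21 GHz, the intermediate frequency stated in the caption. 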


Pulsed, or time-domain, EPR spectrometers have also been developed at higher frequencies up to 140 GHz 
[55, 56]. They are generally low-power units with characteristically long pulse lengths (typically 50 ns for a 
π/2-pulse) due to the limited MW powers available at millimetre wavelengths and the lack of fast-switching 
pulse-forming devices at these frequencies. In general, all the experiments outlined in the previous section can 
be performed, albeit with an even more severe restriction to limited excitation bandwidths. Nevertheless, 
pulsed EPR performed at high frequencies has clear advantages when the poor orientation selection at low 
frequencies prevents the study of spectral anisotropies of the ESE decay, for example in relaxation 
measurements. As an example, figure B1.15.16 depicts the decay of a two-pulse ESE of a quinone radical as a 
function of the external magnetic field. Clearly, the echo decay governed by T_2 is different for selected field 
positions in the spectrum. This T_2 anisotropy can be analysed in terms of anisotropic motional processes of 
the radical in its molecular environment. Due to the low g-anisotropy (as is typical in biomolecules) this 
experiment would not have been successful at lower frequencies. 

Figure B1.15.16. Two-pulse ESE signal intensity of the chemically reduced ubiquinone-10 cofactor in 
photosynthetic bacterial reaction centres at 115 K. MW frequency is 95.1 GHz. One dimension is the 
magnetic field value B_0; the other dimension is the pulse separation τ. The echo decay function is anisotropic 
with respect to the spectral position. 

B1.16 Chemically-induced nuclear and electron 
polarization (CIDNP and CIDEP) 


Elizabeth J Harbron and Malcolm D E Forbes 


B1.16.1 INTRODUCTION 

Chemically-induced spin polarization was one of the last truly new physical phenomena in chemistry to be 
discovered and explained during this century. So unusual were the observations and so ground-breaking the 
theoretical descriptions that, over a very short time period, the chemist's way of thinking about free radical 
reactions and how to study them was fundamentally changed. After the earliest experimental reports of 
unusual phases of electron paramagnetic resonance (EPR) (1963) [1] and nuclear magnetic resonance (NMR) 
(1967) [2, 3 and 4] transitions in thermal, photolytic and radiolytic reactions involving free radical 
intermediates, it took several years of theoretical development before the idea of the radical pair mechanism 
(RPM) was put forward to explain the results [5, 6, 7, 8 and 9]. Gradually, the theory was tested and 
improved, and additional polarization mechanisms were discovered. The overall physical picture has stood the 
test of time and now both chemically-induced dynamic nuclear polarization (CIDNP) and its electron 
analogue (CIDEP) are well understood. The phenomena are exploited by many researchers who are trying to 
understand the kinetic and magnetic properties (and the links between them) of free radicals, biradicals and 
radical ion pairs in organic photochemistry, as well as photosynthetic reaction centres and other biologically 
relevant systems. The high structural resolution of NMR and EPR spectroscopies, combined with recent 
advances in fast data collection instrumentation and high powered pulsed lasers, has made time-resolved 
CIDNP and CIDEP experiments some of the most informative in the modern physical chemistry arsenal. 


In spectroscopy it is common for transitions to be observed as absorptive lines because the Boltzmann 
distribution, at equilibrium, ensures a higher population of the lower state than the upper state. Examples 
where emission is observed, which are by definition non-equilibrium situations, are usually cases where 
excess population is created in the higher level by infusing energy into the system from an external source. 
For example, steady-state emission spectroscopy is used to measure fluorescence or phosphorescence from 
the excited states of organic molecules. The technique requires excitation to the upper energy levels first, then 
what is observed is a spontaneous emission. Another example is the laser, which is pumped with an external 
source such as a flash lamp or an electric arc to ensure a population inversion, and stimulated emission then 
occurs from the upper state when it is triggered by an incident photon. What makes the non-Boltzmann NMR and 
EPR populations observed in CIDNP and CIDEP experiments so unusual is that nuclear-spin dependent 
chemical reactions (homolytic bond-breaking or forming) are responsible for the process. While it usually 
requires energy to break the bond, once it is broken the mixing of spin wavefunctions in the resulting radical 
pair, which will be described in detail in the following, is all that is necessary to make some NMR and EPR 
transitions appear with enhanced absorption (greater intensity than Boltzmann would predict) or even in 
emission (higher population in the excited state). The overall phase and magnitude of the polarization is 
dependent on the nuclear spin projections of the nuclei (usually, but not always, protons) near the free radical 
site of the molecules in question. For this reason, it is easy to see why a suitable theoretical description of 
CIDNP took a long time to evolve. The idea that the nuclear spin-state energy level differences, which are 
much smaller than kT at room temperature, could be responsible for different chemical reaction rates was a 
revolutionary and somewhat controversial one. As more and more experiments were performed to support this 
idea, it rapidly gained acceptance and, in fact, helped connect the solution dynamics of small molecules to 
spin quantum mechanics in a very natural and informative fashion. 

We make one important note here regarding nomenclature. Early explanations of CIDNP invoked an 
Overhauser-type mechanism, implying a dynamic process similar to spin relaxation; hence the word 
'dynamic' in the CIDNP acronym. This is now known to be incorrect, but the acronym has prevailed in its 
infant form. 

The general phenomena of CIDNP and CIDEP are presented in figure B1.16.1 and figure B1.16.2. Figure 
B1.16.1 shows work by Roth et al [10] on radical cation structure in which the bottom trace is a 'dark' 
spectrum and the top trace is the CIDNP spectrum. Figure B1.16.2 shows CIDEP spectra of radicals 
formed by decomposition of a fluorinated polymer initiator [11]. The NMR spectra in figure B1.16.1 and the 
EPR spectra in figure B1.16.2 can be recognized as spin-polarized by the presence of lines in emission and 
enhanced absorption. The origin of the CIDNP and CIDEP phenomena will be explored and explained in the 
following, and we will return to these examples for further analysis once the theory behind them is 
understood. 

Figure B1.16.1. ^1H CIDNP spectrum (250 MHz; top) observed during irradiation of chloranil with sabinene 
(1) in acetone-d_6 and dark spectrum (bottom). Assignments are based on the 2D ^1H-^1H COSY spectrum. 
Reprinted from [10]. 

Figure B1.16.2. X-band TREPR spectra obtained at 0.1 μs after 308 nm photolysis of a fluorinated peroxide 
dimer in Freon 113 at room temperature. Part A is the A/E RPM spectrum obtained upon direct photolysis; 
part B is the E/A RPM spectrum obtained upon triplet sensitization of this reaction using benzophenone. 


B1.16.2 CIDNP 


B1.16.2.1 SPIN HAMILTONIAN 


CIDNP involves the observation of diamagnetic products formed from chemical reactions which have radical 
intermediates. We first define the geminate radical pair (RP) as the two molecules which are born in a radical 
reaction with a well defined phase relation (singlet or triplet) between their spins. Because the spin physics of 
the radical pair are a fundamental part of any description of the origins of CIDNP, it is instructive to begin 
with a discussion of the radical-pair spin Hamiltonian. The Hamiltonian can be used in conjunction with an 
appropriate basis set to obtain the energetics and populations of the RP spin states. A suitable Hamiltonian for 
a radical pair consisting of radicals 1 and 2 is shown in equation (B1.16.1) below [12]. 


The first term describes the electronic Zeeman energy, which is the interaction of the two electrons of the 
radical pair with the magnetic field, B_0. The two electron spins are represented by spin operators S_1z and 
S_2z. In this expression, g is the g factor, which is the chemical shift of the unpaired electrons. The other 
variables in the first term are constants: β is the Bohr magneton, and ħ is Planck's constant divided by 2π. 
The second term in the Hamiltonian describes the hyperfine interaction between each radical and the nuclei on 
that radical, where S is again the electron spin operator, I is the nuclear spin operator, and a_i and a_j are the 
hyperfine coupling constants. The hyperfine constants describe the coupling between electronic and nuclear 
spins; coupling between an electron and a proton is designated a_H. For most carbon-centred alkyl free 
radicals, a_H for an electron and a proton on the same carbon is negative in sign while a_H for 
a proton β to an electron is positive. The sign of the hyperfine coupling constant will become an important 
issue in the analysis of CIDNP data below. 

The final term in the radical pair Hamiltonian is the exchange interaction (J) between the unpaired electrons. 
This interaction is a scalar quantity that describes the coupling of the angular momenta of two radicals which 
are in close proximity. Its magnitude decreases exponentially with increasing inter-radical distance as shown 
by equation (B1.16.2), where $J_0$ is the exchange interaction at the point of closest contact, $r_0$; r is the 
inter-radical distance; and λ is a fall-off parameter generally accepted to be approximately 1 Å⁻¹ for isotropic 
solutions. The exchange interaction should not be confused with the quantum mechanical term exchange 
integral, although the two are related [13]. 

$$J = J_0\, e^{-\lambda (r - r_0)} \qquad \text{(B1.16.2)}$$
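
To give a feel for how quickly the exchange interaction dies away, the exponential fall-off of equation (B1.16.2) can be sketched in a few lines of Python. The function name and the numerical values of the contact exchange and contact distance are hypothetical, chosen only for illustration; only the fall-off parameter of about 1 Å⁻¹ comes from the text.

```python
import math


def exchange_interaction(r, j0, r0, lam=1.0):
    """Return J(r) = J0 * exp(-lam * (r - r0)).

    r and r0 are in angstrom, lam in 1/angstrom; J is returned in the units of j0.
    """
    if r < r0:
        raise ValueError("r must not be smaller than the closest-contact distance r0")
    return j0 * math.exp(-lam * (r - r0))


# Hypothetical values, chosen only to show the trend.
J0 = 1.0e10  # exchange interaction at closest contact, rad/s (assumed)
R0 = 6.0     # distance of closest contact, angstrom (assumed)

for r in (6.0, 8.0, 10.0, 15.0):
    ratio = exchange_interaction(r, J0, R0) / J0
    print(f"r = {r:5.1f} A   J/J0 = {ratio:.2e}")
```

With a fall-off parameter of 1 Å⁻¹, J drops by a factor of e for every additional ångström of separation, which is why the radicals only need to diffuse a short distance apart before J becomes negligible.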

The exchange interaction results in an energy splitting between the singlet and triplet states of the RP, as 
shown in figure B1.16.3, which plots the RP energy levels versus the inter-radical separation. The S 
and T levels shown in the lower part of the figure are in the absence of an external magnetic field. When an 
external field is applied, the triplet level is split into $T_+$, $T_0$ and $T_-$, as shown in the upper inset. At the high 
magnetic fields at which most CIDNP experiments are conducted, $T_+$ and $T_-$ are far away from S and can be 
neglected in what is known as the high-field approximation. 

Figure B1.16.3. Energy levels versus inter-radical separation for a radical pair. The lower part of the figure 
shows the S and T levels in the absence of an external magnetic field while the inset shows the splitting of the 
triplet levels in the presence of a magnetic field, $B_0$. 

When the inter-radical distance is very small, J is large, and S and $T_0$ are far apart and cannot mix. This is 
called the 'exchange region' as the electron spins are constrained by the exchange interaction to remain in 
their respective spin states. As the radicals diffuse apart, J rapidly falls to zero, and the S and $T_0$ levels 
become degenerate and are allowed to mix. If the radicals diffuse back to the region of large J, they must 
again be in either the S or $T_0$ state, but some of the RPs which were formerly in one spin state will now be in 
the other. This phenomenon is known as intersystem crossing; it is a process critical to the understanding of 
CIDNP and will be explained in greater detail below. 

B1.16.2.2 MECHANISM OF INTERSYSTEM CROSSING 

Vector representations of the radical pair spin states, shown in figure B1.16.4A, help to explain how 
intersystem crossing occurs in RPs. The vector diagrams show the magnitude of the spin angular momentum 
vector and its z component (parallel to $B_0$) for each of the two spins in the RP. Because the x and y 
components of spin are unspecified, each spin vector can be thought to precess around a cone. When 
considering only the influence of the interaction of the electron chemical shift with the magnetic field, the 
frequency of electron precession is given by the expression for the Larmor frequency, equation (B1.16.3). 
Additional interactions, such as electron-nuclear hyperfine couplings, must be added to or subtracted from the 
Larmor frequency in order to determine the actual precessional frequency of a given electron. 

$$\omega = g \beta \hbar^{-1} B_0 \qquad \text{(B1.16.3)}$$


As mentioned previously, a geminate RP in close contact is constrained by the exchange interaction to remain 
in its initial spin state, S or $T_0$, and vector representations of these states are shown in figure B1.16.4A. The 
exchange interaction prevents the two spins in the RP from precessing independently; so long as they precess 
at the same frequency, the two spins will remain in the same mutual orientation. Once the radicals have 
diffused to the region where J is zero, however, they are free to precess independently of one another. At this 
point, differences in g factor and/or hyperfine interaction will cause the radicals to precess at different 
frequencies. This difference in precessional frequencies is given by 2Q, as shown in equation (B1.16.4), where 
$m_{1i}$ and $m_{2j}$ are the nuclear magnetic quantum numbers for the nuclei on each member of the RP, 
respectively, and the other variables were defined previously. 

$$\omega_1 - \omega_2 = 2Q = (g_1 - g_2)\beta\hbar^{-1}B_0 + \sum_i a_{1i} m_{1i} - \sum_j a_{2j} m_{2j} \qquad \text{(B1.16.4)}$$
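
The Larmor frequency and the precession-frequency difference are simple enough to evaluate numerically. The sketch below does so in Python; the function names, the g factors, the field, and the hyperfine coupling are all hypothetical values chosen for illustration, and the conversion of a hyperfine coupling from gauss to rad/s assumes the free-electron g factor.

```python
import math

BETA = 9.2740100783e-24  # Bohr magneton in J/T
HBAR = 1.054571817e-34   # reduced Planck constant in J s


def larmor_frequency(g, b0):
    """Electron precession frequency in rad/s for a field b0 in tesla."""
    return g * BETA * b0 / HBAR


def gauss_to_rad_per_s(a_gauss, g=2.0023):
    """Convert a hyperfine coupling quoted in gauss to rad/s (free-electron g assumed)."""
    return g * BETA * (a_gauss * 1.0e-4) / HBAR


def two_q(g1, g2, b0, hfc1=(), hfc2=()):
    """Precession-frequency difference 2Q = omega_1 - omega_2 in rad/s.

    hfc1 and hfc2 hold (a, m) pairs for the nuclei on radicals 1 and 2:
    a is the hyperfine coupling in rad/s and m the nuclear magnetic quantum number.
    """
    zeeman = (g1 - g2) * BETA * b0 / HBAR
    return zeeman + sum(a * m for a, m in hfc1) - sum(a * m for a, m in hfc2)


# Hypothetical pair: one proton on radical 1, none on radical 2.
a_h = gauss_to_rad_per_s(-22.0)
for label, m in (("alpha", +0.5), ("beta", -0.5)):
    value = two_q(2.0030, 2.0025, 5.87, hfc1=[(a_h, m)])
    print(f"{label}: 2Q = {value:.3e} rad/s")
```

The point of the exercise is that the same radical pair has a different value of 2Q for each nuclear spin configuration, because the hyperfine terms enter with the sign of the nuclear magnetic quantum number.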


Eventually, the two spins will fall out of step with one another and will oscillate between the S state, 
intermediate states, and the $T_0$ state. An intermediate state is shown in region II of the vector diagram and can 
be described as a coherent superposition of the S and $T_0$ states with the coefficients $c_S$ and $c_T$ delineating the 
amount of S and $T_0$ 'character'. If the radicals diffuse back together to the region of large J, the RP is again 
required to be in either the S or $T_0$ state. The squares of the coefficients $c_S$ and $c_T$ will determine the 
probability that the RP will jump to the S or $T_0$ state at this inter-radical distance. Some RPs will now be in a 
different spin state than they were initially and are said to have undergone intersystem crossing, as mentioned 
above. While S-$T_0$ mixing occurs when the radicals in a RP are in the J = 0 region, the fact that intersystem 
crossing has occurred cannot be determined unless they diffuse back together and are forced by the exchange 
interaction to be in the S or $T_0$ state. 
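
The oscillation between singlet and triplet character can be illustrated with a two-level model. The Python sketch below assumes that J is exactly zero, that only the S and triplet T0 states are involved (the high-field approximation), and that diffusion, relaxation, and chemical reaction are ignored; under those assumptions a pair born in one state acquires a weight sin²(Qt) of the other. The function name and the value of Q are hypothetical.

```python
import math


def st0_populations(q, t, start="S"):
    """Return (|c_S|^2, |c_T|^2) at time t for a radical pair with J = 0.

    q is the S-T0 mixing frequency in rad/s, i.e. half the difference in
    precession frequency of the two electron spins; t is in seconds.
    """
    mixed = math.sin(q * t) ** 2  # weight transferred to the other state
    if start == "S":
        return 1.0 - mixed, mixed
    if start == "T0":
        return mixed, 1.0 - mixed
    raise ValueError("start must be 'S' or 'T0'")


# Hypothetical mixing frequency, for illustration only.
Q = 5.0e8  # rad/s
for t_ns in (0.0, 1.0, 2.0, 3.0):
    c_s2, c_t2 = st0_populations(Q, t_ns * 1.0e-9)
    print(f"t = {t_ns:3.1f} ns   |c_S|^2 = {c_s2:.3f}   |c_T|^2 = {c_t2:.3f}")
```

The squared coefficients returned by the function are the probabilities with which a re-encountering pair is found in the singlet or the triplet state, so a larger Q means more triplet character after a given time apart.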


[Figure B1.16.4. Panel A: vector diagrams, drawn relative to the field $B_0$, of |S⟩, the mixed state 
$c_S$|S⟩ + $c_T$|$T_0$⟩, and |$T_0$⟩. Panel B: reaction scheme with a singlet precursor (region I, J ≠ 0), diffusion to 
region II (J = 0, S-$T_0$ mixing), and a triplet precursor (region III, J ≠ 0); the exits are recombination (cage) 
products and scavenging (escape) products.] 

Figure B1.16.4. Part A is the vector representations of the S state, an intermediate state, and the $T_0$ state of a 
radical pair. Part B is the radical reaction scheme for CIDNP. 


By examining the expression for Q (equation (B1.16.4)), it should now be clear that the nuclear spin state 
influences the difference in precessional frequencies and, ultimately, the likelihood of intersystem crossing, 
through the hyperfine term. It is this influence of nuclear spin states on electronic intersystem crossing which 
will eventually lead to non-equilibrium distributions of nuclear spin states, i.e. spin polarization, in the 
products of radical reactions, as we shall see below. 

B1.16.2.3 RADICAL REACTION SCHEME 


A general reaction scheme for CIDNP is shown in figure B1.16.4B, where the radical dynamics in each region 
correspond to the vector diagram for that region shown in figure B1.16.4A. A geminate RP is formed from a 
singlet or triplet precursor through bond cleavage or an electron transfer reaction. Thermal reactions proceed 
from the singlet state while photochemical reactions tend to occur from the triplet state; certain species, such 
as azo compounds, react from the singlet in photochemical reactions [14]. The RP is always formed in the 
same spin state as its precursor because the RP-forming reaction must conserve angular momentum. For the 
vast majority of reactions, recombination of singlet RPs is allowed while recombination of triplet RPs is 
forbidden. While there are exceptions [15], we will consider triplet RPs to be incapable of recombination 
throughout this explanation. It should also be noted that figure B1.16.4B shows radical-forming reaction 
pathways for both singlet and triplet RPs for the purpose of illustration, but it is to be understood that most 
radical-forming reactions occur from either a singlet (left side) or triplet (right side) precursor but not both. 

The geminate RPs in figure B1.16.4B are indicated by a bar with the spin multiplicity. It is common to speak 
of geminate RPs as being in a 'cage', and this notion is central enough to our discussion of CIDNP to merit 
some elaboration. The idea of the cage effect in radical chemistry stems from early work by Franck and 
Rabinowitch [16] in which they noted that radicals have an increased probability of recombination in solution 
as compared to the gas phase. While the term 'cage' may encourage one to picture a rigid ensemble of solvent 
molecules, the cage effect does not describe the influence of a static entity and is actually somewhat difficult 
to define. 

While all agree that the cage is a concept critical to CIDNP, different researchers vary somewhat in their 
definition of it. Turro et al [17] conceive of a RP as guest and a solvent cage as host in an extremely 
short-lived 'collision complex' with a lifetime of about $10^{-11}$ s. Salikhov et al [12] define the cage as a region of 
effective recombination of two radicals in a RP. They note that radicals may diffuse to the second or third 
coordination sphere and still come back together to give products. Accordingly, the cage effect describes a 
twofold influence of condensed media: the two radicals in a RP are not only in close contact for a longer 
period of time than they would be in the gas phase but are also more likely to re-encounter one another after 
diffusing apart. Goez [18] describes the same effects in more abstract terms. He writes of the cage as a region 
of time and states that two radicals are in the cage so long as they have not lost each other for good. If two 
radicals in a RP diffuse apart but re-encounter, then they can be said to have been in the cage the entire time. 
If the same two radicals do not re-encounter, then they are said to have escaped the cage. These different 
perspectives on the cage are not meant to confuse the reader but are rather intended to present the general idea 
while conveying the complexity and importance of the concept in CIDNP. 

Returning to figure B1.16.4, if the RP is initially in the singlet state, some of the geminate RPs will recombine 
in the cage to give what are called cage or recombination products. In addition to the reformed radical precursor, 
recombination products can also include products from disproportionation reactions which occur in cage. A 
few singlet RPs may escape the cage instead of recombining; as mentioned before, triplet RPs cannot 
recombine, so they will also escape. 

Escaped radicals diffuse to region II, where J is negligible, and may undergo S-$T_0$ mixing as described 
previously. From region II, the radicals may follow any of three different pathways. 

(1) The radicals may re-encounter one another following S-$T_0$ mixing. As they diffuse together into the 
region of large J, the radicals are again constrained to be in either the S or the $T_0$ state. Some fraction of 
RPs will have undergone intersystem crossing. 

(2) An individual radical from the RP may encounter a radical from a different RP to form what are known 
as random RPs or F pairs. F pairs which happen to be in the singlet state have a high probability of 
recombining, so the remaining F pairs will be in the triplet state. Consequently, the initial condition for F 
pairs is the triplet state in nearly all cases. 


(3) An individual radical from the RP may be scavenged by a solvent or another chemical species to form 
diamagnetic products. Because the products are formed following escape from the cage, they are known 
as escape or scavenging products. 

B1.16.2.4 RADICAL PAIR MECHANISM: NET EFFECT 

We have introduced the RP spin Hamiltonian, the mechanism of nuclear spin selective intersystem crossing, 
and the reaction scheme for RPs. We now possess all the tools we need to explain 
qualitatively how nuclear spin polarization arises and manifests itself in a CIDNP spectrum. The radical pair 
mechanism (RPM) may appear in a CIDNP spectrum as a net effect, a multiplet effect, or a combination of both. The net effect is 
observed when the g factor difference between radicals $R_1$ and $R_2$ (Δg) in a radical pair is large compared 
with the hyperfine interaction. The simplest example involves a RP with just one hyperfine interaction, as 
shown in figure B1.16.5. In this example we will set the following conditions: (1) the RP is initially a singlet; 
(2) the hyperfine coupling constant ($a_\mathrm{H}$) is negative; and (3) the g factor for $R_1$ is greater than that for $R_2$. The 
recombination product is formed by in-cage recombination of $R_1$ and $R_2$, and a scavenging product is formed 
by the abstraction of, say, a halogen atom from the solvent following escape from the cage. The scavenging 
product is formed primarily from escaped triplet RPs although it should be noted that escaped singlets could 
also form the same product. 

Figure B1.16.5. An example of the CIDNP net effect for a radical pair with one hyperfine interaction. Initial 
conditions: $g_1 > g_2$; $a_\mathrm{H}$ negative; and the RP is initially singlet. Polarized nuclear spin states and schematic 
NMR spectra are shown for the recombination and scavenging products in the boxes. 


In this simple case, there are just two nuclear spin states, α and β. Equation (B1.16.5) shows the calculation of 
the difference in electron precessional frequencies, 2Q, for nuclear spin states α (equation (B1.16.5a)) and β 
(equation (B1.16.5b)); in these equations β denotes the Bohr magneton, as before. 

$$2Q = \Delta g\,\beta\hbar^{-1}B_0 + \tfrac{1}{2}a_\mathrm{H} \qquad \text{(B1.16.5a)}$$

$$2Q = \Delta g\,\beta\hbar^{-1}B_0 - \tfrac{1}{2}a_\mathrm{H} \qquad \text{(B1.16.5b)}$$

Since Δg is positive and $a_\mathrm{H}$ is negative, Q is larger for the β state than for the α state. Radical pairs in the β 
nuclear spin state will experience a faster intersystem crossing rate than those in the α state with the result that 
more RPs in the β nuclear spin state will become triplets. The end result is that the scavenging product, which 
is formed primarily from triplet RPs, will have an excess of spins in the β state while the recombination 
product, which is formed from singlet RPs, will have an excess of α nuclear spin states. 
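
The sorting of nuclear spin states in this one-proton example can be checked with a short calculation. The Python sketch below evaluates Q for the two nuclear spin states using the one-proton form of the precession-frequency difference; the function name is hypothetical and the numbers are arbitrary illustrative values in angular-frequency units, chosen only so that the Zeeman term is positive and the hyperfine coupling is negative, as in the example.

```python
def two_q_one_proton(zeeman_term, a_h, m):
    """2Q for a radical pair with a single proton: Zeeman difference plus m * a_H."""
    return zeeman_term + m * a_h


# Illustrative numbers in arbitrary angular-frequency units (assumed, not from the text).
ZEEMAN_TERM = 10.0  # delta-g term, positive because g1 > g2
A_H = -4.0          # negative hyperfine coupling constant

q_alpha = two_q_one_proton(ZEEMAN_TERM, A_H, +0.5) / 2  # nuclear spin state alpha
q_beta = two_q_one_proton(ZEEMAN_TERM, A_H, -0.5) / 2   # nuclear spin state beta

faster = "beta" if abs(q_beta) > abs(q_alpha) else "alpha"
print(f"Q(alpha) = {q_alpha}, Q(beta) = {q_beta}; faster S-T0 mixing for {faster}")
```

Reversing the sign of `A_H` in this sketch swaps which nuclear spin state mixes faster, which is the origin of the phase reversal that accompanies a change in the sign of the hyperfine coupling.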

Relative populations for the α and β states are indicated by the thickness of the lines in the diagrams at the 
bottom of figure B1.16.5, and the corresponding CIDNP spectrum is shown below each level diagram. The 
signal for the recombination product, with its excess of α spins, will be in enhanced absorption. This 
enhanced absorption is distinguished from a typical NMR absorptive signal by its abnormal intensity, which 
may be as much as 1000 times greater than an NMR signal from a Boltzmann population difference. The 
scavenging product, with its excess of β spins, appears in emission in the CIDNP spectrum. Herein lies an 
extremely important feature of CIDNP: cage and escape products have opposite phases of polarization. It 
should be straightforward to see that changing the sign of $a_\mathrm{H}$ in the previous example would change the 
values of Q and would ultimately result in a flipping of the phase of the polarization. A rule for predicting 
the phase of the polarization for each product will be presented with the next example. 

A slightly more complex system exhibiting the RPM net effect is presented in figure B1.16.6 [19]. In this 
case, radicals $R_1$ and $R_2$ each have one hyperfine coupling, so the two protons in the recombination product 
originate from different radicals. The four nuclear spin states and allowed transitions for the product are 
shown in figure B1.16.6A along with the NMR spectrum in the absence of spin polarization. Again, we must 
set some initial conditions for this CIDNP example: the RP is initially in the triplet state, both hyperfine 
coupling constants are positive and $g_1$ is greater than $g_2$. The values shown on each level in figure B1.16.6B 
are representative of the absolute values of Q for each nuclear spin state and are proportional to the 
populations of those states. The phase of each transition can be determined by subtracting the Q value of the 
upper level from that of the lower. For transitions 1 and 2, this value is $2a_1(\Delta g)$, which is positive and yields 
absorptive transitions. For transitions 3 and 4, the value $[-2a_2(\Delta g)]$ results in emission transitions. A stick 
plot of the CIDNP spectrum is shown in the figure. For the CIDNP net effect, each line within a multiplet will 
always have the same phase. 

Figure B1.16.6. An example of CIDNP net effect for a radical pair with two hyperfine interactions. Part A 
shows the spin levels and schematic NMR spectrum for unpolarized product. Part B shows the spin levels and 
schematic NMR spectrum for polarized product. Populations are indicated on each level. Initial conditions: 
$g_1 > g_2$; $a_1 > 0$; $a_2 > 0$; spins on different radicals; the RP is initially triplet. 

The phase of a transition in a CIDNP spectrum can be determined using rules developed by Kaptein [20]. The 
rule for the net effect is shown in equation (B1.16.6). For each term, the sign (+ or -) of that value is inserted, 
and the final sign determines the phase of the polarization: plus is absorptive and minus is emissive. The 
variables are defined in the caption to figure B1.16.7. 

Figure B1.16.7. Kaptein's rules for net and multiplet RPM of CIDNP. The variables are defined as follows: μ 
= + for RP formed from triplet precursor or F pairs and - for RP formed from singlet precursor. ε = + for 
recombination (or disproportionation)/cage products and - for scavenge/escape products. $\sigma_{ij}$ = + if nuclei i 
and j were on the same radical and - if nuclei i and j were on different radicals. Δg = sign of ($g_1 - g_2$). $a_i$ = 
sign of hyperfine interaction. $J_{ij}$ = sign of the nuclear spin-spin coupling constant. 

$$\Gamma_{\mathrm{ne}} = \mu\,\varepsilon\,\Delta g\,a_i \qquad \text{(B1.16.6)}$$


Kaptein's rule is applied below to each transition in the example in figure B1.16.6. It is important to choose 
Δg correctly: Δg is equal to $g_1 - g_2$, where $g_1$ describes the radical containing the nucleus of interest (often a 
proton) while $g_2$ is the other radical in the RP. The rule correctly predicts absorptive phase for NMR 
transitions 1 and 2 and emissive for NMR transitions 3 and 4. 

$$\text{for 1 and 2:}\quad \Gamma_{\mathrm{ne}} = (+)(+)(+)(+) = + = \mathrm{A}$$

$$\text{for 3 and 4:}\quad \Gamma_{\mathrm{ne}} = (+)(+)(-)(+) = - = \mathrm{E}$$
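
Because the net-effect rule is just a product of four signs, it is easy to encode. The Python sketch below is one way of doing so; the function name and the convention of passing each sign as +1 or -1 are choices made here for illustration.

```python
def kaptein_net(mu, eps, delta_g, a):
    """Apply the net-effect sign rule: the product mu * eps * delta_g * a.

    Each argument is +1 or -1. Returns 'A' (enhanced absorption) for a
    positive product and 'E' (emission) for a negative one.
    """
    for name, value in (("mu", mu), ("eps", eps), ("delta_g", delta_g), ("a", a)):
        if value not in (1, -1):
            raise ValueError(f"{name} must be +1 or -1")
    return "A" if mu * eps * delta_g * a > 0 else "E"


# Two-proton example: triplet precursor, recombination product, both couplings positive.
print(kaptein_net(+1, +1, +1, +1))  # proton on the radical with the larger g factor
print(kaptein_net(+1, +1, -1, +1))  # proton on the other radical

# One-proton example: singlet precursor, g1 > g2, negative coupling.
print(kaptein_net(-1, +1, +1, -1))  # recombination product
print(kaptein_net(-1, -1, +1, -1))  # scavenging product
```

The last two calls reproduce the earlier one-proton example, in which the recombination product appears in enhanced absorption and the scavenging product in emission.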

B1.16.2.5 RADICAL PAIR MECHANISM: MULTIPLET EFFECT 

The other RPM polarization pattern observed in CIDNP spectra is called the multiplet effect. In contrast to the 
net effect, the multiplet effect occurs when the hyperfine interactions are large compared with Δg. This is best 
explained by example, and the radical pair for a hypothetical case is shown in figure B1.16.8. We note that 
only the recombination product will be considered here. Both radicals are identical and have two protons with 
hyperfine coupling, $H_1$ and $H_2$. The initial conditions for this example are that Δg is zero; the nuclear 
spin-spin coupling constant, $J_{12}$, is positive; $a_1$ is negative; $a_2$ is positive; the RP is initially a singlet; and the 
nuclear spins are both on the same radical. Values proportional to Q are again shown on each nuclear spin 
level in figure B1.16.8. Because Δg is zero, the Zeeman term in the equation for Q is zero and, therefore, the 
value of Q is proportional to the magnitude and the sign of the sum or difference of the hyperfine coupling 
constants, as shown in figure B1.16.8. Assuming that $a_1$ and $a_2$ are roughly equal in magnitude but opposite 
in sign, it should be clear that the αα and ββ nuclear spin states will have very small Q values while αβ and 
βα will have larger Q values. Accordingly, αβ and βα will intersystem cross from singlet to triplet more 
quickly, and these levels will be depleted by escape from the cage relative to the αα and ββ levels. The 
remaining populations are indicated by the width of the bars in the bottom of figure B1.16.8, and the 
transitions from more populated levels to less populated ones are shown. As shown in the stick plot CIDNP 
spectrum, the lines of each multiplet alternate in phase. Because the first line is emissive and the second line is 
absorptive, this pattern is called E/A for emissive/absorptive. 

Figure B1.16.8. Example of CIDNP multiplet effect for a symmetric radical pair with two hyperfine 
interactions on each radical. Part A is the radical pair. Part B shows the spin levels with relative Q values 
indicated on each level. Part C shows the spin levels with relative populations indicated by the thickness of 
each level and the schematic NMR spectrum of the recombination product. 
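
The Q values behind the multiplet example can be tabulated directly. The Python sketch below evaluates Q for the four nuclear spin states of two protons on the same radical with a vanishing Zeeman term; the function name is hypothetical, and the couplings are set to -1 and +1 in arbitrary units purely to represent 'equal magnitude, opposite sign'.

```python
from itertools import product


def q_same_radical(a1, a2, m1, m2, zeeman=0.0):
    """Q = (zeeman + a1*m1 + a2*m2) / 2 for two nuclei on the same radical."""
    return 0.5 * (zeeman + a1 * m1 + a2 * m2)


A1, A2 = -1.0, 1.0  # assumed: equal magnitude, opposite sign, arbitrary units
LABEL = {0.5: "alpha", -0.5: "beta"}

# Map each nuclear spin configuration to its Q value.
q_values = {
    (LABEL[m1], LABEL[m2]): q_same_radical(A1, A2, m1, m2)
    for m1, m2 in product((0.5, -0.5), repeat=2)
}

for state, q in q_values.items():
    print(f"{state[0]:>5} {state[1]:<5}  |Q| = {abs(q):.2f}")
```

The mixed configurations have the larger |Q| and therefore cross to the triplet more quickly, so they are the ones depleted from the recombination product.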


Kaptein's rule for the multiplet effect is useful for predicting the phase of each transition, and it is similar to 
but has more variables than the rule for the net effect. The variables in equation (B1.16.7) are defined in 
figure B1.16.7. A final sign of plus predicts E/A phase while minus predicts A/E. 

$$\Gamma_{\mathrm{me}} = \mu\,\varepsilon\,a_i\,a_j\,J_{ij}\,\sigma_{ij} \qquad \text{(B1.16.7)}$$

The application of Kaptein's rule to the example in figure B1.16.8 is shown below, and it correctly predicts 
E/A multiplets. 

$$\Gamma_{\mathrm{me}} = (-)(+)(-)(+)(+)(+) = + = \mathrm{E/A}$$

One of the most attractive features of the CIDNP multiplet effect is that it allows determination of the sign of 
the J coupling, which is often difficult to do by other methods. 
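
The multiplet rule can be encoded in the same way as a product of six signs. In the Python sketch below the function name and the +1/-1 convention are again illustrative choices; the second call shows how the predicted pattern inverts when only the sign of the nuclear spin-spin coupling is changed, which is what makes the multiplet effect useful for determining that sign.

```python
import math


def kaptein_multiplet(mu, eps, a_i, a_j, j_ij, sigma_ij):
    """Apply the multiplet-effect sign rule to six signs, each +1 or -1.

    Returns 'E/A' for a positive product and 'A/E' for a negative one.
    """
    signs = (mu, eps, a_i, a_j, j_ij, sigma_ij)
    if any(s not in (1, -1) for s in signs):
        raise ValueError("every argument must be +1 or -1")
    return "E/A" if math.prod(signs) > 0 else "A/E"


# Singlet precursor, recombination product, a_1 < 0, a_2 > 0, J_12 > 0, same radical.
print(kaptein_multiplet(-1, +1, -1, +1, +1, +1))

# Same pair, but with a negative nuclear spin-spin coupling.
print(kaptein_multiplet(-1, +1, -1, +1, -1, +1))
```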

B1.16.2.6 EXAMPLES OF CIDNP 

While the stick plot examples already presented show net and multiplet effects as separate phenomena, the 
two can be observed in the same spectrum or even in the same NMR signal. The following examples from the 
literature will illustrate 'real life' uses of CIDNP and demonstrate the variety of structural, mechanistic, and 
spin physics questions which CIDNP can answer. 

Roth et al [10] have used CIDNP to study the structures of vinylcyclopropane radical cations formed from 
precursors such as sabinene (1). 



The radical cation of 1 (1⁺) is produced by a photo-induced electron transfer reaction with an excited electron 
acceptor, chloranil. The major product observed in the CIDNP spectrum is the regenerated electron donor, 1. 

The parameters for Kaptein's net effect rule in this case are that the RP is from a triplet precursor (μ is +), the 
recombination product is that which is under consideration (ε is +) and Δg is negative. This leaves the sign of 
the hyperfine coupling constant as the only unknown in the expression for the polarization phase. Roth et al 
[10] used the phase and intensity of each signal to determine the relative signs and magnitudes of the 
hyperfine coupling constant for each proton in 1⁺. Signals in enhanced absorption indicated negative 
hyperfine coupling constants while emissive signals indicated positive hyperfine coupling constants. 
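
The inference used in this example is the net-effect rule run backwards: with the other three signs known, the observed phase fixes the sign of the hyperfine coupling. A minimal Python sketch of that inversion follows; the function name and the +1/-1 convention are illustrative choices, not part of the original analysis.

```python
def hyperfine_sign_from_phase(phase, mu, eps, delta_g):
    """Infer the sign of a hyperfine coupling from an observed net polarization.

    phase is 'A' (enhanced absorption) or 'E' (emission); mu, eps, and delta_g
    are the known signs (+1 or -1) in the net-effect rule. Because every factor
    is +1 or -1, dividing by a sign is the same as multiplying by it.
    """
    gamma = {"A": 1, "E": -1}[phase]
    return gamma * mu * eps * delta_g


# Triplet precursor, recombination product, negative delta-g, as in this example.
for observed in ("A", "E"):
    sign = hyperfine_sign_from_phase(observed, mu=+1, eps=+1, delta_g=-1)
    print(f"{observed}: hyperfine coupling is {'positive' if sign > 0 else 'negative'}")
```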

The CIDNP spectrum is shown in figure B1.16.1 from the introduction, top trace, while a dark spectrum is 
shown for comparison in figure B1.16.1, bottom trace. Because the sign and magnitude of the hyperfine 
coupling constant can be a measure of the spin density on a carbon, Roth et al [10] were able to use the 
relative spin density of each carbon to determine that the structure of the radical cation 1⁺ is a delocalized 
one, shown below. This example demonstrates the use of CIDNP to determine the signs and relative 
magnitudes of the hyperfine coupling constants and to assign the structure of an intermediate. 

In a case involving both net and multiplet effects, Goez and Sartorius [21] studied the photoreaction of 
triethylamine with various triplet sensitizers containing carbonyl functionalities. In a two-step process, the 
amine (DH) first transfers an electron to the excited sensitizer and forms the aminium cation DH•⁺. The 
aminium cation is then deprotonated to form a neutral α-aminoalkyl radical (D•), which can go on to form 
products. In this example, triethylamine (DH) was reacted with a variety of sensitizers, and 
N,N-diethylvinylamine was the polarized product which was studied. N,N-diethylvinylamine can be formed by two 
different pathways. If the deprotonation step occurs in cage, H⁺ is transferred to the sensitizer, and 
polarization in the product arises from a neutral radical pair, AH• D•. If the deprotonation step occurs out of 
cage, then H⁺ will be abstracted by free amine; in this case, polarization is formed from a radical ion pair, 
A•⁻ DH•⁺. The goal of this work was to determine the intermediates leading to N,N-diethylvinylamine: does the 
deprotonation step occur in cage or out of cage? 

The radical cation and neutral radical derived from triethylamine are shown below. 




[Structures: DH•⁺ and D•] 

DH•⁺ has only one non-negligible hyperfine coupling, $a_{\mathrm{H}\alpha}$ = +19.0 G, while D• has two significant 
hyperfine couplings, $a_{\mathrm{H}\alpha}$ = -13.96 G and $a_{\mathrm{H}\beta}$ = +19.24 G. Clearly, these two radicals will lead to very 
different polarizations in the CIDNP spectrum of both cage and escape products. 


Figure B1.16.9 shows background-free, pseudo-steady-state CIDNP spectra of the photoreaction of 
triethylamine with (a) anthraquinone as sensitizer and (b) and (c) xanthone as sensitizer. Details of the 
pseudo-steady-state CIDNP method are given elsewhere [22]. In trace (a), no signals from the β protons of 
products 1 (recombination) or 2 (escape) are observed, indicating that the products observed result from the 
radical ion pair. Traces (b) and (c) illustrate a useful feature of pulsed CIDNP: net and multiplet effects may 
be separated on the basis of their radiofrequency (RF) pulse tip angle dependence [23]. Net effects are shown 
in trace (b) while multiplet effects can be seen in (c). Both traces show signals from the β protons of products 
1 and 2, indicating that these products were formed from a neutral radical pair intermediate. It was ultimately 
determined that the time scale of the deprotonation step relative to the lifetime of the radical ion pairs 
determined whether products were formed from radical ion pairs or from neutral radical pairs. The energetics 
of the system varied with the sensitizer, and results were compiled for a variety of sensitizers. This example 
illustrates the very common application of CIDNP as a mechanistic tool. 

Figure B1.16.9. Background-free, pseudo-steady-state CIDNP spectra observed in the photoreaction of 
triethylamine with different sensitizers ((a), anthraquinone; (b), xanthone, CIDNP net effect; (c), xanthone, 
CIDNP multiplet effect, amplitudes multiplied by 1.75 relative to the centre trace) in acetonitrile-d₃. The 
structural formulae of the most important products bearing polarizations (1, regenerated starting material; 2, 
N,N-diethylvinylamine; 3, combination product of amine and sensitizer) are given at the top; R denotes the 
sensitizer moiety. The polarized resonances of these products are assigned in the spectra. Reprinted from [21]. 

In an extension of traditional CIDNP methods, Closs and co-workers developed time-resolved CIDNP 
(TR-CIDNP) in the late 1970s [24, 25 and 26]. The initial time-resolved experiments had a time resolution in the 
microsecond range [24], but a nanosecond method was later developed [27]. A typical pulse sequence for 
time-resolved CIDNP involves a series of saturation pulses to remove background signals from equilibrium 
polarization followed by a laser pulse to form the radical pairs. At a preset delay time, τ, after the laser 
flash, a RF pulse is applied, and the FID of the product is acquired. Further details of this experiment are 
given in [26]. 

The first application of the time-resolved CIDNP method by Closs and co-workers involved the Norrish I 
cleavage of benzyl phenyl ketone [24, 25]. Geminate RPs may recombine to regenerate the starting material 
while escaped RPs may form the starting ketone (2), bibenzyl (3), or benzil (4), as shown below. 

Closs et al [25] plotted the polarizations versus time of the starting ketone (2, trace B, emissive signal) and the 
bibenzyl escape product (3, trace A, absorptive signal), as shown in figure B1.16.10. Ketone 2 can be formed 
either by recombination of geminate pairs or the reaction of F pairs; in either case, the polarization will be 
emissive. At the earliest delay time (1 μs), the emissive signal of 2 is already present while no polarization 
from F pairs is apparent because there has not been sufficient time for the diffusion of radicals to occur. At 
later delay times, both absorptive and emissive polarizations grow until they reach a maximum. In order to 
demonstrate that much of the emissive polarization was due to production of 2 from F pairs, a thiol scavenger 
was added to trap the escaped benzoyl radicals. As shown in trace C, the emissive polarization is constant 
from the earliest delay time when the scavenger is present, indicating that the emissive polarization 
from geminate pairs is constant, while that from F pairs grows in with time. This was the first instance in 
which polarization from F pairs was shown to enhance the polarization of geminate products. In addition to 
establishing the utility of the time-resolved CIDNP method, this experiment was the first to demonstrate that 
polarization from cage and escape products could be separated based on the time scale of its production. 

Figure B1.16.10. Intensity of CH₂ resonance as a function of delay time in A, dibenzyl; B, deoxybenzoin; and 
C, deoxybenzoin in presence of thiol scavenger. Reprinted from [25]. 


While the earliest TR-CIDNP work focused on radical pairs, biradicals soon became a focus of study. 
Biradicals are of interest because the exchange interaction between the unpaired electrons is present 
throughout the biradical lifetime and, consequently, the spin physics and chemical reactivity of biradicals are 
markedly different from those of radical pairs. Work by Morozova et al [28] on polymethylene biradicals is a further 
example of how this method can be used to separate net and multiplet effects based on time scale. Figure 
B1.16.11 shows how the cyclic precursor, 2,12-dihydroxy-2,12-dimethylcyclododecanone, cleaves upon 308 
nm irradiation to form an acyl-ketyl biradical, which will be referred to as the primary biradical since it is 
formed directly from the cyclic precursor. The acyl-ketyl primary biradical decarbonylates rapidly 
($k_\mathrm{CO} > 5 \times 10^7\ \mathrm{s}^{-1}$) to form a bis-ketyl biradical, which will be referred to as the secondary biradical. Both the primary 
and secondary biradicals can form a number of diamagnetic products, as shown in figure B1.16.11. 

Figure B1.16.11. Biradical and product formation following photolysis of 
2,12-dihydroxy-2,12-dimethylcyclododecanone. Reprinted from [28]. 


In the TR-CIDNP spectrum, the methyl protons of products IV, V, and VI, formed from the secondary 
biradical, show a combination of net and multiplet polarizations. Morozova et al [28] measured separately the 
time dependence of the net and multiplet polarizations for this group of protons, and the results are shown in 
figure B1.16.12 and figure B1.16.13 respectively. Clearly, the net and multiplet polarizations develop on 
different time scales; while the net polarization is constant after approximately 1 μs, the multiplet polarization 
takes much longer to evolve. It was determined that this difference arises because the net polarization in these 
products of the secondary biradical is actually inherited from the primary biradical, while the multiplet 
polarization is generated in the secondary biradical. If chemical transformation (decarbonylation in this case) 
is fast compared with the rate of intersystem crossing, then a secondary biradical or radical pair may inherit 
polarization from its precursor. This is known as the memory effect in CIDNP, and this work was the first 
report of the memory effect in biradicals. The polarization inherited from the primary biradical is net because 
Δg is nonzero in the primary biradical; because the secondary biradical is symmetric, Δg = 0, and only multiplet 
polarization can be generated. It was also determined in this study that the kinetics of the net effect reflect the 
decay of the T0 level while the multiplet effect corresponds to the decay of the T+ and T− levels; the reasons 
for these observations are beyond the scope of this presentation, but the interested reader is directed to the 
references for additional details. 

Figure B1.16.12. Experimental kinetics of the CIDNP net effect: (A) for the aldehyde proton of the products 
II and III of the primary biradical; (!) for the CH3CH(OH) protons of the products IV, V and VI of the secondary 
biradical. Reprinted from [28]. 

Figure B1.16.13. Kinetics of the CIDNP multiplet effect: (full curve) the calculated CIDNP kinetics for the 
product of disproportionation of the bis-ketyl biradical; (O) experimental kinetics for the CH3CH(OH) protons of 
the products IV, V and VI of the secondary biradical. Reprinted from [28]. 

B1.16.3 CIDEP 


As the electron counterpart to CIDNP, CIDEP can provide different but complementary information on free 
radical systems. Whereas CIDNP involves the observation of diamagnetic products, the paramagnetic 
intermediates themselves are observed in CIDEP studies. Unlike CIDNP, CIDEP does not require chemical 
reaction for the formation of polarization, as we shall see below. In addition to the RPM, three other 
mechanisms may produce electron polarization: the triplet mechanism (TM), the radical-triplet pair 
mechanism (RTPM) and the spin-correlated radical pair (SCRP) mechanism. Some of these mechanisms 
provide information which can lead directly to the structural identification of radical intermediates while all of 
them supply data which may be used to elucidate mechanisms of radical reactions. 

Experimentally, the observation of CIDEP is difficult but not impossible using a commercial steady-state EPR 
spectrometer with 100 kHz field modulation [29]. The success of this particular experiment requires large 
steady-state concentrations of radicals and rather slow spin relaxation, as the time response of the instrument 
is, at best, 10 μs. Therefore, the steady-state method works well for only a limited number of systems and 
generally requires a very strong CW lamp for irradiation. The time response can be shortened if the radicals 
are produced using a pulsed laser and the 100 kHz modulation is bypassed. The EPR signal is then taken 
directly from the preamplifier of the microwave bridge and passed to a boxcar signal averager [30] or 
transient digitizer [31]. At the standard X-band frequency the time response can be brought down to about 60 
ns in this fashion. In this case the overall response becomes limited by the resonant microwave cavity quality 
factor [32]. At higher frequencies such as Q-band, the time response can be limited by laser pulse width or 
preamplifier rise time (< 10 ns) [33]. Using the boxcar or digitizer method, CIDEP is almost always 
observable, and this so-called time-resolved (CW) EPR spectroscopy is the method of choice for many 
practitioners. The 'CW' in the name is used to indicate that the microwaves are always on during the 
experiment, even during the production of the radicals, as opposed to pulsed microwave methods such as 
electron-spin-echo or Fourier-transform (FT) EPR. Significant advantages in sensitivity with similar time 
response are available with FT-EPR, but there are also disadvantages in terms of the spectral width of the 
excitation that limit the application of this technique [34]. The TR (CW) method is the most facile and cost 
effective method for the observation of complete EPR spectra exhibiting CIDEP on the sub-microsecond time 
scale. 
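
The link between cavity quality factor and time response can be illustrated with a rough estimate. The Python sketch below assumes the standard relation for the energy ring-down time of a resonator, tau = Q/(2 pi nu). The loaded Q of 3500 and the frequency of 9.5 GHz are illustrative assumptions, not values taken from the references, and the function name is hypothetical.

```python
import math

def cavity_response_time(q_loaded: float, frequency_hz: float) -> float:
    """Energy ring-down time constant of a resonant cavity, tau = Q / (2*pi*nu)."""
    if q_loaded <= 0 or frequency_hz <= 0:
        raise ValueError("Q and frequency must be positive")
    return q_loaded / (2.0 * math.pi * frequency_hz)

# Assumed, illustrative values: loaded Q of 3500 at an X-band frequency of 9.5 GHz.
tau_x = cavity_response_time(3500.0, 9.5e9)
print(f"Estimated X-band response time: {tau_x * 1e9:.0f} ns")
```

With these assumed values the estimate falls in the tens of nanoseconds, the same order as the X-band response time quoted above. Lowering Q shortens the response, but generally at the expense of sensitivity.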

B1.16.3.1 RADICAL PAIR MECHANISM 

The RPM has already been introduced in our explanation of CIDNP. The only difference is that for the 
electron spins to become polarized, product formation from the geminate RP is not required. Rather, in a 
model first introduced by Adrian, so-called 'grazing encounters' of geminate radical pairs are all that is 
required [35]. Basically the RP must diffuse from a region where the exchange interaction is large to one 
where it is small, then back again. The spin wavefunction evolution that mixes the S and T0 electronic levels 
in the region of small J leads to unequal populations of the S and T0 states in the region of large J. The 
magnitude of the RPM CIDEP is proportional to this population difference. 

As for CIDNP, the polarization pattern is multiplet (E/A or A/E) for each radical if Δg is smaller than the 
hyperfine coupling constants. In the case where Δg is large compared with the hyperfines, net polarization 
(one radical A and the other E or vice versa) is observed. A set of rules similar to those for CIDNP has been 
developed for both multiplet and net RPM in CIDEP (equation (B1.16.8) and equation (B1.16.9)) [36]. In 
both expressions, μ is positive for triplet precursors and negative for singlet precursors. J is always negative 
for neutral RPs, but there is evidence for positive J values in radical ion reactions [37]. In equation (B1.16.8), 
Γ_mult = + predicts E/A while Γ_mult = − predicts A/E. For the net effect in equation (B1.16.9), Γ_net = + 
predicts A while Γ_net = − predicts E. 


$$\Gamma_{\mathrm{mult}} = -\mu J \qquad \text{(B1.16.8)}$$

$$\Gamma_{\mathrm{net}} = +\mu J \, \Delta g \qquad \text{(B1.16.9)}$$
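
The sign rules lend themselves to a small lookup routine. The Python sketch below reads equation (B1.16.8) as Γ_mult = −μJ and equation (B1.16.9) as Γ_net = +μJΔg, and takes only the signs of the inputs into account. The function names are hypothetical. The sign convention for Δg follows whatever convention is adopted for the radical of interest and is not fixed by this sketch.

```python
def sign(x: float) -> int:
    """Return +1 or -1 for a nonzero number."""
    if x == 0:
        raise ValueError("the sign rules need nonzero inputs")
    return 1 if x > 0 else -1

def rpm_multiplet_pattern(mu: float, j: float) -> str:
    """Gamma_mult = -mu*J: '+' predicts E/A, '-' predicts A/E."""
    gamma = -sign(mu) * sign(j)
    return "E/A" if gamma > 0 else "A/E"

def rpm_net_phase(mu: float, j: float, delta_g: float) -> str:
    """Gamma_net = +mu*J*delta_g: '+' predicts A, '-' predicts E."""
    gamma = sign(mu) * sign(j) * sign(delta_g)
    return "A" if gamma > 0 else "E"

# mu = +1 for a triplet precursor, -1 for a singlet precursor; J < 0 for neutral RPs.
triplet_pattern = rpm_multiplet_pattern(+1, -1.0)
singlet_pattern = rpm_multiplet_pattern(-1, -1.0)
print(triplet_pattern, singlet_pattern)
```

For a neutral radical pair with negative J, the routine returns E/A for a triplet precursor and A/E for a singlet precursor, which is the behaviour the rules in the text describe.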


Because the number of grazing encounters is a function of the diffusion coefficient, CIDEP by the RPM 
depends strongly on the viscosity of the solvent and, in general, the RPM becomes stronger with 
increasing viscosity. Pedersen and Freed [39] have developed analytical techniques for the functional form of 
the viscosity dependence of the RPM. 

A typical example of RPM multiplet effects is shown in figure B1.16.2. Upon 308 nm laser irradiation, the 
O–O bond of a fluorinated peroxide dimer is cleaved to yield two radicals plus CO2, as shown in figure B1.16.2. 
The radical signal is split into a doublet by the α-fluorine atom, and each line in the doublet is split into a 
quartet by the adjacent CF3 group (although not all lines are visible in these spectra). The A/E pattern of the 
spectrum in figure B1.16.2A indicates that the RP is formed from a singlet precursor. When benzophenone, a 
triplet sensitizer, is added to the system, the precursor becomes a triplet, and the polarization pattern is now 
E/A, as shown in figure B1.16.2B. These spectra demonstrate the utility of RPM CIDEP in determining the 
spin multiplicity of the precursor. 

B1.16.3.2 TRIPLET MECHANISM 

A second mechanism of CIDEP is the triplet mechanism (TM) [40, 41]. As the name implies, this polarization 
is generated only when the RP precursor is a photoexcited triplet state. The polarization is produced during 
the intersystem crossing process from the first excited singlet state of the molecular precursor. It should be 
noted that this intersystem crossing, which will be explained in detail later, is to be distinguished from that 
described above for RPs. Because the TM polarization is present before the triplet reacts to produce 
the RP, its phase is either net E or net A for both radicals. The origin of the polarization is as follows: in the 
intersystem crossing process, which is dominated by spin-orbit interactions, the most suitable basis set is one 
where the canonical orientations of the triplet state are represented. An example is shown in figure B1.16.14, 
where the directions of the triplet T_x, T_y and T_z basis functions for naphthalene are indicated in their usually 
defined orientations. These are also sometimes called the 'zero-field' basis functions. As the electrons 
undergo the intersystem crossing process, these are the orbital directions they 'see'. This is called the 
molecular frame of reference. 

Figure B1.16.14. Top, the canonical axes for triplet naphthalene. The z-axis is directed out of the plane of the 
paper. Bottom, energy levels and relative populations during the CIDEP triplet mechanism process. See text 
for further details. 

Because the spin-orbit interaction is anisotropic (there is a directional dependence of the 'view' each electron 
has of the relevant orbitals), the intersystem crossing rates from S1 to each triplet level are different. 
Therefore, unequal populations of the three triplet levels result. The T_x, T_y and T_z basis functions can be 
rewritten as linear combinations of the familiar α and β spin functions and, consequently, they can also be 
rewritten as linear combinations of the high-field RP spin wavefunctions T+, T0 and T−, which we have 
already described above. The net polarization generated in the zero-field basis is carried over to the high-field 
basis set and, consequently, the initial condition for the geminate RP is that the populations of the triplet levels 
are not strictly equal (1/3 each). Exactly which triplet level is overpopulated depends on the sign of the zero-field 
splitting parameter D in the precursor triplet state. A representative energy level diagram showing the flow of 
population throughout the intersystem crossing, RP and free radical stages is shown in figure B1.16.14. 

The absolute magnitude of the TM polarization intensity is governed by the rate of rotation of the triplet state 
in the magnetic field. If the anisotropy of the zero-field states is very rapidly averaged (low viscosity), the TM 
is weak. If the experiment is carried out in a magnetic field where the Zeeman interaction is comparable with 
the D value and the molecular tumbling rate is slow (high viscosity), the TM is maximized. Additional 
requirements for a large TM polarization are: (1) the intersystem crossing rates from S1 to T_x, T_y and T_z must 
be fast relative to the RP production step, and (2) the spin relaxation time in the excited triplet state should not 
be too short. An example of the triplet mechanism from work by 
Jent et al [42] is shown in figure B1.16.15. Upon laser flash photolysis, dimethoxyphenylacetophenone 
(DMPA, 5) forms an excited singlet and undergoes fast intersystem crossing and subsequent photocleavage of 
the triplet to form radicals 6 and 7 as shown below. 

Radical 7 can subsequently fragment to form methyl radical (8) and methyl benzoate (9). 

Figure B1.16.15. TREPR spectrum after laser flash photolysis of 0.005 M DMPA (5) in toluene. (a) 0.7 μs, 
203 K, RF power 10 mW; O, lines of CH3 (8), spacing 22.8 G; t, benzoyl (6); remaining lines due to (7). (b) 
2.54 μs, 298 K, RF power 2 mW to avoid nutations, lines of 7 only. Reprinted from [42]. 

Contradictory evidence regarding the reaction to form 8 and 9 from 7 led the researchers to use TREPR to 
investigate the photochemistry of DMPA. Figure B1.16.15A shows the TREPR spectrum of this system at 0.7 
μs after the laser flash. Radicals 6, 7 and 8 are all present. At 2.54 μs, only 7 can be seen, as shown in figure 
B1.16.15B. All radicals in this system exhibit an emissive triplet mechanism. After completing a laser flash 
intensity study, the researchers concluded that production of 8 from 7 occurs upon absorption of a second 
photon and not thermally as some had previously believed. 

B1.16.3.3 RADICAL-TRIPLET PAIR MECHANISM 

In the early 1990s, a new spin polarization mechanism was postulated by Paul and co-workers to explain how 
polarization can be developed in transient radicals in the presence of excited triplet state molecules (Blättler et 
al [43], Blättler and Paul [44], Goudsmit et al [45]). While the earliest examples of the radical-triplet pair 
mechanism (RTPM) involved emissive polarizations similar in appearance to triplet mechanism polarizations, 
cases have since been discovered in which absorptive and multiplet polarizations are also generated by 
RTPM. 

Polarization obtained by RTPM is related to the RPM in that diffusive encounters are still required, but differs 
in that it involves the interaction of a photoexcited triplet state with a doublet state radical. When a doublet 
state (electron spin 1/2) radical is present in high concentration upon production of a photoexcited triplet, the 
doublet and triplet interact to form new quartet and doublet states. When the two species find themselves in 
regions of effective exchange (|J| > 0), a fluctuating dipole-dipole interaction (D) induces transitions between 
states, leading to a population redistribution that is non-Boltzmann, i.e. CIDEP. This explanation of RTPM is 
only valid in regions of moderate viscosity. If the motion is too fast, the assumption of a static ensemble will 
break down. The resulting polarization is either net E or net A depending on the sign of J (there is no 
dependence on the sign of D). This mechanism may also be observable in reactions where a doublet state 
radical produced by photolysis and unreacted triplets might collide. For this to happen, the triplet lifetimes, 
radical-triplet collision frequencies and triplet spin relaxation rates need to be of comparable time scales. 

Figure B1.16.16 shows an example of RTPM in which the radical species is TEMPO (10), a stable nitroxide 
radical, while the triplet state is produced by photoexcitation of benzophenone (11) [45]. 

Figure B1.16.16. TREPR spectrum of TEMPO radicals in 1,2-epoxypropane solution with benzophenone, 
1 μs after 308 nm laser flash. Reprinted from [45]. 

The three-line spectrum with a 15.6 G hyperfine splitting reflects the interaction of the TEMPO radical with the 
nitrogen nucleus (I = 1); the benzophenone triplet cannot be observed because of its short relaxation times. 
The spectrum shows strong net emission with weak E/A multiplet polarization. Quantitative analysis of the 
spectrum was shown to match a theoretical model which described the size of the polarizations and their 
dependence on diffusion. 
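
The three-line pattern follows from the 2I + 1 rule for a single nucleus of spin I. The Python sketch below is a first-order illustration only. It places the lines at offsets of a·m_I from the spectrum centre, with m_I running from −I to +I. The function name is hypothetical, and second-order shifts are ignored.

```python
def hyperfine_offsets(nuclear_spin: float, coupling_gauss: float) -> list:
    """First-order offsets (in gauss) of the 2I+1 hyperfine lines from one nucleus."""
    n_lines = int(round(2 * nuclear_spin + 1))
    # m_I takes the values -I, -I+1, ..., +I; each line sits at a * m_I.
    return [(-nuclear_spin + k) * coupling_gauss for k in range(n_lines)]

# TEMPO: one nitrogen nucleus with I = 1 and the 15.6 G splitting quoted in the text.
tempo_lines = hyperfine_offsets(1, 15.6)
print(tempo_lines)  # [-15.6, 0.0, 15.6]
```

The same function with a nuclear spin of 1/2 returns the two offsets of a doublet, as for the α-fluorine splitting discussed under the radical pair mechanism.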

B1.16.3.4 SPIN CORRELATED RADICAL PAIR MECHANISM 


The fourth and final CIDEP mechanism results from the observation of geminate radical pairs when they are 
still interacting, i.e. there is a measurable dipolar or exchange interaction between the components of the RP 
at the time of measurement. It is called the spin correlated radical pair (SCRP) mechanism and is found under 
conditions of restricted diffusion such as micelle-bound RPs [46, 47] or covalently linked biradicals [48] and 
also in solid state structures such as photosynthetic reaction centres [49] and model systems [50]. In this 
mechanism additional lines in the EPR spectrum are produced due to the interaction. If the D or J value is 
smaller than the hyperfines, then the spectrum is said to be first order, with each individual hyperfine line split 
by 2 J or 2D into a doublet. The most unusual and immediately recognizable feature of the SCRP mechanism 
with small interactions is that each component of the doublet receives an opposite phase. For triplet precursors 
and negative J values, which is the common situation, the doublets are E/A. The level diagram in figure 
B1.16.17 shows the origin of the SCRP polarization for such a system, considering only one hyperfine line. 
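
The first-order SCRP pattern can be sketched as a stick spectrum. The Python example below splits each hyperfine line into two components separated by 2|J| and assigns them opposite phases, with E/A for a triplet precursor and negative J as stated in the text. The assumption that reversing either the precursor multiplicity or the sign of J reverses the pattern is made for illustration, and the function name and units are hypothetical.

```python
def scrp_first_order_sticks(line_positions, j_coupling, precursor="triplet"):
    """Split each hyperfine line into an antiphase doublet separated by 2|J|.

    Returns a list of (position, phase) tuples, phase being 'E' or 'A'.
    Positions and J must share the same units (for example gauss).
    """
    if j_coupling == 0:
        raise ValueError("J must be nonzero for an SCRP doublet")
    if precursor not in ("triplet", "singlet"):
        raise ValueError("precursor must be 'triplet' or 'singlet'")
    mu = 1 if precursor == "triplet" else -1
    # Triplet precursor with negative J gives E/A (low-field E, high-field A);
    # reversing either sign is assumed to reverse the pattern.
    low_field_emissive = (mu * j_coupling) < 0
    phases = ("E", "A") if low_field_emissive else ("A", "E")
    sticks = []
    for x in sorted(line_positions):
        sticks.append((x - abs(j_coupling), phases[0]))
        sticks.append((x + abs(j_coupling), phases[1]))
    return sticks

# Two hypothetical hyperfine lines 10 G apart, J = -1 G.
triplet_sticks = scrp_first_order_sticks([0.0, 10.0], -1.0, "triplet")
singlet_sticks = scrp_first_order_sticks([0.0, 10.0], -1.0, "singlet")
print([phase for _, phase in triplet_sticks])
print([phase for _, phase in singlet_sticks])
```

Under these assumptions a triplet-born pair gives an E/A/E/A sequence across the spectrum and a singlet-born pair gives A/E/A/E. The sketch applies only while J is small compared with the hyperfine couplings.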


Figure B1.16.17. Level diagram showing the origin of SCRP polarization. 

When the J or D coupling exceeds the hyperfine couplings, the spectrum becomes second order and is much 
more complex. The lines alternate in E and A phase and, if J or D becomes even larger or becomes 
comparable with the Zeeman interaction, a net emission appears from S–T− mixing. In second-order spectra 
the J coupling must be extracted from the spectrum by computer simulation. In terms of line positions and 
relative intensities, these spectra have a direct analogy to NMR spectroscopy: the first-order spectrum is the 
equivalent of the AX NMR problem (two sets of doublets), while the second-order spectrum is analogous to 
the AB ('roof' effect, intermediate J values) or AA' (convergence of lines, large J values) nuclear spin system 
[51]. 

The SCRP polarization pattern can be complicated by other factors: chemical reaction can deplete the middle 
energy levels in Scheme Y as they have the most singlet character and are thus more likely to react upon 
encounter. Spin relaxation via either correlated (dipole-dipole) or uncorrelated (g factor or hyperfine 
anisotropy) mechanisms can also redistribute populations on the TREPR time scale [52]. The most important 
relaxation mechanism is that due to modulation of the exchange interaction caused by conformational motion 
which changes the inter-radical distance on the EPR time scale. Here both the T1 and T2 relaxation processes 
are important [53]. Interestingly, J modulation is also the process by which RPM is produced, and recently it 
has been demonstrated that at certain viscosities, both RPM and SCRP polarization patterns can be observed 
simultaneously in both micellar [54] and biradical-type RPs [55]. The presence of SCRP polarization in 
biradicals has enabled much information to be obtained regarding weak electronic couplings in flexible 
systems as a function of molecular structure, solvent, and temperature [56, 57, 58 and 59]. The spin 
polarization observed in EPR spectra of photosynthetic reaction centres has also proven informative in 
relating structure to function in those systems, especially in comparing structural parameters measured 
magnetically to those found by other methods such as x-ray crystallography [60]. 

Most observations of SCRP have been from triplet precursors, but Fukuju et al [61] have observed singlet- 
born SCRP upon photolysis of tetraphenylhydrazine in sodium dodecyl sulfate (SDS) micelles. The 
tetraphenylhydrazine (12) cleaves to form two diphenylamino (DPA) radicals, as shown below. 

Figure B1.16.18 shows TREPR spectra of this system in SDS micelles at various delay times. The A/E/A/E 
pattern observed at early delay times is indicative of a singlet-born SCRP. Over time, a net absorptive 
component develops and, eventually, the system inverts to an E/A/E/A pattern at late delay times. The long 
lifetime of this SCRP indicates that the DPA radicals are hydrophobic enough that they prefer to remain in the 
micelle rather than escape. The time dependence of the spectra can be described by a kinetic model which 
considers the recombination process and the relaxation between all states of the RP. The reader is directed to 
the reference for further details. 

Figure B1.16.18. TREPR spectra observed after laser excitation of tetraphenylhydrazine in an SDS micelle at 
room temperature. Reprinted from [61]. 

B1.16.3.5 FURTHER EXAMPLES OF CIDEP 

While each of the previous examples illustrated just one of the electron spin polarization mechanisms, the 
spectra of many systems involve polarizations from multiple mechanisms or a change in mechanism with 
delay time. 

Work by Koga et al [62] demonstrates how the polarization mechanism can change upon alteration of the 
chemical environment. Upon laser flash photolysis, excited xanthone abstracts a hydrogen atom from an alcohol 
solvent, cyclohexanol in this case. The xanthone ketyl radical (XnH) and the alcoholic radical (ROH) exhibit 
E/A RPM polarization with slight net emission, as shown in figure B1.16.19(a). Upon addition of HCl to the 
cyclohexanol solution, the same radicals are observed, but the polarization is now entirely absorptive and is 
attributed to the triplet mechanism, as shown in figure B1.16.19(b). It was determined that both H+ and Cl− or HCl 
molecules must be present for this change in mechanism to occur, and the authors postulated that the formation of a 
charge transfer complex between xanthone and HCl in their ground state might be responsible for the observed 
change in polarization. This curious result demonstrates how a change in the CIDEP mechanism can yield 
information about chemical changes which may be occurring in the system. 



Figure B1.16.19. (a) CIDEP spectrum observed in the photolysis of xanthone (1.0 x 10 M) in cyclohexanol 
at room temperature. The stick spectra of the ketyl and cyclohexanol radicals with RPM polarization are 
presented. (b) CIDEP spectrum after the addition of hydrochloric acid (4.1 vol%; HCl 0.50 M) to the solution 
above. The stick spectra of the ketyl and cyclohexanol radicals with absorptive TM polarization are presented. 
The bold lines of the stick spectra of the cyclohexanol radical show the broadened lines due to ring motion of 
the radical. Reprinted from [62]. 

Utilizing FT-EPR techniques, van Willigen and co-workers have studied the photoinduced electron transfer 
from zinc tetrakis(4-sulfonatophenyl)porphyrin (ZnTPPS) to duroquinone (DQ) to form ZnTPPS3− and DQ− 
in different micellar solutions [34, 63]. Spin-correlated radical pairs [ZnTPPS3− ... DQ−] are formed initially, 
and the SCRP lifetime depends upon the solution environment. The ZnTPPS3− is not observed due to its short 
T2 relaxation time, but the spectra of DQ− allow for the determination of the location and stability of reactant 
and product species in the various micellar solutions. While DQ is always located within the micelle, the 
location of ZnTPPS and free DQ− depends upon the micellar environment. 

Figure B1.16.20 shows spectra of DQ− in a solution of TX100, a neutral surfactant, as a function of delay 
time. The spectra are qualitatively similar to those obtained in ethanol solution. At early delay times, the 
polarization is largely TM while RPM increases at later delay times. The early TM indicates that the reaction 
involves ZnTPPS triplets while the A/E RPM at later delay times is produced by triplet excited-state electron 
transfer. Calculation of relaxation times from spectral data indicates that in this case the ZnTPPS porphyrin 
molecules are in the micelle, although some may also be in the hydrophobic mantle of the micelle. Further, 
lineshape and polarization decay analyses indicate that free DQ− radicals move from the micelle into the 
aqueous phase. The lack of observation of spin-correlated radical pairs indicates that they have dissociated 
prior to data acquisition. Small out-of-phase signal contributions at the earliest delay times show that the 
radical pair lifetime in TX100 solution is approximately 100 ns. 

FT-EPR spectra of the ZnTPPS/DQ system in a solution of cetyltrimethylammonium chloride (CTAC), a 
cationic surfactant, are shown in figure B1.16.21. As in the TX100 solution, both donor and acceptor are 
associated with the micelles in the CTAC solution. The spectra of DQ− at delays after the laser flash of less 
than 5 μs clearly show polarization from the SCRP mechanism. While SCRPs were too short-lived to be 
observed in TX100 solution, they clearly have a long lifetime in this case. Van Willigen and co-workers 
determined that the anionic radicals ZnTPPS3− and DQ− remain trapped in the cationic micelles, i.e. an 
electrostatic interaction is responsible for the extremely long lifetime of the [ZnTPPS3− ... DQ−] 
spin-correlated radical pair. The spectrum at delay times greater than 5 μs is again due to free DQ−. Linewidth 
analysis and relaxation time calculations indicate that the free DQ− remains trapped in the cationic micelles. 
These results demonstrate the use of the CIDEP mechanisms in helping to characterize the physical 
environments of free radicals. In both cases shown here, spectral analysis allowed a determination of the 
lifetime of the initial SCRP and the location of the porphyrin and the free DQ−. 

Figure B1.16.20. FT-EPR spectra of photogenerated DQ− in TX100 solution for delay times between laser 
excitation of ZnTPPS and microwave pulse ranging from 20 ns to 11 μs. The central hyperfine line (M = 0) is 
at ≈ −4.5 MHz. Reprinted from [63]. 

Figure B1.16.21. FT-EPR spectra of photogenerated DQ− in CTAC solution for delay times between laser 
excitation of ZnTPPS and microwave pulse ranging from 50 ns to 10 μs. The central hyperfine line (M = 0) is 
at ≈ 7 MHz. Reprinted from [63]. 

Figure B1.16.22 shows a stick plot summary of the various CIDEP mechanisms and the expected polarization 
patterns for the specific cases detailed in the caption. Each mechanism clearly manifests itself in the spectrum 
in a different and easily observable fashion, and so qualitative deductions regarding the spin multiplicity of 
the precursor, the sign of J in the RP and the presence or absence of SCRPs can immediately be made by 
examining the spectral shape. Several types of quantitative information are also available from the spectra. 
For example, if the molecular structure of one or both members of the RP is unknown, the hyperfine coupling 
constants and g-factors can be measured from the spectrum and used to characterize them, in a fashion similar 
to steady-state EPR. Sometimes there is a marked difference in spin relaxation times between two radicals, 
and this can be measured by collecting the time dependence of the CIDEP signal and fitting it to a kinetic 
model using modified Bloch equations [64]. 

Figure B1.16.22. Schematic representations of CIDEP spectra for hypothetical radical pair CH3 + R. Part A 
shows the A/E and E/A RPM. Part B shows the absorptive and emissive triplet mechanism. Part C shows the 
spin-correlated RPM for cases where J ≪ a_H and J ≫ a_H. 

If the rate of chemical decay of the RP is desired, the task is complex because the majority of the CIDEP 
signal decays via relaxation pathways on the 1–10 μs time scale, whereas chemical reaction nominally takes 
about an order of magnitude longer than this. There are two ways around this problem. The first 
is to use a transient digitizer or FT-EPR and signal average many times to improve the signal-to-noise ratio at 
long delay times where chemical reaction dominates the decay trace. The second is to return to the steady- 
state method described above and run what is called a 'kinetic EPR' experiment, where the light source is 
suddenly interrupted and the EPR signal decay is collected over a very long time scale. The beginning of the 
trace may contain both relaxation of CIDEP intensity as well as chemical decay; however, the tail end of this 
trace should be dominated by the chemical reaction rates. Much use has been made of kinetic EPR in 
measuring free radical addition rates in polymerization reactions [65, 66]. 
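
The separation of time scales can be made concrete with a two-component decay model. The Python sketch below is illustrative only. It assumes that both the spin relaxation and the chemical decay are single exponentials (radical termination is often second order in practice), and the amplitudes and time constants are invented values chosen to match the orders of magnitude mentioned in the text. The function names are hypothetical.

```python
import math

def cidep_trace(t_us, pol_amp=50.0, t1_us=2.0, chem_amp=1.0, chem_lifetime_us=20.0):
    """Return (relaxation, chemistry) contributions to the signal magnitude at t_us."""
    relaxation = pol_amp * math.exp(-t_us / t1_us)             # decay of spin polarization
    chemistry = chem_amp * math.exp(-t_us / chem_lifetime_us)  # assumed first-order chemical decay
    return relaxation, chemistry

def chemical_fraction(t_us, **params):
    """Fraction of the total signal magnitude due to the chemical component."""
    relaxation, chemistry = cidep_trace(t_us, **params)
    return chemistry / (relaxation + chemistry)

for t in (1.0, 10.0, 40.0):
    print(f"t = {t:5.1f} us: chemical fraction = {chemical_fraction(t):.3f}")
```

With these assumed parameters the early part of the trace is almost entirely polarization decay, while the tail is governed by the chemical lifetime. This is why both approaches described above rely on good signal-to-noise at long delay times.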

From SCRP spectra one can always identify the sign of the exchange or dipolar interaction by direct 
examination of the phase of the polarization. Often it is possible to quantify the absolute magnitude of D or J 
by computer simulation. The shape of SCRP spectra is very sensitive to dynamics, so temperature and 
viscosity dependencies are informative when knowledge of relaxation rates or of competition between RPM and 
SCRP mechanisms is desired. Much use of SCRP theory has been made in the field of photosynthesis, where 
structure/function relationships in reaction centres have been connected to their spin physics in considerable 
detail [67, 68]. 

B1.17 Microscopy: electron (SEM and TEM) 


Rasmus R Schröder and Martin Müller 


ABBREVIATIONS 

2D       two-dimensional 
3D       three-dimensional 
ssCCD    slow-scan charge coupled device 
BSE      backscattered electrons 
CTF      contrast transfer function 
DQE      detection quantum efficiency 
E        electron energy 
E0       electron rest energy 
EDX      energy dispersive x-ray detection spectroscopy 
EELS     electron energy loss spectroscopy 
EFTEM    energy filtering transmission electron microscope 
EM       electron microscope/microscopy 
EPMA     electron probe micro analysis 
ESEM     environmental scanning electron microscope 
ESI      electron spectroscopic imaging 
ESR      electron spin resonance 
FEG      field emission gun 
IP       imaging plate 
LVSEM    low-voltage scanning electron microscope 
MTF      modulation transfer function 
NMR      nuclear magnetic resonance 
PSF      point spread function 
SE       secondary electron 
SEM      scanning electron microscope 
STEM     scanning transmission electron microscope 
TEM      transmission electron microscope 


B1.17.1 INTRODUCTION 


The electron microscope (EM) was developed in the 1930s primarily as an imaging device which 
exceeded the resolving power of the light microscope by several orders of magnitude. With the evolution 
towards dedicated instruments designed to answer specific structural and analytical questions, electron 
microscopy (EM) has grown into a heterogeneous field of electron beam devices. These allow the study of 
the interaction of electrons with the sample, which can subsequently be interpreted as information about 
object structure or chemical composition. Therefore, EM must be compared to other high-resolution 
diffraction methods, such as x-ray or neutron scattering, or to spectroscopic techniques such as electron 
spin resonance spectroscopy (ESR) and nuclear magnetic resonance spectroscopy (NMR). More recent, 
non-diffractive techniques include scanning tunnelling microscopy (STM) and atomic force microscopy 
(AFM) (for a detailed discussion see chapter B1.19). 

All these methods are used today to obtain structural and analytical information about the object (see also 
the specific chapters about these techniques). In the case of structural studies, x-ray crystallography is the 
method of choice if suitable three-dimensional (3D) crystals of the object are available. In fact, x-ray 
crystallography has provided a vast number of atomic structures of inorganic compounds or organic 
molecules. The main advantage of EM, however, is the possibility of directly imaging almost any sample, 
from large biological complexes consisting of many macromolecules down to columns of single atoms in a 
solid material. With modern instruments and applying specific preparation techniques, it is even possible 
to visualize beam-sensitive organic material at molecular resolution (of the order of 3-4 Å). Imaging at
atomic resolution is almost routine in materials science. Today, some challenges remain, including the
combination of sub-Ångström resolution with chemical analysis and the ability to routinely reconstruct the
complete 3D spatial structure of a given sample. 


The history of EM (for an overview see table B1.17.1) can be interpreted as the development of two
concepts: the electron beam either illuminates a large area of the sample ('flood-beam illumination', as in 
the typical transmission electron microscope (TEM) imaging using a spread-out beam) or just one point, 
i.e. focused to the smallest spot possible, which is then scanned across the sample (scanning transmission 
electron microscopy (STEM) or scanning electron microscopy (SEM)). In both situations the electron 
beam is considered as a matter wave interacting with the sample and microscopy simply studies the 
interaction of the scattered electrons. 


Table B1.17.1. Instrumental development of electron microscopy.

Year   Event                                                              Reference

1926   Busch focuses an electron beam with a magnetic lens
1931   Ruska and colleagues build the first TEM prototype                 Knoll and Ruska (1932) [79]
1935   Knoll proves the concept of SEM
1938   von Ardenne builds the first SEM prototype
1939   Siemens builds the first commercial TEM
1965   Cambridge Instruments builds the first commercial SEM
1968   Crewe and colleagues introduce the FEG as electron beam source     Crewe et al (1968) [80]
1968   Crewe and colleagues build the first STEM prototype                Crewe et al (1968) [81]
1995   Zach proves the concept of a corrected LVSEM                       Zach (1995) [10]
1998   Haider and colleagues prove the concept of the TEM spherical       Haider et al (1998) [55]
       aberration corrector


In principle, the same physico-chemical information about the sample can be obtained from both illumination 
principles. However, the difference of the achievable spatial resolutions of both illumination principles 
illustrates the general difference of the two approaches. Spatial resolution in the case of flood-beam 
illumination depends on the point spread function (PSF) of the imaging lens and detector system and — for a 
typical TEM sample — on the interference effects representing phase shifts of the scattered electron wave. In 
general, resolution is described as a global phenomenon. For the scanned beam, all effects are local and 
confined to the illuminated spot. Thus, spatial resolution of any event is identical to the achievable spot 
diameter. Therefore, all steps in the further development of EM involve either the improvement of the PSF 
(either by improving sample preparation, imaging quality of the electron lenses, or improving spatial 
resolution of the electron detection devices) or the improvement of the spot size (by minimizing the size of the 
electron source using a field emission gun (FEG) and by improving the imaging quality of the electron 
lenses). 

In general, EM using a focused beam provides higher spatial resolution if the spot size of the scanning beam 
is smaller than the delocalization of the event studied: consider, for example, inelastic scattering events which,
for signal-to-noise ratio (SNR) reasons, can be imaged in energy-filtering TEMs (EFTEM) at 1-2 nm
resolution. In a dedicated STEM with an FEG electron source, a localization of the inelastic event comparable
to the actual probe size of e.g. 2 Å can be expected. Moreover, this effect is enhanced by the electron
detectors used in TEM and S(T)EM. For scanned-beam microscopy, detectors do not resolve spatial
information; instead, they are designed for the highest detection quantum efficiency (DQE almost ideal). Such an
ideal detection has only recently been reached for TEM by the use of slow-scan charge-coupled device 
(ssCCD) cameras or imaging plates (IP). However, their spatial resolution is only moderate, compared to 
conventional, electron-sensitive photographic material. 


For several reasons, such as ease of use, cost, and practicability, TEM today is the standard instrument for 
electron diffraction or the imaging of thin, electron-transparent objects. Especially for structural imaging at 
atomic level (spatial resolution of about 1 Å) the modern, aberration-corrected TEM seems to be the best
instrument. SEM provides the alternative for imaging the surface of thick bulk specimens. Analytical 
microscopy can either be performed using a scanning electron probe in STEM and SEM (as for electron probe 
micro-analysis (EPMA), energy-dispersive x-ray spectroscopy (EDX) and electron energy loss spectroscopy 
(EELS)) or energy-selective flood-beam imaging in EFTEM (as for image-EELS and electron spectroscopic 
imaging (ESI)). The analytical EM is mainly limited by the achievable probe size and the detection limits of 
the analytical signal (number of inelastically scattered electrons or produced characteristic x-ray quanta). The 
rest of this chapter will concentrate on the structural aspects of EM. Analytical aspects are discussed in more 
detail in specialized chapters (see, for example, B1.6).

It is interesting to note the analogy of developments in light microscopy during the last few decades. The 
confocal microscope as a scanning beam microscope exceeds by far the normal fluorescence light microscope 
in resolution and detection level. Very recent advances in evanescent wave and interference microscopy
promise even higher resolution (see B1.18).

EM has been used in a wide variety of fields, from material sciences to cell and structural biology or medical 
research. In general, EM can be used for any high-resolution imaging of objects or their analytical probing. 
Modern STEM and SEM instruments provide probe sizes of a few Ångström, and modern TEM instruments reach a
sub-Ångström information limit. However, specimen properties and
sample preparation limit the achievable resolution. Typical resolutions obtained today range from atomic detail
for solid materials, through molecular detail with resolution of the order of 3-5 Å for crystalline biological samples,
to about 1-2 nm for individual particles without intrinsic symmetry. Recent publications on the
different aspects of EM include Williams and Carter ([1], a general text covering all the modern aspects of EM
in materials science), the textbooks by Reimer ([2, 3 and 4], detailed texts on theory, instrumentation and
application) and, for the most complete discussion of all electron-optical and theoretical aspects, Hawkes
and Kasper [5]. Additional research papers on specialized topics are referenced in the text.


B1.17.2 INTERACTION OF ELECTRONS WITH MATTER AND IMAGING 
OF THE SCATTERING DISTRIBUTION 

The interaction of electrons with the specimen is dominated by the Coulomb interaction of charged particles. 
For a summary of possible charge-charge interactions see figure B1.17.1. Elastic scattering by the Coulomb
potential of the positively charged atomic nucleus is most important for image contrast and electron 
diffraction. This scattering potential is well localized, leads to large scattering angles and yields high- 
resolution structural information from the sample. In contrast, interactions with the atomic electrons lead to an 
energy loss of the incident electron by the excitation of different energy states of the sample, such as phonon 
excitation, plasmon excitation or inner-shell ionization (see table B1.17.2). Inelastic scattering processes are
not as localized as the Coulomb potential of the nucleus, leading to smaller scattering angles (inelastic 
forward scattering), and are in general not used to obtain high-resolution structural information. Instead, 
inelastic scattering provides analytical information about the chemical composition and state of the sample. 


Table B1.17.2. Electron-specimen interactions.

                        Elastic scattering               Inelastic scattering

Where                   Coulomb potential of nucleus     At atomic shell electrons

Scattering potential    Localized                        Less localized

Scattering angles       Large (> 10 mrad)                Smaller (< 10 mrad)
(E = 100 keV)

Application             High-resolution signal           Analytical signal (TEM, STEM, SEM)
                        (TEM, STEM)
                        Back-scattering of electrons     Emission of secondary electrons
                        (BSE signal in SEM)              (SE signal in SEM)

Used effects                                             Phonon excitation (20 meV-1 eV)
                                                         Plasmon and interband excitations (1-50 eV)
                                                         Inner-shell ionization
                                                         (ΔE = ionization energy loss)
                                                         Emission of x-ray (continuous/characteristic,
                                                         analytical EM)


The ratio of elastically to inelastically scattered electrons and, thus, their importance for imaging or analytical
work, can be calculated from basic physical principles: consider the differential elastic scattering cross section

(B1.17.1)

with the characteristic screening angle $\theta_0 = \lambda/(2\pi R)$, where $\theta$ denotes the scattering angle, $E$ is the electron
energy, $E_0$ the electron rest energy, $+eZ$ is the charge of the nucleus, $R = a_H Z^{-1/3}$, and $a_H = 0.0529$ nm is the
Bohr radius. Compare this to the inelastic differential scattering cross section

(B1.17.2)

with the characteristic inelastic scattering angle $\theta_E = (\Delta E/E)\,(E+E_0)/(E+2E_0)$ for a given energy loss $\Delta E$.
For large scattering angles, $\theta \gg \theta_0, \theta_E$, the ratio

(B1.17.3)

only depends on the atomic number $Z$, whereas for small angles

(B1.17.4)

for all $Z$. The total scattering cross sections are found by integrating the above equations [6]

(B1.17.5)

or experimentally [7]

(B1.17.6)

Figure B1.17.1. Schematic of the electron-specimen interactions and their potential use for structural and
analytical studies.


Equations (B1.17.1) to (B1.17.6) indicate that inelastic scattering is most important for light atoms, whereas elastic
scattering dominates for large scattering angles and heavy atoms. Therefore, the high-resolution image
contrast for TEM and STEM, which is given by electrons scattered to large angles, is dominated by the elastic
scattering process. Inelastically scattered electrons are treated either as background, or separated from the
elastic image by energy-dispersive spectrometers.
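
The angular behaviour described above can be checked numerically. The following is a minimal sketch, not the chapter's own derivation: it assumes the textbook screened-Coulomb (Wentzel) form for the elastic and the Lenz-type form for the inelastic differential cross section, as found in standard texts such as [2]. These explicit expressions, the function names and the chosen energy loss of 20 eV are all assumptions made for illustration. Only the definitions of the screening angle, the screening radius R, the inelastic scattering angle and the Bohr radius are taken from the text.

```python
import math

A_H = 0.0529          # Bohr radius in nm (value quoted in the text)
E0_KEV = 511.0        # electron rest energy in keV
HC_KEV_NM = 1.23984   # h*c in keV nm

def wavelength_nm(e_kev):
    """Relativistic electron wavelength for kinetic energy e_kev."""
    return HC_KEV_NM / math.sqrt(e_kev * (e_kev + 2.0 * E0_KEV))

def screening(z, e_kev):
    """Return (R, theta_0) with R = a_H Z^(-1/3) and theta_0 = lambda/(2 pi R)."""
    r = A_H * z ** (-1.0 / 3.0)
    return r, wavelength_nm(e_kev) / (2.0 * math.pi * r)

def elastic_dcs(theta, z, e_kev):
    """ASSUMED Wentzel form of the elastic differential cross section (nm^2/sr)."""
    r, theta0 = screening(z, e_kev)
    rel = (1.0 + e_kev / E0_KEV) ** 2
    return 4.0 * z ** 2 * r ** 4 * rel / (A_H ** 2 * (1.0 + (theta / theta0) ** 2) ** 2)

def inelastic_dcs(theta, z, e_kev, de_kev):
    """ASSUMED Lenz-type inelastic differential cross section (nm^2/sr)."""
    lam = wavelength_nm(e_kev)
    _, theta0 = screening(z, e_kev)
    theta_e = de_kev / e_kev * (e_kev + E0_KEV) / (e_kev + 2.0 * E0_KEV)
    s = theta ** 2 + theta_e ** 2
    rel = (1.0 + e_kev / E0_KEV) ** 2
    prefactor = lam ** 4 * rel / (4.0 * math.pi ** 4 * A_H ** 2)
    return prefactor * z * (1.0 - (1.0 + s / theta0 ** 2) ** -2) / s ** 2

def ratio(theta, z, e_kev=100.0, de_kev=0.02):
    """Inelastic-to-elastic ratio of the differential cross sections."""
    return inelastic_dcs(theta, z, e_kev, de_kev) / elastic_dcs(theta, z, e_kev)

if __name__ == "__main__":
    for z in (6, 29, 79):
        print(z, ratio(0.5, z) * z, ratio(1e-4, z))
```

Under these assumptions the ratio at large angles approaches 1/Z, so it depends only on the atomic number, while at very small angles it exceeds unity for light and heavy atoms alike, in line with the qualitative statements in the text.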

In the case of SEM, both elastic and inelastic processes contribute to image contrast: elastic scattering to large 
angles (multiple elastic scattering resulting in a large scattering angle) produces backscattered electrons 
(BSE), which can be detected above the surface of the bulk specimen. Inelastic collisions can excite atomic 
electrons above the Fermi level, i.e. more energy is transferred to the electron than it would need to leave the 
sample. Such secondary electrons (SE) are also detected, and used to form SEM images. As will be discussed 
in section B1.17.4 (specimen preparation), biological material with its light atom composition is often stained or
coated with heavy metal atoms to increase either the elastic scattering contrast in TEM or the BSE signal in 
SEM. Unstained, native biological samples generally produce only little image contrast. 

Inelastic scattering processes are not used for structural studies in TEM and STEM. Instead, the signal from 
inelastic scattering is used to probe the electron-chemical environment by interpreting the specific excitation 
of core electrons or valence electrons. Therefore, inelastic excitation spectra are exploited for analytical EM. 

Next we will concentrate on structural imaging using only elastically scattered electrons. To obtain the 
structure of a scattering object in TEM it is sufficient to detect or image the scattering distribution. Consider 
the scattering of an incident plane wave $\psi(\mathbf{r}) = 1$ on an atomic potential $V(\mathbf{r})$. The scattered, outgoing wave
will be described by the time-independent Schrödinger equation. Using Green's functions, the wave function
can be written as 


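$$\psi(\mathbf{r}) = \exp(i\mathbf{k}\cdot\mathbf{r}) + \frac{m}{2\pi\hbar^{2}} \int \frac{\exp(ik|\mathbf{r}-\mathbf{r}'|)}{|\mathbf{r}-\mathbf{r}'|}\, V(\mathbf{r}')\,\psi(\mathbf{r}')\,\mathrm{d}\mathbf{r}' \tag{B1.17.7}$$

where $\mathbf{k}$ denotes the initial wave vector and $m$ is the electron mass. For large distances from the
scattering centre, $r \gg r'$, this can be approximated by

$$\psi(\mathbf{r}) = \exp(i\mathbf{k}\cdot\mathbf{r}) + f(\theta)\,\frac{\exp(ikr)}{r} \tag{B1.17.8}$$

where $\theta$ denotes the scattering angle.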

In this approximation, the wave function is identical to the incident wave (first term) plus an outgoing 
spherical wave multiplied by a complex scattering factor 

$$f(\theta) = |f(\theta)|\,\exp(i\eta(\theta)) \tag{B1.17.9}$$


which can be calculated as 


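$$f(\theta) = \frac{m}{2\pi\hbar^{2}} \int \exp(-i\mathbf{k}'\cdot\mathbf{r}')\, V(\mathbf{r}')\,\psi(\mathbf{r}')\,\mathrm{d}\mathbf{r}' \tag{B1.17.10}$$

where $\mathbf{k}' = k\mathbf{r}/r$. For a weak potential $V(\mathbf{r})$ it is possible to use the first Born approximation, i.e. $\psi(\mathbf{r}')$ in
equation (B1.17.10) can be replaced by the incident wave, resulting in:

$$f(\theta) = \frac{m}{2\pi\hbar^{2}} \int \exp\bigl(i(\mathbf{k}-\mathbf{k}')\cdot\mathbf{r}'\bigr)\, V(\mathbf{r}')\,\mathrm{d}\mathbf{r}' \tag{B1.17.11}$$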

This equation describes the Fourier transform of the scattering potential $V(\mathbf{r})$. It should be noted that, in the
Born approximation, the scattering amplitude $f(\theta)$ is a real quantity and the additional phase shift $\eta(\theta)$ is zero.
For atoms with high atomic number this is no longer true. For a rigorous discussion on the effects of the 
different approximations see [2] or [5]. 

In a diffraction experiment a quantity $|F(\mathbf{S})|^2$ can be measured, which follows from equation (B1.17.8) and
equation (B1.17.9) in Fourier space as

$$F(\mathbf{S}) = \delta(\mathbf{S}) + i\,|f(\mathbf{S})|\,\exp(i\eta(\mathbf{S})) \tag{B1.17.12}$$

where $\mathbf{S} = \mathbf{k} - \mathbf{k}'$ denotes the scattering vector. Combining equation (B1.17.11) and equation (B1.17.12) leads
to the conclusion that in a diffraction experiment the squared amplitude of the Fourier transform of the 
scattering potential V(r) is measured. Similar formulae can be deduced for whole assemblies of atoms, e.g. 
macromolecules, resulting in the molecular transform instead of simple atomic scattering factors (for an 
introduction to the concept of molecular transforms see e.g. [8]). Such measurements are performed in x-ray 
crystallography, for example. To reconstruct the original scattering potential V(r) it is necessary to determine 
the phases of the structure amplitudes to perform the reverse Fourier transform. However, if lenses are 
available for the particles used as incident beam — as in light and electron microscopy — a simple microscope 
can be built: the diffracted wave is focused by an objective lens into the back focal plane, where the scattered 
and unscattered parts of the wave are separated. Thus, the objective lens can simply be understood as a 
Fourier transform operator. In a subsequent imaging step by one additional lens, the scattered and unscattered 
waves are allowed to interfere again to form a direct image of the scattering potential. This can be understood 
as a second Fourier transform of the scattering factor (or molecular transform) recovering the spatial 
distribution of the scattering centres. The small angular scattering distribution of only 10-20 mrad results in a 
complication in the case of EM. The depth of focus is very large, i.e. it is not possible to recalculate the 3D 
distribution of the scattering potential but only its 2D projection along the incident beam. All scattering 
distributions, images, or diffraction patterns are always produced by the 2D projecting transmission function 
of the actual 3D object. Using a variety of tomographic data collections it is then possible to reconstruct the 
true 3D object (see below). 
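
The difference between a diffraction experiment and imaging with lenses can be illustrated with a small numerical sketch. It assumes a toy one-dimensional projected potential and a plain discrete Fourier transform standing in for the objective lens. The array values and the names `dft`, `potential` and `shifted` are hypothetical and chosen only for illustration.

```python
import cmath

def dft(signal, inverse=False):
    """Plain discrete Fourier transform (the inverse includes the 1/n factor)."""
    n = len(signal)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for j, value in enumerate(signal):
            acc += value * cmath.exp(sign * 2j * cmath.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out

# Toy 1D projected potential and the same object displaced by three samples.
potential = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.25]
shifted = potential[3:] + potential[:3]

# "Diffraction": only the squared amplitude of the transform is recorded.
F = dft(potential)
intensity = [abs(c) ** 2 for c in F]
intensity_shifted = [abs(c) ** 2 for c in dft(shifted)]

# "Imaging": a second transform of the complex wave recovers the object.
image = [c.real for c in dft(F, inverse=True)]

# Back-transforming amplitudes without phases does not recover the object.
amplitude_only = [c.real for c in dft([abs(c) for c in F], inverse=True)]

if __name__ == "__main__":
    print(max(abs(a - b) for a, b in zip(intensity, intensity_shifted)))
    print(max(abs(a - b) for a, b in zip(image, potential)))
    print(max(abs(a - b) for a, b in zip(amplitude_only, potential)))
```

The displaced object gives the same recorded intensities, because a displacement changes only the phases. This is why the phases must be determined separately in crystallography, whereas the second transform performed by the imaging lens retains them.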


The above theory can also be applied to STEM, which records scattering distributions as a function of the 
scanning probe position. Images are then obtained by plotting the measured scattering intensities (i.e. in the 
case of the elastic scattering, the direct measurement of the scattering factor amplitudes) according to the 
probe position. Depending on the signal used, this leads to a conventional elastic dark-field image, or to 
STEM phase-contrast images [9]. 

In the case of the scanning electron microscope (SEM), images are formed by recording a specific signal 
resulting from the electron beam/specimen interaction as a function of the scanning probe position. Surface 
structures are generally described with the SE (secondary electron) signal. SEs are produced as a consequence
of inelastic events. They have very low energies and, therefore, can leave the specimen and contribute to the
imaging signal only when created very close to the specimen surface. The escape depth for the secondary 
electrons depends on the material. It is relatively large (tens of nanometres) for organic and biological 
material and small for heavy metals (1-3 nm). High-resolution topographic information (limited mainly by the 
diameter of the scanning electron beam) requires that the source of the signal is localized very close to the 
specimen surface. In the case of organic materials this localization can be achieved by a very thin metal 
coating (W, Cr, Pt; thickness = approximate SE escape depth). 

The BSE signal is also frequently used for imaging purposes. BSE are electrons of the primary beam 
(scanning probe) that have been elastically or inelastically scattered in the sample. Their energy depends on 
their scattering history. When scattered from the surface, they may have lost no or very little energy and 
provide high topographic resolution. When multiply scattered inside the sample they may have lost several 
keV and transfer information from a large volume. This volume depends on the material as well as on the 
energy of the primary beam. BSE produce SE when passing through the SE escape zone. These SE are, 
however, not correlated with the position of the scanning probe and contribute a background noise which can 
obscure the high resolution topographic SE signal produced at the point of impact of the primary beam. The 
interpretation of high-resolution topographic images therefore depends on optimized handling of specimen 
properties, energy of the electron probe, metal coating and sufficient knowledge of the signal properties. 

The discussion of electron-specimen interactions shows that, for a given incident electron dose, a certain 
quantity of resulting scattered electrons and secondary electrons or photons is produced. The majority of 
energy transfer into the specimen leads to beam damage and, finally, to the destruction of the sample 
structure. Therefore it is desirable to simultaneously collect as much information from the interactions as 
possible. This concept could lead to an EM instrument based on the design of a STEM but including many 
different detectors for the elastic dark field, phase contrast, inelastically scattered electrons, BSE, SE, and 
EDX. The complexity of such an instrument would be enormous. Instead, specific instruments developed in 
the past can coarsely be categorized as TEM for structural studies on thin samples, STEM for analytical work 
on thin samples and SEM for analytical and surface topography studies. 

B1.17.3 INSTRUMENTATION


B1.17.3.1 ELECTRON BEAM INSTRUMENTS 


The general instrumentation of an EM very much resembles the way an ordinary, modern light microscope is 
built. It includes an electron beam forming source, an illumination forming condenser system and the 
objective lens as the main lens of the microscope. With such an instrumentation, one forms either the 
conventional bright field microscope with a large illuminated sample area or an illumination spot which can 
be scanned across the sample. Typical electron sources are conventional heated tungsten hairpin filaments,
heated LaB6 or CeB6 single-crystal electron emitters or, as the most sophisticated source, FEGs. The
latter sources lead to very coherent electron beams, which are necessary to obtain high-resolution imaging or
very small electron probes.

Modern EMs use electromagnetic lenses, shift devices and spectrometers. However, electrostatic devices have 
always been used as electron beam accelerators and are increasingly being used for other tasks, e.g. as the 
objective lens (LVSEM, [10]). 

EM instruments can be distinguished by the way the information, i.e. the interacting electrons, is detected. 
Figure B1.17.2 shows the typical situations for TEM, STEM, and SEM. For TEM the transmitted electron
beam of the bright-field illumination is imaged simply as in a light microscope, using the objective and
projective lenses as a conventional imaging system. By combining such TEMs with energy-dispersive imaging
elements (filters, spectrometers; see [14]), the modern generation of EFTEMs has been introduced in the last
decade. In the case of STEM, the transmitted electrons are not imaged again by lenses; instead, the scattered
electrons are directly recorded by a variety of detectors. For SEM, the situation is similar to that of STEM.
However, only the surface of a bulk specimen is scanned and the resulting backscattered or secondary 
electrons are recorded by dedicated detectors. 

As a special development in recent years, SEMs have been designed which no longer necessitate high vacuum 
(environmental SEM, ESEM; variable pressure SEM, VPSEM). This development is important for the 
imaging of samples with a residual vapour pressure, such as aqueous biological or medical samples, but also 
samples in materials science (wet rock) or organic chemistry (polymers). 

Figure B1.17.2. Typical electron beam path diagrams for TEM (a), STEM (b) and SEM (c). These schematic
diagrams illustrate the way the different signals can be detected in the different instruments.

B1.17.3.2 ELECTRON DETECTORS IN TEM, STEM, AND SEM


Detectors in EM can be categorized according to their different spatial resolution or in relation to the time it 
takes to actually see and process the signal (real-time/on-line capability). 


Historically, TEM, as an offspring of the electron oscillograph, uses a fluorescent viewing screen for the
direct observation of impinging electrons by green fluorescent light. The spatial resolution of such screens is
of the order of 30-50 µm. Coupled to a TV camera tube and computer frame-grabber cards, fluorescent
screens are still used for the real-time recording of the image. Often the cameras are combined with silicon
intensifier targets (SIT), which allow the detection of single electrons. This is of special importance for the
study of beam-sensitive samples. In
general, the spatial resolution of such combinations is very poor and the dynamic range of the signal is limited to
about 256 grey levels (8 bit).

The most common electron detector for TEM has been photographic emulsion. Its silver halide particles are
sensitive to high-energy electrons and record images at very high resolution (grain size smaller than 10 µm)
when exposed to the electron beam in the TEM. The resolution recorded on film depends on the scattering
volume of the striking electron. For typical electron energies of 100-200 keV, its lateral spread is 10-20 µm.
The dynamic range of photographic film is up to 4000 grey levels (12 bit). The most important advantage of
film is its large detector size combined with high resolution: with a film size of 6 × 9 cm², up to 10⁷ image
points can be recorded.

A recent development is the adaptation of IPs to EM detection. IPs have been used for detecting x-rays, which
generate multiple electron-hole pairs in a storage layer of BaFBr:Eu²⁺. The pairs are trapped in Eu-F
centres and can be stimulated by red light to recombine, thereby emitting a blue luminescent signal. Exposing
IPs to high-energy electrons also produces electron-hole pairs. Scanning the IP with a red laser
beam and detecting the blue signal via a photomultiplier tube results in the readout of the latent image. The
large detector size and the extremely high dynamic range of more than 10⁷ make IPs the ideal detector for
electron diffraction.

The only on-line detector for TEM with moderate-to-high spatial resolution is the slow-scan CCD camera. A
light-sensitive CCD chip is coupled to a scintillator screen consisting of plastic, an yttrium-aluminium garnet
(YAG) crystal, or phosphor powder. This scintillator layer degrades the original resolution of the CCD chip
elements by scattering light into neighbouring pixels. Typical sizes of chips at present are 1024 × 1024 or
2048 × 2048 pixels of (19-24 µm)²; the achievable dynamic range is about 10⁵ grey levels.
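
The detector figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text (film size, lateral spread, chip sizes, grey levels). The helper names `bit_depth` and `image_points` are hypothetical, and taking the upper 20 µm value of the lateral spread as the effective image-point size is an assumption.

```python
import math

def bit_depth(grey_levels):
    """Smallest number of bits able to encode the given number of grey levels."""
    return math.ceil(math.log2(grey_levels))

def image_points(width_mm, height_mm, spot_um):
    """Number of independent image points for a given effective point size."""
    spot_mm = spot_um / 1000.0
    return (width_mm / spot_mm) * (height_mm / spot_mm)

# Photographic film: 6 x 9 cm, assuming a 20 um lateral spread per image point.
film_points = image_points(60.0, 90.0, 20.0)

# Slow-scan CCD: number of chip elements and chip edge length for 24 um pixels.
ccd_points = 2048 * 2048
ccd_edge_mm = 2048 * 24.0 / 1000.0

if __name__ == "__main__":
    print(f"film image points  ~ {film_points:.2e}")
    print(f"CCD chip elements  = {ccd_points:.2e}")
    print(f"CCD chip edge      = {ccd_edge_mm:.1f} mm")
    for name, levels in (("TV/SIT", 256), ("film", 4000), ("ssCCD", 10 ** 5), ("IP", 10 ** 7)):
        print(f"{name:7s} {levels:>9d} grey levels -> {bit_depth(levels)} bit")
```

With these assumptions the film yields on the order of 10⁷ image points, compared with about 4 × 10⁶ elements for the larger CCD chip, and the quoted grey levels of 256 and 4000 correspond to 8 and 12 bit respectively.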

For all the detectors with spatially distinct signal recording, the numeric pixel size (such as scanning pixel size 
for photographic film and IP, or chip-element size for ssCCD) must be distinguished from the actual 
obtainable resolution. This resolution can be affected by the primary scattering process of electrons in the 
detecting medium, or by the scattering of a produced light signal or a scanning light spot in the detecting 
medium. A point signal is therefore delocalized, which is described mathematically by the PSF. The Fourier transform
of the PSF is called the modulation transfer function (MTF), describing the spatial frequency response of the
detector. Whereas the ideal detector has an MTF = 1 over the complete spatial frequency range, real detectors
exhibit a moderate to strong fall-off of the MTF at the Nyquist frequency, i.e. at their maximum detectable spatial
resolution. In addition to spatial resolution, another important quantity characterizing a detector is the
detection quantum efficiency (DQE). It is a measure of the detector noise and gives an assessment of the
detection of single electrons.
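
The relation between PSF and MTF can be made concrete with a short sketch. It assumes a one-dimensional detector of 32 pixels and a Gaussian PSF with a width of one pixel. Both are illustrative choices rather than values from the text, and the function names are hypothetical.

```python
import cmath
import math

def mtf_from_psf(psf):
    """Modulus of the discrete Fourier transform of the PSF, scaled so MTF(0) = 1."""
    n = len(psf)
    spectrum = []
    for k in range(n):
        acc = sum(p * cmath.exp(-2j * cmath.pi * j * k / n) for j, p in enumerate(psf))
        spectrum.append(abs(acc))
    return [s / spectrum[0] for s in spectrum]

def gaussian_psf(n_pixels, sigma_pixels):
    """Gaussian PSF centred on pixel 0 with periodic (wrap-around) distances."""
    psf = []
    for j in range(n_pixels):
        d = min(j, n_pixels - j)
        psf.append(math.exp(-d * d / (2.0 * sigma_pixels ** 2)))
    return psf

N = 32
NYQUIST = N // 2   # index of the highest spatial frequency the pixel grid can carry

ideal = mtf_from_psf([1.0] + [0.0] * (N - 1))   # point stays in one pixel
real = mtf_from_psf(gaussian_psf(N, 1.0))       # point spreads into neighbours

if __name__ == "__main__":
    for k in (0, 4, 8, 12, NYQUIST):
        print(k, round(ideal[k], 3), round(real[k], 3))
```

In this sketch the ideal detector keeps MTF = 1 at every spatial frequency, whereas the detector with the spread-out PSF falls off steadily and retains only a few per cent of the contrast at the Nyquist frequency.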

Of all TEM detectors, the ssCCD has the best DQE. Depending on the scintillator, it is in the range of DQE =
0.6-0.9, comparable to IPs which show a DQE of the order of 0.5-0.8. The DQE of photographic emulsion
is strongly dependent on electron dose and does not exceed DQE = 0.2. For a complete and up-to-date
discussion on TEM electron detectors see the special issue of Microscopy Research and Technique (vol 49,
2000).


In SEM and STEM, all detectors record the electron current signal of the selected interacting electrons (elastic 
scattering, secondary electrons) in real time. Such detectors can be designed as simple metal-plate detectors, 
such as the elastic dark-field detector in STEM, or as electron-sensitive photomultiplier tubes (PMT). For a rigorous discussion of
SEM detectors see [3]. 

Except for the phase-contrast detector in STEM [9], STEM and SEM detectors do not track the position of the
recorded electron. The spatial information of an image is formed instead by assigning the measured electron
current to the known position of the scanned incident electron beam. This information is then mapped into a
2D pixel array, which is either displayed on a TV screen or digitized in a computer.
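
Image formation in a scanned-beam instrument can be sketched in a few lines. The detector itself returns only a current, and the image arises from bookkeeping of the probe position. The specimen model `toy_detector_current`, the scan size and the 8-bit digitization are hypothetical choices for illustration.

```python
def toy_detector_current(row, col):
    """Hypothetical specimen: a bright 2 x 2 feature on a weak background."""
    return 1.0 if (2 <= row <= 3 and 4 <= col <= 5) else 0.2

def scan_image(signal_at, n_rows, n_cols):
    """Raster the probe and store each measured current at the probe position."""
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        for col in range(n_cols):
            # The detector has no spatial resolution; position comes from the scan.
            image[row][col] = signal_at(row, col)
    return image

def digitize(image, levels=256):
    """Map the measured currents onto integer grey levels 0 .. levels-1."""
    peak = max(max(line) for line in image)
    return [[round((levels - 1) * value / peak) for value in line] for line in image]

image = scan_image(toy_detector_current, n_rows=6, n_cols=8)
pixels = digitize(image)

if __name__ == "__main__":
    for line in pixels:
        print(" ".join(f"{value:3d}" for value in line))
```

The spatial resolution of such an image is set by the probe size and the scan step, not by the detector, which is why these detectors can be optimized for efficiency alone.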

For the parallel recording of EEL spectra in STEM, linear arrays of semiconductor detectors are used. Such 
detectors convert the incident electrons into photons, using additional fluorescent coatings or scintillators in 
the very same way as the TEM detectors described above. 


B1.17.4 SPECIMEN PREPARATION 

The necessity to have high vacuum in an electron beam instrument implies certain constraints on the 
specimen. In addition, the beam damage resulting from the interaction of electrons with the specimen 
(radiation damage) requires specific procedures to transfer the specimen into a state in which it can be 
analysed. During these procedures, which are more elaborate for organic than for inorganic solid-state 
materials, structural and compositional aspects of the specimen may be altered and consequently the 
corresponding information may be misleading or completely lost. It must be mentioned here that the EM is 
only the tool to extract information from the specimen. As well as having its own physical problems (CTF, 
beam damage, etc) it is — like any other microscopy — clearly not capable of restoring information that has 
been lost or altered during specimen preparation. 

Specimens for (S)TEM have to be transparent to the electron beam. In order to get good contrast and 
resolution, they have to be thin enough to minimize inelastic scattering. The required thin sections of organic 
materials can be obtained by ultramicrotomy either after embedding into suitable resins (mostly epoxy- or 
methacrylate resins [11]) or directly at low temperatures by cryo-ultramicrotomy [12]. 

Ultramicrotomy is sometimes also used to produce thin samples of solid materials, such as metals [13], which
are, however, preferentially prepared by chemical or ion etching (see [1]) and focused ion beam (FIB)
techniques [14].

Bulk specimens for SEM also have to resist the impact of the electron beam instrument. While this is 
generally a minor problem for materials science specimens, organic and aqueous biological samples must be 
observed either completely dry or at low enough temperatures for the evaporation/sublimation of solvents and 
water to be negligible. Internal structures of aqueous biological samples can be visualized by cryosectioning 
or cryofracturing procedures [15, 16]. Similar procedures are used in the preparation of polymers and
composites [17]. Fracturing and field-ion beam procedures are used to expose internal structures of 
semiconductors, ceramics and similar materials. 

The preparation of biological specimens is particularly complex. The ultrastructure of living samples is related 
to numerous dynamic cellular events that occur in the range of microseconds to milliseconds [18]. 
Interpretable high-resolution structural information (e.g. preservation of dimensions, or correlation of the 
structural detail with a physiologically or biochemically controlled state) is therefore obtained exclusively 
from samples in which life has been stopped very quickly and with a sufficiently high time resolution for the 
cellular dynamics [19]. Modern concepts for specimen preparation therefore try to avoid traditional chemical
fixation as the life-stopping step because it is comparatively slow (diffusion limited) and cannot preserve all
cellular components. Cryotechniques, often in
combination with microscopy at low temperatures, are used instead. Very high cooling rates (> 10 000 K s⁻¹)
are required to prevent the formation and growth of ice crystals, which would affect the structural integrity.
At the same time, such high cooling rates result in a rapid arrest of the physiological events, i.e. they produce a
very high temporal resolution (microseconds to milliseconds) [20] in capturing dynamic processes in the cell
[21, 22].

Despite the drawbacks of chemical-fixation based procedures [23,24], most of our current knowledge on 
biological ultrastructure relies on this approach. In contrast to cryopreparative procedures, chemical fixation 
does not require special skills and instrumentation. 

Cryoimmobilization procedures that lead to vitrification (immobilization of the specimen water in the 
amorphous state) are the sole methods of preserving the interactions of the cell constituents, because the liquid 
character of the specimen water is retained (reviewed in [25]). 

Vitrification at ambient pressure requires very high cooling rates. It can be accomplished by the 'bare grid' 
approach for freezing thin (> 100 nm) aqueous layers of suspensions containing isolated macromolecules, 
liposomes, viruses, etc. This technique was used to produce figure B1.17.6. It has developed into a powerful
tool for structural biology, now providing subnanometre resolution of non-crystalline objects [26, 27]. The
bare-grid technique permits imaging of macromolecules in functional states with sufficient resolution to allow 
the correlation with atomic data from x-ray diffraction of crystals [28,29] (see also Journal of Structural 
Biology, vol 125, 1999). 

High-pressure freezing is at present the only practicable way of cryoimmobilizing larger non-pretreated 
samples [30,32]. At a pressure of 2100 bar, an approximately 10-fold greater thickness can be vitrified than 
at ambient pressure [33], and a very high yield of adequately frozen specimens (i.e. no 
detectable effects of ice crystal damage visible after freeze substitution) has been demonstrated by TEM of 
suspensions of micro-organisms, as well as for plant and animal tissue, provided that the thickness of the 
aqueous layer did not exceed 200 µm. 

Biological material, immobilized chemically or by rapid freezing, must be transferred into an organic solvent 
that is compatible with the most frequently used hydrophobic resins. Chemically fixed materials are 
dehydrated in a graded series of alcohol or acetone at room temperature. The ice of the frozen sample is 
dissolved at low temperature by a freeze-substitution process [34]. For TEM, the samples are embedded in 
resin [11]; for SEM they are dried, most frequently by the critical-point drying technique, which avoids 
deleterious effects of the surface tension of the solvents. Dehydration and complete drying result in 
non-isotropic shrinkage of biological materials. 

The information that can be extracted from inorganic samples depends mainly on the electron beam/specimen 
interaction and instrumental parameters [1], in contrast to organic and biological materials, where it depends 
strongly on specimen preparation. 

For analytical SEM and non-destructive imaging (e.g. critical-dimension (CD) measurements on semiconductors 
and other quality control), adequate electron energies have to be selected in order to minimize charging of 
the specimen. For high-resolution imaging of surfaces using the SE signal, the signal source often must be 
localized at the specimen surface by a thin metal coating layer [35]. 

B1.17.5 IMAGE FORMATION AND IMAGE CONTRAST 

Whereas electron optics, sample preparation and the interaction of electrons with the sample follow a 
common set of rules for all different kinds of EM (TEM, STEM, SEM), the image formation and image 
contrast in EM images is very technique-specific. In general, analytical imaging is distinguished from 
structural imaging, the latter being further classified either into the imaging of the projected 2D scattering 
potential for TEM and STEM or into the topographical imaging of a surface in SEM. The complete 
understanding of the electron-sample interaction and the increasingly better understanding of the sample 
preparation and reconstruction of the object from image contrast allows a quantitative interpretation of EM 
data. The electron microscope has evolved, over a long period, from a simple imaging microscope to a 
quantitative data collection device. 

B1.17.5.1 IMAGING OF PROJECTED STRUCTURE: THE CONTRAST TRANSFER FUNCTION (CTF) OF TEM 

The discussion of the electron-specimen interaction has already provided the necessary physical principles 
leading to amplitude and phase changes of the scattered electron wave. Consider again the elastic scattering as 
described by equation (B1.17.1). In STEM the elastic scattering is measured by an angular detector, 
integrating over all electrons scattered to high angles (see figure B1.17.3(a)). For thin samples, the measured 
image contrast can be directly interpreted as the spatial distribution of different atomic composition. It 
corresponds to the pointwise measurement of the sample's scattering factor amplitude (see equation 
(B1.17.9)) 

\[ \psi_{\mathrm{int}} = \int\!\!\int f(\theta)\,\mathrm{d}\varphi\,\mathrm{d}\theta \tag{B1.17.13} \]

where θ denotes the scattering angle, constrained by the geometry of the angular detector; φ denotes the 
azimuthal angle. According to the properties of the elastic scattering distribution (equation (B1.17.1)), the 
detected signal for a given detection angle interval depends strongly on the atomic number Z of the scattering 
atom. This results in different contrast for different atomic composition (Z-contrast). With a different, 
position-selective detector it is also possible to measure the phase part of the scattering factor. The 
geometry of these detectors is illustrated in figure B1.17.3(b) [9]. 

Figure B1.17.3. STEM detectors: (a) conventional bright and dark-field detectors, electrons are detected 
according to their different scattering angles, all other positional information is lost; (b) positional detector as 
developed by Haider and coworkers (Haider et al 1994). 

It should be noted again that STEM provides lateral spatial discrimination of the 2D projected sample by the 
scanning of a point-like electron beam. Spatial resolution is thereby given by the focus of the incident beam, 
which is at present limited to a typical diameter of a few angstroms. Modern TEM, using the very coherent 
electron wave of a FEG and higher electron energies (see B1.17.3.1), delivers higher resolution, i.e. its 
information limit can be improved into the sub-angstrom regime. However, the correspondence of image 
contrast and scattering factor f(θ) (equation (B1.17.9)) is more complicated than in STEM. In TEM, image 
contrast can be understood either by interference effects between scattered and unscattered parts of the 
electron wave, or by simple removal of electrons scattered to higher angles (scattering contrast). The latter is 
important for imaging of strongly scattering objects consisting of heavy metal atoms. Electrons scattered to 
high angles are easily removed by a circular aperture (see figure B1.17.4(a)). Elastically scattered electrons 
not removed by such an aperture can form interference patterns with the unscattered part of the incident 
electron wave (see figure B1.17.4(b)). Such interference patterns lead to either diffraction patterns or images 
of the sample, depending on the imaging conditions of the microscope. Therefore, any wave aberrations of the 
electron wave, both by the imaged sample (desired signal) and by lens aberrations or defocused imaging, 
result in a change of image contrast. Mathematically, this behaviour is described by the concept of contrast 
transfer in spatial frequency space (Fourier space of the image), modelled by the CTF. 

Figure B1.17.4. Visualization of image contrast formation methods: (a) scattering contrast and (b) 
interference contrast (weak phase/weak amplitude contrast). 

In the simplest case of bright-field imaging, the CTF can easily be deduced: the elastically scattered electron 
wave can be described using a generalized phase shift Φ_gen by 

\[ \psi_{\mathrm{sc}}(r) = \psi_0(r)\,\exp(\mathrm{i}\Phi_{\mathrm{gen}}(r)) \tag{B1.17.14} \]

where 

\[ \exp(\mathrm{i}\Phi_{\mathrm{gen}}(r)) = \exp(\mathrm{i}\varphi_{\mathrm{el}}(r) + \mu_{\mathrm{el}}(r)) \tag{B1.17.15} \]

and φ_el(r) and μ_el(r) denote the elastic phase and amplitude contrast potential. In the notation of equation 
(B1.17.15), φ_el(r) denotes a positive phase potential whereas μ_el(r) denotes a negative absorption potential. 
Assuming a weak phase/weak amplitude object (weak compared to the unscattered part of the electron wave), the 
generalized phase shift can be reduced to 

\[ \exp(\mathrm{i}\Phi_{\mathrm{gen}}(r)) = 1 + \mathrm{i}\varphi_{\mathrm{el}}(r) + \mu_{\mathrm{el}}(r). \tag{B1.17.16} \]

After propagation into the back focal plane of the objective lens, the scattered electron wave can be expressed 
in terms of the spatial frequency coordinates k as 

\[ \psi_{\mathrm{sc}}(k) = (\delta(k) + \mathrm{i}\varphi_{\mathrm{el}}(k) + \mu_{\mathrm{el}}(k))\,\exp(-\mathrm{i}W(k)). \tag{B1.17.17} \]

Here W(k) denotes the wave aberration 

\[ W(k) = \frac{\pi}{2}\,(C_s\lambda^3 k^4 - 2\Delta z\,\lambda k^2) \tag{B1.17.18} \]

with the objective lens spherical aberration C_s, the electron wavelength λ and the defocus Δz. It should be 
noted that, in the above formulae, the effect of inelastic scattering is neglected. For a rigorous discussion of 
image contrast, including inelastic scattering, see [36]. 
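The wave aberration of equation (B1.17.18) is easy to evaluate numerically. The following is a minimal sketch, not part of the original treatment; it assumes SI units, a spatial frequency defined as k = 1/d (no factor of 2π), positive defocus meaning underfocus, and the standard relativistic expression for the electron wavelength. The function names are illustrative only.

```python
import math

# Physical constants in SI units.
PLANCK = 6.62607015e-34              # J s
ELECTRON_MASS = 9.1093837015e-31     # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
LIGHT_SPEED = 2.99792458e8           # m/s


def electron_wavelength(voltage):
    """Relativistic electron wavelength in metres for an acceleration voltage in volts."""
    energy = ELEMENTARY_CHARGE * voltage          # kinetic energy in joules
    rest_energy = ELECTRON_MASS * LIGHT_SPEED**2  # electron rest energy in joules
    momentum_sq = 2.0 * ELECTRON_MASS * energy * (1.0 + energy / (2.0 * rest_energy))
    return PLANCK / math.sqrt(momentum_sq)


def wave_aberration(k, cs, wavelength, defocus):
    """W(k) as in equation (B1.17.18); k in 1/m, all lengths in m (assumed convention)."""
    return 0.5 * math.pi * (cs * wavelength**3 * k**4 - 2.0 * defocus * wavelength * k**2)


# Parameters quoted in the caption of figure B1.17.5: Cs = 2.7 mm, 120 keV, 120 nm underfocus.
wavelength = electron_wavelength(120e3)
w_at_1nm = wave_aberration(1e9, 2.7e-3, wavelength, 120e-9)
print(f"wavelength = {wavelength * 1e12:.2f} pm")
print(f"W at k = 1 nm^-1: {w_at_1nm:.2f} rad")
```

At low spatial frequencies the defocus term dominates, so W(k) is negative for underfocus under this sign convention; the spherical aberration term takes over at high k.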

In the usual approximation of the object as a weak phase/weak amplitude object, this scattered wave can be 
used to calculate the intensity of the image transform as 

\[ I(k) = \psi_{\mathrm{sc}}(k) \otimes \psi_{\mathrm{sc}}^{*}(k). \tag{B1.17.19} \]

Calculating the convolution using equation (B1.17.17) and regrouping the terms yields the final equation for 
the image transform: 

\[ I(k) = \delta(k) - 2\,(\varphi_{\mathrm{el}}(k)\sin(W(k)) + \mu_{\mathrm{el}}(k)\cos(W(k))). \tag{B1.17.20} \]

The power spectrum of the image PS(k) is then given by the expectation value ⟨I(k) × I*(k)⟩, normally 
calculated as the squared amplitude of the image transform. More detailed discussions of the above theory are 
found in [1,2]. 

In the conventional theory of elastic image formation, it is now assumed that the elastic atomic amplitude 
scattering factor is proportional to the elastic atomic phase scattering factor, i.e. 

\[ \mu_{\mathrm{el}}(k) = -A(k)\,\varphi_{\mathrm{el}}(k) = -A\,\varphi_{\mathrm{el}}(k). \tag{B1.17.21} \]

The factor A has been measured for a variety of samples, indicating that the approximation can be applied up 
to quasi-atomic resolution. In the case of biological specimens typical values of A are of the order of 5-7%, as 
determined from images with a resolution of better than 10 Å [37,38]. For an easy interpretation of image 
contrast and a retrieval of the object information from the contrast, such a combination of phase and 
amplitude information is necessary. 
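Substituting equation (B1.17.21) into equation (B1.17.20) leaves a single transfer function multiplying φ_el(k). The sketch below evaluates it as CTF(k) = -2 (sin W(k) - A cos W(k)); this form and its overall sign are an assumption that follows the sign conventions of the two equations as given here, and other texts use different conventions. Envelope functions are ignored, the wavelength of 3.35 pm for 120 keV is an approximate value, and all names are illustrative.

```python
import numpy as np

CS = 2.7e-3            # spherical aberration in m (value from the caption of figure B1.17.5)
WAVELENGTH = 3.35e-12  # approximate electron wavelength at 120 keV, in m
DEFOCUS = 120e-9       # underfocus in m, taken as positive (assumed sign convention)


def ctf(k, cs, wavelength, defocus, amp_fraction=0.0):
    """Transfer function -2 (sin W - A cos W) without envelope functions."""
    w = 0.5 * np.pi * (cs * wavelength**3 * k**4 - 2.0 * defocus * wavelength * k**2)
    return -2.0 * (np.sin(w) - amp_fraction * np.cos(w))


def first_zero(k, values):
    """First spatial frequency at which `values` changes sign, or None."""
    signs = np.sign(values)
    crossings = np.nonzero(signs[:-1] * signs[1:] < 0)[0]
    return k[crossings[0] + 1] if crossings.size else None


k = np.linspace(0.0, 5e9, 5001)  # spatial frequencies in 1/m
phase_only_zero = first_zero(k, ctf(k, CS, WAVELENGTH, DEFOCUS))
mixed_zero = first_zero(k, ctf(k, CS, WAVELENGTH, DEFOCUS, amp_fraction=0.07))
print(f"first zero, pure phase object: {1e9 / phase_only_zero:.3f} nm")
print(f"first zero, 7% amplitude contrast: {1e9 / mixed_zero:.3f} nm")
```

With a non-zero A the transfer function no longer vanishes at k = 0, which is why amplitude contrast carries the low-resolution information that a pure phase object cannot transfer.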

Figure B1.17.5 shows typical examples of the CTF for weak phase, weak amplitude or combined samples. 
The resulting effect on image contrast is illustrated in figure B1.17.6, which shows image averages of a 
protein complex embedded in a vitrified aqueous layer recorded at different defocus levels. The change in 
contrast and visible details is clear, but a direct interpretation of the image contrast in terms of object structure 
is not possible. To reconstruct the imaged complex, it is necessary to combine the information from the 
different images recorded with different defocus levels. This was first suggested by Schiske [39] and is 
normally applied to high-resolution images in materials science or biology. Such correction procedures are 
necessary to rectify the imaging aberrations from imperfect electron-optical systems, which result in a 
delocalized contrast of object points. In recent years, improvements of the electron-optical lenses have been 
made, and high-resolution imaging with localized object contrast will be possible in the future (see B1.17.5.3 
and figure B1.17.9). It should be noted that any high-resolution interpretation of an EM image strongly 
depends on the correction of the CTF. In high-resolution TEM in materials science, sophisticated methods for 
this correction have been developed and are often combined with image simulations, assuming a certain 
atomic model structure of the sample. Three mainstream developments in this field are: (1) the use of focal 
series and subsequent image processing [40,41], (2) electron holography [42,43,44] and (3) the 
development of corrected TEMs, which prevent contrast delocalization (see B1.17.5.3). 

Figure B1.17.5. Examples of CTFs for a typical TEM (spherical aberration C_s = 2.7 mm, 120 keV electron 
energy). In (a) and (b) the idealized case without signal-decreasing envelope functions [77] is shown. (a) Pure 
phase contrast object, i.e. no amplitude contrast; two different defocus values are shown (Scherzer focus of 
120 nm underfocus (solid curve), 500 nm underfocus (dashed curve)); (b) pure amplitude object (Scherzer 
focus of 120 nm underfocus); (c) realistic case including envelope functions and a mixed weak 
amplitude/weak phase object (500 nm underfocus). 

Figure B1.17.6. A protein complex (myosin S1 decorated filamentous actin) embedded in a vitrified ice layer. 
Shown is a defocus series at (a) 580 nm, (b) 1130 nm, (c) 1700 nm and (d) 2600 nm underfocus. The pictures 
result from averaging about 100 individual images from one electron micrograph; the decorated filament 
length shown is 76.8 nm. 
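The idea of merging a defocus series can be illustrated with a one-dimensional toy calculation. The sketch below uses a Wiener-style least-squares combination, which is one common way of merging such data and is not claimed to be the formulation of [39]; it is noise-free, ignores envelope functions, and uses the defocus values of figure B1.17.6 together with an assumed amplitude fraction of 7%, an assumed regularization constant and a hypothetical Gaussian object transform.

```python
import numpy as np


def ctf(k, cs, wavelength, defocus, amp_fraction=0.07):
    """Transfer function -2 (sin W - A cos W), without envelopes (assumed convention)."""
    w = 0.5 * np.pi * (cs * wavelength**3 * k**4 - 2.0 * defocus * wavelength * k**2)
    return -2.0 * (np.sin(w) - amp_fraction * np.cos(w))


def combine_focal_series(image_transforms, ctfs, eps=1e-3):
    """Wiener-style merge: sum(CTF_i * I_i) / (sum(CTF_i^2) + eps)."""
    numerator = sum(c * f for c, f in zip(ctfs, image_transforms))
    denominator = sum(c**2 for c in ctfs) + eps
    return numerator / denominator


k = np.linspace(0.0, 1e9, 512)              # spatial frequencies in 1/m
object_transform = np.exp(-(k / 5e8) ** 2)  # hypothetical object spectrum
defoci = (580e-9, 1130e-9, 1700e-9, 2600e-9)

ctfs = [ctf(k, 2.7e-3, 3.35e-12, dz) for dz in defoci]
images = [c * object_transform for c in ctfs]  # what each micrograph transfers
recovered = combine_focal_series(images, ctfs)
median_error = float(np.median(np.abs(recovered - object_transform)))
print(f"median absolute error of the merged transform: {median_error:.2e}")
```

Because the zeros of the individual transfer functions fall at different spatial frequencies, the sum of their squares rarely vanishes, so the merged transform is well defined where any single image has a gap.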

B1.17.5.2 IMAGING OF SURFACE TOPOLOGY 

SEMs are ideally suited to study highly corrugated surfaces, due to the large depth of focus. They are 
generally operated with lower beam energies (100 eV-30 keV) in order to efficiently control the volume in 
which the electron beam interacts with the sample to produce various specific signals (see figure B1.17.1) for 
imaging and compositional analysis. Modern SEM instruments (equipped with a field emission electron 
source) can scan the sample surface with a beam diameter of 1 nm or smaller, thus providing high-resolution 
structural information that can complement the information obtained from atomic force microscopy (AFM). 
In contrast to AFM, which directly provides accurate height information in a limited range, quantitative 
assessment of the surface topography by SEM is possible by measuring the parallax of stereo pairs [45]. 
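As an illustration of the stereo-pair measurement, the sketch below uses the textbook parallax relation Δh = p / (2 M sin(Δα/2)), where p is the parallax measured on the micrographs, M the magnification and Δα the tilt difference between the two exposures. This relation is an assumption brought in from standard stereo-photogrammetry for parallel projection; it is not quoted from [45], and the numerical values are hypothetical.

```python
import math


def height_from_parallax(parallax, magnification, tilt_difference_deg):
    """Height difference between two features from the parallax of a stereo pair.

    parallax: difference in feature separation between the two micrographs (m)
    magnification: image magnification (dimensionless)
    tilt_difference_deg: total tilt angle between the two exposures (degrees)
    """
    half_angle = math.radians(tilt_difference_deg) / 2.0
    return parallax / (2.0 * magnification * math.sin(half_angle))


# Hypothetical example: 1 mm parallax on prints taken at 10 000x with 6 degrees tilt difference.
height = height_from_parallax(1e-3, 1e4, 6.0)
print(f"height difference: {height * 1e9:.0f} nm")
```

Small tilt differences give small parallaxes, so the accuracy of the height estimate depends directly on how precisely p can be measured.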

High-resolution topographic information is obtained by the secondary-electron signal (SE 1, see figure 
B1.17.7) produced at the point of impact of the primary beam (PE). The SE 1 signal alone is related to the 
position of the scanning beam. It depends on the distance the primary beam travels through the SE escape 
zone, where it releases secondary electrons that can leave the specimen surface, i.e. it depends on the angle of 
impact of the primary beam (see figure B1.17.7). The high-resolution topographic signal is obscured by other 
SE signals (SE 2, SE 3, figure B1.17.7) that are created by BSE (electrons of the primary beam, multiply 
scattered deeper inside the specimen) when they leave the specimen and pass through the SE escape zone (SE 
2) and hit the pole pieces of the objective lens and/or the walls of the specimen chamber (SE 3). Additional 
background signal is produced by the primary beam striking the objective lens aperture. 
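The dependence of the SE 1 signal on the angle of impact follows from simple geometry: a beam that enters a flat surface at an angle θ to the surface normal travels a distance d / cos θ through an escape zone of depth d. The sketch below only evaluates this path length; treating the SE 1 yield as proportional to it is a common first approximation and an assumption here, and the escape depth of 2 nm is merely an illustrative value within the range quoted for metals.

```python
import math


def path_in_escape_zone(escape_depth, tilt_deg):
    """Path length of the primary beam inside the SE escape zone of a flat surface.

    escape_depth: SE escape depth d (any length unit)
    tilt_deg: angle between the beam and the surface normal (degrees, below 90)
    """
    return escape_depth / math.cos(math.radians(tilt_deg))


for tilt in (0.0, 45.0, 60.0, 75.0):
    print(f"tilt {tilt:4.0f} deg: path = {path_in_escape_zone(2.0, tilt):.2f} nm")
```

The steep rise at large tilt angles is the geometric reason why inclined surfaces and edges appear bright in SE images.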




Figure B1.17.7. (a) Classification of the secondary electron signals. High-resolution topographic information 
is obtained by the SE 1 signal. It might be obscured by SE 2 and SE 3 signals that are created by the 
conversion of BSE. d: SE escape depth; R: range out of which BSE may leave the specimen. (b) SE signal 
intensity, R > d: the SE 1 signal depends on the angle of impact of the electron beam (PE). SE 1 can escape 
from a larger volume at tilted (B) surfaces and edges (C) than at orthogonal surfaces (A). 

High-resolution topographic imaging by secondary electrons therefore demands strategies (instrumentation, 
specimen preparation [35] and imaging conditions) that aim at enhancing the SE 1 signal and suppressing the 
background noise SE signals (e.g. [46]). Basically, the topographic resolution by SE 1 depends on the smallest 
spot size available and on the SE escape zone, which can be up to 100 nm for organic materials and down to 
1-2 nm for metals. 

Non-conductive bulk samples, in particular, are frequently rendered conductive by vacuum coating with 
metals using sputter or evaporation techniques. The metal coating should be of uniform thickness and 
significantly thinner than the smallest topographic details of interest. Metal coating provides the highest 
resolution images of surface details. It may, however, irreversibly destroy the specimen. An example of such a 
metal-coated sample is shown in figure B1.17.8. 

Figure B1.17.8. Iron oxide particles coated with 4 nm of Pt in a planar magnetron sputter coater 
(Hermann and Müller 1991). Micrographs were taken in a Hitachi S-900 'in-lens' field emission SEM at 
30 000× primary magnification and an acceleration voltage of 30 kV. Image width is 2163 nm. 

SEM with low acceleration voltage (1-10 kV) (LVSEM) can be applied without metal coating of the sample, 
e.g. for quality control purposes in the semiconductor industry, or to image ceramics, polymers or dry biological 
samples. The energy of the beam electrons (the acceleration voltage) should be selected so that charge 
neutrality is approached, i.e. the charge that enters the sample with the beam also leaves the sample in the 
form of SE and BSE. Modern SEM instruments equipped with FEGs provide an adequate spot size, although the spot 
size increases with decreasing acceleration voltage. The recent implementation of a cathode lens system [47] 
with very low aberration coefficients will allow the surfaces of samples without metal coating to be imaged at 
beam energies of only a few electronvolts without sacrificing spot size. New contrast mechanisms and new 
experimental possibilities can be expected. 

The fact that electron beam instruments work under high vacuum prohibits the analysis of aqueous systems, 
such as biological materials or suspensions, or emulsions without specimen preparation as outlined above. 
These preparation procedures are time consuming and are often not justified in view of the only moderate 
resolution required to solve a specific practical question (e.g. to analyse the grain size of powders, bacterial 
colonies on agar plates, to study the solidification of concrete, etc). Environmental SEM (ESEM) and 
'high-pressure SEM' instruments are equipped with differentially pumped vacuum systems and Peltier-cooled 
specimen stages, which allow wet samples to be observed at pressures up to 5000 Pa [48]. Evaporation of 
water from the specimen or condensation of water onto the specimen can thus be efficiently controlled. No 
metal coating or other preparative steps are needed to control charging of the specimen since the interaction of 
the electron beam with the gas molecules in the specimen chamber produces positive ions that can 
compensate surface charges. 'High-pressure SEM', therefore, can study insulators without applying a 
conductive coating. The high gas pressure in the vicinity of the specimen leads to scattering, and hence 
broadening, of the electron beam. Thus the resolution-limiting spot size achievable on the specimen surface 
depends on the acceleration voltage, the gas pressure, the scattering cross section of the gas and the distance 
the electrons have to travel through the high gas pressure zone [49]. High-pressure SEM and ESEM are still 
under development and the scope of applications is expanding. Results to date consist mainly of analytical and 
low-resolution images (e.g. [50]). 
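How gas pressure, cross section and path length combine can be sketched with a single-scattering estimate. Assuming an ideal gas of number density n = p / (k_B T), the mean number of collisions along a path L is m = σ n L, and the fraction of electrons that reach the specimen unscattered is exp(-m). This simple model is an assumption added for illustration and is not taken from [49]; the cross section used below is a hypothetical placeholder, since real values depend on the gas species and the beam energy.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def mean_collisions(cross_section, pressure, path_length, temperature=293.0):
    """Mean number of gas collisions per electron: m = sigma * n * L, with n = p / (k_B T)."""
    number_density = pressure / (BOLTZMANN * temperature)  # molecules per m^3 (ideal gas)
    return cross_section * number_density * path_length


def unscattered_fraction(cross_section, pressure, path_length, temperature=293.0):
    """Fraction of beam electrons that cross the gas layer without any collision."""
    return math.exp(-mean_collisions(cross_section, pressure, path_length, temperature))


SIGMA = 1e-21  # m^2, hypothetical total scattering cross section (placeholder value)
for pressure in (100.0, 1000.0, 5000.0):  # Pa
    fraction = unscattered_fraction(SIGMA, pressure, 2e-3)
    print(f"p = {pressure:6.0f} Pa, L = 2 mm: unscattered fraction = {fraction:.2f}")
```

Since the cross section falls with increasing electron energy, the same model also indicates why higher acceleration voltages and short gas path lengths are favoured at high chamber pressure.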

B1.17.5.3 MODERN DEVELOPMENTS OF INSTRUMENTS AFFECTING IMAGE CONTRAST AND RESOLUTION 

As was discussed above, the image contrast is significantly affected by the aberrations of the electron-optical 
lenses. The discussion on the CTF showed that the broadening point-spread function (PSF) of the TEM 
delocalizes information in a TEM image. This necessitates additional techniques to correct for the CTF, in 
order to obtain interpretable image information. Furthermore, it was discussed that the resolution of STEM and 
SEM depends on the size of the focused beam, which is also strongly dependent on lens aberrations. The leading 
aberrations in state-of-the-art microscopes are the spherical and chromatic aberrations in the objective lens. 
Correction of such aberrations was discussed as early as 1947, when Scherzer suggested the correction of 
electron optical lenses [51], but it was not until 1990 that a complete concept for a corrected TEM was 
proposed by Rose [52]. It was in the last decade that prototypes of such corrected microscopes were presented. 

The first corrected electron-optical SEM was developed by Zach [10]. For low-voltage SEM (LVSEM, down 
to 500 eV electron energy instead of the conventional energies of up to 30 keV) the spot size is extremely 
large without aberration correction. Combining C_s and C_c correction and an electrostatic objective lens, Zach 
showed that a substantial improvement in spot size and resolution is possible. The achievable resolution in an 
LVSEM is now of the order of 1-2 nm. More recently, Krivanek and colleagues succeeded in building a 
C_s-corrected STEM [51,54]. 
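Why the uncorrected spot grows so quickly at low voltage can be seen from the usual first-order estimates of the aberration discs, d_s = 0.5 C_s α³ for spherical aberration and d_c = C_c (ΔE/E) α for chromatic aberration, where α is the probe semi-angle. These expressions, their prefactors and the quadrature sum are textbook approximations assumed here (conventions differ between authors), and the lens parameters below are hypothetical rather than those of any instrument mentioned in the text.

```python
import math


def spherical_disc(cs, alpha):
    """Diameter of the spherical aberration disc, 0.5 * Cs * alpha^3 (assumed prefactor)."""
    return 0.5 * cs * alpha**3


def chromatic_disc(cc, energy_spread, energy, alpha):
    """Diameter of the chromatic aberration disc, Cc * (dE / E) * alpha."""
    return cc * (energy_spread / energy) * alpha


def probe_diameter(*discs):
    """Combine independent contributions in quadrature (a common rough estimate)."""
    return math.sqrt(sum(d * d for d in discs))


CS = 2e-3             # m, hypothetical spherical aberration coefficient
CC = 2e-3             # m, hypothetical chromatic aberration coefficient
ALPHA = 5e-3          # rad, hypothetical probe semi-angle
ENERGY_SPREAD = 0.5   # eV, hypothetical energy spread of the source

for energy in (30e3, 500.0):  # beam energy in eV
    d = probe_diameter(spherical_disc(CS, ALPHA),
                       chromatic_disc(CC, ENERGY_SPREAD, energy, ALPHA))
    print(f"E = {energy:7.0f} eV: estimated probe diameter = {d * 1e9:.2f} nm")
```

The chromatic term scales with ΔE/E, so lowering the beam energy from 30 keV to 500 eV at fixed energy spread enlarges it sixty-fold, which is consistent with the need to correct both aberrations in an LVSEM.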

The construction of an aberration-corrected TEM proved to be technically more demanding: the point 
resolution of a conventional TEM today is of the order of 1-2 Å. Therefore, the aim of a corrected TEM must 
be to increase the resolution beyond the 1 Å barrier. This implies a great number of additional stability 
problems, which can only be solved by the most modern technologies. The first C_s-corrected TEM prototype 
was presented by Haider and coworkers [55]. Figure B1.17.9 shows the improvement in image quality and 
interpretability gained from the correction of the spherical aberration in the case of a materials science sample. 

Figure B1.17.9. A CoSi grain boundary as visualized in a spherical-aberration-corrected TEM (Haider et al 
1998). (a) Individual images recorded at different defocus with and without correction of C_s; (b) CTFs in the 
case of the uncorrected TEM at higher defocus; (c) CTF for the corrected TEM at only 14 nm underfocus. 
Pictures by courtesy of M Haider and Elsevier. 

The field of corrected microscopes has just begun with the instruments discussed above. The progress in this 
field is very rapid and proposals for a sub-angstrom TEM (SATEM) or even the combination of this 
instrument with a corrected energy filter to form a sub-electronvolt/sub-angstrom TEM (SESAM) are 
underway. 


B1.17.6 ANALYTICAL IMAGING, SPECTROSCOPY, AND MASS 
MEASUREMENTS 

For a detailed discussion of the analytical techniques exploiting the amplitude contrast of inelastic images in 
ESI and image-EELS, see chapter B1.6 of this encyclopedia. One more recent but also very important aspect 
is the quantitative measurement of atomic concentrations in the sample. The work of Somlyo and colleagues 
[56], Leapman and coworkers [57,58], and Door and Gangler [59] introduced techniques to convert measured 
intensities of inelastically scattered electrons directly into atomic concentrations. 

For bio-medical or cell-biological samples, in particular, this provides a direct measurement of physiological 
ion concentrations. The main disadvantage of such methods is the almost unavoidable delocalization of free 
ions and the resulting change in concentration during sample preparation steps. The rapid freezing discussed 
above, combined with the direct observation of samples without any chemical treatment, provides a very good 
compromise for organic samples. In materials science, delocalization does not seem to pose a major problem. 

Another specialized application of EM image contrast is mass measurement. Using the elastic dark-field 
image in the STEM or the inelastic image in the EFTEM, a direct measurement of the scattering mass can be 
performed. For reviews on this technique see [60,61]. 


B1.17.7 3D OBJECT INFORMATION 

EM images are always either 2D projections of an interaction potential (see equation (B1.17.11) and equation 
(B1.17.12)) or a surface topology encoded in grey levels of individual image points (image contrast). The aim 
of EM image processing is to reconstruct the 3D object information from a limited number of such 
projections. This problem does not arise for all applications of EM. Very often in materials science the sample 
is prepared in such a way that one single projection image contains all the information necessary to answer a 
specified question. As an example, consider figure B1.17.9, which shows a Co-Si interface. The orientation of 
the sample is chosen to give a perfect alignment of atoms in the direction of the grain boundary. Imaging at 
atomic resolution allows the direct interpretation of the contrast as images of different atoms. One single 
exposure is, in principle, sufficient to collect all the information needed. It should be noted here again that this 
is only true for the spherical-aberration (C_s) corrected EM with its non-oscillating CTF (bottom right panel in 
figure B1.17.9(a)). As is obvious from the Co-Si interface (figure B1.17.9(a), finite C_s imaging) and the 
defocus series of the biological sample (figure B1.17.6), more than one image has to be combined for 
conventional EMs. However, such a direct interpretation of one projected image to obtain the 3D structure 
information works only for samples that are ordered crystallographically at the atomic level. For other 
samples it is necessary to combine projections from different angles. Such samples are unique, 
non-crystallographic structures, e.g. structural defects in materials science, cellular compartments in cell 
biology, or macromolecules (protein complexes or single biological molecules) in high-resolution molecular 
imaging. 

The generalized problem has been solved by tomography. In EM it is possible to tilt the sample about an axis 
which is perpendicular to the electron beam (single-axis tomography, see figure B1.17.10). If the sample can 
withstand a high cumulative electron dose, then an unlimited number of exposures at different defocus values 
and tilting angles can be recorded. If kinematical scattering theory can be applied (single elastic scattering, 
compare equation (B1.17.8)), then it is possible to correct all the effects of the CTF, and each corrected 
projection image at one particular tilting angle corresponds to a section of the Fourier-transformed 
scattering potential (equation (B1.17.11) and equation (B1.17.12)). The combination of information from 
different tilting angles allows the determination of the structure factors in the complete 3D Fourier space. 
Finally, a simple mathematical inverse Fourier transform produces a complete 3D reconstructed object. A 
geometric equivalent of the projection and reconstruction process is found in the sectioning of the Ewald 
sphere with the 3D Fourier transform of the scattering potential. For an introduction to the concepts of the 
Ewald sphere and Fourier techniques in structure determination see [62]. 

Figure B1.17.10. Principles of 3D reconstruction methods. (a) Principle of single-axis tomography: a particle 
is projected from different angles to record corresponding images (left panel); this is most easily realized in 
the case of a helical complex (right panel). (b) Principle of data processing and data merging to obtain a 
complete 3D structure from a set of projections. 

The basic reconstruction algorithms involved are mathematically well known and well established [63]. A 
variety of concepts has evolved for single- and multi-axis tomography, combining projection information and 
calculating 3D object densities either with Fourier- or real-space algorithms (see figure B1.17.10, which shows 
some examples of the geometry used for single- and multi-axis tomography). For a complete reference to 
the methods used see Frank [64] and the special issue of Journal of Structural Biology (vol 120, 1997). 
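The statement that each projection corresponds to a central section of the Fourier-transformed object can be checked numerically in two dimensions. The sketch below is a toy demonstration with a random test object and discrete Fourier transforms; it illustrates the principle only and is not one of the reconstruction algorithms of [63,64].

```python
import numpy as np

rng = np.random.default_rng(0)
test_object = rng.random((64, 64))  # arbitrary 2D test "density"

# Project along axis 0 (taken here as the beam direction) to obtain a 1D image.
projection = test_object.sum(axis=0)

# Fourier transform of the projection ...
projection_transform = np.fft.fft(projection)

# ... equals the central section (zero frequency along the beam axis) of the 2D transform.
central_section = np.fft.fft2(test_object)[0, :]

max_difference = float(np.max(np.abs(projection_transform - central_section)))
print(f"largest deviation between the two: {max_difference:.2e}")
```

Tilting the object changes the orientation of the section, which is how a tilt series gradually fills Fourier space before the inverse transform is taken.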


The main disadvantage of the tomographic approach is the beam-induced destruction of the sample. In 
practice, one can record only a limited number of images. Therefore, it is not possible to correct the CTF 
completely or to obtain an infinite sampling of the projection angles from only one specimen. Two major 
approaches are used today: the single- and double-axis tomography of one individual object (e.g. cell 
organelles, see Baumeister and coworkers [65]) and, second, the imaging of many identical objects under 
different projection angles (see [64]; random conical tilt, angular reconstitution). 


For single-axis tomography, with its limited number of images and the subsequent coarse sampling in 
reciprocal Fourier space, only a moderate resolution can be expected. For chemically fixed samples with high 
image contrast from heavy atom staining it is possible to obtain a resolution of about 4 nm ([66], 
reconstruction of the centrosome). For native samples, true single-axis tomography without averaging over 
different samples results in even lower resolution. Today, sophisticated EM control software allows a fully 
automatic collection of tilted images [67], making single-axis tomography a perfect reconstruction tool for 
unique objects. 
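The link between the number of tilted images and the attainable resolution is often estimated with the Crowther criterion, d ≈ π D / N, for an object of diameter D reconstructed from N projections equally spaced over the full angular range. This criterion is an assumption introduced here for illustration; it is not stated in the text above, it ignores the missing wedge of a real tilt series, and the numbers in the example are hypothetical.

```python
import math


def crowther_resolution(diameter, n_projections):
    """Resolution estimate d = pi * D / N for N equally spaced projections over 180 degrees."""
    return math.pi * diameter / n_projections


def projections_needed(diameter, resolution):
    """Smallest number of projections that satisfies the Crowther criterion."""
    return math.ceil(math.pi * diameter / resolution)


# Hypothetical example: a 200 nm thick object imaged with a 2 degree tilt increment.
n_images = 90
print(f"resolution estimate: {crowther_resolution(200.0, n_images):.1f} nm")
print(f"projections for 4 nm resolution: {projections_needed(200.0, 4.0)}")
```

Because every additional image adds to the electron dose, the number of projections cannot be increased freely, which is the reason for the moderate resolution mentioned above.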

If many identical copies of the object under study are available, other procedures are superior. They rely on 
the fact that the individual molecules are randomly oriented with respect to the incident electron beam. Such a 
situation is found mainly for native ice-embedded samples (compare the paragraph about preparation). In ice 
layers of sufficient thickness, no special orientation of the molecule is preferred. The projection images 
obtained from one untilted image can then be classified and aligned in an angular reconstitution reconstruction 
process. By averaging large numbers of projection images, it is possible to correct for CTF effects [68] and to 
obtain an almost complete coverage of reciprocal Fourier space. If, for some reason, the object still shows a 
limited number of preferred orientations, an additional tilting of the sample again gives complete coverage of 
possible projection angles (random conical tilt method). Both methods have been successfully applied to many 
different biological samples (for an overview, see [64]). 
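The benefit of averaging many aligned projection images can be illustrated with a small simulation. The sketch assumes perfectly aligned copies of a hypothetical test image corrupted by independent Gaussian noise, in which case the residual noise of the average falls as 1/√N; real data also require classification and alignment, which are not modelled here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noise-free projection: a bright square on a dark background.
signal = np.zeros((64, 64))
signal[24:40, 24:40] = 1.0
NOISE_SIGMA = 5.0  # noise much stronger than the signal, as in low-dose images


def average_noisy_copies(image, n_copies, sigma, generator):
    """Average n_copies of `image`, each with independent Gaussian noise added."""
    noise = generator.normal(0.0, sigma, size=(n_copies,) + image.shape)
    return (image + noise).mean(axis=0)


residuals = {}
for n_copies in (1, 100):
    average = average_noisy_copies(signal, n_copies, NOISE_SIGMA, rng)
    residuals[n_copies] = float(np.std(average - signal))
    print(f"N = {n_copies:3d}: residual noise = {residuals[n_copies]:.2f}")
```

Averaging about 100 images, as was done for figure B1.17.6, therefore reduces the noise by roughly a factor of ten under these idealized assumptions.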

An important point for all these studies is the possible variability among the single molecules or particles 
studied. It is not possible, a priori, to exclude 'bad' particles from the averaging procedure. It is clear, 
however, that high structural resolution can only be obtained from a very homogeneous ensemble. Various 
classification and analysis schemes are used to extract such homogeneous data, even from sets of mixed states 
[69]. In general, a typical resolution of the order of 1-3 nm is obtained today. 

The highest resolutions for biological samples have been achieved with crystalline samples (electron 
crystallography). 2D crystals of membrane proteins and one cytoskeletal protein complex have been solved at 
the 3-4 Å level combining imaging and electron diffraction, as pioneered by Henderson and coworkers 
[70,71,72] (see also figure B1.17.11). Icosahedral virus particles are reconstructed from images to 8-9 Å 
resolution [26,27], allowing the identification of alpha helices. Compared to single particles, these samples 
give much higher resolution, in part because much higher numbers of particles are averaged, but it is also 
possible that a crystallization process selects for the uniformity of the crystallizing object and leads to very 
homogeneous ensembles. 

Figure B1.17.11. Reconstructed density of an α,β-tubulin protein dimer as obtained from electron 
crystallography (Nogales et al 1997). Note the appearance of the β-sheets ((a), marked B) and the α-helices 
((b), marked H) in the density. In particular the right-handed α-helix H6 is very clear. Pictures by courtesy of 
E Nogales and Academic Press. 

For electron crystallography, the methods to obtain the structure factors are comparable to those of 
conventional x-ray crystallography, except that direct imaging of the sample is possible. This means that both 
electron diffraction and imaging can be used, i.e. structure amplitudes are collected by diffraction, structure 
factor phases by imaging. For a general overview of structure determination by electron crystallography,
see [73]. The 3D structure of the sample is obtained by merging diffraction and imaging data of tilt series 
from different crystals. It is, therefore, a form of tomography adapted for a diffracting object. 
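The division of labour described above, amplitudes from diffraction and phases from imaging, can be illustrated with a minimal one-dimensional Fourier synthesis. This is only a sketch under simplifying assumptions, not the processing described in [73]: the function names `synthesize_density` and `peak_position` and the toy amplitude and phase values are hypothetical and chosen for illustration.

```python
import math


def synthesize_density(amplitudes, phases, n_points=64):
    """Sum a 1D Fourier series from amplitudes |F_h| and phases phi_h (radians).

    The lists hold the terms h = 1, 2, ...; F_0 is omitted. Assuming Friedel
    symmetry, F_(-h) = conj(F_h), the density is real:
        rho(x) = 2 * sum_h |F_h| * cos(2*pi*h*x - phi_h)
    """
    density = []
    for j in range(n_points):
        x = j / n_points  # fractional coordinate in the unit cell
        rho = 0.0
        for h, (amp, phi) in enumerate(zip(amplitudes, phases), start=1):
            rho += 2.0 * amp * math.cos(2.0 * math.pi * h * x - phi)
        density.append(rho)
    return density


def peak_position(density):
    """Fractional coordinate of the highest density value."""
    return density.index(max(density)) / len(density)


# Toy values (assumed): identical amplitudes combined with two phase sets.
amplitudes = [1.0, 0.5]
print(peak_position(synthesize_density(amplitudes, [0.0, 0.0])))
print(peak_position(synthesize_density(amplitudes, [math.pi / 2, math.pi])))
```

With the same amplitudes, changing only the phases moves the density peak from x = 0 to x = 0.25, which is why the phases obtained by imaging are indispensable for locating features in the reconstruction.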

Even though it is easy to get the phase information from imaging, in general, imaging at the desired high
resolution (for structure determination work of the order of 3-4 Å) is very demanding. Specialized
instrumentation (300 kV, FEG, liquid He sample temperature) has to be used to avoid multiple scattering, to
allow better imaging (fewer imaging aberrations, less specimen charging, which would affect the electron beam)
and to reduce the effects of beam damage.


B1.17.8 TIME-RESOLVED AND IN SITU EM STUDIES: VISUALIZATION
OF DYNAMICAL EVENTS 

As a result of the physical conditions in electron microscopes such as the high vacuum, the high energy load 
on the sample by inelastic scattering, or the artificial preparation of the sample by sectioning or thinning, it 
has become customary to think about samples as static objects, precluding the observation of their native 
structural changes during a reaction. In some studies, however, the dynamics of reactions have been studied 
for biological systems as well as in materials science. In situ microscopy was widely used in materials science
in the 1960s and 1970s, when, for example, metal foils were studied, heated up in the EM, and reactions followed
in a kind of time-lapse microscopy [74]. In recent years, similar experiments have been performed on
semiconductors and ceramics, and a general new interest in in situ microscopy has developed.

Time-resolved EM in biological systems is a comparatively new and limited field. Simple time-lapse fixation 
of different samples of a reacting biological tissue has long been used, but the direct, temporal monitoring of a 
reaction was developed only with the invention of cryo-fixation techniques. Today, time-lapse cryo-fixation 
studies can be used in the case of systems with slow kinetics, i.e. reaction times of the order of minutes or 
slower. Here, samples of a reacting system are simply taken in certain time intervals and frozen immediately. 
For the study of very fast reactions, two approaches have been developed that couple the initiation of the 
reaction and the fixation of the system on a millisecond time scale. The reaction itself can be started either by
a rapid mixing procedure [75] or by the release of a masked reaction partner through photolysis of caged compounds
(see figure B1.17.12) [76]. For a review of time-resolved methods used in biological EM, see [19].





Figure B1.17.12. Time-resolved visualization of the dissociation of myosin S1 from filamentous actin (see
also figure B1.17.6). Shown are selected filament images before and after the release of a nucleotide analogue
(AMPPNP) by photolysis: (a) before flashing, (b) 20 ms, (c) 30 ms, (d) 80 ms and (e) 2 s after flashing. Note
the change in obvious order (as shown by the diffraction insert in (a)) and the total dissociation of the complex
in (e). The scale bar represents 35.4 nm. Picture courtesy of Academic Press.

B1.18 Microscopy: light

H Kiess 


B1.18.1 INTRODUCTION 

Light microscopy is of great importance for basic research, analysis in materials science and for the practical 
control of fabrication steps. When used conventionally it serves to reveal structures of objects which are 
otherwise invisible to the eye or magnifying glass, such as micrometre-sized structures of microelectronic 
devices on silicon wafers. The lateral resolution of the technique is determined by the wavelength of the light
and the objective of the microscope. However, the quality of the microscopic image is not solely determined
by resolution; noise and lack of contrast may also prevent images of high quality being obtained and the
theoretical resolution being reached even if the optical components are ideal. The working range of the light
microscope in comparison to other microscopic techniques is depicted schematically in table B1.18.1. Clearly,
the light microscope has an operating range from about half a micrometre up to millimetres, although recent
developments in improving resolution allow the lower limit to be pushed below half a micrometre.

Table B1.18.1 Overview of working ranges of various microscopic techniques (in µm).

Technique                           Working range (µm)
Light microscope                    0.5 to 1000
Scanning electron microscope        0.05 to 1000
Transmission electron microscope    0.001 to 10
Scanning probe microscope           0.0001 to 100


Microscopes are also used as analytical tools for strain analysis in materials science, determination of 
refractive indices and for monitoring biological processes in vivo on a microscopic scale etc. In this case 
resolution is not necessarily the only important issue; rather it is the sensitivity allowing the physical quantity 
under investigation to be accurately determined. 

Light microscopy allows, in comparison to other microscopic methods, quick, contact-free and non-destructive
access to the structures of materials, their surfaces and to dimensions and details of objects in the
lateral size range down to about 0.2 µm. A variety of microscopes with different imaging and illumination
systems has been constructed and is commercially available in order to satisfy special requirements. These 
include stereo, darkfield, polarization, phase contrast and fluorescence microscopes. 


The more recent scanning light microscopes are operated in the conventional and/or in the confocal mode 
using transmitted, reflected light or fluorescence from the object. Operation in the confocal mode allows 
samples to be optically sectioned and 3D images of objects to be produced — an important aspect for imaging 
thick biological samples. The breakthrough for confocal microscopes was intimately connected with the 
advent of computers and data processing. The conventional microscope is then replaced by a microscopic 
system comprising the microscope, the scanning, illumination and light detection systems, the data processor 
and computer. 

This overview will first deal with the optical aspects of conventional microscopes and the various means to 
improve contrast. Confocal microscopy, which in the last decade has become an important tool, especially for 
biology, is discussed in the final section. 


B1.18.2 MAGNIFICATION, RESOLUTION AND DEPTH OF FOCUS 


B1.18.2.1 MAGNIFICATION


Microscopes are imaging systems and, hence, the image quality is determined by lens errors, by structures in 
the image plane (e.g., picture elements of CCD cameras) and by diffraction. In addition, the visibility of 
objects with low contrast suffers from various noise sources such as noise in the illuminating system (shot 
noise), scattered light and by non-uniformities in the recording media. Interest often focuses on the achievable 
resolution, and discussions on limits to microscopy are then restricted to those imposed by diffraction (the so- 
called Abbe limit), assuming implicitly that lenses are free of errors and that the visual system or the image 
sensors are ideal. However, even under these conditions the Abbe limit of the resolution may not be reached if 
the contrast is insufficient and noise is high. 

Before discussing the limits imposed by diffraction and the influence of contrast and noise on resolution, it is 
important to recall the basic principle of the light microscope: The objective lens provides a magnified real 
image of the object in the focal plane of the eyepiece. This image is then focused by the eyepiece onto the 
retina of the eye and is seen by the observer as a virtual image at about 25 cm distance, the normal distance 
for distinct vision (figure B1.18.1). The object is illuminated by the light of a lamp, either from below through
the stage of the object holder if the object is transparent, or from the top if the object is non-transparent and 
reflecting. Organic objects containing fluorescent molecules are often investigated with an illuminating light 
beam that causes the sample to fluoresce. The exciting light is 'invisible' and the object is imaged and 
characterized by the emitted light.

Figure B1.18.1. Light rays and imaging lenses in the microscope. The illumination system is not included.
The real image is seen by the eye as a virtual image at 25 cm distance, the normal distance for distinct vision. 

The magnification $M_{\mathrm{microscope}}$ achievable by a microscope is the ratio of the scales of the virtual image and of
the object. It can be easily seen from figure B1.18.1 that this ratio is given by

$$M_{\mathrm{microscope}} = M_{\mathrm{obj}} \, M_{\mathrm{eyepiece}}.$$

$M_{\mathrm{obj}}$ is the scale of magnification by the objective under the geometric conditions given by the microscope
and $M_{\mathrm{eyepiece}} = l/f_{\mathrm{eyepiece}}$ is the magnification by the eyepiece, with the focal length $f_{\mathrm{eyepiece}}$ and $l = 25$ cm,
the normal distance for distinct vision. The objectives are marked with the scale of magnification (e.g. 40:1)
by the manufacturer and similarly, the eyepiece by its magnification under the given conditions (e.g. 5×).
Multiplication of both numbers gives the magnification of the microscope. For practical reasons the
magnification of the objective is not so high as to resolve all the details in the real image with the naked eye.
The magnification is rather chosen to be about $500\,A_{\mathrm{obj}}$ to $1000\,A_{\mathrm{obj}}$, where $A_{\mathrm{obj}}$ is the numerical aperture of
the objective (see the next section). The eyepiece is then necessary to magnify the real image so that it can
conveniently be inspected.
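As a small worked example of the relations in this section, the sketch below multiplies the objective scale by the eyepiece magnification (25 cm divided by the eyepiece focal length) and evaluates the rule of thumb of 500 to 1000 times the numerical aperture. The function names and the 5 cm focal length are illustrative assumptions, not values taken from a particular instrument.

```python
def microscope_magnification(m_objective, f_eyepiece_cm, near_distance_cm=25.0):
    """Total magnification: objective scale times eyepiece magnification.

    The eyepiece magnification is the normal distance for distinct vision
    (25 cm) divided by the focal length of the eyepiece.
    """
    m_eyepiece = near_distance_cm / f_eyepiece_cm
    return m_objective * m_eyepiece


def useful_magnification_range(numerical_aperture):
    """Rule-of-thumb range of 500 to 1000 times the numerical aperture."""
    return 500.0 * numerical_aperture, 1000.0 * numerical_aperture


# A 40:1 objective with a 5x eyepiece (focal length 5 cm, assumed).
print(microscope_magnification(40.0, 5.0))
# Range suggested by the rule of thumb for a numerical aperture of 0.9.
print(useful_magnification_range(0.9))
```

For the 40:1 objective and 5× eyepiece used as examples in the text, the product is 200.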

B1.18.2.2 LATERAL RESOLUTION: DIFFRACTION LIMIT 

The performance of a microscope is determined by its objective. It is obvious that details of the object that are 
not contained in the real image (figure B1.18.1) cannot be made visible by the eyepiece or lens systems,
whatever quality or magnification they may have. The performance is defined here as the size of the smallest 
lateral structures of the object that can be resolved and reproduced in the image. To fully assess resolution and 
image fidelity, the modulation transfer function of the imaging system has to be known (or for scanning 
microscopes more conveniently the point spread function). The resolution is then given by the highest lateral 
frequency of an object which can just be transmitted by the optical system. 


Alternatively, one may consider the separation of two structure elements in the plane of the object, which are
just discernible in the image [1, 2 and 3]. Since an exact correlation exists between the pattern generated by
the object in the exit pupil of the objective and the image, the limit on the resolution can be estimated simply.
If the diffracted beams of zeroth and ±first order are collected by the lens, then an image of low fidelity of the
structure is obtained; with the zeroth order alone, only a grey area is seen. Hence, the limit to resolution is
reached whenever the zeroth- and first-order beams are just collected (figure B1.18.2). If the diffracted light
enters a medium of refractive index $n$, the minimal discernible separation $a_{\min}$ of two structure elements is
given by $n \sin\alpha = \lambda / a_{\min}$. The expression $n \sin\alpha$ can be called the numerical aperture of the diffracted beam of first order
which, for microscopes, is identical to the numerical aperture $A_{\mathrm{obj}}$ of the objective lens. The numerical
aperture of a lens is the product of the refractive index of the medium in front of the objective and of the sine
of the half-angle $\alpha$ of the light cone that has its vertex on the optical axis and is just collected by the lens.

Figure B1.18.2. Diffraction figure of a grating: If only the zeroth-order beam were collected by the lens, only
a bright area would be visible without any structure indicating the presence of the grating. If the zeroth- and 
±first-order beams are collected, as indicated in the figure, the grating can be observed, albeit with incomplete 
object fidelity. 

The smallest resolvable structure is thus $a_{\min} = \lambda / A_{\mathrm{obj}}$. If, in addition, the aperture of the illumination system
is taken into account, one finds:

$$a_{\min} = \lambda / (A_{\mathrm{obj}} + A_{\mathrm{ill}}).$$

The highest resolution is obtained if $A_{\mathrm{ill}} = A_{\mathrm{obj}}$. In this case $a_{\min}$ equals about half the wavelength used for the
illumination divided by the numerical aperture. Using blue or ultraviolet light for illumination, $a_{\min}$ can reach
values of 0.2 to 0.15 µm with a numerical aperture of the microscope of about 0.9.
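The diffraction limit can be evaluated numerically with a few lines of code. The sketch below assumes the relation in which the smallest resolvable separation is the wavelength divided by the sum of the objective and illumination apertures; the function name `abbe_resolution_um` and the chosen wavelengths are illustrative assumptions.

```python
def abbe_resolution_um(wavelength_um, a_objective, a_illumination=None):
    """Smallest resolvable separation, wavelength / (A_obj + A_ill), in micrometres.

    If no illumination aperture is given, it is set equal to the objective
    aperture, which is the condition for the highest resolution.
    """
    if a_illumination is None:
        a_illumination = a_objective
    return wavelength_um / (a_objective + a_illumination)


# Blue light (0.40 um, assumed) and near-ultraviolet light (0.30 um, assumed)
# with a numerical aperture of 0.9 and matched illumination aperture.
for wavelength in (0.40, 0.30):
    print(wavelength, round(abbe_resolution_um(wavelength, 0.9), 3))

# Axial illumination only (illumination aperture 0) halves the resolution.
print(round(abbe_resolution_um(0.40, 0.9, 0.0), 3))
```

The matched-aperture results, roughly 0.22 µm and 0.17 µm, fall in the range of about 0.2 to 0.15 µm quoted in the text.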


The diffraction limit for resolution does not imply that objects of dimensions smaller than $a_{\min}$ are not
detectable. Single light-emitting molecules or scattering centres of atomic dimensions can be observed even 
though their size is below the resolution. For their detectability it is required that the separation of the centres 
is greater than the resolution and that the emitted signal is sufficiently high to become detectable by a light- 
sensitive device. Microscopy, for which these assumptions are fulfilled, has sometimes been called 
ultramicroscopy. 


B1.18.2.3 CONTRAST, NOISE AND RESOLUTION

The resolution limited by diffraction assumes that illumination and contrast of the object are optimal. Here we
discuss how noise affects the discernibility of small objects and of objects of weak contrast [4]. Noise is
inherent in each light source due to the statistical emission process. It is, therefore, a fundamental
limitation on image quality and resolution, just as diffraction is.
Light passing through a test element in defined time slots Δt will not contain the same
number of photons in a series of successive runs. This is due to the stochastic emission process in the light
source. Other sources of noise, such as inhomogeneities of recording media, will not be considered here.

It is assumed that the image can be divided up into a large number of picture elements, whose number will be 

of the order of 10 . If the contrast due to a structure in the object between two adjacent elements is smaller 
than the noise-induced fluctuations, the structure cannot be discerned, even if diffraction would allow this. 
Similarly, if the statistical excursion of the photon number in one or several of the elements is larger than or 
equal to the signal, then the noise fluctuations might be taken as true signals and lead to misinterpretations. 

If viewed in transmission, the background brightness $B_b$ is higher than the light $B_o$ transmitted by an
absorbing object. The contrast can then be defined as $C = (B_b - B_o)/B_b$ with $B_b > B_o > 0$ and $1 > C > 0$. It has
been shown that the density of photons $R$ (photons cm$^{-2}$) required to detect the contrast $C$ is

$$R = N k^{2} / C^{2}.$$

Here, $N$ is the density of picture elements (cm$^{-2}$); if $N$ is high, the resolution is high, requiring, however, an
increase of photon density over images with lower resolution. Also, low contrast requires greater photon
density than high contrast in order to overcome false signals by noise fluctuations of adjacent picture
elements. The factor $k$ reflects the random character of the photons (noise) and has to be chosen so as to
protect against misinterpretations caused by noisy photon flux. The value of $k$ depends somewhat on how well the image
should be freed from noise-induced artefacts, a reasonable value being $k = 5$.
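To give a feeling for the numbers, the sketch below evaluates the photon density needed to detect a given contrast. It assumes a Rose-type relation, R = N k² / C², which reproduces the dependences described in the text (more picture elements, lower contrast and a larger safety factor k all demand more photons); the function name and the example values are hypothetical.

```python
def required_photon_density(pixel_density_cm2, contrast, k=5.0):
    """Photon density (photons per cm^2) needed to detect a contrast C.

    Assumed Rose-type relation: R = N * k**2 / C**2, with N the density of
    picture elements (per cm^2) and k the safety factor against noise.
    """
    if not (contrast > 0.0 and 1.0 >= contrast):
        raise ValueError("contrast must lie in the interval (0, 1]")
    return pixel_density_cm2 * k ** 2 / contrast ** 2


# Example values (assumed): one million picture elements per cm^2.
for contrast in (1.0, 0.3, 0.05):
    print(contrast, f"{required_photon_density(1.0e6, contrast):.2e}")
```

Under this assumption, lowering the contrast from 100% to 5% raises the required photon density by a factor of 400, which illustrates why weak-contrast objects are so sensitive to noise.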

A summary of the diffraction- and noise-induced limitations of the resolution is qualitatively depicted in 
figure B1.18.3. With noise superimposed, the rectangular structure depicted in figure B1.18.3(a) becomes less
defined with decreasing spacing and width of the rectangles. In figure B1.18.3(b), an assumed modulation
transfer function of an objective is shown: that is, the light intensity in the image plane as a function of spatial 
frequency obtained by an object which sinusoidally modulates the transmitted light intensity. At low spatial 
frequencies, the amplitude is independent of frequency; at higher frequencies it drops linearly with increasing 
frequency. The root mean square (rms) noise, due to the statistical nature of the light, increases with spatial 
frequency. The intersection of the rms noise with the modulation transfer function gives the frequency at 
which noise becomes equal to (k = 1) or 1/25th (k = 5) of the signal. At high contrast, the decrease in image
amplitude is usually determined by diffraction; at lower contrast, noise is predominant.

Figure B1.18.3. (a) Rectangular structure with noise superimposed: in the left half of the graph the
rectangular structure is recognizable; in the right half, with the narrower spacing the rectangular structure
would barely be recognizable without the guiding lines in the figure. (b) Modulation transfer function for a
sinusoidal signal of constant amplitude as a function of frequency: at low frequency the amplitude of the 
transfer function is independent of the frequency. However, beyond a certain frequency the amplitude 
decreases with increasing frequency. This drop corresponds to a limitation of the resolution by diffraction. 
Noise increases with frequency and the crossings with the transfer function indicate where noise limits the 
resolution. At a contrast of 30% this cutoff frequency is exclusively determined by noise at point A; at 100% 
contrast the amplitude of the signal drops before the final signal is cut off at point B by noise. For k = 5, the 
noise at the crossing is 1/25th of the signal; for k = 1, it is equal to the signal.

Under appropriate contrast and high light intensity, the resolution of planar object structures is diffraction 
limited. Noise in the microscopic system may also be important and may reduce resolution, if light levels 
and/or the contrasts are low. This implies that the illumination of the object has to be optimal and that the 
contrast of rather transparent or highly reflecting objects has to be enhanced. This can be achieved by an 
appropriate illumination system, phase- and interference-contrast methods and/or by data processing if 
electronic cameras (or light sensors) and processors are available. Last but not least, for low-light images, 
efforts can be made to reduce the noise either by averaging the data of a multitude of images or by subtracting 
the noise. Clearly, if the image is inspected by the eye, the number of photons, and hence the noise, are 
determined by the integration time of the eye of about 1/30 s; signal/noise can then only be improved, if at all 
possible, by increasing the light intensity. Hence, electronic data acquisition and processing can be used 
advantageously to improve image quality, since integration times can significantly be extended and noise 
suppressed. 
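The benefit of longer integration times can be sketched with simple photon statistics. The example below assumes Poisson-distributed photon counts, so that the noise is the square root of the mean count; the photon rate of 3000 photons per second per picture element is a hypothetical value chosen only for illustration.

```python
import math

EYE_INTEGRATION_TIME_S = 1.0 / 30.0  # integration time of the eye quoted in the text


def shot_noise_snr(photon_rate_per_s, integration_time_s):
    """Signal-to-noise ratio for Poisson statistics: mean count / sqrt(mean count)."""
    mean_count = photon_rate_per_s * integration_time_s
    return math.sqrt(mean_count)


rate = 3000.0  # photons per second per picture element (assumed)
for factor in (1, 10, 100):
    t = factor * EYE_INTEGRATION_TIME_S
    print(f"t = {t:.3f} s, S/N = {shot_noise_snr(rate, t):.1f}")
```

In this model the signal-to-noise ratio grows with the square root of the integration time, so an electronic sensor integrating 100 times longer than the eye gains a factor of 10.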

B1.18.2.4 DEPTH OF FOCUS

The depth of focus is defined as how far the object might be moved out of focus before the image starts to 
become blurred. It is determined (i) by the axial intensity distribution which an ideal object point suffers by 
imaging with the objective (point spread function), (ii) by geometrical optics and (iii) by the ability of the eye 
to adapt to different distances. In case (iii), the eye adapts and sees images in focus at various depths by 
successive scanning. Obviously, this mechanism is inoperative for image sensors and will not be considered 
here. The depth of focus caused by spreading the light intensity in axial direction (i) is given by 


$$f_{\mathrm{PSF}} = n\lambda / A^{2},$$


with $n$ the refractive index, $\lambda$ the wavelength of the light and $A$ the numerical aperture.


The focal depth in geometrical optics is based on the argument that 'points' of a diameter smaller than 0.15 
mm in diameter cannot be distinguished. This leads to a focal depth of 

The total depth of focus is the sum of both. It increases with the wavelength of the light, depends on the 
numerical aperture and the magnification of the microscope. For λ = 550 nm, a refractive index of 1 and a
numerical aperture of 0.9, the depth of focus is in the region of 0.7 µm; with a numerical aperture of 0.4 it
increases to about 5 µm. High-resolution objectives exclude the observation of details in the axial direction
beyond their axial resolution. This is true for conventional microscopy, but not for scanning confocal 
microscopy, since optical sectioning allows successive layers in the bulk to be studied. Similarly, the field of 
view decreases with increasing resolution of the objective in conventional microscopy, whereas it is 
independent of resolution in scanning microscopy. 
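As a rough numerical check of the values quoted above, the sketch below evaluates only the diffraction (point spread function) contribution to the depth of focus, under the assumption that it has the form nλ/A². The geometrical contribution, which also depends on the magnification, is not included, so the totals quoted in the text are somewhat larger. The function name is hypothetical.

```python
def psf_depth_of_focus_um(wavelength_um, numerical_aperture, refractive_index=1.0):
    """Diffraction part of the depth of focus, assumed to be n * wavelength / A**2."""
    return refractive_index * wavelength_um / numerical_aperture ** 2


# Wavelength 550 nm and refractive index 1, as in the text.
for aperture in (0.9, 0.4):
    print(aperture, round(psf_depth_of_focus_um(0.55, aperture), 2))
```

Under this assumption the diffraction term is about 0.68 µm for a numerical aperture of 0.9 and about 3.4 µm for 0.4, showing how strongly the depth of focus shrinks as the aperture, and with it the lateral resolution, increases.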


B1.18.3 CONTRAST ENHANCEMENT 

In transmission microscopy, a transparent object yields low contrast. Molecular biological samples may be 
dyed in order to enhance contrast. However, this is in many cases neither possible nor desirable for various 
reasons, meaning that the object is only barely visible in outline and with practically no contrast. Similarly, if 
inorganic samples are to be investigated which are composites of materials of practically equal indices of 
refraction, the different components can only be distinguished in the microscope with great difficulty. This is 
all equally true for reflected light microscopy: the visibility and resolution of microscopic images suffer, if 
contrast is low. In order to cope with this, different illumination techniques are applied in order to enhance the 
contrast. 

B1.18.3.1 KÖHLER'S BRIGHT-FIELD ILLUMINATION SYSTEM

Köhler's illumination system [5], which allows the field of view to be precisely illuminated, is schematically
depicted in figure B1.18.4. The object is illuminated through a substage: the filament of a lamp is imaged by a
collector lens into the focal plane of the condenser, where the condenser iris is located. Light from each point 
in the condenser iris passes through the object as a parallel beam inclined to the axis of the microscope at an 
angle depending on the position of the point in the iris. The parallel beams come to a focus at corresponding 
points in the focal plane of the objective. The collector iris allows the area illuminated in the object plane to be
varied. The condenser iris is the aperture of the illumination and its opening should be adjusted to the aperture 
of the objective lens and contrast properties of the object: if the apertures are equal, the highest resolution is 
achieved; if the illumination aperture is reduced, the contrast is enhanced. In practice, the aperture of the 
condenser is in most cases chosen to be smaller than the aperture of the objective lens in order to improve 
contrast.

Figure B1.18.4. The most frequently used illumination system in bright-field microscopy.

B1.18.3.2 ENHANCED CONTRAST BY DARK-FIELD ILLUMINATION

Dark-field microscopy utilizes only those light beams from the condenser that have an aperture greater than 
that of the objective. This is in contrast to Köhler's illumination principle, where care is taken to adjust the
aperture of the condenser by an iris to become equal to or smaller than that of the objective. A ring-type 
diaphragm is used to allow light beams to pass the condenser lens at an aperture greater than that of the 
objective lens. This is shown schematically in figure B1.18.5. In this arrangement, no direct light beams pass
through the objective but only those which are diffracted or scattered by the object. If the direct light beam is 
blocked out, the background appears black instead of bright, thus increasing the contrast. Special condensers 
have been designed for dark-field illumination. Dark-field illumination has often been used in reflection. 




Figure B1.18.5. Dark-field illumination: the aperture of the objective is smaller than the aperture of the beams
allowed by the annular diaphragm. 

B1.18.3.3 ZERNIKE'S PHASE CONTRAST MICROSCOPY

Phase contrast microscopy [6, 7] is more sophisticated and universal than the dark-field method just 
described. In biology, in particular, microscopic objects are viewed by transmitted light and phase contrast is 
often used. Light passing through transparent objects has a different phase from light going through the 
embedding medium due to differences in the indices of refraction. The image is then a so-called phase image 
in contrast to an amplitude image of light absorbing objects. Since the eye and recording media in question 
respond to the intensity (amplitude) of the light and not to changes of the light phase, phase images are barely 
visible unless means are taken to modify the interference of the diffracted beams. The diffraction pattern of a 
phase grating is like that of an amplitude grating except that the zeroth-order beam is especially dominant in 
intensity. Zernike realized that modification of the zeroth-order beam will change the character of the image 
very effectively, by changing its phase and its intensity. For each object, depending on its character 
concerning the phase and amplitude, a 'Zernike diaphragm' (i.e. a diaphragm that affects the phase and the 
amplitude of the zeroth-order beam) can be constructed with an appropriate absorption and phase shift, which 
allows the weak-contrast image of the object to be transformed into an image of any desired contrast. 

The principle of phase contrast microscopy is explained by figure B1.18.6. The object is assumed to be a
linear phase grating. The diaphragm is annular, which means that only a small fraction of the diffracted light 
is covered by the Zernike diaphragm, as indicated in the figure for the first-order beams. In general, the 
Zernike diaphragm shifts the phase of the zeroth order by π/2 with respect to the diffracted beams. Since the
intensity of the diffracted beams is much lower than that of the direct beam, the intensity of the zeroth-order 
beam is usually attenuated by adding an absorbing film. Clearly, all these measures indicate that images of 
high contrast cannot be combined with high intensity using this technique; a compromise between both has to 
be found depending on the requirements. The image fidelity of phase contrast imaging depends, therefore, on 
the width and light absorption of the Zernike diaphragm, in addition to the size and optical path difference 
created by the object under study and, finally, on the magnification.

Figure B1.18.6. Schematic representation of Zernike's phase contrast method. The object is assumed to be a
relief grating in a transparent material of constant index of refraction. Phase and amplitude are varied by the 
Zernike diaphragm, such that an amplitude image is obtained whose contrast is, in principle, adjustable. 
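The effect of the Zernike diaphragm can be illustrated with a small numerical model. The sketch below treats a weak phase grating as a list of complex transmission values, takes the mean of the field as the zeroth-order beam and the remainder as the diffracted light, and then multiplies only the zeroth order by an attenuation factor and a phase shift. This is an idealized model with hypothetical function names and toy values; it ignores the finite width of a real diaphragm.

```python
import cmath
import math


def phase_contrast_image(phases, attenuation=1.0, shift=math.pi / 2):
    """Image intensity of a pure phase object after modifying the zeroth order.

    phases: optical phase (radians) at each sample point of the object.
    attenuation: amplitude factor applied to the zeroth-order beam.
    shift: phase shift (radians) applied to the zeroth-order beam.
    """
    field = [cmath.exp(1j * p) for p in phases]
    zeroth = sum(field) / len(field)  # undiffracted (mean) component
    plate = attenuation * cmath.exp(1j * shift)
    return [abs(plate * zeroth + (f - zeroth)) ** 2 for f in field]


def michelson_contrast(intensity):
    """(I_max - I_min) / (I_max + I_min) of an intensity profile."""
    return (max(intensity) - min(intensity)) / (max(intensity) + min(intensity))


grating = [0.0, 0.2] * 8  # weak phase grating, 0.2 rad steps (assumed)
for label, kwargs in (
    ("no diaphragm", {"shift": 0.0}),
    ("pi/2 shift", {}),
    ("pi/2 shift, amplitude x0.5", {"attenuation": 0.5}),
):
    image = phase_contrast_image(grating, **kwargs)
    mean = sum(image) / len(image)
    print(f"{label}: contrast = {michelson_contrast(image):.3f}, mean = {mean:.3f}")
```

In this model the unmodified image shows no contrast, the π/2 shift makes the grating visible, and attenuating the zeroth order raises the contrast further at the cost of mean intensity, which is the compromise between contrast and intensity noted in the text.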

At this point it is worth comparing the different techniques of contrast enhancements discussed so far. They 
represent spatial filtering techniques which mostly affect the zeroth order: dark field microscopy, which 
eliminates the zeroth order, the Schlieren method (not discussed here), which suppresses the zeroth order and 
one side band and, finally, phase contrast microscopy, where the phase of the zeroth order is shifted by π/2 and
its intensity is attenuated. 

B1.18.3.4 INTERFERENCE MICROSCOPY

As already discussed, transparent specimens are generally only weakly visible by their outlines and flat areas 
cannot be distinguished from the surroundings due to lack of contrast. In addition to the phase contrast 
techniques, light interference can be used to obtain contrast [8, 9].

Transparent, but optically birefringent, objects can be made visible in the polarizing microscope if the two
beams generated by the object traverse about the same path and are brought to interfere. In the case of 
optically isotropic bodies, the illuminating light beam has to be split into two beams: one that passes through 
the specimen and suffers phase shifts in the specimen which depend on the thickness and refractive index, and
a second beam that passes through a reference object on a separate path (see figure B1.18.7). By superposing
the two beams, phase objects appear in dark-bright contrast.

Figure B1.18.7. Principle for the realization of interference microscopy. The illuminating beam is split by
beamsplitter 1 before passing the object so that the reference beam is not affected by the object. The separated 
beams interfere behind beamsplitter 2. 
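How a phase shift in the specimen turns into brightness differences can be sketched with the standard two-beam interference relation, I = I1 + I2 + 2 sqrt(I1 I2) cos(Δφ), together with a phase shift Δφ = 2π (n_specimen - n_medium) d / λ for a specimen of thickness d. Both relations are textbook optics rather than formulas quoted from this section, and the function names and sample values are assumptions made for illustration.

```python
import math


def phase_shift(thickness_um, n_specimen, n_medium, wavelength_um):
    """Phase difference (radians) picked up in the specimen relative to the medium."""
    return 2.0 * math.pi * (n_specimen - n_medium) * thickness_um / wavelength_um


def two_beam_intensity(i_object, i_reference, delta_phi):
    """Intensity of two superposed coherent beams with phase difference delta_phi."""
    return i_object + i_reference + 2.0 * math.sqrt(i_object * i_reference) * math.cos(delta_phi)


# Assumed sample: 5 um thick, index 1.36, in a medium of index 1.33, at 550 nm.
delta = phase_shift(5.0, 1.36, 1.33, 0.55)
background = two_beam_intensity(0.5, 0.5, 0.0)
specimen = two_beam_intensity(0.5, 0.5, delta)
print(f"phase shift = {delta:.2f} rad")
print(f"background = {background:.2f}, specimen = {specimen:.2f}")
```

With equal beam intensities the background is bright while the specimen region is noticeably darker, so a transparent object that is almost invisible in bright field appears in dark-bright contrast.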

The differential interference contrast method utilizes the fact that, using a Wollaston prism, linearly polarized 
light can be split into two light beams of perpendicular polarization (figure B1.18.8). Since they are slightly
parallel shifted, the two beams pass through the object at positions having different thickness and/or refractive 
index. The splitting of the beams is chosen to be sufficiently small not to affect the resolution. They are 
brought together again by a second Wollaston prism, and pass through the analyser. Since the beams are 
parallel and their waves planar in the object plane, the beams in the image plane are also parallel and the 
waves planar. Hence, the interference of the beams does not give rise to interference lines but to contrast, 
whose intensity depends on the phase difference caused by small differences of the refractive index. The 
image appears as a relief contrast which can be modified by changing the phase difference: for example, by 
moving the Wollaston prism perpendicularly to the optical axis.

Figure B1.18.8. Differential interference contrast: the light beam is split into two beams by a Wollaston
prism. The two beams pass the object at closely spaced positions and give, after interference, a contrast due to 
the phase difference. 

The interpretation of such images requires some care, because the appearance of a relief structure may be 
misleading; it does not necessarily mean that the surface or thickness of the object is relief-like. Obviously, 
such a relief may also appear if samples are homogeneously thick, but composed of elements of different 
indices of refraction. Also, edges of the object may be missed if they are inappropriately oriented with respect 
to the polarization direction of the beams. 

Interference microscopy is also possible in reflection. The surface structure of highly reflecting objects such 
as metals or metallized samples is frequently investigated in this way. Using multiple-beam interference [10], 
surface elevations as small as a few nanometres in height or depth can be measured. This is due to the fact that 
the interference lines become very sharp if the monochromaticity of the light and the number of interfering 
beams are high.

B1.18.3.5 FLUORESCENCE MICROSCOPY


Fluorescence microscopy has been a very popular method of investigating biological specimens and obtaining 
contrast in otherwise transparent organic objects. The samples are stained with fluorescent dyes and 
illuminated with light capable of exciting the dye to fluoresce. The wavelength of the emitted light is
Stokes-shifted to wavelengths longer than that of the primary beam. Since the quantum efficiency (the ratio of the
numbers of emitted to exciting photons) of the dyes is often low and since the light is emitted in all directions,
the image is of low intensity. Nevertheless, this technique allows images of high contrast and of high
signal-to-noise ratio to be obtained. The principle of fluorescence microscopy is illustrated in figure B1.18.9 for the
epifluorescence microscope. The primary excitation does not in principle directly enter the detector and thus 
provides the desired contrast between stained and unstained areas, which appear completely dark. It is obvious 
from the very nature of the preparation technique that, in addition to morphological structures, chemical and 
physicochemical features of the sample can be revealed if the dyes adsorb only at special chemical sites. 

Figure B1.18.9. Epifluorescence microscope: the object is excited from the top and the fluorescent light is 
emitted in all directions, as indicated by the multitude of arrows in the object plane. The fluorescent light 
within the aperture of the objective gives rise to the image, showing that much of the fluorescent light is lost 
for imaging. 

B1.18.4 SCANNING MICROSCOPY 

In scanning microscopy, the object is successively scanned by a light spot, in contrast to conventional 
microscopy in which the entire object field is processed simultaneously. Thus, scanning represents a serial 
(and conventional microscopy a parallel) processing system. The requirements for the optical lenses are 
relaxed for the scanning microscope, because the whole field of view is no longer imaged at once, but the 
price that is paid is the need for reconstruction of the image from a set of data and the required precision for 
the scanning. 


A point light source is imaged onto the specimen by the objective and the transmitted light collected by the 
collector lens and detected by a broad-area detector; in the case of reflection microscopy, the objective lens 
also serves simultaneously as a collector (see figure B1.18.10). The resolution is solely determined by the 
objective lens, because the collector has no imaging function and only collects the transmitted light. The 
scanning is assumed in figure B1.18.10 to be based on the mechanical movement of the sample through the 
focal point of the objective. In this case, off-axis aberrations of the objective are avoided, the area to be 
imaged is not limited by the field of view of the objective and the image properties are identical and only 
determined by the specimen. The drawback of stage movement is the lower speed compared with beam 
scanning and the high mechanical precision required for the stage. Beam scanning allows the image to be 
reconstructed from the serially available light intensity data of the spots in real time [11, 12]. If a framestore is 
available, the image can be taken, stored, processed if desired and displayed. Processing of the electrical 
signal offers advantages. There is, for example, no need to increase contrast by stopping down the collector 
lens or by dark-field techniques which, in contrast to electronic processing, modify the resolution of the 
image. 
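
The serial nature of scanning can be illustrated with a minimal Python sketch. It is not taken from any instrument software: the function names, the 3 x 4 reflectivity map and the contrast-stretch limits are hypothetical choices made only for illustration. The detector readings arrive as one flat stream, the image is rebuilt purely from the known scan order, and contrast is adjusted electronically without touching the optics.

```python
def raster_scan(sample, detect):
    """Visit every (row, col) position in raster order and return the
    detector readings as one flat, serial list."""
    rows, cols = len(sample), len(sample[0])
    return [detect(sample[r][c]) for r in range(rows) for c in range(cols)]


def reconstruct(stream, rows, cols):
    """Rebuild the 2D image from the serial stream, relying only on the
    known scan order (row by row, left to right)."""
    if len(stream) != rows * cols:
        raise ValueError("stream length does not match scan grid")
    return [stream[r * cols:(r + 1) * cols] for r in range(rows)]


def stretch(value, lo=0.1, hi=0.9):
    """Electronic contrast stretch applied to the detector signal."""
    return (value - lo) / (hi - lo)


# Hypothetical 3 x 4 reflectivity map standing in for the specimen.
sample = [
    [0.1, 0.1, 0.9, 0.1],
    [0.1, 0.9, 0.9, 0.1],
    [0.1, 0.1, 0.9, 0.1],
]

stream = raster_scan(sample, detect=lambda x: x)   # ideal detector
image = reconstruct([stretch(v) for v in stream], rows=3, cols=4)
```

Because the stretch acts on the stored signal rather than on the aperture, the spot size that fixes the resolution is unchanged, which is the advantage of electronic processing noted above.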

Scanning gives many degrees of freedom to the design of the optical system and the confocal arrangement is 
one of the most prominent, having revolutionized the method of microscopic studies, in particular of 
biological material. Since confocal microscopy has in recent years proved to be of great importance, it is 
discussed in some detail here. 

Figure B1.18.10. Scanning microscope in reflection: the laser beam is focused on a spot on the object. The 
reflected light is collected and received by a broad-area sensor. By moving the stage, the object can be 
scanned point by point and the corresponding reflection data used to construct the image. Instead of moving 
the stage, the illuminating laser beam can be used for scanning. 


B1.18.5 CONFOCAL SCANNING MICROSCOPY 


B1.18.5.1 PRINCIPLE AND ADVANTAGES OF CONFOCAL MICROSCOPY 


The progress that has been achieved by confocal microscopy [13, 14, 15 and 16] is due to the rejection of 
object structures outside the focal point, rejection of scattered light and slightly improved resolution. These 
improvements are obtained by positioning pointlike diaphragms in optically conjugate positions (see figure 
B1.18.11). The rejection of structures outside the focal point allows an object to be optically sectioned: scanning 
yields not only images of the surface but also images of sections deep in a sample, so that three-dimensional 
microscopic images can be prepared as well as images of sections parallel to the optical axis. 
Therefore, internal structures in biological specimens can be made visible on a microscopic scale without 
major interference with the biological material by preparational procedures (fixation, dehydration etc) and 
without going through the painstaking procedure of mechanical sectioning. In addition, time-dependent 
studies of microscopic processes are possible. Obviously, there is a price to be paid: 
confocal microscopy requires serial data acquisition and processing and hence comprises a complete system 
whose cost exceeds that of a conventional microscope. 

Figure B1.18.11. Confocal scanning microscope in reflection: the pinhole in front of the detector is in a 
conjugate position to the illumination pinhole. This arrangement allows the object to be optically sectioned. 
The lens is used to focus the light beam onto the sample and onto the pinhole. Thus, the resulting point spread 
function is sharpened and the resolution increased. 

Figure B1.18.11 shows the basic arrangement of a confocal instrument. The important points are more easily 
presented for the reflection microscope, although everything also applies to transmission if modified 
appropriately. The broad-area detector is replaced by a point detector implemented by a pinhole placed in 
front of the detector at the conjugate position to the pinhole on the illumination side. This arrangement 
ensures that only light from the small illuminated volume is detected and light that stems from outside the 
focal point is strongly reduced in intensity. This is illustrated in figure B1.18.12 where a reflecting object is 
assumed to be below the focal plane: only a small fraction of the reflected light reaches the detector, since it is 
shielded by the pinhole. The intensity drops below detection threshold and no image can be formed. 

Figure B1.18.12. Illustration that only light reflected from object points in the focal plane contributes to the 
image. If the light is reflected from areas below the focal plane, only a small fraction can pass through the 
pinhole so that light from those areas does not contribute to the image. The pinhole in front of the detector is 
exaggerated in size for the sake of presentation. 

B1.18.5.2 OPTICAL SECTIONING, SMALLEST SLICE THICKNESS AND AXIAL RESOLUTION 

The fact that only points in the focal plane contribute to the image, whereas points above or below do not, 
allows optical sectioning. Thus, the object can be imaged layer by layer by moving its layers successively into the 
focal plane. For applications, it is important how thin a slice can be made by optical sectioning. A point-like 
object imaged by a microscope has a finite volume [13] which is sometimes called a voxel, in analogy to the pixel 
in two dimensions. Its extension in the axial direction determines the resolution in this direction (z) and the 
smallest thickness of a layer that can be obtained by optical sectioning. The intensity variation of the image of 
a point along the optical axis for the confocal arrangement is given by 

$$I(u) = \left[\frac{\sin(u/4)}{u/4}\right]^{4}$$

with u = (8π/λ) z sin²(α/2), where λ is the wavelength of the light and n sin α is the numerical aperture of the lens. 
The function I(u) is zero at u = π, 2π, .... If we take the spread of the function I(u) between u = ±π as the 
smallest slice thickness t, one obtains 

$$t = \frac{\lambda}{4\sin^{2}(\alpha/2)}.$$

The slice thickness is proportional to the wavelength of the light and a function of the aperture angle. For λ = 
0.5 µm, the slice thickness is about 0.25 µm for α = π/2. Obviously, the point spread function also serves to 
determine the smallest separation that two points in the axial direction may have in order to be resolved. If the 
Rayleigh criterion is applied (the intensity between the image points is half of the intensity at maximum), 
then the resolution is in the range of 0.15-0.2 µm. 
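
As a quick numerical check of the slice-thickness expression, the following Python sketch evaluates t = λ/(4 sin²(α/2)) for the example quoted in the text and for a smaller aperture angle. The function name and the second aperture angle (π/4) are illustrative assumptions, not values from the text.

```python
import math


def slice_thickness(wavelength, alpha):
    """Smallest optical slice thickness t = wavelength / (4 sin^2(alpha/2)).

    The result has the same length unit as `wavelength`; `alpha` is the
    aperture angle (as defined via the numerical aperture n sin alpha),
    given in radians.
    """
    return wavelength / (4.0 * math.sin(alpha / 2.0) ** 2)


# Example from the text: wavelength 0.5 micrometres, alpha = pi/2.
t_text_example = slice_thickness(0.5, math.pi / 2)

# Assumed comparison case: a smaller aperture angle gives a thicker slice.
t_smaller_aperture = slice_thickness(0.5, math.pi / 4)
```

The first call reproduces the 0.25 µm quoted above; the second shows how quickly the slice thickens as the aperture angle is reduced, which is why high-aperture objectives are preferred for optical sectioning.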

B1.18.5.3 RANGE OF DEPTH FOR OPTICAL SECTIONING 

The greatest depth at which a specimen can be optically sectioned is also of interest. This depth is limited by 
the working distance of the objective, which is usually smaller for objectives with greater numerical aperture. 
However, the depth imposed by the working distance of the objective is rarely reached, since other 
mechanisms provide constraints as well. These are light scattering and partial absorption of the exciting and 
emitted light, in the case of fluorescence microscopy. The exciting beam is partially absorbed by fluorophores 
until it reaches the focused volume. Hence, less light is emitted from a focused volume that is deep in the bulk 
of a sample. Thus, the intensity of the light reaching the detector decreases with increasing depth, so that for 
image formation laser power and/or integration time would have to be increased. Though technically possible, 
both cannot be increased beyond thresholds at which the samples, especially biological materials, are 
damaged. 

B1.18.5.4 LATERAL RESOLUTION 

The extension of the voxel in a radial direction gives information on the lateral resolution. Since the lateral 
resolution has so far not been discussed in terms of the point spread function for the conventional microscope, 
it will be dealt with here for both conventional and confocal arrangements [13]. The radial intensity 
distribution in the focal plane (perpendicular to the optical axis) in the case of a conventional microscope is 
given by 

$$I_{\mathrm{conv}}(v) = \left[\frac{2J_{1}(v)}{v}\right]^{2}$$

with v = (2π/λ) r n sin α, where λ is the wavelength of the light, r is the radial coordinate, n sin α is the numerical 
aperture and J₁(v) is the first-order Bessel function of the first kind. Zero intensity is at v = 1.22π, 2.23π, 
3.24π, .... 

For the confocal arrangement in transmission, the objective and the collector are used for imaging; in 
reflection the objective is used twice. Therefore, the radial intensity distribution in the image is the square of 
that of the conventional microscope: 

$$I_{\mathrm{conf}}(v) = \left[\frac{2J_{1}(v)}{v}\right]^{4}$$

I_conf(v) has the same zero points as the distribution for the conventional microscope. However, in the confocal 
case the function is sharpened and the sidelobes are suppressed. The light intensity distributions for the 
conventional and the confocal case are depicted in figure B1.18.13. If the Rayleigh criterion for the definition of 
resolution is applied, one finds that the lateral resolution in the confocal case is improved in comparison with 
conventional microscopy: obviously, the sharpened 
function in the confocal case allows two closely spaced points at smaller separation to be distinguished. 
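
The sharpening can be made concrete with a short Python sketch. It evaluates the conventional distribution [2J₁(v)/v]² and its square, the confocal distribution, and locates the value of v at which each falls to half of its maximum. The half-width at half maximum is used here only as a simple stand-in for a resolution criterion; it is an assumption of this sketch, not the Rayleigh criterion used in the text. The Bessel function is computed from its standard integral representation so that only the standard library is needed.

```python
import math


def bessel_j1(v, steps=2000):
    """J1(v) from (1/pi) * integral_0^pi cos(tau - v sin tau) dtau,
    evaluated with the midpoint rule."""
    h = math.pi / steps
    total = sum(math.cos(t - v * math.sin(t))
                for t in (h * (k + 0.5) for k in range(steps)))
    return total * h / math.pi


def psf_conventional(v):
    """Radial intensity [2 J1(v) / v]^2, normalized to 1 at v = 0."""
    return 1.0 if v == 0 else (2.0 * bessel_j1(v) / v) ** 2


def psf_confocal(v):
    """Confocal intensity: the square of the conventional distribution."""
    return psf_conventional(v) ** 2


def half_width(psf, lo=0.0, hi=3.0, level=0.5):
    """Bisect for the v at which the PSF first falls to `level`.
    Both distributions decrease monotonically on [0, 3]."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if psf(mid) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


hw_conv = half_width(psf_conventional)
hw_conf = half_width(psf_confocal)
```

With these definitions the half-width falls from v ≈ 1.62 in the conventional case to v ≈ 1.16 in the confocal case, a narrowing of roughly 28%, while the zeros of the two distributions coincide.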

Figure B1.18.13. The point spread function in the confocal arrangement (full square) is sharpened in 
comparison with the conventional arrangement (full diamond). Therefore, the resolution is improved. 

B1.18.5.5 CONTRAST ENHANCEMENT AND PRACTICAL LIMITS TO CONFOCAL ONE-PHOTON-EXCITATION 
FLUORESCENCE MICROSCOPY 

The methods to improve contrast described for conventional microscopy can also be applied to confocal 
microscopy [17]. However, because the images are obtained by scanning and data processing, the tools of 
image manipulation are also advantageously utilized to improve image quality. Nevertheless, all these 
methods have their limitations, as will be explained in the following example. Biological studies are often 
made with fluorescence. Usually the fluorophore is excited by one photon from the ground state to the excited 
state; the ratio of the number of photons emitted by the fluorophore to the number of exciting photons is, as a 
rule, significantly below one. Therefore, the number of photons collected per voxel is low, depending on the 
density of fluorophores, on the exciting light intensity and on the scan rate. The density of fluorophores is, in 
general, determined by the requirements of the experiment and cannot be significantly varied. In order to 
increase the signal, the scan rate would have to be lowered, and the number of scans and the exciting light 
intensity increased. However, extended exposure of the dyes leads to bleaching in the whole cone of 
illumination and hence to a reduction in the number of layers that can be sectioned. The number of layers is 
reduced even further if, in addition to bleaching the fluorophore, the excitation produces toxic products that 
modify or destroy the properties of living cells or tissues. Lasers could, in principle, supply higher light 
intensities, but saturation of emission of 
the dyes additionally limits the applicable power. In these circumstances, real improvement can only be 
achieved if the voxel at the focal point alone could be excited by the incident light. 


B1.18.5.6 CONFOCAL MICROSCOPY WITH MULTIPHOTON-EXCITATION FLUORESCENCE 


Usually a fluorophore is excited from its ground to its first excited state by a photon of an energy which 
corresponds to the energy difference between the two states. Photons of smaller energy are generally not 
absorbed. However, if their energy amounts to one-half or one-third (etc) of the energy difference, a small 
probability for simultaneous absorption of two or three (etc) photons exists since the energy condition for 
absorption is fulfilled. However, due to this small probability, the photon density has to be sufficiently high if 
two-, three- or n-photon absorption is to be observed [18]. In general, these densities can only be achieved by 
lasers of the corresponding power and with appropriate pulse width, since absorption by multiphoton 
processes increases with the nth power of the photon density (figure B1.18.14). 

Figure B1.18.14. Schematic representation of the increase of absorption with photon density for two- and 
three-photon absorption. 

One-photon excitation has limitations due to the unwanted out-of-focus fluorophore absorption and bleaching, 
and light scattering. These drawbacks can be circumvented if multiphoton excitation of the fluorophore is 
used. Since multiphoton absorption increases with the nth power of the photon density, significant absorption 
of the exciting light will only occur at the focal point of the objective where the required high photon density 
for absorption is reached. Consequently, fluorescent light will only be emitted from a volume element whose 
size is determined by the intensity and 
power law dependence of the exciting radiation. Although a confocal arrangement is not needed, it was shown 
that, in the confocal arrangement, the effective point spread function is less extended and, hence, the 
resolution improved [19, 20]. 
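
The confinement of multiphoton excitation to the focus can be sketched numerically. The Python example below assumes an ideal Gaussian beam, which is a modelling choice made here and not something stated in the text, and it ignores absorption losses and scattering. For such a beam the excitation integrated over a transverse plane is independent of depth for one-photon absorption, but falls off with distance from the focus for two- and three-photon absorption. All function and parameter names are hypothetical.

```python
import math


def beam_radius_sq(z, w0, z_r):
    """Squared 1/e^2 radius of a Gaussian beam at axial distance z from focus."""
    return w0 ** 2 * (1.0 + (z / z_r) ** 2)


def plane_excitation(z, n, w0=1.0, z_r=1.0, power=1.0):
    """Integral of I(r, z)**n over the transverse plane at z (n-photon process).

    For I = (2P / (pi w^2)) * exp(-2 r^2 / w^2) the integral evaluates to
    (2P / (pi w^2))**n * pi * w^2 / (2n).
    """
    w_sq = beam_radius_sq(z, w0, z_r)
    peak = 2.0 * power / (math.pi * w_sq)
    return peak ** n * math.pi * w_sq / (2.0 * n)


def relative_excitation(z, n):
    """Plane-integrated excitation at z relative to the focal plane (z = 0)."""
    return plane_excitation(z, n) / plane_excitation(0.0, n)


# Three Rayleigh lengths away from the focus (z is in units of z_r here).
one_photon = relative_excitation(3.0, 1)
two_photon = relative_excitation(3.0, 2)
three_photon = relative_excitation(3.0, 3)
```

Under these assumptions a plane three Rayleigh lengths from the focus is excited as strongly as the focal plane for one-photon absorption, but only one-tenth as strongly for two-photon and one-hundredth as strongly for three-photon absorption, which is why out-of-focus bleaching is suppressed.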

Thus, multiphoton excitation eliminates unwanted out-of-focus excitation, unnecessary phototoxicity and 
bleaching. However, efficient power sources are required and, since the efficiency of multiphoton excitation is 
usually low, the times needed to generate images are increased. 


B1.18.5.7 THE FUTURE: RESOLUTION BEYOND THE DIFFRACTION LIMIT IN CONFOCAL FLUORESCENCE 
MICROSCOPY? 

As light microscopy has many advantages over other microscopic techniques, the desire is to overcome the 
limit due to the extension of the point spread function or to reduce the emitting volume by multiphoton 
excitation. One proposal was made on the basis of fluorescence microscopy [21]. As discussed, in 
fluorescence microscopy, molecules are excited to emit light which is then used to form the microscopic 
image. If the exciting light is imaged onto a small volume of the sample, light emitted from this volume 
determines the spatial resolution (i.e. both in depth and the lateral direction). If the light-emitting volume can 
be reduced, resolution will be improved. This is achievable, in principle, by stimulated emission: if the 
stimulated emission rate is higher than the fluorescence decay and slower than the decay rate of intrastate 
vibrational relaxation, the emitting volume in the focal region shrinks. Estimates predict a resolution of 
0.01-0.02 µm for continuous illumination of 1 mW and picosecond excitations of 10 MW cm⁻² at a rate of 200 
kHz. If this idea can be reduced to practice, the diffraction limit would be overcome. The high resolution 
combined with the advantages of light microscopy over other microscopic methods would indeed represent a 
major breakthrough in this field. 

B1.19 Scanning probe microscopies 

Nicholas D Spencer and Suzanne P Jarvis 


B1.19.1 INTRODUCTION 

The development of the scanning tunnelling microscope (STM) [1] was a revelation to the scientific 
community, enabling surface atomic features to be imaged in air with remarkably simple apparatus. The STM 
earned Binnig and Rohrer the Nobel prize for physics in 1986, and set the stage for a series of scanning probe 
microscopies (SPMs) based on a host of different physical principles, many of the techniques displaying 
nanometre resolution or better. 

The methods have in turn launched the new fields of nanoscience and nanotechnology, in which the 
manipulation and characterization of nanometre-scale structures play a crucial role. STM and related methods 
have also been applied with considerable success in established areas, such as tribology [2], catalysis [3], cell 
biology [4] and protein chemistry [4], extending our knowledge of these fields into the nanometre world; they 
have, in addition, become a mainstay of surface analytical laboratories, in the worlds of both academia and 
industry. 

Central to all SPMs (or 'local probe methods', or 'local proximal probes' as they are sometimes called) is the 
presence of a tip or sensor, typically of less than 100 nm radius, that is rastered in close proximity to — or in 
'contact' with — the sample's surface. This set-up enables a particular physical property to be measured and 
imaged over the scanned area. Crucial to the development of this family of techniques were both the ready 
availability of piezoelements, with which the probe can be rastered with subnanometre precision, and the 
highly developed computers and stable electronics of the 1980s, without which the operation of SPMs as we 
know them would not have been possible. 

A number of excellent books have been written on SPMs in general. These include the collections edited by 
Wiesendanger and Güntherodt [5] and Bonnell [6] as well as the monographs by Wiesendanger [7], DiNardo 
[8] and Colton [9]. 


B1.19.2 SCANNING TUNNELLING MICROSCOPY 


B1.19.2.1 PRINCIPLES AND INSTRUMENTATION 


Tunnelling is a phenomenon that involves particles moving from one state to another through an energy 
barrier. It occurs as a consequence of the quantum mechanical nature of particles such as electrons and has no 
explanation in classical physical terms. Tunnelling has been experimentally observed in many physical 
systems, including both semiconductors [10] and superconductors [11]. 

In STM, a sharp metal tip [12] is brought within less than a nanometre of a conducting sample surface, using a 
piezoelectric drive (figure B1.19.1). At these separations, there is overlap of the tip and sample wavefunctions 
at the gap, resulting in a tunnelling current of the order of nanoamps when a bias voltage (±10 ^-4 V) is applied to 
the tip [13]. The electrons flow from the occupied states of the tip to the unoccupied states in the sample, or 
vice versa, depending on the sign of the tip bias. The current is exponentially dependent on the tip-sample 
distance [14], 


$$I = C\rho_{t}\rho_{s}\,e^{-s\kappa^{1/2}} \qquad \text{(B1.19.1)}$$


where s is the sample-tip separation, κ is a parameter related to the barrier between the sample and the tip, ρ_t 
is the electron density of the tip, ρ_s is the electron density of the sample and C, a constant, is a linear function 
of voltage. The exponential dependence on distance has several very important consequences. Firstly, it 
enables the local tip-sample spacing to be controlled very precisely (<10 A) by means of a feedback loop 
connected to the z-piezo, using the tunnelling current as a control parameter. Secondly, it means that despite 
the fact that the tip may be many tens of nanometres in radius, the effective radius (through which most of 
the tunnelling takes place) is of atomic dimensions, yielding subnanometre spatial resolution. This tip may be 
rastered over an area that can range from hundredths of square nanometres to hundreds of square microns, and 
the surface topography (or, more specifically, the spatial distribution of particular electronic states) may 
thereby be imaged. Imaging (which may be done in air, in vacuum, or even under liquids) may be achieved 
either by monitoring the tunnelling current, in order to maintain a constant tip-sample separation, and 
displaying the z-voltages as a function of x and y position, or by simply rastering the tip above the surface at a 
constant height, and plotting the tunnelling current on the z-axis. The former is known as constant-current 
mode, the latter as constant-height mode (figure B1.19.2). While constant-current mode is more stable for 
relatively rough surfaces, it is also somewhat slower than constant-height mode, because of its reliance on the 
feedback system, which sets a limit on the maximum scan speed. 
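
Both consequences of the exponential distance dependence can be illustrated with a small Python sketch. It does not use the parameterization of equation B1.19.1. Instead it assumes the standard one-dimensional barrier estimate I ∝ exp(-2κs) with κ = (2mφ)^(1/2)/ħ, a barrier height of 4.5 eV, a 0.6 nm set-point gap and a simple proportional feedback law acting on the logarithm of the current. All of these are illustrative assumptions, as are the function names and the 0.25 nm step used as a test surface.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
M_E = 9.1093837015e-31      # electron mass, kg
EV = 1.602176634e-19        # one electronvolt in joules


def decay_constant(barrier_ev):
    """kappa = sqrt(2 m phi) / hbar in 1/m for a barrier height phi in eV."""
    return math.sqrt(2.0 * M_E * barrier_ev * EV) / HBAR


def current(gap_m, kappa, i0=1.0):
    """Tunnelling current in arbitrary units, I = i0 * exp(-2 kappa s)."""
    return i0 * math.exp(-2.0 * kappa * gap_m)


kappa = decay_constant(4.5)                     # assumed 4.5 eV barrier
# Factor by which the current rises when the gap closes by 0.1 nm.
ratio_per_angstrom = current(5e-10, kappa) / current(6e-10, kappa)


def constant_current_trace(surface, kappa, gap_set, gain=0.5, settle=20):
    """Return the tip heights recorded while a feedback loop holds the
    current at the value that corresponds to `gap_set` over each point."""
    log_i_set = math.log(current(gap_set, kappa))
    z = surface[0] + gap_set
    trace = []
    for h in surface:
        for _ in range(settle):
            error = math.log(current(z - h, kappa)) - log_i_set
            z += gain * error / (2.0 * kappa)   # too much current: retract
        trace.append(z)
    return trace


surface = [0.0, 0.0, 2.5e-10, 2.5e-10, 0.0]     # hypothetical 0.25 nm step
trace = constant_current_trace(surface, kappa, gap_set=6e-10)
```

In this model the current changes by almost an order of magnitude for a 0.1 nm change in gap, so a modest current tolerance translates into very tight control of the spacing. The recorded trace follows the step at a constant 0.6 nm offset, but only after the loop has settled at each point, which is the origin of the speed penalty of constant-current mode.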

Figure B1.19.1. Principle of operation of a scanning tunnelling microscope. The x- and y-piezodrives scan the 
tip across the surface. In one possible mode of operation, the current from the tip is fed into a feedback loop 
that controls the voltage to the z-piezo, to maintain constant current. The line labelled z-displacement shows 
the tip reacting both to morphological and chemical (i.e. electronic) inhomogeneities. (Taken from [213].) 

Figure B1.19.2. The two modes of operation for scanning tunnelling microscopes: (a) constant current and (b) 
constant height. (Taken from [214], figure 1.) 

The image obtained in an STM experiment is conventionally displayed on the computer screen as grey scales 
or false colour, with the lightest shades corresponding to peaks (or highest currents) and darkest shades 
corresponding to valleys (or lowest currents). With such graphic methods of data display, it is particularly 
tempting to interpret atomic-scale STM images as high-resolution topographs. However, it must be 
remembered that only electrons near the Fermi energy contribute to the tunnelling current, whereas all 
electrons contribute to the surface charge density. Since topography can reasonably be defined as a contour of 
constant surface charge density [15], STM images are intrinsically different from surface topographs. 


In addition to its strong dependence on tip-sample separation, the tunnelling current is also dependent on the 
electron density of states (DOS) of both tip and sample (equation B1.19.1). This dependence can be exploited 
to produce a map of the local DOS under the tip by varying the applied voltage and measuring the tunnelling 
current. Both occupied and unoccupied electronic states can be probed by this method, which is known as 
scanning tunnelling spectroscopy (STS) [16]. The traditional method of mapping DOS is to use ultraviolet 
photoelectron spectroscopy (UPS) to measure occupied states and inverse photoemission spectroscopy (IPS) 
to measure empty states. However, it is important to remember that these data do not correspond exactly to 
those derived from STS measurements. Firstly, the STS spectrum is a convolution of tip and sample 
properties (a potential problem, should the tip become contaminated during the experiment). Secondly, since 
states near the upper edge of the energy range investigated see a lower barrier than those near the lower end, 
they contribute a greater tunnelling current, so that sensitivity to occupied states falls off with increasing 
energy below the Fermi level. Thirdly, STS is a much more surface- (and above-surface-) sensitive technique 
than UPS or IPS, meaning that surface electronic states contribute far more to the STS spectrum. This also 
means that the sensitivity to s, p, and d states is different in STS, due to the different degrees to which the 
electron density associated with these states extends out of the surface. 
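
The voltage sweep that underlies STS can be sketched in a few lines of Python. The example differentiates a measured I-V curve numerically and checks whether the conductance vanishes around zero bias, which is one simple way of distinguishing a gapped from a metal-like spectrum. The I-V data, the threshold and the function names are hypothetical, and the sketch deliberately ignores the tip convolution and barrier effects discussed above.

```python
def differential_conductance(voltages, currents):
    """Central-difference estimate of dI/dV at the interior sample points."""
    return [
        (currents[k + 1] - currents[k - 1]) / (voltages[k + 1] - voltages[k - 1])
        for k in range(1, len(voltages) - 1)
    ]


def has_gap(voltages, currents, threshold):
    """True if |dI/dV| at the interior point nearest zero bias is below
    `threshold`, i.e. no states are available for tunnelling there."""
    didv = differential_conductance(voltages, currents)
    interior = voltages[1:-1]
    k0 = min(range(len(interior)), key=lambda k: abs(interior[k]))
    return abs(didv[k0]) < threshold


voltages = [-1.0 + 0.25 * k for k in range(9)]   # bias sweep, -1.0 to +1.0 V

# Hypothetical I-V data in arbitrary current units: an ohmic, metal-like
# curve and a curve with no current for |V| <= 0.5 V, mimicking a band gap.
metallic = [2.0 * v for v in voltages]
gapped = [0.0 if abs(v) <= 0.5 else 2.0 * (v - 0.5 * (1 if v > 0 else -1))
          for v in voltages]

metallic_has_gap = has_gap(voltages, metallic, threshold=0.1)
gapped_has_gap = has_gap(voltages, gapped, threshold=0.1)
```

Real spectra are noisier and are usually smoothed or normalized before interpretation, so the fixed threshold used here should be read as a placeholder rather than a recommended criterion.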


B1.19.2.2 APPLICATIONS OF SCANNING TUNNELLING MICROSCOPY 


(A) SEMICONDUCTORS 


STM found one of its earliest applications as a tool for probing the atomic-level structure of semiconductors. 
In 1983, the 7 × 7 reconstructed surface of Si(111) was observed for the first time [17] in real space; all 
previous observations had been carried out using diffraction methods, the 7 × 7 structure having, in fact, only 
been hypothesized. By capitalizing on the spectroscopic capabilities of the technique it was also proven [18] 
that STM could be used to probe the electronic structure of this surface (figure B1.19.3). 

Figure B1.19.3. STM images of Si(111)-(7 × 7) measured with (a) -2 V and (b) +2 V applied to the sample. 
(Taken from [18], figure 1.) 


A complete STS spectrum of the 7 × 7 reconstructed Si(111) surface displays remarkable correlation with the 
corresponding UPS and IPS spectra [19] (figure B1.19.4), showing the potential value of this approach. The 
high spatial resolution of the STS technique has also been demonstrated using a silicon surface containing 
impurity atoms [20] (figure B1.19.5), where the presence or absence of a band gap over an individual atom 
shows whether it belongs, respectively, to the silicon or to a metallic impurity. 

Figure B1.19.4. (a) Local conductance STS measurements at specific points within the Si(111)-(7 × 7) unit 
cell (symbols) and averaged over the whole cell. (b) Equivalent data obtained by ultraviolet photoelectron 
spectroscopy (UPS) and inverse photoemission spectroscopy (IPS). (Taken from [19], figure 2.) 

Figure B1.19.5. Tunnelling I-V curves acquired across a defect on Si(100). Away from the defect a bandgap 
can be seen. Over the defect itself, the bandgap disappears, suggesting that it possesses metallic character. 
This analysis was performed with spatial resolution of better than 1 nm. (Taken from [20], figure 2 and figure 
6.) 

Chemical reactions of ammonia with the silicon surface have also been clearly observed using STS [21], 
where the disappearance of the π and π* states characteristic of the clean surface coincides with the formation 
of Si-H antibonding states corresponding to the dissociation of the ammonia on the Si surface. 

Other semiconductors have also proved to be a fruitful ground for STM investigation. Zheng et al [22] have 
used the spatial resolution and electronic state sensitivity of STM to spatially display the electronic 
characteristics of single Zn impurity atoms in Zn-doped GaAs, both in filled and empty states, which show 
spherical and triangular symmetry, respectively. Upon imaging a number of Zn-induced features, a variety of 
different heights were recorded, corresponding to the depth of the impurity atoms within the sample. Thus 
STM was used to probe both the chemical nature and the 3D spatial location of the impurity atoms — an 
achievement that would have been inconceivable before the advent of STM. 

STM has not as yet proved to be easily applicable to the area of ultrafast surface phenomena. Nevertheless, 
some success has been achieved in the direct observation of dynamic processes with a larger timescale. 
Kitamura et al [23], using a high-temperature STM to scan single lines repeatedly and to display the results as 
a time-versus-position pseudoimage, were able to follow the diffusion of atomic-scale vacancies on a heated 
Si(001) surface in real time. They were able to show that vacancy diffusion proceeds exclusively in one 
dimension, along the dimer row. 


(B) METALS 

STM has been applied with great success to the study of metals and adsorbate-metal systems [24]. This has 
naturally brought the technique into the mainstream of surface science, where structural information at the 
atomic level could previously only be obtained via diffraction methods such as low-energy electron 
diffraction (LEED) [25]. The STM can also provide a level of electronic information and visualization of the 
quantum mechanical behaviour of electrons that is unavailable from other methods: the images of copper and 
silver surfaces obtained by the groups of Eigler [26] and Avouris [27], showing standing waves produced by 
the defect-induced scattering of the 2D electron gas in surface states, bear eloquent testament to this (figure 
B1.19.6). 



Figure B1.19.6. Constant current 50 nm × 50 nm image of a Cu(111) surface held at 4 K. Three monatomic 
steps and numerous point defects are visible. Spatial oscillations (electronic standing waves) with a 
periodicity of ~1.5 nm are evident. (Taken from [26], figure 1.) 

Surface reconstructions have been observed by STM in many systems, and the technique has, indeed, been 
used to confirm the 'missing row' structure in the 1 × 2 reconstruction of Au(110) [28]. As the temperature 
was increased to within 10 K of the transition to the disordered 1 × 1 phase (700 K), a drastic reduction in 
domain size to ~20-40 Å (i.e. less than the coherence width of LEED) was observed. In this way, the STM 
has been used to help explain and extend many observations previously made by diffraction methods. 

STM studies of simple adsorbates on metal surfaces have proved challenging, partly due to the significant 
mobility of most small species on metals at room temperature, which therefore generally necessitates low- 
temperature operation. Additionally, since adsorbates can change the local density of states in the metal 
surface, particular care must be taken not to interpret STM images of adsorbate-metal systems as simple 
topographs, but rather to capitalize on the technique's capability for observing unoccupied and occupied 
energy states. In this way, the bond between adsorbate and substrate can be investigated on a local level, 
subject to the restrictions on energy range mentioned above. By observing the electronic changes in the 
neighbourhood of an adsorption site, much can be learned about the range over which chemical bonds can act 
and influence each other. 


Metal surfaces in motion have also been characterized by STM, one of the clearest examples being the surface 
diffusion of gold atoms on Au(111) [29] (figure B1.19.7). Surface diffusion of adsorbates on metals can be 
followed [30] provided that appropriate cooling systems are available, and STM has been successfully 
employed to follow the 2D dendritic growth of metals on metal surfaces [31]. 

Figure B1.19.7. A series of time-lapse STM topographic images at room temperature showing a 40 nm × 40 
nm area of Au(111). The time per frame is 8 min, and each took about 5 min to scan. The steps shown are one 
atomic unit in height. The second frame shows craters left after tip-sample contact, which are two and three 
atoms deep. During a 2 h period the small craters have filled completely with diffusing atoms, while the large 
craters continue to fill. (Taken from [29], figure 1.) 

(C) ORGANIC SURFACES 


The operation of the STM depends on the conduction of electrons between tip and sample. This means, of
course, that insulating samples are, in general, not accessible to STM investigations. Nevertheless, a large
body of work [32] dealing with STM characterization of thin organic films on conducting substrates is now in
the literature, and the technique provides local structural and electronic information that is essentially
inaccessible by any other method.

STM of thin organic layers involves the tunnelling of current between the tip and the conducting substrate
underneath the organic layer. By choosing the tunnelling parameters appropriately [32] (≈0.3-1 V, 0.05-1 nA
for adsorbate; 0.1-0.3 V, 0.3-10 nA for substrate), the method can be used to image either the substrate or the
adsorbate, or both simultaneously if a suitable voltage programme is used that repeats each scanned line at
both voltages. There is some evidence that the tip can damage the organic layer during the imaging process
[33]. The precise mechanism by which insulating molecules are imaged remains a topic of much discussion.
Although single organic molecules have been successfully imaged by STM [34], the majority of STM studies
of organic species have concerned a single monolayer of molecules deposited by evaporation, by
self-assembly or by Langmuir-Blodgett techniques [35]. Often these images corroborate what had already been
deduced from painstaking LEED investigations: an example is the imaging of co-adsorbed arrays of benzene
and mobile CO, as seen by Ohtani et al [36].


One class of large molecules that was investigated relatively early was liquid crystals [37, 38], and in
particular the 4-n-alkyl-4'-cyanobiphenyl (mCB) group. These molecules form a highly crystalline surface
adlayer, and STM images clearly show the characteristic shape of the molecule (figure B1.19.8).

Figure B1.19.8. (a) STM image (5.7 nm × 5.7 nm) of 10-alkylcyanobiphenyl on graphite; (b) model showing
the packing of the molecules. The shaded and unshaded segments represent the alkyl tails and the 
cyanobiphenyl head groups, respectively. (Taken from [38], figure 2.) 


The self-assembly of alkanethiols on gold has been an important topic in surface chemistry over the last few
years [39] and STM has contributed significantly to our understanding of these systems. In particular, the
formation of etch pits on the surface of Au(111) following treatment with alkanethiols is a phenomenon that
was first observed by STM [40]. The segregation of thiols of different molecular weight or functionality is
proving to be a relevant issue in their application. Stranick et al [41] have used STM to show the segregation
of thiols with only very slight molecular differences into domains of size 10-100 Å and their subsequent
coalescence.

The STM study of biological macromolecules has also been an area of great activity, and the imaging of DNA
has been one of the challenges of the STM technique [42] (figure B1.19.9). The elimination of artifacts has
been a major issue in this story, and the work of Beebe et al [43], showing that 'DNA-like' structures could
be seen on the surface of clean graphite (HOPG) substrates, was something of a milestone (figure B1.19.10).

Figure B1.19.9. Plasmid DNA (pUC18) on mica imaged by STM at high resolution. The inset is a cut-out of
a zoomed-in image taken immediately after the overview. (Taken from [42], figure 2.)



Figure B1.19.10. These images illustrate graphite (HOPG) features that closely resemble biological
molecules. The surface features not only appear to possess periodicity (A), but also seem to meander across
the HOPG steps (B). The average periodicity was 5.3 ± 1.2 nm. Both images measure 150 nm × 150 nm.
(Taken from [43], figure 4.)

Other biomolecules imaged have included all DNA bases [44], polysaccharides [45] and proteins [46, 47]. In
many cases there is strong evidence that the imaging process is facilitated by the presence of ultrathin
(conducting) water films on the surface of the sample [48, 49, 50].

Lastly, STM has also been applied to the molecular-level imaging of polymer structures. In some cases these
materials were deposited by Langmuir-Blodgett techniques [51], and in some cases by in situ polymerization
[52]. Fujiwara et al [51] have used molecular dynamics simulations to interpret the images obtained from
STM experiments, and showed that the individual polyimide strands observed were aligned parallel to the
deposition direction of the Langmuir-Blodgett film. The combined use of these two techniques is proving to
be a very powerful tool for understanding the conformation of polymer films on surfaces.

(D) ELECTROCHEMISTRY 

The molecular-level observation of electrochemical processes is another unique application of STM [53, 54].
There are a number of experimental difficulties involved in performing electrochemistry with an STM tip and
substrate, although many of these have been essentially overcome in the last few years.

If the scanning tip is to be involved in electrochemical reactions, it is important to remember that at
micrometre separations (i.e. when the tip is too far from the substrate for tunnelling to occur), the faradaic
current is given by the equation [54] $I_f \approx 4nFD_0C_0r$, where $D_0$ is the diffusion coefficient of a
particular species, $F$ is Faraday's constant, $C_0$ is the concentration of the species in solution, $r$ is the
radius of a disc of area equal to the effective exposed area of the tip and $n$ is the number of electrons
involved in the reaction. The total tip current, $I$, when the separation is small enough for tunnelling to
occur, is given by $I = I_f + I_t$, where $I_t$ is the tunnelling current, which is virtually independent of the
total tip area exposed. In order to minimize $I_f$, so as to be able to perform meaningful STM experiments,
the exposed tip must be made as small as possible, and a plethora of techniques has been developed [53] for
insulating all but the very end of the tip.
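
To give a feel for why tip insulation matters, the following minimal sketch evaluates the faradaic current expression quoted above for several exposed tip radii. The diffusion coefficient, concentration and radii are illustrative assumptions, not values taken from the text or from reference [54].

```python
# Sketch: faradaic current at an exposed disc-like tip, I_f ~ 4 n F D_0 C_0 r.
# All numerical inputs below are assumed, order-of-magnitude values.

FARADAY = 96485.33212  # Faraday's constant, C mol^-1


def faradaic_current(n, diff_coeff, conc, radius):
    """Return I_f in amperes.

    n          -- number of electrons transferred
    diff_coeff -- D_0 in m^2 s^-1
    conc       -- C_0 in mol m^-3 (1 mol m^-3 = 1 mM)
    radius     -- effective disc radius r in m
    """
    return 4.0 * n * FARADAY * diff_coeff * conc * radius


if __name__ == "__main__":
    d0 = 1e-9   # assumed diffusion coefficient, m^2 s^-1
    c0 = 1.0    # assumed concentration, mol m^-3 (1 mM)
    for r in (10e-6, 1e-6, 10e-9):
        i_f = faradaic_current(1, d0, c0, r)
        print(f"r = {r:.0e} m  ->  I_f = {i_f:.2e} A")
```

Under these assumptions an exposed radius of 1 µm already gives a faradaic current of a few tenths of a nanoampere, which is comparable to typical tunnelling currents, whereas the current falls linearly as the exposed radius is reduced.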

Several designs for STM electrochemical cells have appeared in the literature [55]. In addition to an airtight 
liquid cell and the tip insulation mentioned above, other desirable features include the incorporation of a 
reference electrode (e.g. Ag/AgCl in saturated KCl) and a bipotentiostat arrangement, which allows the
independent control of the two working electrodes (i.e. tip and substrate) [56] (figure B1.19.11).

Figure B1.19.11. Schematic presentation of a potentiostatic STM system, with individual potential control of
substrate and tip. Piezoelectric single-tube scanner Sc with titanium spacer plate S; electrochemical cell EC
consisting of plexiglass beaker B with Pt counterelectrode C and Ag/AgCl reference electrode Re in 0.1 M
NaCl; plexiglass lid L; PTFE support unit SU with epoxy-sealed substrate, mounted on support plate SP.
Low-noise potentiostat P with low-impedance voltage units $U_S$ and $U_T$, both equipped with low-pass filter
and signal generator SG; precision resistor R for measuring ($i_S + i_T$); low-noise current amplifier CA for
measuring $i_T$. (Taken from [56], figure 1.)

Examining electrodes and how they change under conditions of electrochemical reaction has been a major 
part of the electrochemical STM work performed until now. Many studies have revealed changes in surface 
reconstructions on silver and gold electrodes during electrochemical reactions [57], as well as increasing or 
decreasing surface roughness, depending on the conditions and electrolyte employed. Another field of activity 
has been the monitoring of metal deposition on electrodes [58], which is, of course, of tremendous practical 
importance. Since STM can image both periodic and non-periodic structures, it is of great utility both in
determining the geometric relationships between deposited metal and substrate and in assessing the role
of steps and defects in the deposition process [57].

Corrosion is another economically significant process that can be investigated on a molecular level, thanks to 
electrochemical STM. In addition to a number of academically interesting studies of systems such as the 
selective dissolution of copper from Cu-Au alloys [59], STM has also been used to investigate the properties 
of iron and steel under a variety of conditions designed to induce either passivation, corrosion, or 
electrochemical anodization [60, 61]. In the case of corrosion, STM has been used to monitor the growth of
magnetite crystallites on the surface of the sample as it is taken through several successive cyclic
voltammograms [61].

The technique of scanning electrochemical microscopy (SECM) [62] uses the same apparatus as
electrochemical STM, but instead of measuring tunnelling currents, the reaction O + ne⁻ → R (where O and R
are oxidized and reduced species, respectively) is followed by measuring the faradaic current, $I_f$, at distances
further from the substrate than those at which tunnelling will readily occur. The current $I_f$ at distances far
from the substrate surface corresponds to the hemispherical diffusion of O to the tip surface (figure
B1.19.12). As the tip nears the surface, this current is perturbed, either by hindered diffusion (lower current)
or by reoxidation of R on the surface (higher current). The conductivity, potential and electrochemical
activity of the surface will therefore all influence $I_f$, which can thus be used to produce an electrochemical
image of the surface, if plotted as a function of x and y as the tip is rastered over the surface. The technique
has been used to image metals, polymers, biological materials and semiconductors.



Figure B1.19.12. Basic principles of SECM. (a) With ultramicroelectrode (UME) far from substrate,
diffusion leads to a steady-state current, $i_{T,\infty}$. (b) UME near an insulating substrate. Hindered diffusion
leads to $i_T < i_{T,\infty}$. (c) UME near a conductive substrate. Positive feedback leads to $i_T > i_{T,\infty}$.
(Taken from [62], figure 2.)


(E) CATALYSIS 

It has long been the goal of many catalytic scientists to be able to study catalysts on a molecular level under
reaction conditions. Since the vast majority of catalytic reactions take place at elevated temperatures, the use
of STM for such in situ catalyst investigations was predicated upon the development of a suitable STM
reaction cell with a heating stage. This has now been done [3] by McIntyre et al, whose cell-equipped STM
can image at temperatures up to 150 °C and in pressures ranging from ultrahigh vacuum up to several
atmospheres. The set-up has been used for a number of interesting studies. In one mode of operation [63]
(figure B1.19.13(a)), a Pt-Rh tip was first used to image clusters of carbonaceous species formed on a clean
Pt(111) surface by heating a propylene adlayer to 550 K, and later to catalyze
the rehydrogenation of the species (in a propylene/hydrogen atmosphere) at room temperature. The catalytic
activity of the tip was induced by applying a voltage pulse, which presumably cleaned the surface of
deactivating debris (figure B1.19.13(b)).

Figure B1.19.13. (a) Three STM images of a Pt(111) surface covered with hydrocarbon species generated by
exposure to propene. Images taken in constant-height mode. (A) After adsorption at room temperature. The
propylidyne (≡C-CH₂-CH₃) species that formed was too mobile on the surface to be visible. The surface
looks similar to the clean surface. Terraces (~10 nm wide) and monatomic steps are the only visible
features. (B) After heating the adsorbed propylidyne to 550 K, clusters form by polymerization of the
$\mathrm{C}_x\mathrm{H}_y$ fragments. The clusters are of approximately round shape with a diameter equal to the
terrace width. They form rows covering the entire image in the direction of the step edges. (C) Rows of
clusters formed after heating to 700 K. At this higher temperature, the carbonaceous clusters are more compact
and slightly smaller in size, as they evolve to the graphitic form when H is lost completely. (b) The catalytic
action of the STM Pt-Rh tip on a surface covered by carbonaceous clusters, as in figure B1.19.13(a)(B).
Imaging was performed in 1 bar of a propene (10%) and hydrogen (90%) mixture at room temperature.
(A) Carbon clusters were imaged in the top third of the image while the
tip was inactive. A voltage pulse of 0.9 V was applied to the position marked P, leaving a mound of material
1.5 nm high. This process produced a chemically active Pt-Rh tip, which catalyzed the removal of all clusters
in the remaining two-thirds of the image. Only the lines corresponding to the steps are visible. This image was
illuminated from a near-incident angle to enhance the transition region where the tip was switched to its active
state. (B) While the tip was in this catalytically active state, another area was imaged, and all of the clusters
were again removed. (C) A slightly larger image of the area shown in (B) (centre square of this image),
obtained after the tip was deactivated, presumably by contamination. The active-tip lifetime was of the order
of minutes. (Taken from [63], figure 1 and figure 2.)


(F) STM AS A SURFACE MODIFICATION METHOD 

Within a few years of the development of STM as an imaging tool, it became clear that the instrument could
also find application in the manipulation of individual atoms or groups of atoms on a surface [64]. Perhaps the
most dramatic image originated from Eigler and Schweizer [65], who manipulated single physisorbed atoms of
xenon on a Ni(110) surface, held at liquid helium temperature (figure B1.19.14). The tip-Xe distance was
reduced (by raising the setpoint for the tunnelling current) until the tip-sample interaction became strong
enough for the tip to be able to pick up the atom. After being moved to the desired location, the atom was
released from the tip by reversing the procedure. Using a similar experimental set-up, Crommie et al [66] have
managed to shape the spatial distribution of electrons on an atomic scale, by building a ring of 48 iron adatoms (a
'quantum corral') on a Cu(111) surface, which confines the surface-state electrons of the copper by virtue of
the scattering effect of the Fe atoms (figure B1.19.15). STS measurements of the local densities of states for
the confined electrons correspond to the expected values for a 'particle-in-a-box', where the box is round and
two-dimensional. In a similar way, Yokoyama et al [67] formed a pair of long straight chains of Al on the
Si(001)-c(4 × 2) surface to create well defined 1D quantum wells. The electrons in the π* surface states can
propagate only in the dimer-row direction of Si(001)-c(4 × 2) because of nearly flat dispersion in the
perpendicular direction. The STM/STS measurements of the standing-wave patterns and their discrete energy
levels could be interpreted according to the '1D particle-in-a-box' model. This technique shows considerable
promise for the further investigation of confined electrons and waveguides. There are numerous other means
for moving atoms on surfaces, including voltage-pulsing techniques, which show promise as potential
lithographic methods for silicon [68].

Figure B1.19.14. A sequence of STM images taken during the construction of a patterned array of xenon
atoms on a Ni(110) surface. Grey scale is assigned according to the slope of the surface. The atomic structure
of the nickel surface is not resolved. Each letter is 5 nm from top to bottom. (Taken from [65], figure 1.)

Figure B1.19.15. Spatial image of the eigenstates of a quantum corral. 48-atom Fe ring constructed on a
Cu(111) surface. Average diameter of ring is 14.3 nm. The ring encloses a defect-free region of the surface.
(Taken from [66], figure 2.)
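
The '1D particle-in-a-box' picture invoked for the confined surface-state electrons can be made concrete with a short calculation. The sketch below evaluates the textbook levels E_n = n²π²ħ²/(2m*L²) for an infinitely deep well; the well length and the effective-mass ratio are hypothetical inputs chosen only for illustration, not values reported in [66] or [67].

```python
# Sketch: energy levels of an ideal 1D infinite square well,
# E_n = (n * pi * hbar / L)**2 / (2 * m_eff).
# Well length and effective mass are assumed, illustrative inputs.
import math

HBAR = 1.054571817e-34          # reduced Planck constant, J s
ELECTRON_MASS = 9.1093837015e-31  # kg
ELECTRON_VOLT = 1.602176634e-19   # J


def box_levels_ev(length_m, n_max, eff_mass_ratio=1.0):
    """Return [E_1, ..., E_n_max] in eV for a well of width length_m."""
    mass = eff_mass_ratio * ELECTRON_MASS
    return [
        (n * math.pi * HBAR / length_m) ** 2 / (2.0 * mass) / ELECTRON_VOLT
        for n in range(1, n_max + 1)
    ]


if __name__ == "__main__":
    # Hypothetical 5 nm well with the free-electron mass.
    for n, energy in enumerate(box_levels_ev(5e-9, 4), start=1):
        print(f"n = {n}: E = {energy * 1000:.1f} meV")
```

The n² spacing of the levels is the signature that would be compared with the discrete energies seen in STS; a real analysis would also have to account for the finite barrier height and the actual band dispersion.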

Finally, a technique that combines chemical vapour deposition (CVD) with STM has been devised by Kent et 
al [69]. The CVD gas used was iron pentacarbonyl, which is known to decompose under electron 
bombardment. Decomposition between tip and sample was found to occur at bias voltages above 5 V, forming 
iron clusters as small as 10 nm in diameter on the Si(111) substrate. Of particular practical interest is that
arrays of 20 nm diameter dots have been shown to be magnetic, presenting a whole new range of possibilities 
for high-density data storage, as well as providing a convenient laboratory for nanometre-scale experiments in 
quantum magnetism. 


B1.19.3 FORCE MICROSCOPY 

B1.19.3.1 PRINCIPLES

(A) BACKGROUND 

A major limitation of the scanning tunnelling microscope is its inability to analyse insulators, unless they are 
present as ultrathin films on conducting substrates. Soon after the development of the STM, work started on 
the development of an equivalent nanoscale microscope based on force instead of current as its imaging 
parameter [70]. Such an instrument would be equally adept at analysing both conducting and insulating 
samples. Moreover, the instrument already existed on a micro- and macro-scale as the stylus profilometer 
[71]; this is typically used to measure surface roughness in one dimension, although it had been extended into 
a three-dimensional imaging technique, with moderate resolution (0.1 µm lateral and 1 nm vertical), by
Teague et al [72].

The concept that Binnig and co-workers [73] developed, which they named the atomic force microscope
(AFM, also known as the scanning force microscope, SFM), involved mounting a stylus on the end of a
cantilever with a spring constant, k, which was lower than typical spring constants between atoms. The
sample surface was then rastered below the tip, using a piezo system similar to that developed for the STM,
and the position of the tip monitored [74]. The sample position (z-axis) was altered in an analogous way to
STM, so as to maintain a constant displacement of the tip, and the z-piezo signal was displayed as a function
of x and y coordinates (figure B1.19.16). The result is a force map, or image of the sample's surface [75],
since displacements of the tip can be related to force by Hooke's law, $F = -kz$, where z is the cantilever
displacement. In AFM, the displacement of the cantilever by the sample is very simply considered to be the
result of long-range van der Waals forces and Born repulsion between tip and sample. However, in most
practical implementations, meniscus forces and contaminants often dominate the interaction, with interaction
lengths frequently exceeding those predicted [76]. In addition, an entire family of force microscopies has been
developed, where magnetic, electrostatic and other forces have been measured using essentially the same
instrument.


Figure B1.19.16. Schematic view of the force sensor for an AFM. The essential features are a tip, shown as a
rounded cone, a spring, and some device to measure the deflection of the spring. (Taken from [74], figure 6.)

(B) AFM INSTRUMENTATION 

The first AFM used a diamond stylus, or 'tip', attached to a gold-foil cantilever, and much thought was given
to the choice of an appropriate k value [73]. While on the one hand a soft spring was necessary (in order to
obtain the maximum deflection for a given force), it was desirable to have a spring with a high resonant
frequency (10-100 kHz) in order to avoid sensitivity to ambient noise. The resonant frequency, $f_0$, is given by
the equation

$$f_0 = \frac{1}{2\pi}\sqrt{\frac{k}{m_0}}$$

where $m_0$ is the effective mass loading the spring. Thus, as k is reduced to soften the spring, the mass of the
cantilever must be reduced in order to keep the $k/m_0$ ratio as large as possible. Nowadays, cantilevers and
integrated tips are routinely microfabricated out of silicon or silicon nitride. Typical dimensions of a
cantilever are of the order of 1 × 10 × 100 µm³ [77], the exact dimensions depending on the intended use.
Cantilevers designed to operate in contact with the surface, in a similar way to a surface profilometer, have
low spring constants (usually less than 1 N m⁻¹) and correspondingly low resonant frequencies. Such levers are
often fabricated in a V-shape configuration (figure B1.19.17), which makes for a greater stability towards
lateral motion. If the tip is to come into hard contact with the surface, high-aspect-ratio tips are often
desirable, with, typically, a radius of curvature of 10-30 nm. It is interesting to note that since experiments
are generally carried out with contact forces on the order of nanonewtons, contact pressures in these
experiments can be in the gigapascal range. In a stylus profilometer, the force exerted on the sample is some
five orders of magnitude greater, but it is exerted over a larger contact area, leading to pressures in the tens
of megapascals.
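
The two estimates in this paragraph, the resonant frequency $f_0 = (1/2\pi)\sqrt{k/m_0}$ and the mean contact pressure, are easy to check numerically. The sketch below is only an order-of-magnitude illustration: the effective mass and the contact radii are assumed values, and the pressure is taken simply as force divided by the area of a circular contact.

```python
# Sketch: cantilever resonance and mean contact pressure.
# Effective mass and contact radii are assumed, illustrative values.
import math


def resonant_frequency(k, m_eff):
    """f_0 = sqrt(k / m_eff) / (2 pi), with k in N/m and m_eff in kg."""
    return math.sqrt(k / m_eff) / (2.0 * math.pi)


def mean_contact_pressure(force, contact_radius):
    """Force (N) divided by the area of a circular contact (m^2), in Pa."""
    return force / (math.pi * contact_radius ** 2)


if __name__ == "__main__":
    # Soft contact-mode lever: k = 0.1 N/m, assumed effective mass 1e-11 kg.
    print(f"f_0 ~ {resonant_frequency(0.1, 1e-11) / 1e3:.1f} kHz")

    # AFM: 1 nN over an assumed 0.5 nm contact radius.
    print(f"AFM pressure ~ {mean_contact_pressure(1e-9, 0.5e-9) / 1e9:.1f} GPa")

    # Profilometer: force five orders larger over an assumed 1 um radius.
    print(f"Stylus pressure ~ {mean_contact_pressure(1e-4, 1e-6) / 1e6:.0f} MPa")
```

With these assumed inputs the lever resonates in the tens of kilohertz and the two pressures come out near 1 GPa and a few tens of megapascals, consistent with the ranges quoted above.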



Figure B1.19.17. Commercially produced, microfabricated, V-shaped Si₃N₄ cantilever and tip for AFM.
(Taken from [215].)

When the lever is intended for use with the tip separated from the surface, the lever stiffness is usually greater
than 10 N m⁻¹ with a high resonant frequency. In this case, more care is taken to prepare tips with small radii
of curvature, sometimes as low as 2 nm. However, in reality, most experiments are performed with tips of
unknown radii or surface composition, apart from rare cases where the AFM has been combined with field ion
microscopy [78] or a molecule or nanotube of known dimensions and composition has been attached to the tip
[79]. It is likely that in most AFMs, microasperities and contaminants mediate the contact.

Detection of cantilever displacement is another important issue in force microscope design. The first AFM
instrument used an STM to monitor the movement of the cantilever, an extremely sensitive method. STM
detection suffers from the disadvantage, however, that tip or cantilever contamination can affect the
instrument's sensitivity, and that the topography of the cantilever may be incorporated into the data. The most
common methods in use today are optical, and are based either on the deflection of a laser beam [80], which
has been bounced off the rear of the cantilever onto a position-sensitive detector (figure B1.19.18), or on an
interferometric principle [81].

Figure B1.19.18. Schematic of an atomic force microscope showing the optical lever principle.

Lateral resolution in AFM is usually better than 10 nm and, by utilizing dynamic measurement techniques in 
ultrahigh vacuum, true atomic resolution can be obtained [82]. In hard contact with the surface, the atomic- 
scale structure may still appear to be present, but atomic-scale defects will no longer be visible, suggesting 
that the image is actually averaged over several unit cells. The precise way in which this happens is still the 
subject of debate, although the ease with which atomic periodicity can be observed with layered materials is 
probably due to the Moiré effect suggested by Pethica [83]. In this case a periodic image is formed by the
sliding of planes directly under the tip caused by the lateral tip motion as the force varies in registry with unit 
lattice shear. 

A further issue that should be considered when interpreting AFM images is that they are convolutions of the
tip shape with the surface (figure B1.19.19). This effect becomes critical with samples containing 'hidden'
morphology (or 'dead zones') on the one hand (such as deep holes into which the tip does not fit, or the
underside of spherical features), or structure that is comparable in size to that of the tip on the other. While the
hidden morphology cannot be regenerated, there have been several attempts to deconvolute tip shape and
dimensions from AFM images ('morphological restoration') [84]. Some of these methods involve determining
tip parameters by imaging a known sample, such as monodisperse nanospheres [85] or faceted surfaces [86].
Another approach is to analyse the AFM image as a whole, extracting a 'worst case' tip shape from common
morphological features that appear in the image [87].

Figure B1.19.19. Examples of inaccessible features in AFM imaging. L corresponds to the AFM tip. The
dotted curves show the image that is recorded in the case of (a) depressions on the underside of an object and
(b) mounds on the top surface of an object. M • L and MoL correspond to convolutions of the surface features
with the tip shape. (Taken from [85], figure 2.)
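
One common way to model this tip 'convolution' is as a grey-scale morphological dilation: at each lateral position the apex descends until some part of the tip touches the surface. The following one-dimensional sketch uses that model with an assumed parabolic tip on a pixel grid; it is an illustration of the idea, not the specific algorithm of references [84-87].

```python
# Sketch: 1D tip-surface "convolution" modelled as grey-scale dilation.
# The parabolic tip profile and the test surfaces are assumed examples.


def image_profile(surface, tip):
    """Return the apex height recorded at each pixel.

    surface -- list of surface heights, one per pixel
    tip     -- odd-length list giving the height of the tip surface above
               its apex, centred on the apex (so the middle entry is 0)
    """
    half = len(tip) // 2
    n = len(surface)
    image = []
    for i in range(n):
        best = float("-inf")
        for j, tip_height in enumerate(tip):
            x = i + j - half  # surface pixel under this part of the tip
            if 0 <= x < n:
                # Apex height at which this part of the tip touches pixel x.
                best = max(best, surface[x] - tip_height)
        image.append(best)
    return image


if __name__ == "__main__":
    tip = [0.5 * u * u for u in range(-3, 4)]  # assumed parabolic tip

    spike = [0] * 11
    spike[5] = 5       # feature much sharper than the tip
    print(image_profile(spike, tip))  # spike is broadened into the tip shape

    pit = [0] * 11
    pit[5] = -5        # hole too narrow for the tip to enter
    print(image_profile(pit, tip))    # pit depth is strongly underestimated
```

A sharp spike is recorded as an inverted copy of the tip, which is why imaging a known sharp or spherical feature can be used to characterize the tip, while a narrow pit is a 'dead zone' whose true depth cannot be recovered from the image.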


As with STM, the AFM can be operated in air, in vacuum or under liquids, provided a suitable cell is
available. Liquid cells (figure B1.19.20) are particularly useful for the examination of biological samples.

Figure B1.19.20. Cross section of an AFM fluid cell. (Taken from [216], figure 1.)

Analogously to STM, the image obtained in a force microscopy experiment is conventionally displayed on the 
computer screen as grey scales or false colour, with the lightest shades corresponding to peaks (or highest 
forces) and darkest shades corresponding to valleys (or lowest forces). 

(C) FORCES IN AFM 

Although imaging with force microscopy is usually achieved by means of rastering the sample in close
proximity to the tip, much can be learned by switching off the x- and y-scanning piezos and following the
deflection of the cantilever as a function of sample displacement, from large separations down to contact with
the surface, and then back out to large separations. The deflection-displacement diagram is commonly known
as a 'force curve' and this technique as
'force spectroscopy', although strictly speaking it is displacement that is actually measured, and further
processing [88], reliant on an accurate spring constant, is necessary in order to convert the data into a true
force-distance curve. In order to obtain sensitive force curves, it is preferable to make dynamic force-gradient
measurements, with the force and energy being found by integration [89], or alternatively to measure the
shift of the cantilever resonant frequency as a function of displacement. This method has become particularly
popular in combination with non-contact mode AFM (see below). Although with this method it is not
straightforward to relate frequency shifts to forces, it appears to be a promising technique for distinguishing
between materials on the nanometre scale [90].
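
The 'further processing' needed to turn a deflection-displacement record into a force-distance curve can be sketched as follows. This is a minimal illustration under an assumed sign convention (piezo position measured as the tip-sample gap that would exist with an undeflected lever, deflection positive away from the sample); it is not the procedure of reference [88], and real data also require calibration of the deflection sensitivity and of the spring constant.

```python
# Sketch: converting a deflection-displacement record into force-separation.
# Sign conventions and the example data are assumptions for illustration.


def force_curve(piezo_positions, deflections, k):
    """Return a list of (separation, force) pairs.

    piezo_positions -- gap (m) the tip would have with an undeflected lever
    deflections     -- cantilever deflection (m), positive away from sample
    k               -- cantilever spring constant (N/m)
    """
    curve = []
    for z, d in zip(piezo_positions, deflections):
        separation = z + d   # actual tip-sample gap
        force = k * d        # Hooke's law: positive = repulsive load
        curve.append((separation, force))
    return curve


if __name__ == "__main__":
    k = 0.1  # assumed spring constant, N/m
    # Far away, just touching, and pressed 2 nm into hard contact.
    z = [10e-9, 0.0, -2e-9]
    d = [0.0, 0.0, 2e-9]
    for sep, f in force_curve(z, d, k):
        print(f"D = {sep * 1e9:5.1f} nm   F = {f * 1e9:5.2f} nN")
```

In hard contact the lever deflection tracks the piezo motion one-for-one, so the computed separation stays at zero while the load rises linearly, which is the behaviour used in practice to calibrate the deflection signal.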


As the tip is brought towards the surface, there are several forces acting on it. Firstly, there is the spring force
due to the cantilever, $F_s$, which is given by $F_s = -kz$. Secondly, there are the sample forces, which, in the case
of AFM, may comprise any number of interactions including (generally attractive) van der Waals forces,
chemical bonding interactions, meniscus forces or Born ('hard-sphere') repulsion forces. The total force
gradient as the tip approaches the sample is the convolution of spring and sample force gradients, or

$$\frac{\partial F}{\partial D} = \frac{k\,(\partial^2 U/\partial D^2)}{k + \partial^2 U/\partial D^2}$$

where $U$ is the sample potential and $D$ the tip-sample separation. If the spring constant of the cantilever is
comparable to the gradient of the tip-surface interaction, then at some point where the magnitude of
$\partial^2 U/\partial D^2$ (negative for attraction) equals $k$, the total force gradient becomes instantaneously
infinite, and the tip jumps towards the
sample [91] (figure B1.19.21). This 'jump to contact' is analogous to the jump observed when two attracting
magnets are brought together. The kinetic energy involved is often sufficient to damage the tip and sample, 
thus reducing the maximum possible resolution during subsequent imaging. Once a jump to contact occurs, 
the tip and sample move together (neglecting sample deformation for the time being) until the direction of 
sample travel is reversed. The behaviour is almost always hysteretic, in that the tip remains in contact with the 
sample due to adhesion forces, springing back to the equilibrium position when these have been exceeded by 
the spring force of the cantilever. The adhesion forces add to the total force exerted on the sample, and are 
often caused by tip contamination. It has been found that pretreating the tip in ozone and UV light, in order to 
remove organic contamination, reduces the adhesion observed, and improves image quality [92]. Image
quality can also be enhanced by tailoring the imaging medium (i.e. in a liquid cell) to have a dielectric 
constant intermediate between those of the tip and the sample. This leads to a small, repulsive van der Waals 
force, which eliminates the jump to contact, and has been shown to improve resolution in a number of cases 
[93, 94], probably due to the fact that the tip is not damaged during the approach. It should be noted that the
jump to contact may also be eliminated if a stiff cantilever is chosen, such that k, the force constant of the
cantilever, is greater than the magnitude of $\partial^2 U/\partial D^2$ at all separations. This condition for
jump-to-contact is insufficient if the stiffness of the cantilever is artificially enhanced using feedback [95]
or if dynamic measurements are made [96, 92].
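
The instability criterion can be illustrated with the simplest continuum model, a sphere of radius R above a flat surface interacting through the non-retarded van der Waals potential U = -AR/(6D), for which the magnitude of the second derivative is AR/(3D³). Setting this equal to k gives the separation at which the jump occurs. The Hamaker constant and tip radius below are assumed, typical-order values, and the continuum expression is only a rough guide at separations of a few ångströms.

```python
# Sketch: jump-to-contact separation for a sphere-plane van der Waals model,
# U(D) = -A R / (6 D)  =>  |d2U/dD2| = A R / (3 D**3).
# Hamaker constant and tip radius are assumed values.


def vdw_force_gradient(hamaker, radius, separation):
    """Magnitude of d2U/dD2 (N/m) for the sphere-plane potential."""
    return hamaker * radius / (3.0 * separation ** 3)


def jump_to_contact_separation(hamaker, radius, k):
    """Separation (m) at which |d2U/dD2| equals the spring constant k."""
    return (hamaker * radius / (3.0 * k)) ** (1.0 / 3.0)


if __name__ == "__main__":
    hamaker = 1e-19   # assumed Hamaker constant, J
    radius = 20e-9    # assumed tip radius, m
    for k in (0.01, 0.1, 1.0, 10.0):
        d_jump = jump_to_contact_separation(hamaker, radius, k)
        print(f"k = {k:5.2f} N/m  ->  jump at D ~ {d_jump * 1e9:.2f} nm")
```

In this model the jump distance scales as k^(-1/3), so a stiffer lever pushes the instability to smaller separations; once the predicted distance falls below an interatomic spacing, the jump is effectively suppressed, in line with the remark about stiff cantilevers above.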

Since the AFM is commonly used under ambient conditions, it must be borne in mind that the sample is likely 
to be covered with multilayers of condensed water. Consequently, as the tip approaches the surface, a 
meniscus forms between tip and surface, introducing an additional attractive capillary force. Depending on the 
tip radius, the magnitude of this force can be equal to or greater than that of the van der Waals forces and is 
observed clearly in the approach curve [98]. In fact, this effect has been exploited for the characterization of 
thin liquid lubricant films on surfaces [95]. The capillary forces may be eliminated by operation in ultrahigh
vacuum, provided both tip and sample are baked, or, most simply, by carrying out the experiment under a
contamination-free liquid environment, using a liquid cell [99].

Figure B1.19.21. A plot of cantilever displacement as a function of tip-sample separation during approach and
retraction with an AFM. Note the adhesive forces upon retraction from the surface.

(D) NON-CONTACT AFM 

Non-contact AFM (NC-AFM) imaging is now a well established true-atomic-resolution technique which can 
image a range of metals, semiconductors and insulators. Recently, progress has also been made towards
high-resolution imaging of other materials such as C₆₀, DNA and polypropylene. A good overview of recent
progress is the proceedings from the First International Conference on NC-AFM [100].

Most NC-AFMs use a frequency modulation (FM) technique where the cantilever is mounted on a piezo and
serves as the resonant element in an oscillator circuit [101, 102]. The frequency of the oscillator output is
instantaneously modulated by variations in the force gradient acting between the cantilever tip and the sample.
This technique typically employs oscillation amplitudes in excess of 20 nm peak to peak. Associated with this
technique, two different imaging methods are currently in use: namely, fixed excitation and fixed amplitude.
In the former, the excitation amplitude to the lever (via the piezo) is kept constant; thus, if the lever
experiences damping close to the surface, the actual oscillation amplitude falls. The latter involves
compensating the excitation amplitude to keep the oscillation amplitude of the lever constant. This mode also
readily provides a measure of the dissipation during the measurement [100].

Although both methods have produced true-atomic-resolution images it has been very problematic to extract 
quantitative information regarding the tip-surface interaction as the tip is expected to move through the whole 
interaction potential during a small fraction of each oscillation cycle. For the same reason, it has been difficult 
to conclusively identify the imaging mechanism or the minimum tip-sample spacing at the turning point of 
the oscillation. 

Many groups are now trying to fit frequency shift curves in order to understand the imaging mechanism, 
calculate the minimum tip-sample separation and obtain some chemical sensitivity (quantitative information 
on the tip-sample interaction). The most common methods appear to be perturbation theory for considering 
the lever dynamics [ 103 ], and quantum mechanical simulations to characterize the tip-surface interactions 
[104]. Results indicate that the interaction curve measured as a function of frequency shift does not
correspond directly to the force gradient as first believed.
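For orientation, the simple proportionality between frequency shift and force gradient can be sketched as follows. This is the standard small-amplitude limit, Δf = -(f₀/2k) dF/dz, and it is an assumption here that this is the relation the text refers to as "first believed"; it is not expected to hold at the large (>20 nm) amplitudes described above. The function name and numbers are hypothetical.

```python
def small_amplitude_frequency_shift(f0_hz, k_n_per_m, force_gradient_n_per_m):
    """Frequency shift in the small-amplitude limit: df = -(f0 / (2 k)) * dF/dz.

    force_gradient_n_per_m is dF/dz, with F the tip-sample force and z the
    tip-sample separation. For an attractive force that weakens with distance,
    dF/dz is positive and the resonance shifts downwards.
    """
    return -f0_hz * force_gradient_n_per_m / (2.0 * k_n_per_m)


if __name__ == "__main__":
    # Illustrative (assumed) lever: f0 = 300 kHz, k = 40 N/m, dF/dz = 1 N/m.
    print(small_amplitude_frequency_shift(300e3, 40.0, 1.0))
```

At large amplitudes the gradient varies strongly over each cycle, so the measured shift is a weighted average over the tip trajectory rather than this local value.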

(E) INTERMITTENT CONTACT AFM 

A further variation is intermittent contact mode or 'TappingMode' [105] (TappingMode is a trademark of
Digital Instruments, Santa Barbara, CA), where the tip is oscillated with large amplitudes (20-100 nm) near
its resonant frequency, and the amplitude is used to control the feedback loop. The system is adjusted so that
the tip contacts the sample once within each vibrational cycle. Since the force on the sample in this mode is
both small (<5 nN) and essentially normal to the surface, it is far less destructive than contact AFM, with its
inherently large shear forces. This is of great importance when imaging biological materials; a further
development of the intermittent-mode AFM, which allows it to be operated under liquids [106], extends the
possibilities in this area even further.

(F) MAGNETIC FORCE MICROSCOPY 

Magnetic forces may be exploited for the imaging of samples containing magnetic structure. Resolutions as
high as 10 nm have been reported [107]. The central modification of the AFM needed to perform magnetic
force microscopy (MFM) is the use of a magnetic tip, which often consists of an electrochemically etched
ferromagnetic material, or a non-magnetic tip that has been coated with a magnetic thin film [108]. The
experiment is run in non-contact mode, with the tip some 10 nm away from the surface. Detection at long
range helps distinguish between magnetic and non-magnetic interactions. Greater sensitivity is obtained when
the cantilever is oscillated and magnetic force gradients detected by changes in the resonant frequency as the
tip approaches the magnetic surface. The method is unique in its ability to image magnetic structure in
surfaces (figure B1.19.22) [109], which lies at the heart of magnetic data storage technology.



Figure B1.19.22. Magnetic force microscopy image of an 8 μm wide track on a magnetic disk. The bit
transitions are spaced every 2 μm along the track. Arrows point to the edges of the DC-erased region. (Taken
from [109], figure 7.)




(G) LATERAL FORCE MICROSCOPY 

Lateral force microscopy (LFM) has provided a new tool for the investigation of tribological (friction and
wear) phenomena on a nanometre scale [110]. Alternatively known as friction force microscopy (FFM), this
variant of AFM focuses on the lateral forces experienced by the tip as it traverses the sample surface, which
correspond to the local coefficients of dynamic friction. LFM can therefore provide a frictional map of the
surface with sub-nanometre resolution. It therefore has the potential to reveal chemical differences between
regions of similar morphology, virtually down to the atomic scale.

The LFM method is an inherently contact-mode technique, and can be performed with an AFM, provided that
there is some means of measuring the lateral tip displacement. Mate et al [111] were the first to modify their
AFM in order to detect lateral forces and to observe frictional behaviour on the atomic scale. Their detection
system was interferometric, and their cantilever and tip consisted of a shaped tungsten wire. Later
developments using the laser beam-deflection method [112, 113] with two sets of position-sensing detectors
(figure B1.19.23) enabled both lateral and normal forces to be measured simultaneously. Clearly, these two
sets of forces are not entirely independent, since a lateral force will be felt as the tip is scanned over a step, for
example, irrespective of the frictional coefficient at the step. However, by measuring the lateral force as the
tip is scanned in both directions along the same line (producing a 'friction loop', figure B1.19.24), and
subtracting one trace from the other, the frictional information can be separated from the purely
morphological.
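The trace subtraction described above can be sketched in a few lines. The sketch assumes that both traces are indexed by the same sample positions (the backward trace already reversed to match the forward one) and that the frictional contribution changes sign with scan direction while the morphological contribution does not; the function name and the sample numbers are hypothetical.

```python
def split_friction_loop(forward, backward):
    """Separate a friction loop into frictional and morphological parts.

    forward, backward: lateral-force signals recorded along the same line in
    opposite scan directions, both ordered by sample position.
    Returns (friction, morphology) as two lists of the same length.
    """
    if len(forward) != len(backward):
        raise ValueError("traces must cover the same sample positions")
    # Friction inverts with scan direction, so it survives in the half-difference.
    friction = [(f - b) / 2.0 for f, b in zip(forward, backward)]
    # Slope-induced (morphological) signal does not invert, so it survives in the half-sum.
    morphology = [(f + b) / 2.0 for f, b in zip(forward, backward)]
    return friction, morphology


if __name__ == "__main__":
    # Made-up traces: a step at the third point, higher friction at the fourth.
    fwd = [1.0, 1.0, 3.0, 2.0]
    bwd = [-1.0, -1.0, 1.0, -2.0]
    friction, morphology = split_friction_loop(fwd, bwd)
    print(friction)
    print(morphology)
```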

Figure B1.19.23. Principle of simultaneous measurement of normal and lateral (torsional) forces. The
intensity difference of the upper and lower segments of the photodiode is proportional to the z-bending of the
cantilever. The intensity difference between the right and left segments is proportional to the torsion, t, of the
force sensor. (Taken from [110], figure 2.)




[Figure: panels labelled 'topography' and 'difference in friction (3-2)'.]


Figure B1.19.24. Friction loop and topography on a heterogeneous stepped surface. Terraces (2) and (3) are
composed of different materials. In regions (1) and (4), the cantilever sticks to the sample surface because of
static friction F_ST. The sliding friction differs between part (2) and part (3). In a torsional force image, the
contrast difference is caused by the relative sliding friction, ΔF_SL, i.e. the difference between the
sliding-friction values on the two parts. Morphological effects may be
distinguished from frictional ones by their non-inverted behaviour upon scanning in the opposite direction.
(Adapted from [110], figure 2.)




(H) MECHANICAL IMAGING WITH FORCE MICROSCOPY 

In AFM, the relative approach of sample and tip is normally stopped after 'contact' is reached. However, the 
instrument may also be used as a nanoindenter, measuring the penetration depth of the tip as it is pressed into 
the surface of the material under test. Information such as the elastic modulus at a given point on the surface 
may be obtained in this way [ 114 ], although producing enough points to synthesize an elastic modulus image 
is very time consuming. 


Pulsed-force mode AFM (PFM-AFM) is a method introduced for fast mapping of local stiffness and adhesion
with lower required data storage than recording force-distance curves at each point on the x-y plane [115]. A
sinusoidal or triangular modulation is applied between the tip and sample (either via lever or sample piezo) at
a frequency lower than the resonance frequency of either the piezo or the cantilever. Tip and sample then come
into contact for part of each oscillation cycle. The deflection signal of the cantilever is fed into
sample-and-hold circuits. The peak displacement of the lever during the approach cycle is usually chosen for the
feedback signal to the piezo, in order to maintain a constant force during scanning. Other sampling points can be
chosen at arbitrary times within one cycle, depending on the required property. For example, the local stiffness
can be calculated from the slope obtained from subtracting the sample-and-hold signals at two different points
on the linear part of the deflection-displacement curve. The adhesion force can be calculated from the
difference between the largest negative deflection signal and the zero-deflection point.
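The two pulsed-force evaluations described above can be sketched as follows. The function and parameter names are hypothetical, and the conversion of a deflection difference into a force assumes a known cantilever spring constant, which the text does not specify.

```python
def stiffness_slope(z1_m, deflection1_m, z2_m, deflection2_m):
    """Slope of deflection versus piezo displacement between two sample-and-hold points.

    Both points are assumed to lie on the linear (contact) part of the curve.
    A slope near 1 indicates a sample much stiffer than the lever; smaller
    values indicate a more compliant sample.
    """
    if z2_m == z1_m:
        raise ValueError("the two sampling points must differ in displacement")
    return (deflection2_m - deflection1_m) / (z2_m - z1_m)


def adhesion_force(spring_constant_n_per_m, zero_deflection_m, most_negative_deflection_m):
    """Adhesion force from the gap between the zero-deflection baseline and the largest negative deflection."""
    return spring_constant_n_per_m * (zero_deflection_m - most_negative_deflection_m)


if __name__ == "__main__":
    # Illustrative (assumed) numbers: 8 nm of deflection for 10 nm of piezo travel,
    # and a 20 nm pull-off deflection on a 0.5 N/m lever.
    print(stiffness_slope(0.0, 0.0, 10e-9, 8e-9))
    print(adhesion_force(0.5, 0.0, -20e-9))
```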

Another method developed for imaging mechanical properties is ultrasonic force microscopy [116] (UFM).
This technique involves carrying out contact AFM while oscillating the sample at high frequency: typically
200-700 kHz, chosen to be above the highest tip-sample resonance. If operating in the purely contact mode
(i.e. if the tip does not leave the surface), the amplitude of the tip oscillations is determined by the elastic
modulus of the tip-sample system, independent of the spring stiffness of the cantilever. The technique can be
thought of as a fast-indentation system, and it samples a volume that has a radius some ten times the
indentation depth. The overall spatial resolution is typically a few nanometres.

B1.19.3.2 APPLICATIONS OF FORCE MICROSCOPY

(A) INORGANIC SURFACES 

True atomic resolution has been obtained on a wide range of inorganic surfaces including metals,
semiconductors and insulators. Initially, imaging concentrated on Si(111)7×7 as a means of demonstrating
the true-atomic-resolution imaging capability of the technique [81]. Even with such a well understood surface,
surprising results were obtained in the form of additional contrast revealed between different surface atoms.
Erlandsson et al [117] found that centre adatoms appeared to be 0.13 Å higher than
the corner adatoms. They suggest that the additional contrast may be due to variation in chemical reactivity of
the adatoms or to tip-induced atomic relaxation effects reflecting the stiffness of the surface lattice (figure
B1.19.25). Nakagiri et al [118] also saw additional contrast in their images of Si(111)7×7. However, they
observed the six atoms in one half of the unit cell to be brighter than in the other half. The two halves
correspond to faulted and unfaulted halves of the unit cell according to the dimer-adatom stacking fault model
[119]. At present they are not able to distinguish which atoms correspond to which half. The fact that this
additional contrast varied depending on the precise experimental technique used indicates that different
imaging mechanisms could be responsible as a result of the different tip material or height of the tip with
respect to the surface.

Figure B1.19.25. AFM image of Si(111)-(7 × 7) taken in the AC mode. Contrast can be observed between
inequivalent adatoms. Image courtesy of R Erlandsson. (Taken from [117], figure 4.)

Of particular interest are those surfaces where AFM has provided complementary information or revealed
surface structure which could not be obtained by STM. One obvious application is the imaging of insulators
such as NaCl(001) [120]. In this case it was possible to observe point defects and thermally activated atomic
jump processes, although it was not possible to assign the observed maxima to anion or cation.

Another area where AFM has provided new information is the imaging of metal oxides such as TiO₂ [121,
122]. Although the surface of TiO₂(110) is observable with STM, only NC-AFM was able to image the
bridging oxygen rows which are the outermost atoms on the surface (figure B1.19.26). True-atomic-resolution
imaging is still a relatively recent development and the full power of the technique in imaging insulators such
as Al₂O₃ or SiO₂ has yet to be demonstrated.

Figure B1.19.26. Highly resolved, non-contact AFM image of the TiO₂(110)-(1 × 1) surface (8.5 × 8.5 nm²)
with a single step. The two-dimensional order of the bright spots (0.65 × 0.3 nm²) reproduces the alignment of
the bridging oxygen atoms. (Taken from [121], figure 3.)


Even without atomic resolution, AFM has proved its worth as a technique for the local surface structural
determination of a number of bio-inorganic materials, such as natural calcium carbonate in clam and
sea-urchin shells [123], minerals such as mica [124] and molybdenite [125] as well as the surfaces of inorganic
crystals, such as silver bromide [126] and sodium decatungstocerate [127]. This kind of information can prove
invaluable in the understanding of phenomena such as biomineralization, the photographic process or
catalysis, where the surface crystallography, especially the presence of defects and superstructures, can play
an important role, but is difficult to determine by other methods. AFM has the considerable advantage that it
can be used to examine powdered samples, either pressed into a pellet, if the contact mode is employed, or
loosely dispersed on a surface, if intermittent or non-contact AFM is available.

AFM has also provided insights into the growth of metal clusters and films on mica. In the case of palladium
[128], for example, it was found that clusters in the 50 nm range exhibited truncated triangular shapes.
Epitaxial growth of silver on a mica surface [129] is seen to depend in a complex way on both substrate
temperature and film thickness (figure B1.19.27), with island morphology giving way to channels, which
become holes and then networks as the film thickness increases, the changes progressing more gradually as
the substrate temperature is increased. The issue of surface roughness of silver films is central to the technique
of surface-enhanced Raman spectroscopy, and AFM has been used to characterize films produced for this
purpose [130]. AFM possesses a considerable advantage over electron microscopy in this kind of application,
in that it can, if calibrated, automatically yield quantitative morphological and roughness data [131].

Figure B1.19.27. AFM topographic images (7 μm × 7 μm) of 20 epitaxial Ag films on mica prepared at five
substrate temperatures (75, 135, 200, 275, and 350 °C) and four film thicknesses (50, 110, 200, and 300 nm)
using metal deposition rates of 0.1 to 0.2 nm s⁻¹. The vertical RMS and peak-to-valley roughness are
indicated for each image. (Taken from [129], figure 1.)
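The RMS and peak-to-valley roughness quoted for such images can be computed directly from a calibrated height map. The following is a minimal sketch, assuming the image is available as a list of rows of heights that have already been levelled (no tilt or bow correction is performed here); the function name is hypothetical.

```python
import math


def roughness(height_map):
    """Return (rms, peak_to_valley) for a 2D height map given as a list of rows."""
    heights = [h for row in height_map for h in row]
    if not heights:
        raise ValueError("height map is empty")
    mean = sum(heights) / len(heights)
    # RMS roughness: standard deviation of the heights about their mean.
    rms = math.sqrt(sum((h - mean) ** 2 for h in heights) / len(heights))
    # Peak-to-valley roughness: full vertical range of the image.
    peak_to_valley = max(heights) - min(heights)
    return rms, peak_to_valley


if __name__ == "__main__":
    # Tiny made-up 2 x 2 image with heights in nm.
    print(roughness([[0.0, 2.0], [2.0, 0.0]]))
```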

(B) ORGANIC SURFACES 

AFM has been used to image several surfaces of organic crystals — such as tetracene [ 132 ] and pyrene [ 133 ] — 
and has produced images that can be compared to the unit cells expected from previous x-ray crystallographic 
studies. In the case of tetracene, the surface is merely a truncation of that expected from the bulk data. 
However, in the case of pyrene, where the bulk consists of dimer pairs, a surface reconstruction is evident in 
the AFM image, corresponding to the presence of monomer species. Reconstructions are common phenomena 
in metal surfaces, where LEED has frequently been used to detect them [25]. LEED has scarcely been used to 
analyse organic crystal surfaces, however, due to problems associated with charging and/or degradation of the 
sample in an electron beam. AFM nicely fills this gap in the surface analytical arsenal. 


The overwhelming majority of AFM studies on organic surfaces has concerned organic thin films on
inorganic substrates and, in particular, those deposited via Langmuir-Blodgett or self-assembly processes
[35]. These films have been an active research area for several years, frequently serving as models for complex
systems, such as membranes. Thin organic films are also being developed for their nonlinear optical properties,
as microlithographic resists and as sensor components [32].

Two Langmuir-Blodgett film systems that have been much studied by AFM are the calcium and barium 
arachidates, adsorbed as double layers on a silicon substrate, which has been pretreated so as to be 
hydrophobic [32]. These structures are formed by dipping the silicon into an arachidate film on a water trough 
and then removing the substrate again through the film. Repeating the process simply adds another double 
layer to the structure. As with organic crystals, more traditional surface-structure-determining approaches are 
often too destructive to allow analysis of systems such as these; possibly the most important and unique aspect 
of AFM in this context is that it is a local probe, and therefore capable of showing the enormous variety of 
defects in the films, both on a microscale (pores and islands in the films) and a nanoscale [134, 135]
(twinning, defects and complexities in molecular packing) (figure B1.19.28). This topic has been dealt with at
length by Frommer in her excellent review article [32]. 



Figure B1.19.28. Molecular-scale image (2 nm x 20 nm) of a barium arachidate bilayer. Image was produced
by averaging six images, but without filtering data. (Taken from [135], figure 1.)

Self-assembly of long-chain alkanethiols on the Au(111) surface has been studied by a number of different
techniques including AFM, and it has been consistently shown that the molecules form a commensurate
(√3 × √3)R30° structure. AFM studies can also provide additional information on the mechanical properties of
the organic layers [136], which are interesting in that they serve as a model system for lubricants [137]. Above a
critical applied load of 280 nN for the C₁₈ thiols, it has been found that the monolayers were disrupted, and
that the subsequent image corresponded to that of the Au(111) substrate. However, on reducing the load to
substantially below the critical value, the surface apparently healed, and the characteristic periodicity of the
thiol overlayer returned. The exact way in which this phenomenon occurs is not completely understood [138].
Possibilities include displacement of the thiols by the tip, binding of the thiols to the tip, or desorption of the
thiols into a liquid phase.

(C) POLYMER SURFACES

AFM is contributing significantly to our understanding of the surface structure of polymers, both on a 
microscale and on a molecular level. Segregation in the surface of block copolymers and polymer blends is 
often critical in determining technologically important properties, such as wettability or biocompatibility. It is 
also often difficult to image by optical or electron microscopy. AFM, on the other hand, offers a method for 
scrutinizing these materials down to the molecular level, without the need for surface preparation. AFM's 
ability to operate in a liquid environment makes it particularly useful in analysing polymers for medical 
applications, since these materials are designed to function surrounded by body fluids, which can influence 
the surface microstructure and nanostructure. 

Several studies have concerned the microstructure of lamellae in materials such as the block copolymers
polystyrene-block-poly-2-vinylpyridine [139] and polystyrene-block-polybutadiene [140], as well as single
crystals of poly-para-xylylene [139], and reveal features (such as intersecting lamellae (figure B1.19.29)) that
had not been previously observed.



Figure B1.19.29. AFM image of polystyrene/polybutadiene copolymer, showing lamellar structure. (Taken
from [140], figure 1.)

LFM has also proved useful in the examination of polymer blends, since its ability to image effective
frictional coefficients imparts a certain chemical sensitivity to the method [141]. A novel approach to
discrimination between components of a polymer blend was adopted by Feldman et al [142], who used the
low-refractive-index liquid, perfluorodecalin, as a medium for measurement of tip-polymer 'pull-off', thereby
enhancing the London component of the Hamaker constant and improving the signal-to-noise ratio of the
measurement. Using the same medium for lateral force (frictional) imaging, a striking contrast reversal was
found between polystyrene and PMMA blend components, when the tip surface was changed from
(hydrocarbon-covered) gold to silica (figure B1.19.30).

Figure B1.19.30. Height and friction images of a spin-cast polystyrene-poly(methyl methacrylate) blend
obtained with (a) gold and (b) silica probes under perfluorodecalin. Note the reversal of frictional contrast and
the high spatial resolution. (Taken from [142], figure 7.)

On the molecular level, spectacular AFM images have been obtained for a number of systems. In the case of
isotactic polypropylene, for example, Snetivy and Vancso [143] have succeeded in imaging individual methyl
groups on the polymer chain, and distinguishing between left- and right-handed helices in the crystalline
i-polypropylene matrix (figure B1.19.31). The same group has also used AFM to image the phenylene groups
in poly(p-phenyleneterephthalamide) fibres, and has used these data to show the existence of a new
polymorphic form that had previously only been suggested by computer simulations.





Figure B1.19.31. AFM image of a 2.7 nm x 2.7 nm area of a polypropylene surface, displaying methyl
groups and right- and left-handed helices. (Taken from [143], figure 10.)

(D) BIOLOGICAL SURFACES 

The AFM is now firmly established as a unique tool for the in situ investigation of biological surfaces [ 144 ], 
whether these be biomolecules, cell structures, or even viruses [ 145 ]. Often, a successful immobilization 
strategy has been key to successful imaging of the biological surface [ 146 ]. 

Lured by the promise of a new way to sequence the genetic code, and the prospect of nanomanipulation of
nucleic acids, investigators have produced a plethora of papers in the area of AFM imaging of DNA. Notable
among these is the work of Bustamante et al [147], who developed one of the first reproducible methods for
imaging nucleic acids. Their approach involved three important components: (1) the use of extremely sharp
tips [148] (radius of curvature ≈10 nm), which are prepared by electron-beam deposition of a carbon whisker
on the tip apex in an electron microscope [149], (2) the use of mica substrates that have been ion-exchanged
with magnesium, in order to promote interaction with the phosphate groups on the DNA and (3) careful
control of the relative humidity [150] (or operation under liquids), in order to prevent tip-induced sample
movement. Using this approach, and imaging under 2-propanol, high-resolution images of plasmid DNA have
been obtained [151] (figure B1.19.32), and the molecules dissected by momentarily increasing the AFM force
[152]. Single- and double-stranded DNA [153] and RNA-polymerase-DNA complexes [154] have also been
imaged using this approach. In addition, the helical pitch of the DNA has been deciphered [155], AFM has been
used to determine the local chirality of the DNA supercoiling [156] and even individual base pairs have been
resolved [157].

Figure B1.19.32. AFM image of Blue Script II plasmid (400 nm x 400 nm) in propanol, taken with 'super
tip', prepared by carbon deposition on normal tip in SEM, followed by ion milling. (Taken from [152], figure
1.)

Proteins have also been investigated extensively by AFM. Radmacher et al [158] monitored height changes in
an adsorbed layer of lysozyme using intermittent-contact AFM, as the enzyme was exposed to a substrate
molecule. The height changes were variously interpreted as due to conformational adaptations of the
lysozyme, or to the different height of the enzyme-substrate complex. Many other proteins have been imaged,
using various immobilization methods, and this area has been comprehensively reviewed in the literature
[159]. Among the highest-resolution (<1 nm) examples have been those of Müller et al [160], who have
studied the inner surface of the hexagonally packed intermediate (HPI) layers of cell envelope proteins, such
as those in the bacterium Deinococcus radiodurans, where protein conformational changes can be observed
(figure B1.19.33). AFM has also provided a new window into the channels present in the surfaces of living
cells, and Lal et al [161] have imaged the channels formed when the responsible cell proteins (porins) are
reconstituted as crystalline arrays. The resolution obtained was such that individual polar head groups of the
lipid molecules could be discerned. Several mechanical studies on proteins using AFM have also been
reported, notably the hysteretic unfolding of the giant muscle protein, titin, reported by Rief et al [162], where
the importance of this protein as a 'strength reservoir' during muscle stretching was demonstrated for the first
time.

Figure B1.19.33. Conformational changes of the inner surface of an HPI layer. (a) The protruding cores are
clearly visible, with some pores in an open conformation and others in an obstructed conformation. (b) Area
shown in panel (a) imaged 5 min later. Some pores that were open earlier are now closed (circles), while closed
ones have opened during this time interval (squares). Units were aligned and divided into two classes. The
class averages exhibit a plugged (c) and an open hexamer (d). The difference map (g) represents the modulus
of the height difference between the sixfold-symmetrized class averages ((e) and (f)). The full grey-level
range corresponds to a vertical distance of 6 nm ((a) and (b)) and 3 nm ((c) to (f)). (Taken from [160], figure
3.)

In the area of cell and cell-structure imaging, AFM offers an advantage over optical and scanning electron 
microscopies in that it permits high-resolution imaging of living cells, and even the observation of dynamic 
phenomena. Henderson et al [ 163 ] observed the motion of filamentous actin in living glial cells (i.e., 
structures beneath the plasma membrane were imaged), and even performed nanosurgery on the cell with the 
AFM tip. Brandow et al [ 164 ] also cut into lipid membranes on a graphite surface using deliberately high 
force from the AFM tip, and found that the membranes healed themselves after sufficient time, but that the 
healing could be accelerated by rubbing the AFM tip perpendicular to the cut with a controlled force. The 
same group was also able to manipulate living glial cells [ 165 ] and even peel them away from a surface, if the 
normal force was appropriate. Thus, depending on the amount of force applied, the AFM could be used to cut, 
anneal, peel or image the sample. 

Several groups have focused on the biochemical receptor-ligand interaction, measuring forces between
individual molecular binding pairs, such as biotin and streptavidin [166, 167]. One approach involves coating
the tip with either receptor or ligand, the sample with the complementary molecule, and then carefully
monitoring tip-sample separation after contact. If the cantilever stiffness is appropriately chosen, the
force-distance curve during separation appears to be 'quantized' in units corresponding to a single molecular-pair
interaction. Lee et al [168] have also demonstrated and measured the interaction between single
complementary strands of DNA base pairs using AFM. Several other examples of ligand-receptor interactions
have been summarized in a review by Bongrand [169].

(E) COLLOIDS


Due to its sensitivity to small forces and its ability to operate in liquids, the AFM has opened up a new avenue
of investigation into colloidal systems. A frequently used approach, developed by Ducker et al [170], involves
the cementing of a colloidal-sized particle onto an AFM cantilever in place of the usual tip, and then
monitoring interactions with some appropriate flat surface, or even another colloidal particle [171], under a
variety of conditions. Larson et al [172] used this technique to investigate the interactions between a titania
(rutile) particle (≈9 μm diameter) and a rutile single-crystal surface under various conditions of pH and ionic
strength. From their experiments they were able to measure the van der Waals interaction between rutile
surfaces directly, and to calculate the non-retarded Hamaker constant for the system. Biggs et al have applied
a similar approach to the venerable subject of gold colloid stability [173], by immobilizing a ~6 μm gold
sphere on a cantilever and measuring its interaction with a polished gold plate under solutions containing
combinations of gold, citrate, chloride and a number of other ions of relevance to the colloid system. The
authors were able to demonstrate the presence of a repulsive interaction between the gold surfaces due to
adsorbed citrate or chloride; they showed that citrate adsorbed preferentially, and succeeded in measuring the
surface potential of the gold as a function of anion concentrations.
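As an illustration of how a non-retarded Hamaker constant might be extracted from colloid-probe data, the sketch below fits measured forces to the sphere-plane van der Waals expression F(D) = -A R / (6 D²). This form, the least-squares procedure and all names are assumptions for illustration and are not taken from the cited work; electrostatic double-layer forces and retardation are ignored.

```python
def fit_hamaker_constant(separations_m, forces_n, sphere_radius_m):
    """Least-squares estimate of A in the assumed model F(D) = -A * R / (6 * D**2).

    separations_m: sphere-plane separations D (m).
    forces_n: measured forces at those separations (N), attractive forces negative.
    """
    if len(separations_m) != len(forces_n) or not separations_m:
        raise ValueError("need equally many separations and forces")
    # The model is linear in A: F = A * x with x = -R / (6 D^2).
    xs = [-sphere_radius_m / (6.0 * d ** 2) for d in separations_m]
    numerator = sum(x * f for x, f in zip(xs, forces_n))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


if __name__ == "__main__":
    # Synthetic data generated from an assumed A = 6e-20 J and R = 4.5 micrometres.
    radius = 4.5e-6
    seps = [5e-9, 10e-9, 20e-9]
    forces = [-6e-20 * radius / (6.0 * d ** 2) for d in seps]
    print(fit_hamaker_constant(seps, forces, radius))
```

In practice only the portion of the force curve dominated by van der Waals attraction would be fitted, which is why control of pH and ionic strength matters in such experiments.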

(F) TRIBOLOGY 

Force microscopy, and lateral (or frictional) force microscopy in particular, are having a tremendous impact in
tribology: the science and technology of friction, wear, and lubrication. The interaction between moving
surfaces (the central issue in tribology) is thought to consist of separate interactions between the many peaks
in one surface with the many peaks in the countersurface. These peaks are known as asperities. It is this mode
of interaction that leads to Amontons' empirical 'law' of friction (this law is generally attributed to Amontons,
although initially observed by Leonardo da Vinci about two centuries earlier), F = μN, where F is the frictional
force, N is the normal force, and μ is the coefficient of friction. Notable by its absence in this equation is the
apparent contact area between the two sliding bodies. In fact, as one might intuitively believe, the actual contact
area, A, between the asperities is all-important, and the frictional force is proportional to A. However, A, in turn,
increases in proportion to the normal force [174] (both due to 'flattening' of the asperities by the load and the
creation of new load-bearing asperities, as the higher asperities are flattened) so that A can be cancelled out of
the resulting equation, leaving behind only the measurable quantities, F and N. The importance of AFM in
fundamental tribology research is that the AFM measurement can be thought of as a single asperity contact,
i.e. the fundamental interaction in frictional behaviour.
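The cancellation of the contact area described above can be written out explicitly. The symbols τ (an effective shear strength per unit real contact area) and c (the proportionality constant between real contact area and normal force) are introduced here purely for illustration and do not appear in the original text.

```latex
\begin{align}
F &= \tau A, \\
A &= c N, \\
F &= \tau c N = \mu N, \qquad \mu \equiv \tau c .
\end{align}
```

On this reading, the coefficient of friction lumps together an interfacial property (τ) and a statistical property of the asperity population (c), which is why the real contact area never appears in the macroscopic law.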

Carpick et al [84] used AFM, with a Pt-coated tip on a mica substrate in ultrahigh vacuum, to show that if the 
deformation of the substrate and the tip-substrate adhesion are taken into account (the so-called JKR model 
[ 175 ] of elastic adhesive contact), then the frictional force is indeed proportional to the contact area between 
tip and sample. However, under these single-asperity conditions, Amontons' law does not hold, since the 
'statistical' effect of more asperities coming into play no longer occurs, and the contact area is not simply 
proportional to the applied load. 
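The non-proportionality between contact area and load for a single adhesive asperity can be illustrated with the standard JKR expression for the contact radius a, a³ = (R/K)[P + 3πWR + √(6πWRP + (3πWR)²)], with K = (4/3)E*. The sketch below is a hedged illustration; the parameter values are made up and are not those of the experiment described above.

```python
import math


def jkr_contact_area(load_n, tip_radius_m, work_of_adhesion_j_per_m2, reduced_modulus_pa):
    """Contact area of a sphere on a flat in the JKR model of elastic adhesive contact."""
    k_eff = 4.0 / 3.0 * reduced_modulus_pa
    c = 3.0 * math.pi * work_of_adhesion_j_per_m2 * tip_radius_m
    discriminant = 2.0 * c * load_n + c * c  # equals 6*pi*W*R*P + (3*pi*W*R)**2
    if discriminant < 0.0:
        raise ValueError("load is below the JKR pull-off force")
    a_cubed = tip_radius_m / k_eff * (load_n + c + math.sqrt(discriminant))
    contact_radius = a_cubed ** (1.0 / 3.0)
    return math.pi * contact_radius ** 2


if __name__ == "__main__":
    # Illustrative (assumed) values: R = 20 nm, W = 0.1 J/m^2, E* = 50 GPa.
    for load in (0.0, 10e-9, 20e-9):
        area = jkr_contact_area(load, 20e-9, 0.1, 50e9)
        print(f"load {load * 1e9:5.1f} nN -> area {area * 1e18:.2f} nm^2")
```

The area is finite at zero applied load and grows sub-linearly with load, so a friction force proportional to contact area does not reduce to a constant coefficient of friction.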

Mate et al [ 111 ], who pioneered LFM, used their instrument to show atomic-scale structure in the frictional 
force between a tungsten tip and surface graphite atoms. In fact, what these authors observed was stick-slip 
behaviour: an effect normally associated with macroscopic phenomena, such as the vibration induced in a 
violin string by the bow, or the squealing of an automobile's brakes. They also found that the frictional 
coefficient varied slightly with applied load, i.e. a deviation from Amontons' law. The same group went on to 
study mica surfaces [ 124 ], where a similar stick-slip behaviour was observed, the frictional coefficient 
varying with the unit cell periodicity of the mica cleavage plane (figure B1.19.34). Hu et al [176] examined
mica surfaces by LFM using silicon nitride tips, and found that above 
normal loads of 10 nN, Amontons' law was obeyed, while at low loads, deviations from linearity were
observed. They also showed that friction decreased substantially as a function of humidity, decreasing by an 
order of magnitude under liquid water, but was essentially invariant with scanning direction across the mica 
surface. Wear was also observed by these authors, who found that a threshold load value needed to be 
exceeded before layer-by-layer wear was initiated. Below this value, even multiple scans were not found to 
produce visible damage to the surface. A similar effect was observed for the system AgBr on NaCl(001)
(SiO₂-coated tip) by Lüthi et al [177], where a wear onset was observed at around 14 nN. Interestingly,
frictional coefficients were measured over a range of loads in this latter work and the μ measured on NaCl
was found to be an order of magnitude lower than that measured on AgBr.

Figure B1.19.34. Cantilever deflection and corresponding frictional force in the x-direction as a function of
sample position as a mica sample is scanned back and forth under a tungsten tip. (Taken from [ 124 ], figure 2.) 

Numerous groups have applied LFM to address the issue of friction on thin organic films. These systems 
serve as useful models for the important macroscopic tribological issue of boundary lubrication. Overney et al 
[ 178 ] used LFM to examine mixed Langmuir-Blodgett films of perhydro arachidic acid and partially 
fluorinated carboxylic acid that had been transferred onto a silicon surface as a bilayer system, using poly(4- 
vinyl-N-methylpyridinium) as a countercation. The images clearly showed a difference in frictional 
coefficient between the two components, which segregated into submicron domains. Surprising, however, was 
the observation that the apparent frictional coefficient on the fluorinated component was a factor of four 
higher than that measured on the non-fluorinated one. The authors attributed this to the greater shear strength 
of fluorinated films, although more recent measurements using LB-deposited straight-chain acids of different 
lengths [ 179 ] suggest that the length of the molecule itself has a significant effect on mechanical properties, 
and therefore on the frictional coefficients measured. 

Kim et al [ 180 ], using specially synthesized, end-functionalized alkanethiols, investigated mechanisms of 
friction by producing gold-supported monolayers containing varying quantities of various bulky endgroups. 
They found that the differences in friction were apparently due to differences in the size of the terminal 
groups, larger terminal groups (whether F-containing or not) giving rise to increasing interactions that 
provided pathways for energy dissipation, and therefore higher frictional losses. 

The issues of the correlation of adhesion and of viscoelastic relaxation with friction are currently being
investigated using AFM and LFM. Although friction does not correlate with the adhesion energy between two
surfaces, there is increasing evidence that it is proportional to adhesion hysteresis [181], i.e. the energy
dissipated during the making and breaking of an adhesive bond. Marti et al [ 189 ] have shown that the 
adhesion hysteresis measured during force-curve measurements (Si 3 N 4 tip) on silica and alumina surfaces 
under electrolytes is proportional to the LFM friction force measurements made under the same conditions. 
Haugstad et al [ 182 ] characterized fundamental aspects of friction on polymer surfaces by measuring 
frictional force on a gelatin film as a function of scanning frequency, i.e. velocity. The friction was found to 
increase at lower velocities, which can be explained by the onset of rubbery-state-type molecular relaxations, 
the viscoelastic dissipation correlating with frictional dissipation. Following intensive scanning at a particular 
site on the gelatin surface, a high-friction 'signature' could be observed in subsequent LFM measurements, 
and was explained by the perturbative effect of the previous frictional measurement. In this case, the LFM 
images were providing a map of molecular relaxation across the film. 

(G) MEASUREMENT OF MECHANICAL PROPERTIES 

The technological importance of thin films in such areas as semiconductor devices and sensors has led to a 
demand for mechanical property information for these systems. Measuring the elastic modulus for thin films 
is much harder than the corresponding measurement for bulk samples, since the results obtained by traditional 
indentation methods are strongly perturbed by the properties of the substrate material. Additionally, the 
behaviour of the film under conditions of low load, which is necessary for the measurement of thin-film 
properties, is strongly influenced by surface forces [75]. Since the force microscope is both sensitive to 
surface forces and has extremely high depth resolution, it shows considerable promise as a technique for the 
mechanical characterization of thin films. 

A striking example of the use of the AFM as an indenter is provided by Burnham and Colton [183] (figure
B1.19.35), where the differences (at <100 nm indentation) between the plastic behaviour of gold at 20 μN
load and the elastic behaviour of graphite and elastomer at lower loads are clearly observable. The importance 
of surface forces in measuring mechanical properties at low load has been demonstrated by Salmeron et al 
[ 184 ], who showed that tip-sample adhesion can seriously perturb hardness measurements, not only for clean 
surfaces, but also for those covered by a layer of contamination. These authors also suggested that initial 
passivation of the surface (e.g., by sulfidation) might be an effective approach to overcoming these artefacts. 

Figure B1.19.35. Experimental nanoindentation curves obtained with the AFM showing the loading and
unloading behaviour of (a) an elastomer and highly oriented pyrolytic graphite and (b) a gold foil. (Taken
from [183], figure 4.)

(H) CHEMICAL IMAGING 

The imaging of surfaces according to chemical species is immensely useful in understanding the mechanisms 
of many complex technological systems on a fundamental level. Over the past two decades, Auger electron 
spectroscopy, x-ray photoelectron spectroscopy and secondary ion mass spectroscopy have been extended into 
surface-sensitive chemical imaging methods [ 185 ]; nowadays the use of such techniques for troubleshooting 
has become virtually routine in the semiconductor industry, and has contributed significantly to our 
knowledge of catalyst systems, corrosion mechanisms and many other areas. However, these methods are not 
universally applicable since they are limited in spatial resolution (especially for insulating samples) and 
require the sample to be analysed in a vacuum. A chemically sensitive scanning force microscopy that could 
image the distribution of chemical species on an insulating surface with nanometre resolution under ambient 
conditions, or even under liquids, is therefore a highly desirable goal, and many research groups are active in 
this area. 

A promising approach, still in the early stages of development, involves the functionalization of the AFM tip.
In many cases this has been limited to the demonstration of the sensitivity of force curves to the chemical
species present on the surface — a necessary first step for the development of an imaging methodology. Akari 
et al [ 186 ] have functionalized a gold-coated tip with COOH-terminated long-chain hydrocarbons, using a 
self-assembled monolayer procedure; they have shown that the contrast is considerably enhanced over that of 
the uncoated tip, when imaging a patterned mixed monolayer consisting of CH₃- and COOH-terminated
molecules of similar length. 

LFM coupled with tip functionalization is a potentially important chemical imaging technique, since the tip 
functionality can be tailored so as to produce maximum contrast (i.e. maximum difference in frictional 
coefficient) between different chemical species on the surface. Frisbie et al [187] have examined a patterned
surface of COOH and CH₃ groups, and have shown that the pattern could be readily imaged by a COOH- or
CH₃-coated tip (figure B1.19.36) running in LFM mode, since the imaged frictional coefficients depended on
the particular tip-surface species interaction. A potential pitfall with this technique is that both local chemistry 
and local mechanical properties, due to differences in molecular packing, contribute to the imaging contrast. 
This problem was elegantly sidestepped by McKendry et al, who investigated chiral discrimination by
chemical force microscopy [188]. In this case the interaction between the tip and surface can be changed
without alteration of the mechanical properties or wetting energies of tip or surface. The chemical force
microscope proved to be sufficiently sensitive to permit discrimination between enantiomers of simple chiral
molecules in a friction image, with more quantitative differences being obtained from adhesion force or
'pull-off' force histograms.

Figure B1.19.36. Image of the frictional force distribution of a pattern consisting of areas of CH₃-terminated
and areas of COOH-terminated molecules attached to gold-coated silicon. The tip was also functionalized in
(a) with CH₃ species and in (b) with COOH species. The bright regions correspond to the higher friction
force, which in (a) is observed on the CH₃ areas and in (b) on the COOH areas. (Taken from [187], figure 3.)

One potentially powerful approach to chemical imaging of oxides is to capitalize on the tip-surface 
interactions caused by the surface charge induced under electrolyte solutions [ 189 ]. The sign and the amount 
of the charge induced on, for example, an oxide surface under an aqueous solution is determined by the pH 
and ionic strength of the solution, as well as by the isoelectric point (IEP) of the sample. At pH values above 
the IEP, the charge is negative; below this value, the charge is positive. The same argument applies to a
Si₃N₄ tip (normally an oxynitride at the surface), so
that at every pH, either an attractive or a repulsive electrostatic tip-sample force will be superimposed upon
the forces that are normally encountered in AFM (figure B1.19.37). By varying the pH and determining the
value at which the charges switch sign, the IEP of the sample may be determined. Since this value is 
characteristic for a particular oxide, and the sample area probed is on the nanometre-squared scale, this 
appears to be a promising direction for the high-resolution chemical imaging of mixed oxide, oxide-covered 
alloy [ 190 ] or even protein [ 191 ] surfaces. 

Figure B1.19.37. Normal force versus tip-sample distance curves for a Si₃N₄ tip on a SiO₂ surface under 1
mM NaCl solution at pH 4 and pH 9. (Taken from [189], figure 2.)
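
The sign argument above can be expressed as a small lookup. This is only a sketch: the function names are hypothetical, and the isoelectric points used in the example (6.0 for the tip, 2.5 for the sample) are illustrative placeholders rather than values given in the text.

```python
def surface_charge_sign(ph, iep):
    """Sign of the net surface charge: -1 above the IEP, +1 below, 0 at the IEP."""
    if ph > iep:
        return -1
    if ph < iep:
        return 1
    return 0


def electrostatic_interaction(ph, iep_tip, iep_sample):
    """Character of the electrostatic tip-sample force at a given pH."""
    product = surface_charge_sign(ph, iep_tip) * surface_charge_sign(ph, iep_sample)
    if product > 0:
        return "repulsive"    # like charges on tip and sample
    if product < 0:
        return "attractive"   # opposite charges on tip and sample
    return "none"             # one surface sits at its isoelectric point


# Assumed, illustrative isoelectric points (not taken from the text).
IEP_TIP, IEP_SAMPLE = 6.0, 2.5
for ph in (4.0, 9.0):
    print(ph, electrostatic_interaction(ph, IEP_TIP, IEP_SAMPLE))
```

Sweeping the pH and noting where the interaction changes character brackets the IEP of whichever surface is switching sign, which is the basis of the measurement described above.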

Finally, Berger et al [ 192 ] have developed a technique whereby an array of force curves is obtained over the 
sample surface ('force-curve mapping'), enabling a map of the tip-sample adhesion to be obtained. The 
authors have used this approach to image differently oriented phase domains of Langmuir-Blodgett-deposited 
lipid films. 
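
The data reduction behind force-curve mapping can be sketched as follows, assuming that each pixel stores the force samples recorded while the tip retracts and that adhesion is taken as the magnitude of the most negative (attractive) force. The function names and the toy data are hypothetical.

```python
def adhesion_from_curve(retract_forces):
    """Pull-off force: magnitude of the most negative force on retraction (0 if none)."""
    return max(0.0, -min(retract_forces))


def adhesion_map(curves):
    """Convert a 2D grid of retraction curves into a 2D grid of adhesion values.

    curves[row][col] is a list of force samples (e.g. in nN) for one pixel.
    """
    return [[adhesion_from_curve(curve) for curve in row] for row in curves]


# Toy 2 x 2 grid of retraction curves (made-up numbers, in nN).
curves = [
    [[0.0, -1.5, -0.2], [0.0, -3.0, 0.1]],
    [[0.5, 0.2, 0.1], [0.0, -0.5, 0.0]],
]
for row in adhesion_map(curves):
    print(row)
```

Domains with different tip-sample adhesion then appear as regions of different value in the resulting map.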


B1.19.4 SCANNING NEAR-FIELD OPTICAL MICROSCOPY AND
OTHER SPMS 

Since the invention of the scanning tunnelling microscope, many other related techniques have been 
developed that combine the principles of piezo positioning with a feedback system, but rely on a surface 
interaction other than electron tunnelling or force sensing in order to produce images. The overwhelming 
majority of these scanning techniques are still in the very early stages of development and, in contrast to STM 
and AFM, have as yet revealed little that could not be better determined by other methods. Nevertheless, this 
is an area with tremendous promise, and a selection of these methods is therefore described below. 

B1.19.4.1 SCANNING NEAR-FIELD OPTICAL MICROSCOPY

Of the methods described in this section, scanning near-field optical microscopy (SNOM or NSOM) is the 
closest to being able to provide useful information that is unobtainable by other means. Indeed, this technique 
has already been made available as a commercial instrument. A detailed review of SNOM has been written by 
Pohl [ 193 ]. 

While the spatial resolution in classical microscopy is limited to approximately λ/2, where λ is the optical
wavelength (the so-called Abbe limit [194], ~0.2 μm with visible light), SNOM breaks through this barrier
by monitoring the evanescent waves (of high spatial frequency) which arise following interaction with an
object, rather than the propagating waves (of low spatial frequency), which are observed under far-field
conditions. While the field intensity of propagating waves decays with the well known inverse-square 
dependence on distance from the source, evanescent waves decay much more rapidly, with an inverse-fourth 
power relationship. This means that the evanescent waves are almost completely damped out within a few 
nanometres of the source, and play no part in far-field (i.e. classical) optical measurements. Synge [ 195 ] was 
the first to discuss the possibility of exceeding the Abbe limit, as long ago as 1928, and suggested that by 
scanning a nanometre-sized aperture over the surface of the sample, a resolution higher than λ/2 could be
obtained. This is analogous to the use of a stethoscope by a physician, where spatial information can be 
obtained with a resolution far greater than the acoustical wavelength [ 196 ]. This forms the basis of SNOM. In 
1972, Ash et al [ 197 ] were able to show that this principle could indeed be demonstrated for microwave 
wavelengths (3 cm), but SNOM with visible light was not developed until the 1980s, when the technologies 
surrounding the STM became available. 

The design of the SNOM in the first experiments consisted of a minute aperture, formed by a metallized glass 
fibre tip, which was rastered across a sample that was illuminated by a laser beam from behind (figure
B1.19.38). The tip was maintained at a constant distance from the sample by using the tip-sample tunnelling
current in a feedback loop. The resolution obtained was 25 nm (λ/20) [196, 198]. Subsequent variations to the
experimental set-up have included the use of force interactions to maintain tip-sample separation [199]
(enabling the imaging of insulators), as well as operation in transmission mode [200]. Recent applications
have included the detection and spectroscopy of single molecules [ 201 ], where spectral differences 
corresponding to the different chemical environments of individual molecules can be discerned, and the 
orientation of each molecular dipole determined. In general, the combination of high lateral resolution in the 
optical near-field and spectroscopic information has been restricted to fluorescence and luminescence 
experiments. Recently, vibrational spectroscopy in the optical near-field has also been attempted [202, 203,
204 and 205], with surface-enhanced Raman scattering in particular appearing to be a promising method for
obtaining spectral, spatial and chemical information of molecular adsorbates with subwavelength lateral
resolution. This has been implemented in illumination mode, where the incident light emerges from the fibre
probe, to obtain surface-enhanced Raman spectra from single Ag colloid nanoparticles [206]. Raman chemical
imaging on a scale of 100 nm has also been demonstrated by Deckert et al on dye-labelled DNA. On the
nanometre scale a strong dependence of the enhanced Raman signal on substrate morphology is expected. It is
therefore particularly useful to correlate near-field Raman spectra with topographic information on a
nanoscale, as in the experiment of Zeisel et al [207], who investigated the near-field surface-enhanced Raman
spectroscopy of dye molecules adsorbed on silver island films. 

Figure B1.19.38. Schematic of a scanning near-field optical microscope (SNOM). (Taken from [196], figure 2.)

B1.19.4.2 SCANNING NEAR-FIELD ACOUSTIC MICROSCOPY (SNAM)

This corresponds to the physician's stethoscope case mentioned above, and has been realized [ 208 ] by 
bringing one leg of a resonating 33 kHz quartz tuning fork close to the surface of a sample, which is being 
rastered in the x-y plane. As the fork-leg nears the sample, the fork's resonant frequency and therefore its
amplitude are changed by interaction with the surface. Since the behaviour of the system appears to be
dependent on the gas pressure, it may be assumed that the coupling is due to hydrodynamic interactions
within the fork-air-sample gap. Since the fork tip-sample distance is approximately 200 μm (~λ/20), the
technique is sensitive to the near-field component of the scattered acoustic signal. A lateral resolution of
1 μm and a vertical resolution of 10 nm have been obtained by the SNAM.

B1.19.4.3 SCANNING THERMAL PROFILER (STP)

This technique involves the scanning of a heated thermocouple tip above the surface of the sample (figure
B1.19.39). Since the heat loss from the tip is highly dependent on the tip-sample spacing, the temperature of
the tip can be used as a control parameter to monitor sample morphology and/or thermal properties [209]. The
lateral resolution obtained with this method is of the order of 0.1 μm, with a vertical resolution of about 3 nm.
The temperature sensitivity of the tip is ~0.1 mK. An advantage of the technique is that morphology can be
imaged at distances approximately equal to the desired lateral resolution. 

Figure B1.19.39. Schematic of the thermocouple probe in a scanning thermal profiler. The probe is supported
on a piezoelectric element for modulation of tip-sample distance at frequency ω₁ and to provide positioning.
The AC thermal signal at ω₁ is detected, rectified, and sent to the feedback loop, which supplies a voltage to
the piezostack to maintain the average tip-sample spacing constant. (Taken from [209], figure 1.)

An extension of this technique, known as scanning thermal microscopy [ 210 ] (SThM) combines the thermal 
profiling technique with modulated-temperature differential scanning calorimetry, by applying a sinusoidal 
modulation to the tip temperature. Using this technique, the spatial distribution of thermal properties of 
materials (such as conductivity and diffusivity) can be monitored, and subsurface features imaged (since the 
penetration depth of the evanescent temperature wave is frequency-dependent). Additionally, local 
calorimetric analysis can be used to probe thermally activated near-surface processes, such as glass 
transitions, melting, crystallization or cure reactions. The technique has been used to great effect with polymer 
blends, where imaging contrast is caused by differences in the thermal properties of the individual 
components. 

B1.19.4.4 SCANNING ION CONDUCTANCE MICROSCOPY (SICM)

This method relies on the simple principle that the flow of ions into an electrolyte-filled micropipette as it 
nears a surface is dependent on the distance between the sample and the mouth of the pipette [211] (figure
B1.19.40). The probe height can then be used to maintain a constant current flow (of ions) into the
micropipette, and the technique functions as a non-contact imaging method. Alternatively, the height can be 
held constant and the measured ion current used to generate the image. This latter approach has, for example, 
been used to probe ion flows through channels in membranes. The lateral resolution obtainable by this method 
depends on the diameter of the micropipette. Values of 200 nm have been reported. 

Figure B1.19.40. The scanning ion-conductance microscope (SICM) scans a micropipette over the contours
of a surface, keeping the electrical conductance through the tip of the micropipette constant by adjusting the 
vertical height of the probe. (Taken from [ 211 ], figure 1.) 

B1.19.4.5 SCANNING MICROPIPETTE MOLECULE MICROSCOPY (SMMM)

The apparatus involved in this method is related to that of SICM, except that the micropipette is blocked by a 
permeable polymer plug, and connected to the inlet of a differentially pumped quadrupole mass spectrometer 
[212] (figure B1.19.41). Unlike most other scanning techniques, SMMM relies on a light microscope for
positioning. Nevertheless, it is a unique spatially resolved sampling method for desorbing surface species 
under solution, and has numerous potential applications in biology and medicine. The diffusion of water 
through pores in a polymer membrane has been followed by using the set-up in figure B1.19.41, where
diffusing HDO is converted to HD (with the unambiguous mass of 3 amu) prior to mass spectrometric 
detection by means of a uranium reduction furnace. 

Figure B1.19.41. Schematic of the scanning micropipette molecule microscope. (Taken from [212], figure 1.)

B1.19.5 OUTLOOK


STM and its many related methods have opened new windows onto the nanometre-scale world. Within a short 
period of time, SPMs have become common fixtures, not only in surface science laboratories, but also in 
research groups working in areas as diverse as ceramics, polymers, cell biology, robotics, catalysis, and 
tribology. Clearly this trend will continue, as the concept of the proximal probe becomes combined with more 
and more of the macroscale analytical tools that we know today. It is also clear that the nanoscale chemical 
analysis of surfaces by means of SPM will become increasingly viable, with biological surfaces providing 
some of the most challenging and potentially fruitful analytical problems. Other suggested applications of 
SPMs are as high-density information storage systems, as selective molecular manipulators and as aids in 
microsurgery. With the field barely into its second decade, the technological and scientific possibilities of the 
SPM approach are immense. 

B1.20 The surface forces apparatus

Manfred Heuberger 


B1.20.1 INTRODUCTION 

Compared with other direct force measurement techniques, a unique aspect of the surface forces apparatus 
(SFA) is to allow quantitative measurement of surface forces and intermolecular potentials. This is made 
possible by essentially three measures: (i) well defined contact geometry, (ii) high-resolution interferometric 
distance measurement and (iii) precise mechanics to control the separation between the surfaces. 

It is remarkable that the roots of the SFA go back to the early 1960s [1]. Tabor and Winterton [2] and
Israelachvili and Tabor [3] developed it to the current state of the art some 15 years before the invention of the 
more widely used atomic force microscope (AFM) (see chapter B1.19).

Although only a few dozen laboratories worldwide are actively practising the SFA technique, it has produced 
many notable findings and a fundamental understanding of surface forces [4]. These are applicable to various 
research fields such as polymer science [5, 6, 7, 8, 9 and 10], thin-film rheology [ 11 , 12 , 13 and 14], biology 
[ 15 , 16 , 17 and 19], liquid crystals [ 20 , 21 and 22], food sciences [12, 23, 24], molecular tribology [9, 14, 25, 
26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 and 35] and automotive tribology [36] — to name just a few. Numerous 
experimental extensions of the SFA technique have been developed to take advantage of the unique 
experimental set-up. 

One of the most important extensions is the measurement of lateral forces (friction). Friction measurements 
have accompanied the SFA technique since its early beginnings in the Cavendish laboratory in Cambridge 
[ 37 ] and a variety of different lateral force measurements are practised throughout the SFA community. 


B1.20.2 PRINCIPLES 

B1.20.2.1 DIRECT FORCE MEASUREMENT

The measurement of surface forces calls for a rigid apparatus that exhibits a high force sensitivity as well as 
distance measurement and control on a subnanometre scale [38]. Most SFAs make use of an optical 
interference technique to measure distances and hence forces between surfaces. Alternative distance 
measurements have been developed in recent years — predominantly capacitive techniques, which allow for 
faster and simpler acquisition of an averaged distance [11, 39, 40] or even allow for simultaneous dielectric
loss measurements at a confined interface. 


The predominant method of measuring forces is to detect the proportional deflection of an elastic spring. The 
proportionality factor is commonly called the spring constant and, in the SFA, may range from some 10 N m⁻¹
to some 100 kN m⁻¹. It is essential that the apparatus frame and surface compliance be at least 1000 times
stiffer than the force-measuring spring. In a typical SFA, one of the two surfaces is attached to the
force-measuring spring, while the other surface is rigidly mounted to the apparatus frame. In this set-up, shown
schematically in figure B1.20.1, a set of at least three parameters must be controlled or measured
simultaneously for a direct force measurement.

Figure B1.20.1. Direct force measurement via deflection of an elastic spring: essential design features of a
direct force measurement apparatus. 

The accurate and absolute measurement of the distance, D, between the surfaces is central to the SFA
technique. In a typical experiment, the SFA controls the base position, z₃, of the spring and simultaneously
measures D, while the spring constant, k, is a known quantity. Ideally, the simple relationship
ΔF(D) = kΔ(D − z₃) applies. Since surface forces are of limited range, one can set F(D = ∞) = 0 to obtain an
absolute scale for the force. Furthermore, ∂F(D = ∞)/∂D ≈ 0, so that one can readily obtain a calibration of the
distance control at large distances relying on an accurate measurement of D. Therefore, D and F are obtained at
high accuracy to yield F(D), the so-called force versus distance curve.
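
The data reduction implied by this relationship can be sketched in a few lines. The example assumes that the first few points of an approach are recorded at large separation, where the force is taken to be zero, and that distances are in nm and the spring constant in N m⁻¹, so that forces come out in nN. The function name, the `n_baseline` parameter and the numbers are hypothetical.

```python
def force_curve(distances, base_positions, spring_constant, n_baseline=3):
    """Forces F(D) from measured distances D and spring base positions z3.

    Uses dF = k * d(D - z3), with the force defined as zero at large separation:
    the first n_baseline points are assumed to lie outside the range of the
    surface forces and fix the zero of the spring deflection.
    """
    deflections = [d - z for d, z in zip(distances, base_positions)]
    zero = sum(deflections[:n_baseline]) / n_baseline
    return [spring_constant * (x - zero) for x in deflections]


# Made-up approach: the base is stepped by 100 nm; at the last step the
# surfaces end up 10 nm closer than the base motion alone would give.
base = [1000.0, 900.0, 800.0, 700.0]   # z3 in nm
dist = [1000.0, 900.0, 800.0, 690.0]   # D in nm
print(force_curve(dist, base, spring_constant=100.0))  # forces in nN
```

With this sign convention an attractive force, which pulls the surfaces closer than the base motion alone would bring them, appears as a negative value.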

Most interferometric SFAs allow one to measure the distance, D, as a function of a selected lateral dimension, 
x′, which can be used to obtain information about the entire contact geometry. Knowledge of the contact
geometry, together with the assumption that the underlying intermolecular forces are additive, allows the 
intermolecular potential W(D) to be deduced from the measured F(D) using the so-called Derjaguin 
approximation [4, 41 ]. For the idealized geometry of an undeformed sphere of radius R on a flat: 

\[ F(D) = 2\pi R\, W(D) \qquad \text{(B1.20.1)} \]

where R becomes an effective radius R′ = √(R₁R₂) for the case of two cylinders (SFA) with radii R₁ and R₂.


In accordance with equation (B1.20.1), one can plot the so-called surface force parameter, P = F(D)/2πR,
versus D. This allows comparison of different direct force measurements in terms of intermolecular potentials
W(D), i.e. independent of a particular contact geometry. Figure B1.20.2 shows an example of the attractive
van der Waals force measured between two curved mica surfaces of radius R ≈ 10 mm.

Figure B1.20.2. Attractive van der Waals potential between two curved mica surfaces measured with the
SFA. (Reproduced with permission from [4], figure 11.6.)
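
As a worked illustration of the Derjaguin approximation, the sketch below converts between a measured force and the interaction energy per unit area, W(D) = F(D)/2πR, using the geometric-mean effective radius for crossed cylinders. To generate a test force it uses the standard textbook expression for the non-retarded van der Waals energy between flat surfaces, W = −A/(12πD²); the Hamaker constant A chosen here is a hypothetical round number, and the function names are illustrative.

```python
import math


def effective_radius(r1, r2):
    """Effective radius of two crossed cylinders with radii r1 and r2."""
    return math.sqrt(r1 * r2)


def energy_per_unit_area(force, radius):
    """Derjaguin approximation: W(D) = F(D) / (2 * pi * R)."""
    return force / (2.0 * math.pi * radius)


def vdw_energy_flat(hamaker, distance):
    """Non-retarded van der Waals energy per unit area between two flat surfaces."""
    return -hamaker / (12.0 * math.pi * distance ** 2)


radius = effective_radius(10e-3, 10e-3)   # two cylinders of radius 10 mm
hamaker = 1e-19                           # J, assumed illustrative value
for d_nm in (2, 5, 10):
    d = d_nm * 1e-9
    force = 2.0 * math.pi * radius * vdw_energy_flat(hamaker, d)
    print(f"D = {d_nm:2d} nm: F = {force:.3e} N, F/2piR = {energy_per_unit_area(force, radius):.3e} J m^-2")
```

Dividing the force by 2πR removes the dependence on the particular radii, which is why results from different instruments can be compared on a single plot.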


An alternative way of measuring F(D) is to control forces using magnetic fields instead of controlling
distances [42, 43]. An electronically controlled current through one or more coils creates a magnetic gradient,
which exerts a force on the force-measuring spring. The surfaces are allowed to equilibrate at a distance, D,
which is then measured. This alternative set-up allows one to measure D(F) instead of F(D), allowing for a
true equilibrium force measurement. 

B1.20.2.2 SAMPLE PREPARATION

Well defined contact geometry and absolute cleanliness are crucial factors for a successful SFA experiment. 
Therefore, two curved sheets of mica are brought into contact in crossed-cylinder geometry. 

The sample preparation of these mica sheets is a delicate process that requires some experience and often 
takes 1-2 days prior to an SFA experiment. Through successive cleaving, one has to prepare 1-5 μm thick
and uniform sheets of mica. Mica is a natural material that is available in different qualities [44]. 

Each newly cleaved mica surface is very clean. However, it is known that mica has a strong tendency to 
spontaneously adsorb particles [45] or organic contaminants [46], which may affect subsequent 
measurements. The mica sheets are cut into 10 mm × 10 mm sized samples using a hot platinum wire, then
laid down onto a thick and clean 100 mm × 100 mm mica backing sheet for protection. On the backing sheet,
the mica samples can be transferred into a vacuum chamber for thermal evaporation of typically 50-55 nm 
thick silver mirrors. 


It was the idea of Winterton [2] to glue the otherwise fragile mica sheets onto polished silica discs to give 
them better mechanical stability, especially for friction experiments. The glue layer determines the final
surface compliance of the silica/glue/mica stack which is typically around 4x 10 Pa. The use of mica
samples from the same original sheet guarantees that the interferometer will be perfectly symmetrical (see
section B1.20.2.3).

The silica discs that now hold the back-silvered mica samples are finally mounted into the SFA so that the 
cylinder axes are crossed and the clean mica surfaces are facing each other. 

B1.20.2.3 MULTIPLE BEAM INTERFEROMETRY

The absolute measurement of the distance, D, between the surfaces is central to the SFA technique. In 
interferometric SFAs, it is realized through an optical method called multiple beam interferometry (MBI), 
which has been described by Tolansky [47]. 

A 50 nm film of metal (silver) is deposited onto the atomically smooth mica sheets. White light with a 
coherence length of some 10 μm is directed normally through the mica sheets to illuminate the contact zone.
The mica-silver interfaces have a reflectivity of typically 97% and form an optical resonator. Constructive
interference occurs for light whose half-wavelength equals the optical distance between the mirrors. This
resonance is called the interference fringe of chromatic order N = 1. The larger the optical distance, the more
this resonance shifts towards the red end of the spectrum. In analogy to an organ pipe, one also observes harmonic
fringes at higher frequencies which are identified with integer numbers N = 2, 3, .... The emerging light from
the silver/mica/mica/silver interferometer is focused onto the slit of an imaging spectrograph for further 
analysis. The light selected to enter the spectrograph entrance slit corresponds to a one-dimensional cut 
through the illuminated contact zone. The distance information along this (linear) dimension is maintained 
and the exit plane of the imaging spectrograph displays the interference pattern as a function of a position x f . 
Typically, one uses fringes in the visible spectrum (15 < N < 35) for distance measurements in the SFA, but 
other near- visible wavelengths can equally be employed. Generally, the distance resolution of MBI decreases 
linearly with increasing N and longer wavelengths used. 
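
As a rough numerical illustration of the resonance picture above, the following sketch treats the interferometer as an ideal two-mirror resonator in which a fringe of chromatic order N satisfies N × (wavelength / 2) = optical distance. It deliberately ignores the phase change at the mica/silver interface and the dispersion of mica, both of which matter in a real evaluation. The function name, the mica index of 1.58 and the total mica thickness of 4000 nm are illustrative assumptions, not values taken from the text.

```python
def fringe_wavelengths(optical_distance_nm, lambda_min_nm=400.0, lambda_max_nm=700.0):
    """Return {N: wavelength_nm} for ideal-resonator fringes inside a spectral window.

    Idealized standing-wave condition: N * (wavelength / 2) = optical distance.
    Phase changes on reflection and dispersion are ignored (assumption).
    """
    fringes = {}
    lowest_order = max(int(2.0 * optical_distance_nm / lambda_max_nm), 1)
    highest_order = int(2.0 * optical_distance_nm / lambda_min_nm) + 1
    for order in range(lowest_order, highest_order + 1):
        wavelength = 2.0 * optical_distance_nm / order
        if lambda_min_nm <= wavelength <= lambda_max_nm:
            fringes[order] = wavelength
    return fringes


if __name__ == "__main__":
    # Hypothetical example: two mica sheets in contact, 4000 nm in total, index 1.58.
    optical_distance = 1.58 * 4000.0
    for order, wavelength in sorted(fringe_wavelengths(optical_distance).items()):
        print(f"N = {order:2d}  wavelength = {wavelength:6.1f} nm")
```

With these assumed numbers the visible window contains a little over a dozen fringes, and increasing the optical distance moves every fringe of fixed order towards the red, which is the shift that is tracked in a distance measurement.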


Because the mica surfaces are curved, the optical distance between the mirrors is a function of the lateral 
dimension, x', i.e. T = T(x'). Therefore, the wavelength of a given fringe itself becomes a function of x'. Since 
the chromatic order, N, is invariant within a given fringe, these fringes are commonly called fringes of equal 
chromatic order (FECO) (see figure B1.20.3). 







Figure B1.20.3. FECO allow optical distances in the SFA to be measured at subnanometre resolution. The 
FECOs depicted here belong to a symmetric, three-layer interferometer. The elastically deformed contact 
region appears as flattened fringes in the upper graph (vertical part). Once the surfaces are separated to a 
distance D in a medium of index n, the wavelength shifts are measured to calculate D and n. Since mica is 
birefringent, one can observe β and γ components for each fringe. To simplify data evaluation, birefringence 
can be suppressed using polarizing filters after the interferometer. 

The problem of calculating surface separations from a given FECO pattern is in general far more complex 
than it may seem at first sight. The main reason is that the refractive index differs between the 
layers of the interferometer. For the simplest case of two mica surfaces in contact, surrounded by a 
medium, one has to solve for a one-layer interferometer inside the contact area and for a three-layer 
interferometer outside the contact area, where light travels partially through the medium. The 
analytical treatment of a three-layer interferometer is considerably simplified if the two mica sheets have 
exactly the same thickness, i.e. when the interferometer is symmetric. Based on the work of Hunter et al [48], 
Israelachvili [49] derived the first useful analytical expressions and methodology for the symmetrical 
three-layer interferometer in the SFA. Clarkson [50] extended MBI with a numerical analysis of asymmetric 
three-layer interferometers by applying the multilayer matrix method [51]. Later, Horn et al [52] derived 
useful analytical expressions for the asymmetric interferometer. 


Nonetheless, the symmetric interferometer remains very useful, because the wavelengths of fringes with 
even chromatic order, N, depend strongly on the refractive index, n, of the central layer, whereas fringes with 
odd chromatic order are almost insensitive to n. This fortunate combination allows one to measure the thickness 
as well as the refractive index of a layer between the mica surfaces independently and simultaneously [49]. 
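
The way odd and even fringes separate thickness from refractive index can be sketched with the thin-film (small D) approximations that are commonly quoted for the symmetric three-layer interferometer: the shift of an odd-order fringe gives D ≈ N F Δλ / (2 μ_mica), while the shift of an even-order fringe gives n² ≈ N F Δλ μ_mica / (2 D). These expressions, the correction factor F (set to 1 here) and all function names are assumptions made for illustration; the exact forms and their range of validity should be checked against [49] before use.

```python
import math


def thickness_from_odd_fringe(order, shift_nm, mica_index, correction=1.0):
    """Thin-film estimate of the gap thickness D (nm) from an odd-order fringe shift.

    Assumed approximation: D = N * F * shift / (2 * mu_mica); independent of n.
    """
    return order * correction * shift_nm / (2.0 * mica_index)


def index_from_even_fringe(order, shift_nm, thickness_nm, mica_index, correction=1.0):
    """Thin-film estimate of the medium index n from an even-order fringe shift.

    Assumed approximation: n**2 = N * F * shift * mu_mica / (2 * D).
    """
    return math.sqrt(order * correction * shift_nm * mica_index / (2.0 * thickness_nm))


if __name__ == "__main__":
    mica_index = 1.58                      # illustrative value
    # Hypothetical measured shifts (nm) of neighbouring odd and even fringes.
    odd_shift, even_shift = 0.632, 0.4306
    gap = thickness_from_odd_fringe(25, odd_shift, mica_index)
    index = index_from_even_fringe(26, even_shift, gap, mica_index)
    print(f"D = {gap:.2f} nm, n = {index:.3f}")
```

The odd fringe is evaluated first because, in this approximation, its shift does not involve n; the even fringe then yields n once D is known.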


To simplify FECO evaluation, it is common practice to experimentally filter out one of the two birefringence 
components by the use of a linear polarizer after the interferometer. Mica birefringence can, however, be useful 
to study thin films of birefringent molecules [49] between the surfaces. Rabinowitz [53] has presented an 
eigenvalue analysis of birefringence in the multiple beam interferometer. 

Partial reflections at the inner optical interfaces of the interferometer lead to so-called secondary and tertiary 
fringe patterns, as can be seen from figure B1.20.4. These additional FECO patterns become clearly visible if 
the reflectivity of the silver mirrors is reduced. Methods for analysis of such secondary and tertiary FECO 
patterns were developed to extract information about the topography of non-uniform substrates [54]. 

Figure B1.20.4. Cross-sectional sideview of a symmetric, three-layer interferometer illustrating the origin of 
primary, secondary, tertiary and gap FECOs. (Reproduced with permission from [54].) 

In the symmetric, three-layer interferometer, only even-order fringes are sensitive to refractive index and it is 
possible to obtain spectral information on the confined film by comparing the intensities of odd- and 
even-order fringes. The absorption spectrum of thin dye layers between mica was investigated by Müller 
and Mächtle [55, 56] using this method. 

Instead of an absorbing dye layer between the mica, Levins et al [57] used thin metallic films and developed a 
method for FECO analysis using an extended spectral range. 


The preparation of the reflecting silver layers for MBI deserves special attention, since it affects the optical 
properties of the mirrors. Another important issue is the optical phase change [58] at the mica/silver interface, 
which is responsible for a wavelength-dependent shift of all FECOs. The phase change is a function of silver 
layer thickness, T, especially for T < 40 nm [54]. The roughness of the silver layers can also have an effect on 
the resolution of the distance measurement [59, 60]. 


Another interesting extension of the FECO technique, using a capillary droplet of mercury as the second 
mirror, was developed by Horn et al [61]. The light from this special interferometer is analysed in reflection, 
which yields an inverted FECO pattern. 

Every property of an interface that can be optically probed can, in principle, be measured with the SFA. This 
may include information obtainable from absorption spectroscopy [55], fluorescence, dichroism, 
birefringence, or nonlinear optics [43], some of which have already been realized. 

B1.20.2.4 COMMON DESIGNS AND ATTACHMENTS 

Israelachvili and Adams [62] designed one of the first SFAs, known as the Mk I, in the mid-1970s. Later, 
Israelachvili developed the Mk II [63, 64], which is based on the Mk I but has improved mechanics, in 
particular a double cantilever spring to avoid surface tilt upon deflection (force), as well as a number of new 
attachments for a variety of different experimental set-ups. The Mk I and Mk II designs served as the basis for 
further versions. More recent and improved versions include the Mk III developed by Israelachvili [65], as 
well as the circular steel Mk IV developed by Parker et al [66] and the circular glass SFA designed by Klein 
[67]. Some of the most common designs are schematically reproduced in figure B1.20.5. 

Figure B1.20.5. A selection of common SFA designs: SFA Mk I [62], SFA Mk II [64], SFA Mk III [65], SFA 
Mk IV [66], capacitive SFA [11], fibre-optic SFA [43]. Figures reproduced with permission from indicated 
references. 


A considerable number of experimental extensions have been developed in recent years. Luckham et al [5] 
and Dan [68] review examples of dynamic measurements in the SFA. Studying the visco-elastic response of 
surfactant films [69] or adsorbed polymers [7, 9] promises to yield new insights into molecular mechanisms of 
frictional energy loss in boundary-lubricated systems [28, 70]. 


The measurement of lateral forces (friction and shear) in the SFA has recently been reviewed by Kumacheva 
[32]. To measure friction and shear response, one has to laterally drive one surface and simultaneously 
measure the response of a lateral spring mount. A variety of versions have been devised. Lateral drives are 
often based on piezoelectric or bimorph deflection [13, 71] or DC motor drives, whereas the response can be 
measured via strain gauges, bimorphs, capacitive or optical detection. 

Another promising extension uses x-rays to probe the structure of confined molecules [72]. 

In summary, the SFA is a versatile instrument that represents a unique platform for many present and future 
implementations. Unlike any other experimental technique, the SFA yields quantitative insight into molecular 
dimensions, structures and dynamics under confinement. 


B1.20.3 APPLICATIONS 

This section deals with some selected examples of typical SFA results, collected from various research areas. 
It is not meant to be a comprehensive review, but rather a brief glance at the kind of questions that can be 
addressed with the SFA. 

The earliest SFA experiments consisted of bringing the two mica sheets into contact in a controlled 
atmosphere (figure B1.20.6) or (confined) liquid medium [14, 27, 73, 74 and 75]. Later, a variety of surfactant 
layers [76, 77], polymer surfaces [5, 9, 10, 13, 68, 78], polyelectrolytes [79], novel materials [80] or 
biologically relevant molecular layers [15, 19, 81, 85 and 86] or model membranes [84, 87, 88] were prepared 
on the mica substrate. More recently, the SFA technique has also been extended to thick layers of other 
materials, such as silica [73, 89], polymer [10], as well as metals [59] and metal oxides [90, 91]. 

Figure B1.20.6. Short-range adhesion of a mica-mica contact as a function of the relative crystallographic 
orientation of the mica surfaces, measured in a dry nitrogen atmosphere. With permission from [94]. 

B1.20.3.1 MEASURING SHORT-RANGE SOLVATION AND HYDRATION FORCES 


The measurement of surface forces out-of-plane (normal to the surfaces) represents a central field of use of 
the SFA technique. Besides the ubiquitous van der Waals dispersion interaction between two (mica) surfaces 
in dry air (figure B1.20.2 and figure B1.20.6), there is a wealth of other surface forces arising when the 
surfaces are brought into contact in a liquid medium. Many of these forces result from the specific properties 
of the liquid medium and originate from a characteristic ordering or reorientation of atoms or molecules, 
processes that are often entropy driven. 

The well defined contact geometry and the ionic structure of the mica surface favour observation of 
structural and solvation forces. Besides a monotonic entropic repulsion one may observe superimposed 
periodic force modulations. It is commonly believed that these modulations are due to a metastable layering at 
surface separations below some 3-10 molecular diameters. These diffuse layers are very difficult to observe 
with other techniques [92]. The periodicity of these oscillatory forces is regularly found to correspond to the 
characteristic molecular diameter. Figure B1.20.7 shows a typical measurement of solvation forces in the case 
of ethanol between mica. 

Figure B1.20.7. The solvation force of ethanol between mica surfaces. The inset shows the full scale of the 
experimental data. With permission from [75]. 
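
The statement that the oscillation period matches the molecular diameter can be made concrete with a simple empirical model. A frequently used illustrative form is a decaying cosine, amplitude × cos(2πD/σ) × exp(−D/σ), where σ is the molecular diameter. This functional form, the unit amplitude and the diameter of 0.5 nm used below are assumptions chosen for illustration only; they are not fitted to the data of figure B1.20.7, and the sign convention for attraction and repulsion depends on the system considered.

```python
import math


def oscillatory_force(distance_nm, diameter_nm, amplitude=1.0):
    """Illustrative decaying-cosine model of an oscillatory solvation force.

    Returns amplitude * cos(2*pi*D/sigma) * exp(-D/sigma); units follow `amplitude`.
    """
    reduced = distance_nm / diameter_nm
    return amplitude * math.cos(2.0 * math.pi * reduced) * math.exp(-reduced)


if __name__ == "__main__":
    sigma = 0.5  # hypothetical molecular diameter in nm
    for layers in range(1, 6):
        separation = layers * sigma
        print(f"D = {separation:.1f} nm  force = {oscillatory_force(separation, sigma):+.4f}")
```

In this model successive extrema are separated by one molecular diameter and shrink by a factor of e per layer, so that the modulation becomes negligible beyond a few molecular diameters.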

In the case of water, these forces are called hydration forces [93, 94]. The behaviour of water close to surfaces 
has attracted considerable attention due to its importance in the understanding of colloidal and biological 
interactions. Water seems to be a molecule of remarkable intermolecular interactions [95], mainly due to its 
capability to form hydrogen bonds. A number of aspects of water have been vigorously debated using results 
obtained with the SFA technique. These include the apparent viscosity in ultra-thin confined films [73] or the 
structure of water near surfaces, which is believed to give rise to hydrophilic repulsion or hydrophobic 
attraction [96, 92], or, indeed, the very origin of hydration forces and oscillatory forces [91]. 

B1.20.3.2 MEASURING LONG-RANGE DLVO-TYPE INTERACTIONS 


In a given aqueous electrolyte there is a certain population of hydrated ions and counterions. In addition, 
many surfaces, including mica, exhibit a net charge in aqueous solution. Counterions are known to form a 
diffuse screening layer near a charged molecule, particle or surface with roughly exponentially decreasing 
concentration into the solution. The characteristic decay length of this double layer is called the Debye length 
and decreases with increasing ionic strength. This double layer gives rise to an entropically driven, 
exponentially decreasing surface force, the so-called double-layer repulsion. Before the discovery of 
short-range forces (see above), it was commonly accepted that surface forces in liquid media could always be 
decomposed into an attractive dispersion component (van der Waals) and a repulsive double-layer force. 
These two forces are combined in the well known Derjaguin-Landau, Verwey-Overbeek (DLVO) theory and 
a number of systems have been measured in the SFA to confirm the predictions of this theory. Figure B1.20.8 
shows results obtained in aqueous solutions of NaCl on silica surfaces. The ranges of the observed long-range 
repulsion forces are in good agreement with the DLVO theory. The inset nicely demonstrates the effect of the 
monotonic short-range forces described in section B1.20.3.1. As a contrast to the results obtained on silica 
surfaces, figure B1.20.9 schematically displays the measured DLVO-type forces between mica surfaces in 
aqueous electrolytes. The main differences are that the monotonic hydration (solvation) force is dependent on 
the salt concentration and that there are oscillatory forces superimposed. On the mica surface, the monotonic 
hydration force hence seems to be mainly the result of the presence of hydrated ions in the double layer, 
whereas the silica surface is strongly hydrophilic and hence 'intrinsically' hydrated [94]. 
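
The dependence of the Debye length on ionic strength, and the way the two DLVO contributions add, can be sketched numerically. The code below uses the standard expression for the Debye length of a 1:1 electrolyte, the weak-overlap approximation for the double-layer interaction between surfaces at constant potential, the non-retarded van der Waals term, and the Derjaguin approximation F/R = 2πW. The function names, the surface potential of 100 mV and the Hamaker constant of 2.2 × 10⁻²⁰ J are illustrative assumptions and are not taken from the measurements shown in the figures.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m


def debye_length_m(conc_molar, temperature_k=298.15, eps_r=78.4):
    """Debye length (m) of a 1:1 electrolyte of the given molar concentration."""
    ion_density = conc_molar * 1000.0 * N_A   # ions of each sign per m^3
    kappa_sq = 2.0 * ion_density * E_CHARGE ** 2 / (eps_r * EPS_0 * K_B * temperature_k)
    return 1.0 / math.sqrt(kappa_sq)


def dlvo_force_over_radius(distance_m, conc_molar, surface_potential_v, hamaker_j,
                           temperature_k=298.15, eps_r=78.4):
    """F/R (N/m): weak-overlap double-layer repulsion plus van der Waals attraction."""
    kappa = 1.0 / debye_length_m(conc_molar, temperature_k, eps_r)
    thermal = K_B * temperature_k
    gamma = math.tanh(E_CHARGE * surface_potential_v / (4.0 * thermal))
    ion_density = conc_molar * 1000.0 * N_A
    # Interaction energy per unit area between flat plates, then F/R = 2*pi*W.
    energy_dl = 64.0 * thermal * ion_density * gamma ** 2 / kappa * math.exp(-kappa * distance_m)
    energy_vdw = -hamaker_j / (12.0 * math.pi * distance_m ** 2)
    return 2.0 * math.pi * (energy_dl + energy_vdw)


if __name__ == "__main__":
    for conc in (1e-3, 1e-2, 1e-1):
        print(f"c = {conc:g} M  Debye length = {debye_length_m(conc) * 1e9:.2f} nm")
    print(dlvo_force_over_radius(20e-9, 1e-3, 0.1, 2.2e-20), "N/m at D = 20 nm")
```

In this simplified picture the repulsion dominates at separations comparable to the Debye length, whereas the van der Waals term takes over at very small separations; the short-range hydration and oscillatory forces discussed in the text are not included.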







Figure B1.20.8. DLVO-type forces measured between two silica glass surfaces in aqueous solutions of NaCl 
at various concentrations. The inset shows the same data in the short-range regime up to D = 10 nm. The 
repulsive deviation at short range (<2 nm) is due to a monotonic solvation force, which seems not to depend 
on the salt concentration. Oscillatory surface forces are not observed. With permission from [73]. 

Figure B1.20.9. Schematic representation of DLVO-type forces measured between two mica surfaces in 
aqueous solutions of KNO3 or KCl at various concentrations. The inset reveals the existence of oscillatory 
and monotonic structural forces, of which the latter clearly depend on the salt concentration. Reproduced with 
permission from [94]. 

B1.20.3.3 MEASURING BIOLOGICAL INTERACTIONS 

Interactions between macromolecules (proteins, lipids, DNA, ...) or biological structures (e.g. membranes) 
are considerably more complex than the interactions described in the two preceding sections. The sum of 
all biological interactions at the molecular level is the basis of the complex mechanisms of life. In addition to 
computer simulations, direct force measurements [98], especially with the surface forces apparatus, represent an 
invaluable tool to help understand the molecular interactions in biological systems. 

Proteins can be physisorbed or covalently attached to mica. Another method is to immobilise and orient them 
by specific binding to receptor-functionalized planar lipid bilayers supported on the mica sheets [15]. These 
surfaces are then brought into contact in an aqueous electrolyte solution, while the pH and the ionic strength 
are varied. Corresponding variations in the force-versus-distance curve allow conclusions about protein 
conformation and interaction to be drawn [99]. The local electrostatic potential of protein-covered surfaces 
can hence be determined with an accuracy of ±5 mV. 

A typical force curve showing the specific avidin-biotin interaction is depicted in figure B1.20.10. The SFA 
revealed the strong influence of hydration forces and membrane undulation forces on the specific binding of 
proteins to membrane-bound receptors [81]. 

Figure B1.20.10. Typical force curve for a streptavidin surface interacting with a biotin surface in an aqueous 
electrolyte of controlled pH. This result demonstrates the power of specific protein interactions. Reproduced 
with permission from [81]. 

Direct measurements of the interaction potential between a tethered ligand (biotin) and receptor (streptavidin) 
have been reported by Wong et al [16] and demonstrate the possibility of controlling the range and dynamics of 
specific biological interactions via a flexible PEG tether. 

The adhesion and fusion mechanisms between bilayers have also been studied with the SFA [88, 100]. Kuhl et 
al [17] found that solutions of short-chained polymers (PEG) could produce a short-range depletion attraction 
between lipid bilayers, which clearly depends on the polymer concentration (figure B1.20.11). This depletion 
attraction was found to induce membrane fusion within 10 minutes, which was observed in real time using 
FECO fringes. There has been considerable progress in the preparation of fluid membranes to mimic natural 
conditions in the SFA [87], which promises even more exciting discoveries in biologically relevant areas. 

Figure B1.20.11. Force curves of DMPC/DPPE (dimyristoyl phosphatidylcholine and dipalmitoyl 
phosphatidylethanolamine) bilayers across a solution of PEG at different concentrations. Clearly visible is a 
concentration-dependent depletion attraction. With permission from [17]. 

B1.20.3.4 MEASUREMENTS IN MOLECULAR TRIBOLOGY 

Using friction attachments (see section B1.20.2.4), many remarkable discoveries related to thin-film and 
boundary lubrication have been made with the SFA. The dynamic aspect of confined molecules at a sliding 
interface has been extensively investigated and the SFA laid the foundation for molecular tribology long 
before the AFM technique was available. 

The often-cited Amontons' law [101, 102] describes friction in terms of a friction coefficient, which is, a 
priori, a material constant, independent of contact area or dynamic parameters, such as sliding velocity, 
temperature or load. We know today that all of these parameters can have a significant influence on the 
magnitude of the measured friction force, especially in thin-film and boundary-lubricated systems. 

Using the SFA technique, it could be demonstrated that there is an intimate relationship between adhesion 
hysteresis and friction [28, 29, 68, 77]. Both processes dissipate energy through non-equilibrium mechanisms 
at the interface [30]. Friction can be represented as a sum of two terms, one adhesion-related and the other 
load-related. It was recently shown with the SFA that, in the absence of adhesion, the load-related portion 
depends linearly on the load and not necessarily on the real contact area, as commonly believed [25]. Figure 
B1.20.12 nicely illustrates this finding. 

Figure B1.20.12. Measured friction force, F, and real contact area, A, against externally applied load, L, for 
two molecularly smooth mica surfaces sliding in 0.5 M KCl solution, i.e. in the absence of adhesion. With 
permission from [25]. 
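
The two-term description of friction can be written as F = S A + μ L, where the first term scales with the real contact area A and the second with the applied load L. The sketch below evaluates this sum; the symbol S for an interfacial shear strength, the function name and all numerical values are illustrative assumptions rather than results from [25].

```python
def friction_force(load_n, contact_area_m2, friction_coefficient, shear_strength_pa=0.0):
    """Two-term friction model: F = S * A + mu * L (all quantities in SI units).

    With shear_strength_pa = 0 (no adhesion) the model reduces to Amontons' law.
    """
    adhesion_term = shear_strength_pa * contact_area_m2
    load_term = friction_coefficient * load_n
    return adhesion_term + load_term


if __name__ == "__main__":
    # Hypothetical non-adhesive contact: friction is proportional to load only.
    for load in (0.01, 0.02, 0.04):
        print(f"L = {load:.2f} N  F = {friction_force(load, 1e-9, 0.3):.4f} N")
```

In the non-adhesive case the contact area drops out of the model entirely, which corresponds to the observation that the load-related friction follows the load rather than the real contact area.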

A traditional subject of discussion is the phenomenon of intermittent friction, also called stick-slip friction 
[35]. It is thought that interfacial molecules can switch between different dynamic states, namely between a 
solid-like and a liquid-like state [103]. It was also found that liquids become oriented [7] and solid-like when 
confined in a narrow gap. A diffuse layering of the trapped molecules may occur (see also section 
B1.20.3.1). The ordering mechanisms are particularly susceptible to the shape of the molecules and can spur 
substantial history and time effects, as illustrated in figure B1.20.13. Molecules in such ordered arrangements 
no longer behave as liquids and exhibit, for example, a finite yield stress [8]. When sliding above a critical 
speed, however, intermittent friction often disappears and the interface remains liquid-like at all times. This 
dynamic phenomenon, which occurs in the absence of hydrodynamic lubrication, is due to a time effect at the 
molecular level. Furthermore, on molecular layers of surfactants it was observed that, at even higher speeds, 
the system can enter a superkinetic regime with vanishingly small friction forces [70], as depicted in figure 
B1.20.14. In this high-speed regime, mechanisms of molecular entanglement are too slow to dissipate energy. 
New findings point to a more complex behaviour of such systems in terms of multiple relaxation 
mechanisms at the molecular interface [77]. It has also been shown experimentally [28] and by molecular 
dynamics simulation [104] that there is a potential to control interfacial dynamics with subnanometre 
out-of-plane excitations to achieve ultralow friction at arbitrarily slow sliding velocities. 

Figure B1.20.13. Temporal development of intermittent friction following commencement of sliding. The 
shape of the molecules has a great influence on history and time effects in the system. Reproduced with 
permission from [34]. 

Figure B1.20.14. Dynamic friction at different velocities of DHDAA-coated mica (DHDAA = 
dihexadecyldimethylammonium acetate). Reproduced with permission from [70]. 

Polymer-bearing surfaces have also attracted considerable attention in view of their particular friction 
properties and underlying mechanisms. Klein et al [9] have demonstrated the possibility of achieving 
ultra-low friction on surfaces covered with polymer brushes. Figure B1.20.15 displays the strong lubricating 
effect, which is due to a very fluid layer at the interface between polymer and solvent, a layer that remains fluid 
even under high compression. Accompanying molecular dynamics simulations [105] suggest that a dynamic 
disentanglement of the polymer chains is responsible for the observed reduction in friction. 



Figure B1.20.15. Shear force as a function of time for (a) bare mica in toluene and (b) polystyrene-covered 
mica in toluene. Reproduced with permission from [9]. 


REFERENCES 


[1] Bowden F P and Tabor D 1964 Friction and Lubrication of Solids Part II (Oxford: Oxford University Press) 

[2] Tabor D and Winterton R H S 1969 The direct measurement of normal and retarded van der Waals forces Proc. R. Soc. 
London A 312 435-50 

[3] Israelachvili J N and Tabor D 1972 The measurement of van der Waals dispersion forces in the range 1.5 to 130 nm 
Proc. R. Soc. London A 331 19-38 

[4] Israelachvili J N 1991 Intermolecular and Surface Forces 2nd edn (London: Academic) 

[5] Luckham P F and Manimaaran S 1997 Investigating adsorbed polymer layer behaviour using dynamic surface forces 
apparatuses — a review Adv. Colloid Interface Sci. 73 1-46 

[6] Kelly T W et al 1998 Direct force measurements at polymer brush surfaces by atomic force microscopy Macromolecules 
31 4297-300 




[7] Dhinojwala A and Granick S 1997 Surface forces in the tapping mode: solvent permeability and hydrodynamic 
thickness of adsorbed polymer brushes Macromolecules 30 1079-85 

[8] Luengo G, Israelachvili J N and Granick S 1996 Generalized effects in confined fluids: new friction map for boundary 
lubrication Wear 200 328-35 

[9] Klein J et al 1994 Reduction of frictional forces between solid surfaces bearing polymer brushes Nature 370 634-7 

[10] Mangipudi V S et al 1996 Measurement of interfacial adhesion between glassy polymers using the JKR method 
Macromol. Symp. 102 131-43 

[11] Tonck A, Georges J M and Loubet J L 1988 Measurements of intermolecular forces and the rheology of dodecane 
between alumina surfaces J. Colloid Interface Sci. 126 150-5 

[12] Borwankar R P and Case S E 1997 Rheology of emulsions, foams and gels Curr. Opin. Colloid Interface Sci. 2 584-9 


[13] Luengo G et al 1997 Thin film rheology and tribology of confined polymer melts: contrasts with bulk properties 
Macromolecules 30 2482-94 

[14] Demirel A L and Granick S 1996 Relaxations in molecularly thin liquid films J. Phys.: Condens. Matter 8 9537-9 

[15] Leckband D 1995 The surface force apparatus — a tool for probing molecular protein interactions Nature 376 617-18 

[16] Wong J Y et al 1997 Direct measurement of a tethered ligand-receptor interaction potential Science 275 820-2 

[17] Kuhl T et al 1996 Direct measurement of polyethylene glycol induced depletion attraction between lipid bilayers 
Langmuir 12 3003-14 

[18] Pincet F et al 1994 Long-range attraction between nucleosides with short-range specificity: direct measurements Phys. 
Rev. Lett. 73 2780-3 

[19] Ionov R, De Coninck J and Angelova A 1996 On the origin of the long-range attraction between surface-confined DNA 
bases Thin Solid Films 284-285 347-51 

[20] Richetti P et al 1996 Measurement of the interactions between two ordering surfaces under symmetric and asymmetric 
boundary conditions Phys. Rev. E 54 1749-62 

[21] Idziak S H J et al 1996 Structure in a confined smectic liquid crystal with competing surface and sample elasticities 
Phys. Rev. Lett. 76 1477-80 

[22] Idziak S H J et al 1996 Structure under confinement in a smectic-A and lyotropic surfactant hexagonal phase Physica 
B 221 289-95 

[23] Giasson S, Israelachvili J N and Yoshizawa H 1997 Thin film morphology and tribology study of mayonnaise J. Food 
Sci. 62 640-4 

[24] Luengo G et al 1997 Thin film rheology and tribology of chocolate J. Food Sci. 62 767-72 

[25] Berman A, Drummond C and Israelachvili J N 1998 Amontons' law at the molecular level Tribol. Lett. 4 95-101 

[26] Bhushan B, Israelachvili J N and Landman U 1995 Nanotribology: friction, wear and lubrication at the atomic scale 
Nature 374 607-16 

[27] Granick S 1991 Motions and relaxations of confined liquids Science 253 1374-9 

[28] Heuberger M, Drummond C and Israelachvili J N 1998 Coupling of normal and transverse motions during frictional 
sliding J. Phys. Chem. B 102 5038-41 

[29] Israelachvili J N, Chen Y L and Yoshizawa H 1994 Relationship between adhesion and friction forces J. Adhes. Sci. 
Technol. 8 1231-49 

[30] Israelachvili J N and Berman A 1995 Irreversibility, energy dissipation, and time effects in intermolecular and surface 
interactions Israel J. Chem. 35 85-91 




[31] Krim J 1996 Friction at the atomic scale Sci. Am. 275 74-80 

[32] Kumacheva E 1998 Interfacial friction measurement in surface force apparatus Prog. Surf. Sci. 58 75-120 

[33] Reiter G, Demirel A L and Granick S 1994 From static to kinetic friction in confined liquid films Science 263 1741-4 

[34] Yoshizawa H and Israelachvili J N 1993 Fundamental mechanisms of interfacial friction. 2. Stick-slip friction of 
spherical and chain molecules J. Phys. Chem. 97 11300-13 

[35] Yoshizawa H, McGuiggan P and Israelachvili J N 1993 Identification of a second dynamic state during stick-slip motion 
Science 259 1305-8 

[36] Everson M P and Ohtani M 1998 New opportunities in automotive tribology Tribol. Lett. 5 1-12 

[37] Tabor D 1992 Fundamentals of Friction ed I L Singer and H M Pollock (London: Kluwer) 

[38] Luesse C et al 1988 Drive mechanism for a surface force apparatus Rev. Sci. Instrum. 59 811-2 

[39] Frantz P, Agrait N and Salmeron M 1996 Use of capacitance to measure surface forces. 1. Measuring distance of 
separation with enhanced spacial and time resolution Langmuir 12 3289-94 

[40] Frantz P et al 1997 Use of capacitance to measure surface forces. 2. Application to the study of contact mechanics 
Langmuir 13 5957-61 

[41] Derjaguin B V 1934 Kolloid Zeitschrift 69 155-64 

[42] Stewart A M and Christenson H K 1990 Use of magnetic forces to control distance in a surface force apparatus Meas. 
Sci. Technol. 1 1301-3 

[43] Frantz P et al 1997 Design of surface forces apparatus for tribology studies combined with nonlinear optical 
spectroscopy Rev. Sci. Instrum. 68 2499-2504 

[44] Ribbe P H (ed) 1984 Micas Reviews in Mineralogy vol 13 (Chelsea, MI: BookCrafters) 

[45] Ohnishi S et al 1999 Presence of particles on melt-cut mica sheets Langmuir 15 3312-6 

[46] Frantz P and Salmeron M 1998 Preparation of mica surfaces for enhanced resolution and cleanliness in the surface 
forces apparatus Tribol. Lett. 5 151-3 

[47] Tolansky S 1948 Multiple Beam Interferometry of Surfaces and Films (Oxford: Oxford University Press) 

[48] Hunter S C and Nabarro F R N 1952 The origin of Glauert's superposition fringes Phil. Mag. 43 538-46 

[49] Israelachvili J N 1973 Thin film studies using multiple beam interferometry J. Colloid Interface Sci. 44 259-71 

[50] Clarkson M T 1989 Multiple-beam interferometry with thin metal films and unsymmetrical systems J. Phys. D: Appl. 
Phys. 22 475-82 

[51] Born M and Wolf E 1980 Principles of Optics 6th edn (Oxford: Pergamon) 

[52] Horn R G and Smith D T 1991 Analytical solution for the three-layer multiple beam interferometer Appl. Opt. 30 59-65 

[53] Rabinowitz P 1995 Eigenvalue analysis of the surface forces apparatus interferometer J. Opt. Soc. Am. A 12 1593-601 

[54] Heuberger M, Luengo G and Israelachvili J N 1997 Topographic information from multiple beam interferometry in the 
surface forces apparatus Langmuir 13 3839-48 

[55] Müller C, Mächtle P and Helm C A 1994 Enhanced absorption within a cavity. A study of thin dye layers with the 
surface forces apparatus J. Phys. Chem. 98 11119-25 

[56] Mächtle P, Müller C and Helm C A 1994 A thin absorbing layer at the center of a Fabry-Perot interferometer J. 
Physique II 4 481-500 

[57] Levins J M and Vanderlick T K 1994 Extended spectral analysis of multiple beam interferometry: a technique to study 
metallic films in the surface forces apparatus Langmuir 10 2389-94 




[58] Farrell B, Bailey A I and Chapman D 1995 Experimental phase changes at the mica-silver interface illustrate the 
experimental accuracy of the central film thickness in a symmetrical three-layer interferometer Appl. Opt. 34 2914-20 

[59] Levins J M and Vanderlick T K 1993 Impact of roughness of reflective films on the application of multiple beam 
interferometry J. Colloid Interface Sci. 158 223-7 

[60] Levins J M and Vanderlick T K 1992 Reduction of the roughness of silver films by the controlled application of surface 
forces J. Phys. Chem. 96 10405-11 

[61] Horn R G et al 1996 The effect of surface and hydrodynamic forces on the shape of a fluid drop approaching a solid 
surface J. Phys.: Condens. Matter 8 9483-90 

[62] Israelachvili J N and Adams G E 1976 Direct measurement of long range forces between two mica surfaces in 
aqueous KNO3 solutions Nature 262 774 

[63] Israelachvili J N 1987 Direct measurements of forces between surfaces in liquids at the molecular level Proc. Nat. 
Acad. Sci. USA 84 4722-5 

[64] Israelachvili J N 1989 Techniques for direct measurement of forces between surfaces in liquids at the atomic scale 
Chemtracts — Anal. Phys. Chem. 11-12 

[65] Israelachvili J N and McGuiggan P M 1990 Adhesion and short-range forces between surfaces. I. New apparatus for 
surface force measurements J. Mater. Res. 5 2223-31 

[66] Parker J L, Christenson H K and Ninham B W 1989 Device for measuring the force and separation between two 
surfaces down to molecular separations Rev. Sci. Instrum. 60 

[67] Klein J 1983 J. Chem. Soc. Faraday Trans. I 79 99 

[68] Dan N 1996 Time-dependent effects in surface forces Current Opinion Colloid Interface Sci. 1 48-52 

[69] Kutzner H B, Luckham P F and Rennie J 1996 Measurement of the viscoelastic properties of thin surfactant films 
Faraday Discuss. 104 9-16 

[70] Yoshizawa H, Chen Y L and Israelachvili J N 1993 Recent advances in molecular level understanding of adhesion, 
friction and lubrication Wear 168 161-6 

[71] Peachey J, van Alsten J and Granick S 1991 Design of an apparatus to measure the shear response of ultrathin liquid 
films Rev. Sci. Instrum. 62 463-73 

[72] Idziak S H J et al 1994 The x-ray surface forces apparatus: structure of a thin smectic liquid crystal film under 
confinement Science 264 1915-8 

[73] Horn R, Smith D T and Haller W 1989 Surface forces and viscosity of water measured between silica sheets Chem. 
Phys. Lett. 162 404-8 

[74] Ruths M, Steinberg S and Israelachvili J N 1996 Effects of confinement and shear on the properties of thin films of 
thermotropic liquid crystal Langmuir 12 6637-50 

[75] Wanless E J and Christenson H K 1994 Interaction between surfaces in ethanol: adsorption, capillary condensation, 
and solvation forces J. Chem. Phys. 101 4260-7 

[76] Dedinaite A et al 1998 Interactions between modified mica surfaces in triglyceride media Langmuir 14 5546-54 

[77] Yamada S and Israelachvili J N 1998 Friction and adhesion hysteresis of fluorocarbon surfactant monolayer-coated 
surfaces measured with the surface forces apparatus J. Phys. Chem. B 102 234-44 

[78] Ruths M and Granick S 1998 Rate-dependent adhesion between opposed perfluoropoly (alkyl ether) layers: 
dependence on chain-end functionality and chain length J. Phys. Chem. B 102 6056-63 

[79] Lowack K and Helm C A 1998 Molecular mechanisms controlling the self-assembly process of polyelectrolyte 
multilayers Macromolecules 31 823-33 




[80] Luengo G et al 1997 Measurement of the adhesion and friction of smooth C-60 surfaces Chem. Mater. 9 1166-71 

[81] Leckband D et al 1994 Direct force measurements of specific and nonspecific protein interactions Biochemistry 33 
4611-23 

[82] Chowdhury P B and Luckham P F 1995 Interaction forces between kappa-casein adsorbed on mica Colloids Surfaces 
B 4 327-34 

[83] Holmberg M et al 1997 Surface force studies of Langmuir-Blodgett cellulose films J. Colloid Interface Sci. 186 369-81 

[84] Kuhl T L et al 1994 Modulation of interaction forces between bilayers exposing short-chained ethylene oxide 
headgroups Biophys. J. 66 1479-88 

[85] Nylander T and Wahlgren N M 1997 Forces between adsorbed layers of beta-casein Langmuir 13 6219-25 

[86] Yu Z W, Calvert T L and Leckband D 1998 Molecular forces between membranes displaying neutral 
glycosphingolipids: evidence for carbohydrate attraction Biochemistry 37 1540-50 

[87] Seitz M et al 1998 Formation of tethered supported bilayers via membrane-inserting reactive lipids Thin Solid Films 
327-9 767-71 

[88] Wolfe J et al 1991 The interaction and fusion of bilayers formed from unsaturated lipids Eur. Biophys. J. 19 275-81 

[89] Rutland M W and Parker J L 1994 Surface forces between silica surfaces in cationic surfactant solutions: adsorption 
and bilayer formation at normal and high pH Langmuir 10 1110-21 

[90] Xu Z H, Ducker W and Israelachvili J N 1996 Forces between crystalline alumina (sapphire) surfaces in aqueous 
sodium dodecyl sulfate surfactant solutions Langmuir 12 2263-70 

[91] Horn R G, Clarke D R and Clarkson M T 1988 Direct measurement of surface forces between sapphire crystals in 
aqueous solutions J. Mater. Res. 3 413-6 

[92] Cleveland J P, Schaffer T E and Hansma P K 1995 Probing oscillatory hydration potentials using thermal-mechanical 
noise in an atomic force microscope Phys. Rev. B 52 R8692-5 

[93] Israelachvili J N and Pashley R M 1983 Molecular layering of water at surfaces and origin of repulsive hydration forces 
Nature 306 249-50 

[94] Israelachvili J N, McGuiggan P and Horn R 1992 Basic physics of interactions between surfaces in dry, humid, and 
aqueous environments 1st Int. Symp. on Semiconductor Wafer Bonding: Science, Technology and Applications 
(Pennington, NJ: Electrochemical Society) 


[95] Stanley H E 1999 Unsolved mysteries of water in its liquid and glass states MRS Bull. May 22-30 

[96] Israelachvili J N 1996 Role of hydration and water structure in biological and colloidal interactions Nature 379 219-25 

[97] Muller H J 1998 Extraordinarily thick water films on hydrophilic solids: a result of hydrophobic repulsion? Langmuir 14 
6789-92 

[98] Pierres A, Benoliel A M and Bongrand P 1996 Measuring bonds between surface-associated molecules J. Immunol. 
Methods 196 105-20 

[99] Leckband D et al 1993 Measurements of conformational changes during adhesion of lipid and protein (polylysine and 
S-layer) surfaces Biotech. Bioeng. 42 167-77 

[100] Helm C A, Israelachvili J N and McGuiggan P M 1992 Role of hydrophobic forces in bilayer adhesion and fusion 
Biochemistry 31 1794-805 

[101] Amontons G 1699 De la résistance causée dans les machines Mémoires de l'Académie Royale A 275-82 

[102] Coulomb C A 1785 Théorie des machines simples Mémoire de Mathématique et de Physique de l'Académie Royale 
161-342 

[103] Gee M L et al 1990 Liquid to solidlike transitions of molecularly thin films under shear J. Chem. Phys. 93 1895-905 

[104] Gao J, Luedtke W and Landman U 1998 Friction control in thin-film lubrication J. Phys. Chem. B 102 5033-7 

[105] Grest G S 1996 Interfacial sliding of polymer brushes: a molecular dynamics simulation Phys. Rev. Lett. 76 4979-82 




FURTHER READING 

Derjaguin B V 1934 Research in Surface Forces (New York: Consultants Bureau) 

An old classic: four volumes. 

Israelachvili J N 1991 Intermolecular and Surface Forces (London: Academic) 

The most often cited reference about surface forces and SFA. 

Hutchings I M 1992 Friction and Wear of Engineering Materials (London: Arnold) 

A good introduction to tribology. 

Bhushan B 1999 Handbook of Micro/Nano Tribology (Boca Raton, FL: CRC) 

A valuable reference to anyone involved with friction at small scales. 



B1.21 Surface structural determination: diffraction methods 

Michel A Van Hove 


B1.21.1 INTRODUCTION 

Diffraction methods have provided the large majority of solved atomic-scale structures for both the bulk 
materials and their surfaces, mainly in the crystalline state. Crystallography by diffraction tends to filter out 
defects and focus on the periodic part of a structure. By adding contributions from very many unit cells, 
diffraction gives results that are, in effect, averaged over space and time. This is excellent for investigating 
stable states of solid matter as they occur in well crystallized samples; some forms of disorder can also be 
analysed reasonably well. Diffraction, however, is much less appropriate for examining inhomogeneous and 
time-dependent events such as transition states and pathways in chemical reactions. 

For bulk structural determination (see chapter B1.9), the main technique used has been x-ray diffraction 
(XRD). Several other techniques are also available for more specialized applications, including: electron 
diffraction (ED) for thin film structures and gas-phase molecules; neutron diffraction (ND) and nuclear 
magnetic resonance (NMR) for magnetic studies (see chapter B1.12 and chapter B1.13); x-ray absorption fine 
structure (XAFS) for local structures in small or unstable samples and other spectroscopies to examine local 
structures in molecules. Electron microscopy also plays an important role, primarily through imaging (see 
chapter B1.17). 

At surfaces, the primary challenge is to obtain the desired surface sensitivity. Ideally, one wishes to gain 
structural information about those atomic layers which differ in their properties from the underlying bulk 
material. This means in practice extracting the structure of the first few monolayers, i.e. atoms within about 
5-10 Å (0.5-1 nm) of the vacuum above the surface. The above-mentioned bulk methods, if applied unchanged, 
do not easily provide sensitivity to this very thin slice of matter. The challenge becomes even greater when 
dealing with an interface between two materials, including solid/liquid and solid/gas interfaces. A number of 
mechanisms are available to obtain surface sensitivity on the required depth scale. We shall describe some of 
them in the next section, with emphasis on the solid/vacuum interface. 

However, it is necessary to first discuss the meaning of 'diffraction', because this concept can be interpreted 
in several ways. After these fundamental aspects are dealt with, we will take a statistical and historical view of 
the field. It will be seen that many different diffraction methods are available for surface structural 
determination. 

It will also be useful to introduce concepts of two-dimensional ordering and the corresponding nomenclature 
used to characterize specific structures. We can then describe how the surface diffraction pattern relates to the 
ordering and, thus, provides important two-dimensional structural information. 

We will, in the latter part of this discussion, focus only on those few methods that have been the most 
productive, with low-energy electron diffraction (LEED) receiving the most attention. Indeed, LEED has been 
the most successful surface structural method in two quite distinct ways. First, LEED has become an almost 
universal characterization technique for single-crystal surfaces: the diffraction pattern is easily imaged in 
real time and is very helpful in monitoring the state of the surface in terms of the ordering and, hence, also 
density, of adsorbed atoms and molecules. Second, LEED has been quite successful in determining the detailed 
atomic positions at a surface (e.g., interlayer distances, bond lengths and bond angles), especially for ordered 
structures. This relies primarily on simulating the intensity (current) of diffracted beams as a function of 
electron energy in order to fit assumed model structures to measured data. Because of multiple scattering, such 
simulation and fitting is a very different and much more difficult task than looking at a diffraction pattern. 
We will close with a description of the state of the art and an outlook on the future of the field. 


B1.21.2 FUNDAMENTALS OF SURFACE DIFFRACTION METHODS 

B1.21.2.1 DIFFRACTION 

(A) DIFFRACTION AND STRUCTURE 

Diffraction is based on wave interference, whether the wave is an electromagnetic wave (optical, x-ray, etc), 
or a quantum mechanical wave associated with a particle (electron, neutron, atom, etc), or any other kind of 
wave. To obtain information about atomic positions, one exploits the interference between different scattering 
trajectories among atoms in a solid or at a surface, since this interference is very sensitive to differences in 
path lengths and hence to relative atomic positions (see chapter B1.9). 

It is relatively straightforward to determine the size and shape of the three- or two-dimensional unit cell of a 
periodic bulk or surface structure, respectively. This information follows from the exit directions of diffracted 
beams relative to an incident beam, for a given crystal orientation: measuring those exit angles determines the 
unit cell quite easily. But no relative positions of atoms within the unit cell can be obtained in this manner. To 
achieve that, one must measure intensities of diffracted beams and then computationally analyse those 
intensities in terms of atomic positions. 
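
As a rough illustration of how exit angles fix the unit cell, the sketch below treats the simplest case: an electron beam at normal incidence on a square surface lattice. It assumes the standard non-relativistic relation between electron wavelength and energy (wavelength in Å equals the square root of 150.4 divided by the energy in eV) and the usual condition that the surface-parallel momentum changes by a reciprocal-lattice vector. The function names and the lattice constant of 2.55 Å are hypothetical choices made only for this example; none of them comes from the chapter.

```python
import math

# h^2 / (2 m_e) in eV * Angstrom^2, so wavelength[A] = sqrt(150.4 / E[eV])
H2_OVER_2M = 150.4


def electron_wavelength(energy_ev):
    """Non-relativistic de Broglie wavelength of an electron, in Angstrom."""
    return math.sqrt(H2_OVER_2M / energy_ev)


def exit_angle_deg(h, k, a, energy_ev):
    """Exit angle (from the surface normal) of beam (h, k).

    Assumes normal incidence on a square surface lattice of side `a`
    (Angstrom). Returns None when the beam cannot emerge (evanescent).
    """
    s = electron_wavelength(energy_ev) * math.hypot(h, k) / a
    if s > 1.0:
        return None
    return math.degrees(math.asin(s))


def lattice_constant(h, k, angle_deg, energy_ev):
    """Invert the relation above: unit-cell side from a measured exit angle."""
    wavelength = electron_wavelength(energy_ev)
    return wavelength * math.hypot(h, k) / math.sin(math.radians(angle_deg))


if __name__ == "__main__":
    angle = exit_angle_deg(1, 0, 2.55, 100.0)
    print("(1,0) beam exit angle in degrees:", angle)
    print("recovered lattice constant in Angstrom:",
          lattice_constant(1, 0, angle, 100.0))
```

Note that only the cell dimensions enter this calculation; the positions of atoms inside the cell do not affect the beam directions, which is why intensities are needed for a full structure determination.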

With XRD applied to bulk materials, a detailed structural analysis of atomic positions is rather straightforward 
and routine for structures that can be quite complex (see chapter B1.9): direct methods in many cases give 
good results in a single step, while the resulting atomic positions may be refined by iterative fitting procedures 
based on simulation of the diffraction process. 

With ED, by contrast, the task is more complicated due to multiple scattering of the electrons from atom to 
atom (see chapter B1.17). Such multiple scattering is especially strong at the relatively low energies employed 
to study surfaces. This dramatically restricts the application of direct methods and strongly increases the 
computational cost of simulating the diffraction process. As a result, an iterative trial-and-error fitting is the 
method of choice with ED, even though it can be a slow process when many trial structures have to be tested. 

Also, the result of any diffraction-based trial-and-error fitting is not necessarily unique: it is always possible 
that there exists another untried structure that would give a better fit to experiment. Hence, a multi-technique 
approach that provides independent clues to the structure is very fruitful and common in surface science: such 
clues include chemical composition, vibrational analysis and position restrictions implied by other structural 
methods. This can greatly restrict the number of trial structures which must be investigated. 


(B) NON-PERIODIC STRUCTURES 

Diffraction is not limited to periodic structures [1]. Non-periodic imperfections such as defects or vibrations, 
as well as sample-size or domain effects, are inevitable in practice but do not cause much difficulty or can be 
taken into account when studying the ordered part of a structure. Some other forms of disorder can also be 
handled quite well in their own right, such as lattice-gas disorder in which a given site in the unit cell is 
randomly occupied with less than 100% probability. At surfaces, lattice-gas disorder is very common when 
atoms or molecules are adsorbed on a substrate. The local adsorption structure in the given site can be studied 
in detail. 


(C) NON-PLANAR INITIAL WAVES 

More fundamental is the distinction between planar and spherical initial waves. In XRD, for instance, the 
incident x-rays are well described by plane waves; this is generally true of probes that are aimed at the sample 
from macroscopic distances, as is the case also in most forms of ED and ND. However, there are techniques 
in which a wave is generated locally within the sample, for instance through emission of an x-ray (by 
fluorescence) or an electron (by photoemission) from a sample atom. In such point-source emission, the wave 
which performs the useful diffraction initially has a spherical rather than planar character; it is centred on the 
nucleus of an atom, with a rapidly decaying amplitude as it travels away from the emitting site. (Depending 
on the excitation mechanism, this initial wave need not be spherically symmetrical, but may also have an 
angular variation, as given by spherical harmonics, for instance, or combinations thereof.) 

This spherical outgoing wave can diffract only from atoms that are near to the emitting atom, mainly those 
atoms within a distance of a few atomic diameters. In these circumstances, the crystallinity of the sample is of 
less importance: the diffracting wave sees primarily the local atomic-scale neighbourhood of the emitting 
atoms. As long as the same local neighbourhood predominates everywhere in the sampled part of the surface, 
information about the structure of that neighbourhood can be extracted. It also helps very much if the local 
neighbourhood has a constant orientation, so that the experiment does not average over a multitude of 
orientations, since these tend to average out diffraction effects and thus wash away structural information. 

(D) VARIETY OF DIFFRACTION METHODS 

From the above descriptions, it becomes apparent that one can include a wide variety of techniques under the 
label 'diffraction methods'. Table B1.21.1 lists many techniques used for surface structural determination, and 
specifies which can be considered diffraction methods due to their use of wave interference (table B1.21.1 
also explains many technique acronyms commonly used in surface science). The diffraction methods range 
from the classic case of XRD and the analogous case of LEED to much more subtle cases like XAFS (listed 
as both SEXAFS (surface extended XAFS) and NEXAFS (near-edge XAFS) in the table). 


Table B1.21.1. Surface structural determination methods. The second column indicates whether a technique 
can be considered a diffraction method, in the sense of relying on wave interference. Also shown are statistics 
of surface structural determinations, extracted from the Surface Structure Database [14], up to 1997. Counted 
here are only 'detailed' and complete structural determinations, in which typically the experiment is simulated 
computationally and atomic positions are fitted to experiment. (Some structural determinations are performed 
by combining two or more methods: those are counted more than once in this table, so that the columns add 
up to more than the actual 1113 structural determinations included in the database.) 

Surface structural determination method             Diffraction  Number of       Percentage of
                                                    method?      structural      structural
                                                                 determinations  determinations

LEED                                                yes          751             67.5
IS (including LEIS, MEIS and HEIS for low-,         no           102              9.2
  medium- and high-energy ion scattering)
PD (covers a variety of other acronyms, like        yes           88              7.9
  ARPEFS, ARXPD, ARXPS, ARUPS, NPD, OPD, PED)
SEXAFS                                              yes           67              6.0
XSW                                                 yes           52              4.7
XRD (also GIXS, GIXD)                               yes           40              3.6
TOF-SARS (time-of-flight scattering and             no            13              1.2
  recoiling spectrometry)
NEXAFS (also called XANES)                          yes           11              1.0
RHEED                                               yes           10              0.9
LEPD                                                yes            5              0.4
HREELS                                              yes            4              0.4
MEED                                                yes            3              0.3
AED                                                 yes            3              0.3
SEELFS (surface extended energy loss fine           yes            2              0.2
  structure)
TED                                                 yes            1              0.1
AD                                                  yes            1              0.1
STM                                                 no             1              0.1

XAFS is a good example of less obvious diffraction [2, 3]. In XAFS, an electron is emitted by an x-ray locally 
within the sample. It propagates away as a spherical wave, which is allowed to back-scatter from 
neighbouring atoms to the emitter atom. The back-scattered electron wave interferes at the emitting atom with 
the emitted wave, thereby modulating the probability of the emitting process itself when the energy 
(wavelength) is varied: as one cycles through constructive and destructive interferences, the emission 
probability oscillates with a period that reflects the interatomic distances. This emission probability is, 
however, measured through yet another process (e.g., absorption of the incident x-rays, or emission of other 
x-rays or other electrons), which oscillates in synchrony with the interference. Thus, the structure-determining 
diffraction is in such a case buried relatively deeply in the overall process, and does not closely resemble the 
classic plane-wave diffraction of XRD. 
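
The link between the oscillation period and the interatomic distance can be sketched numerically. The example below is a deliberately simplified model, not the full XAFS formalism: it assumes a single neighbour at distance R, takes the round-trip phase of the back-scattered wave as 2kR, and ignores scattering phase shifts, damping and multiple scattering. The conversion k = 0.5123 sqrt(E) (k in inverse Å, E in eV) is the standard free-electron relation; the function names are hypothetical.

```python
import math

# sqrt(2 m_e) / hbar in Angstrom^-1 eV^-1/2 (free-electron relation)
K_PER_SQRT_EV = 0.5123


def wavenumber(energy_ev):
    """Photoelectron wavenumber in inverse Angstrom for a kinetic energy in eV."""
    return K_PER_SQRT_EV * math.sqrt(energy_ev)


def interference_term(k, distance):
    """Simplified modulation cos(2 k R): the wave travels to the neighbour and back."""
    return math.cos(2.0 * k * distance)


def distance_from_period(delta_k):
    """Neighbour distance implied by the spacing delta_k between successive maxima."""
    return math.pi / delta_k


if __name__ == "__main__":
    r = 2.5  # illustrative neighbour distance in Angstrom (assumption)
    k0 = wavenumber(100.0)
    k1 = k0 + math.pi / r  # one full oscillation later
    print(interference_term(k0, r), interference_term(k1, r))
    print("distance recovered from period:", distance_from_period(k1 - k0))
```

In this simplified picture a shorter bond gives a longer period in k, which is the sense in which the oscillations encode the interatomic distances.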

B1.21.2.2 SURFACE SENSITIVITY 

There are several approaches to gain the required surface sensitivity with diffraction methods. We review 
several of these here, emphasizing the case of solid/vacuum interfaces; some of these also apply to other 
interfaces. 

(A) SHORT MEAN FREE PATH 


One obvious method to obtain surface sensitivity is to choose probes and conditions that give shallow 
penetration. This can be achieved through a short mean free path λ, i.e. a short average distance until the 
probe (e.g., x-ray or electron) is absorbed by energy loss or is otherwise removed from the useful diffraction 
channels. For typical x-rays, λ is of the order of micrometres in many materials, which is too large compared 
to the desired surface thickness [4]. But for electrons of low kinetic energies, i.e. E ≈ 10-1000 eV, the mean 
free path λ is of the order of 5-20 Å [5]. The mean free path has a minimum in the 100-200 eV range, with 
larger mean free paths existing both below and above this range. 

Such ideal low mean free paths are the basis of LEED, the technique that has been used most for determining 
surface structures on the atomic scale. This is also the case of photoelectron diffraction (PD): here, the mean 
free path of the emitted electrons restricts sensitivity to a similar depth (actually double the depth of LEED, 
since the incident x-rays in PD are only weakly attenuated on this scale). 
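
The factor of two between LEED and PD can be made concrete with a simple exponential attenuation model. The sketch below assumes a homogeneous medium, normal incidence and normal emission, and counts how many times the electron must cross the overlying material: twice in LEED (in and out) and once in PD (out only, since the x-rays are barely attenuated). The function names and the mean free path of 10 Å are illustrative assumptions.

```python
import math


def signal_fraction(depth, mean_free_path, passes):
    """Fraction of the detected signal that originates within `depth` of the surface.

    Assumes attenuation exp(-passes * z / mean_free_path) for a source at depth z,
    with passes = 2 for LEED (electron goes in and out) and 1 for PD (out only).
    """
    return 1.0 - math.exp(-passes * depth / mean_free_path)


def depth_for_fraction(fraction, mean_free_path, passes):
    """Depth that contains the given fraction of the detected signal."""
    return -mean_free_path * math.log(1.0 - fraction) / passes


if __name__ == "__main__":
    mfp = 10.0  # Angstrom, illustrative value within the 5-20 Angstrom range
    print("LEED 95% depth:", depth_for_fraction(0.95, mfp, passes=2))
    print("PD   95% depth:", depth_for_fraction(0.95, mfp, passes=1))
```

Whatever fraction is chosen, the PD depth in this model comes out exactly twice the LEED depth for the same mean free path.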

(B) GRAZING INCIDENCE AND/OR EMERGENCE 

Another approach to limit the penetration of the probe into the surface region is to use grazing incidence 
and/or grazing emergence; this works for those probes that already have a reasonably small mean free path λ. 
A grazing angle θ (measured from the surface normal, i.e., θ close to 90°) then allows the probe to penetrate 
to a depth of only about λ cos(θ). This approach is used primarily for higher-energy electrons above about 
1000 eV in a technique called reflection high-energy electron diffraction (RHEED) [6]. 

With XRD, however, the mean free path is still too long to make this approach practical by itself [4]: as an 
example, to obtain even 100 Å penetration, one would typically need to use a grazing angle of about 0.05°, 
which is technically extremely demanding. The penetration depth is proportional to the grazing angle of 
incidence at such small angles, so that a ten times smaller penetration depth requires a further tenfold 
reduction in grazing angle. In addition, such small grazing angles require samples with a flatness that is 
essentially impossible to achieve, in order that the x-rays see a flat surface rather than a set of ridges that 
shadow much of the surface. 
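
The geometric estimate used above (depth equal to the mean free path times the cosine of the angle from the surface normal) is easy to evaluate. The sketch below does so and also inverts it to find the grazing angle needed for a target depth. The x-ray mean free path of about 11.5 micrometres is an assumed value, chosen here only because it reproduces the quoted numbers of 100 Å at roughly 0.05°; the function names are hypothetical.

```python
import math


def penetration_depth(mean_free_path, theta_deg):
    """Depth reached for a polar angle theta measured from the surface normal."""
    return mean_free_path * math.cos(math.radians(theta_deg))


def grazing_angle_for_depth(mean_free_path, depth):
    """Grazing angle (degrees from the surface plane) giving the requested depth."""
    return math.degrees(math.asin(depth / mean_free_path))


if __name__ == "__main__":
    xray_mfp = 1.146e5  # Angstrom (about 11.5 micrometres), assumed for illustration
    print("angle for 100 A:", grazing_angle_for_depth(xray_mfp, 100.0))
    print("angle for 10 A: ", grazing_angle_for_depth(xray_mfp, 10.0))
    # Electron case: 10 A mean free path at 85 degrees from the normal
    print("electron depth:", penetration_depth(10.0, 85.0))
```

At such small angles the sine is essentially linear, so the second result is one tenth of the first, in line with the proportionality stated in the text.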


(C) TOTAL EXTERNAL REFLECTION 

In XRD, surface sensitivity can, however, be achieved through another phenomenon [4]: total external 
reflection. This also occurs at grazing angles of incidence, giving rise to the technique acronym of GIXS for 
grazing-incidence x-ray scattering. At angles within approximately 0.5° of θ = 90°, x-rays cannot penetrate by 
refraction into materials: the laws of optics imply that the wave velocity of refracted waves in the material 
would have to be larger than the speed of light under those circumstances, which is impossible for 
propagating waves. Instead, the incident wave is totally reflected. However, this is accompanied by a shallow 
penetration of waves that decay exponentially into the bulk while propagating parallel to the surface. Under 
such conditions, the decay length into the surface is of the order of 10-30 Å, as desired. This penetration 
depth depends on the material and not on the wavelength of the x-rays. Note that total external reflection does 
not require vacuum: it can occur at various kinds of interfaces, depending on the relative optical constants of 
the phases in contact. 
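
The statement that the decay length depends on the material but not on the wavelength can be checked with the standard textbook expressions of x-ray optics, which are not given in this chapter and are brought in here as assumptions: the refractive index decrement is delta = r_e λ² ρ_e / (2π), the critical grazing angle is sqrt(2 delta), and well below the critical angle the intensity decays over a length 1/(2 k θ_c) with k = 2π/λ. Absorption is neglected. The electron density used (about 0.699 electrons per cubic Å, roughly that of silicon) and the function names are illustrative.

```python
import math

R_E = 2.818e-5  # classical electron radius in Angstrom


def critical_grazing_angle_deg(wavelength, electron_density):
    """Critical grazing angle for total external reflection, in degrees."""
    delta = R_E * wavelength ** 2 * electron_density / (2.0 * math.pi)
    return math.degrees(math.sqrt(2.0 * delta))


def evanescent_depth(wavelength, electron_density):
    """1/e intensity decay length (Angstrom) well below the critical angle."""
    delta = R_E * wavelength ** 2 * electron_density / (2.0 * math.pi)
    theta_c = math.sqrt(2.0 * delta)  # radians
    k = 2.0 * math.pi / wavelength
    return 1.0 / (2.0 * k * theta_c)


if __name__ == "__main__":
    rho = 0.699  # electrons per cubic Angstrom, roughly silicon (assumption)
    for wavelength in (0.7, 1.54):
        print(wavelength,
              critical_grazing_angle_deg(wavelength, rho),
              evanescent_depth(wavelength, rho))
```

In this model the wavelength cancels out of the decay length, leaving a value set only by the electron density, while the critical angle itself does scale with wavelength.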

(D) HIGH-SURFACE AREA MATERIALS 

None of the above methods is sufficient for neutrons, however. Neutrons penetrate matter so easily that the 
only effective approach is to use materials with a very high surface-to-volume ratio. This can be accomplished 
with small particles and exfoliated graphite, for instance, but the technique has essentially been abandoned in 
surface studies [7, 8]. 


(E) SUPERLATTICE DIFFRACTION 

One further method for obtaining surface sensitivity in diffraction relies on the presence of two-dimensional 
superlattices on the surface. As we shall see further below, these correspond to periodicities that are different 
from those present in the bulk material. As a result, additional diffracted beams occur (often called 
fractional-order beams), which are uniquely created by and therefore sensitive to this kind of surface structure. XRD, in 
particular, makes frequent use of this property [4]. Transmission electron diffraction (TED) also has used this 
property, in conjunction with ultrathin samples to minimize bulk contributions [9]. 

(F) HYBRID METHODS 

As we have seen, the electron is the easiest probe to make surface sensitive. For that reason, a number of 
hybrid techniques have been designed that combine the virtues of electrons and of other probes. In particular, 
electrons and photons (x-rays) have been used together in techniques like PD [10] and SEXAFS (or EXAFS, 
which is the high-energy limit of XAFS) [2, 11]. Both of these rely on diffraction by electrons, which have 
been excited by photons. In the case of PD, the electrons themselves are detected after emission out of the 
surface, limiting the depth of 'sampling' to that given by the electron mean free path. 

(G) ELEMENTAL AND CHEMICAL-STATE RESOLUTION 

With some techniques, another mechanism can give high surface sensitivity, namely elemental resolution 
through spectroscopic filtering of emitted electrons or x-rays. In this approach, one detects, by setting an 
energy window, only those electrons or x-rays that are emitted by a particular kind of atom, since each 
electronic level produces a line at a particular energy given by the level energy augmented by the excitation 
energy. 


Thus, if a 'foreign' element is present only at the surface, one can detect a signal that only comes from that 
element and, therefore, only from the surface. Given sufficient energy resolution, one can even differentiate 
electrons coming from the same atoms in different bonding environments: e.g., in the case of a clean surface, 
atoms of the outermost layer versus bulk atoms [10]. This chemical-state resolution is due to the fact that 
electronic levels are shifted by bonding to other atoms, resulting in different emitted lines from atoms in 
different bonding situations. 

Elemental and chemical-state resolution affords the possibility of detecting only a monolayer or even a 
fraction of a monolayer. This approach is prevalent in PD and in methods based on x-ray fluorescence. 

It is also used in SEXAFS [11]: as we have seen, photoexcited electrons are back-reflected to the 
photoemitting atoms, thereby modulating the x-ray absorption cross-section through electron wave 
interference, after which a secondary electron or ion or fluorescent x-ray is ejected from the surface and 
finally detected. This latter ejection process provides surface sensitivity, through the electronic mean free path 
or the shallowness of ionic emission. However, elemental and chemical-state selection by energy filtering is 
essentially universal here, and again can give monolayer resolution with emission from foreign surface atoms 
different from the bulk atoms. 

A similar device can be applied to a form of x-ray diffraction called the x-ray standing wave (XSW) method 
[12, 13], as detected by fluorescence. Here, x-ray waves reflected from bulk atomic planes form a standing 
wave pattern near the surface. The maxima and minima of this standing wave pattern can be arranged to fall at 
different locations on the atomic scale, by varying the energy and incidence angles. Thereby, the induced 
fluorescence varies with the location of those maxima and minima. Since the fluorescence is element specific, 
one can thus determine positions of foreign surface atoms relative to the extended bulk lattice (it remains 
difficult, however, to locate those substrate atoms that are close to the fluorescing surface atoms, because they 
are drowned by the bulk signal). 


B1.21.3 STATISTICS OF FULL STRUCTURAL DETERMINATIONS 

Many methods have been developed to determine surface structure: we have mentioned several in the 
previous section and there are many more. To get an idea of their relative usage and importance, we here 
examine historical statistics. We also review the kinds of surface structure that have been studied to date, 
which gives a feeling for the kinds of surface structures that current methods and technology can most easily 
solve. This will provide an overview of the range of surfaces for which detailed surface structures are known, 
and those for which very little is known. 

As a source of information we use the Surface Structure Database [14], a critical compilation of surface 
structures solved in detail, covering the period to the end of 1997. It contains 1113 structural determinations 
with, on average, two determinations for each structure: thus there are approximately 550 distinct solved 
structures available. 

In terms of individual techniques, table B1.21.1 lists the breakdown totalled over time, counting from the 
inception of surface structural determination in the early 1970s. It is seen that LEED has contributed 
altogether about 67% of all structural determinations included in the database. The annual share of LEED was 
100% until 1978, and has generally remained over 50% since then. In 1979 other methods started to produce 
structural determinations, especially PD, ion scattering (IS) and SEXAFS. XRD and then XSW started to 
contribute results in the period 1981-3. 
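
The percentages in table B1.21.1 can be reproduced from the counts. The short sketch below takes the counts as read from the table, divides each by the 1113 determinations in the database, and also sums the entries to show the double counting of multi-technique studies. The variable and function names are hypothetical.

```python
# Counts per technique as listed in table B1.21.1
COUNTS = {
    "LEED": 751, "IS": 102, "PD": 88, "SEXAFS": 67, "XSW": 52, "XRD": 40,
    "TOF-SARS": 13, "NEXAFS": 11, "RHEED": 10, "LEPD": 5, "HREELS": 4,
    "MEED": 3, "AED": 3, "SEELFS": 2, "TED": 1, "AD": 1, "STM": 1,
}
# Techniques marked "no" in the diffraction column
NON_DIFFRACTION = {"IS", "TOF-SARS", "STM"}
TOTAL_DETERMINATIONS = 1113  # distinct determinations in the database


def share(method):
    """Percentage of all determinations that used the given technique."""
    return round(100.0 * COUNTS[method] / TOTAL_DETERMINATIONS, 1)


def diffraction_share_of_entries():
    """Percentage of table entries that come from diffraction techniques."""
    total_entries = sum(COUNTS.values())
    diffraction = sum(n for m, n in COUNTS.items() if m not in NON_DIFFRACTION)
    return round(100.0 * diffraction / total_entries, 1)


if __name__ == "__main__":
    for method in COUNTS:
        print(method, share(method))
    print("sum of entries:", sum(COUNTS.values()))
    print("diffraction share of entries:", diffraction_share_of_entries())
```

The entries sum to more than 1113 because determinations that combine methods appear once per method, as noted in the table caption.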


As the table shows, a host of other techniques have contributed a dozen or fewer results each. It is seen that 
diffraction techniques have been very prominent in the field: the major diffraction methods have been LEED, 
PD, SEXAFS, XSW, XRD, while others have contributed less, such as NEXAFS, RHEED, low-energy 
positron diffraction (LEPD), high-resolution electron energy loss spectroscopy (HREELS), medium-energy 
electron diffraction (MEED), Auger electron diffraction (AED), SEELFS, TED and atom diffraction (AD). 
The major non-diffraction method is IS, which is described in chapter B1.23. 

The database provides interesting perspectives on the evolution of surface structural determination since its 
inception around 1970. Not surprisingly, there is a clear temporal trend toward more complex and more 
diverse materials, such as compound substrates, alloyed bimetallic surfaces, complex adsorbate-induced 
relaxations and reconstructions, epitaxial and pseudomorphic growth, alkali adsorption on semiconductor and 
transition metal substrates, and molecular adsorbates as well as co-adsorbates on metal surfaces. The 
complexity of some solved structures has grown to about 100 times that of the earliest structures. The range of 
structure types can also be gauged, for instance, from the list of substrate lattice categories included in the 
SSD database: bcc, CdCl2, CdI2, corundum, CsCl, CuAu I, Cu3Au, diamond, fcc, fluorite, graphite, hcp, 
hexagonal, NaCl, perovskite, rutile, spinel, wurtzite, zincblende, 2H-MoS2, 2H-NbSe2 and 6H-SiC. 

Nonetheless, when counting all structures solved over time, one finds a strong predominance of studies in 
certain narrow categories, as exhibited by the following uneven statistics: 

• fcc metals far outdistance any other substrate lattice type, with 60% of the total; 

• the diamond lattice (C, Si and Ge) forms the next most numerous lattice category, about 10%, followed by 
the bcc (9%) and hcp (7%) lattices; 

• elemental solids (with or without foreign adsorbates) form 85% of the substrates examined, the rest being 
metallic alloys (7%) or other compounds (8%); 


• the surfaces of non-reconstructed elemental metal substrates (with or without adsorbates) constitute about 
77% of the results; the remainder are reconstructed, i.e. have undergone a substantial structural change 
from the ideal termination of the bulk lattice, involving bond breaking and/or bond making; 

• looking at electronic properties, metals again dominate heavily, with 81% of the total, followed by 
semiconductors (16%), insulators (3%) and semimetals (less than 1%); 

• atomic overlayers comprise about 54% of all types of adsorption, as opposed to interstitial (1%) or 
substitutional (5%) underlayers, molecular overlayers (10%), multilayers (9%) or mixes of these adsorption 
modes. 

There is much room for further study of various important categories of materials: one prominent example is 
oxides and other compounds (carbides, nitrides, . . .); another is all types of adsorption on oxides and other 
compounds. 

However, recent advances in techniques will ensure further diversification and complexification of solved 
surface structures. The present maturity of techniques will thus increasingly allow the analysis of structures 
chosen for their practical interest rather than for their simplicity. 


B1.21.4 TWO-DIMENSIONAL ORDERING AND NOMENCLATURE 

In diffraction, the degree and kind of structural ordering is an important consideration, since the diffraction 
reflects those structural properties. As a result, diffraction methods are ideal for characterizing the degree and 
type of ordering that a surface exhibits. In particular, at surfaces, LEED has always been a favourite tool for 
'fingerprinting' a particular state of ordering of a surface, enhancing experimental reproducibility. It is 
therefore useful to first briefly examine the forces that are responsible for the variety of ordering types that 
occur at surfaces. Then, we can introduce standard notation to succinctly describe specific forms of ordering 
that occur at surfaces. 

B1.21.4.1 TWO-DIMENSIONAL ORDERING 

A large number of ordered surface structures can be produced experimentally on single-crystal surfaces, 
especially with adsorbates [15]. There are also many disordered surfaces. Ordering is driven by the 
interactions between atoms, ions or molecules in the surface region. These forces can be of various types: 
covalent, ionic, van der Waals, etc; and there can be a mix of such types of interaction, not only within a given 
bond, but also from bond to bond in the same surface. A surface could, for instance, consist of a bulk material 
with one type of internal bonding (say, ionic). It may be covered with an overlayer of molecules with a 
different type of intramolecular bonding (typically covalent); and the molecules may be held to the substrate 
by yet another form of bond (e.g., van der Waals). 

Strong adsorbate-substrate forces lead to chemisorption, in which a chemical bond is formed. By contrast, 
weak forces result in physisorption, as one calls non-chemical 'physical' adsorption. 

The balance between these different types of bonds has a strong bearing on the resulting ordering or 
disordering of the surface. For adsorbates, the relative strength of adsorbate-substrate and 
adsorbate-adsorbate interactions is particularly important. When adsorbate-substrate interactions dominate, well ordered 
overlayer structures are induced that are arranged in a superlattice, i.e. a periodicity which is closely related to 
that of the substrate lattice: one then speaks of commensurate overlayers. This results from the tendency for 
each adsorbate to seek out the same type of adsorption site on the surface, which means that all adsorbates 
attempt to bond in the same manner to substrate atoms. 


An example of commensurate overlayers is provided by atomic sulfur chemisorbed on a Ni (100) surface: all 
S atoms tend to adsorb in the fourfold coordinated hollow sites, i.e. each S atom tries to bond to four Ni 
atoms. At typical high coverages and moderate temperatures, this results in an ordered array of S atoms on the 
Ni (100) surface. However, high temperatures will disorder such overlayers; also, this layer may be kinetically 
disordered during its formation, as a result of gradual addition of sulfur atoms before they manage to order. 
The same is often true of molecular adsorption. Although intramolecular bonding can be strong enough to 
keep an adsorbed molecular species intact despite its bonding to the substrate, there is usually only a relatively 
weak mutual interaction among adsorbed molecular species. 

Relatively strong adsorbate-adsorbate interactions have a different effect: the adsorbates attempt to first 
optimize the bonding between them, before trying to satisfy their bonding to the substrate. This typically 
results in close-packed overlayers with an internal periodicity that is not matched, or at least is poorly 
matched, to the substrate lattice. One thus finds well ordered overlayers whose periodicity is generally not 
closely related to the substrate lattice: this leads to so-called incommensurate overlayers. Such behaviour is 
best exemplified by very cohesive overlayers like graphite sheets or oxide thin films that adopt their own 
preferred lattice constant regardless of the substrate material on which they are adsorbed. 

B1.21.4.2 COVERAGE AND MONOLAYER DEFINITIONS 

It is useful to define the terms coverage and monolayer for adsorbed layers, since different conventions are 
used in the literature. The surface coverage measures the two-dimensional density of adsorbates. The most 
common definition of coverage sets it to be equal to one monolayer (1 ML) when each two-dimensional 
surface unit cell of the unreconstructed substrate is occupied by one adsorbate (the adsorbate may be an atom 
or a molecule). Thus, an overlayer with a coverage of 1 ML has as many atoms (or molecules) as does the 
outermost single atomic layer of the substrate. 

However, many adsorbates cannot reach a coverage of 1 ML as defined in this way: this occurs most clearly 
when the adsorbate is too large to fit in one unit cell of the surface. For example, benzene molecules normally 
lie flat on a metal surface, but the size of the benzene molecule is much larger than typical unit cell areas on 
many metal surfaces. Thus, such an adsorbate will saturate the surface at a lower coverage than 1 ML; 
deposition beyond this coverage can only be achieved by starting the growth of a second layer on top of the 
first layer. 

It is thus tempting to define the first saturated layer as being one monolayer, and this is often done, causing 
some confusion. One therefore also often uses terms like saturated monolayer to indicate such a single 
adsorbate layer that has reached its maximal two-dimensional density. Sometimes, however, the word 
'saturated' is omitted from this definition, resulting in a different notion of monolayer and coverage. One way 
to reduce possible confusion is to use, for contrast with the saturated monolayer, the term fractional 
monolayer for the definition that takes the substrate unit cell, rather than the adsorbate size, as the criterion 
for the monolayer density. 

B1.21.4.3 TWO-DIMENSIONAL CRYSTALLOGRAPHIC NOMENCLATURE 

(A) MILLER INDICES 

Single-crystal surfaces are characterized by a set of Miller indices that indicate the particular crystallographic 
orientation of the surface plane relative to the bulk lattice [5]. Thus, surfaces are labelled in the same way that 
atomic planes are labelled in bulk x-ray crystallography. For example, a Ni (1 1 1) surface has a surface plane 
that is parallel to the (1 1 1) crystallographic plane of bulk nickel. Thus, the Ni (1 1 1) surface exposes a 
hexagonally close-packed layer of atoms, given that nickel has a face-centred close-packed (fcc) cubic bulk 
lattice, see figure B1.21.1(a). Some authors use the more correct notation {111} instead of (1 1 1), as is 
common in bulk crystallography, to emphasize that the (1 1 1) plane is only one of several symmetrically 
equivalent plane orientations, like (1 1 1̄), (1̄ 1 1), etc. The {111} notation implicitly includes all such 
equivalent planes. 

Figure B1.21.1. Atomic hard-ball models of low-Miller-index bulk-terminated surfaces of simple metals with 
face-centred close-packed (fcc), hexagonal close-packed (hcp) and body-centred cubic (bcc) lattices: (a) 
fcc(111)-(1 x 1); (b) fcc(100)-(1 x 1); (c) fcc(110)-(1 x 1); (d) hcp(0001)-(1 x 1); (e) hcp(10-10)-(1 x 1), 
usually written as hcp(10 1̄ 0)-(1 x 1); (f) bcc(110)-(1 x 1); (g) bcc(100)-(1 x 1) and (h) bcc(111)-(1 x 1). The 
atomic spheres are drawn with radii that are smaller than touching-sphere radii, in order to give better depth 
views. The arrows are unit cell vectors. These figures were produced by the software program BALSAC [35]. 

Figure B1.21.1 shows a number of other clean unreconstructed low-Miller-index surfaces. Most surfaces 
studied in surface science have low Miller indices, like (111), (110) and (100). These planes correspond to 
relatively close-packed surfaces that are atomically rather smooth. With fcc materials, the (1 1 1) surface is the 
densest and smoothest, followed by the (100) surface; the (110) surface is somewhat more 'open', in the sense 
that an additional atom with the same or smaller diameter can bond directly to an atom in the second substrate 
layer. For the hexagonal close-packed (hcp) materials, the (0001) surface is very similar to the fcc (1 1 1) 
surface: the difference only occurs deeper into the surface, namely in the fashion of stacking of the hexagonal 
close-packed monolayers onto each other (ABABAB. . . versus ABCABC. . ., in the convenient 
layer-stacking notation). The hcp (10 1̄ 0) surface resembles the fcc (110) surface to some extent, in that it also 
presents open troughs between close-packed rows of atoms, exposing atoms in the second layer. With the 
body-centred cubic (bcc) materials, the (110) surface is the densest and smoothest, followed by the (100) 
surface; in this case, the (1 1 1) surface is rather more open and atomically 'rough'. 
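
The density ordering quoted above can be checked with a short calculation. The following Python sketch is an illustration only, not part of the original treatment. It assumes an ideal bulk-terminated cubic crystal, and it uses the standard interlayer spacings of fcc and bcc lattices for coprime Miller indices; the function name layer_density is hypothetical. The density of a single atomic layer is taken as the bulk atomic density multiplied by the spacing between adjacent layers.

```python
import math


def layer_density(lattice, hkl):
    """Atoms per unit area, in units of 1/a**2, in one (hkl) layer.

    `a` is the cubic lattice constant; `hkl` must be coprime Miller indices.
    Assumes an ideal, unreconstructed bulk termination.
    """
    h, k, l = hkl
    norm = math.sqrt(h * h + k * k + l * l)
    if lattice == "fcc":
        atoms_per_cell = 4
        # fcc: adjacent layers are a/norm apart only if h, k, l are all odd
        full_spacing = all(i % 2 == 1 for i in hkl)
    elif lattice == "bcc":
        atoms_per_cell = 2
        # bcc: adjacent layers are a/norm apart only if h + k + l is even
        full_spacing = (h + k + l) % 2 == 0
    else:
        raise ValueError("lattice must be 'fcc' or 'bcc'")
    spacing = (1.0 if full_spacing else 0.5) / norm  # in units of a
    # bulk density (atoms per a**3) times layer spacing (in a)
    return atoms_per_cell * spacing


for lattice, planes in (("fcc", [(1, 1, 1), (1, 0, 0), (1, 1, 0)]),
                        ("bcc", [(1, 1, 0), (1, 0, 0), (1, 1, 1)])):
    for hkl in planes:
        print(lattice, hkl, round(layer_density(lattice, hkl), 3))
```

In units of 1/a^2 this gives about 2.31, 2.00 and 1.41 for fcc (111), (100) and (110), and about 1.41, 1.00 and 0.58 for bcc (110), (100) and (111), in line with the ordering described above.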

(B) HIGH-MILLER-INDEX OR STEPPED SURFACES 

The atomic structures of high-Miller-index surfaces are composed of terraces, separated by steps, which may 
have kinks in them [5]. Examples are shown in figure B1.21.2. Thus, the (755) surface of an fcc crystal 
consists of (1 1 1) terraces, six atoms deep (from one step to the next), separated by straight steps of (100) 
orientation and of single-atom height. The fcc (10,8,7) has 'kinks' in its step edges, i.e. the steps themselves 
are not straight. The steps and kinks provide a degree of roughness that can be very important as sites for 
chemical reactions or for nucleation of crystal growth. 

The step notation [5, 16] compacts the terrace/step information into the general form 
$w(h_t k_t l_t) \times (h_s k_s l_s)$. Here $(h_t k_t l_t)$ and $(h_s k_s l_s)$ are the Miller indices of the 
terrace plane and the step plane, respectively, while w is the number of atoms that are counted in the width of 
the terrace, including the step-edge atom and the in-step atom. Thus, the fcc (755) surface can be denoted by 
6(1 1 1) x (100), since its terraces are six atoms in depth. A kinked surface, like fcc (10,8,7), can also be 
approximately expressed in this form: the step plane $(h_s k_s l_s)$ is a stepped surface itself, and thus has 
higher Miller indices than the terrace plane. However, the step notation does not exactly tell us the relative 
location of adjacent steps, and it is not entirely clear how the terrace width w should be counted. A more 
complete microfacet notation is available to describe kinked surfaces generally [5]. 
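
The link between the step notation and the Miller indices can be made concrete for the simplest family of stepped fcc surfaces. The sketch below is an illustration only: it assumes the relation, commonly quoted in the surface-science literature but not stated in this chapter, that an fcc surface with (1 1 1) terraces w atoms wide and monatomic (100) steps has Miller indices (w+1, w-1, w-1), reduced to lowest terms. The function name stepped_fcc_miller is hypothetical.

```python
from math import gcd


def stepped_fcc_miller(w):
    """Miller indices of the fcc w(111) x (100) stepped surface.

    Assumes the relation (w+1, w-1, w-1), reduced to coprime integers.
    `w` is the terrace width in atoms, as counted in the step notation.
    """
    if w < 2:
        raise ValueError("terrace width must be at least 2 atoms")
    h, k, l = w + 1, w - 1, w - 1
    g = gcd(h, gcd(k, l))  # remove any common factor
    return (h // g, k // g, l // g)


for w in (2, 3, 4, 6):
    print(f"{w}(111) x (100) -> fcc{stepped_fcc_miller(w)}")
```

For w = 6 this returns (7, 5, 5), consistent with the 6(1 1 1) x (100) label given above for fcc (755). Under the same assumption, the index vector splits as (7, 5, 5) = 5 (1, 1, 1) + 2 (1, 0, 0), i.e. into a terrace part and a step part.
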
Figure B1.21.2. Atomic hard-ball models of 'stepped' and 'kinked' high-Miller-index bulk-terminated 
surfaces of simple metals with fcc lattices, compared with an fcc(111) surface: fcc(755) is stepped, while 
fcc(10,8,7) and fcc(25,10,7) are 'kinked'. Step-edge atoms are shown singly hatched, while kink atoms are 
shown cross-hatched. 

(C) SUPERLATTICES 

Many surfaces exhibit a different periodicity than expected from the bulk lattice, as is most readily seen in the 
diffraction patterns of LEED: often additional diffraction features appear which are indicative of a 
superlattice. This corresponds to the formation of a new two-dimensional lattice on the surface, usually with 
some simple relationship to the expected 'ideal' lattice [5]. For instance, a layer of adsorbate atoms may 
occupy only every other equivalent adsorption site on the surface, in both surface dimensions. Such a lattice 
can be labelled (2 x 2): in each surface dimension the repeat distance is doubled relative to the ideal substrate. 
In this example, the unit cell of the original bulk-like surface is magnified by a factor of two in both 
directions, so that the new surface unit cell has dimensions (2 x 2) relative to the original unit cell. For 
instance, an oxygen overlayer on Pt (1 1 1), at a quarter-monolayer coverage, is observed to adopt an ordered 
(2 x 2) superlattice: this can be denoted as Pt (1 1 1) + (2 x 2)-O, which provides a compact description of the 
main crystallographic characteristics of this surface. This particular notation is that of the Surface Structure 
Database [14]; other equivalent notations are also common in the literature, such as Pt (1 1 1)-(2 x 2)-O or Pt 
(111)2x2-O. 

This (2 x 2) notation can be generalized. First, it can take on the form (m x n), where the numbers m and n are 
two independent stretch factors for the two unit cell vectors. These numbers are often integers, but need not 
be. In addition, this new stretched unit cell can be rotated by any angle about the surface normal: this is 
denoted as (m x n)Rα°, where α is the rotation angle in degrees [5, 17, 18, 19]; the suffix Rα° is omitted 
when α = 0, as is the case for Pt (1 1 1) + (2 x 2)-O. This Wood notation [5, 19] allows the original unit cell to 
be stretched and rotated; however, it conserves the angle between the two unit cell vectors in the plane of the 
surface, therefore not allowing 'sheared' unit cells. 

As a particular case, a surface may be given the Wood notation (1 x 1), as in Ni (1 1 1)-(1 x 1): this notation 
indicates that the two-dimensional unit cell of the surface has the same size as the two-dimensional unit cell of 
the bulk (111) layers. Thus, an ideally terminated bulk lattice without overlayers or reconstructions will carry 
the label (1 x 1). 

The Wood notation can be generalized somewhat further, by adding either the prefix 'c' for centred, or the 
prefix 'p' for primitive. For instance, one may have a c (2 x 2) unit cell or a p(2 x 2) unit cell, the latter often 
abbreviated to (2 x 2) because it is identical to it. In a centred unit cell, the centre of the cell is an exact copy 
of the corners of the cell; this makes the cell non-primitive, i.e. it is no longer the smallest cell that, when 
repeated periodically across the surface, generates the entire surface structure. Nonetheless, the centred 
notation is often used because it can be quite convenient, as the next example will illustrate. 

The c (2 x 2) unit cell can also be written as (√2 x √2)R45°. Here, the original unit vectors of the (1 x 1) 
structure have both been stretched by a factor √2 and then rotated by 45°. Thus, sulfur on Ni (100) forms an 
ordered half-monolayer structure that can be labelled as Ni (100) + c (2 x 2)-S or, equivalently, Ni (100) + 
(√2 x √2)R45°-S. The c (2 x 2) notation is clearly easier to write and also easier to convert into a geometrical 
model of the structure, and hence is the favoured designation. 
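
The coverages quoted for these examples follow directly from the unit cell areas. The sketch below is an illustration under a stated assumption: each primitive superlattice cell contains exactly one adsorbate, which holds for the (2 x 2)-O and c (2 x 2)-S examples in the text but not in general. The function name wood_coverage and its parameters are hypothetical.

```python
import math


def wood_coverage(m, n, centred=False, adsorbates_per_cell=1):
    """Coverage in monolayers for an (m x n) superlattice.

    The (m x n) cell is m*n times larger than the (1 x 1) cell; rotation
    does not change the area. A centred cell holds two primitive cells.
    `adsorbates_per_cell` counts adsorbates per primitive superlattice cell.
    """
    area = m * n  # cell area in units of the (1 x 1) cell
    if centred:
        area /= 2  # a c(m x n) cell contains two primitive cells
    return adsorbates_per_cell / area


print(wood_coverage(2, 2))                        # p(2 x 2)
print(wood_coverage(2, 2, centred=True))          # c(2 x 2)
print(wood_coverage(math.sqrt(2), math.sqrt(2)))  # (sqrt2 x sqrt2)R45
```

The p(2 x 2) cell gives a quarter monolayer, while the c (2 x 2) cell and its (√2 x √2)R45° equivalent both give half a monolayer, as stated for oxygen on Pt (1 1 1) and sulfur on Ni (100).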

A more general notation than Wood's is available for all kinds of unit cells, including those that are sheared, 
so that the superlattice unit cell can take on any shape, size and orientation. It is the matrix notation, defined 


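as follows [5]. We connect the unit cell vectors a' and b' of the superlattice to the unit cell vectors a and b of 
the substrate by the general relations 

$$\mathbf{a}' = m_{11}\,\mathbf{a} + m_{12}\,\mathbf{b}, \qquad \mathbf{b}' = m_{21}\,\mathbf{a} + m_{22}\,\mathbf{b}.$$

The coefficients $m_{11}$, $m_{12}$, $m_{21}$ and $m_{22}$ define the matrix 

$$M = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix},$$

which serves to denote the superlattice. The (1 x 1), (2 x 2) and c (2 x 2) lattices are then denoted 
respectively by the matrices 

$$M = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad M = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad M = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}.$$

This allows the Ni (100) + c (2 x 2)-S structure to be also written as 
Ni (100) + $\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$-S. Clearly, this notation is not as intuitive and 
compact as the c (2 x 2) Wood notation. However, when the Wood notation is not capable of a clear and 
compact notation, use of the matrix notation is necessary. Thus, a structure characterized by a matrix M that 
shears the unit cell, i.e. changes the angle between the two unit cell vectors, could not be described in the 
Wood notation. 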

In LEED experiments, the matrix M is determined by visual inspection of the diffraction pattern, thereby 
defining the periodicity of the surface structure: the relationship between surface lattice and diffraction pattern 
will be described in more detail in the next section. 

A superlattice is termed commensurate when all matrix elements $m_{ij}$ are integers. If at least one matrix 
element $m_{ij}$ is an irrational number (not a ratio of integers), then the superlattice is termed incommensurate. A 
superlattice can be incommensurate in one surface dimension, while commensurate in the other surface 
dimension, or it could be incommensurate in both surface dimensions. 
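
The conversion from Wood notation to matrix notation can be sketched numerically. The Python below is an illustration only. It assumes the convention a' = m11 a + m12 b, b' = m21 a + m22 b for the matrix elements and a counter-clockwise rotation angle; the function names wood_to_matrix and has_integer_matrix are hypothetical. Because floating-point numbers cannot distinguish rational from irrational values, the second function only tests the integer criterion for commensurate superlattices, within a tolerance.

```python
import math


def wood_to_matrix(a, b, m, n, angle_deg=0.0):
    """Matrix M of the (m x n)R(angle) superlattice on substrate vectors a, b.

    Returns ((m11, m12), (m21, m22)) with a' = m11*a + m12*b and
    b' = m21*a + m22*b. The rotation is taken counter-clockwise.
    """
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)

    def rotate(v):
        return (c * v[0] - s * v[1], s * v[0] + c * v[1])

    a_new = tuple(m * x for x in rotate(a))  # stretched and rotated a
    b_new = tuple(n * x for x in rotate(b))  # stretched and rotated b
    det = a[0] * b[1] - a[1] * b[0]
    if det == 0:
        raise ValueError("substrate vectors must not be parallel")

    def coefficients(v):
        # solve v = p*a + q*b by Cramer's rule
        p = (v[0] * b[1] - v[1] * b[0]) / det
        q = (a[0] * v[1] - a[1] * v[0]) / det
        return (p, q)

    return (coefficients(a_new), coefficients(b_new))


def has_integer_matrix(matrix, tol=1e-9):
    """True if every matrix element is an integer within `tol`."""
    return all(abs(x - round(x)) < tol for row in matrix for x in row)


square_a, square_b = (1.0, 0.0), (0.0, 1.0)
M = wood_to_matrix(square_a, square_b, math.sqrt(2), math.sqrt(2), 45.0)
print([[round(x, 6) for x in row] for row in M], has_integer_matrix(M))
```

On a square substrate lattice, (√2 x √2)R45° yields integer elements and is therefore commensurate, whereas an unrotated (√2 x √2) cell has non-integer elements and fails the integer test.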

A superlattice can be caused by adsorbates adopting a different periodicity than the substrate surface, or also 
by a reconstruction of the clean surface. In figure B1.21.3 several superlattices that are commonly detected on 
low-Miller-index surfaces are shown with their Wood notation. 

Figure B1.21.3. 'Direct' lattices (at left) and corresponding reciprocal lattices (at right) of a series of 
commonly occurring two-dimensional superlattices. Black circles correspond to the ideal (1 x 1) surface 
structure, while grey circles represent adatoms in the direct lattice (arbitrarily placed in 'hollow' positions) 
and open diamonds represent fractional-order beams in the reciprocal space. Unit cells in direct space and in 
reciprocal space are outlined. 


B1.21.5 SURFACE DIFFRACTION PATTERN 

The diffraction pattern observed in LEED is one of the most commonly used 'fingerprints' of a surface 
structure. With XRD or other non-electron diffraction methods, there is no convenient detector that images in 
real time the corresponding diffraction pattern. Point-source methods, like PD, do not produce a convenient 
spot pattern, but a diffuse diffraction pattern that does not simply reflect the long-range ordering. 

So it is essential to relate the LEED pattern to the surface structure itself. As mentioned earlier, the diffraction 
pattern does not indicate relative atomic positions within the structural unit cell, but only the size and shape of 
that unit cell. However, since experiments are mostly performed on surfaces of materials with a known 
crystallographic bulk structure, it is often a good starting point to assume an ideally terminated bulk lattice; 
the actual surface structure will often be related to that ideal structure in a simple manner, e.g. through the 
creation of a superlattice that is directly related to the bulk lattice. 


In this section, we concentrate on the relationship between diffraction pattern and surface lattice [5]. In direct 
analogy with the three-dimensional bulk case, the surface lattice is defined by two vectors a and b parallel to 
the surface (defined already above), subtended by an angle γ; a and b together specify one unit cell, as 
illustrated in figure B1.21.4. Within that unit cell atoms are arranged according to a basis, which is the list of 
atomic coordinates within that unit cell; we need not know these positions for the purposes of this discussion. 
Note that this unit cell can be viewed as being infinitely deep in the third dimension (perpendicular to the 
surface), so as to include all atoms below the surface to arbitrary depth. 

Figure B1.21.4. 'Direct' lattices (at left) and reciprocal lattices (middle) for the five two-dimensional Bravais 
lattices. The reciprocal lattice corresponds directly to the diffraction pattern observed on a standard LEED 
display. Note that other choices of unit cells are possible: e.g., for hexagonal lattices, one often chooses 
vectors a and b that are subtended by an angle γ of 120° rather than 60°. Then the reciprocal unit cell vectors 
also change: in the hexagonal case, the angle between a* and b* becomes 60° rather than 120°. 

There are several special shapes of the surface lattice, forming the five two-dimensional Bravais lattices 
shown in figure B1.21.4. The Bravais lattices form the complete list of possible lattices. They are 
characterized by unit cell vectors of equal length (in the case of the square and hexagonal lattices) and/or a 
subtended angle of 90° or 60° (for the square, rectangular and hexagonal lattices) or by completely general 
values (for the oblique lattice). The rectangular lattice comes in two varieties: primitive and centred. The 
centred lattice has the particularity that its atomic basis is duplicated: each atom is reproduced by 
displacement through the vector 1/2 (a + b). The main value of the centred rectangular lattice is its 
convenience: it is easier to think in terms of the rectangle (with duplicated basis) than to think of the rhombus 
with arbitrary angle γ. One could also centre any of the other lattices, but one would only produce another 
instance of a square, rectangular or oblique lattice, i.e. nothing more convenient. 


The diffraction of low-energy electrons (and any other particles, like x-rays and neutrons) is governed by the 
translational symmetry of the surface, i.e. the surface lattice. In particular, the directions of emergence of the 
diffracted beams are determined by conservation of the linear momentum parallel to the surface, 
$\hbar\mathbf{k}_\parallel$. Here $\mathbf{k}$ denotes the wavevector of the incident plane electron wave that 
represents the incoming electron beam. This conservation can occur in two ways. After the diffractive 
scattering, the parallel component of the momentum, $\hbar\mathbf{k}'_\parallel$, can be equal to that of the 
incident electron beam, i.e. $\hbar\mathbf{k}'_\parallel = \hbar\mathbf{k}_\parallel$; this corresponds to 
specular (mirror-like) reflection, with equal polar angles of incidence and emergence with respect to the 
surface normal, and with a simple reversal of the perpendicular momentum, $\hbar k'_\perp = -\hbar k_\perp$. 

Alternatively, the electron can exchange parallel momentum with the lattice, but only in well defined amounts 
given by $\hbar\mathbf{g}$, where the vectors g belong to the reciprocal lattice of the surface. That is, the 
vector g is a linear combination of two reciprocal lattice vectors a* and b*, with integer coefficients. Thus, 
g = ha* + kb*, with arbitrary integers h and k (note that all the vectors a, b, a*, b* and g are parallel to the 
surface). The reciprocal lattice vectors a* and b* are related to the 'direct-space' lattice vectors a and b 
through the following non-transparent definitions, which also use a vector n that is perpendicular to the 
surface plane, as well as vectorial dot and cross products: 

$$\mathbf{a}^* = 2\pi\,\frac{\mathbf{b}\times\mathbf{n}}{\mathbf{a}\cdot(\mathbf{b}\times\mathbf{n})} \quad\text{and}\quad \mathbf{b}^* = 2\pi\,\frac{\mathbf{n}\times\mathbf{a}}{\mathbf{b}\cdot(\mathbf{n}\times\mathbf{a})}.$$

These two equations are a special case of the corresponding three-dimensional definition, common in XRD, 
with the surface normal n replacing the third lattice vector c. 
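
These definitions are easy to evaluate once the surface normal is chosen along z, since the cross products then reduce to 90° rotations in the surface plane. The following Python sketch is illustrative; the function name reciprocal_2d is hypothetical, and the vectors are given as (x, y) pairs in arbitrary length units.

```python
import math


def reciprocal_2d(a, b):
    """Reciprocal vectors (a*, b*) of the surface lattice spanned by a, b.

    Uses a* = 2*pi (b x n) / (a . (b x n)) and
         b* = 2*pi (n x a) / (b . (n x a)), with n along z.
    """
    # with n = z: b x n = (b_y, -b_x) and n x a = (-a_y, a_x)
    det = a[0] * b[1] - a[1] * b[0]  # equals a.(b x n) and b.(n x a)
    if det == 0:
        raise ValueError("a and b must not be parallel")
    f = 2 * math.pi / det
    a_star = (f * b[1], -f * b[0])
    b_star = (-f * a[1], f * a[0])
    return a_star, b_star


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


# hexagonal lattice with 60 degrees between a and b
a, b = (1.0, 0.0), (0.5, math.sqrt(3) / 2)
a_star, b_star = reciprocal_2d(a, b)
cos_angle = dot(a_star, b_star) / math.sqrt(dot(a_star, a_star) * dot(b_star, b_star))
print(dot(a_star, a), dot(a_star, b), math.degrees(math.acos(cos_angle)))
```

For this hexagonal example the sketch gives a*·a = 2π, a*·b = 0 and an angle of 120° between a* and b*.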

Figure B1.21.4 illustrates the 'direct-space' and reciprocal-space lattices for the five two-dimensional Bravais 
lattices allowed at surfaces. It is useful to realize that the vector a* is always perpendicular to the vector b 
and that b* is always perpendicular to a. It is also useful to notice that the length of a* is inversely 
proportional to the length of a, and likewise for b* and b. Thus, a large unit cell in direct space gives a small 
unit cell in reciprocal space, and a wide rectangular unit cell in direct space produces a tall rectangular unit 
cell in reciprocal space. Also, the hexagonal direct-space lattice gives rise to another hexagonal lattice in 
reciprocal space, but rotated by 90° with respect to the direct-space lattice. 

The reciprocal lattices shown in figure B1.21.3 and figure B1.21.4 correspond directly to the diffraction 
patterns observed in LEED experiments: each reciprocal-lattice vector produces one and only one diffraction 
spot on the LEED display. It is very convenient that the hemispherical geometry of the typical LEED screen 
images the reciprocal lattice without distortion; for instance, for the square lattice one observes a simple 
square array of spots on the LEED display. 

One of the spots in such a diffraction pattern represents the specularly reflected beam, usually labelled (00). 
Each other spot corresponds to another reciprocal-lattice vector g = ha* + kb* and is thus labelled (hk), with 
integer h and k. 
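
Which (hk) beams actually emerge at a given energy follows from parallel-momentum conservation combined with the elastic condition |k'| = |k|. The sketch below is an illustration under simplifying assumptions that are not part of the text: free-electron kinematics in vacuum (no inner potential), and a square lattice with a hypothetical lattice constant of 2.49 Å, roughly that of a Ni (100) surface. The function name emerging_beams is likewise hypothetical.

```python
import math

HBAR2_OVER_2M = 3.80998  # hbar^2 / (2 m_e) in eV * Angstrom^2


def emerging_beams(a_star, b_star, energy_ev, k_par=(0.0, 0.0), max_index=10):
    """List the (h, k) beams that can leave the surface at `energy_ev`.

    A beam emerges if |k_par + g| <= |k|, so that the perpendicular
    component of the outgoing wavevector is real. Vectors in 1/Angstrom.
    """
    k = math.sqrt(energy_ev / HBAR2_OVER_2M)  # wavevector magnitude
    beams = []
    for h in range(-max_index, max_index + 1):
        for kk in range(-max_index, max_index + 1):
            px = k_par[0] + h * a_star[0] + kk * b_star[0]
            py = k_par[1] + h * a_star[1] + kk * b_star[1]
            if px * px + py * py <= k * k:
                beams.append((h, kk))
    return beams


g = 2 * math.pi / 2.49  # reciprocal lattice constant, assumed value
for energy in (20.0, 30.0, 150.0):
    beams = emerging_beams((g, 0.0), (0.0, g), energy)
    print(energy, "eV:", len(beams), "beams")
```

At normal incidence this toy lattice shows only the (00) beam at 20 eV, five beams at 30 eV and 21 beams at 150 eV: more spots appear on the screen as the energy is raised.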

When a superlattice is present, additional spots arise in the diffraction pattern, as shown in figure B1.21.3 in 
terms of the reciprocal lattice: again, each reciprocal lattice point corresponds to a spot in a diffraction pattern. 
This can be easily understood from the fact that a larger unit cell in direct space imposes a smaller unit cell in 
reciprocal space. For instance, a (2 x 1) superlattice has a unit cell doubled in length in one surface direction 
relative to the (1 x 1) lattice, i.e. a is replaced by 2a. According to the above equations, this has no effect on 
b*, but halves a*. This is equivalent to allowing h to be a half-integer in g = ha* + kb*, thus doubling the 
number of spots in the diffraction pattern. These additional spots are therefore often called half-order spots in 
the (2 x 1) case, or fractional-order spots in the general case. 
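
The fractional-order spots of any commensurate superlattice can be listed from its matrix. The sketch below is an illustration only: it assumes the convention a' = m11 a + m12 b, b' = m21 a + m22 b with integer elements, for which the reciprocal vectors of the superlattice are obtained from the inverse transpose of the matrix. The function name superlattice_spots is hypothetical. Spots are reported as (h, k) indices in units of the substrate a* and b*, reduced to one (1 x 1) reciprocal unit cell.

```python
from fractions import Fraction


def superlattice_spots(matrix):
    """Distinct (h, k) spot indices within one (1 x 1) reciprocal cell.

    `matrix` is ((m11, m12), (m21, m22)) with integer elements.
    The number of spots returned equals |det M|.
    """
    (m11, m12), (m21, m22) = matrix
    det = m11 * m22 - m12 * m21
    if det == 0:
        raise ValueError("matrix must be non-singular")
    # rows of the inverse transpose: superlattice a'*, b'* in units of a*, b*
    a_s = (Fraction(m22, det), Fraction(-m21, det))
    b_s = (Fraction(-m12, det), Fraction(m11, det))
    spots = set()
    n = abs(det)
    for p in range(n):
        for q in range(n):
            h = (p * a_s[0] + q * b_s[0]) % 1  # fold back into the cell
            k = (p * a_s[1] + q * b_s[1]) % 1
            spots.add((h, k))
    return sorted(spots)


for label, M in (("(2 x 1)", ((2, 0), (0, 1))),
                 ("c(2 x 2)", ((1, 1), (-1, 1))),
                 ("(2 x 2)", ((2, 0), (0, 2)))):
    print(label, [(str(h), str(k)) for h, k in superlattice_spots(M)])
```

For the (2 x 1) case this yields the integer-order spot (0, 0) and the half-order spot (1/2, 0) in each reciprocal unit cell, i.e. twice as many spots as for the (1 x 1) lattice.
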

With some practice, one can easily recognize specific superlattices from their LEED pattern. Otherwise, one 
can work through the above equations to connect particular superlattices to a given LEED pattern. A number 
of examples are given and discussed in some detail in [5]. A discussion can also be found there of the special 
case of stepped and kinked surfaces. 


B1.21.6 DIFFRACTION PATTERN OF DISORDERED SURFACES 

Many forms of disorder in a surface structure can be recognized in the LEED pattern. The main 
manifestations of disorder are broadening and streaking of diffraction spots and diffuse intensity between 
spots [1]. 

Broadening of spots can result from thermal diffuse scattering and island formation, among other causes. The 
thermal effects arise from the disorder in atomic positions as they vibrate around their equilibrium sites; the 
sites themselves may be perfectly crystalline. 

Islands occur particularly with adsorbates that aggregate into two-dimensional assemblies on a substrate, 
leaving bare substrate patches exposed between these islands. Diffraction spots, especially fractional-order 
spots if the adsorbate forms a superlattice within these islands, acquire a width that depends inversely on the 
average island diameter. If the islands are systematically anisotropic in size, with a long dimension primarily 
in one surface direction, the diffraction spots are also anisotropic, with a small width in that direction. 
Knowing the island size and shape gives valuable information regarding the mechanisms of phase transitions, 
which in turn permit one to learn about the adsorbate-adsorbate interactions. 
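
The inverse relation between spot width and island size can be illustrated with a deliberately simple model. The sketch below assumes a one-dimensional row of N identical scatterers treated kinematically (single scattering only), which is a strong simplification of real LEED; the function names are hypothetical. The intensity near a spot is then sin²(Nqa/2)/sin²(qa/2), and the full width at half maximum is found by bisection on the main lobe.

```python
import math


def island_intensity(q, n_atoms, spacing=1.0):
    """Kinematic intensity of a 1D row of `n_atoms` scatterers."""
    x = 0.5 * q * spacing
    s = math.sin(x)
    if abs(s) < 1e-12:
        return float(n_atoms ** 2)  # limit at the spot centre
    return (math.sin(n_atoms * x) / s) ** 2


def spot_fwhm(n_atoms, spacing=1.0):
    """Full width at half maximum of the spot, in units of 1/spacing."""
    if n_atoms < 2:
        raise ValueError("need at least two scatterers")
    target = 0.5 * n_atoms ** 2
    lo, hi = 0.0, 2 * math.pi / (n_atoms * spacing)  # centre to first zero
    for _ in range(60):  # bisection; intensity falls monotonically here
        mid = 0.5 * (lo + hi)
        if island_intensity(mid, n_atoms, spacing) > target:
            lo = mid
        else:
            hi = mid
    return 2 * lo


for n in (5, 10, 20, 40):
    print(n, round(spot_fwhm(n), 4), round(n * spot_fwhm(n), 3))
```

In this model the product of island size and spot width stays nearly constant, so doubling the island size halves the spot width.
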
Lattice-gas disorder, in which adatoms occupy a periodic lattice of equivalent sites with a random occupation 
probability, produces diffuse intensity distributions between diffraction spots. For complete disorder, one 
observes such diffuse intensity throughout the diffraction pattern. If there is order in one surface direction, but 
disorder in the other, one observes streaking in the diffraction pattern: the direction of the streaks corresponds 
to the direction in which disorder occurs. In principle, the diffuse intensity distribution can be converted into a 
direct-space distribution, including a pair-correlation function between occupied sites, e.g. by Fourier 
transformation. However, the diffuse intensity is too much affected by other diffraction effects (like multiple 
scattering) to be very useful in this manner. It nonetheless can be interpreted in terms of local structure, i.e. 
bond lengths and angles, by a procedure that is very similar to the multiple-scattering modelling for solving 
structures in full detail [20]. 

LEED has found a strong competitor for studying surface disorder: scanning tunnelling microscopy, STM (see 
chapter B1.20). Indeed, STM is the ideal tool for investigating irregularities in periodic surface structures. 
LEED (as any other diffraction method) averages its information content over macroscopic parts of the 
surface, giving only statistical information about disorder. By contrast, STM can provide a direct image of 
individual atoms or defects, enabling the observation of individual atomic behaviour. By observing a 
sufficiently large area, STM can also provide statistical information, if desired. 


B1.21.7 FULL STRUCTURAL DETERMINATION 

In the previous sections we have emphasized the two-dimensional information available through the 
diffraction pattern observed in LEED. But, as mentioned before, one can extract the detailed atomic positions 
as well, including interlayer spacings, bond lengths, bond angles, etc. Here we sketch how this more complete 
structural determination is accomplished. We focus on the case of LEED, since this method has produced by 
far the most structural determinations [5, 17, 18, 21]. The procedures employed to analyse PD data are in fact 
very similar to those for LEED, in many details. With XRD, the kinematic (single-scattering) nature of the 
problem makes the analysis simpler, but still considerable for complex structures: there also, a trial-and-error 
search for the solution is common. 

To obtain spacings between atomic layers and bond lengths or angles between atoms, it is necessary to 
measure and analyse the intensity of diffraction spots. This is analogous to measuring the intensity of XRD 
reflections. 

The measurement of LEED spot intensities is nowadays mostly accomplished by digitizing the image 
recorded by a video camera that observes the diffraction pattern, which is visibly displayed on a fluorescent 
screen within an ultra-high vacuum system [22]. The digitized image is then processed by computer to give 
the integrated spot intensity, after removal of the background. This is repeated for different incident electron 
energies. Thereby, the intensity of each spot is obtained as a function of the incident electron energy, resulting 
in an IV curve (intensity-voltage curve) for each spot. Computer codes for this purpose are available, and are 
normally packaged together with the required hardware [23]. The resulting IV curves form the experimental 
database to which theory can fit the atomic structure. It typically takes between minutes and an hour to 
accumulate such a database, once the sample has been prepared. 
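
The image-processing step can be sketched as follows. This is a minimal illustration and not the algorithm of any particular commercial package: it assumes the digitized image is a list of rows of pixel values, sums the pixels within a circle around the spot, and estimates the background from the mean of a surrounding ring. The function name integrated_spot_intensity and all parameters are hypothetical.

```python
def integrated_spot_intensity(image, centre, radius, ring_width=2):
    """Background-subtracted integrated intensity of one diffraction spot.

    `image` is a list of rows of pixel values, `centre` is (row, column).
    Pixels within `radius` form the spot; pixels in the surrounding ring
    of width `ring_width` give the mean background per pixel.
    """
    cy, cx = centre
    r_in2 = radius ** 2
    r_out2 = (radius + ring_width) ** 2
    spot_sum, spot_n, ring_sum, ring_n = 0.0, 0, 0.0, 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            d2 = (y - cy) ** 2 + (x - cx) ** 2
            if d2 <= r_in2:
                spot_sum += value
                spot_n += 1
            elif d2 <= r_out2:
                ring_sum += value
                ring_n += 1
    if ring_n == 0:
        raise ValueError("no background pixels found around the spot")
    background = ring_sum / ring_n
    return spot_sum - background * spot_n


# synthetic 11 x 11 frame: flat background of 5 plus one bright pixel
frame = [[5] * 11 for _ in range(11)]
frame[5][5] += 100
print(integrated_spot_intensity(frame, centre=(5, 5), radius=2))
```

Repeating this for each recorded energy, and storing the results as pairs of energy and intensity, would give the IV curve of that spot.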

Since ED by a surface is a complicated process, there is no routine method available to directly and accurately 
extract atomic positions from the experimental data. Direct holographic methods have been proposed [24], but 
have not yet become routine methods, and in any case they yield only approximate atomic positions (with 
uncertainties on the scale of 0.2-0.5 Å) and work only for relatively simple structures; when they do work they 
have to be followed up by refinement using the same trial-and-error approach that we discuss next. 

A detailed structural determination proceeds by modelling the full multiple scattering of the electrons that are 
diffracted through the surface structure. The multiple scattering means that an electron can bounce off a 
succession of atoms in an erratic path before emerging from the surface. Various theoretical and 
computational methods are available to treat this problem to any degree of precision: a compromise between 
precision and computing expense must be struck, with progress moving toward higher precision, even for 
more complex structures. 

The modelling of the multiple scattering requires input of all atomic positions, so that the trial-and-error 
approach must be followed: one guesses reasonable models for the surface structure, and tests them one by 
one until satisfactory agreement with experiment is obtained. For simple structures and in cases where 
structural information is already known from other sources, this process is usually quite quick: only a few 
basic models may have to be checked, e.g. adsorption of an atomic layer in hollow, bridge or top sites at 
positions consistent with reasonable bond lengths. It is then relatively easy to refine the atomic positions 
within the best-fit model, resulting in a complete structural determination. The refinement is normally 
performed by some form of automated steepest-descent optimization, which allows many atomic positions to 
be adjusted simultaneously [21]. Computer codes are also available to accomplish this part of the analysis [25]. 
The trial-and-error search with refinement may take minutes to hours on current workstations or personal 
computers. 
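
The trial-and-error search with refinement described above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the function toy_iv is a hypothetical two-layer kinematic interference term standing in for a real multiple-scattering calculation, the misfit r_factor is a plain mean-squared difference rather than any established LEED R-factor, and the starting spacings assigned to the three adsorption sites are invented for the example.

```python
import math

D_TRUE = 2.00                                    # hidden "true" interlayer spacing in Å (toy value)
ENERGIES = [50.0 + 5.0 * i for i in range(51)]   # 50-300 eV energy grid

def toy_iv(energy, spacing):
    """Toy I(V) curve: two-layer kinematic interference, NOT multiple scattering."""
    k = 0.5123 * math.sqrt(energy)               # electron wavevector in 1/Å, energy in eV
    return 1.0 + math.cos(2.0 * k * spacing)

# Synthetic "experiment" generated from the hidden spacing
EXPERIMENT = [toy_iv(e, D_TRUE) for e in ENERGIES]

def r_factor(spacing):
    """Mean-squared misfit between toy theory and toy experiment (assumed form)."""
    return sum((toy_iv(e, spacing) - x) ** 2
               for e, x in zip(ENERGIES, EXPERIMENT)) / len(ENERGIES)

def refine(spacing, max_iter=200, h=1e-6):
    """Steepest descent with a backtracking step, so the misfit never increases."""
    r = r_factor(spacing)
    for _ in range(max_iter):
        grad = (r_factor(spacing + h) - r_factor(spacing - h)) / (2.0 * h)
        step = 0.1
        while step > 1e-12:
            trial = spacing - step * grad
            r_trial = r_factor(trial)
            if r_trial < r:                      # accept the first downhill step
                spacing, r = trial, r_trial
                break
            step *= 0.5
        else:
            break                                # no downhill step found: local optimum
    return spacing, r

# Hypothetical starting spacings for three trial adsorption sites
TRIAL_MODELS = {"hollow": 1.95, "bridge": 2.30, "top": 2.60}
results = {name: refine(d0) for name, d0 in TRIAL_MODELS.items()}
best = min(results, key=lambda name: results[name][1])

for name, (d, r) in results.items():
    print(f"{name:7s} start {TRIAL_MODELS[name]:.2f} Å -> refined {d:.3f} Å, R = {r:.4f}")
print("best-fit model:", best)
```

Each trial model is refined only to the nearest local optimum of the misfit, so the quality of the final answer depends on whether one of the guessed models starts close enough to the true structure.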

In more complex cases, and when little additional information is available, one must test a larger number of 
possible structural models. The computational time grows rapidly with complexity, so that it may take hours 
to check a single model. More time-consuming, however, is often the human factor in guessing what are 
reasonable models to test. This is a much more difficult problem, which is the issue of finding the 'global 
optimum', not just a 'local optimum'. At present, several approaches to global optimization are being 
examined, such as simulated annealing [26] and genetic algorithms [27]. In any event, these will require 
larger amounts of computer time, since a wide variety of surface models must be tested in such a global 
search. 
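
As a rough illustration of why a global search costs more than a local refinement, the sketch below applies simulated annealing to a one-dimensional toy misfit with several local minima. The misfit function, the cooling schedule, the step width and the seed are all assumptions chosen for the example; they are not taken from the methods cited in [26] or [27].

```python
import math
import random

def misfit(x):
    """Toy misfit with many local minima; the global minimum is at x = 2.0 (value 0)."""
    return (x - 2.0) ** 2 + 0.5 * (1.0 - math.cos(12.0 * (x - 2.0)))

def anneal(x0, n_steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing with geometric cooling and the Metropolis acceptance rule."""
    rng = random.Random(seed)
    x, fx = x0, misfit(x0)
    best_x, best_f = x, fx
    for i in range(n_steps):
        # Temperature decreases geometrically from t_start to t_end
        temp = t_start * (t_end / t_start) ** (i / (n_steps - 1))
        cand = x + rng.gauss(0.0, 0.3)           # random trial move
        fc = misfit(cand)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

best_x, best_f = anneal(x0=4.0)
print(f"best parameter {best_x:.3f}, misfit {best_f:.5f}")
```

The many thousands of misfit evaluations used here are cheap for a toy function, but each one would correspond to a full scattering calculation in a real structural search, which is the source of the larger computing cost mentioned above.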


B1.21.8 PRESENT CAPABILITIES AND OUTLOOK 

Surface crystallography started in the late 1960s, with the simplest possible structures being solved by LEED 
[14]. Such structures were the clean Ni(111), Cu(111) and Al(111) surfaces, which are unreconstructed and 
essentially unrelaxed, i.e. very close to the ideal termination of the bulk shown in figure B1.21.1(a): typically, 
only one unknown structural parameter was fitted to experiment, namely the spacing between the two 
outermost atomic layers. 

Progress in experiment, theory, computational methods and computer power has contributed to the capability 
to solve increasingly complex structures [28, 29]. Figure B1.21.5 quantifies this progress with three measures 
of complexity, plotted logarithmically: the achievable two-dimensional unit cell size, the achievable number 
of fit parameters and the achievable number of atoms per unit cell per layer. All of these measures have grown 
from 1 for simple clean metal surfaces, like Ni(111) (see figure B1.21.1(a)), to about 50-100 in the case of the 
reconstructed Si(111)-(7 × 7) surface, the most complicated structure examined to date [30] (note that the basic 
model which solved the Si(111)-(7 × 7) surface was mainly derived from another diffraction study, using TED [9]). 
All these measures thus exhibit a progression by about two orders of magnitude over less than 25 years. 

Figure B1.21.5. Evolution with time of the complexity of structural determination achievable with LEED. 
The unit cell area is measured relative to the unit cell area of the simple (1 × 1) structures studied in the early 
days: thus an (n × n) superstructure (due to reconstruction and/or adsorption) has a unit cell size of n². A (7 × 
7) structure gives a complexity of 49 on this scale. The number of fit parameters measures the number of 
coordinates fitted to experiment in any given structure: a value of 1 was typical of many early determinations, 
when only one interlayer spacing was fitted to experiment. The Si(111)-(7 × 7) structure has over 100 fit 
parameters, if one allows only those structural changes in the top two double layers and the adatom layer that 
maintain the p3m1 symmetry of the substrate. The number of atoms per unit cell refers to so-called composite 
layers, which are groups of closely spaced layers: this number dramatically affects computation time in 
multiple-scattering methods. It has grown from 1 in the simplest structures to about 100 in the Si(111)-(7 × 7) 
structure. 


Figures B1.21.6, B1.21.7, B1.21.8 and B1.21.9 show several of the more complex 
structures solved by LEED in recent years. They exhibit various effects observed at surfaces: 

• clustering of adatoms in Re(0001)-(2√3 × 2√3)R30°-6S [31], see figure B1.21.6; 

• hollow-site adsorption and adsorbate-induced relaxations of substrate atoms both in Re(0001)-(2√3 × 2√3)R30°-6S 
[31] and in Mo(100)-c(4 × 2)-3S [32], see figure B1.21.7; 

• adsorbate-induced reconstruction as well as substitutional adsorption in Cu(100)-(4 × 4)-10Li [33], see 
figure B1.21.8; note that this is the most complex surface structural determination by LEED to date, involving 
far more adjustable structural parameters than were fitted in the Si(111)-(7 × 7) structure [30]; 

• compound ionic surface with a large bulk unit cell and very large surface relaxations in Fe3O4(111) [34], see 
figure B1.21.9. 

Further progress towards solving more complex surface structures is possible. The biggest challenge on the 
computational and theoretical side is the identification of the globally optimum structure. Holographic and 
other methods have not yet provided a convenient way to accomplish this, and would actually fail with 
structures that have the complexity of Cu(100)-(4 × 4)-10Li and Si(111)-(7 × 7). Global-search 
algorithms, like simulated annealing and genetic algorithms, may provide workable, if perhaps not cheap, 
solutions. 

Figure B1.21.6. Side and top views of the best-fit structure of the Re(0001)-(2√3 × 2√3)R30°-6S surface 
structure (with a half-monolayer coverage of sulfur), as determined by LEED [31]. A (2√3 × 2√3)R30° unit 
cell is outlined in the top view. Sulfur atoms are drawn as small open circles, Re atoms as large grey circles. 
Sulfur-sulfur distances in a ring of six alternate between 2.95 and 3.32 Å, expanded from the unrelaxed 
distance between hollow sites of 2.75 Å. Arrows represent lateral relaxations in the topmost metal layer, with 
the scale of displacements indicated by the lone arrow on the right. The bulk interlayer spacing in Re(0001) is 
2.23 Å. Shades of grey identify atoms that are equivalent by symmetry in the sulfur and outermost rhenium 
layers. The darkest-grey rhenium atoms forming a triangle within a sulfur ring are pulled out of the surface by 
the adsorbed sulfur, relative to the lighter-grey rhenium atoms in the same layer. 

Figure B1.21.7. Top and side views of the best-fit structure of the Mo(100)-c(4 × 2)-3S surface structure 
(with a 3/4-monolayer coverage of sulfur), as determined by LEED [32]. A c(4 × 2) unit cell is outlined in the 
top view. The sulfur sizes (small black and dark grey circles) have been reduced from covalent for clarity, 
while the molybdenum atoms (large circles) are drawn with touching radii. The same cross-hatching has been 
assigned to molybdenum atoms that are equivalent by symmetry in the topmost two metal layers. Two-thirds 
of the sulfur atoms are displaced away from the centre of the hollow sites in which they are bonded: these 
displacements by 0.13 Å are drawn exaggerated. Arrows in the top view also indicate the directions and 
relative magnitudes of molybdenum atom displacements (these substrate atoms are drawn in their undisplaced 
positions, except for the buckling seen in the second molybdenum layer in the side view). The bulk interlayer 
spacing in Mo(100) is 1.575 Å. 



Figure B1.21.8. Perspective view of the structure of the Cu(100)-(4 × 4)-10Li surface structure (with a 
10/16-monolayer coverage of lithium), as determined by LEED [33]. The atoms are drawn with radii that are 
reduced by about 15% from covalent radii. The surface fragment shown includes four (4 × 4) unit cells. 
Lithium atoms are shown as larger spheres. In each unit cell, four lithium atoms (dark grey) form a flat-topped 
pyramid (as outlined): the lithium atoms rest in hollow sites on a 3 × 3 base of nine Cu atoms (lighter grey). 
Around each pyramid 12 lithium atoms occupy substitutional sites, i.e. have taken the place of Cu atoms: 
these lithium atoms are shown linked by an octagon. Since the lithium atoms are about 15% larger than the 
copper atoms that they replace, fewer lithium atoms can fit in the troughs evacuated by the copper atoms; 
thus, they do not fill the troughs completely, and leave a hole at each intersection between troughs (e.g. at the 
exact centre of the fragment). The lightest-grey atoms underneath are the bulk Cu(100) termination: some 
small local distortions in the atomic positions are also detected by LEED there. This and the following figure 
were produced with the SARCH/LATUSE/PLOT3D/BALSAC software, available from the author. 

Figure B1.21.9. Perspective side and top view of the best-fit structure of Fe3O4(111), as determined by LEED 
[34]. A unit cell is outlined in the top view, in which all atoms are drawn with nearly touching radii, while 
smaller radii are used in the side view. This iron oxide was grown as an ultrathin film on a Pt(111) substrate, 
in order to prevent electrical charging of the surface. The free surface is at the top end of the side view, 
exposing 1/4 monolayer of 'external' iron ions (shown as small light-grey circles in both views). Large circles 
represent oxygen ions, forming hexagonally close-packed layers. In each such layer, one-fourth of the oxygen 
ions (drawn in darkest grey) is not coplanar with the others: in particular, in the outermost oxygen layer, these 
ions are raised outward by a large amount (0.42 Å, compared to 0.04 Å in the opposite direction in the bulk). 
Small circles below the surface represent iron ions in tetrahedral or octahedral interstitial positions between 
the O layers: the lightest-grey of these are in tetrahedral positions. Interlayer spacings as determined by LEED 
are given at the right with error bars and with corresponding bulk values in parentheses. 

On the experimental side, a larger measured database is required than is commonly available to determine the 
large number of structural parameters to be fitted. For instance, LEED calculations have been attempted for the 
Si(111)-(7 × 7) surface in order to fit the many tens of unknown structural parameters; however, the amount of 
experimental data was insufficient for the task, resulting in a multitude of locally optimum structures, without 
the ability to discriminate between them. Increasing the database size can be achieved by extending the energy 
range to higher energies, or by acquiring data at a number of different incidence directions: either way, the 
calculations become disproportionately more time-consuming, because the computing effort rises quickly 
with energy and with non-symmetrical off-normal incidence directions. 


B1.22 Surface characterization and structural 
determination: optical methods 

Francisco Zaera 


B1.22.1 INTRODUCTION 

As discussed in more detail elsewhere in this encyclopaedia, many optical spectroscopic methods have been 
developed over the last century for the characterization of bulk materials. In general, optical spectroscopies 
make use of the interaction of electromagnetic radiation with matter to extract molecular parameters from the 
substances being studied. The methods employed usually rely on the examination of the radiation absorbed, 
emitted or scattered by a system, and may be based on simple linear optical processes, resonance transitions 
and/or nonlinear processes. Molecular spectroscopy probes energy transitions at all levels, from the excitation 
of spins in the radiofrequency range (NMR), to rotational (microwave), vibrational (infrared) and electronic 
valence (visible-UV) and core (x-rays) excitations. Additional diffraction- and polarization-based techniques 
provide structural information and laser-based pump-probe methods allow for the study of molecular 
dynamics down to the femtosecond time scale. 

In spite of the wide range of applications of optical techniques for the study of bulk samples, however, they 
have so far found only limited use in the characterization of surfaces. One of the main reasons for this is the 
fact that it is quite difficult to discriminate optical signals originating from the surface from those arriving 
from the bulk of a given material. To illustrate this problem, imagine a typical metal sample consisting of a 
cube one centimetre long on each side. At a density of approximately 10 g cm⁻³, this represents about 0.1 
moles (for molybdenum, to pick an example), or approximately 6 × 10²² atoms, and of those about 1 × 10¹⁶, 
that is, only one in six million atoms, are on the surface of the cube. This means that if one wants to 
selectively characterize a surface phenomenon, one would need to develop a technique with a large dynamic 
range (of at least seven orders of magnitude) and/or the ability to discriminate between signals from surface 
and bulk elements. 
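
The arithmetic behind this estimate can be reproduced with a short calculation. The sketch below uses the round density quoted in the text and the molar mass of molybdenum; the areal density of surface atoms is not given in the text, so it is estimated here from the bulk number density raised to the 2/3 power, which is an assumption adequate only for an order-of-magnitude check.

```python
AVOGADRO = 6.02214076e23    # atoms per mole
density = 10.0              # g cm^-3, the round value used in the text
molar_mass = 95.95          # g mol^-1, molybdenum
edge = 1.0                  # cube edge in cm

volume = edge ** 3                                   # cm^3
moles = density * volume / molar_mass                # about 0.1 mol
bulk_atoms = moles * AVOGADRO                        # about 6e22 atoms

# Assumption: atoms per cm^2 of surface ~ (atoms per cm^3)^(2/3)
areal_density = (bulk_atoms / volume) ** (2.0 / 3.0)
surface_atoms = 6 * edge ** 2 * areal_density        # six faces of the cube
ratio = bulk_atoms / surface_atoms

print(f"moles of Mo:        {moles:.3f}")
print(f"atoms in the cube:  {bulk_atoms:.2e}")
print(f"atoms on the faces: {surface_atoms:.2e}")
print(f"one surface atom per {ratio:.1e} atoms")
```

Under these assumptions the result is consistent with the figures quoted above: roughly 10¹⁶ surface atoms out of about 6 × 10²², that is, one atom in several million.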

Moreover, with the advent of relatively cheap vacuum technologies over the past decades, physicists have 
been able to develop a large number of alternative particle-based (electrons, ions, atoms) techniques capable 
of selectively probing solid surfaces. Most particles interact strongly with matter, and therefore cannot 
penetrate deeply into the substance being probed. Consequently, whatever information can be obtained from 
the interactions of those particles with solid samples, it must be related to the properties of the surface. The 
same argument does not work as well with optical techniques, because photons penetrate through most 
substances to depths comparable to their wavelength, microns in the case of IR radiation. In order to 
overcome this difficulty, alternative ways have been devised to gain surface sensitivity with optical 
spectroscopies. Among them are the following: 

(1) Increasing the surface-to-bulk ratio of the sample to be studied. This is easily done in the case of highly 
porous materials, and has been exploited for the characterization of supported catalysts, zeolites, sol-gels 
and porous silicon, to mention a few. 

(2) Taking advantage of the intrinsic physical and chemical differences of surfaces introduced by the 
discontinuity of the bulk environment. Specifically, most solids display specific structural relaxations 
and reconstructions, surface phonons and surface electronic states easy to discriminate from those of 
the bulk. A clearer surface 
specificity is introduced in the study of adsorbates by the uniqueness of the molecules present at the 
interface. 

(3) Taking advantage of the symmetry changes induced by the presence of a surface. Many nonlinear 
techniques rely on the fact that the surface breaks the centrosymmetrical nature of the bulk. The use of 
polarized light can also discriminate among dipole moments in different orientations. 

(4) Illuminating the sample at grazing angles. The penetration depth of photons depends on the cosine of the 
incidence angle and, therefore, can be reduced by this procedure. Although such an approach has limited 
use, it has been successfully employed in a few instances, such as for x-ray diffraction experiments. 

The power of optical spectroscopies is that they are often much better developed than their electron-, ion- and 
atom-based counterparts, and therefore provide results that are easier to interpret. Furthermore, photon-based 
techniques are uniquely poised to help in the characterization of liquid-liquid, liquid-solid and even solid- 
solid interfaces generally inaccessible by other means. There has certainly been a renewed interest in the use 
of optical spectroscopies for the study of more 'realistic' systems such as catalysts, adsorbates, emulsions, 
surfactants, self-assembled layers, etc. 


In this chapter we review some of the most important developments in recent years in connection with the use 
of optical techniques for the characterization of surfaces. We start with an overview of the different 
approaches available to the use of IR spectroscopy. Next, we briefly introduce some new optical 
characterization methods that rely on the use of lasers, including nonlinear spectroscopies. The following 
section addresses the use of x-rays for diffraction studies aimed at structural determinations. Lastly, passing 
reference is made to other optical techniques such as ellipsometry and NMR, and to spectroscopies that only 
partly depend on photons. 


B1.22.2 IR SPECTROSCOPY 

Perhaps the optical technique most used for surface characterization has been infrared (IR) spectroscopy. The 
reason for this may very well be because the vibrational modes identified by the interaction of IR radiation 
with matter are among the most specific and thus the most informative for chemical characterization. Not only 
can vibrational frequencies be easily identified with specific localized vibrational groups within a molecule 
(metal-adsorbate vibrations, O-H stretches, C-C-C deformation modes, etc), but they also depend strongly 
on the local environment in which the probed moiety is placed [1, 2]. The use of IR spectroscopy was greatly 
enhanced by the development of Fourier-transform (FTIR) spectrometers in the early 1970s, an event that 
brought about an enormous improvement in performance in terms of sensitivity, acquisition time, dynamic 
range and ease of data processing (spectra ratioing in particular) over the conventional scanning apparatus; 
this made the extension of IR spectroscopy to difficult systems quite feasible. The several experimental 
approaches pursued for the implementation of IR spectroscopy in surface studies include straight 
transmission, diffuse reflectance, reflection-absorption, attenuated total reflectance and emission. Each of 
these is discussed in some detail below. 


B1.22.2.1 TRANSMISSION IR SPECTROSCOPY 

The most common use of spectroscopy in general is in its transmission mode, and this was the first method 
employed for surface characterization as well. The pioneering work of Terenin et al on porous glasses [3] and 
of Eischens and others on chemisorption over supported metals [4] has already been reviewed in the past [5, 
6]. Extensive studies have been carried out since on the characterization of catalysts upon chemisorption of 
many reactants, from simple molecules such as carbon and nitrogen oxides to hydrocarbons and other 
complex species [7, 8]. In a recent use of transmission IR absorption to surface problems, the reactivity of 
silicon towards water and other gases was addressed by first creating highly reproducible porous surfaces by 
the controlled etching of silicon single crystals [9]. Unfortunately, the general application of transmission IR 
spectroscopy to surface studies faces some significant limitations, in particular the need for high-surface-area 
solids (which usually have quite heterogeneous and ill characterized surfaces) and the restricted range of 
frequencies available away from the regions where the solid absorbs (above 1300 cm⁻¹ for silica, 1050 cm⁻¹ 
for alumina, 1200 cm⁻¹ for titania, 800 cm⁻¹ for magnesia). 

B1.22.2.2 DIFFUSE-REFLECTANCE IR SPECTROSCOPY 

Another useful technique for the IR characterization of surfaces in powders is diffuse-reflectance IR 
spectroscopy (DRIFTS) [10]. In the past, the challenge in using this approach has been in the development of 
efficient optics to collect the diffuse reflected radiation from the sample once illuminated with a focused IR 
beam, but nowadays this problem has been solved, and several cell designs are available commercially for this 
endeavour [11]. DRIFTS has, in theory, several advantages over conventional transmission arrangements. 
First, loose powders can be used without the need to press them into pellets, thus avoiding any sample 
distortions due to severe physical treatments, allowing for better exposures of the surface to adsorbates, and 
avoiding losses in the high-frequency range due to light scattering. Second, band intensities in the DRIFTS 
mode can be as much as four times more intense than in the transmission mode, possibly because of the 
potential multiple internal reflection of the light in the vicinity of the surface before its emergence towards the 
detector. Lastly, DRIFTS is better for opaque samples than transmission IR spectroscopy, although the diffuse 
reflectance may still be low in spectral regions where the absorptivity of the substrate is high. On the negative 
side, there is a potential lack of reproducibility in the intensities of the DRIFTS bands because of variations in 
scattering coefficients with cell geometry and sample-loading procedure. Furthermore, diffuse-reflectance 
spectroscopy suffers from the same key limitation as transmission IR spectroscopy, namely, it requires 
high-surface-area samples, and therefore provides only spectra averaged over many types of local surface ensembles 
and adsorption sites. 

The use of DRIFTS for the characterization of surfaces has to date been limited, but the technique has recently been used 
for applications in fields as diverse as sensors development [12], soils science [13], forensic chemistry [14], 
corrosion [15], wood science [16] and art [17]. Given that there is in general no reason for preferring 
transmission over diffuse reflectance in the study of high-area powder systems, DRIFTS is likely to become 
much more popular in the near future. 


B1.22.2.3 REFLECTION-ABSORPTION IR SPECTROSCOPY 

The best way to perform IR spectroscopy studies on small samples is in the reflection-absorption (RAIRS) or 
attenuated total-reflectance (ATR) modes, which work best for opaque and transparent substrates, 
respectively. RAIRS has in fact become the method of choice for the study of adsorbates on well 
characterized metal samples, including single crystals. The first attempt to obtain spectra from adlayers on 
bulk metal samples was that of Pickering and Eckstrom, who in 1959 looked at the adsorption of carbon 
monoxide and hydrogen on metal films by using a multiple reflection technique with an incoming beam at 
close to normal incidence to the surface [18]. It soon became clear that better spectra could be obtained by 
using glancing incidence angles instead [19], and that the gain from using multiple reflection was not worth 
the complications connected with the required experimental set-up (the optimum number of reflections 
usually varies between 3 and 10, and results in signal intensity increases of only about 30-50% compared to 
those from single reflection) [20]. The theory for IR radiation reflection at metal surfaces was later developed 
by Greenler, who proved that only the p-polarized component of the incident beam is capable of strong 
interaction with adsorbates on metals, and that interference between that component of the incident and 
reflected rays sets an intense standing field at the surface which can yield an intensity enhancement of a factor 
of up to 25 compared to that from the perpendicularly polarized photons [21]. Many surface scientists have 
since taken advantage of these properties to perform reflection-absorption measurements of monolayers on 
solid metals [22, 23 and 24]. Even though the initial RAIRS experiments were carried out with molecules with 
large dynamic dipole moments such as CO (in order to take advantage of their large absorption cross sections), 
recent FTIR developments have led to the possibility of detecting submonolayer quantities of species with 
much weaker signals, such as hydrocarbons, on single crystals of less than 1 cm² area [25]. 

A recent example of the usefulness of RAIRS for the characterization of supported catalyst surfaces is given 
in figure B1.22.1, which displays spectra obtained for a mixture of carbon and nitrogen monoxides coadsorbed 
on different palladium surfaces [26]. Both CO and NO stretching frequencies are quite sensitive to their 
adsorption sites, so they can be used to probe local surface sites by determining adsorption geometries in an 
analogous way as in organometallic discrete complexes. In this example, signals can be easily seen for the 
two-fold coordination of both CO and NO on Pd(100) surfaces and for three-fold, bridge and atop 
coordination of CO on Pd(111). The peaks from adsorption on single crystals are used as signatures for the 
different planes in palladium particles, so an estimate can be obtained on the relative abundance of (100) 
versus (111) sites available on the supported-metal system. 


On metals in particular, the dependence of the radiation absorption by surface species on the orientation of the 
electrical vector can be fully exploited by using one of the several polarization techniques developed over the 
past few decades [27, 28, 29 and 30]. The idea behind all those approaches is to acquire the p-to-s polarized 
light intensity ratio during each single IR interferometer scan; since the adsorbate only absorbs the p-polarized 
component, that spectral ratio provides absorbance information for the surface species exclusively. 
Polarization-modulation methods provide the added advantage of being able to discriminate between the 
signals due to adsorbates and those from gas or liquid molecules. Thanks to this, RAIRS data on species 
chemisorbed on metals have been successfully acquired in situ under catalytic conditions [31], and even in 
electrochemical cells [32]. 
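
The idea of ratioing the two polarizations can be illustrated with synthetic spectra. The sketch below is idealized and rests on several assumptions: the band positions, widths and heights are illustrative values, the instrument is taken to respond equally to both polarizations, and the gas phase is assumed to absorb both polarizations equally while the adsorbate absorbs only the p-polarized component. Real polarization-modulation schemes use more elaborate normalizations.

```python
import math

def lorentzian(nu, nu0, fwhm, height):
    """Lorentzian band (in absorbance units) centred at nu0."""
    half = fwhm / 2.0
    return height * half ** 2 / ((nu - nu0) ** 2 + half ** 2)

wavenumbers = [1800.0 + 2.0 * i for i in range(201)]          # 1800-2200 cm^-1

# Illustrative bands: strong isotropic gas-phase band, weak adsorbate band
gas = [lorentzian(nu, 2143.0, 30.0, 0.20) for nu in wavenumbers]
ads = [lorentzian(nu, 1970.0, 15.0, 0.01) for nu in wavenumbers]
# Smooth, arbitrary source/detector envelope common to both polarizations
source = [1.0 + 0.0005 * (nu - 1800.0) for nu in wavenumbers]

# s-polarized light is absorbed by the gas only; p-polarized by gas and adsorbate
I_s = [s * 10.0 ** (-g) for s, g in zip(source, gas)]
I_p = [s * 10.0 ** (-(g + a)) for s, g, a in zip(source, gas, ads)]

# The p-to-s ratio cancels the envelope and the gas-phase band
surface_absorbance = [-math.log10(p / s) for p, s in zip(I_p, I_s)]

peak = max(range(len(wavenumbers)), key=lambda i: surface_absorbance[i])
print(f"recovered surface band at {wavenumbers[peak]:.0f} cm^-1, "
      f"absorbance {surface_absorbance[peak]:.4f}")
```

In this idealized case the weak adsorbate band is recovered exactly even though the gas-phase band is twenty times stronger, which is the reason the ratio method can be used in the presence of a gas or liquid phase.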

Figure B1.22.1. Reflection-absorption IR spectra (RAIRS) from palladium flat surfaces in the presence of a 1 
× 10⁻⁶ Torr 1:1 NO:CO mixture at 200 K. Data are shown here for three different surfaces, namely, for Pd 
(100) (bottom) and Pd(111) (middle) single crystals and for palladium particles (about 500 Å in diameter) 
deposited on a 100 Å thick SiO2 film grown on top of a Mo(110) single crystal. These experiments illustrate 
how RAIRS titration experiments can be used for the identification of specific surface sites in supported 
catalysts. On Pd(100) CO and NO each adsorbs on twofold sites, as indicated by their stretching bands at 
about 1970 and 1670 cm⁻¹, respectively. On Pd(111), on the other hand, the main IR peaks are seen around 
1745 cm⁻¹ for NO (on-top adsorption) and about 1915 cm⁻¹ for CO (threefold coordination). Using those two 
spectra as references, the data from the supported Pd system can be analysed to obtain estimates of the relative 
fractions of (100) and (111) planes exposed in the metal particles [26]. 


The polarization dependence of the photon absorbance in metal surface systems also brings about the so-called 
surface selection rule, which states that only vibrational modes with dynamic dipole moments having 
components perpendicular to the surface plane can be detected by RAIRS [22, 23 and 24]. This rule may in 
some instances limit the usefulness of the reflection technique for adsorbate identification because of the 
reduction in the number of modes visible in the IR spectra, but more often becomes an advantage thanks to 
the simplification of the data. Furthermore, the relative intensities of different vibrational modes can be used 
to estimate the orientation of the surface moieties. This has been particularly useful in the study of 
self-assembled and Langmuir-Blodgett monolayers, where RAIRS data have been unique in providing 
information on the orientation of the hydrocarbon chains [33]. Figure B1.22.2 shows an example in which 
RAIRS was used to determine a collective change in adsorption geometry for alkyl halides on metal single 
crystals as the surface coverage is increased past the half-monolayer [34]. 
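
The way relative intensities encode orientation can be made concrete with a minimal geometric sketch. It assumes that the RAIRS intensity of a mode scales with the square of the component of its dynamic dipole along the surface normal, that the symmetric methyl deformation has its dynamic dipole along the C-CH3 axis, and that the asymmetric deformation has its dynamic dipole perpendicular to that axis. The function names and the single tilt angle are hypothetical simplifications.

```python
import math

def relative_intensity(dipole, normal=(0.0, 0.0, 1.0)):
    """Squared projection of a (normalized) dynamic dipole on the surface normal."""
    norm = math.sqrt(sum(c * c for c in dipole))
    proj = sum(d * n for d, n in zip(dipole, normal)) / norm
    return proj ** 2

def methyl_modes(tilt_deg):
    """Relative RAIRS intensities for a C-CH3 axis tilted by tilt_deg from the normal."""
    t = math.radians(tilt_deg)
    axis = (math.sin(t), 0.0, math.cos(t))       # along the C-CH3 axis
    perp = (math.cos(t), 0.0, -math.sin(t))      # perpendicular to the axis, in the tilt plane
    return {
        "symmetric": relative_intensity(axis),   # varies as cos^2(tilt)
        "asymmetric": relative_intensity(perp),  # varies as sin^2(tilt)
    }

flat = methyl_modes(90.0)      # axis parallel to the surface
standing = methyl_modes(0.0)   # axis along the surface normal

print("flat:     ", {k: round(v, 3) for k, v in flat.items()})
print("standing: ", {k: round(v, 3) for k, v in standing.items()})
```

Within this simplified picture a flat-lying axis shows only the asymmetric mode and an upright axis only the symmetric mode, so the ratio of the two intensities gives an estimate of the tilt angle.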

Figure B1.22.2. RAIRS data from molecular ethyl bromide adsorbed on a Pt(111) surface at 100 K. The two 
traces shown, which correspond to coverages of 20% and 100% saturation, illustrate the use of the RAIRS 
surface selection rule for the determination of adsorption geometries. Only one peak, but a different one, is 
observed in each case: while the signal detected at low coverages is due to the asymmetric deformation of the 
terminal methyl group (1431 cm⁻¹), the feature corresponding to the symmetric deformation (1375 cm⁻¹) is 
the one seen at high coverages instead. Given that only vibrations with dynamic dipole moments 
perpendicular to the surface are visible with this technique, it is concluded that a flat adsorption geometry 
prevails at low coverages but that a collective rearrangement of the adsorbates to a standing-up configuration 
takes place at about half-saturation [34]. 


The use of RAIRS has recently been extended from its regular mid-IR characterization of adsorbates on 
metals into other exciting and promising directions. For one, changes in optics and detectors have allowed for 
an extension of the spectral range towards the far-IR region in order to probe substrate-adsorbate vibrations 
[35]. The use of intense synchrotron sources in particular looks quite promising for the detection of such weak 
modes [36]. Thanks to the speed with which Fourier-transform spectrometers can acquire complete IR 
spectra, kinetic studies of surface reactions can be carried out as well. To date this has only been done in a few 
cases, usually for reactions that take seconds or more to occur [37], but the advent of step scanners promises 
the availability of time resolutions of 10⁻⁸ s or better in the near future [38]. In terms of the lifetime of the 
vibrational excitations themselves, this can in some instances be estimated from IR absorption line shapes. 
Because of the efficient coupling between the vibrations of the adsorbate and the phonons and electronic 
states of the surface, the former are generally short-lived, and therefore yield IR absorption bands several 
wavenumbers wide. Nevertheless, bands as narrow as 0.7 cm⁻¹ have been observed in some cases [39]. 
Finally, the use of RAIRS is not limited to metal surfaces. Although the surface selection rules change 
significantly for non-metal surfaces, they can still be used to obtain orientational information for adsorbates 
on transparent substrates, as recently demonstrated in the elegant study by Hoffmann et al on the adsorption 
of long-chain hydrocarbons on silicon (figure B1.22.3) [40], and even for the analysis of air-liquid interfaces 
[41]. There are many clear new directions still unexplored for the use of RAIRS in surface characterization 
studies. 
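
The link between linewidth and lifetime mentioned above can be put into numbers with a simple estimate. The sketch assumes a Lorentzian band whose full width at half maximum is due entirely to population decay, so that the lifetime is 1/(2πcΔν), with Δν the width in wavenumbers. Because dephasing and inhomogeneous broadening also contribute to real linewidths, the value obtained should be read as a lower bound on the lifetime; the function name is hypothetical.

```python
import math

C_CM_PER_S = 2.99792458e10   # speed of light in cm/s

def lifetime_from_fwhm(fwhm_cm1):
    """Lifetime-limited estimate tau = 1 / (2 pi c dnu) for a Lorentzian of width dnu (cm^-1)."""
    return 1.0 / (2.0 * math.pi * C_CM_PER_S * fwhm_cm1)

for width in (5.0, 0.7):     # a typical band "several wavenumbers wide" and the narrow case
    tau_ps = lifetime_from_fwhm(width) * 1e12
    print(f"FWHM {width:4.1f} cm^-1 -> lifetime of at least about {tau_ps:.1f} ps")
```

On this basis a band a few wavenumbers wide corresponds to a lifetime of the order of a picosecond, while the narrowest bands quoted above imply lifetimes of several picoseconds or longer.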

Figure B1.22.3. RAIRS data in the C-H stretching region from two different self-assembled monolayers, 
namely, from a monolayer of dioctadecyldisulfide (ODS) on gold (bottom), and from a monolayer of 
octadecyltrichlorosilane (OTS) on silicon (top). Although the RAIRS surface selection rules for non-metallic 
substrates are more complex than those which apply to metals, they can still be used to determine adsorption 
geometries. The spectra shown here were, in fact, analysed to yield the tilt (α) and twist (β) angles of the 
molecular chains in each case with respect to the surface plane (the resulting values are also given in the 
figure) [40]. 

B1.22.2.4 ATTENUATED TOTAL REFLECTANCE IR SPECTROSCOPY 


In 1960, Harrick demonstrated that, for transparent substrates, absorption spectra of adsorbed layers could be 
obtained using internal reflection [42]. By cutting the sample in a specific trapezoidal shape, the IR beam can 
be made to enter through one end, bounce internally a number of times from the flat parallel edges, and exit 
the other end without any losses, leading to high absorption coefficients for the species adsorbed on the 
external surfaces of the plate (higher than in the case of external reflection) [24]. This is the basis for the ATR 
technique. 


In recent years, ATR has been used primarily in connection with the characterization of semiconductor 
surfaces. For instance, ATR studies have led to the detailed mapping of the complex series of reconstructions 
that silicon surfaces follow upon thermal treatment and/or hydrogen exposures [43]. Surface electronic 
excitations have been studied with this technique as well; see, for instance, the pioneering work of McCombe 
et al on the characterization of inter-subband optical transitions in silicon MOS field-effect transistors (figure 
B1.22.4) [44]. One interesting additional extension of the use of multiple internal reflection to the 
characterization of non-transparent samples was discussed by Bermudez, who suggested that the sensitivity to 
adsorbates in IR-reflection spectroscopy can be enhanced by burying a metal layer beneath the surface of a 
dielectric material [45]. 

Figure B1.22.4. Differential IR absorption spectra from a metal-oxide silicon field-effect transistor 
(MOSFET) as a function of gate voltage (or inversion layer density, n_s, which is the parameter reported in the 
figure). Clear peaks are seen in these spectra for the 0-1, 0-2 and 0-3 inter-electric-field subband transitions 
that develop for charge carriers when confined to a narrow (<100 Å) region near the oxide-semiconductor 
interface. The inset shows a schematic representation of the attenuated total reflection (ATR) arrangement 
used in these experiments. These data provide an example of the use of ATR IR spectroscopy for the probing 
of electronic states in semiconductor surfaces [44]. 


B1.22.2.5 OTHER SURFACE IR SPECTROSCOPY ARRANGEMENTS 


There have been a few other experimental set-ups developed for the IR characterization of surfaces. 
Photoacoustic (PAS), or, more generally, photothermal IR spectroscopy relies on temperature fluctuations 
caused by irradiating the sample with a modulated monochromatic beam: the acoustic pressure wave created 
in the gas layer adjacent to the solid by the absorption of light is measured as a function of photon wavelength 
in order to determine the absorption spectra [11]. It has sometimes been thought that PAS is more surface 
sensitive than DRIFTS, but in fact that depends on the specific optical and thermal properties of the material 
being studied. In emission spectrometry (EMS), the IR radiation emitted by the sample is directly collected 
and analysed. The detection of the (non-monochromatized) IR radiation from thin films has recently been 
combined with molecular beam techniques in order to perform differential microcalorimetric measurements 
on adsorption processes [46]. Finally, the sample itself can be used as the detector of IR radiation. None of 
these techniques have found much use in surface studies to date. 


B1.22.3 LASER-BASED SPECTROSCOPIES 

Although the development of a large variety of lasers with different spectral ranges, intensities and temporal 
resolutions has led to the surge of many new optical characterization techniques, most of those have yet to 
make a large impact in surface science. As discussed above, the signals from surfaces are often weak and hard 
to differentiate from those from the bulk, and this is particularly troublesome in nonlinear techniques which 
rely on the absorption of more than one photon. Furthermore, the increase of the laser power to levels where 
signal intensities are no longer an issue may lead to damaging of the substrate. In spite of these limitations, 
some laser-based methods have already been developed for surface-characterization studies. 

B1.22.3.1 RAMAN SPECTROSCOPY 

Perhaps the best known and most used optical spectroscopy which relies on the use of lasers is Raman 
spectroscopy. Because Raman spectroscopy is based on the inelastic scattering of photons, the signals are 
usually weak, and are often masked by fluorescence and/or Rayleigh scattering processes. The interest in 
using Raman for the vibrational characterization of surfaces arises from the fact that the technique can be used 
in situ under non-vacuum environments, and also because it follows selection rules that complement those of 
IR spectroscopy. 

Regular Raman has been employed mainly for the characterization of high-surface-area solids [47]. 
Specifically, a good methodology has been developed for the determination of bond orders, bond lengths and 
local geometries in many metal oxides used for catalysis. Figure B1.22.5 illustrates this point by displaying 
some examples where the Raman vibrational signals for metal-oxygen single and double bonds as well as for 
oxygen-metal-oxygen deformations were used to determine the structure of a number of supported and 
highly dispersed transition-metal oxides [48]. 

Figure B1.22.5. In situ Raman spectra from a family of transition-metal oxides dispersed on high-surface-area 
alumina substrates. Three distinct regions can be differentiated in these spectra, namely, the peaks around 
1000 cm⁻¹, which are assigned to the stretching frequency of terminal metal-oxygen double bonds, the 
features about 900 cm⁻¹, corresponding to metal-oxygen stretches in tetrahedral coordination sites, and the 
low-frequency (<400 cm⁻¹) range associated with oxygen-metal-oxygen deformation modes. Data such as 
these can be used to determine the nature and geometry of supported oxides as a function of metal loading and 
subsequent treatment [48]. 

In an interesting development in Raman spectroscopy, Fleischmann et al noticed in 1974 that there is a 
significant enhancement in the Raman signal intensities from solid surfaces if the substrate is comprised of 
small silver particles [49]. The same phenomenon has since been observed with copper, silver, gold, lithium, 
sodium, potassium, indium, platinum and rhodium, and has become the basis for surface-enhanced Raman 
spectroscopy (SERS) [50, 51]. The reasons for this enhancement are still not completely clear, but have been 
recognized to be the result of a combination of effects, including a surface electromagnetic field enhancement 
(in particular when illuminating rough samples with photons of energies near those of localized plasmons) and 
a chemical enhancement due to the change of polarizability in molecules when interacting with surfaces [52]. 

Since its initial development, SERS has been used for the surface characterization of a good number of 
systems. One important extension to the use of SERS has been in the determination of surface geometries. 
Figure B1.22.6 shows an example of the SERS C-H stretching frequency data used to determine the different 
chemisorption geometries of 2-butanol and 2-butanethiol on silver electrodes [53]. Notice that, being an optical 
technique, SERS works quite well in solid-liquid interfaces. On the other hand, the need for signal enhancement 
normally limits the use of SERS to a handful of metals and/or to samples with rough surfaces. 

Figure B1.22.6. Raman spectra in the C-H stretching region from 2-butanol (left frame) and 2-butanethiol 
(right), each either as bulk liquid (top traces) or adsorbed on a rough silver electrode surface (bottom). An 
analysis of the relative intensities of the different vibrational modes led to the proposed adsorption structures 
depicted in the corresponding panels [53]. This example illustrates the usefulness of Raman spectroscopy for 
the determination of adsorption geometries, but also points to its main limitation, namely the need to use 
rough silver surfaces to achieve adequate signal-to-noise levels. 

Another recent development in the use of Raman spectroscopy for the characterization of surfaces has been 
the employment of UV light for the initial excitation of the sample [54]. The advantage of UV over 
conventional Raman spectroscopy is twofold: (1) since the normal Raman scattering cross sections are 
proportional to the fourth power of the scattered light frequency, the use of higher-energy photons 
significantly increases the signal intensity and (2) by using UV light the spectral range is moved away from 
that where fluorescence de-excitation is observed. This allows for the Raman characterization of virtually any 
high-surface-area sample, including opaque solids such as black carbon. Unfortunately, UV-Raman 
spectroscopy is still not commercially available. 
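
The fourth-power frequency dependence mentioned above can be made concrete with a short calculation. The following is a minimal sketch, not part of the original text: the function name, the 244 nm and 514.5 nm excitation wavelengths and the 1000 cm⁻¹ Stokes shift are illustrative assumptions, and the estimate ignores resonance effects, detector response and sample absorption.

```python
def raman_nu4_gain(lambda_uv_nm, lambda_vis_nm, shift_cm1=0.0):
    """Ratio of the nu^4 factors of the scattered light for two excitation wavelengths.

    Wavelengths are in nm; shift_cm1 is the Stokes shift in cm^-1.
    """
    def scattered_wavenumber(lambda_nm):
        # Excitation wavenumber (cm^-1) minus the vibrational (Stokes) shift.
        return 1.0e7 / lambda_nm - shift_cm1

    ratio = scattered_wavenumber(lambda_uv_nm) / scattered_wavenumber(lambda_vis_nm)
    return ratio ** 4

# Assumed example: UV excitation at 244 nm versus visible excitation at 514.5 nm,
# for a band with a 1000 cm^-1 Stokes shift.
gain = raman_nu4_gain(244.0, 514.5, shift_cm1=1000.0)
print(f"nu^4 intensity gain: about {gain:.0f}x")
```

Halving the excitation wavelength alone therefore buys a factor of 16 in the normal scattering cross section, before any gain from the reduced fluorescence background.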

B1.22.3.2 OTHER NONLINEAR OPTICAL TECHNIQUES 

Other nonlinear optical spectroscopies have gained much prominence in recent years. Two techniques in 
particular have become quite popular among surface scientists, namely, second harmonic (SHG) [55] and 
sum-frequency (SFG) [56] generation. The reason why both SHG and SFG can probe interfaces selectively 
without being overwhelmed by the signal from the bulk is that they rely on second-order processes that are 
electric-dipole forbidden in centrosymmetric media; by breaking the bulk symmetry, the surface places the 
molecular species in an environment where their second-order nonlinear susceptibility, the term responsible 
for the generation of SHG and SFG signals, becomes non-zero. 

In SHG the sample is illuminated with light of a single colour and the component at twice the initial frequency 
is filtered from the emitted light and analysed. Because these experiments usually involve near-IR or 
visible-UV photons, SHG most often probes electronic transitions. In fact, SHG has often been used as a way to 
measure changes in work function or localized electrostatic surface potentials. When using polarized light, 
SHG can also be used to determine the geometrical alignment of polar molecules at interfaces and, by 
sweeping the incident photon energy, spectroscopic information can be obtained on molecular orbital energies 
as well. Figure B1.22.7 shows an example of the latter for the case of rhodamine 6G [55]. This figure also 
shows a clever extension of the technique as a microscope to provide spatial information on adsorbates. 

Figure B1.22.7. Left: resonant second-harmonic generation (SHG) spectrum from rhodamine 6G. The inset 
displays the resonant electronic transition induced by the two-photon absorption process at a wavelength of 
approximately 350 nm. Right: spatially resolved image of a laser-ablated hole in a rhodamine 6G dye 
monolayer on fused quartz, mapped by recording the SHG signal as a function of position in the film [55]. 
SHG can be used not only for the characterization of electronic transitions within a given substance, but also 
as a microscopy tool. 

By combining two beams on the surface, one of visible or IR fixed frequency and a second, of variable energy 
in the IR region, resonance absorption can be measured by detecting the intensity of the outgoing light 
resulting from the addition of the two incident beams as a function of the photon energy of the variable laser. 
The net effect of this SFG is the acquisition of vibrational absorption spectra for surface species where signals 
are seen only for the modes active in both IR and Raman. Vibrational information can be obtained with SFG 
for almost any interface as long as lasers are available to cover the frequency range of interest and the bulk 
materials are transparent to the laser light. Also, as with many of the other techniques described above, 
orientational information can be obtained with SFG as well. Figure B1.22.8 displays data demonstrating that 
an increase in the concentration of acetonitrile dissolved in water leads to a collective molecular orientation 
change at the air/water interface [57]. 

Figure B1.22.8. Sum-frequency generation (SFG) spectra in the C≡N stretching region from the air/aqueous 
acetonitrile interfaces of two solutions with different concentrations. The solid curve is the IR transmission 
spectrum of neat bulk CH₃CN, provided here for reference. The polar acetonitrile molecules adopt a specific 
orientation in the air/water interface with a tilt angle that changes with changing concentration, from 40° from 
the surface normal in dilute solutions (molar fractions less than 0.07) to 70° at higher concentrations. This 
change is manifested here by the shift in the C≡N stretching frequency seen by SFG [57]. SFG is one of the 
very few techniques capable of probing liquid/gas, liquid/liquid, and even liquid/solid interfaces. 
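
The frequency bookkeeping behind SHG and SFG can be sketched in a few lines: SHG light appears at twice the incident frequency, and SFG light at the sum of the two incident frequencies. The example below is illustrative only. The function names, the 532 nm visible beam and the 2250 cm⁻¹ IR setting (roughly the nitrile stretching region) are assumptions, not values taken from the experiments described here.

```python
def shg_wavelength_nm(lambda_nm):
    """Second-harmonic output: twice the frequency, so half the wavelength."""
    return lambda_nm / 2.0

def sfg_wavelength_nm(lambda_vis_nm, ir_wavenumber_cm1):
    """Sum-frequency output wavelength; wavenumbers (cm^-1) of the two beams add."""
    vis_wavenumber_cm1 = 1.0e7 / lambda_vis_nm
    return 1.0e7 / (vis_wavenumber_cm1 + ir_wavenumber_cm1)

# Assumed example: fixed 532 nm visible beam, tunable IR beam set to 2250 cm^-1.
print(f"SFG output: {sfg_wavelength_nm(532.0, 2250.0):.1f} nm")
print(f"SHG of 700 nm light: {shg_wavelength_nm(700.0):.0f} nm")
```

Because the SFG output is blue-shifted from the visible beam, it can be separated from both incident beams with filters, while scanning the IR wavenumber traces out the vibrational spectrum.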

There are a few other surface-sensitive characterization techniques that also rely on the use of lasers. For 
instance, surface-plasmon resonance (SPR) measurements have been used to follow changes in surface optical 
properties as a function of time as the sample is modified by, for instance, adsorption processes [58]. SPR has 
proven useful to image adsorption patterns on surfaces as well [59]. 

B1.22.3.3 TIME-RESOLVED PUMP-PROBE EXPERIMENTS 


The dynamics of fast processes such as electron and energy transfers and vibrational and electronic 
de-excitations can be probed by using short-pulsed lasers. The experimental developments that have made 
possible the direct probing of molecular dissociation steps and other ultrafast processes in real time (in the 
femtosecond time range) have, in a few cases, been extended to the study of surface phenomena. For instance, 
two-photon photoemission has been used to study the dynamics of electrons at interfaces [60]. Vibrational 
relaxation times have also been measured for a number of modes such as the O-H stretching in silica and the 
C-O stretching in carbon monoxide adsorbed on transition metals [61]. Pump-probe laser experiments such as 
these are difficult, but the field is still in its infancy, and much is expected in this direction in the near future. 


B1.22.4 X-RAY DIFFRACTION AND X-RAY ABSORPTION 

Because x-rays are particularly penetrating, they are very useful in probing solids, but are not as well suited 
for the analysis of surfaces. X-ray diffraction (XRD) methods are nevertheless used routinely in the 
characterization of powders and of supported catalysts to extract information about the degree of crystallinity 
and the nature and crystallographic phases of oxides, nitrides and carbides [62, 63]. Particle size and 
dispersion data are often acquired with XRD as well. 

One way to obtain surface sensitivity in XRD experiments with crystalline samples is to illuminate the 
substrate at glancing angles. Under normal conditions, x-rays projected onto the sample at incident angles of 
less than 10° still penetrate to a depth of 10 μm or more but, below a critical grazing angle, x-ray photons are 
completely reflected from the surface and light propagation into the solid is via a rapidly attenuating 
evanescent wave, and this renders the x-ray probe quite surface sensitive [64]. There are several inherent 
difficulties in implementing grazing-angle XRD experiments, which require focusing high-intensity x-rays 
onto surfaces at angles of the order of 0.1°, but recent experiments have proved the usefulness of this 
technique in providing interesting information on the structure of robust surfaces such as oxides, nitrides, 
silicides and other thin films. 

A related technique that also relies on the interference of x-rays for solid characterization is extended x-ray 
absorption fine structure (EXAFS) [65, 66]. Because the basis for EXAFS is the interference of outgoing 
photoelectrons with their scattered waves from nearby atoms, it does not require long-range order to work (as 
opposed to diffraction techniques), and provides information about the local geometry around specific atomic 
centres. Unfortunately, EXAFS requires the high-intensity and tunable photon sources typically available only 
at synchrotron facilities. Further limitations to the development of surface-sensitive EXAFS (SEXAFS) have 
come from the fact that it requires technology entirely different from that of regular EXAFS, involving in 
many cases ultrahigh-vacuum environments and/or photoelectron detection. One interesting advance in 
SEXAFS came with the design by Stöhr et al of fluorescence detectors for the x-rays absorbed by the surface 
species of small samples; that allows for the characterization of well defined systems such as single crystals 
under non-vacuum conditions [67]. Figure B1.22.9 shows the S K-edge x-ray absorption data obtained for a 
c(2 × 2)S-Ni(100) overlayer using their original experimental set-up. This approach has since been extended to 
the analysis of lighter atoms (C, O, F) on many different substrates and under atmospheric pressures. 

Figure B1.22.9. Fluorescence-yield, surface-extended x-ray absorption fine structure (FY-SEXAFS) spectra 
for the S K-edge of a c(2 × 2) ordered monolayer of sulfur atoms adsorbed on a Ni(100) surface. The upper 
trace in the left panel corresponds to the raw data obtained at an incident photon angle of 20° from the surface 
plane, while the bottom trace displays the background-subtracted and normalized SEXAFS oscillations 
calculated from the original spectrum. The right panel, which corresponds to the Fourier-transformed 
SEXAFS data, provides information on both the Ni-S bond length (2.22 Å) and the Ni near-neighbour 
coordination number around each sulfur atom (4) [67]. Because EXAFS relies on the interference of an 
outgoing photoelectron with its own scattering from nearby atoms, it provides local geometry information 
without requiring long-range order. This example also illustrates the high sensitivity of the technique (these 
experiments were carried out with a 1 cm² area single crystal). The use of fluorescence detection allows for 
the extension of this type of study to samples in non-vacuum environments. 
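
The way a Fourier transform turns EXAFS oscillations into a bond length can be illustrated with a toy calculation. The sketch below is a deliberately simplified model and not the analysis used in [67]: it assumes a single scattering shell with oscillations of the form sin(2kR), ignores the phase shifts, amplitude envelope and damping terms that a real analysis must include, and uses an assumed photoelectron wavevector range of 3 to 12 Å⁻¹. All function names are hypothetical.

```python
import cmath
import math

def model_chi(k_values, bond_length):
    """Toy single-shell EXAFS signal, chi(k) = sin(2 k R); no phase shift or damping."""
    return [math.sin(2.0 * k * bond_length) for k in k_values]

def transform_magnitude(k_values, chi, r):
    """Magnitude of the discrete transform sum_k chi(k) * exp(-2 i k r)."""
    return abs(sum(c * cmath.exp(-2j * k * r) for k, c in zip(k_values, chi)))

# Assumed k range: 3 to 12 inverse angstroms in steps of 0.05.
k_values = [3.0 + 0.05 * i for i in range(181)]
chi = model_chi(k_values, 2.22)  # 2.22 angstroms, the Ni-S distance quoted above

# Scan trial distances from 1 to 4 angstroms and keep the strongest peak.
r_grid = [1.0 + 0.01 * i for i in range(301)]
best_r = max(r_grid, key=lambda r: transform_magnitude(k_values, chi, r))
print(f"transform peaks near R = {best_r:.2f} angstroms")
```

In real data the peak of the transform is displaced from the true distance by the scattering phase shift, which is why experimental analyses calibrate against reference compounds or calculated phases.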


Soon after the development of EXAFS it was recognized that the signal near the x-ray absorption edge is quite 
complex and provides information on electronic transitions from atomic core levels to valence bands and/or 
molecular orbitals [69]. The analysis of that signal constitutes the basis for a technique named near-edge x-ray 
absorption fine structure (NEXAFS, or XANES). The shape of the x-ray absorption spectra near the 
absorption edge has long been used as an empirical fingerprint for the local chemical environment of oxides 
and other supported catalysts, but newer developments allow for the extraction of a more detailed picture of 
the nature and geometrical arrangement of adsorbates from those data. This is possible thanks in great part to 
the combination of the polarized nature of synchrotron radiation and the simplicity of the electronic transition 
dipoles for absorption from core levels [70]. Figure B1.22.10 displays an example where the geometry of 
vinyl moieties adsorbed on Ni(100) surfaces was determined by using NEXAFS [71]. 

Figure B1.22.10. Carbon K-edge near-edge x-ray absorption (NEXAFS) spectra as a function of photon 
incidence angle from a submonolayer of vinyl moieties adsorbed on Ni(100) (prepared by dosing 0.2 L of 
ethylene on that surface at 180 K). Several electronic transitions are identified in these spectra, to both the pi 
(284 and 286 eV) and the sigma (>292 eV) unoccupied levels of the molecule. The relative variations in the 
intensities of those peaks with incidence angle can be easily converted into adsorption geometry data; the 
vinyl plane was found in this case to be at a tilt angle of about 65° from the surface [71]. Similar geometrical 
determinations using NEXAFS have been carried out for a number of simple adsorbate systems over the past 
few decades. 


B1.22.5 OTHER OPTICAL TECHNIQUES 

A few additional optical techniques need to be mentioned in this review. As discussed above, these are by and 
large well known spectroscopies for the study of bulk samples; it is only their extension to the study of 
surfaces that has not yet been realized to its fullest potential. 

B1.22.5.1 OTHER UV-VISIBLE OPTICAL TECHNIQUES 


Spectroscopies such as UV-visible absorption and phosphorescence and fluorescence detection are routinely 
used to probe electronic transitions in bulk materials, but they are seldom used to look at the properties of 
surfaces [72]. As with other optical techniques, one of the main problems here is the lack of surface 
discrimination, a problem that has sometimes been bypassed by either using thin films of the materials of 
interest [73, 74], or by using a reflection detection scheme. Modulation of a parameter, such as electric or 
magnetic fields, stress, or temperature, which affects the optical properties of the sample and detection of the 
AC component of the signal induced by such periodic changes, can also be used to achieve good surface 
sensitivity [75]. This latter approach is the basis for techniques such as surface reflectance spectroscopy, 
reflectance difference spectroscopy/reflectance anisotropy spectroscopy, surface photoabsorption 
spectroscopy and surface differential reflectivity [76, 77 and 78]. Early optical characterization studies of 
solid surfaces were instrumental in the detection and characterization of intrinsic surface electronic states due 
to the uniqueness of the interface environments. Ellipsometry is also a mature technique often used to obtain 
film thickness and other optical properties [79]. 

One interesting new field in the area of optical spectroscopy is near-field scanning optical microscopy, a 
technique that allows for the imaging of surfaces down to sub-micron resolution and for the detection and 
characterization of single molecules [80, 81]. When applied to the study of surfaces, this approach is capable 
of identifying individual adsorbates, as in the case of oxazine molecules dispersed on a polymer film, 
illustrated in figure B1.22.11 [82]. Absorption and emission spectra of individual molecules can be obtained 
with this technique as well, and time-dependent measurements can be used to follow the dynamics of surface 
processes. 

B1.22.5.2 MAGNETIC RESONANCE 

NMR has developed into a powerful analytical technique in the past decades, and has been used extensively in 
the characterization of a great number of chemical systems. Its extension to the study of surfaces, however, 
has been hampered by the need for large samples because of its poor sensitivity. On the other hand, the 
development of magic-angle-spinning NMR (MAS-NMR) and the extension of NMR to many nuclei besides 
hydrogen have opened the doors for the application of that technique to many solids. For instance, MAS ¹H NMR has 
been quite useful in the research on Brønsted acidity in oxides [83]. Also, the study of zeolites with NMR is 
now practically routine: the chemical shifts in ²⁹Si NMR data are easily interpreted in terms of the number of 
aluminium atoms next to a given silicon centre, and the position of the ²⁷Al peaks provides information on the 
coordination number and geometry of the aluminium atoms [84]. ¹²⁹Xe NMR has been used to probe both the 
local environment inside porous materials [83] and heterogeneities in adsorbates [85]. Dynamic studies on the 
thermal conversion of adsorbates on transition metal catalysts have been performed as well [86]. 

Other magnetic measurements of catalysts include electron paramagnetic resonance and magnetic 
susceptibility. Although those are not as common as NMR, they can be used to look at the properties of 
paramagnetic and ferromagnetic samples. Examples of these applications can be found in the literature [87, 

Figure B1.22.11. Near-field scanning optical microscopy fluorescence image of oxazine molecules dispersed 
on a PMMA film surface. Each protuberance in this three-dimensional plot corresponds to the detection of a 
single molecule, the different intensities of those features being due to different orientations of the molecules. 
Sub-diffraction resolution, in this case on the order of a fraction of a micron, can be achieved by the near-field 
scanning arrangement. Spectroscopic characterization of each molecule is also possible. (Reprinted with 
permission from [82]. Copyright 1996 American Chemical Society.) 

B1.22.5.3 SPECTROSCOPIES WHICH RELY ONLY IN PART ON PHOTONS 

A number of surface-sensitive spectroscopies rely only in part on photons. On the one hand, there are 
techniques where the sample is excited by electromagnetic radiation but where other particles ejected from the 
sample are used for the characterization of the surface (photons in; electrons, ions or neutral atoms or moieties 
out). These include photoelectron spectroscopies (both x-ray- and UV-based) [ 89 , 90 and 91], photon 
stimulated desorption [92], and others. At the other end, a number of methods are based on a particles- 
in/photons-out set-up. These include inverse photoemission and ion- and electron-stimulated fluorescence [ 93 , 
94 ]. All these techniques are discussed elsewhere in this encyclopaedia. 
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B1.23 Surface structural determination: particle 
scattering methods 

J Wayne Rabalais 


B1.23.1 INTRODUCTION 

Scattering experiments have their roots in the development of modern atomic theory at the 
beginning of the twentieth century. As a result of both the Rutherford experiment on the scattering of alpha particles 
(He nuclei) by thin metallic foils and the Bohr theory of atomic structure, a consistent model of the atom as a 
small massive nucleus surrounded by a large swarm of light electrons was confirmed. Following these 
developments, it was realized that the inverse process, namely, analysis of the scattering pattern of ions from 
crystals, could provide information on composition and structure. This analysis is straightforward because the 
kinematics of energetic atomic collisions is accurately described by classical mechanics. Such scattering 
occurs as a result of the mutual Coulomb repulsion between the colliding atomic cores, that is, the nucleus 
plus core electrons. The scattered primary atom loses some of its energy to the target atom. The latter, in turn, 
recoils into a forward direction. The final energies of the scattered and recoiled atoms and the directions of 
their trajectories are determined by the masses of the pair of atoms involved and the closeness of the collision. 
By analysis of these final energies and angular distributions of the scattered and recoiled atoms, the elemental 
composition and structure of the surface can be deciphered. 

Low-energy (1-10 keV) ion scattering spectrometry (ISS) had its beginning as a modern surface analysis 
technique with the 1967 work of Smith [1], which demonstrated both surface elemental and structural 
analysis. Over the next twenty years, it was clearly demonstrated [2, 3, 4, 5 and 6] that direct surface 
structural information could be obtained from ISS. Most of the early workers used electrostatic analysers to 
measure the kinetic energies of the scattered ions. There are two problems with this technique. (i) It analyses 
only the scattered ions; these are typically only a very small fraction (<5%) of the total scattered flux. Thus, 
high primary ion doses, which are potentially damaging to the surface and adsorbate structures, are required 
for spectral acquisition. (ii) Neutralization probabilities are a function of the ion beam incidence angle α with the 
surface and the azimuthal angle δ along which the ion beam is directed. This is not a simple behaviour since 
the probabilities depend on the distances of the ion to specific atoms. As a result, it is difficult to separate 
scattering intensity changes due to neutralization effects from those due to structural effects. The use of alkali 
primary ions [7], which have low neutralization probabilities, leads to higher scattering intensities and 
pronounced focusing effects; however, the contamination of the sample surface by the reactive alkali ions is a 
potential problem with this method. Buck and co-workers [8], who had been developing time-of-flight (TOF) 
methods for ion scattering since the mid 1970s, used TOF methods for surface structure analysis in 1984 and 
demonstrated the capabilities and high sensitivity of the technique when both neutrals and ions are detected 
simultaneously. A TOF spectrometer system with a long flight path for separation of the scattered and 
recoiled particles and continuous variation of the scattering and recoiling angles was developed in 1990 
[9]. This coupling of TOF methods with detection of both scattered and recoiled particles led to the 
development of TOF scattering and recoiling spectrometry (TOF-SARS) as a tool for structural analysis [10]. 
A large, time-gated, position-sensitive microchannel plate detector was used in 1997 to obtain images of the 
scattered and recoiled particles, leading to the development of scattering and recoiling imaging spectrometry 
(SARIS) [11]. Several research groups [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 and 24] throughout the 
world are now engaged in surface structure determinations using some form of low-keV ISS. 


This section will concentrate on TOF-SARS and SARIS, for it is felt that these are the techniques that will be 
most important in future applications. Also, emphasis will be placed on surface structure determinations rather 
than surface elemental analysis, because TOF-SARS and SARIS are capable of making unique contributions 
in the area of structure determination. The use of TOF methods and large-area position-sensitive detectors has 
led to a surface crystallography that is sensitive to all elements, including the ability to directly detect 
hydrogen adsorption sites. TOF detection of both neutrals and ions provides the high sensitivity necessary for 
non-destructive analysis. Detection of atoms scattered and recoiled from surfaces in simple collision 
sequences, together with calculations of shadowing and blocking cones, can now be used to make direct 
measurements of interatomic spacings and adsorption sites to an accuracy of better than 0.1 Å. 


B1.23.2 BASIC PHYSICS UNDERLYING KEV ION SCATTERING AND 
RECOILING 

There are two basic physical phenomena which govern atomic collisions in the keV range. First, repulsive 
interatomic interactions, described by the laws of classical mechanics, control the scattering and recoiling 
trajectories. Second, electronic transition probabilities, described by the laws of quantum mechanics, control 
the ion-surface charge exchange process. 

B1.23.2.1 KINEMATICS OF ION-SURFACE COLLISIONS AND ELEMENTAL ANALYSIS 

The dynamics of ion surface scattering at energies exceeding several hundred electronvolts can be described 
by a series of binary collision approximations (BCAs) in which only the interaction of one energetic particle 
with a solid atom is considered at a time [25]. This model is reasonable because the interaction time for the 
collision is short compared with the period of phonon frequencies in solids, and the interaction distance is 
shorter than the interatomic distances in solids. The BCA simplifies the many-body interactions between a 
projectile and solid atoms to a series of two-body collisions of the projectile and individual solid atoms. This 
can be described with results from the well known two-body central force problem [26]. 

Within the BCA, the trajectories of energetic particles on the surface become a series of linear motion 
segments between neighbouring atoms. Both the scattered and recoiled atoms have high, discrete kinetic 
energy distributions. The simplest case of ion-surface scattering phenomena is quasi-single scattering (QSS), 
which represents the case of one large-angle deflection that is preceded and/or followed by a few small 
deflections. Figure B1.23.1 shows an example of QSS. This typically produces a sharp scattering peak whose 
energy is near that of the theoretical single-collision energy. The energies of scattered and recoiled particles in 
single scattering (SS) can be derived from the laws of conservation of energy and momentum. The energy $E_s$ 
of a projectile scattered from a stationary target is given as 

$$E_s = E_0 \left[ \frac{\cos\theta \pm (A^2 - \sin^2\theta)^{1/2}}{1 + A} \right]^2 \tag{B1.23.1}$$

where $A = M_t/M_p$ and $E_0$, $M_t$ and $M_p$ are the initial energy of the projectile and the masses of the target and the 
projectile, respectively. If the mass of the impinging particle is less than or equal to that of the target atom, 
then $A \geq 1$ and only the positive sign is used. If the scattering angle is chosen as 90°, equation (B1.23.1) can 
be simplified to 

$$E_s = E_0 \frac{A - 1}{A + 1} \tag{B1.23.2}$$

Solving for $M_t$ yields 

$$M_t = M_p \frac{1 + B}{1 - B} \tag{B1.23.3}$$


where $B = E_s/E_0$. If the mass of the impinging particle is greater than that of the target atom, $M_p > M_t$, 
both signs are used in the equation. The energy of the scattered particle is then found to be a double-valued 
function of the scattering angle $\theta$, i.e. there are two values of $E_s$ for each $\theta$. For the case of $A < 1$, the maximum SS 
scattering angle is 

$$\theta_{max} = \sin^{-1} A \tag{B1.23.4}$$

For angles greater than $\theta_{max}$, only multiple scattering can occur. 
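
The kinematic relations (B1.23.1)-(B1.23.4) are simple to evaluate numerically. The following Python sketch is illustrative only: the function names and the example masses (Ne, Pt, Ar, Si in atomic mass units) are assumptions introduced here, not part of the original treatment.

```python
import math

def scattered_energies(e0, m_p, m_t, theta_deg):
    """Single-scattering energies from equation (B1.23.1), in the units of e0.

    Returns one value for A >= 1, two values for A < 1, and an empty list
    when theta exceeds the maximum single-scattering angle.
    """
    a = m_t / m_p
    theta = math.radians(theta_deg)
    disc = a * a - math.sin(theta) ** 2
    if disc < 0.0:
        return []  # only multiple scattering can reach this angle
    root = math.sqrt(disc)
    signs = (1.0,) if a >= 1.0 else (1.0, -1.0)
    return [e0 * ((math.cos(theta) + s * root) / (1.0 + a)) ** 2 for s in signs]

def target_mass_from_90deg(e_s, e0, m_p):
    """Target mass from the 90 degree scattering energy, equation (B1.23.3)."""
    b = e_s / e0
    return m_p * (1.0 + b) / (1.0 - b)

def max_single_scattering_angle(m_p, m_t):
    """Maximum SS angle in degrees, equation (B1.23.4); 180 when A >= 1."""
    a = m_t / m_p
    return 180.0 if a >= 1.0 else math.degrees(math.asin(a))

# Example: 4 keV Ne scattered through 90 degrees from Pt (light projectile)
e_ne_pt = scattered_energies(4.0, 20.18, 195.08, 90.0)
print(e_ne_pt, target_mass_from_90deg(e_ne_pt[0], 4.0, 20.18))

# Example: Ar on Si (heavy projectile), where two energies exist per angle
print(scattered_energies(4.0, 39.95, 28.09, 28.0))
print(max_single_scattering_angle(39.95, 28.09))
```

The round trip through `target_mass_from_90deg` recovers the target mass that was put in, which is the basis of elemental identification from a measured scattering energy.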

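The energy of scattered or recoiled ions can be measured directly by means of an electrostatic energy 
analyser. If the TOF method is used, the relation between the scattering energy $E_s$ and the TOF $t_s$ is expressed as 

$$E_s = \tfrac{1}{2} M_p v_s^2 = \tfrac{1}{2} M_p \left( \frac{d_{tof}}{t_s} \right)^2 \tag{B1.23.5}$$

and 

$$t_s = \frac{d_{tof}}{(2 E_s / M_p)^{1/2}} \tag{B1.23.6}$$

where $v_s$ is the velocity and $d_{tof}$ is the flight distance of the scattered atom. 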
Figure B1.23.1. (a) Two-body collision of a projectile of mass $M_p$ and kinetic energy $E_0$ approaching a 
stationary target of mass $M_t$ with an impact parameter $p$. (b) Quasi-single scattering and direct recoiling with 
incident angle $\alpha$ and exit angle $\beta$, based on the BCA. 

The intensity of SS, $I_i$, from an element $i$ in the solid angle $\Delta\Omega$ is proportional to the initial beam intensity $I_0$, 
the concentration of the scattering element $N_i$, the neutralization probability $P_i$, the differential scattering cross 
section $d\sigma(\theta)/d\Omega$, the shadowing coefficient $f_{si}(\alpha, \delta_{in})$ and the blocking coefficient $f_{bi}(\alpha, \delta_{out})$ for the $i$th 
component on the surface: 

$$I_i \propto I_0 N_i P_i \frac{d\sigma(\theta)}{d\Omega} f_{si} f_{bi} \Delta\Omega \tag{B1.23.7}$$

Similar to QSS, direct recoil (DR) of surface atoms produces energetic atoms that have a relatively narrow 
velocity distribution. DR particles are those species which are recoiled from the surface layers as a result of a 
direct collision of the primary ion. They escape from the surface with little energy loss through collisions with 
neighbouring atoms. The energy $E_r$ of a DR surface atom can be expressed as 

$$E_r = E_0 \frac{4A \cos^2\phi}{(1 + A)^2} \tag{B1.23.8}$$


From geometrical considerations, DR is observed only in the forward-scattering direction, for which the recoil angle $\phi < 90°$. An 
expression similar to equation (B1.23.7) applies to the evaluation of recoiling intensity. All elements, 
including hydrogen, can be analysed by either scattering, recoiling, or both techniques. TOF peak 
identification of QSS and DR is straightforward using the equations above. 
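
As a small numerical illustration of equation (B1.23.8), the sketch below evaluates the direct-recoil energy for several target masses. The function name, the choice of a 4 keV Ar projectile and the recoil angle of 28° are assumptions made for this example.

```python
import math

def recoil_energy(e0, m_p, m_t, phi_deg):
    """Direct-recoil energy from equation (B1.23.8), in the units of e0."""
    if not 0.0 <= phi_deg < 90.0:
        raise ValueError("direct recoil requires 0 <= phi < 90 degrees")
    a = m_t / m_p
    phi = math.radians(phi_deg)
    return e0 * 4.0 * a * math.cos(phi) ** 2 / (1.0 + a) ** 2

# Recoil energies (keV) of H, O and Si struck by 4 keV Ar at phi = 28 degrees
for name, mass in (("H", 1.008), ("O", 16.00), ("Si", 28.09)):
    print(name, recoil_energy(4.0, 39.95, mass, 28.0))
```

The energy transfer is largest when the projectile and target masses are equal and the recoil is head-on ($\phi = 0$), in which case the recoil carries away the full projectile energy.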

B1.23.2.2 SHADOW CONES, BLOCKING CONES, AND STRUCTURAL ANALYSIS 

A simple interpretation based on the BCA yields some important concepts for ion scattering and recoiling: 
shadow cones and blocking cones. As shown in figure B1.23.2(a), scattering of ions by a target atom produces 
a region behind the target atom into which no ion can penetrate. This region is called a shadow cone. The cone 
dimensions can be evaluated from known interatomic distances in experiments. Figure B1.23.2(b) shows the 
normalized ion flux density across the shadow cone. There is zero flux density inside the shadow cone and 
unit flux density far outside the cone. A highly focused ion flux density appears at the boundary of the cone. 
This anisotropic distribution of ion flux after interaction with a target atom is the basis of ISS structural 
determinations. If a neighbouring atom lies inside the shadow cone (A in (a)), it cannot scatter ions or be 
recoiled. If it is well away from the cone (C), the cone has no effect on the intensity. When it lies in the 
focusing region (B), enhanced scattering and/or recoiling intensity is observed. TOF-SARS measures 
the intensity change due to the shadow cone effect on neighbouring atoms as a function of incident beam 
direction. 

Trajectories of ions interacting with an additional target atom (blocking atom) after scattering from the initial 
target atom (scattering atom) produce a hollow region behind the blocking atom called a blocking cone (figure 
B1.23.2(c)). It can be regarded as an interaction of a target atom (blocking atom) with ions emitted from an 
adjacent point ion source (scattering atom). A shadow cone is different because it originates from the 
interaction with ions from an infinitely distant point ion source (ion source in the ion beam line). Unlike a 
shadow cone, a blocking cone diverges with a measurable blocking cone angle £, . The closer the blocking 
atom is to the scattering atom, the larger is the angle of divergence. In traditional ISS, the intensity variations 
due to both shadowing and blocking cones are measured. It is possible to minimize the effect of 
blocking cones by a judicious choice of scattering geometry. Blocking cones are, however, inevitable, 
especially for scattering trajectories from deep subsurface layers. 
Figure B1.23.2. (a) Shadow cone of a stationary Pt atom in a 4 keV Ne$^+$ ion beam, appearing with the 
overlapping of ion trajectories as a function of the impact parameter. The initial position of the target atom 
that recoils in the collision is indicated by a solid circle. (b) Plot of the normalized ion flux distribution 
density across the shadow cone in (a). The flux density changes from 0 inside the shadow cone, to much 
greater than 1 in the focusing region, converging to 1 away from the shadow cone edge. (c) Blocking cones 
of Pt atoms (B, B') cast by 4 keV Ne$^+$ ions scattered from another Pt atom (A). Note the different blocking 
angles of the two blocking atoms, which are due to the differences in the interatomic spacings between the 
scattering and blocking atoms. 


Considering a large number of ions with parallel trajectories impinging on a target atom, the ion trajectories 
are bent by the repulsive potential such that there is an excluded volume, called the shadow cone, in the shape 
of a paraboloid formed behind the target atom as shown in figure B1.23.3(a). Ion trajectories do not penetrate 
into the shadow cone, but instead are concentrated at its edges much as rain pours off an umbrella. Atoms 
located inside the cone behind the target atom are shielded from the impinging ions. Similarly, if the scattered 
ion or recoiling atom trajectory is directed towards a neighbouring atom, that trajectory will be blocked. For a 
large number of scattering or recoiling trajectories, a blocking cone will be formed behind the neighbouring 
atom into which no particles can penetrate, as shown in figure B1.23.3(b). The dimensions of the shadowing 
and blocking cones can be determined experimentally from scattering measurements along crystal azimuths 
for which the interatomic spacings are accurately known. 
Figure B1.23.3. Schematic illustrations of backscattering with shadowing and direct recoiling with shadowing 
and blocking. 

B1.23.2.3 SCATTERING AND RECOILING ANISOTROPY CAUSED BY SHADOWING AND BLOCKING CONES 

When an isotropic ion fluence impinges on a crystal surface at a specific incident angle $\alpha$, the scattered and 
recoiled atom flux is anisotropic. This anisotropy is a result of the incoming ion's eye view of the surface, 
which depends on the specific arrangement of atoms and the shadowing and blocking cones. The arrangement 
of atoms controls the atomic density along the azimuths and the ability of ions to channel, that is, to penetrate 
into empty spaces between atomic rows. The cones determine which nuclei are screened from the impinging 
ion flux and which exit trajectories are blocked, as depicted in figure B1.23.3. By measuring the ion and atom 
flux at specific scattering and recoiling angles as a function of the ion beam incident angle $\alpha$ and azimuthal angle $\delta$ to 
the surface, structures are observed which can be interpreted in terms of the interatomic spacings and shadow 
cones from the ion's eye view. 
(A) TIME OF FLIGHT SCATTERING AND RECOILING SPECTROMETRY (TOF-SARS): SHADOW CONE BASED EXPERIMENT 


In TOF-SARS [9], a low-keV, monoenergetic, mass-selected, pulsed noble gas ion beam is focused onto a 
sample surface. The velocity distributions of scattered and recoiled particles are measured by standard TOF 
methods. A channel electron multiplier is used to detect fast (>800 eV) neutrals and ions. This type of detector 
has a small acceptance solid angle. A fixed angle is used between the pulsed ion beam and detector directions 
with respect to the sample, as shown in figure B1.23.4. The sample has to be rotated to measure ion scattering 
and recoiling anisotropy as a function of the incident angle ($\alpha$) and azimuthal angle ($\delta$) of the incident beam. 
Since the sample rotation changes both the incident beam direction and the detector direction, the spectra are 
affected by both shadow cones and blocking cones. In order to reduce blocking cone effects for simpler 
interpretations, high-angle scattering is preferred. Elemental analyses are achieved by converting the velocity 
distributions into energy distributions and relating those to the masses of the target atoms through the 
kinematic relationships that describe classical scattering and recoiling (equation (B1.23.1) and equation 
(B1.23.8)). Structural analyses are achieved by monitoring the scattered and recoiled particles as a function of 
both beam incident angle $\alpha$ and crystal azimuthal angle $\delta$. The anisotropic features in these $\alpha$- and $\delta$-scans are 
interpreted by means of shadow cone and blocking cone analyses. It requires several hours to collect the data 
needed to construct a contour map of intensities as a function of both $\alpha$ and $\delta$. Moreover, the experimental 
geometry is restricted to fixed scattering angles and in-plane scattering and recoiling trajectories. 

(B) SCATTERING AND RECOILING IMAGING SPECTROMETRY (SARIS): BLOCKING CONE BASED EXPERIMENT 

In an ideal SARIS system [11], it would be desirable to measure the velocity distributions of all the energetic 
particles scattered and recoiled from a sample surface in a short time period. This concept requires a 
hemispherical, time-resolving, position-sensitive detector which covers all of the solid angle space above the 
sample surface. Implementation of this concept is not currently feasible. If data collection time is unimportant, 
a point detector as described above, such as a channel electron multiplier mounted on a flexible goniometer 
which allows movement of the detector over a large solid angle, can be used to collect TOF spectra. To 
balance detector size against data collection time, a hybrid configuration can be used. This is a 
large, time-resolving, position-sensitive microchannel plate detector mounted on a triple-axis UHV 
goniometer. This instrument makes it possible to capture scattering and recoiling intensity distributions 
without changing the incident beam direction, which is a great advantage when comparing experimental results 
with those of computer simulations. The large-area detector provides the intensity distribution over a limited 
solid angle. Since the optimum detector position and flight distance vary from one experiment to another, the 
detector has to be movable in order to cover a large solid angle and to trade TOF resolution against the 
detector acceptance solid angle. 
Figure B1.23.4. Schematic diagram of TOF scattering and recoiling spectrometry (TOF-SARS) illustrating 
the plane of scattering formed by the ion beam, sample and detector. TOF spectra (a) are collected with fixed 
positions of the ion beam, sample and detector. In order to measure the incident angle ($\alpha$) or azimuthal angle 
($\delta$) intensity variation of a peak in a TOF spectrum, the sample is rotated about an axis that lies in its 
plane or about its normal, respectively. Such $\alpha$ and $\delta$ intensity variations of a peak are shown in (b) and (c). 


B1.23.3 INSTRUMENTATION 

The basic requirements [9] for low-energy ion scattering are an ion source, a sample mounted on a precision 
manipulator, an energy or velocity analyser and a detector as shown in figure B1.23.5. The sample is housed 
in an ultra-high vacuum (UHV) chamber in order to prepare and maintain well defined clean surfaces. The 
UHV prerequisite necessitates the use of differentially pumped ion sources. Ion scattering is typically done in 
a UHV chamber which houses other surface analysis techniques such as low-energy electron diffraction 
(LEED), x-ray photoelectron spectroscopy (XPS) and Auger electron spectroscopy (AES). The design of an 
instrument for ion scattering is based on the type of analyser to be used. An electrostatic analyser (ESA) 
measures the kinetic energies of ions while a TOF analyser measures the velocities of both ions and fast 
neutrals. 
Figure B1.23.5. Schematic illustration of the TOF-SARS spectrometer system. A = ion gun, B = Wien filter, 
C = Einzel lens, D = pulsing plates, E = pulsing aperture, F = deflector plates, G = sample, H = electron 
multiplier detector with energy prefilter grid and I = electrostatic deflector. 

B1.23.3.1 ION SOURCE AND BEAM LINE 

The critical requirements for the ion source are that the ions have a small energy spread, there are no fast 
neutrals in the beam and the available energy is 1-10 keV. Both noble gas and alkali ion sources are common. 
For TOF experiments, it is necessary to pulse the ion beam by deflecting it past an aperture. A beam line for 
such experiments is shown in figure B1.23.5; it is capable of producing ion pulse widths of ≈15 ns. 

B1.23.3.2 ANALYSERS 


An ESA provides energy analysis of the ions with high resolution. A TOF analyser provides velocity analysis 
of both fast neutrals and ions with moderate resolution. In an ESA the energy separation is made by spatial 
dispersion of the charged particle trajectories in a known electric field. ESAs were the first analysers used 
for ISS; their advantage is high energy resolution, and their disadvantages are that they analyse only ions and 
have poor collection efficiency due to the necessity of scanning the analyser. A TOF analyser is simply a 
long field-free drift region. It has the advantage of high efficiency since it collects both ions and fast neutrals 
simultaneously in a multichannel mode; its disadvantage is only moderate resolution. 

B1.23.3.3 DETECTORS 

The most common detectors used for TOF-SARS are continuous dynode channel electron multipliers, which 
are capable of multiplying the signal pulses by $10^6$-$10^7$. They are sensitive to both ions and fast neutrals. 
Neutrals with velocities $\gtrsim 10^6$ cm s$^{-1}$ are detected with the same efficiency as ions. Since the cones of these detectors are 
usually less than 1 cm and the TOF flight paths are of the order of 1 m, the acceptance solid angles of such 
detectors are very small. Incident and azimuthal angle scans are made by rotating the sample with the detector 
at a fixed position. 

SARIS overcomes the limitations of small-area detectors by using a large, time-resolving, position-sensitive 
microchannel plate (MCP) detector and TOF methods to capture images of both ions and fast neutrals that are 
scattered and recoiled from a surface. Due to the large solid angle subtended by the MCP, atoms that are 
scattered and recoiled in both planar and non-planar directions are detected simultaneously. For example, 
a 75 × 95 mm MCP situated at a distance of 16 cm from the sample spans a solid angle of ~0.3 sr, 
corresponding to an azimuthal range of ~26°. Using a beam current of ~0.1 nA cm$^{-2}$, the four images 
required to make up a 90° azimuthal range can be collected in ~2 min with a total ion dose of ~$10^{11}$ ions 
cm$^{-2}$. The time gating of the MCP provides resolution of the scattered and recoiled atoms into time frames as 
short as 10 ns, thereby providing element-specific spatial-distribution images. These SARIS images contain 
features that are sharply focused into well defined patterns as a function of both space and time by the crystal 
structure of the target sample. If the MCP is mounted on a goniometer that provides both horizontal and 
vertical rotation and translation away from the sample, it is possible to change the solid angle of collection 
and the flight path length. 
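
The detector geometry quoted above can be checked with elementary geometry. The sketch below is a hedged illustration: it assumes a flat rectangular plate centred on, and perpendicular to, the line from the sample, and the function names are introduced here only for the example.

```python
import math

def rectangle_solid_angle(width, height, distance):
    """Solid angle (sr) of a centred rectangle viewed on-axis from `distance`."""
    arg = width * height / (
        2.0 * distance * math.sqrt(4.0 * distance**2 + width**2 + height**2)
    )
    return 4.0 * math.atan(arg)

def angular_span_deg(side, distance):
    """Full angle (degrees) subtended by one side of the rectangle."""
    return math.degrees(2.0 * math.atan(0.5 * side / distance))

# 75 x 95 mm plate at 16 cm (all lengths in cm)
omega = rectangle_solid_angle(7.5, 9.5, 16.0)
span = angular_span_deg(7.5, 16.0)
print(omega, span)
```

Under these assumptions the result is a few tenths of a steradian and an angular range of about 26° across the 75 mm side, in line with the approximate figures given in the text. Moving the plate further from the sample lengthens the flight path at the cost of a smaller solid angle.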


B1.23.4 COMPUTER SIMULATION METHODS 

It is extremely helpful to use classical ion trajectory simulations in order to visualize the ion trajectories to 
improve the understanding of ion behaviour in the surface region and to provide a systematic method for 
surface structure determination. Such simulations are based on the BCA in which the trajectories of the 
energetic particles are assumed to be a series of straight lines corresponding to the asymptotes of the 
scattering trajectories due to sequential binary collisions. The BCA has been proven to be valid for the keV 
range of energies. 

B1.23.4.1 BINARY COLLISION APPROXIMATION 

Atom-surface interactions are intrinsically many-body problems which are known to have no analytical 
solutions. Because the de Broglie wavelength of an energetic ion is much shorter than the interatomic spacings in solids, the 
energetic atom-surface interaction problem can be treated by classical mechanics. In the classical mechanical 
framework, the problem becomes a set of Newtonian equations of motion [26] for the $i$th particle in an $N$-body 
problem: 

$$m_i \frac{d^2 \mathbf{r}_i}{dt^2} = \mathbf{F}_i(\mathbf{r}_1, \ldots, \mathbf{r}_N), \quad i = 1, \ldots, N \tag{B1.23.9}$$

The summation of pairwise potentials is a good approximation for molecular dynamics calculations for 
simple classical many-body problems [27]. It has been widely used to simulate hyperthermal energy (>1 eV) 
atom-surface scattering: 

$$\mathbf{F}_i \approx -\sum_{j \neq i} \nabla_i V(\mathbf{r}_i, \mathbf{r}_j) \tag{B1.23.10}$$
As the kinetic energy in the system increases, the interaction of energetic particles becomes more and 
more localized near the nuclei. When the interaction distance is much smaller than the interatomic distances in the 
system, the BCA is valid: 

$$\mathbf{F}_i \approx -\nabla_i V(\mathbf{r}_i, \mathbf{r}_j), \quad j = \text{the nearest neighbour of } i \tag{B1.23.11}$$

In the BCA, each collision process is regarded as an isolated event. Ion-solid interactions are approximated 
by a series of two-body interactions which are reduced to one-body problems in the centre-of-mass (CM) 
coordinates. The projectile is assumed to converge to the asymptote after a collision before interacting with 
the next collision partner. In the central force one-body problem, evaluation of scattering integrals and time 
integrals replaces the time-consuming numerical integration of a set of differential equations. 

B1.23.4.2 SCATTERING INTEGRAL AND TIME INTEGRAL 

Once atom-surface scattering is reduced to the BCA, one can calculate the energy relationship between two 
particles involved in scattering with a known scattering angle from the laws of conservation of energy and 
momentum, as shown in section B1.23.2.1. Obtaining the scattering angle as a function of impact parameter requires 
evaluation of the scattering integral. The scattering angle $\chi$ in the CM coordinate system and the CM energy 
$E$ are given by 

$$\chi = \pi - 2p \int_{r_0}^{\infty} \frac{dr}{r^2 \left[ 1 - V(r)/E - p^2/r^2 \right]^{1/2}} \tag{B1.23.12}$$

where 

$$E = \frac{A}{1 + A} E_0, \tag{B1.23.13}$$

$A = M_t/M_p$, $p$ is the impact parameter and $r_0$ is the distance of closest approach (apsis) of the collision pair. The 
transformations from the CM coordinates (scattering angle $\chi$) to the laboratory coordinates, with the scattering angle 
$\theta$ for the primary particle and $\phi$ for the recoiled surface atoms, are given by 

$$\tan\theta = \frac{A \sin\chi}{1 + A \cos\chi} \tag{B1.23.14}$$

and 

$$\tan\phi = \frac{\sin\chi}{1 - \cos\chi}. \tag{B1.23.15}$$
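For accurate ion trajectory calculation in the solid, it is necessary to evaluate the exact positions of the 
intersections of the asymptotes ($\Delta_1$, $\Delta_2$) of the incoming trajectory and those of the outgoing trajectories of both 
the scattered and recoiled particles in a collision. The evaluation of these values requires the time integral and 
the following transformation equations: 

$$\tau = (r_0^2 - p^2)^{1/2} - \int_{r_0}^{\infty} \left\{ \left[ 1 - \frac{V(r)}{E} - \frac{p^2}{r^2} \right]^{-1/2} - \left[ 1 - \frac{p^2}{r^2} \right]^{-1/2} \right\} dr \tag{B1.23.16}$$

$$\Delta_1 = \frac{2\tau + (A - 1)\, p \tan(\chi/2)}{1 + A} \tag{B1.23.17}$$

$$\Delta_2 = p \tan(\chi/2) - \Delta_1. \tag{B1.23.18}$$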

Numerical integration methods are widely used to solve these integrals. The Gauss-Mehler method [28] is 
employed in all of the calculations used here. This method is a Gaussian quadrature [29] which gives exact 
answers for Coulomb scattering. 
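
The evaluation of the scattering integral can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the calculations in this chapter: the substitution $u = r_0/r$, the use of the positive half of a Gauss-Chebyshev (Gauss-Mehler type) node set, the bisection search for $r_0$ and all function names are choices made here. The potential is passed in as an arbitrary repulsive function `potential(r)`, and a pure Coulomb potential is used only because the Rutherford result $\tan(\chi/2) = k/(2Ep)$ is available as a check.

```python
import math

def closest_approach(potential, e_cm, p):
    """Distance of closest approach r0 for a monotonically repulsive potential."""
    def f(r):
        return 1.0 - potential(r) / e_cm - (p / r) ** 2
    hi = max(p, 1e-6)
    while f(hi) <= 0.0:      # expand until the radial kinetic term is positive
        hi *= 2.0
    lo = hi
    while f(lo) > 0.0:       # shrink until it is negative
        lo *= 0.5
    for _ in range(200):     # bisection on the bracketed root
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return hi

def cm_scattering_angle(potential, e_cm, p, n_nodes=200):
    """CM scattering angle chi (radians) from the scattering integral."""
    r0 = closest_approach(potential, e_cm, p)
    total = 0.0
    for i in range(1, n_nodes + 1):
        u = math.cos((2 * i - 1) * math.pi / (4 * n_nodes))  # nodes in (0, 1)
        g2 = 1.0 - potential(r0 / u) / e_cm - (p * u / r0) ** 2
        total += math.sqrt((1.0 - u * u) / g2)
    integral = math.pi / (2 * n_nodes) * total
    return math.pi - 2.0 * p / r0 * integral

def lab_angles(chi, a):
    """Laboratory scattering and recoil angles from (B1.23.14) and (B1.23.15)."""
    theta = math.atan2(a * math.sin(chi), 1.0 + a * math.cos(chi))
    phi = math.atan2(math.sin(chi), 1.0 - math.cos(chi))
    return theta, phi

# Check against Rutherford scattering for V(r) = k/r with k = E = p = 1
chi = cm_scattering_angle(lambda r: 1.0 / r, 1.0, 1.0)
print(chi, 2.0 * math.atan(0.5))
```

In a production code the same routine would be called once per grid point to fill a table of scattering angle against impact parameter and energy, rather than once per collision.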

B1.23.4.3 POTENTIAL FUNCTION 

One of the most important issues in simulation of energetic atom-surface scattering is the determination of the 
interaction potential between the colliding atoms. In the low-keV energy region, electrons have a screening 
effect on the Coulomb interaction of nuclei so that the actual nuclear charges affecting the trajectories are less 
than the atomic numbers ($Z$) of the atoms involved. This screening effect reduces the potential $V(r)$ by a 
factor which is expressed as a screening function $\Phi(r)$. The form of the potential function is 

$$V(r) = \frac{Z_1 Z_2 e^2}{r} \Phi(r). \tag{B1.23.19}$$

The screening function $\Phi(r)$ can be expressed in various forms [30], e.g. the Bohr, Born-Mayer, Thomas-Fermi-Firsov and 
Molière models, as well as the 'universal potential' of Ziegler, Biersack and Littmark known as the ZBL 
potential [31]. The ZBL screening function is expressed as 

$$\Phi(r) = 0.1818\,e^{-3.2 r/a} + 0.5099\,e^{-0.9423 r/a} + 0.2802\,e^{-0.4029 r/a} + 0.02817\,e^{-0.2016 r/a} \tag{B1.23.20}$$

where the screening length $a = 0.8853\,C_F\,a_0/(Z_1^{0.23} + Z_2^{0.23})$, $a_0$ is the Bohr radius (0.53 Å), $Z_1$, $Z_2$ are the atomic 
numbers of the atoms involved and $C_F$ is a screening constant for adjusting the screening length to calibrate 
the potential to experimental scattering data. The ZBL potential provides good agreement between simulated 
and experimental results. 
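
A direct transcription of the ZBL potential into Python might look as follows. It is offered as a sketch: the function names, the default $C_F = 1$ and the choice of units (eV and Å, with $e^2 = 14.3996$ eV Å) are assumptions of this example.

```python
import math

E2 = 14.3996     # e^2 / (4 pi eps0) in eV * Angstrom
A0 = 0.529177    # Bohr radius in Angstrom

ZBL_TERMS = (
    (0.1818, 3.2),
    (0.5099, 0.9423),
    (0.2802, 0.4029),
    (0.02817, 0.2016),
)

def zbl_screening_length(z1, z2, c_f=1.0):
    """Screening length a in Angstrom; c_f is the adjustable screening constant."""
    return 0.8853 * c_f * A0 / (z1**0.23 + z2**0.23)

def zbl_screening(x):
    """Screening function for the reduced distance x = r / a."""
    return sum(c * math.exp(-k * x) for c, k in ZBL_TERMS)

def zbl_potential(r, z1, z2, c_f=1.0):
    """Screened Coulomb potential in eV for a separation r in Angstrom."""
    a = zbl_screening_length(z1, z2, c_f)
    return z1 * z2 * E2 / r * zbl_screening(r / a)

# Ne (Z = 10) on Pt (Z = 78) at a few separations
for r in (0.1, 0.5, 1.0):
    print(r, zbl_potential(r, 10, 78))
```

The four coefficients sum to approximately one, so the potential reduces to the bare Coulomb form at very small separations, while at larger separations the exponentials cut the interaction off far more quickly than $1/r$.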
B1.23.4.4 GENERAL DESCRIPTION OF SIMULATION PROGRAM 

Classical ion trajectory computer simulations based on the BCA are a series of evaluations of two-body 
collisions. The parameters involved in each collision are the type of atoms of the projectile and the target 
atom, the kinetic energy of the projectile and the impact parameter. The general procedure for implementation 
of such computer simulations is as follows. All of the parameters involved in the calculation are defined: the 
surface structure in terms of the types of the constituent atoms, their positions in the surface and their thermal 
vibration amplitude; the projectile in terms of the type of ion to be used, the incident beam direction and the 
initial kinetic energy; the detector in terms of the position, size and detection efficiency; the type of potential 
functions for possible collision pairs. 

After defining the input parameters, the calculation of the trajectories of an incident ion begins with a 
randomly chosen initial entrance point on the surface. The next step is to find the first collision partner. 
Taking advantage of the symmetry of the crystal structure, one can list the positions of surface atoms within a 
certain distance from the projectile. The atoms are sorted in ascending order of the scalar product of the 
interatomic vector from the atom to the projectile with the unit velocity vector of the projectile. If the collision 
partner has a larger impact parameter than a predefined maximum impact parameter ($p_{max}$), it is discarded. If a 
partner has a smaller impact parameter than $p_{max}$, the evaluation of the collision is initiated by converting 
three-dimensional information, such as the positions of the projectile and the target atom and the velocity of 
the projectile, into the parameters necessary to calculate the expressions for the impact parameter and the 
relative energy. After the equations are solved with these parameters, the values are converted back to 
three-dimensional information to search for a new collision partner with the new set of parameters calculated in the 
previous collision. This procedure is repeated until the kinetic energy of the projectile falls below a predefined 
cutoff energy or the projectile leaves the surface. If it is necessary to follow recoiled particles, they are regarded as 
new projectiles in subsequent collisions. After finishing a trajectory calculation, a new calculation starts with 
a new randomly chosen entrance point. Millions of trajectory calculations with different initial impact 
parameters are carried out in order to compare the results with those of experiments. In order to increase the 
speed of the calculation, a precalculated table of scattering angle and scattered energy as a function of impact 
parameter and kinetic energy is used. A two-dimensional spline method is used to interpolate a scattering 
angle and energy from the table. 
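
The search for the next collision partner described above can be sketched as follows. This is a simplified illustration with hypothetical names: it scans a short list of candidate atoms linearly instead of sorting a symmetry-generated neighbour list, and it returns only the geometric quantities (distance along the path and impact parameter) that the collision evaluation would need.

```python
import math

def find_collision_partner(position, direction, atoms, p_max):
    """Return (index, distance_along_path, impact_parameter) or None.

    `direction` must be a unit vector; `atoms` is a sequence of (x, y, z).
    """
    best = None
    for idx, atom in enumerate(atoms):
        rel = [a - x for a, x in zip(atom, position)]
        along = sum(r * d for r, d in zip(rel, direction))
        if along <= 0.0:
            continue                      # atom lies behind the projectile
        perp_sq = sum(r * r for r in rel) - along * along
        p = math.sqrt(max(perp_sq, 0.0))  # perpendicular distance to the path
        if p > p_max:
            continue                      # too distant to deflect the projectile
        if best is None or along < best[1]:
            best = (idx, along, p)        # keep the first atom met along the path
    return best

atoms = [(0.0, 0.0, 5.0), (0.3, 0.0, 2.0), (3.0, 0.0, 1.0)]
partner = find_collision_partner((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), atoms, p_max=1.0)
print(partner)
```

In this example the third atom is met first along the path but is rejected because its impact parameter exceeds `p_max`, so the second atom is selected.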

(A) CRYSTAL MODELS WITH THERMAL VIBRATION INCLUSION 

The lattice atoms in the simulation are assumed to vibrate independently of one another. The displacements 
from the equilibrium positions of the lattice atoms are taken to follow a Gaussian distribution, 

$$P(\Delta x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{\Delta x^2}{2\sigma^2} \right), \tag{B1.23.21}$$

where $\sigma^2$ and $\Delta x$ are the variance and the displacement from the lattice equilibrium position, respectively. The 
variance of the distribution can be expressed as [32] 

$$\sigma^2 = \langle \Delta x^2 \rangle = \frac{3 h^2 T}{4\pi^2 M k_B \Theta_D^2}, \tag{B1.23.22}$$

where $M$ is the mass of the lattice atom, $T$ is the temperature, $k_B$ is the Boltzmann constant and $\Theta_D$ is the Debye temperature. The Box-Muller method [29] is used for generating random deviates 
with a Gaussian distribution. 
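
The thermal displacement model can be illustrated with a short Python sketch. It assumes the high-temperature Debye form $\sigma^2 = 3\hbar^2 T/(M k_B \Theta_D^2)$ for the one-dimensional variance; the function names and the example Debye temperature of 240 K for Pt are illustrative assumptions, not values taken from this chapter.

```python
import math
import random

HBAR = 1.054571817e-34   # J s
K_B = 1.380649e-23       # J / K
AMU = 1.66053907e-27     # kg

def vibration_sigma(mass_amu, temperature, debye_temperature):
    """One-dimensional r.m.s. displacement in Angstrom."""
    variance = 3.0 * HBAR**2 * temperature / (
        mass_amu * AMU * K_B * debye_temperature**2
    )
    return math.sqrt(variance) * 1e10

def box_muller(rng):
    """One standard normal deviate from two uniform deviates."""
    u1 = 1.0 - rng.random()   # in (0, 1], avoids log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

def displaced_position(equilibrium, sigma, rng):
    """Lattice position with independent Gaussian displacements along x, y, z."""
    return tuple(x + sigma * box_muller(rng) for x in equilibrium)

rng = random.Random(0)
sigma_pt = vibration_sigma(195.08, 300.0, 240.0)
print(sigma_pt, displaced_position((0.0, 0.0, 0.0), sigma_pt, rng))
```

Each collision partner is given a freshly sampled position, so that averaging over many trajectories reproduces the thermal smearing of the shadowing and blocking features.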

(B) COMPARISON OF SIMULATED AND EXPERIMENTAL DATA 

A systematic comparison of two sets of data requires a numerical measure of their similarity. TOF-SARS 
and SARIS produce one- and two-dimensional data plots, respectively. Comparison of simulated and 
experimental data is accomplished by calculating a one- or two-dimensional reliability ($R$) factor [33], 
respectively, based on the $R$-factors developed for LEED [34]. The $R$-factor between the experimental and 
simulated data is minimized by means of a multiparameter simplex method [33]. 


B1.23.5 ELEMENTAL ANALYSIS FROM SCATTERING AND 
RECOILING 

TOF-SARS and SARIS are capable of detecting all elements by either scattering, recoiling or both techniques. 
TOF peak identification is straightforward by converting equation (B1.23.1) and equation (B1.23.8) to the 
flight times of the scattered ($t_s$) and recoiled ($t_r$) particles as 

$$t_s = \frac{L (M_1 + M_2)}{(2 M_1 E_0)^{1/2} \left\{ \cos\theta + \left[ (M_2/M_1)^2 - \sin^2\theta \right]^{1/2} \right\}} \tag{B1.23.23}$$

and 

$$t_r = \frac{L (M_1 + M_2)}{(8 M_1 E_0)^{1/2} \cos\phi} \tag{B1.23.24}$$

where $M_1$ and $M_2$ are the masses of the projectile and the target atom, respectively, and $L$ is the flight distance, that is, the distance from target to detector. Collection of neutrals plus ions 
results in scattering and recoiling intensities that are determined by elemental concentrations, shadowing and 
blocking effects and classical cross sections. The main advantage of TOF-SARS for surface compositional 
analyses is its extreme surface sensitivity as compared to the other surface spectrometries, i.e. mainly XPS 
and AES. Indeed, with a correct orientation and aperture of the shadow cone, the first monolayer can be 
probed selectively. At selected incident angles, it is possible to delineate signals from specific subsurface 
layers. Detection of the particles independently of their charge state eliminates ion neutralization effects. Also, 

the multichannel detection requires primary ion doses of only «10 ions cm or «10 ions/surface atom for 
spectral acquisition; this ensures true static conditions during analyses. 
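
The flight-time expressions (B1.23.23) and (B1.23.24) can be used to predict where peaks should appear in a TOF spectrum. The sketch below is illustrative: the function names, the 1 m flight distance and the detection angle of 28° for both scattering and recoiling are assumptions of the example, and only the positive root of the scattering expression is evaluated.

```python
import math

AMU = 1.66053907e-27   # kg
EV = 1.602176634e-19   # J

def scattering_flight_time(length_m, e0_ev, m1_amu, m2_amu, theta_deg):
    """Flight time (s) of the scattered projectile, equation (B1.23.23)."""
    theta = math.radians(theta_deg)
    disc = (m2_amu / m1_amu) ** 2 - math.sin(theta) ** 2
    if disc < 0.0:
        raise ValueError("no single scattering at this angle")
    prefactor = length_m * (m1_amu + m2_amu) * AMU / math.sqrt(
        2.0 * m1_amu * AMU * e0_ev * EV
    )
    return prefactor / (math.cos(theta) + math.sqrt(disc))

def recoil_flight_time(length_m, e0_ev, m1_amu, m2_amu, phi_deg):
    """Flight time (s) of the directly recoiled target atom, equation (B1.23.24)."""
    phi = math.radians(phi_deg)
    return length_m * (m1_amu + m2_amu) * AMU / (
        math.sqrt(8.0 * m1_amu * AMU * e0_ev * EV) * math.cos(phi)
    )

# 4 keV Ar on a Si surface carrying H and O, detector at 28 degrees, L = 1 m
print("Ar from Si", scattering_flight_time(1.0, 4000.0, 39.95, 28.09, 28.0))
for name, mass in (("H", 1.008), ("O", 16.00), ("Si", 28.09)):
    print(name, "recoil", recoil_flight_time(1.0, 4000.0, 39.95, mass, 28.0))
```

For a fixed projectile and geometry the recoil flight time grows with $M_1 + M_2$, so lighter recoils such as H arrive before heavier ones such as Si.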

Examples of typical TOF spectra obtained from 4 keV Ar + impinging on a Si {100} surface with chemisorbed 
H 2 and H 2 are shown in figure B1.23.6 [35]. Peaks due to Ar scattering from Si and recoiled H, O and Si are 
observed. The intensities necessary for structural analysis are obtained by integrating the areas of fixed time 
windows under these peaks. 
Figure B1.23.6. TOF spectra of a Si{100} surface with chemisorbed H 2 (left) and clean Si (right). Peaks 
due to scattered Ar and recoiled H, O, and Si are observed. Conditions: 4 keV Ar$^+$, scattering angle = 28°, 
incident angle $\alpha$ = 8°. 

While qualitative identification of scattering and recoiling peaks is straightforward, quantitative analysis 
requires relating the scattered or recoiled flux to the surface atom concentration. The flux of scattered or 
recoiled atoms is dependent on several parameters as detailed in equation (B1.23.7). Compositional analyses 
by TOF-SARS and ISS have been applied in different areas of surface science, mainly in situations where the 
knowledge of the uppermost surface composition (first monolayer) is crucial. Some of these areas are as 
follows: gas adsorption, surface segregation, compounds and polymer blends, surface composition of real 
supported catalysts, surface modifications due to preferential sputtering by ion beams, diffusion, thin film 
growth and adhesion. 


B1.23.6 STRUCTURAL ANALYSIS FROM TOF-SARS 

The atomic structure of a surface is usually not a simple termination of the bulk structure. A classification 
exists based on the relation of surface to bulk structure. A bulk truncated surface has a structure identical to 
that of the bulk. A relaxed surface has the symmetry of the bulk structure but different interatomic spacings. 
With respect to the first and second layers, lateral relaxation refers to shifts in layer registry and vertical 
relaxation refers to shifts in layer spacings. A reconstructed surface has a symmetry different from that of the 
bulk symmetry. The methods of structural analysis will be delineated below. 

B1.23.6.1 SCATTERING VERSUS INCIDENT ANGLE SCANS 

When an ion beam is incident on an atomically flat surface at grazing angles, each surface atom is shadowed 
by its neighbouring atom such that only forward scattering (FS) is possible; these are large impact parameter 
($p$) collisions. 
As $\alpha$ increases, a critical value $\alpha_{c,sh}^{i}$ is reached each time the $i$th layer of target atoms moves out of the shadow 
cone, allowing for large-angle backscattering (BS) or small-$p$ collisions, as shown in figure B1.23.3. If the BS 
intensity $I_{BS}$ is monitored as a function of $\alpha$, steep rises [36] with well defined maxima are observed when the 
focused trajectories at the edge of the shadow cone pass close to the centre of neighbouring atoms. This is 
illustrated for scattering of Ne$^+$ from a Pt{110} surface in figure B1.23.7. From the shape of the shadow cone, 
i.e. the radius ($R$) as a function of distance ($l$) behind the target atom (figure B1.23.3), the interatomic spacing 
($d$) can be directly determined from the $I_{BS}$ versus $\alpha$ plots. For example, by measuring $\alpha_{c,sh}^{1}$ along directions 
for which specific crystal azimuths are aligned with the projectile direction and using $d = R/\sin\alpha_{c,sh}^{1}$, one can 
determine interatomic spacings in the first atomic layer. The first-second layer spacing can be obtained in a 
similar manner from $\alpha_{c,sh}^{2}$ measured along directions for which the first- and second-layer atoms are aligned, 
providing a measure of the vertical relaxation in the outermost layers. 
Figure B1.23.7. Scattering intensity versus incident angle $\alpha$ scans for 2 keV Ne$^+$ incident on (1 × 2)-Pt{110} 
at $\theta$ = 149° along the $\langle\bar{1}10\rangle$, $\langle 001\rangle$ and $\langle\bar{1}12\rangle$ azimuths. A top view of the (1 × 2) missing-row Pt{110} surface 
along with atomic labels is shown. Cross-section diagrams along the three azimuths illustrating scattering 
trajectories for the peaks observed in the scans are shown on the right. 
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B1.23.6.2 SCATTERING VERSUS AZIMUTHAL ANGLE A SCANS 


Fixing the incident beam angle and rotating the crystal about the surface normal while monitoring the 
backscattering intensity provides a scan of the crystal azimuthal angles δ [37]. Such scans reveal the 
periodicity of the crystal structure. For example, one can obtain the azimuthal alignment and symmetry of the 
outermost layer by using a low α value such that scattering occurs from only the first atomic layer. With 
higher α values, similar information can be obtained for the second atomic layer. Shifts in the first-second 
layer registry can be detected by carefully monitoring the $\alpha_{c,sh}^{2}$ values for second-layer scattering 
along directions near those azimuths for which the second-layer atoms are expected, from the bulk structure, to be 
directly aligned with the first-layer atoms. The $\alpha_{c,sh}^{2}$ values will be maximum for those δ values where 
the first- and second-layer neighbouring atoms are aligned. 

When the scattering angle θ is decreased to a forward angle (<90°), both shadowing effects along the 
incoming trajectory and blocking effects along the outgoing trajectory contribute to the patterns. The blocking 
effects arise because the exit angle β = θ - α is small at high α values. Surface periodicity can be read directly 
from these features [37], as shown in figure B1.23.8 for Pt{110}. Minima are observed at the δ positions 
corresponding to alignment of the beam along specific azimuths. These minima are a result of shadowing and 
blocking along the close-packed directions, thus providing a direct reading of the surface periodicity. 

Figure B1.23.8. Scattering intensity of 2 keV Ne+ versus azimuthal angle δ scans for Pt{110} in the (1 × 2) 
and (1 × 3) reconstructed phases. Scattering angle θ = 28° and incident angle α = 6°. 


Azimuthal scans obtained for three surface phases of Ni{110} are shown in figure B1.23.9 [38]. The minima 
observed for the clean and hydrogen-covered surfaces are due only to Ni atoms shadowing neighbouring Ni 
atoms, whereas for the oxygen-covered surface minima are observed due to both O and Ni atoms shadowing 
neighbouring Ni atoms. Shadowing by H atoms is not observed because the maximum deflection in the Ne+ 
trajectories caused by H atoms is less than 2.8°. 

Figure B1.23.9. Scattering intensity of 4 keV Ne+ versus azimuthal angle δ for a Ni{110} surface in the clean 
(1 × 1), (1 × 2)-H missing row, and (2 × 1)-O missing row phases. The hydrogen atoms are not shown. The 
oxygen atoms are shown as small open circles. O-Ni and Ni-Ni denote the directions along which O and Ni 
atoms, respectively, shadow the Ni scattering centre. 


B1.23.7 STRUCTURAL ANALYSIS FROM SARIS 

An example of the SARIS experimental arrangement is shown in figure B1.23.10 [39, 40 and 41]. The 
velocity distributions of scattered and recoiled ions plus fast neutrals are measured by analysing the positions 
of the particles on the detector along with their correlated TOF from sample to detector. The detector is gated 
so that it can be activated in windows of several microseconds duration, which are appropriate for TOF 
collection of specific scattered or recoiled particles. These windows are divided into 255 time frames with a 
duration of 16.7 ns for each frame. Good statistics are obtained in a total acquisition time of ~1 min. The 
image ordinate represents particle exit angles (β) and the abscissa represents the crystal azimuthal angles (δ), 
i.e. an image in (β, δ)-space. 
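
The framing described above can be sketched as a simple binning rule. The frame count and frame duration 
are taken from the text; the function names and the choice of window start are hypothetical and serve only 
to illustrate how a flight time maps onto one of the 255 frames. 

```python
FRAME_WIDTH_NS = 16.7  # duration of one time frame, from the text
N_FRAMES = 255         # number of frames in one gated window, from the text


def window_length_ns():
    """Total duration of one gated detector window in nanoseconds."""
    return N_FRAMES * FRAME_WIDTH_NS


def frame_index(tof_ns, window_start_ns):
    """Zero-based frame index for a flight time, or None if it falls outside the window."""
    offset = tof_ns - window_start_ns
    if offset < 0 or offset >= window_length_ns():
        return None
    return int(offset // FRAME_WIDTH_NS)


print(f"window length = {window_length_ns() / 1000:.2f} microseconds")
print("frame for 1170 ns, window opened at 0 ns:", frame_index(1170.0, 0.0))
```

With these numbers one window spans roughly 4.3 µs, consistent with the 'several microseconds' quoted above. 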

Figure B1.23.10. Schematic diagram of a scattering and recoiling imaging spectrometer (SARIS). A large-area 
(95 × 75 mm), time-resolving, position-sensitive microchannel plate (MCP) detector captures a large 
solid angle of the scattering and recoiling particles. A triple-axis UHV goniometer moves the MCP inside the 
vacuum chamber in order to vary the scattering angle, the distance from detector to sample, the TOF 
resolution and the acceptance solid angle of the detector. 

B1.23.7.1 INTERACTION OF 4 KEV AR WITH PT{111} 

(A) AR SCATTERING 

The time-resolved images of Ar scattering [33] from Pt{111} of figure B1.23.11 correspond to selected 
frames of scattered Ar atoms with the azimuthal angle of the incident beam aligned along ⟨1̄1̄2⟩. The overall 
scattering intensity is maximal at 1.17 µs (3.54 keV) for the scattered Ar atoms, corresponding to SS as 
predicted by the BCA. The two intense spots at 1.17 µs result from the scattering from a first-layer Pt atom 
and focusing of the scattered beam by an 'atomic lens' formed by neighbouring first-layer Pt atoms (2, 3, 4 in 
figure B1.23.11). The intense spots are at small β since most of the Ar atoms are scattered and focused by 
first-layer Pt atoms. Focused high-β scattering usually arises from subsurface collisions. 
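
The correspondence between flight time and energy quoted above follows from E = ½mv² with v equal to the 
flight path divided by the flight time. The sketch below is a minimal, non-relativistic conversion; the 
function names are hypothetical, and the flight path for the Ar measurement is not stated in the text, so 
the code simply reports the path length implied by the quoted pair of values (1.17 µs, 3.54 keV). 

```python
import math

AMU_KG = 1.66053907e-27  # atomic mass unit in kg
EV_J = 1.602176634e-19   # one electron volt in joules


def speed_m_per_s(energy_ev, mass_u):
    """Non-relativistic speed of a particle with the given kinetic energy."""
    return math.sqrt(2.0 * energy_ev * EV_J / (mass_u * AMU_KG))


def flight_time_us(energy_ev, mass_u, path_m):
    """Time of flight in microseconds over a straight path of length path_m."""
    return path_m / speed_m_per_s(energy_ev, mass_u) * 1e6


def energy_ev_from_tof(tof_us, mass_u, path_m):
    """Kinetic energy in eV implied by a flight time over path_m."""
    v = path_m / (tof_us * 1e-6)
    return 0.5 * mass_u * AMU_KG * v * v / EV_J


AR_MASS_U = 39.948
implied_path_m = speed_m_per_s(3540.0, AR_MASS_U) * 1.17e-6
print(f"implied flight path = {implied_path_m * 100:.1f} cm")
```

The implied path of about 15 cm is close to the 15.5 cm flight path quoted in the caption of figure B1.23.15 
for the He measurements, which suggests the quoted time and energy are mutually consistent. 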

Figure B1.23.11. Above: selected time-resolved SARIS images of 4 keV Ar+ scattering from Pt{111} along 
⟨1̄1̄2⟩. Below: view of Pt{111} surface along ⟨1̄1̄2⟩ showing Ar+ scattering from a first-layer Pt atom (1) and 
splitting into two focused beams by an 'atomic lens' formed by neighbouring first-layer Pt atoms (2, 3, 4). 

(B) PT RECOILING 


The images of Pt atoms recoiled [33] by 4 keV Ar+ are shown in figure B1.23.12. With increasing TOF, the 
recoil Pt images change from diffuse, to a focused recoil spot at β ~ 25° and, finally, to movement of this spot 
to a higher β that is partially off the MCP. This focused recoil is observed along the 0° ⟨2̄11⟩ and 60° ⟨1̄1̄2⟩ 
azimuths but not along the 30° ⟨1̄01⟩ and 90° ⟨01̄1⟩ azimuths. The diffuse images at short TOF, e.g. 3.77 µs, 
correspond to recoil of Pt atoms from the first layer. These recoils have more isotropic distributions and 
higher energies because there are no atoms above them by which they can be blocked. At longer TOF, e.g. 
4.57 µs, the focused recoil is due to Pt atoms from the second layer which have been focused by an 'atomic 
lens' created by first-layer atoms (figure B1.23.12). The second-layer Pt atom can either be recoiled directly 
into the atomic lens or it can scatter from the neighbouring aligned third-layer atom into the atomic lens. 

Figure B1.23.12. Above: selected time-resolved SARIS images of 4 keV Ar+ recoiling Pt atoms from 
Pt{111} along ⟨1̄1̄2⟩. Below: view of Pt{111} surface along ⟨1̄1̄2⟩ showing a focused second-layer Pt recoil 
trajectory (atoms 1-4 form a focusing 'atomic lens'). 

B1.23.7.2 INTERACTION OF 4 KEV HE WITH PT{111} 

A series of time-resolved He scattering images [33] taken as a function of azimuthal angle is shown in figure 
B1.23.13. The crystal was rotated about its surface normal by 3° for each image. Each image is taken from a 
16.7 ns frame corresponding to the QSS TOF. The same intensity scale was used for all of the frames. The 
observed images are rich in features which change in position and intensity as a function of azimuthal angle. 
The regions of low intensity correspond to the positions of the centres of the blocking cones; these regions 
have mainly circular or oval shapes with distortions caused by other overlapping blocking cones. The regions 
of high intensity correspond to the positions of intersection or near-overlap of blocking cones; atom 
trajectories are highly focused along the edges of the cones. 

Figure B1.23.13. A series of 20 time-resolved SARIS frames for 4 keV He+ scattering from Pt{111}-(1 × 2) 
taken every 3° of rotation about the azimuthal angle δ, starting with δ = 0° as the ⟨2̄11⟩ azimuth and 60° as the 
⟨1̄1̄2⟩ azimuth. Each frame represents a 16.7 ns window centred at the TOF corresponding to QSS as 
predicted by the BCA. The abscissa is the crystal azimuthal angle (δ) and the ordinate is the particle exit angle 
(β). 

The images at δ = 0° and 60° along the ⟨2̄11⟩ and ⟨1̄1̄2⟩ azimuths, respectively, are symmetrical about a 
vertical line through the centre of the frame, as is the crystal structure along these azimuths as shown in figure 
B1.23.14. The shifts in the positions and sizes of the blocking cones can be monitored as the azimuthal angle 
δ is rotated away from the symmetrical 0° or 60° directions. There are large variations in the intensities as a 
function of δ, with the highest intensities being observed along the directions δ = 22-32° and 56-60°. These 
high-intensity features result from focusing of ions onto second-layer atoms by the shadow cones of first-layer 
atoms. The first-layer atoms are symmetrical; however, the second-layer atoms are in sites which are 
asymmetrical with respect to the first layer, resulting in non-planar scattering trajectories. Very intense 
features in asymmetrical positions are observed at higher exit angles. These intense features correspond to 
semichanneling in asymmetrical channels. Semichannels are 'valleys' in surfaces through which scattered 
ions are guided. Along ⟨1̄01⟩ the first-layer atoms form the 'walls' and the second-layer atoms form the 'floor' 
of the semichannel. However, the second-layer rows are not centred in the bottom of the channel, resulting in 
an asymmetrical channel. As a result, the scattered atom trajectories are bent and focused along directions 
determined by the asymmetry of the channel. 


Figure B1.23.14. Schematic illustration of the Pt{111}-(1 × 1) surface, with first-, second- and third-layer 
atoms and the ⟨2̄11⟩, ⟨1̄01⟩ and ⟨1̄1̄2⟩ azimuths indicated. Arrows are drawn to indicate the 
nearest-neighbour first-first-, second-first-, and third-first-layer interatomic vectors. 

The frames along the 0° ⟨2̄11⟩ and 60° ⟨1̄1̄2⟩ azimuths in figure B1.23.13 were selected to compare with those 
of blocking cone analyses and classical ion trajectory simulations. The arrangement of the first-layer atoms is 
identical along both of these azimuths; however, the second- and third-layer atoms have a different 
arrangement with respect to the first-layer atoms. He atoms scattered from second- and third-layer atoms 
experience a different arrangement of blocking cones on their exit from the surface. The positions of the 
blocking cones were calculated [39] from the interatomic vectors of figure B1.23.14 and the critical blocking 
angles or sizes of the cones were calculated with the method described in section B1.23.4. The results are 
shown in figure B1.23.15. The blocking of scattering trajectories from nth-layer atoms by their neighbouring 
nth-layer atoms is observed at low β since these atoms are all in the same plane. This first/first-layer atom 
scattering contributes most of the intensity at low β. The arcs corresponding to the edges of the blocking 
cones (figure B1.23.15) resulting from the vectors A, B, D and E in figure B1.23.14 occur at β ~ 10°. The 
features at higher β correspond to scattering trajectories from second- and third-layer atoms that are blocked 
and focused by first-layer atoms. The cones resulting from the vectors F, G and O along ⟨2̄11⟩ and M and N 
along ⟨1̄1̄2⟩ are due to scattering trajectories from second- and third-layer atoms that are blocked by first-layer 
atoms along these symmetrical directions. These are centred along the azimuths and are directed to higher β 
values for shorter interatomic spacings. Blocking cones due to the vectors H, I, K and L result from second- 
and third-layer scattering and are observed at δ values off the 0° and 60° directions due to non-planar 
scattering trajectories. 
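
The centre of a blocking cone lies along the interatomic vector from the scattering atom to the blocking atom, 
so its position in (β, δ)-space follows from simple geometry. The sketch below is only an illustration: the 
function name and the coordinate convention (z along the surface normal, δ measured from the x axis) are 
assumptions, and ideal bulk fcc geometry is assumed for the example vector. 

```python
import math


def exit_direction(dx, dy, dz):
    """Return (beta, delta) in degrees for an interatomic vector (dx, dy, dz).

    beta  -- elevation of the vector above the surface plane (exit angle)
    delta -- azimuth of its in-plane projection, measured from the x axis
    """
    beta = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    delta = math.degrees(math.atan2(dy, dx))
    return beta, delta


D12 = 2.265                      # first-second layer spacing in angstrom, from the text
A_NN = D12 * math.sqrt(1.5)      # nearest-neighbour distance for an ideal fcc {111} stack
LATERAL = A_NN / math.sqrt(3.0)  # in-plane offset from a second-layer atom to its nearest first-layer atom

beta, delta = exit_direction(LATERAL, 0.0, D12)
print(f"nearest second-to-first layer vector: beta = {beta:.1f} deg, delta = {delta:.1f} deg")

beta_in_plane, _ = exit_direction(A_NN, 0.0, 0.0)
print(f"first-to-first layer vector: beta = {beta_in_plane:.1f} deg")
```

Longer second-to-first and third-to-first layer vectors have larger in-plane components and therefore lower β, 
while vectors between atoms in the same layer have β = 0. 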

Figure B1.23.15. Experimental images (left), simulated images (right) and blocking cone analyses (centre) for 
He+ scattering along the ⟨2̄11⟩ and ⟨1̄1̄2⟩ azimuths. For the calculated blocking cones, first-first, first-second, 
and first-third layer interactions are identified by dash-dot, solid, and dotted lines, respectively. The scattering 
parameters are: scattering angle (with respect to incident beam direction) θ = 45°; beam incident angle (with 
respect to surface plane) α = 28°; exit angle of scattered particles along detector normal (relative to surface 
plane) β = 17°; flight path to detector (along the detector normal) = 15.5 cm. Angular space subtended by 
MCP is 27° of crystal azimuthal angle δ and 33° of particle exit angle β. 

(A) QUANTITATIVE ANALYSIS 

Quantitative analyses can be achieved by using the scattering and recoiling imaging code (SARIC) simulation 
and minimization of the R-factor [33] (section B1.23.4.4) between the experimental and simulated images as a 
function of the structural parameters. The SARIC was used to generate simulated images of 4 keV He+ 
scattering from bulk-terminated Pt{111} as a function of the first-second interlayer spacing d. Anisotropic 
thermal vibrations with an amplitude of 0.1 Å were included in the model. A two-dimensional reliability, or R, 
factor, based on the differences between the experimental and simulated patterns, was calculated as a function 
of the deviation Δd of the first-second interlayer spacing from the bulk value. The plots shown in figure 
B1.23.16 exhibit minima at Δd_min = -0.005 and +0.005 Å for the ⟨1̄1̄2⟩ and ⟨2̄11⟩ azimuths, respectively. The 
optimized simulated images corresponding to Δd_min are shown in figure B1.23.16, rightmost frames; there is 
good agreement between these simulated and experimental images. The R-factors are sensitive to changes in 
the interlayer spacing at the level of 0.01 Å. Based on these data, we conclude that the Pt{111} surface is bulk 
terminated with the first-second layer spacing within ±0.01 Å, or 0.4%, of the 2.265 Å bulk spacing. This 
0.01 Å sensitivity is smaller than the 0.1 Å thermal vibration amplitude because SARIS samples the average 
positions of lattice atoms. 
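
The minimization described above can be sketched as a one-dimensional scan over trial deviations. The 
R-factor used here (sum of absolute differences between images normalized to unit total intensity) is an 
assumed stand-in; the definition given in section B1.23.4.4 may differ. The simulator is a toy placeholder 
for SARIC, and all names and numbers are hypothetical. 

```python
def r_factor(exp_img, sim_img):
    """Sum of absolute pixel differences after normalizing each image to unit total intensity.

    Returns 0 for images of identical shape and grows as the patterns diverge.
    """
    exp_total = sum(map(sum, exp_img))
    sim_total = sum(map(sum, sim_img))
    return sum(
        abs(e / exp_total - s / sim_total)
        for exp_row, sim_row in zip(exp_img, sim_img)
        for e, s in zip(exp_row, sim_row)
    )


def best_deviation(exp_img, simulate, deviations):
    """Return the (deviation, R) pair with the smallest R among the trial deviations."""
    scored = ((dd, r_factor(exp_img, simulate(dd))) for dd in deviations)
    return min(scored, key=lambda pair: pair[1])


def toy_simulate(dd):
    """Toy 2 x 2 'image' whose intensity distribution shifts linearly with the deviation dd."""
    return [[1.0, 2.0 + 100.0 * dd], [3.0 - 100.0 * dd, 1.0]]


experimental = toy_simulate(0.005)              # pretend measurement
trials = [-0.010, -0.005, 0.0, 0.005, 0.010]    # trial deviations in angstrom
dd_min, r_min = best_deviation(experimental, toy_simulate, trials)
print(f"minimum R = {r_min:.3f} at deviation {dd_min:+.3f} angstrom")
```

In a real analysis each trial image would come from a full trajectory simulation, and the width of the R 
minimum would indicate the sensitivity to the structural parameter. 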

Figure B1.23.16. Plots of the two-dimensional R-factors as a function of the deviation (Δd) of the first-second 
interlayer spacing from the bulk value. The experimental and simulated images along the ⟨2̄11⟩ and ⟨1̄1̄2⟩ 
azimuths of figure B1.23.15 were used in the comparison. 


B1.23.8 ION-SURFACE ELECTRON EXCHANGE 

One of the unsolved problems in the interaction of low-energy ions with surfaces is the mechanism of charge 
transfer and prediction of the charge composition of the flux of scattered, recoiled and sputtered atoms. The 
ability to collect spectra of neutrals plus ions and only neutrals provides a direct measure of scattered and 
recoiled ion fractions. SARIS images can provide electronic transition probability contour maps which are 
related to surface electron density and reactivity along the various azimuths. 
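
The ion fraction mentioned above follows directly from the two spectra. The sketch below assumes that the 
neutrals-plus-ions and neutrals-only intensities are integrated over the same peak under identical beam dose 
and detection efficiency; the function name and the counts are hypothetical. 

```python
def ion_fraction(total_counts, neutral_counts):
    """Fraction of detected particles that were ions.

    total_counts   -- integrated peak intensity of the neutrals-plus-ions spectrum
    neutral_counts -- integrated peak intensity of the neutrals-only spectrum
    """
    if total_counts <= 0:
        raise ValueError("total_counts must be positive")
    if not 0 <= neutral_counts <= total_counts:
        raise ValueError("neutral_counts must lie between 0 and total_counts")
    return (total_counts - neutral_counts) / total_counts


# Illustrative (not measured) counts.
print(f"ion fraction = {ion_fraction(1000, 900):.2f}")
```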

Ion-surface electron transition probabilities are determined by electron tunnelling between the valence bands 
of the surface and the atomic orbitals of the ion [42]. Such transition probabilities are highest for close 
distances of approach. Since TOF-SARS is capable of directly measuring the scattered and recoiled ion 
fractions, it provides an excellent method for studying ion-surface charge exchange. For simplicity, electron 
exchange [43] between ions or atoms and surfaces can be discussed in terms of two regions: (i) along the 
incoming and outgoing trajectories where the particle is within a few ångströms of the surface and (ii) in the 
close atomic encounter where the core electron orbitals of the collision partners overlap. In region (i), the 
dominating processes are resonant and Auger electron tunnelling transitions, both of which are fast (τ < 10⁻¹⁵ 
s). Since the work functions of most solids are lower than the ionization potentials of most gaseous atoms, 
keV scattered and recoiled species are predominantly neutrals as a result of electron capture from the solid. In 
region (ii), as the interatomic distance R decreases, the atomic orbitals (AOs) of the separate atoms of atomic 
number Z1 and Z2 evolve into molecular orbitals (MOs) of a quasimolecule and finally into the AO of the 
'united' atom of atomic number (Z1 + Z2). As R decreases, a critical distance is reached where electrons are 
promoted into higher-energy MOs because of electronic repulsion and the Pauli exclusion principle. This can 
result in collisional reionization of neutral species. The fraction of species scattered and recoiled as ions is 
sensitive to atomic structure through changes in electron density along the trajectories. A direct method for 
measuring the spatial dependence of charge transfer probabilities with atomic-scale resolution has been 
developed using the method of DR ion fractions [43]. The data demonstrate the need for an improved 
understanding of how atomic energy levels shift and broaden near surfaces. 

These types of measurements, combined with theoretical modelling, can provide a detailed microscopic map 
of the local reactivity of the surface as well as electron tunnelling rates within the surface unit cell. This 
information is of crucial importance for the understanding of various impurity-induced promotion and 
poisoning phenomena in catalysis and electron-density maps from scanning tunnelling microscopy. 


B1.23.9 ROLE OF SCATTERING AND RECOILING AMONG SURFACE 
SCIENCE TECHNIQUES 

Scattering and recoiling contribute to our knowledge of surface science through (i) elemental analysis, (ii) 
structural analysis and (iii) analysis of electron exchange probabilities. We will consider the merits of each of 
these three areas. 

B1.23.9.1 ELEMENTAL ANALYSIS 

There are two unique features of scattering and recoiling spectrometry: (1) sensitivity to the outermost atomic 
layer of a surface and (2) sensitivity to surface hydrogen. Using an ESA, it is possible to resolve ions scattered 
from all elements of mass greater than carbon. The TOF technique is sensitive to all elements, including 
hydrogen, although it has limited mass resolution. For general qualitative and quantitative surface elemental 
analyses, XPS and AES remain the techniques of choice. 

B1.23.9.2 STRUCTURAL ANALYSIS 

The major role of TOF-SARS and SARIS is as surface structure analysis techniques which are capable of 
probing the positions of all elements with an accuracy of <0.1 Å. They are sensitive to short-range order, i.e. 
individual interatomic spacings that are <10 Å. They provide a direct measure of the interatomic distances in 
the first and subsurface layers and a measure of surface periodicity in real space. One of their most important 
applications is the direct determination of hydrogen adsorption sites by recoiling spectrometry [12, 41]. Most 
other surface structure techniques do not detect hydrogen, with the possible exception of He atom scattering 
and vibrational spectroscopy. 

TOF-SARS and SARIS are complementary to LEED, which probes long-range order (minimum domain size 
of 100-200 Å) and provides a measure of surface and adsorbate symmetry in reciprocal space. Coupling 
TOF-SARS, SARIS and LEED provides a powerful combination for surface structure investigations. The 
techniques of medium- and high- (Rutherford backscattering) energy ion scattering only sample subsurface 
and bulk structure and are not as surface sensitive as TOF-SARS. 


B1.23.10 LOW-ENERGY SCATTERING OF LIGHT ATOMS 

Atomic and molecular beams of light species such as He, H and H2 formed from supersonic nozzle beam 
sources typically have kinetic energies of 20 to 100 meV [44]. Scattering of such low-energy light atoms from 
surfaces is predominantly elastic. Coherently scattered waves from regularly spaced surface atoms can 
interfere with each other, giving rise to well known diffraction phenomena. Such thermal-energy atoms have 
classical turning points that are typically about 3 Å above the centres of the outermost atomic layer of the 
surface. The diffraction data probe the outer regions of the atom-surface potential. This information is usually 
expressed as a corrugation function. Structural information such as bond angles and lengths is obtained by 
calculating the potential or corrugation function from assumed geometries [44]. 
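
Diffraction is observable because the de Broglie wavelength of such atoms, λ = h/√(2mE), is comparable to 
surface lattice spacings. The short sketch below evaluates λ for He at the two ends of the quoted energy range; 
the function name is hypothetical and the constants are standard SI values. 

```python
import math

PLANCK_J_S = 6.62607015e-34  # Planck constant in J s
AMU_KG = 1.66053907e-27      # atomic mass unit in kg
EV_J = 1.602176634e-19       # one electron volt in joules


def de_broglie_wavelength_angstrom(energy_mev, mass_u):
    """Non-relativistic de Broglie wavelength in angstrom for a kinetic energy given in meV."""
    momentum = math.sqrt(2.0 * mass_u * AMU_KG * energy_mev * 1e-3 * EV_J)
    return PLANCK_J_S / momentum * 1e10


HE_MASS_U = 4.002602
for energy in (20.0, 100.0):
    wavelength = de_broglie_wavelength_angstrom(energy, HE_MASS_U)
    print(f"He at {energy:.0f} meV: wavelength = {wavelength:.2f} angstrom")
```

The resulting wavelengths, roughly 0.5 to 1 Å, are of the same order as interatomic distances, which is why 
well resolved diffraction peaks are obtained from ordered surfaces. 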


The basic components of an apparatus for such atom scattering consist of a UHV scattering chamber equipped 
with a supersonic atomic and molecular beam source, a sample manipulator and a rotating mass spectrometer. 
Cryogenic sample temperatures are usually used in order to reduce the vibrational amplitudes of the surface 
atoms. Data are obtained from the in-plane and out-of-plane diffraction intensity distributions as a function of 
scattering angle. A set of corrugation parameters is derived from these data. These parameters can be calculated 
from a first principles approach based on a proportionality relation between the atom-surface interaction 
potential and the surface charge density. Such diffraction experiments of thermal He atoms coupled with 
theoretical simulations of the data have been shown to be a very useful structural tool for studying adsorption 
on surfaces. The method is extremely surface-sensitive, is capable of providing adsorption site information 
and is one of the few techniques that can detect surface hydrogen. 


B1.23.11 SUMMARY 

Emphasis in this chapter has been placed on the physical concepts and structural applications of TOF-SARS 
and SARIS. These techniques are now established as surface structural analysis methods that will have a 
significant impact in areas as diverse as thin-film growth, catalysis, hydrogen embrittlement and penetration 
of materials, surface reaction dynamics and analysis of interfaces. Surface crystallography is evolving from 
the classical concept of a static surface and the question of 'Where do atoms sit?' to the concept of a 
dynamically changing surface. The development of large-area detectors with rapid acquisition of scattering 
and recoiling structural images, as described in section B1.23.7, will provide a technique for capturing 
time-resolved snapshots of such dynamically changing surfaces. 


B1.24 Rutherford backscattering, resonance 
scattering, PIXE and forward (recoil) scattering 

C C Theron, V M Prozesky and J W Mayer 


B1.24.1 INTRODUCTION 

The use of million electron volt (MeV) ion beams for materials analysis was instigated by the revolution in 
integrated circuit technology. Thin planar structures were formed in silicon by energetic ion implantation of 
dopants to create electrically active regions and thin metal films were deposited to make interconnections 
between the active regions. Ion implantation was a new technique in the early 1960s and interactions between 
metal films and silicon required analysis. For example, the number of ions implanted per square centimetre 
(ion dose) and thicknesses of metal layers required careful control to meet the specifications of integrated 
circuit technology. Rutherford backscattering spectrometry (RBS) and MeV ion beam analysis were 
developed in response to the needs of the integrated circuit technology. In turn integrated circuit technology 
provided the electronic sophistication used in the instrumentation in ion beam analysis. It was a synergistic 
development of analytical tools and the fabrication of integrated circuits. 

Rutherford backscattering spectrometry is the measurement of the energies of ions scattered back from the 
surface and the outer microns (1 micron = 1 µm) of a sample. Typically, helium ions with energies around 2 
MeV are used and the sample is a metal-coated silicon wafer that has been ion implanted with about a 
monolayer (10¹⁵ ions cm⁻²) of electrically active dopants such as arsenic. Only moderate vacuum levels 
(about 10⁻⁴ Pa) are required so that sample exchange is rapid, allowing the analysis of the number of 
implanted atoms per square centimetre and their distribution in depth to be carried out in periods of about 15 
minutes or less. The sample is not damaged structurally during the analysis and therefore Rutherford 
backscattering spectrometry is considered non-destructive. This is in contrast to surface sensitive techniques 
such as Auger electron spectroscopy where surface erosion by sputtering is required for depth analysis. One of 
the strong features of Rutherford backscattering spectrometry is that the scattering cross sections are well 
known so that the analysis is quantitative. In other analytical techniques, such as secondary ion mass 
spectrometry (SIMS), the cross sections are not well defined. The relative ion yields can vary over three 
orders of magnitude depending on the nature of the surface. Rutherford backscattering has been a convenient 
way to calibrate secondary ion mass spectrometry, which in turn is more sensitive to the detection of trace 
elements than Rutherford backscattering. The two techniques are thus complementary. 

Ion beam analysis grew out of nuclear physics research on cross sections and reaction products involved in 
atomic collisions. In this work million volt accelerators were developed and used extensively. As the energies 
of the incident particles increased, the lower energy accelerators became available for use in solid state 
applications. The early nuclear physics research used magnetic spectrometers to measure the energies of the 
particles. This analytical procedure was time consuming and the advent of the semiconductor nuclear particle 
detector allowed simultaneous detection of all particle energies. It was an energy dispersive spectrometer. The 
semiconductor detector is a Schottky barrier (typically a gold film on silicon) or shallow diffused p-n junction 
with the active region defined by the high electric field in the depletion layer. The active region extends tens 
of microns below the surface of the detector so that in almost every application the penetration of the 
energetic particles is confined within the active region. The response of the detector is linear with the energy 
of the particles providing a true particle energy spectrometer. 


Analysis of ion implanted layers and metal-silicon interactions was carried out with Rutherford 
backscattering at 2.0 MeV energies and with semiconductor nuclear particle detectors for several years. 
Rutherford backscattering became well established and was utilized in materials analysis in industrial and 
university laboratories across the world. The importance of hydrogen and its influence in solid state chemistry 
led to the development of forward scattering in which one measures the energy of the recoiling hydrogen 
atom. The helium ion is heavier than the hydrogen atom, so that by tilting the sample it is possible to measure 
the recoil energy of the emerging hydrogen, again with a nuclear particle detector. In other words, the 
modification to the Rutherford backscattering spectrometry target chamber geometry was only to tilt the target 
and to move the detector. These forward recoil techniques have of course become more sophisticated with the 
use of heavy incident ions and detectors which measure both the energy and the mass of the recoiling particles 
(ΔE - E or time-of-flight detector). 
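
For an elastic two-body collision the fraction of the projectile energy transferred to a recoil emitted at angle 
φ from the beam direction is 4 M1 M2 cos²φ/(M1 + M2)². The sketch below evaluates this standard expression 
for helium on hydrogen; the function name is hypothetical, and the 30° recoil angle and 2 MeV beam energy are 
taken as representative values rather than as the parameters of a specific measurement. 

```python
import math


def recoil_energy_fraction(m_projectile, m_target, recoil_angle_deg):
    """Fraction of the projectile energy carried by an elastically recoiled target atom.

    recoil_angle_deg is measured between the incident beam and the recoil direction.
    """
    phi = math.radians(recoil_angle_deg)
    mass_sum = m_projectile + m_target
    return 4.0 * m_projectile * m_target * math.cos(phi) ** 2 / mass_sum ** 2


HE_MASS_U = 4.002602
H_MASS_U = 1.007825
fraction = recoil_energy_fraction(HE_MASS_U, H_MASS_U, 30.0)
print(f"H recoil at 30 deg carries {fraction:.3f} of the He energy")
print(f"for a 2 MeV beam: {2.0 * fraction:.2f} MeV at the surface")
```

Hydrogen recoiled from below the surface emerges with less energy because both the incoming helium and the 
outgoing hydrogen lose energy along their paths in the sample. 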

Analysis of silicon is an almost ideal experimental situation because the masses of most implanted atoms and 
metal layers exceed that of silicon. In Rutherford backscattering the mass of the atom to be detected must be 
greater than that of the silicon substrate for its energy signal to be separated from the silicon spectrum. 
Oxygen is an exception. It is lighter than silicon and also is ubiquitous in surface and interface layers. The 
analysis of oxygen, and also carbon and nitrogen, is carried out in the same experimental chamber as used in 
Rutherford backscattering, but the energy of the incident helium ions is increased to energies where there are 
resonances in the backscattering cross sections. These resonances increase the yield of the scattered particle 
by nearly two orders of magnitude and provide high sensitivity to the analysis of oxygen and carbon in 
silicon. The use of these high energies, 3.04 MeV for the helium-oxygen resonance, is called resonance 
scattering and the word Rutherford is inappropriate as a descriptor. 
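
The mass condition can be made concrete with the elastic kinematic factor K, the ratio of the backscattered to 
the incident ion energy for scattering from a surface atom. The sketch below uses the standard two-body 
expression for a target heavier than the projectile; the function name is hypothetical, and the 170° 
scattering angle and 2 MeV beam energy are typical values rather than parameters of a particular experiment. 

```python
import math


def kinematic_factor(m_ion, m_target, theta_deg):
    """Ratio E1/E0 for elastic scattering through the laboratory angle theta (m_target > m_ion)."""
    if m_target <= m_ion:
        raise ValueError("this expression assumes a target heavier than the projectile")
    theta = math.radians(theta_deg)
    ratio = m_target / m_ion
    root = math.sqrt(ratio ** 2 - math.sin(theta) ** 2)
    return ((math.cos(theta) + root) / (1.0 + ratio)) ** 2


HE_MASS_U = 4.002602
TARGETS_U = {"O": 15.999, "Si": 28.0855, "As": 74.9216}
BEAM_MEV = 2.0

for symbol, mass in TARGETS_U.items():
    k = kinematic_factor(HE_MASS_U, mass, 170.0)
    print(f"{symbol}: K = {k:.3f}, surface energy = {k * BEAM_MEV:.2f} MeV")
```

Helium scattered from arsenic appears above the silicon edge and is therefore free of substrate background, 
whereas the oxygen signal falls below the silicon edge and is superimposed on the silicon spectrum. 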

By inserting a semiconductor x-ray detector into the analysis chamber, one can measure particle induced 
x-rays. The cross section for particle induced x-ray emission (PIXE) is much greater than that for Rutherford 
backscattering and PIXE is a fast and convenient method for measuring the identity of atomic species within 
the outer microns of the sample surface. The energy resolution in helium ion Rutherford backscattering 
spectrometry does not allow discrimination between the signals from high atomic number (high Z) elements 
close to each other in the periodic table. With conventional semiconductor detectors one cannot distinguish 
between gold and tungsten, for example, whereas the ion induced x-ray energies are easily distinguished for 
the two high Z elements. PIXE, then, becomes another tool in the MeV ion analysis chamber and only 
requires the addition of an x-ray detector system. 

The incident ion beam is typically 1 mm wide where it impinges on the sample surface. This dimension can be 
easily obtained using slits in the beam handling system. The 
beam diameter can be reduced by orders of magnitude by using quadrupole or electrostatic lenses to focus the 
ion beam to diameters of about one micron on the sample surface. The beam is then rastered across the 
surface to provide a visual image of the surface with micron resolution. In this work the large cross sections 
for PIXE are important, because sample analysis can be performed without sample damage caused by the high 
current density of incident ions. 

This overview covers the major techniques used in materials analysis with MeV ion beams: Rutherford 
backscattering, channelling, resonance scattering, forward recoil scattering, PIXE and microbeams. We have 
not covered nuclear reaction analysis (NRA), because it applies to special incident-ion-target-atom 
combinations and is a topic of its own [1, 2]. 


B1.24.2 RUTHERFORD BACKSCATTERING SPECTROMETRY (RBS) 

The discussion of Rutherford backscattering spectrometry starts with an overview of the experimental target 
chamber, proceeds to the particle kinematics that determine mass identification and depth resolution, and then 
provides an example of the analysis of a silicide. 

B1.24.2.1 TARGET CHAMBER 

Figure B 1.24.1 shows the placement of the sample and detectors in the target chamber. The sample is located 
so that its surface is on an axis of rotation of a goniometer so that the beam position does not shift across the 
sample as the sample is tilted with respect to the incident ion beam. The backscattering detector is mounted as 
close to the incident beam as possible so that the average backscattering angle, 0, is close to 180°, typically 
170°, with a detector solid angle of about 3-5 millisteradians (msr). In some cases annular detectors are used 
with the incident beam passing through the centre of the detector aperture in order to provide larger analysis 
solid angles. The sample is rotated to glancing angle geometries when the forward scattering detector is used. 
Typically a thin foil is placed in front of the detector to block the helium ions while allowing the hydrogen 
ions to pass through with only minimal energy loss. The stopping power (energy loss) of MeV helium ions is 
ten times that of the recoiling hydrogen ions. As shown in section B1.24.8 below, the forward scattering
detector system can be augmented to include a ΔE-E detector to allow identification of the ion mass as well
as energy. The x-ray detector is placed so that the active region is in full view of the sample surface
bombarded with the incident ions.

Figure B1.24.1. Schematic diagram of the target chamber and detectors used in ion beam analysis. The
backscattering detector is mounted close to the incident beam and the forward scattering detector is mounted 
so that, when the target is tilted, hydrogen recoils can be detected at angles of about 30° from the beam 
direction. The x-ray detector faces the sample and receives x-rays emitted from the sample. 


For conventional backscattering spectrometry with helium ions, the energy resolution of the semiconductor 
particle detector is typically 15 kiloelectron volts (keV). This resolution can be improved to 10 keV with
special detectors and detector cooling. The output signal, which is typically millivolts in
pulse height, is processed by silicon integrated circuit electronics and provides an energy spectrum in terms of 
number of particles versus energy. It is often displayed as particles versus channel number as the energy scale 
is divided into channels which must be calibrated to give the energy scale. The calibration between the 
measured particle energy and the channel number is independent of the ion energy and sample analysed and 
only depends on the semiconductor detector and associated electronics response to the energy of the ion 
beams. 
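
The channel-to-energy calibration described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: it assumes that the detector and electronics respond linearly (an assumption, not stated in the text), and the two reference peaks, their channel numbers and the function names are hypothetical.

```python
def two_point_calibration(ch_a, e_a_keV, ch_b, e_b_keV):
    """Return (gain in keV/channel, offset in keV), assuming a linear response."""
    if ch_a == ch_b:
        raise ValueError("reference channels must differ")
    gain = (e_b_keV - e_a_keV) / (ch_b - ch_a)
    offset = e_a_keV - gain * ch_a
    return gain, offset


def channel_to_energy(channel, gain, offset):
    """Convert a channel number to energy in keV with a linear calibration."""
    return gain * channel + offset


# Hypothetical reference peaks: (channel, known particle energy in keV).
gain, offset = two_point_calibration(225, 1125.0, 344, 1720.0)
energy_300 = channel_to_energy(300, gain, offset)
print(gain, offset, energy_300)
```

Because the calibration depends only on the detector and its electronics, the same gain and offset can be reused for other beam energies and samples.
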

The vacuum requirements in the target chamber are relatively modest (10⁻⁴ Pa) and are comparable to those
in the accelerator beam lines. All that is required is that the ion beam does not lose energy on its path to the 
sample and that there is minimal deposition of contaminants and hydrocarbons on the surface during analysis. 


B1.24.2.2 KINEMATICS

In ion beam analysis the incident particle penetrates into the silicon undergoing inelastic collisions, 
predominantly with target electrons, and loses energy as it penetrates to the end of its range. The range of 2.5 
MeV helium ions is about 10 microns in silicon; the range of comparable energy protons is about ten times 
that of the helium ions (the range of 3 MeV hydrogen is about 100 microns in silicon). During the penetration
of the helium ions, a small fraction undergoes elastic collisions with target atoms to give the backscattering
signal.


Figure B1.24.2 is a schematic representation of the geometry of an elastic collision between a projectile of
mass M_1 and energy E_0 and a target atom of mass M_2 initially at rest. After the collision the incident ion is
scattered back through an angle θ and emerges from the sample with an energy E_1. The target atom after the
collision has a recoil energy E_2. There is no change in target mass, because nuclear reactions are not involved
and energies are non-relativistic.

Figure B1.24.2. A schematic representation of an elastic collision between a particle of mass M_1 and energy
E_0 and a target atom of mass M_2. After the collision the projectile and target atoms have energies of E_1 and
E_2 respectively. The angles θ and φ are positive as shown. All quantities refer to the laboratory frame of
reference.

The ratio of the projectile energies for M_1 < M_2 is given by

\[ \frac{E_1}{E_0} = \left[ \frac{(M_2^2 - M_1^2 \sin^2\theta)^{1/2} + M_1 \cos\theta}{M_1 + M_2} \right]^2 \tag{B1.24.1} \]

The energy ratio, called the kinematic factor K = E_1/E_0, shows that the energy after scattering is determined
by the masses of the incident particle and target atom and the scattering angle θ. For direct backscattering
through 180° the energy ratio has its lowest value, given by

\[ \frac{E_1}{E_0} = \left( \frac{M_2 - M_1}{M_2 + M_1} \right)^2 \tag{B1.24.2} \]

For incident helium ions (M_1 = 4) at E_0 = 2.0 MeV the energy E_1 of the backscattered particle for silicon
(M_2 = 28) is 1.12 MeV and for palladium (M_2 = 106) the energy is 1.72 MeV.
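
The silicon and palladium values quoted above can be checked with a short Python sketch of the kinematic factor for elastic scattering in the laboratory frame with M_1 < M_2. The function name is illustrative, and integer mass numbers are used as in the text.

```python
import math


def kinematic_factor(m1, m2, theta_deg=180.0):
    """K = E1/E0 for a projectile of mass m1 scattered through the lab angle
    theta by a target atom of mass m2 initially at rest (requires m1 < m2)."""
    theta = math.radians(theta_deg)
    root = math.sqrt(m2 ** 2 - (m1 * math.sin(theta)) ** 2)
    return ((root + m1 * math.cos(theta)) / (m1 + m2)) ** 2


e0_mev = 2.0                                  # incident helium energy
e1_si = kinematic_factor(4, 28) * e0_mev      # helium backscattered from Si
e1_pd = kinematic_factor(4, 106) * e0_mev     # helium backscattered from Pd
print(round(e1_si, 3), round(e1_pd, 3))
```

Passing theta_deg=170 instead of the default gives the factor for the typical detector angle mentioned in section B1.24.2.1.
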

The energy E_2 transferred to the target atom is given in general by

\[ \frac{E_2}{E_0} = \frac{4 M_1 M_2}{(M_1 + M_2)^2} \cos^2\phi \tag{B1.24.3} \]

where φ is the recoil angle of the target atom, and at θ = 180° the energy E_2 transferred to the target atom
has its maximum value, given by

\[ \frac{E_2}{E_0} = \frac{4 M_1 M_2}{(M_1 + M_2)^2} \tag{B1.24.4} \]

In collisions where M_1 = M_2 at θ = 180° the incident particle is at rest after the collision, with all the energy
transferred to the target atom. For 2.0 MeV helium ions colliding with silicon the recoil energy E_2 is 0.88
MeV and from palladium is 0.28 MeV.
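
As a cross-check on these recoil energies, the following Python sketch evaluates the energy fraction transferred to the target atom. It assumes the standard elastic-collision relation E_2/E_0 = 4 M_1 M_2 cos²φ / (M_1 + M_2)², with φ the recoil angle (φ = 0 corresponds to direct backscattering of the projectile); the function name is illustrative.

```python
import math


def recoil_fraction(m1, m2, phi_deg=0.0):
    """Fraction E2/E0 of the projectile energy transferred to a target atom
    of mass m2 that recoils at the lab angle phi (phi = 0 is head-on)."""
    phi = math.radians(phi_deg)
    return 4.0 * m1 * m2 * math.cos(phi) ** 2 / (m1 + m2) ** 2


e0_mev = 2.0
e2_si = recoil_fraction(4, 28) * e0_mev    # recoil energy of Si
e2_pd = recoil_fraction(4, 106) * e0_mev   # recoil energy of Pd
equal_mass = recoil_fraction(4, 4)         # all energy transferred when M1 = M2
print(round(e2_si, 3), round(e2_pd, 3), equal_mass)
```

For a head-on collision the backscattered and recoil fractions sum to one, as energy conservation requires: 0.5625 + 0.4375 = 1 for helium on silicon.
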

The ability to identify different mass species depends on the energy resolution of the detector, which is
typically 15 keV full width at half maximum (FWHM). For example, silver has a mass M_2 = 108 and tin has a
mass M_2 = 119. The difference between K_Ag = 0.862 and K_Sn = 0.874 is 0.012. For 2 MeV helium ions the
difference in backscattering energy is 24 keV, which is well above the detector resolution,
indicating that signals from Ag and Sn on the surface can be resolved. The difference between gold and
tungsten K values is 0.005, and at 2 MeV energies one would not resolve the signals from gold and
tungsten. With Rutherford backscattering and conventional detectors with energy resolution of 15 keV one
can resolve the signals from and identify the elements of masses up to 100. One can resolve isotopes up to a
mass of around 60. For example, all the silicon isotopes can be identified.
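
The resolvability argument can be reproduced numerically. The sketch below compares the separation of surface signals at 180° with the 15 keV detector resolution; the rounded masses 197 for gold and 184 for tungsten are assumptions (the text gives only the difference in K), and the function names are illustrative.

```python
def k_180(m1, m2):
    """Kinematic factor for direct backscattering through 180 degrees."""
    return ((m2 - m1) / (m2 + m1)) ** 2


def edge_separation_keV(m1, m2_a, m2_b, e0_keV):
    """Energy separation of the surface signals from two target masses."""
    return abs(k_180(m1, m2_b) - k_180(m1, m2_a)) * e0_keV


def resolvable(separation_keV, fwhm_keV=15.0):
    """Simple criterion: the separation must exceed the detector FWHM."""
    return separation_keV > fwhm_keV


sep_ag_sn = edge_separation_keV(4, 108, 119, 2000.0)   # silver vs tin
sep_w_au = edge_separation_keV(4, 184, 197, 2000.0)    # tungsten vs gold (assumed masses)
print(round(sep_ag_sn, 1), resolvable(sep_ag_sn))
print(round(sep_w_au, 1), resolvable(sep_w_au))
```

Comparing the separation with the FWHM is only a rule of thumb; overlapping signals can sometimes still be separated by fitting.
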

B1.24.2.3 SCATTERING CROSS SECTION

The identity of target elements is established by the energy of the scattered particles after an elastic collision. 
The number of atoms per unit area, N_s, is found from the number Q_D of detected particles (called the yield, Y)
for a given number Q of particles incident on the target. The connection is given by the scattering cross
section σ(θ) by

\[ Y = Q_D = \sigma(\theta)\, \Omega\, Q\, N_s \tag{B1.24.5} \]

where Ω is the solid angle subtended by the detector.


This is shown schematically in figure B1.24.3. In the simplest approximation the scattering cross section σ is
given by

\[ \sigma(\theta) = \left( \frac{Z_1 Z_2 e^2}{4E} \right)^2 \frac{1}{\sin^4(\theta/2)} \tag{B1.24.6} \]

the scattering cross section originally derived by Rutherford, where Z_1 and Z_2 are the atomic numbers of the
projectile and target atom and E is the projectile energy. For 2 MeV helium ions incident on silver (Z_2 = 47)
at 180°, the cross section is 2.89 × 10⁻²⁴ cm² or 2.89 barns, where the barn = 10⁻²⁴ cm². The distance of
closest approach is about 7 × 10⁻⁴ Å, which is smaller than the K-shell radius of silver (10⁻² Å). This means
that the incident helium ion penetrates well within the innermost radius of the electrons so that one can use an
unscreened Coulomb potential for the scattering. The distance of closest approach is sufficiently large that
penetration into the nuclear core does not occur and one neglects nuclear reactions.
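
The numbers quoted for silver can be approximated with the short sketch below. It assumes the unscreened Rutherford formula in the laboratory frame without a centre-of-mass correction, σ(θ) = (Z_1 Z_2 e²/4E)² / sin⁴(θ/2), and the rounded constant e²/(4πε₀) ≈ 1.44 MeV fm; the function names are illustrative. With these assumptions the result comes out slightly below the 2.89 barns quoted in the text.

```python
import math

E2_MEV_FM = 1.44  # e^2/(4 pi eps0) in MeV fm (rounded value)


def rutherford_cross_section_barn(z1, z2, e_mev, theta_deg):
    """Unscreened Rutherford differential cross section in barns per steradian."""
    a_fm = z1 * z2 * E2_MEV_FM / (4.0 * e_mev)
    sigma_fm2 = a_fm ** 2 / math.sin(math.radians(theta_deg) / 2.0) ** 4
    return sigma_fm2 / 100.0   # 1 barn = 100 fm^2 = 1e-24 cm^2


def closest_approach_angstrom(z1, z2, e_mev):
    """Head-on distance of closest approach, d = Z1 Z2 e^2 / E, in angstroms."""
    return z1 * z2 * E2_MEV_FM / e_mev * 1.0e-5   # 1 fm = 1e-5 angstrom


sigma_ag = rutherford_cross_section_barn(2, 47, 2.0, 180.0)
d_ag = closest_approach_angstrom(2, 47, 2.0)
print(round(sigma_ag, 2), d_ag)
```
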

Figure B1.24.3. Layout of a scattering experiment. Only primary particles that are scattered within the solid
angle Ω spanned by the solid state detector are counted.


The cross sections are sufficiently large that one can detect sub-monolayers of most heavy mass elements on
silicon. For example, the yield of 2.0 MeV helium ions from 10¹⁴ cm⁻² silver atoms (approximately one-tenth
of a monolayer) is 800 counts for a current of 100 nanoamperes for 15 minutes and a detector solid angle of 5 msr.

This represents a large signal for a small number of atoms on the surface. With care, 10 monolayers of gold 
on silicon can be detected. 
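
The count estimate for silver follows from the yield relation Y = σ(θ) Ω Q N_s. The sketch below assumes singly charged helium ions when converting the beam current to a number of incident particles (an assumption; the charge state is not given in the text) and uses the cross section of 2.89 × 10⁻²⁴ cm² quoted earlier; the function names are illustrative.

```python
E_CHARGE_C = 1.602e-19  # elementary charge in coulombs


def incident_ions(current_a, time_s, charge_state=1):
    """Number of ions delivered by a beam current over a given time."""
    return current_a * time_s / (charge_state * E_CHARGE_C)


def rbs_yield(sigma_cm2, omega_sr, n_incident, areal_density_cm2):
    """Detected counts: Y = sigma * Omega * Q * N_s."""
    return sigma_cm2 * omega_sr * n_incident * areal_density_cm2


q_ions = incident_ions(100e-9, 15 * 60)            # 100 nA for 15 minutes
counts = rbs_yield(2.89e-24, 5e-3, q_ions, 1e14)   # 5 msr detector, 1e14 Ag atoms per cm^2
print(round(counts))
```

The result is close to the 800 counts stated above; a doubly charged beam at the same current would deliver half as many ions and halve the yield.
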


B1.24.2.4 DEPTH SCALE


Light ions such as helium lose energy through inelastic collisions with atomic electrons. In backscattering
spectrometry, where the elastic collision takes place at depth t below the surface, one considers the energy
loss along the inward path and on the outward path as shown in figure B1.24.4. The energy loss on the way in
is weighted by the kinematic factor and the total is

\[ \Delta E = \Delta t \left( K \left.\frac{dE}{dx}\right|_{\mathrm{in}} + \frac{1}{|\cos\theta|} \left.\frac{dE}{dx}\right|_{\mathrm{out}} \right) = \Delta t\,[S] \tag{B1.24.7} \]

where dE/dx is the rate of energy loss with distance and [S] is the energy loss factor. An example illustrating
the influence of depth on analysis is given in figure B1.24.5, which shows two thin gold layers on the front and
back of a nickel film. The scattering from gold at the surface is clearly separated from gold at the back layer.
The energy width between the gold signals is nearly equal to the energy width of the nickel signal.
This signal is nearly square shaped because nickel exists from the front to the back surface. The depth scales
are determined from energy loss values, which are given in tabular form as a function of energy [1, 2]. The
energy loss is often expressed as a stopping cross section in terms of (1/N) dE/dx, which gives values in
eV cm². The depth resolution is given by dividing the detector resolution by the energy loss factor. For 2 MeV
helium in silicon one might expect a depth resolution of about 200 Å for 180° scattering geometries. This can
be reduced to values of about 50 Å for glancing incident and exit angles. These values of depth resolution
degrade as the particle penetrates into the sample and energy straggling becomes a factor.
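
The rule that depth resolution equals detector resolution divided by the energy loss factor is a single division, sketched below. Read backwards, the quoted 15 keV and 200 Å imply an effective [S] of about 75 eV/Å for the 180° geometry, and 50 Å implies about 300 eV/Å at glancing angles. These are inferences from the figures in the text, not tabulated stopping data, and the function names are illustrative.

```python
def depth_resolution_angstrom(detector_fwhm_eV, loss_factor_eV_per_A):
    """Depth resolution = detector energy resolution / energy loss factor [S]."""
    return detector_fwhm_eV / loss_factor_eV_per_A


def implied_loss_factor(detector_fwhm_eV, depth_resolution_A):
    """Effective [S] implied by a quoted depth resolution (same relation, inverted)."""
    return detector_fwhm_eV / depth_resolution_A


s_normal = implied_loss_factor(15000.0, 200.0)    # 180 degree geometry
s_glancing = implied_loss_factor(15000.0, 50.0)   # glancing entrance and exit angles
print(s_normal, s_glancing, depth_resolution_angstrom(15000.0, s_normal))
```

The fourfold gain at glancing angles reflects the longer path the ion travels per unit depth, which raises the effective energy loss factor.
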

Figure B1.24.4. Energy loss components for a projectile that scatters from depth t. The particle loses energy
ΔE_in via inelastic collisions with electrons along the inward path. There is energy loss ΔE_s in the elastic
scattering process at depth t. There is energy lost to inelastic collisions ΔE_out along the outward path. For an
incident energy E_0 the energy of the exiting particle is E_1 = E_0 - ΔE_in - ΔE_s - ΔE_out.

Figure B1.24.5. Backscattering spectrum of a thin Ni film (950 Å) with near monolayers (≈30 x 10 at cm⁻²)
of Au on the front and back surfaces of the Ni film. The signals from the front and back layers of Au are
shown and are separated in energy from each other by nearly the same energy width as the Ni signal.

B1.24.2.5 SIMULATION

Rutherford backscattering spectra can be analysed with available simulation programs.
Programs such as RUMP and GISA provide a layer-by-layer signal for multielement targets [3, 4, 5]. These 
programs include detector resolution, energy straggling and individual isotopes, and can also be applied to 
forward recoil spectrometry for detection of light elements. These programs also include provisions for 
enhanced cross section for light elements such as carbon and oxygen. 

B1.24.2.6 SILICIDE FORMATION


An example of Rutherford backscattering spectrometry of the formation of PtSi is shown in figure B1.24.6 for
the case where part of the original Pt layer has reacted to form Pt₂Si. The backscattering signals at the high
energy end near 1.8 MeV represent Pt at the surface of the sample. The plateau extends downward in energy to 1.7
MeV where there is a step down to the signals from Pt in Pt₂Si. In the Pt signal the contribution from Pt₂Si is
shown shaded. In the Si portion of the spectrum the signal steps upward around 1.0 MeV and represents the
silicon in the Pt₂Si. The second step represents the Si signal from the Si substrate. In this case, the signals
from the unreacted Pt, the Pt and Si in Pt₂Si and the Si in the substrate are clearly identified and can be used
to specify the thickness and composition of the silicide layer.

Figure B1.24.6. Backscattering spectrum of a layer of Pt on Si that has been heated so that
approximately half the Pt has been consumed in the formation of Pt₂Si, the first stage in the reaction of Pt
with Si. In the spectrum the signals from Si have been multiplied by a factor of three for visibility, because the
atomic number of Si (14) is much less than that of Pt (78). The signals from Pt are in the
region of 1.6-1.8 MeV. The higher energy corresponds to scattering from unreacted Pt and the step around 1.7
MeV corresponds to the transition from Pt to Pt₂Si. The signal at 1.6 MeV corresponds to the back interface
of the Pt₂Si in contact with silicon. The silicon signal in the energy range from about 0.9-1.0 MeV
corresponds to the Si in the Pt₂Si. At lower energies the spectrum represents signals from the Si substrate.
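
The need to magnify the Si signal can be put in context by the scaling of the Rutherford cross section with the square of the target atomic number. As a rough worked estimate at equal energy and scattering angle, ignoring differences in stopping and in the energy at which scattering occurs (simplifying assumptions):

```latex
\frac{\sigma_{\mathrm{Pt}}}{\sigma_{\mathrm{Si}}}
  \approx \left( \frac{Z_{\mathrm{Pt}}}{Z_{\mathrm{Si}}} \right)^{2}
  = \left( \frac{78}{14} \right)^{2}
  \approx 31
```

Per atom, Pt therefore scatters roughly thirty times more strongly than Si, so the factor of three applied for display does not equalize the signals; it only makes the Si steps easier to see.
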


B1.24.3 IN SITU REAL-TIME RBS 

The essentially non-destructive nature of Rutherford backscattering spectrometry, combined with its
ability to provide both compositional and depth information, makes it an ideal analysis tool to study thin-film, 
solid-state reactions. In particular, the non-destructive nature allows one to perform in situ RBS, thereby 
characterizing both the composition and thickness of formed layers, without damaging the sample. Since only 
about two minutes of irradiation is needed to acquire a Rutherford backscattering spectrum, this may be done 
continuously to provide a real-time analysis of the reaction [6]. 

There are two main applications for such real-time analysis. The first is the determination of the chemical 
reaction kinetics. When the sample temperature is ramped linearly with time, the data of thickness of formed 
phase together with ramped temperature allows calculation of the complete reaction kinetics (that is, both the 
activation energy and the pre-exponential factor) from a single sample [6], instead of having to perform many 
different temperature ramps as is the usual case in differential thermal analysis [7, 8, 9, 10 and 11]. The
second application is in determining the contribution that each of the elements in the reaction couple makes
to the overall atomic transport across the forming layer. For this purpose, thin, inert markers (analogous to
the thin wires used by Kirkendall [12, 13]) are inserted into the layers to establish a reference frame within
which to measure the contribution of each element's flux to the overall growth. Without the use of a real-time
analysis technique, one must rely on the use of different samples, which, although nominally identical, do not
necessarily behave identically since many of these reactions depend critically on the exact conditions at the
interfaces between the layers. On the other hand, if analysis can be performed on a single sample, small
changes in the position of the marker can then confidently be interpreted. Examples of these two applications
are presented below.

B1.24.3.1 Pt-Si

When a thin (about 3000 Å) layer of Pt is deposited onto a Si wafer and then heated, the first phase that forms
at the interface between Pt and Si is Pt₂Si. After all the Pt has been consumed, the newly formed Pt₂Si layer
reacts with the Si to form PtSi, which is stable in contact with excess Si. No further reaction is observed.
Figure B1.24.7 shows the progress of this reaction as the temperature is ramped linearly at a rate of
1 °C min⁻¹. At time zero, the signal between 1.5 and 1.9 MeV is from the unreacted Pt layer, whereas the signal
from the Si wafer appears below about 0.9 MeV. The signal from the Si has been magnified by a factor of three to
compensate for the differences in cross sections between Pt (Z = 78) and Si (Z = 14).

Figure B1.24.7. A three-dimensional plot of backscattering signal versus time for a Pt film deposited onto Si
and heated at a rate of 1 °C min⁻¹. At time zero, the Pt signal shows a square-topped energy distribution. As
time progresses and the sample is heated, a step appears in the Pt signal, indicating the formation of the first
phase Pt₂Si. At longer times a second step appears, indicating the formation of the second phase PtSi after all
the Pt has been consumed. The energy widths of the Pt signals give the thickness of the formed phases. The
heights of the Pt signals, relative to those from Si, give the composition of the phases.

After 60 minutes of annealing, all the Pt has reacted to form Pt₂Si. Almost immediately thereafter the reaction
between Pt₂Si and Si to form PtSi starts and after a further 60 minutes all the Pt₂Si has reacted, resulting in a
stable PtSi film on Si. The data of silicide thickness versus ramped temperature can be plotted in reduced form
in an Arrhenius-like plot to give the activation energy [6, 14].
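
The idea of an Arrhenius-like plot can be illustrated with a generic least-squares fit of ln(rate) against 1/(k_B T). This is only a sketch under stated assumptions: the text does not specify the reduced form used in [6, 14], so the "rate" here stands for whatever growth-rate quantity is extracted from the thickness data (for example x dx/dt for diffusion-limited growth, which is an assumption), and the data are synthetic rather than measured.

```python
import math

K_B_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_fit(temps_k, rates):
    """Fit ln(rate) = ln(prefactor) - Ea/(k_B T) by linear least squares.
    Returns (activation energy in eV, pre-exponential factor)."""
    xs = [1.0 / (K_B_EV_PER_K * t) for t in temps_k]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -slope, math.exp(intercept)


# Synthetic data generated from assumed parameters (not measured values).
true_ea_ev, true_prefactor = 1.5, 1.0e12
temps = [500.0 + 10.0 * i for i in range(10)]   # temperatures along the ramp, in K
rates = [true_prefactor * math.exp(-true_ea_ev / (K_B_EV_PER_K * t)) for t in temps]

ea_ev, prefactor = arrhenius_fit(temps, rates)
print(ea_ev, prefactor)
```

Because a single linear ramp sweeps through many temperatures, one sample supplies all the points for the fit, which is the advantage the text attributes to the real-time method.
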

B1.24.3.2 Pd₂Si ON CrSi₂

When a thin film structure of Si(100)/Pd/Cr (see figure B1.24.8(a)) is heated to 300 °C, the Pd quickly reacts
with the Si to form Pd₂Si (b). Upon further heating the Cr reacts with the Si to form CrSi₂ on top of the Pd₂Si.
The required silicon can either be supplied directly by the diffusion of Si atoms from the crystalline substrate
(c) or by Pd₂Si dissociation followed by Pd diffusion (d). The motion of a thin Ta marker embedded in the
Pd₂Si layer is used to distinguish between these two mechanisms. In (c) there is no movement of the marker
relative to the Pd₂Si layer, while in (d) the marker moves towards the Pd₂Si/CrSi₂ interface.

Figure B1.24.8. Schematic diagram of the reaction of Pd/Cr layers on (100) Si with a Ta marker placed inside
the Pd layer. When the sample (a) is heated to 300°C, the Pd reacts with the Si to form Pd₂Si (b). Upon
further heating Cr reacts with Si to form CrSi₂ on top of Pd₂Si. The required Si can either be supplied directly
by the diffusion of Si atoms from the crystalline substrate (c) or by Pd₂Si dissociation followed by Pd
diffusion (d). The motion of a thin Ta marker, embedded in the Pd₂Si layer, is used to distinguish between
these two mechanisms. In (c) there is no movement of the marker relative to the Pd₂Si layer, while in (d) the
marker moves towards the Pd₂Si/CrSi₂ interface.

In figure B1.24.9 the in situ, real-time RBS spectrum of the formation of CrSi₂ on Pd₂Si at 425°C is shown.
The Ta marker embedded in the Pd₂Si layer shifts to lower energies during CrSi₂ formation, in agreement with
the prediction for the case of Si diffusion (c). In the figure, the element from which backscattering has taken
place has been underlined [14].

Figure B1.24.9. In situ, real-time backscattering spectrum of the formation of CrSi₂ on Pd₂Si at 425°C. The
structure is shown in the inset. The elements underlined represent the origin of the signal; the Si signal in
CrSi₂ is around channel 260 and the Cr signal from CrSi₂ appears around channel 350. The Ta marker
embedded in the Pd₂Si layer shifts to lower energy during CrSi₂ formation, indicating that Si diffusion through
the Pd₂Si layer has occurred as indicated in diagram (c) of figure B1.24.8.


B1.24.4 CHANNELLING 

If the sample is mounted on a goniometer so that the crystal axes or planes of a single-crystal sample, such as
silicon, are aligned with the incident beam to within about 0.1 to 0.5 degrees, the crystal lattice can steer the
trajectories of incident ions penetrating the crystal [15, 16]. This steering of the incident energetic beam is
known as 'channelling' as the atomic rows and planes are guides that steer the energetic ions along the channels
between rows and planes. The channelled ions do not closely approach the lattice atoms, with the result that the
backscattering yield can be reduced 100-fold (the spectrum obtained in this geometry is called an aligned
spectrum; the spectrum obtained when the incident ions are misaligned from the lattice rows and planes is
called a random spectrum). Channelling measurements can determine the amount of lattice
disorder in which displaced atoms are located within the channels and hence accessible to backscattering 
collisions with the channelled ions. Channelling can also be used to measure the number of impurity atoms 
located sufficiently far from substitutional lattice sites that they are accessible to backscattering from the 
channelled ions. 

Channelling phenomena were studied before Rutherford backscattering was developed as a routine analytical 
tool. Channelling phenomena are also important in ion implantation, where the incident ions can be steered 
along the lattice planes and rows. Channelling leads to a deep penetration of the incident ions to depths beyond
those found in the normal, near-Gaussian depth distributions characteristic of non-channelled energetic ions.
Even today, implanted channelled ions are of concern when one attempts to form shallow junctions in ion
implantation of integrated circuit structures. Channelling effects can be overcome if the silicon crystal is
amorphized by a prior implantation of silicon or germanium atoms.

Figure B1.24.10 shows schematically a random and an aligned spectrum for MeV helium ions incident on
silicon. The aligned spectrum is characterized by a peak at the high energy end of the spectrum. The peak
represents ions scattered from the outermost layer of atoms directly exposed to the incident beam. This peak is
called the 'surface peak'. Behind the surface peak, at lower energies, the aligned spectrum drops to a value of
1/40th of the silicon random spectrum, indicating that nearly 98% of the incident ions are channelled and do
not make close impact collisions with the lattice atoms. The rise in the aligned spectrum at lower energies
represents ions that have been deflected out of the channels and can then undergo close impact
collisions with the lattice atoms and hence directly contribute to the backscattering spectra.
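
The figure of nearly 98% follows directly from the ratio of aligned to random yield. Writing this ratio as the minimum yield χ_min (a conventional symbol that is not used in the text, introduced here only for illustration):

```latex
\chi_{\min} = \frac{Y_{\mathrm{aligned}}}{Y_{\mathrm{random}}} \approx \frac{1}{40} = 0.025,
\qquad
1 - \chi_{\min} \approx 0.975
```

so about 97.5% of the incident ions are channelled just behind the surface peak.
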

Figure B1.24.10. Random and aligned (channelled) backscattering spectrum from a single crystal sample of
silicon. The aligned spectrum has a peak at the high energy end of the Si signal. This peak represents helium
ions scattered from the outer layers of Si that are exposed to the incident beam. The yield behind the peak is
1/40th of the random yield because the Si atoms are shielded from close encounter elastic collisions with the
ion beam that is channelled along the axial rows of the Si crystal.

The application of channelling to Rutherford backscattering spectrometry is used to determine the amount of 
damage in ion-implanted crystal and the lattice location of ion-implanted dopant atoms. One of the important 
contributions of channelling to integrated circuit technology is the analysis of amorphous layer formation 
during ion implantation and its subsequent reanneal at temperatures near 550°C, approximately half the 
melting temperature of silicon (1414°C). Figure B1.24.11 shows a channelling spectrum in a silicon sample,
where the outer 4200 Å of the silicon were converted into an amorphous layer by implantation of silicon
atoms at liquid nitrogen temperatures [17]. In the spectrum of the as-implanted sample, marked '0 minutes',
the yield of the silicon spectrum matches that of the random spectrum at energies of around 1 to 1.1 MeV. This
shows that the implanted amorphous layer has atoms that are displaced from the underlying single crystal
silicon. The silicon signal at 0 minutes shows a decrease at around 0.9 MeV. This decrease represents the
fraction of channelled ions in the silicon lattice. The yield does not drop to the non-implanted level because
the incident helium ions suffer multiple collisions penetrating through the amorphous layer and their angular
distribution is broadened well beyond the critical angle for channelling. The critical angle for channelling of 1
MeV helium ions along the (100) axis of silicon at room temperature is 0.63 degrees. As the sample is
thermally annealed at temperatures above 500°C, the amorphous layer reorders epitaxially on the silicon
substrate. The rear edge of the amorphous spectrum moves towards the surface such that after 30 minutes half
of the layer has recrystallized. The yield from the single-crystal silicon behind the implanted layer decreases
since fewer of the incident ions suffer multiple collisions sufficient to make their angular distribution exceed
the critical angle. Finally, after 60 minutes of annealing, almost all the implanted layer is recrystallized
and one is left with a surface peak slightly greater than that in the non-implanted case.

Figure B1.24.11. The backscattering yield from an Si sample that has been implanted with Si atoms to form
an amorphous layer. Upon annealing this amorphous layer recrystallizes epitaxially, leading to a shift in the
amorphous/single-crystal interface towards the surface. The aligned spectra have a step between the
amorphous and crystal substrate which shifts towards the surface as the amorphous layer epitaxially
recrystallizes on the Si.

Adding channelling to the battery of MeV ion beam analysis techniques requires only a goniometer.
It is not as commonly used as conventional backscattering measurements because the lattice
location of implanted atoms and the annealing characteristics of ion-implanted materials are now reasonably
well established [18]. Channelling is used to analyse epitaxial layers, but even then transmission electron
microscopy is used to characterize the defects.

B1.24.5 RESONANCES

At 2 MeV energies the incident helium ion does not penetrate through the barrier around the nucleus. At 
higher energies and for lighter target atoms such as carbon, nitrogen and oxygen, the helium ion can penetrate 
and resonances in the cross sections lead to enhanced backscattering yields. This allows one to investigate 
these target atoms within silicon and even higher mass substrates. 

An example of the oxygen resonance cross section is shown in figure B1.24.12, which displays the cross
section versus energy [19]. The resonance that occurs at 3.04 MeV shows a strong peak. This results in a peak
in the backscattering spectra, as shown in figure B1.24.13 for 3.05 MeV He incident on an LaCaMnO₃ film
on an LaAlO₃ substrate. In the analysis one increases the energy of the beam to move the resonance to
increasing depths.
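
The way the resonance moves deeper as the beam energy is raised can be sketched as follows: the ion must lose the excess energy E_0 - E_res on its inward path before it reaches the resonance energy. The sketch assumes a constant stopping power over this path, and the value of 40 eV/Å is a placeholder chosen for illustration, not a tabulated value for any material; the function name is illustrative.

```python
def resonance_depth_angstrom(e0_keV, e_res_keV, dedx_eV_per_A):
    """Depth at which the beam has slowed to the resonance energy,
    assuming a constant stopping power along the inward path."""
    if e0_keV < e_res_keV:
        raise ValueError("beam energy is below the resonance energy")
    return (e0_keV - e_res_keV) * 1000.0 / dedx_eV_per_A


PLACEHOLDER_DEDX = 40.0   # eV per angstrom, illustrative assumption only

depth_3050 = resonance_depth_angstrom(3050.0, 3040.0, PLACEHOLDER_DEDX)
depth_3100 = resonance_depth_angstrom(3100.0, 3040.0, PLACEHOLDER_DEDX)
print(depth_3050, depth_3100)
```

Stepping the beam energy upward in this way scans the enhanced oxygen sensitivity through the film, which is how a resonance is used for depth profiling.
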

Carbon also has a resonance in its cross section leading to a 100-fold increase in the backscattering signal. 
This resonance has been very convenient for analysing 1% carbon in silicon-germanium films. The resonance 
for nitrogen is not as pronounced and has not been used extensively.

Figure B1.24.12. Elastic cross section of helium ions scattered from oxygen atoms. The pronounced peak in
the spectrum around 3.04 MeV represents the resonance scattering cross section that is often used in detection
of oxygen.

Figure B1.24.13. A thin film of LaCaMnO₃ on an LaAlO₃ substrate is characterized for oxygen content with
3.05 MeV helium ions. The sharp peak in the backscattering signal at channel 160 is due to the resonance in
the scattering cross section for oxygen. The solid line is a simulation that includes the resonance scattering
cross section and was obtained with RUMP [3]. Data from E B Nyeanchi, National Accelerator Centre, Faure,
South Africa.


B1.24.6 PARTICLE-INDUCED X-RAY EMISSION (PIXE) 

The PIXE method [20] is based on the spectrometry of the radiation released during the filling of vacancies of
inner atomic levels. These vacancies are produced by bombarding a sample with energetic (a few MeV) ions
that are normally derived from a high-voltage accelerator. The binding energies of the electrons in the outer
layers of the electron shell of an atom are of the order of eV and radiation produced from the rearrangement of 
electrons in these levels will be in the region of visible wavelength. On the other hand, the binding energies of 
the inner levels are of the order of keV and radiation produced from processes involving these levels will be in 
the x-ray region. More importantly, as the electron energy levels of each element are quantized and unique, 
the measurement of the x-ray energy offers the possibility of determining the presence of a specific element in 
the sample. Furthermore, the x-ray intensity of a specific energy is proportional to the concentration of the 
corresponding element in the sample. 

The relative simplicity of the method and the penetrative nature of the x-rays yield a technique that is
sensitive to elements with Z > 10 down to a few parts per million (ppm) and can be performed quantitatively
from first principles. The databases for PIXE analysis programs [21, 22 and 23] are typically so well
developed as to include accurate fundamental parameters, allowing the absolute precision of the technique to
be around 3% for major elements and 10-20% for trace elements. A major factor in applying the PIXE
technique is that the bombarding energy of the projectiles is a few orders of magnitude more than that of the
binding energies of the electrons in the atom and, as the x-rays are produced from the innermost levels, no
chemical information is obtained in the process. The advantage of this is that the technique is also not matrix
dependent and offers quantitative information regardless of the chemical states of the atoms in the sample.
The major application of the technique is the determination of trace element concentrations and, due to the
accuracy and non-destructive nature of the technique [24], there are few other techniques that can compete.

The PIXE technique is described schematically in figure B1.24.14. A beam of energetic ions (normally
protons of around 3 MeV) is used to eject inner-shell electrons from atoms in a sample. This unstable
condition of the atom cannot be maintained and these vacancies are filled by outer-shell electrons. This means
that the electrons make a transition in energy in moving from one level to another, and this energy can be
released in the form of characteristic x-rays, the energies of which identify the atom. In a competing process,
called Auger electron emission, this energy can also be transferred to another electron that is ejected from
the atom and can be detected by an electron detector. Therefore, the step from ionization to x-ray production
is not 100% efficient. The x-ray production efficiency is called the fluorescence yield and must be included in
the database for quantitative measurements. The x-rays that are emitted from the sample are measured using 
an energy dispersive detector that has a typical energy resolution around 2.5% (150 eV at 6 keV). 
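
The link between a measured x-ray energy and the emitting element can be illustrated with Moseley's law for K-alpha lines, E ≈ (3/4)(13.6 eV)(Z - 1)². This approximation is not part of the text and is swapped in here purely for illustration; quantitative PIXE analysis relies on tabulated line energies in the databases mentioned above. The function names are illustrative, and the test energies are the well-known K-alpha lines of iron (about 6404 eV) and calcium (about 3691 eV).

```python
import math

RYDBERG_EV = 13.606  # hydrogen ground-state binding energy in eV


def k_alpha_energy_eV(z):
    """Approximate K-alpha energy from Moseley's law (screening constant of 1)."""
    return 0.75 * RYDBERG_EV * (z - 1) ** 2


def z_from_k_alpha(energy_eV):
    """Estimate the atomic number from a measured K-alpha energy."""
    return round(1 + math.sqrt(energy_eV / (0.75 * RYDBERG_EV)))


z_fe = z_from_k_alpha(6404.0)   # iron K-alpha
z_ca = z_from_k_alpha(3691.0)   # calcium K-alpha
neighbour_gap = k_alpha_energy_eV(27) - k_alpha_energy_eV(26)   # Co minus Fe
print(z_fe, z_ca, round(neighbour_gap))
```

Near 6 keV the K-alpha lines of neighbouring elements are several hundred eV apart in this approximation, well above the 150 eV detector resolution quoted above, which is why adjacent elements can be distinguished.
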




Figure B1.24.14. A schematic diagram of x-ray generation by energetic particle excitation. (a) A beam of
energetic ions is used to eject inner-shell electrons from atoms in a sample. (b) These vacancies are filled by
outer-shell electrons and the electrons make a transition in energy in moving from one level to another; this
energy is released in the form of characteristic x-rays, the energy of which identifies that particular atom. The
x-rays that are emitted from the sample are measured with an energy dispersive detector.

By convention, the transitions filling vacancies in the innermost shell are called K x-rays, those filling the
next shell are L x-rays, etc. The energies of L x-rays are normally much lower than those of K x-rays, and
similarly M x-rays have much lower energies than L x-rays. Due to the structure of the electron shells, there
are naturally more possible transitions yielding L x-rays and even more possibilities of yielding M x-rays;
therefore it becomes more complex to measure the higher-order x-rays. Typically, the analytical method is
limited to K, L and M x-rays. The restriction to elements with Z > 10 is due to the low energies of x-rays
from the light elements, which are absorbed before reaching the detector. The high yield of low-energy x-rays
that originate from the major elements of a sample can be eliminated by a filter in front of the detector.
Although the stopping of the bombarding ion is depth dependent, the measured x-ray energy gives no direct
indication of the depth at which it was produced, and therefore the technique does not provide depth
distribution information.

Typically, PIXE measurements are performed in a vacuum of around 10⁻⁴ Pa, although they can be performed
in air with some limitations. Ion currents needed are typically a few nanoamperes and current is normally not 
a limiting factor in applying the technique with a particle accelerator. This beam current also normally leads 
to no significant damage to samples in the process of analysis, offering a non-destructive analytical method 
sensitive to trace element concentration levels. 

An example of a PIXE spectrum is shown in figure B1.24.15. This spectrum was obtained from the analysis of
a piece of ivory to establish whether its source could be determined from trace element concentrations [25].
The spectrum shows the contributions from the different elements, including the high Ca yield originating
from the Ca-rich matrix of the ivory. In this case an 80 µm Al filter was used to filter out most of the x-rays from
Ca, as they tend to dominate the spectrum. As most interest was focused on the higher-energy part of the
spectrum (higher-energy x-rays are typically absorbed less than low-energy x-rays in the
same filter), this enabled better sensitivities for the heavier elements to be obtained. To maximize the
sensitivity and statistical accuracy, the yields from all the K- or L-shell x-rays from an element are used
together to determine the concentration for each element [21]. As can be seen in the figure, the x-ray peaks are
situated on a continuous background due to bremsstrahlung of the projectiles and secondary electrons and,
typically, PIXE software programs perform non-linear iterative procedures to obtain accurate information on
peaks and this background [26].

Figure B1.24.15. An example of a PIXE spectrum. This spectrum was obtained from the analysis of a piece
of ivory to establish whether the origin of the ivory could be determined from trace element concentrations.
The spectrum shows the contributions from the different elements, including the Ca yield originating from
the Ca-rich matrix of the ivory. In this case an 80 µm Al filter was used to filter out most of the x-rays from Ca, as
they tend to dominate the spectrum. As interest was focused on the higher-energy part of the spectrum
(higher-energy x-rays are typically absorbed less than low-energy x-rays in the same filter), this
enabled better sensitivities for the heavier elements to be obtained. The x-ray peaks are situated on a
continuous background of bremsstrahlung from the projectiles and secondary electrons and, typically, PIXE
software programs perform non-linear iterative procedures to obtain accurate information on peaks and this
background.


B1.24.7 NUCLEAR MICROPROBE (NMP) 

A nuclear microprobe employs a focused beam of energetic ions to provide information on the spatial distribution of
elements at concentration levels that range from major elements to a few parts per million [27]. The
techniques that provide depth information and elemental composition can all be
used in exactly the same way with a focused beam; lateral information is simply obtained simultaneously.

The basis of the nuclear microprobe (NMP) is a source of energetic ions from a particle accelerator. These
ions are then focused using either magnetic or electrostatic lenses to a minimum spot size of less than 1 µm.
The technology of focusing ions is still at the stage where the resolution of the NMP does not compete with
the electron microprobe. A set of deflection plates allows movement or scanning of the ion beam and, using
computer control, the position of the bombarding beam is known. At each point of irradiation, analytical data
are acquired. In this way, the analytical information obtained can be presented as an image. Naturally, the
imaging capabilities of the NMP are limited by the sensitivity of each technique used, since an entire spectrum
must be collected at each 'pixel' in the image. For example, a 128 x 128 pixel image is equivalent to 16 384
single-point analyses. For this reason the analytical techniques used in imaging are mostly limited to PIXE in
the case of trace elements, RBS and forward recoil spectrometry (FRS) for depth information and light
elements, and nuclear reaction analysis (NRA) for the detection of elements at high concentrations. Naturally
the NMP can also be used in a point analysis mode, as for example in the case of geological applications [28].
Here the grain sizes that need to be analysed are often of the order of a few microns and a reasonably small
bombarding beam is necessary to limit the analysis to a specific grain.
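
The cost of imaging can be made concrete by turning the pixel count quoted above into a rough beam-time estimate. The sketch below is illustrative only: the function name and the dwell time per pixel are assumptions, not values taken from the text.

```python
def map_acquisition(pixels_x: int, pixels_y: int, dwell_s: float) -> tuple[int, float]:
    """Return (number of single-point analyses, total beam time in seconds)."""
    n_points = pixels_x * pixels_y
    return n_points, n_points * dwell_s


# 128 x 128 map as in the text; 0.1 s per pixel is a hypothetical dwell time.
n_points, total_s = map_acquisition(128, 128, dwell_s=0.1)
print(f"{n_points} spectra, about {total_s / 60:.0f} min of beam time")
```

Because a full spectrum with usable statistics is needed at every pixel, the dwell time, and hence the total time, scales inversely with the sensitivity of the technique chosen.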

An example of the application of the PIXE technique using the NMP in the imaging mode [29] is shown in 
figure B1.24.16. The figures show images of the cross section through a root of the Phaseolus vulgaris L.
plant. In this case the material was sectioned, freeze-dried and mounted in vacuum for analysis. The scales on 
the right hand sides of the figures indicate the concentrations of the elements presented in ppm by weight. 
From the figures it is clear that the transports of the elements through the root are very different not only in 
the cases of the major elements Ca and K, but also in the case of the trace element Zn. These observations 
offer a wealth of information that is useful to a botanist studying dynamic processes in plant material. 

The quantitative imaging capability of the NMP is one of the major strengths of the technique. The advanced 
state of the databases available for PIXE [21, 22 and 23] allows also for the analysis of layered samples as, for 
example, in studying non-destructively the elemental composition of fluid inclusions in geological samples. 

The application of RBS is mostly limited to materials applications, where concentrations of elements are
fairly high. RBS is particularly well suited to the study of thin film structures. The NMP is useful in studying
lateral inhomogeneities in these layers [30] as, for example, in cases where the solid state reaction of elements
in the surface layers occurs at specific locations on the surfaces. Other aspects, such as lateral diffusion, can
also be studied in three dimensions.



Figure B1.24.16. An example of the application of the PIXE technique using the NMP in the imaging mode.
The figures show images of the cross section through a root of the Phaseolus vulgaris L. plant. In this case the
material was sectioned, freeze-dried and mounted in vacuum for analysis. The scales on the right of the
figures indicate the concentrations of the elements in ppm by weight. It is clear that the transports of the
elements through the root are very different, not only in the cases of the major elements Ca and K, but also in
the case of the trace element Zn.

There are some special techniques that can be used with the NMP, notably scanning transmission ion
microscopy (STIM). In this case the bombarding ion beam is allowed to penetrate through a thin sample and
the energies of the transmitted ions are measured. As the energy loss of the ions through the sample is directly
proportional to the amount of material traversed, the sample 'thickness' can be imaged with very high
efficiency. The technique is so efficient because every ion is counted. Beam currents of only a few fA are
needed, thus permitting an imaging resolution of about 100 nm. The technique is well suited for the study of
biological material where cell structure is easily identifiable due to the thickness differences in different parts
of cells. An example of STIM measurements of human oral cancer cells is shown in figure B1.24.17. The
different images indicate areas of different thickness, from thin to thick. The technique offers a
'thickness' scan through the sample and, in this case, the cell walls of one specific cell can be seen in the areas
dominated by thicker structures. There is relatively little material in the inner areas of the cell.

Figure B1.24.17. An example of scanning transmission ion microscopy (STIM) measurements of a human
oral cancer cell. The different images indicate different windows in the energy of transmitted helium ions as
indicated in the figure. White indicates areas of high counts. The technique offers a 'thickness' scan through
the sample, and, in this case, the cell walls of one specific cell can be seen in the areas dominated by thicker
structures (data from C A Pineda, National Accelerator Centre, Faure, South Africa).
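
The thickness contrast in STIM follows from the proportionality between energy loss and material traversed described above. A minimal sketch of that conversion, assuming a stopping power that stays constant over the small energy loss; the function name and all numerical values are hypothetical and serve only to illustrate the arithmetic:

```python
def areal_density(e_in_kev: float, e_out_kev: float, stopping_kev_per_ug_cm2: float) -> float:
    """Areal density (ug/cm^2) traversed by an ion, assuming constant stopping power."""
    if stopping_kev_per_ug_cm2 <= 0:
        raise ValueError("stopping power must be positive")
    if e_out_kev > e_in_kev:
        raise ValueError("transmitted energy cannot exceed incident energy")
    return (e_in_kev - e_out_kev) / stopping_kev_per_ug_cm2


# Hypothetical pixel: 2000 keV ions emerge with 1900 keV;
# the stopping power of 0.5 keV per ug/cm^2 is an assumed value.
print(areal_density(2000.0, 1900.0, 0.5))
```

Sorting the transmitted ions into energy windows, as in the figure, is equivalent to sorting pixels into bands of areal density.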

Another special application of the NMP is the measurement of single event upset (SEU) in memory structures 
of computer chips [31]. In this technique, a single ion is directed onto a part of the memory structure, with a 
subsequent measurement of whether the memory bit was changed by the ion impact. In this way, the radiation 
hardness of different parts of the memory can be imaged. This information is valuable for production of 
components for space applications, where devices are subjected to high fluxes of ionizing radiation. A modern 
trend is also to study SEUs in living biological material to detect structures susceptible to radiation damage. A 
similar technique is the study of ion-beam-induced charge (IBIC) collection [32] from p-n junctions in
semiconductor material. In this case, the ion beam is directed onto a p-n junction and the current flowing
through the junction is measured. By rastering the beam an image can be obtained of the quality of these
junctions.

B1.24.8 FORWARD RECOIL SPECTROMETRY (FRS)


Forward recoil spectrometry (FRS) [33], also known as elastic recoil detection analysis (ERDA), is 
fundamentally the same as RBS with the incident ion hitting the nucleus of one of the atoms in the sample in 
an elastic collision. In this case, however, the recoiling nucleus is detected, not the scattered incident ion. RBS 
and FRS are near-perfect complementary techniques, with RBS sensitive to high-Z elements, especially in the 
presence of low-Z elements. In contrast, FRS is sensitive to light elements and is used routinely in the 
detection of H at sensitivities not attainable with other techniques [34]. As the technique is also based on an 
incoming ion that is slowed down on its inward path and an outgoing nucleus that is slowed down in a similar 
fashion, depth information is obtained for the elements detected. 


The analytically important parameters in FRS are exactly those of RBS. Naturally, the target nucleus can only 
recoil in the forward direction and, therefore, thick targets must be bombarded at an oblique angle to allow 
detection of the recoil. Thin targets allow the recoiling nucleus to be transmitted through the target. In the case 
of thick targets, the incident ion also has a high probability of scattering from the target into the detector. It is 
common that a filter is applied in front of the detector to remove scattered projectiles. This is possible because 
the projectile has a higher Z and lower energy and can be stopped while allowing the recoils through to the 
detector. Because of the kinematics as well as straggling, the depth resolution is somewhat worse than that 
obtainable with RBS. A simple schematic of the experimental setup for FRS is shown in figure B1.24.18.

Figure B1.24.18. A simple schematic diagram of the experimental setup for FRS. The most common use of
the technique is that of hydrogen detection using ⁴He of a few MeV with the recoils being detected at 30° with
respect to the beam direction and using a stopper foil to keep He from hitting the detector. This set-up can be
generalized to include an energy loss (ΔE) detector in front of the detector thus allowing, in one experiment,
the separation of signals due to hydrogen, deuterium and tritium.

The most common use of FRS is the detection of H using ⁴He of a few MeV, with the recoils being detected
at 30° with respect to the beam direction and a stopper foil to keep He from hitting the detector. This set-up
can be generalized to include an energy loss (ΔE) detector in front of the detector, thus allowing the
separation of signals from H, D and T in one experiment. The result of such an experiment [35] is shown in
figure B1.24.19, where a sample was analysed for hydrogen, deuterium and tritium content using a 3.8 MeV
⁴He beam and detecting the recoils at 30° with a 13.6 µm ΔE detector. The three-dimensional graph shows a
plot of the counts obtained versus ΔE on one axis and the energy measured (with a surface barrier
detector) after passing through the ΔE detector on the other axis. The traces due to the three isotopes of
hydrogen can clearly be seen, with the edges at high E corresponding to the surface of the sample. The shape
of the traces from the surface to lower energies is indicative of the depth distribution of isotopes in the
sample.
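
The high-energy (surface) edges of the three traces can be estimated from two-body elastic kinematics. The recoil kinematic factor used below, K = 4·M1·M2·cos²φ/(M1 + M2)², is the standard elastic-collision result and is not given in the text; integer mass numbers are used as an approximation, and energy loss in the stopper foil and ΔE detector is ignored. The function name is an illustrative choice.

```python
import math


def recoil_kinematic_factor(m_projectile: float, m_target: float, recoil_angle_deg: float) -> float:
    """Fraction of the projectile energy carried by the recoil in an elastic collision."""
    cos_phi = math.cos(math.radians(recoil_angle_deg))
    return 4.0 * m_projectile * m_target * cos_phi**2 / (m_projectile + m_target) ** 2


beam_mev = 3.8  # 4He beam energy quoted in the text
for name, mass in (("H", 1.0), ("D", 2.0), ("T", 3.0)):
    k = recoil_kinematic_factor(4.0, mass, 30.0)
    print(f"{name}: K = {k:.3f}, surface recoil energy = {k * beam_mev:.2f} MeV")
```

Under these assumptions the recoil energy at the surface increases from H to D to T, which is one reason the three isotopes appear as separate traces.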





Figure B1.24.19. The FRS result of an experiment where a sample was analysed for hydrogen, deuterium and
tritium content using a 3.8 MeV ⁴He beam, detecting the recoils at 30° with a 13.6 µm ΔE detector. The
three-dimensional graph shows a plot of the counts obtained versus ΔE on one axis and the total energy
(E + ΔE) on the other axis. The traces due to the three isotopes of hydrogen can clearly be seen, with the
edges at high E corresponding to the surface of the sample. The shape of the traces from the surface to lower
energies is indicative of the depth distribution of these isotopes in the sample.

The most advanced applications of the FRS technique employ high-energy (some tens of MeV) heavy ions,
such as Cl and I [36]. In this case a number of nuclei lighter than the projectile can be detected. A detector
that can separate different nuclei is required. A two-stage detector is used in which either the time of flight or
the energy loss ΔE is determined together with the energy, thus allowing the separation of nuclei with
different mass (and Z in the case of ΔE).

An example of such a measurement is shown in figure B1.24.20, where a ΔE-E detector telescope was used to
discriminate between different elements. When using heavy ions as incident particles in the analysis of
surface layers, care must be taken not to damage the surface.

Figure B1.24.20. An example of a heavy-ion (iodine) FRS measurement where a large ΔE-E detector
telescope was used to enable the discrimination of different elements. The plot shows counts (as intensity
plot) versus ΔE and E. The sample analysed was graphite introduced in an experimental nuclear fusion device
(tokamak). In this device, a plasma causes different elements to be deposited on the surface of the sample.
These elements were determined using a 200 MeV I beam with the detector telescope at 37.5° with respect to
the incident beam. It is clear that all the elements from Ni down to Be can be detected in one experiment. The
starting points of the traces at high energies indicate concentration of the elements at or near the surface. The
trend of the lines as a function of E indicates the concentrations of the elements as a function of depth.

B1.25 Surface chemical characterization

L Coulier and J W Niemantsverdriet 


B1.25.1 INTRODUCTION 

Chemical characterization of surfaces plays an important role in various fields of physics and chemistry, e.g.
catalysis, polymers, metallurgy and organic chemistry. This section briefly describes the concepts and a few
examples of the techniques that are most frequently used for chemical surface characterization, namely
x-ray photoelectron spectroscopy (XPS), Auger electron spectroscopy (AES), ultraviolet photoelectron
spectroscopy (UPS), secondary ion mass spectrometry (SIMS), temperature programmed desorption (TPD)
and electron energy loss spectroscopy (EELS). We have tried to give examples in a broad range
of fields. References to more extensive treatments of these techniques and others are given at the end of the
section, see 'Further reading'.

Although the techniques described undoubtedly provide valuable results on various materials, the most useful 
information almost always comes from a combination of several (chemical and physical) surface 
characterization techniques. Table B1.25.1 gives a short overview of the techniques described in this chapter.


B1.25.2 ELECTRON SPECTROSCOPY (XPS, AES, UPS) 


B1.25.2.1 X-RAY PHOTOELECTRON SPECTROSCOPY (XPS)


X-ray photoelectron spectroscopy (XPS) is among the most frequently used surface chemical characterization
techniques. Several excellent books on XPS are available [1, 2, 3, 4, 5, 6 and 7]. XPS is based on the
photoelectric effect: an atom absorbs a photon of energy $h\nu$ from an x-ray source; next, a core or valence
electron with binding energy $E_b$ is ejected with kinetic energy $E_k$ (figure B1.25.1):

$$E_k = h\nu - E_b - \varphi \qquad \text{(B1.25.1)}$$

where $E_k$ is the kinetic energy of the photoelectron, $h$ is Planck's constant, $\nu$ is the frequency of the exciting
radiation, $E_b$ is the binding energy of the photoelectron with respect to the Fermi level of the sample and $\varphi$ is
the work function of the spectrometer. Routinely used x-ray sources are Mg Kα ($h\nu$ = 1253.6 eV) and Al Kα
($h\nu$ = 1486.3 eV). In XPS one measures the intensity of photoelectrons $N(E)$ as a function of their kinetic
energy $E_k$. The XPS spectrum is a plot of $N(E)$ versus $E_k$ (= $h\nu - E_b - \varphi$).
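
As a worked illustration of (B1.25.1), the sketch below converts a measured kinetic energy into a binding energy. The photon energies are those quoted in the text; the spectrometer work function and the measured kinetic energy are assumed values chosen for the example, and the function name is hypothetical.

```python
MG_K_ALPHA_EV = 1253.6  # photon energies quoted in the text
AL_K_ALPHA_EV = 1486.3


def binding_energy(kinetic_ev: float, photon_ev: float, work_function_ev: float) -> float:
    """Rearranged (B1.25.1): E_b = h*nu - E_k - phi."""
    return photon_ev - kinetic_ev - work_function_ev


# Hypothetical measurement: the 4.5 eV spectrometer work function is an assumed value.
e_b = binding_energy(kinetic_ev=1178.1, photon_ev=MG_K_ALPHA_EV, work_function_ev=4.5)
print(f"E_b = {e_b:.1f} eV")
```

Changing the source from Mg Kα to Al Kα raises the kinetic energy of the same photoelectron by the difference in photon energy, while the binding energy obtained from the conversion is unchanged.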


Table B1.25.1 Overview of the surface characterization techniques described in this chapter.

| Acronym | Full name | Principle of measurement | Key information |
|---|---|---|---|
| XPS | X-ray photoelectron spectroscopy | Absorption of a photon by an atom, followed by the ejection of a core or valence electron with a characteristic binding energy. | Composition, oxidation state, dispersion |
| AES | Auger electron spectroscopy | After the ejection of an electron by absorption of a photon, an atom stays behind as an unstable ion, which relaxes by filling the hole with an electron from a higher shell. The energy released by this transition is taken up by another electron, the Auger electron, which leaves the sample with an element-specific kinetic energy. | Surface composition, depth profiles |
| UPS | UV photoelectron spectroscopy | Absorption of UV light by an atom, after which a valence electron is ejected. | Chemical bonding, work function |
| SIMS | Secondary ion mass spectroscopy | A beam of low-energy ions impinges on a surface, penetrates the sample and loses energy in a series of inelastic collisions with the target atoms leading to emission of secondary ions. | Surface composition, reaction mechanism, depth profiles |
| TPD | Temperature programmed desorption | After pre-adsorption of gases on a surface, the desorption and/or reaction products are measured while the temperature increases linearly with time. | Coverages, kinetic parameters, reaction mechanism |
| EELS | Electron energy loss spectroscopy | The loss of energy of low-energy electrons due to excitation of lattice vibrations. | Molecular vibrations, reaction mechanism |

Photoelectron peaks are labelled according to the quantum numbers of the level from which the electron
originates. An electron coming from an orbital with main quantum number $n$, orbital momentum $l$ (0, 1, 2, 3,
. . . indicated as s, p, d, f, . . .) and spin momentum $s$ (+1/2 or -1/2) is indicated as $nl_{l+s}$. For every orbital
momentum $l > 0$ there are two values of the total momentum: $j = l + 1/2$ and $j = l - 1/2$, each state filled with
$2j + 1$ electrons. Hence, most XPS peaks come in doublets and the intensity ratio of the components is
$(l + 1)/l$. When the doublet splitting is too small to be observed, the subscript $l + s$ is omitted.
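
The doublet rule above can be checked by counting occupancies. The short sketch below (the function name is an illustrative choice) computes the two j values and the ratio of their 2j + 1 occupancies with exact fractions, which reproduces (l + 1)/l for p, d and f levels.

```python
from fractions import Fraction


def doublet(l: int) -> tuple[Fraction, Fraction, Fraction]:
    """Return (j_high, j_low, intensity ratio high:low) for orbital momentum l > 0."""
    if l < 1:
        raise ValueError("s levels (l = 0) give a single peak, not a doublet")
    j_high = Fraction(2 * l + 1, 2)
    j_low = Fraction(2 * l - 1, 2)
    ratio = (2 * j_high + 1) / (2 * j_low + 1)  # ratio of the 2j + 1 occupancies
    return j_high, j_low, ratio


for label, l in (("p", 1), ("d", 2), ("f", 3)):
    j_high, j_low, ratio = doublet(l)
    print(f"{label}: j = {j_high} and {j_low}, intensity ratio {ratio}")
```

For a 4f level, for instance, this gives the 7/2 and 5/2 components with an intensity ratio of 4:3.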

Figure B1.25.2 shows the XPS spectra of two organoplatinum complexes which contain different amounts of
chlorine. The spectra show the peaks of all elements expected from the compounds: the Pt 4f and 4d
doublets (the 4f doublet is unresolved due to the low energy resolution employed for broad energy range
scans), Cl 2p and Cl 2s, N 1s and C 1s. However, the C 1s peak cannot be taken as characteristic of the complex
only. All surfaces that have not been cleaned by sputtering or oxidation in the XPS spectrometer contain
carbon. The reason is that adsorbed hydrocarbons from the atmosphere give the optimum lowering of the
surface free energy and hence, all surfaces are covered by hydrocarbon fragments [9].

Figure B1.25.1. Photoemission and Auger decay: an atom absorbs an incident x-ray photon with energy $h\nu$
and emits a photoelectron with kinetic energy $E_k = h\nu - E_b$. The excited ion decays either by the indicated
Auger process or by x-ray fluorescence.

Figure B1.25.2. XPS scans between 0 and 450 eV of two organoplatinum complexes showing peaks due to
Pt, Cl, N and C. The C 1s signal represents not only carbon in the compound but also contaminant
hydrocarbon fragments, as in any sample. The abbreviation 'Me' in the structures stands for CH₃ (courtesy of
J C Muijsers, Eindhoven).


Because a set of binding energies is characteristic for an element, XPS can analyse chemical composition. 
Almost all photoelectrons used in laboratory XPS have kinetic energies in the range of 0.2 to 1.5 keV, and 
probe the outer layers of the sample. The mean free path of electrons in elemental solids depends on the 
kinetic energy. Optimum surface sensitivity is achieved with electrons at kinetic energies of 50-250 eV, 
where about 50% of the electrons come from the outermost layer. 

Binding energies are not only element specific but contain chemical information as well: the energy levels of
core electrons depend on the chemical state of the atom. Chemical shifts are typically in the range 0-3 eV. In
general, the binding energy increases with increasing oxidation state and, for a fixed oxidation state, with the
electronegativity of the ligands. Figure B1.25.3 illustrates the sensitivity of XPS binding energy to oxidation
states for platinum in the metal and in the two organoplatinum complexes of figure B1.25.2. The Pt 4f7/2 peak
of the metal comes at 71.0 eV, that of the complex where Pt has one Cl ligand at 72.0 eV, characteristic of
Pt²⁺, while the binding energy of the Pt⁴⁺ in the complex with three Cl ligands on platinum is again 2 eV
higher, 74.4 eV [9]. The binding energy goes up with the oxidation state of the platinum. The reason is that
the 74 electrons in the Pt⁴⁺ ion (lower curve) feel a higher attractive force from the nucleus with a positive
charge of 78 than the 76 electrons in Pt²⁺ (middle curve) or the 78 in the neutral Pt atom (upper curve).

Figure B1.25.3. Pt 4f XPS spectra of platinum metal (top) and of the two organoplatinum compounds (a) and
(b), middle and bottom respectively, shown in figure B1.25.2, illustrating that the Pt 4f binding energy reflects
the oxidation state of platinum (from [9]).


Note that XPS measures binding energies. These are not necessarily equal to the energy of the orbitals from 
which the photoelectron is emitted. The difference is caused by reorganization of the remaining electrons 
when an electron is removed from an inner shell. Thus, the binding energy of the photoelectron contains both 
information on the state of the atom before photoionization (the initial state) and on the core-ionized atom left 
behind after the emission of an electron (the final state) [6]. Fortunately, it is often correct to interpret binding 
energy shifts in terms of initial state effects. 

Determining compositions is possible if the distribution of elements over the outer layers of the sample and
the surface morphology are known. Two limiting cases are considered, namely a homogeneous composition
throughout the outer layers and an arrangement in which one element covers the other.

For homogeneous mixed samples it is relatively easy to determine the relative concentrations of the various
constituents. For two elements one has approximately:

$$\frac{n_1}{n_2} = \frac{I_1/S_1}{I_2/S_2} \qquad \text{(B1.25.2)}$$

where $n_1/n_2$ is the ratio of elements 1 and 2, $I_1$ and $I_2$ are the intensities of the peaks of elements 1 and 2 (i.e. the
areas of the peaks) and $S_1$ and $S_2$ are atomic sensitivity factors, which are tabulated [8].

A more accurate calculation will account for differences in the energy dependent mean free paths of the 
elements and for the transmission characteristics of the electron analyser (see [7]). 
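
The first-order quantification of (B1.25.2) extends naturally to more than two elements by normalising every peak area by its sensitivity factor and dividing by a reference element. The sketch below assumes a homogeneous sample; the element labels, peak areas and sensitivity factors are hypothetical placeholders, not tabulated values, and the function name is an illustrative choice.

```python
def atomic_ratios(peak_areas: dict[str, float], sensitivity: dict[str, float], reference: str) -> dict[str, float]:
    """Apply (B1.25.2): n_i / n_ref = (I_i / S_i) / (I_ref / S_ref)."""
    normalised = {el: peak_areas[el] / sensitivity[el] for el in peak_areas}
    return {el: value / normalised[reference] for el, value in normalised.items()}


# Hypothetical peak areas and sensitivity factors, for illustration only.
areas = {"A": 200.0, "B": 100.0}
factors = {"A": 2.0, "B": 0.5}
print(atomic_ratios(areas, factors, reference="A"))
```

In this example element B gives the weaker peak yet is the more abundant species, because its sensitivity factor is four times smaller.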


An example in which XPS is used for studying surface compositions and oxidation states is illustrated in
figure B1.25.2 and figure B1.25.3 for two organoplatinum complexes. The samples were prepared for XPS by
letting a solution of the complexes in dichloromethane dry on a stainless steel sample stub. The sample should
thus be homogeneous and the use of (B1.25.2) permitted. Figure B1.25.2 shows the wide scan up to a binding
energy of 450 eV [9]. The figure shows immediately that the Cl peaks in the spectrum of the trichloride
complex are about three times as intense as in the spectrum of the compound with one Cl. If we apply
(B1.25.2) for the elements Pt, N and Cl, we obtain Pt:N:Cl = 1:1.9:4 for the trichloride complex, close to the
true stoichiometry of 1:2:3.

In the case of metal particles distributed on a support material (e.g. supported catalysts), XPS yields 
information on the dispersion. A higher metal/support intensity ratio (at the same metal content) indicates a 
better dispersion [3]. 

Another good example of the application of XPS in a different field of chemistry is shown in figure B1.25.4.
This figure shows the C 1s spectrum of a polymer [11]. Four different carbon species can be distinguished.
Reference tables indicate that the highest binding energy peak is due to carbon-fluorine species [7, 8]. The
other three peaks are attributed (from high to low binding energy) to an ester species, an ether species and
hydrocarbon/benzene fragments, respectively [8]. Hence, the carbon XPS spectrum nicely reflects the
structure of the polymer. This example is also a nice illustration of the influence of the electronegativity of the
ligands on the binding energy of carbon: the binding energy of carbon increases with the electronegativity of
the ligands.

Figure B1.25.4. C 1s XPS spectrum of a polymer, illustrating that the C 1s binding energy is influenced by
the chemical environment of the carbon. The spectrum clearly shows four different kinds of carbon, which
corresponds well with the structure of the polymer (courtesy of M W G M Verhoeven, Eindhoven).

Owing to the limited escape depth of photoelectrons, the surface sensitivity of XPS can be enhanced by
placing the analyser at an angle to the surface normal (the so-called take-off angle of the photoelectrons). This
can be used to determine the thickness of homogeneous overlayers on a substrate.

This is demonstrated by the XPS spectra in figure B1.25.5(a), which show the Si 2p spectra of a silicon crystal
with a thin (native) oxide layer, measured at take-off angles of 0° and 60° [12]. When the take-off angle is
high, relatively more photoelectrons from the oxide surface region reach the analyser and the Si⁴⁺/Si intensity
ratio increases with increasing angle. Figure B1.25.5(b) shows the intensity ratio as a function of take-off
angle, the line being a fit corresponding to a flat, homogeneous oxide layer with a uniform thickness of 2 nm.

Figure B1.25.5. (a) XPS spectra at take-off angles of 0° and 60° as measured from the surface normal from a
silicon crystal with a thin layer of SiO₂ on top. The relative intensity of the oxide signal increases
significantly at higher take-off angles, illustrating that the surface sensitivity of XPS increases. (b) Plot of
Si⁴⁺/Si 2p peak areas as a function of take-off angle. The solid line is a fit which corresponds to an oxide
thickness of 2.0 nm (from [12]).
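
The angular dependence described for the oxide-covered silicon crystal can be sketched with the commonly used uniform-overlayer model, I_ox/I_sub = R∞·(exp(d/(λ·cos θ)) − 1), where θ is the take-off angle from the surface normal. This expression is not given in the text and is not necessarily the one used for the fit in [12]; it assumes a flat layer and the same attenuation length λ for oxide and substrate photoelectrons. The values of λ and R∞ below are hypothetical.

```python
import math


def oxide_to_substrate_ratio(thickness_nm, take_off_deg, attenuation_nm, bulk_ratio):
    """Uniform-overlayer model: I_ox / I_sub = R_inf * (exp(d / (lambda * cos(theta))) - 1)."""
    path = thickness_nm / (attenuation_nm * math.cos(math.radians(take_off_deg)))
    return bulk_ratio * math.expm1(path)


def thickness_from_ratio(ratio, take_off_deg, attenuation_nm, bulk_ratio):
    """Invert the model above for the overlayer thickness d (nm)."""
    cos_theta = math.cos(math.radians(take_off_deg))
    return attenuation_nm * cos_theta * math.log1p(ratio / bulk_ratio)


# Assumed parameters: attenuation length 3.0 nm and bulk intensity ratio 0.8 are hypothetical.
for angle in (0.0, 30.0, 60.0):
    r = oxide_to_substrate_ratio(2.0, angle, attenuation_nm=3.0, bulk_ratio=0.8)
    print(f"{angle:4.0f} deg: ratio = {r:.2f}")
```

Within this model the ratio rises steeply with take-off angle, and a measured ratio at a known angle can be inverted for the layer thickness with `thickness_from_ratio`.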


An experimental problem in XPS is that electrically insulating samples may charge up during measurement, 
due to photoelectrons leaving the sample. Since the sample thereby acquires a positive charge, all XPS peaks 
in the spectrum shift by the same amount to higher binding energies. More serious than the shift itself is that 
different parts of the sample may acquire slightly different amounts of charge. This phenomenon, called 
differential charging, gives rise to broadening of the peaks and degrades the resolution. Correction for 
charging-induced shifts is made by using the binding energy of a known compound (in most cases one uses 
the C 1s binding energy of 284.6 eV). Alternatively, in certain circumstances, a low energy electron beam can
be used to neutralize the charged surface and eliminate the shift. 
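
The correction for a uniform charging shift amounts to moving every peak by the offset of a reference line. A minimal sketch, using the C 1s reference value quoted above; the measured peak positions are hypothetical, and the function name is an illustrative choice. Differential charging, which broadens peaks rather than shifting them rigidly, is not addressed by this procedure.

```python
C1S_REFERENCE_EV = 284.6  # reference value quoted in the text


def correct_charging(measured: dict[str, float], reference_peak: str = "C 1s") -> dict[str, float]:
    """Shift all peaks rigidly so the reference peak sits at its known binding energy."""
    shift = measured[reference_peak] - C1S_REFERENCE_EV
    return {peak: energy - shift for peak, energy in measured.items()}


# Hypothetical measured positions on a charging insulator (eV).
measured = {"C 1s": 287.1, "O 1s": 534.5}
corrected = correct_charging(measured)
print({peak: round(energy, 1) for peak, energy in corrected.items()})
```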


Sensitive materials, such as metal salts or organometallic compounds, may decompose during XPS analysis,
particularly when a standard x-ray source is used. Apart from the x-rays themselves, heat and electrons from
the source may cause damage to the samples. In such cases, a monochromated x-ray source can offer a
solution [9]. Damage is a particular issue in imaging XPS, where the x-ray intensity is focused in a narrow
spot. In this mode, a small hole in front of the analyser entrance enables one to select electrons from an area of
a few micrometres, such that an image of the surface composition can be made.


B1.25.2.2 AUGER ELECTRON SPECTROSCOPY (AES)

Auger electron spectroscopy is a powerful technique in the fields of materials and surface science [2, 3 and 4,
7]. In AES, core holes are created by exciting the sample with a beam of electrons. The excited ion relaxes by
filling the core hole with an electron from a higher shell. The energy released by this transition is taken up by
another electron, the Auger electron, which leaves the sample with an element-specific kinetic energy (figure
B1.25.1). The Auger electrons appear as small peaks on a high background of secondary electrons, scattered
by the sample from the incident beam. To enhance the visibility of the Auger peaks, spectra are usually
presented in the derivative (dN/dE) mode, see figure B1.25.6.

Figure B1.25.6. Energy spectrum of electrons coming off a surface irradiated with a primary electron beam.
Electrons have lost energy to vibrations and electronic transitions (loss electrons), to collective excitations of
the electron sea (plasmons) and to all kinds of inelastic processes (secondary electrons). The element-specific
Auger electrons appear as small peaks on an intense background and are more visible in a derivative
spectrum.

Auger peaks are labelled according to the x-ray level nomenclature. For example, KL₁L₂ stands for a 
transition in which the initial core hole in the K shell is filled from the L₁ shell, while the Auger electron is 
emitted from the L₂ shell. Valence levels are indicated by 'V', as in the KVV transitions of carbon or oxygen. 
The energy of an Auger electron formed in a KLM transition is to a good approximation given by 

$$E_{KLM} = E_K - E_L - E_M - \delta E - \varphi \qquad \text{(B1.25.3)}$$

where $E_{KLM}$ is the kinetic energy of the Auger electron, $E_i$ is the binding energy of an electron in the i shell (i 
= K, L, M, ...), $\delta E$ is the energy shift caused by relaxation effects and $\varphi$ is the work function of the 
spectrometer. The $\delta E$ term accounts for the relaxation effect involved in the decay process, which leads to a 
final state consisting of a heavily excited, doubly ionized atom. 
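
As a small illustration of equation (B1.25.3), the sketch below evaluates the kinetic energy of a KLM Auger electron. The function name and all numerical values are hypothetical and chosen only to make the arithmetic visible; they are not tabulated data for any element.

```python
def auger_kinetic_energy(e_k, e_l, e_m, relaxation_shift=0.0, work_function=0.0):
    """Kinetic energy (eV) of a KLM Auger electron following equation (B1.25.3).

    e_k, e_l, e_m    : binding energies of the K, L and M levels (eV)
    relaxation_shift : the relaxation term delta-E (eV)
    work_function    : spectrometer work function (eV)
    """
    return e_k - e_l - e_m - relaxation_shift - work_function


# Hypothetical binding energies, used only for illustration.
e_kin = auger_kinetic_energy(e_k=1000.0, e_l=100.0, e_m=10.0,
                             relaxation_shift=5.0, work_function=4.5)
print(f"Auger kinetic energy: {e_kin:.1f} eV")
```

Note that the energy of the primary beam does not enter the expression: the Auger kinetic energy is fixed by the level energies of the emitting atom, which is why it is element specific.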


Auger peaks also appear in XPS spectra. In this case, the x-ray ionized atom relaxes by emitting an electron 
with a specific kinetic energy $E_k$. One should bear in mind that in XPS the intensity is plotted against the 
binding energy, so one uses (B1.25.1) to convert to kinetic energy. 

The strong point of AES is that it provides a quick measurement of elements in the surface region of 
conducting samples. For elements having Auger electrons with energies in the range of 100-300 eV where the 
mean free path of the electrons is close to its minimum, AES is considerably more surface sensitive than XPS. 

Auger electron spectroscopy allows for three types of measurement. First, it provides the elemental surface 
composition of a sample. If the Auger decay process involves valence electrons, one often obtains information 
on the oxidation state as well, although XPS is certainly the better technique for this purpose. Second, owing 
to the short data collection times, AES can be combined with sputtering to measure concentrations as a 
function of depth, see figure B1.25.7. Third, as electron beams are easily collimated and deflected 
electrostatically, AES can be used to image the composition into a chemical map of the surface (scanning 
Auger spectroscopy). The best obtained resolution is now around 25 nm [7]. 

Figure B1.25.7. Auger sputter depth profile of a layered ZrO₂/SiO₂/Si model catalyst. While the sample is 
continuously bombarded with argon ions that remove the outer layers of the sample, the Auger signals of Zr, 
O, Si and C are measured as a function of time. The depth profile is a plot of Auger peak intensities against 
sputter time. The profile indicates that the outer layer of the model catalyst contains carbon. Next Zr and O are 
sputtered away, but note that oxygen is also present in deeper layers where Zr is absent. The left-hand pattern 
is characteristic for a layered structure, and confirms that the zirconium is present in a well dispersed layer 
over the silicon oxide. The right-hand pattern is consistent with the presence of zirconium oxide in larger 
particles (from [27]). 

A disadvantage of AES is that the intense electron beam easily causes damage to sensitive materials 
(polymers, insulators, adsorbate layers). Charging of insulating samples also causes serious problems. 


B1.25.2.3 ULTRAVIOLET PHOTOELECTRON SPECTROSCOPY (UPS) 


Ultraviolet photoelectron spectroscopy (UPS) [2, 3 and 4, 6] differs from XPS in that UV light (He I, 21.2 eV; 
He II, 40.8 eV) is used instead of x-rays. At these low exciting energies, photoemission is limited to valence 
electrons. 

Hence, UPS spectra contain important chemical information. At low binding energies, UPS probes the density 
of states (DOS) of the valence band (but images it in a distorted way in a convolution with the unoccupied 
states). At slightly higher binding energies (5-15 eV), occupied molecular orbitals of adsorbed gases may 
become detectable. UPS also provides a quick measure of the macroscopic work function, $\varphi$, the energy 
separation between the Fermi and the vacuum level: $\varphi = h\nu - W$, where $W$ is the width of the spectrum. UPS is 
a surface science technique typically applied to single crystals, the main reason being that all elements 
contribute peaks to the valence band region. As a result, the UPS spectra of compounds which contain more 
than two elements are rather complicated. 
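
The work function relation quoted above (work function = photon energy minus spectrum width) can be sketched as follows. The He I and He II photon energies are those given in the text; the function name and the spectrum width of 16.2 eV are assumptions made purely for illustration.

```python
HE_I_EV = 21.2    # He I photon energy (eV), as quoted in the text
HE_II_EV = 40.8   # He II photon energy (eV), as quoted in the text


def work_function_from_ups(photon_energy_ev, spectrum_width_ev):
    """Work function (eV) from the total width W of a UPS spectrum: phi = h*nu - W."""
    if spectrum_width_ev > photon_energy_ev:
        raise ValueError("spectrum width cannot exceed the photon energy")
    return photon_energy_ev - spectrum_width_ev


# Hypothetical measured width of a He I spectrum, for illustration only.
phi = work_function_from_ups(HE_I_EV, 16.2)
print(f"work function: {phi:.2f} eV")
```

Because only the width of the spectrum enters, the same sample measured with He II radiation would show a spectrum wider by the difference in photon energy while giving the same work function.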


B1.25.3 SECONDARY ION MASS SPECTROMETRY (SIMS) 

Secondary ion mass spectrometry (SIMS) is by far the most sensitive surface technique, but also the most 
difficult one to quantify. SIMS is very popular in materials research for making concentration depth profiles 
and chemical maps of the surface. For a more extensive treatment of SIMS the reader is referred to [3] and 
[14, 15 and 16]. The principle of SIMS is conceptually simple: when a surface is exposed to a beam of ions 
(Ar⁺, Cs⁺, Ga⁺ or other elements with energies between 0.5 and 10 keV), energy is deposited in the surface 
region of the sample by a collisional cascade. Some of the energy will return to the surface and stimulate the 
ejection of atoms, ions and multi-atomic clusters (figure B1.25.8). In SIMS, secondary ions (positive or 
negative) are detected directly with a mass spectrometer. 

Figure B1.25.8. The principle of SIMS: primary ions with an energy between 0.5 and 10 keV cause a 
collisional cascade below the surface of the sample. Some of the branches end at the surface and stimulate the 
emission of neutrals and ions. In SIMS, the secondary ions are detected directly with a mass spectrometer. 

SIMS is, strictly speaking, a destructive technique, but not necessarily a damaging one. In the dynamic mode, 
used for making concentration depth profiles, several tens of monolayers are removed per minute. In static 
SIMS, however, the rate of removal corresponds to one monolayer per several hours, implying that the surface 
structure does not change during the measurement (between seconds and minutes). In this case one can be 
sure that the molecular ion fragments are truly indicative of the chemical structure on the surface. 

The advantages of SIMS are its high sensitivity (ppm detection limit for certain elements), its ability to detect 
hydrogen and the emission of molecular fragments which often bear tractable relationships with the parent 
structure on the surface. A disadvantage is that secondary ion formation is a poorly understood phenomenon 
and that quantitation is usually difficult. A major drawback is the matrix effect: Secondary ion yields of one 
element can vary tremendously with its chemical environment. This matrix effect and the elemental sensitivity 
variation of five orders of magnitude across the periodic table make quantitative interpretation of SIMS 
spectra of many compounds extremely difficult. 

Figure B1.25.9(a) shows the positive SIMS spectrum of a silica-supported zirconium oxide catalyst precursor, 
freshly prepared by a condensation reaction between zirconium ethoxide and the hydroxyl groups of the 
support [17]. Note the simultaneous occurrence of single ions (H⁺, Si⁺, Zr⁺) and molecular ions (SiO⁺, 
SiOH⁺, ZrO⁺, ZrOH⁺, ZrO₂⁺). Also, the isotope pattern of zirconium is clearly visible. Isotopes are important in 
the identification of peaks, because all peak intensity ratios must agree with the natural abundance. In addition 
to the peaks expected from zirconia on silica mounted on an indium foil, the spectrum in figure B1.25.9(a) 
also contains peaks from the contaminants Na⁺, K⁺ and Ca⁺. This is typical for SIMS: sensitivities vary over 
several orders of magnitude and elements such as the alkalis are already detected when present in trace 
amounts. 

Figure B1.25.9. Positive SIMS spectra of a ZrO₂/SiO₂ catalyst, (a) after preparation from Zr(OC₂H₅)₄, (b) 
after drying at 40 °C and (c) after calcination in air at 400 °C (from [17]). 

The most useful information is in the relative intensities of the Zr⁺, ZrO⁺, ZrOH⁺ and ZrO₂⁺ ions. This is 
illustrated in figure B1.25.9(b) and figure B1.25.9(c), which show the isotope patterns of these ions for a 
freshly dried and a calcined catalyst, respectively [17]. Note that the SIMS spectrum of the fresh catalyst 
contains small but significant contributions from ZrOH⁺ ions (107 amu, ⁹⁰ZrOH⁺, and 111 amu, ⁹⁴ZrOH⁺). 
ZrOH⁺ is most probably a fragment ion from zirconium ethoxide. In the spectrum of the catalyst which was 
oxidized at 400 °C, the isotope pattern in the ZrO range resembles that of Zr, indicating that ZrOH species are 
absent. The spectrum in figure B1.25.9(b) of the zirconium ethoxide (O:Zr = 4:1) shows higher intensities of the 
ZrO₂⁺ and ZrO⁺ signals than the calcined ZrO₂ (O:Zr = 2:1) does. The way to interpret this information is to 
compare the spectra of the catalysts with reference spectra of ZrO₂ and zirconium ethoxide [17]. 
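
The isotope argument used above can be sketched numerically. The snippet builds the expected nominal-mass pattern of ZrO⁺ and ZrOH⁺ from the natural abundances of zirconium and shows why intensity at 111 amu signals ZrOH⁺. The abundance values are approximate literature figures supplied here as an assumption (they are not taken from the text), only ¹⁶O and ¹H are considered, and the function names and the 20% ZrOH⁺ contribution are hypothetical.

```python
# Approximate natural abundances of the zirconium isotopes (assumed values).
ZR_ABUNDANCE = {90: 0.5145, 91: 0.1122, 92: 0.1715, 94: 0.1738, 96: 0.0280}


def cluster_pattern(added_mass, scale=1.0):
    """Isotope pattern of a Zr-containing ion whose ligands add `added_mass` amu."""
    return {mass + added_mass: scale * abundance
            for mass, abundance in ZR_ABUNDANCE.items()}


def combined_pattern(zro_intensity, zroh_intensity):
    """Superposition of the ZrO+ (Zr + 16) and ZrOH+ (Zr + 17) patterns."""
    pattern = {}
    for part in (cluster_pattern(16, zro_intensity),
                 cluster_pattern(17, zroh_intensity)):
        for mass, value in part.items():
            pattern[mass] = pattern.get(mass, 0.0) + value
    return dict(sorted(pattern.items()))


pure_oxide = combined_pattern(1.0, 0.0)       # ZrO+ only
with_hydroxide = combined_pattern(1.0, 0.2)   # hypothetical 20% ZrOH+ admixture

for mass in with_hydroxide:
    print(mass, round(pure_oxide[mass], 4), round(with_hydroxide[mass], 4))
```

Since zirconium has no isotope of mass 95, ZrO⁺ cannot contribute at 111 amu; any signal there must come from ⁹⁴ZrOH⁺. At 107 amu the ⁹¹ZrO⁺ and ⁹⁰ZrOH⁺ contributions overlap, so ZrOH⁺ shows up as an excess over the intensity expected from the natural abundance of ⁹¹Zr.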


For single crystals, matrix effects are largely ruled out and excellent quantitation has been achieved by 
calibrating SIMS yields by means of other techniques such as EELS and TPD (see further) [18]. Here SIMS 
offers the challenging perspective to monitor reactants, intermediates and products of catalytic reactions in 
real time while the reaction is in progress. 

A good example of monitoring adsorbed species on surfaces with SIMS is shown in figure B1.25.10. This 
figure shows positive SIMS spectra of the interaction of NO with the Rh(111) surface [19]. The lower curve 
shows the adsorption of molecular NO (peak at 236 amu) on Rh(111) at 120 K. The middle curve shows the 
situation after heating the sample to 400 K. The presence of the peaks at 220 amu (Rh₂N⁺) and 222 amu 
(Rh₂O⁺) and the absence of the Rh₂NO⁺ peak (236 amu) indicate that NO has dissociated. Heating the sample at 
400 K in H₂ causes the removal of atomic oxygen and, thus, the disappearance of the Rh₂O⁺ peak at 222 amu, as 
can be seen in the upper curve. 

Figure B1.25.10. SIMS spectra of the Rh(111) surface after adsorption of 0.12 ML NO at 120 K (bottom), 
after heating to 400 K (middle) and after reaction with H₂ at 400 K (top) (from [19]). 

As in Auger spectroscopy, SIMS can be used to make concentration depth profiles and, by rastering the ion 
beam over the surface, to make chemical maps of certain elements. More recently, SIMS has become very 
popular in the characterization of polymer surfaces [14, 15 and 16]. 


B1.25.4 TEMPERATURE PROGRAMMED DESORPTION (TPD) 


Thermal desorption spectroscopy (TDS) or temperature programmed desorption (TPD), as it is also called, is 
a simple and very popular technique in surface science. A sample covered with one or more adsorbate(s) is 
heated at a constant rate and the desorbing gases are detected with a mass spectrometer. If a reaction takes 
place during the temperature ramp, one speaks of temperature programmed reaction spectroscopy (TPRS). 


TPD is frequently used to determine (relative) surface coverages. The area below a TPD spectrum of a certain 
species is proportional to the total amount that desorbs. In this way one can determine uptake curves that 
correlate gas exposure to surface coverage. If the pumping rate of the UHV system is sufficiently high, the 
mass spectrometer signal for a particular desorption product is linearly proportional to the desorption rate of 
the adsorbate [20, 21]: 


$$r = -\frac{d\theta}{dt} = k_{des}\,\theta^n = \nu(\theta)\,\theta^n \exp\left(-\frac{E_{des}(\theta)}{RT}\right), \qquad T = T_0 + \beta t \qquad \text{(B1.25.4)}$$


where $r$ is the rate of desorption, $E_{des}$ is the activation energy of desorption, $\theta$ is the coverage in monolayers, 
$R$ is the gas constant, $t$ is the time, $T$ is the temperature, $k_{des}$ is the reaction rate constant for desorption, $T_0$ is 
the temperature at the start, $n$ is the order of desorption, $\beta$ is the heating rate, equal to $dT/dt$, and $\nu$ is the 
preexponential factor of desorption. 

With the aid of (B1.25.4), it is possible to determine the activation energy of desorption (usually equal to the 
adsorption energy) and the preexponential factor of desorption [21, 24]. Attractive or repulsive interactions 
between the adsorbate molecules make the desorption parameters $E_{des}$ and $\nu$ dependent on coverage [22]. In 
the case of TPRS one obtains information on surface reactions if the latter are rate determining for the 
desorption. 
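
To make the rate expression (B1.25.4) concrete, the following minimal sketch integrates it numerically for a linear temperature ramp. It assumes, for simplicity, that the activation energy and preexponential factor do not depend on coverage; the parameter values (100 kJ/mol, 10¹³ s⁻¹, 5 K/s) and the function names are illustrative assumptions, not data from the text.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def simulate_tpd(e_des, nu, beta, order=1, theta0=1.0,
                 t_start=200.0, t_end=600.0, d_temp=0.05):
    """Integrate r = nu * theta**n * exp(-E_des / RT) along T = T0 + beta*t.

    Uses a simple explicit Euler step with the coverage clamped at zero.
    Returns the lists of temperatures (K) and desorption rates (ML/s).
    """
    temps, rates = [], []
    theta = theta0
    temp = t_start
    dt = d_temp / beta  # time spent in each temperature step
    while temp <= t_end:
        rate = nu * theta ** order * math.exp(-e_des / (R * temp)) if theta > 0.0 else 0.0
        temps.append(temp)
        rates.append(rate)
        theta = max(theta - rate * dt, 0.0)
        temp += d_temp
    return temps, rates


def peak_temperature(temps, rates):
    """Temperature at which the desorption rate is highest."""
    return temps[rates.index(max(rates))]


temps, rates = simulate_tpd(e_des=100e3, nu=1e13, beta=5.0)
print(f"peak temperature: {peak_temperature(temps, rates):.1f} K")
print(f"integrated area:  {sum(rates) * (0.05 / 5.0):.3f} ML")
```

The integrated area recovers the initial coverage, which is the basis for using TPD peak areas as a measure of (relative) surface coverage. Coverage-dependent parameters would require replacing the constants by functions of the coverage.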

Figure B1.25.11 shows the temperature programmed reaction between 0.15 ML of CO and 0.24 ML of NO 
adsorbed at 150 K on a Rh(111) single crystal [23]. The spectra show the desorption of species with masses 
28, 29, 30, 44 and 45, corresponding to N₂, ¹³CO, NO, N₂O and ¹³CO₂ respectively, as functions of the 
temperature. N₂, ¹³CO and ¹³CO₂ are the only desorption products, indicating that NO is totally dissociated 
and all N atoms are converted to N₂. CO is not decomposed at all; part of it desorbs as CO and part of it reacts 
with atomic oxygen to CO₂. At higher NO coverages (not shown) the TPD spectrum also has a mass 30 
signal, which is due to desorption of NO [28]. In this case, the coverage of adsorbed CO is too low to convert 
all NO to N₂ and CO₂. 

Figure B1.25.11. Temperature programmed reactions of 0.15 ML of ¹³CO coadsorbed with 0.24 ML of NO. 
Adsorption was done at 150 K and the heating rate was 5 K s⁻¹ (from [23]). 

The disadvantage of TPD is that, in order to derive the kinetic parameters, rather involved computations are 
necessary [21, 24]. As an alternative to the complete desorption analysis, many authors rely on simplified 
methods. Spectra analysed in this way should be interpreted with care, as simplified 
methods may easily give erroneous results [21]. 

B1.25.5 ELECTRON ENERGY LOSS SPECTROSCOPY (EELS) 

Molecules possess discrete levels of vibrational energy. Vibrations in molecules can be excited by interaction 
with waves and with particles. In electron-energy loss spectroscopy (EELS, sometimes HREELS for high 
resolution EELS), a beam of monochromatic, low energy electrons falls on the surface, where it excites lattice 
vibrations of the substrate, molecular vibrations of adsorbed species and even electronic transitions. An 
energy spectrum of the scattered electrons reveals how much energy the electrons have lost to vibrations, 
according to the formula 


$$E = E_0 - h\nu \qquad \text{(B1.25.5)}$$


where $E$ is the energy of the scattered electron, $E_0$ is the energy of the incident electrons, $h$ is Planck's 
constant and $\nu$ is the frequency of the excited vibration. The use of electrons requires that experiments are 
done in high vacuum and preferably on the flat surfaces of single crystals or foils (making ultrahigh vacuum 
conditions desirable). 
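
Vibrational losses are usually quoted in wavenumbers, so a short sketch of the conversion implied by (B1.25.5) may help. The primary energy of 5 eV is a hypothetical value chosen for illustration, and the function names are assumptions.

```python
H_EV_S = 4.135667696e-15   # Planck constant in eV s
C_CM_S = 2.99792458e10     # speed of light in cm/s


def loss_energy_mev(wavenumber_cm):
    """Energy loss h*nu in meV for a vibration given in cm^-1."""
    return H_EV_S * C_CM_S * wavenumber_cm * 1000.0


def scattered_energy_ev(e0_ev, wavenumber_cm):
    """Energy of the scattered electron, E = E0 - h*nu, in eV."""
    return e0_ev - loss_energy_mev(wavenumber_cm) / 1000.0


# Hypothetical primary energy of 5 eV and a C-O stretch near 2070 cm^-1.
print(f"loss:      {loss_energy_mev(2070.0):.1f} meV")
print(f"scattered: {scattered_energy_ev(5.0, 2070.0):.4f} eV")
```

A loss of a few hundred meV on a primary energy of a few eV illustrates why both a monochromatic beam and an accurate energy analysis of the scattered electrons are needed.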

While infrared and Raman spectroscopy are limited to vibrations in which a dipole moment or the molecular 
polarizability changes, EELS detects all vibrations. Two excitation mechanisms play a role in EELS: dipole 
and impact scattering [4]. 

In dipole scattering we are dealing with the wave character of the electron. Close to the surface, the electron 
sets up an electric field with its image charge in the metal. This oscillating field is perpendicular to the surface 
and excites only those vibrations in which a dipole moment changes in a direction normal to the surface, 
as in reflection absorption infrared spectroscopy. The outgoing electron wave has lost an amount of 
energy equal to $h\nu$, see (B1.25.5), and travels mainly in the specular direction. 

The second excitation mechanism, impact scattering, involves a short range interaction between the electron 
and the molecule (put simply, a collision) which scatters the electrons over a wide range of angles. The useful 
feature of impact scattering is that all vibrations may be excited and not only the dipole active ones. As in 
Raman spectroscopy, the electron may also take an amount of energy $h\nu$ away from excited molecules and 
leave the surface with an energy equal to $E_0 + h\nu$. 

Figure B1.25.12 illustrates the two scattering modes for a hypothetical adsorption system consisting of an 
atom on a metal [3]. The stretch vibration of the atom perpendicular to the surface is accompanied by a 
change in dipole moment; the bending mode parallel to the surface is not. As explained above, the EELS 
spectrum of electrons scattered in the specular direction detects only the dipole-active vibration. The more 
isotropically scattered electrons, however, undergo impact scattering and excite both vibrational modes. Note 
that the comparison of EELS spectra recorded in specular and off-specular direction yields information about 
the orientation of an adsorbed molecule. 

Figure B1.25.12. Excitation mechanisms in electron energy loss spectroscopy for a simple adsorbate system: 
dipole scattering excites only the vibration perpendicular to the surface ($\nu_1$) in which a dipole moment 
normal to the surface changes; the electron wave is reflected by the surface into the specular direction. Impact 
scattering also excites the bending mode $\nu_2$ in which the atom moves parallel to the surface; electrons are 
scattered over a wide range of angles. The EELS spectra show the highly intense elastic peak and the 
relatively weak loss peaks. Off-specular loss peaks are in general one to two orders of magnitude weaker than 
specular loss peaks. 


A strong point of EELS is that it detects losses in a very broad energy range, which comprises the entire 
infrared regime and extends even to electronic transitions at several electron volts. EELS spectrometers have 
to satisfy a number of stringent requirements. First, the primary electrons should be monochromatic. Second, 
the energy of the scattered electrons should be measured with a high accuracy. Third, the low energy electrons 
must effectively be shielded from magnetic fields [25]. 

Figure B1.25.13 shows an HREELS spectrum of CO adsorption on a Rh(111) surface [26]. In the experiment 
3 L of CO was adsorbed at a pressure of 1 × 10⁻⁸ mbar and T = 200 K. At zero energy loss one observes the 
highly intense elastic peak. The other peaks in the spectrum are loss peaks. At high energy, loss peaks due to 
dipole scattering are visible. In this case they are caused by CO vibration perpendicular to the surface. The 
peak at 2070 cm⁻¹ is attributed to on-top adsorption of CO on Rh, while the peak at 1861 cm⁻¹ corresponds to 
CO adsorption on a threefold Rh site. The loss peaks at low energy loss are due to metal-adsorbate vibrations, 
in this case of the Rh-CO bond. The peak at 434 cm⁻¹ is due to the Rh-CO vibration of the CO adsorbed on 
top, that at 390 cm⁻¹ to the Rh-CO vibration in threefold CO. The CO molecules order in a (2 × 2)-3CO 
structure, with one linear and two threefold CO molecules per (2 × 2) unit cell. 

Figure B1.25.13. HREELS spectrum of CO adsorbed on Rh(111) at T = 200 K. Visible are the C-O vibration 
peaks at energy losses around 1800-2100 cm⁻¹ and the Rh-CO signals at energy losses 300-500 cm⁻¹ 
(courtesy of R Linke [26]). 

B1.26 Surface physical characterization 

W T Tysoe and Gefei Wu 


B1.26.1 INTRODUCTION 

The physical structure of a surface, its area, morphology and texture and the sizes of orifices and pores are 
often crucial determinants of its properties. For example, catalytic reactions take place at surfaces. Simple 
statistical mechanical estimates suggest that a surface-mediated reaction should proceed about 10¹² times 
faster than the corresponding gas-phase reaction for identical activation energies [1]. The catalyst operates by 
lowering the activation energy of the reaction to accelerate the rate. The reaction rate, however, also increases 
in proportion to the exposed surface area of the active component of the catalyst so that maximizing its area 
also strongly affects its activity. Catalysts often have complicated morphologies, consisting of exposed 
regions, and small micro- and meso-pores. A traditional method for measuring these areas, which is still the 
workhorse for the catalytic chemists, is to titrate the surface with molecules of known 'areas' and to measure 
the amount that just covers it. This is done by pressurizing the sample using probe gases and gauging when a 
single layer of adsorbate forms. This relies on developing robust theoretical methods for determining the 
equilibrium between the gas phase and the surface. This was done in 1938 by Brunauer, Emmett and Teller. 
Brunauer and Emmett were catalytic chemists and Teller a theoretical physicist who was persuaded to 
undertake the theoretical task of developing an adsorption isotherm [2]. This he apparently did in one day and 
the Brunauer-Emmett-Teller isotherm was born. This, with minor modifications, is the isotherm still used 
today. 

On planar systems, morphologies can generally be measured by directly imaging them using optical (section 
B1.19 and section B1.21) or electron (B1.18) microscopies or using scanning probes (see section B1.20). 
Coarse morphologies can be measured using crude probes such as profilometers [3] and, more recently, at the 
atomic level, using atomic force microscopy (section B1.20). The measurement of the thickness and 
properties of thin films deposited onto planar surfaces is more of a challenge. Electron-based spectroscopic 
probes can measure the nature of the outer selvedge of planar films (see section B1.6, section B1.7, section 
B1.21). For example, x-ray photoelectron spectroscopy is useful for measuring films a few ångströms thick 
using the electron escape depth. The film itself can be probed using optical spectroscopic techniques such as 
infrared (B1.2) and Raman (B1.3) spectroscopies. Ellipsometry, which measures the change in polarization of linearly 
polarized light as it passes through the film, is used to measure film thickness non-destructively over a wide 
range. It is particularly useful for probing surface coatings such as anti-reflection and protective films and has 
been used very effectively to probe overlayers in ultrahigh vacuum and the formation kinetics of 
self-assembled monolayers on gold. 

The final technique addressed in this chapter is the measurement of the surface work function, the energy 
required to remove an electron from a solid. This is one of the oldest surface characterization methods, and 
certainly the oldest carried out in vacuo since it was first measured by Millikan using the photoelectric effect 
[4]. The observation of this effect led to the proposal of the Einstein equation: 


$$E_k = h\nu - e\Phi \qquad \text{(B1.26.1)}$$


where $\nu$ is the light frequency, $h$ is Planck's constant, $E_k$ the kinetic energy of the emitted electron, $e$ the 
charge on an electron and $\Phi$ the material work function. The resulting notion of wave-particle duality led 
directly to the development of quantum mechanics. This is not strictly a physical probe since the work 
function of a clean sample depends on its electronic structure. This is strongly affected by the presence of 
adsorbates, electronegative adsorbates leading to an increase in work function, and electropositive adsorbates 
to a decrease. The observations have technological implications, so that filaments used today as electron 
sources in cathode ray (television) and vacuum tubes (valves) are coated with electropositive alkaline earth 
compounds that lower the work function and enhance the thermionically emitted current. This allows the 
filaments to operate effectively at lower temperatures and thereby increases their lifetimes. The main 
experimental utility of this method is to measure, in a simple and direct way, the coverage of an adsorbate on 
a surface (see section A1.7). 
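
The Einstein equation (B1.26.1) can be explored with a few lines of code. In this sketch the work function is expressed directly in eV (so that the product of electron charge and work function potential is a single number), and the work function of 4.5 eV and wavelength of 250 nm are hypothetical illustration values.

```python
HC_EV_NM = 1239.841984  # product h*c expressed in eV nm


def photoelectron_energy_ev(wavelength_nm, work_function_ev):
    """Kinetic energy of the emitted electron, E_k = h*nu - e*Phi, in eV.

    A negative result means the photon energy is below threshold
    and no photoemission occurs.
    """
    return HC_EV_NM / wavelength_nm - work_function_ev


def threshold_wavelength_nm(work_function_ev):
    """Longest wavelength that can still eject an electron."""
    return HC_EV_NM / work_function_ev


clean = photoelectron_energy_ev(250.0, 4.5)        # hypothetical clean surface
alkali_covered = photoelectron_energy_ev(250.0, 3.5)  # hypothetical lowered work function
print(f"clean surface:   {clean:.3f} eV")
print(f"lowered by 1 eV: {alkali_covered:.3f} eV")
print(f"threshold:       {threshold_wavelength_nm(4.5):.1f} nm")
```

Lowering the work function by an electropositive adsorbate raises the kinetic energy by the same amount, which is the basis for using work function changes as a measure of adsorbate coverage.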


B1.26.2 THE BRUNAUER-EMMETT-TELLER (BET) METHOD 

B1.26.2.1 PRINCIPLES 

(A) MEASUREMENTS OF SURFACE AREA BY GAS ADSORPTION 

The central idea underlying measurements of the area of powders with high surface areas is relatively simple. 
Adsorb a close-packed monolayer on the surface and measure the number $N$ of these molecules adsorbed per 
unit mass of the material (usually per gram). If the specific area occupied by each molecule is $A_m$, then the 
total surface area $S_A$ of the sample is simply given by: 

$$S_A = N A_m. \qquad \text{(B1.26.2)}$$
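
As a worked instance of equation (B1.26.2), the sketch below converts a monolayer uptake into a specific surface area. The uptake of 1 mmol per gram is hypothetical, and the molecular area of 0.162 nm² is the value commonly quoted for nitrogen at 77 K, introduced here as an assumption rather than taken from the text.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def specific_surface_area(monolayer_mol_per_g, area_per_molecule_nm2):
    """Specific surface area in m^2 per gram, S_A = N * A_m.

    monolayer_mol_per_g   : amount adsorbed in one monolayer (mol per gram)
    area_per_molecule_nm2 : area A_m occupied by one adsorbed molecule (nm^2)
    """
    n_molecules = monolayer_mol_per_g * AVOGADRO        # N, molecules per gram
    return n_molecules * area_per_molecule_nm2 * 1e-18  # nm^2 converted to m^2


# Hypothetical monolayer capacity with an assumed N2 molecular area.
area = specific_surface_area(1.0e-3, 0.162)
print(f"specific surface area: {area:.1f} m^2/g")
```

The difficult experimental step is not this multiplication but identifying the monolayer capacity from the measured isotherm.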

The saturation coverage during chemisorption on a clean transition-metal surface is controlled by the 
formation of a chemical bond at a specific site [5] and not necessarily by the area of the molecule. In addition, 
in this case, the heat of chemisorption of the first monolayer is substantially higher than for the second and 
subsequent layers where adsorption is via weaker van der Waals interactions. Chemisorption is often useful 
for measuring the area of a specific component of a multi-component surface, for example, the area of small 
metal particles adsorbed onto a high-surface-area support [6], but not for measuring the total area of the 
sample. Surface areas measured using this method are specific to the molecule that chemisorbs on the surface. 
Carbon monoxide titration is therefore often used to define the number of 'sites' available on a supported 
metal catalyst. In order to measure the total surface area, adsorbates must be selected that interact relatively 
weakly with the substrate so that the area occupied by each adsorbate is dominated by intermolecular 
interactions and the area occupied by each molecule is approximately defined by van der Waals radii. This 
generally necessitates experiments being carried out at low temperatures such that $kT \ll \Delta H_{ads}$, the heat of 
adsorption. Since now the interaction of both the first and subsequent adsorbate layers is dominated by van 
der Waals forces, this leads not simply to the formation of a single monolayer, but also to the growth of second, third and 
subsequent layers. This distinction is shown in figure B1.26.1, which plots coverage versus pressure at 
constant temperature (an isotherm) for a molecule (hydrogen) which chemisorbs at the surface where the 
saturation of the monolayer is clearly evident from the appearance of a plateau [7]. In the case of 
physisorption, as demonstrated in figure B1.26.2, subsequent layers can grow so that the number of 
molecules adsorbed in the first monolayer is much more difficult to identify [8]. It is clear, in this case, that 
the first monolayer forms somewhere near the first 'knee' of the isotherm, labelled point 'B'. The importance 
of this point was first emphasized by Emmett [9]. In order to usefully measure the total surface area, the shape 
of the adsorption isotherm must be analysed to more clearly distinguish between monolayer (point B) and 
multilayer adsorption. In 1985, IUPAC introduced a classification of six different types of adsorption 
isotherm [11] (figure B1.26.3) exhibited by real surfaces. Types I-V were originally classified by Brunauer, 
Deming, Deming and Teller (BDDT) [10]. Type I represents the Langmuir isotherm [12] for monolayer 
coverage and is most often exhibited for chemisorption where the heat of adsorption in the first layer is much 
greater than that in subsequent layers, but also corresponds to physisorption by microporous adsorbents (pore 
width < 2 nm) within the solid. Type II are monolayer-multilayer isotherms and represent non-porous or 
macroporous adsorbents (pore width > 50 nm). Industrial adsorbents and catalysts which possess mesoporous 
(pore width 2-50 nm) structures often exhibit type IV behaviour. The shapes of types III and V isotherms are 
analogous to type II and IV respectively, but with weak gas-solid interactions. The stepwise type VI 
isotherms can be obtained with well defined, uniform solids. The origin of the shapes of some of these 
isotherms will be discussed in greater detail below. 

Figure B1.26.1. Sorption isotherm for chemisorption of hydrogen on palladium film at 273 K (Stephens S J 
1959 J. Phys. Chem. 63 188-94). 


(B) TYPICAL ISOTHERMS FOR THE PHYSICAL ADSORPTION OF GASES ON SURFACES 


As noted above, an isotherm plots the number of molecules adsorbed on the surface at some temperature in 
equilibrium with the gas at some pressure. Adsorption gives rise to a change in the free energy which, of 
course, depends on the number of molecules already adsorbed on the surface (defined as the coverage, $\theta$, see 
section A1.7). If it is assumed for simplicity that the structure of the adsorbed phase is similar to that of a solid 
(so that, at equilibrium, the chemical potential of the bulk phase equals the chemical potential for the gas 
phase in equilibrium with it) then: 


$$\mu_{bulk} = \mu^0 + RT \ln P_0 \qquad \text{(B1.26.3)}$$

where $P_0$ is the equilibrium vapour pressure. The chemical potential of the adsorbed phase in equilibrium with 
the gas at some temperature $T$ can similarly be written as: 

$$\mu_{ads} = \mu^0 + RT \ln P. \qquad \text{(B1.26.4)}$$

We assume for simplicity that the adsorbed phase has the same entropy as the solid so that only an energy 
change is associated with the transfer of material from the bulk to the adsorbed phase, then: 

$$\Delta F = \mu_{ads} - \mu_{bulk} = N\Delta E \qquad \text{(B1.26.5)}$$

where $\Delta E$ is the change in energy per adsorbed atom or molecule in going from the solid to the adsorbed 
phase (here $N$ is the number of molecules per mole, so that $R = Nk$). Combining equation (B1.26.3), equation (B1.26.4) and equation (B1.26.5) yields: 

$$\ln\frac{P}{P_0} = \frac{\Delta E}{kT}. \qquad \text{(B1.26.6)}$$
If we knew the variation in $\Delta E$ as a function of coverage $\theta$, this would be the equation for the isotherm. 
Typically the energy for physical adsorption in the first layer, $-\Delta E_1$, when adsorption is predominantly 
through van der Waals interactions, is of the order of $10kT$, where $T$ is the temperature and $k$ the Boltzmann 
constant, so that, according to equation (B1.26.6), the first layer condenses at a pressure given by $P/P_0 \sim 10^{-3}$. 
This accounts for the rapid initial rise in the isotherm for low values of $P/P_0$ shown in figure B1.26.2. In the 
case of chemisorption, where the interaction is even stronger, the first monolayer saturates at even lower 
values of $P/P_0$. It is initially assumed that adsorption into the second layer is dominated by van der Waals 
interactions with the sample. The attractive van der Waals energy $\varepsilon$ between two molecules in the gas-phase 
separated by a distance $r$ is $\varepsilon \propto -K_1/r^6$, where $K_1$ is a constant. When combined with an $r^{-12}$ repulsive 
potential, this yields the Lennard-Jones 6-12 equation. An adsorbed species interacts with atoms in the 
truncated bulk of the sample, and the number of these increases as $\sim r^3$ so that the net van der Waals 
interaction with the surface varies as $r^{-3}$. This indicates that the energy of adsorption in the second layer $\Delta E_2$, 
assuming this to be dominated by van der Waals interactions with the surface, is given by: $\Delta E_2 = \Delta E_1/2^3$. 
From equation (B1.26.6), this yields $\ln(P/P_0) \approx -1.3$ so that the pressure at which this layer is complete 
should be $P/P_0 \sim 0.3$, a value significantly higher than that required to saturate the first layer. Similarly, the 
energy required to saturate the third layer will be reduced by a factor $3^3$, the fourth layer $4^3$ and so on. 
Therefore, if a value of AE is assumed for the first layer, the pressures at which the second and subsequent 
layer saturate can be calculated from this value. Writing this in terms of the coverage yields a simple form 

for the variation of AE as a function of coverage as AEy0 3 . This yields an isotherm for physisorption 
dominated by van der Waals interactions with the surface as: 


$$\ln\frac{P}{P_0} = \frac{\Delta E_1}{kT\theta^3} \qquad \text{(B1.26.7)}$$


Such isotherms are shown in figure B1.26.4 for the physical adsorption of krypton and argon on graphitized
carbon black at 77 K [13] and are examples of type VI isotherms (figure B1.26.3). Equation (B1.26.7) further
predicts that a plot of $\ln(P/P_0)$ versus $1/\theta^3$ should be linear: such a plot is displayed in
figure B1.26.5 [13] for the adsorption of argon on graphitized carbon black at 77 K and yields a good straight line.
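
The layer-by-layer estimate above can be illustrated numerically. The following is a minimal sketch, assuming the isotherm has the form $\ln(P/P_0) = \Delta E_1/(kT\theta^3)$ and taking a first-layer energy of $-10kT$ (the value implied by the second-layer result $\ln(P/P_0) \approx -1.3$). The function names and the default energy are illustrative choices, not part of the original treatment.

```python
import math


def layer_completion_pressure(n_layer, first_layer_energy_in_kT=-10.0):
    """Relative pressure P/P0 at which layer n completes.

    Uses ln(P/P0) = dE_1 / (kT * n**3), with dE_1 expressed in units of kT
    (negative for an attractive interaction).
    """
    if n_layer < 1:
        raise ValueError("layer index must be >= 1")
    return math.exp(first_layer_energy_in_kT / n_layer**3)


def coverage_at_pressure(p_over_p0, first_layer_energy_in_kT=-10.0):
    """Invert the same relation to obtain a continuous coverage theta."""
    if not 0.0 < p_over_p0 < 1.0:
        raise ValueError("P/P0 must lie strictly between 0 and 1")
    return (first_layer_energy_in_kT / math.log(p_over_p0)) ** (1.0 / 3.0)


if __name__ == "__main__":
    for layer in (2, 3, 4):
        print(layer, round(layer_completion_pressure(layer), 3))
```

Because the energy falls off as $1/\theta^3$, successive layers complete at relative pressures that crowd rapidly towards unity, which is why the steps in a type VI isotherm become harder to resolve.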



Figure B1.26.2. Adsorption isotherm for nitrogen on anatase at 77 K showing 'point B' (Harkins W D 1952
The Physical Chemistry of Surface Films (New York: Reinhold)).

Figure B1.26.3. The IUPAC classification of adsorption isotherms for gas-solid equilibria (Sing K S W,
Everett D H, Haul R A W, Moscou L, Pierotti R A, Rouquérol J and Siemieniewska T 1985 Pure Appl. Chem.
57 603-19).

Figure B1.26.4. The adsorption of argon and krypton on graphitized carbon black at 77 K (Eggers D F Jr,
Gregory N W, Halsey G D Jr and Rabinovitch B S 1964 Physical Chemistry (New York: Wiley) ch 18).

Figure B1.26.5. Plot of $\ln(P/P_0)$ versus $1/\theta^3$ for argon on graphitized carbon black at 77 K (from the
argon data in figure B1.26.4) (Eggers D F Jr, Gregory N W, Halsey G D Jr and Rabinovitch B S 1964 Physical
Chemistry (New York: Wiley) ch 18).


It is clear, however, that not all isotherms display the stepwise behaviour shown in figure B1.26.4. For
example, the isotherm for the adsorption of nitrogen on anatase [8] (figure B1.26.2) has a rapid increase in
coverage at low pressures, corresponding to the saturation of the first monolayer, but varies much more
smoothly with coverage at higher pressures. This effect is also seen in the data for argon on carbon black [13]
(figure B1.26.4), where the steps become much less pronounced for multilayer adsorption. Part of the reason
for this discrepancy is that, as the layer becomes thicker, intermolecular van der Waals interactions within the
layer become more important. This has two effects. First, the difference in energy between layers becomes
less pronounced, leading to a smoothing out of the curve. In addition, this decrease in energy difference for
each layer means that subsequent layers start to grow even before previous monolayers have saturated.
Finally, in the case of high-surface-area samples, the surface tends to become more heterogeneous, leading to
a further smoothing out of the steps. The limiting case is an isotherm calculated on a different basis to that
used for equation (B1.26.7). Here it is assumed that adsorption in the first layer is dominated by surface van
der Waals interactions, but that adsorption into second and subsequent layers is dominated by intermolecular
interactions between adsorbates. This clearly no longer results in a strong variation in $\Delta E$ with each layer
and allows multilayer films to be formed in which another layer can start before the previous layer has been
completed. This smooths out the isotherm, resulting in a variation of coverage with $P$ that more closely
resembles that shown in figure B1.26.2. Such an isotherm has the advantage that it often more closely mimics
the behaviour of nitrogen physisorbed on high-surface-area materials and, as such, is more useful in
reproducibly identifying 'point B'. This forms the basis of a reproducible method for measuring surface areas
using the BET isotherm. The calculation of the BET isotherm assumes that:


1. There are B equivalent sites available for adsorption in the first layer.

2. Each molecule adsorbed in the first layer is considered to be a possible adsorption site for molecules 
adsorbing into a second layer, and each molecule adsorbed in the second layer is considered to be a 
'site' for adsorption into the third layer, and so on. 

3. All molecules in the second and subsequent layers are assumed to behave similarly to a liquid, in 
particular to have the same partition function. This is assumed to be different to the partition function 
(A2.2) of molecules adsorbed into the first layer. 

4. Intermolecular interactions are ignored for all layers. 

Detailed derivations of the isotherm can be found in many textbooks and exploit either statistical
thermodynamic methods [1] or independently consider the kinetics of adsorption and desorption in each layer and
set these equal to define the equilibrium coverage as a function of pressure [14]. The most common form of BET
isotherm is written as a linear equation and given by:


$$\frac{P}{V(P_0 - P)} = \frac{1}{V_m C} + \frac{(C - 1)P}{V_m C P_0} \qquad \text{(B1.26.8)}$$

Here $V_m$ is the volume of gas required to saturate the monolayer, $V$ the total volume of gas adsorbed, $P$ the
sample pressure, $P_0$ the saturation vapour pressure and $C$ a constant related to the enthalpy of adsorption. The
resulting shape of the isotherm is shown plotted in figure B1.26.6 for $C = 500$. A plot of $P/[V(P_0 - P)]$ against
$P/P_0$ should give a straight line having a slope $(C - 1)/V_m C$ and an intercept $1/V_m C$. The BET surface area
is then calculated using the following equation:

$$S_A = V_m N A_m \qquad \text{(B1.26.9)}$$

where $S_A$ is the required surface area of the sample, $V_m$ the volume of the adsorbed monolayer, $N$ Avogadro's
number and $A_m$ the cross-sectional area of the adsorbed molecule. In the BET method, where nitrogen is
generally used, the value of $A_m$ is taken to be 16.2 Å$^2$ per nitrogen molecule. Classically, this BET equation
is used only for systems that exhibit type II and IV isotherms.
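
The linear BET analysis described above can be sketched in a few lines. The example below is a hedged illustration only: it assumes the linear form $P/[V(P_0 - P)] = 1/(V_m C) + (C - 1)P/(V_m C P_0)$, generates synthetic data from that same equation, and recovers $V_m$ and $C$ from the slope and intercept. The conversion of $V_m$ to a number of molecules assumes that $V_m$ is expressed in cm$^3$ (STP) per gram and uses the ideal-gas molar volume of 22 414 cm$^3$ mol$^{-1}$; that unit convention and all function names are assumptions introduced here.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
MOLAR_VOLUME_STP_CM3 = 22414.0  # cm^3 per mole of ideal gas at STP (assumed unit basis)
N2_AREA_M2 = 16.2e-20           # 16.2 square angstroms per N2 molecule, in m^2


def bet_volume(x, v_m, c):
    """Volume adsorbed at relative pressure x = P/P0 for the BET isotherm."""
    return v_m * c * x / ((1.0 - x) * (1.0 + (c - 1.0) * x))


def fit_bet(xs, volumes):
    """Least-squares line through y = x / (V (1 - x)) versus x; returns (v_m, c)."""
    ys = [x / (v * (1.0 - x)) for x, v in zip(xs, volumes)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # slope = (C - 1)/(V_m C) and intercept = 1/(V_m C)
    v_m = 1.0 / (slope + intercept)
    c = slope / intercept + 1.0
    return v_m, c


def bet_surface_area(v_m_cm3_stp_per_g, area_per_molecule_m2=N2_AREA_M2):
    """Specific surface area in m^2 per gram from the monolayer capacity."""
    moles = v_m_cm3_stp_per_g / MOLAR_VOLUME_STP_CM3
    return moles * AVOGADRO * area_per_molecule_m2


if __name__ == "__main__":
    xs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
    data = [bet_volume(x, v_m=10.0, c=500.0) for x in xs]
    v_m, c = fit_bet(xs, data)
    print(v_m, c, bet_surface_area(v_m))
```

With real data the fit is normally restricted to the low-pressure part of the isotherm, since the linear form is only expected to hold there.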



Figure B1.26.6. BET isotherm plotted from equation (B1.26.8) using a value of $C = 500$.

(C) MEASUREMENT OF BET ISOTHERMS 

Practically, using the BET method to measure surface area involves three steps: (1) obtaining a full adsorption
isotherm, (2) evaluating the monolayer capacity and (3) calculating the surface area using equation
(B1.26.9). It should be emphasized that the BET surface area represents a 'standard' method for measuring
the area of a high-surface-area material that allows samples from different laboratories to be compared. Note
that, due to the simplifying assumptions of the derivation, the BET method does not work well for type III and
V isotherms, where the weak interaction between gas and solid makes it hard to discern the formation of the
first layer. Attempts to modify the BET equation to take account of these situations have proven unreliable
and impractical. Beyond the BET method, approaches such as the Gibbs adsorption isotherm [15], immersion
calorimetry [16] or adsorption from solution [17] have been used, but the BET method continues as a standard
procedure for the determination of surface areas [18]. Generally the measured value of the BET area can be
regarded as an effective area unless the material is ultramicroporous. In the case of porous materials, it is
important to know the pore sizes and their distributions. These can be calculated for type IV adsorbents, where
the mesopore size distribution can be obtained using the Kelvin equation:


$$\ln\frac{P}{P_0} = -\frac{2\gamma V}{rRT} \qquad \text{(B1.26.10)}$$


This equation describes the additional amount of gas adsorbed into the pores due to capillary action. In this
case, $V$ is the molar volume of the gas, $\gamma$ its surface tension, $R$ the gas constant, $T$ absolute temperature
and $r$ the Kelvin radius. The distribution in the sizes of micropores may be determined using the
Horvath-Kawazoe method [19]. If the sample has both micropores and mesopores, then the T-plot calculation may be
used [20]. The T-plot is obtained by plotting the volume adsorbed against the statistical thickness of
adsorbate. This thickness is derived from the surface area of a non-porous sample, and the volume of the
liquefied gas.
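
As an illustration of the Kelvin equation, the following sketch evaluates the Kelvin radius for nitrogen at 77 K, assuming the common form $\ln(P/P_0) = -2\gamma V/(rRT)$. The surface tension (8.85 mN m$^{-1}$) and molar volume of the condensed adsorbate (34.7 cm$^3$ mol$^{-1}$) are typical literature values for liquid nitrogen and are assumptions of this example, as are the function and parameter names.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def kelvin_radius(p_over_p0, surface_tension=8.85e-3, molar_volume=34.7e-6,
                  temperature=77.0):
    """Kelvin radius r in metres from ln(P/P0) = -2*gamma*V / (r*R*T).

    surface_tension is in N/m and molar_volume in m^3/mol; the defaults are
    assumed values for liquid nitrogen near 77 K.
    """
    if not 0.0 < p_over_p0 < 1.0:
        raise ValueError("P/P0 must lie strictly between 0 and 1")
    return (-2.0 * surface_tension * molar_volume
            / (GAS_CONSTANT * temperature * math.log(p_over_p0)))


if __name__ == "__main__":
    for x in (0.3, 0.5, 0.8, 0.95):
        print(x, kelvin_radius(x) * 1e9, "nm")
```

The radius grows without limit as $P/P_0$ approaches unity, so that the larger mesopores fill only close to the saturation vapour pressure.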

B1.26.2.2 INSTRUMENTATION


Two parameters must be measured to apply the BET equation, the pressure at the sample and the amount 
adsorbed at this pressure. There are three common methods for measuring the amount of gas adsorbed, called 
the volumetric method, the gravimetric method and the dynamic method, of which the volumetric method is 
the commonest [21]. 

(A) VOLUMETRIC METHOD 

This method essentially consists of admitting successive charges of gas (generally nitrogen) to the adsorbent 
using some form of volumetric measuring device such as a gas burette or pipette with the sample held at 
liquid nitrogen temperature (77 K). Nowadays, the amount of gas admitted to the sample can most 
conveniently be measured using a mass-flow controller. When equilibrium has been attained, the gas pressure 
in the dead space surrounding the sample is read using a manometer, and the quantity of gas remaining 
unadsorbed is then calculated with the aid of the gas law, assuming a perfect gas. The volume of the dead 
space of the apparatus must, of course, be accurately calibrated. A precision manometer should be employed. 
The quantity of gas adsorbed onto the surface can be calculated by subtracting the amount remaining 
unadsorbed from the total amount which has been admitted. This type of apparatus can be simply constructed 
from glass. Alternatively, commercial BET measuring apparatuses are also available using computers to
collect the data and to calculate the resulting surface area [22]. Shown in figure B1.26.7 is a schematic
diagram of a typical apparatus for measuring BET isotherms [23]. Helium, which does not adsorb at liquid
nitrogen temperatures, is used to calibrate the volume of the dead space and the furnace is used to outgas the
sample prior to gas adsorption. A mass-flow controller is often used instead of a burette or pipette to measure
the total amount of nitrogen that has been admitted.
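
The bookkeeping of the volumetric method can be sketched as follows. This is a simplified illustration that assumes a perfect gas and a dead space at a single uniform temperature (a real apparatus has warm and cold zones that are calibrated separately); the function and parameter names are hypothetical.

```python
GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def moles_of_gas(pressure_pa, volume_m3, temperature_k):
    """Amount of perfect gas occupying a given volume, from PV = nRT."""
    return pressure_pa * volume_m3 / (GAS_CONSTANT * temperature_k)


def amount_adsorbed(moles_admitted, equilibrium_pressure_pa, dead_volume_m3,
                    dead_space_temperature_k):
    """Moles adsorbed = total admitted minus what remains in the dead space."""
    unadsorbed = moles_of_gas(equilibrium_pressure_pa, dead_volume_m3,
                              dead_space_temperature_k)
    return moles_admitted - unadsorbed


if __name__ == "__main__":
    # Hypothetical dose: 1.0e-4 mol admitted, 2 kPa read on the manometer,
    # 20 cm^3 dead space taken to be at 295 K.
    print(amount_adsorbed(1.0e-4, 2.0e3, 20.0e-6, 295.0))
```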

Figure B1.26.7. Diagram of a BET apparatus (representing an OMNISORB 100) (Beckman Coulter 1991
OMNISORP Manual).

(B) GRAVIMETRIC METHOD

This method is simple but experimentally more cumbersome than the volumetric method and involves the use 
of a vacuum microbalance or beam balance [22]. The solid is suspended from one arm of a balance and its 
increase in weight when adsorption occurs is measured directly. The 'dead space' calculation is thereby 
avoided entirely but a buoyancy correction is required to obtain accurate data. Nowadays this method is rarely 
used. 

(C) DYNAMIC METHOD 

This method has been developed using gas chromatographic techniques. The most popular way of 
implementing this method is by using a continuous nitrogen flow as first described by Nelsen and Eggertsen 
[24]. A known mixture of nitrogen and helium is passed through a bed of solid sample at ambient temperature
(~300 K) and the exit gas is measured using a gas chromatographic detector. When the gas composition
equilibrates, as indicated by a constant baseline on the GC recorder chart, the sample tube is immersed in a
liquid nitrogen bath. The adsorption of nitrogen by the solid is then indicated by a negative excursion on the
recorder chart corresponding to a loss of nitrogen from the system due to adsorption. After equilibrium is
established at the particular partial pressure with the sample held at 77 K, the baseline attains its original level.
The sample tube is then allowed to warm up to room temperature (300 K) by removal of the liquid nitrogen
bath so that a positive excursion appears due to nitrogen desorption (see figure B1.26.8) [24]. The areas under
these two curves should be equal and constitute a measure of the amount of nitrogen adsorbed. This method 
has drawn considerable interest due to its simplicity and speed and since it does not require a vacuum system. 
One of the main problems is that of deciding on the most appropriate conditions for 'outgassing' which can 
considerably affect the precision. 

Figure B1.26.8. Adsorption/desorption peaks for nitrogen obtained with the continuous flow method (Nelsen
F M and Eggertsen F T 1958 Anal. Chem. 30 1387-90).

B1.26.2.3 EXPERIMENTAL NOTES


Nitrogen is the most widely used adsorbate (at 77 K) for the BET method and has been employed almost
universally. Argon is more suited to the measurement of microporous zeolites. Krypton may be used for the
measurement of very low-surface-area (less than 3 m$^2$ g$^{-1}$) samples because it has a very low saturation
vapour pressure (~2.45 Torr) at liquid nitrogen temperature, so that the absolute pressure range is from 0 to
approximately 0.75 Torr for a ratio of $P/P_0$ from 0 to 0.3. The absolute pressure range used for the
measurement depends upon the type of data reduction required and the adsorbent's properties. The optimum
pressure range needed for BET surface area determinations may be taken to be for $P/P_0$ up to 0.3 [23].

B1.26.3 ELLIPSOMETRY

B1.26.3.1 PRINCIPLES

The term ellipsometry was first coined by Rothen in 1945 [25] to refer to the measurement of thin films of
materials by monitoring the light reflected from them at some incident angle $\theta$. The method is illustrated in
figure B1.26.9. The technique was used extensively before this [26]. In the simplest case, the film is
transparent with refractive index $n$, such that light can be reflected or transmitted at the first interface. The
transmitted beam propagates through the material and is reflected from the substrate. The reflected portion of
the beam can then either pass through the film/air interface or be reflected once again into the film and
undergo further reflections. The multiple beams are eventually emitted together so that, in general, they are
attenuated, and one of the parameters that can most simply be measured is the reflectivity of the film. In
addition, owing to the phase shifts that arise from path length differences as the beam passes through
the film, there is also a change of the phase of this beam which depends on the film thickness $d$ and the
wavelength of the light, $\lambda$. The way in which this phase change can be measured will be described below.
Each of these values, that is, the reflectivity and the phase shift, depends on the polarization of the radiation
(see below). When the light is polarized parallel to the surface it is said to be s polarized and when it is
polarized perpendicularly to the surface, p polarized. The corresponding reflectivities are denoted $R_s$ and $R_p$
which, since there is a phase change on reflection, are generally complex numbers. The ratio of these values is
defined as $\rho$, which is given by:

$$\rho = \frac{R_p}{R_s} \qquad \text{(B1.26.11)}$$

Since this is a complex number, it can be separated into an amplitude and a phase and written as:

$$\rho = \frac{|R_p|}{|R_s|}\exp(i\Delta) \qquad \text{(B1.26.12)}$$

where $\Delta$ is the difference between the phase shifts for p- and s-polarized light: $\Delta = \delta_p - \delta_s$.
By convention, the ratio of the moduli of the reflectivities of the p- and s-polarized light, $|R_p|/|R_s|$, is
written in terms of another parameter $\Psi$, where $\tan\Psi = |R_p|/|R_s|$. Thus, the parameters that are measured
in ellipsometry are $\Psi$ and $\Delta$, which can be related to the refractive index and thickness of the film.
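
The relationship between the complex reflectivities and the two measured ellipsometric angles can be made concrete with a short sketch. It assumes the definitions $\rho = R_p/R_s = \tan\Psi\,\exp(i\Delta)$ given above; the function name is illustrative.

```python
import cmath
import math


def psi_delta(r_p, r_s):
    """Return (Psi, Delta) in degrees from complex reflectivities R_p and R_s."""
    rho = r_p / r_s
    psi = math.degrees(math.atan(abs(rho)))   # tan(Psi) = |R_p| / |R_s|
    delta = math.degrees(cmath.phase(rho))    # Delta = phase of R_p relative to R_s
    return psi, delta


if __name__ == "__main__":
    # Equal moduli with R_p leading R_s by a quarter cycle.
    print(psi_delta(0.5j, 0.5))
```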


Figure B1.26.9. Schematic diagram showing the reflection of light incident at an angle from a medium
with refractive index $n_1$ through a film of thickness $d$ with refractive index $n_2$.

(A) THE NATURE OF ELECTROMAGNETIC RADIATION 

As shown by Maxwell, light consists of an oscillating electromagnetic field propagating in vacuum at the
speed of light [27, 28, 29, 30, 31, 32 and 33]. Both the electric and magnetic fields are oriented
perpendicularly to the direction of propagation of the light and perpendicularly to each other. By convention,
the direction in which the electric field points defines the polarization direction so that plane-polarized light
has an electric field component only in one direction. A common way to obtain plane-polarized light is to use
a polarizer which only transmits light of one polarization and absorbs the other. Polarized spectacles have
lenses made from such polarizing material. Maxwell was also able to demonstrate that the velocity $v$ with
which light propagates in a material is given by $v = 1/\sqrt{\varepsilon\varepsilon_0\mu\mu_0}$, where $\varepsilon$ is the
(relative) permittivity of the material, $\varepsilon_0$ that of vacuum, $\mu$ the (relative) permeability of the
material and $\mu_0$ the value in vacuum. When light propagates in vacuum, this reduces to
$v = 1/\sqrt{\mu_0\varepsilon_0}$ and yields the speed of light, $c$. This correspondence provided
confirmation that light and electromagnetic radiation were one and the same. The refractive index of a
material, $n$, is defined as $n = c/v$, and is therefore given by $n = \sqrt{\mu\varepsilon}$. Since most materials under
investigation are not magnetic, $\mu = 1$, and the equation for refractive index simplifies to $n = \sqrt{\varepsilon}$.

As electromagnetic radiation propagates through space, the electric field converts into a magnetic field during
the oscillation cycle. Thus, the energy present in the electric field converts to energy in a magnetic field. The
magnetic field oscillates at the same frequency but 90° out of phase with the electric field so that, when the
electric field is a maximum, the magnetic field is a minimum, and vice versa. This idea allows us to calculate
the relative electric and magnetic field amplitudes in an electromagnetic wave. Let the electric field amplitude
be $E_0$ and the corresponding magnetic field amplitude $H_0$. The energy in a magnetic field is proportional to
$\mu H_0^2$ and that in an electric field is proportional to $\varepsilon E_0^2$. Since the electric and magnetic fields
interconvert in an electromagnetic wave, conservation of energy requires that these be equal, so that
$H_0 = \sqrt{\varepsilon/\mu}\,E_0$. Again, assuming that $\mu = 1$ for a non-magnetic material and using the equation for
the refractive index above yields $H_0 = nE_0$.

Non-polarized electromagnetic radiation, of course, comprises two perpendicular polarizations, which can
change both in amplitude and in phase with respect to each other. If the two polarizations are in phase with
each other, the resultant is just another linearly polarized beam, with the resultant polarization direction given
by a simple vector addition of the two electric field components. When the electric fields of the two
polarizations are out of phase with each other, the resulting electric field precesses as it propagates through
space. For example, if the two electric fields are of equal magnitude but 90° out of phase with each other, the
electric field spirals as it moves through space. Depending on the relative phases, this rotation can be either
clockwise or anti-clockwise. In this case, if we were able to look end on at the electric field vector, this would
describe a circle and is therefore referred to as circularly polarized light. This particular combination of
electric fields also carries with it angular momentum, and is responsible for the angular momentum selection
rules in spectroscopy; this is really just the law of conservation of angular momentum. Different relative
phases or amplitudes generally lead to elliptically polarized light, so that the phase shift $\Delta$ between the
reflected and transmitted light measured in ellipsometry is manifested as elliptically polarized light. The
different types of polarized light found as the phase shifts between 0 and 360° are shown in figure B1.26.10.

Figure B1.26.10. Various polarization configurations corresponding to different values of the phase shift.

Light can also be absorbed by a material through which it passes. This leads to an attenuation in intensity of
the light as it passes through the material, which decays exponentially as a function of distance through the
material and is described mathematically by the Beer-Lambert law [34]:

$$I = I_0\exp(-\varepsilon c l) \qquad \text{(B1.26.13)}$$

where $c$ is the concentration of the absorbant, $l$ the path length and $\varepsilon$ the extinction coefficient. This is
represented in the mathematical description of the propagation of an electromagnetic wave by adding an
imaginary part, $k$, to the refractive index, which is generally written as $n - ik$, where $i = \sqrt{-1}$.

(B) REFLECTION OF LIGHT FROM A DIELECTRIC SURFACE 

Before discussing multiple reflections at a thin film, we will first examine the reflection of light incident from
one material (with real refractive index $n_1$) onto another (with refractive index $n_2$) [27, 32, 33]. It is assumed
that the material does not absorb light and this situation is depicted in figure B1.26.11. Since electromagnetic
waves propagate in both materials, the behaviour at the interface is dictated by the boundary conditions for
electric and magnetic fields. The components of $E$ and $H$ parallel to the surface (in the $x$ direction in figure
B1.26.11) and the components of $D$ and $B$ perpendicular to the surface (in the $z$ direction in figure B1.26.11)
are continuous across the boundary. Thus the reflected and transmitted wave amplitudes can be calculated by
simply applying these conditions. We will consider the reflection of p-polarized radiation from the surface. The
calculation of the equations for s-polarized radiation is essentially identical. The beam is incident at an angle
$\theta_i$, and a portion is reflected at an angle $\theta_r$. Of course, $\theta_i$ and $\theta_r$ are equal. The remaining light is
transmitted at an angle $\theta_t$, which will be different to $\theta_i$ due to refraction at the surface. The electric field
amplitudes are taken to be $E_i^0$ in the incident beam, $E_r^0$ in the reflected beam and $E_t^0$ in the transmitted
beam. The corresponding values for magnetic field amplitudes are $H_i^0$, $H_r^0$ and $H_t^0$ respectively.
Application of the above boundary conditions gives:

$$E_i^0\cos\theta_i - E_r^0\cos\theta_r = E_t^0\cos\theta_t \qquad \text{(B1.26.14)}$$

$$H_i^0 + H_r^0 = H_t^0 \qquad \text{(B1.26.15)}$$

Using the above relationship between the electric and magnetic field amplitudes ($H_0 = nE_0$), equation
(B1.26.15) becomes:

$$n_1E_i^0 + n_1E_r^0 = n_2E_t^0 \qquad \text{(B1.26.16)}$$

In order to calculate the reflected amplitude, $E_t^0$ can be eliminated from equation (B1.26.14) and equation
(B1.26.16) to yield:

$$\frac{(E_i^0 - E_r^0)\cos\theta_i}{\cos\theta_t} = \frac{n_1(E_i^0 + E_r^0)}{n_2} \qquad \text{(B1.26.17)}$$

where we have used $\theta_i = \theta_r$. Writing the reflection coefficient for p-polarized radiation as $r_p$ (which
equals $E_r^0/E_i^0$) yields:

$$r_p = \frac{n_2\cos\theta_i - n_1\cos\theta_t}{n_2\cos\theta_i + n_1\cos\theta_t} \qquad \text{(B1.26.18)}$$

which is the Fresnel equation for p-polarized radiation. A similar analysis can be carried out for s-polarized
radiation using the boundary conditions in a similar way to yield:

$$r_s = \frac{n_1\cos\theta_i - n_2\cos\theta_t}{n_1\cos\theta_i + n_2\cos\theta_t} \qquad \text{(B1.26.19)}$$

Corresponding equations can also be written for the transmitted portion of the beam. For a non-absorbing
sample, the refractive index is real, so that $\exp(i\phi)$ is real, where $\phi$ is the phase shift on reflection. This
means that $\phi$ is either 0 or 180°. If, however, the sample absorbs light, the refractive index can become complex,
resulting in a phase change of the light. This, for example, must be taken into account when reflecting light
from a metallic (mirror) surface.

The reflection coefficients $r_p$ and $r_s$ give the electric field in the reflected beam for each polarization. Since
the intensity of light is proportional to the square of the electric field, the reflectances for p- and s-polarized
light can be written as $\mathcal{R}_p = |r_p|^2$ and $\mathcal{R}_s = |r_s|^2$, respectively. These are plotted in figure B1.26.12
for a light beam incident from air ($n = 1$) onto a material with refractive index $n = 3$. It is evident that
p-polarized radiation is reflected to a much lesser extent than s-polarized radiation, and exclusively s-polarized
radiation is reflected at the polarizing angle. This effect is exploited in polarized sunglasses (as mentioned
above) to minimize the reflective glare from surfaces by only allowing p-polarized light to be transmitted. At
normal incidence, these equations reduce to:

$$\mathcal{R}_s = \mathcal{R}_p = \left(\frac{n_2 - n_1}{n_2 + n_1}\right)^2$$

which for glass with $n \sim 1.5$ gives about 4% reflectivity. The polarizing angle shown in figure B1.26.12 is
given by:

$$\tan\theta_i = \frac{n_2}{n_1}$$

and is known as the Brewster angle. p-Polarized radiation is perfectly transmitted at this angle and Brewster
windows (oriented at this angle) are used in lasers to minimize the loss of radiation in the laser cavity. This
often results in laser light being polarized. This effect is exploited in polarimeters (see below). Note finally
that the phase shift on reflection from a dielectric is 0° below the Brewster angle and 180° above.
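
The single-interface results above are easily checked numerically. The sketch below assumes the standard Fresnel forms $r_p = (n_2\cos\theta_i - n_1\cos\theta_t)/(n_2\cos\theta_i + n_1\cos\theta_t)$ and $r_s = (n_1\cos\theta_i - n_2\cos\theta_t)/(n_1\cos\theta_i + n_2\cos\theta_t)$ for real refractive indices; the function names are illustrative and total internal reflection is deliberately not handled.

```python
import math


def fresnel_coefficients(n1, n2, theta_i_deg):
    """Return (r_p, r_s) for light passing from medium 1 to medium 2 (real indices)."""
    theta_i = math.radians(theta_i_deg)
    sin_t = n1 * math.sin(theta_i) / n2       # Snell's law
    if abs(sin_t) > 1.0:
        raise ValueError("total internal reflection is not handled in this sketch")
    cos_i = math.cos(theta_i)
    cos_t = math.sqrt(1.0 - sin_t * sin_t)
    r_p = (n2 * cos_i - n1 * cos_t) / (n2 * cos_i + n1 * cos_t)
    r_s = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)
    return r_p, r_s


def reflectances(n1, n2, theta_i_deg):
    """Return intensity reflectances (|r_p|^2, |r_s|^2)."""
    r_p, r_s = fresnel_coefficients(n1, n2, theta_i_deg)
    return r_p * r_p, r_s * r_s


def brewster_angle_deg(n1, n2):
    """Polarizing (Brewster) angle from tan(theta) = n2 / n1."""
    return math.degrees(math.atan2(n2, n1))


if __name__ == "__main__":
    print(reflectances(1.0, 1.5, 0.0))        # glass at normal incidence
    angle = brewster_angle_deg(1.0, 3.0)
    print(angle, reflectances(1.0, 3.0, angle))
```

Scanning `theta_i_deg` from 0 to 90° with `n2 = 3` gives curves of the kind described for the s- and p-polarized reflectances, with the p-polarized curve passing through zero at the Brewster angle.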

Figure B1.26.11. Diagram showing light impinging from a material of refractive index $n_1$ at an angle $\theta_i$
onto a material with refractive index $n_2$ and reflected at an angle $\theta_r$ and transmitted at an angle $\theta_t$.

Figure B1.26.12. Plot of the reflectivity of s- and p-polarized light from a material with refractive index $n = 3$.

(C) REFLECTION AT MULTIPLE INTERFACES

We are now in a position to calculate the reflections from multiple interfaces using the simple example of a
thin film of material of thickness $d$ with refractive index $n_2$ sandwiched between a material of refractive index
$n_1$ (generally air, with $n_1 = 1$) and a substrate of refractive index $n_3$ [35, 36]. This is
depicted in figure B1.26.9. The resulting reflectivities for p- and s-polarized light respectively are given by:

$$R_p = \frac{r_{12}^p + r_{23}^p\exp(-i2\beta)}{1 + r_{12}^p r_{23}^p\exp(-i2\beta)} \qquad \text{(B1.26.20)}$$

and

$$R_s = \frac{r_{12}^s + r_{23}^s\exp(-i2\beta)}{1 + r_{12}^s r_{23}^s\exp(-i2\beta)} \qquad \text{(B1.26.21)}$$

where $r_{ij}^s$ and $r_{ij}^p$ are the reflection coefficients for s- and p-polarized radiation, respectively, at the
interface between materials $i$ and $j$ (equation (B1.26.18) and equation (B1.26.19)). The path length difference
due to the film results in phase differences between different emerging beams, giving rise to complex reflection
coefficients and hence phase shifts. As noted above, this produces elliptically polarized light which can be
analysed to yield the amplitude and phase shift and ultimately $\rho$. The phase parameter $\beta$ depends on the film
thickness and the wavelength of light and is given by:

$$\beta = 2\pi\left(\frac{d}{\lambda}\right)n_2\cos\theta_t \qquad \text{(B1.26.22)}$$

These equations are generally too complex to be solved analytically even for relatively simple systems and
are therefore solved numerically. The way in which this is done will be described below. The measurement of
$\Delta$ and $\Psi$ clearly depends on the wavelength of light directly through this equation. However, both the real
and imaginary parts of the refractive indices also depend on the wavelength of light. Now the reflectivity of the
surface for p- and s-polarized light is $\mathcal{R}_p = |R_p|^2$ and $\mathcal{R}_s = |R_s|^2$ respectively. This dependence is
also exploited in ellipsometry by measuring $\Delta$ and $\Psi$ as a function of light wavelength in a technique known
as spectroscopic ellipsometry [37].
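
A forward calculation of $\Psi$ and $\Delta$ for a single film on a substrate can be sketched as follows. The example assumes the usual three-phase forms $R = (r_{12} + r_{23}e^{-i2\beta})/(1 + r_{12}r_{23}e^{-i2\beta})$ for each polarization, with $\beta = 2\pi(d/\lambda)n_2\cos\theta_t$, and refractive indices written as $n - ik$. The function names are illustrative, and the numerical values in the usage lines are those quoted in this section for silicon dioxide on silicon.

```python
import cmath
import math


def cos_in_medium(n_ambient, sin_ambient, n_medium):
    """Complex cosine of the propagation angle in a medium, via Snell's law."""
    sin_medium = n_ambient * sin_ambient / n_medium
    return cmath.sqrt(1.0 - sin_medium * sin_medium)


def interface_coefficients(n_a, cos_a, n_b, cos_b):
    """Fresnel coefficients (r_p, r_s) for light going from medium a into medium b."""
    r_p = (n_b * cos_a - n_a * cos_b) / (n_b * cos_a + n_a * cos_b)
    r_s = (n_a * cos_a - n_b * cos_b) / (n_a * cos_a + n_b * cos_b)
    return r_p, r_s


def film_rho(n1, n2, n3, thickness, wavelength, theta_i_deg):
    """Complex ratio rho = R_p / R_s for ambient (1) / film (2) / substrate (3)."""
    sin_1 = math.sin(math.radians(theta_i_deg))
    cos_1 = math.cos(math.radians(theta_i_deg))
    cos_2 = cos_in_medium(n1, sin_1, n2)
    cos_3 = cos_in_medium(n1, sin_1, n3)
    r12_p, r12_s = interface_coefficients(n1, cos_1, n2, cos_2)
    r23_p, r23_s = interface_coefficients(n2, cos_2, n3, cos_3)
    beta = 2.0 * math.pi * (thickness / wavelength) * n2 * cos_2
    phase = cmath.exp(-2j * beta)
    total_p = (r12_p + r23_p * phase) / (1.0 + r12_p * r23_p * phase)
    total_s = (r12_s + r23_s * phase) / (1.0 + r12_s * r23_s * phase)
    return total_p / total_s


def psi_delta_degrees(rho):
    """Convert rho = tan(Psi) exp(i Delta) to (Psi, Delta) in degrees."""
    psi = math.degrees(math.atan(abs(rho)))
    delta = math.degrees(cmath.phase(rho)) % 360.0
    return psi, delta


if __name__ == "__main__":
    n_si = 3.872 - 0.037j
    for d in (0.0, 250.0, 500.0, 1000.0):      # thickness in angstroms
        rho = film_rho(1.0, 1.46, n_si, d, 6328.0, 70.0)
        print(d, psi_delta_degrees(rho))
```

Stepping the thickness through a range of values in this way traces out a trajectory in the $\Delta$-$\Psi$ plane; fitting measured values then amounts to searching for the thickness (and, where unknown, the film index) that reproduces them.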

B1.26.3.2 APPLICATIONS

(A) MEASUREMENT OF THE OPTICAL CONSTANTS OF MATERIALS USING ELLIPSOMETRY

In this case, the Fresnel equations (equation (B1.26.18) and equation (B1.26.19)) for reflection at a single
interface are used. The phase shift is zero or 180° for a dielectric with a real refractive index, which can be
measured directly from the reflectivities (figure B1.26.12). Intermediate phase shifts are found for absorbant
materials with complex refractive indices, which can also be measured from $\Delta$ and $\Psi$. This is generally done by
numerically solving the Fresnel equations (equation (B1.26.18) and equation (B1.26.19)). Many ellipsometers
include software to calculate these values directly and Fortran programs are also available for this purpose.
The variation in $\Psi$ plotted as a function of the imaginary part of the refractive index, $K$, is shown in figure
B1.26.13. This varies between 0 and 45° over the range of $K$ values. The variation in $\Delta$ as a function of $K$ with
$n = 2$ and an incidence angle of 70° is displayed in figure B1.26.14. Again, the value of $\Delta$ varies as the
imaginary part of the refractive index changes and can vary between 0 and 180°. It is important to emphasize
that, unless a material is extremely pure, its optical constants can vary so that it is generally important to
measure these parameters for a particular sample.
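
For the special case of a single bare interface, the Fresnel equations can also be inverted in closed form, which provides a useful check on a numerical solution. The sketch below uses the standard two-phase result $n^2 = n_1^2\sin^2\theta_i\,[1 + \tan^2\theta_i\,((1 - \rho)/(1 + \rho))^2]$; this closed form is not the numerical procedure referred to in the text and is offered here only as an illustration under the sign conventions $\rho = r_p/r_s$ and refractive index $n - ik$. All function names are hypothetical.

```python
import cmath
import math


def rho_from_psi_delta(psi_deg, delta_deg):
    """Build rho = tan(Psi) exp(i Delta) from the measured angles in degrees."""
    return math.tan(math.radians(psi_deg)) * cmath.exp(1j * math.radians(delta_deg))


def rho_bare_surface(n_sample, theta_i_deg, n_ambient=1.0):
    """Forward calculation of rho = r_p / r_s for a single interface."""
    theta = math.radians(theta_i_deg)
    cos_i = math.cos(theta)
    sin_t = n_ambient * math.sin(theta) / n_sample
    cos_t = cmath.sqrt(1.0 - sin_t * sin_t)
    r_p = (n_sample * cos_i - n_ambient * cos_t) / (n_sample * cos_i + n_ambient * cos_t)
    r_s = (n_ambient * cos_i - n_sample * cos_t) / (n_ambient * cos_i + n_sample * cos_t)
    return r_p / r_s


def refractive_index_from_rho(rho, theta_i_deg, n_ambient=1.0):
    """Closed-form two-phase inversion; returns the complex index n - ik."""
    theta = math.radians(theta_i_deg)
    ratio = (1.0 - rho) / (1.0 + rho)
    epsilon = (n_ambient * math.sin(theta)) ** 2 * (1.0 + math.tan(theta) ** 2 * ratio * ratio)
    return cmath.sqrt(epsilon)   # principal root has a non-negative real part


if __name__ == "__main__":
    n_true = 2.0 - 0.5j
    rho = rho_bare_surface(n_true, 70.0)
    print(refractive_index_from_rho(rho, 70.0))
```

The inversion applies only when no overlayer is present; an oxide or contamination layer on the surface means that the value obtained is an effective, rather than a true, refractive index.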

Figure B1.26.13. Plot of $\Psi$ versus $K$, the imaginary part of the refractive index.

Figure B1.26.14. Plot of $\Delta$ versus $K$, the imaginary part of the refractive index.


(B) MEASUREMENT OF FILM THICKNESS AND OPTICAL PROPERTIES 


The way in which ellipsometry can be used to measure film thickness will be illustrated for the simple case of
a material with thickness $d$ and complex refractive index $n_2$ deposited onto a substrate with complex refractive
index $n_3$, with light incident from a material of refractive index $n_1$. Since light is generally incident from air,
$n_1$ is usually taken to be unity. This is done by measuring $\Delta$ and $\Psi$ and using equation (B1.26.20) and
equation (B1.26.21) to calculate the optical properties of the film and its thickness. Since, in addition to the
film thickness, there are potentially six other variables in the system (the real and imaginary parts of each of
the refractive indices) whereas only two parameters, $\Delta$ and $\Psi$, are measured, several of these must be
determined independently. The substrate parameters can, for example, be measured prior to film deposition as
described in the previous section and the refractive index of air is known. The variation in $\Delta$ and $\Psi$ with the
thickness $d$ of a film of silicon dioxide (with refractive index 1.46) deposited onto a silicon substrate (with
refractive index $3.872 - i0.037$) is shown plotted in figure B1.26.15 [37]. This yields a trajectory of the
allowed values of $\Delta$ and $\Psi$ as the film thickness varies. Since the film thickness is measured from the
interference between the light reflected from the boundary between air and the film and that reflected from the
interface between the film and the substrate, the maximum film thickness $d_{\max}$ that can be measured before the
trajectory shown in figure B1.26.15 retraces itself is given by:


$$2d_{\max}\cos\theta_t = \frac{\lambda}{n_2} \qquad \text{(B1.26.23)}$$

where $n_2$ is included to take account of the change in wavelength as the light passes through the film with this
refractive index. The angle of the transmitted light ($\theta_t$) and the incidence angle ($\theta_i$) are related through
Snell's law so that:

$$\cos\theta_t = \sqrt{1 - \frac{\sin^2\theta_i}{n_2^2}} \qquad \text{(B1.26.24)}$$


assuming that the light is incident from air ($n_1 = 1$). Substitution into equation (B1.26.23) yields a value of
$d_{\max}$ as:

$$d_{\max} = \frac{\lambda}{2\sqrt{n_2^2 - \sin^2\theta_i}} \qquad \text{(B1.26.25)}$$

This shows that the values of $\Delta$ and $\Psi$ are identical for films whose thicknesses differ by $nd_{\max}$, where $n$
is an integer. Thus the trajectory shown in figure B1.26.15 retraces itself for multiples of this maximum
thickness. For the silicon dioxide film deposited onto silicon, this yields a value of $d_{\max} = 2832$ Å. Thus, in
figure B1.26.15, if values of $\Delta$ and $\Psi$ were measured to be 90° and 25° respectively, this would correspond
to a film $(600 + n \times 2832)$ Å thick, where $n = 0, 1, 2$ etc.
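
The periodicity in film thickness can be verified with a short calculation. The sketch below assumes $d_{\max} = \lambda/(2\sqrt{n_2^2 - \sin^2\theta_i})$ for light incident from air and uses the values quoted for silicon dioxide on silicon ($n_2 = 1.46$, $\theta_i = 70°$, $\lambda = 6328$ Å); the function names are illustrative.

```python
import math


def max_unambiguous_thickness(wavelength, n_film, theta_i_deg):
    """Thickness period d_max (same units as wavelength) for incidence from air."""
    sin_i = math.sin(math.radians(theta_i_deg))
    return wavelength / (2.0 * math.sqrt(n_film ** 2 - sin_i ** 2))


def candidate_thicknesses(base_thickness, d_max, count):
    """Thicknesses base + n * d_max that share the same Delta and Psi."""
    return [base_thickness + n * d_max for n in range(count)]


if __name__ == "__main__":
    d_max = max_unambiguous_thickness(6328.0, 1.46, 70.0)
    print(round(d_max))
    print([round(d) for d in candidate_thicknesses(600.0, d_max, 3)])
```

In practice the ambiguity is resolved with independent knowledge of the approximate thickness, or by repeating the measurement at a second wavelength or angle of incidence, for which $d_{\max}$ is different.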

Figure B1.26.15. The Del/Psi trajectory for silicon dioxide on silicon with angle of incidence $\theta_i = 70°$ and
wavelength $\lambda = 6328$ Å (Tompkins H G 1993 A User's Guide to Ellipsometry (San Diego, CA: Academic)).


B1.26.3.3 INSTRUMENTATION


The following components make up an ellipsometer: 


(1) Monochromatic light source. This is generally a small laser, usually a helium-neon laser emitting red light at 6328 Å. The plasma tube of these lasers is usually terminated by a Brewster window (see section B1.26.3.1) to minimize losses in the laser cavity so that the light emitted by the laser is linearly polarized. If an unpolarized source is used, a polarizer is placed after the light source to produce linearly polarized light.

(2) An element that converts linearly polarized light into elliptically polarized light. This component is made of a birefringent material, in which the refractive index experienced by light passing through the material depends on the polarization of the light with respect to the crystal lattice [33]. There are generally two perpendicular axes with different refractive indices for light polarized along each of these directions. Since, as shown above, the velocity of light $v$ in a medium of refractive index $n$ is given by $v = c/n$, light travels at different velocities depending on its polarization with respect to each of these directions. Light polarized along the direction of the smallest refractive index travels the fastest, and this is known as the 'fast' axis. Since light polarized along the other, higher-refractive-index axis travels more slowly, this is known as the 'slow' axis. The refractive indices of the fast and slow axes are designated $n_f$ and $n_s$ respectively, where it follows from the definitions that $n_s > n_f$. If we now imagine linearly polarized light incident on this material with the electric field vector oriented at 45° to each of these axes, the electric field components of the incident light along the slow and fast axes will both be equal to $E_0\cos 45^\circ$, where $E_0$ is the electric field amplitude of the polarized light. They will also be in phase. The frequency of the light, $\nu_0$, will be the same for both components within the material, but they will travel with different velocities. This means that after traversing a plate of this material of thickness $d$, the two components will no longer be in phase with each other so that, according to the discussion in section B1.26.3.1, the light will, in general, be elliptically polarized. In order to calculate this phase difference, we first calculate the time required for light polarized along the fast and slow axes to traverse a disc of the birefringent material of thickness $d$. This is given by $t_s = d/v_s$ and $t_f = d/v_f$ for the slow and fast axes respectively. The difference in transit time for the two beams is $\Delta t = t_s - t_f$. If the period of oscillation of the light is $\tau$ ($= 1/\nu_0$), then if $\Delta t = \tau$, the slow beam is one period behind the fast beam and the phase difference is $2\pi$ radians. The phase difference for intermediate values of $\Delta t$, designated $\Delta\phi$, is given by $2\pi\Delta t/\tau$. Combining these equations yields:


$$\Delta\phi = \frac{2\pi\nu_0 d}{c}(n_s - n_f) \tag{B1.26.26}$$


If the wavelength of the incident radiation in vacuo is $\lambda_0$, equation (B1.26.26) becomes:

$$\Delta\phi = \frac{2\pi d}{\lambda_0}(n_s - n_f) \tag{B1.26.27}$$

so the value of $\Delta\phi$ can be selected to be any desired value merely by varying the value of $d$ for a material with particular values of $n_s$, $n_f$ and $\lambda_0$. It is common to select $\Delta\phi = 90° = 2\pi/4$ radians; a plate of this thickness is known as a quarter-wave plate for a particular wavelength and produces circularly polarized light from linearly polarized light if the polarization direction of the incident beam is oriented at 45° to the fast and slow axes. Orienting the incident polarization at intermediate angles with respect to the fast and slow axes yields elliptically polarized light. This effect is exploited in the ellipsometer.

(3) A polarizer which transmits only one polarization of radiation is required to define the state of polarization of 
the reflected beam. 

(4) The intensity of the reflected light must also be measured. Historically, this was done using the eye. Since, in 
general, a null (a measurement of the point at which the light decreases to zero) is required, this can be 
relatively sensitive. However, nowadays, the light intensity is generally measured using a photomultiplier tube. 


(5) These components must be mounted so that the incident and detection angles can be varied and kept equal 
and a place must be provided to mount the sample. 
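
The retardation formula for the birefringent element in item (2), equation (B1.26.27), lends itself to a short numerical sketch. The refractive indices used below are hypothetical values chosen only for illustration (a birefringence of about 0.009); they are not taken from the text, and the function names are likewise assumptions.

```python
import math

def retardation(d, n_slow, n_fast, wavelength):
    """Equation (B1.26.27): phase difference in radians after thickness d."""
    return 2.0 * math.pi * d * (n_slow - n_fast) / wavelength

def quarter_wave_thickness(n_slow, n_fast, wavelength):
    """Thickness giving a phase difference of pi/2 (a quarter-wave plate)."""
    return wavelength / (4.0 * (n_slow - n_fast))

# Hypothetical indices, helium-neon wavelength in metres
n_fast, n_slow = 1.543, 1.552
lam = 632.8e-9

d_qw = quarter_wave_thickness(n_slow, n_fast, lam)
print(f"quarter-wave thickness: {d_qw * 1e6:.2f} micrometres")
print(f"retardation: {math.degrees(retardation(d_qw, n_slow, n_fast, lam)):.1f} degrees")
```

Because the retardation depends on $\lambda_0$, a plate cut this way acts as a quarter-wave plate only near the design wavelength, which is the point made in the text.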

There are several possible configurations used to construct an ellipsometer [44]. We will describe one of the most common arrangements, shown in figure B1.26.16. The linearly polarized light emerging from the laser passes through the quarter-wave plate, which can be rotated to yield elliptically polarized light. Reflection from the sample, in general, also produces elliptically polarized light. The quarter-wave plate is rotated to find the condition such that, when the elliptically polarized light interacts with the sample, the phase change produced on reflection exactly compensates for the elliptical polarization of the incident light to produce linearly polarized light. The angle of the resulting linearly polarized light can be accurately determined using the polarizer placed before the detector (known as the analyser) by rotating it so that no light reaches the detector (the null condition). Being able to achieve this depends, of course, on the quarter-wave plate being correctly oriented so as to exactly compensate for the effect of the sample and the analyser being oriented at exactly 90° to the direction of the resulting linearly polarized light. The experiment then consists of rotating both the quarter-wave plate and the analyser so that no light reaches the detector. This is often done automatically, and the resulting quarter-wave plate and analyser angles can be simply converted into the parameters $\Delta$ and $\Psi$ [37, 44].

Figure B1.26.16. Schematic diagram of an ellipsometer.

B1.26.3.4 EXAMPLES

(A) BASIC STUDIES 

Dielectric constants of metals, semiconductors and insulators can be determined from ellipsometry measurements [38, 39]. Since the dielectric constant can vary depending on the way in which a film is grown, the measurement of accurate film thicknesses relies on having accurate values of the dielectric constant. One common procedure for determining dielectric constants is a Kramers-Kronig analysis of spectroscopic reflectance data [39]. This method suffers from the series-termination error as well as the difficulty of making corrections for the presence of overlayer contaminants. The ellipsometry method is for the most part free of both these sources of error and thus yields the most accurate values to date [39].

(B) CHARACTERIZATION OF THIN FILMS AND MULTILAYER STRUCTURES 

Ellipsometry measurements can provide information about the thickness, microroughness and dielectric function of thin films. They can also provide information on the depth profile of multilayer structures non-destructively, including the thickness, the composition and the degree of crystallinity of each layer [39]. The measurement of the various components of a complex multilayered film is illustrated in figure B1.26.17 [40]. This also illustrates the use of different wavelengths of light to obtain much more information on the nature of the film. Here $\Delta$ and $\Psi$ are plotted versus the wavelength of light (•) and the line drawn through these data represents a fit calculated for the various films of yttrium oxide deposited on silica as shown at the bottom of the figure [40].
Figure B1.26.17. (a) Observed and calculated ellipsometric [$\Delta(\lambda)$, $\Psi(\lambda)$] spectra for the Y$_2$O$_3$ film on vitreous silica. Angle of incidence 75°. (b) Best-fit model of the Y$_2$O$_3$ film on vitreous silica (Chindaudom P and Vedam K 1994 Physics of Thin Films vol 19, ed K Vedam (New York: Academic) p 191).

(C) REAL-TIME STUDIES 

With the development of multichannel spectroscopic ellipsometry, it is now possible to use real-time spectroscopic ellipsometers, for example, to establish the optimum substrate temperature in a film growth process [41, 42].


B1.26.4 WORK-FUNCTION MEASUREMENTS


B1.26.4.1 PRINCIPLES


The work function ($\Phi$) is defined as the minimum work that has to be done to remove an electron from the bulk of the material to a sufficient distance outside the surface such that it no longer experiences an interaction with the surface electrostatic field [43, 44 and 45]. In other words, it is the minimum energy required to remove an electron from the highest occupied level (the 'Fermi level') of a solid, through the surface, to the so-called vacuum reference level (figure B1.26.18). Thus it is influenced by two factors. The first is associated with the bulk electronic properties of the solid: the work function increases with increasing binding energy. The second is associated with penetrating the surface dipole layer: the work function changes with surface contamination and structure.
Figure B1.26.18. Schematic diagram of the energy levels in a solid.

In order to understand the tendency to form a dipole layer at the surface, imagine a solid that has been cleaved to expose a surface. If the truncated electron distribution originally present within the sample does not relax, this produces a steplike change in the electron density at the newly created surface (figure B1.26.19(A)). Since the electron density $\rho(x) \propto |\psi(x)|^2$, where $\psi(x)$ is the electron wavefunction, this implies that the electron wavefunction varies in a similarly step-wise fashion at the interface. This indicates that $\mathrm{d}^2\psi/\mathrm{d}x^2|_s$, where $s$ indicates that the derivative is evaluated at the surface, becomes infinite. Since the electron kinetic energy $E_K = (-\hbar^2/2m)(\mathrm{d}^2\psi/\mathrm{d}x^2)$, this creates an infinite-energy surface. This energy can be decreased by reducing $\mathrm{d}^2\psi/\mathrm{d}x^2$, allowing the wavefunction to become 'smoother' at the interface as shown in figure B1.26.19(B). This means that electron density previously within the sample extends outside the sample, producing a negative charge there. Since the sample was originally electrically neutral, the excess charge outside the sample is balanced by a corresponding positive charge within it, resulting in an electric dipole moment at the surface. The work required to separate these charges increases the potential energy at the same time as the kinetic energy decreases. The equilibrium surface dipole moment corresponds to the minimum in this energy, and this has been discussed in detail by Smoluchowski [46, 47]. In general, the greater the electron density of the sample, the larger will be the surface dipole. Thus, for the same metal, close-packed surfaces generally have the highest work functions; for example, in the case of copper, the work functions of the various surfaces are Cu(111): $\Phi = 4.94$ eV, Cu(100): $\Phi = 4.59$ eV, Cu(110): $\Phi = 4.48$ eV [45].

It is the presence of this surface dipole that renders the work function sensitive to changes in surface properties. For example, the surface dipole layer may change as a result of adsorption. Adsorbed species can be viewed as having a discrete dipole moment that tends to modify the total dipole layer at the surface by charge transfer and consequently change the work function. Thus, measurement of the work function change, $\Delta\Phi = \Phi_{\text{adsorbate covered}} - \Phi_{\text{clean}}$, yields important information on the degree of charge reorganization upon adsorption and the surface coverage of the adsorbate. For example, adsorption of an electronegative species (for example, chlorine, hydrogen etc) will tend to increase the surface dipole moment, causing an increase in work function ($\Delta\Phi$ positive). Correspondingly, an electropositive adsorbate (for example, an alkali metal) decreases the surface dipole moment, causing a decrease in the work function ($\Delta\Phi$ negative). These changes can be used to measure the adsorbate coverage as illustrated in figure B1.26.20. Here it is assumed that there are $N$ adsorbates per unit area each having an effective charge $q$. This is balanced by an equal and opposite 'image charge' in the substrate a distance $d$ away so that each adsorbate possesses a dipole moment $\mu = qd$. The separated layers of positive and negative charge can be thought of as forming a parallel-plate capacitor, where the potential difference between the capacitor plates corresponds to the change in work function of the sample. If the total charge per unit area on each of the plates is $Q$, then:
$$Q = C\,\Delta\Phi \tag{B1.26.28}$$


where $C$ is the capacitance per unit area and is $\varepsilon_0/d$. Since $Q = Nq$, equation (B1.26.28) becomes:

$$\Delta\Phi = \frac{Nqd}{\varepsilon_0} \tag{B1.26.29}$$


which, remembering that $\mu = qd$, yields the Helmholtz equation:

$$\Delta\Phi = \frac{N\mu}{\varepsilon_0} \tag{B1.26.30}$$


and shows that, for this simple case, the change in work function varies linearly with the adsorbate coverage $N$ and the surface dipole moment of the adsorbate $\mu$. A simplifying assumption in these equations is that either the coverage or the surface dipole moment or both are sufficiently small that the dipoles do not interact. If the distance between the dipoles decreases (as the coverage becomes larger) and/or if the dipole moment is large, the electric field created by one dipole can polarize adjacent dipoles to reduce their dipole moments. This effect is known as depolarization. This situation has been described by Topping [48] and Miller [49] and the resulting change in work function taking these effects into account is given by:


$$\Delta\Phi = \frac{N\mu\,(1 - 9\alpha N^{3/2})}{\varepsilon_0} \tag{B1.26.31}$$


where $\alpha$ is the polarizability of the adsorbate. This reduces to equation (B1.26.30) for low coverages.
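
As an order-of-magnitude illustration of the low-coverage limit, the Helmholtz equation (B1.26.30) can be evaluated directly in SI units. The coverage and dipole moment below are hypothetical round numbers chosen for the sketch (about $10^{14}$ adsorbates per cm$^2$, each with a dipole of 1 debye), not data from the text.

```python
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
DEBYE = 3.33564e-30       # 1 debye in C m

def helmholtz_shift(coverage_per_m2, dipole_C_m):
    """Equation (B1.26.30): magnitude of the work-function change in volts."""
    return coverage_per_m2 * dipole_C_m / EPS0

def dipole_from_shift(delta_phi_V, coverage_per_m2):
    """Invert equation (B1.26.30) to estimate the dipole moment per adsorbate."""
    return delta_phi_V * EPS0 / coverage_per_m2

# Hypothetical example: 1e18 adsorbates per m^2, 1 debye each
shift = helmholtz_shift(1.0e18, 1.0 * DEBYE)
print(f"{shift:.3f} V")   # a few tenths of a volt
```

The sign of the shift follows from the direction of the dipole, as discussed above for electronegative and electropositive adsorbates; the sketch returns only the magnitude.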
Figure B1.26.19. The variation of the electron density (A) from an unrelaxed surface and (B) showing the smoothing of the electron density to lower the kinetic energy.

Figure B1.26.20. Diagram showing the dipole layer created on a surface by an electropositive adsorbate.

B1.26.4.2 INSTRUMENTATION

There are many ways to measure the work function or work-function change, which can be generally classified as electron emission methods (thermionic emission, field emission and photoelectron emission), low-energy electron beam (retarding-potential) methods and capacitance methods [50, 51 and 52]. Absolute work function values can be measured using emission methods while the other techniques measure only work-function changes.

The probes for measuring surface work functions are generally incorporated into an ultrahigh vacuum apparatus and supplement the existing vacuum-compatible, surface-sensitive probes (see for example section B1.7, section B1.9, section B1.20, section B1.21 and section B1.25). Rather than measuring the absolute value of the work function, it is often more interesting to measure the change in work function caused by some change to the surface. This can be measured by modifying equipment already present in the vacuum system, for example by using the electron gun of low-energy electron diffraction optics for the electron beam method, or from the high-binding-energy cut-off in ultraviolet photoelectron spectroscopy. Specific probes for rapidly and conveniently measuring work-function changes can also be introduced separately into the chamber. The most common of these is the Kelvin probe or vibrating capacitor.

(A) THERMIONIC EMISSION METHOD 

When a metal sample is heated, electrons are emitted from the surface when the thermal energy of the electrons, $kT$, becomes sufficient to overcome the work function $\Phi$ [53]. The probability of this electron emission depends on the work function $\Phi$ and temperature $T$ as expressed in the Richardson-Dushman equation:

$$J = A(1 - r)\,T^2 \exp(-e\Phi/kT) \tag{B1.26.32}$$

where $J$ is the thermionic emission current density, $A = 120$ A cm$^{-2}$ deg$^{-2}$ and $r$ is the reflection coefficient for electrons arriving at the work function barrier. Thus, plotting $\ln(J/T^2)$ against $1/kT$ yields a straight line with slope equal to $-e\Phi$. The method is not generally suitable for monitoring adsorbates since the sample has to be heated to emit electrons.
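
The Richardson plot described above can be sketched as a straight-line fit. The example below generates synthetic current densities from equation (B1.26.32) for an assumed work function of 4.5 eV with $r = 0$, and then recovers that value from the slope; the temperatures, the work function and the function names are all illustrative assumptions rather than measured data.

```python
import math

K_EV = 8.617333262e-5   # Boltzmann constant in eV/K

def richardson_current(T, phi_eV, A=120.0, r=0.0):
    """Equation (B1.26.32); with phi in eV, e*phi/kT becomes phi_eV/(K_EV*T)."""
    return A * (1.0 - r) * T**2 * math.exp(-phi_eV / (K_EV * T))

def work_function_from_richardson_plot(temps, currents):
    """Least-squares slope of ln(J/T^2) against 1/kT; the slope equals -phi."""
    xs = [1.0 / (K_EV * T) for T in temps]
    ys = [math.log(J / T**2) for T, J in zip(temps, currents)]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return -num / den

# Synthetic data for an assumed 4.5 eV work function
temps = [1500.0, 1600.0, 1700.0, 1800.0, 1900.0, 2000.0]
currents = [richardson_current(T, 4.5) for T in temps]
phi = work_function_from_richardson_plot(temps, currents)
print(f"recovered work function: {phi:.3f} eV")
```

With real data the intercept gives $\ln[A(1 - r)]$, so the fit does not require prior knowledge of the reflection coefficient.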

(B) FIELD EMISSION METHOD 

The barrier to electron removal from a surface can be reduced substantially by the presence of a strong electric field. The physical situation involved in this process is shown in figure B1.26.21 where, in the presence of an applied field, the work function is less than that at zero field. This increases the probability for electrons to 'tunnel' out of the sample [54]. This is a quantum mechanical phenomenon which can be formulated mathematically by considering a Fermi sea of electrons within the metal impinging on a potential barrier at the surface. The result is given by the Fowler-Nordheim equation as:


'-««» feXfH^i^) 


(B1. 26.33) 


where $J$ is the current arising from field emission, $E$ is the electric field strength and $\alpha$ is a tabulated function equal to $0.95 \pm 0.009$ over the range of current densities normally encountered. A plot of $\ln(J/E^2)$ against $1/E$ can be used to determine $\Phi$.
Figure B1.26.21. Potential energy curves for an electron near a metal surface. 'Image potential' curve: no applied field. 'Total potential' curve: applied external field $= -E_0 x$.

This method needs a highly specialized sample configuration and the sample has to be a very sharp point (radius ~$10^{-5}$ cm) so that sufficiently high fields can be maintained for emission measurements. This
requirement often precludes the use of other surface analysis techniques. However, for specialized 
applications, the field emission method may be the only one or the most convenient one available. It has been 
shown to be able to directly measure the work function change induced by the adsorption of a single atom on 
a tungsten plane [55]. Combined with field emission microscopy, the work function of different crystal planes 
can be detected. Much of the early understanding of adsorption was obtained with this device and a large 
number of data on single-crystal work functions have been produced by this technique [52]. 

(C) PHOTOELECTRON EMISSION METHOD 

When photons of sufficiently high frequency $\nu$ are directed onto a metal surface, electrons are emitted in a process known as photoelectron emission [56]. The threshold frequency $\nu_0$ is related to the work function by the expression

$$\Phi = h\nu_0/e. \tag{B1.26.34}$$


The total electron current generated in this process is given by the Fowler equation: 


$$J = BT^2 f\!\left(\frac{h\nu - h\nu_0}{kT}\right) \tag{B1.26.35}$$


where $B$ is a parameter that depends on the material involved. The photoemitted current is measured as a function of photon energy; extrapolation allows the determination of $\nu_0$, and thus of $\Phi$, to be made.

This technique requires a photon source (a light source with monochromator or filters) of calibrated spectral intensity and variable energy in the range around $\nu_0$, and an electron collector. Both the work function and the work-function change may be determined conveniently from the cut-off in inelastic electrons in a photoelectron spectrum [47]. As demonstrated by figure B1.26.18, electrons with the minimum kinetic energy barely surmount the work function, while electrons from the Fermi level ($E_F$, the highest occupied level) will have the maximum kinetic energy, $h\nu - \Phi$. The work function can be calculated from the relationship

$$\Phi = h\nu - W \tag{B1.26.36}$$


where $W$ is the energy width of the whole photoelectron spectrum. Adsorption changes the work function, which in turn changes the width ($W$) of the UP spectrum: at fixed photon energy, an increase in $\Phi$ produces an equal decrease in $W$. The work-function change upon adsorption manifests itself in a shift in the low-kinetic-energy 'secondary tail', as shown in the inset in figure B1.26.22 [47].


[Figure axis: kinetic energy (eV).]

Figure B1.26.22. The energy width $W$ of an ultraviolet photoelectron spectrum from a solid may be used to determine the work function. Changes in work function may be obtained from changes in the 'cut-off' of the secondary electron peak (inset) (Attard G and Barnes C 1988 Surfaces (Oxford: Oxford University Press)).
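
The two photoemission relations, equations (B1.26.34) and (B1.26.36), amount to simple arithmetic once energies are expressed in eV. The sketch below assumes a He I photon energy of 21.22 eV and a hypothetical measured spectral width of 16.60 eV; the width and the function names are illustrative and not taken from the text.

```python
HC_EV_NM = 1239.841984   # h*c in eV nm

def work_function_from_width(photon_energy_eV, width_eV):
    """Equation (B1.26.36): Phi = h*nu - W, all quantities in eV."""
    return photon_energy_eV - width_eV

def threshold_wavelength_nm(phi_eV):
    """Longest wavelength that can eject electrons, from h*nu_0 = Phi (in eV)."""
    return HC_EV_NM / phi_eV

phi = work_function_from_width(21.22, 16.60)   # hypothetical width
print(f"work function: {phi:.2f} eV")
print(f"threshold wavelength: {threshold_wavelength_nm(phi):.0f} nm")

# A work-function change appears as an equal and opposite change in W
delta_phi = work_function_from_width(21.22, 16.10) - phi
print(f"shift for a 0.5 eV narrower spectrum: {delta_phi:+.2f} eV")
```

Only the difference in cut-off position is needed for $\Delta\Phi$, which is why the secondary-electron tail is convenient for following adsorption.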

(D) LOW-ENERGY ELECTRON BEAM (RETARDING-POTENTIAL) METHODS 


In this method, the sample serves as an anode. Electrons emitted from the cathode by thermionic emission normally impinge on the anode sample, to which a retarding potential is applied. Basically this can be arranged as a diode (known as the diode method) [57] or a triode (called the Shelton triode method) [58]. Different circuit arrangements give rise to different I-V relationships and the difference in the work functions of the two electrode surfaces is measured. When a low-energy electron diffraction (LEED) apparatus is available (see section B1.9), this method can be implemented using the low-energy electrons from the LEED electron gun. In this case, electrons of known, low energy are incident on the sample. As the potential across the sample is slowly made more negative, the current is measured, and the relationship between sample current and retarding voltage is shown in figure B1.26.23 [50]. At low retarding voltage, most of the impinging electrons are collected. At some larger retarding voltage, the impinging electrons do not possess sufficient energy to reach the sample and so are reflected. This voltage is determined by the work functions of the sample and the electron gun filament, and by the accelerating voltage of the electron gun.
Changes in the work function of the sample, when the filament work function and the electron gun accelerating voltage are held constant, are manifested by a change in the cut-off retarding voltage (see figure B1.26.23).



Figure B1.26.23. Current-voltage curves observed in the retarding potential difference method of work-function measurement (Hudson J B 1992 Surface Science (Stoneham, MA: Butterworth-Heinemann)).

A low-energy electron beam can also be obtained using a field emission tip and used in the field emission retarding-potential method. This combination provides an absolute measure of the sample work function and the resolution is excellent [52].

(E) CAPACITANCE METHODS 

When an electrical connection is made between two metal surfaces, a contact potential difference arises from the transfer of electrons from the metal of lower work function to the second metal until their Fermi levels line up. The contact potential difference between the two metals is just equal to the difference in their respective work functions. In the absence of an applied emf, there is an electric field between two parallel metal plates arranged as a capacitor. If a potential is applied, the field can be eliminated and at this point the potential equals the contact potential difference of the two metal plates. If one plate of known work function is used as a reference electrode, the work function of the second plate can be determined by measuring this applied potential between the plates [52]. One can determine the zero-electric-field condition between the two parallel plates by measuring directly the tendency for charge to flow through the external circuit. This is called the static capacitor method [59].


Historically, the first and most important capacitance method is the vibrating capacitor approach implemented by Lord Kelvin in 1897. In this technique (now called the Kelvin probe), the reference plate moves relative to the sample surface at some constant frequency and the capacitance changes as the interelectrode separation changes. An AC current thus flows in the external circuit. Upon reduction of the electric field to zero, the AC current is also reduced to zero. Originally, Kelvin detected the zero point manually using his quadrant electrometer. Nowadays, there are many elegant and sensitive versions of this technique. A piezoceramic foil can be used to vibrate the reference plate. To minimize noise and maximize sensitivity, a phase-locked detection circuit is used and a feedback loop may automatically null the electric field. The whole process can be carried out electronically to provide automatic recording of the contact potential and any changes which occur.
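
The nulling principle of the vibrating capacitor can be illustrated with a simple model. The sketch below assumes an ideal parallel-plate capacitor whose gap varies sinusoidally, and takes the external-circuit current to be $i = (V_{\text{cpd}} - V_{\text{bias}})\,\mathrm{d}C/\mathrm{d}t$; this sign convention, the geometry and all numerical values are assumptions made for illustration, not a description of any particular instrument.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m

def kelvin_current(t, v_cpd, v_bias, area=1e-4, d0=1e-3, d1=1e-4, freq=100.0):
    """Current from a vibrating parallel-plate capacitor (hypothetical geometry).

    The gap is d0 + d1*sin(omega*t), so C = EPS0*area/gap and
    i = (v_cpd - v_bias) * dC/dt.
    """
    omega = 2.0 * math.pi * freq
    gap = d0 + d1 * math.sin(omega * t)
    dC_dt = -EPS0 * area * d1 * omega * math.cos(omega * t) / gap**2
    return (v_cpd - v_bias) * dC_dt

def null_bias(v_cpd, v_a=-1.0, v_b=1.0, t=0.0):
    """Find the bias giving zero current by interpolating two trial biases.

    The current is linear in the bias, so two measurements are sufficient.
    """
    i_a = kelvin_current(t, v_cpd, v_a)
    i_b = kelvin_current(t, v_cpd, v_b)
    return v_a - i_a * (v_b - v_a) / (i_b - i_a)

# A hypothetical contact potential difference of 0.35 V is recovered at the null
print(f"{null_bias(0.35):.3f} V")
```

In a real probe the feedback loop mentioned above performs this nulling continuously, so that changes in the contact potential are tracked as the surface is modified.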
This technique does not involve heating the sample to high temperature or exposing it to high electric fields.
Nor is there a need for a hot filament which cannot be used at high pressures. When mounted on a linear 
motion manipulator, the reference plate assembly can be moved away from the sample leaving no interference 
with other techniques. One major shortcoming of this technique is that the work function of the reference 
plate, or electrode, must be precisely known for an absolute determination of the sample work function. 
Moreover, when relative changes are examined, the work function of the reference electrode must be stable, 
either unaffected by adsorption or cleanable before each experiment. This disadvantage may be compensated 
for by using an inert metal such as gold. In addition, the reference electrode must be well shielded to reduce 
the effects of external electric and magnetic fields on the experiment [52]. 

B1.26.4.3 APPLICATIONS

(A) SURFACE CHARACTERIZATION 

Measurement of the work function of a surface is an important part of overall surface characterization. Surface electron charge density can be described in terms of the work function and the surface dipole moment can be calculated from it (equation (B1.26.30) and equation (B1.26.31)). Likewise, changes in the chemical or physical state of the surface, such as adsorption or geometric reconstruction, can be observed through a work-function modification. For studies related to cathodes, the work function may be the most important surface parameter to be determined [52].

(B) MEASUREMENT OF ADSORPTION ISOTHERMS 

Almost all adsorbates cause work function changes ($\Delta\Phi$). Plotting $\Delta\Phi$ versus pressure at various temperatures produces an adsorption isotherm which can be used to determine heats of adsorption and the surface coverage [60]. Much early understanding of adsorption was gained by this method. In particular, alkali and alkaline earth adsorption has been rather extensively studied in this way. This is illustrated in figure B1.26.24 for the adsorption of alkali metals on W(100) [61]. The change in work function is depicted as solid lines which initially decrease with increasing coverage as expected for an electropositive adsorbate. The dots connected by the dashed line represent the work functions of each of the bulk alkali metals and each of the work-function curves tends asymptotically to these values for large coverages. These curves are not linear as a function of coverage because of depolarization effects (equation (B1.26.31)) and so reach a minimum before attaining their bulk values.
Figure B1.26.24. The change of work function of the (100) plane of tungsten covered by Na, K and Cs, and work function of alkali metals (dashed-dotted line) versus adatom concentration $n$ (Kiejna A and Wojciechowski 1981 Prog. Surf. Sci. 11 293-338).
B1.27 Calorimetry

Kenneth N Marsh 


B1.27.1 INTRODUCTION 

Calorimetry is the basic experimental method employed in thermochemistry and thermal physics which enables the measurement of the difference in the energy $U$ or enthalpy $H$ of a system as a result of some process being done on the system. The instrument that is used to measure this energy or enthalpy difference ($\Delta U$ or $\Delta H$) is called a calorimeter. In the first section the relationships between the thermodynamic functions and calorimetry are established. The second section gives a general classification of calorimeters in terms of the principle of operation. The third section describes selected calorimeters used to measure thermodynamic properties such as heat capacity, enthalpies of phase change, reaction, solution and adsorption.


B1.27.2 RELATIONSHIP BETWEEN THERMODYNAMIC FUNCTIONS 
AND CALORIMETRY 

The first law of thermodynamics relates the energy change in a system at constant volume to the work done on the system $w$ and the heat added to the system $q$,

$$\Delta U = w + q. \tag{B1.27.1}$$

Both heat and work are a flow of energy, heat being a flow resulting from a difference in temperature between the system and the surroundings and work being an energy flow caused by a difference in pressure or the application of other electromechanical forces such as electrical energy. For the case where the system is thermally isolated from its surroundings, termed adiabatically enclosed, there is no heat flow to or from the surroundings, i.e. $q = 0$, so that

$$\Delta U = w_{\text{adiabatic}}. \tag{B1.27.2}$$

A calorimeter is a device used to measure the work $w$ that would have to be done under adiabatic conditions to bring about a change from state 1 to state 2 for which we wish to measure $\Delta U = U_2 - U_1$. This work $w$ is generally done by passing a known constant electric current $I$ for a known time $t$ through a known resistance $R$ embedded in the calorimeter, and is denoted by $w_{\text{elec}}$ where

$$w_{\text{elec}} = I^2 R t. \tag{B1.27.3}$$


In general it is difficult to construct a calorimeter that is truly adiabatic so there will be unavoidable heat leaks $q$. It is also possible that non-deliberate work is done on the calorimeter such as that resulting from a change in volume against a non-zero external pressure $p_{\text{ext}}$ ($-\int p_{\text{ext}}\,\mathrm{d}V$), often called $pV$ work. Additional work $w'$ may be done on the system by energy introduced from stirring or from energy dissipated due to self-heating of the device used to measure the temperature. The basic equation for the energy change in a calorimeter is

$$\Delta U = w_{\text{elec}} - \int p_{\text{ext}}\,\mathrm{d}V + w' + q. \tag{B1.27.4}$$

The $pV$ work term is not normally measured. It can be eliminated by suspending the calorimeter in an evacuated space ($p_{\text{ext}} = 0$) or by holding the volume of the calorimeter constant ($\mathrm{d}V = 0$) to give

$$\Delta U = w_{\text{elec}} + w' + q. \tag{B1.27.5}$$

This is the working equation for a constant volume calorimeter. Alternatively, a calorimeter can be maintained at constant pressure $p$ equal to the external pressure $p_{\text{ext}}$, in which case

$$\int_{V_1}^{V_2} p_{\text{ext}}\,\mathrm{d}V = p(V_2 - V_1) \tag{B1.27.6}$$

and

$$\Delta U = U_2 - U_1 = w_{\text{elec}} - pV_2 + pV_1 + w' + q \tag{B1.27.7}$$

hence

$$\Delta(U + pV) = (U + pV)_2 - (U + pV)_1 = w_{\text{elec}} + w' + q. \tag{B1.27.8}$$

The quantity $U + pV$ is termed the enthalpy $H$, hence

$$\Delta H = H_2 - H_1 = w_{\text{elec}} + w' + q \tag{B1.27.9}$$

is the working equation for a constant pressure calorimeter.
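
The working equations above reduce to simple bookkeeping once the individual energy terms are known. The following minimal sketch combines equation (B1.27.3) with equation (B1.27.9); the heater settings, stirring work and heat leak are hypothetical numbers chosen for illustration, and the function names are not from the text.

```python
def electrical_work(current_A, resistance_ohm, time_s):
    """Equation (B1.27.3): w_elec = I^2 * R * t, in joules."""
    return current_A**2 * resistance_ohm * time_s

def enthalpy_change(w_elec, w_other=0.0, q_leak=0.0):
    """Equation (B1.27.9): Delta H = w_elec + w' + q at constant pressure.

    The same sum gives Delta U at constant volume, equation (B1.27.5).
    """
    return w_elec + w_other + q_leak

# Hypothetical run: 0.5 A through 100 ohm for 60 s,
# 2 J of stirring work and a 5 J heat loss to the surroundings
w = electrical_work(0.5, 100.0, 60.0)
print(w)                                              # 1500.0
print(enthalpy_change(w, w_other=2.0, q_leak=-5.0))   # 1497.0
```

The sign convention follows the text: work done on the system and heat added to it are positive, so a heat loss enters as a negative $q$.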

The heat capacity at constant volume $C_V$ is defined from the relations

$$\Delta U = \int_{T_1}^{T_2} C_V\,\mathrm{d}T \tag{B1.27.10}$$

and

$$C_V = (\partial U/\partial T)_V = \lim_{T_2 \to T_1}\{[U(T_2, V) - U(T_1, V)]/(T_2 - T_1)\}. \tag{B1.27.11}$$

Values of $C_V(T)$ can be derived from a constant volume calorimeter by measuring $\Delta U$ for small values of
($T_2 - T_1$) and evaluating $\Delta U/(T_2 - T_1)$ as a function of temperature. The energy change $\Delta U$ can be derived from a
knowledge of the amount of electrical energy required to change the temperature of the sample + container
from $T_1$ to $T_2$, $w_{\mathrm{elec}}$(sample + container), and the energy required to change the temperature of the container
only from $T_1$ to $T_2$, $w_{\mathrm{elec}}$(container). If the volume of the sample is kept constant, and the calorimeter is
adiabatic (q = 0) or the heat leak is independent of the amount of sample, and no other work is done, then:

$$\Delta U = U(T_2, V) - U(T_1, V) = w_{\mathrm{elec}}(\text{sample + container}) - w_{\mathrm{elec}}(\text{container}).$$ (B1.27.12)
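
As a minimal sketch of how equation (B1.27.12) is used in practice, the following computes a mean heat capacity of the sample from two electrical energies measured over the same temperature interval. The function name and the numbers are illustrative assumptions, not values from any real measurement.

```python
def mean_heat_capacity(w_full, w_empty, t1, t2):
    """Mean heat capacity (J/K) of the sample over the interval [t1, t2].

    w_full  : electrical energy (J) for sample + container
    w_empty : electrical energy (J) for the container alone
    Assumes the heat leak is the same in both runs, as the text requires.
    """
    if t2 == t1:
        raise ValueError("temperature interval must be non-zero")
    delta_u = w_full - w_empty      # equation (B1.27.12)
    return delta_u / (t2 - t1)      # approximates C_V near the mid-temperature


# Illustrative numbers only (hypothetical, not measured data)
c_v = mean_heat_capacity(w_full=152.0, w_empty=102.0, t1=100.0, t2=102.0)
print(c_v)  # 25.0
```

The smaller the interval, the closer this mean value lies to the limit in equation (B1.27.11), at the cost of a larger relative error in the measured temperature difference.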

Except for gases, it is very difficult to determine $C_V$. For a solid or liquid the pressure developed in keeping
the volume constant when the temperature is changed by a significant amount would require a vessel so 
massive that most of the total heat capacity would be that of the container. It is much easier to measure the 
difference 

$$\Delta H = H(T_2, p) - H(T_1, p) = w_{\mathrm{elec}}(\text{sample + container}) - w_{\mathrm{elec}}(\text{container})$$ (B1.27.13)

between the enthalpies of the initial and final states when the pressure is kept constant and derive the heat 
capacity at constant pressure $C_p$ defined by

$$C_p = (\partial H/\partial T)_p = \lim_{T_2 \to T_1} \left[\{H(T_2, p) - H(T_1, p)\}/(T_2 - T_1)\right].$$ (B1.27.14)

The enthalpy change $\Delta H$ for a temperature change from $T_1$ to $T_2$ can be obtained by integration of the
constant pressure heat capacity

$$\Delta H = \int_{T_1}^{T_2} C_p\,dT.$$ (B1.27.15)

The entropy change $\Delta S$ for a temperature change from $T_1$ to $T_2$ can be obtained from the following integration

$$\Delta S = \int_{T_1}^{T_2} (C_p/T)\,dT.$$ (B1.27.16)
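
The integrals in equations (B1.27.15) and (B1.27.16) are normally evaluated numerically from tabulated heat capacities. The sketch below uses the trapezoidal rule, which is an assumption made here for simplicity (smoothed fits are often preferred in practice); the function names and the test data are hypothetical.

```python
def trapezoid(xs, ys):
    """Trapezoidal-rule integral of y over x for tabulated points."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def enthalpy_and_entropy_change(temps, cps):
    """Return (delta_H, delta_S) between temps[0] and temps[-1].

    temps : temperatures in K, strictly increasing and positive
    cps   : C_p values (J/K or J/(K mol)) at those temperatures
    """
    delta_h = trapezoid(temps, cps)                                # (B1.27.15)
    delta_s = trapezoid(temps, [c / t for c, t in zip(cps, temps)])  # (B1.27.16)
    return delta_h, delta_s


# Hypothetical data: constant C_p = 10 J/K between 100 K and 200 K
temps = [100.0 + 0.1 * i for i in range(1001)]
cps = [10.0] * len(temps)
delta_h, delta_s = enthalpy_and_entropy_change(temps, cps)
print(delta_h, delta_s)  # close to 1000 J and 10*ln(2) J/K
```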


B1.27.3 OPERATING PRINCIPLE OF A CALORIMETER 

All calorimeters consist of the calorimeter proper and its surround. This surround, which may be a jacket or a 
bath, is used to control the temperature of the calorimeter and the rate of heat leak to the environment. For 
temperatures not too far removed from room temperature, the jacket or bath usually contains a stirred liquid at 
a controlled temperature. For measurements at extreme temperatures, the jacket usually consists of a metal 
block containing a heater to control the temperature. With non-isothermal calorimeters (calorimeters where 
the temperature either increases or decreases as the reaction proceeds), if the jacket is kept at a constant 
temperature there will be some heat leak to the jacket when the temperature of the calorimeter changes. 
Hence, it is necessary to correct the temperature change observed to the value it would have been if there were
no leak. This is achieved by measuring the temperature of the calorimeter for a time period both before and 
after the process and applying Newton's law of cooling. This correction can be reduced by using the 
technique of adiabatic calorimetry, where the temperature of the jacket is kept at the same temperature as the 
calorimeter as a temperature change occurs. This technique requires more elaborate temperature control and it 
is primarily used in accurate heat capacity measurements at low temperatures. 
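
The heat-leak correction described above can be sketched as follows. The text does not name a specific procedure; this example assumes the widely used Regnault-Pfaundler approach, in which the drift rates measured in the fore and after periods fix a Newton's-law cooling constant. All names and numbers are hypothetical.

```python
def corrected_temperature_rise(times, temps, g_fore, g_after,
                               t_fore_mean, t_after_mean):
    """Temperature rise corrected for heat exchange with the jacket.

    times, temps : samples of the calorimeter temperature over the main period
    g_fore       : drift rate dT/dt (K/s) in the fore period, mean temp t_fore_mean
    g_after      : drift rate dT/dt (K/s) in the after period, mean temp t_after_mean
    Assumes the drift rate is linear in temperature (Newton's law of cooling
    plus a constant stirring term).
    """
    # Cooling constant (1/s) from the two drift rates
    k = (g_fore - g_after) / (t_after_mean - t_fore_mean)
    # Drift rate the calorimeter would show at each sampled temperature
    drift = [g_after + k * (t_after_mean - t) for t in temps]
    # Temperature change caused by heat exchange and stirring, not by the process
    leak = sum(0.5 * (drift[i] + drift[i + 1]) * (times[i + 1] - times[i])
               for i in range(len(times) - 1))
    return (temps[-1] - temps[0]) - leak


# Hypothetical run: jacket-equivalent temperature 25 K-scale units, k = 0.01 1/s
rise = corrected_temperature_rise(
    times=[0.0, 10.0, 20.0], temps=[24.0, 26.0, 26.0],
    g_fore=0.01, g_after=-0.01, t_fore_mean=24.0, t_after_mean=26.0)
print(rise)  # about 2.1, slightly larger than the observed 2.0
```

In this example the calorimeter spends part of the main period above the jacket temperature and loses heat, so the corrected rise exceeds the observed one.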


With most non-isothermal calorimeters, it is necessary to relate the temperature rise to the quantity of energy 
released in the process by determining the calorimeter constant, which is the amount of energy required to 
increase the temperature of the calorimeter by one degree. This value can be determined by electrical 
calibration using a resistance heater or by measurements on well-defined reference materials [1]. For example,
in bomb calorimetry, the calorimeter constant is often determined from the temperature rise that occurs when 
a known mass of a highly pure standard sample of, for example, benzoic acid is burnt in oxygen. 
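
A minimal sketch of the calibration arithmetic follows. The specific energy of combustion used for benzoic acid (about 26 434 J per gram) is a nominal value assumed here for illustration; a real calibration must use the value and conditions stated on the certificate of the reference sample. The fuse energy and the other numbers are hypothetical.

```python
# Nominal specific energy of combustion of benzoic acid, J/g (assumed value;
# use the certified value of the actual reference sample in real work)
BENZOIC_ACID_ENERGY = 26434.0


def calorimeter_constant(mass_standard, delta_t, fuse_energy=0.0,
                         specific_energy=BENZOIC_ACID_ENERGY):
    """Energy (J) needed to raise the calorimeter temperature by 1 K.

    mass_standard : mass of standard burnt (g)
    delta_t       : corrected temperature rise (K)
    fuse_energy   : energy released by the fuse and ignition (J)
    """
    if delta_t <= 0:
        raise ValueError("temperature rise must be positive")
    return (mass_standard * specific_energy + fuse_energy) / delta_t


# Hypothetical calibration run
eps = calorimeter_constant(mass_standard=1.0, delta_t=2.5, fuse_energy=50.0)
print(eps)  # about 10593.6 J/K
```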


B1.27.4 CLASSIFICATION OF CALORIMETERS 

B1.27.4.1 CLASSIFICATION BY PRINCIPLE OF OPERATION

ISOTHERMAL CALORIMETERS (MORE PRECISELY, QUASI-ISOTHERMAL) 

These include calorimeters referred to as calorimeters with phase transitions. The temperatures of the 
calorimeter (i.e. the vessel) T(c) and the jacket T(s) in such calorimeters remain constant throughout the 
experiment. For calorimeters with phase transition, the calorimetric medium is usually a pure solid (its stable 
modification) in equilibrium with the liquid phase of the same substance, for example, ice and water. The 
reaction chamber is placed inside a vessel inside the layer of this substance. The jacket also contains an 
equilibrium mixture of two phases of the same substance. For an exothermic process, part of the solid 
substance melts in the vessel, and the volume change of the liquid is precisely measured. Another calorimeter 
of this type is an isothermal titration calorimeter. One of the reactants is added at such a rate that, for an 
endothermic system, the enthalpy or energy change is balanced by the simultaneous addition of electrical
energy so the calorimeter remains isothermal. The energy added is a direct measure of the energy or enthalpy 
change. For an exothermic system electrical energy that is added at a constant rate is counterbalanced by the 
removal of energy at a constant rate (by, for example, a thermoelectric cooling device) to maintain the 
calorimeter isothermal. The reactant is then added at such a rate that the calorimeter remains isothermal when 
the addition of electrical energy is discontinued. 


ADIABATICALLY-JACKETED CALORIMETERS 

The energy released when the process under study takes place makes the calorimeter temperature T(c) change. 
In an adiabatically jacketed calorimeter, T(s) is also changed so that the difference between T(c) and T(s) 
remains minimal during the course of the experiment; that is, in the best case, no energy exchange occurs 
between the calorimeter (unit) and the jacket. The thermal conductivity of the space between the calorimeter 
and jacket must be as small as possible, which can be achieved by evacuation or by the addition of a gas of 
low thermal conductivity, such as argon. 

HEAT-FLOW CALORIMETERS 

These calorimeters are enclosed in a thermostat ('heat sink') which has a much greater heat capacity than that 
of the calorimeter vessel proper. The energy released in the calorimeter is negligibly small compared with the 
heat capacity of the thermostat, and hence the thermostat temperature T(s) does not change. The outer surface 
of the calorimeter (vessel) is in direct thermal contact with the inner surface of the thermostat, and the energy 
flow occurs through a series of thermopiles, which consist of a large number of thermocouples connected in 
series. The flow of energy through the thermocouples gives rise to a voltage. The thermopiles are designed so 
that the majority of energy that flows from the calorimeter to the thermostat flows through them as rapidly as 
possible and the area under the curve of the voltage produced against time is a measure of the overall quantity 
of the energy released (or taken up) in the process occurring in the calorimeter. 
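
The relation between the thermopile signal and the total energy can be sketched as a numerical integration of the voltage-time curve. The calibration constant (here called `sensitivity`, in volts per watt) and the baseline handling are assumptions made for illustration; real instruments determine them by electrical calibration.

```python
def energy_from_thermopile(times, volts, sensitivity, baseline=0.0):
    """Energy (J) exchanged in a heat-flow calorimeter.

    times       : sample times (s), increasing
    volts       : thermopile voltage (V) at those times
    sensitivity : calibration constant (V/W), assumed constant
    baseline    : steady-state voltage (V) with no process occurring
    """
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    # Area under the baseline-corrected voltage-time curve (V s)
    area = sum(0.5 * ((volts[i] - baseline) + (volts[i + 1] - baseline))
               * (times[i + 1] - times[i])
               for i in range(len(times) - 1))
    return area / sensitivity


# Hypothetical triangular pulse peaking at 2 mV, sensitivity 0.05 V/W
q = energy_from_thermopile([0.0, 10.0, 20.0], [0.0, 2.0e-3, 0.0], 0.05)
print(q)  # about 0.4 J
```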


ISOPERIBOLE CALORIMETERS 

This type of calorimeter is normally enclosed in a thermostatted jacket having a constant temperature T(s),
and the calorimeter (vessel) temperature T(c) changes through the energy released as the process under study
proceeds. The thermal conductivity of the intermediate space must be as small as possible. Most combustion 
calorimeters fall into this group. 

B1.27.4.2 CLASSIFICATION BY DESIGN

LIQUID CALORIMETERS 

A liquid serves as the calorimetric medium in which the reaction vessel is placed and facilitates the transfer of 
energy from the reaction. The liquid is part of the calorimeter (vessel) proper. The vessel may be isolated 
from the jacket (isoperibole or adiabatic), or may be in good thermal contact (heat-flow type) depending upon 
the principle of operation used in the calorimeter design. 

ANEROID (LIQUIDLESS) CALORIMETERS 

The reaction vessel is situated inside a metal of high thermal conductivity having a cylindrical, spherical, or 
other shape which serves as the calorimetric medium. Silver is the most suitable material because of its high 
thermal conductivity, but copper is most frequently used. 


COMBINED CALORIMETERS 

These are a combination of the liquid and aneroid types. 

B1.27.4.3 SELECTION OF METHOD OF MEASUREMENT

In designing a calorimeter, consideration must be given to the combination of the principle of its operation 
with the type of design and this depends on the ultimate goal. Thus, isoperibole calorimeters may be a liquid, 
aneroid, or a combined calorimeter type with a static or rotating vessel (dynamic). For adiabatically-jacketed 
calorimeters, one normally uses a combined or aneroid design that enables the creation of a vacuum between 
the unit and the jacket, which is essential for proper thermal isolation of the unit. Isothermal calorimeters 
require special consideration depending on their application. To characterize a calorimeter design, it is 
necessary to specify its type by all the classifications, for example, adiabatic aneroid static or isoperibole 
liquid dynamic calorimeter. 

The selection of the operating principle and the design of the calorimeter depend upon the nature of the
process to be studied and on the experimental procedures required. However, the type of calorimeter 
necessary to study a particular process is not unique and can depend upon subjective factors such as technical 
restrictions, resources, traditions of the laboratory and the inclinations of the researcher. 


B1.27.5 CALORIMETERS FOR SPECIFIC APPLICATIONS 

Various books and chapters in books are devoted to calorimeter design and specific applications of 
calorimetry. For several decades the Commission on Thermodynamics of the International Union of Pure and
Applied Chemistry (IUPAC) has been responsible for a series of volumes on experimental thermodynamics
and thermochemistry. Experimental Thermochemistry, volume I, published in 1956, edited by F D Rossini [2], 
dealt primarily with combustion calorimetry. Volume II published in 1962, edited by H A Skinner [3], 
primarily documented advances in combustion calorimetry since the first volume. In 1979 an update of much 
of the material covered in Experimental Thermochemistry, volumes I and II, was published under the title
Combustion Calorimetry with editors S Sunner and M Månsson [4]. The first volume in the series
Experimental Thermodynamics, Calorimetry of Non-Reacting Systems, edited by J P McCullough and D W 
Scott [5] was published in 1968. This volume covered the general principle of calorimeter design for non- 
reacting systems. It included a detailed discussion of adiabatic and drop calorimeters for the measurements of 
heat capacity, calorimeters for measurement of enthalpies of fusion and vaporisation, and calorimeters for the 
measurement of heat capacities of liquids and solutions close to room temperature. The second volume, 
Experimental Thermodynamics of Non-Reacting Systems, edited by B LeNeindre and B Vodar [6], published 
in 1975, was concerned with the measurement of a broader class of thermodynamic and transport properties 
over a wide range of temperature and pressure. A number of the techniques covered, such as density of a fluid 
as a function of temperature and pressure and speed of sound, allow the calculation of energy differences by 
non-calorimetric methods. Volume III, Measurement of Transport Properties of Fluids, edited by W A 
Wakeham, A Nagashima and J V Sengers [7], published in 1991, was concerned primarily with the 
measurement of the transport properties of fluids. Volume IV, Solution Calorimetry, edited by K N Marsh and 
P A G O'Hare [8] was published in 1994. This book covered calorimetric techniques for the measurement of
enthalpies of reaction of organic substances, heat capacity and excess enthalpy of mixtures of organic
compounds in both the liquid and gas phase, calorimetry of electrolyte solutions at high
temperature and pressure, microcalorimetric application in biological systems, titration calorimetry, and the
calorimetric determination of pressure effects. IUPAC Chemical Data Series No 32, Enthalpies of 
Vaporization of Organic Compounds by V Majer and V Svoboda [9], contains a detailed review of 
calorimeters used to measure enthalpy of vaporization. Other monographs dealing extensively with 
calorimetric techniques have been published. These include Specialist Periodical Reports, Chemical 
Thermodynamics, volume 1 [10], which covered combustion and reaction calorimetry, heat capacity of organic 
compounds, vapour-flow calorimetry and calorimetric methods at high temperature. Physical Methods of 
Chemistry, Volume VI, Determination of Thermodynamic Properties [11, 12] contains a chapter on
calorimetry and a chapter devoted to differential thermal methods including differential thermal calorimetry. 

B1.27.5.1 MEASUREMENT OF HEAT CAPACITY

The most important thermodynamic property of a substance is the standard Gibbs energy of formation as a 
function of temperature as this information allows equilibrium constants for chemical reactions to be 
calculated. The standard Gibbs energy of formation $\Delta_f G^\circ$ at 298.15 K can be derived from the enthalpy of
formation $\Delta_f H^\circ$ at 298.15 K and the standard entropy $\Delta S^\circ$ at 298.15 K from


$$\Delta_f G^\circ = \Delta_f H^\circ - T\Delta S^\circ.$$ (B1.27.17)

The enthalpy of formation is obtained from enthalpies of combustion, usually made at 298.15 K while the 
standard entropy at 298.15 K is derived by integration of the heat capacity as a function of temperature from T
= 0 K to 298.15 K according to equation (B1.27.16). The Gibbs-Helmholtz relation gives the variation of the
Gibbs energy with temperature

$$(\partial (G/T)/\partial T)_p = -H/T^2.$$ (B1.27.18)


Hence it is necessary to measure the heat capacity of a substance from near 0 K to the temperature required
for equilibrium calculations to derive the enthalpy as a function of temperature according to equation
(B1.27.15).
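
To illustrate how these quantities feed into equilibrium calculations, the sketch below evaluates equation (B1.27.17) and then an equilibrium constant from the standard relation K = exp(-ΔG°/RT). That relation is not written out in the text and is stated here as a well-known result; the input values are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(K mol)


def gibbs_energy(delta_h, delta_s, temperature=298.15):
    """Standard Gibbs energy change (J/mol), equation (B1.27.17)."""
    return delta_h - temperature * delta_s


def equilibrium_constant(delta_g, temperature=298.15):
    """Thermodynamic equilibrium constant from a standard Gibbs energy change."""
    return math.exp(-delta_g / (R * temperature))


# Hypothetical reaction: delta_H = -50 kJ/mol, delta_S = -100 J/(K mol)
dg = gibbs_energy(-50000.0, -100.0)
print(dg)                        # about -20185 J/mol
print(equilibrium_constant(dg))  # greater than 1 because dg is negative
```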


LOW TEMPERATURE HEAT CAPACITY 

For solids and non-volatile liquids accurate heat capacity measurements are generally made in an adiabatic 
calorimeter. A typical low temperature aneroid-type adiabatic calorimeter used to make measurements 
between 4 K and about 300 K is shown in figure B1.27.1. The primary function of the complex assembly is to
maintain the calorimeter proper at any desired temperature between 4 K and 300 K. The only energy gain 
should be from the addition of electrical energy during a measurement. The upper part of the calorimeter 
contains vessels for holding liquid nitrogen and helium that provide low temperature heat sinks. Construction 
materials are generally those having high thermal conductivity (e.g. copper) plated with reflectant material 
(e.g. chromium) to reduce radiant energy transfer. The calorimeter proper and its surrounding adiabatic shield 
are suspended by silk lines and can be raised to bring them into good thermal contact with the lower tank, 
thereby cooling the calorimeter. When the calorimeter proper has reached its desired temperature, thermal 
contact is broken by lowering the calorimeter and the adiabatic shield. Adiabatic conditions are maintained by 
keeping the temperature of the adiabatic shield at the temperature of the calorimeter and heat conduction is
minimized by maintaining a high vacuum ($10^{-3}$ Pa) inside the cryostat. The temperature is normally measured
with high precision using a calibrated platinum resistance thermometer. A major source of heat leak is through
the electrical leads. This can be minimized by tempering the leads as they pass through the nitrogen and
helium tanks and then bringing them to the calorimeter temperature with an electrical heater on the floating
ring. In operation, a known amount of electrical energy is added through the heater and the temperature rise
(usually of the order of 5 K) is measured to within $10^{-3}$ K. The temperature of the adiabatic shield is
automatically controlled to follow the temperature of the calorimeter. 



Figure B1.27.1. Aneroid-type cryostat for low-temperature adiabatic calorimeter: 1, 2, liquid nitrogen
transfer; 3, 4, 5, 6, liquid helium transfer parts; 7, brass vacuum jacket; 8, outer floating radiation shield; 9, 
liquid nitrogen tank; 10, liquid helium tank; 11, nitrogen radiation shield; 12, lead wire; 13, helium radiation 
shield; 14, adiabatic shield; 15, windlass; 16, helium exit connector; 17, copper shield for terminal block; 18, 
helium exit tube; 19, vacuum seal; 20, O-ring gasket; 21, cover plate; 22, coil spring; 23, helium vapour 
exchanger; 24, supporting braided silk line; 25, floating ring; 26, calorimeter assembly. (Reprinted with 
permission from 1968 Experimental Thermodynamics vol I (Butterworth).) 

HIGH TEMPERATURE HEAT CAPACITY


Adiabatic and drop calorimetry are the primary methods used to make measurements of heat capacity above 
room temperature. In drop calorimetry, a known mass of a sample at a known high temperature is dropped 
into a calorimeter vessel, usually close to room temperature and its temperature rise is measured. This method 
gives enthalpy differences, which are usually represented as a power series in the temperature. The equation 
can be differentiated to give the heat capacity. This method can be used to very high temperatures with 
moderate accuracy but it gives poor results when the sample undergoes phase transitions during the cooling 
process, since there may not be a complete transformation in the calorimeter. For systems with known phase
transitions adiabatic calorimetry is widely used. This technique is similar to that used in low temperature
calorimetry except that no cooling is required. An example of a high temperature adiabatic calorimeter is shown in
figure B1.27.2. The calorimeter proper is surrounded by an adiabatic shield, which in the figure is the outer silver cup.
Its temperature is controlled to be as close as possible to that of the calorimeter proper. The shield is 
surrounded by an inner and outer guard, which consist of multiple layers of thin aluminium. The inner guard 
is usually heated to a temperature close to the calorimeter temperature. Since the main heat loss mechanism at 
high temperature is radiation, the volume surrounding the calorimeter does not need to be evacuated. 
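
The step from a drop-calorimetry enthalpy series to a heat capacity, mentioned above, amounts to differentiating a polynomial term by term. The sketch below assumes a simple power series in T with coefficients a0, a1, a2, ... (real fits often include additional terms such as 1/T); the coefficients are hypothetical.

```python
def enthalpy_increment(coeffs, temperature):
    """H(T) - H(T_ref) from a power series sum(a_n * T**n)."""
    return sum(a * temperature ** n for n, a in enumerate(coeffs))


def heat_capacity_from_series(coeffs, temperature):
    """C_p(T) obtained by differentiating the enthalpy series term by term."""
    return sum(n * a * temperature ** (n - 1)
               for n, a in enumerate(coeffs) if n > 0)


# Hypothetical fit: H(T) - H(T_ref) = -7000 + 20*T + 0.01*T**2  (J/mol)
coeffs = [-7000.0, 20.0, 0.01]
print(heat_capacity_from_series(coeffs, 500.0))  # about 30 J/(K mol)
```

Because differentiation amplifies the scatter in the fitted enthalpies, heat capacities obtained this way are usually less accurate than the enthalpy increments themselves.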

Figure B1.27.2. Schematic vertical section of a high-temperature adiabatic calorimeter and associated
thermostat. (Reprinted with permission from 1968 Experimental Thermodynamics vol I (Butterworth).)

Two methods are generally used when operating an adiabatic calorimeter. In the continuous-heating method,
the calorimeter and shield are heated at a constant rate such that the temperature difference between the
calorimeter and shield is minimal. At predetermined temperatures the power and time are recorded, allowing
the average heat capacity to be determined. This method allows for rapid measurement and the control of the 
shield temperature is less demanding. In the intermittent-heating method a known amount of power is added 
for a known time and the temperature change measured. The control of the adiabatic shield is more difficult 
because of the sudden changes in the rate of change of the temperature of the calorimeter at the beginning and 
end of the heating period. However, it is essential to use the intermittent method when a solid-solid or
liquid-solid transition, or an annealing process, takes place.
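
For the continuous-heating method, the average heat capacity between successive recorded temperatures follows directly from the power and the elapsed time. The sketch below assumes constant power and negligible heat leak; the result is the total heat capacity of calorimeter plus sample, from which the empty-calorimeter value must still be subtracted. Names and numbers are hypothetical.

```python
def average_heat_capacities(power, checkpoints):
    """Average heat capacities between successive checkpoints.

    power       : constant heater power (W)
    checkpoints : list of (temperature_K, time_s) pairs, both increasing
    Returns a list of (mid_temperature_K, heat_capacity_J_per_K).
    """
    results = []
    for (t_a, time_a), (t_b, time_b) in zip(checkpoints, checkpoints[1:]):
        energy = power * (time_b - time_a)           # energy supplied (J)
        results.append((0.5 * (t_a + t_b), energy / (t_b - t_a)))
    return results


# Hypothetical record at 2 W: 300 K at 0 s, 310 K at 100 s, 320 K at 210 s
print(average_heat_capacities(2.0, [(300.0, 0.0), (310.0, 100.0), (320.0, 210.0)]))
# [(305.0, 20.0), (315.0, 22.0)]
```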

HEAT CAPACITY OF GASES 

The heat capacity of a gas at constant pressure $C_p$ is normally determined in a flow calorimeter. The
temperature rise is determined for a known power supplied to a gas flowing at a known rate. For gases at
pressures greater than about 5 MPa, Magee et al [13] have recently described a twin-bomb adiabatic
calorimeter to measure $C_V$.
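
The flow-calorimeter principle reduces to an energy balance: at steady state the power supplied equals the molar flow rate times the molar heat capacity times the temperature rise. The sketch below assumes no heat leak and a small enough temperature rise that $C_p$ can be treated as constant; names and numbers are hypothetical.

```python
def molar_heat_capacity_flow(power, molar_flow, delta_t):
    """Molar C_p (J/(K mol)) from a steady-state flow calorimeter.

    power      : electrical power supplied to the gas (W)
    molar_flow : molar flow rate of the gas (mol/s)
    delta_t    : steady-state temperature rise of the gas (K)
    """
    if molar_flow <= 0 or delta_t <= 0:
        raise ValueError("flow rate and temperature rise must be positive")
    return power / (molar_flow * delta_t)


# Hypothetical run: 0.29 W into 1 mmol/s of gas giving a 10 K rise
print(molar_heat_capacity_flow(0.29, 0.001, 10.0))  # about 29 J/(K mol)
```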

B1.27.5.2 COMBUSTION CALORIMETRY 


Combustion or bomb calorimetry is used primarily to derive enthalpy of formation values and measurements
are usually made at 298.15 K. Bomb calorimeters can be subdivided into three types: (1) static, where the
bomb or entire calorimeter (together with the bomb) remains motionless during the experiment; (2) rotating-bomb
calorimeters, where provision is made to rotate the bomb in the calorimetric media; and (3) entirely
rotating calorimeters, called dynamic. It is not necessary to use a rotating-bomb calorimeter for burning
conventional organic compounds (containing only C, H, O and N). A stainless steel bomb without a 
corrosion-proof metal lining is suitable. 

For burning organic substances containing heteroatoms of non-metals and metals, dynamic calorimeters of the 
combined or aneroid types are used. Liquid rotating-bomb calorimeters can also be used. For burning 
compounds containing halogens and sulphur, a bomb made of corrosion-resistant metal or lined with such a 
metal is generally used. The most resistant metal for the protection of the inner surface of a bomb used in 
combustion of chlorine-, sulphur- or bromine-containing organic compounds is tantalum, since it is very little 
affected by the products of combustion of these substances. Platinum can also be used as a protective layer, 
even though it is prone to react with the reaction products (e.g. Cl$_2$ + HCl + H$_2$O); the correction required to
account for the enthalpy of such a reaction can be made by the analysis of the quantity of platinum dissolved. 
To study comparatively slow bomb processes, an adiabatically jacketed calorimeter designed as a combined 
or aneroid type or an isothermal calorimeter can be used only if the reaction can be conducted under static 
conditions. 

STATIC BOMB CALORIMETER 

An example of a static bomb calorimeter used to measure energies of combustion in oxygen is shown in 
figure B1.27.3. The bomb is typically a heavy walled vessel capable of withstanding pressures of 20 MPa. A
precisely known mass of the material to be burnt is held in a small platinum cup and oxygen is added to a 
pressure of about 3 MPa. The bomb is either immersed in a known mass of water or suspended in an 
evacuated vessel (aneroid type). The material is ignited by passing a current through a thin platinum wire 
stretched between the two metal posts, which causes an attached cotton or polythene fuse to burn. A small 
amount of water is added to the calorimeter to ensure that any solution that is formed is sufficiently dilute to 
allow the small corrections associated with the various solution processes to be
calculated. The amount of material in the vessel is chosen to give a temperature rise of from 1 K to 3 K, for a
typical bomb immersed in about 3 kg H$_2$O. For an organic compound this corresponds to a mass of from 0.5 g
to 1.6 g. A static bomb calorimeter is used for substances containing only carbon, hydrogen, oxygen and
nitrogen giving only carbon dioxide, water and N$_2$ and possibly small amounts of HNO$_2$ and HNO$_3$. The
temperature rise is typically measured to between $10^{-4}$ K and $10^{-5}$ K and is measured with either a platinum
resistance or a quartz thermometer. In order to relate the temperature rise to the energy of combustion, the 
calorimeter constant (the amount of energy required to increase the temperature of the calorimeter by 1 K) 
must be known. This can be obtained either by direct electrical calibration or by burning a certified reference 
material whose energy of combustion has been determined in specifically designed calorimeters. Direct 
electrical calibration is not simple as it involves the installation of a high power electrical heater within the 
bomb. Measurements on reference materials are usually made at a National Standards Laboratory. Reference 
materials suitable for the calibration of various calorimeters have been recommended by the International 
Union of Pure and Applied Chemistry (IUPAC) [1]. Benzoic acid is the most used reference material and its
energy of combustion is known to about 1 part in 15,000. Involatile liquids or solids can usually be burnt 
directly. Volatile materials must be encapsulated in either plastic bags or glass ampoules. Combustion of these 
samples requires the addition of a known mass of an auxiliary material, which is usually an involatile oil 
whose energy of combustion is known. 
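
Once the calorimeter constant is known, the specific energy of combustion of a sample follows from the corrected temperature rise after subtracting the contributions of the fuse and any auxiliary material. The sketch below deliberately omits the smaller corrections (nitric acid formation, reduction to standard states), and all names and numbers are hypothetical.

```python
def specific_energy_of_combustion(calorimeter_constant, delta_t, mass_sample,
                                  fuse_energy=0.0, mass_aux=0.0,
                                  aux_specific_energy=0.0):
    """Specific energy of combustion (J/g, negative for an exothermic process).

    calorimeter_constant : J/K, from calibration
    delta_t              : corrected temperature rise (K)
    mass_sample          : mass of sample burnt (g)
    fuse_energy          : energy released by fuse and ignition (J)
    mass_aux             : mass of auxiliary material (g)
    aux_specific_energy  : magnitude of its specific energy of combustion (J/g)
    """
    if mass_sample <= 0:
        raise ValueError("sample mass must be positive")
    total = calorimeter_constant * delta_t           # total energy released (J)
    from_sample = total - fuse_energy - mass_aux * aux_specific_energy
    return -from_sample / mass_sample


# Hypothetical run with 0.1 g of auxiliary oil at 46 000 J/g
u = specific_energy_of_combustion(10000.0, 2.0, 0.5, fuse_energy=50.0,
                                  mass_aux=0.1, aux_specific_energy=46000.0)
print(u)  # about -30700 J/g
```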



Figure B1.27.3. Typical static combustion bomb. (Reproduced with permission from A Gallencamp & Co.
Ltd.) 

ROTATING BOMB CALORIMETER 

For substances containing elements additional to C, H, O and N a rotating bomb calorimeter is generally used. 
A typical rotating bomb calorimeter system is shown in figure B1.27.4. With this calorimeter considerably
more water is added to the combustion bomb and the continuous rotation of the bomb both about the
cylindrical axis and end over
end ensures that the final solution is homogeneous and in equilibrium with the gaseous products. At the
completion of the experiment this solution is withdrawn and analysed. Substances containing S, Si, P and the 
halogens can be studied in such calorimeters. To ensure that the products are in a known oxidation state it is 
generally necessary to add small quantities of reducing agents or other materials. 



Figure B1.27.4. Rotating bomb isoperibole calorimeter. A, stainless steel bomb, platinum lined; B, heater; C,
thermostat can; D, thermostat inner wall; E, thermostat water; G, sleeve for temperature sensor; H, motor for 
bomb rotation; J, motor for calorimeter stirrer; K, connection to cooling or heating unit for thermostat; L, 
circulation pump. 

Precision combustion measurements are primarily made to determine enthalpies of formation. Since the 
combustion occurs at constant volume, the value determined is the energy change $\Delta_c U$. The enthalpy of
combustion $\Delta_c H$ can be calculated from $\Delta_c U$, provided that the change in the pressure within the calorimeter
is known. This change can be calculated from the change in the number of moles in the gas phase and 
assuming ideal gas behaviour. Enthalpies of formation of compounds that do not readily burn in oxygen can 
often be determined by combusting in fluorine and the enthalpy of formation of volatile substances can be 
determined using flame calorimetry. For compounds that only combust at an appreciable rate at high 
temperature, such as zirconium in chlorine, the technique of hot-zone calorimetry is used. In this method only
the sample is heated, very rapidly, with a known amount of energy until it reaches a temperature where
combustion will occur. Alternatively, a well characterized material such as benzoic acid can be used as an 
auxiliary material which, when it burns, raises the temperature sufficiently for the material to combust. These 
methods have been discussed in detail [2, 3 and 4]. 
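
The conversion from the constant-volume energy of combustion to the enthalpy of combustion, described above, can be sketched with the ideal-gas relation: the enthalpy equals the energy plus RT times the change in the amount of gas. The combustion energy used below is an illustrative round number, not a certified value.

```python
R = 8.314462618  # molar gas constant, J/(K mol)


def enthalpy_of_combustion(delta_c_u, delta_n_gas, temperature=298.15):
    """Molar enthalpy of combustion (J/mol), assuming ideal gas behaviour.

    delta_c_u   : molar energy of combustion at constant volume (J/mol)
    delta_n_gas : change in amount of gas per mole of reaction (mol/mol)
    """
    return delta_c_u + delta_n_gas * R * temperature


# Benzoic acid: C7H6O2(s) + 7.5 O2(g) -> 7 CO2(g) + 3 H2O(l), so delta_n_gas = -0.5
# The energy value is illustrative only.
dh = enthalpy_of_combustion(-3227000.0, -0.5)
print(dh)  # slightly more negative than -3227000 J/mol
```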

B1.27.5.3 ENTHALPIES OF PHASE CHANGE

Accurate enthalpies of solid-solid transitions and solid-liquid transitions (fusion) are usually determined in an 
adiabatic heat capacity calorimeter. Measurements of lower precision can be made with a differential scanning 
calorimeter (see later). Enthalpies of vaporization are usually determined by the measurement of the amount 
of energy required to vaporize a known mass of sample. The various measurement methods have been 
critically reviewed by Majer and Svoboda [9]. The actual technique used depends on the vapour pressure of 
the material. Methods based on
vaporization into a vacuum are best suited for pressures from about 25 kPa down to $10^{-4}$ Pa. This method has
been extensively developed by Sunner's group in Lund [14]. The most recent design allows measurement on
samples down to 5 mg over a temperature range from 300 K to 423 K with an accuracy of about 1%. Methods
based on vaporization into a steady stream of carrier gas, useful in the range 0.05 Pa to 25 kPa, have also been
developed in Lund under Wadsö [15] and gas flow cells based on their designs are commercially available.
The method is accurate to between 0.2 and 0.5%. Methods based on vaporization into a closed system, useful
in the range 5 kPa to 3 MPa, fall into two types: recycle and controlled withdrawal. Both types can give an
accuracy approaching 0.1%. Methods based on recycle often contain a second calorimeter to determine the 
heat capacity of the flowing gas. 

B1.27.5.4 SOLUTION CALORIMETRY

Solution calorimetry covers the measurement of the energy changes that occur when a compound or a mixture 
(solid, liquid or gas) is mixed, dissolved or adsorbed in a solvent or a solution. In addition it includes the 
measurement of the heat capacity of the resultant solution. Solution calorimeters are usually subdivided by the 
method in which the components are mixed, namely, batch, titration and flow. 

BATCH CALORIMETERS 

Batch calorimeters are instruments where there is no flow of matter in or out of the calorimeter during the 
time the energy change is being measured. Batch calorimeters differ in the way the reactants are mixed and in 
the method used to determine the enthalpy change. Enthalpy changes can be measured by the various methods
outlined above: isothermal, adiabatic, heat flow or isoperibole. It is necessary to have the reactants separated
in the calorimeter. The most common method is to maintain one of the reactants in an ampoule that is broken 
to release its contents, which initiates the reaction. Initially, thin walled glass ampoules were used but these 
usually required the narrow neck to be flame-sealed after the contents were added. In recent years there have 
been significant improvements in ampoule design. An ampoule particularly suited for solids consists of a 
stainless steel cylinder with replaceable thin glass windows at each end. The cylinder can be taken apart with 
one half forming a cup into which the solid can be added and weighed. Wadsö and coworkers have developed
a variety of ampoules that attach to the stirrer. The glass window is broken by depressing the stirrer so as to 
impinge against an ampoule-breaking pin. This technique ensures good mixing of the reactants. A typical 
solution calorimeter is shown in figure B1.27.5.

Ampoules are satisfactory when the presence of a vapour space is not important. When volatile organic 
compounds are mixed in the presence of a vapour space there can be a considerable contribution to the 
measured heat effect from the vaporization. This results from the change in the vapour composition that 
occurs so as to maintain vapour-liquid equilibrium with the liquid mixture. For a small enthalpy of mixing, 
this correction can be greater than the enthalpy of mixing itself. A batch method suitable for the measurement 
of enthalpies of mixing in the absence of a vapour space is shown in figure B1.27.6. Known masses of liquids
A and B are separately confined over mercury and are mixed by rotation of the entire calorimeter. When 
liquids are mixed at constant pressure there is a volume change on mixing. The side arm C, which is partially 
filled with mercury, allows for the expansion or contraction of the mixture against the air space D. The 
calorimeter operates in the isoperibole mode. 

Figure B1.27.5. A typical solution calorimeter with thermometer, heater and an ampoule on the base of the
stirrer which is broken by depressing it against the ampoule breaker. (Reproduced with permission from
Sunner S and Wadsö I 1959 Acta. Chem. Scand. 13 97.)


Reviews of batch calorimeters for a variety of applications are published in the volume on Solution 
Calorimetry [8]: cryogenic conditions by Zollweg [22], high temperature molten metals and alloys by Colinet
and Pasturel [19], enthalpies of reaction of inorganic substances by Cordfunke and Ouweltjes [16], electrolyte
solutions by Simonson and Mesmer [24], and aqueous and biological systems by Wadsö [25].

Figure B1.27.6. A calorimeter for enthalpies of mixing in the absence of a vapour space. (Reproduced with
permission from Larkin J A and McGlashan M L 1961 J. Chem. Soc. 3245.)

TITRATION OR DILUTION CALORIMETERS


In titration or dilution calorimetry one fluid is added from a burette, usually at a well-defined rate, into a
calorimeter containing the second fluid. One titration is equivalent to many batch experiments. Calorimeters 
are typically operated in either the isoperibole or isothermal mode. The method has been extensively 
developed by Izatt, Christensen and co-workers at Brigham Young University for the measurement of 
formation constants and enthalpies of reaction for a variety of organic and inorganic compounds. This 
technique is described in detail by Oscarson et al [23]. A titration calorimeter typically has a vapour space, 
which for large enthalpies of reaction or solution in a solvent with a low vapour pressure does not give rise to 
significant errors. The vapour space can be eliminated by filling the cell completely so that, as the titration
proceeds, the excess liquid flows out of the calorimeter into a reservoir. The calculation of the enthalpy
change for such a procedure is complex. To measure enthalpies of mixing of organic liquids, Stokes and
Marsh [20] have used an alternative method that eliminates both the vapour space and the effect of volume
changes on mixing. Their isothermal dilution calorimeter is shown in figure B1.27.7. The calorimeter proper,
made from either stainless steel or glass, contains a stirrer, a sealed heater, a thermistor to measure the 
temperature, and a silver rod connected to a Peltier device. A known volume of mercury is added from a
pipette and one component is added to completely fill the calorimeter. The calorimeter is then brought to
isothermal conditions within $10^{-3}$ K. For endothermic reactions the Peltier device removes energy at a rate
sufficient to counterbalance the power introduced from the stirrer. The second component is then injected into
the calorimeter from a burette, displacing the mercury. Electrical power is added to maintain the calorimeter
approximately isothermal. Usually the rate of addition of the second component is adjusted so that the
calorimeter remains isothermal to within $10^{-3}$ K during the addition. The injection is stopped at selected
intervals and the calorimeter brought back to isothermal conditions by adding either more electrical power
or more liquid from the burette. The volume of the mercury pipette is such that one run,
which comprises about 20 individual measurements, covers over half the composition range. The components
are then interchanged to determine the second half of the curve. The two runs should give results overlapping
to within 1%. For exothermic systems the Peltier device is run at high power and the energy removed is
counterbalanced by the addition of electrical energy to maintain the calorimeter isothermal. When the second
component is added, the electrical power is turned off for known periods of time. This calorimeter has been
modified to measure enthalpies of solution of gases in liquids and Stokes [20] has described a version of this
calorimeter that uses the overflow technique. Titration and dilution calorimeters have the disadvantage that
they are difficult to operate at high pressures or at temperatures considerably removed from ambient. Flow
calorimetry does not suffer these disadvantages.

Figure B1.27.7. Schematic diagram of isothermal displacement calorimeter: A, glass calorimeter cell; B,
sealed heater; C, stainless steel stirrer; D, thermistor; E, inlet tube; F, valve; G, window shutters; H, silver rod;
I, thermoelectric cooler; J, small ball valves; K, levelling device. (Reproduced by permission from Costigan M
J, Hodges L J, Marsh K N, Stokes R H and Tuxford C W 1980 Aust. J. Chem. 33 2103.)

FLOW CALORIMETERS 

In a flow calorimeter one or more streams flow in and out of the calorimeter. Flow calorimeters are suited for 
measurements over a wide range of temperatures and pressures on both liquids and gases. A wide pressure 
range is possible because measurements are usually made in a small bore tube, and a wide temperature range is
feasible because the in-flowing material can be readily brought to the calorimeter temperature by heat
exchange with the out-flowing fluid. In a flow experiment there is generally no vapour space and changes in
volume on mixing are inconsequential. Flow methods are not suitable for measurements involving solids, and
usually large volumes of materials are required.

Various flow calorimeters are available commercially. Flow calorimeters have been used to measure heat
capacities, enthalpies of mixing of liquids, enthalpies of solution of gases in liquids and reaction enthalpies.
Detailed descriptions of a variety of flow calorimeters are given in Solution Calorimetry by Grolier [17], by
Albert and Archer [18], by Ott and Wormald [21], by Simonson and Mesmer [24] and by Wadsö [25].

A flow calorimeter developed by Picker suitable for the measurement of heat capacity of a liquid is shown in
figure B1.27.8. The method measures the difference in heat capacity between the fluid under study and some
reference fluid. The apparatus contains two thermistors $T_1$ and $T_2$ used to measure the temperature change
that occurs when the flowing fluid is heated by two identical heaters $Z_1$ and $Z_2$. The standard procedure is to
flow the reference material from A through both cells. With the same fluid, same power and same flow rate
the temperature change $\Delta T$ should be the same in both cells. The temperature difference observed on flowing
the sample material from B through cell $C_1$ while the reference is still flowing through $C_2$ is a measure of the
heat capacity difference between the two liquids. The flow method has been extensively developed for
measurements on biological systems and on liquid mixtures at high temperatures and pressures. The apparatus
constructed by Christensen and Izatt, shown in figure B1.27.9, can be used to measure positive and negative
enthalpy changes at pressures up to 40.5 MPa and temperatures up to 673 K. Two high pressure pumps were
used for the fluid flow. Mixing occurs in the top half of the isothermal cylinder where the fluids from the two
pumps meet. A control heater encircles the cylinder, which is attached by three heat-leak rods to a base plate
maintained 1 K below the cylinder temperature. A pulsed electrical current is passed through the control
heater to maintain the temperature of the cylinder the same as that of the walls of the oven. During mixing the
frequency of the pulses is either increased or decreased depending on the size of the enthalpy of mixing.
Commercial calorimeters are available based on both the Picker and Christensen et al designs.

Figure B1.27.8. Schematic view of Picker's flow microcalorimeter. A, reference liquid; B, liquid under study;
P, constant flow circulating pump; $Z_1$ and $Z_2$, Zener diodes acting as heaters; $T_1$ and $T_2$, thermistors acting as
temperature sensing devices; F, feedback control; N, null detector; R, recorder; Q, thermostat. In the above A
is the reference liquid and $C_2$ is the reference cell. When B circulates in cell $C_1$ this cell is the working cell.
(Reproduced by permission from Picker P, Leduc P-A, Philip P R and Desnoyers J E 1971 J. Chem. Thermo.
B41.)

Figure B1.27.9. High-temperature heat-leak calorimeter. (Reproduced by permission from Christensen J J
and Izatt R M 1984 An isothermal flow calorimeter designed for high-temperature, high-pressure operation
Thermochim. Acta 73 117-29.)
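
The comparison made in the Picker calorimeter can be illustrated with an idealized energy balance. The sketch below assumes, which the text does not state, that all of the heater power W is taken up by the flowing liquid, so that a stream with volumetric flow rate f and volumetric heat capacity $\sigma = \rho c_p$ warms by $\Delta T = W/(f\sigma)$. At equal power and flow rate the ratio of the two temperature rises then gives the ratio of the volumetric heat capacities. The function names and all numerical values are hypothetical.

```python
def temperature_rise(power_w, flow_cm3_s, sigma_j_cm3_k):
    """Idealized steady-state temperature rise (K) of a heated liquid stream.

    Assumes every joule from the heater ends up in the liquid (no heat leaks).
    """
    return power_w / (flow_cm3_s * sigma_j_cm3_k)


def sample_volumetric_heat_capacity(sigma_ref, dT_ref, dT_sample):
    """Volumetric heat capacity of the sample from the two temperature rises.

    Valid only when both cells receive the same power at the same flow rate.
    """
    return sigma_ref * dT_ref / dT_sample


# Illustrative numbers only (not taken from the text).
power = 0.020        # heater power, W
flow = 0.010         # volumetric flow rate, cm^3 s^-1
sigma_ref = 4.17     # reference liquid, J cm^-3 K^-1 (roughly water)
sigma_true = 3.90    # "unknown" sample used to generate synthetic data

dT_ref = temperature_rise(power, flow, sigma_ref)
dT_sample = temperature_rise(power, flow, sigma_true)
recovered = sample_volumetric_heat_capacity(sigma_ref, dT_ref, dT_sample)
```

A liquid with the smaller volumetric heat capacity shows the larger temperature rise, which is why the temperature difference between the cells reports on the heat capacity difference.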


B1.27.6 DIFFERENTIAL SCANNING CALORIMETRY 

Boerio-Goates and Callanan [12] have recently reviewed different thermal methods, including differential
scanning calorimetry. A differential scanning calorimeter (DSC) consists of two similar cells containing a
sample and a reference material. In one type of DSC both cells are subjected to a controlled temperature
change by applying power to separate heaters and the temperature difference between the sample and
reference cells is observed. When an endothermic change occurs in the sample, for example, melting, the
sample temperature lags behind that of the reference. In a power-compensated DSC the power to the sample
cell is increased to keep the heating rate of the sample and reference cells the same. A schematic diagram of a
typical DSC is shown in figure B1.27.10. Another type of calorimeter, developed initially by Tian and Calvet,
is also considered a differential scanning calorimeter but is called a heat-flux or heat-conduction calorimeter.
A schematic diagram of this type of calorimeter is shown in figure B1.27.11. In this calorimeter two sets of
thermopiles, consisting of multiple junction thermocouples, connect both cells to a large block enclosing the
sample. The output of the two thermopiles, when connected in opposition, gives a measure of the difference in
energy flows between the sample and reference when both are heated at the same rate. Both types of DSC can
be used to measure heat capacities, enthalpies of phase change, adsorption, dehydration, reaction and
polymerization. The major advantages of DSC are the rapidity of the measurements, the small sample
requirement, and the ready availability of commercial equipment that can be operated by unskilled personnel
from liquid nitrogen temperatures to well above 1000 K. With the majority of DSC instruments it is
possible to obtain heat capacities with an accuracy approaching 2-3%, provided one uses the optimum sample
size and scan rate along with careful calibration of the temperature scale and the calorimetric response.
Reference materials are used to calibrate a DSC and to check the correct operating conditions. Differential
scanning calorimeters have been developed for specific applications. A very precise calorimeter, developed by
Privalov and coworkers [17] and now available commercially, has been used to measure the heat capacity of
very small amounts of biological materials in aqueous solutions.
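
As a rough illustration of how a heat capacity follows from a scanning experiment, the sketch below uses the elementary relation that a sample of mass m heated at a rate $\beta = \mathrm{d}T/\mathrm{d}t$ absorbs heat at a rate $m c_p \beta$ in excess of the empty-cell baseline. This relation and the baseline subtraction are assumptions of the example, not a description of any particular instrument, and the numbers are invented.

```python
def specific_heat_capacity(heat_flow_sample_w, heat_flow_baseline_w,
                           mass_g, scan_rate_k_per_min):
    """Specific heat capacity (J g^-1 K^-1) from a DSC heat-flow signal.

    heat_flow_sample_w   : heat flow with the sample in place, W
    heat_flow_baseline_w : heat flow of the empty-cell baseline run, W
    mass_g               : sample mass, g
    scan_rate_k_per_min  : heating rate, K min^-1
    """
    beta = scan_rate_k_per_min / 60.0  # convert to K s^-1
    return (heat_flow_sample_w - heat_flow_baseline_w) / (mass_g * beta)


# Hypothetical run: 10 mg sample scanned at 10 K min^-1.
cp = specific_heat_capacity(2.30e-3, 0.50e-3, 0.0100, 10.0)
```

In practice the calorimetric response is calibrated against reference materials, as noted above, rather than taken directly from the raw signal.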

Figure B1.27.10. Schematic diagram of a power-compensated DSC. (Labelled components: temperature sensors;
individual heater.)

Figure B1.27.11. Schematic diagram of a Tian-Calvet heat-flux or heat-conduction calorimeter.

B1.27.7 ACCELERATING RATE CALORIMETRY


Special calorimeters have been developed to make thermal hazard evaluations. In an exothermic chemical 
reaction there is the possibility of a runaway reaction occurring where the energy released from the reaction 
increases the temperature with a consequent increase in the reaction rate, thus increasing the release of energy. 
If there are insufficient resources to remove the generated energy, hazardous temperature and pressure 
regimes can be encountered. An accelerating rate calorimeter or ARC, initially developed at Dow Chemical, 
is available commercially to study such thermal hazards. A schematic diagram is shown in figure B1.27.12.
The reaction vessel consists of a spherical bomb that can withstand pressures greater than 20 MPa and
temperatures to 770 K. The calorimeter operates adiabatically under computer control, following a
heat-wait-seek procedure. After the reaction comes to thermal equilibrium the rate of temperature rise due to
the reaction is determined. If this is less than a preset value the calorimeter temperature is increased in steps
and the process repeated until the reaction rate is sufficient to give the preset rate of temperature rise. The
chemical reaction then proceeds at its own rate and the temperature and pressure are recorded. From these
measurements the kinetic parameters are determined and used to establish the conditions that could lead to a
runaway reaction. A problem with this calorimeter is that the massive vessel required to withstand the pressure
has a heat capacity well in excess of the heat capacity of the reactants. This problem can be overcome by using
a thin-walled vessel within the bomb, with the pressure in the space between the reaction vessel and the bomb
automatically controlled to match the pressure in the calorimeter.
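
The heat-wait-seek procedure can be sketched in a few lines. In a real instrument the self-heating rate is measured; here it is generated from a hypothetical first-order reaction with Arrhenius parameters A and Ea and an adiabatic temperature rise dT_ad, all of which are invented for illustration. The sketch also assumes that no reactant is consumed during the search and that the sample and vessel are perfectly adiabatic once the exotherm is detected.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def rate_constant(T, A, Ea):
    """Arrhenius first-order rate constant (s^-1) at temperature T (K)."""
    return A * math.exp(-Ea / (R * T))


def self_heating_rate(T, conversion, A, Ea, dT_ad):
    """Adiabatic self-heating rate (K s^-1) of the hypothetical sample."""
    return dT_ad * rate_constant(T, A, Ea) * (1.0 - conversion)


def heat_wait_seek(T_start, T_max, step, threshold, A, Ea, dT_ad):
    """Step the temperature upward until self-heating reaches the threshold.

    Each pass stands for: heat to T, wait for thermal equilibrium, then seek
    (compare the self-heating rate with the preset value). Returns the
    detected onset temperature, or None if no exotherm is found below T_max.
    """
    T = T_start
    while T <= T_max:
        if self_heating_rate(T, 0.0, A, Ea, dT_ad) >= threshold:
            return T
        T += step
    return None


def track_exotherm(T_onset, A, Ea, dT_ad, dt=1.0, t_max=1.0e5):
    """Follow the reaction adiabatically from the onset (explicit Euler)."""
    T, x, t = T_onset, 0.0, 0.0
    history = [(t, T)]
    while x < 0.999 and t < t_max:
        # Conversion gained in this time step, capped at complete reaction.
        dx = min(rate_constant(T, A, Ea) * (1.0 - x) * dt, 1.0 - x)
        x += dx
        T += dT_ad * dx
        t += dt
        history.append((t, T))
    return history


# Hypothetical sample and settings.
A, Ea, dT_ad = 1.0e10, 1.0e5, 200.0   # s^-1, J mol^-1, K
threshold = 0.02 / 60.0               # 0.02 K min^-1 expressed in K s^-1
onset = heat_wait_seek(300.0, 500.0, 5.0, threshold, A, Ea, dT_ad)
history = track_exotherm(onset, A, Ea, dT_ad)
```

The recorded temperature-time history is the kind of data from which kinetic parameters would then be fitted. The heat capacity of the vessel, discussed above, is ignored here; including it would lower both the observed rate and the final temperature.
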

Figure B1.27.12. Schematic diagram of an accelerating rate calorimeter (ARC).

B1.27.8 SPECIALIZED CALORIMETERS

Ultrasensitive Calvet-type microcalorimeters are available commercially to measure the deterioration of
materials over relatively long periods. For example, the lifetime of a battery can be estimated from the energy
released when it is placed in such a calorimeter on open circuit [27]. Similarly, the shelf life of drugs and other
digestible products can be evaluated from the very small calorimetric response that results from
decomposition reactions [28]. Calorimeters have also been developed to measure the heat effects associated
with the uptake of oxygen and carbon dioxide in living plants. Such measurements have been used to identify 
species that exhibit high rates of metabolism [29]. Calorimeters are also used to measure the rate of enzyme 
reaction and as a clinical tool to identify micro-organisms and test the effect of drugs in inhibiting the growth 
of such micro-organisms. 


B1.27.9 RECENT DEVELOPMENTS 

Recent developments in calorimetry have focused primarily on the calorimetry of biochemical systems, with 
the study of complex systems such as micelles, proteins and lipids using microcalorimeters. Over the last 20 
years microcalorimeters of various types including flow, titration, dilution, perfusion calorimeters and 
calorimeters used for the study of the dissolution of gases, liquids and solids have been developed. A more
recent development is pressure-controlled scanning calorimetry [26], in which the thermal effects resulting from
varying the pressure on a system, either step-wise or continuously, are studied.

B1.28 Electrochemical methods

Alexia W E Hodgson 


B1.28.1 INTRODUCTION 

Electrochemical methods may be classified into two broad classes, namely potentiometric methods and 
voltammetric methods. The former involves the measurement of the potential of a working electrode 
immersed in a solution containing a redox species of interest with respect to a reference electrode. These are 
equilibrium experiments involving no current flow and provide thermodynamic information only. The 
potential of the working electrode responds in a Nernstian manner to the activity of the redox species, whilst 
that of the reference electrode remains constant. In contrast, voltammetric methods perturb the system: the
electrode potential or the current is controlled as the independent variable, and the resulting current or
potential is measured.
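
The Nernstian response mentioned above can be made concrete with a short sketch. It uses the standard form of the Nernst equation, $E = E^0 + (RT/nF)\ln(a_\mathrm{O}/a_\mathrm{R})$; the function name, the symbol E0 for the formal potential and the example activities are choices made for this illustration.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def nernst_potential(E0, n, a_ox, a_red, T=298.15):
    """Equilibrium electrode potential (V) for the couple O + n e- = R.

    E0    : formal potential of the couple, V
    n     : number of electrons transferred
    a_ox  : activity of the oxidized species
    a_red : activity of the reduced species
    T     : temperature, K
    """
    return E0 + (R * T / (n * F)) * math.log(a_ox / a_red)


# Equal activities give the formal potential; a tenfold excess of the
# oxidized form shifts a one-electron couple by about 59 mV at 298 K.
E_equal = nernst_potential(0.200, 1, 1.0e-3, 1.0e-3)
E_tenfold = nernst_potential(0.200, 1, 1.0e-2, 1.0e-3)
```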

Voltammetric methods may be further subdivided into transient experiments, in which the current and potential
vary with time in a non-repetitive fashion; steady-state experiments, in which a unique interrelation between
current and potential is generated that does not involve time or frequency and in which the steady-state current
achieved is independent of the method adopted; and periodic experiments, in which current and potential vary
periodically with time at some imposed frequency.

In this chapter, transient techniques, steady-state techniques, electrochemical impedance, 
photoelectrochemistry and spectroelectrochemistry are discussed. 


B1.28.2 INTRODUCTION TO ELECTRODE REACTIONS 

Electrode processes are a class of heterogeneous chemical reaction that involves the transfer of charge across 
the interface between a solid and an adjacent solution phase, either in equilibrium or under partial or total 
kinetic control. A simple type of electrode reaction involves electron transfer between an inert metal electrode 
and an ion or molecule in solution. Oxidation of an electroactive species corresponds to the transfer of 
electrons from the solution phase to the electrode (anodic), whereas electron transfer in the opposite direction 
results in the reduction of the species (cathodic). Electron transfer is only possible when the electroactive 
material is within molecular distances of the electrode surface; thus for a simple electrode reaction involving 
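solution species of the form

\[ \mathrm{O} + n\mathrm{e}^- \rightleftharpoons \mathrm{R} \]

in which species O is reduced at the electrode surface to species R by the transfer of n electrons, the overall
conversion may be divided into three steps [1, 3]:

\[ \mathrm{O_{bulk}} \xrightarrow{\text{mass transport}} \mathrm{O_{electrode}} \]
\[ \mathrm{O_{electrode}} \xrightarrow{\text{electron transfer}} \mathrm{R_{electrode}} \]
\[ \mathrm{R_{electrode}} \xrightarrow{\text{mass transport}} \mathrm{R_{bulk}} \]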

The scheme involves the transport of the electroactive species from the bulk solution to the electrode surface, 
where it can undergo electron transfer, thus forming the reduced species R at the electrode surface. Finally,
the reduced species is transported from the electrode surface back to the bulk solution. The overall reaction
rate will be limited by the slowest step; a particular reaction might therefore be controlled either by the
kinetics of electron transfer or by the rate at which material is brought to or from the electrode surface. The
rate of electron transfer can be experimentally controlled through the electrode potential imposed and can vary
by several orders of magnitude in a small potential interval. For the steps involving the transport of species to
and from the electrode surface there are three distinct modes of mass transport that can occur:
diffusion, migration and convection. 


The nature of electrode processes can, of course, be more complex and also involve phase formation, 
homogeneous chemical reactions, adsorption or multiple electron transfer [1, 2, 3 and 4]. 

B1.28.2.1 ELECTRON TRANSFER

For a simple electron transfer reaction containing low concentrations of a redox couple in an excess of
electrolyte, the potential established at an inert electrode under equilibrium conditions will be governed by the
Nernst equation and the electrode will take up the equilibrium potential $E_{\mathrm{e}}$ for the couple O/R. In terms of
current density, the dynamic situation at the electrode surface is expressed by $j = \vec{j} + \overleftarrow{j} = 0$, the sum of the
partial cathodic and partial anodic current densities, which have opposite signs and the magnitude of which at
the equilibrium potential is defined as $j_0 = -\vec{j} = \overleftarrow{j}$. The exchange current density, $j_0$, is a measure of the
amount of electron transfer activity at the equilibrium potential. On applying a potential to the electrode, the
system will seek to move towards a new equilibrium where the concentrations of the electroactive species are
those demanded by the Nernst equation for the applied potential, and an associated current of reduction or
oxidation will flow. The rate of electron transfer can be described by classical kinetics, and hence expressed
by the product of a rate constant with the concentration of the reactant at the electrode surface. The rate of the
heterogeneous electron transfer will depend on the potential gradient at the interface driving the transfer of
electrons between the electrode and the solution phases and in general will take the form

\[ \vec{k} = k_0 \exp\left(-\frac{\alpha_C nF}{RT}E\right) \quad\text{and}\quad \overleftarrow{k} = k_0 \exp\left(\frac{\alpha_A nF}{RT}E\right) \]

for a reduction and an oxidation process, respectively. $\alpha_C$ and $\alpha_A$ are the cathodic and anodic transfer
coefficients, F is the Faraday constant and $k_0$ the standard rate constant [1, 2, 3 and 4]. By substituting and
defining the overpotential, $\eta = E - E_{\mathrm{e}}$, as the deviation of the potential from the equilibrium value, the
Butler-Volmer equation for current density may be derived:

\[ j = j_0\left[\exp\left(\frac{\alpha_A nF}{RT}\eta\right) - \exp\left(-\frac{\alpha_C nF}{RT}\eta\right)\right] \]

which represents the fundamental equation of electrode kinetics. The equation may be simplified for the
limiting cases in which very high positive or very high negative overpotentials are applied, leading to the
Tafel equations, which provide a simple method for determining exchange current density and transfer
coefficient (figure B1.28.1). For very low overpotentials the equation simplifies to $j = j_0 (nF/RT)\eta$, indicating
that very close to the equilibrium potential, the current density varies linearly with overpotential.
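
The two limiting cases can be checked numerically. The sketch below evaluates the Butler-Volmer expression alongside its high-overpotential (Tafel) and low-overpotential (linear) forms. The function names and the default values (transfer coefficients of 0.5, one electron, 298.15 K) are assumptions made for the example; the linear form additionally relies on the transfer coefficients summing to 1.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def butler_volmer(eta, j0, alpha_a=0.5, alpha_c=0.5, n=1, T=298.15):
    """Net current density from the Butler-Volmer equation (anodic positive)."""
    f = n * F / (R * T)
    return j0 * (math.exp(alpha_a * f * eta) - math.exp(-alpha_c * f * eta))


def tafel_anodic(eta, j0, alpha_a=0.5, n=1, T=298.15):
    """High positive overpotential limit: the cathodic term is neglected."""
    return j0 * math.exp(alpha_a * n * F * eta / (R * T))


def linear_limit(eta, j0, n=1, T=298.15):
    """Low overpotential limit, valid when alpha_a + alpha_c = 1."""
    return j0 * n * F * eta / (R * T)


j0 = 1.0e-4  # hypothetical exchange current density, A cm^-2

# At +300 mV the full expression and the Tafel form agree closely;
# at +1 mV the full expression and the linear form agree closely.
j_high, j_tafel = butler_volmer(0.300, j0), tafel_anodic(0.300, j0)
j_low, j_lin = butler_volmer(0.001, j0), linear_limit(0.001, j0)
```

Plotting the logarithm of the Tafel form against overpotential gives a straight line whose intercept yields the exchange current density and whose slope yields the transfer coefficient, which is the basis of the Tafel plot.
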

Figure B1.28.1. Schematic Tafel plot for the experimental determination of $j_0$ and $\alpha$.

B1.28.2.2 MASS TRANSPORT

Diffusion, convection and migration are the forms of mass transport that contribute to the essential supply and 
removal of material to and from the electrode surface [1, 2, 3 and 4]. 

Diffusion may be defined as the movement of a species due to a concentration gradient, which seeks to
maximize entropy by overcoming inhomogeneities within a system. The rate of diffusion of a species, the
flux, at a given point in solution is dependent upon the concentration gradient at that particular point and was
first described by Fick in 1855, who considered the simple case of linear diffusion to a planar surface:

\[ \text{Flux} = -D\,\frac{\partial c_{\mathrm{O}}}{\partial x} \]

where $\partial c_{\mathrm{O}}/\partial x$ is the concentration gradient and $D$ is the diffusion coefficient (figure B1.28.2). The flux of
species to and from the electrode surface must also be accompanied by the conversion of reactant to product
and by the flux of electrons. The flux of material crossing the electrode boundary can therefore be converted
to current density by equating the two fluxes:

\[ \frac{j}{nF} = -D\left(\frac{\partial c_{\mathrm{O}}}{\partial x}\right)_{x=0} \]

The second of Fick's laws expresses the change in concentration of a species at a point as a function of time
due to diffusion (figure B1.28.2). Hence, the one-dimensional variation in concentration of material within a
volume element bounded by two planes $x$ and $x + \mathrm{d}x$ during a time interval $\mathrm{d}t$ is expressed by

\[ \frac{\partial c_{\mathrm{O}}(x,t)}{\partial t} = D\,\frac{\partial^2 c_{\mathrm{O}}(x,t)}{\partial x^2} \]

Fick's second law of diffusion enables predictions of concentration changes of electroactive
material close to the electrode surface, and its solutions, with initial and boundary conditions appropriate to a
particular experiment, provide the basis of the theory of instrumental methods such as, for example,
potential-step and cyclic voltammetry.
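
As an illustration of how Fick's second law is solved for a particular experiment, the sketch below treats a potential step that drives the surface concentration to zero, using an explicit finite-difference scheme. The grid spacing, the parameter values and the comparison with the Cottrell equation (the known analytical solution for this boundary-value problem, not quoted in the text above) are choices made for this example. The current density is obtained from the surface concentration gradient with cathodic current taken as negative.

```python
import math

F = 96485.33212  # Faraday constant, C mol^-1


def simulate_potential_step(D=1.0e-5, c_bulk=1.0e-6, t_end=1.0,
                            dx=1.0e-4, lam=0.4):
    """Explicit finite-difference solution of Fick's second law.

    D in cm^2 s^-1, c_bulk in mol cm^-3, dx in cm. lam = D*dt/dx^2 must not
    exceed 0.5 for the explicit scheme to be stable.
    Returns (elapsed time, concentration gradient at the electrode surface).
    """
    dt = lam * dx * dx / D
    n_steps = int(round(t_end / dt))
    # The grid extends about six diffusion lengths into the solution.
    n_points = int(6.0 * math.sqrt(D * t_end) / dx) + 1
    c = [c_bulk] * n_points
    c[0] = 0.0  # surface concentration held at zero after the step
    for _ in range(n_steps):
        new = c[:]  # outer point keeps the bulk value, surface stays at zero
        for i in range(1, n_points - 1):
            new[i] = c[i] + lam * (c[i + 1] - 2.0 * c[i] + c[i - 1])
        c = new
    gradient = (c[1] - c[0]) / dx
    return n_steps * dt, gradient


def current_density(gradient, D=1.0e-5, n=1):
    """Current density (A cm^-2) from the surface gradient; cathodic negative."""
    return -n * F * D * gradient


def cottrell(t, D=1.0e-5, c_bulk=1.0e-6, n=1):
    """Cottrell equation for the same experiment, with the same sign convention."""
    return -n * F * c_bulk * math.sqrt(D / (math.pi * t))


t, grad = simulate_potential_step()
j_sim = current_density(grad)
j_exact = cottrell(t)
```

The simulated and analytical current densities agree to within a few per cent at this resolution; a finer grid improves the agreement at the cost of more time steps.
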

Figure B1.28.2. Fick's laws of diffusion: (a) Fick's first law, (b) Fick's second law.

Convection is the movement of a species due to external mechanical forces. This can be of two types: natural
convection, which arises from thermal gradients or density differences within the solution, and forced
convection, which can take the form of gas bubbling, pumping or stirring. The former is undesirable and can
occur in any solution at normal sized electrodes on the time scale of ten or more seconds. In contrast, the
latter has the function of
overcoming contributions from natural convection and of increasing the rate of mass transport and hence 
facilitates the study of the kinetics of electrode reactions. Forced convection usually possesses well defined 
hydrodynamic behaviour, thus enabling the quantitative description of the flow in the solution and the 
prediction of the pattern of mass transport to the electrode. 

Migration is the movement of ions due to a potential gradient. In an electrochemical cell the external electric 
field at the electrode/solution interface due to the drop in electrical potential between the two phases exerts an 
electrostatic force on the charged species present in the interfacial region, thus inducing movement of ions to 
or from the electrode. The magnitude of the migration flux is proportional to the concentration of the ion, the electric field and the
ionic mobility. 


Most electrochemical experiments are designed so that one of the mass transport regimes dominates over the 
others, thus simplifying the theoretical treatment, and allowing experimental responses to be compared with 
theoretical predictions. Normally, specific conditions are selected where the mass-transport regime results 
only from diffusion or convection. Such regimes allow mass transport to be described by a set of 
mathematical equations, which have analytical solutions. A common experimental practice to render the 
migration of reactants and products negligible is to add an excess of inert supporting electrolyte, thus ensuring 
that any migration is dominated by the ions of the electrolyte. Electro-neutrality is also thus maintained, 
ensuring that electric fields do not build up in the solution. Furthermore, the addition of a high concentration 
of electrolyte increases the solution conductivity, compresses the double-layer region to dimensions of 10-20
Å, and ensures a constant ionic strength during the electrochemical experiment. As a consequence, the
activities of the electroactive species and thus the applied potentials, as predicted by the Nernst equation and 
by the rate of electron transfer, remain constant throughout the experiment. 


B1.28.3 TRANSIENT TECHNIQUES 

Voltammetry relies on the registering of current-potential profiles, whether by controlling the potential of the 
working electrode and recording the resulting current or by measuring the potential response as a function of 
an applied current. The electrochemical cell, as well as a conducting medium, must also contain at least one 
other electrode. In a two-electrode configuration, the second electrode is a reference electrode that serves both 
as a standard against which the working potential is measured and as the necessary current-carrying electrode 
where the rate of charge transfer must be equal and opposite to that of the working electrode. Commonly, 
these two functions are separated in a three-electrode configuration, in which a secondary or counter electrode 
is employed as the current-carrying electrode and a separate reference electrode reports the potential of the 
working electrode. This prevents any undesirable polarization of the reference electrode, since only small 
currents flow in the reference electrode loop. Placement of the reference electrode close to the working 
electrode enables the exclusion of the majority of the solution IR drop, which is often achieved by the use of 
a Luggin capillary [1, 2]. 

The measurement of the current for a redox process as a function of an applied potential yields a 
voltammogram characteristic of the analyte of interest. Particular features of a voltammogram, such as peak
potentials, half-wave potentials and relative peak or wave heights, give qualitative information about the analyte
electrochemistry within the sample being studied, whilst quantitative data can also be determined. There is a
wealth of voltammetric techniques, which are linked to the form of potential program and mode of current
measurement adopted. Potential-step and potential-sweep techniques are carried out under conditions where
diffusion is the only mode of mass transport and the experiment is designed such that diffusion may be
described by linear diffusion to a plane electrode and changes in concentration occur perpendicular to the
surface [1, 2, 3, 4 and 5].

B1.28.3.1 LINEAR-SWEEP AND CYCLIC VOLTAMMETRY

Linear-sweep and cyclic voltammetry were first reported in 1938 and described theoretically in 1948 by
Randles and Ševčík [1, 2, 3, 4, 5 and 6]. The techniques consist of scanning the potential between two chosen
limits at a known sweep rate, $\nu$, and measuring the current response arising from any electron transfer
process. In linear-sweep voltammetry the scan terminates at the chosen end potential, $E_{\mathrm{f}}$, whereas in cyclic
voltammetry the sweep is reversed at $E_{\mathrm{f}}$ toward the starting potential $E_{\mathrm{i}}$ or another chosen potential
limit. The potential limits define the electrode reactions that take place, so the potential scan is normally
chosen to start at a potential value where no electrode reaction occurs and is swept towards positive or negative
potentials to investigate oxidation or reduction processes, respectively. The current-potential curves for a
simple reversible electrode reaction are characterized by unsymmetrical peaks with the current density
increasing as the sweep rate is raised. On the forward sweep, the current begins to rise as the potential reaches
the vicinity of the reversible formal potential $E_{\mathrm{e}}^{\ominus}$, then passes through a maximum before decreasing again as
the potential is driven sufficiently far to produce a diffusion-limited current. The surface concentration of an
electroactive species, R, decreases as the potential is made more positive and the rate of oxidation increases,
until it becomes effectively zero, at which point the reaction is diffusion controlled. In terms of concentration
profiles, the flux of species to the electrode surface increases with potential (hence time) and continues to
increase until the surface concentration reaches zero. From that point the flux to the surface starts to decrease,
since the surface concentration remains at zero while the depleted layer continues to extend into the solution,
yielding the peak-shaped response (figure B1.28.3). On the
reverse sweep, anodic current continues to flow while the potential is still sufficiently positive to cause the
oxidation of R. When the potential reaches the vicinity of $E_{\mathrm{e}}^{\ominus}$, however, the oxidized species produced can
diffuse back to the electrode surface to be reduced, the current becomes cathodic and a similar peak-shaped
response is obtained as the reaction becomes diffusion controlled.

The scan rate, $\nu = |\mathrm{d}E/\mathrm{d}t|$, plays a very important role in sweep voltammetry as it defines the time scale of the
experiment and is typically in the range 5 mV s$^{-1}$ to 100 V s$^{-1}$ for normal macroelectrodes, although sweep
rates of $10^6$ V s$^{-1}$ are possible with microelectrodes (see later). The short time scales in which the
experiments are carried out are the cause for the prevalence of non-steady-state diffusion and the peak-shaped
response. When the scan rate is slow enough to maintain steady-state diffusion, the concentration profiles
are linear within the Nernst diffusion layer, whose thickness is fixed by natural convection, and the
current-potential response reaches a plateau steady-state current. On reducing the time scale, the diffusion
layer cannot relax to its steady state; it is thinner, and hence the non-steady-state currents are higher.

Figure B1.28.3. Concentration profiles of an electroactive species with distance from the electrode surface
during a linear sweep voltammogram.
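
The potential program of a single cyclic sweep, and the time scale that the scan rate imposes, can be written down directly. The function below is a minimal sketch of one triangular cycle; its name, argument names and the 0.5 V window used in the example are hypothetical.

```python
def cyclic_potential(t, E_start, E_switch, scan_rate):
    """Applied potential (V) at time t (s) during one triangular cycle.

    The potential moves linearly from E_start to E_switch at scan_rate
    (V s^-1, positive) and then returns toward E_start at the same rate.
    """
    t_switch = abs(E_switch - E_start) / scan_rate
    direction = 1.0 if E_switch >= E_start else -1.0
    if t <= t_switch:
        return E_start + direction * scan_rate * t
    return E_switch - direction * scan_rate * (t - t_switch)


def sweep_duration(E_start, E_switch, scan_rate):
    """Time (s) taken by the forward sweep alone."""
    return abs(E_switch - E_start) / scan_rate


# A 0.5 V window takes 100 s at 5 mV s^-1 but only 5 ms at 100 V s^-1.
slow = sweep_duration(0.0, 0.5, 0.005)
fast = sweep_duration(0.0, 0.5, 100.0)
```

The five orders of magnitude between these two durations illustrate why the scan rate decides whether diffusion has time to approach a steady state.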

Cyclic voltammetry provides a simple method for investigating the reversibility of an electrode reaction (table
B1.28.1). The reversibility of a reaction depends closely upon the rate of electron transfer being sufficiently
high to maintain the surface concentrations close to those demanded by the electrode potential through the
Nernst equation. Therefore, when the scan rate is increased, a reversible reaction may be transformed to an
irreversible one if the rate of electron transfer is slow. For a reversible reaction at a planar electrode, the peak
current density, $j_p$, is given by

$$ j_p = 2.69 \times 10^{5} \, n^{3/2} D^{1/2} c_i^{\infty} v^{1/2} $$

where n is the number of electrons, D is the diffusion coefficient, $c_i^{\infty}$ the concentration of the electroactive
species in the bulk and v the sweep rate. Of particular importance is the proportionality of the peak current to
the square root of the scan rate. In addition, for a reversible couple, the cathodic and anodic peak potentials
are separated by 59/n mV, the reversible half-wave potential is situated midway between the peaks, the peak
potential is independent of scan rate and the peak current ratio equals 1 (figure B1.28.4). As the response
becomes less reversible, the separation between the peaks increases, as an overpotential is necessary to drive
the reduction and oxidation reactions, and the peaks become more drawn out. Beyond the peak,
however, the electrode reaction remains diffusion controlled. For totally irreversible systems the reverse peak
disappears completely and the peak current density is expressed by

$$ j_p = 2.99 \times 10^{5} \, n (\alpha n_\alpha)^{1/2} c_i^{\infty} D^{1/2} v^{1/2} $$

where $\alpha$ is the transfer coefficient and $n_\alpha$ the number of electrons transferred up to and including the
rate-determining step. The majority of redox couples fall between the two extremes and exhibit quasi-reversible
behaviour. When investigating an electrode reaction for reversibility it is essential to obtain results over a 
sweep-rate range of at least two orders of magnitude, in order not to reach erroneous conclusions. In addition, 
although subsequent cyclic voltammograms enable valuable mechanistic information to be deduced, the first 
sweep cycle only should be considered for accurate analysis of kinetic data. 
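
The square-root dependence of the peak current on sweep rate can be checked numerically. The sketch below is
illustrative only: the function names and the numerical values are assumptions, and the constants are taken to
apply at 298 K with D in cm^2 s^-1, the bulk concentration in mol cm^-3, the sweep rate in V s^-1 and the
current density in A cm^-2.

```python
import math

def peak_current_reversible(n, D, c_bulk, v):
    """Peak current density (A cm^-2) for a reversible couple at a planar electrode.

    Assumed units: D in cm^2 s^-1, c_bulk in mol cm^-3, v in V s^-1 (298 K form).
    """
    return 2.69e5 * n ** 1.5 * math.sqrt(D) * c_bulk * math.sqrt(v)

def peak_current_irreversible(n, alpha, n_alpha, D, c_bulk, v):
    """Peak current density (A cm^-2) for a totally irreversible reaction (same units)."""
    return 2.99e5 * n * math.sqrt(alpha * n_alpha) * c_bulk * math.sqrt(D) * math.sqrt(v)

# Hypothetical system: one-electron couple, D = 1e-5 cm^2 s^-1, 1 mM (1e-6 mol cm^-3).
n, D, c_bulk = 1, 1.0e-5, 1.0e-6
for v in (0.01, 0.1, 1.0):  # two orders of magnitude in sweep rate
    j_rev = peak_current_reversible(n, D, c_bulk, v)
    j_irr = peak_current_irreversible(n, 0.5, 1, D, c_bulk, v)
    print(f"v = {v:5.2f} V/s  reversible {j_rev:.3e}  irreversible {j_irr:.3e} A/cm^2")
```

In both limits a hundredfold increase in sweep rate raises the peak current tenfold, so a plot of peak current
against the square root of the sweep rate is linear; in practice it is the peak separation and peak potential
shift that distinguish the two cases.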

If adsorbed electroactive species are present on the electrode surface, the shape of the cyclic voltammogram 
changes, since the species do not need to diffuse to the electrode surface. In this case the peaks are 
symmetrical with coincident peak potentials provided the kinetics are fast. 

Table B1.28.1 Diagnostic tests for reversibility of electrode processes in cyclic voltammetry at 293 K.

Figure B1.28.4. Cyclic voltammogram for a simple reversible electrode reaction in a solution containing only
oxidized species.


On investigating a new system, cyclic voltammetry is often the technique of choice, since a number of 
qualitative experiments can be carried out in a short space of time to gain a feeling for the processes involved. 
It essentially provides an electrochemical spectrum, indicating the potentials at which processes occur. In
particular, it is a powerful method for the investigation of coupled chemical reactions in the initial 
identification of mechanisms and of intermediates formed. Theoretical treatment for the application of this 
technique extends to many types of coupled mechanisms. 

B1.28.3.2 POTENTIAL-STEP TECHNIQUES

In a potential-step experiment, the potential of the working electrode is instantaneously stepped from a value 
where no reaction occurs to a value where the electrode reaction under investigation takes place and the 
current versus time (chronoamperometry) or the charge versus time (chronocoulometry) response is recorded. 
The transient obtained depends upon the potential applied and on whether it is stepped into a diffusion-controlled,
an electron-transfer-controlled or a mixed-control region. Under diffusion control the transient may be
described by the Cottrell equation, obtained by solving Fick's second law with the appropriate initial and
boundary conditions [1, 2, 3, 4, 5 and 6]:

$$ j = \frac{n F D^{1/2} c_i^{\infty}}{\pi^{1/2} t^{1/2}} $$

Immediately after the imposition of a large negative overpotential in a solution containing oxidized species, 
O, a large current is detected, which decays steadily with time. The change in potential from its initial value will initiate
the very rapid reduction of all the oxidized species at the electrode surface and consequently of all the 
electroactive species diffusing to the surface. It is effectively an instruction to the electrode to instantaneously 
change the concentration of O at its surface from the bulk value to zero. The chemical change will lead to 
concentration gradients, which will decrease with time, ultimately to zero, as the diffusion-layer thickness 
increases. At time t = 0, on the other hand, $(\partial c_i/\partial x)_{x=0}$ will tend to infinity. The linearity of a plot of j versus
$t^{-1/2}$ confirms whether the reaction is under diffusion control and can be used to estimate values for the
diffusion coefficient. It is a good technique for determining exact kinetic parameters when a mechanism is 
fully understood. Under mixed control, where the rates of diffusion and electron transfer are comparable, the 
current decays less steeply: at short times it will be controlled by electron transfer but, as the surface 
concentration is depleted, mass transport will become the rate-limiting step. 
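
As a minimal sketch of the analysis described above, the slope of a plot of current density against t^(-1/2) can
be converted into a diffusion coefficient through the Cottrell equation. The function names and the synthetic
transient are assumptions made for illustration; real data would first be restricted to the time window in which
double-layer charging and natural convection are negligible.

```python
import math

F = 96485.0  # Faraday constant, C mol^-1

def cottrell_current_density(n, D, c_bulk, t):
    """Diffusion-controlled current density (A cm^-2) at time t (s) after the step."""
    return n * F * math.sqrt(D) * c_bulk / math.sqrt(math.pi * t)

def diffusion_coefficient_from_transient(times, currents, n, c_bulk):
    """Estimate D from the least-squares slope (through the origin) of j versus t^(-1/2)."""
    x = [t ** -0.5 for t in times]
    slope = sum(xi * ji for xi, ji in zip(x, currents)) / sum(xi * xi for xi in x)
    # slope = n F D^(1/2) c / pi^(1/2), so D = pi * (slope / (n F c))^2
    return math.pi * (slope / (n * F * c_bulk)) ** 2

# Synthetic transient for a hypothetical species (D in cm^2 s^-1, c in mol cm^-3).
D_true, n, c_bulk = 7.6e-6, 1, 1.0e-6
times = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
currents = [cottrell_current_density(n, D_true, c_bulk, t) for t in times]
D_fit = diffusion_coefficient_from_transient(times, currents, n, c_bulk)
print(f"recovered D = {D_fit:.3e} cm^2/s")
```

A fit through the origin is used because the Cottrell equation predicts no intercept; a significant intercept in
measured data would itself suggest a departure from pure diffusion control.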

When analysing the data, it is important to consider a wide time range to ensure the reliability of the data, 
since at short times, <1 ms, it will be determined by the charging time of the double layer, and at longer times, 
>10 s, by the effects of natural convection. 

Potential-step techniques can be used to study a variety of types of coupled chemical reactions. In these cases 
the experiment is performed under diffusion control, and each system is solved with the appropriate initial and 
boundary conditions. 

Double potential steps are useful to investigate the kinetics of homogeneous chemical reactions following
electron transfer. In this case, after the first step, to a potential where the reduction of O to R occurs
under diffusion control, the potential is stepped back after a period $\tau$ to a value where the reoxidation of R is
mass-transport controlled. The two transients can then be compared and the kinetic information obtained from
the ratio of the currents, which is a function of both $\tau$ and the homogeneous rate constant, k. This is a good
method for obtaining exact information, provided that the mechanism is already understood.

B1.28.3.3 PULSE VOLTAMMETRY

Pulse techniques were originally devised to provide enhanced sensitivity in classical polarography for 
analytical applications [7, 8 and 9]. Sub-nanomolar detection limits can, in fact, be achieved with mercury 
electrodes where charging and background faradaic currents are minimal. At solid electrodes (C, Pt, Au),
charging currents and background currents arising from electrode surface reactions limit the level of analyte
detection. However, pulse techniques remain particularly useful when looking at analyte concentrations of
$10^{-5}$ M and lower, where voltammetric techniques such as linear-sweep voltammetry and cyclic voltammetry
become limited by the difficulty of measuring faradaic currents in the presence of background currents. Many
step techniques have been devised, based on the succession of potential steps of varying height and in forward 
and reverse directions [2, 6, 7]. They find wide application in digitally based potentiostats, the electronics of 
which are suited to their exploitation. The current is normally sampled toward the end of the potential pulse, 
after the capacitative current has decayed, and the pulse widths are adjusted to fit between this limit and the 
onset of natural convection. Normal-pulse voltammetry (NPV), differential-pulse voltammetry (DPV) and
square-wave voltammetry (SWV) are perhaps the most widely used of a variety of pulse techniques that have 
been developed. 

In NPV [2, 7, 10, 11], short pulses of increasing height are superimposed on a constant base potential, $E_b$,
where no reaction occurs (figure B1.28.5(a)). At the end of the pulse of width $t_p$ (typically 50-60 ms), the
potential is returned to $E_b$, where it is held for another fixed period of a few seconds, before being pulsed
again with a height increase determined by the scan rate. The current is sampled at the end of each pulse and
the values are plotted against the potential to give a voltammetric profile similar to a steady-state
voltammogram. The maximum current is given by the Cottrell equation,

$$ j = \frac{n F D^{1/2} c_i^{\infty}}{\pi^{1/2} t_m^{1/2}} $$

where $t_m$ is the time at which the current is measured. NPV is therefore a good technique for determining
diffusion coefficients.

In DPV [2, 7, 11] the pulse height is kept constant and the base potential is either swept constantly or is 
incremented in a staircase (figure B1.28.5(b)). The current is sampled just before the end of the pulse and just
before pulse application, and the difference between the two measurements is plotted as a function of the
potential. The resulting voltammogram is peak-shaped since it is essentially the differential of a steady-state
shaped response, and for this feature the technique particularly lends itself to analytical purposes, enabling
lower detection limits to be achieved. For a reversible system, the peak is symmetric with $E_p = E_{1/2} - \Delta E/2$,
where $\Delta E$ is the pulse amplitude. In general, DPV is better at eliminating capacitative contributions and the
peak-shaped response is useful for distinguishing two waves with close half-wave potentials.

Figure B1.28.5. Applied potential-time waveforms for (a) normal pulse voltammetry (NPV), (b) differential
pulse voltammetry (DPV), and (c) square-wave voltammetry (SWV), along with typical voltammograms
obtained for each method.

SWV is an alternative voltammetric technique, first reported in 1952 by Barker and Jenkins [12] and 
subsequently developed into the form known today by Osteryoung et al [13, 14, 15 and 16]. The potential- 
time waveform is composed of a sequence of symmetrical square-wave pulses superimposed on an underlying 
ramp (figure B1.28.5(c)). The critical parameters are the step height of the underlying potential scan, $\Delta E_s$; the
height of the square-wave pulse, $E_{sw}$; the pulse width, $t_p$; and the time at which the current is sampled on the
forward and reverse pulses, $t_s$. Current measurements are made near the end of the pulse in each square-wave
cycle: once at the end of the forward pulse and once at the end of the reverse pulse. However, capacitative 
contributions can be discriminated against before they decay, since over a small potential range between 
forward and reverse pulses, the capacity is constant and is thus annulled by subtraction. Consequently, shorter 
pulses than in DPV and NPV can be applied, enabling higher frequencies to be employed and much faster 
analysis to be carried out. The difference between the two currents, the net current, is plotted versus the base 
staircase potential, yielding a peak-shaped response. Since the square-wave modulation amplitude is large, the 
reverse pulses cause the reverse reaction to occur and, thus, the net current is larger
than either the forward or the reverse components. This, coupled with the effective discrimination against
charging currents, enables a more sensitive analysis. The resulting peak-shaped voltammograms are
symmetrical with characteristic position, width and height: the peak potential, $E_p$, coincides with the half-wave
potential of a redox couple, the peak width indicates the effective number of electrons transferred and
the peak current is proportional to the analyte concentration. In addition, the peak shape and position have
been found to be largely independent of the size and geometry of the electrode. The net current is generally
compared with theoretical predictions of a dimensionless current $\Psi$, to which it is related through the Cottrell
equation for the characteristic time:

$$ j = \frac{n F D^{1/2} c_i^{\infty}}{\pi^{1/2} t_p^{1/2}} \, \Psi $$

where $t_p$ is the pulse width. As well as for analysis, SWV has been found to be well suited to kinetic
investigations. 
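
The square-wave waveform described above can be sketched as a staircase with a symmetrical pulse pair on each
tread. The function and parameter names below are hypothetical, the sign convention (forward pulse in the scan
direction) is an assumption, and the numerical values are chosen only for illustration.

```python
def square_wave_waveform(e_start, e_end, step_height, sw_amplitude, pulse_width):
    """Return (start_time, potential) pairs, one per forward or reverse half-cycle.

    Each staircase tread of height `step_height` carries a forward pulse
    (base + amplitude in the scan direction) followed by a reverse pulse.
    """
    direction = 1.0 if e_end >= e_start else -1.0
    n_cycles = int(round(abs(e_end - e_start) / step_height)) + 1
    points = []
    for k in range(n_cycles):
        base = e_start + direction * k * step_height   # underlying staircase potential
        t0 = 2 * k * pulse_width                       # one cycle lasts two pulse widths
        points.append((t0, base + direction * sw_amplitude))                # forward pulse
        points.append((t0 + pulse_width, base - direction * sw_amplitude))  # reverse pulse
    return points

def net_current(forward_samples, reverse_samples):
    """Net current per cycle: forward sample minus reverse sample."""
    return [f - r for f, r in zip(forward_samples, reverse_samples)]

# Hypothetical cathodic scan: 0 V to -0.4 V, 5 mV steps, 25 mV amplitude, 10 ms pulses.
waveform = square_wave_waveform(0.0, -0.4, 0.005, 0.025, 0.010)
frequency = 1.0 / (2 * 0.010)  # square-wave frequency in Hz
print(len(waveform), "half-cycles at", frequency, "Hz")
```

Because the forward and reverse samples within one cycle are taken only a pulse width apart and over a small
potential range, subtracting them in `net_current` is what cancels the roughly constant charging contribution.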

B1.28.3.4 STRIPPING VOLTAMMETRY 

Stripping voltammetry involves the pre-concentration of the analyte species at the electrode surface prior to 
the voltammetric scan. The pre-concentration step is carried out under fixed potential control for a 
predetermined time, where the species of interest is accumulated at the surface of the working electrode at a 
rate dependent on the applied potential. The determination step leads to a current peak, the height and area of
which are proportional to the concentration of the accumulated species and hence to the concentration in the
bulk solution. The stripping step can involve a variety of potential waveforms, from linear-potential scan to
differential pulse or square-wave scan. Different types of stripping voltammetry exist, all of which
commonly use mercury electrodes (dropping mercury electrodes (DMEs) or mercury film electrodes) [7, 17].

Anodic-stripping voltammetry (ASV) is used for the analysis of cations in solution, particularly to determine 
trace heavy metals. It involves pre-concentrating the metals at the electrode surface by reducing the dissolved 
metal species in the sample to the zero oxidation state, where they tend to form amalgams with Hg. 
Subsequently, the potential is swept anodically resulting in the dissolution of the metal species back into 
solution at their respective formal potential values. The determination step often utilizes a square-wave scan 
(SWASV), since it increases the rapidity of the analysis, avoiding interference from oxygen in solution, and 
improves the sensitivity. This technique has been shown to enable the simultaneous determination of four to 
six trace metals at concentrations down to fractional parts per billion and has found widespread use in 
seawater analysis. 

Cathodic stripping voltammetry follows a similar sequence of events, except that trace anionic species are 
reduced in the form of insoluble salts with metal constituents on the electrode surface, e.g. Ag and Hg, during 
application of a short, relatively positive deposition potential. The applied potential is then swept linearly or 
pulsed from the deposition potential in the cathodic direction resulting in the selective desorption of the 
anionic species according to the respective formal potential values. Cathodic stripping voltammetry can be 
used to determine organic and inorganic compounds that form an insoluble film at the electrode surface. 
Various inorganic analytes such as halide ions, sulphide ions and oxo-anions are capable of forming insoluble 
Hg salts which can be pre-concentrated on the Hg electrode surface and be measured. 

Adsorptive stripping analysis involves pre-concentration of the analyte, or a derivative of it, by adsorption
onto the working electrode, followed by voltammetric measurement of the surface species. Many species with 
surface-active properties are measurable at Hg electrodes down to nanomolar levels and below, with detection 
limits comparable to those for trace metal determination with ASV. 

Improved sensitivities can be attained by the use of longer collection times, more efficient mass transport or 
pulsed waveforms to discriminate charging currents from the small faradaic currents. Major problems with these
methods are the toxicity of mercury, which makes the analysis less attractive from an environmental point of
view, and surface fouling, which commonly occurs during the analysis of a complex solution matrix. Several
methods have been reported for the improvement of the pre-concentration step [17, 18]. The latter is, in fact,
strongly influenced by the choice of solvent, electrode material, pH, electrode potential and temperature. A
constant mass-transport rate leads to better reproducibility and hence stirring is often used with static mercury 
drop electrodes and stationary electrodes. Hydrodynamic electrodes are also employed in order to increase the 
sensitivity and decrease the detection limits. 

Recent years have witnessed the exploitation of stripping voltammetry in chemical sensors. Complex, fixed- 
site ASV analysers are used to determine a wide range of metals, such as Cr, Ni, Cu, V, Sn, As and Cd, in the 
effluents from mining, mineral processing, metal-finishing and related industries. The portable 
instrumentation and low power demands of stripping analysis satisfy many of the requirements for on-site in 
situ measurements. The development of remotely deployed submersible stripping probes, easy-to-use 
microfabricated metal sensor strips and micromachined, hand-held total stripping analysers has been
reported to move the measurement of trace metals to the field and to allow it to be performed more rapidly,
reliably and inexpensively [17, 18, 19 and 20].


B1.28.4 STEADY-STATE TECHNIQUES 

In the study of electrode reactions, the rates of electron transfer are very often high compared to mass 
transport, rendering the extrapolation of mechanistic and kinetic data unfeasible. It is therefore essential for 
the study of electrode reactions and the extrapolation of kinetic information to disrupt the equilibrium by 
increasing the rate of mass transport and forcing the process into a mixed-control region where the rate of 
electron transfer is comparable to that of mass transport. There are several methods available for increasing 
and varying the rate of mass transport in a controlled way, amongst which are hydrodynamic electrodes and 
microelectrodes [1, 2, 3 and 4]. In both cases, the regime may be described by solvable systems that may be
used to predict the rate of mass transport and in the interpretation of experimental data. In hydrodynamic 
electrodes, the increased rate of mass transport of species is brought about by external mechanical forces, 
which can arise from the movement of the electrode, agitation of the solution or flowing of the solution past 
the electrode surface. The resulting forced convection leads to the thinning of the Nernst diffusion layer with a 
consequent increase in the linear concentration gradient that exists across it and hence to current densities as 
large as 100 times greater than the steady-state diffusion-limited value. By measuring the current-potential 
response as a function of mass transport it is thus possible to extrapolate kinetic information regarding an 
electrode reaction, provided it is under mixed control. There are a number of electrode designs that fall into 
the category of hydrodynamic electrodes, which include the rotating-disc electrode (RDE), the rotating ring- 
disc electrode (RRDE), the wall-jet electrode, the wall-pipe electrode, the tube electrode and the channel 
electrode [21, 22, 23, 24, 25, 26 and 27]. The RDE and RRDE are perhaps the most commonly employed in 
kinetic and mechanistic studies, and these will be further discussed together with the channel electrode. 
Microelectrodes, scanning electrochemical microscopy (SECM) and sonoelectrochemistry are also discussed. 

B1.28.4.1 ROTATING-DISC ELECTRODES

A rotating-disc electrode (RDE) consists of a disc of electrode material embedded into a larger insulating 
sheath, and attached to the rotor spindle via a suitable electrical contact. The disc and sheath are rotated about 
a vertical axis. Upon rotation, a pump-action flow is initiated, which brings solution perpendicularly to the 
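electrode surface and throws it out in a radial direction on meeting disc and sheath (figure B1.28.6). A more
quantitative description of the flow patterns can be made by the use of cylindrical polar coordinates, by
looking at the variation of the solution-flow velocity components $V_x$, $V_r$ and $V_\theta$ as a function of x, the distance
perpendicular to the surface of the electrode. The change in concentration of an electroactive species with
time due to convection and diffusion may be written as [1, 2, 4, 5]

$$ \frac{\partial c_i}{\partial t} = \underbrace{D\left[\frac{\partial^2 c_i}{\partial x^2} + \frac{\partial^2 c_i}{\partial r^2} + \frac{1}{r}\frac{\partial c_i}{\partial r} + \frac{1}{r^2}\frac{\partial^2 c_i}{\partial \theta^2}\right]}_{\text{diffusion}} - \underbrace{\left[V_x\frac{\partial c_i}{\partial x} + V_r\frac{\partial c_i}{\partial r} + \frac{V_\theta}{r}\frac{\partial c_i}{\partial \theta}\right]}_{\text{convection}} $$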


However, the equation can be simplified, since the system is symmetrical and the radius of the disc is 
normally small compared to the insulating sheath. The access of the solution to the electrode surface may be 
regarded as uniform and the flux may be described as a one-dimensional system, where the movement of 
species to the electrode surface occurs in one direction only, namely that perpendicular to the electrode 
surface: 

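$$ \frac{\partial c_i}{\partial t} = D\frac{\partial^2 c_i}{\partial x^2} - V_x\frac{\partial c_i}{\partial x}, \qquad \text{where } V_x = -0.51\,\omega^{3/2}\nu^{-1/2}x^{2}. $$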

The importance of convection in the system increases as the square of the distance from the electrode surface, 
and close to the surface it is not a dominant form of mass transport. Hence concentration changes will arise 
due to both diffusion and convection. In the Nernst diffusion model, this trend is exaggerated, and for the 
mass transport behaviour at an RDE, a plot of the concentration of electroactive species, $c_i$, versus the
distance from the electrode surface, x, is divided into two distinct zones (figure B1.28.6). At the electrode
surface, i.e. x = 0, the concentration of the electroactive species will be its surface value, $c_i^{\sigma}$, and up to a
distance $\delta$ away from the
electrode, there is a stagnant layer, in which diffusion is the only form of mass transport (the Nernst diffusion
layer). Outside this layer, mass transport is dominated by strong convection and the concentration is
maintained at the bulk value, $c_i^{\infty}$. The diffusion-layer thickness is determined by the rotation rate of the disc,
the layer becoming thinner with increasing rotation rate. In this model, the values of $c_i^{\sigma}$ and $\delta$ will depend on
the applied potential and the electrode rotation rate, respectively. In a linear-sweep experiment at a given
rotation rate, the concentration profiles, $(dc_i/dx)$, within the diffusion layer will vary linearly as the applied
potential at the RDE is swept from a value where no electron transfer occurs towards values positive to $E_e^{\ominus}$. As
the experiment is driven further, the surface concentration of the electroactive species eventually reaches zero, 
at which point the current response reaches its limiting plateau value. The limiting current density is expressed 
by [1, 2, 4, 5]

$$ j_L = \frac{n F D c_i^{\infty}}{\delta} = n F k_m c_i^{\infty} $$

where $k_m$ is the mass-transport coefficient and the diffusion-layer thickness is
$\delta = 1.61\, D^{1/3} \nu^{1/6} \omega^{-1/2}$, where $\omega$
is the rotational speed in rad s$^{-1}$, and $\nu$ is the kinematic viscosity of the solution. Substituting for $\delta$ leads to the
Levich equation

$$ j_L = 0.621\, n F D^{2/3} \nu^{-1/6} c_i^{\infty} \omega^{1/2} $$

The expression for the mass-transport-limiting current density may be employed together with the Nernst 
equation to deduce the complete current-potential response in a solution containing only oxidized or reduced 
species 

$$ E = E_e^{\ominus} + \frac{2.3RT}{nF}\log\frac{j_L - j}{j} $$

in which $E_e^{\ominus}$ is the formal potential of the couple and j are current values from the rising portion of the
curve (under mixed control). This equation, of
course, only holds for fast electron-transfer reactions, where the surface concentrations are related through the
Nernst equation. For a reversible electrode reaction, therefore, a plot of potential versus the logarithmic
quotient should have a slope of (59/n) mV (at 298 K) and the formal potential for the couple will coincide
with the half-wave potential, $E_{1/2}$, of the curve (potential corresponding to $\tfrac{1}{2} j_L$) (figure B1.28.7). On
reversing the scan, the current-potential curve will exactly retrace the forward scan, as the electroactive 
species will continue to be reduced or oxidized at the same potentials as in the forward sweep. The product 
formed at the electrode surface during both scans quickly disappears into the bulk solution through convection 
and is not available for the reverse electron transfer during the back-sweep. In addition to analysing the shape 
of a current-potential curve at a single rotation rate, the relationship between limiting current densities and 
mass transport can also be investigated. A common treatment for RDE data is to plot the limiting current 
density versus the square root of rotation speed, with a linear plot confirming conditions of mass-transport 
control, and the slope being used to determine the other parameters. 
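
The Levich treatment just described can be sketched numerically as follows. This is an illustration under stated
assumptions rather than a prescribed procedure: the function names and numerical values are hypothetical, and
CGS-style units are assumed (D and the kinematic viscosity in cm^2 s^-1, concentration in mol cm^-3, rotation
speed in rad s^-1, current density in A cm^-2).

```python
import math

F = 96485.0  # Faraday constant, C mol^-1

def diffusion_layer_thickness(D, nu, omega):
    """Nernst diffusion-layer thickness (cm): 1.61 D^(1/3) nu^(1/6) omega^(-1/2)."""
    return 1.61 * D ** (1 / 3) * nu ** (1 / 6) * omega ** -0.5

def levich_current_density(n, D, nu, c_bulk, omega):
    """Mass-transport-limited current density (A cm^-2) from the Levich equation."""
    return 0.621 * n * F * D ** (2 / 3) * nu ** (-1 / 6) * c_bulk * math.sqrt(omega)

def slope_through_origin(x, y):
    """Least-squares slope of y versus x for a line forced through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def diffusion_coefficient_from_levich_slope(slope, n, nu, c_bulk):
    """Invert slope = 0.621 n F D^(2/3) nu^(-1/6) c for D."""
    return (slope / (0.621 * n * F * nu ** (-1 / 6) * c_bulk)) ** 1.5

# Hypothetical aqueous system: nu = 0.01 cm^2 s^-1, 1 mM analyte, one electron.
n, D_true, nu, c_bulk = 1, 6.5e-6, 0.01, 1.0e-6
omegas = [rpm * 2 * math.pi / 60 for rpm in (400, 900, 1600, 2500)]
j_limits = [levich_current_density(n, D_true, nu, c_bulk, w) for w in omegas]
slope = slope_through_origin([math.sqrt(w) for w in omegas], j_limits)
D_fit = diffusion_coefficient_from_levich_slope(slope, n, nu, c_bulk)
delta = diffusion_layer_thickness(D_true, nu, omegas[0])
print(f"D from Levich slope = {D_fit:.3e} cm^2/s, delta at 400 rpm = {delta:.2e} cm")
```

With real data, curvature in the plot or a non-zero intercept would indicate that the limiting current is not
purely mass-transport controlled, in which case the slope should not be used to extract D.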

Figure B1.28.6. (a) Convection within the electrolyte solution, due to rotation of the electrode; (b) Nernst
diffusion model for steady state.

Figure B1.28.7. Schematic shape of steady-state voltammograms for reversible, quasi-reversible and
irreversible electrode reactions.


In the case of an irreversible electrode reaction, the current-potential curve will display a similar shape, with
$j_L$ still proportional to $\omega^{1/2}$, but the curve is drawn out along the potential axis. The current-potential curve
may be described by [1, 2, 4, 5]

$$ E = E_{1/2} + \frac{2.3RT}{\alpha n F}\log\frac{j_L - j}{j} $$

in which the log plot remains linear but with a slope of $2.3RT/\alpha nF$ (figure B1.28.7). The most obvious feature
of the irreversible voltammogram is that the half-wave potential no longer falls near the reversible formal
potential, reflecting the sluggish electron transfer kinetics. An activation overpotential is required to drive the 
reaction. 
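
The diagnostic slopes quoted above follow directly from 2.3RT/nF. The short sketch below evaluates them and
generates a reversible steady-state wave by rearranging the log-plot relation; the function names are
hypothetical, current magnitudes are used without a sign convention, and the treatment assumes a simple
one-step couple.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def log_plot_slope_mV(n, temperature=298.15, alpha=None):
    """Slope (mV per decade) of E versus log[(j_L - j)/j].

    Reversible wave: 2.3RT/nF. Irreversible wave: 2.3RT/(alpha n F).
    """
    slope = math.log(10) * R * temperature / (n * F) * 1000.0
    return slope if alpha is None else slope / alpha

def reversible_wave(E, E_formal, j_limit, n, temperature=298.15):
    """Current magnitude on a reversible wave, from E = E_formal + (RT/nF) ln[(j_L - j)/j]."""
    return j_limit / (1.0 + math.exp(n * F * (E - E_formal) / (R * temperature)))

print(f"reversible, n = 1: {log_plot_slope_mV(1):.2f} mV per decade")
print(f"irreversible, n = 1, alpha = 0.5: {log_plot_slope_mV(1, alpha=0.5):.2f} mV per decade")
```

Evaluating `reversible_wave` at the formal potential returns half the limiting current, which is the statement
that the half-wave potential coincides with the formal potential for a reversible couple.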

The rotating-disc is also well suited to the study of coupled chemical reactions [2, 4]. 

It is essential for the rotating disc that the flow remain laminar and, hence, the upper rotational speed of the
disc will depend on the Reynolds number and experimental design; it is typically 1000 s$^{-1}$ or 10,000 rpm.
At the lower limit, 10 s$^{-1}$ or 100 rpm must be applied in order for the thickness of the boundary layer to
remain small compared with the radius of the disc.
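
The two ways of quoting rotation speed are related by a factor of 2π/60, which is why 1000 s^-1 and 10 s^-1
correspond only approximately to 10,000 rpm and 100 rpm. The sketch below performs the conversion and evaluates
a Reynolds number, taken here as ω r^2/ν; that definition, the function names and the example disc radius are
assumptions made for illustration, and no critical value is implied.

```python
import math

def rpm_to_rad_per_s(rpm):
    """Convert revolutions per minute to angular speed in rad s^-1."""
    return rpm * 2.0 * math.pi / 60.0

def rad_per_s_to_rpm(omega):
    """Convert angular speed in rad s^-1 to revolutions per minute."""
    return omega * 60.0 / (2.0 * math.pi)

def disc_reynolds_number(omega, radius, nu):
    """Reynolds number for a rotating disc, assumed here to be omega * r^2 / nu."""
    return omega * radius ** 2 / nu

# Hypothetical disc of total radius 0.5 cm in water (nu ~ 0.01 cm^2 s^-1).
for omega in (10.0, 1000.0):
    print(f"{omega:7.1f} rad/s = {rad_per_s_to_rpm(omega):8.1f} rpm, "
          f"Re = {disc_reynolds_number(omega, 0.5, 0.01):.0f}")
```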

The great advantage of the RDE over other techniques, such as cyclic voltammetry or potential-step, is the 
possibility of varying the rate of mass transport to the electrode surface over a large range and in a controlled 
way, without the need for rapid changes in electrode potential, which lead to double-layer charging current 
contributions. 

B1.28.4.2 ROTATING RING-DISC ELECTRODES

The rotating ring-disc electrode (RRDE) consists of a central disc separated from a concentric ring electrode 
by a thin, non-conducting gap. It was first developed by Frumkin and Nekrasov to detect unstable 
intermediates in electrochemical reactions [1, 2, 22]. As with the RDE, on rotation of the disc, solution is 
pulled towards the centre of the disc and then thrown out radially across the surface of the structure. The ring 
is effectively situated downstream to the disc. This permits the intermediates formed on the disc, as the result 
of an oxidation or reduction process, to be detected at the ring following their mass transport across the 
insulating gap between the electrodes. Hence, information on intermediates can be obtained before they reach 
the bulk solution or react further with the electrolyte solution. The ring and the disc are independent from one 
another and can hence be potentiostatted independently. 

In order to employ the RRDE for quantitative studies, it is necessary to describe the transport of species from 
disc to ring. In the absence of homogeneous chemical reactions, the species electrogenerated at the disc
is transported to the ring by diffusion across the stagnant layer at the electrode surface, by convection
across the gap and by diffusion across the stagnant layer at the ring electrode. The collection efficiency, $N_0$, is
defined as the ratio of the mass-transport-controlled currents for the electrode reactions at ring and disc,
$N_0 = -I_{ring}/I_{disc}$, where the minus sign arises because the reactions at the ring and at the disc occur in opposite
directions. The collection efficiency thus represents the fraction of material produced at the disc that is detected
at the ring. Analytical solution of the convective-diffusion transport at the ring-disc enables the collection
efficiency for specific disc and ring dimensions to be calculated:

$$ N_0 = 1 - F(\alpha/\beta) + \beta^{2/3}\left[1 - F(\alpha)\right] - (1+\alpha+\beta)^{2/3}\left\{1 - F\left[(\alpha/\beta)(1+\alpha+\beta)\right]\right\} $$

where $\alpha$, $\beta$ and F are defined as

$$ \alpha = \left(\frac{r_2}{r_1}\right)^{3} - 1, \qquad \beta = \left(\frac{r_3}{r_1}\right)^{3} - \left(\frac{r_2}{r_1}\right)^{3}, $$

$$ F(\theta) = \frac{\sqrt{3}}{4\pi}\ln\frac{(1+\theta^{1/3})^{3}}{1+\theta} + \frac{3}{2\pi}\arctan\frac{2\theta^{1/3}-1}{\sqrt{3}} + \frac{1}{4} $$

and $r_1$, $r_2$ and $r_3$ are the radius of the disc, the radius of the disc surrounded by the insulating sheath and the
radius of the disc surrounded by sheath and ring, respectively. The collection efficiency is a function of $r_1$, $r_2$
and $r_3$ and does not depend on the rotation speed or the nature of the redox species. Since access of material to
the ring is not uniform, some of the material will be transported back into solution and collection efficiencies 
are typically around 0.2-0.3 and strongly depend on the geometry of the electrodes and the distance between 
them. The rotation rate will affect the time taken for the intermediates to be transported from the disc to the 
ring, short-lived intermediates requiring higher rotation rates and the construction of RRDEs with thin inter- 
electrode gaps. The rotation rate will not, however, affect the efficiency, since the currents at both the 
generator and collector electrodes will be enhanced. 

A number of different types of experiment can be designed, in which disc and ring can either be swept to 
investigate the potential region at which the electron transfer reactions occur, or held at constant potential 
(under mass-transport control), depending on the information sought. 

The RRDE is very useful for the detection of short-lived intermediates, in the investigation of reaction 
mechanisms, but also in the distinction of free and adsorbed intermediates, as the latter are not transported to 
the ring. 

B1.28.4.3 CHANNEL-FLOW ELECTRODES

Forced convection can also arise from the movement of electrolyte solution over a stationary working 
electrode. In a channel electrode, the electrode is embedded smoothly in one wall of a thin, rectangular duct 
through which electrolyte is mechanically pumped [3, 6, 26, 27]. The design of the flow cell consists of two 
plates sealed together, with typical dimensions of 30-50 mm in length, <10 mm in width and a distance 
between the plates of less than 1 mm (the cell height). The electrode is embedded either at the centre of the 
base plate or attached to the centre of the cover plate by means of an adhesive. 

The solution flow is normally maintained under laminar conditions and the velocity profile across the channel 
is therefore parabolic with a maximum velocity occurring at the channel centre. Thanks to the well defined 
hydrodynamic flow regime and to the accurately determinable dimensions of the cell, the system lends itself 
well to theoretical modelling. The convective-diffusion equation for mass transport within the rectangular 
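duct may be described by

$$ \frac{\partial c_i}{\partial t} = D\left(\frac{\partial^2 c_i}{\partial x^2} + \frac{\partial^2 c_i}{\partial y^2} + \frac{\partial^2 c_i}{\partial z^2}\right) - \left(v_x\frac{\partial c_i}{\partial x} + v_y\frac{\partial c_i}{\partial y} + v_z\frac{\partial c_i}{\partial z}\right) $$

where $v_x$, $v_y$, $v_z$ are the solution-velocity profiles in the directions x, y, z. By convention, the direction of flow
is designated as the x-direction and the y-direction is that normal to the electrode. The equation may be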
considerably simplified by removal of the time dependence under steady-state conditions and by neglecting 
axial diffusion to the macroelectrode, since convection is considerably faster. The diffusion layer is situated 
very close to the electrode surface and is small compared to the cell depth, decreasing as the flow rate is 
increased. The Leveque approximation further simplifies the system by approximating the parabolic flow to a 
linear flow near the electrode surface, provided that the electrode is less wide than the channel (for edge
effects to be neglected) and the width, d, of the channel is much greater than its height. In an analogous
fashion as for the RDE, solution for a simple mass-transport-limited electrode reaction leads to the Levich 
equation 

$$ I_{lim} = 0.925\, n F c_i^{\infty} D^{2/3} v_f^{1/3} (h^{2} d)^{-1/3} w\, x_e^{2/3} $$

where $v_f$ is the solution volume flow rate, $x_e$ the length of the electrode, w its width, and h and d the
half-height and width of the cell, respectively. Analytical solutions for the channel electrode also extend to
more complicated electrode
reactions involving coupled homogeneous reactions. 
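
The cube-root dependence of the channel-electrode limiting current on flow rate can be illustrated with a short
calculation. The sketch assumes the commonly used reading of the symbols (h the half-height of the cell, d the
channel width, w the electrode width, x_e the electrode length), CGS-style units, and hypothetical cell
dimensions; the function name is likewise an assumption.

```python
F = 96485.0  # Faraday constant, C mol^-1

def channel_limiting_current(n, D, c_bulk, v_f, h, d, w, x_e):
    """Mass-transport-limited current (A) at a channel electrode.

    Assumed units: D in cm^2 s^-1, c_bulk in mol cm^-3, v_f in cm^3 s^-1,
    and all lengths (h half-height, d channel width, w electrode width,
    x_e electrode length) in cm.
    """
    return (0.925 * n * F * c_bulk * D ** (2 / 3) * v_f ** (1 / 3)
            * (h * h * d) ** (-1 / 3) * w * x_e ** (2 / 3))

# Hypothetical cell: 0.4 mm deep (h = 0.02 cm), 0.6 cm wide, 0.4 cm x 0.4 cm electrode.
for v_f in (1e-4, 1e-3, 1e-2, 1e-1):  # volume flow rates in cm^3 s^-1
    i_lim = channel_limiting_current(1, 1.0e-5, 1.0e-6, v_f, 0.02, 0.6, 0.4, 0.4)
    print(f"v_f = {v_f:.0e} cm^3/s  I_lim = {i_lim:.3e} A")
```

A plot of limiting current against the cube root of the volume flow rate should therefore be linear for a simple
mass-transport-limited reaction, in the same way that the square root of rotation speed is used for the RDE.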

Amongst the greatest advantages of channel-flow electrodes is the possibility of controlling the rate of mass
transport over a range of three orders of magnitude, from $10^{-4}$ to $10^{-1}$ cm$^3$ s$^{-1}$, and of varying the
mass-transport coefficient by altering the cell depth and the electrode length. These rates are, in fact, not attainable
at other hydrodynamic electrodes, such as the RDE and the wall-jet electrode. In addition, there is no risk of a 
build-up of a stagnant zone since the spent solution flows to waste. 

The channel-flow electrode has often been employed for analytical or detection purposes as it can easily be 
inserted in a flow cell, but it has also found use in the investigation of the kinetics of complex electrode 
reactions. In addition, channel-flow cells are immediately compatible with spectroelectrochemical methods, 
such as UV/VIS and ESR spectroscopy, permitting detection of intermediates and products of electrolytic 
reactions. UV-VIS and infrared measurements have, for example, been made possible by constructing the cell 
from optically transparent materials. 

B1.28.4.4 MICROELECTRODES

A microelectrode is an electrode with at least one dimension small enough that its properties are a function of 
size, typically with at least one dimension smaller than 50 µm [28, 29, 30, 31, 32 and 33]. If compared with
electrodes employed in industrial-scale electrosynthesis or in laboratory-scale synthesis, where the 
characteristic dimensions can be of the order of metres and centimetres, respectively, or electrodes for 
voltammetry with millimetre dimension, it is clear that the size of the electrodes can vary dramatically. This 
enormous difference in size gives microelectrodes their unique properties of increased rate of mass transport, 
faster response and decreased reliance on the presence of a conducting medium. Over the past 15 years, 
microelectrodes have made a tremendous impact in electrochemistry. They have, for example, been used to 
improve the sensitivity of ASV in environmental analysis, to investigate rapid 
electron transfer and coupled chemical reactions, and to study electrode reactions in highly resistive media.


The increased rate of mass transport is one of the most attractive and advantageous properties of 
microelectrodes over conventional electrodes, as the increased transport of the reactant to the electrode 
surface allows it to reach steady-state regimes rapidly. The diffusion rates increase with decreasing electrode 
size beyond that obtained with other steady-state techniques. For example, with a 10 µm diameter disc, the
steady-state mass-transfer coefficient, $k_m$, is comparable to that of a rotating disc revolving at an
experimentally impossible 250,000 rpm. The discrimination against charging currents is another very
important property. The magnitude of the charging current depends on the area of the capacitor formed by the
electrode/solution interface, so for a microelectrode it decreases with electrode area. Thus, a microelectrode has
a very small interfacial capacitance, the charging current decays much more quickly than with conventional
electrodes, and faster response times may be achieved. Another property of microelectrodes is the decreased
distortion from IR drop, the potential drop between working and reference electrodes generated by the passage of
current through a solution and expressed as the product of the solution resistance and the current flowing in
the circuit. With conventional electrodes, it is usual to add supporting electrolyte to minimize the solution
resistance, but with microelectrodes the current passing through the cell is low, often of the order of 10⁻⁹ A,
and hence problems with IR drop are greatly reduced. This proves to be an advantage in experiments with either a
large current, I, or a large resistance, R, as for example in experiments with solvents of very low dielectric constant,
in media with very low ionic strength, or in studies of solutions with high concentrations of electroactive 
species. Electrochemical measurements can therefore be made in new and unique chemical environments
that are not accessible with larger electrodes, and experiments have been reported in frozen acetonitrile,
low-temperature glasses, ionically conductive polymers, oil-based lubricants and milk. In addition, the use of
electrolyte-free organic media can greatly extend the electrochemical potential window, thus allowing studies 
of species with high redox potentials. Furthermore, such dimensions offer obvious analytical advantages, 
including the exploration of microscopic domains, measurement of local concentration profiles, detection in 
micro-flow systems and analysis of very small sample volumes [28].
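
The faster decay of the charging current can be illustrated with the cell time constant RC. The sketch below assumes a disc electrode inlaid in an insulating plane, for which the solution resistance is commonly taken as R = 1/(4κr), and a capacitance proportional to the electrode area; neither expression is given in the text above, and the conductivity κ and the specific double-layer capacitance are assumed, illustrative values.

```python
import math


def disc_cell_time_constant(r, kappa, c_dl):
    """RC time constant (s) of a disc electrode of radius r (cm).

    Assumes R = 1/(4*kappa*r) for a disc inlaid in an insulating plane and
    C = pi*r**2*c_dl, with kappa in S cm^-1 and c_dl in F cm^-2.
    """
    resistance = 1.0 / (4.0 * kappa * r)
    capacitance = math.pi * r ** 2 * c_dl
    return resistance * capacitance


kappa = 0.01   # assumed solution conductivity, S cm^-1
c_dl = 20e-6   # assumed double-layer capacitance per unit area, F cm^-2
for r in (0.05, 5e-4):  # radii of 0.5 mm and 5 um, expressed in cm
    print(f"r = {r:g} cm: RC = {disc_cell_time_constant(r, kappa, c_dl):.2e} s")
```

Under these assumptions the time constant scales linearly with the radius, so reducing the radius by a factor of 100 shortens the charging time by the same factor.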

Microelectrodes with several geometries are reported in the literature, from spherical to disc to line electrodes; 
each geometry has its own critical characteristic dimension and diffusion field in the steady state. The 
diffusional flux to a spherical microelectrode surface may be regarded as planar at short times, therefore 
displaying a transient behaviour, but spherical at long times, displaying a steady-state behaviour [28, 34]. If a
potential is applied so that the reaction O + ne⁻ → R becomes diffusion controlled, the current density at a
microsphere electrode can be expressed by

$$j = \frac{nFD^{1/2}c_\infty}{\pi^{1/2}t^{1/2}} + \frac{nFDc_\infty}{r}$$

This expression is the sum of a transient term and a steady-state term, where r is the radius of the sphere, t the time
elapsed since the potential step and $c_\infty$ the bulk concentration of O. At
short times after the application of the potential step, the transient term dominates over the steady-state term,
and the electrode is analogous to a plane, as the depletion layer is thin compared with the sphere radius, and the
current varies with time according to the Cottrell equation. At long times, the transient current will decrease to 
a negligible value, the depletion layer is comparable to the electrode radius, spherical diffusion controls the 
transport of reactant, and the current density reaches a steady-state value. At times intermediate to the limiting 
conditions of Cottrell behaviour or diffusion control, both transient and steady-state terms need to be 
considered and thus the full expression must be used. However, many experiments involving microelectrodes 
are designed such that one of the simpler current expressions is valid. 
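
The relative weight of the two terms can be estimated numerically. The following minimal sketch assumes the current density at a microsphere in the form $j = nFD^{1/2}c_\infty/(\pi t)^{1/2} + nFDc_\infty/r$; the function names and the numerical values (a 5 µm radius, D = 10⁻⁵ cm² s⁻¹, 1 mM reactant) are illustrative assumptions.

```python
import math

F = 96485.0  # Faraday constant, C mol^-1


def sphere_current_density(t, n, D, c_bulk, r):
    """Return (transient, steady_state) current densities in A cm^-2.

    Assumed units: t in s, D in cm^2 s^-1, c_bulk in mol cm^-3, r in cm.
    """
    transient = n * F * math.sqrt(D) * c_bulk / math.sqrt(math.pi * t)
    steady_state = n * F * D * c_bulk / r
    return transient, steady_state


def crossover_time(D, r):
    """Time (s) at which the transient and steady-state terms are equal."""
    return r ** 2 / (math.pi * D)


D, r, c_bulk = 1e-5, 5e-4, 1e-6  # assumed values
t_c = crossover_time(D, r)
print(f"crossover time = {t_c:.2e} s")
for t in (t_c / 100, t_c, t_c * 100):
    transient, steady = sphere_current_density(t, 1, D, c_bulk, r)
    print(f"t = {t:.2e} s: transient/steady-state = {transient / steady:.2f}")
```

Equating the two terms gives t = r²/(πD), so for a radius of a few micrometres the steady-state term already dominates within a fraction of a second, whereas for a millimetre-sized sphere the same crossover would take on the order of minutes.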

Of course, in order to vary the mass transport of the reactant to the electrode surface, the radius of the
electrode must be varied, and this implies the need for microelectrodes of different sizes. Spherical electrodes 
are difficult to construct, and therefore other geometries are often employed. Microdiscs are commonly used 
in the laboratory, as they are easily constructed by sealing very fine wires into glass or epoxy resins, cutting
perpendicular to the axis of the wire and polishing the front face of the disc that is created [30]. Because of its
planar geometry, the diffusion field over the surface of a microdisc is non-uniform and the flux only 
approximates that of a hemisphere. The rate of diffusion to the edge of the disc will be higher than to the 
centre. Therefore, the rates of diffusion to the disc are estimated as space-averaged quantities and a factor of
4/π is required to adjust the equation for a spherical microelectrode to describe the diffusion of reactant to the
surface of a microdisc electrode, which becomes $j = 4nFDc_\infty/(\pi r)$, where r is now the radius of the disc.
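
As a worked illustration, multiplying the space-averaged current density by the disc area gives the total steady-state current. The numerical values below (n = 1, D = 10⁻⁵ cm² s⁻¹, a bulk concentration of 1 mM and a disc radius of 5 µm) are assumptions chosen only to indicate the order of magnitude.

```latex
\begin{align*}
j_{\mathrm{disc}} &= \frac{4}{\pi}\,\frac{nFDc_\infty}{r}, \\
i_{\mathrm{disc}} &= \pi r^{2}\, j_{\mathrm{disc}} = 4nFDc_\infty r, \\
i_{\mathrm{disc}} &\approx 4 \times 96485\ \mathrm{C\,mol^{-1}} \times 10^{-5}\ \mathrm{cm^{2}\,s^{-1}}
  \times 10^{-6}\ \mathrm{mol\,cm^{-3}} \times 5\times10^{-4}\ \mathrm{cm} \approx 1.9\ \mathrm{nA}.
\end{align*}
```

The result is of the order of 10⁻⁹ A, in line with the low currents typical of microelectrodes.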

Similarly to the response at hydrodynamic electrodes, linear and cyclic potential sweeps for simple electrode 
reactions will yield steady-state voltammograms with forward and reverse scans retracing one another, 
provided the scan rate is slow enough to maintain the steady state [28, 35, 36, 37 and 38]. The limiting current 
will be determined by the slowest step in the overall process, but if the kinetics are fast, then the current will 
be under diffusion control and hence obey the above equation for a disc. The slope of the wave in the absence 
of IR drop will, once again, depend on the degree of reversibility of the electrode process. 

All types of voltammetry may be applied to microelectrodes, including normal, reverse pulse and square-wave 
voltammetry. Pulse voltammetry and potential-step programmes at microelectrodes discriminate against charging
currents, and the boundary conditions are set much faster between pulses, since the electrode responds
much more rapidly to a potential change [11, 15, 16]. Cyclic voltammetry measurements can be made
on a much shorter time scale than with electrodes of conventional size, down to the range of tens
of nanoseconds, without important distortion by IR drop or charging currents. This
makes possible the characterization of rates and mechanisms of very fast chemical reactions as well as the
determination of trace quantities of transient species. At high sweep rates, however, only linear diffusion needs to
be considered [28].

The advantages of microelectrodes for low-volume detection and spatially and temporally resolved
measurements have been largely exploited in biology and medicine [39, 40 and 41]. One of the most active
and longer-standing fields is neuroscience, where the development of the electroanalysis of brain extracellular 
fluid has been remarkable, since it can be relatively non-invasive due to the small size and low currents 
flowing. An example is the monitoring of the release of neurotransmitters with carbon microelectrodes in 
either amperometric mode or using fast cyclic voltammetry [42, 43]. Carbon materials are the most common 
starting materials and have also been applied to other electroactive compounds such as histamine, anticancer 
drugs and ascorbic acid. Extension of the investigation of cellular systems to non-electroactive 
neurochemicals has led to the development of enzyme-modified microelectrodes for measurements of 
glutamate, glucose, and choline and acetylcholine. Voltammetric measurements have also been reported in 
single cells, although the living cells are separated from the parent organism. Microelectrodes have also found 
widespread use in sensor technology and environmental analysis. Owing to the high rate of steady-state diffusion
at a microelectrode, its response is independent of convection, thus enabling its use for the analysis of
flowing systems. In order to enhance the current response at microelectrodes, a number of approaches have 
been described. Amongst these are random arrays of microdisc electrodes [44, 45 and 46] and interdigitated 
arrays of microband electrodes [47, 48]. Arrays of microelectrodes enable the enhancement of the current 
response, whilst retaining the properties of a single microelectrode, and have been used as highly sensitive 
detectors in flow-injection analysis and in liquid chromatography. 

B1.28.4.5 SCANNING ELECTROCHEMICAL MICROSCOPY

SECM is a scanning-probe technique introduced by Bard et al in 1989 [49, 50 and 51] based on previous
studies by the same group on in situ STM [52] and simultaneous work by Engstrom et al [53 and 54], who
were the first to show that an amperometric microelectrode could be used as a local probe to map the
concentration profile of a larger active electrode. SECM may be envisaged as a 'chemical' microscope based
on faradaic current changes as a microelectrode is moved across the surface of a sample. It has proved useful for
obtaining topographical and chemical information on a wide range of sample surfaces, including electrodes,
minerals, polymers and biological materials.

The apparatus consists of a tip-position controller, an electrochemical cell with tip, substrate, counter and 
reference electrodes, a bipotentiostat and a data-acquisition system. The microelectrode tip is held on a 
piezoelectric pusher, which is mounted on an inchworm-translator-driven x-y-z three-axis stage. This 
assembly enables the positioning of the tip electrode above the substrate by movement of the inchworm 
translator or by application of a high voltage to the pusher via an amplifier. The substrate is attached to the 
bottom of the electrochemical cell, which is mounted on a vibration-free table [55, 56, 57 and 58]. A number
of tips of different sizes and shapes have been reported. The most common are disc shaped, with diameters of
0.6-25 µm, formed by sealing a Pt or Au wire or a carbon fibre of the required radius in a glass capillary and polishing
the sealed end. The glass wall surrounding the disc is sharpened to a conical shape to decrease the possibility 
of contact between glass and substrate as the tip is moved close to the latter. For most studies, the ratio of the 
diameter of the entire tip end, including the insulator, to that of the electrode itself should typically be ≈10.
Metal electrodes down to the nanometre scale have also been fabricated by sealing an etched Pt or Pt-Ir wire 
in a suitable insulating material, leaving the etched end exposed. Commercial SECM instruments have only 
recently appeared on the market. 

With SECM, almost any kind of electrochemical measurement may be carried out, whether voltammetric or 
potentiometric, and the addition of spatial resolution greatly increases the possibilities for the characterization 
of interfaces and kinetic measurements [55, 56, 52, 58 and 59]. It may be employed as an electrochemical tool 
for the investigation of heterogeneous and homogeneous reactions, as an imaging device, or for 
microfabrication, making use of different modes of operation. In amperometric feedback mode a three- or
four-electrode configuration is employed, in which a microelectrode tip serves as the working electrode, the 
potential is controlled versus the reference electrode and the current flows between tip and counter-electrodes. 
The potential of the sample may also be controlled and it may thus serve as a second working electrode. The 
electrolyte solution contains a redox mediator, e.g. a reducible species O, such that when a suitably negative 
potential is applied to the tip, its reduction takes place at a rate governed by diffusion of the electroactive 
species to the electrode. If the tip is more than several tip diameters away from the surface, the steady-state 
current is given by $i_{T,\infty} = 4nFDc_\infty r$ for a disc-shaped tip, where r is the radius of the tip. However, when the
tip is brought within a few tip radii of a conductive substrate, the reduced species formed at the tip diffuses to
the substrate where it is re-oxidized. As a consequence, an additional flux of O to the tip is produced which
leads to an increase in the tip current, known as positive feedback. The smaller the tip-substrate distance the
larger is the effect. In contrast, if the substrate is an electrical insulator, the reducible species cannot be
regenerated and, since the diffusion of O from the bulk is hindered at small distances to the substrate, the tip
current will be smaller than $i_{T,\infty}$, i.e. negative feedback. Therefore, by scanning over the surface of a substrate,
the variation in current can be related to changes in the distance and hence to the topography of the substrate. 
Besides feedback mode, several other modes exist, such as generation/collection mode, where species 
generated at one working electrode are detected at the second, penetration mode, in which a small tip is used 
to penetrate a microstructure and extract spatially resolved information about concentrations, kinetic and 
mass-transport parameters, and ion-transfer feedback mode, recently developed and useful for studies of
ion-transfer reactions at liquid/liquid and liquid/membrane interfaces. The SECM
methodologies are based on quantitative theory, which has been developed for a variety of systems involving
heterogeneous and homogeneous processes and different tip and substrate geometries. In many cases, 
analytical approximations allow the generation of theoretical dependences and an analysis of experimental 
data [60, 61, 62, 63 and 64]. 

The high rate of mass transfer in SECM enables the study of fast reactions under steady-state conditions and 
allows the mechanism and physical localization of the interfacial reaction to be probed. It combines the useful
features of microelectrodes and thin-layer cells in dimensions not easily attainable in larger electrochemical
cells. The mass-transfer rate in SECM is a function of the tip-substrate distance, d. At large distances,
$k_m \approx D/r$, whereas for small distances (d < r), $k_m \approx D/d$. The large effective $k_m$ obtainable enables fast
heterogeneous reaction rates to be measured under steady-state conditions. Zhou and Bard measured a rate 
constant of 6 x 10 Ms for the electro-hydrodimerization of acrylonitrile (AN) and observed the short-lived 
intermediate AN•⁻ for this process [65].
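
The two limiting estimates of the mass-transfer coefficient can be compared with a short calculation. This is a rough sketch that simply switches between the two limits quoted above, $k_m \approx D/r$ and $k_m \approx D/d$, at d = r; the function name and the numerical values are illustrative assumptions, and no attempt is made to describe the intermediate regime accurately.

```python
def secm_mass_transfer_coefficient(D, r, d):
    """Order-of-magnitude mass-transfer coefficient (cm s^-1) at an SECM tip.

    Uses k_m ~ D/d when the tip-substrate distance d is smaller than the tip
    radius r, and k_m ~ D/r otherwise (D in cm^2 s^-1, r and d in cm).
    """
    return D / d if d < r else D / r


D = 1e-5   # assumed diffusion coefficient, cm^2 s^-1
r = 5e-4   # assumed tip radius of 5 um, in cm
for d in (1e-2, 5e-4, 1e-5):  # 100 um, 5 um and 0.1 um, in cm
    k_m = secm_mass_transfer_coefficient(D, r, d)
    print(f"d = {d:g} cm: k_m ~ {k_m:g} cm/s")
```

With these values, bringing the tip from far away to within 0.1 µm of the substrate raises the estimated $k_m$ from about 0.02 to about 1 cm s⁻¹, which indicates why fast heterogeneous kinetics become measurable under steady-state conditions.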

Heterogeneous reactions at a substrate can also be probed without the need for an external voltage, if for 
example the mediator regeneration is chemical in nature rather than electrochemical. This has opened the 
possibility of studying dissolution of ionic single crystals and locating individual sites of reactivity in 
multiphase systems. It can be used to map the local surface reactivity in either feedback or collection mode. 
Feedback mode has been employed, for example, to probe the surface reactivity of a titanium substrate
covered with TiO₂ and to identify precursor sites for pitting corrosion, whereas collection mode has been
used to image fluxes of species produced or consumed at the substrate, such as iontophoretic fluxes of
electroactive species through porous membranes [56, 57 and 58].

Among the systems most studied by SECM are heterogeneous electron transfer reactions at the metal/solution
interface. Nonetheless, the diversity of interfaces and processes that can be studied with SECM has grown to 
include liquid/liquid and liquid/gas interfaces [66], and materials of biological significance [67]. It is a 
promising technique for the mapping of biochemical activity, for example transport in tissues and 
immobilized enzyme kinetics. Detection of single molecules has also recently been reported [68].

B1.28.4.6 SONOELECTROCHEMISTRY

The technique of applying ultrasound during electrochemical measurements and reactions is known as 
sonoelectrochemistry or sonovoltammetry and is a field that has grown rapidly in recent years [69, 70, 71 and 
72]. The dominant ultrasonic effects are the enhanced mass transport of electroactive substrate to the electrode 
and the activation of the electrode surface through a cavitational cleaning action. Cavitation involves the violent
collapse of oscillating bubbles, which can cause effects such as depassivation and erosion. The huge effects
on the rate of mass transport to the electrode surface may be envisaged as an extremely thinned diffusion layer
of uniform accessibility, induced partly by acoustic streaming, when ultrasonic horn transducers are
employed, and partly by cavitational collapse of micro-bubbles at the solid-liquid interface.

Two major sources of ultrasound are employed, namely ultrasonic baths and ultrasonic immersion horn
probes [70, 71]. The former consists of fixed-frequency transducers beneath the exterior of the bath unit, which is
filled with water and in which the electrochemical cell is then fixed. Alternatively, the metal bath is coated and directly
employed as the electrochemical cell, but in both cases the results depend strongly on the position and design of
the set-up. The ultrasonic horn transducer, on the other hand, is a transducer provided with an electrically
conducting tip (often Ti6Al4V), which is immersed in a three-electrode thermostatted cell to a depth of 1-2
cm directly facing the electrode surface.
The ultrasound intensity and the distance between the horn and the electrode may be varied at a fixed
frequency, typically of 20 kHz. This cell set-up enables reproducible results to be obtained due to the 
formation of a macroscopic jet of liquid, known as acoustic streaming, which is the main physical factor in 
determining the magnitude of the observed current. 

The effects of ultrasound-enhanced mass transport have been investigated by several authors [73, 74, 75 and
76]. Empirically, it was found that, in the presence of ultrasound, the limiting current for a simple reversible
electrode reaction exhibits quasi-steady-state characteristics with intensities considerably higher in magnitude
than the peak current of the response obtained under silent conditions. The current density can be
described by $j_{\mathrm{lim}} = nFDc_\infty/\delta$, where the diffusion-layer thickness δ depends on the distance between the horn and the
electrode and on the ultrasound intensity. Superimposed on the faradaic current, a fluctuation or noise is also
detected, consistent with the turbulent nature of the macroscopic jet of liquid and the presence of oscillating
and cavitating bubbles.
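
In practice the relation between limiting current density and diffusion-layer thickness is often inverted to estimate δ from a measured current. The sketch below assumes the relation in the form $j_{\mathrm{lim}} = nFDc_\infty/\delta$; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
F = 96485.0  # Faraday constant, C mol^-1


def limiting_current_density(n, D, c_bulk, delta):
    """Limiting current density (A cm^-2) for a diffusion layer of thickness delta (cm)."""
    return n * F * D * c_bulk / delta


def diffusion_layer_thickness(j_lim, n, D, c_bulk):
    """Diffusion-layer thickness (cm) implied by a measured limiting current density."""
    return n * F * D * c_bulk / j_lim


# Assumed example: 1 mM reactant, D = 1e-5 cm^2 s^-1, measured j_lim = 1 mA cm^-2.
delta = diffusion_layer_thickness(j_lim=1e-3, n=1, D=1e-5, c_bulk=1e-6)
print(f"delta = {delta * 1e4:.1f} um")
```

A larger limiting current density at fixed concentration therefore corresponds directly to a thinner diffusion layer.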

In an alternative design, the actual tip of the ultrasonic horn may be used as the working electrode after 
insertion of an isolated metal disc [77, 78 and 79]. With this electrode, known as the sonotrode, very high 
limiting currents are obtained at comparatively low ultrasound intensities, and diffusion layers of less than 1
µm have been reported. Furthermore, the magnitude of the limiting currents has been found to be proportional
to $D^{2/3}$, enabling a parallel to be drawn with hydrodynamic electrodes.

The cleaning or depassivation effect is of great importance in sonoelectrochemistry, as it can be employed to 
wash off surface-adsorbed species and reduce blocking of the electrode by adsorption of reaction products. 
This effect has been reported, for example, for the depassivation of iron electrodes and for the removal of 
deposits and in the presence of polymer films on the electrode surface. However, damage to the electrode
surface, especially for materials of low hardness such as lead or copper, can also occur under harsh
experimental conditions and applied intensities [70, 71, 80].

Sonoelectrochemistry has been employed in a number of fields such as in electroplating for the achievement 
of deposits and films of higher density and superior quality, in the deposition of conducting polymers, in the 
generation of highly active metal particles and in electroanalysis. Furthermore, the sonolysis of water to 
produce hydroxyl radicals can be exploited to initiate radical reactions in aqueous solutions coupled to 
electrode reactions. 


B1.28.5 ELECTROCHEMICAL IMPEDANCE SPECTROSCOPY 

Transient techniques involve the perturbation of a system and the study of its relaxation with time. When the
perturbation is sinusoidal, in contrast, the analysis is performed in the frequency domain, which is
obtained by applying Laplace transforms to the time-domain information. Alternating-current impedance
techniques employ the ratio of an imposed sinusoidal voltage and the resulting sinusoidal current to define the
impedance, which is a function of the frequency of the signal. When a steady-state system is perturbed by an
applied AC voltage, it relaxes to a new steady state and the time taken for this relaxation is known as the time
constant τ, where τ = RC, with R the resistance and C the capacitance of the system. Analysis of this relaxation
process provides information about the system. In the frequency domain, fast processes, with low τ, occur at high
frequencies, while slow processes, with high τ, occur at low frequencies. Thus, dipolar properties may be studied at high
frequencies, bulk properties at intermediate frequencies and surface properties at low frequencies.

Methods for measuring the impedance can be divided into controlled current and controlled potential [2, 4,
81]. Under controlled potential conditions, the potential of the electrode is varied sinusoidally at a given frequency
with the amplitude being chosen to be sufficiently small to ensure that the response of the system can be
considered linear. The ratio of the response to the perturbation is the transfer function, or impedance, Z, when
considering the response of an AC current to an AC voltage imposition and is defined by E = IZ, where E and
I are the waveform amplitudes for the potential and the current respectively. Impedance may also be
envisaged as the resistance to the flow of an alternating current.

Two different components contribute to impedance: the resistive or real component due to resistors and the 
reactive or imaginary component from AC circuitry elements, such as capacitors, inductors, etc. Unlike the 
resistive component, the reactive impedance affects not only the magnitude of the AC wave but also its
time-dependent characteristic, the phase. For example, when an alternating voltage wave is applied to a capacitor,
the resulting current waveform will lead the applied voltage by 90°. For this reason the introduction of
complex notation is convenient. Thus, when a system is perturbed by a sinusoidal potential, varying with time 
according to 

$$E(t) = E_0 \exp(\mathrm{i}\omega t)$$

the response can be expressed in terms of

$$I(t) = I_0 \exp[\mathrm{i}(\omega t - \varphi)]$$

where i is the imaginary unit, E(t) and I(t) are the instantaneous values, $E_0$ and $I_0$ the peak amplitudes of the
potential and the current respectively, φ the phase-angle difference and ω the angular frequency in radians per second
(ω = 2πf).
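
In this notation the impedance is the complex ratio Z = E(t)/I(t) = ($E_0$/$I_0$) exp(iφ), so its modulus and phase can be evaluated directly with complex arithmetic. The following minimal sketch does this for an ideal capacitor and an ideal resistor; the function names, the frequency and the component values are assumptions chosen for illustration.

```python
import cmath
import math


def capacitor_impedance(omega, C):
    """Impedance (ohm) of an ideal capacitor, Z = 1/(i*omega*C)."""
    return 1.0 / (1j * omega * C)


def modulus_and_phase(Z):
    """Return (|Z| in ohm, phase angle of Z in degrees)."""
    modulus, phase = cmath.polar(Z)
    return modulus, math.degrees(phase)


omega = 2 * math.pi * 1000.0             # assumed frequency f = 1 kHz
Z_C = capacitor_impedance(omega, 1e-6)   # assumed capacitance of 1 uF
Z_R = complex(100.0, 0.0)                # assumed resistance of 100 ohm
print("capacitor:", modulus_and_phase(Z_C))
print("resistor: ", modulus_and_phase(Z_R))
```

The capacitor gives a phase angle of −90°, that is, the current phase ωt − φ is 90° ahead of the voltage, whereas the resistor leaves the phase unchanged.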

Introducing the complex notation enables the impedance relationships to be presented as Argand diagrams in 
both Cartesian and polar co-ordinates (r, φ). The former leads to the Nyquist impedance spectrum, where the
real impedance is plotted against the imaginary and the latter to the Bode spectrum, where both the modulus 
of impedance, r, and the phase angle are plotted as a function of the frequency. In AC impedance the cell is 
essentially replaced by a suitable model system in which the properties of the interface and the electrolyte are 
represented by appropriate electrical analogues and the impedance of the cell is then measured over a wide
frequency range, usually between 10⁴ and 10⁻³ Hz. By comparing the measured results with values calculated
from the model system, the suitability of the model and the values of the parameters can be evaluated. In fact, 
one of the advantages of EIS is that impedance functions frequently display many of the features exhibited by 
passive electrical circuits. The most important elements employed in equivalent circuits are the resistor R, 
which represents the resistance that charge carriers encounter in a specific medium, the capacitor C, which 
represents the accumulation of charged species, and the inductance L, which represents the deposition of
surface layers (figure B1.28.8). An analogy, however, is not always feasible, due to the active nature of
electrochemical interfaces and the chemical nature of charge transfer processes, as well as the non-ideal 
electric behaviour of real electrochemical systems. Furthermore, problems can arise in selecting a correct 
equivalent circuit out of a large number of possibilities, because of the uncertainty connected with the 
impedance at low frequency and because of the large number of possible combinations of mechanistic 
reactions that can produce the same impedance shape within error limits. Various methods have been reported 
to identify the correct equivalent circuit and to obtain values for the elements in the circuit [81, 82 and
83]. Spectra displaying one time constant are simple to interpret and may be easily resolved graphically
directly from the spectra; however, more complex methods such as deconvolution and complex nonlinear 
least-square methods, or a combination of these are required for more complicated spectra. A number of 
software packages are based on these different types of methods. 
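
A spectrum with a single time constant can be generated from a very simple model circuit. The sketch below assumes a solution resistance in series with a parallel combination of a charge-transfer resistance and a double-layer capacitance; this simplified circuit is an assumption made for illustration and is not the circuit of figure B1.28.8, and all names and component values are hypothetical.

```python
import math


def model_impedance(freq_hz, r_s, r_ct, c_dl):
    """Impedance (ohm) of r_s in series with (r_ct parallel to c_dl)."""
    omega = 2 * math.pi * freq_hz
    return r_s + r_ct / (1 + 1j * omega * r_ct * c_dl)


def log_frequencies(f_high=1e4, f_low=1e-3, points_per_decade=5):
    """Logarithmically spaced frequencies (Hz) from f_high down to f_low."""
    n = int(round(math.log10(f_high / f_low) * points_per_decade)) + 1
    return [f_high * 10 ** (-k / points_per_decade) for k in range(n)]


r_s, r_ct, c_dl = 10.0, 100.0, 20e-6  # assumed values: ohm, ohm, F
for f in log_frequencies():
    z = model_impedance(f, r_s, r_ct, c_dl)
    modulus = abs(z)                                  # Bode: modulus against f
    phase = math.degrees(math.atan2(z.imag, z.real))  # Bode: phase against f
    nyquist_point = (z.real, -z.imag)                 # Nyquist: one point per frequency

# Frequency at the top of the Nyquist semicircle, where omega * r_ct * c_dl = 1.
f_apex = 1 / (2 * math.pi * r_ct * c_dl)
print(f"apex frequency = {f_apex:.1f} Hz, Z = {model_impedance(f_apex, r_s, r_ct, c_dl):.1f} ohm")
```

For this model the Nyquist plot is a semicircle whose high-frequency intercept is the series resistance and whose diameter is the charge-transfer resistance, so both values, and the capacitance from the apex frequency, can be read graphically from the spectrum.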

Figure B1.28.8. Equivalent circuit for a three-electrode electrochemical cell. WE, CE and RE represent the
working, counter and reference electrodes; R is the solution resistance, R the uncompensated resistance, $R_t$
the charge-transfer resistance, $R_Y$ the resistance of the reference electrode, $C_d$ the double-layer capacitance and
$C_r$ the parasitic loss to the ground.

AC impedance spectroscopy is widely employed for the investigation of both solid- and liquid-phase 
phenomena. In particular, it has developed into a powerful tool in corrosion technology and in the study of 
porous electrodes for batteries [84, 85, 86 and 87]. Its usage has grown to include applications ranging from 
fundamental studies of corrosion mechanisms and material properties to very applied studies of quality 
control and routine corrosion engineering. In corrosion, EIS enables one to obtain instantaneous corrosion-rate 
information, polarization resistance and information on the kinetics and mechanisms of charge transfer 
processes such as oxide growth and metal dissolution. The technique is frequently employed in the monitoring 
of polymer-coated metals to investigate the corrosion protection, the dielectric properties, the onset of defect 
formation and the processes of coating degradation [84].

B1.28.6 PHOTOELECTROCHEMISTRY

The combination of electrochemistry and photochemistry is a form of dual-activation process. Evidence for a 
photochemical effect in addition to an electrochemical one is normally seen in the form of photocurrent, 
which is extra current that flows in the presence of light [88, 89 and 90]. In photoelectrochemistry, light is 
absorbed into the electrode (typically a semiconductor) and this can induce changes in the electrode's 
conduction properties, thus altering its electrochemical activity. Alternatively, the light is absorbed in solution 
by electroactive molecules or their reduced/oxidized products inducing photochemical reactions or 
modifications of the electrode reaction. In the latter case electrochemical cells (RDE or channel-flow cells) 
are constructed to allow irradiation of the electrode area with UV/VIS light to excite species involved in 
electrochemical processes and thus promote further reactions. 


Conduction in semiconductors requires that electrons in the valence band be excited into the conduction band 
either by thermal or photochemical excitation. Upon excitation, an unoccupied vacancy (a hole) is left in the 
valence band. The hole and the excited electrons can move in response to an applied electric field and so 
permit the passage of current. Semiconduction can be controlled via doping of small quantities of material, 
which can be either electron donating or electron accepting, leading to n-type and p-type semiconductors [91, 
92, 93 and 94]. In a solution containing a redox couple, electron transfer will occur until the electrochemical 
potentials of the semiconductor and the solution are equal (figure B1.28.9). The semiconductor will have a net
positive or negative charge, which is situated near the surface of the solid, known as the space-charge layer 
(2-500 nm thickness). The bands may be envisaged as bent: band-bending downwards indicates excess of 
negative charge at the surface, whereas band-bending upwards indicates an excess of positive holes. 
Potentiostatic control of a semiconductor can change the energy of the conduction and valence bands with
consequent changes in the band bending, leading to the supply and removal of charge carriers within the
space-charge layer and enabling electrolysis to occur at the solid/liquid interface. For the n-type semiconductor TiO₂, for
example, the conduction and valence bands are bent upward at the surface, provided that a very negative 
potential is not applied to the electrode. An applied potential leading to no band-bending — i.e. to an absence 
of the space-charge layer — is called the flat-band potential of the semiconductor. Upon irradiation with light 
of an equal or greater energy than the band gap, the photochemically promoted electrons will be swept into 
the bulk of the material by the electric field present in the space-charge layer, whilst the holes in the valence 
band will migrate to the surface of the solid (figure B1.28.10). As a result, photo-oxidation processes will be
promoted to occur at the solid-liquid interface. 
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Figure Bl.28.9. Energetic situation for an n-type semiconductor (a) before and (b) after contact with an 
electrolyte solution. The electrochemical potentials of the two systems reach equilibrium by electron 
exchange at the interface. Transfer of electrons from the semiconductor to the electrolyte leads to a positive 
space-charge layer, W. U is the potential drop in the space-charge layer.

Figure B1.28.10. Schematic representation of an illuminated (a) n-type and (b) p-type semiconductor in the
presence of a depletion layer formed at the semiconductor/electrolyte interface.

In the last 30 or more years, research in the field of photoelectrochemistry and photocatalysis has greatly
expanded, with advances being made in the fundamental understanding of the faradaic processes that control
charge transfer at semiconductor/liquid junctions, and with the development of stable, efficient and
inexpensive photoelectrochemical cells [95, 96 and 97]. Semiconductor photoelectrochemistry has had an
impact in a number of fields. Photocorrosion has been exploited to prepare technologically useful structures 
such as lenses that are integrated with light-emitting diodes and mated optical fibres. It has also led to the 
formation of porous Si electrodes, which have received much attention due to their interesting optoelectronic 
properties. Photoelectrochemical surface preparation of semiconductor materials has been used to clean solids 
and to evaluate etch pit densities. 

Photoelectrochemistry may be used as an in situ technique for the characterization of surface films formed on 
metal electrodes during corrosion. Analysis of the spectra allows the identification of semiconductor surface 
phases and the characterization of their thickness and electronic properties. 

Furthermore, semiconductor powders can be employed for the catalytic generation of useful products such as
H₂ and O₂ that can be used for the destruction of pollutants [98, 99 and 100].


B1.28.7 SPECTROELECTROCHEMISTRY 


In addition to the mechanism of electrode reactions, readily deduced using voltammetric techniques, the 
electrochemist seeks knowledge of the chemical composition and properties of electro-generated 
intermediates and films formed on electrode surfaces. Spectroelectrochemistry allows the simultaneous
acquisition of electrochemical and spectroscopic data, which adds information to the investigation
of a wide range of complex surface and homogeneous processes occurring in electrochemical systems [101,
102, 103 and 104]. Advances made in instrumentation over the past three decades have enabled the adaptation
of spectroscopic methods to in situ application in an electrochemical cell and the development of new 
techniques, which have found widespread use in structure characterization of electrode surfaces, in 
identification of homogeneous phase molecules, and in studies of species adsorbed at the electrode/electrolyte 
interface. 


One of the first in situ combined electrochemical/spectroscopic techniques to be investigated employed 
UV/VIS detection, where solution-phase spectra of organic radicals generated at an electrode could be 
recorded. Typically, organic intermediates or products possess additional absorption bands not observed in the 
parent molecule, which can be used to fingerprint the electro-generated species. In addition to spectra, useful
information may be obtained by monitoring the absorbance as a function of time during a potential-step
experiment. When the potential is stepped from a value where no electron transfer takes place to one where
an electro-generated species is formed, the absorbance-time response of the products may be
analysed and, from its shape and size, the lifetime of the electro-generated intermediates can be
estimated.
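
As a rough illustration of such an absorbance-time analysis, the sketch below assumes the simplest limiting case, which the text does not spell out: a diffusion-controlled potential step at a planar optically transparent electrode, a beam passing normal to the electrode, and a stable product that is the only absorbing species. Under those assumptions the standard result is A(t) = 2 ε C (D t / π)^(1/2), so the absorbance grows as the square root of time; a short-lived intermediate would fall below this line. The function name and the numerical values are hypothetical and serve only to show the scaling.

```python
import math

def absorbance_stable_product(t, epsilon, c_bulk, diff_coeff):
    """Absorbance of a stable electro-generated product after a potential step.

    Assumes semi-infinite linear diffusion to a planar transparent electrode
    and Beer-Lambert absorption by the product only.
    t          : time since the step / s
    epsilon    : molar absorptivity of the product / (L mol^-1 cm^-1)
    c_bulk     : bulk concentration of the reactant / (mol L^-1)
    diff_coeff : diffusion coefficient of the reactant / (cm^2 s^-1)
    """
    return 2.0 * epsilon * c_bulk * math.sqrt(diff_coeff * t / math.pi)

# Hypothetical values: 1 mM reactant, epsilon = 1e4 L mol^-1 cm^-1, D = 1e-5 cm^2 s^-1
for t in (0.25, 1.0, 4.0):
    a = absorbance_stable_product(t, 1.0e4, 1.0e-3, 1.0e-5)
    print(f"t = {t:4.2f} s   A = {a:.4f}")
```

Comparing a measured absorbance transient with this square-root-of-time reference is one simple way to judge whether the product survives on the timescale of the experiment.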

In UV-VIS spectroelectrochemistry, optically transparent electrodes (OTEs) are utilized. A beam of 
monochromatic UV-VIS light is directed perpendicularly through the OTE, then through the diffusion layer 
next to the electrode and the bulk solution, before passing out of the electrochemical cell through an exit 
window and being detected. The beam is attenuated by the presence of absorbing species in the solution, 
therefore enabling spectral and temporal information about the concentrations of such species in the diffusion 
layer at the electrode to be obtained. An alternative design is the optically transparent thin layer electrode, in 
which a minigrid electrode is employed [2, 103, 105].

Infrared spectroscopy has also been widely employed in electrochemistry [105, 106 and 107]. Spectra aid the
identification of reactants, of products and of long-lived intermediates and allow changes in the interfacial 
solvent to be tracked. A variety of spectral sampling and data acquisition methods have been developed to 
approach in situ detection of species. In external reflection sampling methods, the infrared beam is directed 
through a polarizer onto the front surface of a highly polished disc-shaped working electrode with high 
reflectivity in the infrared spectral region, such as Pt, Au or Ag. A special, thin-layer electrochemical cell is 
used that permits the infrared beam to enter and strike the disc, where it is reflected out of the cell and 
detected. In contrast, in attenuated total internal reflection sampling methods, the working electrode is a thin 
film of metal deposited on one surface of an ATR crystal. The metal film must be sufficiently thin to allow 
penetration of the IR evanescent wave beyond the metal/solution interface. The ATR crystal forms the bottom
of a chamber that holds the electrolyte solution and the counter and reference electrodes, and the crystal is
positioned so that the metal film is inside the chamber. This method has not been widely used in
electrochemistry, partly because thin metal film working electrodes are difficult to prepare.
Nonetheless, the ATR design overcomes the molecular transport limitations imposed by external reflection
methods, where a thin solution layer of the order of 1-5 µm between the front face of the working electrode and
the infrared-transparent window is required to minimize absorption of infrared radiation by the solvent.
Diffusion of species into and out of this thin-layer region is restricted, which can lead to reactant depletion or
product accumulation.

Luminescence has been used in conjunction with flow cells to detect electro-generated intermediates 
downstream of the electrode. The technique lends itself especially to the investigation of 
photoelectrochemical processes, since it can yield information about excited states of reactive species and 
their lifetimes. It has become an attractive detection method for various organic and inorganic compounds, 
and highly sensitive assays for several clinically important analytes such as oxalate, NADH, amino acids and 
various aliphatic and cyclic amines have been developed. It has also found use in microelectrode fundamental 
studies in low-dielectric-constant organic solvents. 

One of the most important advances in electrochemistry in the last decade was the application of STM and 
AFM to structural problems at the electrified solid/liquid interface [108, 109]. Sonnenfeld and Hansma [110]
were the first to use STM to study a surface immersed in a liquid, thus extending STM beyond the gas/solid
interfaces without a significant loss in resolution. In situ local-probe investigations at solid/liquid interfaces
can be performed under electrochemical conditions if both phases are electronic and ionic conducting, and this
offers a great advantage since the Fermi levels of both substrate and tip can be precisely adjusted
independently of each other. This opens the possibility of correlating structural to physical properties, since 
charge transfer central to electrochemical reactivity occurs within a few atomic diameters of the electrode 
surface, in the inner Helmholtz plane, and the detailed arrangement of atoms and molecules at this interface 
strongly controls the corresponding electrochemical reactivity. Since its introduction in electrochemistry, 
STM investigations have focused on studies of metal electrodes, i.e. Au, Pt, Pd and Rh, their surface charges 
in the double-layer potential region, and the surface changes caused by the formation of surface oxides. In 
addition, it has been employed in reconstruction and restructuring studies of metal surfaces, in studies of 
underpotential deposition of metals, in the investigation of adsorption/desorption processes, as well as in the 
understanding of processes controlling deposition and corrosion at semiconductor electrodes. 

Amongst other spectroscopic techniques which have successfully been employed in situ in electrochemical 
investigations are ESR, which is used to investigate electrochemical processes involving paramagnetic 
molecules, Raman spectroscopy and ellipsometry.


B1.29 High-pressure studies

Malcolm F Nicol 


B1.29.1 INTRODUCTION 

This chapter introduces the physical chemistry of materials under high pressures. Space limitations permit 
only a broad-brush introductory survey. High-pressure studies range from designing equipment to generate, to 
confine and to measure high pressures to spectroscopic studies from 10 Hz to beyond 10 Hz at 
temperatures from below 1 K to 10⁵ K and beyond for all sorts of elements, compounds, solutions and
mixtures. To say that these are extreme ranges of conditions is an understatement. 

To gain a sense of the range of behaviours, consider what happens to one element familiar to every chemist 
and physicist: oxygen. At ambient temperature, oxygen, O₂, exists as the canonical odourless, colourless gas
of elementary school and as a purple, orange, red, blue or black solid depending on the pressure and the
direction from which you look at the crystals. It becomes a metal and, at low temperatures, an antiferromagnet
or a superconductor. In the solid phase stable above 10 GPa, O₂ has a strong infrared vibrational absorption
band in the stretching region. Then, of course, there is the O₃ isomer, which has not been studied at high
pressures. 

Similarly 'strange' things happen to other materials. Above 5 GPa, CO spontaneously polymerizes; the
structure of the product is [structural formula not recoverable; the surviving fragment reads (-C=O-)]. Indeed,
almost every carbon compound with unsaturated bonds
becomes unstable with respect to reactions that produce saturated compounds at this or slightly higher
pressures. Somewhat above 100 GPa, CsI and Xe also are metals. By 100 TPa and 10⁵ K (yes, experiments
have been done at these conditions) the density of Al exceeds 12 g cm⁻³, or about five times greater than at
ambient pressure. Only the 1s and possibly 2s electrons remain localized on an Al nucleus.

Books are available on many of these subjects. The objective here, therefore, is to introduce several 
fundamental issues and point to additional information by citing key references and suggesting further 
reading. We begin by briefly delimiting what we mean by high pressure. Then, we discuss how high pressures 
are achieved and measured before describing the behaviours of a few familiar materials at high pressures. 


B1.29.2 WHAT IS PRESSURE? 

Almost everyone has a concept of 'pressure' from weather reports of the pressure of the atmosphere around 
us. In this context, 'high pressure' is a sign of good weather while very low pressures occur at the 'eyes' of 
cyclones and hurricanes. In elementary discussions of mechanics, hydrostatics of fluids and the gas laws, most 
scientists learn to compute pressures in static systems as force per unit area, often treated as a scalar quantity. 
They also learn that unbalanced pressures cause fluids to flow. Winds are the flow of the atmosphere from 
regions of high to low
pressures. However, high and low pressures in the atmosphere rarely deviate by as much as 10% from the
local mean pressure, about 0.1 megapascal at 'sea level'. The pascal (Pa) is the SI unit of pressure, 1 Pa = 1 N
m⁻² = 10⁻⁵ bar. One standard atmosphere is about 1.013 × 10⁵ Pa. Local fluctuations in the pressure of the
atmosphere are, however, much smaller than the difference between the average pressure at 'sea level' and at 
the peaks of high mountains. The average pressure near the top of Mount Everest is about one-third of
atmospheric pressure at 'sea level'.
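
Because this chapter moves freely between pascals, bars, kilobars, megapascals and gigapascals, a small conversion helper can be convenient. The sketch below is only an illustration; the dictionary and function names are hypothetical, and the factors are the standard definitions (1 bar = 10⁵ Pa, 1 atm = 101 325 Pa).

```python
# Pascals per unit for the pressure units used in this chapter.
PA_PER_UNIT = {
    "Pa": 1.0,
    "bar": 1.0e5,
    "atm": 1.01325e5,
    "MPa": 1.0e6,
    "kbar": 1.0e8,
    "GPa": 1.0e9,
    "TPa": 1.0e12,
}

def convert_pressure(value, from_unit, to_unit):
    """Convert a pressure between any two units listed in PA_PER_UNIT."""
    return value * PA_PER_UNIT[from_unit] / PA_PER_UNIT[to_unit]

print(convert_pressure(1.0, "atm", "MPa"))   # about 0.1 MPa, as quoted for sea level
print(convert_pressure(5.0, "kbar", "GPa"))  # 5 kbar expressed in GPa
```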

This example of high and low pressure also shows the ambiguities of these terms in science. All these 
pressures are essentially constant in terms of the range of pressures encountered in nature. From negative 
pressures in solids under tension (e.g., on the wall of a flask confining a fluid), pressure in nature increases
through the very low-pressure vacuum of interplanetary space (less than 10⁻¹³ Pa) to well in excess of 10²⁰ Pa
at the centres of neutron stars! In these terms, high pressure and low pressure are relative terms with different 
meanings in different areas of chemistry and physics. Test this by searching an electronic database for 'high 
pressure'. A discussion of high-pressure studies, therefore, must decide what pressure is and what high means; 
just how high is high. 

Relationships from thermodynamics provide other views of pressure as a macroscopic state variable. Pressure, 
temperature, volume and/or composition often are the controllable independent variables used to constrain 
equilibrium states of chemical or physical systems. For fluids that do not support shears, the pressure, P, at 
any point in the system is the same in all directions and, when gravity or other accelerations can be neglected, 
is constant throughout the system. That is, the equilibrium state of the system is subject to a hydrostatic 
pressure. The fundamental differential equations of thermodynamics: 

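\[ dU = -P\,dV + T\,dS \]
\[ dA = -P\,dV - S\,dT \]

identify P through the Maxwell relations:

\[ P = -\left(\frac{\partial U}{\partial V}\right)_S = -\left(\frac{\partial A}{\partial V}\right)_T . \]

Two other Maxwell relations define the direction systems change to achieve equilibrium:

\[ V = \left(\frac{\partial H}{\partial P}\right)_S = \left(\frac{\partial G}{\partial P}\right)_T . \]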

In both mechanical (constant S, minimize H) and thermal (constant T, minimize G) contexts, pressure drives a 
system to become smaller or denser. 
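
The identification of P as a volume derivative of a free energy can be checked numerically for a simple model. The sketch below assumes one mole of an ideal gas, which is not a system discussed in the text, and keeps only the volume-dependent part of its Helmholtz energy, A = -nRT ln V + f(T). A central finite difference of A with respect to V at constant T should then reproduce the ideal-gas pressure nRT/V. All function names are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant / (J mol^-1 K^-1)

def helmholtz_ideal_gas(volume, temperature, n=1.0):
    """Volume-dependent part of A for an ideal gas (J).

    Terms that depend only on temperature are omitted because they
    drop out of the derivative at constant T.
    """
    return -n * R * temperature * math.log(volume)

def pressure_from_helmholtz(volume, temperature, dv=1.0e-7):
    """P = -(dA/dV)_T estimated by a central finite difference (Pa)."""
    a_plus = helmholtz_ideal_gas(volume + dv, temperature)
    a_minus = helmholtz_ideal_gas(volume - dv, temperature)
    return -(a_plus - a_minus) / (2.0 * dv)

volume = 0.024465      # m^3, roughly one mole of gas near ambient conditions
temperature = 298.15   # K
p_numeric = pressure_from_helmholtz(volume, temperature)
p_ideal = R * temperature / volume
print(f"finite difference: {p_numeric:.1f} Pa, nRT/V: {p_ideal:.1f} Pa")
```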

The situation is more complex for rigid media (solids and glasses) and more complex fluids: that is, for most 
materials. These materials have finite yield strengths, support shears and may be anisotropic. As samples, they 
usually do not relax to hydrostatic equilibrium during an experiment, even when surrounded by a hydrostatic 
pressure medium. For these materials, P should be replaced by a stress tensor, σ_ij, and the appropriate
thermodynamic equations are more complex. 


The take-home lesson is that the vast majority of high-pressure studies are on solids or other rigid media and 
are not done under hydrostatic conditions. The stresses and stress-related properties may vary throughout the 
sample. Unless the probes are very local and focus on a small region of the sample, measurements are 
averages over a range of, often uncharacterized, conditions. 


As well as macroscopic equations of state relating free energies, enthalpy, entropy, density, composition, 
temperatures and pressure, high-pressure science also concerns how changes of pressure and other 
macroscopic constraints affect the microscopic molecular and electronic structures of matter. At low 
pressures, the chemistry of most materials is described in terms of electrons tightly bound to specific atoms, 
molecules or ions and relatively weaker intermolecular van der Waals or ionic forces. The itinerant 
conduction electrons of metals are an exception; they are delocalized throughout the solid. The highly local 
nature of most electrons reflects the drive to minimize their potential energies. 

To a rough approximation, the kinetic and potential energies of electrons in simple systems vary with density
as ρ^(2/3) and -ρ^(1/3), respectively. This means that kinetic energy considerations should dominate at very high
densities. Localized electrons should, therefore, eventually delocalize at very high pressures, converting ionic 
and molecular materials to more closely packed extended network structures and to metals at high pressures. 
Again, of course, the question occurs: how high is very high? Experiments provide the answer, or at least a 
lower limit. Where the answer is missing, experimenters are driven to try to attain even higher pressures. 

Many experiments support the pressure-delocalization principle. Most unsaturated organic molecules 
including CO and C₂N₂ polymerize at pressures of the order of 10 GPa [1, 2 and 3]. Layered covalent solids
like graphite and hexagonal boron nitride (h-BN) transform to dense, three-dimensional network materials,
diamond and cubic boron nitride (c-BN) [4, 5 and 6]. Solid oxygen, solid and fluid iodine, fluid hydrogen and
nitrogen, xenon, and cesium iodide are examples of materials developing metallic behaviour at pressures of
the order of 100 GPa [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 and 22]. Later sections provide
more details about some of these transformations. At intermediate pressures, the energies of atomic and
molecular orbitals change with pressure. By measuring electronic spectra at high pressures, differences of
these energy changes can be determined for various different orbitals. In some cases, spectral features change
by as much as 0.1 eV GPa⁻¹. Drickamer named this phenomenon pressure-tuning spectroscopy and has
written extensively about observations for many systems [23, 24 and 25].


B1.29.3 WHAT PRESSURES ARE HIGH? 

What then are high pressures? The answer to this question involves the bias of personal experience. I often 
remark in an off-hand manner that 'In my laboratory, we consider 5 kbar a low pressure'. We have several 
reasons for setting the low-high boundary around 1 GPa. This and slightly higher pressures can conveniently 
be achieved by use of commercial autoclaves several litres in size, mechanical compressors and fluid or even 
compressed-gas pressure media. Many commercial processes run at these and lower pressures. These include 
the Haber synthesis of ammonia, a method for producing high-density polyethylene, and recently developed 
methods for producing vaccines by denaturing viruses or for sterilizing (pascalizing) strawberry jam and other 
foods at ambient temperature which preserve their flavours better than pasteurizing at higher temperatures. 

The rates of several chemical reactions accelerate by factors of 10⁴ or more between 0.1 and 100 MPa at
ambient temperature, so much interesting chemistry occurs at these lower pressures. At such 'low' pressures, 
Bridgman [26] even showed how to cook eggs at 'room' temperature. 


At ambient temperature, however, few materials remain fluid at pressures much higher than 1 GPa. Fluids 
also are much more difficult to confine at higher pressures. Absolute pressures have been measured with 
dead-weight testers only to about 2.5 GPa, that is, to the pressure of a solid-solid phase transition in elemental
bismuth at 298 K [27]. The importance of non-hydrostatic stresses and changes to the technology of
high-pressure studies above 1 GPa suggest the rough dividing line which I have adopted for this essay.

The energies of chemical changes provide a third criterion for defining high pressures. Many unsaturated
organic compounds dimerize or polymerize at high pressures because the molar volumes of the products are
smaller by about 10⁻⁵ m³ mol⁻¹ (10 cm³ mol⁻¹). At a pressure of 1 GPa, the corresponding decrease of the energy, enthalpy and free
energy is 10 kJ mol⁻¹, which is relatively modest compared with chemical bond energies. At 10 GPa, for the same
difference of molar volume, the energies decrease by 100 kJ mol⁻¹, an amount comparable to bond energies.
That is, chemical change can be anticipated at pressures somewhat above 1 GPa.
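
The arithmetic behind this criterion is simply the product PΔV. The sketch below reproduces the two numbers quoted above and assumes, as the text implicitly does, that the volume difference does not itself change with pressure; the function name is hypothetical.

```python
def pv_energy_kj_per_mol(pressure_gpa, delta_v_cm3_per_mol):
    """Return P * dV in kJ/mol for a pressure-independent volume change.

    pressure_gpa        : pressure / GPa
    delta_v_cm3_per_mol : molar volume change (products minus reactants) / (cm^3 mol^-1)
    """
    pressure_pa = pressure_gpa * 1.0e9               # GPa -> Pa
    delta_v_m3 = delta_v_cm3_per_mol * 1.0e-6        # cm^3 -> m^3
    return pressure_pa * delta_v_m3 / 1.0e3          # J -> kJ

# A product that is denser by 10 cm^3/mol, at three pressures
for p in (0.1, 1.0, 10.0):
    print(f"P = {p:5.1f} GPa   P*dV = {pv_energy_kj_per_mol(p, -10.0):8.1f} kJ/mol")
```

Since 1 GPa × 1 cm³ mol⁻¹ is exactly 1 kJ mol⁻¹, the conversion is easy to do mentally when pressures are quoted in GPa and volumes in cm³ mol⁻¹.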


B1.29.4 HOW ARE HIGH PRESSURES ACHIEVED? 

Laboratory high-pressure studies follow many approaches. The pressure may remain constant (so-called static 
experiments) or be transient (so-called dynamic experiments where shock waves generated by an explosion, 
impact or laser ablation compress a sample for a few microseconds or shorter times). Many devices have been 
used for static experiments to about 20 GPa; Jayaraman described many of these in three articles [28, 29 and
30]. Many static high-pressure cells are variants of the piston-cylinder apparatus frequently used to illustrate
compression in elementary discussions of the thermodynamics of gases. An external force on a piston free to
move within a cylinder applies pressure to a sample that is confined as long as the seal between the piston and
cylinder remains leaktight. Bridgman's anvils [31] represent a different concept, which confines the
sample between two pistons made of a hard material shaped as truncated cones and a crushable cylindrical
gasket. The classical Bridgman design used cemented tungsten carbide anvils and a lava (pyrophyllite) gasket.
External force applied to the anvils crushes the gasket, preventing the sample from 'blowing out', while
applying pressure to the confined sample. Many dual-piston, tetrahedral and cubic cells elaborate on one or 
both of these concepts of compressing a sample within a confined, sealed volume. 

All static studies at pressures beyond 25 GPa are done with diamond-anvil cells conceived independently by 
Jamieson [32] and by Weir et al [33]. In these variants of Bridgman's design, the anvils are single-crystal
gem-quality diamonds, the hardest known material, truncated with small flat faces (culets) usually less than
0.5 mm in diameter. Diamond anvils with culets 50 µm or smaller in diameter can generate pressures to about
500 GPa, the highest static laboratory pressures, comparable to the pressure at the centre of the Earth.
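
The reason such small culets are needed follows from pressure being force per unit area. The sketch below estimates the force required on a flat circular culet, under the simplifying assumption that the pressure is uniform across the culet face. Real anvils are often bevelled and the pressure distribution is far from uniform, so the numbers are order-of-magnitude guides only, and the function name is hypothetical.

```python
import math

def culet_force_newtons(pressure_gpa, culet_diameter_um):
    """Force needed to hold a uniform pressure on a flat circular culet (N)."""
    radius_m = 0.5 * culet_diameter_um * 1.0e-6
    area_m2 = math.pi * radius_m ** 2
    return pressure_gpa * 1.0e9 * area_m2

# 500 GPa on a 50 um culet versus 25 GPa on a 0.5 mm culet
print(f"{culet_force_newtons(500.0, 50.0):.0f} N")
print(f"{culet_force_newtons(25.0, 500.0):.0f} N")
```

On this estimate, a force of roughly a kilonewton is enough to reach the highest static pressures when it is concentrated on a 50 µm culet.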

Dynamic experiments with conventional (chemical) explosives or projectiles accelerated in gas guns have 
achieved 1 TPa in favourable cases. Laser-driven shocks have produced higher shock pressures [34], and 
measurements to 75 TPa have been reported for shock waves generated during underground tests of nuclear 
explosives (for a recent discussion see [35]). Sample volumes in static experiments range from litres at 
pressures up to 10 GPa to 0.1 nl at 500 GPa. Commercial dynamic high-pressure production of
diamond powder has been done on the 100 kl scale. Most samples for shock wave studies are smaller; laser-driven
shock wave experiments often use microlitre samples. 


In static experiments, the temperatures of samples can be controlled from less than 1 to more than 5000 K and 
can be measured with reasonable accuracy. For low temperatures, entire pressure vessels are thermostatted by 
mounting them in cryostats or surrounding them with heaters or furnaces. With these techniques, the 
temperatures are uniform throughout the sample. The strengths of the materials used to construct the vessel, 
however, limit the temperatures and pressures that can be achieved by such external heating methods. For 
higher temperatures, internal heating is used: that is, the sample is heated while the confining pressure vessel 
is kept at a lower temperature to maintain its mechanical strength. This can be done by surrounding the
sample with heating elements mounted inside the pressure vessel or by irradiating the
sample with an intense infrared or visible laser. Internal-heating methods may involve very large temperature
gradients, up to 10⁸ K m⁻¹ (100 K µm⁻¹), and challenge thermometry. The irreversible nature of the work
done when shock waves compress a sample necessarily increases the sample's temperature; however, the 
temperature of the shocked state is often impossible to characterize. Indeed, measuring temperatures achieved 
during dynamic experiments is one of the biggest unsolved problems of high-pressure research. 


B1.29.5 HOW ARE HIGH PRESSURES MEASURED? 

Absolute pressure measurements by dead-weight piston-cylinder methods have been made only to 2.5 GPa,
although Getting recently developed a cell which may extend absolute measurements to 5 GPa [27]. Pressures 
achieved during shock experiments are computed with the Rankine-Hugoniot equations which assume that 
the shocked, high-temperature state is one of thermodynamic equilibrium and mass, momentum, and energy 
are conserved [35]. Several procedures have been developed to relate densities and other properties of states 
achieved during shock experiments to values of the same properties on ambient or zero-Kelvin isotherms. 
Pressures in static experiments above 2.5 GPa are often determined either by using x-ray diffraction to measure
the density of a convenient material like NaCl or Au confined with the sample being studied, or by a secondary probe
that has been calibrated in terms of x-ray densities. Typical secondary probes include luminescence spectra of
ruby (dilute Cr³⁺ in Al₂O₃) [36] or Sm:YAG [37] or Raman spectra of nitrogen [38] or diamond [39]. In each
secondary probe, a spectral feature — a narrow emission line or vibrational band — whose energy can be 
measured precisely changes energy with pressure in an established, ideally simple manner. 
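
Two of the pressure determinations described above reduce to short formulas, sketched below under stated assumptions. For shock experiments, the Rankine-Hugoniot jump conditions give the pressure, density and specific internal energy behind a steady shock from the initial density, the shock velocity and the particle velocity. For the ruby secondary probe, the text gives no calibration, so the sketch assumes a widely quoted empirical form, P = (A/B)[(λ/λ₀)^B - 1] with A = 1904 GPa, B = 7.665 and λ₀ = 694.24 nm; these constants are assumptions and should be replaced by whatever calibration is actually adopted. The function names and the input values are hypothetical.

```python
def hugoniot_state(rho0, shock_velocity, particle_velocity, p0=0.0):
    """Rankine-Hugoniot jump conditions for a steady planar shock (SI units).

    Returns (pressure / Pa, density / kg m^-3, specific energy increase / J kg^-1).
    """
    # Momentum conservation
    pressure = p0 + rho0 * shock_velocity * particle_velocity
    # Mass conservation
    rho = rho0 * shock_velocity / (shock_velocity - particle_velocity)
    # Energy conservation
    delta_e = 0.5 * (pressure + p0) * (1.0 / rho0 - 1.0 / rho)
    return pressure, rho, delta_e

def ruby_pressure_gpa(wavelength_nm, wavelength0_nm=694.24, a_gpa=1904.0, b=7.665):
    """Pressure from the ruby R1 line position (assumed empirical calibration)."""
    return (a_gpa / b) * ((wavelength_nm / wavelength0_nm) ** b - 1.0)

# Hypothetical shock: initial density 2700 kg/m^3, Us = 8 km/s, up = 2 km/s
p, rho, de = hugoniot_state(2700.0, 8000.0, 2000.0)
print(f"P = {p / 1e9:.1f} GPa, rho = {rho:.0f} kg/m^3, dE = {de / 1e6:.1f} MJ/kg")

# Hypothetical ruby line shifted by 0.365 nm from its ambient position
print(f"P(ruby) = {ruby_pressure_gpa(694.24 + 0.365):.2f} GPa")
```

Note that the Hugoniot relations by themselves say nothing about temperature.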


B1.29.6 HIGH-PRESSURE FORMS OF FAMILIAR OR USEFUL 
MATERIALS: DIAMOND, FLUID METALLIC HYDROGEN, METALLIC 
OXYGEN, IONIC CARBON DIOXIDE, GALLIUM NITRIDE 

The most important commercial products of high-pressure science are the extremely hard materials, synthetic 
diamond [40, 41 and 42] and cubic boron nitride (c-BN) [43]. At ambient pressure, diamond is less stable than the less dense
allotrope, graphite. c-BN also may be less stable than the less dense h-BN isomer with its graphitic structure
of rings of alternating B and N atoms. Diamond and c-BN, with four equivalent sp³ bonds per atom, are denser
than graphite and h-BN, which have three sp² bonds, each shorter than an sp³ bond, and one very long intermolecular
van der Waals bond per atom. The volume change, ΔV, for the transformation from graphite to diamond is
negative. Thus, the Δ(PV) = PΔV contribution to the change of enthalpy or Gibbs free energy is negative and
becomes even more negative at higher pressures, so the denser forms of each material become more stable at
higher pressures. Rearranging the bonds around each atom involves high energy barriers that separate the low-
and high-density forms (for a historical review of these processes see [44]). Although Yagi and Utsumi [45]
showed that graphite converts to the hexagonal form of diamond at
ambient temperature above 10 GPa, complete conversion and recovery to ambient pressure was not possible
unless the product was heated under pressure to more than 1100 K. That is, high temperatures and high
pressures are used to overcome the thermodynamic and kinetic barriers to making the desirable dense hard 
materials that can be recovered metastably. The high barrier also impedes reversion of the recovered diamond 
and c-BN to the stable graphitic forms at low temperatures. Besides the high-pressure route, diamond can be 
synthesized at low pressures, even less than 0.1 MPa, by kinetically controlled gas-surface reactions. 
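
The PΔV argument for the stability of the dense phases can be turned into a rough estimate of the graphite-diamond equilibrium pressure at ambient temperature. The sketch below is a textbook-style estimate, not a calculation from this chapter. It assumes a standard Gibbs energy difference of about +2.9 kJ mol⁻¹ for diamond relative to graphite at 298 K, densities of 2.26 and 3.51 g cm⁻³, and molar volumes that do not change with pressure. All of these inputs are assumptions introduced for illustration.

```python
def transition_pressure_gpa(delta_g0_j_per_mol, delta_v_cm3_per_mol):
    """Pressure at which delta_G0 + P * delta_V = 0, for constant delta_V (GPa)."""
    delta_v_m3 = delta_v_cm3_per_mol * 1.0e-6
    return -delta_g0_j_per_mol / delta_v_m3 / 1.0e9

MOLAR_MASS_C = 12.011          # g/mol
RHO_GRAPHITE = 2.26            # g/cm^3 (assumed)
RHO_DIAMOND = 3.51             # g/cm^3 (assumed)
DELTA_G0 = 2900.0              # J/mol, diamond minus graphite at 298 K (assumed)

v_graphite = MOLAR_MASS_C / RHO_GRAPHITE   # cm^3/mol
v_diamond = MOLAR_MASS_C / RHO_DIAMOND     # cm^3/mol
delta_v = v_diamond - v_graphite           # negative: diamond is denser

p_eq = transition_pressure_gpa(DELTA_G0, delta_v)
print(f"dV = {delta_v:.2f} cm^3/mol, estimated equilibrium pressure = {p_eq:.2f} GPa")
```

The estimate comes out well below the pressures used in practice, consistent with the point made above that high pressures and temperatures are needed to overcome the kinetic barrier as well as the thermodynamic one.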

Large quantities of both diamond and c-BN are produced by static or shock methods for industrial cutting 
applications. Most of the synthetic material is finely powdered and can be bound or compacted to make tools. 
Manufacturing costs for large crystals are too high for the commercial gem market; however, large diamonds 
of exceptionally high quality are made for special applications: e.g. x-ray monochromators for high-intensity 
synchrotrons. For cutting steels, c-BN is particularly valuable, because diamond tools tend to react with the 
steel, forming iron carbide. 

Metallic hydrogen has been a holy grail of high-pressure research since Wigner and Huntington suggested that
it might be stable above about 10 GPa [46]. Many claims to the contrary notwithstanding, metallic solid
hydrogen has not been found at ambient temperatures to 342 GPa [47]. Weir et al, however, found that fluid 
hydrogen becomes highly conductive under shock compression at lower pressures, 140 GPa, and higher 
temperatures [11]. The fact that homonuclear diatomic molecular fluids become conductive at lower 
pressures — and necessarily higher temperatures — than solid phases of the same systems is evident in nitrogen 
and iodine, and may be a general phenomenon. Further careful experiments must be done to confirm this 
conjecture. Detailed studies of the conductivities of supercritical Cs and Hg show that the transition from low 
to high electrical conductivity has neither the characteristics of a thermodynamic phase transition nor a 
general relationship to the vapour-liquid critical point [48, 49, 50 and 51].

Oxygen is the low-Z diatomic which is known to transform to a metal and, at about 1 K, a superconductor at 
high pressures. The transition pressure is slightly greater than 100 GPa [7]. The conductive phase consists of 
O₂ molecules; that is, it is not an atomic phase. Optical, infrared and visible spectral, and x-ray diffraction data
show that the relevant ε phase of oxygen is very anisotropic, and it is reasonable to conjecture that the
electrical conductivity also depends upon crystallographic orientation. The other group VI elements also have 
metallic, superconductive phases at high pressures and low temperatures. 

Recent work on the carbon dioxide system shows another unusual high-pressure behaviour. Raman spectra of 
carbon dioxide show that CO₂ molecules remain the basis of the phases to more than 40 GPa at temperatures
below a few hundred kelvin [52]. These results, however, do not mean that the molecular crystals are the
stable phases; indeed, recent studies of the combustion of carbon at high pressures by Yoo et al [53] reach
another conclusion. They initiated combustion of a mixture of carbon and oxygen at pressures between 7 and
13 GPa by heating the carbon with a Nd:YAG laser, quenched the products to ambient temperature under
pressure and recorded their Raman spectra. As well as features of unreacted O₂ and CO₂ in some samples,
they found vibron bands characteristic of the carbonate ion near 734 and 1079 cm⁻¹, a band assigned to CO²⁺
near 2243 cm⁻¹, and several lattice modes between 100 and 350 cm⁻¹. These features, including the shape of
the lattice-mode spectrum, closely match the spectra of NO⁺NO₃⁻, the ionic dimer of NO₂, reported by Agnew et al
[54], apart from minor shifts because of different pressures, force constants and reduced masses. At higher
pressures, heating carbon to ignition was more difficult because diamond formed, which greatly reduced the
absorption of the Nd:YAG radiation. The CO²⁺CO₃²⁻ could not be quenched to ambient pressure at ambient temperature;
it transformed to CO₂ below 2 GPa. When the CO²⁺CO₃²⁻ was compressed above 15 GPa at ambient temperature, the
sharp band near 1100 cm⁻¹ either disappeared or broadened. This change was reversible with pressure and was
attributed to either amorphization of CO²⁺CO₃²⁻ or
transitions to larger multimers.

An ionic dimer of carbon dioxide has a critical implication for understanding detonation chemistry of 
energetic organic molecules. This dimerization provides a reasonable explanation for the kink in shock 
compression Hugoniot of CO₂ [55], which, if correct, implies that the dimer forms on the time scale of shock
loading and detonation. Because water and some nitrogen oxides also become more ionic at these high
pressures, interactions and chemical reactivities of these ionic H-C-N-O species will differ from those of the
neutral species assumed, in most models, to be the major detonation products over a wide range of high
pressures and temperatures [56]. Furthermore, because ionic species are implicated in planetary 'ices' like
H₂O, CH₄ and NH₃ [57, 58, 59 and 60], the ionic dimer should have some bearing in understanding the
internal structure and magnetism of the Jovian planets [61]. 

Interest in AlN, GaN, InN and their alloys for device applications as blue light-emitting diodes and blue lasers
has recently opened up new areas of high-pressure synthesis. Near atmospheric pressure, GaN and InN are
unstable with respect to decomposition to the elements far below the temperatures where they might melt.
Thus, large boules of these materials, of the kind typically used to make semiconductor devices, cannot be
grown from the melt or annealed at temperatures approaching their melting points. Devices have been grown by
heteroepitaxial methods, depositing GaN on Al₂O₃ or SiC substrates, although high defect concentrations
because of mismatched lattice constants and thermal expansivities are a serious problem. Grzegory et al [62]
showed how to overcome this limitation for GaN by growing large crystals from N₂ dissolved in liquid Ga at
pressures up to 2 GPa. AlN, but not InN, has also been grown this way [63]. These relatively slow processes
produce excellent materials.

Wallace et al explored another approach, metathesis reactions. By igniting a mixture of GaI₃ and Li₃N
confined at pressures of the order of 4 GPa, they produced fine-crystalline GaN [63]. The thermodynamic
driving force for the process is the very negative enthalpy of forming LiI. With appropriate mixtures and
pressures, they produced CrN, Cr₂N, TaN and other nitrides by this metathesis route. The metathesis method,
like the direct reactions between elements that Yoo et al used to make c-BN, β-Si₃N₄, B₂O₃ and other
materials [64, 65], yields more crystalline products at higher confining pressure. Recently, Wallace [66]
devised combinations of reagents, chemical diluents and confinement so that GaN and InN crystals of similar
quality can be made at pressures as low as ambient.


B1.29.7 SPECTROSCOPY AT HIGH PRESSURES 

Almost every modern spectroscopic approach can be used to study matter at high pressures. Early
experiments include NMR [67]; ESR [68]; vibrational infrared [33] and Raman [69]; electronic absorption,
reflection and emission [23, 24 and 25, 70]; x-ray absorption [71] and scattering [72]; Mössbauer [73]; and
GC-MS analysis of products recovered from high-pressure photochemical reactions [74]. The literature contains
too many studies to do justice to these fields by describing particular examples in detail, and only some
general rules, appropriate to many situations, are given.

The frequencies of vibrational modes usually increase with increasing pressure because the corresponding
potential wells become narrower and the force constants increase. In wavenumber terms, these increases range
up to the order of 10 cm⁻¹ GPa⁻¹. A notable exception is the stretching mode of an O-H-O hydrogen bond.
Other instances of modes whose frequencies decrease with increasing pressure suggest molecular or lattice
instabilities that lead to phase transitions at higher pressures. Usually, the transition occurs before the
frequency of the mode reaches zero.

Most electronic valence transitions shift to longer wavelengths at higher pressures: that is, the gap between the
highest occupied orbital and lowest unoccupied orbital tends to decrease upon compression. The rates of shift
usually are larger (1) for pure materials than for solutes in a solvent and (2) for stronger (more allowed)
transitions. However, these correlations are not quantitative, and many transitions shift in the opposite
direction. The largest shifts are of the magnitude 0.1 eV GPa⁻¹. Many d-d bands of transition element
compounds vary linearly with the inverse fifth power of the metal-ligand distance.

New methods appear regularly. The principal challenges to the ingenuity of the spectroscopist are availability 
of appropriate radiation sources, absorption or distortion of the radiation by the windows and other 
components of the high-pressure cells, and small samples. Lasers and synchrotron radiation sources are 
especially valuable, and use of beryllium gaskets for diamond-anvil cells will open new applications. Impulse- 
stimulated Brillouin [75], coherent anti-Stokes Raman [76, 77], picosecond kinetics of shocked materials [78], 
visible circular and x-ray magnetic circular dichroism [79, 80] and x-ray emission [72] are but a few recent 
spectroscopic developments in static and dynamic high-pressure research. 

An especially interesting recent example is Benedetti et al's use of circular dichroism (CD) spectroscopy to
detect a pressure-induced change of the configuration at the metal centre of the octahedral chiral Λ- and Δ-tris
[cyclo O,O′ 1(R),2(R)-dimethylethylene dithiophosphato] chromium(III) [79]. The pressure medium was
Nujol®. To measure the CD spectrum, they had to overcome the birefringence of the strained diamond
windows of the high-pressure cell. They did this by recording and averaging spectra of the sample, and of a
blank cell filled with Nujol®, for each of four 90° rotations of the cell around the axis normal to the
windows. The measurements for the blank showed that the baseline obtained by this averaging procedure was
close to ideal, although a small further correction was required.


B2.1 Ultrafast spectroscopy

Warren F Beck 


B2.1.1 INTRODUCTION 

The development of the millisecond and microsecond flash photolysis experiments by George Porter and co- 
workers [1, 2] in the 1950s marks the true birth of time-resolved spectroscopy. Porter's work, which provided 
for the first time a way to capture the absorption spectrum of a short-lived kinetic intermediate in a 
photochemical reaction, helped to start a new era in physical chemistry, one that was focused on the 
mechanism and dynamics of chemical reactions. Owing to the subsequent development of mode-locked laser 
sources, beginning with the picosecond ruby and neodymium-glass lasers in the 1960s, the sub-picosecond 
passively mode-locked dye laser in the late 1970s and, most recently, the femtosecond self-mode-locked Ti- 
sapphire laser in the early part of this decade, the time resolution for spectroscopic measurements has 
advanced three orders of magnitude, from the 10 ps to the 10 fs regime [3]. It is now possible to conduct a 
wide variety of spectroscopies with ultrashort laser pulses of photons selectable over the entire spectral range 
from the x-ray region [4, 5] to the terahertz or far-infrared (IR) region [6, 7, 8, 9 and 10]. A variety of robust
methods have been developed to probe the time evolution of populations and coherences. The shortest time 
scale that is now routinely accessible is comparable to or shorter than the period of molecular vibrations, the 
fundamental time scale of chemistry. 

This chapter focuses on the primary experimental methods of ultrafast spectroscopy, as discussed in terms of 
studies on intramolecular dynamics in the condensed phase or in proteins. Ultrafast spectroscopy generally 
denotes spectroscopy that exploits the time resolution obtainable with mode-locked laser sources. The 
ultrafast regime encompasses electronic and vibrational energy transfer, charge transfer and structural 
dynamics involving isomerization and the breaking of bonds. In many cases, these processes can be optically 
triggered so that the time course can be studied with a delayed probing or gating pulse. The initial and delayed 
pulses are derived from a pulse train emitted by a single mode-locked light source; the ultrashort timing of the 
experiment is usually derived from the distance of flight of the optical pulses using a technique that is 
reminiscent of interferometry. After a discussion of the current ideas in producing and characterizing tunable 
ultrashort optical pulses for spectroscopy, the chapter discusses methods for time-resolved fluorescence 
spectroscopy, pump-probe methods for time-resolved absorption spectroscopy and multipulse photon-echo 
techniques for the measurement of coherence. The chapter closes with a brief discussion of the use of phase- 
controlled, multiple-pulse sequences for advanced, highly selective spectroscopies and for the control of 
chemical dynamics. 


B2.1.2 FEMTOSECOND LIGHT SOURCES 

The development of ultrafast spectroscopy has paralleled progress in the technical aspects of pulse formation 
[11]. Because mode-locked laser sources are tunable only with difficulty, until recently the most heavily 
studied physical and chemical systems were those that had strong electronic absorption spectra in the 
neighbourhood of conveniently produced wavelengths. 


As one important example, the introduction of the prism-controlled, colliding-pulse, mode-locked (CPM) dye 
laser [12, 13] led almost immediately to developments in measurement technique with pulses of less than 100
fs duration; Shank and co-workers used an amplified CPM laser [14] in their work with 6 fs pulses in 1987
[15]. Until recently, the pulses used in those experiments were the shortest optical pulses characterized. The
transition-state spectroscopy of Zewail and Bernstein [16, 17, 18, 19, 20, 21 and 22] exploited an amplified
CPM laser after frequency doubling and/or continuum generation. The chemical systems that were most easily 
studied, however, were those that could be stimulated either by the 620 nm output of the CPM directly or after 
frequency doubling to 310 nm. In addition, the CPM laser and its contemporary, more tunable alternative, the 
pulse-compressed, synchronously pumped dye laser [11], were tools that could be effectively used only by 
researchers with extensive backgrounds in lasers and optics. 

These limitations have recently been eliminated using solid-state sources of femtosecond pulses. Most of the 
femtosecond dye laser technology that was in wide use in the late 1980s [11] has been rendered obsolete by
three technical developments: the self-mode-locked Ti-sapphire oscillator [23, 24, 25, 26 and 27], the
chirped-pulse, solid-state amplifier (CPA) [28, 29, 30 and 31], and the non-collinearly pumped optical
parametric amplifier (OPA) [32, 33 and 34]. Moreover, although a number of investigators still construct
home-built systems with narrowly chosen capabilities, it is now possible to obtain versatile, nearly state-of-
the-art apparatus of the type described below from commercial sources. Just as NMR
spectrometers capable of multidimensional or solid-state spectroscopies were still being home built in the late
1970s and are now almost exclusively based on commercially prepared apparatus, it is reasonable to expect
that ultrafast spectroscopy in the next decade will be conducted almost exclusively with apparatus from 
commercial sources based around entirely solid-state systems. 

Figure B2.1.1 depicts an instrument that takes advantage of many of the most recent technical developments. 
The best strategy for generating wavelength-tunable ultrashort laser pulses for time-resolved spectroscopy 
involves use of an OPA as the only wavelength-tunable element. This approach is organized around the
principle that extremely stable, fixed wavelength, high-energy pulse trains can now be generated using an 
amplified Ti-sapphire-based system. The chief advantage of instruments like the one shown in figure B2.1.1 
is that experimental demands for specific operating wavelengths are met by adjustment of the last device in 
the pulse-forming chain, the OPA. One can expect such a design to be considerably more robust and user- 
friendly than systems based on wavelength tunable oscillators, which demand manipulation of every device in 
the instrument in response to tuning to a new wavelength. 

Figure B2.1.1 Femtosecond light source based on an amplified titanium-sapphire laser and an optical
parametric amplifier. Symbols used: P, Brewster dispersing prism; X, titanium-sapphire crystal; OC, output
coupler; B, acousto-optic pulse selector (Bragg cell); FR, Faraday rotator and polarizer assembly; DG,
diffraction grating; BBO, β-barium borate nonlinear crystal.

B2.1.2.1 OSCILLATORS 

The most commonly used femtosecond oscillator at this point is the self-mode-locked Ti-sapphire laser [23,
24, 25, 26 and 27] shown in figure B2.1.1, which can routinely produce pulses of light with durations
adjustable over the 10-150 fs range. The wavelength of the Ti-sapphire oscillator can be tuned over the
700-1100 nm range using an intracavity slit or birefringent filter, providing pulse durations that are essentially
limited by the bandwidth of the filtering element. (It should be emphasized, however, that tuning an oscillator 
of this type is not as routinely done as is tuning an OPA.) The pulse energy that can be directly obtained from 
the oscillator is typically limited to the 2 nJ/pulse regime, but the oscillator emits pulses at a high repetition 
rate, typically 75-100 MHz, depending on the cavity dimensions. 
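
The figures quoted above can be tied together with two elementary relations, sketched here under stated assumptions. The average power of a pulse train is the pulse energy multiplied by the repetition rate, and, for a linear (standing-wave) cavity containing a single circulating pulse, the repetition rate is c/2L, where L is the optical length of the cavity. The linear-cavity assumption and the function names are illustrative only and are not taken from a particular oscillator design.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def average_power_w(pulse_energy_j, rep_rate_hz):
    """Average power of a pulse train: energy per pulse times pulses per second."""
    return pulse_energy_j * rep_rate_hz


def linear_cavity_length_m(rep_rate_hz):
    """Optical length L of a linear cavity with one circulating pulse (f = c / 2L)."""
    return C_M_PER_S / (2.0 * rep_rate_hz)


# Values quoted in the text: about 2 nJ per pulse at 75-100 MHz.
for rep_rate in (75e6, 100e6):
    power = average_power_w(2e-9, rep_rate)
    length = linear_cavity_length_m(rep_rate)
    print(f"{rep_rate / 1e6:5.0f} MHz: {power * 1e3:5.0f} mW average, "
          f"cavity length about {length:.2f} m")
```

On these assumptions, the quoted repetition rates correspond to cavities between roughly 1.5 and 2 m in optical length, which is why the repetition rate depends on the cavity dimensions.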


The Ti-sapphire oscillator is extremely useful as a stand-alone source of femtosecond pulses in the near-IR 
region of the spectrum. Some ultrafast experiments, especially of the pump-probe variety (see below), can be 
conducted with pulses obtained directly from the oscillator or after pulse selection at a lower repetition rate. 
Far-IR (terahertz) radiation is usually generated using a semiconductor (usually GaAs) substrate and focused 
Ti-sapphire oscillator pulses [7]. If somewhat higher-energy pulses are required for an experiment, the Ti- 
sapphire oscillator can be cavity dumped by an intracavity acousto-optical device known as a Bragg cell,
producing perhaps 50 nJ pulses but at a reduced repetition rate, typically between 200 kHz and 1 MHz [35].
The pulses available from a cavity-dumped Ti-sapphire laser are intense enough to generate a continuum-like
source in a single-mode optical fibre. Wiersma and co-workers [36] generated 5 fs pulses by compressing
these continuum pulses with a sequence of gratings and prisms. Comparable oscillators have been described
that exploit other types of solid-state gain media. Although Ti-sapphire crystals are widely used because the
absorption spectrum overlaps favourably with the output spectra of argon-ion and frequency-doubled Nd-
YVO₄ continuous-wave lasers, Cr-LiSAF may be favoured in the future as a gain medium for femtosecond
oscillators because it can be pumped by continuous-wave GaAs diode lasers [37, 38].

B2.1.2.2 AMPLIFICATION 

Although many useful femtosecond spectroscopic experiments on condensed-phase targets can be easily 
performed with low-energy pulses, in the 100 pJ to 1 nJ regime, higher-energy pulses are required if 
wavelength tunability is desired. A femtosecond continuum [39] can be generated in water or sapphire if the
pulse energy is higher than 200 nJ; OPA sources require even higher energies, in excess of 1 μJ/pulse.
Amplification of Ti-sapphire oscillators is at this point routinely performed, with excellent commercial
systems readily available of the regenerative amplifier type [28, 30, 11, 40, 41], and there are simple
multipass amplifier designs [42, 43] that are easily constructed in the laboratory. 

Pulses are selected for amplification from the oscillator's 75-100 MHz pulse train at a much lower repetition 
rate, ranging in published designs from 10 Hz to 250 kHz, either by a Pockels cell or a Bragg cell (as shown
in figure B2.1.1). The selected pulse train is amplified in Ti-sapphire gain media using a method known as
chirped-pulse amplification [28, 29, 30 and 31]. In this scheme, oscillator pulses are stretched temporally well
into the picosecond regime prior to amplification so that the damage threshold for the gain crystal is not
exceeded [44]. If the amplifier is designed to operate in the >10 kHz regime, like the one depicted in figure
B2.1.1, a stretcher may not be required. A grating-pair pulse compressor [45] is used to compress the pulse
back nearly to its original duration after it emerges from the amplifier. Regenerative amplifiers capable of
producing 75-150 fs pulses are the most common systems in use [30, 40], but recently a multipass ring
amplifier has been described that produces 20 fs pulses [43]. The multipass amplifier depicted in figure B2.1.1 
is a non-ring design that permits a more facile input and extraction of the amplified pulse. 

The most common commercially prepared amplifier systems are pumped by frequency-doubled Nd-YAG or 
Nd-YLF lasers at a 1-5 kHz repetition rate; a continuously pumped amplifier that operates typically in the 250 
kHz regime has been described and implemented commercially [40]. The average power of all of the 
commonly used types of Ti-sapphire amplifier systems approaches 1 W, so the energy per pulse required for 
an experiment effectively determines the repetition rate. 
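
The trade-off stated above follows from dividing a fixed average power by the repetition rate. The short sketch below assumes an average power of exactly 1 W, which the text gives only as an approximate ceiling, and evaluates the energy per pulse at repetition rates mentioned in this section; the function name is illustrative.

```python
def pulse_energy_j(average_power_w, rep_rate_hz):
    """Energy per pulse available when a fixed average power is shared among the pulses."""
    return average_power_w / rep_rate_hz


AVERAGE_POWER_W = 1.0  # assumed; the text says the average power "approaches 1 W"

for rep_rate in (1e3, 5e3, 250e3):
    energy_uj = pulse_energy_j(AVERAGE_POWER_W, rep_rate) * 1e6
    print(f"{rep_rate / 1e3:6.0f} kHz -> about {energy_uj:7.1f} uJ per pulse")
```

Under this assumption, a 250 kHz amplifier still delivers a few microjoules per pulse, which is above the 1 μJ/pulse threshold quoted earlier for pumping an OPA, whereas a 1 kHz system offers pulse energies approaching 1 mJ.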

B2.1.2.3 OPTICAL PARAMETRIC AMPLIFICATION 

Perhaps the ultimate femtosecond light source, the OPA exploits a nonlinear parametric process to amplify a
portion of a femtosecond continuum [32, 33 and 34, 46, 47 and 48]. In most designs, a portion of the output of a
regenerative or multipass Ti-sapphire amplifier is frequency doubled in a nonlinear crystal to prepare an
intense pump pulse. Less than 1 μJ/pulse of the amplifier's output is reserved to seed a single-filament
continuum [47] in a thin sapphire crystal. A second nonlinear crystal is used as the gain medium for the
parametric process, which splits an input pump photon at frequency ω₃ into two output photons, a signal
photon at frequency ω₁ and an idler photon at frequency ω₂, with energy conserved (ω₃ = ω₁ + ω₂). The
parametric process is greatly enhanced by the presence of seed light at either ω₁ or ω₂, which is supplied by
the continuum [46]. The apparatus can be adjusted to select a certain range of frequencies from the continuum
for amplification in the nonlinear crystal, allowing the production of wavelength-tunable output pulses
derived either from the signal or idler with adjustable pulse durations. At this point, β-barium borate (BBO) is
the material of choice for the nonlinear crystal.
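
The energy-conservation condition fixes the idler wavelength once the pump and signal wavelengths are chosen, because photon energy is proportional to reciprocal wavelength: 1/λ(pump) = 1/λ(signal) + 1/λ(idler). The sketch below is a minimal illustration that assumes a 400 nm pump, as would be obtained by frequency doubling a Ti-sapphire amplifier operating near 800 nm; the function name and the chosen signal wavelengths are hypothetical.

```python
def idler_wavelength_nm(pump_nm, signal_nm):
    """Idler wavelength from photon energy conservation, 1/pump = 1/signal + 1/idler."""
    if signal_nm <= pump_nm:
        raise ValueError("the signal wavelength must be longer than the pump wavelength")
    return 1.0 / (1.0 / pump_nm - 1.0 / signal_nm)


PUMP_NM = 400.0  # assumed pump wavelength (second harmonic of about 800 nm)

for signal_nm in (500.0, 600.0, 700.0):
    idler_nm = idler_wavelength_nm(PUMP_NM, signal_nm)
    print(f"signal {signal_nm:.0f} nm -> idler {idler_nm:.0f} nm")
```

With this assumed pump, tuning the signal across the visible moves the idler through the near-IR, and the two coincide at 800 nm (the degenerate point).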

The OPA should not be confused with an optical parametric oscillator (OPO), a resonant-cavity parametric 
device that is synchronously pumped by a femtosecond, mode-locked oscillator. Pulses of 14 fs duration, tunable over
much of the visible regime, have been obtained by Hache and co-workers [49, 50] with a BBO OPO pumped
by a self-mode-locked Ti-sapphire oscillator. 

Shortly after the development of high-energy/pulse Ti-sapphire regenerative amplifier systems, a number of 
investigators reported progress in using OPAs in producing tunable sources of very short pulses. Wilson and 
co-workers [46] showed early on that an experimentally useful source for femtosecond spectroscopy with <50 
fs pulses was obtained through the use of continuum seeding of a type I nonlinear OPA crystal, which was 
pumped by the fundamental output of an amplified Ti-sapphire laser. The main problem with the early 
systems was inherent to the physics of collinearly pumped parametric amplification: the signal, idler and 
pump frequencies have different group velocities (see below) in the nonlinear crystal, which limits the amount 
and frequency bandwidth of the parametric gain. In other language, the phase-matching condition for the 
collinearly pumped OPA works only over a small bandwidth, which tends to limit the pulse duration to fairly 
long (100 fs) pulses, and tuning of the OPA to different signal wavelengths requires reoptimization of the 
crystal's orientation. The design advanced by Wilson's group takes advantage of the smaller mismatch in 
group velocities in the near-IR part of the spectrum; other designs employing pumping with the second 
harmonic of the Ti-sapphire laser provide direct access to visible signal pulses but with significantly longer 
durations (150 fs) [48]. 

Very recently, Hache and co-workers [49] found that a non-collinear pumping of an OPO crystal produces a 
phase-matching condition that is independent of signal wavelength over a very broad bandwidth. This 
discovery makes it possible to obtain very high parametric gain in an OPA with a single pass through the 
crystal and adjustable signal bandwidths. The result is a source of light with wide tunability and adjustable 
pulse durations, the ultimate femtosecond light source. The design for the OPA depicted in figure B2.1.1 is an 
adaptation of that recently described by Riedle and co-workers [32]. Light at 400 nm obtained by frequency
doubling the output of a regenerative Ti-sapphire amplifier is overlapped at an angle of 3.7° with the 
femtosecond continuum light. This angle produces in BBO the wavelength-independent phase-matching 
condition noted by Hache and co-workers. Riedle and co-workers demonstrated that signal pulses of 15-20 fs 
duration could be obtained from this system, with tunability over most of the 400-800 nm visible range. 
Using a similar approach, but with some changes in the details of producing the continuum seed light that 
were intended to produce as broad a signal bandwidth as possible, De Silvestri and co-workers [34]
subsequently showed that signal pulses as short as 7.2 fs in the visible regime could be produced. In
contemporaneous work, Kobayashi and co-workers [33] obtained sub-10 fs pulses from the visible signal and
near-IR idler from a comparable apparatus. These latter two results have nearly matched the legendary 
performance of the pulse-compressed CPM dye laser of Shank and co-workers [15]. 


A surprising aspect of the recent work on non-collinearly pumped OPA systems is that it appears that fairly 
long pump pulses, in the 150 fs regime, are to be preferred over shorter pulses if sub-10 fs signal pulses are
desired. Mature commercial regenerative amplifier designs operating over the 1-250 kHz repetition-rate range 
are already capable of producing these pulses. Thus, the experimental motivation for developing amplifier 
systems capable of producing very short pulses directly has vanished for spectroscopic applications; however, 
there are a number of important applications for short, high-energy pulses, such as driving the formation of 
femtosecond pulses of x-rays [5]. High-energy physics applications for short, amplified pulses have been 
reviewed recently by Mourou and co-workers [51].


B2.1.2.4 PULSE COMPRESSION 

Owing to its wide spectral bandwidth, a short optical pulse is distorted temporally by passing through the 
beamsplitters, lenses, filters, etc, that are required for an ultrafast spectroscopic experiment. The distortion 
arises from the dispersion of the speed of light in a medium as a function of wavelength, or group-velocity 
dispersion (GVD). The sweep of frequencies observed at a given spatial position is known as chirp (in 
analogy to the sound of a frequency-swept audio pulse) or group-delay dispersion (GDD). This phenomenon 
is usually described in terms of a Taylor series expansion of the phase φ of the optical pulse around the centre
frequency, ω₀ [11, 52]. The first derivative term represents the group delay, t(ω) = dφ/dω. The quadratic term,
corresponding to the linear sweep in the group delay with respect to wavelength, can be corrected by an
optical delay line with negative GDD (alternatively speaking, anomalous dispersion) [53], constructed from a
pair of diffraction gratings [45], a pair of dispersing prisms with Brewster angled surfaces [54] or, less often,
by a Gires-Tournois interferometer (GTI) [55, 56]. The amount of negative GDD is determined by the
distance of separation between the two prisms or gratings or between the two reflective films in the 
interferometer. 
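
The practical consequence of the quadratic phase term can be estimated with the standard result for an initially transform-limited Gaussian pulse: after acquiring a group-delay dispersion GDD, its full width at half maximum grows from τ to τ·sqrt(1 + (4 ln 2 · GDD/τ²)²). The sketch below applies this formula to a few millimetres of fused silica. The Gaussian pulse shape, the dispersion value of about 36 fs² per mm near 800 nm and the 5 mm thickness are assumptions chosen for illustration, not values taken from the text.

```python
import math


def broadened_fwhm_fs(fwhm_in_fs, gdd_fs2):
    """FWHM of an initially transform-limited Gaussian pulse after a given net GDD."""
    x = 4.0 * math.log(2.0) * gdd_fs2 / fwhm_in_fs ** 2
    return fwhm_in_fs * math.sqrt(1.0 + x * x)


GVD_FS2_PER_MM = 36.0  # assumed, approximate value for fused silica near 800 nm
THICKNESS_MM = 5.0     # assumed total glass path in the measurement optics
gdd = GVD_FS2_PER_MM * THICKNESS_MM

for fwhm in (10.0, 30.0, 100.0):
    out = broadened_fwhm_fs(fwhm, gdd)
    print(f"{fwhm:5.0f} fs in -> {out:6.1f} fs out after {gdd:.0f} fs^2")

# Only the net GDD matters: an equal negative GDD restores the input duration.
print(broadened_fwhm_fs(10.0, gdd - gdd))
```

Because the broadening depends on GDD/τ², the same optics that barely affect a 100 fs pulse stretch a 10 fs pulse several-fold, which is why a compensating delay line with negative GDD is needed for the shortest pulses.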

In most femtosecond experiments, a double-passed pair of prisms is inserted into the beam prior to its 
reaching the measurement apparatus. In figure B2.1.1, a pair of prisms is depicted after the OPA and after the
pulse-picker (for use in oscillator-only experiments). This practice allows one to precompensate for the GDD
imparted to the beam by the optics in the measurement apparatus so that the pulses are as short as possible 
when they arrive at the sample's position. The prisms are typically manipulated by translating one or both of 
the prisms normal to the base so as to increase or decrease the amount of prism glass traversed; this permits a 
small amount of positive GDD to be added or subtracted. The pulse duration at the sample's position is 
usually determined with an intensity autocorrelation measurement, performed by replacing the sample with a 
nonlinear crystal, such as potassium dihydrogen phosphate (KDP) or BBO. An even more significant problem 
associated with GDD is that the formation of a short pulse in an oscillator requires many round-trips through 
the gain medium, so it is routine to place a pair of prisms or a GTI in the oscillator's cavity to repair the 
damage suffered by the pulse on each trip. The design of Murnane and co-workers [25] depicted for the 
oscillator in figure B2.1.1 exemplifies this practice, which was first established in the final design for the 
CPM dye laser by Valdmanis and Fork [13]. In the future, oscillators may not require prism pairs; it is now 
possible to fabricate chirped mirrors that employ multiple layers of dielectric coatings to provide for a precise 
compensation of quadratic and cubic phase distortions that arise from the gain medium of a femtosecond 
oscillator [26]. In fact, a commercial design employing a chirped-mirror, Ti-sapphire oscillator and a solid-
state continuous-wave Nd-YVO₄ pump laser in a single, compact, sealed (no user controls!) enclosure has just
become available. Such a source would be expected to be unusually robust. 

The shortest optical pulses actually used so far (1998) in ultrafast spectroscopic experiments were obtained by 
Shank and co-workers from an amplified CPM laser [57]. In these extraordinary experiments, a sequence of a
pair of prisms and a pair of gratings was employed. The reason for this additional complexity was that the
cubic-term phase distortion imparted by the prism pair is of opposite sign to that imparted by the grating pair.
As a result, it was
possible to null simultaneously the quadratic and cubic dispersion terms at the position of the sample [15].
Most optical materials exhibit normal dispersion; the index of refraction increases monotonically with
respect to frequency. Some materials, however, exhibit an anomalous dispersion regime in the near-IR part of
the spectrum. Hochstrasser and co-workers exploited the finding that the mid-IR substrates CaF₂ and BaF₂,
used, for instance, for beamsplitters and windows, impart GDD to a transmitted beam that is of the opposite
sign to that imparted by Ge, which was used as a long-pass filter [58]. Fused silica optical fibres exhibit
anomalous dispersion at wavelengths above 1.3 μm [59, 60]. The implication of this phenomenon is that
positively chirped IR pulses can be compressed by being propagated in a fibre. Further, short IR pulses can be
propagated in a fibre for arbitrarily long distances without suffering a change in pulse duration [61, 62].


B2.1.3 FEMTOSECOND TIME-RESOLVED SPECTROSCOPY 

In lieu of electronic timing mechanisms, a form of interferometry is usually employed in ultrafast 
spectroscopy to obtain time resolution on the picosecond or shorter time scales. The simplest ultrafast 
measurement apparatus is that shown in figure B2.1.2, an autocorrelator. This instrument is used in various
forms to measure the duration of an ultrashort laser pulse [63]. A pulse of light is first split into equal portions
by a beamsplitter; the two parts are then recombined after they travel in the two arms of an interferometer
(here, derived from the Michelson design) so that they are focused onto a thin (100 μm or less in thickness)
nonlinear crystal. When the two pulses overlap temporally and spatially, the second harmonic is emitted along
the direction that conserves momentum, between the directions of the two emerging fundamental pulses [64].
By moving one of the retroreflectors in the interferometer along the beam path towards or away from the
beamsplitter, one of the pulses is made to scan temporally through the other pulse in the nonlinear crystal.
Thus, the ultrafast timing of the experiment is provided by a simple measurement of displacement and
knowledge of the speed of light. Reproducible displacements of the order of 1 μm, corresponding to a
time-of-flight displacement of 6.67 fs in the interferometer, are routinely performed with computer-controlled
linear actuators. 
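
The conversion between mirror displacement and time delay can be sketched as follows. The only assumptions are that the beam travels to the retroreflector and back, so that the optical path changes by twice the mirror displacement, and that the refractive index of air is taken as unity; the function names are illustrative.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def delay_fs(displacement_m):
    """Time delay produced by moving a retroreflector; the path changes by twice the move."""
    return 2.0 * displacement_m / C_M_PER_S * 1e15


def displacement_m(delay_in_fs):
    """Retroreflector displacement needed to produce a given time delay."""
    return delay_in_fs * 1e-15 * C_M_PER_S / 2.0


print(f"1 um of travel   -> {delay_fs(1e-6):.2f} fs")
print(f"1 ps of delay    -> {displacement_m(1000.0) * 1e6:.1f} um of travel")
```

The factor of two is what turns a 1 μm step into the 6.67 fs quoted above rather than 3.34 fs.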

Figure B2.1.2 Modified Michelson interferometer for non-collinear intensity autocorrelation. Symbols used:
r1, r2, retroreflecting mirror pair mounted on a translation stage; bs, beamsplitter; x, nonlinear crystal; pmt,
photomultiplier tube.


The intensity of this upconverted light detected by a photodetector is described by the equation

$$I(\tau) = \int_{-\infty}^{\infty} I(t - \tau)\, I(t)\, \mathrm{d}t,$$

where I(t) is the intensity of the pulse as a function of time t at a given point in
the nonlinear crystal, and τ is the shift in time of the variably delayed or gating pulse, as controlled by one of
the retroreflectors in the interferometer. The photodetector integrates the upconverted intensity for a given
delay τ; an example of the signal obtained by scanning τ is shown in figure B2.1.3(a) with the input pulse
train provided by a self-mode-locked Ti-sapphire laser. The symmetrical shape of the autocorrelation trace
arises from the convolution of the input I(t) pulse shapes. The true shape of I(t) and the dependence of the
phase on wavelength can be obtained in a slightly more elaborate experiment called frequency-resolved 
optical gating (FROG) [65, 66, 67 and 68], a form of which can be performed by recording the autocorrelation 
traces as a function of wavelength by dispersing the upconverted light in a spectrometer placed before the 
photodetector [69]. 
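
The autocorrelation integral can be evaluated numerically to see how the width of the measured trace relates to the width of the pulse itself. The sketch below assumes a sech² intensity profile with the 13.5 fs width quoted in the discussion of figure B2.1.3, and replaces the integral by a simple sum on a uniform time grid; the grid spacing, window and function names are illustrative choices.

```python
import math


def sech2_pulse(t_fs, fwhm_fs):
    """sech^2 intensity profile with the given full width at half maximum."""
    t0 = fwhm_fs / (2.0 * math.acosh(math.sqrt(2.0)))  # FWHM = 1.763 * t0
    return 1.0 / math.cosh(t_fs / t0) ** 2


def autocorrelation(intensity, dt, max_shift):
    """Discrete form of the integral of I(t - tau) I(t) dt for shifts -max_shift..max_shift."""
    n = len(intensity)
    trace = []
    for k in range(-max_shift, max_shift + 1):
        lo, hi = max(0, k), min(n, n + k)  # keep both indices inside the grid
        trace.append(dt * sum(intensity[i] * intensity[i - k] for i in range(lo, hi)))
    return trace


def fwhm(xs, ys):
    """Full width at half maximum of a single-peaked curve, by linear interpolation."""
    half = max(ys) / 2.0
    above = [i for i, y in enumerate(ys) if y >= half]
    first, last = above[0], above[-1]

    def crossing(i, j):
        return xs[i] + (half - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])

    return crossing(last, last + 1) - crossing(first - 1, first)


DT_FS = 0.25            # grid spacing (assumed)
HALF_WINDOW_FS = 100.0  # half-width of the time grid (assumed)
PULSE_FWHM_FS = 13.5    # pulse width quoted in the text

n_half = int(HALF_WINDOW_FS / DT_FS)
times = [i * DT_FS for i in range(-n_half, n_half + 1)]
pulse = [sech2_pulse(t, PULSE_FWHM_FS) for t in times]

max_shift = int(60.0 / DT_FS)
delays = [k * DT_FS for k in range(-max_shift, max_shift + 1)]
trace = autocorrelation(pulse, DT_FS, max_shift)

ratio = fwhm(delays, trace) / fwhm(times, pulse)
print(f"autocorrelation FWHM / pulse FWHM = {ratio:.3f}")
```

For a sech² pulse the ratio should come out close to the familiar deconvolution factor of about 1.54, which is consistent with an autocorrelation width of 21 fs for a 13.5 fs pulse. A different assumed pulse shape gives a different factor, which is one reason the intensity autocorrelation alone cannot establish the true shape of I(t).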

Figure B2.1.3 Output of a self-mode-locked titanium-sapphire oscillator: (a) non-collinear intensity
autocorrelation signal, obtained with a 100 μm β-barium borate nonlinear crystal; (b) intensity spectrum.

The spectrum of the femtosecond pulse does, however, provide some information on whether the input pulse is
chirped, which would cause the temporal width of I(t) to be broader than expected from the Heisenberg
indeterminacy relationship.


The full width at half maximum of the autocorrelation signal, 21 fs, corresponds to a pulse width of 13.5 fs if
a sech² shape for the I(t) function is assumed. The corresponding output spectrum shown in figure B2.1.3(b)
exhibits a width at half maximum of approximately 700 cm⁻¹. The time-bandwidth product Δτ Δν is close to
0.3. This result implies that the pulse was compressed nearly to the Heisenberg indeterminacy (or Fourier
transform) limit [51] by the double-passed prism pair placed in the beam path prior to the autocorrelator. 
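
The time-bandwidth product quoted above can be checked by converting the spectral width from wavenumbers to frequency (Δν = c × width in cm⁻¹) and multiplying by the pulse duration. The sketch below does this for the quoted values; the transform-limit constants of about 0.315 for a sech² pulse and 0.441 for a Gaussian pulse are standard figures added here for comparison, not values given in the text, and the function name is illustrative.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

SECH2_LIMIT = 0.315     # transform-limited sech^2 pulse (standard value, for comparison)
GAUSSIAN_LIMIT = 0.441  # transform-limited Gaussian pulse (standard value, for comparison)


def time_bandwidth_product(pulse_fwhm_fs, spectral_fwhm_cm1):
    """Product of the pulse duration (s) and the spectral width (Hz), both FWHM."""
    delta_nu_hz = spectral_fwhm_cm1 * C_CM_PER_S
    return pulse_fwhm_fs * 1e-15 * delta_nu_hz


product = time_bandwidth_product(13.5, 700.0)
print(f"time-bandwidth product = {product:.3f}")
print(f"sech^2 limit {SECH2_LIMIT}, Gaussian limit {GAUSSIAN_LIMIT}")
```

Since the 700 cm⁻¹ width is itself approximate, a product that falls near the sech² limit is best read as showing that little residual chirp remains, not as a precise determination of the pulse shape.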

The intensity autocorrelation measurement is comparable to all of the spectroscopic experiments discussed in 
the sections that follow because it exploits the use of a variably delayed, gating pulse in the measurement. In 
the autocorrelation experiment, the gating pulse is just a replica of the time-fixed pulse. In the spectroscopic 
experiments, the gating pulse is used to interrogate the populations and coherences established by the time- 
fixed pulse. 

B2.1.3.1 FLUORESCENCE UPCONVERSION SPECTROSCOPY 


Time-resolved fluorescence is perhaps the most direct experiment in the ultrafast spectroscopist's palette. 
Because only one laser pulse interacts with the sample, the method is essentially free of the problems with 
field-matter time orderings that arise in all of the subsequently discussed multipulse methods. The signal 
detected is usually directly proportional to the population in the resonantly prepared excited state alone. In 
systems that exhibit photochemistry, for instance, the time evolution of the fluorescence provides a direct 
view of the decay of the photochemically active state. 

Fluorescence spectroscopy can be performed in the ultrafast regime with a nonlinear crystal and a short gating 
pulse in an experiment known as fluorescence upconversion [70, 71]. The interferometer shown in figure 
B2.1.2 is modified so that an incident pulse is split into two pulses: a weaker pulse that is used to excite a 
sample and a stronger, gating pulse that is overlapped spatially in a nonlinear crystal with the fluorescence 
that is collected from the sample. Sum-frequency generation in the nonlinear crystal produces output photons 
with frequency ω₃ from the input gate photons of frequency ω₁ and fluorescence photons of frequency ω₂; 
since ω₃ = ω₁ + ω₂, the output photons are upconverted, or transferred to a higher frequency by the gate 
pulse. The intensity of the beam of output photons is proportional to the product of the intensity of the gating 
and fluorescence photons; the gate photons slice out only those fluorescence photons that are temporally 
overlapped with the short gating pulse in the nonlinear crystal. This permits the time course of the 
fluorescence intensity at a particular frequency ω₂ to be mapped out by scanning the time delay for the gate 
pulse. 
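
As a rough illustration of the sum-frequency relation, the sketch below converts the frequency sum into wavelengths. The 800 nm gate wavelength is an assumption (a typical Ti-sapphire value, not stated in the text), the fluorescence wavelengths are merely illustrative, and the function names are hypothetical.

```python
# Sum-frequency generation: omega_3 = omega_1 + omega_2, i.e.
# 1/lambda_3 = 1/lambda_1 + 1/lambda_2 in terms of vacuum wavelengths.

def sum_frequency_wavelength_nm(gate_nm: float, fluorescence_nm: float) -> float:
    """Wavelength of the upconverted photon for a given gate and fluorescence wavelength."""
    return 1.0 / (1.0 / gate_nm + 1.0 / fluorescence_nm)


def second_harmonic_wavelength_nm(gate_nm: float) -> float:
    """Wavelength of the gate second harmonic, the main background source."""
    return gate_nm / 2.0


GATE_NM = 800.0  # assumed gate wavelength, not taken from the chapter

for fluorescence_nm in (570.0, 650.0):  # illustrative detection wavelengths
    upconverted = sum_frequency_wavelength_nm(GATE_NM, fluorescence_nm)
    print(f"fluorescence {fluorescence_nm:.0f} nm -> upconverted {upconverted:.1f} nm")

print(f"gate second harmonic: {second_harmonic_wavelength_nm(GATE_NM):.0f} nm")
```

Under these assumed wavelengths the upconverted light falls in the near ultraviolet, not far from the gate second harmonic.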

The frequency ω₂ that is detected in the fluorescence can be largely selected by adjusting the phase-matching 
condition for the nonlinear crystal. A double monochromator (or, alternatively, a prism and a single-stage 
monochromator) is used to discriminate between the upconverted fluorescence and the background 
interference from the second harmonic of the gating light. Even though the nonlinear crystal is angle-tuned to 
optimize the intensity of the upconverted fluorescence at the wavelength chosen by the monochromator, the 
strong gate pulse generates a significant second-harmonic signal that can often be as strong as or stronger than 
the gated fluorescence. If the gate pulse is short, with a concomitantly large spectral bandwidth, the second- 
harmonic background may significantly overlap the fluorescence spectrum of the sample under study. In 
many respects this problem is comparable to that encountered in discriminating Rayleigh scattering from 
Raman scattering in the low Raman frequency regime. In Raman spectroscopy, however, the vibrational line 
shapes are usually much narrower than any fluorescence background; in fluorescence upconversion 
spectroscopy, the second-harmonic background from the gate pulse is often as broad as the fluorescence 
signal, making the discrimination correspondingly more difficult [71]. 

In figure B2.1.4 a design for a fluorescence upconversion spectrometer is depicted that is based on one 
discussed by Jimenez and Fleming [71]. In the most versatile instrument, an OPA would be used to generate 
the wavelength-tunable excitation pulse, while a portion of the direct output of the amplified Ti-sapphire laser 
that pumps the OPA would be reserved for use as a strong gate pulse. This practice has several advantages: 
the gating and excitation pulses are implicitly time-synchronized and the excitation pulse can be well removed 
in wavelength from the region of the fluorescence photon, in order to minimize the second-harmonic gate 
background mentioned above. The experimental set-up depicted in figure B2.1.4 employs two off-axis 
parabolic reflectors to collect and collimate the fluorescence emitted by the sample and then to focus it onto 
the nonlinear crystal. Other designs employ a single elliptical reflector to perform both tasks. The design 
shown here may permit low-temperature fluorescence studies to be executed, however, since the crystal and 
sample can be well separated on the optical table. Jimenez and Fleming [71] note that very few femtosecond 
fluorescence upconversion experiments have so far been attempted at low temperature. 

Figure B2.1.4 Fluorescence upconversion spectrometer based on the use of off-axis elliptical reflectors for 
the collection and focusing of fluorescence. Symbols used: e1, e2, off-axis elliptical reflectors; s, sample; x, 
nonlinear crystal. (After Jimenez and Fleming [71].) 

Over the last few years, the time resolution attainable in fluorescence upconversion has reached the sub-100 fs 
regime. The main problems associated with pushing the time resolution down further are sensitivity and GDD. 
The problem of collecting enough fluorescence photons and imaging them onto the nonlinear crystal, where 
detection occurs, is at odds with obtaining pulse-width-limited time resolution. Large elliptical or parabolic 
reflectors enhance the number of photons collected, but the time-of-flight dispersion increases with the 
surface area used on the reflector, so in many implementations the reflectors are masked. Lenses were used 
for collection and focusing in early experiments, as in the design discussed by Barbara and co-workers [70], 
but the time resolution obtainable is typically limited to the 250 fs regime owing to the GDD suffered by the 
fluorescence photons in being transmitted through the material used to make the lens. In principle, spherical 
confocal reflective optics might be used without time-of-flight dispersion. Even so, the nonlinear crystal itself 
imparts GDD and distorts the time response. 

The instrument response function (IRF) for the fluorescence upconversion experiment, then, cannot be shorter 
than the intensity cross-correlation function, which can be obtained using an instrument like that shown in 
figure B2.1.4 in which the excitation and gate pulses used in the upconversion experiment are injected 
separately into the two arms of the interferometer. In the simplest case, the actual time response of the 
fluorescence resembles the integral of the IRF, 

$$S(\tau) = \int_{-\infty}^{\infty} I(t - \tau)\,F(t)\,\mathrm{d}t.$$

The fluorescence signal F(τ) at the detection frequency ω₂ is, in the absence of other dynamics and rapid 
relaxation, a step function; the excitation pulse transfers a 
fraction of the ground-state population to the resonant excited state, which then emits fluorescence. As 
indicated in the above equation, the upconversion signal S(τ) responds as the gate pulse I(t) is integrated by 
the step-response of F(τ). Of course, if ground-state recovery or energy-transfer processes deplete the 
resonant excited-state population, then F(τ) decays accordingly. The upconversion signal S(τ) then exhibits a 
shape that is effectively a convolution of the IRF and the excited-state population response F(τ). Since many 
experiments are intended to determine F(τ) via measurement of S(τ), one is often faced with deconvolution of 
the IRF from the measured signal in the course of data analysis. This problem has been extensively discussed 
in the literature [72]; the best solution is, if possible, to make the IRF much shorter than the dynamics 
exhibited by F(τ) so that the measured signal is not significantly distorted. 
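
The effect of a finite instrument response on a step-like or decaying fluorescence signal can be illustrated numerically. The sketch below is a minimal example under stated assumptions: the IRF is taken to be a unit-area Gaussian of 100 fs full width at half maximum, the population response is a step (optionally with a single-exponential decay), and the integral is evaluated with a simple rectangle rule. None of these choices comes from the chapter, and all names are hypothetical.

```python
import math


def gaussian_irf(t_fs: float, fwhm_fs: float) -> float:
    """Unit-area Gaussian used here as an assumed instrument response."""
    sigma = fwhm_fs / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * (t_fs / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def population(t_fs: float, decay_fs=None) -> float:
    """Step-function population response, optionally decaying exponentially."""
    if t_fs < 0.0:
        return 0.0
    if decay_fs is None:
        return 1.0
    return math.exp(-t_fs / decay_fs)


def upconversion_signal(delay_fs: float, fwhm_fs: float, decay_fs=None,
                        step_fs: float = 1.0, span_fs: float = 2000.0) -> float:
    """Rectangle-rule estimate of S(tau) = integral of I(t - tau) F(t) dt."""
    n_points = int(2.0 * span_fs / step_fs)
    total = 0.0
    for i in range(n_points + 1):
        t = -span_fs + i * step_fs
        total += gaussian_irf(t - delay_fs, fwhm_fs) * population(t, decay_fs)
    return total * step_fs


for delay in (-200.0, -50.0, 0.0, 50.0, 200.0):
    step_like = upconversion_signal(delay, 100.0)
    decaying = upconversion_signal(delay, 100.0, decay_fs=500.0)
    print(f"delay {delay:7.1f} fs  step: {step_like:.3f}  decaying: {decaying:.3f}")
```

For the step response the signal rises through roughly one half at zero delay, with a rise time set by the assumed IRF width; this is the sense in which the measured rise resembles the integral of the IRF.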

An important extension to the simplest upconversion experiment at a single detection frequency ω₂ is the 
practice of measuring time-resolved fluorescence spectra, that is, the shape of the fluorescence spectrum 
emitted by the sample at a given gate delay τ, S(ω₂, τ). In most reported work, the S(ω₂, τ) spectrum is built up 
by obtaining a family of single-wavelength transients, scanned at a given ω₂ as a function of τ. If the 
instrument provides for computer control of the angle tuning of the nonlinear crystal and the detection 
monochromator, it would be more efficient, with respect to experimental time, to directly scan ω₂ at a given 
time delay τ. 

Perhaps the best example of a situation requiring knowledge of the time evolution of the fluorescence 
spectrum is that associated with dynamic solvation, the time-dependent reorganization of solvent dipoles in 
response to a light-induced change in the dipole moment of a dissolved chromophore [73, 74, 75, 76, 77 and 
78]. As an example, figure B2.1.5 shows two upconversion traces obtained with the dye phenoxazone 
dissolved in methanol at room temperature. When the observation wavelength is tuned to the blue edge of the 
emission spectrum, at 570 nm, the fluorescence is observed to decay with a time constant of several 
picoseconds. If the observation wavelength is tuned to the red edge of the emission spectrum at 650 nm, the 
fluorescence transient exhibits a rise with a similar time constant. These two upconversion transients evidence 
a blue-to-red shift of the time-resolved fluorescence spectrum owing to dynamic solvation on the picosecond 
time scale. 

Figure B2.1.5 Fluorescence upconversion traces obtained at two observation wavelengths (full circles, 570 
nm; open circles, 650 nm) at room temperature with an oxazine dye, phenoxazone, in methanol solvent. 
Figure courtesy of Professor S Rosenthal (Vanderbilt University). 

The upconversion method can also be applied to the measurement of anisotropy [79], which returns 
information on how the orientation of the excited-state transition-dipole moment evolves in time 
following excitation with a linearly polarized pulse of light. Anisotropy information can be 
used to study rotational diffusion [80, 81] in liquids, energy transfer between chromophores in proteins [79, 
82] and excited-state isomerization [83, 84], to name just three common applications. The method takes 
advantage of photoselection of those molecules whose transition-dipole moments are aligned with the 
excitation pulse's plane of polarization to prepare an essentially polarized excited-state orientational 
distribution initially. The fluorescence that is emitted initially by this distribution is highly polarized; as time 
progresses, any mechanism that causes rotation of the excited-state transition-dipole moment, such as 
rotational diffusion, energy transfer and isomerization, will cause depolarization of the fluorescence. 


The anisotropy function 

$$r(t) = \frac{I_{\parallel}(t) - I_{\perp}(t)}{I_{\parallel}(t) + 2 I_{\perp}(t)}$$

is determined by two polarized fluorescence transients I∥(t) and I⊥(t) observed parallel and perpendicular, 
respectively, to the plane of polarization of the excitation pulse. In the upconversion experiment, the two 
measurements are most conveniently made by rotating the plane of polarization of the excitation pulse with 
respect to the fixed orientation of the input plane of the nonlinear crystal. Because the photoselected angular 
distribution for a set of isolated transition dipoles exhibits a cos²θ dependence with respect to the angle 
between the plane of polarization of the excitation pulse and the observation plane, the anisotropy r(t) decays 
from an initial value of 0.4 as the depolarization proceeds. The time course of the anisotropy is essentially a 
measure of the time-correlation function ⟨P₂[μ(0)·μ(t)]⟩ that describes the memory of the initial, 
photoselected dipole orientation μ(0) as correlated with the probed dipole direction μ(t) as a function of 
time [79, 84]. 
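
The anisotropy is straightforward to evaluate from a pair of polarized transients. The sketch below is a minimal illustration with hypothetical function names; the intensity ratios used are not measured data but the ratios that follow algebraically from the definition for anisotropies of 0.4, 0 and 0.7.

```python
def anisotropy(i_parallel: float, i_perpendicular: float) -> float:
    """r = (I_par - I_perp) / (I_par + 2 * I_perp)."""
    total = i_parallel + 2.0 * i_perpendicular
    if total == 0.0:
        raise ValueError("total fluorescence intensity is zero")
    return (i_parallel - i_perpendicular) / total


def anisotropy_trace(parallel, perpendicular):
    """Point-by-point anisotropy for two transients sampled at the same delays."""
    return [anisotropy(a, b) for a, b in zip(parallel, perpendicular)]


# A 3:1 intensity ratio corresponds to the photoselection limit r = 0.4;
# equal intensities correspond to a fully depolarized signal, r = 0;
# an 8:1 ratio corresponds to r = 0.7.
print(anisotropy(3.0, 1.0))
print(anisotropy(1.0, 1.0))
print(anisotropy(8.0, 1.0))

# Synthetic (made-up) transients showing depolarization with time.
print(anisotropy_trace([3.0, 2.0, 1.5], [1.0, 1.2, 1.4]))
```

The denominator, the sum of the parallel intensity and twice the perpendicular intensity, is proportional to the total fluorescence intensity, so r(t) is insensitive to population decay alone.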

In certain situations involving coherently interacting pairs of transition dipoles, the initial fluorescence 
anisotropy value is expected to be larger than 0.4. As indicated by the theory described by Wynne and 
Hochstrasser [85, 86] and by Knox and Gülen [87, 88], the initial anisotropy expected for a pair of coupled 
dipoles oriented 90° apart, as an example, is 0.7, and the decay of the anisotropy from 0.7 to 0.4 is a 
measure of the time scale for the decay of the electronic coherence between the two states. This theory has 
been applied to the interpretation of the decay of anisotropy in the analogous anisotropy measurement 
obtained using pump-probe (see below) stimulated-emission measurements in magnesium 
tetraphenylporphyrin [89] and in exciton-coupled chromophore pairs in cyanobacterial light-harvesting 
proteins [90, 91]. 

B2.1.3.2 PUMP-PROBE SPECTROSCOPY 

An interferometric method was first used by Porter and Topp [1, 92] to perform a time-resolved absorption 
experiment with a Q-switched ruby laser in the 1960s. The nonlinear crystal in the autocorrelation apparatus 
shown in figure B2.1.2 is replaced by an absorbing sample, and then the transmission of the variably delayed 
pulse of light is measured as a function of the delay τ. This approach is known today as a pump-probe 
experiment; the first pulse to arrive at the sample transfers (pumps) molecules to an excited energy level and 
the delayed pulse probes the population (and, possibly, the coherence) so prepared as a function of time. 

The pump-probe concept can be extended, of course, to other methods for detection. Zewail and co-workers 
[16, 18, 19 and 20, 91] have used the probe pulse to drive population from a reactive state to a state that emits 
fluorescence [94, 95, 96, 97 and 98] or photodissociates, the latter situation allowing the use of mass 
spectrometry as a sensitive and selective detection method [99, 100]. 

Pump-probe absorption experiments on the femtosecond time scale generally fall into two effective types, 
depending on the duration and spectral width of the pump pulse. If the pump spectrum is significantly 
narrower in width than the electronic absorption line shape, transient hole-burning spectroscopy [101, 102, 
103, 104, 105, 106, 107, 108, 109, 110, 111, 112 and 113] can be performed. The second type of experiment, 
dynamic absorption spectroscopy [51, 114, 115, 116, 117, 118, 119, 120, 121 and 122], can be performed if 
the pump and probe pulses are short compared to the period of the vibrational modes that are coupled to the 
electronic transition. 

Figure B2.1.6 depicts a standard type of apparatus used for the hole-burning type of time-resolved absorption 
experiment [112, 113, 123]. A pulse train from an amplified laser is split into two portions. The minor portion 
is used directly as a source of pump photons; the major portion is used to generate a broad-band probe source 
derived from a femtosecond continuum. In this application, the continuum is typically generated by focusing a 
> 1 μJ pulse of light into flowing water or ethylene glycol in a cuvette [39]; a continuum with particularly 
good optical properties can be generated in a thin sapphire crystal [47]. After the pump and probe pulses are 
overlapped in the sample, the transmitted probe light is dispersed in a monochromator and then detected either 
by a photodiode or by a multichannel detector, such as a charge-coupled device (CCD). The most common 
detection scheme involves using a mechanical chopper to modulate the intensity of the pump beam; the pump- 
induced changes in the transmission of the probe beam are then detected by using a lock-in amplifier. 

Figure B2.1.6 Femtosecond spectrometer for transient hole-burning spectroscopy with a continuum probe. 
Symbols used: bs, 10% reflecting beamsplitter; p, polarizer. The continuum generator consists of a focusing 
lens, a cell containing flowing water or ethylene glycol or, alternatively, a sapphire crystal and a recollimating 
lens. 


As an example, a series of transient hole-burning spectra obtained with a chirp-compensated continuum probe 
with a light-harvesting protein is shown in figure B2.1.7 [112]. As the probe delay increases, the initially 
narrow transmission hole increases in width owing to vibrational redistribution and shifts about 500 cm⁻¹ to 
the red owing to dynamic solvation. Analysis of this series of spectra was made in terms of overlapping 
spectral contributions from ground-state depletion, stimulated emission, and excited-state absorption. The 
time resolution of the experiment is limited by the width of the pump-probe cross-correlation function, which 
can be conveniently determined in this case using either FROG or a wavelength-resolved optical Kerr 
measurement [124]. 

The main cost of this enhanced time resolution compared to fluorescence upconversion, however, is the 
aforementioned problem of time ordering of the photons that arrive from the pump and probe pulses. When 
the probe pulse either precedes or trails the arrival of the pump pulse by a time interval that is significantly 
longer than the pulse duration, the action of the probe and pump pulses on the populations resident in the 
various resonant states is unambiguous. When the pump and probe pulses temporally overlap in the sample, 
however, all possible time orderings of field-molecule interactions contribute to the response and complicate 
the interpretation. Double-sided Feynman diagrams, which provide a pictorial view of the density matrix's 
time evolution under the action of the laser pulses, can be used to determine the various contributions to the 
sample response [125]. 

The part of the response arising from a coherent interaction between the temporally overlapped probe and 
pump pulses is the so-called coherence spike [126, 127], which makes its appearance in the zero-delay region 
in figure B2.1.7 essentially confined to the spectral region coinciding with the pump-pulse spectrum. 
Accordingly, the overall pump-probe signal's temporal shape when viewed at a single probe wavelength is 
not just the integral of the convolution of the pump and probe pulse temporal shapes. The intensity of the 
coherence spike is strongly dependent on the duration of the laser pulses employed in the experiment and the 
time scale of dephasing [127, 128]. 

Figure B2.1.7 Transient hole-burned spectra obtained at room temperature with a tetrapyrrole-containing 
light-harvesting protein subunit, the α subunit of C-phycocyanin. Top: fluorescence and absorption spectra of 
the sample superimposed with the spectrum of the 80 fs pump pulses used in the experiment, which were 
obtained from an amplified CPM dye laser operating at 620 nm. Bottom: absorption-difference spectra 
obtained at a series of probe time delays. 

The first dynamic absorption studies that afforded a view of a molecular response stimulated by impulsive 
excitation with femtosecond laser pulses were performed by Shank and co-workers in 1988 with 6 fs pulses 
from a fibre-grating pulse-compressed CPM laser [52]. This experiment might be called a degenerate pump- 
probe experiment since the 6 fs pulses were used both for pumping and probing; the broad spectral bandwidth 
had been used previously as a short, chirp-free, continuum-like probe in time-resolved hole-burning 
spectroscopy [101]. Under the conditions of impulsive excitation, where the pulse duration is short relative to 
the period of the coupled vibrations, vibrational wavepackets [129, 130] are created on structurally displaced 
excited-state potential energy surfaces. The wavepackets move back and forth under the forces of the excited- 
state potential and cause modulation of the stimulated-emission contribution to the pump-probe signal. A 
second pump-field-molecular interaction causes a stimulated transition of some of the moving excited-state 
wavepacket back down to the ground state at a position that is displaced from the equilibrium 
molecular geometry, causing a ground-state wavepacket motion that corresponds to the signal detected in 
conventional resonance Raman spectroscopy [130, 131]. Accordingly, the modulation of the ground-state 
depletion signal owing to the wavepacket motion on the ground-state surface is termed resonant-impulsive 
stimulated-Raman scattering (RISRS) [132]. Champion and co-workers [133, 134 and 135] have made 
extensive use of RISRS in their studies of haem motions in response to ligand photodissociation. Nelson and 
co-workers [132] have used RISRS to look at collective motions in molecular crystals. 


Figure B2.1.8 shows dynamic absorption results obtained with an IR dye in solution. The experiment was 
conducted with 13 fs pulses from a pulse-picked Ti-sapphire laser and a rapid-scanning pump-probe 
interferometer [136]. The single-wavelength transient was obtained by dispersing the transmitted probe light 
in a monochromator and monitoring at a single narrow range of wavelengths. The transient exhibits a 
modulation signal arising from excited-state vibrational wavepacket motions that is sustained for at least a 
picosecond. Modulations of this type can be frequency analysed using either Fourier transformation or a 
linear-prediction, singular-value decomposition (LPSVD) method, as was done in figure B2.1.8(b). The 
LPSVD method [137] fits the modulation pattern to a series of damped cosinusoids. The representation shown 
in figure B2.1.8(b) uses the frequencies and damping (dephasing) times to construct a spectral representation 
that resembles a conventional Raman spectrum. The modulation spectrum shown in figure B2.1.8(b) 
evidences contributions from several vibrational modes over the 100-1200 cm⁻¹ range. The intensity of 
modulation for a given frequency is observed to depend strongly on the pulse duration; as the pulse duration is 
made shorter, the frequency window that provides impulsive excitation broadens. This windowing has been 
described by Lotshaw and McMorrow in their discussion of non-resonant optical Kerr effect studies, where 
impulsive excitation of intramolecular and intermolecular vibrational modes in neat liquids can be studied 
through the use of orthogonally polarized pump and probe beams and optically heterodyned detection 
methods [138, 139, 140, 141 and 142]. 
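
To make the windowing argument concrete, the vibrational period of each mode can be compared with the pulse duration. The sketch below is illustrative only: the 13 fs pulse duration and the 100-1200 cm⁻¹ range are taken from the text, the intermediate wavenumber is arbitrary, and the function name is hypothetical.

```python
C_CM_PER_FS = 2.99792458e-5  # speed of light in cm per femtosecond


def vibrational_period_fs(wavenumber_cm1: float) -> float:
    """Period of a vibration with the given wavenumber: T = 1 / (c * wavenumber)."""
    return 1.0 / (C_CM_PER_FS * wavenumber_cm1)


PULSE_FS = 13.0  # pulse duration quoted in the text

for wavenumber in (100.0, 400.0, 1200.0):
    period = vibrational_period_fs(wavenumber)
    ratio = PULSE_FS / period
    print(f"{wavenumber:6.0f} cm^-1: period {period:6.1f} fs, pulse/period = {ratio:.2f}")
```

The ratio of pulse duration to vibrational period grows with wavenumber, which is why shortening the pulse extends impulsive excitation to higher-frequency modes.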

Excited-state vibrational coherence of the type observed in figure B2.1.8 is potentially a very important tool 
for the elucidation of excited-state reaction dynamics. The best-known example of this type of work is 
that of Mathies, Shank and co-workers on the dynamics of rhodopsin [118, 143]. Elsaesser and co-workers 
[144] used a two-colour dynamic absorption technique to study ultrafast intramolecular proton transfer in a 
benzotriazole dye in solution. Wynne and Hochstrasser [145] observed vibrational coherence associated with 
charge transfer in a contact ion-pair in solution. Diffey et al [122] used a comparison of excited-state and 
RISRS wavepacket modulation patterns to study ultrafast charge transfer in a bacteriochlorophyll dimer 
system isolated from a purple-bacterial light-harvesting chromophore. Vos and co-workers [146, 147 and 148] 
have observed excited-state vibrational coherence in purple-bacterial reaction centres; Stanley and Boxer 
observed analogous signals using fluorescence upconversion [149]. 

So far we have exclusively discussed time-resolved absorption spectroscopy with visible femtosecond pulses. 
It has recently become feasible to perform time-resolved spectroscopy with femtosecond IR pulses. 
Hochstrasser and co-workers [58, 150, 151, 152, 153, 154, 155, 156 and 157] have worked out methods to 
employ IR pulses to monitor chemical reactions following electronic excitation by visible pump pulses; these 
methods were applied in work on the light-initiated charge-transfer reactions that occur in the photosynthetic 
reaction centre [156, 157] and on the excited-state isomerization of the retinal pigment in bacteriorhodopsin 
[155]. Walker and co-workers [158] have recently used femtosecond IR spectroscopy to study vibrational 
dynamics associated with intramolecular charge transfer; these studies are complementary to those performed 
by Barbara and co-workers [159, 160], in which ground-state RISRS wavepackets were monitored using a 
dynamic-absorption technique with visible pulses. 

Figure B2.1.8 Dynamic absorption trace obtained with the dye IR144 in methanol, showing oscillations 
arising from coherent wavepacket motion: (a) transient observed at 775 nm; (b) frequency analysis of the 
oscillations obtained using a linear-prediction, singular-value-decomposition method. 

In some extremely innovative recent experiments, Hochstrasser and co-workers [58] have described IR 
transient hole-burning experiments focused on characterizing inhomogeneous broadening in the amide I 
transition in several small polypeptides. IR pulses of 180 fs duration centred at 1650 cm⁻¹ were generated 
from a BBO OPA source through the use of difference-frequency mixing of the signal and idler pulses in an 
AgGaS₂ crystal. Spectrally narrower IR pulses of 1 ps duration were produced from the spectrally broad 
femtosecond IR pulses using a mechanically scannable Fabry-Perot etalon. The results were presented using 
a novel two-dimensional representation as shown in figure B2.1.9 in which the time-resolved hole-burned 
spectra were plotted against the centre frequency of the pump spectrum that burned the hole. The results 
provide information on the extent of delocalization of the amide vibrational wavefunction along the peptide 
backbone. 
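
The trade-off between pulse duration and spectral width in this experiment can be estimated from the transform limit. The sketch below assumes Gaussian, transform-limited pulses with a time-bandwidth product of 0.441; the chapter does not state the pulse shapes, so the resulting bandwidths are lower bounds under that assumption, and the function name is hypothetical.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s
GAUSSIAN_TBP = 0.441        # assumed transform limit for Gaussian pulses


def min_bandwidth_cm1(pulse_fwhm_fs: float, tbp: float = GAUSSIAN_TBP) -> float:
    """Smallest spectral FWHM (cm^-1) compatible with a given pulse FWHM (fs)."""
    delta_nu_hz = tbp / (pulse_fwhm_fs * 1e-15)
    return delta_nu_hz / C_CM_PER_S


for duration_fs in (180.0, 1000.0):  # pulse durations quoted in the text
    width = min_bandwidth_cm1(duration_fs)
    print(f"{duration_fs:6.0f} fs pulse: bandwidth of at least {width:.1f} cm^-1")
```

Under these assumptions the 180 fs pulses span several tens of wavenumbers, whereas the 1 ps pulses are narrow enough to burn a hole within a broadened band.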


[Figure B2.1.9: two-dimensional plot; axis label: Pump wavenumber (cm⁻¹).] 


Figure B2.1.9 Two-dimensional time-resolved IR hole-burning spectra obtained with two small polypeptides, 
apamin and scyllatoxin, by Hochstrasser and co-workers [58]. Figure courtesy Professor R M Hochstrasser 
(University of Pennsylvania). 

B2.1.3.3 PHOTON-ECHO AND TRANSIENT-GRATING SPECTROSCOPY 


The methods discussed so far, fluorescence upconversion, the various pump-probe spectroscopies, and the 
polarized variations for the measurement of anisotropy, are essentially conventional spectroscopies adapted to 
the femtosecond regime. At the simplest level of interpretation, the information content of these conventional 
time-resolved methods pertains to populations in resonantly prepared or probed states. As applied to chemical 
kinetics, for most slow reactions (on the ten picosecond and longer time scales), populations adequately 
specify the position of the reaction coordinate; intermediates and products show up as time-delayed spectral 
entities, and assignment of the transient spectra to chemical structures follows, in most cases, the same 
principles used in spectroscopic experiments performed with continuous wave or nanosecond pulsed lasers. 


The multiple-pulse methods discussed in this section, in contrast, can be used to obtain information on the 
time evolution of electronic or vibrational coherence, the correlation of phase between two states. In fast, sub- 
picosecond chemical reactions and in energy transfer, as examples, knowledge of the time evolution of 
coherence and population is essential if a correct physical model is to be established. For example, in the 
purple-bacterial photosynthetic reaction centre, the 3 ps time scale for the charge transfer from the 
bacteriochlorophyll dimer that serves as the primary electron donor to the pheophytin that serves as the 
electron acceptor is comparable to that of vibrational dephasing and just longer than the fast part of electronic 
dephasing [146, 149, 156, 161, 162, 163, 164, 165, 166 and 167]. It is certainly inadequate to describe this 
situation just in terms of populations and the states of the individual macrocycles that are involved in the 
charge-transfer reaction. A similar problem applies to the photophysics of retinal in the visual and proton- 
pumping proteins rhodopsin and bacteriorhodopsin. The light-initiated isomerization of retinal in these 
proteins occurs on the sub-picosecond time scale; it involves a time evolution from an excited electronic state 
to an isomerized ground state on a time scale that is shorter than that involved in vibrational dephasing [114, 
117, 118, 120, 143]. 

Photon-echo and transient-grating spectroscopy exploit, in general, time-ordered interactions between three or 
more optical pulses [125, 168]. The principles involved are analogous to those that are well established for 
pulsed experiments in nuclear and electron magnetic resonance [169]. Although the methods discussed below 
were applied first for the study of electronic coherence, as picosecond and femtosecond IR sources were 
developed a set of corresponding methods was applied to the direct study of vibrational coherence. Simple 
multiple-pulse sequences, with control over relative delays between pulses, can be formed using 
interferometers with three (or more) arms. Importantly, the pulses can be aimed spatially so that the 
phase-matched (momentum-conserving) outgoing directions for the signal (echo or diffracted) pulses are spatially 
resolved from the transmitted excitation pulses, affording zero-background detection [170]. 

Transient-grating spectroscopy is performed using two pump pulses that are arranged to arrive at the sample 
at the same time but aimed to overlap at an angle [171]. A transmission diffraction grating is formed owing to 
the spatial interference between the two pump pulses. The grating consists of alternating regions of excited- 
state molecules, formed in the bright regions where the two incoming beams interfere constructively, and of 
ground-state molecules, left undisturbed in the dark regions where the two beams interfere destructively. A 
third, probing laser pulse is diffracted by the population grating into a direction that is resolved from the pump 
directions, allowing the intensity of the population grating to be measured as a function of time. Any physical 
process that causes the spatial pattern of ground- and excited-state molecules to fade away or become less 
distinct will be detected in terms of a decrease in the diffracted beam's intensity. Thus, in addition to being 
sensitive to population decay, the transient-grating experiment 
can be used to detect spatial motion of molecules owing to transport or diffusion. For example, Miller and 
co-workers [172, 173 and 174] have employed transient-grating spectroscopy to study protein motions in haem 
proteins. Further, since the diffraction efficiency is sensitive to the orientation of the transition-dipole 
moments of the excited-state molecules, the transient-grating method can be used to characterize rotational 
diffusion. Fayer and co-workers have also exploited polarization gratings, formed by making the two 
coincident incoming beams orthogonally polarized [171]. 
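
The spatial period of the population grating follows from the standard two-beam interference relation Λ = λ / (2 sin(θ/2)), where θ is the full crossing angle between the two pump beams. The sketch below evaluates it for an illustrative case; the 620 nm wavelength and 5° crossing angle are assumed values chosen for the example, the refractive index of the sample is neglected, and the function name is hypothetical.

```python
import math


def fringe_spacing_nm(wavelength_nm: float, crossing_angle_deg: float) -> float:
    """Grating period for two beams crossing at the given full angle."""
    half_angle = math.radians(crossing_angle_deg) / 2.0
    return wavelength_nm / (2.0 * math.sin(half_angle))


# Assumed, illustrative geometry: 620 nm pump beams crossing at 5 degrees.
spacing = fringe_spacing_nm(620.0, 5.0)
print(f"fringe spacing: {spacing / 1000.0:.2f} micrometres")

# Counter-propagating beams give the smallest possible period, lambda / 2.
print(f"minimum spacing: {fringe_spacing_nm(620.0, 180.0):.0f} nm")
```

A smaller crossing angle gives a coarser grating, so diffusion or transport must cover a longer distance before the grating washes out.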

The simplest echo experiment, the two-pulse photon echo, reports information on the decay of coherence in 
terms of echo intensity. As in the corresponding spin-echo experiment in magnetic resonance, the first 
incident laser pulse rotates the Bloch vector [ 168 , 175 ] from pure population in the ground state to an 
orientation corresponding to a coherent superposition state; the density matrix now exhibits off-diagonal 
elements as well as diagonal elements. During the waiting period t between the first and second pulses, the 
off-diagonal (coherence) elements in the density matrix decay according to the dephasing time, T 2 , while the 
on-diagonal (population) elements decay according to the lifetime of the resonantly prepared state, T 1 . The
second laser pulse rotates the Bloch vectors again so that the phase of rotation of the Bloch vectors is inverted,
which leads to refocusing. The vectors are maximally refocused at an interval t after the second pulse, so that 
the spontaneous emission arising from the ensemble of molecules is radiated with spatial coherence, forming 
an echo pulse. Thus, the intensity of the echo as a function of the waiting time t can be described by an 
exponential decay with time constant T 2 [ 176 ]. If the two input pulses are directed along the directions k 1 and
k 2 , echo signal beams are emitted along the 2 k 1 - k 2 and 2 k 2 - k 1 directions. In practice, the input and echo
beams are recollimated by a single lens after they emerge from the sample; iris apertures are used to spatially
isolate the direction of either of the echo beams so that the emerging input beams are blocked. 

In liquids, the Bloch equations (single-dephasing time scale) picture [ 175 ] used above to describe the 
formation of echo signals is apparently inadequate. It is now known that electronic dephasing occurs over a 
distribution of time scales, so a single time constant T 2 is insufficient to describe all of the line-broadening
dynamics [ 177 ]. The two-pulse echo method described above is sensitive only to the fastest of processes; in
organic molecules in solution, the two-pulse echoes typically decay on the 20 fs or shorter time scale [ 176 ]. 
This is the time scale usually assigned to homogeneous line broadening. The slower electronic dephasing 
processes that contribute to inhomogeneous line broadening, involving solvent-induced fluctuations or 
radiationless decay between uncorrelated states, extend over the 10 fs to 100 ps (or longer) time scales in 
liquids and proteins [77, 78, 177 ].

A three-pulse or stimulated photon-echo experiment can be employed to characterize dephasing on a much 
longer time scale than is accessible to the two-pulse photon echo experiment. Figure B2.1.10 shows a three- 
pulse interferometer and beam-input geometry employed by Fleming and co-workers [ 170 , 178 , 179 , 180 and 
181 ]. The modified forward-box beam-input geometry allows three-pulse echoes (or grating signals) to be 
detected in the phase-matched k 1 - k 2 + k 3 and - k 1 + k 2 + k 3 directions. As in the Hahn stimulated spin-echo
sequence [ 169 ], for a given waiting time t between the first two pulses, a plot of the echo intensity as a
function of the time period T between the second and third pulses returns just the lifetime T 1 of the resonantly
prepared state. At a given delay T, a plot of the intensity as a function of t returns a decay related to the
dephasing time T 2 , but there is a subtle change in the shape of the intensity envelope that is discernible as T is
varied [ 170 , 179 , 182 , 183 , 184 and 185 ]. Figure B2.1.11 shows the results obtained from a three-pulse
photon-echo experiment on a small protein subunit that binds an extended tetrapyrrole chromophore [ 186 ]. At
early delays T, the shape is asymmetrical, with the maximum intensity shifted away from t = 0. As T is
increased, so that the ensemble evolves for longer time periods prior to rephasing by the third pulse, the 
envelope becomes more symmetrical, and the maximum shifts back to near t = 0. The asymmetrical shape 
observed at early 
delays T reports the presence of an echo, but the symmetrical signal observed at longer delays T arises from a
free-induction decay only. 
Figure B2.1.11 Stimulated photon-echo peak-shift (3PEPS) signals. Top: pulse sequence and interpulse
delays t and T. Bottom: echo signals scanned as a function of delay t at three different population periods T,
obtained with samples of a tetrapyrrole-containing light-harvesting protein subunit, the α subunit of C-phycocyanin.

This experiment is now known as a three-pulse stimulated photon-echo peak-shift (3PEPS) experiment. 
Weiner and Ippen [ 182 ] were the first to describe the echo-envelope-shifting phenomenon and its relationship 
to inhomogeneous line broadening; application of the 3PEPS method to problems of dynamic solvation has 
been popularized especially by the groups of Fleming [ 177 , 178 and 179 , 184 , 185 ] and of Wiersma, who has 
advanced gated versions of the experiment that actually time-resolve the echo in order to obtain additional
information [ 187 , 188 and 189 ]. The 3PEPS method returns, in general, superior information on solvation 
dynamics as compared to that returned by dynamic Stokes shift measurements by fluorescence upconversion 
or transient hole-burning spectroscopy because no line shape assumptions have to be made; in fact, a full 
analysis of the time-correlation and line-broadening functions obtained from the 3PEPS experiment can be 
used to obtain all pertinent spectroscopic observables, including the absorption and fluorescence line shapes. 
Owing to the mapping of coherence into population by the second pulse in the sequence, dephasing can be 
studied using the 3PEPS method over an enormous time scale, generally as long as the lifetime of the resonant 
state [ 177 ]. Figure B2.1.12 shows the entire 3PEPS profile obtained from a series of experiments on the sample
used for figure B2.1.11 conducted with a series of T delays. Several different time scales that contribute to
electronic dephasing are notable, corresponding to a very fast decay on the <20 fs time scale, a roughly 
exponential decay on the 100 fs time scale, and a slower decay to a long-lived offset over the 200 fs to 1 ps 
time scale [ 186 ]. The magnitude of the peak shift for each component is proportional to the strength of 
coupling of solvent fluctuations on a given time scale to the electronic dipole of the resonant electronic state 
that is used as a probe [ 177 ]. 
Figure B2.1.12. 3PEPS profile obtained at room temperature with samples of a tetrapyrrole-containing light-
harvesting protein subunit, the α subunit of C-phycocyanin, as used in the previous figure.

As implied above, comparable two- and three-pulse echo experiments can be conducted with femtosecond IR 
pulses in order to study vibrational dephasing. Notable work in this area has been conducted by Fayer and co- 
workers with picosecond IR pulses obtained from a free-electron laser at Stanford University [ 190 , 191 , 192 , 
193 , 194 and 195 ]. Vibrational states can also be prepared with visible femtosecond pulses through the use of 
stimulated Raman coherences. In this family of methods, two laser pulses with frequencies ω 1 and ω 2 act
simultaneously to transfer population to a vibrational level of frequency ω 1 - ω 2 . The chief advantage of
using pairs of visible pulses is that a wider frequency range becomes accessible owing to the availability of 
very short visible pulses; experiments conducted with IR pulses are limited by time-bandwidth considerations 
to lower frequencies. At present, however, the stimulated Raman methods have only been used non-
resonantly, so very intense femtosecond laser pulses are required, typically in the μJ/pulse regime. In contrast,
photon-echo experiments conducted with resonant electronic states are generally conducted with pulse 
energies in the low nJ regime. 

One important stimulated Raman method, known as the Raman echo experiment, is analogous to the three- 
pulse or stimulated photon-echo discussed above. Two pairs of visible pulses prepare and map vibrational 
coherences into population, respectively, and after a waiting period a fifth pulse is used to stimulate rephasing 
and echo formation. Berg and co-workers have used the Raman echo to probe the vibrational dephasing of 
rotating methyl groups in different solvent environments [ 196 , 197 ]. Tokmakoff, Fleming, and their co- 
workers [ 198 , 199 , 200 and 201 ] have exploited the intrinsic two-dimensionality of the Raman echo 
experiment to explore anharmonic coupling between intermolecular modes in liquid CS 2 ; each axis of the 
experiment returns information that is analogous to that available from conventional stimulated coherent 
Raman spectroscopy, but in the two-dimensional representation cross-peaks appear that directly report 
coupling between two vibrational modes. 
B2.1.3.4 COHERENT CONTROL AND FUTURE ULTRAFAST SPECTROSCOPIES


A number of investigators are now developing pulse-shaping and modulation techniques that are useful with 
ultrashort laser pulses. These methods will permit preparation of precisely timed and phased multipulse 
sequences of arbitrary complexity for use in nonlinear spectroscopy. In addition, rather than just exploiting 
pulse sequences to project coherences into echo intensities and time shifts for spectroscopic purposes, as in 
the methods discussed above, several investigators are devising pulse sequences to focus wavefunctions onto
potential surfaces in a non-statistical manner. This concept, known generally as coherent control [ 202 , 203
and 204 ], refers to attempts to control chemical reactions with specially constructed sequences of ultrashort 
laser pulses of known phase evolution and duration. 

Scherer et al [ 205 , 206 ] showed how to prepare, using interferometric methods, pairs of laser pulses with 
known relative phasing. These pulses were employed in experiments on vapour phase I 2 , in which 
wavepacket motion was detected in terms of fluorescence emission. A more general approach, which can be 
used in principle to generate pulse sequences of any type, is to transform a single input pulse into a shaped 
output profile, with the intensity and phase of the output under control throughout. The idea being exploited 
by a number of investigators, notably Warren and Nelson, is to use a programmable dispersive delay line 
constructed from a pair of diffraction gratings spaced by an active device that is used either to absorb or phase 
shift selectively the frequency-dispersed wavefront. The approach favoured by Warren and co-workers 
exploits a Bragg cell driven by a radio-frequency signal obtained from a frequency synthesizer and a 
computer-controlled arbitrary waveform generator [ 207 ]. Nelson and co-workers use a computer-controlled 
liquid-crystal pixel array as a mask [ 208 ]. In the future, it is likely that one or both of these approaches will
allow execution of currently impossible nonlinear spectroscopies with highly selective information content. 
One can take inspiration from the complex pulse sequences used in modern multiple-dimension NMR 
spectroscopy to suppress unwanted interfering resonances and to enhance selectively the resonances from 
targeted nuclei. 

A simple example of what is possible now, even with two-pulse sequences, is the work by Shank and co-
workers on focused RISRS wavepackets. Bardeen et al [ 209 ] used pump pulses deliberately prepared with a
linear, negative chirp to enhance the magnitude of wavepackets driven to the ground state by impulsive
stimulated Raman scattering. In the simplest application, this kind of approach might be used to make it easier 
to detect weakly displaced normal modes in dynamic absorption spectroscopy. In a mode-selective chemistry 
example, focusing of wavepackets [ 204 , 210 ] might be used to prepare a certain vibrational superposition 
state, which could be subsequently excited by another laser pulse to produce an enhanced product yield [ 211 ]. 
The goal of this kind of work is to drive chemical reactions in directions that are not normally accessible along
the usual kinetically and energetically controlled routes [ 204 ].
B2.2 Electron, ion and atom scattering

MR Flannery 


B2.2.1 INTRODUCTION 

This chapter deals with quantal and semiclassical theory of heavy-particle and electron-atom collisions. Basic 
and useful formulae for cross sections, rates and associated quantities are presented. A consistent description 
of the mathematics and vocabulary of scattering is provided. Topics covered include collisions, rate 
coefficients, quantal transition rates and cross sections, Born cross sections, quantal potential scattering, 
collisions between identical particles, quantal inelastic heavy-particle collisions, electron-atom inelastic 
collisions, semiclassical inelastic scattering and long-range interactions. 


B2.2.2 COLLISIONS 

B2.2.2.1 DIFFERENTIAL AND INTEGRAL CROSS SECTIONS 

A uniform monoenergetic beam of test or projectile particles A with number density $N_A$ and velocity $\mathbf{v}_A$ is
incident on a single field or target particle B of velocity $\mathbf{v}_B$. The direction of the relative velocity $\mathbf{v} = \mathbf{v}_A - \mathbf{v}_B$ is
along the Z-axis of a Cartesian XYZ frame of reference. The incident current (or intensity) is then $j_i = N_A v$,
which is the number of test particles crossing unit area normal to the beam in unit time. The differential cross
section for scattering of the test particles into unit solid angle $d\Omega = d(\cos\psi)\, d\phi$ about the direction $\mathbf{v}'(\psi, \phi)$ of
the final relative motion is

$$\frac{d\sigma(v; \psi, \phi)}{d\Omega} = \frac{\text{number of test particles scattered by one field particle into unit solid angle per unit time}}{\text{current } j_i \text{ of incident beam}}$$

The number of particles scattered per unit time by the field particle and detected per unit time is then

$$\frac{dN_d}{dt} = j_i\, \frac{d\sigma}{d\Omega}\, d\Omega$$

where the detector, located along the scattered direction $\mathbf{v}'(\psi, \phi)$, subtends a solid angle $d\Omega = dA/r^2$ at the
scattering centre and projects an area $dA = r^2\, d(\cos\psi)\, d\phi$ normal to the scattered beam. Thus $[d\sigma/d\Omega]\, d\Omega$ is
the cross-sectional area of the beam that is intercepted by one target particle and scattered into the solid angle
$dA/r^2$ of a cone with axis along $\mathbf{v}'(\psi, \phi)$ and vertical angle $d\psi$. In classical terms (figure B2.2.1), the number of
particles detected per second about direction $(\psi, \phi)$ is the number $N_A v\,(b\, db\, d\phi)$ of incident particles crossing
the initial areal element $b\, db\, d\phi$ per second.

Hence

$$\frac{d\sigma}{d\Omega} = \left| \frac{b\, db}{d(\cos\psi)} \right|$$

For an incident current flowing between two cylinders of radii $b$ and $b + db$, then $j_i\, 2\pi\, [d\sigma/d\Omega]\, d(\cos\psi)$ is the
number of particles scattered per second between the two cones of semivertical angles $\psi$, $\psi + d\psi$ (figure
B2.2.1).

Figure B2.2.1. Scattering of a beam with current $j_i = N_A v$ particles per unit area incident between two
cylinders of radii $b$ and $b + db$ by one particle at rest in the laboratory.

The integral cross section for scattering over all directions is

$$\sigma(v) = \int_{-1}^{+1} d(\cos\psi) \int_0^{2\pi} \frac{d\sigma(v; \psi, \phi)}{d\Omega}\, d\phi$$

The integral cross section is therefore the effective area presented by each field particle B for scattering of the
test particles A into all directions. The probability that the test particles are scattered into a given direction
$\mathbf{v}'(\psi, \phi)$ is the ratio

$$\frac{1}{\sigma(v)}\, \frac{d\sigma(v; \psi, \phi)}{d\Omega}$$

of the differential-to-integral cross sections.
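
As a numerical illustration of the classical relation between impact parameter and differential cross section, the sketch below assumes rigid-sphere scattering, for which $b = d\cos(\psi/2)$ with contact diameter $d$. That model and all function names are assumptions introduced here for illustration. The code evaluates $|b\,db/d(\cos\psi)|$ by finite differences and integrates over all directions, which should return approximately $\pi d^2$.

```python
import math

def impact_parameter(psi, d):
    """Rigid-sphere relation b(psi) = d*cos(psi/2) (assumed model)."""
    return d * math.cos(psi / 2.0)

def differential_cross_section(psi, d, h=1e-6):
    """dsigma/dOmega = |b db/d(cos psi)|, central difference in cos(psi).

    Valid for angles whose cosine lies at least h away from -1 and +1.
    """
    c = math.cos(psi)
    b_plus = impact_parameter(math.acos(c + h), d)
    b_minus = impact_parameter(math.acos(c - h), d)
    db_dcos = (b_plus - b_minus) / (2.0 * h)
    return abs(impact_parameter(psi, d) * db_dcos)

def integral_cross_section(d, n=2000):
    """sigma = 2*pi * integral of dsigma/dOmega over cos(psi) in (-1, 1)."""
    step = 2.0 / n
    total = 0.0
    for i in range(n):
        c = -1.0 + (i + 0.5) * step  # midpoint rule avoids the end points
        total += differential_cross_section(math.acos(c), d) * step
    return 2.0 * math.pi * total

d = 2.0  # hypothetical sphere diameter, arbitrary length units
print(differential_cross_section(1.0, d), d * d / 4.0)
print(integral_cross_section(d), math.pi * d * d)
```

For rigid spheres the differential cross section is isotropic, so the probability of scattering into any direction is simply $1/4\pi$ per unit solid angle.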
B2.2.2.2 COLLISION RATES, COLLISION FREQUENCY AND PATH LENGTH

An electron or atomic beam of (projectile or test) particles A with density $N_A$ of particles per cm$^3$ travels with
speed $v$ and energy $E$ through an infinitesimal thickness $dx$ of (target or field) gas particles B at rest with
density $N_B$ particles per cm$^3$. The particles are scattered out of the beam by A-B collisions with integral cross
section $\sigma(E)$ at a rate (cm$^{-3}$ s$^{-1}$) given by the total number of collisions between A and B particles

$$\frac{dN_A(E)}{dt} = -[N_A(E)\, v\, \sigma(E)]\, N_B = -\nu_B(E)\, N_A(E)$$

in unit time and unit volume. The microscopic rate coefficient (cm$^3$ s$^{-1}$) for the scattering of one test particle
by one field particle is $k(E) = v\sigma(E)$. The frequency (s$^{-1}$) of collision between one test particle and $N_B$ field
particles (cm$^{-3}$) is $\nu_B = k(E) N_B$. Since $v = dx/dt$, the variation with $x$ of intensity $j_i$ of the attenuated beam is
governed by

$$\frac{dj_i}{dx} = -[\sigma(E)\, N_B(x)]\, j_i(x).$$

For constant density $N_B$ and speed $v$, the solution is

$$j_i(E, x) = j_i(E, 0) \exp[-N_B \sigma(E)\, x] = j_i(E, 0) \exp(-x/\lambda)$$

where $\lambda = 1/N_B\sigma(E) = v/\nu_B$ is the path length between collisions. Since $j_i = N_A v$, the density $N_A(E, x)$ obeys a
similar equation. These equations describe the attenuation of a particle beam A travelling through a target gas
B. For target gas particles with a distribution $f_B(\mathbf{v}_B)\, d\mathbf{v}_B$ in velocities $\mathbf{v}_B$, the microscopic rate then becomes

$$k(E) = \int |\mathbf{v}_A - \mathbf{v}_B|\, \sigma(|\mathbf{v}_A - \mathbf{v}_B|)\, f_B(\mathbf{v}_B)\, d\mathbf{v}_B$$

where $E = \frac{1}{2} M_A v_A^2$ is the kinetic energy of the projectile beam. For an isothermal beam with an energy
distribution $f_A(E)\, dE$ at temperature $T$, the macroscopic rate coefficient (cm$^3$ s$^{-1}$) or thermal rate constant is

$$k(T) = \int k(E)\, f_A(E)\, dE.$$
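
The attenuation law, path length and collision frequency defined above can be sketched numerically as follows. The numerical values are hypothetical and chosen only to give convenient magnitudes in CGS units; the function names are illustrative.

```python
import math

def mean_free_path(n_b, sigma):
    """lambda = 1/(N_B*sigma): path length between collisions (cm)."""
    return 1.0 / (n_b * sigma)

def collision_frequency(n_b, sigma, v):
    """nu_B = k(E)*N_B with microscopic rate coefficient k(E) = v*sigma (s^-1)."""
    return v * sigma * n_b

def transmitted_fraction(x, n_b, sigma):
    """j(E, x)/j(E, 0) = exp(-N_B*sigma*x) for constant N_B and speed."""
    return math.exp(-n_b * sigma * x)

# Hypothetical values, CGS units
n_b = 3.0e16     # target gas density, cm^-3
sigma = 1.0e-15  # integral cross section, cm^2
v = 1.0e5        # beam speed, cm/s

lam = mean_free_path(n_b, sigma)
nu = collision_frequency(n_b, sigma, v)
print(f"path length between collisions: {lam:.4g} cm")
print(f"collision frequency:            {nu:.4g} s^-1")
print(f"fraction transmitted after 1 cm: {transmitted_fraction(1.0, n_b, sigma):.4g}")
```

After one path length the beam intensity has fallen to $1/e$ of its initial value, and the two expressions for the path length, $1/N_B\sigma$ and $v/\nu_B$, agree.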

B2.2.2.3 ENERGY AND ANGULAR MOMENTUM: CENTRE OF MASS AND RELATIVE VELOCITY 

The velocity of the centre of mass (CM) of the projectile and target particles of respective masses $M_A$ and $M_B$
is

$$\mathbf{V} = (M_A \mathbf{v}_A + M_B \mathbf{v}_B)/(M_A + M_B).$$

The relative velocity is

$$\mathbf{v} = \mathbf{v}_A - \mathbf{v}_B.$$

The velocities of A and B in terms of $\mathbf{V}$ and $\mathbf{v}$ are

$$\mathbf{v}_A = \mathbf{V} + \frac{M_B}{M_A + M_B}\, \mathbf{v}, \qquad \mathbf{v}_B = \mathbf{V} - \frac{M_A}{M_A + M_B}\, \mathbf{v}.$$

The total kinetic energy then decomposes into the sum

$$\tfrac{1}{2} M_A v_A^2 + \tfrac{1}{2} M_B v_B^2 = E_{CM} + E_{rel}$$

of the energy $E_{CM} = \frac{1}{2} M V^2$ of the CM with mass $M = (M_A + M_B)$, and the energy $E_{rel} = \frac{1}{2} M_{AB} v^2$ of relative
motion, where the reduced mass $M_{AB}$ is $M_A M_B/(M_A + M_B)$. Let $\mathbf{R}$ be the position of the CM relative to a
fixed origin O and $\mathbf{r}$ be the inter-particle separation. The total angular momentum about O similarly
decomposes into the sum

$$\mathbf{L} = \mathbf{R} \times M\mathbf{V} + \mathbf{r} \times M_{AB}\mathbf{v}$$

of angular momenta of the CM and of relative motion. In the absence of any external field,
the energy $E_{CM}$ and angular momentum $\mathbf{L}_{CM}$ of the CM are always conserved for all types of collision. The
two species, A and B, may be electrons, ions, atoms or molecules, with or without any internal structure and
may therefore possess internal energy and angular momentum which must be taken into account. For
structured particles $E_{rel}$ and $\mathbf{L}_{rel}$ can change in a collision.
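
The separation of the kinetic energy into CM and relative parts can be checked numerically. The sketch below uses arbitrary, hypothetical masses and velocities and plain tuples for vectors; the helper names are illustrative.

```python
def dot(u, w):
    """Scalar product of two 3-vectors given as tuples."""
    return sum(a * b for a, b in zip(u, w))

def cm_decomposition(m_a, v_a, m_b, v_b):
    """Return (V, v, E_CM, E_rel) for two particles with velocities v_a, v_b."""
    total_mass = m_a + m_b
    reduced_mass = m_a * m_b / total_mass
    v_cm = tuple((m_a * a + m_b * b) / total_mass for a, b in zip(v_a, v_b))
    v_rel = tuple(a - b for a, b in zip(v_a, v_b))
    e_cm = 0.5 * total_mass * dot(v_cm, v_cm)
    e_rel = 0.5 * reduced_mass * dot(v_rel, v_rel)
    return v_cm, v_rel, e_cm, e_rel

# Hypothetical masses and velocities in arbitrary consistent units
m_a, m_b = 4.0, 40.0
v_a, v_b = (1.0, 2.0, -0.5), (-0.3, 0.0, 0.8)

v_cm, v_rel, e_cm, e_rel = cm_decomposition(m_a, v_a, m_b, v_b)
e_total = 0.5 * m_a * dot(v_a, v_a) + 0.5 * m_b * dot(v_b, v_b)
print("total kinetic energy:", e_total)
print("E_CM + E_rel:        ", e_cm + e_rel)
```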

B2.2.2.4 ELASTIC SCATTERING 

$$A(\alpha) + B(\beta) \rightarrow A(\alpha) + B(\beta)$$

Elastic scattering involves no permanent changes in the internal structures (states $\alpha$ and $\beta$) of A and B. Both
the energy $E_{rel}$ and angular momentum $\mathbf{L}_{rel}(AB)$ of relative motion are therefore conserved.


B2.2.2.5 INELASTIC SCATTERING 


Inelastic scattering produces a permanent change in the internal energy and angular momentum state of one or
both structured collision partners A and B, which retain their original identity after the collision. For inelastic
$i = (\alpha, \beta) \rightarrow f = (\alpha', \beta')$ collisional transitions, the energy $E_{i,f} = \frac{1}{2} M_{AB} v_{i,f}^2$ of relative motion, before ($i$) and after
($f$) the collision satisfies the energy conservation condition,

$$E_i + \varepsilon_\alpha(A) + \varepsilon_\beta(B) = E_f + \varepsilon_{\alpha'}(A) + \varepsilon_{\beta'}(B)$$

where $\varepsilon_{A,B}$ are the internal energies of A and B. The maximum amount of kinetic energy that can be
transferred to internal energy is limited to the initial kinetic energy of relative motion, $E_{rel}(AB) = \frac{1}{2} M_{AB} v_i^2$.
Excitation implies $\varepsilon_i \equiv \varepsilon_\alpha(A) + \varepsilon_\beta(B) < \varepsilon_{\alpha'}(A) + \varepsilon_{\beta'}(B) \equiv \varepsilon_f$, de-excitation (or superelastic) implies $\varepsilon_i > \varepsilon_f$
and energy resonance or excitation transfer implies $\varepsilon_i = \varepsilon_f$. Changes in angular momentum are limited by the
conservation requirement that

$$\mathbf{L}_{rel}(i) + \mathbf{L}_\alpha(A) + \mathbf{L}_\beta(B) = \mathbf{L}_{rel}(f) + \mathbf{L}_{\alpha'}(A) + \mathbf{L}_{\beta'}(B)$$

where $\mathbf{L}_{\alpha,\beta}$ denotes the internal angular momentum of each isolated species. Collisions in which only angular
momentum is transferred without any energy change are called quasi-elastic collisions.
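
Because only the kinetic energy of relative motion is available for excitation, a projectile striking a stationary target needs more than the excitation energy itself. The sketch below assumes the target B is at rest, so that $E_{rel} = E_{lab} M_B/(M_A + M_B)$. The function names and the example numbers are hypothetical.

```python
def relative_energy(e_lab, m_a, m_b):
    """E_rel = (1/2) M_AB v_A^2 = E_lab * M_B/(M_A + M_B), target B at rest."""
    return e_lab * m_b / (m_a + m_b)

def lab_threshold_energy(excitation_energy, m_a, m_b):
    """Smallest projectile energy at which E_rel equals the internal-energy gain."""
    return excitation_energy * (m_a + m_b) / m_b

# Hypothetical example: a 10.2 eV excitation, projectile four times heavier than target
e_thr = lab_threshold_energy(10.2, 4.0, 1.0)
print(f"projectile threshold energy: {e_thr:.2f} eV")
print(f"relative energy at threshold: {relative_energy(e_thr, 4.0, 1.0):.2f} eV")
```

For a light projectile such as an electron striking an atom, $M_B/(M_A + M_B)$ is close to unity and the threshold is essentially the excitation energy.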

B2.2.2.6 REACTIVE SCATTERING 

$$A + B \rightarrow C + D$$

Reactive scattering or a chemical reaction is characterized by a rearrangement of the component particles
within the collision system, thereby resulting in a change of the physical and chemical identity of the original
collision reactants A + B into different collision products C + D. Total mass is conserved. The reaction is
exothermic when $E_{rel}(CD) > E_{rel}(AB)$ and is endothermic when $E_{rel}(CD) < E_{rel}(AB)$. A threshold energy is
required for the endothermic reaction.

B2.2.2.7 CENTRE-OF-MASS TO LABORATORY CROSS SECTION CONVERSION 

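Theorists calculate cross sections in the CM frame while experimentalists usually measure cross sections in
the laboratory frame of reference. The laboratory (Lab) system is the coordinate frame in which the target
particle B is at rest before the collision, i.e. $\mathbf{v}_B = 0$. The centre of mass (CM) system (or barycentric system) is
the coordinate frame in which the CM is at rest, i.e. $\mathbf{V} = 0$. Since each scattering of projectile A into $(\psi, \phi)$ is
accompanied by a recoil of target B into $(\pi - \psi, \phi + \pi)$ in the CM frame, the cross sections for scattering of A
and B are related by

$$\left[ \frac{d\sigma_A(\psi, \phi)}{d\Omega} \right]_{CM} = \left[ \frac{d\sigma_B(\pi - \psi, \phi + \pi)}{d\Omega} \right]_{CM}$$

In the Lab frame, the projectile is scattered through angle $\theta_A$ and the target, originally at rest, recoils through angle $\theta_B$.
The number of particles scattered into each solid angle in each frame remains the same, the relative speed $v$ is
now $v_A$ and $j_i = N_A v$ in each frame. Hence

$$\left[ \frac{d\sigma_A(\theta_A, \phi)}{d\Omega_A} \right]_{Lab} = \left| \frac{d(\cos\psi)}{d(\cos\theta_A)} \right| \left[ \frac{d\sigma(\psi, \phi)}{d\Omega} \right]_{CM}, \qquad \left[ \frac{d\sigma_B(\theta_B, \phi)}{d\Omega_B} \right]_{Lab} = \left| \frac{d(\cos\psi)}{d(\cos\theta_B)} \right| \left[ \frac{d\sigma(\psi, \phi)}{d\Omega} \right]_{CM}$$
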

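(A) TWO-BODY ELASTIC SCATTERING

$$A(\alpha) + B(\beta) \rightarrow A(\alpha) + B(\beta)$$

The scattering and recoil angles $\theta_A$ and $\theta_B$ in the Lab frame are related to the CM scattering angle $\psi$ by

$$\tan\theta_A = \frac{\sin\psi}{\gamma + \cos\psi}, \qquad \theta_B = \tfrac{1}{2}(\pi - \psi), \qquad \gamma = M_A/M_B, \qquad 0 \le \theta_B \le \tfrac{1}{2}\pi.$$

The elastic cross sections for scattering and recoil in the Lab frame are related to the cross section in the CM
frame by

$$\left[ \frac{d\sigma_A(\theta_A, \phi)}{d\Omega_A} \right]_{Lab} = \frac{(1 + \gamma^2 + 2\gamma\cos\psi)^{3/2}}{|1 + \gamma\cos\psi|} \left[ \frac{d\sigma(\psi, \phi)}{d\Omega} \right]_{CM}$$

$$\left[ \frac{d\sigma_B(\theta_B, \phi)}{d\Omega_B} \right]_{Lab} = \left| 4\sin\tfrac{1}{2}\psi \right| \left[ \frac{d\sigma(\psi, \phi)}{d\Omega} \right]_{CM}$$
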

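(B) TWO-BODY INELASTIC OR REACTIVE SCATTERING PROCESS A + B → C + D

The energies $E_i$ and $E_f$ of relative motion of A and B and of C and D, respectively, satisfy $E_f/E_i = 1 - \varepsilon_{fi}/E_i$,
where $\varepsilon_{fi} = \varepsilon_f - \varepsilon_i$ is the increase in internal energy. The scattering and recoil angles are

$$\tan\theta_C = \frac{\sin\psi}{\gamma_C + \cos\psi}, \qquad \tan\theta_D = \frac{\sin\psi}{\gamma_D - \cos\psi}$$

where, from conservation of momentum and energy with the target B initially at rest,

$$\gamma_C = \left[ \frac{M_A M_C}{M_B M_D}\, \frac{E_i}{E_f} \right]^{1/2}, \qquad \gamma_D = \left[ \frac{M_A M_D}{M_B M_C}\, \frac{E_i}{E_f} \right]^{1/2}.$$

The Lab and CM cross sections are then related by

$$\left[ \frac{d\sigma_j(\theta_j, \phi)}{d\Omega_j} \right]_{Lab} = \frac{(1 + \gamma_j^2 + 2\gamma_j\cos\psi_j)^{3/2}}{|1 + \gamma_j\cos\psi_j|} \left[ \frac{d\sigma(\psi, \phi)}{d\Omega} \right]_{CM}$$

where $j$ denotes C or D, with $\psi_C = \psi$ and $\psi_D = \pi - \psi$. The scattering of a beam from a stationary target is governed by these equations. A
crossed beam experiment in which two beams intersect at an angle is not in the Lab frame. In this case the
measured quantities can be similarly transformed [1] to CM for comparison with theoretical calculations.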
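
For the elastic case (A), the conversion from a CM scattering angle to Lab angles and the accompanying solid-angle factor can be sketched as below. The relations used are the standard two-body elastic ones, $\tan\theta_A = \sin\psi/(\gamma + \cos\psi)$ with $\gamma = M_A/M_B$ and $\theta_B = (\pi - \psi)/2$; the function names are illustrative and the target is assumed to be at rest.

```python
import math

def lab_scattering_angle(psi, gamma):
    """theta_A from tan(theta_A) = sin(psi)/(gamma + cos(psi)).

    atan2 keeps the angle in [0, pi] when gamma + cos(psi) is negative.
    """
    return math.atan2(math.sin(psi), gamma + math.cos(psi))

def lab_recoil_angle(psi):
    """theta_B = (pi - psi)/2 for the initially stationary target."""
    return 0.5 * (math.pi - psi)

def lab_over_cm_factor(psi, gamma):
    """Ratio of Lab to CM differential cross sections for the projectile."""
    numerator = (1.0 + gamma * gamma + 2.0 * gamma * math.cos(psi)) ** 1.5
    return numerator / abs(1.0 + gamma * math.cos(psi))

# Equal masses (gamma = 1): the Lab angle is half the CM angle
psi = 1.2
print(lab_scattering_angle(psi, 1.0), psi / 2.0)
print(lab_over_cm_factor(psi, 1.0), 4.0 * math.cos(psi / 2.0))
```

For equal masses the scattering and recoil angles add up to $\pi/2$, and for a very heavy target ($\gamma \rightarrow 0$) the Lab and CM descriptions coincide.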


B2.2.3 MACROSCOPIC RATE COEFFICIENTS 


B2.2.3.1 SCATTERING RATE 


A distribution $f_A(\mathbf{v}_A)$ of $N_A(t)$ test particles (cm$^{-3}$) of species A in a beam collisionally interacts with a
distribution $f_B(\mathbf{v}_B)$ of $N_B(t)$ field particles of species B. Collisions with B will scatter A out of the beam at the
loss rate (cm$^{-3}$ s$^{-1}$)

$$\frac{dN_A(t)}{dt} = -k N_A(t) N_B(t) = -\nu_B N_A(t).$$

The macroscopic rate coefficient $k$ (cm$^3$ s$^{-1}$) for elastic collisions between the ensembles A and B is

$$k\ (\mathrm{cm^3\ s^{-1}}) = \int f_A(\mathbf{v}_A)\, d\mathbf{v}_A \int [v\sigma(v)]\, f_B(\mathbf{v}_B)\, d\mathbf{v}_B$$

in terms of the integral cross section $\sigma(v)$ for A-B elastic scattering at relative speed $v = |\mathbf{v}_A - \mathbf{v}_B|$. The
microscopic rate coefficient is $v\sigma(v)$. The frequency $\nu_B$ (s$^{-1}$) of collision between one test particle A with $N_B$
field particles is $kN_B$.

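The rate coefficient for elastic scattering between two species with non-isothermal Maxwellian distributions is
then

$$k(T_A, T_B) = \langle v_{AB} \rangle \int_0^\infty \sigma(\varepsilon_{rel})\, \varepsilon_{rel} \exp(-\varepsilon_{rel})\, d\varepsilon_{rel} \quad (\mathrm{cm^3\ s^{-1}})$$

where

$$\langle v_{AB} \rangle = \left[ \frac{8 k_B (M_A T_B + M_B T_A)}{\pi M_A M_B} \right]^{1/2}$$

and

$$\varepsilon_{rel} = \frac{1}{2}\, \frac{M_A M_B v^2}{k_B (M_A T_B + M_B T_A)}.$$

For isothermal distributions $T_A = T_B = T$, the rate is

$$k(T) = \langle v_{AB} \rangle \int_0^\infty \sigma(\varepsilon)\, \varepsilon \exp(-\varepsilon)\, d\varepsilon \quad (\mathrm{cm^3\ s^{-1}})$$

where $\varepsilon = \frac{1}{2} M_{AB} v^2/k_B T$ and $\langle v_{AB} \rangle = (8 k_B T/\pi M_{AB})^{1/2}$. The rate of collisions of electrons A at temperature $T_e$
with a gas of heavy particles B at temperature $T_B$ is

$$k(T_e) = \langle v_e \rangle \int_0^\infty \sigma(\varepsilon_e)\, \varepsilon_e \exp(-\varepsilon_e)\, d\varepsilon_e \quad (\mathrm{cm^3\ s^{-1}})$$

where $\varepsilon_e = \frac{1}{2} m_e v^2/k_B T_e$ and $\langle v_e \rangle = (8 k_B T_e/\pi m_e)^{1/2}$.

B2.2.3.2 ENERGY TRANSFER RATE

Each of the species A transfers energy $\varepsilon_{AB}$ to each species B. The amount of energy transferred per unit
volume in unit time from ensemble A to ensemble B is

$$\frac{d}{dt} \left[ N_A(t) \langle \varepsilon_{AB} \rangle \right] = -k_E N_A(t) N_B(t) = -\nu_{EB} N_A(t)$$

where $k_E$ (energy cm$^3$ s$^{-1}$) is the macroscopic rate coefficient for the averaged energy loss $\langle \varepsilon_{AB} \rangle$.

The amount of energy lost in unit time, the energy-loss frequency, is $\nu_{EB} = k_E N_B(t)$. The energy-loss rate
coefficient for two-temperature Maxwellian distributions is

$$k_E(T_A, T_B) = \frac{2 M_A M_B}{(M_A + M_B)^2}\, k_B (T_A - T_B)\, \langle v_{AB} \rangle \int_0^\infty \sigma_D(\varepsilon_{rel})\, \varepsilon_{rel}^2 \exp(-\varepsilon_{rel})\, d\varepsilon_{rel}$$

where $\sigma_D(\varepsilon_{rel})$ is the momentum transfer cross section at reduced energy $\varepsilon_{rel}$. For isothermal distributions, $T_A
= T_B$ and the energy rate coefficient $k_E$ of course then vanishes.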
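
The Maxwellian rate-coefficient integral of section B2.2.3.1 is easy to evaluate numerically. The sketch below assumes the form $k = \langle v_{AB} \rangle \int_0^\infty \sigma(\varepsilon)\, \varepsilon\, e^{-\varepsilon}\, d\varepsilon$ with $\langle v_{AB} \rangle = [8k_B(T_A/M_A + T_B/M_B)/\pi]^{1/2}$, uses SI units, and takes the cross section as a user-supplied function of the reduced energy. The masses and cross section in the example are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def mean_relative_speed(t_a, m_a, t_b, m_b):
    """<v_AB> = [8 k_B/pi * (T_A/M_A + T_B/M_B)]^(1/2), in m/s."""
    return math.sqrt(8.0 * K_B / math.pi * (t_a / m_a + t_b / m_b))

def rate_coefficient(sigma, t_a, m_a, t_b, m_b, eps_max=60.0, n=60000):
    """k = <v_AB> * integral of sigma(eps)*eps*exp(-eps), midpoint rule.

    sigma is a function of the reduced (dimensionless) energy eps and
    returns a cross section in m^2; the result is in m^3/s.
    """
    step = eps_max / n
    integral = 0.0
    for i in range(n):
        eps = (i + 0.5) * step
        integral += sigma(eps) * eps * math.exp(-eps) * step
    return mean_relative_speed(t_a, m_a, t_b, m_b) * integral

# Hypothetical example: constant cross section, equal masses and temperatures
mass = 6.6e-26    # kg
sigma0 = 1.0e-19  # m^2
k_const = rate_coefficient(lambda eps: sigma0, 300.0, mass, 300.0, mass)
print(k_const, mean_relative_speed(300.0, mass, 300.0, mass) * sigma0)
```

For an energy-independent cross section the integral equals the cross section itself, so the rate coefficient reduces to the mean relative speed times $\sigma$.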

B2.2.3.3 TRANSPORT CROSS SECTIONS AND COLLISION INTEGRALS 

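Transport cross sections are defined for integer $n = 1, 2, 3, \ldots$, as

$$\sigma^{(n)}(E) = 2\pi \left[ 1 - \frac{1 + (-1)^n}{2(n + 1)} \right]^{-1} \int_{-1}^{+1} [1 - \cos^n\psi]\, \frac{d\sigma}{d\Omega}\, d(\cos\psi).$$

The diffusion and viscosity cross sections are given by the transport cross sections $\sigma^{(1)}$ and $\frac{2}{3}\sigma^{(2)}$,
respectively.

Collision integrals are defined for integer $s = 0, 1, 2, \ldots$, as

$$\Omega^{(n,s)}(T) = [(s + 1)!\, (k_B T)^{s+2}]^{-1} \int_0^\infty \sigma^{(n)}(E)\, E^{s+1} \exp(-E/k_B T)\, dE = [(s + 1)!]^{-1} \int_0^\infty \sigma^{(n)}(\varepsilon)\, \varepsilon^{s+1} \exp(-\varepsilon)\, d\varepsilon$$

where $\varepsilon = \frac{1}{2} M_{AB} v^2/k_B T$. The external factors are chosen so that these expressions for $\sigma^{(n)}$ and $\Omega^{(n,s)}$ reduce to
$\pi d^2$ for classical rigid spheres of diameter $d$. The rate coefficient $k$ (cm$^3$ s$^{-1}$) for scattering can then be
expressed, in terms of the collision integral, as $\langle v_{AB} \rangle$ multiplied by an $s = 0$ collision integral. The amount of energy lost per cm$^3$ per
second by collision can be expressed in terms of $\Omega^{(1,1)}$. Tables of transport cross sections and collision
integrals for (n, 6, 4) ion-neutral interactions are available [2, 3].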
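
The statement that the normalizing factors make both quantities reduce to $\pi d^2$ for rigid spheres can be checked numerically. The sketch below assumes the commonly used definitions $\sigma^{(n)} = 2\pi[1 - (1 + (-1)^n)/(2(n+1))]^{-1} \int (1 - \cos^n\psi)(d\sigma/d\Omega)\, d(\cos\psi)$ and $\Omega^{(n,s)} = [(s+1)!]^{-1} \int \sigma^{(n)}(\varepsilon)\, \varepsilon^{s+1} e^{-\varepsilon}\, d\varepsilon$, together with the isotropic rigid-sphere value $d\sigma/d\Omega = d^2/4$. Function names are illustrative.

```python
import math

def transport_cross_section(n, dsigma_domega, steps=20000):
    """sigma^(n) from a differential cross section given as a function of cos(psi)."""
    norm = 1.0 - (1 + (-1) ** n) / (2.0 * (n + 1))
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        c = -1.0 + (i + 0.5) * h  # midpoint in cos(psi)
        total += (1.0 - c ** n) * dsigma_domega(c) * h
    return 2.0 * math.pi * total / norm

def collision_integral(s, sigma_n, eps_max=80.0, steps=80000):
    """Omega^(n,s) from sigma^(n) given as a function of reduced energy eps."""
    h = eps_max / steps
    total = 0.0
    for i in range(steps):
        eps = (i + 0.5) * h
        total += sigma_n(eps) * eps ** (s + 1) * math.exp(-eps) * h
    return total / math.factorial(s + 1)

d = 3.0  # hypothetical rigid-sphere diameter, arbitrary length units
rigid = lambda c: d * d / 4.0  # isotropic rigid-sphere dsigma/dOmega

for n in (1, 2, 3):
    print(n, transport_cross_section(n, rigid), math.pi * d * d)

sigma_1 = transport_cross_section(1, rigid)
print(collision_integral(1, lambda eps: sigma_1), math.pi * d * d)
```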

(C) CHAPMAN-ENSKOG MOBILITY FORMULA 

When ions move under equilibrium conditions in a gas and an external electric field, the energy gained from
the electric field $E$ between collisions is lost to the gas upon collision so that the ions move with a constant
drift speed $v_d = KE$. The mobility $K$ of ions of charge $e$ in a gas of density $N$ is given in terms of the collision
integral by the Chapman-Enskog formula [2]

$$K = \frac{3e}{16N} \left( \frac{2\pi}{M_{AB} k_B T} \right)^{1/2} \frac{1}{\Omega^{(1,1)}(T)}$$
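
A minimal numerical sketch of the mobility formula follows, assuming the first-order Chapman-Enskog form $K = (3e/16N)(2\pi/M_{AB}k_BT)^{1/2}/\Omega^{(1,1)}(T)$ in SI units. The gas density, reduced mass and collision integral used in the example are hypothetical values chosen only to give a sensible order of magnitude.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C

def chapman_enskog_mobility(n_gas, reduced_mass, temperature, omega_11,
                            charge=E_CHARGE):
    """Mobility K in m^2 V^-1 s^-1.

    n_gas in m^-3, reduced_mass in kg, temperature in K, omega_11 in m^2.
    """
    prefactor = 3.0 * charge / (16.0 * n_gas)
    thermal = math.sqrt(2.0 * math.pi / (reduced_mass * K_B * temperature))
    return prefactor * thermal / omega_11

# Hypothetical example values
n_gas = 2.687e25        # m^-3
reduced_mass = 6.0e-27  # kg
omega_11 = 2.0e-19      # m^2
mobility = chapman_enskog_mobility(n_gas, reduced_mass, 300.0, omega_11)
print(f"K = {mobility * 1e4:.3g} cm^2 V^-1 s^-1")
```

The mobility is inversely proportional to both the gas density and the collision integral, so a measured $K$ at known $N$ and $T$ gives $\Omega^{(1,1)}(T)$ directly.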
B2.2.4 QUANTAL TRANSITION RATES AND CROSS SECTIONS

B2.2.4.1 MICROSCOPIC RATE OF TRANSITIONS 
In the general elastic/inelastic collision process

$$A(\alpha) + B(\beta) \rightarrow A(\alpha') + B(\beta')$$

the external scattering or deflection of a beam of projectile particles A (electrons, ions, atoms) by target
particles B (atoms, molecules) is accompanied by transitions (electronic, vibrational, rotational) within the
internal structure of either or both collision partners. For a beam with incident momentum $\mathbf{p}_i = \hbar\mathbf{k}_i$ in the range
$(\mathbf{p}_i, \mathbf{p}_i + d\mathbf{p}_i)$ or directed energy $\mathbf{E}_i = (E_i, \hat{\mathbf{p}}_i)$ in the range $(\mathbf{E}_i, \mathbf{E}_i + d\mathbf{E}_i)$, the translational states representing the
A-B relative or external motion undergo free-free transitions $(E_i, E_i + dE_i) \rightarrow (E_f, E_f + dE_f)$ within the
translational continuum, while the structured particles undergo bound-bound (excitation, de-excitation,
excitation transfer) or bound-free (ionization, dissociation) transitions $i = (\alpha, \beta) \rightarrow f = (\alpha', \beta')$ in their internal
electronic, vibrational or rotational structure. The transition frequency (s$^{-1}$) for this collision is

$$\frac{dW_{if}}{d\hat{\mathbf{p}}_f}\ (\mathrm{s^{-1}}) = \frac{2\pi}{\hbar}\, \frac{1}{g_i} \sum_{i,f} |V_{fi}|^2 \rho_f(E_f)$$

which is an average over the $g_i$ initial degenerate internal states $i$ and a sum over all $g_f$ final degenerate
internal states $f$ of the isolated systems A and B. It is therefore the probability per unit time for scattering from
a specified $\mathbf{E}_i$ (external) continuum state into unit solid angle $d\hat{\mathbf{p}}_f$, accompanied by a transition from any one
of the $g_i$ initial states $(\alpha, \beta)$ to all final internal states $(\alpha', \beta')$ of degeneracy $g_f$ and to all final translational
states $\rho_f(E_f)\, dE_f$ of relative motion consistent with energy conservation. The double summation $\sum_{i,f}$ is over the
$g_i$ initial and $g_f$ final internal states of A and B with total energy $\varepsilon_i$ and $\varepsilon_f$ respectively.

Check: The dimension of $[|V_{fi}|^2 \rho]$ is $E$, $[\hbar] = Et$ so that $dW_{if}/d\hat{\mathbf{p}}_f$ indeed has the correct dimension of $t^{-1}$.

(A) INTERACTION MATRIX ELEMENT

The matrix element

$$V_{fi} = \langle N_f \Phi_f | V(\mathbf{r}_A, \mathbf{r}_B, \mathbf{R}) | N_i \Psi_i^+ \rangle_{\mathbf{r}, \mathbf{R}} = V_{if}^*$$

is an integration over the internal coordinates $\mathbf{r} = \mathbf{r}_A, \mathbf{r}_B$ of the electrons of A and B and over the channel
vector $\mathbf{R}$ for A-B relative motion. The matrix element of the mutual electrostatic interaction $V(\mathbf{r}_A, \mathbf{r}_B, \mathbf{R})$
couples the eigenfunction $N_i \Psi_i^+(\mathbf{R}, \mathbf{r}_A, \mathbf{r}_B)$ of $[H_{rel} + H_{int} + V]$ for the complete collision
system for all $\mathbf{R}$ to the final $R \rightarrow \infty$ asymptotic state $N_f \Phi_f(\mathbf{R}, \mathbf{r}_A, \mathbf{r}_B)$, which is an eigenfunction only of the
unperturbed Hamiltonian $[H_{rel} + H_{int}]$. The wavefunction $\Psi_i^+$
for the full collision system with Hamiltonian $H_{rel} + H_{int} + V$ tends at asymptotic $R$ to

$$\Psi_i^+ \rightarrow \sum_j \left[ e^{i\mathbf{k}_j \cdot \mathbf{R}}\, \delta_{ij} + f_{ij}(\theta, \phi)\, \frac{e^{i k_j R}}{R} \right] \phi_j(\mathbf{r}_A, \mathbf{r}_B)$$

which represents an incoming plane wave of unit amplitude in the incident elastic channel $i$ and outgoing
spherical waves of amplitude $f_{ij}$ in all channels $j$, including $i$. The Kronecker symbol means $\delta_{ij} = 1$ for $i = j$ and
$\delta_{ij} = 0$ for $i \ne j$. The final state at infinite separation $R$ is

$$\Phi_f = e^{i\mathbf{k}_f \cdot \mathbf{R}}\, \phi_f(\mathbf{r}_A, \mathbf{r}_B)$$

which is an eigenfunction only of $H_{rel} + H_{int}$. The plane wave of unit amplitude describes the external relative
motion with Hamiltonian $H_{rel}$ and $\phi_j(\mathbf{r}_A, \mathbf{r}_B)$ describes the internal, isolated, normalized atomic
eigenstates of A and B with internal Hamiltonian $H_{int}$. The factors $N_{i,f}$ provide the possibility of having
translational (scattering) states with arbitrary amplitudes which are not necessarily unity.

(B) TRANSITION OPERATOR 

The interaction matrix element can also be written as

$$V_{fi} = \langle N_f \Phi_f | T | N_i \Phi_i \rangle$$

where the transition operator $T$ is defined by $T\Phi_i = V\Psi_i^+$. The transition operator $T$ therefore couples states
which are eigenfunctions of the same unperturbed Hamiltonian $H_{rel} + H_{int}$, in contrast to $V$ which couples
states $\Psi_i^+$ and $\Phi_f$ belonging to different Hamiltonians.
B2.2.4.2 DETAILED BALANCE BETWEEN RATES


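The frequency (number per second) of $i \rightarrow f$ transitions from all $g_i$ degenerate initial internal states and from
the $\rho_i\, d\mathbf{E}_i$ initial external translational states is equal to the reverse frequency from the $g_f$ degenerate final
internal states and the $\rho_f\, d\mathbf{E}_f$ final external translational states. The detailed balance relation between the
forward and reverse frequencies is therefore

$$[g_i \rho_i\, dE_i\, d\hat{\mathbf{p}}_i] \left( \frac{dW_{if}}{d\hat{\mathbf{p}}_f} \right) d\hat{\mathbf{p}}_f = [g_f \rho_f\, dE_f\, d\hat{\mathbf{p}}_f] \left( \frac{dW_{fi}}{d\hat{\mathbf{p}}_i} \right) d\hat{\mathbf{p}}_i$$

since $V_{if} = V_{fi}^*$. From energy conservation $\varepsilon_f + E_f = \varepsilon_i + E_i$, then $dE_f = dE_i$. The differential frequencies

$$\frac{dR_{if}}{d\hat{\mathbf{p}}_f} = g_i \rho_i\, \frac{dW_{if}}{d\hat{\mathbf{p}}_f} = g_f \rho_f\, \frac{dW_{fi}}{d\hat{\mathbf{p}}_i} = \frac{dR_{fi}}{d\hat{\mathbf{p}}_i}$$

for the forward and reverse transitions, $i \rightleftharpoons f$, are therefore equal.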
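
The symmetry behind detailed balance can be illustrated with a toy calculation. The sketch below assumes the golden-rule form $dW_{if}/d\hat{\mathbf{p}}_f = (2\pi/\hbar)(1/g_i) \sum_{i,f} |V_{fi}|^2 \rho_f$ and, for simplicity, a single common value of $|V_{fi}|^2$ for every pair of degenerate states. All numerical inputs are hypothetical.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s

def transition_frequency(v_squared, g_initial, g_final, rho_final):
    """Golden-rule frequency with |V_fi|^2 taken equal for all state pairs."""
    double_sum = g_initial * g_final * v_squared  # sum over initial and final states
    return (2.0 * math.pi / HBAR) * double_sum * rho_final / g_initial

# Hypothetical degeneracies, densities of states and coupling strength
g_i, g_f = 2, 6
rho_i, rho_f = 3.0e20, 1.2e20  # states per joule (illustrative)
v_sq = 1.0e-44                 # |V_fi|^2 in J^2 (illustrative)

w_forward = transition_frequency(v_sq, g_i, g_f, rho_f)  # i -> f
w_reverse = transition_frequency(v_sq, g_f, g_i, rho_i)  # f -> i

print("g_i rho_i W_if =", g_i * rho_i * w_forward)
print("g_f rho_f W_fi =", g_f * rho_f * w_reverse)
```

The individual frequencies differ, but once each is weighted by the degeneracy and density of states of its own initial channel the two products coincide.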
B2.2.4.3 ENERGY DENSITY OF CONTINUUM STATES 

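The continuum wavefunctions $\phi_{\mathbf{p}}(\mathbf{R})$ for the states of the A-B relative motion satisfy the orthonormality
condition

$$\int \rho(\mathbf{E})\, d\mathbf{E} \int \phi_{\mathbf{p}}(\mathbf{R})\, \phi^*_{\mathbf{p}'}(\mathbf{R})\, d\mathbf{R} = 1.$$

The number of translational states per unit volume $d\mathbf{R}$ with directed energies $\mathbf{E} = (E, \hat{\mathbf{p}})$ in the range $[\mathbf{E}, \mathbf{E} +
d\mathbf{E}]$ is $\rho(\mathbf{E})\, d\mathbf{E}$. This orthonormality condition for continuum states is analogous to the condition $\sum_j |\langle \phi_j | \phi_i \rangle|^2 =
1$ for bound states. For plane waves, $\phi_{\mathbf{p}}(\mathbf{R}) = N \exp(i\mathbf{p} \cdot \mathbf{R}/\hbar)$, then

$$\langle \phi_{\mathbf{p}'} | \phi_{\mathbf{p}} \rangle = |N|^2 (2\pi\hbar)^3\, \delta(\mathbf{p} - \mathbf{p}') = |N|^2 (2\pi)^3\, \delta(\mathbf{k} - \mathbf{k}').$$

Note, irrespective of the method chosen to normalize the wavefunctions, that

$$|N|^2 \rho(\mathbf{E}) = \frac{mp}{(2\pi\hbar)^3} = \frac{mp}{h^3}$$

always. The amplitude $|N|$ does, however, depend on the choice of normalization.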
(1) For momentum-normalized states, $\langle\phi_{\mathbf p}|\phi_{\mathbf p'}\rangle = \delta(\mathbf p - \mathbf p')$, $|N| = (2\pi\hbar)^{-3/2}$ and the density of states $\rho(\mathbf E) = mp$.

(2) For wavevector-normalized states, $\langle\phi_{\mathbf p}|\phi_{\mathbf p'}\rangle = \delta(\mathbf k - \mathbf k')$, $|N| = (2\pi)^{-3/2}$ and the density of states $\rho(\mathbf E) = mp/\hbar^3$.

(3) For energy-normalized states, $\langle\phi_{\mathbf p}|\phi_{\mathbf p'}\rangle = \delta(\mathbf E - \mathbf E')$, $\rho(\mathbf E) = 1$ and $|N| = (mp/h^3)^{1/2}$.

(4) For waves with unit amplitude, $|N| = 1$ and $\rho(\mathbf E) = mp/h^3$.

Note that $\langle\phi_{\mathbf p}|\phi_{\mathbf p'}\rangle\,\rho(\mathbf E')\,d\mathbf E'$ is dimensionless for all cases and yields unity for a single particle when
integrated over all $\mathbf E'$. The number of states in the phase-space element $d\mathbf E\,d\mathbf R$ is

$$dn = |N|^2\rho(\mathbf E)\,d\mathbf E\,d\mathbf R = \frac{d\mathbf p\,d\mathbf R}{(2\pi\hbar)^3}$$

i.e. each translational state occupies a cell of phase volume $(2\pi\hbar)^3$. The density of states in the scalar interval $[E, E + dE]$
is $\rho(E) = 4\pi\rho(\mathbf E)$. The number of translational states per unit volume with energy in the scalar range $[E, E + dE]$ is

$$|N|^2\rho(E)\,dE = \frac{2}{\sqrt\pi}\,\frac{(2\pi m)^{3/2}}{h^3}\,E^{1/2}\,dE.$$


Check. The number of free particles with all momenta $\mathbf p$ in equilibrium with a gas bath of volume $V$ at
temperature $T$ is the translational partition function $Z_t$. Since the fraction of particles with energy $E$ is
$\exp(-E/k_BT)/Z_t$, the Maxwell distribution

$$\frac{|N|^2V\rho(E)\,dE}{Z_t}\exp(-E/k_BT) = \frac{2}{\sqrt\pi}\left(\frac{E}{k_BT}\right)^{1/2}\exp(-E/k_BT)\,d\!\left(\frac{E}{k_BT}\right)$$

is then recovered.
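The two statements above can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the midpoint integrator and the sample mass are illustrative assumptions. It compares the two forms of the scalar density of states and confirms that the Maxwell distribution in $x = E/k_BT$ integrates to unity.

```python
import math

H = 6.62607015e-34   # Planck constant h in J s
KB = 1.380649e-23    # Boltzmann constant in J/K

def scalar_state_density(E, m):
    """4*pi*m*p/h^3 with p = sqrt(2 m E): states per unit volume per unit scalar energy."""
    p = math.sqrt(2.0 * m * E)
    return 4.0 * math.pi * m * p / H**3

def scalar_state_density_closed(E, m):
    """The same quantity written as (2/sqrt(pi)) (2 pi m)^{3/2} E^{1/2} / h^3."""
    return 2.0 / math.sqrt(math.pi) * (2.0 * math.pi * m / H**2) ** 1.5 * math.sqrt(E)

def maxwell_fraction(x):
    """Maxwell distribution in the reduced energy x = E/(k_B T)."""
    return 2.0 / math.sqrt(math.pi) * math.sqrt(x) * math.exp(-x)

def integrate(f, a, b, n=200000):
    """Composite midpoint rule with n panels (illustrative helper)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))
```

Truncating the normalization integral at $x = 50$ is an assumption of the sketch; the neglected tail is far below the quadrature error.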
CURRENT 

Current is the number of particles crossing unit area in unit time. The current in a beam with directed energy
$\mathbf E$ within the range $(\mathbf E, \mathbf E + d\mathbf E)$ is

$$j\,d\mathbf E = v|N|^2\rho(\mathbf E)\,d\mathbf E = (p^2/h^3)\,d\mathbf E.$$

The current per unit $d\mathbf E$ is the current density $j = p^2/h^3$. The quantal expression for current

$$\mathbf J = \frac{\hbar}{2mi}\left[\psi^*\nabla\psi - \psi\nabla\psi^*\right]$$

when applied to the plane wave $\phi = N\exp(i\mathbf p\cdot\mathbf R/\hbar)$, gives $J = |N|^2v$. The current in a $(\mathbf E, \mathbf E + d\mathbf E)$-beam of
plane waves is then $J[\rho(\mathbf E)\,d\mathbf E]$ so that the current density is $j(\mathbf E) = J\rho(\mathbf E) = |N|^2v\rho(\mathbf E)$, as before.
B2.2.4.4 INELASTIC CROSS SECTIONS 

The differential cross section $d\sigma_{if}/d\hat p_f$ for $i \to f$ transitions from any one of the $g_i$ initial states is defined as
$[dR_{if}/d\hat p_f]/g_ij_i$, the transition frequency per unit incident current. Since current is the number of particles
crossing unit area in unit time, the cross section is therefore the effective area presented by the target towards
$i \to f$ internal transitions in the internal structures of the collision partners which are scattered into unit solid
angle $d\hat p_f$ about direction $\hat p_f$ in the CM-frame.

(A) BASIC EXPRESSION FOR CROSS SECTION 

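The differential cross section for $i \to f$ collisions is therefore defined as

$$\frac{d\sigma_{if}}{d\hat p_f} = \frac{1}{g_ij_i}\frac{dR_{if}}{d\hat p_f} = \frac{2\pi}{\hbar}\left(\frac{\rho_i\rho_f}{j_i}\right)\frac{1}{g_i}\sum_{i,f}|V_{fi}|^2$$

which is an average over the $g_i$ initial internal degenerate states and a sum over the $g_f$ final degenerate states.
Since $j_i = |N_i|^2v_i\rho_i = p_i^2/h^3$, an alternative form [4] for the cross section is

$$\frac{d\sigma_{if}}{d\hat p_f} = \frac{2\pi}{\hbar}\left(\frac{h^3\rho_i\rho_f}{p_i^2}\right)\frac{1}{g_i}\sum_{i,f}\left|\langle N_f\Phi_f|V(\mathbf r_A,\mathbf r_B,\mathbf R)|N_i\Psi_i^+\rangle\right|^2.$$
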


B2.2.4.5 DETAILED BALANCE BETWEEN CROSS SECTIONS 

When cast in terms of cross sections, the detailed balance relation in section B2.2.4.2 is

$$g_ij_i\frac{d\sigma_{if}}{d\hat p_f} = g_fj_f\frac{d\sigma_{fi}}{d\hat p_i}.$$

The basic relationship satisfied by the differential cross sections for the forward and reverse $i \rightleftharpoons f$ transitions is

$$g_ip_i^2\frac{d\sigma_{if}(E_i)}{d\hat p_f} = g_fp_f^2\frac{d\sigma_{fi}(E_f)}{d\hat p_i}.$$

(A) COLLISION STRENGTHS

Collision strengths $\Omega_{if}$ exploit this detailed balance relation by being defined as

$$\Omega_{if} = g_ip_i^2\sigma_{if}(E_i) = g_fp_f^2\sigma_{fi}(E_f) = \Omega_{fi}.$$

They are therefore symmetrical in $i$ and $f$.

(B) REACTIVE PROCESSES

For any reactive process

$$A + B \rightleftharpoons C + D$$

the detailed balance relations involving differential/integral cross sections are

$$g_Ag_B\,p_{AB}^2\frac{d\sigma_{if}}{d\hat p_{CD}} = g_Cg_D\,p_{CD}^2\frac{d\sigma_{fi}}{d\hat p_{AB}}$$

where $p_{JK}^2 = 2M_{JK}E_{JK}$, in terms of the reduced mass $M_{JK}$ and relative energy $E_{JK}$ of species J and K.

B2.2.4.6 EXAMPLES OF DETAILED BALANCE 
(A) EXCITATION-DE-EXCITATION

$$\sigma_{fi}(E_f) = \left(\frac{g_iE_i}{g_fE_f}\right)\sigma_{if}(E_i)$$

With energy conservation, $E_f = E_i + (\epsilon_i - \epsilon_f) = E_i - \epsilon_{fi}$, the cross section for superelastic collisions ($E_i > E_f$)
can be obtained from $\sigma_{if}$ at energy $E_i$ via the relation

$$\sigma_{fi}(E_i - \epsilon_{fi}) = \left(1 - \frac{\epsilon_{fi}}{E_i}\right)^{-1}\left(\frac{g_i}{g_f}\right)\sigma_{if}(E_i).$$
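As an illustration of how the detailed balance relation is used in practice, the sketch below converts an excitation cross section into the corresponding superelastic (de-excitation) cross section. The function names and the sample numbers are hypothetical; the only physics assumed is $g_iE_i\sigma_{if}(E_i) = g_fE_f\sigma_{fi}(E_f)$ with the same reduced mass in both channels, so that $p^2 \propto E$.

```python
def reverse_cross_section(sigma_if, E_i, eps_fi, g_i, g_f):
    """Return sigma_fi at E_f = E_i - eps_fi from detailed balance.

    sigma_if : excitation cross section at relative energy E_i
    eps_fi   : excitation energy eps_f - eps_i (same units as E_i)
    g_i, g_f : statistical weights of the initial and final internal states
    """
    E_f = E_i - eps_fi
    if E_f <= 0.0:
        raise ValueError("channel closed: E_i must exceed the excitation energy")
    return (g_i / g_f) * (E_i / E_f) * sigma_if

def collision_strength(g, E, sigma):
    """Quantity proportional to Omega = g p^2 sigma (the constant 2M is dropped)."""
    return g * E * sigma
```

For example, with $g_i = 1$, $g_f = 3$, $E_i = 10$ and $\epsilon_{fi} = 4$ (arbitrary units), the forward and reverse collision strengths computed from the two cross sections coincide, as the symmetry $\Omega_{if} = \Omega_{fi}$ requires.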


(B) DISSOCIATIVE RECOMBINATION/ASSOCIATIVE IONIZATION 

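Dissociative recombination and associative ionization are represented by the forward and backward directions
of

$$e^- + AB^+ \rightleftharpoons A + B^*.$$

The respective cross sections $\sigma_{DR}$ and $\sigma_{AI}$ are related by

$$g_e\,g_{AB^+}\,p_e^2\,\sigma_{DR}(E_e) = g_A\,g_B\,p_{AB}^2\,\sigma_{AI}(E_{AB})$$

where the statistical weight of each species $j$ involved is denoted by $g_j$.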

(C) RADIATIVE RECOMBINATION/PHOTOIONIZATION 

Similarly, the cross sections for radiative recombination (RR) and for photoionization (PI), the forward and
reverse directions of

$$A^+ + e^- \rightleftharpoons h\nu + A(n\ell)$$

are related by

$$g_e\,g_{A^+}\,k_e^2\,\sigma_{RR}^{n\ell}(E_e) = g_\nu\,g_A\,k_\nu^2\,\sigma_{PI}^{n\ell}(h\nu).$$

The photon statistical weight is $g_\nu = 2$, corresponding to the two directions of polarization of the photon. The
photon energy $E$ is related to its momentum $p_\nu$ and wavenumber $k_\nu$ and to the ionization energy $I_{n\ell}$ of the
atom $A(n\ell)$ by

$$E = h\nu = p_\nu c = \hbar k_\nu c = I_{n\ell} + E_e$$

where $c$ is the speed of light. The ratio of squared wavenumbers is

$$\frac{k_\nu^2}{k_e^2} = \frac{(h\nu)^2}{2E_em_ec^2} = \frac{\alpha^2(h\nu)^2}{2E_e(e^2/a_0)}.$$

B2.2.4.7 FOUR USEFUL EXPRESSIONS FOR THE CROSS SECTION 

The final expressions to be used for the calculation of cross sections depend on the particular choice of
normalization of the continuum wavefunction for relative motion. Since it is often a vexing problem and is a
continued source of confusion and error in the literature, these final expressions are worked out below. The
external relative-motion part of the system wavefunction $\Psi = N\phi_{\mathbf p}(\mathbf R)\psi_{int}(\mathbf r_A, \mathbf r_B)$ is $N\phi_{\mathbf p}(\mathbf R)$. Since $|N|^2\rho = mp/h^3$,
the density $\rho(\mathbf E)$ of continuum states therefore depends on the choice of normalization factor $N$
adopted for the continuum wave. For future reference, the amplitude $N$ and the energy densities $\rho(\mathbf E)$
associated with four common methods adopted for normalization of continuum waves are summarized in
table B2.2.1. Also included is the amplitude $N_\ell$ of the corresponding radial partial wave, which varies asymptotically as

$$N_\ell\sin\left(kr - \tfrac12\ell\pi + \eta_\ell\right)$$

(see section B2.2.6.1). The external multiplicative factors $\gamma_{if} = (2\pi/\hbar)(\rho_i\rho_f/j_i)$ in the basic formula in section
B2.2.4.4 for the cross section are also summarized in table B2.2.1 for the various normalization schemes. The
reduced masses before and after the collision are $m_i = M_AM_B/(M_A + M_B)$ and $m_f = M_CM_D/(M_C + M_D)$,
respectively.


(A) ENERGY-NORMALIZED INITIAL AND FINAL STATES 

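The wavefunctions

$$\chi_{\mathbf p} = \rho^{1/2}N\Psi_{\mathbf p} = \left[mp/(2\pi\hbar)^3\right]^{1/2}\Psi_{\mathbf p}(\mathbf R, \mathbf r_A, \mathbf r_B)$$

are energy-normalized according to

$$\langle\chi_{\mathbf p}|\chi_{\mathbf p'}\rangle = \delta(\mathbf E - \mathbf E').$$

The basic formula in section B2.2.4.4 with $j_i = p_i^2/(2\pi\hbar)^3$ yields

$$\frac{d\sigma_{if}}{d\hat p_f} = \frac{(2\pi)^4}{k_i^2}\,\frac{1}{g_i}\sum_{i,f}\left|\langle\chi_f|V|\chi_i^+\rangle\right|^2.$$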
The transition probability is


t/ J 


the magnitude squared of the element $T_{if}$ of the transition matrix $T$ between $\chi_i^+$ and $\chi_f$, the two energy-
normalized eigenfunctions of $H_{rel} + H_{int} + V$ and $H_{rel} + H_{int}$, respectively. The detailed balance relation in
this case is simply

$$|T_{if}|^2 = |T_{fi}|^2$$

thereby verifying that $|T_{if}|^2$ is indeed the $i \to f$ transition probability for transitions between all $g_i$ initial and $g_f$
final states. This type of normalization is convenient for rearrangement collisions such as dissociative,
radiative and dielectronic recombination.

(B) UNIT AMPLITUDE INITIAL AND FINAL STATES 

Here the initial and final wavefunctions with unit amplitude are $\Psi_i^+$ and $\Phi_f$. They are each normalized
according to

$$\langle\Phi_{\mathbf p}|\Phi_{\mathbf p'}\rangle = (2\pi)^3\delta(\mathbf k - \mathbf k').$$

The basic expression in section B2.2.4.4 with $j_i = |N_i|^2v_i\rho_i$ and $|N_f|^2\rho_f = m_fp_f/h^3$ reduces to

$$\frac{d\sigma_{if}}{d\hat p_f} = \frac{v_f}{v_i}\,\frac{1}{g_i}\sum_{i,f}|f_{if}(\theta,\varphi)|^2$$

where the scattering amplitude is

$$f_{if} = -\frac{1}{4\pi}\left(\frac{2m_f}{\hbar^2}\right)\langle\Phi_f|V|\Psi_i^+\rangle$$

which couples scattering states $\Psi_i^+$ and $\Phi_f$ of unit amplitude. This expression is also applicable for
rearrangement collisions $A + B \to C + D$ by including the reduced mass $m_f = M_CM_D/(M_C + M_D)$ of the
reacted species after the collision. The integral cross section consistent with the above scattering amplitude is

$$\sigma_{if}(E) = \frac{v_f}{v_i}\int_{-1}^{1}d(\cos\theta)\int_0^{2\pi}|f_{if}(\theta,\varphi)|^2\,d\varphi$$

at relative energy $E = \tfrac12M_{AB}v_i^2$. The scattering amplitude consistent with the common use of

$$\sigma_{if}(E) = \frac{k_f}{k_i}\int_{-1}^{1}d(\cos\theta)\int_0^{2\pi}|f_{if}(\theta,\varphi)|^2\,d\varphi$$

for rearrangement collisions is

$$f_{if} = -\frac{1}{4\pi}\left(\frac{2\sqrt{m_im_f}}{\hbar^2}\right)\langle\Phi_f|V|\Psi_i^+\rangle.$$

Both conventions are identical only for direct collisions $A(\alpha) + B(\beta) \to A(\alpha') + B(\beta')$. This normalization is
customary [5] for elastic and inelastic scattering processes.

For symmetrical potentials $V(r)$ scattering is confined to a plane and $f_{if}$ depends only on the scattering angle $\theta = \arccos(\hat k_i\cdot\hat k_f)$.

(C) MOMENTUM-NORMALIZED INITIAL AND FINAL STATES 

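Here the initial and final wavefunctions $\Psi_{\mathbf p}^+ = (2\pi\hbar)^{-3/2}\Psi_i^+$ and $\Phi_{\mathbf p'} = (2\pi\hbar)^{-3/2}\Phi_f$ are normalized according to

$$\langle\Phi_{\mathbf p}|\Phi_{\mathbf p'}\rangle = \delta(\mathbf p - \mathbf p').$$

The cross section B2.2.4.4 is then

$$\frac{d\sigma_{if}}{d\hat p_f} = \frac{v_f}{v_i}\,\frac{1}{g_i}\sum_{i,f}|f_{if}(\theta,\varphi)|^2$$

where the scattering amplitude [6, 7 and 8] is now

$$f_{if} = -(2\pi^2\hbar^3)\left(\frac{2m_f}{\hbar^2}\right)\langle\Phi_{\mathbf p'}|V|\Psi_{\mathbf p}^+\rangle.$$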


(D) ENERGY-NORMALIZED FINAL AND UNIT AMPLITUDE INITIAL STATES 

Here the basic formula B2.2.4.4 yields

$$\frac{d\sigma_{if}}{d\hat p_f} = \frac{2\pi}{\hbar v_i}\,\frac{1}{g_i}\sum_{i,f}\left|\langle\chi_f|V|\Psi_i^+\rangle\right|^2$$

which couples the initial scattering state $\Psi_i^+$ of unit amplitude with the energy-normalized final state
$\chi_f = \rho_f^{1/2}N_f\Phi_f$. This normalization is customary for photoionization problems.


B2.2.5 BORN CROSS SECTIONS 

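Here an (undistorted) plane wave of unit amplitude is adopted for the channel wavefunction
$\Psi_i^+ = \exp(i\mathbf k_i\cdot\mathbf R)\,\psi_i^{int}(\mathbf r)$ for the complete system. The differential cross section for elastic ($i = f$) or inelastic
scattering ($i \ne f$) into $\hat k_f(\theta, \varphi)$ is then

$$\frac{d\sigma_{if}}{d\Omega} = \frac{v_f}{v_i}\left|f_{if}^{B}(\theta,\varphi)\right|^2.$$

The Born scattering amplitude for A-B collisions is

$$f_{if}^{B}(\mathbf K) = -\frac{1}{4\pi}\left(\frac{2M_{AB}}{\hbar^2}\right)\int V_{fi}(\mathbf R)\exp(i\mathbf K\cdot\mathbf R)\,d\mathbf R$$

which is the Fourier transform of the interaction potential

$$V_{fi}(\mathbf R) = \langle\psi_f^{int}(\mathbf r)|V(\mathbf r,\mathbf R)|\psi_i^{int}(\mathbf r)\rangle$$

which couples the initial and final isolated states $\psi^{int} = \psi_A(\mathbf r_A)\psi_B(\mathbf r_B)$ of the atoms. The diagonal potential
$V_{ii}(\mathbf R)$ is the static interaction for elastic scattering. The Born scattering amplitude is a pure function only of
the collisional momentum change

$$\mathbf q = \hbar\mathbf K = M_{AB}(\mathbf v_i - \mathbf v_f) = \hbar(\mathbf k_i - \mathbf k_f)$$

where $\mathbf v$ is the A-B relative velocity. Since $K^2 = k_i^2 + k_f^2 - 2k_ik_f\cos\theta$, the Born integral cross section is

$$\sigma_{if}^{B}(k_i) = \frac{2\pi}{k_i^2}\int_{K_-}^{K_+}\left|f_{if}^{B}(K)\right|^2K\,dK$$

where $\hbar K_\pm = \hbar|k_i \pm k_f|$ are the maximum and minimum momentum changes consistent with energy
conservation. For symmetric interactions $V_{fi}(R)$, then

$$f_{if}^{B}(K) = -\frac{2M_{AB}}{\hbar^2}\int_0^\infty V_{fi}(R)\,\frac{\sin KR}{KR}\,R^2\,dR.$$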

B2.2.5.1 FERMI GOLDEN RULES 


Rule A. The transition rate (probability per unit time) for a transition from state $\Phi_i$ of a quantum system to a
number $\rho_f(E)\,dE$ of continuum states $\Phi_E$ by an external perturbation $V$ is

$$w_{if} = \frac{2\pi}{\hbar}\left|\langle\Phi_f|V|\Phi_i\rangle\right|^2\rho_f(E) \equiv \frac{2\pi}{\hbar}|V_{fi}|^2\rho_f(E)$$

to first order in $V$. Since $|V_{fi}|^2\rho_f$ has the dimension of energy, $w_{if}$ has the dimension $t^{-1}$.

Rule B. When the direct coupling $V_{fi}$ from only the initial state to the continuum vanishes, but the coupling
$V_{ni} \ne 0$ for $n \ne i$, the transition can then occur via the intermediate states $n$ at the rate

$$w_{if} = \frac{2\pi}{\hbar}\left|\sum_n\frac{V_{fn}V_{ni}}{E_i - E_n}\right|^2\rho_f(E).$$

These rules, A and B (which are not exact), are useful for both scattering and radiative processes and are often
referenced as Fermi's Rules 2 and 1, respectively.

SCATTERING EXAMPLE 

The cross section for inelastic scattering of a beam of particles by potential $V(\mathbf r, \mathbf R)$ is

$$\frac{d\sigma_{if}}{d\hat p_f} = \frac{w_{if}}{j_i}.$$

A plane-wave monoenergetic beam, $\Phi_{i,f} = N_{i,f}\exp(i\mathbf p_{i,f}\cdot\mathbf R/\hbar)\,\psi_{i,f}(\mathbf r)$, has current $j_i = |N_i|^2v_i$ and density
determined from $|N_f|^2\rho_f(E) = M_{AB}\,p_f/(2\pi\hbar)^3$. Hence

$$\frac{d\sigma_{if}}{d\hat p_f} = \frac{v_f}{v_i}\left|\frac{1}{4\pi}\,\frac{2M_{AB}}{\hbar^2}\int V_{fi}(\mathbf R)\,e^{i(\mathbf p_i - \mathbf p_f)\cdot\mathbf R/\hbar}\,d\mathbf R\right|^2.$$

Since this agrees with the first Born differential cross section for (in)elastic scattering, Fermi's Rule 2 is
therefore valid to first order in the interaction $V$.


B2.2.5.2 ION (ELECTRON)-ATOM COLLISIONS 

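The electrostatic interaction between a structureless projectile ion P of charge $Z_Pe$ and an atom A with nuclear
charge $Z_Ae$ is

$$V(\mathbf r, \mathbf R) = \frac{Z_PZ_Ae^2}{R} - \sum_{j=1}^{Z_A}\frac{Z_Pe^2}{|\mathbf R - \mathbf r_j|}.$$

With the use of Bethe's integral

$$\int\frac{\exp(i\mathbf K\cdot\mathbf R)}{|\mathbf R - \mathbf r_j|}\,d\mathbf R = \frac{4\pi}{K^2}\exp(i\mathbf K\cdot\mathbf r_j)$$

the Born scattering amplitude (see B2.2.5) reduces to

$$f_{if}^{B}(K) = \frac{2M_{AB}Z_Pe^2}{\hbar^2K^2}\left[F_{if}(\mathbf K) - Z_A\delta_{if}\right]$$

which is a function only of momentum transfer $\mathbf q = \hbar\mathbf K$. The dimensionless inelastic form factor for $i \to f$
inelastic transitions between states $\phi_{i,f}$ of atom A with $Z_A$ electrons is defined as

$$F_{if}(\mathbf K) = \left\langle\phi_f(\mathbf r)\left|\sum_{j=1}^{Z_A}\exp(i\mathbf K\cdot\mathbf r_j)\right|\phi_i(\mathbf r)\right\rangle$$

where the integration is over all electron positions denoted collectively by $\mathbf r \equiv \{\mathbf r_j\}$. The integrated cross section
is

$$\sigma_{if}(v_i) = \frac{8\pi a_0^2Z_P^2}{(v_i/v_0)^2}\int_{K_-a_0}^{K_+a_0}|F_{if}(K)|^2\,\frac{d(Ka_0)}{(Ka_0)^3}$$

where $v_0 = e^2/\hbar$ is the atomic unit (au) of velocity. The dimensionless momentum change $q/m_ev_0$ is $Ka_0$. In the
heavy-particle or high-energy limit, $K_+ \to \infty$ and

$$K_- \approx \frac{\Delta E_{fi}}{\hbar v_i}\left[1 + \frac{\Delta E_{fi}}{2M_{AB}v_i^2}\right]$$

where $\Delta E_{fi} = \epsilon_f - \epsilon_i$ is the energy lost by the projectile. Since

$$f_{if}^{B}(q) = f_{PA}^{C}(q)\,\delta_{if} + f_{Pe}^{C}(q)\,F_{if}(q)$$

can be expressed in terms of the individual two-body amplitudes $f^C$ for Coulomb elastic scattering between
particles of charges $z_1$ and $z_2$, the Born cross section for inelastic collisions can be written [9, 11, 28] in the
useful form

$$\sigma_{if}(v_i) = \frac{2\pi}{(m_ev_i)^2}\int_{q_-}^{q_+}P_{if}(q)\left(\frac{d\sigma_C}{d\Omega}\right)q\,dq$$

where $P_{if}(q) = |F_{if}(q)|^2$ is the transition probability for the impulsive transfer of momentum $q$ to atom
A and where

$$\left(\frac{d\sigma_C}{d\Omega}\right) = \frac{4m_e^2Z_P^2e^4}{q^4}$$

is the differential cross section for elastic (Coulomb) scattering with momentum $q$ transferred from the
projectile of charge $Z_Pe$ to one electron of atom A.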

B2.2.5.3 ATOM-ATOM COLLISIONS 

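The Born integral cross section for specific $(\alpha\beta) \to (\alpha'\beta')$ transitions in the collision

$$A(\alpha) + B(\beta) \to A(\alpha') + B(\beta')$$

in terms of the atomic form factors is

$$\sigma_{\alpha\beta}^{\alpha'\beta'}(v_i) = \frac{8\pi a_0^2}{(v_i/v_0)^2}\int_{K_-a_0}^{K_+a_0}\left|Z_A\delta_{\alpha\alpha'} - F_{\alpha\alpha'}^{A}(K)\right|^2\left|Z_B\delta_{\beta\beta'} - F_{\beta\beta'}^{B}(K)\right|^2\frac{d(Ka_0)}{(Ka_0)^3}.$$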
B2.2.5.4 QUANTAL AND CLASSICAL IMPULSE CROSS SECTIONS

In the impulse approximation [6, 9], the integral cross section for $(\alpha, \beta) \to (\alpha', \beta')$ transitions in A is




where $f_{eB}$ is the scattering amplitude for elastic $\beta = \beta'$ or inelastic $\beta \ne \beta'$ collisions between projectile B and
an orbital electron of A. For structureless ions B, the Coulomb $f_{eB}^{C}(q)$ for elastic electron-ion collisions
reproduces the Born approximation for B-A collisions. When Born amplitudes $f_{eB}^{B}(q)$ are used for fast atom
B-e collisions, then the Born approximation for atom-atom collisions is also recovered for general scattering
amplitudes $f_{eB}$. For slow atoms B, $f_{eB}$ is dominated by s-wave elastic scattering so that $f_{eB} = -a$ and $\sigma_{eB} = 4\pi a^2$,
where $a$ is the scattering length. Then




which is a good approximation for collisional transitions $n\ell \to n'\ell'$ in Rydberg atoms A. The full quantal
impulse cross section [6, 9] for general $f_{eB}$ has recently been presented in a valuable new form [28] which is
the appropriate representation for direct classical correspondence. The classical impulse cross section was
then defined [28] to yield the first general expression for the classical impulse cross section for $n\ell \to n'\ell'$ and
$n\ell \to \epsilon\ell'$ electronic transitions. The cross section satisfies the optical theorem and detailed balance. Direct
connection with the classical binary encounter approximation (BEA) was established and the derived $n\ell \to n'$
and $n\ell \to \epsilon$ cross sections reproduce the standard BEA cross sections.

B2.2.5.5 ATOMIC FORM FACTOR AND GENERALIZED OSCILLATOR STRENGTH 

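In terms of the form factor $F_{if}(K)$, the generalized oscillator strength is defined as

$$f_{if}(K) = \left(\frac{2m_e\,\Delta E_{fi}}{\hbar^2K^2}\right)|F_{if}(K)|^2 = \frac{2\Delta E_{fi}}{(e^2/a_0)}\,\frac{|F_{if}(K)|^2}{(Ka_0)^2}$$

which tends to the dipole oscillator strength in the $K \to 0$ limit.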
(A) SUM RULES

$$\sum_f f_{if}(K) = N$$

$$\sum_f|F_{if}(K)|^2 = N + \sum_{j\ne k}\left\langle\Psi_i\left|\exp[i\mathbf K\cdot(\mathbf r_j - \mathbf r_k)]\right|\Psi_i\right\rangle$$

where $N$ is the number of electrons. The summation $\sum_f$ extends over all discrete and continuum states.


(B) ENERGY-CHANGE MOMENTS 

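The energy-change moments are defined (with energies in atomic units) as

$$S(m, K) = \sum_f(2\Delta E_{fi})^m f_{if}(K) = \sum_f(2\Delta E_{fi})^{m+1}|F_{if}(K)|^2(Ka_0)^{-2}.$$

The exact energy-change moments for H(1s) are

$$S(-1, K) = \left\{1 - \left[1 + \tfrac14(Ka_0)^2\right]^{-4}\right\}(Ka_0)^{-2}$$
$$S(0, K) = 1$$
$$S(1, K) = (Ka_0)^2 + \tfrac43$$
$$S(2, K) = (Ka_0)^4 + 4(Ka_0)^2 + \tfrac{16}{3}.$$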

B2.2.5.6 FORM FACTORS FOR ATOMIC HYDROGEN 


The probability of a transition $i \to f$ resulting from any external perturbation which impulsively transfers
momentum $\mathbf q$ to the internal momenta of the electrons of the target system is

$$P_{if}(\mathbf q) = |F_{if}(\mathbf q)|^2.$$

The impulse can be due to sudden collision with particles or to exposure to electromagnetic radiation. The
physical significance of the form factor is that $P_{if}$ is the impulsive transition probability for any atom. For $n\ell \to n'\ell'$
transitions in atomic hydrogen,

$$P_{n\ell,n'\ell'}(q) = \sum_{m,m'}|F_{n\ell m,n'\ell'm'}(\mathbf q)|^2$$

with $\psi_{n\ell m}(\mathbf r) = R_{n\ell}(r)Y_{\ell m}(\hat r)$, can be decomposed as

$$P_{n\ell,n'\ell'}(q) = (2\ell + 1)(2\ell' + 1)\sum_L(2L + 1)\begin{pmatrix}\ell & \ell' & L\\ 0 & 0 & 0\end{pmatrix}^2\left[f_{n\ell,n'\ell'}^{(L)}(q)\right]^2$$

where $(\ldots)$ is the Wigner 3j-symbol and $f_{n\ell,n'\ell'}^{(L)}(q)$ is the radial integral

$$f_{n\ell,n'\ell'}^{(L)}(q) = \int_0^\infty R_{n\ell}(r)R_{n'\ell'}(r)\,j_L(qr)\,r^2\,dr$$

where $j_L$ is the spherical Bessel function. For $n\ell m \to n'\ell'm'$ subshell transitions, the amplitude decomposes as
where $M = m - m'$ and where the coefficients




, f (2/ * l)(2r * mil \ 1) V J L I V \\L \ V 


Exact algebraic expressions for the probability

$$P_{n,n'}(q) = \sum_{\ell,m}\sum_{\ell',m'}|F_{n\ell m,n'\ell'm'}(\mathbf q)|^2$$

of $n \to n'$ transitions in atomic hydrogen have been recently derived [11] as analytical functions of $n$ and $n'$.
B2.2.5.7 ROTATIONAL EXCITATION 

For ion-point dipole D interactions, only $\Delta J = \pm1$ transitions are allowed. For ion-point quadrupole Q
interactions only $\Delta J = 0, \pm2$ transitions are allowed. The Born differential cross sections for $J \to J'$ transitions
are
^- y + 1) = 3*7(27^
d JL v 4M2/- l)<2J+3} L 


which are all spherically symmetric. The sum


__ t | ii> 4 


dJfc/ 


is independent of the initial value of J. The integral cross sections 


ff (d) 


— i(^M!^)° ! 


15 ^ {2J~ l){2J + 3) 

all satisfy the detailed balance relation

$$k_i^2(2J_i + 1)\,\sigma(J_i \to J_f) = k_f^2(2J_f + 1)\,\sigma(J_f \to J_i).$$

The summed diffusion cross sections are 


t"» 




B2.2.5.8 LIST OF BORN CROSS SECTIONS FOR MODEL POTENTIALS 


$$k^2 = (2M_{AB}/\hbar^2)E, \qquad K = 2k\sin\tfrac12\theta$$

$$U = (2M_{AB}/\hbar^2)V, \qquad U/k^2 = V/E.$$

For a symmetric potential, the scattering amplitude is

$$f_B(K) = -\int_0^\infty U(R)\,\frac{\sin KR}{KR}\,R^2\,dR.$$

The Born integral cross section is

$$\sigma_B(E) = \frac{2\pi}{k^2}\int_0^{2k}|f_B(K)|^2K\,dK$$

which is independent of the sign of the potential $V$.
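(A) EXPONENTIAL

$$V(R) = V_0\exp(-\alpha R)$$

$$f_B(K) = -\frac{2\alpha U_0}{(\alpha^2 + K^2)^2}$$

$$\sigma_B(E) = \frac{16\pi U_0^2}{3\alpha^4}\,\frac{3\alpha^4 + 12\alpha^2k^2 + 16k^4}{(\alpha^2 + 4k^2)^3}$$
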

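(B) GAUSSIAN

$$V(R) = V_0\exp(-\alpha^2R^2)$$

$$f_B(K) = -\frac{\sqrt\pi\,U_0}{4\alpha^3}\exp(-K^2/4\alpha^2)$$

$$\sigma_B(E) = \frac{\pi^2U_0^2}{8\alpha^4k^2}\left[1 - \exp(-2k^2/\alpha^2)\right]$$
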

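(C) SPHERICAL WELL/BARRIER

$$V(R) = V_0 \text{ for } R < a, \qquad V(R) = 0 \text{ for } R > a, \qquad U_0 = (2M_{AB}/\hbar^2)V_0$$

$$f_B(K) = -\frac{U_0}{K^3}\left[\sin Ka - Ka\cos Ka\right]$$

$$\sigma_B(E) = \frac{\pi}{2k^2}(U_0a^2)^2\left[1 - (2ka)^{-2} + (2ka)^{-3}\sin 4ka - (2ka)^{-4}\sin^22ka\right].$$

At low energies, $f_B \to -(2M_{AB}/\hbar^2)V_0a^3/3$ and the scattering is isotropic. At high energies, $\sigma_B(E) \sim E^{-1}$.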
(D) SCREENED COULOMB INTERACTION

$$V(R) = V_0\exp(-\alpha R)/R$$

$$f_B(K) = -\frac{U_0}{\alpha^2 + K^2}$$

$$\sigma_B(E) = \frac{4\pi U_0^2}{\alpha^2(\alpha^2 + 4k^2)}$$

where $U_0 = 2Z/a_0$. At low energies, $f_B = -U_0/\alpha^2$ is isotropic. At high energies, $\sigma_B \to \pi(V_0/E)(U_0/\alpha^2)$.
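(E) ELECTRON-ATOM MODEL STATIC INTERACTION

$$V(R) = -N(e^2/a_0)\left[Z + (a_0/R)\right]\exp(-2ZR/a_0)$$

$$f_B(K) = \frac{2N}{a_0}\,\frac{2\alpha^2 + K^2}{(\alpha^2 + K^2)^2}, \qquad \alpha = 2Z/a_0$$

$$\sigma_B(E) = \frac{\pi a_0^2N^2}{3Z^2}\,\frac{12Z^4 + 18Z^2k^2a_0^2 + 7k^4a_0^4}{(Z^2 + k^2a_0^2)^3}$$

For atomic H(1s), $N = 1$ and $Z = 1$. For He(1s$^2$), the approximate parameters are $N = 2$ and $Z = 27/16$.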

(F) POLARIZATION POTENTIAL

$$V(R) = V_0/(R^2 + R_0^2)^2$$

$$f_B(K) = -\left(\frac{\pi U_0}{4R_0}\right)\exp(-KR_0)$$

$$\sigma_B(E) = \left(\frac{\pi^3U_0^2}{32k^2R_0^4}\right)\left[1 - (1 + 4kR_0)\exp(-4kR_0)\right].$$
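The closed forms in this list can be checked against direct quadrature of the two defining integrals. The sketch below does this for the exponential and screened Coulomb potentials only. It is an illustration under stated assumptions: the helper names, the midpoint rule, the truncation radius and the sample parameters ($U_0 = \alpha = k = 1$ in arbitrary consistent units) are hypothetical choices.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule with n panels (illustrative helper)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def born_amplitude(U, K, r_max=60.0, n=60000):
    """f_B(K) = -int_0^inf U(R) [sin(KR)/(KR)] R^2 dR, truncated at r_max."""
    return -midpoint(lambda R: U(R) * math.sin(K * R) * R / K, 0.0, r_max, n)

def born_cross_section(f_B, k, n=20000):
    """sigma_B = (2 pi / k^2) int_0^{2k} |f_B(K)|^2 K dK."""
    return 2.0 * math.pi / k**2 * midpoint(lambda K: abs(f_B(K)) ** 2 * K, 0.0, 2.0 * k, n)

U0, ALPHA, K_WAVE = 1.0, 1.0, 1.0  # sample reduced strength, range parameter, wavenumber

def f_exponential(K):
    """Closed-form Born amplitude for U0 exp(-alpha R)."""
    return -2.0 * ALPHA * U0 / (ALPHA**2 + K**2) ** 2

def f_screened_coulomb(K):
    """Closed-form Born amplitude for U0 exp(-alpha R) / R."""
    return -U0 / (ALPHA**2 + K**2)

sigma_exponential = (16.0 * math.pi * U0**2 / (3.0 * ALPHA**4)
                     * (3.0 * ALPHA**4 + 12.0 * ALPHA**2 * K_WAVE**2 + 16.0 * K_WAVE**4)
                     / (ALPHA**2 + 4.0 * K_WAVE**2) ** 3)
sigma_screened = 4.0 * math.pi * U0**2 / (ALPHA**2 * (ALPHA**2 + 4.0 * K_WAVE**2))
```

Calling `born_cross_section(f_exponential, K_WAVE)` or `born_cross_section(f_screened_coulomb, K_WAVE)` reproduces `sigma_exponential` and `sigma_screened` to quadrature accuracy, and `born_amplitude` applied to the bare potentials reproduces the closed-form amplitudes.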


B2.2.6 QUANTAL POTENTIAL SCATTERING 

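The Schrödinger equation

$$\left(-\frac{\hbar^2}{2M_{AB}}\nabla^2 + V(\mathbf r)\right)\Psi_{\mathbf k}^+(\mathbf r) = E\,\Psi_{\mathbf k}^+(\mathbf r)$$

solved subject to the asymptotic condition

$$\Psi_{\mathbf k}^+(\mathbf r) \sim \exp(i\mathbf k\cdot\mathbf r) + \frac1r f(\theta, \varphi)\exp(ikr)$$

for outgoing spherical waves is equivalent to the solution of the Lippmann-Schwinger integral equation

$$\Psi_{\mathbf k}^+(\mathbf r) = \Phi_{\mathbf k}(\mathbf r) + \int G_0^+(\mathbf r, \mathbf r')\,U(\mathbf r')\,\Psi_{\mathbf k}^+(\mathbf r')\,d\mathbf r'$$

where the outgoing Green's function for a free particle is

$$G_0^+(\mathbf r, \mathbf r') = -\frac{1}{4\pi}\,\frac{\exp(ik|\mathbf r - \mathbf r'|)}{|\mathbf r - \mathbf r'|}.$$

The scattering amplitude may then be determined from the asymptotic form of $\Psi_{\mathbf k}^+(\mathbf r)$ directly or
from the integral representation

$$f(\theta, \varphi) = -\frac{1}{4\pi}(2M_{AB}/\hbar^2)\langle\exp(i\mathbf k_f\cdot\mathbf r)|V(\mathbf r)|\Psi_{\mathbf k}^+\rangle.$$

The differential cross section for elastic scattering is

$$\frac{d\sigma}{d\Omega} = |f(\theta, \varphi)|^2.$$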

B2.2.6.1 PARTIAL WAVE EXPANSION 


A plane wave of unit amplitude can be decomposed according to

$$\Phi_{\mathbf k}(\mathbf r) = \exp(i\mathbf k\cdot\mathbf r) = 4\pi\sum_{\ell,m}i^\ell j_\ell(kr)\,Y_{\ell m}^*(\hat k)\,Y_{\ell m}(\hat r)$$

where $j_\ell$ is the spherical Bessel function which varies asymptotically as

$$j_\ell(kr) \sim \frac{1}{kr}\sin\left(kr - \tfrac12\ell\pi\right).$$

The addition theorem for spherical harmonics is

$$\sum_mY_{\ell m}^*(\hat k)\,Y_{\ell m}(\hat r) = \frac{2\ell + 1}{4\pi}P_\ell(\hat k\cdot\hat r).$$

Another useful identity is

$$\frac{\sin Kr}{Kr} = 4\pi\sum_{\ell,m}j_\ell(k_ir)\,j_\ell(k_fr)\,Y_{\ell m}^*(\hat k_i)\,Y_{\ell m}(\hat k_f)$$

where $K^2 = k_i^2 + k_f^2 - 2k_ik_f\cos\theta$. The system wavefunction $\Psi_{\mathbf k}^+(\mathbf r)$ with amplitude $N$ is expanded
according to

$$\Psi_{\mathbf k}^+(\mathbf r) = \frac{4\pi N}{kr}\sum_{\ell,m}i^\ell F_\ell(r)\,Y_{\ell m}^*(\hat k)\,Y_{\ell m}(\hat r).$$

The radial wave is the solution of the radial Schrödinger equation

$$\frac{d^2F_\ell}{dr^2} + \left[k^2 - U(r) - \frac{\ell(\ell + 1)}{r^2}\right]F_\ell(r) = 0.$$

The reduced potential and energy are $U(r) = (2M_{AB}/\hbar^2)V(r)$ and $k^2 = (2M_{AB}/\hbar^2)E$, respectively. They both
have dimensions of $[L^{-2}]$. Also $(ka_0)^2 = (2E/\varepsilon_0)(M_{AB}/m_e)$, with $\varepsilon_0 = e^2/a_0$. Each $\ell$-partial wave is separately scattered since
the angular momentum of relative motion is conserved for central forces. The radial wave $F_\ell(r)$ varies
asymptotically as

$$F_\ell(r) \sim e^{i\eta_\ell}\sin\left(kr - \tfrac12\ell\pi + \eta_\ell\right).$$

The amplitude of the partial radial wave is $N_\ell = 4\pi N/k$. In table B2.2.1 are displayed the amplitudes $N$
and $N_\ell$ appropriate to various choices for normalization of the continuum wavefunctions $\Phi_{\mathbf p}(\mathbf r)$.
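Table B2.2.1 Continuum wavefunction normalization, density of states and cross section factors.

Columns: Type; $\langle\Phi_{\mathbf p}|\Phi_{\mathbf p'}\rangle$; $N$; $N_\ell$; $\rho(\mathbf E)$; $\gamma_{if}$

Unit amplitude; $(2\pi)^3\delta(\mathbf k - \mathbf k')$; $1$; $4\pi/k$; $mp/h^3$; $(v_f/v_i)(m_f/2\pi\hbar^2)^2$

Wavenumber; $\delta(\mathbf k - \mathbf k')$; $(2\pi)^{-3/2}$; $(2/\pi)^{1/2}/k$; $mp/\hbar^3$; $(v_f/v_i)(4\pi^2m_f/\hbar^2)^2$

Momentum; $\delta(\mathbf p - \mathbf p')$; $(2\pi\hbar)^{-3/2}$; $(2/\pi)^{1/2}\hbar^{-3/2}/k$; $mp$; $(v_f/v_i)(4\pi^2\hbar m_f)^2$

Directed energy; $\delta(\mathbf E - \mathbf E')$; $(mp/h^3)^{1/2}$; $(2m/\pi\hbar^2k)^{1/2}$; $1$; $(2\pi)^4/k_i^2$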


B2.2.6.2 SCATTERING AMPLITUDES 


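For symmetric interactions $V = V(r)$, the wavefunctions $\Psi_{\mathbf k}^+(\mathbf r)$ and $\exp(i\mathbf k_f\cdot\mathbf r)$ are
decomposed into partial waves. From their asymptotic forms, the following partial wave expansions for the
scattering amplitude

$$f(\theta) = \frac{1}{2ik}\sum_{\ell=0}^\infty(2\ell + 1)[S_\ell - 1]P_\ell(\cos\theta) = \frac{1}{2ik}\sum_{\ell=0}^\infty(2\ell + 1)T_\ell P_\ell(\cos\theta) = \frac1k\sum_{\ell=0}^\infty(2\ell + 1)e^{i\eta_\ell}\sin\eta_\ell\,P_\ell(\cos\theta)$$

can be deduced. The scattering, transition and reactance matrix elements are defined, in terms of the phase
shift $\eta_\ell$ suffered by each partial wave, as

$$S_\ell(k) = \exp(2i\eta_\ell)$$
$$T_\ell(k) = 2i\sin\eta_\ell\exp(i\eta_\ell)$$
$$K_\ell(k) = \tan\eta_\ell.$$

The asymptotic ($kr \to \infty$) form of $F_\ell$ may then be written in terms of the following linear combinations:

$$F_\ell(kr) \sim \sin(kr - \ell\pi/2) + \left(\frac{1}{2i}\right)T_\ell\,e^{i(kr - \ell\pi/2)}$$
$$= \frac{i}{2}\left[e^{-i(kr - \ell\pi/2)} - S_\ell\,e^{i(kr - \ell\pi/2)}\right]$$
$$= e^{i\eta_\ell}\cos\eta_\ell\left[\sin(kr - \ell\pi/2) + K_\ell\cos(kr - \ell\pi/2)\right]$$

expressed as combinations of a standing wave and an outgoing spherical wave, of incoming (-) and outgoing (+)
spherical waves (exponential functions) and of standing waves (trigonometric functions). The physical
significance of the admixture coefficients $T_\ell$, $S_\ell$ and $K_\ell$ is then transparent. The elements are connected by

$$S_\ell = 1 + T_\ell = (1 + iK_\ell)/(1 - iK_\ell).$$

$K_\ell$ is real while both $S_\ell$ and $T_\ell$ are complex. In terms of the full solutions of the radial Schrödinger equation,
the T-matrix element for elastic scattering is

$$T_\ell = -\frac{2i}{k}\int_0^\infty F_\ell^{(0)}(r)\,U(r)\,F_\ell(r)\,dr$$

where $F_\ell^{(0)}(r) = kr\,j_\ell(kr)$ is the radial component of the final plane wave. The Born approximation to $T_\ell$ is
obtained upon the substitution $F_\ell = F_\ell^{(0)}$.

B2.2.6.3 INTEGRAL CROSS SECTIONS

$$\sigma(E) = \frac{4\pi}{k^2}\sum_{\ell=0}^\infty(2\ell + 1)\sin^2\eta_\ell = \frac{\pi}{k^2}\sum_{\ell=0}^\infty(2\ell + 1)|T_\ell|^2 = \frac{2\pi}{k^2}\sum_{\ell=0}^\infty(2\ell + 1)\left[1 - \mathrm{Re}\,S_\ell\right].$$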
The semiclassical version is obtained by the substitution $M_{AB}vb = (\ell + \tfrac12)\hbar$ so that $k^2b^2 = (\ell + \tfrac12)^2$ in terms of the
impact parameter $b$. Regarding $\ell$ as a continuous variable,

$$\sigma(E) = \frac{\pi}{k^2}\int_0^\infty(2\ell + 1)|T_\ell|^2\,d\ell = 2\pi\int_0^\infty|T(b)|^2\,b\,db.$$

The transition matrix $|T(b)|^2$ is therefore the probability of scattering particles with impact parameter $b$.
B2.2.6.4 DIFFERENTIAL CROSS SECTIONS 

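The differential cross section for elastic scattering is

$$\frac{d\sigma}{d\Omega} = |f(\theta)|^2 = A(\theta)^2 + B(\theta)^2$$

where the real and imaginary parts of $f(\theta)$ are, respectively,

$$A(\theta) = \frac{1}{2k}\sum_{\ell=0}^\infty(2\ell + 1)\sin2\eta_\ell\,P_\ell(\cos\theta)$$

$$B(\theta) = \frac{1}{2k}\sum_{\ell=0}^\infty(2\ell + 1)\left[1 - \cos2\eta_\ell\right]P_\ell(\cos\theta).$$

Their individual contributions to the integral cross sections are

$$\int A(\theta)^2\,d\Omega = \frac{4\pi}{k^2}\sum_{\ell=0}^\infty(2\ell + 1)\sin^2\eta_\ell\cos^2\eta_\ell$$

$$\int B(\theta)^2\,d\Omega = \frac{4\pi}{k^2}\sum_{\ell=0}^\infty(2\ell + 1)\sin^4\eta_\ell.$$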

(A) EXPANSION IN LEGENDRE POLYNOMIALS 


When expanded as a series of Legendre polynomials $P_L(\cos\theta)$, the differential cross section has the following
form

$$\frac{d\sigma}{d\Omega} = \frac{1}{k^2}\sum_{L=0}^\infty a_L(E)\,P_L(\cos\theta)$$

where the coefficients

$$a_L = \sum_{\ell=0}^\infty\sum_{\ell'=|\ell - L|}^{\ell + L}(2\ell + 1)(2\ell' + 1)(\ell\ell'00|\ell\ell'L0)^2\sin\eta_\ell\sin\eta_{\ell'}\cos(\eta_\ell - \eta_{\ell'})$$

are determined by the phase shifts $\eta_\ell$ and the Clebsch-Gordan coefficients $(\ell\ell'mm'|\ell\ell'LM)$.
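(B) EXAMPLE: THREE-TERM EXPANSION IN COS θ
The differential cross section can be expanded as

$$\frac{d\sigma(E, \theta)}{d\Omega} = \frac{1}{k^2}\left[\left(a_0 - \tfrac12a_2\right) + a_1\cos\theta + \tfrac32a_2\cos^2\theta\right].$$

The coefficients are

$$a_0 = \sum_{\ell=0}^\infty(2\ell + 1)\sin^2\eta_\ell$$

$$a_1 = 6\sum_{\ell=0}^\infty(\ell + 1)\sin\eta_\ell\sin\eta_{\ell+1}\cos(\eta_{\ell+1} - \eta_\ell)$$

$$a_2 = 5\sum_{\ell=0}^\infty\left[b_\ell\sin^2\eta_\ell + c_\ell\sin\eta_\ell\sin\eta_{\ell+2}\cos(\eta_{\ell+2} - \eta_\ell)\right]$$

where

$$b_\ell = \frac{\ell(\ell + 1)(2\ell + 1)}{(2\ell - 1)(2\ell + 3)}$$

$$c_\ell = \frac{3(\ell + 1)(\ell + 2)}{2\ell + 3}.$$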

(C) EXAMPLE: S- AND P-WAVE CONTRIBUTIONS

The combined S-, P-wave ($\ell = 0, 1$) contributions to the differential and integral cross sections are

$$\frac{d\sigma}{d\Omega} = \frac{1}{k^2}\left[\sin^2\eta_0 + \left[6\sin\eta_0\sin\eta_1\cos(\eta_1 - \eta_0)\right]\cos\theta + 9\sin^2\eta_1\cos^2\theta\right]$$

$$\sigma(E) = \frac{4\pi}{k^2}\left[\sin^2\eta_0 + 3\sin^2\eta_1\right].$$

For pure S-wave scattering, the differential cross section (DCS) is isotropic. For pure P-wave scattering, the
DCS is symmetric about $\theta = \pi/2$, where it vanishes; the DCS rises to equal maxima at $\theta = 0, \pi$. For combined
S- and P-wave scattering, the DCS is asymmetric with forward-backward asymmetry.
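The S- and P-wave expressions can be checked against each other by integrating the differential cross section over solid angle. The following sketch is illustrative only; the function names, the midpoint quadrature and the sample phase shifts are assumptions.

```python
import math

def dcs_sp(theta, k, eta0, eta1):
    """Combined S- and P-wave differential cross section at scattering angle theta."""
    c = math.cos(theta)
    s0, s1 = math.sin(eta0), math.sin(eta1)
    return (s0 * s0 + 6.0 * s0 * s1 * math.cos(eta1 - eta0) * c + 9.0 * s1 * s1 * c * c) / k**2

def sigma_sp(k, eta0, eta1):
    """Integral cross section (4 pi / k^2) [sin^2 eta0 + 3 sin^2 eta1]."""
    return 4.0 * math.pi / k**2 * (math.sin(eta0) ** 2 + 3.0 * math.sin(eta1) ** 2)

def integrate_dcs(k, eta0, eta1, n=20000):
    """2 pi * int_0^pi DCS(theta) sin(theta) d(theta) by the midpoint rule."""
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        total += dcs_sp(theta, k, eta0, eta1) * math.sin(theta)
    return 2.0 * math.pi * h * total
```

The interference term proportional to $\cos\theta$ integrates to zero, so only the S- and P-wave partial cross sections survive in the integral cross section; the forward-backward asymmetry is the difference `dcs_sp(0, ...) - dcs_sp(pi, ...)`.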

B2.2.6.5 OPTICAL THEOREM 

The optical theorem relates the integral cross section to the imaginary part of the forward scattering amplitude 
by 

$$\sigma(E) = (4\pi/k)\,\mathrm{Im}\,f(0).$$

This relation is a direct consequence of the conservation of flux. The target casts a shadow in the forward 
direction where the intensity of the incident beam becomes reduced by just that amount which appears in the 
scattered wave. This decrease in intensity or shadow results from interference between the incident wave and 
the scattered wave in the forward direction. Figure B2.2.2 for the density $|\Psi_{\mathbf k}^+(\mathbf r)|^2$ of section B2.2.6 illustrates
how this interference tends to illuminate the shadow region at the right-hand side of the target. Flux
conservation also implies that the phase shifts $\eta_\ell$ are always real. Thus

$$|S_\ell|^2 = 1, \qquad |T_\ell|^2 = -2\,\mathrm{Re}\,T_\ell.$$
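The optical theorem and the unitarity relations can be verified for any set of real phase shifts. The sketch below is a minimal illustration with hypothetical function names and arbitrary sample phase shifts; it uses $P_\ell(1) = 1$ for the forward amplitude and the definitions $S_\ell = \exp(2i\eta_\ell)$ and $T_\ell = 2i\sin\eta_\ell\exp(i\eta_\ell)$ from section B2.2.6.2.

```python
import cmath
import math

def forward_amplitude(k, etas):
    """f(0) = (1/k) sum_l (2l+1) exp(i eta_l) sin(eta_l), since P_l(1) = 1."""
    return sum((2 * l + 1) * cmath.exp(1j * eta) * math.sin(eta)
               for l, eta in enumerate(etas)) / k

def integral_cross_section(k, etas):
    """sigma = (4 pi / k^2) sum_l (2l+1) sin^2(eta_l)."""
    return 4.0 * math.pi / k**2 * sum((2 * l + 1) * math.sin(eta) ** 2
                                      for l, eta in enumerate(etas))

def s_matrix(eta):
    """S_l = exp(2 i eta_l)."""
    return cmath.exp(2j * eta)

def t_matrix(eta):
    """T_l = 2 i sin(eta_l) exp(i eta_l) = S_l - 1."""
    return 2j * math.sin(eta) * cmath.exp(1j * eta)
```

With `etas = [0.9, 0.4, 0.1, 0.02]` and `k = 0.7`, the value of `integral_cross_section` equals `4 * pi / k` times the imaginary part of `forward_amplitude`, as the theorem requires.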

B2.2.6.6 LEVINSON'S THEOREM 

For a local potential $V(r)$ which supports $n_\ell$ bound states of angular momentum $\ell$ and energy $E_n < 0$, the phase
shift $\lim_{k\to0}\eta_\ell(k)$ tends in the limit of zero collision energy to $n_\ell\pi$. When the well becomes deep enough
so as to introduce an additional bound level $E_{n+1} = 0$ at zero energy, then $\lim_{k\to0}\eta_0(k) = (n_0 + \tfrac12)\pi$.



Figure B2.2.2. Scattering of an incident plane wave. 
B2.2.6.7 PARTIAL WAVE EXPANSION FOR TRANSPORT CROSS SECTIONS

The transport cross sections 

■I *+\ 


r l h- f_iy"|~ l f^ 1 do- 


for n = 1—4 have the following phase shift expansions 




,„,_, 4it^ V+l) [ tt + 2W + Q .._ x , t , 3<f+2£-l) . ; , ,1 

" ik)= eWhw < mt ■ 7)1 (2M3j Rin <"<-"-> + 


(2f-l) 


The momentum-transfer or diffusion cross section is a^ and the viscosity cross section is ig^K 


B2.2.6.8 BORN PHASE SHIFTS 


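For a symmetric interaction, the Born amplitude is

$$f_B(K) = -\int_0^\infty U(R)\,\frac{\sin KR}{KR}\,R^2\,dR$$

where $U(R) = (2M_{AB}/\hbar^2)V(R)$. Comparison with the partial wave expansion for $f_B(K)$ and

$$\frac{\sin KR}{KR} = \sum_{\ell=0}^\infty(2\ell + 1)\left[j_\ell(kR)\right]^2P_\ell(\cos\theta)$$

provides the Born phase shift

$$\tan\eta_\ell^B = -k\int_0^\infty U(R)\left[j_\ell(kR)\right]^2R^2\,dR.$$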
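(A) EXAMPLES OF THE BORN S-WAVE PHASE SHIFT

$$\tan\eta_0^B = -\frac1k\int_0^\infty U(R)\sin^2(kR)\,dR.$$

For the potential $U = U_0\,e^{-\alpha R}/R$,

$$\tan\eta_0^B = -\frac{U_0}{4k}\ln\left[1 + 4k^2/\alpha^2\right].$$

For the potential $U = U_0/(R^2 + R_0^2)^2$,

$$\tan\eta_0^B = -\frac{\pi U_0}{8kR_0^3}\left[1 - (1 + 2kR_0)\exp(-2kR_0)\right].$$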
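The closed form for the screened Coulomb potential can be compared with a direct quadrature of the Born S-wave integral. This is a sketch under stated assumptions: the function names, the truncation radius and the sample parameters are hypothetical, and the midpoint rule is used so that the integrand is never evaluated at $R = 0$.

```python
import math

def born_s_wave_tan(U, k, r_max=60.0, n=60000):
    """tan(eta_0^B) = -(1/k) int_0^inf U(R) sin^2(kR) dR, truncated at r_max."""
    h = r_max / n
    total = 0.0
    for i in range(n):
        R = (i + 0.5) * h
        total += U(R) * math.sin(k * R) ** 2
    return -total * h / k

def born_s_wave_tan_screened(U0, alpha, k):
    """Closed form -(U0 / 4k) ln(1 + 4 k^2 / alpha^2) for U = U0 exp(-alpha R) / R."""
    return -U0 / (4.0 * k) * math.log(1.0 + 4.0 * k * k / alpha**2)
```

A repulsive potential ($U_0 > 0$) gives a negative Born phase shift and an attractive one a positive phase shift, which both functions reproduce.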

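(B) BORN PHASE SHIFTS (LARGE ℓ)

For $\ell \gg ka$,

$$\tan\eta_\ell^B = -\frac{k^{2\ell+1}}{[(2\ell + 1)!!]^2}\int_0^aU(R)\,R^{2\ell+2}\,dR$$

valid only for finite-range interactions $U(R > a) = 0$. If $U = -U_0$ for $R < a$ and $U = 0$ for $R > a$, then

$$\tan\eta_\ell^B = U_0a^2\,\frac{(ka)^{2\ell+1}}{[(2\ell + 1)!!]^2(2\ell + 3)}.$$

The ratio $\eta_{\ell+1}/\eta_\ell \sim (ka/2\ell)^2$.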


B2.2.6.9 COULOMB SCATTERING 


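For elastic scattering by the interaction $V(r) = Z_AZ_Be^2/r$, the Coulomb wave can be decomposed as

$$\Psi_C^+(\mathbf r) = \frac{4\pi}{kr}\sum_{\ell,m}i^\ell\exp(i\eta_\ell^C)\,F_\ell(kr)\,Y_{\ell m}^*(\hat k)\,Y_{\ell m}(\hat r)$$

where the radial wave varies asymptotically as

$$F_\ell \sim \sin\left(kr - \tfrac12\ell\pi + \eta_\ell^C - \beta\ln2kr\right)$$

where the parameter $\beta$ is $Z_AZ_Be^2/\hbar v$. The Coulomb phase shift is

$$\eta_\ell^C = \arg\Gamma(\ell + 1 + i\beta) = \mathrm{Im}\,\ln\Gamma(\ell + 1 + i\beta)$$

to give the Coulomb S-matrix element

$$S_\ell^C = \exp(2i\eta_\ell^C) = \frac{\Gamma(\ell + 1 + i\beta)}{\Gamma(\ell + 1 - i\beta)}.$$

The Coulomb scattering amplitude is

$$f_C(\theta) = -\frac{\beta\exp\left[2i\eta_0^C - i\beta\ln(\sin^2\tfrac12\theta)\right]}{2k\sin^2\tfrac12\theta}.$$

The Coulomb differential cross section $|f_C|^2$ is

$$\frac{d\sigma_C}{d\Omega} = \frac{\beta^2}{4k^2\sin^4\tfrac12\theta} = \frac{(Z_AZ_Be^2)^2}{16E^2\sin^4\tfrac12\theta}.$$

This is the Rutherford scattering cross section. It is interesting to note that Born and classical theory also
reproduce this cross section. Moreover,

$$\frac{d\sigma_C}{d\Omega} = \frac{4M_{AB}^2(Z_AZ_Be^2)^2}{q^4}$$

is a function only of the momentum transferred $q = \hbar K = 2\hbar k\sin\tfrac12\theta$ in the collision. Note that $q^2 = 8M_{AB}E\sin^2\tfrac12\theta$.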
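The equivalence of the three forms of the Rutherford cross section is simple to confirm numerically. The sketch below works in atomic units ($\hbar = e = 1$); the function name, the argument `z` for the charge product $Z_AZ_B$ and the sample inputs are illustrative assumptions.

```python
import math

def rutherford_forms(M, v, z, theta):
    """Return the Rutherford cross section computed three ways, plus q^2 and 8 M E sin^2(theta/2).

    M: reduced mass, v: relative speed, z: product Z_A Z_B, theta: scattering angle
    (atomic units, hbar = e = 1).
    """
    beta = z / v                      # Coulomb parameter Z_A Z_B e^2 / (hbar v)
    k = M * v                         # wavenumber
    E = 0.5 * M * v * v               # relative energy
    s2 = math.sin(0.5 * theta) ** 2
    q = 2.0 * k * math.sin(0.5 * theta)
    from_beta = beta**2 / (4.0 * k**2 * s2**2)
    from_energy = (z / (4.0 * E)) ** 2 / s2**2
    from_q = 4.0 * M**2 * z**2 / q**4
    return from_beta, from_energy, from_q, q * q, 8.0 * M * E * s2
```

For any physical input the first three returned values coincide, and the last two confirm $q^2 = 8M_{AB}E\sin^2\tfrac12\theta$.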


B2.2.7 COLLISIONS BETWEEN IDENTICAL PARTICLES 

The identical colliding particles, each with spin $s$, are in a resolved state with total spin $S_t$ in the range ($0 \to 2s$).
The spatial wavefunction with respect to particle interchange satisfies $\Psi(\mathbf R) = (-1)^{S_t}\Psi(-\mathbf R)$.
Wavefunctions for identical particles with even or odd total spin $S_t$ are therefore symmetric (S) or
antisymmetric (A) with respect to particle
interchange. The appropriate combinations are $\Psi_{S,A}(\mathbf R) = \Psi(\mathbf R) \pm \Psi(-\mathbf R)$, where the positive sign
(symmetric wavefunction S) and the negative sign (antisymmetric wavefunction A) are associated with even
and odd values of the total spin $S_t$, respectively. The scattering wavefunction for a pair of identical particles in
spatially symmetric (+) or antisymmetric (-) states behaves asymptotically as

$$\Psi_{S,A}(\mathbf R) \to \left[\exp(i\mathbf k\cdot\mathbf R) \pm \exp(-i\mathbf k\cdot\mathbf R)\right] + \left[f(\theta, \varphi) \pm f(\pi - \theta, \varphi + \pi)\right]\frac{\exp(ikR)}{R}.$$

The differential cross section for scattering of both the projectile and target particles into direction $\theta$ is

$$\left(\frac{d\sigma}{d\Omega}\right)_{S,A} = \left|f(\theta, \varphi) \pm f(\pi - \theta, \varphi + \pi)\right|^2$$

in the CM-frame where scattering of the projectile into polar direction $(\pi - \theta, \varphi + \pi)$ is accompanied by
scattering of the identical target particle into direction $(\theta, \varphi)$. This is related to the probability that both
identical particles are scattered into $\theta$. In the classical limit, where the particles are distinguishable, the
classical cross section is

$$\left(\frac{d\sigma}{d\Omega}\right)_C = |f(\theta, \varphi)|^2 + |f(\pi - \theta, \varphi + \pi)|^2$$

the sum of the cross sections for observation of the projectile and target particles in the direction $(\theta, \varphi)$. Since
$P_\ell(\cos(\pi - \theta)) = (-1)^\ell P_\ell(\cos\theta)$, the differential cross section for $\varphi$-independent amplitudes $f$ is then

$$\left(\frac{d\sigma}{d\Omega}\right)_{S,A} = \left|\frac{1}{2ik}\sum_{\ell=0}^\infty\omega_\ell(2\ell + 1)\left[\exp(2i\eta_\ell) - 1\right]P_\ell(\cos\theta)\right|^2.$$

For scattering in the symmetric (S) channel where $S_t$ is even, $\omega_\ell = 2$ for $\ell$ even and $\omega_\ell = 0$ for $\ell$ odd. For
scattering in the antisymmetric channel where $S_t$ is odd, $\omega_\ell = 0$ for $\ell$ even and $\omega_\ell = 2$ for $\ell$ odd. The integral
cross section is

$$\sigma_{S,A}(E) = \frac{4\pi}{k^2}\sum_{\ell=0}^\infty\omega_\ell(2\ell + 1)\sin^2\eta_\ell.$$

Let $g_A$ and $g_S$ be the fractions of states with odd and even total spins $S_t = 0, 1, 2, \ldots, 2s$. When the $2s + 1$ spin-
states $S_t$ are unresolved, the appropriate combination of symmetric and antisymmetric cross sections is the
weighted mean

$$\frac{d\sigma}{d\Omega} = g_S\left(\frac{d\sigma}{d\Omega}\right)_S + g_A\left(\frac{d\sigma}{d\Omega}\right)_A.$$
B2.2.7.1 FERMION AND BOSON SCATTERING


(A) FERMIONS 


For fermions with half-integral spin $s$, the statistical weights are $g_S = s/(2s + 1)$ and $g_A = (s + 1)/(2s + 1)$. The
differential cross section for fermion-fermion scattering is then

$$\frac{d\sigma_F}{d\Omega} = |f(\theta)|^2 + |f(\pi - \theta)|^2 - \left[\frac{2}{2s + 1}\right]\mathrm{Re}\left[f(\theta)f^*(\pi - \theta)\right].$$

The integral cross section for fermion-fermion collisions is

$$\sigma_F = \tfrac12\left[\sigma_S + \sigma_A\right] - \tfrac12\left[\sigma_S - \sigma_A\right]/(2s + 1)$$

which reduces, for fermions with spin-$\tfrac12$, to

$$\sigma_F(E) = \frac{2\pi}{k^2}\left[\sum_{\ell\ \mathrm{even}}(2\ell + 1)\sin^2\eta_\ell + 3\sum_{\ell\ \mathrm{odd}}(2\ell + 1)\sin^2\eta_\ell\right].$$

fa; bosons 

The statistical weights for bosons with integral spin s, are g s = (5 + l)/(2s +1) and g A = 5/(25 +1). The 
differential cross section for boson-boson scattering is 

<hH = \f(0)\ 2 + |/(3i -0)| 2 +■ (^—r) Re[/(0)/*Or - 0)]. 


The integral cross section boson-boson collisions is 

ffu = {[<*> + ^a] + 4[ffs - a A ]/(2s + 1) 
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which reduces, for bosons with zero spin, to 


L f = even J 


Symmetry oscillations therefore appear in the differential cross sections for fermion-fermion and boson-boson
scattering. They originate from the interference between unscattered incident particles in the forward
($\theta = 0$) direction and backward scattered particles ($\theta = \pi$, $\ell = 0$). A general differential cross section for scattering
of spin-$s$ particles is

$$\frac{d\sigma}{d\Omega} = |f(\theta)|^2 + |f(\pi-\theta)|^2 + \frac{(-1)^{2s}}{2s+1}\,2\,\mathrm{Re}\left[f(\theta)f^*(\pi-\theta)\right].$$
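As an illustration only, the spin-weighted combination of symmetric and antisymmetric integral cross sections can be sketched numerically. The sketch assumes that each of the S and A cross sections is the full-angle integral of the symmetrized differential cross section, i.e. $(8\pi/k^2)\sum_\ell \omega_\ell(2\ell+1)\sin^2\eta_\ell$ with $\omega_\ell = 2$ or 0. The function names, the integer argument `two_s` (equal to 2s) and any phase shifts passed in are hypothetical and are not taken from the text.

```python
import math


def symmetrized_cross_sections(k, phase_shifts):
    """Return (sigma_S, sigma_A) for phase shifts eta_l, l = 0, 1, 2, ...

    Assumes sigma_{S,A} = (8*pi/k^2) * sum_l w_l (2l+1) sin^2(eta_l),
    with w_l = 2 for the surviving parity of l and w_l = 0 otherwise.
    """
    pref = 8.0 * math.pi / k**2
    sigma_s = 0.0  # even-l partial waves only
    sigma_a = 0.0  # odd-l partial waves only
    for l, eta in enumerate(phase_shifts):
        term = pref * 2.0 * (2 * l + 1) * math.sin(eta) ** 2
        if l % 2 == 0:
            sigma_s += term
        else:
            sigma_a += term
    return sigma_s, sigma_a


def spin_weights(two_s):
    """Statistical weights (g_S, g_A) for identical particles of spin s = two_s/2."""
    s = two_s / 2.0
    if two_s % 2 == 1:  # half-integral spin: fermions
        return s / (2 * s + 1), (s + 1) / (2 * s + 1)
    return (s + 1) / (2 * s + 1), s / (2 * s + 1)  # integral spin: bosons


def unresolved_cross_section(k, phase_shifts, two_s):
    """Weighted mean g_S*sigma_S + g_A*sigma_A for unresolved spin states."""
    g_s, g_a = spin_weights(two_s)
    sigma_s, sigma_a = symmetrized_cross_sections(k, phase_shifts)
    return g_s * sigma_s + g_a * sigma_a
```

For spin-1/2 fermions the weighted mean reduces to one quarter of the symmetric plus three quarters of the antisymmetric cross section, and for spin-zero bosons only the even partial waves survive.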

B2.2.7.2 COULOMB SCATTERING OF TWO IDENTICAL PARTICLES 

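(A) TWO SPIN-ZERO BOSONS

Two spin-zero bosons (e.g. $^4$He-$^4$He)

$$\frac{d\sigma}{d\Omega} = \frac{\beta^2}{4k^2}\left[\mathrm{cosec}^4\tfrac{1}{2}\theta + \sec^4\tfrac{1}{2}\theta + 2\,\mathrm{cosec}^2\tfrac{1}{2}\theta\,\sec^2\tfrac{1}{2}\theta\,\cos\gamma\right].$$

(B) TWO SPIN-1/2 FERMIONS

Two spin-1/2 fermions (e.g. H$^+$-H$^+$, e$^\pm$-e$^\pm$)

$$\frac{d\sigma}{d\Omega} = \frac{\beta^2}{4k^2}\left[\mathrm{cosec}^4\tfrac{1}{2}\theta + \sec^4\tfrac{1}{2}\theta - \mathrm{cosec}^2\tfrac{1}{2}\theta\,\sec^2\tfrac{1}{2}\theta\,\cos\gamma\right].$$

(C) TWO SPIN-1 BOSONS

Two spin-1 bosons (e.g. deuteron-deuteron)

$$\frac{d\sigma}{d\Omega} = \frac{\beta^2}{4k^2}\left[\mathrm{cosec}^4\tfrac{1}{2}\theta + \sec^4\tfrac{1}{2}\theta + \tfrac{2}{3}\,\mathrm{cosec}^2\tfrac{1}{2}\theta\,\sec^2\tfrac{1}{2}\theta\,\cos\gamma\right].$$

(a)-(c) are the Mott formulae, where $\beta = (Ze)^2/\hbar v$ and $\gamma = 2\beta\ln(\tan\tfrac{1}{2}\theta)$.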
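The three Mott formulae differ only in the coefficient of the interference term, which matches the factor $2(-1)^{2s}/(2s+1)$ of the general spin-$s$ cross section. A minimal sketch, assuming that identification, is given below. The function name and the integer argument `two_s` (equal to 2s) are hypothetical, and $k$ and $\beta$ are treated as plain numbers in consistent units.

```python
import math


def mott_cross_section(theta, k, beta, two_s):
    """Mott differential cross section for two identical spin-s particles.

    theta : centre-of-mass scattering angle in radians (0 < theta < pi)
    k     : wavenumber of relative motion
    beta  : Coulomb parameter (Ze)^2 / (hbar v)
    two_s : twice the spin (0 -> spin-0, 1 -> spin-1/2, 2 -> spin-1)
    """
    half = 0.5 * theta
    cosec2 = 1.0 / math.sin(half) ** 2
    sec2 = 1.0 / math.cos(half) ** 2
    gamma = 2.0 * beta * math.log(math.tan(half))
    # +2 for spin-0 bosons, -1 for spin-1/2 fermions, +2/3 for spin-1 bosons
    interference = 2.0 * (-1) ** two_s / (two_s + 1)
    bracket = cosec2**2 + sec2**2 + interference * cosec2 * sec2 * math.cos(gamma)
    return beta**2 / (4.0 * k**2) * bracket
```

At $\theta = \pi/2$ the phase $\gamma$ vanishes, so the three cases there stand in the ratio 16 : 4 : 32/3, and every case is symmetric about $\theta = \pi/2$.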




B2.2.7.3 SCATTERING OF IDENTICAL ATOMS 

Two ground-state hydrogen atoms, for example, interact via the $X\,^1\Sigma_g^+$ and $b\,^3\Sigma_u^+$ electronic states of H$_2$. The
nuclei are interchanged by rotating the atom pair by $\pi$, then by reflecting the electrons first through the
midpoint of R and then through a plane perpendicular to the original axis of rotation. The mid-point reflection
changes the sign only of the ungerade state wavefunction and both $\Sigma^+$ states are symmetric with respect to the
plane reflection.

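The cross section for scattering by the gerade potential is then the combination

$$\left(\frac{d\sigma}{d\Omega}\right)_g = \frac{1}{4}\left(\frac{d\sigma}{d\Omega}\right)_S + \frac{3}{4}\left(\frac{d\sigma}{d\Omega}\right)_A$$

of S and A cross sections which involve the phase shifts $\eta_\ell^{g}$ calculated under the singlet interaction. For
scattering by the ungerade triplet interaction

$$\left(\frac{d\sigma}{d\Omega}\right)_u = \frac{3}{4}\left(\frac{d\sigma}{d\Omega}\right)_S + \frac{1}{4}\left(\frac{d\sigma}{d\Omega}\right)_A$$

where the S and A cross sections involve the phase shifts $\eta_\ell^{u}$ calculated under the triplet interaction. Since the
electrons have statistical weights 1/4 and 3/4 for the $^1\Sigma_g^+$ and $^3\Sigma_u^+$ states, the differential cross section for H(1s)-H(1s)
scattering by both potentials is

$$\frac{d\sigma}{d\Omega} = \frac{1}{4}\left(\frac{d\sigma}{d\Omega}\right)_g + \frac{3}{4}\left(\frac{d\sigma}{d\Omega}\right)_u .$$

These combinations also hold for the integral cross sections.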
(C) SCATTERING OF INCIDENT BEAM ALONE 


Since the current of incident particles is $j = 2v$, the cross sections presented by the target (i.e. the number of
incident particles removed from the beam in unit time per unit incident current) are 1/2 of all those above. For
example,




and 
B2.2.8 QUANTAL INELASTIC HEAVY-PARTICLE COLLISIONS

The wavefunction for the complete A-B collision system satisfies the Schrödinger equation

$$H(r, R)\Psi(r, R) = \left[H_{\mathrm{int}}(r) - \frac{\hbar^2}{2M_{AB}}\nabla_R^2 + V(r, R)\right]\Psi(r, R) = E\,\Psi(r, R)$$

where the internal Hamiltonian is the sum $H_{\mathrm{int}}(r) = H_A(r_A) + H_B(r_B)$ of individual Hamiltonians $H_{A,B}$ for
each isolated atomic or molecular species. The total energy (internal plus relative)

$$E = \frac{\hbar^2 k_i^2}{2M_{AB}} + \epsilon_i = \frac{\hbar^2 k_j^2}{2M_{AB}} + \epsilon_j$$

remains constant for all channels $j$ throughout the collision. The combined internal energy $\epsilon_j$ of A and B at
infinite separation R is $\epsilon(A) + \epsilon(B)$, which are the eigenvalues of the internal Hamiltonian $H_{\mathrm{int}}$
corresponding to the combined eigenstates $\Phi_A(r_A)\Phi_B(r_B)$. There are two limiting formulations (diabatic and
adiabatic) for describing the relative motion. These depend on whether the mutual electrostatic interaction
$V(r, R)$ between A and B at nuclear separation R, or the variation in the kinetic energy of relative motion, is
considered to be a perturbation to the system, i.e. on whether the incident speed $v_i$ is fast or slow in
comparison with the internal motions, e.g. with the speed of the electrons bound to A and B.

B2.2.8.1 ADIABATIC FORMULATION (KINETIC COUPLING SCHEME) 

When relaxation of the internal motion during the collision is fast compared with the slow collision speed $v_i$,
or when the relaxation time is short compared with the collision time, the kinetic energy operator
$-(\hbar^2/2M_{AB})\nabla_R^2$ is then considered as a small perturbation to the quasi-molecular A-B system at fixed R. The
system wavefunction $\Psi(r, R) = \sum_n F_n(R)\Phi_n(r, R)$ can therefore be expanded in terms of the known 'adiabatic'
molecular wavefunctions $\Phi_n(r, R)$ for the quasi-molecule AB at fixed nuclear separation R. This set of
orthonormal eigenfunctions satisfies

$$\left[H_{\mathrm{int}}(r) + V(r, R)\right]\Phi_n(r, R) = E_n(R)\,\Phi_n(r, R).$$

As $R \to \infty$, $\Phi_n(r, R)$ and the eigenenergies $E_n(R)$ tend to the
(diabatic) eigenfunctions $\Phi_n(r_A, r_B) = \psi_i(r_A)\phi_j(r_B)$ of $H_{\mathrm{int}}$ and to the eigenenergies $\epsilon_n$, respectively. The

substitution $\Psi(r, R) = \sum_n F_n(R)\Phi_n(r, R)$ into the Schrödinger equation results in the following set


of coupled equations for the relative motion functions $F_n$. The local momentum $K_n$ is determined from
$K_n^2(R) = 2M_{AB}[E - E_n(R)]/\hbar^2$ and the coupling matrix elements are

$$X_{ij}(R) = -2\langle\Phi_i(r, R)|\nabla_R|\Phi_j(r, R)\rangle_r$$

and

Solution of this set for $F_n(R)$ represents the adiabatic close-coupling method. The adiabatic states are
normally determined (via standard computational techniques of quantum chemistry) relative to a set of axes
$(X', Y', Z')$ with the $Z'$-axis directed along the nuclear separation R. On transforming to this set which rotates
during the collision, then $\Psi(r', R)$, for the diatomic A-B case, satisfies

\H*(r') + V(r\ R!) - * K\ V(r' ? R') = £*(r\ B') 

where the perturbation operator to the molecular wavefunctions in the rotating frame is 

in terms of the operators $\hat{L}$ and $\hat{J}$ for the total and internal angular momentum L and j respectively of the
collision system. Note $L_{Z'} = J_{Z'}$ for diatoms. An advantage of using this rotating system in the adiabatic
treatment is that radial perturbations, which cause vibrational $v \to v'$ and electronic $nl \to n'l$ transitions,
originate from the first (radial) term of $\hat{K}$, while angular perturbations (torques), which cause rotational $J \to J'$
and electronic $nl \to nl'$ transitions, originate from the angular momentum operator products $[L_{X'}J_{X'} + L_{Y'}J_{Y'}]$.
The use of a rotating frame causes some complication, however, to the direct use of the asymptotic boundary
condition for $\Psi(r', R)$.

B2.2.8.2 DIABATIC FORMULATION (POTENTIAL COUPLING SCHEME) 

When relaxation of the internal motion is slow compared with the fast relative speed $v_i$, then $\Psi$ is expanded in
terms of the known unperturbed (diabatic) orthonormal eigenstates $\Phi_j(r_A, r_B) = \psi_i(r_A)\phi_k(r_B)$ of $H_{\mathrm{int}}$
according to

$$\Psi(r, R) = \sum_j F_j(R)\,\Phi_j(r).$$

Substituting into the Schrödinger equation, multiplying by $\Phi_n^*(r)$ and integrating over r, shows that the
unknown functions $F_n(R)$ for the relative motion in channel n satisfy the infinite set of coupled equations

The reduced potential matrix elements which couple the internal states n and j are

$$U_{nj}(R) = \frac{2M_{AB}}{\hbar^2}V_{nj}(R) = U_{jn}^*(R)$$

where the electrostatic interaction averaged over states n and j is

$$V_{nj}(R) = \int \Phi_n^*(r)\,V(r, R)\,\Phi_j(r)\,dr.$$

The local wavenumber $K_n$ of relative motion under the static interaction $V_{nn}$ is given by

$$K_n^2(R) = k_n^2 - U_{nn}(R).$$

The diagonal elements $U_{nn}$ are the distortion matrix elements which distort the relative motion from plane
waves in elastic scattering, while the off-diagonal matrix elements, $U_{if}$ and $U_{ij}$, $U_{jf}$, which couple states i and f
either directly or via intermediate channels j, cause inelastic scattering and polarization contributions to elastic
scattering. In contrast to the adiabatic formulation, radial and angular transitions originate in the diabatic
formulation from the radial and angular components of the potential coupling elements $V_{ij}(R)$. The set of
coupled equations is solved subject to the usual asymptotic ($R \to \infty$) requirement that

$$F_j(R) \sim \exp(ik_iZ)\,\delta_{ij} + f_{ij}(\theta, \phi)\exp(ik_jR)/R$$

for the elastic (i = j) and inelastic (i ≠ j) scattered waves. In terms of the amplitude $f_{ij}$ for scattering into direction
$(\theta, \phi)$, the differential and integral cross sections for $i \to j$ transitions are

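$$\frac{d\sigma_{ij}}{d\Omega} = \frac{k_j}{k_i}\,|f_{ij}(\theta, \phi)|^2$$

and

$$\sigma_{ij}(E) = \frac{k_j}{k_i}\int_{-1}^{+1} d(\cos\theta)\int_0^{2\pi}|f_{ij}(\theta, \phi)|^2\,d\phi.$$

As well as being obtained from the above asymptotic boundary conditions, $f_{if}$ can also be
obtained from the integral representation for the scattering amplitude,

$$f_{if}(\theta, \phi) = -\frac{1}{4\pi}\frac{2M_{AB}}{\hbar^2}\left\langle\Phi_f(r)\exp(ik_f\cdot R)\,\middle|\,V(r, R)\,\middle|\,\Psi_i^+(r, R)\right\rangle_{r,R}.$$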

B2.2.8.3 INELASTIC SCATTERING BY A CENTRAL FIELD 

When the atom-atom or atom-molecule interaction is spherically symmetric in the channel vector R, i.e. $V(r, \mathbf{R}) = V(r, R)$,
then the orbital $\ell$ and rotational $j$ angular momenta are each conserved throughout the collision so
that an $\ell$-partial wave decomposition of the translational wavefunctions for each value of $j$ is possible. The
translational wave is decomposed according to

4;nV 




and inserted into the diabatic set of coupled equations (of section B2.2.8.2 ). The radial wavefunction F.g is 
then the solution of 

which is the direct generalization of the quantal radial equation for potential scattering to directly include
other channels j ≠ i. The coupled equations are now solved subject to the requirements that


F u (kiR) +* sinikiR- tn/2) + [ yf } c 1 **""' 2 ' 


for the elastic scattered wave and 


*™~m\ 


iiYf ZiiU K */* 


tf „-. txf2) 


for the inelastic wave. The transition-matrix elements for elastic and inelastic scattering are 


where /' ^ r f r >Ji\ f r ) ' rdn are ^ so i u ^ ons f these coupled radial equations. The differential 
cross section for inelastic scattering is

$$\frac{d\sigma_{if}}{d\Omega} = \frac{1}{4k_i^2}\left|\sum_{\ell=0}^{\infty}(2\ell+1)\,T_{if}^{\ell}\,P_\ell(\cos\theta)\right|^2 .$$

The integral inelastic cross section is

$$\sigma_{if}(E) = \frac{\pi}{k_i^2}\sum_{\ell=0}^{\infty}(2\ell+1)\,|T_{if}^{\ell}|^2 .$$

The transition matrix $T_{if}^{\ell}$ is symmetrical, $T_{if}^{\ell} = T_{fi}^{\ell}$, and the cross sections satisfy detailed balance. Each
transition matrix element $|T_{if}^{\ell}|^2$ is the probability of an $i \to f$ transition in the target for each value $\ell$ of the
(orbital) angular momentum of relative motion.

B2.2.8.4 TWO-STATE TREATMENT 

Here all couplings are ignored except the direct couplings between the initial and final states as in a two-level 
atom. The coupled equations to be solved are 

$$[\nabla^2 + k_i^2 - U_{ii}(R)]\,\psi_i(R) = U_{if}(R)\,\psi_f(R)$$
$$[\nabla^2 + k_f^2 - U_{ff}(R)]\,\psi_f(R) = U_{fi}(R)\,\psi_i(R).$$

(A) DISTORTED-WAVE APPROXIMATION 

Here all matrix elements in the two-level equations (section B2.2.8.4) are included, except the back coupling
$V_{if}\psi_f$ term which provides the influence of the inelastic channel on the elastic channel and is required to
conserve probability. Distortion of the elastic and outgoing inelastic waves by the averaged (static)
interactions $V_{ii}$ and $V_{ff}$ respectively is therefore included. The two-state equations can then be decoupled and
effectively reduced to one-channel problems. An analogous static-exchange distortion approximation, where
exchange between the incident and one of the target particles is included, also follows from the two-level treatment.

(B) BORN APPROXIMATION 

Here the distortion (diagonal) and back coupling matrix elements in the two-level equations (section B2.2.8.4)
are ignored so that $\psi_i(R) = \exp(ik_i\cdot R)$ remains an undistorted plane wave. The asymptotic solution for $\psi_f$,
when compared with the asymptotic boundary condition, then provides the Born elastic (i = f) or inelastic
scattering amplitudes

$$f_{if}^{(B)}(\theta) = -\frac{1}{4\pi}\frac{2M_{AB}}{\hbar^2}\int V_{fi}(R)\exp(iK\cdot R)\,dR.$$

The momentum change resulting from the collision is $Q = \hbar K$ where $K = k_i - k_f$. The Born amplitude also
follows by inserting $\Psi(r, R) = \Phi_i(r)\exp(ik_i\cdot R)$ in the integral representation. Comparison with potential
scattering shows that the elastic scattering of structured particles occurs in the Born approximation via the
averaged electrostatic interaction $V_{ii}(R)$.
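For a spherically symmetric matrix element the angular part of the Born integral can be done analytically, leaving $f^{(B)}(K) = -(2M_{AB}/\hbar^2)\int_0^\infty V(R)\,[\sin(KR)/KR]\,R^2\,dR$. The sketch below evaluates that radial integral by Simpson's rule and, as a check, compares it with the closed form for a screened-Coulomb (Yukawa) interaction. The Yukawa form, the cut-off radius, the units in which $2M_{AB}/\hbar^2 = 1$ and all function names are assumptions made for the illustration, not part of the text.

```python
import math


def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def born_amplitude(K, potential, r_max, two_m_over_hbar2=1.0):
    """Born amplitude for a central interaction V(R) at momentum transfer K.

    Evaluates -(2M/hbar^2) * int_0^r_max V(R) sin(K R) R / K dR, which is the
    Born integral after the angular integration has been carried out.
    """
    def integrand(R):
        if R == 0.0:
            return 0.0  # the R^2 factor removes the 1/R singularity at the origin
        return potential(R) * math.sin(K * R) * R / K

    return -two_m_over_hbar2 * simpson(integrand, 0.0, r_max)


def yukawa_born_exact(K, v0, mu, two_m_over_hbar2=1.0):
    """Closed-form Born amplitude for V(R) = v0 * exp(-mu R) / R."""
    return -two_m_over_hbar2 * v0 / (K**2 + mu**2)
```

The amplitude depends on the scattering angle only through $K = |k_i - k_f|$, so forward scattering dominates once $K$ exceeds the inverse range of the interaction.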

For electron-ion or ion-ion collisions, the plane waves $\exp(ik_{i,f}\cdot R)$ are simply replaced by Coulomb waves to
provide the Coulomb-Born approximation.
B2.2.8.5 EXACT RESONANCE 

The two-state equations of section B2.2.8.4 cannot, in general, be solved analytically except for the specific
case of exact resonance when $k_i = k_f = k$ and $U_{ii} = U_{ff} = U$, $U_{if} = U_{fi}$. Then the equations can be decoupled by
introducing the linear combinations $\psi^{\pm}(R) = \frac{1}{\sqrt{2}}[\psi_i(R) \pm \psi_f(R)]$, so the two-state set can be converted to
two one-channel decoupled equations

$$[\nabla^2 + k^2 - (U \pm U_{if})]\,\psi^{\pm}(R) = 0.$$

The problem has therefore been reduced to potential scattering by the interactions $U^{\pm} = (U \pm U_{if})$ associated
with elastic scattering amplitudes $f^{\pm}$. Hence the elastic (i = f) and 'inelastic' (i ≠ f) amplitudes are

$$f_{ii} = (f^+ + f^-)/2, \qquad f_{if} = (f^+ - f^-)/2.$$

In terms of the phase shifts $\eta_\ell^{\pm}$ associated with potential scattering by $U^{\pm}$, the amplitudes for elastic and
inelastic scattering are then

$$f_{ii}(\theta) = \frac{1}{2ik}\sum_{\ell=0}^{\infty}(2\ell+1)\left[\left(e^{2i\eta_\ell^+} + e^{2i\eta_\ell^-}\right)/2 - 1\right]P_\ell(\cos\theta)$$

and

$$f_{if}(\theta) = \frac{1}{2ik}\sum_{\ell=0}^{\infty}(2\ell+1)\left[\left(e^{2i\eta_\ell^+} - e^{2i\eta_\ell^-}\right)/2\right]P_\ell(\cos\theta).$$
The corresponding differential cross sections $|f_{if}|^2$ will therefore exhibit interference oscillations. The integral
cross sections are

$$\sigma_{ii}(E) = \frac{2\pi}{k^2}\sum_{\ell=0}^{\infty}(2\ell+1)\left[\sin^2\eta_\ell^+ + \sin^2\eta_\ell^- - \tfrac{1}{2}\sin^2(\eta_\ell^+ - \eta_\ell^-)\right]$$

and

$$\sigma_{if}(E) = \frac{\pi}{k^2}\sum_{\ell=0}^{\infty}(2\ell+1)\sin^2(\eta_\ell^+ - \eta_\ell^-)$$

respectively.
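The exact-resonance cross sections can be checked numerically by forming the partial-wave amplitudes $(e^{2i\eta^+} \pm e^{2i\eta^-})/2$ directly and summing their squared moduli. The sketch below does this under the assumption that the elastic amplitude carries the usual $-1$ from the incident wave while the transfer amplitude does not. The function name and the phase-shift lists are hypothetical inputs.

```python
import cmath
import math


def resonance_cross_sections(k, eta_plus, eta_minus):
    """Elastic and transfer integral cross sections at exact resonance.

    eta_plus, eta_minus : phase shifts for scattering by U+ and U-,
                          listed for l = 0, 1, 2, ...
    Returns (sigma_ii, sigma_if).
    """
    pref = math.pi / k**2
    sigma_ii = 0.0
    sigma_if = 0.0
    for l, (ep, em) in enumerate(zip(eta_plus, eta_minus)):
        s_plus = cmath.exp(2j * ep)    # S-matrix element for U+
        s_minus = cmath.exp(2j * em)   # S-matrix element for U-
        t_ii = 0.5 * (s_plus + s_minus) - 1.0  # elastic partial-wave amplitude
        t_if = 0.5 * (s_plus - s_minus)        # transfer partial-wave amplitude
        sigma_ii += pref * (2 * l + 1) * abs(t_ii) ** 2
        sigma_if += pref * (2 * l + 1) * abs(t_if) ** 2
    return sigma_ii, sigma_if
```

The two results sum to the average of the cross sections for potential scattering by $U^+$ and $U^-$ separately, which expresses conservation of flux between the two channels.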

(A) EXAMPLES: ATOMIC COLLISIONS WITH IDENTICAL NUCLEI


Important cases of exact resonance are the symmetrical resonance charge transfer collision

$$\mathrm{He}_f^+(1s) + \mathrm{He}_s(1s^2) \to \mathrm{He}_f(1s^2) + \mathrm{He}_s^+(1s)$$

which converts a fast ion beam f to a fast neutral beam and the excitation transfer collision

$$\mathrm{He}(1s2s\,^3S) + \mathrm{He}(1s^2\,^1S) \to \mathrm{He}(1s^2\,^1S) + \mathrm{He}(1s2s\,^3S)$$

which transfers the internal excitation in the projectile beam fully to the target atom. The electronic molecular
wavefunctions divide into even (gerade) or odd (ungerade) classes upon reflection about the mid-point of the
internuclear line ($R \to -R$). In the separated atom limit, $\psi_{g,u} \sim \phi(r_A) \pm \phi(r_B)$. The potentials $U^{\pm}$ in the former
case are the gerade and ungerade interactions $V_{g,u}$. The phase shifts for elastic scattering by the resulting
gerade (g) and ungerade (u) molecular potentials of He$_2^+$ are, respectively, $\eta_\ell^g$ and $\eta_\ell^u$. The charge transfer (X)
and transport cross sections are then

$$\sigma_X(E) = \frac{\pi}{k^2}\sum_{\ell=0}^{\infty}(2\ell+1)\sin^2(\eta_\ell^g - \eta_\ell^u)$$

4t °° 


fc 3 


-N ^-lv,;x 2 ) 




sin 3 (A-ft- 2 >- 
For ungerade potentials, P^ = IJ| for ieven and TJ^for £odd. For gerade potentials, P^ = IJ^for -Seven and
n^for ^odd. The diffusion cross section ff <D contains (g/u) interference. The viscosity cross section ff <l)does 

not. For charge transfer between the heavier rare gas ions Rg$^+$ with their parent atoms Rg, the degenerate
states at large internuclear separations are not s states but p states. The states are then $\Sigma$, which arise from
the p state with m = 0, and $\Pi$, which arise from m = ±1, with space quantization along the molecular axis.
Since there is no coupling between molecular states of different electronic angular momentum, the scattering
by the $^2\Sigma_{g,u}$ pair and the $^2\Pi_{g,u}$ pair of Ne$_2^+$ potentials (for example) is independent. The cross section is
therefore the combination

$$\sigma_{el,X}(E) = \tfrac{1}{3}\sigma_\Sigma(E) + \tfrac{2}{3}\sigma_\Pi(E)$$

of cross sections $\sigma_\Sigma$ and $\sigma_\Pi$ for the individual contributions arising from the isolated $\Sigma$ and $\Pi$ states to
elastic (el) or charge-transfer (X) scattering. See [12, 13] for further details on excitation-transfer and
charge-transfer collisions.

(B) SINGLET-TRIPLET SPIN-FLIP CROSS SECTION 

This cross section is

$$\sigma_{st}(E) = \frac{\pi}{k^2}\sum_{\ell=0}^{\infty}(2\ell+1)\sin^2(\eta_\ell^s - \eta_\ell^t)$$

where $\eta_\ell^{s,t}$ are the phase shifts for individual potential scattering by the singlet and triplet potentials,
respectively.

B2.2.8.6 PARTIAL WAVE ANALYSIS 

In order to reduce the three-dimensional diabatic or adiabatic set of coupled equations for atom-atom and
atom-molecule scattering to a corresponding working set of coupled radial equations, analogous to those in
section B2.2.8.3, the orbital angular momentum $\ell$ of relative motion must be distinguished from the combined
internal angular momentum $j$ associated with the internal (rotational and electronic) degrees of freedom of the
partners A and B at rest at infinite separation R. Both the orbital angular momentum $\ell$ of relative motion and
the internal angular momentum $j$ of the atomic electrons or of molecular rotation are in general coupled. The
total angular momentum $J = \ell + j$ and its component $J_z$ along some fixed direction (of incidence) are each
conserved. Angular momentum may therefore be exchanged between the internal (rotational) and translational 
(orbital) degrees of freedom via the couplings V nm (R) or £. Partial wave analysis is an exercise in angular 

momentum coupling and is well-established (e.g. [14]) for both the diabatic and adiabatic treatments of 
heavy-particle collisions. 
B2.2.9 ELECTRON-ATOM INELASTIC COLLISIONS

B2.2.9.1 CLOSE-COUPLING EQUATIONS FOR ELECTRON-ATOM (ION) COLLISIONS 

A partial wave decomposition provides the full close-coupling quantal method for treating A-B collisions,
electron-atom, electron-ion or atom-molecule collisions. The method [15] is summarized here for the
inelastic processes


$$e^- + A_i \to e^- + A_f$$

at collision speeds less than or comparable with those of the target electrons actively involved in the transition. It is based
upon an expansion of the total wavefunction $\Psi$ for the (e$^-$-A) multi-electron system in terms of a sum of
products of the known atomic target state wavefunctions $|\Phi_i\rangle$ and the unknown functions $F_i(r)$ for the relative
motion. Here

$$\Psi^\Gamma(r_1, \ldots, r_N; r) = \mathcal{A}\sum_i \Phi_i^\Gamma(r_1, r_2, \ldots, r_N; \hat{r})\,\frac{1}{r}F_i^\Gamma(r)$$

involves a sum over all discrete and an integral over the continuum states of the target. The operator $\mathcal{A}$
antisymmetrizes the summation with respect to exchange of all pairs of electrons in accordance with the Pauli
exclusion principle. The angular and spin momenta (denoted collectively by $\hat{r}$) of the projectile electron have
been coupled with the orbital and spin angular momenta of the target states $|\Phi_i\rangle$ to produce the 'channel
functions' $\Phi_i^\Gamma(r_1, r_2, \ldots, r_N; \hat{r})$ which are eigenstates of the total orbital L, total spin S angular momentum,
their Z-components $M_L$, $M_S$ and parity $\pi$. The set $\Gamma \equiv LSM_LM_S\pi$ of quantum numbers is therefore conserved
throughout the collision. By substituting the expansion for $\Psi^\Gamma$ into the Schrödinger equation,

$$\left[-\sum_{i=1}^{N+1}\left(\frac{1}{2}\nabla_i^2 + \frac{Z}{r_i}\right) + \sum_{i>j=1}^{N+1}\frac{1}{r_{ij}}\right]\Psi^\Gamma = E\,\Psi^\Gamma ,$$

expressed in atomic units, the radial functions for the motion of the scattered electron satisfy the infinite set of
coupled integro-differential equations

The direct potential couplings are represented by 


VJ W _*[*«„. fj^|_J_|^j- 
The non-local exchange couplings are represented by


^^f) = E(*'Ji^7j|M- D*y^ r ) 


The direct potential gives rise to the long-range polarization attraction which is very important for low-energy 
scattering. The exchange potentials are short range and are extremely complicated. Additional non-local 
potentials that arise from various correlations (which cannot be included directly but which can be constructed 
from pseudostates) can also be added to the right-hand side of the equations. 

Numerical solution of this set of close-coupled equations is feasible only for a limited number of close target
states. For each $\Gamma$, several sets of independent solutions $F_{ij}$ of the resulting close-coupled equations are
determined subject to $F_{ij} = 0$ at $r = 0$ and to the reactance K-matrix asymptotic boundary conditions,

for n open channels characterized by $k_i^2 = 2(E - E_i) > 0$. The argument is

$$\theta_i = k_ir - \frac{1}{2}\ell_i\pi + \frac{Z-N}{k_i}\ln(2k_ir) + \sigma_i$$

where $\ell_i$ is the orbital angular momentum of the scattered electron and where $\sigma_i = \arg\Gamma[\ell_i + 1 - i(Z-N)/k_i]$ is
the Coulomb phase. For closed channels, $k_i^2 < 0$ and $F_{ij} \sim C_{ij}\exp(-|k_i|r)$ as $r \to \infty$. The scattering amplitude can
then be expressed in terms of the elements $T_{ij}$ of the (n × n) T-matrix which is related to the K and S matrices
by

$$T = \frac{2iK}{1 - iK} = S - 1.$$
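The relation between the K, T and S matrices can be illustrated for two open channels. The sketch below assumes the standard forms $T = 2iK(1 - iK)^{-1}$ and $S = 1 + T$ and uses hand-written 2 × 2 complex arithmetic so that it needs nothing beyond the standard library. The function name and the sample K-matrix are hypothetical.

```python
def t_and_s_from_k(K):
    """Return (T, S) for a real symmetric 2x2 reactance matrix K.

    T = 2iK (1 - iK)^-1 and S = 1 + T; K is a nested list [[a, b], [b, d]].
    """
    (a, b), (c, d) = K
    # M = 1 - iK and its inverse by the 2x2 adjugate formula
    m11, m12 = 1 - 1j * a, -1j * b
    m21, m22 = -1j * c, 1 - 1j * d
    det = m11 * m22 - m12 * m21
    inv = [[m22 / det, -m12 / det], [-m21 / det, m11 / det]]
    two_ik = [[2j * a, 2j * b], [2j * c, 2j * d]]
    # T = (2iK) * inv, written out element by element
    T = [
        [sum(two_ik[i][m] * inv[m][j] for m in range(2)) for j in range(2)]
        for i in range(2)
    ]
    S = [[T[0][0] + 1, T[0][1]], [T[1][0], T[1][1] + 1]]
    return T, S
```

A real symmetric K guarantees a unitary and symmetric S, which is why the reactance-matrix boundary conditions are convenient in numerical work: unitarity is preserved even when K itself is only approximate.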


The integral cross section for the transition $i \equiv \alpha_iL_iS_i \to f \equiv \alpha_fL_fS_f$ in the target atom, where $\alpha$ denotes the
additional quantum numbers required to completely specify the state, is then

. it ~ J2L + 1K2S+1) r , 

According to detailed balance, the collision strength 
is therefore dimensionless and is symmetric with respect to $i \leftrightarrow f$ interchange. Further extensions,
simplifications and calculational schemes of the basic close-coupling and related methods are found in [15, 16
and 17].

With modern high-speed computers, it is feasible to solve the coupled set of radial equations only for a
restricted basis set of unperturbed states $\Phi_n(r)$ regarded as being closely and strongly coupled. For
electron-atom (molecule) collisions at low energies E, the full quantal close-coupling method is extremely successful
in predicting the cross sections and the shapes and widths of resonances which appear at energies E just below the
thresholds for excitation of the various excited levels. As E increases past the threshold for ionization,
it becomes less successful, and is plagued by problems with convergence both in the number of basis
states and in the number of partial waves used in the expansion for $\Psi$. Other methods for intermediate and
high energies are therefore preferable. For heavy-particle collisions and for electron collisions at high
energies, semiclassical versions (in section B2.2.10) of the close-coupling equations can be derived.
B2.2.9.2 CLOSE COUPLING WITH PSEUDOSTATES AND CORRELATION 

(A) PSEUDOSTATES 

A partial acknowledgment of the influence of higher discrete and continuum states, not included within the
wavefunction expansion, is to add, to the truncated set of basis states, functions of the form $F_p(r)\Phi_p(r)$ where
$\Phi_p$ is not an eigenfunction of the internal Hamiltonian $H_{\mathrm{int}}$ but is chosen so as to represent some appropriate
average of bound and continuum states. These pseudostates can provide full polarization distortion to the
target by incident electrons and allow flux to be transferred from the open channels included in the
truncated set.

(B) CORRELATION 

When the initial and final internal states of the system are not well-separated in energy from other states then
the close-coupling calculation converges very slowly. An effective strategy is to add a series of correlation
terms involving powers of the distance $r_{ij}$ between internal particles of projectile and target to the truncated
close-coupling expansion which already includes the important states.

B2.2.9.3 THE R-MATRIX METHOD 

This method, introduced originally in an analysis of nuclear resonance reactions, has been extensively
developed [15, 16 and 17] over the past 20 years as a powerful ab initio calculational tool. It partitions
configuration space into two regions by a sphere of radius r = a, where r is the scattered electron coordinate.
In the internal region r ≤ a, the electron-atom complex behaves almost as a bound state so that a configuration
interaction expansion of the total wavefunction $\Psi$, as in atomic structure calculations, is appropriate. In the
external region r > a, the scattered electron moves in the long-range multipole potential contained in the direct
electrostatic interaction, and can be accurately represented by a perturbation approach. See [15, 16 and 17] for
further details, for other modern quantal approximations and for various computational methods useful for
electron-atom collisions over a wide energy range.
B2.2.9.4 ELECTRON-MOLECULE COLLISIONS

The close-coupling equations are also applicable to electron-molecule collisions but severe computational
difficulties arise due to the large number of rotational and vibrational channels that must be retained in the
expansion for the system wavefunction. In the fixed nuclei approximation, the Born-Oppenheimer separation
of electronic and nuclear motion permits electronic motion and scattering amplitudes $f_{n'n}(R)$ to be determined
at fixed internuclear separations R. Then in the adiabatic nuclear approximation the scattering amplitude for
$i \equiv n, v, J \to f \equiv n', v', J'$ transitions is

and cross sections can be obtained. See [15] for further details. 


B2.2.10 SEMICLASSICAL INELASTIC SCATTERING 

The term semiclassical is used in scattering theory to denote many different situations. 

(a) The use of some time-dependent classical path R(t) within a time-dependent quantal treatment of the 
response of the internal degrees of freedom of A and B to the time-varying field V(R(t)) created by the 
approach of A towards B along the classical trajectory R(t). This procedure generalizes classical theory for 
potential scattering to structured collision partners and inelastic transitions. 

(b) The use of the three-dimensional eikonal-phase $S(R)$, which is the solution of the Hamilton-Jacobi
equation, for the channel wavefunction $\Psi^+(R)$, within the full quantal expression for the cross section.

(c) The use of JWKB approximate solutions of the radial Schrödinger equation for the radial wavefunction
$R_{E\ell}$ for A-B relative motion within the full quantum treatment of the A-B collision.

B2.2.10.1 CLASSICAL PATH THEORY 

The basic assumption here is the existence over the inelastic scattering region of a common classical
trajectory $R(t)$ for the relative motion under an appropriately averaged central potential $\bar{V}[R(t)]$. The
interaction $V[r, R(t)]$ between A and B may then be considered as time-dependent. The system wavefunction
therefore satisfies

$$i\hbar\frac{\partial\Psi(r, t)}{\partial t} = \left[H_{\mathrm{int}}(r) + V(r, R(t))\right]\Psi(r, t)$$

and can be expanded in terms of the eigenfunctions $\Phi_n$ of $H_{\mathrm{int}}$ as

$$\Psi(r, t) = \sum_n A_n(t)\,\Phi_n(r)\exp(-iE_nt/\hbar).$$

The transition amplitudes $A_n$ then satisfy the set

$$i\hbar\frac{\partial A_n(b, t)}{\partial t} = \sum_j A_j(b, t)\,V_{nj}(R(t))\exp(i\omega_{nj}t)$$

of first-order equations coupled by the matrix elements $V_{nj}(R)$ between states n and j with energy separation
$\hbar\omega_{nj} = E_n - E_j$. Once the classical trajectory $R = (R(t), \theta(t), \phi = \mathrm{constant})$ is determined from the classical
equations

$$\frac{dR}{dt} = \pm v\left[1 - b^2/R^2 - \bar{V}(R)/E\right]^{1/2}, \qquad \frac{d\theta}{dt} = \frac{vb}{R^2}$$

of motion for impact parameter b and kinetic energy $E = \frac{1}{2}M_{AB}v^2$, the coupled equations are solved subject to
the requirement $A_n(b, t \to -\infty) = \delta_{ni}$. Since the probability for an $i \to f$ transition is $P_{if} = |A_f(b, t \to \infty)|^2$, the
differential cross section for inelastic scattering is


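$$\frac{d\sigma_{if}}{d\Omega} = \sum_n P_{if}(b_n)\left[\frac{d\sigma_{el}}{d\Omega}\right]_{b_n}$$

where $d\sigma_{el}/d\Omega$ is the differential cross section $|b\,db/d(\cos\theta)|$ for elastic scattering by $\bar{V}(R)$ and where the
summation is over all trajectories $b_n$ which pass through $(\theta, \phi)$. The integral cross section is

$$\sigma_{if} = 2\pi\int_0^{\infty}|A_f(b, \infty)|^2\,b\,db.$$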

IMPACT PARAMETER METHOD 


This normally refers to the use of the straight-line trajectory $R(t) = (b^2 + v^2t^2)^{1/2}$, $\theta(t) = \arctan(b/vt)$ within the
classical path treatment. See Bates [18, 19] for examples and further discussion.
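A minimal sketch of the impact parameter method for two states is given below. It integrates first-order amplitude equations of the form $i\hbar\,\dot{A}_n = \sum_j A_j V_{nj}\exp(i\omega_{nj}t)$ along a straight-line path with a fourth-order Runge-Kutta step. The exponential coupling $V_{12} = V_0\exp(-R/a)$, the neglect of the diagonal elements, the units with $\hbar = 1$ and every parameter value are assumptions chosen only for illustration.

```python
import cmath
import math


def two_state_probability(b, v, v0=0.2, a=1.0, delta_e=0.1, t_max=40.0, steps=8000):
    """Transition probability |A_2(b, +inf)|^2 along a straight-line path.

    Hypothetical model: V_12(R) = v0 * exp(-R/a), V_11 = V_22 = 0,
    delta_e = E_2 - E_1, hbar = 1, R(t) = sqrt(b^2 + v^2 t^2).
    Returns (probability, norm); the norm should stay equal to 1.
    """
    def coupling(t):
        return v0 * math.exp(-math.hypot(b, v * t) / a)

    def deriv(t, a1, a2):
        v12 = coupling(t)
        phase = cmath.exp(-1j * delta_e * t)  # exp(i w_12 t) with w_12 = E_1 - E_2
        return -1j * v12 * phase * a2, -1j * v12 * phase.conjugate() * a1

    dt = 2.0 * t_max / steps
    t = -t_max
    a1, a2 = 1.0 + 0.0j, 0.0 + 0.0j  # system starts in state 1
    for _ in range(steps):
        k1 = deriv(t, a1, a2)
        k2 = deriv(t + dt / 2, a1 + dt / 2 * k1[0], a2 + dt / 2 * k1[1])
        k3 = deriv(t + dt / 2, a1 + dt / 2 * k2[0], a2 + dt / 2 * k2[1])
        k4 = deriv(t + dt, a1 + dt * k3[0], a2 + dt * k3[1])
        a1 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        a2 += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return abs(a2) ** 2, abs(a1) ** 2 + abs(a2) ** 2
```

Integrating the returned probability over impact parameter with weight $2\pi b$ would give the integral cross section. For `delta_e = 0` the model reduces to exact resonance, where the probability is the sine squared of the time-integrated coupling.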
B2.2.10.2 LANDAU-ZENER CROSS SECTION


The Landau-Zener transition probability is derived from an approximation to the full two-state
impact-parameter treatment of the collision. The single passage probability for a transition between the diabatic
surfaces $H_{11}(R)$ and $H_{22}(R)$ which cross at $R_X$ is the Landau-Zener transition probability

$$P_{12}(R_X; b) = 1 - \exp\left(-\frac{2\pi|H_{12}(R_X)|^2}{\hbar v_X|H'_{11} - H'_{22}|}\right)$$

where $H_{12}$ is the interaction coupling states 1 and 2 and the primes denote derivatives with respect to R at $R_X$. The diabatic curves are assumed to have linear shapes in
the vicinity of the crossing at $R_X$, i.e. $H_{11} - H_{22} \propto (R - R_X)$, and $H_{12}$ is assumed constant. The adiabatic surfaces

$$W^{\pm} = \tfrac{1}{2}(H_{11} + H_{22}) \pm \tfrac{1}{2}\left[(H_{11} - H_{22})^2 + 4H_{12}^2\right]^{1/2}$$

do not cross (avoided crossing). They are separated at $R_X$ by $W^+ - W^- = 2H_{12}(R_X)$. The probability for
remaining on the adiabatic surface is $P_{12}(R_X)$. The probability for remaining on the diabatic surface or for
pseudocrossing between the adiabatic curves is $1 - P_{12}(R_X)$. The overall transition probability for both the
incoming and outgoing legs of the trajectory $R(t)$ is then

$$\mathcal{P}_{12} = 2P_{12}(1 - P_{12}).$$

The Landau-Zener cross section is

$$\sigma_{12}(E) = 4\pi\int_0^{R_X}P_{12}(1 - P_{12})\,b\,db$$

where the variation of $P_{12}$ with impact parameter b arises from the speed $v_X = v_0(1 - b^2/R_X^2)^{1/2}$ at the crossing
point $R_X$. For rectilinear trajectories $R = b + v_0t$,

$$\sigma_{12}(E) = 4\pi R_X^2\left[E_3(\alpha) - E_3(2\alpha)\right]$$

where $E_n(\alpha) = \int_1^{\infty}y^{-n}\exp(-\alpha y)\,dy$ is the exponential integral with argument $\alpha = 2\pi|H_{12}|^2/\hbar v_0|H'_{11} - H'_{22}|$. See
Nikitin [20, 21] for more elaborate models which include interference effects arising from the phases or
eikonals associated with the incoming and outgoing legs of the trajectory.
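The closed form in terms of exponential integrals can be checked against direct quadrature over impact parameter. The sketch below assumes a single-passage probability $P_{12} = 1 - \exp(-\alpha/[1 - b^2/R_X^2]^{1/2})$, which follows from the radial speed at the crossing on a rectilinear path, and evaluates $E_n$ through the substitution $y = 1/u$. All function names, the Simpson quadrature and the sample values of $\alpha$ and $R_X$ are illustrative assumptions.

```python
import math


def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def lz_single_passage(alpha, b, r_x):
    """P_12 for one passage through the crossing at impact parameter b."""
    s2 = 1.0 - (b / r_x) ** 2  # (v_X / v_0)^2 on a straight-line path
    if s2 <= 0.0:
        return 1.0  # radial speed vanishes: fully adiabatic passage
    return 1.0 - math.exp(-alpha / math.sqrt(s2))


def lz_cross_section_numeric(alpha, r_x, n=4000):
    """2*pi * int_0^R_X 2 P (1 - P) b db by direct quadrature."""
    def integrand(b):
        p = lz_single_passage(alpha, b, r_x)
        return 2.0 * p * (1.0 - p) * b

    return 2.0 * math.pi * simpson(integrand, 0.0, r_x, n)


def exp_integral(n, alpha, pts=4000):
    """E_n(alpha) = int_1^inf y^-n exp(-alpha y) dy, using y = 1/u."""
    def integrand(u):
        if u == 0.0:
            return 0.0
        return u ** (n - 2) * math.exp(-alpha / u)

    return simpson(integrand, 0.0, 1.0, pts)


def lz_cross_section_closed(alpha, r_x):
    """4*pi*R_X^2 * [E_3(alpha) - E_3(2*alpha)]."""
    return 4.0 * math.pi * r_x**2 * (exp_integral(3, alpha) - exp_integral(3, 2.0 * alpha))
```

Because $\alpha$ is inversely proportional to the incident speed, the cross section vanishes in both the slow limit (fully adiabatic passage) and the fast limit (fully diabatic passage) and has a maximum in between.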

B2.2.10.3 EIKONAL THEORIES 

Here the relative motion wavefunction $F_n(R)$ is decomposed as [22]

$$F_n(R) = A_n(R)\exp[iS_n(R)]\exp[-\chi_n(R)].$$

The classical action, or solution of the Hamilton-Jacobi equation $\nabla_RS_n(R) = K_n(R)$, for relative motion under
the channel interaction $V_{nn}$, is

$$S_n(R) = k_n\cdot R + \int_{R_0}^{R}(K_n - k_n)\cdot dR_n,$$

where $R_0$ is the initial point on the associated trajectory $R_n(t)$ where $K_n = k_n$. The current $j_n$ in channel n,
assumed elastic, satisfies the conservation condition $\nabla\cdot j_n = 0$, so that $\chi_n$ is the solution of

$$\nabla_R^2S_n - 2(\nabla_RS_n)\cdot(\nabla_R\chi_n) = 0.$$

Flux in channel n is therefore lost only via transition to another state f with a probability controlled solely by
$A_f$. When many wavelengths of relative motion can be accommodated within the range of $V_{nn}$, as at the higher
energies favoured by the diabatic scheme, the fast R-variation of $F_n$ is mainly controlled by $S_n$, and the
original diabatic set of coupled equations then reduces to the simpler set

of first-order coupled equations. When a common trajectory $R_n(t) = R(t)$ under some averaged interaction $\bar{V}(R)$
can be assumed for all channels n then

S N (JI)-S y (Ji) = ^r+7i ■ [ [V ff tR(n)-V m imt)i]di 

and the classical path equations are recovered [22]. 
(A) AVERAGED POTENTIAL 

The orbit common to all channels is found by choosing the potential governing the relative motion as the
average [22]

V(R) = {*(r, R)\H int {r) - V(r, R)|*£?\ R)) e . 

Hamilton's equations of motion for this interaction 


it L ^ J 
are therefore coupled to the set of first-order equations for the transition amplitudes $A_n(R)$. An essential
feature is that total energy is always conserved, being continually redistributed between the relative motion
and internal degrees of freedom, as motion along the trajectory proceeds. In terms of the solutions $A_n(b, t)$ and
the differential cross section $d\sigma_{el}/d\Omega$ for elastic scattering of particles with impact parameter $b(\theta)$ through $\theta$
by $\bar{V}(R)$, the semiclassical scattering amplitude is

The accumulated classical action for orbit b(0) is 




When the same scattering angle originates from more than one impact parameter $b_j$, then interference effects
originate from the different actions associated with the different orbits $b_j(\theta)$. The contributions arising from N
orbits which are well-separated combine according to


The coefficients $\alpha_j = \exp(\pm i\pi/4)$ depend on whether the scattered particle emerges on the same side (+) of the
axis as it entered, as in an overall repulsive collision, or on the opposite side (-), as in an overall attractive
collision. The coefficients $\beta_j$ are $\exp(\pm i\pi/4)$ according to whether the sign of $db/d\theta$ is (+) or (-). The differential
cross section will therefore exhibit characteristic oscillations, directly attributable to interference between the
action phases $S(b_j)$ associated with each contributing classical path $b_j(\theta)$. The analysis can be extended, as in
the uniform Airy function approximation, to cover orbits which are not widely separated, as for the case of
rainbow scattering or of caustics, in general, where the density of paths becomes infinite. This theory provides
the basis of the multistate orbital treatment [22] which is successful for rotational and vibrational excitation in
atom-molecule and ion-molecule collisions at higher energies $E_{AB} \gtrsim 1$ eV. Other semiclassical treatments
based on the JWKB approximation to the corresponding set of coupled equations for the radial wavefunction
for relative motion can be found in [23, 24 and 25].

In figure B2.2.3 cross sections for the quenching process 

$\mathrm{He} + \mathrm{H}_2(v = 1, J = 0) \rightarrow \mathrm{He} + \mathrm{H}_2(v = 0, J = 0)$

for collision energies $E$ ranging from the ultracold to 1 keV are displayed. The full quantal results [26] are
shown together with those calculated [27] from the semiclassical multistate orbital method [22]. The results
from the two methods complement and connect with each other very well: the quantal treatment is
computationally feasible up to $E \approx 1$ eV, while the semiclassical procedures are feasible at the higher
collision energies.

Figure B2.2.3. Vibrational relaxation cross sections (quantal and semiclassical) as a function of collision
energy $E$.


(B) MULTICHANNEL EIKONAL METHOD 


For electronic transitions in electron-atom and heavy-particle collisions at high impact energies, the major 
contribution to inelastic cross sections arises from scattering in the forward direction. The trajectories implicit 
in the action phases and set of coupled equations can be taken as rectilinear. The integral representation 

$f_{if}(\theta) = \langle \psi_f(r) \exp(i \mathbf{k}_f \cdot \mathbf{R}) | V(r, R) | \Psi(r, R) \rangle_{r,R}$

for the scattering amplitude, where 

then provides the basis of the multichannel eikonal treatment [ 28 ] valuable, in particular, for heavy-particle 
collisions and for electron (ion)-excited atom collisions where, due to the large effect of atomic polarization 
(charge-induced dipole), the collision is dominated by scattering in the forward direction. 

B2.2.11 LONG-RANGE INTERACTIONS 

B2.2.11.1 POLARIZATION, ELECTROSTATIC AND DISPERSION INTERACTIONS 

The long-range interaction $V(R)$ between two atomic/molecular species can be decomposed into polarization,
electrostatic and dispersion contributions.

The polarization interaction arises from the interaction between the ion of charge $Ze$ and the multipole
moments it induces in the atom or molecule AB. The dominant polarization interaction is the ion-induced
dipole interaction

$V_{\rm pol}(Ze; \mathrm{ind}\,D) = -\frac{\alpha_d Z^2 e^2}{2R^4} \left[ 1 + (\alpha_d'/\alpha_d) P_2(\hat{s} \cdot \hat{R}) \right]$

where the averaged dipole polarizability is $\alpha_d = (\alpha_\parallel + 2\alpha_\perp)/3$, and $\alpha_\parallel$
and $\alpha_\perp$ are the polarizabilities of AB in the directions parallel and perpendicular to the molecular
axis $\hat{s}$ of AB. The anisotropic polarizability is $\alpha_d' = 2(\alpha_\parallel - \alpha_\perp)/3$. The next
polarization interaction is the charge-induced quadrupole interaction, averaged over all molecular orientations

where $\alpha_q$ is the averaged quadrupole polarizability. Additional polarization terms arise from permanent
multipole moments of one partner and the dipole (or multipole) it induces in the other, averaged over all
directions. The leading term is

$V_{\rm pol}(D; \mathrm{ind}\,D) = -\frac{1}{R^6} \left( D_i^2 \alpha_n + D_n^2 \alpha_i \right)$

where the subscripts i and n label the permanent dipole moments $D$ and the dipole polarizabilities $\alpha$ of the
ion and neutral, respectively. The variation $R^{-6}$ is similar to that for the charge-induced quadrupole
interaction.
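
The orientation dependence of the ion-induced dipole term can be checked numerically. The following is a minimal sketch, assuming the form $-(\alpha_d Z^2 e^2/2R^4)[1 + (\alpha_d'/\alpha_d) P_2(\cos\gamma)]$ in atomic units ($e = 1$), where $\gamma$ is the angle between the molecular axis and the ion-molecule line. The function names and the polarizability values are illustrative assumptions, not data from this section.

```python
def legendre_p2(x):
    """Second Legendre polynomial, P2(x) = (3x^2 - 1)/2."""
    return 0.5 * (3.0 * x * x - 1.0)


def ion_induced_dipole_potential(r, z, alpha_par, alpha_perp, cos_gamma):
    """Ion-induced dipole energy in atomic units (e = 1).

    r          ion-molecule separation (bohr)
    z          ion charge number
    alpha_par  polarizability parallel to the molecular axis
    alpha_perp polarizability perpendicular to the molecular axis
    cos_gamma  cosine of the angle between the axis and the line of centres
    """
    alpha_d = (alpha_par + 2.0 * alpha_perp) / 3.0       # averaged polarizability
    alpha_aniso = 2.0 * (alpha_par - alpha_perp) / 3.0   # anisotropic polarizability
    isotropic = -alpha_d * z**2 / (2.0 * r**4)
    return isotropic * (1.0 + (alpha_aniso / alpha_d) * legendre_p2(cos_gamma))


# Assumed, purely illustrative polarizabilities (atomic units).
ALPHA_PAR, ALPHA_PERP = 6.4, 4.6

v_parallel = ion_induced_dipole_potential(10.0, 1, ALPHA_PAR, ALPHA_PERP, 1.0)
v_perpendicular = ion_induced_dipole_potential(10.0, 1, ALPHA_PAR, ALPHA_PERP, 0.0)
print(v_parallel, v_perpendicular)
```

With the ion on the molecular axis the expression reduces to $-\alpha_\parallel Z^2/2R^4$, and with the ion perpendicular to the axis it reduces to $-\alpha_\perp Z^2/2R^4$, which is a convenient consistency check on the two polarizability combinations.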

The electrostatic interaction results from the interaction of the ion with the permanent multipole moments of 
the neutral. For cylindrically symmetric neutrals or linear molecules, the ion-neutral multipole interaction is 




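$V_{\rm el}(Ze; D, Q) = -\frac{Ze D_n}{R^2} P_1(\hat{s} \cdot \hat{R}) + \frac{Ze Q_n}{R^3} P_2(\hat{s} \cdot \hat{R})$


where $D_n = \int \mathbf{r} \rho(\mathbf{r})\,d\mathbf{r}$ and $Q_n = \int (3z^2 - r^2) \rho(\mathbf{r})\,d\mathbf{r}$ are the permanent dipole and
quadrupole moments of the neutral. The ion dipole-neutral dipole interaction is

$V_{\rm el}(D_i; D_n) = -\frac{D_i D_n}{R^3} \left[ 2 \cos\theta_i \cos\theta_n - \sin\theta_i \sin\theta_n \cos(\phi_i - \phi_n) \right]$

where $\theta_i$ and $\theta_n$ are the angles made by the ionic and molecular dipoles $D_i$ and $D_n$ with the line $R$ of
centres, and $\phi_i$ and $\phi_n$ are the azimuthal angles of rotation about the line of centres. The
dipole-molecular quadrupole interaction is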

The dispersion interaction arises between the fluctuating multipoles and the moments they induce and can
occur even between spherically symmetric ions and neutrals. Thus,

$V_{\rm disp} = -\frac{C_6}{R^6} - \frac{C_8}{R^8}$

represents the interaction of the fluctuating dipole with the induced dipole ($C_6$ term) and the induced
quadrupole ($C_8$ term), respectively. The leading $R^{-6}$ term represents the van der Waals attraction.

B2.3 Reactive scattering

Paul J Dagdigian 


B2.3.1 INTRODUCTION 

Reactive scattering is one of a number of gas-phase phenomena included in the field of molecular collision 
dynamics, which is the study of the molecular mechanism of elementary physical and chemical rate processes. 
Other such dynamical processes include photodissociation, vibrational and rotational energy transfer, 
electronic quenching, unimolecular decay, reactions within weakly bound complexes and gas-surface 
interactions. The object of studying the dynamics of these processes is to gain an understanding of the 
behaviour of a system at the molecular level. We would like to unravel the forces exerted on the nuclei, as 
described by the potential energy surface (PES) of interaction, during the collisional encounter. We also wish 
to learn whether the system has jumped to another PES through an electronically non-adiabatic transition. In 
this section, techniques appropriate to the study of the dynamics of chemical reactions are emphasized. 
However, these techniques are generally applicable to the study of a variety of gas-phase collisional 
processes. 

The implementation of molecular beam techniques and introduction of laser-based detection methods has 
allowed chemical reaction dynamics to be elucidated in far greater detail than is possible from inferences 
based on the temperature dependence of reaction rate constants. In an ideal crossed-beam reactive scattering 
experiment, which is illustrated schematically in figure B2.3.1, two collimated beams of the reagents, of well
defined velocities, are crossed in a collision centre, and the flux of reaction products in specified internal 
vibration-rotation quantum states scattered into particular solid angles is determined. In this way, the 
differential cross section for scattering of the reaction product in a given internal quantum state into a given 
solid angle element can be measured. There have been relatively few studies of reaction dynamics which have 
closely approximated this ideal, nor is the ideal experiment usually required in order to infer what one would 
like to understand about the dynamics of a particular reaction. While this section describes experimental
techniques as applied to the study of chemically reactive collisions, these methods have also been applied to
the study of a wide range of collisional phenomena, including non-reactive energy transfer collisions,
photodissociation, and gas-surface scattering.

Two radically different approaches have been taken for the study of reactive scattering. In the first, which 
approximates the ideal reactive scattering experiment, collimated beams are crossed in a high-vacuum 
chamber, and the products are detected with a rotatable detector. In most scattering experiments, a mass 
spectrometer, with electron bombardment ionization and mass-resolved detection of the ions transmitted 
through a radio-frequency (RF) electric quadrupole [1], is employed as the detector, and the products are 
identified from the mass-to-charge ratio of the detected ion, either the parent or a fragment ion. This detection 
method is 'universal' in that every atom or molecule can, in principle, be detected mass spectrometrically. An 
excellent description of such a molecular beam apparatus is given by Lee et al [2]. 


Figure B2.3.1. Schematic diagram of an idealized molecular beam scattering experiment.

In order to determine the partitioning of the energy available to the products into internal (vibrational and 
rotational) excitation and relative translational recoil energy of the products, the velocity of the detected 
products is determined, usually by a time-of-flight method [3]. In this way, the translational energy of the
products can be determined. The mass spectrometer is essentially insensitive to the degree of internal
excitation of the product, however, and the internal excitation of the products can only be determined
indirectly, through energy conservation with the knowledge of the total energy available to the products
(reaction exoergicity + translational and internal energy of the reagents). 

The second approach to the study of reactive scattering involves the use of some spectroscopic method for the 
detection of the products in specified internal quantum states. Molecular spectroscopy is well suited to the 
determination of the relative populations in individual states since the quantum numbers of the upper and 
lower states of a molecular line in an assigned transition are known. Moreover, the intensities may be directly 
related to concentrations of specific internal states. The original implementation of this approach for the study 
of reactive scattering involved observation of spontaneous infrared emission from the radiative decay of 
vibrationally excited products [4, 5]. This approach is still being employed, although now usually with
detection of the emission by Fourier transform [6] rather than grating-tuned spectrometers. In some cases,
emission from electronically excited products can be observed for highly exothermic reactions. 


For many reaction products and for the detection of molecules in their ground vibrational level, some
laser-based spectroscopic method must be employed, rather than observation of spontaneous emission. The simplest
spectroscopic method for determining concentrations of specified product internal states would involve the
application of the Beer-Lambert law on resolved molecular lines in direct absorption. However, the optical
density of the product will be very small and limited by the requirement that the nascent reaction products do 
not undergo any secondary, relaxing collisions before being detected. In the gas phase, the collision frequency 
can be conveniently reduced by changing the density. The average time between collisions is increased by 
reducing the total pressure, and hence the concentration of the products. Very recently [7], an ultrasensitive 
absorption method has been developed and applied for the detection of reaction products. 
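
To see why direct absorption is so demanding at low product densities, the Beer-Lambert law can be evaluated for a dilute sample. This is a minimal sketch only: the cross section, number density and path length below are assumed round numbers chosen for illustration, not values taken from this section.

```python
import math


def fractional_absorption(sigma_cm2, density_cm3, path_cm):
    """Fraction of light absorbed, 1 - exp(-sigma * n * l) (Beer-Lambert law)."""
    optical_depth = sigma_cm2 * density_cm3 * path_cm
    # expm1 keeps full precision when the optical depth is tiny.
    return -math.expm1(-optical_depth)


# Assumed illustrative values: absorption cross section (cm^2),
# product number density in one quantum state (cm^-3), path length (cm).
absorbed = fractional_absorption(1e-16, 1e8, 1.0)
print(f"fraction absorbed: {absorbed:.2e}")
```

Under these assumptions only about one part in $10^8$ of the incident light is absorbed, so the measurement amounts to detecting a very small change in a large transmitted signal.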


A more sensitive method of detecting absorption, through observation of a so-called 'action' or 'excitation' 
spectrum, has been mainly employed for the detection of the reaction products. In most such experiments, a 
wavelength-tunable laser, usually a dye laser, is scanned over an electronic band system of the reaction 
product in question, and a signal indicative of molecular absorption is recorded. The relative intensities of 
spectral lines or bands are then converted into relative populations of the reaction product in specified internal 
quantum states. In this way the disposal of the available reaction energy into the internal degrees of freedom 
of this product is directly determined. If the accompanying product is an atom or a molecule with little 
internal excitation, the relative translational energy of the products can be obtained from energy conservation 
and knowledge of the total available energy. In this second approach, the angular distribution of the products 
is usually not determined. However, recent experiments employing Doppler resolution of isolated spectral 
lines have allowed determination of a low-resolution angular distribution of the product. 

Most often, fluorescence excitation to an excited electronic state with the fundamental or frequency-doubled 
output of a wavelength-tunable dye laser has been employed for laser-based detection of the reaction 
products. In this method, the total, spectrally unresolved, photon emission from the detection zone is 
monitored as the laser wavelength is scanned over a molecular transition. Such an excitation spectrum 
provides the same information as would be available from an absorption spectrum, but with much higher 
detection sensitivity. This increased sensitivity arises from two factors. Fluorescence detection is a
'zero-background' technique and is limited only by background due to scattered light and quantum counting
statistics, while absorption requires the measurement of ratios of signals. In addition, the sensitivity of 
fluorescence detection is greatly enhanced by the high spectral intensity of laser radiation, as opposed to 
incoherent radiation from lamps. Product internal state distributions have been determined with laser 
fluorescence detection in both beam- and bulb-type experiments. 

The first half of this section discusses the use of the crossed beams method for the study of reactive scattering, 
while the second half describes the application of laser-based spectroscopic methods, including laser-induced 
fluorescence and several other laser-based optical detection techniques. Further discussion of both non-optical 
and optical methods for the study of chemical reaction dynamics can be found in articles by Lee [8] and 
Dagdigian [9]. 


B2.3.2 CROSSED-BEAMS METHOD 

B2.3.2.1 THE BASIC SCATTERING EXPERIMENT AND SIGNAL INTENSITY 

An ideal scattering experiment requires that the velocity spread of the reagent beams be narrow, so that the 
relative translational energy of the reagents is well defined. Effusive beams have a very broad velocity 
distribution, and their use in a scattering experiment usually required the insertion of a slotted-disk velocity 
selector [10] to reduce the velocity spread to a reasonable width. The flux in an effusive beam is not large, and 
the insertion of a velocity selector reduces the reagent beam flux significantly. The introduction of supersonic 
beam sources, with a dramatic narrowing of the velocity spread [11], radically increased the flux from beam 
sources and made the modern era of crossed-beam reactions possible. The term 'supersonic' refers to the fact 
that the molecular velocities in such a source are greater than the local speed of sound. In such a source and in
contrast to an effusive source, the backing pressure is high enough that the mean free path within the source is
much smaller than the orifice diameter so that the gas behaves as a hydrodynamic fluid. Under ideal 
conditions the enthalpy is converted to net motion during the expansion of the gas into vacuum, and the local 
temperature becomes very low, leading to a very small spread of velocities about the mean. 
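
The limiting beam velocity implied by complete conversion of enthalpy into directed motion can be estimated as follows. This is a sketch under stated assumptions: an ideal gas with constant heat capacity, full conversion of the stagnation enthalpy $C_p T_0$, and a helium beam from a 300 K source chosen purely as an example.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def terminal_speed(source_temperature_k, molar_mass_kg, gamma):
    """Limiting flow speed (m/s) when all enthalpy Cp*T0 becomes directed motion."""
    cp = gamma / (gamma - 1.0) * R_GAS  # molar heat capacity of an ideal gas
    return math.sqrt(2.0 * cp * source_temperature_k / molar_mass_kg)


# Assumed example: helium (monatomic, gamma = 5/3) expanded from 300 K.
helium_speed = terminal_speed(300.0, 4.0026e-3, 5.0 / 3.0)
print(f"limiting He beam speed: {helium_speed:.0f} m/s")
```

A real expansion stops cooling once collisions cease, so measured beam speeds approach but do not exceed this limit.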


In addition to obviating the need of velocity selection, the increased backing pressure over that attainable with 
an effusive source leads to significantly higher downstream beam densities. 

A molecular beam scattering experiment usually involves the detection of low signal levels. Thus, one of the 
most important considerations is whether a sufficient flux of product molecules can be generated to allow a 
precise measurement of the angular and velocity distributions. The rate of formation of product molecules, 
$dN/dt$, can be expressed as

$dN/dt = n_1 n_2 v_{\rm rel} \sigma V_{\rm coll}$ (B2.3.1)

where the number densities of the reagent beams in the collision volume are given by $n_1$ and $n_2$; and
$v_{\rm rel}$, $\sigma$, and $V_{\rm coll}$ are the relative velocity between the reagents, the integral reaction cross
section, and the scattering volume, respectively. (Equation (B2.3.1) is just a re-expression of the law of mass
action, since the product $v_{\rm rel}\sigma$ is the microcanonical rate constant.) In an experiment with one
supersonic beam and a velocity-selected effusive beam, typical values for the reagent beam densities are 10 and
10 molecules cm$^{-3}$, respectively, in a collision volume of 10 cm$^3$. The relative velocity will be approximately
$10^5$ cm s$^{-1}$, and the reaction cross section $10^{-15}$ cm$^2$. This leads to an estimate of $10^{10}$ molecules
s$^{-1}$ for the total rate of product formation
$dN/dt$.
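
The order-of-magnitude estimate from equation (B2.3.1) is easy to reproduce. In the sketch below the relative velocity and cross section are the values quoted in the text, while the two beam densities ($10^{12}$ and $10^{10}$ molecules cm$^{-3}$) and the collision volume ($10^{-2}$ cm$^3$) are assumptions, chosen only so that the product reproduces the quoted rate of $10^{10}$ molecules s$^{-1}$.

```python
def product_formation_rate(n1, n2, v_rel, sigma, v_coll):
    """dN/dt = n1 * n2 * v_rel * sigma * V_coll, all in cgs units (molecules/s)."""
    return n1 * n2 * v_rel * sigma * v_coll


# Assumed beam densities (cm^-3) and collision volume (cm^3); see lead-in.
N_SUPERSONIC = 1e12
N_EFFUSIVE = 1e10
COLLISION_VOLUME = 1e-2

# Values quoted in the text: relative velocity (cm/s) and cross section (cm^2).
V_REL = 1e5
SIGMA = 1e-15

rate = product_formation_rate(N_SUPERSONIC, N_EFFUSIVE, V_REL, SIGMA, COLLISION_VOLUME)
print(f"dN/dt = {rate:.1e} molecules/s")
```

Because the rate is a simple product, a tenfold loss in either beam density or in the collision volume reduces the signal tenfold, which is why beam intensity dominates the design of such experiments.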

The product molecules will scatter into a range of laboratory angles, depending upon the exoergicity of the
reaction, the reaction dynamics, and the kinematics, which is a function of the masses of the reagents and
products. If we assume that the scattering is confined to 1 sr of solid angle (out of the total $4\pi$), then the
detector will receive ~$3 \times 10^6$ molecules s$^{-1}$ if it subtends 1° in both directions (i.e. an angular
acceptance of 1/3000 sr). If, instead, the molecules were isotropically scattered, the detector would see a
considerably smaller flux of ~$2 \times 10^5$ molecules s$^{-1}$. Of course, we desire not only the angular
distribution of the products,
but also the velocity distribution at each scattering angle, for a fuller understanding of the reaction dynamics.
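
The two detector fluxes follow from the fraction of the scattered solid angle that the detector intercepts. The sketch below assumes a total product rate of $10^{10}$ molecules s$^{-1}$ and a detector aperture of 1° by 1°, treated in the small-angle approximation; the function name is illustrative.

```python
import math


def detector_flux(total_rate, detector_solid_angle, scattering_solid_angle):
    """Molecules/s reaching the detector, for products spread uniformly over
    scattering_solid_angle (sr)."""
    return total_rate * detector_solid_angle / scattering_solid_angle


TOTAL_RATE = 1e10                          # molecules/s, total product formation
ACCEPTANCE = math.radians(1.0) ** 2        # 1 deg x 1 deg aperture, about 1/3000 sr

confined = detector_flux(TOTAL_RATE, ACCEPTANCE, 1.0)             # products within 1 sr
isotropic = detector_flux(TOTAL_RATE, ACCEPTANCE, 4.0 * math.pi)  # products over 4*pi sr
print(f"confined to 1 sr: {confined:.1e} molecules/s")
print(f"isotropic:        {isotropic:.1e} molecules/s")
```

The ratio of the two cases is simply $4\pi \approx 12.6$, independent of the detector aperture.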

If the molecules could be detected with 100% efficiency, the fluxes quoted above would lead to impressive 
detected signal levels. The first generation of reactive scattering experiments concentrated on reactions of 
alkali atoms, since surface ionization on a hot-wire detector is extremely efficient. Such detectors have been 
superseded by the 'universal' mass spectrometer detector. For electron-bombardment ionization, the rate of 
formation of the molecular ions can be written as 

$d[\mathrm{M}^+]/dt = I\sigma[\mathrm{M}]$ (B2.3.2)

where $I$ is the intensity of the electron beam (typically 10 mA cm$^{-2}$, or $6 \times 10^{16}$ electrons cm$^{-2}$
s$^{-1}$) and $\sigma$ is the ionization cross section (typically $10^{-16}$ cm$^2$ for electrons of 150 eV energy). This
leads to an estimated ionization rate of $I\sigma = 6$ s$^{-1}$. The molecules are, of course, not stationary in the
ionization region but are travelling with a typical velocity of ~$5 \times 10^4$ cm s$^{-1}$. With an ionization region
of length ~1 cm, the probability of ionization is thus estimated to be ~$10^{-4}$, which is a low detection
efficiency. For example, the above quoted product flux of $3 \times 10^6$ molecules s$^{-1}$ leads to only 360 detected
ions s$^{-1}$. This ion count rate, or rates as low as even 1 ion s$^{-1}$, can be measured with good statistics in
minutes if the background signal is not much larger than this. Hence, such beam-scattering experiments have been
successful mainly through careful reduction of the background ion count rate. This discussion of expected
product signal levels follows a discussion of intensities in crossed-beam reactive scattering experiments by
Lee [8].
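
The detection-efficiency estimate can be reproduced step by step. This is a minimal sketch: it assumes the ionization probability is small enough to be written as (ionization rate) x (transit time), and it uses the exact elementary charge, so the results differ slightly from the rounded figures in the text. The function names are illustrative.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def electron_flux(current_density_a_cm2):
    """Electrons cm^-2 s^-1 for a given current density in A cm^-2."""
    return current_density_a_cm2 / ELEMENTARY_CHARGE


def ionization_probability(flux, sigma_cm2, length_cm, speed_cm_s):
    """Probability of ionization during one pass through the ionizer.

    Linear approximation, valid when the result is much smaller than 1.
    """
    transit_time = length_cm / speed_cm_s
    return flux * sigma_cm2 * transit_time


flux = electron_flux(10e-3)                       # 10 mA cm^-2
rate_per_molecule = flux * 1e-16                  # I * sigma, in s^-1
probability = ionization_probability(flux, 1e-16, 1.0, 5e4)
ion_count_rate = 3e6 * probability                # for 3e6 product molecules/s
print(rate_per_molecule, probability, ion_count_rate)
```

The count rate comes out near 370 ions s$^{-1}$ rather than 360 because the text rounds the electron flux down to $6 \times 10^{16}$ electrons cm$^{-2}$ s$^{-1}$.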

The background ion signal arises from two sources of molecules, namely the inherent background of 
molecules in a vacuum chamber and molecules effusing from the collision chamber into the detector chamber 
while the beams are on. The former arises from outgassing from the materials employed in the construction of 
the apparatus and the limitations in the pumps. The latter requires the careful design of differential pumping 
of the sources and the detector [2, 12 ]. 

B2.3.2.2 LABORATORY TO CENTRE-OF-MASS TRANSFORMATION 

In a crossed-beam experiment the angular and velocity distributions are measured in the laboratory coordinate 
system, while scattering events are most conveniently described in a reference frame moving with the velocity 
of the centre-of-mass of the system. It is thus necessary to transform the measured velocity flux contour maps 
into the centre-of-mass (CM) coordinate system [13]. Figure B2.3.2 illustrates the reagent and product
velocities in the laboratory and CM coordinate systems. The CM coordinate system is travelling at the 
velocity c of the centre of mass 


$\mathbf{c} = (m_1 \mathbf{v}_1 + m_2 \mathbf{v}_2)/(m_1 + m_2)$ (B2.3.3)

where the velocities of the reagents are $\mathbf{v}_1$ and $\mathbf{v}_2$, with masses $m_1$ and $m_2$, respectively. Thus,
the velocities in the two coordinate systems are related by

$\mathbf{v}_i = \mathbf{c} + \mathbf{u}_i$ for $i = 1, 2$. (B2.3.4)

Figure B2.3.2. Velocity vector diagram for a crossed-beam experiment, with a beam intersection angle of
90°. The laboratory velocities of the two reagent beams are $\mathbf{v}_1$ and $\mathbf{v}_2$, while the corresponding
velocities in the centre-of-mass coordinate system are $\mathbf{u}_1$ and $\mathbf{u}_2$, respectively. The laboratory
and CM velocities for one of the products (assumed here to be in the plane of the reagent velocities) are
denoted $\mathbf{v}'$ and $\mathbf{u}'$, respectively. The dashed circle denotes the possible laboratory velocities
$\mathbf{v}'$ for the full range of CM scattering angles $\theta'$.


The CM velocities are given by 


$\mathbf{u}_1 = m_2 \mathbf{v}_{\rm rel}/(m_1 + m_2)$ (B2.3.5a)

$\mathbf{u}_2 = -m_1 \mathbf{v}_{\rm rel}/(m_1 + m_2)$ (B2.3.5b)

where $\mathbf{v}_{\rm rel} = \mathbf{v}_1 - \mathbf{v}_2$ is the relative velocity.
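
The laboratory-to-CM velocity transformation can be verified with a short calculation. The sketch below assumes two beams crossing at 90° in a plane; the masses are rounded values for F and D$_2$ and the beam speeds are hypothetical numbers chosen only for illustration.

```python
def centre_of_mass_velocity(m1, v1, m2, v2):
    """Velocity of the centre of mass, c = (m1*v1 + m2*v2)/(m1 + m2)."""
    total = m1 + m2
    return tuple((m1 * a + m2 * b) / total for a, b in zip(v1, v2))


def subtract(a, b):
    """Component-wise difference of two velocity vectors."""
    return tuple(x - y for x, y in zip(a, b))


# Rounded masses (amu) and hypothetical laboratory velocities (cm/s).
M_F, M_D2 = 19.0, 4.0
v_f = (8.0e4, 0.0)    # F beam along x
v_d2 = (0.0, 2.0e5)   # D2 beam along y

c = centre_of_mass_velocity(M_F, v_f, M_D2, v_d2)
u_f = subtract(v_f, c)        # CM velocity of F,  u_1 = v_1 - c
u_d2 = subtract(v_d2, c)      # CM velocity of D2, u_2 = v_2 - c
v_rel = subtract(v_f, v_d2)   # relative velocity v_1 - v_2
print(c, u_f, u_d2)
```

In the CM frame the total momentum vanishes, so $m_1 \mathbf{u}_1 + m_2 \mathbf{u}_2 = 0$, and $\mathbf{u}_1$ is parallel to the relative velocity with magnitude scaled by $m_2/(m_1 + m_2)$.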

The relative translational energy of the reagents is given by 

$E_{\rm trans} = \frac{1}{2} \mu v_{\rm rel}^2$ (B2.3.6)

where $\mu = m_1 m_2/(m_1 + m_2)$ is the reduced mass of the reagents. The energy available to the products equals

$E_{\rm tot}' = E_{\rm trans} + E_{\rm int} + \Delta E$ (B2.3.7)

where $E_{\rm int}$ is the internal excitation energy of the reagents and $\Delta E$ is the reaction exoergicity. The energy
$E_{\rm tot}'$ can be partitioned between translational and internal excitation of the products. The CM speed of one
of the products can be expressed as

$u_3' = \left[ \frac{2\mu'}{m_3^2} \left( E_{\rm tot}' - E_{\rm int}' \right) \right]^{1/2}$ (B2.3.8)

where $m_3$ is the mass of this product, $m_4$ is the mass of the other product, $\mu' = m_3 m_4/(m_3 + m_4)$ is the reduced
mass of the products, and $E_{\rm int}'$ is the internal excitation energy of the products.
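
The product CM speed follows from energy and momentum conservation in the CM frame. The sketch below assumes the form $u_3' = [2\mu'(E_{\rm tot}' - E_{\rm int}')]^{1/2}/m_3$ in SI units; the masses are rounded values for DF and D, and the energies are hypothetical inputs rather than measured quantities.

```python
import math

AMU = 1.66053906660e-27                      # kg
KCAL_PER_MOL = 4184.0 / 6.02214076e23        # joules per molecule


def product_cm_speed(m3, m4, e_tot, e_int):
    """CM speed (m/s) of the product of mass m3 (kg); m4 is its partner.

    e_tot is the total available energy and e_int the product internal
    energy, both in joules per molecule.
    """
    e_trans = e_tot - e_int
    if e_trans < 0.0:
        raise ValueError("internal energy exceeds the available energy")
    mu = m3 * m4 / (m3 + m4)                 # reduced mass of the products
    return math.sqrt(2.0 * mu * e_trans) / m3


# Rounded masses and hypothetical energies, for illustration only.
M_DF, M_D = 21.0 * AMU, 2.0 * AMU
E_TOT = 30.0 * KCAL_PER_MOL
E_INT = 10.0 * KCAL_PER_MOL

u_df = product_cm_speed(M_DF, M_D, E_TOT, E_INT)
u_d = product_cm_speed(M_D, M_DF, E_TOT, E_INT)
print(f"u(DF) = {u_df:.0f} m/s, u(D) = {u_d:.0f} m/s")
```

Each product internal state therefore maps onto a circle of fixed radius $u_3'$ about the tip of the centre-of-mass velocity, and the heavy DF product moves much more slowly in the CM frame than its light D partner.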

It can be seen from figure B2.3.2 that scattering angles (relative to the direction of one of the reagent beams) 
are different in the laboratory and CM coordinate systems. If the detected product is very heavy compared 
with its partner, or if its translational energy is very small, then its speed will be small compared with the 
speed of the centre of mass of the system. In this case, the product is scattered into a small range of scattering 
angles about c, and determination of the CM angular distribution will be difficult. Moreover, the scattered 
intensity at one laboratory scattering angle can come from two CM scattering angles, as can be seen in figure 
B2.3.2. From the intensity estimates presented in section B2.3.2.1, this concentration of the scattered product
into a small laboratory angular range will facilitate detection of the product molecules. By contrast, if the 
product CM speed is large, then the product can be scattered into all laboratory angles. 

In addition to transforming the velocities and scattering angles between the laboratory and CM frames, we 
must also consider the transformation of the cross sections 


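$\left( \frac{d^2\sigma}{dv'\,d\Omega} \right)_{\rm lab} = \left( \frac{d^2\sigma}{du'\,d\omega} \right)_{\rm CM} \frac{du'\,d\omega}{dv'\,d\Omega}$ (B2.3.9)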

where the last term on the right-hand side is the Jacobian of the transformation. Because the internal energies 
of the products are quantized, the velocity of a product scattered into a specific direction can have only 
discrete values. Equation (B2.3.9) is written with the velocities as continuous variables since the experimental 
resolution in reactive scattering experiments is not sufficient to resolve the discrete product velocities, due to 
the spread in the reagent beam velocities and angles. A detailed derivation of the Jacobian has been presented 
[13], and we obtain 


$\frac{du'\,d\omega}{dv'\,d\Omega} = \frac{v'^2}{u'^2}$. (B2.3.10)

Equation (B2.3.10) shows that the scattered intensity observed in the laboratory is distorted from that in the 
CM coordinate system. Those products which have a larger laboratory velocity or a smaller CM velocity will 
be observed in the laboratory with a greater intensity. 

The detection technique can also have an effect upon the angle- and velocity-dependent intensities. Cross 
sections refer to fluxes of molecules into a given range of velocities and angles. The commonly employed 
technique of mass spectrometric detection provides a measure of the density in the ionization region. Since 
density and flux are related by the velocity, we must include a factor of 1/v' in making the transformation 
indicated in equation (B2.3.10) from the CM cross sections to the measured laboratory intensities. 
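
The combined effect of the Jacobian and the detector response can be illustrated as follows. This is a sketch under stated assumptions: the Jacobian is taken as $v'^2/u'^2$, with $v'$ and $u'$ the laboratory and CM speeds of the product, and a density-sensitive detector is modelled by one additional factor of $1/v'$. The function name and the numbers are illustrative.

```python
def laboratory_intensity(cm_cross_section, v_lab, u_cm, density_detector=True):
    """Laboratory signal for a given CM differential cross section.

    The flux transforms with the Jacobian v_lab^2 / u_cm^2. A detector that
    measures number density rather than flux adds a factor 1 / v_lab.
    """
    flux = cm_cross_section * v_lab**2 / u_cm**2
    return flux / v_lab if density_detector else flux


# Same CM cross section and CM speed, observed at two laboratory speeds.
fast_flux = laboratory_intensity(1.0, 2.0, 1.0, density_detector=False)
slow_flux = laboratory_intensity(1.0, 0.5, 1.0, density_detector=False)
fast_density = laboratory_intensity(1.0, 2.0, 1.0)
print(fast_flux, slow_flux, fast_density)
```

For a density detector the net weighting is $v'/u'^2$, so products that are fast in the laboratory but slow in the CM frame are still favoured, although less strongly than in a pure flux measurement.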

If the reagent velocity and angular spreads are sufficiently small, one can infer the CM angle-velocity 
distributions, i.e. the CM differential cross section on the right-hand side of equation (B2.3.10), directly from 
the measured laboratory intensities, by simply transforming the velocities to the CM frame and removing the 
transformation of the Jacobian, with inclusion of the velocity-dependent detection efficiency. On the other 
hand, as the scattering experiments are often carried out with limited resolution, it is usually necessary to 
deconvolute the results over the experimental spread [14]. More commonly, a forward convolution technique 
is employed, in which the CM angle-velocity distribution is adjusted until the laboratory distribution 
calculated by transformation of the coordinate system and convoluted over the experimental spreads agrees 
with the measured laboratory distribution. 

B2.3.2.3 BEAM SOURCES 

Many reactive scattering experiments involve the reaction of an atomic species, such as hydrogen, oxygen, a 
halogen, or a metal atom, with a stable molecular reagent. A variety of techniques have been employed for the 
generation of the reagent atomic beam. Beams of halogen atoms have been prepared by thermal dissociation. 
At room temperature, these elements exist as diatomic molecules, while the equilibrium is shifted toward the 
monatomic species at sufficiently high temperatures. A detailed description of such a source for the 
production of Cl, Br, and I beams is given by Valentini et al [15]. The atomic beam is prepared by heating the
halogen molecule, diluted in a rare gas, to 2000 °C in a graphite tube. At this temperature, dissociation to 
atoms is essentially complete. In order to reduce the spread in velocities, the gas mixture is expanded 
supersonically into vacuum. Problems with materials corrosion have, until recently, limited the intensities of
atomic fluorine beams. Use of a nickel tube limits the temperature to 700 °C, for which dissociation yields of
<15% are obtained. Recently an F atom source employing a tube made of single-crystal MgF$_2$ has been
constructed, and dissociation fractions of ~80% have been achieved at tube temperatures near 1000 °C [16, 17].

Thermal dissociation is not suitable for the generation of beams of oxygen atoms, and RF [18] and microwave
[19] discharges have been employed in this case. The first excited electronic state, O($^1$D), has a different spin
multiplicity than the ground O($^3$P) state and is electronically metastable. The collision dynamics of this very
reactive state have also been studied in crossed-beam reactions with an RF discharge source which has been
optimized for production of O($^1$D) [20].

Beams of metal atoms have been prepared by many researchers through thermal vaporization from a heated 
crucible. An example of such a source, employed for the generation of beams of alkaline earth atoms, is 
described by Irvin and Dagdigian [21]. By striking an electrical discharge within this source, beams
containing electronically excited metastable atoms could be prepared. For Ca, the conversion efficiency to the
3s3p $^3$P and 3s4d D metastable excited states was of the order of 80%. Laser ablation from a solid has been
widely used to generate atomic beams of refractory elements [22]. A detailed description of such a source for
the production of a beam of atomic carbon has been given [23]. This source has been employed in crossed-beam
studies of reactions of carbon atoms.

Laser photolysis of a precursor may also be used to generate a reagent. In a crossed-beam study of the D + H 2 
reaction [24], a hyperthermal beam of deuterium atoms (0.5 to 1 eV translational energy) was prepared by 248 
nm photolysis of DI. This preparation method has been widely used for the preparation of molecular free 
radicals, both in beams and in experiments in a cell, with laser detection of the products. Laser photolysis as a 
method to prepare reagents in experiments in which the products are optically detected is further discussed 
below. 

In most reactive scattering experiments, the reagent beam sources, which are housed in differentially pumped 
enclosures, are fixed and cross at a 90° intersection angle, while the detector is rotated about the scattering 
centre. The stable molecular co-reagent is usually produced in an effusive or supersonic source of the pure 
reagent. Care must be taken to ensure that no clusters are formed in the beam source, for example by heating 
the source or by limiting the total pressure behind the source orifice. 

B2.3.2.4 AN EXAMPLE REACTION: F + H$_2$ → HF + H

This reaction has been intensively studied because of its accessibility to both experimental and theoretical 
treatments. This reaction is also important because it is the pumping mechanism for the hydrogen fluoride 
infrared chemical laser. We present some data from the extensive study in 1985 by Lee and co-workers 
(Neumark et al [25, 26]) and recent, higher resolution experiments by Faubel et al [27, 28]. Figure B2.3.3 
presents a schematic diagram of the apparatus in which Lee and co-workers carried out their experiments [29]. 
An effusive beam of fluorine atoms was prepared by thermal dissociation of F 2 in a nickel tube at 650 °C. The 
velocity spread was reduced to 11% by passage through a slotted-disk chopper. The fluorine atom beam was
crossed with a supersonic molecular hydrogen beam, and the incident relative translational energy was varied 
by changing the temperature of the molecular beam source. The products were detected in a triply 
differentially pumped mass spectrometer employing electron-impact ionization. Cryogenically cooled 
surfaces also provided additional pumping to reduce the background signal, primarily from diffuse scattering 
from surfaces. The laboratory velocity distributions of the product at various laboratory scattering angles were 
measured by a time-of-flight method with a mechanical chopper. Crossed-beam scattering of a normal
hydrogen beam, a para-hydrogen beam (all molecules in the $J = 0$ rotational level) [25], and beams of D$_2$ and
HD [26] was studied.



Figure B2.3.3. Crossed-molecular beam apparatus employed for the study of the F + D$_2$ → DF + D reaction.
Indicated in the figure are: (1) the effusive F atom source; (2) slotted-disk velocity selector; (3) liquid-
nitrogen-cooled trap; (4) D$_2$ beam source; (7) skimmer; (8) chopper; (9) cross-correlation chopper for product
velocity analysis; and (11) rotatable, ultrahigh-vacuum, triply differentially pumped, mass spectrometer
detector chamber. Reprinted with permission from Lee [29]. Copyright 1987 American Association for the
Advancement of Science.

We present results on the F + D 2 reaction. Study of the reaction of the D 2 isotopic reagent was easier because 
the background in the mass spectrometer was smaller at mass 21 (DF) than for mass 20 (HF). Moreover, the 
masses of the DF and D are less dissimilar, so that the DF product is less kinematically constrained in the 
range of accessible laboratory scattering angles. Figure B2.3.4 presents the laboratory angular distribution and 
a velocity vector diagram for the reaction, showing the accessible angular ranges for the product vibrational 
levels. From this plot, it already appears that the bulk of the DF products is made in the v = 3 and 4 vibrational 
levels that are scattered backward in the CM frame with respect to the incident F atom beam. 

Figure B2.3.4. Laboratory angular distribution of DF products from the F + D$_2$ reaction at an incident relative
translational energy of 1.82 kcal mol$^{-1}$ [26]. The full curve shows the fit with the derived CM angle-velocity
contour. The angular distributions for the $v$ = 1, 2, 3, and 4 vibrational levels are indicated by distinct
line styles. (By permission from AIP.)

The above conjectures need to be verified by measurement of the doubly differential cross sections in angle
and velocity. Typical time-of-flight distributions at several laboratory scattering angles from the more recent,
higher resolution experiments of Faubel et al [27, 28] are presented in figure B2.3.5. We see that products
formed in the different vibrational levels appear at distinct time intervals, corresponding to different 
laboratory velocities. From data such as those presented in figure B2.3.5 and after transformation from the 
laboratory to the CM frame, the CM velocity flux contour map is obtained. Figure B2.3.6 displays such a 
contour plot derived by Lee and co-workers (Neumark et al [26]) for the F + D 2 reaction at one collision 
energy. It can be seen that all DF vibrational levels are predominantly scattered into the backward 
hemisphere. The CM angular spread is seen to be larger for the highest energetically accessible level, v = 4. 
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Figure B2.3.5. Typical time-of- flight spectra of DF products from the F + D 2 reaction [28]. The collision 
energies and in-plane (® lab ) and out-of-plane (^ lab ) laboratory scattered angles are given in each panel. The 
DF product vibrational quantum number v f associated with each peak is indicated. Reprinted with permission 
from Faubel et al [28], Copyright 1997 American Chemical Society. 
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Figure B2.3.6. CM angle-velocity contour plot for the F + D 2 reaction at an incident relative translational 
energy of 1.82 kcal mol [26]. Contours are given at equally spaced intensity intervals. This CM differential 
cross section was used to generate the calculated laboratory angular distributions given in figure B2.3.4. (By 
permission from AIP.) 

Keil and co-workers (Dharmasena et al [16]) have combined the crossed-beam technique with a state- 
selective detection technique to measure the angular distribution of HF products, in specific vibration-rotation 
states, from the F + H₂ reaction. Individual states are detected by vibrational excitation with an infrared laser
and detection of the deposited energy with a bolometer [30]. 

B2.3.2.5 PROBLEMS WITH PRODUCT IDENTIFICATION 

It is well known that an electron-impact ionization mass spectrum contains both parent and fragment ions.
The observed fragmentation pattern can be useful in identifying the parent molecule. The same fragmentation
occurs in mass spectrometric detection of reaction products, where it can complicate identification
of the products. The problem is exacerbated because reaction products are often internally excited and
can have fragmentation patterns very different from those of thermal
molecules. The parent molecules associated with the various fragment ions can usually be sorted out by
comparison of the angular distributions of the detected ions [8].

Many of the problems associated with electron-impact ionization, such as the formation of fragment ions, can
be alleviated by the use of photoionization detection. When the photon energy is tuned below the
dissociative ionization threshold, it is possible to ionize the molecule 'softly', with the formation of parent
ions only. This advantage for photoionization arises because the cross sections generally rise rapidly from the
energetic threshold, namely the ionization potential. Recently, a molecular beam scattering apparatus using
photoionization mass spectrometric detection of the products has been constructed [12]. This apparatus,
shown schematically in figure B2.3.7, takes advantage of the intense vacuum ultraviolet (VUV) radiation
available from a third-generation synchrotron radiation source at a national facility, the Advanced Light Source
at the Lawrence Berkeley National Laboratory. In most scattering experiments, the detector is rotated about
the crossing point of the fixed reagent beams. In this newly constructed apparatus, the detector includes an
electron storage ring and cannot be rotated; in this case, it is the
sources that are rotated about the scattering centre. This apparatus makes liberal use of turbomolecular pumps,
which can be positioned in any orientation and are convenient as vacuum pumps on rotating assemblies.

Figure B2.3.7. Schematic diagram of the crossed molecular beam apparatus with synchrotron photoionization
mass spectrometric detection of the products [12]. To vary the scattering angle, the beam source assembly is
rotated in the plane of the detector. (By permission from AIP.)
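
As a rough numerical illustration of the soft-ionization condition described above, the following sketch converts a VUV wavelength to a photon energy and checks whether it falls between the ionization energy and the dissociative ionization threshold. The function names and the threshold values passed to them are hypothetical and serve only as an illustration; they are not data for any particular molecule.

```python
# Photon energy from vacuum wavelength: E = h*c / lambda.
H_C_EV_NM = 1239.841984  # h*c expressed in eV nm


def photon_energy_ev(wavelength_nm):
    """Return the photon energy in eV for a vacuum wavelength given in nm."""
    return H_C_EV_NM / wavelength_nm


def is_soft_ionization(wavelength_nm, ionization_energy_ev, dissociative_threshold_ev):
    """True if the photon can form the parent ion but lies below the
    dissociative ionization threshold (both thresholds supplied by the user)."""
    energy = photon_energy_ev(wavelength_nm)
    return ionization_energy_ev <= energy < dissociative_threshold_ev
```

For example, with assumed thresholds of 9.5 and 11.0 eV, a 120 nm photon satisfies the condition while a 100 nm photon does not.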


B2.3.3 OPTICAL DETECTION OF THE REACTION PRODUCTS 

Optical methods, in both bulb and beam experiments, have been employed to determine the relative 
populations of individual internal quantum states of products of chemical reactions. Most commonly, such 
methods employ a transition to an excited electronic, rather than vibrational, level of the molecule. Molecular 
electronic transitions occur in the visible and ultraviolet, and detection of emission in these spectral regions 
can be accomplished much more sensitively than in the infrared, where vibrational transitions occur. In 
addition to their use in the study of collisional reaction dynamics, laser spectroscopic methods have been 
widely applied for the measurement of temperature and species concentrations in many different kinds of 
reaction media, including combustion media [31] and atmospheric chemistry [32]. 




B2.3.3.1 LASER-INDUCED FLUORESCENCE DETECTION 


The most widely employed optical method for the study of chemical reaction dynamics has been laser-induced
fluorescence. This detection scheme is schematically illustrated on the left-hand side of figure B2.3.8.
A tunable laser is scanned through an electronic band system of the molecule, while the fluorescence emission
is detected. This maps out an 'action spectrum' that can be used to determine the relative concentrations of the
various vibration-rotation levels of the molecule.

Figure B2.3.8. Energy-level schemes describing various optical methods for state-selectively detecting
chemical reaction products: left-hand side, laser-induced fluorescence (LIF); centre, resonance-enhanced
multiphoton ionization (REMPI); and right-hand side, coherent anti-Stokes Raman spectroscopy (CARS). The
ionization continuum is denoted by a shaded area. The dashed lines indicate virtual electronic states. Straight
arrows indicate coherent radiation, while a wavy arrow denotes spontaneous emission.

There are several requirements for this to be a suitable detection method for a given molecule. Obviously, the
molecule must have a transition to a bound, excited electronic state whose wavelength can be reached with
tunable laser radiation, and the band system must have been previously spectroscopically assigned. If the
molecules are formed with considerable vibrational excitation, the available spectroscopic data may not
extend up to these vibrational levels. Transitions in the visible can be accessed directly by the output of a
tunable dye laser, while transitions in the ultraviolet can be reached by frequency-doubled radiation. The
excited state must also have a reasonably short radiative lifetime (say <10⁻⁵ s) with a near 100% fluorescence
quantum yield (preferably independent of internal state). Finally, assignments of the individual rotational lines
and the vibrational bands within the electronic transition must be available. These restrictions place
considerable limits on the molecules which can be detected in this way, mainly diatomics and some
triatomics. Fortunately, it is precisely for reactions involving such small molecules that the collision
dynamics can be treated theoretically with modern quantum mechanical methods, which allows incisive
interpretation of the experimental observations.

Figure B2.3.9 presents a schematic diagram of a typical laser fluorescence experiment. In this apparatus, one
of the reagents is prepared by photolysis of a suitable precursor [33] using radiation from an excimer laser
(usually 248 nm from a KrF laser or 193 nm from an ArF laser). The tunable laser employed for fluorescence
excitation counter-propagates along the beam of the excimer laser. Fluorescence of the product molecules is
collected with a telescope and is imaged onto a photomultiplier. Because of their greater coverage of
wavelengths, pulsed, rather than continuous (cw), lasers are almost universally employed. Thus, the
photomultiplier output signal will typically appear at the 10-50 Hz repetition rate of the lasers and is usually
sampled with a gated integrator, whose output is recorded with a
laboratory computer. Analogue, rather than digital, electronics is usually employed because of pile-up of the
detected photon counts in an experiment with reasonable product intensities.

Figure B2.3.9. Schematic diagram of an apparatus for laser fluorescence detection of reaction products. The
dye laser is synchronized to fire a short delay after the excimer laser pulse, which is used to generate one of
the reagents photolytically.

The principal source of background in laser fluorescence detection is laser light scattered diffusely from
optical elements such as windows, and from surfaces such as the walls of the apparatus. The windows for
entry and exit of the laser beam, which are significant sources of scattered light, are usually mounted on long
(0.4-0.8 m) sidearms. Baffles are installed in the sidearms to block light scattered off their inside walls.
Further reduction of scattered light can be achieved through the use of imaging optics (as illustrated
in figure B2.3.9) to relay fluorescence from the excitation zone to the photomultiplier detector, so that
emission from only a well defined volume is detected. If the fluorescence is mainly at wavelengths greatly
different from the excitation wavelength, as would be the case, for example, for a molecule with significantly
different equilibrium internuclear separations in the ground and excited electronic state, and hence a very
non-diagonal Franck-Condon array, then spectral filtering can provide further reduction in the background signal.

B2.3.3.2 DETERMINATION OF PRODUCT INTERNAL STATE DISTRIBUTIONS 

Considerable spectroscopic data are required for the determination of the relative populations in the various 
internal quantum levels of the product from the relative intensities of various lines, or bands, in a spectrum. 
As discussed above, the spectrum must be assigned, i.e. the quantum numbers of the upper and lower levels of 
the spectral lines must be available. In addition to the line positions, intensity information is also required. 

To compare the relative populations of vibrational levels, the intensities of vibrational transitions out of these
levels are compared. Figure B2.3.10 displays typical potential energy curves of the ground and an excited
electronic state of a diatomic molecule. The intensity of a (v′, v″) vibrational transition can be written as

$$ I(v', v'') = C\, N_{v''}\, p_{v'v''} $$ (B2.3.11)

where $N_{v''}$ is the desired density of product molecules in the vibrational level v″, $p_{v'v''}$ is the vibrational band
strength [34, 35], and C is a proportionality constant. If the electronic transition moment [36] is constant as a
function of the internuclear separation, then $p_{v'v''}$ is proportional to the Franck-Condon factor $q_{v'v''}$, i.e. the
square of the overlap integral of the upper and lower vibrational wavefunctions (see figure B2.3.10). The band
strengths are usually determined experimentally from measurement of the decay lifetime of the excited
vibrational level and the branching of emission into the various ground state vibrational levels v″, as
illustrated for the A-X transition of the hydroxyl radical [37]. Alternatively, if the potential energy curves of
the two electronic states can be calculated from spectroscopic data, e.g. by the RKR method [38], then
Franck-Condon factors can be computed [35] and used to estimate the relative band strengths.

Figure B2.3.10. Potential energy curves [42] of the ground X ²Π and excited A ²Σ⁺ electronic states of the
hydroxyl radical. Several vibrational levels are explicitly drawn in each electronic state. One vibrational
transition is explicitly indicated, and the upper and lower vibrational wavefunctions are plotted. The upper and
lower state vibrational quantum numbers are denoted v′ and v″, respectively. Also shown is one of the three
repulsive potential energy curves which correlate with the ground O(³P) + H dissociation asymptote. These
cause predissociation of the higher rotational and vibrational levels of the A ²Σ⁺ state.
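
The use of equation (B2.3.11) to extract relative vibrational populations can be sketched as follows. Since the proportionality constant C is common to all bands, the population of each lower level is proportional to the measured band intensity divided by the band strength. The function name and the dictionary layout are assumptions made for this illustration, and any numbers supplied to it would have to come from measured spectra and tabulated band strengths.

```python
def relative_vibrational_populations(intensities, band_strengths):
    """Invert I(v', v'') = C * N_v'' * p_v'v'' for the relative N_v''.

    Both arguments are dicts keyed by the lower-state quantum number v'',
    one band per lower level. The result is normalised to unit sum, so the
    unknown constant C drops out.
    """
    raw = {v: intensities[v] / band_strengths[v] for v in intensities}
    total = sum(raw.values())
    return {v: n / total for v, n in raw.items()}
```

A band that appears twice as intense as another but also has twice the band strength therefore corresponds to equal populations in the two lower levels.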

The H + NO₂ → OH + NO reaction provides an excellent example of the use of laser fluorescence detection
for the elucidation of the dynamics of a chemical reaction. This reaction is a prototype of a radical-radical
reaction in that the reagents and products are all open-shell free radical species. Both the hydroxyl and
nitric oxide products can be conveniently detected by electronic excitation in the UV at wavelengths near 308
and 226 nm, respectively. Atlases of rotational line positions for the lowest electronic band systems of these
molecules (A ²Σ⁺–X ²Π for both) are available [39, 40], and accurate band strengths for transitions between
various vibrational levels in the ground and excited electronic states have been reported [37, 41]. Because it is
crossed by repulsive electronic states correlating with the ground state atoms O(³P) + H (see figure B2.3.10),
the OH(A ²Σ⁺) state has low fluorescence quantum yields for rotational levels N′ > 25, 17, and 4 for
vibrational levels v′ = 0, 1, and 2, respectively [42]. This causes some problems for the detection of higher
vibrational levels of the OH product since the intensities of the vibrational bands are strongest for the so-called
diagonal bands, i.e. those for which Δv = v′ − v″ = 0 [37]. Because of the excited-state predissociation,
the higher levels must be detected through the weaker off-diagonal bands (Δv < 0).

The dynamics of the H + NO₂ reaction have been studied by several different techniques including laser
fluorescence excitation of the products, infrared chemiluminescence, crossed-beam scattering, and electron
paramagnetic resonance. We highlight the laser fluorescence studies by Sauder and Dagdigian [43] and by
Irvine et al [44]. In the former experiment, the NO products, in vibrational levels v < 2, were detected at the
intersection of an atomic hydrogen beam, generated by a microwave discharge source, with a pulsed beam of
NO₂ diluted in Ar. Figure B2.3.11 illustrates an excitation spectrum for the detection of NO products in the
ground (v = 0) vibrational level. The structure in the spectrum results from the energy differences between the
rotational/fine-structure levels in the upper and lower electronic states and the degree of internal excitation of
the products. Figure B2.3.12 illustrates the rotational transitions allowed in a ²Σ⁺–²Π electronic transition.
With such a diagram it is possible to determine the rotational/fine-structure level being detected through the
excitation of a specific rotational line in the spectrum.

Figure B2.3.11. (a) Experimental laser fluorescence excitation spectrum of the A ²Σ⁺–X ²Π (0,0) band for the
NO product from the H + NO₂ reaction [43]. Individual lines in the various rotational branches are denoted by
the total angular momentum J of the lower state. (b) Simulated spectrum with the NO rotational state
populations adjusted to reproduce the spectrum in (a). (By permission from AIP.)

Both OH and NO are open-shell free radicals, with doublet electron spin multiplicity. Consequently, the
coupling of the angular momentum of the unpaired electron with the angular momentum N of nuclear rotation
leads to a more complicated rotational energy level pattern than for a closed-shell molecule (¹Σ⁺ electronic
state) [45]. For the upper, ²Σ⁺ electronic state, the electron spin S = 1/2 can couple with the rotational angular
momentum to yield two fine-structure levels, with total angular momenta J = N + 1/2 and N − 1/2. These are
conventionally [34] denoted F₁ and F₂, respectively, in order of increasing energy for a given value of J. The
rotational energy is given approximately by BN(N + 1), where B is the rotational constant, and the splitting of
the fine-structure levels is usually much smaller and grows with increasing N. This pattern of
rotational/fine-structure levels is illustrated in the upper portion of figure B2.3.12.

Figure B2.3.12. Rotational transitions between a specific pair of vibrational levels in a ²Σ⁺–²Π electronic
transition. The total angular momentum J and the parity of the lowest rotational levels in each state are given.
Hund's case (a) coupling is assumed for the ²Π state. Conventional spectroscopic designations [34] are given
for the allowed rotational transitions.

For the ²Π state, the projection Λ = 1 of the electron orbital angular momentum along the internuclear axis
can couple with the spin projection Σ = ±1/2 to yield two spin-orbit levels, ²Π_Ω, with Ω = 1/2 and 3/2. The NO(X ²Π)
state follows so-called Hund's case (a) coupling [34], for which the spin-orbit splitting is much larger than the
rotational energy. In this case, the rotational energy within each spin-orbit level is given approximately by
B[J(J + 1) − Ω²], with half-integral J ≥ Ω. It should be noted that a Π state is orbitally degenerate, i.e. it has two
components of the same energy. In Cartesian notation, these are often indicated as Π_x and Π_y. As a result, the
rotational/fine-structure levels appear as nearly
degenerate pairs, with opposite parity, or symmetry with respect to inversion of all the coordinates through
the space-fixed origin. These pairs of levels are called Λ-doublets.
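
The two approximate energy expressions quoted above, BN(N + 1) for the ²Σ⁺ state and B[J(J + 1) − Ω²] for a case (a) ²Π state, can be evaluated with a few lines of code. This is only a sketch: the function names are ours, fine-structure and Λ-doublet splittings are neglected as in the text, and the rotational constant passed in is whatever value the reader adopts for the molecule of interest.

```python
def sigma_state_levels(B, N):
    """Approximate rotational energy B*N*(N+1) of a 2Sigma+ state.

    Returns the energy (same units as B) and the total angular momenta
    J = N + 1/2 (F1) and J = N - 1/2 (F2); for N = 0 only J = 1/2 exists.
    """
    energy = B * N * (N + 1)
    j_values = [N + 0.5] if N == 0 else [N + 0.5, N - 0.5]
    return energy, j_values


def case_a_energy(B, J, omega):
    """Approximate rotational energy B*[J(J+1) - Omega^2] within one
    spin-orbit level of a Hund's case (a) 2Pi state (requires J >= Omega)."""
    if J < omega:
        raise ValueError("J must be at least Omega")
    return B * (J * (J + 1) - omega ** 2)
```

Note that the lowest level of the Ω = 3/2 manifold is J = 3/2, so the J = 1/2 level appears only in the Ω = 1/2 manifold.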

For high rotational levels, or for a molecule like OH, for which the spin-orbit splitting is small, even for low
J, the pattern of rotational/fine-structure levels approaches the Hund's case (b) limit. In this situation, it is not
meaningful to speak of the projection quantum number Ω. Rather, we first consider the rotational angular
momentum N exclusive of the electron spin. This is then coupled with the spin to yield levels with total
angular momentum J = N + 1/2 and N − 1/2. As before, there are two nearly degenerate pairs of levels associated
with each value of J.


The rotational/fine-structure levels of the lower, ²Π electronic state in figure B2.3.12 are drawn for a
molecule near the case (a) limit since NO falls into this coupling scheme. Also indicated in the figure are the
electric-dipole allowed rotational lines, indicated with conventional spectroscopic notation [34]. In the
spectrum displayed in figure B2.3.11, the individual rotational lines appear to pile up into so-called
'heads' [34] at four distinct wavenumbers. The splitting between the two pairs of heads can be roughly
identified with the NO(X ²Π) spin-orbit splitting.

In a conventional spectroscopic experiment, the intensity of a rotational transition within a given vibrational
band can be written as

$$ I(J', J'') = C' \left[ N_{J''} (2J'' + 1)^{-1} \right] S_{J'J''} $$ (B2.3.12)

where $N_{J''}$ is the density of the rotational/fine-structure level, J″ is its total angular momentum, $S_{J'J''}$ is the
rotational line strength factor [34, 45], and C′ is a proportionality constant. The relative intensities of the
rotational lines can be used with equation (B2.3.12) to derive the rotational/fine-structure state distribution
associated with a given vibrational level. Zare [45] presents a detailed discussion of the calculation of
rotational line strength factors for diatomic electronic transitions.
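
As an illustration of how equation (B2.3.12) is used in practice, the following sketch inverts it for the relative populations: each measured line intensity is multiplied by the degeneracy (2J″ + 1) and divided by the line strength factor. The function name and input layout are assumptions for this example, and the line strength factors would have to be taken from the references cited above.

```python
def rotational_populations(lines):
    """Invert I = C' * [N_J / (2J + 1)] * S for the relative populations N_J.

    `lines` is a list of (J_lower, intensity, line_strength) tuples, one line
    per lower level. The result is normalised to unit sum so that the
    proportionality constant C' drops out.
    """
    raw = {J: intensity * (2 * J + 1) / strength for J, intensity, strength in lines}
    total = sum(raw.values())
    return {J: n / total for J, n in raw.items()}
```

As noted in the text that follows, this simple inversion assumes isotropic excitation and detection and is most reliable when all lines are taken from the same rotational branch.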

Strictly speaking, equation (B2.3.12) does not apply to a measurement of the concentration through laser-induced
fluorescence detection, as would be carried out in the apparatus schematically illustrated in figure
B2.3.9. The rotational line strength factors $S_{J'J''}$ apply to the situation of isotropic irradiation and detection,
which is clearly not the case for irradiation with a unidirectional polarized laser and detection of fluorescence
emitted into a specific solid angle. Greene and Zare [46] have considered in detail the correct relationship
between the molecular density and the intensity for arbitrary fluorescence excitation and detection geometries.
In practice, because of the large angular momentum J often found for reaction products, the factors $S_{J'J''}$
follow fairly closely the J-dependence of the correct line strength factors, as long as lines in the same
rotational branch are employed.

An additional inadequacy of equation (B2.3.12) is the assumption of an isotropic $M_J$ distribution of product
molecules in the detected rotational level. The product could be aligned because of the dynamics of the
reaction. An extreme case is that of alignment imposed by kinematics for the mass combination H + HL →
HH + L, where H and L represent heavy and light atoms, respectively. From angular momentum conservation,
we have $\mathbf{J}_{\rm tot} = \mathbf{L}_i + \mathbf{J}_i = \mathbf{L}_f + \mathbf{J}_f$, where $\mathbf{J}_{\rm tot}$ is the total angular momentum of the system. Here, the vectors $\mathbf{L}$
and $\mathbf{J}$ are the orbital and rotational angular momenta, respectively, and the subscripts i and f denote the
reagents and products, respectively. Because of the small moment of inertia of the HL reagent, $\mathbf{J}_i$ will be
small. Moreover, $\mathbf{L}_f$ will also be small because of the small reduced mass of the HH–L combination of the
products. Thus, we have $\mathbf{L}_i \approx \mathbf{J}_f$. This then implies that the product rotational angular
momentum $\mathbf{J}_f$ must be strongly polarized, with an anisotropic $M_J$ distribution, since $\mathbf{L}_i$ is perpendicular to the
initial relative velocity.

The anisotropy of the product rotational state distribution, or the polarization of the rotational angular
momentum, is most conveniently parametrized through multipole moments of the $M_J$ distribution [45]. Odd
multipoles, such as the dipole, describe the orientation of the angular momentum J, i.e. which way the tips of
the J vectors preferentially point. Even multipoles, such as the quadrupole, describe the alignment of J, i.e.
the spatial distribution of the J vectors, regarded as a collection of double-headed arrows. Orr-Ewing and Zare
[47] have discussed in detail the measurement of orientation and alignment in products of chemical reactions
and what can be learned about the reaction dynamics from these measurements.
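
The distinction between orientation and alignment can be made concrete with a small calculation of the lowest odd and even moments of an M_J distribution. The sketch below uses the semiclassical identification cos θ = M/[J(J + 1)]^{1/2} and evaluates the averages of the first and second Legendre polynomials. This normalisation is an assumption chosen for illustration; the multipole moments defined in [45, 47] differ from these averages by convention-dependent factors.

```python
import math


def mj_moments(J, populations):
    """Return (orientation, alignment) for a distribution over M_J.

    `populations` maps each M value to its (unnormalised) weight and J must
    be greater than zero. With cos(theta) = M / sqrt(J(J+1)):
      orientation = <P1(cos theta)> = <M> / sqrt(J(J+1))
      alignment   = <P2(cos theta)> = <3 M^2 - J(J+1)> / (2 J(J+1))
    """
    jj = J * (J + 1)
    total = sum(populations.values())
    orientation = sum(w * M for M, w in populations.items()) / (total * math.sqrt(jj))
    alignment = sum(w * (3 * M * M - jj) for M, w in populations.items()) / (total * 2 * jj)
    return orientation, alignment
```

An isotropic distribution gives zero for both moments, whereas a distribution confined to M = 0 has no orientation but a negative alignment, corresponding to J vectors lying preferentially perpendicular to the quantization axis.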

At low laser powers, the fluorescence signal is linearly proportional to the power. However, the power
available from most tunable laser systems is sufficient to cause partial saturation of the transition, with the
result that the fluorescence intensity is no longer linearly proportional to the probe laser power. While more
elaborate treatments have been given [48, 49], saturation can be simply described by a rate-equation model of
radiative transitions with the help of the 2-level diagram in figure B2.3.13. If the laser pulse can be
approximated as a rectangular pulse of length T, then the fraction f of molecules originally in the lower state
which are excited is

$$ f = \frac{W_{12}}{W_{12} + W_{21} + A_{21}} \left\{ 1 - \exp\left[ -(W_{12} + W_{21} + A_{21})\,T \right] \right\} $$ (B2.3.13)

The fluorescence signal is linearly proportional to the fraction f of molecules excited. The absorption rate $W_{12}$
and the stimulated emission rate $W_{21}$ are proportional to the laser power. In the limit of low laser power, f is
proportional to the laser power, while this is no longer true at high powers ($W_{12}, W_{21} \gg A_{21}$). Care must thus
be taken in a laser fluorescence experiment to be sure that one is operating in the linear regime, or that proper
account of saturation effects is taken, since transitions with different strengths reach saturation at different
laser powers.

Figure B2.3.13. Model 2-level system describing molecular optical excitation, with first-order excitation rate
constant $W_{12}$ proportional to the laser power, and spontaneous (first-order rate constant $A_{21}$) and stimulated
(first-order rate constant $W_{21}$ proportional to the laser power) emission pathways.
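
The onset of saturation in the two-level rate-equation model can be explored numerically. The sketch below evaluates the excited fraction for a rectangular pulse, i.e. the solution of dN₂/dt = W₁₂N₁ − (W₂₁ + A₂₁)N₂ with all molecules initially in the lower level. The function name and any rate values supplied to it are illustrative assumptions; the rates would in practice follow from the laser power and the transition strength.

```python
import math


def excited_fraction(W12, W21, A21, T):
    """Fraction of molecules excited by a rectangular pulse of length T.

    W12: absorption rate, W21: stimulated emission rate, A21: spontaneous
    emission rate (all in s^-1); T in s. Implements the two-level
    rate-equation result
        f = W12 / (W12 + W21 + A21) * (1 - exp(-(W12 + W21 + A21) * T)).
    """
    k = W12 + W21 + A21
    return (W12 / k) * (1.0 - math.exp(-k * T))
```

With W₁₂ = W₂₁ and rates far below A₂₁, doubling the laser power doubles f; when the pumping rates greatly exceed A₂₁ and 1/T, f approaches the limiting value W₁₂/(W₁₂ + W₂₁) and becomes independent of power.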

Following the procedures outlined above, internal state distributions for the products of the H + NO₂ reaction
have been determined [43, 44, 50]. Comparison of the intensities of various bands of the NO A ²Σ⁺–X ²Π
electronic transition, through equation (B2.3.11), allows determination of the ratio of the populations of the
vibrational levels of the NO product. From spectra such as that in figure B2.3.11, the rotational/fine-structure
state distribution of the NO product in a particular vibrational level can be deduced. Figure B2.3.14 presents
the vibration-rotation state distribution
derived for the NO product [43]. The vibrational state populations monotonically decrease with increasing v,
up to v = 3, the highest detected level [50].

Figure B2.3.14. Experimentally derived vibration-rotation populations for the NO product from the H + NO₂
reaction [43]. The fine-structure labels F₁ and F₂ refer to the two ways that the projections Σ and Λ of the
electron spin and orbital angular momenta along the internuclear axis of this open-shell molecule can be coupled (Ω =
Λ + Σ). (By permission from AIP.)

In the work of Irvine et al [44], the OH product was detected, as illustrated by the fluorescence excitation
spectrum in figure B2.3.15. Since the rotational constant of OH is much larger than that of NO, the spectrum
is much less congested. Since OH(X ²Π) follows Hund's case (b) coupling, the spin-orbit splitting is not
directly reflected in any separations between rotational lines. The distribution in the product OH
rotational/fine-structure levels was determined by the same methods as employed for the analysis of the NO
spectrum. The degree of product OH vibrational excitation was found to be significantly greater than for the
NO product. The H + NO₂ reaction proceeds on the ground state HONO PES, which has a fairly deep well
corresponding to the stable nitrous acid molecule. Because of this well, the collision complex has a transient
existence. However, its lifetime is not sufficiently long that the available energy is randomized through all the
degrees of freedom. The 'new' OH bond is found to have more vibrational energy than the 'old' NO bond.

Figure B2.3.15. Laser fluorescence excitation spectrum of the A ²Σ⁺–X ²Π (1,3) band for the OH product, in
the v = 3 vibrational level, from the H + NO₂ reaction [44]. (By permission from AIP.)

B2.3.3.3 PREPARATION OF REAGENTS 

In most reactive scattering experiments in which the products are detected optically, a thermodynamically
labile species is allowed to react with a stable molecular reagent. In many experiments, this involves allowing
a beam of the unstable species, prepared in a separately pumped vacuum chamber, to impinge upon the
scattering partner in a so-called beam-gas scattering arrangement. The beam is usually prepared by one of the
methods described in section B2.3.2.3, e.g. a high-temperature source of a beam of metal atoms or a
microwave discharge source for a beam of hydrogen atoms. In some cases, two beams are crossed, and the
products are detected in the collision zone [43, 51, 52]. In a few cases, the products of a reaction of two labile
reagents have been studied, e.g. the OD product from the O(³P) + ND₂ reaction [53]. In this study, the oxygen
atoms were prepared in a microwave discharge source while the ND₂ reagent was prepared by laser photolysis
of ND₃.

Many optical studies have employed a quasi-static cell, through which the photolytic precursor of one of the
reagents and the stable molecular reagent are slowly flowed. The reaction is then initiated by laser photolysis
of the precursor, and the products are detected a short time after the photolysis event. To avoid collisional
relaxation of the internal degrees of freedom of the product, the products must be detected within a time that
is short compared to the time between gas-kinetic collisions, which depends inversely upon the total pressure in
the cell. In some cases, for example in the case of the stable NO product from the H + NO₂ reaction discussed in
section B2.3.3.2, the products are not removed by collisions with the walls and may have long residence times
in the apparatus. Studies of such reactions are better carried out with pulsed introduction of the reagents into the
cell or under crossed-beam conditions.
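
The constraint on the photolysis-probe delay can be estimated from elementary hard-sphere kinetic theory, in which the collision frequency of a product molecule is nσ⟨v_rel⟩ with number density n = P/(k_B T) and mean relative speed ⟨v_rel⟩ = (8k_B T/πμ)^{1/2}. The sketch below is an order-of-magnitude estimate only; the function name is ours, and the collision cross section and reduced mass are inputs that the reader must supply for the gas mixture at hand.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def mean_time_between_collisions(pressure_pa, temperature_k, cross_section_m2, reduced_mass_kg):
    """Hard-sphere estimate of the mean time (s) between gas-kinetic collisions.

    pressure_pa: total cell pressure in Pa; cross_section_m2: assumed
    hard-sphere collision cross section; reduced_mass_kg: reduced mass of
    the colliding pair.
    """
    number_density = pressure_pa / (K_B * temperature_k)
    mean_relative_speed = math.sqrt(8.0 * K_B * temperature_k / (math.pi * reduced_mass_kg))
    return 1.0 / (number_density * cross_section_m2 * mean_relative_speed)
```

Because the number density is proportional to the pressure, halving the cell pressure doubles the time available for detecting nascent products.
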

B2.3.3.4 EXTRACTION OF ANGULAR INFORMATION (CORRELATIONS)

With spectroscopic detection of the products, the angular distribution of the products is usually not measured.
In principle, spectroscopic detection of the products can be incorporated into a crossed-beam scattering
experiment of the type described in section B2.3.2. There have been relatively few examples of such studies
because of the great demands on detection sensitivity. The recent work of Keil and co-workers (Dharmasena
et al [16]) on the F + H₂ reaction, mentioned in section B2.3.2, is an excellent example of the implementation
of state-selective optical detection in the measurement of the angular distribution of a reaction product.

The use of photolytically generated reagents, combined with sub-Doppler detection, has allowed the
extraction of information on the angular distribution and also the alignment of the products in experiments
carried out in a cell [54]. The theoretical treatment of Shafer et al [55] shows how, in principle, the reaction
product angular distribution can be extracted from measurement of its laboratory velocity distribution when
one of the reagents is prepared by photolysis. It is well known that the angular distribution of photolytically
formed fragments can be expressed as [45]

$$ P(\theta_{\rm phot}) = \frac{1 + \beta P_2(\cos\theta_{\rm phot})}{4\pi} $$ (B2.3.14)

where $\theta_{\rm phot}$ is the angle between the E vector of the photolysis laser and the fragment recoil direction, $P_2(x)$ is
the second-order Legendre polynomial, and β is the recoil anisotropy parameter. The parameter β can vary
from −1 (perpendicular-type transition) to 2 (parallel-type transition), where the type of transition refers to the
direction of fragment recoil relative to the electronic transition moment of the dissociation transition [45].
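
The behaviour of equation (B2.3.14) at the two limits of the anisotropy parameter can be checked with a short calculation. The sketch below evaluates the distribution and verifies numerically, with a simple midpoint rule, that it is normalised over the full solid angle for any allowed β. The function names are ours and are introduced only for this illustration.

```python
import math


def legendre_p2(x):
    """Second-order Legendre polynomial P2(x) = (3x^2 - 1)/2."""
    return 0.5 * (3.0 * x * x - 1.0)


def photofragment_distribution(theta, beta):
    """P(theta) = [1 + beta * P2(cos theta)] / (4 pi), with theta in radians
    measured from the E vector of the photolysis laser."""
    if not -1.0 <= beta <= 2.0:
        raise ValueError("beta must lie between -1 and 2")
    return (1.0 + beta * legendre_p2(math.cos(theta))) / (4.0 * math.pi)


def integrate_over_sphere(beta, steps=2000):
    """Midpoint-rule integral of P(theta) * 2 pi sin(theta) over 0..pi."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += photofragment_distribution(theta, beta) * 2.0 * math.pi * math.sin(theta) * h
    return total
```

For β = 2 the distribution follows cos²θ and vanishes perpendicular to the E vector, while for β = −1 it follows sin²θ and vanishes along it.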

In the bimolecular collision of the photolytically generated reagent, assumed to have a mass m 1 and laboratory 
speed Vp the centre-of-mass speed will be 

c = m\vi/(m\ +m 2 ) (B2.3.15) 

if the velocity of the co-reagent (mass $m_2$) can be neglected. In this case, the relative velocity vector $\mathbf{v}_{\rm rel}$ and $\mathbf{c}$
are parallel to the laboratory velocity $\mathbf{v}_1$ of the photolytically generated reagent. The CM speed $u'_1$ of the
detected product can be computed with equation (B2.3.8). Figure B2.3.16 illustrates how the laboratory
velocity $\mathbf{v}'_1$ of this product is related to the CM scattering angle, $\theta'$. Shafer et al [55] show that the laboratory
velocity distribution $f(\mathbf{v}'_1)$ of the product is related to the CM differential cross section by

$$f(\mathbf{v}'_1) = (2 v'_1 c u'_1)^{-1} \left(\frac{d\sigma}{d\omega}\right)_{\rm CM} [1 + \beta P_2(\cos\alpha) P_2(\cos\theta_L)] \quad \text{for } |c - u'_1| < v'_1 < (c + u'_1)$$

$$f(\mathbf{v}'_1) = 0 \quad \text{for } v'_1 < |c - u'_1| \text{ or } v'_1 > (c + u'_1)$$ (B2.3.16)

where

$$\cos\alpha = [v'^2_1 + c^2 - u'^2_1]/(2 v'_1 c)$$ (B2.3.17)

and $\theta_L$ is the angle between $\mathbf{v}'_1$ and the E vector of the photolysis laser.
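
The kinematic relations (B2.3.15) and (B2.3.17) can be sketched numerically as follows. This is only an illustration under the stationary co-reagent assumption made in the text; the masses and speeds in the usage example are hypothetical values, not data from any experiment, and the function names are invented for this sketch.

```python
def cm_speed(m1, v1, m2):
    """Equation (B2.3.15): centre-of-mass speed for a stationary co-reagent."""
    return m1 * v1 / (m1 + m2)


def lab_speed_limits(c, u):
    """Range of product laboratory speeds allowed by the velocity triangle."""
    return abs(c - u), c + u


def cos_alpha(v_lab, c, u):
    """Equation (B2.3.17), the law of cosines for the triangle formed by the
    laboratory velocity (v_lab), the CM velocity (c) and the product CM
    velocity (u). All arguments are speeds in the same units."""
    lowest, highest = lab_speed_limits(c, u)
    if not lowest <= v_lab <= highest:
        raise ValueError("laboratory speed outside the kinematically allowed range")
    return (v_lab ** 2 + c ** 2 - u ** 2) / (2.0 * v_lab * c)


if __name__ == "__main__":
    # Hypothetical numbers: masses in amu, speeds in m/s.
    c = cm_speed(m1=35.0, v1=1600.0, m2=16.0)
    u = 900.0
    low, high = lab_speed_limits(c, u)
    print(f"c = {c:.1f} m/s, allowed v' from {low:.1f} to {high:.1f} m/s")
    print(f"cos(alpha) at v' = 1200 m/s: {cos_alpha(1200.0, c, u):.3f}")
```

At the two limits of the allowed range the three velocities are collinear, so the cosine returned by `cos_alpha` has unit magnitude there.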



Figure B2.3.16. Velocity diagram for the reaction of a photolytically generated reagent with an assumed 
stationary co-reagent. In this case, the relative velocity v rel of the reagents is parallel to the velocity c of the 
centre of mass. 

There are several practical limitations to the use of equation (B2.3.16) for the determination of CM angular
distributions. The optimum kinematics for the use of this equation is the case where the speed $c$ of the centre
of mass is approximately equal to the product CM speed $u'_1$. In the limiting case where the latter is small, the
product laboratory distribution is dominated by the angular distribution of the velocity $\mathbf{c}$ of the centre of mass
and nothing can be learned about the product CM angular distribution. In the opposite limiting case where
$u'_1$ is much larger than $c$, the angular distribution of $\mathbf{v}'_1$ is limited by the angular distribution of the
photolytically prepared reagent. Equation (B2.3.16) assumes that the velocity of the co-reagent can be
neglected. This applies to the situation where the reagents are pre-cooled in a supersonic beam expansion
[56]. The effects of thermal averaging have been considered in describing photo-initiated reactions in
room-temperature cells [54]. Information on the anisotropy of the rotational angular momentum can also be
determined through the study of photo-initiated reactions by variation of the direction of the E vector of the
probe laser [54, 57].

B2.3.3.5 OTHER SPECTROSCOPIC TECHNIQUES 

In addition to laser fluorescence excitation, several other laser spectroscopic methods have been found to be 
useful for the state-selective and sensitive detection of products of reactive collisions: resonance-enhanced 
multiphoton ionization [58], coherent anti-Stokes Raman scattering [59], bolometric detection with laser 
excitation [30], and direct infrared absorption [7]. Several additional laser techniques have been developed for 
use in spectroscopic studies or for diagnostics in reacting systems. Of these, four-wave mixing [60] is
applicable to studies of reaction dynamics although it does have a somewhat lower sensitivity than the 
techniques mentioned above. 

The most widely used of these techniques is resonance-enhanced multiphoton ionization (REMPI) [58]. A 
schematic energy-level diagram of the most commonly employed variant (2 + 1) of this detection scheme is 
illustrated in the centre of figure B2.3.8. The molecules are irradiated with the focused output of a tunable laser. As the
wavelength of the laser is tuned through a 2-photon transition in the molecule, there will be some electronic 
excitation. If the photon energy is sufficiently large that the ionization continuum can be reached by 
absorption of an additional photon by the molecule, then molecular ions can be efficiently produced. While 
non-resonant laser ionization is possible, the efficiency of ionization will be strongly enhanced at a 2-photon 
resonance in the molecule. This particular resonant ionization scheme is called 2 + 1 REMPI. Other ionization 
schemes are possible, with different numbers of photons required for electronic excitation and ionization of 
the molecule. 


A REMPI spectrum is usually recorded by monitoring the molecular ion signal as the laser wavelength is 
scanned, although it is also possible to record the spectrum by monitoring the photoelectron signal. In most 
applications of REMPI, the laser-produced ions are mass analysed in a time-of-flight mass spectrometer
(TOFMS) [61] in order to detect the desired molecular ion in the presence of background ions, for example
from non-resonant ionization of other species in the reaction chamber. This mass discrimination is particularly 
important in probing chemical reaction products since the product ion signal could be obscured by a small 
degree of non-resonant ionization of reagent molecules present at much higher concentrations than the 
product. 

This technique permits the spectroscopic detection both of molecules, such as H2 and HCl,
whose first electronic transition lies in the vacuum ultraviolet spectral region, for which laser excitation is
possible but inconvenient [62], and of molecules such as CH3 that do not fluoresce. With 2-photon excitation, the
required wavelengths are in the ultraviolet, conveniently generated by frequency-doubled dye lasers, rather
than in the vacuum ultraviolet as for 1-photon excitation. Figure B2.3.17 displays 2 + 1 REMPI spectra of the HCl
and DCl products, both in their v = 0 vibrational levels, from the Cl + (CH3)3CD reaction [63]. For some
electronic states of HCl/DCl, both parent and fragment ions are produced, and the spectrum in figure B2.3.17
for the DCl product was recorded by monitoring mass 2 (D$^+$) ions. In this case, both isotopomers (D$^{35}$Cl and
D$^{37}$Cl) are detected.

In the ideal case for REMPI, the efficiency of ion production is proportional to the line strength factors for 2- 
photon excitation [64], since the ionization step can be taken to have a wavelength- and state-independent 
efficiency. In actual practice, fragment ions can be produced upon absorption of a fourth photon, or the 
ionization efficiency can be reduced through predissociation of the electronically excited state. It is advisable 
to employ experimentally measured ionization efficiency line strength factors to calibrate the detection 
sensitivity. With sufficient knowledge of the excited molecular electronic states, it is possible to understand 
the state dependence of these intensity factors [65].

Product angular and velocity distributions can be measured with REMPI detection, in a manner similar to the Doppler probing
in a laser-induced fluorescence experiment discussed in section B2.3.3.5. With appropriate time- and space-resolved
ion detection, it is possible, in principle, to determine the three-dimensional velocity distribution of a
product (see equation (B2.3.16)). The time-of-arrival of a particular mass in the TOFMS will be broadened by
the velocity of the neutral molecule being detected. In some modes of operation of a TOFMS, e.g. space-focusing
conditions [61], the shift of the arrival time from the centre of a mass peak is proportional to the
projection of the molecular velocity along the TOFMS axis. In addition, Doppler tuning of the probe laser
allows one component of the velocity perpendicular to the TOFMS axis to be determined. A more general
approach for the two-dimensional velocity distribution in the plane perpendicular to the TOFMS direction
involves the use of imaging detectors [66].

Figure B2.3.17. REMPI spectra of the HCl and DCl products from the reaction of Cl atoms with (CH3)3CD
[63]. The mass 36 and 2 ion signals are plotted as a function of the 2-photon wavenumber. Assignments of the
Q-branch lines ($\Delta J = 0$) of the $E\,^1\Sigma^+ - X\,^1\Sigma^+$ (0,0) bands of H$^{35}$Cl and D$^{35}$Cl are given. In (b), both the D$^{35}$Cl
and D$^{37}$Cl isotopomers are observed since D$^+$ ions are monitored.

Welge and co-workers (Schnieder et al [67]) have developed a resonant ionization technique for hydrogen
atoms which allows the determination of the velocity to ~0.3% by a time-of-flight method. The hydrogen
atoms are sequentially irradiated in the detection zone with a 121.6 nm laser, which is resonant with the n = 2
← n = 1 transition, and ~365 nm light to produce high-n Rydberg atoms:

$$\mathrm{H}(n = 1) + 121.6\ \mathrm{nm} \rightarrow \mathrm{H}^{*}(n = 2) + {\sim}365\ \mathrm{nm} \rightarrow \mathrm{H}^{**}(n \approx 40)$$ (B2.3.18)


The Rydberg atoms are allowed to drift through a ~1 m flight path, after which they are field ionized with a
strong electric field, and the resulting ions collected with a particle detector. The key to the high-velocity 
resolution of this technique is to ionize the atoms far from the laser interaction region. By contrast, when the 
ions are produced in this region, space-charge effects can lead to a significant velocity spread. This detection 
technique has been applied to the study of the H + D 2 reaction through measurement of the velocity 
distribution of the D atom products [68]. The laboratory velocities of D atoms formed in coincidence with 
specific HD vibration-rotation states were resolved, and angularly resolved differential cross sections for the 
formation of HD products in specific states were determined. This H atom detection technique has also been 
extensively employed for the study of the dynamics of the photodissociation of hydride molecules [69]. 

Coherent anti-Stokes Raman spectroscopy (CARS) [59] has also found utility in the determination of the
internal state distributions of products of chemical reactions. This is one of several coherent Raman
spectroscopies based on the existence of vibrational resonances in four-wave mixing. As illustrated in the
schematic energy level diagram on the right-hand side of figure B2.3.8, electric fields at three frequencies
are mixed to produce a fourth field.
In most CARS experiments, $\nu_1$ is held fixed, usually at 532 nm, the second harmonic of a Nd:YAG laser
output, while $\nu_2$ is scanned. The intensity of the output field at $\nu_3$ is enhanced whenever the difference $\nu_1 - \nu_2$
matches the energy difference between two molecular levels connected by a Raman transition. Unlike the
normal, spontaneous Raman process, this mixing is coherent, and the output light is a coherent beam
propagating in a particular direction. The high intensity and directionality provide a great increase in
detection sensitivity. In reactive scattering experiments, the CARS technique has been employed for the
determination of the vibration-rotation state distributions of H2 and HD products in reactions yielding
hydrogen molecular products, e.g. the H + D2 [70] and H + HX (X = halogen) [71] reactions.

Recently, the state-selective detection of reaction products through infrared absorption on vibrational 
transitions has been achieved and applied to the study of HF products from the F + H 2 reaction by Nesbitt and 
co-workers (Chapman et al [7]). The relatively low sensitivity for direct absorption has been circumvented by 
the use of a multi-pass absorption arrangement with a narrow-band tunable infrared laser and dual beam 
differential detection of the incident and transmitted beams on matched detectors. A particular advantage of
probing the products through absorption is that the absolute concentration of the product molecules in a given 
vibration-rotation state can be determined. 


B2.3.4 CONCLUSION 

The molecular beam and laser techniques described in this section, especially in combination with theoretical 
treatments using accurate PESs and a quantum mechanical description of the collisional event, have revealed 
considerable detail about the dynamics of chemical reactions. Several aspects of reactive scattering are 
currently drawing special attention. The measurement of vector correlations, for example as described in
section B2.3.3.5, continues to be of particular interest, especially the interplay between the product angular
distribution and rotational polarization. 

In most theoretical treatments of the collision dynamics, the reaction is assumed to proceed on a single PES. 
However, reactions involving open-shell reagents or products will involve several PESs. For example, in the F
+ H2 reaction, discussed in section B2.3.2.4, three PESs emanate from the separated reagents, of which only
one leads to the H + HF products. The F atom ground $^2P_{3/2}$ spin-orbit state is connected to the products by
this reactive PES, while the excited $^2P_{1/2}$ state is not. Nevertheless, the $^2P_{1/2}$ state does react with H2 to a
small extent because of non-adiabatic transitions between the PESs. There is considerable current interest in
elucidating the role of such non-adiabatic transitions in collision dynamics.

The reaction of an atom with a diatomic molecule is the prototype of a chemical reaction. As the dynamics of 
a number of atom-diatom reactions are being understood in detail, attention is now being turned to the study 
of the dynamics of reactions involving larger molecules. The reaction of Cl atoms with small aliphatic
hydrocarbons is an example of the type of polyatomic reactions which are now being studied [56, 63, 72, 73].
The idea of controlling the outcome of a chemical reaction by exciting a particular bond in a reagent has long 
held considerable appeal. Such bond-selected chemistry has been achieved with simple triatomic reagents 
such as partially deuterated water, HOD, by preparation of the reagent in a suitable vibrational level [74]. 
Current interest is focused on the extension to larger reagents. This is more difficult than in triatomics because 
of intramolecular redistribution of the initial excitation [75], which becomes more rapid in larger molecules.

B2.4 NMR methods for studying exchanging systems


Alex D Bain 


B2.4.1 INTRODUCTION 

No molecule is completely rigid and fixed. Molecules vibrate, parts of a molecule may rotate internally, weak 
bonds break and re-form. Nuclear magnetic resonance spectroscopy (NMR) is particularly well suited to 
observe an important class of these motions and rearrangements. An example is the restricted rotation about 
bonds, which can cause dramatic effects in the NMR spectrum (figure B2.4.1).

Figure B2.4.1. Proton NMR spectra of the N,N-dimethyl groups in 3-dimethylamino-7-methyl-1,2,4-benzotriazine,
as a function of temperature. Because of partial double-bond character, there is restricted
rotation about the bond between the dimethylamino group and the ring. As the temperature is raised, the rate
of rotation around the bond increases and the NMR signals of the two methyl groups broaden and coalesce.


These exchanges often occur while the system is in macroscopic equilibrium — the sample itself remains the 
same and the dynamics may be invisible to other techniques. It is merely the environment of a given nucleus 
that changes. Since NMR follows an individual nucleus, it can easily follow these dynamic processes. This is 
just one of several reasons that the study of chemical exchange by NMR is important. 

First is the observation of the phenomenon itself: the exchange of axial and equatorial ligands in trigonal
bipyramidal species, the scrambling of carbonyl ligands in metal complexes and the dynamic behaviour of rings
were mainly revealed and studied in detail by NMR methods. Not only does NMR give a detailed picture of
the mechanism of the exchange, but it also provides excellent ways of measuring the reaction rate.

Secondly, NMR is a good example of spectroscopy in general. The spectroscopic transition probability can be 
shown to have a simple physical interpretation in NMR: the total magnetization is divided amongst individual 
observable transitions and the intensity of a transition is related to its share of the total. This can be further 
generalized to exchanging systems, in which the transition probability now becomes a complex number. The 
exchange lineshapes can be decomposed into a sum of transitions, whose phase, intensity, position and 
linewidths are governed by the real and imaginary parts of the transition probability. 


Finally, exchange is a kinetic process and is governed by absolute rate theory. Therefore, study of the rate as a
function of temperature can provide thermodynamic data on the transition state, according to equation
(B2.4.1). This equation, in which k is Boltzmann's constant and h is Planck's constant, relates the observed
rate to the Gibbs free energy of activation, $\Delta G^\ddagger$.

$$\text{rate} = \frac{kT}{h} e^{-\Delta G^\ddagger/RT} = \frac{kT}{h} e^{-\Delta H^\ddagger/RT} e^{\Delta S^\ddagger/R}$$ (B2.4.1)


In order to separate the enthalpy and the entropy of activation, the rate is measured as a function of
temperature. These data should give a straight line on an Eyring plot of log(rate/T) against 1/T (figure
B2.4.2). The slope of the line gives $\Delta H^\ddagger$, and the intercept at 1/T = 0 is related to $\Delta S^\ddagger$. A unimolecular
reaction, such as many cases of exchange, might be expected to have a very small entropy change on going to
the transition state. However, several systems have shown significant entropy contributions: entropy can
make up more than 10% of the barrier. It is therefore important to measure the rates over as wide a range of
temperatures as possible to obtain reliable thermodynamic data on the transition state.
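
The Eyring analysis described above can be sketched in a few lines of Python. Taking natural logarithms of the Eyring equation gives ln(rate/T) = ln(k/h) + ΔS‡/R − ΔH‡/(RT), so a straight-line fit of ln(rate/T) against 1/T has slope −ΔH‡/R and intercept ln(k/h) + ΔS‡/R. The function names and the activation parameters in the usage example are hypothetical and serve only to generate synthetic data; they are not measured values for any system discussed here.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # gas constant, J/(mol K)


def eyring_rate(temperature, delta_h, delta_s):
    """Rate constant (1/s) from the Eyring equation.

    delta_h in J/mol, delta_s in J/(mol K), temperature in K."""
    return (K_B * temperature / H) * math.exp(-delta_h / (R * temperature)) \
        * math.exp(delta_s / R)


def eyring_fit(temperatures, rates):
    """Least-squares line through ln(rate/T) versus 1/T.

    Returns (delta_h, delta_s) in J/mol and J/(mol K)."""
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(rate / t) for rate, t in zip(rates, temperatures)]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    delta_h = -slope * R
    delta_s = (intercept - math.log(K_B / H)) * R
    return delta_h, delta_s


if __name__ == "__main__":
    # Synthetic data from hypothetical activation parameters.
    temps = [200.0, 230.0, 260.0, 290.0, 320.0, 350.0]
    rates = [eyring_rate(t, 45.0e3, -20.0) for t in temps]
    dh, ds = eyring_fit(temps, rates)
    print(f"Delta H = {dh / 1000.0:.2f} kJ/mol, Delta S = {ds:.2f} J/(mol K)")
```

Because the intercept is an extrapolation to 1/T = 0, far outside the measured range, the entropy of activation is much more sensitive to scatter in the data than the enthalpy, which is why a wide temperature range matters.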

There are several ways of measuring exchange rates with NMR, each with its own optimum range. Figure 
B2.4.2 illustrates this. A combination of spin-spin relaxation time (T 2 ) measurements at high temperature, 
bandshape analysis in the intermediate regime and selective inversions at low temperature, gives rates over a 
range of about five orders of magnitude. This provides a very well defined Eyring plot. The first fully 
analysed example of chemical exchange in NMR was the proton spectrum of dimethylamides [1], observed at
about the same time (and in the same laboratory) as the phenomenon of scalar coupling between nuclei.
Figure B2.4.1 illustrates this type of behaviour. If there is no rotation about the bond joining the N,N-dimethyl
group to the ring, the proton NMR signals of the two methyl groups will have different chemical
shifts. If the rotation were very fast, then the two methyl environments would be exchanged very quickly and
only a single, average, methyl peak would appear in the proton NMR spectrum. Between these two extremes,
spectra like those in figure B2.4.1 are observed. At low temperature, when the rate is slow, two
sharp lines are seen. As the temperature is increased, the exchange becomes faster, the lines broaden, coalesce
into a single line and finally become a single sharp line at the average chemical shift. This is perhaps the most
familiar manifestation of chemical exchange in NMR.

Figure B2.4.2. Eyring plot of log(rate/T) versus 1/T, where T is absolute temperature, for the cis-trans
isomerism of the aldehyde group in furfural. Rates were obtained from three different experiments: $T_2$
measurements (squares), bandshapes (triangles) and selective inversions (circles). The line is a linear
regression to the data. With natural logarithms, the slope of the line is $-\Delta H^\ddagger/R$, and the intercept at 1/T = 0 is
$\ln(k/h) + \Delta S^\ddagger/R$, where R is the gas constant. $\Delta H^\ddagger$ and $\Delta S^\ddagger$ are the enthalpy and entropy of activation,
according to equation (B2.4.1).

The different types of chemical exchange in NMR are classified according to the rate relative to some NMR 
timescale. The example in figure B2.4.1 is called intermediate exchange, in which the exchange rate is 
comparable to the chemical shift differences and coupling constants. Intermediate exchange gives an array of 
unusual and characteristic lineshapes in the spectrum, which can be explained quite neatly in terms of a 
generalized transition probability. Fast exchange is the regime well after coalescence, when only a single line 
is observed. There is still observable broadening due to exchange, and rates are measured from the linewidth, 
or equivalently, the spin-spin relaxation time, $T_2$. In slow exchange, no dramatic line broadening is observed,
but the exchange rate is comparable to the reciprocal of the spin-lattice relaxation time, $T_1$. In this regime,
modifications of the inversion-recovery experiment, or techniques related to the nuclear Overhauser effect 
(NOE), are used to measure rates. 

The timescale is just one sub-classification of chemical exchange. It can be further divided into coupled 
versus uncoupled systems, mutual or non-mutual exchange, inter- or intra-molecular processes and solids 
versus liquids. However, all of these can be treated in a consistent and clear fashion. 


The NMR experimental methods for studying chemical exchange are all fairly routine experiments, used in 
many other NMR contexts. To interpret these results, a numerical model of the exchange, as a function of 
rate, is fitted to the experimental data. It is therefore necessary to look at the theory behind the effects of 
chemical exchange. Much of the theory is developed for intermediate exchange, and this is the most complex 
case. However, with this theory, all of the rest of chemical exchange can be understood. 


B2.4.2 INTERMEDIATE EXCHANGE 

B2.4.2.1 INTRODUCTION 

Figure B2.4.1 shows the lineshape for intermediate chemical exchange between two equally populated sites 
without scalar coupling. For more complicated spin systems, the lineshapes are more complicated as well, 
since a spin may retain its coupling information even though its chemical shift changes in the exchange. 

Figure B2.4.3 shows an example of this in the aldehyde proton spectrum of $^{15}$N-labelled formamide. Some
lines in the spectrum remain sharp, while others broaden and coalesce. There is no fundamental difference
between the lineshapes in figure B2.4.1 and figure B2.4.3, only a difference in the size of the matrices
involved. First, the uncoupled case will be discussed, then the extension to coupled spin systems.


[Figure B2.4.3: stacked spectra recorded at 303, 323, 343 and 363 K; horizontal axis is chemical shift in ppm, approximately 8.35 to 8.65.]

Figure B2.4.3. Proton NMR spectrum of the aldehyde proton in $^{15}$N-labelled formamide. This proton has
couplings of 1.76 Hz and 13.55 Hz to the two amino protons, and a coupling of 15.0 Hz to the $^{15}$N nucleus.
The outer lines in the spectrum remain sharp, since they represent the sum of the couplings, which is
unaffected by the exchange. The inner lines of the multiplet broaden and coalesce, as in figure B2.4.1. The
other peaks in the 303 K spectrum are due to the NH2 protons, whose chemical shifts are even more
temperature dependent than that of the aldehyde proton.

The original analysis of the spectra in figure B2.4.1 was done by the groups of Gutowsky [1] and McConnell
[2], both of whom treated the spectrum as a whole in the frequency domain. Reeves showed [3] somewhat
later that the two-site exchange lineshape (figure B2.4.4(a) and figure B2.4.4(b)) can be deconstructed into
two transitions. More recently, it was demonstrated [4] that these transitions are defined by a generalized
transition probability. This transition probability (now a complex number) is just the product of how
much coherence a transition receives at the start of an experiment and how much the transition contributes
to the total signal. This leads naturally into a discussion of the xy magnetizations in the time domain. As with
most NMR, the choice of the time domain or the frequency domain depends on the problem.

Figure B2.4.4. The two-site equally populated exchange lineshape (figure B2.4.1) decomposed into two
individual transitions. The bottom spectrum (a) is the situation before coalescence: two symmetrically out-of-
phase lines. In slow exchange, these become the signals of the two sites. The top spectrum (b) is after
coalescence: the lineshape is made up of two central lines, one positive and one negative. In fast exchange, the
negative line broadens and loses intensity, to leave a single positive line at the average chemical shift.

B2.4.2.2 THE BLOCH EQUATIONS APPROACH 

The Bloch equations for the motion of the x and y magnetizations (usually called the u- and v-mode signals),
in the presence of a weak radiofrequency (RF) field, $B_1$, are given in equation (B2.4.2).

$$\frac{du}{dt} + \frac{u}{T_2} - (\omega_0 - \omega) v = 0$$

$$\frac{dv}{dt} + \frac{v}{T_2} + (\omega_0 - \omega) u = \gamma B_1 M_z$$ (B2.4.2)

In this equation, $\omega$ is the frequency of the RF irradiation, $\omega_0$ is the Larmor frequency of the spin, $T_2$ is the
spin-spin relaxation time and $M_z$ is the z magnetization of the spin system. The notation can be simplified
somewhat by defining a complex magnetization, M, as in equation (B2.4.3).

$$M = u + iv$$ (B2.4.3)

With this definition, the Bloch equations can be written as in equation (B2.4.4).

$$\frac{dM}{dt} + i(\omega_0 - \omega) M + \frac{1}{T_2} M = i\gamma B_1 M_z$$ (B2.4.4)

In chemical exchange, the two exchanging sites, A and B, will have different Larmor frequencies, $\omega_A$ and $\omega_B$.
Assuming equal populations in the two sites, and the rate of exchange to be k, the two coupled Bloch
equations for the two sites are given by equation (B2.4.5).

$$\frac{dM_A}{dt} + i(\omega_A - \omega) M_A + \frac{1}{T_2} M_A - k M_B + k M_A = i\gamma B_1 M_{zA}$$

$$\frac{dM_B}{dt} + i(\omega_B - \omega) M_B + \frac{1}{T_2} M_B - k M_A + k M_B = i\gamma B_1 M_{zB}$$ (B2.4.5)

The observable NMR signal is the imaginary part of the sum of the two steady-state magnetizations, $M_A$ and
$M_B$. The steady state implies that the time derivatives are zero and a little further calculation (and neglect of
$T_2$ terms) gives the NMR spectrum of an exchanging system as equation (B2.4.6), in which $M_z$ is the common
z magnetization of each of the two equally populated sites.

$$v = \gamma B_1 M_z \frac{k(\omega_A - \omega_B)^2}{(\omega_A - \omega)^2 (\omega_B - \omega)^2 + k^2 (\omega_A + \omega_B - 2\omega)^2}$$ (B2.4.6)

B2.4.2.3 MATRIX FORMULATION OF CHEMICAL EXCHANGE

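Equation (B2.4.5) can be re-written in a matrix form [5] as equation (B2.4.7).

$$\frac{d}{dt}\begin{pmatrix} M_A \\ M_B \end{pmatrix} + (i\mathbf{L} + \mathbf{R} + \mathbf{K})\begin{pmatrix} M_A \\ M_B \end{pmatrix} = i\gamma B_1 \begin{pmatrix} M_{zA} \\ M_{zB} \end{pmatrix}$$ (B2.4.7)

In this equation, the matrices L, R and K are given by equation (B2.4.8), equation (B2.4.9) and equation
(B2.4.10).

$$\mathbf{L} = \begin{pmatrix} \omega_A - \omega & 0 \\ 0 & \omega_B - \omega \end{pmatrix}$$ (B2.4.8)

$$\mathbf{R} = \begin{pmatrix} 1/T_2 & 0 \\ 0 & 1/T_2 \end{pmatrix}$$ (B2.4.9)

$$\mathbf{K} = \begin{pmatrix} k & -k \\ -k & k \end{pmatrix}$$ (B2.4.10)

The steady-state solution without saturation to this equation is obtained by setting the time derivatives to zero
and taking the terms linear in $B_1$, as in equation (B2.4.11).

$$\begin{pmatrix} M_A \\ M_B \end{pmatrix} = i\gamma B_1 (i\mathbf{L} + \mathbf{R} + \mathbf{K})^{-1} \begin{pmatrix} M_{zA} \\ M_{zB} \end{pmatrix}$$ (B2.4.11)

Recall that L contains the frequency $\omega$ (equation (B2.4.8)). To trace out a spectrum, equation (B2.4.11) is
solved for each frequency. In order to obtain the observed signal v, the sum of the two individual
magnetizations can be written as the dot product of two vectors, equation (B2.4.12).

$$v = \mathrm{Im}\left[ \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} M_A \\ M_B \end{pmatrix} \right]$$ (B2.4.12)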




This apparently artificial way of re-writing the Bloch equations is important, since this form applies to all 
exchanging systems — coupled or uncoupled — in the frequency domain. The description starts with the 
equilibrium z magnetizations. These are affected by all the NMR interactions: chemical shifts, relaxation and 
exchange. Finally, the observed signal is detected. This is the standard preparation-evolution-detection 
paradigm used in multi-dimensional NMR. There may be algebraic and numerical complications in setting up 
and solving the equations for different systems, but the form remains the same for all frequency-domain 
calculations. 
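
The frequency-domain recipe of this section (build the matrix iL + R + K at each RF frequency, invert it, sum the site magnetizations and take the imaginary part) can be illustrated with a short Python sketch for the two-site, equally populated case. The sign convention (a driving term of +iγB1 times the z magnetization), the unit prefactors and the function names are assumptions of this sketch, and the frequencies in the usage example are arbitrary illustrative numbers in rad/s.

```python
def two_site_signal(omega, omega_a, omega_b, k, t2, gamma_b1=1.0, mz=1.0):
    """Steady-state v-mode signal at RF frequency omega for two equally
    populated, uncoupled sites exchanging at rate k."""
    # Elements of A = iL + R + K for the two-site problem.
    a11 = 1j * (omega_a - omega) + 1.0 / t2 + k
    a22 = 1j * (omega_b - omega) + 1.0 / t2 + k
    a12 = -k
    a21 = -k
    det = a11 * a22 - a12 * a21
    # (M_A, M_B) = i*gamma*B1 * A^-1 applied to (mz, mz); 2x2 inverse by hand.
    m_a = 1j * gamma_b1 * mz * (a22 - a12) / det
    m_b = 1j * gamma_b1 * mz * (a11 - a21) / det
    # Observed signal: imaginary part of the summed magnetization.
    return (m_a + m_b).imag


def spectrum(omegas, omega_a, omega_b, k, t2):
    """Trace out the spectrum by solving the problem at each frequency."""
    return [two_site_signal(w, omega_a, omega_b, k, t2) for w in omegas]


def peak_position(omegas, values):
    """Frequency at which the computed signal is largest."""
    best = max(range(len(values)), key=lambda i: values[i])
    return omegas[best]


if __name__ == "__main__":
    grid = [-20.0 + 0.05 * i for i in range(801)]
    for rate in (0.5, 10.0, 200.0):
        values = spectrum(grid, 10.0, -10.0, rate, t2=1.0)
        print(f"k = {rate:6.1f}: maximum at omega = {peak_position(grid, values):+.2f}")
```

For larger spin systems only the construction of the matrices changes; the solve-and-project step at each frequency keeps the same form, although a numerical linear algebra routine would replace the hand-written 2 x 2 inverse.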

B2.4.2.4 CHEMICAL EXCHANGE IN THE TIME DOMAIN 

If the magnetizations, $M_A$ and $M_B$, are created (by a pulse) at time zero, and then the $B_1$ magnetic field is
turned off, equation (B2.4.7) can be simplified to equation (B2.4.13). Note that $\omega$ in the matrix L (equation
(B2.4.8)) is also zero.

$$\frac{d}{dt}\begin{pmatrix} M_A \\ M_B \end{pmatrix} = -(i\mathbf{L} + \mathbf{R} + \mathbf{K})\begin{pmatrix} M_A \\ M_B \end{pmatrix}$$ (B2.4.13)

Equation (B2.4.13) is a pair of first-order differential equations, so its formal solution is given by equation
(B2.4.14), in which exp() means the exponential of a matrix.

$$\begin{pmatrix} M_A(t) \\ M_B(t) \end{pmatrix} = \exp[-(i\mathbf{L} + \mathbf{R} + \mathbf{K})t]\begin{pmatrix} M_A(0) \\ M_B(0) \end{pmatrix}$$ (B2.4.14)

This is the description of NMR chemical exchange in the time domain. Note that this equation and equation
(B2.4.11) are Fourier transforms of each other. The time-domain and frequency-domain pictures are always
related in this way.

In practice, the matrix (iL+R+K) is diagonalized first, with a matrix of eigenvectors, U, as in equation
(B2.4.15), to give a diagonal matrix, $\mathbf{\Lambda}$, with the eigenvalues, $\lambda_i$, of this matrix down the diagonal.

$$\mathbf{\Lambda} = \mathbf{U}^{-1}(i\mathbf{L} + \mathbf{R} + \mathbf{K})\mathbf{U}$$ (B2.4.15)

Equation (B2.4.14) becomes equation (B2.4.16).

$$\begin{pmatrix} M_A(t) \\ M_B(t) \end{pmatrix} = \mathbf{U}\exp(-\mathbf{\Lambda}t)\mathbf{U}^{-1}\begin{pmatrix} M_A(0) \\ M_B(0) \end{pmatrix}$$ (B2.4.16)

The exponential of a diagonal matrix is again a diagonal matrix with exponentials of the diagonal elements,
equation (B2.4.17).

$$\exp(-\mathbf{\Lambda}t) = \begin{pmatrix} e^{-\lambda_1 t} & 0 \\ 0 & e^{-\lambda_2 t} \end{pmatrix}$$ (B2.4.17)

As was mentioned above, the observed signal is the imaginary part of the sum of $M_A$ and $M_B$, so equation
(B2.4.17) predicts that the observed signal will be the sum of two exponentials, evolving at the complex
frequencies $\lambda_1$ and $\lambda_2$. This is the free induction decay (FID). In the limit of no exchange, the two frequencies
are simply $i\omega_A$ and $i\omega_B$, as expected. When k is non-zero, the situation is more complex.

Without relaxation and exchange, L is a Hermitian matrix with real eigenvalues and eigenvectors. However,
when the exchange contributes significantly, the Hermitian character is lost and the eigenvalues and
eigenvectors have both real and imaginary parts. The eigenvalues are given by the roots of the characteristic
equation, (B2.4.18), in which $\delta$ is $(\omega_A - \omega_B)/2$ and the frequency origin is taken at the average chemical shift.

$$\begin{vmatrix} i\delta + 1/T_2 + k - \lambda & -k \\ -k & -i\delta + 1/T_2 + k - \lambda \end{vmatrix} = 0$$ (B2.4.18)

The eigenvalues of equation (B2.4.16) are given in equation (B2.4.19).

$$\lambda = (1/T_2 + k) \pm \sqrt{k^2 - \delta^2}$$ (B2.4.19)


These eigenvalues are the (complex) frequencies of the lines in the spectrum: the imaginary part gives the
oscillation frequency and the real part gives the rate of decay. If $k < \delta$ (slow exchange) then there are two
different imaginary frequencies, which become $\pm\delta$ in the limit of small k. Figure B2.4.4(a) shows this
decomposition. In fast exchange, when k exceeds the shift difference, $\delta$, the quantity in the square root in
equation (B2.4.19) becomes positive, so the roots are pure real. This means that the spectrum is still two lines,
but they are both at the average chemical shift (offset of zero) and have different widths (figure B2.4.4(b)).
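
The behaviour of the two eigenvalues on either side of coalescence can be checked with a minimal Python sketch. It evaluates the roots of the two-site characteristic equation, λ = (1/T2 + k) ± sqrt(k² − δ²), with δ equal to half the shift difference; the function name and the numerical values are hypothetical, chosen only for illustration.

```python
import cmath


def exchange_eigenvalues(k, delta, t2):
    """Roots of the two-site characteristic equation.

    k:     exchange rate
    delta: half the difference of the two Larmor frequencies
    t2:    spin-spin relaxation time
    Returns (lambda_plus, lambda_minus) as complex numbers.
    """
    root = cmath.sqrt(k * k - delta * delta)
    base = 1.0 / t2 + k
    return base + root, base - root


if __name__ == "__main__":
    delta = 5.0
    for rate in (1.0, 3.0, 5.0, 8.0, 50.0):
        plus, minus = exchange_eigenvalues(rate, delta, t2=2.0)
        print(f"k = {rate:5.1f}: lambda+ = {plus:.3f}, lambda- = {minus:.3f}")
```

Below coalescence the two roots share one decay rate and differ in the sign of the oscillation frequency; above it both frequencies are zero and the decay rates split, one line narrowing towards 1/T2 and the other broadening towards 1/T2 + 2k.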

It is convenient, for simple systems, to have explicit expressions for equation (B2.4.17). Since the original
matrix is non-Hermitian, the matrix formed by the eigenvectors will not be unitary, and will have four
independent complex elements. Let them be a, b, c and d, so that U is given by equation (B2.4.20).

$$\mathbf{U} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$ (B2.4.20)

Regardless of whether U is unitary, its inverse is given by equation (B2.4.21), where $\Delta$ is the determinant of
the matrix.

$$\mathbf{U}^{-1} = \frac{1}{\Delta}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$ (B2.4.21)

Equation (B2.4.16) then says that the signal, for equal initial magnetizations in the two sites (taken as unity),
is given by equation (B2.4.22), regardless of slow or fast exchange.

$$\text{Signal} = \frac{(a + c)(d - b)}{\Delta} e^{-\lambda_1 t} + \frac{(b + d)(a - c)}{\Delta} e^{-\lambda_2 t}$$ (B2.4.22)

For slow exchange, a convenient matrix of eigenvectors is given by equation (B2.4.23).

$$\mathbf{U} = \begin{pmatrix} k & i(\sqrt{\delta^2 - k^2} + \delta) \\ i(\sqrt{\delta^2 - k^2} + \delta) & -k \end{pmatrix}$$ (B2.4.23)

After coalescence, a possible set of eigenvectors is given in equation (B2.4.24). If these are substituted into
(B2.4.22), the results are pure real, reflecting the fact that $k^2 - \delta^2$ is now positive.

$$\mathbf{U} = \begin{pmatrix} k & \sqrt{k^2 - \delta^2} + i\delta \\ \sqrt{k^2 - \delta^2} + i\delta & -k \end{pmatrix}$$ (B2.4.24)

Because of the role of the eigenvectors in equation (B2.4.16) , the factor (amplitude) multiplying the complex 
exponential is itself complex. The magnitude of the complex amplitude gives the intensity of the line and its 
phase gives the phase of the line (the mixture of absorption and dispersion). In slow exchange, the two lines 
have the same real part, but the imaginary parts have opposite signs, so the phase distortion is opposite, as in
figure B2.4.4(a). The sum of these distorted lineshapes gives the familiar coalescence spectrum. In fast
exchange, the two lines are both in phase, but one line is negative (figure B2.4.4(b)). This negative line is very
broad, and decreases in absolute intensity as the rate increases, leaving only the single, positive, in-phase line 
for fast exchange. 
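
The statements above about the complex amplitudes can be verified numerically. The Python sketch below builds one valid eigenvector matrix for the two-site problem, with columns (k, iδ − s) and (k, iδ + s) where s = sqrt(k² − δ²), and evaluates the amplitude of each exponential for unit initial magnetization on both sites. This particular eigenvector choice and the function name are assumptions of the sketch; eigenvectors are defined only up to a scale factor, and the amplitudes do not depend on that scaling.

```python
import cmath


def line_amplitudes(k, delta):
    """Complex amplitudes of the two decaying exponentials in the FID.

    Returns (amp_plus, amp_minus): amp_plus multiplies exp(-lambda_plus * t),
    where lambda_plus = 1/T2 + k + sqrt(k^2 - delta^2), and amp_minus
    multiplies exp(-lambda_minus * t). Requires k > 0 and k != delta.
    """
    s = cmath.sqrt(k * k - delta * delta)
    if k <= 0.0 or s == 0:
        raise ValueError("need k > 0 and k different from delta")
    # Columns of U are eigenvectors of [[i*delta + k, -k], [-k, -i*delta + k]].
    a, b = k, k
    c, d = 1j * delta - s, 1j * delta + s
    det = a * d - b * c
    # Row vector (1, 1) times U, and U^-1 times column vector (1, 1).
    amp_plus = (a + c) * (d - b) / det
    amp_minus = (b + d) * (a - c) / det
    return amp_plus, amp_minus


if __name__ == "__main__":
    for rate, half_shift in ((3.0, 5.0), (5.0, 3.0), (50.0, 3.0)):
        plus, minus = line_amplitudes(rate, half_shift)
        print(f"k = {rate:5.1f}, delta = {half_shift}: {plus:.4f}, {minus:.4f}")
```

In slow exchange the two amplitudes come out as complex conjugates (equal intensity, opposite phase distortion). In fast exchange both are real, the amplitude belonging to the faster-decaying (broader) line is negative and shrinks as k grows, and in both regimes the two amplitudes sum to the total initial magnetization.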

B2.4.2.5 CHEMICAL EXCHANGE IN COUPLED SPIN SYSTEMS 

The development given in the previous section is simply a special case of the general density matrix treatment 
of chemical exchange. In an uncoupled system, the whole of the coherence from one site is transferred to the 
other, since the signal is directly associated with a given nucleus. This simplifies the calculation. In a coupled 
spin system, particularly a strongly coupled system, this is no longer true. The relation between the lines in the 
spectrum and individual nuclei can be much more complicated. Furthermore, the amount of 'mixing' of nuclei 
in a given spectrum depends on the chemical shifts and couplings. Therefore, when a nucleus exchanges in a 
coupled system, coherence that was associated with a single line in one site may be distributed amongst 
several lines in the other site. In dealing with chemical exchange in coupled systems, it is necessary to keep 
track of the details of each of the lines in the spectrum, but the fundamental approach is the same. 

There is an important special case, called mutual exchange. In all exchange phenomena, a specific nucleus
experiences a different magnetic environment when it moves from one site to the other. However, in many
cases, the new arrangement is simply a permutation of the old, as in the case of formamide in figure B2.4.3.
The two amide protons have switched places, but the chemical shifts and couplings are the same. All that has
changed is the nuclei associated with each of them. This can be treated in the same way as all other
exchanges, as two different sites. However, the permutation symmetry of the problem means that this is
equivalent to copies of a single, mutual, exchange. The matrices are then reduced in size. It is possible to have
quite complex permutations, so the analysis must be done carefully and systematically. It is therefore
important to identify mutual exchange and treat it appropriately.

Binsch [6] provided the standard way of calculating these lineshapes in the frequency domain, and 
implemented it in the program DNMR3 [7]. Formally, it is the same as the matrix description given in section
B2.4.2.3. The calculation of the matrices L, R and K is more complex for a coupled spin system, but that
should not interfere with the understanding of how the method works. This work will be discussed later, but 
first the time-domain approach will be developed. 

The basic equation [8] is the equation of motion for the density matrix, ρ, given in equation (B2.4.25), in
which H is the Hamiltonian.

i \frac{h}{2\pi} \frac{\partial \rho}{\partial t} = [H, \rho]    (B2.4.25)

It is more convenient to re-express this equation in Liouville space [8, 9 and 10], in which the density matrix
becomes a vector, and the commutator with the Hamiltonian becomes the Liouville superoperator, L. In this
formulation, the lines in the spectrum are some of the elements of the density 'matrix' vector, and what
happens to them is described by the superoperator matrix. Equation (B2.4.25) becomes (B2.4.26).

i \frac{h}{2\pi} \frac{\partial \rho}{\partial t} = L \rho.    (B2.4.26)

This Liouville-space equation of motion is exactly the time-domain Bloch equations approach used in
equation (B2.4.13). The magnetizations are arrayed in a vector, and anything that happens to them is
represented by a matrix. In frequency units (h/2π = 1), the formal solution to equation (B2.4.26) is given by
equation (B2.4.27) (compare equation (B2.4.14)).

\rho(t) = \exp(-iLt) \rho(0).    (B2.4.27)
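
The step from the commutator to a superoperator matrix can be made concrete. The sketch below assumes a row-major vectorization of the density matrix, for which the commutator with H becomes kron(H, 1) - kron(1, H^T); the function names and the one-spin example are hypothetical. It checks that propagating the vector with exp(-iLt) reproduces the ordinary Hilbert-space propagation.

```python
import numpy as np

def liouvillian(H):
    """Commutator superoperator for row-major vectorization: vec([H, rho]) = L vec(rho)."""
    eye = np.eye(H.shape[0])
    return np.kron(H, eye) - np.kron(eye, H.T)

def propagate_liouville(H, rho0, t):
    """rho(t) = exp(-i L t) rho(0), via the eigen-decomposition of the Hermitian matrix L."""
    w, V = np.linalg.eigh(liouvillian(H))
    vec = V @ (np.exp(-1j * w * t) * (V.conj().T @ rho0.reshape(-1)))
    return vec.reshape(rho0.shape)

def propagate_hilbert(H, rho0, t):
    """rho(t) = exp(-i H t) rho(0) exp(+i H t), the conventional form."""
    w, V = np.linalg.eigh(H)
    U = V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T
    return U @ rho0 @ U.conj().T

# Hypothetical one-spin example: x magnetization precessing at 5 Hz (frequency units, h/2pi = 1)
Ix = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Iz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
H = 2 * np.pi * 5.0 * Iz
rho_L = propagate_liouville(H, Ix, 0.03)
rho_H = propagate_hilbert(H, Ix, 0.03)
print(np.allclose(rho_L, rho_H))
```

The Liouville form costs more memory for a static system, but it is the form in which relaxation and exchange matrices can simply be added.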

For a coupled spin system, the matrix of the Liouvillian must be calculated in the basis set for the spin system. 
Usually this is a simple product basis, often called product operators, since the vectors in Liouville space are 
spin operators. The matrix elements can be calculated in various ways. The Liouvillian is the commutator 
with the Hamiltonian, so matrix elements can be calculated from the commutation rules of spin operators. 
Alternatively, the angular momentum properties of Liouville space can be used. In either case, the chemical 
shift terms are easily calculated, but the coupling terms (since they are products of operators) are more 
complex. In section B2.4.2.7 , the Liouville matrix for the single-quantum transitions for an AB spin system is 
presented. 

Relaxation or chemical exchange can be easily added in Liouville space, by including a Redfield matrix, R, 
for relaxation, or a kinetic matrix, K, to describe exchange. The equation of motion for a general spin system 
becomes equation (B2.4.28). 

\rho(t) = \exp[(-iL - R - K)t] \rho(0).    (B2.4.28)

In NMR, the magnetization in the xy plane is detected, so it is the expectation value of the I_x operator that is
measured. This is just the unweighted sum of all the I_xi operators for the individual spins i. It may be a
function of several time variables (multi-dimensional experiments), including the time during the acquisition,
but it is always given by equation (B2.4.29).

\langle I_x(t) \rangle = \mathrm{trace}(I_x \rho(t))    (B2.4.29)

In Liouville space, both the density matrix and the I_x operator are vectors. The dot product of these Liouville
space vectors is the trace of their product as operators. Therefore, the NMR signal, S, as a function of a single time
variable, t, is given by equation (B2.4.30), in which the parentheses denote a Liouville space scalar product
(compare equation (B2.4.12)).

S(t) = (I_x | \rho(t))    (B2.4.30)

The experiment starts at equilibrium. In the high-temperature approximation, the equilibrium density operator
is proportional to the sum of the I_z operators, which will be called F_z. If there are multiple exchanging sites
with unequal populations, p_i, the sum is a weighted one, as in equation (B2.4.31).

F_z = \sum_{i=1}^{n} p_i I_{zi}    (B2.4.31)

A simple, non-selective pulse starts the experiment. This rotates the equilibrium z magnetization onto the x
axis. Note that neither the equilibrium state nor the effect of the pulse depends on the dynamics or the details of
the spin Hamiltonian (chemical shifts and coupling constants). The equilibrium density matrix is proportional
to F_z. After the pulse the density matrix is therefore given by F_x and it will evolve as in equation (B2.4.27). If
(B2.4.28) is substituted into (B2.4.30), the NMR signal as a function of time t is given by (B2.4.32). In this
equation there is a distinction between the sum of the operators weighted by the equilibrium populations, F_x,
and the unweighted sum, I_x. The detector sees each spin (but not each coherence!) equally well.

S(t) = (I_x | \exp([-iL - R - K]t) | F_x).    (B2.4.32)

As with the uncoupled case, one solution involves diagonalizing the Liouville matrix, iL+R+K. If U is the
matrix with the eigenvectors as columns, and Λ is the diagonal matrix with the eigenvalues down the
diagonal, then (B2.4.32) can be written as (B2.4.33). This is similar to other eigenvalue problems in quantum
mechanics, such as the transformation to normal co-ordinates in vibrational spectroscopy.

S(t) = (I_x | U \exp(-i\Lambda t) U^{-1} | F_x).    (B2.4.33)

Note that the Liouville matrix, iL+R+K, may not be Hermitian, but it can still be diagonalized. Its eigenvalues
and eigenvectors are not necessarily real, however, and the inverse of U may not be its complex-conjugate
transpose. If complex numbers are allowed in it, equation (B2.4.33) is a general result. Since Λ is a diagonal
matrix it can be expanded in terms of the individual eigenvalues, λ_j. The inverse matrix U^-1 can be applied
('backwards') to I_x, and we obtain equation (B2.4.34).

S(t) = \sum_j (U^{-1} I_x)_j (U F_x)_j e^{-i\lambda_j t}.    (B2.4.34)

In this equation, the index j runs over all the transitions and the exponents have both real and imaginary parts,
which give the linewidth and position of the lines. The terms before the exponential are also complex, giving the
intensity and phase.

For a general system, these sets of equations are huge, as written. For n spins-1/2, the density matrix has 2^(2n)
elements, so the Liouville matrix has 2^(4n) elements. However, the density matrix elements can be sorted
according to coherence level, that is, the number of quanta associated with the transition. In this case, the matrices
block, and the largest single block is the one corresponding to the single-quantum transitions. Its size is the
binomial coefficient (2n)!/[(n+1)!(n-1)!]. This can be further divided into blocks based on the factoring from
spectral analysis. A transition can change the z quantum number of the spin wavefunction by only ±1. For
three spins-1/2, the eight wavefunctions are divided as follows: one with z quantum number +3/2, three with
+1/2, three with -1/2 and one with -3/2. Therefore, the 15 possible transitions are divided into two groups of
three (+3/2 → +1/2, and -1/2 → -3/2), and a group of nine (+1/2 → -1/2). Further reductions can be achieved
using weak coupling approximations and magnetic equivalence. In practice, systems of five or six spins can be
treated with modern computers.
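
The counting in this paragraph is easy to reproduce. The short sketch below (function names are illustrative) evaluates the size of the single-quantum block and its subdivision by total z quantum number, using the fact that C(n, k) of the 2^n product states have k spins down.

```python
from math import comb

def single_quantum_block(n):
    """Number of single-quantum transitions for n spins-1/2: (2n)!/((n+1)!(n-1)!)."""
    return comb(2 * n, n - 1)

def blocks_by_mz(n):
    """Sizes of the sub-blocks that connect adjacent total-z manifolds."""
    # comb(n, k) states have k spins down; a single-quantum transition links k to k + 1
    return [comb(n, k) * comb(n, k + 1) for k in range(n)]

for n in (2, 3, 6):
    print(n, 4 ** n, 16 ** n, single_quantum_block(n), blocks_by_mz(n))
```

For n = 3 this gives 15 transitions split as 3 + 9 + 3, matching the groups listed above.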

B2.4.2.6 GENERALIZED TRANSITION PROBABILITIES 

The quantities (U^-1 I_x)_j and (U F_x)_j in (B2.4.34) are projections of the eigenvector j along I_x. From the above
equations, this can be interpreted as follows. The term (U F_x)_j is the amount that the transition j received from
the total x magnetization created from the equilibrium state and (U^-1 I_x)_j is how much that transition
contributes to the observed signal. These two terms may not be equal, as will be seen in exchanging systems.

An informal way of thinking about these terms is to consider the transition moment, ⟨φ_i|I_x|φ_f⟩. If the ket-bra
operator |φ_f⟩⟨φ_i| represents the transition, then the transition moment is the projection of the transition
operator along the I_x operator.

In the usual preparation-evolution-detection paradigm, neither the preparation nor the detection depends on
the details of the Hamiltonian, except in special cases. Starting from equilibrium, a hard pulse gives a density
matrix that is just proportional to F_x. The detector picks up only the unweighted sum of the spin operators, I_x.
It is only during an evolution (perhaps between sampling points in an FID) that these totals need be divided
amongst the various lines in the spectrum. Therefore, one of the factors in the transition probability represents
the conversion from preparation to evolution; the other factor represents the conversion back from evolution
to detection.

Equation (B2.4.33) and equation (B2.4.34) are the basic equations for a time-domain description. For
instance, they say that any time-domain NMR signal is the sum of decaying oscillations. This is obvious from
the fact that it is described by a first-order differential equation, but (B2.4.34) gives a way of calculating the
values of these exponentials for any system, static or dynamic. The distinctions amongst different types of
spectrum lie in the eigenvalues and eigenvectors of the Liouville matrix iL+R+K. Equation (B2.4.34)
describes static spectra, spin relaxation and spectra showing the effects of chemical exchange or T_2 relaxation,
in a single, unified picture.

B2.4.2.7 EXAMPLE OF THE AB SPIN SYSTEM 

For example, the observed transitions of an AB spin system have a Liouville matrix given in equation
(B2.4.35). The coupling constant is J, and it is assumed that ω_B = -ω_A = -δ/2, so that δ is the frequency
difference between the two sites. The angle, θ, is defined for the AB system by the equation tan(2θ) = J/δ. The
Liouville space basis used here is the superspin equivalent of the four product operators (I^A, I^A I_z^B, I^B, I^B I_z^A),
and a set of rules for calculating these elements is given elsewhere [12].

(B2.4.35)


The four eigenvalues, which give the positions of the lines, are ±J/2 ± ((J/2)^2 + (δ/2)^2)^(1/2), as expected for an AB
system. The matrix of eigenvectors as columns is given in equation (B2.4.36), in which c = cos(θ), s = sin(θ)
and δ is defined above.

(B2.4.36)


In the basis used in (B2.4.35), the total x magnetization is proportional to the vector (1, 0, 1, 0). Taking the dot
product with the eigenvectors shows that the outer lines receive (cos θ - sin θ) from the total, whereas the inner
lines receive (cos θ + sin θ). The squares of these terms give the familiar AB system intensities: (1 - sin 2θ) and
(1 + sin 2θ).
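
These positions and intensities can be cross-checked independently. The sketch below does not use the Liouville matrix of (B2.4.35); it swaps in the conventional Hilbert-space route, diagonalizing the 4×4 AB Hamiltonian and squaring the matrix elements of the raising operator. The carrier offset nu0 is a hypothetical device that only lifts an accidental degeneracy and is subtracted again, and sin 2θ is taken as J/(J^2 + δ^2)^(1/2), the usual AB mixing angle.

```python
import numpy as np

def ab_lines(J, delta, nu0=100.0):
    """Return sorted (frequency, intensity) pairs for an AB system, all in Hz."""
    ix = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
    iy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
    iz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
    e = np.eye(2)
    A = [np.kron(op, e) for op in (ix, iy, iz)]      # operators of spin A
    B = [np.kron(e, op) for op in (ix, iy, iz)]      # operators of spin B
    coupling = sum(a @ b for a, b in zip(A, B))      # scalar product I_A . I_B
    H = (nu0 + delta / 2) * A[2] + (nu0 - delta / 2) * B[2] + J * coupling
    E, V = np.linalg.eigh(H)
    f_plus = (A[0] + B[0]) + 1j * (A[1] + B[1])      # total raising operator
    T = V.conj().T @ f_plus @ V                      # transition moments
    lines = [(E[m] - E[n] - nu0, abs(T[m, n]) ** 2)
             for m in range(4) for n in range(4) if abs(T[m, n]) ** 2 > 1e-9]
    return sorted(lines)

J, delta = 10.0, 20.0
D = np.hypot(J, delta)
for freq, inten in ab_lines(J, delta):
    print(f"{freq:8.3f} Hz   intensity {inten:.4f}")
print("expected outer/inner intensities:", 1 - J / D, 1 + J / D)
```

With these assumptions the four lines fall at ±J/2 ± D/2 with D = (J^2 + δ^2)^(1/2), and the outer lines carry intensity 1 - J/D while the inner lines carry 1 + J/D.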

Once the AB spin system is defined, the effects of chemical exchange can be calculated. This can be either 
non-mutual or mutual exchange. In the case of non-mutual exchange, there are two blocks, one for each site, 
and the exchange connecting them, as in equation (B2.4.37). For a simple product basis, the exchange always 
has this form: the off-diagonal blocks are themselves diagonal and the sum of the exchange contributions in 
any column must be zero, to preserve the number of spins. In this equation, zeros have been replaced by dots 
to emphasize the form of the matrix.

(B2.4.37)


In this equation, the primes on the imaginary parts indicate that the Larmor frequencies and coupling 
constants will be different. Also, if the equilibrium constant for the exchange is not 1, then the forward and 
reverse rates will not be equal. Note that the 1,2 block, in the top right, represents the rate from site 2 into site 
1. 


B2.4.2.8 MUTUAL EXCHANGE IN THE AB SYSTEM 


In the case of mutual AB exchange this matrix can be simplified. The equilibrium constant must be 1, so k =
k'. Also, ω'_A is equal to ω_B and vice versa, and the coupling constant is the same. For instance, if L is the
Liouville matrix for one site, then the Liouville matrix for the other site is P^-1 LP, where P is the matrix
describing the permutation.

The exchange matrix, K, is just the rate, k, times the unit matrix. In block form, the full matrix for two sites is
given in the eigenvalue equation, (B2.4.38).


This equation is equivalent to the pair of equations in (B2.4.39).

[iL - K(1 - P^{-1})] u = \lambda u
                                                        (B2.4.39)
[iP^{-1}L - K(P^{-1} - 1)] u = \lambda P^{-1} u.

Since K is a multiple of the unit matrix and the permutation is its own inverse, the two equations in (B2.4.39) are
the same. The Liouvillian for a single site is set up and the exchange is described by K(1 - P^-1).
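
The reduction can be verified numerically. The sketch below rests on stated assumptions: the two-site block matrix is taken to have diagonal blocks iL - K and iP^-1 LP - K with off-diagonal blocks K, the single-site L is a random real symmetric stand-in, and P is an arbitrary permutation that is its own inverse. None of these matrices is the AB Liouvillian itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3.0
X = rng.normal(size=(n, n))
L = X + X.T                         # hypothetical real symmetric single-site Liouvillian
P = np.eye(n)[[2, 3, 0, 1]]         # swaps 0<->2 and 1<->3, so P @ P is the identity
K = k * np.eye(n)                   # exchange matrix: rate times the unit matrix

# assumed block form of the full two-site problem
full = np.block([[1j * L - K, K],
                 [K, 1j * P.T @ L @ P - K]])
# reduced single-site problem, with exchange described by K(1 - P^-1)
reduced = 1j * L - K @ (np.eye(n) - P.T)

full_ev = np.linalg.eigvals(full)
red_ev = np.linalg.eigvals(reduced)
gaps = [np.min(np.abs(full_ev - ev)) for ev in red_ev]
print(max(gaps))                    # every reduced eigenvalue also appears in the full problem
```

The remaining eigenvalues of the full matrix belong to the antisymmetric combination of the two copies, which carries no observable magnetization in a mutual exchange.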

Application of this approach to equation (B2.4.37) gives equation (B2.4.40). If ω_B = -ω_A = -δ/2, the
symmetry of the matrix and one additional transformation means that it can be broken into two 2×2 complex
matrices, which can be diagonalized analytically. The resulting lineshapes match the published solutions [13].

(B2.4.40)


B2.4.2.9 INTERMOLECULAR EXCHANGE 

The phenomenon of intermolecular exchange is very common. The loss of couplings to hydroxyl protons in 
all but the very purest ethanol samples was observed at a very early stage. Proton transfer reactions are still 
probably the most carefully studied [ 14 ] class of intermolecular exchange. 

In classical kinetics, intermolecular exchange processes are quite different from the unimolecular, first-order
kinetics associated with intramolecular exchange. However, the NMR of chemical exchange can still be
treated as pseudo-first-order kinetics, and all the previous results apply. One way of rationalizing this is as
follows. NMR follows a particular nucleus, but typically only 1 in 10^5 nuclei is 'visible' (due to the small
Boltzmann population difference). When a visible nucleus exchanges with another nucleus on another
molecule, the probability is that the other nucleus is invisible. The exchange partners vastly overwhelm the
visible nuclei.

However, all the nuclei have spin. An example of this occurs in the intermolecular exchange of an AB spin
system, as in equation (B2.4.41).

AB \rightleftharpoons A + B.    (B2.4.41)

In this case, a spin A that was coupled to the α orientation of the B spin may end up, after the exchange,
coupled to either α or β. Because the Boltzmann population difference is so small, the α and β orientations each
make up essentially half of the sample. The first exchange is degenerate, but the second is a change of the B spin.
This can be treated as exchange with a site in which the shifts are the same, but the coupling constant is of the
opposite sign. If these spin parameters are used in equation (B2.4.37), then the lineshapes in figure B2.4.5 are obtained.

Figure B2.4.5. Simulated lineshapes for an intermolecular exchange reaction in which the bond joining two
strongly coupled nuclei breaks and re-forms at a series of rates, given beside the lineshape. In slow exchange,
the typical spectrum of an AB spin system is shown. In the limit of fast exchange, the spectrum consists of
two lines at the two chemical shifts and all the coupling has disappeared.

B2.4.2.10 CALCULATION OF THE SPECTRUM 

Once the basic work has been done, the observed spectrum can be calculated in several different ways. If the 
problem is solved in the time domain, then the solution provides a list of transitions. Each transition is defined 
by four quantities: the integrated intensity, the frequency at which it appears, the linewidth (or decay rate in 
the time domain) and the phase. From this list of parameters, either a spectrum or a time-domain FID can be 
calculated easily. The spectrum has the advantage that it can be directly compared to the experimental result. 
An FID can be subjected to some sort of apodization before Fourier transformation to the spectrum; this
allows additional line broadening to be added to the spectrum independent of the simulation.

The Bloch equation approach (equation (B2.4.6)) calculates the spectrum directly, as the portion of the
spectrum that is linear in a B_1 observing field. Binsch generalized this for a fully coupled system, using an
exact density-matrix approach in Liouville space. His expression for the spectrum is given by equation
(B2.4.42). Note that this is formally the Fourier transform of equation (B2.4.32), so the time domain and
frequency domain are connected as usual.

S(\omega) = \mathrm{Re}[ I_- (iL - R - K)^{-1} F_- ].    (B2.4.42)


In practice, the spectrum is usually calculated by diagonalizing the matrix first [15], as was done in the time 
domain. This means that the large matrix does not have to be inverted for each point in the spectrum. 
However, for very large matrices, it may be numerically more efficient not to diagonalize, but rather to invert 
at each data point. For a six-spin system, the full matrix is 792x792, and each additional spin multiplies each 
dimension by roughly a factor of 4. Since the time for a diagonalization scales roughly as the cube of the 
dimension of the matrix, larger spin systems become impractical. However, modern sparse-matrix methods 
for matrix inversion do not suffer from the same dramatic scaling, and so will become more efficient for 
larger spin systems. 
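
Once a diagonalization has produced the list of transitions, building the spectrum or the FID is a short calculation. The following sketch assumes each transition is supplied as a tuple (intensity, position in Hz, full width at half height in Hz, phase in radians); that layout and the function names are illustrative choices, not a published format.

```python
import numpy as np

def spectrum_from_lines(freq_axis, lines):
    """Sum of complex Lorentzians; the real part of a zero-phase line is pure absorption."""
    spec = np.zeros_like(freq_axis, dtype=complex)
    for inten, nu0, fwhm, phase in lines:
        r2 = np.pi * fwhm                               # FID decay rate in s^-1
        spec += inten * np.exp(1j * phase) / (r2 + 2j * np.pi * (freq_axis - nu0))
    return spec

def fid_from_lines(t, lines, extra_broadening_hz=0.0):
    """Time-domain signal from the same list, with optional exponential apodization."""
    fid = np.zeros_like(t, dtype=complex)
    for inten, nu0, fwhm, phase in lines:
        fid += inten * np.exp(1j * phase) * np.exp((2j * np.pi * nu0 - np.pi * fwhm) * t)
    return fid * np.exp(-np.pi * extra_broadening_hz * t)

axis = np.linspace(0.0, 20.0, 2001)                     # Hz
one_line = [(1.0, 10.0, 2.0, 0.0)]                      # hypothetical single transition
absorption = spectrum_from_lines(axis, one_line).real
print(axis[np.argmax(absorption)], absorption[1100] / absorption.max())
```

The frequency-domain sum costs one pass over the transition list per spectrum, so the expensive diagonalization is done only once for each set of rate parameters.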

The method for studying intermediate exchange in NMR is to obtain an excellent equilibrium spectrum of the 
system as a function of temperature. Then the theoretical apparatus developed above can be used to simulate 
and to fit the experimental data, in order to obtain the rate data. 


B2.4.3 FAST EXCHANGE 

B2.4.3.1 INTRODUCTION 

In the limit of fast exchange, the lineshape of chemical exchange quickly becomes a single Lorentzian line. In
figure B2.4.4(b) the negative line broadens directly as the rate, and loses absolute intensity as well. This
combination of the increasing width and decreasing integral means that the negative line quickly becomes
irrelevant to the experimental lineshape. The lineshape then becomes a pure Lorentzian, whose width is
proportional to (Δω)^2/k, where Δω is the difference in Larmor frequency of the two sites and k is the exchange
rate. Measuring the rate is then equivalent to measuring the spin-spin relaxation time, T_2. The problem is that
unless there is an estimate of Δω (from a spectrum that is 'frozen out') an absolute value of the rate cannot be
measured. However, T_2 measurements themselves have an associated timescale. If that can match the exchange
rate, then an absolute rate can be measured.
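
The (Δω)^2/k dependence, and the reason Δω and k cannot be separated from the linewidth alone, can be illustrated with a small calculation. The sketch assumes two equally populated sites separated by delta_omega (rad/s) with a one-way rate k, for which the exact decay rate of the narrow line is k - (k^2 - (Δω/2)^2)^(1/2); the prefactor 1/8 in the limiting form follows from that assumed model and is not quoted from the text.

```python
import numpy as np

def exchange_broadening(delta_omega, k):
    """Exact decay rate of the narrow line above coalescence (requires k > delta_omega / 2)."""
    half = delta_omega / 2.0
    return k - np.sqrt(k ** 2 - half ** 2)

def fast_limit(delta_omega, k):
    """Leading term of the same quantity when k is much larger than delta_omega."""
    return delta_omega ** 2 / (8.0 * k)

dw = 2 * np.pi * 50.0            # hypothetical 50 Hz shift difference
k = 1.0e4                        # hypothetical rate in s^-1
print(exchange_broadening(dw, k), fast_limit(dw, k))
# doubling the shift difference and quadrupling the rate leaves the width unchanged
print(fast_limit(2 * dw, 4 * k), fast_limit(dw, k))
```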

B2.4.3.2 T_2 MEASUREMENTS

In principle, T_2 can be measured directly from the linewidth of the spectrum. However, since experimental
linewidths are also governed by inhomogeneous broadening (magnetic field inhomogeneities etc), careful T_2
determinations require methods that cancel out inhomogeneous effects [16]. Three techniques are commonly
available: the Carr-Purcell-Meiboom-Gill (CPMG) spin echo experiment, the T_1ρ experiment and the
offset-saturation method [17]. All three can be implemented easily on modern spectrometers.

The Hahn spin echo, with a single refocusing pulse, eliminates effects due to magnetic field inhomogeneities,
so a study of the echo intensity as a function of the echo time should yield T_2. However, if there is a field
gradient present, diffusion in the sample can also attenuate the echo. Even with modern well shimmed
magnets, the gradients are still large enough to affect a T_2 measurement in water. In the CPMG experiment, a
series of closely spaced refocusing pulses suppresses diffusion effects. In this case, the echo intensity is
measured as a function of echo number, and reliable values of T_2 can be obtained.

The parameter T_1ρ is the longitudinal relaxation time of a magnetization which is spin locked along a
radiofrequency magnetic field. The magnetization is flipped onto the x axis (for instance) by an RF pulse
along y. The RF phase is changed by 90°, and the magnetization is then spin locked by the RF field, now
along x. When the RF is shut off, the remaining xy magnetization can be detected directly. An analysis of the
signal as a function of the spin-locking time yields T_2. The offset-saturation experiment [18] consists of
irradiating the spin system with a known RF field at some offset from resonance until a steady state is
achieved. The z magnetization is then measured by a non-selective observe pulse. A plot (figure B2.4.6) of the
partially saturated z magnetization against offset from resonance will show a dip at resonance. The width of
this dip is given by equation (B2.4.43).

Figure B2.4.6. Results of an offset-saturation experiment for measuring the spin-spin relaxation time, T_2. In
this experiment, the signal is irradiated at some offset from resonance until a steady state is achieved. The
partially saturated z magnetization is then measured with a π/2 pulse. This figure shows a plot of the z
magnetization as a function of the offset of the saturating field from resonance. Circles represent measured
data; the line is a non-linear least-squares fit. The signal is normal when the saturation is far away, and dips to
a minimum on resonance. The width of this dip gives T_2, independent of magnetic field inhomogeneity.


\text{Dip width} = (\gamma B_2) \sqrt{T_1 / T_2}    (B2.4.43)


If the strength of the saturating RF, γB_2, and the spin-lattice relaxation time, T_1, are known, then T_2 can be
measured, again free of magnetic field inhomogeneities.
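
The dip width can be related to the steady-state solution of the Bloch equations. The sketch below assumes the standard steady-state result Mz/M0 = (1 + Δ^2 T_2^2)/(1 + Δ^2 T_2^2 + (γB_2)^2 T_1 T_2), with the offset Δ and γB_2 both in rad/s, and takes 'width' to mean the offset at which the dip has fallen to half its on-resonance depth; both the convention and the numerical values are assumptions made for illustration.

```python
import numpy as np

def mz_steady_state(offset, omega2, t1, t2):
    """Steady-state Mz/M0 under continuous irradiation of strength omega2 = gamma * B2."""
    x = (offset * t2) ** 2
    return (1.0 + x) / (1.0 + x + omega2 ** 2 * t1 * t2)

def dip_half_width(omega2, t1, t2):
    """Offset (rad/s) at which the dip has half of its on-resonance depth."""
    return np.sqrt(1.0 + omega2 ** 2 * t1 * t2) / t2

omega2, t1, t2 = 2 * np.pi * 50.0, 1.0, 0.1     # hypothetical values
hw = dip_half_width(omega2, t1, t2)
print(hw, omega2 * np.sqrt(t1 / t2))            # nearly equal when saturation is strong
```

When (γB_2)^2 T_1 T_2 is much larger than 1 the half-width reduces to γB_2 (T_1/T_2)^(1/2), so a known RF strength and T_1 give T_2 directly.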

These T_2 measurements can allow reaction rates of 10^5 s^-1 or more to be measured. In combination with
slow- and intermediate-exchange methods, rates can therefore be measured over a range of more than
five orders of magnitude, and excellent thermodynamic parameters can be obtained. Figure
B2.4.2 shows some results on furfural, a system which has unequally populated sites. For this case, the range
of rates over which lineshape methods are useful is quite small. In the case of two-site, unequally populated
exchange, the minor peak broadens faster than the major peak (the relative rate of broadening is the ratio of
the major population to the minor). The minor peak disappears into the baseline quickly, but T_2 measurements
can still provide good data.


These experiments yield T_2 which, in the case of fast exchange, gives the ratio (Δω)^2/k. However, since the
experiments themselves have an implicit timescale, absolute rates can be obtained in favourable
circumstances. For the CPMG experiment, the timescale is the repetition time of the refocusing pulse; for the
T_1ρ experiment, it is the rate of precession around the effective RF field. If this timescale is fast with respect
to the exchange rate, then the experiment effectively measures T_2 in the absence of exchange. If the timescale
is slow, the apparent T_2 contains the effects of exchange. Therefore, the apparent T_2 shows a dispersion as the
timescale of the measurement method is changed [19]. Practical spectrometer considerations of RF heating
and duty cycle usually limit these timescales to tens of kilohertz. However, if the conditions are appropriate,
this dispersion curve yields an absolute rate.


B2.4.4 SLOW EXCHANGE 

B2.4.4.1 INTRODUCTION 

The term 'slow' in this case means that the exchange rate is much smaller than the frequency differences in 
the spectrum, so the lines in the spectrum are not significantly broadened. However, the exchange rate is still 
comparable with the spin-lattice relaxation times in the system. Exchange, which has many mathematical 
similarities to dipolar relaxation, can be observed in a NOESY-type experiment (sometimes called EXSY). 
The rates are measured from a series of EXSY spectra, or by performing modified spin-lattice relaxation 
experiments, such as those pioneered by Hoffman and Forsen [20]. 

In the absence of exchange (and ignoring dipolar relaxation), each z magnetization will relax back to
equilibrium at a rate governed by its own T_1, as in (B2.4.44).




(B2.4.44) 


If there are two sites, A and B, then an analogous equation can be written, as in (B2.4.45).

If the two sites exchange with rate k during the relaxation, then a spin can relax either through normal
spin-lattice relaxation processes, or by exchanging with the other site. Equation (B2.4.45) becomes (B2.4.46).

This equation is very similar to (B2.4.13). The basic situation is just as in intermediate exchange, except that
it describes z magnetizations rather than xy. The frequencies are zero, and the matrix now has pure real
eigenvalues, but the approach is the same. The time domain is a natural one for slow exchange, since a
relaxation experiment follows the z magnetizations as a function of time. As before, the time dependence is
obtained by diagonalizing the relaxation/exchange matrix and calculating the magnetizations for each time at
which they are sampled. In this case, the solution is given by equation (B2.4.47), the same as equation
(B2.4.14), except there are no imaginary terms.

B2.4.4.2 TWO-DIMENSIONAL METHODS


There are two main applications of slow chemical exchange: one is to determine the qualitative mechanism, 
and the other is to measure the rates of the processes as accurately as possible. For the first case, in which we 
have a spectrum in slow exchange, we need to establish the mechanism: which site is exchanging with which. 
For this purpose, the homonuclear two-dimensional experiment EXSY (the same pulse sequence as NOESY, 
but involving exchange) is by far the best technique to use. Exchange between sites leads to a pair of 
symmetrical cross-peaks joining the diagonal peaks of the same site, so the mechanism is very obvious. 

The EXSY pulse sequence starts with two π/2 pulses separated by the incrementable delay, t_1. This modulates
the z magnetizations, so that the relaxation that occurs during the mixing time which follows, t_m, is frequency
labelled. Finally, the z magnetizations are sampled with a third π/2 pulse. Magnetization from a different site
that enters via exchange will have a different frequency label. A two-dimensional Fourier transform then
produces the spectrum. The initial rate of increase of the cross-peak gives the rate of exchange. A series of
EXSY experiments as a function of mixing times will define the mechanism and give an estimate of the rates.
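
The statement about the initial rate can be illustrated with a two-site model. The sketch assumes equally populated sites with a common spin-lattice relaxation rate r1 and a one-way exchange rate k, so that the z magnetizations evolve during the mixing time under exp(-A t_m) with A = [[r1 + k, -k], [-k, r1 + k]]; the function name and the values are hypothetical.

```python
import numpy as np

def mixing_matrix(t_mix, r1, k):
    """exp(-A t_mix) for two equally populated sites; off-diagonal elements scale the cross-peaks."""
    A = np.array([[r1 + k, -k],
                  [-k, r1 + k]])
    lam, U = np.linalg.eigh(A)                     # A is real symmetric
    return (U * np.exp(-lam * t_mix)) @ U.T

r1, k = 1.0, 5.0                                   # hypothetical rates in s^-1
short = 1.0e-4
print(mixing_matrix(short, r1, k)[0, 1] / short)   # initial slope, close to k
for t_mix in (0.05, 0.2, 1.0):
    print(t_mix, mixing_matrix(t_mix, r1, k)[0, 1])
```

In this model the cross-peak grows linearly at first, passes through a maximum and then decays with the spin-lattice relaxation, which is why a series of mixing times is more informative than a single one.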

However, care must be taken in choosing the mixing time if there are multiple exchange processes. If the 
mixing time is too long, there is a substantial probability that a spin may have exchanged twice in that time, 
leading to spurious cross-peaks. Orrell and his group [21] have solved this problem by treating the effect of 
exchange on the z magnetizations correctly, and have written a program which simulates the two-dimensional 
spectrum as a function of the mixing time. 

Since exchange and coupled relaxation have the same mathematical form, both may contribute to a 
NOESY/EXSY spectrum, as in figure B2.4.7 . For small molecules, since the NOE is positive and exchange 
creates saturation transfer (like a negative NOE), the NOESY and EXSY cross peaks have opposite signs. For 
macromolecules in the spin-diffusion limit, the peaks have the same sign, but exchange cross-peaks can 
usually be distinguished by their much stronger temperature dependence.

Figure B2.4.7. Contour plot of a phase-sensitive NOESY/EXSY spectrum of the derivative of TEMPO. The
peaks are positive, with the exception of the circled peaks, which are negative. The spectrum shows the 
exchange of the four methyl groups in this molecule. A combination of a ring-flip and inversion at nitrogen 
means that the axial methyl on one side of the ring exchanges with the equatorial methyl on the other side. 
The positive cross-peaks show the exchange of the two sets of methyls. However, there are also NOE cross- 
peaks between methyl groups on the same side of the ring. Since this is a small molecule, the NOE peaks are 
of opposite sign to the exchange peaks. There are two NOE cross-peaks for each methyl, since the exchange 
process is relatively fast, and distributes the NOE between the two exchange partners. 

B2.4.4.3 ONE-DIMENSIONAL SELECTIVE-INVERSION METHODS 

For careful rate measurements, once the mechanism is established, it is our opinion that one-dimensional 
methods are superior to quantitative 2D ones. Apart from the fact that ID spectra can be integrated more 
easily, there is also more control over the experiment. Modern spectrometers can create almost any type of 
selective excitation, so that there is control of the conditions at the start of the relaxation. For two sites, a non- 
selective inversion that inverts both sites equally will mask most of the exchange effects and the relaxation 
will be dominated by Ty However, if one site is inverted selectively, then that site can regain equilibrium by 
either T^ processes or by exchanging with the other site that was left at equilibrium [20]. The inverted signal 
will relax at roughly the sum of the exchange and spin-lattice relaxation rate, while the signal that was 
unperturbed at the start of the experiment shows a characteristic transient, as in figure B2.4.8 . These one- 
dimensional selective-inversion experiments have been widely used in systems without scalar coupling, such 
as methyl groups or 3 C spectra. 
Figure B2.4.8. Relaxation of two of the exchanging methyl groups in the TEMPO derivative in figure B2.4.7. 
The dotted lines show the relaxation of the two methyl signals after a non-selective inversion pulse (a typical 
$T_1$ experiment). The heavy solid line shows the recovery after the selective inversion of one of the methyl 
signals. The inverted signal (circles) recovers more quickly, under the combined influence of relaxation and 
exchange with the non-inverted peak. The signal that was not inverted (squares) shows a characteristic 
transient. The lines represent a non-linear least-squares fit to the data. 
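
The two-site behaviour described above can be illustrated with a minimal Python sketch. It assumes (this is not taken from the text) two equally populated sites with a common spin-lattice relaxation rate `r1` and a single exchange rate constant `k`, for which the coupled equations for the z magnetizations have a simple closed-form solution. The function name and all numbers are hypothetical.

```python
import math


def two_site_recovery(t, r1, k, selective=True, m_eq=1.0):
    """Return (m_a, m_b), the z magnetizations of two equally populated sites.

    Site A is inverted at t = 0; site B is inverted as well unless `selective`.
    r1 is the spin-lattice relaxation rate (1/T1), k the exchange rate constant.
    """
    slow = math.exp(-r1 * t)              # mode common to both sites (pure T1)
    fast = math.exp(-(r1 + 2.0 * k) * t)  # mode that carries the exchange
    if selective:
        dev_a = -m_eq * (slow + fast)     # deviation starts at -2*m_eq
        dev_b = -m_eq * (slow - fast)     # starts at 0: the transient
    else:
        dev_a = dev_b = -2.0 * m_eq * slow  # exchange is masked completely
    return m_eq + dev_a, m_eq + dev_b


if __name__ == "__main__":
    for t in (0.0, 0.05, 0.1, 0.5, 2.0):
        sel = two_site_recovery(t, r1=1.0, k=5.0)
        non = two_site_recovery(t, r1=1.0, k=5.0, selective=False)
        print(t, sel, non)
```

In this simplified model the initial recovery rate of the selectively inverted site is proportional to `r1 + k`, whereas after a non-selective inversion both sites recover with `r1` alone, which is the masking effect mentioned in the text.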

For multiple sites, a wide range of initial conditions is available. For instance, in a three-site exchange 
amongst A, B and C, the signal due to A can be inverted selectively. This will provide rate information on the 
A-B exchange and the A-C exchange, but relatively little about B-C. This is one example of using the initial 
conditions to suppress or enhance the observation of particular processes. The definition of how selective a 
selective inversion may be gives an added degree of control over the experiment. 

The selective-inversion experiment gives excellent rate data. The description of the time evolution of the z 
magnetizations in equation (B2.4.47) is exact. There are no assumptions about short mixing times or initial 
rates. Standard non-linear least-squares methods allow fitting these curves to the measured data and deriving 
values for the rates involved. Believing in the error estimates of these multi-parameter fitting procedures can 
be dangerous, however. A more reliable error estimate can be obtained by 'profiling'. In this procedure, a 
global fit to all parameters is done first. Then the rate (or any other parameter of interest) is fixed at a different 
value, and the data are re-fitted using all the other parameters. As the rate is moved from the optimum value, 
the fit will become worse, as measured by the sum of the squares of the deviations between real data and the 
model. When the 'badness of fit' exceeds a critical level of the F statistic, the value of the rate is at the end of 
a confidence interval. This confidence interval can be several times larger than the one calculated from the 
usual standard deviation in a nonlinear least-squares fit. Even with this error estimate, it is possible to measure 
rates with errors of less than 10%. Figure B2.4.8 shows the quality of result that is possible. 
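
The profiling procedure can be sketched in a few lines of Python. This is a simplified illustration, not the authors' fitting code: the model is a hypothetical single exponential `A*exp(-rate*t)` rather than the full solution of the exchange equations, the global fit is approximated by the best point on a grid of trial rates, and the critical value `f_crit` of the F statistic is assumed to be supplied by the user from tables.

```python
import math


def best_amplitude_sse(times, values, rate):
    """For a fixed rate, re-fit the remaining parameter A by linear least squares."""
    basis = [math.exp(-rate * t) for t in times]
    amp = sum(b * y for b, y in zip(basis, values)) / sum(b * b for b in basis)
    sse = sum((y - amp * b) ** 2 for b, y in zip(basis, values))
    return amp, sse


def profile_interval(times, values, rates, f_crit):
    """Scan trial rates and keep those whose re-fitted SSE passes the F criterion."""
    sse_by_rate = [(r, best_amplitude_sse(times, values, r)[1]) for r in rates]
    best_rate, sse_min = min(sse_by_rate, key=lambda pair: pair[1])
    dof = len(times) - 2                    # two fitted parameters: A and rate
    limit = sse_min * (1.0 + f_crit / dof)  # one parameter is being profiled
    inside = [r for r, sse in sse_by_rate if sse <= limit]
    return best_rate, min(inside), max(inside)


if __name__ == "__main__":
    # Synthetic data: A = 2, rate = 1.5, with a small alternating "noise" term.
    times = [0.1 * i for i in range(12)]
    values = [2.0 * math.exp(-1.5 * t) + 0.02 * (-1) ** i
              for i, t in enumerate(times)]
    rates = [1.0 + 0.01 * i for i in range(101)]
    # 4.96 is the tabulated 95% critical value of F(1, 10); treat it as an input.
    print(profile_interval(times, values, rates, f_crit=4.96))
```

The interval returned is the range of rates for which the re-fitted sum of squares stays below the chosen critical level, which is the 'badness of fit' criterion described above.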




B2.4.4.4 SELECTIVE INVERSIONS ON COUPLED SYSTEMS 


In a selective-inversion experiment, it is the relaxation of the z magnetizations that is being studied. For a 
system without scalar coupling, this is straightforward: a simple pulse will convert the z magnetizations 
directly into observable signals. For a coupled spin system, this relation between the z magnetizations and the 
observable transitions is much more complex [22]. 

In a coupled spin system, the number of observed lines in a spectrum does not match the number of 
independent z magnetizations and, furthermore, the spectra depend on the flip angle of the pulse used to 
observe them. Because of the complicated spectroscopy of homonuclear coupled spins, it is only recently that 
selective inversions in simple coupled spin systems [ 23 ] have been studied. This means that slow chemical 
exchange can be studied using proton spectra without the requirement of single characteristic peaks, such as 
methyl groups. 

The z magnetizations of the spin system are key to the problem, since the exchange is measured in 
competition with their relaxation processes. However, for a coupled spin system, the lines in the spectrum do 
not directly reflect the z magnetizations. Even for two weakly coupled spins, there are four lines in the 
spectrum (two doublets), but there are only three independent z magnetizations. There are indeed four energy 
levels, but the sum of their populations must be constant. There are three independent quantities: the total 
magnetization of A, the total X magnetization, and a shared IJ z magnetization. For coupled systems, 
especially those with strong coupling, the relation of the z magnetizations to the observed spectrum can be 
quite complex. 

There are several complications. One is the flip angle dependence of the spectra [22, 24]. For a non- 
equilibrium state of a coupled spin system, the observed intensities of the lines depend on the flip angle used 
to observe them. In particular, spectra are only 'true' reflections of the z magnetizations in the limit of small 
flip angles. A further complication arises because the z magnetizations are part of a larger manifold of 
coherences that also includes the zero-quantum transitions. Both the z magnetizations and the zero-quantum 
transitions have coherence level zero and they cannot be separated by pulses or phase cycling [11]. This is the 
problem with the zero-quantum coherences in NOESY, for instance. As an example, for three spins there are 
eight z magnetizations, one of which is fixed as the total number of spins. However, there are 20 coherence 
level zero density matrix elements, leaving six pairs of zero quantum transitions. The 15 observable xy 
magnetizations for a three-spin system cannot correspond directly to the eight z magnetizations. Provided that 
these complications are recognized, they can be treated easily with standard spin-dynamics techniques. 

The xy magnetizations can also be complicated. For n weakly coupled spins, there can be $n2^{n-1}$ lines in the 
spectrum and a strongly coupled spin system can have up to $(2n)!/((n-1)!(n+1)!)$ transitions. Because of small 
couplings, and because some lines are weak combination lines, it is rare to be able to observe all possible 
lines. It is important to maintain the distinction between mathematical and practical relationships for the 
density matrix elements. 
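
The counting in the last two paragraphs can be checked with a short Python sketch. The function names are hypothetical, and the expression used for the number of coherence-level-zero elements, the binomial coefficient C(2n, n), is an assumption that reproduces the value of 20 quoted above for three spins.

```python
from math import comb


def z_magnetizations(n):
    """Number of populations (diagonal density matrix elements) for n spins-1/2."""
    return 2 ** n


def coherence_zero_elements(n):
    """Density matrix elements with coherence level zero (assumed to be C(2n, n))."""
    return comb(2 * n, n)


def zero_quantum_pairs(n):
    """Pairs of zero-quantum transitions left after removing the populations."""
    return (coherence_zero_elements(n) - z_magnetizations(n)) // 2


def weak_coupling_lines(n):
    """Lines in the spectrum of n weakly coupled spins: n * 2**(n - 1)."""
    return n * 2 ** (n - 1)


def max_single_quantum_transitions(n):
    """Upper limit for strong coupling: (2n)!/((n-1)!(n+1)!) = C(2n, n-1)."""
    return comb(2 * n, n - 1)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, z_magnetizations(n), coherence_zero_elements(n),
              zero_quantum_pairs(n), weak_coupling_lines(n),
              max_single_quantum_transitions(n))
```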

These complications require some careful analysis of the spin systems, but fundamentally the coupled spin 
systems are treated in the same way as uncoupled ones. Measuring the z magnetizations from the spectra is 
more complicated, but the analysis of how they relax is essentially the same. 




B2.4.5 EXCHANGE IN SOLIDS 

Exchange in the solid state follows the same basic principles as in liquids. The classic Cope re-arrangement of 
bullvalene occurs in both the liquid and solid state [25], and the lineshapes in the spectra are similar. 
However, because of chemical shielding anisotropy (CSA) and quadrupolar and dipolar effects, the Larmor 
frequency of a given spin depends on the orientation of the individual molecule. In a liquid, where there is 
isotropic tumbling, these effects average out and exchange is only evident if there is a change in isotropic 
chemical shift. In a solid, almost any type of molecular motion can cause lineshape and other effects. 
Furthermore, many NMR spectra of solids are run under the conditions of magic-angle spinning. This 
introduces a further timescale into the spectroscopy: the spinning rate, which can now go up to 25 kHz or 
more. The basic principles are the same, but the systems studied and the observed phenomena can be quite 
different. 

Intermediate exchange is the regime in which lineshape changes are the most obvious manifestation. In $^{13}$C 
magic-angle spinning (MAS) spectra, there can be all the liquid-like coalescence phenomena [26]. These can 
be analysed just as before. Another type occurs when a molecule re-orients in the crystal lattice. Since the 
magnetic environment of the nucleus is anisotropic, the Larmor frequency of the spin changes. One of the 
most familiar examples is the effect of dynamics on powder patterns in deuterium spectra [27]. In a typical 
carbon-deuterium bond, the quadrupole coupling of the deuterium nucleus is about 160 kHz. This defines a 
timescale that is very useful for polymers and biological membranes, and deuterium spectra are widely used. 
Quite detailed information is available: lineshapes are significantly different for twofold jumps, threefold 
jumps or continuous diffusion. For instance, in a tert-butyl group, overall rotation can be distinguished from 
rotations of individual methyl groups. 

Magic-angle spinning of solid samples provides an experimental parameter not available in liquids. A full 
analysis of the combined effects of dynamics and MAS for all relative timescales is a very complex problem, 
but for slow exchange there are techniques that are intuitive. The first is EXSY, which works well in MAS 
spectra, although the interpretation is confused by the phenomenon of spin diffusion. Cross-peaks can be due 
to the exchange of spin polarization (rather than the nuclei themselves) via the dipolar interaction. One- 
dimensional methods that use MAS are also available. The TOSS pulse sequence will eliminate spinning 
sidebands, provided there is no internal dynamics in the sample. If there is exchange on the timescale of a 
rotor period, the careful cancellation will no longer work, and sidebands will reappear [28]. Another use of 
spinning is the ODESSA [29] pulse sequence, which selectively inverts some of the sidebands, which then 
relax back due to chemical exchange. 


B2.4.6 CONCLUSIONS 

In order to study chemical exchange in NMR, it is necessary to have a scale against which to measure it. In 
fast and intermediate exchange, the timescale is the difference in Larmor frequency between the two sites. As 
the exchange rate approaches this timescale, the lines in the one-dimensional spectrum broaden, coalesce and 
sharpen into single lines. For intermediate exchange, the lineshape provides the best information. In fast 
exchange, the rate is starting to dominate the timescale, but useful information can still be extracted from $T_2$ 
measurements. In slow exchange, the spin-lattice relaxation provides a timescale. In this regime, 
modifications of methods for measuring $T_1$ and the NOE are used. In MAS spectra of solids, the spinning rate 
provides another timescale. When the timescales match, 
dramatic effects can often be observed. 

Once the exchange has been established, it is necessary to measure its rate. This is usually done by simulating 
the NMR experiment with a mathematical model and adjusting the rate in the model until it matches 
experiment. This means simulating the lineshape in intermediate exchange, or simulating the coupled 
relaxation of the z magnetizations in slow exchange. In both these cases, the model is similar. There is a 
matrix which describes each site in the exchange. These matrices form the diagonal of a larger block-diagonal 
matrix. The blocks are then connected by the exchange process, to form one large matrix. The exchange is 
described by the eigenvalues and eigenvectors of the single large matrix. 

The exact form of the matrices depends on the situation. In slow exchange, the matrices are real and they 
model the multi-exponential relaxation of the z magnetizations. In intermediate exchange, the matrix has both 
real and imaginary parts, as do the eigenvalues and eigenvectors. The model in this case produces a series of 
transitions, whose intensity, phase, position and width are given by a complex-valued transition probability. 
The intermediate-exchange spectrum is just the sum of these transitions. 
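
As an illustration of this block structure, the following Python sketch evaluates the lineshape for the simplest possible case. It assumes (none of this is specified in the text) two equally populated, uncoupled sites, so that each 'block' is a single complex number, a common transverse relaxation rate `r2` and one exchange rate constant `k`. Instead of diagonalizing the matrix, the sketch solves the 2 x 2 linear system directly at each frequency, which gives the same spectrum.

```python
def two_site_spectrum(omega, omega_a, omega_b, k, r2):
    """Absorption intensity at angular frequency omega for two exchanging sites.

    The matrix A has the site terms on its diagonal and the exchange rate k
    in the off-diagonal positions that connect the two sites.
    """
    a11 = complex(r2 + k, omega - omega_a)  # site A block
    a22 = complex(r2 + k, omega - omega_b)  # site B block
    a12 = a21 = -k                          # exchange connects the blocks
    p_a = p_b = 0.5                         # equal populations (assumption)
    det = a11 * a22 - a12 * a21
    # Solve A m = p by Cramer's rule, then sum the contributions of both sites.
    m_a = (a22 * p_a - a12 * p_b) / det
    m_b = (a11 * p_b - a21 * p_a) / det
    return (m_a + m_b).real


if __name__ == "__main__":
    # Hypothetical parameters: sites at -50 and +50 rad/s, r2 = 1 s^-1.
    for k in (1.0, 100.0, 10000.0):
        row = [two_site_spectrum(w, -50.0, 50.0, k, 1.0) for w in (-50.0, 0.0, 50.0)]
        print(k, row)
```

For small `k` the intensity is concentrated at the two site frequencies; for large `k` it collapses into a single line at the average frequency, which is the coalescence behaviour summarized in the conclusions.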

The NMR methods for studying chemical exchange are fundamentally no different from standard NMR 
methods. Chemical exchange effects appear in the spectrum and in measurements of the relaxation times, so 
careful measurement of these will provide good exchange data. Perhaps this is the single conclusion: apart 
from some algebraic and numerical details, chemical exchange is identical to 'normal' NMR. 
B2.5 Gas-phase kinetics studies 

David Luckhaus and Martin Quack 


B2.5.1 INTRODUCTION 

The key to experimental gas-phase kinetics arises from the measurement of time, concentration, and 
temperature. Chemical kinetics is closely linked to time-dependent observation of concentration or amount of 
substance. Temperature is the most important single statistical parameter influencing the rates of chemical 
reactions (see chapter A3.4 for definitions and fundamentals). 

The rich history of experimental chemical kinetics can be broadly classified according to various conceptual 
phases. The starting point of quantitative chemical kinetics was the formulation, in 1850 by Wilhelmy [1], of 
the time dependence of concentrations by a differential equation corresponding to a pseudo-first-order rate 
law for the hydrolysis ('inversion') of cane sugar. The observation of the concentration of cane sugar was 
carried out spectroscopically in the early experiments by following the time-dependent rotation of the plane of 
polarized light by the reaction mixture after mixing the reactant and the catalyst (the acid). During the 
following half-century, until about 1900, the nature of such phenomenological rate laws was clarified, as was 
the role of the rate constant and its temperature dependence. This epoch is characterized by the concepts 
introduced by van't Hoff [2] and Arrhenius [3]. It became clear that a distinction must be made between 
phenomenological rate laws for reactions following from a compound mechanism, such as the inversion of 
cane sugar, and rate laws and rate constants for elementary reactions, which can be combined into a 
compound mechanism. These new concepts characterize the second phase of chemical kinetics studies. For 
about half a century from 1900 to 1950, experimental investigations concentrated on elementary reactions and 
the mechanisms in which they are combined. The fathers of gas-phase kinetics, such as Bodenstein, 
Lindemann, and Hinshelwood, may be named as the representatives of this period. 

During the course of these studies the necessity arose to study ever-faster reactions in order to ascertain their 
elementary nature. It became clear that the mixing of reactants was a major limitation in the study of fast 
elementary reactions. Fast mixing had reached its high point with the development of the accelerated and 
stopped-flow techniques [4, 5], reaching effective time resolutions in the millisecond range. Faster reactions 
were then frequently called 'immeasurably fast reactions' [6]. 

The new concept overcoming this limitation in the third phase of experimental kinetics started around 1950, 
and consisted of initiating a chemical reaction by a very fast physical perturbation and measuring the 
subsequent relaxation kinetics without a mixing step in the experiment. Various schools developed techniques 
along these lines, such as Norrish and Porter with flash photolysis [7, 8], and Eigen's school with T-jump and 
other relaxation techniques [6]. Weller's kinetics of fluorescence change can be classified as an indirect 
technique along these lines [9, 10 and 11], and the Davidson and Jost schools developed shock-wave methods 
[12]. While the ideas for such techniques can be traced to earlier theoretical papers by Nernst and Einstein, the 
actual experimental developments started around 1950, initiating what has been called 'a race against time' [7, 
8]. In particular, in relation to the laser-flash photolysis and pump-probe techniques developed after 1960, the 
essence of this race is to generate ever shorter laser pulses for 'pumping' a sample and well-controlled pulse 
delays for probing the sample's time evolution, where the lengths of the pulses define 
the limits of the time resolution of the techniques. Nanosecond resolution was available by about 1966, 
picosecond resolution around 1970, and by about 1985 the domain around 10 fs had been reached. Since that 
time progress by these techniques towards shorter times has slowed down somewhat, a typical value being 
about 5 fs today (see chapter B2.1 ). One might mention, however, a paper on the '0 fs' pulse [ 13 ] (dated 1 
April 1990). 

In parallel with this race against time, new experimental concepts were introduced, which escape the race by 
switching to a conceptually new approach. Among these are the molecular-beam-scattering techniques 
developed by Datz, Taylor, Martin, Herschbach, Lee, and others [14, 15], measuring reaction cross sections 
without time resolution instead of time evolution and reaction rates. NMR line-shape methods proposed by 
Gutowsky and Holm [16, 17] are further examples for alternative techniques, where time-dependent rates are 
calculated from time independent information. Neither of these techniques, however, is very well suited to 
study very fast intramolecular primary processes, which mark the starting point of all chemical reactions and 
may be considered to characterize the present fourth phase of experimental chemical kinetics. The new 
concept making these processes accessible to experimental investigation starts from high-resolution molecular 
spectra in the frequency domain (stationary or at least without short-time resolution) to derive ultimately the 
full molecular quantum-chemical kinetics in the time ranges from nanoseconds to attoseconds [18]. This 
experimental approach was, in fact, historically the first one to provide non-trivial three-dimensional 
molecular quantum wave packet dynamics [18, 19, and 20], some time even before the first one-dimensional 
molecular quantum wave packet kinetics became available by short-pulse techniques (see [7, 21, 22 and 23] 
and references cited therein). The scope of the present chapter is to cover the most important experimental 
techniques in current use, including some of the well-established but also some of the most recent ones, 
together with current developments and to illustrate them with a few typical examples of results. The 
presentation by necessity is exemplary and not exhaustive. 

On a modest level of detail, kinetic studies aim at determining overall phenomenological rate laws. These may 
serve to discriminate between different mechanistic models. However, to prove a compound reaction 
mechanism, it is necessary to determine the rate constant of each elementary step individually. Many kinetic 
experiments are devoted to the investigations of the temperature dependence of reaction rates. In addition to 
the obvious practical aspects, the temperature dependence of rate constants is also of great theoretical 
importance. Many statistical theories of chemical reactions are based on thermal equilibrium assumptions. 
Non-equilibrium effects are not only important for theories going beyond the classical transition-state picture. 
Eventually they might even be exploited to control chemical reactions [24]. This has led to the increased 
importance of energy or even quantum-state-resolved kinetic studies, which can be directly compared with 
detailed quantum-mechanical models of chemical reaction dynamics [25, 26 ]. 

Many experimental methods may be distinguished by whether and how they achieve time resolution — directly 
or indirectly. Indirect methods avoid the requirement for fast detection methods, either by determining relative 
rates from product yields or by transforming from the time axis to another coordinate, for example the 
distance or flow rate in flow tubes. Direct methods include (laser-) flash photolysis [27], pulse radiolysis [28] 
(see also chapter A3.5 on ion reactions), and the important relaxation techniques, which study the relaxation 
of a reacting system back into equilibrium after a perturbation [29]. Here one distinguishes two types of 
perturbation methods: (i) small perturbations from equilibrium leading to relaxation kinetics in the narrower 
sense with generalized first-order kinetics [6] and (ii) large perturbations from equilibrium leading to 
generally nonlinear rate laws. Typical examples for large perturbations are temperature jumps achieved 
through laser heating or in shock-tube experiments [30]. 

The time resolution of these methods is determined by the time it takes to initiate the reaction, for example the 
mixing time in flow tubes or the laser pulse width in flash photolysis, and by the time resolution of the 
detection. Relatively slow reactions can be monitored by taking samples and quickly quenching the reaction by 
cooling to low 
temperature or by dilution. The samples can then be analysed using conventional analytical techniques [31]. 
For time-dependent monitoring, any physical-chemical property changing during the course of a reaction can 
be used. Perhaps most important are spectroscopic detection techniques, particularly IR (infrared) and UV- 
VIS (ultraviolet-visible) absorption and fluorescence techniques. Directly recording spectroscopic signals as a 
function of time can achieve a time resolution of about 0.1 ns in favourable cases. Even higher time resolution 
of a few femtoseconds can be realised in pump-probe, laser flash-photolysis experiments [22, 32] (see also 
chapter B2.1 on ultrafast spectroscopy). 

A completely different approach, in particular for fast unimolecular processes, extracts state-resolved kinetic 
information from molecular spectra without using any form of time-dependent observation. This includes 
conventional line-shape methods, as well as the quantum-dynamical analysis of rovibrational overtone spectra 
[18, 33, 34 and 35]. 

At this point, we only mention the very important molecular-beam techniques [15, 27, 36, 37 and 38] that 
allow the study of isolated molecules, largely without thermal congestion. They are ideally suited to the 
investigation of unimolecular processes, in particular dissociation reactions and energy redistribution 
processes (see chapter A3.13 on energy redistribution in reacting systems). The determination of state- 
resolved cross sections for bimolecular reactions in crossed molecular beams has paved the way for 
mechanistic investigations of elementary processes in the greatest possible detail (see chapter B2.3 Reactive 
scattering). 


B2.5.2 FLOW TUBES 

Figure B2.5.1 schematically illustrates a typical flow-tube set-up. In gas-phase studies, it serves mainly two 
purposes. On the one hand it allows highly reactive short-lived reactant species, such as radicals or atoms, to 
be prepared at well-defined concentrations in an inert buffer gas. On the other hand, the flow replaces the time 
dependence, t, of a reaction by the dependence on the distance x from the point where the reactants are mixed 
by the simple transformation with the flow velocity $v_x$: 

$$t - t_0 = \frac{x - x_0}{v_x} \tag{B2.5.1}$$

Instead of shifting the detector position, as indicated in figure B2.5.1, one often varies the location of the 
reactant mixing region using moveable injectors. This allows complex, possibly slow, but powerful, analytical 
techniques to be used for monitoring gas-phase reactions. In combination with mass-spectrometric detection, 
both reactants and products can be monitored quantitatively [39, 40 and 41]. A further possibility consists of 
keeping the position $(x - x_0)$ in equation (B2.5.1) constant and varying the flow velocity $v_x$, thereby varying 
$(t - t_0)$. This technique is called 'accelerated flow'. 
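
The time-to-distance transformation is simple enough to state as a short Python sketch. It assumes ideal plug flow with a constant velocity. The function name and the numerical values (a flow velocity of 25 m/s and a set of injector distances) are purely illustrative.

```python
def reaction_time(distance_m, flow_velocity_m_per_s):
    """Reaction time t - t0 (s) for a distance x - x0 (m) under ideal plug flow."""
    if flow_velocity_m_per_s <= 0.0:
        raise ValueError("flow velocity must be positive")
    return distance_m / flow_velocity_m_per_s


if __name__ == "__main__":
    velocity = 25.0  # m/s, hypothetical fast-flow value
    for distance in (0.001, 0.01, 0.1, 0.5):  # m, hypothetical positions
        print(distance, reaction_time(distance, velocity))
```

The same function covers the 'accelerated flow' variant: the distance is held fixed and the velocity argument is varied instead.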



Figure B2.5.1. Schematic representation of a typical flow tube set-up with moveable detection. Adapted from 
[110]. 

The time-to-distance transformation requires fast mixing and a known flow profile, ideally a turbulent flow 
with a well-defined homogeneous composition perpendicular to the direction of flow ('plug-flow'), as 
indicated by the shaded area in figure B2.5.1. More complicated profiles may require numerical 
transformations. 

One of the major limiting factors for the time resolution of flow-tube experiments is the time required for 
mixing reactants and — to a lesser extent — the resolution of distance. With typical fast flow rates of more than 
25 m s$^{-1}$ [42, 43] the time resolution lies between milliseconds and microseconds. 

Modern applications of the technique include kinetic studies of post-combustion processes and their complex 
reaction systems. The influence of traces of NO$_x$ on the reaction kinetics of the H$_2$/O$_2$ and CO/H$_2$O/O$_2$ 
systems has recently been investigated in a high-pressure, turbulent-flow reactor at pressures up to 14 atm 
between 750 K and 1100 K [31, 44]. The reaction was monitored by taking samples at a fixed position and 
varying the location where the fuel (CO/H$_2$O/NO$_x$ or H$_2$/NO$_x$) is injected into a hot stream of O$_2$. The 
samples were instantly quenched in the hot-water-cooled sampling probe and analysed with a variety of 
analytical techniques including Fourier-transform infrared spectroscopy. The results were interpreted in terms 
of a reaction mechanism including 52 elementary reactions, assuming instant mixing and homogeneous 
composition perpendicular to the flow direction. 

NO generally catalyses 'fuel consumption' by transforming hydroperoxyl radicals into highly-reactive 
hydroxyl radicals: 


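$$\mathrm{HO_2 + NO \rightarrow NO_2 + OH} \tag{B2.5.2}$$

$$\mathrm{H_2 + OH \rightarrow H_2O + H} \tag{B2.5.3}$$

$$\mathrm{CO + OH \rightarrow CO_2 + H} \tag{B2.5.4}$$

$$\mathrm{NO_2 + H \rightarrow NO + OH}. \tag{B2.5.5}$$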


As a stable radical, however, NO can also catalyse the recombination of radicals (X, Y) at higher 
concentrations, eventually inhibiting overall oxidation [45]: 

$$\mathrm{X + NO \rightarrow XNO} \tag{B2.5.6}$$

The balance of these two effects was found to depend delicately on the stoichiometry, pressure, and 
temperature. The results were used to develop a more comprehensive CO/H$_2$O/O$_2$/NO$_x$ reaction mechanism, 
incorporating the explicit fall-off behaviour of recombination reactions [46, 47]. 


B2.5.3 RELAXATION METHODS 

Two types of relaxation techniques are distinguished, depending on whether the perturbation applied is small 
or large. 

B2.5.3.1 RELAXATION AFTER A SMALL PERTURBATION FROM EQUILIBRIUM 

Perturbation or relaxation techniques are applied to chemical reaction systems with a well-defined 
equilibrium. An 'instantaneous' change of one or several state functions causes the system to relax into its 
new equilibrium [29]. In gas-phase kinetics, the perturbations typically exploit the temperature (T-jump) and 
pressure (P-jump) dependence of chemical equilibria [6]. The relaxation kinetics are monitored by 
spectroscopic methods. 

T-jump techniques can achieve fast heating of the reaction system by pulsed radiation, for example with a 
microwave source or an IR laser. In the latter case one often adds an efficient inert absorber, such as SF$_6$. The 
heating of the reaction system then results from fast collisional relaxation of the initially-excited absorber 
molecules [48, 49]. 

When the perturbation is small, the reaction system is always close to equilibrium. Therefore, the relaxation 
follows generalized first-order kinetics, even if bi- or trimolecular steps are involved (see chapter A3.4). Take, 
for example, the reversible bimolecular step 

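$$\mathrm{A + B} \underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}} \mathrm{C + D}. \tag{B2.5.8}$$

With equilibrium concentrations $c^{\mathrm{eq}}$, the (small) deviation from equilibrium is given by 

$$\Delta c = c_{\mathrm{A}} - c_{\mathrm{A}}^{\mathrm{eq}} = c_{\mathrm{B}} - c_{\mathrm{B}}^{\mathrm{eq}} = c_{\mathrm{C}}^{\mathrm{eq}} - c_{\mathrm{C}} = c_{\mathrm{D}}^{\mathrm{eq}} - c_{\mathrm{D}}. \tag{B2.5.9}$$

Exploiting microscopic reversibility 

$$k_2 c_{\mathrm{A}}^{\mathrm{eq}} c_{\mathrm{B}}^{\mathrm{eq}} = k_{-2} c_{\mathrm{C}}^{\mathrm{eq}} c_{\mathrm{D}}^{\mathrm{eq}} \tag{B2.5.10}$$

and neglecting terms quadratic in $\Delta c$ leads to the approximate first-order rate law for this elementary step: 

$$-\frac{\mathrm{d}c_{\mathrm{A}}}{\mathrm{d}t} = -\frac{\mathrm{d}\Delta c}{\mathrm{d}t} = k_{\mathrm{eff}} \Delta c. \tag{B2.5.11}$$

For this reaction alone, one would thus obtain a simple exponential relaxation with relaxation time 

$$\tau = \frac{1}{k_{\mathrm{eff}}} = \left[ k_2 \left( c_{\mathrm{A}}^{\mathrm{eq}} + c_{\mathrm{B}}^{\mathrm{eq}} \right) + k_{-2} \left( c_{\mathrm{C}}^{\mathrm{eq}} + c_{\mathrm{D}}^{\mathrm{eq}} \right) \right]^{-1}. \tag{B2.5.12}$$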


More generally, the relaxation follows generalized first-order kinetics with several relaxation times $\tau_i$, as 
depicted schematically in figure B2.5.2 for the case of three well-separated time scales. The various relaxation 
times determine the turning points of the product concentration on a logarithmic time scale. These relaxation 
times are obtained from the eigenvalues of the appropriate rate coefficient matrix (chapter A3.4). The time 
resolution of T-jump relaxation techniques is often limited by the rate at which the system can be heated. With 
typical T-jumps of several kelvin, the time resolution lies in the microsecond range. 
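
The linearization for the single bimolecular step A + B = C + D can be checked numerically. The Python sketch below assumes the relaxation time obtained by expanding the rate law to first order in the deviation, 1/tau = k2 (cA_eq + cB_eq) + k-2 (cC_eq + cD_eq), and compares it with a direct integration of the full nonlinear rate law. All rate constants and concentrations are hypothetical values in arbitrary units, chosen only so that the equilibrium condition is satisfied.

```python
import math


def relaxation_time(k2, km2, ca_eq, cb_eq, cc_eq, cd_eq):
    """Linearized relaxation time for A + B <=> C + D close to equilibrium."""
    return 1.0 / (k2 * (ca_eq + cb_eq) + km2 * (cc_eq + cd_eq))


def integrate_deviation(dc0, t_end, k2, km2, ca_eq, cb_eq, cc_eq, cd_eq, steps=2000):
    """Integrate the full nonlinear rate law for dc = c_A - c_A_eq (classical RK4)."""

    def rate(dc):
        forward = k2 * (ca_eq + dc) * (cb_eq + dc)
        backward = km2 * (cc_eq - dc) * (cd_eq - dc)
        return backward - forward  # d(dc)/dt = dc_A/dt

    h = t_end / steps
    dc = dc0
    for _ in range(steps):
        s1 = rate(dc)
        s2 = rate(dc + 0.5 * h * s1)
        s3 = rate(dc + 0.5 * h * s2)
        s4 = rate(dc + h * s3)
        dc += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return dc


# Hypothetical values, chosen so that k2*cA*cB = km2*cC*cD at equilibrium.
PARAMS = dict(k2=2.0, km2=1.0, ca_eq=1.0, cb_eq=1.0, cc_eq=1.0, cd_eq=2.0)

if __name__ == "__main__":
    tau = relaxation_time(**PARAMS)
    dc0 = 1.0e-3
    dc = integrate_deviation(dc0, tau, **PARAMS)
    print("tau =", tau)
    print("dc(tau)/dc0 =", dc / dc0, " exp(-1) =", math.exp(-1.0))
```

For a small initial deviation the integrated result after one relaxation time agrees closely with the factor 1/e expected for a simple exponential; increasing `dc0` shows where the quadratic terms start to matter.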


Figure B2.5.2. Schematic relaxation kinetics in a T-jump experiment. c measures the progress of the reaction, 
for example the concentration of a reaction product as a function of time t (abscissa with a logarithmic time 
scale). The reaction starts at $t_0$. (a) Simple relaxation kinetics with a single relaxation time. (b) Complex 
reaction mechanism with several relaxation times $\tau_i$. The different relaxation times $\tau_i$ are given by the 
turning points of c as a function of ln(t). Adapted from [110]. 


T-jump experiments are particularly well-suited to the study of the dissociation kinetics of weakly-bound 
molecules or molecular complexes. Markwalder et al [49] used the laser-induced T-jump method to 
investigate the temperature and pressure dependence of NO$_2$ recombination kinetics: 


$$\mathrm{N_2O_4 + M \rightleftharpoons 2NO_2 + M}. \tag{B2.5.14}$$


With M = He, experiments were carried out between 255 K and 273 K with a few millibar NO$_2$ at total 
pressures between 300 mbar and 200 bar. Temperature jumps on the order of 1 K were effected by pulsed 
irradiation (about 1 µs or less) with a CO$_2$ laser at 9.2-9.6 µm and with SiF$_4$ or perfluorocyclobutane as 
primary IR absorbers (on the order of 1 mbar). Under these conditions, the dissociation of N$_2$O$_4$ occurs within 
the irradiated volume on a time scale of a few hundred microseconds. NO$_2$ and N$_2$O$_4$ were monitored 
simultaneously by recording the time-dependent UV absorption signal at 420 nm and 253 nm, respectively. The 
recombination rate constant $k_{\mathrm{rec}}$ can be obtained from the effective first-order relaxation time, $\tau_R$. A 
derivation analogous to equations (B2.5.9)-(B2.5.12) yields 

$$k_{\mathrm{rec}} = \frac{\tau_R^{-1}}{K + 4[\mathrm{NO_2}]} \tag{B2.5.15}$$


where K is the equilibrium constant of equation (B2.5.14). $k_{\mathrm{rec}}$, K and [NO$_2$] all refer to the final 
temperature. At 255 K, the authors obtained the typical fall-off curve depicted in figure B2.5.3. Even at 200 
bar, the effective rate constant is still less than half the extrapolated high-pressure limit. The final results of 
the high- ($k_\infty$) and low-pressure ($k_0$) limiting rate constants (see chapter A3.4) were [49] 


k m ^ - (2.2 ±0.2) x ]0\T/Kf^ ll) cm } 


mo I 


(B2.5.16) 


JWo = 17 5 ± (KK) x 1(^(77Ky- 90±09) x [He] cm 6 moP 2 s 


(B2.5.17) 


where the temperature is given in Kelvin. 
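
The conversion of a measured relaxation time into a recombination rate constant can be sketched in Python as follows. The sketch assumes the relation 1/tau_R = k_rec (K + 4[NO2]), which is what linearizing the rate law of reaction (B2.5.14) around equilibrium gives when K = [NO2]^2/[N2O4] is expressed in concentration units. The function names and the numerical inputs are hypothetical and are not data from [49].

```python
def recombination_rate_constant(tau_r, k_eq, no2_conc):
    """k_rec from the relaxation time, assuming 1/tau_R = k_rec * (K + 4*[NO2]).

    tau_r     relaxation time in s
    k_eq      equilibrium constant [NO2]^2/[N2O4] in mol cm^-3
    no2_conc  NO2 concentration at the final temperature in mol cm^-3
    Returns k_rec in cm^3 mol^-1 s^-1.
    """
    return 1.0 / (tau_r * (k_eq + 4.0 * no2_conc))


def relaxation_time_from_k_rec(k_rec, k_eq, no2_conc):
    """Inverse relation: the relaxation time expected for a given k_rec."""
    return 1.0 / (k_rec * (k_eq + 4.0 * no2_conc))


if __name__ == "__main__":
    # Purely illustrative inputs.
    k_rec = recombination_rate_constant(tau_r=2.0e-4, k_eq=1.0e-8, no2_conc=5.0e-8)
    print("k_rec =", k_rec, "cm^3 mol^-1 s^-1")
    print("tau_R =", relaxation_time_from_k_rec(k_rec, 1.0e-8, 5.0e-8), "s")
```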
Figure B2.5.3. The fall-off curve of reaction (B2.5.14) with M = He between 0.3 bar and 200 bar. The dashed 
lines represent the extrapolated low- and high-pressure limits. $k_0 = (2.1 \pm 0.2) \times 10^{14} \times [\mathrm{He}]$ cm$^6$ mol$^{-2}$ s$^{-1}$ 
and $k_\infty = (7.0 \pm 0.7) \times 10^{11}$ cm$^3$ mol$^{-1}$ s$^{-1}$ yield the best fit (full curve) to the experimental data (full 
circles). Adapted from [49]. 


B2.5.3.2 PERIODIC SMALL PERTURBATION FROM EQUILIBRIUM AND ULTRASOUND ABSORPTION 

The previous subsection described single-experiment perturbations by T-jumps or P-jumps. By contrast, 
sound and ultrasound may be used to induce small periodic perturbations of an equilibrium system that are 
equivalent to periodic pressure and temperature changes. A temperature amplitude $\delta T \approx 0.002$ K and a 
pressure amplitude $\delta P \approx 30$ mbar are typical in experiments with high-frequency ultrasound. Figure B2.5.4 
illustrates the situation for different rates of chemical relaxation with the angular frequency of the sound wave 
$\omega$ and the relaxation time $\tau_R$: 

$\omega\tau_R \ll 1$. The sample relaxes fast with the displacement from equilibrium synchronous with the 
sound wave. 

$\omega\tau_R \approx 1$. Compared with the sound wave, the system relaxes slowly. It lags behind, the phase is 
shifted, and amplitudes are reduced by damping. 

$\omega\tau_R \gg 1$. Very slow relaxation. 

As an example for the mathematical treatment, we take the bimolecular reaction 


$$\mathrm{A + B \rightleftharpoons P + M}. \tag{B2.5.18}$$



Figure B2.5.4. Periodic displacement from equilibrium through a sound wave. The full curve represents the 
temporal behaviour of pressure, temperature, and concentrations in the case of a very fast relaxation. The 
other lines illustrate various situations, with $\omega\tau_R$ according to table B2.5.1. $\omega$ is the angular frequency of the 
sound wave and $\tau_R$ is the chemical relaxation time. The abscissa is $\omega t/2\pi$. Adapted from [110]. 

Table B2.5.1 Form of the 'chemical wave' $y_\infty(t)$ (equation (B2.5.24)) for the various cases depicted in figure 
B2.5.4. 

Case   $\omega\tau_R$    Amplitude                     Phase shift        $y_\infty(t)$
1      $\ll 1$           $\approx a$                   $\approx 0$        $a \sin(\omega t)$
2      $10^3 \gg 1$      $\approx a/(\omega\tau_R)$    $\approx \pi/2$    $-[a/(\omega\tau_R)] \cos(\omega t)$
3      $1$               $a/\sqrt{2}$                  $\pi/4$            $(a/\sqrt{2}) \sin(\omega t - \pi/4)$
4      $\sqrt{3}$        $a/2$                         $\pi/3$            $(a/2) \sin(\omega t - \pi/3)$


The turnover variable

$$x = c_A(t = 0) - c_A = c_B(t = 0) - c_B \qquad \text{(B2.5.19)}$$

obeys a first-order rate law near the equilibrium value, $x_{\mathrm{eq}}$, which, in turn, depends on the temperature change
$\Delta T$ induced by the sound wave:

$$\frac{\mathrm{d}x}{\mathrm{d}t} = k_{\mathrm{eff}}\,(x_{\mathrm{eq}}(T + \Delta T) - x). \qquad \text{(B2.5.20)}$$

For small $\Delta T$, the temperature dependence of the effective first-order rate constant, $k_{\mathrm{eff}}$, can be neglected.
With $y = x_{\mathrm{eq}}(T) - x$ and with the shift of the equilibrium, $\Delta x_{\mathrm{eq}} = x_{\mathrm{eq}}(T) - x_{\mathrm{eq}}(T + \Delta T)$, one obtains

$$\frac{\mathrm{d}y}{\mathrm{d}t} = -k_{\mathrm{eff}}\,(y - \Delta x_{\mathrm{eq}}) = -\frac{y - \Delta x_{\mathrm{eq}}}{\tau_R}. \qquad \text{(B2.5.21)}$$

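As long as $\Delta T$, $\Delta x_{\mathrm{eq}}$, and $\Delta x$ remain small, they will be proportional to the sinusoidal pressure wave. In
particular

$$\Delta x_{\mathrm{eq}} = a \sin(\omega t). \qquad \text{(B2.5.22)}$$

This leads to

$$y(t) + \tau_R \frac{\mathrm{d}y}{\mathrm{d}t} = a \sin(\omega t) \qquad \text{(B2.5.23)}$$

with the general solution

$$y(t) = \left[ y(0) + \frac{a\omega\tau_R}{1 + \omega^2\tau_R^2} \right] \exp\left(-\frac{t}{\tau_R}\right) + \frac{a}{1 + \omega^2\tau_R^2} \sin(\omega t) - \frac{a\omega\tau_R}{1 + \omega^2\tau_R^2} \cos(\omega t). \qquad \text{(B2.5.24)}$$

For sufficiently long times (index $\infty$), the exponential can be neglected, leaving an oscillation of the turnover
variable phase shifted with respect to the sound wave and with its amplitude reduced by the finite relaxation
time $\tau_R$:

$$y_\infty(t) = \frac{a}{\sqrt{1 + \omega^2\tau_R^2}} \sin(\omega t - \arctan(\omega\tau_R)). \qquad \text{(B2.5.25)}$$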
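The amplitude and phase shift of the long-time response can be checked numerically against the cases of table B2.5.1. The following is a minimal sketch, assuming the forms $y_\infty(t) = a(1 + \omega^2\tau_R^2)^{-1/2} \sin(\omega t - \arctan(\omega\tau_R))$ and the general solution with an exponential transient; the function names are illustrative and not part of the original text.

```python
import math


def amplitude_and_phase(a, omega_tau):
    """Amplitude and phase shift of the long-time chemical wave.

    a          -- amplitude of the equilibrium shift (arbitrary units)
    omega_tau  -- dimensionless product of angular frequency and relaxation time
    """
    amplitude = a / math.sqrt(1.0 + omega_tau ** 2)
    phase = math.atan(omega_tau)
    return amplitude, phase


def chemical_wave_long_time(t, a, omega, tau_r):
    """Long-time response: damped and phase-shifted sine."""
    amplitude, phase = amplitude_and_phase(a, omega * tau_r)
    return amplitude * math.sin(omega * t - phase)


def chemical_wave_general(t, a, omega, tau_r, y0=0.0):
    """General solution including the exponential transient."""
    denom = 1.0 + (omega * tau_r) ** 2
    transient = (y0 + a * omega * tau_r / denom) * math.exp(-t / tau_r)
    steady = (a / denom) * math.sin(omega * t) \
        - (a * omega * tau_r / denom) * math.cos(omega * t)
    return transient + steady


if __name__ == "__main__":
    for omega_tau in (0.01, 1.0, math.sqrt(3.0), 10.0):
        amp, phase = amplitude_and_phase(1.0, omega_tau)
        print(f"omega*tau = {omega_tau:6.3f}  amplitude/a = {amp:.4f}  phase = {phase:.4f} rad")
```

After many relaxation times the two functions agree, which is the statement that the transient can be neglected.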


The easily accessible frequency range of sound and ultrasound waves confines the range of applicability of 

this technique to relaxation times, $\tau_R$, between $10^{-4}$ s and $10^{-9}$ s. The derivation given here is, of course,
independent of the underlying chemical process, as long as it is characterized by a single relaxation time. In 
general a complex relaxation spectrum is possible, so that the method reaches its limit for complex reactions 
as the interpretation of the results may become ambiguous. Extensive descriptions of relaxation experiments 
with small perturbations — both single and periodic — can be found in [29, 50]. Here one can also find 
numerous relaxation-time expressions for various equilibrium systems (uni-, bi-, trimolecular and reverse). 
B2.5.3.3 RELAXATION AFTER LARGE PERTURBATION: SHOCK-WAVE EXPERIMENTS


A general limitation of the relaxation techniques with small perturbations from equilibrium discussed in the 
previous section arises from the restriction to systems starting at or near equilibrium under the conditions 
used. This limitation is overcome by techniques with large perturbations. The most important representative 
of this class of relaxation techniques in gas-phase kinetics is the shock-tube method, which achieves T-jumps 
of some 1000 K (accompanied by corresponding P-jumps) [30, 51, 52 and 53]. Shock tubes are particularly
useful for measuring the temperature dependence of reaction rates up to high temperatures. Figure B2.5.5
shows a schematic representation of the experimental set-up. The shock tube consists of a high-pressure part and a
low-pressure part (R), separated by a diaphragm (d). The low-pressure part is filled with the reaction mixture and is operated
either as a static cell or as a low-pressure flow tube. The high-pressure part is filled with a light, inert gas,
such as H$_2$ or He, whose pressure is increased until the diaphragm breaks. This creates a shock wave
travelling through the reaction mixture with supersonic speed. Behind the shock front the temperature can
jump by more than 1000 K within 1 µs or less. After passing the detection zone, where the relaxation is
followed spectroscopically, the wave front is reflected back at the end of the tube. The resulting change of the
temperature as a function of time is depicted in figure B2.5.6. The temperature $T_2$ after the wave front is
determined indirectly from the speed, $u$, at which the wave front travels through the tube. For a highly dilute
reaction mixture in a monatomic gas with atomic mass $M$ one obtains


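$$T_2 = T_1 \left( \frac{3Mu^2}{RT_1} - 1 \right) \left( \frac{Mu^2}{5RT_1} + 1 \right) \left( \frac{16Mu^2}{5RT_1} \right)^{-1} \qquad \text{(B2.5.26)}$$

where $T_1$ is the temperature before the wave front has passed and $R$ is the gas constant.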
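As a numerical illustration of how the post-shock temperature follows from the measured shock speed, the sketch below evaluates the ideal normal-shock relation for a monatomic gas (heat-capacity ratio 5/3), written in terms of $Mu^2/(RT_1)$. It assumes an ideal gas with $M$ given as a molar mass in kg mol$^{-1}$; the function names and the choice of argon as bath gas are illustrative assumptions, not taken from the text.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def sound_speed_monatomic(t1, molar_mass):
    """Speed of sound (m/s) in an ideal monatomic gas, gamma = 5/3."""
    return math.sqrt(5.0 * R_GAS * t1 / (3.0 * molar_mass))


def shock_temperature(u, t1, molar_mass):
    """Temperature T2 (K) behind an incident shock of speed u (m/s).

    Assumes an ideal monatomic bath gas and a highly dilute reaction mixture,
    so that the gas properties are those of the bath gas alone.
    """
    x = molar_mass * u ** 2 / (R_GAS * t1)  # dimensionless M u^2 / (R T1)
    return t1 * (3.0 * x - 1.0) * (x / 5.0 + 1.0) / (16.0 * x / 5.0)


if __name__ == "__main__":
    m_argon = 0.039948  # kg/mol, illustrative bath gas
    t1 = 300.0
    a = sound_speed_monatomic(t1, m_argon)
    for mach in (1.0, 2.0, 3.0, 4.0):
        print(f"Mach {mach:.0f}: u = {mach * a:7.1f} m/s, T2 = {shock_temperature(mach * a, t1, m_argon):7.1f} K")
```

A shock travelling at the speed of sound leaves the temperature unchanged, while a Mach 3 shock in a 300 K monatomic gas gives 1100 K in this idealized model.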


Figure B2.5.5. Schematic representation of a shock-tube apparatus. The diaphragm d separates the high-pressure
part from the low-pressure reaction chamber R. The speed of the shock wave is determined by the
time $\Delta t$ it takes to pass two observation points. The reaction itself is monitored spectroscopically using the
light source L and a detector D close to the reflection wall W. Adapted from [110].
Figure B2.5.6. Temperature as a function of time in a shock-tube experiment. The first T-jump results from
the incoming shock wave. The second is caused by the reflection of the shock wave at the wall of the tube.
The rise time $\delta t$ typically is less than 1 µs, whereas the time delay between the incoming and reflected shock
wave is on the order of several hundred microseconds. Adapted from [110].

A classic shock-tube study concerned the high-temperature recombination rate and equilibrium for methyl 
radical recombination [54, 55]. Methyl radicals were first produced in a fast decomposition of azomethane
at high temperatures ($T > 1000$ K)

$$\mathrm{CH_3NNCH_3} \rightarrow 2\,\mathrm{CH_3} + \mathrm{N_2}. \qquad \text{(B2.5.27)}$$

Subsequently, the recombination of methyl radicals was studied by the high-temperature UV absorption of the
methyl radicals near 216 nm [54, 55 and 56],

$$\mathrm{CH_3} + \mathrm{CH_3} \rightarrow \mathrm{C_2H_6}. \qquad \text{(B2.5.28)}$$
Figure B2.5.7 shows the methyl radical absorption traces as a function of time. At the time
resolution considered, the appearance of CH$_3$ is practically instantaneous. Subsequently, CH$_3$ disappears by
recombination (equation (B2.5.28)). At temperatures below 1500 K, the equilibrium concentration of CH$_3$ is
negligible compared with C$_2$H$_6$ (left-hand trace): the recombination is complete. At temperatures above 1500
K (right-hand trace) the equilibrium concentration of CH$_3$ is appreciable, and thus the technique allows the
determination of both the equilibrium constant and the recombination rate [54, 55]. This experiment resolved
a famous controversy on the temperature dependence of the recombination rate of methyl radicals. While
standard RRKM theories [52, 58] predicted an increase of the high-pressure recombination rate coefficient
$k_{\mathrm{rec},\infty}(T)$ by a factor of 10-30 between 300 K and 1400 K, the statistical-adiabatic-channel model predicts a
slight decrease of $k_{\mathrm{rec},\infty}(T)$ with increasing temperature [59, 60], in agreement with experiment [54, 55]. This
temperature dependence of the high-pressure recombination rate coefficient for radical-radical association is
now a generally accepted feature, frequently reconfirmed for other examples in this class of reaction. The
secondary isotope effect for the recombination of CD$_3$ has also been studied for this reaction [54, 55].
Figure B2.5.7. Oscilloscope trace of the UV absorption of methyl radical at 216 nm produced by
decomposition of azomethane after a shock wave (after [54]): at (a) 1280 K and (b) 1575 K. 

In a more recent example, a shock-tube experiment was used to study the thermal decomposition of 
methylamine between 1500 K and 2000 K [61, 62]: 


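$$\mathrm{CH_3NH_2} \xrightarrow{[\mathrm{M}]} \mathrm{CH_3} + \mathrm{NH_2} \qquad \text{(B2.5.29)}$$

$$k(T) = 8.17 \times 10^{16} \exp(-30\,710\ \mathrm{K}/T)\ \mathrm{cm^3\ mol^{-1}\ s^{-1}}\ (\pm 20\%). \qquad \text{(B2.5.30)}$$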


The pyrolysis of CH$_3$NH$_2$ (<1 mbar) was performed at 1.3 atm in Ar, spectroscopically monitoring the
concentration of NH$_2$ radicals behind the reflected shock wave as a function of time. The interesting aspect of
this experiment was the combination of a shock-tube experiment with the particularly sensitive detection of
the NH$_2$ radicals by frequency-modulated, laser-absorption spectroscopy [61]. Compared with 'conventional'
narrow-bandwidth laser-absorption detection the signal-to-noise ratio could be increased by a factor of 20, 
with correspondingly more accurate values for the rate constant k(T). 
B2.5.4 FLASH PHOTOLYSIS WITH FLASH LAMPS AND LASERS

One of the most important techniques for the study of gas-phase reactions is flash photolysis [8, 63]. A
reaction is initiated by absorption of an intense light pulse, originally generated from flash lamps (duration
$\approx 1$ µs). Nowadays these have frequently been replaced by pulsed laser sources, with the shortest pulses of the
order of a few femtoseconds [22, 64].

B2.5.4.1 FLASH PHOTOLYSIS WITH FLASH LAMPS 


The absorption of a light pulse 'instantaneously' generates reactive species in high concentrations, either 
through the formation of excited species or through photodissociation of suitable precursors. The reaction can 


then be followed spectroscopically by monitoring reactant and product concentrations. Among the classic 
studies using this technique, one may mention methyl radical spectroscopy used to study recombination at 
room temperature [56, 65, 66, 67 and 68].

A recent example of laser flash-lamp photolysis is given by Hippler et al [69], who investigated the 
temperature and pressure dependence of the thermal recombination rate constant k for the reaction 


$$\mathrm{O} + \mathrm{NO} \xrightarrow{[\mathrm{M}]} \mathrm{NO_2}. \qquad \text{(B2.5.31)}$$


The experiments were performed in a static reaction cell in a large excess of N$_2$ (2-200 bar). A UV laser
pulse (193 nm, 20 ns) started the reaction by the photodissociation of N$_2$O to form O atoms in the presence of
NO. The reaction was monitored via the NO$_2$ absorption at 405 nm using a Hg-Xe high-pressure arc lamp,
together with direct time-dependent detection. With a 20-200-fold excess of NO, the formation of NO$_2$
followed a pseudo-first-order rate law:

$$[\mathrm{NO_2}] = [\mathrm{NO_2}]_{t=\infty} \{1 - \exp(-k\,[\mathrm{NO}]\,t)\}. \qquad \text{(B2.5.32)}$$
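The pseudo-first-order growth of the product signal can be sketched as follows. This is a minimal illustration, assuming the rate law $[\mathrm{NO_2}](t) = [\mathrm{NO_2}]_{t=\infty}\{1 - \exp(-k[\mathrm{NO}]t)\}$ with NO in large excess; the function names and all numerical values are hypothetical and chosen only to show the shape of the curve.

```python
import math


def pseudo_first_order_rate(k, no_conc):
    """Effective first-order rate constant k' = k [NO] (s^-1)."""
    return k * no_conc


def no2_concentration(t, k, no_conc, no2_final):
    """NO2 concentration at time t for constant (excess) NO concentration."""
    return no2_final * (1.0 - math.exp(-pseudo_first_order_rate(k, no_conc) * t))


def half_rise_time(k, no_conc):
    """Time at which the NO2 signal reaches half of its final value."""
    return math.log(2.0) / pseudo_first_order_rate(k, no_conc)


if __name__ == "__main__":
    k = 1.0e12        # cm^3 mol^-1 s^-1, hypothetical value at fixed [M]
    no_conc = 1.0e-8  # mol cm^-3, hypothetical excess NO concentration
    no2_final = 1.0   # final NO2 signal, arbitrary units
    for t in (0.0, 5.0e-5, 1.0e-4, 5.0e-4):
        print(f"t = {t:.1e} s  [NO2]/[NO2]_inf = {no2_concentration(t, k, no_conc, no2_final):.3f}")
```

Fitting the measured rise to this form at several NO concentrations gives $k$ from the slope of $k[\mathrm{NO}]$ against $[\mathrm{NO}]$.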


Direct time-dependent detection is limited by the response time of detectors, which depends on the frequency 
range, and the electronics used for data acquisition. In the most favourable cases, modern 
detector/oscilloscope combinations achieve a time resolution of up to 100 ps, but 1 ns is more typical. Again, 
this reaction has been of fundamental theoretical interest for a long time [59, 60].

B2.5.4.2 LASER FLASH PHOTOLYSIS AND PUMP-PROBE TECHNIQUES 

The so-called pump-probe technique uses a first photolysis pump pulse to generate reactive species and a 
second 'probe' pulse to detect reactant and product species. Figure B2.5.8 illustrates the experimental set-up. 
The time resolution is achieved through varying the delay between the pump pulse, which initiates the 
reaction, and the probe pulse, which monitors the reaction. Variable, short time delays in the picosecond range 
are conveniently realized through geometrical variation of the optical path length that the probe pulse travels. 
The probe pulse monitors reactant or product species either through direct absorption or by fluorescence 
excitation (laser-induced fluorescence, LIF), which generally is much more sensitive [70]. 
Figure B2.5.8. Schematic representation of laser-flash photolysis using the pump-probe technique. The beam
splitter BS splits the pulse coming from the laser into a pump and a probe pulse. The pump pulse initiates a
reaction in the sample, while the probe beam is diverted by several mirrors M through a variable delay line.
The detector D monitors the absorption of the probe beam as a function of the delay between the pulses given
by $x/2c$, where $c$ is the speed of light and $x$ is the difference between the optical path travelled by the probe
and by the pump pulse. Adapted from [110].

With the short pulses available from modern lasers, femtosecond time resolution has become possible [7, 71,
72 and 73]. Producing accurate time delays between pump and probe pulses on this time scale represents a
major challenge for the experimentalist, since light only travels 0.3 µm fs$^{-1}$. Table B2.5.2 summarizes typical
laser pulses and characteristic times that are now available.
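The conversion between a pulse duration (or pump-probe delay) and the corresponding path length is a one-line calculation. The sketch below assumes propagation in vacuum and the illustrative atomic velocity of 1000 m s$^{-1}$ used for comparison; the function names are hypothetical.

```python
C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def optical_path(duration_s):
    """Distance (m) travelled by light during the given time."""
    return C_LIGHT * duration_s


def atom_path(duration_s, speed=1000.0):
    """Distance (m) travelled by an atom moving at `speed` (m/s)."""
    return speed * duration_s


if __name__ == "__main__":
    for label, duration in (("100 ns", 100e-9), ("1 ns", 1e-9),
                            ("1 ps", 1e-12), ("6.5 fs", 6.5e-15)):
        print(f"{label:>7}: light {optical_path(duration):.3e} m, "
              f"atom {atom_path(duration):.3e} m")
```

A delay of 1 ps therefore corresponds to about 0.3 mm of optical path, which is why mechanical delay lines are practical on the picosecond scale and demanding on the femtosecond scale.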

Table B2.5.2. Examples for pulsed lasers with different pulse durations and corresponding path lengths. For
comparison the last column also gives the distance travelled by atoms with a velocity of 1000 m s$^{-1}$ (in
parentheses) [81].

Pulse duration   Laser                                                                 Availability         Optical path
100-200 ns       Atmospheric CO$_2$ laser                                              Commercial           30-60 m (0.1-0.2 mm)
1-2 ns           Atmospheric CO$_2$ laser, mode coupled with saturable absorber        Available            30-60 cm (1-2 µm)
100 fs-1 ps      Solid-state laser (e.g. Ti:sapphire), dye laser                       Commercial           0.03-0.3 mm (100 pm-1 nm)
8 fs             Laser with subsequent pulse compression                               World record [108]   2.4 µm (8.4 pm)
6.5 fs           Ti:sapphire, mode coupling, with saturable absorber (semiconductor)   World record [64]    2 µm (6.5 pm)
One of the early examples for kinetic studies on the femtosecond time scale is the photochemical
predissociation of NaI [74]:

(B2.5.33) 

The experiment is illustrated in figure B2.5.9. The initial pump pulse generates a localized wavepacket in the
first excited S$_1$ state of NaI, which evolves with time. The potential well in the S$_1$ state is the result of an
avoided crossing with the ground state. Every time the wavepacket passes this region, part of it crosses to the
lower surface before the remainder is reflected at the outer wall of the S$_1$ potential. The crossing leads to
ground-state dissociation products Na($^2$S$_{1/2}$) + I($^2$P$_{3/2}$). The crossing is monitored with the time-delayed
probe pulse, which excites ground-state Na ($^2$P $\leftarrow$ $^2$S$_{1/2}$). A photomultiplier detects the fluorescence back to
the ground state. Figure B2.5.10 shows the resulting LIF signal as a function of the probe delay. One clearly
recognizes the signature of the oscillatory wavepacket motion with a relatively long oscillation period of 1 ps,
resulting from the flat potential and the heavy masses.


Figure B2.5.9. Schematic representation of the potential curves for the photodissociation of NaI as a function
of the interatomic distance $R$. L1 and L2 are the pump and probe laser pulses, respectively. The dissociation
limits are (I) Na($^2$S$_{1/2}$) + I($^2$P$_{3/2}$), (II) Na($^2$S$_{1/2}$) + I($^2$P$_{1/2}$), (III) Na$^+$($^1$S$_0$) + I$^-$($^1$S$_0$), and (IV) Na($^2$P) + I($^2$P$_{3/2}$).
$E$ is the excitation energy relative to the lowest dissociation limit (I). Adapted from Rosker et al [74].
Figure B2.5.10. LIF signal of free Na atoms produced in the photodissociation of NaI. $t - t_0$ is the delay
between the photolysis pulse (at $t_0$) and the probe pulse. Adapted from [111].

B2.5.4.3 THE PRINCIPLE OF CONTINUOUS DETECTION WITH UNCERTAINTY-LIMITED TIME AND FREQUENCY 
RESOLUTION 

In this approach one uses narrow-band continuous wave (cw) lasers for continuous spectroscopic detection of 
reactant and product species with high time and frequency resolution. Figure B2.5.11 shows an experimental 
scheme using detection lasers with a 1 MHz bandwidth. Thus, one can measure the energy spectrum of 
reaction products with very high energy resolution. In practice, today one can achieve an uncertainty-limited 
resolution given by 


$$\Delta E\,\Delta t \geq \frac{h}{4\pi}. \qquad \text{(B2.5.34)}$$
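To give a feeling for the numbers, the sketch below converts a detection bandwidth into the corresponding minimum time resolution. It assumes the uncertainty relation in the form $\Delta E\,\Delta t \geq h/4\pi$ with $\Delta E = h\,\Delta\nu$, so that $\Delta t \geq 1/(4\pi\Delta\nu)$; the function name is illustrative.

```python
import math


def min_time_resolution(bandwidth_hz):
    """Smallest time uncertainty (s) compatible with a frequency width (Hz).

    Uses Delta_E * Delta_t >= h / (4 pi) with Delta_E = h * Delta_nu,
    so Planck's constant cancels.
    """
    return 1.0 / (4.0 * math.pi * bandwidth_hz)


if __name__ == "__main__":
    for bandwidth in (1.0e6, 1.0e9):
        print(f"bandwidth {bandwidth:.0e} Hz -> time resolution >= {min_time_resolution(bandwidth):.2e} s")
```

For the 1 MHz detection laser mentioned above this limit is about 80 ns, so a narrower bandwidth can only be had at the cost of a coarser time resolution.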


This technique with very high frequency resolution was used to study the population of different hyperfine 
structure levels of the iodine atom produced by the IR-laser-flash photolysis of organic iodides through
multiphoton excitation:

$$\mathrm{CF_3I} \xrightarrow{nh\nu} \mathrm{CF_3} + \mathrm{I}(^2\mathrm{P}_{3/2},\ F = 1, 2, 3, 4). \qquad \text{(B2.5.35)}$$
Figure B2.5.11. Schematic set-up of laser-flash photolysis for detecting reaction products with uncertainty-limited
energy and time resolution. The excitation CO$_2$ laser pulse LP (broken line) enters the cell from the
left, the tunable cw laser beam CW-L (full line) from the right. A filter cell FZ protects the detector D, which
determines the time-dependent absorbance, from scattered CO$_2$ laser light. The pyroelectric detector PY
measures the energy of the CO$_2$ laser pulse and the photon drag detector PD its temporal profile. A complete
description can be found in [109].


Figure B2.5.12 shows the energy-level scheme of the fine structure and hyperfine structure levels of iodine. 
The corresponding absorption spectrum shows six sharp hyperfine structure transitions. The experimental 
resolution is sufficient to determine the Doppler line shape associated with the velocity distribution of the I 
atoms produced in the reaction. In this way, one can determine either the temperature in an oven — as shown 
in Figure B2.5.12 — or the primary translational energy distribution of I atoms produced in photolysis, 
equation (B2.5.35).
Figure B2.5.12. Hyperfine structure energy level scheme and spectrum for the I($^2$P$_{3/2}$) $\rightarrow$ I($^2$P$_{1/2}$) fine
structure transition [109].


B2.5.5 MULTIPHOTON EXCITATION 


B2.5.5.1 MECHANISMS OF MULTIPHOTON EXCITATION 


The common flash-lamp photolysis and often also laser-flash photolysis are based on photochemical 
processes that are initiated by the absorption of a photon, $h\nu$. The intensity of laser pulses can reach GW cm$^{-2}$
or even TW cm$^{-2}$, where multiphoton processes become important. Figure B2.5.13 summarizes the different
mechanisms of multiphoton excitation [75, 76, 112]. The direct multiphoton absorption of mechanism (i)
requires an odd number of photons to reach an excited atomic or molecular level in the case of strict electric
dipole and parity selection rules [117].
The Goeppert-Mayer two- (or multi-) photon absorption, mechanism (ii), may look similar, but it involves
intermediate levels far from resonance with one-photon absorption. A third, quasi-resonant stepwise
mechanism (iii), proceeds via single-photon excitation steps involving near-resonant intermediate levels.
Finally, in mechanism (iv), there is the stepwise multiphoton absorption of incoherent radiation from thermal
light sources or broad-band statistical multimode lasers. In principle, all of these processes and their
combinations play a role in the multiphoton excitation of atoms and molecules, but one can broadly
distinguish two situations.

(A) During the multiphoton excitation of molecular vibrations with IR lasers, many (typically 10-50) 
photons are absorbed in a quasi-resonant stepwise process until the absorbed energy is sufficient to 
initiate a unimolecular reaction, dissociation, or isomerization, usually in the electronic ground state. 
The record in the number of absorbed photons (about 500 photons of a CO$_2$ laser) was reached with the
C$_{60}$ molecule [77]. This case proved an exception in that the primary reaction was ionization. The IR
multiphoton excitation is the starting point for a new gas-phase photochemistry, IR laser chemistry, 
which encompasses numerous chemical processes. 

(B) The multiphoton excitation of electronic levels of atoms and molecules with visible or UV radiation 
generally leads to ionization. The mechanism is generally a combination of direct, Goeppert-Mayer, and 
quasi-resonant stepwise processes. Since ionization often requires only two or three photons, this type of 
multiphoton excitation is used for spectroscopic purposes in combination with mass-spectrometric 
detection of ions. 

B2.5.5.2 IR MULTIPHOTON EXCITATION AND IR LASER CHEMISTRY 

The most commonly used laser-light source in IR laser chemistry is the atmospheric CO$_2$ laser, with IR
emission lines between 900 cm$^{-1}$ and 1100 cm$^{-1}$, in the fingerprint range of the IR spectrum, where
characteristic molecular vibrations can be excited. With a photon energy of about 12 kJ mol$^{-1}$, on the order of
10-40 photons are needed to initiate a chemical reaction in the energy range of 100-500 kJ mol$^{-1}$. The laser
pulses are 100 ns to µs long, but a series of 1-2 ns pulses can be generated by mode coupling. Typical
intensities are 100 MW cm$^{-2}$. Figure B2.5.14 schematically illustrates the photodissociation of CF$_3$I (equation
(B2.5.35)) after multiphoton excitation via the CF stretching vibration at 1070 cm$^{-1}$. More than 17 photons are
needed to break the C-I bond, a typical value in IR laser chemistry. Contributions from direct absorption (i)
are insignificant, so that the process almost exclusively follows the quasi-resonant mechanism (iii), which can
be treated by generalized first-order kinetics. As an example, figure B2.5.15 illustrates the formation of I
atoms (upper trace) during excitation with the pulse sequence of a mode-coupled CO$_2$ laser (lower trace). In
addition to the intensity, $I$, the fluence, $F$, of radiation is a very important parameter in IR laser chemistry (and
more generally in multiphoton excitation):

$$F(t) = \int_0^t I(t')\,\mathrm{d}t'. \qquad \text{(B2.5.36)}$$
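The photon-counting arithmetic and the fluence integral above are easy to reproduce. The following sketch converts an IR wavenumber into a molar photon energy, estimates the number of photons needed to reach a given energy, and integrates a sampled intensity profile by the trapezoidal rule; the function names and the rectangular test pulse are illustrative assumptions.

```python
import math

H_PLANCK = 6.62607015e-34  # J s
C_LIGHT = 299_792_458.0    # m/s
N_AVOGADRO = 6.02214076e23  # mol^-1


def photon_energy_kj_per_mol(wavenumber_cm):
    """Molar photon energy (kJ/mol) for a wavenumber given in cm^-1."""
    return H_PLANCK * C_LIGHT * (wavenumber_cm * 100.0) * N_AVOGADRO / 1000.0


def photons_needed(energy_kj_per_mol, wavenumber_cm):
    """Smallest whole number of photons whose total energy reaches the target."""
    return math.ceil(energy_kj_per_mol / photon_energy_kj_per_mol(wavenumber_cm))


def fluence(times_s, intensities_w_cm2):
    """Fluence (J/cm^2) as the trapezoidal integral of intensity over time."""
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (intensities_w_cm2[i] + intensities_w_cm2[i - 1]) * dt
    return total


if __name__ == "__main__":
    print(f"1000 cm^-1 photon: {photon_energy_kj_per_mol(1000.0):.2f} kJ/mol")
    print(f"photons for 100 kJ/mol: {photons_needed(100.0, 1000.0)}")
    print(f"photons for 500 kJ/mol: {photons_needed(500.0, 1000.0)}")
    # Idealized rectangular pulse: 100 MW/cm^2 for 30 ns
    print(f"fluence: {fluence([0.0, 30e-9], [1.0e8, 1.0e8]):.2f} J/cm^2")
```

An idealized 30 ns pulse at 100 MW cm$^{-2}$ carries a fluence of 3 J cm$^{-2}$, which shows how intensities and fluences of the magnitudes quoted in this section are related.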
Figure B2.5.13. Schematic representation of the four different mechanisms of multiphoton excitation: (i)
direct, (ii) Goeppert-Mayer, (iii) quasi-resonant stepwise and (iv) incoherent stepwise. Full lines (right)
represent the coupling path between the energy levels and broken arrows the photon energies with angular
frequency $\omega$ ($\Delta\omega$ is the frequency width of the excitation light in the case of incoherent excitation), see also
[112].

Consequently, the reaction yield $F_P$ in figure B2.5.15 is shown as a function of the fluence, $F$. At the end of a
laser-pulse sequence with a typical fluence $F = 3$ J cm$^{-2}$, practically 100% of the CF$_3$I is photolysed. As
described in section B2.5.4.3, the product-level distribution of the iodine atoms formed in this type of reaction
can be determined spectroscopically. Table B2.5.3 shows the results of such an analysis of the population of hyperfine structure
levels and of the translational energy distribution for the IR multiphoton dissociation of different organic
iodides. The average product translational energy (in the centre-of-mass system) does not change much from
the small CF$_3$I to the much larger C$_6$F$_5$I molecule. Its relative share of the total energy, however, decreases:
much more energy appears as the internal energy of the C$_6$F$_5$ fragment. This can be readily understood
assuming a roughly statistical distribution over the large number of internal degrees of freedom. Such results 
are crucial for a more accurate dynamical understanding of the processes taking place during a chemical 
reaction. 


Table B2.5.3. Product energy distribution for some IR laser chemical reactions. $\langle E_t \rangle$ is the average relative
translational energy of fragments, $\langle E_{\mathrm{int}} \rangle$ is the average vibrational and rotational energy of polyatomic
fragments, and $f_t$ is the fraction of the total product energy appearing as translational energy [109].

Reaction                                        $\langle E_t \rangle$/(kJ mol$^{-1}$)   $\langle E_{\mathrm{int}} \rangle$/(kJ mol$^{-1}$)   $f_t$
CF$_3$I $\rightarrow$ CF$_3$ + I                9.9                                     19.8                                                 0.33
CF$_3$CHFI $\rightarrow$ CF$_3$CHF + I          10.9                                    100.9                                                0.097
C$_6$F$_5$I $\rightarrow$ C$_6$F$_5$ + I        13.5                                    233.0                                                0.055
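The translational fractions in table B2.5.3 follow directly from the two average energies. The short check below assumes $f_t = \langle E_t \rangle / (\langle E_t \rangle + \langle E_{\mathrm{int}} \rangle)$, which reproduces the tabulated values; the dictionary and function names are illustrative.

```python
# Average energies in kJ/mol as listed in table B2.5.3:
# precursor -> (translational energy, internal energy of the polyatomic fragment)
PRODUCT_ENERGIES = {
    "CF3I": (9.9, 19.8),
    "CF3CHFI": (10.9, 100.9),
    "C6F5I": (13.5, 233.0),
}


def translational_fraction(e_trans, e_int):
    """Fraction of the total product energy that appears as translation."""
    return e_trans / (e_trans + e_int)


if __name__ == "__main__":
    for precursor, (e_trans, e_int) in PRODUCT_ENERGIES.items():
        print(f"{precursor:8s} f_t = {translational_fraction(e_trans, e_int):.3f}")
```

The translational energy itself changes little along the series, so the falling fraction reflects the growing internal energy of the larger fragments.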


Figure B2.5.14. The IR laser chemistry of CF$_3$I excited up to the dissociation energy $D_0$ with about 17 quanta
of a CO$_2$ laser, $h\nu_{\mathrm{CO_2}}$. The dissociation is detected by uncertainty-limited cw absorption ($h\nu$), see figures
B2.5.11 and B2.5.12. The energy levels of the C-I stretching vibration are not drawn to scale. In reality their
separation is much smaller. Adapted from [109].
Figure B2.5.15. Iodine atom formation in the IR laser chemistry of CF$_3$I (excitation at 1074.65 cm$^{-1}$, probe
on the $F = 4 \rightarrow F = 3$ hyperfine structure transition, see figure B2.5.12). (a) The absorbance as a function of
time (effective absorption cross section $\sigma_{\mathrm{eff}}$, full curve, left ordinate) shows clear steps at each maximum of
the mode-locked CO$_2$ laser pulse sequence (intensity, broken curve, right ordinate). (b) The fraction $F_P$ of
dissociating molecules as a function of fluence $F$.

In exceptional cases, the IR laser excitation can lead to ionization. An interesting example is the CO$_2$-laser-induced
ionization of C$_{60}$, where $n > 500$ photons are absorbed and vibrations are excited far beyond the
ionization threshold of the molecule

$$\mathrm{C_{60}} \xrightarrow{nh\nu} \mathrm{C_{60}^+} + e^-. \qquad \text{(B2.5.37)}$$




(B2.5.38) 


The C$_{60}^+$ is excited further and decomposes stepwise into C$_{58}^+$, C$_{56}^+$, etc. with the formation of C$_2$ units [77].
B2.5.5.3 MULTIPHOTON IONIZATION


In contrast to the ionization of C$_{60}$ after vibrational excitation, typical multiphoton ionization proceeds via the
excitation of higher electronic levels. In principle, multiphoton ionization can either be used to generate ions 
and to study their reactions, or as a sensitive detection technique for atoms, molecules, and radicals in reaction 
kinetics. The second application is more common. In most cases of excitation with visible or UV laser 
radiation, a few photons are enough to reach or exceed the ionization limit. A particularly important technique 
is resonantly enhanced multiphoton ionization (REMPI), which exploits the resonance of monochromatic 
laser radiation with one or several intermediate levels (in one-photon or in multiphoton processes). The 
mechanisms are distinguished according to the number of photons leading to the resonant intermediate levels 
and to the final level, as illustrated in figure B2.5.16. Several lasers of different frequencies may be combined. 


Figure B2.5.16. Different multiphoton ionization schemes, labelled 3, (3 + 1), (3 + 2), (2 + 1), (1 + 1 + 1) and (1 + 1). Each scheme is classified according to the number
of photons that lead to resonant intermediate levels and to the ionization continuum (hatched area). Adapted
from [110].

As an example, we mention the detection of iodine atoms in their $^2$P$_{3/2}$ ground state with a 3 + 2 multiphoton
ionization process at a laser wavelength of 474.3 nm. Excited iodine atoms ($^2$P$_{1/2}$) can also be detected
selectively as the resonance condition is reached at a different laser wavelength of 477.7 nm.
Figure B2.5.17 shows REMPI iodine atom detection after IR laser photolysis of CF$_3$I. This 'pump-probe'
experiment involves two delayed laser pulses, a 200 ns IR photolysis pulse and a 10 ns probe pulse,
which detects iodine atoms at different times during and after the photolysis pulse. This experiment illustrates
a fundamental problem of product detection by multiphoton ionization: with its high intensity, the short-wavelength
probe laser radiation alone can photolyse the
reactant CF$_3$I molecules. One cannot distinguish between iodine atoms produced by the photolysis pulse and
those produced by the probe pulse. In the present example the problem is solved by the well-founded
assumption that the photolysis of CF$_3$I by a visible probe pulse produces excited iodine atoms ($^2$P$_{1/2}$),
whereas the IR photolysis pulse leads to ground-state iodine atoms ($^2$P$_{3/2}$). In general, however, significant
perturbations of the reaction system are to be expected from the REMPI spectroscopic detection of products.



Figure B2.5.17. (a) Time-dependent intensity $I$ and reduced fluence $F/F_0$ for a single-mode CO$_2$ laser pulse
used in the IR laser photolysis of CF$_3$I. $F_0$ is the total fluence of the laser pulse. (b) VIS-REMPI iodine atom
signals obtained with CO$_2$ laser pulses of different fluence (after [113]).

B2.5.5.4 LASER ISOTOPE SEPARATION AND MODE-SELECTIVE REACTIONS 

Apart from the obvious property of defining pulses within short time intervals, the pulsed laser radiation used 
in reaction kinetics studies can have additional particular properties: (i) high intensity, (ii) high 
monochromaticity, and (iii) coherence. Depending on the type of laser, these properties may be more or less 
pronounced. For instance, the pulsed CO$_2$ lasers used in IR laser chemistry easily reach intensities between
MW cm$^{-2}$ and GW cm$^{-2}$. Special lasers used in nuclear fusion experiments may even reach $10^{21}$ W cm$^{-2}$ [78,
79]. Ideally the monochromaticity, $\Delta\nu$, is related to the pulse length, $\Delta t$, through

$$\Delta\nu\,\Delta t \approx 1, \qquad \text{(B2.5.39)}$$

although this limit is not always reached. The same is true for the coherence of the radiation. Each of these
properties can be exploited for particular chemical applications. The monochromaticity can be used to initiate
a chemical reaction of particular molecules in a mixture. The laser isotope separation of $^{12}$C and $^{13}$C in
natural abundance exploits the isotope shift of molecular vibrational frequencies. At 10-50 cm$^{-1}$, the
corresponding shift of IR absorption wavenumbers is large compared to the spectral width of the CO$_2$ laser
pulse (<0.1 cm$^{-1}$), which makes the $^{13}$C isotope separation relatively easy. Table B2.5.4 summarizes this and
other similar applications [75, 80, 81]. The intermolecular selectivity of IR-multiphoton excitation can be
greatly increased by two-frequency-two-step schemes such as in the new spectroscopic technique of IRLAPS
(InfraRed Laser Assisted Photofragment Spectroscopy [115]).
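The comparison between the laser line width and the isotope shift can be made concrete with a short estimate. The sketch below takes the idealized limit $\Delta\nu\,\Delta t = 1$ as an order-of-magnitude assumption and converts the resulting frequency width into wavenumbers; the function names are hypothetical.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def transform_limited_width_cm(pulse_length_s):
    """Spectral width (cm^-1) of a pulse, assuming delta_nu * delta_t = 1."""
    delta_nu_hz = 1.0 / pulse_length_s
    return delta_nu_hz / C_CM_PER_S


def resolves_isotope_shift(pulse_length_s, shift_cm):
    """True if the idealized pulse width is smaller than the isotope shift."""
    return transform_limited_width_cm(pulse_length_s) < shift_cm


if __name__ == "__main__":
    for label, duration in (("100 ns", 100e-9), ("1 ns", 1e-9), ("1 ps", 1e-12)):
        width = transform_limited_width_cm(duration)
        print(f"{label:>6}: width ~ {width:.2e} cm^-1, "
              f"resolves a 10 cm^-1 shift: {resolves_isotope_shift(duration, 10.0)}")
```

Under this assumption even a 1 ns pulse has a width of only about 0.03 cm$^{-1}$, far below an isotope shift of 10-50 cm$^{-1}$, whereas a 1 ps pulse would be broader than the shift.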


Table B2.5.4 Laser isotope separation (see also [75]).

Isotope                  Source          Comments
$^2$H                    CHF$_2$Cl       High selectivity at room temperature
$^{10}$B                 BCl$_3$         Early laser isotope separation after IR multiphoton excitation; high selectivity at room temperature
$^{13}$C                 CHF$_2$Cl       Two-step separation scheme (220 mg $^{13}$C h$^{-1}$)
$^{14}$N, $^{15}$N       CH$_3$NO$_2$    Selectivity through two absorption bands
$^{16}$O, $^{17}$O       OCS             IR-UV double resonance; also selective for S and C
$^{29}$Si, $^{30}$Si     Si$_2$F$_6$     Reaction of both isotopes with high selectivity (high fluence)
$^{34}$S                 SF$_6$          Early report of laser isotope separation
$^{35}$Cl, $^{37}$Cl     CF Cl           Also selective with respect to C
Mo                       MoF$_6$         Applied to several isotopes; low selectivity and yield
$^{235}$U                UF$_6$          Dissociation with two lasers at different wavelengths (two-colour dissociation)


Figure B2.5.18 compares this intermolecular selectivity with intramolecular or mode selectivity. In an IR
plus UV, two-photon process, it is possible to break either of the two bonds selectively in the same HOD
molecule. Depending on whether the OH or the OD stretching vibration is excited, the products are either H +
OD or HO + D [24]. In large molecules, intramolecular selectivity competes with fast intramolecular (i.e.
unimolecular) vibrational energy redistribution (IVR) processes, which destroy the selectivity. In laser
experiments with D-difluorobutane [82], it was estimated that, in spite of frequency selective excitation of the

CHDF end group, no selective reaction would occur on time scales above 10 s, figure B2.5.18 . In contrast 
to IVR processes, which can be very fast, the intermolecular energy transfer processes, which may reduce
intermolecular selectivity, are generally much slower, since they proceed via bimolecular energy exchange,
which is limited by the collision frequency (see chapter A3.13).
Figure B2.5.18. General scheme for inter- and intramolecular selectivity in laser chemistry. Intermolecular
selectivity: a laser with frequency $\nu_A$ selectively excites molecules A, which subsequently react, in a mixture
of A and B molecules. Intramolecular selectivity: a laser with frequency $\nu_1$ ($\nu_2$) selectively excites the
chromophore Chr$_1$ (Chr$_2$) of a molecule which preferentially follows reaction 1 (2) at this position (after
[75]).

Strategies for achieving intra- and intermolecular selectivity are the subject of a very active field of current 
research with many open questions. Under the label 'coherent control' it includes approaches that exploit the 
coherence properties of laser radiation to control chemical reactions. Figure B2.5.18 summarizes the different 
schemes of intra- and intermolecular selectivity. 


B2.5.6 CHEMICAL ACTIVATION 

The formation of reactive species by photodissociation of a precursor through flash photolysis can be regarded 
as a special case of chemical activation. More generally, this technique exploits the enthalpy of a chemical 
reaction to generate species with a non-equilibrium energy distribution (relative to the ambient temperature). 
Using different reactions to produce the same reactive species allows one to study the energy dependence of 
the ensuing reaction kinetics (or collisional deactivation). Historically, the method has played a central role in 
the experimental study of collisional energy-transfer processes and non-equilibrium effects on chemical 
reaction rates [83, 84, 85, 86 and 87]. 

Although modern laser techniques can in principle achieve much narrower energy distributions, optical
excitation is frequently not a viable method for the preparation of excited reactive species. Therefore chemical
activation, often combined with (laser-) flash photolysis, still plays an important role in gas-phase kinetics,
in particular of unstable species such as radicals [88]. Chemical activation also plays an important role in
energy-transfer studies (see chapter A3.13).


A recent study of the vibrational-to-vibrational (V-V) energy transfer between highly-excited oxygen
molecules and ozone combines laser-flash photolysis and chemical activation with detection by time-resolved
LIF [89]. Partial laser-flash photolysis at 532 nm of pure ozone in the Chappuis band produces translationally-hot
oxygen atoms $\mathrm{O(^3P)}$. In the chemical-activation step they react with ozone to form an electronic
ground-state $\mathrm{O_2}(X\,^3\Sigma_g^-)$ with up to $v'' = 27$ quanta of vibrational excitation in an excess of
thermally populated ozone:

$$\mathrm{O(^3P)} + \mathrm{O_3} \rightarrow \mathrm{O_2} + \mathrm{O_2}(X\,^3\Sigma_g^-, v'') \qquad \text{(B2.5.40)}$$

$$k = 3.9 \times 10^{-11}\ \mathrm{cm^3\,s^{-1}}. \qquad \text{(B2.5.41)}$$


The chemical-activation step is between one and two orders of magnitude faster than the subsequent
collisional deactivation of vibrationally excited $\mathrm{O_2}$. Finally, the population of individual vibrational
levels $v''$ of $\mathrm{O_2}$ is probed through LIF in the Schumann-Runge band ($B\,^3\Sigma_u^- \leftarrow X\,^3\Sigma_g^-$);
after exciting the oxygen molecules to the vibrational ground state of their first electronically excited state
($v' = 0$), the ensuing fluorescence back to the electronic ground state is detected by a photomultiplier tube and
recorded as a function of time. The resulting collisional relaxation rate constants as a function of the
vibrational excitation of $\mathrm{O_2}$

$$\mathrm{O_2}(v'') + \mathrm{O_3} \xrightarrow{k_{v''}(\mathrm{O_3})} \mathrm{O_2}(v' < v'') + \mathrm{O_3} \qquad \text{(B2.5.42)}$$

show a pronounced maximum near $v'' = 23$, as illustrated in figure B2.5.19. At this value, the $v'' \rightarrow v'' - 1$
transition happens to be in almost perfect resonance with the symmetric stretch fundamental of $\mathrm{O_3}$. The
resonance enhancement by one to two orders of magnitude is typical for collisional V-V energy transfer of
highly-excited molecules.




Figure B2.5.19. The collisional deactivation rate constant $k_{v''}(\mathrm{O_3})$ (equation B2.5.42) as a function of the
vibrational level $v''$. Adapted from [89]. Experimental data are represented by full circles with error bars. The
broken curve is to serve as a guide to the eye.


B2.5.7 LINE-SHAPE METHODS 


Energy (or frequency) spectra are fundamentally related to the underlying time-dependent processes through
the Fourier transformation. In practice, however, the relation between spectroscopically-observed line shapes
and kinetic (reaction) processes is neither simple nor unambiguous [18]. There are many contributions to
observed line shapes [33]. Apart from finite instrumental resolution, spectra may be inhomogeneously
broadened through thermal congestion. A simple example is the Doppler broadening as a result of the
Maxwell-Boltzmann velocity distribution leading to a Gaussian line shape.

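Even if the homogeneous line shape can be extracted, many other processes can contribute. Every decay
process contributes to the finite lifetime of an excited species, A*, with an individual decay constant $k_j$:

$$\mathrm{A^*} \xrightarrow{k_j} \mathrm{A}_j \qquad \text{(B2.5.43)}$$

$$-\frac{\mathrm{d}[\mathrm{A^*}]}{\mathrm{d}t} = k_{\mathrm{eff}}\,[\mathrm{A^*}] \qquad \text{(B2.5.44)}$$

$$k_{\mathrm{eff}} = \sum_j k_j. \qquad \text{(B2.5.45)}$$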

The exponential decay of the A* population corresponds to a Lorentzian line shape for the absorption (or
emission) cross section, $\sigma$, as a function of energy $E$. The lineshape is centred around its maximum at $E_0$. The
full-width at half-maximum ($\Gamma$) is proportional to $k_{\mathrm{eff}}$:

$$\sigma(E) = \sigma(E_0)\,\frac{(\Gamma/2)^2}{(E - E_0)^2 + (\Gamma/2)^2} \qquad \text{(B2.5.46)}$$

$$\Gamma = k_{\mathrm{eff}}\,h/(2\pi). \qquad \text{(B2.5.47)}$$

Apart from the natural lifetime due to spontaneous emission, both uni- and bimolecular processes can
contribute to the observed value of $\Gamma$. One important contribution $k_{\mathrm{coll}}$ comes from collisional broadening,
which can be distinguished by its pressure dependence (or dependence upon concentration [M] of the collision
partner):

$$k_{\mathrm{coll}} = \left(\frac{8 k_{\mathrm{B}} T}{\pi \mu}\right)^{1/2} \langle\sigma_{\mathrm{coll}}\rangle\,[\mathrm{M}]. \qquad \text{(B2.5.48)}$$

Equation B2.5.48 introduces the effective average collision cross section $\langle\sigma_{\mathrm{coll}}\rangle$; the prefactor is the
mean relative speed of the collision pair with reduced mass $\mu$ at temperature $T$. Here, the lifetime broadening
results from the (collisional) perturbation of A* by collisions with M.
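
As a rough numerical illustration of the pressure dependence, the following Python sketch evaluates the kinetic-theory expression $k_{\mathrm{coll}} = \langle v\rangle\langle\sigma_{\mathrm{coll}}\rangle[\mathrm{M}]$ with $\langle v\rangle = (8k_{\mathrm{B}}T/\pi\mu)^{1/2}$ and converts the result into a wavenumber linewidth via $\tilde{\Gamma} = k/(2\pi c)$. The function names, the ideal-gas estimate of [M], and the numerical inputs (300 K, reduced mass 16 u, cross section 50 Å$^2$, 1 bar) are assumptions chosen for illustration only; they are not taken from the text.

```python
import math

K_B = 1.380649e-23           # Boltzmann constant, J/K
AMU = 1.66053906660e-27      # atomic mass unit, kg
C_CM_PER_S = 2.99792458e10   # speed of light, cm/s


def mean_relative_speed(temperature_k, reduced_mass_amu):
    """Mean relative speed (m/s) of a collision pair from kinetic gas theory."""
    mu = reduced_mass_amu * AMU
    return math.sqrt(8.0 * K_B * temperature_k / (math.pi * mu))


def collisional_decay_constant(temperature_k, reduced_mass_amu,
                               cross_section_m2, pressure_pa):
    """Pseudo-first-order collisional decay constant k_coll in s^-1."""
    # [M] as a number density (m^-3), assuming an ideal gas.
    number_density = pressure_pa / (K_B * temperature_k)
    speed = mean_relative_speed(temperature_k, reduced_mass_amu)
    return speed * cross_section_m2 * number_density


def linewidth_wavenumber(decay_constant_s):
    """Lorentzian FWHM in cm^-1 that corresponds to a decay constant in s^-1."""
    return decay_constant_s / (2.0 * math.pi * C_CM_PER_S)


if __name__ == "__main__":
    # Hypothetical inputs: 300 K, mu = 16 u, sigma = 50 A^2 = 5e-19 m^2.
    for pressure in (1.0e4, 1.0e5, 1.0e6):
        k = collisional_decay_constant(300.0, 16.0, 5.0e-19, pressure)
        print(f"p = {pressure:8.1e} Pa  k_coll = {k:9.3e} s^-1  "
              f"FWHM = {linewidth_wavenumber(k):.4f} cm^-1")
```

With these assumed inputs the broadening is linear in pressure, which is the signature used to separate the collisional contribution from the other decay channels.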

Lifetimes of 1 ps translate into linewidths of about 5 cm$^{-1}$. Thus, line-shape methods are ideally suited to
measure very fast decay processes, in particular predissociation of excited species. An example is the
predissociation of $\mathrm{O_2}$ molecules excited above 50 000 cm$^{-1}$, which gives rise to the broadening of the
Schumann-Runge bands

$$\mathrm{O_2^*} \rightarrow \mathrm{O} + \mathrm{O}. \qquad \text{(B2.5.49)}$$

This is the source of ozone, through the reaction $\mathrm{O} + \mathrm{O_2} \rightarrow \mathrm{O_3}$. One obtains a pronounced dependence of the
decay rate on the vibrational level of $\mathrm{O_2^*}$ and to a lesser extent on its rotational state [90, 91]. Typical decay

rate constants for this reaction range from 1.5 x 10 s to 7.5 x 10 s . Another important example is the 
predissociation of methyl radicals [54] 

$$\mathrm{CH_3^*} \rightarrow \text{products}. \qquad \text{(B2.5.50)}$$

The results are summarized in table B2.5.5. The rate constants $k_j$ of individual decay channels may be
obtained from the relative yields of all primary reaction products, which can be determined in stationary
experiments.

Table B2.5.5. The photochemical decomposition of methyl radicals (UV excitation at 216 nm). $\tilde{\Gamma}$ is the
wavenumber linewidth of the methyl radical absorption and $k$ is the effective first-order decay constant [54].

Decay process                $\tilde{\Gamma}$ (cm$^{-1}$)   $\Delta\nu = c\tilde{\Gamma}$ (s$^{-1}$)   $k = 2\pi\Delta\nu$ (s$^{-1}$)   $\tau = 1/k$ (fs)

CH$_3^*$ → CH$_2$ + H          60          1.8 × 10$^{12}$       1.13 × 10$^{13}$       88

CD$_3^*$ → CD$_2$ + D          8           2.4 × 10$^{11}$       1.51 × 10$^{12}$       663
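
The conversion chain used in the table (wavenumber linewidth to frequency width, decay constant and lifetime) can be checked with a short Python sketch. The function names are illustrative assumptions; the relations $\Delta\nu = c\tilde{\Gamma}$, $k = 2\pi\Delta\nu$ and $\tau = 1/k$ follow from $\Gamma = k_{\mathrm{eff}}h/(2\pi)$.

```python
import math

C_CM_PER_S = 2.99792458e10  # speed of light, cm/s


def decay_constant_from_linewidth(fwhm_cm):
    """Effective first-order decay constant k (s^-1) from a Lorentzian FWHM in cm^-1."""
    delta_nu = C_CM_PER_S * fwhm_cm        # FWHM in frequency units, s^-1
    return 2.0 * math.pi * delta_nu


def lifetime_fs_from_linewidth(fwhm_cm):
    """Lifetime tau = 1/k in femtoseconds."""
    return 1.0e15 / decay_constant_from_linewidth(fwhm_cm)


def linewidth_from_lifetime(tau_s):
    """Lorentzian FWHM in cm^-1 for an exponential decay with lifetime tau (s)."""
    return 1.0 / (2.0 * math.pi * C_CM_PER_S * tau_s)


if __name__ == "__main__":
    # Linewidths of 60 and 8 cm^-1 are the two table entries.
    for label, width in (("CH3*", 60.0), ("CD3*", 8.0)):
        k = decay_constant_from_linewidth(width)
        tau = lifetime_fs_from_linewidth(width)
        print(f"{label}: k = {k:.3e} s^-1, tau = {tau:.1f} fs")
    print(f"1 ps lifetime -> {linewidth_from_lifetime(1.0e-12):.2f} cm^-1")
```

The same functions reproduce the rule of thumb quoted earlier: a lifetime of 1 ps corresponds to a linewidth slightly above 5 cm$^{-1}$.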


Similar considerations have been exploited for the systematic analysis of room-temperature and molecular-beam
IR spectra in terms of intramolecular vibrational relaxation rates [33, 34, 92, 94] (see also chapter
A3.13).


B2.5.8 INTRAMOLECULAR KINETICS FROM HIGH-RESOLUTION 
SPECTROSCOPY 

Molecular spectroscopy offers a fundamental approach to intramolecular processes [18, 94]. The spectral
analysis in terms of detailed quantum mechanical models in principle provides the complete information
about the wave-packet dynamics on a level of detail not easily accessible by time-resolved techniques.

The approach is ideally suited to the study of IVR on fast timescales, which is the most important primary
process in unimolecular reactions. The application of high-resolution rovibrational overtone spectroscopy to
this problem has been extensively demonstrated. Effective Hamiltonian analyses alone are insufficient, as has
been demonstrated by explicit quantum dynamical models based on ab initio theory [95]. The fast IVR
characteristic of the CH chromophore in various molecular environments is probably the most
comprehensively studied example of the kind [96] (see chapter A3.13). The importance of this question to
chemical kinetics can perhaps best be illustrated with the following examples. The atom recombination
reaction


$$\mathrm{H} + \mathrm{H} + \mathrm{H} \rightarrow \mathrm{H_2} + \mathrm{H} \qquad \text{(B2.5.51)}$$


is well known to occur as a very slow trimolecular process. By contrast, the polyatomic recombination 

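$$\mathrm{H} + {}^{\bullet}\mathrm{CR_1R_2R_3} \rightarrow \mathrm{CHR_1R_2R_3^*} \qquad \text{(B2.5.52)}$$

$$\mathrm{CHR_1R_2R_3^*} + \mathrm{M} \rightarrow \mathrm{CHR_1R_2R_3} + \mathrm{M} \qquad \text{(B2.5.53)}$$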

happens quickly as a sequence of bimolecular recombination and collisional energy-transfer steps, with a
relatively long-lived intermediate, $\mathrm{CHR_1R_2R_3^*}$. The reason is the possibility of transferring energy
intramolecularly from the initially excited C-H bond to other parts of the polyatomic molecule, according to
the scheme

$$\mathrm{H} + {}^{\bullet}\mathrm{CR_1R_2R_3} \rightarrow \mathrm{H{-}^{*}CR_1R_2R_3} \qquad \text{(B2.5.54)}$$

$$\mathrm{H{-}^{*}CR_1R_2R_3} \rightarrow \mathrm{H{-}CR_1^*R_2^*R_3^*} \qquad \text{(B2.5.55)}$$

$$\mathrm{H{-}CR_1^*R_2^*R_3^*} \rightarrow \mathrm{HCR_1R_2{-}R_3^*} \qquad \text{(B2.5.56)}$$

$$\mathrm{HCR_1R_2{-}R_3^*} \rightarrow \mathrm{HCR_1R_2} + \mathrm{R_3}. \qquad \text{(B2.5.57)}$$

This illustrates the steps of energy transfer from the initially highly-excited C-H bond to other parts of the
molecule, subsequent concentration of energy in one part of the molecule ($\mathrm{C{-}R_3^*}$), and finally rupture of the
corresponding bond. A typical example of this kind is the chemical activation reaction (abbreviated)

$$\mathrm{H} + \mathrm{C_2H_5} \rightarrow \mathrm{C_2H_6^*} \qquad \text{(B2.5.58)}$$

$$\mathrm{C_2H_6^*} \rightarrow 2\,\mathrm{CH_3}. \qquad \text{(B2.5.59)}$$

It is the first IVR step of B2.5.59 that is investigated by high-resolution spectroscopy. The analysis, outlined
in some detail in [18], follows the scheme in figure B2.5.20. This kind of analysis has been applied to the
evolution of entropy in the single, isolated molecule CHD$_2$F, as shown in figure B2.5.21. In this case, entropy
is investigated as a relevant time-dependent observable of kinetics (see chapter A3.4). In the example, the
question of time-reversal symmetry on the femtosecond timescale has been studied [18, 114, 116], but many
other applications can be thought of.

Figure B2.5.20. The combined experimental and theoretical approach 'Molecular spectra and motion' (after
[18]).

Figure B2.5.21. Time-dependent entropy $S(t)/S_{\max}$ of CHD$_2$F starting from a pure CH stretching excitation
with six quanta ($v_s = 6$) at $t = 0$ fs. Time evolution with time reversal at $t = 1$ ps (after [114]).


This kind of 'dynamical spectroscopic analysis' is not restricted to fast primary IVR processes. It would apply
just as well to the study of completely unimolecular reactions, viz isomerizations such as H-atom transfer
reactions (for example CH 2 ^HCHO [97], HCN ⇌ HNC ([98] and references cited therein), and HCCH ⇌
H$_2$CC ([99] and references cited therein), although the spectroscopic aspects have not been fully exploited in
these cases), as well as the carefully studied NH 2 ^NH 3 [100, 101]. Recent studies on the tunnelling
dynamics of hydrogen peroxide and aniline have actually carried through the method to a model for one of
chemistry's most fundamental processes: the stereomutation of chiral molecular structures [102, 103, 104, 105
and 106], see also [107]. Figure B2.5.22 illustrates the minimum energy path for the interconversion of the
left- and right-handed forms of hydrogen peroxide, roughly corresponding to the torsion about the O-O bond.
The quantum dynamics are governed by tunnelling through the low barrier in the trans configuration, even at
very high energies, a phenomenon readily understood in terms of an adiabatic picture of the stereomutation
kinetics. The detailed model extracted from experimental spectra with the support of quantum-chemical
calculations allows one to describe the observed mode specificity of the stereomutation in terms of the full
six-dimensional quantum wavepacket dynamics. The time-dependent probability density in the reaction
coordinate (figure B2.5.23) illustrates the acceleration of the stereomutation by IR excitation of the
antisymmetric bend vibration ($\nu_6$). An approximately Gaussian wavepacket initially localized on one side of
the trans barrier (figure B2.5.22) moves periodically between the two potential wells. In the vibrational
ground state ($v = 0$), this corresponds to the stereomutation of a chiral equilibrium structure through tunnelling
to its enantiomer within 1.5 ps. Exciting the antisymmetric bend vibration ($v_6 = 1$) roughly halves the time
required for stereomutation, an effect that could be considered as catalysis by IR (vibrational) excitation.
Figure B2.5.23 also illustrates the high degree of adiabaticity of this process: the initial form of the
wavepacket in the reaction coordinate is found to be approximately conserved, even after about 10 tunnelling
periods, although the spectroscopic result (as analysed by theory) is exact in full six-dimensional dynamics.
While $\nu_6$ can thus be considered to be a promoting mode for stereomutation, other vibrations of H$_2$O$_2$
(except torsion) have been shown to be inhibiting modes, slowing down the stereomutation process. Thus, one
has cases of inhibition of a reaction by vibrational excitation. A certain degree of thermal averaging allows
one to evaluate a relaxation time corresponding to rate constants more characteristic for ordinary racemization
kinetics [103, 107] in contrast to the strictly periodic process shown in figure B2.5.23.



Figure B2.5.22. Potential V along the minimum energy path for the stereomutation of hydrogen peroxide.
Adapted from [103].


Related results of promotion (catalysis) and inhibition of stereomutation by vibrational excitation have also
been obtained for the much larger molecule, aniline-NHD (C$_6$H$_5$NHD), which shows short-time chirality and
stereomutation [104, 105]. This kind of study opens the way to a new look at kinetics, which shows 'coherent'
and mode-selective dynamics, even in the absence of coherent external fields. The possibility of enforcing
coherent dynamics by fields ('coherent control') is discussed in chapter A3.13.

Figure B2.5.23. Mode-specific stereomutation tunnelling in hydrogen peroxide: time-dependent probability
density $|\Psi|^2$ in the reaction coordinate x (see figure B2.5.22). The probability density was integrated over
the remaining coordinates (OH stretches, OH bends, OO stretch). The initial wavepacket at $t = 0$ was strictly
localized on one side of the torsional barrier. $v_6 = 0$ refers to the vibrational ground state and $v_6 = 1$ to an
initial state with one quantum of antisymmetric OOH bend excitation.


B2.5.9 SUMMARIZING OVERVIEW ON GAS-PHASE KINETICS 
STUDIES 

Gas-phase kinetics studies are ideally concerned with the most fundamental events of chemical reactions 
related to 'isolated, single molecules' either as elementary unimolecular reactions, isolated bimolecular 
collisions, or trimolecular reactions. The experimental study of such fast elementary processes has progressed 
to a point where it is possible to 'prove a reaction mechanism' by identifying each elementary reaction 
contributing to the total reactive flux and by demonstrating that any conceivable additional contribution to the 
total reactive flux must be negligible. In fact gas-phase kinetics studies have even gone beyond this 
fundamental goal of reaction kinetics. By using the techniques of femtosecond spectroscopy and quantum- 
chemical kinetics from high-resolution spectroscopy it is possible to look into the very details of the primary 
processes that initiate chemical reactions. These fields are still in active development and most of the fruits 
from these fields still remain to be harvested. 


B3.1 Quantum structural methods for atoms and
molecules 

Jack Simons 


B3.1.1 WHAT DOES QUANTUM CHEMISTRY TRY TO DO? 

Electronic structure theory describes the motions of the electrons and produces energy surfaces and 
wavefunctions. The shapes and geometries of molecules, their electronic, vibrational and rotational energy 
levels, as well as the interactions of these states with electromagnetic fields lie within the realm of quantum 
structure theory. 

B3.1.1.1 THE UNDERLYING THEORETICAL BASIS— THE BORN-OPPENHEIMER MODEL 

In the Born-Oppenheimer [1] model, it is assumed that the electrons move so quickly that they can adjust
their motions essentially instantaneously with respect to any movements of the heavier and slower atomic
nuclei. In typical molecules, the valence electrons orbit about the nuclei about once every $10^{-15}$ s (the
inner-shell electrons move even faster), while the bonds vibrate every $10^{-14}$ s, and the molecule rotates
approximately every $10^{-12}$ s. So, for typical molecules, the fundamental assumption of the
Born-Oppenheimer model is valid, but for loosely held (e.g. Rydberg) electrons and in cases where nuclear motion
is strongly coupled to electronic motions (e.g. when Jahn-Teller effects are present) it is expected to break
down.

This separation-of-time-scales assumption allows the electrons to be described by electronic wavefunctions
that smoothly 'ride' the molecule's atomic framework. These electronic functions are found by solving a
Schrödinger equation whose Hamiltonian $h_e$ contains the kinetic energy $T_e$ of the electrons, the Coulomb
repulsions among all the molecule's electrons $V_{ee}$, the Coulomb attractions $V_{en}$ among the electrons and all of
the molecule's nuclei, treated with these nuclei held clamped, and the Coulomb repulsions $V_{nn}$ among all of
these nuclei, but it does not contain the kinetic energy $T_N$ of all the nuclei. That is, this Hamiltonian keeps the
nuclei held fixed in space. The electronic wavefunctions $\psi_k$ and energies $E_k$ that result

$$h_e \psi_k = E_k \psi_k$$

thus depend on the locations $\{Q_i\}$ at which the nuclei are sitting. That is, the $E_k$ and $\psi_k$ are parametric
functions of the coordinates of the nuclei, and, of course, the wavefunctions $\psi_k$ depend on the coordinates of
all of the electrons.

These electronic energies' dependence on the positions of the atomic centres causes them to be referred to as
electronic energy surfaces such as that depicted below in figure B3.1.1 for a diatomic molecule. For nonlinear
polyatomic molecules having $N$ atoms, the energy surfaces depend on $3N - 6$ internal coordinates and thus
can be very difficult to visualize. In figure B3.1.2, a 'slice' through such a surface is shown as a function of
two of the $3N - 6$ internal coordinates.

Figure B3.1.1. Energy as a function of internuclear distance for a typical bound diatomic molecule or ion.

The Born-Oppenheimer theory is soundly based in that it can be derived from a Schrödinger equation
describing the kinetic energies of all electrons and of all $N$ nuclei plus the Coulomb potential energies of
interaction among all electrons and nuclei. By expanding the wavefunction $\Psi$ that is an eigenfunction of this
full Schrödinger equation in the complete set of functions $\{\psi_k\}$ and then neglecting all terms that involve
derivatives of any $\psi_k$ with respect to the nuclear positions $\{Q_i\}$, one can separate variables such that:

(1) the electronic wavefunctions and energies obey

$$h_e \psi_k = E_k \psi_k$$

(2) the nuclear motion (i.e. vibration/rotation) wavefunctions obey

$$[T_N + E_k(Q)]\,\chi_{k,L} = E_{k,L}\,\chi_{k,L}$$

where $T_N$ is the kinetic energy operator for movement of all nuclei.

Each and every electronic energy state, labelled $k$, has a set, labelled $L$, of vibration/rotation energy levels $E_{k,L}$
and wavefunctions $\chi_{k,L}$.

B3.1.1.2 NON-BORN-OPPENHEIMER CORRECTIONS— RADIATIONLESS TRANSITIONS 

Because the Born-Oppenheimer model is obtained from the full Schrödinger equation by making
approximations, it is not exact. Thus, in certain circumstances it becomes necessary to correct the predictions
of the Born-Oppenheimer theory (i.e. by including the effects of the neglected coupling terms using
perturbation theory). For example, when developing a theoretical model to interpret the rate at which electrons
are ejected from rotationally/vibrationally hot NH$^-$ ions, we had to consider [3] coupling between:

(1) $^2\Pi$ NH$^-$ in its $v = 1$ vibrational level and in a high rotational level (e.g. $J > 30$) prepared by laser
excitation of vibrationally 'cold' NH$^-$ in $v = 0$ having high $J$ (due to natural Boltzmann populations), see
figure B3.1.3, and

(2) $^3\Sigma^-$ NH neutral plus an ejected electron in which the NH is in its $v = 0$ vibrational level (no higher level
is energetically accessible) and in various rotational levels (labelled $N$).

Because NH has an electron affinity of 0.4 eV, the total energies of the above two states can be equal only if
the kinetic energy KE carried away by the ejected electron obeys

$$\mathrm{KE} = E_{\mathrm{vib/rot}}(\mathrm{NH^-}(v = 1, J)) - E_{\mathrm{vib/rot}}(\mathrm{NH}(v = 0, N)) - 0.4\ \mathrm{eV}.$$

In the absence of any coupling terms, no electron detachment would occur. It is only by the anion converting
some of its vibration/rotation energy and angular momentum into electronic energy that the electron that
occupies a bound N 2p orbital in NH$^-$ can gain enough energy to be ejected.

My own research efforts [4] have, for many years, involved taking into account such non-Born-Oppenheimer
couplings, especially in cases where vibration/rotation energy transferred to electronic motions causes
electron detachment, as in the NH$^-$ case detailed above. Professor Yngve Öhrn has been active [5] in
attempting to avoid using the Born-Oppenheimer approximation and, instead, treating the dynamical motions
of the nuclei and electrons simultaneously. Professor David Yarkony has contributed much [6] to the recent
treatment of non-Born-Oppenheimer effects and to the inclusion of spin-orbit coupling in such studies.

B3.1.1.3 WHAT IS LEARNED FROM AN ELECTRONIC STRUCTURE CALCULATION? 

The knowledge gained via structure theory is great. The electronic energies $E_k(Q)$ allow one to determine [7]
the geometries and relative energies of various isomers that a molecule can assume by finding those
geometries $\{Q_i\}$ at which the energy surface $E_k$ has minima, $\partial E_k/\partial Q_i = 0$, with all directions having positive
curvature (this is monitored by considering the so-called Hessian matrix $H_{i,j} = \partial^2 E_k/\partial Q_i\,\partial Q_j$; if none of its
eigenvalues are negative, all directions have positive curvature). Such geometries describe stable isomers, and
the energy at each such isomer geometry gives the relative energy of that isomer. Professor Berny Schlegel
[8] has been one of the leading figures in using gradient and Hessian information to locate stable structures
and transition states. Professor Peter Pulay [9] has done as much as anyone to develop the theory that allows
us to compute gradients and Hessians for most commonly used electronic structure methods.

There may be other geometries on the $E_k$ energy surface at which all 'slopes' vanish, $\partial E_k/\partial Q_i = 0$, but at
which not all directions possess positive curvature. If the Hessian matrix has only one negative eigenvalue,
there is only one direction leading downhill away from the point $\{Q_i\}$ of zero force; all the remaining
directions lead uphill from this point. Such a geometry describes that of a transition state, and its energy plays
a central role in determining the rates of reactions which pass through this transition state. The energy surface
shown in figure B3.1.2 displays such transition states, and it also shows a second-order saddle point (i.e. a
point where the gradient vanishes and the Hessian has two directions of negative curvature).
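
The classification of stationary points by the signs of the Hessian eigenvalues can be sketched numerically. The two-coordinate model surface `energy`, the finite-difference step sizes and the tolerances below are hypothetical choices made for illustration; a real electronic structure code would supply analytic gradients and Hessians of $E_k$ instead.

```python
import math


def energy(x, y):
    """Hypothetical model surface with minima, transition states and a second-order saddle."""
    return (x * x - 1.0) ** 2 + (y * y - 1.0) ** 2


def gradient(f, x, y, h=1e-5):
    """Central-difference gradient (dE/dx, dE/dy)."""
    gx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
    return gx, gy


def hessian(f, x, y, h=1e-4):
    """Central-difference Hessian, returned as (Hxx, Hxy, Hyy)."""
    f0 = f(x, y)
    hxx = (f(x + h, y) - 2.0 * f0 + f(x - h, y)) / (h * h)
    hyy = (f(x, y + h) - 2.0 * f0 + f(x, y - h)) / (h * h)
    hxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)
    return hxx, hxy, hyy


def eigenvalues_sym2(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]], in ascending order."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius


def classify(f, x, y, grad_tol=1e-6, curv_tol=1e-3):
    """Label a point by its gradient norm and the number of negative Hessian eigenvalues."""
    gx, gy = gradient(f, x, y)
    if math.hypot(gx, gy) > grad_tol:
        return "not a stationary point"
    n_negative = sum(1 for lam in eigenvalues_sym2(*hessian(f, x, y)) if lam < -curv_tol)
    return {0: "minimum", 1: "transition state", 2: "second-order saddle point"}[n_negative]


if __name__ == "__main__":
    for point in ((1.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.5, 0.5)):
        print(point, "->", classify(energy, *point))
```

On this model surface the four corners (±1, ±1) are minima, the midpoints between neighbouring minima are transition states, and the origin is a second-order saddle point, mirroring the three cases described above.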

Figure B3.1.2. Two-dimensional slice through a $(3N - 6)$-dimensional energy surface of a polyatomic
molecule or ion. After [2].

[Energy-level diagram; levels shown: NH$^-$ ($v = 1$; $J$), NH ($v = 0$; $N$) + e$^-$, NH$^-$ ($v = 0$; $J''$).]

Figure B3.1.3. Energies of NH$^-$ and of NH pertinent to the autodetachment of $v = 1$, $J$ levels of NH$^-$ formed
by laser excitation of $v = 0$, $J''$ NH$^-$.

At any geometry $\{Q_i\}$, the gradient vector having components $\partial E_k/\partial Q_i$ provides the forces ($F_i = -\partial E_k/\partial Q_i$)
along each of the coordinates $Q_i$. These forces are used in molecular dynamics simulations which solve
the Newton $F = ma$ equations and in molecular mechanics studies which are aimed at locating those
geometries where the F vector vanishes (i.e. the stable isomers and transition states discussed above).
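
How forces obtained from an energy surface drive Newton's equations can be illustrated with a minimal one-coordinate sketch. The harmonic model energy, the velocity Verlet integrator, the reduced units and the step sizes are assumptions made for this example and do not come from the text; in practice the force would be the negative gradient of a computed $E_k(Q)$.

```python
def energy(q, k=1.0):
    """Hypothetical model energy surface E(Q) = k Q^2 / 2 along one coordinate."""
    return 0.5 * k * q * q


def force(q, h=1e-5):
    """Force F = -dE/dQ from a central finite difference of the energy."""
    return -(energy(q + h) - energy(q - h)) / (2.0 * h)


def velocity_verlet(q, v, mass, dt, n_steps):
    """Integrate F = m a with the velocity Verlet scheme; returns a list of (q, v)."""
    f = force(q)
    trajectory = [(q, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        q = q + dt * v_half                # full-step position update
        f = force(q)                       # force at the new geometry
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((q, v))
    return trajectory


def total_energy(q, v, mass):
    """Kinetic plus potential energy."""
    return 0.5 * mass * v * v + energy(q)


if __name__ == "__main__":
    traj = velocity_verlet(q=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
    drift = max(abs(total_energy(q, v, 1.0) - 0.5) for q, v in traj)
    print(f"largest deviation of the total energy from 0.5: {drift:.2e}")
```

A molecular mechanics search for stable isomers uses the same `force` function but moves downhill until the force vanishes instead of propagating in time.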

Also produced in electronic structure simulations are the electronic wavefunctions $\{\psi_k\}$ and energies $\{E_k\}$ of
each of the electronic states. The separation in energies can be used to make predictions on the spectroscopy
of the system. The wavefunctions can be used to evaluate the properties of the system that depend on the
spatial distribution of the electrons. For example, the z component of the dipole moment [10] of a molecule, $\mu_z$,
can be computed by integrating the probability density for finding an electron at position r multiplied by the
z coordinate of the electron and the electron's charge e: $\mu_z = \int e\,\psi_k^* \psi_k\, z\, \mathrm{d}r$. The average kinetic energy of an
electron can also be computed by carrying out such an average-value integral:
$\int \psi_k^* \left(-\frac{\hbar^2}{2m_e}\nabla^2\right) \psi_k\, \mathrm{d}r$. The rules for computing the average value of any physical observable are
developed and illustrated in popular undergraduate text books on physical chemistry [11] and in
graduate-level texts [12].
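
Average-value integrals of this kind can be evaluated numerically once a wavefunction is available. The sketch below uses a one-dimensional normalized Gaussian as a stand-in wavefunction, in units with $\hbar = m_e = e = 1$; the function names, the grid limits and the parameters `alpha` and `z0` are assumptions for illustration. For this model the exact results are $\langle z\rangle = z_0$ and $\langle T\rangle = \alpha/4$.

```python
import math


def psi(z, alpha=1.0, z0=0.3):
    """Normalized 1D Gaussian model wavefunction centred at z0 (hypothetical example)."""
    return (alpha / math.pi) ** 0.25 * math.exp(-0.5 * alpha * (z - z0) ** 2)


def integrate(f, a, b, n=4000):
    """Composite trapezoid rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def norm(wf, a=-10.0, b=10.0):
    """Integral of psi* psi (real wavefunction assumed)."""
    return integrate(lambda z: wf(z) * wf(z), a, b)


def expectation_z(wf, a=-10.0, b=10.0):
    """Average position <z>; multiplied by the charge it gives the dipole component."""
    return integrate(lambda z: wf(z) * z * wf(z), a, b)


def expectation_kinetic(wf, a=-10.0, b=10.0, h=1e-3):
    """Average kinetic energy <psi| -(1/2) d^2/dz^2 |psi> with a finite-difference Laplacian."""
    def integrand(z):
        second_derivative = (wf(z + h) - 2.0 * wf(z) + wf(z - h)) / (h * h)
        return wf(z) * (-0.5 * second_derivative)
    return integrate(integrand, a, b)


if __name__ == "__main__":
    print(f"norm = {norm(psi):.6f}")
    print(f"<z>  = {expectation_z(psi):.6f}")
    print(f"<T>  = {expectation_kinetic(psi):.6f}")
```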

Not only can electronic wavefunctions tell us about the average values of all the physical properties for any
particular state (i.e. $\psi_k$ above), but they also tell us how a specific 'perturbation' (e.g. an electric
field in the Stark effect, a magnetic field in the Zeeman effect and light's electromagnetic fields in
spectroscopy) can alter the specific state of interest. For example, the perturbation arising from the electric
field of a photon interacting with the electrons in a molecule is given within the so-called electric dipole
approximation [12] by:

$$H_{\mathrm{pert}} = \sum_j e\, r_j \cdot E(t)$$

where E is the electric field vector of the light, which depends on time t in an oscillatory manner, and $r_j$ gives
the spatial coordinates of the jth electron. This perturbation, $H_{\mathrm{pert}}$, acting on an electronic state $\psi_k$ can induce
transitions to other states $\psi_{k'}$ with probabilities that are proportional to the square of the integral:

$$\int \psi_{k'}^*\, H_{\mathrm{pert}}\, \psi_k\, \mathrm{d}r.$$

So, if this integral were to vanish, transitions between $\psi_k$ and $\psi_{k'}$ would not occur, and would be referred to as
'forbidden'. Whether such integrals vanish or not often is determined by symmetry. For example, if $\psi_k$ were
of odd symmetry under a plane of symmetry $\sigma$ of the molecule, while $\psi_{k'}$ were even under $\sigma$, then the
integral would vanish unless one or more of the three Cartesian components of the dot product $r_j \cdot E$ were
odd under $\sigma$. The general idea is that for the integral not to vanish, the direct product of the symmetries of $\psi_k$
and of $\psi_{k'}$ must match the symmetry of at least one of the symmetry components present in $H_{\mathrm{pert}}$. Professor
Poul Jorgensen [13] has been involved in developing such so-called response theories for perturbations that
may be time dependent (e.g. as in the interaction of light's electromagnetic radiation).
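
The symmetry argument can be made concrete with a one-dimensional model in which the reflection $x \rightarrow -x$ plays the role of the symmetry plane and the operator $x$ (odd under reflection) stands in for one Cartesian component of the dipole operator. The harmonic-oscillator eigenfunctions in dimensionless units and all function names below are assumptions chosen for illustration.

```python
import math


def psi0(x):
    """Ground state (even under x -> -x)."""
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)


def psi1(x):
    """First excited state (odd under x -> -x)."""
    return math.sqrt(2.0) * x * psi0(x)


def psi2(x):
    """Second excited state (even under x -> -x)."""
    return (2.0 * x * x - 1.0) / math.sqrt(2.0) * psi0(x)


def transition_integral(final, initial, a=-10.0, b=10.0, n=4000):
    """Trapezoid estimate of the integral of final(x) * x * initial(x) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * final(x) * x * initial(x)
    return total * h


def is_forbidden(final, initial, tol=1e-8):
    """A transition is called forbidden here when the squared integral is numerically zero."""
    return transition_integral(final, initial) ** 2 < tol


if __name__ == "__main__":
    print("0 -> 1 forbidden?", is_forbidden(psi1, psi0))  # even * odd * odd: integrand even
    print("0 -> 2 forbidden?", is_forbidden(psi2, psi0))  # even * odd * even: integrand odd
```

The 0 → 2 integral vanishes because its integrand is odd under the reflection, whereas the 0 → 1 integrand is even and integrates to $1/\sqrt{2}$.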

B3.1.1.4 SUMMARY

In summary, computational ab initio quantum chemistry attempts to solve the electronic Schrödinger equation
for the $E_k(R)$ energy surfaces and wavefunctions $\psi_k(r; R)$ on a 'grid' of values for the 'clamped' nuclear
positions. Because the Schrödinger equation produces wavefunctions, it has a great deal of predictive power.
Wavefunctions contain all the information needed to compute dipole moments, polarizabilities, etc., and
transition properties such as the electric dipole transition strengths among states. They also permit the
evaluation of system responses with respect to external perturbations such as geometrical distortions [9],
which provides information on vibrational frequencies and reaction paths.


B3.1.2 WHY IS IT SO DIFFICULT TO CALCULATE ELECTRONIC 
ENERGIES AND WAVEFUNCTIONS WITH REASONABLE 
ACCURACY? 

As a scientific tool, ab initio quantum chemistry is not yet as accurate as modern laser spectroscopic
measurements, for example. Moreover, it is difficult to estimate the accuracies with which various methods
predict bond energies and lengths, excitation energies and the like. In the opinion of the author, chemists who
rely on the results of quantum chemistry calculations must better understand what underlies the concepts and
methods of this field. Only by so doing will they be able to judge for themselves the value of given quantum
chemistry data to their own research. There exist a variety of sources of further information on the 'jargon',
underlying theory, methodologies, and current strengths and weaknesses of ab initio quantum chemistry. In
1996, Head-Gordon [14] produced a nice overview entitled 'Quantum chemistry and molecular processes';
Schaefer et al [15] offered a very good discussion in 1995; Simons [16] offered a somewhat earlier
perspective in 1991. The present chapter includes many of the ideas contained in these and other earlier
descriptions of this field's impacts, but also attempts to extend the perspective to include more recent
developments.

Returning now to the issue of the accuracy of various electronic structure predictions, it is natural to ask why
it is so difficult to achieve reasonable accuracy (i.e. ca. 1 kcal mol$^{-1}$ in computed bond energies or activation
energies) even with the most sophisticated and computer-resource-intensive quantum chemistry calculations.
The reasons include the following.

(A) Many-body problems with $R^{-1}$ potentials are notoriously difficult. It is well known that the Coulomb
potential falls off so slowly with distance that mathematical difficulties can arise. The $4\pi R^2$ dependence
of the integration volume element, combined with the $R^{-1}$ dependence of the potential, produces
ill-defined interaction integrals unless attractive and repulsive interactions are properly combined. The
classical or quantum treatment of ionic melts [17], many-body gravitational dynamics [18] and
Madelung sums [19] for ionic crystals are all plagued by such difficulties.

(B) The electrons require quantal treatment and they are indistinguishable. The electron's small mass 
produces local de Broglie wavelengths that are long compared to atomic 'sizes', thus necessitating 
quantum treatment. Their indistinguishability requires that permutational symmetry be imposed on 
solutions of the Schrodinger equation. 

(C) All mean-field models of electronic structure require large corrections. Essentially all ab initio quantum
chemistry approaches introduce a 'mean field' potential $V_{\mathrm{mf}}$ that embodies the average interactions
among the $N$ electrons. The difference between the mean-field potential and the true Coulombic
potential is termed [20] the 'fluctuation potential'. The solutions $\{\psi_k, E_k\}$ to the true electronic
Schrödinger equation are then approximated in terms of solutions $\{\psi_k^0, E_k^0\}$ to the model Schrödinger
equation in which $V_{\mathrm{mf}}$ is used. Improvements to the solutions of the model problem are made using
perturbation theory or the variational method. Such approaches are expected to work when the
difference between the starting model and the final goal is small in some sense.

The most elementary mean-field models of electronic structure introduce a potential that an electron at $r_1$
would experience if it were interacting with a spatially averaged electrostatic charge density arising from the
$N - 1$ remaining electrons:

$$V_{\mathrm{mf}}(r_1) = \int \rho_{N-1}(r')\,\frac{e^2}{|r_1 - r'|}\,\mathrm{d}r'.$$

Here $\rho_{N-1}(r')$ represents the probability density for finding the $N - 1$ electrons at $r'$, and $e^2/|r_1 - r'|$ is the
mutual Coulomb repulsion between electron density at $r_1$ and $r'$.
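
For a spherically symmetric density, the mean-field integral reduces to two radial integrals, which makes it easy to evaluate numerically. The sketch below assumes atomic units ($e = 1$, lengths in bohr) and a hydrogen-like 1s density $\rho(r) = (\zeta^3/\pi)\,e^{-2\zeta r}$; the exponent $\zeta = 3.7$, the function names and the Simpson integration are illustrative assumptions, not values or methods taken from the text. For this density the integral is known in closed form, $V(r_1) = 1/r_1 - e^{-2\zeta r_1}(\zeta + 1/r_1)$, which serves as a check.

```python
import math


def density_1s(r, zeta):
    """Normalized spherical 1s-type charge density (atomic units, assumed model)."""
    return zeta ** 3 / math.pi * math.exp(-2.0 * zeta * r)


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def mean_field_potential(r1, zeta):
    """Repulsion felt at radius r1 from the spherical density, by radial integration."""
    # Charge inside r1 acts as if concentrated at the nucleus.
    inner = simpson(lambda r: 4.0 * math.pi * r * r * density_1s(r, zeta), 0.0, r1)
    # Charge outside r1 contributes its own 1/r' shell potential.
    outer = simpson(lambda r: 4.0 * math.pi * r * density_1s(r, zeta),
                    r1, r1 + 40.0 / zeta)
    return inner / r1 + outer


def mean_field_potential_exact(r1, zeta):
    """Closed-form result for the same 1s density."""
    return 1.0 / r1 - math.exp(-2.0 * zeta * r1) * (zeta + 1.0 / r1)


if __name__ == "__main__":
    zeta = 3.7  # hypothetical effective exponent for an inner-shell orbital
    for r1 in (0.1, 0.5, 2.0, 10.0):
        print(f"r1 = {r1:5.2f}  numerical = {mean_field_potential(r1, zeta):.6f}  "
              f"exact = {mean_field_potential_exact(r1, zeta):.6f}")
```

Unlike the bare $1/|r_1 - r'|$ repulsion between two point electrons, this averaged potential stays finite everywhere and tends to $1/r_1$ at large distance, which is why the difference between the two (the fluctuation potential) is largest at short inter-electron separations.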

The magnitude and 'shape' of such a mean-field potential is shown below [21] in figure B3.1.4 for the two 1s
electrons of a beryllium atom. The Be nucleus is at the origin, and one electron is held fixed 0.13 Å from the
nucleus, the maximum of the 1s orbital's radial probability density. The Coulomb potential experienced by the
second electron is then a function of the second electron's position along the x-axis (connecting the Be
nucleus and the first electron) and its distance perpendicular to the x-axis. For simplicity, this second electron
is arbitrarily constrained to lie on the x-axis. Along this direction, the Coulomb potential is singular, and
hence the overall interactions are very large.

Figure B3.1.4. Fluctuation and mean-field SCF potentials for a 2s electron in Be.

On the ordinate, two quantities are plotted: (i) the mean-field potential between the second electron and the
other 1s electron computed, via the self-consistent field (SCF) process (described later), as the interaction of
the second electron with a spherical $|1s|^2$ charge density centred on the Be nucleus; and (ii) the fluctuation
potential (F) of this average (mean-field) interaction.

As a function of the inter-electron distance, the fluctuation potential decays to zero more rapidly than does the
mean-field potential. However, the magnitude of F is quite large and remains so over an appreciable range of
inter-electron distances. The corrections to the mean-field picture are therefore quite large when measured in
kcal mol$^{-1}$. For example, the differences ΔE (called pair correlation energies) between the true (state-of-the-art
quantum chemical calculation as discussed later) energies of the interaction among the four electrons in the
Be atom and the mean-field estimates of these interactions are given in table B3.1.1 in electronvolts (1 eV =
23.06 kcal mol$^{-1}$).


Table B3.1.1 Pair correlation energies for the four electrons in Be.


Orbital pair   1sα1sβ   1sα2sα   1sα2sβ   1sβ2sα   1sβ2sβ   2sα2sβ
ΔE (eV)        1.126    0.022    0.058    0.058    0.022    1.234
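
To put the tabulated values on the kcal mol$^{-1}$ scale used in the text, the short sketch below applies the quoted conversion factor to each pair and to their simple sum. The dictionary and function names are illustrative choices; the numbers are those of the table.

```python
EV_TO_KCAL_PER_MOL = 23.06  # conversion factor quoted in the text

# Pair correlation energies for Be in eV, as listed in the table.
pair_correlation_ev = {
    "1sα1sβ": 1.126,
    "1sα2sα": 0.022,
    "1sα2sβ": 0.058,
    "1sβ2sα": 0.058,
    "1sβ2sβ": 0.022,
    "2sα2sβ": 1.234,
}

def to_kcal_per_mol(energy_ev):
    """Convert an energy from eV to kcal/mol."""
    return energy_ev * EV_TO_KCAL_PER_MOL

total_ev = sum(pair_correlation_ev.values())
total_kcal = to_kcal_per_mol(total_ev)

for pair, energy in pair_correlation_ev.items():
    print(f"{pair}: {energy:.3f} eV = {to_kcal_per_mol(energy):.2f} kcal/mol")
print(f"sum: {total_ev:.3f} eV = {total_kcal:.2f} kcal/mol")
```

The two same-orbital pairs alone each amount to more than 25 kcal mol$^{-1}$, and the six values together sum to roughly 58 kcal mol$^{-1}$, far above a ±1 kcal mol$^{-1}$ accuracy target.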


Another example of the difficulty is offered in figure B3.1.5. Here we display on the ordinate, for helium's $^1S$
($1s^2$) state, the probability of finding an electron whose distance from the He nucleus is 0.13 Å (the peak of
the 1s orbital's density) and whose angular coordinate relative to that of the other electron is plotted on the
abscissa. The He nucleus is at the origin and the second electron also has a radial coordinate of 0.13 Å. As the
relative angular coordinate varies away from 0°, the electrons move apart; near 0°, the electrons approach one
another. Since the two electrons have opposite spins in this state, their mutual Coulomb repulsion alone acts to
keep them apart.



Figure B3.1.5. Probability (as a function of angle) for finding the second electron in He when both electrons
are located at the maximum in the 1s orbital's probability density. The bottom line is that obtained using a
Hylleraas-type function, and the other is related to a highly correlated multiconfigurational wavefunction. After
[22].

What figure B3.1.5 shows is that, for a highly accurate wavefunction (one constructed using so-called 
Hylleraas functions [23] that depend explicitly on the coordinates of the two electrons as well as on their 
interparticle distance coordinate), one finds a 'cusp' in the probability density for finding one electron in the 
neighbourhood of another electron of opposite spin. The probability plot for the Hylleraas function is the
lower bold curve in figure B3.1.5. The line above the Hylleraas plot was extracted from a configuration 
interaction wavefunction for He obtained using a rather large atomic orbital (AO) basis set [22]. Even for such 
a sophisticated wavefunction (of the type used in many state-of-the-art ab initio calculations), the cusp in the 
relative probability distribution is, clearly, not well represented. Finally, the Hartree-Fock (HF) probability, 
which is not even displayed above, would, if plotted, be flat as a function of the angle shown above and thus 
clearly very much in error. 

B3.1.2.1 SUMMARY 

The above evidence shows why an ab initio solution of the Schrödinger equation is a very demanding task if
high accuracy is desired. The HF potential takes care of 'most' of the interactions among the N electrons
(which interact via long-range Coulomb forces and whose dynamics requires the application of quantum
physics and permutational symmetry). However, the residual fluctuation potential is large enough to cause
significant corrections to the HF picture. The reality is that electrons in atoms and molecules undergo
dynamical motions in which their Coulomb repulsions cause them to 'avoid' one another at every instant of
time, not only in the average-repulsion manner that the mean-field models embody. The inclusion of
instantaneous spatial correlations (usually called dynamical correlations) among electrons is necessary to
achieve a more accurate description of the atomic and molecular electronic structure.


B3.1.3 WHAT ARE THE ESSENTIAL CONCEPTS OF AB INITIO 
QUANTUM CHEMISTRY? 

The mean-field potential and the need to improve it to achieve reasonably accurate solutions to the true 
electronic Schrodinger equation introduce three constructs that characterize essentially all ab initio quantum 
chemical methods: orbitals, configurations and electron correlation. 


B3.1.3.1 ORBITALS AND CONFIGURATIONS— WHAT ARE THEY (REALLY)? 


(A) HOW THE MEAN-FIELD MODEL LEADS TO ORBITALS AND CONFIGURATIONS 

The mean-field potentials that have proven most useful are all one-electron additive: $V_{mf}(\mathbf{r}) = \sum_j V_{mf}(\mathbf{r}_j)$.
Since the electronic kinetic energy operator $T = \sum_j T_j$ is also one-electron additive, so is the mean-field
Hamiltonian $H^0 = T + V_{mf}$. The additivity of $H^0$ implies that the mean-field energies $\{E_k^0\}$ are additive and the
wavefunctions $\{\Psi_k^0\}$ can be formed in terms of products of functions $\{\phi_k\}$ of the coordinates of the individual
electrons.

Thus, it is the ansatz that $V_{mf}$ is separable that leads to the concept of orbitals, which are the one-electron
functions $\{\phi_j\}$ found by solving the one-electron Schrödinger equations:
$(T_1 + V_{mf}(\mathbf{r}_1))\phi_j(\mathbf{r}_1) = \varepsilon_j\phi_j(\mathbf{r}_1)$; the
eigenvalues $\{\varepsilon_j\}$ are called orbital energies.

Given the complete set of solutions to this one-electron equation, a complete set of N-electron mean-field
wavefunctions can be written. Each $\Psi_k^0$ is constructed by forming a product of N orbitals chosen from the set
of $\{\phi_j\}$, allowing each orbital in the list to be a function of the coordinates of one of the N electrons (e.g.
$\Psi_k^0 = |\phi_{k1}(\mathbf{r}_1)\phi_{k2}(\mathbf{r}_2)\phi_{k3}(\mathbf{r}_3)\cdots\phi_{kN-1}(\mathbf{r}_{N-1})\phi_{kN}(\mathbf{r}_N)|$, as above). The corresponding mean-field energy
is evaluated as the sum over those orbitals that appear in $\Psi_k^0$: $E_k^0 = \sum_{j=1}^{N}\varepsilon_{kj}$.

Because of the indistinguishability of the N electrons, the antisymmetric component of any such orbital
product must be formed to obtain the proper mean-field wavefunction. To do so, one applies the so-called
antisymmetrizer operator [24] $A = \sum_P (-1)^p P$, where the permutation operator P runs over all N! permutations
of the N electrons. Application of A to a product function does not alter the occupancy of the functions $\{\phi_{kj}\}$ in
$\Psi_k^0$; it simply scrambles the order in which the electrons occupy the $\{\phi_{kj}\}$ and it causes the resultant function
(which is often denoted $|\phi_{k1}(\mathbf{r}_1)\phi_{k2}(\mathbf{r}_2)\phi_{k3}(\mathbf{r}_3)\cdots\phi_{kN-1}(\mathbf{r}_{N-1})\phi_{kN}(\mathbf{r}_N)|$ and called a Slater determinant) to obey
the Pauli exclusion principle.
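
The action of the antisymmetrizer can be sketched numerically by summing signed permutations of an orbital product. In the example below the one-dimensional `toy_orbitals`, the function names and the conventional $1/\sqrt{N!}$ normalization factor are assumptions introduced purely for illustration.

```python
import itertools
import math
import numpy as np

def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def antisymmetrized_product(orbitals, coords):
    """Apply sum_P (-1)^p P to orbitals[0](x_0) * orbitals[1](x_1) * ..."""
    n = len(orbitals)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        term = 1.0
        for k, p in enumerate(perm):
            term *= orbitals[k](coords[p])   # orbital k holds electron p
        total += permutation_sign(perm) * term
    return total / math.sqrt(math.factorial(n))  # assumed normalization

# Hypothetical one-dimensional "orbitals" used only to exercise the function.
toy_orbitals = [
    lambda x: np.exp(-x * x),
    lambda x: x * np.exp(-x * x),
    lambda x: (2.0 * x * x - 1.0) * np.exp(-x * x),
]
```

Exchanging two electron coordinates changes the sign of the result, and placing the same orbital in the list twice makes it vanish, which is the Pauli exclusion principle in this language. The explicit sum agrees with the determinant of the matrix of orbital values, which is the reason for the name.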

Because the electrons also possess intrinsic spin, the one-electron functions $\{\phi_j\}$ used in this construction are
taken to be eigenfunctions of $(T_1 + V_{mf}(\mathbf{r}_1))$ multiplied by either an α or β spin function. This set of functions
is called the set of mean-field spin orbitals.

By choosing to place N electrons into N specific spin orbitals, one specifies a configuration. By making other
choices of which N $\phi_j$ to occupy, one describes other configurations. Just as the one-electron mean-field
Schrödinger equation has a complete set of spin-orbital solutions $\{\phi_j$ and $\varepsilon_j\}$, the N-electron mean-field
Schrödinger equation has a complete set of antisymmetric N-electron Slater determinants. When these
determinants are combined to generate functions that are eigenfunctions of the total $S^2$ and $S_z$ and
eigenfunctions of the molecule's point group symmetry (or $L^2$ and $L_z$ for atoms), one has what are called
configuration state functions (CSFs) whose mean-field energies are also given by $E_k^0 = \sum_j \varepsilon_{kj}$.

(B) THE SELF-CONSISTENT MEAN-FIELD (SCF) POTENTIAL 

The one-electron additivity of the mean-field Hamiltonian $H^0$ gives rise to the concept of spin orbitals for any
additive $V_{mf}(\mathbf{r})$. In fact, there is no single mean-field potential; different scientists have put forth different
suggestions for $V_{mf}$ over the years. Each gives rise to spin orbitals and configurations that are specific to the
particular $V_{mf}$. However, if the difference between any particular mean-field model and the full electronic
Hamiltonian is fully treated, corrections to all mean-field results should converge to the same set of exact
states. In practice, one is never able to treat all corrections to any mean-field model. Thus, it is important to
seek particular mean-field potentials for which the corrections are as small and straightforward to treat as
possible.

In the most commonly employed mean-field models [25] of electronic structure theory, the configuration
specified for study plays a central role in defining the mean-field potential. For example, the mean-field
Coulomb potential felt by a $2p_x$ orbital's electron at a point $\mathbf{r}$ in the $1s^2 2s^2 2p_x 2p_y$ configuration description of
the carbon atom is:
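
A form consistent with this description, written under the assumption that only the direct Coulomb repulsion from the five other electrons of the $1s^2 2s^2 2p_x 2p_y$ configuration contributes (exchange terms are not considered in this sketch), would be:

```latex
\[
V_{mf}(\mathbf{r}) =
  2\int |1s(\mathbf{r}')|^2 \frac{e^2}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'
+ 2\int |2s(\mathbf{r}')|^2 \frac{e^2}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'
+  \int |2p_y(\mathbf{r}')|^2 \frac{e^2}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'
\]
```

Each term is the repulsion from one occupied orbital, weighted by the number of electrons it holds other than the $2p_x$ electron itself.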


The above mean-field potential is used to find the 2p orbital of the carbon atom, which is then used to define 
the mean-field potential experienced by, for example, an electron in the 2s orbital: 
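
Under the same assumption (direct Coulomb terms only, no exchange), the potential for a 2s electron would count the two 1s electrons, the one remaining 2s electron and the two 2p electrons:

```latex
\[
V_{mf}(\mathbf{r}) =
  2\int |1s(\mathbf{r}')|^2 \frac{e^2}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'
+  \int |2s(\mathbf{r}')|^2 \frac{e^2}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'
+  \int |2p_x(\mathbf{r}')|^2 \frac{e^2}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'
+  \int |2p_y(\mathbf{r}')|^2 \frac{e^2}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'
\]
```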


Notice that the orbitals occupied in the configuration under study appear in the mean-field potential. However,
it is $V_{mf}$ that, through the one-electron Schrödinger equation, determines the orbitals. For these reasons, the
solution of these equations must be carried out in a so-called SCF manner. One begins with an approximate
description of the orbitals in $\Psi_k^0$. These orbitals then define $V_{mf}$, and the equations
$(T_1 + V_{mf}(\mathbf{r}_1))\phi_j(\mathbf{r}_1) = \varepsilon_j\phi_j(\mathbf{r}_1)$ are solved
for 'new' spin orbitals. These orbitals are then used to define an improved $V_{mf}$, which gives another set of
solutions to $(T_1 + V_{mf}(\mathbf{r}_1))\phi_j(\mathbf{r}_1) = \varepsilon_j\phi_j(\mathbf{r}_1)$. This iterative process is continued until the orbitals used to define
$V_{mf}$ are identical to those that result as solutions of $(T_1 + V_{mf}(\mathbf{r}_1))\phi_j(\mathbf{r}_1) = \varepsilon_j\phi_j(\mathbf{r}_1)$. When this condition is
reached, one has achieved 'self-consistency'.

B3.1.3.2 WHAT IS ELECTRON CORRELATION? 

By expressing the mean-field interaction of an electron at $\mathbf{r}$ with the N − 1 other electrons in terms of a
probability density $\rho_{N-1}(\mathbf{r}')$ that is independent of the fact that another electron resides at $\mathbf{r}$, the mean-field
models ignore spatial correlations among the electrons. In reality, as shown in figure B3.1.5, the conditional
probability density for finding one of N − 1 electrons at $\mathbf{r}'$, given that one electron is at $\mathbf{r}$, depends on $\mathbf{r}$. The
absence of a spatial correlation is a direct consequence of the spin-orbital product nature of the mean-field
wavefunctions $\{\Psi_k^0\}$.

To improve upon the mean-field picture of electronic structure, one must move beyond the single-configuration
approximation. It is essential to do so to achieve higher accuracy, but it is also important to do
so to achieve a conceptually correct view of the chemical electronic structure. Although the picture of
configurations in which N electrons occupy N spin orbitals may be familiar and useful for systematizing the
electronic states of atoms and molecules, these constructs are approximations to the true states of the system.
They were introduced when the mean-field approximation was made, and neither orbitals nor configurations
can be claimed to describe the proper eigenstates $\{\Psi_k, E_k\}$. It is thus inconsistent to insist that the carbon atom
be thought of as $1s^2 2s^2 2p^2$ while insisting on a description of this atom accurate to ±1 kcal mol$^{-1}$.

B3.1.3.3 SUMMARY

The SCF mean-field potential takes care of 'most' of the interactions among the N electrons. However, for all
mean-field potentials proposed to date, the residual or fluctuation potential is large enough to require
significant corrections to the mean-field picture. This, in turn, necessitates the use of more sophisticated and
computationally taxing techniques (e.g., high-order perturbation theory or large variational expansion spaces)
to reach the desired chemical accuracy.

For electronic structures of atoms and molecules, the SCF model requires quite substantial corrections to 
bring its predictions in line with experimental fact. Electrons in atoms and molecules undergo dynamical 
motions in which their Coulomb repulsions cause them to 'avoid' one another at every instant of time, not 
only in the average-repulsion manner of mean-field models. The inclusion of dynamical correlations among 
electrons is necessary to achieve a more accurate description of atomic and molecular electronic structure. No 
single spin-orbital product wavefunction is capable of treating electron correlation to any extent; its product 
nature renders it incapable of doing so. 

B3.1.4 HOW TO INTRODUCE ELECTRON CORRELATION VIA
CONFIGURATION MIXING 

B3.1.4.1 THE MULTI-CONFIGURATION WAVEFUNCTION 

In most of the commonly used ab initio quantum chemical methods [26], one forms a set of configurations by
placing N electrons into spin orbitals in a manner that produces the spatial, spin and angular momentum
symmetry of the electronic state of interest. The correct wavefunction Ψ is then written as a linear
combination of the mean-field configuration functions $\{\Phi_k\}$: $\Psi = \sum_k C_k\Phi_k$. For example, to describe the
ground $^1S$ state of the Be atom, the $1s^2 2s^2$ configuration is augmented by including other configurations such
as $1s^2 3s^2$, $1s^2 2p^2$, $1s^2 3p^2$, $1s^2 2s3s$, $3s^2 2s^2$, $2p^2 2s^2$, etc, all of which have overall $^1S$ spin and angular
momentum symmetry. The various methods of electronic structure theory differ primarily in how they
determine the $\{C_k\}$ expansion coefficients and how they extract the energy E corresponding to this Ψ.

B3.1.4.2 THE PHYSICAL MEANING OF MIXING IN 'EXCITED' CONFIGURATIONS 

When considering the ground $^1S$ state of the Be atom, the following four antisymmetrized spin-orbital
products are found to have the largest $C_k$ amplitudes:

\[ \Psi \cong C_1|1s^2 2s^2| - C_2\left[|1s^2 2p_x^2| + |1s^2 2p_y^2| + |1s^2 2p_z^2|\right]. \]

The fact that the latter three terms possess the same amplitude $C_2$ is a result of the requirement that a state of
$^1S$ symmetry is desired. It can be shown [27] that this function is equivalent to

\[
\begin{aligned}
\Psi = \tfrac{1}{6}C_1\,\big|1s\alpha\,1s\beta\,\{&[(2s - a2p_x)\alpha(2s + a2p_x)\beta - (2s - a2p_x)\beta(2s + a2p_x)\alpha] \\
+ &[(2s - a2p_y)\alpha(2s + a2p_y)\beta - (2s - a2p_y)\beta(2s + a2p_y)\alpha] \\
+ &[(2s - a2p_z)\alpha(2s + a2p_z)\beta - (2s - a2p_z)\beta(2s + a2p_z)\alpha]\}\big|
\end{aligned}
\]

where $a = (3C_2/C_1)^{1/2}$.

Here two electrons occupy the 1s orbital (with opposite, α and β, spins) while the other electron pair resides in
2s-2p polarized orbitals in a manner that instantaneously correlates their motions. These polarized orbital
pairs $(2s \pm a2p_{x,y\ \mathrm{or}\ z})$ are formed by combining the 2s orbital with the $2p_{x,y\ \mathrm{or}\ z}$ orbital in a ratio determined
by $C_2/C_1$. This way of viewing an electron pair correlation forms the basis of the generalized valence bond
(GVB) method that Professor Bill Goddard [28] pioneered.

This ratio $C_2/C_1$ can be shown to be proportional to the magnitude of the coupling $\langle 1s^2 2s^2|H|1s^2 2p^2\rangle$ between
the two configurations involved and inversely proportional to the energy difference
$(\langle 1s^2 2s^2|H|1s^2 2s^2\rangle - \langle 1s^2 2p^2|H|1s^2 2p^2\rangle)$ between these configurations. In general, configurations that have
similar Hamiltonian expectation values and that are coupled strongly give rise to strongly mixed (i.e. with
large $|C_2/C_1|$ ratios) polarized orbital pairs.
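
The dependence of the mixing ratio on coupling and energy gap can be seen in a two-configuration model. The sketch below diagonalizes a 2 × 2 Hamiltonian matrix with NumPy; the function name and the numerical values used afterwards are hypothetical and chosen only to show the trend.

```python
import numpy as np

def two_configuration_mixing(e1, e2, coupling):
    """Lowest eigenvalue and amplitude ratio c2/c1 of [[e1, V], [V, e2]]."""
    hamiltonian = np.array([[e1, coupling], [coupling, e2]], dtype=float)
    energies, vectors = np.linalg.eigh(hamiltonian)  # eigenvalues ascending
    c1, c2 = vectors[:, 0]                           # ground-state amplitudes
    return energies[0], c2 / c1

# Weak coupling relative to the gap: ratio is close to -V / (e2 - e1).
ground_energy, ratio_wide_gap = two_configuration_mixing(0.0, 1.0, 0.05)

# Same coupling, smaller gap: the second configuration mixes in more strongly.
_, ratio_narrow_gap = two_configuration_mixing(0.0, 0.2, 0.05)
```

For a small coupling the ratio approaches the first-order perturbation estimate, coupling divided by energy difference, with a sign that lowers the energy; shrinking the gap at fixed coupling increases the mixing.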

A set of polarized orbital pairs is described pictorially in figure B3.1.6. In each of the three equivalent terms
in the above wavefunction, one of the valence electrons moves in a 2s + a2p orbital polarized in one direction
while the other valence electron moves in the 2s − a2p orbital polarized in the opposite direction. For
example, the first term $(2s - a2p_x)\alpha(2s + a2p_x)\beta - (2s - a2p_x)\beta(2s + a2p_x)\alpha$ describes one electron occupying a
$2s - a2p_x$ polarized orbital while the other electron occupies the $2s + a2p_x$ orbital. The electrons thus reduce their
Coulomb repulsion by occupying different regions of space; in the SCF picture $1s^2 2s^2$, both electrons reside in
the same 2s region of space. In this particular example, the electrons undergo angular correlation to 'avoid'
one another.


[Figure: orbital sketches labelled 2s − a2p, 2s + a2p, and 2s and 2p.]

Figure B3.1.6. Polarized orbital pairs involving 2s and 2p orbitals.

Let us consider another example. In describing the $\pi^2$ electron pair of an olefin, it is important to mix in
'doubly excited' configurations of the form $(\pi^*)^2$. The physical importance of such configurations can again
be made clear by using the identity

\[
C_1|\ldots\pi\alpha\,\pi\beta\ldots| - C_2|\ldots\pi^*\alpha\,\pi^*\beta\ldots|
= \frac{C_1}{2}\left\{|\ldots(\pi - x\pi^*)\alpha(\pi + x\pi^*)\beta\ldots| - |\ldots(\pi - x\pi^*)\beta(\pi + x\pi^*)\alpha\ldots|\right\}
\]

where $x = (C_2/C_1)^{1/2}$.

In this example, the two non-orthogonal 'polarized orbital pairs' involve mixing the π and π* orbitals to
produce two left-right polarized orbitals as depicted in figure B3.1.7. Here one says that the $\pi^2$ electron pair
undergoes left-right correlation when the $(\pi^*)^2$ configuration is introduced.
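
The algebra behind this kind of identity can be checked numerically for the spatial part of the pair function. In the sketch below, orbitals are represented by arbitrary vectors and two-electron spatial functions by outer products; this representation, the function names and the assumption that the spin part factors out as the usual singlet function are all choices made for illustration.

```python
import numpy as np

def singlet_pair(left, right):
    """Symmetric spatial product of two orbitals (the singlet-coupled pair)."""
    return 0.5 * (np.outer(left, right) + np.outer(right, left))

def check_polarized_pair_identity(phi, phi_prime, c1, c2):
    """Compare C1*phi(1)phi(2) - C2*phi'(1)phi'(2) with the polarized-pair form."""
    x = np.sqrt(c2 / c1)
    two_configuration = c1 * np.outer(phi, phi) - c2 * np.outer(phi_prime, phi_prime)
    polarized = c1 * singlet_pair(phi - x * phi_prime, phi + x * phi_prime)
    return np.allclose(two_configuration, polarized)
```

The cross terms between the two orbitals cancel in the symmetric product, which is why a single pair of polarized orbitals reproduces the two-configuration function exactly when $x^2 = C_2/C_1$.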

Figure B3.1.7. Left- and right-polarized orbital pairs involving π and π* orbitals.

B3.1.4.3 ARE POLARIZED ORBITAL PAIRS HYBRID ORBITALS?

It should be stressed that these polarized orbital pairs are not the same as hybrid orbitals. The latter are used to
describe directed bonding, but polarized orbital pairs are each a 'mixture' of two mean-field orbitals with
amplitude $x = (C_2/C_1)^{1/2}$ and with a single electron in each, thereby allowing the electrons to be spatially
correlated and to 'avoid' one another. In addition, polarized orbital pairs are not generally orthogonal to one
another; hybrid orbital sets are.

B3.1.4.4 RELATIONSHIP TO THE GENERALIZED VALENCE BOND PICTURE 

In these examples, the analysis allows one to interpret the combination of pairs of configurations that differ 
from one another by a 'double excitation' from one orbital ($\phi$) to another ($\phi'$) as equivalent to a singlet
coupling of two polarized orbitals ($\phi - a\phi'$) and ($\phi + a\phi'$). As mentioned earlier, this picture is closely related
to the GVB model that Goddard [28] and Goddard and Harding [29] developed. In the simplest embodiment 
of the GVB model, each electron pair in the atom or molecule is correlated by mixing in a configuration in 
which that pair is 'doubly excited' to a correlating orbital. The direct product of all such pair correlations 
generates the simplest GVB-type wavefunction. 

In most ab initio quantum chemical methods, the correlation calculation is actually carried out by forming a 
linear combination of the mean-field configuration state functions and determining the {C k } amplitudes by 
some procedure. The identities discussed in some detail above are then introduced merely to permit one to 
interpret the presence of configurations that are 'doubly excited' relative to the dominant mean-field 
configuration in terms of polarized orbital pairs. 

B3.1.4.5 SUMMARY


The dynamical interactions among electrons give rise to instantaneous spatial correlations that must be 
handled to arrive at an accurate picture of the atomic and molecular structure. The single-configuration picture 
provided by the mean-field model is a useful starting point, but it is incapable of describing electron 
correlations. Therefore, improvements are needed. The use of doubly-excited configurations is a mechanism 
by which Ψ can place electron pairs, which in the mean-field picture occupy the same orbital, into different
regions of space thereby lowering their mutual Coulombic repulsions. Such electron correlation effects are 
referred to as dynamical electron correlation; they are extremely important to include if one expects to 
achieve chemically meaningful accuracy. 

B3.1.5 THE SINGLE-CONFIGURATION PICTURE AND THE HF
APPROXIMATION 

Given a set of N-electron space- and spin-symmetry-adapted configuration state functions $\{\Phi_k\}$ in terms of
which Ψ is to be expanded as $\Psi = \sum_k C_k\Phi_k$, two primary questions arise: (1) how to determine the $\{C_k\}$
coefficients and the energy E and (2) how to find the 'best' spin orbitals $\{\phi_j\}$? Let us first consider the case
where a single configuration is used so only the question of determining the spin orbitals exists.

B3.1.5.1 THE SINGLE-DETERMINANT WAVEFUNCTION 

(A) THE CANONICAL SCF EQUATIONS 

The simplest trial function employed in ab initio quantum chemistry is the single Slater determinant function
in which N spin orbitals are occupied by N electrons:

\[ \Psi = |\phi_1\phi_2\phi_3\cdots\phi_N| \]

For such a function, variational optimization of the spin orbitals to make the expectation value $\langle\Psi|H|\Psi\rangle$
stationary produces [30] the canonical HF equations

\[ F\phi_i = \varepsilon_i\phi_i \]

where the so-called Fock operator F is given by

\[ F = h + \sum_{j(\mathrm{occupied})}(J_j - K_j) \]

The Coulomb ($J_j$) and exchange ($K_j$) operators are defined by the relations

\[ J_j\phi_i = \int \phi_j^*(\mathbf{r}')\phi_j(\mathbf{r}')\,\frac{1}{|\mathbf{r} - \mathbf{r}'|}\,d\tau'\;\phi_i(\mathbf{r}) \]

and

\[ K_j\phi_i = \int \phi_j^*(\mathbf{r}')\phi_i(\mathbf{r}')\,\frac{1}{|\mathbf{r} - \mathbf{r}'|}\,d\tau'\;\phi_j(\mathbf{r}); \]

the symbol h denotes the sum of the electronic kinetic energy and electron-nuclear Coulomb attraction
operators. The $d\tau'$ implies integration over the spin variables associated with the $\phi_j$ (and, for the exchange
operator, $\phi_i$), as a result of which the exchange integral vanishes unless the spin function of $\phi_j$ is the same as
that of $\phi_i$; the Coulomb integral is non-vanishing no matter what the spin functions of $\phi_j$ and $\phi_i$.

(B) THE EQUATIONS HAVE ORBITAL SOLUTIONS FOR OCCUPIED AND UNOCCUPIED ORBITALS 

The HF [31] equations $F\phi_i = \varepsilon_i\phi_i$ possess solutions for the spin orbitals in Ψ (the occupied spin orbitals) as
well as for orbitals not occupied in Ψ (the virtual spin orbitals) because the F operator is Hermitian. Only the
$\phi_j$ occupied in Ψ appear in the Coulomb and exchange potentials of the Fock operator.

(C) THE SPIN-IMPURITY PROBLEM 

As formulated above, the HF equations yield orbitals that do not guarantee that Ψ has proper spin symmetry.
To illustrate, consider an open-shell system such as the lithium atom. If 1sα, 1sβ, and 2sα spin orbitals are
chosen to appear in Ψ, the Fock operator will be

\[ F = h + J_{1s\alpha} + J_{1s\beta} + J_{2s\alpha} - [K_{1s\alpha} + K_{1s\beta} + K_{2s\alpha}] \]

Acting on an α spin orbital $\phi_{k\alpha}$ with F and carrying out the spin integrations, one obtains

\[ F\phi_{k\alpha} = h\phi_{k\alpha} + (2J_{1s} + J_{2s})\phi_{k\alpha} - (K_{1s} + K_{2s})\phi_{k\alpha} \]

In contrast, when acting on a β spin orbital, one obtains

\[ F\phi_{k\beta} = h\phi_{k\beta} + (2J_{1s} + J_{2s})\phi_{k\beta} - K_{1s}\phi_{k\beta} \]

Spin orbitals of α and β type do not experience the same exchange potential in this model because Ψ contains
two α spin orbitals and only one β spin orbital. A consequence is that the optimal 1sα and 1sβ spin orbitals,
which are themselves solutions of $F\phi_i = \varepsilon_i\phi_i$, do not have identical orbital energies (i.e. $\varepsilon_{1s\alpha} \neq \varepsilon_{1s\beta}$) and are
not spatially identical. This resultant spin polarization of the orbitals gives rise to spin impurities in Ψ. The
determinant |1sα 1s'β 2sα| is not a pure doublet spin eigenfunction, although it is an $S_z$ eigenfunction with
$M_s$ = 1/2; it contains both S = 1/2 and S = 3/2 components. If the 1sα and 1s'β spin orbitals were spatially
identical, then |1sα 1s'β 2sα| would be a pure spin eigenfunction with S = 1/2.
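
The size of the spin impurity in a single determinant can be estimated from the overlaps between the occupied α and β spatial orbitals. The sketch below uses the standard textbook expression for the expectation value of $S^2$ for such a determinant, which is brought in here as an outside formula rather than taken from this chapter; the function name, the three-function orthonormal basis and the mixing angle in the usage lines are hypothetical.

```python
import numpy as np

def uhf_s_squared(c_alpha, c_beta, overlap):
    """<S^2> for a determinant with occupied alpha/beta orbitals as columns.

    Assumes real orbitals and at least as many alpha as beta electrons.
    """
    n_alpha = c_alpha.shape[1]
    n_beta = c_beta.shape[1]
    sz = 0.5 * (n_alpha - n_beta)
    alpha_beta_overlap = c_alpha.T @ overlap @ c_beta   # <phi_i(alpha)|phi_j(beta)>
    return sz * (sz + 1.0) + n_beta - np.sum(alpha_beta_overlap**2)

basis_overlap = np.eye(3)                        # orthonormal model basis
alpha_orbitals = np.array([[1.0, 0.0],           # columns: "1s" and "2s"
                           [0.0, 1.0],
                           [0.0, 0.0]])
beta_same = np.array([[1.0], [0.0], [0.0]])      # 1s' identical to 1s
theta = 0.1                                      # hypothetical polarization angle
beta_polarized = np.array([[np.cos(theta)], [0.0], [np.sin(theta)]])

s2_restricted = uhf_s_squared(alpha_orbitals, beta_same, basis_overlap)
s2_polarized = uhf_s_squared(alpha_orbitals, beta_polarized, basis_overlap)
```

With spatially identical 1s orbitals the value is the pure-doublet S(S + 1) = 0.75; once the β orbital differs from its α partner the value rises above 0.75, signalling the admixture of the S = 3/2 component.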

The above single-determinant wavefunction is referred to as being of the unrestricted Hartree-Fock (UHF)
type because no restrictions are placed on the spatial nature of the orbitals in Ψ. In general, UHF
wavefunctions are not of pure spin symmetry for any open-shell system or for closed-shell systems far from
their equilibrium geometries (e.g. for H$_2$ or N$_2$ at long bond lengths). These are significant drawbacks of
methods based on a UHF starting point. Such a UHF treatment forms the basis of the widely used and highly
successful Gaussian 70 through Gaussian-9X series of electronic structure computer codes [32] which derive
from Pople [32] and co-workers.

To overcome some of the problems inherent in the UHF method, it is possible to derive SCF equations based
on minimizing the energy of a wavefunction formed by spin projecting a single Slater determinant starting
function (e.g. using $\{|1s\alpha\,2s\beta| - |1s\beta\,2s\alpha|\}/2^{1/2}$ for the singlet excited state of He rather than |1sα 2sβ|). It is
also possible for a trial wavefunction of the form |1sα 1sβ 2sα| to constrain the 1sα and 1sβ orbitals to have
exactly the same spatial form. In both cases, one then is able to carry out what are called restricted
Hartree-Fock (RHF) calculations.

B3.1.5.2 THE LINEAR COMBINATIONS OF ATOMIC ORBITALS TO FORM MOLECULAR ORBITALS EXPANSION OF THE 
SPIN ORBITALS 

The HF equations must be solved iteratively because the $J_j$ and $K_j$ operators in F depend on the orbitals $\phi_j$ for
which solutions are sought. Typical iterative schemes begin with a 'guess' for those $\phi_j$ that appear in Ψ, which
then allows F to be formed. Solutions to $F\phi_i = \varepsilon_i\phi_i$ are then found, and those $\phi_i$ which possess the space and
spin symmetry of the occupied orbitals of Ψ and which have the proper energies and nodal character are used
to generate a new F operator (i.e. new $J_j$ and $K_j$ operators). This iterative HF SCF process is continued until
the $\phi_i$ and $\varepsilon_i$ do not vary significantly from one iteration to the next, at which time one says that the process
has converged.

In practice, solution of $F\phi_i = \varepsilon_i\phi_i$ as an integro-differential equation can be carried out only for atoms [34] and
linear molecules [35] for which the angular parts of the $\phi_i$ can be exactly separated from the radial because of
axial- or full-rotation group symmetry (e.g. $\phi_i = Y_{l,m}(\theta,\varphi)R_{n,l}(r)$ for an atom and $\phi_i = \exp(im\varphi)R_{n,l,m}(\rho, z)$ for
a linear molecule).

In the procedures most commonly applied to nonlinear molecules, the $\phi_i$ are expanded in a basis $\chi_\mu$ according
to the linear combinations of AOs to form molecular orbitals (LCAO-MO) [36] procedure:

\[ \phi_i = \sum_\mu C_{\mu,i}\chi_\mu \]
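
This reduces $F\phi_i = \varepsilon_i\phi_i$ to a matrix eigenvalue-type equation:

\[ \sum_\nu F_{\mu,\nu}C_{\nu,i} = \varepsilon_i\sum_\nu S_{\mu,\nu}C_{\nu,i} \]

where $S_{\mu,\nu} = \langle\chi_\mu|\chi_\nu\rangle$ is the overlap matrix among the AOs and

\[ F_{\mu,\nu} = \langle\chi_\mu|h|\chi_\nu\rangle + \sum_{\delta,\kappa}\left[\gamma_{\delta,\kappa}\langle\chi_\mu\chi_\delta|g|\chi_\nu\chi_\kappa\rangle - \gamma^{ex}_{\delta,\kappa}\langle\chi_\mu\chi_\delta|g|\chi_\kappa\chi_\nu\rangle\right] \]

is the matrix representation of the Fock operator in the AO basis. Here and elsewhere, the symbol g is used to
represent the electron-electron Coulomb potential $e^2/|\mathbf{r} - \mathbf{r}'|$.

The charge- and exchange-density matrix elements in the AO basis are:

\[ \gamma_{\delta,\kappa} = \sum_{i(\mathrm{occupied})} C_{\delta,i}C_{\kappa,i} \]

and

\[ \gamma^{ex}_{\delta,\kappa} = \sum_{i(\mathrm{occupied\ and\ same\ spin})} C_{\delta,i}C_{\kappa,i} \]

where the sum in $\gamma^{ex}_{\delta,\kappa}$ runs over those occupied spin orbitals whose $m_s$ value is equal to that for which the Fock
matrix is being formed (for a closed-shell species, $\gamma^{ex}_{\delta,\kappa} = \tfrac{1}{2}\gamma_{\delta,\kappa}$).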

It should be noted that by moving to a matrix problem, one does not remove the need for an iterative solution;
the $F_{\mu,\nu}$ matrix elements depend on the $C_{\nu,i}$ LCAO-MO coefficients which are, in turn, solutions of the
so-called Roothaan [30] matrix HF equations: $\sum_\nu F_{\mu,\nu}C_{\nu,i} = \varepsilon_i\sum_\nu S_{\mu,\nu}C_{\nu,i}$. One should also note that, just as
$F\phi_i = \varepsilon_i\phi_i$ possesses a complete set of eigenfunctions, the matrix $F_{\mu,\nu}$, whose dimension M is equal to the
number of atomic basis orbitals, has M eigenvalues $\varepsilon_i$ and M eigenvectors whose elements are the $C_{\nu,i}$. Thus,
there are occupied and virtual MOs each of which is described in the LCAO-MO form with the $C_{\nu,i}$ coefficients
obtained via solution of $\sum_\nu F_{\mu,\nu}C_{\nu,i} = \varepsilon_i\sum_\nu S_{\mu,\nu}C_{\nu,i}$.
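
The iterative matrix procedure can be sketched for the simplest closed-shell case, a two-electron atom described by a few uncontracted 1s-type Gaussians on the nucleus. The code below is an illustrative sketch only: it works in atomic units, uses the standard closed-form integrals for unnormalized s-type Gaussians sharing one centre, and handles the overlap matrix by symmetric orthogonalization. The function name, the convergence settings and any exponents passed to it are assumptions of this example.

```python
import numpy as np

def helium_rhf_energy(exponents, nuclear_charge=2.0, max_iter=200, tol=1e-10):
    """Closed-shell SCF energy of a two-electron atom in a 1s-Gaussian basis."""
    a = np.asarray(exponents, dtype=float)
    p = a[:, None] + a[None, :]                    # alpha_mu + alpha_nu
    S = (np.pi / p) ** 1.5                         # overlap matrix
    T = 3.0 * a[:, None] * a[None, :] / p * S      # kinetic energy
    V = -2.0 * np.pi * nuclear_charge / p          # electron-nuclear attraction
    h = T + V
    pq = p[:, :, None, None]
    rs = p[None, None, :, :]
    eri = 2.0 * np.pi**2.5 / (pq * rs * np.sqrt(pq + rs))   # (mu nu|lam sig)

    # X = S^(-1/2) turns F C = S C eps into an ordinary eigenvalue problem.
    s_val, s_vec = np.linalg.eigh(S)
    X = s_vec @ np.diag(s_val ** -0.5) @ s_vec.T

    def fock(D):
        J = np.einsum("mnls,ls->mn", eri, D)       # Coulomb part
        K = np.einsum("mlns,ls->mn", eri, D)       # exchange part
        return h + J - 0.5 * K

    D = np.zeros_like(S)                           # guess: bare-nucleus orbitals
    for _ in range(max_iter):
        _, C_prime = np.linalg.eigh(X.T @ fock(D) @ X)
        C = X @ C_prime
        D_new = 2.0 * np.outer(C[:, 0], C[:, 0])   # lowest MO doubly occupied
        converged = np.max(np.abs(D_new - D)) < tol
        D = D_new
        if converged:
            break
    else:
        raise RuntimeError("SCF did not converge")
    return 0.5 * np.sum(D * (h + fock(D)))
```

Each pass builds the Fock matrix from the current density, solves the generalized eigenvalue problem and rebuilds the density from the lowest orbital; the loop stops when the density no longer changes. Because the method is variational, adding basis functions can only lower the energy toward the HF limit.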

B3.1.5.3 AO BASIS SETS

(A) SLATER-TYPE ORBITALS AND GAUSSIAN-TYPE ORBITALS 

The basis orbitals commonly used in the LCAO-MO process fall into two primary classes: 

(1) Slater-type orbitals (STOs) $\chi_{n,l,m}(r,\theta,\varphi) = N_{n,l,m,\zeta}\,Y_{l,m}(\theta,\varphi)\,r^{n-1}\exp(-\zeta r)$, are characterized by the
quantum numbers n, l and m and the exponent (which characterizes the 'size') ζ. The symbol $N_{n,l,m,\zeta}$
denotes the normalization constant.
(2) Cartesian Gaussian-type orbitals (GTOs) $\chi_{a,b,c}(r,\theta,\varphi) = N'_{a,b,c,\alpha}\,x^a y^b z^c\exp(-\alpha r^2)$, are characterized by the
quantum numbers a, b and c, which detail the angular shape and direction of the orbital, and the
exponent α which governs the radial 'size'.

For both types of orbitals, the coordinates r, θ and φ refer to the position of the electron relative to a set of
axes attached to the centre on which the basis orbital is located. Although STOs have the proper 'cusp' 
behaviour near the nuclei, they are used primarily for atomic- and linear-molecule calculations because the 
multi-centre integrals which arise in polyatomic-molecule calculations cannot efficiently be performed when 
STOs are employed. In contrast, such integrals can routinely be done when GTOs are used. This fundamental 
advantage of GTOs has led to the dominance of these functions in molecular quantum chemistry. 

To overcome the primary weakness of GTO functions (i.e. their radial derivatives vanish at the nucleus 
whereas the derivatives of STOs are non-zero), it is common to combine two, three, or more GTOs, with 
combination coefficients which are fixed and not treated as LCAO-MO parameters, into new functions called 
contracted GTOs or CGTOs. Typically, a series of tight, medium, and loose GTOs are multiplied by 
contraction coefficients and summed to produce a CGTO, which approximates the proper 'cusp' at the nuclear 
centre. 
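
The difference between a Slater function and a contracted Gaussian at the nucleus can be made concrete with a three-term contraction. The exponents and contraction coefficients below are the commonly tabulated STO-3G values for a hydrogen 1s function fitted to a Slater exponent of 1.24; they are quoted here as an assumption and should be checked against a basis set library before any real use. Function names and the finite-difference step are illustrative.

```python
import numpy as np

# Assumed STO-3G parameters for hydrogen 1s (Slater exponent 1.24).
STO_EXPONENT = 1.24
GTO_EXPONENTS = np.array([3.42525091, 0.62391373, 0.16885540])
CONTRACTION = np.array([0.15432897, 0.53532814, 0.44463454])

def sto_1s(r, zeta=STO_EXPONENT):
    """Normalized 1s Slater-type orbital."""
    return np.sqrt(zeta**3 / np.pi) * np.exp(-zeta * r)

def cgto_1s(r, exponents=GTO_EXPONENTS, coefficients=CONTRACTION):
    """Contracted Gaussian: fixed combination of normalized s-type primitives."""
    norms = (2.0 * exponents / np.pi) ** 0.75
    return float(np.sum(coefficients * norms * np.exp(-exponents * r**2)))

def radial_slope_at_nucleus(func, step=1e-4):
    """Forward-difference estimate of d(func)/dr at r = 0."""
    return (func(step) - func(0.0)) / step
```

Away from the nucleus the two functions agree closely, but at r = 0 the Slater function has a finite negative slope (the cusp) while the contracted Gaussian is flat and lower in amplitude. Adding tighter primitives narrows this gap without ever producing a true cusp.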

Although most calculations on molecules are now performed using Gaussian orbitals (STOs are still
commonly employed in atomic calculations), it should be noted that other basis sets can be used as long as
they span enough of the region of space (radial and angular) where significant electron density resides. In fact,
it is possible to use plane wave orbitals [37] of the form
$\chi(r,\theta,\varphi) = N\exp[i(k_x r\sin\theta\cos\varphi + k_y r\sin\theta\sin\varphi + k_z r\cos\theta)]$,
where N is a normalization constant and $k_x$, $k_y$ and $k_z$ are the quantum numbers detailing the momenta
of the orbital along the x, y and z Cartesian directions. The advantage to using such 'simple' orbitals is that the
integrals one must perform are much easier to handle with such functions; the disadvantage is that one must
use many such functions to accurately describe sharply peaked charge distributions of, for example,
inner-shell core orbitals.

(B) BASIS SET LIBRARIES 

Much effort has been devoted to developing sets of STO or GTO basis orbitals for main-group elements and 
the lighter transition metals. This ongoing effort is aimed at providing standard basis set libraries which: 

(1) yield predictable chemical accuracy in the resultant energies; 

(2) are cost effective to use in practical calculations; 

(3) are relatively transferable so that a given atom's basis is flexible enough to be used for that atom in
various bonding environments. 

The fundamental core and valence basis. In constructing an AO basis, one can choose from among several 
classes of functions. First, the size and nature of the primary core and valence basis must be specified. Within 
this category, the following choices are common. 

(1) A minimal basis in which the number of STO or CGTO orbitals is equal to the number of core and 
valence AOs in the atom. 

(2) A double-zeta (DZ) basis in which twice as many STOs or CGTOs are used as there are core and 
valence AOs. The use of more basis functions is motivated by a desire to provide additional variational 
flexibility so the LCAO-MO process can generate MOs of variable diffuseness as the local 
electronegativity of the atom varies. 

(3) A triple-zeta (TZ) basis in which three times as many STOs or CGTOs are used as the number of core 
and valence AOs (and, yes, there now are quadruple-zeta (QZ) and higher-zeta basis sets appearing in 
the literature). 

(4) Dunning, and Dunning and Hay [38], developed CGTO bases which range from approximately DZ to
substantially beyond QZ quality. These bases involve contractions of primitive uncontracted GTO bases 
which Huzinaga [ 39 ] had earlier optimized. These Dunning bases are commonly denoted as follows for 
first-row atoms: (10s,6p/5s,4p), which means that 10 s-type primitive GTOs have been contracted to 
produce five separate s-type CGTOs and that six primitive p-type GTOs were contracted into four 
separate p-type CGTOs in each of the x, y and z directions. 

(5) Even-tempered basis sets [40] consist of GTOs in which the orbital exponents $\alpha_k$ belonging to a series of
orbitals form a geometrical progression: $\alpha_k = a\beta^k$, where $a$ and $\beta$ characterize the particular set of
GTOs.

(6) STO-3G bases [ 41 ] were employed some years ago, but have recently become less popular. These bases 
are constructed by least-squares fitting GTOs to STOs which have been optimized for various electronic 
states of the atom. When three GTOs are employed to fit each STO, a STO-3G basis is formed. 

(7) 4-31G, 5-31G and 6-31G bases [42] employ a single CGTO of contraction length 4, 5, or 6 to describe
the core orbital. The valence space is described at the DZ level with the first CGTO constructed from 
three primitive GTOs and the second CGTO built from a single primitive GTO. 

(8) More recently, the Dunning group has focused on developing basis sets that are optimal not for use in 
SCF-level calculations on atoms and molecules, but that have been optimized for use in correlated 
calculations. These so-called correlation-consistent bases [ 43 ] are now widely used because more and 
more ab initio calculations are being performed at a correlated level. 

(9) Atomic natural orbital (ANO) basis sets [44] are formed by contracting Gaussian functions so as to
reproduce the natural orbitals obtained from correlated calculations on atoms (usually using a configuration
interaction wavefunction with single and double excitations, CISD).
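The even-tempered prescription in item (5) is simple enough to sketch in a few lines. The function name, the starting index k = 1 and the numerical values of a and beta below are illustrative assumptions, not values taken from any published basis set.

```python
def even_tempered_exponents(a, beta, n):
    """Return n Gaussian exponents alpha_k = a * beta**k for k = 1..n.

    The starting index k = 1 is an assumption made for this sketch.
    """
    if a <= 0 or beta <= 1:
        raise ValueError("expected a > 0 and beta > 1")
    return [a * beta ** k for k in range(1, n + 1)]


# Hypothetical parameters chosen only to show the geometric spacing.
exponents = even_tempered_exponents(a=0.05, beta=3.0, n=6)
ratios = [exponents[k + 1] / exponents[k] for k in range(len(exponents) - 1)]

for k, alpha in enumerate(exponents, start=1):
    print(f"alpha_{k} = {alpha:.4f}")
```

Because only two numbers (a and beta) define the whole series, optimizing such a basis involves far fewer parameters than optimizing every exponent separately.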

Work on optimizing the orbital exponents ($\zeta$s or $\alpha$s) and the GTO-to-CGTO contraction coefficients for the kinds
of bases described above has undergone explosive growth in recent years. As a result, it is not possible to
provide a single or even a few literature references from which one can obtain the most up-to-date bases.
However, the theory group at the Pacific Northwest National Laboratory (PNNL) offers a webpage [45] from
which one can find (and even download in a form prepared for input to any of several commonly used 
electronic structure codes) a wide variety of Gaussian atomic basis sets. 

Polarization functions. One usually enhances any core and valence functions with a set of so-called
polarization functions. They are functions of one higher angular momentum than appears in the atom's valence orbital
space (e.g. d-functions for C, N and O and p-functions for H), and they have exponents ($\zeta$ or $\alpha$) which cause
their radial sizes to be similar to the sizes of the valence orbitals (i.e. the polarization p orbitals of the H atom
are similar in size to the 1s orbital). Thus, they are not orbitals which describe the atom's valence orbital with
one higher $l$ value; such higher-$l$ valence orbitals would be radially more diffuse.

The primary purpose of the polarization functions is to give additional angular flexibility to the LCAO-MO 
process in forming the valence MOs. This is illustrated below in figure B3.1.8 where polarization $d_\pi$ orbitals
are seen to contribute to formation of the bonding $\pi$ orbital of a carbonyl group by allowing polarization of the
carbon atom's $p_\pi$ orbital toward the right and of the oxygen atom's $p_\pi$ orbital toward the left.



Figure B3.1.8. The role of d-polarization functions in the $\pi$ bond between C and O.

The polarization functions are essential in strained ring compounds because they provide the angular 
flexibility needed to direct the electron density into the regions between the bonded atoms. 

Functions with higher $l$ values and with 'sizes' like those of lower-$l$ valence orbitals are also used to introduce
additional angular correlation by permitting angularly polarized orbital pairs to be formed. Optimal 
polarization functions for first- and second-row atoms have been tabulated and are included in the PNNL 
Gaussian orbital web site data base [45]. 

Diffuse functions. When dealing with anions or Rydberg states, one must further augment the basis set by 
adding so-called diffuse basis orbitals. The valence and polarization functions described above do not provide 
enough radial flexibility to adequately describe either of these cases. Once again, the PNNL web site data base 
[45] offers a good source for obtaining diffuse functions appropriate to a variety of atoms. 

Once one has specified an AO basis for each atom in the molecule, the LCAO-MO procedure can be used to
determine the $C_{\nu,i}$ coefficients that describe the occupied and virtual orbitals. It is important to keep in mind
that the basis orbitals are not themselves the SCF orbitals of the isolated atoms; even the proper AOs are
combinations (with atomic values for the $C_{\nu,i}$ coefficients) of the basis functions. The LCAO-MO-SCF
process itself determines the magnitudes and signs of the $C_{\nu,i}$; alternations in the signs of these coefficients
allow radial nodes to form.

B3.1.5.4 THE PHYSICAL MEANING OF ORBITAL ENERGIES

The HF-SCF equations $F\phi_i = \varepsilon_i\phi_i$ imply that $\varepsilon_i$ can be written as

$$\varepsilon_i = \langle\phi_i|h|\phi_i\rangle + \sum_{j(\mathrm{occupied})}[J_{i,j} - K_{i,j}].$$

Thus $\varepsilon_i$ is the average value of the kinetic energy plus the Coulombic attraction to the nuclei for an electron in
$\phi_i$ plus the sum over all of the spin orbitals occupied in $\Psi$ of the Coulomb minus exchange interactions. If $\phi_i$ is
an occupied spin orbital, the term $[J_{i,i} - K_{i,i}]$ disappears and the latter sum represents the Coulomb minus
exchange interaction of $\phi_i$ with all of the $N - 1$ other occupied spin orbitals. If $\phi_i$ is a virtual spin orbital, this
cancellation does not occur, and one obtains the Coulomb minus exchange interaction of $\phi_i$ with all $N$ of the
occupied spin orbitals.

Hence the orbital energies of occupied orbitals pertain to interactions appropriate to a total of $N$ electrons,
while the orbital energies of virtual orbitals pertain to a system with $N + 1$ electrons. This usually makes SCF
virtual orbitals not very good for use in subsequent correlation calculations or for use in interpreting electronic 
excitation processes. To correlate a pair of electrons that occupy a valence orbital requires double excitations 
into a virtual orbital of similar size; the SCF virtual orbitals are too diffuse. For this reason, significant effort 
has been devoted to developing methods that produce so-called 'improved virtual orbitals' (IVOs) [ 46 ] that 
are of more utility in performing correlated calculations. 

(A) KOOPMANS' THEOREM 

Let us consider a model of the vertical (i.e. at fixed molecular geometry) detachment or attachment of an 
electron to an $N$-electron molecule.

(1) In this model, both the parent molecule and the species generated by adding or removing an electron are 
treated at the single-determinant level. 

(2) The HF orbitals of the parent molecule are used to describe both species. It is said that such a model 
neglects 'orbital relaxation' (i.e. the reoptimization of the spin orbitals to allow them to become 
appropriate to the daughter species). 

Within this model, the energy difference between the daughter and the parent can be written as follows
($\phi_k$ represents the particular spin orbital that is added or removed):

(1) For electron detachment

$$E^{N-1} - E^{N} = -\varepsilon_k.$$

(2) For electron attachment

$$E^{N} - E^{N+1} = -\varepsilon_k.$$

So, within the limitations of the single-determinant, frozen-orbital model, the ionization potentials (IPs) and
electron affinities (EAs) are given as the negative of the occupied and virtual spin-orbital energies,
respectively. This statement is referred to as Koopmans' theorem [47]; it is used extensively in quantum
chemical calculations as a means for estimating IPs and EAs and often yields results that are qualitatively
correct (i.e., ±0.5 eV).
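The detachment result can be checked in a few lines from the single-determinant energy expression. The sketch below assumes the standard spin-orbital form of that energy, with all sums running over the occupied spin orbitals of the parent, and uses the frozen-orbital assumption (2) of the model above.

```latex
\begin{align*}
E^{N} &= \sum_{i} \langle\phi_i|h|\phi_i\rangle
        + \tfrac{1}{2}\sum_{i,j}\,[J_{i,j} - K_{i,j}]
        && \text{(parent, $N$ occupied spin orbitals)} \\
E^{N-1} &= \sum_{i\neq k} \langle\phi_i|h|\phi_i\rangle
        + \tfrac{1}{2}\sum_{i,j\neq k}[J_{i,j} - K_{i,j}]
        && \text{(same orbitals, $\phi_k$ removed)} \\
E^{N-1} - E^{N} &= -\langle\phi_k|h|\phi_k\rangle - \sum_{j}\,[J_{k,j} - K_{k,j}]
        = -\varepsilon_k
        && \text{(using $J_{k,k} = K_{k,k}$, $J_{k,j} = J_{j,k}$, $K_{k,j} = K_{j,k}$)}
\end{align*}
```

The attachment case follows in the same way, with $\phi_k$ a virtual spin orbital added to the determinant.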

(B) ORBITAL ENERGIES AND THE TOTAL ENERGY 

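The total SCF electronic energy can be written as

$$E = \sum_{i(\mathrm{occupied})}\langle\phi_i|h|\phi_i\rangle + \sum_{i>j(\mathrm{occupied})}[J_{i,j} - K_{i,j}]$$

and the sum of the orbital energies of the occupied spin orbitals is given by

$$\sum_{i(\mathrm{occupied})}\varepsilon_i = \sum_{i(\mathrm{occupied})}\langle\phi_i|h|\phi_i\rangle + \sum_{i,j(\mathrm{occupied})}[J_{i,j} - K_{i,j}].$$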

These two expressions differ in a very important way; the sum of occupied orbital energies double counts the 
Coulomb minus exchange interaction energies. Thus, within the HF approximation, the sum of the occupied 
orbital energies is not equal to the total energy. 
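The double counting can be made explicit. Because the $i = j$ terms vanish ($J_{i,i} = K_{i,i}$), the unrestricted double sum is twice the $i > j$ sum, which gives the following relations (a short worked step, with all sums over occupied spin orbitals):

```latex
\begin{align*}
\sum_{i}\varepsilon_i &= \sum_{i}\langle\phi_i|h|\phi_i\rangle
      + 2\sum_{i>j}\,[J_{i,j} - K_{i,j}] \\
E &= \sum_{i}\varepsilon_i - \sum_{i>j}\,[J_{i,j} - K_{i,j}]
   = \tfrac{1}{2}\sum_{i}\bigl(\varepsilon_i + \langle\phi_i|h|\phi_i\rangle\bigr)
\end{align*}
```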

B3.1.5.5 SOLVING THE ROOTHAAN SCF EQUATIONS 

Before moving on to discuss methods that go beyond the single-configuration mean-field model, it is 
important to examine some of the computational effort that goes into carrying out an SCF calculation. 

Once atomic basis sets have been chosen for each atom, the one- and two-electron integrals appearing in F 
must be evaluated. There are numerous, highly-efficient computer codes [ 48 ] which allow such integrals to be 
computed for s, p, d, f and even g, h and i basis functions. After executing one of these 'integral packages' for
a basis with a total of $P$ functions, one has available (usually on the computer's hard disk) of the order of
$P^2/2$ one-electron ($\langle\chi_\mu|h|\chi_\nu\rangle$ and $\langle\chi_\mu|\chi_\nu\rangle$) and $P^4/8$ two-electron ($\langle\chi_\mu\chi_\delta|g|\chi_\nu\chi_\kappa\rangle$) integrals. When
treating extremely large AO basis sets (e.g. 1000 or more basis functions), modern computer programs [49]
calculate the requisite integrals, but never store them on the disk. Instead, their contributions to $F_{\mu,\nu}$ are
accumulated 'on the fly' after which the integrals are discarded. Recently, much progress has been made
towards achieving an evaluation of the non-vanishing (i.e. numerically significant) integrals [48] as well as
solving the subsequent SCF equations in a manner whose effort scales linearly [50] with the number of basis 
functions for large P. 
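To see why storing the integrals becomes impractical, the order-of-magnitude counts quoted above can be turned into storage estimates. This is a rough sketch: the function names are illustrative, and 8 bytes per integral (double precision, no index labels, no screening) is an assumption.

```python
def integral_counts(n_basis):
    """Order-of-magnitude counts P^2/2 and P^4/8 for P basis functions."""
    one_electron = n_basis ** 2 // 2
    two_electron = n_basis ** 4 // 8
    return one_electron, two_electron


def storage_gigabytes(n_integrals, bytes_per_value=8):
    """Storage needed if every integral is kept as one floating-point number."""
    return n_integrals * bytes_per_value / 1e9


for p in (100, 1000):
    one_e, two_e = integral_counts(p)
    print(f"P = {p}: {one_e} one-electron, {two_e} two-electron integrals, "
          f"about {storage_gigabytes(two_e):.1f} GB for the two-electron set")
```

Under these assumptions a 1000-function basis already needs on the order of a terabyte for the two-electron integrals, which is the motivation for accumulating their contributions on the fly.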

After the requisite integrals are available or are being computed on the fly, to begin the SCF process one must
input into the computer routine which computes $F_{\mu,\nu}$ the initial 'guesses' for the $C_{\nu,i}$ values corresponding to
the occupied orbitals. These initial guesses are typically made as follows.

(1) If one has available the $C_{\nu,i}$ values for the system from a calculation performed at a nearby geometry,
one can use these $C_{\nu,i}$ values.

(2) If one has $C_{\nu,i}$ values appropriate to fragments of the system (e.g. for C and O atoms if the CO molecule
is under study or for CH$_2$ and O if H$_2$CO is being studied), one can use these.

(3) If one has no other information available, one can carry out one iteration of the SCF process in which
the two-electron contributions to $F_{\mu,\nu}$ are ignored (i.e. take $F_{\mu,\nu} = \langle\chi_\mu|h|\chi_\nu\rangle$) and use the resultant
solutions to $\sum_\nu F_{\mu,\nu}C_{\nu,i} = \varepsilon_i\sum_\nu S_{\mu,\nu}C_{\nu,i}$ as initial guesses.

Once the initial guesses have been made for the $C_{\nu,i}$ of the occupied orbitals, the full $F_{\mu,\nu}$ matrix is formed
and new $\varepsilon_i$ and $C_{\nu,i}$ values are obtained by solving $\sum_\nu F_{\mu,\nu}C_{\nu,i} = \varepsilon_i\sum_\nu S_{\mu,\nu}C_{\nu,i}$. These new orbitals are then
used to form a new $F_{\mu,\nu}$ matrix from which new $\varepsilon_i$ and $C_{\nu,i}$ are obtained. This iterative process is carried on
until the $\varepsilon_i$ and $C_{\nu,i}$ do not vary (within specified tolerances) from iteration to iteration, at which time the SCF
process has reached self-consistency.
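The iterative cycle just described can be sketched compactly. The code below is a minimal illustration, not a production algorithm: it assumes a closed-shell (doubly occupied spatial orbital) formulation rather than spin orbitals, it tests convergence on the density matrix rather than on the orbital energies and coefficients individually, and the two-function integrals at the bottom are hypothetical numbers invented for the example. The starting guess is option (3) above.

```python
import numpy as np


def solve_roothaan(F, S):
    """Solve F C = S C diag(eps) via symmetric orthogonalization."""
    s_val, s_vec = np.linalg.eigh(S)
    X = s_vec @ np.diag(s_val ** -0.5) @ s_vec.T  # S^(-1/2)
    eps, C_orth = np.linalg.eigh(X @ F @ X)
    return eps, X @ C_orth


def density(C, n_occ):
    """Closed-shell density matrix from the n_occ lowest orbitals."""
    C_occ = C[:, :n_occ]
    return 2.0 * C_occ @ C_occ.T


def fock(h, eri, D):
    """Closed-shell Fock matrix; eri[m, n, l, s] holds (mn|ls)."""
    J = np.einsum("mnls,ls->mn", eri, D)  # Coulomb
    K = np.einsum("mlns,ls->mn", eri, D)  # exchange
    return h + J - 0.5 * K


def scf(h, S, eri, n_occ, tol=1e-10, max_iter=200):
    eps, C = solve_roothaan(h, S)  # guess: neglect two-electron terms
    D = density(C, n_occ)
    for iteration in range(1, max_iter + 1):
        F = fock(h, eri, D)
        eps, C = solve_roothaan(F, S)
        D_new = density(C, n_occ)
        if np.max(np.abs(D_new - D)) < tol:
            return eps, C, D_new, F, iteration
        D = D_new
    raise RuntimeError("SCF did not converge")


# Hypothetical two-function, two-electron model (all numbers invented).
h = np.array([[-1.0, -0.5], [-0.5, -0.9]])
S = np.array([[1.0, 0.1], [0.1, 1.0]])
eri = np.zeros((2, 2, 2, 2))
eri[0, 0, 0, 0] = eri[1, 1, 1, 1] = 0.25

eps, C, D, F, n_iter = scf(h, S, eri, n_occ=1)
print("iterations:", n_iter)
print("orbital energies:", eps)
```

Plain fixed-point iteration of this kind converges for the mild model used here; for real molecules some form of damping or extrapolation is usually needed.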

B3.1.6 METHODS FOR TREATING ELECTRON CORRELATION 

B3.1.6.1 AN OVERVIEW OF VARIOUS APPROACHES 

There are numerous procedures currently in use for determining the 'best' wavefunction of the form

$$\Psi = \sum_J C_J\Phi_J$$

where $\Phi_J$ is a spin- and space-symmetry-adapted CSF consisting of determinants $|\phi_{J1}\phi_{J2}\phi_{J3}\cdots\phi_{JN}|$ (see [16, 26]).
In all such wavefunctions there are two kinds of parameters that need to be determined: the $C_J$ and
the LCAO-MO coefficients describing the $\phi_{Jk}$. The most commonly employed methods used to determine
these parameters include the following.

(A) THE MULTICONFIGURATIONAL SELF-CONSISTENT FIELD METHOD 

In this approach [51], the expectation value $\langle\Psi|H|\Psi\rangle/\langle\Psi|\Psi\rangle$ is treated variationally and made stationary
with respect to variations in the $C_J$ and $C_{\nu,i}$ coefficients. The energy functional is a quadratic function of the
$C_J$ coefficients, and so one can express the stationary conditions for these variables in the secular form

$$\sum_J H_{I,J}C_J = E\,C_I.$$

However, $E$ is a quartic function of the $C_{\nu,i}$s because $H_{I,J}$ involves two-electron integrals $\langle\phi_i\phi_j|g|\phi_k\phi_l\rangle$ that
depend quartically on these coefficients.

It is well known that minimization of a function ($E$) of several nonlinear parameters (the $C_{\nu,i}$) is a difficult
task that can suffer from poor convergence and may locate local rather than global minima. In a
multiconfigurational self-consistent field (MCSCF) wavefunction containing many CSFs, the energy is only
weakly dependent on the orbitals that appear in CSFs with small $C_J$ values; in contrast, $E$ is strongly
dependent on those orbitals that appear in the CSFs with larger $C_J$ values. One is therefore faced with
minimizing a function of many variables that depends strongly on several of the variables and weakly on 
many others. 

For these reasons, in the MCSCF method the number of CSFs is usually kept to a small to moderate number 
(e.g. a few to several thousand) chosen to describe essential correlations (i.e. configuration crossings, near 
degeneracies, proper dissociation, etc, all of which are often termed non-dynamical correlations) and 
important dynamical correlations (those electron-pair correlations of angular, radial, left-right, etc nature that 
are important when low-lying 'virtual' orbitals are present). 

(B) THE CONFIGURATION INTERACTION METHOD 

In this approach [52], the LCAO-MO coefficients are determined first via a single-configuration SCF 
calculation or an MCSCF calculation using a small number of CSFs. The $C_J$ coefficients are subsequently
determined by making the expectation value $\langle\Psi|H|\Psi\rangle/\langle\Psi|\Psi\rangle$ stationary.

The CI wavefunction is most commonly constructed from CSFs $\Phi_J$ that include:

(1) all of the CSFs in the SCF or MCSCF wavefunction used to generate the molecular orbitals $\phi_i$. These are
referred to as the 'reference' CSFs;

(2) CSFs generated by carrying out single-, double-, triple-, etc, level 'excitations' (i.e. orbital replacements) 
relative to reference CSFs. CI wavefunctions limited to include contributions through various levels of 
excitation are denoted S (singly), D (doubly), SD (singly and doubly), SDT (singly, doubly, and triply) 
excited. 

The orbitals from which electrons are removed can be restricted to focus attention on the correlations among 
certain orbitals. For example, if the excitations from the core electrons are excluded, one computes the total 
energy that contains no core correlation energy. The number of CSFs included in the CI calculation can be far 
in excess of the number considered in typical MCSCF calculations. CI wavefunctions including 5000 to 50 
000 CSFs are routine, and functions with one to several billion CSFs are within the realm of practicality [53]. 

The need for such large CSF expansions should not be surprising considering (i) that each electron pair
requires at least two CSFs to form polarized orbital pairs, (ii) there are of the order of $N(N-1)/2 = X$ electron
pairs for $N$ electrons, hence (iii) the number of terms in the CI wavefunction scales as $2^X$. For a molecule
containing ten electrons, there could be $2^{45} = 3.5\times10^{13}$ terms in the CI expansion. This may be an
overestimate of the number of CSFs needed, but it demonstrates how rapidly the number of CSFs can grow 
with the number of electrons. 
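The estimate is easy to reproduce for other electron counts. The short sketch below simply evaluates the pair count and the $2^X$ bound from the argument above; the function names are illustrative.

```python
def electron_pairs(n_electrons):
    """Number of distinct electron pairs, X = N(N-1)/2."""
    return n_electrons * (n_electrons - 1) // 2


def ci_term_estimate(n_electrons):
    """Crude upper estimate 2**X of the number of terms in the CI expansion."""
    return 2 ** electron_pairs(n_electrons)


for n in (2, 4, 10, 20):
    print(f"N = {n:2d}: X = {electron_pairs(n):3d}, "
          f"2^X = {ci_term_estimate(n):.3e}")
```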

The $H_{I,J}$ matrices are, in practice, evaluated in terms of one- and two-electron integrals over the MOs using
the Slater-Condon rules [54] or their equivalent. Prior to forming the $H_{I,J}$ matrix elements, the one- and two-electron
integrals, which can be computed only for the atomic (e.g. STO or GTO) basis, must be transformed [55] to the MO
basis. This transformation step requires computer resources proportional to the fifth power of the number of
basis functions, and thus is one of the more troublesome steps in most configuration interaction calculations.

For large CI calculations, the full $H_{I,J}$ matrix is not formed and stored in the computer's memory or on disk;
rather, 'direct CI' methods [56] identify and compute non-zero $H_{I,J}$ and immediately add up contributions to
the sum $\sum_J H_{I,J}C_J$. Iterative methods [57], in which approximate values for the $C_J$ coefficients are refined
through sequential application of $\sum_J H_{I,J}$ to the preceding estimate of the $C_J$ vector, are employed to solve
these large eigenvalue problems.
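The idea of refining the coefficient vector using only the product $\sum_J H_{I,J}C_J$ can be illustrated with a shifted power iteration. This is a deliberately simple stand-in, not the algorithm of [57]; practical codes use faster subspace schemes. The model matrix and the function name `lowest_eigenpair` are assumptions made for the sketch.

```python
import numpy as np


def lowest_eigenpair(apply_h, n, shift, tol=1e-12, max_iter=10000):
    """Find the lowest eigenpair using only matrix-vector products.

    apply_h(c) must return sigma_I = sum_J H_IJ c_J. The shift must be at
    least the largest eigenvalue, so that (shift - H) amplifies the lowest
    state more than any other.
    """
    c = np.ones(n) / np.sqrt(n)
    previous = np.inf
    for _ in range(max_iter):
        sigma = apply_h(c)
        energy = float(c @ sigma)  # Rayleigh quotient for normalized c
        if abs(energy - previous) < tol:
            return energy, c
        previous = energy
        c = shift * c - sigma
        c /= np.linalg.norm(c)
    raise RuntimeError("iteration did not converge")


# Hypothetical diagonally dominant model Hamiltonian.
n_csf = 6
H = np.diag(np.arange(float(n_csf))) + 0.05 * (np.ones((n_csf, n_csf)) - np.eye(n_csf))
shift = float(np.max(np.sum(np.abs(H), axis=1)))  # bound on the largest eigenvalue

energy, c = lowest_eigenpair(lambda v: H @ v, n_csf, shift)
print("lowest eigenvalue:", energy)
```

In a direct CI program the call `H @ v` would be replaced by a routine that generates the non-zero matrix elements on the fly, so the full matrix is never stored.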

(C) THE MØLLER-PLESSET PERTURBATION METHOD


This method [58] uses the single-configuration SCF process to determine a set of orbitals $\{\phi_i\}$. Then, using an
unperturbed Hamiltonian equal to the sum of the $N$ electrons' Fock operators, $H^0 = \sum_{i=1}^{N}F(i)$, perturbation
theory is used to determine the $C_J$ amplitudes for the CSFs. The MPPT procedure [59] is a special case of
many-body perturbation theory (MBPT) in which the UHF Fock operator is used to define $H^0$. The amplitude
for the reference CSF is taken as unity and the other CSFs' amplitudes are determined by Rayleigh-Schrödinger
perturbation theory using $H - H^0$ as the perturbation.

In the MPPT/MBPT method, once the reference CSF is chosen and the SCF orbitals belonging to this CSF are 
determined, the wavefunction $\Psi$ and energy $E$ are determined in an order-by-order manner. The perturbation
equations determine what CSFs to include and their particular order. This is one of the primary strengths of 
this technique; it does not require one to make further choices, in contrast to the MCSCF and CI treatments 
where one needs to choose which CSFs to include. 

For example, the first-order wavefunction correction $\Psi^1$ is

$$\Psi^1 = -\sum_{i<j,\,m<n}[\langle i,j|g|m,n\rangle - \langle i,j|g|n,m\rangle][\varepsilon_m - \varepsilon_i + \varepsilon_n - \varepsilon_j]^{-1}|\Phi_{i,j}^{m,n}\rangle$$

where the SCF orbital energies are denoted $\varepsilon_k$ and $\Phi_{i,j}^{m,n}$ represents a CSF that is doubly excited ($\phi_i$ and $\phi_j$ are
replaced by $\phi_m$ and $\phi_n$) relative to $\Phi$. Only doubly-excited CSFs contribute to the first-order wavefunction; the
fact that the contributions from singly-excited configurations vanish in $\Psi^1$ is known as the Brillouin theorem
[60].

The energy $E$ is given through second order as

$$E = E_{SCF} - \sum_{i<j,\,m<n}|\langle i,j|g|m,n\rangle - \langle i,j|g|n,m\rangle|^2/[\varepsilon_m - \varepsilon_i + \varepsilon_n - \varepsilon_j].$$

Both $\Psi$ and $E$ are expressed in terms of two-electron integrals $\langle i,j|g|m,n\rangle$ coupling the virtual spin orbitals
$\phi_m$ and $\phi_n$ to the spin orbitals from which the electrons were excited, $\phi_i$ and $\phi_j$, as well as the orbital energy
differences $[\varepsilon_m - \varepsilon_i + \varepsilon_n - \varepsilon_j]$ accompanying such excitations. Clearly, the major contributions to the
correlation energy are made by double excitations into virtual orbitals $\phi_m\phi_n$ with large $\langle i,j|g|m,n\rangle$ integrals
and small orbital energy gaps $[\varepsilon_m - \varepsilon_i + \varepsilon_n - \varepsilon_j]$. In higher-order corrections, contributions from CSFs that are
singly, triply, etc excited relative to $\Phi$ appear, and additional contributions from the doubly-excited CSFs also
enter.
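Given antisymmetrized two-electron integrals and orbital energies, the first-order amplitudes and the second-order energy correction reduce to a few array operations. The sketch below assumes real spin-orbital integrals supplied as a four-index array; the function name and the tiny two-occupied, two-virtual data set are invented for illustration.

```python
import numpy as np


def mp2_from_spin_orbitals(antisym, eps_occ, eps_virt):
    """Second-order energy correction and first-order doubles amplitudes.

    antisym[i, j, m, n] = <i,j|g|m,n> - <i,j|g|n,m>, with i, j occupied and
    m, n virtual spin orbitals (real integrals assumed).
    """
    e_o = np.asarray(eps_occ, dtype=float)
    e_v = np.asarray(eps_virt, dtype=float)
    # denom[i, j, m, n] = eps_m - eps_i + eps_n - eps_j
    denom = (e_v[None, None, :, None] + e_v[None, None, None, :]
             - e_o[:, None, None, None] - e_o[None, :, None, None])
    amplitudes = -antisym / denom
    # The unrestricted sum counts each (i<j, m<n) term four times.
    e2 = 0.25 * float(np.sum(antisym * amplitudes))
    return e2, amplitudes


# Hypothetical data: one independent antisymmetrized integral of 0.1.
antisym = np.zeros((2, 2, 2, 2))
antisym[0, 1, 0, 1] = antisym[1, 0, 1, 0] = 0.1
antisym[0, 1, 1, 0] = antisym[1, 0, 0, 1] = -0.1

e2, amplitudes = mp2_from_spin_orbitals(antisym, eps_occ=[-0.5, -0.5], eps_virt=[0.5, 0.5])
print("second-order correction:", e2)
```

The correction is negative, and its size grows with the square of the integral and shrinks as the orbital energy gap widens, in line with the discussion above.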

(D) THE COUPLED-CLUSTER METHOD 

In the coupled-cluster (CC) method [61], one expresses the wavefunction in a somewhat different manner:

$$\Psi = \exp(T)\Phi$$

where $\Phi$ is a single CSF (usually the UHF determinant) used in the SCF process to generate a set of spin
orbitals. The operator $T$ is expressed in terms of operators that achieve spin-orbital excitations as follows:

$$T = \sum_{i,m}t_i^m\,m^+i + \sum_{i,j,m,n}t_{i,j}^{m,n}\,m^+n^+j\,i + \cdots$$

where the combination of operators $m^+i$ denotes the creation of an electron in the virtual spin orbital $\phi_m$ and
the removal of an electron from the occupied spin orbital $\phi_i$ to generate a single excitation. The operation
$m^+n^+j\,i$ therefore represents a double excitation from $\phi_i\phi_j$ to $\phi_m\phi_n$.

The amplitudes $t_i^m$, $t_{i,j}^{m,n}$, etc, which play the role of the $C_J$ coefficients in CC theory, are determined through
the set of equations generated by projecting the Schrödinger equation in the form

$$\exp(-T)H\exp(T)\Phi = E\Phi$$

against CSFs which are single, double, etc, excitations relative to $\Phi$:

$$\langle\Phi_i^m|H + [H,T] + \tfrac{1}{2}[[H,T],T] + \tfrac{1}{6}[[[H,T],T],T] + \tfrac{1}{24}[[[[H,T],T],T],T]|\Phi\rangle = 0$$

$$\langle\Phi_{i,j}^{m,n}|H + [H,T] + \tfrac{1}{2}[[H,T],T] + \tfrac{1}{6}[[[H,T],T],T] + \tfrac{1}{24}[[[[H,T],T],T],T]|\Phi\rangle = 0$$

$$\langle\Phi_{i,j,k}^{m,n,p}|H + [H,T] + \tfrac{1}{2}[[H,T],T] + \tfrac{1}{6}[[[H,T],T],T] + \tfrac{1}{24}[[[[H,T],T],T],T]|\Phi\rangle = 0$$

and so on for higher-order excited CSFs.

It can be shown [62] that the expansion of the exponential operators truncates exactly at the fourth power in T. 
As a result, the exact CC equations are quartic equations for the $t_i^m$, $t_{i,j}^{m,n}$, etc amplitudes. The matrix elements
appearing in the CC equations can be expressed in terms of one- and two-electron integrals over the spin orbitals including
those occupied in $\Phi$ and the virtual orbitals not in $\Phi$.

These quartic equations are solved in an iterative manner and, as such, are susceptible to convergence 
difficulties. In any such iterative process, it is important to start with an approximation reasonably close to the 
final result. In CC theory, this is often achieved by neglecting all of the terms that are nonlinear in the $t$
amplitudes (because the $t$s are assumed to be less than unity in magnitude) and ignoring factors that couple
different doubly-excited CSFs (i.e. the sum over $i'$, $j'$, $m'$ and $n'$). This gives $t$ amplitudes that are equal to the
amplitudes of the first-order MPPT/MBPT wavefunction:

$$t_{i,j}^{m,n} = -[\langle i,j|g|m,n\rangle - \langle i,j|g|n,m\rangle]/[\varepsilon_m - \varepsilon_i + \varepsilon_n - \varepsilon_j].$$

As Bartlett [63] and Pople have both demonstrated [64], there is a close relationship between the 
MPPT/MBPT and CC methods when the CC equations are solved iteratively starting with such an 
MPPT/MBPT-like initial 'guess' for these double-excitation amplitudes. 

(E) DENSITY FUNCTIONAL THEORIES 

These approaches provide alternatives to the conventional tools of quantum chemistry. The CI, MCSCF, 
MPPT/MBPT, and CC methods move beyond the single-configuration picture by adding to the wavefunction 
more configurations whose amplitudes they each determine in their own way. This can lead to a very large 
number of CSFs in the correlated wavefunction and, as a result, a need for extraordinary computer resources. 

The density functional approaches are different [65]. Here one solves a set of orbital-level equations

$$\Big\{-\frac{\hbar^2}{2m_e}\nabla^2 - \sum_A\frac{Z_Ae^2}{|r - R_A|} + \int\frac{\rho(r')e^2}{|r - r'|}\,dr' + U(r)\Big\}\phi_i = \varepsilon_i\phi_i$$

in which the orbitals $\{\phi_i\}$ 'feel' potentials due to the nuclear centres (having charges $Z_A$), Coulombic
interaction with the total electron density $\rho(r')$ and a so-called exchange-correlation potential denoted $U(r)$.
The particular electronic state for which the calculation is being performed is specified by forming a
corresponding density $\rho(r')$. Before going further in describing how density functional theory (DFT)
calculations are carried out, let us examine the origins underlying this theory. 

The so-called Hohenberg-Kohn [66] theorem states that the ground-state electron density $\rho(r)$ describing an
$N$-electron system uniquely determines the potential $V(r)$ in the Hamiltonian

$$H = \sum_j\Big\{-\frac{\hbar^2}{2m_e}\nabla_j^2 + V(r_j) + \frac{e^2}{2}\sum_{k\neq j}\frac{1}{r_{j,k}}\Big\}$$

and, because $H$ determines the ground-state energy and wavefunction of the system, the ground-state density $\rho(r)$
determines the ground-state properties of the system. The proof of this theorem proceeds as follows.

(a) $\rho(r)$ determines $N$ because $\int\rho(r)\,d^3r = N$.

(b) Assume that there are two distinct potentials (aside from an additive constant that simply shifts the
zero of total energy) $V(r)$ and $V'(r)$ which, when used in $H$ and $H'$, respectively, to solve for a ground
state produce $E_0$, $\Psi(r)$ and $E_0'$, $\Psi'(r)$ that have the same one-electron density:
$\int|\Psi|^2\,dr_2\,dr_3\cdots dr_N = \rho(r) = \int|\Psi'|^2\,dr_2\,dr_3\cdots dr_N$.

(c) If we think of $\Psi'$ as a trial variational wavefunction for the Hamiltonian $H$, we know that

$$E_0 < \langle\Psi'|H|\Psi'\rangle = \langle\Psi'|H'|\Psi'\rangle + \int\rho(r)[V(r) - V'(r)]\,d^3r = E_0' + \int\rho(r)[V(r) - V'(r)]\,d^3r.$$

(d) Similarly, taking $\Psi$ as a trial function for the $H'$ Hamiltonian, one finds that

$$E_0' < E_0 + \int\rho(r)[V'(r) - V(r)]\,d^3r.$$

(e) Adding the equations in (c) and (d) gives

$$E_0 + E_0' < E_0 + E_0',$$

a clear contradiction.

Hence, there cannot be two distinct potentials $V$ and $V'$ that give the same ground-state $\rho(r)$. So, the ground-state
density $\rho(r)$ uniquely determines $N$ and $V$, and thus $H$, and therefore $\Psi$ and $E_0$. Furthermore, because $\Psi$
determines all the properties of the ground state, then $\rho(r)$, in principle, determines all such properties. This
means that even the kinetic energy and the electron-electron interaction energy of the ground state are
determined by $\rho(r)$. It is easy to see that $\int\rho(r)V(r)\,d^3r = V[\rho]$ gives the average value of the electron-nuclear
(plus any additional one-electron additive potential) interaction in terms of the ground-state density $\rho(r)$, but
how are the kinetic energy $T[\rho]$ and the electron-electron interaction energy $V_{ee}[\rho]$ expressed in terms of $\rho$?

The main difficulty with DFTs is that the Hohenberg-Kohn theorem shows that the ground-state values of $T$,
$V_{ee}$, $V$, etc are all unique functionals of the ground-state $\rho$ (i.e. that they can, in principle, be determined once
$\rho$ is given), but it does not tell us what these functional relations are.

To see how it might make sense that a property such as the kinetic energy, whose operator $-(\hbar^2/2m_e)\nabla^2$
involves derivatives, can be related to the electron density, consider a simple system of $N$ non-interacting
electrons moving in a three-dimensional cubic 'box' potential. The energy states of such electrons are known
to be

$$E = (h^2/8m_eL^2)(n_x^2 + n_y^2 + n_z^2)$$

where $L$ is the length of the box along the three axes and $n_x$, $n_y$ and $n_z$ are the quantum numbers describing the
state. We can view $n_x^2 + n_y^2 + n_z^2 = R^2$ as defining the squared radius of a sphere in three dimensions, and we
realize that the density of quantum states in this space is one state per unit volume in the $n_x$, $n_y$ and $n_z$ space.
Because $n_x$, $n_y$ and $n_z$ must be positive integers, the volume covering all states with energy less than or equal to a
specified energy $E = (h^2/8m_eL^2)R^2$ is one-eighth the volume of the sphere of radius $R$:

$$\Phi(E) = \frac{1}{8}\,\frac{4\pi}{3}R^3 = \frac{\pi}{6}\Big(\frac{8m_eL^2E}{h^2}\Big)^{3/2}.$$

Since there is one state per unit of such volume, $\Phi(E)$ is also the number of states with energy less than or
equal to $E$, and is called the integrated density of states. The number of states $g(E)\,dE$ with energy between $E$
and $E + dE$, the density of states, is the derivative of $\Phi$:

$$g(E) = d\Phi/dE = (\pi/4)(8m_eL^2/h^2)^{3/2}E^{1/2}.$$
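The claim that the number of states is close to one-eighth of the sphere volume can be checked by brute-force counting of positive-integer triples. The sketch below is illustrative only; the function names and the radii sampled are arbitrary choices.

```python
import numpy as np


def count_states(radius):
    """Count triples of positive integers with nx^2 + ny^2 + nz^2 <= radius^2."""
    n = np.arange(1, int(radius) + 1)
    nx, ny, nz = np.meshgrid(n, n, n, indexing="ij")
    return int(np.count_nonzero(nx ** 2 + ny ** 2 + nz ** 2 <= radius ** 2))


def octant_volume(radius):
    """One-eighth of the volume of a sphere of the given radius."""
    return (1.0 / 8.0) * (4.0 / 3.0) * np.pi * radius ** 3


for radius in (10.0, 30.0, 60.0):
    exact = count_states(radius)
    estimate = octant_volume(radius)
    print(f"R = {radius:4.0f}: counted {exact}, volume estimate {estimate:.0f}, "
          f"ratio {exact / estimate:.3f}")

ratio_10 = count_states(10.0) / octant_volume(10.0)
ratio_60 = count_states(60.0) / octant_volume(60.0)
```

The ratio approaches unity as the radius grows, since the states missed near the flat faces of the octant become a smaller fraction of the total; the continuum formula is therefore best for large quantum numbers.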


If we calculate the total energy for $N$ electrons, with the states having energies up to the so-called Fermi
energy $E_F$ (i.e. the energy of the highest occupied molecular orbital, HOMO) doubly occupied, we obtain the
ground-state energy:

$$E_0 = 2\int_0^{E_F}g(E)E\,dE = (8\pi/5)(2m_e/h^2)^{3/2}L^3E_F^{5/2}.$$

The total number of electrons $N$ can be expressed as

$$N = 2\int_0^{E_F}g(E)\,dE = (8\pi/3)(2m_e/h^2)^{3/2}L^3E_F^{3/2}$$

which can be solved for $E_F$ in terms of $N$ to then express $E_0$ in terms of $N$ instead of $E_F$:

$$E_0 = (3h^2/10m_e)(3/8\pi)^{2/3}L^3(N/L^3)^{5/3}.$$

This gives the total energy, which is also the kinetic energy in this case because the potential energy is zero
within the 'box', in terms of the electron density $\rho(x,y,z) = N/L^3$. It therefore may be plausible to express
kinetic energies in terms of electron densities $\rho(r)$, but it is by no means clear how to do so for 'real' atoms
and molecules with electron-nuclear and electron-electron interactions operative.

In one of the earliest DFT models, the Thomas-Fermi theory, the kinetic energy of an atom or a molecule is
approximated using the above type of treatment on a 'local' level. That is, for each volume element in $r$ space,
one assumes the expression given above to be valid, and then one integrates over all $r$ to compute the total kinetic
energy:

$$T_{TF}[\rho] = \int(3h^2/10m_e)(3/8\pi)^{2/3}[\rho(r)]^{5/3}\,d^3r = C_F\int[\rho(r)]^{5/3}\,d^3r$$

where the last equality simply defines the $C_F$ constant (which is 2.8712 in atomic units). Ignoring the
correlation and exchange contributions to the total energy, this $T$ is combined with the electron-nuclear $V$ and
Coulombic electron-electron potential energies to give the Thomas-Fermi total energy:

$$E_{0,TF}[\rho] = C_F\int[\rho(r)]^{5/3}\,d^3r + \int V(r)\rho(r)\,d^3r + \frac{e^2}{2}\int\!\!\int\frac{\rho(r)\rho(r')}{|r - r'|}\,d^3r\,d^3r'.$$

This expression is an example of how $E_0$ is given as a local density functional approximation (LDA). The
term local means that the energy is given as a functional (i.e. a function of $\rho$) which depends only on $\rho(r)$ at
individual points in space, but not on $\rho(r)$ at more than one point in space.
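As a concrete check of the constant and of how the local kinetic-energy functional behaves, the sketch below evaluates $C_F$ in atomic units (where $h = 2\pi$) and applies the Thomas-Fermi kinetic functional to the exact hydrogen 1s density, $\rho(r) = e^{-2r}/\pi$. The choice of test density, the grid parameters and the helper names are assumptions made for this illustration.

```python
import math

# C_F = (3 h^2 / 10 m_e) (3 / 8 pi)^(2/3) reduces to (3/10)(3 pi^2)^(2/3)
# in atomic units (hbar = m_e = 1, so h = 2 pi).
C_F = 0.3 * (3.0 * math.pi ** 2) ** (2.0 / 3.0)


def hydrogen_density(r):
    """Exact 1s electron density of the hydrogen atom in atomic units."""
    return math.exp(-2.0 * r) / math.pi


def radial_integral(f, r_max=30.0, n=60000):
    """Trapezoid estimate of the integral of f(r) over all space (spherical f)."""
    step = r_max / n
    total = 0.5 * 4.0 * math.pi * r_max ** 2 * f(r_max)  # r = 0 end contributes zero
    for k in range(1, n):
        r = k * step
        total += 4.0 * math.pi * r * r * f(r)
    return total * step


n_electrons = radial_integral(hydrogen_density)
t_tf = C_F * radial_integral(lambda r: hydrogen_density(r) ** (5.0 / 3.0))

print(f"C_F = {C_F:.4f}")
print(f"integrated density = {n_electrons:.6f}")
print(f"Thomas-Fermi kinetic energy = {t_tf:.4f} hartree (exact value is 0.5)")
```

For this density the local functional recovers only a little over half of the exact kinetic energy, which illustrates why the Thomas-Fermi treatment is too crude for chemical purposes.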

Unfortunately, the Thomas-Fermi energy functional does not produce results that are of sufficiently high 
accuracy to be of great use in chemistry. What is missing in this theory are the exchange energy and the 
correlation energy; moreover, the kinetic energy is treated only in the approximate manner described. 


In the book by Parr and Yang [67], it is shown how Dirac was able to address the exchange energy for the
'uniform electron gas' ($N$ Coulomb interacting electrons moving in a uniform positive background charge
whose magnitude balances the charge of the $N$ electrons). If the exact expression for the exchange energy of
the uniform electron gas is applied on a local level, one obtains the commonly used Dirac local density
approximation to the exchange energy:

$$E_{ex,Dirac}[\rho] = -C_x\int[\rho(r)]^{4/3}\,d^3r$$

with $C_x = (3/4)(3/\pi)^{1/3} = 0.7386$ in atomic units. Adding this exchange energy to the Thomas-Fermi total
energy $E_{0,TF}[\rho]$ gives the so-called Thomas-Fermi-Dirac (TFD) energy functional.
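The Dirac constant and functional can be evaluated in the same spirit. The sketch below computes $C_x$ and applies the local exchange expression to the hydrogen 1s density on a radial grid; the test density, the grid and the function name `dirac_exchange` are illustrative assumptions, and the spin-unpolarized formula is applied to a one-electron density purely as a numerical example.

```python
import numpy as np

C_X = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)


def dirac_exchange(r, rho):
    """Dirac local exchange energy for a spherical density sampled on grid r."""
    integrand = 4.0 * np.pi * r ** 2 * rho ** (4.0 / 3.0)
    # Trapezoid rule written out explicitly on the radial grid.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
    return -C_X * float(integral)


r = np.linspace(0.0, 30.0, 60001)
rho = np.exp(-2.0 * r) / np.pi  # hydrogen 1s density in atomic units
e_x = dirac_exchange(r, rho)

print(f"C_x = {C_X:.4f}")
print(f"Dirac exchange energy = {e_x:.4f} hartree")
```

For comparison, the exact exchange energy of this one-electron density is -0.3125 hartree, so the local expression recovers roughly two-thirds of it here.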

Because electron densities vary rather strongly spatially near the nuclei, corrections to the above
approximations to $T[\rho]$ and $E_{ex,Dirac}$ are needed. One of the more commonly used so-called gradient-corrected
approximations is that invented by Becke [68], and referred to as the Becke88 exchange functional:

$$E_{ex}(\mathrm{Becke88}) = E_{ex,Dirac}[\rho] - \gamma\int x^2\rho^{4/3}(1 + 6\gamma x\sinh^{-1}(x))^{-1}\,dr$$

where $x = \rho^{-4/3}|\nabla\rho|$, and $\gamma$ is a parameter chosen so that the above exchange energy can best reproduce the
known exchange energies of specific electronic states of the inert gas atoms (Becke finds $\gamma$ to equal 0.0042).
A common gradient correction to the earlier $T[\rho]$ is called the Weizsäcker correction and is given by

$$\delta T_{\mathrm{Weizs\ddot{a}cker}} = (1/72)(\hbar^2/m_e)\int|\nabla\rho(r)|^2/\rho(r)\,dr.$$

Although the above discussion suggests how one might compute the ground-state energy once the ground-state
density $\rho(r)$ is given, one still needs to know how to obtain $\rho$. Kohn and Sham [69] (KS) introduced a set
of so-called KS orbitals obeying the following equation:

$$\Big\{-\frac{\hbar^2}{2m_e}\nabla^2 + V(r) + e^2\int\frac{\rho(r')}{|r - r'|}\,dr' + U_{xc}(r)\Big\}\phi_j = \varepsilon_j\phi_j$$

where the so-called exchange-correlation potential $U_{xc}(r) = \delta E_{xc}[\rho]/\delta\rho(r)$ could be obtained by functional
differentiation if the exchange-correlation energy functional $E_{xc}[\rho]$ were known. KS also showed that the KS
orbitals $\{\phi_j\}$ could be used to compute the density $\rho$ by simply adding up the orbital densities multiplied by
orbital occupancies $n_j$:

$$\rho(r) = \sum_j n_j|\phi_j(r)|^2.$$

Here $n_j = 0$, 1 or 2 is the occupation number of the orbital $\phi_j$ in the state being studied. The kinetic energy
should be calculated as

$$T = \sum_j n_j\langle\phi_j|-(\hbar^2/2m_e)\nabla^2|\phi_j\rangle.$$

The same investigations of the idealized 'uniform electron gas' that identified the Dirac exchange functional
found that the correlation energy (per electron) could also be written exactly as a function of the electron
density $\rho$ of the system, but only in two limiting cases: the high-density limit (large $\rho$) and the low-density
limit. There still exists no exact expression for the correlation energy even for the uniform electron gas that is
valid at arbitrary values of $\rho$. Therefore, much work has been devoted to creating efficient and accurate
interpolation formulae connecting the low- and high-density uniform electron gas expressions (see appendix E
in [67] for further details). One such expression is

$$E_C[\rho] = \int \rho(r)\, \varepsilon_c(\rho)\, dr$$

where

$$\varepsilon_c(\rho) = \frac{A}{2}\left\{ \ln\frac{x^2}{X} + \frac{2b}{Q}\tan^{-1}\frac{Q}{2x+b} - \frac{b x_0}{X_0}\left[ \ln\frac{(x-x_0)^2}{X} + \frac{2(b+2x_0)}{Q}\tan^{-1}\frac{Q}{2x+b} \right] \right\}$$

is the correlation energy per electron. Here $x = r_s^{1/2}$, $X = x^2 + bx + c$, $X_0 = x_0^2 + bx_0 + c$ and $Q = (4c - b^2)^{1/2}$,
$A = 0.0621814$, $x_0 = -0.409286$, $b = 13.0720$, and $c = 42.7198$. The parameter $r_s$ is how the density $\rho$ enters since
$\frac{4}{3}\pi r_s^3$ is equal to $1/\rho$; that is, $r_s$ is the radius of a sphere whose volume is the effective volume occupied by one
electron. A reasonable approximation to the full $E_{xc}[\rho]$ would contain the Dirac (and perhaps gradient-corrected)
exchange functional plus the above $E_C[\rho]$, but there are many alternative approximations to the
exchange-correlation energy functional [68]. Currently, many workers are doing their best to 'cook up'
functionals for the correlation and exchange energies, but no one has yet invented functionals that are so
reliable that most workers agree to use them.
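
The interpolation formula can be coded in a few lines. The sketch below assumes the Vosko-Wilk-Nusair form of $\varepsilon_c$ with the parameters $A$, $x_0$, $b$ and $c$ quoted above; the function and variable names are hypothetical, and the result is in hartree per electron.

```python
import math

# Parameters quoted in the text.
A, X0, B, C = 0.0621814, -0.409286, 13.0720, 42.7198
Q = math.sqrt(4.0 * C - B * B)

def big_x(x):
    """X(x) = x^2 + b x + c."""
    return x * x + B * x + C

def eps_c(rho):
    """Correlation energy per electron for a uniform gas of density rho."""
    r_s = (3.0 / (4.0 * math.pi * rho)) ** (1.0 / 3.0)   # (4/3) pi r_s^3 = 1/rho
    x = math.sqrt(r_s)
    atan_term = math.atan(Q / (2.0 * x + B))
    return 0.5 * A * (
        math.log(x * x / big_x(x))
        + (2.0 * B / Q) * atan_term
        - (B * X0 / big_x(X0)) * (
            math.log((x - X0) ** 2 / big_x(x))
            + (2.0 * (B + 2.0 * X0) / Q) * atan_term
        )
    )

def density_from_rs(r_s):
    return 3.0 / (4.0 * math.pi * r_s ** 3)

for r_s in (1.0, 5.0, 10.0):
    print(f"r_s = {r_s:4.1f}  eps_c = {eps_c(density_from_rs(r_s)):.4f}")
```

The correlation energy per electron is negative and shrinks in magnitude as the gas becomes more dilute (larger $r_s$), which is the qualitative behaviour the interpolation is designed to capture.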

To summarize, in implementing any DFT, one usually proceeds as follows.

(1) An AO basis is chosen in terms of which the KS orbitals are to be expanded.

(2) Some initial guess is made for the LCAO-KS expansion coefficients $C_{j,a}$: $\phi_j = \sum_a C_{j,a}\chi_a$.

(3) The density is computed as $\rho(r) = \sum_j n_j |\phi_j(r)|^2$. Often, $\rho(r)$ is expanded in an AO basis, which need not
be the same as the basis used for the $\phi_j$, and the expansion coefficients of $\rho$ are computed in terms of
those of the $\phi_j$. It is also common to use an AO basis to expand $\rho^{1/3}(r)$ which, together with $\rho$, is needed
to evaluate the exchange-correlation functional's contribution to $E_0$.

(4) The current iteration's density is used in the KS equations to determine the Hamiltonian
$\{-\frac{1}{2}\nabla^2 + V(r) + \frac{e^2}{2}\int \rho(r')/|r - r'|\, dr' + U_{xc}(r)\}$ whose 'new' eigenfunctions $\{\phi_j\}$ and eigenvalues $\{\varepsilon_j\}$ are found by
solving the KS equations.

(5) These new $\phi_j$ are used to compute a new density, which, in turn, is used to solve a new set of KS
equations. This process is continued until convergence is reached (i.e. until the $\phi_j$ used to determine the
current iteration's $\rho$ are the same $\phi_j$ that arise as solutions on the next iteration).

(6) Once the converged $\rho(r)$ is determined, the energy can be computed using the earlier expression

$$E[\rho] = \sum_j n_j \left\langle \phi_j(r) \left| -\frac{\hbar^2}{2m_e}\nabla^2 \right| \phi_j(r) \right\rangle + \int V(r)\rho(r)\, dr + \frac{e^2}{2}\int\!\!\int \frac{\rho(r)\rho(r')}{|r - r'|}\, dr\, dr' + E_{xc}[\rho].$$
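
The iterative structure of steps (2)-(5) can be sketched with a deliberately simplified toy model. In the code below the basis is assumed orthonormal, the matrix `h0` stands in for the kinetic and external-potential terms, and a diagonal term `u * rho` stands in for the density-dependent Coulomb and exchange-correlation potentials. The damped density update (`mix`) is a common convergence aid added here as an assumption; none of these choices comes from the text, and the sketch is not a real KS-DFT implementation.

```python
import numpy as np

def scf(h0, u, n_occ, mix=0.5, tol=1e-10, max_iter=500):
    """Toy self-consistent loop in an orthonormal basis.

    h0    : fixed one-electron matrix
    u     : strength of the density-dependent diagonal potential
    n_occ : number of doubly occupied orbitals
    """
    n = h0.shape[0]
    rho = np.full(n, 2.0 * n_occ / n)               # step 2: initial guess
    for iteration in range(max_iter):
        h = h0 + np.diag(u * rho)                   # step 4: build the Hamiltonian
        eps, c = np.linalg.eigh(h)                  #         and solve for orbitals
        occupied = c[:, :n_occ]
        rho_new = 2.0 * np.sum(occupied ** 2, axis=1)   # steps 3/5: new density
        if np.max(np.abs(rho_new - rho)) < tol:     # same density in and out
            return eps, c, rho_new, iteration
        rho = (1.0 - mix) * rho + mix * rho_new     # damped update
    raise RuntimeError("SCF did not converge")

# Two-site example with unequal site energies (illustrative numbers).
h0 = np.array([[0.0, -1.0],
               [-1.0, 0.5]])
eps, c, rho, n_iter = scf(h0, u=1.0, n_occ=1)
print("converged density:", rho, "after", n_iter, "iterations")
```

Convergence is declared when the density used to build the Hamiltonian matches the density produced by its eigenvectors, mirroring the criterion stated in step (5).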


In closing this section, it should once again be emphasized that this area is currently undergoing explosive 
growth and much scrutiny [70]. As a result, it is nearly certain that many of the specific functionals discussed
above will be replaced in the near future by improved and more rigorously justified versions. It is also likely 
that extensions of DFTs to excited states (many workers are actively pursuing this) will be placed on more 
solid ground and made applicable to molecular systems. Because the computational effort involved in these 
approaches scales much less strongly [ 71 ] with the basis set size than for conventional (MCSCF, CI, etc) 
methods, density functional methods offer great promise and are likely to contribute much to quantum 
chemistry in the next decade. 




(F) EFFICIENT AND WIDELY DISTRIBUTED COMPUTER PROGRAMS EXIST FOR CARRYING OUT ELECTRONIC 
STRUCTURE CALCULATIONS 

The development of electronic structure theory has been ongoing since the 1940s. At first, only a few 
scientists had access to computers, and they began to develop numerical methods for solving the requisite 
equations (e.g. the HF equations for orbitals and orbital energies, the configuration interaction equations for 
electronic state energies and wavefunctions). By the late 1960s, several research groups had developed 
reasonably efficient computer codes (written primarily in Fortran with selected subroutines that needed to be 
written especially efficiently in machine language), and the explosive expansion of this discipline was 
underway. By the 1980s and through the 1990s, these electronic structure programs began to be used by 
practicing 'bench chemists' both because they became easier to use and because their efficiency and the 
computers' speed grew to the point where modest to large molecules could be studied. 

Web page links [ 72 ] to many of the more widely used programs offer convenient access. At present, more 
electronic structure calculations are performed by non-theorists than by practicing theoretical chemists, 
largely because of the proliferation of such programs. This does not mean that all that needs to be done in 
electronic structure theory is done. The rates at which improvements are being made in the numerical 
algorithms used to solve the problems as well as at which new models are being created remain as high as 
ever. For example, Professor Rich Friesner [73] has developed and Professor Emily Carter [74] has
implemented, for correlated methods, a highly efficient way to replace the list of two-electron integrals
$\langle \phi_i\phi_j | 1/r_{1,2} | \phi_k\phi_l \rangle$, which number $N^4$, where $N$ is the number of AO basis functions, by a much smaller list
$\langle \phi_i\phi_j | g \rangle$ from which the original integrals can be rewritten as

$$\langle \phi_i\phi_j | 1/r_{1,2} | \phi_k\phi_l \rangle = \sum_g \phi_k(g)\phi_l(g) \int dr\, \frac{\phi_i(r)\phi_j(r)}{|r - g|}.$$


This tool, which they call pseudospectral methods, promises to reduce the CPU, memory and disk storage 
requirements for many electronic structure calculations, thus permitting their application to much larger 
molecular systems. In addition to ongoing developments in the underlying theory and computer
implementation, the range of phenomena and the kinds of physical properties that one needs electronic
structure theory to address is growing rapidly. There is every reason to believe that this sub-discipline of 
theoretical chemistry is continuing to blossom. 

B3.1.6.2 COMPUTATIONAL REQUIREMENTS, STRENGTHS AND WEAKNESSES OF VARIOUS METHODS 

(A) COMPUTATIONAL STEPS 

Essentially all of the techniques discussed above require the evaluation of one- and two-electron integrals over
the $N$ AO basis functions: $\langle\chi_a|f|\chi_b\rangle$ and $\langle\chi_a\chi_b|g|\chi_c\chi_d\rangle$. As mentioned earlier, there are of the order of $N^4/8$
such two-electron integrals that must be computed (and perhaps stored on disk); their computation and storage
are a major consideration in performing conventional ab initio calculations. Much current research is being
devoted to reducing the number of such integrals that must be evaluated using either the pseudo-spectral
methods discussed earlier or methods that approximate
integrals between product distributions (one such distribution is $\chi_a\chi_c$ and another is $\chi_b\chi_d$ when the integral
$\langle\chi_a\chi_b|g|\chi_c\chi_d\rangle$ is treated) whenever the distributions involve orbitals on sites that are distant from one another.

Another step that is common to most, if not all, approaches that compute orbitals of one form or another is the
solution of matrix eigenvalue problems of the form

$$\sum_\nu F_{\mu,\nu} C_{i,\nu} = \varepsilon_i \sum_\nu S_{\mu,\nu} C_{i,\nu}.$$

The solution of any such eigenvalue problem requires a number of computer operations that scales as the
dimension of the $F_{\mu,\nu}$ matrix to the third power. Since the indices on the $F_{\mu,\nu}$ matrix label AOs, this means
that the task of finding all eigenvalues and eigenvectors scales as the cube of the number of AOs ($N^3$).
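
One common way to solve a generalized eigenvalue problem of this kind is to orthogonalize the basis first. The sketch below uses symmetric (Löwdin) orthogonalization with NumPy; the small matrices `f` and `s` are made-up numbers for illustration, and the function name is hypothetical. In the returned array, column $i$ holds the coefficients $C_{i,\nu}$ of orbital $i$.

```python
import numpy as np

def solve_generalized(f, s):
    """Solve F C = S C eps via symmetric orthogonalization, X = S^(-1/2)."""
    s_val, s_vec = np.linalg.eigh(s)
    x = s_vec @ np.diag(s_val ** -0.5) @ s_vec.T
    # Ordinary symmetric eigenproblem in the orthogonalized basis, O(N^3) work.
    eps, c_prime = np.linalg.eigh(x.T @ f @ x)
    return eps, x @ c_prime          # back-transform to the original AO basis

f = np.array([[-1.0, -0.5],
              [-0.5, -0.2]])         # illustrative Fock-like matrix
s = np.array([[1.0, 0.3],
              [0.3, 1.0]])           # illustrative overlap matrix
eps, c = solve_generalized(f, s)
print("orbital energies:", eps)
```

Both the diagonalization of $S$ and that of the transformed matrix cost of the order of $N^3$ operations, consistent with the scaling stated above.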

The DFT approaches involve basis expansions of orbitals $\phi_j = \sum_\nu C_{j,\nu}\chi_\nu$ and of the density $\rho$ (or various
fractional powers of $\rho$), which is a quadratic function of the orbitals ($\rho = \sum_j n_j|\phi_j|^2$). These steps require
computational effort scaling only as $N^2$, which is one of the most important advantages of these schemes. No
cumbersome large CSF expansion and associated large secular eigenvalue problem arise, which is another
advantage.

The more conventional quantum chemistry methods provide their working equations and energy expressions
in terms of one- and two-electron integrals over the final MOs: $\langle\phi_i|f|\phi_j\rangle$ and $\langle\phi_i\phi_j|g|\phi_k\phi_l\rangle$. The MO-based
integrals can only be evaluated by transforming the AO-based integrals [55]. Clearly, the $N^5$ scaling of the
integral transformation process makes it an even more time-consuming step than the ($N^4$) atomic integral
evaluation and a severe bottleneck to applying ab initio methods to larger systems. Much effort has been
devoted to expressing the working equations of various correlated methods in a manner that does not involve
the fully-transformed MO-based integrals.
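
The origin of the $N^5$ cost can be made concrete: transforming one index at a time requires four steps, each a contraction over one AO index for every value of four other indices. The sketch below compares this stepwise scheme with a direct contraction (which would cost of the order of $N^8$). The random arrays merely stand in for AO integrals and MO coefficients; they are assumptions for the demonstration, not real integrals.

```python
import numpy as np

def transform_stepwise(ao, c):
    """Four successive one-index transformations, each ~N^5 operations."""
    tmp = np.einsum("abcd,ai->ibcd", ao, c)
    tmp = np.einsum("ibcd,bj->ijcd", tmp, c)
    tmp = np.einsum("ijcd,ck->ijkd", tmp, c)
    return np.einsum("ijkd,dl->ijkl", tmp, c)

def transform_direct(ao, c):
    """Single eight-index contraction (~N^8 operations), for comparison only."""
    return np.einsum("abcd,ai,bj,ck,dl->ijkl", ao, c, c, c, c)

rng = np.random.default_rng(0)
n = 4
ao = rng.standard_normal((n, n, n, n))   # stand-in for AO two-electron integrals
c = rng.standard_normal((n, n))          # stand-in for MO coefficients

mo_fast = transform_stepwise(ao, c)
mo_slow = transform_direct(ao, c)
print("max difference:", np.max(np.abs(mo_fast - mo_slow)))
```

Both routes give the same transformed array; only the operation count differs.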

Once the requisite one- and two-electron integrals are available in the MO basis, the multiconfigurational
wavefunction and energy calculation can begin. Each of these methods has its own approach to describing the
configurations $\{\Phi_J\}$ included in the calculation and how the $\{C_J\}$ amplitudes and the total energy $E$ are to be
determined.

The number of configurations ($N_C$) varies greatly among the methods and is an important factor to keep in
mind. Under certain circumstances (e.g. when studying reactions where an avoided crossing of two
configurations produces an activation barrier), it may be essential to use more than one electronic
configuration. Sometimes, one configuration (e.g. the SCF model) is adequate to capture the qualitative
essence of the electronic structure. In all cases, many configurations will be needed if a highly accurate
treatment of electron-electron correlations is desired.

The value of $N_C$ determines how much computer time and memory is needed to solve the $N_C$-dimensional
$\sum_J H_{I,J}C_J = E C_I$ secular problem in the CI and MCSCF methods. Solution of these matrix eigenvalue equations
requires computer time that scales as $N_C^2$ (if few eigenvalues are computed) to $N_C^3$ (if most eigenvalues are
obtained).

So-called complete active space (CAS) methods form all CSFs that
can be created by distributing $N$ valence electrons among $P$ valence orbitals. For example, the eight non-core
electrons of H2O might be distributed, in a manner that gives $M_S = 0$, among six valence orbitals (e.g. two
lone-pair orbitals, two OH $\sigma$-bonding orbitals and two OH $\sigma^*$-antibonding orbitals). The number of
configurations thereby created is 225. If the same eight electrons were distributed among ten valence orbitals,
44 100 configurations result; for 20 and 30 valence orbitals, 23 474 025 and 751 034 025 configurations arise,
respectively. Clearly, practical considerations dictate that CAS-based approaches be limited to situations in
which a few electrons are to be correlated using a few valence orbitals.
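
The configuration counts quoted above can be reproduced by a short calculation, assuming they are obtained by placing four spin-up and four spin-down electrons independently into the $P$ orbitals (that is, by counting $M_S = 0$ determinants). The function name is hypothetical.

```python
from math import comb

def n_configurations(n_electrons, n_orbitals):
    """Count M_S = 0 distributions: alpha and beta occupations chosen independently."""
    n_alpha = n_beta = n_electrons // 2
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)

counts = {p: n_configurations(8, p) for p in (6, 10, 20, 30)}
for p, count in counts.items():
    print(f"{p:2d} orbitals: {count:,} configurations")
```

The count grows roughly as $P^8$ for eight electrons, which is why the active space must be kept small.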

(B) VARIATIONAL METHODS PROVIDE UPPER BOUNDS TO ENERGIES 

Methods that are based on making the functional $\langle\Psi|H|\Psi\rangle/\langle\Psi|\Psi\rangle$ stationary yield upper bounds to the
lowest energy state having the symmetry of the CSFs in $\Psi$. The CI and MCSCF methods are of this type.
They also provide approximate excited-state energies and wavefunctions in the form of other solutions of the
secular equation [75] $\sum_J H_{I,J}C_J = E C_I$. Excited-state energies obtained in this manner obey the so-called
'bracketing theorem'; that is, between any two approximate energies obtained in the variational calculation,
there exists at least one true eigenvalue. These are strong attributes of the variational methods, as is the long
and rich history of developments of analytical and computational tools for efficiently implementing such
methods.

(C) VARIATIONAL METHODS ARE NOT SIZE-EXTENSIVE 

Variational techniques suffer from a serious drawback, however: they are not necessarily size extensive [76]. 
The energy computed using these tools cannot be trusted to scale with the size of the system. For example, a 
calculation performed on two CH3 species at large separation may not yield an energy equal to twice the
energy obtained by performing the same kind of calculation on a single CH3 species. Lack of size extensivity
precludes these methods from use in extended systems (e.g. polymers and solids) where errors due to 
improper size scaling of the energy produce nonsensical results. 
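
A standard textbook model makes the failure explicit. Assume each of $n$ non-interacting monomers contributes one doubly excited configuration lying $2\Delta$ above the reference and coupled to it by a matrix element $K$; a CI limited to double excitations then has an $(n+1)$-dimensional matrix whose lowest eigenvalue is $\Delta - (\Delta^2 + nK^2)^{1/2}$. The model and the numerical values of $\Delta$ and $K$ below are illustrative assumptions, not data from the text.

```python
import numpy as np

def cid_energy(n_monomers, delta=0.5, k=0.1):
    """Lowest eigenvalue of the doubles-only CI matrix for n non-interacting monomers.

    Index 0 is the reference (energy 0); index m >= 1 is the configuration in
    which monomer m is doubly excited (energy 2*delta, coupling k to the reference).
    """
    h = np.zeros((n_monomers + 1, n_monomers + 1))
    for m in range(1, n_monomers + 1):
        h[m, m] = 2.0 * delta
        h[0, m] = h[m, 0] = k
    return np.linalg.eigvalsh(h)[0]

e_one = cid_energy(1)
e_two = cid_energy(2)
print(f"E(1 monomer)     = {e_one:.6f}")
print(f"E(2 monomers)    = {e_two:.6f}")
print(f"2 x E(1 monomer) = {2 * e_one:.6f}")
```

The truncated CI energy of the pair is higher than twice the monomer energy because the configuration with both monomers excited simultaneously is missing from the expansion.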

By carefully adjusting the variational wavefunction used, it is possible to circumvent size-extensivity
problems for selected species. For example, the CI calculation on Be2 using all $^1\Sigma_g^+$ CSFs formed by placing
the four valence electrons into the $2\sigma_g$, $2\sigma_u$, $3\sigma_g$, $3\sigma_u$, $1\pi_u$, and $1\pi_g$ orbitals can yield an energy equal to twice
that of the Be atom described by CSFs in which the two valence electrons of the Be atom are placed into the
2s and 2p orbitals in all ways consistent with a $^1S$ symmetry. Such CAS-space MCSCF or CI calculations [77]
are size extensive, but it is impractical to extend such an approach to larger systems.

(D) MOST PERTURBATION AND CC METHODS ARE SIZE-EXTENSIVE, BUT DO NOT PROVIDE UPPER BOUNDS AND 
THEY ASSUME THAT ONE CSF DOMINATES 

In contrast to variational methods, perturbation theory and CC methods achieve their energies by projecting
the Schrödinger equation against a reference function $\langle\Phi|$ to obtain [78] a transition formula $\langle\Phi|H|\Psi\rangle$ rather
than from an expectation value $\langle\Psi|H|\Psi\rangle$. It can be shown that this difference allows non-variational
techniques to yield size-extensive energies.
This can be seen by considering the second-order MPPT energy of two non-interacting Be atoms. The
reference CSF is $\Phi = |1s_a^2 2s_a^2 1s_b^2 2s_b^2|$; as discussed earlier, only doubly-excited CSFs contribute to the
correlation energy through second order. These 'excitations' can involve atom a, atom b, or both atoms.
However, CSFs that involve excitations on both atoms (e.g. $|1s_a^2 2s_a 2p_a 1s_b^2 2s_b 2p_b|$) give rise to one- and two-
electron integrals over orbitals on both atoms (e.g. $\langle 2s_a 2p_a|g|2s_b 2p_b\rangle$) that vanish if the atoms are far apart, so
contributions due to such CSFs vanish. Hence, only CSFs that are excited on one or the other atom contribute
to the energy. This, in turn, results in a second-order energy that is additive as required by any size-extensive
method. In general, a method will be size extensive if its energy formula is additive and the equations that
determine the $C_J$ amplitudes are themselves separable. The MPPT/MBPT and CC methods possess these
characteristics.
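
The additivity argument can be mimicked with a toy sum-over-states calculation. The sketch assumes each monomer has one excited configuration at energy $2\Delta$ with coupling $K$ to the reference, and that any configuration involving more than one monomer has zero coupling to the reference (the analogue of the vanishing two-centre integrals). The model, parameter values and function name are illustrative assumptions.

```python
from itertools import product

def second_order_energy(n_monomers, delta=0.5, k=0.1):
    """Sum of -|coupling|^2 / (excitation energy) over all excited configurations."""
    total = 0.0
    for pattern in product((0, 1), repeat=n_monomers):
        n_excited = sum(pattern)
        if n_excited == 0:
            continue                      # skip the reference itself
        # Only configurations excited on a single monomer couple to the reference.
        coupling = k if n_excited == 1 else 0.0
        total += -coupling ** 2 / (2.0 * delta * n_excited)
    return total

e2_one = second_order_energy(1)
e2_three = second_order_energy(3)
print(f"E2(1 monomer) = {e2_one:.4f}, E2(3 monomers) = {e2_three:.4f}")
```

Because the energy is a plain sum of independent single-monomer terms, it scales exactly linearly with the number of monomers.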

However, size-extensive methods have two serious weaknesses. Their energies do not provide upper bounds 
to the true energies of the system (because their energy functional is not of the expectation-value form for
which the upper bound property has been proven). Moreover, they express the correct wavefunction in terms 
of corrections to a (presumed dominant) reference function which is usually taken to be a single CSF 
(although efforts have been made to extend the MPPT/MBPT and CC methods to allow for 
multiconfigurational reference functions, this is not yet standard practice). For situations in which two CSFs 
'cross' along a reaction path, the single-dominant-CSF assumption breaks down, and these methods can have 
difficulty. 


B3.1.7 THERE ARE METHODS THAT CALCULATE ENERGY 
DIFFERENCES RATHER THAN ENERGIES 

In addition to the myriad of methods discussed above for treating the energies and wavefunctions as solutions
to the electronic Schrödinger equation, there exists a family of tools that allow one to compute energy
differences 'directly' rather than by first finding the energies of pairs of states and subsequently subtracting
them. Various energy differences can be so computed: differences between two electronic states of the same
molecule (i.e. electronic excitation energies $\Delta E$), differences between energy states of a molecule and the
cation or anion formed by removing or adding an electron (i.e. IPs and EAs).

Because of space limitations, we will not be able to elaborate much further on these methods. However, it is 
important to stress that: 

(1) these so-called Green's function or propagator methods [71] utilize essentially the same input
information (e.g. AO basis sets) and perform many of the same computational steps (e.g. evaluation of
one- and two-electron integrals, formation of a set of mean-field MOs, transformation of integrals to the
MO basis, etc) as do the other techniques discussed earlier;
(2) these methods are now rather routinely used when $\Delta E$, IP, or EA information is sought. In fact, the 1998
version of the Gaussian program includes an electron propagator option.

The basic ideas underlying most, if not all, of the energy-difference methods are as follows.

(1) One forms a reference wavefunction $\Psi$ (this can be of the SCF, MPn, CC, etc variety); the energy
differences are computed relative to the energy of this function.

(2) One expresses the final-state wavefunction $\Psi'$ (i.e. describing the excited, cation, or anion state) in terms
of an operator $\Omega$ acting on the reference $\Psi$: $\Psi' = \Omega\Psi$. Clearly, the $\Omega$ operator must be one that removes
or adds an electron when one is attempting to compute IPs or EAs, respectively.

(3) One writes equations which $\Psi$ and $\Psi'$ are expected to obey. For example, in the early development of
these methods [80], the Schrödinger equation itself was assumed to be obeyed, so $H\Psi = E\Psi$ and $H'\Psi' = E'\Psi'$
are the two equations (note that, in the IP and EA cases, the latter equation, and the associated
Hamiltonian $H'$, refer to one fewer and one more electrons than does the reference equation $H\Psi = E\Psi$).

(4) One combines $\Omega\Psi = \Psi'$ with the equations that $\Psi$ and $\Psi'$ obey to obtain an equation that $\Omega$ must obey.
In the above example, one: (a) uses $\Omega\Psi = \Psi'$ in the Schrödinger equation for $\Psi'$, (b) allows $\Omega$ to act
from the left on the Schrödinger equation for $\Psi$ and (c) subtracts the resulting two equations to achieve
$(H'\Omega - \Omega H)\Psi = (E' - E)\Omega\Psi$ or, in commutator form, $[H, \Omega]\Psi = \Delta E\,\Omega\Psi$. By expressing the
Hamiltonian in the second-quantization form, only one $H$ appears in this final so-called equation of
motion (EOM) $[H, \Omega]\Psi = \Delta E\,\Omega\Psi$ (i.e. in the second-quantized form, $H'$ and $H$ are one and the same).

(5) One can, for example, express $\Psi$ in terms of a superposition of configurations $\Psi = \sum_J C_J\Phi_J$ whose
amplitudes $C_J$ have been determined from an MCSCF, CI or MPn calculation and express $\Omega$ in terms of
second-quantization operators $\{O_K\}$ that cause single-, double-, etc, level excitations (for the IP (EA)
cases, $\Omega$ is given in terms of operators that remove (add), remove and singly excite (add and singly
excite) electrons): $\Omega = \sum_K D_K O_K$.

(6) Substituting the expansions for $\Psi$ and for $\Omega$ into the EOM $[H, \Omega]\Psi = \Delta E\,\Omega\Psi$, and then projecting the
resulting equation on the left against a set of functions (e.g. $\{O_{K'}|\Psi\rangle\}$ or $\{O_{K'}|\Phi_0\rangle\}$, where $\Phi_0$ is the
dominant component of $\Psi$), gives a matrix eigenvalue-eigenvector equation:

$$\sum_K \langle O_{K'}\Psi | [H, O_K]\Psi \rangle D_K = \Delta E \sum_K \langle O_{K'}\Psi | O_K\Psi \rangle D_K$$

to be solved for the $D_K$ operator coefficients and the excitation energies $\Delta E$. Such are the working equations
of the EOM (or Green's function or propagator) methods.
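
The equation of motion in step (4) can be checked numerically on a small matrix model. In the sketch below a 3 x 3 symmetric matrix plays the role of $H$, its two lowest eigenvectors play the roles of $\Psi$ and $\Psi'$, and the operator is taken to be $\Omega = |\Psi'\rangle\langle\Psi|$. These are illustrative assumptions for a number-conserving excitation; the code is not an implementation of any propagator method.

```python
import numpy as np

h = np.array([[0.0, 0.2, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.3, 2.5]])              # model Hamiltonian (made-up numbers)

energies, states = np.linalg.eigh(h)
psi, psi_prime = states[:, 0], states[:, 1]   # reference and final states
omega = np.outer(psi_prime, psi)              # Omega = |Psi'><Psi|
delta_e = energies[1] - energies[0]

lhs = (h @ omega - omega @ h) @ psi           # [H, Omega] Psi
rhs = delta_e * (omega @ psi)                 # Delta E * Omega Psi
print("EOM satisfied:", np.allclose(lhs, rhs))
```

The commutator acting on the reference returns the final state multiplied by the energy difference, so $\Delta E$ appears without either total energy being needed separately.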

In recent years, these methods have been greatly expanded and have reached a degree of reliability where they 
now offer some of the most accurate tools for studying excited and ionized states. In particular, the use of
time-dependent variational principles has allowed a much more rigorous development of equations for
energy differences and nonlinear response properties [81]. In addition, the extension of the EOM theory to
include coupled-cluster reference functions [82] now allows one to compute excitation and ionization energies 
using some of the most accurate ab initio tools. 


B3.1.8 SUMMARY OF AB INITIO METHODS 


At this time, it may not be possible to say which method is preferred for applications where all are practical.
Nor is it possible to assess, in a way that is applicable to most chemical species, the accuracies with which
various methods
predict bond lengths and energies or other properties. However, there are reasons to recommend some
methods over others in specific cases. For example, certain applications require a size-extensive energy (e.g.
extended systems that consist of a large or macroscopic number of units or studies of weak intermolecular
interactions), so MBPT/MPPT-, CC- or CAS-based MCSCF methods are preferred. Moreover, many chemical
reactions and bond-breaking events require two or more 'essential' electronic configurations. For them,
single-configuration-based methods such as conventional CC and MBPT/MPPT should be used only with
caution; MCSCF or CI calculations are preferred. Very large molecules, in which thousands of AO basis
functions are required, may be impossible to treat by methods whose effort scales as $N^4$ or higher; density
functional methods would be the only choice then.

For all calculations, the choice of AO basis set must be made carefully, keeping in mind the $N^4$ scaling of the
two-electron integral evaluation step and the $N^5$ scaling of the two-electron integral transformation step. Of
course, basis functions that describe the essence of the states to be studied are essential (e.g. Rydberg or anion
states require diffuse functions and strained rings require polarization functions).

As larger atomic basis sets are employed, the size of the CSF list used to treat dynamic correlation increases
rapidly. For example, many of the above methods use singly- and doubly-excited CSFs for this purpose. For
large basis sets, the number of such CSFs ($N_C$) scales as the number of electrons squared, $n_e^2$, times the number
of basis functions squared, $N^2$. Since the effort needed to solve the CI secular problem varies as $N_C^2$ or $N_C^3$ (the
latter being to find all eigenvalues and vectors), a dependence as strong as $n_e^4N^4$ to $n_e^6N^6$ can result. To handle such
large CSF spaces, all of the multiconfigurational techniques mentioned in this paper have been developed to
the extent that calculations involving of the order of 100-5000 CSFs are routinely performed and calculations
using even several billion CSFs are possible [53].

Some of the most significant advances that have been made recently in expanding the applicability of the ab
initio methods to larger systems are based on recognizing that many of the two-electron integrals and one- and
two-electron density matrix elements arising in the pertinent working equations vanish if expressed in terms
of localized (atomic or molecular) orbitals. For example, in a polymer consisting of $P$ monomer units (or a
crystal composed of $P$ unit cells), the integrals and density matrix elements indexed by monomer units far
distant from one another are negligible. Thus, if a method whose effort scales as the $k$th power of the number
of AOs ($N$) per monomer (or unit cell) is applied to a system having $P$ units, the effort should not scale as
$(PN)^k$ but, hopefully, as $PN^k$. Indeed, for the DFT ($k = 3$), SCF ($k = 4$) and MP2 ($k = 5$) methods, specialized
techniques [50] have allowed for the implementation of codes scaling linearly (or nearly so for MP2) with the
system 'size' $P$ (i.e. the number of units).

Other methods, most of which can be viewed as derivatives of the techniques introduced above, have been
and are still being developed, stimulated by the explosive growth in computer power and changes in computer
architecture realized in recent years. All indications are that this growth pattern will continue, so ab initio
quantum chemistry is likely to have an even larger impact on future chemistry research and education
(through new insights and concepts). For many of the most commonly employed ab initio quantum chemistry
tools, the computational efforts, as characterized by how they scale with the system size $P$ (i.e. the number of
units), with basis set size $N$ and with the number of electronic configurations $N_C$, as well as their variational
nature and size extensivity are summarized in table B3.1.2.

Table B3.1.2 Properties of commonly used methods.

Method       Variational/size extensive   Computational scaling
HF           Yes/Yes                      $N^4$ integrals; $N^3$ eigenvalues; $P^1$
GVB          Yes/Yes                      $N^4$ integrals; $N^4$ (per electron pair) GVB equations
DFT          No/Yes                       $N^3$ eigenvalues; $N^2$ integrals; $P^1$; $N^3$ orbital orthogonalization; $P^1$
MP2          No/Yes                       $N^5$; $P^2$
CI           Yes/No                       $N^5$ transformed integrals; $N_C^2$ to solve for one CI energy and eigenvector
CISD         Yes/No                       $N^5$ transformed integrals; $n_e^2N^4$ to solve for one CI energy and eigenvector
CAS-MCSCF    Yes/Yes                      $N^5$ transformed integrals; $N_C^2$ to solve for CI energy; many iterations also needed
CCS          No/Yes                       $N^4$
CCSD         No/Yes                       $N^6$
CCSDT        No/Yes                       $N^8$
CCSD(T)      No/Yes                       $N^7$


Figure B3.1.9 [83] displays the errors (in picometres compared to experimental findings) in the equilibrium
bond lengths for a series of 28 molecules obtained at the HF, MP2-4, CCSD, CCSD(T), and CISD levels of
theory using three polarized correlation-consistent basis sets (valence DZ through to QZ).

Figure B3.1.9. Distribution in errors (picometres) in calculated bond lengths for 28 test molecules.

Clearly, the HF method, independent of basis, systematically underestimates the bond lengths over a broad
percentage range. The CISD method is neither systematic nor narrowly distributed in its errors, but the MP2
and MP4 (but not MP3) methods are reasonably accurate and have narrow error distributions if valence TZ or
QZ bases are used. The CCSD(T), but not the CCSD, method can be quite reliable if valence TZ or QZ bases
are used.


In closing this section and this chapter, I wish to remind the reader that my discussion has been limited to ab
initio techniques; that is, to methods that begin with the electronic Schrödinger equation and attempt to solve it
without explicitly introducing any experimental data or any numerical results from another calculation. There
exists a whole family of alternative approaches called semi-empirical methods [84] in which (a) overlaps
between pairs of orbitals distant from one another are neglected, (b) many of the two-electron integrals
appearing in ab initio methods are neglected (because they are 'small' in some sense) and (c) certain
combinations of one- and two-electron integrals that can be (approximately) related to orbital energies of a
constituent atom are not computed explicitly but are replaced by experimental data (or data from an ab initio
calculation) on that atom. Readers interested in these approaches to electronic structure are referred to the
articles given in [84].

B3.2 Quantum structural methods for the solid
state and surfaces

Frank Starrost and Emily A Carter


B3.2.1 INTRODUCTION 

We are entering an era when condensed matter chemistry and physics can be predicted from theory with 
increasing realism and accuracy. This is particularly important in cases where experiments lead to ambiguous 
conclusions, for regimes in which there still exists no experimental probe and for predictions of the properties 
of modern materials in order to select the most promising ones for synthesis and experimental testing. For 
example, continuing miniaturization in microelectronics heightens the importance of understanding of 
quantum effects, which computational materials theory is poised to provide, based to some degree on the 
methods presented here. 

Our intention is to give a brief survey of advanced theoretical methods used to determine the electronic and 
geometric structure of solids and surfaces. The electronic structure encompasses the energies and 
wavefunctions (and other properties derived from them) of the electronic states in solids, while the geometric 
structure refers to the equilibrium atomic positions. Quantities that can be derived from the electronic 
structure calculations include the electronic (electron energies, charge densities), vibrational (phonon spectra), 
structural (lattice constants, equilibrium structures), mechanical (bulk moduli, elastic constants) and optical 
(absorption, transmission) properties of crystals. We will also report on techniques used to study solid 
surfaces, with particular examples drawn from chemisorption on transition metal surfaces. 

In his chapter on the fundamentals of quantum mechanics of condensed phases (A1.3), James R Chelikowsky
introduces the plane wave pseudopotential method. Here, we will complement his chapter by introducing in 
some detail tight-binding methods as the simplest pedagogical illustration of how one can construct crystal 
wavefunctions from atomic-like orbitals. These techniques are very fast but generally not very accurate. After 
reviewing some of the efforts made to improve upon the local density approximation (LDA, explained in 
A1.3), we will discuss general features of the technically more complex all-electron band structure methods, 
focusing on the highly accurate but not very fast linear augmented plane wave (LAPW) technique as an 
example. We will introduce the idea of orbital-free electronic structure methods based directly on density 
functional theory (DFT), the computational effort of which scales linearly with size, allowing very large
systems to be studied. The periodic Hartree-Fock (HF) method and the promising quantum Monte Carlo
(QMC) techniques will be briefly sketched, representing many-particle approaches to the condensed phase 
electronic structure problem. 

In the final section, we will survey the different theoretical approaches for the treatment of adsorbed 
molecules on surfaces, taking the chemisorption on transition metal surfaces, a particularly difficult to treat 
yet extremely relevant surface problem [1], as an example. While solid state approaches such as DFT are 
often used, hybrid methods are also advantageous. Of particular importance in this area is the idea of 
embedding, where a small cluster of surface atoms around the adsorbate is treated with more care than the 
surrounding region. The advantages and disadvantages of the approaches are discussed. 


B3.2.2 TIGHT-BINDING METHODS 

B3.2.2.1 TIGHT BINDING: FROM EMPIRICAL TO SELF-CONSISTENT 

The wavefunction in a solid can be thought to originate from two different limiting cases. One extreme is the 
nearly free electron (NFE) approach. The idea here is that the valence electrons are hardly affected by the 
periodic potential of the atomic cores. Their wavefunctions can then be assumed to be easily described as 
linear combinations of the solutions for free electrons: the plane waves, $\exp(i\mathbf{k}\cdot\mathbf{r})$. The NFE approximation is
particularly useful for so-called NFE metals, such as the alkali metals. At the other extreme, the solid can be
viewed as constructed from individual atoms. The valence wavefunctions of the solid are then approximated
as linear combinations of the wavefunctions of the valence electrons of the atoms (see also section A1.3.5.6).
In this case, the electrons are considered to be 'tightly bound' to the atoms. This is a physically reasonable 
view of covalently bound solids and molecules, where localized chemical bonds are the norm (bulk silicon, 
organic or biomolecules etc). Methods which employ this view of the electrons in the solid are called tight- 
binding (TB) methods. The wavefunctions are generally expanded in atomic orbitals (in a linear combination 
of atomic orbitals (LCAO) formalism) or similarly localized functions. 

An advantage of TB is that generally the number of basis functions linearly combined to give the 
wavefunctions is rather small. The solution of the Schrödinger equation in these bases is then fast because the 
matrices representing the operators are small. Also, the construction of the Hamiltonian matrix elements is 
fast, since generally a number of, sometimes drastic, approximations are made. At the same time, however, 
the small basis set generally limits the quality of the TB results, since the variational freedom for the solution 
of the Schrödinger equation is not as high as in other methods. The approximations of Hamiltonian matrix 
elements often further reduce the quality of the results. 

Today, the term TB method is generally understood to refer to a technique using TB basis functions in which 
the Hamiltonian matrix elements are adjusted to reproduce results from experiments and/or from more 
sophisticated electronic structure methods [2]. Depending on the degree of dependence on external 
parameters, the methods are called empirical or semi-empirical TB. A number of approaches are used for the 
fitting of the TB parameters, generally a tough minimization task with many minima (using genetic 
algorithms has proved quite efficient [3, 4]). It has been noted that 'great care is needed to test the resulting 
model for reasonable behavior outside the range of the fit' [5, 6]. A disadvantage of the empirical methods is 
that it is difficult to distinguish to what extent the parametrization or the method itself is responsible for errors 
in the results. 

Frequent approximations made in TB techniques in the name of achieving a fast method are the use of a 
minimal basis set, the lack of a self-consistent charge density, the fitting of matrix elements of the potential, 
the assumption of an orthogonal basis (i.e. a diagonal overlap matrix), a cut-off radius used in the integration 
to determine matrix elements, and the neglect of matrix elements that require three-centre integrals and 
crystal-field terms. We will now provide more details on these approximations. 

Generally, the following ansatz for the wavefunction is made: 

$$\psi_i(\mathbf{r}) = \sum_{l\alpha} c^{i}_{l\alpha}\,\varphi_{l\alpha}(\mathbf{r}),$$

where $\varphi_{l\alpha}(\mathbf{r}) = \langle\mathbf{r}|\varphi_{l\alpha}\rangle$ represents an atomic orbital of symmetry $\alpha$ (such as s, p$_x$, p$_y$, p$_z$) at atom $l$. 
This yields the generalized eigenvalue problem 

$$H\mathbf{c}^{i} = \varepsilon_i\,S\mathbf{c}^{i} \qquad \text{(B3.2.1)}$$

with the elements of the Hamiltonian matrix $H_{l\alpha,m\beta} = \langle\varphi_{l\alpha}|H|\varphi_{m\beta}\rangle$ and the overlap matrix 
$S_{l\alpha,m\beta} = \langle\varphi_{l\alpha}|\varphi_{m\beta}\rangle$. In the TB approximation, the basis functions are thought to be sufficiently localized such that 
contributions to the Hamiltonian matrix usually are accounted for only up to at most the third or fourth 
neighbour. Frequently a minimal basis set is used, i.e. a single orbital $\varphi_{l\alpha}$ is used per atom and per orbital 
symmetry to expand the wavefunction. 

In orthogonal TB methods, the overlap matrix is assumed to be diagonal, even though the basis functions of 
adjacent sites ordinarily are not orthogonal [6]. Harrison has shown that this approximation can be 
compensated for by adjustments to the Hamiltonian matrix elements (these adjustments are arrived at 
automatically in methods depending on fitting, for example, a DFT band structure) [7]. However, this 
approach reduces the transferability of the TB parameters to other structures [8]. Including the overlap matrix 
brings with it the additional cost of its calculation and solving the generalized eigenvalue problem, see 
equation (B3.2.1), rather than an ordinary eigenvalue problem. 
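
As a minimal sketch of the difference between orthogonal and non-orthogonal TB, consider a one-dimensional chain with one s-like orbital per atom and nearest-neighbour coupling only. For a Bloch state the generalized eigenvalue problem (B3.2.1) then reduces to a 1×1 problem, $E(k) = H(k)/S(k)$. The on-site energy `eps`, hopping `t`, overlap `s` and lattice constant `a` below are illustrative values chosen for this example, not fitted parameters from any of the cited schemes.

```python
import math

def chain_band(k, eps=0.0, t=-1.0, s=0.0, a=1.0):
    """Band energy E(k) of a 1D chain with one orbital per site.

    H(k) = eps + 2 t cos(ka), S(k) = 1 + 2 s cos(ka); s = 0 is orthogonal TB.
    """
    h_k = eps + 2.0 * t * math.cos(k * a)
    s_k = 1.0 + 2.0 * s * math.cos(k * a)
    return h_k / s_k

def band_width(s, n_k=201):
    """Difference between the highest and lowest E(k) on a uniform k grid."""
    ks = [-math.pi + 2.0 * math.pi * i / (n_k - 1) for i in range(n_k)]
    energies = [chain_band(k, s=s) for k in ks]
    return max(energies) - min(energies)

width_orth = band_width(0.0)      # symmetric band about eps
width_nonorth = band_width(0.2)   # overlap pushes the band top up more than the bottom
```

With a non-zero overlap the band is no longer symmetric about the on-site energy, which is one reason why hopping parameters fitted within an orthogonal model absorb overlap effects and transfer less well to other structures.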

One can construct an effective potential, written here in the DFT language (see, for example, equation 
A1.3.38 of A1.3) as 

$$v_{\mathrm{eff}}(\mathbf{r}) = v_{\mathrm{ext}}(\mathbf{r}) + v_{\mathrm{H}}[\rho(\mathbf{r})] + v_{\mathrm{xc}}[\rho(\mathbf{r})]. \qquad \text{(B3.2.2)}$$

To rationalize the 'two-centre approximation', the effective potential is written as 

$$v_{\mathrm{eff}}(\mathbf{r}) = \sum_{l} v_{\mathrm{eff},l}(|\mathbf{r} - \mathbf{R}_l|),$$

where $v_{\mathrm{eff},l}$ is centred on the atom $l$ and vanishes away from the atom, which need not involve any 
approximation. 


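In the calculation of the elements 

$$H_{l\alpha,m\beta} = \Big\langle\varphi_{l\alpha}\Big|\,T + \sum_{n} v_{\mathrm{eff},n}\,\Big|\varphi_{m\beta}\Big\rangle$$

with $T = -\tfrac{1}{2}\nabla^2$ the kinetic energy operator, several types of potential matrix elements can be distinguished [6]: 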


(1) Three-centre terms, i.e. $l \neq m \neq n$. These are frequently neglected, in what is called the two-centre 
approximation, based on the assumed strong localization of the orbitals $\varphi_{l\alpha}(\mathbf{r})$. 

(2) Inter-atomic two-centre matrix elements $\langle\varphi_{l\alpha}|v_{\mathrm{eff},l} + v_{\mathrm{eff},m}|\varphi_{m\beta}\rangle$. These matrix elements represent 
the hopping of electrons from one site to another; they can be described [7] as linear combinations of 
so-called Slater-Koster elements [9]. The coefficients depend only on the orientation of the atoms $l$ 
and $m$ in the crystal. For elementary metals described with s, p, and d basis functions there are ten 
independent Slater-Koster elements. In the traditional formulation, the orientation is neglected and 
the two-centre elements depend only on the distance between the atoms [6]. (In several models [6, 
10], they have been made dependent on the environment of the atoms $l$ and $m$.) These elements are 
generally fitted to reproduce DFT results such as the band structure or the values of DFT matrix 
elements in diatomics. 

(3) Intra-atomic matrix elements, or on-site terms, with $l = m$. Traditionally, the potential contributions 
from other atomic sites, $v_{\mathrm{eff},n}$ with $n \neq l$, so-called crystal-field terms, are neglected [10]. In this case, 
the only non-zero on-site terms have $\alpha = \beta$, since basis functions on the same site are orthogonal 
atomic orbitals. There are methods which include these crystal-field terms [11, 12]. Physically, these 
diagonal elements represent the energy required to place an electron in a specific orbital. In some 
implementations, they are set to the orbital energy values of the neutral free atom [13], guaranteeing 
the correct limit for isolated atoms. However, this approach ignores the potential contributions to the 
diagonal elements due to different environments in a molecule or crystal; these are taken into account 
in other variants of the method [6, 10, 11]. 
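
The way two-centre elements are assembled from Slater-Koster parameters can be sketched for an s, p basis. The relations used are the standard Slater-Koster ones for direction cosines $(l, m, n)$ of the bond: $E_{s,x} = l\,V_{sp\sigma}$, $E_{x,x} = l^2 V_{pp\sigma} + (1 - l^2) V_{pp\pi}$ and $E_{x,y} = lm\,(V_{pp\sigma} - V_{pp\pi})$. The function name and the numerical bond integrals in the usage lines are hypothetical; in a real scheme the bond integrals would be distance-dependent fitted functions.

```python
import math

def sk_sp_block(bond, v_sss, v_sps, v_pps, v_ppp):
    """4x4 two-centre block between (s, px, py, pz) orbitals on two atoms.

    bond: vector from the first atom to the second; only its direction enters here.
    v_sss, v_sps, v_pps, v_ppp: bond integrals (ss-sigma, sp-sigma, pp-sigma, pp-pi).
    """
    length = math.sqrt(sum(c * c for c in bond))
    d = [c / length for c in bond]            # direction cosines (l, m, n)
    block = [[0.0] * 4 for _ in range(4)]
    block[0][0] = v_sss                        # <s|H|s>
    for i in range(3):
        block[0][i + 1] = d[i] * v_sps         # <s|H|p_i>
        block[i + 1][0] = -d[i] * v_sps        # <p_i|H|s>: sign flips because p is odd
        for j in range(3):
            pi_part = v_ppp if i == j else 0.0
            block[i + 1][j + 1] = d[i] * d[j] * (v_pps - v_ppp) + pi_part
    return block

# Illustrative bond integrals (arbitrary units, not fitted values).
along_x = sk_sp_block((1.0, 0.0, 0.0), -1.0, 1.5, 2.0, -0.5)
diagonal = sk_sp_block((1.0, 1.0, 0.0), -1.0, 1.5, 2.0, -0.5)
```

For a bond along x the p$_x$-p$_x$ element is pure $\sigma$ and the p$_y$-p$_y$ element pure $\pi$; for a bond along the (110) direction the two mix and an off-diagonal p$_x$-p$_y$ element appears.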

Most TB approaches are not charge self-consistent. This means that they do not ensure that the charge 
derived from the wavefunctions yields the effective potential v eff assumed in their calculation. Some 
methods have been developed which yield charge densities consistent with the electronic potential [14, 15 
and 16]. 
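
What charge self-consistency means in practice can be illustrated with a toy model, offered as a sketch only: two sites, two electrons in the lower eigenstate, and on-site energies that are shifted in proportion to the excess charge on each site. The shift parameter `u`, the mixing factor `mix` and all numerical values are assumptions made for this example and do not correspond to any of the cited self-consistent TB schemes.

```python
import math

def scc_two_site(e1=-1.0, e2=0.0, t=-1.0, u=2.0, mix=0.3, tol=1e-10, max_iter=500):
    """Self-consistent occupation n1 of site 1 for two electrons on two sites."""
    n1 = 1.0                                   # start from neutral sites
    for iteration in range(max_iter):
        n2 = 2.0 - n1
        # On-site energies shifted by the charge accumulated on each site.
        h11 = e1 + u * (n1 - 1.0)
        h22 = e2 + u * (n2 - 1.0)
        # Lowest eigenvector of [[h11, t], [t, h22]], doubly occupied.
        half_diff = 0.5 * (h11 - h22)
        root = math.sqrt(half_diff ** 2 + t ** 2)
        n1_out = 1.0 - half_diff / root        # equals 2 |c_1|^2
        if abs(n1_out - n1) < tol:
            return n1_out, iteration
        n1 = (1.0 - mix) * n1 + mix * n1_out   # linear mixing of input and output
    raise RuntimeError("charge self-consistency not reached")

n_non_scc, _ = scc_two_site(u=0.0)   # occupation without charge feedback
n_scc, _ = scc_two_site(u=2.0)       # occupation with charge feedback
```

In this toy model the feedback reduces the charge transfer to the more attractive site. The linear mixing is needed because feeding the output occupation straight back in as the next input does not converge for these parameters.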

The localized nature of the atomic basis set makes it possible to implement a linear-scaling TB algorithm, 
i.e. a TB method that scales linearly with the number of electrons simulated [17]. (For more information on 
linear scaling methods, see section B3.2.3.3.) 

The accuracy of most TB schemes is rather low, although some implementations may reach the accuracy of 
more advanced self-consistent LCAO methods (for examples of the latter see [18, 19 and 20]). However, 
the advantages of TB are that it is fast, provides at least approximate electronic properties and can be used 
for quite large systems (e.g., thousands of atoms), unlike some of the more accurate condensed matter 
methods. TB results can also be used as input to determine other properties (e.g., photoemission spectra) for 
which high accuracy is not essential. 

B3.2.2.2 APPLICATIONS OF TIGHT-BINDING METHODS 

TB methods have been widely used to study properties of simple semiconductors such as Si [11] and GaN 
[16]. In the latter study, the effect of dislocations on the electronic structure of GaN was investigated with a 
view toward understanding how dislocations affect the material's optical properties. The large supercell of 
224 atoms led to TB as the method of choice. This particular variant of TB fits TB matrix elements to DFT- 
LDA results and solves self-consistently for atomic charges. It has also been used to predict reaction 
energetics of organic molecules, the structure of large biomolecules and the surface geometry and band 
structure of III-V semiconductors [15]. The TB method is expected to provide qualitatively reasonable 
results for systems where localized atomic charges make sense and hence is not expected to perform as well 
for metallic systems. Despite potential problems of TB for metals, the TB approach has also been used to 
study the phonon spectrum of the transition metal molybdenum [6], the elastic constants, 
vacancies and surfaces of monatomic transition and noble metals and the Hall coefficient of complex 
perovskite crystals [10]. As an example of data available from a TB calculation, a TB variant of extended 
Hückel theory [21, 22] was used to describe the initial states in photoemission from GaN [23]. The parameters 
were fitted to the bulk band structure $E(\mathbf{k})$ (for a definition, see section A1.3.6). As displayed in figure 
(B3.2.1) good agreement is found for the occupied states (negative energies), while larger differences for the 
conduction bands (positive energies) reveal a typical problem of the TB methods: they are far less capable of 
describing the delocalized conduction band states (the same is true for delocalized valence states in a metal, as 
mentioned above). In figure (B3.2.2) we show a series of calculated photoemission spectra compared to 
experimental results [23]. The dispersion of the main peaks as a function of emission angle and photon energy 
agrees reasonably well in theory and experiment. 



Figure B3.2.1. The band structure of hexagonal GaN, calculated using EHT-TB parameters determined by a 
genetic algorithm [23]. The target energies are indicated by crosses. The target band structure has been 
calculated with an ab initio pseudopotential method using a quasiparticle approach to include many-particle 
corrections [194]. 

Figure B3.2.2. A series of photoemission spectra. The angles give the polar angle of electron emission at the 
stated photon energy scanning the surface Brillouin zone from $\bar{\Gamma}$ to $\bar{M}$. Left: A calculation using the 
tight-binding parametrization (given the band structure in figure (B3.2.1)) for the initial states [23]. Right: 
Experimental spectra by Dhesi et al [195]. The difference in binding energies is due to the experimental 
difficulty in determining the Fermi energy [23]. (Experimental figure by Professor K E Smith.) 


B3.2.3 FIRST-PRINCIPLES ELECTRONIC STRUCTURE METHODS 

In this section, we briefly review the basic elements of DFT and the LDA. We then focus on improvements 
suggested to remedy some of the shortcomings of the LDA (see section B3.2.3.1). A wide variety of 
techniques based on DFT have been developed to calculate the electron density. Many approaches do not 
calculate the density directly but rather solve for either a set of single-electron orbitals, or the Green's 
function, from which the density is derived. 

In section B3.2.3.2, we introduce a number of techniques commonly referred to as ab initio all-electron 
electronic structure methods. Ab initio methods, in particular, aim at calculating the energies of electrons and 
their wavefunctions as accurately as possible, introducing as few adjustable parameters as possible. (Empirical 
or semi-empirical methods include the empirical pseudopotential approach (see section A1.3.5.5) and many 
TB techniques (see section B3.2.2).) Within the ab initio band structure approach, two communities exist that 
differ in their treatment of the singular nature of realistic, Coulomb-like crystal potentials. In the 
pseudopotential approach discussed by Chelikowsky in chapter A1.3, the Coulomb singularity ($-Z/r$) of the 
crystal potential is replaced by a smoother function, whereas in the so-called 'all-electron' approach, the 
Coulomb singularity is retained. The pseudopotential transformation limits the range of electron energies 
which can be accessed. However, since the pseudo-wavefunction is much smoother than the all-electron 
wavefunction (which has large oscillations near the nucleus), the pseudopotential allows the use of a plane 
wave basis set, which is comparatively easy to handle. In principle, the all-electron methods have no 
limitation on the energy range of calculations. This is achieved by a sophisticated representation of the 
wavefunction. 


The so-called orbital-free DFT technique, which aims to directly calculate the electron density for which the 
total energy is minimal, is presented as an example of methods whose computational effort scales linearly 
with system size (see section B3.2.3.3). In section B3.2.3.4, we discuss the periodic HF method, an alternative 
approach to DFT that offers a well defined starting point for many-particle corrections. Finally, the two most 
frequently used QMC techniques are described in section B3.2.3.5. 

B3.2.3.1 THE LOCAL DENSITY APPROXIMATION AND BEYOND 

In DFT, the electronic density rather than the wavefunction is the basic variable. Hohenberg and Kohn 
showed [24] that all the observable ground-state properties of a system of interacting electrons moving in an 
external potential $v_{\mathrm{ext}}(\mathbf{r})$ are uniquely dependent on the charge density $\rho(\mathbf{r})$ that minimizes the system's total 
energy. However, there is no known formula to calculate from the density the total energy of many electrons 
moving in a general potential. Hohenberg and Kohn proved that there exists a universal functional of the 
density, called $G[\rho]$, such that the expression 

$$E[\rho] = \int v_{\mathrm{ext}}(\mathbf{r})\,\rho(\mathbf{r})\,\mathrm{d}^3r + \frac{1}{2}\iint \frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\,\mathrm{d}^3r\,\mathrm{d}^3r' + G[\rho] \qquad \text{(B3.2.3)}$$

has as its minimum value the correct ground-state energy associated with $v_{\mathrm{ext}}(\mathbf{r})$. Here, the first term on the 
right-hand side represents the energy due to an external potential, including the electron-nuclear potential, 
while the second term is the classical Coulomb energy of the electronic system. The functional $G[\rho]$ is valid 
for any number of electrons and any external potential, but it is unknown and further steps are necessary to 
approximate it. 
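
The classical Coulomb term in equation (B3.2.3) can be evaluated directly for a simple density. The sketch below does so for a spherical density by cutting it into thin shells, and applies it to the hydrogen 1s density $\rho(r) = e^{-2r}/\pi$ (Hartree atomic units), for which the analytic result is 5/16 Hartree. The shell discretization, grid parameters and function names are choices made for this illustration.

```python
import math

def hartree_energy_spherical(density, r_max=30.0, n_shells=3000):
    """Classical Coulomb energy (Hartree atomic units) of a spherical density.

    The density is cut into thin shells of charge q_i at radius r_i. A shell
    interacts with all charge inside it as q_i Q_inside / r_i, and its own
    self-energy is q_i^2 / (2 r_i).
    """
    h = r_max / n_shells
    energy = 0.0
    charge_inside = 0.0
    for i in range(n_shells):
        r = (i + 0.5) * h                       # midpoint radius of shell i
        q = 4.0 * math.pi * r * r * density(r) * h
        energy += q * charge_inside / r + 0.5 * q * q / r
        charge_inside += q
    return energy, charge_inside

def hydrogen_1s_density(r):
    return math.exp(-2.0 * r) / math.pi

e_hartree, n_electrons = hartree_energy_spherical(hydrogen_1s_density)
```

Note that this energy is non-zero even though the density describes a single electron. This is the self-interaction contained in the classical Coulomb term, which the remaining functional $G[\rho]$ has to compensate.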

Kohn and Sham [25] decompose $G[\rho]$ into the kinetic energy of an analogous set of non-interacting electrons 
with the same density $\rho(\mathbf{r})$ as the interacting system, 

$$T_s[\rho] = \sum_i \langle\psi_i|\,{-\tfrac{1}{2}\nabla^2}\,|\psi_i\rangle$$

(where $\psi_i(\mathbf{r}) = \langle\mathbf{r}|\psi_i\rangle$ is the wavefunction of electron $i$), and the exchange and correlation energy of an 
interacting system with density $\rho(\mathbf{r})$, $E_{\mathrm{xc}}[\rho]$. The functional $E_{\mathrm{xc}}[\rho]$ is not known exactly. Physically, it 
represents all the energy corrections beyond the Hartree term to the independent-particle model, i.e. the 
non-classical many-body effects of exchange and correlation (xc) and the difference between the kinetic energy of 
the interacting electron system $T[\rho]$ and the analogous non-interacting system $T_s[\rho]$. 

In the LDA, the exchange and correlation energy is approximated using the exchange and correlation energy 
of the homogeneous electron gas at the same density (see section A1.3.3.3). The crystal density is obtained by 
solving the single-particle Kohn-Sham equation 

$$\left(-\tfrac{1}{2}\nabla^2 + v_{\mathrm{eff}}(\mathbf{r})\right)\psi_i(\mathbf{r}) = \varepsilon_i\,\psi_i(\mathbf{r}), \qquad \text{(B3.2.4)}$$

for a self-consistent potential $v_{\mathrm{eff}}$, i.e. a potential which is produced by the density $\rho$. In bulk crystal 
calculations, the index $i$ runs over both the Bloch vector $\mathbf{k}$ (see section A1.3.4) and the band index $n$ (in a 
simple crystal, this band could be derived, for example, entirely from s states). The solutions to equation 
(B3.2.4) are often called Kohn-Sham orbitals. The crystal density is then 

$$\rho(\mathbf{r}) = \sum_i \psi_i^{*}(\mathbf{r})\,\psi_i(\mathbf{r}).$$


The eigenenergy $\varepsilon_i$ can be defined as the derivative of the total energy of the many-electron system with 
respect to the occupation number of a specific orbital [26]. In HF theory (where equation (B3.2.4) applies and 
the $v_{\mathrm{eff}}$ contains a non-local exchange operator, see section A1.3.1.2 and chapter B3.1), Koopmans' theorem 
states that the single-particle eigenvalue is the negative of the ionization energy (neglecting the relaxation of 
the electronic system). In contrast, the identification of the highest occupied Kohn-Sham eigenvalue with the 
negative of the ionization energy is a controversial subject [27]. While there is no rigorous connection 
between eigenvalue differences and excitation energies in either HF or DF theory, comparisons of these 
values are common practice (see below for more appropriate methods). Relative differences among occupied 
single-particle energies often agree well with the experiment. Even though DFT only provides a solution for 
the ground state of the electronic system, the energy differences in the lower conduction bands, i.e. low- 
energy excited states, often are represented surprisingly well, too. However, in LDA calculations of 
semiconductors and insulators, almost always the size of the gap between the valence band maximum and the 
conduction band minimum is underestimated, since many-particle effects are incorrectly represented by the 
parametrized exchange-correlation energy (see, for example, [28]). One ad hoc remedy, which works well for 
many systems and which is employed in the examples presented here, is to use what is amusingly referred to 
as a scissor operator, i.e. a rigid shift, to correct the gap size [29, 30]. Typically the shift is set equal to 
the known DFT error in the measured optical band gap. The entire conduction band 
is then shifted rigidly upward by this amount so that the calculated gap matches the experimental one. 
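
The scissor correction amounts to very little code, as the following sketch shows. The band energies are made-up numbers in eV used only to exercise the function, and the data layout (a list of k-points, each holding ascending band energies) is an assumption of this example.

```python
def apply_scissor(bands, n_valence, experimental_gap):
    """Rigidly shift conduction bands so the minimum gap equals experimental_gap.

    bands: list over k-points, each a list of band energies in ascending order.
    n_valence: number of occupied (valence) bands at every k-point.
    """
    vbm = max(e_k[n_valence - 1] for e_k in bands)   # valence band maximum
    cbm = min(e_k[n_valence] for e_k in bands)       # conduction band minimum
    shift = experimental_gap - (cbm - vbm)
    shifted = [
        e_k[:n_valence] + [e + shift for e in e_k[n_valence:]]
        for e_k in bands
    ]
    return shifted, shift

# Made-up band energies (eV) at two k-points, for illustration only.
lda_bands = [[-12.0, -1.0, 0.0, 0.6, 3.1],
             [-11.5, -2.0, -0.8, 1.4, 4.0]]
corrected, shift = apply_scissor(lda_bands, n_valence=3, experimental_gap=1.2)
```

The valence bands are left untouched and the dispersion of the conduction bands is unchanged. Only the gap is corrected, which is why the procedure is ad hoc.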

More advanced techniques take into account quasiparticle corrections to the DFT-LDA eigenvalues. 
Quasiparticles are a way of conceptualizing the elementary excitations in electronic systems. They can be 
determined in band structure calculations that properly include the effects of exchange and correlation. In the 
LDA, these effects are modelled by the exchange-correlation potential $v_{\mathrm{xc}}^{\mathrm{LDA}}$. In order to more accurately 
account for the interaction between a particle and the rest of the system, the notion of a local potential has to 
be generalized and a non-local, complex and energy-dependent exchange-correlation potential has to be 
introduced, referred to as the self-energy operator $\Sigma(\mathbf{r}, \mathbf{r}'; E)$. The self-energy can be expanded in terms of the 
screened Coulomb potential $W$, where $W = \epsilon^{-1}v$ is the Coulomb interaction $v$ screened by the inverse dielectric 
function $\epsilon^{-1}$. In a lowest order expansion in $W$, the self-energy can be approximated as $\Sigma = GW$, giving the 
GW approximation [31]. Here $G$ is the one-electron Green's function describing the propagation of an 
additional electron injected into a system of other electrons (it can also describe the extraction of an electron). 


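To be a bit more explicit (following [32, 33]), the quasiparticle energies and wavefunctions are given by 

$$(T + v_{\mathrm{ext}} + v_{\mathrm{H}})\,\psi_{n\mathbf{k}}(\mathbf{r}) + \int \mathrm{d}\mathbf{r}'\,\Sigma(\mathbf{r}, \mathbf{r}'; E_{n\mathbf{k}})\,\psi_{n\mathbf{k}}(\mathbf{r}') = E_{n\mathbf{k}}\,\psi_{n\mathbf{k}}(\mathbf{r}),$$

where $T$ is the kinetic energy operator, $v_{\mathrm{ext}}$ is the external potential due to the ions, and $v_{\mathrm{H}}$ is the Hartree 
Coulomb interaction. Since the self-energy operator in general is non-Hermitian, the quasiparticle energies 
$E_{n\mathbf{k}}$ are complex in general, and the imaginary part determines the lifetime of the quasiparticle. To first order in $W$, 
the self-energy is then given by 

$$\Sigma(\mathbf{r}, \mathbf{r}'; E) = \frac{i}{2\pi}\int \mathrm{d}\omega\,e^{-i\delta\omega}\,G(\mathbf{r}, \mathbf{r}'; E - \omega)\,W(\mathbf{r}, \mathbf{r}'; \omega),$$

where $\delta$ is a positive infinitesimal and $\omega$ corresponds to an excitation frequency. The inputs are the full 
interacting Green's function, 

$$G(\mathbf{r}, \mathbf{r}'; E) = \sum_{n\mathbf{k}} \frac{\psi_{n\mathbf{k}}(\mathbf{r})\,\psi^{*}_{n\mathbf{k}}(\mathbf{r}')}{E - E_{n\mathbf{k}} - i\delta_{n\mathbf{k}}},$$

where $\delta_{n\mathbf{k}}$ is an infinitesimal, and the dynamically screened Coulomb interaction, 

$$W(\mathbf{r}, \mathbf{r}'; \omega) = \Omega^{-1}\int \mathrm{d}\mathbf{r}''\,\epsilon^{-1}(\mathbf{r}, \mathbf{r}''; \omega)\,v(\mathbf{r}'' - \mathbf{r}'),$$

where $\epsilon^{-1}$ is the inverse dielectric matrix, $v(\mathbf{r}) = 1/|\mathbf{r}|$ and $\Omega$ is the volume of the system. Usually the 
calculations start with the construction of the Green's function and the screened Coulomb potential from 
self-consistent LDA results. The self-energy $\Sigma$ then has to be obtained together with $G$ in a self-consistent 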
procedure. However, due to the severe computational cost of this procedure, it is usually not carried out (see, 
for example, [34]). Instead, it is common practice to construct the self-energy operator non-self-consistently 
using the self-consistent LDA results to determine quasiparticle corrections to the LDA energies, resulting in 
the quasiparticle band structure. The GW approximation has been applied to a wide range of metals, 
semiconductors and insulators, where it has been found to lead to striking improvements in the agreement of 
optical excitation spectra with the experiment (see, for example [32, 35, 36 and 37]). Recent studies also 
found that the GW charge density is close to the experiment for diamond structure semiconductors [38], and 
lifetimes of low-energy electrons in metals have been calculated [39]. 

Another disadvantage of the LDA is that the Hartree Coulomb potential includes interactions of each electron 
with itself, and the spurious term is not cancelled exactly by the LDA self-exchange energy, in contrast to the 
HF method (see A1.3 ), where the self-interaction is cancelled exactly. Perdew and Zunger proposed methods 
to evaluate the self-interaction correction (SIC) for any energy density functional [40]. However, full SIC 
calculations for solids are extremely complicated (see, for example [41, 42 and 43]). As an alternative to the 
very expensive GW calculations, Pollmann et al have developed a pseudopotential built with self-interaction 
and relaxation corrections (SIRC) [44]. 
The pseudopotential is derived from an all-electron SIC-LDA atomic potential. The relaxation correction 
takes into account the relaxation of the electronic system upon the excitation of an electron [44]. The authors 
speculate that ' . . .the ability of the SIRC potential to produce considerably better band structures than DFT- 
LDA may reflect an extra nonlocality in the SIRC pseudopotential, related to the nonlocality or orbital 
dependence in the SIC all-electron potential. In addition, it may mimic some of the energy and the non-local 
space dependence of the self-energy operator occurring in the GW approximation of the electronic many body 
problem' [45]. 
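
The size of the self-interaction error can be made concrete for the simplest case, the hydrogen atom. For a one-electron density, exact exchange cancels the Hartree energy completely, whereas LDA exchange cancels only part of it. The sketch below integrates the spin-unpolarized LDA (Dirac) exchange energy, $E_x = -\tfrac{3}{4}(3/\pi)^{1/3}\int\rho^{4/3}\,\mathrm{d}^3r$, for the 1s density $\rho(r) = e^{-2r}/\pi$ and compares it with the analytic Hartree energy of 5/16 Hartree. Using the spin-unpolarized functional for a single electron is a simplification made here; a spin-polarized treatment gives a different number but the cancellation remains incomplete.

```python
import math

def lda_exchange_energy(density, r_max=30.0, n_points=3000):
    """Spin-unpolarized LDA (Dirac) exchange energy of a spherical density, in Hartree."""
    c_x = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)
    h = r_max / n_points
    total = 0.0
    for i in range(n_points):
        r = (i + 0.5) * h                      # midpoint rule on a radial grid
        total += 4.0 * math.pi * r * r * density(r) ** (4.0 / 3.0) * h
    return -c_x * total

def hydrogen_1s_density(r):
    return math.exp(-2.0 * r) / math.pi

e_hartree_exact = 5.0 / 16.0                   # analytic Hartree energy of the 1s density
e_x_lda = lda_exchange_energy(hydrogen_1s_density)
residual_self_interaction = e_hartree_exact + e_x_lda   # would vanish for exact exchange
```

In this setting roughly a third of the spurious Hartree self-repulsion survives, which illustrates why localized states are described poorly by the LDA.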

The LDA also fails for strongly correlated electronic systems. Examples of such systems are the late 3d 
transition-metal mono-oxides MnO, FeO, CoO, and NiO. Within the local spin density approximation 
(LSDA), the energy gaps calculated for MnO and NiO are too small [46] and, even worse, FeO and CoO are 
predicted to be metallic, whereas experimentally they have been found to be large-gap insulators. While the 
GW approximation yields an energy gap of NiO in reasonable agreement with experiment [47], the 
computational cost of this procedure is very high. The SIC-LDA method reproduces quite well the strong 
localization of the d electrons in transition metal compounds, but the orbital energies obtained by SIC are 
usually in strong disagreement with experimental results (for transition metal oxides, for example, occupied d 
bands are approximately 2 Hartree below the oxygen valence band, a separation not seen in spectroscopic 
data: see, for example, the experimental results in [48]) [49]. An alternative solution to this problem is offered 
by the LDA+U method [49, 50], where LDA encompasses the LSDA. In the LDA+U technique, the electrons 
are divided into two subsystems which are treated separately: the strongly localized (d or f) electrons and the 
delocalized s and p electrons. The latter are treated by standard LDA. The on-site interactions among the 
strongly localized electrons on each atom, however, are taken into account by a term $\tfrac{1}{2}U\sum_{i\neq j} n_i n_j$, where $n_i$ are 
the occupation numbers of the strongly localized orbitals and $U$ is the Coulomb interaction parameter (for 
details on the first-principles calculation of $U$, see [51]). At least for localized d or f states, the LDA+U 
technique may be viewed as an approximation to the GW approximation [49]. Band gaps, valence band 
widths and magnetic moments have been calculated with LDA+U that agree with experiment for a variety of 
transition metal compounds [49, 52], among other applications. 
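
How the on-site term opens a gap can be sketched numerically. The example assumes a commonly quoted simplified form of the correction, which is not spelled out in the text above: the term $\tfrac{1}{2}U\sum_{i\neq j} n_i n_j$ is added and a mean-field double-counting term $U N(N-1)/2$, with $N = \sum_i n_i$, is subtracted. The derivative with respect to $n_i$ then shifts orbital $i$ by $U(\tfrac{1}{2} - n_i)$. The value of `u` and the occupations are illustrative.

```python
def hubbard_term(occupations, u):
    """On-site interaction (1/2) U sum_{i != j} n_i n_j for one atom."""
    total = sum(occupations)
    sum_sq = sum(n * n for n in occupations)
    return 0.5 * u * (total * total - sum_sq)

def lda_plus_u_correction(occupations, u):
    """Hubbard term minus the assumed double-counting term U N (N - 1) / 2."""
    n_tot = sum(occupations)
    return hubbard_term(occupations, u) - 0.5 * u * n_tot * (n_tot - 1.0)

def orbital_shift(occupations, i, u, step=1e-6):
    """Central-difference derivative of the correction with respect to n_i."""
    up = list(occupations)
    up[i] += step
    down = list(occupations)
    down[i] -= step
    return (lda_plus_u_correction(up, u) - lda_plus_u_correction(down, u)) / (2.0 * step)

occ = [1.0, 1.0, 1.0, 0.0, 0.0]          # three filled and two empty localized orbitals
shift_filled = orbital_shift(occ, 0, u=8.0)
shift_empty = orbital_shift(occ, 4, u=8.0)
```

Under these assumptions occupied orbitals move down by $U/2$ and empty ones up by $U/2$, so the two groups are separated by $U$, while the total-energy correction vanishes for integer occupations.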

B3.2.3.2 ALL-ELECTRON DFT METHODS 

(A) INTRODUCTION 

When the highest accuracy is sought for the electronic and geometric properties of crystals, all the electrons of 
the atoms in the crystal and the full Coulomb singularity of the nuclear potential must be accounted for. All- 
electron approaches, which do just that, generally cannot compete with pseudopotential techniques in speed 
and simplicity of algorithm. However, the latter suffer from severe drawbacks when it comes to the 
construction of the very pseudopotentials these methods depend upon: even for so-called ab initio potentials, 
the pseudopotentials are far from uniquely determined. Additionally, problems with transferability and the 
construction of potentials for such elements as the transition metals remain. All-electron techniques can deal 
with any element and there are no worries about transferability of the potential. However, the accuracy comes 
at a price: due to the Coulomb singularity of the potential at the nuclear positions, the wavefunctions are 
highly oscillatory close to the nucleus. For those all-electron methods that use wavefunctions to represent the 
electrons (a Green's function method, for example, does not), this means that a simple plane wave basis set 
cannot be used for the expansion of the wavefunctions. To reach convergence of a plane wave $\exp(i\mathbf{k}\cdot\mathbf{r})$ 
expansion would require a prohibitive number of basis functions. Thus, specialized basis sets have been 
invented for all-electron calculations. 




We now discuss the most important theoretical methods developed thus far: the augmented plane wave 
(APW) and the Korringa-Kohn-Rostoker (KKR) methods, as well as the linear methods: the linear APW 
(LAPW), the linear muffin-tin orbital (LMTO) and the projector-augmented wave (PAW) methods. 

In the early all-electron techniques, the crystal was separated into spheres around the atoms, so-called 
'muffin-tin' spheres, and the interstitial region in between. Inside the spheres, the potential was approximated 
as spherically symmetric, while in the interstitial region it was assumed to be constant. This shape 
approximation of the potential is reasonable for close-packed crystals such as hexagonally close-packed 
metals, where the spheres cover a large fraction of the crystal volume. However, in less densely arranged 
crystals, such as diamond structure semiconductors (see figure A1.3.4), the muffin-tin approximation leads to 
large errors. In the diamond and the related zincblende structures, only 34% of the volume is covered by 
touching muffin-tin spheres (figure (B3.2.3)). For all of the all-electron methods, versions have been 
developed that are not restricted to shape approximations of the potential. These techniques are referred to as 
general, or full, potential methods. 
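
The 34% figure follows from elementary geometry and can be checked with a few lines. The sketch computes the volume fraction filled by touching equal spheres for the diamond (or zincblende) structure and, for comparison, for the fcc and bcc structures. The function name and argument convention are choices made for this example.

```python
import math

def packing_fraction(atoms_per_cubic_cell, nn_distance_over_a):
    """Volume fraction filled by touching equal spheres in a cubic cell of edge a."""
    radius = 0.5 * nn_distance_over_a          # touching spheres: r = d_nn / 2 (units of a)
    return atoms_per_cubic_cell * (4.0 / 3.0) * math.pi * radius ** 3

diamond = packing_fraction(8, math.sqrt(3.0) / 4.0)   # d_nn = sqrt(3) a / 4
fcc = packing_fraction(4, 1.0 / math.sqrt(2.0))       # d_nn = a / sqrt(2)
bcc = packing_fraction(2, math.sqrt(3.0) / 2.0)       # d_nn = sqrt(3) a / 2
```

The diamond value is $\pi\sqrt{3}/16 \approx 0.34$, less than half of the close-packed value of about 0.74 (which also applies to ideal hcp), so most of the crystal volume lies in the interstitial region where the constant-potential assumption is poorest.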

Figure B3.2.3. The muffin-tin spheres in the (110) plane of a zincblende crystal. The nuclei are surrounded by 
spheres of equal size, covering about 34% of the crystal volume. Unoccupied tetrahedral positions are 
indicated by crosses. The conventional unit cell is shown at the bottom; the crystal directions are noted. 

(B) THE AUGMENTED PLANE WAVE METHOD 

The APW technique was proposed by Slater in 1937 [53, 54]. It remains the most accurate of the band 
structure methods for the muffin-tin approximation of the potential. The wavefunction is expanded in basis 
functions $\varphi_i(\mathbf{k} + \mathbf{G}_i, E, \mathbf{r})$, the APWs, each of which is identical to the plane wave $\exp(i(\mathbf{k} + \mathbf{G}_i)\cdot\mathbf{r})$ in the 
interstitial region, where $\mathbf{G}_i$ are the reciprocal lattice vectors (see section A1.3.4). The plane waves are 
augmented, i.e. they are joined continuously at the surface of the spheres by solutions of the radial 
Schrödinger equation. This means that in the spherical harmonic expansion of a plane wave around the centre 
of a muffin-tin sphere, the respective Bessel function inside the sphere is replaced by a solution $u_l(r, E)$ of the 
radial Schrödinger equation for a given energy. The radial function matches the value of the Bessel function 
$j_l(|\mathbf{k} + \mathbf{G}_i|r)$ at the sphere boundary and must be regular (non-singular) at the origin. 
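
The spherical harmonic expansion of a plane wave that underlies the augmentation is the Rayleigh expansion, $e^{ikr\cos\theta} = \sum_l (2l+1)\,i^l\,j_l(kr)\,P_l(\cos\theta)$. The following sketch checks it numerically at one point. The spherical Bessel functions are evaluated from their power series, which is adequate for the moderate arguments used here but not a general-purpose implementation; all names and numerical values are choices made for this illustration.

```python
import cmath
import math

def spherical_bessel_j(l, x, n_terms=40):
    """j_l(x) from its power series; adequate for the moderate x used here."""
    term = 1.0
    for m in range(1, l + 1):
        term *= x / (2 * m + 1)               # builds x^l / (2l + 1)!!
    total = 0.0
    for k in range(n_terms):
        total += term
        term *= -0.5 * x * x / ((k + 1) * (2 * l + 2 * k + 3))
    return total

def legendre_p(l, c):
    """Legendre polynomial P_l(c) by upward recurrence."""
    p_prev, p = 1.0, c
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * c * p - n * p_prev) / (n + 1)
    return p

def rayleigh_sum(k, r, cos_theta, l_max):
    """Partial-wave sum for exp(i k r cos(theta)), truncated at l_max."""
    return sum((2 * l + 1) * (1j ** l) * spherical_bessel_j(l, k * r) * legendre_p(l, cos_theta)
               for l in range(l_max + 1))

exact = cmath.exp(1j * 1.5 * 2.0 * 0.3)                 # k = 1.5, r = 2.0, cos(theta) = 0.3
approx = rayleigh_sum(1.5, 2.0, 0.3, l_max=12)
```

In an APW each $j_l$ inside the sphere would be replaced by a radial solution $u_l(r, E)$ scaled to the same value at the sphere radius. The expansion also shows why only a limited number of $l$ values is needed for a given $kr$.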

With the basis functions $\varphi_i(\mathbf{k} + \mathbf{G}_i, E, \mathbf{r})$, a variational solution is sought to the Kohn-Sham equation, equation 
(B3.2.4). Since the Hamiltonian matrix elements now depend nonlinearly upon the energy due to the 
energy-dependent basis functions, the resulting secular equation is solved by finding the roots of the determinant of 
the $H(E) - E\,S(E)$ matrix. (The problem cannot be treated by the eigenvalue routines of linear algebra.) 

Numerically, the determination of the roots can be difficult because the determinant's value may change by 
several orders of magnitude when the energy E is changed by only a few meV. Another difficulty can result at 
degenerate roots where the value of the determinant does not change sign. Additionally, the secular equation 
becomes singular when a node of the radial solution falls at the muffin-tin sphere boundary (the so-called 
'asymptote problem'). 
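
The difficulty with degenerate roots is easy to reproduce with a toy function. The sketch below searches for roots by scanning an energy grid for sign changes and refining each bracket by bisection. The function `toy_determinant` is a stand-in invented for this example, with a simple root at 1.0 and a double root at 2.0; it is not an actual APW secular determinant.

```python
def bracket_roots(f, e_min, e_max, n_grid):
    """Return (lo, hi) intervals on a uniform grid where f changes sign."""
    step = (e_max - e_min) / n_grid
    brackets = []
    for i in range(n_grid):
        lo, hi = e_min + i * step, e_min + (i + 1) * step
        if f(lo) * f(hi) < 0.0:
            brackets.append((lo, hi))
    return brackets

def bisect(f, lo, hi, tol=1e-10):
    """Refine a sign-change bracket by bisection."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def toy_determinant(e):
    # Stand-in for det[H(E) - E S(E)]: simple root at 1.0, double root at 2.0.
    return (e - 1.0) * (e - 2.0) ** 2

roots = [bisect(toy_determinant, lo, hi)
         for lo, hi in bracket_roots(toy_determinant, 0.05, 3.05, 30)]
```

Only the simple root is found; the double root at 2.0 is missed because the function touches zero without changing sign. A real determinant additionally varies over orders of magnitude, which makes the choice of grid spacing delicate.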

Physically, the APW basis functions are problematic as they are not smooth at the sphere boundary, i.e., they 
have discontinuous slope. While in a fully converged solution of the secular equation, this discontinuity 
should disappear, alternative methods have been sought instead. Following a suggestion by Marcus [55] in 
1967, the LAPW provided a way to avoid the above-mentioned drawbacks of the APW technique, as we now 
discuss. 


(C) THE LINEAR AUGMENTED PLANE WAVE METHOD 


The main disadvantage of the APW technique is that it leads to a nonlinear secular problem because the basis 
functions depend on the energy. A number of attempts have been made to construct linear versions of the 
APW approach by introducing energy-independent basis functions in different ways. In 1970, Koelling 
invented the alternative APW [56] and Bross the modified APW (MAPW) [57]. In 1975, Andersen constructed the 
LAPW [58] formalism, which today is the most popular APW-like band structure method. Further extensions 
of the linear methods appeared in the early 1990s: Singh developed the LAPW plus localized orbitals 
(LAPW+LO [59]) in 1991 and Krasovskii the extended LAPW (ELAPW [60]) in 1994. Recently the 
APW+LO technique has been implemented by Sjöstedt and Nordström [61] according to an idea by Singh. 
While the LAPW technique is generally used in combination with DFT approaches, it has also been applied 
based on the LDA+U [62] and HF theories [63]. 

The LAPW method, as suggested in 1975 [58, 64], avoids the problem of the energy dependence of the 
Hamiltonian matrix by introducing energy-independent APW basis functions. Here, too, the APWs are 
derived from plane waves by augmentation: Bessel functions $j_l(|\mathbf{k} + \mathbf{G}_i|r)$ in the Rayleigh decomposition 
inside the muffin-tin sphere are replaced by functions $u_l(r)$ derived from the spherical potential, which are 
independent of the energy of the state that is sought and that match the Bessel functions at the sphere radius in 
value and in slope (see figure (B3.2.4)). The plane wave part of the basis remains the same but the 
energy-independent APWs allow the energies and the wavefunctions to be determined by solving a standard 
generalized eigenvalue problem. 

Figure B3.2.4. A schematic illustration of an energy-independent augmented plane wave basis function used 
in the LAPW method. The black sine function represents the plane wave, the localized oscillations represent 
the augmentation of the function inside the atomic spheres used for the solution of the Schrödinger equation. 
The nuclei are represented by filled black circles. In the lower part of the picture, the crystal potential is 
sketched. 


In linearizing the APW problem as it is done in the LAPW method, the variational freedom of the APW basis 
set is reduced. The reason is that the wavefunction inside the spheres is rigidly coupled to its plane wave 
expansion in the interstitial region [65]. This means that the method cannot yield an accurate wavefunction 
even if the eigenvalue is within a few eV of the chosen energy parameters [66]. Flexibility is defined in this 
context as the possibility to change the wavefunction inside the spheres independently from the wavefunction 
in the interstitial region. Flexibility can be achieved in the linear band structure methods by adding basis 
functions localized inside the spheres whose value and slope vanish at the sphere boundary [54, 67, 68]. A 
'flexible' basis set extending the LAPW with localized functions is preferable to the one used in the pure 
LAPW technique. Flexible linear methods are the MAPW, the LAPW+LO and the ELAPW, the latter of 
which provides a necessary degree of flexibility with a minimal number of basis functions [65]. 


The additional functions increase the matrix dimension slightly and thus the computational effort. However, 
the increased flexibility of the basis set makes possible a number of extensions of the LAPW method. One is a 
k·p formulation of the ELAPW method [68], which would lead to large errors in the regular LAPW due to its 
lesser flexibility. The augmented Fourier components (AFC) technique [69] for treating a general potential is 
based on this. The AFC method is an alternative to the full-potential LAPW (FLAPW) method [70, 71]. 
(Recently progress has been made in increasing the computational efficiency of the FLAPW method [72].) 
The AFC method does not have the same demanding convergence criteria as the FLAPW method but yields 
physically equivalent results [69]. 

The general potential LAPW techniques are generally acknowledged to represent the state of the art with 
respect to accuracy in condensed matter electronic-structure calculations (see, for example, [62, 73]). These 
methods can provide the best possible answer within DFT with regard to energies and wavefunctions. 
(D) THE KORRINGA-KOHN-ROSTOKER TECHNIQUE 

The KKR method uses multiple-scattering theory to solve the Kohn-Sham equations [74, 75]. Rather than 
calculate the wavefunction, modern incarnations calculate the Green's function G. The Green's function is the 
solution to the equation schematically given by $(H - E)\,G(E) = -\delta$, where $H$ is the Hamiltonian, $E$ the 
single-electron energy and $\delta$ the delta function $\delta(\mathbf{r} - \mathbf{r}')$. The properties of the system, such as the electron density, 
the density of states and the total energy can be derived from the Green's function [73]. The crystal is 
represented as a sum of non-overlapping potentials; in the modern version, there are no shape approximations, 
i.e. the potentials are space-filling [76]. Within the multiple-scattering formalism, the wavefunction is built up 
by taking into account the scattering and rescattering of a free-electron wavefunction by scatterers. The 
scatterers are (generally) the atoms of the crystal and the single-scattering properties (the properties of the 
isolated scatterer) are derived from the effective, singular potentials of the atoms (given in equation (B3.2.2) ). 
The Green's matrix is then constructed from the knowledge of the scattering properties of the single scatterers 
and the analytically known Green's function of the free electron. The full-potential KKR method has been 
shown to have the same level of accuracy as the full-potential LAPW method [73]. The Green's function 
formulation offers the advantage of easy inclusion of defects in the bulk or clean surfaces. Such calculations 
start with the Green's function of the periodic crystal and include the perturbation through a Dyson equation 
[77]. Yussouff states that the difference in speeds between the linear methods and his 'fast' KKR technique is 
at most a factor of ten, in favour of the former [78]. While the KKR technique has an accuracy comparable to 
the APW method, it has the disadvantage of not being a linear approach, limiting speed and simplicity. 
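
As a rough illustration of how properties follow from the Green's function, the sketch below uses a toy finite tight-binding chain, not a KKR code. With the convention $(H - E)G = -\delta$, the matrix form is $G(E) = (E - H)^{-1}$; the density of states is $-(1/\pi)\,\mathrm{Im}\,\mathrm{Tr}\,G(E + i\eta)$, and a localized perturbation $V$ enters through the Dyson equation $G = G_0 + G_0 V G$. The chain length, hopping, broadening $\eta$ and defect strength are all assumed values, and the function names are hypothetical.

```python
import numpy as np

def greens_function(H, energy, eta=0.05):
    """Retarded Green's function G(E) = (E + i*eta - H)^{-1}."""
    n = H.shape[0]
    return np.linalg.inv((energy + 1j * eta) * np.eye(n) - H)

def density_of_states(G):
    """DOS at the energy for which G was evaluated."""
    return -np.trace(G).imag / np.pi

# Unperturbed host: nearest-neighbour chain (assumed toy model).
n_sites, hopping = 20, -1.0
H0 = hopping * (np.eye(n_sites, k=1) + np.eye(n_sites, k=-1))

# Localized defect: on-site energy shift on the central site.
V = np.zeros((n_sites, n_sites))
V[n_sites // 2, n_sites // 2] = 0.5

energy = 0.3
G0 = greens_function(H0, energy)

# Dyson equation G = G0 + G0 V G, rearranged to (1 - G0 V) G = G0.
G_dyson = np.linalg.solve(np.eye(n_sites) - G0 @ V, G0)

# Reference: direct inversion with the perturbed Hamiltonian.
G_direct = greens_function(H0 + V, energy)

print("DOS (host):  ", density_of_states(G0))
print("DOS (defect):", density_of_states(G_dyson))
```

The Dyson route only requires the host Green's function and the perturbation, which is why defects and surfaces are convenient to treat in this formulation.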

(E) THE LINEAR MUFFIN-TIN ORBITAL METHOD 

The LMTO method [58, 79] can be considered to be the linear version of the KKR technique. According to 
official LMTO historians, the method has now reached its 'third generation' [79]: the first starting with 
Andersen in 1975 [58], the second commonly known as TB-LMTO. In the LMTO approach, the 
wavefunction is expanded in a basis of so-called muffin-tin orbitals. These orbitals are adapted to the potential 
by constructing them from solutions of the radial Schrödinger equation so as to form a minimal basis set. 
Interstitial properties are represented by Hankel functions, which means that, in contrast to the LAPW 
technique, the orbitals are localized in real space. The small basis set makes the method fast computationally, 
yet at the same time it restricts the accuracy. The localization of the basis functions diminishes the quality of 
the description of the wavefunction in the interstitial region. 

In the commonly used atomic sphere approximation (ASA) [79], the density and the potential of the crystal 
are approximated as spherically symmetric within overlapping muffin-tin spheres. Additionally, all integrals, 
such as for the Coulomb potential, are performed only over the spheres. The limits on the accuracy of the 
method imposed by the ASA can be overcome with the full-potential version of the LMTO (FP-LMTO), 
which gives highly accurate total energies [79, 80]. It was found that the FP-LMTO is 'at least as accurate as, 
and much faster than,' pseudopotential plane wave calculations in the determination of structural and dynamic 
properties of silicon [80]. The FP-LMTO is considerably slower than LMTO-ASA, however, and it has been 
found that ASA calculations can yield accurate results if the full expansion, rather than only the spherical part, 
of the charge is used in what is called a full-charge (rather than a full-potential) method and the integrals are 
performed exactly [ 73 , 79 ]. 

The LMTO method is the fastest among the all-electron methods mentioned here due to the small basis size. 
The accuracy of the general potential technique can be high, but LAPW results remain the 'gold standard'. 
(F) THE PROJECTOR AUGMENTED WAVE TECHNIQUE 

The projector augmented-wave (PAW) DFT method was invented by Blöchl to generalize both the 
pseudopotential and the LAPW DFT techniques [81]. PAW, however, provides all-electron one-particle 
wavefunctions not accessible with the pseudopotential approach. The central idea of the PAW is to express 
the all-electron quantities in terms of a pseudo-wavefunction (easily expanded in plane waves) term that 
describes interstitial contributions well, and one-centre corrections expanded in terms of atom-centred 
functions, that allow for the recovery of the all-electron quantities. The LAPW method is a special case of the 
PAW method and the pseudopotential formalism is obtained by an approximation. Comparisons of the PAW 
method to other all-electron methods show an accuracy similar to the FLAPW results and an efficiency 
comparable to plane wave pseudopotential calculations [82, 83]. PAW is also formulated to carry out DFT 
dynamics, where the forces on nuclei and wavefunctions are calculated from the PAW wavefunctions. 
(Another all-electron DFT molecular dynamics technique using a mixed-basis approach is applied in [84].) 

PAW is a recent addition to the all-electron electronic structure methods whose accuracy appears to be similar 
to that of the general potential LAPW approach. The implementation of the molecular dynamics formalism 
enables easy structure optimization in this method. 

(G) ILLUSTRATIVE EXAMPLES OF THE ELECTRONIC AND OPTICAL PROPERTIES OF MODERN MATERIALS 

As an indication of the types of information gleaned from all-electron methods, we focus on one recent 
approach, the ELAPW method. It has been used to determine the band structure and optical properties over a 
wide energy range for a variety of crystal structures and chemical compositions ranging from elementary 
metals [60] to complex oxides [85], layered dichalcogenides [86, 87] and nanoporous semiconductors [88]. 
The k·p formulation has also enabled calculation of the complex band structure of the Al(100) surface [89]. 

As an illustration of the accuracy of the AFC ELAPW-k·p method, we present the dielectric function of 
GaAs. The dielectric function is a good gauge of the quality of a method, since not only do the energies enter 
the calculation, but also the wavefunctions via the matrix elements of the momentum operator $-i\nabla$. For the 
calculation of the dielectric function (equation (A1.3.87)) of GaAs, the conduction bands were rigidly shifted 
so that the highest peak agreed in both experiment and theory, a shift of 0.75 eV. The imaginary part of the 
dielectric function is shown in figure B3.2.5. Comparing the energy differences between the three peaks, we 
find that they agree to within 2 meV. For a wider comparison, we plot the results of two more experiments 
(which only have measured the two peaks at lower photon energy) and several all-electron calculations of the 
dielectric function of GaAs in figure B3.2.6. The FLAPW results agree almost exactly with the AFC ELAPW 
values. The discrepancies compared to the experimental results found for the other methods are considerably 
larger than for the general potential LAPW results, particularly for $E_1'$. 
Figure B3.2.5. The imaginary part of the dielectric function of GaAs, according to the AFC ELAPW-k·p 
method (solid curve) [195] and the experiment (dashed curve) [196]. To correct for the band gap 
underestimated by the local density approximation, the conduction bands have been shifted so that the $E_2$ 
peaks agree in theory and experiment. 
Figure B3.2.6. The energies of the $E_1$ and $E_1'$ peaks relative to the $E_2$ peak of the imaginary part of the 
dielectric function of GaAs, calculated by self-consistent DFT all-electron methods. These energies do not 
depend on the gap size. The theoretical methods are noted, as are experimental results obtained by 
ellipsometry (see chapter B1.26). The lower (upper) histogram gives the energy of peak $E_1$ ($E_1'$) relative to 
$E_2$. LCGO designates a linear-combination-of-Gaussian-orbitals method, OLCAO an orthogonalized 
linear-combination-of-atomic-orbitals approach. Sources: (1) [195], (2) [196], (3) [197], (4) [198], (5) [199], (6) 
[200], (7) [199], (8) [201], (9) [202]. 
A recent study of a class of nanoporous materials, the cetineites [88], offers further illustration of the 
possibilities offered by the modern band structure methods. The crystal is constructed of tubes of 0.7 nm 
diameter arranged in a two-dimensional hexagonal structure with 'flattened' SbSe$_3$ pyramids arranged 
between the tubes (see figure B3.2.7). Cetineites are of potential technological interest because, singularly 
among nanoporous materials, they are semiconductors rather than insulators. In figure B3.2.8, we show the 
comparison of the predicted density of states to the ultraviolet photoemission spectrum (PES, see chapter 
B1.1). The DOS can explain the two main structures in the PES at about -3 and -12 eV. Their relative 
intensities agree with those suggested by the DOS curve. Three structures in the DOS at -1, -6 and -9 eV are 
not resolved in the PES. This may be due to the selection rules of the photoemission process, not accounted 
for in the theory, or perhaps due to incomplete angle integration experimentally. The experimental results 
confirm, in particular, that the number of states is very high close to the valence band maximum. An orbital 
analysis shows that these states are derived mainly from the p states of the O and Se constituents of the 
crystal, with the chalcogen dominating near the top of the valence band. Electrons in the Se p states are thus 
most easily excited into the conduction band. This, together with their high DOS, makes the Se p states 
located on the pyramids the prime candidates for the initial states of the photoconductivity observed in the 
cetineites. 



Figure B3.2.7. A perspective view of the cetineite (Na;Se). The height of the figure is three lattice constants 
c. The shaded tube is included only as a guide to the eye. (From [88].) 
Figure B3.2.8. Comparison of the photoemission spectrum for the cetineite (Na;Se) and the density of states 
calculated by the AFC ELAPW-k·p method. 


As another example of properties extracted from all-electron methods, figure B3.2.9 shows the results of a 
PAW simulation of benzene molecules on a graphite surface. The study aimed to show the extent to which the 
electronic structure of the molecule is modified by interaction with the surface, and why the images do not 
reflect the molecular structure. The PAW method was used to determine the structure of the molecule at the 
surface, the strength of the interaction between the surface and the molecules, and to predict and explain 
scanning tunnelling microscope (STM) images of the molecule on the surface [90] (the STM is described in 
section B1.19). 



Figure B3.2.9. A benzene molecule on a graphite surface [90]. The geometry and the charge density 
(indicated by the surfaces of constant density) have been obtained using the PAW method. (Figure by 
Professor P E Blöchl.) 




B3.2.3.3 LINEAR-SCALING ELECTRONIC STRUCTURE METHODS 


DFT calculations such as the ones mentioned in chapter A1.3 and section B3.2.3.2 become computationally 
very expensive when the unit cell of the interesting system becomes large and complex, with certain parts of 
the computational algorithm typically scaling cubically with system size. A recent objective for treating large 
systems is to have the computational burden scale no more than linearly with system size. Methods achieving 
this are called linear-scaling or O(N) (order N) methods, most of which are based on the Kohn-Sham equation 
(see equation (B3.2.4)), aiming to calculate single-electron wavefunctions, the Kohn-Sham orbitals. These 
methods tend to be faster than the conventional Kohn-Sham approach above a few hundred atoms [20, 91, 92 
and 93]. Another class of methods is based directly on the DFT of Hohenberg and Kohn [24]. With these 
techniques one seeks to determine directly the density that minimizes the total energy; they are often referred 
to as orbital-free methods [94, 95, 96 and 97]. Such orbital-free calculations do not have the bottlenecks 
present in orbital-based O(N) DFT calculations, such as the need to localize orbitals to achieve linear scaling, 
orbital orthonormalization, or Brillouin zone sampling. Without such bottlenecks, the calculations become 
very inexpensive. 

Equation (B3.2.3) lists the terms comprising the calculation of the total energy. The term due to the external 
potential and the Hartree term describing the Coulomb repulsion energy among the electrons already 
explicitly depend on the density instead of on orbitals. More difficult to evaluate is $G[\rho] = T_s[\rho] + E_{xc}[\rho]$, a 
functional which is not known exactly. However, over the years a number of high-quality exchange- 
correlation functionals have been developed for all kinds of systems. Only quite recently have more accurate 
kinetic energy density functionals (KEDFs) become available [97, 98 and 99] that afford linear-scaling 
computations. 
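
For orientation only, the simplest kinetic energy density functional is the Thomas-Fermi form $T_{TF}[\rho] = C_F \int \rho^{5/3}\, d\mathbf{r}$ with $C_F = \frac{3}{10}(3\pi^2)^{2/3}$ in atomic units. It is far cruder than the KEDFs cited above and is used here only to show why the cost of an orbital-free evaluation grows with the number of grid points: the functional is a single pass over the density. The grid size, cell length and density value are assumed for illustration.

```python
import numpy as np

# Thomas-Fermi constant in atomic units.
C_F = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)

def thomas_fermi_kinetic_energy(rho, voxel_volume):
    """Approximate T_TF = C_F * integral(rho^(5/3)) by a sum over grid points."""
    return C_F * np.sum(rho ** (5.0 / 3.0)) * voxel_volume

# Assumed cubic cell with a uniform real-space grid.
n_grid = 16
cell_length = 10.0                       # bohr (illustrative)
voxel_volume = (cell_length / n_grid) ** 3

# Uniform density (electrons per cubic bohr, illustrative value).
rho = np.full((n_grid, n_grid, n_grid), 0.03)

T = thomas_fermi_kinetic_energy(rho, voxel_volume)
print("Thomas-Fermi kinetic energy (hartree):", T)
```

No orbitals, orthonormalization or Brillouin zone sampling appear anywhere in the evaluation; only the density on the grid is needed.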

One current limitation of orbital-free DFT is that since only the total density is calculated, there is no way to 
identify contributions from electronic states of a certain angular momentum character $l$. This identification is 
exploited in non-local pseudopotentials so that electrons of different $l$ character 'see' different potentials, 
considerably improving the quality of these pseudopotentials. The orbital-free methods thus are limited to 
local pseudopotentials, connecting the quality of their results to the quality of the available local potentials. 
Good local pseudopotentials are available for the alkali metals, the alkaline earth metals and aluminium [ 100 , 
101 ] and methods exist for obtaining them for other atoms (see section VI.2 of [97]). 

The orbital-free method has been used for molecular-dynamics studies of the formation of the self-interstitial 
defect in Al [102], pressure-induced glass-to-crystal transitions in sodium [103] and ion-electron correlations 
in liquid metals [101]. Calculations of densities for various Al surfaces have shown excellent agreement 
between the charge densities as calculated by Kohn-Sham DFT and an orbital-free method using a KEDF 
with a density-dependent response kernel [99]. The method was used recently to examine the metal-insulator 
transition in a two-dimensional array of metal quantum dots [104], where the theory showed that minute 
overlap of the nanoparticles' wavefunctions is enough to transform the array from an insulator to a metal. As 
an example of the ease with which large simulations can be performed, figure B3.2.10 shows a plot of the 
charge density from an orbital-free calculation of a vacancy among 255 Al atoms [98], carried out on a 
workstation. 
Figure B3.2.10. Contour plot of the electron density obtained by an orbital-free Hohenberg-Kohn technique 
[98]. The figure shows a vacancy in bulk aluminium in a 256-site cell containing 255 Al atoms and one empty 
site, the vacancy. Dark areas represent low electron density and light areas represent high electron density. A 
Kohn-Sham calculation for a cell of this size would be prohibitively expensive. Calculations on smaller cell 
sizes using both techniques yielded densities that were practically identical. 

B3.2.3.4 THE HARTREE-FOCK METHOD IN CRYSTALS 

The HF method (discussed in section A1.3.1.2) is an alternative to DFT approaches. It does not include 
electron correlation effects, i.e. non-classical electron-electron interactions beyond the Coulomb and 
exchange interactions. The neglect of these terms means that the Coulomb interaction is unscreened, and 
hence the electron repulsion energy is too large, overestimating ionic character, which leads to band gaps that 
are too large by a factor of two or more and valence band widths that are too wide by 30-40% [63]. However, 
the HF results can be used as a well defined starting point for the inclusion of many-particle corrections such 
as the GW approximation [31, 32] or, with considerably less computational effort, the results can be improved 
considerably by accounting for the Coulomb hole and screening the exchange interaction using the dielectric 
function [63, 105 ]. 

Ab initio HF programs for crystals have been developed [ 106 , 107 ] and have been applied to a wide variety of 
bulk and surface systems [ 108 , 109 ]. As an example, a periodic HF calculation using pseudopotentials and an 
LCAO basis predicted binding energies, lattice parameters, bulk moduli and central-zone phonon frequencies 
of 17 III-V and IV-IV semiconductors. The authors find that ' . . . [o]n the whole, the HF LCAO data appear no 
worse than other ab initio results obtained with DF-based Hamiltonians' [ 110 ]. They suggest that the largest 
part of the errors with respect to experiment is due to correlation effects and to a lesser extent due to the 
imperfections of the pseudopotentials [ 110 ]. More recently, the electronic and magnetic properties of 
transition metal oxides and halides such as perovskites, which had been a problem earlier, have been 
investigated with spin-unrestricted HF [ 111 ]. In general, the periodic HF method is best suited for the study of 
highly ionic, large band gap crystals because such systems are the least sensitive to the lack of electron 
correlation. 
B3.2.3.5 QUANTUM MONTE CARLO 


QMC techniques provide highly accurate calculations of many-electron systems. In variational QMC (VMC) 
[ 112 , 113 and 114 ], the total energy of the many-electron system is calculated as the expectation value of the 
Hamiltonian. Parameters in a trial wavefunction are optimized so as to find the lowest-energy state (modern 
methods instead minimize the variance of the local energy $\hat{H}\psi/\psi$ [115]). A Monte Carlo (MC) method is used to 
perform the multi-dimensional integrations necessary to determine the expectation value, 

$$E = \frac{\int \psi^{*} \hat{H} \psi \, d\tau}{\int |\psi|^{2} \, d\tau} = \frac{\int |\psi|^{2} \, (\hat{H}\psi/\psi) \, d\tau}{\int |\psi|^{2} \, d\tau},$$

where $\psi$ is the trial wavefunction and $|\psi|^{2} / \int |\psi|^{2} \, d\tau$ is a normalized probability distribution. The integration 
is performed by summing up the local energy at points, corresponding to electron configurations, given by the 
probability distribution. A random walk algorithm, such as the Metropolis algorithm [ 116 ], is used to sample 
those regions of configuration space more heavily where the probability density is high. The standard Slater- 
Jastrow trial wavefunction is the product of a Slater determinant of single-electron orbitals and a Jastrow 
factor, a function which includes the description of two-electron correlation. As an example, the trial 
wavefunction used for a silicon crystal contained 32 variational parameters whose optimization required the 
calculation of the local energy for 10 000-20 000 statistically independent electron configurations [ 117 ]. In 
contrast to the DMC technique described below, the accuracy of a VMC calculation depends on the quality of 
the many-particle wavefunction used [114]. In figure B3.2.11 we show the determination of the lattice 
constant of GaAs by VMC by minimization of the total energy [ 118 ]. This figure illustrates the roughness of 
the potential energy surface due to statistical errors, which poses a challenge then for the calculation of forces 
with QMC. 
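
The sampling procedure can be sketched on a toy problem, here a one-dimensional harmonic oscillator in atomic units with the trial wavefunction $\psi_\alpha(x) = \exp(-\alpha x^2)$, rather than a solid. For this choice the local energy is $\hat{H}\psi/\psi = \alpha + x^2(1/2 - 2\alpha^2)$. At $\alpha = 1/2$ the trial function is the exact ground state, so the local energy is constant and its variance vanishes, which is the property that variance minimization exploits. The step size, number of steps and random seed are arbitrary choices, and the function names are hypothetical.

```python
import numpy as np

def local_energy(x, alpha):
    """Local energy H psi / psi for psi = exp(-alpha x^2), H = -1/2 d2/dx2 + x^2/2."""
    return alpha + x**2 * (0.5 - 2.0 * alpha**2)

def vmc_energy(alpha, n_steps=20000, step_size=1.0, seed=0):
    """Metropolis sampling of |psi|^2; returns mean and variance of the local energy."""
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        x_trial = x + step_size * rng.uniform(-1.0, 1.0)
        # Acceptance ratio |psi(x_trial)|^2 / |psi(x)|^2.
        ratio = np.exp(-2.0 * alpha * (x_trial**2 - x**2))
        if rng.uniform() < ratio:
            x = x_trial
        samples[i] = local_energy(x, alpha)
    return samples.mean(), samples.var()

for alpha in (0.3, 0.5, 0.7):
    energy, variance = vmc_energy(alpha)
    print(f"alpha = {alpha:.1f}  E = {energy:.4f}  var(E_L) = {variance:.4f}")
```

Away from $\alpha = 1/2$ the energy estimate carries a statistical error, which is the origin of the noise in energy curves obtained this way.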
Figure B3.2.11. Total energy versus lattice constant of gallium arsenide from a VMC calculation including 
256 valence electrons [118]; the curve is a quadratic fit. The error bars reflect the uncertainties of individual 
values. The experimental lattice constant is 10.68 au, the QMC result is 10.69 (±0.1) au (Figure by Professor 
W Schattke). 
In the diffusion QMC (DMC) method [114, 119], the evolution of a trial wavefunction (typically 
wavefunctions of the Slater-Jastrow type, for example, obtained by VMC) proceeds in imaginary time, $\tau$, 
according to the time-dependent Schrödinger equation, which then becomes a diffusion equation. All 
components of the wavefunction except for the ground-state wavefunction are damped by the time evolution 
operator $\exp(-i\hat{H}t) = \exp(-\hat{H}\tau)$. The DMC was developed as a simplification of the Green's function MC 
technique [ 113 ]. A particularly well known use of the Green's function MC technique was the determination 
by Ceperley and Alder of the energy of the uniform electron gas as a function of its density [ 120 ]. This E(p) 
was subsequently parametrized by Perdew and Zunger for the commonly used LDA exchange-correlation 
potential [40]. Usually two approximations are made to make DMC calculations tractable: the fixed-node 
approximation, in which the nodes, the places where the trial function changes sign, are kept fixed for the 
solution to enforce the fermion symmetry of the wavefunction and the so-called short-time approximation, 
whose effect can be made very small [ 114 ]. Excited states have been calculated by replacing an orbital in the 
Slater determinant of the trial wavefunction by a conduction-band orbital [ 121 ]. 
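
The damping of excited-state components can be seen in a small matrix model. This is only an analogy to DMC (no random walkers, no fixed-node or short-time approximation): repeatedly applying $\exp(-\hat{H}\,\Delta\tau)$ to an arbitrary starting vector and renormalizing leaves the ground state. The 4×4 Hamiltonian, time step and number of steps are assumed values.

```python
import numpy as np

# Assumed toy Hamiltonian (symmetric, tridiagonal).
H = np.array([[ 0.0, -1.0,  0.0,  0.0],
              [-1.0,  0.5, -1.0,  0.0],
              [ 0.0, -1.0,  1.0, -1.0],
              [ 0.0,  0.0, -1.0,  1.5]])

eigenvalues, eigenvectors = np.linalg.eigh(H)

def propagate(state, d_tau):
    """Apply exp(-H d_tau) in the eigenbasis of H, then renormalize."""
    coeffs = eigenvectors.T @ state
    coeffs = coeffs * np.exp(-eigenvalues * d_tau)   # excited states decay fastest
    new_state = eigenvectors @ coeffs
    return new_state / np.linalg.norm(new_state)

# Start from a 'trial' vector with components along several eigenstates.
state = np.ones(4) / 2.0
for _ in range(200):
    state = propagate(state, 0.1)

energy = state @ H @ state
print("projected energy:   ", energy)
print("exact ground state: ", eigenvalues[0])
```

Each component decays as $\exp(-E_n \tau)$, so after renormalization the relative weight of every excited state falls off as $\exp[-(E_n - E_0)\tau]$.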

Recently, a method has been proposed to overcome the problems associated with calculating forces in both 
VMC and DMC [122]. It has been suggested that the use of QMC in the near future to tackle the energetics of 
systems as challenging as liquid binary iron alloys is not unthinkable [123]. 

B3.2.3.6 SUMMARY AND COMPARISONS 

As we have outlined, a very wide variety of methods are available to calculate the electronic structure of 
solids. Empirical TB methods (such as discussed in section B3.2.2) are the least expensive, affording the 
calculation of unit cells with large numbers (e.g. 10 ) of atoms, or providing cheap input to subsequent 
methods, at the price of quantitative accuracy. DFT methods (section A1.3.5.4 and section B3.2.3.3), on the 
other hand, are responsible for many of the impressive results obtained in computational materials theory in 
recent years. The tradeoff for DFT is the opposite: its expense, except in the not-yet-general linear scaling 
methods, limits it typically to systems with at most a few hundred atoms. Once the O(N) DFT methods 
become more general (for example, when orbital-free DFT can treat non-metallic systems), then the DFT 
method will be able routinely to treat systems as large as those treated now with TB. 

The diversity of approaches based on HF (section B3.2.3.4) is small at present compared to the diversity 
found for DFT. For solids, HF appears to yield results inferior to DFT due to the neglect of electron 
correlation, but being a genuine many-particle theory it offers the possibility for consistent corrections, in 
contrast to DFT. Finally, the QMC techniques (section B3.2.3.5) hold promise for genuine many-particle 
calculations, yet they are still far from able to offer the same quantities for the same range of materials and 
geometries as the theories mentioned before. With this wide range of methods now introduced, we will look at 
their application to chemisorption on solid surfaces. 


B3.2.4 QUANTUM STRUCTURAL METHODS FOR SOLID SURFACES 

B3.2.4.1 INTRODUCTION 

First-principles models of solid surfaces and adsorption and reaction of atoms and molecules on those surfaces 
range from ab initio quantum chemistry (HF, configuration interaction (CI), perturbation theory (PT), etc.; for 
details see chapter B3.1) on small, finite clusters of atoms to HF or DFT on two-dimensionally infinite slabs. 
In between these two extremes lie embedded cluster models, which recognize and attempt to correct the drastic approximation 
made by using a finite cluster to describe, for example, a metallic conductor whose electronic structure is 
inherently delocalized or an ionic crystal with long-range Coulomb interactions. Upon chemisorption, the 
binding of an atom or a molecule to a surface involves significant sharing of electrons in the bond between the 
adsorbate and surface atoms and this breaking of the crystal symmetry will induce localization of the 
electrons. The attractive feature of the embedded cluster idea is that it preserves the strengths of the cluster 
approach, namely it allows one to describe the very local process of chemisorption to a high degree of 
accuracy by, for example, quantum chemical methods, while at the same time attempting to account for the 
presence of the rest of the surface and bulk. Surface reconstruction and molecular adsorption have been 
studied on a variety of surfaces, including insulators, semiconductors and metals. To illustrate these methods, 
we will focus on those used to examine adsorption of atoms and molecules on transition metal surfaces. This 
is not a comprehensive review of each approach; rather, we provide selected examples that demonstrate the 
range of techniques and applications, and some of the lessons learned. 

B3.2.4.2 THE FINITE CLUSTER MODEL 

The most straightforward molecular quantum mechanical approach is to treat adsorption on a small, finite 
cluster of transition metal atoms, ranging from as small as four atoms up to ~40 atoms. Though all-electron 
calculations can be performed, typically the core electrons of transition metal atoms are replaced by an 
effective core potential (ECP, the quantum chemistry version of a pseudopotential that accounts 
approximately for the core-valence electron interaction), while the valence electrons of each metal atom are 
treated explicitly within a HF, CI, PT, or DFT formalism. Typically, a few atoms in the chemisorption region 
contain the valence (or all) electrons explicitly, while surrounding atoms tend to be described more crudely 
with, for example, a one-electron ECP representation, model pseudopotentials or, in the case of ionic crystals, 
a finite array of point charges. Generally, the structure of the cluster is chosen to be a fixed fragment of the 
bulk. Examples of this type of approach include the early work of Upton and Goddard [ 124 ], who examined 
adsorption of electronegative and electropositive atoms on a Ni$_{20}$ cluster designed to mimic various low-index 
faces of Ni. In this model, only the 4s electrons on each Ni atom were treated explicitly, while the 3d electrons 
were subsumed into an ECP. They made predictions concerning preferred binding sites, geometries, 
vibrational frequencies and binding energies. Bagus et al [ 125 ] published an important comparison study 
showing that it is more accurate to treat metal atoms directly interacting with an adsorbate at an all-electron 
level, while it is sufficient to describe the surrounding metal atoms with ECPs. Panas et al [ 126 ] proposed the 
idea that a cluster should be 'bond-prepared', namely that one should study an electronic state of the finite 
cluster that has enough singly-occupied orbitals of the correct symmetry to interact with the incoming 
admolecule to form the necessary covalent bonds between the adsorbate and the metal. In one of the first 
studies of a metal surface reaction, Panas et al [ 127 ] examined dissociative chemisorption pathways at the 
multi-reference CI level for 2 on a Ni$_{13}$ cluster, generally using ECPs for all but the 4s electrons. Salahub 
and co-workers [ 128 ] used DFT-LDA with a Gaussian basis to examine chemisorption of C, O, H, CO and 
HCOO on Ni clusters containing up to 16 atoms meant to represent various low-index faces of Ni. Gradient 
corrections to the LDA scheme improved dramatically the binding energies for hydrogen bound to small Ni 
clusters, when compared to experimental results for Ni(111) and Ni(100) [129]. Multiple adsorbates were also 
studied by DFT-LDA: for example, in the case of hydrogen on Pd clusters modelling Pd(110) [130]. Diffusion 
barriers were also calculated by DFT-LDA for clusters containing up to 13 metal atoms of Pd, Rh, Sn, and Zn 
[131]. Other examples include HF calculations of K adsorbed on Cu clusters [132], HF and Møller-Plesset 
second-order PT (MP2) calculations of acetylene on Cu and Pd clusters [ 133 ], modified coupled pair 
functional (CPF) calculations for CO on Cu clusters [ 134 ], averaged CPF calculations of hydrogen adsorption 
on relaxed Cu clusters [ 135 ], HF, CASSCF (complete active space self-consistent field) and multireference CI 
and PT calculations for CO [136] and O [137] on Pt clusters, and spin-polarized DFT of c-CH$_2$N$_2$ on Pd and 
Cu tetramers [138] and of K and CO on Pd R u [139]. 

The advantage of the finite-cluster model is that one can systematically include high levels of electron 
correlation; this is to be balanced against the lack of a proper band structure, the presence of edge effects and 
the fact that it is generally limited to modelling low coverages. Next we outline current strategies for 
ameliorating some of these difficulties. 


B3.2.4.3 FINITE-CLUSTER MODEL IN CONTACT WITH A CLASSICAL BACKGROUND 

Several modifications of the finite-cluster model meant to account for the background Fermi sea of electrons 
and to compensate for the lack of a proper band structure have been developed. They rely on simple 
approximations of the surface/bulk, usually involving classical electrostatic interactions and usually applied to 
ionic crystals (see, for example, [ 140 ]). Of these, the model invented by Nakatsuji is the primary one that has 
considered adsorption on metal surfaces [ 141 ]. The so-called 'dipped adcluster model' [ 142 ] considers a small
cluster plus an adsorbate as the 'adcluster' that is 'dipped' onto the Fermi sea of electrons of the bulk metal. A 
normal HF calculation on the small system is performed, in which electrons are added to or removed from the 
cluster in each calculation. By comparing the variation in the total energy with respect to the fractional 
electron transfer, $dE/dn$, to the work function of the metal, $\mu$, the extent of electron transfer between the
adcluster and the bulk metal can be established. Thus, charges on a small cluster are optimized and an image 
charge correction is also accounted for. In certain cases, integral charges are transferred between the cluster 
and the 'surroundings'; then electron correlation calculations, for example CI, can be carried out. This is a 
purely classical electrostatic approach to accounting for the background electrons in an implicit, rather than 
explicit, manner. Nakatsuji has used this to study adsorption of ionic adsorbates on metals, and finds that one 
can describe the polarization of the metal reasonably well. We have worked briefly with this approach [ 143 ], 
but found that there is a problem with extending the method beyond two-dimensional clusters, because of an 
ambiguity of where to place the image plane. Indeed, Nakatsuji's examples are always small one- or two- 
dimensional clusters. It is also likely that the wavefunction for such small clusters (typically <4 metal atoms) 
would not adequately represent a true metal surface wavefunction. 
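
The matching of the energy slope to the metal's $\mu$ can be sketched numerically. The following is a minimal illustration, not Nakatsuji's implementation: it assumes the condition takes the form -dE/dn = mu (the sign convention is an assumption), fits a quadratic to a handful of cluster energies, and solves for the transferred charge. The function name `equilibrium_transfer` and the energy values are hypothetical.

```python
import numpy as np

def equilibrium_transfer(n_values, energies, mu):
    """Fit E(n) to a quadratic and return the n at which -dE/dn equals mu.

    Assumption: the matching condition is -dE/dn = mu; a different sign
    convention only changes the sign of mu passed in.
    """
    c2, c1, _c0 = np.polyfit(n_values, energies, 2)  # E ~ c2*n^2 + c1*n + c0
    if abs(c2) < 1e-12:
        raise ValueError("E(n) is linear; the transfer is not uniquely defined")
    return -(mu + c1) / (2.0 * c2)

# Hypothetical cluster energies at several (fractional) electron counts.
n = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
E = -3.0 * n + 1.0 * n**2          # toy model, not taken from any calculation
n_star = equilibrium_transfer(n, E, mu=1.0)
```

In practice each point on the E(n) curve would come from a separate calculation on the adcluster, so only a few points are usually available and a low-order fit is a natural choice.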

A simple, implicit means of describing the metallic band structure [ 144 ] was introduced by Rösch, using a
Gaussian broadening of the cluster energy levels in order to determine a cluster Fermi level within DFT,
originally by the Xα method (a simplified version of DFT-LDA; see section A1.3.3.3). Recent applications of
this method have utilized more accurate forms of gradient-corrected spin-polarized DFT to look at adsorption 
of, for example, acetylene on Ni 14 20 clusters [ 145 ], CO adsorption on Ni, Pd and Pt clusters of eight or nine 
atoms [ 146 ] and NO adsorption on Ru [ 147 ]. 
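
A Fermi level for a set of discrete cluster levels can be obtained from Gaussian broadening roughly as follows. This is a sketch under stated assumptions, not the scheme of [ 144 ]: each level is replaced by a normalized Gaussian of width `sigma`, its occupation is the integral of that Gaussian up to the Fermi energy, and the Fermi energy is found by bisection on the electron count. The level energies, `sigma` and the degeneracy `g` are hypothetical.

```python
import math

def occupations(levels, e_fermi, sigma, g=2.0):
    """Occupation of each Gaussian-broadened level (g electrons per level)."""
    root2 = math.sqrt(2.0)
    return [g * 0.5 * math.erfc((e - e_fermi) / (sigma * root2)) for e in levels]

def fermi_level(levels, n_electrons, sigma, g=2.0, tol=1e-10):
    """Bisect for the energy at which the broadened levels hold n_electrons."""
    lo = min(levels) - 10.0 * sigma   # essentially empty
    hi = max(levels) + 10.0 * sigma   # essentially full
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(occupations(levels, mid, sigma, g)) < n_electrons:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

levels = [-5.0, -4.0, -3.0, -2.0]     # hypothetical cluster levels (eV)
ef = fermi_level(levels, n_electrons=4.0, sigma=0.2)
occ = occupations(levels, ef, sigma=0.2)
```

For this symmetric toy spectrum the Fermi level falls midway between the second and third levels, and the occupations near it become fractional, which is the feature that mimics a continuous band.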

B3.2.4.4 SLAB CALCULATIONS 

The other extreme of modelling chemisorption is to use a slab described by DFT or HF. The slab is typically 
taken to be periodic in the directions parallel to the surface and contains a few atomic layers in the direction 
normal to the surface. For the adatoms not to influence each other, unless that is intended, the unit cell needs 
to be sufficiently large, parallel to the surface. For computational reasons, it is advantageous in some methods, 
namely plane wave techniques, to have periodicity in three dimensions. In the supercell geometry, this 
periodicity is gained by considering slabs which are periodic in the direction perpendicular to the surface but 
separated from each other by vacuum regions. The vacuum region has to be thick enough so that there is no 
influence between the surfaces facing each other (the same is true for the slab thickness). For a schematic 
description of several simulation model geometries, see figure B3.2.12.

Figure B3.2.12. Schematic illustration of geometries used in the simulation of the chemisorption of a
diatomic molecule on a surface (the third dimension is suppressed). The molecule is shown on a surface 
simulated by (A) a semi-infinite crystal, (B) a slab and an embedding region, (C) a slab with two-dimensional 
periodicity, (D) a slab in a supercell geometry and (E) a cluster. 
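
The supercell geometry (D) can be made concrete with a small construction. The sketch below builds an fcc(100) slab of a few layers, repeats it laterally, and adds a vacuum gap along the surface normal so that the cell is periodic in three dimensions. The function name `fcc100_supercell`, the lattice constant (roughly that of Ni) and the vacuum thickness are illustrative assumptions.

```python
import numpy as np

def fcc100_supercell(a, n_layers, n_rep, vacuum):
    """Return (positions, cell) for an fcc(100) slab in a 3D-periodic supercell.

    a        : cubic lattice constant
    n_layers : atomic layers normal to the surface
    n_rep    : lateral repeats of the surface unit cell in x and y
    vacuum   : gap between the top of the slab and its periodic image
    """
    d_nn = a / np.sqrt(2.0)   # in-plane nearest-neighbour spacing
    dz = a / 2.0              # interlayer spacing
    positions = []
    for k in range(n_layers):
        shift = 0.5 * d_nn * (k % 2)   # alternate layers sit above hollow sites
        for i in range(n_rep):
            for j in range(n_rep):
                positions.append((i * d_nn + shift, j * d_nn + shift, k * dz))
    thickness = (n_layers - 1) * dz
    cell = np.diag([n_rep * d_nn, n_rep * d_nn, thickness + vacuum])
    return np.array(positions), cell

pos, cell = fcc100_supercell(a=3.52, n_layers=4, n_rep=2, vacuum=10.0)
# One adsorbate per (2 x 2) cell corresponds to a quarter-monolayer coverage.
coverage = 1.0 / (2 * 2)
```

Lower coverages require a larger `n_rep`, and the atom count grows as its square, which is why tractable cells tend to model high coverages.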

Freeman and co-workers developed the FLAPW method (see B3.2.2) during the early 1980s [70, 148 ]. This
was a major advance, because the conventional 'muffin-tin' potential was eliminated from their calculation,
allowing general-shape potentials to be evaluated instead. Freeman's group first developed this for thin films
and then for bulk metals. As mentioned before, the LAPW basis, along with the elimination of any shape
approximations in the potential, allows for highly accurate calculations on transition metal surfaces, within the
DFT-LDA and the generalized gradient approximation, GGA (see section A1.3.3.3). For the 'stand-alone'
slab geometry (figure B3.2.12(C)), the LAPW basis functions decay exponentially into the vacuum. The
numerous interfacial systems examined by Freeman's group include, for example, CO with K or S
coadsorption on Ni(001) [ 149 ], adsorption of sulfur alone on Ni(001) [ 150 ], Fe monolayers on Ni(111) [ 151 ],
Ag monolayers on MgO(001) [ 152 ], Au-capped Fe monolayers on MgO(001) [ 153 ], NO adsorption on Rh, Pd
and Pt [ 154 ], and Li on Ru(001) [ 155 ]. Typical properties predicted are the equilibrium positions, magnetic 
moments, charge densities and surface densities of states. 

More recently, other groups, primarily in Europe, have begun doing pseudopotential plane wave (often
gradient-corrected) DFT supercell slab calculations (figure B3.2.12(D)) for chemisorption on metals. The
groups of Nørskov [ 156 ], Scheffler [ 157 ], Baerends [ 158 , 159 ] and Hafner and Kresse [ 160 , 161 ] have been
the most active. Adsorbate-metal surface systems examined include: alkalis and N2 on Ru [ 156 ], NO on Pd
[156], H2 on Al [156], Cu [156, 158], Pd [157, 158] and sulfur-covered Pd [157], CO oxidation on Ru [157],
CO on Ni, Pd and Pt [158], O on Pt [160], and H2 on Rh, Pd and Ag [ 161 ].

An interesting study by te Velde and Baerends [ 159 ] compared slab- and cluster-DFT results for CO
adsorption on Cu(100). They found large oscillations in the chemisorption binding energy of CO to finite
copper clusters as a function of cluster size. This suggests that the finite-cluster model (figure B3.2.12(E)) is
likely to be inadequate, at least for modelling metal surfaces. By contrast, the slab calculations converge 
quickly with the number of Cu layers for the CO heat of adsorption and CO-CO distances. 

The supercell plane wave DFT approach is periodic in three dimensions, which has some disadvantages: (i) 
thick vacuum layers are required so the slab does not interact with its images, (ii) for a tractably sized unit 
cell, only high adsorbate coverages are modelled readily and (iii) one is limited in accuracy by the form of the
exchange-correlation functional chosen. In particular, while DFT, especially using gradient-corrected forms
of the exchange-correlation functional (GGA), has proven to be remarkably reliable in many instances, there 
are a number of examples for chemisorption in which the commonly used GGAs have been shown to fail 
dramatically (errors in binding energies of 1 eV or greater) [ 162 , 163 ]. This naturally motivates the next set of 
approaches, namely the embedded cluster strategy. 

B3.2.4.5 EMBEDDED-CLUSTER SCHEMES: CLUSTER IN CLUSTER 

Whitten and co-workers developed a metal cluster embedding scheme appropriate for CI calculations during 
the 1980s [ 164 ]. In essence, the method consists of: (i) solving for a HF minimum basis set (one 4s 
orbital/atom) description of a large cluster (e.g., ~30-90 atoms); (ii) localizing the orbitals via exchange
energy maximization with atomic basis functions on the periphery; (iii) using these localized orbitals to set up
effective Coulomb and exchange operators for the electrons within the cluster to be embedded; (iv) improving
the basis set on the atoms comprising the embedded cluster and (v) performing a small CI calculation (O(10 )
configurations) within orbitals localized on the embedded cluster. This strategy provides an approximate way
of accounting for nearby electrons outside the embedded cluster itself. Whitten and co-workers have applied it
to a variety of adsorbates (H-, N-, O- and C-containing small molecules) on, primarily, Ni surfaces. Duarte and
Salahub recently reported a DFT-cluster-in-DFT-cluster variant of Whitten's embedding, with a couple of
twists on the original approach (for example, fractional orbital occupancies and charges, and an extra buffer 
region) [ 165 ]. Earlier, Sellers developed a related scheme for embedding an MP2 cluster within another cluster,
where the background was modelled with screened ECPs [ 166 ]. Also, Ravenek and Geurts [ 167 ] and
Fukunishi and Nakatsuji [ 168 ] extended the Green's matrix method of Pisani [ 169 ] (who developed it mainly
for ionic crystals) to again embed a cluster within a cluster by introducing a semiorthogonal basis and
renormalizing the charge on the cluster. It was implemented within the Xα method in the former case [ 167 ],
and by broadening each discrete energy level to mimic the bulk band structure within HF theory for the
cluster in the latter case [ 168 ].

Pisani [ 169 ] has used the density of states from periodic HF (see B3.2.2.4) slab calculations to describe the 
host in which the cluster is embedded, where the applications have been primarily to ionic crystals such as 
LiF. The original calculation to derive the external Coulomb and exchange fields is usually done on a finite 
cluster and at a low level of ab initio theory (typically minimum basis set HF, one electron only per atom 
treated explicitly). 

The main drawback of the cluster-in-cluster methods is that the embedding operators are derived from a
wavefunction that does not reflect the proper periodicity of the crystal: a two-dimensionally infinite
wavefunction/density with a proper band structure would be preferable. Indeed, Rösch and co-workers
pointed out recently a series of problems with such cluster-in-cluster embedding approaches. These include
the lack of marked improvement of the results over finite clusters of the same size, problems with the orbital 
space partitioning such that charge conservation is violated, spurious mixing of virtual orbitals into the density 
matrix [ 170 ], the inherent delocalized nature of metallic orbitals [ 171 ], etc. 

B3.2.4.6 EMBEDDING OF CLUSTERS IN PERIODIC BACKGROUND

One of the first cluster embedding schemes was put forth by Ellis and co-workers [ 172 ]. They were interested 
in studying transition metal impurities in NiAl alloys, so they considered a TMA1 Ni cluster embedded in a 
periodic self-consistent crystal field appropriate for bulk β′-NiAl. The field was calculated via Xα
calculations, as was the cluster itself. The idea was to provide a relatively inexpensive alternative to supercell 
DFT calculations. 

Perhaps the most sophisticated embedding scheme for describing metal surfaces to date is the LDA-based
self-consistent Green's function method for semi-infinite crystals. Inglesfield, Benesh, and co-workers embed
the near-surface layers using an embedding potential constructed from the bulk Green's function within an all- 
electron approach, using an LAPW basis [ 173 ]. Scheffler and co-workers developed a similar approach using 
a Gaussian basis for the valence electrons and pseudopotentials [ 174 ]. The formulation of the latter method is 
somewhat different from Inglesfield's and Benesh's, in that a reference system is chosen for which the 
Green's function and density are known (typically the bulk metal), and a Δ(Green's function) is solved for in
order to get a Δ(embedding potential) and hence a Δ(density). This allows one to solve for the embedding
potential locally in a small region around the adsorbate. These methods allow for an economical yet accurate 
calculation of the embedding density, which yields a trustworthy description of charge transfer and other 
equilibrium properties, though subject to the accuracy limitations inherent in DFT-LDA. 

In the late 1980s, Feibelman developed his Green's function scattering method using LDA with 
pseudopotentials to describe adsorption on two-dimensionally infinite metal slabs [ 175 ], based on earlier work 
by Williams et al [ 176 ]. The physical basis for the technique is that the adsorbate may be considered a defect
off which the Bloch waves of the perfect substrate scatter. The interaction region is short-range because of
screening by the electron gas of the metal. Feibelman has used this technique to study, for example, the
chemisorption of an H2 molecule on Rh(001) [ 177 ], S adatoms on Al(331) [ 178 ] and Ag adatoms on Pt(111)
[ 179 ]. Charge densities, relative energies for various adsites and diffusion barriers (the latter in good 
agreement with experiment) were the typical quantities predicted. 

Krüger and Rösch implemented within DFT the Green's matrix approach of Pisani within an approximate
periodic slab environment [ 180 ]. They were able to successfully extend Pisani's embedding approach to metal
surfaces by smoothing out the step function that determines the occupation numbers near the Fermi level. 
Keys to the numerical success of their method included: (i) symmetric orthogonalization of the Bloch basis to 
produce a localized set of functions that yielded a balanced distribution of charge in the system and (ii) self- 
consistent evaluation of the Fermi energy by fixing the charge on the cluster to be neutral. The slab was 
described with a Slater basis at the DFT-LDA level, while the embedded cluster orbitals were expanded in 
terms of Gaussian functions at the DFT-LDA level. While some properties exhibited non-monotonic 
behaviour with increasing cluster size, the charge transfer between the metal surface and the adsorbate seemed 
to be well described. They concluded that properties are not well converged in this method if the cluster does 
not contain shells of metal atoms that are at least next-nearest-neighbours to the adsite metal atoms. 

Head and Silva used occupation numbers obtained from a periodic HF density matrix for the substrate to 
define localized orbitals in the chemisorption region, which then defines a cluster subspace on which to carry 
out HF calculations [ 181 ]. Contributions from the surroundings also only come from the bare slab, as in the 
Green's matrix approach. Increases in computational power and improvements in minimization techniques 
have made it easier to obtain the electronic properties of adsorbates by supercell slab techniques, leading to 
the Green's function methods becoming less popular [ 182 ]. 

Cortona embedded a DFT calculation in an orbital-free DFT background for ionic crystals [ 183 ], which
necessitates evaluation of kinetic energy density functionals (KEDFs). Wesolowski and Warshel [ 184 ] had
similar ideas to Cortona, except they used a frozen density background to examine a solute in solution and
examined the effect of varying the KEDF. Stefanovich and Truong also implemented Cortona's method with
a frozen density background and applied it to, for example, water adsorption on NaCl(001) [ 185 ].

B3.2.4.7 EMBEDDING EXPLICIT CORRELATION METHODS IN A DFT BACKGROUND 

In principle, DFT calculations with an ideal exchange-correlation functional should provide consistently 
accurate energetics. The catch is, of course, that the exact exchange-correlation functional is not known.
While various GGAs have been remarkably successful, there are notable exceptions [ 186 , 187 ], including
ones specific to surface adsorption mentioned earlier, where the binding-energy errors can be more than an eV 
[ 162 , 163 ]. As another example, Louie and Cohen and co-workers found no systematic improvement over the 
LDA when gradient corrections were included in calculations of Al, Nb and Pd bulk properties, including the 
cohesive energy [ 186 ]. Indeed, the design of exchange-correlation functionals constitutes an active field of
research (see, for example, [ 188 ]). The lack of completely systematic means to improve these functionals is an
unappealing aspect of these calculations. 

A first step towards a systematic improvement over DFT in a local region is the method of Abarenkov et al
[ 189 ], who calculated a correlated wavefunction embedded in a DFT host. However, this is achieved using an 
analytic embedding potential function fitted to DFT results on an indented crystal. One must be cautious using 
a bare indented crystal to represent the surroundings, since the density at the surface of the indented crystal 
will have inappropriate Friedel oscillations inside and decay behaviour at the indented surface not present in 
the real crystal. 

We have developed a different first-principles embedding theory that combines DFT with explicit correlation 
methods. We sought to develop a method for treating bulk or surface phases that is more accurate than current 
implementations of DFT. The idea is to provide more accurate predictions for local energetics, such as 
chemisorption binding energies and adsorbate electronic excitation energies. To achieve this, our theory 
improves upon the DFT description of electron correlation in a local region. This is accomplished by an 
embedding theory that treats a small region within an accurate quantum chemistry approach [ 190 , 191 ], which 
interacts with its surroundings via an embedding potential, $v_{\mathrm{embed}}(\mathbf{r})$. This $v_{\mathrm{embed}}(\mathbf{r})$ is derived from a periodic
DFT calculation on the total system. It is expressed purely in terms of orbital-free DFT (kinetic and potential
energy) interaction terms between the embedded region and its surroundings à la Cortona and, in particular,
purely in terms of functionals of the total density, $\rho_{\mathrm{tot}}$, and the density of the embedded region, $\rho_I$. We thus
avoid construction of localized orbitals to describe the electrons in the surrounding environment. This is
especially important for metal surfaces, where the extensive k-point sampling required to get a well converged
density makes localization impractical (very expensive). This way of expressing the embedding operator also
eliminates problems that occur in other forms of embedding, such as those of matching conditions at the
embedding boundary, or spurious charge transfer, since the electrostatic potential and the density are
continuous by construction. Its only real disadvantage is that there is an arbitrariness associated with the
choice of $T_s$. Development of optimal $T_s$ functionals is an active area of research in our group [ 97 , 98 and 99].
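
The kinetic-energy part of such a density-based embedding potential can be illustrated on a grid. The sketch below is not the authors' functional: it assumes the simplest possible choice, the Thomas-Fermi kinetic energy functional, and evaluates the difference of its functional derivatives at the total density and at the embedded-region density. The function names and the sample densities are hypothetical; atomic units are assumed.

```python
import numpy as np

# Thomas-Fermi constant in atomic units: T_TF[rho] = C_TF * integral of rho^(5/3)
C_TF = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)

def tf_kinetic_potential(rho):
    """Functional derivative of the Thomas-Fermi kinetic energy, (5/3) C_TF rho^(2/3)."""
    return (5.0 / 3.0) * C_TF * np.asarray(rho, dtype=float) ** (2.0 / 3.0)

def kinetic_embedding_potential(rho_tot, rho_I):
    """Kinetic contribution to the embedding potential on a grid.

    Assumption: the contribution is the difference of the kinetic potentials
    of the total density and of the embedded-region density, evaluated here
    with the Thomas-Fermi functional only.
    """
    return tf_kinetic_potential(rho_tot) - tf_kinetic_potential(rho_I)

# Hypothetical densities on three grid points.
rho_tot = np.array([0.10, 0.05, 0.02])
rho_I = np.array([0.10, 0.03, 0.00])
v_kin = kinetic_embedding_potential(rho_tot, rho_I)
```

Where the embedded region carries all of the density the kinetic term vanishes, and it grows where the surroundings contribute. A different kinetic energy functional changes only `tf_kinetic_potential`, which is where the arbitrariness noted above enters.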

The self-consistent embedding cycle proceeds as follows. First, a well converged density, $\rho_{\mathrm{tot}}$, is calculated
for the extended metal surface in the presence of an adsorbate. This is accomplished within a standard
pseudopotential plane wave DFT calculation (see chapter A1.3). Second, we partition the system into the
region of interest (typically the adsorbate and neighbouring metal atoms at or near the surface) and its
surroundings (all the other atoms in the periodic unit cell). The embedded region is defined by the integral
number of electrons and nuclei within that region but not by
a particular physical, fixed boundary. This allows for the electron density from the embedded region to
expand or contract variationally into the surroundings, thus affording some effective charge polarization to 
occur as needed. 

The electron density, $\rho_I$, of the embedded cluster/adsorbate atoms is calculated using quantum chemistry
methods (HF, PT, multireference SCF, or CI). The initial step in this iterative procedure sets $v_{\mathrm{embed}}(\mathbf{r})$ to zero,
since $\rho_I$ is needed in order to calculate it. On subsequent iterations, the third step is to use $\rho_I$ and $\rho_{\mathrm{tot}}$ to
calculate $v_{\mathrm{embed}}(\mathbf{r})$, then insert it, as a one-electron operator expressed in matrix form in the atomic orbital
basis of the adsorbate/cluster, into the quantum chemistry calculation of step two, and then $\rho_I$ is updated (via
the wavefunction). We repeatedly update $v_{\mathrm{embed}}(\mathbf{r})$ and then $\rho_I$ until full self-consistency is achieved, with
fixed $\rho_{\mathrm{tot}}$. In this way, we variationally optimize both the quantum chemistry wavefunction and, implicitly,
the density of the surroundings, subject to fixed $\rho_{\mathrm{tot}}$. We tacitly assume that the DFT-slab density for the total
system, $\rho_{\mathrm{tot}}$, is in fact a good representation and does not need to be adjusted.

We have shown that our embedding total energies may be written in terms of the total energy obtained in step
one (the DFT total energy for the entire system), plus a correction term that subtracts out the DFT energy in
the local region I and adds back in an ab initio total energy for that same region:

$$E_{\mathrm{tot}}^{\mathrm{embed}} = E_{\mathrm{tot}}^{\mathrm{DFT}} + \left( E_{I}^{\mathrm{ab\ initio}} - E_{I}^{\mathrm{DFT}} \right)$$

Thus, another way to think of the embedding is that the ab initio treatment of region I is correcting the DFT 
results in the same region, for the same self-consistent density. We expect, then, that such a treatment should 
reduce, for example, the famous LDA overbinding problem (LDA bond energies are generally significantly 
overestimated). We have indeed seen a smooth decrease in the LDA overbinding as a function of increasing 
electron correlation. We benchmarked the method against nearly exact calculations on a small system and 
then further corroborated it on experimentally well studied chemisorption systems: CO on transition metal 
surfaces. Our binding energies are in good agreement with nearly full configuration interaction in the former 
and experimental adsorbate binding energies in the latter. Very recently, we have demonstrated that excitation 
energies for adsorbed CO are dramatically improved compared to experiment upon inclusion of the 
embedding potential [ 192 ]. In the future, we hope this method will provide a general means for accurate 
predictions of the local electronic structure of condensed matter. 


B3.2.5 OUTLOOK 

Computational solid-state physics and chemistry are vibrant areas of research. The all-electron methods for 
high-accuracy electronic structure calculations mentioned in section B3.2.3.2 are in active development, and
with PAW, an efficient new all-electron method has recently been introduced. Ever more powerful computers 
enable more detailed predictions on systems of increasing size. At the same time, new, more complex 
materials require methods that are able to describe their large unit cells and diverse atomic make-up. Here, the 
new orbital-free DFT method may lead the way. More powerful techniques are also necessary for the accurate 
treatment of surfaces and their interaction with atoms and, possibly complex, molecules. Combined with 
recent progress in embedding theory, these developments make possible increasingly sophisticated predictions 
of the quantum structural properties of solids and solid surfaces. 


B3.3 Statistical mechanical simulations

Michael P Allen 


B3.3.1 INTRODUCTION 

Computer simulation, at the molecular level, has grown enormously in importance over the last 50 years. 
Affordable computer chips have historically doubled in power every 18 months, so the computer simulator, 
regarded as an experimentalist, has the unique advantage of rapidly improving apparatus. With the recent 
explosion in personal computing, there seems every prospect that this situation will continue, allowing 
computer simulation to become of even more practical value in fields such as the design of drugs and 
molecular materials. This provides a stimulus to develop simulation methods, and an industry has grown up 
marketing the necessary software. 

This chapter concentrates on describing molecular simulation methods which have a connection with the 
statistical mechanical description of condensed matter, and hence relate to theoretical approaches to 
understanding phenomena such as phase equilibria, rare events, and quantum mechanical effects. 

B3.3.1.1 THE AIMS OF SIMULATION 

We carry out computer simulations in the hope of understanding bulk, macroscopic properties in terms of the 
microscopic details of molecular structure and interactions. This serves as a complement to conventional 
experiments, enabling us to learn something new; something that cannot be found out in other ways. 

Computer simulations act as a bridge between microscopic length and time scales and the macroscopic world 
of the laboratory (see figure B3.3.1). We provide a guess at the interactions between molecules, and obtain
'exact' predictions of bulk properties. The predictions are 'exact' in the sense that they can be made as 
accurate as we like, subject to the limitations imposed by our computer budget. At the same time, the hidden 
detail behind bulk measurements can be revealed. Examples are the link between the diffusion coefficient and
velocity autocorrelation function (the former easy to measure experimentally, the latter much harder); and the
connection between equations of state and structural correlation functions. 

Simulations act as a bridge in another sense: between theory and experiment (see figure B3.3.2). We can test a
theory using idealized models, conduct 'thought experiments', and clarify what we measure in the laboratory. 
We may also carry out simulations on the computer that are difficult or impossible in the laboratory (for 
example, working at extremes of temperature or pressure). 

Figure B3.3.1. Simulations as a bridge between the microscopic and the macroscopic. We input details of
molecular structure and interactions; we obtain predictions of phase behaviour, structural and time-dependent
properties.

Figure B3.3.2. Simulation as a bridge between theory and experiment. We may test a theory by conducting a
simulation using the same model. We may test the model by comparing with experimental results.


Ultimately we may want to make direct comparisons with experimental measurements made on specific 
materials, in which case a good model of molecular interactions is essential. The aim of so-called ab initio 
molecular dynamics is to reduce the amount of fitting and guesswork in this process to a minimum. On the 
other hand, we may be interested in phenomena of a rather generic nature, or we may simply want to 
discriminate between good and bad theories. When it comes to aims of this kind, it is not necessary to have a 
perfectly realistic molecular model; one that contains the essential physics may be quite suitable. 

The two main families of simulation technique are molecular dynamics (MD) and Monte Carlo (MC). 
Additionally, there is a whole range of hybrid techniques which combine features from both MC and MD. 

B3.3.1.2 THE TECHNIQUES OF SIMULATION 

Molecular dynamics consists of the brute-force solution of Newton's equations of motion. It is necessary to 
encode in the program the potential energy and force law of interaction between molecules; the equations of 
motion are solved numerically, by finite difference techniques. The system evolution corresponds closely to 
what happens in 'real life' and allows us to calculate dynamical properties, as well as thermodynamic and 
structural functions. For a range of molecular models, packaged routines are available, either commercially or 
through the academic community. 
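
As an illustration of the finite-difference solution of Newton's equations, the sketch below integrates a single particle in a harmonic well with the velocity Verlet scheme. The choice of velocity Verlet and of a one-dimensional harmonic force is an assumption made for brevity; the text does not prescribe a particular integrator.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation m*a = F(x); return the list of (x, v) states."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory

k, m = 1.0, 1.0                            # toy spring constant and mass
traj = velocity_verlet(1.0, 0.0, lambda x: -k * x, m, dt=0.01, n_steps=1000)
energies = [0.5 * m * v * v + 0.5 * k * x * x for x, v in traj]
x_final = traj[-1][0]
exact_final = math.cos(10.0)               # analytic solution at t = 10
```

The total energy stays close to its initial value over the run, which is the usual first check that the time step is small enough.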

Monte Carlo can be thought of as a prescription for sampling configurations from a statistical ensemble. The 
interaction potential energy is coded into the program, and a random walk procedure adopted to go from one 
state of the system to the next. MC programs can be relatively easy to program; they allow us to calculate 
thermodynamic and structural properties, but not exact dynamics. It is relatively simple to specify external 
conditions (constant temperature, pressure etc.) and many tricks may be devised to improve the efficiency of 
the sampling. 
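
A minimal random-walk sampler of this kind is sketched below, assuming the Metropolis acceptance rule and a one-dimensional harmonic potential at constant temperature. The function name, the step size and the run length are illustrative choices.

```python
import math
import random

def metropolis(potential, x0, kT, max_step, n_steps, seed=0):
    """Sample x from exp(-U(x)/kT) with symmetric trial moves (Metropolis rule)."""
    rng = random.Random(seed)
    x, u = x0, potential(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_step, max_step)   # trial displacement
        u_new = potential(x_new)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if u_new <= u or rng.random() < math.exp(-(u_new - u) / kT):
            x, u = x_new, u_new
        samples.append(x)                               # rejected moves recount the old state
    return samples

samples = metropolis(lambda x: 0.5 * x * x, x0=0.0, kT=1.0, max_step=2.0, n_steps=200_000)
mean_x2 = sum(s * s for s in samples) / len(samples)
```

For this potential the exact canonical average of x squared is kT/k = 1, so `mean_x2` should come out close to one. Note that the sequence of states carries no dynamical information.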

Both MD and MC techniques evolve a finite-sized molecular configuration forward in time, in a step-by-step 
fashion. (In this context, MC simulation 'time' has to be interpreted liberally, but there is a broad connection 
between real time and simulation time (see [1, chapter 2]).) Common features of MD and MC simulation
techniques are that there are limits on the typical timescales and length scales that can be investigated. The 
consequences of finite size must be considered both in specifying the molecular interactions, and in analysing 
the results. 


B3.3.2 SIMULATION AND STATISTICAL MECHANICS 

Here we consider various aspects of statistical mechanics (see also chapter A2.3 and [2, 3]) that have a direct 
bearing on computer simulation methodology. 

B3.3.2.1 SIMULATION TIME AND LENGTH SCALES 

Simulation runs are typically short ($t \sim 10^3$-$10^6$ MD or MC steps, corresponding to perhaps a few
nanoseconds of real time) compared with the time allowed in laboratory experiments. This means that we 
need to test whether or not a simulation has reached equilibrium before we can trust the averages calculated in 
it. Moreover, there is a clear need to subject the simulation averages to a statistical analysis, to make a 
realistic estimate of the errors. 


How long should we run? This depends on the system and the physical properties of interest. Suppose that we 
are interested in a variable $\mathcal{X}$, defined such that its ensemble average $X = \langle\mathcal{X}\rangle = 0$. (Here and throughout we use
script letters for instantaneous dynamical variables, i.e., functions of coordinates and momenta, to distinguish
them from averages and thermodynamic quantities.) A characteristic time, $\tau$, may be defined, over which the
correlations $\langle\mathcal{X}(0)\mathcal{X}(t)\rangle$ decay towards zero. The simulation run time $t_{\mathrm{run}}$ should be significantly longer than $\tau$.
The time scales of properties of interest will vary from one system to another; they may not be predictable in 
advance, and this will have a bearing on the length of simulation required. 
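
A correlation time of this kind can be estimated from a recorded time series roughly as follows. The sketch computes the normalized autocorrelation function and sums it up to its first non-positive value; that cutoff rule is one simple convention among several, and the synthetic signal below is an assumption standing in for real simulation output.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(t) of the fluctuations, for lags 0..max_lag."""
    dx = np.asarray(x, dtype=float)
    dx = dx - dx.mean()
    var = np.mean(dx * dx)
    n = len(dx)
    return np.array([np.mean(dx[: n - lag] * dx[lag:]) / var
                     for lag in range(max_lag + 1)])

def correlation_time(acf, dt=1.0):
    """Sum C(t) up to its first non-positive value (a simple cutoff convention)."""
    nonpositive = np.nonzero(acf <= 0.0)[0]
    cut = nonpositive[0] if len(nonpositive) else len(acf)
    return dt * float(np.sum(acf[:cut]))

# Synthetic correlated signal: x[i] = phi * x[i-1] + noise, so C(t) ~ phi**t.
rng = np.random.default_rng(0)
phi = 0.9
noise = rng.normal(size=100_000)
x = np.empty_like(noise)
x[0] = noise[0]
for i in range(1, len(x)):
    x[i] = phi * x[i - 1] + noise[i]

acf = autocorrelation(x, max_lag=200)
tau = correlation_time(acf)
```

For this signal the sum of C(t) over all lags is 1/(1 - phi) = 10 steps, so the estimate should lie near that value; the run length here exceeds it by several orders of magnitude, as the text requires.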

Similar considerations apply to the size of system simulated. The samples involved are typically quite small
on the laboratory scale. Most fall in the range $N \sim$ 10-10 particles, thus imposing a restriction on the length
scales of the phenomena that may be investigated, in the nanometre-submicron range. Indeed, in many cases, 
there is an overriding need to do a system-size analysis of simulation results, to quantify these effects. 

How large a simulation do we need? Once more this depends on the system and properties of interest. From a 
spatial correlation function $\langle\mathcal{X}(0)\mathcal{X}(r)\rangle$ relating values computed at different points $r$ apart, we may define a
characteristic distance $\xi$, over which the correlation decays. The simulation box size $L$ should be significantly
larger than $\xi$, in order not to influence the results.

The ratios, $t_{\mathrm{run}}/\tau$ and $L/\xi$, appear in expressions for estimating the errors on simulation-averaged quantities.
Roughly speaking, a simulation sample can be regarded as a collection of $\sim(L/\xi)^3$ sub-samples, each making a
statistically independent contribution to the average properties. Also, a simulation run may be regarded as a
succession of $\sim t_{\mathrm{run}}/\tau$ statistically independent sub-runs. Then, the usual rules for combining independent
samples apply, and estimated error bars are inversely proportional to the square root of the run time. For 
further information see [4, 5, 6 and 7]. 
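
The practical consequence for error bars can be shown with block averaging, one common way of treating a run as a succession of independent sub-runs. In the sketch below the block length is chosen to be much longer than the correlation time of a synthetic signal; the signal and the block length are assumptions made for illustration.

```python
import numpy as np

def naive_error(x):
    """Standard error of the mean, wrongly assuming every sample is independent."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / np.sqrt(len(x))

def block_error(x, block_length):
    """Standard error of the mean from block averages treated as independent."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_length
    blocks = x[: n_blocks * block_length].reshape(n_blocks, block_length).mean(axis=1)
    return blocks.std(ddof=1) / np.sqrt(n_blocks)

# Synthetic correlated signal with a correlation time of roughly 10 steps.
rng = np.random.default_rng(1)
phi = 0.9
noise = rng.normal(size=100_000)
x = np.empty_like(noise)
x[0] = noise[0]
for i in range(1, len(x)):
    x[i] = phi * x[i - 1] + noise[i]

ratio = block_error(x, block_length=1000) / naive_error(x)
```

For this signal the naive estimate is too small by a factor of about sqrt((1 + phi)/(1 - phi)), roughly 4.4, because the number of independent samples is the run length divided by a multiple of the correlation time rather than the number of steps.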

Near critical points, special care must be taken, because the inequality $L \gg \xi$ will almost certainly not be
satisfied; also, critical slowing down will be observed. In these circumstances a quantitative investigation of 
finite size effects and correlation times, with some consideration of the appropriate scaling laws, must be 
undertaken. Examples of this will be seen later; one of the most encouraging developments of recent years has 
been the establishment of reliable and systematic methods of studying critical phenomena by simulation. 

B3.3.2.2 PERIODIC BOUNDARY CONDITIONS 

Small sample size means that, unless surface effects are of particular interest, periodic boundary conditions 
need to be used. Consider 1000 atoms arranged in a 10 × 10 × 10 cube. Nearly half the atoms are on the outer
faces, and these will have a large effect on the measured properties. Surrounding the cube with replicas of 
itself takes care of this problem. Provided the potential range is not too long, we can adopt the minimum 
image convention that each atom interacts with the nearest atom or image in the periodic array. In the course 
of the simulation, if an atom leaves the basic simulation box, attention can be switched to the incoming image. 
This is shown in figure B3.3.3. Of course, it is important to bear in mind the imposed artificial periodicity
when considering properties which are influenced by long-range correlations. Special attention must be paid 
to the case where the potential range is not short: for example, for charged and dipolar systems. Methods for 
handling this are discussed later. 
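A minimal sketch of these ideas for a cubic box of side `box` is given below. The function names are illustrative only, and the minimum image formula assumes that the potential range does not exceed half the box length.

```python
def surface_atoms(n):
    """Number of atoms on the outer faces of an n x n x n simple cubic arrangement."""
    return n ** 3 - (n - 2) ** 3


def wrap(x, box):
    """Switch attention to the incoming image: map a coordinate back into [0, box)."""
    return x % box


def minimum_image(dx, box):
    """Component of the separation to the nearest periodic image, in [-box/2, box/2]."""
    return dx - box * round(dx / box)


def min_image_distance(ri, rj, box):
    """Distance between atom i and the nearest image of atom j in a cubic box."""
    d2 = 0.0
    for a, b in zip(ri, rj):
        d = minimum_image(a - b, box)
        d2 += d * d
    return d2 ** 0.5
```

For the 10 × 10 × 10 example, `surface_atoms(10)` gives 488, which is the 'nearly half' quoted above.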

Figure B3.3.3. Periodic boundary conditions. As a particle moves out of the simulation box, an image particle
moves in to replace it. In calculating particle interactions within the cutoff range, both real and image 
neighbours are included. 


B3.3.2.3 MOLECULAR INTERACTIONS 

Let us denote a 'state of the system' by $\Gamma$. For the purposes of discussion, we shall concentrate on a system
composed of atoms, and for this $\Gamma$ represents the complete set of coordinates
$\mathbf{r}^{(N)} = (\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)$ and conjugate momenta
$\mathbf{p}^{(N)} = (\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_N)$. Then the energy, or Hamiltonian, may be written as a sum of kinetic and
potential terms $\mathcal{H} = \mathcal{K} + \mathcal{V}$. For atomic systems, $\mathcal{V}$ is a function of coordinates only and $\mathcal{K}$ may be written as a
function of momenta; in molecular systems represented as rigid bodies, or in terms of generalized coordinates,
the kinetic energy may also depend on the coordinates [8].

Sticking, for simplicity, with a simple atomic system, the kinetic energy may be written

$$\mathcal{K}(\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_N) = \sum_{i=1}^{N} \sum_{\alpha = x,y,z} p_{i\alpha}^2 / 2m_i .$$

The potential energy $\mathcal{V}$ is traditionally split into one-body, two-body, three-body . . . terms:

$$\mathcal{V}(\mathbf{r}^{(N)}) = \sum_i v^{(1)}(\mathbf{r}_i) + \sum_i \sum_{j>i} v^{(2)}(\mathbf{r}_i, \mathbf{r}_j) + \sum_i \sum_{j>i} \sum_{k>j} v^{(3)}(\mathbf{r}_i, \mathbf{r}_j, \mathbf{r}_k) + \cdots .$$

The $v^{(1)}$ term represents an externally applied potential field or the effects of the container walls; it is usually
dropped for fully periodic simulations of bulk systems. Also, it is usual to neglect $v^{(3)}$ and higher terms
(which in reality might be of order 10% of the total energy in condensed phases) and concentrate on $v^{(2)}$. For
brevity henceforth we will just call this $v(r)$. There is an extensive literature on the way these potentials are
determined experimentally, or modelled theoretically (see, e.g., [9, 10 and 11]). In simulations, it is common to
use the simplest models that faithfully represent the essential physics: the hard-sphere, square-well, and
Lennard-Jones potentials have the longest history. The last of these has the functional form

$$v^{\rm LJ}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$

with two parameters: $\sigma$, the diameter, and $\varepsilon$, the well depth. This potential was used, for instance, in the
earliest studies of the properties of liquid argon [12, 13]. For molecular systems, we simply build the
molecules out of site-site potentials of this, or similar, form (figure B3.3.4). If electrostatic charges are
present, we add the appropriate Coulomb potentials

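$$v^{\rm Coulomb}(r) = \frac{Q_1 Q_2}{4\pi\epsilon_0 r}$$

where $Q_1$, $Q_2$ are the charges and $\epsilon_0$ is the permittivity of free space. We may also use rigid-body
potentials which depend on centre of mass positions and orientations. An example is the Gay-Berne potential [14]

$$v^{\rm GB}(\mathbf{r}, \hat{\mathbf{u}}_1, \hat{\mathbf{u}}_2) = 4\varepsilon(\hat{\mathbf{r}}, \hat{\mathbf{u}}_1, \hat{\mathbf{u}}_2)\left[\varrho^{-12} - \varrho^{-6}\right]$$

with

$$\varrho = \frac{r - \sigma(\hat{\mathbf{r}}, \hat{\mathbf{u}}_1, \hat{\mathbf{u}}_2) + \sigma_0}{\sigma_0}$$

which depends upon the molecular axis vectors $\hat{\mathbf{u}}_1$ and $\hat{\mathbf{u}}_2$, and on the direction $\hat{\mathbf{r}}$ and magnitude $r$ of the
centre-centre vector $\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2$. The parameter $\sigma_0$ determines the smallest molecular diameter and there are
two orientation-dependent quantities in the above shifted Lennard-Jones form: a diameter
$\sigma(\hat{\mathbf{r}}, \hat{\mathbf{u}}_1, \hat{\mathbf{u}}_2)$ and an energy $\varepsilon(\hat{\mathbf{r}}, \hat{\mathbf{u}}_1, \hat{\mathbf{u}}_2)$. Each quantity depends in a complicated way (not
given here) on parameters characterizing molecular shape and structure. This potential has been extensively used
in the study of molecular liquids and liquid crystals [15, 16, 17, 18, 19 and 20].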

Figure B3.3.4. Lennard-Jones pair potential showing the $r^{-12}$ and $r^{-6}$ contributions.
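The two contributions to the Lennard-Jones potential can be evaluated separately, as in the short sketch below. Reduced units (`epsilon = sigma = 1`) are assumed as defaults purely for illustration, and the function name is not taken from any particular simulation package.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Return (total, repulsive r^-12 part, attractive r^-6 part) of the LJ potential."""
    sr6 = (sigma / r) ** 6
    repulsive = 4.0 * epsilon * sr6 * sr6
    attractive = -4.0 * epsilon * sr6
    return repulsive + attractive, repulsive, attractive
```

The potential passes through zero at `r = sigma` and has its minimum, of depth `epsilon`, at `r = 2**(1/6) * sigma`.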

It is common practice in classical computer simulations not to attempt to represent intramolecular bonds by 
terms in the potential energy function, because these bonds have very high vibration frequencies and should 
really be treated in a quantum mechanical way rather than in the classical approximation. Instead, the bonds 
are treated as being constrained to have fixed length, and some straightforward ways have been devised to 
incorporate these constraints into the dynamics (see later). 

For a wide range of physical problems, a lattice spin system provides a useful, if very coarse-grained, 
description. The great advantage of such an approach is the speed with which such systems may be simulated, 
especially when a single spin may be taken to represent not just one molecule but a larger region of the 
physical system. The state $\Gamma$ of such systems may be specified by a set of discrete or continuous spin values
$\sigma^{(N)} = (\sigma_1, \sigma_2, \ldots, \sigma_N)$, where $\sigma_i = \pm 1$ for the archetypal Ising model, but takes other values for other models.
For Ising-like systems, the energy may be written $\mathcal{H} = -J \sum_{\langle ij \rangle} \sigma_i \sigma_j$, where $\sum_{\langle ij \rangle}$ indicates a sum over nearest
neighbour spins $ij$ and $J$ is a coupling constant. Again, there is much flexibility in the nature of the energy
function. There is an extensive literature on spin simulations [6, 21, 22 and 23] especially in relation to the 
theory of critical phenomena [24]. Spin models are not restricted to the obvious area of magnetic solids; it has 
proved possible to include, for instance, polymer liquids in this class [25], allowing the study of otherwise 
inaccessible behaviour. 
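As a concrete illustration of the Ising energy function, the sketch below evaluates it on a periodic square lattice, counting each nearest-neighbour pair once. The choice of a two-dimensional square lattice, the list-of-rows storage and the function names are assumptions made for the example; the lattice should be at least 3 × 3 so that the periodic neighbours of a site are distinct.

```python
def ising_energy(spins, J=1.0):
    """Energy -J * sum over nearest-neighbour pairs, periodic square lattice.

    spins is a list of rows containing +1 or -1. Each bond is counted once by
    looking only at the neighbour 'below' and the neighbour 'to the right'.
    """
    n_rows, n_cols = len(spins), len(spins[0])
    total = 0
    for i in range(n_rows):
        for j in range(n_cols):
            s = spins[i][j]
            total += s * spins[(i + 1) % n_rows][j]
            total += s * spins[i][(j + 1) % n_cols]
    return -J * total


def flip_energy_change(spins, i, j, J=1.0):
    """Energy change on flipping spin (i, j); only its four bonds are affected."""
    n_rows, n_cols = len(spins), len(spins[0])
    neighbours = (spins[(i + 1) % n_rows][j] + spins[(i - 1) % n_rows][j]
                  + spins[i][(j + 1) % n_cols] + spins[i][(j - 1) % n_cols])
    return 2.0 * J * spins[i][j] * neighbours
```

Because a single spin flip changes only four bonds, the local routine is what makes such models so fast to simulate.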


B3.3.2.4 SIMULATIONS AND ENSEMBLES 

One of the flexibilities of computer simulation is that it is possible to define the thermodynamic conditions 
corresponding to one of many statistical ensembles, each of which may be most suitable for the purpose of the 
study. A knowledge of the underlying statistical mechanics is essential in the design of correct simulation 
methods, and in the analysis of simulation results. Here we describe two of the most common statistical 
ensembles, but examples of the use of other ensembles will appear later in the chapter. 


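The microcanonical ensemble corresponds to an isolated system, with specified number of particles $N$,
volume $V$, and energy $E$. The fundamental thermodynamic potential is the entropy, and it is related to
statistical mechanical quantities as follows:

$$S = k \ln \Omega_{NVE}, \qquad \Omega_{NVE} = \int d\Gamma\, \delta(\mathcal{H}(\Gamma) - E),$$
$$\langle\chi\rangle_{NVE} = \int d\Gamma\, \rho_{NVE}(\Gamma)\, \chi(\Gamma), \qquad \rho_{NVE}(\Gamma) = \Omega_{NVE}^{-1}\, \delta(\mathcal{H}(\Gamma) - E).$$ (B3.3.1)

Here, $\Omega_{NVE}$ is the number of states available to the system at given $NVE$, written as an integral over a thin
energy shell; $\delta(\ldots)$ is the Dirac delta function. The ensemble average $\langle\chi\rangle_{NVE}$ is defined in terms of the
ensemble probability density function $\rho_{NVE}(\Gamma)$.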

The canonical ensemble corresponds to a system of fixed N and V, able to exchange energy with a thermal 
bath at temperature T, which represents the effects of the surroundings. The thermodynamic potential is the 
Helmholtz free energy, and it is related to the partition function Q NVT as follows: 

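$$A = E - TS = -kT \ln Q_{NVT}, \qquad Q_{NVT} = \int d\Gamma\, e^{-\beta\mathcal{H}(\Gamma)},$$
$$\langle\chi\rangle_{NVT} = \int d\Gamma\, \rho_{NVT}(\Gamma)\, \chi(\Gamma), \qquad \rho_{NVT}(\Gamma) = Q_{NVT}^{-1}\, e^{-\beta\mathcal{H}(\Gamma)}.$$ (B3.3.2)

Here $\beta = 1/kT$. In a real system the thermal coupling with surroundings would happen at the surface; in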
simulations we avoid surface effects by allowing this to occur homogeneously. The state of the surroundings 
defines the temperature T of the ensemble. 

Since $\mathcal{H} = \mathcal{K} + \mathcal{V}$, the canonical ensemble partition function factorizes into ideal gas and excess parts, and as a
consequence most averages of interest may be split into corresponding ideal and excess components, which
sum to give the total. In MC simulations, we frequently calculate just the excess or configurational parts: in
this case, $\Gamma$ consists just of the atomic coordinates, not the momenta, and the appropriate expressions are
obtained from equation (B3.3.2) by replacing $\mathcal{H}$ by the potential energy $\mathcal{V}$. The ideal gas contributions are
usually easily calculated from exact expressions, in which the integrations over atomic momenta have been
carried out analytically.

B3.3.2.5 AVERAGES AND DISTRIBUTIONS 

It is generally well known that, for most averages, differences between ensembles disappear in the 
thermodynamic limit. However, for finite-sized systems of the kind studied in simulations, it is necessary to 
consider the differences between ensembles, which will be significant for mean-squared values (fluctuations) 
and, more generally, for the probability distributions of measured quantities. For example, energy fluctuations 
in the constant-$NVE$ ensemble are (by definition) zero, whereas in the constant-$NVT$ ensemble they are not.
Since these points have a bearing on various aspects of simulation methodology, we expand on them a little 
here. 

It is a standard result in the canonical ensemble that energy fluctuations are related to the heat capacity
$C_V = (\partial E/\partial T)_V$:

$$kT^2 C_V = \langle\mathcal{H}^2\rangle - \langle\mathcal{H}\rangle^2 \equiv \langle\delta\mathcal{H}^2\rangle .$$

Since $C_V$ and $E$ are both extensive properties ($\propto N$), the root-mean-square energy fluctuations are smaller, by a
factor $1/\sqrt{N}$, than typical average energies $E$. As the system size increases, the relative magnitude of
fluctuations decreases, and the thermodynamic limit is achieved.
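The fluctuation formula translates directly into a way of estimating the heat capacity from a series of sampled energies. The sketch below assumes canonical-ensemble samples and units in which Boltzmann's constant is `k_B` (default 1). The helper that generates test energies is hypothetical: it draws the total energy of `n_dof` independent classical quadratic degrees of freedom, for which the exact answer is `n_dof * k_B / 2`.

```python
import random


def heat_capacity(energies, kT, k_B=1.0):
    """Estimate C_V from canonical energy samples: kT^2 C_V = <H^2> - <H>^2."""
    n = len(energies)
    mean = sum(energies) / n
    var = sum((e - mean) ** 2 for e in energies) / n
    return k_B * var / (kT * kT)


def sample_quadratic_energies(n_samples, n_dof, kT, seed=0):
    """Hypothetical test data: energy of n_dof quadratic terms is gamma distributed."""
    rng = random.Random(seed)
    return [rng.gammavariate(0.5 * n_dof, kT) for _ in range(n_samples)]
```

In a real simulation successive energies are correlated, so the statistical error on such an estimate should be assessed with the sub-run arguments given earlier.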

It is instructive to see this in terms of the canonical ensemble probability distribution function for the energy,
$\mathcal{P}_{NVT}(E)$. Referring to equation (B3.3.1) and equation (B3.3.2), it is relatively easy to see that

$$\mathcal{P}_{NVT}(E) = \langle\delta(\mathcal{H}(\Gamma) - E)\rangle_{NVT} = \frac{\Omega_{NVE}\, e^{-E/kT}}{Q_{NVT}} .$$

The product of a rapidly increasing function of energy, $\Omega_{NVE}$, and a rapidly decreasing function $e^{-E/kT}$, gives
a distribution of energies which is very sharply peaked about the most likely value, as shown in figure B3.3.5.
A reasonable first approximation to $\mathcal{P}_{NVT}(E)$ is a Gaussian function, centred on this most probable value, with
a width determined by $C_V$.

Figure B3.3.5. Energy distributions. The probability density is proportional to the product of the density of
states and the Boltzmann factor. 
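
In principle, these formulae may be used to convert results obtained at one state point into averages
appropriate to a neighbouring state point. For any canonical ensemble average

$$\langle\chi\rangle_{NVT_1} = \frac{\langle\chi\, e^{-\delta\beta\,\mathcal{H}}\rangle_{NVT_0}}{\langle e^{-\delta\beta\,\mathcal{H}}\rangle_{NVT_0}}$$ (B3.3.3)

where $\delta\beta = \beta_1 - \beta_0$. Choosing $\chi = \delta(\mathcal{H} - E)$ gives a way of re-weighting the energy distribution

$$\mathcal{P}_{NVT_1}(E) \equiv \langle\delta(\mathcal{H} - E)\rangle_{NVT_1} = \frac{\mathcal{P}_{NVT_0}(E)\, e^{-\delta\beta E}}{\langle e^{-\delta\beta\,\mathcal{H}}\rangle_{NVT_0}} .$$ (B3.3.4)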


Such histogram re-weighting techniques have a long history [26, 27, 28 and 29]. The usefulness of this
equation depends sensitively on accurate sampling of energies in the region of interest at $T_1$, which may be
far away from the maximum in $\mathcal{P}_{NVT_0}(E)$. We have seen how the mean-squared fluctuations in $E$ are related to the
heat capacity; higher-order, non-Gaussian terms in $\mathcal{P}_{NVT}(E)$ are also related to thermodynamic derivatives
[30, 31], but they are smaller, and so are hard to measure accurately. This limits the extension to nearby state
points; a common theme of computer simulation is the devising of techniques to get around this problem, and
we shall return to this later. Similar considerations apply to volume distributions in constant-pressure
ensembles, and indeed to other cases of thermodynamically conjugate pairs of variables.
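The re-weighting of an average from inverse temperature $\beta_0$ to a nearby $\beta_1$ can be sketched in a few lines, working directly with the list of sampled energies rather than a binned histogram. The function name and argument layout are illustrative assumptions; a reference energy is subtracted before exponentiating purely for numerical stability, and it cancels between numerator and denominator.

```python
import math


def reweight_average(chi, energies, beta0, beta1):
    """Estimate <chi> at beta1 from samples of (chi, H) generated at beta0.

    Implements <chi>_1 = <chi exp(-dbeta*H)>_0 / <exp(-dbeta*H)>_0.
    """
    dbeta = beta1 - beta0
    # Shift so that every exponent is <= 0; the shift cancels in the ratio.
    e_ref = min(energies) if dbeta > 0 else max(energies)
    weights = [math.exp(-dbeta * (e - e_ref)) for e in energies]
    norm = sum(weights)
    return sum(w * c for w, c in zip(weights, chi)) / norm
```

When $\beta_1$ is far from $\beta_0$ only a handful of samples carry significant weight, which is the practical face of the sampling limitation described above.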

Statistical mechanics may be used to derive practical microscopic formulae for thermodynamic quantities. A 
well-known example is the virial expression for the pressure, easily derived by scaling the atomic coordinates 
in the canonical ensemble partition function 


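$$PV = NkT - \frac{1}{3}\left\langle \sum_i \sum_{j>i} w(r_{ij}) \right\rangle .$$

Here we assumed pairwise additivity $\mathcal{V} = \sum_i \sum_{j>i} v(r_{ij})$, and defined $w(r) = r\,(dv(r)/dr)$. Also easily derived in the
canonical ensemble is the general virial-like form, where $q$ may be any coordinate or momentum,

$$\left\langle \chi\, \frac{\partial\mathcal{H}}{\partial q} \right\rangle = kT \left\langle \frac{\partial\chi}{\partial q} \right\rangle .$$

A well known example of this is obtained by setting $\chi = q = p_{i\alpha}$, $\alpha = x, y, z$, any component of momentum, giving
the equipartition-of-energy relation

$$\left\langle p_{i\alpha}^2 / m_i \right\rangle = kT .$$

This is commonly used to measure the temperature in an MD simulation. Less well known is the hypervirial
relation obtained by setting $\chi = -(\partial\mathcal{V}/\partial r_{i\alpha}) = f_{i\alpha}$, a component of the force, with $q = r_{i\alpha}$:

$$\left\langle \left( \frac{\partial\mathcal{V}}{\partial r_{i\alpha}} \right)^2 \right\rangle = kT \left\langle \frac{\partial^2\mathcal{V}}{\partial r_{i\alpha}^2} \right\rangle$$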


which relates the Laplacian of the potential with the mean-squared force. Butler et al [32] have suggested 
using this expression to measure the 'configurational' temperature (as a check) in MC simulations; they also
provide a derivation of the corresponding expression in the microcanonical ensemble (see also [33]); it is 
surprising that this useful proposal has only been made so recently. 
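Both temperature estimates are simple ratios of averages, as the sketch below indicates. It assumes the caller supplies, for each sampled configuration, the squared gradient of the potential and its Laplacian, each summed over all coordinates. The harmonic-well generator is hypothetical test data and is not part of the proposal of [32].

```python
import math
import random


def kinetic_kT(momenta, masses):
    """kT from equipartition: average of p_{i,alpha}^2 / m_i over all components."""
    total, count = 0.0, 0
    for p, m in zip(momenta, masses):
        for comp in p:
            total += comp * comp / m
            count += 1
    return total / count


def configurational_kT(grad_sq_samples, laplacian_samples):
    """kT from the hypervirial relation: <|grad V|^2> / <laplacian V>.

    Both lists must hold one entry per sampled configuration.
    """
    return sum(grad_sq_samples) / sum(laplacian_samples)


def harmonic_samples(n, kT, kappa, seed=0):
    """Hypothetical test data: Boltzmann samples in V = kappa*|r|^2/2 (3D)."""
    rng = random.Random(seed)
    width = math.sqrt(kT / kappa)
    grad_sq, lap = [], []
    for _ in range(n):
        r = [rng.gauss(0.0, width) for _ in range(3)]
        grad_sq.append(sum((kappa * x) ** 2 for x in r))
        lap.append(3.0 * kappa)
    return grad_sq, lap
```

In an MC run, where no momenta exist, only the configurational estimate is available, which is why it serves as a check that the intended temperature is actually being sampled.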

Finally, by considering increasing the number of particles by one in the canonical ensemble (looking at the 
excess, non-ideal, part), it is easy to derive the Widom [ 34 ] test-particle formula 

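$$\mu^{\rm ex} \approx A^{\rm ex}_{N+1} - A^{\rm ex}_{N} = -kT \ln \left\langle e^{-\beta v_t} \right\rangle .$$ (B3.3.5)

Here we have separated terms in the potential energy which involve the extra 'test' particle, $\mathcal{V}_{N+1} = \mathcal{V}_N + v_t$.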
The ensemble average here includes an unweighted average over inserted particle coordinates. In practice this 
means randomly inserting a test particle, many times, and averaging the Boltzmann factor of the associated 
energy change. More details of free energy calculations will be given later. 
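The insertion procedure can be sketched as follows for a cubic periodic box with a pairwise-additive potential. The function signature is an assumption made for illustration: `configs` is a list of stored configurations (each a list of coordinate triples) produced elsewhere by a canonical-ensemble simulation, and `pair_potential` is any function of the pair distance.

```python
import math
import random


def widom_mu_ex(configs, pair_potential, box, kT, n_insert, seed=0):
    """Excess chemical potential -kT ln <exp(-beta v_t)> by test-particle insertion.

    For each stored configuration, n_insert test particles are placed at
    uniformly random positions; the particles already present are never moved.
    """
    rng = random.Random(seed)
    beta = 1.0 / kT
    total, count = 0.0, 0
    for positions in configs:
        for _ in range(n_insert):
            test = [rng.uniform(0.0, box) for _ in range(3)]
            v_t = 0.0
            for r in positions:
                d2 = 0.0
                for a, b in zip(test, r):
                    d = a - b
                    d -= box * round(d / box)  # minimum image convention
                    d2 += d * d
                v_t += pair_potential(math.sqrt(d2))
            total += math.exp(-beta * v_t)
            count += 1
    # Raises ValueError if every insertion overlapped (average exactly zero).
    return -kT * math.log(total / count)
```

At high density almost every insertion overlaps an existing particle and contributes a vanishing Boltzmann factor, so the average converges slowly.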

B3.3.2.6 TIME DEPENDENCE 


A knowledge of time-dependent statistical mechanics is important in three general areas of simulation. First, 
in recent years there have been significant advances in the understanding of MD algorithms, which have 
arisen out of an appreciation of the formal operator approach to classical mechanics. Second, an 
understanding of equilibrium time correlation functions, their link with dynamical properties and especially 
their connection with transport coefficients, is essential in making contact with experiment. Third, the last 
decade has seen a rapid development of the use of nonequilibrium MD, with a better understanding of the 
formal aspects, particularly the link between the dynamical algorithm, dissipation, chaos and fractal geometry. 
Space does not permit a full description of this here: the interested reader should consult [ 35 , 36 ] and 
references therein. 

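The Liouville equation dictates how the classical statistical mechanical distribution function
$\rho(\mathbf{r}^{(N)}, \mathbf{p}^{(N)}, t)$ evolves in time. (Also, quantum dynamics may be expressed in a formally equivalent way, but we shall
concentrate exclusively on classical systems here.) From considerations of standard, Hamiltonian, mechanics
[8] and the flow of representative systems in an ensemble through a particular region of phase space, it is easy
to derive the Liouville equation

$$\frac{\partial\rho}{\partial t} = -\sum_i \left[ \dot{\mathbf{r}}_i \cdot \frac{\partial}{\partial\mathbf{r}_i} + \dot{\mathbf{p}}_i \cdot \frac{\partial}{\partial\mathbf{p}_i} \right] \rho \equiv -i\mathcal{L}\rho$$

defining the Liouville operator $\mathcal{L}$. Compare this equation for $\rho$ with the time evolution equation for a
dynamical variable $\chi(\mathbf{r}^{(N)}, \mathbf{p}^{(N)})$, which comes directly from the chain rule applied to Hamilton's equations

$$\dot{\chi} = \sum_i \left[ \dot{\mathbf{r}}_i \cdot \frac{\partial}{\partial\mathbf{r}_i} + \dot{\mathbf{p}}_i \cdot \frac{\partial}{\partial\mathbf{p}_i} \right] \chi = i\mathcal{L}\chi .$$

The formal solutions of the time evolution equations are

$$\rho(t) = e^{-i\mathcal{L}t} \rho(0) \quad \text{and} \quad \chi(t) = e^{i\mathcal{L}t} \chi(0).$$ (B3.3.6)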

A number of manipulations are possible, once this formalism has been established. There are useful analogies 
both with the Eulerian and Lagrangian pictures of incompressible fluid flow, and with the Heisenberg and 
Schrodinger pictures of quantum mechanics [37, chapter 7], [38, chapter 11]. These analogies are particularly 
useful in formulating the equations of classical response theory [39], linking transport coefficients with both 
equilibrium and nonequilibrium simulations [35]. 

The Liouville equation applies to any ensemble, equilibrium or not. Equilibrium means that $\rho$ should be
stationary, i.e., that

$$\partial\rho/\partial t = 0.$$

In other words, if we look at any phase-space volume element, the rate of incoming state points should equal
the rate of outflow. This requires that $\rho$ be a function of the constants of the motion, and especially $\rho = \rho(\mathcal{H})$.
Equilibrium also implies $d\langle\chi\rangle/dt = 0$ for any $\chi$. The extension of the above equations to nonequilibrium
ensembles requires a consideration of entropy production, the method of controlling energy dissipation 
(thermostatting) and the consequent non-Liouville nature of the time evolution [35]. 


B3.3.3 MOLECULAR DYNAMICS 


The solution of Newton's or Hamilton's equations on the computer

$$\dot{\mathbf{r}}_i = \mathbf{p}_i/m_i \quad \text{and} \quad \dot{\mathbf{p}}_i = \mathbf{f}_i$$

where $m_i$ is the mass of atom $i$, and $\mathbf{f}_i$ is the total force acting on it, is intrinsically a simple task. Many
methods exist to perform step-by-step numerical integration of systems of coupled ordinary differential
equations. Characteristics of these equations are: (a) they are 'stiff', i.e., there may be short and long time
scales, and the algorithm must cope with both; (b) calculating the forces is expensive, typically involving a 
sum over pairs of atoms, and should be performed as infrequently as possible. 

Also we must bear in mind that the advancement of the coordinates fulfils two functions: (i) accurate 
calculation of dynamical properties, especially over times as long as typical correlation times $\tau$; (ii) accurately
staying on the constant-energy hypersurface, for much longer times $t_{\rm run}$. Exact time reversibility is highly
desirable (since the original equations are exactly reversible). To ensure rapid sampling of phase space, we
wish to make the time step as large as
possible, consistent with these requirements. For these reasons, simulation algorithms have tended to be of 
low order (i.e., they do not involve storing high derivatives of positions, velocities etc): this allows the time 
step to be increased as much as possible without jeopardizing energy conservation. It is unrealistic to expect 
the numerical method to accurately follow the true trajectory for very long times $t_{\rm run}$. The 'ergodic' and
'mixing' properties of classical trajectories, i.e., the fact that nearby trajectories diverge from each other 
exponentially quickly, make this impossible to achieve. 

All these observations tend to favour the Verlet algorithm in one form or another, and we look closely at this 
in the following sections. For historical reasons only, we mention the more general class of predictor-corrector
methods which have been optimized for classical mechanics simulations [40, 41]; further details are
available elsewhere [7, 42, 43 ]. 

B3.3.3.1 THE VERLET ALGORITHM 

There are various, essentially equivalent, versions of the Verlet algorithm, including the original method 
employed by Verlet [13, 44] in his investigations of the properties of the Lennard-Jones fluid, and a 'leapfrog'
form [45]. Here we concentrate on the 'velocity Verlet' algorithm [46], which may be written

$$\mathbf{r}_i(t + \delta t) = \mathbf{r}_i(t) + \delta t\, \mathbf{p}_i(t)/m_i + \tfrac{1}{2} \delta t^2\, \mathbf{f}_i(t)/m_i$$
$$\mathbf{p}_i(t + \delta t) = \mathbf{p}_i(t) + \tfrac{1}{2} \delta t \left[ \mathbf{f}_i(t) + \mathbf{f}_i(t + \delta t) \right] .$$

This advances the coordinates and momenta over a small time step $\delta t$. A piece of pseudo-code illustrates how
this works: 

    call force(r,f)
    do step = 1, nstep
       r = r + dt*p/m + (0.5*dt**2)*f/m
       p = p + 0.5*dt*f
       call force(r,f)
       p = p + 0.5*dt*f
    enddo


The forces are calculated from the positions at the start of a simulation. They are used to advance the 
positions, and 'half-advance' the velocities or momenta. The new forces $\mathbf{f}(t + \delta t)$ are calculated, and these are
used to complete the momentum update. At the end of the step, positions, momenta, and forces all
conveniently refer to the same time point. Moreover, as we shall see shortly, there is an interesting theoretical
derivation of this version of the algorithm. 

Important features of the Verlet algorithm are: (a) it is exactly time reversible; (b) it is low order in time, 
hence permitting long time steps; (c) it is easy to program. 
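A runnable counterpart of the velocity Verlet scheme is sketched below in Python, for particles of equal mass `m` with coordinates and momenta held in flat lists. The calling convention, in which `force` is a function returning the list of forces for given positions, is an assumption made for the example.

```python
def velocity_verlet(r, p, m, force, dt, n_steps):
    """Advance positions r and momenta p by n_steps velocity Verlet steps.

    r and p are flat lists of coordinates and momenta; force(r) returns the
    matching list of force components. All particles share the mass m.
    """
    f = force(r)  # forces at the start of the run
    for _ in range(n_steps):
        # advance positions using current momenta and forces
        r = [ri + dt * pi / m + 0.5 * dt * dt * fi / m
             for ri, pi, fi in zip(r, p, f)]
        # first half of the momentum update, with the old forces
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
        # new forces at the advanced positions
        f = force(r)
        # second half of the momentum update, with the new forces
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
    return r, p
```

Time reversibility can be checked directly: integrate forwards, reverse the sign of every momentum, integrate for the same number of steps, and the starting positions should be recovered to within round-off.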

B3.3.3.2 PROPAGATORS AND THE VERLET ALGORITHM

The velocity Verlet algorithm may be derived by considering a standard approximate decomposition of the 
Liouville operator which preserves reversibility and is symplectic (which implies that volume in phase space 
is conserved). This approach [ 47 ] has had several beneficial consequences. 

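The Liouville propagator of equation (B3.3.6) may be written [48]

$$e^{i\mathcal{L}t} \approx \left( e^{i\mathcal{L}\delta t}_{\rm approx} \right)^{P}$$

where $\delta t = t/P$ and an approximate propagator, correct at short time steps $\delta t \to 0$, appears in the parentheses.
This is a formal way of stating what we do in MD, when we split a long time period $t$ into a large number $P$ of
small time steps $\delta t$, using an approximation to the true equations of motion over each time step. It turns out
that useful approximations arise from splitting $i\mathcal{L}$ into two parts

$$i\mathcal{L} = i\mathcal{L}_p + i\mathcal{L}_r .$$

The following approximation

$$e^{i\mathcal{L}\delta t} = e^{(i\mathcal{L}_p + i\mathcal{L}_r)\delta t} \approx e^{i\mathcal{L}_p \delta t/2}\, e^{i\mathcal{L}_r \delta t}\, e^{i\mathcal{L}_p \delta t/2}$$ (B3.3.7)

is asymptotically exact in the limit $\delta t \to 0$. For nonzero $\delta t$ this is an approximation to $e^{i\mathcal{L}\delta t}$ because in
general $\mathcal{L}_p$ and $\mathcal{L}_r$ do not commute, but it is still exactly time reversible. Tuckerman et al [47] set

$$i\mathcal{L}_r = \sum_i \frac{\mathbf{p}_i}{m_i} \cdot \frac{\partial}{\partial\mathbf{r}_i}, \qquad i\mathcal{L}_p = \sum_i \mathbf{f}_i \cdot \frac{\partial}{\partial\mathbf{p}_i} .$$

A straightforward derivation (not reproduced here) shows that the effect of the three successive steps
embodied in equation (B3.3.7), with the above choice of operators, is precisely the velocity Verlet algorithm.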
This approach is particularly useful for generating multiple time-step methods. 

B3.3.3.3 MULTIPLE TIME STEPS 

An important extension of the MD method allows it to tackle systems with multiple time scales: for example, 
molecules which have very strong internal springs representing the bonds, while interacting externally 
through softer potentials, or molecules consisting of both heavy and light atoms. A simple MD algorithm will 
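have to adopt a time step short enough to handle the fast-varying internal motions. Tuckerman et al [47] set
out methods for generating time-reversible Verlet-like algorithms using the Liouville operator formalism
described above. Here we suppose that there are two types of force in the system: slowly varying external
forces $\mathbf{F}_i$ and rapidly varying internal forces $\mathbf{f}_i$. The momentum satisfies $\dot{\mathbf{p}}_i = \mathbf{f}_i + \mathbf{F}_i$. Then we break up the
Liouville operator

$$i\mathcal{L} = i\mathcal{L}_P + i\mathcal{L}_p + i\mathcal{L}_r, \qquad i\mathcal{L}_P = \sum_i \mathbf{F}_i \cdot \frac{\partial}{\partial\mathbf{p}_i}, \quad i\mathcal{L}_p = \sum_i \mathbf{f}_i \cdot \frac{\partial}{\partial\mathbf{p}_i}, \quad i\mathcal{L}_r = \sum_i \frac{\mathbf{p}_i}{m_i} \cdot \frac{\partial}{\partial\mathbf{r}_i} .$$

The propagator approximately factorizes

$$e^{i\mathcal{L}\Delta t} \approx e^{i\mathcal{L}_P \Delta t/2}\, e^{(i\mathcal{L}_p + i\mathcal{L}_r)\Delta t}\, e^{i\mathcal{L}_P \Delta t/2}$$

where $\Delta t$ represents a long time step. The middle part is then split again, using the conventional separation,
and iterating over small time steps $\delta t = \Delta t/P$:

$$e^{(i\mathcal{L}_p + i\mathcal{L}_r)\Delta t} \approx \left[ e^{i\mathcal{L}_p \delta t/2}\, e^{i\mathcal{L}_r \delta t}\, e^{i\mathcal{L}_p \delta t/2} \right]^{P} .$$

So the fast-varying forces must be computed many times at short intervals; the slow-varying forces are used
just before and just after this stage, and they only need be calculated once per long time step.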

This actually translates into a fairly simple algorithm, based closely on the standard velocity Verlet method. 
Written in a Fortran-like pseudo-code, it is as follows. At the start of the run we calculate both rapidly-varying 
(f) and slowly-varying (F) forces, then, in the main loop:

    do STEP = 1, NSTEP
       p = p + 0.5*DT*F
       do step = 1, nstep
          r = r + dt*p/m + (0.5*dt**2)*f/m
          p = p + 0.5*dt*f
          call force(r,f)
          p = p + 0.5*dt*f
       enddo
       call FORCE(r,F)
       p = p + 0.5*DT*F
    enddo

The entire simulation run consists of NSTEP long steps; each step consists of nstep shorter sub-steps. DT 
and dt are the corresponding time steps, DT = nstep*dt. 
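The same double loop is sketched here in runnable Python, for particles of equal mass `m` with coordinates and momenta held in flat lists. The function and argument names (`respa`, `fast_force`, `slow_force`, `big_dt`) are choices made for this example only, and the two force routines are assumed to return lists matching the positions.

```python
def respa(r, p, m, fast_force, slow_force, big_dt, n_inner, n_outer):
    """Two-level multiple time-step integrator built on velocity Verlet.

    slow_force is evaluated once per long step big_dt; fast_force is
    evaluated n_inner times per long step, with short step big_dt / n_inner.
    """
    dt = big_dt / n_inner
    f = fast_force(r)
    F = slow_force(r)
    for _outer in range(n_outer):
        # half kick with the slowly varying forces, long time step
        p = [pi + 0.5 * big_dt * Fi for pi, Fi in zip(p, F)]
        for _inner in range(n_inner):
            # ordinary velocity Verlet using only the rapidly varying forces
            r = [ri + dt * pi / m + 0.5 * dt * dt * fi / m
                 for ri, pi, fi in zip(r, p, f)]
            p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
            f = fast_force(r)
            p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
        # recompute the slow forces once, then complete the long-step kick
        F = slow_force(r)
        p = [pi + 0.5 * big_dt * Fi for pi, Fi in zip(p, F)]
    return r, p
```

A simple check is a particle bound by a stiff spring and a weak spring, treated as the fast and slow forces respectively; the total energy should then stay close to its initial value.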

A particularly fruitful application, which has been incorporated into the computer code ORAC [49], is to split 
the interatomic force law into a succession of components covering different ranges: the short-range forces 
change rapidly with time and require a short time step, but advantage can be taken of the much slower time 
variation of the long-range forces, by using a longer time step and less frequent evaluation for these. Having 
said this, multiple time step algorithms are still under active study [50], and there is some concern that 
resonances may occur between the natural frequencies of the system under study, and the various time steps
used in schemes of this kind [51].

B3.3.3.4 CONSTRAINTS

Although, in principle, multiple time steps provide a method of integrating stiff degrees of freedom, such as 
intramolecular bonds, the alternative of rigidly constraining bond lengths is still very popular. In classical 
mechanics, constraints are introduced through the Lagrangian [8] or Hamiltonian [52] formalisms. Given a set 
of algebraic relations between atomic coordinates, for example, a fixed bond length b between atoms 1 and 2 

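$$\phi(\mathbf{r}_1, \mathbf{r}_2) = (\mathbf{r}_1 - \mathbf{r}_2) \cdot (\mathbf{r}_1 - \mathbf{r}_2) - b^2 = 0$$

the constraint force between the atoms will have the following form

$$\mathbf{g}_1 = \lambda \frac{\partial\phi}{\partial\mathbf{r}_1} \quad \text{and} \quad \mathbf{g}_2 = \lambda \frac{\partial\phi}{\partial\mathbf{r}_2}$$

and it will appear in the equations of motion along with the normal forces. It is easy to derive an exact
expression for the multiplier $\lambda$; for many constraints, a system of equations (one per constraint) is obtained. In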
practice, since the equations of motion are only solved approximately, the constraints will be increasingly 
violated as the simulation proceeds. The breakthrough in this area came with the proposal of a scheme, 
SHAKE, to solve the equations for the constraint forces approximately (i.e., to the same level of 
approximation as the dynamical algorithm) in such a way that the constraints are satisfied exactly at the end of 
each time step [53, 54]; for a review see [55]. The appropriate version of this scheme for the velocity Verlet 
algorithm is called RATTLE [56]. 
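The flavour of such a scheme can be conveyed by a sketch for a single bond, in the spirit of SHAKE. It is an illustration under stated assumptions, not a reproduction of the published algorithms: the positions after an unconstrained step are corrected iteratively along the bond vector from the start of the step, until the bond length is restored to within a tolerance. The function name and argument layout are hypothetical.

```python
def shake_bond(r1_new, r2_new, r1_old, r2_old, m1, m2, b, tol=1e-10, max_iter=50):
    """Correct two unconstrained positions so that |r1 - r2| = b.

    The correction acts along the bond vector at the start of the step
    (r1_old - r2_old), weighted by inverse mass, so the centre of mass of
    the pair is unchanged.
    """
    r_old = [a - c for a, c in zip(r1_old, r2_old)]
    r1, r2 = list(r1_new), list(r2_new)
    for _ in range(max_iter):
        s = [a - c for a, c in zip(r1, r2)]
        diff = sum(x * x for x in s) - b * b
        if abs(diff) < tol * b * b:
            return r1, r2
        s_dot_old = sum(x * y for x, y in zip(s, r_old))
        # linearized solution for the multiplier of this constraint
        g = diff / (2.0 * (1.0 / m1 + 1.0 / m2) * s_dot_old)
        r1 = [x - g * y / m1 for x, y in zip(r1, r_old)]
        r2 = [x + g * y / m2 for x, y in zip(r2, r_old)]
    raise RuntimeError("constraint iteration did not converge")
```

With several coupled constraints the same correction is applied to each bond in turn, and the sweep is repeated until all are satisfied.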

It is important to realize that a simulation of a system with rigidly constrained bond lengths is not equivalent 
to a simulation with, for example, harmonic springs representing the bonds, even within the limit of very 
strong springs. One obvious point is that the momenta conjugated to the bond coordinates are nonzero and 
store some kinetic energy in the spring case, while they are zero by definition in the constrained case. A 
subtle, but crucial, consequence of this is that it has an effect on the distribution function for the other 
coordinates. If we obtain the configurational distribution function by integrating over the momenta, the 
difference arises because in one case a set of momenta is set to zero, and not integrated, while in the other an 
integration is performed, which may lead to an extra term depending on particle coordinates. This is 
frequently called the 'metric tensor problem'; it is explained in more detail in [7, 52], and there are
well-established ways of determining when the difference is likely to be significant [58] and how to handle it, if
necessary [59]. 

Some people prefer to use the multiple time step approach to handle fast degrees of freedom, while others 
prefer to use constraints, and there are situations in which both techniques are applicable. Constraints also find 
an application in the study of rare events, where a system may be studied at the top of a free energy barrier 
(see later), or for convenience when it is desired to fix a thermodynamic order parameter or ordering direction 
[17]. 

B3.3.3.5 NEIGHBOUR LISTS 

In the inner loops of MD and MC programs, we consider an atom $i$ and loop over all atoms $j$ to calculate the
minimum image separations. If $r_{ij} > r_c$, the potential cutoff, the program skips to the end of the inner loop,
avoiding expensive calculations, and considers the next neighbour. In this method, the time to examine all pair
separations is proportional to $N^2$; for every pair, one must compute at least $r_{ij}^2$; this still consumes a lot of
time. Some economies result from the use of lists of nearby pairs of atoms.

Verlet [ 13 ] suggested a technique for improving the speed of a program by maintaining a list of neighbours. 
The potential cutoff sphere, of radius $r_c$, around a particular atom is surrounded by a 'skin', to give a larger
sphere of radius $r_{\rm list}$, as shown in figure B3.3.6. At the first step in a simulation, a list is constructed of all the
neighbours of each atom, for which the pair separation is within $r_{\rm list}$. Over the next few MD time steps, only
pairs appearing in the list are checked in the force routine. From time to time the list is reconstructed: it is
important to do this before any unlisted pairs have crossed the safety zone and come within interaction range.
It is possible to trigger the list reconstruction automatically, if a record is kept of the distance travelled by
each atom since the last update. The choice of list cutoff distance $r_{\rm list}$ is a compromise: larger lists will need to
be reconstructed less frequently, but will not give as much of a saving on cpu time as smaller lists. This choice 
can easily be made by experimentation. 
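A sketch of list construction and of an automatic rebuild test is given below for a cubic periodic box. The rebuild criterion used here (any atom having moved more than half the skin thickness since the list was built) is a conservative choice made for the example; the function names are likewise illustrative.

```python
def _dist2(a, b, box):
    """Squared minimum-image distance between two points in a cubic box."""
    d2 = 0.0
    for x, y in zip(a, b):
        d = x - y
        d -= box * round(d / box)
        d2 += d * d
    return d2


def build_verlet_list(positions, box, r_list):
    """For each atom i, list the atoms j > i lying within r_list of it."""
    n = len(positions)
    neighbours = [[] for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            if _dist2(positions[i], positions[j], box) < r_list * r_list:
                neighbours[i].append(j)
    return neighbours


def needs_rebuild(positions, positions_at_build, box, r_cut, r_list):
    """True if any atom has moved further than half the skin since the last build."""
    half_skin2 = (0.5 * (r_list - r_cut)) ** 2
    return any(_dist2(r, r0, box) > half_skin2
               for r, r0 in zip(positions, positions_at_build))
```

Two atoms that each move by less than half the skin cannot approach each other by more than the full skin, so an unlisted pair cannot enter the cutoff sphere undetected.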



Figure B3.3.6. The Verlet list on its construction, later, and too late. The potential cutoff range, and the list 
range, are indicated. The list must be reconstructed before particles originally outside the list range have 
penetrated the potential cutoff sphere. 

For larger systems ($N > 1000$ or so, depending on the potential range) another technique becomes preferable.
The cubic simulation box (extension to noncubic cases is possible) is divided into a regular lattice of
$n_c \times n_c \times n_c$ cells; see figure B3.3.7. These cells are chosen so that the side of the cell $r_{\rm cell} = L/n_c$ is greater than the
potential cutoff distance $r_c$. If there is a separate list of atoms in each of those cells, then searching through the
neighbours is a rapid process: it is only necessary to look at atoms in the same cell as the atom of interest, and 
in nearest neighbour cells. The cell structure may be set up and used by the method of linked lists [45, 60 ]. 
The first part of the method involves sorting all the atoms into their appropriate cells. This sorting is rapid, 
and may be performed at every step. Then, within the force routine, pointers are used to scan through the 
contents of cells, and calculate pair forces. This approach is very efficient for large systems with short-range 
forces. A certain amount of unnecessary work is done because the search region is cubic, not (as for the Verlet 
list) spherical. 
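The linked-list idea can be sketched as follows: a `head` array holds one atom index per cell, and a `nxt` array chains the remaining atoms of that cell together. This is an illustrative implementation with hypothetical names, assuming a cubic box and at least three cells per side so that the 27 cells searched are all distinct. A brute-force routine is included only for comparison.

```python
import itertools


def _dist2(a, b, box):
    d2 = 0.0
    for x, y in zip(a, b):
        d = x - y
        d -= box * round(d / box)  # minimum image convention
        d2 += d * d
    return d2


def build_cells(positions, box, r_cut):
    """Sort atoms into cells of side >= r_cut using head/next linked lists."""
    n_c = int(box / r_cut)
    if n_c < 3:
        raise ValueError("box too small for a cell list; use a plain double loop")
    cell_len = box / n_c
    head = [-1] * (n_c ** 3)
    nxt = [-1] * len(positions)
    for i, pos in enumerate(positions):
        ix, iy, iz = (int((x % box) / cell_len) % n_c for x in pos)
        c = (ix * n_c + iy) * n_c + iz
        nxt[i] = head[c]  # previous head becomes the next atom in the chain
        head[c] = i
    return n_c, head, nxt


def pairs_cell_list(positions, box, r_cut):
    """Set of pairs (i, j), i < j, within r_cut, found via own and adjacent cells."""
    n_c, head, nxt = build_cells(positions, box, r_cut)
    pairs = set()
    for ix, iy, iz in itertools.product(range(n_c), repeat=3):
        i = head[(ix * n_c + iy) * n_c + iz]
        while i != -1:
            for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
                jx, jy, jz = (ix + dx) % n_c, (iy + dy) % n_c, (iz + dz) % n_c
                j = head[(jx * n_c + jy) * n_c + jz]
                while j != -1:
                    if i < j and _dist2(positions[i], positions[j], box) < r_cut * r_cut:
                        pairs.add((i, j))
                    j = nxt[j]
            i = nxt[i]
    return pairs


def pairs_brute_force(positions, box, r_cut):
    """Reference all-pairs search, for checking the cell-list result."""
    n = len(positions)
    return {(i, j) for i in range(n - 1) for j in range(i + 1, n)
            if _dist2(positions[i], positions[j], box) < r_cut * r_cut}
```

The sorting pass costs a time proportional to the number of atoms, which is why it can be repeated at every step.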

Figure B3.3.7. The cell structure. The potential cutoff range is indicated. In searching for neighbours of an
atom, it is only necessary to examine the atom's own cell, and its nearest-neighbour cells. 


B3.3.3.6 LONG-RANGE FORCES

Many realistic simulations will involve the Coulomb interaction between charges, which decreases with
separation as $r^{-1}$, and the dipole-dipole interaction, which decreases as $r^{-3}$. These cannot be treated simply by
applying a spherical cutoff: it is essential to consider the effects of the surrounding medium. Two somewhat 
different techniques have been used in the majority of computer simulations to handle long-range forces: the 
reaction field method and the Ewald sum. 


In the reaction field method, the space surrounding a dipolar molecule is divided into two regions: (i) a cavity, 
within which electrostatic interactions are summed explicitly, and (ii) a surrounding medium, which is 
assumed to act like a smooth continuum, and is assigned a dielectric constant $\epsilon_s$. Ideally, this quantity will be
equal to the dielectric constant $\epsilon$ of the liquid itself, but calculating this, of course, is frequently one of the
goals of the simulation, not one of the input parameters. The essence of the reaction field method is to 
calculate the total dipole moment of the molecules in the cavity, hence obtaining the polarization of the 
surrounding continuum, and to use this to work out the reaction field on the molecule at the centre. This 
supplements the direct electrostatic interaction with molecules in the cavity, in the calculation of the total 
energy. The reaction field method was used by Barker and Watts [ 61 ] in early simulations of water and has 
been discussed by Neumann and Steinhauser [62] and Patey et al [63]. 

In the Ewald method, a lattice sum is performed over charges within the periodically repeating simulation 
box. This is a subtle matter, since the sum, for Coulomb potentials, is only conditionally convergent: the result 
depends on the order of terms. Nonetheless, the procedure has been carefully analysed by de Leeuw et al [ 64 ] 
and Felderhof [65]. To make the summation a practical proposition, a trick is used: each point charge is 
screened by a surrounding, Gaussian, charge distribution, which makes the interactions short-ranged: these 
interactions are tackled in real space in the usual way. The contribution of an equal and opposite set of 
Gaussians is tackled in reciprocal space, using Fourier transforms. The choice of the width of the Gaussians is 
a parameter which may be varied to optimize the speed of calculation (for a given accuracy): Perram et al [ 66 ] 
have shown that the optimal choice leads to an algorithm whose expense grows as $N^{3/2}$. 
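The screening trick can be illustrated with the exact identity $1/r = \mathrm{erfc}(\kappa r)/r + \mathrm{erf}(\kappa r)/r$, which is what a Gaussian screening cloud produces for a unit point charge. The sketch below only demonstrates this split; it is not an Ewald implementation. The name `kappa` (an inverse Gaussian width) is an assumption, since the text refers only to "the width of the Gaussians".

```python
import math

def split_coulomb(r, kappa):
    """Split the bare Coulomb kernel 1/r into a short-range and a long-range part.

    kappa is an assumed name for the inverse width of the screening Gaussians.
    The short-range part decays rapidly and is summed in real space; the smooth
    long-range part is the one handled in reciprocal space.
    """
    short_range = math.erfc(kappa * r) / r
    long_range = math.erf(kappa * r) / r
    return short_range, long_range
```

Increasing `kappa` shortens the range of the real-space part at the cost of more reciprocal-space terms, which is the trade-off behind optimizing the Gaussian width.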

When carried out properly, the results of the reaction field method and the Ewald sum are consistent [67]. 
Recently, the reaction field method has been recommended on grounds of efficiency and ease of programming 
[68,69]. The expense of the Ewald method, particularly as the system size grows, has led to the search for alternative 
formulations [see, e.g., 70]. Recently, the practical implementation of the Ewald method has been 
significantly improved. The smooth particle mesh Ewald method, inspired by the approach of [45], employs a 
mesh and an interpolation scheme to allow evaluation of the reciprocal space sums using fast Fourier 
transforms [71, 72]. This approach has been incorporated into a standard code [ 49 ] and seems very promising 
for large biomolecular systems. It has to be said that there are still some subtleties involved in the handling of 
long-range forces [see, e.g., 73] and the reader should consult carefully the references if approaching the 
simulation of such systems from the beginning. 


B3.3.4 MONTE CARLO 

It is important to realize that MC simulation does not provide a way of calculating the statistical mechanical 
partition function: instead, it is a method of sampling configurations from a given statistical ensemble and 
hence of calculating ensemble averages. A complete sum over states would be impossibly time consuming for 
systems consisting of more than a few atoms. Applying the trapezoidal rule, for instance, to the 
configurational part of $Q_{NVT}$, entails discretizing each atomic coordinate on a fine grid; then the 
dimensionality of the integral is extremely high, since there are $3N$ such coordinates, so the total number of 
grid points is astronomically high. The MC integration method is sometimes used to estimate 
multidimensional integrals by randomly sampling points. This is not feasible here, since a very small 
proportion of all points would be sampled in a reasonable time, and very few, if any, of these would have a 
large enough Boltzmann factor to contribute significantly to the partition function. MC simulation differs 
from such methods by sampling points in a nonuniform way, chosen to favour the important contributions. 

B3.3.4.1 IMPORTANCE SAMPLING 

MC simulation is a method of concentrating the sampled points in the important regions, namely the regions 
with high Boltzmann factor $e^{-\beta\mathcal{H}}$: a random walk is devised, moving from one point to the next, with a biasing 
probability chosen to generate the desired distribution. Unfortunately, a consequence of this approach is that it 
is no longer possible to estimate the partition function itself, merely ratios of sums over states, that is, 
ensemble averages. Suppose that we have succeeded in selecting states $\Gamma$ with probability proportional to 
$\varrho(\Gamma) = \exp\{-\beta\mathcal{H}(\Gamma)\}$. Then, if we have conducted $N_t$ 'observations' or 'steps' in the process, the ensemble average 
of a property $A$ becomes an average over steps 

$$\langle A \rangle = \frac{1}{N_t}\sum_{t=1}^{N_t} A(\Gamma_t).$$

The Boltzmann weight appears implicitly in the way the states are chosen. The form of the above equation is 
like a time average as calculated in MD. The MC method involves designing a stochastic algorithm for 
stepping from one state of the system to the next, generating a trajectory. This will take the form of a Markov 
chain, specified by transition probabilities which are independent of the prior history of the system. 

Write $\varrho(\Gamma) = \varrho_\Gamma$, treating it as a component of a (very large) column vector. Consider an ensemble of systems 
all evolving at once. Specify a matrix whose elements $\pi_{\Gamma'\leftarrow\Gamma}$ give the probability of going to state $\Gamma'$ from 
state $\Gamma$, for every pair of states. The matrix must satisfy $\sum_{\Gamma'}\pi_{\Gamma'\leftarrow\Gamma} = 1$, to conserve probability. At each step, 
implement jumps with this transition matrix. This generates a Markov chain of states, i.e., one in which the 
transition probabilities do not depend on the history. Feller's theorem [ 74 ] tells us that, subject to some 
reasonable conditions, there exists a limiting (equilibrium) distribution of states and the system will tend 
towards this limiting distribution. (Recently [75], it has been shown that the Markov condition can be relaxed, 
and the system will still behave in this way.) A little thought shows that the limiting distribution will satisfy 


$$\varrho_{\Gamma'} = \sum_{\Gamma}\pi_{\Gamma'\leftarrow\Gamma}\,\varrho_{\Gamma}$$


which is a matrix eigenvalue equation. The eigenvector is already known: it is the Boltzmann distribution. 
The MC method is specified by choosing a transition matrix which satisfies this equation. One way of 
guaranteeing this is to ensure that 

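$$\pi_{\Gamma'\leftarrow\Gamma}\,\varrho_{\Gamma} = \pi_{\Gamma\leftarrow\Gamma'}\,\varrho_{\Gamma'}$$

which is usually termed the microscopic reversibility condition. An immediate consequence of this is that the 
ratio of probabilities $\varrho_{\Gamma'}/\varrho_{\Gamma}$ is equal to the ratio of transition matrix elements $\pi_{\Gamma'\leftarrow\Gamma}/\pi_{\Gamma\leftarrow\Gamma'}$. This 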

relationship is analogous to that relating the equilibrium constant for a chemical reaction to the ratio of 
forward and backward rate constants. 

The most commonly used prescription [ 76 ] is 

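$$\pi_{\Gamma'\leftarrow\Gamma} = \alpha_{\Gamma'\leftarrow\Gamma}\,\min(1, \varrho_{\Gamma'}/\varrho_{\Gamma}) \quad (\Gamma' \neq \Gamma), \qquad \pi_{\Gamma\leftarrow\Gamma} = 1 - \sum_{\Gamma'\neq\Gamma}\pi_{\Gamma'\leftarrow\Gamma} \quad \text{otherwise}.$$

Here, an underlying matrix, with elements $\alpha_{\Gamma'\leftarrow\Gamma}$, sets the probability of attempting a move like $\Gamma' \leftarrow \Gamma$, 
and the other factor gives the probability of accepting such a move. This scheme only requires a knowledge of 
the ratio $\varrho_{\Gamma'}/\varrho_{\Gamma}$: 

$$\min(1, \varrho_{\Gamma'}/\varrho_{\Gamma}) = \min\bigl(1, e^{-\beta[\mathcal{H}(\Gamma') - \mathcal{H}(\Gamma)]}\bigr) = \min\bigl(1, e^{-\beta\,\delta\mathcal{H}}\bigr).$$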

It does not require knowledge of the factor normalizing the $\varrho_\Gamma$, i.e., the partition function. For atomic and 
molecular systems, the partition function is split into a product of 'ideal' (exactly calculable) and 'excess' 
terms: the position and momentum distributions also factorize, and we wish to sample 

$$\varrho_{NVT}(\mathbf{r}) \propto \exp\{-\beta\mathcal{V}(\mathbf{r})\}.$$

The prescription for accepting or rejecting moves is exactly as written before, but with $\mathcal{V}$ replacing $\mathcal{H}$. 
Assuming that the interaction potential is short-ranged, it is not necessary to perform a complete recalculation 
of $\mathcal{V}$ every time an atom is moved: just the part involving that atom. For a given trial move, this is done twice: 
once before the attempted move and once after. Some improvement in efficiency may be obtained by using 
neighbour lists, as described earlier for MD. 

Selecting trial moves in an unbiased way typically means (a) choose an atom 'randomly', with equal 
probability from the complete set; (b) displace it by random amounts in the x, y and z directions, chosen 
independently and uniformly from a predefined range (symmetric about the origin). These choices use a 
random number generator; the quality of the random numbers may be an issue. A key aspect of move 
selection is that the probabilities for attempting forward and reverse moves must be equal, so 
$\alpha_{\Gamma'\leftarrow\Gamma} = \alpha_{\Gamma\leftarrow\Gamma'}$. For the above prescription, it should be evident that this is true, by considering the number of ways of 
selecting an atom, and the number of positions it might be moved to (assuming a fine discretization of space). 
In the case of a rigid molecule, move selection will include a procedure for randomly rotating a molecule in 
an unbiased way. The magnitudes of trial moves are parameters of the method, chosen to give a reasonable 
acceptance rate, traditionally 50% or so. There is no special reason for this value. Ideally, for every study, one 
would investigate which choice gives the most efficient sampling of phase space. 
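The move selection and accept/reject steps described above can be sketched as follows. This is a minimal illustration, not a production routine: `energy_of_atom` is a hypothetical user-supplied function returning the part of the potential energy that involves atom `i`, and `max_disp` is an assumed name for the maximum trial displacement.

```python
import math
import random

def metropolis_sweep(positions, energy_of_atom, beta, max_disp, rng=random):
    """Perform N attempted single-atom moves; return the number accepted.

    positions      : list of (x, y, z) tuples, updated in place
    energy_of_atom : hypothetical callable (positions, i) -> energy involving atom i
    beta           : 1/kT
    max_disp       : maximum displacement in each direction (tuning parameter)
    """
    n = len(positions)
    accepted = 0
    for _ in range(n):
        i = rng.randrange(n)                      # (a) choose an atom with equal probability
        old = positions[i]
        e_old = energy_of_atom(positions, i)      # energy before the attempted move
        # (b) uniform displacement, symmetric about the origin, in x, y and z
        positions[i] = tuple(c + rng.uniform(-max_disp, max_disp) for c in old)
        e_new = energy_of_atom(positions, i)      # energy after the attempted move
        delta = e_new - e_old
        # accept with probability min(1, exp(-beta * delta))
        if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
            accepted += 1
        else:
            positions[i] = old                    # rejected: restore the old position
    return accepted
```

Dividing the returned count by the number of attempts gives the acceptance rate, which can be monitored while adjusting `max_disp`.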

The analysis of Manousiouthakis and Deem [ 75 ] mentioned above has demonstrated that it is also correct to 
choose atoms sequentially rather than randomly: it has been tacitly assumed for many years that this violation 
of the Markovian restriction is acceptable, so a proof of this kind is very welcome. 

B3.3.4.2 WEIGHTED AND BIASED SAMPLING 

It is useful to write down here the basic formulae for sampling with an additional weight function applied, 
sometimes called non-Boltzmann or umbrella sampling, and for sampling when the selection of trial moves is 
done in a biased way, i.e., the $\alpha$ matrix is not symmetrical. 

A weight factor $W(\Gamma)$ may be introduced in the MC sampling algorithm, to generate a modified, or 
'weighted', distribution, 

$$\varrho_W(\Gamma) \propto W(\Gamma)\exp\{-\beta\mathcal{H}(\Gamma)\}.$$

The usual MC procedure is adopted, with trial moves $\Gamma' \leftarrow \Gamma$ selected as usual, but now accepted with 
probability 

$$\min\left(1, \frac{W(\Gamma')}{W(\Gamma)}\,e^{-\beta\,\delta\mathcal{H}}\right).$$

In the calculation of ensemble averages, we correct for the weighting as follows 

$$\langle A \rangle = \frac{\langle A/W \rangle_W}{\langle 1/W \rangle_W}$$

where $\langle\ldots\rangle_W$ represents the weighted simulation averages. This kind of sampling may be useful when the 
most important states for our purposes are not those which have the highest weights in the canonical 
ensemble: for example, when we wish to compute a free energy difference between two states. We will return 
to this later. 
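As a small illustration of the correction step, the sketch below recovers an unweighted average from samples generated with a weight function, using the ratio of the weighted averages of $A/W$ and $1/W$. The function and argument names are illustrative assumptions.

```python
def unweighted_average(a_values, w_values):
    """Recover <A> from a weighted simulation: <A> = <A/W>_W / <1/W>_W.

    a_values : property A evaluated at each sampled state
    w_values : weight W evaluated at the same states (must be non-zero)
    """
    numerator = sum(a / w for a, w in zip(a_values, w_values))
    denominator = sum(1.0 / w for w in w_values)
    # The common factor 1/N_t in both averages cancels in the ratio.
    return numerator / denominator
```

With a constant weight the result reduces to the plain sample mean, as it should.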

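Biased move selection means that $\alpha_{\Gamma'\leftarrow\Gamma} \neq \alpha_{\Gamma\leftarrow\Gamma'}$. Suppose that we wish, nonetheless, to sample the 
canonical distribution. To do this, we need to calculate the ratio $\alpha_{\Gamma'\leftarrow\Gamma}/\alpha_{\Gamma\leftarrow\Gamma'}$ as well as $\varrho_{\Gamma'}/\varrho_{\Gamma} = e^{-\beta\,\delta\mathcal{H}}$. 
Then we accept the move with probability 

$$\min\left(1, \frac{\alpha_{\Gamma\leftarrow\Gamma'}}{\alpha_{\Gamma'\leftarrow\Gamma}}\,e^{-\beta\,\delta\mathcal{H}}\right).$$

A consideration of the transition probabilities allows us to prove that microscopic reversibility holds, and that 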
canonical ensemble averages are generated. This approach has greatly extended the range of simulations that 
can be performed. An early example was the preferential sampling of molecules near solutes [77], but more 
recently, as we shall see, polymer simulations have been greatly accelerated by this method. 


B3.3.5 SIMULATION IN DIFFERENT ENSEMBLES 

It is very convenient to be able to choose a nonstandard ensemble for a simulation. Generally, it is more 
straightforward to do this in MC, but MD techniques for various ensembles have been developed. We 
consider MC implementations first. 

B3.3.5.1 MC IN DIFFERENT ENSEMBLES 

The isothermal-isobaric ensemble corresponds to a system whose volume and energy can fluctuate, in 
exchange with its surroundings at specified NPT. The thermodynamic driving force is the Gibbs free energy G 
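= A + PV. The configurational distribution function may be written 

$$\varrho_{NPT}(\mathbf{s}, V) \propto V^N \exp\{-\beta PV\}\exp\{-\beta\mathcal{V}(\mathbf{s})\}.$$

Here we have introduced scaled coordinates $\mathbf{s} = L^{-1}\mathbf{r}$ where $L$ is the box length (assumed cubic). 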

This ensemble is a weighted superposition of NVT ensembles for different volumes. A typical MC sweep 
consists of $N$ attempted single-particle moves, exactly as for constant-NVT MC, followed by one attempt to 
scale, homogeneously, the volume of the simulation box, together with the coordinates of all the particles in it. 
This is accepted or rejected so as to generate the above distribution. One prescription for selecting the volume 
move is to attempt to change $V \to V' = V + \delta V$ where $\delta V$ is uniformly sampled from an interval 
$[-\delta V_{\max}, \delta V_{\max}]$. The new box length is computed, and all the particle coordinates scaled by an appropriate factor; 
then the new potential energy is computed. Assuming that the selection of $\delta V$ is unbiased, the probability 
ratio to use in the Metropolis prescription is just the ratio of the two ensemble densities, 
$\varrho_{NPT}(\Gamma')/\varrho_{NPT}(\Gamma)$, and the move is accepted with probability 

$$\min(1, e^{-\beta\,\delta\mathcal{W}}) \quad\text{where}\quad \delta\mathcal{W} = \delta\mathcal{V} + P\,\delta V - NkT\ln(V'/V).$$

Here $\delta\mathcal{V} = \mathcal{V}' - \mathcal{V}$ is the change in potential energy and $\delta V = V' - V$ is the change in volume. The maximum 
attempted volume change is chosen to give a reasonable 
acceptance rate, traditionally 35-50% or so; there is no firm reason for this choice. 
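The acceptance test for a volume move can be sketched as below, assuming the criterion takes the form of a potential energy change plus $P\,\delta V$ minus $NkT\ln(V'/V)$, as stated above. The function name and argument names are illustrative assumptions; computing the new potential energy after rescaling the coordinates is left to the caller.

```python
import math

def volume_move_acceptance(delta_pot, pressure, v_old, v_new, n_particles, kT):
    """Acceptance probability min(1, exp(-delta_W / kT)) for a trial volume change.

    delta_pot : change in potential energy caused by rescaling box and coordinates
    delta_W   : delta_pot + P * (v_new - v_old) - N * kT * ln(v_new / v_old)
    """
    delta_w = (delta_pot
               + pressure * (v_new - v_old)
               - n_particles * kT * math.log(v_new / v_old))
    if delta_w <= 0.0:
        return 1.0
    return math.exp(-delta_w / kT)
```

In a sweep, a uniform random number would be compared with the returned value, and the old box length and coordinates restored on rejection.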

The above prescription for selecting volume changes is not unique. It may seem more natural to make 
random, uniform, changes in the box length $L$; some people prefer to sample $\ln V$ uniformly [78]. These choices are not 
precisely consistent with the accept/reject procedure described above. They can be regarded as unbiased 
sampling of a variable different from $V$ (in which case a simple transformation of variables is needed to 
convert $\varrho_{NPT}$ into the new form, and additional powers of $V$ will appear in it) or as biased sampling in $V$-space, 
in which case a small correction factor (the same extra powers of $V$) will appear in the accept/reject 
procedure. An analysis of the forward and backward transition probabilities will give the appropriate 
acceptance/rejection criterion in each case. 

The grand canonical ensemble corresponds to a system whose number of particles and energy can fluctuate, in 
exchange with its surroundings at specified $\mu VT$. The relevant thermodynamic quantity is the grand potential 
$\Omega = A - \mu N$. The configurational distribution is conveniently written 

$$\varrho_{\mu VT}(\mathbf{s}, N) \propto (N!)^{-1} V^N z^N \exp\{-\beta\mathcal{V}(\mathbf{s})\}.$$

Here again we have introduced scaled coordinates $\mathbf{s} = L^{-1}\mathbf{r}$ where $L$ is the box length (assumed cubic), 
$z = \exp\{\beta\mu\}/\Lambda^3$ is the activity, and $\Lambda = h/\sqrt{2\pi mkT}$ is the thermal de Broglie wavelength. 

This ensemble is a weighted superposition of NVT ensembles with different values of $N$. As a rule of thumb, 
a typical MC sweep consists of $N$ attempted moves, each of which is chosen randomly to be (i) a displacement 
(handled exactly as in constant-NVT MC); (ii) the creation of a new particle at a randomly selected position; 
(iii) the destruction of a randomly selected particle from the system. The probabilities for attempting creation 
and destruction must be equal (for consistency with what follows), but they need not be equal to the 
probability for attempting displacement (although they often are). 

For a creation attempt, a position is chosen uniformly at random within the box, and an attempt made to create 
a new particle there. The probability ratio for creation is: 

$$\frac{\varrho_{\mu VT}(N+1)}{\varrho_{\mu VT}(N)} = \frac{zV}{N+1}\exp\{-\beta\,\delta\mathcal{V}\} \equiv \exp\{-\beta\,\delta\mathcal{D}_{\text{create}}\}$$

where $\delta\mathcal{V} = \mathcal{V}' - \mathcal{V}$ is the potential energy change associated with inserting the new particle. In a Metropolis 
scheme, the creation attempt is accepted with probability $\min(1, \exp\{-\beta\,\delta\mathcal{D}_{\text{create}}\})$. 

For a destruction attempt, one of the existing $N$ particles is selected at random, and an attempt made to destroy 
it. The probability ratio to use is 

$$\frac{\varrho_{\mu VT}(N-1)}{\varrho_{\mu VT}(N)} = \frac{N}{zV}\exp\{-\beta\,\delta\mathcal{V}\} \equiv \exp\{-\beta\,\delta\mathcal{D}_{\text{destroy}}\}$$

where $\delta\mathcal{V}$ is the potential energy change associated with removing the particle. In a Metropolis scheme, the 
destruction attempt is accepted with probability $\min(1, \exp\{-\beta\,\delta\mathcal{D}_{\text{destroy}}\})$. These expressions can be shown to 
satisfy microscopic reversibility [57]. 

In a dense system, the acceptance rate of particle creation and deletion moves will decrease, and the number 
of attempts must be correspondingly increased: eventually, there will come a point at which grand canonical 
simulations are not practicable, without some tricks to enhance the sampling. 
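The creation and destruction criteria can be sketched as two small functions, assuming the probability ratios are $zV/(N+1)$ and $N/(zV)$ times the Boltzmann factor of the potential energy change. The function names are illustrative assumptions; the ratio is evaluated through its logarithm to avoid overflow for strongly favourable moves.

```python
import math

def creation_acceptance(delta_pot, z, volume, n, beta):
    """min(1, [zV/(N+1)] exp(-beta*delta_pot)) for inserting a particle into N."""
    log_ratio = math.log(z * volume / (n + 1)) - beta * delta_pot
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

def destruction_acceptance(delta_pot, z, volume, n, beta):
    """min(1, [N/(zV)] exp(-beta*delta_pot)) for removing one of N particles (N >= 1)."""
    log_ratio = math.log(n / (z * volume)) - beta * delta_pot
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
```

For an ideal gas (zero energy change) with $zV = 10$, creation from $N = 4$ is always accepted while destruction from $N = 5$ is accepted half the time, consistent with the ratio of the two state probabilities being 2.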


B3.3.5.2 MD IN DIFFERENT ENSEMBLES 

In this section we discuss MD methods in the constant-NVT ensemble, and the constant-NPT ensemble. 

There are three general approaches to conducting MD at constant temperature rather than constant energy. 
One method, simple to implement and reliable, is to periodically reselect atomic velocities at random from the 
Maxwell-Boltzmann distribution [79]. This is rather like an occasional random coupling with a thermal bath. 
The resampling may be done to individual atoms, or to the entire system; some guidance on the reselection 
frequency may be found in [79]. 

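A second approach, due originally to Nosé [80] and reformulated in a useful way by Hoover [81], is to 
introduce an extra 'thermal reservoir' variable into the dynamical equations: 

$$\dot{\mathbf{r}}_i = \mathbf{p}_i/m, \qquad \dot{\mathbf{p}}_i = \mathbf{f}_i - \zeta\mathbf{p}_i, \qquad \dot{\zeta} = \frac{\sum_{i\alpha} p_{i\alpha}^2/m - gkT}{Q} = \nu_T^2\left(\frac{\mathcal{T}}{T} - 1\right).$$

Here $\zeta$ is a friction coefficient which is allowed to vary in time; $Q$ is a thermal inertia parameter, which may 
be replaced by $\nu_T$, a relaxation rate for thermal fluctuations (the two forms above agree when $Q = gkT/\nu_T^2$); 
$g \approx 3N$ is the number of degrees of freedom. 
$\mathcal{T} = \sum_{i\alpha} p_{i\alpha}^2/(gmk)$ stands for the instantaneous 'mechanical' temperature. It may be shown that the distribution function for the 
ensemble is proportional to $\exp\{-\beta\mathcal{H}'\}$ where $\mathcal{H}' = \mathcal{H} + \tfrac{1}{2}\,3NkT\,\zeta^2/\nu_T^2$. These equations lead to the following 
time variation of the system energy $\mathcal{H} = \sum_{i\alpha} p_{i\alpha}^2/2m + \mathcal{V}$, and for the variable $\mathcal{H}'$ (taking $g = 3N$): 

$$\dot{\mathcal{H}} = -\zeta\sum_{i\alpha} p_{i\alpha}^2/m = -3Nk\mathcal{T}\zeta, \qquad \dot{\mathcal{H}}' = -3NkT\zeta.$$

If $\mathcal{T} > T$, i.e., the system is too hot, then the 'friction coefficient' $\zeta$ will tend to increase; when it is positive the 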
system will begin to cool down. If the system is too cold, the reverse happens, and the friction coefficient may 
become negative, tending to heat the system up again. In some circumstances, this approach generates non- 
ergodic behaviour, but this may be ameliorated by the use of chains of thermostat variables [82]. Tobias et al 
[ 83 ] give an example of the use of this scheme in a biomolecular simulation. 
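The feedback mechanism can be made concrete by evaluating the right-hand sides of the thermostatted equations of motion. The sketch below assumes the form in which the friction coefficient obeys $\dot\zeta = \nu_T^2(\mathcal{T}/T - 1)$, with momenta damped by $-\zeta p$; it takes the number of degrees of freedom to be the number of momentum components supplied, and all names are illustrative assumptions. It returns rates only and is not a time-stepping integrator.

```python
def nose_hoover_rates(momenta, forces, zeta, mass, kT, nu_T):
    """Return (r_dot, p_dot, zeta_dot) for flat lists of momentum/force components.

    Assumed equations: r_dot = p/m, p_dot = f - zeta*p,
    zeta_dot = nu_T**2 * (T_inst/T - 1), with k*T_inst = sum(p^2) / (m * g).
    """
    g = len(momenta)                                   # degrees of freedom (assumption)
    k_t_inst = sum(p * p for p in momenta) / (mass * g)  # k times instantaneous temperature
    r_dot = [p / mass for p in momenta]
    p_dot = [f - zeta * p for f, p in zip(forces, momenta)]
    zeta_dot = nu_T ** 2 * (k_t_inst / kT - 1.0)
    return r_dot, p_dot, zeta_dot
```

When the instantaneous temperature exceeds the target, `zeta_dot` is positive, so the friction grows and the system is cooled; the sign reverses when the system is too cold.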

As an alternative to sampling the canonical distribution, it is possible to devise equations of motion for which 
the 'mechanical' temperature is constrained to a constant value [84, 85, 86]. The equations of motion are 

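$$\dot{\mathbf{r}}_i = \mathbf{p}_i/m, \qquad \dot{\mathbf{p}}_i = \mathbf{f}_i - \zeta\mathbf{p}_i, \qquad \zeta = \sum_{i\alpha} p_{i\alpha}f_{i\alpha} \Big/ \sum_{i\alpha} p_{i\alpha}^2.$$

Here the friction coefficient $\zeta$ is completely determined by the instantaneous values of the coordinates and 
momenta. It is easy to see that the kinetic energy $\mathcal{K} = \sum_{i\alpha} p_{i\alpha}^2/2m$ is now a constant of the motion: 

$$\dot{\mathcal{K}} = \sum_{i\alpha} p_{i\alpha}\dot{p}_{i\alpha}/m = \sum_{i\alpha} p_{i\alpha}f_{i\alpha}/m - \zeta\sum_{i\alpha} p_{i\alpha}^2/m = 0.$$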

It is possible to devise extended-system methods [79, 87] and constrained-system methods [ 88 ] to simulate 
the constant-NPT ensemble using MD. The general methodology is similar to that employed for constant-NVT, 
and in the course of the simulation the volume $V$ of the simulation box is allowed to vary, according to 
the new equations of motion. A useful variant allows the simulation box to change shape as well as size [ 89 , 
90 ]. It is also possible to extend the Liouville operator-splitting approach to generate algorithms for MD in 
these ensembles; examples of explicit, reversible, integrators are given by Martyna et al [91]. 


B3.3.6 FREE ENERGIES, CHEMICAL POTENTIALS AND WEIGHTED SAMPLING 

A major drawback of MD and MC techniques is that they calculate average properties. The free energy and 
entropy functions cannot be expressed as simple averages of functions of the state point $\Gamma$. They are directly 
connected to the logarithm of the partition function, and our methods do not give us the partition function 
itself. Nonetheless, calculating free energies is important, especially when we wish to determine the relative 
thermodynamic stability of different phases. How can we approach this problem? 

B3.3.6.1 FREE ENERGY DIFFERENCES 

It is possible to calculate derivatives of the free energy directly in a simulation, and thereby determine free 
energy differences by thermodynamic integration over a range of state points between the state of interest and 
one for which we know A exactly (the ideal gas, or harmonic crystal for example): 


$$(\beta A)_2 - (\beta A)_1 = \int_{\beta_1}^{\beta_2} E\,\mathrm{d}\beta \qquad\text{or}\qquad A_2 - A_1 = -\int_{V_1}^{V_2} P\,\mathrm{d}V.$$


This is reliable and fairly accurate, if tedious. It was used, for example, by Hoover [ 92 ] to locate the melting 
parameters for soft-sphere systems. The only point to watch out for is that one should not cross any phase 
transitions in taking the path from 1 to 2: it must be reversible. 
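In practice the integral is evaluated numerically from average energies measured at a series of state points. The sketch below applies the trapezoidal rule to the relation between the change in $\beta A$ and the integral of $E$ over $\beta$; the function name and the choice of quadrature are assumptions, and other quadrature schemes may be preferable.

```python
def integrate_beta_A(betas, energies):
    """Trapezoidal estimate of (beta*A)_2 - (beta*A)_1 = integral of E d(beta).

    betas    : inverse temperatures of the simulated state points, in order
    energies : average energy measured at each of those state points
    """
    total = 0.0
    for k in range(len(betas) - 1):
        mean_e = 0.5 * (energies[k] + energies[k + 1])
        total += mean_e * (betas[k + 1] - betas[k])
    return total
```

For an energy of the form $E = \tfrac{3}{2}/\beta$, integrating from $\beta = 1$ to $\beta = 2$ should approach $\tfrac{3}{2}\ln 2$ as the grid is refined, which provides a quick check of the routine.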
A free energy difference between two systems with energy functions $\mathcal{H}_0$ and $\mathcal{H}_1$ respectively, may be written 
in a way analogous to equation (B3.3.3) 

$$A_1 = A_0 - kT\ln\langle\exp\{-\beta\,\Delta\mathcal{H}\}\rangle_0$$

where $\langle\ldots\rangle_0$ is an ensemble average for system 0 and $\Delta\mathcal{H} = \mathcal{H}_1 - \mathcal{H}_0$. In the extreme case $\mathcal{H}_0 = 0$ this would 
give an unweighted estimate of the partition function of system 1, but would be extremely poorly sampled for 
the reasons discussed in section B3.3.2. Reasonable estimates will result when the two systems do not differ 
'too much', so that contributing values of the Boltzmann factor are given a significant weight by the sampling 
over states in system 0. One famous example of this is the test-particle insertion formula, equation (B3.3.5), 
for estimating the chemical potential, where system 1 contains an additional particle. Another example is 
where a molecule in the system can be mutated into another. The efficiency of the sampling may depend 
critically on the direction of the perturbation change: estimating $\mu_{\text{ex}}$ by particle removal, for instance, is 
formally possible but usually less accurate than particle insertion. Kofke and Cummings [93] have reviewed 
various approaches in this field and make the general recommendation that the change should take place in the 
direction of decreasing entropy. 
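A minimal estimator for this kind of perturbation formula, $A_1 - A_0 = -kT\ln\langle\exp\{-\beta\,\Delta\mathcal{H}\}\rangle_0$, is sketched below. It assumes a list of energy differences sampled in system 0 and uses a log-sum-exp shift for numerical stability; the function name is an illustrative assumption.

```python
import math

def free_energy_difference(delta_h_samples, beta):
    """Estimate A_1 - A_0 from energy differences H_1 - H_0 sampled in system 0."""
    exponents = [-beta * dh for dh in delta_h_samples]
    shift = max(exponents)                      # subtract the largest exponent first
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    log_mean = shift + math.log(mean_exp)       # ln < exp(-beta * dH) >_0
    return -log_mean / beta
```

The average is dominated by the samples with the lowest energy difference, which is why poor overlap between the two systems leads to poor statistics.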

B3.3.6.2 HISTOGRAM RE-WEIGHTING 


A way of looking at the points raised in the previous section is to compare energy distributions in two systems 
whose free energies we wish to relate. In particular, consider measuring, in a simulation of system 0, the 
function $P_0(\Delta E)$, i.e., the probability density per unit $\Delta E$ of configurations for which $\mathcal{H}_0$ and $\mathcal{H}_1$ differ by the 
prescribed amount $\Delta E$. The distribution $P_1(\Delta E)$ may be similarly calculated by simulating system 1. These 
two functions may be straightforwardly related [94] (taking $\Delta E = \mathcal{H}_1 - \mathcal{H}_0$ and $\Delta A = A_1 - A_0$): 

$$\ln P_1(\Delta E) - \ln P_0(\Delta E) = \beta\,\Delta A - \beta\,\Delta E.$$

Therefore, apart from an unknown constant $\beta\,\Delta A$, and a known linear term $\beta\,\Delta E$, the logarithms of these are the same function. 
Bennett [94] suggested two graphical methods for determining $\beta\,\Delta A$ from $P_0(\Delta E)$ and $P_1(\Delta E)$, which rely on 
the two distributions, at worst, nearly overlapping (i.e., being measurable, with good statistics, for the same or 
similar values of $\Delta E$). To broaden the sampling into the wings of the distribution, thereby improving statistics 
and extending the overlap region, we may use weighted sampling as described in section B3.3.4.2. There are 
many related approaches, variously called umbrella [95], multicanonical [ 96 ] and entropic [97] sampling, 
simulated tempering [98] and expanded ensembles [99]. 

Windowing is a special case of umbrella sampling: the weight function is a constant inside a specified region 
of configuration space, and zero outside. In MC we simply reject moves which would take the system outside 
the window, and otherwise proceed as usual. This allows us to examine a distribution function, and hence a 
free energy curve, piece by piece, matching up the resulting curves afterwards. The way to do this 
combination of histograms has been discussed by Ferrenberg and Swendsen [100], and the statistical errors in 
histogram re-weighting have been discussed by Swendsen [101] and Ferrenberg et al [102]. Ultimately, this 
approach leads back to the idea of performing simulations in a (nearly)-microcanonical ensemble and relating 
the results at nearby energies, as we do in thermodynamic integration; as emphasized in the review of 
Dünweg [103], 'for any thermodynamic integration procedure there is an equivalent multistage sampling or 
histogram procedure, and vice versa'. 

B3.3.6.3 THE CHEMICAL POTENTIAL 

The ensemble average in the Widom formula, $\langle\langle\exp\{-\beta v_{\text{test}}\}\rangle\rangle$, is sometimes loosely referred to as the 
'insertion probability'. It becomes very low for dense fluids. For example, for hard spheres, we can use the 
scaled-particle theory [104] or the Carnahan-Starling equation of state [105] to estimate it (see figure B3.3.8). 
The insertion probability falls below $10^{-4}$, well before the freezing transition at $\eta \approx 0.49$. Similar estimates 
can be made for the Lennard-Jones fluid. The lower this factor becomes, the poorer the statistics, and the more 
unreliable will be the estimate of $\mu_{\text{ex}}$. The problem is particularly acute for dense molecular fluids where, as a 
first guess, one could take the overall Boltzmann factor to be the product of the individual atomic values. 
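The size of the effect can be estimated with a few lines of code. The sketch below uses the excess chemical potential that follows from the Carnahan-Starling equation of state, $\beta\mu_{\text{ex}} = (8\eta - 9\eta^2 + 3\eta^3)/(1-\eta)^3$, for inserting a sphere of the same diameter as the fluid particles, and takes the insertion probability to be $\exp\{-\beta\mu_{\text{ex}}\}$. The function names are illustrative assumptions, and this is the Carnahan-Starling estimate rather than the scaled-particle result plotted in the figure.

```python
import math

def carnahan_starling_beta_mu_ex(eta):
    """Excess chemical potential (in units of kT) of the hard-sphere fluid
    at packing fraction eta, from the Carnahan-Starling equation of state."""
    return (8.0 * eta - 9.0 * eta ** 2 + 3.0 * eta ** 3) / (1.0 - eta) ** 3

def insertion_probability(eta):
    """Estimated Widom insertion probability exp(-beta * mu_ex) for a same-size sphere."""
    return math.exp(-carnahan_starling_beta_mu_ex(eta))
```

Evaluating `insertion_probability` over a range of packing fractions shows where it drops below a chosen threshold, which indicates roughly how many insertion attempts would be needed per successful one.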



Figure B3.3.8. Insertion probability for hard spheres of various diameters (indicated on the right) in the hard 
sphere fluid, as a function of packing fraction $\eta$, predicted using scaled particle theory. The dashed line is a 
guide to the lowest acceptable value for chemical potential estimation by the simple Widom method. 

A simple method of improving the efficiency of test particle insertion [ 106 , 107 , 108 and 109 ] involves 
dividing the simulation box into small cubic regions, and identifying those which would make a negligible 
contribution to the Widom formula, due to overlap with one or more atoms. These cubes are excluded from 
the sampling, and a correction applied afterwards for the consequent bias. 

Another trick is applicable to, say, a two-component mixture, in which one of the species, A, is smaller than 
the other, B. From figure B3.3.8 for hard spheres, we can see that A need not be particularly small in order 
for the test particle insertion probability to climb to acceptable levels, even when insertion of B would almost 
always fail. In these circumstances, the chemical potential of A may be determined directly, while that of B is 
evaluated indirectly, relative to that of A. The related 'semi-grand' ensemble has been discussed in some 
detail by Kofke and Glandt [110]. 

This naturally leads to the idea of estimating $\mu_{\text{ex}}$ by gradual insertion [111, 112 and 113]. This can be thought 
of as a thermodynamic integration pathway, connecting the states with $N$ and $N+1$ particles via a set of 
intermediate points, characterized by a parameter $\lambda$, $0 < \lambda < 1$, which determines the degree to which the extra 
'$\lambda$-particle' is 'switched on'. An MC scheme is constructed, which (in addition to the usual moves) allows $\lambda$ to 
vary either continuously or in predefined discrete jumps. Then the chemical potential is expressed 
$\mu_{\text{ex}} = \Delta A_{\text{ex}} = A_{\text{ex}}(\lambda = 1) - A_{\text{ex}}(\lambda = 0)$. It is advantageous to apply a weighting function (see section B3.3.4.2) [112, 
113], to ensure more or less uniform sampling of the different $\lambda$ states. Consider the probability histogram 
$\mathcal{P}(\lambda)$ of the sampled values of $\lambda$ during the runs. Without an external biasing potential this will be directly 
related to a Landau free energy 

$$\mathcal{P}(\lambda) \propto \exp\{-\beta\mathcal{F}(\lambda)\}$$

where $\mathcal{F}(1) - \mathcal{F}(0) = \Delta A_{\text{ex}}$ is the desired free energy difference; to obtain a uniform distribution, a weight 
function $W(\Gamma) \propto \exp\{-\beta\Psi(\lambda)\}$ with a biasing potential $\Psi(\lambda) = -\mathcal{F}(\lambda)$ would be used. This ideal weighting 
function is not known at the start of the simulation, but an initial guess may be iteratively refined from the 
measured $\mathcal{P}(\lambda)$ in a series of runs. 

It is also advantageous to ensure that the $\lambda$-particle samples a wide range of positions in the fluid: this is 
achieved by attempting large-scale moves to new, randomly-selected positions from time to time, and also 
frequently attempting exchanges of position with a randomly-selected full-size particle. The former moves 
will have a high probability of success when $\lambda$ is small, and the latter when $\lambda$ is large. Camp et al [114] 
provide an example of this method in action, for a model of liquid crystals. 

B3.3.6.4 FREE ENERGY OF SOLIDS 

Early attempts to calculate solid-state free energies, and hence locate the melting transition, introduced the 
idea of conducting a thermodynamic integration along an artificial pathway [ 115 , 116 ]. Each atom is 
artificially restricted to a single cell in space so that, as the density is lowered, the system converts more or 
less smoothly into a kind of 'lattice gas'. More recently, Frenkel et al [117] proposed a method in which $\lambda$ 
corresponds to switching between the true potential and a harmonic spring, which couples each atom at 
instantaneous position $\mathbf{r}_i$ to its ideal lattice site $\mathbf{r}_i^{(0)}$: 

$$\mathcal{V}(\lambda) = \lambda\mathcal{V}_1 + (1 - \lambda)\mathcal{V}_0.$$

Here $\mathcal{V}_1$ is the original, many-body potential energy function, while $\mathcal{V}_0$ is a sum of single-particle spring 
potentials proportional to $|\mathbf{r}_i - \mathbf{r}_i^{(0)}|^2$. As $\lambda \to 0$ the system becomes a perfect Einstein crystal, whose free energy 

is exactly calculable. Recently, a combination of approaches has been used [ 118 ]: first the solid is subjected to 
a set of one-particle spring potentials, and then the influence of the interparticle forces is reduced to zero by 
expanding the crystal. This method was used to locate the melting transition for a model of nitrogen at T = 
300 K. 

Density functional theory arguments [ 119 , 120 ] suggest that the springs should be of the correct strength to 
produce the same mean-squared displacement in the Einstein limit as in the original crystal. More precisely, it 
is best to switch over to a one-body potential in such a way that the one-body density $\rho(\mathbf{r})$ is the same at all 
points along the integration path. Such a path is guaranteed not to traverse a first-order phase transition. 

B3.3.7 CONFIGURATION-BIASED MC 

The biased-sampling approach may be considerably generalized, to allow the construction of MC moves step- 
by-step, with each step depending on the success or failure of the last. Such a procedure is biased, but it is 
then possible to correct for the bias (by considering the possible reverse moves). The technique has 
dramatically speeded up polymer simulations, and is capable of wider application. 

The idea may be illustrated by considering first a method for increasing the acceptance rate of moves (but at 
the expense of trying, and discarding, several other possible moves). Having picked an atom to move, 
calculate the new trial interaction energy $v_t$ for a range of trial positions $t = 1 \ldots k$. Pick the actual attempted 
move from this set, with a probability proportional to the Boltzmann factor. This biases the move selection, 
towards high-probability states, but we can calculate the contribution of the bias to $\alpha_{\Gamma'\leftarrow\Gamma}$. Then we must 
calculate $\alpha_{\Gamma\leftarrow\Gamma'}$ for the hypothetical reverse move: we do this by selecting $k-1$ possible trial positions 
around the new position of the atom, plus the place it originally came from, making $k$ in all. The ratio 
$\alpha_{\Gamma\leftarrow\Gamma'}/\alpha_{\Gamma'\leftarrow\Gamma}$ is used in the accept/reject decision, along with the relevant Boltzmann factors (see section 
B3.3.4.2). For $k = 1$ this gives the usual Metropolis prescription; for $k \to \infty$ it is easy to show that the 
acceptance rate tends to unity, but the method becomes very expensive, since all the work has gone into 
calculating the biasing factors. 
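One way to realize this multiple-trial scheme is sketched below, under the following assumptions. The forward selection probability is taken as the Boltzmann factor of the chosen trial divided by the sum of the Boltzmann factors over all $k$ forward trials; the reverse probability uses the analogous sum over $k-1$ fresh trials around the new position plus the original position. With these choices the Boltzmann factors of the old and new positions cancel, and the acceptance probability reduces to the ratio of the two sums. The function names are illustrative.

```python
import math
import random

def select_trial(trial_energies, beta, rng=random):
    """Pick a trial index with probability proportional to exp(-beta * v_t).

    Returns (index, weight_sum), where weight_sum is the sum of Boltzmann factors.
    """
    weights = [math.exp(-beta * v) for v in trial_energies]
    total = sum(weights)
    x = rng.random() * total
    running = 0.0
    for t, w in enumerate(weights):
        running += w
        if x < running:
            return t, total
    return len(weights) - 1, total              # guard against round-off

def biased_acceptance(forward_sum, reverse_sum):
    """Acceptance probability min(1, forward_sum / reverse_sum).

    reverse_sum must include the Boltzmann factor of the original position,
    so it is positive whenever the current state has finite energy.
    """
    return min(1.0, forward_sum / reverse_sum)
```

For a single trial the forward sum is the Boltzmann factor of the new position and the reverse sum is that of the old one, so the usual Metropolis rule is recovered.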

The expense is justified, however, when tackling polymer chains, where reconstruction of an entire chain is 
expressed as a succession of atomic moves of this kind [121]. The first atom is placed at random; the second 
selected nearby (one bond length away), the third placed near the second, and so on. Each placement of an 
atom is given a greater chance of success by selecting from multiple locations, as just described. Biasing 
factors are calculated for the whole multi-atom move, forward and reverse, and used as before in the 
Metropolis prescription. For further details see [ 122 , 123 , 124 , 125 ]. A nice example of this technique is the 
study [ 126 , 127 ] of the distribution of linear and branched chain alkanes in zeolites. 


B3.3.8 PHASE TRANSITIONS 

Here we discuss the exploration of phase diagrams, and the location of phase transitions. See also [ 128 , 129 , 
130 , 131 ] and [22, chapters 8-14]. Very roughly we classify phase transitions into two types: first-order and 
continuous. The fact that we are dealing with a finite-sized system must be borne in mind, in either case. 

B3.3.8.1 FIRST-ORDER AND CONTINUOUS TRANSITIONS 

At a continuous phase transition, a correlation length $\xi$ (see section B3.3.2.1) diverges and an order parameter,
typically the ensemble average of the corresponding dynamical variable, becomes macroscopically large. The 
divergence heralding the transition is describable in terms of universal exponent relations. Effects of finite 
size close to continuous phase transitions are well studied [24, 132 ]. By contrast, a first-order phase transition 
is abrupt, as one phase becomes thermodynamically more stable than another; there are no transition 
precursors. In the thermodynamic limit, there is a step-function discontinuity in most properties, including 
thermodynamic derivatives of the free energy. Again it is possible to describe the effects of finite size [132,
133].

For both first-order and continuous phase transitions, finite size shifts the transition and rounds it in some
way. The shift for first-order transitions arises, crudely, because the chemical potential, like most other
properties, has a finite-size correction $\mu(N)-\mu(\infty)\sim O(1/N)$. An approximate expression for this was derived by
Siepmann et al [134]. Therefore, the line of intersection of two chemical potential surfaces $\mu_{\mathrm{I}}(T,P)$ and
$\mu_{\mathrm{II}}(T,P)$ will shift, in general, by an amount $O(1/N)$. The rounding is expected because the partition function only
has singularities (and hence produces discontinuous or divergent properties) in the limit $L\to\infty$; otherwise, it is
analytic, so for finite $N$ the discontinuities must be smoothed out in some way. The shift for continuous
transitions arises because the transition happens when $\xi\approx L$ for the finite system, but when $\xi\to\infty$ in the
infinite system. The rounding happens for the same reason as it does for first-order phase transitions: whatever
the nature of the divergence in thermodynamic properties (described, typically, by critical exponents) it will
be limited by the finite size of the system.

In either case, first-order or continuous, it is useful to consider the probability distribution function for
variables averaged over a spatial block of side $L$; this may be the complete simulation box (in which case we
must specify the ensemble and boundary conditions) or it may be a sub-system. For purposes of illustration
we shall not distinguish these possibilities.

B3.3.8.2 CONTINUOUS PHASE TRANSITIONS 

Here we discuss only briefly the simulation of continuous transitions (see [132, 135] and references therein).
Suppose that the transition is characterized by a non-vanishing order parameter $X$ and a corresponding
divergent correlation length $\xi$. We shall be interested in the block average value $X_L = \langle\mathcal{X}\rangle_L$, where the $L$
reminds us of the system size. In a magnetic system, $X$ is the magnetization; in a fluid it might be the density.
The basic idea of finite size scaling analysis is that the values of properties of the system are dictated by the
ratio $\xi/L$, and that no other length scales enter the problem, near a critical point. Any property can be written

$$X_L = X_\infty\,\Phi(\xi/L)$$

where $X_\infty$ is the average value in the infinite system limit and $\Phi$ is some scaling function. There will be
exponent laws dictating the behaviour of $X$ in the vicinity of the phase transition, and more scaling laws
stating how $\xi$ behaves inside the function $\Phi$. We can apply a scaling analysis to the distribution function
$P(X_L)$ [136, 137]. Actually at the critical point, the distribution can be calculated by simulation, or predicted by
renormalization group theory [136, 137, 138 and 139]; different universal forms will be seen for different
universality classes. Examination of these functions is a powerful way of locating and characterizing critical
points, and in the critical region the histogram reweighting method is a particularly useful way of maximizing
the information obtained from individual simulations. For example, the prewetting critical point has been
shown to lie in the $d = 2$ Ising universality class in this way [139]. A further example is the study of the
critical point of the $d = 2$ Lennard-Jones fluid [140, 141]. For this, long runs of order 10-10 sweeps were
needed, but the system sizes were relatively small: $N = 100$ and 400.

B3.3.8.3 FIRST-ORDER PHASE TRANSITIONS 

Consider simulating a system in the canonical ensemble, close to a first-order phase transition. In one phase,
$\mathcal{P}_{NVT}(E)$ is essentially a Gaussian centred around a value $E_{\mathrm{I}}$, while in the other phase the peak is around $E_{\mathrm{II}}$.
Far from the transition, one or other of these will apply. Close to the phase transition we will see contributions
from both Gaussians, and a double-peaked distribution. The weight of each Gaussian changes as the
temperature is varied. Thus, a smooth crossover occurs from one branch of the equation of state
$E(T)=\langle\mathcal{H}\rangle_{NVT}$ to the other. In the transition region
we may expect to see anomalies such as an increased specific heat: the double-peaked distribution is wider
than its constituent single-peaked ones, and recall that $C_V$ is linked to $\langle\delta\mathcal{H}^2\rangle$. The corresponding Landau free
energy

$$\mathcal{F}_{NVT}(E) = -k_{\mathrm{B}}T\ln\mathcal{P}_{NVT}(E)$$

has two minima separated by a barrier. The high-probability, low free energy values correspond to the single 
phase configurations; the intermediate values are for mixtures of the two phases, with an interfacial free 
energy penalty. 
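
The link between a double-peaked energy distribution and a free energy barrier can be illustrated with a
short numerical sketch. The two-Gaussian form follows the description above, but the equal peak widths, the
parameter values, and the function names are assumptions chosen purely for illustration; in a real study the
distribution would come from simulation histograms.

```python
import math


def gaussian(x, mean, sigma):
    """Normalized Gaussian density."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def energy_distribution(e, weight_one, mean_one, mean_two, sigma):
    """Two-Gaussian model of the canonical energy distribution P(E)."""
    return (weight_one * gaussian(e, mean_one, sigma)
            + (1.0 - weight_one) * gaussian(e, mean_two, sigma))


def landau_free_energy(probability, kT):
    """F(E) = -kT ln P(E)."""
    return -kT * math.log(probability)


def barrier_height(weight_one, mean_one, mean_two, sigma, kT, n_grid=2001):
    """Barrier top minus the deeper minimum, by grid search between the peaks."""
    lo, hi = min(mean_one, mean_two), max(mean_one, mean_two)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    free = [landau_free_energy(
                energy_distribution(e, weight_one, mean_one, mean_two, sigma), kT)
            for e in grid]
    return max(free) - min(free)
```

In this toy model, narrowing the peaks relative to their separation (as happens for the energy per particle
when the system size grows) deepens the dip between them and so raises the barrier.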

In the microcanonical ensemble, the signature of a first-order phase transition is the appearance of a 'van der
Waals loop' in the equation of state, now written as $T(E)$ or $\beta(E)$. The $\beta(E)$ curve switches over from one
branch, phase I, of the equation of state to the other, phase II, tracing out a loop in the transition region. This
loop is a finite-size effect, due to the interfacial free energy contributions in the transition region, just
mentioned. For a larger system size the loop will flatten out, becoming a horizontal line in the thermodynamic
limit, joining the two coexisting energies at the transition temperature; for $N\to\infty$ the interfacial properties
contribute a negligible amount to the total free energy. (Calling it a 'van der Waals loop' is therefore
misleading: it has no connection with the loop in the approximate van der Waals equation of state for fluids,
which in any case is independent of system size.) It is possible to inter-relate the form of this loop, and the
double-peaked structure of the energy distribution, with the thermodynamic coexistence conditions [1, 132,
142, 143, 144, 145].

The previous discussions translate directly over into pressure-volume variables, if we compare the
constant-NVT and constant-NPT ensembles. Double-peaked distributions of volumes are seen near a transition at
constant pressure.

Direct coexistence of solid and fluid phases of hard spheres and disks was observed in the early simulations of 
Wood and Jacobson [ 146 ] and Alder and Wainwright [ 147 , 148 ]; the appearance of a 'van der Waals loop' in 
the equation of state was explained in some detail shortly afterwards [ 142 , 149 , 150 ]. Very detailed analyses 
of this situation, especially in relation to spin systems, have appeared in recent years [ 143 , 144 , 145 , 151 and 
152 ]. Histogram reweighting can be useful here [ 153 , 154 ] and measuring the height of the interfacial free 
energy barrier as a function of system size has been recommended as a test for first-order behaviour [ 155 ]. A 
nice example of this approach for a spin model of a liquid crystal, which exhibits a tricky weak first-order
transition, is the work of Zhang et al [156, 157], following on from earlier work by Fabbri and Zannoni [158].
An example for a strong first-order transition is the study of melting and nucleation barriers in the
Lennard-Jones system [159] and models of metallic systems [160]. This approach used windowing and biased
sampling techniques. 

Recently, Orkoulas and Panagiotopoulos [ 161 ] have shown that it is possible to use histogram reweighting 
and multicanonical simulations, starting with individual simulations near the critical point, to map out the 
liquid-vapour coexistence curve in a very efficient way. 

Simulations in the Gibbs ensemble attempt to combine features of Widom's test particle method with the 
direct simulation of two-phase coexistence in a box. The method of Panagiotopoulos et al [ 162 , 163 ] uses two 
fully-periodic boxes, I and II. 

In the simplest version, a one-component system is simulated at a given temperature $T$ in both boxes; particles
in different boxes do not interact directly with each other; however, volume moves and particle creation and
deletion moves are coupled such that the total volume $V$ and the total number of particles $N$ are conserved. With
appropriate acceptance/rejection probabilities for the volume exchange and particle exchange moves, together
with the usual MC procedure for moving particles around within the two boxes, the thermodynamic
conditions for mechanical and chemical equilibrium between the boxes are ensured. A typical MC cycle
would consist of: one attempted move per particle in each box; one attempt to exchange volumes between
boxes; a predetermined number of attempts to exchange particles. The technique has been reviewed by
Panagiotopoulos [131, 164, 165] and Smit [129]. The partition function for the two-box system is simply the
usual canonical sum over all possible states, including a sum over all the distributions of particles between the
boxes such that $N_{\mathrm{I}} + N_{\mathrm{II}} = N$, and an integral over all box volumes such that $V_{\mathrm{I}} + V_{\mathrm{II}} = V$. The probability
distribution function for the ensemble, and the acceptance and rejection rules for particle and volume
exchanges, are easily derived.
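
As an illustration of what such rules look like, the sketch below encodes the acceptance probabilities in the
form commonly quoted for the one-component Gibbs ensemble. It rests on assumptions not stated in the text
above: trial volume changes are taken to be uniform in the volume itself (rather than in its logarithm),
`delta_u` is the total potential energy change of both boxes, and the function names are hypothetical.

```python
import math


def _capped_exp(log_ratio):
    """Return min(1, exp(log_ratio)) without overflow for large arguments."""
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def particle_transfer_acceptance(n_from, v_from, n_to, v_to, delta_u, beta):
    """Probability of accepting the transfer of one particle between boxes.

    n_from, v_from: particle number and volume of the donor box (before the move).
    n_to, v_to:     particle number and volume of the receiving box (before the move).
    """
    if n_from == 0:
        return 0.0  # nothing to remove from an empty box
    log_ratio = math.log(n_from * v_to / ((n_to + 1) * v_from)) - beta * delta_u
    return _capped_exp(log_ratio)


def volume_exchange_acceptance(n_one, v_one, n_two, v_two, delta_v, delta_u, beta):
    """Probability of accepting a transfer of volume delta_v from box two to box one."""
    new_one, new_two = v_one + delta_v, v_two - delta_v
    if new_one <= 0.0 or new_two <= 0.0:
        return 0.0  # reject moves that would empty a box of volume
    log_ratio = (n_one * math.log(new_one / v_one)
                 + n_two * math.log(new_two / v_two)
                 - beta * delta_u)
    return _capped_exp(log_ratio)
```

For an ideal gas (`delta_u = 0`) the transfer rule favours moves from the denser box to the more dilute one,
which is how the densities, and hence the chemical potentials, of the two boxes are brought into balance.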

The characteristic feature of the technique is the behaviour of the system if the overall density $N/V$ lies in a
two-phase region. For a single simulation box, both phases would appear, with an interface between them; in 
the Gibbs ensemble, the interface free energy penalty can be avoided by the system arranging to have each 
phase entirely in its own box. This phase separation happens automatically during the equilibration stage of 
the simulation. 

The great advantages of the technique are its avoidance of interfacial properties, and the semi-automatic way 
that it converges on the coexisting densities without the need to input chemical potentials or guess equations 
of state. Unavoidably, it suffers from the same problems as the Widom test-particle method: at high density 
the particle exchange moves are accepted with very low probability, and special techniques are required to 
overcome this. It is essential to monitor the success rate of exchanges, and carry out enough of them to ensure 
that a few percent of molecules are exchanged at each step. 

The Gibbs ensemble method has been outstandingly successful in simulating complex fluids and mixtures. 
For a multicomponent system, it is possible to simulate at constant pressure rather than constant volume, as 
separation into phases of different compositions is still allowed. The method allows one to study 
straightforwardly phase equilibria in confined systems such as pores [166]. Configuration-biased MC methods
can be used in combination with the Gibbs ensemble. An impressive demonstration of this has been the 
determination by Siepmann et al [ 167 ] and Smit et al [ 168 ] of liquid-vapour coexistence curves for n-alkane 
chain molecules as long as 48 atoms. 

As we have seen, insertion of small molecules can be dramatically easier than large ones; this leads to a 
'semi-grand' version of the Gibbs ensemble [ 131 , 169 , 170 ]: the smaller particles are exchanged between the 
boxes, while moves that interconvert particle species are carried out within the boxes. The ideas seen before 
for computing the chemical potential by gradual insertion [ 111 , 112 , 113 ] can be naturally generalized to the 
Gibbs ensemble: at each stage a given molecule may be in an intermediate state of transfer between one box 
and another. As an example, Escobedo and de Pablo [ 171 ] use an expanded Gibbs ensemble for polymers, and 
Nath et al [ 172 ] have computed the liquid-vapour envelopes for long-chain alkanes, using different potential 
models and comparing with previous work [ 167 , 168 ]. 

B3.3.8.5 THERMODYNAMIC METHODS 

The alternative to direct simulation of two-phase coexistence is the calculation of free energies or chemical
potentials together with solution of the thermodynamic coexistence conditions. Thus, we must solve (say)
$\mu_{\mathrm{I}}(P) = \mu_{\mathrm{II}}(P)$ at constant $T$. A reasonable approach [173, 174, 175 and 176] is to conduct constant-NPT
simulations, measure $\mu$ by test-particle insertion, and also to note that the simulations give the derivative
$\partial\mu/\partial P=\langle V\rangle/N$ directly. Thus, conducting one or two simulations may be enough for a preliminary fit to
the equations of state $\mu_{\mathrm{I}}(P)$, $\mu_{\mathrm{II}}(P)$, allowing one
to home in on the intersection point quite quickly.
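
A minimal sketch of this "home in on the intersection" step is given below. It assumes the simplest possible
fit, a straight line through each measured chemical potential with slope equal to the measured volume per
particle; the function name and argument layout are hypothetical, and a real calculation would iterate and
would include statistical error estimates.

```python
def estimate_coexistence_pressure(p_one, mu_one, v_one, p_two, mu_two, v_two):
    """Pressure at which two linearized chemical potentials cross.

    Each phase is represented by mu(P) ~ mu_i + v_i * (P - p_i), where
    mu_i is the chemical potential measured at pressure p_i and
    v_i = <V>/N is the measured volume per particle (the slope d mu / d P).
    """
    if v_one == v_two:
        raise ValueError("equal slopes: the linear fits never cross")
    # Solve mu_one + v_one * (P - p_one) = mu_two + v_two * (P - p_two) for P.
    return (mu_two - mu_one + v_one * p_one - v_two * p_two) / (v_one - v_two)
```

The estimate can then be used as the pressure for the next pair of simulations, repeating until the two
measured chemical potentials agree within their uncertainties.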

Once a point on the coexistence line has been found, one can trace out more of it using the approach of Kofke
[177, 178] to numerically integrate the Clapeyron equation

$$\left(\frac{dP}{dT}\right)_\sigma = \frac{\Delta h}{T\,\Delta v}.$$

Here, $\Delta h=h_\beta-h_\alpha$ is the difference in molar enthalpies of the coexisting phases, and $\Delta v$ is the difference in
molar volumes; the suffix $\sigma$ indicates that the derivative is to be evaluated along the coexistence line.

The method consists of solving the above equation in a standard step-by-step manner, for example using a 
predictor-corrector algorithm. The right-hand side is calculated by simulating both phases at constant T and P 
in separate, uncoupled boxes. At intervals, a small change in $T$ (the independent variable) is made in both
boxes, and this is accompanied by a change in $P$ (the dependent variable) as dictated by the differential
equation solver. The approach relies on a starting point at which the two phases are at thermodynamic
equilibrium, $\Delta\mu = 0$; thereafter the Clapeyron equation, if solved accurately, should guarantee that equilibrium
is maintained. The method has been applied to the liquid-vapour coexistence curve [177, 178] and to the
melting and sublimation curves [179] for the Lennard-Jones system; it was also extended by Agrawal and
Kofke [180] to study the melting transition of a large family of soft-sphere systems, showing the emergence
of the bcc phase as being stable relative to fcc for high enough softness parameters. Various technical details
of this approach have been discussed [ 178 , 179 ] and possible sources of inaccuracy considered. 
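
The step-by-step integration can be sketched as follows. This is not the scheme of the cited papers: it uses
a simple Euler predictor with a trapezoidal corrector, takes the Clapeyron equation in the form
dP/dT = Δh/(T Δv), and replaces the two constant-pressure simulations by a hypothetical callback
`phase_differences(T, P)` returning (Δh, Δv). The toy callback assumes a constant Δh and an ideal vapour
with negligible liquid volume, so that the result can be checked against the analytic Clausius-Clapeyron
curve.

```python
import math


def integrate_coexistence(t_start, p_start, t_end, n_steps, phase_differences):
    """Trace P(T) along a coexistence line with a predictor-corrector scheme.

    phase_differences(T, P) -> (delta_h, delta_v) stands in for the pair of
    simulations that would measure the enthalpy and volume differences.
    """
    dt = (t_end - t_start) / n_steps
    t, p = t_start, p_start
    line = [(t, p)]
    for _ in range(n_steps):
        dh, dv = phase_differences(t, p)
        slope_here = dh / (t * dv)
        p_predicted = p + dt * slope_here                 # predictor (Euler)
        dh_next, dv_next = phase_differences(t + dt, p_predicted)
        slope_next = dh_next / ((t + dt) * dv_next)
        p = p + 0.5 * dt * (slope_here + slope_next)      # corrector (trapezoidal)
        t = t + dt
        line.append((t, p))
    return line


def toy_differences(t, p, delta_h=10.0, gas_constant=1.0):
    """Toy model: constant delta_h, ideal vapour, negligible liquid volume."""
    return delta_h, gas_constant * t / p
```

In a real calculation each call to `phase_differences` is a pair of simulations with statistical noise, so
the step size and the length of each simulation must be balanced against one another.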

An example of the use of this method in a complex situation is the study by Bolhuis and Kofke [ 181 , 182 ] of 
the freezing of polydisperse hard spheres (a system in which there is a distribution of atomic diameters). A 
semi-grand ensemble imposes a distribution of chemical potential differences (for different hard-sphere 
diameters) on the system: the width of this distribution controls the polydispersity. The same chemical 
potentials (for all the different species) apply in two coexisting phases. The thermodynamic integration 
technique may then be used to map out the freezing-melting line in the pressure-polydispersity plane, starting 
from the monodisperse limit (simple hard spheres). The resulting phase diagram, in volume fraction- 
polydispersity variables, is shown in figure B3.3.9 . An important result is that fractionation between the two 
phases allows a highly polydisperse fluid to precipitate a solid which is only slightly polydisperse — never 
higher than 5.7% in terms of the average sphere diameter.

Figure B3.3.9. Phase diagram for polydisperse hard spheres, in the volume fraction ($\phi$)-polydispersity ($s$)
plane. Some tie-lines are shown connecting coexisting fluid and solid phases. Thanks are due to D A Kofke
and P G Bolhuis for this figure. For further details see [181, 182].


B3.3.8.6 STUDIES OF INTERFACES 

Simulation of both bulk phases in a single box, separated by an interface, is closest to what we do in real life. 
It is necessary to establish a well defined interface, most often a planar one between two phases in 'slab' 
geometry. A large system is required, so that one can characterize the two phases far from the interface, and 
read off the corresponding bulk properties. Naturally, this is the approach of choice if the interfacial properties 
(for example, the surface tension) are themselves of interest. The first stage in such a simulation is to prepare 
bulk samples of each phase, as close to the coexisting densities as possible, in cuboidal periodic boundaries, 
using boxes whose cross sections match. The two boxes are brought together, to make a single longer box, 
giving the desired slab arrangement with two planar interfaces. There must then follow a period of 
equilibration, with mass transfer between the phases if the initial densities were not quite right. 

Equilibration of the interface, and the establishment of equilibrium between the two phases, may be very
slow. Holcomb et al [183] found that the density profile $\rho(z)$ equilibrated much more quickly than the profiles
of normal and transverse pressure, $P_{\mathrm{N}}(z)$ and $P_{\mathrm{T}}(z)$, respectively. The surface tension is proportional to the
$z$-integral of $P_{\mathrm{N}}(z)-P_{\mathrm{T}}(z)$. The bulk liquid in the slab may continue to contribute to this integral, indicating lack
of equilibrium, for very long times if the initial liquid density is chosen a little too high or too low. A recent
example of this kind of study is the MD simulation of the liquid-vapour surface of water at temperatures
between 316 and 573 K by Alejandre et al [184].
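
The integral over the pressure profiles can be evaluated from binned simulation data along the following
lines. The sketch assumes the slab geometry described above, so the integral is divided by the number of
interfaces (two by default), and it uses a simple trapezoidal rule; the function name and the list-based
input format are illustrative choices only.

```python
def surface_tension(z_values, p_normal, p_transverse, n_interfaces=2):
    """Trapezoidal z-integral of P_N(z) - P_T(z), divided by the number of interfaces."""
    if not (len(z_values) == len(p_normal) == len(p_transverse)):
        raise ValueError("profiles must have the same length")
    total = 0.0
    for i in range(len(z_values) - 1):
        left = p_normal[i] - p_transverse[i]
        right = p_normal[i + 1] - p_transverse[i + 1]
        total += 0.5 * (left + right) * (z_values[i + 1] - z_values[i])
    return total / n_interfaces
```

Restricting the integration to the bulk region of the slab gives a quick diagnostic: a non-zero
contribution there signals the lack of equilibrium mentioned above.
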


B3.3.9 RARE EVENTS

By definition, rare events happen infrequently, but this does not mean that they happen slowly. Accordingly, 
molecular simulations may contribute greatly to our understanding of such events, but some special sampling 
tricks will be needed since the 'natural' timespans of simulations are already very short. The simplest example 
is where the system crosses a free energy barrier from one region to another in phase space, and a single 
'reaction coordinate' can be identified to characterize the two stable regions and the transition state. The 
statistical mechanical background to barrier crossing rates, in terms of linear response theory, has been given 
by Chandler [2, 185 ]. The transition rate is typically a product of two factors: the equilibrium probability 
density for finding the system at the top of the barrier, and a dynamical quantity, essentially the inverse of a 
relaxation time for the system to settle from the barrier into one or other stable region. 

We have already discussed weighted sampling methods for exploring regions of high free energy, so the first 
part of this problem is tractable. The calculation of time-dependent functions, which start from the barrier top, 
is facilitated by the so-called 'blue-moon' ensemble [ 186 ] in which a constraint is applied to keep the system 
exactly on the desired hypersurface. This allows sampling of the starting conditions with good statistics; then 
the constraint may be released and subsequent dynamics accumulated. (Metric tensor factors associated with 
the constraint are discussed elsewhere [ 187 , 188 ].) To compute the time-dependent part of the barrier-crossing 
rate, special approaches have been developed to suppress transient behaviour and statistical noise [ 187 ]. 

In many cases, it may not be possible to identify a single reaction coordinate. A good example of a free- 
energy surface depending on two variables is found in the study by ten Wolde and Frenkel [ 189 ] of the 
mechanism of protein crystal nucleation, and the possible influence of fluctuations induced by a nearby 
metastable critical point. Here, the relevant variables are 'density' and 'crystallinity', as illustrated in figure 
B3.3.10 . At high levels of supercooling, well away from the hidden critical point, nucleation follows a route 
whereby a crystalline nucleus forms from the start: this involves a high free energy barrier. Close to the 
critical point, a different mechanism operates: critical fluctuations encourage formation of a liquid-like 
nucleus, and only after this has grown to a certain size does a crystal start to form. This pathway involves a
much lower free energy barrier. This work is an example of the insight obtainable from simulations into a
previously poorly understood area, namely the art of obtaining good protein crystals for structure
determination, as well as illustrating the qualitatively different mechanisms that may operate under different
conditions.

Figure B3.3.10. Contour plots of the free energy landscape associated with crystal nucleation for spherical
particles with short-range attractions. The axes represent the number of atoms identifiable as belonging to a
high-density cluster, and as being in a crystalline environment, respectively. (a) State point significantly
below the metastable critical temperature. The nucleation pathway involves simple growth of a crystalline
nucleus. (b) State point at the metastable critical temperature. The nucleation pathway is significantly curved,
and the initial nucleus is liquid-like rather than crystalline. Thanks are due to D Frenkel and P R ten Wolde for
this figure. For further details see [189].

The more general problem of finding a transition pathway when the relevant reaction coordinates are not
obvious has recently been tackled [190, 191]. The basic idea [192] is to generate chains of states linking the
two stable states, through a weighted sampling procedure which makes no assumptions about the mechanism. 
The method is very general, but inevitably expensive. 


B3.3.10 QUANTUM SIMULATION USING PATH INTEGRALS 


In this section we look briefly at the problem of including quantum mechanical effects in computer 
simulations. We shall only examine the simplest technique, which exploits an isomorphism between a 
quantum system of atoms and a classical system of ring polymers, each of which represents a path integral of 
the kind discussed in [ 193 ]. For more details on work in this area, see [22, 194 ] and particularly [ 195 , 196 , 
197 ]. 


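The coordinate representation of the density matrix, in the canonical ensemble, may be written

$$\varrho(\mathbf{r}^{(N)},\mathbf{r}^{(N)\prime}) = \langle\mathbf{r}^{(N)}|e^{-\beta\hat{\mathcal{H}}}|\mathbf{r}^{(N)\prime}\rangle$$

and correspondingly for the partition function

$$Q_{NVT} = \int d\mathbf{r}^{(N)}\,\varrho(\mathbf{r}^{(N)},\mathbf{r}^{(N)}) = \int d\mathbf{r}^{(N)}\,\langle\mathbf{r}^{(N)}|e^{-\beta\hat{\mathcal{H}}}|\mathbf{r}^{(N)}\rangle.$$

Here we have adopted a Dirac bracket notation $\langle\ldots|\ldots|\ldots\rangle$ which should be distinguished from the ensemble
average $\langle\ldots\rangle$. Actually evaluating this is tricky, because the Hamiltonian is the sum of the kinetic and
potential energy operators, $\hat{\mathcal{H}}=\hat{\mathcal{K}}+\hat{\mathcal{V}}$, which do not commute. Hence

$$e^{-\beta\hat{\mathcal{H}}} \neq e^{-\beta\hat{\mathcal{K}}}\,e^{-\beta\hat{\mathcal{V}}}.$$

When the exponent is small (e.g., at high temperature), reasonable approximations exist. This problem is
attacked in a manner similar to that used to derive expressions for the propagator $e^{iLt}$, as a succession of small
time step propagators, in section B3.3.3.2: we split the exponential up into smaller pieces. So, we write

$$e^{-\beta\hat{\mathcal{H}}} = \left(e^{-\beta\hat{\mathcal{H}}/P}\right)^{P}$$

and insert this into the expression for the partition function

$$Q_{NVT} = \int d\mathbf{r}^{(N)}\,\langle\mathbf{r}^{(N)}|e^{-\beta\hat{\mathcal{H}}/P}\,e^{-\beta\hat{\mathcal{H}}/P}\cdots e^{-\beta\hat{\mathcal{H}}/P}|\mathbf{r}^{(N)}\rangle.$$

Now we do one of the standard quantum mechanical tricks, inserting the identity operator as a complete sum
of states in the coordinate representation:

$$\hat{1} = \int d\mathbf{r}^{(N)}\,|\mathbf{r}^{(N)}\rangle\langle\mathbf{r}^{(N)}|$$

in between each exponential. This will introduce $P-1$ additional integrations over coordinates. Each of the
contributions $\langle\mathbf{r}_p^{(N)}|e^{-\beta\hat{\mathcal{H}}/P}|\mathbf{r}_{p+1}^{(N)}\rangle$ is an un-normalized, off-diagonal, density matrix
$\varrho(\mathbf{r}_p^{(N)},\mathbf{r}_{p+1}^{(N)})$ evaluated at a
temperature a factor $P$ higher than the temperature of the real system. For more background on this approach
to quantum mechanics, see [193].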

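As in the case of the propagator, we shall be applying a symmetrical version of the Trotter formula [48] to the
high-temperature density matrix

$$e^{-\beta\hat{\mathcal{H}}/P} \approx e^{-\beta\hat{\mathcal{V}}/2P}\,e^{-\beta\hat{\mathcal{K}}/P}\,e^{-\beta\hat{\mathcal{V}}/2P}.$$

The potential energy part is diagonal in the coordinate representation, and we drop the hat indicating an
operator henceforth. The kinetic energy part may be evaluated by transforming to the momentum
representation and carrying out a Fourier transform. The result is

$$Q_{NVT} \approx \left(\frac{mP}{2\pi\beta\hbar^2}\right)^{3NP/2}\int d\mathbf{r}_1^{(N)}\cdots d\mathbf{r}_P^{(N)}\,\exp\left\{-\beta\left[V_{\mathrm{qu}}+V_{\mathrm{cl}}\right]\right\}$$

$$V_{\mathrm{qu}} = \frac{mP}{2\beta^2\hbar^2}\sum_{p=1}^{P}\left|\mathbf{r}_p^{(N)}-\mathbf{r}_{p+1}^{(N)}\right|^2, \qquad \mathbf{r}_{P+1}^{(N)}\equiv\mathbf{r}_1^{(N)}$$

$$V_{\mathrm{cl}} = \frac{1}{P}\left\{V(\mathbf{r}_1^{(N)}) + V(\mathbf{r}_2^{(N)}) + \cdots + V(\mathbf{r}_P^{(N)})\right\}$$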

This is better understood with a picture: see figure B3.3.11. The discretized path-integral is isomorphic to the
classical partition function of a system of $N$ ring polymers each having $P$ atoms. Each atom in a given ring
corresponds to a different 'imaginary time' point $p = 1 \ldots P$. $V(\mathbf{r}_p^{(N)})$ represents the interatomic interactions
(for example, Lennard-Jones) between the atoms of the real system. This couples together only
correspondingly labelled atoms, i.e., atoms with the same index $p$. So, between each pair of the original
atoms, there are $P$ such interactions, each one weaker than the true potential by a factor $1/P$. In addition,
harmonic quantum 'springs' couple together successively indexed atoms within a ring polymer. We may
simulate this classical ring polymer system by conventional MC or MD.
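
The two contributions to the ring polymer energy can be evaluated as in the sketch below. It assumes the
standard primitive discretization, in which neighbouring beads of the same atom are joined by springs of
constant $mP/(\beta\hbar)^2$, a single mass for all atoms, and a user-supplied function `potential` returning the
interaction energy of one imaginary-time slice; the data layout and function name are illustrative choices.

```python
def ring_polymer_potentials(beads, mass, beta, hbar, potential):
    """Return (v_cl, v_qu) for one ring-polymer configuration.

    beads:     list of P imaginary-time slices; each slice is a list of N
               atomic positions, each position a tuple of coordinates.
    potential: function mapping one slice to its potential energy.
    """
    n_beads = len(beads)
    # Real interactions act within each slice, each weakened by a factor 1/P.
    v_cl = sum(potential(time_slice) for time_slice in beads) / n_beads
    spring_constant = mass * n_beads / (beta * hbar) ** 2
    v_qu = 0.0
    for p in range(n_beads):
        this_slice = beads[p]
        next_slice = beads[(p + 1) % n_beads]  # ring closure: bead P+1 is bead 1
        for r_a, r_b in zip(this_slice, next_slice):
            squared = sum((a - b) ** 2 for a, b in zip(r_a, r_b))
            v_qu += 0.5 * spring_constant * squared
    return v_cl, v_qu
```

When every bead of an atom sits at the same position the spring term vanishes and `v_cl` equals the
ordinary potential energy, which is the classical limit of the isomorphism.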



Figure B3.3.11. The classical ring polymer isomorphism, for $N = 2$ atoms, using $P = 5$ beads. The wavy lines
represent quantum 'spring bonds' between different imaginary-time representations of the same atom. The 
dashed lines represent real pair-potential interactions, each diminished by a factor P, between the atoms, 
linking corresponding imaginary times. 

Temperature appears in the partition function in an unusual way. The average energy takes the form

$$E = \tfrac{3}{2}NPk_{\mathrm{B}}T + \langle V_{\mathrm{cl}}\rangle - \langle V_{\mathrm{qu}}\rangle.$$

As $P$ is increased, the partial cancellation between the kinetic part and the spring part may worsen the
statistics on $E$. This has led to suggestions of alternative ways of estimating $E$ [198]. As $P$ goes up, the springs
become stronger, the interactions in $V_{\mathrm{cl}}$ become (individually) weaker, and this leads to sampling problems. In
MD, one needs to use multiple time step methods to ensure proper handling of the spring vibrations, and there
is a possible physical bottleneck in the transfer of energy between the spring system and the other degrees of
freedom which must be handled properly [199]. In MC, one needs to use special methods to sample
configuration space efficiently [200, 201].

For fermions (especially) and bosons there are additional problems. Let $\mathcal{P}$ be one of the $N!$ permutations of
particle labels. Then the fermion density matrix $\varrho_{\mathrm{F}}$ has the symmetry

$$\varrho_{\mathrm{F}}(\mathbf{r}^{(N)},\mathbf{r}^{(N)\prime}) = (-1)^{\mathcal{P}}\varrho_{\mathrm{F}}(\mathcal{P}\mathbf{r}^{(N)},\mathbf{r}^{(N)\prime}) = (-1)^{\mathcal{P}}\varrho_{\mathrm{F}}(\mathbf{r}^{(N)},\mathcal{P}\mathbf{r}^{(N)\prime}).$$

It is possible to relate this to the 'Boltzmann' (i.e., distinguishable particle) density matrix
$\varrho(\mathbf{r}^{(N)},\mathbf{r}^{(N)\prime})$ by

$$\varrho_{\mathrm{F}}(\mathbf{r}^{(N)},\mathbf{r}^{(N)\prime}) = \frac{1}{N!}\sum_{\mathcal{P}}(-1)^{\mathcal{P}}\varrho(\mathcal{P}\mathbf{r}^{(N)},\mathbf{r}^{(N)\prime}).$$

It is necessary to sum over these permutations in a path integral simulation. (The same sum is needed for
bosons, without the sign factor.) For fermions, odd permutations contribute with negative weight.
Near-cancelling positive and negative permutations constitute a major practical problem [196].

B3.3.11 CAR-PARRINELLO SIMULATIONS 

Car and Parrinello [ 202 ] proposed a technique for efficiently solving the Schrodinger equation which has had 
an enormous impact on materials simulation (for reviews, see [ 203 , 204 , 205 , 206 ]). The technique is an ab 
initio one, i.e., free of empirical parameters, and is based on the use of a quantum mechanical orthonormal
basis set $\{\psi_n\} = \{\psi_n(\mathbf{r})\}$ to describe the electronic degrees of freedom. Specifically, the aim is to obtain the
electron density $\rho(\mathbf{r})$; the total energy of the system may then be written as a functional of this density, whose
minimization yields the ground state energy [207]. Pseudopotentials [208] represent the effects of the atomic
cores on the valence electrons, allowing some economies. The energy functional is written

$$E[\{\psi_n\},\{\mathbf{R}\}] = \sum_n\langle\psi_n|\hat{K}+\hat{V}_{\mathrm{c}}|\psi_n\rangle + \frac{1}{2}\int d\mathbf{r}\,d\mathbf{r}'\,\frac{\rho(\mathbf{r})\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} \qquad \text{(B3.3.8)}$$

$$\qquad\qquad + E_{\mathrm{xc}}[\rho] + \frac{1}{2}\sum_{I\neq J}\frac{Q_IQ_J}{|\mathbf{R}_I-\mathbf{R}_J|}. \qquad \text{(B3.3.9)}$$

Here we distinguish between nuclear coordinates $\mathbf{R}$ and electronic coordinates $\mathbf{r}$; $\hat{K}$ is the single-particle kinetic
energy operator, and $\hat{V}_{\mathrm{c}}$ is the total pseudopotential operator for the interaction between the valence electrons
and the combined nucleus + frozen core electrons. The electron-electron and nucleus-nucleus Coulomb
interactions are easily recognized, and the remaining term $E_{\mathrm{xc}}[\rho]$ is the electronic exchange and correlation
energy functional. This is usually treated in the local density approximation, using ground-state data for the
homogeneous electron gas [209]; the most promising improvements seem to be based on the addition of
gradient corrections [205, 206].

For each configuration of the nuclei, minimization of the total energy with respect to the electron density
yields the instantaneous value of a potential energy function $V(\mathbf{R}^{(N)})$, and the corresponding forces on the
nuclei. In principle, assuming an adiabatic separation between nuclear and electronic motion, Newton's
equations for the nuclei
may be solved in the usual way, while the electrons are allowed to evolve according to Schrodinger's
equations, remaining on the instantaneous ground-state surface. This turns out to be very inefficient in
practice, and the breakthrough came with the suggestion [202] that a classical dynamical evolution of the
electronic configuration could be used to stay in the ground state. A Lagrangian, involving both nuclear and
electronic degrees of freedom, is written down; the electrons are given fictitious masses, and minimization of
the electronic energy may be performed by introducing a friction coefficient.
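
The idea of minimizing an electronic energy by damped fictitious dynamics can be illustrated with a
deliberately tiny toy model. Everything specific here is an assumption made for illustration: a single
'orbital' expanded in a two-function basis, a fixed 2x2 matrix standing in for the Hamiltonian, and simple
renormalization after each step in place of the Lagrange-multiplier constraints a real Car-Parrinello code
would use. It is not a density functional calculation.

```python
import math


def energy(h, c):
    """Expectation value c^T H c for a normalized coefficient vector c."""
    n = len(c)
    return sum(c[i] * h[i][j] * c[j] for i in range(n) for j in range(n))


def normalize(c):
    norm = math.sqrt(sum(x * x for x in c))
    return [x / norm for x in c]


def damped_minimization(h, c_start, fictitious_mass=1.0, friction=0.2,
                        dt=0.2, n_steps=500):
    """Relax the coefficients towards the lowest-energy state of h.

    Uses a Verlet-like update with velocity damping:
    c_new = c + (1 - friction) * (c - c_old) + dt^2 * force / mass.
    """
    n = len(c_start)
    c_old = normalize(c_start)
    c = list(c_old)
    for _ in range(n_steps):
        e = energy(h, c)
        # Force: minus the energy gradient, with the component along c removed
        # so that (to first order) the norm is preserved.
        force = [-(sum(h[i][j] * c[j] for j in range(n)) - e * c[i])
                 for i in range(n)]
        c_new = [c[i] + (1.0 - friction) * (c[i] - c_old[i])
                 + dt * dt * force[i] / fictitious_mass for i in range(n)]
        c_old, c = c, normalize(c_new)
    return c, energy(h, c)
```

With the friction switched off and the nuclei moving, the same second-order equations would let the
coefficients follow the nuclear motion, which is the essence of the unified dynamical scheme described above.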

The Car-Parrinello method has found wide applicability, especially for studying systems in which structure
and bonding are inseparable, or for materials under extreme conditions for which empirical potentials would be
unreliable. Examples are the studies by Alfe and Gillan [210] and de Wijs et al [211] of iron in the Earth's
core, at temperatures of several thousand kelvin and pressures sufficient to compress the metal to about half
its normal volume. It was concluded that the liquid iron in the core is not exceptionally viscous (as has been
suggested by some seismic measurements) and that dissolved sulphur atoms show no tendency to form
clusters or chains (which might have a large effect on viscosity). This is shown in figure B3.3.12.
Additionally, the simulations suggest that the solid part of the core has the hcp crystal structure, contrary to
that inferred from experiments at lower pressure and temperature.



Figure B3.3.12. Sulphur atoms in liquid iron at the Earth's core conditions, simulated by first-principles
Car-Parrinello molecular dynamics. (a) Initial conditions, showing a manually-prepared initial cluster of sulphur
atoms. (b) A short time later, indicating spontaneous dispersal of the sulphur atoms, which mingle with the
surrounding iron atoms. Thanks are due to D Alfe and M J Gillan for this figure. For further details see [210,
211].

A similar approach, in spirit, has been proposed [212] for the study of two-component classical systems, for
example polyelectrolytes, which consist of mesoscopic, highly-charged, polyions, and microscopic,
oppositely-charged, counterions. This time, an 'effective free energy', depending parametrically on the
polyion coordinates $\mathbf{R}$, arises by integration over the counterion coordinates $\mathbf{r}$. A unified dynamical scheme, in
which the counterions are given fictitious masses, allows the counterion density to adjust adiabatically to the
slower motion of the polyions, and hence permits the free energy to be minimized as the system evolves.


B3.3.12 PARALLEL SIMULATIONS 

MD programs may be efficiently parallelized, that is, the computational work divided between many 
processors to result in faster execution. Well established message-passing standards and software make it 
relatively easy to write portable and efficient codes. The algorithm to advance positions and momenta is 
trivially handled, with each processor being responsible for a subset of the atoms. The critical considerations 
are (i) the parallelization of the time-consuming force calculation, and (ii) the overheads associated with 
communicating information between processors. Two general methodologies seem to be most promising. In 
the replicated data method, all the processors hold copies of all the atomic coordinates; however, in the 
double loop over pair interactions, each processor deals with a subset of pairs. Some care needs to be taken to 
balance the load between processors, and the results of the force calculations must be broadcast to all other 
processors, which may be time-consuming, but perhaps the biggest drawback is the memory requirement of 
holding copies of all data on all nodes. Nonetheless, this method is easy to program and reasonably efficient 
for many purposes [ 213 , 214 , 215 ]. In the domain decomposition method, the simulation box is split into 
(usually) cubic regions, and each processor is responsible only for the atoms in a given region; there is some 
communication of information from neighbouring domains before the force calculation, and also some 
redistribution of atoms as they move around the system. This approach may be integrated with the link-cell 
approach of section B3.3.3.5, and is especially efficient for systems with short-range forces [ 213 , 215 , 216 , 
217 , 218 ]. An example of the capabilities of such an approach on a massively parallel supercomputer is the 
study by Holian and Lomdahl [ 219 ] of shock waves in an fcc crystal of 10 million atoms, as illustrated in figure 
B3.3.13 . In this case, a shock is generated by reflecting atoms at a piston face, i.e., imposing a momentum 
mirror. A system of this size was essential to ensure that the periodic boundaries do not limit the plastic flow 
induced by the shock: this can be seen in the randomly-spaced plaid pattern of the figure. The wave 
propagates back about 60 lattice spacings, generates a large number of stacking faults distributed randomly on 
the four {111} slip systems, and eventually produces a nonplanar propagation front. 
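
The replicated-data idea described above can be sketched in a few lines. The example is illustrative only: the "processors" are simulated by a plain loop, the 1D inverse-square pair force is a hypothetical stand-in for a real interaction, and the final element-wise sum plays the role of the global communication step. Every worker sees all coordinates but handles only its own share of the pair list.

```python
import itertools

def pair_force(xi, xj):
    """Toy 1D repulsive force on particle i due to particle j (hypothetical model)."""
    d = xi - xj
    return d / abs(d) ** 3

def partial_forces(coords, pairs):
    """Forces accumulated from one subset of pairs, using Newton's third law."""
    f = [0.0] * len(coords)
    for i, j in pairs:
        fij = pair_force(coords[i], coords[j])
        f[i] += fij
        f[j] -= fij
    return f

def replicated_data_forces(coords, nproc):
    """Every 'processor' holds all coordinates but treats only its share of pairs."""
    pairs = list(itertools.combinations(range(len(coords)), 2))
    chunks = [pairs[p::nproc] for p in range(nproc)]          # round-robin load balance
    partial = [partial_forces(coords, chunk) for chunk in chunks]
    return [sum(column) for column in zip(*partial)]          # global sum of partial forces

coords = [0.0, 1.0, 2.5, 4.0, 7.0]
forces = replicated_data_forces(coords, nproc=3)
print(forces)
```

The round-robin split is one simple way to balance the load; the memory cost mentioned in the text appears here as the full `coords` list being passed to every worker.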

Figure B3.3.13. Intersecting stacking faults in an fcc crystal at the impact plane induced by collision with a 
momentum mirror for a square cross section of side 100 unit cells. The shock wave has advanced halfway to 
the rear (-250 planes). Atom shading indicates potential energy. Thanks are due to B Holian for this figure. 
For further details see [ 219 ]. 

Although this section has concentrated on MD, it should not be forgotten that lattice-based MC codes may be 
parallelized very efficiently; for more information on parallel simulation methods see [ 220 , 221 , 222 and 223 ] 
and references therein. 


B3.3.13 OUTLOOK 

With the rapid development of computer power, and the continual innovation of simulation methods, it is 
impossible to predict what may be achieved over the next few years, except to say that the outlook is very 
promising. The areas of rare events, phase equilibria, and quantum simulation continue to be active. 

An easily recognizable trend is the increasing application of simulation methods to problems of direct 
practical benefit to industry [ 224 ]. A pointer to this kind of use is provided by the synthesis of a small-pore 
microporous material, using a structure-directing template molecule designed by computer [ 225 ]. The aim is 
to promote formation of a desired material (here a cobalt aluminophosphate catalyst in the so-called CHA 
structure) without generating competing microporous phases. The simulation procedure allows molecular 
entities to be grown from a seed molecule by adding standard fragments, under the control of a cost function 
to minimize non-bonded overlaps with the surrounding CHA framework. Likely templates are ranked by 
binding energy, which measures how well they fit in the pore. The result is illustrated in figure B3.3.14. This 
led to a successful synthetic route: the suggested template molecule forms the desired pure mesoporous 
material in 4 h at 180 °C, without the formation of competing structures which are found with other templates. 

Figure B3.3.14. Template molecule in a zeolite cage. The CHA structure (periodic in the calculation but only 
a fragment shown here) is drawn by omitting the oxygens which are positioned approximately halfway along 
the lines shown connecting the tetrahedral silicon atoms. The molecule shown is 4-piperidinopiperidine, 
which was generated from the dicyclohexane motif suggested by computer. Thanks are due to D W Lewis and 
C R A Catlow for this figure. For further details see [ 225 ]. 


A further theme is the development of techniques to bridge the length and time scales between truly 
molecular-scale simulations and more coarse-grained descriptions. Typical examples are dissipative particle 
dynamics [ 226 ] and the lattice-Boltzmann method [ 227 ]. Part of the motivation for this is the recognition that 
brute-force molecular simulation will always be limited in time scale by achievable chip speeds, even if 
increased use of parallel computers allows one to tackle larger length scales. Nonetheless, there will always be 
the need to relate such work to underlying molecular parameters, through statistical mechanics. A more 
detailed discussion of these techniques would take us beyond the scope of this chapter. 

B3.4 Quantum dynamics and spectroscopy 

Sybil M Anderson, Rovshan G Sadygov and Daniel Neuhauser 


B3.4.1 INTRODUCTION 

The study of quantum effects associated with nuclear motion is a distinct field of chemistry, known as 
quantum molecular dynamics. This section gives an overview of the methodology of the field; for further 
reading, consult [1, 2, 3, 4 and 5]. 

The importance of non-classical behaviour in molecular dynamics has its origins in the inherently quantal 
nature of atomic motion. The de Broglie wavelengths of atoms are small but non-vanishing. For example, 
hydrogen has a de Broglie wavelength that can be as high as ≈1.0 Å at room temperature. Specific quantum 
effects in atomic and molecular motion include: zero-point motion, most notably for hydrogen, which has a 
zero-point energy of ~10 kcal mol^-1 in many of its covalent interactions; interference resonances, which are 
spikes in reaction or pre-dissociation probabilities associated with quasi-bound states of molecules [6, 7, 8 and 
9]; and tunnelling of nuclei, which is important in many catalytic reactions [10, 11, 12, 13 and 14]. 
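
The figure of about 1 Å for hydrogen can be checked with a short calculation. The sketch below assumes one common convention, the thermal de Broglie wavelength λ = h / sqrt(2π m k_B T); other definitions (for example, based on the mean thermal momentum) differ by factors of order one. The function name and the choice of 300 K are illustrative.

```python
import math

H_PLANCK = 6.62607015e-34   # Planck constant, J s
K_B = 1.380649e-23          # Boltzmann constant, J / K
AMU = 1.66053906660e-27     # atomic mass unit, kg

def thermal_de_broglie_angstrom(mass_amu, temperature):
    """Thermal de Broglie wavelength h / sqrt(2 pi m k_B T), returned in angstroms."""
    mass = mass_amu * AMU
    wavelength = H_PLANCK / math.sqrt(2.0 * math.pi * mass * K_B * temperature)
    return wavelength * 1e10

lam_H = thermal_de_broglie_angstrom(1.008, 300.0)   # hydrogen atom
lam_C = thermal_de_broglie_angstrom(12.0, 300.0)    # carbon atom, for comparison
print(f"H: {lam_H:.2f} A, C: {lam_C:.2f} A")
```

Because the wavelength scales as the inverse square root of the mass, heavier atoms have markedly shorter wavelengths, which is why quantum effects in nuclear motion are most pronounced for hydrogen.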

In its most fundamental form, quantum molecular dynamics is associated with solving the Schrodinger 
equation for molecular motion, whether using a single electronic surface (as in the Born-Oppenheimer 
approximation, section B3.4.2) or with the inclusion of multiple electronic states, which is important when 
discussing non-adiabatic effects, in which the electronic state is changed [ 15 , 16 , 17 , 18 and 19 ]. 

Sections B3.4.3, B3.4.4 and B3.4.5 describe methods of solving the Schrödinger equation for 
scattering events. Sections B3.4.6 and B3.4.7 proceed to discuss photo-dissociation and bound states. 

As these methods are explored, it is quickly realized that the numerical effort in the theoretical description 
grows prohibitively large with the number of atoms in a molecule. The difficulty lies in precisely what makes 
molecular motion fundamentally quasi-classical, i.e. the large molecular masses (relative to the mass of the 
electron). Consequently, a molecular wavefunction has many oscillations and is difficult to model 
numerically. There have been many attempts at developing alternate approaches for representing quantum 
wavefunctions and observables without the use of large grids or basis sets, ranging from approximations to 
path-integral descriptions. The basics of these approaches are described in section B3.4.8. Later, section 
B3.4.9 describes the issues involved in the study of non-adiabatic phenomena. 

Finally, section B3.4.10 touches on the application of quantum molecular dynamics to a very exciting field: 
laser interactions with molecules. This field presents, in principle, the opportunity to influence chemistry by 
lasers rather than to simply observe it. 

The scope of this section restricts the discussion. One omitted topic is the collision and interaction of 
molecules with surfaces (see [20, 21] and section A3.9). This topic connects quantum molecular dynamics in 
gas and condensed phases. Depending on the time scales of the interaction of a molecule with a surface, the 
reactions are similar to those in one phase or the other. If the collision is fast, so that one may neglect the 
motion of the surface molecules and treat them as frozen, it is effectively a gas-phase reaction. On the other 
extreme, if the motion of the adsorbate is slow or comparable in time to the motion of surface and subsurface 
molecules, then the collision problem becomes very similar to the interaction of molecules in condensed 
phases. The latter is the subject of a separate section, C3.5. 

Another modern and highly exciting topic, omitted here due to lack of space, is the motion of very cold 
molecules [22, 23 and 24], which can have de Broglie wavelengths that are as large as or larger than the 
distances between the molecules. The simplest examples are essentially extensions of floppy van der Waals 
structures, but at the extreme, when the wavelength is extremely large and there are many molecules per 
molecular wavelength, one ends up with Bose-Einstein condensates (where the wavefunctions of the 
molecules coalesce to form one giant coherent molecular function), and even molecular lasers (i.e. lasers 
where the fundamental particles are atoms or molecules rather than photons [25]) can be made. Section C1.4 
provides an overview of this new field. 

As in any field, it is useful to clarify terminology. Throughout this section an 'atom' more specifically refers 
to its nuclear centre. Also, for most of the section the ħ = 1 convention is used. Finally, it should be noted that 
in the literature the label 'quantum molecular dynamics' is also sometimes used for a purely classical 
description of atomic motion under the potential created by the electronic distribution. 

Finally, this section is related to several others, especially section A3.11 on formal scattering, which should 
be carefully consulted. 


B3.4.2 QUANTUM MOTION ON A SINGLE ELECTRONIC SURFACE 

A corner-stone of a large portion of quantum molecular dynamics is the use of a single electronic surface. 
Since electrons are much lighter than nuclei, they typically adjust their wavefunction to follow the nuclei [26]. 
Specifically, if a collision is started in which the electrons are in their ground state, they typically remain in 
the ground state. An exception is non-adiabatic processes, which are discussed later in this section. 

The single-surface assumption, known also as the Born-Oppenheimer approximation, implies that the nuclei 
are described by a single wavefunction ψ(x, t) (where x is a multi-dimensional vector describing the nuclear 
positions). The time-dependent equation for the evolution of the wavefunction is simply 

\[ i\,\frac{\partial \psi}{\partial t} = H\psi = [K + V(x)]\,\psi \tag{B3.4.1} \]

where H is the Hamiltonian governing the motion of the nuclei (or the atomic motion, as typically denoted), 
K is the kinetic term (a sum of terms of the form $-\frac{1}{2m_j}\frac{\partial^2}{\partial x_j^2}$ for each 
atom j) and V is the Born-Oppenheimer potential, which is defined as the electronic ground-state energy for 
nuclear configuration x, including the nuclear-nuclear repulsion. 

As a word of caution, the Born-Oppenheimer assumption is not universally valid. There are many reactions in 
which, for example, non-adiabatic curve-crossing processes occur. In these cases, two electronic potentials 
(e.g. the ground state and an excited state) are locally equal to one another at some configuration. 
Curve-crossing effects are highly non-trivial and can affect the nuclear dynamics even at energies which are 
well below the curve crossing. These points are briefly discussed in section B3.4.9. 


B3.4.3 SCATTERING 

B3.4.3.1 COLLINEAR MOTION 

Equation (B3.4.1) is general and applies to both scattering and bound state spectroscopy. Scattering will be 
considered first. For simplicity, the discussion uses the collinear model for the A + BC → AB + C reaction 
(i.e. assuming all particles lie on a line). This model is easy to visualize and embodies most elements of three- 
dimensional (3D) scattering of larger molecules. 

After removal of centre-of-mass motion, there are two independent distances which need to be considered for 
a collinear problem, $r_{BA}$ (= $r_B - r_A$, where $r_B$ and $r_A$ denote the positions of B and A on the 
line) and $r_{CB}$, which is similarly defined. Unfortunately, the kinetic energy is not conveniently described 
with these coordinates; therefore, alternate systems are used. The most convenient one is reactant Jacobi 
coordinates (r, R), where $r = r_{CB}$, and R is the distance between A and the centre of mass of B and C [27]. 
In these coordinates, the kinetic energy gets a simple and separable form so that the Schrödinger equation is 

\[ i\,\frac{\partial \psi}{\partial t} = \left[ -\frac{1}{2M}\frac{\partial^2}{\partial R^2} - \frac{1}{2\mu}\frac{\partial^2}{\partial r^2} + V(R,r) \right] \psi(R,r,t) \tag{B3.4.2} \]

where μ is the reduced mass associated with the BC vibration, and M is the mass associated with the A-BC 
motion. 
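
The separable form of the kinetic energy in Jacobi coordinates can be verified numerically. The sketch below assumes the standard expressions μ = m_B m_C / (m_B + m_C) and M = m_A (m_B + m_C) / (m_A + m_B + m_C), which are not written out in the text, and uses approximate masses for D + H2 in atomic mass units. Function names and the sample velocities are illustrative.

```python
def jacobi_masses(m_a, m_b, m_c):
    """Reduced masses for the BC vibration (mu) and the A-BC relative motion (M)."""
    mu = m_b * m_c / (m_b + m_c)
    big_m = m_a * (m_b + m_c) / (m_a + m_b + m_c)
    return mu, big_m

def to_jacobi(m_b, m_c, x_a, x_b, x_c):
    """Map collinear positions (or velocities) to reactant Jacobi coordinates (r, R)."""
    r = x_c - x_b
    com_bc = (m_b * x_b + m_c * x_c) / (m_b + m_c)
    return r, com_bc - x_a

# D + H2 with approximate masses in amu; velocities in arbitrary units
m_a, m_b, m_c = 2.014, 1.008, 1.008
v_a, v_b, v_c = 0.7, -0.2, 0.4

mu, big_m = jacobi_masses(m_a, m_b, m_c)
r_dot, big_r_dot = to_jacobi(m_b, m_c, v_a, v_b, v_c)   # the map is linear, so it applies to velocities

m_tot = m_a + m_b + m_c
v_com = (m_a * v_a + m_b * v_b + m_c * v_c) / m_tot
t_lab = 0.5 * (m_a * v_a**2 + m_b * v_b**2 + m_c * v_c**2)
t_relative = t_lab - 0.5 * m_tot * v_com**2             # kinetic energy after removing the centre of mass
t_jacobi = 0.5 * big_m * big_r_dot**2 + 0.5 * mu * r_dot**2
print(t_relative, t_jacobi)
```

The two printed numbers agree, confirming that no cross term in the two velocities appears, which is the classical counterpart of the two separate second-derivative terms in equation (B3.4.2).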

Figure B3.4.1 shows the potential surface for a simple collinear reaction, D + H2 → HD + H. The most 
notable aspect of this potential is the angle between the products and reactant arrangement. Below breakup 
(D + H2 → D + H + H, which can occur only at several electronvolts above the reaction threshold), the potential 
surface has three relevant regions: the reactants (D + H2) asymptote (arrangement) at large R and small r; the 
products (H + DH) asymptote; and a strong interaction region where all three atoms are closely spaced. The 
strong-interaction region is extended over a region of ≈1 Å × 1 Å, containing several oscillations of the full 
(DH2) wavefunction. Since the potential is not separable in R and r, upon reaction the particles would 
exchange vibrational and translational energy (and in the three-dimensional case, also rotational energy). 

Figure B3.4.1. The potential surface for the collinear D + H2 → DH + H reaction (this potential is the same as 
for H + H2 → H2 + H, but to make the products and reactants identification clearer the isotopically substituted 
reaction is used). The D + H2 reactant arrangement and the DH + H product arrangement are denoted. The 
coordinates are r, the H2 distance, and R, the distance between the D and the H2 centre of mass. Distances are 
measured in angstroms; the potential contours shown are -4.7 eV, -4.55 eV, ..., -3.8 eV. (The potential energy 
is zero when the particles are far from each other. Only the first few contours are shown.) For reference, the 
zero-point energy for H2 is -4.47 eV, i.e. 0.27 eV above the H2 potential minimum (-4.74 eV); the 
room-temperature thermal kinetic energy is approximately 0.03 eV. The graph uses the accurate 
Liu-Siegbahn-Truhlar-Horowitz (LSTH) potential surface [ 195 ]. 

The collinear model does not include bifurcation, i.e. the possibility of several product channels which the 
system can access. A model potential surface for an A + BC → AB + C, AC + B reaction is shown in figure 
B3.4.2. Both of these examples will be used in the discussion below. 

Figure B3.4.2. (a) A schematic potential surface showing bifurcation for a triatomic reactive system, (b) By 
blocking the products' arrangement with an absorbing potential (shaded area) the reactive system is reduced 
to one arrangement; this scheme enables calculation of both total reactivities and state-to-state information. 
Reprinted from [46] with permission. 


B3.4.3.2 BOUNDARY CONDITIONS 


The eventual goal in scattering calculations is essentially to obtain the scattering matrix, S (see section A3.11 
and equation (B3.4.4) below). The scattering matrix can be obtained by reference to the solution of the 
time-independent Schrödinger equation, fulfilling 

\[ (H - E)\,\psi = 0 \tag{B3.4.3} \]

which is the Fourier transform of equation (B3.4.1). However, this wavefunction must obey the appropriate 
boundary conditions. It should have components associated with a single 'incoming' channel (an incoming 
wave associated with the translational A + BC motion, $e^{-ik_{n_0}R}$, multiplied by a wavefunction for BC at a 
specific initial target state, $\phi_{n_0}(r)$). In addition, there are components associated with all 'outgoing' 
channels. (See section A3.11 and [27].) For example, for the bifurcating potential case where three asymptotic 
arrangements are formally possible, the wavefunction (with an index $n_0$ attached) is 

\[ \psi_{n_0}(R,r,E)\ \text{"="}\ \frac{e^{-ik_{n_0}R}}{\sqrt{k_{n_0}/M}}\,\phi_{n_0}(r) + \sum_{n} S_{nn_0}(E)\,\frac{e^{ik_nR}}{\sqrt{k_n/M}}\,\phi_n(r) + \sum_{\bar n} S_{\bar n n_0}(E)\,\frac{e^{i\bar k_{\bar n}\bar R}}{\sqrt{\bar k_{\bar n}/\bar M}}\,\bar\phi_{\bar n}(\bar r) + \sum_{\bar{\bar n}} S_{\bar{\bar n} n_0}(E)\,\frac{e^{i\bar{\bar k}_{\bar{\bar n}}\bar{\bar R}}}{\sqrt{\bar{\bar k}_{\bar{\bar n}}/\bar{\bar M}}}\,\bar{\bar\phi}_{\bar{\bar n}}(\bar{\bar r}) \tag{B3.4.4} \]

where $\phi_n$ is the nth vibrational state of the BC diatomic and $k_n$ is the translational momentum of A when 
BC is in the nth state ($k_n^2/2M = E - \varepsilon_n$). Quantities with a bar ($\bar n$, $\bar R$, etc) refer to the 
product channel AB + C, so $\bar R$ is the distance between AB and C, etc. In addition, for cases in which the 
AC + B channel is open, a double-bar notation is used ($\bar{\bar n}$, etc). $S_{\bar n n_0}$ is the scattering matrix 
associated with the amplitude of the system to emerge at product channel $\bar n$ when it is initially at reactant 
channel $n_0$. The equality sign is in quotes to denote that this relation is only valid in the asymptote where the 
system is separated into an atom and a diatom. 

B3.4.3.3 SCATTERING TECHNIQUES 

The presence of multiple arrangements makes molecular scattering very challenging theoretically. After 
much trial and error, several techniques have been developed. These techniques generally fall into two broad 
categories: 

• methods which aim at treating all possible arrangements simultaneously; 

• arrangement-decoupling approaches [28, 29, 30 and 31] where an absorbing potential is used to 
convert multiple-arrangement problems to inelastic-scattering (or even bound-state-like) problems. In 
recent years, these approaches have become very powerful for large-scale applications. 


B3.4.3.4 ALL-ARRANGEMENT METHODS 


(A) WAVEFUNCTION EXPANSION 


The conceptually simplest approach to solve for the S-matrix elements is to require the wavefunction to have 
the form of equation (B3.4.4), supplemented by a bound function which vanishes in the asymptote [32, 33, 34 
and 35]. This approach is analogous to the full configuration-interaction (CI) expansion in electronic structure 
calculations, except that now one is expanding the nuclear wavefunction. While successful for intermediate 
size problems, the resulting matrices are not very sparse because of the use of multiple coordinate systems, so 
that this type of method is prohibitively expensive for diatom-diatom reactions at high energies. 


(B) CLOSE COUPLING 

Alternatively, one can use close-coupling methods. These methods are easiest to understand for single 
arrangement problems (i.e. when both the AB + C and AC + B product arrangements are very high in energy 
so that only the A + BC reactant arrangement can be accessed). Then one writes 


\[ \psi_{n_0}(R,r,E) = \sum_{n} a_{nn_0}(R)\,\phi_n(r) \tag{B3.4.5} \]

and it is readily shown that the Schrödinger equation can be written as 

\[ \frac{d^2}{dR^2}\,a_{nn_0} = 2M \sum_{m} \left( U_{nm}(R) - E\,\delta_{mn} \right) a_{mn_0} \tag{B3.4.6} \]

where 

\[ U_{nm}(R) = \int \phi_n(r) \left( -\frac{1}{2\mu}\frac{\partial^2}{\partial r^2} + V(R,r) \right) \phi_m(r)\, dr. \tag{B3.4.7} \]

Equation (B3.4.6) is solved by starting at a small value of R, denoted by $R_{\rm start}$, where the potential is high 
and the wavefunction is exponentially vanishing, and picking random values for $a_{nn_0}$ and 
$\partial a_{nn_0}/\partial R$. Then, the equations are propagated towards larger R. Eventually, $\psi_{n_0}(R,r)$ is 
resolved at a large value of R to yield the S-matrix [36]. In practice, one has to avoid linear dependence between 
the solutions associated with different initial conditions. This is achieved by simple stabilization approaches. 
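
As a concrete illustration of equation (B3.4.7), the coupling matrix can be evaluated by quadrature for a model system. Everything specific in the sketch below is an assumption made for illustration: the BC vibration is taken as a harmonic oscillator with μ = ω = 1 (ħ = 1), and the hypothetical potential V(R, r) = r²/2 + g r exp(-αR) couples the vibration to the translation only at short range. For this basis, the kinetic operator acting on a basis function is known analytically, which avoids numerical differentiation.

```python
import math

def ho_state(n, r):
    """n-th harmonic-oscillator eigenfunction for mu = omega = 1 (hbar = 1)."""
    h_prev, h = 0.0, 1.0
    for k in range(n):                       # Hermite recursion H_{k+1} = 2 r H_k - 2 k H_{k-1}
        h_prev, h = h, 2.0 * r * h - 2.0 * k * h_prev
    norm = 1.0 / math.sqrt(2.0 ** n * math.factorial(n) * math.sqrt(math.pi))
    return norm * h * math.exp(-0.5 * r * r)

def model_potential(big_r, r, g=0.3, alpha=1.0):
    """Hypothetical V(R, r): harmonic BC bond plus a short-range coupling term."""
    return 0.5 * r * r + g * r * math.exp(-alpha * big_r)

def coupling_matrix(big_r, nstates, rmax=10.0, npts=2001):
    """U_nm(R) = integral of phi_n (-1/2 d^2/dr^2 + V(R, r)) phi_m dr, on a uniform grid."""
    dr = 2.0 * rmax / (npts - 1)
    u = [[0.0] * nstates for _ in range(nstates)]
    for i in range(npts):
        r = -rmax + i * dr
        phi = [ho_state(n, r) for n in range(nstates)]
        v = model_potential(big_r, r)
        for m in range(nstates):
            # For this basis, -1/2 phi_m'' = (m + 1/2 - r^2/2) phi_m
            h_phi_m = (m + 0.5 - 0.5 * r * r + v) * phi[m]
            for n in range(nstates):
                u[n][m] += phi[n] * h_phi_m * dr
    return u

u_matrix = coupling_matrix(big_r=1.0, nstates=3)
for row in u_matrix:
    print(["%.5f" % value for value in row])
```

For this model the diagonal elements reduce to the vibrational energies n + 1/2, and the off-diagonal elements decay as exp(-αR), so the channels decouple at large R, where the propagated solution is matched to the asymptotic form.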

The close-coupling approach works readily and simply if the reaction is purely 'inelastic'. The method can 
also be made to work very simply for a single product arrangement (as in collinear reactions), by using a 
'twisted' coordinate system, most conveniently reaction path coordinates [37, 38 and 39] as shown in figure 
B3.4.3. 

Figure B3.4.3. A schematic figure showing, for the DH2 collinear system, a reaction-path coordinate Q 
connecting continuously the reactants and the single products asymptote. Also shown are the cuts denoting the 
coordinate perpendicular to Q. 

The complications which occur with bifurcation, i.e. when more than one product arrangement is accessible, 
can be solved by various methods. Historically, the first close-coupling approaches for multiple product 
channels employed fitting procedures [40], where the close-coupling equations are simultaneously propagated 
from each of the asymptotes inwards and then are fitted together at a dividing surface. This approach has been 
replaced in recent calculations by two methods. One is based on using absorbing potentials to turn the reactive 
problem into an inelastic one, as explained later. The other is to use hyperspherical coordinates for carrying 
out the close-coupling propagation [41, 42, 43, 44 and 45]. The hyperspherical coordinates consist of a single 
radius ρ, which is zero at the origin (when all nuclei are stuck together) and increases outwards, and a set of 
angles. For the collinear problem as well as the atom-diatom problem (involving three independent distances) 
the hyperspherical coordinates are typically just the regular spherical coordinates. Close-coupling propagation 
starts at ρ = 0 and moves outward until a large value of ρ is reached. When the asymptotes are reached one fits 
the wavefunction to have the form of equation (B3.4.4) and thus obtains the scattering matrix. 


B3.4.4 ARRANGEMENT DECOUPLING BY ABSORBING POTENTIALS 

A simplifying approach to scattering is to eliminate all the product asymptotes. This can be done efficiently 
and rigorously [28, 30, 31, 46] by inserting in the Hamiltonian a negative-imaginary potential (or more 
generally a complex potential with a negative imaginary term) [30, 47, 48 and 49]. This potential, denoted as 
$-iV_I(R,r)$, acts to 'chop' away the product arrangements, while retaining the correct form of the wavefunction 
(see figure B3.4.4). 

Figure B3.4.4. (a) Schematic evolution in a 1D problem of a wavepacket impinging on an absorbing potential 
with typical parameters (shaded). The width and magnitude of the absorbing potential must be sufficiently 
large so that a wavepacket impinging on it would eventually be completely absorbed, with very little 
reflected. (b) In time-independent language, the absorbing potential forces the wavefunction in the region 
preceding it to have an outgoing ($e^{ikx}$) form, with very little reflected ($e^{-ikx}$) component. 

To understand this unique feature of the negative imaginary potential it is easiest to refer to the 
time-dependent language, discussed later in the section. Heuristically, note that the time-dependent propagator, 
$e^{-i(H - iV_I)t}$, essentially contains an $e^{-V_I t}$ term which damps the wavefunction in regions where 
$V_I$ is positive. The negative imaginary potential therefore prevents reflection and thus imposes 'outgoing 
boundary conditions' [31] on the wavefunctions to which it is applied. 


There is considerable freedom in the choice of absorbing potentials; they are simply required [ 30 ] to be 
sufficiently extended to absorb any wavefunction which impinges on them, while not rising so sharply that 
the wave is reflected from their rising slopes before it gets absorbed. This implies that they typically need 
to extend only over approximately one to two de Broglie wavelengths, which is usually short enough to add 
only a negligible overhead to the size of the required grids. 

The negative imaginary potentials can be applied in any scattering formalism. In close coupling, they can be 
implemented to block any product arrangement [ 31 ] (see figure B3.4.2 ) and this thereby converts the reactive 
problem to an inelastic one; the only cost is the propagation of a complex matrix, $a_{nn_0}(R)$, rather than a real one. 

Alternately, absorbing potentials can also be applied to convert scattering to a bound-state-like problem. One 
method is to write the Schrödinger wavefunction as a sum of two terms, 
$\psi_{n_0}(E) = \chi_{n_0}(E) + \zeta_{n_0}(E)$, where $\zeta_{n_0}$ includes the known incoming wave term, and 
$\chi_{n_0}$ includes the unknown outgoing wave part (see figure B3.4.5). The final equation is then [ 28 ] 

\[ \left( E - (H - iV_I) \right) \chi_{n_0} = (H - E)\,\zeta_{n_0} \tag{B3.4.8} \]

where the absorbing potentials are inserted to impose the correct boundary conditions on $\chi_{n_0}$. 

Figure B3.4.5. Schematic plot of a two-dimensional potential surface for D + H2 restricted by an absorbing 
potential (shaded area). The absorbing potential $-iV_I(R,r)$ (shaded region) rises gently outward towards the 
edges of the grid. In practice, the grid needs to be extended only by ≈0.5-1 Å, or less for heavier mass 
systems. The absorbing potential imposes the correct boundary condition on the wavefunction in the inner 
region. This basic paradigm of a small grid (denoted by dots), used in the strong-interaction region to describe 
the main part of the wavefunction, applies in several different formulations: time-independent 
arrangement-decoupling scattering, where the time-independent wavefunction ψ(R, r, E) is placed on this grid 
(supplemented by a function describing the initial wavefunction); time-dependent scattering (where it is used 
to describe the non-elastic part of the wavefunction, and the elastic part is represented on a separate grid); 
flux-flux studies; and photodissociation. All desired scattering information can be obtained from the 
information on the wavefunction in the strong-interaction region. 


This approach has one key advantage [30]. Although when solving for χ one needs to invert $(E - H + iV_I)$ 
(more precisely, calculate the action of the so-called Green's function $(E - H + iV_I)^{-1}$ on the initial state), 
the operator is defined in a finite region so that the wavefunction can be written on a single small grid covering 
the strong-interaction region (as no asymptotic regions are involved). This makes the operation by H very rapid 
(formally, H is then very sparse) so that efficient iterative methods can be used [50, 51, 52, 53 and 54]. It is 
then possible to handle grid sizes of more than a million points. Thus systems like AB + CD rearrangement 
scattering, with six or more floppy distances (and ≈10 points per degree of freedom) are now routinely done 
with arrangement decoupling approaches based on absorbing potentials. The most widely applied iterative 
method with absorbing potentials and arrangement decoupling was developed within a time-dependent 
formulation, and is discussed below. 

B3.4.4.1 THE TIME-DEPENDENT METHOD 

The approaches discussed so far are generally called time-independent methods, since they start from the 
time-independent Schrödinger equation, (H - E)ψ = 0. An alternative is to use the time-dependent Schrödinger 
equation [28, 29, 50, 55, 56, 57, 58, 59, 60, 61, 62, 63 and 64]. Conceptually, the time-dependent approach is 
very simple: prepare an initial wavepacket on an appropriate grid; propagate the initial wavepacket for a 
sufficiently long time; and analyse the results. The approach is efficient, since propagation of a wavepacket is 
relatively cheap and since results on scattering at many energies are extracted at once. Correct boundary 
conditions have a very simple meaning in time-dependent approaches: the scattered component of the 
wavepacket is not returned from the edges of the grid. This is done by adding an absorbing potential to the 
Hamiltonian, which absorbs any component of the wavefunction that reaches the edge of the grid. 

In more detail, in time-dependent approaches an initial wavepacket associated with the separate parts of the 
colliding system is prepared. The most efficient approach for a grid construction is to use two grids. Thus, the 
total wavefunction is divided into two parts [30, 65], a simple one which defines the initial wavefunction and 
a more complicated part, represented (for collinear scattering) on a two-dimensional grid of R, r values, which 
is used to carry most of the wavefunction (essentially the scattered part) and which is padded with absorbing 
potentials (see figure B3.4.5 ). With this approach and with methods which reduce the number of grid points 
(in a given region) to at most two per oscillation [50, 66, 67], the total number of grid points can be reduced to 
150-800 for collinear scattering involving hydrogen with energies of up to 2 eV. 

Once the grid (or two grids) has been prepared, there are two similar types of approaches to propagate the 
initial wavefunction forward in time. One approach is split-operator methods [59], where the short-time 
propagator is divided into kinetic and potential parts so that 

\[ |\psi(t + dt)\rangle \approx e^{-i(V - iV_I)\,dt/2}\; e^{-i\left(\frac{P^2}{2M} + \frac{p^2}{2\mu}\right)dt}\; e^{-i(V - iV_I)\,dt/2}\,|\psi(t)\rangle \tag{B3.4.9} \]

where a bra-ket notation is used. The action by the potential is trivial, since it is local in coordinate space, and 
amounts to multiplication of ψ(R, r, t) by $\exp(-iV(R,r)\,dt/2)$, and by a damping term, 
$\exp(-V_I(R,r)\,dt/2)$. 

The action by the kinetic term is only slightly more complicated: the coordinate grid wavefunction, ψ(R, r, t), 
undergoes a fast Fourier transform (FFT) to convert it to momentum space: 

\[ \psi(P,p,t) = \sum_{R,r} e^{-i(PR + pr)}\,\psi(R,r,t). \tag{B3.4.10} \]

The function is then multiplied by $\exp(-i((P^2/2M) + (p^2/2\mu))\,dt)$ and then returned to coordinate space 
by the inverse of equation (B3.4.10). 

The key to this method is thus to act with each operator (exponential of the potential or kinetic term) in the 
representation (coordinate or momentum grid) in which it is local [50, 66, 67 ]. 
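
A minimal one-dimensional sketch of such a split-operator step, with an absorbing potential at the grid edges, is given below. It is an illustration under stated assumptions rather than a production scheme: ħ = 1, a free particle (V = 0), a quadratic absorbing ramp, and all parameter values (grid size, time step, ramp strength) are hypothetical choices. The code relies on NumPy's FFT routines.

```python
import numpy as np

def split_operator_run(n=512, xmax=40.0, mass=1.0, k0=3.0, x0=-10.0, sigma=2.0,
                       dt=0.01, nsteps=3000, x_abs=20.0, strength=2.0):
    """Propagate a 1D Gaussian wavepacket; return the grid, final psi and norm history."""
    x = np.linspace(-xmax, xmax, n, endpoint=False)
    dx = x[1] - x[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)          # momentum grid matching the FFT ordering

    v_real = np.zeros(n)                                # real potential (free motion here)
    depth = np.clip(np.abs(x) - x_abs, 0.0, None) / (xmax - x_abs)
    v_abs = strength * depth ** 2                       # absorbing potential V_I >= 0 near the edges

    # exp(-i (V - i V_I) dt / 2): a phase factor times a damping factor, local in x
    half_pot = np.exp(-0.5j * dt * v_real) * np.exp(-0.5 * dt * v_abs)
    kin = np.exp(-1j * dt * k ** 2 / (2.0 * mass))      # kinetic factor, local in momentum space

    psi = np.exp(-(x - x0) ** 2 / (4.0 * sigma ** 2) + 1j * k0 * x)
    psi = psi / np.sqrt(np.sum(np.abs(psi) ** 2) * dx)

    norms = [float(np.sum(np.abs(psi) ** 2) * dx)]
    for _ in range(nsteps):
        psi = half_pot * psi                            # half step with the potential
        psi = np.fft.ifft(kin * np.fft.fft(psi))        # full kinetic step via FFT
        psi = half_pot * psi                            # second half step with the potential
        norms.append(float(np.sum(np.abs(psi) ** 2) * dx))
    return x, psi, norms

x, psi, norms = split_operator_run()
print(f"norm at start: {norms[0]:.4f}, before absorber: {norms[500]:.4f}, at end: {norms[-1]:.2e}")
```

The norm stays essentially at unity while the packet travels through the interior of the grid and then falls to a very small value once the packet has entered the absorbing region, which is the time-dependent picture of the outgoing boundary condition.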

An alternative to split operator methods is to use iterative approaches. In these methods, one notes that the 
wavefunction is formally $|\psi(t)\rangle = \exp(-iHt)|\psi(0)\rangle$, and the action of the exponential operator 
is obtained by repetitive application of H on a function (i.e. on the computer, by repetitive applications of the 
sparse matrix H on wavefunction vectors). The simplest iterative method is the Taylor expansion of 
$e^{-iHt}|\psi(0)\rangle$ as $\sum_n \frac{(-it)^n}{n!} H^n |\psi(0)\rangle$. On the computer, this expansion would 
be performed by acting with H on ψ(0), then acting with H on the resulting vector, etc, and adding the 
contribution to the sum at each stage. The action by H on a vector is straightforward, as in the split operator 
approach: the potential is local, and the kinetic energy is evaluated by Fourier transforming back and forth 
onto the momentum grid. 

The Taylor series by itself is not numerically stable, since the individual terms can be very large even if the
result is small, but other polynomial expansions which converge rapidly can be found, e.g. Chebyshev [50, 62, 63
and 64] or Lanczos polynomials [51, 68].
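
The Taylor iteration, and the growth of its individual terms, can be illustrated with a small sketch. The model is an assumption made for the example only: the Hamiltonian is taken to be diagonal with hypothetical eigenvalues between 0 and 40 (ħ = 1), so that the exact result is known, and apply_h stands in for the sparse action of H on a vector.

```python
import numpy as np

def taylor_propagate(apply_h, psi0, t, n_terms):
    """Truncated Taylor series for exp(-iHt) psi0 (hbar = 1), built by repeated action of H."""
    term = psi0.astype(complex)
    result = term.copy()
    largest_term = np.linalg.norm(term)
    for n in range(1, n_terms):
        term = (-1j * t / n) * apply_h(term)     # now holds (-iHt)^n / n! acting on psi0
        result = result + term
        largest_term = max(largest_term, np.linalg.norm(term))
    return result, largest_term

# Hypothetical diagonal Hamiltonian; apply_h plays the role of the sparse matrix-vector product.
energies = np.linspace(0.0, 40.0, 41)
apply_h = lambda vec: energies * vec
psi0 = np.ones(41) / np.sqrt(41)

def error(t, n_terms):
    approx, largest = taylor_propagate(apply_h, psi0, t, n_terms)
    exact = np.exp(-1j * energies * t) * psi0
    return np.linalg.norm(approx - exact), largest

err_short, largest_short = error(0.05, 40)    # short step: terms stay small
err_long, largest_long = error(1.0, 200)      # long step: huge terms must cancel
```

For the short step the series converges to machine precision, whereas for the long step intermediate terms exceed the norm of the result by many orders of magnitude, so round-off error is no longer negligible. This is the instability that motivates the Chebyshev and Lanczos expansions.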

The wavepacket is propagated until a time where it is all scattered and is away from the interaction region. 
This time is short (typically 10-100 fs) for a direct reaction. However, for some types of systems, e.g. for 
reactions with wells, the system can be trapped in resonances which are quasi-bound states (see section 
B3.4.7 ). There are efficient ways to handle time-dependent scattering even with resonances, by propagating 
for a short time and then extracting the resonances and adding their contribution [69]. 

The last stage is the extraction of energy-resolved information, obtained automatically and simultaneously at
many energies, by Fourier transforming the wavefunction to produce an energy-resolved state:

$$ \psi_{n_i}(R,r,E) = \frac{1}{a(E)} \int \Psi_{n_i}(R,r,t)\, e^{iEt}\, dt \tag{B 3.4.11} $$

where a(E) is related to the energy content of the initial wavepacket. It is easy to show that ψ(R, r, E) fulfills the
time-independent Schrödinger equation. (In practice, Ψ is known analytically prior to t = 0, so that the
wavefunction only needs to be propagated forward in time.) The scattering matrix is then obtained from any
of several formulae, all of the form [46, 65, 70, 71, 72, 73 and 74]

$$ S_{n_f n_i}(E) = \langle \xi_{n_f} | \psi_{n_i}(E) \rangle \tag{B 3.4.12} $$

where ξ_{n_f} is a simple function which is associated with the final state n_f. These formulae extract long-range
scattering information from the wavefunction values in the strong-interaction region. Scattering information
can therefore be extracted even when absorbing potentials are used to remove the asymptotic regions.

An interesting side point is that it is possible to recast the time-dependent approach, as described here, in a 
purely time-independent fashion, since from the equations above it follows that [ 74 ] 

$$ \psi_{n_i}(E) = \text{constant} \cdot \frac{1}{E - H + iV_I}\, \Psi_{n_i}(t=0). $$


The time-dependent approach is thus just one technique for evaluating the action of the Green's function on 
the initial wavepacket. 
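
The step behind this statement can be sketched as follows, assuming ħ = 1, propagation under H − iV_I from t = 0, and an absorbing potential V_I that makes the integrand decay at long times (the overall constant is not tracked here):

$$ \int_0^\infty e^{iEt}\, e^{-i(H - iV_I)t}\, \Psi(0)\, dt
   = \left[ \frac{e^{i(E - H + iV_I)t}}{i(E - H + iV_I)} \right]_0^\infty \Psi(0)
   = \frac{i}{E - H + iV_I}\, \Psi(0). $$

The upper limit vanishes because of the damping introduced by V_I, which leaves the Green's function acting on the initial wavepacket.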

B3.4.4.2 LARGE-SCALE APPLICATIONS 

Both close-coupling approaches (hyperspherical or with absorbing potentials) and iterative/time-dependent 
absorbing-potential arrangement-decoupling approaches are readily extended to three-dimensional atom- 
molecule and molecule-molecule scattering. The wavefunction representation becomes more complicated and 
includes rotational matrices, but the essence and application of the method remains analogous [58, 65, 25, 76 ]. 

Iterative approaches, including time-dependent methods, are especially successful for very large-scale
calculations because they generally involve the action of a very localized operator (the Hamiltonian) on a
function defined on a grid. The effort increases relatively mildly with the problem size, since it is proportional
to the number of points used to describe the wavefunction (and not to the cube of the number of basis functions, as
is the case for methods involving matrix diagonalization). Present computational power allows calculations
with optimized grids with sizes of 10^5-10^7 points or more. This enables efficient simulations of four-body
reactions involving six independent distances and up to two overall rotational coordinates. Thus far there have
been several four-body reactions reported using this method, including H2 + OH → H2O + H [75, 76, 77 and 78]
and CO + OH → CO2 + H [79, 80], as well as surface reactions [58, 81] (see figure B3.4.6 for an example).

Figure B3.4.6. Reaction probabilities for the initial-state-selected process H2(v = 0, j = 0) + OH(v, j = 0) →
H2O + H, for zero total angular momentum. Taken from [75] with permission.


B3.4.5 COARSE INFORMATION 


The methodology presented so far allows the calculation of state-to-state S-matrix elements. However, often
one is not interested in this high level of detail but prefers instead to find more averaged information, such as
the initial-state-selected reaction probability, i.e. the probability of rearrangement given an initial state n_i. In
general, this probability is

$$ P_{n_i}(E) = \sum_{n_f} |S_{n_f n_i}(E)|^2. \tag{B 3.4.13} $$

For example, for the collinear reaction A+BC this would be the probability that if initially the diatom BC is in
a vibrational state n_i, then after the reaction a diatom AB is formed (in any product vibrational state). In
practice, the initial-state-selected probability is easily calculated from the flux of the wavefunction ψ(R, r,
E) calculated at the product arrangement (e.g., at a large value of r, the B-C separation).

At times, however, even the information presented by P_{n_i} is too detailed. If one wants to rigorously calculate
the thermal rate of rearrangement reactions, the initial vibrational state is not important. The relevant quantity
is the sum of the initial-state-selected probabilities

$$ N(E) = \sum_{n_i} P_{n_i}(E) = \sum_{n_i, n_f} |S_{n_f n_i}(E)|^2. \tag{B 3.4.14} $$

N(E) is called the cumulative reaction probability. It is directly related to the thermal reaction rate k(T) by

$$ k(T) = \frac{1}{2\pi Q(T)} \int N(E)\, e^{-E/k_B T}\, dE \tag{B 3.4.15} $$

where T is the temperature, k_B is Boltzmann's constant and Q is the reactants' partition function (units with
ħ = 1 are used).
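
As a numerical illustration of the link between N(E) and k(T), the following sketch evaluates the Boltzmann average by a simple trapezoid rule. It assumes the standard relation k(T) = (2πħQ)^(-1) ∫ N(E) exp(−E/k_B T) dE with ħ = k_B = 1; the step-function N(E) and all parameter values are hypothetical and chosen only because the integral can then be checked analytically.

```python
import math

def thermal_rate(n_of_e, temperature, q_react, e_max, n_points=20001):
    """k(T) = (2*pi*Q)^-1 * integral_0^e_max N(E) exp(-E/T) dE, with hbar = k_B = 1."""
    de = e_max / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        e = i * de
        weight = 0.5 if i in (0, n_points - 1) else 1.0   # trapezoid end-point weights
        total += weight * n_of_e(e) * math.exp(-e / temperature)
    return total * de / (2 * math.pi * q_react)

# Hypothetical model: one reactive channel that opens abruptly at the threshold E0 = 2.
step_n = lambda e: 1.0 if e >= 2.0 else 0.0

rate = thermal_rate(step_n, temperature=1.0, q_react=1.0, e_max=20.0)
# For a step at E0 the integral is T * exp(-E0 / T), so k = T exp(-E0/T) / (2 pi Q).
exact = math.exp(-2.0) / (2 * math.pi)
```

A step-function N(E) reproduces the familiar Arrhenius-like form, which makes it a convenient consistency check before inserting a computed N(E).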

A major achievement [71, 82, 83, 84, 85, 86, 87 and 88] was the development of a simple quantum ('flux-flux')
expression for the cumulative reaction probability, N(E), with the final result


N(E) a I in Tr(FGFC') (B 3.4.16) 

where the Green's function is, as mentioned earlier, 

$$ G(E) = \frac{1}{E - H + iV_I} \tag{B 3.4.17} $$

and F is the flux operator. In this expression, the trace is evaluated over a small grid region. In principle, the 
grid has to contain only a small-interaction region, in which the system 'decides' its final arrangement (i.e. 
with what probability to react). This expression does not refer to the scattering matrix and therefore the 
asymptotic region does not have to be included in the grid. 

The flux-flux expression and its extensions have been used to calculate reaction probabilities for several 
important reactions, including H 2 +0 2 ^H + H 2 0, by explicit calculation of the action of G in a grid 
representation with absorbing potentials. The main power of the flux-flux formula over the long run will be 
the natural way in which approximations and semi-classical expressions can be inserted into it to treat larger 
systems. 


B3.4.6 PHOTO-DISSOCIATION 


The time-dependent approach has the advantage that it is easy to visualize the propagation of a simple 
wavepacket and make intuitive sense of a large body of chemical phenomena. This is especially powerful in 
photo-initiated processes. As a result of photon absorption, the ground-state wavefunction is 'jumped' to a
higher potential energy surface of a different electronic state and propagates on this new surface. The initial
excitation is, by the Franck-Condon principle, essentially vertical (i.e. the nuclear position and momentum do
not change, only the electronic state). The subsequent process (see figure B3.4.7) is the response of the
nuclear coordinates to the change in the electronic state.

Figure B3.4.7. Schematic example of potential energy curves for photo-absorption for a 1D problem (i.e. for
diatomics). On the lower surface the nuclear wavepacket is in the ground state. Once this wavepacket has
been excited to the upper surface, which has a different shape, it will propagate. The photoabsorption cross
section is obtained by the Fourier transform of the correlation function of the initial wavefunction on the
excited surface with the propagated wavepacket.

For two Born-Oppenheimer surfaces (the ground state and a single electronic excited state), the total
photo-dissociation cross section for the system to absorb a photon of energy ω, given that it is initially in a state |χ⟩
with energy E_0, can be shown, by simple application of second-order perturbation theory, to be [89]

$$ \sigma(\omega) = \text{constant} \cdot \int e^{i(\omega + E_0)t}\, c(t)\, dt \tag{B 3.4.18} $$


where the correlation function is defined as

$$ c(t) = \langle \Phi | e^{-iH_{\rm exc}t} | \Phi \rangle. \tag{B 3.4.19} $$

Here Φ = μχ, μ is the dipole moment and χ is the initial vibrational state on the ground surface (with energy E_0).
H_exc is the Hamiltonian of the excited-state potential energy surface. This expression has a clear physical
meaning. Take an initial wavepacket, χ, multiply it by the dipole moment, and use the resulting packet (Φ = μχ)
as an initial function so that it is propagated under the excited-state potential: Φ(t) = exp(−iH_exc t)Φ.


Equation (B3.4.18) makes a very powerful statement: absorption is only related to the Fourier transform of the
correlation function. All that is needed is to know how the wavepacket propagates in time on the upper
surface. Keeping in mind the time-energy uncertainty principle, one can then qualitatively understand the
spectral features in photo-absorption [89]. Referring to figure B3.4.8, the correlation function starts at 1 at
t = 0, then undergoes a period of decay over a time scale denoted by T_1. This initial decay is more rapid if the
excited potential surface is steep, slower if it is shallow. This is the shortest time scale in the correlation
function, so it corresponds to the broadest feature in the energy spectrum, i.e. the envelope in figure B3.4.9.
In figure B3.4.8, the correlation function peaks every time the wavepacket returns to its initial position (the
period of these recurrences is denoted by T_2). In the energy picture, the peak spacing is 2π/T_2. The correlation
peaks decrease in magnitude over a time T_3 as the wavepacket either decays to other modes or moves to
another region of the potential energy surface. T_3 therefore determines the width of each peak in the energy
spectrum, which is of order 1/T_3.
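
The relation between the time scales of c(t) and the spectral features can be checked on a synthetic correlation function. The sketch below is an assumption-laden toy model, not data from the figures: c(t) is built from five hypothetical equally spaced levels (spacing 2π/T_2, energies measured relative to E_0, ħ = 1) with a Gaussian envelope and an exponential decay exp(−t/T_3), and the spectrum is taken as the real part of the half-range Fourier integral.

```python
import numpy as np

T2, T3 = 10.0, 60.0                       # hypothetical recurrence and leakage times
spacing = 2 * np.pi / T2
levels = spacing * np.arange(8, 13)       # hypothetical excited-state levels, relative to E_0
weights = np.exp(-0.5 * ((levels - levels[2]) / (1.5 * spacing))**2)
weights /= weights.sum()                  # so that c(0) = 1

dt = 0.1
t = np.arange(0.0, 400.0, dt)
# c(t) = sum_n w_n exp(-i e_n t), damped by leakage on the time scale T3.
c = (weights[:, None] * np.exp(-1j * np.outer(levels, t))).sum(axis=0) * np.exp(-t / T3)

# sigma(omega) is proportional to Re of the integral over t >= 0 of exp(i omega t) c(t).
omega = np.linspace(levels[0] - spacing, levels[-1] + spacing, 601)
spectrum = (np.exp(1j * np.outer(omega, t)) @ c).real * dt

# Locate the significant local maxima of the spectrum.
is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])
is_peak &= spectrum[1:-1] > 0.2 * spectrum.max()
peak_positions = omega[1:-1][is_peak]
```

In this model the peaks are Lorentzians of half-width 1/T_3 separated by 2π/T_2, and the Gaussian weights play the role of the broad envelope associated with the fast initial decay.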



Figure B3.4.8. The correlation function c(t) = ⟨ψ_0|ψ(t)⟩ as a function of time (in fs) for photodissociation in a
collinear (or three-dimensional) polyatomic case. There are three relevant time scales: T_1, which measures
how rapidly the initial wavefunction dephases; T_2, which measures how long it takes this initial wavefunction
to regroup; and T_3, which measures how long the wavefunction takes to 'leak' to other degrees of freedom. In
practice, photodissociation experiments may yield spectra which are more blurred, if T_1, T_2 and/or T_3 are not
well separated.

Figure B3.4.9. The Fourier transform of the correlation function, from figure B3.4.8, which gives the
absorption spectrum as a function of frequency.

This time-dependent method allows one to nicely connect the theoretical and experimental observations. As 
mentioned earlier, the correlation function and its generalizations yield the spectra for a large number of other 
photospectroscopy processes, such as Raman processes [90], as well as molecular scattering [73, 74].


B3.4.7 BOUND STATES AND RESONANCES-EXTRACTION 

B3.4.7.1 RESONANCES— FORMALISM 

The quantum dynamics of bound and scattered systems is closely connected through the concept of
resonances which are, heuristically, quasi-bound states in which the system can spend time [6, 7, 8 and 9, 91,
92, 93 and 94]. More formally, resonances are poles of the S-matrix. (See section A3.11.) In a scattering
process, the cross section typically exhibits peaks as a function of the scattering energy, exactly at (or near)
the energy of the resonances. For example, in one-dimensional scattering off a double well, the scattering
probabilities exhibit sharp peaks when the collision energy matches the energy of the quasi-bound states in the
well (figure B3.4.10). (See figure B3.4.11 for a realistic example.)

Figure B3.4.10. Schematic figure of a 1D double-well potential surface. The reaction probabilities exhibit
peaks whenever the collision energy matches the energy of the resonances, which are here the quasi-bound
states in the well (with their energy indicated). Note that the peaks become wider for the higher energy
resonances: the high-energy resonance here is less bound and 'leaks' more toward the asymptote than do the
low-energy ones.

Figure B3.4.11. (a) Reaction probability for a 4D study of the dissociation of incident H2 on CO. The
probability exhibits sharp peaks whenever the energy matches that of a resonance wavefunction. (b) Plot of
the resonance wavefunction associated with one of the peaks, as well as (c) a 2D cut of the potential surface.
Note that the resonance wavefunction decays near the end of the grid, due to the use of an absorbing potential,
which localizes its effects to the strong-interaction region. Taken from [196], with permission.

The classical counterpart of resonances is periodic orbits [91, 95, 96, 97 and 98]. For example, a purely
classical study of the H + H2 collinear potential surface reveals that near the transition state for the
H + H2 → H2 + H reaction there are several trajectories (in R and r) that are periodic. These trajectories are not
stable but they nevertheless strongly affect the quantum dynamics. A study of the resonances in H + H2
scattering as well as many other triatomic systems (see, e.g., [99]) reveals that the scattering peaks are closely
related to the frequencies of the periodic orbits and the resonance wavefunctions are large in the regions of
space where the periodic orbits reside.

Theoretically, resonances are essentially solutions of the Schrödinger equation at complex energies. These
specific solutions have the property that they are mainly concentrated in the strong-interaction region and at
the asymptote are outgoing waves. For one-dimensional predissociation (figure B3.4.12) where the coordinate
is labelled R, the resonance wavefunction is asymptotically (i.e. for large positive R) the following outgoing
wave:

$$ \psi_{\rm res}(R) \rightarrow a\, e^{ikR} \tag{B 3.4.20} $$

where a is a constant here and k is the energy-dependent wavevector, k²/2m = E. The existence of such a
resonance function seems to be baffling, since graduate level quantum mechanical texts [100] prove that the
only bound solutions to the Schrödinger equation are those with real energies and zero flux. Heuristically, the
solution to this difficulty is that the formal resonance energies are complex. Thus, k is complex and exp(ikR) blows
up when R is very large. Therefore, resonance functions are not bound functions and the regular proofs do not
apply to them.

Figure B3.4.12. A schematic 1D vibrational pre-dissociation potential curve (wide full line) with a
superimposed plot of the two bound functions and the resonance function. Note that the resonance wavefunction is
associated with a complex wavevector and is slowly increasing at very large values of R. In practice this
increase is avoided by using absorbing potentials, complex scaling, or stabilization.


The key to practical calculations of resonances is to limit the extent of the grids used for describing the
wavefunctions. In the original approach, called 'stabilization' [92, 93 and 94], a finite grid or basis set is used 
and the Hamiltonian is diagonalized on that grid (or for that basis set). Those few eigenvalues which change 
very little when the grid size is modified are associated with the wavefunction of the resonances. (The 
resonances are concentrated in the interaction region, so that they are not sensitive to the details of the grid 
end points.) The main difficulty with this approach is that it necessitates many diagonalizations of the 
Hamiltonian matrix, one for each grid size. 

An alternate and formally very powerful approach to resonance extraction is complex scaling [7, 101, 102,
103, 104, 105, 106 and 107], whereby a new Hamiltonian is solved. In this Hamiltonian, the grid's
multi-dimensional coordinate (e.g., x) is multiplied by a complex constant α. The kinetic energy gains a constant
complex factor (d²/dx² → (1/α²)(d²/dx²)), while the potential needs to be evaluated at points with a complex
argument, V(αx). In a typical calculation, one diagonalizes the resulting complex Hamiltonian for several
complex values of α, and the complex resonances are those which do not change appreciably with respect to α
(analogous to the stabilization approach). Complex scaling has been applied mostly to analytical potentials
[7, 101, 102, 103 and 104]; however, it could also be used for numerically derived potentials [105, 106 and 107].

Finally, the simplest approach to extract resonances is to add to the Hamiltonian an absorbing potential [8, 48,
108, 109], and then look for the complex eigenvalues of the Hamiltonian H − iV_I. The absorbing potential
ensures that the resonance wavefunction has the correct form (is outgoing) in the asymptotic region
immediately preceding V_I (see figure B3.4.4). Again, resonance functions are found by varying the parameters
(the length and magnitude of V_I).
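
A minimal sketch of the absorbing-potential approach on a small grid is given below. Everything in it is an assumption chosen for illustration: a one-dimensional model potential with a well between two barriers, a quadratic absorbing ramp, a second-order finite-difference kinetic energy, and units with ħ = m = 1.

```python
import numpy as np

n_points, x_max = 400, 20.0
x = np.linspace(-x_max, x_max, n_points)
dx = x[1] - x[0]

# Hypothetical model potential: a well at x = 0 between two barriers, V -> 0 far away.
v = 0.5 * x**2 * np.exp(-0.1 * x**2)
# Hypothetical absorbing potential V_I: zero in the interior, quadratic ramp for |x| > 12.
v_abs = 0.02 * np.clip(np.abs(x) - 12.0, 0.0, None)**2

# Second-order finite-difference kinetic energy, -(1/2) d^2/dx^2.
kinetic = (np.diag(np.full(n_points, 1.0 / dx**2))
           - 0.5 / dx**2 * (np.eye(n_points, k=1) + np.eye(n_points, k=-1)))

h_complex = kinetic + np.diag(v - 1j * v_abs)      # H - i V_I on the grid
eigenvalues = np.linalg.eigvals(h_complex)

# Resonance candidates: energies below the barrier top, ordered by how slowly they decay.
barrier_top = v.max()
candidates = eigenvalues[(eigenvalues.real > 0) & (eigenvalues.real < barrier_top)]
candidates = candidates[np.argsort(np.abs(candidates.imag))]
```

All eigenvalues have a non-positive imaginary part, since V_I only removes flux. Candidates whose complex energy stays put when the ramp position or strength is varied would be identified as resonances, whereas discretized continuum states move with those parameters.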

B3.4.7.2 NUMERICALLY EXTRACTING BOUND STATES AND RESONANCE FUNCTIONS 

As explained above, the practical extraction of resonance eigenfunctions and eigenvalues (in complex scaling 
or with absorbing potentials) amounts to extraction of eigenvalues of a complex Hamiltonian and is thus 
completely equivalent to extraction of bound states. One method for extracting the complex eigenstates of the 
Hamiltonian is simply to expand it in terms of a fixed basis set, and diagonalize directly the resulting matrix. 
This approach works for small- and intermediate-scale problems and/or low-energy eigenstates.

An alternative is to use iterative methods. The simplest iterative technique for calculating bound states or
resonances is to pick a random initial wavefunction ψ_0(x) and propagate it forward in time, producing a
wavepacket:

$$ \psi(x,t) = e^{-iHt}\,\psi_0(x). \tag{B 3.4.21} $$

H refers here to the complex Hamiltonian, i.e. after complex scaling or inclusion of an absorbing potential; x is
the grid (or basis set) used to represent ψ. Fourier transform ψ from time to energy:

$$ \psi(x,E) = \int_0^T e^{iEt}\,\psi(x,t)\,dt \tag{B 3.4.22} $$

where T is a large time. It is clear that the squared norm of ψ(x, E) (i.e. ∫|ψ(x, E)|² dx) has a peak whenever E is
near a resonance or bound-state energy, ε_n, since ψ(x, t) has contributions varying as exp(−iε_n t) from each
eigenenergy ε_n.
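
The direct filter can be tried out on a toy bound-state problem. The sketch below makes several simplifying assumptions: a real 4 × 4 Hamiltonian with hypothetical eigenvalues, exact propagation through its eigendecomposition (standing in for a wavepacket propagator), ħ = 1, and a plain Riemann sum for the time integral.

```python
import numpy as np

rng = np.random.default_rng(0)
levels = np.array([1.0, 1.8, 3.1, 4.0])              # hypothetical eigenenergies
u, _ = np.linalg.qr(rng.normal(size=(4, 4)))         # random orthogonal eigenvector matrix
h = u @ np.diag(levels) @ u.T                        # toy Hamiltonian in a 4-point "grid" basis
psi0 = rng.normal(size=4)
psi0 /= np.linalg.norm(psi0)                         # random initial wavefunction

# Propagate psi(t) = exp(-iHt) psi0 on a time grid (exactly, for this small demonstration).
t_max, dt = 50.0, 0.05
times = np.arange(0.0, t_max, dt)
evals, evecs = np.linalg.eigh(h)
amps = evecs.T @ psi0
psi_t = evecs @ (np.exp(-1j * np.outer(evals, times)) * amps[:, None])   # shape (4, n_times)

# Filter: psi(E) = integral_0^T exp(iEt) psi(t) dt, evaluated for many energies at once.
energies = np.linspace(0.5, 4.5, 801)
psi_e = (np.exp(1j * np.outer(energies, times)) @ psi_t.T) * dt          # shape (801, 4)
norm2 = np.sum(np.abs(psi_e)**2, axis=1)
e_peak = energies[np.argmax(norm2)]
```

The squared norm of the filtered function peaks at the eigenenergies, with peak widths of order 2π/T, so resolving two levels requires T to exceed the inverse of their spacing.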

This 'direct filter' technique is very powerful [56, 59] in extracting highly excited states, since only the
propagation of a wavepacket is required. However, it is inefficient when there are closely lying eigenvalues (T
needs to be larger than the inverse level spacing, |ε_{n+1} − ε_n|^(-1), as a manifestation of Heisenberg's uncertainty
relation) or when ε_n is a wide resonance (i.e. the imaginary part of its eigenvalue is large in absolute magnitude
compared with the level spacing, so that its contribution to the wavepacket, exp(−iε_n t), is washed out rapidly as a
function of time).

To avoid these difficulties an alternate approach, labelled filter diagonalization, was developed [110, 111,
112, 113, 114, 115, 116 and 117]. The approach is powerful for extracting highly excited energies.
Mechanistically, filter diagonalization is simple (see figure B3.4.13). The initial wavepacket is propagated for
a short time T, and the filtered functions ψ(x, E) are prepared as in equation (B3.4.22) but using a short time
T. The filtering is carried out for several (typically ≈50) closely spaced energies which could be far above the
ground-state energy. The key is to note that even after a short propagation time, T, the ψ(x, E) functions would
only contain contributions from eigenstates within a narrow strip of sampled energies. Thus, the filtered
functions ψ(x, E) are an excellent basis for eigenstates and eigenvalues within the filtered energy range. The
true eigenvalues within this range are then found by diagonalizing the small (e.g., 50 × 50) matrix of the
Hamiltonian operator within the ψ(x, E) basis. The advantage of filter diagonalization is that it avoids the long
propagation times of the pure filter approach as well as the large matrix diagonalizations associated with pure
diagonalization.



Figure B3.4.13. The basic premise in filter diagonalization is to filter a wavefunction for a short time at
several energies, E_1, E_2, ..., so that, in energy space, the resulting set of several filtered functions (denoted by
full bell-shapes) spans the eigenstates (short bars) in the energy range of interest (the range of extraction). The
short-time filtered wavefunctions can therefore be used to extract the eigenstates at the desired energy range,
with a modest cost, since only short-time filter and small-matrix diagonalizations are used.

As a side note, filter diagonalization is also useful in a more general context. It can be shown that it is an 
efficient approach for extracting frequencies from a short-time segment of a general signal [ 112 , 113 and 114 , 
118 , 119 ], so that it is not even necessary to use a wavepacket! All one needs is a signal. This feature is very 
important in semi-classical and path integral simulations discussed below, where all the information is 
extracted from a time-dependent correlation function, because the quality of the simulations degrades as a 
function of time (the number of trajectories is typically increased exponentially as the time is increased); 
therefore, information must be extracted from the shortest time possible. 


B3.4.8 BEYOND GRIDS 


The formalism outlined in the previous sections is very useful for small systems, but is, as explained, 
impractical for more than six to ten strongly interacting degrees of freedom. Thus, alternate approaches are 
required to represent dynamics for large systems. Currently, there are many new approaches developed and 
tested for this purpose, and these approaches are broadly classified as follows: 

• frozen and zero-point approximations; 

• mean-field methods and their extensions;

• Gaussian-wavepacket-based techniques;

• path-integral and semi-classical approaches. 

These approaches are generally interwoven, and some of the most exciting developments in chemical 
dynamics have been associated with their combinations. This section very briefly describes the motivation 
behind and the application of these techniques. 

B3.4.8.1 FROZEN AND ZERO-POINT APPROXIMATIONS 

Often a degree of freedom moves very slowly; for example, a heavy-atom coordinate. In that case, a plausible 
approach is to use a 'sudden' approximation, i.e. fix that coordinate and do reduced dimensionality quantum- 
dynamics simulations on the remaining coordinates. A common application of this technique, in a three- 
dimensional case, is to fix the angle of approach to the target [ 120 , 121 ] (see figure B3.4.14). 



Figure B3.4.14. The infinite-order-sudden approximation for A + BC → AB + C. In this approximation, the
BC molecule does not rotate until reaction occurs.

The sudden approach has been applied widely, starting with atom-diatom calculations and continuing today 
for diatom-diatom calculations [79, 122 , 123 ]. This approach and a related approximation (coupled states or 
CS, which involves the neglect of Coriolis coupling terms in three dimensions [ 120 , 121 , 124 , 125 ]) are much 
more powerful when combined with the arrangement decoupling with absorbing potentials approach 
discussed earlier. The reason is that approximations are typically much easier to formulate and apply, and are 
more valid, for single-arrangement approaches. For an example of the successful merging of approximations 
and arrangement decoupling, see [ 126 ]. 

A related and particularly simple approximation is J-shifting [127, 128]. This method is a simple (and
generally useful) trick for calculating reaction probabilities in three dimensions from a single calculation with
zero total angular momentum (J = 0), by approximating the effect of the non-zero angular momentum as
a shift of the transition state energy:

$$ P_J(E) \simeq (2J+1)\, P_{J=0}(E - C_J) \tag{B 3.4.23} $$

where C_J is determined from the contribution of angular degrees of freedom to the transition state energy.
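
A short sketch shows how equation (B3.4.23) would be used in practice. Both ingredients are assumptions for illustration: the J = 0 probability is a hypothetical smooth step at the barrier, and the shift is modelled as the rotational energy of a rigid transition state, C_J = B J(J+1), with an arbitrary rotational constant B.

```python
import math

def p_j0(energy, barrier=0.5, width=0.05):
    """Hypothetical J = 0 reaction probability: a smooth step centred at the barrier energy."""
    return 1.0 / (1.0 + math.exp(-(energy - barrier) / width))

def p_j_shifted(energy, j, rot_const=0.01):
    """J-shifting estimate (2J+1) * P_{J=0}(E - C_J), assuming C_J = B * J * (J + 1)."""
    c_j = rot_const * j * (j + 1)
    return (2 * j + 1) * p_j0(energy - c_j)

# The J = 2 curve is the J = 0 curve shifted up in energy by C_2 = 0.06 and weighted by 2J+1 = 5.
p0_at_barrier = p_j_shifted(0.5, 0)
p2_at_shifted_barrier = p_j_shifted(0.56, 2)
```

Only the single J = 0 calculation is expensive; every higher J is obtained from it by an energy shift and a degeneracy factor.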

A related type of approximation is the reaction path method [ 122 , 123 , 129 , 130 , 131 and 132 ], in which the 
coordinates are divided into those which are relatively rigid during the reaction and are typically associated 
with harmonic oscillators, and the remaining coordinates (reaction coordinates) which undergo significant 
change during the reaction. The contribution of those degrees of freedom which are replaced by harmonic 
oscillators can be taken, for low-energy reactions, simply by the zero point energy of the harmonic oscillators. 
More sophisticated treatments, appropriate for higher temperatures, are actively developed now in the area of 
liquid reaction dynamics where they are used to describe effects of solvents (see section C3.5 for details and 
further references). In addition, proper inclusion of rotational states of the fragments was recently shown to 
yield accurate results in a molecular multi-dimensional reaction-path-like approach [ 133 ]. 

B3.4.8.2 MEAN-FIELD METHODS AND THEIR EXTENSIONS 

The mean field technique is one of the most robust and simple methods used to handle larger molecules in gas
and liquid environments [50, 134, 135 and 136]. The basic premise of all mean-field methods is that the full
wavefunction represents N very weakly coupled modes (Q_i) and can be approximated as

$$ \Psi(\{Q_i\}, t) = \prod_{i=1}^{N} \chi_i(Q_i, t). \tag{B 3.4.24} $$

The result of this approximation is that each mode is subject to an effective average potential created by all 
the expectation values of the other modes. Usually the modes are propagated self-consistently. The effective 
potentials governing the evolution of the mean-field modes will change in time as the system evolves. The 
advantage of this method is that a multi-dimensional problem is reduced to several one-dimensional problems. 

The fundamental disadvantage of the mean-field method is that it does not allow modes to respond in a
correlated manner to each other. This problem can be somewhat alleviated by a good definition of the relevant
coordinate system [134, 136]. (An extension of mean-field methods that does allow for coupling [137, 138
and 139] will be discussed later.)

If one is interested in low-lying eigenvalues or low-energy scattering, a CI-like approach can be applied, in
which one uses zero-order eigenfunctions of a simple Hamiltonian to expand the wavefunction. Spectroscopic
calculations including up to 20 to 30 degrees of freedom have been carried out using such an approach [140,
141, 142 and 143].

B3.4.8.3 GAUSSIAN-WAVEPACKET BASED TECHNIQUES 

Gaussian wavepackets are very special functions which, in a sense, bridge the gap between classical and
quantum descriptions [89, 144]. They are defined as

$$ \psi(x,t) = \text{constant} \times \exp\!\left(-(x - x(t))^2/2\sigma^2\right)\, \exp\!\left(i\,p(t)\cdot x\right) \tag{B 3.4.25} $$

where x(t), σ and p(t) are the average position, width and momentum of the packet, respectively. A Gaussian
wavepacket represents the ground state of a harmonic oscillator, shifted in position and momentum. (Gaussian
wavepackets are also known as Glauber or 'coherent' states, although the latter definition is sometimes
applied to more general functions.)

The primary property of Gaussian wavepackets is that on a harmonic potential surface they are solutions of
the time-dependent Schrödinger equation. Specifically, if one places a Gaussian wavepacket at t = 0 on a
harmonic oscillator with an arbitrary average position and momentum, x(t = 0) and p(t = 0), the resulting
wavefunction will remain a Gaussian. The average position and momentum change in time exactly as those of a
classical particle would.
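
The classical motion of the packet centre can be made explicit with a short worked example. It assumes a one-dimensional oscillator V(x) = mω²x²/2, where the mass m and frequency ω are symbols introduced here only for the illustration. Ehrenfest's relations for the average position and momentum then close on themselves,

$$ \frac{d\,x(t)}{dt} = \frac{p(t)}{m}, \qquad \frac{d\,p(t)}{dt} = -m\omega^2 x(t), $$

which are Hamilton's equations for a classical particle, with the solution

$$ x(t) = x(0)\cos\omega t + \frac{p(0)}{m\omega}\sin\omega t, \qquad
   p(t) = p(0)\cos\omega t - m\omega\, x(0)\sin\omega t. $$

For anharmonic potentials the equations no longer close exactly, which is why Gaussian-based methods are approximate there.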

From this basic fact, several related approaches have emerged. First, a technique was introduced in which one
propagates classical trajectories forward in time, and uses a single or multiple sets of Gaussian wavepackets
(one for each classical trajectory) as an ansatz for the full wavefunction. For each Gaussian the position and
momentum are specified by the classical trajectory [144]. This technique is already able to account for much
of the zero-point energy effects. In addition, interference and tunnelling effects can be partially accounted for
by adding several multi-dimensional Gaussians [144, 145, 146 and 147]. The method has been shown recently
to be very powerful for non-adiabatic coupling problems (see section B3.4.9).

Gaussian wavepackets have also found use in a new approach that improves on mean-field techniques [137,
138 and 139, 148]. In this method, the degrees of freedom are divided into those which change strongly
during the reaction (the 'system' coordinates), which are treated by an explicit wavepacket; the remaining 
'bath' coordinates are treated by Gaussians. The parameters for the bath Gaussians are dependent on the 
system state, and this introduces explicitly a correlation between the system and bath modes (the bath 
responds differently to different parts of the system). Use of this technique in multi-dimensional reactions 
involving tunnelling has shown it to be significantly superior to mean-field techniques, while requiring 
modest numerical effort even for multi-dimensional systems. 

B3.4.8.4 PATH INTEGRAL AND SEMI-CLASSICAL DYNAMICS 

In the 1940s, Feynman realized that quantum mechanics can be recast in a simple form which is very
reminiscent of classical mechanics [149, 150]. This approach, path integrals, has been heavily researched in
chemical dynamics since, if properly convergent, it could allow calculations on very large systems. Even
earlier, van Vleck [151] postulated a semi-classical approach, i.e. a method which (like the
Wentzel-Kramers-Brillouin (WKB) approximation) captures quantum interference effects and is accurate when the
masses or energies are large so that ħ can be considered small. This section briefly describes Feynman's
approach, semi-classical dynamics, and recent developments and improvements.


For concreteness, assume that we consider several particles described together by a multi-dimensional
coordinate x and assume that the kinetic energy is the usual (−1/2M)(∂²/∂x²), where M is a single relevant
mass. (Hamiltonians in quantum dynamics can usually be brought to this form upon rescaling of the coordinates
by constants.) The basic ingredient in Feynman's approach is the correlation function for a general function ψ:

{^|^(T»= {f\c-' ifiT \f)= f $*{x")G(x\x\T)\l>(x,T)dx f dx" (B 3.4.26) 

where the time-dependent Green's function G is very simple: 


G{x\ x\ r) = constant - £VlM»>*, 

path 


(B 3.4.27) 


In this section, we insert ^explicitly. The sum has the following meaning: for each pair of points x' and x" 
draw a path x(t) that starts at x' (x(t = 0) = x f ) and ends at x" (x(t = x) = x"). This path need not be a classical 
path (see figure B3.4.15). Each such path contributes c lSyA , where S is the action of the path. S (unrelated to the 
scattering matrix) is calculated very simply as 


-/[t(^-H* 


(B 3.4.28) 


where V is the potential. This, in principle, gives a very simple prescription: to calculate quantum
mechanical properties, all one has to do is sum over 'many' quantum trajectories.
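
As a rough illustration of this prescription, the following sketch evaluates a discretized version of the
action integral for a single path and the phase factor that path would carry. It is only a sketch under
stated assumptions: the function names, the midpoint rule for the potential, the units (ħ = 1 by default) and
the two example paths are all illustrative choices, not part of the original treatment.

```python
import numpy as np

def discretized_action(path, dt, mass, potential):
    """Riemann-sum approximation to S = integral of [M/2 (dx/dt)^2 - V(x)] dt."""
    path = np.asarray(path, dtype=float)
    velocities = np.diff(path) / dt              # finite-difference dx/dt on each segment
    kinetic = 0.5 * mass * velocities**2
    midpoints = 0.5 * (path[1:] + path[:-1])     # assumption: V evaluated at segment midpoints
    return float(np.sum((kinetic - potential(midpoints)) * dt))

def path_phase(path, dt, mass, potential, hbar=1.0):
    """Phase factor exp(iS/hbar) that a single path contributes to the sum over paths."""
    return np.exp(1j * discretized_action(path, dt, mass, potential) / hbar)

# Free particle (V = 0): straight line from x' = 0 to x'' = 1 in total time 1.
n_steps, total_time, mass = 100, 1.0, 1.0
dt = total_time / n_steps
free = lambda x: np.zeros_like(x)
straight = np.linspace(0.0, 1.0, n_steps + 1)
# A 'wild' path with the same end points (the added sine vanishes at both ends).
wiggly = straight + 0.05 * np.sin(np.linspace(0.0, 20.0 * np.pi, n_steps + 1))

s_classical = discretized_action(straight, dt, mass, free)
s_wiggly = discretized_action(wiggly, dt, mass, free)
```

Both paths contribute a factor of unit modulus; only the value of the action, and hence the phase, differs
between the straight (classical) path and the fluctuating one.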


[Figure B3.4.15: a path x(t) sampled at t = 0, dt, ..., 6dt, 7dt.]

Figure B3.4.15. A possible Feynman path trajectory for a 1D variable as a function of time. This trajectory
carries an oscillating $e^{iS/\hbar}$ component with it, where S is the action of the trajectory. The
trajectory is highly fluctuating; its values at each time step (x(dt), x(2dt), ..., etc.) are not correlated.

Unfortunately, this simple approach is not feasible numerically. The integral, as presented, will not converge,
even for short times. The problem is that even trajectories which are 'wild', i.e. highly fluctuating,
contribute, per path, the same as trajectories that are 'gentle'. The key to introducing convergence is to
realize that highly fluctuating trajectories or trajectories that lie in regions of high potential energy give
contributions that are cancelled by those of other nearby trajectories. Thus, a bundle of highly fluctuating
trajectories gives only a very weak contribution. The only trajectories that give a very strong contribution
are classical trajectories (for example, if there is no potential, the classical trajectory would run with
constant velocity from x' to x''). This realization is the key to the development of semi-classical
approaches, i.e. approaches in which the exact calculation of the time-dependent Green's function is replaced
by the sum of contributions of classical trajectories. The basic semi-classical expression is
[151, 152, 153, 154, 155, 156, 157 and 158]


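$$\langle\psi|\psi(t)\rangle \approx \text{constant} \cdot \int \sum_{\text{classical paths}} \left|\frac{\partial p''}{\partial x'}\right|^{-1/2} \psi^*(x'')\,\psi(x')\,e^{iS/\hbar}\,dx'\,dx'' \qquad \text{(B3.4.29)}$$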


Note the meaning of this expression: for each choice of the initial and final positions x' and x'', calculate
the classical path that takes you from x' to x'' in time t. Specifically, calculate the momentum along the path
and the final momentum, p'', and find out how p'' varies with the initial position. This would give, for a
multi-dimensional problem, a matrix ∂p''/∂x' whose absolute determinant needs to be inverted.

There is a simple physical explanation for the inverse determinant in equation (B3.4.29). Each classical
trajectory has a 'volume' of quantum trajectories nearby which have similar phases to it. Beyond that volume,
the phases become random. Thus, the larger that volume, the greater the contribution that bundle gives to the
final semi-classical 'propagator'. If p'' varies slowly when x' is changed (and |∂p''/∂x'| is small), the
action's phase varies slowly under a change in the end-point position, so the volume of the quantum
trajectories that surround the classical trajectory is also large, and a large contribution is expected from
the semi-classical propagator.

Expression (B3.4.29) is still not well suited for classical simulations, for several reasons. First,
|∂p''/∂x'| can vanish at specific times, which leads to infinities in the result. (In classical scattering
this is related to the existence of 'scattering rainbows'.) This is easily circumvented by changing the
integration variables from x'' to p' (i.e. from the final position to the initial momentum)


$$dx'\,dx'' = \left|\frac{\partial x''}{\partial p'}\right| dx'\,dp' \qquad \text{(B3.4.30)}$$


leading to [156, 157]

$$\langle\psi|\psi(t)\rangle \approx \text{constant} \cdot \int \left|\frac{\partial x''}{\partial p'}\right| \left|\frac{\partial p''}{\partial x'}\right|^{-1/2} \psi^*(x'')\,\psi(x')\,e^{iS(x',p',t)/\hbar}\,dx'\,dp' \qquad \text{(B3.4.31)}$$

This transform also solves the boundary value problem, i.e. there is no need to find, for an initial position
x' and final position x'', the trajectory that connects the two points. Instead, one simply picks the initial
momentum and position p', x' and calculates the classical trajectories resulting from them at all times. Such
methods are generally referred to as initial value representations (IVR).
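
The practical appeal of the IVR is that each trajectory is simply launched from (x', p') and followed; nothing
has to be matched at the end point. The sketch below illustrates this for one trajectory and estimates the
Jacobian ∂x''/∂p' of equation (B3.4.30) by finite differences. The velocity-Verlet integrator, the harmonic
test potential, the function names and the step sizes are assumptions chosen for illustration only.

```python
import numpy as np

def propagate(x0, p0, total_time, n_steps, mass, force):
    """Velocity-Verlet integration of one classical trajectory; returns (x_t, p_t)."""
    dt = total_time / n_steps
    x, p = float(x0), float(p0)
    f = force(x)
    for _ in range(n_steps):
        p_half = p + 0.5 * dt * f        # half kick
        x = x + dt * p_half / mass       # drift
        f = force(x)
        p = p_half + 0.5 * dt * f        # second half kick
    return x, p

def dx_final_dp_initial(x0, p0, total_time, n_steps, mass, force, eps=1e-5):
    """Central finite-difference estimate of the Jacobian dx''/dp' at fixed x'."""
    x_plus, _ = propagate(x0, p0 + eps, total_time, n_steps, mass, force)
    x_minus, _ = propagate(x0, p0 - eps, total_time, n_steps, mass, force)
    return (x_plus - x_minus) / (2.0 * eps)

# Hypothetical test case: harmonic oscillator, for which dx''/dp' = sin(omega t) / (M omega).
mass, omega = 1.0, 1.0
harmonic_force = lambda x: -mass * omega**2 * x
t = 1.0
numeric = dx_final_dp_initial(0.3, 0.7, t, 2000, mass, harmonic_force)
analytic = np.sin(omega * t) / (mass * omega)
```

For the harmonic test case the Jacobian vanishes at t = π/ω, which is an example of the special times at which
the determinant-dependent prefactor needs careful treatment.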

Finally, one problem still remains. There are complex terms which need to be associated with the determinant.
The complex terms (Maslov indices) have to do with the square root of the determinant, which may be
negative, and also appear in the related WKB approximation. They can be calculated, albeit with difficulty
[152, 153, 154, 155, 156, 157 and 158].

Even expression (B3.4.31), although numerically preferable, is not the end of the story as it does not fully
account for the fact that nearby classical trajectories (those with similar initial conditions) should be
averaged over. One simple methodology for that averaging has been through the division of phase space into
parts, each of which is 'covered' by a set of Gaussians [159, 160]. This is done by recasting the initial
wavefunction as

$$\psi(x) = \int\!\!\int d\xi\,dp\,\langle g_{\xi p}|\psi\rangle\,g_{\xi p}(x) \qquad \text{(B3.4.32)}$$

where $g_{\xi p}(x)$ is the Gaussian that is centred at point ξ with momentum p

$$g_{\xi p}(x) = \text{constant} \cdot e^{-(x-\xi)^2/\sigma^2}\,e^{ipx/\hbar} \qquad \text{(B3.4.33)}$$

and σ is arbitrary. By applying these formulae two times, using the semi-classical approximation and
eventually summing again over phases, the following expression results [161, 162 and 163]:


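$$\langle\psi|\psi(\tau)\rangle = \int dx' \int dp'\,\langle\psi|g_{x''p''}\rangle\,\langle g_{x'p'}|\psi\rangle\,e^{iS/\hbar}\,F(x', p', \tau) \qquad \text{(B3.4.34)}$$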


where the exact form of F is given in [163].

The resulting expression is not difficult to numerically propagate. 

Numerical applications of the formalism have been very successful lately [162, 163, 164, 165, 166 and 167].
This technique is very powerful for situations where short-time propagation is sufficient. For long-time
processes, where the number of required trajectories is large, it is necessary to introduce other ingredients,
such as methods for reducing the total propagation time (for example, the filter diagonalization method
discussed above, which was applied recently to the semi-classical approximation [113, 165]) or
backward-forward propagation schemes, which tend to make the semi-classical integrand much smoother [168].


B3.4.9 NON-ADIABATIC EFFECTS


B3.4.9.1 FORMALISM 


The discussion in the previous sections assumed that the electron dynamics is adiabatic, i.e. the electronic
wavefunction follows the nuclear dynamics and at every nuclear configuration only the lowest energy (or
more generally, for excited states, a single) electronic wavefunction is relevant. This is the
Born-Oppenheimer approximation, which allows the separation of nuclear and electronic coordinates in the
Schrödinger equation.

This assumption breaks down in many molecules, especially upon photo-excitation, since excited states are
often close to each other or even cross one another (i.e. have the same electronic energy at a given nuclear
position). Thus, the full Schrödinger wavefunction needs to be considered:


$$\Psi(x, r_e, t) = \sum_n \psi_n(x, t)\,\Phi_n(r_e; x) \qquad \text{(B3.4.35)}$$


where n is now the index of the electronic states and $r_e$ is the position of the electron. The $\Phi_n$ are
eigensolutions, for each position x, of the electronic part of the Hamiltonian (every part of the
electron-nuclear Hamiltonian except for the nuclear kinetic energy). The associated eigenvalues are labelled
$u_n$, and are the adiabatic ground- and excited-state energies. From this expansion there follows an equation
for the nuclear wavefunction [15, 16, 17 and 18, 169]. A complication is that the adiabatic electronic states,
$\Phi_n(r_e; x)$, themselves depend on the nuclear coordinate x. Thus, the nuclear kinetic-energy terms in the
Schrödinger equation, which have derivatives in them ($\partial^2/\partial x^2$), also operate on $\Phi_n$.
The resulting time-dependent Schrödinger equation is then straightforwardly shown to be

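$$i\frac{\partial\psi_n}{\partial t} = -\frac{1}{2M}\frac{\partial^2\psi_n}{\partial x^2} - \frac{1}{M}\sum_m \tau^{(1)}_{nm}\frac{\partial\psi_m}{\partial x} - \frac{1}{2M}\sum_m \tau^{(2)}_{nm}\psi_m + u_n\psi_n \qquad \text{(B3.4.36)}$$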
where the matrices $\tau^{(1)}$ and $\tau^{(2)}$ are

$$\tau^{(1)}_{nm}(x) = \int \Phi_n^*(r_e; x)\,\frac{d}{dx}\,\Phi_m(r_e; x)\,dr_e \qquad \text{(B3.4.37)}$$

$$\tau^{(2)}_{nm}(x) = \int \Phi_n^*(r_e; x)\,\frac{d^2}{dx^2}\,\Phi_m(r_e; x)\,dr_e. \qquad \text{(B3.4.38)}$$

The effect of $\tau^{(2)}$ is usually negligible, due to the 1/M factor. However, $\tau^{(1)}$ is fundamentally
important, yet mathematically difficult to treat. Specifically, it can be shown that $\tau^{(1)}_{nm}$ is
proportional to the inverse of the energy difference between electronic states n and m, so its value can
become infinite. Several ways to simplify the equation have been developed, starting with the work of Baer
[15, 16, 17 and 18, 26, 169, 170]. The mathematical theory is too intricate to discuss here in detail; it is
sufficient to say that one does a 'gauge' transform, i.e. a transformation from adiabatic wavefunctions (the
$\psi_n$) to a new set of 'diabatic functions' for which the derivative coupling (i.e. the $\tau^{(1)}$
coefficient of ∂/∂x) is minimized or completely absent. Instead, a new part of the potential appears
as a coupling of the electronic states (i.e. an off-diagonal potential). This coupling potential introduces a
new 'phase' problem. The difficulty is that the diabatic functions (i.e. the functions which are obtained
after the transformation) are defined in terms of a linear combination (sometimes complex) of the adiabatic
functions [169]. These linear combinations are non-unique. Consider the generic example of two crossing potential
surfaces in two (nuclear) dimensions (see figure B3.4.16). When the nuclei move around the potential
contours, the linear combination changes. Upon return to the same starting point, the phase of the states need
not be the same! Thus the nuclear diabatic functions are not uniquely defined. This is a very important effect
(called the molecular phase or Berry phase [15, 16 and 17, 19, 26, 171]), since it would appear even at low
energies, well below the energies at which the conical intersection appears, i.e. well below the energies of
the excited states! This phenomenon has recently been shown to be important in the scattering of H + H$_2$ and
its isotopic analogues [19]. (A simple interpretation of the molecular phase phenomenon in an H + H$_2$ type
reaction system is that the full wavefunction is symmetric under exchange of any two of the hydrogen atoms and
antisymmetric under electron-electron exchange, so that the nuclear part of it should be antisymmetric under
an exchange of any pair of the three nuclei. This modifies the Schrödinger equation for the atomic motion.) In
addition, it was shown in a model system that molecular phase effects can affect state-to-state transition
probabilities in a reactive system [172].



Figure B3.4.16. A generic example of crossing 2D potential surfaces. Note that, upon rotating around the
conical intersection point, the phase of the wavefunction need not return to its original value.

The molecular phase effects are especially important when the system has some type of symmetry. 
Nevertheless, the typical treatment of non-adiabatic effects ignores the adiabatic phase, although, as 
cautioned, this is a problematic step. 

In the remainder of this section, we will follow this simplifying (and problematic) assumption, and postulate
that, upon the adiabatic to diabatic transformation, the Schrödinger equation has the form:


$$i\frac{\partial\psi_n}{\partial t} = -\frac{1}{2M}\frac{\partial^2\psi_n}{\partial x^2} + \sum_m V_{nm}\,\psi_m \qquad \text{(B3.4.39)}$$

where V is called the diabatic potential matrix. (Note that if we were to use a finite set of molecular valence
orbitals, $\Phi_n(r_e)$, that do not depend on the nuclear coordinates, and expand the Schrödinger wavefunction
in terms of $\Phi_n$, we would automatically obtain this equation. Such a basis would be automatically
diabatic.)

To see physically the problem of motion of wavepackets in a non-diagonal diabatic potential, we plot in figure
B3.4.17 a set of two adiabatic potentials and their diabatic counterparts for a 1D problem, for example,
vibrations in a diatom (as in metal-metal complexes). As figure B3.4.17 shows, if a wavepacket is started
away from the 'crossing' point, it would slide towards this crossing point (where $V_{11} = V_{22}$) where it
would branch; a part of it would continue on the same adiabatic state (i.e. shift to a different diabatic
state) and the other part would 'jump' to a different adiabatic state.




Figure B3.4.17. When a wavepacket comes to a crossing point, it will split into two parts (schematic 
Gaussians). One will remain on the same adiabat (different diabat) and the other will hop to the other adiabat 
(same diabat). The adiabatic curves are shown by full lines and denoted by 'ground' and 'excited'; the 
diabatic curves are shown by dashed lines and denoted 1, 2. 
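
The relation between the two sets of curves can be made concrete by diagonalizing a 2 x 2 diabatic potential
matrix at each nuclear position. The sketch below does this for two hypothetical linear diabats with a
constant coupling; the slopes, the coupling strength, the grid and the function name are invented for
illustration and are not taken from the figure.

```python
import numpy as np

def adiabatic_energies(v11, v22, coupling):
    """Eigenvalues (ascending) of the 2x2 diabatic matrix [[v11, c], [c, v22]]."""
    matrix = np.array([[v11, coupling], [coupling, v22]], dtype=float)
    return np.linalg.eigvalsh(matrix)

# Hypothetical linear diabats crossing at r0, coupled by a constant element gamma.
r0, slope1, slope2, gamma = 0.0, 1.0, -1.0, 0.1
grid = np.linspace(-1.0, 1.0, 201)
ground, excited = np.array(
    [adiabatic_energies(slope1 * (r - r0), slope2 * (r - r0), gamma) for r in grid]
).T
gap = excited - ground   # adiabatic splitting along the grid
```

In this model the diabats cross at r0, whereas the adiabatic curves avoid each other there, with a minimum
splitting of twice the coupling.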

The problem of branching of the wavepacket at crossing points is very old and has been treated separately by 
Landau and by Zener [ 15 , 173 , 174 ]. The model problem they considered has the following diabatic coupling 
matrix: 


$$V = \begin{pmatrix} 0 & \Gamma \\ \Gamma & \Delta F\,(R - R_0) \end{pmatrix} \qquad \text{(B3.4.40)}$$


where ΔF is the difference in slope of the potentials (i.e. the difference in force felt in each state), Γ is
the coupling element and $R_0$ is the crossing point. Landau and Zener showed that, in such a case, the
probability for the wavefunction to transfer from the higher adiabatic level to the lower one (i.e. to remain
on the same diabat) is


$$P_{\text{non-adiabatic}} = \exp\left(-\frac{2\pi\Gamma^2}{v\,\Delta F}\right) \qquad \text{(B3.4.41)}$$

while the probability to remain on the same adiabatic level is

$$P_{\text{adiabatic}} = 1 - P_{\text{non-adiabatic}} \qquad \text{(B3.4.42)}$$

where v is the velocity of the wavepacket at the crossing point. Note two things about this formula: the
steeper the difference in the potentials, the higher the probability of a non-adiabatic transfer; in addition,
if the mass is large (i.e. the velocity is low), then the motion is adiabatic
($P_{\text{adiabatic}} \approx 1$).
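
The two trends just noted are easy to check numerically. The following sketch assumes the standard
Landau-Zener form, exp(-2πΓ²/(ħ v ΔF)), with Γ the off-diagonal coupling element; the function name, the
default ħ = 1 and the sample numbers are illustrative assumptions.

```python
import math

def landau_zener_probabilities(coupling, velocity, slope_difference, hbar=1.0):
    """Return (P_non_adiabatic, P_adiabatic) from the standard Landau-Zener formula."""
    if velocity <= 0 or slope_difference <= 0:
        raise ValueError("velocity and slope difference must be positive")
    exponent = 2.0 * math.pi * coupling**2 / (hbar * velocity * slope_difference)
    p_non_adiabatic = math.exp(-exponent)
    return p_non_adiabatic, 1.0 - p_non_adiabatic

# Slow passage through the crossing is nearly adiabatic; fast passage is not.
p_slow, _ = landau_zener_probabilities(coupling=0.1, velocity=0.01, slope_difference=1.0)
p_fast, _ = landau_zener_probabilities(coupling=0.1, velocity=10.0, slope_difference=1.0)
# A larger slope difference raises the non-adiabatic probability at fixed velocity.
p_shallow, _ = landau_zener_probabilities(coupling=0.1, velocity=1.0, slope_difference=0.5)
p_steep, _ = landau_zener_probabilities(coupling=0.1, velocity=1.0, slope_difference=5.0)
```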

B3.4.9.2 NUMERICAL APPROACHES FOR SIMULATING NON-ADIABATIC PROCESSES 

The simplest approach to simulating non-adiabatic dynamics is by surface hopping [175, 176]. In its simplest
form, the approach is as follows. One carries out classical simulations of the nuclear motion on a specific
adiabatic electronic state (ground or excited) and at any given instant checks whether the diabatic potential
associated with that electronic state is intersecting the diabatic potential on another electronic state. If
it is, then a decision is made as to whether a 'jump' to the other adiabatic electronic state should be
performed, based on the values for $P_{\text{adiabatic}}$ and $P_{\text{non-adiabatic}}$ (when
$P_{\text{non-adiabatic}}$ is close to 1, a jump to the other electronic state is made with a high probability
so that the particle remains on the same diabatic potential). If a jump is made, the particle continues its
motion along the new adiabatic potential surface, with the same instantaneous position and momentum.
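
The decision step of this simplest scheme can be sketched as a comparison of a random number with the hopping
probability. The code below is a minimal illustration, not a full surface-hopping program: it assumes the
standard Landau-Zener probability, labels the two adiabatic states 0 and 1, and uses invented parameter
values; position and momentum are left untouched by the hop, as described above.

```python
import math
import random

def attempt_hop(current_state, coupling, velocity, slope_difference, rng, hbar=1.0):
    """At a crossing, switch between adiabatic states 0 and 1 with the Landau-Zener probability."""
    p_hop = math.exp(-2.0 * math.pi * coupling**2 / (hbar * velocity * slope_difference))
    if rng.random() < p_hop:
        return 1 - current_state   # jump: the trajectory stays on its diabat
    return current_state           # no jump: the trajectory follows its adiabat

# Ensemble of independent trajectories that all reach the crossing on state 0.
rng = random.Random(0)
n_traj = 20000
hops = sum(attempt_hop(0, 0.05, 1.0, 1.0, rng) for _ in range(n_traj))
hop_fraction = hops / n_traj   # approaches p_hop as the ensemble grows
```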

This approach is very simple and powerful. It has been used in numerous studies (for references see [ 176 , 
177 ]) and generally captures the essentials of the adiabatic versus non-adiabatic branching. It is especially 
useful in circumstances where the nuclear motion is essentially classical (i.e. zero point motion and tunnelling 
can be ignored). 

This basic hopping model has a major disadvantage, however, as it fails to say much about the phases of the 
wavefunction. Consider a case where the wavefunction visits a region where surface hopping occurs, so a part 
of it hops, and at some later time it re-visits this region and again a part of it undergoes hopping. These two 
parts would interfere together and the interference may be constructive or destructive, but the hopping model 
does not specify this information. 

To remedy this difficulty, several approaches have been developed. In some methods, the phase of the 
wavefunction is specified after hopping [ 178 ]. In other approaches, one expands the nuclear wavefunction in 
terms of a limited number of basis-set functions and works out the quantum dynamical probability for 
jumping. For example, the quantum dynamical basis functions could be a set of Gaussian wavepackets which 
move forward in time [ 147 ]. This approach is very powerful for short and intermediate time processes, where 
the number of required Gaussians is not too large. 

The ultimate approach to simulating non-adiabatic effects is through the use of a full Schrödinger
wavefunction for both the nuclei and the electrons, using the adiabatic-diabatic transformation methods
discussed above. The whole machinery of approaches to solving the Schrödinger equation for adiabatic problems
can be used, except that the size of the wavefunction is now essentially doubled (for problems involving two
electronic states, to account for both states). The first application of these methods for molecular dynamical
problems was for the charge-transfer system

Here quantum-mechanical vibrational state-to-state differential cross sections were calculated for a
translational energy of 20 eV and compared with experiments, with very good agreement between
experiment and theory. In another application of this approach, state-selected integral cross sections were
calculated for the (Ar + H$_2$)$^+$ system. Reactive (exchange), charge-transfer and spin-transition processes
were treated simultaneously in one single calculation, and they compared very well with experiments
[179, 180].

Finally, semi-classical approaches to non-adiabatic dynamics have also been formulated and successfully 
applied [ 167 , 181 ]. In an especially transparent version of these approaches [ 167 ], one employs a 
mathematical 'trick' which converts the non-adiabatic surfaces to a set of coupled oscillators; the number of 
oscillators is the same as the number of electronic states. This method is also quite accurate, except that the 
number of required trajectories grows with time, as in any semi-classical approach. 


B3.4.10 CONTROLLING MOLECULAR MOTION 


The preceding sections were concerned with the description of molecular motion. An ambitious goal is to
proceed further and influence molecular motion. This lofty goal has been the centrepiece of quantum
dynamics in the past decade and is still under intense investigation [182, 183, 184, 185, 186, 187, 188, 189,
190, 191, 192, 193 and 194]. Here we will only describe some general concepts and schemes.

The basic Hamiltonian describing the motion of atoms and molecules under a strong laser is simple in the
dipole approximation,

$$H = H_0 - \mu \cdot E(t) \qquad \text{(B3.4.43)}$$

where E = E(t) is the time-dependent electric field at the molecule, while $H_0$ is the Hamiltonian of the
static system; μ is the transition dipole operator, which typically connects different electronic states.

There are several different possible goals in controlling molecular dynamics. One goal can be the localization
of excitations to a specific bond in a molecule, so that the molecule could be broken along that bond [188].
Alternatively, one can try to transfer the molecule completely to an excited electronic state [186]. Another is
the control of alignment (so that a molecule would point in a certain direction) [189, 190]. Still another goal
would be the control of branching ratios; for example, in a reaction of an atom with a diatom, A + BC, one
may want to control the branching into products [182, 183 and 184]:

$$A + BC \rightarrow \begin{cases} AB + C \\ AC + B \\ BC + A \\ A + B + C. \end{cases}$$


Finally, one may want to control the emission of light from molecules. 

The conceptually simplest approach towards controlling systems by a laser field is by 'teaching' the field
[188, 191, 192 and 193]. Typically, the field is experimentally prepared as, for example, a sum of Gaussian
pulses with variable height and positions. Each experiment gives an outcome which can be quantified. Consider,
for example, an A + BC reaction where the possible products are AB + C and AC + B; if the AB + C product is
preferred one would seek to optimize the branching ratio

$$P_{\text{branch}} = \frac{P(A + BC \rightarrow AB + C)}{P(A + BC \rightarrow AB + C,\ AC + B)}. \qquad \text{(B3.4.44)}$$

In a purely experimental (non-theory) approach [188, 191, 192 and 193] the branching ratio can be controlled
by repeating the experiment many times, each with a randomly chosen set of pulse magnitudes and start times.
One can repeat the experiment, varying the electric field somewhat each time until the best outcome is
achieved. This approach may be the most appropriate one for large systems where little is known about the
underlying dynamics and it has recently been demonstrated to work very well on dissecting large molecules
[188].
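
The feedback loop described here can be caricatured in a few lines. In the sketch below the laboratory
measurement is replaced by an invented stand-in function, and the search is the crudest possible one (random
perturbations, kept only when the outcome improves); practical implementations use more elaborate learning
algorithms. All names and numbers are hypothetical.

```python
import random

def mock_experiment(pulse_heights):
    """Hypothetical stand-in for the measurement: returns a branching ratio in (0, 1]."""
    target = [0.8, 0.2, 0.5]   # invented 'best' pulse heights, unknown to the optimizer
    error = sum((h - t) ** 2 for h, t in zip(pulse_heights, target))
    return 1.0 / (1.0 + error)

def teach_field(measure, n_pulses, n_trials, step, rng):
    """Random-search feedback loop: keep a perturbed field only if the outcome improves."""
    best_field = [rng.random() for _ in range(n_pulses)]
    best_outcome = measure(best_field)
    history = [best_outcome]
    for _ in range(n_trials):
        trial = [h + rng.uniform(-step, step) for h in best_field]
        outcome = measure(trial)
        if outcome > best_outcome:
            best_field, best_outcome = trial, outcome
        history.append(best_outcome)
    return best_field, best_outcome, history

field, outcome, history = teach_field(mock_experiment, 3, 2000, 0.1, random.Random(1))
```

The loop needs no model of the dynamics at all: only the measured outcome is fed back, which is what makes
this strategy attractive for large systems.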

Closely related to these 'experimental' approaches are optimal control procedures, in which one simulates
theoretically the effects of the electric field on the system, and then modifies the electric field to give the
best objective, i.e. a desired output (in this case: a high branching ratio) [182, 183]. The optimal control
algorithm can be recast in a very powerful mathematical form which makes the calculation converge rapidly to
give an excellent field for any objective, if it is possible to simulate the system motion theoretically.

A different set of approaches uses simple physical properties to control the system [184]. To demonstrate this
type of problem, consider an even simpler branching problem where, upon excitation, two possible degenerate
products are simultaneously produced. An example would be to photo-dissociate a diatom AB and produce
different states of the system: one state labelled A + B*, in which B is electronically excited and A is
receding away slowly; and another state, labelled A + B, in which B is in the ground state and A is receding
rapidly (so that the total energy is in both cases equal). The simplest method of controlling the A + B* versus
A + B production rate would be to mix two different pathways for obtaining A + B and A + B*; for example,
mixing a field of frequency ω with a phase-lagged third harmonic of a field which is three times lower in
frequency (see figure B3.4.18):


$$E(t) = \varepsilon_1 \cos(\omega t) + \varepsilon_3 \cos\left(\frac{\omega t}{3} + \varphi\right) \qquad \text{(B3.4.45)}$$

Other possible choices are to use two pairs of frequencies which together have the same energies. The key
point is that quantum interference between the two pathways can be used to control the branching ratio. This
coherent-control approach is very general and can be used in virtually any branch of molecular dynamics,
including scattering and photo-dissociation.
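
The interference argument can be illustrated with a toy two-pathway model in which the yield in each product
channel is the squared modulus of the sum of a one-photon and a three-photon amplitude. The amplitudes below
are invented; in particular, the relative sign of the three-photon amplitude in the two channels is an
assumption chosen so that the effect is as large as possible.

```python
import cmath
import math

def channel_yields(phi, amplitudes_one_photon, amplitudes_three_photon):
    """Yield in each product channel, |a1 + a3 * exp(i*phi)|^2, for relative laser phase phi."""
    return [abs(a1 + a3 * cmath.exp(1j * phi)) ** 2
            for a1, a3 in zip(amplitudes_one_photon, amplitudes_three_photon)]

# Invented amplitudes for the channels (A + B, A + B*). The three-photon amplitude is
# assumed to differ by a phase of pi between the channels, so phi can steer the branching.
a1 = [1.0 + 0.0j, 1.0 + 0.0j]
a3 = [1.0 + 0.0j, -1.0 + 0.0j]

phis = [2.0 * math.pi * k / 360 for k in range(360)]
ratios = []
for phi in phis:
    y_ground, y_excited = channel_yields(phi, a1, a3)
    ratios.append(y_ground / (y_ground + y_excited))   # fraction emerging as A + B
best_phi = phis[max(range(len(phis)), key=ratios.__getitem__)]
```

Scanning the relative phase moves the products from one channel to the other, although each pathway on its own
would populate both channels equally in this model.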



Figure B3.4.18. A schematic use of coherent control in AB → A + B, A + B* dissociation: use of a single
high-frequency photon (ω) or three low-intensity (ω/3) photons would lead to emerging wavefunctions in
both arrangements. However, by properly combining the amplitudes and phases of the single- and
three-photon paths, the wavefunction would emerge in a single channel.


B 3.5 Optimization and reaction path algorithms

Peter Pulay and Jon Baker 


B 3.5.1 INTRODUCTION 

A quantum mechanical treatment of molecular systems usually starts with the Born-Oppenheimer 
approximation, i.e., the separation of the electronic and nuclear degrees of freedom. This is a very good 
approximation for well separated electronic states. The expectation value of the total energy in this case is a 
function of the nuclear coordinates and the parameters in the electronic wavefunction, e.g., orbital 
coefficients. The wavefunction parameters are most often determined by the variation theorem: the electronic 
energy is made stationary (in the most important ground-state case it is minimized) with respect to them. The
optimized energy, calculated as a function of the nuclear coordinates, is known as the potential energy surface
(PES). Finding its (local) minimum gives the calculated equilibrium structure. The latter, although 
experimentally not directly accessible, is perhaps the most satisfactory definition of molecular geometry, and 
serves as the best starting point for a treatment of molecular vibrations. A large part of the total effort 
expended in quantum chemical calculations is spent in optimizing either electronic parameters or molecular 
geometries; the latter task is dominant in empirical force field calculations. The optimization of electronic 
wavefunctions is usually treated separately from the optimization of molecular geometries; however, there are 
enough similarities between the two problems to discuss them together. 

Both the electronic and the geometry optimization problem, particularly the latter, may have more than one 
solution. For small, rigid molecules, the approximate molecular geometry is chemically obvious, and the 
presence of multiple minima is not a serious concern. For large, flexible molecules, however, finding the 
absolute minimum, or a complete set of low-lying equilibrium structures, is only a partially solved problem. 
This topic will be discussed in the last section of this chapter. The rest of the article deals with local 
optimization, i.e., finding a minimum from a reasonably close starting point. We will also discuss the
determination of other stationary points (most importantly saddle points), constrained optimization, and
reaction paths. Several reviews have been published on geometry optimization [1, 2]. The optimization of
SCF-type wavefunctions is often highly nonlinear, particularly for the multiconfigurational case, and this has 
received most attention [3, 4]. 

The most important consideration affecting the choice of the method for locating minima, or stationary points 
in general, is the availability of analytical derivatives of the object function, in our case the energy. Zeroth- 
order (energy only) methods can be used for a few variables but are notoriously inefficient for a larger number 
of degrees of freedom. First-order methods, which use both energy and gradient (first-derivative) information, 
are particularly useful in quantum chemistry because the extra effort needed to evaluate all first derivatives is 
usually comparable to the calculation of the energy itself and may be less, particularly for the electronic 
degrees of freedom [5]. Second-order methods, which use second derivatives, further improve the 
convergence of the optimization process. However, calculating second derivatives tends to be much more 
expensive than calculating the gradient, and full second-order methods are usually cost efficient only when 
first-order methods have severe convergence problems. Derivatives higher than the second have been used 
occasionally, but they are not generally available and are expensive to calculate. Consequently, this article 
will mainly concentrate on first- and second-order methods. 


The electronic energy W in the Born-Oppenheimer approximation can be written as W = W(q, p), where q is
the vector of nuclear coordinates and the vector p contains the parameters of the electronic wavefunction. The
latter are usually orbital coefficients, configuration amplitudes and occasionally nonlinear basis function
parameters, e.g., atomic orbital positions and exponents. The electronic coordinates have been integrated out
and do not appear in W. Optimizing the electronic parameters leaves a function depending on the nuclear
coordinates only, E = E(q). We will assume that both W(q, p) and E(q) and their first derivatives are
continuous functions of the variables $q_i$ and $p_j$.


B 3.5.2 OVERVIEW OF TECHNIQUES FOR LOCAL OPTIMIZATION 

B3.5.2.1 CHARACTERIZATION OF STATIONARY POINTS 

In this section, we will discuss general optimization methods. Our example is the geometry optimization
problem, i.e., the minimization of E(q). However, the results apply to electronic optimization as well. There
are a number of useful monographs on the minimization of continuous, differentiable functions in many
variables [6, 7].


For a point on the potential energy surface to be a stationary point, all its first derivatives,
$\partial E/\partial q_i$, must vanish, and thus the whole gradient vector
$g = \{\partial E/\partial q_i,\ i = 1, \ldots, n\}$ should be zero. The character of a stationary point,
i.e., whether it is a local minimum, maximum or saddle point, can be determined by examining the second
derivatives. Expanding the energy change in the neighbourhood of a stationary point $q^0$ in a power series in
terms of displacement coordinates from the stationary point, $\delta q_i = q_i - q_i^0$, gives

$$E(q) - E(q^0) = \sum_i (\partial E/\partial q_i)|_{q^0}\,\delta q_i + \tfrac{1}{2}\sum_{i,j} (\partial^2 E/\partial q_i\,\partial q_j)|_{q^0}\,\delta q_i\,\delta q_j + \text{higher terms.} \qquad \text{(B3.5.1)}$$

As $q^0$ is a stationary point, the linear terms $(\partial E/\partial q_i)|_{q^0}\,\delta q_i$ in equation
B3.5.1 vanish, and higher-order terms do not have to be considered for local characterization of stationary
points, because lower-order terms (quadratics in this case) always dominate for sufficiently small
displacements. Introducing the force constant, or Hessian matrix, i.e., the matrix of second derivatives H,
$H_{ij} = (\partial^2 E/\partial q_i\,\partial q_j)|_{q^0}$, the above equation can be written in a convenient
matrix notation as

$$\Delta E = E(q) - E(q^0) = \tfrac{1}{2}\,\delta q^{\dagger} H\,\delta q. \qquad \text{(B3.5.2)}$$

Let us express the displacement coordinates as linear combinations of a set of new coordinates y: δq = Uy;
then $\Delta E = \tfrac{1}{2}\,y^{\dagger}U^{\dagger}HUy$. U can be an arbitrary non-singular matrix, and thus
can be chosen to diagonalize the symmetric matrix H: $U^{\dagger}HU = \Lambda$, where the diagonal matrix Λ
contains the (real) eigenvalues of H. In this form, the energy change from the stationary point is simply
$\Delta E = \tfrac{1}{2}\sum_i \Lambda_i y_i^2$. It is clear now that a sufficient condition for a minimum is
that all eigenvalues of H be positive, i.e., H must be a positive definite matrix. Otherwise choosing
$y_i \neq 0$, all other $y_j = 0$, where $\Lambda_i$ is a negative eigenvalue, will decrease the energy, i.e.,
the stationary point cannot be a minimum. Zero eigenvalues of the Hessian (inflection points) need not be
considered because their probability in the general case is vanishingly small. Stationary points with only one
negative Hessian eigenvalue are called first-order saddle points. They have considerable importance as
transition states in chemical reactions. The energy difference between a transition state and the reactant(s)
is the barrier corresponding to the reaction path passing through that transition state. Stationary points
with two or more negative eigenvalues are far less important, as in this case there is always a reaction path
with lower barrier which determines the reaction probability (however, in symmetrical systems higher-order
saddle points may be preferred reference geometries [8]).
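
In practice this classification amounts to counting the negative eigenvalues of the Hessian. A minimal sketch
follows; the function name, the tolerance used to flag near-zero eigenvalues and the assumption that the
Hessian is expressed in coordinates free of overall translations and rotations (which would otherwise
contribute zero eigenvalues) are all illustrative choices.

```python
import numpy as np

def classify_stationary_point(hessian, tol=1e-8):
    """Classify a stationary point from the eigenvalues of its symmetric Hessian."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    if np.any(np.abs(eigenvalues) < tol):
        return "degenerate (zero eigenvalue)"
    n_negative = int(np.sum(eigenvalues < 0))
    if n_negative == 0:
        return "minimum"
    if n_negative == len(eigenvalues):
        return "maximum"
    return f"saddle point of order {n_negative}"

# Model surface E = x^2 - y^2 has a first-order saddle point at the origin.
kind = classify_stationary_point([[2.0, 0.0], [0.0, -2.0]])
```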

The simplest smooth function which has a local minimum is a quadratic. Such a function has only one, easily 
determinable stationary point. It is thus not surprising that most optimization methods try to model the 
unknown function with a local quadratic approximation, in the form of equation (B3.5.1) . 

B3.5.2.2 ENERGY-ONLY METHODS 

As noted in the introduction, energy-only methods are generally much less efficient than gradient-based 
techniques. The simplex method [9] (not identical with the similarly named method used in linear 
programming) was used quite widely before the introduction of analytical energy gradients. The intuitively 
most obvious method is a sequential optimization of the variables (sequential univariate search). As the 
optimization of one variable affects the minimum of the others, the whole cycle has to be repeated after all 
variables have been optimized. A one-dimensional minimization is usually carried out by finding the
minimum of a parabola fitted to three points obtained by varying one of the variables (keeping the others
constant), changing values so as to bracket the minimum, and zeroing in on the minimum by diminishing the 
step size [6]. Generalized to any vector direction on the surface, this is called a line search. The convergence 
rate of the sequential univariate search can be exceedingly slow if the variables are strongly coupled and, thus, 
this method is not recommended. A better alternative is to convert gradient methods, covered in the next 
section, to energy-only methods by calculating the gradients numerically. One of the most widely used 
energy-only methods is the modified Fletcher-Powell method described by Schlegel [1]; perhaps better is the 
numerical version of Baker's Eigenvector Following (EF) algorithm (see later) [10]. In spite of these 
ingenious algorithms, the general consensus among researchers in the field is that energy-only methods are 
simply not cost effective for systems with more than a few degrees of freedom. 
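
The parabolic step used in a one-dimensional minimization can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular program: the function names and the model energy are hypothetical, and the three points are assumed to bracket the minimum already.

```python
def parabola_vertex(x, e):
    """Abscissa of the stationary point of the parabola through three (x, e) points."""
    (x0, x1, x2), (e0, e1, e2) = x, e
    # Standard three-point formula for the vertex of the interpolating parabola.
    num = (x1 - x0) ** 2 * (e1 - e2) - (x1 - x2) ** 2 * (e1 - e0)
    den = (x1 - x0) * (e1 - e2) - (x1 - x2) * (e1 - e0)
    if den == 0.0:
        raise ValueError("the three points are collinear; the parabola has no vertex")
    return x1 - 0.5 * num / den


def model_energy(x):
    # Hypothetical, slightly anharmonic one-dimensional cut (illustration only).
    return (x - 1.3) ** 2 + 0.1 * (x - 1.3) ** 4


bracket = [0.8, 1.2, 1.6]  # the middle point is lowest, so the minimum is bracketed
estimate = parabola_vertex(bracket, [model_energy(x) for x in bracket])
print(estimate)
```

On an exactly quadratic cut a single fit lands on the minimum; on an anharmonic cut the fit is repeated with a tighter bracket around the new estimate.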

B3.5.2.3 GRADIENT METHODS 

All efficient optimization methods require the gradient vector, i.e., the first derivatives of the function to be 
optimized. As the quantum mechanical energy as a function of nuclear coordinates is the result of an iterative 
procedure, and ordinary first-order perturbation theory is inapplicable in the usual case where the basis 
functions move with the nuclei, this is not a trivial problem. The introduction of analytical energy derivatives 
(forces on the atoms in the context of geometry optimization) of the SCF energy [11], and later 
generalizations to more complex wavefunctions (for reviews see, e.g., [5, 12]) improved the efficiency of 
geometry optimizations by one or two orders of magnitude, depending on the molecular size, making possible 
structure optimization for large polyatomic molecules. 

(A) NEWTON'S METHOD 

Most gradient optimization methods rely on a quadratic model of the potential surface. The minimum 
condition for the quadratic energy expression 

$$E(q) = E(q^0) + \delta q^T g + \tfrac{1}{2}\,\delta q^T H\,\delta q$$ 

using the symmetry of H, leads to 

$$g + H\,\delta q = 0 \quad\text{and}\quad \delta q = -H^{-1} g. \qquad \text{(B3.5.3)}$$ 

Here g is the gradient vector at $q^0$, $g_i = (\partial E/\partial q_i)|_{q^0}$. The minimizer $q^0 + \delta q$ is exact on a quadratic surface 
and requires only the solution of a linear system of equations. For a nonlinear surface, this method has to be 
iterated. Near the solution, the iterative procedure is quadratically convergent. In practice, this means that 
three or four iterations usually suffice for locating the minimum to high accuracy. This is the basic Newton (or 
Newton-Raphson) method. Despite its rapid convergence for nearly quadratic surfaces, it suffers from a 
number of shortcomings and in its original form is seldom used for geometry optimization. It has some 
importance for difficult cases of wavefunction optimization. The principal defect of Newton's method is that 
it requires the Hessian (second-derivative) matrix at every iteration. Second derivatives are typically much 
more expensive than gradients in quantum chemistry applications. Another problem is that far from the 
minimum the Hessian may not be positive definite. The energy change in a Newton-Raphson cycle is 
$\tfrac{1}{2}\,\delta q^T g = -\tfrac{1}{2}\, g^T H^{-1} g$ in the quadratic approximation. Thus the energy does not necessarily decrease, even for small 
steps, unless H (and obviously its inverse) is positive definite. 
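
A single Newton-Raphson step on a model quadratic surface illustrates both the exactness of the step and the predicted energy change. This is a sketch under stated assumptions: the two-dimensional Hessian, the position of the minimum and all function names are hypothetical illustration values.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


H = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical positive definite Hessian
Q_MIN = [1.0, -2.0]            # hypothetical position of the minimum


def gradient(q):
    d = [q[0] - Q_MIN[0], q[1] - Q_MIN[1]]
    return [H[0][0] * d[0] + H[0][1] * d[1], H[1][0] * d[0] + H[1][1] * d[1]]


def energy(q):
    d = [q[0] - Q_MIN[0], q[1] - Q_MIN[1]]
    g = gradient(q)
    return 0.5 * (d[0] * g[0] + d[1] * g[1])


q0 = [0.0, 0.0]
g = gradient(q0)
dq = solve2(H, [-g[0], -g[1]])                 # solve H dq = -g, no explicit inverse
q1 = [q0[0] + dq[0], q0[1] + dq[1]]
predicted = 0.5 * (dq[0] * g[0] + dq[1] * g[1])  # (1/2) dq^T g = -(1/2) g^T H^-1 g
print(q1, predicted, energy(q1) - energy(q0))
```

Because the model surface is exactly quadratic, one step reaches the minimum and the predicted and actual energy changes coincide; on a real surface the step would be iterated.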


(B) SIMPLE RELAXATION 

Both defects of the Newton method can be eliminated by replacing the exact inverse Hessian $H^{-1}$ by a (fixed) 
positive definite approximation to it, F. This method is known as simple relaxation. In both geometry and 
wavefunction optimization, it is usually possible to construct a fairly good approximate Hessian. For 
geometry optimization, this can be based on the molecular connectivity and transferability of potential 
parameters, or on previous low-level calculations. For wavefunction optimization, a guess based on orbital 
energy differences is often reasonably accurate. Far from the minimum, approximate Hessian methods using 
positive definite matrices are preferable to the Newton method, as they have the descent property, i.e., the 
energy decreases for sufficiently small steps. However, they lack the quadratic terminal convergence rate of 
the Newton method. Instead, the residual error vector (the distance from the accurate minimum) is given by 
$r^{(n)} = (I - FH)^n\, r^{(0)}$ on a quadratic surface. Here I is the unit matrix and the superscript (n) denotes the nth cycle. The ultimate 
convergence rate is governed by the magnitude of the largest eigenvalue of the matrix (I - FH). This will be 
small if F is a good approximation to $H^{-1}$. To show this, we introduce new variables, through a linear 
transformation of the old ones, which diagonalize FH. Using these coordinates, the kth component of the 
residue in step n is $\lambda_k^n\, r_k^{(0)}$, where $\lambda_k$ is the kth eigenvalue of (I - FH). This explains a common property of 
simple relaxation: it usually shows good initial convergence but slows down later as the surviving components 
of the residuum take on directions in which the Hessian is poorly estimated. If one of the eigenvalues of 
(I - FH) exceeds 1 in absolute magnitude then simple relaxation without a line search will ultimately diverge. 
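
The behaviour of the residual under simple relaxation can be followed numerically. The sketch below assumes a quadratic surface with its minimum at the origin and a diagonal Hessian, so that the coordinates already diagonalize FH; the numerical values and the function name `relax` are hypothetical.

```python
def relax(h_diag, f_diag, r0, n_steps):
    """Simple relaxation r <- r - F g with g = H r (minimum at the origin)."""
    r = list(r0)
    history = [list(r)]
    for _ in range(n_steps):
        g = [h * x for h, x in zip(h_diag, r)]
        r = [x - f * gx for x, f, gx in zip(r, f_diag, g)]
        history.append(list(r))
    return history


h_diag = [4.0, 1.0]
# F is exact for the first coordinate and poor for the second:
# the eigenvalues of (I - FH) are 0 and 0.5.
good = relax(h_diag, [0.25, 0.5], [1.0, 1.0], 5)
# Here one eigenvalue of (I - FH) is -1.5, so the iteration diverges.
bad = relax(h_diag, [0.25, 2.5], [1.0, 1.0], 5)
print(good[-1], bad[-1])
```

The well-estimated component disappears in the first step, while the poorly estimated one shrinks only by the factor 0.5 per cycle, which is the slow terminal convergence described above.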

If there is no approximate Hessian available, then the unit matrix is frequently used, i.e., a step is made along 
the gradient. This is the steepest descent method. The unit matrix is arbitrary and has no invariance properties, 
and thus the resulting step may be made arbitrarily large or small by scaling the coordinates. Therefore, steepest descent 
methods require a line search for a minimum along the direction of the gradient vector. Line searches are 
often recommended in general optimization texts. However, they tend to be less efficient in quantum 
chemistry, as the evaluation of the gradient vector costs roughly the same as the calculation of the energy for a 
wide range of methods, and supplies much more information. Nevertheless, they may be necessary for 
strongly non-quadratic functions (or, what is essentially the same thing, at points far from the minimum). A 
good compromise which requires no additional energy evaluations was suggested by Schlegel [13]: a 
polynomial is fitted to the energies at two points, and to the gradients projected on the line connecting them, 
and its minimum is located. The polynomial can be cubic or, as recommended by Schlegel, a special quartic 
with only one minimum. If a line search is used, the energy of simple relaxation and steepest descent steps 
should always decrease, and they should ultimately converge. However, convergence may be very slow if F is 
a poor approximation to the inverse Hessian $H^{-1}$, as is usually the situation in the steepest descent method. In 
this case, illustrated in figure B3.5.1, the optimization takes a zigzag path converging slowly to the minimum. 



Figure B3.5.1. Contour line representation of a quadratic surface and part of a steepest descent path 
zigzagging toward the minimum. 
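
The zigzag path can be reproduced with a few lines of code. This is a minimal sketch assuming a two-dimensional quadratic surface with a diagonal Hessian of condition number 10 and an exact line search along each gradient; all names and numbers are hypothetical.

```python
def energy(q, h_diag=(1.0, 10.0)):
    return 0.5 * sum(h * x * x for h, x in zip(h_diag, q))


def steepest_descent(q, n_steps, h_diag=(1.0, 10.0)):
    """Steepest descent with an exact line search on E = (1/2) sum h_k q_k^2."""
    path = [list(q)]
    for _ in range(n_steps):
        g = [h * x for h, x in zip(h_diag, q)]
        gg = sum(x * x for x in g)
        if gg == 0.0:
            break
        # Exact minimizer along -g for a quadratic: alpha = g^T g / g^T H g.
        alpha = gg / sum(h * x * x for h, x in zip(h_diag, g))
        q = [x - alpha * gx for x, gx in zip(q, g)]
        path.append(list(q))
    return path


path = steepest_descent([10.0, 1.0], 10)
for point in path:
    print(point, energy(point))
```

The second coordinate changes sign at every step, and after ten cycles the energy has dropped by less than two orders of magnitude, although the energy decreases monotonically.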


In simple relaxation (the fixed approximate Hessian method), the step does not depend on the iteration 
history. More sophisticated optimization techniques use information gathered during previous steps to 
improve the estimate of the minimizer, usually by invoking a quadratic model of the energy surface. These 
methods can be divided into two classes: variable metric methods and interpolation methods. 

(C) VARIABLE METRIC METHODS 

In these methods, also known as quasi-Newton methods, the approximate Hessian is improved (updated) 
based on the results in previous steps. For the exact Hessian and a quadratic surface, the quasi-Newton 
equation $\Delta g^{(n)} = H\,\Delta q^{(n)}$ and its analogue $H^{-1}\Delta g^{(n)} = \Delta q^{(n)}$ must hold (where $\Delta g^{(n)} = g^{(n+1)} - g^{(n)}$, and 
similarly for $\Delta q^{(n)}$). These equations, which have only n components, are obviously insufficient to determine 
the n(n + 1)/2 independent components of the Hessian or its inverse. Therefore, the updating is arbitrary to a 
certain extent. It is desirable to have an updating scheme that converges to the exact Hessian for a quadratic 
function, preserves the quasi-Newton conditions obtained in previous steps, and, for minimization, keeps 
the Hessian positive definite. Updating can be performed on either F or its inverse, the approximate Hessian. 
In the former case repeated matrix inversion can be avoided. All updates use dyadic products, usually built 
from $\Delta q^{(n)}$ and $F\Delta g^{(n)}$. Fletcher [6] gives a detailed description of various update techniques. The most 
important update formulae, written with the cycle index (n) suppressed on the right-hand sides, are the 
Murtagh-Sargent update [14]: 

$$F^{(n+1)} = F + \frac{(\Delta q - F\Delta g)(\Delta q - F\Delta g)^T}{(\Delta q - F\Delta g)^T \Delta g}$$ 

the Davidon-Fletcher-Powell (DFP) update [15]: 

$$F^{(n+1)} = F + \frac{\Delta q\,\Delta q^T}{\Delta q^T \Delta g} - \frac{F\Delta g\,\Delta g^T F}{\Delta g^T F \Delta g}$$ 

and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update [6]: 

$$F^{(n+1)} = F + s\,(1 + s\,\Delta g^T F \Delta g)\,\Delta q\,\Delta q^T - s\,(\Delta q\,\Delta g^T F + F\Delta g\,\Delta q^T)$$ 

where $s = 1/(\Delta q^{(n)T}\Delta g^{(n)})$. A linear combination of these updates is also possible. Both the DFP and BFGS 
updates preserve positive definite F matrices provided $\Delta q^{(n)T}\Delta g^{(n)} > 0$; current opinion is that the latter is the 
best update to use for general minimization. 
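
As an illustration, the BFGS update of the approximate inverse Hessian F can be coded directly from its standard textbook form, F' = F + s(1 + s Δgᵀ F Δg) Δq Δqᵀ - s(Δq Δgᵀ F + F Δg Δqᵀ) with s = 1/(Δqᵀ Δg). The sketch below is not taken from any particular program; the helper names and the numerical values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def bfgs_inverse_update(f, dq, dg):
    """BFGS update of a symmetric approximate inverse Hessian f."""
    s = 1.0 / dot(dq, dg)          # requires dq^T dg > 0 to stay positive definite
    fdg = matvec(f, dg)            # F dg; since F is symmetric, dg^T F is its transpose
    c = s * (1.0 + s * dot(dg, fdg))
    n = len(dq)
    return [[f[i][j] + c * dq[i] * dq[j] - s * (dq[i] * fdg[j] + fdg[i] * dq[j])
             for j in range(n)] for i in range(n)]


H = [[2.0, 0.5], [0.5, 1.0]]       # hypothetical exact Hessian of a quadratic surface
F0 = [[1.0, 0.0], [0.0, 1.0]]      # unit matrix as the starting guess
dq = [1.0, 0.0]
dg = matvec(H, dq)                 # on a quadratic surface dg = H dq
F1 = bfgs_inverse_update(F0, dq, dg)
print(F1, matvec(F1, dg))
```

After the update the quasi-Newton condition F' Δg = Δq holds for the most recent step, and F' remains symmetric.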

For transition state searches, none of the above updates is particularly appropriate as a positive definite 
Hessian is not desired. A more useful update in this case is the Powell update [16]: 

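$$H^{(n+1)} = H^{(n)} + \left[\, v\,\Delta q^{(n)T} + \Delta q^{(n)} v^T - (v^T \Delta q^{(n)})(\Delta q^{(n)} \Delta q^{(n)T})/t \,\right]/t$$ 

where $v = \Delta g^{(n)} - H^{(n)}\Delta q^{(n)}$ and $t = \Delta q^{(n)T}\Delta q^{(n)}$. The Powell update allows the signature of the Hessian, i.e., 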
the number of negative eigenvalues, to change, which is necessary if the region of the potential energy surface 
is inappropriate for the stationary point being sought. Perhaps the best Hessian update for transition state 
searches is a linear combination of the Powell and Murtagh-Sargent updates proposed by Bofill [17, 18]. 
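
The ability of the Powell update to change the signature of the Hessian can be seen in a small example. The sketch assumes the symmetric Powell form H' = H + [v Δqᵀ + Δq vᵀ - (vᵀ Δq) Δq Δqᵀ / t] / t with v = Δg - H Δq and t = Δqᵀ Δq; the function name and the numbers are hypothetical.

```python
def powell_update(h, dq, dg):
    """Symmetric Powell update of an approximate Hessian h (list of rows)."""
    n = len(dq)
    hdq = [sum(h[i][j] * dq[j] for j in range(n)) for i in range(n)]
    v = [dg[i] - hdq[i] for i in range(n)]
    t = sum(x * x for x in dq)
    v_dq = sum(a * b for a, b in zip(v, dq))
    return [[h[i][j] + (v[i] * dq[j] + dq[i] * v[j] - v_dq * dq[i] * dq[j] / t) / t
             for j in range(n)] for i in range(n)]


# Start from a positive definite guess (unit matrix) on a surface whose true
# Hessian, diag(1, -1), has one negative eigenvalue (a first-order saddle point).
H0 = [[1.0, 0.0], [0.0, 1.0]]
dq = [0.0, 1.0]
dg = [0.0, -1.0]               # gradient change along the downhill mode
H1 = powell_update(H0, dq, dg)
print(H1)
```

Here Δqᵀ Δg is negative, so a DFP or BFGS update would not preserve a positive definite matrix, whereas the Powell update simply acquires the required negative eigenvalue while satisfying H' Δq = Δg.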

For a very large number of variables, the question of storing the approximate Hessian or inverse Hessian F 
becomes important. Wavefunction optimization problems can have a very large number of variables, a million 
or more. Geometry optimization at the force field level can also have thousands of degrees of freedom. In 
these cases, the initial inverse Hessian is always taken to be diagonal or sparse, and it is best to store the 
update vectors and associated scalars and generate the inverse Hessian in situ, rather than store the full 
updated inverse Hessian itself. 

A more general update method, widely used in the Gaussian suite of programs [19], is due to Schlegel [13]. In 
this method, the Hessian in the n-dimensional subspace spanned by taking differences between the current 
$q^{(n)}$ and previous geometries $q^{(n-1)}, \ldots, q^{(0)}$ is calculated numerically. This is possible (although not terribly 
accurate), as the n Δq and n Δg values suffice for the calculation of an n-dimensional Hessian by forward 
differences. The Hessian in the small subspace is then projected back to the full space. A line search along the 
new correction vector is avoided by using the constrained quartic interpolation scheme described above. 

(D) INTERPOLATION METHODS 

For a quadratic surface, the gradient vector is a linear function of the coordinates. An alternative way of using 
information gathered during the optimization is to interpolate among the coordinate vectors obtained in the 
preceding cycles. The basic interpolation method is the preconditioned conjugate gradient (CG) method [20]. 
Although usually formulated in a different way, it is equivalent to first making a simple relaxation step using 
an approximate inverse Hessian F, called the preconditioner, and replacing the calculated displacement by a 
linear combination of the current and all previous coordinate displacement vectors $\Delta q^{(i)}$: 

$$\Delta q^{(n+1)} = -F g^{(n+1)} - \sum_{i \le n} \beta_i\,\Delta q^{(i)}.$$ 

The coefficients $\beta_i$ are chosen so that, on a quadratic surface, the interpolated gradient becomes orthogonal to 
all $\Delta q^{(i)}$. This condition is equivalent to minimizing the energy in the space spanned by the displacement 
vectors. In the quadratic case, a further simplification can be made as it can be shown that all $\beta_i$ with the 
exception of $\beta_n$ vanish. The latter is given by 

$$\beta_n = \frac{g^{(n+1)T} F g^{(n+1)}}{g^{(n)T} \Delta q^{(n)}}$$ 

although there are several forms which are equivalent for a quadratic function but not in general [6], e.g., the 
Polak-Ribiere form [21]. The gradient at the (interpolated) new point is not recalculated but is itself 
interpolated. For very large problems, the conjugate gradient method has the advantage that it needs to store 
only a few vectors (the preconditioner is usually diagonal or sparse, and must be positive definite). 
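
A preconditioned conjugate gradient iteration on a model quadratic surface can be sketched as follows. Several assumptions are made for illustration: the surface is E = ½ qᵀHq - bᵀq with a hypothetical 3 x 3 Hessian, the preconditioner is the inverse of the Hessian diagonal, the step along each direction is obtained by an exact line search (possible here only because H is known), and only the most recent displacement is mixed in, with coefficient β = g_newᵀ F g_new / (g_oldᵀ Δq_old).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def preconditioned_cg(h, b, f_diag, q, n_steps):
    """Minimize (1/2) q^T h q - b^T q; returns the final point and gradient."""
    g = [hq - bi for hq, bi in zip(matvec(h, q), b)]
    g_prev = dq_prev = None
    for _ in range(n_steps):
        if dot(g, g) < 1e-30:
            break
        fg = [f * x for f, x in zip(f_diag, g)]        # relaxation step is -F g
        d = [-x for x in fg]
        if dq_prev is not None:
            beta = dot(g, fg) / dot(g_prev, dq_prev)
            d = [di - beta * p for di, p in zip(d, dq_prev)]
        hd = matvec(h, d)
        alpha = -dot(g, d) / dot(d, hd)                # exact line search on a quadratic
        dq = [alpha * di for di in d]
        q = [qi + x for qi, x in zip(q, dq)]
        g_prev, dq_prev = g, dq
        g = [gi + alpha * x for gi, x in zip(g, hd)]   # gradient is linear in q
    return q, g


H = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
q_final, g_final = preconditioned_cg(H, b, [1 / 4.0, 1 / 3.0, 1 / 2.0], [0.0, 0.0, 0.0], 3)
print(q_final, g_final)
```

Only the current gradient, the previous gradient and the previous displacement are stored, and on this three-dimensional quadratic the gradient vanishes (to rounding) after three steps.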

A similar method, direct inversion in the iterative subspace (DIIS) [22, 23] tries to minimize the norm of the 
error vector (in most cases the gradient) by interpolating in the subspace spanned by the previous vectors. 
Unlike the CG method, DIIS is able to converge to saddle points. DIIS is now the standard method for the 
SCF optimization problem. It is also useful for geometry optimization [24]. It does not have the conjugate 
property and therefore requires the storage of previous coordinate and gradient vectors (in practice, usually 
restricted to about 20 or fewer). However, not using the CG property, which is valid for quadratic surfaces 
only, probably adds to the stability of the method. 

To derive the DIIS equations, let us consider a linear combination of coordinate vectors $q^{(1)}, \ldots, q^{(n)}$, 
$q = \sum_i c_i q^{(i)}$. On a quadratic surface, the gradient (or any linear function of the gradient) is an analogous 
linear combination if the coefficients sum to unity: 

$$g = \sum_i c_i g^{(i)}. \qquad \text{(B3.5.4)}$$ 

Minimizing the square of the gradient vector under the condition $\sum_i c_i = 1$ yields the following linear system 
of equations 

$$\begin{pmatrix} B_{11} & \cdots & B_{1n} & 1 \\ \vdots & & \vdots & \vdots \\ B_{n1} & \cdots & B_{nn} & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_n \\ -\lambda \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}$$ 

where $B_{ij} = g^{(i)T} g^{(j)}$ and λ is a Lagrange multiplier. Due to the wide dynamic range of the $B_{ij}$ coefficients, it is best to normalize the diagonal 
elements of this equation to 1. This procedure yields an extrapolated geometry and gradient (equation (B3.5.4)). The next 
step is calculated by relaxing the extrapolated gradient with the approximate inverse Hessian and adding it to 
the extrapolated geometry. Hessian updating can be combined with DIIS, but the method works well even 
with a static Hessian. Any linear function of the gradient can be used instead of the gradient. This changes the 
weighting of the error somewhat. 
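
The DIIS extrapolation can be sketched for a small model problem. The code assumes the usual bordered form of the DIIS equations, in which the matrix of gradient scalar products B is augmented by a row and column of ones and the right-hand side is (0, ..., 0, 1); the Gaussian elimination helper, the model Hessian and the three sample geometries are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def diis_coefficients(grads):
    """Coefficients c_i (summing to 1) that minimize |sum_i c_i g_i|^2."""
    n = len(grads)
    a = [[dot(gi, gj) for gj in grads] + [1.0] for gi in grads]
    a.append([1.0] * n + [0.0])
    return solve(a, [0.0] * n + [1.0])[:n]   # last unknown is the multiplier


H = [[2.0, 0.5], [0.5, 1.0]]
Q_MIN = [1.0, -2.0]
points = [[0.0, 0.0], [0.5, 0.0], [0.0, -1.0]]
grads = [[dot(row, [p[0] - Q_MIN[0], p[1] - Q_MIN[1]]) for row in H] for p in points]
c = diis_coefficients(grads)
q_ext = [sum(ci * p[k] for ci, p in zip(c, points)) for k in range(2)]
print(c, q_ext)
```

With three geometries on a two-dimensional quadratic surface the extrapolated gradient can be made to vanish, so the extrapolated geometry coincides with the minimum; in realistic cases a relaxation step with the approximate inverse Hessian follows the extrapolation.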

B3.5.2.4 SECOND-ORDER METHODS 

As mentioned in the introduction, full second-order methods (e.g., the Newton method) are usually not cost 
efficient, particularly for geometry optimization, due to the high cost of Hessian evaluation. For wavefunction 
optimization, the explicit evaluation of the full Hessian is not practicable due to the large number of degrees 
of freedom. Second-order methods can still be utilized by using direct methods, i.e., finding the solution of the 
Newton-Raphson equation $H\,\delta q = -g$ without explicitly constructing and inverting H. In such a case, 
second-order methods are competitive, and in difficult cases superior to first-order methods in the quadratic region, 
i.e., close to the minimum. Optimization of transition states also frequently requires the explicit evaluation of 
the Hessian or a submatrix of it. 

B3.5.2.5 DAMPING METHODS 

Particularly in the early stages of an optimization, when the gradient is large and Hessian information is 
inaccurate, the computed step size may be too great; using such large steps may lead to divergence or 
convergence to unwanted minima. Methods which incorporate a line search are usually immune to this 
problem; however, as discussed above, line searches are inefficient in quantum chemistry because the 
evaluation of the full gradient vector can often take less time than the evaluation of a single energy. 

The simplest way to deal with large, potentially disastrous steps is to limit the step size, either its maximum 
component or its norm. For geometry optimization, 0.2 to 0.3 Å or rad appears to be a reasonable value for a 
maximum single component; 0.3 rad is also appropriate for the maximum orbital rotation component in 
wavefunction optimization. A better method than simply scaling the displacement is to use a trust radius. The 
idea behind the trust radius is to restrict the step taken so that it lies in the local region of the energy surface 
where the truncation of the original power series expansion (equation (B3.5.1)) to quadratic terms only is 
valid. The neighbourhood about the current point where quadratic behaviour holds is called the trust region. 


If the computed step size exceeds the trust radius, t, its direction is reoptimized under the condition that |Δq| = 
t, i.e., the Lagrangian 

$$\mathcal{L}(q, d) = E(q^0) + \Delta q^T g + \tfrac{1}{2}\,\Delta q^T H \Delta q + \tfrac{1}{2}\, d\,(|\Delta q|^2 - t^2)$$ 

is minimized. The solution is $\Delta q = -(H + dI)^{-1} g$, where the positive denominator shift d is a complicated function 
of t and is usually determined iteratively from the condition |Δq| = t. The trust radius can be adjusted based on 
the accuracy of the energy difference predicted by the quadratic model, compared with the actual energy 
difference [2, 6]. One problem with the trust radius method, and other methods which limit |Δq|, is that they 
do not scale properly with the system size: they are not 'size consistent'. For instance, if n identical, 
non-interacting molecules are optimized simultaneously, the maximum displacement norm for each decreases like 
$n^{-1/2}$. For this reason, it is perhaps best to limit the maximum component, rather than the norm of the 
displacement [6]. 
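
The iterative determination of the shift d can be illustrated with a bisection search. The sketch works in the eigenvector basis of the Hessian, so that H is diagonal and the shifted step has components -g_k/(h_k + d); it ignores the special case in which the gradient has no component along the lowest mode. The function name and numbers are hypothetical.

```python
import math


def trust_radius_step(h_diag, g, t):
    """Return (step, shift d) with |step| <= t for a diagonal Hessian."""
    def step(d):
        return [-gk / (hk + d) for hk, gk in zip(h_diag, g)]

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    h_min = min(h_diag)
    if h_min > 0.0 and norm(step(0.0)) <= t:
        return step(0.0), 0.0            # the Newton step is already acceptable
    lo = max(0.0, -h_min)                # H + dI must be positive definite
    hi = lo + 1.0
    while norm(step(hi)) > t:            # |step| decreases monotonically with d
        hi = lo + 2.0 * (hi - lo)
    for _ in range(200):                 # bisection on |step(d)| = t
        mid = 0.5 * (lo + hi)
        if norm(step(mid)) > t:
            lo = mid
        else:
            hi = mid
    return step(hi), hi


dq, d = trust_radius_step([1.0, 0.1], [0.2, 0.3], 0.3)
print(dq, d, math.sqrt(sum(x * x for x in dq)))
```

For these values the unrestricted Newton step has a norm of about 3, dominated by the soft mode, and the shift reduces it to the trust radius 0.3 while damping the soft direction most strongly.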

An alternative, and closely related, approach is the augmented Hessian method [25]. The basic idea is to 
interpolate between the steepest descent method far from the minimum, and the Newton-Raphson method 
close to the minimum. This is done by adding to the Hessian a constant shift matrix which depends on the 
magnitude of the gradient. Far from the solution the gradient is large and, consequently, so is the shift d. One 
can, e.g., choose d to be proportional to the expected energy lowering, $d = -\alpha^2 g^T \Delta q$ (note that $g^T \Delta q$ is 
negative for minimization and thus d is positive), and solve the damped equation 

$$(H + dI)\,\Delta q = -g.$$ 

This is equivalent to finding the lowest eigenvalue λ (which is always negative and approaches zero at 
convergence) of the generalized eigenvalue equation 

$$\begin{pmatrix} H & g \\ g^T & 0 \end{pmatrix} \begin{pmatrix} \Delta q \\ 1 \end{pmatrix} = \lambda \begin{pmatrix} \alpha^2 I & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \Delta q \\ 1 \end{pmatrix}. \qquad \text{(B3.5.5)}$$ 

Equation (B3.5.5) is, in turn, equivalent to the minimum condition on the rational function 

$$E(q) = E(q^0) + \frac{g^T \Delta q + \tfrac{1}{2}\,\Delta q^T H \Delta q}{1 + \alpha^2\, \Delta q^T \Delta q}.$$ 

For α = 0, minimization of this expression yields the Newton-Raphson formula for Δq. For large values of α, 
Δq becomes asymptotically $-\alpha^{-1} g/|g|$, i.e., the steepest descent formula with a step length 1/α. The augmented 
Hessian method is closely related to eigenvector (mode) following, discussed in section B3.5.5.2. The main 
difference between rational function and trust radius optimizations is that, in the latter, the level shift is 
applied only if the calculated step exceeds a threshold, while in the former it is imposed smoothly and is 
automatically reduced to zero as convergence is approached. 
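
The equivalence between the rational function, the eigenvalue problem and the shifted Newton equation can be verified in a few lines. The derivation below is a sketch in which α denotes the scaling parameter, λ the eigenvalue and d the shift; the abbreviations N and D for the numerator and denominator are introduced here for convenience only.

```latex
% N = g^T \Delta q + \tfrac12 \Delta q^T H \Delta q, \qquad D = 1 + \alpha^2 \Delta q^T \Delta q
\begin{align*}
\frac{\partial}{\partial \Delta q}\,\frac{N}{D} = 0
  &\;\Longrightarrow\; g + H\Delta q = \lambda\,\alpha^2\,\Delta q,
  \qquad \lambda \equiv \frac{2N}{D}, \\
\lambda\,(1 + \alpha^2 \Delta q^T \Delta q) = 2 g^T \Delta q + \Delta q^T H \Delta q
  &\;\Longrightarrow\; \lambda = g^T \Delta q
  \quad \text{(using } \Delta q^T H \Delta q = \lambda \alpha^2 \Delta q^T \Delta q - g^T \Delta q\text{)}, \\
d \equiv -\alpha^2 \lambda = -\alpha^2 g^T \Delta q
  &\;\Longrightarrow\; (H + dI)\,\Delta q = -g .
\end{align*}
% Large-alpha limit: neglecting H\Delta q gives \lambda = -|g|/\alpha and
% \Delta q = -g/(\alpha |g|), a steepest descent step of length 1/\alpha.
```

The first two lines are the upper and lower rows of the generalized eigenvalue equation, and the last line shows that the shift is fixed by the eigenvalue rather than chosen independently.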

B3.5.3 THE OPTIMIZATION OF WAVEFUNCTIONS 

The basic self-consistent field (SCF) procedure, i.e., repeated diagonalization of the Fock matrix [26], can be 
viewed, if sufficiently converged, as local optimization with a fixed, approximate Hessian, i.e., as simple 
relaxation. To show this, let us consider the closed-shell case and restrict ourselves to real orbitals. The SCF 
orbital coefficients are not the best set of coordinates to work with because, being constrained to be 
orthonormal, they are not independent. A better set of parameters can be chosen as the above-diagonal 
elements of an antisymmetric matrix K, used to build an orthogonal matrix U by U = exp(K) [27] or, 
alternatively, $U = 2(1 - \tfrac{1}{2}K)^{-1} - 1$ (see, e.g., [23]). U describes a generalized rotation between the orbitals. 
We start with an orthonormal set of orbitals $C_0$, defined as $\varphi^T = \chi^T C_0$, where $\varphi^T$ and $\chi^T$ are row vectors 
of molecular orbitals and atomic basis functions, respectively. All possible orthonormal orbital sets can be 
expressed as $C = C_0 U$. Not all elements of K are relevant. Rotations between virtual orbitals can obviously 
be omitted, and rotations between two occupied orbitals have no effect on the energy because the 
determinantal wavefunction is invariant against such rotations. Therefore, only rotations between occupied 
and virtual orbitals, i.e., the elements $K_{ia}$, $i \le n$, $a > n$, are needed if, as usual, the n occupied orbitals $\varphi_i$ 
precede the virtual ones $\varphi_a$. 
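
The two parametrizations of the orbital rotation can be compared for the smallest possible case, one occupied and one virtual orbital, where K has the single independent element k. The sketch assumes the rational form U = 2(1 - K/2)⁻¹ - 1 alongside U = exp(K); the function names are hypothetical.

```python
import math


def cayley_rotation(k):
    """U = 2 (1 - K/2)^-1 - 1 for K = [[0, k], [-k, 0]]."""
    a = 0.5 * k
    det = 1.0 + a * a
    # Inverse of (1 - K/2) = [[1, -a], [a, 1]].
    m_inv = [[1.0 / det, a / det], [-a / det, 1.0 / det]]
    return [[2.0 * m_inv[i][j] - (1.0 if i == j else 0.0) for j in range(2)]
            for i in range(2)]


def exponential_rotation(k):
    """U = exp(K) for the same K, a plane rotation by the angle k."""
    return [[math.cos(k), math.sin(k)], [-math.sin(k), math.cos(k)]]


def is_orthogonal(u, tol=1e-12):
    return all(abs(sum(u[m][i] * u[m][j] for m in range(2)) - (1.0 if i == j else 0.0)) < tol
               for i in range(2) for j in range(2))


k = 0.1
U_cayley = cayley_rotation(k)
U_exp = exponential_rotation(k)
print(U_cayley, U_exp)
```

Both matrices are exactly orthogonal for any k, so orthonormality of the orbitals is built in; the rational form corresponds to a rotation by 2 arctan(k/2), which agrees with the exponential form through second order in k.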

The gradient and second derivative components of the SCF energy can be expressed for both kinds of 
parametrization (see [28]) as 


$$\tfrac{1}{4}\,\partial E/\partial K_{ia} = F_{ia} \qquad \text{(B3.5.6)}$$ 

$$\tfrac{1}{4}\,\partial^2 E/\partial K_{ia}\,\partial K_{jb} = F_{ab}\delta_{ij} - F_{ij}\delta_{ab} + 4(ia|jb) - (ij|ab) - (ib|ja) \qquad \text{(B3.5.7)}$$ 

where, e.g., (ij|ab) is a two-electron integral in the usual Mulliken notation. In a typical SCF iteration near 
convergence, the Fock matrix is nearly diagonal, and the orbital rotation parameter corresponding to a small 
occupied-virtual (Brillouin-violating) element $F_{ia}$ is, from first-order perturbation theory, $K_{ia} = F_{ia}/(\varepsilon_a - \varepsilon_i)$. 
Comparing this with (B3.5.7) and noting that in a canonical orbital basis $F_{ii} = \varepsilon_i$, $F_{aa} = \varepsilon_a$ and the off-diagonal 
elements of F are zero, it is clear that the ordinary SCF iteration is equivalent to neglecting the two-electron 
integrals in the electronic Hessian, equation (B3.5.7). This explains the observation that straight SCF iteration 
frequently slows down as the SCF procedure progresses, cf. section B3.5.2.3. 

For ordinary SCF problems, interpolation methods are particularly suitable, as they require the storage of only 
a limited amount of information. The standard method for closed-shell or simple open-shell problems is DIIS 
(direct inversion in the iterative subspace) [22, 23]. Equation (B3.5.6) is not appropriate for the error vector 
because each error vector is expressed in a different basis. Transforming the occupied-virtual block of the 
orbital Fock matrix to a common basis, e.g., to the atomic orbital basis, yields the commutator SDF - FDS, 
arranged as a vector, for the gradient [22, 23]. DIIS usually converges well for closed shell systems from a 
reasonable starting wavefunction. Several modifications of this method have been proposed [29, 30]. For 
unrestricted (UHF) wavefunctions, DIIS is widely used but it is less appropriate. As it is a gradient norm 
minimization technique, it has a tendency to converge to the closest stationary point. For an even number of 
electrons, the closed-shell wavefunction is a formal solution of the UHF equations, and DIIS, unless started 
close to the expected minimum, may converge uphill to the closed-shell solution. The preconditioned 
conjugate gradient method (the preconditioner being the SCF approximation to the Hessian) is probably more 
appropriate in this case. Similarly, in density functional calculations with a plane wave basis set, the basis set 
is often huge, and the conjugate gradient method, with its limited storage requirement, is preferable [31, 32]. 

Due to the large number of variables in wavefunction optimization problems, it may appear that full 
second-order methods are impractical. For example, the storage of the Hessian for a modest closed-shell 
wavefunction with 500 basis functions and 200 electrons requires more than $(100 \times 400)^2/2 = 8 \times 10^8$ words. 
However, as shown by Bacskay [28] for closed- and open-shell SCF, and Lengsfield and Liu for MC-SCF [33], 
using techniques analogous to direct configuration interaction [34], the solution of the linear system of 
equations (B3.5.3) can be accomplished iteratively, each micro-iteration (to be distinguished from the SCF 
iterations, which are called macro-iterations) taking about the same effort as an SCF cycle. For closed- and 
open-shell SCF, the resulting doubly iterative algorithm is comparable in efficiency with DIIS [35] but it is more complex, and 
thus less widely used. 
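
The storage estimate quoted above follows from counting occupied-virtual rotations. A small helper makes the arithmetic explicit; the function name is hypothetical, and the estimate uses half the square of the number of rotations, as in the text, rather than the exact triangle n(n + 1)/2.

```python
def rotation_hessian_size(n_basis, n_electrons):
    """Number of occupied-virtual rotations and approximate Hessian storage in words."""
    n_occ = n_electrons // 2          # closed shell: two electrons per orbital
    n_virt = n_basis - n_occ
    n_rot = n_occ * n_virt            # independent orbital rotation parameters
    return n_rot, n_rot * n_rot // 2  # symmetric matrix: about half of n_rot squared


n_rot, n_words = rotation_hessian_size(500, 200)
print(n_rot, n_words)
```

For 500 basis functions and 200 electrons this gives 40 000 rotation parameters and 8 x 10^8 words, which is why the Hessian is never stored explicitly and only its products with trial vectors are formed.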

The situation is different for the multi-configurational SCF (MC-SCF) case. Although DIIS has been used 
successfully for simpler cases [36], the strong coupling between orbital rotations and configuration interaction 
(CI) coefficients mandates the use of second-order or approximate second-order methods (see the reviews 
[3, 4] and references therein). As the signature of the Hessian is frequently incorrect, the augmented Hessian 
(rational function) method, which forces a step in the right direction, is generally employed. Perhaps the most 
efficient method is that proposed by Werner and Meyer [37] and further expanded by Werner and Knowles 
[38]. In this method, an approximate MC-SCF energy expression is defined, which is accurate to second order 
in terms of the orbital coefficients C and not the unitary parameters K. Through this change of variables, the 
effect of orthonormality and the periodicity of the orbitals as functions of the orbital rotations are taken into 
account correctly, resulting in a large increase of the radius of convergence of the Newton-Raphson method. 


B3.5.4 OPTIMIZATION OF MOLECULAR GEOMETRIES 

This section discusses techniques specific to the optimization of molecular geometries. 

There are four main factors that influence the rate of convergence of molecular structure optimizations: (1) the 
initial guess geometry; (2) the optimization algorithm; (3) the quality of the Hessian matrix and (4) the 
coordinate system. The first of these is obvious; the closer the starting geometry is to the final converged 
geometry, the fewer optimization cycles it should take to get there. Optimization algorithms will not be 
discussed here, as with a reasonable starting geometry and Hessian, most standard methods (see section 
B3.5.2) perform well. The choice of algorithm is, however, much more crucial for transition states, and one 
method, the Eigenvector Following algorithm, will be described in the section dealing with transition state 
optimization. The third point can also be dealt with briefly. Most current optimization algorithms use 
approximate second-derivative (Hessian) information with updating to help predict the next step. Assuming 
that the surface can be adequately modelled by a quadratic function, the more reliable the initial Hessian 
information and the updating is, the better will be the predicted step and the fewer cycles it should take to 
converge. Lower-level calculations, force fields and simple universal force constant [39, 40 and 41] formulae 
can be employed to generate the initial Hessian. The fourth factor, the coordinates used to carry out the 
optimization, is now recognized as being vitally important and it is the choice of coordinates that is largely 
responsible for the efficiency of modern geometry optimization algorithms. 

B3.5.4.1 THE COORDINATE SYSTEM 

As noted above, the coordinate system is now recognized as being of fundamental importance for efficient 
geometry optimization; indeed, most of the major advances in this area in the last ten years or so have been 
due to a better choice of coordinates. This topic is seldom discussed in the mathematical literature, as it is in 
general not possible to choose simple and efficient new coordinates for an abstract optimization problem. A 
nonlinear molecule with N atoms and no symmetry needs 3N - 6 internal coordinates to specify its geometry. 
Unless symmetry or other constraints fix the values of some coordinates, it is not possible to omit 
coordinates. However, it is possible, and sometimes useful, to use more. The coordinates are then not 
independent, a situation called redundancy. 

The two key factors that make a good set of coordinates are to minimize the degree of coupling between them, 
and make the potential energy surface more quadratic. In general, the less coupling the better, as variation of 
one particular coordinate will then have minimal impact on the other coordinates. Coupling manifests itself 
primarily as relatively large mixed partial derivative terms between different coordinates. For example, a 
strong harmonic coupling between two different coordinates, i and j, results in a large off-diagonal element, 
$H_{ij}$, in the Hessian matrix. Cubic and higher-order couplings are even more deleterious for optimization, as 
they cannot be eliminated by a linear transformation. 

Cartesian coordinates are an obvious choice as they can be defined for all systems, and gradients and second 
derivatives are calculated directly in Cartesians. Unfortunately, they normally make a poor coordinate set for 
optimization as they are fairly heavily coupled, their only advantage being their simplicity and completely 
general nature. If the quadratic model holds, i.e., for very small displacements, and if the gradient and Hessian 
are properly transformed, Cartesians are equivalent to any other coordinate set [42]. Of course, the source of 
good Hessian data is often a force field expressed in valence coordinates, so the latter are used implicitly. A 
further minor inconvenience of Cartesians is that the Hessian is singular, due to the presence of translational 
degrees of freedom (interestingly, rotations cause singularity of the force constant matrix — and zero 
vibrational frequencies — only at stationary points, a problem first discussed in [12] and rediscovered many 
times since). Cartesian optimization is used almost exclusively in molecular mechanics, despite its 
inefficiency, as it requires no transformation of the coordinates and derivatives. While the computational 
effort required by these transformations is negligible in ab initio work, it becomes significant in force field 
methods with energy and gradient computation being so rapid. Recent work promises to improve the 
efficiency of methods using valence internal coordinates [43, 44 and 45]. 

Z-matrix coordinates are widely used to define molecular geometries. A Z matrix specifies the molecular 
geometry in a treelike manner, by connecting each new atom in the system to those that have been defined 
previously. The first three atoms in the Z matrix are unique, with the first atom at the origin, the second lying 
on the Z axis (connected to the first by a single stretch) and the third lying in the XZ plane (connected to either 
the first or second atom via a stretch and defining a bend with the unconnected atom). Each new atom after 
the third is defined with respect to atoms previously defined in the Z matrix using, for example, one stretch, 
one bend and one torsion. An example of a typical Z matrix (for fluoroethylene) is shown in figure B3.5.2. 

Figure B3.5.2. Example Z matrix for fluoroethylene. Notation: for example, line 4 of the Z matrix means that 
a H atom is bonded to carbon atom C1 with bond length L3 (angstroms), making an angle with carbon atom 
C2 of A3 (degrees) and a dihedral angle with the fluorine atom of 180.0°. All parameters given lettered 
variable names (L1, A1 etc) will be optimized; the dihedral angles are given explicitly as these are fixed by 
symmetry (the molecule is planar). Simple constraints can be imposed by removing parameters from the 
optimization list. 

Initially, the Z matrix was utilized simply as a means of geometry input. It was subsequently found that 
optimization was generally more efficient in Z-matrix coordinates than in Cartesians, especially for acyclic 
systems. This is not always the case, and care must be taken in constructing a suitable Z matrix. A short 
discussion on good Z-matrix construction strategy is given by Schlegel [39]. 

The first ab initio gradient geometry optimizations were performed in what are now called natural internal 
coordinates [46], although they were formally defined only later [47]. These coordinates are derived from 
vibrational spectroscopy and are appropriate for covalent (mainly organic) molecules. They include all 
individual bond stretching coordinates, but only non-redundant linear combinations of bond angles and 
torsions as deformational coordinates. Suitable linear combinations of bends and torsions (the two are 
considered separately) are selected using group theoretical arguments based on approximate local symmetry. 
The major advantage of natural internal coordinates in geometry optimization is that they significantly reduce 
the coupling, both harmonic and anharmonic, between the various coordinates. Compared to natural internals, 
Z-matrix coordinates arbitrarily omit some angles and torsions — to prevent redundancy — and this can induce 
strong anharmonic coupling between the coordinates, especially with a poorly constructed Z matrix. 
Successful minimizations can be carried out in natural internals with only an approximate (e.g., diagonal) 
Hessian provided at the starting geometry but a good starting Hessian is still needed for a transition state 
search. Using a suitable set of internal coordinates can reduce the number of cycles required to converge 
compared to the corresponding Cartesian optimization by an order of magnitude or more, depending on the 
system. 

Despite their clear advantages, natural internals have only become popular relatively recently, principally
because in early programs they had to be user defined, a tedious and error-prone procedure for large
molecules. This situation changed with the development of algorithms capable of generating natural internals
automatically from input Cartesians [48, 49]. For minimization, natural internals and their successors have
become the coordinates of choice [50, 51].

However, there are some disadvantages to natural internal coordinates. Their automatic construction proceeds 
by an exhaustive topological analysis involving thousands of lines of code and, for molecules with a complex 
structure, e.g., multiply fused rings and cages, the algorithm may be unable to generate a suitable
non-redundant set of coordinates. Additionally, more coordinates than the $3N-6$ (where $N$ is the number of
atoms) required may be generated. The redundancies can be removed by eliminating some coordinates, but
this is arbitrary and may negatively influence convergence.

Various methods have been suggested for dealing with redundant coordinates. The normal coordinate
optimization method [52] can use a force field defined in redundant coordinates, but is restricted to rectilinear
coordinates. A general force field, expressed in redundant coordinates, can be transformed to a non-redundant
set [13]. The current method of choice is to carry out the optimization directly in the redundant coordinate
space [53] using the concept of a generalized inverse. If the total number of internal coordinates, including
redundancies, is $n > 3N-6$, then one constructs and diagonalizes the $n \times n$ matrix $G = BB^T$, where $B$ is the
first-order transformation matrix from Cartesians to internal coordinates, $\Delta q = B\,\Delta x$. Diagonalization of $G$
results in two sets of eigenvectors: a set of $m = 3N-6$ eigenvectors with eigenvalues $\lambda > 0$, and a set of $n-m$
eigenvectors with eigenvalues $\lambda = 0$ (to numerical precision). The eigenvalue equation for $G$ can be written

$$G\,(U \;\; R) = (U \;\; R)\begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix} \tag{B3.5.8}$$

where the columns of $U$ are the eigenvectors with non-zero eigenvalues (collected in the diagonal matrix $\Lambda$)
and the columns of $R$ are the eigenvectors with zero eigenvalue. The generalized inverse of $G$, $G^-$, involves
inverting the non-zero eigenvalues only and back-transforming:

$$G^- = U \Lambda^{-1} U^T.$$

In this way the optimization can be cast in terms of the original coordinate set, including the redundancies.
Exactly the same transformations between Cartesian and internal coordinate quantities hold as for the
non-redundant case (see the next section), but with the generalized inverse replacing the regular inverse.
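
The construction of the generalized inverse can be sketched numerically. The example below is only an illustration under stated assumptions: the geometry is a made-up bent triatomic, the four primitives (three distances and one angle, so one is redundant) are chosen for convenience, and the B matrix is obtained by finite differences rather than analytically. The function names `primitives`, `b_matrix` and `generalized_inverse` are hypothetical and do not come from any of the cited programs.

```python
import numpy as np

def primitives(x):
    """Four primitive internals of a bent triatomic: r01, r02, r12 and the 1-0-2 angle."""
    X = x.reshape(3, 3)
    a, b = X[1] - X[0], X[2] - X[0]
    r01, r02 = np.linalg.norm(a), np.linalg.norm(b)
    r12 = np.linalg.norm(X[1] - X[2])
    angle = np.arccos(a @ b / (r01 * r02))
    return np.array([r01, r02, r12, angle])

def b_matrix(x, h=1e-5):
    """B[i, a] = dq_i/dx_a by central differences (for illustration only)."""
    B = np.zeros((4, x.size))
    for a in range(x.size):
        step = np.zeros_like(x)
        step[a] = h
        B[:, a] = (primitives(x + step) - primitives(x - step)) / (2 * h)
    return B

def generalized_inverse(G, cutoff=1e-6):
    """Invert only the non-zero eigenvalues of the symmetric matrix G."""
    lam, vecs = np.linalg.eigh(G)
    keep = lam > cutoff
    U, R = vecs[:, keep], vecs[:, ~keep]      # non-redundant and redundant subspaces
    G_minus = U @ np.diag(1.0 / lam[keep]) @ U.T
    return G_minus, U, R

# Hypothetical water-like geometry (arbitrary units)
x = np.array([0.0, 0.0, 0.0, 0.96, 0.0, 0.0, -0.24, 0.93, 0.0])
B = b_matrix(x)
G = B @ B.T                                   # 4 x 4, rank 3 = 3N - 6
G_minus, U, R = generalized_inverse(G)
print("non-redundant vectors:", U.shape[1], " redundant vectors:", R.shape[1])
```

The eigenvalue cutoff plays the role of "zero to numerical precision" in the text; its value here is an arbitrary choice.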

The redundant optimization scheme [53] can be applied to natural internal coordinates, which are sometimes
redundant for polycyclic and cage compounds. It can also be applied directly to the underlying primitives. 
This has the disadvantage that the coordinate space is larger, and contains many redundancies, but it is simpler 
to implement than a full natural internal coordinate scheme and can handle essentially any molecule, 
regardless of the topology, thus avoiding any failure in the generating algorithm. The well known Gaussian ab 
initio program [19] now uses, as a default, this type of algorithm [54]. 

As originally implemented, the redundant optimization scheme involved solution of equation (B3.5.8) at the
beginning of every optimization cycle, which may be expensive in semiempirical or force field methods. A 
scheme which involves a single diagonalization was introduced by Baker et al [55]. Diagonalization of the G 
matrix partitions the original coordinate space into two subspaces, a redundant subspace spanned by the set of 
vectors in R and a non-redundant subspace spanned by U. Since R is redundant, it can be discarded and the 
set of vectors in U defines a complete, non-redundant coordinate set which can be retained throughout the 
entire optimization. Unlike natural internals, which are linear combinations of just a few of the primitives 
localized in small regions of the molecule, each vector in U is potentially a linear combination of all of the 
primitives and is delocalized over the entire molecule; they are known as delocalized internal coordinates 
[55]. Despite their apparent complexity, delocalized internals perform in practice as well as natural internals. 
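
A minimal sketch of how delocalized internal coordinates might be generated is given below. It assumes, purely for illustration, that the primitive set consists of all interatomic distances of a methane-like five-atom arrangement (ten primitives for $3N-6 = 9$ degrees of freedom); real implementations use stretches, bends and torsions derived from the bonding topology. The helper names and the geometry are hypothetical.

```python
import numpy as np
from itertools import combinations

def distance_b_matrix(X):
    """Rows of B for all interatomic distances (used here as primitive stretches)."""
    pairs = list(combinations(range(len(X)), 2))
    B = np.zeros((len(pairs), X.size))
    for row, (i, j) in enumerate(pairs):
        u = (X[i] - X[j]) / np.linalg.norm(X[i] - X[j])
        B[row, 3 * i:3 * i + 3] = u
        B[row, 3 * j:3 * j + 3] = -u
    return B

def delocalized_internals(B, cutoff=1e-8):
    """Diagonalize G = B B^T once and split the eigenvectors into U and R."""
    lam, vecs = np.linalg.eigh(B @ B.T)
    U = vecs[:, lam > cutoff]      # non-redundant subspace, kept for the whole optimization
    R = vecs[:, lam <= cutoff]     # redundant subspace, discarded
    return U, R

# Hypothetical methane-like geometry (arbitrary units)
X = np.array([[0.0, 0.0, 0.0],
              [0.63, 0.63, 0.63],
              [-0.63, -0.63, 0.63],
              [-0.63, 0.63, -0.63],
              [0.63, -0.63, -0.63]])
B = distance_b_matrix(X)
U, R = delocalized_internals(B)

# Each column of U mixes all ten primitives; U^T B maps Cartesian
# displacements directly onto the nine delocalized coordinates.
B_deloc = U.T @ B
print(U.shape, R.shape, B_deloc.shape)
```

Because the single diagonalization is done at the starting geometry only, the columns of U serve as fixed linear-combination coefficients thereafter, which is the point made in the paragraph above.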

We present in table B3.5.1 a comparison of Cartesian, Z-matrix and delocalized internal coordinate 
optimizations, using the semiempirical PM3 method [56] on ten typical medium-sized organic molecules. All
optimizations were started with a unit Hessian and used the EF algorithm (see later) [57] with a BFGS 
Hessian update [6] to compute the optimization step; the only difference is in the coordinate system. The final 
column shows our best results using an initial non-unit Hessian matrix, diagonal in the space of primitive 
internals, with diagonal force constants estimated using the recipe of Schlegel [39]. The results, in terms of 
the number of cycles required for convergence, clearly show the advantages of using a good set of coordinates 
combined with a reliable estimate of the corresponding force constants. 

Table B3.5.1. Number of cycles to converge for geometry optimizations of some typical organic molecules
using Cartesian, Z-matrix and delocalized internal coordinates (a).

                                                                 Cycles to converge
Molecule                                  Formula      Symmetry  CART (b)  ZMAT (b)  INT (b)  INT (c)
azulene                                   C10H8        C2v       24        13        16       10
lumazine                                  C6H4N4O2     Cs        26        18        14       8
dichloropropane (gauche)                  C3H6Cl2      C1        48        24        33       7
3,3,3-trifluoro-2-methylpropene           C4H5F3       C1        24        10        13       7
cyanomethyl methyl ether                  C3H5NO       C1        45        25        33       9
salicylic acid                            C7H6O3       Cs        32        45 (d)    13       10
isoxanthopterin                           C6H5N5O2     Cs        36        18        12       9
pyrroloquinoline quinone anion (3-)       C14H3N2O8    C1        167 (e)   F (f)     109      36
2,5-bis-(4-aminophenyl)-1,3,4-oxadiazol   C14H12N4O    C2v       27        F (f)     14       7
permethyl-nonasilane (gauche conformer)   Si9(CH3)20   C1        355 (e)   113 (g)   213      47

(a) Calculations using the semiempirical PM3 method with standard convergence criteria of 0.0003 au on the
maximum component of the gradient vector and either an energy change from the previous cycle of < 10^-6
hartree or a maximum predicted displacement for the next step of < 0.0003 au.
(b) Started with a unit Hessian matrix.
(c) Started with a Hessian diagonal in the space of primitive internals using the recipe of Schlegel [39].
(d) Poor Z matrix.
(e) Converged prematurely with too high energy due to small energy changes between steps.
(f) Z matrix generated using Cartesian → Z-matrix conversion program. Severe convergence problems with energy
oscillation; halted after 90 cycles with energy higher than at starting geometry.
(g) Acyclic system; good Z matrix.

B3.5.4.2 TRANSFORMATION BETWEEN COORDINATE SYSTEMS 

This section deals with the transformation of coordinates and forces [H, 47] between different coordinate 
systems. In particular, we will consider the transformation between Cartesian coordinates, in which the 
geometry is ultimately specified and the forces are calculated, and internal coordinates which allow efficient 
optimization. 

(A) TRANSFORMATION OF FIRST AND SECOND DERIVATIVES 

Let us consider the energy expanded through second order in two sets of displacement coordinates $\Delta x$ and $\Delta q$.
The two coordinate systems are related by

$$\Delta q_i = \sum_a B_{ia}\,\Delta x_a + \tfrac{1}{2}\sum_{a,b} C^i_{ab}\,\Delta x_a\,\Delta x_b + \cdots \tag{B3.5.9}$$

The potential energy is given as

$$\begin{aligned}
E &= E_0 + \sum_i g_i\,\Delta q_i + \tfrac{1}{2}\sum_{i,j} H_{ij}\,\Delta q_i\,\Delta q_j + \cdots \\
  &= E_0 + \sum_a f_a\,\Delta x_a + \tfrac{1}{2}\sum_{a,b} K_{ab}\,\Delta x_a\,\Delta x_b + \cdots
\end{aligned} \tag{B3.5.10}$$

or, in matrix notation,

$$E = E_0 + \Delta q^T g + \tfrac{1}{2}\Delta q^T H\,\Delta q + \cdots = E_0 + \Delta x^T f + \tfrac{1}{2}\Delta x^T K\,\Delta x + \cdots$$

where $g$ and $H$ are the gradient and the force constant matrix, respectively, in internal coordinates, and $f$ and
$K$ are the same in Cartesians. Substituting (B3.5.9) into equation (B3.5.10) and equating equal powers one
obtains

$$f = B^T g$$

$$K = B^T H B + \sum_i g_i C^i.$$

If the two coordinate systems are connected by a non-singular transformation then, defining $A = (B^T)^{-1}$, the
more important inverse transformations are given by

$$g = A f$$

$$H = A K A^T - \sum_i g_i\, A C^i A^T. \tag{B3.5.11}$$

Thus the transformation matrix for the gradient is the inverse transpose of that for the coordinates. In the case
of transformation from Cartesian displacement coordinates ($\Delta x$) to internal coordinates ($\Delta q$), the
transformation is singular because the internal coordinates do not specify the six translational and rotational
degrees of freedom. One could augment the internal coordinate set by the latter but a simpler approach is to
use the generalized inverse [58]

$$A = (B M B^T)^{-1} B M$$

where $M$ is any non-singular $3N \times 3N$ matrix, in the simplest case the $3N$-dimensional unit matrix.

The second term in equation (B3.5.11) deserves comment. This term shows that Hessian (second-derivative)
matrices in different coordinate systems are not related simply by a similarity transformation, except at
stationary points. In particular, the signature of the Hessian (its number of negative eigenvalues) does not
have to be the same in different coordinate systems. Properly chosen internal coordinates tend to make the
Hessian positive definite, which is probably one of the reasons why internal coordinates are preferable for
molecular geometry optimization.
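
The transformation of the gradient and Hessian from Cartesians to internals can be checked on a case where the answer is known. The sketch below assumes a diatomic with a single internal coordinate (the bond length $r$) and a hypothetical harmonic energy $E = \tfrac{1}{2}k(r - r_0)^2$, so that the internal gradient must come out as $k(r - r_0)$ and the internal Hessian as $k$. The generalized inverse is used with $M$ taken as the unit matrix; all function names and numerical values are illustrative assumptions.

```python
import numpy as np

def bond_derivatives(x):
    """r = |x1 - x2| with its first (B) and second (C) Cartesian derivatives."""
    d = x[:3] - x[3:]
    r = np.linalg.norm(d)
    u = d / r
    B = np.concatenate([u, -u])[None, :]          # shape (1, 6)
    P = (np.eye(3) - np.outer(u, u)) / r
    C = np.block([[P, -P], [-P, P]])              # shape (6, 6)
    return r, B, C

def to_internal(f, K, B, C_list, M=None):
    """g = A f and H = A K A^T - sum_i g_i A C^i A^T with A = (B M B^T)^-1 B M."""
    M = np.eye(B.shape[1]) if M is None else M
    A = np.linalg.solve(B @ M @ B.T, B @ M)
    g = A @ f
    H = A @ K @ A.T - sum(gi * (A @ Ci @ A.T) for gi, Ci in zip(g, C_list))
    return g, H

# Hypothetical model: E = k/2 (r - r0)^2, evaluated away from the minimum
k, r0 = 2.0, 1.0
x = np.array([0.0, 0.0, 0.0, 0.3, 0.4, 1.2])      # r = 1.3
r, B, C = bond_derivatives(x)

f = k * (r - r0) * B[0]                           # Cartesian gradient
K = k * np.outer(B[0], B[0]) + k * (r - r0) * C   # Cartesian Hessian

g, H = to_internal(f, K, B, [C])
print(g, H)
```

Dropping the second term in `to_internal` would leave a spurious contribution proportional to the gradient, which is why the simple transformation $AKA^T$ is correct only at stationary points.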

(B) TRANSFORMATION OF MOLECULAR GEOMETRIES 


When working with any coordinate system other than Cartesians, it is necessary to transform finite 
displacements between Cartesian and internal coordinates. Transformation from Cartesians to internals is 
seldom a problem as the latter are usually geometrically defined. However, to transform a geometry 
displacement from internal coordinates to Cartesians usually requires the solution of a system of coupled 
nonlinear equations. These can be solved by iterating the first-order step [47]

$$\Delta x = A^T \Delta q$$

where $\Delta q$ is the difference between the current internal coordinates and their desired values, calculated to full
accuracy from the Cartesians. If $\Delta q$ is large, it may be better to proceed in stages, converging $\Delta x$ roughly
between each stage.
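
The iterative back-transformation can be illustrated as follows. This is a sketch under simplifying assumptions: the internal coordinates are the three interatomic distances of a hypothetical triatomic, and $A^T$ is formed as $B^T(BB^T)^{-1}$, i.e. the generalized inverse with $M$ equal to the unit matrix. The helper names are invented for the example.

```python
import numpy as np

def distances(x, pairs):
    X = x.reshape(-1, 3)
    return np.array([np.linalg.norm(X[i] - X[j]) for i, j in pairs])

def b_matrix(x, pairs):
    """Analytic B matrix for distance coordinates."""
    X = x.reshape(-1, 3)
    B = np.zeros((len(pairs), x.size))
    for row, (i, j) in enumerate(pairs):
        u = (X[i] - X[j]) / np.linalg.norm(X[i] - X[j])
        B[row, 3 * i:3 * i + 3] = u
        B[row, 3 * j:3 * j + 3] = -u
    return B

def internals_to_cartesians(x0, q_target, pairs, tol=1e-10, max_iter=50):
    """Iterate dx = A^T dq, recomputing dq exactly from the Cartesians each cycle."""
    x = x0.copy()
    for _ in range(max_iter):
        dq = q_target - distances(x, pairs)       # remaining error in the internals
        if np.max(np.abs(dq)) < tol:
            break
        B = b_matrix(x, pairs)
        x = x + B.T @ np.linalg.solve(B @ B.T, dq)
    return x

pairs = [(0, 1), (0, 2), (1, 2)]
x0 = np.array([0.0, 0.0, 0.0, 0.96, 0.0, 0.0, -0.24, 0.93, 0.0])
q_target = distances(x0, pairs) + np.array([0.05, -0.03, 0.10])
x_new = internals_to_cartesians(x0, q_target, pairs)
print(distances(x_new, pairs) - q_target)
```

For a large displacement one could call `internals_to_cartesians` repeatedly with intermediate targets and a loose tolerance, which corresponds to proceeding in stages.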

B3.5.4.3 CONSTRAINED OPTIMIZATION 

Constrained optimization refers to optimizations in which one or more variables (usually some internal 
parameter such as a bond distance or angle) are kept fixed. The best way to deal with constraints is by 
elimination, i.e., simply remove the constrained variable from the optimization space. Internal constraints 
have typically been handled in quantum chemistry by using Z matrices; if a Z matrix can be constructed which 
contains all the desired constraints as individual Z-matrix variables, then it is straightforward to carry out a 
constrained optimization by elimination. 

The situation is more complicated in molecular mechanics optimizations, which use Cartesian coordinates.
Internal constraints are now relatively complicated, nonlinear functions of the coordinates, e.g., a distance
constraint between atoms $i$ and $j$ in the system is
$R_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2} = R_0$, and this
cannot be handled by simple elimination. There are two main approaches if elimination is not possible:
penalty functions and Lagrange multipliers.

The general constrained optimization problem can be considered as minimizing a function of $n$ variables $F(\mathbf{x})$,
subject to a series of $m$ constraints of the form $C_i(\mathbf{x}) = 0$. In the penalty function method, additional terms of
the form $\tfrac{1}{2}\sigma_i C_i(\mathbf{x})^2$, $\sigma_i > 0$, are formally added to the original function, thus

$$F_\sigma(\mathbf{x}) = F(\mathbf{x}) + \tfrac{1}{2}\sum_i \sigma_i C_i(\mathbf{x})^2$$

with the summation over all $m$ constraints. If the constraint is satisfied, the additional term is zero, but if not
then the value of the function increases in proportion to the square of the deviation, i.e., the additional term
penalizes any geometries that do not satisfy the constraint. In practice, the value of the function is left
unaltered and what is done is to modify the gradient according to

$$\partial F_\sigma(\mathbf{x})/\partial x_j = \partial F(\mathbf{x})/\partial x_j + \sum_i \sigma_i C_i(\mathbf{x})\,\partial C_i(\mathbf{x})/\partial x_j.$$

Exactly the same types of step as for an unconstrained optimization can then be taken, using the modified as
opposed to the regular gradient.

The performance of the penalty function algorithm is heavily influenced by the value chosen for $\sigma_i$. The larger
the value of $\sigma_i$ the better the constraints are satisfied but the slower the rate of convergence. Optimizations
with very high values of $\sigma_i$ encounter severe convergence problems. However, the method is very general and
easy to apply.
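
The trade-off between constraint satisfaction and convergence rate can be seen in a small numerical experiment. The sketch below is a toy, not a molecular example: it assumes the hypothetical function $F(x, y) = x^2 + y^2$ with the single constraint $x + y - 1 = 0$ (exact constrained minimum at $x = y = 0.5$) and uses plain steepest descent on the modified gradient with a step length chosen for stability.

```python
import numpy as np

def penalty_minimize(grad_F, constraints, sigma, x0, step, tol=1e-8, max_iter=200000):
    """Steepest descent using the penalty-modified gradient dF/dx + sigma * C * dC/dx."""
    x = np.asarray(x0, dtype=float)
    for cycle in range(1, max_iter + 1):
        g = grad_F(x)
        for C, grad_C in constraints:
            g = g + sigma * C(x) * grad_C(x)
        if np.linalg.norm(g) < tol:
            return x, cycle
        x = x - step * g
    return x, max_iter

# Hypothetical test problem
grad_F = lambda x: 2.0 * x
constraint = (lambda x: x[0] + x[1] - 1.0,        # C(x)
              lambda x: np.array([1.0, 1.0]))     # dC/dx

results = {}
for sigma in (10.0, 1000.0):
    # Largest curvature of the penalized function is 2 + 2*sigma
    x, cycles = penalty_minimize(grad_F, [constraint], sigma,
                                 x0=[2.0, 0.0], step=1.0 / (2.0 + 2.0 * sigma))
    results[sigma] = (x, cycles, abs(x[0] + x[1] - 1.0))
    print(f"sigma={sigma:7.1f}  cycles={cycles:6d}  violation={results[sigma][2]:.2e}")
```

For this model the residual constraint violation is $1/(1 + \sigma)$, so it never vanishes for finite $\sigma$, while the number of cycles grows roughly in proportion to $\sigma$.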

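A better approach is the method of Lagrange multipliers. This introduces the Lagrangian function [59]

$$L(\mathbf{x}, \boldsymbol{\lambda}) = F(\mathbf{x}) - \sum_i \lambda_i C_i(\mathbf{x})$$

which replaces the function $F(\mathbf{x})$ in the unconstrained case. Here the $\lambda_i$ are the so-called Lagrange (or
unknown) multipliers, one for each constraint. Differentiating with respect to $\mathbf{x}$ and $\boldsymbol{\lambda}$ gives

$$\partial L(\mathbf{x}, \boldsymbol{\lambda})/\partial x_j = \partial F(\mathbf{x})/\partial x_j - \sum_i \lambda_i\,\partial C_i(\mathbf{x})/\partial x_j$$

and

$$\partial L(\mathbf{x}, \boldsymbol{\lambda})/\partial \lambda_i = -C_i(\mathbf{x}).$$

At a stationary point of the Lagrangian function, we have $\nabla L = 0$, i.e., all $\partial L/\partial x_j = 0$ and all $\partial L/\partial \lambda_i = 0$. This
latter condition means that all $C_i(\mathbf{x}) = 0$ and so all constraints are satisfied. Hence finding a set of values $(\mathbf{x}, \boldsymbol{\lambda})$
for which $\nabla L = 0$ gives a solution to the constrained optimization problem in exactly the same way as finding
an $\mathbf{x}$ for which $\nabla F = 0$ gives a solution to the corresponding unconstrained problem.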
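
As an illustration of treating the multiplier as an extra variable, the sketch below applies Newton's method to $\nabla L = 0$ for a single constraint. The test problem is hypothetical: find the point on the unit circle closest to $(2, 1)$, i.e. $F = (x - 2)^2 + (y - 1)^2$ with $C = x^2 + y^2 - 1$. The starting point deliberately violates the constraint. This is a bare Newton iteration without the safeguards a production optimizer would need.

```python
import numpy as np

def lagrange_newton(grad_F, hess_F, C, grad_C, hess_C, x0, lam0=0.0,
                    tol=1e-10, max_iter=50):
    """Solve grad L = 0 for L(x, lam) = F(x) - lam * C(x) by Newton's method."""
    x, lam = np.asarray(x0, dtype=float), float(lam0)
    n = x.size
    for _ in range(max_iter):
        gC = grad_C(x)
        residual = np.concatenate([grad_F(x) - lam * gC, [-C(x)]])
        if np.linalg.norm(residual) < tol:
            break
        # Second derivatives of L in the combined (x, lam) space
        J = np.zeros((n + 1, n + 1))
        J[:n, :n] = hess_F(x) - lam * hess_C(x)
        J[:n, n] = -gC
        J[n, :n] = -gC
        step = np.linalg.solve(J, -residual)
        x, lam = x + step[:n], lam + step[n]
    return x, lam

target = np.array([2.0, 1.0])
x_opt, lam_opt = lagrange_newton(
    grad_F=lambda x: 2.0 * (x - target),
    hess_F=lambda x: 2.0 * np.eye(2),
    C=lambda x: x @ x - 1.0,
    grad_C=lambda x: 2.0 * x,
    hess_C=lambda x: 2.0 * np.eye(2),
    x0=[1.5, 0.5],                    # constraint not satisfied at the start
)
print(x_opt, lam_opt, x_opt @ x_opt - 1.0)
```

In contrast to the penalty approach, the constraint is satisfied to the convergence tolerance at the end, and only a handful of cycles are needed.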

A major difference between the penalty function and Lagrange multiplier methods is that in the latter the 
unknown multipliers are part of the optimization space and are treated essentially as additional variables. The 
Lagrange multiplier method usually converges significantly faster than the penalty function method and has 
the further advantage that constraints are satisfied essentially exactly. Note that, in both methods, constraints
do not need to be satisfied in the starting geometry, but are instead satisfied at convergence. An efficient
algorithm for imposing constraints in Cartesian coordinates, which incorporates both penalty functions and
Lagrange multipliers, was presented by Baker in 1992 [60], with further improvements in the following
year [61].

By combining the Lagrange multiplier method with the highly efficient delocalized internal coordinates, a 
very powerful algorithm for constrained optimization has been developed [62]. Given that delocalized internal 
coordinates are potentially linear combinations of all possible primitive stretches, bends and torsions in the 
system, cf. Z-matrix coordinates which are individual primitives, it would seem very difficult to impose any
constraints at all; however, as shown in the original reference [55], any desired internal constraint can be
imposed using a relatively simple
Schmidt orthogonalization procedure. By projecting unit vectors (with unit components corresponding to 
particular primitives) onto the non-redundant subspace U (see equation (B3.5.8) ) and Schmidt orthogonalizing 
the resultant vectors against all other vectors in U, it is possible to isolate individual primitives in a consistent 
manner into single vectors. By removing these vectors from the optimization space, optimizations can be 
carried out in which the primitives involved retain their initial values throughout the optimization. The 
resulting algorithm has all the advantages of redundant internal coordinate optimizations in terms of 
efficiency, combined with the advantage of the Lagrange multiplier method that desired constraints do not 
have to be satisfied in the starting geometry. As constraints do become satisfied, they can simply be 
eliminated from the optimization space (this cannot be done with Cartesian optimizations due to the form of 
the constraints). By starting with an appropriate constraint vector, it is possible to impose constraints on linear
combinations of variables rather than on individual primitives. It is also possible to perform constrained
transition state searches. For more details see [55] and [62].
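
The projection and Schmidt orthogonalization step described above can be sketched as follows. The matrix `U` here is only a stand-in: a random orthonormal set of four vectors in a space of six primitives, generated for the example rather than obtained from a real G matrix. The function name `isolate_primitive` is hypothetical.

```python
import numpy as np

def isolate_primitive(U, k, tol=1e-8):
    """Project the unit vector for primitive k onto span(U), then Schmidt
    orthogonalize the columns of U against it.

    Returns the constraint vector c and the remaining m - 1 active vectors.
    """
    n, m = U.shape
    e_k = np.zeros(n)
    e_k[k] = 1.0
    c = U @ (U.T @ e_k)                   # projection onto the non-redundant subspace
    c /= np.linalg.norm(c)
    basis = [c]
    for j in range(m):
        v = U[:, j].copy()
        for _ in range(2):                # two passes for numerical stability
            for b in basis:
                v -= (b @ v) * b
        norm = np.linalg.norm(v)
        if norm > tol:                    # one vector drops out as linearly dependent
            basis.append(v / norm)
    return c, np.column_stack(basis[1:])

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((6, 4)))   # stand-in for the vectors in U
k = 2                                              # index of the primitive to freeze
c, V = isolate_primitive(U, k)
print(V.shape, np.abs(V[k]).max())
```

The vectors in `V` have no component along primitive `k`, so displacements in the reduced space leave that primitive unchanged to first order; the whole of its motion is carried by `c`, which is removed from the optimization space.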

There are alternative methods for imposing constraints, but they are less satisfactory than those discussed 
above. One commonly used alternative is to use projection techniques [12, 63]. In this approach, components 
in directions that would result in motion that would violate the constraints are projected out of the gradient 
vector and Hessian matrix before calculating the next step. Unfortunately, while projection works fine for 
constraints that are linear in the coordinates (the standard method for imposing the Eckart conditions is by 
projection), nonlinear constraints have to be linearized, and consequently 'feasibility corrections' [63] must be
applied to prevent deviations from the desired constraints increasing as the optimization progresses. The
interesting method of Taylor and Simons [64] for combining linearized geometrical constraints with mode
following also suffers from this drawback. 

Another way of attempting to constrain variables without eliminating them from the optimization space is to 
set the appropriate force constants to very large values. This is what is currently done in Schlegel's redundant 
internal coordinate algorithm to prevent motion in the redundant subspace [54]. Perhaps a better method is to 
set the corresponding rows and columns in the inverse Hessian to zero. This method must begin with a 
geometry that satisfies the constraints. 


B3.5.5 OPTIMIZATION OF TRANSITION STATES

Searching for transition states poses additional difficulties compared to minimization. The first problem is the 
starting geometry. There is a host of structural information that can be called upon to provide a good estimate 
of the likely geometry of a local minimum. Far less knowledge is available about transition state geometries. 
Second is the structure of the Hessian. In the region of a transition structure, the Hessian must have one, and 
only one, negative eigenvalue. Unlike the situation for a minimum search, there is no simple and cheap 
method for guessing a reasonable starting Hessian with the appropriate eigenvalue structure. Even if you 
calculate an exact initial Hessian, if your starting geometry is poor, its eigenstructure will probably be 
inappropriate. One thing that is often done for transition states is to calculate a few rows and columns of the 
Hessian — those corresponding to variables in the 'active site' where most of the geometrical changes are 
expected — by finite difference on the gradient, and guess diagonal Hessian matrix elements for the rest of the 
system in the same way one would do for a minimum search. The starting Hessian will then hopefully have an 
appropriate negative eigenvalue. 




In addition to the problem of generating a starting Hessian with the correct signature, there are problems in
retaining it. The Hessian updates commonly used for minimization generally retain positive definiteness (see
section B3.5.2.3); there are no such guarantees for retaining a negative eigenvalue during a transition state
search. Once the desired region of the potential energy surface (PES) has been reached, quasi-Newton
techniques can be used to refine the geometry; however they must be able to correct for the occasional bad
update which may destroy the Hessian eigenstructure. Transition state searches can thus be separated into two
parts; first find the correct region on the PES and then home in onto the transition state. Many of the methods
described below for locating approximate transition states have the advantage that they require no
second-derivative information.

B3.5.5.1 LOCATING THE CORRECT NEIGHBOURHOOD 
(A) COORDINATE DRIVING 


A commonly used approach is coordinate driving. Here an appropriate internal coordinate, or a linear 
combination of coordinates, is chosen as a reaction coordinate. At various intervals along this coordinate, 
between its value in the reactants and in the products, all the other variables are optimized. This then defines a 
minimum energy path. The energy maximum on this path can be shown to be the transition state geometry. 
Usually, however, the maximum on the path is located only approximately. Coordinate driving involves 
several minimizations in (n - 1) variables; consequently it is quite expensive. Moreover, its success depends 
on a good definition of the reaction coordinate; it should be roughly parallel with the true reaction path. If, at 
any point along the path, the reaction coordinate becomes nearly perpendicular to the reaction path, the latter 
may become discontinuous. The minimum energy path defined in this way has little physical significance, as 
different choices of reaction coordinate can produce different pathways. 
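
Coordinate driving is easy to demonstrate on a model surface. The sketch below assumes a hypothetical two-dimensional PES, $E = (x^2 - 1)^2 + 2(y - x/2)^2$, with minima at $(\pm 1, \pm 0.5)$ and a saddle point at the origin. The coordinate $x$ is driven from reactant to product while $y$ is relaxed at each point by a simple fixed-step descent; both the surface and the minimizer are illustrative choices.

```python
import numpy as np

def energy(x, y):
    return (x**2 - 1.0)**2 + 2.0 * (y - 0.5 * x)**2

def dE_dy(x, y):
    return 4.0 * (y - 0.5 * x)

def relax_y(x, y, step=0.2, tol=1e-10, max_iter=200):
    """Minimize the energy with respect to y at fixed x."""
    for _ in range(max_iter):
        g = dE_dy(x, y)
        if abs(g) < tol:
            break
        y -= step * g
    return y

def coordinate_driving(x_start, x_end, n_points=21, y_guess=0.0):
    """Step the driven coordinate x and relax y at every point."""
    path = []
    y = y_guess
    for x in np.linspace(x_start, x_end, n_points):
        y = relax_y(x, y)                 # previous y is the starting guess
        path.append((x, y, energy(x, y)))
    return np.array(path)

path = coordinate_driving(-1.0, 1.0)
ts_guess = path[np.argmax(path[:, 2])]    # highest point on the driven path
print("approximate transition state (x, y, E):", ts_guess)
```

Each grid point costs a full minimization in the remaining variable, and the maximum is located only to within the grid spacing, which reflects the expense and the approximate nature of the method noted above.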

(B) SYNCHRONOUS TRANSIT 

Another approach requiring less intuition is the synchronous transit method [65]. Here the path is
interpolated linearly between the reactant and the product. The interpolation can be
carried out in Cartesians, internal coordinates or, perhaps best, in terms of distance coordinates [66]; the 
results depend somewhat on the interpolation method. A maximum is first found along this linear 
synchronous transit path. This is followed by alternate minimization along directions orthogonal to the 
original direction, combined with maximum searches along a parabolic path (the quadratic synchronous 
transit) joining the reactant, the product and the current estimate of the transition state. For very curved 
reaction paths the quadratic synchronous transit path may be a poor approximation, and the reaction path may 
have to be approximated piecewise. A similar algorithm, which involves minimization in a space conjugate to 
the maximum search direction, was developed by Bell and Crighton [67]. This last reference also contains a 
good discussion of various transition-state search strategies. 
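
The first two stages of a synchronous transit search can be sketched on a model surface with a curved valley. The example assumes a hypothetical PES, $E = (x^2 - 1)^2 + 5(y - 1 + x^2)^2$, with minima at $(\pm 1, 0)$ and a saddle point at $(0, 1)$. It finds the maximum along the linear synchronous transit path in Cartesians and then minimizes orthogonally to that path; the quadratic synchronous transit refinement is not implemented here.

```python
import numpy as np

def energy(p):
    x, y = p
    return (x**2 - 1.0)**2 + 5.0 * (y - 1.0 + x**2)**2

def gradient(p):
    x, y = p
    w = y - 1.0 + x**2
    return np.array([4.0 * x * (x**2 - 1.0) + 20.0 * x * w, 10.0 * w])

def lst_maximum(reactant, product, n_points=201):
    """Highest-energy point on the straight line joining reactant and product."""
    points = [(1.0 - t) * reactant + t * product for t in np.linspace(0.0, 1.0, n_points)]
    energies = [energy(p) for p in points]
    k = int(np.argmax(energies))
    return points[k], energies[k]

def minimize_orthogonal(p, direction, step=0.05, tol=1e-8, max_iter=2000):
    """Steepest descent restricted to directions orthogonal to `direction`."""
    d = direction / np.linalg.norm(direction)
    p = p.copy()
    for _ in range(max_iter):
        g = gradient(p)
        g_perp = g - (g @ d) * d          # remove the component along the LST path
        if np.linalg.norm(g_perp) < tol:
            break
        p -= step * g_perp
    return p

reactant, product = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
p_lst, e_lst = lst_maximum(reactant, product)
p_refined = minimize_orthogonal(p_lst, product - reactant)
print("LST maximum:", p_lst, e_lst)
print("after orthogonal minimization:", p_refined, energy(p_refined))
```

On this surface the linear path cuts straight across the ridge, so the LST maximum lies far above the true barrier, and the orthogonal minimization is what brings the estimate down to the saddle point.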

Both of the above methods can be considered as approximations to the Fukui reaction path, discussed later in 
this article. The maximum of the Fukui reaction path also yields the transition state, although usually at 
significantly more expense than coordinate driving or the synchronous transit method. A more modern 
development of these ideas has been given by Ionova and Carter [68]. 

(C) WALKING UPHILL 

These algorithms try to walk up to transition states from minima, usually along the shallowest path, i.e., along
the eigenvector of the Hessian which has the lowest eigenvalue [69, 70 and 71]. They are more important when
the transition state has already been located approximately and will be discussed in the next section.

(D) BRACKETING THE TRANSITION STATE 

These methods try to bracket the transition state from both the reactant and the product side [72, 73]. For 
example, in the method of Dewar et al [73], two structures, one in the reactant valley and one in the product 
valley, are optimized simultaneously. The lower-energy structure is moved to reduce the distance separating 
the two structures by a small amount, e.g. by 10%, and its structure is reoptimized under the constraint that the 
distance is fixed. This process is repeated until the distance between the two structures is sufficiently small. 

(E) SEAM CROSSING 

Here the transition state is approximated by the lowest crossing point on the seam intersecting the diabatic
(non-interacting) potential energy surfaces of the reactant and product. The method was originally developed
for excited state surfaces [74], and has subsequently been used to locate approximate transition states [75, 76].

B3.5.5.2 REFINING THE TRANSITION STATE

It is usually not efficient to use the methods described above to refine the transition state to full accuracy. 
Starting from a qualitatively correct region on the potential surface, in particular one where the Hessian has 
the right signature, efficient gradient optimization techniques, with minor modifications, are usually able to 
zero in on the transition state quickly. 

(A) DIRECT INVERSION IN THE ITERATIVE SUBSPACE

One of the methods which is appropriate is DIIS applied to geometry optimization [24]; being a gradient norm 
minimization method, it will converge to any stationary point. This is, however, one of its problems as it may 
converge to the wrong point. The convergence radius of augmented Hessian type methods is larger. One of 
these methods, the eigenvector following (EF) method [52], is generally useful for both transition states and 
minima and will be described here as an example of a modern optimization algorithm. 

(B) EIGENVECTOR FOLLOWING 

The EF algorithm [52] is based on the work of Cerjan and Miller [69] and, in particular, Simons and
coworkers [20, 71]. It is closely related to the augmented Hessian (rational function) approach [25]. We have
seen in section B3.5.2.5 that this is equivalent to adding a constant level shift (damping factor) to the diagonal
elements of the approximate Hessian H. An appropriate level shift effectively makes the Hessian positive 
definite, suitable for minimization. 

Although a single shift parameter can also be used to find a transition state, the eigenvector following 
algorithm utilizes two level shifts: one for the Hessian (transition state) mode along which the energy is to be 
maximized and the other for modes for which it is minimized. In terms of a diagonal Hessian representation, 
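transforming the gradient appropriately, we have the two eigenvalue equations

$$\begin{pmatrix} b_2 & & 0 & g'_2 \\ & \ddots & & \vdots \\ 0 & & b_n & g'_n \\ g'_2 & \cdots & g'_n & 0 \end{pmatrix}
\begin{pmatrix} \Delta q'_2 \\ \vdots \\ \Delta q'_n \\ 1 \end{pmatrix}
= \lambda_n \begin{pmatrix} \Delta q'_2 \\ \vdots \\ \Delta q'_n \\ 1 \end{pmatrix} \tag{B3.5.12}$$

$$\begin{pmatrix} b_1 & g'_1 \\ g'_1 & 0 \end{pmatrix}
\begin{pmatrix} \Delta q'_1 \\ 1 \end{pmatrix}
= \lambda_p \begin{pmatrix} \Delta q'_1 \\ 1 \end{pmatrix}. \tag{B3.5.13}$$

In these two equations, the $b_i$ are the eigenvalues of $H$ ($b_1 < b_2 < \cdots < b_n$), $g'$ is the gradient vector
transformed to the basis of the eigenvectors $U$ of $H$: $g' = U^T g$, and we have (arbitrarily) set the factor a in
equation (B3.5.5) to unity. Note that $\lambda_n$ is the lowest eigenvalue of equation (B3.5.12) (it is always negative
and approaches zero at convergence), while $\lambda_p$ is the highest eigenvalue of equation (B3.5.13) (it is always
positive and again approaches zero at convergence). Once suitable values of $\lambda_p$ and $\lambda_n$ have been determined,
the final step is given by

$$\Delta q = -\frac{g'_1 u_1}{b_1 - \lambda_p} - \sum_{i=2}^{n} \frac{g'_i u_i}{b_i - \lambda_n}$$

where it is assumed that we are maximizing along the lowest Hessian mode $u_1$ and minimizing along all the
others. This holds regardless of the Hessian eigenvalue structure (unlike the Newton-Raphson step), and so
the algorithm can handle Hessian matrices with the wrong signature.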

It is also possible to maximize along modes other than the lowest and, in this way perhaps, locate transition
states for alternative rearrangements/dissociations from the same initial starting point. For maximization along
the $k$th mode (instead of the lowest), $b_1$ would be replaced by $b_k$, and the summation would now exclude the
$k$th mode but include the lowest. Since what was originally the $k$th mode is the mode along which the negative
eigenvalue is required, this mode will eventually become the lowest mode at some stage of the
optimization. To ensure that the original mode is being followed smoothly from one cycle to the next, the
mode that is actually followed is the one with the greatest overlap with the mode followed on the previous
cycle. This procedure is known as mode following. For more details and some examples, see [57]. Mode
following can work well for small systems, but for larger, flexible molecules there are usually a number of
soft modes which lead to transition states for conformational rearrangements and not to the more interesting
reaction saddle points. Moreover, each eigenvector can be followed in two opposite directions and frequently
only one leads to a reaction.
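
A minimal sketch of an eigenvector-following step with two level shifts is given below, maximizing along the lowest Hessian mode and minimizing along the rest. It assumes an exact Hessian on a hypothetical model surface, $E = (x^2 - 1)^2 + y^2$ (saddle point at the origin), and omits step-size control, Hessian updating and mode following, all of which a practical implementation would need.

```python
import numpy as np

def ef_step(H, g):
    """One eigenvector-following step toward a first-order saddle point."""
    b, U = np.linalg.eigh(H)              # eigenvalues in ascending order
    gp = U.T @ g                          # gradient in the Hessian eigenvector basis
    n = len(b)

    # lambda_p: highest eigenvalue of the 2x2 problem for the maximized mode
    lam_p = 0.5 * b[0] + 0.5 * np.sqrt(b[0]**2 + 4.0 * gp[0]**2)

    # lambda_n: lowest eigenvalue of the bordered matrix for the other modes
    M = np.zeros((n, n))
    M[:n - 1, :n - 1] = np.diag(b[1:])
    M[:n - 1, n - 1] = gp[1:]
    M[n - 1, :n - 1] = gp[1:]
    lam_n = np.linalg.eigvalsh(M)[0]

    dq = np.empty(n)
    dq[0] = -gp[0] / (b[0] - lam_p)       # uphill along the lowest mode
    dq[1:] = -gp[1:] / (b[1:] - lam_n)    # downhill along all other modes
    return U @ dq, lam_p, lam_n

# Hypothetical model surface
gradient = lambda q: np.array([4.0 * q[0] * (q[0]**2 - 1.0), 2.0 * q[1]])
hessian = lambda q: np.array([[12.0 * q[0]**2 - 4.0, 0.0], [0.0, 2.0]])

q = np.array([0.3, 0.2])
shifts = []
for cycle in range(50):
    g = gradient(q)
    if np.linalg.norm(g) < 1e-10:
        break
    step, lam_p, lam_n = ef_step(hessian(q), g)
    shifts.append((lam_p, lam_n))
    q = q + step
print("converged to", q, "after", cycle, "cycles")
```

Both shifts shrink toward zero as the gradient vanishes, so near convergence the step approaches the ordinary Newton-Raphson step.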

Although it was originally developed for locating transition states, the EF algorithm is also efficient for 
minimization and usually performs as well as or better than the standard quasi-Newton algorithm. In this case, 
a single shift parameter is used, and the method is essentially identical to the augmented Hessian method. 

B3.5.6 SIMULTANEOUS OPTIMIZATION OF GEOMETRIES AND WAVEFUNCTIONS

So far, we have considered the optimization of wavefunction and geometry parameters separately. In view of 
the much shorter timescale and higher energy associated with the former, this is reasonable. However, 
additional savings can be potentially obtained by optimizing the wavefunction and the geometry 
simultaneously. This was first proposed for density functional methods [77] and later for traditional quantum 
chemistry techniques [78]. With the large increase of computing speed compared to disk input/output speed, 
direct techniques [79] were generally adopted. In direct methods, the large disparity between calculating the 
gradients of the molecular energy with respect to electronic parameters (the Fock matrix in SCF theory) and 
nuclear coordinates disappeared; gradients are now only a few times more expensive than a Fock matrix 
evaluation, making simultaneous wavefunction-geometry optimization much more attractive. In spite of this, 
such methods are not yet widely used, except in the crude form of relaxing the SCF convergence criteria if the 
geometry parameters are far from convergence. 

The molecular dynamics method introduced by Car and Parrinello [80], though not strictly an optimization
method, has many features in common with simultaneous optimization of the wavefunction and geometry. In
this method, the electronic wavefunction and energy are close to, but not identical with, the
Born-Oppenheimer energy. The basic idea is to consider the electronic degrees of freedom as dynamical variables,
along with the nuclear coordinates. The Lagrangian contains the kinetic energy of the nuclei, the potential
energy as a function of both the nuclear and electronic degrees of freedom and a fictitious kinetic energy term
which is the square of the time derivative of the electronic wavefunction multiplied by a small mass. The
inertia of this fictitious electronic mass causes the wavefunction to deviate slightly from the
Born-Oppenheimer surface. The Car-Parrinello method is most efficient for plane wave basis sets, as the
calculation of the nuclear gradient is very inexpensive in this method, but it has also been introduced into SCF
theory [81, 82].


B3.5.7 REACTION PATH ALGORITHMS

The reaction path is defined by Fukui [ 83 ] as the line q(s) leading down from a transition state along the 
steepest descent direction 


i)qts)/i)s = -g<J)/|g(j)l- (B3.5.14) 

Here s is the path length, ds = (d<y? + ■ ■ + dg N ) 1/; . The reaction path is, unfortunately, dependent on the 
coordinate system. This should perhaps be emphasized more than is generally the case. Scaling one coordinate 
by a factor a > 1 increases the coordinate value but decreases the corresponding gradient component, so that 
if the reaction path was antiparallel to the gradient before the scaling it will not be so after scaling. For 
qualitative studies of chemical reactions, there is little to recommend one particular reaction path over 
another. However, for dynamical studies, the intrinsic reaction coordinate (IRC) [83], defined as the path 
length along the reaction path in mass-weighted Cartesian coordinates, $q_i = m_i^{1/2} x_i$, has advantages
over other definitions (for example the kinetic energy matrix is the unit matrix). Here $m_i$ is the atomic
mass corresponding to the Cartesian coordinate $x_i$, making the reaction path
isotope dependent. A major difficulty with reaction paths is that to decide whether a given point is on the path 
the whole path must be constructed; local information (energy, gradient, force constants etc.) is insufficient. 
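
The coordinate dependence discussed above can be checked with a few lines of arithmetic. The following is a
minimal sketch, not part of the original text: the gradient value (1, 1) and the scale factor a = 2 are
arbitrary illustrative choices. It computes the steepest-descent direction once in the original coordinates
(x, y) and once in scaled coordinates (x, a*y), maps the latter back, and compares the two.

```python
import math

def unit(v):
    """Normalize a 2D vector."""
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)

def descent_direction(grad):
    """Unit steepest-descent direction -g/|g| in the coordinates of grad."""
    return unit((-grad[0], -grad[1]))

def descent_direction_scaled(grad, a):
    """Steepest descent taken in coordinates (x, a*y), mapped back to (x, y)."""
    gx, gy = grad
    # In the scaled coordinate y' = a*y the gradient component becomes gy / a.
    dx_s, dy_s = descent_direction((gx, gy / a))
    # A displacement dy' in the scaled coordinate corresponds to dy = dy' / a.
    return unit((dx_s, dy_s / a))

g = (1.0, 1.0)                                  # illustrative gradient
d_plain = descent_direction(g)
d_scaled = descent_direction_scaled(g, a=2.0)   # illustrative scale factor
cos_angle = d_plain[0] * d_scaled[0] + d_plain[1] * d_scaled[1]
print(d_plain, d_scaled, cos_angle)
```

The scaled direction is proportional to (-g_x, -g_y/a^2), so the two paths are parallel only for a = 1.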

B3.5.7.1 FOLLOWING THE REACTION PATH DOWNHILL 

The most widely used methods try to follow the gradient downhill starting from a transition state. At the 
transition state itself, the gradient vanishes and the first step must be made along the imaginary eigenvector of 
the Hessian in the proper coordinates, i.e., mass-weighted Cartesians for the IRC path. As pointed out by 
Schlegel [1, 2], equation (B3.5.14) is a stiff differential equation and its integration by simply making small downhill
steps along the gradient, a method equivalent to Euler's method, requires very small steps and consequently 
much effort. Otherwise, the calculated reaction path diverges from the true one, at first slowly and then more 
rapidly. To deal with this problem requires either constrained minimization steps at each point on the path, or 
alternatively second-order (both gradient and Hessian) information. This increases the cost of the individual 
steps but allows much larger steps to be taken. 
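
As an illustration of the simple Euler procedure, the sketch below follows equation (B3.5.14) downhill on a
two-dimensional model surface. Everything specific here is an assumption made for the example: the surface
E(x, y) = (x^2 - 1)^2 + 5(y - x^2/2)^2 (saddle point at the origin, minima at (+-1, 1/2)), the step length and
the stopping rule. The first step is taken along x, which is the direction of negative curvature at this
saddle point.

```python
import math

def energy(x, y):
    """Model surface (hypothetical): curved valley joining a saddle point and a minimum."""
    return (x * x - 1.0) ** 2 + 5.0 * (y - 0.5 * x * x) ** 2

def gradient(x, y):
    d = y - 0.5 * x * x
    return (4.0 * x * (x * x - 1.0) - 10.0 * x * d, 10.0 * d)

def euler_path(start, step=0.005, max_steps=10000):
    """Fixed-length Euler steps along -g/|g| until the energy stops decreasing."""
    x, y = start
    path = [(x, y)]
    e_old = energy(x, y)
    for _ in range(max_steps):
        gx, gy = gradient(x, y)
        norm = math.hypot(gx, gy)
        if norm == 0.0:
            break
        x_new = x - step * gx / norm
        y_new = y - step * gy / norm
        e_new = energy(x_new, y_new)
        if e_new >= e_old:          # stepped past the minimum
            break
        x, y, e_old = x_new, y_new, e_new
        path.append((x, y))
    return path

# Initial displacement along the imaginary-frequency mode (the x axis here).
path = euler_path((0.01, 0.0))
x_end, y_end = path[-1]
print(len(path), x_end, y_end, energy(x_end, y_end))
```

Because every step has the same length, the end point is located only to within roughly one step of the
minimum; a production code would switch to a proper minimization there.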

The method of Ishida et al [84] includes a minimization in the direction in which the path curves, i.e. along
(g/|g| - g'/|g'|), where g and g' are the gradients at the beginning and the end of an Euler step. This technique,
called the stabilized Euler method, performs much better than the simple Euler method but may become 
numerically unstable for very small steps. Several other methods, based on higher-order integrators for 
differential equations, have been proposed [ 85 , 86 ]. 

Page et al [87] use a local quadratic model for the surface. This requires the Hessian, but once it is available, 
the reaction path can be inexpensively determined for a quadratic (or even higher-order) analytical surface 
(see also [88]). Gonzalez and Schlegel [89, 90] approximate the reaction path by an arc of a circle. They first
make a step along the gradient of length half the current stepsize to an intermediate point. From this, they
make another half step so that the energy is minimized, subject to the stepsize constraint. The wavefunction
and the gradient need not be evaluated at the intermediate point. This method is implemented in the Gaussian
series of programs [ 19 ] and is widely used. It does not need the exact Hessian, but a good estimate should be 
available so that the many local optimizations converge rapidly. An advantage of this method is that it yields 
the curvature of the reaction path at the transition state correctly. 

B3.5.7.2 APPROACHING THE REACTION PATH FROM THE SIDE 

These methods, which probably deserve more attention than they have received to date, simultaneously 
optimize the positions of a number of points along the reaction path. The method of Elber and Karplus [ 91 ] 
was developed to find transition states. It furnishes, however, an approximation to the reaction path. In this 
method, a number (typically 10-20) equidistant points are chosen along an approximate reaction path 
connecting two stationary points a and b, and the average of their energies is minimized under the constraint 
that their spacing remains equal. This is obviously a numerical quadrature of the integral
$S^{-1}\int_a^b E(q(s))\,ds$, where S is the path length between the points a and b. The Euler equation of
this variational problem yields the
condition for the reaction path, equation (B3.5.14) . A similar method has been proposed by Stacho and Ban 
[92]. 

B3.5.7.3 BIFURCATION OF THE REACTION PATH AND VALLEY-RIDGE INFLECTION POINTS 

As shown by Valtazanos and Ruedenberg [93], steepest descent paths (e.g., the Fukui intrinsic reaction
coordinate) can bifurcate, i.e., split in two, only at stationary points. Thus, the intuitive notion of a
reaction path forking,
e.g., upon ascent in a valley to two different transition states, or, starting down from a transition state to two 
different minima, is impossible. This should be regarded as an inherent limitation of the standard definition of 
a reaction path, not as a physical impossibility. Such cases, in which a valley floor is gradually transformed 
into a ridge are, in fact, quite common. Mathematically, they are characterized by the fact that one of the
eigenvalues of the Hessian in the subspace perpendicular to the path changes from positive to negative. The 
point at which the eigenvalue is zero is called the valley-ridge inflection point. The reaction path, started at a 
stationary point, will run directly along the ridge and thus becomes non-physical past a valley-ridge inflection 
point. The actual reaction, of course, will not follow the reaction path in this case, not even qualitatively. 
Steepest descent paths started a little away from the reaction path will veer away from the latter after passing 
the valley-ridge inflection point. Baker and Gill [94] have devised a method for locating valley-ridge 
inflection points (which they call branching points). The reader is reminded, however, that the signature of the 
Hessian at non-stationary points depends strongly on the coordinate system. Thus, the location of a valley- 
ridge inflection point may be quite different in Cartesians or mass-weighted Cartesians than in internal 
coordinates. In particular, the Hessian in Cartesian coordinates may have spurious negative eigenvalues 
corresponding to rotational coordinates. 
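
The change of sign of the perpendicular Hessian eigenvalue can be located numerically. The sketch below does
not reproduce the method of Baker and Gill; it is a plain bisection on a hypothetical model surface,
E(x, y) = 2x^3 - 3x^2 + (1 - 2x)y^2/2 + y^4/4, chosen so that the steepest descent path from the saddle point
at the origin runs along the x axis and the curvature perpendicular to it, 1 - 2x, vanishes at x = 1/2.
Derivatives are taken by finite differences, so the function names and step sizes are assumptions as well.

```python
import math

def energy(x, y):
    """Model surface (hypothetical) whose valley along y = 0 turns into a ridge."""
    return 2.0 * x**3 - 3.0 * x**2 + 0.5 * (1.0 - 2.0 * x) * y**2 + 0.25 * y**4

def grad_fd(f, x, y, h=1e-5):
    """Central-difference gradient."""
    return ((f(x + h, y) - f(x - h, y)) / (2.0 * h),
            (f(x, y + h) - f(x, y - h)) / (2.0 * h))

def hess_fd(f, x, y, h=1e-4):
    """Central-difference Hessian elements (fxx, fxy, fyy)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h**2)
    return fxx, fxy, fyy

def perpendicular_curvature(f, x, y):
    """Curvature n.H.n along the unit vector n perpendicular to the gradient."""
    gx, gy = grad_fd(f, x, y)
    norm = math.hypot(gx, gy)
    nx, ny = -gy / norm, gx / norm
    fxx, fxy, fyy = hess_fd(f, x, y)
    return nx * nx * fxx + 2.0 * nx * ny * fxy + ny * ny * fyy

def find_vri_on_axis(f, x_lo, x_hi, tol=1e-8):
    """Bisect along y = 0 for the point where the perpendicular curvature changes sign."""
    c_lo = perpendicular_curvature(f, x_lo, 0.0)
    for _ in range(200):
        x_mid = 0.5 * (x_lo + x_hi)
        c_mid = perpendicular_curvature(f, x_mid, 0.0)
        if (c_mid > 0.0) == (c_lo > 0.0):
            x_lo, c_lo = x_mid, c_mid
        else:
            x_hi = x_mid
        if x_hi - x_lo < tol:
            break
    return 0.5 * (x_lo + x_hi)

x_vri = find_vri_on_axis(energy, 0.1, 0.9)
print(x_vri)
```

On this surface the path continues along the ridge to a second saddle point at (1, 0), while the two minima
lie at (1, +-1). In more than two dimensions the gradient direction would have to be projected out of the full
Hessian before diagonalization.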


B3.5.8 GLOBAL OPTIMIZATION

For our purposes, global optimization refers to the location of the lowest minimum on a given potential energy 
surface. As mentioned in the introduction, this is currently only a partially solved problem. The number of 
conformational minima, e.g., for a large protein, increases enormously with the size of the system, and the 
only way to be absolutely sure that the lowest-energy structure has been found is to do an exhaustive search of 
the entire energy surface; for large molecules this is essentially impossible. Even if the lowest-energy 
structure were successfully located, this would likely have only limited chemical significance, as there would 
be many structures energetically close to the global minimum (within a kcal/mol or so) which would need to be
considered for an accurate treatment of the thermodynamics. It is almost a certainty (though the authors are
unaware of a formal proof) that finding the global minimum on molecular potential energy surfaces is
computationally NP-complete, so that the cost of all known exact algorithms grows exponentially or faster
with the size of the problem. Such problems are
generally regarded as insoluble (however, this does not exclude their solution in a given case).

With systematic PES searches excluded, random (stochastic) methods have become the most common 
techniques for global minimization. The two most popular methods are simulated annealing [ 95 ] and genetic 
algorithms [96]. The former method derives its name from the annealing process in condensed matter physics 
in which a solid is melted in a bath and the temperature is then slowly decreased; the particles are expected to 
settle into their lowest-energy states, provided the initial temperature is sufficiently high and the cooling rate 
is sufficiently low. In practical optimizations, cooling is represented by local minimizations and heating by 
random jumps, i.e., random displacements of some or all of the atoms. After a 'sufficient number' of local 
minimization/random jump cycles, the procedure is terminated with the lowest-energy structure found so far 
taken as the global minimum. 
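
The cycle of random jumps and local minimizations can be sketched in a few lines. This is a deliberately
simplified illustration and not a full simulated annealing code: there is no temperature schedule, a jump is
kept only if it leads to a lower minimum, and the one-dimensional model energy 0.1x^2 + sin(3x), the jump
size, the number of cycles and the random seed are all arbitrary assumptions.

```python
import math
import random

def energy(x):
    """One-dimensional model surface (hypothetical) with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)

def slope(x):
    return 0.2 * x + 3.0 * math.cos(3.0 * x)

def local_minimize(x, step=0.05, iters=500):
    """'Cooling': plain gradient descent into the nearest minimum."""
    for _ in range(iters):
        x -= step * slope(x)
    return x

def jump_and_quench(x_start, n_cycles=200, max_jump=3.0, seed=0):
    """Alternate random jumps ('heating') with local minimizations ('cooling')."""
    rng = random.Random(seed)
    best_x = local_minimize(x_start)
    best_e = energy(best_x)
    for _ in range(n_cycles):
        trial = local_minimize(best_x + rng.uniform(-max_jump, max_jump))
        e_trial = energy(trial)
        if e_trial < best_e:        # keep the lowest structure found so far
            best_x, best_e = trial, e_trial
    return best_x, best_e

x_first = local_minimize(4.0)       # minimum reached without any jumps
best_x, best_e = jump_and_quench(4.0)
print(x_first, energy(x_first), best_x, best_e)
```

Nothing in the procedure proves that the returned structure is the global minimum; the 'sufficient number'
of cycles is a matter of experience.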

The genetic algorithm method takes its name from the trading of genetic information in chromosomes 
between parents to produce an offspring. A random population of individuals (geometrical structures for the 
system in question) is created, and local minimizations are performed on each individual. Selected structural 
components (genes) from mostly the lowest-energy individuals are allowed to exchange, producing a new set 
of individuals for the next round of local minimizations. After a sufficiently large number of rounds, the
global minimum should be located.
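
A toy version of such a genetic search is sketched below. It is an illustration under stated assumptions, not
a reproduction of any published algorithm: the 'structure' is a list of four coordinates, the energy is a
separable sum of the model function 0.1x^2 + sin(3x), a 'gene' is a single coordinate, parents are drawn from
the lower-energy half of the population, and the best individual is carried over unchanged (elitism).

```python
import math
import random

def energy(q):
    """Separable model energy (hypothetical) of a 'structure' q."""
    return sum(0.1 * x * x + math.sin(3.0 * x) for x in q)

def gradient(q):
    return [0.2 * x + 3.0 * math.cos(3.0 * x) for x in q]

def local_minimize(q, step=0.05, iters=500):
    """Plain gradient descent on all coordinates."""
    q = list(q)
    for _ in range(iters):
        q = [x - step * gx for x, gx in zip(q, gradient(q))]
    return q

def genetic_search(dim=4, pop_size=12, rounds=15, seed=1):
    rng = random.Random(seed)
    pop = [local_minimize([rng.uniform(-5.0, 5.0) for _ in range(dim)])
           for _ in range(pop_size)]
    history = []
    for _ in range(rounds):
        pop.sort(key=energy)
        history.append(energy(pop[0]))
        parents = pop[: pop_size // 2]      # mostly the lowest-energy individuals
        children = [pop[0]]                 # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Each gene (coordinate) is taken from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            children.append(local_minimize(child))
        pop = children
    pop.sort(key=energy)
    history.append(energy(pop[0]))
    return pop[0], history

best, history = genetic_search()
print(best, history[0], history[-1])
```

Because the best individual is always retained, the lowest energy in the population cannot rise from one
round to the next; whether it reaches the global minimum depends on the population size and number of rounds.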

Both of these global optimization methods require a very large number of essentially full local optimizations 
and, consequently, are normally restricted to moderate-sized systems described using molecular mechanics force fields.
A somewhat different approach has been developed by Piela and coworkers [97], utilizing the diffusion or 
heat conduction equation. In this method, a surface containing multiple local minima is smoothly deformed in 
such a way that wells on the surface gradually disappear, with shallower wells vanishing faster than deeper, 
lower-energy wells. Eventually a surface will be derived which has just one minimum, related to the lowest- 
energy, global minimum on the original surface. By carefully reversing the procedure, keeping track of the 
minimum as it evolves, one is (hopefully) led back to the global minimum as the original surface is reformed. 
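
For a polynomial surface the deformation by the diffusion equation can be written in closed form, which makes
a compact illustration possible. The sketch below is an assumption-laden toy example rather than the published
procedure: the one-dimensional double well f(x) = x^4 - 2x^2 + 0.3x, the deformation 'time' range and the
gradient-descent settings are all illustrative choices. The deformed surface
F(x, t) = x^4 + (12t - 2)x^2 + 0.3x + 12t^2 - 4t satisfies dF/dt = d^2F/dx^2 with F(x, 0) = f(x) and has a
single minimum for t > 1/6.

```python
def f(x):
    """Original double-well surface (hypothetical); the deeper well is at negative x."""
    return x**4 - 2.0 * x**2 + 0.3 * x

def f_prime(x):
    return 4.0 * x**3 - 4.0 * x + 0.3

def deformed_gradient(x, t):
    """dF/dx for F(x, t) = x^4 + (12 t - 2) x^2 + 0.3 x + 12 t^2 - 4 t."""
    return 4.0 * x**3 + 2.0 * (12.0 * t - 2.0) * x + 0.3

def track_minimum(t_max=1.0, n_steps=50, step=0.01, iters=2000):
    """Start in the single well at t_max and follow the minimum back to t = 0."""
    x = 0.0
    for k in range(n_steps, -1, -1):
        t = t_max * k / n_steps
        for _ in range(iters):              # local minimization on the deformed surface
            x -= step * deformed_gradient(x, t)
    return x

x_min = track_minimum()
print(x_min, f(x_min))
```

Here the tracked minimum ends in the deeper of the two wells. As the text cautions, this outcome is hoped for
rather than guaranteed on more complicated surfaces.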

Other deterministic methods for global optimization have also been developed (see, e.g., [98]).

ed D Yarkony (Singapore: World Scientific) pp 459-500

An excellent, up-to-date treatise on geometry optimization and reaction path algorithms for ab initio quantum 
chemical calculations, including practical aspects. 

Werner H-J 1987 Matrix-formulated direct multiconfigurational self-consistent field and multireference 
configuration interaction methods Adv. Chem. Phys. 69 1 

A lucid and carefully written exposition of this difficult subject from one of the authors of the highly 
acclaimed MOLPRO suite of programs. It contains examples and plenty of physical insight. 

Shepard R 1987 The multiconfiguration self-consistent field method Adv. Chem. Phys. 69 63 

A very detailed, pedagogical treatment of the subject, including much of the mathematical background and a 
nearly complete list of references prior to 1987. 

Pulay P 1995 Analytical derivative techniques and the calculation of vibrational spectra Modern Electronic 
Structure Theory ed D Yarkony (Singapore: World Scientific) pp 1191-240

A concise introduction to the calculation of analytical derivatives in quantum chemistry, with 
applications to simulating vibrational spectra.

B3.6 Mesoscopic and continuum models

Marcus Müller


B3.6.1 INTRODUCTION 

Many systems in physical chemistry exhibit structure on length scales that greatly exceed the atomic 
dimensions. Systems containing surfactants — detergents or milk, for instance — often consist of droplets of 
one component dissolved in another phase. The size of these droplets exceeds the extension of the molecular 
constituents by far. Very generally, mesoscopic and continuum models describe the properties of materials on 
length scales larger than the atomic dimensions by incorporating the details of the underlying atomic structure 
only in terms of a reduced number of effective variables. In this very broad sense, the Navier-Stokes equation
[1], which describes the motion of a fluid via density, energy and velocity fields, and elasticity theory [2],
which describes solids in terms of stress and displacement fields, also belong to this class of model. In
both approaches, the subject of the model is not the properties of individual atoms (e.g., their position or 
quantum state) but rather their average properties (like the density or velocity) in a small coarse-graining 
volume. Usually the coarse graining is not performed explicitly, but it is understood that the averaging volume 
is large enough to result in a continuous spatial variation of the variables of the mesoscopic model and yet still 
be smaller than the characteristic length scale of the phenomena under consideration. 

In the following entry we shall restrict ourselves to discussing mesoscopic and continuum models for complex 
fluids in chemical physics. The wide span of time and length scales in these materials is illustrated in figure 
B3.6.1 for a blend of two polymers. On the atomistic scale each polymer consists of chemical repeat units 
joined together to form the chain molecule. The length scale is set by the distance between the atoms along 
the backbone of the polymer, typically in the range of 1-2 Å. The vibrations of the atoms occur on the
timescale of picoseconds. In a dense melt, the flexible chain molecules adopt a random-walk-like
conformation. The 'step length' of the random walk, or persistence length b, is typically of the order of a few 
nanometres. Since several thousands of repeat units form a polymer, the overall size of a single molecule, as 
specified by its radius of gyration, exceeds the persistence length by 1-3 orders of magnitude. On this range 
of length scales the structure of the polymer is self-similar. If the two components of the blend are not 
miscible, as is generally the case, one species forms droplets that are dispersed in a matrix of the other
species. The size of the droplets is in the micrometre range. On even larger length scales (say 1 mm) the 
material appears homogeneous. Clearly the properties on the mesoscopic length scale are important for 
application properties. A decrease of the droplet size or even the formation of a connected morphology (i.e. a 
microemulsion) improves the mechanical properties of the composite material. A similar span of time and 
length scales is encountered in many other systems (e.g., mixtures of oil, water and surfactant or glassy 
materials) and this behaviour is rather typical for complex fluids.

Figure B3.6.1. Illustration of the wide span of length scales in a binary polymer blend. (See the text for further
explanation.)

A unified model that describes the structure from the atomistic length scale up to macroscopic properties is 
not analytically tractable. Even state-of-the-art supercomputers cannot cope with such a broad spread of time 
and length scales in numerical simulations. Today's largest simulated systems in thermal equilibrium
comprise about 10 particles and, hence, span about 2-3 decades in length scales. With the increase of
computing power and progress in simulation methodology, simulating larger and larger system sizes will 
become feasible, but computer modelling from atomic to macroscopic scales in the framework of a single, 
unified model is not feasible at present or in the near future. 


Another caveat for the modelling from the atomistic level up to the macroscopic level is the requirement of 
sufficiently accurate interaction potentials. Minor inaccuracies in calculations on small length scales can give 
rise to pronounced effects on the mesoscopic scale. Consider, for instance, the self-assembly of amphiphilic
molecules (see section B3.6.3) into a spatially ordered structure. The free-energy difference between the
different morphologies can be as small as 10 £7 per molecule. The ab initio prediction of such a small
free-energy difference is certainly a formidable task.

Mesoscopic and continuum models do not attempt to describe large-scale phenomena starting from the 
smallest atomic length scale, but rather incorporate the local structure via a small number of effective 
parameters. Mesoscopic models lump a small number of atoms into an effective particle. These particles 
interact via coarse-grained interactions. By this coarse-graining procedure much of the atomistic detail is lost, 
and only those interactions pertinent to the phenomena on the mesoscopic length scales are retained. Even if 
the interactions on the microscopic scale are extremely complex (e.g., hydrophobic interactions [3] in lipid 
water mixtures), they can often be captured by simple expressions on the mesoscopic length scale. Coarse- 
grained models thus yield valuable insights into the structure on large length scales. For specific examples the 
effective interactions are derived by eliminating the degrees of freedom on the smallest (atomistic) length 
scales, retaining only those on larger length scales; for some systems (e.g., polymer chains in the gas phase) 
this coarse-graining procedure has a formal justification due to the self-similar structure on a large range of 
length scales; for other systems the mapping between the atomistic/microscopic level and the mesoscopic 
description is rather a concept than a practicable procedure. In this latter case, the application of mesoscopic 
models rests on the observation that different systems (e.g., diblock copolymers and lipid water mixtures) 
share a common behaviour on mesoscopic scales. Universal mesoscopic behaviour that does not depend on 
the details on the atomistic level in a qualitative way is the subject of mesoscopic models. 

Continuum models go one step further and drop the notion of particles altogether. Two classes of models shall 
be discussed: field theoretical models that describe the equilibrium properties in terms of spatially varying 
fields of mesoscopic quantities (e.g., density or composition of a mixture) and effective interface models that 
describe the state of the system only in terms of the position of interfaces. Sometimes these models can be 
derived from a mesoscopic model (e.g., the Edwards Hamiltonian for polymeric systems) but often the 
Hamiltonians are based on general symmetry considerations (e.g., Landau-Ginzburg models). These models 
are well suited to examine the generic universal features of mesoscopic behaviour. 

Mesoscopic and continuum models bridge the gap between realistic atomistic simulations and the description
on the macroscopic level (e.g., elasticity theory). The objectives of mesoscopic models are twofold. On the
one hand they help identify interactions that are necessary to bring about the phenomena on a mesoscopic 
scale (e.g., phase separation or self-assembly) and they aid in investigating the dependence of the mesoscopic 
behaviour on the effective interactions. This information also yields some qualitative insight into how the 
microscopic parameters influence mesoscopic behaviour (e.g., the dependence of the structure in a self- 
assembled system on the architecture/shape of the amphiphilic molecules). On the other hand, this class of 
models elucidates universal behaviour on the mesoscopic scale (e.g., identifying various morphologies into 
which systems can self-assemble, the relation between confinement and phase behaviour, or the consequences 
of fluctuations) and establishes a relation between behaviour on large length scales and experimentally 
accessible (mesoscopic) quantities (e.g., Flory-Huggins parameter, interfacial tension, or bending rigidity of 
membranes). 

The hierarchy of models is complemented by a variety of methods and techniques. Mesoscopic models that 
incorporate some fluid-like packing (e.g., spring-bead models for polymer solutions) are investigated by 
Monte Carlo simulations, molecular dynamics or density functional techniques. Lattice models are studied by Monte Carlo
simulations. The larger the span of length scales considered, the larger the computational effort required. 
Models without pronounced packing effects (e.g., the Edwards Hamiltonian) are investigated by self- 
consistent field techniques. Continuum models are often analytically tractable, at least in the mean field 
approximation, and simple analytical expressions for various quantities (e.g., interfacial tension between two 
immiscible polymers) can be obtained in some limiting cases. The effect of fluctuations has been assessed by 
computer simulations, transfer matrix calculations and renormalization group techniques. 


At the heart of mesoscopic and continuum models lies the question: Which degrees of freedom are to be 
retained as relevant and which can be ignored? The answer depends on the specific problem. By comparing 
different models the degree of universality and the relevance of interactions can be gauged. This yields much 
insight into the mechanisms which underlie the phenomena. Mesoscopic and continuum models make contact
with chemical models on the atomistic level as well as with the macroscopic descriptions. Effort is being 
made to incorporate more chemical realism into the models as well as to extend them to larger length scales. 

In the following we shall describe various applications of mesoscopic models to complex fluids. The 
examples extend from applications that are quite close to the atomistic level (e.g., coarse-grained polymer 
models) to highly idealized models (e.g., effective interface Hamiltonians or Ginzburg-Landau models). 
Moreover, we restrict ourselves mainly to the description of thermodynamic equilibrium. The remainder of 
this entry is organized as follows. In section B3.6.2 we discuss applications of coarse-grained models to 
systems involving homopolymers. Mesoscopic models for the description of self-repelling chains, polymer 
solutions, polymer melts and binary blends are introduced. From these models, more coarse-grained 
descriptions can be derived in terms of Ginzburg-Landau expansions or effective interface Hamiltonians. 
Section B3.6.3 then considers amphiphilic molecules. Their co-operative behaviour on the supramolecular
level has been explored in the framework of models with various degrees of detail. Chain models retain the 
salient features of the amphiphile's architecture while lattice models or continuum models yield a description 
in terms of a spatially varying concentration. On even larger scales, the statistical mechanics of interfaces has 
been investigated via random interface models. This article closes with a brief look at the application of 
mesoscopic and continuum models to dynamical phenomena. 


B3.6.2 POLYMERIC SYSTEMS 

B3.6.2.1 POLYMER SOLUTIONS 

Coarse-grained models have a longstanding history in polymer science. Long-chain molecules share many 
common mesoscopic characteristics which are independent of the atomistic structure of the chemical repeat 
units [4, 5 and 6]. The self-similar structure [7, 8, 9 and 10] on large length scales is only characterized by a 
single length scale, the chain extension R. 

The important interactions in polymer solutions are the connectivity of the segments along the chain 
molecules and interactions between segments. The solvent molecules are often not treated explicitly, but their 
effect is incorporated into the effective interactions between polymer segments. A good solvent corresponds 
to an effective repulsion between segments and the polymer chains adopt a swollen configuration. A bad 
solvent gives rise to an attraction between the polymer segments and leads to a collapse. 


The observation of the universality and self-similarity of the large-length-scale properties has a theoretical 
basis. In 1972 de Gennes [11] related the structure of a polymer chain in a good solvent to a field theory of an
n-component vector model in the limit $n \to 0$. This class of models (see the entry on phase transitions and
critical phenomena; A2.5) exhibits a continuous phase transition and the properties close to this critical point
have been investigated extensively with renormalization group calculations [8, 9 and 10]. The inverse chain
length plays the role of the distance from the critical point of the $n = 0$ component vector model. As in the
theory of critical phenomena, the behaviour in the vicinity of this critical point (i.e. $1/N \ll 1$) is governed by a
universal scaling behaviour that is brought about by only a few relevant interactions. The relation between the
behaviour of polymer chains in the limit of $N \to \infty$ and the critical behaviour justifies the use of highly
coarse-grained models that incorporate only two relevant interactions: connectivity along the chain and binary
segmental interactions. 


Lattice models of polymer solutions are a particularly simple and computationally efficient realization, and 
therefore have attracted abiding interest [12]. In simple lattice models, a small group of atomistic repeat units 
is represented by a site on a simple cubic lattice. Segments along a polymer occupy neighbouring lattice sites 
and multiple occupation of lattice sites is forbidden (excluded volume). The latter constraint corresponds to 
the repulsive binary interaction under good solvent conditions. Isolated chains on the lattice adopt 
configurations of self-avoiding walks. The polymer's end-to-end distance R scales with the chain length like
$R \sim N^{\nu}$. The exponent $\nu = 0.588$ has been calculated using renormalization group techniques [9, 10],
enumeration techniques for short chain lengths and Monte Carlo simulations [13]. 
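
The enumeration approach for short chains can be illustrated directly. The sketch below is a brute-force
illustration with hypothetical function and variable names: it generates every self-avoiding walk of a given
number of steps on the simple cubic lattice, counts the walks and averages the squared end-to-end distance.
An effective exponent is then estimated from two chain lengths; for such short chains it is only a rough
guide to the asymptotic value quoted above.

```python
import math

MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def enumerate_saws(n_steps):
    """Return (number of walks, mean squared end-to-end distance) for n_steps bonds."""
    count = 0
    r2_sum = 0

    def extend(pos, visited, remaining):
        nonlocal count, r2_sum
        if remaining == 0:
            count += 1
            r2_sum += pos[0] ** 2 + pos[1] ** 2 + pos[2] ** 2
            return
        for dx, dy, dz in MOVES:
            nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if nxt not in visited:          # excluded volume: no double occupancy
                visited.add(nxt)
                extend(nxt, visited, remaining - 1)
                visited.remove(nxt)

    origin = (0, 0, 0)
    extend(origin, {origin}, n_steps)
    return count, r2_sum / count

counts = [enumerate_saws(n)[0] for n in range(1, 5)]
r2_2 = enumerate_saws(2)[1]
r2_3 = enumerate_saws(3)[1]
r2_6 = enumerate_saws(6)[1]
# Effective exponent from R^2 ~ N^(2 nu), using N = 3 and N = 6.
nu_eff = 0.5 * math.log(r2_6 / r2_3) / math.log(2.0)
print(counts, r2_2, r2_3, r2_6, nu_eff)
```

The number of walks grows roughly fivefold with each added bond, which is why exact enumeration is limited to
short chains and Monte Carlo sampling takes over for long ones.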

The application of lattice models to study the behaviour of multi-chain systems (i.e. dilute and semi-dilute 
solutions, and dense melts) is straightforward in principle. The equilibration of dense multi-chain systems is, 
however, a challenging problem for computer simulations, and simple lattice models have been a testing bed 
for many algorithms. Some methods are tailored to isolated chains or very dilute systems (e.g., the pivot 
algorithm [13] or the construction of a chain via the pruned-enriched Rosenbluth method [14]); other methods 
provide an effective relaxation of the overall chain dimensions in dense systems (e.g., configurational bias 
Monte Carlo [15, 16] or the recoil growth algorithm [17]). 

Though these simple lattice models reproduce the universal features of polymer solutions, it is difficult to 
incorporate details of the chain architecture. The simple lattice model allows only for two bond angles which 
makes the investigation of orientational effects prone to lattice artefacts. Moreover the particles in real fluids 
arrange to form neighbouring shells. This local packing structure of the fluid does not affect the universal 
scaling behaviour but it is pertinent to the relation between the coarse-grained effective interactions and the 
underlying microscopic potentials. Since the vacancies on the lattice and the polymer segments have the same 
size, packing effects in the density correlation function are largely absent. More sophisticated lattice models 
(e.g., the bond fluctuation model [18]), in which monomers are represented by extended objects (e.g., a whole 
unit cube) on the lattice, have been explored. These models exhibit packing effects and a large number of 
bond angles while still retaining the computational advantages of lattice models. They also allow for a 
diffusive dynamics of the polymers on the lattice which consists of random local displacements of the 
monomers. Moreover, the bond vectors can be chosen such that the excluded volume constraint prevents 
bonds from crossing through each other in the course of these local displacements. This non-crossability takes 
account of topological effects which are important for the dynamical properties of linear chains [ 19 ] and 
influence the conformational statistics of ring polymers [ 20 , 21 and 22] (e.g., in order to avoid topological 
interactions rings collapse in a concentrated solution). 

Off-lattice models enjoy a growing popularity. Again, a particle corresponds to a small number of atomistic 
repeat units along the backbone of the polymer. Off-lattice models allow simulations at constant pressure or the
calculation of the pressure via the virial expression. This yields direct access to the pVT behaviour. By 
modelling polymers as a sequence of tangent hard spheres in continuous space, computer simulators have 
investigated the equation of state in polymer solutions and the detailed packing structure of polymer solutions 
in contact with a hard wall. This class of model is particularly suited for comparing the results to analytical 
theories (e.g., Wertheim's theory [ 23 ] or density functional approaches [24, 25 and 26]) because of the 
existence of elaborated analytical descriptions for the corresponding hard-sphere monomer fluid. 

Hard-sphere models lack a characteristic energy scale and, hence, only entropic packing effects can be 
investigated. A more realistic modelling has to take hard-core-like repulsion at small distances and an 
attractive interaction at intermediate distances into account. In non-polar liquids the attraction is of the van der 
Waals type and decays with the inverse sixth power of the interparticle distance r. It can be modelled in the
form of a Lennard-Jones potential $V_{\rm LJ}(r)$ between segments

$$V_{\rm LJ}(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] \qquad \text{(B3.6.1)}$$

where the exponent of the first, repulsive term is chosen for computational convenience. The Lennard-Jones
radius $\sigma$ sets the microscopic length scale and $\epsilon$ sets the energy scale. In many simulational
applications the potential is truncated and shifted so as to yield a continuous, finite ranged potential. This
does not alter the qualitative behaviour but shifts the temperature and density of the liquid-vapour critical
point. The Lennard-Jones particles are tied together to form chain molecules. The constraint of fixed bond
length or a harmonic bonding potential has been employed. Another popular choice is the FENE potential [27, 28].
It takes the form

$$V_{\rm FENE}(r) = -\frac{k}{2} R_0^2 \ln\left(1 - \frac{r^2}{R_0^2}\right) \quad \text{with} \quad R_0 = 1.5\sigma. \qquad \text{(B3.6.2)}$$

The parameter k tunes the stiffness of the potential. It is chosen such that the repulsive part of the
Lennard-Jones potential makes a crossing of bonds highly improbable (e.g., k = 30). This off-lattice model has a rather
realistic equation of state and reproduces many experimental features of polymer solutions. Due to the 
attractive interactions the model exhibits a liquid-vapour coexistence, and an isolated chain undergoes a 
transition from a self-avoiding walk at high temperatures to a collapsed globule at low temperatures. Since all 
interactions are continuous, the model is tractable by Monte Carlo simulations as well as by molecular 
dynamics. Generalizations of the Lennard-Jones potential to anisotropic pair interactions are available: e.g.,
the Gay-Berne potential [29]. This latter potential has been employed to study non-spherical particles that 
possibly form liquid crystalline phases. 
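
The segment interactions of this off-lattice model are easily written down in code. The sketch below is an
illustration under assumptions: it uses the standard Lennard-Jones form
4*epsilon*[(sigma/r)^12 - (sigma/r)^6] and the commonly used FENE form -(k/2)*R0^2*ln(1 - r^2/R0^2) with
R0 = 1.5 sigma and k = 30, in reduced units where epsilon = sigma = 1. The cut-off of 2.5 sigma in the
truncated and shifted variant is a conventional choice that is not specified in the text.

```python
import math

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lennard_jones_truncated_shifted(r, r_cut=2.5, epsilon=1.0, sigma=1.0):
    """Cut at r_cut (assumed value) and shifted so the potential is continuous there."""
    if r >= r_cut:
        return 0.0
    return lennard_jones(r, epsilon, sigma) - lennard_jones(r_cut, epsilon, sigma)

def fene(r, k=30.0, r0=1.5):
    """FENE bond potential; the bond cannot be stretched beyond r0."""
    if r >= r0:
        return math.inf
    return -0.5 * k * r0 * r0 * math.log(1.0 - (r / r0) ** 2)

r_min = 2.0 ** (1.0 / 6.0)      # position of the Lennard-Jones minimum
print(lennard_jones(r_min), lennard_jones_truncated_shifted(2.0), fene(1.0))
```

The depth of the untruncated minimum is -epsilon at r = 2^(1/6) sigma, and the shift raises the whole curve by
a small constant so that it reaches zero at the cut-off.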

In the limit that the number of effective particles along the polymer diverges but the contour length and chain 
dimensions are held constant, one obtains the Edwards model of a polymer solution [9, 30]. Polymers are 
represented by random walks that interact via zero-ranged binary interactions of strength v. The partition 
function of an isolated chain is given by 
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where the density field pis related to the configuration r(t) of the polymer via 

p{r f ) = f dr Sir - r{i )). (B3.6.4) 

The path integral 𝒟 sums over all polymer conformations r(t), where 0 ≤ t ≤ N denotes the contour parameter 
along the polymer. The second term represents the connectivity along the molecule and b denotes the 
persistence length (i.e. the 'step length' of the random walk). In the absence of the third term, the partition 
function describes a Gaussian chain with the end-to-end distance R = b√N (Gaussian chain model). This is 
the only length scale in the problem. Very much as in quantum mechanics, the path integral results in a 
diffusion equation (i.e. the polymer analogue of Schrödinger's equation for the propagator) for the probability 
of finding a chain's segment after t steps along the chain at position r in space. The third term describes the 
interactions: if segments t and t' are located at the same position they interact with the strength v. 
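
The Gaussian-chain scaling of the end-to-end distance with the square root of N can be checked by direct sampling. The sketch below is only an illustration under stated assumptions: the chain is discretized into N independent bond vectors whose Cartesian components are Gaussian with variance b^2/3, and the function name and sample sizes are arbitrary choices.

```python
import random

def mean_square_end_to_end(n_steps, b=1.0, n_chains=2000, seed=0):
    """Estimate <R^2> of a non-interacting Gaussian chain by sampling.

    Each bond vector has independent Gaussian components with variance
    b^2/3, so that the mean-square bond length equals b^2.
    """
    rng = random.Random(seed)
    sigma = b / 3.0 ** 0.5
    total = 0.0
    for _ in range(n_chains):
        x = y = z = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
            z += rng.gauss(0.0, sigma)
        total += x * x + y * y + z * z
    return total / n_chains
```

Within statistical error the estimate divided by N b^2 is close to unity for any N, reflecting the absence of any length scale other than R.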


This model is very popular for analytical calculations, and generalizations to multi-chain systems are 
straightforward. Properties of polymer solutions have been obtained via renormalization group techniques [8, 
9 and 10]. Similar to the simple lattice model, the Edwards model includes only the chain connectivity and 
binary segmental interactions: the detailed structure of the underlying fluid is omitted. For v = 0 the 
self-similar Gaussian statistics persist on all length scales and there is no rod-like behaviour on smaller scales. 
Generalizations of thread-like models to stiff polymers and orientational interactions, however, have been 
explored. The most popular one is the wormlike chain model, in which the second term in the Edwards 
Hamiltonian (B3.6.3) is replaced by [31] 

where u(t) = dr/dt denotes the tangent vector along the path with unit norm. The parameter η controls the 
local stiffness of the path. On small distances along the chain the tangent vectors u are highly correlated and 
the stiffness parameter η controls the decay of orientational correlations along the chain's contour: 
⟨u(t)·u(t')⟩ = exp(−|t − t'|/η). On large length scales, however, the Gaussian behaviour is recovered and the 
end-to-end distance is given by R² = 2ηN. 
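
The crossover from rod-like to Gaussian behaviour follows from integrating the exponentially decaying tangent correlation twice along the contour, which gives R^2 = 2 eta [N - eta (1 - exp(-N/eta))]. The snippet below is a minimal sketch of this closed form, assuming unit tangent vectors and the correlation function quoted above; the function name is illustrative.

```python
import math

def wlc_mean_square_r(n, eta):
    """Mean-square end-to-end distance of a wormlike chain.

    Obtained by integrating <u(t).u(t')> = exp(-|t - t'|/eta) over
    0 <= t, t' <= n; -expm1(-x) evaluates 1 - exp(-x) accurately.
    """
    x = n / eta
    return 2.0 * eta * (n - eta * (-math.expm1(-x)))
```

For N much larger than eta the result approaches 2 eta N (Gaussian behaviour), whereas for N much smaller than eta it approaches N^2, the value for a rigid rod.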

B3.6.2.2 POLYMER BLENDS 

The above models can be generalized to multicomponent systems by modifying the segmental interactions. 
Most applications deal with a binary blend in a common solvent. The excluded volume interaction between 
segments limits density fluctuations in a dense polymer liquid. Therefore many models of dense multi- 
component systems neglect the finite compressibility and model the excluded volume interactions by 
enforcing a uniform segment density. In 1941, Flory [32] and Huggins [33] employed a simple lattice model 
to calculate the phase diagram for a dense binary polymer blend in mean field approximation. The two 
polymer species, denoted A and B, are modelled as walks on a lattice, 
and the binary interactions of strengths ε_AA, ε_AB and ε_BB act between neighbours on the lattice. Since all 
lattice sites are occupied, only the difference of the segmental interactions (the Flory-Huggins parameter) 


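\chi_{\mathrm{FH}} = \frac{z}{k_{\mathrm B} T}\left(\epsilon_{AB} - \frac{\epsilon_{AA} + \epsilon_{BB}}{2}\right) \qquad (B3.6.6)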

determines the phase diagram. Here, z = 6 denotes the coordination number of the simple cubic lattice. 

Typical experimental values of the Flory-Huggins parameter χ are in the range 10⁻² to 10⁻⁵ for partially 
compatible blends, while the individual interactions between the segments ε_ij (i, j = A, B) are of the order of 
k_B T. This illustrates that the phase behaviour is governed by a delicate cancellation of interactions. Starting 
from a model with atomistic details and performing an ab initio calculation of the packing structure and the 
effective segmental interactions in a binary blend would require extremely accurate interatomic potentials as 
input and a very high numerical quality of the calculation. Therefore, predicting the value of the Flory- 
Huggins parameter on an ab initio basis is virtually impossible. The concept of describing the effective 
incompatibility of two polymer species by a single mesoscopic parameter χ_FH has proven remarkably 
successful, however. When χ_FH is used as an adjustable parameter the mean field theory of Flory and Huggins 
describes many experimental observations quite well. The values of the χ_FH parameter of various 
pairs of polymers have been extracted from a comparison between theory and experiment and are compiled in, 
for example, [34]. In the framework of the mean field theory, the excess free energy of mixing per segment 
takes a particularly simple form: 

\frac{\Delta F}{k_{\mathrm B} T} = \frac{\phi_A}{N_A} \ln \phi_A + \frac{\phi_B}{N_B} \ln \phi_B + \chi_{\mathrm{FH}}\, \phi_A \phi_B \qquad (B3.6.7)

where N_A and N_B denote the chain lengths of the two polymer species and φ_A and φ_B denote the relative 
amounts of A and B segments, respectively (φ_A + φ_B = 1). The logarithmic terms represent the translational 
entropy of mixing. Due to the connectivity of the segments it is reduced by a factor 1/N_A or 1/N_B, respectively. 
The term proportional to χ_FH describes the repulsion between unlike segments. The chain conformations are assumed to be 
independent of the composition. Therefore the conformational entropy does not give any contribution to the 
free energy of mixing in the Flory-Huggins treatment. Most notably, the theory rationalizes the fact that long 
macromolecules tend to demix, because a small repulsion is sufficient to far outweigh the entropy of mixing, 
which is reduced by the factor 1/N. This expression for the excess free energy of mixing also forms the basis 
for self-consistent field models of spatially inhomogeneous systems. 
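
The consequences of the Flory-Huggins free energy for the phase diagram can be made explicit with a few lines of code. The following is a sketch under the assumption that the free energy per segment (in units of k_B T) is (phi_A/N_A) ln phi_A + (phi_B/N_B) ln phi_B + chi phi_A phi_B; the spinodal follows from a vanishing second derivative with respect to composition and the critical point from the minimum of the spinodal curve. The function names are illustrative.

```python
import math

def fh_free_energy(phi_a, n_a, n_b, chi):
    """Flory-Huggins free energy of mixing per segment in units of k_B T."""
    phi_b = 1.0 - phi_a
    entropy = (phi_a / n_a) * math.log(phi_a) + (phi_b / n_b) * math.log(phi_b)
    return entropy + chi * phi_a * phi_b

def spinodal_chi(phi_a, n_a, n_b):
    """Value of chi at which the second derivative of the free energy vanishes."""
    return 0.5 * (1.0 / (n_a * phi_a) + 1.0 / (n_b * (1.0 - phi_a)))

def critical_point(n_a, n_b):
    """Critical composition and critical chi (minimum of the spinodal)."""
    sa, sb = math.sqrt(n_a), math.sqrt(n_b)
    phi_c = sb / (sa + sb)
    chi_c = 0.5 * (1.0 / sa + 1.0 / sb) ** 2
    return phi_c, chi_c
```

For a symmetric blend (N_A = N_B = N) this gives a critical composition of 1/2 and a critical incompatibility of 2/N, which quantifies the statement that a repulsion of order 1/N suffices to demix long chains.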

In order to gain qualitative insight into how to relate the Flory-Huggins parameter to the architectural 
properties of the components, mesoscopic models with various degrees of structural detail have been 
investigated. Complex lattice models allow monomeric units to occupy more than one lattice site. In the 
lattice cluster model of Freed and co-workers [35] the effect of explicit monomer structure has been explored. 
The partition function of the model is expressed in a systematic double expansion with respect to the inverse 
temperature and the inverse coordination number of the underlying lattice. To zero order the approach 
recovers the results of the original Flory-Huggins theory. Higher-order terms account for geometric packing 
on the monomer scale and non-random mixing effects. This approach has been successful in predicting 
various subtle influences of the monomer architecture, including the occurrence of entropic contributions to 
the Flory-Huggins parameter. 


Similar questions can be addressed by the P-RISM (polymer reference interaction site model) theory of Curro 
and Schweizer [36]. This integral equation theory generalizes the Ornstein-Zernike equation to polymeric 
systems in order to account for the fluid-like packing structure. Details of the molecular architecture enter via 
the single-chain structure factor. The P-RISM approach yields a detailed description of the phase behaviour 
and the local structure and has been applied to models with various degrees of structural detail. In the limit 
that the chains are modelled as infinitely thin Gaussian paths the results are very similar to the Flory-Huggins 
theory. The theory has been applied to fairly realistic chain models taking the experimentally measured 
single-chain structure factors as input. More recently, this approach has been applied self-consistently to 
calculate the change of the molecular conformation upon blending. 

The bond fluctuation model [37] and off-lattice models [38, 39] have been used to investigate binary 
polymer blends in Monte Carlo simulations. Attention has focused on rather different topics: (i) Monte 
Carlo simulations appropriately account for the effect of composition fluctuations. They are important in the 
vicinity of the critical temperature of the unmixing transition. When the chain length is increased, this 
fluctuation-dominated region shrinks and one observes a crossover between the 3D Ising universality class 
and mean field critical behaviour [40]. (ii) The relation between the polymer architecture and the Flory- 
Huggins parameter has been explored in simulations. Disparities in the architecture on the scale of the coarse- 
grained monomers (e.g., different local stiffness of the chains or different monomer shapes) alter the packing 
structure and give rise to enthalpic and entropic contributions to the Flory-Huggins parameter [37, 39]. When 
comparing experimental data to the predictions of the mean field theory, deviations from the simple 
proportionality χ_FH ~ 1/T of the Flory-Huggins parameter are the rule rather than the exception. (iii) Monte 
Carlo simulations reveal that the chains in the minority phase shrink. By reducing their size, they increase the 
local density of their own monomers and reduce the number of unfavourable contacts with the opposite 
species. The latter effect is, however, not captured in simple mean field theories. (iv) Off-lattice models have 
been employed to study binary blends at constant pressure and to explore the effect of compressibility on the 
miscibility behaviour [38, 39]. 

These coarse-grained approaches investigate the generic behaviour and the qualitative dependence on the 
chain architecture. Again it should be pointed out that these simulations and analytical methods cannot predict 
the absolute value of the Flory-Huggins parameter of a specific pair of polymers. However, by a careful 
choice of the coarse-grained model, they help in identifying relevant parameters for the miscibility on a 
coarse-grained scale. 

B3.6.2.3 SELF-CONSISTENT FIELD APPROACH AND GINZBURG-LANDAU MODELS 

Long polymers tend to demix and the properties of the interfaces between the coexisting phases have attracted 
longstanding interest. Using the Gaussian chain model, Helfand and Tagami [41] investigated the interfacial 
properties in the self-consistent field theory. Within the mean field approximation the problem of interacting 
polymers is formulated in terms of a single-chain problem in an effective, external field. This effective, 
external field replaces the interactions with the surrounding polymers in the binary A/B blend. The effective 
field 

w_A(\mathbf{r}) = \zeta(\mathbf{r})\left[\phi_A(\mathbf{r}) + \phi_B(\mathbf{r}) - 1\right] + \chi\,\phi_B(\mathbf{r}) \qquad (B3.6.8)

acts on a monomer of species A at position r with φ_A and φ_B denoting the local composition of the blend. A 
similar equation holds for w_B. The first term enforces the incompressibility; the factor ζ is adjusted to comply 
with the constraint φ_A(r) + φ_B(r) = 1 everywhere. The second term describes the repulsion between different 
species parameterized by χ. The local composition, in turn, depends on the fields and is obtained as the 
Boltzmann average of 
isolated A and B chains in the fields w_A and w_B, respectively. For Gaussian chains, described by the first part 
of the Hamiltonian (B3.6.3), this leads to a diffusion equation in the external potential w_A and w_B. Since the 
fields depend on the local composition the equations have to be solved self-consistently. 

Helfand and Tagami calculated the composition profiles across the interface and determined the interfacial 
tension. In general, the self-consistent field equations have to be solved numerically. Different schemes in real 
space [42], on lattices [43] and in Fourier representation [44] have been devised. There are, however, two 
interesting limits in which simple analytical expressions for the interfacial width and the interfacial tension 
can be obtained. The limit in which the width of the interface is much smaller than the extension of the 
polymer and yet larger than the persistence length b is called the strong segregation limit. It corresponds to the 
range 1 ≫ χ ≫ 1/N of incompatibility. This strong segregation limit is only accessible for long chain lengths N 
and corresponds to truly polymeric behaviour. The interfacial width w and tension γ are described by the 
simple forms 

w = \frac{b}{\sqrt{6\chi}} \quad \text{and} \quad \gamma = k_{\mathrm B} T\, \rho\, b \sqrt{\frac{\chi}{6}} \qquad (B3.6.9)

where ρ denotes the monomer density. The leading corrections to the strong segregation behaviour are of the 
order 1/(χN) and have been the subject of much investigation [45, 46 and 47]. Of course, the Gaussian chain 
model cannot describe the structure on length scales smaller than or comparable to the persistence length b of 
the polymer. This restricts the application of this mesoscopic model to the range χ ≪ 1. 
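
The strong segregation expressions are simple enough to evaluate directly. The sketch below assumes the Helfand-Tagami forms w = b/sqrt(6 chi) and gamma/(k_B T) = rho b sqrt(chi/6); the helper that tests the window 1 >> chi >> 1/N uses an arbitrary safety factor of 10, which is an illustrative choice and not part of the theory.

```python
import math

def ssl_width(b, chi):
    """Interfacial width in the strong segregation limit, w = b / sqrt(6 chi)."""
    return b / math.sqrt(6.0 * chi)

def ssl_tension_over_kT(rho, b, chi):
    """Interfacial tension divided by k_B T, rho * b * sqrt(chi / 6)."""
    return rho * b * math.sqrt(chi / 6.0)

def in_ssl_window(chi, n, factor=10.0):
    """Crude check of 1 >> chi >> 1/N using an (assumed) factor of separation."""
    return (1.0 / chi >= factor) and (chi * n >= factor)
```

Increasing chi narrows the interface and raises the tension, and for chi of order unity the width becomes comparable to b, where the Gaussian chain model ceases to be applicable.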

The binary polymer blend exhibits a second-order unmixing transition. Close to the critical temperature the 
compositions of the two coexisting phases do not differ very much and the characteristic length scale of 
composition fluctuations ξ and the interfacial width w are large compared with the size R of the polymer coil. 
In this weak segregation limit polymer blends behave very similarly to mixtures of small molecules in the 
vicinity of the critical point. The difference between the composition of the coexisting phases and the 
composition of the mixture at the critical point defines the order parameter m of the unmixing transition (see 
the entry on phase transitions and critical phenomena; A2.5). It increases with a universal power law upon 
cooling the system below the critical temperature T_c: 

m \sim t^{\beta} \quad \text{with} \quad t = \frac{T_c - T}{T_c} > 0 \qquad (B3.6.10)


β is the critical exponent and t denotes the reduced distance from the critical temperature. In the vicinity of the 
critical point, the free energy can be expanded in terms of powers and gradients of the local order parameter 
m(r) = φ_A(r) − φ_B(r): 

\frac{F[m]}{\rho k_{\mathrm B} T} = \int d^3r \left[ f(m) + \frac{l^2}{2} (\nabla m)^2 \right] \quad \text{with} \quad f(m) = -\frac{t}{2} m^2 + \frac{1}{12} m^4 \qquad (B3.6.11)


This form is called a Ginzburg-Landau expansion. The first term f(m) corresponds to the free energy of a 
homogeneous (bulk-like) system and determines the phase behaviour. For t > 0 the function f exhibits two 
minima at m = ±√(3t). This value corresponds to the composition difference of the two coexisting phases. 
The second contribution specifies the cost of an inhomogeneous order parameter profile; l sets the typical 
length scale. 

The general form of the expansion is dictated by very general symmetry considerations; the specific 
coefficients for the example of a polymer blend can be derived from the self-consistent field theory. For a 
binary blend this yields l² = b²N/18. In mixtures of small molecules the coefficient is determined by the range 
of the interactions; in polymeric systems the coefficient is associated with the conformational entropy. It is the 
shape of the extended molecule and its deformation at a spatial inhomogeneity that gives rise to the free 
energy cost. 

Ginzburg-Landau models constitute a widely used example of continuum models. This class of continuum 
models describes the generic behaviour of all binary mixtures close to the unmixing transition. The properties 
of the specific model enter only via the coefficients of the expansion which set the energy scale and length 
scale. Extensions to different transitions (e.g., first-order transitions or microemulsion) are available (see also 
section B3.6.3). This approach, however, does not incorporate any structural detail of the underlying systems 
and hence becomes quantitatively inaccurate at lower temperatures, where the coexisting phases differ more 
strongly in their composition or the characteristic length scale (i.e. the correlation length ξ) becomes 
comparable to the size of the molecules. 

Within this continuum approach Cahn and Hilliard [48] have studied the universal properties of interfaces. 
While their elegant scheme is applicable to arbitrary free-energy functionals with a square gradient form, we 
illustrate it here for the important special case of the Ginzburg-Landau form. For an ideally planar interface 
the profile depends only on the distance z from the interfacial plane. In mean field approximation, the profile 
m(z) minimizes the free-energy functional (B3.6.11). This yields the Euler-Lagrange equation 

\frac{\delta F}{\delta m} = 0 \;\Rightarrow\; -t\,m + \frac{1}{3} m^3 - l^2 \frac{d^2 m}{dz^2} = 0 \qquad (B3.6.12)

which, in turn, is solved by a simple function 

m(z) = \pm\sqrt{3}\, t^{1/2} \tanh\left(\frac{z}{w}\right) \quad \text{with} \quad w = \frac{\sqrt{2}\, l}{t^{1/2}} \qquad (B3.6.13)

In the vicinity of the critical point (i.e. |t| ≪ 1) the interfacial width w is much larger than the microscopic 
length scale l and the Landau-Ginzburg expansion is applicable. 
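
That a hyperbolic tangent solves the Euler-Lagrange equation can be verified numerically. The sketch below assumes the bulk free energy f(m) = -(t/2) m^2 + m^4/12 with a gradient term (l^2/2)(dm/dz)^2, so that the stationarity condition reads -t m + m^3/3 - l^2 m'' = 0, and a trial profile sqrt(3t) tanh(z/w) with w = l sqrt(2/t). The second derivative is approximated by a central finite difference; the function names and the step size are illustrative.

```python
import math

def gl_profile(z, t, l):
    """Trial interface profile m(z) = sqrt(3t) * tanh(z / w), w = l * sqrt(2/t)."""
    w = l * math.sqrt(2.0 / t)
    return math.sqrt(3.0 * t) * math.tanh(z / w)

def euler_lagrange_residual(z, t, l, h=1e-3):
    """Residual of -t*m + m^3/3 - l^2 * m'' for the trial profile.

    m'' is estimated with a central difference of step h, so the
    residual vanishes only up to discretization and round-off error.
    """
    m = gl_profile(z, t, l)
    d2m = (gl_profile(z + h, t, l) - 2.0 * m + gl_profile(z - h, t, l)) / (h * h)
    return -t * m + m ** 3 / 3.0 - l * l * d2m
```

Far from the interface the profile saturates at the bulk values of the coexisting phases, and the width grows as t tends to zero, consistent with the divergence of the interfacial width at the critical point.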

Both Monte Carlo simulations of lattice models [49, 50] and spring-bead models [51] have been employed to 
study interfaces in polymeric systems. The simulations yield insight into the local properties of the polymeric 
fluid. Unlike in the Landau-Ginzburg expansion, the notion of polymers is retained and the orientation of the 
extended molecules at the interface or the enrichment of end segments have been studied. Moreover, the 
simulations incorporate fluctuations, which are ignored in the mean field approximation. In the vicinity of the 
critical temperature composition fluctuations are important. The mean field treatment overestimates the 
critical temperature and the binodals are flatter in the simulations, which exhibit 3D Ising critical behaviour 
(β = 0.324), than in the mean field case (β = 1/2). The importance of composition fluctuations can be gauged by the 
Ginzburg criterion [52]: the neglect of fluctuations is justified when the order parameter fluctuations in one 
'correlation volume' of linear size ξ are small compared with the order parameter itself. In the case of a symmetric 
binary polymer blend this condition yields 

\frac{\chi - \chi_c}{\chi_c} \gg \frac{N^2}{\rho^2 R^6} \sim \frac{1}{N} \qquad (B3.6.14)

Ultimately, in the vicinity of the critical point, composition fluctuations are important, but the region in which 
these fluctuations dominate the behaviour decreases with the chain length N. Qualitatively, the behaviour can 
be understood as follows: long-chain molecules do not fill space and strongly interdigitate; the number of 
other chains in the volume of a reference chain increases like √N with chain length. This large number of 
interaction partners results in a strong suppression of fluctuations in the interactions on the level of a whole 
molecule and, hence, replacing the interactions by a non-fluctuating mean field is a good approximation. 
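
The chain-length dependence of this argument can be made concrete. The sketch below assumes Gaussian coils of size R = b sqrt(N) at monomer density rho, so that a coil volume R^3 is shared by roughly rho R^3 / N chains, and it takes the width of the fluctuation-dominated region to scale as N^2 / (rho^2 R^6). Numerical prefactors of order unity are omitted, and the function names are illustrative.

```python
import math

def chains_per_coil_volume(n, rho=1.0, b=1.0):
    """Number of chains sharing the volume R^3 of one coil (prefactors omitted)."""
    r = b * math.sqrt(n)
    return rho * r ** 3 / n

def ginzburg_bound(n, rho=1.0, b=1.0):
    """Scale of the reduced distance from the critical point below which
    composition fluctuations dominate (prefactors omitted)."""
    r = b * math.sqrt(n)
    return n ** 2 / (rho ** 2 * r ** 6)
```

With rho b^3 of order unity, the number of interpenetrating chains grows like sqrt(N) while the fluctuation-dominated region shrinks like 1/N.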

Another important difference between the mean field treatment and the simulations or experiments concerns 
fluctuations of the local interfacial position. While the mean field treatment assumes a perfectly flat, planar 
interface right from the outset, the local interfacial position fluctuates in experiments and simulations. A 
typical snapshot of the local interface position, as obtained from a Monte Carlo simulation of a binary 
polymer blend, is depicted in figure B3.6.2. On not too small length scales the local position of the interface is 
smooth and without bubbles or overhangs. The system configuration can be described by two ingredients: the 
position u(r_∥) of the centre of the interface as a function of the lateral coordinates r_∥, and the local structure 
described by profiles across the interface. The latter quantities depend only on the coordinate normal to the 
interface. In many applications the coupling between the long-wavelength fluctuations of the local interfacial 
position u and the intrinsic profile is neglected. In this case the intrinsic profiles describe the variation of 
quantities across an ideally planar interface. The apparent interfacial profile p_app(z), which is averaged over 
fluctuations of the local interfacial position in experiments or simulations, can be approximated by a 
convolution of the intrinsic profile p_int(z) and the distribution P(u) of the local interface position [53] 

p_{\mathrm{app}}(z) = \int du\, P(u)\, p_{\mathrm{int}}(z - u) \qquad (B3.6.15)


If one is only interested in the properties of the interface on scales much larger than the width of the intrinsic 
profiles, the interface can be approximated by an infinitely thin sheet and the properties of the intrinsic 
profiles can be cast into a few effective parameters. Using only the local position of the interface, effective 
interface Hamiltonians describe the statistical mechanics of fluctuating interfaces and membranes. 

Figure B3.6.2. Local interface position in a binary polymer blend. After averaging the interfacial profile over 
small lateral patches, the interface can be described by a single-valued function u(r_∥) (Monge representation). 
Thermal fluctuations of the local interface position are clearly visible. From Werner et al [49]. 


B3.6.2.4 EFFECTIVE INTERFACE HAMILTONIANS 


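The fluctuations of the local interfacial position increase the effective area. This increase in area is associated 
with an increase of free energy ℋ which is proportional to the interfacial tension γ. The free energy of a 
specific interface configuration u(r_∥) can be described by the capillary wave Hamiltonian: 

\mathcal{H}[u] = \gamma \int d^2 r_\parallel \left\{ \sqrt{1 + \left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2} - 1 \right\} \approx \frac{\gamma}{2} \int d^2 r_\parallel\, (\nabla u)^2 \qquad (B3.6.16)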


The functional ℋ[u] can be diagonalized via a Fourier transformation with respect to the lateral coordinates r_∥. 
This results in 

\mathcal{H}[u] = \frac{\gamma}{2} \sum_{\mathbf{q}} q^2 |u(\mathbf{q})|^2 \qquad (B3.6.17)


In this Fourier representation the Hamiltonian is quadratic and the equipartition theorem yields for the thermal 
fluctuations: ⟨|u(q)|²⟩ = k_B T/(γq²). This spectrum corresponds to a Gaussian distribution of the local interface 
position: 

P(u) = \frac{1}{\sqrt{2\pi s^2}} \exp\left(-\frac{u^2}{2 s^2}\right) \quad \text{with} \quad s^2 = \frac{1}{4\pi^2} \int d^2q\, \frac{k_{\mathrm B} T}{\gamma q^2} = \frac{k_{\mathrm B} T}{2\pi\gamma} \ln\left(\frac{q_{\max}}{q_{\min}}\right) \qquad (B3.6.18)

where short- and long-wavelength cut-offs q_max and q_min have to be introduced to avoid the divergence 
at q → ∞ and q → 0. 

The interfacial fluctuations broaden laterally averaged profiles. Within the convolution approximation 
(B3.6.15) one obtains a profile with the shape of the erfc function [49], whose width is given by 

w_{\mathrm{app}}^2 = w_{\mathrm{int}}^2 + \frac{k_{\mathrm B} T}{4\gamma} \ln\left(\frac{q_{\max}}{q_{\min}}\right) \qquad (B3.6.19)

Thus, the apparent interfacial width w_app, which is measured in simulations or experiments, is larger than the 
intrinsic width w_int and depends via the wavevector cut-offs on the geometry considered. This can actually be 
used to measure the interfacial tension in computer simulations. 
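
The logarithmic broadening suggests a simple recipe for extracting the tension from apparent widths measured at two lateral system sizes. The sketch below assumes the broadening law w_app^2 = w_int^2 + (k_B T / 4 gamma) ln(q_max / q_min) and, as a further assumption, identifies the long-wavelength cut-off with q_min = 2 pi / L for a simulation cell of lateral size L. The function names are illustrative.

```python
import math

def apparent_width(w_int, kT_over_gamma, q_min, q_max):
    """Apparent width after capillary-wave broadening of the intrinsic width."""
    return math.sqrt(w_int ** 2 + 0.25 * kT_over_gamma * math.log(q_max / q_min))

def tension_from_widths(w1, size1, w2, size2):
    """Estimate gamma / (k_B T) from apparent widths at two lateral sizes.

    With q_min = 2*pi/L the intrinsic width and q_max cancel in the
    difference w2^2 - w1^2 = (k_B T / 4 gamma) * ln(size2 / size1).
    """
    return math.log(size2 / size1) / (4.0 * (w2 ** 2 - w1 ** 2))
```

Because only the difference of the squared widths enters, neither the intrinsic width nor the short-wavelength cut-off needs to be known for this estimate.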

For a free interface the cut-off at large length scales is determined by the lateral patch size on which the 
interface is observed. In simulations this is set by the size of the simulation cell. In scattering experiments 
(e.g., neutron reflectivity) it is associated with the lateral coherence length of the beam. If the coexisting 
phases differ in density, gravitation will give rise to a large-scale cut-off for capillary waves [54] 

q_{\min} = \sqrt{\frac{g\,\Delta\rho}{\gamma}} \qquad (B3.6.20)

where Δρ is the density difference and g the gravitational acceleration. Similarly, interactions with boundaries 
(e.g. van der Waals forces) limit fluctuations and give rise to a large length scale cut-off. In this case the cut- 
off depends on the distance between the interface and the wall and the cut-off imparts a dependence of the 
apparent interfacial width on the distance between the wall and the interface. 
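
The order of magnitude of the gravitational cut-off is easily estimated. The sketch below assumes the form q_min = sqrt(g * delta_rho / gamma), i.e. the inverse capillary length, and uses SI units throughout; the numbers quoted in the usage note are rough, illustrative values for a water/air interface and are not taken from the cited work.

```python
import math

def gravity_cutoff(delta_rho, gamma, g=9.81):
    """Long-wavelength capillary-wave cut-off q_min = sqrt(g * delta_rho / gamma).

    delta_rho in kg/m^3, gamma in N/m, g in m/s^2; the result is in 1/m.
    """
    return math.sqrt(g * delta_rho / gamma)

def capillary_length(delta_rho, gamma, g=9.81):
    """Capillary length 1 / q_min in metres."""
    return 1.0 / gravity_cutoff(delta_rho, gamma, g)
```

For a density difference of about 1000 kg/m^3 and a tension of about 0.072 N/m the capillary length is of the order of a few millimetres, far larger than any simulation cell, so that in simulations the cut-off is set by the system size instead.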

On short length scales the coarse-grained description breaks down, because the fluctuations which build up 
the (smooth) intrinsic profile and the fluctuations of the local interface position are strongly coupled and 
cannot be distinguished. The effective interface Hamiltonian can describe the properties only on length scales 
large compared with the width 1/q_max ~ w_int of the intrinsic profile. The absolute value of the cut-off is difficult 
to determine: the apparent profiles are experimentally accessible, but in order to use equation (B3.6.19) the 
width of a hypothetically flat interface without fluctuations has to be known. Polymer blends are suitable 
candidates for investigating this problem. Since the self-consistent field theory gives an accurate description 
of the interface profile except for 
fluctuations, it yields a quantitative description of the intrinsic, ideally flat, profile. The comparison with 
Monte Carlo simulations, which include fluctuations, then yields q_max. Simulations of a coarse-grained 
polymer blend by Werner et al find q_max = 1.65/w_int [49] in the strong segregation limit, in rather good 
agreement with the value q_max = 2/w_int suggested by analytical theory [55]. 

An important application of effective interface Hamiltonians is the description of wetting phenomena. If a binary mixture is 
confined, the wall of the container will favour one component of the mixture, say A. This component forms an 
enrichment layer at the wall, while the B component is expelled from the wall region. Rather than describing 
the detailed composition profile at the wall, the effective interface Hamiltonian specifies the system 
configuration solely by the distance between the A-rich enrichment layer at the wall and the B component 
further away. This coarse graining concept is sketched in figure B3.6.3 . The profile is distorted in the vicinity 
of the wall and this gives rise to a short-range effective interaction between the wall and the interface. The 
length scale of the interaction is set by the characteristic length scale of the (free) interface profile. Dispersion 
forces give rise to an additional long-ranged effective interaction, which decays like a power law with the 
distance l. The effective interfacial Hamiltonian takes the form: 

\mathcal{H}[l] = \int d^2 r_\parallel \left\{ \frac{\gamma(l)}{2} (\nabla l)^2 + g(l) \right\} \qquad (B3.6.21)


where g(l) denotes the effective interaction between the wall and a portion of the interface at a distance l. The 
first term corresponds to the capillary wave Hamiltonian. In general, the coefficient γ(l) in front of the square 
gradient depends on the distance between the wall and the interface [56], because the intrinsic profile is 
distorted in the presence of the wall. Only for large distances l → ∞ does the effective interfacial tension γ(l) 
tend to its macroscopic value. The second term describes the effective interface potential between the wall and 
the interface. Depending on the shape of the interface potential g(l) different situations are encountered [57]: 
if the effective potential exhibits a minimum in the vicinity of the wall, the interface is bound to the wall. This 
corresponds to a microscopically thin layer of the preferred component A at the wall. One says: A does not 
wet the wall. If g(l) has a minimum at infinite distance l → ∞ there is a macroscopically thick layer of A at the 
wall: component A wets the wall. The transition between both states is the wetting transition, which can be 
continuous (i.e. the thickness of the enrichment layer diverges upon approaching the transition temperature) or 
(most often) discontinuous (i.e. the thickness of the layer jumps from a microscopic value to a macroscopic 
one). 

Figure B3.6.3. Sketch of the coarse-grained description of a binary blend in contact with a wall. (a) 
Composition profile at the wall. (b) Effective interaction g(l) between the interface and the wall. The different 
potentials correspond to complete wetting, a first-order wetting transition and the non-wet state (from above 
to below). In case of a second-order transition there is no double-well structure close to the transition, but g(l) 
exhibits a single minimum which moves to larger distances as the wetting transition temperature is 
approached from below. (c) Temperature dependence of the thickness l of the enrichment layer at the wall. 
The jump of the layer thickness indicates a first-order wetting transition. In the case of a continuous transition 
the layer thickness would diverge continuously upon approaching T_wet from below. 

In the mean field considerations above, we have assumed a perfectly flat interface such that the first term in 
the Hamiltonian (B3.6.21) is ineffective. In fact, however, fluctuations of the local interface position are 
important, and their consequences have been studied extensively [57, 58]. 

B3.6.3 AMPHIPHILIC MODELS 


Another important class of materials which can be successfully described by mesoscopic and continuum 
models are amphiphilic systems. Amphiphilic molecules consist of two distinct entities that like different 
environments. Lipid molecules, for instance, comprise a polar head that likes an aqueous environment and one 
or two hydrocarbon tails that are strongly hydrophobic. Since the two entities are chemically joined together 
they cannot separate into macroscopically large phases. If these amphiphiles are added to a binary mixture 
(say, water and oil) they greatly promote the dispersion of one component into the other. At low amphiphile 
concentrations the molecules enrich at the interface so as to place their different ends into the corresponding 
phases. This displaces the water and the oil from the oil/water interface and greatly reduces the interfacial 
tension. At larger concentration of amphiphiles the molecules self-assemble into complex morphologies. 
These might either be isotropic (i.e. a microemulsion) or possess liquid crystalline order. The spatial structure 
is selected by a balance to minimize the contacts between the different entities and to fill space. Some of the 
possible morphologies are displayed in figure B3.6.4. Analogous morphologies are encountered in polymeric 
systems involving block copolymers. 



Figure B3.6.4. Illustration of three structured phases in a mixture of amphiphile and water. (a) Lamellar 
phase: the hydrophilic heads shield the hydrophobic tails from the water by forming a bilayer. The 
amphiphilic heads of different bilayers face each other and are separated by a thin water layer. (b) Hexagonal 
phase: the amphiphiles assemble into a rod-like structure where the tails are shielded in the interior from the 
water and the heads are on the outside. The rods arrange on a hexagonal lattice. (c) Cubic phase: amphiphilic 
micelles with a hydrophobic centre order on a BCC lattice. 

The relation between the architecture of the molecules and the spatial morphology into which they assemble 
has attracted longstanding interest because of the importance of amphiphiles in daily life. Lipid molecules are important 
constituents of the cell membrane. Amphiphilic molecules are of major importance for technological 
applications (e.g., in detergents and the food industry). 

The large length scale on which the self-assembly occurs and the universality of the morphologies borne out 
in experiments on a large variety of different systems make mesoscopic and continuum models suitable tools 
for investigating the underlying universal mechanism. Experiments suggest that many of the generic features 
can be captured by the amphiphilicity of the molecules. The models that have been employed can be broadly 
divided into models that aim at correlating the molecular architecture with the morphology and those models 
which investigate the generic phase behaviour and the influence of fluctuations. 

B3.6.3.1 CHAIN MODELS 

The architecture of the lipid molecules or the diblock copolymers results in the typical amphiphilic properties, 
like surface activity and self-assembly. On the most qualitative level, understanding of the self-assembly in 
lipid systems [3] is provided by a characterization of the molecules as a simple geometrical object ('wedge') 
parameterized by its volume, the maximum chain length and the area per head group. The different phases 
result from simple geometric packing considerations. Similar arguments on the balance between chain 
stretching and interfacial tension yield the qualitative features of the phase diagrams in systems containing 
diblock copolymers [59]. 

Chain models capture the basic elements of the amphiphilic behaviour by retaining details of the molecular 
architecture. Ben-Shaul et al [60] and others [61] explored the organization of the hydrophobic portion in lipid 
micelles and bilayers by retaining the conformational statistics of the hydrocarbon tail within the RIS 
(rotational isomeric state) model [4, 5] while representing the hydrophilic/hydrophobic interface merely by an 


effective tension. By invoking a mean field approximation and calculating the properties of the tails by an 
enumeration of a large sample of conformations, they investigated the packing effects inside the hydrocarbon 
core for various detailed chain architectures. This mean field technique has been extended, for example, to 
include a modelling of the hydrophilic head and to study the self-assembly of lipids in aqueous solutions [62] 
or to investigate the adsorption of proteins at surfaces covered with a polymer brush [63]. 

Many simulational approaches use a coarse-grained description of the amphiphiles by representing them via 
short-chain molecules on a lattice. The lattice is there only for computational convenience but is assumed to 
play no role otherwise. Typically, the number of lattice sites to model the amphiphiles is small and does not 
exceed 32. Each site is conceived as a small number of atomistic units along the amphiphilic molecule. A 
particularly popular model has been suggested by Larson [64]. There are two types of sites: hydrophilic and 
hydrophobic. Hydrophobic sites correspond to the oil or the hydrocarbon tail of the amphiphiles; hydrophilic 
sites represent the polar head of the amphiphiles or water. Oil and water are modelled as single-site entities. 
There is a short-range repulsion between unlike segments. The phase diagram of ternary 
oil/amphiphiles/water and binary amphiphile/water mixtures has been investigated by Monte Carlo 
simulations. Many phases observed in experiments (disordered, lamellar, hexagonal and even the gyroid 
phase) can be obtained as a function of temperature, composition and architecture of the amphiphile. Of 
course, special care has to be devoted to the study of finite-size effects. Typically, only a small number of unit 
cells of the spatially periodic structure fit into a simulation cell. If the size of the simulation box is close to a 
multiple of the unit cell size the stability of the phase might be greatly enhanced; if the size of the simulation 
box is incompatible with the spatially periodic structure the morphology is strongly distorted and its stability 
reduced. Very similar effects occur in nature if a spatially periodic structure is confined into a thin film. 
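
The commensurability argument above can be made concrete with a short sketch. It assumes periodic boundary conditions, so that a whole number of periods must span the box; the function name `strained_period` and the numbers in the usage loop are hypothetical and serve only as illustration.

```python
import math


def strained_period(box_length, natural_period):
    """Return (n, period, strain) for the periodic structure that best fits the box.

    With periodic boundaries an integer number n of periods must span the box,
    so the realized period is box_length / n rather than the natural one.
    """
    ratio = box_length / natural_period
    # Candidate numbers of periods: the integers bracketing the ideal ratio.
    candidates = {max(1, math.floor(ratio)), max(1, math.ceil(ratio))}
    n = min(candidates, key=lambda k: abs(box_length / k - natural_period))
    period = box_length / n
    strain = (period - natural_period) / natural_period
    return n, period, strain


if __name__ == "__main__":
    for box in (30.0, 35.0, 40.0):
        n, period, strain = strained_period(box, natural_period=10.0)
        print(f"L = {box:5.1f}: n = {n}, period = {period:.3f}, strain = {strain:+.1%}")
```

A box of length 35 with a natural period of 10 forces the structure to compress its period to 8.75, whereas boxes of length 30 or 40 accommodate it without strain.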

A multitude of different variants of this model has been investigated using Monte Carlo simulations (see, for 
example [65]). The studies aim at correlating the phase behaviour with the molecular architecture and 
revealing the local structure of the aggregates. This type of model has also proven useful for studying rather 
complex structures (e.g., vesicles or pores in bilayers). 

For structures with a high curvature (e.g., small micelles) or situations where orientational interactions 
become important (e.g., the gel phase of a membrane) lattice-based models might be inappropriate. Off-lattice 
models for amphiphiles, which are quite similar to their counterparts in polymeric systems, have been used to 
study the self-assembly into micelles [66], or to explore the phase behaviour of Langmuir monolayers [67] 
and bilayers. In those systems, various phases with a nematic ordering of the hydrophobic tails occur. 

Since the amphiphilic nature is essential for the phase behaviour, systems of small molecules (e.g., lipid-water 
mixtures) and polymeric systems (e.g., homopolymer-copolymer blends) share many common features. 
Within the mean field approximation, the phase behaviour of block-copolymer models can conveniently be 
explored in the framework of the Gaussian chain model. The investigation of the self-assembly into various 
complex phases takes advantage of a Fourier decomposition of the spatially varying densities. The phase 
diagrams for pure diblock copolymers, binary blends of diblock copolymers and binary and ternary solutions 
have been investigated [44, 48]. These calculations reveal a rich variety of different morphologies as a 
function of the incompatibility, architecture and amount of homopolymer 'solvent'. In binary and ternary 
solutions, highly swollen phases are found in which the periodicity of the structure far exceeds the radius of 
gyration. An example of the possible phases in a ternary blend of two homopolymers and a symmetric diblock 
copolymer is presented in figure B3.6.5. At a fixed incompatibility one finds a complex phase diagram, 
including disordered homopolymer-rich phases, a symmetric lamellar phase L and asymmetric swollen 
lamellar phases $L_A$ and $L_B$, which accommodate different amounts of the homopolymer components. Note 
that very similar phase diagrams are found in ternary oil-water-amphiphile mixtures. In the self-consistent 
field calculations not only the phase behaviour but also effective properties of the internal interfaces (e.g., the 
interfacial tension or bending moduli) are accessible [69]. The latter information might serve as input to 
effective interface Hamiltonians. 





Figure B3.6.5. Phase diagram of a ternary polymer blend consisting of two homopolymers, A and B, and a 
symmetric AB diblock copolymer as calculated by self-consistent field theory. All species have the same 
chain length N and the figure displays a cut through the phase prism at $\chi N = 11$ (which corresponds to weak 
segregation). The phase diagram contains two homopolymer-rich phases A and B, a symmetric lamellar phase 
L and asymmetric lamellar phases, which are rich in the A component ($L_A$) or rich in the B component ($L_B$), 
respectively. From Janert and Schick [68]. 

Figure B3.6.6. Illustration of the two principal radii of curvature for a membrane. 


In general, the self-consistent field calculations are accurate for long polymers; for short chains, however, 
fluctuations become important. Self-consistent field calculations and simulations of polymeric models show, 
for example, that the bending rigidity of copolymer-laden interfaces might be quite small for short chain 
lengths. In a region where the self-consistent field theory predicts a highly swollen lamellar phase, the 
lamellar order is unstable with respect to interfacial fluctuations and a microemulsion forms [70]. Polymeric 
microemulsions have been observed in simulations [71] and experiments [72], but they are much more 
common in small molecular amphiphilic systems. Similarly, fluctuations easily destroy the body-centred 
cubic arrangement of micelles found in the self-consistent field theory and lead to formation of a micellar 
solution. 


These chain models are well suited to investigate the dependence of the phase behaviour on the molecular 
architecture and to explore the local properties (e.g., enrichment of amphiphiles at interfaces, molecular 
conformations at interfaces). In order to investigate the effect of fluctuations on large length scales or the 
shapes of vesicles, more coarse-grained descriptions have to be explored. 

B3.6.3.2 LATTICE MODELS 

A further step in coarse graining is accomplished by representing the amphiphiles not as chain molecules but 
as single site/bond entities on a lattice. The characteristic architecture of the amphiphile — the hydrophilic 
head and hydrophobic tail — is lost in this representation. Instead, the interaction between the different lattice 
sites, which represent the oil, the water and the amphiphile, have to be carefully constructed in order to bring 
about the amphiphilic behaviour. 

As early as 1969, Wheeler and Widom [73] formulated a simple lattice model to describe ternary mixtures. 
The bonds between lattice sites are conceived as particles. A bond between two positive spins corresponds to 
water, a bond between two negative spins corresponds to oil and a bond connecting opposite spins is 
identified with an amphiphile. The contact between hydrophilic and hydrophobic units is made infinitely 
repulsive; hence each lattice site is occupied by either hydrophilic or hydrophobic units. These two states of a 
site are described by a spin variable $s_i$, which can take the values +1 and -1. Obviously, oil/water interfaces 
are always completely covered by amphiphilic molecules. The Hamiltonian of this Widom model takes the 
form 

$$\mathcal{H} = -h \sum_i s_i - J \sum_{\langle ij \rangle} s_i s_j - 2M \sum_{\langle\langle ij \rangle\rangle} s_i s_j - M \sum_{\langle\langle\langle ij \rangle\rangle\rangle} s_i s_j \tag{B3.6.22}$$

where $\langle ij \rangle$, $\langle\langle ij \rangle\rangle$ and $\langle\langle\langle ij \rangle\rangle\rangle$ denote nearest, next-nearest and fourth-nearest neighbours on the lattice, 
respectively. The first two terms correspond to the Ising Hamiltonian. h acts as a chemical potential which 
favours positive spins, while J controls the incompatibility between water and oil. If only these two terms 
were present the model would describe a simple binary mixture. The additional terms have to be incorporated 
to bring about the amphiphilic properties. The case of negative M has been much investigated. In this case, the 
third term imparts some kind of bending rigidity to the oil/water interface, while the fourth term favours 
sequences of the form (··· + + − − + + − − ···). This leads to the formation of lamellar phases which are not 
directly tied to the lattice spacing. 
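
As an illustration, the energy of a given spin configuration can be evaluated numerically. The sketch below assumes a periodic simple cubic lattice and a Hamiltonian with a field term, a nearest-neighbour coupling J, a next-nearest-neighbour coupling 2M and a fourth-nearest-neighbour coupling M, all entering with a minus sign. The function names and parameter values are hypothetical, and the code only compares two trial configurations; it does not search for the ground state.

```python
import itertools

import numpy as np


def widom_energy(s, h=0.0, J=1.0, M=-0.2):
    """Energy of a +1/-1 spin array s on a periodic simple cubic lattice.

    Every pair is counted once. The linear size must be at least 5 in each
    direction so that shifts by two lattice spacings do not double-count pairs.
    """
    s = np.asarray(s, dtype=float)
    axes = range(s.ndim)
    # Nearest neighbours: one lattice spacing along each axis.
    nn = sum(np.sum(s * np.roll(s, 1, axis=a)) for a in axes)
    # Next-nearest neighbours: face diagonals, two directions per pair of axes.
    nnn = 0.0
    for a, b in itertools.combinations(axes, 2):
        for sign in (1, -1):
            nnn += np.sum(s * np.roll(np.roll(s, 1, axis=a), sign, axis=b))
    # Fourth-nearest neighbours: two lattice spacings along each axis.
    fourth = sum(np.sum(s * np.roll(s, 2, axis=a)) for a in axes)
    return -h * np.sum(s) - J * nn - 2.0 * M * nnn - M * fourth


def lamellar(L, period):
    """Layers stacked along the last axis: period/2 planes of +1, then period/2 of -1."""
    z = np.arange(L)
    profile = np.where((z % period) < period // 2, 1.0, -1.0)
    return np.broadcast_to(profile, (L, L, L)).copy()


if __name__ == "__main__":
    L = 8
    uniform = np.ones((L, L, L))
    for name, config in (("uniform", uniform), ("lamellar (++--)", lamellar(L, 4))):
        print(f"{name:16s} E/N = {widom_energy(config) / config.size:+.3f}")
```

For a uniform configuration the energy per site is -(h + 3J + 15M), while the stack with two layers of each sign gives -(2J + 5M) at h = 0, so for sufficiently negative M the layered arrangement lies lower than the uniform one.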

Slightly more complex models treat the water, the amphiphile and the oil as three distinct variables 
corresponding to the spin variables S = +1, 0 and -1. The most general Hamiltonian with nearest-neighbour 
interactions has the form 

$$\mathcal{H} = -\sum_{\langle ij \rangle} \left\{ J S_i S_j + K S_i^2 S_j^2 + C \left( S_i^2 S_j + S_i S_j^2 \right) \right\} - \sum_i \left\{ H S_i - \Delta S_i^2 \right\} \tag{B3.6.23}$$

This Blume-Emery-Griffiths (BEG) model [74] has been studied both by mean field calculations and by 
simulations. There is no pronounced difference between the amphiphile molecules (S = 0), the oil or the 
water. Indeed, the model was first suggested in a quite different context. An extension of the model by Schick 
and Shih [75] includes an additional interaction of the form 

$$-L \sum_{\langle ijk \rangle} S_i \left( 1 - S_j^2 \right) S_k \tag{B3.6.24}$$

where $\langle ijk \rangle$ denotes three sites in a line. For negative values of L the term favours local conformations in 
which the amphiphile sits between the water and the oil. The model exhibits an oil-rich phase, a water-rich 
phase, a lamellar phase and a disordered phase, which exists between the lamellar phase and the oil-water 
coexistence. The disordered phase consists of water and oil domains separated by amphiphile sheets. It is 
homogeneous on large length scales, but shows — for certain parameter regions — oscillating structure on 
smaller length scales. These two length scales are associated with the structure of a microemulsion. The 
period of the oscillations characterizes the local domain size in the microemulsion, while this nearly 
liquid-crystalline order 'dephases' on larger length scales. This defines the persistence length $\xi$. The latter length 
scale is mesoscopic (i.e. of the order of 100 Å), while the former is roughly the size of the molecules. 
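
The role of the three-site interaction can be checked on a short chain. The sketch below assumes the commonly quoted form of the extra term, -L S_i (1 - S_j^2) S_k summed over three consecutive sites, which is non-zero only when the middle site holds an amphiphile (S = 0). The function name and the one-dimensional setting are hypothetical simplifications.

```python
def three_site_term(spins, L):
    """Sum of -L * S_i * (1 - S_j**2) * S_k over consecutive triples of an open chain.

    spins is a sequence with entries +1 (water), 0 (amphiphile) and -1 (oil).
    """
    total = 0.0
    for s_i, s_j, s_k in zip(spins, spins[1:], spins[2:]):
        total += -L * s_i * (1 - s_j ** 2) * s_k
    return total


if __name__ == "__main__":
    L = -1.0  # negative L, the case discussed in the text
    for label, chain in (
        ("water-amphiphile-oil", [+1, 0, -1]),
        ("water-amphiphile-water", [+1, 0, +1]),
        ("water-water-oil", [+1, +1, -1]),
    ):
        print(f"{label:24s} energy = {three_site_term(chain, L):+.1f}")
```

With negative L the water-amphiphile-oil triple lowers the energy, an amphiphile surrounded by water on both sides raises it, and triples without a central amphiphile do not contribute.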

Lattice models have been studied in mean field approximation, by transfer matrix methods and Monte Carlo 
simulations. Much interest has focused on the occurrence of a microemulsion. Its location in the phase 
diagram between the oil-rich and the water-rich phases, its structure and its wetting properties have been 
explored [76]. Lattice models reproduce the reduction of the surface tension upon adsorption of the 
amphiphiles and the progression of phase equilibria upon increasing the amphiphile concentration. Spatially 
periodic (lamellar) phases are also describable by lattice models. However, the structure of the lattice can 
interfere with the properties of the periodic structures. 

B3.6.3.3 CONTINUUM MODELS 

An even coarser description is attempted in Ginzburg-Landau-type models. These continuum models describe 
the system configuration in terms of one or several continuous order parameter fields. These fields are 
thought to describe the spatial variation of the composition. Similar to spin models, the amphiphilic properties 
are incorporated into the Hamiltonian by construction. The Hamiltonians are motivated by fundamental 
symmetry and stability criteria and offer a unified view on the general features of self-assembly. The 
universal, generic behaviour — the possible morphologies and effects of fluctuations, for instance — rather than 
the description of a specific material is the subject of these models. 

An important example is the one-order-parameter model invented by Gompper and Schick [77], which 
describes a ternary mixture in terms of the density difference $\phi$ between water and oil: 

$$F[\phi(\mathbf{r})] = \int \mathrm{d}\mathbf{r} \left\{ f(\phi) + g(\phi)\,|\nabla\phi|^2 + c\,|\Delta\phi|^2 \right\} \tag{B3.6.25}$$

The first two terms resemble the Ginzburg-Landau Hamiltonian for the polymeric systems. $f(\phi)$ describes the 
bulk free energy and there is a gradient square term to account for the free-energy costs of a spatially varying 
order parameter profile. In principle, the functions f, g and c of this Ginzburg-Landau expansion can be 
derived from a more microscopic model (e.g., the lattice models of the previous section). Such a derivation 
serves to relate the input parameters of the Ginzburg-Landau theories to microscopic parameters (e.g., length 
of the amphiphile) of the underlying model. The coefficients f, g and c are also related to the scattering 
intensity and, hence, some guidance from the experiment is available on how to choose them. Though the 
amphiphiles do not occur explicitly in the description, they determine the density dependence of the functions 
f and g. In order to model three-phase coexistence between an oil-rich phase, a water-rich phase and a 
microemulsion with roughly equal amounts of oil and water, the function f has to exhibit three minima. The 
amphiphiles decrease the free-energy cost of interfaces. This is modelled by a negative value of g in some 
intermediate composition range. This favours the formation of interfaces and, therefore, the third term (with 
c > 0) is required to ensure thermodynamic stability. 
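
Why a negative g selects a finite length scale, and why c > 0 is then needed, can be seen in a harmonic approximation. The sketch below assumes a quadratic bulk term f = a φ² and constant g and c, which is a simplification of the composition-dependent functions in the text. For a profile φ = A cos(kx) the free energy per unit volume is then (A²/2)(a + g k² + c k⁴). The function names are hypothetical.

```python
import numpy as np


def mode_free_energy(k, a, g, c):
    """Free-energy density of a mode cos(kx), in units of A**2 / 2."""
    return a + g * k ** 2 + c * k ** 4


def preferred_wavevector(g, c):
    """Wavevector that minimizes a + g k^2 + c k^4 (requires c > 0)."""
    if c <= 0:
        raise ValueError("c must be positive, otherwise the energy is unbounded below")
    # For g >= 0 the homogeneous state (k = 0) is optimal; for g < 0 a finite
    # wavevector k* = sqrt(-g / (2c)) has the lowest free-energy cost.
    return float(np.sqrt(-g / (2.0 * c))) if g < 0 else 0.0


if __name__ == "__main__":
    a, g, c = 1.0, -1.0, 0.5
    k_star = preferred_wavevector(g, c)
    print("preferred wavevector:", k_star)
    print("free-energy cost at k*:", mode_free_energy(k_star, a, g, c))
```

With these illustrative numbers the cost at the preferred wavevector stays positive, so the homogeneous state remains stable but carries structure on the scale 2π/k*; once g² exceeds 4ac the cost becomes negative and a modulated phase is favoured.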

By virtue of their simple structure, some properties of continuum models can be solved analytically in a mean 
field approximation. The phase behaviour, the interfacial properties and the wetting properties have been explored. 
Monte Carlo simulations have been used to investigate the effect of fluctuations as well as non-equilibrium phenomena 
(e.g., phase separation kinetics). Extensions of this one-order-parameter model are described in the review by 
Gompper and Schick [76]. A very interesting feature of these models is that effective quantities of the 
interface — like the interfacial tension and the bending moduli — can be expressed as a functional of the order 
parameter profiles across an interface [78]. These quantities can then be used as input for an even more 
coarse-grained description. 

B3.6.3.4 RANDOM INTERFACE MODELS 

Most characteristics of amphiphilic systems are associated with the alteration of the interfacial structure by the 
amphiphile. Addition of amphiphiles might reduce the free-energy costs by a dramatic factor (up to 10 dyn 
cm in the oil/water/amphiphile mixture). Adding amphiphiles to a solution or a mixture often leads to the 
formation of a microemulsion or spatially ordered phases. In many aspects these systems can be conceived as 
an assembly of internal interfaces. The interfaces might separate oil and water in a ternary mixture or they 
might be amphiphilic bilayers in binary solutions. Random interface models study the large-scale structure of 
amphiphilic systems by 
describing the configuration of the local, instantaneous interfacial position. 

The effective free energy of the system of interfaces takes the general form [79, 80 and 81] 

$$\mathcal{H} = \int \mathrm{d}S \left\{ \gamma + \lambda H + 2\kappa H^2 + \bar{\kappa} K \right\} \tag{B3.6.26}$$

where dS denotes the surface element, H the local mean curvature and K the local Gaussian curvature of the 
interface. The latter two quantities are related to the two principal radii of curvature via $2H = 1/R_1 + 1/R_2$ and 
$K = 1/(R_1 R_2)$. The interfacial tension $\gamma$ controls the area of the interface. $\lambda$ describes a spontaneous curvature of 
the interface, which is related to an asymmetry of the interface. This might occur even in a bilayer, when it is 
composed of an amphiphilic mixture and the two sheets have different compositions. The coefficients $\kappa$ and 
$\bar{\kappa}$ characterize the bending rigidity and the saddle-splay modulus, respectively. If the interface is closed, the 
Gauss-Bonnet theorem relates $\int \mathrm{d}S\, K = 2\pi\chi_E$ to the Euler characteristic $\chi_E$. Since this quantity is a 
topological invariant, the last term in the Hamiltonian can be omitted if the topology of the interface does not 
change (e.g., in the case of a vesicle). 
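
A spherical vesicle gives a simple worked case. The sketch below assumes a curvature energy density of the form γ + λH + 2κH² + κ̄K and evaluates it on a sphere of radius R, where both principal radii equal R. The function names and numerical values are hypothetical.

```python
import math


def sphere_energy(R, gamma, lam, kappa, kappa_bar):
    """Curvature energy of a sphere of radius R for the density
    gamma + lam*H + 2*kappa*H**2 + kappa_bar*K."""
    area = 4.0 * math.pi * R ** 2
    H = 1.0 / R        # mean curvature, from 2H = 1/R1 + 1/R2 with R1 = R2 = R
    K = 1.0 / R ** 2   # Gaussian curvature, K = 1/(R1*R2)
    return area * (gamma + lam * H + 2.0 * kappa * H ** 2 + kappa_bar * K)


def euler_characteristic_of_sphere(R):
    """Integral of K over the sphere divided by 2*pi (Gauss-Bonnet)."""
    return (4.0 * math.pi * R ** 2) * (1.0 / R ** 2) / (2.0 * math.pi)


if __name__ == "__main__":
    for R in (1.0, 10.0):
        bending = sphere_energy(R, gamma=0.0, lam=0.0, kappa=1.0, kappa_bar=-0.5)
        print(f"R = {R:5.1f}: bending energy = {bending:.4f}, "
              f"Euler characteristic = {euler_characteristic_of_sphere(R):.1f}")
```

Without tension and spontaneous curvature the energy equals 4π(2κ + κ̄) for every radius, which reflects the scale invariance of the bending terms, and the Gaussian-curvature integral returns the Euler characteristic 2 of a closed surface with spherical topology.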

One can regard the Hamiltonian (B3.6.26) above as a phenomenological expansion in terms of the two 
invariants K and H of the surface. To establish the connection to the effective interface Hamiltonian (B3.6.16) 
it is instructive to consider the limit of an almost flat interface. Then, the local interface position u can be 
expressed as a single-valued function of the two lateral parameters, $u(\mathbf{r}_\parallel)$. In this Monge representation the 
interface Hamiltonian can be written as 

$$\mathcal{H} = \int \mathrm{d}^2 r_\parallel \left\{ \frac{\gamma}{2} |\nabla u|^2 + \frac{\kappa}{2} |\Delta u|^2 \right\} \tag{B3.6.27}$$


Among the different problems which have been tackled with random interface Hamiltonians are the 
following. (i) The phase diagram of the random interface Hamiltonian has been explored by Huse and Leibler 
[82]. The phase diagram comprises a droplet phase, in which the minority component is dissolved into the 
matrix of the majority component, disordered phases and lamellar phases. (ii) Much interest has focused on 
the role of fluctuations. In the presence of a wall or another interface, the fluctuations of the local interface 
position are restricted. This gives rise to an entropic repulsion between the fluctuating interface and confining 
boundaries (Helfrich interaction [83]). (iii) In order to avoid the free-energy cost of a rim, membranes close 
up to form vesicles. The shapes of vesicles as a function of the bending rigidity and the pressure difference 
between the vesicle's interior and the outside have been mapped out [84]. 


B3.6.4 APPLICATIONS TO DYNAMIC PHENOMENA 

Though this entry has focused on equilibrium properties, mesoscopic and continuum models in chemical 
physics can also describe non-equilibrium phenomena, and we shall mention some techniques briefly. 

Mesoscopic models can often be treated by molecular dynamics simulations. This method generates a realistic 
(Hamiltonian) trajectory in the phase space of the model from which information about the equilibrium 
dynamics can readily be extracted. The application to non-equilibrium phenomena (e.g., the kinetics of phase 
separation) is, in principle, straightforward. 

Exploring the hydrodynamic behaviour of complex fluids with conventional molecular dynamics models 
poses a challenge to computational resources, because the hydrodynamic behaviour appears only on large 
time and length scales. Coarse-grained models [85, 86 and 87] have been explored in which a particle does 
not correspond to a molecule or a small number of atoms but rather to a fluid element. These 'fluid 
particles' [86] interact with an extremely soft potential, which does not diverge as two effective particles 
approach each other (in contrast to the Lennard-Jones potential (B3.6.1)); instead, the repulsion increases only 
linearly as the interparticle distance decreases. This very soft repulsive potential allows for very large time steps in a molecular 
dynamics simulation. As the particles correspond to coarse-grained fluid elements they do not conserve 
energy when they collide. This provides a motivation for a dissipative friction force and a random force. The 
strength of the friction and the noise are related by a fluctuation-dissipation theorem, which ensures that the 
equilibrium distribution corresponds to the canonical ensemble. Unlike the standard implementation of noise 
and friction forces in molecular dynamics schemes, noise and friction in the dissipative particle dynamics do 
not act on the velocity of a single particle but on pairs of particles. In this way momentum is conserved. The 
macroscopic behaviour is not diffusive but hydrodynamic (note, however, that the energy is not conserved and 
there is no transport equation of the energy). This method promises to be an efficient way to study dynamic 
effects on the mesoscopic scale of complex fluids. An application of dissipative particle dynamics to a binary 
homopolymer blend is described in [87]. 
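
The pairwise structure of the forces can be sketched as follows. The code assumes a widely used form of dissipative particle dynamics with a linear weight function w(r) = 1 - r/r_c, a random-force amplitude σ tied to the friction γ by σ² = 2γk_BT, and the same random number for both partners of a pair. The function name, argument names and default values are illustrative assumptions, not the specific choices of [85, 86 and 87].

```python
import numpy as np


def dpd_pair_force(r_i, r_j, v_i, v_j, a=25.0, gamma=4.5, kT=1.0,
                   r_c=1.0, dt=0.01, rng=None):
    """Force on particle i due to particle j (the force on j is its negative)."""
    rng = np.random.default_rng() if rng is None else rng
    r_ij = np.asarray(r_i, dtype=float) - np.asarray(r_j, dtype=float)
    r = np.linalg.norm(r_ij)
    if r >= r_c or r == 0.0:
        return np.zeros_like(r_ij)          # no interaction beyond the cutoff
    e_ij = r_ij / r                         # unit vector from j to i
    v_ij = np.asarray(v_i, dtype=float) - np.asarray(v_j, dtype=float)
    w = 1.0 - r / r_c                       # soft weight, finite at r -> 0
    sigma = np.sqrt(2.0 * gamma * kT)       # fluctuation-dissipation relation
    f_conservative = a * w                  # grows only linearly as r decreases
    f_dissipative = -gamma * w ** 2 * np.dot(e_ij, v_ij)
    f_random = sigma * w * rng.standard_normal() / np.sqrt(dt)
    # All three contributions act along the line of centres.
    return (f_conservative + f_dissipative + f_random) * e_ij


if __name__ == "__main__":
    r_i, r_j = np.array([0.0, 0.0, 0.0]), np.array([0.3, 0.4, 0.0])
    v_i, v_j = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.5, 0.0])
    f_ij = dpd_pair_force(r_i, r_j, v_i, v_j, rng=np.random.default_rng(0))
    f_ji = dpd_pair_force(r_j, r_i, v_j, v_i, rng=np.random.default_rng(0))
    print("F_ij + F_ji =", f_ij + f_ji)
```

Because friction and noise depend only on relative positions and velocities and act along the line of centres, the two forces of a pair cancel, which is the momentum conservation referred to in the text. In a simulation the pair force would be computed once and applied with opposite signs to the two particles.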

Monte Carlo schemes generate a stochastic trajectory through phase space (see the entry about statistical 
mechanical simulations; B3.3). If the Monte Carlo moves resemble the configurational changes in a realistic 
dynamics (e.g., the conformations evolve via small displacements of particles) some dynamical information 
can be gained. Since there is no momentum in Monte Carlo simulations the dynamics is diffusive. However, 
many Monte Carlo algorithms employ moves that involve rather large changes in the system conformation 
(e.g., deletion of a molecule and subsequent insertion at a random position). These 'unphysical' moves are 
extremely efficient in propagating the system through configuration space, but they do not allow for a 
dynamic interpretation of the trajectory. 
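
A local-move scheme of the kind that admits a dynamic interpretation can be sketched with the Metropolis acceptance rule. The example assumes a single particle in a one-dimensional harmonic potential; the function names, the potential and the parameter values are hypothetical choices for illustration.

```python
import math
import random


def metropolis_step(x, energy, max_step, beta, rng):
    """Attempt one small random displacement; return (new_x, accepted)."""
    trial = x + rng.uniform(-max_step, max_step)
    delta = energy(trial) - energy(x)
    # Metropolis criterion: downhill moves are always accepted, uphill moves
    # with probability exp(-beta * delta).
    if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
        return trial, True
    return x, False


def mean_square_position(n_steps, beta=1.0, max_step=1.0, seed=0):
    """Average of x**2 along a trajectory in the potential E(x) = x**2 / 2."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n_steps):
        x, _accepted = metropolis_step(x, lambda y: 0.5 * y * y, max_step, beta, rng)
        total += x * x
    return total / n_steps


if __name__ == "__main__":
    print("<x^2> from local moves:", mean_square_position(50000))
    print("canonical value 1/beta:", 1.0)
```

Each move changes the position by at most `max_step`, so the trajectory resembles an overdamped, diffusive motion; replacing the small displacement by a jump to a random position would sample the same distribution but lose this interpretation.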

A lattice scheme which does capture hydrodynamic behaviour is the lattice Boltzmann method [88, 89, 90 and 
91]. This method has been devised as an effective numerical technique of computational fluid dynamics. The 
basic variables are the time-dependent probability distributions $f_a(\mathbf{x}, t)$ of a velocity class a on a lattice site x. 
This probability distribution is then updated in discrete time steps using a deterministic local rule. A careful 
choice of the lattice and the set of velocity vectors minimizes the effects of lattice anisotropy. This scheme has 
recently been applied to study the formation of lamellar phases in amphiphilic systems [92, 93]. 
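
One possible realization of such an update rule is sketched below for a single-component fluid. It assumes the two-dimensional nine-velocity (D2Q9) lattice and a single-relaxation-time (BGK) collision rule with relaxation time `tau`; both are common choices but are assumptions here, as are all function names. The amphiphilic models of [92, 93] require additional ingredients that are not shown.

```python
import numpy as np

# D2Q9 velocity set and weights (assumed here; not specified in the text).
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def equilibrium(rho, u):
    """Second-order equilibrium distributions, shape (9, nx, ny)."""
    cu = np.einsum("ad,dxy->axy", C, u)       # c_a . u on every site
    usq = np.sum(u ** 2, axis=0)
    return W[:, None, None] * rho * (1.0 + 3.0 * cu + 4.5 * cu ** 2 - 1.5 * usq)


def lb_step(f, tau=0.8):
    """One local BGK collision followed by streaming on a periodic lattice."""
    rho = f.sum(axis=0)                               # local density
    u = np.einsum("ad,axy->dxy", C, f) / rho          # local velocity
    f = f - (f - equilibrium(rho, u)) / tau           # relax towards equilibrium
    for a, (cx, cy) in enumerate(C):                  # move along velocity c_a
        f[a] = np.roll(f[a], shift=(cx, cy), axis=(0, 1))
    return f


if __name__ == "__main__":
    nx = ny = 32
    rho0 = np.ones((nx, ny))
    rho0[nx // 2, ny // 2] = 1.1                      # small density bump
    f = equilibrium(rho0, np.zeros((2, nx, ny)))
    mass0 = f.sum()
    for _ in range(50):
        f = lb_step(f)
    print("relative mass change:", (f.sum() - mass0) / mass0)
```

The collision step conserves mass and momentum on every site and the streaming step only moves populations between sites, so both quantities are conserved globally up to rounding errors.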

Analytic techniques often use a time-dependent generalization of Landau-Ginzburg free-energy functionals. 
The different universal dynamic behaviours have been classified by Hohenberg and Halperin [94]. In the 
simple example of a binary fluid (model B) the concentration difference can be used as an order parameter m. 
A gradient in the local chemical potential $\mu(\mathbf{r}) = \delta F/\delta m(\mathbf{r})$ gives rise to a current j 

$$\mathbf{j} = -\Lambda \nabla \frac{\delta F}{\delta m(\mathbf{r})} \tag{B3.6.28}$$

which strives to minimize the free energy. The kinetic coefficient $\Lambda$ denotes a phenomenological constant 
which sets the time scale. In complex fluids (e.g., polymer blends) the relation between the gradient of the 
chemical potential and the current is non-local and the kinetic coefficient has to be generalized [95]. If the 
order parameter is conserved (e.g., in the demixing of a binary mixture) the change of the order parameter and 
the current are related by the continuity equation 

$$\frac{\partial m(\mathbf{r})}{\partial t} = -\nabla \cdot \mathbf{j} = \nabla \cdot \Lambda \nabla \frac{\delta F}{\delta m(\mathbf{r})} \tag{B3.6.29}$$

This time development of the order parameter is completely deterministic; when the equilibrium $\mu(\mathbf{r})$ = const 
is reached the dynamics comes to rest. Noise can be added to capture the effect of thermal fluctuations. This 
leads to a Langevin dynamics for the order parameter. 
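
The deterministic part of this conserved dynamics can be integrated numerically. The sketch below assumes a one-dimensional periodic system, a constant kinetic coefficient (called `mobility` in the code) and a Landau free energy with density -m²/2 + m⁴/4 + (κ/2)|∇m|². These choices, the explicit Euler scheme and all function names are illustrative assumptions, and thermal noise is omitted.

```python
import numpy as np


def laplacian(a, dx=1.0):
    """Second difference with periodic boundaries."""
    return (np.roll(a, 1) + np.roll(a, -1) - 2.0 * a) / dx ** 2


def chemical_potential(m, kappa=1.0, dx=1.0):
    """mu = dF/dm for the discretized Landau free energy below."""
    return -m + m ** 3 - kappa * laplacian(m, dx)


def free_energy(m, kappa=1.0, dx=1.0):
    grad = (np.roll(m, -1) - m) / dx
    return np.sum(-0.5 * m ** 2 + 0.25 * m ** 4 + 0.5 * kappa * grad ** 2) * dx


def model_b_step(m, dt=0.01, mobility=1.0, kappa=1.0, dx=1.0):
    """Explicit Euler step of dm/dt = mobility * laplacian(mu)."""
    return m + dt * mobility * laplacian(chemical_potential(m, kappa, dx), dx)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 0.1 * rng.standard_normal(64)       # small fluctuations around m = 0
    total0, f0 = m.sum(), free_energy(m)
    for _ in range(2000):
        m = model_b_step(m)
    print("change of total order parameter:", m.sum() - total0)
    print("free energy before and after:", f0, free_energy(m))
```

The total order parameter is conserved because the update is the divergence of a current, while the free energy decreases as the mixture demixes into domains.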

Time-dependent Ginzburg-Landau models can be generalized to models with or without conserved order 
parameters. Also, the effect of additional conservation laws (for example, the inclusion of hydrodynamic effects) 
has been explored. More complicated forms of the free-energy functional can be used to incorporate more 
details of the systems and alleviate the restriction to small order parameters inherent in the Ginzburg-Landau 
expansion. Shi and Noolandi [96] have used the free energy functional of the self-consistent field theory to 
explore fluctuations in spatially structured phases of diblock copolymers. A similar free-energy functional is 
employed by Fraaije and co-workers [97] to study the kinetics of self-assembly in amphiphilic systems. 
Extensions of the time-dependent Ginzburg-Landau equation to a formal scheme for the time evolution of 
non-equilibrium systems in terms of a set of coarse-grained variables have been explored [98]. 

C1.1 Clusters 

Lai-Sheng Wang 


C1.1.1 CLUSTERS 


C1.1.1.1 INTRODUCTION 


As we divide and subdivide a piece of bulk crystal, its properties will not change dramatically until we reach the 
nanometre scale. As particle size approaches molecular dimensions, all properties of a material change. Particles 
consisting of a few to a few thousand atoms are called clusters. These particles often show unique electronic, 
magnetic and chemical properties with dramatic size and shape dependence. The field of cluster research involves 
the elucidation of these unique size-dependent properties and how they evolve from the molecular to the bulk as 
more and more atoms are added. A wide variety of clusters have been made and investigated, including metals, 
semiconductors, ionic solids, noble gases and small molecules. The discovery of C$_{60}$ and fullerenes [1], as a result 
of studying carbon clusters (see chapter C1.2), represents the best yield from cluster research. Curl, Kroto and 
Smalley were awarded the 1996 Nobel Prize in Chemistry for this remarkable discovery. 

Clusters are intermediates bridging the properties of the atoms and the bulk. They can be viewed as novel 
molecules, but different from ordinary molecules, in that they can have various compositions and multiple shapes. 
Bare clusters are usually quite reactive and unstable against aggregation and have to be studied in vacuum or inert 
matrices. Interest in clusters comes from a wide range of fields. Clusters are used as models to investigate surface 
and bulk properties [2]. Since most catalysts are dispersed metal particles [3], isolated clusters provide ideal 
systems to understand catalytic mechanisms. The versatility of their shapes and compositions makes clusters novel 
molecular systems to extend our concept of chemical bonding, structure and dynamics. Stable clusters or passivated 
clusters can be used as building blocks for new materials or new electronic devices [4] and this aspect has now led 
to a whole new direction of research into nanoparticles and quantum dots (see chapter C2.17). As the size of 
electronic devices approaches ever smaller dimensions [5], the new chemical and physical properties of clusters 
will be relevant to the future of the electronics industry. 

Cluster research is a very interdisciplinary activity. Techniques and concepts from several other fields have been 
applied to clusters, such as atomic and condensed matter physics, chemistry, materials science, surface science and 
even nuclear physics. While the dividing line between clusters and nanoparticles is by no means well defined, 
typically, nanoparticles refer to species which are passivated and made in bulk form. In contrast, clusters refer to 
unstable species which are made and studied in the gas phase. Research into the latter is discussed in the current 
chapter. 

C1.1.2 TECHNIQUES FOR CLUSTER GENERATION AND DETECTION IN THE GAS PHASE 

C1.1.2.1 CLUSTER GENERATION IN THE GAS PHASE 

The formation of clusters in the gas phase involves condensation of the vapour of the constituents, with the 
exception of the electrospray source [6], where ion-solvent clusters are produced directly from a liquid solution. 
For rare gas or molecular clusters, supersonic beams are used to initiate cluster formation. For nonvolatile 
materials, the vapours can be produced in one of several ways including laser vaporization, thermal evaporation 
and sputtering. 


SUPERSONIC EXPANSION SOURCE 

Supersonic expansion is an indispensable tool in modern chemical physics and physical chemistry [7]. It is an 
effective technique to produce weakly bonded clusters from gaseous species. Supersonic expansion of a gas sample 
through a small orifice cools the gas sample adiabatically to very low temperatures. Cluster growth is initiated 
through three-body collisions. A number of parameters (nozzle size, shape and backing pressure) can be varied to 
produce cold clusters and to tune cluster size distributions. Clusters with very low rotational and vibrational 
temperatures (a few kelvins) can be produced using the seeded beam technique, where a small amount of 
condensing gas is seeded in a helium beam to promote cluster formation and cooling. Clusters of rare gases and 
other small molecules are all produced and studied using the supersonic beam technique. 

LASER VAPORIZATION SUPERSONIC CLUSTER SOURCE 

Laser vaporization is one of the most popular and powerful techniques to produce metal and semiconductor clusters 
in the gas phase. Figure C1.1.1 shows a schematic of a generic laser vaporization supersonic cluster beam source, 
first developed by Smalley and coworkers [8]. In this technique, an intense pulsed laser beam is focused onto a 
target. The rapid electronic to vibrational energy transfer allows the laser beam to heat the irradiated spot to up to 
≈10 000 K, producing a plasma with both neutral and charged atomic species. A pulsed high pressure carrier gas 
(usually helium) is delivered in coincidence with the laser pulse, and the rapid cooling due to the carrier gas 
initiates the cluster growth. The nascent clusters are entrained in the carrier gas and undergo a supersonic 
expansion to be further cooled. Both neutral and charged clusters can be produced. The laser vaporization 
technique is very versatile and it can produce clusters from any metal and semiconductor elements in the periodic 
table. Mixed clusters can be produced either by using an alloy target or adding a reactive gas in the carrier gas. A 
two-laser vaporization source has also been used to produce alloy clusters [9, 10]. 

Figure C1.1.1. Schematic of a typical laser vaporization supersonic metal cluster source using a pulsed laser and a 
pulsed helium carrier gas. 

THERMAL EVAPORATION SOURCE 


The thermal evaporation source was the earliest used to produce metal clusters in the gas phase [11, 12 and 13], 
mostly for clusters of the alkalis and other low melting point materials. In this technique, a bulk sample is simply 
heated in an oven to produce the atomic vapour. The vapour is entrained in a low-pressure gas flow where 
nucleation and cluster growth take place. Sodium clusters containing more than 20 000 atoms have been made 
with this source [14]. A high-pressure carrier gas can also be used to produce a supersonic beam of clusters with 
the thermal evaporation source. 

SPUTTERING SOURCE 

Sputtering of a target surface with energetic particles can produce clusters. The energetic particles are typically ion 
beams of the rare gases or cesium. Clusters are lifted from the target surface by the energetic impact. Cluster sizes 
produced in this technique are generally limited and the cluster temperatures are high [15]. Intense continuous 
beams of cluster ions can be produced by sputtering and have been used for size-selected cluster deposition [16]. A 
related technique is a cold cathode discharge in a flowing rare gas. The discharge ionizes the rare gas, which then 
sputters metal atoms off the target (cathode). Cluster ions are formed through aggregation in the flowing gas stream. 
Continuous beams of small cluster ions can be effectively produced with this technique [17]. 

ELECTROSPRAY SOURCE 

Electrospray was originally invented as a soft ionization technique for biological mass spectrometry [6, 18]. In 
electrospray, a liquid solution containing the molecules of interest is sprayed through a syringe needle under high 
voltage. Highly charged droplets produced in this fashion are broken down and desolvated to produce the ions of 
interest. Electrospray can also be used effectively to produce ion-solvent clusters and weakly bonded complexes 
[19]. Electrospray is a fairly new technique and is beginning to be used by physical chemists to produce novel gas 
phase clusters and complexes [20, 21 ]. 

C1.1.2.2 CLUSTER DETECTION IN THE GAS PHASE

The whole arsenal of physical chemistry methods has been utilized to investigate clusters. The development of 
cluster research is driven largely by new techniques to generate clusters and by new experimental tools to probe 
them. Mass spectrometry is the most useful tool in gas-phase cluster research because the first information one 
wants to know is the cluster's mass and size. All gas-phase investigations of clusters rely on mass spectrometry in one
way or another. Since more stable clusters tend to be more abundant, a mass distribution of clusters contains
valuable information about cluster stabilities, revealing 'magic numbers', that is, clusters with significantly higher
abundance than their neighbours. Some of the most important discoveries of cluster science were based on the mass
distribution of clusters, for example, the shell structure of free electron metal clusters [22] and C60 [1]. There are
several ways to perform mass spectrometry, all using charged particles and measuring the mass/charge ratios of 
clusters. Mass separation of neutral clusters is still a challenging task [23]. The general assumption is that charged 
clusters also reflect the stability or distribution of the neutral clusters, although that is not always the case. 

The most popular mass spectrometric technique is the 'time-of-flight' method [24], in which cluster mass
information is obtained by measuring the flight times of a cluster ion beam over a given distance. The time-of-flight
method is particularly well suited to pulsed laser vaporization cluster sources, and has high efficiency because the
whole mass range is measured for a given laser shot. The time-of-flight technique has moderate mass resolution,
but high resolution can be achieved by using a reflectron [25, 26]. Ion cyclotron resonance mass spectrometry is
another powerful technique used in cluster research [27]. Ions are confined and stored in a three-dimensional trap
formed by a strong uniform magnetic field (B), which confines the ions in the x-y directions, and an electrostatic
potential well in the z direction. The ion cyclotron frequency is $\omega_c = qB/m$, so that highly accurate
mass-to-charge ratios (m/q) can be obtained by
measuring the cyclotron frequencies. Chemical reaction and fragmentation experiments are routinely performed 
with the stored ions [28, 29 ]. 
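
The two mass measurements described above can be illustrated with a short numerical sketch. The cyclotron relation is the one quoted in the text; the time-of-flight relation is an assumption made here for illustration, namely an idealized instrument in which an ion of charge q is accelerated through a potential U and then drifts a field-free length L, so that t = L sqrt(m / (2qU)). The function names and parameters are hypothetical and do not come from the cited references.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
AMU = 1.66053906660e-27      # atomic mass unit in kilograms


def tof_flight_time(mass_amu, charge_e, accel_voltage, drift_length):
    """Flight time (s) through a field-free drift region.

    Idealized assumption: the ion gains kinetic energy q*U before the
    drift region, hence t = L * sqrt(m / (2 q U)).
    """
    m = mass_amu * AMU
    q = charge_e * E_CHARGE
    return drift_length * math.sqrt(m / (2.0 * q * accel_voltage))


def mass_from_flight_time(t, charge_e, accel_voltage, drift_length):
    """Invert the idealized flight-time relation; returns the mass in amu."""
    q = charge_e * E_CHARGE
    return 2.0 * q * accel_voltage * (t / drift_length) ** 2 / AMU


def cyclotron_frequency_hz(mass_amu, charge_e, b_field):
    """Cyclotron frequency f = omega_c / (2 pi) = q B / (2 pi m), in Hz."""
    m = mass_amu * AMU
    q = charge_e * E_CHARGE
    return q * b_field / (2.0 * math.pi * m)


if __name__ == "__main__":
    # Illustrative numbers only: a singly charged 20-atom sodium cluster.
    mass = 20 * 22.98977
    t = tof_flight_time(mass, 1, accel_voltage=1000.0, drift_length=1.0)
    print(f"flight time: {t * 1e6:.2f} microseconds")
    print(f"cyclotron frequency at 6 T: {cyclotron_frequency_hz(mass, 1, 6.0):.0f} Hz")
```

In this idealization the flight time grows as the square root of the mass while the cyclotron frequency falls as 1/m, which is one way to see why a frequency measurement can give very accurate masses.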

There are other techniques for mass separation such as the quadrupole mass filter and Wien filter. Another mass 
spectrometry technique is based on ion chromatography, which is also capable of measuring the shapes of clusters 
[30, 31]. In this method, cluster ions of a given mass are injected into a drift tube with well-defined entrance and
exit slits and filled with an inert gas. The clusters drift through this tube under a weak electric potential. Since the
cluster mobility depends on collision cross sections with the inert gas, different isomers of a given cluster size are
spatially separated in the drift tube. Structural information can be obtained for clusters whose isomers exhibit 
significantly different shapes, such as carbon and silicon clusters [32]. 

C1.1.3 METAL CLUSTERS

Metal clusters are bonded by strong covalent or metallic bonds. Clusters of the low melting point metallic elements 
are produced using the thermal evaporation technique. With the laser vaporization technique, metal clusters from 
all the metallic elements in the periodic table can be made. Simple metal clusters include those of the main group
elements whose cluster properties are dominated by the delocalized nature of their valence electrons. In contrast to
the simple metal clusters, transition-metal clusters are extremely complicated. Because of the unfilled d orbitals,
transition-metal clusters possess a high density of electronic states. Transition-metal clusters possess both metallic
and covalent bonding character and exhibit interesting chemical, magnetic and electronic properties. Studies of
transition-metal clusters are directly relevant to heterogeneous catalysis, surface science, metal cluster chemistry 
and metal-metal bonding in inorganic chemistry [33]. Although accurate theoretical descriptions of transition- 
metal clusters still pose a tremendous challenge, improved experimental and theoretical techniques are expected to 
make significant progress in the investigation of transition-metal clusters. 

C1.1.3.1 SIMPLE METAL CLUSTERS AND THE ELECTRON SHELL MODEL

The simple metal clusters are among the earliest cluster species experimentally investigated [34]. They include 
those clusters from elements of the main groups IA-IIIA. Clusters of the IB and IIB elements with filled d-shells 
can also be categorized as simple metal clusters because their properties are largely dominated by the free electron 
nature of the valence s electrons. The relative ease of their formation through the thermal evaporation source and 
their relatively simple electronic structure made many experimental and theoretical investigations possible [34, 35
and 36]. In 1984, Knight and coworkers first observed from mass spectra of sodium clusters that clusters with 8,
20, 40, 58 and 92 atoms are more abundant than other clusters [22], as shown in figure C1.1.2. They explained this
observation in terms of a one-electron shell model [3], in which the valence electrons of the constituent atoms are 
completely delocalized within the volume of the cluster. 

Figure C1.1.2. (a) Mass spectrum of sodium clusters (Na_N), N = 4-75. The inset corresponds to N = 75-100. Note
the more abundant clusters at N = 8, 20, 40, 58 and 92. (b) Calculated relative electronic stability, Δ(N+1) - Δ(N),
versus N using the spherical electron shell model. The closed shell orbitals are labelled, which correspond to the
more abundant clusters observed in the mass spectrum. Knight W D, Clemenger K, de Heer W A, Saunders W A,
Chou M Y and Cohen M L 1984 Phys. Rev. Lett. 52 2141, figure 1.

In the shell model [37, 38], the jellium approximation is used to replace the positive cores with a uniform 
background potential. The valence electrons are treated as a quantized Fermi gas moving in the jellium potential 
and bounded by the cluster surface. The effective one-electron wavefunctions of a spherically symmetric potential 
are characterized by a principal quantum number, n, and an angular momentum quantum number, l, with degeneracy
2(2l+1), including spin. A closed shell system is obtained when all the levels for a given l are occupied with valence
electrons. A closed shell system exhibits an enhanced stability because there exists a large energy gap to the next
empty level (see figure C1.1.2(b)). The ordering of the levels depends on the shape of the potential. It turns out that
the potential form for the metal clusters is analogous to that used in nuclear physics with similar shell structures. 
Therefore, the quantum numbers in 
metal cluster physics follow the nuclear physics convention and there is no restriction on the angular quantum
number l (unlike the filling of angular momentum shells in atomic physics underlying the periodic table of the
elements). The nuclear shell model was developed in the 1940s to explain particularly stable nuclei. Thus concepts
and ideas from nuclear physics have been carried over to cluster physics, and there is an interesting cross-fertilization
between two very different fields [39]. Besides the idea of shell closing, collective excitations [40, 41 and 42], fission
[43, 44 and 45] and scattering [46], all concepts familiar in nuclear physics, have also been studied for the simple
metal clusters. 
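
The counting behind the shell closings can be sketched in a few lines. The degeneracy 2(2l+1) is taken from the text; the level ordering used below (1s, 1p, 1d, 2s, 1f, 2p, 1g, 2d, 3s, 1h) is an assumption introduced for illustration, chosen as the ordering commonly quoted for a rounded spherical well, and it is not derived here from any particular potential. The names LEVEL_ORDER and shell_closings are hypothetical.

```python
# Angular momentum letters in spectroscopic notation.
L_VALUES = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4, "h": 5}

# ASSUMED level ordering for a rounded spherical well (illustrative only).
# Labels follow the nuclear convention: 1p, 1d, 1f ... are allowed because
# l is not restricted by the first quantum number.
LEVEL_ORDER = ["1s", "1p", "1d", "2s", "1f", "2p", "1g", "2d", "3s", "1h"]


def degeneracy(level):
    """Number of electrons a level can hold: 2(2l+1), including spin."""
    l = L_VALUES[level[-1]]
    return 2 * (2 * l + 1)


def shell_closings(levels):
    """Return (level, cumulative electron count) after filling each level."""
    total = 0
    closings = []
    for level in levels:
        total += degeneracy(level)
        closings.append((level, total))
    return closings


if __name__ == "__main__":
    for level, count in shell_closings(LEVEL_ORDER):
        print(f"{level}: closed at {count} electrons")
```

With one valence electron per sodium atom, the cumulative counts include 8, 20, 40, 58 and 92, the cluster sizes singled out in the mass spectrum. The other counts in the list correspond to closings that need not all show up as pronounced abundance maxima, since that depends on the size of the energy gap that follows.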


The spherical shell model can only account for the major shell closings. For open shell clusters, ellipsoidal 
distortions occur [47], leading to subshell closings which account for the fine structures in figure C1.1.2(a). The
electron shell model is one of the most successful models emerging from cluster physics. The electron shell effects 
are observed in many physical properties of the simple metal clusters, including their ionization potentials, electron 
affinities, polarizabilities and collective excitations [34]. 

C1.1.3.2 TRANSITION-METAL CLUSTERS: CHEMISTRY

The microscopic understanding of the chemical reactivity of surfaces is of fundamental interest in chemical physics 
and important for heterogeneous catalysis. Cluster science provides a new approach for the study of the 
microscopic mechanisms of surface chemical reactivity [48]. Surfaces of small clusters possess a very rich 
variation of chemisorption sites and are ideal models for bulk surfaces. Chemical reactivity of many transition- 
metal clusters has been investigated [49]. Transition-metal clusters are produced using laser vaporization, and the 
chemical reactivity studies are carried out typically in a flow tube reactor in which the clusters interact with a 
reactant gas at a given temperature and pressure for a fixed period of time. Reaction products are measured at 
various pressures or temperatures and reaction rates are derived. It has been found that the reactivity of small 
transition-metal clusters with simple molecules such as H2 and NH3 can vary dramatically with cluster size and
structure [48, 49, 50, 51 and 52].

Figure C1.1.3 shows a plot of the chemical reactivity of small Fe, Co and Ni clusters with H2 as a function of size
(full curves) [53]. The reactivity changes by several orders of magnitude simply by changing the cluster size by
one atom. Both geometrical and electronic arguments have been put forth to explain such reactivity changes. It is
found that the reactivity correlates with the difference between the ionization potential (IP) and the electron affinity
(EA) for a given cluster, corrected by a Coulomb energy, $e^2/R$, where R is the radius of the cluster (figure C1.1.3,
dashed lines). This observation is interpreted using a model in which the probability of H2 chemisorption is
controlled by the height of an entrance channel barrier caused by Pauli repulsion between H2 and the cluster.
This barrier is assumed to be proportional to the energy gap between the highest occupied and lowest unoccupied
orbitals, characterized by the difference between the IP and EA of the cluster. Figure C1.1.3 demonstrates the
importance of the cluster electronic structures on the cluster reactivity. 
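
As a small worked illustration of the quantity plotted against the reactivity, the sketch below evaluates IP - EA - e^2/R for a cluster. It assumes energies in eV and the radius in angstroms, with the Coulomb term written in SI form as e^2/(4 pi eps0 R), which is about 14.40 eV angstrom divided by R. The function name and the input values are hypothetical and are not data from reference [53].

```python
# e^2 / (4 pi eps0) expressed in eV * angstrom.
COULOMB_EV_ANGSTROM = 14.399645


def promotion_energy(ip_ev, ea_ev, radius_angstrom):
    """Return IP - EA - e^2/R in eV for a cluster of radius R (angstrom)."""
    if radius_angstrom <= 0:
        raise ValueError("cluster radius must be positive")
    return ip_ev - ea_ev - COULOMB_EV_ANGSTROM / radius_angstrom


if __name__ == "__main__":
    # Made-up inputs chosen only to show the arithmetic.
    ep = promotion_energy(ip_ev=5.5, ea_ev=1.5, radius_angstrom=4.0)
    print(f"promotion energy: {ep:.3f} eV")
```

The Coulomb correction shrinks as the cluster grows, so for large clusters the quantity tends toward the bare IP - EA difference.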

Figure C1.1.3. Relative reactivity of transition-metal clusters with H2 (full curves, log scale) and the promotion
energy $E_P(N) = \mathrm{IP}(N) - \mathrm{EA}(N) - e^2/R(N)$, where IP, EA and R represent the cluster ionization
potential, electron affinity and radius, respectively. The top figure is for Fe_N (N = 8-20), the middle figure is for
Co_N (N = 4-26), and the lower figure is for Ni_N (N = 7-20). Conceicao J, Laaksonen R T, Wang L S, Guo T,
Nordlander P and Smalley R E 1995 Phys. Rev. B 51 4668, figure 3.

Cluster chemisorption experiments have been used extensively to probe the geometric structure of transition-metal 
clusters by Riley and coworkers [54, 55 and 56]. Since most of the atoms of a cluster are on the cluster surface, the 
available surface sites contain information about the underlying cluster structure. By measuring the maximum 
uptake of molecules, one can gain an insight into the cluster packing geometry. These studies have been mainly 
focused on clusters of Fe, Co, Ni and Cu and have found that icosahedral packing is the dominating structural 
feature of these clusters [54, 55 and 56]. There is ample evidence that multiple structural isomers exist for many of 
these clusters. For Cu clusters, evidence is provided for both electron shell behaviour and icosahedral geometrical 
structure [56]. 

The reactivity of size-selected transition-metal cluster ions has been studied with various types of mass
spectrometric techniques [15]. Fourier-transform ion cyclotron resonance (FT-ICR) is a particularly powerful
technique in which a cluster ion can be stored and cooled before experimentation. Thus, multiple reaction steps can 
be followed in FT-ICR, in addition to its high sensitivity and mass resolution. Many chemical reaction studies of 
transition-metal clusters with simple reactants and hydrocarbons have been carried out using FT-ICR [49, 57, 58 ]. 
A special reactive channel is cluster fragmentation induced either by photoabsorption or collisions with an inert 
gas. Measuring cluster fragmentation pathways and energetics provides information about both the cluster 
structures and bonding energies. Strong size-dependent bonding energies are found for small transition-metal 
cluster ions, and the bonding energy approaches the bulk cohesive energy smoothly for large clusters [59, 60 ]. 


C1.1.3.3 TRANSITION-METAL CLUSTERS: ELECTRONIC STRUCTURE

The diverse chemical and physical properties of the transition-metal clusters derive from their rich electronic 
structure. Thus probing the electronic structure of transition-metal clusters is of special interest. Transition-metal
dimers have been extensively studied by resonance two-photon ionization (R2PI) spectroscopy [61, 62, 63 and 64].
However, the high density of low-lying electronic states, characteristic of the transition-metal clusters, prevents the
R2PI technique from being used for larger clusters. Single photon photofragmentation spectroscopy of clusters bound
with rare gas atoms has been used to probe the electronic structure of larger transition-metal clusters [65, 66 ]. 
Photoionization experiments have been used extensively to measure the IPs of transition-metal clusters [67, 68 ]. 
Recently, ZEKE (zero kinetic energy) spectroscopy has been applied to small neutral transition-metal clusters [69]. 

A more powerful experimental technique to probe the electronic structure of transition-metal clusters is size- 
selected anion photoelectron spectroscopy (PES) [70, 71, 72, 73, 74, 75 and 76]. In PES experiments, a size- 
selected anion cluster is photodetached by a fixed wavelength photon and the kinetic energies of the photoemitted 
electrons are measured. PES experiments provide a direct measure of the electron affinity and electronic energy
levels of neutral clusters. This technique has been used to study many types of clusters over a large cluster size 
range and can probe how the electronic structures of transition-metal clusters evolve from molecular to bulk [77, 
78 , 79, 80 and 81]. Research has focused on the 3d transition-metal clusters, for which there have also been many 
theoretical studies [82, 83, 84, 85, 86, 87, 88 and 89]. It is found that the electronic structure of the small transition- 
metal clusters is molecular in nature, with discrete electronic states. However, the electronic structure of the 
transition-metal clusters approaches that of the bulk rapidly. Figure C1.1.4 shows that the electronic structure of
vanadium clusters with 65 atoms is already very similar to that of bulk vanadium [90]. Other 3d transition-metal
clusters also show bulk-like electronic structures in a similar size range [78].
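
The energy bookkeeping in an anion photoelectron experiment can be sketched as follows. The relation used, binding energy = photon energy - electron kinetic energy, is the standard photoelectric energy balance and is an addition here rather than something spelled out in the text. Reading the electron affinity from the lowest binding energy further assumes that this feature is the transition between the anion and neutral ground states. The function names and the kinetic energies in the example are hypothetical.

```python
def binding_energies(photon_ev, kinetic_energies_ev):
    """Convert measured electron kinetic energies (eV) to binding energies (eV)."""
    for ke in kinetic_energies_ev:
        if ke < 0 or ke > photon_ev:
            raise ValueError("kinetic energy must lie between 0 and the photon energy")
    return [photon_ev - ke for ke in kinetic_energies_ev]


def electron_affinity_estimate(photon_ev, kinetic_energies_ev):
    """Lowest binding energy in the spectrum, i.e. the fastest electrons.

    ASSUMPTION: this feature is the ground-state to ground-state
    transition, so it approximates the electron affinity of the neutral.
    """
    return min(binding_energies(photon_ev, kinetic_energies_ev))


if __name__ == "__main__":
    # Made-up peak positions for a fixed 6.42 eV photon.
    peaks = [4.0, 3.0, 1.5]
    print(binding_energies(6.42, peaks))
    print(electron_affinity_estimate(6.42, peaks))
```

Because the photon energy is fixed, only states of the neutral lying below the photon energy (measured from the anion) can be reached, which is why higher photon energies reveal more of the valence structure.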

Figure C1.1.4. Photoelectron spectra of V_N^- (N = 17, 27, 43 and 65) at 6.42 eV photon energy, compared to the
bulk photoelectron spectrum of the V(100) surface at 21.21 eV photon energy. The cluster spectra reveal the
appearance of bulk features at V17 and how the cluster spectral features evolve toward the bulk. The bulk spectrum
is referenced to the Fermi level. Wu H, Desai S R and Wang L S 1996 Phys. Rev. Lett. 77 2436, figure 2.

C1.1.3.4 TRANSITION-METAL CLUSTERS: MAGNETISM

One of the interesting aspects of transition-metal clusters is their novel magnetic properties [91, 92, 93 and 94].
Although most transition-metal atoms have unpaired d-electrons and are magnetic, very few bulk transition-metal
crystals are magnetic. Therefore, it is of great interest to understand how the magnetic properties of transition 
metals develop (diminish) as cluster size increases. The magnetic properties of transition-metal clusters have been 
investigated using the Stern-Gerlach molecular beam deflection method. Magnetic properties of clusters of the 
three bulk ferromagnetic materials, Fe, Co and Ni, have been extensively studied [95, 96, 97, 98 and 99]. These
clusters are found to be superparamagnetic with strongly size-dependent magnetic moments. Figure C1.1.5 shows the
measured magnetic moments of small Ni clusters as a function of size [99]. The dramatic size dependence of the
cluster magnetic moments is interpreted to be due to a surface enhancement: the minima correspond to clusters
with closed geometrical shells and maxima to clusters with relatively open structures. Small clusters generally 
possess much higher moments than the bulk materials, and the moments approach bulk values in the size range of 
about 500 atoms. Magnetism has also been detected in clusters of those elements whose bulk crystals are 
nonmagnetic [ 100 ]. 

Figure C1.1.5. Nickel cluster magnetic moment per atom (μ) as a function of cluster size, at temperatures between
73 and 198 K. Apsel S E, Emmert J W, Deng J and Bloomfield L A 1996 Phys. Rev. Lett. 76 1441, figure 1.


C1.1.4 SEMICONDUCTOR CLUSTERS 


Since silicon is the most important semiconductor material, clusters of silicon have been most extensively studied, 
both theoretically and experimentally. The electronic structure [101, 102, 103 and 104], geometrical structure [105,
106, 107, 108, 109 and 110] and chemical reactivity [111] of silicon clusters have been investigated. Small silicon
clusters assume three-dimensional structures different from both that of the bulk crystal and that of
its group IV neighbour, carbon. Ion mobility experiments have been very effective in providing experimental
structural information for silicon clusters, and confirm that many structural isomers exist for silicon clusters
because of their strong covalent bonding and relatively open structures [106, 110]. Ion mobility results show that
silicon clusters up to ~27 atoms follow a prolate growth sequence, resulting in geometries with an aspect ratio of ~3
[106]. Larger clusters appear to assume more spherical geometries. The structures of medium-sized silicon clusters
with 12-26 atoms have been studied recently by theoretical calculations using density functional theory in
combination with ion mobility experiments [110]. Figure C1.1.6 shows the calculated structures of silicon clusters
containing 12-20 atoms. The clusters with fewer than 18 atoms can be visualized as stacked Si9 tricapped trigonal
prisms, whereas the global minima of Si19 and Si20 assume more spherical structures.

Figure C1.1.6. Minimum energy structures for neutral Si_n clusters (n = 12-20) calculated using density functional
theory with the local density approximation. Cohesive energies per atom are indicated. Note the two nearly
degenerate structures of Si16. Ho K M, Shvartsburg A A, Pan B, Lu Z Y, Wang C Z, Wacker J G, Fye J L and
Jarrold M F 1998 Nature 392 582, figure 2.

Other semiconductor clusters have also been studied, such as germanium clusters [101, 112] and mixed clusters of
the III-V semiconductors GaAs [113, 114 and 115] and InP [116, 117]. Of particular interest is the evolution and
emergence of the energy band gap in these clusters. Infrared and visible absorption spectroscopy has been
performed on indium phosphide clusters (In_xP_y) with x + y up to 14 [116]. An optical-gap-like feature with an onset
close to the band gap of bulk crystalline indium phosphide is already observed for the even clusters in this size
range. This is surprising, because according to the model of quantum confinement these tiny clusters are expected
to have band gaps much larger than that of the bulk crystal. Photoelectron spectroscopy experiments on size-selected
gallium arsenide cluster anions (Ga_xAs_y^-) showed that the electron affinity of the neutral clusters with x + y
around 50 already approached that of the bulk [113], quite different from the behaviour of metal clusters. Even
though the electronic structure of
transition-metal clusters already approaches that of the bulk in a similar size regime, their electron affinity is still
smaller than that of the bulk by more than 1 eV [77, 78, 79, 80 and 81]. These observations indicate the importance 
of charge localization in semiconductor clusters [118]. For bulk GaAs and InP, surface reconstruction creates
shallow traps for conduction-band electrons. The wavefunction for a trapped electron is localized and is not
subject to the same quantum confinement effects as the delocalized orbital of a conduction-band electron. Thus,
quantum confinement effects are expected to be less important in clusters, where charge localization dominates, as
should be the case for small clusters of GaAs and InP. The nature of charge localization suggests that such 
molecular-sized clusters may be ideal models for studying the surface behaviours of bulk semiconductors. 


C1.1.5 IONIC CLUSTERS AND MIXED CLUSTERS 

C1.1.5.1 IONIC CLUSTERS 

Ionic clusters, such as alkali halide clusters, were among the earliest cluster species experimentally investigated 
[119]. The binding in ionic clusters is dominated by classical electrostatic effects, and simple interaction potentials
can give fairly accurate descriptions of these clusters. One characteristic of the ionic clusters is that they mimic the
structures of the bulk ionic crystals even at relatively small sizes, making ionic clusters attractive targets for both
experimental and theoretical investigations. Nonstoichiometric alkali halide clusters or clusters with excess
electrons have been used as models to study bulk defects [120, 121 and 122]. Recent investigations have found
facile structural transformation in alkali halide clusters with rather low activation energies [123, 124].

Oxide clusters are another class of important ionic clusters because of the important roles that oxide materials play 
in both chemical catalysis and advanced materials applications. Oxide clusters of the main group I-III elements are
dominated by electrostatic interactions [125, 126 and 127]. Oxide clusters of the transition metals are more
complicated, with both ionic and covalent character [128]. Oxide clusters of the late main group elements, such as
silicon, are dominated more by covalent bonding. Oxide clusters are relatively less well characterized. Chemical
reactivity of a number of transition-metal oxide clusters has been studied in a fast flow reactor with laser 
vaporization [ 129 ]. Antimony and bismuth oxide clusters have been recently produced, and magic number clusters 
characteristic of bulk compositions are observed [ 130 , 131 ]. Photoelectron spectroscopy of size-selected anions has 
been carried out on a number of oxide cluster series [ 126 , 132 , 133 and 134 ]. The electronic structure evolution 
from that of a bare cluster to that of an oxide is monitored as the cluster is oxidized step-by-step by oxygen. Figure
C1.1.7 shows the structures of a series of Si3O_y (y = 1-6) clusters [134], which can be viewed as a sequential
oxidation of a Si3 cluster. The local Si-O bonding structure from Si3O to Si3O3 mimics that of the initial oxidation
of a silicon surface, whereas the larger clusters (Si3O4 to Si3O6) with a SiO4 unit begin to mimic that of bulk
silicon oxide.

Figure C1.1.7. MP2/6-311+G* optimized structures of the Si3O_y (y = 1-6) clusters. All bond lengths are in Å.
Note that for y = 1-4, all the O atoms are bridge bonded to two Si. Wang L S, Nicholas J B, Dupuis M, Wu H and
Colson S D 1997 Phys. Rev. Lett. 78 4450, figure 2.

C1.1.5.2 CARBIDE CLUSTERS 

Metal-carbide clusters are relevant to the formation of both endohedral fullerenes and carbon nanotubes [135].
There also exists a class of apparently stable metal-carbide cluster ions, M8C12+ (M = Ti, V, Cr, Zr and Hf), called
metallocarbohedrenes (met-cars), first discovered by Castleman and coworkers [136]. The formation mechanisms of
these novel clusters and nanostructures are still not elucidated [137]. Understanding the chemical bonding and
structures of small metal-carbide clusters provides important insight into their growth mechanisms and can help
design more efficient techniques for their bulk synthesis. For example, annealing of LaC_n clusters in the gas phase
converts them into endohedral fullerenes, and suggests that the La atom acts as a nucleation centre and the carbon
rings arrange themselves around the La atom to form the final products [138]. Met-cars have not been isolated in
bulk form and their structures have not been determined despite extensive experimental and theoretical
investigations [137]. Many
questions still remain about the structure and bonding of metal-carbide clusters and the met-cars. Detailed
characterization of the small carbide clusters will be key to understanding the formation of met-cars and endohedral 
fullerenes, as well as the catalytic effects in carbon nanotube growth. Mixed clusters in general provide interesting 
gas-phase systems because not only their size, but also their composition, can be systematically varied. Mixed
clusters may thus allow species with tailored chemical or physical properties to be designed.


C1.1.6 RARE-GAS CLUSTERS AND OTHER WEAKLY BONDED 
MOLECULAR CLUSTERS 

Rare-gas clusters can be produced easily using supersonic expansion. They are attractive to study theoretically 
because the interaction potentials are relatively simple and dominated by the van der Waals interactions. The 
Lennard-Jones pair potential describes the structures of the rare-gas clusters well and predicts magic clusters with
icosahedral structures [139, 140]. The first five icosahedral clusters occur at 13, 55, 147, 309 and 561 atoms and
are observed in experiments on Ar, Kr and Xe clusters [141]. Small helium clusters are difficult to produce because
of the extremely weak interactions between helium atoms. Due to the large zero-point energy, bulk helium is a
quantum fluid and does not solidify under standard pressure. Large helium clusters, which are liquid-like, have
been produced and studied by Toennies and coworkers [142]. Recent experiments have provided evidence of
superfluidity in He clusters for as few as 60 atoms [143]. Helium clusters provide an ultracold environment, which
can be used as a matrix to trap other molecules. Currently there is considerable interest in using large helium 
clusters as 'nanocryostats' to trap metal clusters and other molecular species for high-resolution spectroscopy 
investigations [144, 145 ]. 
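
The icosahedral magic numbers quoted above follow from simple shell counting, sketched below. The sketch assumes the usual picture of a central atom surrounded by complete icosahedral shells, in which the k-th shell holds 10k^2 + 2 atoms. This counting rule is an addition here and is not stated in the text; the function name is hypothetical.

```python
def icosahedral_magic_number(n_shells):
    """Atoms in an icosahedral cluster with n complete shells around one atom.

    ASSUMPTION: shell k contributes 10*k**2 + 2 atoms, so the total is
    1 + sum over k = 1..n of (10*k**2 + 2).
    """
    if n_shells < 0:
        raise ValueError("number of shells must be non-negative")
    return 1 + sum(10 * k * k + 2 for k in range(1, n_shells + 1))


if __name__ == "__main__":
    print([icosahedral_magic_number(n) for n in range(1, 6)])
```

For one to five shells this reproduces the sizes 13, 55, 147, 309 and 561 given in the text.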

Molecular clusters are weakly bound aggregates of stable molecules. Such clusters can be produced easily using 
supersonic expansion, and have been extensively studied by both electronic and vibrational spectroscopy [146, 
147 ]. Hydrogen-bonded clusters are an important class of molecular clusters, among which small water clusters 
have received a considerable amount of attention [ 148 , 149 ]. Solvated cluster ions have also been produced and 
studied [150, 151]. These solvated clusters provide ideal model systems to obtain microscopic information about
solvation effects and their influence on chemical reactions.


C1.1.7 OUTLOOK

Gas-phase investigations of clusters over the past two decades have provided major advances in our fundamental
understanding of these microscopic species and how their physical and chemical properties change with size. With
new and continuously improved experimental and theoretical techniques, more discoveries and deeper
understanding can be expected. Clusters provide flexibility in size, shape and composition in making new
molecular species, which will enrich our concept of chemical structure and bonding. With new species that do not
follow the classical valency, new chemical bonding theories and ideas will need to be developed to predict their
physical and chemical properties. Understanding the molecular details of the evolution from small clusters to
nanocrystals will continue to be a challenge to both experimental and theoretical investigations. The diverse topics
and systems afforded by clusters will keep this field exciting and challenging in the new
millennium.

C1.2 Fullerenes

Dirk M Guldi 


INTRODUCTION 

This article surveys the physical and chemical properties of the third modification of
carbon, namely [60]fullerene and its higher analogues. The enthusiasm that was triggered by these spherical carbon
allotropes resulted in an epidemic-like number of publications in the early to mid-1990s. In more recent years the
field of fullerene chemistry has, however, been dominated by the organic functionalization of the highly reactive
fullerene core, yielding literally thousands of fullerene derivatives with new and, in part, even more intriguing
properties than pristine fullerenes. This still growing field is the subject of a number of excellent review articles and
books. The beginning of the current review deals with the fullerenes' structural and electronic configurations,
followed by a detailed description of their undoped/doped thin films and a discussion of fullerene-based polymers
and Langmuir-Blodgett films. In addition, the properties of fullerenes in condensed media, ranging from
electrochemical redox reactions and photoexcited states to electron transfer processes, are elucidated. This account
ends with a final section regarding metal-incorporated endohedral complexes.


C1.2.1 STRUCTURE 

The initial report regarding the existence and characterization of [60]fullerene (figure C1.2.1) by Kroto et al is an
important landmark for the chemistry and physics of fullerenes [1]. The importance of this discovery was
acknowledged with the Nobel Prize in 1996. It took, however, a few more years until Krätschmer et al reported a
method describing the arc discharge of carbon rods with the prospect of synthesizing large quantities of fullerene
materials (figure C1.2.2) [2]. In parallel, the laser vaporization cluster beam technique has been employed by
Smalley and coworkers to vaporize graphite in a helium atmosphere and, thus, to mimic the appropriate fullerene 
nucleation conditions [3]. With gram quantities at hand, scientists began to investigate the unique chemical and 
physical properties of this spherical carbon allotrope. 

The fundamental concept proposed for the composition of three-dimensional fullerene structures is the introduction 
of five-membered (i.e. pentagon) rings, which are primarily responsible for the curvature [4]. They function like 
defects in a graphite structure and lead to nonplanarity of the π-electronic structure. However, the strain energy will
only be minimized when the pentagons are as far apart as possible. This 'isolated pentagon' principle [5] has best
been achieved in [60]fullerene, which consists of 12 regularly implanted five-membered (i.e. pentagon) and 20 six-
membered (i.e. hexagon) rings and, therefore, differs most markedly from two-dimensional carbon structures (i.e.
graphite). As a direct result of the 12 pentagonal faces, [60]fullerene shows, in contrast to graphitic sheets, an
anisotropic electron distribution. In [60]fullerene, the pentagons are most evenly distributed, but not as far apart as
possible. 
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Figure Cl.2.1. Structure of [60] fullerene (I h ), [70] fullerene (D 5h ), [76] fullerene (D 2 ), [78] fullerene (C 2v ) and 
[84] fullerene (D 2 ).="-l"> 2v ) and [84] fullerene (D 2 ). 
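The ring counts quoted above for [60]fullerene (12 pentagons, 20 hexagons) follow from Euler's polyhedron formula. The following is a minimal sketch of that bookkeeping, assuming a closed cage in which every carbon atom has exactly three neighbours and every face is a pentagon or a hexagon. The function name is illustrative only, and the sketch does not test whether a cage of the given size can actually be built or satisfies the isolated pentagon principle.

```python
def fullerene_counts(n_atoms: int) -> dict:
    """Edge and face counts for a closed cage of three-coordinate atoms
    whose faces are only pentagons and hexagons (illustrative helper)."""
    if n_atoms < 20 or n_atoms % 2:
        raise ValueError("expected an even number of atoms, at least 20")
    edges = 3 * n_atoms // 2      # three bonds per atom, each bond shared by two atoms
    faces = 2 - n_atoms + edges   # Euler's formula: V - E + F = 2
    # p + h = F and 5p + 6h = 2E together give p = 6F - 2E
    pentagons = 6 * faces - 2 * edges
    hexagons = faces - pentagons
    return {"edges": edges, "faces": faces,
            "pentagons": pentagons, "hexagons": hexagons}


for n in (60, 70, 76, 78, 84):
    print(n, fullerene_counts(n))
```

Under these assumptions the number of pentagons is 12 for any cage size, and only the number of hexagons grows with the number of carbon atoms.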


More important, the surface curvature of the carbon network exerts a profound impact on the reactivity of the
fullerene core [6, 7]. In this context, the most striking consequence emerges from the pyramidalization of the
individual carbon atoms. Influenced by the curvature, the sp² hybrids which exist in truly two-dimensional planar
carbon networks and hydrocarbons adopt an sp^2.278 hybridization with p orbitals that possess an s character of
0.085. Accordingly, the exterior surface is much more reactive than planar analogues, and its reactivity is comparable
to that of electron deficient polyolefins. This, in turn, rationalizes the high reactivity of the fullerene core towards
photolytically and radiolytically generated carbon- and heteroatomic-centred radicals and also other neutral or ionic
species [8]. The interior, in contrast, is shown to be practically inert [9]. Despite these surface related effects, the
sp^2.278 character of the carbon atoms is expected to have a stabilizing effect on carbon-centred radical ions as well
as carbanions and carbocations.

The first fullerene to be characterized was the I_h [60]fullerene, which was originally identified by its four-band IR
absorption spectrum [2]. The proposed cagelike structure of [60]fullerene, with a diameter of 7.1 Å, was
unequivocally confirmed by the detection of a single ¹³C NMR resonance, stemming from the equivalency of all
the carbon atoms in this molecule [10]. X-ray crystal structures reveal two different types of C-C bond, i.e. short
6-6 bonds, with a high double bond character, and long 6-5 bonds, possessing low double bond character [3]. In
contrast, the ¹³C NMR spectrum of [70]fullerene exhibits five lines, again in perfect agreement with a closed
sphere. Ionization experiments also helped to characterize the fullerene structure [11, 12 and 13]. The carbon
cluster ions produced upon ionization have large internal energies and cool via the sequential emission of C₂
molecules. The latter route arises predominantly from a combination of the high stability of the even-membered
clusters and the relatively high binding energy of C₂ (~3.6 eV). This makes the C₂ loss mechanism energetically
more favourable than separating two individual carbon atoms.

Figure C1.2.2. Diagram of the apparatus used to produce fullerenes from graphite rods.


In addition to the most abundant fullerene, namely [60]fullerene, a number of higher fullerenes have also been
isolated and characterized, including [70] (point group D_5h), chiral [76] (point group D_2), the D_3 and C_2v isomers
of [78] and an equilibrated mixture of [84]fullerene of D_2 and D_2d symmetry (see figure C1.2.1) [14, 15, 16, 17 and
18]. Large crystalline quantities of these higher fullerenes are scarce, and there is a greater complexity associated
with the lower symmetry of the molecules. In addition to their relatively small synthetic amounts, the presence of
more than a single isomer which satisfies the isolated-pentagon rule results in further complications with respect to
separation of these isomeric mixtures.


C1.2.2 CRYSTAL STRUCTURE 

Below 90 K, [60]fullerene freezes into an orientational glass in which it adopts a simple cubic structure [19]. This
low temperature structure can be traced to the anisotropic electronic structure. Alignment of the electron rich
regions of one molecule over the electron deficient regions of its neighbouring molecule optimizes the electrostatic
contribution to the predominantly van der Waals intermolecular bonding and, in turn, governs the overall stability
of this glass phase. Above 90 K, the structure is a primitive cubic structure (space group T_h^6 or Pa3̄) [20]. In this
temperature regime, molecular motions are no longer restricted and, accordingly, the molecules start to move freely
between two distinct nearly degenerate orientations, differing in energy by ~11.4 meV. In principle, this phase of
residual rotational motion is followed by a first-order phase transition at 261 K yielding a face-centred cubic (fcc)
structure (Fm3̄m), characterized by rapid isotropic reorientational motion of the molecules [21]. In this phase
transition, from an orientationally ordered to an orientationally disordered phase, a competition dominates between
an entropy gain by rotation and an energy gain by intermolecular attraction.
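To give a feeling for what an energy difference of about 11.4 meV between the two orientations means, the sketch below evaluates a simple two-level Boltzmann population at the two transition temperatures mentioned above. Treating the orientations as an equilibrated, non-degenerate two-level system is an assumption made here for illustration, not a model taken from the cited work, and the function name is hypothetical.

```python
import math

K_B_MEV_PER_K = 0.0861733  # Boltzmann constant in meV per kelvin


def lower_orientation_fraction(temperature_k: float,
                               delta_e_mev: float = 11.4) -> float:
    """Equilibrium fraction of molecules in the lower-energy orientation
    for a two-level system with energy splitting delta_e_mev."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    boltzmann_factor = math.exp(-delta_e_mev / (K_B_MEV_PER_K * temperature_k))
    return 1.0 / (1.0 + boltzmann_factor)


for t in (90.0, 261.0):
    print(f"T = {t:5.1f} K: fraction in lower orientation = "
          f"{lower_orientation_fraction(t):.3f}")
```

Within this simplified picture the preference for the lower-energy orientation weakens steadily on heating, in line with the competition between energy gain and entropy gain described above.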

[70]fullerene, with D_5h symmetry, on the other hand, crystallizes in two phases, namely, cubic close packed (ccp,
figure C1.2.3) and hexagonal close packed structures (hcp, figure C1.2.4), which differ, in essence, only in their
stacking sequence. Heating experiments on the pure hcp phase (ABAB), by means of dilatometry, in the 200-400
K temperature window give rise to two phase transitions: first to a deformed hcp and secondly to a monoclinic
structure. Structural studies on a ccp crystal revealed a transition to an fcc (Fm3̄m) structure at high temperature
and, upon cooling, a phase transition to a rhombohedral phase (R3̄m) (ABCABC) [22, 23, 24 and 25].



Figure C1.2.3. Cubic close packing (ABC) of [60]fullerene.

Figure C1.2.4. Hexagonal packing (ABA) of [60]fullerene.


C1.2.3 ELECTRONIC CONFIGURATION 

[60]fullerene has a truncated-icosahedral form, with a point group symmetry I_h which allows a degeneracy as high
as five. The 30 filled pπ orbitals host 60 π electrons, in a structural pattern closely resembling that of free particles
on the surface of a sphere and, in turn, evoke an equal net atomic charge distribution on each carbon [26]. All 60
carbon atoms have equivalent symmetry, but the bonds fall into two sets, namely, hexagon-pentagon and
hexagon-hexagon edges. The 60 Hückel molecular orbitals give rise to the reducible representation:

$2A_g + 3T_{1g} + 4T_{2g} + 6G_g + 8H_g + 1A_u + 4T_{1u} + 5T_{2u} + 6G_u + 7H_u$.

Only the A_g and the H_g modes are Raman active while the T_1u modes are solely IR active [27].

In essence, the 60 MOs split into 30 bonding and 30 antibonding π molecular orbitals with the h_u and t_1u levels
broadening into the valence and conduction bands of the solid, respectively [28]. Because of the presence of both
pentagonal and hexagonal rings in the fullerene cage, there are six t_1u band electrons in addition to the more
common σ and π electrons [26]. For example, graphite consists only of hexagons and, hence, only the σ and π
electrons are present. Molecular calculations regarding the electronic configuration have determined that this
threefold degenerate LUMO (t_1u) is separated by ~1.8 eV from a lower lying fivefold degenerate HOMO (h_u) and
from a higher lying threefold degenerate LUMO + 1 (t_1g) (figure C1.2.5) [29, 30]. The moderate optical energy gap
not only underlines the remarkable electron accepting features of these carbon spheres, but also emphasizes, in
combination with the optical conductivity, the semiconducting characteristics of solid fullerene, comparable to
hydrogenated amorphous silicon (a-Si:H).

Figure C1.2.5. Illustration of the pπ orbital energy levels in [60]fullerene, [70]fullerene and monofunctionalized
pyrrolidino[60]fullerene [26].

In light of oxidative processes, the high degree of resonance stabilization that arises from the maximally occupied 
HOMO (10 electrons), makes it an extremely difficult task to remove an electron from the HOMO level [31]. Thus, 
[60]fullerene can be considered a predominantly electronegative entity which is much more easily reduced than oxidized.

The electronic configuration of higher fullerenes, e.g. [70] (figure C1.2.5), [76], [78] and [84]fullerene, is in essence
similar to that known for [60]fullerene with, however, one fundamental difference. Their HOMO-LUMO energy
gap decreases gradually with increasing number of carbon atoms [32]. On the other hand, recent calculations
regarding the smallest fullerene species isolated so far, namely [36]fullerene, also indicate a substantially reduced
energy gap of ~0.2 eV [33]. From the even stronger constrained curvature, relative to [60]fullerene, stems the
fundamental consequence that the carbon atoms in [36]fullerene become so reactive that rapid polymerization
occurs, preventing a systematic detailed investigation of any other properties of [36]fullerene.

The most important classes of functionalized [60]fullerene derivatives, e.g. methanofullerenes [34],
pyrrolidinofullerenes [35], Diels-Alder adducts [34] and aziridinofullerene [36], all give rise to a cancellation of
the fivefold degeneracy of their HOMO and threefold degeneracy of their LUMO levels (figure C1.2.5). This
stems, in a first order approximation, from a perturbation of the fullerene's π-electron system in combination with a
partial loss of the delocalization.


C1.2.4 THIN FILMS 

The growth of a well ordered fullerene monolayer, by means of molecular beam epitaxy, has been used for the
controlled nucleation of single crystalline thin films. The quality and stability of molecular thin films has been
shown to depend strongly on the interaction between the molecules and the substrates chosen for their growth
[37, 38]. It is important to note that fullerenes give rise to much stronger intermolecular interactions relative to
conventional organic molecules. Thus, their utilization for molecular thin films, by means of organic molecular
beam epitaxy, has been vigorously investigated. In essence, parameters such as the character and also the strength
of the interaction between the fullerene core and the substrate determine the morphology and regularity of the film
growth. In the case of strong interactions the mobility of the chemisorbed fullerene molecule is limited and
successive deposition leads to the growth of polycrystalline grains with typically small diameters. In clear contrast,
on substrates such as {001} KBr [39, 40], {0001} MoS₂ [41], {0001} GaSe [41], {111} CaF₂ [42], {111} Si [42, 43],
{111} GeS [44], GaAs [45], freshly cleaved mica [46, 42] and layered materials [38], the van der Waals interaction
between the fullerene molecules is sufficiently strong to overpower the interaction between the substrate
and individual fullerene molecules. This leads to a high effective surface mobility of the physisorbed fullerene
molecules and to fairly large grains. Predominantly an fcc structure with a series of close-packed planes {111}
oriented with respect to the substrate plane, or an hcp structure, with {0001} close-packed planes, is found. Good
single-domain epitaxy was also reported on Au(110), Ag(110) and Ni(110), while multi-domain growth predominates
on Cu(111), Au(111), Ag(111) and Pt(111). Charge transfer into the fullerene's LUMO is deemed to be the
dominant effect that leads to a strong interaction with the substrate and reduces the effective surface mobility, as
has been observed for Au{110} and a variety of metals including Ag, Mg, Cr and Bi [48, 49, 50, 51 and 52].

Consequently, deposition of the first monolayer is the most important factor, determining the growth of the
subsequent layers and, in turn, the crystallinity of the resulting multilayered films. General information
regarding the strength of the fullerene-substrate bond can be derived from the thermal stability of the fullerene
layer. This has been impressively documented via the observation of the principally different crystallinity of
[60]fullerene films on semiconductor surfaces such as Si. Depending on the Si surface, being either hydrophobic
(passivated) or hydrophilic (non-passivated), films were crystalline with an fcc structure and a noticeable {111}
texture, or of amorphous nature, respectively. The amorphous character has been ascribed to the fullerene's
interaction with the hydrophilic substrate, where [60]fullerene is mobile and diffuses freely even at temperatures as
low as 100 K [37].


C1.2.5 DOPING OF FULLERENES AND SUPERCONDUCTIVITY 

The crystal structure of [60]fullerene reveals characteristics of an fcc packing with the molecules located at the
lattice points and four fullerene molecules per unit cell [26]. According to the fcc lattice constant, a = 14.15 Å, the
intermolecular centre-to-centre distance (a/√2) is 10.01 Å. The close packing of the nanometre-sized
fullerenes creates two types of interstitial site, sufficiently large to host small-sized molecules or atoms without
distorting the crystal [53]. In particular, one octahedral site and two tetrahedral sites per [60]fullerene are present
with radii of 2.06 Å and 1.12 Å, respectively. It was shown that the respective tetrahedral and octahedral vacancies
of the fullerene crystal may be filled with a wide variety of dopants [54, 55, 56 and 57]. These range from various
alkali (Li, Na, K, Rb, Cs), alkaline earth (Ca, Sr, Ba) and rare-earth metals (Yb, Sm, Eu) to organic donors,
such as ferrocene and tetrakis(dimethylamino)ethylene. Metal diffusion in fullerene films and single crystals or
vapour transport diffusion are the most widely applied methodologies for fabricating [60]fullerene-metal
intercalation composites. On the other hand, intercalation of compounds that have low diffusion coefficients in C₆₀
necessitates sublimation in a UHV chamber.
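The geometric quantities quoted in this paragraph can be estimated from the lattice constant alone. The sketch below is a rough check that assumes touching hard spheres on an ideal fcc lattice, so that the sphere radius is half the nearest-neighbour distance. This hard-sphere picture and the function name are assumptions made for illustration and may differ from the procedure behind the quoted radii.

```python
import math


def fcc_site_geometry(a_angstrom: float) -> dict:
    """Nearest-neighbour distance and interstitial radii (in angstrom) for
    touching hard spheres on an fcc lattice with cubic lattice constant a."""
    nearest_neighbour = a_angstrom / math.sqrt(2)   # along the face diagonal
    sphere_radius = nearest_neighbour / 2
    # Octahedral site at the cell edge midpoint: a/2 from the nearest sphere centres
    octahedral = a_angstrom / 2 - sphere_radius
    # Tetrahedral site at (1/4, 1/4, 1/4): sqrt(3)/4 * a from the nearest centres
    tetrahedral = math.sqrt(3) / 4 * a_angstrom - sphere_radius
    return {"nearest_neighbour": nearest_neighbour,
            "octahedral_radius": octahedral,
            "tetrahedral_radius": tetrahedral}


geometry = fcc_site_geometry(14.15)
for name, value in geometry.items():
    print(f"{name}: {value:.2f} angstrom")
```

With a = 14.15 Å this estimate reproduces the centre-to-centre distance of 10.01 Å and gives interstitial radii close to the quoted 2.06 Å and 1.12 Å. An fcc cell holds four lattice points, four octahedral sites and eight tetrahedral sites, which is the origin of the one octahedral and two tetrahedral sites per molecule.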

The occupation of each tetrahedral and octahedral site in these regularly oriented arrays of cavities by, for example,
alkali atoms results in the transfer of a single electron to the fullerene's conduction band (t_1u) [58]. Consequently,
via stoichiometric alkali metal doping, intercalation structures were fabricated ranging from A₁C₆₀ and A₃C₆₀ to
terminally reacted A₆C₆₀ phases and A₁₂C₆₀/A'₆C₆₀ (A = Li and A' = Sr, Ba; full occupation of the LUMO (t_1u)
and LUMO + 1 (t_1g) levels) (figure C1.2.6) [54, 55, 56 and 57]. It was shown that the stability of the alkali A₃C₆₀
and A₆C₆₀ composite structures is mainly governed by the Madelung potential.

Figure C1.2.6. Summary of fcc [60]fullerene structure and alkali-intercalation composites of [60]fullerene.

The half-filled molecular conduction band (t_1u) in A₃C₆₀ gives rise to a maximized density of states at the Fermi
level. It is interesting to note that this metallic phase becomes a superconductor at low temperatures, while the
A₆C₆₀ phase proves to be mainly insulating (figure C1.2.6). The resulting A₃C₆₀ compounds are typically ionic,
[A⁺]₃[C₆₀³⁻], and form fcc lattices which position the icosahedral fullerene cores in sites of local cubic symmetry
with a lattice parameter (a = 14.24 Å) that is only slightly expanded from that of fcc dopant free [60]fullerene [54].


Raman scattering is the key technique for probing the doping process in fullerites [59]. Specifically, the position of
the A_g(2) pinch mode, which is a characteristic fingerprint for the charge transfer in these alkali doped systems, has
been employed with great success to identify the various doped phases. Fundamental experiments have been
performed with the scope to identify the nature of the superconductivity in these classes of materials including
interpretation of the phase diagram as a function of composition, pressure and magnetic field, structure
determination, magnetic susceptibility and, finally, NMR relaxation measurements in the normal state [60].
Specifically, band structure calculations on A₃C₆₀ composites indicate that the charge transfer is nearly complete
and that the electrons are used to half fill the conduction band [54]. Consequently, a simple charge transfer concept,
from the cations to the LUMO of the fullerene, yielding a metallic state, has been proposed for a qualitative
rationalization of the electronic properties.
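The simple charge transfer concept amounts to electron counting. As a minimal sketch, assuming complete transfer of one electron per alkali atom into a threefold degenerate t_1u level that holds at most six electrons, the filling of the conduction band can be tabulated as follows. The function and constant names are illustrative, and the count alone says nothing about structure or electron correlation, so it cannot by itself decide whether a partially filled phase is metallic.

```python
T1U_CAPACITY = 6  # threefold degenerate level, two electrons per orbital


def t1u_filling(x_alkali: float, electrons_per_dopant: int = 1) -> float:
    """Fraction of the t_1u-derived band filled in A_x C60, assuming complete
    charge transfer from every dopant atom (rigid-band electron count)."""
    electrons = x_alkali * electrons_per_dopant
    if not 0 <= electrons <= T1U_CAPACITY:
        raise ValueError("electron count lies outside the capacity of the t_1u level")
    return electrons / T1U_CAPACITY


for x in (0, 1, 3, 4, 6):
    print(f"A{x}C60: t_1u filling = {t1u_filling(x):.2f}")
```

In this count x = 3 corresponds to the half-filled band and x = 6 to the completely filled band.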


In pristine [60]fullerene, the t_1u band is completely empty while, in contrast, the A₆C₆₀ phase (bcc lattice) has a
completely filled conduction band. In the intermediately doped A₄C₆₀ phase (bct lattice), the density of states at the
Fermi level is, however, nearly zero. These considerations are consistent with the absence of high temperature
superconductivity in [60]fullerene, A₄C₆₀ and A₆C₆₀. In conclusion, the superconducting behaviour strongly
depends on the concentration of conduction band electrons, on the lattice constant and the degree of orientational
order, yielding composites which display T_c values between 2 and 40 K. The highest T_c values that are reported
are those of K- (33 K), Rb- (33 K) and Cs- (40 K, stabilized under hydrostatic pressure) doped A₃C₆₀ composites
(figure C1.2.7). Their properties may be best understood on the basis of a high average phonon frequency in
combination with weak intermolecular interactions and strongly scattering intramolecular modes.

Also, novel magnetic properties have been reported in mixed fullerene composites, in which the fullerene is limited
to a single negative charge. For instance, the tetrakis(dimethylamino)ethylene/[60]fullerene salt, namely,
[TDAE⁺][C₆₀⁻], has been described as a soft ferromagnet with a Curie temperature of 16 K [61, 62].

Figure C1.2.7. Superconducting transition temperature plotted as a function of the a lattice parameter for a variety
of A₃C₆₀ phases [55].


C1.2.6 FULLERENE POLYMERS 


Several studies have demonstrated the successful incorporation of [60]fullerene into polymeric structures by
following two general concepts: (i) in-chain addition, so called pearl necklace type polymers or (ii) on-chain
addition pendant polymers. Pendant copolymers emerge predominantly from the controlled mono- and multiple
functionalization of the fullerene core with different amine-, azide-, ethylene propylene terpolymer, polystyrene,
poly(oxyethylene) and poly(oxypropylene) precursors [63, 64, 65, 66, 67 and 68]. On the other hand, (-C₆₀Pd-)ₙ
polymers of the pearl necklace type were formed via the periodic linkage of [60]fullerene and Pd monomer units
after their initial reaction with the p-xylylene diradical [69, 70 and 71].


An alternative approach envisages the stimulating idea to produce an all-carbon fullerene polymer in which
adjacent fullerenes are linked by covalent bonds and align in well characterized one-, two- and three-dimensional
arrays. Polymerization of [60]fullerene, with the selective formation of covalent bonds, occurs upon treatment
under pressure and relatively high temperatures, or upon photopolymerization in the absence of a triplet quencher,
such as molecular oxygen, using an Ar ion laser at intensities of 50 mW mm⁻² [72, 73, 74, 75, 76, 77 and 78]. The
synthesis of at least three polymer phases is reported with characteristics ranging from those of rhombohedral and
orthorhombic to tetragonal phases (figure C1.2.8). Typically, in these oligomer and polymer composites the
interfullerene C-C linkage evolves from a [2+2] cycloaddition between neighbouring fullerene cores, which results
in the formation of four-membered carbon rings (D_2h symmetry), fusing together adjacent molecules.
Photopolymerization is hindered below the ordering transition of [60]fullerene as the probability of short
carbon-carbon bonds approaching intermolecular contact diminishes.

Figure C1.2.8. Orthorhombic 1D and rhombohedral 2D structure of polymerized [60]fullerene.

The C-C linkage in the polymeric [60]fullerene composite is highly unstable and, in turn, the reversible [2+2]
phototransformation leads to an almost quantitative recovery of the crystalline fullerene. In contrast, the similarly
conducted illumination of [70]fullerene films results in an irreversible and randomly occurring photodimerization.
The important aspect which underlines the markedly different reactivity of the [60]fullerene polymer material
relative to, for example, the analogous [36]fullerene composites, is the reversible transformation of the former back
to the initial fcc phase.


C1.2.7 LANGMUIR-BLODGETT FILMS 

Well ordered two-dimensional monolayered films are of great interest because of the valuable insights they provide
regarding molecular interactions and their potential application to important technologies related to coatings and
surface modifications. The optical and electrical properties of [60]fullerene films are strongly affected by the
deposition conditions, by impurities or disordered structures. Thus, the exploration of these properties requires the
controlled incorporation of fullerenes in well defined two-dimensional arrays and three-dimensional networks.
Extensive efforts have been undertaken, ranging from modifying the deposition conditions to applying amphiphilic
host molecules, with the scope to generate stable and well ordered monolayered fullerene films
[79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91 and 92]. The strong π-π interaction and the resulting tendency to
form aggregates preclude, however, formation of stable monolayers at the air-water interface and of
Langmuir-Blodgett films on solid substrates. In essence, the currently available data suggest three promising
approaches to overcome these fundamental difficulties: (i) amphiphilic functionalization of the hydrophobic
fullerene core via covalent attachment of hydrophilic groups [93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103 and 104],
(ii) reducing the hydrophobic surface via controlled multiple functionalization [105, 106 and 107] or (iii)
self-assembly via electrostatic attractions of oppositely charged species [108].

The controlled functionalization of the fullerene core, at one or more positions by addends of different
hydrophobicities, has been demonstrated to provide a viable means to control the supramolecular structures
formed upon their spreading on the water surface. It was unequivocally demonstrated that choosing an adequate
hydrophobic-hydrophilic balance is a fundamental aspect with regard to forming stable and monolayered fullerene
composites. In particular, functionalization with hydrophilic addends, such as cryptates, triethyleneglycol
monomethyl ether, benzocrowns, N-acetyl pyrrolidine derivatives, carboxylic acid groups and oxygen (C₆₀O),
enhances the amphiphilic character of the fullerene core. A simple picture helps to
rationalize the success of this concept: the hydrophilic or water-soluble head groups enhance the interaction with
the aqueous subphase and, in turn, allow a two-dimensional fixation of the fullerene core at the air-water interface.


C1.2.8 ELECTROCHEMISTRY 

One aspect that reflects the electronic configuration of fullerenes relates to the electrochemically induced reduction
and oxidation processes in solution. In good agreement with the threefold degenerate LUMO, the redox chemistry
of [60]fullerene, investigated primarily with cyclic voltammetry and Osteryoung square wave voltammetry,
reveals six reversible, one-electron reduction steps with potentials that are equally separated from each other. The
separation between any two successive reduction steps is ~450 ± 50 mV. The low reduction potential (only -0.44 V
versus SCE) of the process that corresponds to the generation of the π-radical anion [31, 109, 110, 111 and 112]
deserves special attention.
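The pattern of six equally spaced reductions can be written out explicitly. The sketch below simply extrapolates from the first reduction potential (-0.44 V versus SCE) with the nominal 450 mV spacing quoted above. It is an idealized ladder rather than a set of measured values, the stated uncertainty of ±50 mV per step is ignored, and the function name is illustrative.

```python
def reduction_ladder(first_potential_v: float = -0.44,
                     spacing_v: float = 0.45,
                     steps: int = 6) -> list:
    """Idealized potentials (V versus SCE) for successive one-electron
    reductions separated by a constant spacing."""
    if steps < 1:
        raise ValueError("at least one reduction step is required")
    return [round(first_potential_v - i * spacing_v, 2) for i in range(steps)]


for charge, potential in enumerate(reduction_ladder(), start=1):
    print(f"C60 -> {charge}-: about {potential:+.2f} V versus SCE")
```

The six steps correspond to filling the threefold degenerate LUMO with up to six electrons.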

In contrast to the relative ease of reduction, oxidation of fullerenes requires more severe conditions [113, 114]. Not
only does the resonance stabilization raise the level of the corresponding oxidation potential (1.26 V versus
Fc/Fc⁺), but also the reversibility of the underlying redox process is affected [115].

This behaviour also holds for functionalized [60]fullerene derivatives, with, however, a few striking differences.
The most obvious parameter is the negative shift of the reduction potentials, which typically amounts to ~100 mV.
Secondly, the separation of the corresponding reduction potentials is clearly different. While the first two reduction
steps follow closely the trend noted for pristine [60]fullerene, the remaining four steps display an enhanced
separation. This, again, closely resembles the result of the HOMO-LUMO calculations, namely, a cancellation of the
degeneracy for functionalized [60]fullerenes [31, 116, 117].

The electrochemical features of the next higher fullerene, namely, [70]fullerene, resemble the prediction of a
doubly degenerate LUMO and a LUMO + 1 which are separated by a small energy gap. Specifically, six reversible
one-electron reduction steps are noticed with, however, a larger splitting between the fourth and fifth reduction
waves. It is important to note that the first reduction potential is less negative than that of [60]fullerene [31].
Parallel to the shift that the reduction of higher fullerenes shows, oxidation of the latter is also made easier (D_5h
[70]fullerene: +1.20 V versus Fc/Fc⁺). The underlying HOMO-LUMO gap in D_5h [70]fullerene (2.22 V) is,
therefore, markedly decreased relative to [60]fullerene (2.32 V). This trend is further extended in D_2 [76]fullerene
(1.64 V), C_2v [78]fullerene (1.72 V) and D_2/D_2d [84]fullerene (1.6 V) [32, 118]. In conclusion, higher fullerenes
are better electron accepting and, at the same time, better electron donating materials relative to their smaller cousin,
[60]fullerene.
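The quoted oxidation potentials and electrochemical gaps can be combined to recover the first reduction potentials on the ferrocene scale. The following sketch assumes that each quoted gap is the difference between the first oxidation and the first reduction potential measured against the same Fc/Fc⁺ reference. That reading of the numbers, and the names used in the code, are assumptions made for illustration.

```python
# (first oxidation potential in V versus Fc/Fc+, electrochemical gap in V),
# both values as quoted in the text
QUOTED = {
    "[60]fullerene": (1.26, 2.32),
    "[70]fullerene": (1.20, 2.22),
}


def implied_first_reduction(e_ox_v: float, gap_v: float) -> float:
    """First reduction potential implied by gap = E_ox - E_red."""
    return round(e_ox_v - gap_v, 2)


implied = {name: implied_first_reduction(e_ox, gap)
           for name, (e_ox, gap) in QUOTED.items()}
for name, e_red in implied.items():
    print(f"{name}: implied first reduction at {e_red:+.2f} V versus Fc/Fc+")
```

Under this assumption the implied value for [70]fullerene is slightly less negative than that for [60]fullerene, consistent with the statement that the larger cage is both easier to reduce and easier to oxidize.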



Thin films of fullerenes, which were deposited on an electrode surface via, for example, drop coating, were largely
heterogeneous, due to the entrapping of solvent molecules into their domains. Consequently, their electrochemical
behaviour displayed different degrees of reversibility and stability depending on the time of electrolysis and the
number of consecutive redox cycles scanned. Langmuir-Blodgett films of pristine [60]fullerene, upon
electrochemical reduction, formed insoluble films which stem from the immobilization of charge compensating
counter-cations into the film. The large separation between the cathodic and anodic waves, indicative of a high
degree of irreversibility, has been attributed to structural rearrangements upon the reduction and reoxidation
process and documents the high disordering of the fullerene cores in LB films [91, 119, 120, 121, 122, 123 and
124].


C1.2.9 SOLUBILITY

The quasi-aromatic structure of fullerenes affects the solubility of these hydrophobic moieties. Typical
solvents are nonpolar organic liquids, such as toluene, benzene and chlorinated hydrocarbons [125]. In
toluene, benzene and o-xylene the solubility of [60]fullerene exhibits, surprisingly, a negative temperature
dependence, along with a maximum of solubility around 280 K. On the other hand, polar solvents including
alcohols and aqueous systems are of impractical use for investigating the physical and chemical properties of
fullerenes [126, 127]. For example, in polar solutions the hydrophobic fullerene core aggregates spontaneously,
yielding clusters with indefinite aggregate sizes and unknown properties that vary, in part, quite significantly from
those of true fullerene monomers [128, 129]. This fullerene clustering has been monitored directly through
dynamic light scattering and gel exclusion chromatography. An elegant route to overcome, in particular, the
water-insolubility of fullerenes is their incorporation into water-soluble superstructures. In this context, cyclodextrins
[130, 131], calixarenes [132], various micellar [133, 134, 135 and 136] and vesicular host structures [137, 138, 139
and 140] were successfully utilized and the resulting complexes were studied with regard to photo-induced
cytotoxicity and, if radiolabeled at carbon, as potential biochemical tracers [141].


C1.2.10 PHOTOEXCITED STATES 

Another interesting physical feature relates to the chromophoric character of fullerenes. Based on the symmetry
prohibitions, solutions of [60]fullerene absorb predominantly in the UV region, with distinct maxima at 220, 260
and 330 nm. In contrast to extinction coefficients on the order of 10⁵ M⁻¹ cm⁻¹ at these wavelengths, the visible
region shows only relatively weak transitions (λ_max at 536 nm; ε = 710 M⁻¹ cm⁻¹) [142].
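Because the excited state energies in this section are given in electron volts while the absorption maxima are given in nanometres, it can help to convert between the two. The sketch below uses the standard relation E = hc/λ with hc ≈ 1239.84 eV nm. The function name is illustrative.

```python
HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV nm


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy in eV for a vacuum wavelength given in nm."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm


for wavelength in (220, 260, 330, 536):
    print(f"{wavelength} nm -> {photon_energy_ev(wavelength):.2f} eV")
```

The same function can be inverted by hand (λ = hc/E) to place an energy quoted in eV on the wavelength axis of a spectrum.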

Similar to the fullerene ground state, the singlet and triplet excited state properties of the carbon network are best
discussed with respect to the three-dimensional symmetry. Surprisingly, the singlet excited state gives rise to a low
fluorescence quantum yield (Φ_FLU) of 1.0 × 10⁻⁴ [143]. Despite the highly constrained carbon network,
the low Φ value relates to the combination of a short lifetime (1.8 ns) [144], a quantitative intersystem crossing
(Φ_ISC = 1) [145] and, finally, the symmetry forbidden nature of the lowest energy transition. To the same extent,
the phosphorescence quantum yield (Φ_PHOS) [146] is also strongly impacted by the spherical structure.

Concerning transient absorption, laser or light excitation throughout the UV-visible region leads to the generation
of the singlet excited state. The latter, whose lowest vibrational state has an energy of 1.99 eV, gives rise to a
characteristic singlet-singlet absorption, maximizing around 920 nm [144]. Once formed, the singlet excited state
undergoes a rapid and, more importantly, a quantitative intersystem crossing to the energetically lower lying triplet
excited state. The lowest triplet excited state has an energy of ~1.57 eV [147, 148]. In the case of [60]fullerene, this
intersystem crossing takes place with a lifetime of 1.8 ns, governed by a large spin-orbit coupling which makes this
process much faster than those known for two-dimensional rigid hydrocarbons. Together, the fast ISC rate
and the weak fluorescence provide the means for a triplet quantum yield (Φ_TRIPLET) close to unity.
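The quoted singlet lifetime (1.8 ns) and fluorescence quantum yield (1.0 × 10⁻⁴) can be turned into approximate rate constants. The sketch below uses the standard kinetic relations k_total = 1/τ and Φ = k_radiative/k_total, and assumes that fluorescence and nonradiative decay are the only channels depopulating the singlet state. The variable and function names are illustrative.

```python
TAU_SINGLET_S = 1.8e-9      # singlet excited state lifetime quoted in the text
PHI_FLUORESCENCE = 1.0e-4   # fluorescence quantum yield quoted in the text


def decay_rates(tau_s: float, phi_fluorescence: float) -> tuple:
    """Return (k_total, k_radiative, k_nonradiative) in s^-1, using
    k_total = 1 / tau and phi = k_radiative / k_total."""
    if tau_s <= 0 or not 0 <= phi_fluorescence <= 1:
        raise ValueError("need tau > 0 and a quantum yield between 0 and 1")
    k_total = 1.0 / tau_s
    k_radiative = phi_fluorescence * k_total
    k_nonradiative = k_total - k_radiative
    return k_total, k_radiative, k_nonradiative


k_total, k_rad, k_nonrad = decay_rates(TAU_SINGLET_S, PHI_FLUORESCENCE)
print(f"total decay rate:        {k_total:.2e} s^-1")
print(f"radiative rate:          {k_rad:.2e} s^-1")
print(f"nonradiative (ISC) rate: {k_nonrad:.2e} s^-1")
```

With these inputs nearly all of the decay is nonradiative. If that channel is attributed entirely to intersystem crossing, the corresponding yield is 1 − Φ_FLU ≈ 0.9999, in line with a triplet quantum yield close to unity.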

The triplet-triplet absorption spectrum reveals, similar to the singlet-singlet features, a maximum in the near-IR
region around 750 nm [149]. In the absence of alternative deactivation processes, such as triplet-triplet annihilation
and also ground-state quenching, the triplet lifetime amounts to ~100 μs and is, again, much shorter than the triplet
lifetime of comparable planar hydrocarbons [150, 151]. In this context, it is interesting to note that the highly
constrained carbon network, particularly that of [60]fullerene, prohibits any vibrational motion, C-C bond
elongation, or even changes of the dipole moment that may accelerate the deactivation of the singlet or triplet
excited states. This argument is further substantiated by a small Stokes effect [152], which correlates to the
energetic adaptation of the excited state to a new solvent environment, and also insignificant resonance Raman
shifts [153] upon reduction of the fullerene core.

In aerated or oxygen saturated solutions, the fullerene triplet lifetime suffers a marked reduction [ 147 , 148 ]. 

Luminescence studies (at 1365 nm) helped to identify singlet oxygen (¹O2) as a product evolving from a
bimolecular, Dexter-type energy transfer reaction. Particularly promising is the quantum yield for singlet
oxygen formation, which is near unity [147, 148]. In other words, the fullerene triplet excited state is quantitatively
converted into this biologically important oxygen species. Corresponding experiments with functionalized
[60]fullerene derivatives that display sufficient water solubility in the absence of a host structure, such as
C60[C(COO⁻)2]2, C60[C(COO⁻)2]3, C60[(CH2)4SO3Na]6 and C60(OH)18, corroborated the efficient formation of singlet
oxygen in aqueous solutions as well [154, 155]. This, of course, evoked tremendous interest in probing fullerenes as
potential agents for photodynamic therapy. Encouraging results stem from the strong cytotoxicity to L929 cells upon
visible light irradiation as a result of superoxide production [141].

There are, indeed, many biological implications that have been triggered by the advent of fullerenes. They range
from potential inhibition of HIV-1 protease, synthesis of drugs for photodynamic therapy and free radical
scavenging (antioxidants), to participation in photo-induced DNA scission processes [156, 157, 158, 159, 160, 161,
162 and 163]. These examples unequivocally demonstrate the particular importance of water-soluble fullerenes and
are summarized in a few excellent reviews [141, 175].

Another application is optical limiting [164, 165]. This is performed by materials whose transmittance drops strongly
and quickly as the intensity of a laser pulse traversing them increases beyond a saturation level. Reverse
saturable absorption is assumed to be the dominant mechanism behind the optical limiting. This
mechanism plays an active role when an excited state that is efficiently populated by optical excitation has an
absorption cross-section larger than that of the ground state at the excitation wavelength. In this light, the weak
broad absorption of [60]fullerene throughout the UV-visible spectrum is beneficial for exciting the ground state. Most
importantly, both excited states, i.e. the singlet and triplet states, display cross-sections significantly larger than
that of the singlet ground state all over the accessible visible and near-IR spectrum. Consequently, the reverse
saturable absorption that occurs with samples of pristine [60]fullerene and functionalized [60]fullerene derivatives
in the picosecond time regime and on the nano-/microsecond time scales is attributed to the
lowest singlet and triplet excited states, respectively. In particular, sol-gel films of [60]fullerene and some
derivatives exhibit a marked enhancement in the red spectral region.

C1.2.11 π-RADICAL ANIONS

The most important species among the reduced fullerenes (π-radical anions to hexa-anions) are the one-electron
reduced forms. In general, various techniques have been employed for their characterization, ranging from transient
absorption spectroscopy [166, 167 and 168] to transient electron spin resonance spectroscopy [111, 112, 167, 169,
170, 171 and 172]. The absorption features of the [60], [70], [76], [78] and [84]fullerene π-radical anions, which lie
predominantly in the near-IR region, have been unambiguously confirmed [173]. For example, the [60]fullerene π-radical
anion shows a narrow band around 1080 nm which serves as a diagnostic probe for the identification of this
one-electron reduced species and, furthermore, allows an accurate analysis of inter- and intramolecular ET dynamics in
[60]fullerene containing systems. The same cannot be said of the interpretation of the ESR signals, which is still
the subject of a controversial discussion favouring either a narrow or, alternatively, a broad ESR feature
[111, 112, 167, 169, 170, 171 and 172]. A recent hypothesis proposes, in essence, a narrow ESR line for the
π-radical anion which, upon rapid dimerization, undergoes a significant line broadening [169, 170]. It remains,
however, to be shown by different techniques such as, for example, time-resolved pulse radiolysis coupled with an
ESR detector which assumption can be trusted.


C1.2.12 ELECTRON TRANSFER REACTIONS 

The combination of a high degree of electron delocalization within the fullerenes' π-system and their effective
sizes prompts the application of these carbon materials as new electron accepting moieties (figure C1.2.9). More
importantly, the total reorganization energy upon reduction has been shown to be relatively small [174]. Hence,
fullerenes became very appealing spheres for inter- and intramolecular electron transfer processes in the context
of energy conversion and energy storage [175, 176 and 177].

A specific case that attracted a lot of attention encompasses the intermolecular electron transfer from a series of
arene radical cations to [76] and [78]fullerene [178]. The high degree of charge and energy delocalization within
the fullerene moiety is expected to exert an effect in the desired direction as it minimizes vibrational differences
between the reaction partners in the ground and transition state. The corresponding relation between the rate
constant and the thermodynamic driving force follows the features of a parabola, i.e. the electron transfer rates
increase only to a maximal value before they decrease noticeably at higher driving forces. This kinetic study made
use of unequally sized reaction partners, namely, a large-sized electron donor and small-sized electron acceptor
couple, elevating the diffusion-controlled limit. Furthermore, the relatively low reorganization energy clearly
favours the observation of Marcus-inverted behaviour, because the maximum of the exothermic electron transfer
process is reached at lower −ΔG and, in turn, the inverted region is reached at lower energy.
This example is one of the rare cases that establish a 'Marcus-inverted' region in a bimolecular electron transfer,
besides those reports on the geminate recombination of photolytically generated radical pairs.
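
The parabolic dependence described above can be illustrated with a minimal sketch based on the classical
(high-temperature) Marcus expression, in which the rate is proportional to exp[−(ΔG + λ)²/(4λk_BT)]. The
pre-exponential factor and the diffusion-controlled limit are deliberately left out, and the reorganization energy
of 0.6 eV is a hypothetical value chosen only for illustration; it is not taken from the study cited above.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def marcus_relative_rate(delta_g_ev, lambda_ev, temperature_k=298.0):
    """Classical Marcus factor exp(-(dG + lambda)^2 / (4 lambda kT)); prefactor omitted."""
    activation_ev = (delta_g_ev + lambda_ev) ** 2 / (4.0 * lambda_ev)
    return math.exp(-activation_ev / (K_B_EV * temperature_k))


# Hypothetical reorganization energy (assumption, for illustration only).
LAMBDA_EV = 0.6

# Scan the driving force -dG from 0 to 1.5 eV in steps of 0.1 eV.
driving_forces = [i * 0.1 for i in range(16)]
rates = [marcus_relative_rate(-x, LAMBDA_EV) for x in driving_forces]
best = max(range(len(rates)), key=rates.__getitem__)

for x, k in zip(driving_forces, rates):
    print(f"-dG = {x:.1f} eV   relative rate = {k:.3e}")
print(f"rate maximum at -dG = {driving_forces[best]:.1f} eV")
```

In this model the rate peaks where −ΔG equals λ and falls off symmetrically on either side, so a smaller λ moves both
the maximum and the onset of the inverted region to lower driving force, which is the qualitative point made in the
text.
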

Figure C1.2.9. Schematic representation of photoinduced electron transfer events in fullerene based donor-acceptor
arrays (i) from a TTF donor moiety to a singlet excited fullerene and (ii) from a ruthenium excited MLCT
state to the ground state fullerene.


The redox properties of pristine fullerenes and monofunctionalized fullerene derivatives in their ground and excited
states have drawn much attention for the design of devices such as molecular switches, receptors, photoconductors
and photoactive dyads [175, 176 and 177]. These applications are generally based on the use of fullerenes
as a multifunctional electron storage moiety. The excellent electron accepting properties of fullerenes, together
with their low reorganization energy, make [60]fullerene and its derivatives good candidates for building blocks of
systems employable for solar energy conversion, batteries and photovoltaics. The concept of linking fullerenes to a
number of interesting electro- or photoactive species offers new opportunities in the preparation of materials with
building blocks having highly symmetrical and coordinating geometries.

The chemically simplest case encompasses the covalent attachment of addends that lack any visible absorption and
are as such redox inactive, but may indirectly influence electron or energy transfer reactions. Of increasing
complexity are arrays carrying electroactive addends, e.g. N,N-dimethylaniline (DMA) [179, 180], ferrocene (Fc)
[181] or tetrathiafulvalene (TTF) (figure C1.2.10) [182]. In these multicomponent supermolecules, the fullerene
core is implemented as a photosensitizer that subsequently accepts an electron from the adjacent electroactive
moiety. Accordingly, these systems can be classified as electroactive but, owing to their insignificant visible
absorption characteristics, photoinactive dyads. Considering the moderate absorption features of [60]fullerene in
the visible region, functionalization with antenna molecules, such as metalloporphyrins [183, 184, 185, 186, 187,
188, 189, 190, 191, 192, 193 and 194] or MLCT transition metal complexes [195, 196 and 192], has developed as an
important objective to promote the visible absorption characteristics of the resulting dyads and, most importantly,
to improve the light harvesting efficiency of the fullerene core (figure C1.2.10). As a direct consequence, the role
of the fullerene is significantly changed. Under these circumstances fullerenes operate exclusively as either electron
or energy acceptor moieties. For details, the reader is directed to a series of excellent review articles, which
appeared during recent years [175, 176 and 177].


C1.2.13 ENDOHEDRAL FULLERENES 

Finally, endohedral fullerenes are discussed. They attracted considerable attention for their potential use as
superconductors, organic ferromagnets and magnetic resonance imaging (MRI) agents. The enthusiasm that has
arisen is based, in part, on the fact that the carbon network of each fullerene surrounds a large empty space which,
in turn, renders it capable of encapsulating atomic particles. Furthermore, these novel materials created the
stimulating possibility of fine-tuning the fullerene's physical and chemical properties via systematic substitution of
the embedded metal species. In general, two approaches are pursued to incorporate the metal into the fullerene's
interior. The first one relies on the arc discharge method, applied to carbon rods in the presence
of metal carbides [198, 199]. Thus, the metal is present during the genesis of the fullerene network and can be
scavenged by the closing sphere. In contrast to this approach, the second alternative involves the chemically
induced opening of the carbon network, stuffing of the vacant interior with metals and, in the last step, the
re-closing of the open sphere [200]. It should be emphasized that the latter concept is a very challenging
endeavour from the standpoint of synthesis. In fact, so far only the first route has led to isolable yields of
endohedral fullerenes.

Metallofullerenes are commonly found with [74], [76], [80] and [82]fullerene and span composites that have a
single (M@C82), two (M2@C82) or even three metal atoms (M3@C82) encapsulated. The first type of
metallofullerene extracted from fullerene soot was lanthanum fullerene La@C82, followed a short time later by the
detection of scandium fullerene Sc@C82 and yttrium fullerene Y@C82 [201, 202 and 203]. These have been
complemented by essentially all alkali metals, alkaline-earth metals, noble gases and rare-earth metals (figure C1.2.11)
[204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219 and 220].

Figure C1.2.10. Representative examples of fullerene based donor-bridge-acceptor dyads and triads.

Figure C1.2.11. Synthesized and isolated endohedral fullerenes.

There has been a long dispute over the endo- or exohedral nature of such metallofullerenes. Experimental evidence
including scanning tunnelling microscopy, extended x-ray absorption fine structure and transmission electron
microscopy confirmed unequivocally the endohedral structure of these fullerene composites [204, 205, 206, 207,
208, 209, 210, 211, 212, 213, 214, 215, 216, 217 and 218]. ESR, on the other hand, has proven to be the
method of choice to investigate the electronic state [210, 221, 222 and 223]. Spectral evidence from the latter
technique demonstrates that the encaged metal atom transfers a significant amount of charge to the carbon cage.

The most recent success in stuffing the fullerene's interior and using it as a carrier is the stabilization of
atomic nitrogen and phosphorus inside [60]fullerene [9, 224]. In particular, the remarkable stability of the N@C60
system led the researchers to postulate a Faraday-cage-like property of the carbon network. EPR and ENDOR
experiments, in combination with the stability of the system in air, clearly demonstrate that the nitrogen is
incorporated into the fullerene interior and that, furthermore, it completely retains its atomic ground state
configuration [9]. The spherical symmetry of N@C60, as derived, for example, from the absence of anisotropic
hyperfine interaction, is interpreted as a strong indicator for the fixation of the nitrogen in the centre of
[60]fullerene. In addition, the centre position is substantiated by potential energy calculations suggesting that the
highly reactive nitrogen atom is trapped and shielded from the surroundings.

The noble-gas fullerene compounds have no chemical bond between the gas atom and the carbon atoms, yet they
are also extremely stable, since the gas atom simply cannot escape from the fullerene cage. In this light, the
recently introduced ³He NMR spectroscopy of endohedral He@C60 is bound to become a major experimental tool
to study the structure and reactions of fullerenes [225, 226].

C1.2.14 CONCLUDING REMARKS


Recently, considerable advances have been made in the production of a different class of nanoscopic carbon
structures, namely, carbon nanotubes, which stimulated fundamental research exploring the structure-property
relationship of these materials [227]. In their simplest form carbon nanotubes are composed of only a single
cylindrical graphene shell with a central hollow internal cavity. These structurally uniform cylinders are invariably
sealed at both ends by bent carbon caps, which contain both five- and six-membered rings similar to the
structures of fullerenes. Based on their similarity with highly graphitized carbonaceous materials, nanotubes have
low chemical reactivity. Therefore, the chemistry of carbon nanotubes is mainly focused on opening reactions at their
caps to enable filling of the hollow cavity with electron conducting material. In contrast, the tube walls are
practically nonreactive. These materials are promising candidates for future applications ranging from catalysis to
separation and storage technology to electronics and accordingly warrant appropriate attention.

C1.3 Van der Waals molecules

Jeremy M Hutson


C1.3.1 INTRODUCTION 

The attractive forces between pairs of atoms or molecules are almost always strong enough to support bound 
vibrational states. The resulting molecular complexes, or Van der Waals molecules, are very weakly bound, and are 
easily destroyed by collisions with other molecules. They exist in small but significant concentrations in gases and 
gas mixtures: for example, in Ar gas at 120 K and 1 bar, about 0.4% of the atoms are present as bound dimers [1]. 
The dimer concentrations decrease with increasing temperature, but are larger for systems with stronger attractive 
forces, such as most systems containing polar molecules. 


Van der Waals complexes can be observed spectroscopically by a variety of different techniques, including 
microwave, infrared and ultraviolet/visible spectroscopy. Their existence is perhaps the simplest and most direct 
demonstration that there are attractive forces between stable molecules. Indeed the spectroscopic properties of Van 
der Waals complexes provide one of the most detailed sources of information available on intermolecular forces, 
especially in the region around the potential minimum. The measured rotational constants of Van der Waals 
complexes provide information on intermolecular distances and orientations, and the frequencies of bending and 
stretching vibrations provide information on how easily the complex can be distorted from its equilibrium 
conformation. In favourable cases, the whole of the potential well can be mapped out from spectroscopic data. 

Studies of Van der Waals complexes have provided a wealth of information on the properties of weak
non-chemical bonds. They have allowed the determination of complete potential energy surfaces for small systems, and
have thrown considerable light on the nature of the hydrogen bond. Studies of larger clusters have begun to provide 
information of relevance to the liquid state, and to explain the behaviour of hydrogen-bonded networks. Studies of 
clusters containing reactive species are now starting to throw new light on the dynamics of chemical reactions. All 
these examples will be discussed in more detail below. 


C1.3.2 TYPES OF SPECTROSCOPY 

The spectroscopic signatures of Van der Waals complexes were first observed by Vodar and co-workers in the late 
1950s [2], as broad features in the 'missing Q-branch' regions of the spectra of hydrogen halides and their mixtures 
at high pressures. Rank et al [3] subsequently observed irregular but fairly sharp features between the monomer 
vibration-rotation lines at lower pressures. The lines were attributed to dimers because their intensity increased 
quadratically with the gas pressure. Many attempts were made to obtain resolved spectra that could be analysed 
reliably. However, the basic problem at the time was that, at the pressures needed to obtain substantial 
concentrations of dimers, there was so much pressure broadening that the monomer lines swamped the spectra of 
the dimers. As the pressure was reduced, the dimer lines became somewhat sharper and the accessible region 
between the monomer lines increased, but the dimer spectra soon became too weak to observe above the noise. 

Greater success was achieved for complexes formed from homonuclear diatoms such as H2 and N2, for which
monomer infrared transitions are forbidden by dipole selection rules. Complexes containing such molecules do
have an infrared spectrum, because the quadrupole moment of the homonuclear monomer creates an electric field,
which induces a dipole moment in the attached atom or molecule. In particular, rotationally resolved spectra were
obtained for several complexes formed from H2 and rare gases [4, 5]; these complexes have unusually large
rotational constants because of the presence of H2. Spectra were also obtained for analogous complexes containing
D2 and HD. They were used to obtain a succession of intermolecular potentials for H2-Ar, H2-Kr and H2-Xe, of
steadily increasing sophistication and accuracy [6, 7 and 8]. It was possible to determine not only the radial
dependence of the potential, but also its dependence on intermolecular angle (anisotropy) and on the monomer
bond length r.

A major breakthrough in the spectroscopy of Van der Waals complexes came in 1972, when Dyke, Howard and
Klemperer [9] succeeded in measuring the microwave spectrum of (HF)2 in a molecular beam. Such experiments
are described in detail in the chapter on Jet Spectroscopy. In a typical molecular beam spectroscopy experiment,
gas at around 1 bar pressure is expanded into a vacuum through a nozzle of aperture around 50 µm. Under these
conditions, the gas molecules undergo many collisions during the expansion, and the collisions equalize the
velocities of the different molecules. This is often referred to as a supersonic expansion. At the end of the
expansion, nearly all of the random thermal energy of the gas molecules has been converted into ordered
translational motion of the beam; all the molecules have almost the same velocity, and the relative velocities are
very low. The beam itself, beyond the expansion region, is a nearly collision-free environment. The low relative
velocities correspond to very low effective translational temperatures: temperatures of 1-10 K are common in
molecular beam spectroscopy experiments. The populations of rotational and vibrational levels do not relax as fast
as translation; rotational and vibrational distributions are sometimes characterized by higher temperatures, or may
not follow a Boltzmann distribution at all. In most cases, the rotational distribution is nearly as cold as the
translational distribution, while the vibrational temperature is somewhere between the translational temperature and
the source temperature. In any case, the effective temperature of the beam is low enough for large concentrations of
complexes to be formed, and the complexes are mostly in their ground vibrational state.

C1.3.2.1 MICROWAVE SPECTROSCOPY

Klemperer's early microwave experiments on Van der Waals complexes used molecular beam electric resonance
(MBER) spectroscopy, which relies on the fact that a beam of molecules with permanent dipole moments can be 
focused by an inhomogeneous electric field. Because of the Stark effect, the energy of a dipolar molecule in an 
electric field depends on the field strength. A MBER spectrometer usually uses a quadrupolar field; the beam is 
passed down the centre of a set of four parallel rods with alternating positive and negative voltages. This 
arrangement gives no electric field along the central axis, but the field increases linearly away from the axis. For a 
molecule in a state with a positive quadratic Stark effect, the energy thus increases quadratically with displacement 
from the centre; this creates a linear restoring force, so that a beam of such molecules is focused by the rods. 
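
The focusing argument above can be written out in a few steps. This is only a sketch under the idealizations stated
in the text (field strength linear in the distance r from the axis, purely quadratic Stark effect); the symbols g
(field gradient), W_0 (zero-field energy) and a (second-order Stark coefficient, positive for a focused state) are
notation introduced here for illustration.

```latex
\[
  E(r) = g\,r, \qquad
  W(r) = W_0 + a\,E(r)^2 = W_0 + a\,g^2 r^2, \qquad
  F(r) = -\frac{\mathrm{d}W}{\mathrm{d}r} = -2\,a\,g^2\,r .
\]
```

For a > 0 the force points back towards the axis and is proportional to the displacement, as for a harmonic
oscillator, so molecules in such states oscillate about the axis and can be brought to a focus. For a < 0 the force
points away from the axis and the state is defocused.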

A MBER spectrometer is shown schematically in figure C1.3.1. The technique relies on using two inhomogeneous
electric fields, the A and B fields, to focus the beam. Since the Stark effect is different for different rotational
states, the A and B fields can be set up so that a particular rotational state (with a positive Stark effect) is focused
onto the detector. In MBER spectroscopy, the molecular beam is irradiated with microwave or radiofrequency
radiation in the central region (C field); any molecule which undergoes a rotational transition to a state with a
negative Stark effect will not be focused by the B field, and will not reach the detector. Thus a transition is
detected by monitoring the molecular beam flux as a function of the irradiation frequency. A mass spectrometer is
usually used as the beam detector, allowing straightforward discrimination between monomers and Van der Waals
complexes.



Figure C1.3.1. Schematic diagram of a molecular beam electric resonance spectrometer. (Taken from [60].)

An enormous range of complexes has been observed by MBER spectroscopy. Almost any pair of volatile 
compounds can be expanded through a nozzle and made to form complexes. In spectroscopic experiments, it is 
common to use around 1% of the sample gases in a buffer gas such as Ar, in order to minimize the formation of 
trimers and larger complexes. The restrictions on the technique are that the constituent molecules must be volatile 
and that the complex must have a dipole moment. In addition, for very large complexes, even the temperature of a 
molecular beam is high enough that a large number of rotational states are populated. The sensitivity of MBER 
relies on being able to deplete the beam intensity significantly by removing the molecules in a single rotational 
state. Thus, if the rotational partition function is very high, MBER may not be sensitive enough for spectra to be 
detected. 


A few Van der Waals complexes have been observed using the analogous technique of molecular beam magnetic 
resonance, in which the molecules are focused using a magnetic rather than an electric field. 


An alternative approach to obtaining microwave spectra is Fourier transform microwave (FTMW)
spectroscopy in a molecular beam [10]. This may be considered as the microwave analogue of Fourier transform 
NMR spectroscopy. The molecular beam passes into a Fabry-Perot cavity, where it is subjected to a short 
microwave pulse (of a few milliseconds duration). This creates a macroscopic polarization of the molecules. After 
the microwave pulse, the time-domain signal due to coherent emission by the polarized molecules is detected and 
Fourier transformed to obtain the microwave spectrum. 
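
The last step, turning a time-domain emission signal into a spectrum, can be sketched as follows. This is a toy
model, not a description of any real instrument: the signal is a sum of exponentially decaying cosines, and the two
line positions, the decay time and the sampling rate are hypothetical values chosen so that the lines fall exactly
on points of the frequency grid. A naive discrete Fourier transform is used so that only the standard library is
needed.

```python
import cmath
import math


def simulate_decay_signal(frequencies_hz, decay_time_s, sample_rate_hz, n_points):
    """Sum of exponentially decaying cosines, standing in for the coherent emission."""
    signal = []
    for n in range(n_points):
        t = n / sample_rate_hz
        envelope = math.exp(-t / decay_time_s)
        signal.append(envelope * sum(math.cos(2 * math.pi * f * t) for f in frequencies_hz))
    return signal


def magnitude_spectrum(signal, sample_rate_hz):
    """Naive discrete Fourier transform; (frequency, magnitude) pairs below the Nyquist limit."""
    n_points = len(signal)
    spectrum = []
    for k in range(n_points // 2):
        total = sum(signal[n] * cmath.exp(-2j * math.pi * k * n / n_points) for n in range(n_points))
        spectrum.append((k * sample_rate_hz / n_points, abs(total)))
    return spectrum


def strongest_peaks(spectrum, count):
    """Frequencies of the `count` largest local maxima, in ascending order."""
    peaks = [
        spectrum[k]
        for k in range(1, len(spectrum) - 1)
        if spectrum[k][1] > spectrum[k - 1][1] and spectrum[k][1] > spectrum[k + 1][1]
    ]
    peaks.sort(key=lambda pair: pair[1], reverse=True)
    return sorted(freq for freq, _ in peaks[:count])


# Hypothetical parameters: two lines at 250 kHz and 500 kHz offset, 50 microsecond decay.
SAMPLE_RATE_HZ = 2.0e6
signal = simulate_decay_signal([250e3, 500e3], 50e-6, SAMPLE_RATE_HZ, 256)
spectrum = magnitude_spectrum(signal, SAMPLE_RATE_HZ)
print(strongest_peaks(spectrum, 2))
```

The decay time of the signal sets the width of each line in the transformed spectrum, so a longer-lived coherent
emission gives narrower lines.
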

Microwave studies in molecular beams are usually limited to studying the ground vibrational state of the complex. 
For complexes made up of two molecules (as opposed to atoms), the intermolecular vibrations are usually of 
relatively low amplitude (though there are some notable exceptions to this, such as the ammonia dimer). Under 
these circumstances, the methods of classical microwave spectroscopy can be used to determine the structure of the 
complex. The principal quantities obtained from a microwave spectrum are the rotational constants of the complex, 
which are conventionally designated A, B and C in decreasing order of magnitude: there is one rotational constant B
for a linear complex, two constants (A and B or B and C) for a complex that is a symmetric top and three constants
(A, B and C) for an asymmetric top. The rotational constants are related to the moments of inertia of the complex.
For a rigid complex, the rotational constants are simply


\[
  A = \frac{\hbar^2}{2I_A}, \qquad
  B = \frac{\hbar^2}{2I_B}, \qquad
  C = \frac{\hbar^2}{2I_C}
  \qquad \text{(C1.3.1)}
\]


where I_A, I_B and I_C are the moments of inertia of the complex about its three principal axes. If the structures of
the free monomers are assumed to be unchanged in the complex, then the number of coordinates required to define a
'structure' varies from one (for an atom-atom complex) to six (for a complex formed from two nonlinear
molecules) as shown in figure C1.3.2. The rotational constants obtained for a single isotopic species are thus not
usually enough to determine the structure of the complex, and it is usual to measure spectra for several isotopically
substituted species.
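
For the simplest case, an atom-atom complex with a single structural coordinate, the link between a rotational
constant and the structure can be made concrete with a short sketch. It assumes a rigid pair of point masses and
expresses the rotational constant as a wavenumber, B = h/(8π²cI), which is the energy ħ²/2I divided by hc. The masses
(40 u each) and the separation (3.8 Å) are illustrative values, not measured data.

```python
import math

H = 6.62607015e-34       # Planck constant, J s
C_CM = 2.99792458e10     # speed of light, cm/s
AMU = 1.66053906660e-27  # atomic mass unit, kg


def rotational_constant_cm(moment_of_inertia_kg_m2):
    """Rigid-rotor constant B = h / (8 pi^2 c I), returned in cm^-1."""
    return H / (8 * math.pi ** 2 * C_CM * moment_of_inertia_kg_m2)


def pseudo_diatomic_inertia(mass_a_amu, mass_b_amu, separation_angstrom):
    """Moment of inertia mu R^2 of two point masses a distance R apart."""
    reduced_mass = mass_a_amu * mass_b_amu / (mass_a_amu + mass_b_amu) * AMU
    return reduced_mass * (separation_angstrom * 1e-10) ** 2


def separation_from_rotational_constant(mass_a_amu, mass_b_amu, b_cm):
    """Invert B to the single structural coordinate R (in angstrom)."""
    inertia = H / (8 * math.pi ** 2 * C_CM * b_cm)
    reduced_mass = mass_a_amu * mass_b_amu / (mass_a_amu + mass_b_amu) * AMU
    return math.sqrt(inertia / reduced_mass) * 1e10


# Illustrative (hypothetical) atom-atom complex: two 40 u atoms, 3.8 angstrom apart.
b_value = rotational_constant_cm(pseudo_diatomic_inertia(40.0, 40.0, 3.8))
print(f"B = {b_value:.4f} cm^-1")
print(f"R recovered from B = {separation_from_rotational_constant(40.0, 40.0, b_value):.3f} angstrom")
```

With more than one unknown coordinate a single B value no longer suffices, which is why spectra of several
isotopically substituted species are measured: each substitution changes the masses, and hence the moments of
inertia, while the structure is assumed to stay the same.
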

Figure C1.3.2. Coordinate systems used for intermolecular potential energy surfaces. (Taken from [60].)


The use of isotopic substitution to determine structures relies on the assumption that different isotopomers have the 
same structure. This is not nearly as reliable for Van der Waals complexes as for chemically bound molecules. In 
particular, substituting D for H in a hydride complex can often change the amplitudes of bending vibrations 
substantially; under such circumstances, the idea that the complex has a single 'structure' is no longer appropriate 
and it is necessary to think instead of motion on the complete potential energy surface; a well defined equilibrium 
structure may still exist, but knowledge of it does not constitute an adequate description of the complex. 


There are other important properties that can be measured from microwave and radiofrequency spectra of 
complexes. In particular, the dipole moments and nuclear quadrupole coupling constants of complexes may contain 
useful information on the structure or potential energy surface. This is most easily seen in the case of the dipole 
moment. The dipole moment of the complex is a vector, which may have components along all the principal 
inertial axes. 
Measurements of Stark splittings in microwave and radiofrequency spectra allow these components to be
determined. The main contribution to the dipole moment of the complex arises from the permanent dipole moment 
vectors of the monomers, which project along the axes of the complex according to simple trigonometry (cosines). 
Thus, measurements of the dipole moment convey information about the orientation of the monomers in the 
complex. It is of course necessary to take account of effects due to induced dipole moments and to consider 
whether the effects of vibrational averaging are important. 

The argument for nuclear quadrupole coupling constants is similar, except that they are second-rank tensors rather 
than vectors and so different angular functions (involving second Legendre polynomials, or squares of cosines) are 
involved in the projections. Nuclear quadrupole coupling constants have the advantage that induced effects are 
usually very small, so that they contain uncontaminated angular information. 
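
The two projection rules can be compared in a short sketch. It assumes a single rigid angle θ between the monomer
axis and the relevant axis of the complex and neglects induced moments and vibrational averaging, both of which the
text warns about; the monomer dipole moment of 1.8 D and the angle of 40° are hypothetical values used only for
illustration.

```python
import math


def dipole_projection(monomer_dipole, theta_deg):
    """First-rank (vector) projection: mu * cos(theta)."""
    return monomer_dipole * math.cos(math.radians(theta_deg))


def legendre_p2(cos_theta):
    """Second Legendre polynomial P2(x) = (3 x^2 - 1) / 2."""
    return 0.5 * (3.0 * cos_theta ** 2 - 1.0)


def quadrupole_projection(monomer_coupling, theta_deg):
    """Second-rank (tensor) projection: coupling constant * P2(cos(theta))."""
    return monomer_coupling * legendre_p2(math.cos(math.radians(theta_deg)))


def angle_from_dipole(complex_dipole, monomer_dipole):
    """Invert the cosine rule to estimate the monomer orientation in degrees."""
    return math.degrees(math.acos(complex_dipole / monomer_dipole))


# Hypothetical monomer dipole (debye) and orientation angle (degrees).
MONOMER_DIPOLE_D = 1.8
THETA_DEG = 40.0

projected = dipole_projection(MONOMER_DIPOLE_D, THETA_DEG)
print(f"dipole component along the axis: {projected:.3f} D")
print(f"angle recovered from that component: {angle_from_dipole(projected, MONOMER_DIPOLE_D):.1f} degrees")
print(f"P2 scaling factor for a coupling constant: {legendre_p2(math.cos(math.radians(THETA_DEG))):.3f}")
```

Because the tensor projection depends on the square of the cosine, it cannot distinguish θ from 180° − θ, whereas the
dipole projection changes sign between the two; the two measurements are therefore complementary.
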

When a complex rotates, it stretches slightly and may also undergo other small structural changes. These changes 
are reflected in centrifugal distortion constants, which can be extracted from microwave (or other) spectra. The 
centrifugal distortion constants contain useful information on intermolecular forces, because they measure how 
easily the complex is distorted; in essence, they reflect the force constants for the bending and stretching vibrations. 

It is also possible to measure microwave spectra of some more strongly bound Van der Waals complexes in a gas 
cell rather than a molecular beam. Indeed, the first microwave studies on molecular clusters were of this type, on 
carboxylic acid dimers [11]. The resolution that can be achieved is not as high as in a molecular beam, but bulk gas 
studies have the advantage that vibrational satellites, due to pure rotational transitions in complexes with 
intermolecular bending and stretching modes excited, can often be identified. The frequencies of the vibrational 
satellites contain information on how the vibrationally averaged structure changes in the excited states, while their 
intensities allow the vibrational frequencies to be estimated. 

C1.3.2.2 INFRARED SPECTROSCOPY

As described above, classical infrared spectroscopy using grating spectrometers and gas cells provided some 
valuable information in the early days of cluster spectroscopy, but is of limited scope. However, the advent of 
tunable infrared lasers in the 1980s opened up the field and made rotationally resolved infrared spectra accessible 
for a wide range of species. As for microwave spectroscopy, tunable infrared laser spectroscopy has been applied 
both in gas cells and in molecular beams. In a gas cell, the increased sensitivity of laser spectroscopy makes it 
possible to work at much lower pressures, so that strong monomer absorptions are less troublesome. 

The intermolecular bending and stretching vibrations of Van der Waals complexes typically have wavenumbers
between 20 cm⁻¹ and 200 cm⁻¹. At the temperatures used in gas cells, usually between 77 K and 300 K, many
complexes are in excited vibrational states. In addition, since most complexes (other than those of He and H2) have
rotational constants that are 0.1 cm⁻¹ or less, very long rotational progressions (up to J = 100) are often observed.
Spectra obtained in gas cells can thus be very congested and quite difficult to assign reliably. Nevertheless, the 
large number of excited states produces a very rich spectrum, and it is sometimes possible to characterize excited 
states that would not otherwise be accessible. 
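
The effect of temperature on the rotational congestion can be estimated with a small sketch. It assumes a rigid
linear rotor with level energies BJ(J + 1) and degeneracies 2J + 1 in thermal equilibrium, ignoring centrifugal
distortion and nuclear spin statistics; B = 0.1 cm⁻¹ is the upper end of the range quoted above, and 10 K is taken
as a representative molecular beam temperature.

```python
import math

K_B_CM = 0.6950348  # Boltzmann constant in cm^-1 per kelvin


def rotational_populations(b_cm, temperature_k, j_max=300):
    """Normalized Boltzmann populations of rigid linear-rotor levels J = 0..j_max."""
    weights = [
        (2 * j + 1) * math.exp(-b_cm * j * (j + 1) / (K_B_CM * temperature_k))
        for j in range(j_max + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def most_populated_j(populations):
    """Rotational quantum number carrying the largest population."""
    return max(range(len(populations)), key=populations.__getitem__)


def levels_above(populations, fraction_of_peak):
    """Number of levels whose population exceeds a given fraction of the peak value."""
    threshold = fraction_of_peak * max(populations)
    return sum(1 for p in populations if p > threshold)


B_CM = 0.1  # rotational constant in cm^-1, upper end of the range quoted in the text
for temperature in (300.0, 10.0):
    pops = rotational_populations(B_CM, temperature)
    print(
        f"T = {temperature:5.1f} K: most populated J = {most_populated_j(pops)}, "
        f"levels above 1% of the peak = {levels_above(pops, 0.01)}"
    )
```

In this model far more rotational levels carry appreciable population at gas cell temperatures than at beam
temperatures, which is consistent with the long progressions and congested spectra described above.
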

Infrared spectroscopy can also be carried out in molecular beams. The primary advantages of beam spectroscopy
are that it dispenses almost entirely with monomer absorptions that overlap regions of interest, and that the
complexes are internally cold, so that the spectrum is very much simplified. The latter advantage is illustrated in
figure C1.3.3, which compares infrared spectra of Ar-HF obtained in a gas cell and in a molecular beam, using a
tunable difference frequency laser as the light source in each case.

Figure C1.3.3. Comparison between infrared spectra for the Π bend combination band of Ar-HF obtained in the gas
phase and in a slit jet. (a) The gas-phase spectrum (Taken from [36]). (b) The slit jet spectrum (Taken from [61]).

The earliest molecular beam infrared experiments on Van der Waals complexes used photodissociation 
spectroscopy: a molecular beam is irradiated with a tunable infrared laser and the molecular beam intensity is 
measured as a function of laser frequency [12]. The original experiments did not use state selection, and monitored
the beam intensity using a bolometer, which measures the total energy that is deposited by the molecules that reach
it. Since complexes in vibrationally excited states usually undergo fast vibrational predissociation, and the recoil
velocities are such that the fragments are ejected from the beam and do not reach the detector, transitions involving
complexes are detected as a decrease in the energy flux reaching the bolometer. This allows complexes to be
distinguished from
monomers, which reach the detector undissociated, and produce a positive signal because the energy absorbed from 
the photon is deposited on the bolometer. Photodissociation experiments have proved to be a very rich source of 
information, not only on energy levels but also on predissociation dynamics, because the fragments that leave the 
beam can be detected and their internal states inferred. 

One problem with molecular beam techniques is that, although the concentration of complexes is high, the 
available path length is very low. For some time this precluded the observation of spectra in molecular beams by 
the conventional spectroscopic approach of monitoring the attenuation in the laser beam intensity caused by 
absorption. In order to overcome this, Nesbitt and co-workers [13] developed techniques using molecular beams 
expanded through a slit rather than a circular hole. This provides much longer path lengths, and makes it possible to 
carry out direct absorption experiments, monitoring the depletion in laser beam intensity (as in a normal 
spectrometer) rather than the molecular beam intensity. 

Most infrared spectroscopy of complexes is carried out in the mid-infrared, which is the region in which the 
monomers usually absorb infrared radiation. Van der Waals complexes can absorb mid-infrared radiation either 
with or without simultaneous excitation of intermolecular bending and stretching vibrations. The mid-infrared 
bands that contain the most information about intermolecular forces are combination bands, in which the 
intermolecular vibrations are excited. Such spectra map out the vibrational and rotational energy levels associated 
with monomers in excited vibrational states and, thus, provide information on interaction potentials involving 
excited monomers, which may be slightly different from those for ground-state molecules. 

It is thus of great interest to carry out experiments that excite the intermolecular bending and stretching vibrations 
directly, without exciting the monomers as well. These transitions lie deep in the far infrared, typically in the
20-200 cm⁻¹ region, and this has traditionally been a very difficult region for spectroscopy. Fourier transform
spectroscopy in gas cells can be applied to obtain rotationally resolved spectra of complexes in this region [14], and 
it is also possible to measure far-infrared spectra in molecular beams. Early work [15, 16] used laser Stark 
spectroscopy, in which the molecules were tuned into resonance with the laser by applying a Stark field. However, 
this was superseded by methods using tunable far-infrared lasers, based on nonlinear mixing of fixed-frequency 
molecular lasers with microwave radiation [17]; these make it possible to obtain zero-field far-infrared spectra of 
complexes, which are much easier to interpret than laser Stark spectra. 

Mid-infrared combination bands and far-infrared spectra of Van der Waals complexes map out the pattern of 
energy levels associated with intermolecular bending and stretching vibrations. The principal quantities that can be 
observed are vibrational frequencies and rotational constants, though once again subsidiary quantities such as 
centrifugal distortion constants, dipole moments and nuclear quadrupole coupling constants may sometimes be 
extracted. In addition, observation of line broadening due to predissociation can sometimes provide very direct 
measurements of binding energies (and hence of the depths of potential wells). 

For chemically bound molecules, it is usual to analyse the vibrational energy levels in terms of normal modes: a
non-linear (or linear) molecule with N atoms has 3N - 6 (or 3N - 5) vibrational degrees of freedom. There is a
fundamental frequency ν_i associated with each degree of freedom i. The vibrational energy levels may be labelled
with a quantum number v_i for each normal mode and, to a first approximation, the energy is given by the harmonic
expression

    E_v = \sum_i h \nu_i \left( v_i + \tfrac{1}{2} \right).    (C1.3.2)

For quantitative work, it is necessary to include anharmonic corrections and coupling between the normal modes,
but the general picture suffices to handle the lower vibrational levels of most near-rigid molecules.
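
As an illustration of the harmonic expression (C1.3.2), the short sketch below counts the vibrational degrees of freedom and sums the harmonic energy over normal modes, working in wavenumber units so that each term is ν_i (v_i + 1/2). The function names and the three fundamental wavenumbers in the example are hypothetical placeholders chosen for the arithmetic, not values taken from the text.

```python
def n_vibrational_modes(n_atoms, linear):
    """Number of vibrational degrees of freedom: 3N - 5 (linear) or 3N - 6 (non-linear)."""
    return 3 * n_atoms - (5 if linear else 6)


def harmonic_energy(fundamentals_cm, quanta):
    """Harmonic vibrational energy in cm^-1: sum over modes of nu_i * (v_i + 1/2)."""
    if len(fundamentals_cm) != len(quanta):
        raise ValueError("need exactly one quantum number per normal mode")
    return sum(nu * (v + 0.5) for nu, v in zip(fundamentals_cm, quanta))


if __name__ == "__main__":
    # Hypothetical fundamentals (cm^-1) for a non-linear triatomic molecule.
    fundamentals = [1600.0, 3700.0, 3800.0]
    assert len(fundamentals) == n_vibrational_modes(3, linear=False)

    zero_point = harmonic_energy(fundamentals, [0, 0, 0])
    one_quantum = harmonic_energy(fundamentals, [1, 0, 0])
    print("zero-point energy:", zero_point, "cm^-1")
    print("excitation of mode 1:", one_quantum - zero_point, "cm^-1")
```

In this approximation the excitation energy of a single quantum in mode i is simply ν_i, independent of the other quantum numbers; anharmonic corrections and mode coupling would change that.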


Unfortunately, the normal-mode picture is quite inadequate for many Van der Waals complexes, even in their
lowest few levels. This may be illustrated by considering the bending levels of Ar-HCl, which are shown in figure
C1.3.4. The equilibrium geometry of Ar-HCl is linear, so it has one doubly degenerate bending mode. In a normal-
mode picture, the vibrational levels would be labelled by a bending quantum number v and a vibrational angular
momentum l (or K). The normal-mode energy level pattern is shown on the right-hand side of figure C1.3.4. The
ground state has v^K = 0^0, while the first excited state has v^K = 1^{±1}. The v = 2 excited states lie at about twice
the energy of the 1^{±1} states and are split into components with v^K = 2^0 and 2^{±2}. The actual levels of the complex
clearly have a quite different pattern; in particular, the lowest excited state with K = 0 lies below the lowest K = ±1
level, not above it. The pattern observed in the complex is in fact much closer to that expected for an HCl molecule
undergoing nearly free internal rotation in the complex, shown in the centre of figure C1.3.4. In the free-rotor
limit, the HCl energy levels are simply bj(j + 1), where b is the rotational constant of HCl and j is the rotational
quantum number for HCl internal rotation (not overall rotation of the complex). In the presence of potential
anisotropy, j becomes quantized along the intermolecular axis with a projection quantum number K, which can take
integer values up to ±j. In a complex such as Ar-HCl, the splittings between the different K levels for a given value
of j are substantial, but the general picture of hindered internal rotation is considerably closer to the truth than the
normal-mode picture of low-amplitude vibrations about an equilibrium geometry. A full analysis of the energy
level pattern can provide very detailed information on the anisotropic potential energy surface [18].

Figure C1.3.4. The real pattern of intermolecular bending energy levels for Ar-HCl (left) compared with the
pattern expected for a free internal rotor (centre) and a near-rigid bender (right). The allowed transitions are shown
in each case. (Taken from [19].)
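
The free-rotor limit described above is easy to tabulate. The sketch below lists the levels bj(j + 1) together with the projection quantum numbers K = -j, ..., +j that become distinguishable once the potential anisotropy is switched on. The value b ≈ 10.44 cm⁻¹ is an assumed, approximate ground-state rotational constant for HCl and is not given in the text; the function name is illustrative. The K splittings themselves depend on the anisotropic potential and are not modelled here.

```python
B_HCL_CM = 10.44  # assumed approximate rotational constant of HCl in cm^-1 (not from the text)


def free_rotor_levels(b_cm, j_max):
    """Return (j, energy in cm^-1, list of K values) for a free internal rotor.

    In the free-rotor limit all K components of a given j are degenerate at
    b j (j + 1); anisotropy in the potential lifts this degeneracy.
    """
    levels = []
    for j in range(j_max + 1):
        energy = b_cm * j * (j + 1)
        k_values = list(range(-j, j + 1))
        levels.append((j, energy, k_values))
    return levels


if __name__ == "__main__":
    for j, energy, k_values in free_rotor_levels(B_HCL_CM, 2):
        print(f"j = {j}: E = {energy:6.2f} cm^-1, K = {k_values}")
```

In this limit the j = 1 manifold contains both K = 0 and K = ±1 components at the same energy, so which of them lies lower in the real complex is decided entirely by the anisotropy, in contrast to the fixed ordering of the normal-mode picture.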


The quantum numbers that are appropriate to describe the vibrational levels of a quasilinear complex such as
Ar-HCl are thus the monomer vibrational quantum number v, an intermolecular stretching quantum number n and two
quantum numbers j and K to describe the hindered rotational motion. For more rigid complexes, it becomes
appropriate to replace j and K with normal-mode vibrational quantum numbers, though there is an awkward
intermediate regime in which neither description is satisfactory: see [3] for a discussion of the transition between
the two cases. In addition, there is always a quantum number J for the total angular momentum (excluding nuclear 
spin). The total parity (symmetry under space-fixed inversion of all coordinates) is also a conserved quantity that is 
spectroscopically important. 


C1.3.2.3 PHOTODISSOCIATION AND PREDISSOCIATION

The binding energy of a Van der Waals complex is usually considerably less than the energy of a mid-infrared
photon. Accordingly, Van der Waals complexes containing vibrationally excited monomers usually have enough 
energy to dissociate, by converting the vibrational energy into relative translational energy of the fragments. This 
process is referred to as vibrational predissociation. 

For complexes such as Ar-H2, Ar-HF and Ar-HCl, vibrational predissociation is a very slow process and does not
cause appreciable broadening of the lines in the infrared spectrum. Indeed, for Ar-HF, Huang et al [20] showed
that vibrationally excited molecules survive for at least 0.3 ms, and reach a bolometric beam detector undissociated.
However, for heavier monomers and more strongly bound complexes, vibrational predissociation can be much
faster and lead to observable line broadening in the spectrum. The predissociation lifetime τ and linewidth γ (full
width at half maximum, in energy units) are related by

    \tau = \frac{\hbar}{\gamma}.    (C1.3.3)

Spectra of two different bands of (HF)2, showing the difference between predissociated and undissociated spectra
[21], are shown in figure C1.3.5.
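
To give a feeling for the magnitudes implied by equation (C1.3.3), the sketch below converts a lifetime into a full width at half maximum expressed in frequency units (γ/h = 1/(2πτ)) and in wavenumbers. The function names are illustrative; the example lifetime of 0.8 ns is the value quoted for the 'bound' HF stretch of HF dimer in figure C1.3.5.

```python
import math

C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def fwhm_hz(lifetime_s):
    """Lifetime-limited FWHM in Hz, from gamma = hbar / tau, i.e. gamma / h = 1 / (2 pi tau)."""
    return 1.0 / (2.0 * math.pi * lifetime_s)


def fwhm_wavenumber(lifetime_s):
    """Lifetime-limited FWHM in cm^-1."""
    return fwhm_hz(lifetime_s) / C_CM_PER_S


def lifetime_from_fwhm_hz(width_hz):
    """Invert the relation: lifetime in seconds from an FWHM in Hz."""
    return 1.0 / (2.0 * math.pi * width_hz)


if __name__ == "__main__":
    tau = 0.8e-9  # s
    print(f"FWHM = {fwhm_hz(tau) / 1e6:.0f} MHz = {fwhm_wavenumber(tau):.4f} cm^-1")
```

A lifetime of 0.8 ns thus corresponds to a width of roughly 200 MHz, whereas a lifetime of 0.3 ms or longer gives a width well below 1 kHz, far too small to be seen as line broadening.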

The occurrence of predissociation opens up a new family of observable quantities. It is possible to measure not
only linewidths or lifetimes, but also the internal state distributions of the fragments. All these quantities are
sensitive to the intermolecular potential and can be used to test or refine proposed potential surfaces.

Figure C1.3.5. Spectra of two different infrared bands of HF dimer, corresponding to excitation of the
'bound' (lower panel) and 'free' (upper panel) HF monomers in the complex. Note the additional line width for the
'bound' HF, caused by vibrational predissociation with a lifetime of about 0.8 ns. (Taken from [21].)

C1.3.2.4 VISIBLE AND ULTRAVIOLET SPECTROSCOPY

Tunable visible and ultraviolet lasers were available well before tunable infrared and far-infrared lasers. There are 
many complexes that contain monomers with visible and near-UV spectra. The earliest experiments to give 
detailed dynamical information on complexes were in fact those of Smalley et al [22], who observed laser-induced 
fluorescence (LIF) spectra of He-I2 complexes. They excited the complex in the I2 B ← X band, and were able to
produce excited-state complexes containing B-state I2 in a wide range of vibrational states. From line widths and
dispersed fluorescence spectra, they were able to study the rates and pathways of dissociation. Such work was
subsequently extended to many other systems, including the rare gas-Cl2 systems, and has given quite detailed
information on potential energy surfaces [23]. 

The homonuclear rare gas pairs are of special interest as models for intermolecular forces, but they are quite 
difficult to study spectroscopically. They have no microwave or infrared spectrum. However, their vibration- 
rotation energy levels can be determined from their electronic absorption spectra, which lie in the vacuum 
ultraviolet (VUV) region of the spectrum. In the most recent work, Herman et al [24] have measured vibrational 
and rotational frequencies to great precision. In the case of Ar-Ar, the results have been incorporated into a 
multiproperty analysis by Aziz [25] to develop a highly accurate pair potential. 

C1.3.2.5 OTHER SPECTROSCOPIC METHODS

Far-infrared and mid-infrared spectroscopy usually provide the most detailed picture of the vibration-rotation
energy levels in the ground electronic state. However, they are not always possible and other spectroscopic 
methods are also important. 

It is often difficult to observe direct infrared transitions from the ground state to highly excited intermolecular 
bending and stretching states, because the spectroscopic intensities are very low. One technique that circumvents 
this difficulty is stimulated-emission pumping (SEP) spectroscopy. Molecules (or complexes) are first promoted to 
an excited electronic state by a pump laser, and then emission is stimulated by a second tunable laser (the dump 
laser) at slightly lower frequency. Meanwhile, the population of the excited state is monitored in some way 
(perhaps by observing the spontaneous emission signal). When the dump laser is resonant with a transition back 
down to an excited vibrational level of the ground electronic state, it causes an observable dip in the population of 
the excited state. SEP spectroscopy has been applied with great success to map out the energy levels of the open- 
shell complex Ar-OH in its ground electronic state [26], and the resulting spectra have been used to obtain 
potential energy surfaces for the interaction [27]. 


C1.3.3 EXAMPLES

C1.3.3.1 BINARY COMPLEXES: Ar-HCl AND Ar-HF

The Ar-HCl and Ar-HF Van der Waals complexes were among the first to be detected experimentally, by the 
observation of weak peaks lying between the vibration-rotation lines of HCl and HF in mixtures with rare gases as
described above. The first measurements were made at a time when very little was known about intermolecular
forces involving molecules and it was clear from the beginning that, if high-resolution spectra could be measured, 
they would contain an immense amount of valuable information. It was also clear that atom-molecule complexes 
would be much easier to treat theoretically than molecule-molecule complexes. Accordingly, when MBER 
spectroscopy (see above) was developed in the early 1970s, Ar-HCl [28] and Ar-HF [29] were among the first 
systems studied. 

The early MBER spectra were of two types: first, pure rotational (microwave) transitions and, secondly,
radiofrequency transitions between different hyperfine levels; the latter were observable only for Ar-HCl, because
both ³⁵Cl and ³⁷Cl have nuclear quadrupole moments while ¹⁹F has none. In addition, the rotational levels of both
complexes could be split by applying an electric field (Stark effect) and the dipole moment could be determined
from the size of the splittings. It turned out that Ar-HCl and Ar-HF were 'pseudo-linear' molecules: their
rotational energy levels could be interpreted entirely in terms of a single rotational constant B and centrifugal
distortion constant D_J. Nevertheless, the spectra contained signatures that showed that they were not rigid linear
species: the dipole moments of the complexes were only about two-thirds of those of the HCl and HF monomers,
and the nuclear quadrupole coupling constants of Ar-HCl were only about one-third of those of the corresponding
HCl isotopomers. Both these fractions would have been close to one for a rigid linear species.

The Ar-HCl and Ar-HF complexes became prototypes for the study of intermolecular forces. Holmgren et al [30]
produced an empirical potential energy surface for Ar-HCl fitted to the microwave and radiofrequency spectra,
showing that the equilibrium geometry is linear, Ar-HCl, with a well depth of around 175 cm⁻¹. However, it turned
out that this surface gave a poor account of properties that were sensitive to the shape of the repulsive wall, such as
the pressure broadening of HCl vibration-rotation lines in the gas phase [31]. This is a general problem with
intermolecular potentials determined solely from the spectra of complexes; the spectra do not contain enough 
information to determine the shape of the repulsive wall. Conversely, the potential energy surface determined from 
the pressure broadening [31] gave a very poor account of the spectra of the complexes [32]. 

Hutson and Howard [33] combined the Van der Waals spectra with pressure-broadening data and virial coefficients
to produce an improved surface. However, even then, they showed that the microwave spectra determined the
potential quite accurately between θ = 0° and 90°, around the absolute minimum, but could not determine whether
there was a secondary minimum around θ = 180°, the linear Ar-ClH structure. This again illustrates a general
limitation: microwave spectra do not usually sample geometries far from equilibrium. Hutson and Howard [34]
suggested that the best way to probe the Ar-ClH region would be to measure mid-infrared or far-infrared spectra,
exciting intermolecular bending bands, and gave predictions for such spectra for potential surfaces with and 
without secondary minima. 

Laser techniques capable of obtaining rotationally resolved infrared spectra of complexes came along fairly quickly 
[12]. For Ar-HCl, Marshall et al [15] and Ray et al [16] developed laser Stark methods for measuring far-infrared 
spectra in a molecular beam with fixed-frequency lasers. The laser Stark results were extended by Busarow et al 
[17], who used a tunable far-infrared laser for the first time. High-resolution mid-infrared spectra for Ar-HCl and 
Ar-HF were measured in both gas cells [35, 36] and slit jets [13, 37] using tunable difference-frequency lasers. 
Hutson [38] used the new spectra to develop an improved potential energy surface for Ar-HCl, establishing
definitively that there is a secondary minimum around the linear Ar-ClH geometry, about 30 cm⁻¹ shallower than
the primary minimum at the Ar-HCl geometry.

Over the next few years, both the mid-infrared and the far-infrared spectra for Ar-HF and Ar-HCl were extended
to numerous other bands and to other isotopic species (most importantly those containing deuterium). In 1992,
Hutson [18, 39] combined all the available spectroscopic data to produce definitive potential energy surfaces that
included both the angle dependence and the dependence on the HF/HCl monomer vibrational quantum number v
(actually through the mass-reduced vibrational quantum number [40], η = (v + 1/2)/√μ_HX). The resulting Ar-HF
potential (designated H6(4,3,2)) is shown in figure C1.3.6. The potentials have since been used to calculate
numerous properties, including pressure-broadening coefficients, inelastic scattering integral cross sections,
differential cross sections and the spectra of HF and HCl in rare gas clusters and matrices. They have proved to be
remarkably reliable. They have also been used extensively as testing grounds for new ab initio electronic structure
methods.

Figure C1.3.6. Contour plot of the H6(4,3,2) potential for Ar-HF, which was fitted principally to data from far-
infrared and mid-infrared spectroscopy of the Ar-HF Van der Waals complex. The contours are labelled in cm⁻¹.
(Taken from [39].)

C1.3.3.2 LARGER CLUSTERS: (H2O)n

The most important molecular interactions of all are those that take place in liquid water. For many years, chemists 
have worked to model liquid water, using molecular dynamics and Monte Carlo simulations. Until relatively 
recently, however, all such work was done using 'effective potentials' [41], designed to reproduce the condensed- 
phase properties but with no serious claim to represent the true interactions between a pair of water molecules. 

The advent of cluster spectroscopy offered the opportunity to place studies of liquid water and aqueous solutions on
a much firmer footing, by learning first about the water dimer and the true water-water pair potential and then
exploring how the interactions change for larger clusters. 


There has been considerable progress in this direction. The water dimer has been the subject of intense 
spectroscopic study, especially by far-infrared vibration-rotation-tunnelling spectroscopy (FIR-VRTS) [42]. Many 
different bands have been observed, involving intermolecular bending and stretching vibrations and tunnelling 
motions. The potential energy surface is six-dimensional (one distance and five angles) even when intramolecular 
vibrations of the water monomer are neglected. Because of this, developing a purely empirical potential surface 
from the spectroscopic observations is a difficult task. Nevertheless, Fellers et al [43] have used the spectra to fit 
the parameters of a functional form due to Millot and Stone [44]. 


Larger water clusters, including the trimer, tetramer, pentamer and hexamer, have also been studied 
spectroscopically. The equilibrium geometries of some of them are shown in figure C1.3.7. A characteristic feature
of the water clusters, and of many others, is that they have large numbers of equivalent minima on their potential 
energy surfaces, with relatively low barriers that allow tunnelling motions between the different minima. There are 
often low-lying subsidiary minima as well, which also affect the spectroscopy. Assigning and interpreting the 
spectra is a complex task involving permutation-inversion symmetry as well as sophisticated potential energy 
surfaces and many-dimensional bound-state calculations. The behaviour of the clusters cannot be explained in 
terms of small excursions from an equilibrium structure: it really becomes necessary to talk of 'spectroscopy 
beyond structure'.

Figure C1.3.7. Equilibrium geometries of some water clusters from ab initio calculations. (Taken from [62].)

The equilibrium geometry of the water trimer is chiral, with fast tunnelling between the enantiomers [45]. There are 
actually 48 right-handed and 48 left-handed forms, all with the same energy. The tetramer is cyclic, but non-chiral, 
and with many fewer equivalent minima [46]. The pentamer is also cyclic [47], but the hexamer has a cage 
structure [48]. Higher clusters have also been studied, though at lower resolution, by infrared spectroscopy on 
beams of mass-selected clusters [49]. The octamer is particularly interesting, because it is believed to have 
dynamical cubic symmetry. Many other spectroscopic techniques have been used as well. 

The ultimate reason for studying water clusters is of course to understand the interactions in bulk water (though 
clusters are interesting in their own right, too, because finite-size systems can have special properties). There has
been a vast amount of work on simulating the water clusters, both classically and quantum-mechanically. Quantum
simulations are
particularly challenging, because of the high dimensionality of the systems. Nevertheless, calculations on the 
lowest vibrational states of systems as large as water octamer have been carried out using quantum Monte Carlo 
methods. 

The intermolecular forces between water molecules are strongly non-additive. It is not realistic to expect any pair 
potential to reproduce the properties of both the water dimer and the larger clusters, let alone liquid water. There 
has therefore been a great deal of work on developing potential models with explicit pairwise-additive and non- 
additive parts [44, 50, 51]. It appears that, when this is done, the energy of the larger clusters and ice has a non-
additive contribution of about 30%. 

An important area that has yet to be fully explored is the effect of the flexibility of water molecules. The 
intermolecular forces in water are large enough to cause significant distortions from the gas-phase monomer 
geometry. In addition, the flexibility is crucial in any description of vibrational excitation in water. 

C1.3.3.3 REACTIVE SPECIES: H2-OH

One of the motivations for studying Van der Waals complexes and clusters is that they are floppy systems with 
similarities to the transition states of chemical reactions. This can be taken one stage further by studying clusters 
that actually are precursors for chemical reactions, and can be broken up to make more than one set of products. A 
good example of this is H2-OH, which can in principle dissociate to form either H2 + OH or H2O + H. Indeed,
dissociation to H2O + H is energetically favoured: the reaction H2 + OH → H2O + H is exothermic by about
5000 cm⁻¹, and plays a key role in combustion. It has been extensively studied both in the gas phase and in crossed
molecular beams. The only reason that the H2-OH complex can be observed at all is that there is a barrier to
reaction of more than 2000 cm⁻¹, and the transition state has a quite different geometry from the complex.

OH is an open-shell molecule with a ²Π ground state. In the complex, the ²Π state splits into two states of A′ and A″
symmetry. Ab initio calculations [52] gave a well depth of 188 cm⁻¹ (relative to free OH + H2) for the A″ state, in a
symmetric O-H-H2 geometry. The first excited state, by contrast, has a well at least 2300 cm⁻¹ deep. The first
spectroscopic observations of H2-OH [53], using laser-induced fluorescence, were carried out in parallel with
bound-state calculations [54]. The early experiments showed broad peaks because the levels of the excited
electronic state are very short-lived; they confirmed the existence of bound H2-OH complexes, but provided only
limited information on the interactions between ground-state molecules. Nevertheless, they opened the way to
much higher-resolution studies using infrared overtone pumping to excite the OH vibration [55] and stimulated
Raman spectroscopy to excite the H2 vibration in the complex [56].

The vibrationally excited states of H2-OH have enough energy to decay either to H2 and OH or to cross the barrier
to reaction. Time-dependent experiments have been carried out to monitor the non-reactive decay (to H2 + OH),
which occurs on a timescale of microseconds for H2-OH but nanoseconds for D2-OH [57, 58]. Analogous
experiments have also been carried out for complexes in which the H2 vibration is excited [59]. The reactive decay
products have not yet been detected, but it is probably only a matter of time. Even if it proves impossible for
H2-OH, there are plenty of other 'pre-reactive' complexes that can be produced. There is little doubt that the
spectroscopy of such species will be a rich source of information on reactive potential energy surfaces in the fairly
near future.

C1.4 Atom traps and studies of ultracold systems

John Weiner 


C1.4.1 INTRODUCTION 

Just over a decade ago the first successful experiments and theory demonstrating that light could be used to cool 
and confine atoms to sub-millikelvin temperatures [1, 2] opened several exciting new chapters in chemical physics
and in atomic, molecular and optical (AMO) physics. Atom optics and interferometry [5, 6], holography [7], optical 
lattices [8] and Bose-Einstein condensation in dilute gases all exemplify startling new physics where atoms cooled 
with light play a pivotal role. The nature of the interactions between these cold particles has become the subject of 
intensive study not only because of their importance to these new areas of AMO physics but also because their 
investigation has led to new insights into how association spectroscopy of the colliding species can lead to 
precision measurements of atomic and molecular parameters and how radiation fields can manipulate the outcome
of a collision itself. As a general orientation, figure C1.4.1 shows how a typical atomic de Broglie wavelength
varies with temperature and where various physical phenomena are situated along the scale. With de Broglie 
wavelengths of the order of a few thousandths of a nanometre, conventional gas-phase chemistry can usually be 
interpreted as the interaction of classical nuclear point particles moving along potential surfaces defined by their 
associated electronic charge distribution. At one time liquid helium was thought to define a regime of cryogenic 
physics, but it is clear from figure C1.4.1 that optical and evaporative cooling have created 'cryogenic'
environments below liquid helium by many orders of magnitude. At the level of Doppler cooling and optical 
molasses the de Broglie wavelength becomes comparable to or longer than the chemical bond, approaching the 
length of the cooling optical light wave. Here we can expect wave and relativistic effects such as resonances, 
interferences and interaction retardation to become important. Following Suominen [9], we will term the Doppler 
cooling and optical molasses temperature range, roughly between 1 mK and 1 μK, the regime of cold collisions.
Most collision phenomena at this level are studied in the presence of one or more light fields used to confine the
atoms and to probe their interactions. Excited quasimolecular states often play an important role. Below about 1 μK,
where evaporative cooling and Bose-Einstein condensation (BEC) become the focus of attention, the de Broglie
wavelength grows to a scale comparable to the mean distance separating atoms in a dilute gas; quantum degenerate 
states of the atomic ensemble begin to appear. In this regime ground-state collisions only take place through radial 
(not angular) motion and are characterized by a phase shift, or scattering length, of the ground-state wave function. 
Since the atomic translational energy now lies below the kinetic energy transferred to an atom by recoil from a 
scattered photon, cooling with light fields can be of no further use; and collisions occurring in a temperature range 
from 1 μK down to 0 K must be ground-state interactions.
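
The temperature scaling of the de Broglie wavelength discussed above can be checked with a few lines of arithmetic. The sketch below is only an estimate: it assumes λ = h/(mv) with v taken as the root-mean-square thermal speed, so that λ = h/√(3 m k_B T), and it uses sodium (mass about 22.99 u) as a representative atom. Both choices, and the function name, are assumptions made for illustration rather than details taken from the text.

```python
import math

H = 6.62607015e-34        # Planck constant, J s
K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg


def de_broglie_wavelength(mass_amu, temperature_k):
    """Estimate lambda = h / (m v_rms) in metres, with m v_rms = sqrt(3 m k_B T)."""
    mass = mass_amu * AMU
    momentum = math.sqrt(3.0 * mass * K_B * temperature_k)
    return H / momentum


if __name__ == "__main__":
    sodium_mass = 22.99  # assumed representative atom
    for temperature in (300.0, 1.0, 1e-3, 1e-6):
        wavelength_nm = de_broglie_wavelength(sodium_mass, temperature) * 1e9
        print(f"T = {temperature:8.1e} K  ->  lambda = {wavelength_nm:10.4g} nm")
```

Because λ scales as T^(-1/2), cooling from room temperature to 1 μK lengthens the wavelength by more than four orders of magnitude, from well below a bond length to several hundred nanometres, which is comparable to an optical wavelength.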


Figure C1.4.1. The situation of various physical phenomena along a scale of temperature plotted against de Broglie
wavelength. (Axes: temperature and de Broglie wavelength λ in nm, with p = mv = h/λ. Labelled regions: chemistry,
conventional atomic beams, liquid He, velocity-selected atomic beams, Doppler cooling, optical traps, optical
molasses, magnetic traps, evaporation, BEC.)


C1.4.2 THE PHYSICS OF NEUTRAL-ATOM TRAPS 

C1.4.2.1 ATOM TRAPS

(A) LIGHT FORCES AND DOPPLER-LIMIT COOLING 

It is well known that a light beam carries momentum and that the scattering of light by an object produces a force. 
This property of light was first demonstrated by Frisch [10] through the observation of a very small transverse
deflection (3 × 10⁻⁵ rad) in a sodium atomic beam exposed to light from a resonance lamp. With the invention of
the laser, it became easier to observe effects of this kind because the strength of the force is greatly enhanced by the 
use of intense and highly directional light fields, as demonstrated by Ashkin [11] with the manipulation of
transparent dielectric spheres suspended in water. Although the results of Frisch and Ashkin rekindled interest in
using light forces to control the motion of neutral atoms, the basic groundwork for the understanding of light forces
acting on atoms was not laid out before the end of the 1970s. Unambiguous experimental demonstration of atom
cooling and trapping was not accomplished before the mid-1980s. In this section we discuss some fundamental
aspects of light forces and schemes employed to cool and trap neutral atoms. 

The light force exerted on an atom can be of two types: a dissipative, spontaneous force and a conservative, dipole
force. The spontaneous force arises from the impulse experienced by an atom when it absorbs or emits a quantum
of photon momentum. When an atom scatters light, the resonant scattering cross section can be written as
$\sigma = \lambda_0^2/2\pi$, where $\lambda_0$ is the on-resonance wavelength. In the optical region of the
electromagnetic spectrum the wavelengths of light are of the order of several hundreds of nanometres, so resonant
scattering cross sections become quite large, $\sim 10^{-9}$ cm$^2$. Each photon absorbed transfers a quantum of
momentum $\hbar k$ to the atom in the direction of propagation ($\hbar$ is the Planck constant divided by $2\pi$, and
$k = 2\pi/\lambda$ is the magnitude of the wave vector associated with the optical field). The spontaneous emission
following the absorption occurs in random directions and, over many absorption-emission cycles, its recoil averages
to zero. As a result, the net spontaneous force acts on the atom in the direction of the light propagation, as shown
schematically in the diagram of figure C1.4.2. The saturated rate of photon scattering by spontaneous emission (set
by the reciprocal of the excited-state lifetime) fixes the upper limit to the force magnitude. This force is
sometimes called radiation pressure.

Figure C1.4.2. Spontaneous emission following absorption occurs in random directions, but absorption from a
light beam occurs along only one direction.
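
As a rough numerical illustration of the two statements above (a large resonant cross section, and a force capped by
the spontaneous emission rate), the following Python sketch evaluates $\sigma = \lambda_0^2/2\pi$ and the saturated
force, taken here to be $\hbar k \Gamma/2$ with $\Gamma$ the reciprocal of the excited-state lifetime. The sodium
parameters (wavelength 589 nm, natural width $2\pi \times 9.8$ MHz, mass 23 u) are assumed illustrative values that
do not come from this article, and the function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
AMU = 1.66053906660e-27     # atomic mass unit, kg


def resonant_cross_section(wavelength_m):
    """Resonant scattering cross section sigma = lambda^2 / (2 pi), in m^2."""
    return wavelength_m ** 2 / (2.0 * math.pi)


def max_spontaneous_force(wavelength_m, gamma):
    """Saturated spontaneous force hbar * k * Gamma / 2, in newtons."""
    k = 2.0 * math.pi / wavelength_m
    return HBAR * k * gamma / 2.0


# Assumed sodium parameters (illustrative only, not taken from the article).
NA_WAVELENGTH = 589e-9                 # m
NA_GAMMA = 2.0 * math.pi * 9.8e6       # s^-1
NA_MASS = 23.0 * AMU                   # kg

sigma_cm2 = resonant_cross_section(NA_WAVELENGTH) * 1e4   # m^2 -> cm^2
force = max_spontaneous_force(NA_WAVELENGTH, NA_GAMMA)
acceleration = force / NA_MASS

print(f"cross section        ~ {sigma_cm2:.2e} cm^2")
print(f"maximum force        ~ {force:.2e} N")
print(f"maximum acceleration ~ {acceleration:.2e} m/s^2")
```

With these assumed numbers the cross section comes out somewhat below $10^{-9}$ cm$^2$, and the acceleration is
several orders of magnitude larger than that of gravity, which is why even a weak resonant beam can stop an atom.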

The dipole force can be readily understood by considering the light as a classical wave. It is simply the
time-averaged force arising from the interaction of the transition dipole, induced by the oscillating electric field
of the light, with the gradient of the electric field amplitude. Focusing the light beam controls the magnitude of
this gradient, and detuning the optical frequency below or above the atomic transition controls the sign of the force
acting on the atom. Tuning the light below resonance attracts the atom to the centre of the light beam while tuning
above resonance repels it. The dipole force is a stimulated process in which no net exchange of energy between the
field and the atom takes place, but photons are absorbed from one mode and reappear by stimulated emission in
another. Momentum conservation requires that the change of photon propagation direction from initial to final
mode imparts a net recoil to the atom. Unlike the spontaneous force, there is in principle no upper limit to the
magnitude of the dipole force since it is a function only of the field gradient and detuning.

We can bring these qualitative remarks into focus by considering the amplitude, phase and frequency of a classical
field interacting with an atomic transition dipole in a two-level atom. A detailed development of the following
results is beyond the scope of the present article, but can be found elsewhere [12, 13]. The usual approach is
semiclassical and consists in treating the atom as a two-level quantum system and the radiation as a classical
electromagnetic field [14]. A full quantum approach can also be employed [15], but it will not be discussed here.
What follows immediately is sometimes called the Doppler cooling model. It turns out that atoms with hyperfine
structure in the ground state can be cooled below the Doppler limit predicted by this model; and, to explain this
unexpected sub-Doppler cooling, models involving interaction between a slowly moving atom and the polarization
gradient of a standing wave have been invoked. We will sketch briefly in the next section the physics of these
polarization gradient cooling mechanisms.

The basic expression for the interaction energy is

$$U = -\mu \cdot E \tag{C1.4.1a}$$

where $\mu$ is the transition dipole and $E$ is the electric field of the light. The force is then the negative of
the spatial gradient of the potential,

$$F = -\nabla_R U = \mu \nabla_R E \tag{C1.4.2}$$

where we have set $\nabla_R \mu$ equal to zero because there is no spatial variation of the dipole over the length
scale of the optical field. The optical-cycle average of the force is expressed as

$$\langle F \rangle = \langle \mu \nabla_R E \rangle = \mu \left[ (\nabla_R E_0)\, u - \left( E_0 \nabla_R (k_L R) \right) v \right] \tag{C1.4.3}$$

where $u$ and $v$ arise from the steady-state solutions of the optical Bloch equations,


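$$u = \frac{\Omega}{2} \, \frac{\Delta\omega_L}{(\Delta\omega_L)^2 + (\Gamma/2)^2 + \Omega^2/2} \tag{C1.4.4}$$

and

$$v = \frac{\Omega}{2} \, \frac{\Gamma/2}{(\Delta\omega_L)^2 + (\Gamma/2)^2 + \Omega^2/2}. \tag{C1.4.5}$$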

In equations (C1.4.4) and (C1.4.5) $\Delta\omega_L = \omega_L - \omega_0$ is the detuning of the optical field from
the atomic transition frequency $\omega_0$, $\Gamma$ is the natural width of the atomic transition and $\Omega$ is
termed the Rabi frequency and reflects the strength of the coupling between field and atom,

$$\Omega = -\frac{\mu E_0}{\hbar}. \tag{C1.4.6}$$


In writing equation (C1.4.3) we have made use of the fact that the time-average dipole has in-phase and
in-quadrature components,

$$\langle \mu \rangle = 2\mu \left( u \cos\omega_L t - v \sin\omega_L t \right) \tag{C1.4.7}$$

and the electric field of the light is given by the classical expression,

$$E = E_0 \cos(\omega_L t - k_L R). \tag{C1.4.8}$$


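The time-averaged force, equation (C1.4.3), consists of two terms: the first term is proportional to the gradient of
the electric field amplitude; the second term is proportional to the gradient of the phase. Substituting equation
(C1.4.4) and equation (C1.4.5) into equation (C1.4.3), we have for the two terms,

$$\langle F \rangle = \mu (\nabla_R E_0) \frac{\Omega}{2} \left[ \frac{\Delta\omega_L}{(\Delta\omega_L)^2 + (\Gamma/2)^2 + \Omega^2/2} \right] - \mu E_0 k_L \frac{\Omega}{2} \left[ \frac{\Gamma/2}{(\Delta\omega_L)^2 + (\Gamma/2)^2 + \Omega^2/2} \right]. \tag{C1.4.9}$$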


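The first term is the dipole force, sometimes called the trapping force, $F_T$, because it is a conservative force
and can be integrated to define a trapping potential for the atom:

$$F_T = \mu (\nabla_R E_0) \frac{\Omega}{2} \left[ \frac{\Delta\omega_L}{(\Delta\omega_L)^2 + (\Gamma/2)^2 + \Omega^2/2} \right] = -\frac{\hbar}{4} (\nabla_R \Omega^2) \left[ \frac{\Delta\omega_L}{(\Delta\omega_L)^2 + (\Gamma/2)^2 + \Omega^2/2} \right] \tag{C1.4.10}$$

and

$$U_T = -\int F_T \, dR = \frac{\hbar \Delta\omega_L}{2} \ln\left[ 1 + \frac{\Omega^2/2}{(\Delta\omega_L)^2 + (\Gamma/2)^2} \right]. \tag{C1.4.11}$$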
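
The relation between the trapping force (C1.4.10) and the potential (C1.4.11) can be checked numerically. The Python
sketch below works in units where $\hbar = \Gamma = 1$ and writes the force, using the Rabi frequency definition
(C1.4.6), as $-(\hbar/4)\nabla\Omega^2$ times the dispersive line shape. The Gaussian profile assumed for
$\Omega^2(x)$, the parameter values and the function names are illustrative assumptions, not part of the article.

```python
import math

HBAR = 1.0    # units with hbar = 1
GAMMA = 1.0   # the natural width sets the frequency unit


def rabi_sq(x, omega0=5.0, waist=1.0):
    """Assumed Gaussian profile of the squared Rabi frequency."""
    return omega0 ** 2 * math.exp(-2.0 * x ** 2 / waist ** 2)


def rabi_sq_gradient(x, omega0=5.0, waist=1.0):
    """Analytic derivative d(Omega^2)/dx of the Gaussian profile."""
    return -4.0 * x / waist ** 2 * rabi_sq(x, omega0, waist)


def dipole_potential(x, detuning):
    """U_T = (hbar*detuning/2) * ln[1 + (Omega^2/2) / (detuning^2 + (Gamma/2)^2)]."""
    s = 0.5 * rabi_sq(x) / (detuning ** 2 + (GAMMA / 2.0) ** 2)
    return 0.5 * HBAR * detuning * math.log(1.0 + s)


def dipole_force(x, detuning):
    """F_T = -(hbar/4) * grad(Omega^2) * detuning / (detuning^2 + (Gamma/2)^2 + Omega^2/2)."""
    denom = detuning ** 2 + (GAMMA / 2.0) ** 2 + 0.5 * rabi_sq(x)
    return -0.25 * HBAR * rabi_sq_gradient(x) * detuning / denom


def numerical_force(x, detuning, h=1e-6):
    """Central-difference estimate of -dU_T/dx."""
    upper = dipole_potential(x + h, detuning)
    lower = dipole_potential(x - h, detuning)
    return -(upper - lower) / (2.0 * h)


red = -3.0   # red detuning, in units of Gamma
x0 = 0.4     # position, in units of the beam waist
force_analytic = dipole_force(x0, red)
force_numeric = numerical_force(x0, red)
print(force_analytic, force_numeric)
```

For red detuning the potential is negative at the beam centre and the force at positive `x0` points back toward the
axis; changing the sign of the detuning reverses both, in line with the dispersive line shape.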

The second term is the spontaneous force, sometimes called the cooling force, $F_C$, because it is a dissipative
force and can be used to cool atoms,

$$F_C = -\mu E_0 k_L \frac{\Omega}{2} \left[ \frac{\Gamma/2}{(\Delta\omega_L)^2 + (\Gamma/2)^2 + \Omega^2/2} \right]. \tag{C1.4.12}$$


Note that in equation (C1.4.10), the line-shape function in square brackets is dispersive and changes sign as the
detuning $\Delta\omega_L$ changes sign from negative (red detuning) to positive (blue detuning). In equation
(C1.4.12), the line-shape function is absorptive, peaks at zero detuning and exhibits a Lorentzian profile. These two
equations can be recast to bring out more of their physical content. The dipole force can be expressed as

$$F_T = -\frac{1}{2\Omega^2} (\nabla_R \Omega^2) \, \hbar \Delta\omega_L \left[ \frac{s}{1+s} \right] \tag{C1.4.13a}$$

where $s$, the saturation parameter, is defined to be

$$s = \frac{\Omega^2/2}{(\Delta\omega_L)^2 + (\Gamma/2)^2}. \tag{C1.4.14}$$

In equation (C1.4.14) the saturation parameter essentially defines a criterion to compare the time required for
stimulated and spontaneous processes. If $s \ll 1$ then spontaneous coupling of the atom to the vacuum modes of the
field is fast compared to the stimulated Rabi coupling and the field is considered weak. If $s \gg 1$ then the Rabi
oscillation is fast compared to spontaneous emission and the field is said to be strong. Setting $s$ equal to unity
on resonance ($\Delta\omega_L = 0$) defines the saturation condition

$$\Omega_{sat} = \frac{1}{\sqrt{2}} \Gamma \tag{C1.4.15}$$

and, as can be seen from the line-shape factor in equation (C1.4.12), the resonance line width is power broadened
by a factor of $\sqrt{2}$. With the help of the definition of the Rabi frequency, equation (C1.4.6), and the light
beam intensity,

$$I = \tfrac{1}{2} \epsilon_0 c E_0^2 \tag{C1.4.16}$$

equation (C1.4.13a) can be written in terms of the gradient of the light intensity, the saturation parameter and the
detuning,

$$F_T = -\frac{1}{2I} (\nabla_R I) \, \hbar \Delta\omega_L \left[ \frac{s}{1+s} \right]. \tag{C1.4.17}$$


Note that negative $\Delta\omega_L$ (red detuning) produces a force attracting the atom to the intensity maximum
while positive $\Delta\omega_L$ (blue detuning) repels the atom away from the intensity maximum. The spontaneous
force or cooling force can also be written in terms of the saturation parameter and the spontaneous emission rate,

$$F_C = \frac{\hbar k_L \Gamma}{2} \left[ \frac{s}{1+s} \right] \tag{C1.4.18}$$


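which shows that this force is simply the rate of absorption and reemission of momentum quanta $\hbar k_L$ carried by
a photon in the light beam. Note that as $s$ increases beyond unity, $F_C$ approaches $\hbar k_L \Gamma/2$, the value
fixed by the maximum photon scattering rate. Furthermore, from the previous definitions of $I$, $\Omega$ and
$\Omega_{sat}$, we can write

$$\frac{I}{I_{sat}} = \frac{\Omega^2}{\Omega_{sat}^2} = \frac{2\Omega^2}{\Gamma^2} \tag{C1.4.19}$$

and

$$F_C = \frac{\hbar k_L \Gamma}{2} \left[ \frac{I/I_{sat}}{1 + I/I_{sat} + (2\Delta\omega_L/\Gamma)^2} \right]. \tag{C1.4.20}$$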
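
The two ways of writing the cooling force, through the saturation parameter $s$ (C1.4.18) or through the intensity
ratio $I/I_{sat}$ (C1.4.20), should agree for any Rabi frequency and detuning. The short Python check below is a
sketch in units where $\hbar k_L = \Gamma = 1$; it assumes the on-resonance saturation condition
$\Omega_{sat} = \Gamma/\sqrt{2}$, so that $I/I_{sat} = 2\Omega^2/\Gamma^2$, and its function names are illustrative.

```python
import math

GAMMA = 1.0     # natural width (frequency unit)
HBAR_K = 1.0    # photon momentum hbar * k_L (momentum unit)


def saturation_parameter(omega, detuning):
    """s = (Omega^2/2) / (detuning^2 + (Gamma/2)^2)."""
    return 0.5 * omega ** 2 / (detuning ** 2 + (GAMMA / 2.0) ** 2)


def cooling_force_from_s(omega, detuning):
    """F_C = (hbar*k_L*Gamma/2) * s / (1 + s)."""
    s = saturation_parameter(omega, detuning)
    return 0.5 * HBAR_K * GAMMA * s / (1.0 + s)


def cooling_force_from_intensity(intensity_ratio, detuning):
    """F_C = (hbar*k_L*Gamma/2) * (I/Isat) / (1 + I/Isat + (2*detuning/Gamma)^2)."""
    x = 2.0 * detuning / GAMMA
    return 0.5 * HBAR_K * GAMMA * intensity_ratio / (1.0 + intensity_ratio + x ** 2)


omega_sat = GAMMA / math.sqrt(2.0)   # gives s = 1 at zero detuning
for omega, detuning in [(0.3, -0.5), (1.0, 0.0), (4.0, 2.0)]:
    intensity_ratio = (omega / omega_sat) ** 2   # equals 2 * Omega^2 / Gamma^2
    f_s = cooling_force_from_s(omega, detuning)
    f_i = cooling_force_from_intensity(intensity_ratio, detuning)
    print(f"Omega={omega}, detuning={detuning}: {f_s:.6f} vs {f_i:.6f}")
```

At very large Rabi frequency both forms saturate just below $\hbar k_L \Gamma/2$, the upper limit of the
spontaneous force.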


Now if we consider the atom moving in the $+z$ direction with velocity $v_z$ and counterpropagating to the light wave
detuned from resonance by $\Delta\omega_L$, the net detuning will be

$$\Delta\omega = \Delta\omega_L + k_L v_z \tag{C1.4.21}$$


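where the term $k_L v_z$ is the Doppler shift. The force $F_-$ acting on the atom will be in the direction opposite
to its motion. In general,

$$F_\pm = \pm \frac{\hbar k_L \Gamma}{2} \left[ \frac{I/I_{sat}}{\left( 2(\Delta\omega_L \mp k_L v_z)/\Gamma \right)^2 + I/I_{sat} + 1} \right]. \tag{C1.4.22}$$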
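
Suppose we have two fields propagating in the $\pm z$ directions and we take the net force $F = F_+ + F_-$. If
$k v_z$ (writing $k$ for $k_L$) is small compared to $\Gamma$ and $\Delta\omega_L$, then we find

$$F \simeq 4\hbar k \left( \frac{I}{I_{sat}} \right) \frac{k v_z \, (2\Delta\omega_L/\Gamma)}{\left[ 1 + I/I_{sat} + (2\Delta\omega_L/\Gamma)^2 \right]^2}. \tag{C1.4.23}$$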


This expression shows that if the detuning $\Delta\omega_L$ is negative (i.e. red detuned from resonance), then the
cooling force will oppose the motion and be proportional to the atomic velocity. The one-dimensional motion of the
atom, subject to an opposing force proportional to its velocity, is described by a damped harmonic oscillator. The
Doppler damping or friction coefficient is the proportionality factor, which at low intensity ($I \ll I_{sat}$) is

$$\alpha_d = -4\hbar k^2 \left( \frac{I}{I_{sat}} \right) \frac{(2\Delta\omega_L/\Gamma)}{\left[ 1 + (2\Delta\omega_L/\Gamma)^2 \right]^2} \tag{C1.4.24}$$


and the characteristic time to damp the kinetic energy of the atom of mass $m$ to $1/e$ of its initial value is

$$\tau = \frac{m}{2\alpha_d}. \tag{C1.4.25}$$
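
The linearization leading to the friction coefficient can be tested directly. The Python sketch below sums the two
beam forces of equation (C1.4.22) in dimensionless units ($\hbar = k = \Gamma = 1$, so velocities are in units of
$\Gamma/k$) and compares the slope of the net force at small velocity with the low-intensity friction coefficient
(C1.4.24). The parameter values and function names are assumptions chosen only for illustration.

```python
def beam_force(v, direction, detuning, intensity_ratio):
    """Force from one beam in units of hbar*k*Gamma.

    direction is +1 for the beam travelling along +z and -1 for the beam
    travelling along -z; v and detuning are in units of Gamma/k and Gamma.
    """
    x = 2.0 * (detuning - direction * v)   # Doppler-shifted detuning times 2/Gamma
    return direction * 0.5 * intensity_ratio / (x ** 2 + intensity_ratio + 1.0)


def molasses_force(v, detuning, intensity_ratio):
    """Net force F = F_plus + F_minus from the two counterpropagating beams."""
    return (beam_force(v, +1, detuning, intensity_ratio)
            + beam_force(v, -1, detuning, intensity_ratio))


def doppler_friction(detuning, intensity_ratio):
    """Low-intensity friction coefficient, in units of hbar*k^2."""
    x = 2.0 * detuning
    return -4.0 * intensity_ratio * x / (1.0 + x ** 2) ** 2


detuning = -0.5          # red detuning of Gamma/2
intensity_ratio = 0.01   # I / I_sat, well below saturation
v_small = 1e-4           # small velocity, in units of Gamma/k

alpha_numeric = -molasses_force(v_small, detuning, intensity_ratio) / v_small
alpha_formula = doppler_friction(detuning, intensity_ratio)
print(alpha_numeric, alpha_formula)
```

The two values differ only by the small saturation correction $I/I_{sat}$ that the low-intensity formula drops from
the denominator; for blue detuning the sign of the slope reverses and the force accelerates rather than damps the
atom.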


However, the atom will not cool indefinitely. At some point the Doppler cooling rate will be balanced by the
heating rate coming from the momentum fluctuations of the atom absorbing and re-emitting photons. Setting these
two rates equal and associating the one-dimensional kinetic energy with $\tfrac{1}{2} k_B T$, we find

$$k_B T = \frac{\hbar\Gamma}{4} \, \frac{1 + (2\Delta\omega_L/\Gamma)^2}{2|\Delta\omega_L|/\Gamma}. \tag{C1.4.26}$$

This expression shows that $T$ is a function of the laser detuning, and the minimum temperature is obtained when
$\Delta\omega_L = -\Gamma/2$. At this detuning,

$$k_B T = \frac{\hbar\Gamma}{2} \tag{C1.4.27}$$

which is called the Doppler-cooling limit. This limit is typically, for alkali atoms, on the order of a few hundred
microkelvin. For example, the Doppler-cooling limit for Na is $T = 240$ microkelvin. In the early years of cooling
and trapping, prior to 1988, the Doppler limit was thought to be a real physical barrier. Then the experimental
measurements of Lett et al [16] showed that in fact Na atoms could be cooled well below the Doppler limit.
Although the physics of this sub-Doppler cooling in three dimensions is still not fully understood, the essential
role played by the hyperfine structure of the ground state has been worked out in one-dimensional models which we
describe here.
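
The detuning dependence of the Doppler temperature, equation (C1.4.26), and its minimum at
$\Delta\omega_L = -\Gamma/2$ can be explored with the following Python sketch. The sodium natural width used here
($2\pi \times 9.8$ MHz) is an assumed illustrative value rather than a number taken from this article, and the
function name is hypothetical.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J/K


def doppler_temperature(detuning, gamma):
    """Temperature from k_B*T = (hbar*Gamma/4) * (1 + x^2) / |x|, with x = 2*detuning/Gamma."""
    x = 2.0 * detuning / gamma
    return HBAR * gamma / (4.0 * K_B) * (1.0 + x ** 2) / abs(x)


NA_GAMMA = 2.0 * math.pi * 9.8e6   # assumed natural width of the Na cooling line, s^-1

# Scan red detunings from -0.1*Gamma to -3.0*Gamma in steps of 0.01*Gamma.
detunings = [-(0.1 + 0.01 * i) * NA_GAMMA for i in range(291)]
best = min(detunings, key=lambda d: doppler_temperature(d, NA_GAMMA))
t_min = doppler_temperature(best, NA_GAMMA)
t_limit = HBAR * NA_GAMMA / (2.0 * K_B)

print(f"optimum detuning / Gamma = {best / NA_GAMMA:.2f}")
print(f"minimum temperature      = {t_min * 1e6:.0f} microkelvin")
```

With the assumed linewidth the minimum falls in the low hundreds of microkelvin, close to the value quoted above
for Na.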

(B) SUB-DOPPLER COOLING

Two principal mechanisms which cool atoms to temperatures below the Doppler limit rely on spatial polarization 
gradients of the light field through which the atoms move [17]. These two mechanisms, however, invoke very 
different physics, and are distinguished by the spatial polarization dependence of the light field. Two parameters, 
the friction coefficient and the velocity capture range, determine the significance of these cooling processes. In this 
section we compare expressions for these quantities in the sub-Doppler regime to those found in the conventional 
Doppler cooling model of one-dimensional optical molasses. 

In the first case two counterpropagating light waves with orthogonal linear polarizations form a standing wave. This
arrangement is familiarly called the 'lin-perp-lin' configuration. Figure C1.4.3 shows what happens. We see from
the figure that if we take as a starting point a position where the light polarization is linear ($\epsilon_1$), it
evolves from linear to circular ($\sigma_-$) over a distance of $\lambda/8$. Then over the next $\lambda/8$ interval
the polarization again changes to linear but in the direction orthogonal to the first ($\epsilon_2$). Then from
$\lambda/4$ to $3\lambda/8$ the polarization again becomes circular but in the sense opposite ($\sigma_+$) to the
circular polarization at $\lambda/8$, and finally after a distance of $\lambda/2$ the polarization is again linear
but antiparallel to $\epsilon_1$. Over the same half-wavelength distance of the polarization period, atom-field
coupling produces a periodic energy (or light) shift in the hyperfine levels of the atomic ground state. To
illustrate the cooling mechanism we assume the simplest case, a $J_g = \tfrac{1}{2} \leftrightarrow J_e = \tfrac{3}{2}$
transition. As shown in figure C1.4.4 the atom moving through the region of $z$ around $\lambda/8$, where the
polarization is primarily $\sigma_-$, will have its population pumped mostly into $J_{gz} = -\tfrac{1}{2}$.
Furthermore the Clebsch-Gordan coefficients controlling the transition dipole coupling to $J_e = \tfrac{3}{2}$ impose
that the $J_{gz} = -\tfrac{1}{2}$ level couples to $\sigma_-$ light three times more strongly than does the
$J_{gz} = +\tfrac{1}{2}$ level. The difference in coupling strength leads to the light shift splitting between the
two ground states shown in figure C1.4.4. As the atom continues to move to $+z$, the relative coupling strengths are
reversed around $3\lambda/8$ where the polarization is essentially $\sigma_+$. Thus the relative energy levels of
the two hyperfine ground states oscillate 'out of phase' as the atom moves through the standing wave. The key idea
is that the optical pumping rate, always redistributing population to the lower-lying hyperfine level, lags the light
shifts experienced by the two atom ground-state components as the atom moves through the field. The result is a
'Sisyphus effect' where an atom cycles through a period in which it spends most of its time climbing a potential
hill, converting kinetic energy to potential energy, subsequently dissipating the accumulated potential energy into
the empty modes of the radiation field and simultaneously transferring population back to the lower lying of the two
ground-state levels. Figure C1.4.5 illustrates the optical pumping phase lag. In order for this cooling mechanism to
work, the optical pumping time, controlled by the light intensity, must be less than the light-shift time, controlled
essentially by the velocity of the atom. Since the atom is moving slowly, having been previously cooled by the
Doppler mechanism, the light field must be weak in order to slow the optical pumping rate so that it lags the
light-shift modulation rate. This physical picture combines the conservative optical dipole force, whose space
integral gives rise to the potential hills and valleys over which the atom moves, and the irreversible energy
dissipation of spontaneous emission required to achieve cooling. We can make the discussion more precise and obtain
simple expressions for the friction coefficient and velocity capture range by establishing some definitions. As in
the Doppler cooling model we define the friction coefficient, $\alpha_{lpl}$, to be the proportionality constant
between the force $F$ and the atomic velocity $v$.

Figure C1.4.3. Schematic diagram of the 'lin-perp-lin' configuration showing spatial dependence of the
polarization in the standing-wave field (after [17]).

Figure C1.4.4. Schematic diagram showing how the two $J_{gz} = \pm\tfrac{1}{2}$ levels of the ground state couple to
the spatially varying polarization of the 'lin-perp-lin' standing wave light field (after [17]).

Figure C1.4.5. Population modulation as the atom moves through the standing wave in the 'lin-perp-lin'
one-dimensional optical molasses. The population lags the light shift such that kinetic energy is converted to
potential energy then dissipated into the empty modes of the radiation field by spontaneous emission (after [17]).

$$F = -\alpha_{lpl} v. \tag{C1.4.28}$$

We assume that the light field is detuned to the red of the $J_g \rightarrow J_e$ atomic resonance frequency,

$$\Delta_L = \omega_L - \omega_0 \tag{C1.4.29}$$

and term the light shifts of the $J_{gz} = \pm\tfrac{1}{2}$ levels $\Delta_\pm$ respectively. At the position
$z = \lambda/8$, $\Delta_- = 3\Delta_+$ and at $z = 3\lambda/8$, $\Delta_+ = 3\Delta_-$. Since the applied field is
red detuned, all $\Delta_\pm$ have negative values. Now in order for the cooling mechanism to be effective the
optical pumping time $\tau_p$ should be comparable to the time required for the atom with velocity $v$ to travel from
the bottom to the top of a potential hill, i.e.

$$\tau_p \simeq \frac{\lambda/4}{v} \tag{C1.4.30}$$

or

$$\Gamma' \simeq k v \tag{C1.4.31}$$
where $\Gamma' = 1/\tau_p$ and $\lambda/4 \simeq 1/k$, with $k = 2\pi/\lambda$ the magnitude of the optical wave
vector. Now the amount of energy $W$ dissipated in one cycle of hill climbing and spontaneous emission is essentially
the average energy splitting of the two light-shifted ground states between, say, $z = \lambda/8$ and $3\lambda/8$,
or $W \simeq -\hbar\Delta$. Therefore the atom loses energy at the rate

$$\frac{dW}{dt} \simeq \Gamma' \hbar \Delta, \tag{C1.4.32}$$

which is negative because $\Delta < 0$. However, in general the time-dependent energy change of a system can always
be expressed as $dW/dt = F \cdot v$, so in this one-dimensional model and taking into account equation (C1.4.28) we
can write

$$\frac{dW}{dt} = -\alpha_{lpl} v^2 \simeq \Gamma' \hbar \Delta \tag{C1.4.33}$$

so that

$$\alpha_{lpl} \simeq -\frac{\Gamma' \hbar \Delta}{v^2} \simeq -\frac{\hbar k^2 \Delta}{\Gamma'}. \tag{C1.4.34}$$


Note that since $\Delta < 0$, $\alpha_{lpl}$ is a positive quantity. Note also that at far detunings
($|\Delta_L| \gg \Gamma$) equation (C1.4.11) shows that $\Delta \simeq \Omega^2/4\Delta_L$. It is also true that for
light shifts large compared to the natural linewidth ($|\Delta| \gg \Gamma$), $\Gamma/\Gamma' = \Delta_L^2/\Omega^2$,
so the sub-Doppler friction coefficient can also be written

$$\alpha_{lpl} = -\frac{k^2 \hbar \Delta_L}{4\Gamma}. \tag{C1.4.35}$$


Equation (C1.4.35) yields two remarkable predictions: first, that the sub-Doppler friction coefficient can be a big
number compared to $\alpha_d$ since at far detuning $|\Delta_L|/\Gamma$ is a big number; and second, that
$\alpha_{lpl}$ is independent of the applied field intensity. This last result contrasts sharply with the Doppler
friction coefficient, which is proportional to field intensity up to saturation (see equation (C1.4.24)). However,
even though $\alpha_{lpl}$ looks impressive, the range of atomic velocities over which it can operate is restricted
by the condition that $\Gamma' \simeq kv$. The ratio of the capture velocities for sub-Doppler versus Doppler cooling
is therefore only $v_{lpl}/v_d \simeq 4\Delta/\Delta_L$. Figure C1.4.6 illustrates graphically the comparison between
the Doppler and the 'lin-perp-lin' sub-Doppler cooling mechanism. The dramatic difference in capture range is evident
from the figure. Note also that the slopes of the curves give the friction coefficients for the two regimes and that,
within the narrow velocity capture range of its action, the slope of the sub-Doppler mechanism is markedly steeper.
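
To get a feeling for the orders of magnitude involved, the Python sketch below compares the far-detuned sub-Doppler
friction coefficient (C1.4.35) with the low-intensity Doppler coefficient (C1.4.24), and estimates the ratio of
capture velocities. It works in units where $\hbar k^2 = \Gamma = 1$. The estimate of the capture-velocity ratio as
$\Gamma'/\Gamma = \Omega^2/\Delta_L^2$, the relation $\Omega^2 = (I/I_{sat})\Gamma^2/2$, the chosen detuning and
intensity, and the function names are all assumptions made for this rough illustration.

```python
def doppler_friction(detuning, intensity_ratio):
    """Low-intensity Doppler friction coefficient, in units of hbar*k^2 (detuning in Gamma)."""
    x = 2.0 * detuning
    return -4.0 * intensity_ratio * x / (1.0 + x ** 2) ** 2


def sub_doppler_friction(detuning):
    """Far-detuned 'lin-perp-lin' friction coefficient -Delta_L/(4*Gamma), in units of hbar*k^2."""
    return -detuning / 4.0


def capture_velocity_ratio(detuning, intensity_ratio):
    """Rough estimate v_lpl / v_d ~ Gamma'/Gamma = Omega^2 / Delta_L^2 (assumed scaling)."""
    omega_sq = 0.5 * intensity_ratio   # Omega^2 in units of Gamma^2
    return omega_sq / detuning ** 2


detuning = -5.0          # Delta_L in units of Gamma (red detuning)
intensity_ratio = 0.1    # I / I_sat

friction_ratio = sub_doppler_friction(detuning) / doppler_friction(detuning, intensity_ratio)
velocity_ratio = capture_velocity_ratio(detuning, intensity_ratio)
print(f"alpha_lpl / alpha_d ~ {friction_ratio:.0f}")
print(f"v_lpl / v_d         ~ {velocity_ratio:.4f}")
```

Under these assumptions the sub-Doppler friction is larger by several orders of magnitude, while the velocity range
over which it acts is smaller by a comparable factor, which is the trade-off summarized in figure C1.4.6.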

Figure C1.4.6. Comparison of capture velocity for Doppler cooling and 'lin-perp-lin' sub-Doppler cooling. Notice
that the slope of the curves, proportional to the friction coefficient, is much steeper for the sub-Doppler mechanism
(after [17]).


The second mechanism operates with the two counterpropagating beams circularly polarized in opposite senses.
When the two counterpropagating beams have the same amplitude, the resulting polarization is always linear and
orthogonal to the propagation axis, but the tip of the polarization axis traces out a helix around the propagation
axis with a pitch of $\lambda$. Figure C1.4.7 illustrates this case. The physics of this sub-Doppler mechanism does
not rely on hill-climbing and spontaneous emission, but on an imbalance in the photon scattering rate from the two
counterpropagating light waves as the atom moves along the $z$ axis. This imbalance leads to a velocity-dependent
restoring force acting on the atom. The essential factor leading to the differential scattering rate is the creation
of population orientation along the $z$ axis among the sublevels of the atom ground state. Those sublevels with more
population scatter more photons. Now it is evident from a consideration of the energy level diagram and the
Clebsch-Gordan coefficients coupling ground and excited levels that
$J_g = \tfrac{1}{2} \leftrightarrow J_e = \tfrac{3}{2}$ transitions coupled by linearly polarized light cannot
produce a population orientation in the ground state. In fact the simplest system to exhibit this effect is
$J_g = 1 \leftrightarrow J_e = 2$, and a measure of the orientation is the magnitude of the $\langle J_z \rangle$
matrix element between the $J_{gz} = \pm 1$ sublevels. If the atom remained stationary at $z = 0$, interacting with
the light polarized along $y$, the light shifts $\Delta_0$, $\Delta_{\pm 1}$ of the three ground-state sublevels
would be

$$\Delta_{+1} = \Delta_{-1} = \tfrac{3}{4} \Delta_0 \tag{C1.4.36}$$

and the steady-state populations 4/17, 4/17 and 9/17 for the $J_{gz} = +1$, $-1$ and $0$ sublevels respectively.
Evidently, linearly polarized light will not produce a net steady-state orientation, $\langle J_z \rangle$. As the
atom begins to move along $z$ with velocity $v$, however, it sees a linear polarization precessing around its axis of
propagation with an angle $\varphi = -kz = -kvt$. This precession gives rise to a new term in the Hamiltonian,
$V = kvJ_z$. Furthermore, if we transform to a rotating coordinate frame, the eigenfunctions belonging to the
Hamiltonian of the moving atom in this new 'inertial' frame become linear combinations of the basis functions with
the atom at rest. Evaluation of the steady-state orientation operator, $J_z$, in the inertial frame is now nonzero,

$$\langle J_z \rangle = \frac{40}{17} \, \frac{\hbar k v}{\Delta_0} = \hbar \left[ \Pi_{+1} - \Pi_{-1} \right]. \tag{C1.4.37}$$

Notice that the orientation measure is only nonzero when the atom is moving. In equation (C1.4.37) we denote the
populations of the $|\pm 1\rangle$ sublevels as $\Pi_{\pm 1}$, and we interpret the nonzero matrix element as a
direct measure of the population difference between the $|\pm 1\rangle$ levels of the ground state. Note that since
$\Delta_0$ is a negative quantity (red detuning), equation (C1.4.37) tells us that the $\Pi_{-1}$ population is
greater than the $\Pi_{+1}$ population. Now, if the atom travelling in the $+z$ direction is subject to two light
waves, one with polarization $\sigma_-$ ($\sigma_+$) propagating in the $-z$ ($+z$) direction, the preponderance of
population in the $|-1\rangle$ level will result in a higher scattering rate from the wave travelling in the $-z$
direction. Therefore the atom will be subject to a net force opposing its motion and proportional to its velocity.
The differential scattering rate is $\frac{40}{17} \frac{kv}{\Delta_0} \Gamma'$ and, with an $\hbar k$ momentum
quantum transferred per scattering event, the net force is

$$F = \frac{40}{17} \, \frac{\hbar k^2 v \, \Gamma'}{\Delta_0}. \tag{C1.4.38}$$

The friction coefficient $\alpha_{cp}$ is evidently

$$\alpha_{cp} = -\frac{40}{17} \, \hbar k^2 \, \frac{\Gamma'}{\Delta_0} \tag{C1.4.39}$$

which is a positive quantity since $\Delta_0$ is negative from red detuning. Contrasting $\alpha_{cp}$ with
$\alpha_{lpl}$, we see that $\alpha_{cp}$ must be much smaller since the assumption has been all along that the light
shifts $\Delta$ were much greater than the line widths $\Gamma'$. It turns out, however [17], that the heating rate
from recoil fluctuations is also much smaller so that the ultimate temperatures reached from the two mechanisms are
comparable.

Figure C1.4.7. Spatial variation of the polarization from the field resulting from two counterpropagating, circularly
polarized fields with equal amplitude but polarized in opposite senses. Note that the polarization remains linear but
that the axis rotates in the x-y plane with a helical pitch along the z axis of length $\lambda$.

Although the Doppler cooling mechanism also depends on a scattering imbalance from oppositely travelling light 
waves, the imbalance in the scattering rate originates from a difference in the scattering probability per photon due 
to the Doppler shift induced by the moving atom. In the sub-Doppler mechanism the scattering probabilities from 
the two light waves are equal but the ground-state populations are not. The state with the greater population 
experiences the greater rate. 

(C) THE MAGNETO-OPTICAL TRAP (MOT) 

Basic notions. Reference [18] originally suggested that the spontaneous light force could be used to trap neutral
atoms. The basic concept exploited the internal degrees of freedom of the atom as a way of circumventing the optical
Earnshaw theorem (OET) proved in [19]. This theorem states that if a force is proportional to the light intensity,
its divergence must be null because the divergence of the Poynting vector, which expresses the directional flow of
intensity, must be null through a volume without sources or sinks of radiation. This null divergence rules out the
possibility of an inward restoring force everywhere on a closed surface. However, when the internal degrees of
freedom of the atom are considered, they can change the proportionality between the force and the Poynting vector
in a position-dependent way such that the OET does not apply. Spatial confinement is then possible with
spontaneous light forces produced by counterpropagating optical beams. Using these ideas to circumvent the OET,
Raab et al [20] demonstrated a trap configuration that is currently the most commonly employed. It uses a radial
magnetic field gradient produced by a quadrupole field and three pairs of circularly polarized, counterpropagating
optical beams, detuned to the red of the atomic transition and intersecting at right angles at the position where the
magnetic field is zero. The magneto-optical trap exploits the position-dependent Zeeman shifts of the electronic
levels when the atom moves in the radially increasing magnetic field. The use of circularly polarized light,
red-detuned by $\sim\Gamma$, results in a spatially dependent transition probability whose net effect is to produce a
restoring force that pushes the atom toward the origin.

To make clear how this trapping scheme works, consider a two-level atom with a $J = 0 \rightarrow J = 1$ transition
moving along the $z$ direction. We apply a magnetic field $B(z)$ increasing linearly with distance from the origin.
The Zeeman shifts of the electronic levels are position dependent, as shown in figure C1.4.8(a). We also apply
counterpropagating optical fields along the $\pm z$ directions carrying opposite circular polarizations and detuned
to the red of the atomic transition. It is clear from figure C1.4.8 that an atom moving along $+z$ will scatter
$\sigma_-$ photons at a faster rate than $\sigma_+$ photons because the Zeeman effect will shift the
$\Delta M_J = -1$ transition closer to the light frequency.


Figure C1.4.8. (a) An energy level diagram showing the shift of Zeeman levels as the atom moves away from the
z = 0 axis. The atom encounters a restoring force in either direction from counterpropagating light beams. (b) A
typical optical arrangement for implementation of a magneto-optical trap.
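
$$F_{+z} = -\frac{\hbar k}{2} \, \Gamma \, \frac{\Omega^2/2}{\left[ \Delta + k v_z + \frac{\mu_B}{\hbar} \frac{dB}{dz} z \right]^2 + (\Gamma/2)^2 + \Omega^2/2} \tag{C1.4.40}$$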


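Similarly, if the atom moves along $-z$ it will scatter $\sigma_+$ photons at a faster rate from the
$\Delta M_J = +1$ transition.

$$F_{-z} = +\frac{\hbar k}{2} \, \Gamma \, \frac{\Omega^2/2}{\left[ \Delta - k v_z - \frac{\mu_B}{\hbar} \frac{dB}{dz} z \right]^2 + (\Gamma/2)^2 + \Omega^2/2} \tag{C1.4.41}$$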

The atom will therefore experience a net restoring force pushing it back to the origin. If the light beams are red
detuned by $\sim\Gamma$, then the Doppler shift of the atomic motion will introduce a velocity-dependent term to the
restoring force such that, for small displacements and velocities, the total restoring force can be expressed as the
sum of a term linear in velocity and a term linear in displacement,

$$F_{MOT} = -\alpha \dot{z} - K z. \tag{C1.4.42}$$


Equation (C1.4.42) expresses the equation of motion of a damped harmonic oscillator with mass $m$,

$$\ddot{z} + \frac{\alpha}{m} \dot{z} + \frac{K}{m} z = 0. \tag{C1.4.43}$$

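The damping constant $\alpha$ and the spring constant $K$ can be written compactly in terms of the atomic and field
parameters as

$$\alpha = \hbar k \Gamma \, \frac{16 |\Delta'| (\Omega')^2 (k/\Gamma)}{\left[ 1 + 2(\Omega')^2 \right]^2 \left[ 1 + \frac{4(\Delta')^2}{1 + 2(\Omega')^2} \right]^2} \tag{C1.4.44}$$

and

$$K = \hbar k \Gamma \, \frac{16 |\Delta'| (\Omega')^2 (d\omega_0/dz)}{\left[ 1 + 2(\Omega')^2 \right]^2 \left[ 1 + \frac{4(\Delta')^2}{1 + 2(\Omega')^2} \right]^2} \tag{C1.4.45}$$

where $\Omega'$, $\Delta'$ and $d\omega_0/dz = (\mu_B/\hbar\Gamma) \, dB/dz$ are $\Gamma$-normalized analogues of the
quantities defined earlier ($\mu_B$ is the Bohr magneton). Typical MOT operating conditions fix $\Omega' = 1/2$,
$\Delta' = 1$, so $\alpha$ and $K$ reduce to

$$\alpha \simeq (0.132) \, \hbar k^2 \tag{C1.4.46}$$

and

$$K \simeq (0.132) \, \hbar k \, \frac{\mu_B}{\hbar} \frac{dB}{dz}. \tag{C1.4.47}$$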

The extension of these results to three dimensions is straightforward if one takes into account that the quadrupole
field gradient in the $z$ direction is twice the gradient in the $x,y$ directions, so that $K_z = 2K_x = 2K_y$. The
velocity-dependent damping term implies that kinetic energy $E$ dissipates from the atom (or collection of atoms) as
$E/E_0 = e^{-(2\alpha/m)t}$, where $m$ is the atomic mass and $E_0$ the kinetic energy at the beginning of the
cooling process. Therefore, the dissipative force term cools the collection of atoms as well as combining with the
displacement term to confine them. The damping time constant $\tau = m/2\alpha$ is typically tens of microseconds. It
is important to bear in mind that an MOT is anisotropic since the restoring force along the $z$ axis of the
quadrupole field is twice the restoring force in the $x$-$y$ plane. Furthermore, an MOT provides a dissipative rather
than a conservative trap and it is therefore more accurate to characterize the maximum capture velocity rather than
the trap 'depth'.
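
The numerical factor quoted in equation (C1.4.46) and the resulting damping time can be reproduced with a few lines
of Python. The sketch below evaluates the dimensionless factor shared by equations (C1.4.44) and (C1.4.45),
$16|\Delta'|(\Omega')^2 / \{[1 + 2(\Omega')^2]^2 [1 + 4(\Delta')^2/(1 + 2(\Omega')^2)]^2\}$, and then the damping
constant, damping time and spring constant. The sodium wavelength and mass, the field gradient of 0.1 T/m, the
Zeeman tuning rate $(\mu_B/\hbar)\,dB/dz$ of the model $J = 0 \rightarrow J = 1$ transition, and the function names
are assumptions introduced only for this illustration.

```python
import math

HBAR = 1.054571817e-34     # reduced Planck constant, J s
AMU = 1.66053906660e-27    # atomic mass unit, kg
MU_B = 9.2740100783e-24    # Bohr magneton, J/T


def mot_prefactor(omega_n, delta_n):
    """Dimensionless factor common to the MOT damping and spring constants."""
    a = 1.0 + 2.0 * omega_n ** 2
    return 16.0 * abs(delta_n) * omega_n ** 2 / (a ** 2 * (1.0 + 4.0 * delta_n ** 2 / a) ** 2)


def damping_constant(omega_n, delta_n, k):
    """alpha = prefactor * hbar * k^2, in kg/s."""
    return mot_prefactor(omega_n, delta_n) * HBAR * k ** 2


def spring_constant(omega_n, delta_n, k, field_gradient):
    """K = prefactor * k * mu_B * dB/dz, in N/m (assumes a Zeeman shift of mu_B * B)."""
    return mot_prefactor(omega_n, delta_n) * k * MU_B * field_gradient


# Assumed illustrative parameters (not taken from the article).
k = 2.0 * math.pi / 589e-9    # wave vector for a 589 nm transition, 1/m
mass = 23.0 * AMU             # kg
field_gradient = 0.1          # T/m

prefactor = mot_prefactor(0.5, 1.0)
alpha = damping_constant(0.5, 1.0, k)
tau = mass / (2.0 * alpha)    # time for the kinetic energy to fall to 1/e
spring = spring_constant(0.5, 1.0, k, field_gradient)

print(f"prefactor    = {prefactor:.3f}")
print(f"damping time = {tau * 1e6:.1f} microseconds")
print(f"trap angular frequency = {math.sqrt(spring / mass):.3e} rad/s")
```

For these assumed values the damping time is on the order of ten microseconds, and the damping rate $\alpha/2m$ is
comparable to the oscillation frequency $\sqrt{K/m}$, so the motion in the trap is close to critically damped.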

Early experiments with MOT-trapped atoms were carried out by initially slowing an atomic beam to load the trap
[20, 21]. Later, a continuous uncooled source was used for that purpose, suggesting that the trap could be loaded
with the slow atoms of a room-temperature vapour [22]. The next advance in the development of magneto-optical
trapping was the introduction of the vapour-cell magneto-optical trap (VCMOT). This variation captures cold
atoms directly from the low-velocity edge of the Maxwell-Boltzmann distribution always present in a cell
background vapour [23]. Without the need to load the MOT from an atomic beam, experimental apparatus became
simpler and now many groups around the world use the VCMOT for applications ranging from precision
spectroscopy to optical control of reactive collisions.


Densities in an MOT. The VCMOT typically captures about a million atoms in a volume less than a millimetre in
diameter, resulting in densities $\sim 10^{10}$ cm$^{-3}$. Two processes limit the density attainable in an MOT:
(1) collisional trap loss and (2) repulsive forces between atoms caused by reabsorption of scattered photons from the
interior of the trap [21, 24]. Collisional loss in turn arises from two sources: hot background atoms that knock cold
atoms out of the MOT by elastic impact and binary encounters between the cold atoms themselves. Trap loss due to cold
collisions is the topic of section C1.4.3. The 'photon-induced repulsion' or photon trapping arises when an atom
near the MOT centre spontaneously emits a photon which is reabsorbed by another atom before the photon can
exit the MOT volume. This absorption results in an increase of $2\hbar k$ in the relative momentum of the atomic pair
and produces a repulsive force proportional to the product of the absorption cross section for the incident light
beam and scattered fluorescence. When this outward repulsive force balances the confining force, further increase
in the number of trapped atoms leads to larger atomic clouds, but not to higher densities.


(D) DARK SPOT 

In order to overcome the 'photon-induced repulsion' effect, Ketterle et al [25] proposed a method that allows the
atoms to be optically pumped to a 'dark' hyperfine level of the atom ground state that does not interact with the
trapping light. In a conventional MOT one usually employs an auxiliary 'repumper' light beam, copropagating with
the trapping beams but tuned to a neighbouring transition between hyperfine levels of ground and excited states.
The repumper recovers population that leaks out of the cycling transition between the two levels used to produce
the MOT. As an example, figure C1.4.9 shows the trapping and repumping transitions usually employed in an Na
MOT. The scheme, known as a dark spontaneous-force optical trap (dark SPOT), passes the repumper through a
glass plate with a small black dot shadowing the beam such that the atoms at the trap centre are not coupled back to
the cycling transition but spend most of their time (~99%) in the 'dark' hyperfine level. Cooling and confinement
continue to function on the periphery of the MOT but the centre core experiences no outward light pressure. The dark
SPOT increases density by almost two orders of magnitude.



Figure C1.4.9. Usual cooling (carrier) and repumping (sideband) transitions when optically cooling Na atoms. The
repumper frequency is normally derived from the cooling transition frequency with electro-optic modulation.
Dashed lines show that lasers are tuned about one natural linewidth to the red of the transition frequencies.

(E) THE FAR-OFF RESONANCE TRAP (FORT) 

Although an MOT functions as a versatile and robust 'reaction cell' for studying cold collisions, light frequencies
must tune close to atomic transitions and an appreciable steady-state fraction of the atoms remain excited.
Excited-state trap-loss collisions and photon-induced repulsion limit achievable densities.

A far-off resonance trap (FORT), in contrast, uses the dipole force rather than the spontaneous force to confine
atoms and can therefore operate far from resonance with negligible population of excited states. A hybrid
MOT/dipole-force trap was used by a NIST-Maryland collaboration [26] to study cold collisions, and a FORT was
demonstrated by Miller et al [27] for $^{85}$Rb atoms. The FORT consists of a single, linearly polarized, tightly focused
Gaussian-mode beam tuned far to the red of resonance. The obvious advantage of large detunings is the
suppression of photon absorption. Note from equation (C1.4.12) that the spontaneous force, involving absorption
and reemission, falls off as the inverse square of the detuning, while equation (C1.4.11) shows that the potential
derived from the dipole force falls off only as the inverse of the detuning itself. At large detunings and high field
gradients (tight focus) equation (C1.4.11) becomes


$$ U \simeq \frac{\hbar |\Omega|^2}{4\,\Delta\omega_L} \qquad \text{(C1.4.48)} $$

which shows that the potential becomes directly proportional to light intensity and inversely proportional to
detuning. Therefore, at far detuning but high intensity the depth of the FORT can be maintained but most of the
atoms will not absorb photons. The important advantages of FORTs compared to MOTs are (1) high density
($\sim 10^{12}$ cm$^{-3}$) and (2) a well defined polarization axis along which atoms can be aligned or oriented (spin
polarized). The main disadvantage is the small number of trapped atoms due to small FORT volume. The best
number achieved is about $10^4$ atoms [28].

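The scaling argument for the FORT can be checked numerically. The sketch below is not taken from the text: it assumes the standard far-detuned two-level limits, a dipole potential of magnitude $\hbar\Omega^2/4\Delta$ and a photon scattering rate $\Gamma\Omega^2/4\Delta^2$, and the function names and numerical values are purely illustrative.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s


def dipole_depth(rabi, detuning):
    """Magnitude of the far-detuned dipole potential, hbar*Omega^2/(4*Delta), in joules."""
    return HBAR * rabi ** 2 / (4.0 * abs(detuning))


def scattering_rate(rabi, detuning, linewidth):
    """Far-detuned photon scattering rate, Gamma*Omega^2/(4*Delta^2), in s^-1."""
    return linewidth * rabi ** 2 / (4.0 * detuning ** 2)


# Illustrative (assumed) angular frequencies in rad/s, not values from the text.
gamma = 2.0 * math.pi * 6.0e6
rabi = 2.0 * math.pi * 1.0e9
delta = 2.0 * math.pi * 1.0e12

# Raise the detuning tenfold and the intensity (proportional to Omega^2) tenfold.
scale = 10.0
rabi_scaled = rabi * math.sqrt(scale)
delta_scaled = delta * scale

depth_ratio = dipole_depth(rabi_scaled, delta_scaled) / dipole_depth(rabi, delta)
scatter_ratio = (scattering_rate(rabi_scaled, delta_scaled, gamma)
                 / scattering_rate(rabi, delta, gamma))

print(f"trap depth ratio      = {depth_ratio:.3f}")
print(f"scattering rate ratio = {scatter_ratio:.3f}")
```

Under these assumptions the trap depth is unchanged while the scattering rate falls by the same factor as the detuning was raised, which is the trade-off the FORT exploits.
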
(F) MAGNETIC TRAPS 

Pure magnetic traps have also been used to study cold collisions and they are critical for the study of dilute
gas-phase Bose-Einstein condensates (BECs) in which collisions figure importantly. We anticipate, therefore, that
magnetic traps will play an increasingly important role in future collision studies in and near BEC conditions.

The most important distinguishing feature of all magnetic traps is that they do not require light to provide atom 
containment. Light-free traps reduce the rate of atom heating by photon absorption to zero, an apparently necessary 
condition for the attainment of BEC. Magnetic traps rely on the interaction of atomic spin with variously shaped 
magnetic fields and gradients to contain atoms. The two governing equations are 


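$$ U = -\boldsymbol{\mu}_S \cdot \mathbf{B} = -\frac{g_S \mu_B}{\hbar}\, \mathbf{S} \cdot \mathbf{B} = -\frac{g_S \mu_B}{\hbar}\, M_S B \qquad \text{(C1.4.49)} $$

and

$$ F = -\frac{g_S \mu_B}{\hbar}\, M_S \nabla B. \qquad \text{(C1.4.50)} $$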


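If the atom has nonzero nuclear spin $I$ then $\mathbf{F} = \mathbf{S} + \mathbf{I}$ substitutes for $\mathbf{S}$ in equation (C1.4.49), the g-factor
generalizes to

$$ g_F \cong g_S\, \frac{F(F+1) + S(S+1) - I(I+1)}{2F(F+1)} \qquad \text{(C1.4.51)} $$

and

$$ F = -\frac{g_F \mu_B}{\hbar}\, M_F \nabla B. \qquad \text{(C1.4.52)} $$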

Depending on the sign of U and F, atoms in states whose energy increases or decreases with magnetic field are
called 'weak-field seekers' or 'strong-field seekers', respectively. One could, in principle, trap atoms in any of
these states, needing only to produce a minimum or a maximum in the magnetic field. Unfortunately only weak-field
seekers can be trapped in a static magnetic field because such a field in free space can only have a minimum. Dynamic
traps have been proposed to trap both weak- and strong-field seekers [29]. Even when weak-field seeking states are
not in the lowest hyperfine levels they can still be used for trapping because the transition rate for spontaneous
magnetic dipole emission is $\sim 10^{-10}$ s$^{-1}$. However, spin-changing collisions can limit the maximum attainable
density. The first static magnetic field trap for neutral atoms was demonstrated by Migdall et al [30]. An
anti-Helmholtz configuration, similar to an MOT, was used to produce an axially symmetric quadrupole magnetic field.
Since this field design always has a central point of vanishing magnetic field, nonadiabatic Majorana transitions
can take place as the atom passes through the zero point, transferring the population from a weak-field to a
strong-field seeker and effectively ejecting the atom from the trap. This problem can be overcome by using a magnetic
bottle with no point of zero field [31, 32, 33 and 34]. The magnetic bottle, also called the Ioffe-Pritchard trap, was
recently used to achieve BEC in a sample of Na atoms pre-cooled in an MOT [35]. Other approaches to eliminating
the zero-field point are the time-averaged orbiting potential (TOP) trap [36] and an optical 'plug' [37] that consists
of a blue-detuned intense optical beam aligned along the magnetic trap symmetry axis and producing a repulsive
potential to prevent atoms from entering the null-field region. Trap technology continues to develop and the recent 
achievement of BEC will stimulate more robust traps containing greater numbers of atoms. At present $\sim 10^7$ atoms
can be trapped in a BEC loaded from an MOT containing $\sim 10^9$ atoms.
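
As an illustration of the weak-field/strong-field distinction, the following sketch evaluates the g-factor of equation (C1.4.51) and the sign of the Zeeman slope for the two ground hyperfine levels of Na (S = 1/2, I = 3/2). It is a hedged example rather than part of the original treatment: it assumes the linear Zeeman regime, and it assumes the sign convention in which the electron g-factor is negative ($g_S \approx -2$), so that the minus sign in $U = -(g_F\mu_B/\hbar) M_F B$ reproduces the familiar result that the F = 2, $M_F$ = +2 state is a weak-field seeker. All function names are hypothetical.

```python
def lande_g_f(f, s, i, g_s=-2.0):
    """g_F = g_S [F(F+1) + S(S+1) - I(I+1)] / [2 F(F+1)], as in equation (C1.4.51).

    g_s = -2.0 is an assumed sign convention (negative electron g-factor).
    """
    if f == 0:
        raise ValueError("g_F is undefined for F = 0")
    return g_s * (f * (f + 1) + s * (s + 1) - i * (i + 1)) / (2.0 * f * (f + 1))


def zeeman_slope(g_f, m_f):
    """dU/dB in units of the Bohr magneton, from U = -g_F * mu_B * M_F * B."""
    return -g_f * m_f


def classify(slope):
    """Label a state by the sign of its energy change with field strength."""
    if slope > 0:
        return "weak-field seeker"
    if slope < 0:
        return "strong-field seeker"
    return "field-insensitive"


S, I = 0.5, 1.5  # electron and nuclear spin of ground-state Na
for F in (1, 2):
    g_f = lande_g_f(F, S, I)
    for m_f in range(-F, F + 1):
        label = classify(zeeman_slope(g_f, m_f))
        print(f"F={F}, M_F={m_f:+d}, g_F={g_f:+.2f}: {label}")
```

Only the states labelled weak-field seekers can be held at the field minimum of a static trap.
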


C1.4.3 INELASTIC EXOERGIC COLLISIONS IN MOTS 

An exoergic collision converts internal atomic energy to kinetic energy of the colliding species. When there is only 
one species in the trap (the usual case) this kinetic energy is equally divided between the two partners. If the net 
gain in kinetic energy exceeds the trapping potential or the ability of the trap to recapture, the atoms escape; and the 
exoergic collision leads to trap loss. 

Of the several trapping possibilities described in the last section, by far the most popular choice for collision studies
has been the magneto-optical trap (MOT). An MOT uses spatially dependent resonant scattering to cool and
confine atoms. If these atoms also absorb the trapping light at the initial stage of a binary collision and approach
each other on an excited molecular potential, then during the time of approach the colliding partners can undergo a
fine-structure-changing collision (FCC) or relax to the ground state by spontaneously emitting a photon. In either
case, electronic energy of the quasimolecule converts to nuclear kinetic energy. If both atoms are in their electronic
ground states from the beginning to the end of the collision, only elastic and hyperfine-changing collisions (HCC)
can take place. Elastic collisions (identical scattering entrance and exit states) are not exoergic but figure
importantly in the production of Bose-Einstein condensates (BECs). At the very lowest energies only s waves
contribute to the elastic scattering and in this regime the collisional interaction is characterized by the scattering
length. The sign of the scattering length determines the properties of a weakly interacting Bose gas and the
magnitude controls the rate of evaporative cooling needed to achieve BEC. The HCC collisions arise from
ground-state splitting of the alkali atoms into hyperfine levels due to various orientations of the nonzero nuclear
spin. A transition from higher to lower molecular hyperfine level during the collisional encounter releases kinetic
energy. In the absence of external light fields HCCs often dominate trap heating and loss.

If the collision starts on the excited level, the long-range dipole-dipole interaction produces an interatomic
potential varying as $\pm C_3/R^3$. The sign of the potential, attractive or repulsive, depends on the relative phase of the
interacting dipoles. For the trap-loss processes that concern us in this chapter we concentrate on the attractive
long-range potential, $-C_3/R^3$. Due to the extremely low energy of the collision, this long-range potential acts on the
atomic motion even when the pair are as far apart as $\lambda/2\pi$ (the inverse of the light-field wave vector k). Since the
collision time is comparable to or greater than the excited-state lifetime, spontaneous emission can take place
during the atomic encounter. If spontaneous emission occurs, the quasimolecule emits a photon red shifted from
atomic resonance and relaxes to the ground electronic state with some continuum distribution of the nuclear kinetic
energy. This conversion of internal electronic energy to external nuclear kinetic energy can result in a considerable
increase in the nuclear motion. If the velocity is not too high, the dissipative environment of the MOT is enough to
cool this radiative heating, allowing the atom to remain trapped. However, if the transferred kinetic energy is
greater than the recapture ability of the MOT, the atoms escape the trap. This process constitutes an important
trap-loss mechanism termed radiative escape (RE), and was first pointed out by Vigue [38]. For alkalis there is also
another exoergic process involving excited-ground collisions. Due to the existence of fine structure in the excited
state ($P_{3/2}$ and $P_{1/2}$), the atomic encounter can result in FCC, releasing $\Delta E_{FS}$ of kinetic energy, shared equally
between both atoms. For example, in sodium $\frac{1}{2}\Delta E_{FS}/k_B \approx 12$ K, which can easily cause the escape of both atoms from
the MOT, typically 1 K deep.
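
The energy released per atom in a fine-structure-changing collision can be estimated with a few lines of arithmetic. The sketch below is an illustration only: it assumes a literature value of about 17.196 cm$^{-1}$ for the Na 3p fine-structure splitting (a number not quoted in the text), neglects the initial kinetic energy of the cold atoms, and uses equal sharing of the released energy between two atoms of equal mass.

```python
H = 6.62607015e-34        # Planck constant, J s
C_CM = 2.99792458e10      # speed of light, cm/s
K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg

DELTA_FS_CM1 = 17.196     # assumed Na 3p fine-structure splitting, cm^-1
MASS_NA = 22.98977 * AMU  # mass of one Na atom, kg
TRAP_DEPTH_K = 1.0        # typical MOT depth quoted in the text, K

# Total energy released, converted from wavenumbers to joules.
delta_e = H * C_CM * DELTA_FS_CM1

# Each atom receives half of the released energy; express it in kelvin.
per_atom_kelvin = 0.5 * delta_e / K_B

# Speed of each atom from (1/2) m v^2 = delta_e / 2.
speed = (delta_e / MASS_NA) ** 0.5

escapes = per_atom_kelvin > TRAP_DEPTH_K
print(f"energy per atom: {per_atom_kelvin:.1f} K, speed: {speed:.0f} m/s, escapes: {escapes}")
```

With these inputs each atom carries roughly twelve times the trap depth, which is why a single FCC event removes both partners from the MOT.
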

These three effects, HCC, RE and FCC, are the main exoergic collisional processes that take place in an MOT. They
are the dominant loss mechanisms which usually limit the maximum attainable density and number in MOTs. They 
are not, however, the only type of collision in the trap. 

C1.4.3.1 PHOTOASSOCIATION AT AMBIENT AND ULTRACOLD TEMPERATURES

The first measurement of a free-bound photoassociative absorption appeared long before the development of
optical cooling and trapping, about two decades ago, when Scheingraber and Vidal [39] reported the observation of
photoassociation in collisions between magnesium atoms. In this experiment fixed UV lines from an argon ion
laser excited free-bound transitions from the thermal continuum population of the ground $X\,^1\Sigma_g^+$ state to bound
levels of the $A\,^1\Sigma_u^+$ state of Mg$_2$. Scheingraber and Vidal analysed the subsequent fluorescence to bound and
continuum states from which they inferred the photoassociative process. The first unambiguous photoassociation
excitation spectrum, however, was measured by Inoue et al [40] in collisions between Xe and Cl at 300 K. In both
these early experiments the excitation was not very selective due to the broad thermal distribution of populated
continuum ground states. Jones et al [41], with a technically much improved experiment, reported beautiful
free-bound vibration progressions in KrF and XeI $X \to B$ transitions; and, from the intensity envelope modulation, were
able to extract the functional dependence of the transition moment on the internuclear separation. Although
individual vibrational levels of the B state were clearly resolved, the underlying rotational manifolds were not.
Jones et al [41] simulated the photoassociation structure and line shapes by assuming a thermal distribution of
rotational levels at 300 K. Photoassociation and dissociation processes prior to the cold and ultracold epoch have
been reviewed by Tellinghuisen [42].

A decade after Scheingraber and Vidal reported the first observation of photoassociation, Thorsheim et al [43]
proposed that high-resolution free-bound molecular spectroscopy should be possible using optically cooled and
confined atoms. Figure C1.4.10 shows a portion of their calculated $X \to A$ absorption spectrum at 10 mK for
sodium atoms. This figure illustrates how cold temperatures compress the Maxwell-Boltzmann distribution to the
point where individual rotational transitions in the free-bound absorption are clearly resolvable. The marked
differences in peak intensities indicate scattering resonances, and the asymmetry in the line shapes, tailing off to
the red, reflects the thermal distribution of ground-state collision energies at 10 mK. Figure C1.4.11 plots the
photon-flux-normalized absorption rate coefficient for singlet $X\,^1\Sigma_g^+ \to A\,^1\Sigma_u^+$ and triplet
$a\,^3\Sigma_u^+ \to 1\,^3\Sigma_g^+$ molecular transitions over a broad range of photon excitation, red detuned from the
Na ($^2S \to {}^2P$) atomic resonance line. The strongly modulated intensity envelopes are called Condon modulations,
and they reflect the overlap between the ground-state continuum wavefunctions and the bound excited vibrational
wavefunctions. We shall see later that these Condon modulations reveal detailed information about the ground state
scattering wave function and potential from which accurate s-wave scattering lengths can be determined. Thorsheim
et al [43], therefore, predicted all the notable features of ultracold photoassociation spectroscopy later to be
developed in many experiments: (1) precision measurement of vibration-rotation progressions from which accurate
excited-state potential parameters can be determined, (2) line profile measurements and analysis to determine
collision temperature and threshold behaviour and (3) spectral intensity modulation from which the ground-state
potential, the scattering wave function and the s-wave scattering length can be characterized with great accuracy.
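
To give a sense of scale for the compression of the thermal distribution, the sketch below converts $k_B T$ into frequency units for the two temperatures mentioned in the text, 300 K and 10 mK. Using $k_B T/h$ as a rough measure of the thermal spread of collision energies is a simplifying assumption made here for illustration; actual line shapes depend on the threshold behaviour of the scattering wavefunction.

```python
H = 6.62607015e-34   # Planck constant, J s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def thermal_width_hz(temperature_k):
    """Thermal energy k_B*T expressed as a frequency, k_B*T/h, in Hz."""
    return K_B * temperature_k / H


for label, temperature in (("300 K", 300.0), ("10 mK", 10.0e-3)):
    print(f"{label}: k_B T / h = {thermal_width_hz(temperature):.3e} Hz")
```

The spread falls from several terahertz at room temperature to a few hundred megahertz at 10 mK, a reduction by the temperature ratio of 3 x 10^4.
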

Figure C1.4.10. Calculated free-bound photoassociation spectrum at 10 mK.

Figure C1.4.11. Calculated absorption spectrum of photoassociation in Na at 10 mK, showing Condon
fluctuations.

An important difference distinguishes ambient-temperature photoassociation in rare-gas halide systems and
sub-millikelvin temperature photoassociation in cooled and confined alkali systems. At temperatures found in MOTs
and FORTs (and within selected velocity groups in atomic beams) the collision dynamics are controlled by
long-range electrostatic interactions, and Condon points $R_C$ are typically at tens to hundreds of $a_0$. In the case of the
rare-gas halides the Condon points are in the short-range region of chemical binding and, therefore, free-bound
transitions take place at much smaller internuclear distances, typically less than ten $a_0$. For the colliding A, B
quasimolecule the pair density $n$ as a function of $R$ is given by

$$ n = n_A n_B\, 4\pi R^2\, e^{-V(R)/kT} \qquad \text{(C1.4.53)} $$

so the density of pairs varies as the square of the internuclear separation. Although the pair-density R dependence
favours long-range photoassociation, the atomic reactant pressures are quite different, with the $n_A n_B$ product of the
order of $10^{35}$ cm$^{-6}$ for rare-gas halide photoassociation and only about $10^{22}$ cm$^{-6}$ for optically trapped atoms.
Therefore the effective pair density available for rare-gas halide photoassociation greatly exceeds that for cold
alkali photoassociation, permitting fluorescence detection and dispersion by high-resolution (but inefficient)
monochromators.

C1.4.3.2 ASSOCIATIVE AND PHOTOASSOCIATIVE IONIZATION

Conventional associative ionization (AI) occurring at ambient temperature proceeds in two steps: excitation of
isolated atoms followed by molecular autoionization as the two atoms approach on excited molecular potentials. In
sodium, for example [44],

$$ \mathrm{Na} + \hbar\omega \to \mathrm{Na}^* \qquad \text{(C1.4.54)} $$

$$ \mathrm{Na}^* + \mathrm{Na}^* \to \mathrm{Na}_2^+ + e. \qquad \text{(C1.4.55)} $$

The collision event lasts a few picoseconds, fast compared to radiative relaxation of the excited atomic states
(~tens of nanoseconds). Therefore the incoming atomic excited states can be treated as stationary states of the system
Hamiltonian, and spontaneous radiative loss does not play a significant role. In contrast, cold and ultracold
photoassociative ionization (PAI) must always start on ground states because the atoms move so slowly that
radiative lifetimes become short compared to collision duration. The partners must be close enough at the Condon
point, where the initial photon absorption takes place, so that a significant fraction of the excited scattering flux
survives radiative relaxation and goes on to populate the final inelastic channel. Thus PAI is also a two-step
process: (1) photoexcitation of the incoming scattering flux from the molecular ground-state continuum to specific
vibration-rotation levels of a bound molecular state and (2) subsequent photon excitation either to a doubly excited
molecular autoionizing state or directly to the molecular photoionization continuum. For example, in the case of
sodium collisions the principal route is through doubly excited autoionization [45]

$$ \mathrm{Na} + \mathrm{Na} + \hbar\omega_1 \to \mathrm{Na}_2^* + \hbar\omega_2 \to \mathrm{Na}_2^{**} \to \mathrm{Na}_2^+ + e \qquad \text{(C1.4.56)} $$

whereas for rubidium atoms the only available route is direct photoionization in the second step [46]

$$ \mathrm{Rb} + \mathrm{Rb} + \hbar\omega_1 \to \mathrm{Rb}_2^* + \hbar\omega_2 \to \mathrm{Rb}_2^+ + e. \qquad \text{(C1.4.57)} $$

Collisional ionization can play an important role in plasmas, flames and atmospheric and interstellar physics and 
chemistry. Models of these phenomena depend critically on the accurate determination of absolute cross sections 
and rate coefficients. The rate coefficient is the quantity closest to what an experiment actually measures and can 
be regarded as the product of cross section and velocity averaged over the collision velocity distribution,

$$ K = \int_0^\infty v\, \sigma(v)\, f(v)\, dv. \qquad \text{(C1.4.58)} $$


The velocity distribution $f(v)$ depends on the conditions of the experiment. In cell and trap experiments it is usually
a Maxwell-Boltzmann distribution at some well defined temperature, but $f(v)$ in atomic beam experiments, arising
from optical excitation velocity selection, deviates radically from the normal thermal distribution [47]. The actual
signal count rate, $d(\mathrm{X}_2^+)/dt$, relates to the rate coefficient through

$$ \frac{1}{V\alpha}\, \frac{d(\mathrm{X}_2^+)}{dt} = K[\mathrm{X}]^2 \qquad \text{(C1.4.59)} $$

where $V$ is the interaction volume, $\alpha$ the ion detection efficiency and [X] the atom density. If rate constant or cross
section measurements are carried out in crossed or single atomic beams [44, 47, 48] special care is necessary to
determine the interaction volume and atomic density.
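
The velocity average that defines the rate coefficient can be evaluated numerically. The following is a minimal sketch, not a reproduction of any published analysis: it assumes a Maxwell-Boltzmann distribution of relative speeds for a pair of Na atoms (reduced mass of half the atomic mass), a hypothetical velocity-independent cross section chosen only for illustration, and a simple trapezoidal rule. For a constant cross section the result should equal the cross section times the mean relative speed.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg


def maxwell_speed_pdf(v, mass, temperature):
    """Maxwell-Boltzmann speed distribution f(v), normalized to unit area."""
    a = mass / (2.0 * K_B * temperature)
    return 4.0 * math.pi * (a / math.pi) ** 1.5 * v * v * math.exp(-a * v * v)


def rate_coefficient(sigma, mass, temperature, n_steps=20000):
    """Trapezoidal estimate of K = integral of v * sigma(v) * f(v) dv, in m^3/s."""
    v_max = 10.0 * math.sqrt(2.0 * K_B * temperature / mass)  # far into the tail
    dv = v_max / n_steps
    total = 0.0
    for j in range(n_steps + 1):
        v = j * dv
        weight = 0.5 if j in (0, n_steps) else 1.0
        total += weight * v * sigma(v) * maxwell_speed_pdf(v, mass, temperature)
    return total * dv


REDUCED_MASS = 0.5 * 22.98977 * AMU  # Na + Na pair
TEMPERATURE = 750e-6                 # K
SIGMA_CONST = 1.0e-17                # m^2, hypothetical illustrative value

k_numeric = rate_coefficient(lambda v: SIGMA_CONST, REDUCED_MASS, TEMPERATURE)
k_analytic = SIGMA_CONST * math.sqrt(8.0 * K_B * TEMPERATURE / (math.pi * REDUCED_MASS))
print(f"numeric  K = {k_numeric:.4e} m^3/s")
print(f"analytic K = {k_analytic:.4e} m^3/s")
```

A velocity-dependent cross section can be passed in place of the constant lambda, and a non-thermal $f(v)$, as in atomic beam experiments, would require replacing `maxwell_speed_pdf`.
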

PAI was the first measured collisional process observed between cooled and trapped atoms [26]. The experiment
was performed with atomic sodium confined in a hybrid laser trap, utilizing both the spontaneous radiation pressure
and the dipole force. The trap had two counterpropagating, circularly polarized Gaussian laser beams brought to
separate foci such that longitudinal confinement along the beam axis was achieved by the spontaneous force and
transverse confinement by the dipole force. The trap was embedded in a large (~1 cm diameter) conventional
optical molasses loaded from a slowed atomic beam. The two focused laser beams comprising the dipole trap were
alternately chopped with a 3 μs 'trap cycle', to avoid standing-wave heating. This trap cycle for each beam was
interspersed with a 3 μs 'molasses cycle' to keep the atoms cold. The trap beams were detuned about 700 MHz to
the red of the $3s\,^2S_{1/2}(F = 2) \to 3p\,^2P_{3/2}(F = 3)$ transition while the molasses was detuned only about one natural
line width (~10 MHz). The atoms captured from the molasses ($\sim 10^7$ cm$^{-3}$) were compressed to a much higher
excited atom density ($\sim 5 \times 10^9$ cm$^{-3}$) in the trap. The temperature was measured to be about 750 μK. Ions formed
in the trap were accelerated and focused toward a charged-particle detector. To assure the identity of the counted
ions, Gould et al [26] carried out a time-of-flight measurement, the results of which, shown in figure C1.4.12,
clearly establish the Na$_2^+$ ion product. The linearity of the ion rate with the square of the atomic density in the trap
supported the view that the detected Na$_2^+$ ions were produced in a binary collision. After careful measurement of ion
rate, trap volume and excited atom density, the value for the rate coefficient was determined to be
$K = (1.1^{+1.3}_{-0.5}) \times 10^{-11}$ cm$^3$ s$^{-1}$. Gould et al [26], following conventional wisdom, interpreted the ion production as
originating from collisions between two excited atoms,


$$ \frac{dN_I}{dt} = K \int n_e^2(\mathbf{r})\, d^3r = K \bar{n}_e N_e \qquad \text{(C1.4.60)} $$

where $dN_I/dt$ is the ion production rate, $n_e(\mathbf{r})$ the excited-state density, $N_e$ the number of excited atoms in the trap
($= \int n_e(\mathbf{r})\, d^3r$) and $\bar{n}_e$ the 'effective' excited-state trap density. The value for K was then determined from these
measured parameters. Assuming an average collision velocity of 130 cm s$^{-1}$, equivalent to a trap temperature of
750 μK, the corresponding cross section was determined to be $\sigma = (8.6^{+10.0}_{-3.8}) \times 10^{-14}$ cm$^2$. In contrast the cross
section at ~575 K had been previously determined to be $\sim 1.5 \times 10^{-16}$ cm$^2$ [49, 50 and 51]. Gould et al [26]
rationalized the difference in cross section size by invoking the difference in de Broglie wavelengths, the number
of participating partial waves and the temperature dependence of the ionization channel probability. The quantal
expression for the cross section in terms of partial wave contributions $l$ and inelastic scattering probability $S_{12}$ is

$$ \sigma_{12}(\varepsilon) = \frac{\pi}{k^2} \sum_{l=0}^{\infty} (2l+1)\, |S_{12}(\varepsilon, l)|^2 \cong \frac{\lambda_{dB}^2}{4\pi}\, (l_{\max}+1)^2\, P_{12} \qquad \text{(C1.4.61)} $$


where $\lambda_{dB}$ is the entrance channel de Broglie wavelength and $P_{12}$ is the probability of the ionizing collision
channel averaged over all contributing partial waves of which $l_{\max}$ is the greatest. The ratio of $(l_{\max} + 1)^2$ between
575 K and 750 μK is about 400 and the de Broglie wavelength factor $\lambda_{dB}^2$ varies inversely with temperature.
Therefore, in order that the cross section ratio be consistent with low- and high-temperature experiments,
$\sigma_{12}(575\,\mathrm{K})/\sigma_{12}(750\,\mu\mathrm{K}) \approx 1.7 \times 10^{-3}$, Gould et al [26] concluded that $P_{12}$ must be about three times greater
at 575 K than at 750 μK.
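
The consistency argument based on equation (C1.4.61) can be followed as simple arithmetic. The sketch below uses only quantities stated in the text (the two temperatures, the factor of about 400 in $(l_{\max}+1)^2$ and the factor of about three in $P_{12}$) and assumes, as the text does, that $\lambda_{dB}^2$ scales inversely with temperature for a fixed mass. The function and variable names are illustrative.

```python
T_HOT = 575.0          # K, high-temperature experiments
T_COLD = 750e-6        # K, trap temperature
L_FACTOR_RATIO = 400.0 # (l_max + 1)^2 at 575 K divided by its value at 750 uK
P12_RATIO = 3.0        # P_12 at 575 K divided by its value at 750 uK


def cross_section_ratio_hot_over_cold(t_hot, t_cold, l_factor_ratio, p12_ratio):
    """sigma_12(hot) / sigma_12(cold) from sigma ~ lambda_dB^2 (l_max+1)^2 P_12."""
    wavelength_sq_ratio = t_cold / t_hot  # lambda_dB^2 is proportional to 1/T
    return wavelength_sq_ratio * l_factor_ratio * p12_ratio


ratio = cross_section_ratio_hot_over_cold(T_HOT, T_COLD, L_FACTOR_RATIO, P12_RATIO)
print(f"sigma(575 K) / sigma(750 uK) = {ratio:.2e}")
print(f"sigma(750 uK) / sigma(575 K) = {1.0 / ratio:.0f}")
```

The cold cross section comes out several hundred times larger than the hot one: the gain of nearly six orders of magnitude from the de Broglie wavelength is offset by the smaller number of partial waves and the lower ionization probability.
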

Figure C1.4.12. Time-of-flight spectrum clearly showing that the ions detected are Na$_2^+$ and not the atomic ion.


However, it soon became clear that the conventional picture of associative ionization, starting from the excited
atomic states, could not be appropriate in the cold regime. Julienne [52] pointed out the essential problem with this
picture. In the molasses cycle the optical field is only red detuned by one line width, and the atoms must therefore
be excited at very long range, near 1800 $a_0$. The collision travel time to the close internuclear separation where
associative ionization takes place is long compared to the radiative lifetime, and most of the population decays to
the ground state before reaching the autoionization zone. During the trap cycle, however, the excitation takes place
at much closer internuclear distances due to a 70 line-width red detuning and high-intensity field dressing.
Therefore, one might expect excitation survival to be better on the trap cycle than on the molasses cycle, and the
NIST group set up an experiment to test the predicted cycle dependence of the ion rate.
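
How strongly the excitation distance shrinks with detuning can be estimated from the long-range potential alone. The sketch below is a rough illustration, not the calculation of Julienne [52]: it assumes that the Condon point is fixed by $\hbar\Delta = C_3/R_C^3$ for a pure $-C_3/R^3$ potential, so that $R_C$ scales as $\Delta^{-1/3}$, and it takes the 1800 $a_0$ value at one line width from the text as the reference. High-intensity field dressing, which the text also mentions, is ignored.

```python
def condon_point(detuning_linewidths, ref_radius_a0=1800.0, ref_detuning_linewidths=1.0):
    """Condon point in units of a0, scaled as detuning^(-1/3) from a reference value.

    Assumes hbar * Delta = C3 / R^3 (pure resonant dipole-dipole potential).
    """
    return ref_radius_a0 * (ref_detuning_linewidths / detuning_linewidths) ** (1.0 / 3.0)


r_molasses = condon_point(1.0)   # molasses cycle, one line width
r_trap = condon_point(70.0)      # trap cycle, about 70 line widths

print(f"molasses cycle: R_C = {r_molasses:.0f} a0")
print(f"trap cycle:     R_C = {r_trap:.0f} a0")
```

Under these assumptions a seventyfold increase in detuning moves the Condon point inward by only about a factor of four, but that is enough to shorten the travel time to the ionization region considerably.
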


Lett et al [53] performed a new experiment using the same hybrid trap. This time, however, the experiment
measured ion rates and fluorescence separately as the hybrid trap oscillated between 'trap' and 'molasses' cycles.
The results from this experiment are shown in figure C1.4.13. While the total number and density of atoms
(excited atoms plus ground-state atoms) remained essentially the same over the two cycles and the excited state
fraction changed only by about a factor of two, the ion rate increased in the trapping cycle by factors ranging from
20 to 200 with most observations falling between 40 and 100. This verified the predicted effect qualitatively even if
the magnitude was smaller than the estimated $10^4$ factor of Julienne [52]. This modulation ratio is orders of
magnitude more than would be expected if excited atoms were the origin of the associative ionization signal.
Furthermore, by detuning the trapping lasers over 4 GHz to the red, Lett et al continued to measure ion production
at rates comparable to those measured near the atomic resonance. At such large detunings, reduction in atomic
excited-state population would have led to reductions in ion rate by over four orders of magnitude, had the excited
atoms been the origin of the collisional ionization. Not only did far off-resonance trap cycle detuning maintain the
ion production rate, but Lett et al [53] observed evidence of peak structure in the ion signal as the dipole trap cycle
detuned to the red.

Figure C1.4.13. Trap modulation experiment showing much greater depth of ion intensity modulation (by more
than one order of magnitude) than fluorescence or atom number modulation, demonstrating that excited atoms are
not the origin of the associative ionizing collisions.

To interpret this experiment, Julienne and Heather [45] proposed a mechanism that has become the standard picture
for cold and ultracold photoassociative ionization. Figure C1.4.14 details the model.

Figure C1.4.14. Photoassociative ionization (PAI) in Na collisions.

Two colliding atoms approach on the molecular ground-state potential. During the molasses cycle with the optical
fields detuned only about one line width to the red of atomic resonance, the initial excitation occurs at very long
range, around a Condon point at 1800 $a_0$. A second Condon point at 1000 $a_0$ takes the population to a $1_u$ doubly
excited potential that, at shorter internuclear distance, joins adiabatically to a $^3\Sigma_u^+$ potential, thought to be the
principal short-range entrance channel to associative ionization [54, 55]. More recent calculations suggest other
entrance channels are important as well [56]. The long-range optical coupling to excited potentials in regions with
little curvature implies that spontaneous radiative relaxation will depopulate these channels before the approaching
partners reach the region of small internuclear separation where associative ionization takes place. The overall
probability for collisional ionization during the molasses cycle therefore remains quite low. In contrast, during the
trap cycle the optical fields are detuned 60 line widths to the red of resonance, the first Condon point occurs at 450
$a_0$ and, if the trap cycle field couples to the $0_g^-$ long-range molecular state [57], the second Condon point occurs at
60 $a_0$. Survival against radiative relaxation improves greatly because the optical coupling occurs at much shorter
range where excited-state potential curvature accelerates the two atoms together. Julienne and Heather [45]
calculate about a three-orders-of-magnitude enhancement in the rate constant for collisional ionization during the
trap cycle. The dashed and solid arrows in figure C1.4.14 indicate the molasses-cycle and trap-cycle pathways,
respectively. The strong collisional ionization rate constant enhancement in the trap cycle calculated by Julienne
and Heather [45] is roughly consistent with the measurements of Lett et al [53], although the calculated modulation
ratio is somewhat greater than what was actually observed. Furthermore, Julienne and Heather calculate structure in
the trap detuning spectrum. As the optical fields in the dipole trap tune to the red, a rather congested series of ion
peaks appear, which Julienne and Heather ascribed to free-bound association resonances corresponding to
vibration-rotation bound levels in the $1_g$ or the $0_g^-$ molecular excited states. The density of peaks corresponded
roughly to what Lett et al [53] had observed; these two tentative findings together were the first evidence of a new
photoassociation spectroscopy. In a subsequent full paper expanding on their earlier report, Heather and Julienne
[58] introduced the term 'photoassociative ionization' to distinguish the two-step optical excitation of the
quasimolecule from the conventional associative ionization collision between excited atomic states. In a very recent
paper, Pillet et al [59] have developed a perturbative quantum approach to the theory of photoassociation, which
can be applied to the whole family of alkali homonuclear molecules. This study presents a useful table of
photoassociation rates which reveals an important trend toward lower rates of molecule formation as the alkali
mass increases and provides a helpful guide to experiments designed to detect ultracold molecule production.




REFERENCES 


[1] Phillips W D, Prodan J V and Metcalf H J 1985 Laser cooling and electromagnetic trapping of neutral
atoms J. Opt. Soc. Am. B 2 1751-67

[2] Dalibard J and Cohen-Tannoudji C 1985 Dressed-atom approach to atomic motion in laser light: the
dipole force revisited J. Opt. Soc. Am. B 2 1707-20

[3] Metcalf H and van der Straten P 1994 Cooling and trapping of neutral atoms Phys. Rep. 244 203-86 

[4] Adams C S and Riis E 1997 Laser cooling and trapping of neutral atoms Prog. Quant. Electr. 21 1-79 

[5] Adams C S, Carnal O and Mlynek J 1994 Atom interferometry Adv. At. Mol. Opt. Phys. 34 1-33 

[6] Adams C S, Sigel M and Mlynek J 1994 Atom optics Phys. Rep. 240 143-210 

[7] Morinaga M, Yasuda M, Kishimoto T and Shimizu F 1996 Holographic manipulation of a cold atomic 
beam Phys. Rev. Lett. 77 802-5 


[8] Jessen P S and Deutsch I H 1996 Optical lattices Adv. At. Mol. Opt. Phys. 37 95-138 

[9] Suominen K-A 1996 Theories for cold atomic collisions in light fields J. Phys. B: At. Mol. Opt. Phys. 29
5981-6007

[10] Frisch O R 1933 Experimenteller Nachweis des Einsteinschen Strahlungsrückstosses Z. Phys. 86 42-8

[11] Ashkin A 1970 Acceleration and trapping of particles by radiation pressure Phys. Rev. Lett. 24 156-9

[12] Stenholm S 1986 The semiclassical theory of laser cooling Rev. Mod. Phys. 58 699-739

[13] Cohen-Tannoudji C, Dupont-Roc J and Grynberg G 1992 Atom-Photon Interactions: Basic Processes 
and Applications (New York: Wiley) 

[14] Cook R J 1979 Atomic motion in resonant radiation: an application of Earnshaw's theorem Phys. Rev. 
A 20 224-8 

[15] Cook R J 1980 Theory of resonant-radiation pressure Phys. Rev. A 22 1078-98 

[16] Lett P D, Watts R N, Westbrook C I, Phillips W D, Gould P L and Metcalf H J 1988 Observation of 
atoms, laser-cooled below the Doppler limit Phys. Rev. Lett. 61 169-72 

[17] Dalibard J and Cohen-Tannoudji C 1989 Laser cooling below the Doppler limit by polarization
gradients: simple theoretical models J. Opt. Soc. Am. B 6 2023-45

[18] Pritchard D E, Raab E L, Bagnato V, Wieman C E and Watts R N 1986 Light traps using spontaneous
forces Phys. Rev. Lett. 57 310-13

[19] Ashkin A and Gordon J P 1983 Stability of radiation-pressure particle traps: an optical Earnshaw
theorem Opt. Lett. 8 511-13

[20] Raab E, Prentiss M, Cable A, Chu S and Pritchard D E 1987 Trapping of neutral sodium atoms with
radiation pressure Phys. Rev. Lett. 59 2631-4

[21] Walker T, Sesko D and Wieman C 1990 Collective behavior of optically trapped neutral atoms
Phys. Rev. Lett. 64 408-11




[22] Cable A, Prentiss M and Bigelow N P 1990 Observation of sodium atoms in a magnetic molasses trap loaded by a 
continuous uncooled source Opt. Lett. 15 507-9 

[23] Monroe C, Swann W, Robinson H and Wieman C 1990 Very cold trapped atoms in a vapor cell Phys. Rev. Lett. 65
1571-4

[24] Sesko D W, Walker T G and Wieman C 1991 Behavior of neutral atoms in a spontaneous force trap
J. Opt. Soc. Am. B 8 946-58

[25] Ketterle W, Davis K B, Joffe M A, Martin A and Pritchard D 1993 High densities of cold atoms in a dark
spontaneous-force optical trap Phys. Rev. Lett. 70 2253-6

[26] Gould P L, Lett P D, Julienne P S, Phillips W D, Thorsheim H R and Weiner J 1988 Observation of associative
ionization of ultracold laser-trapped sodium atoms Phys. Rev. Lett. 60 788-91

[27] Miller J D, Cline R A and Heinzen D J 1993 Far-off-resonance optical trapping of atoms Phys. Rev. A 47 R4567-70 

[28] Miller J D, Cline R A and Heinzen D J 1993 Photoassociation spectrum of ultracold Rb atoms Phys.Rev.Lett. 71 
2204-7 

[29] Lovelace RVE, Mehanian C, Tommila T J and Lee D M 1985 Magnetic confinement of a neutral gas Nature 318 
30-6 

[30] Migdall A L, Prodan J V, Phillips W D, Bergman T H and Metcalf H J 1985 First observation of magnetically 
trapped neutral atoms Phys.Rev.Lett. 54 2596-9 

[31] Gott Y V, loffe M S and Telkovsky V G 1962 Nuclear Fusion Suppl. part 3 (Vienna: International Atomic Energy 
Agency) p 1045 

[32] Pritchard D E 1983 Cooling neutral atoms in a magnetic trap for precision spectroscopy Phys.Rev.Lett. 51 1336-9 

[33] Bagnato V S, Lafyatis G P, Martin A C, Raab E L, Ahmad-Bitar R and Pritchard D E 1987 Continuous stopping 
and trapping of neutral atoms Phys.Rev.Lett. 58 2194-7 


[34] Hess H F, Kochanski G P, Doyle J M, Greytak T J and Kleppner D 1986 Spin-polarized hydrogen maser 
Phys. Rev. A 34 1602 

[35] Mewes M-O, Andrews M R, van Druten N J, Kurn D M, Durfee D S and Ketterle W 1996 Bose-Einstein 
condensation in a tightly confining DC magnetic trap Phys. Rev. Lett. 77 416-19 

[36] Anderson M H, Ensher J R, Matthews M R, Wieman C E and Cornell E A 1995 Observation of Bose-Einstein 
condensation in a dilute atomic vapor Science 269 198-201 

[37] Davis K B, Mewes M-O, Andrews M R, van Druten N J, Durfee D S, Kurn D M and Ketterle W 1995 Bose-Einstein 
condensation in a gas of sodium atoms Phys. Rev. Lett. 75 3969-73 

[38] Vigue J 1986 Possibility of applying laser-cooling techniques to the observation of collective quantum effects 
Phys. Rev. A 34 4476-9 

[39] Scheingraber H and Vidal C R 1977 Discrete and continuous Franck-Condon factors of the Mg$_2$ A $^1\Sigma_u^+$-X $^1\Sigma_g^+$ 
system and their J dependence J. Chem. Phys. 66 3694-704 

[40] Inoue G, Ku J K and Setser D W 1982 Photoassociative laser induced fluorescence of XeCl J. Chem. Phys. 76 
733-4 

[41] Jones R B, Schloss J H and Eden J G 1993 Excitation spectra for the photoassociation of Kr-F and Xe-I collision 
pairs in the ultraviolet (209-258 nm) J. Chem. Phys. 98 4317-34 




[42] Tellinghuisen J 1985 Photodissociation and Photoionization (Advances in Chemical Physics LX) ed K P Lawley 
(New York: Wiley) pp 299-369 

[43] Thorsheim H R, Weiner J and Julienne P S 1987 Laser-induced photoassociation of ultracold sodium atoms 
Phys. Rev. Lett. 58 2420-3 

[44] Weiner J, Masnou-Seeuws F and Giusti-Suzor A, Associative ionization: experiments, potentials, and dynamics 
Advances in Atomic, Molecular and Optical Physics vol 26, ed D Bates and B Bederson (Boston: Academic) pp 
209-96 

[45] Julienne P S and Heather R 1991 Laser modification of ultracold atomic collisions: theory Phys. Rev. Lett. 67 
2135-8 

[46] Leonhardt D and Weiner J 1995 Direct two-color photoassociative ionization in a rubidium magneto-optic trap 
Phys. Rev. A 52 R4332-R4335 

[47] Tsao C-C, Napolitano R, Wang Y and Weiner J 1995 Ultracold photoassociative ionization collisions in an atomic 
beam: optical field intensity and polarization dependence of the rate constant Phys. Rev. A 51 R18-21 

[48] Thorsheim H R, Wang Y and Weiner J 1990 Cold collisions in an atomic beam Phys. Rev. A 41 2873-6 

[49] Bonanno R, Boulmer J and Weiner J 1983 Determination of the absolute rate constant for associative ionization in 
crossed-beam collision between Na 3 $^2$P$_{3/2}$ atoms Phys. Rev. A 28 604-8 

[50] Wang M-X, Keller J, Boulmer J and Weiner J 1986 Strong velocity dependence of the atomic alignment effect in 
Na(3p) + Na(3p) associative ionization Phys. Rev. A 34 4497-500 

[51] Wang M-X, Keller J, Boulmer J and Weiner J 1987 Spin-selected velocity dependence of the associative ionization 
cross section in Na(3p) + Na(3p) collisions over the collision energy range from 2.4 to 290 meV Phys. Rev. A 35 
934-7 

[52] Julienne P S 1988 Laser modification of ultracold atomic collision in optical traps Phys. Rev. Lett. 61 698-701 

[53] Lett P D, Jessen P S, Phillips W D, Rolston S L, Westbrook C I and Gould P L 1991 Laser modification of ultracold 
collisions: experiment Phys. Rev. Lett. 67 2139-42 

[54] Dulieu O, Giusti-Suzor A and Masnou-Seeuws F 1991 Theoretical treatment of the associative ionization reaction 
between two laser-excited sodium atoms. Direct and indirect processes J. Phys. B: At. Mol. Opt. Phys. 24 4391-408 

[55] Henriet A, Masnou-Seeuws F and Dulieu O 1991 Diabatic representation for the excited states of the Na$_2$ 
molecule: application to the associative ionization reaction between two excited sodium atoms Z. Phys. D 18 
287-98 

[56] Dulieu O, Magnier S and Masnou-Seeuws F 1994 Doubly-excited states for the Na$_2$ molecule: application to the 
dynamics of the associative ionization reaction Z. Phys. D 32 229-40 

[57] Stwalley W C, Uang Y-H and Pichler G 1978 Pure long-range molecules Phys. Rev. Lett. 41 1164-6 

[58] Heather R W and Julienne P S 1993 Theory of laser-induced associative ionization of ultracold Na Phys. Rev. A 47 
1887 

[59] Pillet P, Crubellier A, Bleton A, Dulieu O, Nosbaum P, Mourachko I and Masnou-Seeuws F 1997 Photoassociation 
in a gas of cold alkali atoms: I. Perturbative quantum approach J. Phys. B: At. Mol. Opt. Phys. 30 2801-20 




FURTHER READING 

A good introduction to the physics of laser cooling and trapping can be found in two special issues of the Journal 
of the Optical Society of America B. These are: 

1985 The mechanical effects of light J. Opt. Soc. Am. B 2 11 

1989 Laser cooling and trapping of atoms J. Opt. Soc. Am. B 6 11 

Two recent reviews recount subsequent research in the physics of neutral-atom cooling and trapping [3, 4]. 

For an introduction to current research in alkali-atom BEC see the special issue on BEC in the Journal of Research 
of the National Institute of Standards and Technology: 

1996 Journal of Research of the National Institute of Standards and Technology 101 4 

Another useful review is: 

Meschede D et al 1998 Naturwissenschaften 85 203-18 

C1.5 Single molecule spectroscopy 

Anne Myers Kelley 


C1.5.1 INTRODUCTION 

Until the late 1980s, virtually all molecular spectroscopic measurements involved observing a signal having 
contributions from a large number of different molecules, 'large' meaning much greater than one. While 
spectroscopists have long had the ability to detect and count individual photons or ions each originating from a 
single molecule, the statistical averaging required to make a meaningful measurement generally required observing 
many such events from many different molecules. Only recently have experimental techniques been developed that 
allow interrogation of fundamentally quantum mechanical entities on a one-by-one basis. These developments are 
driving revolutionary changes in the way molecular scientists make and interpret physical measurements. For 
example, simple organic molecules that are chemically identical, distinguished only by slightly different local 
environments within a solid, have been shown to have distinctly different electronic and vibrational spectra, 
linewidths, electric field and pressure-induced spectral shifts, and fluorescence lifetimes and quantum yields. The 
ability to correlate these various spectroscopic properties on a molecule-by-molecule basis is providing powerful 
insight into the details of intermolecular interactions. Single-molecule techniques have also shown that apparently 
pure, homogeneous enzyme preparations contain molecules having a wide range of catalytic activities and that an 
individual enzyme's catalytic activity retains a 'memory' of its past history. This new information is stimulating a 
re-evaluation of established models for the chemical kinetics of biological systems. Single-molecule experiments 
involve sequential measurements of a given observable on the same molecule at different times and, if a time 
average is equivalent to an ensemble average (the ergodic hypothesis), no additional information is gained by 
probing individual members. The value of single-molecule measurements lies precisely in the fact that in many 
systems of interest, different members of the ensemble remain distinct on time scales much longer than that 
required to perform an experiment. 

A wide variety of measurements can now be made on single molecules, including electrical (e.g. scanning 
tunnelling microscopy), magnetic (e.g. spin resonance), force (e.g. atomic force microscopy), optical (e.g. 
near-field and far-field fluorescence microscopies) and hybrid techniques. This contribution addresses only those 
techniques that are at least partially optical. Single-particle electrical and force measurements are discussed in the 
sections on scanning probe microscopies (B1.19) and surface forces apparatus (B1.20). 

C1.5.2 HISTORY 

The approach to, and finally the achievement of, detection and spectroscopy of single molecules proceeded almost 
independently from three separate directions. 

C1.5.2.1 SPECTRAL SELECTION IN CRYOGENIC SOLIDS 

The development of tunable, narrow-bandwidth dye laser sources in the early 1970s gave spectroscopists a new 
tool for selectively exciting small subsets of molecules within inhomogeneously broadened ensembles in the solid 
state. The technique of fluorescence line-narrowing [1, 2 and 3] takes advantage of the fact that relatively rigid 
chromophoric molecules in dilute mixed solids at low temperatures often have very narrow electronic origin transitions even 
when the bulk absorption spectrum is severely broadened by an inhomogeneous distribution of spectrally distinct 
environments. By tuning a narrow-band excitation source into resonance with the low-energy side of the absorption 
band, only those molecules that absorb at that precise frequency can be excited. This subensemble, having a well 
defined electronic origin frequency, will then produce spectrally sharp emissions. As the sample is made 
increasingly dilute, and/or the excitation is tuned progressively farther toward the red edge of the absorption 
spectrum, the number of molecules on resonance with the laser decreases, eventually becoming either zero or one. 
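
The thinning-out of resonant molecules towards the red edge can be illustrated with a short numerical sketch. It assumes, purely for illustration, a Gaussian inhomogeneous distribution of centre frequencies and estimates how many molecules are expected to lie within one homogeneous linewidth of the laser. The function name and all parameter values (10^6 molecules in the probed volume, a 20 GHz standard deviation, a 10 MHz linewidth) are hypothetical and are not taken from the experiments cited here.

```python
import math


def expected_on_resonance(n_molecules, sigma_ghz, linewidth_ghz, detuning_ghz):
    """Expected number of molecules within one homogeneous linewidth of the laser.

    Assumes a Gaussian distribution of centre frequencies (standard deviation
    sigma_ghz) and a linewidth much narrower than that distribution, so the
    count is simply N * linewidth * probability density at the laser detuning.
    """
    density = math.exp(-0.5 * (detuning_ghz / sigma_ghz) ** 2) / (
        sigma_ghz * math.sqrt(2.0 * math.pi)
    )
    return n_molecules * linewidth_ghz * density


# Illustrative (assumed) values: 1e6 dopant molecules, 20 GHz spread, 10 MHz lines.
for detuning in (0.0, -40.0, -80.0, -100.0):
    count = expected_on_resonance(1e6, 20.0, 0.01, detuning)
    print(f"detuning {detuning:7.1f} GHz: about {count:.3g} molecules on resonance")
```

With these assumed values the expected count falls from roughly two hundred at band centre to well below one a few standard deviations into the wing, which is the regime in which isolated single-molecule lines can appear.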

The first clearly demonstrated optical detection of single chromophores was published by Moerner's group at IBM 
Almaden in 1989 on the mixed crystal system pentacene in p-terphenyl at 1.6 K [4, 5]. They utilized a 
double-modulation direct absorption technique employing frequency modulation of the laser source coupled with electric 
field or ultrasonic strain modulation of the absorption line. Direct absorption is not known for its high sensitivity, 
and these preliminary experiments achieved only modest signal-to-noise ratios. Shortly thereafter, Orrit's group at 
Bordeaux demonstrated single-molecule detection in the same system with the fundamentally much more sensitive 
technique of fluorescence excitation — fluorescence line-narrowing carried to the extreme of a single resonant 
chromophore at a time (figure C1.5.1) [6]. Fluorescence excitation has since been the technique of choice for 
nearly all single-molecule optical experiments, although some refinements in direct absorption detection have 
recently been demonstrated [7]. 

Figure C1.5.1. (A) Schematic diagram showing the principle of single-molecule spectral selection in solids at low 
temperatures. The inhomogeneously broadened electronic origin is composed of a superposition of the Lorentzian 
profiles of individual molecules, with a Gaussian distribution of centre frequencies caused by random strains and 
defects in the surrounding environment. (B) The number of dopant molecules in the probed volume on resonance 
with a narrow-band laser can be controlled by tuning the laser wavelength to the red side of the inhomogeneous 
band. Reprinted with permission from Moerner [178]. Copyright 1994 American Association for the Advancement 
of Science. 

C1.5.2.2 FLUORESCENCE STATISTICS IN LIQUID SOLUTIONS 


Fluorescence, because of its essentially background-free nature, has long been appreciated by both analytical and 
physical chemists for its high sensitivity. Photon counting techniques allow detection of single photons each 
emitted by a single molecule, although under normal conditions the total signal is composed of photons arising 
from many different molecules. Hirschfeld was apparently the first to demonstrate that individual molecules (in his 
case, large antibodies each tagged with 80-100 fluorophores) in a highly dilute solution could be detected by the 
burst of fluorescence emitted by each molecule as it diffused through the observation volume of an optical 
microscope [8]. The Keller [9] and Mathies [10] groups subsequently combined this idea with a more detailed 
analysis of the photon burst statistics to detect single molecules of the highly fluorescent multichromophoric 
protein phycoerythrin. The first demonstration of genuine single-chromophore detection in liquid solution was 
published by Shera and co-workers in 1990 on the laser dye rhodamine 6G in water [11], using pulsed excitation 
and time-gated detection to reduce background counts. Somewhat later, Nie et al [12] demonstrated that with some 
modifications to the detection system, a commercial laser confocal microscope could also be made sensitive 
enough to detect single chromophores diffusing in and out of the detection volume. 

C1.5.2.3 SPATIAL SELECTION IN SOLIDS AND ON SURFACES 


A third approach to single-molecule optical detection began with the development of near-field scanning optical 
microscopy (NSOM) in the 1980s (see section B1.19). NSOM allows optical measurements on surfaces to be made 
with a resolution approaching λ/40 in the best cases [13], which for visible light is comparable to the size of single 
large molecules such as proteins and polymers. The first observation of single molecules by NSOM was reported in 
1993 by Betzig and Chichester [14]. Shortly thereafter, several groups demonstrated that the techniques of ordinary 
far-field fluorescence microscopy, if coupled with highly sensitive, low-noise detectors, can also detect single 
molecules as long as they are spaced far enough apart that the limiting resolution of about λ/2 is adequate. Far-field 
fluorescence microscopy is technically simpler than NSOM and is not restricted to surfaces. It has become the 
technique of choice for spatial selection of single molecules as long as the sample can 
be made sufficiently dilute that the additional resolution of the near-field technique is not needed. 


C1.5.3 PRINCIPLES AND TECHNIQUES OF SINGLE-MOLECULE OPTICAL SPECTROSCOPY 

C1.5.3.1 FLUORESCENCE 

The vast majority of single-molecule optical experiments employ one-photon excited spontaneous fluorescence as 
the spectroscopic observable because of its relative simplicity and inherently high sensitivity. Many molecules 
fluoresce with quantum yields near unity, and spontaneous fluorescence lifetimes for chromophores with large 
oscillator strengths are a few nanoseconds, implying that with a sufficiently intense excitation source a single 
molecule should be able to absorb and emit of the order of $10^8$ photons per second. Additionally, in most molecules 
much of the emitted light is sufficiently red-shifted from the excitation frequency to allow detection against a 
nominally near-zero background. 
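
A rough upper bound on the emission rate can be written down under one simplifying assumption that is ours rather than the text's: the chromophore behaves as a saturated two-level system, so that at most half of the time-averaged population is in the excited state. With fluorescence quantum yield $\phi_f$ and fluorescence lifetime $\tau_f$ (the value of 5 ns below is an illustrative choice within "a few nanoseconds"), the bound is:

```latex
\[
R_{\max} \approx \frac{\phi_f}{2\tau_f},
\qquad
\phi_f \approx 1,\quad \tau_f = 5\ \text{ns}
\;\Longrightarrow\;
R_{\max} \approx \frac{1}{2 \times 5 \times 10^{-9}\ \text{s}} = 10^{8}\ \text{s}^{-1}.
\]
```

Shorter lifetimes raise this bound and quantum yields below unity lower it, so the figure should be read only as an order of magnitude.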

A number of experimental and physical realities cloud this rosy picture. Inevitably many emitted photons are lost 
due to the finite solid angle over which the fluorescence is collected, losses at the various filters, lenses, windows, 
and other optical elements between the sample and the detector, and the quantum efficiency of the photodetector. Most 
experimental configurations actually detect emitted photons with an overall efficiency of only a few per cent. All 
molecules have nonzero, if small, quantum yields for forming long-lived metastable states (often triplet states), 
reducing the number of absorption-emission cycles that can be accomplished per second. The intensity of the 
excitation source has to be kept low enough that it does not induce undesired nonlinear optical effects (section B1.1 
and section B1.5). Finally, all known molecules undergo photochemical degradation with some nonzero yield, and 
this photobleaching is generally the limiting factor in determining the total number of photons that can be collected 
from a single molecule. Nevertheless, it is routinely possible with strongly fluorescent chromophores to detect 10 - 
10 photons per second. The challenging part of performing single-molecule fluorescence detection is not the 
absolute size of the signal, which is huge compared with that typical of many conventional ensemble-averaged 
spectroscopies such as Raman, but rather ascertaining that the signal arises from only one molecule, reducing the 
background count level, and obtaining a statistically significant amount of data from the molecule under 
observation before it photobleaches. 
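
The interplay of emission rate, detection efficiency and photobleaching described above can be put into a simple photon budget. The sketch below assumes a constant emission rate and a fixed probability of bleaching per excitation cycle. The function name and the numerical inputs (an emission rate of 10^6 per second, 3% overall detection efficiency, a bleaching yield of 10^-6) are hypothetical values chosen only to illustrate the arithmetic.

```python
def photon_budget(emission_rate_hz, detection_efficiency, bleach_yield,
                  fluorescence_yield=1.0):
    """Return (detected_rate_hz, total_emitted, total_detected, survival_time_s).

    Assumes a constant emission rate and a fixed bleaching probability per
    excitation cycle, so the mean number of cycles before bleaching is
    1 / bleach_yield.
    """
    detected_rate = emission_rate_hz * detection_efficiency
    # Each excitation cycle yields a photon with probability fluorescence_yield.
    total_emitted = fluorescence_yield / bleach_yield
    total_detected = total_emitted * detection_efficiency
    # Mean time to emit that many photons at the stated emission rate.
    survival_time = total_emitted / emission_rate_hz
    return detected_rate, total_emitted, total_detected, survival_time


# Illustrative (assumed) inputs, not measured values.
rate, emitted, detected, lifetime = photon_budget(1e6, 0.03, 1e-6)
print(f"detected rate  : {rate:.3g} counts per second")
print(f"photons emitted: {emitted:.3g} before bleaching")
print(f"photons counted: {detected:.3g} in total")
print(f"mean survival  : {lifetime:.3g} s")
```

In this model the total number of detected photons depends only on the ratio of detection efficiency to bleaching yield, not on the excitation intensity, which is why photobleaching rather than signal level sets the practical limit.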

(A) SPECTRAL SELECTION 

The spectral selection approach [15, 16 and 17] is based on the fact that purely lifetime-limited line widths for 
electronic transitions of molecules rarely exceed 10-20 MHz, whereas the apparent width of the electronic origin in 
condensed-phase molecular spectra is typically orders of magnitude greater even in single crystals and certainly in 
polymers and glasses. At temperatures below 4 K, where little thermal population of phonons is possible, this 
additional width is ascribed to slightly different local environments for different chromophores, each giving rise to 
a slightly different spectral shift. When a spectrally narrow laser (bandwidth on the order of the intrinsic lifetime- 
limited width) is tuned through the ensemble-broadened electronic origin, only those chromophores on resonance 
with that particular laser frequency can absorb, and as the laser is tuned into the wings of the spectral line, the 
number of molecules on resonance approaches either zero or one (figure C1.5.1 and figure C1.5.2). In practice this 
works best when tuning to the red side of the electronic origin; on the blue side, the zero-phonon lines (pure 
electronic origin transitions) of blue-shifted molecules are degenerate with the much broader phonon sidebands 
(electronic excitation of the chromophore coupled with excitation of low-frequency matrix or intermolecular 
vibrations) of redder-shifted molecules. Typically the emission is collected with mirror or lens systems placed 
inside the cryostat to collect light over the largest possible solid angle, and observed with a high quantum 
efficiency detector, either a photomultiplier tube or an avalanche photodiode, through one or more long-pass or 
bandpass filters to block scattered or transmitted laser light. Alternatively, the emitted light may be dispersed with a 
spectrograph and detected with a high-efficiency array photodetector, generally a CCD. 

Figure C1.5.2. Fluorescence excitation spectra (cps = counts per second) of pentacene in p-terphenyl at 1.5 K. (A) 
Broad scan of the inhomogeneously broadened electronic origin. The spikes are repeatable features each due to a 
different single molecule. The laser detuning is relative to the line centre at 592.321 nm. (B) Expansion of a 2 GHz 
region of this scan showing several single molecules. (C) Low-power scan of a single molecule at 592.407 nm 
showing the lifetime-limited width of 7.8 MHz and a Lorentzian fit. Reprinted with permission from Moerner 
[198]. Copyright 1994 American Association for the Advancement of Science. 

The criterion that single molecules are being observed is often taken to be the appearance in a fluorescence 
excitation frequency scan of well separated peaks that have about the same intensity and width, separated by 
regions of flat background. However, one cannot be certain that a given peak does not arise from two molecules 
with accidentally degenerate resonant frequencies. Stronger evidence is provided by observing that when 
spontaneous or photoinduced spectral jumps or photobleaching occur, the fluorescence excitation feature jumps to 
a new frequency or disappears entirely from the scanned region in a quantized, all-or-nothing manner. Probably the 
best evidence for single-molecule observation comes from the statistics of photon emission on short time scales. 
Since a single molecule must experience a nonzero time interval between successive photon emissions (after 
emitting a photon, it must absorb one prior to its next emission), the probability of emitting two photons with zero 
time delay goes to zero for a single molecule, the phenomenon known as photon antibunching. 
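
The antibunching criterion can be made quantitative with the normalized second-order correlation function. The sketch below uses two standard textbook results that are assumptions here rather than statements from this article: for N identical, independent single-photon emitters the zero-delay value is 1 - 1/N, and in a simple incoherent (rate-equation) two-level model the dip recovers exponentially at the sum of the pump and decay rates. The function name and the default rates are hypothetical.

```python
import math


def g2(delay_ns, n_emitters=1, pump_rate_per_ns=0.01, decay_rate_per_ns=0.2):
    """Normalized photon-pair correlation for N independent two-level emitters.

    Rate-equation model (no coherent Rabi oscillations): the antibunching dip
    has depth 1/N and recovers at the rate pump + decay.
    """
    recovery_rate = pump_rate_per_ns + decay_rate_per_ns
    return 1.0 - math.exp(-recovery_rate * abs(delay_ns)) / n_emitters


for n in (1, 2, 3):
    print(f"N = {n}: g2(0) = {g2(0.0, n_emitters=n):.3f}")
```

A measured zero-delay value below 0.5 is therefore commonly taken as evidence that only one emitter contributes, since two equal emitters already give 0.5 in this model.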


The requirement of a very sharp and strong electronic origin absorption line limits the technique to strongly 
absorbing and fluorescing, relatively rigid chromophores and matrices having little Franck-Condon activity in 
low-frequency vibrations. The requirement that the electronic origin be spectrally quite stable limits the technique to very low 
temperatures and to chromophore-matrix combinations in which spectral diffusion and photophysical hole-burning 
processes are slow. Thus, while the intrinsically high spectral resolution of this technique allows detailed 
spectroscopic and dynamical studies on individual molecules, the number of material systems to which it can be 
applied seems to be quite limited. 

(B) SPATIAL SELECTION WITH NEAR-FIELD OPTICS 

The development of near-field scanning optical microscopy (NSOM) as a viable experimental technique opened up 
the possibility of performing optical measurements with spatial resolution on molecular scales, just as scanning 
tunnelling microscopy (STM) allows imaging through electrical measurements down to the atomic level. By 
forcing the excitation light to pass through a metallized tip with an aperture much smaller than the wavelength of 
the light, and placing the sample in the near field of the tip (much less than a wavelength away), a variety of optical 
microscopies can be performed at a resolution much better than the classical far-field limit of ≈λ/2 [13, 14, 15, 16, 
17 and 18]. Apertureless variants of NSOM have also been described [19 and 20]. The NSOM technique in general 
is described in more detail in section B1.20. 

NSOM in fluorescence mode can easily detect single molecules that are spatially separated by tens of nanometres 
or more, as long as they are sufficiently photostable to withstand the necessary number of excitation-emission 
cycles [14, 15, 16, 17, 18, 19, 20 and 21]. The usual criterion for single-molecule observation is the appearance of 
single isolated spots in the NSOM fluorescence image (figure C1.5.3), although two or more molecules that are 
accidentally in very close proximity are not always resolvable. The quantized nature of photobleaching events is 
another good criterion for single-molecule observation in spatially as well as spectrally selected techniques. NSOM 
is difficult to perform at low temperatures, but cryogenic near-field microscopes have been described [22] and 
demonstrated at the single-molecule level [23 and 24]. A significant limitation is that the chromophore of interest 
must be at or very near a surface to allow the tip to be brought into close proximity. The tips are notoriously 
difficult to fabricate in a reproducible manner, particularly when a very small aperture is desired, and can become 
quite hot during operation due to the laser power dissipated in the metal coating. The proximity of the metal tip to 
the chromophore can induce artifacts that have been discussed in some detail [25, 26 and 27]. In applications where 
sub-diffraction-limited optical resolution is needed, for example in studying biological or engineered 
supermolecular structures, NSOM is the only viable technique. If, however, the goal is merely to study single 
molecules and the samples can be prepared such that the molecules are spaced arbitrarily far apart, conventional 
far-field fluorescence microscopies are technically more straightforward and less subject to artifacts. 

Figure C1.5.3. Near-field fluorescence image (4.5 μm square) of single oxazine 720 molecules dispersed on the 
surface of a PMMA film. Each peak (fwhm 100 nm) is due to a single molecule. The different intensities are due to 
different molecular orientations and spectra. Reprinted with permission from Xie [122]. Copyright 1996 American 
Chemical Society. 

An intriguing alternative to NSOM is to engineer a tip bearing a single fluorescent molecule that can be excited at 
one wavelength, emitting light at a Stokes-shifted wavelength to be used as a highly spatially localized probe light 
source [28 and 29]. 

(C) SPATIAL SELECTION WITH FAR-FIELD OPTICS 

Confocal scanning laser fluorescence microscopy is a well established optical technique (see section B1.19) that 
combines the transverse resolution common to any optical microscopy with a high degree of depth resolution that 
comes from requiring the fluorescence emission to follow the same optical path as the exciting laser light and pass 
through a pinhole conjugate to a pinhole through which the exciting laser was focused. A three-dimensional 
fluorescence image is mapped out by raster scanning either the exciting laser beam or the sample in the transverse 
plane and stepping the lens-to-sample distance along the depth axis. If the fluorescence collection and 
detection systems are efficient enough, this technique can be sufficiently sensitive to detect emission from single 
molecules [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 and 30]. The limiting transverse 
resolution is determined by the numerical aperture of the laser focusing lens but cannot be better than about λ/2, while the 
resolution along the depth axis is typically of the same order. Thus, for single molecules to be resolved directly 
they must be separated on average by roughly 0.2-0.5 μm or more, although sub-diffraction resolution may be 
achieved through subsequent numerical processing of the images [31] or by making clever use of optical 
interference effects [32, 33]. Figure C1.5.4 compares near-field and far-field images of the same set of single 
molecules at a surface. Once a strong feature due to a single molecule has been identified, the centre of the focused 
spot can be moved to that molecule and more detailed spectroscopic or kinetic measurements made. The low 
resolution makes it difficult to be certain that a fluorescent spot truly arises from a single molecule, and it is useful 
to have other evidence such as photon emission statistics and/or quantized jumping or photobleaching to verify 
single-molecule observation. 
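
The separation requirement quoted above follows from the diffraction limit. The sketch below assumes the Abbe form of that limit, λ/(2 NA), and converts a minimum separation into a crude upper bound on the surface density at which molecules could still be resolved individually. The function names are hypothetical, and the density estimate (one molecule per square of side equal to the minimum separation) is a deliberately simple assumption; practical samples are made far more dilute than this bound.

```python
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Far-field transverse resolution in nm from the Abbe criterion."""
    return wavelength_nm / (2.0 * numerical_aperture)


def max_areal_density_per_um2(min_separation_nm):
    """Crude upper bound: one molecule per square of side min_separation."""
    side_um = min_separation_nm / 1000.0
    return 1.0 / side_um ** 2


# Illustrative (assumed) case: 514.5 nm excitation and a numerical aperture of 1.0.
resolution = abbe_limit_nm(514.5, 1.0)
print(f"far-field limit : {resolution:.1f} nm")
print(f"density ceiling : {max_areal_density_per_um2(resolution):.1f} molecules per square micrometre")
print(f"near-field λ/40 : {514.5 / 40.0:.1f} nm")
```

For visible light and numerical apertures near unity the far-field limit falls within the 0.2-0.5 μm range given in the text, whereas a near-field resolution of λ/40 corresponds to little more than ten nanometres.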
Figure C1.5.4. Comparison of near-field and far-field fluorescence images, spectra and lifetimes for the same set 
of isolated single molecules of a carbocyanine dye at a PMMA-air interface. Note the much higher resolution of 
the near-field image. The spectrum and lifetime of the molecule indicated with the arrow were recorded with 
near-field excitation and with far-field excitation at two different excitation powers. Reproduced with permission from 
Trautman and Macklin [125]. 

Forming an image by scanning the laser spot across the sample, or vice versa, minimizes the light dose received by 
each molecule and reduces photobleaching. The tradeoff is that it requires some time to gather an image. A 
fluorescent image can be obtained much more rapidly by irradiating a larger area in the transverse plane and 
imaging the emission from the entire area at once onto a two-dimensional photodetector. This approach is most 
useful for highly photostable molecules at low temperatures [34, 35 and 36]. Photobleaching can be further reduced 
by employing an automatic positioning system with feedback to locate and centre the excitation on a single 
molecule as rapidly as possible [37] and also by excluding oxygen [38] and/or working at very low temperatures 
where most chromophores are more stable, although the latter adds considerable complexity to the experimental 
configuration [39, 40 and 41]. 

(D) STATISTICAL DETECTION METHODS 

The statistics of the detected photon bursts from a dilute sample of chromophores can be used to count, and to some 
degree characterize, individual molecules passing through the illumination and detection volume. This can be 
achieved either by flowing the sample rapidly through a narrow fluid stream that intersects the focused excitation 
beam or by allowing individual chromophores to diffuse into and out of the beam. If the sample is sufficiently 
dilute that chromophores pass through the beam effectively one at a time, repetitive excitation and emission of each molecule 
during its passage time generates a burst of emitted photons, superimposed on a random background count level 
due to stray room and laser light, Raman scattering from the solvent, etc [9, 10, 11, 42, 43, 44, 45 and 46]. 
Figure C1.5.5 shows representative data from a configuration allowing free diffusion of chromophores into and out 
of the beam. A variety of statistical analyses of the photon bursts can be performed to improve the fidelity of 
detection and/or to discriminate between chromophores of different chemical species based on spectral and/or 
temporal features of the emission [44, 45, 47, 48, 49, 50, 51 and 52]. These statistical methods are employed 
mainly for counting molecules in analytical applications. The comparatively short observation time for each 
molecule limits the extent to which photophysical, spectroscopic or dynamic properties can be examined at the 
single-molecule level. 
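
The condition that chromophores pass through the beam "effectively one at a time" can be checked with Poisson statistics. The sketch below assumes that the number of molecules in the probe volume is Poisson distributed with a mean set by the concentration. The function name, the 1 fL probe volume and the 10^-10 M concentration are hypothetical illustration values, not parameters of the experiments cited above.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def occupancy(concentration_molar, volume_litres):
    """Return (mean, P(0), P(1), P(2 or more)) for a Poisson-occupied volume."""
    mean = concentration_molar * AVOGADRO * volume_litres
    p_zero = math.exp(-mean)
    p_one = mean * p_zero
    p_multiple = 1.0 - p_zero - p_one
    return mean, p_zero, p_one, p_multiple


# Illustrative (assumed) case: 1e-10 M dye in a 1 fL (1e-15 L) probe volume.
mean, p0, p1, p_multi = occupancy(1e-10, 1e-15)
print(f"mean occupancy : {mean:.4f}")
print(f"P(empty)       : {p0:.4f}")
print(f"P(one molecule): {p1:.4f}")
print(f"P(two or more) : {p_multi:.5f}")
```

Under these assumptions the volume is empty most of the time and double occupancy is much rarer than single occupancy, so nearly every burst above background can be attributed to a single molecule.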




Figure C1.5.5. Time-dependent fluorescence signals observed from liquid solutions of rhodamine 6G by confocal 
fluorescence microscopy. Data were obtained with 514.5 nm excitation and detected through a 540-580 nm 

bandpass filter. The integration time is 1 ms per point. (A) Blank. (B) 2 x 10 M rhodamine 6G in water. Each 
peak represents detection of a single molecule diffusing into and then out of the detection volume. The inserts are 

close up views of the peaks indicated, showing the multiple detection of a single molecule. (C) 5 x 10 M 
rhodamine 6G in ethanol. Reprinted with permission from Nie et al [12]. Copyright 1994 American Association for 
the Advancement of Science. 

C1.5.3.2 RAMAN SCATTERING 


Raman scattering (section B1.3) is a notably weak process. In most experimental configurations no more than one 
in 10 laser photons is scattered into a given Raman line. While resonance enhancement may improve this to one 
in 10 , Raman would still appear to be a highly unpromising technique for single-molecule detection. Nevertheless, 
at least four different groups have recently claimed single-molecule sensitivity using surface-enhanced Raman 
scattering (SERS), an enhancement mechanism whose physical basis is still subject to some controversy. Nie and 
Emory [53] examined rhodamine 6G dye molecules bound to particles of colloidal silver that were immobilized on
a substrate such that they could be imaged via confocal microscopy. Using very low dye concentrations and
exciting with 514.5 nm light, they observed Raman scattering from only a very few particles, which they attributed to
particularly stable and highly surface-enhanced 'hot' binding sites (figure C1.5.6 and figure C1.5.7). The apparent
enhancement is of the order of 10 over ordinary unenhanced Raman, whereas generally accepted values for 
SERS enhancements from ordinary bulk experiments are around 10 . More recent work by Brus et al essentially 
confirms and extends these observations [54]. Kneipp et al examined crystal violet and a cyanine dye on colloidal
silver with excitation in the near-infrared, away from both molecular electronic resonances and the principal
particle plasmon resonance. They probed the particles in free solution and based their claim of single-chromophore
detection on the excitation volume, dye-to-silver concentration ratio, and detection statistics [55, 56]. Finally, Käll
et al examined the SERS spectra of haemoglobin attached to silver nanoparticles and concluded that single-molecule
SERS is possible only for protein molecules situated between and bound to more than one silver particle [57].





Figure C1.5.6. Single Ag nanoparticles imaged with evanescent-wave excitation. (A) Unfiltered photograph
showing scattered laser light (514.5 nm) from Ag particles immobilized on a polylysine-coated surface. (B)
Bandpass filtered (540-580 nm) photograph taken from a blank Ag colloid sample incubated with 1 mM NaCl and
no dye. (C) and (D) Filtered photographs taken from an Ag colloid sample incubated with 2 x 10 M rhodamine
6G. Each image shows at least one Raman scattering particle. (E) and (F) Filtered photographs of Ag colloid
incubated with higher concentrations of rhodamine 6G (2 x 10 M and 2 x 10 M, respectively). Each image
shows several Raman scattering particles. Reprinted with permission from Nie and Emory [53]. Copyright 1997
American Association for the Advancement of Science.

Figure C1.5.7. Surface-enhanced Raman spectra of a single rhodamine 6G particle on silver recorded at 1 s
intervals. Over 300 spectra were recorded from this particle before the signals disappeared. The nine spectra
displayed here were chosen to highlight several as yet unexplained sudden changes in both frequency and intensity.
Reprinted with permission from Nie and Emory [53]. Copyright 1997 American Association for the Advancement
of Science.

It now seems clear that, under certain conditions, massive enhancements of what is normally a very weak process 
can be achieved. The ability to obtain vibrational spectra would be a great advance in the characterization of single 
molecules if methods could be found to reproducibly observe all molecules in a sample, not only those that happen 
to bind to special sites on the colloid. 

C1.5.3.3 MULTIPHOTON EXCITATION

Electronic excitation to a fluorescent state can be accomplished not only by direct one-photon absorption to that
state, but also by the 'simultaneous' absorption of two or more photons whose energy sums to that of the excited 
state. Multiphoton absorption cross sections are sufficiently small that these processes are negligibly weak with cw 
light sources, but two- and three-photon absorption become viable processes for single-molecule detection at the 
high peak intensities provided by focusing femtosecond pulses to diffraction-limited spot sizes. In particular, the 
development of Ti:sapphire lasers reliably delivering femtosecond pulses in the far-red and near-infrared regions of 
the spectrum has enabled two- and three-photon excited fluorescence microscopy of blue and ultraviolet 
chromophores [58, 59]. Multiphoton absorption is sometimes referred to as 'intrinsically confocal' since the 
probability of absorbing two or three photons depends on the second or third power of the laser intensity, 
respectively, greatly enhancing the contribution to the signal from molecules at the beam focus over those outside
the focus, even without any
additional aperturing. The transverse resolution can be somewhat enhanced for the same reason, although this 
advantage is partially offset by the larger focused spot size imposed by the longer wavelength of the laser. In 
addition, the negligible absorption of the unfocused red or infrared light by many samples, particularly biological 
samples, allows probing at comparatively great depths within samples that often absorb strongly in the UV. 
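
The 'intrinsically confocal' property can be made concrete with a short numerical sketch. It assumes an ideal Gaussian beam of fixed power, a uniform chromophore concentration and no saturation, none of which is specified in the text; the function names and the beam parameters `w0` and `z_rayleigh` are illustrative. Integrating the n-th power of the intensity over a transverse plane gives a signal per slice proportional to w(z)^(2-2n), so one-photon excitation excites every slice equally while two- and three-photon excitation fall off away from the focus.

```python
import numpy as np

def slice_signal(z, n_photons, w0=1.0, z_rayleigh=1.0):
    """Relative n-photon signal from a thin uniform slice at axial position z.

    For a Gaussian beam, I(r, z) is proportional to
    exp(-2 r^2 / w^2) / w^2 with w(z) = w0 * sqrt(1 + (z / z_rayleigh)^2).
    Integrating I^n over the transverse plane gives w^(2 - 2n) / n,
    up to a constant factor that does not depend on z.
    """
    w = w0 * np.sqrt(1.0 + (z / z_rayleigh) ** 2)
    return w ** (2 - 2 * n_photons) / n_photons

def relative_to_focus(z, n_photons):
    """Signal from the slice at z divided by the signal from the focal slice."""
    return slice_signal(z, n_photons) / slice_signal(0.0, n_photons)

# Three Rayleigh lengths from the focus, the slice signal is unchanged for
# one-photon excitation, roughly 10% of the focal value for two-photon
# excitation and roughly 1% for three-photon excitation.
for n in (1, 2, 3):
    print(n, relative_to_focus(3.0, n))
```

In this idealized picture the out-of-focus contribution for one-photon excitation must be rejected by a pinhole, whereas for multiphoton excitation it is suppressed by the excitation process itself.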


Two-photon excited fluorescence detection at the single-molecule level has been demonstrated for chromophores in 
cryogenic solids [60], room-temperature surfaces [61], membranes [62] and liquids [63, 64 and 65]. Although 
multiphoton excited fluorescence has been embraced with great enthusiasm as a technique for both ordinary 
confocal microscopy and single-molecule detection, it is not a panacea; in particular, photochemical degradation in 
multiphoton excitation may be more severe than with ordinary linear excitation, probably due to absorption of more 
than the desired number of photons from the intense laser pulse (e.g. triplet excited state absorption) [61]. 


C1.5.4 SYSTEMS AND PHENOMENA 

C1.5.4.1 SPECTROSCOPY AND PHOTOPHYSICS

(A) CRYOGENIC STUDIES 

Studies of single molecules in cryogenic solids, while limited to a relatively small number of chromophore/matrix
combinations (and small variations thereupon), have covered a wide range of spectroscopic and dynamic processes 
[66,67]. 

The spectral and temporal characteristics of the fluorescence excitation spectra have been examined for several 
chromophores in a variety of single crystalline, Shpol'skii, and amorphous matrices. Two distinct methods are
employed in these studies: direct measurements in which the laser frequency is scanned repetitively across a
single-molecule electronic origin while counting emitted photons, and correlation techniques in which the laser frequency
is fixed and the autocorrelation statistics of emitted photons measured. Both techniques can provide information 
about the dynamics of fluctuations in the electronic origin frequency of a single molecule, although the correlation 
techniques are useful over a broader range of time scales. The direct technique also permits the electronic origin 
linewidths for different single molecules to be measured, with the limitation that the time required to accurately 
characterize the lineshape by scanning the laser frequency ranges from fractions of a second to minutes, and any 
fluctuations occurring on faster time scales are folded into the apparent linewidth. In all systems examined to date, 
at least some molecules exhibit time-dependent fluctuations in electronic origin frequency ranging from a few MHz 
to many GHz or more. These occur on time scales from microseconds to hours, and may be apparently spontaneous 
(usually denoted 'spectral diffusion') and/or light-induced (often called 'hole burning'). In general the narrowest 
lines, as narrow as the lifetime limit based on the fluorescence lifetime measured in bulk experiments, and most 
stable spectra are observed in single crystals: pentacene [6, 68 and 69] and terrylene [70, 71] in p-terphenyl, and
terrylene [72], dibenzoterrylene [73] and dibenzanthanthrene [74] in naphthalene. These and closely related
chromophores in n-alkane Shpol'skii matrices (terrylene in n-alkanes [75, 76, 77 and 78], perylene in n-nonane
[79], dibenzanthanthrene [80] and terrylenediimide [81] in n-hexadecane) tend to show more spectral diffusion and
a broader distribution of apparent linewidths, up to several times the lifetime limit. Even more spectral lability is
observed in amorphous (polymer) matrices; the electronic origin linewidths measured over any finite period of time
often exceed the lifetime limit by more than an order of magnitude, and a rich variety of spectral diffusion and
spectral jumping behaviours are observed (see figure C1.5.8) [78, 81, 82, 83, 84, 85, 86 and 87]. The spontaneous
spectral diffusion that persists even down to 1.5 K is attributed
to tunnelling between 'two-level systems', different configurations of the matrix separated by small barriers that 
are nearly isoenergetic for the chromophore in its ground electronic state but have significantly different energies in 
the excited electronic state, thus shifting the electronic transition frequency. Considerable theoretical work has been 
and continues to be performed to understand the physical origin of these two-level systems and to interpret the 
observed linewidth distributions and dynamics of the spectral diffusion [88, 89, 90, 91, 92, 93, 94 and 95].

Figure C1.5.8. Spectral jumping of a single molecule of terrylene in polyethylene at 1.5 K. The upper trace
displays fluorescence excitation spectra of the same single molecule taken over two different 20 s time intervals,
showing the same molecule absorbing at two distinctly different frequencies. The lower panel plots the peak
frequency in the fluorescence excitation spectrum as a function of time over a 40 min trajectory. The molecule
undergoes discrete jumps among four (briefly five) different resonant frequencies during this time period. Arrows
represent scans during which the molecule had jumped entirely outside the 10 GHz scan window. Adapted from
[199].

The Stark effect (shifting of the absorption frequency in response to an external electric field) has been measured 
for nominally centrosymmetric single molecules both in mixed crystals (pentacene in p-terphenyl) [96] and in
amorphous solids (terrylene in polyethylene) [97]. In the former system the energy shifts are dominated by the term 
quadratic in applied field as expected for a centrosymmetric system, but in the latter the Stark shifts are dominated 
by the linear term and vary widely in both magnitude and sign among different molecules. This is excellent 
evidence for large and variable dipole moments induced in nominally centrosymmetric chromophores by the 
disorder in the environment, and dramatically demonstrates the strength of the local electric fields produced by 
even the moderately polar bonds of a simple hydrocarbon environment.

Line shifts as a function of pressure have been studied for pentacene and terrylene in p-terphenyl [98, 99]. Both
exhibited linear and reversible spectral red shifts with increasing pressure. Modest variations (factors of 1.3-1.6) in 
the pressure shifts among molecules were attributed to slightly different local environments. 

Fluorescence lifetimes have been measured directly by time-correlated single-photon counting for pentacene in
p-terphenyl [100]. This experiment requires careful selection of the laser pulse characteristics such that the pulse
is short enough to resolve the 23 ns decay time, yet has a bandwidth narrow enough to allow spectral
selection of individual molecules. Four different molecules had the same lifetime to within experimental
uncertainty, indicating that the principal contributions to the S1 state decay (radiation and internal conversion to S0)
are not strongly sensitive to the local environment in this relatively homogeneous crystalline matrix. 
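
The 'lifetime limit' invoked above for the electronic origin linewidths follows directly from the excited-state lifetime. The sketch below assumes a Lorentzian line with no pure dephasing, for which the full width at half maximum is 1/(2πT1), and uses the 23 ns decay time quoted here for pentacene in p-terphenyl; the function name is illustrative.

```python
import math

def lifetime_limited_fwhm_hz(t1_seconds):
    """Lorentzian FWHM (in Hz) set by the excited-state lifetime T1 alone.

    Assumes no pure dephasing and no spectral diffusion, so that the
    homogeneous linewidth is 1 / (2 * pi * T1).
    """
    if t1_seconds <= 0:
        raise ValueError("lifetime must be positive")
    return 1.0 / (2.0 * math.pi * t1_seconds)

fwhm = lifetime_limited_fwhm_hz(23e-9)
print(f"{fwhm / 1e6:.1f} MHz")  # prints "6.9 MHz"
```

Linewidths of several times this value, as reported for the Shpol'skii matrices, therefore correspond to a few tens of MHz for a chromophore with a comparable lifetime.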


The polarization properties of single-molecule fluorescence excitation spectra have been explored and utilized to
determine both the molecular transition dipole moment orientation and the depth of single pentacene molecules in a
p-terphenyl crystal, taking into account the rotation of the polarization of the excitation light by the birefringent
crystal [101, 102].

Dispersed fluorescence spectra showing resolution of the ground-state vibrations have been reported for single
molecules of pentacene in p-terphenyl [103, 104], terrylene in p-terphenyl [71] and terrylene in polyethylene [105,
106 and 107]. In the first of these systems all molecules were found to have quite similar spectra, but the small variations
between crystallographically distinct 1 and 2 sites were shown to be reproducible enough to distinguish between
sites [104], and other small variations due to either natural abundance isotopic substitution or local defect induced
redistributions of intensity were noted [104]. Terrylene in both crystalline and amorphous matrices exhibits
significant spectral variations among molecules (see figure C1.5.9). The two distinct types of spectra observed in
polyethylene were suggested to arise from terrylene molecules in two very different local environments, perhaps 
amorphous and crystalline, although the possibility of a chemical impurity could not be ruled out at the time and 
has since been suggested as the correct explanation [ 108 ]. These experiments highlight the promise of low- 
temperature single-molecule spectroscopy for making very detailed spectroscopic measurements probing 
correlations among various spectroscopic observables, e.g., electronic and vibrational frequencies. They also
suggest the possibility that single-molecule spectroscopy could provide the vibrational frequencies of 13C-substituted
isotopomers, essential in classical vibrational analysis, without the need to synthesize specifically
labelled material.

Figure C1.5.9. Vibrationally resolved dispersed fluorescence spectra of two different single molecules of terrylene
in polyethylene. The excitation wavelength for each molecule is indicated and the spectra are plotted as the
difference between excitation and emitted wavenumber. Each molecule's spectrum was recorded on a CCD
detector at two different settings of the spectrograph grating to examine two different regions of the emission
spectrum. 'Type 1' and 'type 2' spectra were tentatively attributed to terrylene molecules in very different local
environments, although the possibility that type 2 spectra arise from a chemical impurity could not be ruled out.
Further details are given in Tchenio [105-107].

A long-range goal of some single-molecule work is to engineer single-molecule optical devices. Toward this end, two
groups have demonstrated the ability to use light to drive single molecules of terrylene among two or more stable
states in a predictable manner, in effect a single-molecule optical 'switch' [77, 109]. The optical properties of terrylene in
n-octane have been modified by exciting nearby triphenylene molecules to their triplet state [110], and this idea has
been extended to probe the dynamics of triplet excitons in isotopically mixed naphthalene crystals using frequency
shifts of single molecules of terrylene as a local environmental probe [111].


Finally, the ability to optically address single molecules is enabling some beautiful experiments in quantum optics. 
The non-Poissonian photon arrival time distributions expected theoretically for single molecules have been 
observed directly, both antibunching at short times [112] and bunching on longer time scales [6, 112 and 113]. The
fluorescence excitation spectra of single molecules bound to spherical microcavities have been examined as a probe
of the optical resonances of small particles [114]. A variety of nonlinear optical effects have been observed at high
laser intensities, including the AC Stark effect and Rabi oscillations, and exceptional agreement with theoretical
predictions has been shown (see figure C1.5.10) [115, 116, 117 and 118]. In a particularly exciting application of
quantum optics with single molecules, dibenzanthanthrene in n-hexadecane has been made to act as a triggered
source of single photons by using the method of adiabatic following to prepare a single molecule in its fluorescent
state [119].

Figure C1.5.10. Normalized fluorescence intensity correlation function for a single terrylene molecule in
p-terphenyl at 2 K. The solid line is the theoretical curve. Regions of deviation from the long-time value of unity due
to photon antibunching (the finite lifetime of the excited singlet state), Rabi oscillations (absorption-stimulated
emission cycles driven by the laser field) and photon bunching (dark periods caused by intersystem crossing to the
triplet state) are indicated. Reproduced with permission from Plakhotnik et al [66], adapted from [118].
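
A normalized intensity correlation function of the kind plotted in figure C1.5.10 can be estimated from a binned photon count trace. The sketch below is a simplified illustration rather than the procedure used in the cited work: the function name and the synthetic traces are assumptions, and binned counts from a single detector can reveal bunching on time scales longer than the bin width but not nanosecond antibunching, which requires start-stop timing between two detectors.

```python
import numpy as np

def intensity_autocorrelation(counts, max_lag):
    """Estimate g2 at lags 1..max_lag (in bins) from a binned count trace.

    g2(lag) = <n(t) n(t + lag)> / <n>^2, which tends to unity for
    uncorrelated (Poissonian) light and exceeds unity when the emitter
    switches between bright and dark periods (bunching).
    """
    counts = np.asarray(counts, dtype=float)
    mean_squared = counts.mean() ** 2
    g2 = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        g2[lag - 1] = np.mean(counts[:-lag] * counts[lag:]) / mean_squared
    return g2

# Hypothetical 'blinking' emitter: 50 bright bins followed by 50 dark bins.
blinking = np.tile(np.r_[np.full(50, 10.0), np.zeros(50)], 200)
g2_blinking = intensity_autocorrelation(blinking, max_lag=5)

# Hypothetical steady emitter with Poissonian counting noise only.
rng = np.random.default_rng(0)
steady = rng.poisson(5.0, size=200_000)
g2_steady = intensity_autocorrelation(steady, max_lag=5)
```

For the blinking trace the short-lag values lie well above unity, mimicking the bunching caused by excursions to a dark triplet state, whereas the steady trace gives values close to unity at all lags.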

(B) ROOM TEMPERATURE STUDIES 

Spatial selection techniques, both near-field and far-field, have been employed to examine a variety of 
spectroscopic and photophysical properties of single molecules at room temperature. Near-field and far-field
images, fluorescence spectra, and lifetimes have been compared and found to be consistent under appropriate
conditions (see figure C1.5.11), although the close proximity of the metallic tip used in near-field experiments can
also alter fluorescence lifetimes as discussed above. While most of these studies involve dye molecules that have 
considerably more conformational flexibility than the rigid aromatics used in the low-temperature spectral selection 
experiments, single-molecule data over the full range from superfluid helium to room temperature have been 
obtained for a few chromophore/matrix combinations [81, 120 ]. The spectral linewidths (integrated over the 
required accumulation times of milliseconds or more) as well as the observed spectral jumps are orders of 
magnitude larger in the room temperature experiments, but similar phenomena of highly variable linewidths and 
resonant frequencies among different molecules, spontaneous and light-induced spectral jumping and 'blinking' of 
the fluorescence when exciting at a fixed frequency are observed [26, 121, 122, 123, 124, 125, 126, 127, 128 and
129]. Photon bunching in the emission correlation function due to triplet state formation has also been observed at
room temperature [129, 130].

Figure C1.5.11. Far-field fluorescence images (A and D), corresponding fluorescence spectra (B and E), and
fluorescence decays (C and F) for two different molecules of a carbocyanine dye at a PMMA-air interface.
Lifetimes were fitted to a single exponential (dotted curves) with decay times of 2.56 ns (χ² = 1.05) in (C) and 3.20
ns (χ² = 1.16) in (F). For comparison, an ensemble measurement averaged over several hundred molecules is
shown in (G) and (H). A single exponential fit to the lifetime yields a decay time of 2.70 ns (χ² = 6.7). The larger
χ² indicates a deviation from single-exponential behaviour, reflecting the ensemble average over a distribution of
lifetimes. Reprinted with permission from Macklin et al [126]. Copyright 1996 American Association for the
Advancement of Science.

The three-dimensional orientation of a single molecule's transition dipole has been determined using near-field
optics, taking advantage of the longitudinal component of the electric field near the tip [14]. Similar determinations
have been made using far-field optics with a 'donut mode' laser beam [123] and by analysing the intensity patterns
from far-field fluorescence images under slightly aberrating conditions [131]. The far-field techniques in particular
should prove invaluable for following orientational motions of single molecules in environments that permit such 
motion, as discussed briefly below. 

C1.5.4.2 MAGNETIC RESONANCE OF CHROMOPHORES IN SOLIDS

Magnetic resonance techniques, while powerful spectroscopic probes of molecular structure (sections B1.12-B1.16),
typically have quite low sensitivities, and direct detection of single nuclear or electron spins has yet to be
demonstrated. However, electron spin resonance at the single-molecule level has been demonstrated through the 
indirect technique of optically detected magnetic resonance. The original experiments exploited the dependence of 
the time-averaged fluorescence intensity on the rate of intersystem crossing from the fluorescent singlet to the 
essentially nonemissive triplet state. The splittings among the magnetic components of the triplet state were 
detected by sweeping the rf field while measuring the total fluorescence intensity from a single molecule selected
out of the inhomogeneous ensemble by its fluorescence excitation frequency [132, 133]. This idea has subsequently
been extended to examine isotope effects on the rf resonant linewidths [134, 135] and to demonstrate single-spin
coherence and spin echo phenomena [136, 137].

C1.5.4.3 CHEMICAL REACTIONS

Chemical reactions can be studied at the single-molecule level by measuring the fluorescence lifetime of an excited
state that can undergo reaction in competition with fluorescence. Reactions involving electron transfer (section
C3.2) are among the most accessible via such techniques, and are particularly attractive candidates for study as a
means of testing relationships between charge-transfer optical spectra and electron-transfer rates. If the physical
parameters that determine the reaction probability, such as overlap between the donor and acceptor orbitals,
thermodynamic driving force for the reaction and nuclear reorganization energies, are not constant across the
ensemble, different molecules will exhibit different electron transfer 'rates', as defined through multiple
measurements on the same molecule. A very broad distribution of lifetimes, and thereby electron transfer rates, was
observed for excited cresyl violet molecules transferring an electron to an indium tin oxide surface (figure C1.5.12)
[138]. In contrast, a study of the Os(VIII) ion-catalysed redox reaction between Ce(IV) and As(III), producing a
fluorescent Ce(III) product, found uniform catalytic activities for different Os(VIII) ions [139]. Single-molecule
rate measurements coupled with spectroscopy hold great promise for allowing the factors that dictate electron 
transfer and other chemical reactions to be teased apart.

Figure C1.5.12. (A) Fluorescence decay of a single molecule of cresyl violet on an indium tin oxide (ITO) surface
measured by time-correlated single photon counting. The solid line is the fitted decay, a single exponential of 480 ±
5 ps convolved with the instrument response function of 160 ps fwhm. The decay, which is considerably faster than
the natural fluorescence lifetime of cresyl violet, is due to electron transfer from the excited cresyl violet (D*) to
the conduction band or energetically accessible surface electronic states of ITO. (B) Distribution of lifetimes for 40
different single molecules showing a broad distribution of electron transfer rates. Reprinted with permission from
Lu and Xie [138]. Copyright 1997 American Chemical Society.
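
The conversion from a measured lifetime to an electron transfer rate can be sketched as follows, assuming simple parallel first-order kinetics in which the observed decay rate is the sum of the natural (reaction-free) decay rate and the electron transfer rate. The function name is illustrative, and the 3000 ps natural lifetime used below is a hypothetical placeholder rather than a value taken from the text; only the 480 ps observed lifetime comes from figure C1.5.12.

```python
def electron_transfer_rate(tau_observed, tau_natural):
    """Electron transfer rate from observed and natural lifetimes.

    Assumes parallel first-order decay channels, so that
    1 / tau_observed = 1 / tau_natural + k_et.
    The result is in inverse units of the lifetimes supplied.
    """
    if not 0 < tau_observed < tau_natural:
        raise ValueError("expected 0 < tau_observed < tau_natural")
    return 1.0 / tau_observed - 1.0 / tau_natural

tau_observed_ps = 480.0   # fitted single-molecule decay, figure C1.5.12(A)
tau_natural_ps = 3000.0   # hypothetical placeholder for the natural lifetime
k_et_per_ps = electron_transfer_rate(tau_observed_ps, tau_natural_ps)
k_et_per_s = k_et_per_ps * 1e12
```

Applying the same conversion to each molecule in a lifetime histogram such as panel (B) turns the distribution of lifetimes into a distribution of electron transfer rates.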




C1.5.4.4 TRANSLATIONAL AND ROTATIONAL MOTIONS


While translational and rotational diffusion are very well understood at the bulk level, there are advantages to being 
able to observe trajectories of individual molecules, particularly in nonhomogeneous materials. Fluorescence 
microscopy has been used to follow single-molecule translation over long time scales in hindered and/or 
anisotropic environments such as polymers [140, 141], gels [142], engineered submicrometre channels [143] and
lipid membranes [144], as well as in aqueous solution [145], and to quantify the 'optical tweezers' effect whereby a
polarizable molecule is attracted into the focus of an intense light source [146, 147]. The three-dimensional
rotational motions of chromophores bound to polymers can be monitored by the intensity distribution patterns in
confocal fluorescence microscopy (figure C1.5.13, figure C1.5.14 and figure C1.5.15) [148, 149]. Polarization
modulation techniques have been employed to probe rotational motions for chromophores on surfaces and bound to
polymers [127, 150 and 151], and near-field excitation with two polarization detection channels has been used to
examine rotation of dye molecules on glass surfaces and in polymers [141]. These techniques seem likely to find
broad applicability as probes of local motion in nano- and mesostructured materials, as well as in monitoring the
conformational dynamics of biopolymers such as DNA and proteins.

Figure C1.5.13. Schematic diagram of an experimental set-up for imaging 3D single-molecule orientations. The
excitation laser with either s- or p-polarization is reflected from the polymer/water boundary. Molecular
fluorescence is imaged through an aberrating thin water layer, collected with an inverted microscope and imaged
onto a CCD array. Aberrated and unaberrated emission patterns are observed for z- and xy-orientated molecules,
respectively. Reprinted with permission from Bartko and Dickson [148]. Copyright 1999 American Chemical
Society.

Figure C1.5.14. Fluorescence images of three different single molecules observed under the imaging conditions of
figure C1.5.13. The observed dipole emission patterns (left column) are indicative of the 3D orientation of each
molecule. The right-hand column shows the calculated fit to each observed intensity pattern. Molecules 1, 2 and 3
are found to have polar angles of (θ, φ) = (4.5°, -24.6°), (-5.3°, 51.6°) and (85.4°, -3.9°), respectively. Reprinted with
permission from Bartko and Dickson [148]. Copyright 1999 American Chemical Society.

Figure C1.5.15. Molecular orientational trajectories of five single molecules. Each step in the trajectory is
separated by 300 ms and is obtained from the fit to the dipole emission pattern such as is shown in figure C1.5.14.
The radial component is displayed as sin θ and the angular variable as φ. The lighter dots around the average
orientation represent ±1 standard deviation. Reprinted with permission from Bartko and Dickson [148]. Copyright
1999 American Chemical Society.

C1.5.4.5 CONJUGATED POLYMERS, CHROMOPHORE AGGREGATES, AND MICROHETEROGENEOUS MATERIALS

Materials that possess fundamental optical inhomogeneities are among the most appealing candidates for single- 
molecule techniques. These include conjugated polymers such as poly(phenylenevinylene) and its relatives, in 
which different polymer chains differ not only in their physical length but also in their effective electronic 
conjugation length, and noncovalent structures such as submicroscopic crystals and the J-aggregates formed by 
certain cyanine dyes, in which different 'supermolecules' differ in both numbers of chromophores and their spatial 
relationships. Large reversible fluorescence intensity fluctuations observed in far-field single-molecule studies on 
conjugated polymers have been interpreted in terms of excited-state quenching by a photochemically produced 
charge-separated state [129, 152 and 153]. Near-field techniques are particularly valuable in these systems, which
have a functionally important structure on nanometre length scales. Near-field scanning optical microscopy, 
including analysis of the excitation and emission polarization properties, has been used to probe spatial 
relationships between excitation and emission in J-aggregates, providing information about the degree of order in 
the aggregate and the spatial extent of exciton migration [ 154 , 155 and 156 ]. NSOM has also been applied to small 
molecular crystals to examine spatial inhomogeneities, energy transfer, and excitation trapping [ 156 , 157 and 158 ]. 

Single molecules also have promise as probes for local structure when doped into materials that are themselves 
nonfluorescent. Rhodamine dyes in both silicate and polymer thin films exhibit a distribution of fluorescence 
maxima indicative of considerable heterogeneity in local environments, particularly for the silicate material [ 159 ]. 
A bimodal distribution of fluorescence intensities observed for single molecules of crystal violet in a PMMA film 
has been suggested to result from high and low viscosity local sites within the polymer that give rise to slow and 
fast internal conversion, respectively [ 160 ]. 

C1.5.4.6 METALLIC AND SEMICONDUCTOR NANOPARTICLES

Metallic and semiconductor 'nanoparticles' or 'nanocrystals' (chunks of matter intermediate in size and physical
properties between single atoms and the macroscopic bulk materials) are of great interest both for their
fundamental properties and for technological applications (section C2.17). Although methods have been developed
to synthesize some of these materials with high monodispersity, even the best preparations have significant
variations from particle to particle in the number of atoms and/or the geometry of the particle, making the ability to
interrogate single particles particularly valuable. Generally these materials can be prepared such that the individual
particles are spaced arbitrarily far apart, making far-field microscopy the optical technique of choice.

Size-dependent optical properties of single silver and gold nanoparticles have been studied [161, 162], as have the
surface Raman enhancements of organic molecules bound to these nanoparticles, as discussed above [53]. Several
beautiful spectroscopic studies have been carried out on single II-VI semiconductor nanocrystals, revealing very
well resolved low-temperature fluorescence spectra having well defined phonon substructures [163, 164 and 165]
(see figure C1.5.16), large Stark shifts in these spectra [166], pronounced spectral diffusion [164, 166, 167] and
dramatic intensity-dependent effects on the spectra [165]. The spectral diffusion was interpreted as being due to
randomly fluctuating local electric fields caused by charge carriers trapped at surface defects [166]. Recently,
single-particle techniques have also been used to resolve luminescence from the individual 'chromophores' of
nanostructured porous silicon [168].

Figure C1.5.16. Spatial selection of single CdS nanocrystals on a quartz cover slip. Left: fluorescence image of
isolated CdS nanocrystals recorded via scanning in a low-temperature confocal microscope (T = 20 K). Total
fluorescence was excited at 442 nm. Fifty-five per cent of the fluorescence was directed to a spectrometer during
the scan, resulting in the Z spectrum on the right. Spectra A and B were taken after moving the scanner to the bright
spots marked in the fluorescence image, reducing the excitation intensity and increasing the acquisition time. Under
these conditions both spectra show a second emission line shifted 5 meV (40 cm⁻¹) to the blue of the main peak.
Reprinted with permission from Koberling et al [165]. Copyright 1999 by the American Physical Society.

C1.5.4.7 BIOLOGICAL SYSTEMS

The application of single molecule optical techniques to biological phenomena is an area currently seeing explosive 
growth [169]. 


The most obvious biological studies to undertake with optical techniques at the level of single functional units
(generally not single chromophores) involve probing the photophysics of systems whose function, whether
naturally evolved or engineered by man, involves responding to light. Photosynthetic light-harvesting complexes
from green algae were imaged by near-field techniques at room temperature and fluorescence lifetimes measured
with picosecond time resolution [170]. Subsequently, far-field microscopy has been used to obtain fluorescence
excitation spectra of single bacterial light-harvesting complexes at 1.2 K [171, 172] and as a function of
temperature [173], providing detailed information about individual chromophores' site energies and energetic
disorder, dipolar couplings between chromophores and spectral diffusion. Confocal fluorescence microscopy of
single light-harvesting complexes at room temperature showed that photobleaching of just one bacteriochlorophyll
molecule of the 18-member assembly provides an energy trap that effectively quenches fluorescence from the
entire assembly [174]. Single molecules of green fluorescent protein, widely used as a fluorescent tag in molecular
biology, exhibit pronounced on-off 'blinking' effects whose origin has yet to be understood [175, 176].

Single-molecule optical methods have also been adapted to probe biological systems whose function is unrelated to
light. Much of the research on single-molecule detection in liquids and gels is directed toward rapid DNA
sequencing, utilizing either intrinsic fluorescence or, more likely, exogenous dyes bound to specific DNA bases
[44, 45, 64, 177 and 178]. Single-molecule optical techniques are being used to study a variety of chemical and
physical properties of oligonucleotides, RNA and DNA including base pairing [179], electron transfer between dyes
and DNA bases [180], ligand-induced conformational changes in RNA [181] and conformational dynamics in
oligonucleotides [127, 150, 182, 183, 184 and 185], as well as to selectively cleave DNA [186]. Conformational
dynamics in proteins that may have functional significance are being accessed through polarization studies,
fluorescence quenching and energy transfer techniques carried out at the single-molecule level [187, 188]. Total
internal reflection fluorescence microscopy has been used to visualize the motions of single molecules of
fluorescently labelled kinesin, the motor protein that powers organelle transport along microtubules [189]. Dunn and
co-workers are applying single-molecule techniques to study the morphology and dynamics of microenvironments in
model biological membranes [190, 191].

Perhaps the most exciting application of single-molecule techniques in biology is the probing of enzymatic
reactions at the level of individual turnovers [192, 193], utilizing systems in which either enzyme or substrate or
both undergo large changes in optical properties (e.g. switching from fluorescent to nonfluorescent states) during
the course of the reaction. These studies allow the possibility of heterogeneity in reaction rates among nominally
identical enzyme molecules to be assessed, and make it possible to probe 'memory' effects, the extent to which an
enzyme's binding constant or turnover rate depends upon its previous reaction history. Generally the enzyme is
either bound to a surface [194] or confined within the pores of a gel [142] or nanoengineered structure [139, 143],
and the substrate and product molecules are allowed to diffuse toward and away from the immobilized enzyme. The
catalytic activity among different molecules of nominally identical lactate dehydrogenase enzyme was found to
vary by up to a factor of four, an observation tentatively attributed to the presence of multiple stable conformers of
the enzyme [195]. A detailed study of single molecules of mammalian alkaline phosphatase revealed even more
pronounced heterogeneities among different molecules (more than tenfold differences in turnover rate and more
than twofold differences in activation energy), which were attributed at least in part to post-translational
modification producing chemically nonidentical enzyme molecules [196]. In contrast, single molecules of highly
purified bacterial alkaline phosphatase have indistinguishable enzymatic activities [137]. Lengthy turnover
trajectories of cholesterol oxidase, a flavoprotein that catalyses the oxidation of cholesterol by oxygen (figure
C1.5.17), have been analysed to obtain the distribution of 'on' and 'off' times in the Michaelis-Menten catalytic
mechanism, and also revealed evidence for memory effects probably due to slow conformational fluctuations in the
protein [192, 193].

Figure C1.5.17. (A) Enzymatic cycle of cholesterol oxidase, which catalyses the oxidation of cholesterol by
molecular oxygen. The enzyme's naturally fluorescent FAD active site is first reduced by a cholesterol substrate,
generating a nonfluorescent FADH$_2$, and then re-oxidized by molecular oxygen. (B) Structure of FAD. (C)
Fluorescence intensity trajectory of an individual cholesterol oxidase enzyme, immobilized in an agarose gel,
undergoing reaction with cholesterol and oxygen. Each on-off cycle of emission corresponds to one enzymatic
turnover. (D) Distribution of emission on-times derived from the intensity trajectory. The nonexponential
distribution reflects the fact that the forward reaction is not a single elementary step but involves an intermediate
enzyme-substrate complex as shown in the inset. The solid line is a curve simulated by convolving two exponential
distributions with $k_1[\mathrm{S}] = 2.5$ s$^{-1}$ and $k_2[\mathrm{S}] = 15.3$ s$^{-1}$. Reproduced with permission from Xie and Trautman [123].


C1.5.5 CONCLUSION 

The ability to make optical measurements on individual molecules and submicroscopic aggregates, one at a time, is 
a valuable new tool in several areas of molecular science. By eliminating inhomogeneous broadening it allows pure 
spectroscopy to be performed with unprecedented precision in certain condensed phase systems. As an analytical 
method it permits the rapid detection of certain analytes with unmatched sensitivity. Finally, it is revolutionizing
our understanding of the relationships among nominally identical members of molecular ensembles. The deepest and
most lasting contribution of this technique is likely to be in its application to complex systems such as biological
macromolecules, polymers, and engineered nanostructures, where ensembles of members that are 'identical' to the
best of nature's or man's efforts still exhibit differences on time scales that are of functional importance.


The techniques and applications of single molecule spectroscopy are currently in a state of rapid development,
making this a difficult field to summarize at any given time. This contribution is, at best, a single frame of a movie
featuring plenty of action, improvisation, and unexpected plot twists.

C2.1 Polymers

Pierre Robyr 


C2.1.1 INTRODUCTION 

Polymers are substances consisting of large molecules also known as macromolecules. The molecules are built up 
of many subunits called monomers which are linked together, usually by covalent bonds. In a polymer, the number 
of subunits is generally larger than 100 [1]. Assemblies of fewer than 100 subunits are often referred to as oligomers.
Macromolecules make up many of the materials in living organisms, for example cellulose, lignin, proteins and
nucleic acids. The latter two have highly specific roles in life. Proteins control many biochemical processes and 
nucleic acids store genetic information. Many polymers are man-made materials, and are therefore called synthetic 
polymers. These polymers have a great industrial importance because they offer an attractive compromise between 
ease of processability and final mechanical and thermal properties. This article focuses on the general properties of 
polymers, without dealing with the specific roles of natural polymers, such as proteins and nucleic acids. 

In contrast to low-molecular-weight compounds and polymers with specific roles in biochemical processes, most 
polymers consist of similar molecules with different molecular weights. The mean value and the distribution of the 
molecular weight depend on the preparation conditions and decisively influence the material properties. Both 
quantities can be obtained from different techniques such as light scattering, viscosity measurements or gel 
permeation chromatography. However, each technique provides a different average molecular weight [2]. The most
important ones are the number average molecular weight

$$M_n = \frac{\sum_i n_i M_i}{\sum_i n_i},$$

where $n_i$ is the number of molecules with molecular weight $M_i$, and the weight average molecular weight:

$$M_w = \frac{\sum_i n_i M_i^2}{\sum_i n_i M_i}.$$

The ratio $M_w/M_n$ is one for a strictly uniform molecular-weight distribution and larger than one for
molecular-weight distributions with finite widths. The variance of the molecular-weight distribution is
$(M_w/M_n - 1)M_n^2$ [3]; therefore $M_w/M_n - 1$ is often mentioned as a measure for molecular-weight dispersion.
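
The two averages and the variance relation can be checked numerically. The following minimal Python sketch evaluates the number average, the weight average and the variance for a small discrete distribution. The three-component sample is purely hypothetical and the function name is an illustrative choice, not something taken from the cited literature.

```python
def molecular_weight_averages(counts, masses):
    """Return (M_n, M_w, variance) for n_i molecules of molar mass M_i."""
    total_number = sum(counts)
    first_moment = sum(n * m for n, m in zip(counts, masses))       # sum of n_i M_i
    second_moment = sum(n * m * m for n, m in zip(counts, masses))  # sum of n_i M_i^2
    m_n = first_moment / total_number
    m_w = second_moment / first_moment
    # Variance of the number distribution: <M^2> - <M>^2.
    variance = second_moment / total_number - m_n ** 2
    return m_n, m_w, variance


# Hypothetical sample: numbers of chains and their molar masses in g/mol.
counts = [1, 2, 1]
masses = [10_000.0, 20_000.0, 30_000.0]

m_n, m_w, variance = molecular_weight_averages(counts, masses)
print(f"M_n = {m_n:.0f} g/mol, M_w = {m_w:.0f} g/mol, M_w/M_n = {m_w / m_n:.3f}")
print(f"variance = {variance:.3e}, (M_w/M_n - 1) M_n^2 = {(m_w / m_n - 1) * m_n ** 2:.3e}")
```

For this made-up sample the variance computed directly from the moments coincides with the expression built from the ratio of the two averages, which is why that ratio serves as a compact measure of dispersion.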

In polymers made of dis-symmetric monomers, such as, for example, poly(propylene), the structure may be
irregular and constitutional isomerism can occur as shown in figure C2.1.1(a). The succession of the relative
configurations of the asymmetric centres can also vary between stretches of the chain. Configurational isomerism is
characterized by the succession of dyads, which are named either meso, if the two asymmetric centres have the
same relative configurations, or racemo, if the configurations differ (figure C2.1.1(b)). A polymer is called isotactic
if it contains only meso dyads and syndiotactic if it contains only racemo dyads, so that the configurations of
successive asymmetric centres strictly alternate.
Polymers without configurational regularity are called atactic. Configurationally regular polymers can form
crystalline structures, while atactic polymers are almost always amorphous. Many polymers consist of linear
molecules; however, nonlinear chain architectures are also important (figure C2.1.2).

Figure C2.1.1. (a) Constitutional isomerism of poly(propylene). The upper chain has a regular constitution. The
lower one contains a constitutional defect. (b) Configurational isomerism of poly(propylene). Depending on the
relative configurations of the asymmetric carbons of two successive monomer units, the corresponding dyad is
either meso or racemo.

Figure C2.1.2. Polymers with linear and nonlinear chain architectures. The nonlinear polymers can have branched
chains. Short chains of oligomers can be grafted to the main chain. The chains may form a star-like structure. The
chains can be cross-linked and form a network.

Most properties of linear polymers are controlled by two different factors. The chemical constitution of the
monomers determines the interaction strength between the chains, the interactions of the polymer with host
molecules or with interfaces. The monomer structure also determines the possible local conformations of the
polymer chain. This relationship between the molecular structure and any interaction with surrounding molecules is
similar to that found for low-molecular-weight compounds. The second important parameter that controls polymer
properties is the molecular weight. Contrary to the situation for low-molecular-weight compounds, it plays a
fundamental role in polymer behaviour. It determines the slow-mode dynamics and the viscosity of polymers in
solutions and in the melt. These properties are of utmost importance in polymer rheology and condition their
processability. The mechanical properties, solubility and miscibility of different polymers also depend on their
molecular weights.


C2.1.2 POLYMER SYNTHESIS 


The successful preparation of polymers is achieved only if the macromolecules are stable. Polymers are often
prepared in solution where entropy destabilizes large molecular assemblies. Therefore, monomers have to be
strongly bonded together. These links are best realized by covalent bonds. Moreover, the polymerization kinetics
must be fast, so that high-molecular-weight materials can be produced in a reasonable time.
The polymerization reaction must also be fast compared to side reactions that often hinder or preclude the
formation of the desired product.

Polymerization reactions are generally divided into two main categories according to the mechanism of chain 
growth [4]. In the first category, called chain polymerization, chain growth proceeds exclusively by reaction 
between a monomer and the reactive site on the polymer chain with regeneration of the reactive site at the end of 
each growth step and possible production of a side-product L: 


$$P_i^* + M \rightarrow P_{i+1}^* + [L].$$


Chain polymerization involves at least two steps: initiation, i.e. formation of reactive sites, which are generally 
radicals or ions; and propagation, through which the chains grow. In most practical conditions termination reactions 
play an important role. Depropagation, i.e. the reduction of the degree of polymerization by one unit, is also
possible. In termination reactions, one or two reactive sites vanish and chain growth is stopped. The possibility of
growth termination of a chain exists at every addition step and, therefore, the probability of termination increases 
with the degree of polymerization. This increase leads to the flattening of the average molecular weight as a 
function of the degree of monomer conversion, as shown in figure C2.1.3 [5]. In the limiting situation of fast 
initiation, irreversible propagation and the absence of termination, the molecular weight increases linearly with the 
degree of conversion ( figure C2.1.3 ). These conditions are approached in some ionic chain polymerizations 
performed under very clean conditions. Such reactions are called living polymerizations. Radical chain 
polymerization and also many chain polymerizations with simple ions as reactive sites produce polymers with 
irregular configuration. To prepare highly regular polymers special catalysts have been developed. Two types
widely used are Ziegler-Natta catalysts [6] and metallocene-based catalysts [7].

Figure C2.1.3. Schematic dependence of the molecular weight of a polymer as a function of the degree of
monomer conversion for different polymerization reactions.

The second category of polymerization reactions does not involve a chain reaction and is divided into two groups:
polyaddition and polycondensation [4]. In both reactions, the growth of polymer chains proceeds by reactions
between molecules of all degrees of polymerization. In polycondensations a low-molecular-weight product L is
eliminated, while polyadditions occur without elimination:

$$P_i + P_j \rightarrow P_{i+j} + [L].$$

In polycondensation reactions, the equilibrium can be easily shifted to the side of the higher-molecular-weight
compounds by allowing the eliminated compound to escape the reaction vessel or by actively removing it. Since
molecules of all degrees of polymerization can react with each other, the average molecular weight grows ever
faster as the degree of conversion increases (figure C2.1.3). Consequently, the preparation of polymers with high
molecular weights requires a large degree of conversion [5].
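
The contrast between the two growth mechanisms can be made concrete with two textbook idealizations that are not written out in the text and are assumed here: the Carothers relation for stoichiometric linear step growth, in which the number-average degree of polymerization is 1/(1 - p) at conversion p, and the living-chain limit with fast initiation and no termination, in which it grows linearly with p. The monomer-to-initiator ratio below is a hypothetical value chosen only for illustration.

```python
def step_growth_degree(conversion):
    """Carothers relation for stoichiometric linear step growth: X_n = 1 / (1 - p)."""
    if not 0.0 <= conversion < 1.0:
        raise ValueError("conversion must lie in [0, 1)")
    return 1.0 / (1.0 - conversion)


def living_chain_degree(conversion, monomer_per_initiator):
    """Living chain polymerization (assumed ideal): X_n = p * [M]_0 / [I]_0."""
    if not 0.0 <= conversion <= 1.0:
        raise ValueError("conversion must lie in [0, 1]")
    return conversion * monomer_per_initiator


MONOMER_PER_INITIATOR = 1000.0  # hypothetical feed ratio [M]_0 / [I]_0

for p in (0.5, 0.9, 0.99, 0.999):
    print(f"p = {p:5.3f}  step growth X_n = {step_growth_degree(p):8.1f}  "
          f"living chain X_n = {living_chain_degree(p, MONOMER_PER_INITIATOR):7.1f}")
```

Under these assumptions a step-growth product only passes 100 repeat units once the conversion exceeds 99%, whereas the living chain reaches half its final length at half conversion.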

Copolymerization involves the reaction of at least two different monomers A and B. In the case of chain
copolymerization, the reactivity ratios $r_A$ and $r_B$ are important: $r_A = k_{AA}/k_{AB}$ and $r_B = k_{BB}/k_{BA}$, where $k_{XY}$ is the
rate constant of the reaction of a chain ending with a unit X with a free monomer Y [5]. Three different situations
can be envisaged. (i) If $r_A r_B = 1$, $k_{AA}/k_{AB} = k_{BA}/k_{BB}$. The ratio of the probabilities of adding a monomer A or B to
a chain end A is equal to the ratio of the probabilities of adding a monomer A or B to the chain end B. Consequently,
the probabilities of adding a monomer A or B are independent of the chain end. This type of copolymerization is
called ideal or Bernoullian copolymerization. (ii) If $r_A r_B < 1$, the sequence of the monomers tends to alternate. In
the limiting case where $r_A = r_B = 0$, the monomer sequence strictly alternates. (iii) If $r_A r_B > 1$, the two types of
monomers tend to sequence and build block structures along the chain.
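
The three regimes can be illustrated by growing model chains in which the next monomer depends only on the current chain end. The sketch below assumes the conventional reading of the reactivity ratios (homopropagation over cross-propagation for a given chain end) and a constant feed ratio [A]/[B], so that the addition probabilities follow from the ratio of the competing rates. Function names and parameter values are illustrative assumptions.

```python
import random


def add_probabilities(r_a, r_b, feed_ratio=1.0):
    """Return P(add A | end A) and P(add A | end B) for feed_ratio = [A]/[B]."""
    # End A: k_AA[A] competes with k_AB[B]; divide both rates by k_AB[B].
    p_a_given_a = r_a * feed_ratio / (r_a * feed_ratio + 1.0)
    # End B: k_BA[A] competes with k_BB[B]; divide both rates by k_BA[B].
    p_a_given_b = feed_ratio / (feed_ratio + r_b)
    return p_a_given_a, p_a_given_b


def grow_chain(r_a, r_b, length, feed_ratio=1.0, seed=0):
    """Grow one chain, started from an A unit, as a first-order Markov process."""
    rng = random.Random(seed)
    p_a_given_a, p_a_given_b = add_probabilities(r_a, r_b, feed_ratio)
    chain = ["A"]
    for _ in range(length - 1):
        p_add_a = p_a_given_a if chain[-1] == "A" else p_a_given_b
        chain.append("A" if rng.random() < p_add_a else "B")
    return "".join(chain)


# Hypothetical reactivity ratios for the three cases discussed in the text.
for label, (r_a, r_b) in (("ideal", (2.0, 0.5)),
                          ("alternating", (0.0, 0.0)),
                          ("blocky", (10.0, 10.0))):
    print(f"{label:12s} {grow_chain(r_a, r_b, 40)}")
```

In the ideal case the two conditional probabilities coincide, so the chain end carries no memory; in the other two cases the same code produces alternating or block-like sequences simply by changing the product of the two ratios.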

We have tacitly assumed that the rate constants depend only on the last unit of the chain. In such a situation, the
copolymerization is called a Markov copolymerization of first order. The special case (i), $r_A r_B = 1$, is a Markov
copolymerization of order zero. If reactivity also depends on the penultimate unit of the chain, the polymerization
is a Markov copolymerization of second order.

C2.1.3 CONFORMATION OF A SINGLE CHAIN


C2.1.3.1 INTRODUCTION 


Polymer chains possess a huge number of degrees of freedom, which can be divided into two categories: on the one 
hand, bond angles and bond lengths and, on the other hand, torsional angles. Those of the first category undergo 
fast oscillations on the 10-100 fs time scale and vary little from one monomer to the other. Torsional angles are 
much softer degrees of freedom and set the conformation of the polymer chain. Two conformations are often 
encountered: random coils and helices. Random coils are found in polymer solutions, melts, and polymer glasses. 
Denatured proteins also adopt random coil structures. Helical structures often occur in crystalline polymers or as
subunits in folded proteins.

The energy difference between the trans and gauche conformations in alkanes was investigated with Raman
scattering [9]. It was found that the difference between the two energy minima in solution is 2-3 kJ mol$^{-1}$ and that
the energy barrier is about 12 kJ mol$^{-1}$ (figure C2.1.4). Consequently, at room temperature the torsional angles of
an alkane chain are most of the time either in the gauche or the trans state with roughly equal probabilities, and
from time to time a chain segment collects sufficient thermal energy so that a conformational change can occur.
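
The statement about populations can be checked with a simple Boltzmann estimate. The sketch below assumes one trans minimum and two degenerate gauche minima per torsion (the usual picture for butane-like bonds, not spelled out in the text), takes 2.5 kJ/mol as a representative value from the quoted range, and ignores differences in well shape.

```python
import math

GAS_CONSTANT = 8.314  # J mol^-1 K^-1


def torsion_populations(delta_e, temperature):
    """Return (p_trans, p_gauche) for one trans and two gauche minima.

    delta_e is the gauche-trans energy difference in J/mol.
    """
    weight = math.exp(-delta_e / (GAS_CONSTANT * temperature))
    partition = 1.0 + 2.0 * weight
    return 1.0 / partition, 2.0 * weight / partition


def barrier_factor(barrier, temperature):
    """Boltzmann factor for reaching the top of a barrier given in J/mol."""
    return math.exp(-barrier / (GAS_CONSTANT * temperature))


p_trans, p_gauche = torsion_populations(2500.0, 298.0)
print(f"trans: {p_trans:.2f}, gauche (both minima together): {p_gauche:.2f}")
print(f"Boltzmann factor for the 12 kJ/mol barrier: {barrier_factor(12000.0, 298.0):.4f}")
```

With these assumed inputs the trans and gauche populations come out at about 0.58 and 0.42, while the Boltzmann factor for the barrier is below one per cent, consistent with rare thermally activated jumps between the minima.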
Figure C2.1.4. Potential energy as a function of the rotation about the central C-C bond in butane. The sketches
show the projection of the molecule along the central C-C bond.

C2.1.3.2 GAUSSIAN CHAINS


At first sight, one might think that any treatment of the properties of a polymer chain has to emanate from its 
microscopic chemical structure since it determines the populations of the different rotational isomeric states of a 
given torsional angle. However, in polymers without well defined conformations, the correlation between the 
torsional angles along the chain decays rapidly. Beyond that length scale, typically a few nanometres and referred 
to as persistence length [10], one can think of the chain as a succession of jointed sticks, with unrestricted angles at 
the junctions. Such a chain with a large number of segments is called Gaussian. It can be easily shown [11] that for
a chain consisting of $N_s$ segments of length $a$ the root mean square of the distance between the two chain ends
averaged over all possible conformations is

$$\sqrt{\langle R^2 \rangle} = a\sqrt{N_s}. \tag{C2.1.1}$$

This relation also applies to any portion of the chain as long as the number of segments in the portion is
sufficient. Therefore, if one proceeds $n_s$ segmental steps, starting from a point in the interior of the chain, the
resulting average displacement is of the order of $a\sqrt{n_s}$. Conversely, the number of monomers contained in a
sphere of radius $r$ scales as $n \propto n_s \propto r^2$. Thus, the Gaussian chain only partially fills three-dimensional
space and its fractal dimension is two. The mean monomer density $c_m$ in a volume of size $\langle R^2 \rangle^{3/2}$ as a
function of the degree of polymerization $N$ scales as

$$c_m(N) \propto \frac{N}{\langle R^2 \rangle^{3/2}} \propto N^{-1/2}. \tag{C2.1.2}$$

The mean monomer density decreases with the increasing degree of polymerization.
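
The square-root law (C2.1.1) is easy to verify by direct sampling. The sketch below generates freely jointed chains, i.e. successions of sticks with unrestricted angles at the junctions, and compares the sampled root mean square end-to-end distance with the prediction. The chain lengths, the number of chains and the seed are arbitrary choices made for illustration.

```python
import math
import random


def squared_end_to_end(n_segments, segment_length, rng):
    """Squared end-to-end distance of one freely jointed chain in three dimensions."""
    x = y = z = 0.0
    for _ in range(n_segments):
        # An isotropic direction is obtained by normalizing three Gaussian variates.
        gx, gy, gz = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(gx * gx + gy * gy + gz * gz)
        x += segment_length * gx / norm
        y += segment_length * gy / norm
        z += segment_length * gz / norm
    return x * x + y * y + z * z


def rms_end_to_end(n_segments, segment_length=1.0, n_chains=1000, seed=1):
    """Root mean square end-to-end distance averaged over n_chains conformations."""
    rng = random.Random(seed)
    total = sum(squared_end_to_end(n_segments, segment_length, rng)
                for _ in range(n_chains))
    return math.sqrt(total / n_chains)


for n in (25, 100, 400):
    print(f"N_s = {n:4d}  sampled = {rms_end_to_end(n):6.2f}  a*sqrt(N_s) = {math.sqrt(n):6.2f}")
```

Quadrupling the number of segments should roughly double the sampled distance, up to statistical noise from the finite number of chains.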


It is not possible to apply (C2.1.1) down to the level of monomers and replace $N_s$ by the degree of polymerization
$N$ and $a^2$ by the sum of the squares of the bond lengths in the monomer, $\sum_i l_i^2$, because the chemical constitution
imposes some stiffness on the chain on the length scale of a few monomer units. This effect is accounted for by
introducing the characteristic ratio $C_\infty$, defined as $C_\infty = \langle R^2 \rangle / (N \sum_i l_i^2)$. The characteristic ratio can be
determined from viscosity or scattering measurements.


Light scattering techniques play an important role in polymer characterization. In very dilute solution, where the
polymer chains are isolated from one another, the inverse of the scattering function $S(q)$ can be expressed in the
limit of vanishing scattering vector $q \to 0$ as [12]

$$S^{-1}(q) = N^{-1}\left(1 + \frac{q^2 \langle R^2 \rangle}{18} + \cdots\right). \tag{C2.1.3}$$

Thus, by plotting $S^{-1}$ as a function of $q^2$ in the limit of small $q$ the mean square of the end-to-end distance can be
obtained. For large values of $q$, $q^2 \langle R^2 \rangle \gg 1$,

$$S(q) \simeq \frac{12N}{q^2 \langle R^2 \rangle}, \tag{C2.1.4}$$

so that the characteristic ratio can be evaluated from the plateau value of $S(q)q^2$ at large $q$ [12].
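
Both limits can be checked against the closed-form scattering function of an ideal chain. The sketch below assumes the Debye function, which is the standard result for Gaussian chains but is not written out in the text, together with the Gaussian-chain relation that the squared radius of gyration is one sixth of the mean square end-to-end distance. The chain parameters are hypothetical.

```python
import math


def debye(x):
    """Debye function f_D(x) = 2 (exp(-x) + x - 1) / x^2, with x = q^2 R_g^2."""
    if x < 1e-6:
        return 1.0 - x / 3.0  # leading terms of the series, avoids cancellation
    return 2.0 * (math.exp(-x) + x - 1.0) / (x * x)


def scattering_function(q, n, mean_sq_r):
    """S(q) of an ideal chain of n units, normalized so that S(0) = n."""
    x = q * q * mean_sq_r / 6.0  # R_g^2 = <R^2> / 6 for a Gaussian chain
    return n * debye(x)


N = 1000.0          # hypothetical number of units
MEAN_SQ_R = 1000.0  # hypothetical <R^2> in units of the squared step length

q_small, q_large = 0.01, 1.0
inverse_small = N / scattering_function(q_small, N, MEAN_SQ_R)
plateau = scattering_function(q_large, N, MEAN_SQ_R) * q_large ** 2

print(f"N/S at small q: {inverse_small:.6f}  vs  1 + q^2<R^2>/18 = "
      f"{1.0 + q_small ** 2 * MEAN_SQ_R / 18.0:.6f}")
print(f"S q^2 at large q: {plateau:.3f}  vs  12 N/<R^2> = {12.0 * N / MEAN_SQ_R:.3f}")
```

The small-q comparison reproduces the linear term used to extract the mean square end-to-end distance, and the large-q product approaches the plateau from which the characteristic ratio is read off.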

C2.1.3.3 EXCLUDED-VOLUME EFFECTS

The Gaussian chain model considers only the interactions between neighbouring monomers along the chains, 
which determine the characteristic ratio, but neglects the fact that two monomers distant along the chain by more 
than the persistence length cannot occupy the same volume. At first sight this shortcoming might seem intolerable. 
However, exactly this situation, where excluded-volume effects between distant monomers vanish, holds in poor 
solvents and in the melt. In a poor solvent, the interactions between the solvent molecules and the monomers are 
not favourable and the chain tends to contract onto itself. Solutions in which the poorness of the solvent exactly
compensates for the excluded-volume effects are called theta solutions, or solutions at Θ conditions [13, 14].


Theta conditions in dilute polymer solutions are similar to the state of van der Waals gases near the Boyle
temperature. At this temperature, excluded-volume effects and van der Waals attraction compensate each other, so
that the second virial coefficient of the expansion of the pressure as a function of the concentration vanishes. In
dealing with solutions, the quantity of interest becomes the osmotic pressure $\Pi$ rather than the pressure. Its virial
expansion may be written as

$$\Pi = RTc_w\left(\frac{1}{M_n} + A_2 c_w + A_3 c_w^2 + \cdots\right), \tag{C2.1.5}$$

where $c_w$ is the weight concentration of the polymer. Since the interactions between the monomers and the solvent
are temperature dependent, the conditions at which the second virial coefficient vanishes can be found by varying
the temperature. The Θ conditions of a dilute solution of polystyrene in cyclohexane occur at T = 35 °C and ambient
pressure (figure C2.1.5).

Figure C2.1.5. Reduced osmotic pressure $\Pi M_n/(RTc_w)$ as a function of the weight concentration $c_w$ of
polystyrene ($M_n$ = 130 000 g mol$^{-1}$) in cyclohexane at different temperatures. At T = 35 °C and ambient pressure,
the solution is at the Θ conditions. (Figure from [74], reprinted by permission of EDP Sciences.)


C2.1.3.4 EXPANDED CHAINS 

Polymer chains at low concentrations in good solvents adopt more expanded conformations than ideal Gaussian
chains because of the excluded-volume effects. A suitable description of expanded chains in a good solvent is
provided by the 'self-avoiding random walk' model. Flory [15] showed, using a mean field approximation, that the
root mean square of the end-to-end distance of an expanded chain scales as

$$\sqrt{\langle R^2 \rangle} = N_s^{3/5} a_F, \tag{C2.1.6}$$

where $a_F = a$ in the Θ state. These results were later put on a firmer theoretical basis by de Gennes [16]. Using the
same arguments as in the case of Gaussian chains, the fractal dimension of an expanded chain is found to be
five-thirds and the scaling behaviour of the scattering function for $q\sqrt{\langle R^2 \rangle} \gg 1$ is

$$S(q) \propto \frac{1}{q^{5/3}}, \tag{C2.1.7}$$

compared to $S(q) \propto q^{-2}$ in the case of Gaussian chains (equation (C2.1.4)).
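
The three-fifths exponent can be recovered from a short free-energy balance. The sketch below is the standard textbook form of the mean-field argument rather than a transcription of [15]: numerical prefactors are dropped, and the excluded volume per segment $v$ and the thermal energy $k_{\mathrm B}T$ are symbols introduced here for illustration only.

```latex
% Entropic cost of stretching a chain of N_s segments to size R, plus the
% mean-field penalty for binary segment-segment contacts inside the volume R^3:
\frac{F(R)}{k_{\mathrm B}T} \simeq \frac{R^2}{N_s a^2} + v\,\frac{N_s^2}{R^3}

% Minimizing with respect to R:
\frac{\partial}{\partial R}\,\frac{F}{k_{\mathrm B}T}
  = \frac{2R}{N_s a^2} - \frac{3\,v\,N_s^2}{R^4} = 0
\quad\Longrightarrow\quad
R^5 = \tfrac{3}{2}\,v\,a^2 N_s^3
\quad\Longrightarrow\quad
R \propto N_s^{3/5}

% Inverting the size-length relation gives the number of segments
% inside a sphere of radius r, i.e. a fractal dimension of 5/3:
n_s \propto r^{5/3}
```

The same inversion that gave a fractal dimension of two for the Gaussian chain thus gives five-thirds here, which is the exponent appearing in the scattering law.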

The neutron scattering data of figure C2.1.6 show that if the excluded-volume effects are activated by increasing
the temperature from the Θ point, and thus increasing the goodness of the solvent, the transition from the Gaussian
chain behaviour into that of an expanded chain depends on the length scale. The transition from the regime
$S^{-1} \propto q^2$ to $S^{-1} \propto q^{5/3}$ occurs first for low values of $q$, i.e. for long distances, while below a
temperature-dependent length, referred to as the thermic correlation length, the chain is still Gaussian. The whole
chain may be seen as a self-avoiding walk of Gaussian blobs (figure C2.1.7).

Figure C1.2.6. Summary of fcc [60]fullerene structure and alkali-intercalation composites of [60]fullerene.

Figure C2.1.7. Schematic drawing of a polymer chain that behaves as a Gaussian chain on short length scales and
as an expanded chain on longer length scales. The cross-over distance corresponds to the size of the pearls.

C2.1.3.5 ROTATIONAL ISOMERIC STATE MODELS 

The quantitative description of many properties of polymer chains, such as energy, entropy and detailed 
conformation, necessitates the consideration of the monomer structure and its influence on the possible states of the 
torsional angles. The rotational isomeric state (RIS) theory focuses on such detailed descriptions of a polymer chain 
[9, 17, 18]. The RIS theory takes into account only the interactions between nearest-neighbour monomers and 
assumes that each torsional angle can take only a few possible conformations. The nearest-neighbour interactions 
control the populations of the few allowed conformations. Using these approximations, the statistical weight of a 
given chain conformation can be calculated from the sum of the pair interactions; by summing over all the 
conformations, the partition function of a polymer chain can be obtained. From the partition function and the 
conformational Helmholtz energy, estimates of the conformational entropy and internal energy can be calculated. 
The RIS theory also provides a simple way to calculate quantities such as the characteristic ratios $C_\infty$, the mean
square end-to-end distance $\langle R^2 \rangle$ and the scattering curves.
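
As an illustration of how such a partition function is summed in practice, the sketch below uses a transfer (statistical weight) matrix for a polyethylene-like chain with three states per torsion: trans, gauche+ and gauche-. This is one common realization of the RIS idea, assumed here for concreteness. The gauche energy is taken from the range quoted earlier for alkanes, while the extra penalty for adjacent gauche states of opposite sign is a hypothetical illustrative value.

```python
import math

GAS_CONSTANT = 8.314  # J mol^-1 K^-1


def statistical_weight_matrix(e_gauche, e_opposite_pair, temperature):
    """Weights for (previous torsion -> current torsion); states are t, g+, g-."""
    sigma = math.exp(-e_gauche / (GAS_CONSTANT * temperature))
    omega = math.exp(-e_opposite_pair / (GAS_CONSTANT * temperature))
    return [[1.0, sigma, sigma],
            [1.0, sigma, sigma * omega],
            [1.0, sigma * omega, sigma]]


def partition_function(n_torsions, weights):
    """Sum the statistical weights of all 3**n_torsions conformations."""
    # Start from a fictitious trans state preceding the first torsion.
    row = [1.0, 0.0, 0.0]
    for _ in range(n_torsions):
        row = [sum(row[i] * weights[i][j] for i in range(3)) for j in range(3)]
    return sum(row)


def helmholtz_energy(n_torsions, weights, temperature):
    """Conformational Helmholtz energy A = -RT ln Z in J/mol."""
    return -GAS_CONSTANT * temperature * math.log(partition_function(n_torsions, weights))


# Assumed energies: 2.5 kJ/mol per gauche state, 8 kJ/mol extra for g+g- pairs.
weights = statistical_weight_matrix(2500.0, 8000.0, 298.0)
print(f"Z(10 torsions) = {partition_function(10, weights):.2f}")
print(f"A(10 torsions) = {helmholtz_energy(10, weights, 298.0) / 1000.0:.2f} kJ/mol")
```

Setting the pair penalty to zero makes the torsions independent, in which case the result reduces to the single-torsion partition function raised to the number of torsions, a convenient check on the matrix product.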
C2.1.3.6 POLYELECTROLYTES 

The polymers considered hitherto were electrically neutral molecules in neutral solvents. If the monomers contain 
functional groups that can dissociate into ions in polar solvents, the behaviour of the polymer chain is significantly 
influenced by the long-range Coulombic interactions [19]. For example, if the ionic strength of the solvent is 
increased by adding some salt, the repulsive interactions between the like charges along the polyelectrolyte
chain are reduced and the polymer shrinks. Concomitantly, the viscosity of the solution, which largely depends on
the overall polymer dimension, diminishes drastically. Polyelectrolytes have also attracted much attention as ionic
conductors in the development of light and efficient dry batteries [20].

C2.1.4 SOLUTION, MELT AND GLASS

C2.1.4.1 DILUTE AND SEMI-DILUTE SOLUTIONS 

In dilute solutions, the polymer chains are isolated from one another and only interact during brief encounters. With
increasing polymer concentration, a point is reached where the chains start to overlap. This point, $c^*$, is referred
to as the critical concentration of monomers at the overlap limit and can be approximated as $c^* = N/R_F^3$, with
$R_F = \sqrt{\langle R^2 \rangle}$ in a good solvent. Using equation (C2.1.6), the volume fraction of the polymer at the overlap
limit is $\phi^* \propto N^{-4/5}$ [21]. Thus, for a polymer with $N = 10^4$, the chains are isolated in solution only if the
polymer volume fraction is less than about 0.001.
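
The scaling of the overlap threshold follows in two lines from the size of an expanded chain. The sketch below drops all numerical prefactors and assumes, for illustration only, that the volume per monomer is of the order of $a_F^3$ and that the number of segments is proportional to the degree of polymerization.

```latex
% Monomer concentration inside the volume pervaded by one expanded chain:
c^* \simeq \frac{N}{R_F^3}, \qquad R_F \simeq a_F N^{3/5}
\quad\Longrightarrow\quad
c^* \simeq \frac{N^{1 - 9/5}}{a_F^3} = \frac{N^{-4/5}}{a_F^3}

% Volume fraction, taking a monomer volume of order a_F^3:
\phi^* \simeq c^*\,a_F^3 \propto N^{-4/5}

% Order of magnitude for N = 10^4:
\phi^* \sim \left(10^{4}\right)^{-4/5} = 10^{-3.2} \approx 6 \times 10^{-4}
```

The estimate is of the order of one part in a thousand, which shows how little polymer is needed before chains of this length begin to interpenetrate.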

In discussion of the solution behaviour, the osmotic pressure is a quantity of primary interest. According to
equation (C2.1.5), the osmotic pressure at very low concentrations is inversely proportional to the molecular weight
of the polymer. This behaviour is indeed observed in figure C2.1.8. With increasing polymer concentration, the
dependence of the osmotic pressure on the molecular weight vanishes and only the weight concentration of the
polymer is relevant. In this regime the chains strongly overlap and the osmotic pressure scales as $\Pi/RT \propto \phi^{9/4}$.
Figure C2.1.8. Reduced osmotic pressure $\Pi/(RTc_w)$ as a function of the polymer weight concentration $c_w$ for
solutions of poly(α-methylstyrene) in toluene at 25 °C. The molecular weight of poly(α-methylstyrene) varies
between $M_n = 7 \times 10^4$ g mol$^{-1}$ (uppermost curve) and $M_n = 7.47 \times 10^6$ g mol$^{-1}$ (lowest curve). (Figure from [76],
reprinted by permission of the American Chemical Society.)

The inverse scattering function of dilute polymer solutions for small scattering wavenumber ($qR_F \ll 1$) obeys the
so-called Zimm relation [22]:

$$\frac{1}{S(q, c_w)} = \frac{1}{N}\left(1 + \frac{q^2 R_F^2}{18} + \cdots\right) + 2A_2 c_w + \cdots. \tag{C2.1.8}$$

By extrapolating the measurement of $1/S(q, c_w)$ to $q \to 0$ and $c_w \to 0$, $A_2$ and $R_F$ can be measured. The behaviour
of the osmotic pressure and that of the scattering function can be satisfactorily explained in the dilute regime, the
cross-over region and the semi-dilute range by using only three parameters, namely, the polymer concentration $c_w$,
the Flory radius $R_F$ and the thermic correlation length. Beyond the semi-dilute range, about $\phi > 0.1$, the description
based on the three parameters only is no longer valid.

C2.1.4.2 MELTS AND SCREENING EFFECT 

Consider the monomer-density distribution for a single chain averaged over all possible conformations. The
distribution has a bell-like shape centred on the centre of mass of the chain. With this picture in mind,
the excluded-volume effect encountered in good solvents can be understood as an entropic force which acts on the
monomers in the direction of lower concentration. Overall, these forces lead to an expansion of the polymer chain.
In the melt, the monomer density is constant since the monomers of the other chains compensate for the density
gradient of a particular chain. This compensation is referred to as the 'screening effect' and is responsible for the
Gaussian behaviour of chains in the melt. These qualitative arguments [23] are expressed quantitatively in the fact
that, in polymer melts, the second virial coefficient of the local osmotic pressure scales as $1/N$ and thus vanishes
for long chains [24].


In summary, we see now how the change from the expanded chains in dilute solutions to the ideal chains in a melt
is accomplished. With increasing polymer concentration, the chain overlap increases and the length scale over
which excluded-volume effects are screened decreases. As the screening length decreases to the thermic correlation
length, all excluded-volume effects disappear. Simultaneously, the polymer shrinks to the size of a Gaussian chain.

C2.1.4.3 GLASSY STATE 

Upon cooling the melt of a polymer that cannot crystallize, the system becomes glassy, i.e. hard and void of
long-range order. In figure C2.1.9(a) the specific volume of poly(vinyl acetate) is plotted as a function of the
temperature. The specific volume is measured upon heating the sample after having quenched it to -20 °C. Clearly,
near T = 30 °C the slope of the graphs changes. The change in slope from the glassy region to the melt is not
accompanied by a discontinuity, as in the case of the fusion of crystalline materials. This type of transition, with
continuous specific volume or heat capacity, is typical of glassy materials and the temperature at which the linear
extrapolations of the measured quantity cross defines the glass temperature $T_g$. However, the exact value of the
glass transition temperature depends on the rate of temperature change at which the measurement is performed.
This dependence on the measurement kinetics supports the view that no real phase transition occurs, but rather that
the system 'freezes' in a non-equilibrium state.
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Figure C2.1.9. Specific volume of poly(vinyl acetate) as a function of the temperature measured during heating 
two samples which were preliminary quenched from the melt to -20 °C. One sample was stored for 1 min and the 
other for 100 h at -20 °C before heating. (Figure from [77], reprinted by permission of John Wiley and Sons Inc). 

The variation of the specific volume as a function of heating in figure C2.1.9 is plotted for two samples which were 
stored for different times after a quench to -20 °C. The specific volume in the glassy region and the glass 
temperature depend on storage time. This dependence shows that, on a practical time scale, glassy polymers are not 
in equilibrium. A second point of interest about the structure of glassy polymers is the presence of local order. 
Some experimental results suggest the occurrence of more local order in the glass than that found in the melt, 
where the chains behave as Gaussian coils with hardly any specific order between each other. However, many
investigations support the view that very little difference exists between the average conformation of polymer chains
in the glass and that in the melt [25].


Several interpretations of the glass transition have been put forward. They can be grouped into three main 
categories [26]. First, some theories are based on the concept of the free volume in the form of voids as a 
requirement for the onset of cooperative motion. Upon cooling of the melt, the free volume is continuously 
squeezed out of the system. The transition into the glassy state occurs when no free volume is left. Second, kinetic
theories of the glass transition relate the onset of the segmental motion of the polymer to the transition from the
glassy state to the melt. Third, thermodynamic theories are based on the extrapolation to the glass transition at
infinitely long measurement times. In this hypothetical regime, $T_g$ becomes independent of the experimental
procedure and a second-order transition occurs between two phases at equilibrium. Each of the theories can predict
some of the observed changes at the glass temperature. However, none can explain them all [26].

C2.1.5 THERMODYNAMICS AND PHASE TRANSITION OF POLYMER MIXTURES

Often it is difficult to achieve the desired mechanical properties with a system made of a single polymer.
Polystyrene, for example, possesses good stiffness but has a low impact strength: it is a brittle material. By
mixing small amounts of polybutadiene with polystyrene, a tough material is obtained which retains satisfactory
stiffness. Such a blend has a two-phase structure, in which
polybutadiene droplets are embedded in the polystyrene matrix. Other binary systems with good mechanical
properties, like blends of polystyrene and poly(dimethylphenylene oxide), form homogeneous structures. In order 
to correlate the mechanical properties with the structures of the blend, it is of primary interest to understand the 
formation of the different types of structures. 

C2.1.5.1 FLORY-HUGGINS THEORY 

The Flory-Huggins theory of polymer mixtures [15, 27] is based on two main assumptions: first, the screening of 
excluded-volume effects renders all polymer chains effectively Gaussian; second, the interaction between the
monomers of different chains can be treated using a mean field approximation, i.e. all monomers feel, on average, 
the same environment. These assumptions allow the derivation of the Gibbs energy of mixing of a binary polymer 
system: 


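$$\Delta G_{\mathrm{mix}} = RT\, n_c \left( \frac{\phi_A}{N_A} \ln \phi_A + \frac{\phi_B}{N_B} \ln \phi_B + \chi\, \phi_A \phi_B \right) \qquad \text{(C2.1.9)}$$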

where $n_c$ is the number of reference units, approximately equal to the number of monomers, $\phi_A$ and $\phi_B$ are the
volume fractions of polymers A and B, and $N_A$ and $N_B$ are the degrees of polymerization. The quantity $\chi$ is the
Flory-Huggins parameter; it is dimensionless and determines in an empirical manner the change of the local Gibbs energy
per reference unit upon mixing [15, 27]. The first two terms in the parentheses arise from the increase of the
translational entropy of the centres of mass of the chains after mixing, the third from the interactions between the 
chain segments. 

The gain of translational entropy in mixing macromolecules is small compared to that achieved in mixing
low-molecular-weight compounds. This poor gain manifests itself in the occurrence of the degrees of polymerization in
the denominators of the two entropy terms in equation (C2.1.9). Consequently, homogeneous mixing occurs only
when $\chi$ is negative, or positive but small. In figure C2.1.10(a) the Gibbs energy of mixing of two polymers with the
same degree of polymerization $N$ is plotted for different values of the product $\chi N$. If $\chi N < 2$, mixing occurs at any
value of $\phi_A$. For $\chi N > 2$, the shape of $\Delta G_{\mathrm{mix}}$ as a function of $\phi_A$ changes. Consider the curve for $\chi N = 2.4$ at
$\phi_A = 0.4$; the system can lower its Gibbs energy by separating into two homogeneous phases with volume fractions
$\phi_A'$ and $\phi_A''$. A similar behaviour occurs for all values of $\chi$ with $\chi N > 2$. The ensemble of volume fractions resulting
from the phase separation forms the binodal (figure C2.1.10(b)) [28]. Within the binodal, the system has a two-phase
structure. If the condition $N_A = N_B = N$ is relaxed, the graphs in figure C2.1.10 become asymmetric, but the
essential features qualitatively remain.

Figure C2.1.10. (a) Gibbs energy of mixing as a function of the volume fraction of polymer A for a symmetric
binary polymer mixture, $N_A = N_B = N$. The curves are obtained from equation (C2.1.9). (b) Phase diagram of a
symmetric polymer mixture, $N_A = N_B = N$. The full curve is the binodal and delimits the homogeneous region from
that of the two-phase structure. The broken curve is the spinodal.
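
As a numerical illustration of equation (C2.1.9), the following Python sketch evaluates the reduced Gibbs energy of mixing, $\Delta G_{\mathrm{mix}}/(RT n_c)$, and locates the binodal of a symmetric blend ($N_A = N_B = N$) by bisection. It relies on the symmetry of the curve about $\phi_A = 0.5$, for which the common tangent is horizontal and the coexisting compositions are simply the two minima. The function names, the bracketing interval and the tolerance are illustrative assumptions and do not come from the original text.

```python
import math

def reduced_dg_mix(phi_a, chi, n_a, n_b):
    """Reduced Gibbs energy of mixing, Delta G_mix / (R T n_c), from equation (C2.1.9)."""
    phi_b = 1.0 - phi_a
    return ((phi_a / n_a) * math.log(phi_a)
            + (phi_b / n_b) * math.log(phi_b)
            + chi * phi_a * phi_b)

def symmetric_binodal(chi_n, tol=1e-12):
    """Lower coexisting composition for N_A = N_B = N; None if chi*N <= 2 (no demixing)."""
    if chi_n <= 2.0:
        return None

    def slope(phi):
        # N times the first derivative of the reduced Gibbs energy (symmetric case)
        return math.log(phi / (1.0 - phi)) + chi_n * (1.0 - 2.0 * phi)

    lo, hi = 1e-12, 0.5 - 1e-6   # slope < 0 at lo, slope > 0 just below 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 100                 # hypothetical degree of polymerization
chi = 2.4 / n           # so that chi * N = 2.4, as in the example of the text
phi_low = symmetric_binodal(chi * n)
phi_high = 1.0 - phi_low
print(f"coexisting volume fractions: {phi_low:.3f} and {phi_high:.3f}")
print(f"reduced Gibbs energy at phi_A = 0.4: {reduced_dg_mix(0.4, chi, n, n):.6f}")
print(f"reduced Gibbs energy on the binodal: {reduced_dg_mix(phi_low, chi, n, n):.6f}")
```

Because both coexisting phases have the same reduced Gibbs energy in the symmetric case, the value on the binodal is also the Gibbs energy of the demixed system, which lies below that of the homogeneous mixture at $\phi_A = 0.4$.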

In many applications the phase structure as a function of the temperature is of interest. The discussion of this issue
requires knowledge of the temperature dependence of the Flory-Huggins parameter $\chi(T)$. If the interactions
between the different monomers depend only weakly on temperature, $\chi$ is proportional to $T^{-1}$ because of the prefactor
$RT$ in (C2.1.9). In this situation, the $\chi N$ axis in figure C2.1.10(b) indicates decreasing temperature and the region where
the separation into a two-phase structure occurs is referred to as the 'lower miscibility gap'. In some cases, $\chi$
increases with increasing temperature with a sign change from negative to positive. Such a temperature dependence
of the Flory-Huggins parameter leads to the existence of an 'upper miscibility gap'.

It is not always possible to describe the variations of $\Delta G_{\mathrm{mix}}(\phi_A, T)$ with $\chi$ depending only on temperature. For
some systems, the dependence of $\chi$ on the volume fraction must be considered too. However, the Flory-Huggins
theory provides a very useful insight into the structural behaviour of polymer mixtures.

C2.1.5.2 MECHANISMS OF PHASE SEPARATION 

Phase separation of a polymer mixture is induced when the conditions change from the one-phase region into a 
miscibility gap. Usually, phase separation is provoked by a change in temperature. Two mechanisms of phase 
separation are known: 'nucleation and growth' and 'spinodal decomposition' [29]. Nucleation and growth 
processes start with the formation of the nuclei of a new composition. Subsequently, domains of the new 
composition grow from these nuclei. Systems that undergo spinodal decomposition behave differently. The sizes of 
the domains with the new compositions do not vary much until the later stages of the phase separation process. In
contrast, the compositions of the new phases change continuously.

The occurrence of one or the other phase-separation mechanism can be predicted from the consideration of the
change of the local Gibbs energy, $\delta g$, associated with a spontaneous fluctuation of the local composition about the
composition of the homogeneous mixture $\phi_{A0}$. If $\delta g$ is positive, restoring forces will bring the system back to the
homogeneous composition. In this regime, phase separation has to overcome an energy barrier; this is achieved in
the nucleation step before the growth of the domains with a new composition. If $\delta g$ is negative, no restoring force
exists and the amplitude of the composition fluctuation grows so that the system separates into different phases
spontaneously. Approximately, the sign of $\delta g$ is given by that of the second derivative of $\Delta G_{\mathrm{mix}}(\phi_A)$ at $\phi_{A0}$.
Therefore, the locus of the values $\phi_A$ with a vanishing second derivative of $\Delta G_{\mathrm{mix}}$ delimits the region of the
miscibility gap in which spinodal decomposition occurs. This locus is referred to as the spinodal (figure C2.1.10(b)).
The length scale of the concentration fluctuations at the beginning of the separation process is controlled by
the distance from the spinodal. If a system is brought into the region between the spinodal and the binodal, phase
separation follows the nucleation and growth mechanism.
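
For the Flory-Huggins Gibbs energy of mixing, the spinodal condition can be written out explicitly. The short derivation below is a sketch added for illustration; it assumes that the Flory-Huggins parameter $\chi$ does not depend on composition and uses $\phi_B = 1 - \phi_A$. The symbol $\chi_s$ for the value of $\chi$ on the spinodal is introduced here and is not part of the original notation.

```latex
\begin{align*}
\frac{\Delta G_{\mathrm{mix}}}{RT\,n_c}
  &= \frac{\phi_A}{N_A}\ln\phi_A + \frac{\phi_B}{N_B}\ln\phi_B + \chi\,\phi_A\phi_B, \\
\frac{1}{RT\,n_c}\frac{\partial \Delta G_{\mathrm{mix}}}{\partial \phi_A}
  &= \frac{1}{N_A}\left(\ln\phi_A + 1\right) - \frac{1}{N_B}\left(\ln\phi_B + 1\right) + \chi\,(1 - 2\phi_A), \\
\frac{1}{RT\,n_c}\frac{\partial^2 \Delta G_{\mathrm{mix}}}{\partial \phi_A^2}
  &= \frac{1}{N_A\phi_A} + \frac{1}{N_B\phi_B} - 2\chi
  \;=\; 0
  \quad\Longrightarrow\quad
  \chi_s = \frac{1}{2}\left(\frac{1}{N_A\phi_A} + \frac{1}{N_B(1-\phi_A)}\right).
\end{align*}
```

For a symmetric mixture, $N_A = N_B = N$, this gives $\chi_s N = 1/[2\phi_A(1-\phi_A)]$, whose minimum value of 2 is reached at $\phi_A = 0.5$, in agreement with the threshold $\chi N = 2$ for the onset of phase separation.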

C2.1.5.3 BLOCK COPOLYMERS 

In block copolymers [8, 30], long segments of different homopolymers are covalently bonded to each other. A 
large part of synthesized compounds are di-block copolymers, which consist only of two blocks, one of monomers 
A and one of monomers B. Tri- and multi-block assemblies of two types of homopolymer segments can be 
prepared. Systems with three types of blocks are also of interest, since in ternary systems the mechanical properties 
and the material functionality may be tuned separately. 

Similarly to polymer mixtures, block copolymers can form a homogeneous phase but also separate into phases of
different compositions. However, the presence of covalent bonds between the different blocks has important
consequences for the structural arrangement after phase separation. Each of the different types of monomer
segregates and almost pure domains are formed, but the domains have mesoscopic dimensions corresponding to the
sizes of the blocks. Furthermore, since the block lengths usually have uniform sizes, the arrangement of the
different domains is ordered. In the case of di-block copolymers the type of order depends on the ratio of the
degrees of polymerization of block A and block B (figure C2.1.11). Tri- and multi-block binary systems exhibit
qualitatively the same phase behaviour as di-block copolymers. Changes occur for ternary systems. Their structures
still exhibit periodic order, but the lattices are more complex [30].

Figure C2.1.11. Morphologies of a microphase-separated di-block copolymer as a function of the volume fraction of
one component. The values here refer to a polystyrene-polyisoprene di-block copolymer and $\phi_{PS}$ is the volume
fraction of the polystyrene blocks. OBDD denotes the ordered bicontinuous double diamond structure. (Figure
from [78], reprinted by permission of Annual Reviews.)

C2.1.6 PARTIALLY CRYSTALLINE POLYMERS


C2.1.6.1 SEMI-CRYSTALLINE STRUCTURES 


Polymers with a regular configuration can crystallize. However, because of the presence of chain entanglements,
chain defects and chains with different molecular weights, only partial crystallization occurs and an amorphous part
always remains. Semi-crystalline polymers exhibit a hierarchical structure. On the molecular level, the chains are
fully stretched or form regular helices which pack parallel to each other into lamellar crystallites. The chain axes
are perpendicular to the lamellar plane and the lamellar thickness is of the order of 10 nm. In single-polymer
crystals prepared from highly dilute solutions, the chain folds back in a regular manner and forms the adjacent
helix. In most systems, however, the chain may also re-enter the same lamella at a distant position or leave it
for good and possibly participate in another lamella after crossing an amorphous layer. Many lamellae, separated by
amorphous interstices, form larger spherical assemblies called spherulites (figure C2.1.12). These assemblies have
sizes from a few micrometres up to centimetres. The directions of the polymer chains or helices are always 
perpendicular to the radius of the spherulite.

Figure C2.1.12. Schematic drawing of the cross section through a spherulite. The lines indicate the connectivity of
the crystalline lamellae. The inner structure of a lamella is also shown and consists of parallel polymer chains with 
their axes perpendicular to the spherulite radius. 

The content of crystalline material in a system can be characterized by several techniques. Density measurements 
provide the volume fraction occupied by the crystallites. Measuring the heat of fusion provides the weight fraction 
of the crystalline domains. More details on the structure of semi-crystalline polymers can be obtained from electron 
microscopy, scattering techniques, Raman spectroscopy and NMR [31] . It is important to realize that the formation 
of crystalline polymeric structures is hindered because of the length of the chains and the presence of 
entanglements. Therefore, structure formation is governed by kinetic criteria rather than equilibrium
thermodynamics: the structure that develops at a given temperature is that with the maximal growth rate rather than
that with the lowest Gibbs energy.

C2.1.6.2 PRIMARY CRYSTALLIZATION

The usual and therefore most important situation where polymers crystallize is in melts cooled below the point of 
the fusion of a crystallite of infinite dimensions. Then, crystallization occurs by the nucleation and growth of 
spherulites. Another crystallization process is sometimes encountered in oriented melts and glasses. In such 
systems, the crystallization seems to occur at once in the whole sample and not at the interface between the 
growing crystallites and the amorphous matrix. Despite numerous studies, the crystallization process is not fully 
understood. Scattering measurements suggest a preliminary spinodal decomposition of the undercooled isotropic
melt into phases with and without chain ends and chain defects before the formation of the crystallites [32].

In supercooled isotropic melts, crystallite growth occurs after nucleation. Initially, the lateral size of the lamellae
increases at a constant rate, but as soon as the spherulites start to touch each other, the growth rate decreases.
Overall, the sigmoidal time dependence of the volume fraction of the crystalline regions is well described by the
Avrami equation, $\phi_c \propto 1 - \exp(-(zt)^{\beta})$, in which $z$ is related to the rate constant of the lateral growth of the lamellae
and $\beta$ is a phenomenological parameter called the 'Avrami exponent'. The rate constant of the lateral growth is
controlled by the balance of two factors, namely the thermodynamic driving force of crystallization and the
mobility of the chain segments. The former increases with undercooling, the latter decreases, so that a maximal rate
of crystallization exists at a given undercooling. The minimal thickness of the lamellae also depends on the
undercooling: it decreases with increasing undercooling.
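
The sigmoidal shape described by the Avrami equation can be made concrete with a few lines of Python. The sketch below evaluates the relative crystalline fraction $1 - \exp(-(zt)^{\beta})$, normalized to unity at long times, and the time at which half of the final crystallinity is reached. The numerical values of the rate parameter and the Avrami exponent are hypothetical and chosen only for illustration.

```python
import math

def avrami_fraction(t, z, beta):
    """Relative crystalline fraction 1 - exp(-(z t)^beta), normalized to 1 at long times."""
    return 1.0 - math.exp(-((z * t) ** beta))

def half_time(z, beta):
    """Time at which half of the final crystallinity is reached: (ln 2)^(1/beta) / z."""
    return math.log(2.0) ** (1.0 / beta) / z

z, beta = 0.01, 3.0   # hypothetical values: z in 1/s, beta dimensionless
for t in (0.0, 50.0, half_time(z, beta), 200.0):
    print(f"t = {t:7.1f} s   relative crystallinity = {avrami_fraction(t, z, beta):.3f}")
```

A larger Avrami exponent makes the transformation curve steeper around the half time, while the rate parameter only rescales the time axis.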

The study of the crystallization of oriented melts and glasses shows that density variations on the length scale of 
tens of nanometres, as measured with small-angle x-ray scattering, occur before Bragg peaks appear in wide-angle 
x-ray scattering measurements [33]. These observations exclude a lateral growth of crystalline lamellae and support 
a continuous phase separation of the whole sample into crystalline and amorphous domains. This process is similar 
to that found in spinodal decomposition of polymer mixtures. The reason for this alternative crystallization 
mechanism might be related to an increased mobility of defects along the preoriented chains. 


C2.1.6.3 SECONDARY CRYSTALLIZATION 

After completion of the primary crystallization at a given temperature, crystallization does not come to an end, but 
resumes upon cooling. Two modes of secondary crystallization have been identified [34]. The more common one 
consists of the 'insertion' of new crystallites between those formed during primary crystallization. The inserted 
crystallites have a smaller thickness because they are formed at a lower temperature. The second mode is called
'surface crystallization': the thickness of the existing crystallites increases with decreasing temperature. This is possible if the mobility is still
high enough so that the defects concentrated at the crystal surface after primary crystallization can move further
into the amorphous domains. Then, because of the higher thermodynamic driving force for crystallization due to the
lower temperature, the crystal thickness increases.


C2.1.7 POLYMER DYNAMICS AND MECHANICAL BEHAVIOUR 

Polymers have found widespread applications because of their mechanical behaviour. They combine the 
mechanical properties of elastic solids and viscous fluids. Therefore, they are regarded as viscoelastic materials.
Viscoelastic behaviour does not mean a simple superposition of the two properties, but includes a new phenomenon called
anelasticity in which elastic response and viscous flow are coupled. When a load is applied, part of the
deformation, although reversible, requires a certain time to occur.

C2.1.7.1 MICROSCOPIC DYNAMICAL MODELS 

(A) ROUSE MODEL 

A polymer chain can be approximated by a set of beads connected by springs. The springs account for the elastic
behaviour of the chain and the beads are subject to viscous forces. In the Rouse model [35], the elastic force due to
a spring connecting two beads is $f = b_R \Delta r$, where $\Delta r$ is the extension of the spring and the spring constant is
$b_R = 3kT/a_R^2$; $a_R$ is the root-mean-square distance between two successive beads. The viscous force that acts on a bead is
the product of the bead velocity $u$ and the friction coefficient $\zeta_R$ of a bead. With these assumptions, one finds for
the slowest relaxation mode of the Rouse chain [35], which corresponds to the motion of the end-to-end vector,

$$\tau_R = \frac{\zeta_R \langle R^2 \rangle^2}{3\pi^2 a_R^2\, kT} \qquad \text{(C2.1.10)}$$

Since $\langle R^2 \rangle \propto N$, we have $\tau_R \propto N^2$. It is important to note that $\tau_R$ should be independent of the length of the Rouse
unit given by $a_R$. Since $a_R^2$ is proportional to the number of monomer units between two beads, $\zeta_R$ should scale
equally. This is the case in a melt, but not for an isolated polymer chain in a solvent where hydrodynamic
interactions strongly affect the motion.

Using the fluctuation-dissipation theorem [36], which relates microscopic fluctuations at equilibrium to
macroscopic behaviour in the limit of linear response, the time-dependent shear modulus can be evaluated [37]:

$$G(t) = c_p kT \sum_{m=1}^{N_R - 1} \exp\left(-\frac{2t}{\tau_m}\right) \qquad \text{(C2.1.11)}$$

The summation extends over the $N_R - 1$ Rouse modes with relaxation times $\tau_m$, $m = 1, \ldots, N_R - 1$, and $c_p$ is the
number of polymer chains per unit volume. Replacing the sum in equation (C2.1.11) by an integral leads to $G(t) \propto t^{-1/2}$. From $G(t)$, the
viscosity at zero shear rate can be calculated [37], $\eta_0 = \int_0^\infty G(t)\,\mathrm{d}t$, which results in $\eta_0 \propto N$. This is indeed
found in many melts of polymer chains shorter than the entanglement molecular weight.
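
The two scaling statements for the Rouse model, $G(t) \propto t^{-1/2}$ at intermediate times and a zero-shear viscosity proportional to $N$, can be checked numerically. The Python sketch below assumes the Rouse-mode sum $G(t) = c_p kT \sum_m \exp(-2t/\tau_m)$ together with the low-mode approximation $\tau_m = \tau_R/m^2$ and $\tau_R \propto N^2$; the chain density is taken as a fixed bead density divided by the number of beads per chain. All function names and the reduced units are assumptions made for illustration.

```python
import math

def rouse_mode_times(n_beads, tau_bead=1.0):
    """Mode relaxation times tau_m = tau_R / m^2, with tau_R = tau_bead * n_beads^2."""
    tau_rouse = tau_bead * n_beads ** 2
    return [tau_rouse / m ** 2 for m in range(1, n_beads)]

def shear_modulus(t, n_beads, tau_bead=1.0, bead_density_kT=1.0):
    """G(t) = c_p kT * sum_m exp(-2 t / tau_m), with c_p = bead density / n_beads."""
    c_p_kT = bead_density_kT / n_beads
    return c_p_kT * sum(math.exp(-2.0 * t / tau)
                        for tau in rouse_mode_times(n_beads, tau_bead))

def zero_shear_viscosity(n_beads, tau_bead=1.0, bead_density_kT=1.0):
    """eta_0 = integral of G(t) dt from 0 to infinity = c_p kT * sum_m tau_m / 2."""
    c_p_kT = bead_density_kT / n_beads
    return c_p_kT * sum(0.5 * tau for tau in rouse_mode_times(n_beads, tau_bead))

# Logarithmic slope of G(t) between the fastest and the slowest relaxation time
t1, t2 = 1.0e2, 1.0e4
slope = math.log(shear_modulus(t2, 1000) / shear_modulus(t1, 1000)) / math.log(t2 / t1)
print(f"d log G / d log t at intermediate times: {slope:.3f}")

# Doubling the chain length should roughly double the viscosity
ratio = zero_shear_viscosity(200) / zero_shear_viscosity(100)
print(f"eta_0(2N) / eta_0(N): {ratio:.3f}")
```

The slope is close to, but not exactly, $-1/2$ because the discrete mode sum only approaches the integral in the window between the shortest and the longest relaxation time.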


(B) ENTANGLEMENT EFFECTS AND REPTATION MODEL 

With increasing molecular weight, polymer chains interpenetrate and become entangled. A critical molecular
weight at the entanglement limit, $M_c$, is defined, above which effects of entanglements become apparent. Two of
these effects are the occurrence of the rubber-elastic plateau in the mechanical-response functions (see section
C2.1.7.2) and a change in the dependence of the viscosity on the molecular weight.

Microscopically, entanglements mainly hinder the lateral motion of a polymer chain. On the basis of this idea, De
Gennes [38] and Doi and Edwards [39] proposed that the chain motion occurs in a tube formed by the obstacles set
by the adjacent chains (figure C2.1.13). The reptation model assumes that the average over the rapid wriggling
along the cross section of the tube defines the primitive path of length $l$ along which the chain has to diffuse. The
time $\tau_d$ required for the chain to leave the tube along the curvilinear path is $\tau_d = l^2/D$. Since the diffusion coefficient
along the path is $D = kT/(N_R \zeta_R)$ and $l \propto N$, $\tau_d \propto N^3$. Experimentally, it was found that $\tau_d \propto N^{\nu}$, with
$3.2 < \nu < 3.6$. The viscosity also scales as $\eta_0 \propto N^{\nu}$, in contrast to the situation below $M_c$, where $\eta_0 \propto N$. It is also
interesting to compare the dependence of the diffusion coefficient on the molecular weight: in the absence of
entanglements, $D \approx \langle R^2 \rangle / \tau_R \propto M^{-1}$, while in the presence
of entanglements, $D \approx \langle R^2 \rangle / \tau_d \propto N/N^3$, hence $D \propto M^{-2}$.

Figure C2.1.13. (a) Schematic representation of an entangled polymer melt. (b) Restriction of the lateral motion of
a particular chain by the other chains. The entanglement points that restrict the motion of a chain define a
temporary tube along which the chain reptates.

(C) HYDRODYNAMIC INTERACTIONS IN SOLUTIONS 


In dilute solutions, the dependence of the diffusion coefficient on the molecular weight is different from that found
in melts, either entangled or not. This difference is due to the presence of hydrodynamic interactions among the
solvent molecules. Such interactions arise from the necessity to transfer solvent molecules from the front to the
back of a moving particle. The motion of the solvent gives rise to a flow field which couples all molecules over a
distance larger than the size of the moving particle. A well known result, derived by Stokes, relates the friction
coefficient of a sphere to its radius $R_h$: $\zeta = 6\pi \eta_s R_h$, where $\eta_s$ is the solvent viscosity.

In dilute polymer solutions, hydrodynamic interactions lead to a concerted motion of the whole polymer chain and
the surrounding solvent. The folded chains can essentially be considered as impermeable objects whose
hydrodynamic radius is $R_h \approx R_g$, where $R_g$ is the radius of gyration defined as

$$R_g^2 = \frac{1}{N} \sum_{i=1}^{N} \left\langle \left| \mathbf{r}_i - \mathbf{r}_c \right|^2 \right\rangle \qquad \text{(C2.1.12)}$$

where $\mathbf{r}_i$ is the position of monomer $i$ and $\mathbf{r}_c$ is the position of the centre of gravity. The radius of gyration is
related to the end-to-end distance: for a Gaussian chain, $R_g^2 = \langle R^2 \rangle / 6$. Using Einstein's and Stokes's
relations we have $D = kT/\zeta = kT/(6\pi \eta_s R_h)$. Together with the fact that the radius of gyration is proportional to the
root mean square of the end-to-end distance we get $D \propto M^{-\nu}$, with $\nu = 0.5$ for Gaussian chains and $\nu = 0.6$ for
expanded chains. The diffusion coefficient can be measured using light scattering and provides the radius of
gyration. From the radius of gyration the degree of polymerization can be obtained.
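
A short Python sketch shows how a measured diffusion coefficient is converted into a hydrodynamic radius with the Stokes-Einstein relation $D = kT/(6\pi \eta_s R_h)$, and how the scaling $D \propto M^{-\nu}$ translates a change of molecular weight into a change of $D$. The numerical inputs are hypothetical and chosen only to give a plausible order of magnitude; the function names are illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def hydrodynamic_radius(diffusion_coefficient, temperature, solvent_viscosity):
    """R_h = k T / (6 pi eta_s D), all quantities in SI units (result in metres)."""
    return K_B * temperature / (6.0 * math.pi * solvent_viscosity * diffusion_coefficient)

def diffusion_ratio(mass_ratio, nu):
    """D(M2) / D(M1) for D proportional to M^(-nu), with mass_ratio = M2 / M1."""
    return mass_ratio ** (-nu)

# Hypothetical measurement: D in m^2/s, T in K, solvent viscosity in Pa s
r_h = hydrodynamic_radius(2.0e-11, 298.0, 1.0e-3)
print(f"hydrodynamic radius: {r_h * 1e9:.1f} nm")

# Effect of a fourfold increase in molecular weight on D
print(f"Gaussian chain (nu = 0.5): D changes by a factor {diffusion_ratio(4.0, 0.5):.2f}")
print(f"expanded chain (nu = 0.6): D changes by a factor {diffusion_ratio(4.0, 0.6):.2f}")
```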

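Another simple way to obtain the molecular weight consists of measuring the viscosity of a dilute polymer
solution. The intrinsic viscosity $[\eta]$ is defined as the excess viscosity of the solution compared to that of the pure
solvent in the limit of vanishing weight concentration of the polymer [40]:

$$[\eta] = \lim_{c_w \to 0} \frac{\eta - \eta_s}{\eta_s c_w} \qquad \text{(C2.1.13)}$$

where $\eta$ is the viscosity of the solution, $\eta_s$ that of the pure solvent and $c_w$ the weight concentration of the polymer.
The Mark-Houwink-Sakurada equation relates the intrinsic viscosity to the molecular weight of the polymer:

$$[\eta] = K M^{\mu}, \qquad \mu = 3\nu - 1 \qquad \text{(C2.1.14)}$$

Two limiting cases exist: $\mu = 0.5$, corresponding to $\nu = 0.5$ for Gaussian chains, and $\mu = 0.8$, corresponding to $\nu = 0.6$ for expanded
chains.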
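
In practice the Mark-Houwink-Sakurada relation $[\eta] = K M^{\mu}$ is inverted to estimate the molecular weight from a measured intrinsic viscosity. The Python sketch below does this and also evaluates $\mu = 3\nu - 1$ for the two limiting cases. The values of $K$ and $\mu$ used in the example are hypothetical; real values depend on the polymer, the solvent and the temperature and must be taken from calibration data.

```python
def mark_houwink_exponent(nu):
    """Exponent mu = 3 nu - 1 relating intrinsic viscosity to molecular weight."""
    return 3.0 * nu - 1.0

def intrinsic_viscosity(molar_mass, k, mu):
    """[eta] = K * M^mu."""
    return k * molar_mass ** mu

def molar_mass_from_viscosity(eta_intrinsic, k, mu):
    """Invert [eta] = K * M^mu for the molecular weight M."""
    return (eta_intrinsic / k) ** (1.0 / mu)

k_hyp, mu_hyp = 1.0e-2, 0.7        # hypothetical calibration constants
eta = intrinsic_viscosity(1.0e5, k_hyp, mu_hyp)
print(f"[eta] for M = 1e5: {eta:.2f} (units set by K)")
print(f"recovered M: {molar_mass_from_viscosity(eta, k_hyp, mu_hyp):.3e}")
print(f"mu for Gaussian chains: {mark_houwink_exponent(0.5):.1f}")
print(f"mu for expanded chains: {mark_houwink_exponent(0.6):.1f}")
```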

C2.1.7.2 MECHANICAL RESPONSES 

(A) RESPONSE FUNCTIONS 

Several functions are used to characterize the response of a material to an applied strain or stress [41]. The tensile
relaxation modulus $E(t)$ describes the response to the application of a constant tensile strain $e_{zz}^0$: $E(t) = \sigma_{zz}(t)/e_{zz}^0$.
Here $\sigma_{zz}(t)$ is the tensile stress and $e_{zz}^0 = \Delta L_z/L_z$, where $L_z$ is the initial length of the sample and $\Delta L_z$ is the
sample elongation. In shear experiments, the shear relaxation modulus $G(t)$ is defined as $G(t) = \sigma_{xz}(t)/e_{xz}^0$, where
$e_{xz}^0$ is the constant shear strain applied and $\sigma_{xz}(t)$ is the shear stress. The dynamical shear modulus $G^*(\omega)$ measures
the response $\sigma_{xz}^0 \exp(\mathrm{i}(\omega t + \delta))$ to a small oscillatory shear strain $e_{xz}^0 \exp(\mathrm{i}\omega t)$: $G^*(\omega) = \sigma_{xz}^0 \exp(\mathrm{i}\delta)/e_{xz}^0$. Another
quantity which is often encountered is the dynamical shear compliance $J^*(\omega) = 1/G^*(\omega)$, which characterizes the response
to a small oscillatory shear stress. The dynamical shear compliance and the dynamical shear modulus have a real
and an imaginary part. The real part corresponds to the elastic response to the oscillatory field applied, while the
imaginary part is characteristic of the viscous response and quantifies the work expended on the driven system.

(B) MECHANICAL RELAXATION PROCESSES 

Before discussing the complex mechanical behaviour of polymers, consider a simple system whose mechanical
response is characterized by a single relaxation time $\tau$, due to the transition between two states. For such a system,
the dynamical shear compliance is [42]

$$J^*(\omega) = J'(\omega) - \mathrm{i} J''(\omega) = \frac{\Delta J}{1 + \omega^2 \tau^2} - \mathrm{i}\, \frac{\Delta J\, \omega \tau}{1 + \omega^2 \tau^2} \qquad \text{(C2.1.15)}$$

where $\Delta J$ is related to the equilibrium value of the strain under the shear stress $\sigma_{xz}^0$ in a creep experiment:
$\Delta J\, \sigma_{xz}^0 = e_{xz}(t \to \infty)$. The real and imaginary parts of the dynamical compliance are shown in figure C2.1.14. If
the angular frequency of the applied shear stress is much higher than the transition rate constant $\tau^{-1}$, the system
reacts to the average stress only, which is zero; if the angular frequency is much lower than $\tau^{-1}$, thermal
equilibrium is maintained throughout the deformation process. In the intermediate range, $\omega \tau \approx 1$, the system
absorbs energy from the applied stress field.



Figure C2.1.14. (a) Real part and (b) imaginary part of the dynamic shear compliance of a system whose 
mechanical response results from the transition between two different states characterized by a single relaxation 
time $\tau$.
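
The behaviour of the two curves can be reproduced with a minimal Python sketch. It assumes the single-relaxation-time form $J^*(\omega) = \Delta J/(1 + \mathrm{i}\omega\tau)$ with the sign convention $J^* = J' - \mathrm{i}J''$; the function name and the reduced units are illustrative assumptions.

```python
def dynamic_compliance(omega, tau, delta_j):
    """Return (J', J'') for J* = J' - i J'' = delta_j / (1 + i omega tau)."""
    x = omega * tau
    denom = 1.0 + x * x
    storage = delta_j / denom        # real part: in-phase, elastic response
    loss = delta_j * x / denom       # imaginary part: out-of-phase, dissipative response
    return storage, loss

tau, delta_j = 1.0, 1.0              # reduced units
for omega in (0.01, 0.1, 1.0, 10.0, 100.0):
    j_real, j_imag = dynamic_compliance(omega, tau, delta_j)
    print(f"omega*tau = {omega * tau:7.2f}   J' = {j_real:.4f}   J'' = {j_imag:.4f}")
```

The real part drops from $\Delta J$ to zero as the frequency passes through $\tau^{-1}$, while the imaginary part has its maximum, equal to $\Delta J/2$, at $\omega\tau = 1$.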

In the limit of a small deformation, a polymer system can be considered as a superposition of two-state systems
with different relaxation times. Phenomenologically, the different relaxation processes are designated by Greek
letters, $\alpha$, $\beta$ and $\gamma$. The $\alpha$ processes are those with slow relaxation rate constants $\tau^{-1}$, in which several monomers
move cooperatively. They are usually associated with the glass transition. On the other hand, the symbol $\gamma$ is used
for fast processes, which are generally localized within a monomer unit. The different processes span a time scale
of more than ten orders of magnitude.




The mechanical behaviour of a polymer as a function of temperature is summarized in figure C2.1.15. The
compliance is about $10^{-9}$ N$^{-1}$ m$^2$ in the glassy state and increases to about $10^{-5}$ N$^{-1}$ m$^2$ after the glass-rubber
transition. The width of the rubber plateau depends on the density of entanglements. For chains below the critical
molecular weight at the entanglement limit, $M_c$, the plateau disappears and the polymer directly enters the terminal
flow region. It is important to note that even in this region polymers still behave as viscoelastic liquids. This is in 
contrast to low-molecular-weight compounds above their melting point.

Figure C2.1.15. Schematic representation of the typical compliance of a polymer as a function of temperature.

(C) VOGEL-FULCHER AND WILLIAMS-LANDEL-FERRY EQUATIONS

Most response functions of polymers obey a time-temperature or frequency-temperature superposition [43, 44]. A
change in temperature is equivalent to a shift of the logarithmic frequency axis:

$$G^*(T, \log \omega) = G^*(T_0, \log \omega + \log a_T) \qquad \text{(C2.1.16)}$$

where $T_0$ is a reference temperature and $a_T$ is the shift parameter.


In amorphous polymers, this relation is valid for processes that extend over very different length scales. Modes
which involve a few monomer units as well as terminal relaxation processes, in which the chains move as a
whole, obey the superposition relation. On the basis of this finding, empirical expressions for the temperature
dependence of the viscosity at zero shear rate and that of the mean relaxation time of the $\alpha$ modes were derived:

$$\eta_0(T) = B \exp\left(\frac{T_A}{T - T_V}\right), \qquad \tau_\alpha(T) = \tau_0 \exp\left(\frac{T_A}{T - T_V}\right) \qquad \text{(C2.1.17)}$$


These are the Vogel-Fulcher equations [44]. In addition to the prefactors, two common parameters appear, namely
the activation temperature $T_A$, typically $T_A$ = 1000-2000 K, and the Vogel-Fulcher temperature $T_V$, which is
generally 30-70 K below the glass temperature. Using the Vogel-Fulcher equations, Williams, Landel and Ferry
derived an expression for the shift parameter $\log a_T$. This expression is known in the literature under the name
'WLF equation' [45, 46]:

$$\log a_T = -C_1 \frac{T - T_0}{T - T_0 + C_2} \qquad \text{(C2.1.18)}$$

The parameters $C_1$ and $C_2$ are defined as

$$C_1 = \log e \cdot \frac{T_A}{T_0 - T_V} \qquad \text{and} \qquad C_2 = T_0 - T_V.$$

The WLF equation relates the dependence of the mechanical responses on frequency to that on temperature.
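
The equivalence between the Vogel-Fulcher form and the WLF form can be verified numerically. The Python sketch below computes the shift parameter once as $\log a_T = \log[\tau_\alpha(T)/\tau_\alpha(T_0)]$ from a Vogel-Fulcher relaxation time proportional to $\exp[T_A/(T - T_V)]$, and once from the WLF expression with $C_1 = \log e \cdot T_A/(T_0 - T_V)$ and $C_2 = T_0 - T_V$. The parameter values are hypothetical, chosen within the typical ranges of activation and Vogel-Fulcher temperatures, and the function names are illustrative.

```python
import math

def vogel_fulcher_log_shift(t, t_ref, t_a, t_v):
    """log10 a_T = log10[tau(T) / tau(T_ref)] for tau proportional to exp(T_A / (T - T_V))."""
    return math.log10(math.e) * (t_a / (t - t_v) - t_a / (t_ref - t_v))

def wlf_constants(t_ref, t_a, t_v):
    """WLF parameters C1 (dimensionless) and C2 (in K) for a given reference temperature."""
    c1 = math.log10(math.e) * t_a / (t_ref - t_v)
    c2 = t_ref - t_v
    return c1, c2

def wlf_log_shift(t, t_ref, c1, c2):
    """log10 a_T = -C1 (T - T_ref) / (T - T_ref + C2)."""
    return -c1 * (t - t_ref) / (t - t_ref + c2)

# Hypothetical parameters: T_A = 1500 K, T_V = 320 K, reference temperature 370 K
t_a, t_v, t_ref = 1500.0, 320.0, 370.0
c1, c2 = wlf_constants(t_ref, t_a, t_v)
print(f"C1 = {c1:.2f}, C2 = {c2:.1f} K")
for t in (370.0, 380.0, 400.0, 450.0):
    vf = vogel_fulcher_log_shift(t, t_ref, t_a, t_v)
    wlf = wlf_log_shift(t, t_ref, c1, c2)
    print(f"T = {t:.0f} K   log a_T (Vogel-Fulcher) = {vf:8.4f}   log a_T (WLF) = {wlf:8.4f}")
```

Both columns agree, and the shift parameter vanishes at the reference temperature, as it must by definition.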


C2.1.8 NONLINEAR MECHANICAL BEHAVIOUR 


In the last section we considered the mechanical behaviour of polymers in the linear regime where the response is 
proportional to the applied stress or strain. This section deals with the nonlinear behaviour of polymers under large 
deformation. Microscopically, the transition into the nonlinear regime is associated with a change of the polymer 
structure under mechanical loading. 

C2.1.8.1 ELASTICITY OF IDEAL RUBBER 

Rubbers are crosslinked polymers above the glass transition. These materials can be stretched by a large factor, 
sometimes exceeding 10. After removing the load, the system generally recovers its initial shape. In the elongated 
state, the applied stress is balanced by restoring forces of a mainly entropic nature. Using the extension ratio X 
defined as 

_ L-+AL, 

where L z is the original length of the sample and L z + AL z is its length under the applied stress, the restoring force/ 
can be written as 


L- \ 3X Jy j L z \ (iy/y T 


(C2.1.19) 


where E is the internal energy, S is the entropy, and Fis the volume of the system. Experimentally, it has been 
found that the energetic contribution to the force is usually small [47]. If the restoring force is purely entropic, the 
system is referred to as ideal rubber. The reduction of entropy upon elongation which gives rise to the restoring 
force is easy to understand: part of the chain conformations accessible in the undeformed state cannot be accessed 
after elongation. If the chain junctions are assumed to be fixed and deformed in affine manner, an expression for 
the engineering tensile stress (force divided by the initial cross sectional area) as a function of the elongation can be 
derived [47]: 

$$\sigma = \nu_c kT\left(\lambda - \frac{1}{\lambda^2}\right)$$ (C2.1.20)

where $\nu_c$ is the number of elastically active chains per unit volume. The comparison between experimental data and the 
prediction of (C2.1.20) shows reasonable agreement up to large deformations (figure C2.1.16). For large values of 
the extension ratio $\lambda$, strain hardening arises because of the limited extensibility of the chains or because of shear-induced 
crystallization. 

Figure C2.1.16. Tensile stress as a function of the extension ratio registered for a sample of natural rubber 
(circles). The broken curve is calculated from equation (C2.1.20). (Data from [79].) 
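
As a numerical illustration of the ideal-rubber result, the following Python sketch evaluates the engineering stress. It assumes the standard affine-network form $\sigma = \nu_c kT(\lambda - 1/\lambda^2)$; the function name and the sample chain density are illustrative choices, not values taken from the text or from figure C2.1.16.

```python
BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def engineering_stress(extension_ratio, chain_density, temperature):
    """Engineering tensile stress (Pa) of an ideal, affinely deforming rubber.

    extension_ratio : lambda = deformed length / original length (dimensionless)
    chain_density   : elastically active chains per unit volume (1/m^3)
    temperature     : absolute temperature (K)
    """
    if extension_ratio <= 0.0:
        raise ValueError("extension ratio must be positive")
    prefactor = chain_density * BOLTZMANN * temperature
    return prefactor * (extension_ratio - extension_ratio ** -2)


if __name__ == "__main__":
    # Hypothetical chain density, chosen only to give stresses in the MPa range.
    nu_c = 1.0e26
    for lam in (1.0, 1.5, 2.0, 4.0):
        print(lam, engineering_stress(lam, nu_c, 300.0))
```

In this form the stress vanishes at $\lambda = 1$ and its initial slope is $3\nu_c kT$, so the model contains no strain hardening; that part of the experimental curve has to come from effects outside the ideal-rubber picture.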

C2.1.8.2 SHEAR THINNING AND NORMAL STRESS IN POLYMER MELTS 

Polymers owe much of their attractiveness to their ease of processing. In many important techniques, such as 
injection moulding, fibre spinning and film formation, polymers are processed in the melt, so that their flow 
behaviour is of paramount importance. Because of the viscoelastic properties of polymers, their flow behaviour is 
much more complex than that of Newtonian liquids for which the viscosity is the only essential parameter. In 
polymer melts, the recoverable shear compliance, which relates to the elastic forces, is used in addition to the 
viscosity in the description of flow [48]. 

Flow behaviour of polymer melts is still difficult to predict in detail. Here, we only mention two aspects. The 
viscosity of a polymer melt decreases with increasing shear rate. This phenomenon is called shear thinning [48]. 
Another particularity of the flow of non-Newtonian liquids is the appearance of stress normal to the shear direction 
[48]. This type of stress is responsible for the expansion of a polymer melt at the exit of a tube that it was forced 
through. Shear thinning and normal stress are both due to the change of the chain conformation under large shear. 
On the one hand, the compressed coil cross section leads to a smaller viscosity. On the other hand, when the stress 
is released, as for example at the exit of a tube, the coils fold back to their isotropic conformation and, thus, give 
rise to the lateral expansion of the melt. 

C2.1.8.3 YIELD PROCESS AND FRACTURE 


Glassy and semicrystalline polymers exhibit complex stress-strain diagrams. The stress-strain relation for the 
plane-strain compression of bisphenol-A polycarbonate is shown in figure C2.1.17. For small strains the material 
response is elastic. This behaviour persists up to the yield point, after which further elongation is not accompanied 
by an increase of stress. Finally, strain hardening sets in, whose signature is a steep exponential-like increase of 
stress. Ultimately, the sample fractures. If deformation is stopped before this ultimate step, the sample only slightly 
retracts. However, most of the deformation is reversed when the polymer is heated above the glass transition [49]. 

Figure C2.1.17. Stress-strain curve measured from plane-strain compression of bisphenol-A polycarbonate at 25 °C. 
The sample was loaded to a maximum strain and then rapidly unloaded. After unloading, most of the 
deformation remains. 

Under compression or shear most polymers show qualitatively similar behaviour. However, under the application 
of tensile stress, two different deformation processes after the yield point are known. Ductile polymers elongate in 
an irreversible process similar to flow, while brittle systems whiten due to the formation of microvoids. These voids 
rapidly grow and lead to sample failure [50, 51]. The reasons for these conspicuously different deformation 
mechanisms are thought to be related to the local dynamics of the polymer chains and to the entanglement network 
density. 

Deformation recovery upon heating above the glass transition suggests that the structural changes due to 
deformation might be similar to those occurring in rubber, where the chains are stretched between two 
cross-linking points. In glassy polymers, the chain entanglements could act as temporary cross-linking points. Another 
observation supports the view that entanglements are of primary importance in the deformation of glassy and 
semicrystalline polymers. Fibres spun from gels with a low density of entanglements have a much higher 
drawability than fibres spun from entangled melts [52]. 


C2.1.9 DIFFUSION IN POLYMERS 

Small molecules can penetrate and permeate through polymers. Because of this property, polymers have found 
widespread use in separation technology, protection coating, and controlled delivery [53]. The key issue in these 
applications is the selective permeability of the polymer, which is determined by the diffusivity and the solubility 
of a given set of low-molecular-weight compounds. The diffusion of a small penetrant occurs as a series of jumps 
from one hole in the polymer matrix to another. Obviously, the size of the holes must be sufficient to accommodate 
the moving molecule. Because of this restriction, glassy polymers often possess a lower diffusivity but a higher 
selectivity than rubbers. However, diffusion of small molecules in polymers is a multifaceted process. On the one 
hand, it can be relatively simple in the case where the diffusivity is independent of the local concentration of the 
penetrant and the polymer matrix is left unchanged by the motion of the small molecules. On the other hand, the 
penetrant may interact with the matrix and render the diffusivity strongly concentration dependent. 

The diffusion of small molecules in polymers can be described using Fick's first and second laws. In a 
one-dimensional situation, the flux $J(c, x)$ as a function of the concentration $c$ and the position $x$ is given by 

$$J = -D(c)\,\frac{\partial c}{\partial x}$$ (C2.1.21)

where $D(c)$ is the diffusion coefficient. The concentration variation is obtained by evaluating the net change in flux 
within an elementary volume: 

$$\frac{\partial c}{\partial t} = \frac{\partial}{\partial x}\left(D(c)\,\frac{\partial c}{\partial x}\right) = D(c)\,\frac{\partial^2 c}{\partial x^2} + \frac{\partial D(c)}{\partial c}\left(\frac{\partial c}{\partial x}\right)^2$$ (C2.1.22)

If the diffusion coefficient is independent of the concentration, equation (C2.1.22) reduces to the usual form of 
Fick's second law. Analytical solutions to diffusion equations for several types of boundary conditions have been 
derived [54]. In the particular situation of a steady state, the flux is constant. Using Henry's law ($c = kp$) to relate 
the concentration on both sides of the membrane to the partial pressure, the constant flux can be written as 

$$J = -Dk\,\frac{\Delta p}{l}$$ (C2.1.23)

where $k$ is Henry's constant, $\Delta p$ is the partial-pressure difference on the two sides of the membrane, and $l$ is the 
membrane thickness. The product P = Dk is called the permeability. Often Henry's law may not be applied or the 
diffusion coefficient may be concentration dependent, so that the permeability is only a phenomenological 
parameter with practical relevance, but little fundamental significance [55]. 
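
The steady-state relation can be sketched in a few lines of Python. The flux function follows the form $J = -Dk\,\Delta p/l$ given above; the selectivity function is an added assumption (the commonly used ratio of two permeabilities) and all names are illustrative.

```python
def permeability(diffusivity, henry_constant):
    """Permeability P = D * k for a penetrant obeying Henry's law."""
    return diffusivity * henry_constant


def steady_state_flux(diffusivity, henry_constant, pressure_difference, thickness):
    """Steady-state flux J = -D k (delta p) / l through a membrane.

    The sign convention is that of Fick's first law: the flux points
    from high to low partial pressure.
    """
    if thickness <= 0.0:
        raise ValueError("membrane thickness must be positive")
    return -permeability(diffusivity, henry_constant) * pressure_difference / thickness


def ideal_selectivity(perm_a, perm_b):
    """Assumed definition: ratio of the permeabilities of penetrants A and B."""
    return perm_a / perm_b


if __name__ == "__main__":
    # Arbitrary consistent units; the numbers are placeholders, not data.
    flux = steady_state_flux(2.0, 3.0, -4.0, 6.0)
    print("flux:", flux)
    print("selectivity:", ideal_selectivity(permeability(2.0, 3.0), permeability(1.0, 1.5)))
```

Because the sketch assumes a constant diffusion coefficient and Henry's law throughout the membrane, it inherits the limitation noted above: for concentration-dependent systems the resulting permeability is only a phenomenological number.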

Several ideas have been put forward to calculate the diffusion coefficient of small molecules in polymers. 
Glasstone et al [56] proposed an expression based on transition-state theory 

$$D = \lambda^2\,\frac{kT}{h}\,\frac{Z^{\ddagger}}{Z}\,\exp\left(-\frac{\Delta E^{\ddagger}}{kT}\right)$$ (C2.1.24)

where $\lambda$ is the root mean square of the jump distance, $Z^{\ddagger}$ and $Z$ are the partition functions of the system in the 
activated and normal states, respectively, and $\Delta E^{\ddagger}$ is the activation energy of the jump process at 0 K. 

Other expressions for the diffusion coefficient are based on the concept of free volume [57], i.e. the amount of 
volume in the sample that is not occupied by the polymer molecules. Computer simulations have also been used to 
quantify the mobility of small molecules in polymers [58]. In a first approach, the partition functions of the ground 
and activated states and the energy of activation $\Delta E^{\ddagger}$ of a jump process are estimated from computer simulations. 
The simulations also provide an estimate of the mean jump distance $\lambda$, so that $D$ can be calculated from (C2.1.24). 
Another straightforward approach is to use molecular dynamics simulations and evaluate the diffusion coefficient 
directly from $\langle |\mathbf{r}(t) - \mathbf{r}(0)|^2 \rangle = 6Dt$, where $\mathbf{r}(t)$ is the position of a penetrant molecule at time $t$. However, this 
method is restricted to penetrants with high mobility because the simulation time is limited by the computational 
power available. 
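
The mean-square-displacement route can be illustrated with a minimal Python sketch. It does not run a molecular dynamics simulation; as a stated simplification, the penetrant trajectories are replaced by free three-dimensional Brownian walks with a known diffusion coefficient, so that the estimate obtained from $\langle |\mathbf{r}(t) - \mathbf{r}(0)|^2 \rangle = 6Dt$ can be compared with the input value. All names and parameters are hypothetical.

```python
import math
import random


def simulate_msd(diffusivity, dt, n_steps, n_walkers, seed=0):
    """Mean-square displacement after n_steps of free 3D Brownian motion.

    Each Cartesian step is Gaussian with variance 2 * D * dt, which is the
    discrete-time analogue of unhindered diffusion (a stand-in for real
    penetrant trajectories from a simulation).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diffusivity * dt)
    total = 0.0
    for _ in range(n_walkers):
        x = y = z = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
            z += rng.gauss(0.0, sigma)
        total += x * x + y * y + z * z
    return total / n_walkers


def diffusivity_from_msd(msd, elapsed_time):
    """Invert <|r(t) - r(0)|^2> = 6 D t for the diffusion coefficient."""
    return msd / (6.0 * elapsed_time)


if __name__ == "__main__":
    n_steps, dt = 50, 0.01
    msd = simulate_msd(1.0, dt, n_steps, n_walkers=2000, seed=1)
    print("estimated D:", diffusivity_from_msd(msd, n_steps * dt))
```

In a real polymer matrix the displacement is only diffusive at times long compared with the residence time in a hole, which is why the method is limited to fast penetrants.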

In glassy polymers the interactions of the penetrant molecules with the polymer matrix differ from one sorption site 
to another. A limiting description of the interaction distribution is known under the name of the dual-sorption 
model [59, 60]. In this model, the concentration of the penetrant molecules consists of two parts. One obeys 
Henry's law and the other a Langmuir isotherm: 

$$c = c_H + c_L = kp + \frac{C_H\,bp}{1 + bp}$$ (C2.1.25)

where $b$ is the hole affinity constant and $C_H$ is the hole saturation constant. The simplest transport model assumes 
that the molecules sorbed in the Langmuir mode are immobile. Other models assume different mobilities for the 
two modes. In all models equilibrium is maintained between the two modes. 
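
A short Python sketch of the dual-sorption isotherm may help to show its two limits. It assumes the form $c = kp + C_H bp/(1 + bp)$; the function and parameter names are illustrative.

```python
def dual_sorption_concentration(pressure, henry_constant, hole_saturation, hole_affinity):
    """Total sorbed concentration in the dual-sorption model.

    Henry part    : k * p
    Langmuir part : C_H * b * p / (1 + b * p)
    """
    if pressure < 0.0:
        raise ValueError("pressure must be non-negative")
    henry_part = henry_constant * pressure
    langmuir_part = hole_saturation * hole_affinity * pressure / (1.0 + hole_affinity * pressure)
    return henry_part + langmuir_part


if __name__ == "__main__":
    # Placeholder parameters in arbitrary consistent units.
    k, c_h, b = 2.0, 3.0, 1.0
    for p in (0.0, 0.1, 1.0, 10.0, 100.0):
        print(p, dual_sorption_concentration(p, k, c_h, b))
```

At low pressure the isotherm is linear with slope $k + C_H b$, whereas at high pressure the Langmuir sites saturate and the curve approaches the straight line $kp + C_H$.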


In sorption experiments, the weight of sorbed molecules scales as the square root of the time, $M(t) \propto t^{1/2}$, if 
diffusion obeys Fick's second law. Such behaviour is called case I diffusion. For some polymer/penetrant systems, 
$M(t)$ is proportional to $t$. This situation is named case II diffusion [61, 62]. In these systems, sorption strongly 
changes the mechanical properties of the polymers and a sharp front of penetrant advances in the polymer at a 
constant speed (figure C2.1.18). Intermediate behaviours between case I and case II have also been found. The 
occurrence of one mode, or the other, is related to the time the polymer matrix needs to accommodate the structural 
changes induced by the progression of the penetrant. 
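
The distinction between the two cases can be made concrete by extracting the exponent $n$ in $M(t) \propto t^n$ from uptake data. The Python sketch below is a minimal illustration under stated assumptions: case I is generated by an explicit finite-difference solution of Fick's second law with constant $D$ and fixed surface concentration, and case II is idealized as a sharp front moving at constant speed. Grid size, time step and function names are arbitrary choices.

```python
import math


def fickian_uptake(diffusivity=1.0, dx=1.0, dt=0.2, n_cells=200,
                   n_steps=2000, sample_every=100):
    """Mass uptake for constant-D diffusion into an initially empty slab.

    Explicit (forward-time, centred-space) scheme; the surface cell is held
    at c = 1 and the far cell at c = 0. Returns sampled times and masses.
    """
    r = diffusivity * dt / dx ** 2
    if r > 0.5:
        raise ValueError("explicit scheme is unstable for D*dt/dx^2 > 0.5")
    c = [0.0] * n_cells
    c[0] = 1.0
    times, masses = [], []
    for step in range(1, n_steps + 1):
        new = c[:]
        for i in range(1, n_cells - 1):
            new[i] = c[i] + r * (c[i + 1] - 2.0 * c[i] + c[i - 1])
        c = new
        if step % sample_every == 0:
            times.append(step * dt)
            # Trapezoidal estimate of the integral of c over the slab.
            masses.append(dx * (0.5 * c[0] + sum(c[1:])))
    return times, masses


def uptake_exponent(times, masses):
    """Least-squares slope of log M against log t."""
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in masses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    t, m_case1 = fickian_uptake()
    m_case2 = [0.3 * ti for ti in t]  # sharp front at constant (hypothetical) speed
    print("case I exponent :", uptake_exponent(t, m_case1))
    print("case II exponent:", uptake_exponent(t, m_case2))
```

An exponent close to 1/2 indicates case I behaviour and an exponent close to 1 indicates case II; intermediate values correspond to the intermediate behaviours mentioned above.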

Figure C2.1.18. Schematic representation of the time dependence of the concentration profile of a 
low-molecular-weight compound sorbed into a polymer for case I and case II diffusion. In both diagrams, the concentration 
profiles are calculated using a constant time increment starting from zero. The solvent concentration at the surface 
of the polymer, x = 0, is constant. 


C2.1.10 COMPUTER SIMULATIONS 

The complexity of polymeric systems makes the development of an analytical model to predict their structural and 
dynamical properties difficult. Therefore, numerical computer simulations of polymers are widely used to bridge 
the gap between the theoretical concepts and the experimental results. Computer simulations can also help to 
predict material properties and provide detailed insights into the behaviour of polymer systems. A simulation 
is based on two elements: a more or less detailed model of the polymer and a related force field which allows the 
calculation of the energy and the motion of the system using molecular mechanics, molecular dynamics, or 
Monte Carlo techniques [63]. 

The objective of molecular mechanics is to generate static minimum-energy configurations at the prescribed 
density, corresponding to the local minima of the total potential energy. Such energy minimizations are used in the 
generation of realistic conformations suitable as starting structures for molecular dynamics and Monte Carlo 
simulations. They also allow the estimation of phase stability from the calculation of chemical potential differences 
through a procedure called thermodynamic integration [64]. 

Molecular dynamics tracks the temporal evolution of a microscopic model system through numerical integration of 
the equations of motion for the degrees of freedom considered. The main asset of molecular dynamics is that it 
directly provides a wealth of detailed information on dynamical processes. 


Monte Carlo simulations generate a large number of conformations of the microscopic model under study that 
conform to the probability distribution dictated by the macroscopic constraints imposed on the system. For example, a 
Monte Carlo simulation of a melt at a given temperature $T$ produces an ensemble of conformations in which 
conformation $i$ with energy $E_i$ occurs with a probability proportional to $\exp(-E_i/kT)$. An advantage of the Monte 
Carlo method is that, by judicious choice of the elementary moves, one can circumvent the limitations of molecular 
dynamics techniques and effect rapid equilibration of multiple chain systems [65]. However, Monte Carlo 
simulations do not provide truly dynamical information. 
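
The Boltzmann-weighted sampling described above can be sketched with the Metropolis acceptance rule, which is one common way (assumed here, not prescribed by the text) of realizing such a Monte Carlo scheme. The toy model has three discrete conformational states with energies given in units of $kT$; it is not a polymer model, and all names are illustrative.

```python
import math
import random


def boltzmann_populations(energies_in_kT):
    """Exact populations proportional to exp(-E_i / kT)."""
    weights = [math.exp(-e) for e in energies_in_kT]
    partition_sum = sum(weights)
    return [w / partition_sum for w in weights]


def metropolis_populations(energies_in_kT, n_steps, seed=0):
    """Fraction of Monte Carlo steps spent in each state.

    Elementary move: jump to one of the other states chosen uniformly
    (a symmetric proposal), accepted with probability min(1, exp(-dE/kT)).
    """
    rng = random.Random(seed)
    n_states = len(energies_in_kT)
    state = 0
    counts = [0] * n_states
    for _ in range(n_steps):
        proposal = rng.randrange(n_states - 1)
        if proposal >= state:
            proposal += 1  # skip the current state
        delta = energies_in_kT[proposal] - energies_in_kT[state]
        if delta <= 0.0 or rng.random() < math.exp(-delta):
            state = proposal
        counts[state] += 1
    return [count / n_steps for count in counts]


if __name__ == "__main__":
    energies = [0.0, 1.0, 1.0]  # hypothetical trans / gauche-like energies in kT
    print("sampled:", metropolis_populations(energies, 200000, seed=2))
    print("exact  :", boltzmann_populations(energies))
```

The order in which states are visited has no physical meaning here, which reflects the remark that Monte Carlo simulations do not provide truly dynamical information.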

The complexity of polymer systems prevents their simulation in full structural and dynamical detail [66]. First, the 
relevant length scales of polymer systems range from about 1 Å, the length of a bond, to hundreds of ångströms, 
the size of the chains. Second, the time scale important to polymer systems covers more than ten orders of 
magnitude. Consequently, a trade-off between the level of structural detail, the size of the system, and the time 
scale of the processes under study has to be made. On one extreme, atomistically detailed models [67, 68] provide 
the specific behaviour of a particular polymer, but the dynamics can be followed up to a few nanoseconds only 
[67]. On the other extreme, coarse-grained models permit the study of dynamics in the melt and of phase separation 
processes, but they reveal only universal features; the particular behaviour of different polymers is lost [66]. When 
neither of the extreme models suits, it may be possible to identify the elementary move of Monte Carlo simulations 
with a relative time step; this method is known as dynamical Monte Carlo simulation [66]. 
Alternatively, the observation window of a molecular dynamics simulation can be shifted to longer times by 
freezing the fastest degrees of freedom and increasing the duration of the integration time step. For example, in 
so-called constrained molecular dynamics simulations the bond lengths and the bond angles are kept fixed, so that the 
integration time step can be chosen one or two orders of magnitude longer than 1 fs, the time step typically used in 
unconstrained atomistic molecular dynamics simulations [67]. 

Atomistically detailed models account for all atoms. The force field contains additive contributions specified in 
terms of bond lengths, bond angles, torsional angles and possible cross terms. It also includes non-bonded 
contributions as the sum of van der Waals interactions, often described by Lennard-Jones potentials, and Coulomb 
interactions. Atomistic simulations are successfully used to predict the transport properties of small molecules in 
glassy polymers, to calculate elastic moduli and to study plastic deformation and local motion in quasi-static 
simulations [67, 68]. The atomistic models are also useful to interpret scattering data [69] and NMR measurements 
[70] in terms of local order. 
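
The non-bonded part of such a force field can be sketched for a single atom pair. The Python example below assumes the usual 12-6 Lennard-Jones form $4\epsilon[(\sigma/r)^{12} - (\sigma/r)^{6}]$ and a Coulomb term $C q_1 q_2 / r$ with a unit-dependent prefactor $C$; the text does not specify these forms, and the names and reduced units are illustrative.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones pair energy: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    if r <= 0.0:
        raise ValueError("separation must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, charge_1, charge_2, prefactor=1.0):
    """Coulomb pair energy; the prefactor carries the unit system."""
    if r <= 0.0:
        raise ValueError("separation must be positive")
    return prefactor * charge_1 * charge_2 / r


def nonbonded_pair_energy(r, epsilon, sigma, charge_1, charge_2, prefactor=1.0):
    """Sum of the van der Waals and Coulomb contributions for one pair."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, charge_1, charge_2, prefactor)


if __name__ == "__main__":
    r_min = 2.0 ** (1.0 / 6.0)  # location of the Lennard-Jones minimum for sigma = 1
    print("LJ at minimum:", lennard_jones(r_min, 1.0, 1.0))
    print("pair energy  :", nonbonded_pair_energy(2.0, 1.0, 2.0, 1.0, -1.0))
```

In a full atomistic model this pair term is summed over all non-bonded pairs and added to the bonded contributions (bond lengths, bond angles, torsional angles and cross terms).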

Coarse-grained models represent the polymer chain as a sequence of beads connected by springs similar to the 
Rouse model (see section C2.1.7.1(a)). Frictional forces acting on the beads, elastic forces between two connected 
beads and van der Waals forces between non-connected beads are taken into account in the force field. Several 
coarse-grained models exist [66]. They are grouped into two main categories depending on whether the bead 
positions are restricted to a lattice or not. Lattice models permit one to consider excluded-volume effects simply 
and to sample the possible conformations efficiently. Coarse-grained models are used in the study of melt 
dynamics, glass transition and entanglement effects [71-73]. They have also contributed to a better understanding 
of the phase behaviour of polymer blends and copolymers [71, 72]. 


C2.2 Liquid crystals 

I W Hamley 


INTRODUCTION 

We are all familiar with the three states of matter: gases, liquids and solids. In the 19th century the liquid crystal 
state was discovered [1 and 2]; this can be considered as the fourth state of matter [3]. The essential features and 
properties of liquid crystal phases and their relation to molecular structure are discussed here. Liquid crystals are 
encountered in liquid crystal displays (LCDs) in digital watches and other electronic equipment. Such applications 
are also considered later in this section. Surfactants and lipids form various types of liquid crystal phase but this is 
discussed in section C2.3. This section focuses on low-molecular-weight liquid crystals, polymer liquid crystals 
being discussed in the previous section. 

The label 'liquid crystal' seems to be a contradiction in terms since a crystal cannot be liquid. However, the term 
refers to a phase formed between a crystal and a liquid, with a degree of order intermediate between the molecular 
disorder of a liquid and the regular structure of a crystal. What we mean by order here needs to be defined 
carefully. The most important property of liquid crystal phases is that the molecules have long-range orientational 
order. For this to be possible the molecules must be anisotropic, whether this results from a rodlike or disclike 
shape. 

Molecules that are capable of forming liquid crystal phases are called mesogens and have properties that are 
mesogenic. From the same root, the term mesophase can be used instead of liquid crystal phase. A substance in a 
liquid crystal phase is termed a liquid crystal. These conventions follow those in the Handbook of Liquid Crystals 
[4, 5 and 6], the nomenclature of which [7] for various liquid crystal phases is adopted elsewhere in this section. 


C2.2.1 TYPES OF LIQUID CRYSTAL 

C2.2.1.1 CLASSIFICATION 

Liquid crystal phases can be divided into two classes. Thermotropic liquid crystal phases are formed by pure 
mesogens in a certain temperature range, hence the prefix thermo referring to phase transitions in which heat is 
generated or consumed. About 1% of all organic molecules melt from the solid crystal phase to form a 
thermotropic liquid crystalline phase before eventually transforming into an isotropic liquid at still higher 
temperature. In contrast, lyotropic liquid crystal phases form in solution and, thus, concentration controls the liquid 
crystallinity (hence lyo, referring to concentration) in addition to temperature. Thermotropic liquid crystals do not 
need a solvent in order to form. Lyotropic liquid crystal phases are formed by amphiphiles in solution. 

C2.2.1.2 THERMOTROPIC LIQUID CRYSTALS 

Thermotropic liquid crystal phases are formed by anisotropic molecules. They possess long-range orientational order and, in 
many types of structure, some degree of translational order. The main types of mesogen are those that are 
rodlike or calamitic and those that are disclike or discotic. 

An understanding of the correlation between molecular structure and physical properties of thermotropic mesogens 
is important in order to optimize parameters such as the operating temperature range. A detailed discussion of such 
structure-property relationships is beyond the scope of this chapter. Further details can be found elsewhere [8 and 
9]. The key feature of calamitic mesogens is a rigid aromatic core to which one or more alkyl chains are attached. 
Often the core is formed from linked 1,4-phenyl groups. Within a homologous series it is often found that the 
nematic phase is stable when the alkyl chain is short, whereas smectic phases are found with longer chains. The 
groups that link aromatic moieties in the core should maintain its linearity, whilst additionally increasing the length 
and polarizability of the core if liquid crystal phase formation is to be enhanced. Terminal units such as cyano 
groups also favour the formation of liquid crystal phases, due to polar attractive interactions between pairs of 
molecules. Lateral substituents are also used to control molecular packing. These are groups attached to the side of 
a molecule, usually in the aromatic core. Suitable lateral substituents such as fluoro groups can enhance molecular 
polarizability. On the other hand, they can disrupt molecular packing and thus reduce the nematic-isotropic phase 
transition temperature. Perhaps the most important use of lateral substitution is to generate the tilted smectic C 
phase by creating a lateral dipole. This is especially important in the chiral smectic C phase that is the basis of 
ferroelectric displays (section C2.2.4.3). 


(A) NEMATIC PHASE 

This is the simplest liquid crystal phase. It is formed by calamitic or discotic mesogens, typical examples of the 
former being shown in figure C2.2.1. The molecules have no long-range translational order, just as in a normal 
isotropic liquid. However, they do possess long-range orientational order, in contrast to an isotropic liquid. The nematic phase 
can thus be considered to be an anisotropic liquid. It is denoted N, and an illustration of its structure is included in 
figure C2.2.2. The most successful theories for orientational order in liquid crystals deal with the nematic phase. 

Figure C2.2.1. Examples of rodlike nematogens. 

The nematic phase formed by chiral molecules is itself chiral. This used to be called the cholesteric phase, because 
the mesogen for which it was first observed contained a cholesterol derivative. However, it is now called the chiral 
nematic phase, denoted N*, because it has been observed for other types of mesogen. The chiral nematic phase is 
illustrated in figure C2.2.2. The director (average direction of molecules) twists round in a helix. It is important to 
note that this helical twist refers to the average orientation of molecules and not the packing of molecules 
themselves, because they do not have long-range translational order. The helical structure has a characteristic pitch, 
or repeat distance along the helix, which can range from about 100 nm to near infinity. When the pitch length is 
comparable to the wavelength of light, the chiral nematic phase scatters or reflects visible light, producing colours. 
Furthermore, the pitch and, thus, colour are sensitive to temperature, which is the basis of thermochromic devices, 
i.e. those that produce colour changes in response to temperature (section C2.2.4.6). 
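
The helical director field can be written down explicitly in a short Python sketch. Two assumptions are made that are not stated in the text: the helix axis is taken along $z$ with the director rotating in the $xy$ plane, and the reflected wavelength at normal incidence is estimated by the standard relation (mean refractive index times pitch). Function names and sample values are illustrative.

```python
import math


def director(z, pitch):
    """Unit director (nx, ny, nz) of a chiral nematic at height z.

    The director rotates by 2*pi over one pitch; because n and -n are
    equivalent, the structure itself repeats after half a pitch.
    """
    if pitch <= 0.0:
        raise ValueError("pitch must be positive")
    angle = 2.0 * math.pi * z / pitch
    return (math.cos(angle), math.sin(angle), 0.0)


def reflected_wavelength(mean_refractive_index, pitch):
    """Assumed normal-incidence estimate: wavelength = n_mean * pitch."""
    return mean_refractive_index * pitch


if __name__ == "__main__":
    pitch_nm = 400.0  # hypothetical pitch
    for z in (0.0, 100.0, 200.0, 400.0):
        print(z, director(z, pitch_nm))
    print("reflected wavelength (nm):", reflected_wavelength(1.5, pitch_nm))
```

Since the pitch varies with temperature, the estimated wavelength shifts accordingly, which is the mechanism behind the thermochromic behaviour described above.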

Figure C2.2.2. Isotropic, nematic and chiral nematic phases. Here n denotes the director. In the chiral nematic 
phase, the director undergoes a helical rotation, as schematically indicated by its reorientation around a cone. 

Although in figure C2.2.2 they are sketched with rodlike molecules, both nematic and chiral nematic phases can 
also be formed by discotic molecules. 

(B) SMECTIC PHASES 

The notation follows the discovery of different smectic phases, largely on the basis of miscibility experiments 
which did not provide information on the molecular arrangement. Some phases originally thought to be smectic 
(e.g. smectic D) turned out not to be so [10]; thus the modern nomenclature system is not very systematic. Typical 
mesogens forming smectic phases are shown in figure C2.2.3. Smectic phases are characterized by weak layering 
of molecules. This layering is usually so weak that the density modulation is essentially sinusoidal normal to the 
'layers'. In a smectic A (SmA) phase the molecules are, on average, normal to the layers (figure C2.2.4). In 
contrast, in smectic C (SmC) phases the director is tilted with respect to the layers (figure C2.2.4). Different 
alignments of this structure are possible: the molecules can be aligned with an external field so that the layers are 
tilted or, if the phase is grown from an SmA phase in a weak aligning field, the layer orientation can stay the same and the 
molecules can tilt. 

Figure C2.2.3. Examples of smectogens. 

In the smectic A$_1$ (SmA$_1$) phase, the molecules point up or down at random. Thus, the density modulation can be 
described as a Fourier series of cosines: 

$$\rho(z) = \rho_0 + \sum_{n \geq 1} \rho_n \cos(n q_s z + \Phi_n)$$

Here $\rho_0$ is the mean density, the $\rho_n$ are the amplitudes of the harmonics of the density, $q_s$ is the wavenumber and the $\Phi_n$ are arbitrary phase 
angles, which are necessary for a complete theoretical description of this structure (see section C2.2.3.4). The $z$ 
direction is, by convention, normal to the layers. 
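
The cosine series can be evaluated numerically as follows. This Python sketch assumes the form $\rho(z) = \rho_0 + \sum_{n \geq 1} \rho_n \cos(n q_s z + \Phi_n)$, with $\rho_0$ the mean density; the function name and the sample amplitudes are illustrative only.

```python
import math


def smectic_density(z, mean_density, amplitudes, wavenumber, phases):
    """Density along the layer normal as a truncated cosine series.

    amplitudes[n-1] and phases[n-1] belong to the n-th harmonic, whose
    wavenumber is n * q_s.
    """
    if len(amplitudes) != len(phases):
        raise ValueError("one phase angle is needed per harmonic")
    modulation = sum(
        amp * math.cos(n * wavenumber * z + phi)
        for n, (amp, phi) in enumerate(zip(amplitudes, phases), start=1)
    )
    return mean_density + modulation


if __name__ == "__main__":
    q_s = 2.0 * math.pi / 3.0  # hypothetical layer spacing d = 3 (arbitrary units)
    for z in (0.0, 0.75, 1.5, 3.0):
        # Weak layering: a single small harmonic gives a sinusoidal profile.
        print(z, smectic_density(z, 1.0, [0.1], q_s, [0.0]))
```

With only the first harmonic retained the profile is purely sinusoidal with period $d = 2\pi/q_s$, which corresponds to the weak layering described for smectic phases.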

The smectic A phase is a liquid in two dimensions, i.e. in the layer planes, but behaves elastically as a solid in the 
remaining direction. However, true long-range order in this one-dimensional solid is suppressed by logarithmic 
growth of thermal layer fluctuations, an effect known as the Landau-Peierls instability [11, 12 and 13]. 

Detailed x-ray diffraction studies on polar liquid crystals have demonstrated the existence of multiple smectic A 
and smectic C phases [14, 15 and 16]. The first evidence for a smectic A-smectic A phase transition was provided 
by the optical microscopy observations of Sigaud et al [17] on binary mixtures of two smectogens. Different 
structures exist due to the competing effects of dipolar interactions (which can lead to alternating head-tail or 
interdigitated structures) and steric effects (which lead to a layer period equal to the molecular length). These 
phases are thus sometimes referred to as frustrated smectics to reflect the simultaneous presence of two, sometimes 
incommensurate, length scales [18, 19 and 20]. Observed smectic A and smectic C structures are shown in figure 
C2.2.5. Here the arrows denote longitudinal molecular dipoles. In the SmA$_1$ phase, the layer periodicity, $d$, is equal 
to the molecular length $l$. The molecules are interdigitated in the SmA$_d$ phase, due to overlap between aromatic 
cores in antiparallel dimers of polar molecules (e.g. with NO$_2$ or CN terminal groups), leading to typical values of 
$d = (1.4\text{--}1.8)\,l$. In the SmA$_2$ phase, the polar molecules are arranged in an antiparallel arrangement with $d = 2l$ 
[21, 22]. There are also two modulated smectic phases [21, 22]. In the Sm$\tilde{\mathrm{A}}$ phase, there is an alternation of 
antiferroelectric ordering producing a 'ribbon' structure, in which the ribbons are arranged on a centred lattice. In 
the SmA$_{\mathrm{cre}}$ 'crenellated' phase, on the other hand, the ribbons lie on a primitive lattice, i.e. there is an alternation 
in the lateral size of 'up' and 'down' domains (figure C2.2.5). An Sm$\tilde{\mathrm{C}}$ phase has also been observed, with an 
alternation of bilayers in which the molecules are tilted with respect to the layers (figure C2.2.5). Finally, so-called 
'incommensurate' SmA phases have been identified, in which SmA$_d$ and either SmA$_1$ or SmA$_2$ periodic density 
waves coexist along the layer normal producing SmA$_{1,\mathrm{inc}}$ and SmA$_{2,\mathrm{inc}}$ phases, respectively. Such phases are quite 
difficult to represent in real space, so are not shown in figure C2.2.5. In the case of a weakly coupled phase, the 
two independent and incommensurate waves coexist almost independently of each other, whereas in a strongly 
coupled incommensurate (soliton) SmA$_{\mathrm{inc}}$ phase regions of 'locked' SmA ordering are separated by smaller 
regions where the coexisting density waves are out of phase [21, 21, 24 and 25]. X-ray diffraction is an invaluable 
technique to elucidate the structure of frustrated smectics, because Bragg peaks are obtained that are reciprocally 
related to the periodicities in the structure. In an oriented sample, the orientation of these peaks furthermore 
indicates the direction of these periodicities. Excellent reviews have been provided of the experimental evidence 
for frustrated smectic phases [21, 22] and of their theoretical description [18, 19, 21, 21, 25]. 

Both SmA and SmC phases are characterized by liquid-like ordering within the layer planes. Other types of smectic 
phase with in-plane order have been identified [9, 20, 26, 22]. These phases exhibit bond-orientational order but only 
short-range positional order within the smectic layers. The layers themselves are stacked with quasi-long-range 
order. True long-range ordering is suppressed due to the Landau-Peierls instability, as in all smectic phases. 
Bond-orientational order refers to the orientation of the vectors defining the in-plane lattice; it was proposed to exist in a 
smectic phase [28, 29] before being observed [10, 31]. As illustrated in figure C2.2.4, in smectic B, smectic I and 
smectic F phases there is sixfold bond-orientational order, i.e. the lattice orientation is retained in the layers but the 
translational order is lost within a few intermolecular distances. The smectic B (SmB, sometimes known as hexatic 
B) phase resembles the SmA phase, but with long-range 'hexatic' bond-orientational order. The SmI and SmF 
phases are tilted versions of SmB. In the SmI phase, the molecules are tilted towards a vertex (nearest neighbour) 
whereas in the SmF phase they are tilted towards an edge of the hexagonal net. 

Figure C2.2.4. Types of smectic phase. Here the layer stacking (left) and in-plane ordering (right) are shown for 
each phase. Bond orientational order is indicated for the hexB, SmI and SmF phases, i.e. long-range order of lattice 
vectors. However, there is no long-range translational order in these phases. 

Figure C2.2.5. Frustrated smectic phases. Here the arrows denote longitudinal molecular dipoles. 

Versions of the SmB, SmI and SmF phases with a higher degree of order were originally classified as smectic 
phases; however, they are now known to be 'soft-crystal' phases, with true long-range positional order in three 
dimensions. The layers are, however, very weakly attached to each other and this was the source of the original 
misidentification. The crystal version of SmB is now termed crystal B (abbreviated simply as B [7]) and crystal J 
and crystal G are three-dimensionally ordered versions of SmI and SmF, respectively. A further soft-crystal phase, 
confused in the early literature with a smectic, is the crystal E phase in which the molecules have a 'herringbone' or 
'chevron' packing, which results from the quenching of sixfold rotational disorder in the B phase to produce 
long-range ordering of the short molecular axes. Tilted versions of this phase, called crystal H and crystal K, are derived 
from G and J phases respectively [20, 22, 26 and 27]. 

As with the nematic phase, a chiral version of the smectic C phase has been observed and is denoted SmC*. In this 
phase, the director rotates around the cone generated by the tilt angle [9, 32]. This phase is helielectric, i.e. the 
spontaneous polarization induced by dipolar ordering (transverse to the molecular long axis) rotates around a helix. 
However, if the helix is unwound by external forces such as surface interactions or electric fields, or by 
compensating the pitch in a mixture, so that it becomes infinite, the phase becomes ferroelectric. This is the basis of 
ferroelectric liquid crystal displays (section C2.2.4.4). If there is an alternation in polarization direction between 
layers the phase can be ferrielectric or antiferroelectric. A smectic A phase formed by chiral molecules is 
sometimes denoted SmA*, although, due to the untilted symmetry of the phase, it is not itself chiral. This notation 
is strictly incorrect because the asterisk should be used to indicate the chirality of the phase and not that of the 
constituent molecules. 

(C) COLUMNAR PHASES 

Columnar phases are formed by discotic mesogens [33], examples of which are shown in figure C2.2.6 . An 
excellent review of molecules that form discotic phases has recently appeared [34]. Discotic molecules can form a 
nematic phase (termed N_D) just like calamitic mesogens. In addition, several types of columnar phase have been 
observed (figure C2.2.7) [35]. The recommended abbreviation for these phases is col [7], although D is often 
encountered, especially in the early literature. In the col_hd phase there is a disordered stacking of discotic molecules 
in the columns, which are packed hexagonally. Hexagonal columnar phases where there is an ordered stacking 
sequence (col_ho) or where the mesogens are tilted within the columns (col_t) are also known [9, 20, 34, 35 and 36]. 
It is, however, important to note that individual columns are one-dimensional stacks of molecules and that long-range 
positional order is not possible in a one-dimensional system, owing to thermal fluctuations; therefore, a sharp 
distinction between col_hd and col_ho is not possible [20]. Phases where the columns have a rectangular (col_rd) or 
oblique (col_ob,d) packing, with a disordered stacking of mesogens, have also been observed [9, 20, 25, 
34, 35 and 36]. 

C2.2.1.3 LYOTROPIC LIQUID CRYSTALS 

Lyotropic liquid crystals are discussed in section C2.3 and section C2.6 of this encyclopedia and will not be 
considered further here. 

Figure C2.2.6. Examples of disclike mesogens. 

Figure C2.2.7. Schematic illustrating the classification and nomenclature of discotic liquid crystal phases. For the 
columnar phases, the subscripts are usually used in combination with each other. For example, D_rd denotes a 
rectangular lattice of columns in which the molecules are stacked in a disordered manner (after [33]). 


C2.2.2 CHARACTERISTICS OF LIQUID CRYSTAL PHASES 

C2.2.2.1 IDENTIFICATION OF LIQUID CRYSTAL PHASES 

(A) TEXTURES 

Liquid crystal phases possess characteristic textures when viewed in polarized light under a microscope. These 
textures, which can often be used to identify phases, result from defects in the structure. Compendia of micrographs 
showing typical textures exist to facilitate phase identifications [37, 38]. These monographs also discuss the origins 
of defect structures in some detail. 

As in crystals, defects in liquid crystals can be classified as point, line or wall defects. Dislocations are a feature of 
liquid crystal phases where there is translational order, since these are line defects in this 'lattice' order. Unlike 
crystals, liquid crystals also exhibit a unique type of line defect termed a disclination [39]. A disclination is a 
discontinuity in the orientation of the director field. 


Disclinations in the nematic phase produce the characteristic 'Schlieren' texture, observed under the microscope 
using crossed polars for samples between glass plates when the director takes nonuniform orientations parallel to 
the plates. In thicker films of nematics, textures of dark flexible filaments are observed, whether in polarized light 
or not. This texture, in fact, gave rise to the term nematic (from the Greek for 'thread') [40]. The director fields 
around disclinations of different 'strength', s, are shown in figure C2.2.8 (the lines run normal to the page). The 
variation in director orientation can be mapped out by rotating the sample between crossed polars. If the director 
n in the xy-plane is denoted by a vector $\mathbf{n} = [\cos\theta(\mathbf{r}), \sin\theta(\mathbf{r})]$, then it can be shown [37, 41, 42] that $\theta$ varies with 
$\mathbf{r} = (x, y)$ as 

$$\theta = s \tan^{-1}\left(\frac{y}{x}\right) + \theta_0 \tag{C2.2.2}$$

where $\theta_0$ is a fixed angle. The chiral nematic phase has textures distinct from those of the non-chiral phase, which 
depend on the director orientation with respect to the confining glass slides. These are discussed in [37] and [43]. 
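
As an illustration of equation (C2.2.2), the following minimal Python sketch evaluates the director components around a single disclination. The function name `disclination_director` and its arguments are hypothetical, introduced only for this example. The four-quadrant `arctan2(y, x)` is used in place of the inverse tangent of y/x so that the angle is continuous around the defect core (apart from one branch cut); this is an implementation choice, not something stated in the text.

```python
import numpy as np

def disclination_director(x, y, s, theta0=0.0):
    """Return (n_x, n_y) at position (x, y) around a disclination of strength s.

    Implements theta = s * atan(y/x) + theta0, using the four-quadrant
    arctan2 so that every point around the core gets a sensible angle.
    """
    theta = s * np.arctan2(y, x) + theta0
    return np.cos(theta), np.sin(theta)

if __name__ == "__main__":
    # Sample the director on a unit circle centred on the defect core.
    phi = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
    for s in (0.5, 1.0, -0.5):
        nx, ny = disclination_director(np.cos(phi), np.sin(phi), s)
        print("s =", s)
        print("  n_x:", np.round(nx, 3))
        print("  n_y:", np.round(ny, 3))
```

For s = 1 and $\theta_0 = 0$ the director is radial, whereas s = 1 with $\theta_0 = \pi/2$ gives a tangential (circular) pattern. For half-integer s the computed vector changes sign across the branch cut, which is physically harmless because n and -n describe the same director.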

Wall defects are also very important in nematic phases, especially in electric or magnetic fields. This will be 
considered further in section C2.2.4.1 , which discusses Freedericksz transitions in a nematic in an electric or 
magnetic field. 

Figure C2.2.8. The director field in a nematic around disclinations of various strengths, s. The director fields are 
given by equation (C2.2.2). 

Smectic A phases in which the layers are not uniformly parallel to the glass slides confining the sample (non-planar 
orientation [37, 43]) are characterized by 'fanlike' textures, made up of 'focal conics' [20, 37, 43, 44]. A 
focal conic is an intersection in the plane of a geometric object called a Dupin cyclide, that results from lamellae 
forming a concentric roll (like a Swiss roll or jelly roll) being bent into an elliptical torus of non-uniform 
cross-section. The fan texture results from disclinations in layers perpendicular to the plane of the confining glass plates, 
usually from focal conics packed into polygonal domains, producing ellipses lying in the plane (figure C2.2.9) [20, 
37, 44]. For smectic A phases with layers oriented parallel to the confining slides, defects such as steps at the edge 
of lamellae are common [37], although other types of defect have been observed [20]. If a smectic C phase is 
prepared by cooling a smectic A phase, regions of Schlieren texture develop in areas of the sample that previously 
appeared dark under crossed polars; these can coexist with a fan structure. The hexatic B phase is characterized by 
'mosaic' or 'broken' fan textures [9, 37, 38] in non-planar orientations, whilst SmI and SmF phases show a 
Schlieren texture similar to that of SmC, and can often be difficult to distinguish [9]. The crystal phases are 
characterized by mosaic, platelet or bâtonnet structures. Further details can be found elsewhere [37, 38]. 



Figure C2.2.9. Polygonal domains of focal conics in a smectic A phase confined between parallel plates. 

(B) LIGHT SCATTERING 

The milky appearance of nematics is due to variations in refractive index on the length scale of the wavelength of 
visible light, which result from thermal fluctuations of director orientation. It is possible to analyse the angular 
dependence of scattered light intensity in static light scattering experiments [45, 46] to obtain ratios of Frank elastic 
constants (defined in section C2.2.3.3 ). However, dynamic light scattering (DLS) proves far more powerful since it 
yields information on hydrodynamic modes as well as the static elasticities. It can thus be used to obtain the Leslie 
coefficients related to the viscosity of nematic phases (see section C2.2.3.3). In samples oriented in specific 
geometries, it is also possible to measure the Frank elastic constants $K_1$, $K_2$ and $K_3$ individually using DLS rather 
than just their ratios [45, 46]. 

The aforementioned light scattering techniques probe long-range fluctuations of the director. Other spectroscopic 
techniques can provide information on molecular ordering. For example, polarized Raman spectroscopy has been 
used to measure orientational order parameters in nematic and smectic phases based on measurements of the 
depolarization of light in oriented samples [47]. These measurements depend on anisotropic molecular 
polarizabilities, associated with specific Raman active bond vibrations, in liquid crystal phases [47]. Brillouin 
scattering is a type of inelastic scattering characterized by smaller frequency shifts than Raman bands [48]. It 
results from light scattered from alternate layers of compression and rarefaction produced by phonons in a material 
and has been applied to nematic and smectic liquid crystals [49, 50]. It provides a measure of the velocity and 
absorption of second sound, and has been used to obtain elastic constants in the smectic phase [50]. 

(C) X-RAY AND NEUTRON DIFFRACTION 

X-ray diffraction is one of the primary methods to determine the structure of a liquid crystal phase [22, 51]. 
Smectic phases are characterized by Bragg spots, which result from the layer periodicity. If the sample is oriented, 
usually in an electric or magnetic field, this yields further information, for example on the tilt in SmC phases or on 
the structure of modulated phases. An oriented nematic phase is characterized by diffuse arcs that result from local 
anisotropic intermolecular interactions. Oriented smectic and nematic phases are characterized by wide-angle 
scattering arcs which result from the side-to-side packing of molecules. The anisotropy of these arcs is related to 
the extent of orientational ordering; indeed, the azimuthal angular dependence of the scattered intensity can be used 
to obtain an orientational order parameter [52]. However, this analysis does not provide information on the 
single-molecule orientational distribution function (section C2.2.3.1). A powerful method to obtain this exploits the 
contrast variation possible in small-angle neutron scattering using deuterium labelling. A mixture of normal and 
deuterated mesogens produces purely single-molecule scattering at low angles, and this can be directly analysed to 
provide order parameters and to reconstruct the orientational distribution function [53, 54]. In fact, diffraction is 
capable of providing, in principle, the full distribution function, unlike spectroscopic methods [55]. Neutron 
scattering has also been used to probe molecular diffusive motions (rotational and translational), via incoherent 
quasi-elastic neutron scattering [54]. 


(D) SPECTROSCOPIC TECHNIQUES 

Of spectroscopic techniques, nuclear magnetic resonance (NMR) has been most widely used to measure 
orientational ordering in liquid crystals [56, 52 and 58]. Most commonly, changes of line splittings in the spectra of 
deuterium-labelled molecules are used, specifically ²H quadrupolar splittings or intermolecular dipole-dipole 
couplings between pairs of protons. If molecules are partially deuterated then information on the ordering of the 
labelled segments can be obtained. In addition, the dipolar spectra are easier to analyse than those of fully 
protonated molecules, when deuterium decoupling techniques are exploited. Further details are provided in [57]. 
Another method to obtain spectra that can easily be interpreted is to use rigid solutes dissolved in the liquid crystal 
phase [57]. If the structure of the solute molecules is not too complicated, the dipolar couplings can be analysed to 
provide orientational order parameters. Of course, the method only provides information on orientational ordering 
in liquid crystal-solute mixtures. However, since the form of the anisotropic interactions should be the same as in 
the pure liquid crystal phase, studies on solutes can provide indirect information on the ordering of the mesogens. 
The analysis of chemical shift anisotropies has recently been exploited to probe orientational ordering [56]. 

NMR is not the best method to identify thermotropic phases, because the spectrum is not directly related to the 
symmetry of the mesophase, and transitions between different smectic phases or between a smectic phase and the 
nematic phase do not usually lead to significant changes in the NMR spectrum [56]. However, the nematic-isotropic 
transition is usually obvious from the discontinuous decrease in orientational order. NMR can nevertheless 
be used to identify lyotropic liquid crystal phases, using ²H NMR on solutions in D₂O. Cubic, hexagonal and 
lamellar phases can be distinguished due to different averages of the quadrupolar interaction which result from 
differences in the curvature and symmetry of the amphiphile-water interface [56, 52 and 58]. 

(E) DIFFERENTIAL SCANNING CALORIMETRY 

This method is used to locate phase transitions via measurements of the endothermic enthalpy of phase transition. 
Details of the technique are provided elsewhere [25, 58]. Typically, the enthalpy change associated with transitions 
between liquid crystal phases or from a liquid crystal phase to the isotropic phase is much smaller than the melting 
enthalpy. Nevertheless, it is possible to locate such transitions with a commercial DSC, since typical enthalpies are 
1-5 kJ mol⁻¹ [9]. These relatively small values indicate that transitions between liquid crystal phases and between 
these and the isotropic phase involve much more delicate structural changes than those that accompany the 
crystal-liquid crystal melting transition. Most liquid crystal phase transitions are first order (discontinuous in enthalpy), 
although some, such as the SmC to SmA transition, can be second order (continuous). The latter can be difficult to 
locate, because the heat capacity is small and it is then necessary to turn to a higher-resolution technique such as 
adiabatic scanning calorimetry [59]. 


C2.2.3 THEORY 

Thermotropic liquid crystal phases are formed by rodlike or disclike molecules. In the following we 
consider orientational ordering of rodlike molecules for definiteness, although the same parameters can be used for 
discotics. In a liquid crystal phase, the anisotropic molecules tend to point along the same direction. This direction 
is known as the director, which is a unit vector denoted n. 

C2.2.3.1 DEFINITION OF AN ORIENTATIONAL ORDER PARAMETER 

Long-range orientational order of the constituent molecules is the defining characteristic of liquid crystals. It is 
therefore important to be able to quantify the degree of orientational order. To do this, an orientational order 
parameter is introduced, which describes the average orientation of the molecules. In general, the orientational 
distribution for a rigid molecule is a function of the three Euler angles $\Omega = (\alpha, \beta, \gamma)$ with respect to n. However, for 
a uniaxial phase of cylindrically symmetric molecules, only the polar angle $\beta$ is relevant. The orientational 
distribution function then describes the probability for molecules to be oriented at an angle $\beta$ with respect to the 
average, i.e. with respect to the director. It is usually denoted by $f(\beta)$ and, in terms of the anisotropic potential of 
mean torque, $U(\beta)$, is defined as 

$$f(\beta) = Z^{-1} \exp[-U(\beta)/kT] \tag{C2.2.3}$$

where Z is the orientational partition function 

$$Z = \int_{-1}^{1} \exp[-U(\beta)/kT]\, \mathrm{d}(\cos\beta). \tag{C2.2.4}$$

Typical shapes of the orientational distribution function are shown in figure C2.2.10. In a liquid crystal phase, the 
more highly oriented the phase, the more $f(\beta)$ tends to be sharply peaked near $\beta = 0$. However, in the isotropic phase, a 
molecule has an equal probability of taking on any orientation and then $f(\beta)$ is constant. 

An orientational order parameter can be defined in terms of an ensemble average of a suitable orthogonal 
polynomial. In liquid crystal phases with a mirror plane of symmetry normal to the director, orientational ordering 
is specified, to lowest order, by the order parameter 

$$\bar{P}_2 = \tfrac{3}{2}\,\overline{\cos^2\beta} - \tfrac{1}{2}. \tag{C2.2.5}$$

Here the bar indicates an average over the orientational distribution function and $P_2(\cos\beta) = (\tfrac{3}{2}\cos^2\beta - \tfrac{1}{2})$ is the 
second rank Legendre polynomial. This average takes the value $\bar{P}_2 = 0$ for an isotropic phase. For a completely 
oriented phase $\bar{P}_2 = 1$. The order parameter is sometimes denoted by the symbol S [20]. The average in equation 
(C2.2.5) can be written explicitly in terms of the orientational distribution function: 

$$\bar{P}_2 = \int_{-1}^{1} P_2(\cos\beta)\, f(\beta)\, \mathrm{d}(\cos\beta). \tag{C2.2.6}$$

To completely specify the orientational ordering, the complete set of orientational order parameters, $\bar{P}_L$, L = 0, 2, 4, ..., 
is required. Only the even rank order parameters are non-zero for phases with a symmetry plane perpendicular to 
the director (e.g. N and SmA phases). 

Figure C2.2.10. Orientational distribution functions for (a) a highly oriented liquid crystal phase, (b) a less well 
oriented liquid crystal phase and (c) an isotropic phase. 
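
The averaging in equations (C2.2.3) to (C2.2.6) can be sketched numerically as follows. This is only an illustration: the function names are hypothetical, the integration is taken over cos β between -1 and 1 (an assumption about the limits), and the trial potential $-\epsilon P_2(\cos\beta)$ used in the example is an arbitrary choice made here to produce a peaked distribution, not a form taken from the text at this point.

```python
import numpy as np

def legendre_p2(x):
    """Second-rank Legendre polynomial P2(x) with x = cos(beta)."""
    return 0.5 * (3.0 * x**2 - 1.0)

def order_parameter_p2(potential_over_kT, n_nodes=200):
    """Average of P2 over f(beta) proportional to exp[-U(beta)/kT].

    potential_over_kT: callable taking an array of cos(beta) values and
    returning U/kT at those orientations (hypothetical interface).
    The integrals over cos(beta) in [-1, 1] use Gauss-Legendre quadrature.
    """
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    u = potential_over_kT(x)
    # Subtracting the minimum avoids overflow; the constant cancels in the ratio.
    boltzmann = np.exp(-(u - u.min()))
    z = np.sum(w * boltzmann)                      # partition function (up to a constant)
    return np.sum(w * legendre_p2(x) * boltzmann) / z

if __name__ == "__main__":
    isotropic = order_parameter_p2(lambda x: np.zeros_like(x))
    aligned = order_parameter_p2(lambda x: -50.0 * legendre_p2(x))
    print("isotropic phase:", isotropic)
    print("strongly aligning trial potential:", aligned)
```

A constant potential gives a vanishing order parameter, and a deep aligning potential gives a value approaching unity, in line with the limits quoted above.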

C2.2.3.2 THEORIES FOR ORIENTATIONAL ORDER 

There are basically two types of theory for orientational ordering in liquid crystals. The first considers long-range 
attractive dispersion interactions. The Maier-Saupe theory for orientational ordering in nematic phases belongs to 
this category. The second type of theory assumes that orientational order results from short-range steric 
interactions. The first example of this type of theory was the Onsager model, in which the excluded volume for 
rodlike particles is calculated as a function of their volume fraction. At sufficiently large volume fractions, the 
theory is able to predict a nematic phase. 

We consider first the Maier-Saupe theory and its variants. In its original formulation, this theory assumed that 
orientational order in nematic liquid crystals arises from long-range dispersion forces which are weakly anisotropic 
[60, 61 and 62]. However, it has been pointed out [63] that the form of the Maier-Saupe potential is equivalent to 
one in which there are both long-range attractive and short-range attractive contributions to the intermolecular potential. 
The general form of this potential is 

$$U(\cos\beta) = -\bar{u}_2 \bar{P}_2 P_2(\cos\beta). \tag{C2.2.7}$$

This can be inserted in equation (C2.2.3) to give the orientational distribution function, and thus into equation 
(C2.2.6) to determine the orientational order parameters. These are determined self-consistently by variation of the 
interaction 'strength' $\bar{u}_2$ in equation (C2.2.7). As pointed out by de Gennes and Prost [20] it is possible to obtain the 
Maier-Saupe potential from a simple variational, maximum entropy method based on the lowest-order anisotropic 
distribution function consistent with a nematic phase. 
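
The self-consistent determination of the order parameter described above can be sketched with a simple fixed-point iteration. The sketch assumes that the potential of mean torque is proportional to $-\bar{P}_2 P_2(\cos\beta)$ with a positive strength, so that the only input is the dimensionless ratio of strength to kT; the sign convention, the function names and the choice of fixed-point iteration (rather than, say, free-energy minimization) are all assumptions made for this example.

```python
import numpy as np

def legendre_p2(x):
    """Second-rank Legendre polynomial P2(x) with x = cos(beta)."""
    return 0.5 * (3.0 * x**2 - 1.0)

def maier_saupe_order(strength_over_kT, start=1.0, tol=1e-12, max_iter=10000):
    """Iterate P2bar -> <P2> until self-consistent.

    Assumes U/kT = -strength_over_kT * P2bar * P2(cos beta).
    Starting from a fully ordered guess picks out the ordered solution
    when one exists; otherwise the iteration decays to the isotropic
    solution P2bar = 0.
    """
    x, w = np.polynomial.legendre.leggauss(128)    # nodes in cos(beta)
    p2 = legendre_p2(x)
    s = start
    for _ in range(max_iter):
        arg = strength_over_kT * s * p2
        boltzmann = np.exp(arg - arg.max())        # shift avoids overflow
        s_new = np.sum(w * p2 * boltzmann) / np.sum(w * boltzmann)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    return s

if __name__ == "__main__":
    for a in (3.0, 4.0, 5.0, 6.0):
        print("strength/kT =", a, " P2bar =", round(float(maier_saupe_order(a)), 4))
```

Note that the iteration only locates a self-consistent solution; deciding which solution is thermodynamically stable near the transition would additionally require comparing free energies, which this sketch does not do.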

A generalization of the Maier-Saupe theory to account for terms higher than second rank in the potential was 
presented by Humphries et al [64]. The model has also been extended to account for the orientational ordering of 
non-cylindrically symmetric (biaxial) nematogens [65]. Exploiting the rotational isomeric state model [66] to 
generate conformers, a molecular field theory for the orientational order of flexible nematogens has been 
developed, where the orientational ordering of each segment of a molecule is described by a second-rank tensor 
[67, 68]. The ability of such models to describe the orientational order of nematogens containing terminal alkyl 
chains has been assessed by making comparisons with order parameters extracted from NMR experiments [69, 70]. 
An odd-even variation of segmental orientational order parameters with the number of carbon atoms in the chain is 
one of the observed features that these theories can reproduce. The nematic-isotropic phase transition temperature 
and entropy also show an odd-even variation [67, 69]. 

The Maier-Saupe theory was extended to account for ordering in the smectic A phase by McMillan [71]. He 
allowed for the coupling of orientational order to the translational order, by introducing a translational order 
parameter which depends on an ensemble average of the first harmonic of the density modulation normal to the 
layers as well as $\bar{P}_2$. This model can account for both first- and second-order nematic-smectic A phase transitions, 
as observed experimentally. 

Turning now to theories for the nematic phase based on short-range repulsive intermolecular interactions, we 
consider first the Onsager model [72]. This theory has been used to describe nematic ordering in solutions of 
rodlike macromolecules such as tobacco mosaic virus or poly(γ-benzyl-L-glutamate). Here, the orientational 
distribution is calculated from the volume excluded to one hard cylinder by another. The theory assumes that the 
rods cannot interpenetrate. Denoting the length of rods by L and the diameter by D, it is assumed that the volume 
fraction $\Phi = \tfrac{1}{4}\pi L D^2 c$ (c = concentration) is much less than unity and that the rods are very long, $L \gg D$. It is found 
that the nematic phase exists above a volume fraction $\Phi_c = 4.5D/L$ [20]. The Onsager theory predicts jumps in 
density and order parameter $\bar{P}_2$ at the isotropic-nematic phase transition on cooling, that are much larger than those 
observed for thermotropic liquid crystals [20]. It is an athermal model so that quantities like the transition density 
are independent of temperature. For these reasons, it has not proved very successful for thermotropic liquid 
crystals, for which the (thermal) Maier-Saupe theory and its extensions are more suitable. 
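
To give a feeling for the magnitudes involved, the Onsager estimate quoted above (a critical volume fraction of 4.5D/L, with the volume fraction of cylinders taken as one quarter of π L D² times the number concentration) can be evaluated for a given rod geometry. The function names are hypothetical, and the rod dimensions in the example (300 nm by 18 nm, roughly those of tobacco mosaic virus) are approximate illustrative values, not data from the text.

```python
import math

def onsager_critical_volume_fraction(length, diameter):
    """Critical volume fraction 4.5 * D / L; only meaningful for L >> D."""
    return 4.5 * diameter / length

def rod_volume_fraction(length, diameter, number_density):
    """Volume fraction (pi / 4) * L * D**2 * c of cylinders at number density c."""
    return 0.25 * math.pi * length * diameter**2 * number_density

if __name__ == "__main__":
    L_nm, D_nm = 300.0, 18.0          # approximate, illustrative rod dimensions
    phi_c = onsager_critical_volume_fraction(L_nm, D_nm)
    # Number density (rods per cubic nanometre) at which phi reaches phi_c.
    c_crit = phi_c / (0.25 * math.pi * L_nm * D_nm**2)
    print("critical volume fraction:", phi_c)
    print("critical number density (nm^-3):", c_crit)
```

Because the result depends only on the aspect ratio, the estimate contains no temperature, which reflects the athermal character of the model noted above.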

It has not proved possible to develop general analytical hard-core models for liquid crystals, just as for normal 
liquids. Instead, computer simulations have played an important role in extending our understanding of the phase 
behaviour of hard particles. Frenkel and Mulder found that a system of hard ellipsoids can form a nematic phase for 
ratios L/D > 2.5 (rods) or L/D < 0.4 (discs) [73]; however, such a system cannot form a smectic phase, as can be 
shown by a scaling argument within the statistical mechanical theory [74]. However, simulations show that a 
smectic phase can be formed by a system of hard spherocylinders [75, 76]. The critical volume fractions for 
stability of a smectic A phase depend on whether the model is that of parallel spherocylinders [75], or, more 
realistically, freely rotating spherocylinders [74]. 

C2.2.3.3 CONTINUUM THEORY FOR ELASTIC PROPERTIES 

An aligned monodomain of a nematic liquid crystal is characterized by a single director n . However, in imperfectly 
aligned or unaligned samples the director varies through space. The appropriate tensor order parameter to describe 
the director field is then 


$$Q_{\alpha\beta}(\mathbf{r}) = Q(T)\left[n_\alpha(\mathbf{r})\, n_\beta(\mathbf{r}) - \tfrac{1}{3}\delta_{\alpha\beta}\right] \tag{C2.2.8}$$

where $\alpha, \beta = 1, 2, 3$ and $\delta_{\alpha\beta}$ is the Kronecker delta. In the continuum theory, it is assumed that n varies 
slowly and smoothly with spatial position r, so that details on a molecular scale can be neglected. This model is an 
extension of the elastic theory for solid bodies and was first applied to liquid crystals by Oseen [77]. The modern 
version of the theory is due to Frank [39] and its relationship to hydrodynamic theory has been considered [78]. 
Details can be found elsewhere [20, 39 and 79]. The result is that the elastic energy per unit volume has the form 

$$F_N = \tfrac{1}{2} K_1 [\nabla \cdot \mathbf{n}]^2 + \tfrac{1}{2} K_2 [\mathbf{n} \cdot (\nabla \times \mathbf{n})]^2 + \tfrac{1}{2} K_3 [\mathbf{n} \times (\nabla \times \mathbf{n})]^2. \tag{C2.2.9}$$

Here $K_1$, $K_2$ and $K_3$ are elastic constants. The first, $K_1$, is associated with a splay deformation, $K_2$ is associated with 
a twist deformation and $K_3$ with bend (figure C2.2.11). These three elastic constants are termed the Frank elastic 
constants of a nematic phase. Since they control the variation of the director orientation, they influence the 
scattering of light by a nematic and so can be determined from light-scattering experiments. Other techniques 
exploit electric or magnetic field-induced transitions in well-defined geometries (Freedericksz transitions, see 
section C2.2.4.1) [20, 80]. 
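
The three contributions in equation (C2.2.9) can be made concrete with a short numerical sketch that evaluates the splay, twist and bend terms from the director and its gradient tensor at one point. The function name and the convention `grad_n[i, j]` for the derivative of component i with respect to coordinate j are assumptions introduced for this example only.

```python
import numpy as np

def frank_energy_density(n, grad_n, k1, k2, k3):
    """Splay + twist + bend elastic energy density at a single point.

    n      : director components (n_x, n_y, n_z), assumed normalized
    grad_n : 3x3 array with grad_n[i, j] = d n_i / d x_j
    """
    n = np.asarray(n, dtype=float)
    g = np.asarray(grad_n, dtype=float)
    div_n = np.trace(g)                                   # splay: div n
    curl_n = np.array([g[2, 1] - g[1, 2],
                       g[0, 2] - g[2, 0],
                       g[1, 0] - g[0, 1]])                # curl n
    twist = np.dot(n, curl_n)                             # n . curl n
    bend = np.cross(n, curl_n)                            # n x curl n
    return 0.5 * k1 * div_n**2 + 0.5 * k2 * twist**2 + 0.5 * k3 * np.dot(bend, bend)

if __name__ == "__main__":
    # Pure twist: n = (cos qz, sin qz, 0), for which only the K2 term survives.
    q, z = 2.0, 0.3
    n = [np.cos(q * z), np.sin(q * z), 0.0]
    g = np.zeros((3, 3))
    g[0, 2] = -q * np.sin(q * z)
    g[1, 2] = q * np.cos(q * z)
    print(frank_energy_density(n, g, k1=1.0, k2=3.0, k3=5.0))
```

For the uniformly twisted field in the example the energy density reduces to one half of $K_2 q^2$, independent of position, while a radial in-plane field gives a pure splay energy that falls off as the inverse square of the distance from the centre.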


Figure C2.2.11. (a) Splay ($K_1$), (b) twist ($K_2$) and (c) bend ($K_3$) deformations in a nematic liquid crystal. The director is indicated 
by a dot, when normal to the page. The corresponding Frank elastic constants are indicated (equation (C2.2.9)). 


Continuum theory has also been applied to analyse the dynamics of flow of nematics [77, 80, 81 and 82]. The 
equations provide the time-dependent velocity, director and pressure fields. These can be determined from 
equations for the fluid acceleration (in terms of the total stress tensor split into reversible and viscous parts), the 
rate of change of director in terms of the velocity gradients and the molecular field and the incompressibility 
condition [20]. Further details can be found elsewhere [20, 78, 82 and 84]. An approach to the dynamics of 
nematics based on analysis of microscopic correlation functions has also been presented [85]. Various 
combinations of elements of the viscosity tensor of a nematic define the so-called Leslie coefficients [20, 84]. 

As for crystals, the elasticity of smectic and columnar phases is analysed in terms of displacements of the lattice 
with respect to the undistorted state, described by the field u(r). This represents the distortion of the layers in a 
smectic phase and, thus, u(r) is a one-dimensional vector (conventionally defined along z), whereas the columnar 
phase is two dimensional, so that u(r) is also. The symmetry of a smectic A phase leads to an elastic free energy 
density of the form [86] 

$$F = F_0 + \tfrac{1}{2} B \left(\frac{\partial u}{\partial z}\right)^2 + \tfrac{1}{2} K_1 \left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)^2 \tag{C2.2.10}$$

Here $F_0$ is the free energy of the isotropic phase. As usual, the z direction is normal to the layers. Thus, two elastic 
constants, B (compression) and $K_1$ (splay), are necessary to describe the elasticity of a smectic phase [20, 79, 86]. 
A simple derivation of this equation based on the lowest-order derivative (curvature) of the layer displacement field 
u(r) has been provided [87]. A similar expression can be obtained for a uniaxial columnar phase [ 20 ] (with the 
columns lying in the z direction): 


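$$F = F_0 + \tfrac{1}{2} B \left(\frac{\partial u_x}{\partial x} + \frac{\partial u_y}{\partial y}\right)^2 + \tfrac{1}{2} C \left[\left(\frac{\partial u_x}{\partial x} - \frac{\partial u_y}{\partial y}\right)^2 + \left(\frac{\partial u_x}{\partial y} + \frac{\partial u_y}{\partial x}\right)^2\right] + \tfrac{1}{2} K_3 \left[\left(\frac{\partial^2 u_x}{\partial z^2}\right)^2 + \left(\frac{\partial^2 u_y}{\partial z^2}\right)^2\right] \tag{C2.2.11}$$

Here B is again a compressional elastic constant, $K_3$ is a bend elastic constant and the elastic constant C results 
from an elliptical deformation of the rods (this term is absent if the column is liquid). 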


C2.2.3.4 THEORIES FOR PHASE TRANSITIONS 


(A) NEMATIC-SMECTIC A TRANSITION 


The nematic to smectic A phase transition has attracted a great deal of theoretical and experimental interest because 
it is the simplest example of a phase transition characterized by the development of translational order [88]. 
Experiments indicate that the transition can be first order or, more usually, continuous, depending on the range of 
stability of the nematic phase. In addition, the critical behaviour that results from a continuous transition is 
fascinating and allows a test of predictions of the renormalization group theory in an accessible experimental 
system. In fact, this transition is analogous to the transition from a normal conductor to a superconductor [89], but 
is more readily studied in the liquid crystal system. 

When a nematic phase is cooled towards a smectic A phase, fluctuations of smectic order build up. These 
fluctuations were called 'cybotactic clusters' in the early literature. Regardless of the physical picture of such 
fluctuations, it has been observed that the cluster size grows as the transition is approached [20, 90]. Furthermore, 
these clusters are anisotropic, being elongated along the director. They also grow faster along this direction as the 
transition is approached from above [20, 90]. 

Undoubtedly the most successful model of the nematic-smectic A phase transition is the Landau-de Gennes model 
[20]. It is applied in the case of a second-order phase transition by combining a Landau expansion for the free 
energy in terms of an order parameter for smectic layering with the elastic energy of the nematic phase [20]. It is 
first convenient to introduce an order parameter for the smectic structure, which allows both for the layer 
periodicity (at the first harmonic level, cf. equation (C2.2.1)) and the fluctuations of layer position u(r) [20]: 

$$\psi(\mathbf{r}) = \rho_1(\mathbf{r})\, e^{i\Phi(\mathbf{r})} \tag{C2.2.12}$$

where $\Phi(\mathbf{r}) = -q_s u(\mathbf{r})$ is a phase factor. 

Using this order parameter, the free energy in the nematic phase close to a transition to the smectic phase can be 
shown to be given by [20, 88, 89, 91] 

$$F = F_0 + \tfrac{1}{2} A|\psi|^2 + \tfrac{1}{4} C|\psi|^4 + \tfrac{1}{6} E|\psi|^6 + C_\parallel |\nabla_\parallel \psi|^2 + C_\perp |(\nabla_\perp - i q_s \delta\mathbf{n})\psi|^2 + F_N. \tag{C2.2.13}$$

Here A, C and E are phenomenological coefficients in the Landau expansion in terms of the smectic ordering; 
$C_\parallel$ and $C_\perp$ account for gradients of the smectic order parameter; the term in $C_\perp$ also allows for director fluctuations, 
$\delta\mathbf{n}$. The term $F_N$ is the elastic free-energy density of the nematic phase, given by equation (C2.2.9). In the smectic 
A phase itself, the amplitude of the density modulation is constant and twist and splay distortions are forbidden, 
thus the expression for the free energy density simplifies to equation (C2.2.10). 
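
The way the Landau coefficients control the order of the transition can be illustrated by minimizing the uniform (gradient-free) part of the expansion with respect to the amplitude of the smectic order parameter. The sketch below assumes the conventional form with prefactors one half, one quarter and one sixth for the quadratic, quartic and sextic terms; these prefactors, the function names and the restriction to a real, spatially uniform amplitude are assumptions made for illustration.

```python
import math

def landau_free_energy(psi, a, c, e):
    """Uniform Landau free energy (1/2) a psi^2 + (1/4) c psi^4 + (1/6) e psi^6."""
    return 0.5 * a * psi**2 + 0.25 * c * psi**4 + e * psi**6 / 6.0

def equilibrium_amplitude(a, c, e):
    """Amplitude psi >= 0 with the lowest free energy among stationary points.

    Requires e >= 0 (and c > 0 when e == 0) so that the energy is bounded below.
    """
    candidates = [0.0]
    if e == 0.0:
        if c > 0.0 and a < 0.0:
            candidates.append(math.sqrt(-a / c))
    else:
        disc = c * c - 4.0 * a * e
        if disc >= 0.0:
            # Stationary points satisfy a + c*x + e*x**2 = 0 with x = psi**2.
            for sign in (1.0, -1.0):
                x = (-c + sign * math.sqrt(disc)) / (2.0 * e)
                if x > 0.0:
                    candidates.append(math.sqrt(x))
    return min(candidates, key=lambda p: landau_free_energy(p, a, c, e))

if __name__ == "__main__":
    print(equilibrium_amplitude(-1.0, 1.0, 0.0))   # continuous case, ordered
    print(equilibrium_amplitude(1.0, 1.0, 1.0))    # disordered
    print(equilibrium_amplitude(0.1, -1.0, 1.0))   # negative quartic term
```

With a positive quartic coefficient the amplitude grows continuously from zero as the quadratic coefficient changes sign (second-order behaviour), whereas a negative quartic coefficient stabilized by the sextic term gives a finite amplitude while the quadratic coefficient is still positive (first-order behaviour).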

High-resolution heat capacity measurements showed that the exponent for the temperature dependence of the heat 
capacity followed the predictions of the 3D XY model [92, 93] in systems where the temperature range of the nematic phase was large [94]. 
High-resolution x-ray scattering experiments in the nematic phase close to the continuous transition to a smectic A 
phase provided definitive evidence that the transition belongs to the 3D XY universality class [90]. The critical 
exponents obtained for the growth of correlation lengths were in excellent agreement with the renormalization 
group theory predictions, and provide strong support for the 3D XY model [92, 93 ]. 

(B) SMECTIC A-SMECTIC C TRANSITION 

This transition is usually second order [18, 19 and 20]. The SmC phase differs from the SmA phase by a tilt of the 
director with respect to the layers. Thus, an appropriate order parameter contains the polar ($\theta$) and azimuthal ($\phi$) 
angles of the director: 

$$\psi(\mathbf{r}) = \theta(\mathbf{r})\, e^{i\phi(\mathbf{r})} \tag{C2.2.14}$$

Obviously $\theta = 0$ corresponds to the SmA phase. This transition is analogous to the normal-superfluid transition in 
liquid helium and the critical behaviour is described by the XY model. Further details can be found elsewhere [18, 
19 and 20]. 

(C) NAC POINT 

A point at which nematic, SmA and SmC phases meet was demonstrated experimentally in the 1970s [95, 96]. The 
NAC point is an interesting example of a multicritical point because lines of continuous transition between N and 
SmA phases, and SmA and SmC phases, meet the line of discontinuous transitions between the N and SmC phase. 
The latter transition is first order due to fluctuations of SmC order, which are continuously degenerate, being 
concentrated on two rings in reciprocal space rather than two points in the case of the N-SmA transition [18, 19 
and 20]. Because the NAC point corresponds to the meeting of lines of continuous and discontinuous transitions it 
is an example of a Lifshitz point (a precise definition of this critical point is provided in [18, 19 and 20]). The NAC 
point and associated transitions between the three phases are described by the Chen-Lubensky model [97], which is 
able to account for the topology of the experimental phase diagram. In the vicinity of the NAC point, universal 
behaviour is predicted and observed experimentally [20]. 

(D) FRUSTRATED SMECTICS 

Prost [21, 25, 98] showed that the properties and structure of frustrated smectic phases, as sketched in figure C2.2.5, 
can be described by two order parameters. These are the mass density order parameter $\rho(\mathbf{r})$ (equation (C2.2.1)) and 
the polarization order parameter P(r), which describes long-range correlations of dipoles. He then constructed a 
phenomenological Landau mean-field theory in which the free energy contains terms up to the quartic in these 
order parameters, their gradients and coupling terms. The number of terms in the free energy reflects the symmetry 
of the particular frustrated SmA and SmC phase under consideration. The theory has been comprehensively 
reviewed [18, 19, 20, 21 and 25]. 

(E) PHASE TRANSITIONS INVOLVING SMECTIC B PHASES 

The transition from smectic A to smectic B phase is characterized by the development of a sixfold modulation of 
density within the smectic layers ('hexatic' ordering), which can be seen from x-ray diffraction experiments where 
a sixfold symmetry of diffuse scattering appears. This sixfold symmetry reflects the bond orientational order. An 
appropriate order parameter to describe the SmA-SmB phase transition is then [18, 19 and 20] 

fa = ^e 6 ^ (C2.2.15) 

where § is the angle about the C 6 axis, and p 6 denotes a constant density. That such hexatic order could be created 
by dislocations was predicted by Halperin and Nelson and by Young [28, 99]. Again, high-resolution heat capacity 
measurements have also been useful in elucidating critical behaviour close to the transition in bulk samples [ 100 , 
101 ]. Calorimetry experiments on thin, freely suspended liquid crystal films have provided a great deal of 
information on the crossover between two- and three-dimensional behaviour at the SmA-HexB transition and they 
have confirmed that the transition is continuous [ 101 , 102 ]. 
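
The sixfold bond-orientational order described by equation (C2.2.15) can be illustrated numerically. The following is a minimal sketch, assuming that the order parameter is estimated as the average of exp(6iθ) over the in-plane bond angles θ to neighbouring molecules; the function name `psi6` and the sample angle sets are illustrative assumptions and are not taken from the cited references.

```python
import cmath
import math


def psi6(bond_angles):
    """Average of exp(6 i theta) over in-plane bond angles (in radians)."""
    if not bond_angles:
        raise ValueError("need at least one bond angle")
    return sum(cmath.exp(6j * theta) for theta in bond_angles) / len(bond_angles)


# Six neighbours on a perfect hexagon, rotated by an arbitrary 10 degrees.
offset = math.radians(10.0)
hexagon = [offset + k * math.pi / 3 for k in range(6)]

# Four neighbours on a square: evenly spread, but without sixfold symmetry.
square = [k * math.pi / 2 for k in range(4)]

hex_order = psi6(hexagon)
square_order = psi6(square)

# The modulus measures the degree of sixfold order; the phase is 6 times
# the orientation of the local hexagon.
print("hexagon modulus:", abs(hex_order))
print("hexagon orientation (deg):", math.degrees(cmath.phase(hex_order)) / 6)
print("square modulus:", abs(square_order))
```

In this sketch the modulus is close to unity for the hexagonal arrangement and vanishes for the square one, while the phase of the complex number records the orientation of the bonds about the C₆ axis.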

(F) PHASE TRANSITIONS IN DISCOTICS

McMillan's model [ 71 ] for transitions to and from the SmA phase ( section C2.2.3.2 ) has been extended to 
columnar liquid crystal phases formed by discotic molecules [36, 103 ]. An order parameter that couples 
translational order to orientational order is again added into a modified Maier-Saupe theory, which provides the
orientational order parameter. The coupling order parameter allows for the two-dimensional symmetry of the 
columnar phase. This theory is able to account for stable isotropic, discotic nematic and hexagonal columnar 
phases. 

Monte Carlo computer simulations of spheres sectioned into a 'disc' [ 104 , 105 ] show that steric interactions alone 
can produce a nematic phase of discotic molecules. Columnar phases are also observed [ 104 , 105 ]. 


C2.2.4 APPLICATIONS OF LIQUID CRYSTALS 


C2.2.4.1 NEMATIC LIQUID CRYSTAL DISPLAYS (LCDS) 

The anisotropy of liquid crystal molecules leads to a susceptibility to electric and magnetic fields. Such fields can 
be used to change the average orientation of molecules and this is the basis for liquid crystal displays. In fact, the 
basic physics underlying nematic LCDs was worked out by Freedericksz (a.k.a. Frederiks) in the 1930s [ 106 ]. It 
relies on the strong interactions of liquid crystal molecules with surfaces as well as their susceptibility to 
electromagnetic fields. Consider a nematic liquid crystal sandwiched in a thin film (about 10 µm thick) between
two pieces of glass that have been treated to produce preferential orientation at the surface. In nematic liquid crystal 
displays, molecules are oriented parallel to the glass using a thin layer of rubbed polyimide polymer. Rubbing 
produces microscopic grooves in the polymer and, hence, alignment of the mesogens. To remain in an undeformed 
state, the bulk of the sample also adopts this orientation, which is termed homogeneous (figure C2.2.12, left). Now
an electric or magnetic field is applied normal to the surface. In the bulk of the sample where the molecules are not 
pinned by the surface, the director tends to reorient in the direction of the field, as shown in figure C2.2.12 (right). 
If we compare with figure C2.2.11 , we can see that this deformation involves bend and splay of the director field. 
This field-induced transition in director orientation is called a Freedericksz transition [9, 106, 107]. We can also
define Freedericksz transitions when the director and field are both parallel to the surface, but mutually orthogonal,
or when the director is normal to the surface and the field is parallel to it. It turns out that there is a threshold voltage for
attaining reorientation in the middle of the liquid crystal cell, i.e. a deviation of the angle of the director [9, 107]. For
all three possible geometries, the threshold voltage takes the form [9, 107]


\[ V_c = \pi \sqrt{\frac{K}{\varepsilon_0 \, \Delta\varepsilon}} \qquad \text{(C2.2.16)} \]

Here K is either K₁, K₂ or K₃ depending on the geometry (e.g. K₁ in the case of figure C2.2.12), Δε is the
anisotropy in permittivity of the nematic liquid crystal and ε₀ is the permittivity of free space. If d is the thickness
of the cell, the corresponding threshold field is V_c/d. Note, however, that in equation (C2.2.16) the
threshold voltage, which is the relevant quantity for display operation, is independent of cell thickness.

Figure C2.2.12. A Freedericksz transition involving splay and bend. This is sometimes called a splay deformation,
but only becomes purely splay in the limit of infinitesimal displacements of the director from its initial position 
[ 106 ]. The other two Freedericksz geometries ('bend' and 'twist') are described in the text. 

This equation is derived by accounting for the energies of the electric field and of the distorted director fields. 
Further details are provided in [9] and [107]. It is also possible to obtain expressions for the switching time [107],
using the appropriate expressions for the hydrodynamics of nematic liquid crystals in an external field [108, 109].
Clearly, both threshold voltage and switching times are critical factors in the application of Freedericksz transitions 
in LCDs. Both can be tuned by varying the cell thickness and elastic constants. The latter can be varied by 
appropriate choice of molecule or, in commercial devices, mixtures of molecules. It should also be noted that 
equation (C2.2.16) provides a means of measuring the Frank elastic constant in each of the three Freedericksz 
geometries. 
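
The use of the threshold relation both to estimate a switching voltage and to extract an elastic constant can be sketched numerically. The example below assumes the SI form V_c = π√(K/(ε₀Δε)) for equation (C2.2.16); the function names and the values of K and Δε are illustrative assumptions of a typical order of magnitude, not data from the cited references.

```python
import math

EPSILON_0 = 8.8541878128e-12  # permittivity of free space, F/m


def threshold_voltage(elastic_constant, dielectric_anisotropy):
    """Freedericksz threshold V_c = pi * sqrt(K / (eps0 * delta_eps)), in volts."""
    if elastic_constant <= 0 or dielectric_anisotropy <= 0:
        raise ValueError("K and delta_eps must be positive")
    return math.pi * math.sqrt(elastic_constant / (EPSILON_0 * dielectric_anisotropy))


def elastic_constant_from_threshold(v_c, dielectric_anisotropy):
    """Invert the same relation to estimate K (in newtons) from a measured V_c."""
    return EPSILON_0 * dielectric_anisotropy * (v_c / math.pi) ** 2


# Assumed, illustrative material parameters (not from the text).
K_SPLAY = 6.0e-12   # N
DELTA_EPS = 10.0    # dimensionless

v_c = threshold_voltage(K_SPLAY, DELTA_EPS)
print(f"threshold voltage: {v_c:.2f} V")

# The cell thickness d does not enter V_c, but the threshold field V_c / d does.
for d in (5e-6, 10e-6):
    print(f"d = {d * 1e6:.0f} um: threshold field = {v_c / d:.3g} V/m")

# Round trip: recover the elastic constant from the computed threshold.
print(f"recovered K: {elastic_constant_from_threshold(v_c, DELTA_EPS):.2e} N")
```

With parameters of this order the threshold is somewhat below one volt, which is consistent with the requirement that such displays can be driven by batteries.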

C2.2.4.2 TWISTED NEMATIC (TN) AND SUPERTWISTED NEMATIC (STN) LCDS 

The first stable commercial liquid crystal display (LCD) device was the twisted nematic (TN) [110], still widely
used in watches and calculators. A TN display is sketched in figure C2.2.13 [9, 110, 111 and 112]. It relies on the
Freedericksz transition described in the preceding section. The cell consists of two glass plates coated with rubbed
polyimide to induce orientation of the director parallel to the surface. In addition, there is a thin layer of the
transparent conducting material, indium tin oxide. This is used to apply an electric field across the liquid crystal
sandwich, which is about 10 µm thick (controlled by spacers). The display also needs polarizers on the top and
bottom plates (actually these are the most expensive part of TN displays) with their polarization axes parallel to the 
rubbing direction. The bottom plate is twisted with respect to the top one, so that the surface-aligned director is 
rotated through 90° and the polarizers are crossed. This induces a twist to the nematic phase, hence the name for 
the device. In this state, as light passes through the cell, its polarization axis is guided through 90° so it is 
transmitted through the bottom plate. In a normal device, the light is then reflected from the back plate and passes 
through the device again. This produces a silver or grey state. However, when an electric field is applied across the 
cell, the director switches to orient parallel to the field in the middle of the cell. Then light passing through the cell 
does not have its polarization axis rotated and so cannot be transmitted through the bottom polarizer. The cell then 
looks dark in the 'on' state. This can be used to create dark characters against a light background, as in most LCDs. 

The reverse contrast, i.e. bright characters on dark, can be achieved by orientating one polarizer parallel to the 
rubbing direction and the other one perpendicular to it, so that the device is dark in the 'off' state. Backlit versions
of this device are used in car dashboard displays [111].

Figure C2.2.13. Principle of operation of a twisted nematic display. In the 'off state' the liquid crystal is in a
homogeneous orientation throughout the cell. The director is oriented by rubbing the glass plates, which are placed 
such that the rubbing directions are parallel to the polarization direction in the adjacent polarizer. These polarizers 
are crossed. Light entering in the cell is rotated by the director and can pass through the cell. With a backlight or 
reflector, the cell appears 'white' (actually usually grey). However, upon application of an electric field via the 
indium tin oxide (ITO)-coated glass plates, the cell is switched 'on'. The director undergoes a Freedericksz
transition to reorient parallel to the field and, thus, light entering the cell does not undergo rotation. With crossed 
polars the cell appears dark. This geometry describes displays with black pixels on a light background. 

Important considerations for construction of a liquid crystal display include the following [ 111 ]: 


[1] Operating conditions. The nematic phase must be stable over the temperature range for which the device is 
to function, usually -20 to 80°C. At the same time, the mesogens must be rugged, i.e. chemically stable and 
capable of being switched many times. The development of LCDs in the 1970s was driven by the discovery 
of the alkylcyanobiphenyl ( figure C2.2.1 ) class of mesogens that satisfy these requirements. No single 
compound, however, is able to provide the full desired operating temperature range, and in devices eutectic
mixtures are usually used.

[2] Threshold voltage. Batteries can only supply low voltages, so for portable appliances the switching voltage 
or threshold voltage must be sufficiently small. 

[3] Current drawn. LCDs usually draw small currents, which is one of their main advantages. 

[4] Sharpness. This describes the steepness of the electro-optical switching as a function of voltage. This is 
defined in terms of the ratio of voltages required to achieve 90% compared to 10% transmission of light. 
This ratio should be as close to unity as possible.

[5] Contrast. This can be quantified as the ratio of transmitted light intensity in the bright state compared to the dark
state.

[6] Viewing angle. The optical activity of the liquid crystal cell and polarizers depends strongly on viewing angle, leading to
degradation in image quality if not viewed straight on (as confirmed by viewing a calculator display at different angles).

[7] Switching speed. Typical switching times for TN devices are 20 to 50 ms, which is quite slow and has limited their use in
television displays.

Parameters [2]-[7] depend on the dielectric, mechanical and optical properties of the mesogens. To optimize a display, a
compromise between different molecular characteristics is often required and mixtures of liquid crystals are usually used in
commercial displays. 
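
The definitions of sharpness and contrast given in the list above can be made concrete with a short calculation. This is a minimal sketch, assuming a sampled electro-optic response curve that rises from 0 (unswitched) to 1 (fully switched) with voltage; the sample points, the intensities and the function names are hypothetical and serve only to illustrate the two ratios.

```python
def voltage_at(target, curve):
    """Voltage at which the optical response first reaches `target`.

    `curve` is a list of (voltage, response) pairs sorted by voltage, with the
    response rising from 0 (unswitched) to 1 (fully switched).
    """
    for (v0, r0), (v1, r1) in zip(curve, curve[1:]):
        if r0 <= target <= r1 and r1 > r0:
            # Linear interpolation inside the bracketing interval.
            return v0 + (target - r0) * (v1 - v0) / (r1 - r0)
    raise ValueError("target response is outside the sampled curve")


def sharpness(curve):
    """Ratio of the voltages giving 90% and 10% of the full response."""
    return voltage_at(0.9, curve) / voltage_at(0.1, curve)


def contrast(bright_intensity, dark_intensity):
    """Ratio of transmitted intensity in the bright and dark states."""
    return bright_intensity / dark_intensity


# Hypothetical response curve: (voltage in V, normalized response).
curve = [(1.0, 0.0), (1.5, 0.05), (2.0, 0.5), (2.5, 0.95), (3.0, 1.0)]

print(f"V10 = {voltage_at(0.1, curve):.3f} V")
print(f"V90 = {voltage_at(0.9, curve):.3f} V")
print(f"sharpness ratio = {sharpness(curve):.3f}")
print(f"contrast = {contrast(40.0, 2.0):.1f}")
```

A steeper response curve brings the sharpness ratio closer to unity, which is the desirable limit noted in item [4].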

The desire to improve sharpness and viewing angle range led to the development of supertwisted nematic (STN) displays. As the
name suggests, STN displays have higher twist angles than the TN display, typically 220-270°. They are widely used in
laptop computers.

Often a seven-segment array is sufficient for TN devices where numbers are displayed [113]. The limited number of segments to be
switched means that each can be addressed directly. However, dot matrix or VGA displays with large numbers of pixels are needed
to create alphabetical characters (not just in the Roman alphabet, but also more complex symbols such as those in C[…]

C2.2.4.3 THIN-FILM TRANSISTOR (TFT) LCDS 

Compared to STN displays, active matrix addressing in TFTs allows enhanced sharpness and greater multiplexing, because
each liquid crystal pixel is addressed by a transistor, which thus primarily governs the response of the device. TFT displays have a
greater number of pixels (higher resolution) and number of colour levels than STN devices. They are widely used in
computers, although they are more expensive than STN displays. Further details can be found elsewhere [115].

C2.2.4.4 FERROELECTRIC LCDS 

Ferroelectric LCDs have potential as very fast displays and also do not require active matrix addressing technology. Owing to their fast
response, they also have potential applications as high speed electro-optical shutters or spatial light modulators. However, due to
fabrication technology problems, they have yet to find extensive commercial application [112, 116, 117]. Thus, only an outline of
the principles of operation is included here. The method relies on orientating the mesogens in a ferroelectric SmC* […]
the surfaces of the cell, but with no preferred in-plane orientation. This produces a so-called surface-stabilized ferroelectric liquid
crystal (SSFLC) [112, 116, 117, 118 and 119] if the cell is sufficiently thin. In this so-called 'bookshelf' geometry the polarization
vector lies perpendicular to the plane of the cell in either the 'up' or 'down' states and can be reoriented in response to an applied
voltage. The reorientation of the polarization is coupled to molecular tilt and, hence, the optical axis, which can be used as the basis
of an optical switch if the change in tilt angle is sufficient. Each of the two orientation states has the
same energy and so the device is bistable indefinitely. Further, bistable switching of spontaneous polarization
occurs much faster than the polarization changes induced by reorientation of the director required in TN and STN 
display cells. However, there are a number of technical constraints that have to be overcome before SSFLC 
technology can be reliably used in displays: specifically, the cell has to be very thin (1 µm or less) and the
director alignment and smectic layer orientation have to be controlled very carefully over large areas.
Furthermore, the alignment in the optimal 'bookshelf' geometry can be destroyed by mechanical effects, e.g.
simply by pressure of a finger, due to the high viscosity of the smectic phases, which are 'soft solids'. In 
addition, there are problems with achieving a surface stabilized state, since other configurations are possible, 
particularly the so-called 'chevron' structure [ 120 ]. Further details are provided by Lagerwall [ 116 ]. Many of 
these problems can be solved and, indeed, prototype FLC displays have been demonstrated by several 
manufacturers [ 116 ]. 

C2.2.4.5 POLYMER DISPERSED LIQUID CRYSTAL (PDLC) DISPLAYS 

A display which does not need polarizers can be made by dispersing a nematic liquid crystal in a polymer — a so- 
called polymer-dispersed liquid crystal (PDLC). The liquid crystals form microdroplets which scatter light in the 
'off' state, but allow it to pass in the 'on' state, switched by an electric field. One application of
PDLCs is thus in switchable windows for privacy, since the display can be switched between opaque and
transparent states. For the device to function in this way, it is necessary for the director in the liquid crystal
droplet to be oriented in a tangential orientation, which leads to two poles (bipolar droplet [ 121 ]). In addition, the 
refractive index of the polymer must be equal to the index of refraction for light polarized perpendicular to the 
director. In the off state, the two poles are oriented at random, but in an electric field they orient along the 
direction of the field ( figure C2.2.14 ). In the off state, there is on average a difference in the refractive index of 
the liquid crystal droplets compared to the polymer, which leads to light scattering, whereas in the on state, the 
refractive indices are matched for light incident along the field direction and the display appears clear [9, 114 , 
121 ]. Thus, PDLCs are useful as switchable windows, e.g. for privacy or sunlight shading. 
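
The index-matching condition can be illustrated with the standard expression for the extraordinary refractive index of a uniaxial medium, n_eff(θ) = n_o n_e / √(n_e² cos²θ + n_o² sin²θ), where θ is the angle between the propagation direction and the director. This expression is not given in the text above, and the refractive indices used below are assumed, illustrative values; the polymer index is set equal to the ordinary index n_o, as the matching condition requires.

```python
import math


def effective_index(theta, n_o, n_e):
    """Extraordinary-ray index for light travelling at angle theta (radians)
    to the director of a uniaxial liquid crystal."""
    c, s = math.cos(theta), math.sin(theta)
    return n_o * n_e / math.sqrt((n_e * c) ** 2 + (n_o * s) ** 2)


# Assumed, illustrative refractive indices (not from the text).
N_O = 1.52        # ordinary index (light polarized perpendicular to director)
N_E = 1.74        # extraordinary index (light polarized along director)
N_POLYMER = N_O   # matching condition for the polymer matrix

for degrees in (0, 30, 60, 90):
    n_eff = effective_index(math.radians(degrees), N_O, N_E)
    print(f"{degrees:2d} deg: n_eff = {n_eff:.3f}, mismatch = {n_eff - N_POLYMER:+.3f}")
```

In the on state the droplet directors lie along the field, so light incident along the field corresponds to θ = 0 and sees no index mismatch. In the off state the droplet orientations are random, a range of θ is sampled, and the resulting mismatch gives rise to scattering.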

There are two basic methods of preparing PDLCs [ 121 ]. In the first, the liquid crystal is dispersed as an emulsion 
in an aqueous solution of a film-forming polymer (often polyvinyl alcohol) [121]. This emulsion is then coated
onto a conductive substrate and the emulsion is dried to form the dispersed liquid crystal-in-polymer film. In the 
second method, the phase separation of a polymer is exploited to disperse the liquid crystal [ 123 ]. In the method 
of polymerization-induced phase separation [ 121 ], polymerization is induced through the application of heat, 
light or radiation e.g. through cross-linking of a network. A commonly used example exploits the cross-linking of 
epoxy adhesives to form a solid structure containing phase-separated liquid crystal droplets. In thermally induced 
phase separation, the liquid crystal is mixed with a thermoplastic polymer at high temperatures. When the system 
is cooled, the liquid crystal phase separates from the solidifying polymer. In solvent-induced phase separation, a 
polymer and a liquid crystal are mixed to form a single-phase mixture in an organic solvent. Evaporation of the 
solvent then drives the phase separation of polymer and liquid crystal [121].

Figure C2.2.14. Principle of operation of a polymer-dispersed liquid crystal display. The contours of the liquid
crystal droplets in the polymer matrix correspond to the director orientation, which here is dipolar. In the off
state, the cell scatters light and appears opaque, due to refractive index variations between the liquid crystal and 
the polymer. However, when an electric field is applied and the liquid crystal directors reorient, the refractive 
index along the field is matched to that of the matrix, and the cell becomes clear. 

Further details of PDLCs can be found in the excellent monograph by Drzaic [ 121 ]. A review of the non-linear 
optical properties of PDLCs has also been presented [ 124 ]. 

C2.2.4.6 OTHER APPLICATIONS 

The main non-display applications of liquid crystals can be subdivided into two classes. The first exploits their 
anisotropic optical properties in spatial light modulators [ 125 ] or their non-linear optical properties (optical wave 
mixing etc) [48, 124, 125 ]. Spatial light modulators are usually based on the ferroelectric SmC* phase aligned in 
a thin film. Liquid crystal spatial light modulators find advanced applications such as the storage of computer- 
generated holograms [ 125 ]. The second class of non-display applications exploits temperature-dependent colour 
(thermochromic) changes in the chiral nematic phase [9, 126 ]. Chiral nematic phases can appear coloured due to 
scattering of light by the helical structure, which can have a pitch as small as 100 nm. The pitch 'unwinds' as 
temperature is decreased, usually as a smectic A phase is approached, leading to observable colour changes. 
These have been exploited in medical thermography, where heat variations across the body surface are mapped 
[ 126 , 127 ]. This is especially important in oncology. The technique has also found applications in engineering 
and aerodynamic research. Models of aircraft, for example, in wind tunnels can be coated with a chiral nematic 
liquid crystal. The flow of air over the model leads to heat variations (turbulent flow leading to 'cold spots') 
which can be visualized directly [ 126 ]. Gimmick applications of thermochromic liquid crystals include colour- 
changing clothes or beermats. 
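
The link between helical pitch and observed colour can be sketched with the standard selective-reflection relation for a chiral nematic viewed along the helix axis, λ₀ = n̄p, where n̄ is the mean refractive index and p the pitch. This relation is not stated in the text above; the mean index of 1.5, the pitch values and the approximate colour band edges are all assumptions made for illustration.

```python
# Approximate visible colour bands in nm (conventional, rounded edges).
VISIBLE_BANDS_NM = [
    (380, 450, "violet"),
    (450, 495, "blue"),
    (495, 570, "green"),
    (570, 590, "yellow"),
    (590, 620, "orange"),
    (620, 750, "red"),
]


def reflected_wavelength_nm(pitch_nm, mean_index=1.5):
    """Selectively reflected wavelength at normal incidence: lambda = n * p."""
    return mean_index * pitch_nm


def colour_band(wavelength_nm):
    """Name of the visible band containing the wavelength, if any."""
    for lower, upper, name in VISIBLE_BANDS_NM:
        if lower <= wavelength_nm < upper:
            return name
    return "outside visible"


# The pitch lengthens ('unwinds') on cooling, so the colour moves to the red.
for pitch in (100, 300, 350, 400, 450):
    wavelength = reflected_wavelength_nm(pitch)
    print(f"pitch {pitch} nm -> {wavelength:.0f} nm ({colour_band(wavelength)})")
```

Under these assumptions a pitch of 100 nm reflects in the ultraviolet, whereas pitches of a few hundred nanometres give colours that shift from blue towards red as the helix unwinds.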

C2.3 Micelles

John Texter 


C2.3.1 INTRODUCTION 

Surfactants are the primary molecular constituents of micelles. They are also called amphiphiles, and certain 
classes of surfactants are detergents. Surfactants are amphiphilic molecules having separate lyophilic or solvophilic 
(solvent-loving) groups and lyophobic or solvophobic (solvent-hating) groups (see section C2.3.3 ). Having both 
types of group makes a molecule amphiphilic and amphipathic. Micelles are the most prevalent aggregate structure 
in surfactant solutions and form over a narrow range in surfactant concentration centred around the critical micelle 
concentration, cmc (see section C2.3.4. ). This process of micellization is taken, in the so-called pseudophase 
approximation, as the formation of a two-phase system comprising the continuous solvent phase and a pseudophase 
consisting of the oily micellar cores. This approximation is convenient for many purposes, but it must not be taken 
literally, as micellar solutions are generally single-phase isotropic solutions and micelles are thermodynamically 
reversible aggregates, in dynamic equilibrium with surfactant 'monomers' in solution. 
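
The pseudophase approximation can be expressed as a simple bookkeeping rule: below the cmc all surfactant is present as monomer, and above it the monomer concentration stays at the cmc while the excess is counted as micellized. The sketch below is an idealization in exactly the sense cautioned against above, since real micellization occurs over a narrow but finite concentration range; the function name and the cmc value of 8 mM are illustrative assumptions.

```python
def pseudophase_split(total_conc, cmc):
    """Return (monomer, micellized) concentrations in the pseudophase picture.

    Both arguments must use the same concentration units.
    """
    if total_conc < 0 or cmc <= 0:
        raise ValueError("concentrations must be non-negative and cmc positive")
    monomer = min(total_conc, cmc)
    return monomer, total_conc - monomer


CMC_MM = 8.0  # assumed, illustrative critical micelle concentration in mM

for total in (2.0, 8.0, 20.0, 50.0):
    monomer, micellized = pseudophase_split(total, CMC_MM)
    print(f"total {total:5.1f} mM: monomer {monomer:4.1f} mM, micellized {micellized:5.1f} mM")
```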

Idealized structures of some normal and reverse (inverse) micelles are illustrated in figure C2.3.1. Surfactants
self-assemble (aggregate) in order to lower the solution free energy. This process involves the creation of an interface
separating the solvent (aqueous) phase from the solvophobic (hydrophobic) portions of the surfactant. When a 
solvophilic headgroup interacts attractively with a solvent, the solvophobic portions aggregate to form an oily 
nanodroplet of interpenetrating tail portions that is separated from the solvent by the headgroups. The headgroups 
serve to define a boundary between the solvent and solvophobic portion. Whether headgroups are charged or not,
their packing is influenced by repulsion forces between headgroups and solvation interactions with the solvent.
Such aggregates define 'normal' micelles. When the surfactant headgroup is not solvated by the solvent, the 
headgroups tend to aggregate together, forming an internal structure separated from the solvent by an interface 
defined by the tailgroups. Such micelles are called inverse micelles or reverse micelles. 




Figure C2.3.1. Idealized cross-sections of normal spherical micelles (left) and of reverse micelles (right). 

C2.3.2 HISTORICAL OVERVIEW

Aqueous surfactant aggregation to form micelles appears to have first been suggested by McBain in 1913 [1]. A 
spherical model for micelle structure was put forward by Hartley [2, 3] where the hydrophilic headgroups were 
pictured to lie upon a roughly spherical surface, the internal volume was filled with the tailgroups and an electrical 
double layer developed radially out from the headgroups. The tailgroups form an oily nanodroplet that is covered 
with a polar shell. This early and admittedly idealized model was criticized on various grounds [4], although
Hartley pointedly noted that the assumption of perfectly spherical symmetry was made for 'mathematical
simplicity'. Menger proposed [4] taking more detailed account of the filling of the internal volume by the
tailgroups and he explicitly introduced the use of gauche kinks in the hydrocarbon chains to help fill the requisite 
space. 


This more careful attention to space filling was also addressed by Fromherz [5] and by Dill and Flory [6]. 
Fromherz showed that it was possible to produce space filling and nominally spherical micellar structures, with 
rough headgroup regions, while still allowing straight chain surfactants to aggregate with extended tailgroups, but 
allowing gauche defects to modify the orientation of headgroups. This model, along with that of Menger and with 
that of Dill and Flory, resulted in rough headgroup regions and significant contacts between the solvent and regions 
of the tailgroups. These approaches were put on a statistical dynamical basis by Dill and Flory using a lattice model 
illustrated in figure C2.3.2. In this lattice model the surfactant is imagined to be composed of a headgroup and 
tailgroup chain segments. The headgroup and each chain segment occupy a single lattice site, as illustrated in figure
C2.3.2 . The lattice comprises constant radial interlayer spacing and sites of equal volume. This approach set the 
stage for advances in structural modelling using the modern computational tools of statistical mechanics (see 
section C2.3.7 ). 



Figure C2.3.2. Two-dimensional radial lattice representation of micelle structure using the approach of Dill and 
Flory [6]. Each lattice site is considered to be equal in volume to the others. Reproduced by permission from [6]. 

C2.3.3 SURFACTANTS

Surfactants are generally pictured as having a lyophilic (hydrophilic) headgroup of some type and a hydrocarbon 
tailgroup. The word surfactant derives as a contraction of the phrase surface active agent. They can also be 
exemplified by simple species such as short chain alcohols and amides that are usually surface active, but do not 
necessarily lower interfacial tensions significantly. It is difficult to draw clear boundaries with respect to molecular 
structure for defining what constitutes a surfactant, so we adopt an operational definition. A surfactant is a surface 
active amphiphile that aggregates in water or other solvent to form microstructures such as micelles and bilayers or 
segregates at interfaces to form monolayer assemblies. This definition is inclusive of many important polymeric 
dispersants, such as moderate molecular weight block copolymeric surfactants. 

C2.3.3.1 CLASSIFICATION OF SURFACTANTS 

Schemes for classifying surfactants are based upon physical properties or upon functionality. Charge is the most 
prevalent physical property used in classifying surfactants. Surfactants are charged or uncharged, ionic or nonionic. 
Charged surfactants are further classified as to whether the amphipathic portion is anionic, cationic or zwitterionic. 
Another physical classification scheme is based upon overall size and molecular weight. Copolymeric nonionic 
surfactants may reach sizes corresponding to 10 000-20 000 Daltons. Physical state is another important physical 
property, as surfactants may be obtained as crystalline solids, amorphous pastes or liquids under standard 
conditions. The number of tailgroups in a surfactant has recently become an important parameter. Many surfactants 
have either one or two hydrocarbon tailgroups, and recent advances in surfactant science include even more 
complex assemblies [7, 8 and 9]. 

Surfactants derive their general classification from their surface activity and tendency to preferentially segregate at 
liquid-gas, liquid-liquid and liquid-solid interfaces. They always contain at least two functional parts, a lyophilic 
or solvophilic part that preferentially solvates in the solvent and a lyophobic or solvophobic part that is poorly 
solvated in the same solvent. Solvophilicity (hydrophilicity in the case of water) is imparted by functional groups 
that have high solvent affinity. Carboxylates, sulphates and sulphonates are examples of charged functional groups 
that have relatively high affinity for water as a solvent. Uncharged hydrophilic groups may include almost any 
uncharged polar group. Hydroxyl and ethylene oxide groups are the most prevalent. 

Solvophobicity (hydrophobicity with respect to water) is most often exemplified as a linear or branched 
hydrocarbon chain. Fluorocarbon chains and siloxane chains are also hydrophobic. Many commercially important
surfactants have more complex solvophobic groups such as substituted phenyl and naphthyl ring systems. The
number of solvophobic tailgroups (single tail, double tail) in a surfactant is an important parameter because it 
dramatically affects solubility and surfactant packing in micellar aggregates (see section C2.3.6 ). 

The majority of practical micellar systems of 'normal' micelles use water as the main solvent. Reverse micelles use 
water immiscible organic solvents, although the cores of reverse micelles are usually hydrated and may contain 
considerable quantities of water. Polar solvents such as glycerol, ethylene glycol, formamide and hydrazine are 
now being used instead of water to support 'regular' micelles [10]. Critical fluids such as critical carbon dioxide are
also being exploited for various micellar applications. Copolymeric surfactants of polystyrene and
poly(1,1-dihydroperfluorooctyl acrylate) have been developed [11] to form polydisperse micelles in critical carbon dioxide
and to solubilize polystyrene oligomers in such micellar solutions.

Charged surfactants impart electrostatic forces and ion-pairing interactions when they are aggregated as micelles. 
Charged surfactant solubility is greatly affected by counter-ions and the binding affinity of counter-ions. For 
example, lithium surfactant salts are much more hygroscopic than sodium salts and tend to have greater aqueous 
solubility. In such cases counter-ion charge density has a dramatic effect on binding affinity and, hence, on 
solubility. This ion-ion association also affects the formation and structure of micelles, since headgroup repulsion 
interactions are dramatically affected by counter-ion binding and the screening of repulsive electrostatic forces by 
counter-ions. An extensive practical listing of surfactants, as well as a key to their synthesis, may be found in an 
exhaustive review [12]. 

(A) ANIONIC 

A selection of important anionic surfactants is displayed in table C2.3.1 . Carboxylic acid salts or the soaps are the 
best known anionic surfactants. These materials were originally derived from animal fats by saponification. The 
ionized carboxyl group provides the anionic charge. Examples with hydrocarbon chains of fewer than ten carbon 
atoms are too soluble and those with chains longer than 20 carbon atoms are too insoluble to be useful in aqueous 
applications. They may be prepared with cations other than sodium. 

The blocking and elimination of the carboxyl group and replacement of anionic charge by sulphate and sulphonate 
substitution provided a revolution in detergency. Detergents such as sodium dodecylbenzene sulphonate (SDBS) 
have replaced soaps as laundry cleansing agents because of their efficacy and low cost. Such sulpho compounds 
may be readily derived from many natural products and synthetic precursors. Alkyl sulphates, such as sodium 
dodecylsulphate (SDS), alkyl ether sulphates, alkyl sulphonates, secondary alkyl sulphonates, aryl sulphonates such 
as alkylbenzene sulphonates, methylester sulphonates, α-olefin sulphonates and sulphonates of alkylsuccinates are
important classes of anionic surfactants. Fatty acids and sulpho compounds illustrate three important anionic
groups, carboxylate (-CO₂⁻), sulphate (-OSO₃⁻) and sulphonate (-SO₃⁻). Phosphates such as mono- (-P(OH)O₂⁻) and
dianions (-PO₃²⁻) are also important. These dianions are basic and initially protonate in the neutral to slightly
alkaline range, but they remain negatively charged to relatively low pH. Various data [13] have led to the following
rank ordering of these groups with respect to their relative hydrophilicity:

\[ -\mathrm{CO_2^-} \gg -\mathrm{SO_3^-} > -\mathrm{OSO_3^-} > -\mathrm{P(OH)O_2^-} \qquad \text{(C2.3.1)} \]

The negative charge density is greatly affected by the group size. The carboxylate group is the smallest, attains the 
highest charge density and is the most hydrophilic in this series. However, it generally protonates in the pH 4-5 
range, so its range of usefulness is very sensitive to pH. This follows also for the phosphates, but the sulphates and 
sulphonates generally remain charged, even to pH as low as 1 . 
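
The pH sensitivity of the carboxylate headgroup can be illustrated with the Henderson-Hasselbalch relation, which gives the ionized fraction of a weak acid as 1/(1 + 10^(pKa - pH)). This is a minimal sketch, assuming an effective pKa of 4.8 for the carboxyl group (a value chosen to fall in the pH 4-5 range quoted above, not taken from the text); it ignores any shift of the apparent pKa at a micellar surface.

```python
def fraction_ionized(ph, pka):
    """Fraction of an acid group present in its anionic (deprotonated) form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


CARBOXYL_PKA = 4.8  # assumed, illustrative value for a fatty acid headgroup

for ph in (1.0, 3.0, 4.0, 4.8, 6.0, 7.0, 9.0):
    charged = fraction_ionized(ph, CARBOXYL_PKA)
    print(f"pH {ph:3.1f}: {100 * charged:6.2f}% of carboxyl groups ionized")
```

Under this assumption the headgroup is almost fully charged at neutral pH but essentially uncharged at pH 1, where sulphates and sulphonates would still carry their negative charge.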

Table C2.3.1 Structures of key anionic surfactants.

Sodium dodecylsulphate, SDS: CH₃(CH₂)₁₁OSO₃⁻ Na⁺
Sodium alkylsulphates: R-CH₂OSO₃⁻ Na⁺
Sodium dodecylbenzene sulphonate, SDBS
Sodium alkylbenzenesulphonates
Sodium bis(2-ethylhexyl) sulphosuccinate, AOT
Sodium succinate esters
Sodium di-, tri-isopropyl-naphthalene sulphonate, DTINS
α-sulpho fatty acid methyl esters
Dialkylphosphates


(B) CATIONIC

Most cationic surfactants derive from the quaternization of nitrogen and most of the key cationic surfactant
classes are illustrated in table C2.3.2. Alkylammonium halides such as dodecylammonium bromide, for example,
are good hydrogen bond donors and interact strongly with water. They can also often give up a proton and are then
transformed into a nonionic surfactant. Chemical blocking of this hydrogen bond donating capability by full 
alkylation to yield the tetra-alkylammonium group results in cations that interact relatively weakly with halide 
counterions but strongly with organic anions.

Table C2.3.2 Structures of key cationic surfactants.

Alkylammonium halides: R-NH₃⁺ X⁻
Hexadecyltrimethylammonium bromide, CTAB: CH₃(CH₂)₁₄CH₂N(CH₃)₃⁺ Br⁻
Alkyltrimethylammonium halides: R-N(CH₃)₃⁺ X⁻
Dialkyldimethylammonium bromides: R₂N(CH₃)₂⁺ Br⁻
1,1'-Dialkyl-4,4'-bipyridine chloride
(N-Alkyl-2,2'-bipyridine)-bis(2,2'-bipyridine) ruthenium perchlorate
Dodecylpyridinium bromide



A wide class of aryl-based quaternary surfactants derives from heterocycles such as pyridine and quinoline. The N- 
alkyl pyridinium halides are easily synthesized from alkyl halides, and the paraquat family, based upon the 4, 4'- 
bipyridine species, provides many interesting surface active species widely studied in electron donor-acceptor 
processes. Cationic surfactants are not particularly useful as cleansing agents, but they play a widespread role as 
charge control (antistatic) agents in detergency and in many coating and thin film related products. 

(C) ZWITTERIONIC AND AMPHOTERIC 

α-Amino acids in the isoelectric pH range are true zwitterions and result from apparent intramolecular proton
transfer. This class and other related surfactants are depicted in table C2.3.3. The term zwitterionic surfactant is
now taken to mean just about any combination of an anionic and a cationic group in a single amphiphilic molecule,
whether or not either or both of these groups may be neutralized at some pH. In this sense zwitterionic is taken as
synonymous with amphoteric. Often such species contain only one readily ionizable (or neutralizable) group. When
such a group is the carboxyl group, the expected changes in charge and physical properties with pH must be borne
in mind. Examples include trialkylammonioalkanoates, where the quaternary nitrogen and carboxylate are
separated by the alkanoate carbon chain and wherein the nitrogen quaternizes with the ω-carbon of the alkanoate.
The relative hydrophilicities of carboxy, sulphonate and sulphate ammonio zwitterionics decrease in the following
order:

$$-\mathrm{CO_2^-} \gg -\mathrm{SO_3^-} > -\mathrm{OSO_3^-} \qquad \text{(C2.3.2)}$$


This ordering is the same as discussed above for anionic surfactants and follows from the charge density of the 
particular acid group. 

Table C2.3.3 Structures of key zwitterionic and amphipathic surfactants.

Alkylamino acids
Alkylbetaines
Amidoalkylbetaines
Alkylsulphobetaines
Imidazolium betaines
2,3-Dimethyl-3-dodecyl-1,2,4-triazolium-5-thiolate
Dialkanoyl lecithins

[Structure diagrams are not reproduced.]


The various forms of betaines are very important for their charge control functions in diverse applications and 
include alkylbetaines, amidoalkylbetaines and heterocyclic betaines such as imidazolium betaines. Some 
surfactants can only be represented as resonance forms having formal charge separation, although the actual atoms 
bearing the formal charge are not functionally ionizable. Such species are mesoionic and an example of a 
triazolium thiolate is illustrated in table C2.3.3.

(D) NONIONIC


A selection of nonionic surfactant structures is listed in table C2.3.4. Many of these are analogous to some of the
anionic and cationic surfactants, with the exception that the headgroup is uncharged. Steric and osmotic forces
rather than electrostatic forces control interactions between nonionic surfactant headgroups. Oligomers of ethylene
oxide have become the most important class of headgroup in nonionic surfactants and the alkanol
polyethyleneglycol ethers, $\mathrm{C}_m\mathrm{E}_n$, have become the most widely studied class of nonionics. The ethylene oxide
groups are generally believed to hydrate with about three water molecules. A related class is that of the alkylphenol
ethoxylates. Triton X-100 (TX100) is the best known member of this class.

Table C2.3.4 Structures of key nonionic surfactants.

Alkanol polyethyleneglycol ethers, $\mathrm{C}_m\mathrm{E}_n$
Polyoxyethylene p-t-octyl phenol, TX100
Alkylphenol polyethyleneglycol ethers
Fatty acid alkanol amides
Alkyl amides
Diblock copolymers of ethylene oxide and propylene oxide
Triblock copolymers of ethylene oxide and propylene oxide
Sorbitan esters
3-sn-Dipalmitoylphosphatidyl-1'-sn-ethanolamine

[Structure diagrams are not reproduced.]


The fastest growing class of block copolymeric nonionic surfactants derives from oligomeric ethylene oxide (EO)
and from oligomeric propylene oxide (PO). They are obtained as AB diblocks and as ABA and BAB triblocks,
where A denotes a hydrophilic block (such as EO) and B denotes a hydrophobic block (such as PO).
Poly(butylene oxide) and polystyrene oligomers are also available for the more hydrophobic B blocks. These
surfactants are now widely available commercially. An excellent review of their properties in aqueous solution is 
given by Alexandridis and Hatton [14]. Other nonionics include alkanol amides such as ethanolamides and 
diethanolamides, alkylamides and amine ethoxylates. Lecithin and other glycerol-based nonionics are 
physiologically important and are finding widespread pharmaceutical applications. 

C2.3.3.2 PHYSICAL STATE OF SURFACTANTS 

The physical state of surfactants affects how easily micellar solutions may be prepared. At room temperature 
surfactants are usually obtained as amorphous or crystalline solids, but increasing numbers of liquid surfactants are 
being derived, particularly as nonionics. The amorphous solid physical state is often obtained as a waxy paste. The 
surfactants in such pastes are usually in a liquid crystalline packing mode. Pure ionic surfactants of moderate 
formula weight (<500) are often obtained as crystalline powders. Published crystal structures suggest factors 
important to understanding how surfactants pack in micelles and in planar assemblies such as monolayers and 
bilayers. These factors include chain tilting and layering. 


Molecular packing of surfactants in crystals has been reviewed at some length [15]. An almost universal factor
observed, at least for surfactants having long alkyl tailgroups, is that surfactants intrinsically pack in tail-to-tail or
head-to-head bilayers in the absence of excess solvent. This factor makes it easy to understand the genesis of 
bilayer structures encountered in various surfactant mesophases, such as lamellar mesophases. Such bilayer 
packing typically allows for the compartmentalized separation of hydrophobic and hydrophilic domains in the 
crystal, wherein the hydrophobic tailgroups pack among themselves and the headgroups, often with solvent or 
water of crystallization, define a distinct hydrophilic or polar region. The tailgroups may pack end to end, as 
observed in a monohydrate of SDS [16], or they may pack in an interdigitated array such as in dodecylammonium 
bromide [17]. 

The extent of headgroup hydration provides relative control of the effective headgroup size and the tilt angles of 
hydrocarbon chains, relative to the normal, defined by the plane of the headgroups. As the effective headgroup size 
increases with increasing hydration this size must be accommodated by a concomitant amount of chain tilting. If 
the headgroup is small, little chain tilting is required in order to accommodate close packing of all of the molecules. 
As the headgroup size increases, the chains tilt to more densely fill space. 

The coupling between headgroup size and chain tilting can be illustrated for the case of SDS. The crystal structure of
SDS monohydrate is illustrated in figure C2.3.3 [18]. The packing is of a bilayer type and there is distinct chain tilting of
about 40° relative to the ab plane of the headgroups. This tilting results in a bilayer thickness of about 28.9 Å and a
headgroup (projected) area of 29.5 Å$^2$. In a more anhydrous SDS polymorph [16] containing only four water
molecules per 32 SDS, distinctly less tilting appears. A tilt angle of about 21° relative to the plane of the
headgroups results from the much lower solvation, smaller headgroup projected area, 20.9 Å$^2$, and a much
increased bilayer thickness, 38.9 Å. Such dramatic variations in molecular packing with changing headgroup
hydration and effective headgroup size underscore the importance of factors that control headgroup repulsion,
such as counterion binding affinity and solvation, on packing in micelles.

Figure C2.3.3. Molecular packing of SDS monohydrate viewed as projected on the ac plane. This polymorph
crystallizes in a triclinic cell with unit cell constants a, b and c of 10.423 Å, 5.662 Å and 28.913 Å, respectively,
and with α = 86.70°, β = 93.44°, γ = 89.55°. There are four molecules per unit cell. Adapted from figure 2 of [18].


C2.3.4 EXPERIMENTAL METHODS FOR EXAMINING MICELLES AND 
MICELLIZATION 


The concentration at which micellization commences is called the critical micelle concentration, cmc. Any
experimental technique sensitive to a solution property modified by micellization, or to some probe
(molecule or ion) property modified by micellization, is generally adequate for estimating the onset of
micellization quantitatively. The cmc is usually determined by plotting the experimentally measured property or
response as a function of the logarithm of the surfactant concentration. The cmc is then taken as the intersection of
asymptotes fitted to the experimental data, or as a breakpoint in the data. A partial listing of experimental
techniques used to determine cmc is given in table C2.3.5. Foremost among these methods is surface tension
measurement, typically at the air-water interface. An example of such a measurement is illustrated in figure C2.3.4
for the case of an equal weight mixture of sodium diisopropyl naphthalene sulphonate and sodium triisopropyl
naphthalene sulphonate. The measurements illustrated were obtained [19] by the Wilhelmy plate technique. There
are numerous other methods for measuring surface tensions, such as capillary rise, maximum bubble pressure, drop
weight and pendant drop techniques [20]. Mention should be made of the largest compendium of cmc data,
assembled by Mukerjee and Mysels [21], and of an excellent updating compendium of van Os et al [22].
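
The asymptote-intersection construction can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used in [19]: the data are synthetic, the function names are hypothetical, and the number of points assigned to the pre-micellar branch (`n_low`) is chosen by hand, which is an assumption of the sketch.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope*x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def cmc_from_asymptotes(conc, response, n_low):
    """Estimate the cmc as the intersection of two fitted straight lines.

    conc     -- surfactant concentrations in ascending order
    response -- measured property at each concentration
    n_low    -- number of points assigned to the pre-micellar branch
                (picked by eye here; an assumption of this sketch)
    """
    logc = [math.log10(c) for c in conc]
    m1, b1 = fit_line(logc[:n_low], response[:n_low])
    m2, b2 = fit_line(logc[n_low:], response[n_low:])
    # The two lines cross where m1*x + b1 == m2*x + b2.
    return 10 ** ((b2 - b1) / (m1 - m2))

def synthetic_tension(c, cmc=3.0):
    """Hypothetical, noise-free response with a break at `cmc` (mM)."""
    x = math.log10(c / cmc)
    return 40.0 + (-25.0 if c < cmc else -1.0) * x

conc = [0.3, 0.6, 1.0, 2.0, 4.0, 6.0, 10.0, 20.0]   # mM, made-up grid
gamma = [synthetic_tension(c) for c in conc]
cmc_est = cmc_from_asymptotes(conc, gamma, n_low=4)
print(f"estimated cmc = {cmc_est:.2f} mM")
```

With real data the result depends on which points are assigned to each branch, so it is worth repeating the fit with neighbouring choices of `n_low` to see how stable the estimate is.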


Table C2.3.5 Survey of techniques and observables for determining cmc.

Density
Diffusion coefficient
Dye decomposition kinetics
Electromotive force
Conductance
ESR probe
Flocculation rate
Foaming power
Freezing point
Heat of dilution
Light scattering
NMR
Neutron scattering
Optical probe
Partial volume
Polarographic maximum
Potentiometry
Refractive index
Solubilization
Solubilization rate
Specific heat
Streaming current
Surface tension
Taylor diffusion
Turbidimetry
Turbidimetric solubilization
Ultracentrifugation
Ultrafiltration
Vapour pressure lowering
Velocity of sound
Viscosity
Voltammetry of electroactive probe
Wien effect
X-ray scattering


Micellization is a second-order or continuous type phase transition. Therefore, one observes continuous changes 
over the course of micelle formation. Many experimental techniques are particularly well suited for examining 
properties of micelles and micellar solutions. Important micellar properties include micelle size and aggregation 
number, self-diffusion coefficient, molecular packing of surfactant in the micelle, extent of surfactant ionization 
and counterion binding affinity, micelle collision rates, and many others.

Figure C2.3.4. Relative surface tension of DTINS at 25 °C. The intersection of the dotted lines indicates a cmc of
3.0 mM. Reproduced by permission from figure 1 of [20].

C2.3.4.1 CRITICAL MICELLE CONCENTRATION 

Measured cmcs range from large concentrations, such as approximately 0.6 M for sodium octanoate, to
concentrations many orders of magnitude lower for amphiphiles of very low solubility. The cmcs are significantly
affected by a variety of physical properties associated with the hydrophilic headgroup, the hydrophobic portion
(tailgroup) and the means by which these two groups are connected. Various aspects have already been discussed as
affecting surfactant classification. The number of headgroups, the number of tailgroups and the connectivity of these
groups can vary widely. Several linear free energy relationships between cmc and these molecular properties are
discussed below and more rigorous thermodynamic modelling of micellization is presented in section C2.3.5. These
linear free energy relationships are based upon the thermodynamics of hydrophobic chain transfer from oil to water,
solvent-hydrocarbon chain contacts, hydrocarbon chain packing and interactions between headgroups.

(A) CMC OF HOMOLOGOUS SERIES 

Such linear free energy relationships are available for alkyl sulphates and for the C4 to C9 homologues of the 
dialkanoyl lecithins (see table C2.3.3 for structure). Most of the naturally occurring phospholipids are too insoluble 
to form micelles, but the lower alkanoyl lecithins, also known as phosphatidylcholines, do form micelles. The cmcs 
for these homologues are listed in table C2.3.6. The approximately linear free energy relationship between the alkyl
chain length and log cmc is given by:

$$\log \mathrm{cmc} = a - bn. \qquad \text{(C2.3.3)}$$


The alkyl chain length is n, and a and b are fitting constants. The constant a usually varies with headgroup type and
typically is 3.5 to 10. The b parameter is usually fairly uniform among different homologous alkyl chain
surfactants. Values for b of about 0.5 are often obtained for single-chain nonionic surfactants and values of about
0.3 are obtained for single-chain ionic surfactants. For this series of alkyl sulphates, parameters a = 4.39 and b =
0.29 were obtained, and for this lecithin series, a = 5.77 and b = 0.85. Studies suggest that this b parameter is
proportional to the hydrophobic interaction energy of micelle formation.
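
As a check on equation (C2.3.3), the quoted parameters can be recovered by a least-squares fit to the cmc values of table C2.3.6. The sketch below assumes base-10 logarithms, cmc expressed in mM, and n counted as the number of carbons per chain; these conventions are not stated explicitly in the text, and the function name is hypothetical.

```python
import math

def fit_log_cmc(chain_lengths, cmcs_mM):
    """Least-squares fit of log10(cmc) = a - b*n; returns (a, b)."""
    ys = [math.log10(c) for c in cmcs_mM]
    k = len(chain_lengths)
    mx = sum(chain_lengths) / k
    my = sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in chain_lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(chain_lengths, ys))
    slope = sxy / sxx
    return my - slope * mx, -slope

# cmc values in mM, keyed by chain length n (from table C2.3.6)
sulphates = {6: 420, 7: 220, 8: 130, 9: 60, 11: 16, 12: 8.2, 14: 2.05}
lecithins = {4: 80, 6: 14.6, 7: 1.42, 8: 0.265, 9: 0.00287}

a_s, b_s = fit_log_cmc(list(sulphates), list(sulphates.values()))
a_l, b_l = fit_log_cmc(list(lecithins), list(lecithins.values()))
print(f"alkyl sulphates:      a = {a_s:.2f}, b = {b_s:.2f}")
print(f"dialkanoyl lecithins: a = {a_l:.2f}, b = {b_l:.2f}")
```

Under these assumptions the fitted constants agree with the values quoted above to within rounding, which suggests the quoted parameters refer to cmc in mM.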

Table C2.3.6 Cmc of homologous alkyl sulphates and dialkanoyl lecithins.

Homologue                      cmc (mM)
Sodium hexylsulphate           420
Sodium heptylsulphate          220
Sodium octylsulphate           130
Sodium nonylsulphate           60
Sodium undecylsulphate         16
Sodium dodecylsulphate         8.2
Sodium tetradecylsulphate      2.05
Dibutanoyl lecithin            80
Dihexanoyl lecithin            14.6
Diheptanoyl lecithin           1.42
Dioctanoyl lecithin            0.265
Dinonanoyl lecithin            0.002 87

(B) SALT EFFECTS


The cmcs of ionic surfactants are usually depressed by the addition of inert salts. Electrostatic repulsion between 
headgroups is screened by the added electrolyte. This screening effectively makes the surfactants more 
hydrophobic and this increased hydrophobicity induces micellization at lower concentrations. A linear free energy 
relationship expressing such a salt effect is given by: 

$$\log \mathrm{cmc} = (\log \mathrm{cmc})_0 - k_c c_i. \qquad \text{(C2.3.4)}$$

Here $(\log \mathrm{cmc})_0$ is the log cmc in the absence of added electrolyte, $k_c$ is related to the degree of counterion binding
and electrostatic screening and $c_i$ is the ionic strength (concentration) of inert electrolyte. Effects of added salt on
cmc are illustrated in table C2.3.7.

Table C2.3.7 Salt effects on cmc.

Surfactant/salt                            [salt] (M)    cmc (mM)
Sodium dodecylsulphate/NaCl                0             8.2
                                           0.1           1.4
                                           0.2           0.83
                                           0.4           0.52
Dodecylpyridinium bromide/KBr              0             12
                                           0.02          7.25
                                           0.05          4.70
                                           0.1           2.74
Dodecyltrimethylammonium bromide/NaBr      0             14.8
                                           0.0175        10.4
                                           0.05          7.0
                                           0.1           4.65

(C) SOLUBILITY

Headgroups and tailgroups strongly affect solubility. Important surfactants, such as the phospholipids that
make up significant fractions of cellular membranes, are nearly insoluble in water. Such insoluble and double-tail
surfactants typically do not form micelles and generally prefer to pack as bilayers. A very important double-tail
surfactant, AOT (see table C2.3.1 for structure), is soluble in water, but is much more soluble in many organic
solvents. Such surfactants having high solubility in oils form the basis of reverse micellar (see section C2.3.8) and
reverse microemulsion (see section C2.3.11) technology.

A quantitative treatment of surfactant solubility has been successfully made empirically using linear free energy
relationships. An important relation is that for the linear free energy of transfer of alkanes to water [23]:

$$RT \ln S = aA. \qquad \text{(C2.3.5)}$$

In this relationship S is alkane solubility, A is the cavity surface area and a is the hydrophobic free energy per unit
area. Extensive fitting of this equation [24] yields a value of 88 kJ mol$^{-1}$ Å$^{-2}$ for the proportionality constant a.
This value corresponds to an unfavourable free energy of about 3.6 kJ mol$^{-1}$ for the transfer of a CH$_2$ group to
aqueous solution.

(D) KRAFFT POINT

The Krafft point ($T_K$) is the temperature at which the cmc of a surfactant equals the solubility. This is an important
point in a temperature-solubility phase diagram. Below $T_K$ the surfactant cannot form micelles. Above $T_K$ the
solubility increases with increasing temperature due to micelle formation. $T_K$ has been shown to follow linear
empirical relationships for ionic and nonionic surfactants. One found [25] to apply for various ionic surfactants is:

$$T_K = an + b \qquad \text{(C2.3.6)}$$

where n is hydrocarbon chain length and a and b are fitting parameters. The parameter b typically varies for
different types of headgroup. Various values (a, b) have been obtained, such as (5.5, 11) for sodium alkylsulphates,
(5.5, 29) for sodium alkylsulphonates with an odd number of carbons, (5.5, 34) for sodium alkylsulphonates with an
even number of carbons, and (5.5, 44) for sodium alkyloxyethylenesulphates.

(E) CLOUD POINT 

The temperature at which a clear solution of surfactant just becomes turbid with heating is called the cloud point. 
This turbidity comes from light scattering by assemblies of micelles. These assemblies arise as a result of attractive 
interactions between micelles and may be thought of as clusters of micelles that in some cases become ordered 
mesophases of micelles. Such a phenomenon has been identified as a thermodynamic critical point and is most 
often exhibited by nonionic surfactants. As a practical matter it is known that cloud points can be raised by adding
ionic groups to a surfactant (such as by adding a sulphate group to an oligomeric poly(propylene oxide)-poly(ethylene
oxide) surfactant). At higher temperatures a phase separation into a water-rich phase and a surfactant-rich
phase is generally obtained.

C2.3.5 THERMODYNAMICS OF MICELLIZATION

The free energy driving micellar aggregation primarily derives from intermolecular attractive and repulsive forces 
between the hydrophobic tailgroups and the decrease in free energy obtained when these tailgroups are no longer 
solvated by water or other polar solvent. Typically 10-100 monomers self-assemble to form a micelle. The 
spheroidal object formed by such aggregation allows one to put forward the pseudophase approximation, wherein 
the oily micellar interior constitutes a disperse oil phase (pseudophase) and the continuous aqueous phase
represents the other pseudophase. Micellar solutions are, strictly, single-phase solutions, but the pseudophase
approximation is very useful for keeping track of solute binding and partitioning into micelles, and of related
phenomena. A very important alternative approach to modelling micellization is the
stepwise aggregation or chemical equilibrium approach. In this approach the detailed molecular or ionic growth of 
micelles is modelled in terms of sequential chemical equilibria, as surfactant merges with clusters to form slightly 
larger aggregates. Both of these approaches lead to thermodynamically equivalent results. 

C2.3.5.1 CHEMICAL POTENTIAL 

Assume that the chemical potential, $\mu$, of surfactant in aggregates of size $N$ in equilibrium with one another is
uniform. One may therefore write

$$\mu = \mu_N \qquad \text{(C2.3.7)}$$

where $\mu$ is a constant and $\mu_N$ is the chemical potential of surfactant in an aggregate of size $N$. We may also write

$$\mu_N = \mu_N^0 + \frac{kT}{N}\ln\left(\frac{X_N}{N}\right) \qquad \text{(C2.3.8)}$$

where $\mu_N^0$ is the standard part of the chemical potential and $X_N$ is the mole fraction of surfactant in aggregates of
size $N$. After rearranging terms, $X_N$ may be written:

$$X_N = N\left\{X_1 \exp\left[\frac{\mu_1^0 - \mu_N^0}{kT}\right]\right\}^N. \qquad \text{(C2.3.9)}$$

The standard chemical potentials $\mu_N^0$ are approximately the same if the surfactant in each aggregate sees nearly
the same interaction with the solvent. This simplifying assumption then gives

$$X_1 < 1 \qquad \text{(C2.3.10)}$$

$$X_N \ll X_1 \qquad \text{(C2.3.11)}$$

and yields the conclusion that the surfactant is in a monomeric state (aggregate of size 1). However, $\mu_N^0$ actually
often decreases with increasing $N$. When this is the case, we see from equation (C2.3.9) that the mole fraction of
surfactant in large aggregates may be relatively large. A necessary condition for the formation of large aggregates
is this decrease of $\mu_N^0$ with increasing $N$.
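
The rearrangement leading to equation (C2.3.9) can be made explicit. The following is a sketch that assumes, as is conventional, that $k$ is Boltzmann's constant, $T$ is the absolute temperature and that the monomer ($N = 1$) is used as the reference state when the chemical potentials are equated:

```latex
\begin{align*}
\mu_N^0 + \frac{kT}{N}\ln\frac{X_N}{N} &= \mu_1^0 + kT\ln X_1 \\
\ln\frac{X_N}{N} &= N\left[\ln X_1 + \frac{\mu_1^0 - \mu_N^0}{kT}\right] \\
X_N &= N\left\{X_1\exp\left[\frac{\mu_1^0 - \mu_N^0}{kT}\right]\right\}^N
\end{align*}
```

If all the $\mu_N^0$ are equal, the exponential factor is unity and $X_N = N X_1^N$. Because $X_1 < 1$, this is negligible for large $N$, which is the content of (C2.3.10) and (C2.3.11).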

C2.3.5.2 SHAPE EFFECTS 

The variation of this standard chemical potential $\mu_N^0$ with aggregate size is important in determining whether
micelles or aggregates will form. This reference potential also determines polydispersity and aggregate shape. For
the sake of discussion we consider the formation of linear chains of surfactants (one-dimensional aggregates). We
approximate the pairwise binding energy (relative to separated species) as $\alpha kT$. The standard reference potential is
then written:

$$\mu_N^0 = -\left(1 - \frac{1}{N}\right)\alpha kT. \qquad \text{(C2.3.12)}$$

It appears that $\mu_N^0$ approaches $\mu_\infty^0$ as $N \to \infty$. We therefore have in one dimension:

$$\mu_N^0 = \mu_\infty^0 + \frac{\alpha kT}{N}. \qquad \text{(C2.3.13)}$$

This expression can be generalized to two-dimensional aggregates (disclike micelles) and to spherical micelles,
where

$$\mu_N^0 = \mu_\infty^0 + \frac{\alpha kT}{N^{\nu}} \qquad \text{(C2.3.14)}$$

and the exponent $\nu$ takes values that depend on the aggregate shape. For disclike micelles, $\nu = 1/2$, and for
spherical micelles, $\nu = 1/3$. The parameter $\alpha$ reflects the energy of intermolecular binding in units of the thermal
energy.
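
Combining (C2.3.9) with (C2.3.14) gives an explicit aggregate size distribution, since $(\mu_1^0 - \mu_N^0)/kT = \alpha(1 - N^{-\nu})$. The Python sketch below evaluates it; the function name and the numerical values of $\alpha$ and $X_1$ are hypothetical choices made only for illustration.

```python
import math

def mole_fraction(N, x1, alpha, nu):
    """X_N from (C2.3.9) with mu_N^0 = mu_inf^0 + alpha*kT / N**nu (C2.3.14).

    Uses (mu_1^0 - mu_N^0)/kT = alpha * (1 - N**(-nu)) and works with
    logarithms so that small values underflow quietly to 0.0.
    math.exp can still overflow if x1*exp(alpha) > 1 and N is large.
    """
    log_xn = math.log(N) + N * (math.log(x1) + alpha * (1.0 - N ** (-nu)))
    return math.exp(log_xn)

alpha = 10.0   # binding energy in units of kT (illustrative value)
x1 = 3.0e-5    # monomer mole fraction (illustrative value)

for nu, shape in [(1.0, "linear chains"), (0.5, "discs"), (1.0 / 3.0, "spheres")]:
    values = [mole_fraction(N, x1, alpha, nu) for N in (1, 2, 5, 10, 50)]
    print(shape, ["%.2e" % v for v in values])
```

For $N = 1$ the function returns $X_1$ itself, and for $\nu = 1$ it reduces to $X_N = N (X_1 e^{\alpha})^N e^{-\alpha}$, so the distribution decays with $N$ as long as $X_1 e^{\alpha} < 1$.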


C2.3.6 MORPHOLOGY AND STRUCTURE 

The early Hartley model [2, 3] of a spherical micellar structure resulted, in later years, in some considerable debate. 
The self-consistency (inconsistency) of spherical symmetry with molecular packing constraints was subsequently 
noted [4, 5 and 6]. There is now no serious question of the tenet that unswollen micelles may readily deviate from 
spherical geometry, and ellipsoidal geometries are now commonly reported. Many micelles are essentially 
spherical, however, as deduced from many light and neutron scattering studies. Even ellipsoidal objects will appear
spherical when the time scale of the experimental probe is longer than the rotational period of the micelle.
Cylindrical micelles presumably originate as spherical or ellipsoidal objects, and grow cylindrically as a
consequence of competing energetics, wherein cylindrical extension is favoured over (hemispherical) end cap
completion.

Deviations from spherical geometry are more prevalent in reverse micelles of low hydration, because spherical
symmetry is more difficult to construct when concentrically arranging headgroups that may only interact
repulsively. That these headgroups must fill space at the reverse micellar core necessarily introduces packing
defects that militate against spherical symmetry. Normal and reverse micelles become more spherical when they
are swollen with solvent, and thereby form microemulsion droplets (section C2.3.11). Regular spheroidal micelles,
cylindrical micelles and branched cylindrical micelles have been imaged by cryo-transmission electron microscopy,
so that such approximate shapes have been fairly directly visualized [26, 27].

Resolution at the atomic level of surfactant packing in micelles is difficult to obtain experimentally. This difficulty 
is based on the fundamentally amorphous packing that is obtained as a result of the surfactants being driven into a 
spheroidal assembly in order to minimize surface or interfacial free energy. It is also based upon the dynamical 
nature of micelles and the fact that they have relatively short lifetimes, often of the order of microseconds to 
milliseconds, and that individual surfactant monomers are coming and going at relatively rapid rates. 

In addition to these long contested arguments over the sphericity of micelles are arguments over the accessibility of 
solvent to the core of normal micelles. Morphological models such as that of Dill and Flory in figure C2.3.2
portray micellar cores as composed of alkyl chains, and it has generally been believed that such an environment is
not a place where water may be expected to bind. Various experimental methods have, however, suggested that
water has access to these hydrocarbon chains, and this has caused considerable disagreement in the
literature. The simple packing models put forward by Fromherz showed, in a particularly clear way, that it is 
feasible to picture most parts of tailgroups as having a reasonable probability of accessing the surface of micelles. 
In other words, the micelle surface is not densely packed with headgroups, but also comprises intermediate and end 
of chain segments of the tailgroups. Such segments reasonably interact with water, consistent with dynamical 
measurements. Given that the lifetime of individual surfactants in micelles is of the order of microseconds and that
of micelles is of the order of milliseconds, it is clear that the dynamical equilibrium associated with micellar
structures is one that brings most segments of surfactant into contact with water. The core of normal micelles
probably remains fairly 'dry', however. 

C2.3.6.1 PACKING PARAMETER 

The surfactant number or surfactant parameter [28, 29 and 30], $N_s$, is defined as a dimensionless group:

$$N_s = \frac{v}{l\,a_h} \qquad \text{(C2.3.15)}$$

where $v$ is the volume of the surfactant tailgroup, $l$ is the tailgroup length and $a_h$ is the area of the head group.
These volume and length parameters may be estimated from partial molecular structural studies of various
homologous series of surfactants, from single-crystal x-ray diffraction studies and from molecular models. The key
result is that this dimensionless group, based upon surfactant molecular properties, allows for the prediction of
mesoscale packing morphology. A summary of the predictions obtained is given in figure C2.3.5. See table C2.3.1,
table C2.3.2, table C2.3.3 and table C2.3.4 for examples of the structures of some of the surfactants mentioned
below.

Figure C2.3.5. Mesoscale structures deriving from surfactants nominally exhibiting shapes corresponding to
various packing parameter ranges.

(A) $N_s < 1/3$

This inequality indicates that the amphiphile adopts a shape essentially equivalent to that of a cone with basal area $a_h$.
Such cones self-assemble to form spheroidal micelles in solution or spheroidal hemimicelles on surfaces (see
section C2.3.15). Single-chain surfactants with bulky headgroups, such as SDS, typify surfactants in this category.

(B) $1/3 < N_s < 1/2$

In this range the packing parameter yields a molecular shape similar to that of a truncated cone. Such cones may 
assemble to form rodlike structures and cylindrical micelles. Single-chain surfactants such as SDS and CTAB can 
fall within this range when the ionic strength is high enough to shield electrostatic repulsion between headgroups. 
This shielding is a de facto means for making the headgroups sterically smaller. 

(C) $1/2 < N_s < 1$

This range yields more highly truncated cones. The main mesophase structure obtained from these units is a
flexible bilayer such as that formed in vesicles and liposomes. These arrangements are often obtained from
double-chain surfactants such as lecithin, double-tailed cationic surfactants and AOT.

(D) $N_s \approx 1$

This parameter corresponds to cylindrical packing shapes. Surfactants and amphiphiles falling in this range often
produce planar bilayers and lamellar mesophases. Such cylindrical building blocks also contribute to many
important liquid crystalline applications. Double-tailed surfactants with smaller headgroups, such as phosphatidyl
ethanolamine, tend to form planar bilayers.

(E) $N_s > 1$

Surfactants having an inverted truncated cone shape yield inverted spheroidal micelles. Many double-chain 
surfactants such as AOT form such inverted micellar structures. These kinds of surfactant also form inverted 
anisotropic liquid crystalline phases. 
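
The ranges (A) to (E) amount to a simple lookup on the value of $N_s$ defined in (C2.3.15). The sketch below is only an illustration: the function names are hypothetical, the tolerance used to decide when $N_s$ counts as "approximately 1" is an arbitrary assumption, and the input values in the example are made up rather than taken from any measurement.

```python
def packing_parameter(v, l, a_h):
    """N_s = v / (l * a_h), with v, l and a_h in consistent units."""
    return v / (l * a_h)

def expected_morphology(n_s, tol=0.05):
    """Map N_s onto the packing ranges (A)-(E).

    `tol` sets how close to 1 counts as case (D); its value is an
    assumption of this sketch. Values exactly on a boundary are
    assigned to the higher range, which is also a convention chosen here.
    """
    if n_s < 1.0 / 3.0:
        return "spheroidal micelles"
    if n_s < 0.5:
        return "cylindrical micelles"
    if n_s < 1.0 - tol:
        return "flexible bilayers, vesicles"
    if n_s <= 1.0 + tol:
        return "planar bilayers"
    return "inverted micelles"

# Hypothetical tailgroup volume (A^3), length (A) and headgroup area (A^2)
n_s = packing_parameter(v=350.0, l=17.0, a_h=50.0)
print(f"N_s = {n_s:.2f}: {expected_morphology(n_s)}")
```

Because $a_h$ is an effective area that depends on hydration and ionic strength, the same surfactant can move between ranges as solution conditions change, as noted for SDS and CTAB under (B).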

C2.3.6.2 WORMLIKE MICELLES 

Lengthy cylindrical micelles have become known as wormlike micelles, threadlike micelles and giant micelles.
While the thickness of such micelles is typically of the order of two surfactant lengths (3-6 nm), the length of such
micelles can approach 1000 nm or more [31]. Their flexibility is characterized by a persistence length, which is
usually of the order of 30-200 nm. They are often studied in direct analogy to polymers, and much effort has been
expended in applying the tenets of the statistical mechanics of polymer chains to wormlike micelles [32, 33].
Magid [34] gives an excellent review of the analogy of wormlike micelles to polymers. She presents a compelling
picture of such giant micelles as living polymers.


C2.3.7 STATISTICAL MECHANICAL SIMULATIONS 

In the absence of anisotropy introduced by specific surfactant-surfactant interactions, a spherical droplet model is 
reasonable because it tends to minimize the surface energy. Deviations from spherical symmetry occur because of 
the finite size and anisotropy of surfactant molecules and the anisotropy of interactions. Many early experimental 
data were interpreted on the assumption of spherical structures. In seminal Monte Carlo studies by Haan and Pratt 
[35], micelles simulating those of sodium octanoate were examined. They found that the chains adopted a 
spheroidal structure that was never close to perfectly spherical. An example packing configuration of the type 
observed is illustrated in figure C2.3.6 for the case of an assembly involving 30 monomers. The shaded headgroups 
are mostly situated at the micellar surface, but it is obvious that much of the surface is also composed of methylene 
and methyl groups. This structure also obviously departs significantly from spherical symmetry. Spherical packing 
just is not energetically feasible when surfactant tailgroups must fill space. The situation changes dramatically 
when another solvent is permitted to fill the core region, as in microemulsions, and the surfactants can then pack in 
a more or less 'planar' manner at the oil-water interface. Similar conclusions have been upheld by much more
time-consuming molecular dynamics simulations, such as those of Jönsson et al [36]. A molecular dynamics
snapshot of a sodium octanoate micelle is illustrated in figure C2.3.7. This structure also shows that the micelle at a
given instant is far from spherically symmetric. Of course, this structure is undergoing shape fluctuations as part of
its dynamical equilibrium and it is constantly rotating in space. Such fluctuations and rotations tend to give an
apparent spherical structure when averaged over time. This is why many structural studies based on neutron
scattering, for example, need to invoke spherical models. This molecular dynamics simulation also confirms the
earlier Monte Carlo illustration of the accessibility of hydrocarbon chains to the micellar surface and the ensuing
solvent contacts.



Figure C2.3.6. Illustration of micelle structure obtained by Monte Carlo simulations of model octanoate 
amphiphiles. There are 30 molecules simulated in this cluster. The shaded spheres represent headgroups. 
Reproduced by permission from figure 2 of [35]. 



Figure C2.3.7. Snapshot of micelle of sodium octanoate obtained during molecular dynamics simulation. The 
darkest shading is for sodium counter-ions, the lightest shading is for oxygens and the medium shading is for 
carbon atoms. Reproduced by permission from figure 2 of [36]. 




Since these early simulations the art and science of simulating micellar structures has advanced significantly [37, 
38 and 39]. Ionic micelles are being simulated with specific inclusion of solvent water. Micellization processes are 
being simulated dynamically using model united atom techniques [40, 41 and 42]; such methods are proving useful 
for understanding micelle and bilayer [43] structural evolution in solutions of small surfactants and for 
understanding micellization of polymeric surfactants (block copolymers) [44]. 


C2.3.8 REVERSE MICELLES 


The idealized reverse micelle sketched in figure C2.3.1 is an aggregate of a double-tail surfactant. In such systems 
the solvent is more compatible with the lyophobic part of the surfactant than with the headgroup. This preference 
leads to the inverted structure illustrated. There is much less known in terms of conclusive physical data (e.g., cmc, 
aggregation number) on reverse micelles than on normal micelles. This, in part, is due to the hydrophilic nature of 
surfactant headgroups, and the fact that it is experimentally challenging to prepare and study reverse micelles in 
water-immiscible solvents while keeping all water out of the system. The cartoon in figure C2.3.1 is actually much 
more appropriate for a reverse microemulsion droplet, where the mole ratio of water to surfactant is of the order of 
ten or greater. 

Such reverse droplets generally have tiny water pools in the core that exhibit many of the bulk properties of water. 
In such cases it is more sensible to imagine the surfactant headgroups aligned as a monolayer at a water-oil 
interface. However, in the absence of more than a few water molecules per surfactant, such idealized packing 
cannot be obtained without generating energetically unfavourable vacuum cores. Another point of controversy has 
been the question of whether core water is a necessary condition for the formation of reverse micelles [45]. There is 
no fundamental reason why dipolar headgroups of even ionizable surfactants cannot associate to form a reverse 
micellar core. The presence of some waters of hydration will tend to ameliorate such association, and provide 
hydrogen bonding as a means of forming such associations. However, data showing that water facilitates reverse 
micelle (microemulsion) formation are incontrovertible. NMR self-diffusion data [46] for reverse micelles of AOT 
in decane are illustrated in figure C2.3.8 where the ratio of decane to decane plus water (0.6 % brine) is varied. In 
the limit of no added water, the self-diffusion of the surfactant is almost equal to that of water, and indicates that 
the reverse micelles formed have only a very few monomers in them. As water is added, both the self-diffusion of 
water and the self-diffusion of AOT decrease. This decrease indicates that the reverse micelles are controlling the 
diffusion rate of the water and AOT composing these micelles and are growing in size as more water is added. 
Reverse micelles of some surfactants will not form without added water, but many surfactants have been 
demonstrated to form them without added water [47]. 

Figure C2.3.8. Self-diffusion coefficients at 45°C for AOT (•), water (■) and decane (±) in ternary AOT, brine 
(0.6% aqueous NaCl) and decane microemulsion system as a function of composition, α. This compositional 
parameter, α, is the weight fraction of decane relative to decane and brine. Reproduced by permission from figure 3 
of [46]. 
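
The link between slower self-diffusion and larger aggregates can be made semi-quantitative with the Stokes-Einstein relation, R_h = k_B T / (6 pi eta D). The sketch below is illustrative only: the use of Stokes-Einstein here, the decane viscosity and the diffusion coefficients are assumptions introduced for the example and are not values taken from [46].

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def hydrodynamic_radius(diff_coeff, temperature=318.15, viscosity=6.5e-4):
    """Stokes-Einstein radius (m) of a sphere diffusing in a continuous solvent.

    diff_coeff  -- self-diffusion coefficient, m^2/s
    temperature -- kelvin (318.15 K = 45 C, the temperature of figure C2.3.8)
    viscosity   -- Pa s; 6.5e-4 is an assumed, rough value for decane near 45 C
    """
    if diff_coeff <= 0:
        raise ValueError("diffusion coefficient must be positive")
    return K_B * temperature / (6.0 * math.pi * viscosity * diff_coeff)


# Hypothetical diffusion coefficients, chosen only to show the trend:
# a smaller D corresponds to a larger apparent aggregate.
for d in (2.0e-10, 1.0e-10, 5.0e-11):
    radius_nm = hydrodynamic_radius(d) * 1e9
    print(f"D = {d:.1e} m^2/s -> R_h = {radius_nm:.2f} nm")
```

Because R_h scales as 1/D, halving the measured self-diffusion coefficient doubles the apparent radius, which is the qualitative reading given above for the growth of reverse micelles as water is added.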

The issue of water in reverse micellar cores is important because water swollen reverse micelles (reverse 
microemulsions) provide means for carrying almost any water-soluble component into a predominantly oil- 
continuous solution (see discussions of microemulsions and micellar catalysis below). In the absence of water it 
appears that premicellar aggregates (pairs, trimers etc.) are commonly found in surfactant-in-oil solutions [47]. 
Critical micelle concentrations do exist (with some exceptions). 


C2.3.9 SOLUBILIZATION AND PARTITIONING 


Micelles are mainly important because they solubilize immiscible solvents in their cores. Normal micelles 
solubilize relatively large quantities of oil or hydrocarbon and reverse micelles solubilize large quantities of water. 
This is because the headgroups are water loving and the tailgroups are oil loving. These simple solubilization 
trends produce microemulsions (see section C2.3.11). 

Other solubilization and partitioning phenomena are important, both within the context of microemulsions and in 
the absence of added immiscible solvent. In regular micellar solutions, micelles promote the solubility of many 
compounds otherwise insoluble in water. The amount of chemical component solubilized in a micellar solution 
will, typically, be much smaller than can be accommodated in microemulsion formation, such as when only a few 
molecules per micelle are solubilized. Such limited solubilization is nevertheless quite useful. The incorporation of 
minor quantities of pyrene and related optical probes into micelles is a key to the use of fluorescence 
depolarization in quantifying micellar aggregation numbers and micellar microviscosities [48]. Micellar 
solubilization makes it possible to measure acid-base or electrochemical properties of compounds otherwise 
insoluble in aqueous solution. Micellar solubilization facilitates micellar catalysis (see section C2.3.10) and 
emulsion polymerization (see section C2.3.12). On the other hand, there are untoward effects of micellar 
solubilization in practical applications of surfactants. When one has a multiphase 
system as often encountered in cosmetic formulations or in photographic emulsion technology, one or more 
chemical components may be present as nanoparticulates (for example, an organic coupler that reacts to form 
image dye in a colour photographic element). If surfactant is present in excess, so as to form micelles, 
solubilization of such a component in the micelles often may lead to unwanted growth of such particulates via 
Ostwald ripening. Since micellar solubilization raises the effective solubility of the component, the ripening rate 
increases accordingly. 

Micelles can solubilize gases. It has been demonstrated [49] that the Laplace model gives a good description of 
such solubilization for the case of ionic micelles: 


$$\ln X_m = \ln X_b - \frac{2\sigma v}{a n R T} \qquad \text{(C2.3.16)}$$

where X_m is the mole fraction of gas in the micelle and X_b is the mole fraction of gas in a bulk solvent equivalent to 
that of the surfactant tail, σ is the interfacial tension at the water-micelle interface, v is the partial molar volume of 
the gas in the micelle, n is the number of carbon atoms in the surfactant tail group, a is the segment length, T is 
temperature and R is the gas constant. This equation derives from the existence of a Laplace pressure differential 
across the micelle-water interface. Typical interfacial tensions applicable to micelles of hydrocarbon surfactants 
are in the neighbourhood of 30 dyn cm⁻¹. 
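
A short numerical sketch can show the size of the Laplace correction. It reads equation (C2.3.16) as ln X_m = ln X_b - 2σv/(anRT), so that the product a·n plays the role of the micelle core radius. The tail length (12 carbons), the segment length (1.265 Å per carbon), the gas partial molar volume (50 cm³ mol⁻¹) and the temperature are assumed inputs chosen for illustration, not data from [49].

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def laplace_solubility_ratio(sigma, v_gas, n_carbons, seg_len, temperature):
    """Return X_m / X_b from ln X_m = ln X_b - 2*sigma*v / (a*n*R*T).

    sigma       -- interfacial tension, N/m (30 dyn/cm = 0.030 N/m)
    v_gas       -- partial molar volume of the gas in the micelle, m^3/mol
    n_carbons   -- number of carbon atoms in the surfactant tail (n)
    seg_len     -- segment length per carbon, m (a)
    temperature -- kelvin
    """
    radius = seg_len * n_carbons              # effective core radius a*n, m
    laplace_pressure = 2.0 * sigma / radius   # pressure differential, Pa
    return math.exp(-laplace_pressure * v_gas / (R_GAS * temperature))


# Assumed inputs: dodecyl tail, 1.265 A per carbon, 50 cm^3/mol gas, 25 C.
ratio = laplace_solubility_ratio(0.030, 50e-6, 12, 1.265e-10, 298.15)
print(f"X_m / X_b = {ratio:.2f}")
```

With these inputs the Laplace pressure is of the order of several hundred atmospheres, so the model predicts a markedly lower gas solubility in the micelle than in the equivalent bulk hydrocarbon. Shorter tails (smaller n) give smaller radii and a larger reduction.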

The solubilization of diverse solutes in micelles is most often examined in terms of partitioning equilibria, where 
an equilibrium constant K defines the ratio of the mole fraction of solute in the micelle (X_m) and the mole fraction 
of solute in the aqueous pseudophase. This ratio serves to define the free energy of solubilization (-RT ln K). 
Recent monographs provide access to tabulations of such thermodynamic quantities [50, 51]. 
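
The relation between the partition constant and the free energy of solubilization can be illustrated with a few lines of code. The mole fractions below are hypothetical numbers chosen for the example. Only the definitions K = X_m / X_aq and ΔG = -RT ln K come from the text.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def partition_constant(x_micelle, x_aqueous):
    """K: mole fraction of solute in the micelle over that in the aqueous pseudophase."""
    if x_micelle <= 0 or x_aqueous <= 0:
        raise ValueError("mole fractions must be positive")
    return x_micelle / x_aqueous


def free_energy_of_solubilization(k, temperature=298.15):
    """Standard free energy of solubilization, -RT ln K, in J/mol."""
    return -R_GAS * temperature * math.log(k)


# Hypothetical solute: mole fraction 0.05 in the micelle, 1e-5 in water.
k = partition_constant(0.05, 1.0e-5)
dg = free_energy_of_solubilization(k)
print(f"K = {k:.0f}, dG = {dg / 1000:.1f} kJ/mol")
```

A value of K greater than one gives a negative free energy, meaning transfer from the aqueous pseudophase into the micelle is favourable. K = 1 gives zero.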

It is of particular interest to be able to correlate solubility and partitioning with the molecular structure of the 
surfactant and solute. 'Like dissolves like' is a well-worn phrase that appears applicable, as we see in 
microemulsion formation where reverse micelles solubilize water and normal micelles solubilize hydrocarbons. 
Surfactant interactions, geometrical factors and solute loading produce limitations, however. There appear to be no 
universal models for solubilization that are readily available and that rest on molecular structure. Correlations of 
homologous solutes in various micellar solutions have been reviewed by Nagarajan [52]. Some examples of 
solubilization, such as for polycyclic aromatics in dodecyl sulphonate micelles, are driven by hydrophobic 
interactions, while a variety of other types of micelle exhibit entropy driven solubilization. Other solutes and 
micelles involve solubilization in the headgroup region, and some of these cases are best modelled as specific 
binding phenomena (rather than partitioning). In cases where loading is fostered to the point of micellar swelling, 
where a core of neat solute is formed, we have the evolution of a microemulsion 'droplet'. 


C2.3.10 MICELLAR CATALYSIS 

Reversibly formed micelles have long been of interest as models for enzymes, since they provide an amphipathic 
environment attractive to many substrates. Substrate binding (non-covalent), saturation kinetics and competitive 
inhibition are kinetic factors common to both enzyme reaction mechanism analysis and micellar binding kinetics. 

Micellar catalysis has two important components. The first is the partitioning (localization, binding) of the reagent 
into or onto the micelle. Such localization provides increased interactions among the reagents, particularly with 
respect to the collision frequency in the case of bimolecular reactions. The second is a field or 'solvent' effect 
wherein the micellar environment provides catalysis by modifying the reaction transition state. The first of these 
components is the best understood and the most often encountered. Micellar catalysis differs from enzyme catalysis 
in some significant ways. Substrate specificity is usually much less significant in micellar catalysis. Also, the 
enhancement of reaction rates is much, much smaller in micellar catalysis than in enzyme catalysis. In addition, the 
substrate concentration in enzyme catalysis is usually much lower than the concentration of enzyme, but in micellar 
catalysis the concentration of micelles is usually of the same order as that of at least one of the reagents. 
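
The saturation behaviour that follows from substrate binding can be sketched with a simple pseudophase rate law of the Menger-Portnoy type, k_obs = (k_w + k_m K_s C_m) / (1 + K_s C_m), where C_m is the concentration of micelles. This model is a standard textbook treatment introduced here as an assumption. It is not given in this section, and every parameter value below is hypothetical.

```python
def micelle_concentration(c_surfactant, cmc, agg_number):
    """Molar concentration of micelles; zero at or below the cmc."""
    return max(c_surfactant - cmc, 0.0) / agg_number


def k_observed(c_surfactant, cmc, agg_number, k_water, k_micelle, k_bind):
    """Observed first-order rate constant in a two-pseudophase picture.

    k_water   -- rate constant in the aqueous pseudophase
    k_micelle -- rate constant for micelle-bound substrate
    k_bind    -- substrate-micelle binding constant, L/mol
    """
    bound = k_bind * micelle_concentration(c_surfactant, cmc, agg_number)
    return (k_water + k_micelle * bound) / (1.0 + bound)


# Hypothetical surfactant: cmc 1 mM, aggregation number 80.
params = dict(cmc=0.001, agg_number=80, k_water=1e-4, k_micelle=1e-2, k_bind=1e4)
for c in (0.0005, 0.002, 0.01, 0.05, 0.2):
    print(f"[surfactant] = {c:.4f} M -> k_obs = {k_observed(c, **params):.2e} 1/s")
```

Below the cmc the rate constant equals the aqueous value. Above it, k_obs rises and levels off towards k_micelle as the substrate becomes fully bound, which is the analogue of saturation kinetics in enzyme catalysis.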

This localization phenomenon has also been shown to be important in a case of catalysis by premicellar 
aggregates. In such a case [53] premicellar aggregates of cetylpyridinium chloride (CPC) were shown to enhance 
the rate of the Fe(III) catalysed oxidation of sulphanilic acid by potassium periodate in the presence of 1,10- 
phenanthroline as activator. This chemistry provides a lowering of the detection limit for Fe(III) by seven orders of 
magnitude. It must also be appreciated, however, that such premicellar aggregates of CPC actually constitute mixed 
micelles of CPC and 1,10-phenanthroline that are smaller than conventional CPC micelles. 

The oily interiors of micelle cores often provide a driving force for substrate binding to micelles. Those interiors 
are not the only physical aspect that affects the chemistry. Studies have shown that both electrostatic and geometric 
factors may be important in micellar catalysis. In particular, the surfactant headgroup (its charge, its volume and its 
hydration) affects the catalytic power of the micelle. The role of surfactant headgroups in modifying transition 
states can be contrasted with the role of oily interiors in providing substrate binding. Micellar solubilization has 
also been shown to inhibit alkaline decomposition when the reactive site is buried within the micellar interior. 
However, many micellar catalysed reactions occur near the charged double layer in proximity to the ionic 
headgroups [54]. Nucleophilic aromatic substitutions, such as the attack by azide ion on 2,4-dinitrochloronaphthalene 
catalysed by CTAB micelles [55], are examples of micellar catalysis by field or solvent 
effects. Such substitutions typically involve charged transition states that can be significantly modified by micelle 
ionic structures. 

C2.3.10.1 REACTION PATHWAYS 

Alteration of reaction pathways by micellar catalysis often can yield modified product distributions. Such 
modification is most easily obtained when one pathway is catalysed and another is inhibited by the micellar 
environment. An excellent tabulation of product distribution variations obtained by micellar catalysis is given by 
Fendler [56]. An example is illustrated in figure C2.3.9 for the CTAC (cetyltrimethylammonium chloride) 
catalysed photodecarbonylation of dissymmetrical ketones [57], A(CO)B. The CTAC micelles provide a cage 
effect that greatly enhances the joining of the A and B radicals produced by the photolysis. Although the 
localization effects and field effects provided in micellar catalysis can provide significant rate enhancements and 
these environmental effects can provide dramatically altered product distributions, there has been little effective 
development of micellar catalysis in modifying stereoselectivities. The dynamical equilibria exhibited by micelles 
militate against easily developing stereoselective binding equilibria. 

Figure C2.3.9. Product distribution of dissymmetrical ketone photolysis as influenced by cetyltrimethylammonium 
chloride (CTAC) micelles. The initial ketone, A(CO)B, is photolysed to lose the carbonyl group and to produce 
three products, AA, AB and BB. These data are for benzyl (A) 4-methylbenzyl (B) ketone. Product AA is 
1,2-diphenylethane, product BB is 1,2-ditolylethane and product AB is 1-phenyl-2-tolylethane. At low CTAC 
concentration, in the absence of micelles, a random distribution of products is obtained. In the presence of micelles, 
however, the AB product is heavily favoured. Adapted with permission from [57]. 
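
The 'random distribution' expected without micelles follows from simple counting: if A and B radicals escape into solution in equal numbers and couple statistically, the products appear in the ratio AA:AB:BB = 1:2:1. The sketch below computes this, together with a commonly used cage-effect measure, (AB - AA - BB)/(AA + AB + BB). That measure is introduced here as an assumption for illustration. It is not defined in the text, and the micellar yields used are hypothetical.

```python
from fractions import Fraction


def random_coupling(frac_a=Fraction(1, 2)):
    """Statistical product fractions when free A and B radicals pair at random."""
    frac_b = 1 - frac_a
    return {"AA": frac_a ** 2, "AB": 2 * frac_a * frac_b, "BB": frac_b ** 2}


def cage_effect(yields):
    """Excess of cross product AB over the symmetric products, as a fraction of the total."""
    total = sum(yields.values())
    return (yields["AB"] - yields["AA"] - yields["BB"]) / total


free_solution = random_coupling()
print({name: str(value) for name, value in free_solution.items()})
print("cage effect, random coupling:", cage_effect(free_solution))

# Hypothetical micellar yields in which AB is heavily favoured.
micellar = {"AA": 0.1, "AB": 0.8, "BB": 0.1}
print("cage effect, hypothetical micellar case:", round(cage_effect(micellar), 2))
```

Random coupling gives a cage effect of zero, while complete geminate recombination inside the micelle (only AB formed) would give one.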

C2.3.10.2 SELF-REPLICATION 

A particularly interesting type of micellar catalysis is the autocatalytic self-replication of micelles [58]. Various 
examples have been described, but a notable case is the biphasic self-reproduction of aqueous 
caprylate micelles [59]. In this system ethyl caprylate undergoes hydroxyl catalysed hydrolysis to produce the free 
carboxylate anion, caprylate. Caprylate micelles then form. As these micelles form, they solubilize ethyl caprylate 
and catalyse further production of caprylate anion and caprylate micelles. 
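
The autocatalytic character of this scheme can be seen in a toy kinetic model. The rate law, the rate constants and the treatment of the biphasic ester as a single well-mixed concentration are all simplifying assumptions made for illustration. They are not taken from [58] or [59], and the units are arbitrary.

```python
def simulate(k_slow=1e-3, k_cat=5.0, cmc=0.1, ester0=1.0, dt=0.01, steps=20000):
    """Euler integration of a toy self-replication model.

    rate = (k_slow + k_cat * micellized) * ester
    where micellized = max(caprylate - cmc, 0) is the caprylate present in micelles.
    Returns a list of (time, caprylate) pairs.
    """
    ester, caprylate = ester0, 0.0
    history = []
    for i in range(steps):
        micellized = max(caprylate - cmc, 0.0)
        rate = (k_slow + k_cat * micellized) * ester
        converted = min(rate * dt, ester)   # never convert more ester than remains
        ester -= converted
        caprylate += converted
        history.append(((i + 1) * dt, caprylate))
    return history


with_micelles = simulate()
without_micelles = simulate(k_cat=0.0)
print("final caprylate, autocatalytic:", round(with_micelles[-1][1], 3))
print("final caprylate, uncatalysed:  ", round(without_micelles[-1][1], 3))
```

In this sketch conversion is slow until the caprylate concentration crosses the assumed cmc, after which micelles form and the hydrolysis accelerates sharply, giving the induction period followed by rapid growth that is characteristic of autocatalysis.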

C2.3.10.3 REVERSE MICELLAR CATALYSIS 


An even greater diversity of catalytic processes has been obtained in reverse micellar systems. Reverse micelles, as 
pictured in figure C2.3.1, usually contain hydrating water molecules around the surfactant headgroups. Additional 
water solubilized in the core of reverse micelles produces reverse microemulsions (section C2.3.11), where the size 
of the water nanoreactor core can be simply adjusted by the amount of water added to the system. The same kinds 
of partitioning, field effects and headgroup charge effects encountered in normal micellar catalysis are also 
obtained in reverse micellar catalysis. However, the direct incorporation of varied catalysts, such as enzymes and 
cofactors, into the water pools provides a plethora of additional chemistries. Many of these chemistries are 
described in monographs [56, 60, 61], and include electron transfer reactions, donor-acceptor interactions, ester 
hydrolysis, carbohydrate hydrolysis, polymerization of olefinic monomers such as acrylates, methacrylates and 
acrylamides, acid dissociation, Schiff base formation, photochemistry, protein partitioning, 
catalysis by chymotrypsin, lipase, peroxidase, 
phosphatase, catalase and alcohol dehydrogenase. 
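
The statement that the water core size is set by the amount of added water can be made concrete with a simple geometric estimate. If all water sits in a spherical pool coated by a surfactant monolayer, dividing the pool volume by its area gives R = 3 v_w w0 / a_h, where w0 is the water-to-surfactant mole ratio, v_w the molecular volume of water and a_h the area per headgroup. This spherical-droplet picture and the headgroup area of 0.60 nm² are assumptions made for illustration only.

```python
import math

WATER_MOLECULAR_VOLUME = 0.030  # nm^3 per molecule (about 18 cm^3/mol divided by N_A)


def water_pool_radius(w0, head_area=0.60):
    """Radius (nm) of a spherical water pool, R = 3 * v_w * w0 / a_h.

    w0        -- mole ratio of water to surfactant
    head_area -- interfacial area per surfactant headgroup, nm^2 (assumed value)
    """
    return 3.0 * WATER_MOLECULAR_VOLUME * w0 / head_area


def surfactants_per_droplet(w0, head_area=0.60):
    """Number of surfactant molecules needed to coat one water pool."""
    radius = water_pool_radius(w0, head_area)
    return 4.0 * math.pi * radius ** 2 / head_area


for w0 in (5, 10, 20, 40):
    print(f"w0 = {w0:2d}: R = {water_pool_radius(w0):.2f} nm, "
          f"N = {surfactants_per_droplet(w0):.0f}")
```

In this picture the pool radius grows linearly with w0, so the 'nanoreactor' size is tuned simply by the water-to-surfactant ratio.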


C2.3.11 MICROEMULSIONS 

Microemulsions can be simply defined as solvent-swollen micellar solutions, wherein the swelling solvent is 
immiscible with the solvent of the pseudocontinuous phase. For example, in the case of normal micelles in aqueous 
solution, the swelling solvent typically is a water-immiscible organic solvent. This definition is not sufficient, 
however, because it only covers an important topological subset of microemulsions. This subset is one wherein the 
swelling solvent is present in objects readily identified as spheroidal or cylindrical 'particles' or 'droplets'. 
Microemulsions also contain a distinct topological entity known as irregular, bicontinuous microemulsions, 
wherein interdigitated domains of immiscible solvents are separated by a monolayer of surfactant. These structures 
will be elaborated further in the discussion below. 

All microemulsions have at least three chemical components, surfactant and two immiscible solvents; micellar 
solutions need only have two chemical components, surfactant and solvent. Microemulsions and micellar solutions 
are thermodynamically stable isotropic solutions. In this context the phrase 'thermodynamically stable' means that 
only moderate mixing is required to transform a mixture of the three main components into a transparent isotropic 
solution. There, of course, can be diverse kinetic barriers in such dissolution processing, and it often is convenient 
to dissolve the surfactant in one of the solvents before mixing in the other solvent, but, once formed, an isotropic 
microemulsion solution will not phase separate unless there is molecular decomposition or some field variable, 
temperature for example, is changed. Our use of the adjective isotropic in these contexts means that 
microemulsions are optically isotropic. That is to say the microstructures that exist in the aggregates in 
microemulsions are almost always significantly smaller than the wavelengths of visible light. 

The composition of microemulsions is usefully considered in the context of ternary phase diagrams such as that 
illustrated in figure C2.3.10. The region marked L₁ is the normal single-phase microemulsion domain wherein the 
solution microstructure is essentially that of micelles swollen with oil. Similarly, the L₂ domain essentially has 
water-swollen reverse micelles. At certain field variables these domains may be simply connected to one another. 
When such is the case, the microstructure in the 'connecting' region is predominantly that of an irregular, 
bicontinuous microemulsion. The physical, chemical and practical aspects of microemulsions fill many, many 
volumes, so no further attempt will be made to elucidate them here. Excellent monographs are available [62, 63, 
64, 65 and 66]. 

Figure C2.3.10. Ternary phase diagram of surfactant, oil and water illustrating the (regular) L₁ and (reverse) L₂ 
microemulsion domains. 


C2.3.12 EMULSION POLYMERIZATION 

The production of organic polymeric particles in the size range of 30-300 nm by emulsion polymerization has 
become an important technological application of surfactants and micelles. Emulsion polymerization is very well 
and extensively reviewed in many monographs and texts [67, 68], but we want to briefly illustrate the role of 
micelles in this important process. 

Surfactants provide temporary emulsion droplet stabilization of monomer droplets in the two-phase reaction 
mixture obtained in emulsion polymerization. A cartoon of this process is given in figure C2.3.11 . There we see 
that a reservoir of polymerizable monomer exists in a relatively large droplet (of the order of the size of the 
wavelength of light or larger) kinetically stabilized by surfactant. 

The role of micelles comes into play in nucleating the formation of the polymerized organic particles. The initial 
stages of polymerization, in the case of some monomers, may occur in the aqueous phase, but at some point of 
growth the aqueous solubility is no longer sufficient. The micelles provide a place, through solubilization, for small 
oligomers to continue growing thereafter. The micellar environment provides a region of intermediate solubility, 
more favourable for these oligomers than the aqueous phase or the reservoir phase. Transport between the 
monomer reservoirs (emulsion droplets) and the reacting, polymerizing polymer particle is also facilitated by 
micellar solubilization of monomer. 

Three phases, initiation, growth and termination, are typically encountered in emulsion polymerization. The 
initiation stage involves the creation of monomeric radicals and ensuing oligomeric radicals. These oligomers 
become temporarily engulfed in a micelle. The initiation stage is followed by the growth stage, wherein monomer 
in the reservoir (emulsion) particles is depleted. Finally, the radical chemistry is shut down in 
the termination stage, and polymerization ceases. 

Figure C2.3.11. Key surfactant structures (not to scale) in emulsion polymerization: micelles containing monomer 
and oligomer, growing polymer particle stabilized by surfactant and an emulsion droplet of monomer (reservoir) 
also coated with surfactant. Adapted from figure 4-1 in [67]. 


C2.3.13 MICELLAR AND MICROEMULSION POLYMERIZATION 

The polymerization of micelles composed of polymerizable surfactants while maintaining micellar morphology and 
size is a continuing challenge in colloid and polymer science. No significant case of polymerizing a micelle, while 
maintaining morphological integrity, has yet been convincingly reported. This situation will no doubt change in the 
near future, as the steric constraints of polymerization in related systems are being overcome [69, 70]. Since 
micelles and microemulsions are in dynamic equilibrium, it is difficult to polymerize surfactant monomers fast 
enough while maintaining the morphological integrity of the micelle. Also, steric constraints come into play once a 
monomer has been joined to another, so that it is difficult to carry on the polymerization while maintaining the 
nominal packing that existed before polymerization. It is noteworthy that Cussler and co-workers [71, 72] report 
what they believe to be polymerization in a bicontinuous microemulsion that preserves interfacial structure. 
However, it must be noted that the main evidence is that correlation lengths appear preserved before and after 
polymerization, and the possibility that this preservation is coincidental was not ruled out. 

Irrespective of whether microemulsion and micellar interfaces can be polymerized using polymerizable surfactants, 
it is now very well established that monomers in microemulsions, whether part of or entirely composing the oil 
phase, whether solubilized in the aqueous phase or whether part of both pseudophases can be polymerized to form 
very small particles or interesting bulk materials. When oily monomers are polymerized inside conventional 
microemulsions or when aqueously soluble monomers (acrylamides, acrylates) are polymerized in reverse 
microemulsions, one usually obtains latexes similar to those obtained by emulsion polymerization (30-100 nm in 
diameter). The mechanism of microemulsion polymerization is essentially identical to that for emulsion 
polymerization, except that the initiation and termination intervals are often not connected by a very lengthy 
growth interval. This is because there are no large monomeric reservoirs and monomer transport is facile and 
essentially diffusion controlled. Excellent reviews on microemulsion polymerization are readily available [73, 74 
and 75]. Very small particles have recently been obtained by reverse microemulsion 
polymerization [76]. The key in the studies of Pileni and co-workers is 
to use surfactants that have polymerizable counter-ions. Double-tail cationic surfactants with acrylate-type 
counter-ions yield ultrasmall particles (2-4 nm). Additional polymerizable monomer can be included in the water core, but 
having such species as part of the surfactant (headgroup) without covalently induced strain on the cationic 
surfactant packing appears to be a major factor in preserving the morphology through the polymerization interval. 


C2.3.14 MICELLE-BASED MESOPHASES 

Two-dimensional maps of single- and multiple-phase domains as a function of temperature and of 
surfactant/solvent mole fraction (or, alternatively, weight fraction) provide a useful means for characterizing ionic 
and nonionic surfactants. The variety of physical states discussed earlier, and more, are routinely exhibited in 
binary phase diagrams of surfactants. The formation of micellar and other aggregates is driven by hydrophobic 
interactions. As such aggregates become more concentrated and interact more strongly, supramolecular ordering of 
such aggregates occurs and the shape of such micelles and aggregates can change. These transitions yield a rich 
array of mesophases, such as lamellar phases where surfactant packs in infinite bilayers that can be swollen in the 
headgroup region by water or in the tailgroup region by organic solvent [77]. Some of these mesophases have 
building blocks identifiable as micelles. 

Spherical and ellipsoidal micelles and rodlike micelles can form supramolecular assemblies having cubic 
symmetry. Ellipsoidal micelles at sufficiently high concentration may pack at cubic lattice sites to produce viscous 
cubic phases. For example, certain triblock copolymeric micelles form cubic mesophases [78], wherein the micelles 
aggregate in a cubic array. Such arrays are often thermoreversible gels that 'melt' on cooling, as isotropic and low 
viscosity micellar solutions form (as the number density of micelles decreases with concomitant increases in the 
cmc). Both fcc and bcc arrays have been reported for such cubic mesophases [79, 80, 81, 82 and 83]. Furthermore, 
such ordering is often induced by shear. Two-dimensional scattering from a shear-induced cubic mesophase of 
EO₉₆PO₃₉EO₉₆ (see table C2.3.4) is illustrated in figure C2.3.12 [84]. This is the type of scattering pattern 
expected for a bcc lattice. 

Figure C2.3.12. Two-dimensional neutron scattering by EO₉₆PO₃₉EO₉₆ (Pluronic F88) micellar solution under 
shear with (a) the sample shear axis parallel to the beam, and (b) the sample rotated 35° around the vertical axis. 
Reflections for several of the Miller indices expected for a bcc lattice are annotated. Reproduced by permission 
from figure 4 of [84]. 
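
Body-centred cubic (bcc) and face-centred cubic (fcc) packings are distinguished in scattering by their reflection conditions: a bcc lattice allows only reflections with h + k + l even, whereas an fcc lattice allows only those with h, k and l all even or all odd. The sketch below lists the first few allowed reflections for each lattice. It is a generic crystallographic illustration with hypothetical function names, not an analysis of the data in [84].

```python
from itertools import product


def allowed(h, k, l, lattice):
    """Reflection condition for cubic Bravais lattices."""
    if lattice == "bcc":
        return (h + k + l) % 2 == 0
    if lattice == "fcc":
        return h % 2 == k % 2 == l % 2
    raise ValueError("lattice must be 'bcc' or 'fcc'")


def first_reflections(lattice, max_index=3, count=5):
    """Lowest allowed reflections as (h^2 + k^2 + l^2, representative hkl) pairs."""
    seen = {}
    for h, k, l in product(range(max_index + 1), repeat=3):
        if (h, k, l) == (0, 0, 0) or not allowed(h, k, l, lattice):
            continue
        s = h * h + k * k + l * l
        seen.setdefault(s, tuple(sorted((h, k, l), reverse=True)))
    return [(s, seen[s]) for s in sorted(seen)][:count]


for lattice in ("bcc", "fcc"):
    print(lattice, first_reflections(lattice))
```

Since peak positions scale as the square root of h² + k² + l², the two lattices give different sequences of peak-position ratios, which is how annotated reflections such as those in figure C2.3.12 support a bcc assignment.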

Normal and reverse cylindrical micelles or rodlike micelles can pack hexagonally to form a variety of mesophases. 
When cylindrical rods (micelles) pack hexagonally in two dimensions to form the normal hexagonal (H_I) 
mesophase, one obtains what was known as the middle phase in soap technology. Similar arrays may form from 
reverse rodlike micelles, and such mesophases are called inverse hexagonal (H_II) phases. Such mesophases are 
liquid crystalline and highly viscous and are often formed by phospholipids. These hexagonal mesophases are 
illustrated in figure C2.3.13 [85]. A thorough theoretical analysis of nematic mesophase formation by rodlike 
micelles, in the framework of micellar growth coupling with micellar alignment, has been given by van der Schoot 
and Cates [86]. A summary of work done on trying to understand the formation of hexagonal mesophases has been 
detailed by Odijk [87]. 



Figure C2.3.13. Normal (H_I, left) and inverse (H_II, right) hexagonal mesophases composed of rodlike micelles. 


C2.3.15 ADSORBED MICELLES 

The formation of surface aggregates of surfactants and adsorbed micelles is a challenging area of experimental 
research. A relatively recent summary has been edited by Sharma [51]. The details of how surfactants pack when 
aggregated on surfaces, with respect to the atomic level and with respect to mesoscale structure (geometry, shape 
etc.), are less well understood than for micelles free in solution. Various models have been considered for surface 
surfactant aggregates, but most of these models have been adopted without firm experimental support. 

C2.3.15.1 ISOLATED SURFACTANT ADSORPTION 


Individual and isolated surfactant adsorption onto a surface can be imagined to occur in various ways, such as those 
depicted in figure C2.3.14. The headgroup end-on adsorption shown in figure C2.3.14(a) is the most commonly 
assumed mode of adsorption. It is generally invoked for charged surfactant adsorption, when it is assumed that the 
prevalent surface charge is opposite to that of the surfactant headgroup. This assumption has most often been 
applied to the adsorption of straight-chain cationic surfactants onto negatively charged surfaces (silica, metals etc.). 
The mode exhibited in figure C2.3.14(b) is one that requires interaction of the headgroup and a part of the 
surfactant chain. Modes depending on hydrophobic interactions with the surface are shown in parts (c) and (d). 
When both the headgroup and the tail interact with the surface, the structures in figure C2.3.14(e) and figure 
C2.3.14(f) would be expected. Recent Raman evidence for CTAB adsorption onto negatively charged silver 
according to modes such as those of (e) and (f) appears unequivocal [88]. Modes (c)-(f) appear to be the most often 
overlooked in the literature. 

Figure C2.3.14. Isolated surfactant modes of adsorption at liquid-solid interfaces for a surfactant having a distinct 
headgroup and hydrophobic portion (dodecyltrimethylammonium cation): (a), (b) headgroup specific interaction; 
(c), (d) hydrophobic tail interaction; (e), (f) headgroup and tail interactions. 

C2.3.15.2 HEMIMICELLE FORMATION 

It is easy to see how extrapolation of isolated adsorption depicted in figure C2.3.14(a) can lead to a picture of 
hemimicelle structure as illustrated in figure C2.3.15(a). Combinations of the modes illustrated in figure C2.3.14 
can be used to construct the alternative hemimicelle model of figure C2.3.15(b). This alternative model was much 
more popular in the 1960s and 1970s, but was supplanted by the evolution of dogma in favour of figure C2.3.15(a). 
It should be stressed that the structure for the hemispherical hemimicellar model takes simultaneous account of 
surfactant packing parameters and basic interfacial free energy, whilst the model of figure C2.3.15(a) ignores the 
highly unfavourable tail-solvent interactions that remain about the periphery after aggregation. 




Figure C2.3.15. Hemimicelle structures: (a) monolayer type hemimicelle; (b) spheroidal, globular hemimicelle. 

Adsorbed micelles, also called admicelles, are illustrated in cross-section in figure C2.3.16(a) and figure C2.3.16(b). 
The bilayer structure in (a), when capped at the ends, mitigates the unfavourable solvent interactions 
maintained in the hemimicellar model of figure C2.3.15(a). When suitable end caps are added, these two admicelle 
models in figure C2.3.16 become somewhat indistinguishable. It is possible for the bilayer admicelle, however, to 
propagate over large areas so that the detailed molecular packing remains fairly ordered and distinguishable from 
the more random short range packing exhibited in the model of figure C2.3.16(b). 



Figure C2.3.16. Adsorbed micelle structures: (a) bilayer admicelle; (b) spheroidal, globular adsorbed micelle. 

Below the onset of surface aggregation through the adsorption of micelles, there are up to four regimes of
adsorption observed. The first regime is that of isolated surfactant adsorption, such as depicted in figure C2.3.14.
The second regime is that in which hemimicelles form and/or admicelles adsorb. In log-log plots of adsorbed
surfactant versus solution-phase surfactant concentration, the breakpoint between these first two regimes denotes
the 'surface cmc' for hemimicelle formation. Additional surfactant uptake with increasing concentration in a
subsequent regime has been interpreted as corresponding to completion of a surface bilayer. Asymptotic adsorption
corresponds to a fourth regime, wherein solution micelles are believed to be in dynamical equilibrium with a
surface bilayer-type admicelle, such as depicted in figure C2.3.16(a).
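
The 'surface cmc' is read off as the breakpoint between two straight-line regimes in a log-log adsorption plot. The
following is a minimal sketch of one possible way to locate such a breakpoint, assuming a simple two-segment
least-squares fit. The function names and the synthetic data are illustrative assumptions and are not taken from the
cited studies; the sketch also assumes the two fitted slopes differ.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line: returns (slope, intercept, sum of squared residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def find_breakpoint(conc, adsorbed, min_pts=3):
    """Concentration at which the two best-fitting log-log segments intersect."""
    lx = [math.log10(c) for c in conc]
    ly = [math.log10(a) for a in adsorbed]
    best = None
    # Try every split of the data into a low- and a high-concentration segment.
    for k in range(min_pts, len(lx) - min_pts + 1):
        left = fit_line(lx[:k], ly[:k])
        right = fit_line(lx[k:], ly[k:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, left, right)
    _, (m1, b1, _), (m2, b2, _) = best
    x_cross = (b2 - b1) / (m1 - m2)  # intersection of the two fitted lines
    return 10 ** x_cross


# Synthetic isotherm (arbitrary units): slope 1 below 1e-4, slope 4 above it.
log_c = [-6 + 0.25 * i for i in range(16)]
log_a = [x - 3 if x <= -4 else 4 * x + 9 for x in log_c]
conc = [10 ** x for x in log_c]
adsorbed = [10 ** y for y in log_a]
surface_cmc = find_breakpoint(conc, adsorbed)
print(f"estimated breakpoint: {surface_cmc:.2e}")
```

With real data the segments are noisy and the third and fourth regimes add further breakpoints, so the fit would be
restricted to the concentration range spanning only the first two regimes.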

C2.3.15.3 DIRECT STRUCTURAL DATA 

The qualitative resolution of the morphology and structure of surfactant aggregates on surfaces is experimentally
formidable. Until recently, all such structural assignments were made on quite indirect bases. In the case of
interpreting neutron scattering data, for example, generally only unbounded structures of the type illustrated in
figure C2.3.15(a) and figure C2.3.16(a) were considered in the modelling and data fitting processes. Fortunately,
recent AFM (atomic force microscopy) studies by Manne [89] and collaborators have provided direct
morphological data for a variety of surfactants interacting with several different surfaces. The overall results are
exciting because they illustrate a diversity of structure.
All of these results were obtained for solutions that were above the respective cmc in aqueous solution. Structural
results [90] for tetradecyltrimethylammonium bromide are illustrated in figure C2.3.17, where the surfactant is
adsorbed onto a hydrophobic cleavage plane of MoS2. The cartoon illustrated shows the structure deduced fairly
directly from the atomic force micrographs. Other results are summarized in table C2.3.8. The dodecyl- to
hexadecyltrimethylammonium bromides, hexadecyltrimethylammonium chloride and SDS all yield similar parallel
half-cylinders on a hydrophobic cleavage plane of graphite. These results contrast with earlier assignments of
vertical monolayers (such as illustrated in figure C2.3.15) for the cationic surfactants [93] and for SDS [94] and
hemispheres for SDS [95]. A strikingly different result, parallel flexible cylinders [90], was obtained for the
dodecyl- to hexadecyltrimethylammonium bromides and chlorides on an anionic cleavage plane of mica. These
flexible cylinders remain parallel but undergo s-shaped shifts as the underlying lattice is covered. Earlier
assignments [92, 96, 97 and 98] deduced uniform bilayer structures such as illustrated in figure C2.3.16(a). This
AFM approach did yield an assignment of a uniform bilayer for the adsorption of the double-tail cationic bromide
on mica, in agreement with an earlier study [99]. The results obtained by AFM for adsorption of the cationic
tetradecyltrimethylammonium bromide onto the anionic surface of SiO2 are particularly interesting, as spheres and
spheroids, such as depicted in figure C2.3.15(b) and figure C2.3.16(b), were deduced in agreement with earlier
assignments [100, 101] of spheres, but contrasted with other assignments inferring bilayer patches [102, 103, 104
and 105]. Much work remains in developing a firmer comprehension of hemimicellar structure at concentrations
lower than those that produce micelles in bulk solution.


Table C2.3.8 Hemimicelle morphology by AFM.

Surface     Surfactant                                              Morphology

Graphite    Alkyltrimethylammonium bromides (including CTAB); SDS   Parallel half-cylinders [90, 91, 92]
Mica        Alkyltrimethylammonium bromides (including CTAB)        Parallel flexible cylinders [91]
Mica        Double-tail alkylammonium bromide                       Uniform bilayer [91]
MoS2        Alkyltrimethylammonium bromide                          Parallel half-cylinders [91]
SiO2        Alkyltrimethylammonium bromide                          Spheres and spheroids [91]

Figure C2.3.17. Model of half-cylindrical aggregates (hemimicelles) on a crystalline hydrophobic substrate, such
as for tetradecyltrimethylammonium bromide on MoS2 [91]. Adapted from figure 2 of [89].


C2.3.16 MICELLE-POLYMER INTERACTIONS 

Many practical applications of surfactants and polymers utilize both types of component in the same single- or 
multi-phase formulation. For example, in the photographic industry, anionic surfactants are used as charge 
stabilizers for a myriad of organic and inorganic nanoparticulates and polymers (such as gelatin and other 
polyelectrolytes) are used as steric stabilizers for the same particulates. There are significant synergistic 
interactions between surfactants and polymers, and most of these interactions are based on how these polymers 
interact with micelles of these surfactants [106, 107, 108 and 109].

C2.3.16.1 EFFECTS ON CMC 


An important mechanistic feature of such interactions lies in their molecular detail. Two general types of
interaction may be articulated: (1) nucleation of a micelle by some pendant group of the polymer, wherein
a micelle grows about some part of the polymer; (2) polymeric adsorption onto the micellar surface, where one can
picture a nonintegral type of binding between the micelle and polymer involving only the micellar headgroup
region and active sites of the polymer [110]. These two limiting cases define boundaries between which mixtures
of the two effects can be observed.

Additives, whether hydrophobic solutes, other surfactants or polymers, tend to nucleate micelles at concentrations 
lower than in the absence of additive. Due to this 'nucleating' effect of polymers on micellization there is often a 
measurable cmc, usually called a critical aggregation concentration or cac, below the regular cmc observed in the 
absence of added polymer. This cac is usually independent of polymer concentration. The size of these aggregates 
is usually smaller than that of free micelles, and this size tends to be small even in the presence of added salt 
(conditions where free micelles tend to grow in size). 

These effects are illustrated in figure C2.3.18 for the case of SDS micellization as influenced by poly(ethylene
oxide), PEO, and salt [111]. The breakpoints in the figure denote the cmc or cac. From table C2.3.6 and table
C2.3.7 we see that the cmc of SDS is 8.2 mM in the absence of salt and polymer and is 1.4 mM in 0.1 M NaCl. The
open symbols in figure C2.3.18 show that in the absence of salt, 35 000 Dalton PEO (at 0.1 % w/w) depresses the
cac to about 3.5 mM and 60 000 Dalton PEO (also at 0.1 % w/w) depresses the cac even further to about 2.3 mM.
In 0.1 M NaCl both molecular weight samples of PEO depress the cac, to about 1 mM (relative to 1.4 mM in the
absence of polymer), but the relative depression is much less than in the absence of salt.
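
The statement about relative depression can be checked with a short calculation. The sketch below uses only the
concentrations quoted in the paragraph above; the function and variable names are illustrative assumptions.

```python
def relative_depression(cmc_mM, cac_mM):
    """Fractional lowering of the aggregation threshold relative to the polymer-free cmc."""
    return (cmc_mM - cac_mM) / cmc_mM


# Concentrations (mM) as quoted in the text.
no_salt_35k = relative_depression(8.2, 3.5)      # 35 000 Dalton PEO, no salt
no_salt_high_mw = relative_depression(8.2, 2.3)  # higher molecular weight PEO, no salt
with_salt = relative_depression(1.4, 1.0)        # either PEO sample, 0.1 M NaCl

for label, value in [("no salt, 35 000 Dalton PEO", no_salt_35k),
                     ("no salt, higher molecular weight PEO", no_salt_high_mw),
                     ("0.1 M NaCl, either PEO sample", with_salt)]:
    print(f"{label}: {100 * value:.0f} % depression")
```

Without salt the threshold falls by roughly 57 % and 72 % for the two polymer samples, whereas in 0.1 M NaCl it
falls by only about 29 %.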

Figure C2.3.18. Vibronic peak fluorescence intensity ratio (III/I) as a function of SDS concentration for 0.1 %
PEO solutions (○, ●: 35 000 Daltons; □, ■: 600 000 Daltons). Open symbols are for aqueous solution without
added salt, and filled symbols are for 100 mM aqueous NaCl. Reproduced with permission from figure 2 of [111].

C2.3.16.2 EFFECTS ON RHEOLOGY 

Solution viscosity in systems with strong polymer-micelle interactions generally increases with polymer-micelle
binding. If one imagines the necklace model of Cabane and co-workers [112, 113 and 114], where micelles (beads)
are bound to polymeric strands, two contributing features to viscosity may be identified. First, there exists an
inertial effect due to the adsorbed micelles. These micelles simply increase the effective molecular weight
of the polymer and the effective friction coefficient of the polymer. In the neighbourhood of an adsorbed micelle,
translation of the polymer strand in a reptation mode, for example, also requires translation of the attached micelle
having considerably greater macromolecular cross-section. Second, occasional micelles may interact with or bind to
two different polymer strands. This kind of network formation also dramatically increases the effective molecular
weight and friction coefficient of the polymer, as it introduces de facto cross-linking. This kind of cross-linking can
also lead to the formation of gels at concentrations far below where the polymer would gel on its own.


A particularly interesting kind of sol-gel-sol sequence has been reported by Bloor et al [115] for SDS interactions
with ethyl-(hydroxyethyl)cellulose (EHEC). Below the cac the polymer and SDS are in solution and significant
interactions among themselves or with each other are absent. As the cac is passed the SDS nucleates in micelles
around the pendent ethyl groups. A fraction of these micelles attach to two or more pendent groups connecting
different polymeric strands and thereby generate a gel network. This network grows with increasing SDS until a
maximum number of cross-links is established. Further SDS additions displace some of these micelle-EHEC
connections, and gradually dissolve the cross-links to produce another sol state comprising individual EHEC
strands containing attached micelles.

C2.4 Organic films (Langmuir-Blodgett films and
self-assembled monolayers)

Georg Hähner


C2.4.0 INTRODUCTION 

The goal of constructing stable organic molecular architectures with desired properties that modify surfaces
independently of their bulk characteristics is of fundamental interest in many areas. Organic chemists have made
significant progress over recent years in designing and constructing molecules that have certain desired physical 
and chemical properties. Assembling such molecules onto surfaces, as well as characterizing and studying the 
resulting layers, lies at the centre of interest in many laboratories, and a variety of techniques have been employed 
in order to reach this goal. Ultrathin and especially monomolecular organic films with a high degree of order are of 
special interest since they open up new fields of research and could establish an organic counterpart to inorganic 
crystals. Such films play an important role not only in fundamental science, where they often serve as model 
systems (e.g. for polymers) but also in applied sciences, where they are employed as corrosion inhibitors, 
lubricants, adhesion promoters and in biosensors, as well as in many other applications. 

Inorganic surfaces can be covered with organic molecules by different methods. The Langmuir-Blodgett (LB) and 
the self-assembly (SA) techniques that are described here both offer the possibility of tailoring the properties of 
ultrathin organic films to a certain degree. The LB technique is suitable only for a limited number of molecules and 
substrates and requires special equipment for preparation. It was the first technique that gave chemists a practical 
method by which to prepare ordered molecular structures on surfaces. 

Adsorption from solution, which is most often employed in connection with the SA technique, provides the easiest
route for studying the behaviour of organic molecules. Organic films can be prepared by immersing the inorganic
substrates in dilute solutions of the surfactant. However, not all molecules and substrates are appropriate for
establishing self-assembled monolayers (SAMs).

Apart from the techniques described in this chapter, other methods of organic film formation include vacuum
deposition and film formation by allowing a melt or a solution of the material to spread on the substrate and
subsequently to solidify. Vacuum deposition is limited to molecules with a sufficiently high vapour pressure, while a
prerequisite for the latter is an even spreading of the solution or melt over the substrate, which depends on the
nature of the intermolecular forces. This subject is of general relevance to the formation of organic films.

Excellent books and review articles covering LB and SA films have appeared recently. The following covers the 
basics and some selected topics are presented as examples. For a more comprehensive overview and more details 
on specific topics the reader is referred to the cited literature. 

C2.4.1 LANGMUIR-BLODGETT FILMS

The systematic study of ultrathin and, in particular, ordered monomolecular organic films dates from
the end of the nineteenth century, when the first truly quantitative studies of amphiphilic (i.e.
one end of the molecule is polar and hydrophilic, the other end is hydrophobic) monolayers were made by Pockels
[1]. She demonstrated that in the case of amphiphilic molecules such as stearic acid (C17H35CO2H) there exists a
unique form of layer at the air-water interface, having a definite ratio of mass of stearic acid to surface area of
water on which it resides [2, 3 and 4]. Using these results and the known density of stearic acid, one can deduce a
thickness of these layers of about 2.3 nm, which compares well with the modern value of 2.5 nm for the length of
such a molecule. Lord Rayleigh suggested that these films were monolayers and thus gave a direct measure of
molecular dimensions [5]. Subsequent work showed that only amphiphilic molecules form good monolayers
whereas simple aliphatic ones do not [6].
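
The thickness estimate follows from dividing the volume occupied by one molecule (molar mass over density and
Avogadro's number) by the area it covers on the water surface. The sketch below illustrates the arithmetic only. The
density (0.94 g/cm^3) and the area per molecule (0.22 nm^2) are assumed, illustrative inputs and are not the values
measured by Pockels; the function name is likewise an assumption.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def monolayer_thickness_nm(molar_mass_g_mol, density_g_cm3, area_per_molecule_nm2):
    """Layer thickness, assuming the film has the same density as the bulk material."""
    volume_cm3 = molar_mass_g_mol / (density_g_cm3 * AVOGADRO)  # cm^3 per molecule
    volume_nm3 = volume_cm3 * 1e21                              # 1 cm^3 = 1e21 nm^3
    return volume_nm3 / area_per_molecule_nm2


# Stearic acid, C17H35CO2H: molar mass about 284.5 g/mol.
# Density and area per molecule are assumed values chosen for illustration.
thickness = monolayer_thickness_nm(284.48, 0.94, 0.22)
print(f"estimated monolayer thickness: {thickness:.2f} nm")
```

With these assumed inputs the estimate comes out close to the 2.3 nm quoted above; a smaller area per molecule gives
a proportionally thicker layer.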

In 1917, Langmuir published his systematic study of amphiphilic compounds at the air-water interface [7]. In 
1920, he mentioned the transfer of films from this interface to a solid substrate [8]. In 1935 K Blodgett published 
an extensive report on the deposition of mono- and multilayers of fatty acids on a solid substrate from films 
existing at the air-water interface [9]. In the following 30 years a number of publications appeared dealing with the 
properties of such films. The term Langmuir-Blodgett films (LB films) is usually employed to denote mono- or 
multilayers transferred from a liquid-gas interface onto a solid substrate. The molecular film at the (liquid-gas) 
interface itself is denoted a Langmuir film. 

C2.4.1.1 PREPARATION OF LB FILMS 

(A) LB MOLECULES (AMPHIPHILES) 

It is well known from everyday life that there are substances that dissolve in water and others that are insoluble.
Many salts, for example, are soluble in water, while lipids are not. However, the latter are soluble in nonpolar
solvents, such as CCl4. This is due to the different interactions between the solvent and the solute. Materials that
'like' water (i.e. polar ones) are called hydrophilic, while those which do not like water are called hydrophobic. An
amphiphile is a molecule that is not soluble in water, but has a hydrophilic and a hydrophobic end group (figure
C2.4.1). Therefore one end goes into water while the other points out, resulting in a spreading of the material on the
water surface. A typical example is stearic acid (C17H35CO2H), where the long hydrocarbon chain is hydrophobic
while the carboxyl group (-COOH), which can dissociate in water and become negatively charged, is hydrophilic.

Figure C2.4.1. Schematic diagram of a fatty acid with a hydrophilic (COO⁻) and a hydrophobic end group (CH3)
(left) and of an amphiphile in general (right).

(B) TROUGH 

The basic equipment for the preparation of LB films is a trough containing the subphase on which the compound is
spread and which is equipped with barriers in order to manipulate the film at the liquid-gas interface (figure
C2.4.2). Both the trough and the barrier are frequently made out of Teflon. This material is very inert and can be
cleaned with strong oxidizing materials without any damage. The movable barrier allows the pressure exerted on
the film to be controlled. All movements are performed by a motor. The substrate can be lowered and raised by
means of another motor with a gearbox in order to transfer the film from the interface onto the substrate (see (e)
below). The balance used to measure the pressure in the film is most often a so-called Wilhelmy plate. A discussion
of methods for surface pressure measurements can be found, for example, in [10]. The full automation and
computerization of the preparation of monomolecular and multilayer films started in the early 1970s [11].

Figure C2.4.2. Schematic sideview of the trough. The movable barrier is used to push the molecules on the
subphase together in the Langmuir film which is subsequently transferred to a solid substrate.

Demand for temperature-controlled troughs came from materials scientists who worked with large molecules and
polymers that establish viscous films. Such troughs allow a deeper understanding of the distinct phases and the
transitions in LB films and give more complete pressure-area isotherms (see (d) below).

In general, extreme care has to be taken when LB films are prepared, since the quality of the resulting films 
depends crucially on the preparation conditions. The best place for an LB trough is a laboratory where the 
surroundings, i.e. temperature, humidity and atmosphere, are completely controlled. Often it is placed in a laminar 
flow box. Also, the trough should be installed in a shock-free environment. 

(C) SUBPHASE 

The most often used subphase is water. Mercury and other liquids [12], such as glycerol, have also occasionally
been used [13, 14]. The water has to be of ultrapure quality. The pH value of the subphase has to be adjusted and
must be controlled, as well as the ion concentration. Amphiphiles differ in their sensitivity to these
parameters. In general it takes some time until the whole system is in equilibrium and the final values of pressure
and other variables are reached. Organic contaminants cannot always be removed completely. Such contaminants,
as well as ions, can have a harmful influence on the film preparation. In general, all chemicals and materials used in
the film preparation have to be extremely pure and clean.

(D) PRESSURE-AREA (Π-A) ISOTHERM


A drop of a dilute solution (1%) of an amphiphile in a solvent is typically placed on the water surface. The solvent
evaporates, leaving behind a monolayer of molecules, which can be described as a two-dimensional gas, due to the
large separation between the molecules (figure C2.4.3). The movable barrier pushes the molecules at the surface
closer together, while pressure and area per molecule are recorded. The pressure-area isotherm yields information
about the stability of monolayers at the water surface, a possible reorientation of the molecules in the
two-dimensional system, phase transitions and changes in the conformation. While being pushed together, the layer at
the water surface goes through different states: gaseous, liquid and solid. If the pressure is increased further, the
layer collapses due to mechanical instabilities. This collapse is detected as a sharp decrease in the pressure. This
break-down pressure is a function of temperature, pH, the subphase and the velocity with which the barrier moves.
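
Because collapse shows up as a sharp decrease in pressure during compression, it can be picked out of a recorded
isotherm with a simple scan. The following is a minimal sketch under stated assumptions: the function name, the
1 mN/m drop threshold and the data points are all hypothetical and chosen only for illustration.

```python
def find_collapse(areas_nm2, pressures_mN_m, drop_mN_m=1.0):
    """Return (area, pressure) just before the first pressure drop larger than
    drop_mN_m between consecutive points, or None if no such drop occurs."""
    for i in range(1, len(pressures_mN_m)):
        if pressures_mN_m[i - 1] - pressures_mN_m[i] > drop_mN_m:
            return areas_nm2[i - 1], pressures_mN_m[i - 1]
    return None


# Synthetic compression run, ordered by decreasing area per molecule.
areas = [0.40, 0.30, 0.25, 0.22, 0.21, 0.20, 0.19, 0.18]
pressures = [0.2, 1.0, 5.0, 15.0, 30.0, 45.0, 52.0, 40.0]

collapse = find_collapse(areas, pressures)
print(collapse)  # (0.19, 52.0) for this synthetic data set
```

Since the break-down pressure depends on temperature, pH, subphase and barrier velocity, a value found this way is
meaningful only together with those conditions.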

Figure C2.4.3. Pressure-area isotherm for a fatty acid. The molecules are in a gaseous, liquid or solid state,
depending on the area per molecule available. If the pressure is further increased, a mechanical instability occurs
and the film breaks down.

(E) FILM TRANSFER 

Vertical deposition. The conventional method of transferring films from the air-water interface onto a solid
substrate is vertical deposition, which was demonstrated by Langmuir and Blodgett (figure C2.4.4). They showed
that a monolayer of an amphiphile can be transferred to a substrate by moving a vertical plate through the film at
the water-air interface. During transfer the Langmuir monolayer is held at constant surface pressure. The transfer
process itself is not yet fully understood, although it is known that there is a critical velocity above which it
does not work. The effects of various parameters, such as viscosity, on the critical velocity have been investigated
[15].

Figure C2.4.4. Schematic diagram of the transfer process of LB films onto a hydrophilic substrate. Vertical
upward and downward strokes result in hydrophobic and hydrophilic surfaces, respectively.

If a substrate is moved through a monolayer at the water-air interface, a film can be deposited by both dipping the
substrate into the water and retracting it. Usually the film is transferred while retracting, if the substrate is
hydrophilic and the hydrophilic headgroups are interacting with the surface. On the other hand, if the substrate is
hydrophobic, the film is transferred while dipping, as the hydrophobic alkyl chains interact with the surface. Thus,
multilayers can be prepared by several successive transfer processes. If the transfer process starts with a
hydrophilic substrate the surface will be hydrophobic after the first film transfer, hydrophilic after the second one
and so on. This transfer mode is called Y-type deposition, resulting in multilayers with 'head to head' and 'tail to
tail' configurations of the layers. Films can, however, also be transferred only in downstroke mode, resulting in
so-called X-type layers ('head to tail' configuration), or in upstroke mode only, resulting in Z-type layers ('tail to
head' configuration). The different transfer modes have specific advantages and disadvantages; in general, the Y-type
(multi)layers are the most stable ones for very hydrophilic headgroups [16].
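
The bookkeeping behind the three deposition modes can be summarized in a few lines. This is a minimal sketch that
only encodes the rules stated above for a hydrophilic starting substrate; the function names are illustrative
assumptions.

```python
def deposition_type(transfers_on_downstroke, transfers_on_upstroke):
    """Classify the deposition mode from the strokes on which film is transferred."""
    if transfers_on_downstroke and transfers_on_upstroke:
        return "Y"
    if transfers_on_downstroke:
        return "X"
    if transfers_on_upstroke:
        return "Z"
    return "none"


def outer_surface_after_y_transfers(n_transfers):
    """Surface character after n Y-type transfers onto a hydrophilic substrate."""
    if n_transfers < 0:
        raise ValueError("n_transfers must be non-negative")
    # Each transferred layer reverses the character of the exposed surface.
    return "hydrophilic" if n_transfers % 2 == 0 else "hydrophobic"


print(deposition_type(True, True))          # Y
print(outer_surface_after_y_transfers(1))   # hydrophobic
print(outer_surface_after_y_transfers(2))   # hydrophilic
```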

Horizontal transfer (Schaefer's method). Another technique to prepare structures with LB (multi)layers is named
after Schaefer [17]. This method is useful for depositing rigid films which can be described as two-dimensional
solids. First, a compressed monolayer is established at the water-air interface. Subsequently a flat substrate is
brought horizontally into contact with the film (figure C2.4.5). When the substrate is lifted and separated from the
water surface a monolayer is transferred to the substrate while (theoretically) maintaining the molecular order.

Figure C2.4.5. Horizontal transfer on a hydrophobic substrate. This method is useful for very rigid films that are in
the 'solid state' in the Π-A diagram.

(F) SUBSTRATES 

Monolayers can be transferred onto many different substrates. Most LB depositions have been performed onto
hydrophilic substrates, where monolayers are transferred when pulling the substrate out from the subphase.
Transparent hydrophilic substrates such as glass [18, 19] or quartz [20] allow spectra to be recorded in transmission
mode. Examples of other hydrophilic substrates are aluminium [21, 22, 23 and 24], chromium [9, 25] or tin [26], all
in their oxidized state. The substrate most often used today is the silicon wafer. Gold does not establish an oxide
layer and is therefore used chiefly for reflection studies. Also used are silver [27], gallium arsenide [27, 28] or
cadmium telluride wafers [28] following special treatment.

C2.4.2 EXAMPLES OF LB FILMS 

(A) FATTY ACIDS 

LB films of fatty acids are still studied today, particularly by those researchers who are interested in the basic
physics of the subject. The literature on this topic is very extensive. The study of such films by means of x-ray
diffraction revealed that the order of LB films in the direction normal to the substrate and of the lattice planes is
extremely good. This fact was initially responsible for the widespread enthusiasm for the study of LB films.
Structural aspects of such films are discussed in detail in the review article by Schwartz [29]. A number of authors
have found that long-chain fatty acid (and fatty acid salt) LB monolayers deposited on a variety of hydrophilic
substrates and under a variety of conditions adopt a hexagonal packing with the chain normal to the surface
(untilted) on average [29] and short-range translational order. However, there are exceptions, such as calcium
arachidate (CaA2), which is tilted by 20-30° from the normal on Si oxide [29, 30 and 31]. LB monolayers of
phospholipids also often display hexagonal packing of the chains [29]. However, in most cases multilayers display
different packing from that of monolayers [29]. They are typically packed in a crystalline rectangular or triclinic
lattice and in most cases the transition from monolayer to bulk structure is abrupt [29]. Techniques that can give
direct information on in-plane order in LB films are electron diffraction and polarized-light optical microscopy
[28, 29].

In most cases, once a two-dimensional structure has been established, it will propagate through the film as further 
layers are deposited [32]. It has been suggested that disclination structures that can be found in the resulting films 
exist already at the water-air interface [33] and are transferred to the substrate in the initial layer. It has also been 
proposed that annealing of the film at the water-air interface can greatly reduce the density of disclinations so that 
subsequent multilayers will also contain a low disclination density [34]. It is supposed that the regions immediately 
associated with the disclinations are responsible for electron conduction through the film. 

Electron tunnelling through monolayers of long-chain carboxylic acids is one aspect of interest since it was
assumed that such films could be used as gate electrodes in field-effect transistors or even in devices depending on
electron tunnelling [24, 26, 35, 36, 37 and 38]. It was found, however, that the whole subject depends critically on
the materials involved, especially on the metal used as substrate and electrode. It seems that conduction through
monolayers can be best understood by conduction through defects [39, 40, 41 and 42]. In addition to the defects
due to disclinations on a small scale, there will be fluctuations in the distances between molecules on a somewhat
larger scale.

Apart from fatty acids, straight-chain molecules containing other hydrophilic end groups have been employed in 
numerous studies. In order to stabilize LB films chemical entities such as the alcohol group and the methyl ester 
group have been introduced, both of which are less hydrophilic than carboxylic acids and are largely unaffected by 
the pH of the subphase. 

New factors for the establishment of multilayer structures are, for example, the replacement of the hydrocarbon 
chain by a perfluorinated chain and the use of a subphase containing multivalent ions [29]. The latter can become 
incorporated into an LB film during deposition. The amount depends on the pH of the subphase and the individual 
ion. The replacement of the hydrocarbon by a rodlike fluorocarbon chain is one way to increase van der Waals' 
interaction and therefore enhance order and stability in molecular assemblies [43]. 

Remarkably, such fluorocarbon monolayers show higher friction than their hydrocarbon counterparts [44],
although fluorocarbons are known to have the lowest surface free energy of all organic materials.

Mechanical stability. The LB technique can be used to force the ordering of long-chain molecules. They are forced
to order on a liquid subphase and are subsequently transferred to a solid substrate. Although order in these
materials can be high, achieving this is a potentially non-equilibrium and difficult procedure. The
monolayer-substrate bond is often weak (either van der Waals or hydrogen-bond interactions); as a result, the
assemblies are not very mechanically or thermally stable [45, 46]. Stable and high-surface-free-energy monolayers are
difficult to prepare with this technique. The mechanical stability of a variety of systems, mainly arachidic ones, has
been investigated with the atomic force microscope. Details and references can be found in the review article by
Schwartz [29].

Thermal stability. For applications of LB films, temperature stability is an important parameter. Different
techniques have been employed to study this property for mono- and multilayers of arachidate LB films. In general,
an increase in temperature is connected with a conformational disorder in the films and above 390 K the order
present in the films seems to vanish completely [45, 46 and 47]. However, a comprehensive picture for
order-disorder transitions in mono- and multilayer systems cannot be given. Nevertheless, some general properties are
found in all systems [47]. Gauche conformations mostly reside at the ends of the chains at room temperature, but
are also present inside the films at higher temperatures. The energy connected with a gauche conformation in a
densely packed film of calcium arachidate (CaA2) was found to be more than an order of magnitude higher compared to
the liquid state [48].

(B) RODLIKE STRUCTURES WITH CYCLIC GROUPS 

In addition to the rodlike molecules described above, amphiphiles that consist of a rodlike structure containing a
cyclic group near its centre have also been studied. Of great interest here are materials containing both the
azobenzene and the stilbene structure (figure C2.4.6) [49]. Interestingly, these are extended conjugated structures
involving two rings and are thus constrained to remain approximately in the same plane (trans configuration).
There are, however, cis conformations for both molecules. The reversible change between the two conformations
on irradiation is connected with different absorption spectra. This so-called photochromic effect is interesting
because of potential applications, for example, in recording media. Materials containing modifications of these
groups have been employed in devices intended to generate optical second harmonics. Studies of amphiphilic
materials containing these groups are described, for example, in [50, 51, 52, 53 and 54].

Figure C2.4.6. Azobenzene structure modified with hydrocarbon chains [53, 54].




Successful deposition of compounds in which an 18-carbon chain is attached to one end of an azobenzene group and
various different hydrophilic groups are attached to the other end has been reported in X and Z mode [52], and
piezo- and pyroelectric effects were demonstrated.

Many workers have used the cis-to-trans structural change referred to above and brought about by UV irradiation 
to change some physical parameter of the LB films formed from azobenzene derivatives [55, 56, 52, 58, 59 and 


(C) PORPHYRINS AND PHTHALOCYANINES 


The porphyrin is one of the most important among biomolecules. It is involved in the fundamental processes of life, 
such as oxygen transfer and storage, electron transfer, and synthesis of amino acids [61]. Phthalocyanine is a very 
stable, planar, synthetic aromatic macrocycle. The basic porphyrin and phthalocyanine structures are shown in 
figure C2.4.7 . In the case of porphyrins the positions bearing numbers are capable of having various groups 
attached to them, though many of these positions are usually occupied by hydrogen. In the case of phthalocyanine, 
groups can be attached to the periphery of the benzene rings. Both molecules have a planar structure and a number 
of relatively low-lying excited states. Due to the latter property, it seems likely that interesting devices could be 
made from films in which derivatives of these materials are incorporated, which has led to great interest in these 
compounds. In addition, the structures are extremely stable. Porphyrins, for example, can survive the fractionation 
process applied to petroleum and the phthalocyanine group is stable to 400°C [62]. Both materials can be 
complexed with divalent metals, which reside at the centre of the ring. 


LB films of porphyrin and phthalocyanine derivatives can be made in different ways. 

Molecules that have been rendered amphiphilic are deposited in the Y mode so that the planes of the molecules are 
nearly vertical with respect to the film plane. In the mid-1980s it was shown that certain derivatives of porphyrin 
can be used to obtain good Y layers [63, 64 and 65]. For another derivative, electron diffraction studies led to the 
conclusion that the films consist of crystallites formed from tilted molecules [66]. It seems likely that all these 
materials rearrange after deposition to form many small crystallites, which do not have crystal planes 
corresponding to the original LB planar structure. 

In some cases it has been possible to prepare LB films from porphyrins and phthalocyanines possessing fourfold 
symmetry. However, it is not entirely clear how such materials can be deposited by the LB technique. A number of 
studies deal with LB films made of non-amphiphilic phthalocyanines. However, it is very difficult to form LB 
films from symmetric porphyrins [67]. True LB deposition of such compounds leads to an edge-on structure, 
whereas related techniques can lead to a structure in which the molecular planes lie parallel to the substrate [68, 69, 
70, 71, 72, 73, 74, 75, 76, 77, 78 and 79]. 

Another possibility is that the ring structure may have long hydrocarbon chains attached at the corners so that they 
stand up at one side. These chains provide the hydrophobic component and the polarizable ring structure provides 
the hydrophilic moiety. There are studies with porphyrins bearing four long hydrocarbon chains and whose 
hydrophilic moieties are associated with the ring structure [80, 81]. However, these materials did not lead to the 
formation of ordered multilayers. The same general principle applied to phthalocyanines led to stable films at the 
water-air interface and could produce multilayers by the LB technique [82, 83]. 

Figure C2.4.7. Porphyrin (a) and phthalocyanine (b) structures. 


In summary, a vast number of materials have been used to form LB films. However, in the majority of cases an 
effort to characterize the film structure or even to show that a regular layer structure has been achieved is lacking. 
Work on the structure of films of disc-like molecules such as porphyrins and phthalocyanines is especially limited. 
Some references can be found in [29]. 

(D) MORE COMPLEX STRUCTURES (POLYMERS) 

The generally low chemical, mechanical and thermal stability of LB films hinders their use in a wide range of 
applications. Two approaches have been studied to solve this problem. One is to spread a polymerizable monomer 
on the subphase and to polymerize it either before or following transfer to the substrate. The second is to employ 
preformed polymers containing hydrophilic and hydrophobic groups. 

LB films made of more complex structures such as polymers can be divided into different classes [84]. 

(E) POST-FORMED POLYMERS FROM MONOMERS CONTAINING ONE OR MORE DOUBLE BONDS 

These are systems in which multilayer structures are formed from molecules containing one or more double bonds 
and in which polymerization is subsequently initiated by appropriate means such as electron beam or UV light 
exposure. 

A study of polymerization after deposition is of interest to polymer chemists since it is possible to arrange the 
monomers so that the polymerizable groups are adjacent to one another and to monitor changes in film structure 
arising from polymerization. The system under investigation is thus under control to a large degree. The first work 
published on this topic concerned multilayers of vinyl stearate and their subsequent polymerization by γ-rays [85]. Under 
appropriate conditions a high level of polymerization was observed. 

Polymerization of compounds performed with UV light was first reported in the 1970s [86] and was followed by 
further studies [87, 88 and 89]. Another study was concerned with the deposition and polymerization of multilayers 
of alcohols and acids incorporating the diene group, -CH=CH-CH=CH-, at the hydrophilic end of the molecule 
[90]. 

Other investigations dealt with straight-chain molecules (ω-tricosenoic acid) in which the penultimate and final 
carbon atoms at the hydrophobic end are connected by a double bond [91, 92]. When irradiated by UV light, this 
material does not polymerize as rapidly as those described above, but it is readily polymerized when 
bombarded with an electron beam. It was thus thought to be an optimal material for the fabrication of electron 
beam resists. 

The deposition and photo-polymerization of relatively complex amphiphilic compounds having two hydrophobic 
chains attached to a single hydrophilic headgroup have also been studied [93, 94]. This work is directed towards 
materials for forming stable bio-compatible coatings for artificial organs, the most important outcome being the 
formation of stable multilayers having a hydrophilic outer surface. 

(F) DIACETYLENES 

Similar systems to those mentioned above exist where the constituent monomer contains the diacetylene group. 

A summary of the studies performed on symmetrical compounds having a diacetylene group at the centre is given 
in [94]. Most of the materials studied in the context of LB films have been diynoic acids (figure C2.4.8). 

Figure C2.4.8. Diacetylene structure employed to prepare polymeric LB films (a) and principle of diacetylene 
polymerization (b). 

If these materials are deposited as LB multilayers, polymerization can be induced either by thermal or optical 
means. This subject has been intensively studied [95, 96, 97, 98 and 99]. Since parameters such as m, subphase 
components, pH and polymerization before and after dipping, as well as temperature and wavelength employed for 
polymerization can be varied, the literature on diacetylenes is extensive and the reader is referred, for example, to 
the book by Tredgold [100]. 

(G) PREFORMED POLYMERS 

Mono- and multilayers may be formed by the LB technique from polymers bearing both hydrophilic and 
hydrophobic side groups that are already spread as a polymer at the water-air interface. 

Unwanted structures in the film plane — often found within LB films formed from simple rodlike molecules or from 
molecules polymerized after deposition — can be problematic, since many possible applications of such films 
require a uniform structure within the plane. On the other hand, the production of a system in which the 
structure within the plane is so disordered that there exist no structural features large enough to cause problems 
would also render applications possible. In three-dimensional materials, for example, both inorganic glasses and 
many polymers are capable of transmitting light without any appreciable scattering for substantial distances. 

Studies of the waveguiding of light in multilayers of certain polymers showed that it is possible to propagate light 
with an attenuation that is still large compared to many other materials but small compared to other LB materials 
[101]. 

Another approach to the fabrication of LB films from preformed polymers is to form a hydrophobic main chain by 
reacting monomers terminated by a vinyl group [102, 103, 104, 105 and 106]. The side groups studied also 
included perfluorinated hydrocarbon chains, which tilt with respect to the normal to the plane of the film, whereas 
the analogous ordinary hydrocarbon chains do not [105]. 

Other polymers, such as polymethacrylates, have been studied, as well as esters of naturally occurring 
polysaccharides. References can be found in the literature cited in the list of further reading. 

(H) RIGID-ROD POLYMERS 

Finally, rigid-rod polymers can be deposited on a solid substrate by the LB technique. These materials have both 
hydrophilic and hydrophobic characteristics, and are capable of residing with the rod axis horizontal at the 
water-air interface. 

The polymers described so far have relatively flexible main chains which can result in complex conformations. In 
some cases, they can double back and cross over themselves. Polymers have also been investigated which are 
constrained to remain in a conformation corresponding, at least approximately, to a straight line, but which have 
amphiphilic properties that ensure that this line is parallel to the water surface. Chiral molecules are one example 
and many polypeptides fall into this class [107]. Another example is cofacial phthalocyanine polymers (figure 
C2.4.9). 



Figure C2.4.9. Part of a bridge-stacked polyphthalocyanine. 

The species at the centre of the rings is usually Si or Ge and the bridging atom is oxygen. In one study the 
peripheral hydrogens on the phthalocyanine molecules were replaced by alkyl groups and the resulting polymers 
could be rendered soluble in ordinary organic solvents [108, 109 and 110]. Successful deposition of several of these 
materials has been achieved and different techniques were employed to study their structural properties [109, 111, 
112, 113 and 114]. 

The variety of molecules used to prepare LB films is enormous, and only a small selection of examples can be 
presented here. Liquid crystals and biomolecules such as phospholipids, for example, can also be used to prepare 
LB films. The reader is referred to the literature for information about individual species. 

C2.4.2 SELF-ASSEMBLED MONOLAYERS (SAMS) 

Self-assembled monolayers (SAMs) are molecular layers that form spontaneously upon adsorption by immersing a 
substrate into a dilute solution of the surface-active material in an organic solvent [115]. This is probably the most 
comprehensive definition and includes compounds that adsorb spontaneously but are neither specifically bonded to 
the substrate nor have intermolecular interactions which force the molecules to organize themselves in the sense 
that a defined orientation is adopted. Some polymers, for example, belong to this class. They might be attached to 
the substrate via weak van der Waals' interactions only. 

Most often the term SA is used in connection with compounds that attach strongly to the substrate and/or have 
significant intermolecular interactions. Order, orientation and stability in SA systems depend crucially on the 
compound involved. For establishing a lateral translational order the anchoring to the substrate and/or the 
intermolecular interactions are important. A highly ordered substrate, for example, may induce a high translational 
order if there is strong coupling between headgroups and substrate. This is, for example, the case for the alkyl 
thiol-gold system (see below). 


The so-called self-assembly technique has its origin in 1946, when a paper was published by Bigelow et al [116] 
and thus is slightly younger than the LB technique. The authors noted that a hydrophilic surface exposed to an 
amphiphilic compound dissolved in a non-polar solvent induces the amphiphilic material to form a monolayer on it. 


This idea was later extended to form multilayers by synthesizing a material with a hydrophilic group at one end and 
a further hydrophilic group, masked by a hydrophobic blocking group, at the other. After deposition on a 
hydrophilic surface by the technique introduced by Bigelow et al, the blocking groups are removed by a chemical 
reaction, revealing a further hydrophilic surface. Then the whole process can be repeated, which, in principle, 
should be capable of providing a simple way to produce ordered multilayers. In practice, however, there are many 
difficulties. 

In the 1980s the study of SAMs was sparked by the use of octadecyltrichlorosilane (OTS) for film formation on 
silica surfaces [117] and by the investigation of dialkyldisulphide layers on gold [118], prepared by immersion of 
the substrates in dilute solutions of the surfactants. Higher-quality films on gold were obtained by the adsorption 
of structurally analogous alkyl thiols [119, 120]. These experiments initiated an avalanche of investigations with a 
plethora of SA systems, among which thiols on gold is the most intensively investigated combination to date. 

C2.4.2.1 SA MOLECULES 

Not all molecules are suited for establishing SAMs. The majority of cases studied have involved assembly of 
alkyl-chain-based entities. The molecules of self-organizing chemical compounds all have a similar structure. The 
spontaneous nature of film formation is due to the interaction energies of the monolayers. These can be considered 
in terms of three main components (figure C2.4.10) [121], which cooperatively establish stability, order and 
orientation in the monolayer. 

Figure C2.4.10. Schematic diagram of a self-assembling molecule. 

The first part is the headgroup, which is responsible for the bonding to the substrate surface, which may be by 
chemisorption or physisorption. 

In the case of chemisorption this is the most exothermic process and the strong molecule-substrate interaction 
results in an anchoring of the headgroup at a certain surface site via a chemical bond. This bond can be covalent, 
covalent with a polar part or purely ionic. As a result of the exothermic interaction between the headgroup and the 
substrate, the molecules try to occupy each available surface site. Molecules that are already at the surface are 
pushed together during this process. Therefore, even for chemisorbed species, a certain surface mobility has to be 
anticipated before the molecules finally anchor. Otherwise the evolution of ordered structures could not be 
explained. 

The spontaneous adsorption brings the molecules close enough together that intermolecular interactions become 
important; these, in the case of alkyl-chain-based molecules, consist of the short-range van der Waals' 
interaction. The chains constitute the second part of alkane-based self-organizing molecules. If the molecules are 
densely packed in the final layer they are forced to stretch and are in a nearly all-trans configuration for sufficiently 
long chains at room temperature [122]. 

The self-organizing process of the amphiphilic alkane chains would not be possible solely due either to the 
interaction between the chains or the bonding of the molecules to the substrate. In fact it is a cooperative process of 
both factors and there are limiting cases where one or the other dominates. In the case of alkanethiols the first and, 
as mentioned earlier, the most important process, is chemisorption. The establishment of a well ordered and 
densely packed layer is only possible following the anchoring of molecules at the surface sites. Van der Waals' 
forces are the most important intermolecular interactions in the case of simple alkane chains. By substitution of the 
methylene groups in the chain by larger polar groups, long-range electrostatic interactions can also play a 
significant role and can become energetically more important than the short-range van der Waals' interactions. 

If the coupling to the substrate is weak (physisorption), as is the case for alkylsiloxanes on a SiOx surface in the 
presence of a water layer, for example, the packing may also be mainly driven by intermolecular forces. Stability in 
this system is provided by crosslinking between the molecules (see below). 

The third part is the interaction between the terminal functionality, which in the case of simple alkane chains is a 
methyl group (-CH3), and the ambient. These surface groups are disordered at room temperature as was 
experimentally shown by helium atom diffraction and infrared studies in the case of methyl-terminated monolayers 
[122]. The energy connected with this conformational disorder is of the order of a few kT. 
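
To give a feeling for this energy scale, the thermal energy kT can be evaluated directly. The short Python sketch below is an illustration only: the function name and the choice of 298 K as "room temperature" are assumptions made here and are not taken from the text.

```python
# Thermal energy kT expressed in several common units.
# The constants are the exact SI (2019) defining values.
K_B = 1.380649e-23          # Boltzmann constant, J/K
N_A = 6.02214076e23         # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19  # elementary charge, C


def thermal_energy(temperature_k):
    """Return kT at the given temperature in J, kJ/mol and meV."""
    kt_joule = K_B * temperature_k
    return {
        "J": kt_joule,
        "kJ/mol": kt_joule * N_A / 1000.0,
        "meV": kt_joule / E_CHARGE * 1000.0,
    }


if __name__ == "__main__":
    # 298 K is an assumed value for room temperature.
    for unit, value in thermal_energy(298.0).items():
        print(f"kT = {value:.4g} {unit}")
```

At 298 K this gives roughly 2.5 kJ/mol (about 26 meV), so a disorder energy of a few kT corresponds to several kJ/mol.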

This third part can be substituted by a functional group, a small fragment or even a polymer, where alkanethiols are 
only used to attach the whole compound to the surface. This potential makes compounds modified with SA 
molecules attractive in a whole variety of areas and technologies. 

C2.4.2.2 PREPARATION OF SAMS 

In contrast to the preparation of LB films, that of SAMs is fairly simple and no special equipment is required. The 
inorganic substrate is simply immersed into a dilute solution of the surface-active material in an organic solvent 
(typically in the mM range) and removed after an extended period (~24 h). Subsequently, the sample is rinsed 
extensively with the solvent to remove any excess material (wet chemical preparation). 
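
A rough estimate shows why a dilute solution is sufficient and why the excess material must be rinsed off. The Python sketch below compares the number of molecules in a small volume of solution with the number needed for one monolayer. All numerical inputs are assumptions chosen for illustration: 1 mL of a 1 mM solution, a substrate area of 1 cm², and an area per molecule of 21.4 Å² (the value quoted in this section for alkanethiols on gold).

```python
# Compare the number of molecules offered by the solution with the number
# needed to cover the substrate with a single close-packed monolayer.
AVOGADRO = 6.02214076e23  # 1/mol
A2_PER_CM2 = 1.0e16       # square angstroms in one square centimetre


def molecules_in_solution(conc_mol_per_l, volume_ml):
    """Number of solute molecules in volume_ml of solution."""
    return conc_mol_per_l * (volume_ml * 1.0e-3) * AVOGADRO


def molecules_in_monolayer(area_cm2, area_per_molecule_a2):
    """Number of molecules in one monolayer covering area_cm2."""
    return area_cm2 * A2_PER_CM2 / area_per_molecule_a2


def excess_ratio(conc_mol_per_l, volume_ml, area_cm2, area_per_molecule_a2):
    """How many monolayers' worth of molecules the solution contains."""
    return (molecules_in_solution(conc_mol_per_l, volume_ml)
            / molecules_in_monolayer(area_cm2, area_per_molecule_a2))


if __name__ == "__main__":
    # Assumed illustrative values: 1 mM, 1 mL, 1 cm^2, 21.4 A^2 per molecule.
    ratio = excess_ratio(1.0e-3, 1.0, 1.0, 21.4)
    print(f"The solution holds about {ratio:.0f} monolayer equivalents")
```

Under these assumptions even one millilitre of millimolar solution contains more than a thousand times the material needed for a monolayer, so depletion of the solution is negligible during adsorption.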

Preparation of films for sufficiently volatile molecules can also be performed by evaporating the molecules in 
vacuum (gas-phase deposition) or by the use of a desiccator which contains the substrate and the dilute solution in a 
vessel separately and which is evacuated to 0.1 mbar and kept under vacuum for several hours (~24 h). This also 
results in a vapour-phase-like deposition of the molecules onto the substrates. 

C2.4.2.3 EXAMPLES OF SAMS 

A plethora of different SA systems have been reported in the literature. Examples include organosilanes on 
hydroxylated surfaces, alkanethiols on gold, silver, copper and platinum, dialkyl disulphides on gold, alcohols and 
amines on platinum and carboxylic acids on aluminium oxide and silver. Some examples and references can be 
found in [123]. More recently, phosphonic and phosphoric esters on aluminium oxides have also been reported 
[124, 125]. Only a small selection out of this number of SA systems can be presented here and properties such as 
kinetics, thermal, chemical and mechanical stability are briefly presented for alkanethiols on gold as an example. 

The molecules for SA monolayers are chosen or synthesized according to the substrate that should be coated. 
Thiol-terminated entities have been mostly used in connection with metal surfaces, but also on GaAs [ 126 ]. 
Chloro- and acid-terminated molecules are most often employed on oxide surfaces of metals or semiconductors. 
However, they have also occasionally been used with metal surfaces [ 127 ]. 


(A) MONOLAYERS FORMED FROM ACIDS 

The first work published in this area was that of Bigelow mentioned above [116]. In 1957, monolayers of 
long-chain fatty acids were formed on thin films of silver, copper, iron and cadmium deposited on glass microscope 
slides [43]. 

Finally, in 1985, the results of an extensive investigation in which adsorption took place onto an aluminium oxide 
layer formed on a film of aluminium deposited in vacuo onto a silicon wafer were published by Allara and Nuzzo 
[127, 128]. Various carboxylic acids were dissolved in high-purity hexadecane and allowed to adsorb from this 
solution onto the prepared aluminium oxide surface. It was found that chains with more than 12 carbon atoms 
are oriented nearly vertically and are tightly packed. For shorter chains, however, no stable monolayers 
were found. The kinetic processes involved in layer formation can take up to several days. 


More recently, alternative chemistries have been employed to coat oxide surfaces with SAMs. These have included 
carboxylic [129, 130], hydroxamic [131], phosphonic [124, 132] and phosphoric acids [133]. Potential applications 
of SAMs on oxide surfaces range from protective coatings and adhesive layers to biosensors. 

(B) SILANES 

Organosilanes, such as trichlorosilanes or trimethylsilanes, can establish SA monolayers on hydroxylated surfaces. 
Apart from their (covalent) binding to the surface these molecules can also establish a covalent intermolecular 
network, resulting in an enhanced mechanical stability of the films (figure C2.4.11). In 1980, work was published 
on the formation of SAMs of octadecyltrichlorosilane (OTS) [117]. Subsequently, the use of this material was 
extended to the formation of multilayers [134]. 

Figure C2.4.11. The formation of SAMs from OTS on a silicon oxide substrate. 

Although it has been claimed in the literature that well defined and organized films have been achieved, there is 
still some debate about the quality of these systems. It has been suggested that the assembly mechanism of 
chlorosilanes on oxidized surfaces depends very crucially on the processing conditions involved. The assembling 
process shows an interesting dependence on pre-hydration and temperature [135], suggesting that water plays a 
central role. The stability in the films seems to be mostly due to the intermolecular network and not to the bonding 
to the substrate. This is supported by the observation that organized films are established on amorphous substrates. 

Apart from these simple silanes, derivatives with aromatic groups at different places in the chain have also been 
investigated [136, 137]. It was found that the average tilt angle of these molecules depends on the specific 
functional entities contained in the chains. It is likely that, apart from packing considerations (important for bulky 
groups, for example), other factors also influence the resulting tilt. 

(C) THIOLS 

In 1983, it was shown that certain sulphur compounds have a strong affinity to gold and bind strongly to a gold 
surface [118]. This led to considerable interest and activity in the study of SAMs of sulphur compounds on gold 
and today alkyl thiols on gold are the most extensively studied SA system. Although ethanol has been most 
frequently used as a solvent, other liquids have also been employed [122]. Polycrystalline gold substrates 
prepared by thermal evaporation predominantly display the (111) face [138, 139, 140, 141 and 142], which has the 
lowest surface energy [138]. Most of the work cited in the following has been performed on such surfaces. 

Monolayers of alkanethiols adsorbed on gold, prepared by immersing the substrate into solution, have been 
characterized by a large number of different surface analytical techniques. The lateral order in such layers has been 
investigated using electron [143], helium [144, 145] and x-ray [146, 147] diffraction, as well as with scanning 
probe microscopies [122, 148]. Information about the orientation of the alkyl chains has been obtained by 
ellipsometry [149], infrared (IR) spectroscopy [150, 151] and NEXAFS [152]. 

The systematic study of alkanethiols (CH3(CH2)n-SH) with different chain lengths revealed that for n = 11 or 
greater closely packed layers are obtained with a tilt angle of the alkyl chains between 30 and 35° from the surface 
normal and with an area per molecule of approximately 21.4 Å² [122]. For n < 11, there is a gradual deterioration 
in order with decreasing length. It is generally accepted that adsorption takes place by elimination of the terminal 
hydrogen and that the thiol is bound to the gold by a true valence bond [122]. In contrast to chlorosilanes on 
oxidized substrates, alkanethiols establish a superlattice on gold and much of their stability is due to the anchoring 
of the molecules to the substrate. 
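
The relation between the area per molecule and the tilt angle can be checked with a short geometric calculation. The Python sketch below makes two assumptions that are not stated in the text: that the sulphur superlattice is the commonly described (√3 × √3)R30° hexagonal arrangement on Au(111), and that the nearest-neighbour distance of gold atoms in the surface is 2.884 Å. The function names are hypothetical.

```python
import math


def sqrt3_site_area(nn_distance):
    """Area per site of a (sqrt3 x sqrt3)R30 hexagonal overlayer.

    The overlayer lattice constant is sqrt(3) times the nearest-neighbour
    distance of the substrate; a hexagonal cell of side s has area
    s * s * sin(60 degrees).
    """
    side = math.sqrt(3.0) * nn_distance
    return side * side * math.sin(math.radians(60.0))


def chain_cross_section(area_per_molecule, tilt_deg):
    """Area per chain measured perpendicular to the chain axis.

    A chain tilted by tilt_deg from the surface normal occupies the
    surface area A but only A * cos(tilt) normal to its own axis.
    """
    return area_per_molecule * math.cos(math.radians(tilt_deg))


if __name__ == "__main__":
    AU_NN = 2.884  # assumed Au-Au nearest-neighbour distance, angstrom
    print(f"site area: {sqrt3_site_area(AU_NN):.1f} A^2")
    for tilt in (30.0, 35.0):
        area = chain_cross_section(21.4, tilt)
        print(f"tilt {tilt:.0f} deg: cross-section {area:.1f} A^2")
```

Under these assumptions the site area comes out close to the quoted area per molecule, and a tilt of 30 to 35° reduces the cross-section per chain to roughly 18 Å², which is comparable to the chain packing in crystalline polyethylene. This is one simple way to rationalize why the chains tilt rather than stand upright.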

There is a controversy about the bonding state of the sulphur. Most evidence suggests that it is bound in the form of 
a thiolate [122], while x-ray diffraction suggests that the sulphur atoms may dimerize [147]. However, not all of the 
observed superstructures can be explained with this latter assumption. 

The growth and the structure of fully developed alkanethiol films on gold have been extensively investigated with 
STM [153, 154, 155, 156 and 157]. This work has confirmed that four phases exist in the final layer. There is a 
basic hexagonal lattice corresponding to the superlattice established by the sulphur atoms and three variants of a 
rectangular lattice, which was experimentally observed in the organization of the chains in long-chain (>C12) 
thiolate monolayers on Au(111). The rectangular lattices are due to different twist angles of the chains around their 
axis [148]. 

The domain size in alkanethiol films on gold depends on the concentration used during preparation and is typically 
between 10 and 50 nm [ 158 ]. Thiol monolayers on gold have long-range angular order but quite short-range radial 
order. This can be explained in terms of tightly packed tilted molecules. 

Order and dense packing are relative in the context of these systems and depend on the point of view. Usually the 
term order is used in connection with translational symmetry in molecular structures, i.e. in a two-dimensional 
monolayer with a crystal structure. Dense packing in organic layers is connected with the density of crystalline 
polyethylene. 

Self-organizing monolayers are highly disordered compared to inorganic crystals, due to the defects that are always 
present. On the other hand, when compared to polymeric glasses or liquid paraffin, they are highly ordered 
systems. Hence, the terms order and densely packed in this context do not imply the absence of defects. The degree 
of order is comparable to that of Langmuir-Blodgett layers. 


Apart from domain boundaries, some of the defects in alkanethiol monolayers (pitholes) are created by the thiol 
itself [159] by 'etching' processes. It was found that the solvent used for preparation also has some effect on the 
resulting defect density. 

In contrast to the gold surface, on silver the chains adopt a lower tilt angle of 12° from the surface normal [ 160 , 
161 ]. This is attributed to the different nature of the bonding of sulphur to silver as compared to gold and the 
slightly different packing density. The coherence length determined with He atom diffraction was found to be 12 
nm [162]. 

GaAs has been coated with thiols with a view to modifying devices [ 123 ]. For these films, S-As bonds are 
presumed to be present. An ordering of the chains for n = 18 has been reported. However, this system has generally 
been much less investigated than those involving metal substrates. 

The lubricant properties of alkanethiols and fluorinated alkanes have been studied extensively by scanning probe 
techniques [ 163 ]. In agreement with experiments on LB monolayers it was found that the fluorocarbon monolayers 
show considerably higher friction than the corresponding hydrocarbon monolayers [ 164 , 165 and 166 ] even though 
the fluorocarbons are known to have the lowest surface free energy of all organic materials. 

Kinetics of film formation. The kinetics of film formation of SAMs is important, in order to establish a recipe that 
allows films of reproducible quality to be prepared, i.e. densely packed, well ordered monolayers. Ellipsometry and 
contact-angle measurements can give information about coverage and orientation present in the films. In general it 
has been reported that film formation is a two-step process, at least for dilute solutions (~1 mM). A fast first step 
where the contact angles and film thickness are already close to their limiting values (minutes) is followed by a 
second slower step, after which the final values are reached [ 120 ]. It was also shown that the order is established 
during this second step, where the last 5-10% of molecules are incorporated into the film and force the molecules 
on the surface to stretch [167]. For different chain lengths and solvents it was found that only Langmuir kinetics 
can explain the experimental data of thiolate films on polycrystalline gold, irrespective of the experimental 
conditions [168]. 
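
For orientation, the simplest form of Langmuir kinetics can be written as dθ/dt = k_ads c (1 − θ) − k_des θ, where θ is the fractional coverage and c the solution concentration. The Python sketch below evaluates its closed-form solution. It is a minimal illustration and not the analysis used in the cited work: the rate constants are hypothetical values chosen only so that the fast step takes minutes, and a single exponential of this kind does not describe the slower ordering step.

```python
import math


def langmuir_coverage(t, k_ads, conc, k_des=0.0):
    """Fractional coverage theta(t) for simple Langmuir adsorption.

    Solves d(theta)/dt = k_ads*conc*(1 - theta) - k_des*theta with
    theta(0) = 0. Units: k_ads in L mol^-1 s^-1, conc in mol L^-1,
    k_des in s^-1, t in s.
    """
    rate = k_ads * conc + k_des
    if rate == 0.0:
        return 0.0  # nothing adsorbs and nothing desorbs
    theta_eq = k_ads * conc / rate  # equilibrium coverage
    return theta_eq * (1.0 - math.exp(-rate * t))


def time_to_coverage(theta, k_ads, conc):
    """Time needed to reach coverage theta when desorption is neglected."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must satisfy 0 <= theta < 1")
    return -math.log(1.0 - theta) / (k_ads * conc)


if __name__ == "__main__":
    K_ADS = 20.0   # hypothetical rate constant, L mol^-1 s^-1
    CONC = 1.0e-3  # 1 mM solution, as in the text
    for seconds in (10, 60, 120, 300, 600):
        theta = langmuir_coverage(seconds, K_ADS, CONC)
        print(f"t = {seconds:4d} s  coverage = {theta:.3f}")
    t90 = time_to_coverage(0.9, K_ADS, CONC)
    print(f"time to 90% coverage: {t90:.0f} s")
```

In this model the characteristic time scales as 1/(k_ads c), so diluting the solution tenfold slows the initial adsorption step by the same factor.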

Lateral structuring of SAMs — microcontact printing. Of great interest is the application-specific chemical 
structuring of ultrathin organic films, for example for use in biomedical devices. Such structuring can be 
accomplished by lithographic means, including the so-called microcontact printing technique (μCP) [169, 170 and 
171]. This is a relatively simple technique and allows lateral patterning down to the submicrometre scale. It is also 
known as soft-lithographic patterning. In short, a structured stamp made of poly(dimethylsiloxane) (PDMS) is 
inked with the dilute solution and brought into contact with the substrate (figure C2.4.12). Patterns with 
dimensions in the (sub)micrometre range can routinely be produced by this technique [172]. Although the contact 
time between stamp and surface is of the order of only 10-20 s, the resulting films of alkanethiols on gold are 
chemically indistinguishable from those prepared by immersion. Remarkably, even the order in areas that are 
prepared by μCP is comparable to that in films prepared by immersion [173]. However, with lateral force 
microscopy, regions prepared by immersion can be distinguished from those prepared by μCP [173, 174]. This is 
probably due to slightly different domain size distributions and thus different mechanical stabilities of the films on 
a nanometre scale [173]. 

Figure C2.4.12. Principle of the microcontact printing process. Chemically patterned organic films with differently 
functionalized regions can be prepared by a combination of μCP and subsequent immersion. 

Other lithographic means include micromachining [175], photopatterning [176] or electron beam patterning 
[177], which have been demonstrated on alkanethiolate/Au SAMs, alkanethiolate and organo-siloxane on Si and Ti 
and alkanethiolates on GaAs. 

Chemical stability. The chemical stability of SA films is of interest in many areas. However, there is no general 
rule for it. The chemical stability of silane films is remarkable, due to their intermolecular crosslinking. Therefore, 
they are found to be more stable than LB films. Alkyltrichlorosilane monolayers provide structures that are stable 
to chemical conditions that most LB films could not stand. However, photopolymerized LB films also show 
considerable stability in organic solvents. 

SAMs of thiolates on gold are generally resistant to strong acids or bases [175, 178 and 179], are not destroyed by 
solvents [180] and can withstand physiological environments [181, 182 and 183]. However, they also show some 
degradation if exposed to the ambient atmosphere for sufficiently extended periods [184]. 

Thermal stability. The thermal stability of SAMs is, similarly to LB films, an important parameter for potential 
applications. It was found that SA films containing alkyl chains show some stability before an increase in the 
number of gauche conformations occurs, resulting in melting and irreversible changes in the film. The disordering 
of the chain structure as temperature is increased has been studied for both LB films and SAMs. In general, an all-trans 
structure dominates at room temperature for most of the systems. When the melting point is approached, however, 
the number of gauche defects increases. For thiols with more than 16 carbon atoms it was shown experimentally 
that gauche defects are mainly concentrated near the free ends of the chains [151, 185]. Unlike bulk hydrocarbons, 
these films do not show a sharp phase transition in the temperature range between 80 and 420 K but rather a 
gradual change to a progressively more ordered state as the temperature is lowered. 


Similarly to LB films, the order of alkanethiols on gold depending on temperature has been studied with NEXAFS. 
It was observed that the barrier for a gauche conformation in a densely packed film is an order of magnitude higher 
than that of a free chain [48]. 

SAMs that are made out of structures capable of forming strong intermolecular hydrogen bonds have been studied 
especially in view of their expected high thermal and chemical stability [ 186 , 187 ]. 

A good survey of the chemical and physical film characteristics of highly organized SAMs is given in [ 123 ]. 

Mechanical stability. Chemisorption to the surface, intermolecular interactions and crosslinking between adjacent 
compounds — if possible — all contribute to the resulting stability of the monolayer film. Lateral force microscopy 
investigations revealed that the mechanical stability towards lateral forces on the nanometre scale is likely to be 
determined by the defect density and the domain size on a nano- to micrometre scale [ 163 , 173 ]. 

Experiments with chemically grafted SAMs displayed much larger wear resistance than films produced by the LB 
technique [ 188 ]. Also it was found that wear properties of SAMs can be further improved by chemically grafting 
C60 molecules onto SAM surfaces [ 189 ]. 

(D) ALKANETHIOLS WITH FUNCTIONAL ENTITIES ATTACHED 

The strong bond formed between the thiol endgroups and gold and silver surfaces allows the possibility of forming 
molecules that have a wide variety of different functional groups at the opposite end and thus of coating a noble 
metal surface with a variety of differently functionalized molecules and mixtures. 

A large number of studies concerned with thiol-terminated molecules has been directed at the preparation of 
tailored organic surfaces, since their importance has been steadily increasing in various applications. Films of 
ω-functionalized alkanethiols have facilitated fundamental studies of interfacial phenomena, such as adhesion [ 190 , 
191 ], corrosion protection [ 192 ], electrochemistry [ 193 ], wetting [ 194 ], protein adsorption [ 195 , 196 ] or molecular 
recognition [ 197 , 198 , 199 , 200 and 201 ] to mention only a few. 

Biological applications are attracting increasing attention and examples from this area are given below. An 
understanding of the mechanism of protein adsorption, the interaction of proteins with 'artificial' substrates and the 
way in which these interactions determine the biological activity of these substrates are of immense biomedical 
significance [ 202 , 203 , 204 , 205 and 206 ]. SAMs play a particularly important role here, since they can serve as 
models of polymer surfaces, allowing surface chemical properties to be investigated independent from the effects 
of surface morphology. In this way, macroscopic concepts such as hydrophobicity, hydrophilicity, wettability and 
water content, which are crucial to understanding cell adhesion and anchorage-dependent cell behaviour [ 207 , 208 , 209 , 
210 and 211 ], can be substituted by more fundamental, molecular-level concepts of surface organization, reactivity 
and structure. Efforts have been undertaken to engineer gradients of surface hydrophobicity/hydrophilicity on 
polymeric surfaces [ 212 , 213 , 214 and 215 ] and, more recently, on SAMs prepared from thiols [ 216 ]. 

Hydroxylated surfaces are of particular interest, due to the possibility of derivatizing the -OH groups with 
biologically active moieties. The spatial arrangement and density of -OH groups within a monolayer matrix are 
relevant, since they may regulate the accessibility of a specific functional group to biomolecules. One approach to 
controlling these properties uses mixed chain length CH3- and OH-terminated alkanethiolate SAMs [ 216 , 217 , 218 , 
219 and 220 ]. Another approach is a thiol-terminated hexasaccharide with acetyloxy groups that can be replaced by 
hydroxyl groups, and which was also adsorbed on gold [ 221 ] (figure C2.4.13). It was found that deprotection and 
thus the degree of hydrophilicity can be tailored before as well as following adsorption on the surface. This allows 
the OH-group density to be adjusted to a high degree and might have advantages over the mixed monolayers, since 
it can be better controlled. 

Figure C2.4.13. Thiol-terminated hexasaccharide, where acetoxy groups can be replaced by hydroxyl groups, both 
before and following adsorption [ 221 ]. 

In another study, coadsorption of simple n-alkanethiols, which acted as a scaffolding, and a synthetic receptor was 
studied on gold [ 222 ]. The design of the system mimics those of receptors bound to lipid membranes. 

Poly(ethylene glycol) is often employed to render surfaces protein resistant. Oligo(ethylene glycol)- (OEG-) 
terminated alkanethiols were adsorbed on Ag and Au and were studied concerning their properties towards protein 
adsorption [ 223 , 224 ]. Interestingly, the slight difference in the packing density and chain tilt of the alkane chains 
on gold and silver leads to a completely different behaviour of the OEG-terminated thiols concerning protein 
adsorption [ 223 ]. While the SAMs on gold are protein resistant, those on silver adsorb a certain amount of 
fibrinogen. 

Regarding protein adsorption properties, differently terminated SAMs on gold have also been investigated [ 225 ]. It 
was found that the nature of the adsorbate chain structure was the most important parameter for the observed 
behaviour towards protein and cell adsorption. 

Covalent immobilization of proteins on microstructured gold surfaces was studied in [ 226 ]. On these substrates, 
which were prepared by μCP and etching, the immobilization sites of proteins could be spatially controlled using 
an amino-reactive SAM. The whole process, i.e. production of the micropatterned substrate including SAM 
exchange and protein immobilization, took a reasonably small amount of time (~24 h), providing some flexibility 
in the experimental work. 

(E) THIOL-MODIFIED POLYMERS 

In addition to simple alkanethiols or those functionalized with small groups or compounds, thiols can also be used 
to attach polymers to metal surfaces. However, there are specific problems, since the accessibility of the surface for 
the thiol groups of the modified polymers, for example, is often strongly restricted and a significant entropy change 
may be connected with the adsorption, depending on the polymer, the surface and the solvent. 

The study of ultrathin polymer layers on metals is relevant in understanding the behaviour of polymers on surfaces, 
as well as in the areas of adhesion and corrosion. Gold and copper surfaces can be covered with monolayers of 
polymers by adsorption from solution [227, 228 , 229, 230 , 231 , 232 , 233 , 234 and 235]. 

One example of a polymer layer on gold consists of adsorbed thiol-terminated poly(styrene)s of different molecular 
weights [ 234 ]. Poly(styrene) itself does not adsorb significantly on gold from tetrahydrofuran (THF) [ 232 ] or 
toluene [ 236 ]. As the average molecular weight (M_n) increases up to ~100 000, an increasing amount of the 
thiol-terminated polymer adsorbs on the surface [ 234 ]. The adsorbed mass remains constant up to M_n of ~200 000 or 
more, corresponding to a layer thickness of ~3 nm [ 234 ]. However, thiol-terminated poly(styrene) with M_n = 500 000 
does not adsorb at all [ 234 ]. It seems that the sulphur-gold interaction is no longer sufficient to overcome the 
entropy loss and loss of polymer-solvent interactions, which would accompany adsorption of very 
high-molecular-weight chains [ 234 ]. 


Structure, morphology and friction of thiol-terminated poly(styrene) have also been studied with atomic force 
microscopy [237, 238 and 239 ]. 

No adsorption of a block copolymer with a styrene/propylene sulphide molar ratio of 3:1 from THF was found 
when the propylene sulphide blocks were capped with ethyl groups [ 227 ]. Thiol-capped styrene-propylene 
sulphide block copolymers (M_n = 60 000), in contrast, adsorb on gold from THF [ 234 ]. The layer thickness of 
styrene-propylene sulphide block copolymers decreases with increasing propylene sulphide block size, since the 
propylene sulphide block interacts with and, as a result, adheres to the surface [ 234 ]. A strong interaction of the 
thiol endgroup seems, however, to be necessary for any further segmental adsorption of the polymer chain [ 227 ]. 
The styrene block most likely escapes from the gold surface, at least partially, and this could explain the observed 
changes in layer thickness. Analogously, poly(styrene) that is terminated with six or seven ethylene sulphide units 
with a thiol end group also adsorbs from toluene onto gold [235]. 

Poly(methyl methacrylate) (PMMA) with sulphide-modified side chains adsorbs onto gold from methyl ethyl 
ketone and dichloromethane (there is little influence of the solvents on the layer formation) [ 229 ]. Two polymers 
with one sulphide group per 10 and 100 monomer units were examined [ 229 ]. The thickness of the adsorbed layers 
was ~1.5-3 nm, depending on the concentration of the polymer in the solution and, to a degree, on the sulphur 
content [ 229 ]. Angle-dependent XPS measurements indicate that the sulphide groups are not concentrated at the 
interface but dispersed relatively uniformly throughout the film [ 229 ]. Unmodified poly(methyl methacrylate) also 
adsorbs under the same conditions, but the thickness of the corresponding layer is below 1 nm [ 229 ]. Also, a 
poly(benzylglutamate) with a disulphide endgroup yields thicker layers on gold than the corresponding non-modified 
poly(benzylglutamate) [ 228 ]. Moreover, poly(acrylate)s with disulphide moieties in the side chains have been 
studied. In this case it was found that the disulphide groups promote the adsorption [ 230 , 232 ]. 

The examples described above are only a small selection out of a tremendous number of investigations of LB films 
and SAMs. This number is still increasing and it is expected that ultrathin organic films will play a central role in 
both fundamental and applied sciences in the future. 

C2.5 Introducing protein folding using simple 
models 


D Thirumalai and D K Klimov 


C2.5.1 INTRODUCTION 

Most reactions in cells are carried out by enzymes [1]. In many instances the rates of enzyme-catalysed reactions 
are enhanced by a factor of a million. A significantly large fraction of all known enzymes are proteins which are 
made from twenty naturally occurring amino acids. The amino acids are linked by peptide bonds to form 
polypeptide chains. The primary sequence of a protein specifies the linear order in which the amino acids are 
linked. To carry out the catalytic activity the linear sequence has to fold to a well defined three-dimensional (3D) 
structure. In cells only a relatively small fraction of proteins require assistance from chaperones (helper proteins) 
[2]. Even in the complicated cellular environment most proteins fold spontaneously upon synthesis. The 
determination of the 3D folded structure from the one-dimensional primary sequence is the most popular protein 
folding problem. 

For a number of years the protein folding problem remained only of academic interest. The synthesis of proteins in 
cells was described by Crick in [1]. Schematically this process can be represented as DNA → RNA → Proteins. 
This proposal and Anfinsen's [3] demonstration that a denatured protein would fold to the native conformation 
under suitable conditions were sufficient to understand the role of protein folding in cells. However, in the last 
couple of decades many diseases have been directly linked to protein folding (especially misfolding) [4]. Thus, 
there is an urgency to understand the mechanisms in the formation of folded structures. The biotechnology industry 
is also interested in the problem because of the hope that by understanding the way polypeptide chains fold one can 
design molecules (using natural or synthetic constituents) of use in medicine. Finally, the full potential of the 
human genome project involves understanding what the genes encode. For all these profoundly important reasons 
the protein folding problem has taken centre stage in molecular biology. 

Because this problem is complex several avenues of attack have been devised in the last fifteen years. A 
combination of experimental developments (protein engineering, advances in x-ray and nuclear magnetic resonance 
(NMR), various time-resolved spectroscopies, single molecule manipulation methods) and theoretical approaches 
(use of statistical mechanics, different computational strategies, use of simple models) [5, 6 and 7] has led to a 
greater understanding of how polypeptide chains reach the native conformation. 


From our perspective there are four major problems that comprise the protein folding enterprise. They are: 

(a) Prediction of the 3D fold of a protein given only the amino acid sequence. This is referred to as the structure 
prediction problem. In figure C2.5.1 we show the 3D structure of haemoglobin. The structure prediction 
problem involves determining this fold using the primary sequence which is given in the lower part of figure 
C2.5.1. 

(b) Given that a sequence folds to a known native structure, what are the mechanisms in the transition from the 
unfolded conformation to the folded state? This is a kinetics problem, the solution of which requires elucidation 
of the pathways and transition states in the folding process. 

(c) How to design sequences that adopt a specified fold [9]? This is the inverse protein folding problem that is 
vital to the biotechnology industry. 

(d) There are some proteins that do not spontaneously reach the native conformation. In the cells these proteins 
fold with the assistance of helper molecules referred to as chaperonins. The chaperonin-mediated folding problem 
involves an understanding of the interactions between proteins. 

Figure C2.5.1. The 3D native structure of haemoglobin visualized using RasMol 2.6 [8]. The linear sequence of amino 
acids of haemoglobin is given below the figure. 

It is not the aim of this section to introduce the reader to all the areas listed above. Our goal is modest. We describe 
some of the theoretical developments which arose from studies of caricatures of proteins. Such models were 
designed in order to understand certain general features about protein structures and how these are kinetically 
reached. To keep the bibliography compact we mostly cite review articles. The interested reader can find the 
original papers in these cited works. We hope that this short introduction will entice the reader to delve into the 
ever surprising world of biological macromolecules. 


C2.5.2 RANDOM HETEROPOLYMER AS A CARICATURE OF PROTEINS 

In homopolymers all the constituents (monomers) are identical, and hence the interactions between the monomers 
and between the monomers and the solvent have the same functional form. To describe the shapes of a 
homopolymer (in the limit of large molecular weight) it is sufficient to model the chain as a sequence of connected 
beads. Such a model can be used to describe the shapes that a chain can adopt in various solvent conditions. A 
measure of shape is the dimension of the chain as a function of the degree of polymerization, N. If N is large then 
the precise chemical details do not affect the way the size scales with N [10]. In such a description a homopolymer 
is characterized in terms of a single parameter that essentially characterizes the effective interaction between the 
beads, which is obtained by integrating over the solvent coordinates. 

Proteins are clearly not homopolymers because many energy scales are required to characterize the polypeptide 
chain. Besides the excluded volume interactions and hydrogen bonds the potential between the side chains depends 
on the nature of the residues [1]. Therefore, as a caricature of proteins the heteropolymer model is a better 
approximation. A convenient limit is the random heteropolymer for which approximate analytic treatments are 
possible [11]. In a random heteropolymer the interactions between the beads are assumed to be randomly 
distributed. Some of the interactions are attractive (which are responsible for conferring globularity to the chain) 
while others are repulsive and these residues are better accommodated in an extended conformation. In proteins 
water is a good solvent for polar residues while it is a poor solvent for hydrophobic residues. (In a good solvent 
contacts between the monomer and solvent are favoured whereas in a poor solvent the monomers are attracted to 
each other.) Because only 55% of the residues in proteins are hydrophobic it is clear that in a typical protein 
energetic frustration plays a role. In addition because of chain connectivity there is also topological frustration. 
This arises because residues that are proximal tend to form structures on short-length scales. The assembly of such 
short-length scale structures would typically be incompatible with the global fold giving rise to topological 
frustration. Even if energetic frustrations are eliminated a polypeptide chain (in fact any biomolecule) is 
topologically frustrated [7]. 

In the field of spin glasses and structural glasses such frustration effects are well known [12]. Thus, it was natural 
to suggest that random heteropolymers could serve as a simple representation of polypeptide chains. In appendix 
C2.5.A we sketch computational details for one model of random heteropolymers. Bryngelson and Wolynes [ 13 ] 
proposed, using phenomenological arguments, that the random energy model (REM) would be an appropriate 
description of some aspects of proteins. The rationale for this is the following: consider the exponentially large 
number of conformations. Because of the presence of several conflicting energies in a polypeptide chain it is 
natural to assume that these energies are randomly distributed. If there are no correlations between these energies 
and if the distribution is Gaussian one gets the REM. Of course, in the REM the chain connectivity is 
ignored and there is no explicit dependence on the spatial coordinates of the chain. We show in appendix C2.5.B 
that in the compact phase the random heteropolymer is equivalent to REM. 

The random heteropolymer models of proteins are interesting from a statistical mechanics perspective. However, 
they do not explain the key characteristics of proteins, such as reversible and cooperative folding to a unique native 
conformation. Moreover, the theories for heteropolymers suggest that, typically, the energy landscapes for these 
systems are extremely rugged consisting of many minima that are separated by barriers of varying heights [11]. 
This would mean that kinetically it would be impossible for a chain with a typical realization of interactions to reach 
the ground state in finite time scales. Thus, the dynamics of such random heteropolymer models typically exhibits 
glassy behaviour. Natural proteins do not exhibit any hallmarks of glassy dynamics at most temperatures of 
interest. It follows that a certain refinement of the random heteropolymers is required to capture protein-like 
properties. One of the important theoretical advances is the observation that very simple minimal models can 
be constructed that capture many (not all) of the salient features observed in proteins [14]. The simplest 
manifestation of such models is the lattice representation of polypeptide chains. In the next section we introduce 
the models and describe a few results that have been obtained by numerically exploring their behaviour. 


C2.5.3 LATTICE MODELS OF PROTEINS 

The computational protocol for describing protein folding mechanisms is straightforward in principle. The 
dynamics is well described by the classical equations of motion. Simulation of a monomeric protein involves 
equilibrating the polypeptide chain in a box of water molecules at the desired temperature and density. If an 
appropriately long trajectory is generated then the dynamics of the protein can be directly monitored. There are two 
crucial limitations that prevent a straightforward application of this approach to a study of the folding of proteins. 
First, the interaction potentials or the force fields for such a complex system are not precisely known. Molecular 
dynamics simulations in the standard packages use potentials that rely on the transferability hypothesis, i.e. that 
interactions designed in one context can be used in aqueous medium and for larger systems. The need to compute 
potentials that can be used reliably in simulations of protein dynamics remains acute. 

The second problem is related to the limitations in generating really long duration trajectories that can sample all 
the relevant conformational spaces of proteins. To observe reversible folding of even a moderate sized protein 
requires simulations that span the millisecond time scale. More importantly, making comparisons with experiments 
involves generating many (greater than perhaps 100) folding trajectories so that a reliable ensemble average is 
obtained. Thus, we need to make progress on both fronts (force fields and enhanced sampling techniques in long 
duration simulations) before straightforward all-atom simulations become routine. 

In the light of the previously mentioned difficulties various simplified models of proteins have been suggested [14]. 
The main rationale for using such drastic simplifications is that a detailed study of such models can enable us to 
decipher certain general principles that govern the folding of proteins [5, 6 and 7]. For this class of models, detailed 
computations are possible without sacrificing accuracy. Such an approach has yielded considerable insights into the 
mechanisms, time scales and pathways in the folding of polypeptide chains. In this section we will outline some of 
the results that have been obtained (largely from our group) with the aid of simple lattice models of proteins. 

In the simple version of the lattice representation of proteins the polypeptide chain is modelled as a sequence of 
connected beads. The beads are confined to the sites of a suitable lattice. Most of the studies have used the cubic 
lattice. To satisfy the excluded volume condition only one bead is allowed to occupy a lattice site. If all the beads 
are identical we have a homopolymer model the characteristics of which on lattices have been extensively studied. 
To introduce protein-like character the interactions between beads (those separated by at least three bonds) that are 
nearest neighbours on a lattice are assumed to depend on the nature of the beads. The energy of a conformation, 
specified by {r_i, i = 1, 2, ..., N}, is 

$$E(\{\mathbf{r}_i\}) = \sum_{j \ge i+3} \delta(|\mathbf{r}_i - \mathbf{r}_j| - a)\, B_{ij} \qquad \text{(C2.5.1)}$$

where N is the number of beads in the chain, a is the lattice spacing, δ(x) equals 1 for x = 0 and 0 otherwise, and 
$B_{ij}$ is the value of the contact interaction between beads i and j. We will consider different forms of $B_{ij}$. 
Since this model can be viewed as a coarse-grained representation of the α-carbons of the polypeptide chain the 
value of a is typically taken to be about 3.8 Å. 
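
As a minimal sketch of how the contact energy (C2.5.1) can be evaluated, the following Python fragment sums the contact interactions over all pairs of beads that are separated by at least three bonds and occupy neighbouring lattice sites. The function name, the use of integer coordinates (lattice spacing set to unity) and the nested-list form of the interaction matrix are illustrative assumptions, not part of the original formulation.

```python
def contact_energy(coords, B):
    """Contact energy of a lattice conformation.

    coords: list of integer (x, y, z) bead positions on a cubic lattice
            with unit lattice spacing (assumption: a = 1).
    B:      nested list, B[i][j] is the contact interaction of beads i, j.
    """
    n = len(coords)
    energy = 0.0
    for i in range(n):
        # Only beads separated by at least three bonds can form a contact.
        for j in range(i + 3, n):
            # On a cubic lattice, nearest neighbours differ by exactly one
            # unit step, i.e. their Manhattan distance equals 1.
            manhattan = sum(abs(p - q) for p, q in zip(coords[i], coords[j]))
            if manhattan == 1:
                energy += B[i][j]
    return energy


# Usage: a four-bead chain bent into a square has one contact (beads 0 and 3).
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
uniform_B = [[-1.0] * 4 for _ in range(4)]
print(contact_energy(square, uniform_B))
```

A fully extended chain has no contacts, so its energy is zero in this sketch regardless of the interaction matrix.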


Lattice models have been used for a long time in polymer physics [15]. They were instrumental in computing many 
properties (scaling of the size of the polymer with N, distribution of end-to-end distance, etc.) of real homopolymer 
chains. In the context of proteins lattice models were first introduced by Taketomi et al [16]. The currently popular 
Go model [ 16 ] only considers interactions between residues (beads on the lattice) that occur in the native (ground) 
state. Thus, in this 'strong specificity limit' only native contacts are taken into account. It follows that in this 
version of the Go model the chain is forced to adopt the lowest energy conformation at low temperatures. Go also 
considered a variant of this model in which certain nonnative contacts are allowed. Although these models were 
insightful, Go and co-workers did not use them to obtain plausible general principles of protein folding. This was 
partly due to the fact that in their studies they typically used long chains, and hence exact enumeration was not 
possible. 

Simple lattice models, with the express purpose of obtaining minimal representations of polypeptide chains, were 
first suggested by Chan and Dill [17]. To account for the major interactions in proteins these authors argued that 
the twenty naturally occurring amino acids can be roughly divided into two categories, namely, hydrophobic (H) 
and polar (P). Chan and Dill have suggested that this simple HP model can capture many salient features of 
proteins. They also suggested that many of the conceptual puzzles (the Levinthal paradox in particular) could be 
addressed by systematically studying short chains. This simple exactly enumerable HP model and its variants 
have been used to understand cooperativity, folding kinetics, and the designability of protein structures [14]. Thus, 
it is instructive to describe the calculations that have been done using lattice models. A study of such models 
indeed provides a good introduction to the computational aspects of protein folding. 

C2.5.3.1 EMERGENCE OF STRUCTURES 

The sequence space of proteins is extremely dense. The number of possible protein sequences is $20^N$. It is clear that 
even by the fastest combinatorial procedure only a very small fraction of such sequences could have been 
synthesized. Of course, not all of these sequences will encode protein structures which for functional purposes are 
constrained to have certain characteristics. A natural question that arises is how do viable protein structures emerge 
from the vast sea of sequence space? The two physical features of folded structures are: (1) In general native 
proteins are compact but not maximally so. (2) The dense interior of proteins is largely made up of hydrophobic 
residues and the hydrophilic residues are better accommodated on the surface. These characteristics give the folded 
structures a lower free energy in comparison to all other conformations. 
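
To give a feeling for the size of sequence space, the short Python sketch below evaluates the number of sequences for a chain of N residues built from an alphabet of a given size. The chain length of 100 residues is an arbitrary illustrative choice, and the function name is hypothetical.

```python
def sequence_space_size(n_residues, alphabet=20):
    """Number of distinct sequences of length n_residues."""
    return alphabet ** n_residues


# Twenty amino acids, 100 residues (illustrative length only).
n_sequences = sequence_space_size(100)
print(len(str(n_sequences)))        # number of decimal digits

# Two-letter (hydrophobic/polar) alphabet for a 27-bead chain.
print(sequence_space_size(27, alphabet=2))
```

Python integers have arbitrary precision, so the count is exact rather than a floating-point estimate.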

Lattice models are particularly suited for answering the questions posed. We will show that the two physical 
restrictions are sufficient to rationalize the emergence of very limited (believed to be only of the order of a 
thousand or so) protein-like structures. To provide a plausible answer to this question using lattice models we need to 
specify the form of the interaction matrix elements $B_{ij}$. For purposes of illustration we consider the random bond 
(RB) model in which the elements $B_{ij}$ are distributed as 

$$P(B_{ij}) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(B_{ij} - B_0)^2}{2\sigma^2}\right] \qquad \text{(C2.5.2)}$$

Here σ (= 1) sets the variance in $B_{ij}$ and the hydrophobicity parameter $B_0$ is the mean value. We chose $B_0 = 0$ (half the 
beads are hydrophobic) and $B_0 = -0.1$. The latter is motivated by the observation that in natural proteins roughly 
55% of the residues are hydrophobic [18]. 
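
One realization of the random bond interactions can be drawn as sketched below. The sketch assumes that the contact energies are independent Gaussian variables with mean $B_0$ and unit width, as the description above suggests; the function name, the seed argument and the convention of a symmetric matrix with a zero diagonal are illustrative choices.

```python
import random


def random_bond_matrix(n_beads, b0=-0.1, sigma=1.0, seed=None):
    """Draw a symmetric matrix of Gaussian contact energies.

    b0 is the mean (hydrophobicity parameter) and sigma the standard
    deviation; with sigma = 1 the variance is also 1.
    """
    rng = random.Random(seed)
    B = [[0.0] * n_beads for _ in range(n_beads)]
    for i in range(n_beads):
        for j in range(i + 1, n_beads):
            # The same value is used for (i, j) and (j, i).
            B[i][j] = B[j][i] = rng.gauss(b0, sigma)
    return B


# Usage: one "sequence" of 15 beads in the random bond model.
B = random_bond_matrix(15, b0=-0.1, seed=2024)
print(B[0][3])
```

Each call with a different seed corresponds to a different sequence, so averaging over sequences amounts to repeating the calculation for several independently drawn matrices.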

Protein-like structures are not only compact but also have low energy. With this in mind we have calculated the 
number of compact structures (CS) as well as the number of CS with low energy for a given N. The number of CS in its 
most general form may be written as 

$$C_N(\mathrm{CS}) \simeq Z^{N} Z_1^{N^{(d-1)/d}} N^{\gamma_c - 1} \qquad \text{(C2.5.3)}$$

where ln Z is the conformational free energy (in units of $k_B T$), $Z_1$ is the surface fugacity, d is the spatial dimension, 
and $\gamma_c$ represents the possible logarithmic corrections to the free energy. It is clear that natural proteins are 
relatively unique and hence their number on an average has to grow at rates that are much smaller than that given in 
(C2.5.3). To explore this we have calculated by exact enumeration the number of compact structures, $C_N(\mathrm{CS})$, and 
the number of minimum energy structures, $C_N(\mathrm{MES})$, as a function of N (MES are compact, but not necessarily 
maximally compact). 

We performed exhaustive enumeration of all self-avoiding conformations to explore the conformational space of 
the polypeptide chain of a given length. In order to reduce the sixfold symmetry on the cubic lattice we fixed the 
direction of the first monomeric bond in all conformations. The remaining conformations are related by eightfold 
symmetry on the cubic lattice (excluding the cases when conformations are completely confined to a plane or 
straight line). To decrease further the number of conformations to be analysed the Martin algorithm [19] was 
modified to reject all conformations related by symmetry. 
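
The enumeration step can be illustrated with a simple recursive sketch. It generates all self-avoiding conformations on the cubic lattice with the first bond fixed along one axis, which removes the sixfold degeneracy mentioned above. It is a plain depth-first search, not the modified Martin algorithm [19], and it does not reject the conformations related by the remaining eightfold symmetry; the function name is an illustrative assumption.

```python
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def enumerate_saws(n_beads):
    """Yield every self-avoiding conformation of n_beads beads.

    The first bond is fixed along +x, so each conformation is a tuple of
    integer (x, y, z) positions starting with (0, 0, 0), (1, 0, 0).
    """
    if n_beads < 2:
        raise ValueError("need at least two beads to fix the first bond")
    chain = [(0, 0, 0), (1, 0, 0)]
    occupied = set(chain)

    def extend():
        if len(chain) == n_beads:
            yield tuple(chain)
            return
        x, y, z = chain[-1]
        for dx, dy, dz in STEPS:
            site = (x + dx, y + dy, z + dz)
            if site in occupied:      # excluded volume: one bead per site
                continue
            chain.append(site)
            occupied.add(site)
            yield from extend()
            chain.pop()               # backtrack
            occupied.remove(site)

    yield from extend()


# Usage: count conformations for short chains.
for n in range(2, 7):
    print(n, sum(1 for _ in enumerate_saws(n)))
```

The number of conformations grows roughly by a factor of between four and five per added bead, which is why exact enumeration of this kind is limited to short chains.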

We define MES as those conformations, the energies of which lie within the energy interval Δ above the lowest 
energy $E_0$. Several values for Δ were used to ensure that no qualitative changes in the results are observed. We set 
Δ to be constant and equal to 1.2 (or 0.6) (definition (i)). We have also tested another definition for Δ, according to 
which $\Delta = 1.3\,|E_0 - t B_0|/N$, where t is the number of nearest-neighbour contacts in the ground state (definition (ii)). 
It is worth noting that in the latter case Δ increases with N. Both definitions yield equivalent results. Using these 
definitions for Δ, we computed C(MES) as a function of the number of residues N. 
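
Given the energies of all enumerated conformations of one sequence, counting the minimum energy structures according to definition (i) reduces to a few lines. The sketch below is illustrative only: the function name is hypothetical, the energy window is passed in as a constant, and treating the upper edge of the window as inclusive is an assumption.

```python
def count_mes(energies, window):
    """Number of conformations within `window` above the lowest energy."""
    lowest = min(energies)
    return sum(1 for e in energies if e - lowest <= window)


# Usage with made-up energies for four conformations.
energies = [-3.0, -2.5, -2.0, 0.0]
print(count_mes(energies, 0.6))
print(count_mes(energies, 1.2))
```

Because the ground state always lies inside the window, the count is at least one for any sequence.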

The computational technique involves exhaustive enumeration of all self-avoiding conformations for N < 15 on a 
cubic lattice. In doing so we calculated the energies of all conformations according to (C2.5.1) and then determined 
the number of MES. Each quantity, such as the number of MES, C(MES), the lowest energy $E_0$, the number of 
nearest-neighbour contacts t in the lowest energy structures, is averaged over 30 sequences. Therefore, when 
referring to these quantities, we will imply their average values. To test the reliability of the computational results 
an additional sample of 30 random sequences was generated. Note that in the case of C(MES) we computed the quenched 
averages, i.e., $C(\mathrm{MES}) = \exp\left(\overline{\ln[c(\mathrm{MES})]}\right)$, where c(MES) is the number of MES for a 
given sequence and the overbar denotes the average over sequences. 
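
The quenched average is the exponential of the sequence-averaged logarithm, i.e. the geometric mean of the per-sequence counts. A minimal sketch is given below; the function name and the sample counts are illustrative assumptions rather than data from the calculation described here.

```python
import math


def quenched_average(counts):
    """exp of the mean of ln(c) over sequences (geometric mean)."""
    if not counts:
        raise ValueError("need at least one sequence")
    return math.exp(sum(math.log(c) for c in counts) / len(counts))


# Usage with made-up MES counts for two sequences.
counts = [1, 100]
print(quenched_average(counts))         # geometric mean
print(sum(counts) / len(counts))        # arithmetic mean, for comparison
```

Averaging the logarithm prevents a few sequences with unusually many low-energy conformations from dominating the result, which is the reason for preferring it over the arithmetic mean.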

The number of MES C(MES) is plotted as a function of the number of residues N in figure C2.5.2 for $B_0 = -0.1$ 
and Δ = 0.6. A pair of squares at given N represents C(MES) computed for two independent runs of 30 sequences 
each. For comparison, the number of self-avoiding walks C(SAW) and the number of CS C(CS) are also plotted in 
this figure (diamonds and triangles, respectively). The most striking and important result of this graph is the 
following. As expected on general theoretical grounds, C(SAW) and C(CS) grow exponentially with N, whereas 
C(MES) exhibits drastically different scaling behaviour. There is no variation in C(MES) and its value remains 
practically constant within the entire interval of N starting with N = 7. We find (see figure C2.5.2) that 
C(MES) ≈ 10. These results suggest that C(MES) grows (in all likelihood) only as ln N with N. Thus the restriction 
of compactness and low energy of the native states may impose an upper bound on the number of distinct protein 
folds. 
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Figure C2.5.2. Scaling of the number of MES C(MES) (squares) is shown for the hydrophobic parameter B^ = - 
0.1 and A = 0.6. Data were obtained for the cubic lattice. The pairs of squares for each TV represent the quenched 
averages for different samples of 30 sequences. The number of compact structures C(CS) and self-avoiding 
conformations C(SAW) are also displayed to underscore the dramatic difference of scaling behaviour of C(MES) 
and C(CS) (or C(SAW)). It is clear that C(MES) remains practically flat, i.e. it grows no faster than In N. 
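
The exponential growth of C(SAW) is easy to reproduce for short chains. The sketch below is a minimal Python illustration, not the authors' enumeration code: it counts self-avoiding walks on the simple cubic lattice by depth-first search, with the first site fixed at the origin and symmetry-related walks counted separately (an assumption of this example; the text does not specify the counting convention).

```python
def count_saws(n_steps):
    """Count self-avoiding walks with n_steps bonds on the simple cubic lattice."""
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy, dz in moves:
            nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if nxt not in visited:          # self-avoidance constraint
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total

    origin = (0, 0, 0)
    return extend(origin, {origin}, n_steps)

counts = [count_saws(n) for n in range(1, 7)]
ratios = [counts[i + 1] / counts[i] for i in range(len(counts) - 1)]
print(counts)
print([round(r, 2) for r in ratios])
```

The ratio of successive counts stays between 4 and 5, so each added bond multiplies the number of conformations by a roughly constant factor, in sharp contrast with the nearly flat behaviour of C(MES).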

C2.5.3.2 3D HP MODEL 

The calculations described above suggest that upon imposing minimal restrictions on the structures (compactness 
and low energies) the structure space becomes sparse. As suggested before this must imply that each basin of 
attraction (corresponding to a given MES) in the structure space must contain numerous sequences. The way these 
sequences are distributed among the very slowly growing number (with respect to N) of MES, i.e. the density of 
sequences in structure space, is an important question. This was beautifully addressed in the paper by Li et al [20]. 
They considered a 3D (N = 27) cubic lattice. By using the HP model and restricting themselves to only maximally
compact structures as putative native basins of attraction (NBAs) they showed that certain basins have a much
larger number of sequences. In particular, they discovered that one of the NBAs serves as a ground state for 3794
(total number is 2^27) sequences and hence was considered most designable (figure C2.5.3). The precise density of
sequences among the NBAs is
clearly a function of the interaction scheme. These calculations and the arguments presented in the previous 
subsection using the random bond model point out that since the number of NBA for the entire sequence space is 
small it is likely that proteins could have evolved randomly. Naturally occurring folds must correspond to one of 
the basins of attraction in the structure space so that many sequences have these folds as the native conformations, 
i.e. these are highly designable structures in the language of Li et al [20]. These ideas were further substantiated by 
Lindgård and Bohr [21], who showed that among maximally compact structures there are only very few folds that
have protein-like characteristics. These authors also estimated, using geometrical characteristics and stability
arguments, that the number of distinct folds is of the order of a thousand. All of these studies confirm that the
structure space is sparse. Thus, each fold can be designed by many sequences. From the purely
structural point of view nature does have several options in the sense that many sequences can be 'candidate
proteins'. However, there is also evolutionary pressure to fold rapidly (i.e. a kinetic component to folding). This
requirement further restricts the possible sequences that can be considered as proteins, because they must satisfy
the dual criterion of reaching a definite fold on a biologically relevant time scale. These observations are
schematically sketched in figure C2.5.4.

Figure C2.5.3. Histogram of the number of structures with a given number of associated sequences N_s for the 3D
3 × 3 × 3 case, in a log-log plot.
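
The notion of designability can be illustrated on a much smaller system than the 3 × 3 × 3 cube. The Python sketch below is a toy version under stated assumptions, not the calculation of [20]: it uses a 3 × 3 square lattice, treats conformations related by rotations and reflections as one structure, and assumes HP contact energies of −2.3 (HH), −1.0 (HP) and 0 (PP). For every one of the 2^9 sequences it finds the lowest-energy maximally compact structure and credits that structure only when the ground state is unique.

```python
from itertools import product

SIDE = 3                      # 3 x 3 square lattice (toy stand-in for 3 x 3 x 3)
N = SIDE * SIDE
# Assumed HP contact energies (illustrative; not quoted in the text).
ENERGY = {("H", "H"): -2.3, ("H", "P"): -1.0, ("P", "H"): -1.0, ("P", "P"): 0.0}

def hamiltonian_paths():
    """All directed chain conformations that visit every site of the square."""
    paths = []

    def extend(path, seen):
        if len(path) == N:
            paths.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < SIDE and 0 <= nxt[1] < SIDE and nxt not in seen:
                path.append(nxt)
                seen.add(nxt)
                extend(path, seen)
                path.pop()
                seen.remove(nxt)

    for start in product(range(SIDE), repeat=2):
        extend([start], {start})
    return paths

def canonical(path):
    """Smallest image of the path under the 8 symmetries of the square."""
    m = SIDE - 1
    ops = (lambda x, y: (x, y), lambda x, y: (m - y, x),
           lambda x, y: (m - x, m - y), lambda x, y: (y, m - x),
           lambda x, y: (m - x, y), lambda x, y: (x, m - y),
           lambda x, y: (y, x), lambda x, y: (m - y, m - x))
    return min(tuple(op(x, y) for x, y in path) for op in ops)

def contacts(path):
    """Non-bonded nearest-neighbour pairs (i, j) with j >= i + 2."""
    return [(i, j) for i in range(N) for j in range(i + 2, N)
            if abs(path[i][0] - path[j][0]) + abs(path[i][1] - path[j][1]) == 1]

def designability(structures):
    """Number of sequences having each structure as unique ground state."""
    maps = [contacts(s) for s in structures]
    counts = [0] * len(structures)
    for seq in product("HP", repeat=N):
        energies = [sum(ENERGY[seq[i], seq[j]] for i, j in cm) for cm in maps]
        lowest = min(energies)
        winners = [k for k, e in enumerate(energies) if e - lowest < 1e-9]
        if len(winners) == 1:
            counts[winners[0]] += 1
    return counts

all_paths = hamiltonian_paths()
structures = sorted({canonical(p) for p in all_paths})
counts = designability(structures)
print(len(structures), "distinct compact structures; designabilities:", counts)
```

Sequences with degenerate ground states are discarded, so the designabilities sum to less than the total number of sequences. The same bookkeeping, applied to the far larger set of compact structures on the cube, yields the histogram of figure C2.5.3.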

C2.5.3.3 SYMMETRY AND DESIGNABILITY 

In the study by Li et al [20] it was noted that highly designable structures appear to be symmetric. Independently,
in a thought-provoking article Wolynes [22] has made a series of compelling arguments as to why nature might use
symmetry (at least in an inexact manner) to generate symmetrical tertiary folds of proteins. Many enzymes are
oligomers. Wolynes makes a number of observations about the symmetry aspects of protein structures: (a) the
tetrameric haemoglobin (α-helical protein) molecule has an approximate two-fold symmetry.
(b) A striking example of approximate symmetry in β-proteins is found in the structure of a monomeric γ crystallin
in which the shapes adopted by residues 1-88 and 89-174 are nearly the same. However, the two individual
sequences do not bear much similarity. This, of course, is consistent with the notion that the structure space is so
sparse that many sequences are forced to adopt similar shapes. The interesting conclusion from examining the γ
crystallin structure is that the underlying symmetries in the shape are only inexact. (c) The obvious examples of
nearly symmetrical structures are helical proteins, with the four helix bundle being one of the most prominent
examples (see figure C2.5.5). (d) Various proteins with mixed topology (like triose phosphate isomerase (TIM)
barrels and jelly rolls) appear to have the kind of inexact but apparently symmetrical arrangement discussed by
Wolynes [23]. (e) He also conjectured that it is likely that the underlying approximate symmetry is reflected in the
free energy landscape being funnel-like. This would facilitate rapid folding, which for many proteins may be a result
of evolutionary pressure. The precise connections between the symmetries and the folding mechanisms and
functional competence of biological molecules have not been worked out. Nevertheless, it appears that employing
such ideas might be useful in the de novo design of proteins.


We note that equally striking are the kinds of symmetrical arrangements found in RNA molecules [24]. The crystal
structure of the P4-P6 domain of Tetrahymena self-splicing RNA clearly is highly symmetric with helices packed
in a nearly regular arrangement. Since in an evolutionary sense the RNA world preceded the protein world it is
interesting to speculate that the emergence of inexact symmetries may have been a biological necessity. The
observation of inexact symmetries in a protein structure might be a consequence of the fact that they are present in
the 'parent' molecules. In fact this evolutionary conservation may have been imprinted when evolution from the
RNA world to the current scheme for protein synthesis took place. The most compelling reason for observing near
regular patterns in biomolecular structures is that synthesis of symmetrical folds might be energetically
economical.

C2.5.3.4 EXPLORING THE PROTEIN FOLDING MECHANISM USING THE LATTICE MODEL 

It is well known that proteins reach the biologically active native states in a relatively short time, which is of the 
order of a second for most single domain proteins [JJ. Based on folding and refolding experiments on ribonuclease 
A, Anfinsen concluded that under appropriate conditions natural sequences of proteins spontaneously fold to their
native conformation [3]. This implies that protein folding is a self-assembly process, i.e. the information needed for 
specifying the topology of the native state is contained in the primary sequence itself. This thermodynamic 
hypothesis does not, however, address the question of how the native state is accessed in a short time scale. This 
issue was raised by Levinthal who wondered how a polypeptide chain of reasonable length can navigate the 
astronomically large conformational space so rapidly. Levinthal posited that certain preferred pathways must guide 
the chain to the native state. The Levinthal paradox, simplistic as it is, has served as an intellectual impetus to gain 
an understanding of the ease with which a polypeptide chain reaches the native conformation [5, 6]. We use lattice 
models to describe the foldability of biological sequences of proteins. A sequence is foldable if it reaches the native 
state in a reasonable time and remains stable over some range of external conditions (pH, temperature). 

C2.5.3.5 CHARACTERISTIC TEMPERATURES 

The basic features of folding can be understood in terms of two fundamental equilibrium temperatures that
determine the 'phases' of the system [7]. At sufficiently high temperatures (k_BT greater than all the attractive
interactions) the shape of the polypeptide chain can be described as a random coil and hence its behaviour is the
same as a self-avoiding walk. As the temperature is lowered one expects a transition at T = T_θ to a compact phase.
This transition is very much in the spirit of the collapse transition familiar in the theory of homopolymers [10]. The
number of compact conformations at T_θ is still exponentially large. Because the polypeptide chains have additional
energy scales that discriminate between the various compact conformations we expect a transition to the ground
(native) state at a lower temperature T_F. Generally the transition at T_θ is second order, while the transition at
T_F is similar to first order. Since we are considering finite systems, the notion of 'phases' should be used with
care. These expectations, based on fairly general arguments, have been confirmed in various lattice simulations of
protein-like heteropolymers. For the lattice models the collapse temperature T_θ is determined from the peak of the
specific heat and the folding transition temperature T_F is obtained from the fluctuations in the overlap function
given by

$$\Delta\chi = \langle\chi^{2}\rangle - \langle\chi\rangle^{2} \qquad \text{(C2.5.4)}$$

where

$$\chi = 1 - \frac{1}{N^{2} - 3N + 2} \sum_{i \neq j,\, j \pm 1} \delta\left(r_{ij} - r_{ij}^{N}\right) \qquad \text{(C2.5.5)}$$

with r_ij^N referring to the native state. In figure C2.5.6(a) (for the structure displayed in figure C2.5.7) we plot
the temperature dependence of C_v, which has a peak at T_θ = 0.83. This figure also shows the variation of
d⟨R_g⟩/dT with temperature. The peak of this curve (at 0.86) almost coincides with that of the specific heat,
indicating that this transition is associated with compaction of the chain. Hence, the maximum in C_v legitimately
indicates the collapse temperature. X-ray scattering experiments have been used to obtain T_θ for a few proteins.
In figure C2.5.6(a) we also show the temperature dependence of Δχ, from which the folding temperature T_F is
determined to be 0.79.
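
The overlap function (C2.5.5) and its fluctuation (C2.5.4) translate directly into code. The Python sketch below is a minimal illustration under stated assumptions: conformations are lists of integer lattice coordinates, squared distances are compared so that the delta function is an exact integer test, and the four-bead conformations are hypothetical examples rather than structures from the text.

```python
def squared_distances(conf):
    """Map each pair (i, j) with j >= i + 2 to its squared lattice distance."""
    n = len(conf)
    return {(i, j): sum((a - b) ** 2 for a, b in zip(conf[i], conf[j]))
            for i in range(n) for j in range(i + 2, n)}

def overlap(conf, native):
    """Overlap chi of (C2.5.5): 0 for the native state, 1 for no matching pairs."""
    n = len(native)
    d_conf, d_nat = squared_distances(conf), squared_distances(native)
    matches = sum(1 for pair in d_nat if d_conf[pair] == d_nat[pair])
    # The sum in (C2.5.5) runs over ordered pairs, hence the factor of 2
    # when only pairs with i < j are stored.
    return 1.0 - 2.0 * matches / (n * n - 3 * n + 2)

def overlap_fluctuation(chi_samples):
    """Fluctuation of (C2.5.4): <chi^2> - <chi>^2 over sampled conformations."""
    mean = sum(chi_samples) / len(chi_samples)
    mean_sq = sum(c * c for c in chi_samples) / len(chi_samples)
    return mean_sq - mean * mean

# Hypothetical four-bead conformations on the cubic lattice.
native = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0)]
rod = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
samples = [overlap(c, native) for c in (native, bent, rod)]
print(samples, overlap_fluctuation(samples))
```

In a simulation the samples would come from an equilibrium Monte Carlo run at each temperature, and T_F would be read off from the maximum of the fluctuation as a function of temperature.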

(A) FOLDING RATES 


The key question we want to answer is what are the intrinsic sequence dependent factors that not only determine 
the folding rates but also the stability of the native state? It turns out that many of the global aspects of the folding 
kinetics of proteins can be understood in terms of the equilibrium transition temperatures. In particular, we will 
show that the key factor that governs the foldability of sequences is the single parameter 

$$\sigma_T = \frac{T_\theta - T_F}{T_\theta} \qquad \text{(C2.5.6)}$$

which indicates how far T_F is from T_θ. To establish a direct correlation between the folding time τ_F and σ_T we
generated a number of sequences for N = 27. The folding time was taken to be equal to the mean first passage time.
The first passage time for a given initial trajectory was calculated by determining the total number of Monte Carlo
steps (MCS) needed to reach the native conformation for the first time. By averaging over an ensemble of initial
trajectories (typically this number varies between 400 and 800 in our examples) the mean first passage time is
obtained. The precise moves that are utilized in the simulations are described elsewhere [18]. The dependence of τ_F
on σ_T (for the random bond model and for the other interaction schemes) is given in figure C2.5.8. This figure
shows a remarkable correlation between the folding time and σ_T. A small change in σ_T results in a dramatic effect
(a few orders of magnitude) on the folding times. It is clear that both T_θ and T_F are dependent on the sequence. As
a result mutations that preserve the native state can alter the folding rates due to the change in the σ_T values.

Using lattice models we have also established that folding rates correlate well with Z = (E_N − E_MS)/δ, where E_N is
the native state energy, E_MS is the average energy of the ensemble of misfolded structures, and δ is the dispersion
in the contact energies. The relationship between σ_T and Z also suggests that, in general, the correlation between
τ_F and σ_T should be superior. More importantly, experimental measurements of Z are difficult. On the other hand,
both T_θ and T_F can be measured in scattering, CD, or fluorescence experiments. Other measures, such as energy gap
(however it is defined), do not correlate with τ_F.
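
As a small worked example, the foldability parameter (C2.5.6) and the mean first passage time can be computed as follows. This Python sketch uses the transition temperatures quoted above for the sequence of figure C2.5.7 and, purely for illustration, the two first passage times quoted in the caption of figure C2.5.9; a real estimate would average over several hundred trajectories. The function names are assumptions of this example.

```python
def sigma_T(t_theta, t_f):
    """Foldability parameter of (C2.5.6): (T_theta - T_F) / T_theta."""
    return (t_theta - t_f) / t_theta

def mean_first_passage_time(first_passage_times):
    """Average number of MCS needed to reach the native state for the first time."""
    return sum(first_passage_times) / len(first_passage_times)

sigma = sigma_T(0.83, 0.79)                          # temperatures from the text
mfpt = mean_first_passage_time([277912, 11442793])   # two trajectories only
print(f"sigma_T = {sigma:.3f}, mean first passage time = {mfpt:.0f} MCS")
```

The value σ_T ≈ 0.05 obtained this way is small, which is consistent with this sequence being a fast folder.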

In the previous section we showed that because the structure space is very sparse there have to be many sequences 
that map onto the countable number of basins in the structure space. The kinetics here shows that not all the 
sequences, even for highly designable structures, are kinetically competent. Consequently, the biological 
requirements of stability and speed of folding severely restrict the number of evolved sequences for a given fold. 
This very important result is schematically shown in figure C2.5.4. 


[Figure C2.5.4 diagram labels: sequence space; sequences which encode MES; foldable sequences; biologically
competent sequences.]

Figure C2.5.4. Schematic illustration of the stages in the drastic reduction of sequence space in the process of
evolution to functionally competent protein structures.

It is important to point out that the simulations reported in figure C2.5.8 were done at sequence-dependent
temperatures T_s using the condition ⟨χ(T_s)⟩ = 0.21. At these temperatures, all of which are below their respective
folding transition temperatures, the native conformation has the highest occupation probability. In lattice models
the native state is a single conformation (a microstate) which is, of course, physically unrealistic. In real systems
there is a volume associated with the native basin of attraction and there are many conformations that map onto the
NBA. The probability of being in the NBA at the various simulation temperatures is in excess of 0.5 so that under
the conditions of our simulations the stability criterion is automatically satisfied. The results in figure C2.5.8
therefore show that the dual requirement of stability and the kinetic accessibility of the NBA is most easily
satisfied by those sequences that have small values of σ_T. Thus rapid folding occurs when σ_T ≈ 0, i.e. near a
multicritical-like point. In this case there are no detectable intermediate 'phases'. The sequence whose native state
is shown in figure C2.5.7 has σ_T = 0.05. We found that this sequence folds rapidly.



Figure C2.5.5. Native structure of acyl-coenzyme A binding protein (first NMR structure out of 29 deposited to
PDB). The figure was created using RasMol 2.6 [§].

Figure C2.5.6. Thermodynamic functions computed for the sequence whose native state is shown in figure C2.5.7.
(a) Specific heat C_v (dotted curve) and derivative of the radius of gyration with respect to temperature d⟨R_g⟩/dT
(broken curve) as a function of temperature. The collapse temperature T_θ is determined from the peak of C_v and
found to be 0.83. T_θ is very close to the temperature at which d⟨R_g⟩/dT becomes maximum (0.86). This illustrates
that T_θ is indeed associated with the compaction of the chain. The temperature dependence of fluctuations of the
overlap function Δχ is given by the full curve. The folding transition temperature T_F is obtained from the peak of
Δχ and for this sequence T_F = 0.79. The curves are scaled to fit one plot. (b) Time dependence of the fraction of
unfolded molecules P_u(t) for the sequence shown in figure C2.5.7 calculated at folding conditions T_s < T_F. The
function P_u(t) is computed from a distribution of first passage times τ_1i. The first passage time for a given
initial condition is the first time the trajectory reaches the native conformation. Typically an adequately converged
distribution is obtained by averaging over several hundred initial conditions. For the conditions used in this
simulation folding is two state; therefore, P_u(t) is adequately fitted with a single exponential (thick full curve).
The folding time τ_F obtained from the fit is 1.4 × 10^6 MCS.

Figure C2.5.7. The native conformation of a fast folding sequence (N = 27) with random bond potentials is shown.
This structure has c = 22 non-bonded contacts; therefore it is not a maximally compact conformation, for which
c = 28. The figure was created using RasMol 2.6 [§].

Figure C2.5.8. Plot of the folding times τ_F as a function of σ_T for the 22 sequences. This figure shows that under
the external conditions when the NBA is the most populated there is a remarkable correlation between τ_F and σ_T.
The correlation coefficient is 0.94. It is clear that over four orders of magnitude of folding times
τ_F ≈ exp(−σ_T/σ_0), where σ_0 is a constant. The filled and open circles correspond to different contact
interactions used in (C2.5.1). The open squares are for N = 36.

(B) TOPOLOGICAL FRUSTRATION AND KINETIC PARTITIONING MECHANISM 

Lattice models can also be used to obtain the outlines of the mechanisms for the folding of proteins. The qualitative 
aspects of the folding kinetics of biomolecules can be understood in terms of the concept of topological frustration. 
The primary sequence of proteins has about 55% hydrophobic residues. The linear density of hydrophobic residues 
along the polypeptide chain is roughly constant, implying that the hydrophobic residues are spread throughout the 
chain. As a result on any length scale l there is a propensity for the hydrophobic residues to form tertiary contacts
under folding conditions. The resulting structures which are formed by contacts between residues that are in
proximity would be in conflict with the global fold corresponding to the native state. The incompatibility of
structures on local scales with the near unique native state on the global scale leads to topological frustration. It is
important to realize that topological frustration is inherent to all foldable sequences, and is a direct consequence of
the polymeric nature of proteins as well as competing interactions (hydrophobic residues which prefer the
formation of compact structures and hydrophilic residues which are better accommodated by extended
conformations). A consequence of topological frustration is that the underlying energy landscape is rugged,
consisting of many minima that are separated by barriers of varying heights.

It is important to understand the nature of the low-lying minima in the rugged energy landscape. On the length
scale l there are many ways of forming structures that are in conflict with the global fold. It is expected that most of
these structures have high free energies and are unstable to thermal fluctuations. We expect a certain number of
these structures to have low free energies and represent relatively deep minima. The similarity between these
structures and the native fold could be considerable and hence these structures could be viewed as being native-like.
These competing basins of attraction (CBA) in which the polypeptide chain adopts native-like structures can act as
kinetic traps that will slow down the folding process.


The basic consequences of topological frustration for mechanisms of folding can be understood in terms of the
kinetic partitioning mechanism (KPM) [7]. Imagine an ensemble of denatured molecules in search of the native
conformation. This is the experimental situation that arises when the concentration of denaturant molecules is
decreased. It is clear that a fraction of molecules Φ would reach the NBA rapidly without being trapped in the
low-lying energy minima. The remaining fraction would be trapped in the minima and only on longer time scales do
fluctuations enable the chain to reach the NBA. The value of the partition factor Φ depends on the sequence and is
explicitly determined by the σ_T value. Thus because of topological frustration the initial pool of denatured
molecules partitions into fast folders and slow folders that reach the native state by indirect off-pathway processes.

From the description of the kinetic partitioning mechanism (KPM) given above it follows that generically the time
dependence of the fraction of molecules that have not folded at time t, P_u(t), is given by

$$P_u(t) = \Phi \exp\left(-\frac{t}{\tau_{NCNC}}\right) + \sum_k a_k \exp\left(-\frac{t}{\tau_k}\right) \qquad \text{(C2.5.7)}$$

where τ_NCNC is the time constant for reaching the native state by the fast process, τ_k is the time for escape from
the CBA labelled k, and a_k is the 'volume' associated with the kth CBA. From this consideration we expect that for a
given sequence, trajectories can be grouped into those that reach the native conformation rapidly (Φ being their
fraction), and those that remain in one of the CBA for a discernible length of time. In figure C2.5.9(a) we show an
example of a trajectory that reaches the native state directly from the random coil conformation. In contrast, in
figure C2.5.9(b) we show an example of a trajectory for the same sequence at the same simulation temperature.
This figure shows that on a very short time scale the chain gets trapped in conformations other than the NBA and
only on a long time scale does it reach the native state. This figure illustrates the basic principle of KPM. If we
perform an average over an ensemble of such trajectories the kinetic result given in (C2.5.7) ensues.

Figure C2.5.9. Examples of folding trajectories at T = T_s derived from the condition ⟨χ(T_s)⟩ = 0.21. (a) Fast
folding trajectory as monitored by χ(t). It can be seen that the sequence reaches the native state very rapidly in a
two-state manner without being trapped in intermediates. The first passage time for this trajectory is 277 912 MCS.
(b) Slow folding trajectory for the same sequence. The sequence becomes trapped in several intermediate states with
large χ en route to the native state. The first passage time is 11 442 793 MCS. Notice that the time scales in both
panels are dramatically different.
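
The kinetic law (C2.5.7) can be evaluated with a few lines of Python. The sketch below is illustrative only: the partition factor, amplitudes and time constants are hypothetical values, chosen so that Φ and the amplitudes a_k sum to one (an assumption that makes P_u(0) = 1).

```python
import math

def unfolded_fraction(t, phi, tau_fast, slow_channels):
    """P_u(t) of (C2.5.7): a fast channel plus escape from each CBA.

    slow_channels is a list of (a_k, tau_k) pairs, one per competing basin.
    """
    slow = sum(a_k * math.exp(-t / tau_k) for a_k, tau_k in slow_channels)
    return phi * math.exp(-t / tau_fast) + slow

# Hypothetical parameters (times in MCS); phi + sum(a_k) = 1 by construction.
PHI = 0.6
TAU_FAST = 3.0e5
SLOW = [(0.3, 1.0e7), (0.1, 5.0e7)]

for t in (0.0, 1.0e6, 1.0e7, 1.0e8):
    print(f"t = {t:.0e} MCS, P_u = {unfolded_fraction(t, PHI, TAU_FAST, SLOW):.4f}")
```

With Φ = 1 and no slow channels the expression reduces to a single exponential, which is the two-state limit.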

(C) CLASSIFYING FOLDING MECHANISMS IN TERMS OF σ_T

The various folding mechanisms expected in foldable sequences may be classified in terms of σ_T. We have
already shown that sequences that fold extremely rapidly have very small values of σ_T. Based on our study of
several model proteins as well as analysis of real proteins we classify the folding kinetics of proteins as
follows [7].

(D) FAST FOLDERS

For these sequences the value of σ_T is less than a certain small value σ_l. For such sequences the folding occurs
directly from the ensemble of unfolded states to the NBA. The free energy surface is dominated by the NBA (or a
funnel) and the volume associated with the NBA is very large. The partition factor Φ is near unity so that these
sequences reach the native state by two-state kinetics. The amplitudes a_k in (C2.5.7) are nearly zero. There are no
intermediates in the pathways from the denatured state to the native state. Fast folders reach the native state by a
nucleation-collapse mechanism, which means that once a certain number of contacts (folding nuclei) are formed
the native state is reached very rapidly [25, 26]. The time scale for reaching the native state for fast folders
(which are normally associated with those sequences for which topological frustration is minimal) is found to be

$$\tau_{NCNC} \approx \frac{\eta a}{\gamma} f(\sigma_T) N^{\omega} \qquad \text{(C2.5.8)}$$

where η is the solvent viscosity, a is the typical size of a residue, γ is the average surface tension between the
residue and water, f(σ_T) is typically an exponential function of σ_T, and the exponent ω is between 3.8 and 4.2. In
general, only small proteins (N less than about 100) are fast folders.

(E) MODERATE FOLDERS 

Sequences for which σ_l < σ_T ≤ σ_h (where σ_h is the upper boundary for moderate folders) can be classified as
moderate folders. Unlike fast folding sequences the Φ values are fractional, which means that a substantial fraction
of molecules is essentially trapped in one of the CBAs for some length of time. For these sequences there are
detectable intermediates and, for all but very small proteins, the rate determining step is the activated transition
from one of the CBAs to the native state. The average time scale for transition from these misfolded structures to
the native conformation is given by

$$\tau_k \approx \tau_0 \exp\left(\sqrt{N}\right) \qquad \text{(C2.5.9)}$$

at T ≈ T_F. This shows that typical barriers for moderate folders are quite small. As a result the folding times even
for long proteins (N ≈ 200) are only of the order of a second. It is these small barriers that enable typical proteins
to fold in a biologically relevant time scale without encountering the Levinthal paradox.
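
To see why a barrier that scales as the square root of N leads to folding times of about a second, it helps to put numbers into the estimate. The Python sketch below assumes a prefactor τ_0 of one microsecond; this value is an assumption of the example, not a number given in the text.

```python
import math

def escape_time(n_residues, tau_0):
    """Time scale tau_0 * exp(sqrt(N)) for activated escape from a misfolded basin."""
    return tau_0 * math.exp(math.sqrt(n_residues))

TAU_0 = 1.0e-6  # seconds; assumed prefactor, not given in the text

for n in (50, 100, 200):
    print(f"N = {n:3d}: escape time of about {escape_time(n, TAU_0):.3g} s")
```

Under this assumption a chain of 200 residues needs roughly a second, because the square root keeps the exponent near 14 even though N itself is large.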

(F) SLOW FOLDERS AND CHAPERONES 

For sequences with σ_T > σ_h folding is extremely slow and these sequences may not reach the native state in a
biologically relevant time scale. The volume corresponding to the NBA is very small in this case and as a result Φ is
nearly zero. The free energy surface is dominated by CBAs. Under these circumstances spontaneous folding does
not become viable. In cells such proteins are rescued by chaperones. Typically this happens when N is so large that
τ_0 exp(√N) exceeds reasonable folding time scales. Thus in cells we expect that only those proteins which are large
or whose biological functioning state has to be oligomers require chaperones.

C2.5.3.6 MINIMUM NUMBER OF RESIDUES FOR OBTAINING FOLDABLE PROTEIN STRUCTURES

Natural proteins are made up of twenty amino acid residues. An important question, from the perspective of protein
design, is how many distinct types of residues are required for protein-like behaviour? Such a selection cannot be
made arbitrarily because in natural proteins one should have polar, hydrophobic, and charged residues. In addition,
for optimal packing of the core, hydrophobic residues with different van der Waals radii may be required. To
explore the potential simplification of the number of residues Wang and Wang [27] have carried out a highly
significant study using lattice models and standard statistical potentials for the contact interaction elements B_ij
(C2.5.1). They discovered that a grouping of amino acid residues into five categories mimics the folding behaviour
found using the standard twenty residues. To demonstrate this they used a cubic lattice with N = 27 and mostly
focused on the maximally compact structures as ground states. Thus, structures such as the one given in figure C2.5.7
are not explicitly considered. Nevertheless, the demonstration that a suitable set of five amino acid residue types is
sufficient is an important result which should have implications for the protein design problem, i.e. the generation
of primary sequences that can fold to a chosen target folded structure.

In their original article they mostly focused on various thermodynamic properties (nature and degeneracy of the
ground states). They also carried out kinetic simulations to assess if the kinetic properties are altered by using a
reduced number of residues. To test this idea Wang and Wang used the foldability index σ (which correlates well
with folding rates) as a discriminator of sequence properties. The precise question addressed by Wang and Wang is
the following: what is the minimum number of residues that are required to obtain foldable (characterized by
having relatively small values of σ) sequences? We found that fast folding sequences have σ less than about a
quarter. They carried out two sets of computations. In one set they initially optimized the stability gap [5] of
various sequences using the twenty residues. They substituted the residues in these optimized sequences by the
representative residue for each group. Four subgroups were considered with each containing five and a variant,
three and two amino acids. The foldability index for the standard sample and their substitutes is shown in figure
C2.5.10 as full circles. In another set of computations they examined the foldability index (open diamonds in figure
C2.5.10) for sequences that were optimized using the reduced sets of amino acids. Both these curves show that as
long as the number of amino acid types exceeds five one can generate sequences with relatively small values of σ.
Figure C2.5.10 also shows that smaller values of σ can be obtained if optimization is carried out with reduced sets
of amino acids. Such sequences are foldable, i.e. the dual requirements of stability over a wide temperature range
and the kinetic accessibility of their native states are simultaneously satisfied.

Figure C2.5.10. The figure gives the foldability index σ of 27-mer lattice chains with sets containing different
numbers of amino acids. The sets are generated according to the scheme described in [27]. The set of 20 amino acids
is taken as a standard sample. Each sequence with 20 amino acids is optimized to fulfil the stability gap [5]. The
residues in the standard samples are substituted with four different sets containing a smaller number of amino acids
[27]. The foldability of these substitutions is indicated by the full circles. The open diamonds correspond to the
sequences with the same composition. However, the amino acids are chosen from the reduced representation and the
resultant sequence is optimized using the stability gap [5].

C2.5.4 CONCLUSIONS 

The examples of modelling discussed in section C2.5.2 and section C2.5.3 are meant to illustrate the ideas behind 
the theoretical and computational approaches to protein folding. It should be borne in mind that we have discussed 
only a very limited aspect of the rich field of protein folding. The computations described in section C2.5.3 can be 
carried out easily on a desktop computer. Such an exercise is, perhaps, the best way of appreciating the simple
approach to get at the principles that govern the folding of proteins. 

In this section we have not discussed experimental advances that are offering extraordinary insights into the way 
the denatured molecules reach the native state. Two remarkable experimental approaches hold the promise that in 
short order we will be able to watch the folding process from submicrosecond time scale until the native state is 
reached. A brief summary of these follows.

(1) Eaton et al [28] have shown that optical triggers of folding can offer a window into the folding process from
the microsecond time scale. Since then many laboratories have probed the plausible structure formations that
occur on a short time scale. Fast folding experimental techniques have been used to obtain the detailed
kinetics for the building blocks of proteins, namely β-hairpins, α-helices, and loops. Very recent experiments
have given compelling evidence that there are populated native-like intermediates even in proteins that were
thought to follow two-state kinetics.

(2) Perhaps the most exciting development in the last few years is the ability to nanomanipulate single
biomolecules using atomic force microscopy and optical tweezer techniques [29]. So far such experiments
have been used to provide a microscopic basis of elasticity in muscle proteins. If these stretching experiments
can be combined with fluorescence resonance energy transfer experiments then it is possible to follow the
folding of individual molecules as they pass through the transition state to the native conformation. It has been
suggested on theoretical grounds that such two-dimensional single-molecule experiments can measure
directly the distribution of folding rates (and the barrier distribution) in much the same way that mean first
passage times are computed in minimal protein models (see section C2.5.3).


The challenges posed by these high precision experiments demand more refined models and further developments 
in computational techniques. For the theoretically inclined it will no longer be sufficient to describe kinetics only in 
terms of energy landscapes. The wealth of data being generated by experiments such as those mentioned 
above requires a quantitative understanding of the various factors that govern the pathways, mechanisms, and the 
transition states in the folding process. These challenging issues will make the area of biomolecular folding an 
engaging one for many years to come. 


ACKNOWLEDGMENTS 

We are grateful to John D Weeks for useful comments and to Chao Tang for supplying figure C2.5.3 . We are 
indebted to Dr J Wang and Professor W Wang for kindly providing us with figure C2.5.10 prior to publication. 


APPENDIX C2.5.A 

There are several versions of the random heteropolymer models. To keep the discussion technically simple we will 
consider one case: the so-called random hydrophilic-hydrophobic chain whose phases were studied by Garel et al 
(GLO) [11]. The GLO model consists of a polymer chain with N monomers. It can be viewed as a 
generalization of the popular Edwards model, which was introduced in order to understand the swelling of real 
homopolymer chains in good solvents [10]. In the GLO model the chain is made up of hydrophobic (hydrophilic) 
residues that tend to collapse (swell) the chain when dispersed in a solvent. The solvent mediated interactions at 
each site are assumed to be random. The random interactions depend only on a given site i and the strength depends 
on the degree of hydrophilicity $\lambda_i$. Besides the term accounting for chain connectivity there are two- and 
higher-body interactions that determine the shape of the chain. In the GLO model the two-body interaction is given by 

$$v_{ij} = [v_0 + \beta(\lambda_i + \lambda_j)]\,\delta(\mathbf{r}_i - \mathbf{r}_j)$$    (C2.5.A1) 

where $v_0$ is a repulsive short-range interaction and $\lambda_i$ is a quenched random variable which is distributed as 

$$P(\lambda_i) = \frac{1}{\sqrt{2\pi\lambda^2}} \exp\left[-\frac{(\lambda_i - \lambda_0)^2}{2\lambda^2}\right].$$    (C2.5.A2) 

If the mean $\lambda_0$ is positive then the majority of the residues are hydrophilic. A description of the collapsed phase of 
the chain requires introducing three- and four-body interaction terms. Thus, the total Hamiltonian is 

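$$\beta H = \frac{1}{2}\sum_{i \neq j} v_{ij} + \frac{\omega_3}{6}\sum_{i \neq j \neq k} \delta(\mathbf{r}_i - \mathbf{r}_j)\,\delta(\mathbf{r}_j - \mathbf{r}_k) + \frac{\omega_4}{24}\sum_{i \neq j \neq k \neq l} \delta(\mathbf{r}_i - \mathbf{r}_j)\,\delta(\mathbf{r}_j - \mathbf{r}_k)\,\delta(\mathbf{r}_k - \mathbf{r}_l).$$    (C2.5.A3) 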

Since the charge variables $\lambda_i$ are quenched, the thermodynamics of the system requires averaging the free energy 
using the distribution $P(\lambda_i)$, i.e. 

$$F = -k_B T \int \prod_i P(\lambda_i)\, \ln Z(\{\lambda_i\})\, d\{\lambda_i\}.$$    (C2.5.A4) 


The average of $\ln Z(\{\lambda_i\})$ is most conveniently done using replicas through the relation 

$$\ln Z = \lim_{n \to 0} \frac{Z^n - 1}{n}.$$    (C2.5.A5) 
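
The replica identity (C2.5.A5) can be checked numerically by putting an ordinary positive number in place of the partition function. The sketch below is only an illustration: the function name `replica_log` and the test value of Z are assumptions made here and are not part of the GLO calculation.

```python
import math


def replica_log(z, n):
    """Finite-n estimate (Z**n - 1)/n of ln Z, as in the replica relation."""
    return (z**n - 1.0) / n


# Hypothetical value standing in for a partition function.
z = 7.5
replica_numbers = (1e-1, 1e-2, 1e-3, 1e-4)

# Absolute error of the finite-n estimate for decreasing n.
errors = [abs(replica_log(z, n) - math.log(z)) for n in replica_numbers]

for n, err in zip(replica_numbers, errors):
    print(f"n = {n:g}: error = {err:.3e}")
```

The error shrinks roughly in proportion to n, as expected from the expansion $Z^n = 1 + n \ln Z + O(n^2)$.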

Using (C2.5.A2)-(C2.5.A4) the required average can be carried out. This leads to a complicated expression for 
$\overline{Z^n}$, where the bar indicates the average over the quenched random variables $\lambda_i$. In terms of the order parameters 

$$q_{ab}(\mathbf{r}, \mathbf{r}') = \int ds\, \delta(\mathbf{r}_a(s) - \mathbf{r})\, \delta(\mathbf{r}_b(s) - \mathbf{r}')$$    (C2.5.A6) 

and 

$$\rho_a(\mathbf{r}) = \int ds\, \delta(\mathbf{r}_a(s) - \mathbf{r})$$    (C2.5.A7) 

(a and b are replica indices) the expression for $\overline{Z^n}$ becomes 


where 


Z" = f Dg^ir.r^Dq^ir.ryDpJryD^otrjcxplH^] (C2.5.A8) 


H?K = G[q a b.q a t,.p a *$ a ) -Hnft^A,^} (C2.5.A9) 
with 


and 




(C2.5.A10) 


In(C2.5A10)# r {r^)}is 

and 

$$\omega_3' = \omega_3 - 3\beta^2\lambda^2.$$    (C2.5.A12) 


The path integrals in (C2.5.A10) may be evaluated using the spectrum of the effective n-body Hamiltonian 

in the limit $n \to 0$. If N is very large then we can use ground state dominance to evaluate the spectrum of $H_n$. 
This gives 

and 

<■<**. A.) - «p[- * min {(* I W, |VJ - ff t (!* I V) - I }] (C2 5 A14) 

where $E_0$ is the ground state energy of $H_n$. GLO evaluated the integral over $q_{ab}$ (C2.5.A6) by a saddle point 
approximation which leads to 

[q^ir, r' r ) = -/J 2 ) L 2 p„{r)pfc(r') i (C2.5.A15) 
From the above equation it follows that in the mean-field limit replica symmetry is not broken. This makes the 
GLO model conceptually simpler to interpret than the random bond heteropolymer model discussed in 
appendix C2.5.B. 

The total wavefunction $\Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n)$ is written as a product of single-particle functions (Hartree 
approximation). The various integrals are evaluated in the saddle point approximation. A simple Gaussian form for 
the trial one-particle wavefunction 


$$\psi(\mathbf{r}) = \left(\frac{1}{2\pi R^2}\right)^{d/4} \exp\left(-\frac{r^2}{4R^2}\right)$$    (C2.5.A16) 


is chosen with R being the single variational parameter. Upon performing the Gaussian integrals the free energy per 
monomer f becomes 

PJ = && + <2^y 2 F + Q (CZ5A17) 

where 

,., > C2.5.A18 

At low temperatures the shape of the chain is determined by the sign of the first term in (C2.5.A18). If the sign is 
negative then the positive four-body term is required for a stable theory. 

The phase diagram of the random hydrophobic-hydrophilic model is complicated and depends on the value of $\lambda_0$ [11]. We 
only describe the hydrophilic case when $\lambda_0$ is positive. In this case there is a first-order transition to a collapsed 
state ($R \sim N^{1/d}$) induced by the negative three-body term. GLO have pointed out that this transition is neither the 
usual $\Theta$-point nor is it a freezing temperature because there is no replica symmetry breaking. In fact, this collapse 
transition resembles that seen in proteins where it is suspected that it is a first-order transition. The microscopic 
origin of the first-order transition upon collapse of polypeptide chains is not fully understood. Recent arguments 
suggest that it could arise because the burial of hydrophobic residues and the accommodation of the hydrophilic 
ones at the surface of proteins in water requires some work and perhaps this assembly happens in a discontinuous 
manner. 


APPENDIX C2.5.B 

In section C2.5.2 we considered a variational-type theory to treat the thermodynamics of the random hydrophobic- 
hydrophilic heteropolymer. Here we describe a limiting behaviour of the random bond model [30]. 
In this appendix we show that the random bond model in the compact phase is identical to the random energy 
model (REM). Historically, REM was proposed as a caricature for proteins on phenomenological grounds [13]. 
The heteropolymer with random bond interactions was treated using a variational theory which suggested that 
when the disorder increases beyond a limiting value the chain undergoes a thermodynamic glass transition. The 
nature of this transition is closely related to Potts glasses. 


The random-bond heteropolymer is described by a Hamiltonian similar to (C2.5.A3) except that the short-range 
two-body term $v_{ij}$ is taken to be random with a Gaussian distribution. In this case a three-body term with a positive 
value of $\omega_3$ is needed to describe the collapsed phase. The Hamiltonian is 

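$$H = \frac{1}{2}\sum_{i \neq j} (v_0 + v_{ij})\,\delta(\mathbf{r}_i - \mathbf{r}_j) + \frac{\omega_3}{6}\sum_{i \neq j \neq k} \delta(\mathbf{r}_i - \mathbf{r}_j)\,\delta(\mathbf{r}_j - \mathbf{r}_k)$$    (C2.5.B1) 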

The distribution of the random couplings is given by 

$$P(v_{ij}) = \frac{1}{\sqrt{2\pi v^2}} \exp\left(-\frac{v_{ij}^2}{2v^2}\right).$$    (C2.5.B2) 

In the collapsed phase the monomer density $\rho = N/R^d$ is constant (for large N). Thus, the only conformation 
dependent term in (C2.5.B1) comes from the random two-body term. Because this term is a linear combination of 
Gaussian variables we expect that its distribution is also Gaussian and, hence, can be specified by the two 
moments. Let us calculate the correlation $\overline{E_1 E_2}$ between the energies $E_1$ and $E_2$ of two conformations 
$\{\mathbf{r}_i^{(1)}\}$ and $\{\mathbf{r}_i^{(2)}\}$ of the chain in the collapsed state. The mean square of $E_1$ is 

i J=T2><"- r '" ) = T w * <C25B3) 

'■J 

which is independent of the collapsed conformation. Similarly, we have 


,.2 


2 

,2 


= vl]^ (r ' , "' ) 


r.r' 
where $q_{12}(\mathbf{r}, \mathbf{r}')$ is the overlap between the two conformations. Because 

$$\sum_{\mathbf{r}, \mathbf{r}'} q_{12}(\mathbf{r}, \mathbf{r}') = N$$    (C2.5.B4) 

and since the monomer density is constant we have $q_{12}(\mathbf{r}, \mathbf{r}') = \rho^2/N$. This implies 

~E^~2 = y *T, (C2.5.B5) 

Thus the joint probability is 

(C2.5.B6) 



which is equivalent to the behaviour in uncorrelated REM. Thus, it is not a surprise that in large dimensions (which 
are captured by variational type treatments) the random bond heteropolymer model yields exactly the same result 
as the REM. 


C2.6 Colloids 

Jeroen S van Duijneveldt 


C2.6.1 INTRODUCTION 


C2.6.1.1 CLASSIFICATION OF COLLOIDS 


The term colloid refers to systems where one phase is finely divided in another phase, with at least one of the 
dimensions in the range of about 1 nm to 1 µm. This encompasses a wide variety of systems, some of which will be 
mentioned below. In a narrower sense, the word colloid is often used to denote systems consisting of solid particles 
(or liquid droplets) suspended in a liquid. This contribution will mainly focus on such systems. On the one hand, 
these particles are (significantly) larger than the solvent molecules. On the other hand, they are sufficiently small to 
remain suspended and undergo vivid Brownian motion (after the British botanist Robert Brown, who published his 
observations on aqueous pollen suspensions in 1827). The term colloid (after the Greek word for 'glue') was 
coined by Thomas Graham in the 1860s, to denote substances such as gelatin, albumin and gums. In a solution, 
these would not pass a dialysis membrane. 

First of all, a general classification can be made depending on the nature of the continuous and suspended phases: 
gas, liquid, or solid. The names of the corresponding colloidal systems are summarized in table C2.6.1. 
Traditionally, following Kruyt [1], colloids are further classified as either reversible or irreversible, depending on 
whether they redisperse spontaneously when they are added to a solvent. Polymer and micellar solutions would be 
reversible, for instance, whereas suspensions and emulsions would usually be irreversible. These terms are more or 
less equivalent to the terms lyophilic (solvent-loving) and lyophobic (solvent-hating), respectively, which are also 
used widely. Many systems encountered in technology or in nature are colloids. Some examples are given in table 
C2.6.2 . 

Table C2.6.1 Classification of colloidal systems. 

                            Dispersion medium 
Dispersed phase    Gas        Liquid        Solid 
Gas                -          Foam          Solid foam 
Liquid             Aerosol    Emulsion      Solid emulsion 
Solid              Aerosol    Suspension    Solid dispersion 


Table C2.6.2 Some practical examples of colloidal systems. 

Aerosols          Inks 
Agrochemicals     Milk 
Blood             Paints 
Carbon black      Pastes 
Cosmetics         Polymer solutions 
Drilling muds     Protein solutions 
Fog               Soils 
Ice-cream         Viruses 


C2.6.1.2 SCOPE 

In practice, e.g., in nature or in formulated products, colloidal suspensions (also denoted sols or dispersions) tend to 
be complex systems, consisting of many components that are often not very well defined, in terms of particle size 
for instance. Much progress has been made in the understanding of colloidal suspensions by studying well defined 
model systems, which allow for a quantitative modelling of their behaviour. Such systems will be discussed here. 

Although the remainder of this contribution will discuss suspensions only, much of the theory and experimental 
approaches are applicable to emulsions as well (see [2] for a review). Some other colloidal systems are treated 
elsewhere in this volume. Polymer solutions are an important class (see section C2.1). For surfactant micelles, see 
section C2.3. The special properties of certain particles at the lower end of the colloidal size range are discussed in 
section C2.17. 

C2.6.1.3 COLLOIDS AS ATOMS 

In addition to their practical importance, colloidal suspensions have received much attention from chemists and 
physicists alike. This is an interesting research area in its own right, and it is an important aspect of what is referred 
to as soft condensed matter physics. This contribution is written from such a perspective, and although a balanced 
account is aimed for, it is inevitably biased by the author's research interests. References to the original literature 
are included, but within the scope of this contribution only a fraction of the vast amount of literature on colloidal 
suspensions can be mentioned. 

Colloidal particles can be seen as large, model 'atoms'. In what follows we assume that particles with a typical 
radius a = 100 nm are studied, about $10^3$ times as large as atoms. Usually, the solvent is considered to be a 
homogeneous medium, characterized by bulk properties such as the density $\rho$ and dielectric constant $\varepsilon$. A full 
statistical mechanical description of the system would involve all colloid and solvent degrees of freedom, which 
tends to be intractable. Instead, the potential of mean force, V, is used, in which the interactions between colloidal 
particles are averaged over all solvent degrees of freedom [3, 4]. Usually, V is written as a sum of pair potentials. Its 
equivalent for an atomic system is the potential energy. Analogously, the osmotic pressure $\Pi$ replaces the pressure p. 
As a consequence of this colloid-atom analogy, for instance, colloidal suspensions at low concentrations obey 
van 't Hoff's law, $\Pi = nkT$, the equivalent of the ideal gas law. At higher concentrations, colloids can display phase 
behaviour similar to that of simple liquids, including colloidal gas, liquid, and crystal phases, that differ in the 
arrangement of the particles within the solvent. 
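
To get a feeling for the magnitudes implied by van 't Hoff's law, $\Pi = nkT$, the following sketch evaluates the osmotic pressure of a dilute suspension. The chosen volume fraction (1%), temperature (298.15 K) and the function names are illustrative assumptions, and the number density is obtained from the volume fraction of spheres, $\phi = \frac{4}{3}\pi a^3 n$.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def number_density(phi, radius):
    """Number density (1/m^3) of spheres of given radius (m) at volume fraction phi."""
    return phi / (4.0 / 3.0 * math.pi * radius**3)


def vant_hoff_pressure(n, temperature=298.15):
    """Ideal osmotic pressure (Pa) for number density n (1/m^3)."""
    return n * K_B * temperature


# Assumed example: a = 100 nm spheres at a volume fraction of 0.01.
n = number_density(0.01, 100e-9)
osmotic_pressure = vant_hoff_pressure(n)
print(f"n = {n:.3e} m^-3, osmotic pressure = {osmotic_pressure:.3e} Pa")
```

Under these assumptions the osmotic pressure is only of the order of 0.01 Pa, many orders of magnitude below that of an atomic gas at ambient conditions, simply because the number density of colloidal particles is so low.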

Model colloids have a number of properties that make them experimentally convenient and interesting systems to 
study. For instance, the timescale for 'structural relaxation' of a colloidal fluid can be estimated as the time for a 
particle to diffuse a distance equal to its radius, 

$$t_R \approx \frac{a^2}{D}$$ 

where the Stokes diffusion coefficient of a sphere in a liquid of viscosity $\eta$ is given by 

$$D = \frac{kT}{6\pi\eta a}.$$    (C2.6.1) 

This typically yields $t_R$ of order 0.01 s. 
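
The estimate can be reproduced with a few lines of code. This is a minimal sketch, assuming a water-like viscosity of 1 mPa s and a temperature of 298.15 K (neither value is specified in the text); it uses the relaxation time $t_R = a^2/D$ with the Stokes diffusion coefficient $D = kT/(6\pi\eta a)$.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def stokes_diffusion_coefficient(radius, viscosity, temperature=298.15):
    """Diffusion coefficient (m^2/s) of a sphere: D = kT / (6 pi eta a)."""
    return K_B * temperature / (6.0 * math.pi * viscosity * radius)


def structural_relaxation_time(radius, viscosity, temperature=298.15):
    """Time (s) for a particle to diffuse a distance equal to its radius."""
    return radius**2 / stokes_diffusion_coefficient(radius, viscosity, temperature)


# Assumed example: a = 100 nm particle in a water-like solvent (eta = 1 mPa s).
a = 100e-9
eta = 1.0e-3
D = stokes_diffusion_coefficient(a, eta)
t_R = structural_relaxation_time(a, eta)
print(f"D = {D:.2e} m^2/s, t_R = {t_R:.2e} s")
```

With these inputs the relaxation time comes out at a few milliseconds, consistent with the order of magnitude quoted above. Because $t_R$ scales as $a^3$, doubling the radius lengthens it eightfold.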

Due to the particle size, a colloidal crystal is much weaker than a normal solid material: the elastic moduli are 
proportional to the number density n, and therefore a colloidal solid would be about $10^9$ times weaker than its 
atomic equivalent. The weakness of a colloidal solid means that a crystal can be broken up easily, by shaking the 
sample, for instance. The slow structural relaxation means that non-equilibrium behaviour is experimentally 
accessible, such as crystallization kinetics and glass or gel formation. Indeed, many colloidal systems probably 
never reach thermodynamic equilibrium on the timescale of experiments. Furthermore, as shown below, the 
interaction potential between colloidal particles can be tuned by varying the surface chemistry of the particles and 
the solvent conditions. 

In the theory of the liquid state, the hard-sphere model plays an important role. For hard spheres, the pair 
interaction potential $V(r) = \infty$ for r < d, where d is the particle diameter, whereas V(r) = 0 for r > d. The structure 
of a simple fluid, such as argon, is very similar to that of a hard-sphere fluid. Hard-sphere atoms do, of course, not 
exist. Certain model colloids, however, come very close to hard-sphere behaviour. These systems have been 
studied in much detail and some results will be quoted below. 

C2.6.1.4 OUTLINE 

The remainder of this contribution is organized as follows. In section C2.6.2 , some well studied colloidal model 
systems are introduced. Methods for characterizing colloidal suspensions are presented in section C2.6.3 . An 
essential starting point for understanding the behaviour of colloids is a description of the interactions between 
particles. Various factors contributing to these are discussed in section C2.6.4 . Following on from this, theories of 
colloid stability and of the kinetics of aggregation are presented in section C2.6.5 . Finally, section C2.6.6 is 
devoted to the phase behaviour of concentrated suspensions. 


C2.6.2 MODEL COLLOIDS 

A huge variety of model colloids have been studied. In this section we will highlight a few of these, of particular 
interest to the discussion of concentrated suspensions in section C2.6.6 . 


C2.6.2.1 POLYDISPERSITY 


Even when carefully prepared, model colloids are almost never perfectly monodisperse. The spread in particle 
sizes, or polydispersity, is usually expressed as the relative width of the size distribution, 

$$\sigma = \frac{s_a}{\bar{a}}$$ 

where $s_a$ denotes the standard deviation of a and $\bar{a}$ its mean. So-called monodisperse model systems tend to have 
polydispersities ranging from about $\sigma$ = 0.01 to about 0.2. Suspensions encountered in practice tend to be much more 
polydisperse 
than that. When performing accurate quantitative experiments, the polydispersity needs to be taken into account. In 
some cases, such as in the formation of colloidal crystals (see section C2.6.6 ), the qualitative behaviour may also 
depend sensitively on the polydispersity. 
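
As an illustration, the polydispersity of a sample can be estimated from a list of measured radii, for instance from electron micrographs. The radii below are made-up numbers, and the use of the sample standard deviation (with n - 1 in the denominator) is a choice made for this sketch rather than something prescribed by the text.

```python
import statistics


def polydispersity(radii):
    """Relative width of the size distribution: standard deviation / mean."""
    return statistics.stdev(radii) / statistics.mean(radii)


# Hypothetical particle radii in nm.
radii_nm = [98.0, 101.0, 100.0, 99.0, 102.0]
sigma = polydispersity(radii_nm)
print(f"polydispersity = {sigma:.4f}")
```

In practice many more particles would have to be measured to obtain a reliable value, especially for the narrow distributions typical of model systems.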

C2.6.2.2 INORGANIC PARTICLES 

Traditionally, most model studies were carried out using inorganic colloids, for instance gold and sulphur sols. A 
variety of particle types and shapes are provided by hydrous metal oxides [5] and silica [6]. Inorganic model 
suspensions are usually made using a nucleation and growth process. By controlling the nucleation step, 
monodisperse suspensions can be obtained. For instance, monodisperse silica spheres can be obtained by 
hydrolyzing alkoxysilanes in alcoholic solution [7]. Alternatively, seed particles can be prepared using 
microemulsions and then grown to the required size, resulting in very monodisperse suspensions (see figure 
C2.6.1). Near hard-sphere silica suspensions can then be obtained by coating the particles with a chemically grafted 
polymer layer and suspending them in organic solvents [8]. 

Figure C2.6.1. SEM image of silica spheres of radius a = 75 nm and polydispersity $\sigma$ < 0.01 (courtesy of Professor 
A van Blaaderen). 


C2.6.2.3 POLYMER LATICES 


An important step in the progress of colloid science was the development of monodisperse polymer latex 
suspensions in the 1950s. These are prepared by emulsion polymerization, which is nowadays also carried out 
industrially on a large scale for many different polymers. Perhaps the best-studied colloidal model system is that of 
polystyrene (PS) latex [9]. This is prepared with a hydrophilic group (such as sulphate) at the end of each molecule. 
In water this produces well defined spheres with a number of end groups at the surface, which (partly) ionize to 
produce charged particles. In aqueous suspensions, near hard-sphere behaviour can be obtained by adding 
sufficient salt to screen the electrostatic repulsions (see section C2.6.4). 

Another model system consists of polymethylmethacrylate (PMMA) latex, stabilized in organic solvents by a 
'comb' polymer, consisting of a PMMA backbone with poly-12-hydroxystearic acid (PHSA) chains attached to it 
[10]. The PHSA chains form a steric stabilization layer at the surface (see section C2.6.4). Such particles can 
approach the hard-sphere model very well [11]. 

C2.6.2.4 PARTICLES WITH UNUSUAL PROPERTIES 

In addition to the 'standard' model systems described above, more exotic particles have been prepared with certain 
unusual properties, of which we will mention a few. For instance, using seeded growth techniques, particles have 
been developed with a silica shell which surrounds a core of a different composition, such as particles with 
magnetic [12], fluorescent [13] or gold cores [14]. Another example is that of spheres of polytetrafluoroethylene 
(PTFE), which are optically anisotropic because the core is crystalline [15]. 

A different class, in between polymer latices and polymer solutions, is that of microgels, consisting of weakly 
crosslinked polymer networks. Just as for polymer solutions, small changes in the solvency conditions may have 
large effects on their behaviour. They tend to undergo a swelling-deswelling transition, which for sufficiently weakly 
crosslinked particles may result in a particle size change by a factor of 5 [16]. 

C2.6.2.5 NON-SPHERICAL COLLOIDS 

Although the majority of studies on model colloids involve (quasi-) spherical particles, there is a growing interest 
in the properties of non-spherical colloids. These tend to be either rod-like or plate-like. 

One model for rod-like colloids is the tobacco mosaic virus (TMV), which consists of rods of diameter D about 18 
nm and length L of 300 nm [17, 18]. These colloids have the advantage of being quite monodisperse, but are hard 
to obtain in large amounts. The fd virus gives longer, semi-flexible rods (L = 880 nm, D = 9 nm) [18, 19]. Inorganic 
boehmite rods have also been prepared successfully [20]. 

The major class of plate-like colloids is that of clay suspensions [21]. Many of these swell in water to give a stack 
of parallel, thin sheets, stabilized by electrical charges. Natural clays tend to be quite polydisperse. The synthetic 
clay laponite is comparatively well defined, consisting of discs of about 1 nm in thickness and 25 nm in diameter. It 
has been used in a number of studies (e.g. [22]). 

C2.6.2.6 PURIFICATION 

After preparation, colloidal suspensions usually need to undergo purification procedures before detailed studies can 
be carried out. A common technique for charged particles (typically in aqueous suspension) is dialysis, to deal with 
ionic impurities and small solutes. More extensive deionization can be achieved using ion exchange resins. 

Another standard method is to use a (high-speed) centrifuge to sediment the colloids, replace the supernatant and 
redisperse the particles. Provided the particles are well stabilized in the solvent, this allows for a rigorous 
purification. Larger objects, such as particle aggregates, can be fractionated off because they settle first. A third 
method is (ultra)filtration, whereby larger impurities can be retained, particularly using membrane filters with 
accurately defined pore sizes. 


C2.6.3 PROPERTIES AND CHARACTERIZATION METHODS 

Even when well defined model systems are used, colloids are rather complex, when compared with pure molecular 
compounds, for instance. As a result, one often has to resort to a wide range of characterization techniques to 
obtain a sufficiently comprehensive description of a sample being studied. This section lists some of the most 
common techniques used for studying colloidal suspensions. Some of these techniques are discussed in detail 
elsewhere in this volume and will only be mentioned in passing. A few techniques that are relevant more 
specifically for colloids are introduced very briefly here, and a few advanced techniques are highlighted. 

Although the behaviour of colloidal suspensions does in general depend on temperature, a more important control 
parameter in practice tends to be the particle concentration, often expressed as the volume fraction $\phi$. In fact, for 
hard-sphere suspensions the phase behaviour is determined by $\phi$ only. For spherical particles $\phi = \frac{4}{3}\pi a^3 n$. 
In practice, there are various ways by which $\phi$ can be determined for a given sample, and the results may be 
(slightly) different. In particular, for sterically stabilized particles, the effective hard-sphere volume fraction will be 
different from the value obtained from the total solid content. 
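
The difference between the core and the effective hard-sphere volume fraction can be sizeable. The sketch below assumes, purely for illustration, that a steric layer of thickness `layer_thickness` simply increases the hard-sphere radius from a to a + layer_thickness, so that the effective volume fraction scales as $(1 + \delta/a)^3$; this simple geometric picture and all numerical values are assumptions, not results from the text.

```python
import math


def volume_fraction(n, radius):
    """Volume fraction of spheres: phi = (4/3) pi a^3 n."""
    return 4.0 / 3.0 * math.pi * radius**3 * n


def effective_volume_fraction(phi_core, radius, layer_thickness):
    """Effective hard-sphere volume fraction for core radius plus a steric layer."""
    return phi_core * (1.0 + layer_thickness / radius) ** 3


# Assumed example: 1e20 particles per m^3, core radius 100 nm, 10 nm steric layer.
phi_core = volume_fraction(1.0e20, 100e-9)
phi_eff = effective_volume_fraction(phi_core, 100e-9, 10e-9)
print(f"phi_core = {phi_core:.3f}, phi_eff = {phi_eff:.3f}")
```

Even a layer that is only 10% of the core radius raises the effective volume fraction by about a third, which matters when comparing with hard-sphere phase boundaries.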

C2.6.3.1 OBSERVATION 

Straightforward, direct observation is generally very useful to assess suspension stability, phase separations, etc. 
Light microscopy (see section B1.19) can, under some conditions, image particles directly. Often, however, this is 
prevented by sample turbidity or insufficient resolution. An enhanced resolution can be obtained by preparing 
core-shell particles with a fluorescent core. Even when the particles are touching, the cores can still be resolved using 
confocal scanning laser microscopy (CSLM), allowing for the determination of three-dimensional structures in 
dense suspensions [23] (see figure C2.6.2). 

Electron microscopy (see section B1.18) is very valuable in characterizing particles (see, for instance, figure 
C2.6.1). The suspension structure is, of course, not represented well because of the vacuum conditions in the 
microscope. This can be overcome using environmental SEM [24]. 



Figure C2.6.2. CSLM image of near hard-sphere silica particles of diameter d = 1050 nm with a fluorescent core 
of diameter 400 nm, showing fcc stacking (top), hcp stacking (bottom middle) and amorphous areas (image size 
16.3 µm × 16.3 µm, courtesy of Professor A van Blaaderen). 


C2.6.3.2 GENERAL PROPERTIES 

Because model colloids tend to have a rather well defined chemical composition, elemental analysis can be used to 
obtain detailed information, such as the grafted amount of polymer in the case of sterically stabilized particles. 
More details about the chemical structure can be obtained using NMR techniques (section B1.13). In addition, 
NMR relaxation techniques can be used to quantify properties such as polymer adsorption to the particle surface (see 
section B1.13). Often further details, such as particle density and refractive index, need to be determined as well. 

C2.6.3.3 SCATTERING TECHNIQUES 

Because colloidal particles typically have a size similar to that of the wavelength of light, light scattering and 
diffraction are very useful in characterizing their suspensions. In particular, for small particles, x-ray and neutron 
scattering are employed as well (see section B1.9). Photon correlation spectroscopy (PCS) is a standard technique 
for particle size determination. Static light scattering is also used for this, and to characterize the structure of 
suspensions. In an indirect way, both techniques can yield information on particle interactions as well. 

C2.6.3.4 PARTICLE INTERACTIONS 

The interactions between colloidal particles (see section C2.6.4 ) are central to the understanding of suspension 
behaviour. Although most work has had to rely on rather indirect ways to characterize these interactions, novel 
techniques are emerging that access these interactions more directly. 

Particles can be manipulated in suspension using strongly focused laser beams ('optical tweezers') [25] or magnetic 
fields [26]. By collecting statistics on the particle movements using video microscopy, information on the 
particle interactions can be obtained. 

Surfaces can be characterized using scanning probe microscopies (see section B1.19). In addition, by attaching a 
colloidal particle to the tip of an atomic force microscope, colloidal interactions can be probed as well [27]. 
Interactions between surfaces can be studied using the surface force apparatus (see section B1.20). This also helps 
one to understand the interactions between colloidal particles. 

C2.6.3.5 RHEOLOGY 

The study of the rheology, or flow behaviour, of suspensions is an important method for characterizing particle 
interactions. Controlling the rheological properties of suspensions is also crucial in practice, where during 
processing and in the final application the flow behaviour usually has to be within fairly narrow specifications. This 
field is only touched upon here; for more details consult [28, 29, 30 and 31], or the general colloid science texts 
[32, 33 and 34]. For Newtonian fluids, the shear stress (force/area) $\tau$ is proportional to the shear rate (velocity 
gradient), $\dot{\gamma}$, 

$$\tau = \eta\dot{\gamma}.$$    (C2.6.2) 

For dilute dispersions of hard spheres, Einstein's viscosity equation predicts 

$$\frac{\eta}{\eta_0} = 1 + 2.5\phi + \cdots$$ 

where $\eta_0$ denotes the solvent viscosity. For concentrated hard-sphere suspensions, experimental data can be 
correlated using the Krieger-Dougherty equation, 

$$\frac{\eta}{\eta_0} = \left[1 - \frac{\phi}{\phi_{\max}}\right]^{-[\eta]\phi_{\max}}$$ 

where $[\eta]$ is called the intrinsic viscosity and the viscosity diverges at a concentration $\phi_{\max}$. At low shear rate, 
$\phi_{\max}$ = 0.63 (similar to the random close packing density; see section C2.6.6.2) and $[\eta]$ = 3.13. The Krieger-Dougherty 
equation is also widely used to correlate data for other types of suspensions. 
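
The two viscosity relations are easily compared numerically. The sketch below uses the low-shear parameters quoted above (maximum volume fraction 0.63 and intrinsic viscosity 3.13) as defaults; the function names and the sampled volume fractions are illustrative choices.

```python
def einstein_relative_viscosity(phi):
    """Dilute-limit relative viscosity eta/eta_0 = 1 + 2.5 phi."""
    return 1.0 + 2.5 * phi


def krieger_dougherty_relative_viscosity(phi, phi_max=0.63, intrinsic_viscosity=3.13):
    """Relative viscosity eta/eta_0 = [1 - phi/phi_max]**(-[eta] * phi_max)."""
    if not 0.0 <= phi < phi_max:
        raise ValueError("phi must satisfy 0 <= phi < phi_max")
    return (1.0 - phi / phi_max) ** (-intrinsic_viscosity * phi_max)


for phi in (0.01, 0.1, 0.3, 0.5, 0.6):
    print(
        f"phi = {phi:4.2f}: Einstein = {einstein_relative_viscosity(phi):6.3f}, "
        f"Krieger-Dougherty = {krieger_dougherty_relative_viscosity(phi):8.3f}"
    )
```

Note that with a fitted intrinsic viscosity of 3.13 the Krieger-Dougherty form does not reduce exactly to Einstein's coefficient of 2.5 at low concentration; it is a correlation of data over the whole concentration range rather than an exact expansion.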

Colloidal dispersions often display non-Newtonian behaviour, where the proportionality in equation (C2.6.2) does 
not hold. This is particularly important for concentrated dispersions, which tend to be used in practice. Equation 
(C2.6.2) can be used to define an apparent viscosity, $\eta_{\mathrm{app}}$, at a given shear rate. If $\eta_{\mathrm{app}}$ decreases with increasing 
shear rate, the dispersion is called shear thinning (pseudoplastic); if it increases, this is known as shear thickening 
(dilatant). The latter behaviour is typical of concentrated suspensions. If a finite shear stress has to be applied 
before the suspension begins to flow, this is known as the yield stress. The apparent viscosity may also change as a 
function of time, upon application of a fixed shear rate, related to the formation or breakup of particle networks. 
Thixotropic dispersions show a decrease in $\eta_{\mathrm{app}}$ with time, whereas an increase with time is called rheopexy. 

C2.6.3.6 SEDIMENTATION AND DIFFUSION 

In most colloidal suspensions the particles have a tendency to sediment. At infinite dilution, spherical particles with 
a density difference $\Delta\rho$ with the solvent will move at the Stokes velocity 

$$U_0 = \frac{2a^2 \Delta\rho\, g}{9\eta}.$$ 

At finite concentration, the settling rate is influenced by hydrodynamic interactions between the particles. For 
purely repulsive particle interactions, settling is hindered. Attractive interactions encourage particles to settle as a 
group, which increases the settling rate. For hard spheres, the first-order correction to the Stokes settling rate is 
given by [33] 

$$\frac{U}{U_0} = 1 - 6.55\phi.$$ 
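
As a rough numerical illustration, the sketch below evaluates the Stokes velocity, $2a^2\Delta\rho\,g/(9\eta)$, and the first-order hard-sphere correction, $1 - 6.55\phi$. The density difference of 1000 kg/m^3 and the water-like viscosity are assumed values chosen only to set the scale.

```python
G = 9.81  # gravitational acceleration in m/s^2


def stokes_velocity(radius, density_difference, viscosity, g=G):
    """Settling velocity (m/s) of an isolated sphere: 2 a^2 drho g / (9 eta)."""
    return 2.0 * radius**2 * density_difference * g / (9.0 * viscosity)


def hindered_velocity(u0, phi):
    """First-order hard-sphere correction: U = U0 (1 - 6.55 phi), dilute only."""
    return u0 * (1.0 - 6.55 * phi)


# Assumed example: a = 100 nm, density difference 1000 kg/m^3, eta = 1 mPa s.
u0 = stokes_velocity(100e-9, 1000.0, 1.0e-3)
u = hindered_velocity(u0, 0.01)
print(f"U0 = {u0:.2e} m/s ({u0 * 86400 * 1e3:.2f} mm per day), U(phi=0.01) = {u:.2e} m/s")
```

Under these assumptions an isolated particle settles by roughly 2 mm per day, and a volume fraction of only 1% already reduces the rate by more than 6%. The linear correction is meaningful only for small volume fractions.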

The tendency for particles to settle is opposed by their Brownian diffusion. The number density distribution of 
particles as a function of height z will tend to an equilibrium distribution. At low concentration, where van 't Hoff's 
law applies, the barometric height distribution is given by 

$$n(z) = n(0) \exp\left(-\frac{mgz}{kT}\right)$$ 

where m is the buoyant mass of the particles and g the gravitational acceleration. Perrin used this to 
determine Avogadro's number [35]. 
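
The height scale of the barometric distribution is the ratio kT/(mg), which can be estimated as follows. The particle radius, density difference and temperature in this sketch are assumed values, and the helper names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K
G = 9.81  # gravitational acceleration in m/s^2


def buoyant_mass(radius, density_difference):
    """Buoyant mass (kg) of a sphere: (4/3) pi a^3 times the density difference."""
    return 4.0 / 3.0 * math.pi * radius**3 * density_difference


def gravitational_length(mass, temperature=298.15, g=G):
    """Height (m) over which the number density decays by a factor e: kT / (m g)."""
    return K_B * temperature / (mass * g)


def relative_density(z, length):
    """Barometric profile n(z)/n(0) = exp(-z / length)."""
    return math.exp(-z / length)


# Assumed example: a = 100 nm, density difference 1000 kg/m^3.
m = buoyant_mass(100e-9, 1000.0)
l_g = gravitational_length(m)
print(f"buoyant mass = {m:.2e} kg, gravitational length = {l_g * 1e3:.3f} mm")
```

For these particles the decay length is about 0.1 mm, so the equilibrium profile is confined to a thin layer near the bottom of a typical sample; because the buoyant mass scales as $a^3$, particles ten times smaller would have a decay length a thousand times larger.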

Given the $a^2$ size dependence of the settling rate, sedimentation can be used for particle size analysis. Indeed, a 
quick impression of particle size (or degree of aggregation of primary particles) is often obtained from the settling 
behaviour of dilute suspensions. A quantitative analysis of particle sizes can be carried out using the analytic 
ultracentrifuge (see, for instance, [34]). 

In practice, sedimentation is an important property of colloidal suspensions. In formulated products, sedimentation 
tends to be a problem and some products are shipped in the form of weak gels, to prevent settling. On the other 
hand, in applications such as water clarification, a rapid sedimentation of impurities is desirable. 

C2.6.3.7 ELECTROKINETIC PHENOMENA 

In polar solvents in particular, the surface of a colloidal particle tends to be charged. As will be discussed in 
section C2.6.4.2, this has a large influence on particle interactions. A few key concepts are introduced here. For 
more details, see [32] (ch 13), [33] (ch 7), [36] (ch 4) and [34] (ch 12). The presence of these surface charges gives 
rise to a number of electrokinetic phenomena, in particular electrophoresis. 

In electrophoresis, the motion of charged colloidal particles under the influence of an electric field is studied. For 
spherical particles, we can write 

V = jAtE 

where v is the particle velocity, E the electric field and ju £ is called the electrophoretic mobility. For low surface 
electrostatic potentials, it is given by Henry's equation, 

3?) 

where k is the inverse screening length (see section C2.6.2 ). j /to) increases from 1 at Ka = to 1.5 at Ka — > oo. The 
zeta-potential £ represents the electrostatic potential near the point where the diffuse double layer ( section 
C2.6.4.2 ) starts. 
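The two limits of Henry's equation can be compared directly. This sketch assumes the common form $\mu_E = 2\varepsilon\varepsilon_0\zeta f(\kappa a)/(3\eta)$ and treats the Henry function f as an input between 1 and 1.5 rather than computing it; the zeta-potential, permittivity and viscosity are illustrative assumptions.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F m^-1


def henry_mobility(zeta, eps_r, eta, f=1.0):
    """Electrophoretic mobility 2 eps_r eps0 zeta f / (3 eta), in m^2 V^-1 s^-1.

    f is the value of the Henry function: 1 for kappa*a -> 0 and
    1.5 for kappa*a -> infinity. It is supplied by the caller here.
    """
    return 2.0 * eps_r * EPS0 * zeta * f / (3.0 * eta)


# Assumed example: zeta = 50 mV in a water-like solvent
zeta = 0.050   # V
eps_r = 78.5   # relative permittivity (assumed)
eta = 1.0e-3   # viscosity, Pa s (assumed)

mu_thick = henry_mobility(zeta, eps_r, eta, f=1.0)  # thick double layer limit
mu_thin = henry_mobility(zeta, eps_r, eta, f=1.5)   # thin double layer limit
print(f"{mu_thick:.3e} to {mu_thin:.3e} m^2/(V s)")
```

In practice the calculation is usually run in reverse: a measured mobility is converted into a zeta-potential, and the result can differ by up to 50% depending on which limit applies.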

Related phenomena are electro-osmosis, where a liquid flows past a surface under the influence of an electric field 
and the reverse effect, the streaming potential due to the flow of a liquid past a charged surface. 


C2.6.4 PARTICLE INTERACTIONS 

Many properties of colloidal suspensions, such as their stability, rheology, and phase behaviour, are closely related 
to the interactions between the suspended particles. The background of the most important contributing factors to 
these interactions is discussed in this section. 

C2.6.4.1 VAN DER WAALS INTERACTIONS 

Between any two atoms or molecules, van der Waals (or dispersion) forces act because of interactions between the
fluctuating electromagnetic fields resulting from their polarizabilities (see section A1.5 and, for instance, [37]).
Similarly, van der Waals forces operate between any two colloidal particles in suspension. In the 1930s,
predictions for these interactions were obtained from the pairwise addition of molecular interactions between two
particles [38]. The interaction between two identical spheres of radius a at centre-to-centre separation r is given by

$$V_{vdW}(r) = -\frac{A}{6} \left[ \frac{2a^2}{r^2 - 4a^2} + \frac{2a^2}{r^2} + \ln\left(1 - \frac{4a^2}{r^2}\right) \right]$$ (C2.6.3)

where A is the Hamaker constant, which typically is of order $10^{-20}$ J. Similar equations are obtained for other
geometries [37, 39].

At large separation r, equation (C2.6.3) decays as $V_{vdW}(r) \propto r^{-6}$, just as the van der Waals interactions
between molecules do. However, at large separation, say r > 100 nm, relativistic effects have to be taken into account
and the so-called retarded van der Waals interactions decay as $r^{-7}$.

At short separations, equation (C2.6.3) tends to

$$V_{vdW}(r) \approx -\frac{A a}{12 H}$$ (C2.6.4)

where H = r - 2a is the surface separation. At contact, equation (C2.6.4) would predict an infinitely strong
attraction. In reality, this is prevented by steep Born repulsions at short distances. Nevertheless, the van der Waals
interactions tend to create a deep interaction minimum near r = 2a, strong enough to result in aggregation of
suspended particles, unless a stabilizing mechanism such as electrostatic interactions or steric stabilization is
provided (see section C2.6.2 and section C2.6.3).
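The two limits of the sphere-sphere interaction can be checked numerically. The sketch below assumes the standard Hamaker expression for two equal spheres, $V = -(A/6)[2a^2/(r^2-4a^2) + 2a^2/r^2 + \ln(1-4a^2/r^2)]$, and compares it with the short-range form $-Aa/(12H)$ and with its own large-r limit $-16Aa^6/(9r^6)$. Units are reduced (a = 1, A = 1).

```python
import math


def v_vdw_spheres(r, a, A):
    """Hamaker interaction of two equal spheres (radius a, centre distance r)."""
    x = 4.0 * a * a / (r * r)
    # log1p keeps precision at large r, where the three terms nearly cancel
    return -(A / 6.0) * (
        2.0 * a * a / (r * r - 4.0 * a * a) + 2.0 * a * a / (r * r) + math.log1p(-x)
    )


def v_vdw_short_range(r, a, A):
    """Short-separation limit -A a / (12 H), with H = r - 2a."""
    return -A * a / (12.0 * (r - 2.0 * a))


def v_vdw_long_range(r, a, A):
    """Leading large-separation term -16 A a^6 / (9 r^6)."""
    return -16.0 * A * a**6 / (9.0 * r**6)


ratio_near = v_vdw_spheres(2.001, 1.0, 1.0) / v_vdw_short_range(2.001, 1.0, 1.0)
ratio_far = v_vdw_spheres(100.0, 1.0, 1.0) / v_vdw_long_range(100.0, 1.0, 1.0)
print(f"near contact: {ratio_near:.3f}, far apart: {ratio_far:.4f}")
```

The short-range form is already within a few per cent of the full expression at H = 0.001a, but it overestimates the attraction increasingly as the gap approaches the particle size.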

The Hamaker constant can be evaluated accurately using the continuum theory, developed by Lifshitz and
coworkers [40]. A key property in this theory is the frequency dependence of the dielectric permittivity,
$\varepsilon(\omega)$. If this spectrum were the same for particles and solvent, then A = 0. Since the refractive index n
is also related to $\varepsilon(\omega)$, the van der Waals forces tend to be very weak when the particles and solvent
have similar refractive indices. A few examples of values for A for interactions across vacuum and across water,
obtained using the continuum theory, are given in table C2.6.3.

Table C2.6.3 Hamaker constants A ($10^{-20}$ J) (from [120]).

                     Medium
Material        Vacuum    Water
Water           3.7
Pentane         3.8       0.34
PS              6.6       0.95
PMMA            7.1       1.05
Fused silica    6.6       0.85

More generally, approximate relations can be used to estimate the Hamaker constant for particles 1 and 2, 
suspended in a medium 3, such as 


$$A_{132} \approx \left(\sqrt{A_{11}} - \sqrt{A_{33}}\right)\left(\sqrt{A_{22}} - \sqrt{A_{33}}\right)$$ (C2.6.5)

where $A_{ii}$ is the Hamaker constant for interaction of material i across a vacuum. Although the validity of such
equations is limited, one interesting aspect of equation (C2.6.5) is that van der Waals interactions between two
suspended particles can be repulsive when the suspending medium has a Hamaker constant intermediate between
those of the two particles.
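Both remarks, the limited accuracy and the possibility of repulsion, can be illustrated with a few lines of code. The sketch assumes the combining relation in the form $A_{132} \approx (\sqrt{A_{11}} - \sqrt{A_{33}})(\sqrt{A_{22}} - \sqrt{A_{33}})$ and uses the vacuum values of table C2.6.3 (in units of $10^{-20}$ J). The medium with A = 5.0 in the second example is hypothetical.

```python
import math


def hamaker_132(a11, a22, a33):
    """Approximate Hamaker constant of materials 1 and 2 across medium 3."""
    return (math.sqrt(a11) - math.sqrt(a33)) * (math.sqrt(a22) - math.sqrt(a33))


# Polystyrene across water, from the vacuum values 6.6 (PS) and 3.7 (water)
ps_water = hamaker_132(6.6, 6.6, 3.7)

# Pentane (3.8) and PS (6.6) across a hypothetical medium with A = 5.0
mixed = hamaker_132(3.8, 6.6, 5.0)

print(f"PS/water/PS estimate: {ps_water:.2f} (table value: 0.95)")
print(f"pentane/medium/PS estimate: {mixed:.3f} (negative means repulsive)")
```

The estimate for polystyrene in water comes out at roughly half the tabulated continuum-theory value, which shows why such relations should only be used for rough guidance.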

C2.6.4.2 ELECTROSTATIC INTERACTIONS 

Particularly in polar solvents, electrostatic charges usually have an important contribution to the particle 
interactions. We will first discuss the ion distribution near a single surface, and then the effect on interactions 
between two colloidal particles. 

THE ELECTRICAL DOUBLE LAYER 

Here a few core equations are presented from the simplest theory for the electric double layer: the Gouy-Chapman
theory [41]. We consider a solution of ions of valency $z_+$ and $z_-$ in a medium with dielectric constant
$\varepsilon$. The ions are represented by point charges (they have no size) and it is assumed that the ions undergo
rapid Brownian motion, and their average spatial distribution may be obtained through Boltzmann's distribution from the
electrostatic potential. For simplicity, we restrict ourselves to symmetric electrolytes, $z = z_+ = z_-$. We write the
electrostatic potential $\varphi$ in dimensionless form as

$$\Phi = \frac{z e \varphi}{kT}$$

where e is the elementary charge. A combination of Poisson's law and the Boltzmann distribution gives the
Poisson-Boltzmann equation, which here takes the form

$$\nabla^2 \Phi = \kappa^2 \sinh \Phi$$ (C2.6.6)

and $\kappa$ is given as a function of the bulk electrolyte concentration $c_0$ (in mol m$^{-3}$ when SI units are used) by

$$\kappa = \sqrt{\frac{2 z^2 e^2 N_A c_0}{\varepsilon \varepsilon_0 kT}}.$$ (C2.6.7)

This is an inverse length; $\kappa^{-1}$ is known as the Debye screening length (or double layer thickness). As
demonstrated below, it gives the length scale on which the ion distribution near a surface decays to the bulk value.
Table C2.6.4 gives a few numerical examples.

Table C2.6.4 Debye screening length $\kappa^{-1}$ for aqueous solutions of a 1-1 electrolyte at 298 K (equation (C2.6.7)).

$c_0$ (mol dm$^{-3}$)    $\kappa^{-1}$ (nm)
$10^{-5}$                97
$10^{-3}$                9.7
0.1                      0.97
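Values of this kind follow directly from the definition of the screening length. The sketch below assumes $\kappa^2 = 2z^2e^2N_Ac_0/(\varepsilon\varepsilon_0kT)$ with $c_0$ converted to mol m$^{-3}$, and a relative permittivity of 78.5 for water at 298 K (an assumed value). With this permittivity the results come out close to, but not exactly at, the tabulated numbers.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J K^-1
E = 1.602176634e-19      # elementary charge, C
N_A = 6.02214076e23      # Avogadro constant, mol^-1
EPS0 = 8.8541878128e-12  # vacuum permittivity, F m^-1


def debye_length(c_molar, z=1, eps_r=78.5, T=298.0):
    """Debye screening length (m) for a symmetric z-z electrolyte.

    c_molar is the bulk concentration in mol dm^-3.
    """
    c_si = c_molar * 1000.0  # mol m^-3
    kappa_sq = 2.0 * z**2 * E**2 * N_A * c_si / (eps_r * EPS0 * K_B * T)
    return 1.0 / math.sqrt(kappa_sq)


lengths_nm = {c: debye_length(c) * 1e9 for c in (1e-5, 1e-3, 0.1)}
for c, length in lengths_nm.items():
    print(f"c0 = {c:g} mol/dm^3: 1/kappa = {length:.3g} nm")
```

The inverse square-root dependence on concentration is the practical point: a hundredfold increase in salt reduces the range of the double layer tenfold.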


Surfaces in polar solvents and particularly in water tend to be charged, through dissociation of surface groups or by
adsorption of ions, resulting in a charge density $\sigma$. Near a flat surface, $\Phi$ only depends on the distance x
from the surface. The solution of equation (C2.6.6) then is

$$\Phi(x) = 2 \ln\left(\frac{1 + \gamma e^{-\kappa x}}{1 - \gamma e^{-\kappa x}}\right)$$ (C2.6.8)

where $\gamma = \tanh(\Phi_s/4)$ and the subscript s denotes values at the surface. The corresponding surface charge
density is given by

$$\sigma = \sqrt{8 \varepsilon \varepsilon_0 kT N_A c_0}\, \sinh\left(\frac{\Phi_s}{2}\right).$$ (C2.6.9)


From $\Phi$, the charge distribution can then be calculated using Boltzmann's distribution. An example of this is shown
in figure C2.6.3, which plots the distribution of counterions (of opposite sign to the charged surface) and co-ions (of
the same sign as the surface). More detailed descriptions of the ionic distribution take into account the non-uniform
packing of ions and molecules close to the surface. A significant potential drop may occur across this so-called
Stern layer adjacent to the surface. The potential $\varphi_d$ outside the Stern layer then enters the description of the
diffuse double layer. In practice, $\varphi_d$ is usually equated to the $\zeta$-potential, discussed in section C2.6.3.7.

Figure C2.6.3. Distribution of positive and negative ions ($c_+$, $c_-$) near a flat surface in water at 298 K (equation
(C2.6.8)). Parameters: z = 1, $\varphi_s$ = 70 mV, $c_0$ = 0.1 mol dm$^{-3}$, corresponding to $\sigma$ = -0.068 C m$^{-2}$
(equation (C2.6.9)). The double layer thickness $\kappa^{-1}$ = 0.96 nm is indicated.
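The parameters quoted in the caption can be reproduced with the Gouy-Chapman results for a flat surface. The sketch assumes the forms $\Phi(x) = 2\ln[(1+\gamma e^{-\kappa x})/(1-\gamma e^{-\kappa x})]$ with $\gamma = \tanh(\Phi_s/4)$, and $\sigma = \sqrt{8\varepsilon\varepsilon_0kTN_Ac_0}\,\sinh(\Phi_s/2)$. The relative permittivity of 78.5 is an assumption, and only the magnitude of the charge density is computed.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J K^-1
E = 1.602176634e-19      # elementary charge, C
N_A = 6.02214076e23      # Avogadro constant, mol^-1
EPS0 = 8.8541878128e-12  # vacuum permittivity, F m^-1


def reduced_potential(kappa_x, phi_s_red):
    """Dimensionless potential at reduced distance kappa*x from a flat surface."""
    u = math.tanh(phi_s_red / 4.0) * math.exp(-kappa_x)
    return 2.0 * math.log((1.0 + u) / (1.0 - u))


def surface_charge(phi_s_red, c_molar, eps_r=78.5, T=298.0):
    """Magnitude of the surface charge density (C m^-2), 1-1 electrolyte."""
    n0 = N_A * c_molar * 1000.0  # ions per m^3
    return math.sqrt(8.0 * eps_r * EPS0 * K_B * T * n0) * math.sinh(phi_s_red / 2.0)


T = 298.0
phi_s_red = E * 0.070 / (K_B * T)  # 70 mV in units of kT/e
sigma = surface_charge(phi_s_red, 0.1)

# Ion concentrations relative to bulk, one Debye length from the surface
potential = reduced_potential(1.0, phi_s_red)
counterions = math.exp(potential)
coions = math.exp(-potential)
print(f"|sigma| = {sigma:.3f} C/m^2, counterions x{counterions:.2f}, co-ions x{coions:.2f}")
```

The product of the two relative concentrations is always 1 in this theory, so the counterion excess near the surface is mirrored by a co-ion deficit.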


DOUBLE LAYER INTERACTIONS 

When describing the interactions between two charged flat plates in an electrolyte solution, equation (C2.6.6) 
cannot be solved analytically, so in the general case a numerical solution will have to be used. Several equations 
are available, however, to describe the behaviour in a number of limiting cases (see [ 41 ] for a detailed discussion). 
Here we present two limiting cases for the interactions between two charged spheres, surrounded by their
counterions and added electrolyte, which will be referred to in further sections. This pair interaction $V_R$ is always
repulsive in the theory discussed here.

The first case is relevant in the discussion of colloid stability of section C2.6.5. It uses the potential around a single
sphere in the case of a double layer that is thin compared to the particle, $\kappa a \gg 1$. Furthermore, it is assumed
that the surface separation is fairly large, such that $\exp(-\kappa H) \ll 1$, so the potential between two spheres can be
calculated from the sum of single-sphere potentials. Under these conditions, $V_R$ is approximated by [42]:

$$V_R = \frac{64 \pi a N_A c_0 kT \gamma^2}{\kappa^2} e^{-\kappa H}$$ (C2.6.10)

where $\gamma = \tanh(\Phi_s/4)$ as before. Again, $\kappa^{-1}$ is the length scale on which the interaction decays.

In the second case, a thick double layer, $\kappa a \ll 1$ (low ionic strength), is assumed. When the surface potential is
low, $\Phi_s \ll 1$, a reasonable approximation is given by

$$V_R = 4 \pi \varepsilon \varepsilon_0 a^2 \varphi_s^2 \frac{e^{-\kappa (r - 2a)}}{r}.$$ (C2.6.11)

This r dependence is also known as a Yukawa potential. This type of potential has been used to describe the
behaviour of latex suspensions at low ionic strength.

More sophisticated approaches to describe double layer interactions have been developed more recently. Using cell 
models, the full Poisson-Boltzmann equation can be solved for ordered structures. The approach by Alexander et al 
shows how the effective colloidal particle charge saturates when the 'bare' particle charge is increased [43]. Using 
integral equation methods, the behaviour of the 'primitive model' has been studied, in which all the interactions 
between the colloidal macro-ions and the small ions are addressed (see, for instance, [44, 45]). 

C2.6.4.3 INTERACTIONS DUE TO SOLUBLE POLYMERS 

In many colloidal systems, both in practice and in model studies, soluble polymers are used to control the particle 
interactions and the suspension stability. Here we distinguish three scenarios: interactions between particles bearing 
a grafted polymer layer, forces due to the presence of non-adsorbing polymers in solution, and finally the 
interactions due to adsorbing polymer chains. Although these cases are discussed separately here, in practice more 
than one mechanism may be in operation for a given sample. 

STERIC STABILIZATION 

The first case concerns particles with polymer chains attached to their surfaces. This can be done using chemically 
(end-)grafted chains, as is often done in the study of model colloids. Alternatively, a block copolymer can be used, 
of which one of the blocks (the anchor group) adsorbs strongly to the particles. The polymer chains may vary from 
short alkane chains to high molecular weight polymers (see also section C2.6.2). The interactions between such
'hairy' colloidal particles depend on many parameters, such as the nature of the polymer and the solvent, the
molecular weight and grafting density. For theoretical approaches to describe the resulting behaviour, see [33, 46,
47]. Here a few general observations are made.

For so-called steric stabilization to be effective, the polymer needs to be attached to the particles at a sufficiently
high surface coverage and a good solvent for the polymer needs to be used. Under such conditions, a fairly dense
polymer brush with thickness L will be present around the particles. When two particles of diameter d approach, such
that r < d + 2L, the polymer layers may be compressed from their equilibrium configuration, thus causing a repulsive
interaction. Alternatively, the polymer layers may overlap, which increases the local polymer segment density, also
resulting in a repulsive interaction. Particularly on close approach, r < d + L, a steep repulsion is predicted to occur.
When a relatively low molecular weight polymer is used, the repulsive interactions are rather short-ranged (compared to
the particle size) and the particles display near hard-sphere behaviour (e.g., [11]).

When the solvent quality is reduced, by changing the solvent composition or temperature, for instance, steric 
stabilization may not occur. On close approach, a repulsive interaction will still result, but for partial overlap of the 
polymer layers an attractive interaction may arise. For a number of systems, steric stabilization was found to fail,
resulting in particle aggregation, at the $\theta$ temperature of the polymer [46] (at the $\theta$ temperature, the second
osmotic virial coefficient of the polymer solution is zero; see also section C2.1). The attractions tend to be of short range
compared to the particles themselves. The behaviour of such systems has been modelled using a narrow square 
well potential [48]. In the limit of a very narrow attraction range, Baxter's adhesive sphere ('sticky' sphere) 
potential [ 49 ] is obtained. Many authors have interpreted their observations by modelling the particle interactions 
using this potential. 

NON-ADSORBING POLYMER 

The second case involves non-adsorbing polymer chains in solution. It was realized by Asakura and Oosawa (AO)
[50] and separately by Vrij [51] that these chains will give rise to an effective attraction between colloidal particles.
This is known as depletion attraction (see figure C2.6.4). We will summarize the AO theory to explain this.



Figure C2.6.4. Graphical representation of the AO model. A depletion shell of thickness $\delta$ surrounds each particle.

The colloidal particles are represented by hard spheres with diameter d = 2a, and the polymer coils by spheres with
radius $\delta$. As a guide, $\delta$ is often taken to be equal to $R_g$, the radius of gyration of the polymer. Polymer
molecules are considered not to interact with each other. A polymer 'sphere', however, cannot overlap with a colloidal
particle (the colloidal particles and polymer molecules behave as hard spheres towards each other). This means that the
centre of a polymer coil cannot enter a sphere with radius $a + \delta$, centred on a colloidal particle. In other words,
there is a depletion shell of thickness $\delta$ around each particle. If two particles approach each other, their
depletion zones will overlap when $r < d + 2\delta$. This gives rise to an osmotic pressure imbalance, which results in an
effective attraction between the particles, given by


$$V_{dep}(r) = -\Pi_p V_{overlap}(r) \qquad (d < r < d + 2\delta)$$ (C2.6.12)

where $\Pi_p$ is the polymer osmotic pressure and $V_{overlap}$ is the overlap volume of the excluded spheres of the two
particles. It is given by

$$V_{overlap}(r) = \frac{\pi}{6} d^3 (1 + \xi)^3 \left[ 1 - \frac{3r}{2d(1 + \xi)} + \frac{1}{2}\left(\frac{r}{d(1 + \xi)}\right)^3 \right]$$

where

$$\xi = \frac{2\delta}{d}$$

is the polymer/colloid size ratio. Figure C2.6.5 shows examples of the shape of this potential. A few points are
worth noting about this potential. First, although the net effect is an attraction between the colloids, this is the result
of purely repulsive interactions. Second, this interaction can easily be tuned experimentally: the range ($2\delta$) is set
by the polymer size (molecular weight), whereas the strength can be adjusted by the polymer concentration (at low
concentration, van 't Hoff's law can again be applied: $\Pi_p = n_p kT$). Some results obtained in this way will be
discussed in section C2.6.6.4. For a more advanced discussion of depletion interactions, see [33, 46, 47 and 52].
The depletion picture also applies to other systems, such as mixtures of colloidal particles. However, whereas
neglecting the interactions between polymer molecules may be reasonable, this cannot be done in the general case.
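The AO potential is simple enough to evaluate directly. The sketch below assumes the geometric overlap volume of two spheres of diameter $d + 2\delta$ (a lens-shaped region) and the ideal-polymer osmotic pressure $\Pi_p = n_p kT$, so that the potential in units of kT is minus the polymer number density times the overlap volume. Lengths are in units of d, and the chosen polymer density is an illustrative assumption.

```python
import math


def overlap_volume(r, d, delta):
    """Overlap volume of two depletion spheres (diameter d + 2 delta) at distance r."""
    sigma = d + 2.0 * delta
    if r >= sigma:
        return 0.0
    x = r / sigma
    return math.pi / 6.0 * sigma**3 * (1.0 - 1.5 * x + 0.5 * x**3)


def ao_potential_over_kT(r, d, delta, n_p):
    """AO depletion potential in units of kT, for ideal polymer of number density n_p."""
    if r < d:
        return math.inf  # hard-sphere overlap of the colloids
    return -n_p * overlap_volume(r, d, delta)


d, delta = 1.0, 0.05               # size ratio xi = 2 delta / d = 0.1
xi = 2.0 * delta / d
n_p = 0.1 / (4.0 / 3.0 * math.pi * delta**3)  # coils fill 10% of the volume (assumed)

contact = ao_potential_over_kT(d, d, delta, n_p)
print(f"depth at contact: {contact:.2f} kT")
```

At contact the overlap volume reduces to $(\pi/6) d^3 \xi^2 (3/2 + \xi)$, so for a fixed polymer volume fraction the well becomes deeper as the size ratio decreases.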

ADSORBING POLYMER 

Finally, we briefly mention interactions due to adsorbing polymers. Block copolymers, with one block strongly 
adsorbing to the particles, have already been mentioned above. Here, we focus on homopolymers that adsorb 
moderately strongly to the particles. If this can be done such that a high surface coverage is achieved, the adsorbed 
polymer layer may again produce a steric stabilization between the particles. 



Figure C2.6.5. Examples of the AO potential, equation (C2.6.12). The values of $\xi$ are indicated next to the curves.
The hard-sphere repulsion at r = d has not been drawn.

At lower surface coverage, however, the possibility exists that one polymer chain may attach itself to two particles.
If the adsorption is strong enough, this results in an aggregation of the particles, known as bridging flocculation
[33, 46 and 47].


C2.6.5 COLLOID STABILITY AND AGGREGATION 

In this section we focus on the theory of stability of charged colloids. In section C2.6.5.1 it is shown how particles 
can be made to aggregate by adding sufficient electrolyte. The associated aggregation kinetics are discussed in 
section C2.6.5.2, and the structure of the aggregates in section C2.6.5.3. For more details, see the recent reviews
[53, 54 and 55], or the colloid science textbooks [33, 39].

C2.6.5.1 CHARGED PARTICLES 

In suspensions containing no soluble polymer, the van der Waals forces and electrostatic interactions are the main 
factors in controlling the particle interactions. The van der Waals interactions are often strong enough to cause an 
irreversible aggregation of the particles, unless a stabilizing mechanism is present. In polar solvents, particularly in 
water, the particle surfaces tend to be charged. At low salt concentration, the resulting double layer repulsions can 
be strong enough to prevent aggregation. 

Here we consider the total interaction between two charged particles in suspension, surrounded by their counterions 
and added electrolyte. This is the celebrated DLVO theory, derived independently by Derjaguin and Landau and by 
Verwey and Overbeek [41]. By combining the van der Waals interaction ( equation (C2.6.4) ) with the repulsion due 
to the electric double layers ( equation (C2.6.10) ), we obtain 


$$V_{DLVO} = V_{vdW} + V_R.$$ (C2.6.13)


For the repulsions, the approximate result of equation (C2.6.10) is appropriate for the typical conditions relevant
for investigating the particle stability ($\kappa a \gg 1$, $\kappa H \sim 1$) [33, 41]. Examples of the shape of this
potential are shown in figure C2.6.6. At short range, a deep attractive minimum (called the primary minimum) is found,
due to the van der Waals attractions. At slightly larger separation, a repulsive maximum is present at low salt
concentration. Once two particles have reached the primary minimum, aggregation tends to be irreversible. They may,
however, be kinetically stabilized against reaching this minimum when the repulsive maximum is sufficiently high,
$V_{max} \gg kT$.

Figure C2.6.6. DLVO potential for gold spheres with A/kT = 25, a = 100 nm and $\Phi_s$ = 1 in water, at a range of
concentrations of 1-1 electrolyte (equations (C2.6.3), (C2.6.10) and (C2.6.13)).
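The effect of salt on the repulsive maximum can be illustrated with a short calculation for the parameters of the figure. This is a sketch under several assumptions: the attraction is taken in the short-range form $-Aa/(12H)$ rather than the full sphere-sphere expression; the repulsion is taken as $64\pi a n_0 kT\gamma^2\kappa^{-2}e^{-\kappa H}$, which for a 1-1 electrolyte equals $(8a\gamma^2/l_B)\,kT\,e^{-\kappa H}$; and the Bjerrum length of water is set to 0.714 nm.

```python
import math

N_A = 6.02214076e23  # Avogadro constant, mol^-1
L_B = 0.714e-9       # Bjerrum length of water near 298 K, m (assumed value)


def dlvo_over_kT(H, kappa, a, A_over_kT, Phi_s):
    """DLVO pair potential in units of kT at surface separation H (1-1 electrolyte)."""
    gamma = math.tanh(Phi_s / 4.0)
    attraction = -A_over_kT * a / (12.0 * H)
    repulsion = (8.0 * a * gamma**2 / L_B) * math.exp(-kappa * H)
    return attraction + repulsion


def barrier_over_kT(c_molar, a=100e-9, A_over_kT=25.0, Phi_s=1.0):
    """Largest value of the potential on a grid of separations from 0.1 to 500 nm.

    A positive result indicates a repulsive barrier; a negative result
    means the potential is attractive everywhere on the grid.
    """
    n0 = N_A * c_molar * 1000.0  # ions of each sign per m^3
    kappa = math.sqrt(8.0 * math.pi * L_B * n0)
    separations = [1e-10 * 5000.0 ** (i / 399.0) for i in range(400)]
    return max(dlvo_over_kT(H, kappa, a, A_over_kT, Phi_s) for H in separations)


low_salt = barrier_over_kT(1e-4)   # 0.1 mmol dm^-3
high_salt = barrier_over_kT(1e-2)  # 10 mmol dm^-3
print(f"0.1 mM: {low_salt:.1f} kT, 10 mM: {high_salt:.2f} kT")
```

With these assumptions the barrier is well above kT at 0.1 mmol dm$^{-3}$ and has vanished at 10 mmol dm$^{-3}$, in line with the qualitative picture described for the figure.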


As can be seen in figure C2.6.6, the repulsive maximum is reduced at high ionic strength. A rapid and irreversible
aggregation into the primary minimum (also referred to as coagulation) is expected to occur when the maximum
has become sufficiently small. Over a narrow range of electrolyte concentrations, a transition occurs from kinetic
stabilization to rapid aggregation, when the maximum is about zero. By solving equation (C2.6.13) for $V_{DLVO} = 0$
and $dV_{DLVO}/dH = 0$, we obtain $\kappa H = 1$, and the corresponding electrolyte concentration, known as the critical
coagulation concentration (c.c.c.), is given by

$$c_{ccc} = \frac{49.6\, \gamma^4}{z^6 N_A l_B^3} \left(\frac{kT}{A}\right)^2$$ (C2.6.14)

where $l_B = e^2 / 4\pi\varepsilon\varepsilon_0 kT$ is the Bjerrum length. For the conditions used in figure C2.6.6,
$c_{ccc}$ is calculated to be 1.3 mmol dm$^{-3}$.
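The quoted value can be checked numerically. The sketch assumes the critical coagulation concentration in the form $c_{ccc} = 49.6\,\gamma^4 (kT/A)^2 / (z^6 N_A l_B^3)$ with $\gamma = \tanh(\Phi_s/4)$, and a relative permittivity of 78.5 for water at 298 K (an assumed value) to obtain the Bjerrum length.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J K^-1
E = 1.602176634e-19      # elementary charge, C
N_A = 6.02214076e23      # Avogadro constant, mol^-1
EPS0 = 8.8541878128e-12  # vacuum permittivity, F m^-1


def bjerrum_length(eps_r=78.5, T=298.0):
    """Bjerrum length e^2 / (4 pi eps_r eps0 kT), in m."""
    return E**2 / (4.0 * math.pi * eps_r * EPS0 * K_B * T)


def ccc_molar(Phi_s, z, A_over_kT, l_B):
    """Critical coagulation concentration in mol dm^-3.

    Phi_s is the dimensionless surface potential z e phi_s / kT.
    """
    gamma = math.tanh(Phi_s / 4.0)
    c_si = 49.6 * gamma**4 / (z**6 * N_A * l_B**3 * A_over_kT**2)  # mol m^-3
    return c_si / 1000.0


l_B = bjerrum_length()
c_fig = ccc_molar(1.0, 1, 25.0, l_B)  # A/kT = 25, Phi_s = 1, z = 1
print(f"c.c.c. = {c_fig * 1e3:.2f} mmol/dm^3")

# Valency dependence in the high-potential limit (gamma close to 1)
ratio = ccc_molar(40.0, 2, 25.0, l_B) / ccc_molar(40.0, 1, 25.0, l_B)
print(f"c.c.c.(z=2) / c.c.c.(z=1) = 1/{1.0 / ratio:.0f}")
```

The second result shows how strongly the valency enters at high surface potential: doubling z lowers the required salt concentration by a factor of 64.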

In equation (C2.6.14) it can be seen that the required salt concentration depends strongly on the valency z of the
ions. At high surface potential $\gamma \to 1$, and $c_{ccc} \propto z^{-6}$. This had been observed experimentally and is
known as the Schulze-Hardy rule. This result was one of the early successes of DLVO theory. At low surface potential,
however, the valency dependence is less pronounced, $c_{ccc} \propto z^{-2}$. In reality, the behaviour of higher valency
ions can be rather complicated; for instance, they may adsorb to the particle surface and even change the sign of the
surface charge.

At larger particle separation, a second minimum may occur in the potential energy. In many cases, this minimum is
too shallow to be of much significance. For larger particles, however, the minimum may become of order kT.
Aggregation in this minimum is referred to as secondary minimum flocculation.


C2.6.5.2 AGGREGATION KINETICS 


For a more complete understanding of colloid stability, we need to address the kinetics of aggregation. The theory 
discussed here was developed to describe coagulation of charged colloids, but it does apply to other cases as well. 
First, we consider the case of so-called rapid coagulation, which means that two particles will aggregate as soon as 
they meet (at high salt concentration, for instance). This was considered by von Smoluchowski [56]; here we 
follow [39, 52]. 

It is assumed that irreversible aggregation occurs on contact. The rate of coagulation is expressed as the 
aggregation flux J of particles towards a central particle. Using a steady-state approximation, the diffusive flux is 
derived to be 

$$J = 16 \pi D a n_0$$

where D is again the diffusion coefficient of a single particle (equation (C2.6.1)) and $n_0$ is the initial monomer
concentration. As aggregation proceeds, a distribution of aggregate sizes (monomers, dimers, trimers, etc) is
established, which evolves in time. This is described by

$$n_i = n_0 \frac{(t/t_p)^{i-1}}{(1 + t/t_p)^{i+1}}$$ (C2.6.15)

where $n_i$ denotes the number density of i-mers at time t. The half-life time of aggregation $t_p$, after which the total
number of aggregates has halved, is given as a function of the volume fraction $\phi$ by

$$t_p = \frac{\pi \eta a^3}{\phi kT}.$$ (C2.6.16)

In table C2.6.5, a few numerical examples for $t_p$ are shown. Smaller colloids are found to aggregate much faster
and stabilizing them is therefore more difficult. The validity of equation (C2.6.15) has been confirmed
experimentally (e.g. [58]).
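The half-life and the size distribution are straightforward to evaluate. The sketch below assumes the Smoluchowski results in the forms $t_p = \pi\eta a^3/(\phi kT)$ and $n_i/n_0 = (t/t_p)^{i-1}/(1+t/t_p)^{i+1}$, with a viscosity of $1.0 \times 10^{-3}$ Pa s for water (an assumed value, which gives half-lives matching those of table C2.6.5 after rounding).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J K^-1


def half_life(a, phi, eta=1.0e-3, T=300.0):
    """Rapid coagulation half-life pi eta a^3 / (phi kT), in seconds."""
    return math.pi * eta * a**3 / (phi * K_B * T)


def relative_density(i, tau):
    """n_i / n_0 for i-mers at reduced time tau = t / t_p."""
    return tau ** (i - 1) / (1.0 + tau) ** (i + 1)


for a in (100e-9, 1e-6):
    for phi in (1e-5, 0.1):
        print(f"a = {a:.0e} m, phi = {phi:g}: t_p = {half_life(a, phi):.3g} s")

# At t = t_p the total number of aggregates should be half the initial number
total_at_half_life = sum(relative_density(i, 1.0) for i in range(1, 200))
print(f"total aggregates at t_p: {total_at_half_life:.3f} n_0")
```

Summing the distribution over all sizes gives $n_0/(1 + t/t_p)$, which is the sense in which $t_p$ is a half-life.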

Table C2.6.5 Rapid coagulation half-life time for particles in water at T = 300 K (equation (C2.6.16)).

a            $\phi = 10^{-5}$    $\phi = 0.1$
100 nm       76 s                8 ms
1 $\mu$m     21 h                8 s

The second case concerns situations where not all particle encounters result in aggregation. This is known as slow
coagulation. This was addressed first by Fuchs [59]; again we follow [39, 57].

In slow coagulation, particles have to diffuse over an energy barrier (see the previous section) in order to aggregate. 
As a result, not all Brownian particle encounters result in aggregation. This is expressed using the stability ratio W, 
defined as 

$$W = \frac{J_r}{J}$$ (C2.6.17)

where $J_r$ denotes the rapid coagulation rate. By solving the diffusion equations under steady-state conditions, it was
found that

$$W = 2a \int_{2a}^{\infty} \exp\left(\frac{V(r)}{kT}\right) \frac{dr}{r^2}.$$ (C2.6.18)

Because of the exponential term, W is mainly determined by the potential energy maximum $V_{max}$, and it can be
approximated as [57]

$$W \approx \frac{1}{2 \kappa a} \exp\left(\frac{V_{max}}{kT}\right).$$ (C2.6.19)

A combination of equations (C2.6.13) to (C2.6.19) then allows us to estimate how low the electrolyte
concentration needs to be to provide kinetic stability for a desired length of time. This theory successfully accounts
for a number of observations on slowly aggregating systems, but two discrepancies are found (see, for instance, 
[33]). First, the observed dependence of stability ratio on salt concentration tends to be much weaker than 
predicted. Second, the variation of the stability ratio with particle size is not reproduced experimentally. Recently, 
however, it was reported that for model particles with a low surface charge, where the DLVO theory is expected to 
hold, the aggregation kinetics do agree with the theoretical predictions (see [60], and references therein). 
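The kind of estimate described above can be sketched in a few lines. The code assumes the approximate stability ratio $W \approx \exp(V_{max}/kT)/(2\kappa a)$ and takes the slow coagulation half-life to be W times the rapid one; the barrier height, $\kappa a$ and rapid half-life used are illustrative assumptions.

```python
import math


def stability_ratio(v_max_over_kT, kappa_a):
    """Approximate stability ratio W = exp(V_max / kT) / (2 kappa a)."""
    return math.exp(v_max_over_kT) / (2.0 * kappa_a)


def required_barrier(W, kappa_a):
    """Barrier height (in kT) needed for a given stability ratio W."""
    return math.log(2.0 * kappa_a * W)


# Assumed example: barrier of 15 kT, kappa*a = 10, rapid half-life of 76 s
W = stability_ratio(15.0, 10.0)
slow_half_life_days = W * 76.0 / 86400.0
print(f"W = {W:.3g}, half-life about {slow_half_life_days:.0f} days")

# Inverse problem: barrier needed for a chosen stability ratio
print(f"barrier for W = 1e6 at kappa*a = 10: {required_barrier(1e6, 10.0):.1f} kT")
```

Because W depends exponentially on the barrier, a few kT more or less changes the shelf life by orders of magnitude, which is why the predicted sensitivity to salt concentration is so strong.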

C2.6.5.3 AGGREGATE STRUCTURE 

Although the theories of colloid stability and aggregation kinetics were developed several decades ago, the actual 
structure of aggregates has only been studied more recently. To describe the structure, we start with the relationship 
between the size of an aggregate (linear dimension), expressed as its radius of gyration $R_g$, and its mass m:

$$m \propto R_g^{d_f}$$

where $d_f$ is the fractal dimension. For compact, homogeneous objects in three dimensions, $d_f$ = 3. Colloidal
aggregates, however, tend to be rather open, fractal structures, with $d_f$ < 3. For a general introduction to fractals,
see section C3.6 and [61].

First, we consider the case of rapid, irreversible aggregation. In the literature on fractal aggregates, this is known as
diffusion limited cluster aggregation (DLCA), where particles (monomers) and also the aggregates diffuse and
aggregate when any two of them meet. Computer simulations predicted rather open structures with $d_f \approx 1.8$ under
these conditions [62, 63 and 64], and experiments have confirmed this (figure C2.6.7) [65]. Various methods exist
for measuring $d_f$. For instance, scattering (light, x-ray) experiments yield $d_f$ from the variation of the scattered
intensity I with wavevector Q, as

$$I \propto Q^{-d_f}.$$

This relationship holds for wavevectors that probe the appropriate size range, $1/R_g < Q < 1/a$ (see section B1.10).
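In practice the fractal dimension is read off as the slope of a log-log plot of intensity against wavevector. The sketch below performs an ordinary least-squares fit in log-log coordinates; the data are synthetic, generated from an exact power law with exponent 1.8, and do not represent any measurement.

```python
import math


def fractal_dimension(wavevectors, intensities):
    """Estimate d_f from I ~ Q^(-d_f) by a least-squares line in log-log space."""
    xs = [math.log(q) for q in wavevectors]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope


# Synthetic data: exact power law with d_f = 1.8 (arbitrary units)
Q = [0.01 * 1.5**k for k in range(10)]
I = [q ** -1.8 for q in Q]

d_f = fractal_dimension(Q, I)
print(f"fitted d_f = {d_f:.3f}")
```

With real data the fit should be restricted to the window between the inverse aggregate size and the inverse particle size, since the power law does not hold outside it.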



Figure C2.6.7. Fractal aggregate of gold particles with a = 1.2 ± 0.7 nm, obtained under DLCA conditions, with $d_f$
= 1.74 (reproduced with permission from [65]. Copyright 1984 Elsevier Science Publishers B.V.). Scale bar: 500 nm.

More compact structures are obtained in the slow coagulation regime. Here aggregation is still irreversible but not
every collision results in aggregation, and the clusters have more time to explore the available space. This is known
as reaction limited cluster aggregation (RLCA). Here computer simulations predicted $d_f \approx 2.1$ [66], which was
again confirmed experimentally [67]. The DLCA and RLCA regimes have been observed for a range of colloidal
systems [64]. Only when the particle bonds are sufficiently weak that restructuring of a cluster can occur are denser
clusters with $d_f \approx 3$ obtained (see, for instance, [54]).

Although this section has focused on the behaviour of charged particles, similar phenomena may be observed using
sterically stabilized particles. As discussed in section C2.6.4, these can also be given strong, short-ranged
attractions, by changing the solvent quality or by adding non-adsorbing polymers. Aggregation behaviour similar
to that of the charged spheres may then be observed.


C2.6.6 BEHAVIOUR OF CONCENTRATED SUSPENSIONS

In the previous section, non-equilibrium behaviour was discussed, which is observed for particles with a deep 
minimum in the particle interactions at contact. In this final section, some examples of equilibrium phase behaviour 
in concentrated colloidal suspensions will be presented. Here we are concerned with purely repulsive particles 
(hard or soft spheres), or with particles with attractions of moderate strength and range (colloid-polymer and 
colloid-colloid mixtures). Although we shall focus mainly on equilibrium aspects, a few comments will be made 
about the associated kinetics as well [69, 70 ]. 

C2.6.6.1 COLLOIDAL CRYSTALS 


One of the intriguing and beautiful properties of suspensions of well defined colloidal particles is their ability to 
order into a regular crystal lattice, called a colloidal crystal. The lattice spacing in colloidal crystals is set by the 
particle size and tends to be similar to the wavelength of light. Therefore, Bragg scattering (iridescence) can be 
observed using light (see section B1.9). Examples of this were found first in nature. For instance, Tipula iridescent
virus (TIV) particles were observed to assume face centred cubic (fcc) stackings [71], and opals are fossilized
colloidal crystals consisting of silica [72]. For further background, see [69, 73, 74]. Colloidal crystals are used as 
model systems to study the freezing transition. Because of their optical properties, they are also being investigated 
for potential applications such as optical rejection filters (for instance, [75]). 


C2.6.6.2 HARD SPHERES 

Hard spheres are perhaps the simplest system to undergo a freezing transition. Freezing of hard spheres was
observed using computer simulations [76]. The freezing and melting densities were found to be $\phi_F$ = 0.49 and
$\phi_M$ = 0.55 [77]. The stable crystal structure is fcc (see [78] and references therein). So, although this may seem
counterintuitive, no particle attractions are needed for a freezing transition to occur: at sufficiently high density
(pressure), this will also occur for particles with purely repulsive interactions. The phase behaviour of hard spheres
is summarized in figure C2.6.8.

Figure C2.6.8. Phase diagram of hard spheres (see text for details).

Experimentally, the hard-sphere phase transition was observed using non-aqueous polymer latices [79, 80].
Samples are prepared, brought into the fluid state by tumbling and then left to stand. Depending on particle size and
concentration, colloidal crystals then form on a time scale from minutes to days. Experimentally, there is always
some uncertainty in the actual volume fraction. Often the concentrations are therefore rescaled so freezing occurs at
$\phi_F$ = 0.49. The width of the coexistence region agrees well with simulations [11, 80].

On further increasing the concentration, the glass transition at $\phi_G \approx 0.58$ is reached [81]. At this
concentration, the overall structure is arrested and particles can only undergo local diffusive motion. In other words,
the sample is no longer ergodic, as can be shown using dynamic light scattering, where the intermediate scattering
function does not relax to zero (see section B1.10). The hard-sphere glass serves as a model to understand the glass
transition in simple liquids (see also section C2.15 Disordered Systems).

Samples can be concentrated beyond the glass transition. If this is done quickly enough to prevent crystallization,
this ultimately leads to a random close-packed structure, with a volume fraction $\phi \approx 0.64$. Close-packed
structures, such as fcc, have a maximum packing density of $\phi$ = 0.74. The crystallization kinetics are strongly
concentration dependent. The nucleation rate is fastest near the melting concentration. On increasing concentration,
the nucleation process is arrested. This has been found to occur at the glass transition [82].


The formation of colloidal crystals requires particles that are fairly monodisperse: experimentally, hard sphere
crystals are only observed to form in samples with a polydispersity below about 0.08 [69]. Using computer
simulations, a maximum polydispersity for the solid phase of 0.06 was predicted [83].

C2.6.6.3 SOFT SPHERES


Charged particles in polar solvents have soft-repulsive interactions (see section C2.6.4). Just like hard spheres, such
particles also undergo an ordering transition. Important differences, however, are that the transition takes place at
(much) lower particle volume fractions, and at low ionic strength (low $\kappa$) the solid phase may be body centred
cubic (bcc), rather than the more compact fcc structure (see [69, 73, 84]). For the interactions, a Yukawa potential
(equation (C2.6.11)) is often used. The phase diagram for the Yukawa potential was calculated using computer
simulations by Robbins et al [85].


We will focus on one experimental study here. Monovoukas and Gast studied polystyrene particles with a = 61 nm
in potassium chloride solutions [86]. They obtained very good agreement between their observations and the
predicted Yukawa phase diagram (see figure C2.6.9). In order to make the comparison they rescaled the particle
charges according to Alexander et al [43] (see also [87]). At high electrolyte concentrations, the particle
interactions tend to hard-sphere behaviour (see section C2.6.4) and the phase transition shifts to volume fractions
around 0.5.

Figure C2.6.9. Phase diagram of charged colloidal particles. The solid lines are predictions by Robbins et al [85].
n , and X = kx (reproduced with permission from [86]. Copyright 1989 Academic


In extensively deionized suspensions, there are experimental indications for effective attractions between particles,
such as long-lived void structures [89] and attractions between particles confined between charged walls [90].
Nevertheless, under these conditions the DLVO theory does seem to describe interactions of isolated particles at
the pair level correctly [90]. It may be possible to explain the experimental observations by taking into account
explicitly the degrees of freedom of both the colloidal particles and the small ions [91, 92].

C2.6.6.4 COLLOID-POLYMER MIXTURES

In section C2.6.4.3 it was shown how the addition of non-adsorbing polymer chains induces a depletion attraction
between colloidal particles. If sufficient polymer is added, these attractions can be strong enough to induce a phase
separation of the colloidal particles. An early application of this was the creaming of rubber latex [93].

Much later, experiments on model colloids revealed that the addition of polymer may either induce a gas-liquid
type phase separation or a fluid-solid transition [94, 95, 96 and 97]. Using perturbation theories, these observations
could be accounted for quite well [97, 98].

At equilibrium, in order to achieve equality of chemical potentials, not only the colloid but also the polymer
concentrations in the different phases are different. We focus here on a theory that allows for this polymer
partitioning [99]. Predictions for two polymer/colloid size ratios are shown in figure C2.6.10. A liquid phase is
predicted to occur only when the range of attractions is not too small compared to the particle size, δ/a > 0.3. Under
these conditions a phase behaviour is obtained that is similar to that of simple liquids, such as argon. Because of the
polymer partitioning, however, there is a three-phase triangle (rather than a triple point). For smaller polymer
(narrower attractions), the gas-liquid transition becomes metastable with respect to the fluid-crystal transition.
These predictions were confirmed experimentally [100]. The phase boundaries were predicted semi-quantitatively.

Figure C2.6.10. Phase diagram of colloid-polymer mixtures: polymer coil volume fraction vs particle
volume fraction φ. (a) Narrow attractions, δ/a = 0.1. Only a fluid-crystal transition is present. Tie lines indicate
coexisting phases. (b) Longer range attractions, δ/a = 0.4. Gas, liquid and crystal phases (G, L and C) are present,
as well as a critical point (CP). The three-phase triangle is shaded (reproduced with permission from [99].
Copyright 1992 EDP Sciences).

In practice, colloidal systems do not always reach the predicted equilibrium state, as has been observed for the
case of narrow attractions. On increasing the polymer concentration, a fluid-crystal phase separation may be
induced, but at higher concentration crystallization is arrested and amorphous gels have been found to form instead
[101, 102]. Close to the phase boundary, transient gels were observed, in which phase separation proceeded after a
lag time.

The behaviour of these systems is similar to that of suspensions in which short-range attractions are induced by
changing solvent quality for sterically stabilized particles (e.g. [103]). Another case in which narrow attractions
arise is that of solutions of globular proteins. These crystallize only in a narrow range of concentrations [104].


C2.6.6.5 MIXTURES OF HARD SPHERES 


As shown in section C2.6.6.2, hard-sphere suspensions already show a rich phase behaviour. This is even more the
case when binary mixtures of hard spheres are considered. First, we will mention the case of moderate size ratios,
around 0.6. At low concentrations these form a mixed fluid phase. On increasing the overall concentration of
mixtures, however, binary crystals of type AB₂ and AB₁₃ were observed (where A represents the larger spheres), in
addition to pure A or B crystals [105, 106]. An example of an AB₂ structure is shown in figure C2.6.11. Computer
simulations confirmed the thermodynamic stability of the structures that were observed [107, 108].



Figure C2.6.11. SEM of AB₂ structure, formed in aqueous mixtures of PS latex spheres with a = 68 nm and
a = 264 nm (courtesy of Prof R H Ottewill).

A second case to be considered is that of mixtures with a small size ratio, <0.2. For a long time it was believed that
such mixtures would not show any instability in the fluid phase, but such an instability was predicted by Biben and
Hansen [109]. This can be understood as a result of depletion interactions, exerted on the large spheres by the
small spheres (see section C2.6.4.3). Experimentally, such mixtures were indeed found to display an instability
[110]. The gas-liquid transition does, however, seem to be metastable with respect to the fluid-crystal transition
[111, 112]. This was confirmed by computer simulations [113].

C2.6.6.6 NON-SPHERICAL COLLOIDS 

Other possibilities for observing phase transitions are offered by suspensions of non-spherical particles. Such
systems can display liquid crystalline phases, in addition to the isotropic liquid and crystalline phases (see also
section C2.2). First, we consider rod-like particles (see [114, 115], and references therein). As shown by Onsager
[116, 117], sufficiently elongated particles will display a nematic phase, in which the particles have a tendency to
align parallel to each other. Several experimental systems display this Onsager transition [114].

Hard spherocylinders (cylinders with hemispherical end caps) were studied using computer simulations [118]. In
addition to a nematic phase, such particles also display a smectic-A phase, in which the particles are arranged in
liquid-like layers. To observe this transition, rather monodisperse particles are needed. The smectic-A phase was
indeed observed in suspensions of TMV particles [17].

Disc-like particles can also undergo an Onsager transition; here the particles form a discotic nematic, where the
short particle axes tend to be oriented parallel to each other. In practice, clay suspensions tend to display sol-gel
transitions, without a clear tendency towards nematic ordering (for instance, [22]). Using sterically stabilized
platelets, an isotropic-nematic transition could be observed [119].

C2.7 Catalysis

Bruce C Gates 


C2.7.1 INTRODUCTION 

The idea of a catalyst is one of the most fascinating and significant in science and the word is one of the few that 
have carried over broadly from scientific into nonscientific language. A catalyst speeds up a chemical reaction 
without being consumed substantially — the occurrence of a reaction accelerated by a catalyst is called catalysis. At 
first, one might think that catalysis seems too good to be true, but the principles are well understood; a catalyst 
works by forming chemical bonds with reactants, generating intermediates that react more readily to give products 
than the reactants would alone — and giving back the catalyst. A catalyst affects the rate of approach to equilibrium 
of a reaction but not the position of the equilibrium. Catalysts provide subtle control of chemical conversions; a 
good catalyst increases the rate of a desired reaction but not the rates of undesired side reactions. Catalysis is 
ubiquitous in biology and technology and is the key to the efficiency of most chemical conversions. Only 
temperature provides a comparable means for increasing reaction rates, but high temperatures are often 
unacceptable — for example, because they harm biological organisms; high temperatures in chemical technology 
often mean high costs, e.g., because reaction in a liquid at high temperature requires a high pressure to maintain the 
liquid state, and high-pressure equipment is expensive. 

The activity of a catalyst is a measure (e.g., a rate or a rate constant) of how fast a catalytic reaction occurs under 
some standard conditions. Activities vary from catalyst to catalyst and depend on the variables that influence 
reaction rates (temperature, reactant concentrations etc). The selectivity of a catalyst is a measure of how fast it 
causes one reaction to proceed relative to others; selectivity might be defined, e.g., as the rate of formation of a 
desired product divided by the rate of formation of all products under some standard conditions. The stability of a 
catalyst accounts for how fast it loses activity during operation; the ideal catalyst is infinitely stable, but real 
catalysts undergo changes causing loss of activity and selectivity, and they must be replaced or regenerated (at 
intervals ranging from seconds to years). The regenerability of a technological catalyst is a measure of how well it
responds to treatments to bring back its activity and selectivity after deactivation; many solid catalysts are 
regenerated by burning off carbonaceous deposits formed during operation with organic reactants. 
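
The selectivity definition given above (rate of formation of the desired product divided by the rate of formation of all products) can be written as a one-line calculation. The sketch below is only an illustration; the function name, product labels and rate values are hypothetical.

```python
def selectivity(rates: dict[str, float], desired: str) -> float:
    """Rate of formation of the desired product divided by the total rate."""
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("at least one product must be forming")
    return rates[desired] / total

# Hypothetical formation rates in arbitrary but consistent units.
rates = {"desired": 9.0, "byproduct_a": 0.75, "byproduct_b": 0.25}
print(selectivity(rates, "desired"))  # 0.9
```

Other definitions of selectivity are in use, so a reported value is meaningful only together with the definition and the standard conditions behind it.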

Biology and chemical technology as we know them are hardly imaginable without catalysis. Almost all biological 
reactions are catalytic and the number of biological catalysts is huge. Most large-scale technological reactions are 
also catalytic, generating products valued in trillions of dollars annually — roughly two orders of magnitude more 
than the annual cost of the catalyst purchase and replacement. Products of catalytic technology (with examples) 
include fuels (gasoline), polymeric materials (polypropylene), clothing (nylon), pharmaceuticals (pain relievers), 
foods (hydrogenated fats), solvents (methanol) and chemicals (sulphuric acid). Catalysis is also essential to the
minimization of environmental pollutants, e.g., by conversion of automobile and power-plant emissions (e.g., CO
and NO). By converting these harmful emissions into benign gases (e.g., CO₂ and N₂) and by reducing the
production of harmful byproducts of chemical processes that had earlier been dumped, catalysis has dramatically
improved the quality of the earth's air and water.

Catalysis spans chemistry, chemical engineering, materials science and biology. The goal here is to enliven the
subject with diverse examples showing the microscopic details of catalysis. 


C2.7.2 CLASSIFICATION OF CATALYSTS AND CATALYSIS 

Catalysis in a single fluid phase (liquid, gas or supercritical fluid) is called homogeneous catalysis because the 
phase in which it occurs is relatively uniform or homogeneous. The catalyst may be molecular or ionic. Catalysis at 
an interface (usually a solid surface) is called heterogeneous catalysis; an implication of this term is that more than
one phase is present in the reactor, and the reactants are usually concentrated in a fluid phase in contact with the 
catalyst, e.g., a gas in contact with a solid. Most catalysts used in the largest technological processes are solids. The 
term catalytic site (or active site) describes the groups on the surface to which reactants bond for catalysis to occur; 
the identities of the catalytic sites are often unknown because most solid surfaces are nonuniform in structure and 
composition and difficult to characterize well, and the active sites often constitute a small minority of the surface 
sites. 

Most biological catalysts are enzymes, i.e., proteins, which are macromolecules (polypeptides) formed by
biopolymerization of amino acids (with elimination of water); some enzymes are huge, with hundreds of monomer
units. The 20 amino acid monomers occurring in nature,

H₂N-CHR-COOH

have R groups including, e.g., H (in glycine), CH₂OH (in serine), CH₂COOH (in aspartic acid), CH₂CH₂COOH (in
glutamic acid) and CH₂(CH₂)₃NH₂ (in lysine). Some of the R groups are ligands for metal ions (e.g., Zn, Fe, Cu,
Mo and Co) that play catalytic roles. Acidic, basic and metal groups typically constitute the active sites of enzymes.
The structures of a number of enzymes are known from x-ray crystallography. The polymer chains are usually 
folded to give a precise juxtaposition of the active sites on the interior surface of a clamlike structure. It is not a 
straightforward matter to classify enzyme catalysis as simple homogeneous or heterogeneous catalysis, because, in 
biological cells, enzymes exist both in solution (e.g., in cytoplasm) and within membranes — some enzymes are 
even arranged in assembly-line fashion so that the product molecules from one pass directly to the next. 

The subject of catalysis has evolved with little integration of homogeneous, heterogeneous, and biological 
catalysis, as is reflected in the general references cited in the further reading section.

C2.7.3 A BIT OF HISTORY: THE AMMONIA SYNTHESIS REACTION

A definition of catalysis similar to that given above was stated first in about 1895 by Wilhelm Ostwald, whose
work on catalysis was recognized with a Nobel prize. Sixty years before, Jakob Berzelius had coined the term
'catalysis', recognizing that a single concept could account for changes in compositions of numerous substances
resulting from their mere contact with liquids, solids or 'ferments'. Berzelius's insight bears on phenomena that
had earlier vexed the alchemists, who, aware of the then mysterious actions of these 'ferments' and other
substances ('contacts'), sought vainly for a philosopher's stone to change base metals into gold.

Ostwald's definition of catalysis rests on reaction kinetics and, indeed, at about the time he stated it, the beginnings 
of physical chemistry were emerging in the quantitative representation of the thermodynamics and kinetics of 
chemical reactions. The first concepts of reaction thermodynamics and kinetics came into focus in the early 1900s 
in work by Nernst and by Fritz Haber and coworkers; Haber carried out research motivated by the goal of 
synthesizing ammonia on a large scale from nitrogen and hydrogen, to produce fertilizers — and also explosives. 
The ammonia synthesis reaction, 


N₂ + 3H₂ → 2NH₃    (C2.7.1)

which is still of great technological importance, takes place at almost negligibly low rates, except in the presence of 
a catalyst, and the reaction is strongly equilibrium limited. As the reaction is exothermic, the equilibrium 
conversion decreases with increasing temperature, so that the advantage of increasing the rate by increasing 
temperature is offset by a decrease in the attainable (equilibrium) conversion. Thus, an advantage of a highly active 
catalyst such as iron or ruthenium is that by increasing the reaction rate, the catalyst lowers the temperature at 
which the reaction can practically be carried out, thereby increasing the attainable conversion. Haber's
understanding of the interplay of thermodynamics and kinetics was pivotal to the development of the early 
concepts of physical chemistry; Haber's Nobel prize is one of several that recognize research in catalysis. 
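
The temperature dependence of the equilibrium described above can be illustrated with the integrated van't Hoff equation. The sketch below is only indicative: it assumes a temperature-independent reaction enthalpy, and the value of about −92 kJ per mole of reaction as written is a commonly quoted round figure supplied here as an assumption, not taken from this section. The function name and the two temperatures are hypothetical.

```python
import math

R_GAS = 8.314  # gas constant, J mol^-1 K^-1

def equilibrium_constant_ratio(delta_h: float, t_low: float, t_high: float) -> float:
    """K(t_high) / K(t_low) from the integrated van't Hoff equation.

    Assumes the reaction enthalpy delta_h (J mol^-1) is independent of
    temperature between t_low and t_high (both in K).
    """
    return math.exp(-delta_h / R_GAS * (1.0 / t_high - 1.0 / t_low))

# Assumed round value for N2 + 3 H2 -> 2 NH3; not taken from the text.
DELTA_H = -92.0e3  # J mol^-1

ratio = equilibrium_constant_ratio(DELTA_H, t_low=600.0, t_high=700.0)
print(ratio < 1.0)  # True: the equilibrium constant falls on heating
```

Under these assumptions, raising the temperature by 100 K lowers the equilibrium constant by more than an order of magnitude, which is why a catalyst active at lower temperature raises the attainable conversion.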

Figure C2.7.1 is a potential energy diagram for the ammonia synthesis reaction taking place on the surface of an
iron catalyst [1]. The reaction proceeds via chemisorbed intermediates (i.e., those chemically bonded to the iron
surface) N, H, NH and NH₂, and the energy barriers shown in figure C2.7.1 are much lower than those that would
pertain if the intermediates were N, H, NH and NH₂ in the gas phase; thus, figure C2.7.1 shows why iron is a good
ammonia synthesis catalyst (and a good ammonia decomposition catalyst). Notice that by causing the reaction to
proceed via steps whereby chemisorbed H combines with N, NH and then NH₂, the catalyst provides a sequence
with each elementary step characterized by only a moderately high activation energy barrier; notice also that iron's
ability to cause dissociation of H₂ and N₂, leading to the formation of the chemisorbed atomic species, allows the
formation of intermediates that are not too stable (not too low in energy); if they were much more stable, then the
barriers would be higher, the overall reaction not so fast and the catalyst not so active. Good catalysts generally
provide efficient pathways for both the formation and conversion of highly reactive intermediates.

Figure C2.7.1. Schematic potential energy diagram for the catalytic synthesis and decomposition of ammonia on
iron. The energies are in kJ mol⁻¹; the subscript 'ads' refers to species adsorbed on iron [1].

The ammonia synthesis reaction provides another important lesson; it illustrates how fundamental science 
developed from research that was motivated by a technological need. Catalysis is one of the most essential and 
enduring enabling technologies, and it has been pulling researchers into unexplored territory from the beginning, 
with no end in sight. For example, much of ultrahigh-vacuum surface science (section A1.7) has emerged from
work directed toward understanding of catalysis by solids and much of organometallic chemistry has emerged from 
work directed toward understanding of catalysis by transition metal compounds in solution. 


C2.7.4 CATALYTIC CYCLES 

Because a good catalyst is not consumed to a significant degree as it functions, catalysis is a cyclic process, and 
compact representations of catalysis are cycles that show the various intermediate species, illustrated by the 
following simple example, where C is the catalyst, R the reactant, P the product and RC the intermediate:

R + C → RC → P + C

(using figure C2.7.1, can you write a cycle for the ammonia synthesis reaction?).

A well-understood catalytic cycle is that of the Wilkinson alkene hydrogenation (figure C2.7.2) [2]. Like most
catalytic cycles, that shown in figure C2.7.2 is complex, involving intermediate species in the cycle (inside the
dashed line) and other species outside the cycle and in dead-end paths. Knowledge of all but a small number of
catalytic cycles is only fragmentary because of the complexity and because, if the catalyst is active, the cycle turns
over rapidly and the concentrations of the intermediates are minute; thus, these intermediates are often not even
identified. Typically, as understanding develops, a more complex model emerges, i.e., a cycle with more
intermediates. Determination of the important intermediates and quantitative representation of the kinetics and
thermodynamics of the separate elementary reactions (e.g., figure C2.7.2) are often daunting tasks requiring the
concerted application of various physical/chemical techniques. The challenges have helped to motivate the
development of more and more sensitive spectroscopic methods, transient kinetics methods and so forth, some of
which are mentioned below and described elsewhere in this encyclopedia.

Figure C2.7.2. Catalytic cycle (within dashed lines) for the Wilkinson hydrogenation of alkene [2]. Values of rate
and equilibrium constants are given in [2].
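
As a rough numerical illustration of how a cycle turns over, consider the simple cycle introduced at the start of this section, with catalyst C, reactant R, intermediate RC and product P. The sketch below assumes a reversible first step (rate constants k1 and k_minus1), an irreversible second step (k2) and a steady-state concentration of RC. The scheme, the function name and all numerical values are hypothetical and are not taken from [2].

```python
def cycle_rate(k1: float, k_minus1: float, k2: float,
               reactant: float, catalyst_total: float) -> tuple[float, float]:
    """Steady-state turnover for C + R <=> RC -> P + C.

    Returns (rate of product formation, fraction of catalyst held as RC).
    """
    denominator = k_minus1 + k2 + k1 * reactant
    fraction_rc = k1 * reactant / denominator
    return k2 * fraction_rc * catalyst_total, fraction_rc

# Hypothetical constants: step 1 is unfavourable (k1 * [R] << k_minus1).
slow_rate, slow_rc = cycle_rate(1.0, 1.0e3, 1.0, reactant=1.0, catalyst_total=1.0e-3)
fast_rate, fast_rc = cycle_rate(1.0, 1.0e3, 1.0e3, reactant=1.0, catalyst_total=1.0e-3)
print(fast_rate > 100 * slow_rate, fast_rc < 1.0e-3)  # True True
```

With these numbers the intermediate never accounts for more than about 0.1% of the catalyst, yet a fast second step raises the overall rate several hundredfold. This is one reason why the intermediates of an active cycle are so hard to observe.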

An important point about kinetics of cyclic reactions is that if an overall reaction proceeds via a sequence of 
elementary steps in a cycle (e.g., figure C2.7.2), some of these steps may be equilibrium limited so that they can 
proceed at most to only minute conversions. Nevertheless, if a step subsequent to one that is so limited is 
characterized by a large enough rate constant, then the equilibrium-limited step may still be fast enough for the 
overall cycle to proceed rapidly. Thus, the step following an equilibrium-limited step in the cycle pulls the cycle 
along — it drains the intermediate that can form in only a low concentration because of an equilibrium limitation 
and allows the overall reaction (the cycle) to proceed rapidly. A good catalyst accelerates the steps that most need a 
boost.

C2.7.5 MACROSCOPIC PHYSICAL PROPERTIES OF CATALYSTS

C2.7.5.1 SOLUBLE CATALYSTS 

If a catalyst is to work well in solution, it (and the reactants) must be sufficiently soluble and stable. Most polar 
catalysts (e.g., acids and bases) are used in water and most organometallic catalysts (compounds of metals with 
organic ligands bonded to them) are used in organic solvents. Some enzymes function in aqueous biological 
solutions, with their solubilities determined by the polar functional groups (R groups) on their outer surfaces. 


A solution containing both reactants and a catalyst may be mixed mechanically to bring the constituents into
efficient contact; otherwise, the rate of the catalytic reaction would be affected by mass transport (e.g., diffusion)
and thus the reaction would proceed more slowly than in the absence of significant concentration gradients.

In technology, an economic separation of the products of a reaction from the solution containing the catalyst is 
necessary. Distillation is a commonly used method and, for it to work successfully, the products and catalyst must 
be stable at the temperatures of the distillation, which are often relatively high; some organometallic compounds, 
for example, may not meet this criterion. 

As the separation is often an expensive part of a process, simplifications may be valuable. For example, in a
process for hydroformylation of propene,

CH₃CH=CH₂ + CO + H₂ → CH₃CH₂CH₂CHO    (C2.7.2)


the reaction is carried out in a mixed reactor with a gas and two separate liquid phases, one organic, containing 
most of the product, and the other aqueous, containing almost all the catalyst, which is an organometallic 
compound of rhodium with phosphine ligands that are sulphonated to make them water soluble. The separation of 
the product from the catalyst is simple and economical [3] (figure C2.7.3): product liquid flows continuously to a
tank (comparable to a separatory funnel) where it settles into two layers: one organic, containing the product; the
other aqueous, containing the catalyst. The aqueous stream is recycled to the reactor so that the catalyst is
continuously reused; thus, the process as well as the reaction is cyclic.

Figure C2.7.3. Process flow diagram for hydroformylation of propene; 1, reactor; 2, separator; 3, phase separator;
4, stripping column; 5, heat exchanger.

C2.7.5.2 SOLID CATALYSTS 

Macroscopic properties often influence the performance of solid catalysts, which are used in reactors that may
simply be tubes packed with catalyst in the form of particles, chosen because gases or liquids flow through a bed
of them (usually continuously) with little resistance (little pressure drop). Catalysts in the form of honeycombs
(monoliths) are used in automobile exhaust systems so that a stream of reactant gases flows with little resistance
through the channels and heat from the exothermic reactions (e.g., CO oxidation to CO₂) is rapidly removed.


Efficient use of a catalyst requires high rates of reaction per unit volume and, since reaction takes place on the
surface of a solid, catalysts have high surface areas per unit volume. Therefore, the typical catalyst is porous, with
an internal surface area often exceeding 100 m² g⁻¹. Porous materials often consist of aggregates of nonporous
(crystalline) microparticles, with the void spaces between them constituting a labyrinth of internal pores with
diameters roughly equal to those of the microparticles; reaction takes place on the microparticle surfaces. The pores
may have average diameters <2.0 nm (micropores) because these imply high surface areas per unit volume, but
larger pores are also common.
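
The link between microparticle size and internal surface area can be estimated with a simple geometric model. The sketch below assumes nonporous spheres of a single diameter, for which the area per unit mass is 6/(ρd). The diameter of 10 nm and the density of 3600 kg m⁻³ are illustrative assumptions, and the function name is hypothetical.

```python
def specific_surface_area(diameter_m: float, density_kg_m3: float) -> float:
    """Surface area per unit mass (m^2 g^-1) of nonporous spheres of one size."""
    area_per_kg = 6.0 / (density_kg_m3 * diameter_m)  # sphere area / sphere mass
    return area_per_kg / 1000.0

# Assumed values: 10 nm microparticles with a density of 3600 kg m^-3.
print(round(specific_surface_area(10e-9, 3600.0), 1))  # 166.7
```

In this model, microparticles must be on the order of 10 nm across to give areas above 100 m² g⁻¹, and halving the diameter doubles the area.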

Because catalysts lose activity and need to be regenerated and replaced periodically, they must be robust enough to 
withstand these processes. Catalyst particles used in large reactors must be strong enough to resist crushing under 
the weight of the particles above them. Some catalyst particles are used in entrained and fluidized bed reactors, 
where they are in constant motion (for rapid heat transfer), and they must be resistant to abrasion. Many catalysts 
must withstand use at high temperatures (~800 K).

Physical properties affecting catalyst performance include the surface area, pore volume and pore size distribution
(section B1.26). These properties regulate the tradeoff between the rate of the catalytic reaction on the internal
surface and the rate of transport (e.g., by diffusion) of the reactant molecules into the pores and the product
molecules out of the pores: the higher the internal area of the catalytic material per unit volume, the higher the rate
of the reaction per unit volume, up to a limit; beyond the limit, an increase in internal surface area requires such small
microparticles (and hence such small pores) that the restrictions of the pores limit the rate of transport through them
and lead to the existence of significant concentration gradients within the catalyst particles. Thus, the reactant
concentration becomes lower in the particle interior than at the edge, and the overall reaction rate is reduced and no
longer proportional to the internal surface area. If internal area is gained at the expense of increases in pore volume,
a limit is reached at which the material no longer meets the crush strength requirement.
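
The tradeoff between reaction and pore transport described above is often quantified with an effectiveness factor. The sketch below uses the standard textbook result for a first-order reaction in a flat slab, η = tanh(Φ)/Φ, where the Thiele modulus Φ compares the reaction rate with the diffusion rate. This model is brought in here as an illustrative assumption and is not taken from this section; the function names and input values are hypothetical.

```python
import math

def thiele_modulus(half_thickness: float, rate_constant: float,
                   effective_diffusivity: float) -> float:
    """Thiele modulus for a first-order reaction in a slab (consistent units)."""
    return half_thickness * math.sqrt(rate_constant / effective_diffusivity)

def effectiveness_factor(modulus: float) -> float:
    """Actual rate divided by the rate with no internal concentration gradients."""
    if modulus == 0.0:
        return 1.0
    return math.tanh(modulus) / modulus

# Hypothetical moduli spanning reaction-limited to diffusion-limited behaviour.
for modulus in (0.1, 1.0, 10.0):
    print(modulus, round(effectiveness_factor(modulus), 3))
```

For a small modulus nearly all of the internal surface is used. For a large modulus the factor falls roughly as 1/Φ, so added internal area no longer gives a proportional gain in rate.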

Catalyst particles are usually cylindrical in shape because it is convenient and economical to form them by 
extrusion — like spaghetti. Other shapes may be dictated by the need to minimize the resistance to transport of 
reactants and products in the pores; thus, the goal may be to have a high ratio of external (peripheral) surface area 
to particle volume and to minimize the average distance from the outside surface to the particle centre, without 
having particles that are so small that the pressure drop of reactants flowing through the reactor will be excessive. 

C2.7.5.3 CONSTITUENTS OF SOLID CATALYSTS 

A solid catalyst is usually a composite, consisting of a material called a support or carrier (which often lacks 
catalytic activity) and other components, including those with catalytic activity, and perhaps still others, called 
promoters. The support is usually the principal component, sometimes being 99% or more of the catalyst mass. 
Thus, the physical properties are largely determined by the support. Supports are usually ceramic materials (section
C2.12), the most common being transition aluminas such as γ-Al₂O₃. Others include silica (SiO₂), carbon and
zeolites. Transition aluminas offer the advantages of being inexpensive, robust, stable and formable into particles
with wide ranges of shapes, internal surface areas and pore size distributions. The typical catalyst incorporates
small amounts (e.g., 1 wt%) of catalytically active components (e.g., metals, metal oxides or metal sulphides) on
the internal surface of the support. Since these components are often expensive, they are dispersed as small
particles on the support surface; for example, particles of Pt on Al₂O₃ may be less than 1 nm in diameter, so small
that almost all the Pt atoms are exposed at a surface where reactant molecules can bond to them and catalysis can
occur. Supported metal catalysts have been applied for decades and are among the first nanomaterials (section
C1.2) to find industrial applications.

The components in catalysts called promoters lack significant catalytic activity themselves, but they improve a 
catalyst by making it more active, selective, or stable. A chemical promoter is used in minute amounts (e.g., parts 
per million) and affects the chemistry of the catalysis by influencing or being part of the catalytic sites. A textural 
(structural) promoter, on the other hand, is used in massive amounts and usually plays a role such as stabilization 
of the catalyst, for instance, by reducing the tendency of the porous material to collapse or sinter and lose internal 
surface area, which is a mechanism of deactivation. 


C2.7.6 EXAMPLES OF CATALYSIS 

C2.7.6.1 WILKINSON HYDROGENATION OF ALKENES CATALYSED BY A RHODIUM COMPLEX 

The hydrogenation of alkenes catalysed by organometallic compounds is exemplified by the Wilkinson catalyst of
figure C2.7.2, invented by the Nobel-prize-winning chemist Geoffrey Wilkinson. The cycle of figure C2.7.2,
elucidated by the group of Halpern [4, 5 and 6], has a number of characteristics that are so common that it can be
regarded as a prototype. The rhodium compounds (complexes) that react with the H₂ and alkene are coordinatively unsaturated
and their reactions with these reactants lead to dissociation of the reactants (which become ligands bonded to the 
metal). The ligands that are formed, H and alkyl, combine with each other while bonded to the Rh, leading to the 
formation of the alkane product and regeneration of the catalyst (closure of the cycle). Thus, the role of the Rh 
centres in the Wilkinson catalyst is similar to that of the iron sites on the surface of the ammonia synthesis catalyst 
in that they both cause dissociation of the reactants and position the resultant ligands so that they combine with 
each other to give the product. This generalization is often valid for metals in catalysts, whether they are in 
compounds in solution, on metal surfaces or on surfaces of metal oxides or sulphides, for example. 

There is more to the Wilkinson hydrogenation mechanism than the cycle itself; a number of species in the cycle are 
drained away by reaction to form species outside the cycle. Thus, for example, PPh₃ (Ph is phenyl) drains rhodium
from the cycle and thus it inhibits the catalytic reaction (slows it down). However, PPh₃ plays another, essential
role: it is part of the catalytically active species and, as an electron-donor ligand, it affects the reactivities of the
intermediates in the cycle in such a way that they react rapidly and lead to catalysis. Thus, there is a tradeoff that
implies an optimum ratio of PPh₃ to Rh.

When a strong electron-donor ligand such as pyridine is added to the reaction mixture, it can bond so strongly to 
the Rh that it essentially drains off all the Rh and shuts down the cycle; it is called a catalyst poison. A poison for 
many catalysts is CO; it works as a physiological poison in essentially the same way as it works as a catalyst 
poison: it bonds to the iron sites of haemoglobin in competition with O₂.

The reactivities of the species within the Wilkinson cycle are so great that they are not observed directly during the 
catalytic reaction; rather, they are present in a delicate dynamic balance during the catalysis in concentrations too 
low to observe easily, and only the more stable species outside the cycle (outside the dashed line in figure C2.7.2) 
are the ones observed. Obviously it was no simple matter to elucidate this cycle; the research required piecing it 
together from observations of kinetics and equilibria under conditions chosen so that sometimes the cycle 
proceeded slowly or not at all. 

Techniques such as NMR spectroscopy (section B1.12) and IR spectroscopy (section B1.2) are useful in such 
experiments. Furthermore, theory (section B3.1) has proceeded to the point of being successful in predicting some 
simple catalytic cycles. 

C2.7.6.2 CHIRAL HYDROGENATION OF ALKENES CATALYSED BY A RHODIUM COMPLEX 

A hydrogenation reaction closely related to that referred to in the preceding paragraphs is that represented in figure 
C2.7.4 [7, 8]. The catalyst incorporates a bidentate (two-toothed) phosphine ligand that affects the stereochemistry 
(chirality) of the reaction. There are two pathways for reaction (figure C2.7.5) giving products of different 
stereochemistries: the pathway shown on the left shows the preferred (more stable) mode of binding of the reactant 
with the catalyst; the pathway on the right involves a minor isomer of the reactant-catalyst complex. The chirality 
of the product is determined predominantly by the latter pathway, which gives the major product because the 
reactant-catalyst complex on the right is much more reactive than that on the left, even though it is formed in lower 
concentrations than that on the left. 

Figure C2.7.4. Catalytic cycle for hydrogenation of methyl-(Z)-α-acetamidocinnamate; the rate constants were 
measured at 298 K; S is solvent [8]. 

This example illustrates a subtle control of a chemical reaction by a delicate manipulation of the stereochemical 
environment around a metal centre dictated by the selection of the ligands. This example hints at the subtlety of 
nature's catalysts, the enzymes, which are also typically stereochemically selective. Chiral catalysis is important in 
biology and in the manufacture of chemicals to regulate biological functions, i.e., pharmaceuticals. 

C2.7.6.3 NO DECOMPOSITION CATALYSED BY A RUTHENIUM SURFACE 

Scanning tunnelling microscopy (STM, section B1.19) has made it possible to observe microscopic details of 
catalyst surfaces, as shown by an investigation of one of the simplest solid catalysts, a single crystal of a pure metal 
(see section A1.7), Ru, which is active and selective for the decomposition of NO, which proceeds via dissociative 
chemisorption to give N and O atoms [9]. A clean Ru(0001) sample was prepared under ultrahigh vacuum and 
exposed to NO at 300 K, which dissociated completely on the surface. The distribution of N and O atoms on the 
surface was investigated by STM at 300 K, allowing the researchers to distinguish isolated N atoms (dark spots), 
islands of O atoms, and individual O atoms (figure C2.7.6); adsorbed NO molecules were not observed, as they 
move very rapidly on the surface [9]. Data were obtained at various times after the exposure of the crystal to NO. 

Figure C2.7.6. STM images of an Ru(0001) surface after dissociative adsorption of NO at 315 K. (A) Image (38 
nm × 33 nm) showing two terraces separated by a monatomic step (black stripe). (B) Close-up (6 nm × 4 nm) 
showing an O island and individual N atoms. Individual O atoms are imaged as dashes (arrow) [9]. 

The Ru surface is one of the simplest known, but, like virtually all surfaces, it includes defects, evident as a step in 
figure C2.7.6. The observations show that the sites where the NO dissociates (active sites) are such steps. The 
evidence for this conclusion is the locations of the N and O atoms; there are gradients in the surface concentrations 
of these elements, indicating that the transport (diffusion) of the O atoms is more rapid than that of the N atoms; 
thus, the slow-moving N atoms are markers for the sites where the dissociation reaction must have occurred, where 
their surface concentrations are highest. 

It has long been inferred that surface sites of low coordination (comparable to the metals in coordinatively 
unsaturated rhodium complexes in figure C2.7.2) are catalytically more active than those with larger numbers of 
neighbours and lower degrees of coordinative unsaturation, e.g., those on terraces on the Ru surface. Even more 
highly unsaturated are those sites at corners. The sites for one catalytic reaction may be different from those for 
another and, furthermore, the reactants may change the surface and create active sites. Active sites on surfaces 
more complex than those of single metal crystals are difficult to determine. 

C2.7.6.4 PROPANE METATHESIS CATALYSED BY TANTALUM COMPLEXES ANCHORED TO SILICA 

An alternative to elucidating the active sites on a surface is to synthesize them. For example, a new catalyst for 
metathesis of alkanes, 

$2\,\mathrm{C}_n\mathrm{H}_{2n+2} \rightarrow \mathrm{C}_{n+i}\mathrm{H}_{2(n+i)+2} + \mathrm{C}_{n-i}\mathrm{H}_{2(n-i)+2}$ (C2.7.3) 

was synthesized on the surface of silica by the reaction of surface oxygen atoms with a reactive organometallic 
precursor, Ta(-CH₂CMe₃)₃(=CHCMe₃) (Me is methyl); two surface species were formed and identified by 
techniques including IR, NMR and EXAFS spectroscopies (section B1.6) [10]: (SiO-)Ta(-CH₂CMe₃)₂(=CHCMe₃) 
(~65%) and (SiO-)₂Ta(-CH₂CMe₃)(=CHCMe₃) (~35%). The authors suggested a catalytic cycle (although they 
lack the strongest evidence of the intermediates) (figure C2.7.7) in which the surface-bound Ta group is denoted 
with a subscript s. Thus, the catalysis is molecular and the silica just an enormous, rigid ligand. This anchoring 
helps to stabilize the tantalum complex in states of coordinative unsaturation that probably do not exist in 
solution; analogous complexes in solution could react with each other (leading to self-inhibition of catalysis). 
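
The stoichiometry of alkane metathesis can be checked with a few lines of code. The following is a minimal sketch, assuming the usual form of the reaction in which two molecules of an alkane with n carbon atoms give one higher homologue (n + i carbons) and one lower homologue (n - i carbons); the function names are illustrative and are not taken from the cited work.

```python
from typing import Tuple


def alkane_formula(carbons: int) -> str:
    """Molecular formula C_nH_(2n+2) of an acyclic alkane."""
    if carbons < 1:
        raise ValueError("an alkane needs at least one carbon atom")
    carbon_part = "C" if carbons == 1 else f"C{carbons}"
    return f"{carbon_part}H{2 * carbons + 2}"


def metathesis_products(n: int, i: int) -> Tuple[str, str]:
    """Higher and lower homologues formed from two C_n alkane molecules."""
    if not 1 <= i < n:
        raise ValueError("i must satisfy 1 <= i < n")
    return alkane_formula(n + i), alkane_formula(n - i)


def atoms_balanced(n: int, i: int) -> bool:
    """Check that the C and H counts agree on both sides of the equation."""
    carbon_in, hydrogen_in = 2 * n, 2 * (2 * n + 2)
    carbon_out = (n + i) + (n - i)
    hydrogen_out = (2 * (n + i) + 2) + (2 * (n - i) + 2)
    return carbon_in == carbon_out and hydrogen_in == hydrogen_out


# Propane (n = 3) with i = 1 gives butane and ethane.
print(metathesis_products(3, 1), atoms_balanced(3, 1))
```

For propane the sketch returns the formulas of butane and ethane, and the atom balance holds for any allowed pair (n, i).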

Figure C2.7.7. Catalytic cycle for the metathesis of propane [10]. 

C2.7.6.5 CATALYTIC ACTION OF THE ENZYME HALOALKANE DEHALOGENASE 

An organism investigated for conversion of environmentally harmful halogenated compounds (e.g., dry-cleaning 
solvents) degrades them by dehalogenation catalysed by the enzyme haloalkane dehalogenase; 1-haloalkane and 
water react to give a primary alcohol and halide ion. The reaction mechanism was investigated by x-ray 
crystallography; enzyme crystals were soaked in dichloroethane to give a reactant-catalyst complex at 277 K and 
pH 5.0 (values less than those corresponding to the maximum reaction rate) and the complex was stable enough to 
allow an accurate structure determination [11]. Similar measurements were made with the sample warmed to room 
temperature and after standing long enough for the catalytic reaction to take place and the enzyme to become 
complexed with Cl⁻ product (but not 2-chloroethanol product). 

The data led to the cycle shown in figure C2.7.8. Here, only the active site on the interior enzyme surface (section 
C2.6) is depicted, consisting of R groups including aspartic acid, glutamic acid and others, represented with the 
shorthand Asp260, Glu56 etc; the subscripts represent the positions on the polypeptide chain. 

Figure C2.7.8. Catalytic cycle indicating the working of the enzyme haloalkane dehalogenase [11]. 

C2.7.6.6 CO OXIDATION CATALYSIS ON A PLATINUM SURFACE 


In the preceding example, the structure of the catalyst combined with reactants and products was determined and 
the data were used to infer a cycle. Structures of the highly reactive intermediates in catalysis are generally elusive 
and information about them is based only on inference. In prospect, the most incisive information about the workings 
of a catalyst can be obtained by observations of the catalyst in action. The following example illustrates this 
strategy. 

The CO oxidation occurring in automobile exhaust converters is one of the best understood catalytic reactions, 
taking place on Pt surfaces by dissociative chemisorption of O₂ to give O atoms and chemisorption of CO, which 
reacts with chemisorbed O to give CO₂, which is immediately released into the gas phase. Details are evident from 
STM observations focused on the reaction between adsorbed O and adsorbed CO [12]. 

Experiments were carried out with a Pt(111) single-crystal surface cleaned in an ultrahigh-vacuum system. To 
monitor the progress of the reaction as a function of time, the researchers brought O₂ into contact with the Pt to 
give a surface partially covered with O atoms, which aggregate into islands with a distinct periodic structure; the O 
atoms show up as dark dots in islands in the image of figure C2.7.9 at time 0. At time 0, CO was introduced into 
the reactor and bonded with the surface, thereupon reacting with the chemisorbed O and reducing the sizes of the 
islands, as shown by the STM images at increasing times after the introduction of CO ( figure C2.7.9 ). The CO 
molecules that must have been present on the surface at short times were not visible because of their high 
mobilities. But after 290 s, and more clearly after 600 s, the O islands had markedly shrunk because of reaction, 
and the CO had become evident as an additional ordered, streaky structure ( figure C2.7.9 ); at these times the CO 
had formed into closely packed immobilized structures. After longer times, the O islands became smaller, 
indicating the progress of the catalytic reaction; by 2020 s, the O had been completely converted ( figure C2.7.9 ). 
Note the contrast with the NO decomposition reaction; the CO oxidation takes place on the flat metal surface, not 
just on minority sites such as steps. 

A striking feature of the images is the nonuniformity of the distribution of the adsorbed species. The reaction 
between O and CO takes place at the boundaries between the surface domains and it was possible to determine 
reaction rates by measuring the change in length L of the boundaries of the O islands. The kinetics is represented by 
the rate equation 


$r = -\mathrm{d}n/\mathrm{d}t = kL$ (C2.7.4) 

where r is rate, n the number of surface O atoms, t time and k the rate constant. Data determined from the images 
are shown in figure C2.7.10 . Similarly, the rate was determined in the conventional way by macroscopic 
measurements of the surface coverages, assuming a random distribution of chemisorbed O and CO: 

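$r = k'\theta_{\mathrm{O}}\theta_{\mathrm{CO}}$ (C2.7.5) 

where $k'$ is a rate constant and $\theta_{\mathrm{O}}$ and $\theta_{\mathrm{CO}}$ are the surface coverages of O and CO. Data determined in this way are also shown in figure C2.7.10. The agreement between 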
the microscopic and macroscopic kinetics data provides a verification of the simplified representation of the 
conventional macroscopic kinetics, which works rather well, although the distribution of surface species is far from 
random. 
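
The difference between the two rate laws can be made concrete with a toy lattice. The sketch below is only an illustration under stated assumptions: it uses a square grid (the real Pt(111) surface is hexagonal), counts nearest-neighbour O/CO contacts as a stand-in for the boundary length L, and leaves the rate constants as unspecified placeholders. None of the names or numbers come from the cited STM study.

```python
def boundary_length(grid):
    """Number of nearest-neighbour O/CO contacts on a square lattice."""
    rows, cols = len(grid), len(grid[0])
    contacts = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "O":
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == "C":
                    contacts += 1
    return contacts


def coverages(grid):
    """Fractional coverages (theta_O, theta_CO) of the lattice."""
    cells = [cell for row in grid for cell in row]
    return cells.count("O") / len(cells), cells.count("C") / len(cells)


def microscopic_rate(k, grid):
    """Rate proportional to the island boundary length, r = k L."""
    return k * boundary_length(grid)


def mean_field_rate(k_prime, grid):
    """Rate assuming randomly mixed adsorbates, r = k' theta_O theta_CO."""
    theta_o, theta_co = coverages(grid)
    return k_prime * theta_o * theta_co


# A 2 x 2 island of O atoms ("O") surrounded by adsorbed CO ("C").
island = ["CCCC",
          "COOC",
          "COOC",
          "CCCC"]
print(boundary_length(island), coverages(island))
```

In this picture only the O atoms at the island edge can react, whereas the mean-field expression treats every O atom as equally exposed to CO.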

Figure C2.7.9. STM images recorded during reaction of adsorbed O atoms with adsorbed CO molecules on a Pt(111) 
crystal at 247 K; image size, 18 nm × 17 nm. Times are those after addition of CO to the surface; see text for details [ 

Figure C2.7.10. Rates of reaction of CO with O on Pt(111), determined from microscopic (■) and macroscopic (+) < 
text for details [12]. 

C2.7.6.7 SHAPE-SELECTIVE HYDROCARBON REACTIONS CATALYSED BY ZEOLITES 


Zeolites (section C2.13) are unique because they have regular pores as part of their crystalline structures. The pores 
are so small (about 1 nm in diameter) that zeolites are molecular sieves, allowing small molecules to enter the 
pores, whereas larger ones are sieved out. The structures are built up of linked SiO₄ and AlO₄ tetrahedra that share 
O ions. The faujasites (zeolite X and zeolite Y) and ZSM-5 are important industrial catalysts. The structure of 
faujasite is represented in figure C2.7.11 and that of ZSM-5 in figure C2.7.12. The points of intersection of the 
lines represent Si or Al ions; oxygen is present at the centre of each line. This depiction emphasizes the zeolite 
framework structure and shows the presence of the intracrystalline pore structure. In the centre of the faujasite 
structure is an open space (supercage) with a diameter of about 1.2 nm. The pore structure is three dimensional, 
with the supercages connected by apertures with diameters of about 0.74 nm. Molecules small enough to fit through 
the apertures can undergo catalytic reaction in the cages. ZSM-5 also has a three-dimensional structure, with 
straight parallel pores intersected by zig-zag pores. 



Figure C2.7.11. Framework structure of zeolites X and Y. 

Figure C2.7.12. Structure of zeolite ZSM-5; (a) framework, (b) schematic representation of pores. 

The zeolite frameworks are built up of SiO₄ tetrahedra, which are neutral, and AlO₄ tetrahedra, which have a 
charge of −1. The charge of the AlO₄ tetrahedra is balanced by the charges of cations located at various 
crystallographically defined positions in the zeolite, many of them exposed at the internal surface. The cations are 
typically catalytically active sites. When the cations are H⁺ (in OH groups), the zeolites are acidic. Acidic zeolite Y 
(HY) and HZSM-5 are applied as components of petroleum cracking catalysts to make gasoline. The OH groups 
located near AlO₄ tetrahedra are strong Brønsted acids and the catalytic sites for many reactions, including those 
mentioned below. 


Zeolites are unique as shape-selective catalysts. Mass transport shape selectivity is a consequence of transport 
restrictions allowing some species to diffuse more rapidly than others in zeolite pores. Small molecules enter the 
pores and are catalytically converted, but larger molecules may pass through a flow reactor unconverted because 
they do not fit into the pores, where almost all the catalytic sites are located. Similarly, product molecules formed 
inside a zeolite may be so large that they diffuse out of the pores only slowly and are largely converted 
into other products before escaping into the product stream. Mass transport selectivity is illustrated by toluene 
disproportionation catalysed by HZSM-5 [13] (figure C2.7.13). The desired product is industrially valuable p-xylene. 
The ortho- and meta-isomers are bulkier than the para-isomer and diffuse less readily in the zeolite pores. The 
transport restriction favours their conversion into the para-isomer, which is formed in excess of the equilibrium 
concentration. Because the selectivity is transport influenced, it is dependent on the path length for transport, which 
is the length of the zeolite crystallites. 

Figure C2.7.13. Schematic representation of diffusion and reaction in pores of HZSM-5 zeolite-catalysed toluene 
disproportionation; the numbers are approximate relative diffusion coefficients in the pores [13]. 

A different kind of shape selectivity is restricted transition state shape selectivity. It is related not to transport 
restrictions but instead to size restrictions of the catalyst pores, which hinder the formation of transition states that 
are too large to fit; thus reactions proceeding through smaller transition states are favoured. The catalytic activities 
for the cracking of hexanes to give smaller hydrocarbons, measured as first-order rate constants at 811 K and 
atmospheric pressure, were found to be the following for the reactions catalysed by crystallites of HZSM-5 [14]: 
n-hexane, 29; 3-methylpentane, 19; 2,2-dimethylbutane, 12 s⁻¹. The reaction rates were independent of the zeolite 
crystallite size, which rules out a transport effect. Instead, the selectivity is determined by the geometry of the 
transition states, which become bulkier with increasing branching of the molecules and are believed to be C₁₂ 
carbenium ions resulting from the carbon-carbon bond formation reaction of the alkane with a carbenium ion 
formed from the alkane by abstraction of a hydride ion by another carbenium ion. These C₁₂ carbenium ions easily 
fit in the zeolite pores when the reactant has no branches and barely fit when it has two branches; the crowding in 
the pores hinders the reaction. 

C2.8 Corrosion 

P Schmuki and M J Graham 


C2.8.1 INTRODUCTION 

Corrosion is the 'gnawing away' of materials due to exposure to different environments. Basically, a material is 
trying to return to its natural state, e.g., metallic iron oxidizes to form the ore from whence it came. 

The most common form of corrosion (in terms of tons of materials lost) is electrochemical corrosion, which can 
occur for example in aqueous solutions, in the atmosphere and in the ground. Here, the actual corrosion reaction is 
invariably the anodic or oxidation reaction, whereby a metal dissolves while releasing electrons and ions. Thus one 
might say that 'corrosion' is a negative way of looking at an electrochemical dissolution or oxidation reaction. The 
reason for separating this topic from other dissolution or oxidation reactions which are of economic benefit, e.g. 
oxidation of silicon to form semiconductor devices, is based on the historic roots of corrosion science and the 
tremendous economic significance of material destruction. In industrialized countries, the cost of corrosion is 
estimated to be about 3.5% of the GNP. Areas such as construction materials, electronics and transportation are 
affected and thus an extensive number of reference books is available [1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32 and 33]. Frequently, modes of corrosion are 
described according to the type of attack (e.g. uniform corrosion, localized corrosion) or the topic is categorized 
according to the specific material involved. 


In the following, however, classification of corrosion processes is made according to the reaction mechanism rather 
than the phenomenology. Emphasis is placed on electrochemical reactions, including high temperature oxidation. 
Other types of corrosion such as purely physical processes (e.g. erosion, fretting) or mixed type (e.g. stress 
corrosion cracking) are only briefly mentioned, with reference to further reading, in section C2.8.5. 


C2.8.2 ELECTROCHEMICAL FUNDAMENTALS [34, 35, 36, 37 AND 38] 

One can distinguish between a thermodynamic and kinetic stability to corrosion. 

C2.8.2.1 THERMODYNAMIC CONSIDERATIONS 

Thermodynamic stability is generally provided for noble metals in most media as their oxidation potential is more 
anodic than the reduction potential of species commonly occurring in the surrounding phase. However, for many 
materials of technological and industrial importance this is not the case. 

For example, for iron in aqueous electrolytes, the thermodynamic warning of the likelihood of corrosion is given by 
comparing the standard electrode potential of the metal oxidation with the potential of possible reduction reactions. 

The metal anodic oxidation reaction, Fe → Fe²⁺ + 2e⁻, can be written in the standard (reduction) notation as: 

$\mathrm{Fe^{2+}(aq)} + 2e^- \rightarrow \mathrm{Fe(s)} \qquad E^0 = -0.44\ \mathrm{V}$ (C2.8.1) 

where $E^0$ denotes the standard redox potential of the reaction versus the normal hydrogen electrode (NHE). 
Depending on the electrolyte pH, one of the following reduction (cathodic) reactions will dominate: 

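(a) in acidic solutions: 

$2\mathrm{H^+(aq)} + 2e^- \rightarrow \mathrm{H_2(g)} \qquad E^0 = 0\ \mathrm{V}\ (\text{at } a_{\mathrm{H^+}} = 1)$ (C2.8.2) 

(b) in neutral and in alkaline solutions: 

$2\mathrm{H_2O(l)} + \mathrm{O_2(g)} + 4e^- \rightarrow 4\mathrm{OH^-(aq)} \qquad E^0 = 0.4\ \mathrm{V}\ (\text{at } a_{\mathrm{OH^-}} = 1)$ (C2.8.3) 

$E^0 = 1.23\ \mathrm{V}\ (\text{at } a_{\mathrm{H^+}} = 1).$ (C2.8.4) 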

Both cathodic reactions can drive the metal oxidation. Of course, the potentials given above are only standard 
Gibbs values ($E^0$), and the effective electrode potential follows the Nernst equation (see section C2.11). For the 
oxidation (anodic) reaction, the potential ($E_a$) of the Nernst equation can be written as: 

$E_a = E^0_{\mathrm{Fe^{2+}/Fe}} + \frac{RT}{zF}\ln[\mathrm{Fe^{2+}}].$ (C2.8.5) 



For the reduction reaction potential ($E_c$), the Nernst equations are: 

(a) 

$E_c = E^0_{\mathrm{H^+/H_2}} + \frac{RT}{F}\ln[\mathrm{H^+}] \approx -0.059\,\mathrm{pH}$ (C2.8.6) 

(b) 

$E_c = E^0_{\mathrm{O_2/H_2O}} + \frac{RT}{F}\ln[\mathrm{H^+}] \approx 1.23\ \mathrm{V} - 0.059\,\mathrm{pH}.$ (C2.8.7) 

The right-hand term in both equations indicates the direct dependence of $E_c$ on the pH of the medium (assuming 
otherwise standard conditions). 
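
The pH dependence of the two cathodic potentials is easy to tabulate. The following minimal sketch assumes 25 °C, unit activities of all species other than H⁺, and the standard potentials quoted above; the function names are illustrative.

```python
import math

R = 8.314462618   # molar gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def nernst_slope(temperature=298.15):
    """Change of potential per pH unit, (RT/F) ln 10, in volts."""
    return R * temperature * math.log(10) / F


def hydrogen_potential(ph, temperature=298.15):
    """Cathodic potential of reaction (a), H+/H2, in V versus NHE."""
    return 0.0 - nernst_slope(temperature) * ph


def oxygen_potential(ph, temperature=298.15):
    """Cathodic potential of reaction (b), O2 reduction, in V versus NHE."""
    return 1.23 - nernst_slope(temperature) * ph


for ph in (0, 7, 14):
    print(f"pH {ph:2d}: E_c(a) = {hydrogen_potential(ph):+.3f} V, "
          f"E_c(b) = {oxygen_potential(ph):+.3f} V")
```

Both lines fall by about 0.059 V per pH unit, so the separation of 1.23 V between them is independent of pH.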

For the coupled redox cell, the e.m.f. ($E$) results as: 

$E = E_c - E_a$ (C2.8.8) 

and the Gibbs free energy ($\Delta G$) of the reaction as: 

$\Delta G = -zFE.$ (C2.8.9) 

Thus, it can basically be predicted under what conditions (pH, concentration of redox species) the metal dissolution 
reaction (Fe → Fe²⁺) proceeds thermodynamically. From a practical point of view, the rate of the reaction and 
therefore the fate of the oxidized species (Fe²⁺) is extremely important: they can either be solvated, i.e., form 
Fe(H₂O)₆²⁺ complexes, and therefore be efficiently dissolved in the solution, or they can react with oxygen species of 
the solution to form a surface oxide layer (FeO, Fe₃O₄, Fe₂O₃). Such oxide layers can represent effective kinetic 
barriers against corrosion (see section C2.8.3 on passive films). 
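
As a worked example of the thermodynamic criterion, the sketch below evaluates the e.m.f. and the Gibbs free energy for iron dissolving in an acidic solution with hydrogen evolution as the cathodic reaction. It assumes 25 °C, pH 1 and unit activity of Fe²⁺; these conditions and the function names are chosen purely for illustration.

```python
import math

R = 8.314462618   # molar gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1
T = 298.15        # temperature, K (assumed)


def iron_anodic_potential(fe2_activity, e0=-0.44, z=2):
    """Nernst potential E_a of the Fe2+/Fe couple in V versus NHE."""
    return e0 + R * T / (z * F) * math.log(fe2_activity)


def hydrogen_cathodic_potential(ph):
    """Nernst potential E_c of the H+/H2 couple in V versus NHE."""
    return -(R * T * math.log(10) / F) * ph


def cell_emf(e_c, e_a):
    """E.m.f. of the coupled cell, E = E_c - E_a."""
    return e_c - e_a


def gibbs_energy(z, emf):
    """Gibbs free energy of reaction, Delta G = -z F E, in J mol^-1."""
    return -z * F * emf


e_a = iron_anodic_potential(1.0)
e_c = hydrogen_cathodic_potential(1.0)
emf = cell_emf(e_c, e_a)
delta_g = gibbs_energy(2, emf)
print(f"E = {emf:.3f} V, Delta G = {delta_g / 1000:.1f} kJ/mol")
```

A positive e.m.f. (negative ΔG) means that dissolution is thermodynamically favourable; it says nothing about how fast it proceeds.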

Thus it is important to take into account the thermodynamics of oxide formation and any additional electrochemical 
reactions such as the oxidation of Fe²⁺ to Fe³⁺. The results of the calculations are frequently represented as 
pH-potential diagrams (so-called Pourbaix diagrams [39]). The Pourbaix diagram for iron in an aqueous 
environment is shown in figure C2.8.1. 

Figure C2.8.1. Simplified E/pH diagram (Pourbaix diagram) for the iron-water system at 25°C. The diagram is 

drawn for a concentration of dissolved Fe species of 10 mol 1 . The potentials are given versus the normal 
hydrogen electrode (NHE) scale. 

The diagram gives regions of existence, i.e. for a particular combination of pH and redox potential it can be 
predicted whether it is thermodynamically favourable for iron to be inert (stable) (region A), to actively dissolve 
(region B) or to form an oxide layer (region C). 

The dotted lines represent the cases when the above cathodic reactions, (a) or (b), drive the reaction. The solid lines 
indicate the stability ranges for Fe and its corrosion products (Fe²⁺, Fe³⁺, Fe₃O₄, Fe₂O₃, HFeO₂⁻). 

Consider, for example, an acidic solution at pH 1: iron dissolves (formation of Fe(H₂O)₆²⁺); as $E_c$ of reaction (a) is 
at -0.06 V NHE we are in region B (existence of Fe²⁺). Additionally, it can be seen that the Fe²⁺ species can be 
further oxidized to Fe³⁺ if O₂ is present in the electrolyte (line (b) lies in the region of existence of Fe³⁺). 

Considering the case of pH > 9, the formation of an oxide film is favoured compared with Fe dissolution. 

In the case of a neutral solution (e.g. pH = 7), depending on the corrosion potential all these three ranges (stability, 
dissolution or oxide formation) may be involved. 

pH-potential diagrams are available for many elements in aqueous environments and are often a valuable tool in 
the preliminary assessment of the (thermodynamic) stability of a system. However, it should be pointed out that 
these calculations are based purely on thermodynamic considerations and, hence, this approach gives no 
information on the rate (kinetics) of the possible corrosion reactions. 

C2.8.2.2 KINETIC CONSIDERATIONS 


For many practically relevant material/environment combinations, thermodynamic stability is not provided, since 
$E_c > E_a$. Hence, a key consideration is how fast the corrosion reaction proceeds. As for other electrochemical 
reactions, a variety of factors can influence the rate determining step. In the most straightforward case the reaction 
is activation energy controlled; i.e. the ion transfer through the surface Helmholtz double layer involving migration 
and the adjustment of the hydration sphere to electron uptake or donation is rate determining. The transition state is 
called an 'activated surface complex'. 

Alternatively, the mass transport properties in the solution can become rate determining — the reaction is then said 
to be diffusion controlled. 

(A) ACTIVATION CONTROL (SEE ALSO SECTION C2.11 OF THIS BOOK) 

Let us consider the oxidation of Fe(s) to Fe²⁺ (solvated), which can be described by the following reaction 
sequence [36, 40]: 

$\mathrm{Fe} \leftrightarrow \underbrace{\mathrm{Fe(OH)} \leftrightarrow \mathrm{FeOH^+}}_{\text{activated transition couple}} \leftrightarrow \mathrm{Fe(H_2O)_6^{2+}}$ (C2.8.10) 

The intermediate species Fe(OH) and FeOH⁺ can be regarded as constituting the activated surface complex. 
The rate constant for formation and decay of this complex, $k$, can be written as 

$k = k_0 e^{-\Delta G^{\#}/RT}.$ (C2.8.11) 

As the reaction leading to the complex involves electron transfer it is clear that the activation energy $\Delta G^{\#}$ for 
complex formation can be lowered or raised by an applied potential ($\Delta\Phi$). Of course, both the forward (oxidation) 
and the reverse (reduction) reactions are influenced by $\Delta\Phi$. If one expresses the reaction rate as a current 
flow ($j$), the above equation (C2.8.11) can be expressed in terms of the Butler-Volmer equation (for a more detailed 
treatment see section C2.11). For the anodic reaction (Fe → Fe²⁺), the resulting anodic current density, $j_a$, upon 
applying externally an anodic voltage $\Delta\Phi$ has the form 

$j_a = j_0 e^{b\Delta\Phi}.$ (C2.8.12) 

For the cathodic reaction (Fe²⁺ → Fe), the cathodic current density $j_c$ can analogously be written as 

$j_c = -j_0 e^{-b'\Delta\Phi}$ (C2.8.13) 

where $j_0$, $b$ and $b'$ are constants. ($j_0$ is the so-called exchange current density, i.e., the reaction current density in the 
absence of an external applied potential.) 

Since any current resulting from the anodic reaction must be consumed by the cathodic reaction, the cathodic 
current, $j_c$, must be equal in magnitude to the anodic current $j_a$. As a consequence, the equilibrium potential $\Phi$ of a metal (e.g. 
Fe) that is immersed into an aqueous electrolyte will be adjusted by the condition that $j_a = |j_c|$ ($= j_0$). This is 
illustrated in figure C2.8.2(a). Under ideal conditions, $\Delta\Phi$ is the Nernst potential ($E_{\mathrm{Fe}}$) of the Fe²⁺/Fe couple. For 
iron immersed in an aqueous electrolyte, the dominant reduction reaction is, however, one of the reactions 
involving the electrolyte (equations (C2.8.2), (C2.8.3) and (C2.8.4)). For this reaction equilibrium the same 
principle applies as outlined above for the Fe²⁺/Fe case; e.g. for H⁺ ↔ H₂ the condition $j_a = |j_c|$ applies at $E_{\mathrm{H}}$ (a 
diagram analogous to that for Fe in figure C2.8.2(a) could be drawn for the H⁺/H₂ couple). 

Figure C2.8.2. (a) Schematic polarization curves for the anodic and cathodic reaction of an Fe/Fe²⁺ electrode. The 
anodic and cathodic branches of the curve correspond to equations (C2.8.12) and (C2.8.13) respectively. The 
equilibrium potential of this electrode will adjust so that $|j_a| = |j_c|$; the corresponding potential is $E_{\mathrm{redox}}$ of Fe. (b) 
Schematic polarization curves for a mixed electrode (Fe in an aqueous solution in the presence of hydrogen ions), 
i.e. the redox system of Fe (figure C2.8.2(a)) is coupled with a second redox system: H⁺/H₂. Also in this mixed case the 
absolute values of the two partial current densities are equal ($|j_a| = |j_c|$) at equilibrium. The corresponding 
potential is the corrosion potential ($E_{\mathrm{corr}}$); the corrosion current density is equal to the anodic current density at 
$E_{\mathrm{corr}}$ ($j_{\mathrm{corr}} = j_a$). Experimentally, only the sum of the anodic and cathodic branch is accessible; the dotted line 
represents this sum of current density ($j_s$). (c) Determination of the corrosion current from a so-called Tafel plot 
(log($j_s$) against $U$). The corrosion current density ($j_{\mathrm{corr}}$) is obtained from the extrapolation of linear parts of the 
cathodic and anodic branches of $j_s$ to the corrosion potential. 

The coupled situation of both redox equilibria is described by the so-called 'mixed potential theory'. The mixed 
oxidation (Fe → Fe²⁺) and reduction (H⁺ → H₂) systems will equilibrate to zero net current; the resulting potential, 
which lies between $E_{\mathrm{Fe^{2+}/Fe}}$ and $E_{\mathrm{H^+/H_2}}$, is called the corrosion potential ($E_{\mathrm{corr}}$). This is illustrated in figure C2.8.2(b) 
(which is obtained by 'adding' figure C2.8.2(a) and the analogous curves for the H⁺/H₂ couple). The rate of 
corrosion is given by the current of metal ions leaving the metal surface in the anodic region. Thus, the corrosion 
current density, $j_{\mathrm{corr}}$, can be identified with the anodic current of the coupled system, $j_a$. 
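
The mixed potential can be located numerically as the potential at which the net current of the two coupled redox systems vanishes. The sketch below assumes exponential anodic and cathodic branches for each couple and uses bisection to find the zero; all parameter values (equilibrium potentials aside) are hypothetical and serve only to illustrate the construction of figure C2.8.2(b).

```python
import math


def partial_current(u, e_eq, j0, b_anodic, b_cathodic):
    """Net current density of one redox couple at potential u (V).

    The anodic branch is counted positive and the cathodic branch
    negative; both are exponential in the overpotential u - e_eq.
    """
    overpotential = u - e_eq
    return j0 * (math.exp(b_anodic * overpotential)
                 - math.exp(-b_cathodic * overpotential))


def net_current(u, couples):
    """Sum of the partial current densities of all couples at potential u."""
    return sum(partial_current(u, *couple) for couple in couples)


def corrosion_potential(couples, lower, upper, tol=1e-10):
    """Bisection for net_current(u) = 0; the net current rises with u."""
    if net_current(lower, couples) > 0 or net_current(upper, couples) < 0:
        raise ValueError("bracket does not enclose the zero-current potential")
    while upper - lower > tol:
        middle = 0.5 * (lower + upper)
        if net_current(middle, couples) > 0:
            upper = middle
        else:
            lower = middle
    return 0.5 * (lower + upper)


# Hypothetical parameters: (E_eq / V, j0 / A m^-2, b / V^-1, b' / V^-1)
IRON = (-0.44, 1e-4, 19.5, 19.5)
HYDROGEN = (0.0, 1e-2, 19.5, 19.5)

e_corr = corrosion_potential([IRON, HYDROGEN], lower=-0.44, upper=0.0)
# Corrosion current density: anodic branch of the iron couple at E_corr.
j_corr = IRON[1] * math.exp(IRON[2] * (e_corr - IRON[0]))
print(f"E_corr = {e_corr:.3f} V, j_corr = {j_corr:.3e} A m^-2")
```

The computed potential lies between the two equilibrium potentials, as the mixed potential theory requires; its exact position depends on the relative exchange current densities and slopes assumed.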


As both processes, reduction and oxidation, take place on the same electrode surface (a short-circuited system), it is 
not possible to directly measure the corrosion current. Experimentally, only the sum of the anodic and cathodic 
current densities ($j_s$) is accessible (the dotted line in figures C2.8.2(b) and (c)). To obtain the corrosion rate, $j_s$ can 
be measured as a function of an externally applied voltage. To acquire the polarization curves as in figure C2.8.2(c), 
a traditional three-electrode set-up (figure C2.8.3) is mainly used where the system is compared with a 
reference electrode. The potential (voltage) between the metal and the reference electrode is varied using a 
counter-electrode and the current is registered (figure C2.8.3). From figure C2.8.2(b) it is clear that, for potentials which 
are sufficiently far from the equilibrium value, $j_s \approx j_a$. Thus, in a plot of log($j_s$) versus $\Delta U$ a linear portion is 
obtained which can be extrapolated back to $E_{\mathrm{corr}}$ as shown in figure C2.8.2(c). The corresponding current density 
value is $j_{\mathrm{corr}}$. The semilog representation of figure C2.8.2(c) is often referred to as a Tafel plot; the slope of the 
linear portions, which depends on the exact mechanism of the charge transfer reaction, is accordingly called the 
Tafel slope. 
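
The Tafel extrapolation can be sketched as a straight-line fit to the logarithm of the measured current at large anodic overpotentials, followed by evaluation of the fitted line at the corrosion potential. The example below generates synthetic data from assumed values of the corrosion potential, corrosion current density and exponential slope, and then recovers the corrosion current density from the fit; the data and all names are illustrative, not measurements.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tafel_extrapolation(potentials, currents, e_corr, min_overpotential=0.1):
    """Fit log10(j) on the anodic branch and extrapolate to e_corr.

    Returns (j_corr, tafel_slope), the latter in volts per decade.
    """
    points = [(u, math.log10(j)) for u, j in zip(potentials, currents)
              if u - e_corr >= min_overpotential and j > 0]
    if len(points) < 2:
        raise ValueError("not enough points in the linear (Tafel) region")
    xs, ys = zip(*points)
    slope, intercept = fit_line(xs, ys)
    return 10 ** (slope * e_corr + intercept), 1.0 / slope


# Synthetic polarization curve (assumed values, for illustration only).
E_CORR, J_CORR, B = -0.10, 1e-3, 19.5
potentials = [E_CORR + 0.01 * step for step in range(-30, 31)]
currents = [J_CORR * (math.exp(B * (u - E_CORR)) - math.exp(-B * (u - E_CORR)))
            for u in potentials]

j_fit, tafel_slope = tafel_extrapolation(potentials, currents, E_CORR)
print(f"j_corr = {j_fit:.2e} A m^-2, Tafel slope = {tafel_slope:.3f} V/decade")
```

Points close to the corrosion potential are excluded because there the opposing branch still contributes noticeably and the semilog plot is curved.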

Figure C2.8.3. A three-electrode electrochemical set-up used for the measurement of polarization curves. A 
potentiostat is used to control the potential between the working electrode and a standard reference electrode. The 
current is measured and adjusted between an inert counter-electrode (typically Pt) and the working electrode. 

The corrosion current can be converted into material loss ($m_{\mathrm{corr}}$) using Faraday's law according to equation 
(C2.8.14): 

$m_{\mathrm{corr}} = (M/zF)\,j_{\mathrm{corr}}\,t$ (C2.8.14) 


where M is the molar mass of the metal, z is the charge number of the ion, F is the Faraday constant and t is the 
time. 

This, of course, assumes a 100% current efficiency regarding metal dissolution, i.e. no other competitive 
electrochemical reactions occur. 
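
As a numerical illustration of Faraday's law, the sketch below converts a corrosion current density into a mass loss per unit area and, using the density of the metal, into a penetration depth. The corrosion current density of 1 A m⁻² and the exposure time of one year are assumed values; the density of iron is used to express the loss as a thickness, a step that goes slightly beyond the equation in the text.

```python
F = 96485.33212            # Faraday constant, C mol^-1
SECONDS_PER_YEAR = 365 * 24 * 3600


def mass_loss(molar_mass, z, j_corr, time):
    """Mass loss per unit area, m_corr = (M / zF) j_corr t.

    molar_mass in g mol^-1, j_corr in A m^-2, time in s; result in g m^-2.
    """
    return molar_mass / (z * F) * j_corr * time


def penetration_depth(mass_per_area, density):
    """Uniform thickness loss in m for a mass loss in g m^-2 and density in g m^-3."""
    return mass_per_area / density


# Iron: M = 55.845 g/mol, z = 2, density 7.874 g/cm^3 = 7.874e6 g/m^3.
m_corr = mass_loss(55.845, 2, j_corr=1.0, time=SECONDS_PER_YEAR)
depth_mm = penetration_depth(m_corr, 7.874e6) * 1000
print(f"{m_corr:.0f} g m^-2 per year, {depth_mm:.2f} mm per year")
```

With these assumptions a corrosion current density of 1 A m⁻² corresponds to roughly 1 mm of iron lost per year, which gives a feeling for the magnitudes involved.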

It should be pointed out that external polarization differs from the unbiased (open circuit) case in that after 
application of, say, an anodic voltage only the oxidation reaction takes place on the metal, whereas the cathodic 
reaction (H⁺ → H₂) occurs at the external counter-electrode. 

Other techniques to determine the corrosion rate use an AC approach (electrochemical impedance spectroscopy)
instead of DC biasing. From the impedance spectra, the polarization resistance (R_p) of the system can be
determined. The polarization resistance is inversely proportional to j_corr. An advantage of an AC method is that a
small AC amplitude applied to a sample at the corrosion potential essentially does not remove the system from
equilibrium.
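
The inverse proportionality between R_p and j_corr can be made concrete with the Stern-Geary relation,
j_corr = B/R_p, where B = b_a b_c / [2.303 (b_a + b_c)]. This relation is not given in the text above; it is the
standard textbook form and is used here as an assumption, with b_a and b_c denoting the anodic and cathodic Tafel
slopes. The function names and numerical values are illustrative.

```python
import math


def stern_geary_constant(b_a, b_c):
    """B = b_a * b_c / (ln(10) * (b_a + b_c)); Tafel slopes in V/decade, B in V."""
    return b_a * b_c / (math.log(10) * (b_a + b_c))


def corrosion_current_density(r_p, b_a, b_c):
    """j_corr = B / R_p, with R_p in ohm cm^2 and the result in A/cm^2."""
    return stern_geary_constant(b_a, b_c) / r_p


# Illustrative values: both Tafel slopes 120 mV/decade, R_p = 10 kilo-ohm cm^2.
B = stern_geary_constant(0.120, 0.120)
j = corrosion_current_density(1.0e4, 0.120, 0.120)
print(f"B = {B * 1000:.1f} mV, j_corr = {j:.2e} A/cm^2")
```

If the Tafel slopes are not known, B is often simply estimated, so the absolute value of j_corr obtained in this way is
less reliable than its relative change between measurements.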

(B) DIFFUSION CONTROL (SEE ALSO SECTION C2.11 OF THIS BOOK) 


Electrochemical processes can become diffusion controlled if the formation of the activated complex is fast
compared with the diffusion of the reacting anion to the surface or of dissolving cations from the surface. In aqueous
solutions, diffusion control of uniform corrosion is frequently encountered when the cathodic reaction depends on
the supply of O₂(g), which is only sparingly soluble in water and is therefore present only in small concentrations.

Under diffusion-controlled conditions the reaction rate then depends only on the supply of O₂(g) to the surface,
which is determined by Fick's law:

\[ \mathrm{flux} = D\,\frac{dN}{dx} \]    (C2.8.15)

where D is the diffusion coefficient and dN/dx is the particle (e.g. O₂) concentration gradient within the Nernstian
diffusion layer.

Therefore, in the limiting case, in which the surface concentration of the reacting species is zero because all the
arriving ions react immediately, the current density becomes voltage independent and depends only on diffusion:
specifically, on the width of the Nernstian diffusion layer, δ, and of course on the diffusion coefficient and the bulk
concentration of anions (c). The limiting current density (j_L) is then given by

\[ j_L = \frac{zFDc}{\delta} \]    (C2.8.16)


The diffusion layer width is very much dependent on the degree of agitation of the electrolyte. Thus, via the
parameter δ, the hydrodynamics of the solution can be considered. Experimentally, defined hydrodynamic
conditions are achieved by rotating cylinder, disc or ring-disc electrodes, for which analytical solutions of the
diffusion equation are available [37, 41, 42 and 43].
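
A short numerical sketch shows how the limiting current density follows from the diffusion layer width. The first
function evaluates j_L = zFDc/δ. The second uses the Levich expression for the diffusion layer thickness at a
rotating disc, δ = 1.61 D^(1/3) ν^(1/6) ω^(-1/2), which is a standard result quoted here as an assumption and is not
taken from the text above. The parameter values for dissolved oxygen are rough, illustrative numbers.

```python
import math

FARADAY = 96485.332  # Faraday constant, C/mol


def limiting_current_density(z, diff_coeff, conc, delta):
    """j_L = z F D c / delta, with D in cm^2/s, c in mol/cm^3, delta in cm; result in A/cm^2."""
    return z * FARADAY * diff_coeff * conc / delta


def levich_layer_thickness(diff_coeff, viscosity, omega):
    """Diffusion layer thickness (cm) at a rotating disc electrode.

    diff_coeff: diffusion coefficient D (cm^2/s)
    viscosity:  kinematic viscosity nu (cm^2/s)
    omega:      angular rotation rate (rad/s)
    """
    return 1.61 * diff_coeff ** (1 / 3) * viscosity ** (1 / 6) * omega ** -0.5


# Illustrative case: O2 reduction (z = 4) in water at 1000 revolutions per minute.
omega = 2 * math.pi * 1000 / 60
delta = levich_layer_thickness(diff_coeff=2e-5, viscosity=0.01, omega=omega)
j_lim = limiting_current_density(z=4, diff_coeff=2e-5, conc=2.5e-7, delta=delta)
print(f"delta = {delta * 1e4:.1f} micrometres, j_L = {j_lim * 1e3:.2f} mA/cm^2")
```

Because δ scales with ω^(-1/2), quadrupling the rotation rate halves the diffusion layer width and doubles j_L, which is
the usual experimental test for diffusion control.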

In the polarization curve of figure C2.8.4 (solid line), the two regimes, activation control and diffusion control, are
shown schematically. The anodic and cathodic plateau regions at high anodic and cathodic voltages, respectively,
indicate diffusion control; the current is independent of the applied voltage and j_L is reached.

Figure C2.8.4. The solid line shows a typical semilogarithmic polarization curve (log j against U) for an active
electrode. Different stages of reaction control are shown in the anodic and cathodic regimes: the linear slope
according to an exponential law indicates activation control; at high anodic and cathodic potentials the current
becomes independent of applied voltage, indicating diffusion control.


It is worth noting that under activation control the reaction rate depends on crystal orientation as the strength of the
bonds to the rest of the lattice, as well as the number of available bonds, directly influences the activation energy
needed to create the activated complex. For instance, the kinetics of the dissolution of a Si(111) plane is much
slower than for Si(100) due to stronger backbonding of an Si(111) surface atom. (An Si(111) surface atom is 'attached'
by three backbonds to the lattice and has only one available (dangling) bond sticking out into the electrolyte,
whereas an Si(100) surface atom has two backbonds and two dangling bonds.)

Under diffusion-controlled dissolution conditions (in the anodic direction) the crystal orientation has no influence 
on the reaction rate as only the mass transport conditions in the solution determine the process. In other words, the 
material is removed uniformly and electropolishing of the surface takes place.

C2.8.3 OXIDE FORMATION AND PASSIVITY [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54 AND 55]

C2.8.3.1 PASSIVITY 

(A) PASSIVE FILM FORMATION 

In terms of an electrochemical treatment, passivation of a surface represents a significant deviation from ideal
electrode behaviour. As mentioned above, for a metal immersed in an electrolyte, the conditions can be such that,
as predicted by the Pourbaix diagram, formation of a second-phase film (usually an insoluble surface oxide film) is
favoured over dissolution (solvation) of the oxidized metal cation. Depending on the quality of the oxide film, the
formation of a surface layer can retard further dissolution and virtually stop it after some time. Such surface layers
are called passive films. This type of film provides the comparatively high chemical stability of many important
construction materials such as aluminium or stainless steels.

Highly protective layers can also form in gaseous environments at ambient temperatures by a redox reaction similar 
to that in an aqueous electrolyte, i.e. by oxygen reduction combined with metal oxidation. The thickness of 
spontaneously formed oxide films is typically in the range of 1-3 nm, i.e., of similar thickness to electrochemical 
passive films. Substantially thicker anodic films can be formed on so-called valve metals (Ti, Ta, Zr, . . .), which 
allow the application of anodizing potentials (high electric fields) without dielectric breakdown. 


Passivation is manifested in a polarization curve (figure C2.8.4, dashed line) by a dramatic decrease in current at a
particular onset potential (the passivation potential, U_p). The corrosion reaction rate, i.e. the anodic current
density, is lowered by several orders of magnitude.


The value and existence of a passivation potential U_p are based on the thermodynamics of oxide formation.
Accordingly, passivation potentials and conditions for oxide formation can be predicted from a Pourbaix diagram
(figure C2.8.1). A polarization curve as in figure C2.8.4 can be perceived as reflecting a cross section through the
Pourbaix diagram at a fixed pH. For example, at pH 7 one crosses, moving from cathodic to anodic potentials, first
the active metal dissolution line (I), then the passivation line (II) at U_p.

It is generally believed that in the first stage of a passivation reaction, just below U_p, a precursor film is formed
(e.g. a thin hydroxide layer), which then facilitates subsequent oxide formation.

The actual chemical mechanism of oxide formation has to address several factors, as shown schematically in figure
C2.8.5. Although in essence very similar, two slightly different mechanisms are distinguished: figure C2.8.5(a)
represents the growth of an oxide under open-circuit conditions, i.e. a piece of iron immersed in a passivating
solution or exposed to an oxygen-containing environment. Figure C2.8.5(b) shows the situation under an
externally applied voltage in an electrolyte.

Figure C2.8.5. Growth of an oxide film on a metal surface. (a) In the absence of an externally applied potential:
metal oxidation (M → M²⁺) occurring at the inner interface is coupled with oxygen reduction (O₂ → O²⁻) at the
outer interface. For film growth one of the ionic species migrates predominantly; mass transfer is coupled with
electron transfer through the layer. (This situation also corresponds to high-temperature corrosion.) (b) In the
presence of an externally applied anodic potential (potentiostat): in addition to the mechanism of (a), film growth
can also take place without electron transfer through the film as oxygen reduction happens at the counter-electrode.
Mass transfer is not coupled with electron transfer.

In both cases, the anodic reaction occurs by oxidation at the metal/oxide interface:

\[ \mathrm{M} \rightarrow \mathrm{M}^{2+} + 2e^{-} \]    (C2.8.17)

For the situation in figure C2.8.5(a), the cathodic reaction is the reduction of O₂ at the oxide/gas or
oxide/electrolyte interface:

\[ \tfrac{1}{2}\,\mathrm{O}_2 + 2e^{-} \rightarrow \mathrm{O}^{2-} \]    (C2.8.18)


At least one of the ionic species has to diffuse or migrate through the oxide and accordingly the layer usually grows
either at the inner or the outer interface. (Alternatively, the transport of ions through the film can be formulated as
cation and anion vacancies moving through the lattice of the oxide film [44, 56, 57].) As the cathodic and anodic
reactions are spatially separated by the oxide, electrons also have to be transferred through the layer and, thus, the
conductance of the layer is essential to the process. Most oxides are semiconductors due to a non-equilibrated
stoichiometry and, thus, either a negatively or positively charged species has the freedom to migrate through the
lattice.

The driving force for migration is established by the different electrochemical potentials (ΔU) that exist at the two
interfaces of the oxide. In other words, the electrochemical potential at the outer interface is controlled by the
dominant redox species present in the electrolyte (e.g. O₂).

The situation in figure C2.8.5(b) is different in that, in addition to the mechanism in figure C2.8.5(a), reduction of
the redox species can occur at the counter-electrode. Thus, electron transfer through the layer may not be needed,
as film growth can occur with OH⁻ species present in the electrolyte involving a (field-aided) deprotonation of the
film. The driving force is provided by the applied voltage, ΔU.

Quantitative approaches to describe the kinetics of film formation, i.e. the mechanistic derivation of growth laws for
film formation, date back to the work of Cabrera and Mott [58] and Vetter [59]. An essential point is that at low to
moderate temperatures pure diffusion of ionic species is very slow. Thus, film growth is controlled by the electric
field across the layer, which lowers the activation energy for ion or vacancy hopping (the so-called high-field
mechanism). This results in the so-called inverse logarithmic growth law:

\[ \frac{1}{x} = A - B \log t \]    (C2.8.19)

where x is the film thickness, t is the time and A and B are constants.

The growth according to this equation is self-limiting as the field strength F is lowered (at constant voltage) with an
increasing film thickness x:

\[ F(t) = \frac{\Delta U}{x(t)} \]    (C2.8.20)

In most practical cases (and at moderate voltages) the high-field growth law controls film growth only up to a
thickness of, say, 10 nm at most; beyond this thickness the field-strength effects become less important than film
growth due to diffusion of vacancies or ions.
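
The self-limiting character of the inverse logarithmic law can be illustrated with a few lines of code. The sketch
below evaluates equations (C2.8.19) and (C2.8.20); it assumes a base-10 logarithm, thicknesses in nm and time in s,
and the constants A and B are arbitrary illustrative values rather than data for any real system.

```python
import math


def film_thickness(t, a, b):
    """Film thickness x from the inverse logarithmic law 1/x = A - B*log10(t).

    t in s, a and b in 1/nm; the result is in nm. The law is only meaningful
    while A - B*log10(t) stays positive.
    """
    inv_x = a - b * math.log10(t)
    if inv_x <= 0:
        raise ValueError("A - B*log10(t) must be positive for the law to apply")
    return 1.0 / inv_x


def field_strength(delta_u, x):
    """Field strength F = delta_U / x, in V/nm for delta_u in V and x in nm."""
    return delta_u / x


A_CONST, B_CONST, DELTA_U = 1.0, 0.1, 1.0  # hypothetical constants (1/nm, 1/nm, V)
for t in (1.0, 1e2, 1e4):
    x = film_thickness(t, A_CONST, B_CONST)
    print(f"t = {t:8.0f} s  x = {x:.3f} nm  F = {field_strength(DELTA_U, x):.3f} V/nm")
```

Each hundredfold increase in time adds only a fraction of a nanometre, while the field across the film drops
accordingly, which is why films grown by this mechanism remain thin.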

The above rate law has been observed for many metals and alloys either anodically oxidized or exposed to 
oxidizing atmospheres at low to moderate temperatures — see e.g. [60]. It should be noted that a variety of different 
mechanisms of growth have been proposed (see e.g. [61, 62]) but they have in common that they result in either the 
inverse logarithmic or the direct logarithmic growth law. For many systems, the experimental data obtained up to 
now fit both growth laws equally well, and, hence, it is difficult to distinguish between them.

It should be mentioned that, in addition to metals, the passivation of semiconductors (particularly Si, GaAs and InP)
is also a subject of intense investigation. However, the goal is mostly not the suppression of corrosion but either the
formation of a dielectric layer that can be exploited for devices (MIS structures) or the minimization of interface
states (dangling bonds) on the semiconductor surface [63, 64].

(B) PROPERTIES OF PASSIVE FILMS 

The protective quality of the passive film is determined by the ion transfer through the film as well as the stability
of the film with respect to dissolution. The dissolution of passive oxide films can occur either chemically or
electrochemically. The latter case takes place if an oxidized or reduced component of the passive film is more
soluble in the electrolyte than the original component. An example of this is the oxidative dissolution of Cr₂O₃
films as CrO₄²⁻ [39, 65, 66].

From polarization curves the 'protectiveness' of a passive film in a certain environment can be estimated from the
passive current density in figure C2.8.4, which reflects the layer's resistance to ion transport through the film and
to chemical dissolution of the film. It is clear that a variety of factors can influence ion transport through the film,
such as the film's chemical composition, structure, number of grain boundaries and the extent of flaws and pores.
The protectiveness and stability of passive films have, for instance, been explained by percolation arguments [67, 68],
structural arguments [69], ion/defect mobility [56, 57] and charge distribution [70, 71].

To illustrate some of the different approaches, let us consider passive films grown on Fe-Cr alloys. It has been
established since 1911 [72] that an increase of Cr in the alloy increases the stability of the oxide film against
dissolution.

The percolation argument is based on the idea that with an increasing Cr content an insoluble interlinked chromium 
oxide network can form which is also protective by embedding the otherwise soluble iron oxide species. As the 
threshold composition for a high stability of the oxide film is strongly influenced by solution chemistry and is 
different for different dissolution reactions [73], a comprehensive model, however, cannot be based solely on 
geometrical considerations but has in addition to consider the dissolution chemistry in a concrete way. 

Other authors have attributed the improved corrosion resistance with increasing Cr content to the increasing
tendency of the oxide to become more disordered [69]. This would then suggest that an amorphous oxide film is
more protective than a crystalline one, due to the bond and structural flexibility of amorphous films.

This example illustrates that exact information on the chemistry and structure of the passive film is necessary to 
clarify the mechanisms relevant to stability and protectiveness of passive films.

The nature of passive oxide films on many technologically important metals and alloys has been the subject of
investigation for many years. Ex situ surface analytical techniques such as x-ray photoelectron spectroscopy (XPS),
Auger electron spectroscopy (AES) and secondary ion mass spectrometry (SIMS) provide useful information on
the chemical composition and thickness of the films. Good agreement exists regarding a qualitative description of
the chemistry of passive films on many metals. However, due to differences in either experimental approach or data
analysis, slightly different views can be found on the more detailed nature of the different films. Generally, it is
important to note that the passive film, once formed, should not be considered as a rigid layer, but instead as a
system in dynamic equilibrium between film dissolution and growth. In other words, the passive film can adjust its
composition and thickness to changing environmental factors. In principle, the chemical composition and the
thickness of electrochemically formed passive films depend (apart from the base metal) on the passivation
potential, time, electrolyte composition and temperature, i.e. on all passivation parameters and, hence, a detailed
treatment is beyond the scope of this chapter. For further relevant literature the reader is referred to e.g. [74, 75, 76,
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88 and 89] and references therein.

The question of the structure of the passive film has been tackled by many research groups. Methods used to 
investigate the structure include x-ray scattering, diffraction and Mössbauer spectroscopy. For thick anodic oxide
films or thick oxide films grown at elevated temperatures, the structure can be assessed by x-ray diffraction 
techniques. However, for thin passive films formed at low to moderate temperatures, the thickness of the films is 
usually less than 10 nm and, hence, it is experimentally difficult to investigate the structure by traditional x-ray 
diffraction. Another question often asked is whether the structure of a thin, mostly hydrated passive film formed 
under electrochemical conditions may change as it is removed from the conditions under which it was formed. 
Therefore, lately, new in situ techniques (STM, x-ray scattering using synchrotron radiation, EXAFS) to study the 
structure of thin oxide films have attracted considerable interest. In the case of the passive film on Fe, for instance, 
it could be shown with in situ STM [90] as well as with in situ x-ray scattering [91] that the passive film has a
crystalline structure. Up to now, however, these investigations have been extended to only a few metals and, hence, 
the question of the structure of passive films remains to be investigated further. 


As outlined above, electron transfer through the passive film can also be crucial for passivation and thus for the 
corrosion behaviour of a metal. Therefore, interest has grown in studies of the electronic properties of passive 
films. Many passive films are of a semiconductive nature [92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102 and 103]
and therefore can be investigated with techniques borrowed from semiconductor electrochemistry, most typically
photoelectrochemistry and capacitance measurements of the Mott-Schottky type [104]. Generally it is found that
many passive films cannot be described as ideal but rather as amorphous or highly defective semiconductors which
often exhibit doping levels close to degeneracy [105].

(C) PASSIVITY BREAKDOWN AND LOCALIZED CORROSION 

The passive state of a metal can, under certain circumstances, be prone to localized instabilities. Most investigated
is the case of localized dissolution events on oxide-passivated surfaces [51, 106, 107, 108, 109, 110, 111, 112, 113,
114, 115, 116, 117 and 118]. The essence of localized corrosion is that distinct anodic sites on the surface can be
identified where the metal oxidation reaction (e.g. Fe → Fe²⁺ + 2e⁻) dominates, surrounded by a cathodic zone
where the reduction reaction takes place (e.g. 2H⁺ + 2e⁻ → H₂). The result is the formation of an active pit in the
metal, an example of which is illustrated in figure C2.8.6(a) and (b).

Figure C2.8.6. (a) SEM image of an early stage of pit formation on AISI304 steel in chloride-containing solution.
(b) Schematic cross section through an actively growing pit. Metal oxidation occurs at the pit base and the
corresponding reduction on the passive film surrounding the pit. Acidification and an increase of the halogen ion
concentration (due to migration) within the pit additionally accelerate dissolution. (c) Polarization curve of a
passive metal showing localized breakdown of passivity and pit growth at U_pit (dashed line). The solid line
represents a polarization curve of the same material in the absence of aggressive (pit-triggering) anions. In this case
the current increase at higher anodic voltages indicates either transpassive oxide film dissolution or the onset of
oxygen evolution at the anodically polarized electrode.

Pitting occurs with many metals in halide-containing solutions. Typical examples of metallic materials prone to
pitting corrosion are Fe, stainless steels and Al. The process is autocatalytic, i.e. by initial dissolution, conditions
are established which further stimulate dissolution: inside the pit the metal (Fe in the example of figure C2.8.6)
dissolves.

The M²⁺ species form an aquo-complex:

\[ \mathrm{M}^{2+} + 6\,\mathrm{H_2O} \rightarrow \mathrm{M(H_2O)_6^{2+}} \]    (C2.8.21)

To maintain charge neutrality, additional halide ions (Cl⁻ in our example) have to migrate into the pit, thus
increasing the local chloride concentration, and a chloro-complex is formed:

\[ \mathrm{M(H_2O)_6^{2+}} + \mathrm{Cl^-} \rightarrow \mathrm{M(H_2O)_5Cl^{+}} + \mathrm{H_2O} \]    (C2.8.22)

The chloro-complex is in equilibrium with its hydroxo-chloro-complex and H⁺:

\[ \mathrm{M(H_2O)_5Cl^{+}} \rightleftharpoons \mathrm{M(H_2O)_4(OH)Cl} + \mathrm{H^+} \]    (C2.8.23)

For many metals the equilibrium lies strongly to the right-hand side. Thus, within the pit the chloride concentration
and the H⁺ concentration both increase, further accelerating metal dissolution.

Generally, the following two stages of pitting are distinguished: pit initiation and pit growth. The reasons for the
initiation of pits at distinct surface locations are manifold and can be either deterministic or stochastic in nature.
They can be ascribed to bulk metal inhomogeneities (inclusions, precipitates, grain boundaries, dislocations etc.) or
to properties of the passive film (thermally induced stochastic film rupture, electrostriction, local composition or
structure variations). Initiation mechanisms assigning the key role to the passive film involve Cl⁻ penetration, local
film thinning, or vacancy condensation; mechanisms focusing on the bulk metal ascribe the key role to preferential
dissolution at inhomogeneities.

In an electrochemical polarization experiment on a passive system the onset of localized dissolution can be
detected by a steep current increase at a very distinct anodic potential (the pitting potential, U_pit); see figure
C2.8.6(c). This increase occurs far below either transpassive dissolution (oxide film dissolution due to the
formation of soluble higher oxidation states, e.g. Cr₂O₃ → Cr(VI)(aq)) or the occurrence of oxygen evolution
(OH⁻ → O₂).

In the potential range anodic to U_pit, stable pit growth occurs. The value of U_pit is shifted to lower anodic potentials
with increasing temperature, increasing Cl⁻ concentration and decreasing pH, and is dependent on the presence of
other anions in the electrolyte.

From an electrochemical viewpoint, stable pit growth is maintained as long as the local environment within the pit
keeps the pit under active conditions. Thus, the effective potential at the pit base must be less anodic than the
passivation potential (U_p) of the metal in the pit electrolyte. This may require the presence of voltage-drop
(IR-drop) elements. In this respect the most important factor appears to be the formation of a salt film at the pit base.
(The salt film forms because the solubility limit of e.g. FeCl₂ is exceeded in the vicinity of the dissolving surface in
the highly Cl⁻-concentrated electrolyte.)

In the potential range cathodic to U_pit one frequently observes so-called metastable pitting. A number of pit growth
events are initiated, but the pits immediately repassivate (an oxide film is formed in the pit) because the conditions
within the pit are such that no stable pit growth can be maintained. This results in a polarization curve with strong
current oscillations at U < U_pit.

Another type of localized corrosion closely related to pitting corrosion is crevice corrosion. This type of attack 
occurs preferentially in regions on the metal surface where mass transfer is limited (e.g., in narrow crevices or 
under deposits) and, hence, an increase in concentration of aggressive species (halides), combined with a pH 
decrease as discussed above and depletion of oxygen, can rapidly lead to activation of the surface in the crevice 
area. Metals which are susceptible to pitting corrosion also suffer from crevice corrosion. The presence of crevices
on the surface often triggers localized corrosion even under conditions where stable pitting would not take place
(e.g. at lower concentrations of aggressive halides).

In all cases of localized corrosion, the ratio of the cathodic to the anodic area plays a major role in the localized 
dissolution rate. A large cathodic area provides high cathodic currents and, due to electroneutrality requirements, 
the small anodic area must provide a high anodic current. Hence, the local current density, i.e., local corrosion rate, 
becomes higher with a larger cathode/anode-ratio. 
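
The area-ratio argument amounts to a simple charge balance, j_a A_a = j_c A_c. The following minimal sketch
evaluates it; the function name and the numerical values are hypothetical and serve only to show the scaling.

```python
def anodic_current_density(j_cathodic, area_cathode, area_anode):
    """Local anodic current density from the charge balance j_a * A_a = j_c * A_c.

    j_cathodic in A/cm^2, both areas in cm^2; the result is in A/cm^2.
    """
    return j_cathodic * area_cathode / area_anode


# Illustrative case: a 100 cm^2 passive (cathodic) surface feeding a 0.01 cm^2 pit.
j_pit = anodic_current_density(j_cathodic=1e-6, area_cathode=100.0, area_anode=0.01)
print(f"local anodic current density = {j_pit:.1e} A/cm^2")
```

A cathodic current density of only 1 µA/cm² on the surrounding surface thus corresponds to a local dissolution rate
four orders of magnitude higher at the small anode.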

Localized corrosion is far more treacherous in nature and far less readily predictable and controllable than uniform 
corrosion and it is, moreover, capable of leading to unexpected damage with disastrous consequences, especially 
since inspection of corrosion damage is in many cases difficult. 

Recently, the phenomenon of localized dissolution has attracted a great deal of interest in the field of
semiconductor technology. This is due to the discovery of visible light emission from porous Si [119], which is
formed by an electrochemical treatment of a Si surface in an HF-containing electrolyte [120]. It is interesting to
note that the formation process is in many respects similar to pitting of metals [118] and that preferential triggering
of the formation process at defects can be exploited to form highly defined localized dissolution [121].

C2.8.3.2 HIGH-TEMPERATURE OXIDATION AND CORROSION [122, 123, 124, 125, 126 AND 127]

So far, discussions have been limited to oxide film growth at low temperatures, where the model of Cabrera and 
Mott [58] usually applies. Oxide growth, controlled by the electric field across the film, follows an inverse
logarithmic growth law (equation (C2.8.19)). At elevated temperature, scales can grow much thicker in water, air
or oxygen, for example, or other more aggressive gases containing sulphur or chlorine. Mechanistically, the 
processes are similar to the passivation discussed earlier in terms of oxidation, reduction, ion transport and electron 
transfer, as outlined in figure C2.8.5 (a). The main difference is that elevated temperatures promote ionic diffusion 
and, thus, oxide formation can proceed to a much greater extent than at low temperatures where only thin layers are 
formed by the high-field mechanism. The most common growth law observed at higher temperatures is the so- 
called parabolic rate law [ 128 ]: 


\[ x^2 = k_p t + C \]    (C2.8.24)

where the rate of oxidation dx/dt is inversely proportional to the oxide thickness, x, and k_p is the parabolic rate
constant. This indicates that a thermal diffusion process is rate controlling, with oxygen or cations or both diffusing
through a compact layer (figure C2.8.5(a)). Protective oxides are formed in this manner, the most important in
practice being Cr₂O₃, Al₂O₃ and SiO₂. SiO₂ layers are particularly protective, but it is difficult to form continuous
silica layers on silicon-containing steels. It is, however, straightforward to produce SiO₂ on silicon single crystals,
and the ability of silicon to form high-quality (amorphous) thermal oxide is to a large extent responsible for its
successful application in MOS (metal/oxide/semiconductor) technologies [129].


The rate of diffusion through an oxide film depends on a number of factors, such as the temperature, oxygen partial 
pressure and structure of the oxide. At high temperatures (>0.7 of the melting point of the metal) lattice diffusion 
dominates through the crystalline oxide formed on a metal. However, at moderate temperatures diffusion via oxide 
grain boundaries is predominant. In this case, the rate of oxidation of a metal or alloy depends on the oxide grain 
size, which is often dictated by substrate grain orientation, surface pretreatment etc. [130]. Deviation from parabolic
oxidation behaviour is often observed and can be the result of the oxide grain size changing with time at a 
particular temperature. In this case, the number of oxide grain boundary 'easy diffusion paths' decreases with time, 
causing an apparent decrease in oxidation rate. If true parabolic behaviour is observed, then the change in oxidation
rate with temperature will follow an Arrhenius equation,

\[ k_p = A\, e^{-\Delta E / RT} \]

where R is the gas constant. The logarithm of the parabolic rate constant, ln k_p, is then linear in 1/T, where T is
the absolute temperature. The slope of such a plot, -ΔE/R, yields the activation energy, ΔE, for oxidation.
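
The parabolic law and the Arrhenius analysis of k_p can be sketched as follows. The example generates synthetic rate
constants from assumed values of A and ΔE and recovers them by a straight-line fit of ln k_p against 1/T; the
function names and all numerical values are hypothetical and do not describe any particular metal.

```python
import math

R_GAS = 8.314462  # gas constant, J/(mol K)


def oxide_thickness(t, k_p, c=0.0):
    """Oxide thickness from the parabolic law x^2 = k_p * t + C (x in cm, k_p in cm^2/s)."""
    return math.sqrt(k_p * t + c)


def arrhenius_fit(temperatures, rate_constants):
    """Least-squares fit of ln k_p = ln A - dE / (R T); returns (A, dE in J/mol)."""
    xs = [1.0 / temp for temp in temperatures]
    ys = [math.log(k) for k in rate_constants]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # equals -dE / R
    intercept = mean_y - slope * mean_x     # equals ln A
    return math.exp(intercept), -slope * R_GAS


# Synthetic data generated from assumed values A = 1e-2 cm^2/s and dE = 150 kJ/mol.
A_TRUE, DE_TRUE = 1.0e-2, 150.0e3
temps = [1000.0, 1100.0, 1200.0, 1300.0]  # K
k_values = [A_TRUE * math.exp(-DE_TRUE / (R_GAS * temp)) for temp in temps]

a_fit, de_fit = arrhenius_fit(temps, k_values)
print(f"A = {a_fit:.3e} cm^2/s, dE = {de_fit / 1000:.1f} kJ/mol")
```

With experimental data, curvature in the ln k_p against 1/T plot is a sign that the transport mechanism changes
with temperature, for example from grain boundary to lattice diffusion as described above.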


If a compact film growing at a parabolic rate breaks down in some way, which results in a non-protective oxide 
layer, then the rate of reaction dramatically increases to one which is linear. This combination of parabolic and 
linear oxidation can be termed paralinear oxidation. If a non-protective, e.g. porous oxide, is formed from the start 
of oxidation, then the rate of oxidation will again be linear, as rapid transport of oxygen through the porous oxide 
layer to the metal surface occurs. Figure C2.8.7 shows the various growth laws. Parabolic behaviour is desirable 
whereas linear or 'breakaway' oxidation is often catastrophic for high-temperature materials. 


Figure C2.8.7. Principal oxide growth rate laws for low- and high-temperature oxidation: inverse logarithmic,
linear, paralinear and parabolic.

High-temperature protective oxides often fail as the result of the development of stress during growth, which
causes cracking and rupture of the scales, leading to much faster metal degradation. Mechanisms for intrinsic
growth stresses include the epitaxial relationship between the metal and scale [131], the volume change that occurs
when a metal is converted into oxide by oxidation (the Pilling-Bedworth ratio) [132], compositional changes in
either the metal or oxide and the influence of vacancies generated during oxidation [133]. Many discussions of the
mechanisms of growth stresses have been published, e.g. [134, 135].


C2.8.4 SUPPRESSION OF CORROSION 


C2.8.4.1 ELECTROCHEMICAL PROTECTION 

Based on the polarization curves of figure C2.8.4 there are several possibilities for reducing or suppressing the
corrosion reaction. The main idea behind every case is to shift the corroding anode potential away from E_corr. This
can be done in the following ways.

(i) Cathodic protection [136, 137 and 138]. By imposing a negative external voltage, the potential can be shifted
cathodic to the corrosion potential (or, better, negative to the oxidation potential E_redox of the metal/metal-ion
couple, to achieve thermodynamic stability).

Cathodic protection can also be achieved without the application of an external voltage by coupling with a less
noble metal. The coupled metal has an E_redox that is negative to that of the material to be protected and thus
becomes the anode, which corrodes. The anode is therefore termed sacrificial.

(ii) Anodic protection [139]. If the material exhibits passive behaviour (dotted line in figure C2.8.4), the potential
of the corroding material can be anodically shifted into the passive range. This can be achieved by imposing a
positive external voltage. The remaining corrosion current then depends on the quality of the passive film.

For practical applicability, several aspects have to be considered such as the anode material (sacrificial (e.g. zinc) 
or inert (e.g. Pt/Ti or graphite)), the conductivity of the medium and the current distribution. Cathodic protection is 
typically used for buried constructions (e.g. pipelines), off-shore structures or ship hulls. 
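
The selection rule for a sacrificial anode, an E_redox negative to that of the protected metal, can be written as a
simple filter. The standard electrode potentials below (in V against the standard hydrogen electrode) are common
textbook values and are not taken from the text above; in practice the ranking should be based on a galvanic series
measured in the actual medium, since passivating metals such as Al can behave as more noble than the standard
values suggest.

```python
# Standard electrode potentials in V versus SHE (textbook values, used here as assumptions).
STANDARD_POTENTIALS = {
    "Mg": -2.37,  # Mg2+/Mg
    "Al": -1.66,  # Al3+/Al
    "Zn": -0.76,  # Zn2+/Zn
    "Fe": -0.44,  # Fe2+/Fe
    "Cu": 0.34,   # Cu2+/Cu
}


def sacrificial_candidates(protected, potentials=STANDARD_POTENTIALS):
    """Return the metals whose E_redox is negative to that of the protected metal.

    The list is ordered from the least noble (most negative potential) upwards.
    """
    e_ref = potentials[protected]
    return sorted((m for m, e in potentials.items() if e < e_ref), key=potentials.get)


print(sacrificial_candidates("Fe"))
```

For iron the filter returns magnesium, aluminium and zinc, which matches the zinc example given above.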

C2.8.4.2 CHEMICAL INHIBITION [140, 141, 142, 143, 144 AND 145]

Corrosion suppression by inhibitors can be achieved by adding chemical species to the environment, which lead to 
a strong reduction of the dissolution rate. Depending on their specific action, corrosion inhibitors can be divided 
into the following groups. 

(i) Oxidizing inhibitors. The idea is essentially the same as with anodic protection. If the material shows a passive
range, the corrosion potential can be shifted into the passivity region by adding a further redox couple to the
electrolyte that possesses an E_redox at potentials in the passive regime (oxidative inhibition).

The effectiveness of anodic methods can vary considerably and is mainly determined by the protective nature of the
passive layer formed.

(ii) pH modifiers. The primary effect of pH modification can be deduced from the Pourbaix diagram (figure 
C2.8.1). The goal is to shift the pH into a region where, thermodynamically, the formation of a passive layer is 
favoured as opposed to active dissolution. Often pH buffers are used, which keep the pH stable and thus hamper 
acidification, as discussed in the localized corrosion section. 

(iii) Surface blockers. Type 1: the inhibiting molecules set up a geometrical barrier on the surface (mostly by 
adsorption) such as a variety of ionic organic molecules. The effectiveness is directly related to the surface 
coverage. The effect is a lowering of the anodic part of the polarization curve without changing the Tafel slope. 

Type 2: the inhibiting species takes part in the redox reaction, i.e. it is able to react at either cathodic or anodic 
surface sites to electroplate, precipitate or electropolymerize. Depending on its 'activation' potential, the inhibitor 
affects the polarization curve by lowering the anodic or cathodic Tafel slope. 

C2.8.4.3 SURFACE TREATMENTS [137, 146, 147] 

For corrosion protection a large number of different types of surface treatment and coating have been developed, 
ranging from inorganic enamel coatings to organic coatings. In the following, the two main types of coating are 
briefly discussed. 

(i) Paint and insulating coatings. Coating the surface with some impermeable layer, such as paint, is frequently 
used due to the ease of application. The protection mechanism is simply to provide a physical barrier against metal 
dissolution. Unfortunately, this protection can fail disastrously if the coating is defective. Therefore, in many 
practical applications, a combination of an insulating coating and cathodic protection is employed. 

(ii) Deposition of a less noble metal (mostly by galvanizing). The principle is again identical to cathodic protection. 
The coating has an E redox that is negative to the material to be protected and the layer serves as a sacrificial anode. 
Therefore, this type of coating is not sensitive to defects, pinholes or mechanical damage during service. A typical 
example is galvanized steel (Zn layer on steel). 

C2.8.4.4 PROTECTION AGAINST HIGH-TEMPERATURE CORROSION 

Increasing operating temperatures result in increasing rates of corrosion and protective coatings are used to enhance 
component performance. These coatings serve as effective diffusion barriers between the oxidizing environment 
and the base alloy. In practice, corrosion-resistant coatings usually produce protective oxide scales consisting of 
thermally formed Cr$_2$O$_3$, Al$_2$O$_3$ or SiO$_2$ films. The coating and the substrate should preferably have closely 
matched thermal expansion coefficients to prevent cracking during thermal cycling; the coating should also be able 
to withstand damage from impacts, erosion and abrasion. 

C2.8.5 BRIEF OVERVIEW OF OTHER SPECIFIC CASES OF CORROSION 

In the following, the most typical modes of corrosion — other than the above discussed uniform dissolution (active 
corrosion) and localized pitting and crevice corrosion (local active dissolution) — are briefly presented. 

The paragraphs below are arranged in alphabetical order and are intended only as a short reference. For readers 
interested in a particular topic a few references are given which serve as a link for further reading. Generally, it 
should be noted that the separation of the categories below is to a large extent based on historic evolution rather 
than physicochemical mechanisms. 

C2.8.5.1 ATMOSPHERIC CORROSION [148, 149, 150] 

Atmospheric corrosion results from a metal's ambient-temperature reaction, with the earth's atmosphere as the 
corrosive environment. Atmospheric corrosion is electrochemical in nature, but differs from corrosion in aqueous 
solutions in that the electrochemical reactions occur under very thin layers of electrolyte on the metal surface. This 
influences the amount of oxygen present on the metal surface, since diffusion of oxygen from the 
atmosphere/electrolyte solution interface to the solution/metal interface is rapid. Atmospheric corrosion rates of 
metals are strongly influenced by moisture, temperature and presence of contaminants (e.g., NaCl, SO$_2$, ...). 
Hence, significantly different resistances to atmospheric corrosion are observed depending on the geographical 
location, whether rural, urban or marine. 

C2.8.5.2 CONTACT CORROSION = GALVANIC CORROSION [151, 152, 153, 154, 155 AND 156] 

This type of corrosive attack occurs when dissimilar metals (i.e., metals with different electrode potentials) are in direct electrical 
contact in corrosive solutions or atmospheres. Under such conditions, enhanced corrosion of the less noble part of 
the bimetallic couple takes place, whereas the corrosion rate of the more noble part of the couple is reduced or even 
completely suppressed (as in the case of corrosion suppression by cathodic protection). The difference in corrosion 
potential of the components of the couple provides the driving force for the corrosion reaction. However, to 
determine the kinetics of galvanic corrosion, knowledge of the nature and kinetics of the cathodic reaction at the 
surface of the more noble metal as well as the nature and kinetics of the anodic reaction on the surface of the less 
noble metal is required. The nature and conductivity of the electrolyte solution determine the current and 
potential distribution: the larger the conductivity, the farther from the contact site the coupling action is 
experienced. A major factor in determining the danger of galvanic corrosion is the ratio of the area of the cathode 
and the anode. The higher the cathode area compared with the anode area, the larger the enhancement of 
dissolution of the less noble metal due to coupling. 
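
The area-ratio effect can be made concrete under one simplifying assumption that the text does not state: the cathodic reaction on the more noble metal runs at a fixed, uniform current density (for example a diffusion-limited oxygen reduction), and all of that current is supplied by dissolution of the less noble metal. The function and parameter names below are illustrative.

```python
def anodic_current_density(i_cathodic: float, area_cathode: float,
                           area_anode: float) -> float:
    """Anodic current density on the less noble metal of a galvanic couple.

    Illustrative model: uniform cathodic current density `i_cathodic` on the
    noble metal and charge balance between the electrodes,
    i_a * A_a = i_c * A_c.
    """
    if area_anode <= 0 or area_cathode < 0:
        raise ValueError("anode area must be positive, cathode area non-negative")
    return i_cathodic * area_cathode / area_anode


# Same cathodic current density (arbitrary units), opposite geometries.
small_anode = anodic_current_density(1.0, area_cathode=100.0, area_anode=1.0)
large_anode = anodic_current_density(1.0, area_cathode=1.0, area_anode=100.0)
print(small_anode, large_anode)
```

In this model a small anode coupled to a large cathode dissolves ten thousand times faster per unit area than in the reversed geometry, which is why small, less noble fasteners in large noble structures are the dangerous case.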

C2.8.5.3 CAVITATION (CORROSION) [157, 158, 159 AND 160] 

Cavitation damage is a form of deterioration associated with materials in rapidly moving liquid environments, due 
to collapse of cavities (or vapour bubbles) in the liquid at a solid-liquid interface, in the high-pressure regions of 
high flow. If the liquid in movement is corrosive towards the metal, the damage of the metal may be greatly 
increased (cavitation corrosion). 

C2.8.5.4 CORROSION FATIGUE [161, 162 AND 163] 

Corrosion fatigue is a type of failure (cracking) which occurs when a metal component is subjected to cyclic stress 
in a corrosive medium. In many cases, relatively mild environments (e.g., atmospheric moisture) can greatly 
enhance fatigue cracking without producing visible corrosion. 

C2.8.5.5 DEALLOYING, SELECTIVE CORROSION 

In certain alloys and under certain environmental conditions, selective removal of one metal (the most 
electrochemically active) can occur that results in a weakening of the strength of the component. The most 
common example is dezincification of brass [164, 165]. The residual copper lacks mechanical strength. 

Another case of selective corrosion is the graphitization of grey cast iron, resulting in preferential removal of the 
metallic constituent, leaving graphite. Here again the physical form of the casting is maintained, but it is devoid of 
any mechanical strength. 

C2.8.5.6 EROSION (CORROSION) [166, 167] 

Erosion is the deterioration of a surface by the abrasive action of solid particles in a liquid or gas, gas bubbles in a 
liquid, liquid droplets in a gas or due to (local) high-flow velocities. This type of attack is often accompanied by 
corrosion (erosion-corrosion). The most significant effect of a joint action of erosion and corrosion is the constant 
removal of protective films from a metal's surface. This can also be caused by liquid movement at high velocities, 
and will be particularly prone to occur if the solution contains solid particles that have an abrasive action. 

C2.8.5.7 FRETTING CORROSION [168] 

Fretting corrosion is a form of damage which occurs at the interface of two closely fitting surfaces when they are 
subject to slight oscillatory slip and joint corrosion action. Almost all materials are subject to fretting and, hence, its 
incidence in vibrating machinery is high. The damage is mostly of a localized form and any debris which is 
generated (mostly oxide) has some difficulty escaping from the rubbing zone, and this can lead to an increase in 
stress. 

C2.8.5.8 HYDROGEN EMBRITTLEMENT [169] 

A process resulting in a decrease in toughness or ductility of a metal due to absorption of hydrogen. This atomic 
hydrogen can result, for instance, from the cathodic corrosion reaction or from cathodic protection. 

C2.8.5.9 IMPINGEMENT ATTACK [158] 

Localized erosion-corrosion caused by turbulence or impinging flow at certain points of the surface. In the 
majority of cases of impingement attack, a geometrical feature of the system results in turbulence at one or more 
parts of the surface. 

C2.8.5.10 INTERGRANULAR CORROSION [170] 

Corrosion damage due to enhanced dissolution in or adjacent to the grain boundaries of a metal, due to composition 
gradients between the grain boundary area and the bulk metal. An example is the intergranular attack of stainless 
steels, which can be explained by a chromium depletion. In a specific temperature region, carbon diffuses to the 
grain boundaries and reacts with chromium to form chromium carbides, thereby depleting the adjacent areas of 
chromium. Since stainless steels depend on chromium for corrosion resistance, the grain boundary areas become 
less resistant to corrosion and more susceptible to localized attack. 

C2.8.5.11 MICROBIOLOGICALLY INDUCED CORROSION (MIC) [171, 172 AND 173] 

Corrosion associated with the action of micro-organisms present in the corrosion system. The organisms can enhance 
corrosion, for instance, by producing aggressive metabolites that render the environment corrosive, or by 
participating directly in the electrochemical 
reactions. In many cases microbial corrosion is closely associated with biofouling, which is caused by the activity 
of organisms that produce deposits on the metal surface. 

C2.8.5.12 STRAY-CURRENT CORROSION [138] 

Corrosion due to stray current: the metal is attacked at the point where the current leaves. Typically, this kind of 
damage is observed in buried structures in the vicinity of cathodic protection systems; DC stray current 
can also stem from railway traction sources. 

C2.8.5.13 STRESS CORROSION CRACKING (SCC) [174, 175, 176, 177, 178 AND 179] 

A process involving combined corrosion and straining of the metal due to residual or applied stresses. The 
occurrence of stress corrosion cracking is highly specific; only particular metal/environment systems will crack. 
The appearance of stress corrosion cracking may be either intergranular or transgranular in nature. 


C2.9 Tribology 

Andrew J Gellman 


C2.9.1 INTRODUCTION 

The term tribology translates literally into 'the study of rubbing'. In modern parlance this field is held to include 
four phenomena: adhesion, friction, lubrication and wear. For the most part these are phenomena that occur 
between pairs of solid surfaces in contact with one another or separated by a thin fluid film. Adhesion describes the 
resistance to separation of two surfaces in contact, while friction describes their tendency to resist shearing. 
Lubrication is the phenomenon of friction reduction by the presence of a fluid (or solid) film between two surfaces. 
Finally, wear describes the irreversible damage or deformation that occurs as a result of shearing or separation. 

Tribological phenomena have been known to mankind since prehistoric times when friction between wooden 
sticks was used to produce fire. The first historical record of tribology described the use of lubrication in the 
construction of Egyptian temples as far back as 2400 BC. The earliest scientific studies of tribological phenomena 
are attributed to da Vinci in the 15th century, Amontons in the 17th and Coulomb in the 18th. More recently, the 
application of modern methods of physics and chemistry to tribology is usually credited to Bowden and Tabor for 
work done in the post World War II era [1, 2]. In the past decade there has been a great deal of progress in the 
understanding of tribological phenomena. This progress has been catalysed by the development and application of 
a number of experimental and theoretical methods that allow study of tribological phenomena at an unprecedented 
level of detail [3, 4]. This effort has been further motivated by the development of several 'high tech' devices that 
have pushed the need for tribology into extreme conditions and environments. Examples include: the lubrication of 
ceramics for high temperature applications, lubrication of the surfaces of hard disks for data storage [5] and 
microelectromechanical systems (MEMS) [6]. These applications place unprecedented demands on the 
performance of tribological systems. Furthermore, there is increasing interest in solving tribological problems that 
arise in such diverse environments as the human body (joints), vacuum (satellite components) and the Earth's 
mantle (tectonic motion and earthquakes). 

The scope of this entry includes a description of tribological phenomena and the modern tools that are spurring 
developments in our understanding of tribology. The goal is to provide the reader with a basic understanding of the 
concepts, an understanding of their limitations and a perspective on the breadth and scope of phenomena that are 
included under the umbrella of tribology. 


C2.9.2 PHYSICAL DESCRIPTION OF TRIBOLOGICAL PHENOMENA 

A typical physical process involving friction occurs in three steps: the formation of a contact between two solid 
surfaces, the shearing of those surfaces and the separation of those surfaces ( figure C2.9.1 ). A device used to 
measure friction includes the elements shown in figure C2.9.1 and measures friction forces ($F$) and normal forces 
($N$) as extension (or compression) of springs during sliding. The phenomenological measure of friction is in terms 
of a friction coefficient given by 

\[ \mu = \frac{F}{N}. \]


The proportionality between normal and friction forces is observed for many systems but is not founded in any 
basic physics. Much work in the field of tribology has been devoted to rationalizing the implication that the friction 
coefficient does not depend upon the apparent contact area between the two solid surfaces. For this reason the field 
of contact mechanics has always been intimately linked to tribology. 
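
When friction forces are measured at several normal forces, the proportionality can be tested by fitting a straight line through the origin. The following is a minimal sketch, assuming paired readings from the two springs; the function name and the sample numbers are hypothetical and not taken from any measurement.

```python
def friction_coefficient(normal_forces, friction_forces):
    """Least-squares estimate of mu in F = mu * N (line through the origin)."""
    num = sum(n * f for n, f in zip(normal_forces, friction_forces))
    den = sum(n * n for n in normal_forces)
    if den == 0:
        raise ValueError("need at least one non-zero normal force")
    return num / den


# Hypothetical spring readings (arbitrary but consistent force units).
normal = [1.0, 2.0, 4.0]
friction = [0.5, 1.0, 2.0]
mu = friction_coefficient(normal, friction)
print(mu)
```

Systematic residuals of such a fit, for example a non-zero friction force extrapolated to zero load, indicate where the simple proportionality breaks down.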


Figure C2.9.1 Schematic representation of the steps involved in a tribological process: (a) contact between 
surfaces, (b) shearing under a constant normal force and (c) separation against adhesive forces. In the absence of 
gravity the normal ($N$) and friction ($F$) forces are measured by the extension or compression of the springs. 

C2.9.2.1 CONTACT OF SOLID SURFACES 

With very few exceptions, it is not possible to produce surfaces of materials that are atomically smooth across 
macroscopic length scales. The protrusions that exist on the surfaces of all common materials are called asperities. 
As two surfaces are brought together, the peaks of asperities come into contact first. If the two solids were 
nondeformable there would be at most three contact points spanning the entire apparent contact area and supporting 
the normal force. In reality, the normal forces between two solids cause both elastic and plastic deformation of the 
regions around the initial contact points such that the two solids come into contact across a finite area. 

Modelling of the true contact area between surfaces requires consideration of the deformation that occurs at the 
peaks of asperities as they come into contact with mating surfaces. Purely elastic contact between two solids was 
first described by H Hertz [7]. The Hertzian contact area ($A_H$) between a sphere of radius $r$ and a flat surface 
compressed under normal force $N$ is given by 

\[ A_H = \pi (r \kappa N)^{2/3}, \qquad \kappa = \frac{3}{4} \left( \frac{1 - \nu_1^2}{E_1} + \frac{1 - \nu_2^2}{E_2} \right), \]

where $E_1$, $E_2$ are the elastic moduli, and $\nu_1$, $\nu_2$ are the Poisson numbers for the sphere and the surface. The friction 
force during shearing is proportional to the contact area; however, under conditions of purely elastic Hertzian 
contact it cannot be proportional to the normal force. This is inconsistent with the empirical observation of a 
coefficient of friction. The most important correction to the Hertz expression is due to Johnson, Kendall and 
Roberts (JKR) who included the fact that the surface tensions of two solids will contribute to the contact area [8]. 
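The JKR expression for the contact area between a sphere and a flat surface is 

\[ A_{JKR} = \pi (r \kappa)^{2/3} \left[ N + 3 \pi \gamma r + \sqrt{6 \pi \gamma r N + (3 \pi \gamma r)^2} \right]^{2/3}. \]

In this expression $\kappa$ is the elastic constant of the Hertz expression and $\gamma = \gamma_1 + \gamma_2 - \gamma_{12}$, where $\gamma_1$ and $\gamma_2$ are the surface tensions of the sphere and flat, respectively, and 
$\gamma_{12}$ is the surface tension of the interface between them. The JKR expression recognizes the fact that even in the 
absence of a normal force ($N = 0$) surface tension will cause some elastic deformation of the surfaces producing a 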
finite contact area. This fact alone renders the concept of a coefficient of friction meaningless since it implies that 
there is some finite friction force between solids even under zero normal force. 

In reality most solids in contact under macroscopic loads undergo irreversible plastic deformation. This is caused 
by the fact that at high normal forces the stresses in the bulk of the solid below the contact points exceed the yield 
stress. Under these conditions the contact area expands until the integrated pressure across the contact area is equal 
to the normal force. Since the pressure is equal to the yield strength of the material, $\sigma$, the plastic contact area is 
given by 

\[ A_P = \frac{N}{\sigma}. \]

Thus, under conditions of plastic deformation the real area of contact is proportional to the normal force. If the 
shear force during sliding is proportional to that area, one has the condition that the shear force is proportional to 
the normal force, thus leading to the definition of a coefficient of friction. 
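
The three contact models can be compared numerically. The sketch below assumes the standard textbook forms of the Hertz and JKR sphere-on-flat results, with the elastic constant kappa = (3/4)[(1 - nu1^2)/E1 + (1 - nu2^2)/E2], together with the plastic limit of one yield strength per unit area. The function names and the material values in the example are illustrative assumptions, not data from the text.

```python
import math


def kappa(e1, nu1, e2, nu2):
    """Elastic constant kappa = (3/4) * [(1 - nu1^2)/E1 + (1 - nu2^2)/E2]."""
    return 0.75 * ((1.0 - nu1**2) / e1 + (1.0 - nu2**2) / e2)


def hertz_area(r, k, n):
    """Hertzian contact area A_H = pi * (r * kappa * N)^(2/3)."""
    return math.pi * (r * k * n) ** (2.0 / 3.0)


def jkr_area(r, k, n, gamma):
    """JKR contact area; reduces to the Hertz result when gamma = 0."""
    g = 3.0 * math.pi * gamma * r
    # 6*pi*gamma*r*N + (3*pi*gamma*r)^2 written in terms of g
    effective_load = n + g + math.sqrt(2.0 * g * n + g * g)
    return math.pi * (r * k * effective_load) ** (2.0 / 3.0)


def plastic_area(n, sigma):
    """Fully plastic contact area A_P = N / sigma (sigma = yield strength)."""
    return n / sigma


# Illustrative numbers (SI units): a 1 micrometre asperity, steel-like moduli.
k = kappa(200e9, 0.3, 200e9, 0.3)
for load in (0.0, 1e-6, 1e-3):
    print(load, hertz_area(1e-6, k, load), jkr_area(1e-6, k, load, 1.0),
          plastic_area(load, 1e9))
```

Only the plastic area is linear in the load; the Hertz area grows as the 2/3 power of the load, and the JKR area remains finite at zero load, which is the point made in the text about the coefficient of friction.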

Determining the contact area between two rough surfaces is much more difficult than the sphere-on-flat problem 
and depends upon the morphology of the surfaces [9]. One can show, for instance, that for certain distributions of 
asperity heights the contact can be completely elastic. However, for realistic morphologies and macroscopic normal 
forces, the contact region includes areas of both plastic and elastic contact with plastic contact dominating. 

C2.9.2.2 SHEAR OF SOLID SURFACES 


Sliding of two solid surfaces in contact is induced by the application of a shear force. As the shear force spring in 
figure C2.9.1 is stretched at a constant velocity, the shear force on the interface increases until sliding begins. This 
process is illustrated in figure C2.9.2(a) in which the shear force increases until a critical shear stress is reached. At 
that point sliding begins and the shear stress at the interface often drops to some constant value. The critical shear 
force ($F_c$) needed to induce sliding is often used to define a static coefficient of friction 

\[ \mu_s = \frac{F_c}{N}, \]

while the shear force during sliding defines a dynamic coefficient of friction 

\[ \mu_d = \frac{F}{N}. \]


A second type of dynamics is commonly observed during sliding: stick-slip motion. This is responsible for the 
generation of acoustic emission, e.g., the squealing of chalk on a blackboard or brakes in a car. The dynamics of 
stick-slip motion are illustrated in figure C2.9.2(b) which shows the shear stress at the interface going through 
periodic oscillations. It is important to point out that the origins of stick-slip behaviour lie in the dynamics of the 
entire system and not simply the properties of the interface. In the device of figure C2.9.1 stick-slip motion will 
always occur at low sliding speeds or for shear springs with low spring constants [10]. 
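
The system-level origin of stick-slip can be reproduced with a very simple spring-block model. The sketch below is an idealization under stated assumptions: a rigid block with Coulomb friction (static coefficient larger than the kinetic one), a linear spring whose free end is driven at constant positive velocity, and no viscous damping. Because damping is omitted, this model shows stick-slip at any drive speed and so cannot reproduce the transition to steady sliding. All parameter values are hypothetical.

```python
def simulate_stick_slip(mass=1.0, k=100.0, v_drive=0.01, normal=10.0,
                        mu_s=0.5, mu_k=0.3, dt=1e-3, t_end=60.0):
    """Spring-block model with Coulomb friction.

    The free end of a spring (constant k) moves at v_drive and drags a block
    of the given mass. Returns a list of (time, spring_force) samples.
    """
    x, vel, sticking = 0.0, 0.0, True
    samples = []
    for i in range(int(t_end / dt)):
        t = i * dt
        force = k * (v_drive * t - x)        # shear force exerted by the spring
        if sticking and force > mu_s * normal:
            sticking = False                 # static friction overcome: slip starts
        if not sticking:
            acc = (force - mu_k * normal) / mass
            vel += acc * dt                  # semi-implicit Euler step
            if vel <= 0.0:                   # block comes to rest: stick again
                vel, sticking = 0.0, True
            x += vel * dt
        samples.append((t, force))
    return samples


samples = simulate_stick_slip()
forces = [f for _, f in samples]
print(max(forces), min(f for t, f in samples if t > 6.0))
```

With these parameters the spring force climbs slowly to the static threshold, drops rapidly during each slip and then climbs again, giving the sawtooth of figure C2.9.2(b). Changing the spring constant or the block mass changes the period, although neither is a property of the interface.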

The role of solid and liquid lubricants at solid-solid interfaces is to reduce friction forces during sliding. Liquid 
lubrication is commonly described as occurring in three regimes that depend upon the normal forces and sliding 
speeds of two surfaces in contact. At low normal force and high sliding speeds liquid films can completely separate 
two surfaces, preventing solid-solid contact. Under these conditions, usually referred to as hydrodynamic 
lubrication, the frictional forces between two surfaces are determined by the rheological properties of the thin fluid 
film that separates them. As the normal force is increased and the sliding speed decreased the interface enters the 
boundary regime of lubrication. The surfaces have deformed under the high normal forces and are thought to be 
separated by monomolecular films of adsorbed molecules. These are typically surfactant-like species that are added 
to lubricant fluids for just this purpose. Under even higher normal forces the interface enters the extreme pressure 
regime during which direct solid-solid contact occurs. The surfaces are deformed even further and high rates of 
wear are observed exposing clean solid surfaces. Lubricant fluids usually contain extreme pressure additives which 
can react with clean exposed surfaces under high pressure and high temperature conditions to form thin solid films 
with low shear yield strengths. It is these thin solid films that provide lubrication. As implied by the discussion 
above, lubricant fluids are often very complicated mixtures containing as many as 10 or 20 additives, each of 
which serves a specific purpose in reducing friction and wear of solid surfaces in sliding contact [2]. 

Figure C2.9.2 Shear force versus time during (a) sliding and (b) stick-slip motion. The motion of the surface 
beneath the sliding block of figure C2.9.1 is at constant velocity. 

C2.9.2.3 SEPARATION OF SOLID SURFACES 

The separation of two surfaces in contact is resisted by adhesive forces. As the normal force is decreased, the 
contact regions pass from conditions of compressive to tensile stress. As revealed by JKR theory, surface tension 
alone is sufficient to ensure that there is a finite contact area between the two at zero normal force. One 
contribution to adhesion is the work that must be done to increase surface area during separation. If the surfaces 
have undergone plastic deformation, the contact area will be even greater at zero normal force than predicted by 
JKR theory. In reality, continued plastic deformation can occur during separation and also contributes to adhesive 
work. 

C2.9.2.4 ENERGY DISSIPATION MECHANISMS 

The friction between two surfaces in sliding contact often manifests itself experimentally as a force. At a 
fundamental level, however, it is more relevant to think of friction in terms of energy dissipation [11]. In a 
completely conservative world the lateral translation of one object over another does not require work. 
Nonetheless, in the presence of friction, work is done. If one considers two solids in contact as a thermodynamic 
system, then sliding is an adiabatic process in which work is done on the system through the shear spring in figure 
C2.9.1 increasing its internal energy. The increase in internal energy can be considered to take two forms: thermal 
energy and potential energy. The thermal energy manifests itself as an increase in temperature. The potential 
energy increase is in the form of structural or even chemical changes to the system resulting from sliding. 
Examples include changes in the surface area due to 
deformation, the creation of bulk defects in solids and changes in the composition of the system due to chemical 
reactions occurring during sliding or separation. 
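
The energy balance described above can be written compactly. The following is a sketch, assuming a constant friction force over a sliding distance $d$; the symbols $W$, $\Delta U$, $\Delta E_{\mathrm{th}}$, $\Delta E_{\mathrm{pot}}$, $d$, $m$ and $c$ are introduced here for illustration only.

```latex
% Adiabatic sliding: all work done through the shear spring raises the internal energy.
W = \int F \,\mathrm{d}x = \Delta U = \Delta E_{\mathrm{th}} + \Delta E_{\mathrm{pot}}
% Constant friction force F = \mu N over a sliding distance d:
W = \mu N d
% Upper bound on the temperature rise of a mass m with specific heat c,
% reached only if no energy is stored as structural or chemical change:
\Delta T \le \frac{\mu N d}{m c}
```

The inequality expresses that any energy stored as new surface area, defects or chemical change is not available to heat the solids.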

One of the primary goals of current research in the area of tribology is to understand how it is that the kinetic 
energy of a sliding object is converted into internal energy. These dissipation mechanisms determine the rate of 
energy flow from macroscopic motion into the microscopic modes of the system. Numerous mechanisms can be 
and have been suggested: sliding causes excitation of atomic motions at the interface; bulk deformation along slip 
planes leads to excitation of atomic motions and formation of defects; excitation of long wavelength motions 
of the solids can lead to acoustic emission; excitation of electronic motion in the solids can lead to friction due to 
electronic resistance. Understanding the relative importance of such energy dissipation mechanisms is one of the 
goals of tribological research. This is an extremely difficult challenge, since their relative importance depends upon 
the properties of the materials and the characteristics of the motions of the system. 


C2.9.3 MODERN METHODS OF TRIBOLOGY 

Progress in the understanding of tribology fundamentals has been accelerated over the past decade by the 
development of both experimental and theoretical tools for its study. While there have been many ideas put forward 
to describe the sources of friction, one of the primary impediments to progress has been a lack of reproducible 
measurements of friction under well defined conditions. It is possible to find tabulated lists of the coefficients of 
friction between various materials sometimes reported to three significant figures of accuracy. However, friction 
forces are sensitive to so many variables that it is not at all clear that these numbers are useful or even reproducible. 
As a simple example, friction between metals is influenced by adsorbed molecular films of a few monolayers 
thickness [12]. Without going to extraordinary lengths it is not possible to create perfectly clean metal surfaces and, 
as a result, the vast majority of friction measurements between metal surfaces are influenced by contaminant films. 
Even in the scientific literature it is difficult to find consistency among reported absolute values of friction between 
solids, undoubtedly because of lack of control over various characteristics of the sliding interface or the 
measurement mechanism itself. 

C2.9.3.1 ATOMIC FORCE MICROSCOPY 

The atomic force microscope (AFM) provides one approach to the measurement of friction in well defined systems. 
The AFM allows measurement of friction between a surface and a tip with a radius of the order of 5-10 nm (figure 
C2.9.3(a)). It is the true realization of a single asperity contact with a flat surface which, in its ultimate form, would 
measure friction between a single atom and a surface. The AFM allows friction measurements on surfaces that are 
well defined in terms of both composition and structure. It is limited by the fact that the characteristics of the tip 
itself are often poorly understood. It is very difficult to determine the radius, structure and composition of the tip; 
however, these limitations are being resolved. The AFM has already allowed the spatial resolution of friction forces 
that exhibit atomic periodicity and chemical specificity [3, 10, 13]. 


[Figure C2.9.3 panel labels: atomic force microscope (tip, surface); surface forces apparatus; quartz crystal microbalance (adsorbate, metal, quartz).] 

Figure C2.9.3 Schematic diagrams of the interfaces realized by (a) the atomic force microscope, (b) the surface 
forces apparatus and (c) the quartz crystal microbalance for achieving fundamental measurements of friction in 
well defined systems. 

C2.9.3.2 SURFACE FORCES APPARATUS 

The surface forces apparatus (SFA) measures forces between atomically flat surfaces of mica. Mica is the only 
material that can be prepared with surfaces that are atomically flat across square-millimetre areas. The SFA 
confines liquid films of a few molecular layers' thickness between two mica surfaces and then measures shear and 
normal forces between them (figure C2.9.3(b)). In essence, it measures the rheological properties of confined, 
ultra-thin fluid films. The SFA is limited to the use of mica or modified mica surfaces but can be used to study the 
properties of a wide range of fluids. It has provided experimental evidence for the formation of layered structures in 
fluids confined between surfaces and evidence for shear-induced freezing of confined liquids at temperatures far 
higher than their bulk freezing temperatures [14, 15]. 

C2.9.3.3 MOLECULAR DYNAMICS 

Molecular dynamics (MD) methods can be used to simulate tribological phenomena at a molecular level. These 
have been used primarily to simulate behaviour observed in AFM and SFA measurements. Such simulations are 
limited to short-timescale events, but provide a wealth of information and insight into tribological phenomena at a 
level of detail that cannot be realized by any experimental method. One of the most interesting contributions of 
molecular dynamics is the demonstration that many of the predictions of continuum mechanics, such as those of Hertz, hold true down 
to length scales of the order of 10 nm, in spite of the obvious atomic coarseness on these length scales. 
Investigations of confined fluids have revealed the layered structures observed experimentally by the SFA and 
shear-induced freezing/melting transitions that can occur during stick-slip motion [16, 17]. 


C2.9.3.4 QUARTZ CRYSTAL MICROBALANCE 


Quartz crystal microbalances (QCMs) have been used to measure friction between thin films of adsorbed gases and 
gold surfaces. These measure both the resonant frequency and $Q$-factor of a quartz crystal oscillator coated with 
metal (figure C2.9.3(c)). The frequency is determined by the total mass of the system and is sensitive to the 
presence of fractions of a monolayer of adsorbates. The $Q$ of the resonance serves to measure energy dissipation at 
the interface between the metal film and weakly adsorbed gases such as Ar, Xe and N$_2$. One of the most important 
contributions of this measurement has been the observation of electronic effects in the friction between adsorbed 
N$_2$ and a Pb surface. These electronic effects manifested themselves as discontinuities in the energy dissipation as 
the Pb was heated and cooled through its superconducting phase transition [4]. 
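
The mass sensitivity of the resonant frequency can be estimated with the Sauerbrey relation. The text does not state this relation; it is assumed here as the usual approximation for a thin, rigidly attached film on an AT-cut quartz resonator, and it does not describe the dissipation (Q-factor) channel. The constants are the commonly tabulated density and shear modulus of quartz, and the function name is hypothetical.

```python
import math

RHO_QUARTZ = 2648.0     # density of quartz, kg/m^3
MU_QUARTZ = 2.947e10    # shear modulus of AT-cut quartz, Pa


def sauerbrey_shift(f0_hz: float, areal_mass_kg_m2: float) -> float:
    """Frequency shift (Hz) of a quartz resonator loaded by a thin rigid film.

    Sauerbrey relation: delta_f = -2 * f0^2 * (m/A) / sqrt(rho_q * mu_q).
    """
    return -2.0 * f0_hz**2 * areal_mass_kg_m2 / math.sqrt(RHO_QUARTZ * MU_QUARTZ)


# 1 microgram per cm^2 (= 1e-5 kg/m^2) on a 5 MHz crystal.
shift = sauerbrey_shift(5e6, 1e-5)
print(round(shift, 1))
```

A shift of some tens of hertz per microgram per square centimetre on a megahertz resonance is readily resolved, which is why fractions of a monolayer are detectable.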

C2.9.3.5 UHV SURFACE SCIENCE METHODS 

Ultra-high vacuum (UHV) surface science methods allow preparation and characterization of perfectly clean, well 
ordered surfaces of single crystalline materials. By preparing pairs of such surfaces it is possible to form interfaces 
under highly controlled conditions. Furthermore, thin films of adsorbed species can be produced and characterized 
using a wide variety of methods. Surface science methods have been coupled with UHV measurements of 
macroscopic friction forces. Such measurements have demonstrated that adsorbate film thicknesses of a few 
monolayers are sufficient to lubricate metal surfaces [12, 18]. 


C2.9.4 OUTLOOK 

Tribological problems will continue to plague society and will become more problematic as technology forces the 
development of mechanical systems that operate in increasingly extreme environments. These problems cannot be 
solved indefinitely using empiricism and adaptation of existing methods. This is spurring development of 
experimental methods for study of tribological phenomena in highly defined systems and at the atomic level. As a 
result, there is a very encouraging outlook for increased understanding of the mechanisms of adhesion, friction, 
lubrication and wear. Furthermore, this will have long term impact on the understanding of many other areas in 
which tribology plays an important but often unnoticed role. These include biological problems such as the 
adhesion of cells or the lubrication of joints and geological problems such as plate tectonics and earthquakes.


C2.10 Surface electrochemistry

Hans-Henning Strehblow and Dirk Lutzenkirchen-Hecht 


C2.10.1 INTRODUCTION 

The structure and composition of the electrode/electrolyte interface are of great scientific and technological
interest. It is generally accepted that the adsorption of anions and cations from the solution is the initial step of
many important electrochemical processes such as oxide formation, pitting corrosion, electrocatalysis and metal or
semiconductor deposition. Classical electrochemical techniques, which are in principle based on potential and
current measurements, can provide a detailed description of the electrochemical interface. For example, the
kinetics of oxide or chloride film formation on several different metals have been investigated with high
accuracy, and different growth modes can easily be identified [1, 2 and 3]. In some cases, a microscopically
detailed picture of the electrode surface can also be derived. For example, each single crystal metal surface in a
well defined electrolyte shows a characteristic cyclic voltammogram, which can be used as a simple check of the
single crystal surface preparation (as an example see [4, 5 and 6]). This contrasts with the large experimental
effort required for a conventional surface structure analysis, e.g. with low energy electron diffraction (LEED)
or grazing incidence x-ray diffraction. For the underpotential deposition (upd) of copper on stepped platinum
surfaces, a clear influence of the copper coverage on the hydrogen adsorption reaction was found; even small
amounts of adsorbed Cu on the Pt surface are able to block the hydrogen adsorption reaction [7, 8]. Because
this reaction preferentially takes place at Pt step sites, one can directly deduce that the blocking of hydrogen
adsorption is a consequence of the selective adsorption of Cu at step sites [8].

In general, however, electrochemical methods cannot provide structural details of the electrode surface such as
the binding geometry or the valency of adsorbates. In addition, they can hardly address the question of surface
water, the charge distribution in the electrochemical double layer and the structure of the electrode and their
changes with potential. Although possible in principle, it is often very difficult to calculate precise values for the
coverage of adsorbates from electrochemical measurements, for example in the presence of coadsorbed species
(see, e.g., [9]) or if a partial charge transfer from the adsorbate to the electrode occurs (see, e.g., [10]). Therefore, a
large variety of techniques has been introduced for the investigation of electrochemical interfaces.
In situ techniques, such as ultraviolet (UV) and visible reflectance, infrared (IR) and Raman spectroscopy,
scanning tunnelling and atomic force microscopy (STM and AFM), surface x-ray scattering (SXS) and x-ray
absorption spectroscopy (XAS), as well as ex situ techniques, among them electron diffraction methods (LEED and
RHEED), x-ray and UV photoelectron spectroscopy (XPS and UPS), Auger electron spectroscopy (AES) and ion
scattering spectroscopy (ISS), have revealed a comprehensive picture of the electrode surface in contact with the
electrolyte, detailed on an atomic level, and have furthered our understanding of electrochemical processes in many
ways. In this contribution, we report on recent results obtained with in situ and ex situ techniques, with the
focus on adsorption phenomena and underpotential deposition.


C2.10.2 ADSORPTION

During the past decades, anionic and cationic adsorption on metal electrodes has been intensively investigated.
The electrosorption of halides (Cl⁻, Br⁻, I⁻) on noble metals (Au, Ag, Pt) has been a particular focus of interest.
Several different techniques have been successfully applied, such as electrochemical methods including the
electrochemical quartz microbalance technique and capacity measurements [3, 11], radiochemistry and radiotracer
experiments [12, 13], SXS [14, 15], electroreflectance spectroscopy [16, 17 and 18], second harmonic generation
(SHG) [19, 20], LEED and electron spectroscopies (XPS, UPS and AES) [21, 22, 23, 24 and 25], electrode
resistivity measurements [26, 27] and STM [6, 28, 29]. Well ordered phases which can be commensurate or
incommensurate with the underlying noble metal surface were found (e.g. [6, 29, 30, 31 and 32]); in some cases
several different adsorbate structures can be observed simultaneously (see, e.g., [29, 33]). As an example, results
obtained recently for the adsorption of bromine on Ag(100) with in situ SXS are presented in figure C2.10.1 [15].
The absence of translational invariance perpendicular to a single crystal surface causes scattering rods between the
three dimensional bulk reflections [34, 35]. SXS comprises the measurement of the intensity distribution of the
scattered x-rays in reciprocal space; comparison of the measured diffraction patterns with atomic models enables
the accurate determination of adsorbate structure parameters such as the adsorbate coverage and the nearest
neighbour bond distances and numbers. Because each element has a characteristic backscattering power, the type
of adsorbate can also be identified; this has been successfully used for the investigation of the structures of
surface water on silver [36].

Figure C2.10.1. Potential dependence of the scattering intensity of the (1,0) reflection measured in situ from
Ag(100)/0.05 M NaBr after a background correction (dots). The solid line represents the fit of the experimental data
with a two dimensional Ising model with a critical exponent of 1/8. Model structures derived from the experiments
are depicted in the insets for potentials below (left) and above (right) the critical potential (from [15]).

The Ag(100) surface is of special scientific interest, since it exhibits an order-disorder phase transition which is
predicted to be second order, similar to the two dimensional Ising model in magnetism [37]. In fact, the steep
intensity increase observed at potentials positive to about -0.76 V against Ag/AgCl for the (1,0) reflection, which is
forbidden by symmetry for the clean Ag(100) surface, can be associated with the development of an ordered
(√2 × √2)R45°-Br lattice, where the bromine is located in the fourfold hollow sites of the underlying fcc (100)
surface; this structure is depicted in the lower right inset in figure C2.10.1 [15].

Below the critical potential, the scattered intensity is zero. The experimental data are in good agreement
with the curve calculated using a critical exponent of 1/8 as predicted by the Ising model. The measurement of
further scattering peaks gives additional information about the position of the adsorbed bromine ions for potentials
cathodic to the critical potential. The determined positions are consistent with the assumption of lattice gas
adsorption at the fourfold sites, although with a large lateral displacement, especially at low coverages. The
observations can be explained by unbalanced Br-Br interactions and filling of the unoccupied area with surface
water [15]. While similar results were obtained for Cl⁻ on Ag(100) [15], no order-disorder transition but several
different commensurate and incommensurate surface structures were found for bromine adsorption on Au(100)
[38], although both surfaces are in the same universality class [37].
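
The shape of such a critical-exponent curve can be sketched as follows. The sketch assumes that the diffracted intensity scales as the square of the order parameter, so that it rises as a power law with exponent 2β (β = 1/8) above the critical potential and vanishes below it. Using the electrode potential directly as the reduced variable, the arbitrary amplitude and the function name are assumptions made for illustration; the fit in [15] may be parametrized differently.

```python
BETA = 1.0 / 8.0  # order-parameter exponent of the two dimensional Ising model


def ising_intensity(potential_v, critical_v=-0.76, amplitude=1.0):
    """Model intensity of the superstructure reflection.

    Zero at or below the critical potential; above it the intensity is
    amplitude * (E - E_c)**(2*BETA), i.e. the squared order parameter.
    """
    x = potential_v - critical_v
    if x <= 0.0:
        return 0.0
    return amplitude * x ** (2.0 * BETA)


# Illustrative scan across the critical potential (volts against Ag/AgCl).
for e in (-0.80, -0.76, -0.75, -0.70, -0.60):
    print(f"E = {e:+.2f} V  ->  I = {ising_intensity(e):.3f}")
```

Because the exponent 2β = 1/4 is small, the modelled intensity rises very steeply just above the critical potential and then flattens, which is the qualitative behaviour described above.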


Besides noble metal surfaces, less noble metals such as copper have also been investigated in halide-containing
electrolytes in recent years. In figure C2.10.2 the cyclic voltammogram of a (111)-oriented
Cu single crystal in 10 mM HCl is given together with two STM micrographs; the latter were recorded at potentials
below and above the distinct oxidation and reduction peaks, respectively [39].

Figure C2.10.2. Cyclic voltammogram of Cu(111)/10 mM HCl and in situ measured STM micrographs revealing
the bare Cu(111) surface (-1.05 V, left) and the (√3 × √3)R30°-Cl adsorbate superstructure (-0.6 V, right) (from
[39]).

Two clearly different surface structures are visible. Due to the lack of chemical information, however, it is very
difficult to decide whether these structures reflect the Cu(111) substrate itself or an ordered array of adsorbate
molecules, for the following reasons. Firstly, it is well known from the literature that a surface reconstruction, i.e. a
rearrangement or relaxation of the outer atomic layers, can be induced by a change of the electrode potential and
charge; the reconstruction and its lifting are in general accompanied by small oxidation and reduction current
densities (see, e.g., [40]). For example, the reconstruction observed during negative charging of Au(111) electrodes
has been explained by an increasing compressive stress induced by an increasing density of s-p electrons (see [10]
and references therein). Secondly, if the presence of an adsorbate is likely or expected, its nature cannot be clarified
by STM experiments alone. For example, the first ordered adsorbate structures found for Au(111) in sulfate
solutions were interpreted as a hydrogen bonded bisulfate layer [41], while later additional FTIR and
radiochemical studies proved the presence of adsorbed sulfate rather than bisulfate [42, 43]. In addition, the long
range ordered structure observed during the underpotential deposition of copper on gold was imaged by STM long
before its composition was known: because anions were believed to be invisible to STM, these structures were
ascribed to the metal deposit [44, 45 and 46], while it was later shown by XAS [47] and SXS [48] that they are
related to coadsorbed anions (see below). Furthermore, a surface reconstruction can also be affected by ionic
adsorption [14, 20, 28, 40, 49], which can additionally complicate the interpretation of STM micrographs. In the
study presented here, ex situ XPS and ion scattering experiments were performed after the controlled emersion of
the electrodes from the solution and their transfer to a UHV system in order to clarify the situation [39]. UHV
techniques such as LEED, XPS, UPS and ISS have been successfully applied to the ex situ
investigation of electrode surfaces since the construction of well suited transfer cells in the 1980s [50, 51, 52, 53
and 54]; a review of this topic was given by Kolb [24]. For Cu(111) in dilute HCl solutions, only very low Cl⁻
surface concentrations were found for potentials negative to -0.9 V against Hg/Hg₂SO₄ in the anodic scan direction
and below -1.0 V in the cathodic scan direction, while positive to these potentials strong Cl signals were
found with XPS and ISS [39]. Therefore, the STM investigations reveal the unreconstructed (1 × 1) Cu(111)
structure and the (√3 × √3)R30°-Cl adsorbate structure, respectively [39].


As a further example of the value of ex situ investigations of emersed electrodes with surface analytical
techniques, results obtained for the double layer on polycrystalline silver in alkaline solutions are presented in
figure C2.10.3. This system is of scientific interest, since thin silver oxide overlayers (thickness up to about 5 nm)
are formed at sufficiently anodic potentials, which implies that the adsorption of anions, cations and water can be
studied on the clean metal as well as on an oxide covered surface [55, 56]. For the latter situation, a changed
adsorption behaviour can be expected due to the semiconducting properties of Ag oxides; this effect is responsible,
e.g., for a significant enhancement of the catalytic activity for ethene epoxidation [57]. In addition, the oxide layer
can be doped with anions or cations, as found for copper in alkaline chloride solutions [58]. One great advantage of
XPS in this case is the possibility of determining surface concentrations of all double layer constituents; in
particular, the contributions of surface water, adsorbed hydroxyl ions and possible oxide signals can be fully
separated, as shown in figure C2.10.3(a) for an Ag electrode emersed from an alkaline perchlorate solution. The
surface concentrations of the adsorbed hydroxyl ions and surface water, as well as the oxide layer thickness
calculated from the corresponding O signals, are presented in figure C2.10.3(b) and figure C2.10.3(c). It should be
mentioned that oxide formation was found at potentials significantly lower than the Nernst potential of Ag₂O
formation [56].

Figure C2.10.3. Ex situ investigation of the electrochemical double layer on Ag after hydrophobic emersion from 1
M NaClO₄ + 0.1 M NaOH. (a) Peak deconvolution of the XPS O 1s signals after emersion at +0.2 V: a surface
oxide as well as OH⁻ and water/perchlorate contributions can be identified. (b) Potential dependence of the OH⁻
surface concentration. The inset depicts the oxide thickness t determined from the O²⁻ signals. The hatched area
visualizes the potential region where an underpotential Ag(I) oxide formation occurs. (c) Amount of adsorbed
water calculated from the H₂O/ClO₄⁻ XPS signals; the perchlorate contributions were determined from the Cl 2p
signals and subtracted accordingly [56]. The dashed horizontal line represents a monolayer coverage of H₂O.

In addition, the OH⁻ coadsorption significantly influences the adsorption of all other double layer components [55,
56]. For example, in chloride containing media, a significant reduction of the Cl⁻ surface concentration was found,
while the concentration of cations and surface water was significantly increased compared to the emersion from
acidic solutions [55]. Even the adsorption of the strongly binding iodine on silver is pH dependent [31]. The results
can be explained by a specific adsorption of OH⁻ on silver [55, 56, 59]; this interpretation is consistent with the
fact that appreciable amounts of adsorbed OH⁻ were also found for copper electrodes in alkaline solutions [60].
Due to the high bond strength between the OH⁻ ion and the Ag metal surface, OH⁻ is able to expel Cl⁻ from the
inner Helmholtz plane [55], while it is not able to displace Br⁻ [59]. Br⁻ adsorption itself, however, suppresses Ag
oxide formation [59].


The last example presented in this section deals with the pitting corrosion of Fe in ClO₄⁻ solutions. Perchlorate is
less well known as an aggressive ion but shows some unique and remarkable characteristics with regard to pitting
corrosion. For example, the critical pitting potential (1.46 V against a standard hydrogen electrode (SHE) for Fe/1
M NaClO₄) can be measured to within less than 4 mV [61], which is very unexpected compared to
other aggressive ions such as Cl⁻ or Br⁻. In concentrated HClO₄, a slightly lower value of 1.37 V was found [62]. In
figure C2.10.4 two Cl 2p XPS spectra obtained from an Fe electrode emersed from 1 M HClO₄ at potentials
cathodic and anodic to the pitting potential $E_p$ are compared. Only ClO₄⁻ species can be detected at a
binding energy of ~208 eV (Cl 2p3/2 peak) for the passive iron ($E < E_p$), while clear and intense Cl⁻ contributions
were found at about 198 eV binding energy for $E > E_p$ [62]. The anionic fraction of Cl⁻, i.e. the ratio of the Cl⁻
intensity to the sum of the chloride and perchlorate intensities,
$I_{\mathrm{Cl^-}}/(I_{\mathrm{Cl^-}} + I_{\mathrm{ClO_4^-}})$, is less than 5% for the
passive Fe sample, while a value of about 30% is found for the corroded sample [62]. The experiments clearly
suggest that perchlorate is decomposed at the oxide covered Fe surface at sufficiently anodic potentials, resulting in
Cl⁻ ions which cause breakdown of passivity and finally lead to localized corrosion. This observation is
remarkable, as a reduction of ClO₄⁻ is expected at more negative potentials for thermodynamic reasons.
Presumably its decomposition is caused by the high field of the electrical double layer [62].

Figure C2.10.4. XPS Cl 2p signals of an iron specimen emersed from 1 M HClO₄: (a) after passivation at 1 V
(SHE); (b) after 2 minutes pitting corrosion at 1.5 V (SHE). Contributions of ClO₄⁻ at ~208 eV and Cl⁻ at ~198 eV
are visible in different amounts.
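
The chloride fraction quoted above is a simple intensity ratio, which can be written out as a minimal sketch. The peak intensities used here are hypothetical numbers chosen only to reproduce the order of magnitude mentioned in the text; they are not data from [62], and the function name is illustrative. Both components are assumed to be background-corrected Cl 2p intensities, so that no relative sensitivity factor is needed.

```python
def chloride_fraction(i_chloride, i_perchlorate):
    """Return I(Cl-) / (I(Cl-) + I(ClO4-)) for two Cl 2p peak intensities."""
    total = i_chloride + i_perchlorate
    if total <= 0:
        raise ValueError("total Cl 2p intensity must be positive")
    return i_chloride / total


# Hypothetical integrated peak areas (arbitrary units).
passive = chloride_fraction(i_chloride=4.0, i_perchlorate=96.0)
corroded = chloride_fraction(i_chloride=30.0, i_perchlorate=70.0)
print(f"passive:  {passive:.0%}")
print(f"corroded: {corroded:.0%}")
```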


C2.10.3 METAL MONOLAYER DEPOSITION 

At potentials positive to the bulk metal deposition, a metal monolayer (or in some cases a bilayer) of one metal can
be electrodeposited on another metal surface; this phenomenon is referred to as underpotential deposition (upd) in
the literature. Many investigations of different metal adsorbate/substrate systems have been published to
date. In general, two different classes of surface structures can be distinguished: (a) simple superstructures with
small packing densities and (b) close-packed (bulklike) or even compressed structures, which are observed for
deposition of the heavy metal ions Tl, Hg and Pb on Ag, Au, Cu or Pt (see, e.g., [63, 64, 65, 66, 67, 68, 69 and 70]).
In case (a), the metal adsorbate is very often stabilized by coadsorbed anions; typical representatives of this type
are Cu/Au(111) (e.g. [44, 45, 71, 72 and 73]) or Cu/Pt(111) (e.g. [46, 74, 75 and 76]). The two
dimensional ordering of the Cu adatoms is significantly affected by the presence of coadsorbed anions. For
example, for the upd of Cu on Au(111), the onset of underpotential deposition shifts to more positive potentials
from SO₄²⁻ to Br⁻ and Cl⁻ [72].

STM measurements suggested for the bilayer formed by Cu and coadsorbed Cl⁻ either a (5 × 5) long range order
similar to that of the (111) plane of CuCl or a (4 × 4)-based structure [46, 77, 78], while the bilayer formed with Cu
and Br⁻ always has a (4 × 4)-based structure [78, 79]. X-ray absorption spectroscopy (XAS) is a well
suited technique for the investigation of these mixed overlayers since it probes the local atomic structure around an
absorbing atom with high precision [47, 80, 81]. Similar to SXS, different neighbouring atoms can easily be
distinguished by their different backscattering amplitudes. In addition, the central atom can be selected by the
choice of the absorption edge. In the past, XAS spectra were usually recorded using photons polarized parallel to
the surface. However, the application of polarization resolved XAS is very promising for the investigation
of highly anisotropic samples such as upd layers with coadsorbed anions [47, 82]: if the polarization of the x-rays
and an investigated bond are aligned parallel to each other, the corresponding atom is visible in the XAS signal.
With increasing angle between the bond and the polarization, the respective contributions to the XAS decrease
continuously until the atom is invisible at an angle of 90°. In figure C2.10.5, Fourier transforms (FTs) of x-ray
absorption spectra measured in the vicinity of the Cu K edge are displayed for several different anions using a
polarization parallel to the sample surface ($E_\parallel$) or parallel to the surface normal ($E_\perp$) [82].
Although the peaks in the FTs are generally shifted
towards lower distances, these FTs can be interpreted as an approximation of the corresponding radial distribution
functions. Because neighbouring Cu atoms as well as the Au substrate, water and Cl, Br or S (from the
sulfate) contribute to these FTs, the interpretation of the FT data is not straightforward in the present situation,
although one might expect stronger Cu-Cu peaks for $E_\parallel$, while stronger Cu-Au and Cu-anion peaks are
likely for $E_\perp$. Therefore, XAS multiple scattering calculations (FEFF 6.01 code [83, 84]) were performed for
several different model structures of the bilayer: besides the Cu adatoms, up to 200 Au and anion atoms were
included in these model clusters [47, 82]. For coadsorbed Cl⁻ the best fit was obtained for a structural model which
is very similar to the (111) plane of a CuCl crystal, in which the Cu adatoms are placed in registry with the top
layer of Cl⁻ while they are out of registry with the gold (111) substrate. However, the copper adatoms have to be
moved out of the high symmetry positions in order to obtain better agreement between experiment and simulation
[84], indicating that the bilayer is characterized by a large static disorder with a broad distribution of Cu-Cl,
Cu-Au and Cu-Cu distances rather than by a single set of bond lengths and bond angles [47, 82].
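
The polarization dependence described above can be illustrated with a short sketch. It assumes the standard textbook result for K-edge absorption in the single scattering, plane wave approximation, in which each neighbour contributes with a weight of 3 cos²α, where α is the angle between the x-ray polarization vector and the bond. This relation is not stated in the text and is introduced here as an assumption; the function name and the example geometries are illustrative only.

```python
import math


def effective_coordination(bond_angles_deg):
    """Effective coordination number seen by polarized K-edge XAS.

    bond_angles_deg -- angles (degrees) between the polarization vector
                       and each absorber-neighbour bond in one shell.
    Each bond contributes 3*cos(alpha)**2, so a bond parallel to the
    polarization counts three times and a perpendicular bond not at all.
    """
    return 3.0 * sum(math.cos(math.radians(a)) ** 2 for a in bond_angles_deg)


# One neighbour at several orientations relative to the polarization.
for angle in (0.0, 30.0, 60.0, 90.0):
    print(f"alpha = {angle:4.1f} deg -> N_eff = {effective_coordination([angle]):.2f}")
```

Under this assumption, in-plane Cu-Cu bonds dominate the signal for polarization parallel to the surface, whereas bonds with a large component along the surface normal dominate for polarization along the normal, in line with the expectation stated above.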

For coadsorbed sulfate, the well ordered overlayer consists of a honeycomb (√3 × √3)R30° lattice of Cu adatoms,
which corresponds to a coverage of 2/3 [80, 81]. The sulfate anions occupy the empty centres of the honeycomb
[81], and the fit of the polarization dependent XAS data is best if three oxygen ions from the sulfate are directed
towards the empty centres of the honeycomb [82]. These results are in accordance with those of an SXS study
[85] and a quantum statistical model [86] and imply that STM and AFM images obtained from this system are
images of sulfate rather than of upd Cu. For Cu upd on Pt(100), an ordered Br layer on top of a pseudomorphic Cu
(1 × 1) layer was found [87]; the Cu monolayer grows with enhanced kinetics compared to the halide free solutions
[48, 76, 82, 88]. At this point it should be mentioned briefly that only a few efforts have been made to deposit
metal monolayers on semiconducting substrates to date (for references, see, e.g., [89, 90, 91 and 92]), despite the
technological importance of the electrochemical metallization and the contamination of semiconductors with metal
ions.

Figure C2.10.5. Magnitude of the Fourier transform of the k-weighted absorption fine structure $k\chi(k)$ measured at
the Cu K edge for the underpotential deposition of Cu/Au(111) from 0.1 M KClO₄ + 10⁻³ M HClO₄ + 5 × 10⁻⁵ M
Cu(ClO₄)₂ + 10 M potassium salt of sulfate, chloride, bromide and a mixture of sulfate and chloride, for polarization
of the x-rays parallel to the sample surface ($E_\parallel$) or parallel to the surface normal ($E_\perp$) (from [81]).


C2.10.4 CONCLUSIONS 

The presented examples clearly demonstrate that a combination of several different techniques is strongly
recommended for a complete characterization of the chemical composition and the atomic structure of electrode
surfaces and for a reliable interpretation of the related results. Structure sensitive methods should be combined with
spectroscopic and electrochemical techniques. Besides in situ techniques such as SXS, XAS and STM or AFM, ex
situ vacuum techniques have proven their significance for the investigation of the electrode/electrolyte interface.


C2.11 Ceramic processing

Kevin G Ewsuk 


C2.11.1 INTRODUCTION 

Ceramics represent a unique class of materials that are distinguished from common metals and plastics by their: (1) 
high hardness, stiffness, and good wear properties (i.e. abrasion resistance); (2) ability to withstand high 
temperatures (i.e. refractoriness); (3) chemical durability; and (4) electrical properties that allow them to be 
electrical insulators, semiconductors, or ionic conductors. Ceramics can be broken down into two general 
categories, traditional and advanced ceramics. Traditional ceramics include common household products such as 
clay pots, tiles, pipe, and bricks; porcelain china, sinks, and electrical insulators; and thermally insulating refractory 
bricks for ovens and fireplaces. Advanced ceramics, also referred to as 'high-tech' ceramics, include products such 
as spark-plug bodies, piston rings, catalyst supports, and water-pump seals for automobiles; thermally insulating 
tiles for the space shuttle; sodium vapour lamp tubes in street lights; and the capacitors, resistors, transducers, and 
varistors in the solid-state electronics we use daily. 

The major differences between traditional and advanced ceramics are in the processing tolerances and cost. 
Traditional ceramics are manufactured with inexpensive raw materials, are relatively tolerant of minor process 
deviations, and are relatively inexpensive. Advanced ceramics are typically made with more refined raw materials 
and processing to optimize a given property or combination of properties (e.g. mechanical, electrical, dielectric, 
optical, thermal, physical, and/or magnetic) for a given application. Advanced ceramics generally have improved 
performance and reliability over traditional ceramics, but are typically more expensive. Additionally, advanced 
ceramics are typically more sensitive to the chemical and physical defects present in the starting raw materials, or 
those that are introduced during manufacturing. 

In general, ceramic manufacturing involves creating fine particle size powders, forming powders into a particulate 
compact, and heat treating (i.e. sintering) that compact to produce a cohesive body with the desired microstructure
and properties for a given application. Because powder systems have a relatively large total surface area for their 
mass, surfaces and interfaces are very important in ceramic processing. From the perspective of a physical chemist, 
ceramic processing involves understanding and controlling the physical chemistry of surfaces and interfaces [1, 2]. 
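
The statement that fine powders have a large surface area for their mass can be made concrete with a rough estimate. The sketch below assumes monodisperse, dense, perfectly spherical particles, for which the specific surface area is 6/(ρd); real powders deviate from this idealization. The alumina density of 3980 kg m⁻³ and the function name are assumptions introduced for illustration.

```python
def specific_surface_area_m2_per_g(diameter_m, density_kg_m3):
    """Specific surface area (m^2 g^-1) of monodisperse dense spheres.

    Area per sphere   = pi * d**2
    Mass per sphere   = density * pi * d**3 / 6
    Area per unit mass = 6 / (density * d), converted from m^2/kg to m^2/g.
    """
    return 6.0 / (density_kg_m3 * diameter_m) / 1000.0


ALUMINA_DENSITY = 3980.0  # kg m^-3, assumed value for dense alpha-alumina

for d_um in (10.0, 1.0, 0.1):
    ssa = specific_surface_area_m2_per_g(d_um * 1e-6, ALUMINA_DENSITY)
    print(f"d = {d_um:5.1f} um -> {ssa:7.3f} m^2/g")
```

The estimate scales inversely with particle diameter, so reducing the particle size by a factor of 100 increases the surface area per gram by the same factor.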

Initially in ceramic powder processing, particle surfaces are created that increase the surface energy of the system.
During shape forming, surface/interface energy and interparticle forces are controlled with surface active additives.
Ultimately, the surface energy is used to produce a cohesive body during sintering. As such, surface energy, which
is also referred to as surface tension, $\gamma$, is very important in ceramic powder processing. Surface tension
causes liquids to form spherical drops, and allows solids to preferentially adsorb atoms to lower the free energy of
the system. Also, surface tension creates pressure differences and chemical potential differences across curved
surfaces that cause matter to move.

The Laplace equation, which defines the pressure difference, $\Delta P$, across a curved surface with principal radii
of curvature $r_1$ and $r_2$,

$$\Delta P = \gamma \left( \frac{1}{r_1} + \frac{1}{r_2} \right) \tag{C2.11.1}$$

has been characterized as the fundamental equation of capillarity [1]. In ceramic processing, the pressure associated
with surface tension and capillary forces contributes to, among other things, particle clustering (i.e. agglomeration)
and rearrangement, to the migration of liquids through pores during mixing, shape forming, and drying, and to pore
shrinkage during sintering.
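
As a numerical illustration of equation (C2.11.1), the following minimal sketch evaluates the capillary pressure for a spherical interface, for which both principal radii are equal. The surface tension of 0.072 N m⁻¹ (roughly that of water at room temperature), the radii and the function name are assumptions chosen for illustration.

```python
def laplace_pressure(gamma, r1, r2):
    """Pressure difference (Pa) across a curved interface, equation (C2.11.1).

    gamma  -- surface tension (N m^-1)
    r1, r2 -- principal radii of curvature (m)
    """
    return gamma * (1.0 / r1 + 1.0 / r2)


GAMMA_WATER = 0.072  # N m^-1, approximate room temperature value (assumed)

# Spherical menisci of decreasing radius (r1 = r2 = r).
for r in (1e-4, 1e-6, 1e-8):
    dp = laplace_pressure(GAMMA_WATER, r, r)
    print(f"r = {r:.0e} m -> delta P = {dp:.3e} Pa")
```

For a radius of 1 μm the pressure difference is already of the order of one atmosphere, which indicates why capillary forces matter for liquid migration in fine pores.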

The equilibrium vapour pressure, $P$, over a curved surface is defined by the Kelvin equation

$$\ln \left( \frac{P}{P_0} \right) = \frac{2\gamma\Omega}{rkT} \tag{C2.11.2}$$

where $P_0$ is the equilibrium vapour pressure over a planar surface, $\Omega$ is the molecular volume of the
condensed phase, $k$ is Boltzmann's constant, and $T$ is the absolute temperature. Because the chemical potential
difference, $\Delta\mu$, between a curved and flat surface is related to the vapour pressures over those respective
surfaces,

$$\Delta\mu = kT \ln \left( \frac{P}{P_0} \right) \tag{C2.11.3}$$

chemical potential is also related to surface curvature:

$$\Delta\mu = \mu - \mu_0 = \frac{2\gamma\Omega}{r}. \tag{C2.11.4}$$

The chemical potential of a curved surface is critical in ceramic processing. It determines reactivity, the
solubility of a solid in a liquid, the rate of liquid evaporation from solid surfaces, and material transport during
sintering.
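
Equations (C2.11.2) to (C2.11.4) can be checked against each other numerically. The sketch below uses illustrative, assumed inputs for a water-like liquid (surface tension 0.072 N m⁻¹, molecular volume 3.0 × 10⁻²⁹ m³, radius 10 nm, 298 K); the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J K^-1


def chemical_potential_difference(gamma, omega, r):
    """Delta mu = 2*gamma*Omega/r, equation (C2.11.4), in J per molecule."""
    return 2.0 * gamma * omega / r


def kelvin_pressure_ratio(gamma, omega, r, temperature):
    """P/P0 from ln(P/P0) = 2*gamma*Omega/(r*k*T), equation (C2.11.2)."""
    delta_mu = chemical_potential_difference(gamma, omega, r)
    return math.exp(delta_mu / (K_B * temperature))


# Illustrative inputs (assumed): water-like liquid, 10 nm droplet, 298 K.
gamma = 0.072    # N m^-1
omega = 3.0e-29  # m^3 per molecule
ratio = kelvin_pressure_ratio(gamma, omega, 10e-9, 298.0)
print(f"P/P0 = {ratio:.3f}")

# Consistency with equation (C2.11.3): k*T*ln(P/P0) equals delta mu.
print(K_B * 298.0 * math.log(ratio), chemical_potential_difference(gamma, omega, 10e-9))
```

With these inputs the vapour pressure over the curved surface is raised by roughly ten per cent, and the effect grows rapidly as the radius decreases further.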

This chapter will describe some of the basic unit processes in ceramic manufacturing, and will touch on the 
pertinence of the physical chemistry of surfaces in selected unit processes. For a more comprehensive review of 
ceramics and ceramic processing, the reader is referred to other sources [3, 4 and 5]. 


C2.11.2 POWDER PROCESSING 

Ceramic manufacturing involves multiple unit process steps ranging from raw materials beneficiation to finish 
machining ( figure C2.11.1 ). Ceramics are fabricated using raw materials, typically in powder form, that are 
generally beneficiated to improve their handling and processability. The desired size and shape ceramic component 
is produced by consolidating powder in a process that generally involves a forming pressure. Ultimately, this 
powder compact is heat-treated (i.e. sintered) to form a cohesive body.

Figure C2.11.1. A flow chart summarizing the ceramic design and manufacturing process.

The fabrication of an alumina spark-plug body is a good example of ceramic manufacturing. The manufacturing
process begins with an alumina powder (figure C2.11.2) comprised of individual alumina particles of the desired
size distribution. To enhance densification, precursors of CaO, MgO, and SiO₂ are typically mixed with the
alumina powder to produce several weight per cent of a glass phase during sintering. This mixture is then
transformed into a slurry of ceramic particles dispersed in water, which is subsequently granulated with an organic
binder by spray drying. Spray drying produces larger clusters of particles called agglomerates or granules that have
improved powder flow, packing, and formability (figure C2.11.3). These granules are then pressed and machined to
produce a powder compact of the desired size and shape that is held together by the organic additives. Finally, this
powder compact is sintered to produce a dense ceramic spark-plug body (figure C2.11.4). The mechanical and
electrical properties of this body are determined by the microstructure of the polycrystalline alumina ceramic
produced on sintering (figure C2.11.5).

Figure C2.11.2. A scanning electron micrograph showing individual particles in a polycrystalline alumina powder.

Figure C2.11.3. A scanning electron micrograph of the spherical alumina granules produced by spray drying a
ceramic slurry. The granules are comprised of individual alumina particles, sintering additives, and an organic
binder.

Figure C2.11.4. A commercial spark plug with its electrically insulating ceramic body comprised of alumina and
glass (white portion).

Figure C2.11.5. Scanning electron micrographs showing the microstructure of an alumina ceramic spark-plug
body (a) fracture surface and (b) polished and thermally etched cross section.

There is unquestionably a substantial engineering component in manufacturing ceramics. There is also a very 
critical scientific component that involves understanding and controlling the physical chemistry of surfaces. Not 
only are a number of different unit process steps required to manufacture a ceramic, but each unit process has its 
own set of requirements for optimization. Often, the requirements to optimize one step are diametrically opposed to 
those for another unit process. This necessitates compromise in order to optimize the complete manufacturing 
process. For example, while a fine-particle-size powder provides a high surface area and driving force for sintering, 
electrostatic attraction and van der Waals forces promote agglomeration of fine particles and make them difficult to 
mix, pack, and compact. As a compromise, a practical lower limit of ~0.1 μm diameter particles is typical in 
advanced ceramic powder processing. 
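
The trade-off between surface area and handling can be made concrete with a rough estimate of how specific surface area scales with particle size. The sketch below assumes ideal, monodisperse, fully dense spheres, for which the surface area per unit mass is 6/(ρd). The function name and the nominal alumina density of 3980 kg/m^3 are illustrative assumptions, not values taken from this article.

```python
def specific_surface_area(diameter_m: float, density_kg_m3: float) -> float:
    """Surface area per unit mass (m^2/kg) of monodisperse, dense spheres."""
    if diameter_m <= 0 or density_kg_m3 <= 0:
        raise ValueError("diameter and density must be positive")
    # area / mass = (pi d^2) / (rho * pi d^3 / 6) = 6 / (rho d)
    return 6.0 / (density_kg_m3 * diameter_m)


ALUMINA_DENSITY = 3980.0  # kg/m^3, assumed nominal value for dense alumina

for d_um in (10.0, 1.0, 0.1):
    ssa_m2_per_g = specific_surface_area(d_um * 1e-6, ALUMINA_DENSITY) / 1000.0
    print(f"d = {d_um:5.1f} um -> {ssa_m2_per_g:6.2f} m^2/g")
```

Under these assumptions each tenfold reduction in diameter gives a tenfold increase in specific surface area, which is why sub-micrometre powders sinter readily but agglomerate strongly.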
C2.11.2.1 RAW MATERIALS 


Ceramic processing generally starts with ceramic powders that range from relatively impure, naturally occurring 
clays, to ultra-high-purity, controlled morphology powders. Inexpensive, mined raw materials are typically used to 
manufacture traditional, high-volume production ceramics [6, 7, 8, 9, 10, 11 and 12]. Chemically synthesized 
ceramic powders, which are often considerably more expensive, are used to manufacture high-cost, lower-volume, 
advanced ceramics [13, 14, 15, 16, 17, 18, 19 and 20]. 

Naturally occurring ceramic raw materials such as silica (SiO2, or quartz), silicates (e.g. talc) and aluminosilicates 
(e.g. clays) are generally mined from the Earth's surface. In contrast, nanometre size, controlled chemistry ceramic 
powders are produced from high-purity, specialty chemicals: the precipitation of solids from liquid solutions is one 
example. Precipitation occurs by a combination of nucleation and growth, both of which occur to lower the free 
energy of the system. The system's desire to minimize its surface energy per unit volume determines the shape of 
the precipitate. Most powders used in the manufacture of advanced ceramics fall somewhere between these two 
extremes. For example, reasonably pure ceramic powders can be formed by reacting constituent oxide powders 
and/or salts at an elevated temperature (i.e. calcining). Barium titanate, BaTiO3, which is used to make capacitors 
in solid-state electronics, can be produced by mixing BaCO3 with TiO2 and calcining. High-surface-area fine 
powders promote rapid and complete reaction of the constituent powders to produce the desired compound. 
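
As an illustration of batching for calcination, the sketch below computes the reactant masses for the barium titanate example, assuming the overall reaction BaCO3 + TiO2 -> BaTiO3 + CO2 goes to completion with pure, dry reactants. The reaction stoichiometry, the rounded standard atomic masses, and the function names are assumptions added for illustration.

```python
# Rounded standard atomic masses in g/mol (assumed values for illustration).
ATOMIC_MASS = {"Ba": 137.327, "Ti": 47.867, "C": 12.011, "O": 15.999}


def molar_mass(composition: dict) -> float:
    """Molar mass (g/mol) from a mapping of element symbol to atom count."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


M_BACO3 = molar_mass({"Ba": 1, "C": 1, "O": 3})
M_TIO2 = molar_mass({"Ti": 1, "O": 2})
M_BATIO3 = molar_mass({"Ba": 1, "Ti": 1, "O": 3})
M_CO2 = molar_mass({"C": 1, "O": 2})


def batch_for_batio3(target_mass_g: float) -> dict:
    """Reactant masses (g) needed to yield target_mass_g of BaTiO3."""
    moles = target_mass_g / M_BATIO3  # 1:1:1 stoichiometry
    return {
        "BaCO3_g": moles * M_BACO3,
        "TiO2_g": moles * M_TIO2,
        "CO2_lost_g": moles * M_CO2,  # mass evolved as gas during calcining
    }


print(batch_for_batio3(100.0))
```

The CO2 term is the mass lost on calcining, so the reactant masses minus that loss should equal the target product mass.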

C2.11.2.2 BENEFICIATION 

Beneficiation is the process or processes whereby the chemical and/or physical properties and characteristics of a 
raw material are modified to make it more processable. Particle-size reduction (i.e. comminution) using mechanical 
energy may be the most common process [21, 22, 23, 24, 25, 26, 27 and 28]. Crushing, grinding, and/or milling 
create new surfaces by breaking down aggregates (i.e. clusters of tightly bound particles) and by fracturing 
particles. Comminution produces the desired size distribution powder for subsequent processing. 

After comminution, soluble impurities either inherent to the raw materials or introduced during processing can be 
extracted by washing (e.g. with water), followed by filtration [6, 23]. Chemical leaching and magnetic separation 
are also used to purify raw materials. In a more specialized process, a frothing agent can be used to promote 
differential adsorption of impurity particles onto gas bubbles to separate out the desired product [23]. 

C2.11.2.3 GRANULATION 

In dry powder processing, after the desired particle size and chemistry are obtained, the powder is generally 
granulated. Powders comprised of micrometre-size particles are difficult to handle due to interparticle forces. 
Granulation transforms individual particles into agglomerates with controlled size, shape, and strength, to improve 
the flow, packing, and compaction behaviour of powders in ceramic processing [29, 30 and 31 ]. Granules are 
formed by spraying a liquid or a binder solution directly into a tumbling powder, or by spray drying a slurry in a 
heated chamber. In the former, granules form under the influence of the capillary forces between the liquid-solid 
(particle) interfaces. In spray drying, a combination of (liquid) surface tension and interparticle forces produce 
~50-300 μm diameter granules. Due to liquid surface tension, the atomization of a slurry produces spherical 
droplets that subsequently dry to form spherical granules. Capillary forces hold the individual particles together 
within the granule during the drying, while van der Waals forces and bonds from the organic additives adsorbed 
onto particle surfaces hold the dry granule together. 
C2.11.2.4 FORMING ADDITIVES 

Immediately prior to, during, and/or immediately following granulation, forming additives or processing aids are 
commonly added to a ceramic powder to enhance processing [32, 33, 34, 35, 36 and 37]. Organic additives adsorb 
onto the surfaces of ceramic particles to modify surface energy and particle-particle interactions. Two common 
additives used in ceramic processing are binders and lubricants. Organic binders, which are also referred to as 
coagulants and flocculants, are polymer molecules or colloids that adsorb onto particle surfaces and promote 
interparticle bridging (i.e. flocculation). The main purpose of a binder is to provide strength to the powder compact 
after shape forming, which may be necessary for subsequent handling and/or green machining. Binders are used 
extensively in dry powder pressing operations, and are also added to extrusion bodies and to pastes. 


Lubricants are added to lower interfacial frictional forces between individual particles and/or between particles and 
forming die surfaces to improve compaction and ejection (i.e. extraction of the pressed compact from the forming 
die). Individual particle surfaces can be lubricated by an adsorbed film that produces a smoother surface and/or 
decreases interparticle attraction. Forming (die) surfaces can be lubricated by coating with a film of low-viscosity 
liquid such as water or oil. 


C2.11.3 SHAPE FORMING 

Ceramic forming typically involves using pressure to compact and mould particles to the desired size and shape. 
Ceramics can be formed from slurries, pastes, plastic bodies (e.g. a stiff mud), and from wet and dry 
powders. 

C2.11.3.1 SLURRIES 

In preparation for the shape forming process, ceramic particles can be dispersed in a liquid. The dispersion of solid 
particles in a liquid is known as a slurry, and is often referred to as a suspension or a dispersion. Forming a slurry 
involves (1) wetting the solid particle surface with the liquid (i.e. replacing the solid-vapour interfacial area with 
solid-liquid interfaces), (2) breaking down agglomerates, and (3) controlling particle surface charge to prevent 
flocculation or reagglomeration [38, 39, 40, 41 and 42]. To optimize dispersion and stability, dispersants (also 
known as deflocculants or anticoagulants) are often added to slurries [33, 36]. Deflocculants prevent dispersed 
particles from reagglomerating in a slurry by keeping particle-particle separation distances sufficiently large such 
that the short-range van der Waals attractive forces that would otherwise hold particles together are negligible. In steric 
stabilization, particle separation is maintained by the preferential adsorption of large deflocculant molecules on the 
particle surfaces. Electrostatic stabilization is achieved through the electrical double layer that forms 
around particles such that neighbouring particles are repelled from one another by like (negative or positive) 
surface charges. Deflocculation by electrostatic stabilization is common in clay slurries as well as with ceramic 
particles dispersed in polar liquids (e.g. water). 

The use of acids and bases to control interparticle forces in oxide (ceramic)-water suspensions is an example of 
electrostatic stabilization. Hydroxylated oxide surfaces react with H+ (acid) or OH- (base) by surface ionization to 
become positively charged (low pH) or negatively charged (high pH), respectively. The like-charged particle 
surface layers repel neighbouring particles and stabilize the suspension. A stable dispersion is produced by 
progressively adding acid or base to a system to increase the particle surface charge such that the long-range electrostatic repulsive 
forces dominate over the short-range van der Waals attractive forces. Conversely, an oxide slurry can be induced to 
flocculate by adjusting the system pH to the point where there is no net charge on a particle's surface. The pH at 
which this occurs is defined as the point of zero charge (PZC) for the oxide. At the PZC, the electrostatic repulsive 
forces are eliminated and the van der Waals attractive forces take over, causing flocculation. Surfactants or wetting 
agents offer another means to improve dispersion in a slurry. By reducing the surface tension of a liquid, a wetting 
agent decreases the solid-liquid interfacial energy, making it more favourable for the liquid to coat the solid 
particles. A low-surface-tension surfactant also makes a good antifoam agent. 
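
The pH dependence described above can be summarized in a small helper. This is only a qualitative sketch: the sign convention follows the text (positive below the PZC, negative above it), while the PZC value used in the example and the 2-pH-unit offset used as a stability heuristic are hypothetical illustrations rather than values from this article.

```python
def surface_charge_sign(ph: float, pzc: float, tol: float = 1e-9) -> int:
    """Return +1, 0 or -1 for the net surface charge of an oxide in water."""
    if abs(ph - pzc) <= tol:
        return 0  # at the point of zero charge there is no net charge
    return 1 if ph < pzc else -1


def is_likely_dispersed(ph: float, pzc: float, min_offset: float = 2.0) -> bool:
    """Crude heuristic: far from the PZC, electrostatic repulsion is assumed to dominate.

    min_offset is a hypothetical threshold chosen only for illustration.
    """
    return abs(ph - pzc) >= min_offset


ASSUMED_PZC = 9.0  # hypothetical PZC for an oxide powder

for ph in (4.0, 8.5, 9.0, 11.5):
    print(ph, surface_charge_sign(ph, ASSUMED_PZC), is_likely_dispersed(ph, ASSUMED_PZC))
```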

C2.11.3.2 CASTING 

Slurry or slip casting provides a relatively inexpensive way to fabricate uniform-thickness, thin-wall, or large cross 
section shapes [43, 44, 45, 46, 47 and 48]. For slip casting, a slurry is first poured into a porous mould. Capillary 
suction then draws the liquid from the slurry to form a higher solids content, close-packed, leather-hard cast on the 
inner surface of the mould. In a fixed time, a given wall thickness is formed, after which the excess slurry is 
drained. 


Electrophoretic deposition (EPD) is another method of casting slurries. EPD is accomplished through the controlled 
migration of charged particles under an applied electric field. During EPD, ceramic particles typically deposit on a 
mandrel to form coatings of limited thickness, or thin tubular shapes such as solid β″-Al2O3 electrolytes for 
sodium-sulfur batteries. 

C2.11.3.3 DRYING 

After casting, the residual liquid in the ceramic part must be removed by drying [49, 50, 51, 52, 53 and 54]. It is 
important to achieve relatively uniform drying throughout the body in order to avoid the excessive differential 
(capillary) stresses and stress gradients that can result in drying cracks (i.e. like those formed in a mud puddle) and 
warping (i.e. like that seen in lumber on drying). Air drying by convection and conduction is the most common 
means of drying ceramic ware, whereby drying occurs by liquid evaporation at the drying front. Initially, the drying 
front starts at the ware surface and then moves into the part. During drying, liquid migrates to the drying front by 
capillary flow, chemical diffusion, and/or thermal diffusion at a rate determined by the permeability of the ware. 
The size of the porosity in the ceramic body, the viscosity and surface tension of the liquid, and the moisture 
gradient from inside the body to the drying front determine the permeability. When liquid migration cannot keep 
pace with the evaporation process, the drying front moves from the surface into the body, where drying continues 
by evaporation from the menisci of the liquid within the pores. Large pores and interstices are emptied in 
preference to smaller pores, and the large capillary stresses produced as the menisci recede into fine pores can 
result in cracking. The finer the particles, the greater the drying shrinkage, and the greater the capillary stresses 
during drying. 
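
To give a feel for the magnitudes involved, the sketch below estimates the capillary pressure across a meniscus in a cylindrical pore using the Young-Laplace relation, ΔP = 2γ cos(θ)/r. The cylindrical-pore idealization, the default of perfect wetting, and the surface tension of 0.072 N/m (roughly that of water near room temperature) are assumptions made for illustration.

```python
import math


def capillary_pressure(surface_tension_n_m: float, pore_radius_m: float,
                       contact_angle_deg: float = 0.0) -> float:
    """Young-Laplace pressure (Pa) across a meniscus in a cylindrical pore."""
    if pore_radius_m <= 0:
        raise ValueError("pore radius must be positive")
    cos_theta = math.cos(math.radians(contact_angle_deg))
    return 2.0 * surface_tension_n_m * cos_theta / pore_radius_m


WATER_SURFACE_TENSION = 0.072  # N/m, assumed approximate value

for radius_m in (1e-6, 1e-7, 1e-8):
    p_mpa = capillary_pressure(WATER_SURFACE_TENSION, radius_m) / 1e6
    print(f"pore radius {radius_m:.0e} m -> {p_mpa:.3f} MPa")
```

Because the pressure scales as 1/r, the stress rises by an order of magnitude for each tenfold decrease in pore radius, consistent with the greater cracking risk in fine-particle bodies.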

Stresses during drying can be minimized by controlled humidity drying, by supercritical drying, or by freeze 
drying. Controlled humidity drying utilizes a high-humidity atmosphere during the critical, initial stage of drying to 
maintain a liquid film on the (solid) surface of the ware (i.e. a solid-liquid interface). Supercritical drying is 
accomplished by heating ware under pressure in an autoclave until the liquid becomes a supercritical fluid (i.e. a 
single phase in which the distinction between liquid and vapour disappears), after which drying can be accomplished by isothermal depressurization to 
remove the vapour. Supercritical drying is often used to avoid generating catastrophic capillary stresses during the 
drying of fine-pore materials such as gels. Freeze drying makes use of freezing and sublimation to minimize drying 
stresses due to capillarity. In freeze drying, the temperature of the ware is initially decreased to below the freezing 
point of the liquid. Then the pressure is reduced to transform the frozen liquid to a vapour and to remove it. Freeze drying is 
commonly used to make powders that are not agglomerated. 

C2.11.3.4 POWDER PRESSING 

Powder pressing may be the most common method of forming ceramic components [55, 56, 57, 58 and 59]. Dry 
pressing, also referred to as mechanical pressing, is an economical, yet versatile technique for fabricating small, 
relatively simple-shape powder compacts. Automated dry pressing, which is used extensively in the production of 
pharmaceutical tablets, is capable of producing 5000 ceramic parts per minute. Dry pressing involves compacting 
a (typically granulated) ceramic powder between two plungers in a die cavity. Friction between the powder and the 
die walls must be controlled during forming to minimize pressing pressure gradients that can create defects in the 
form of density gradients and/or cracking in a pressed powder compact. Friction can be controlled using lubricants 
during forming or through the design of the die (i.e. materials and geometry). 


C2.11.4 THERMAL PROCESSING 

Generally, the last step in ceramic component manufacturing is thermal processing [60, 61, 62 and 63]. This is the 
stage where the weakly bound particulate body produced during shape forming is heat treated to produce a 
cohesive body with the desired properties for its end-use application. Thermal consolidation, which is more 
commonly referred to as 'firing', typically involves two steps, burnout and sintering. Generally, both are 
accomplished in a single firing process with burnout preceding sintering. 

C2.11.4.1 BURNOUT 

The burnout stage involves eliminating the organic processing aids and any residual organic impurities or water 
prior to sintering [60, 61, 62 and 63]. Minor concentrations of residual liquid used in forming, and physically 
adsorbed moisture on particle surfaces can be eliminated on heating to ~200 °C. Most organic binders used in 
ceramic forming are physically adsorbed onto particle surfaces, and can be burned out by heating to ~500 °C. Clays 
such as kaolin must be heated to 700 °C to liberate the water of crystallization and produce the desired dehydrated 
aluminosilicate phase for subsequent processing. The decomposition of constituents such as sintering aids, which 
may be added in the form of a salt precursor, may require temperatures up to ~900 °C. Temperatures in excess of 
1000 °C may be required to completely eliminate chemically adsorbed water on fine-particle surfaces. 
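
The approximate temperatures quoted above can be collected into a simple lookup for planning a burnout schedule. The sketch below is illustrative only: the thresholds are the nominal values from the text, treated as sharp cut-offs for simplicity, and the names are hypothetical.

```python
# Nominal burnout temperatures (deg C) taken from the paragraph above.
BURNOUT_STEPS = [
    (200.0, "residual forming liquid and physically adsorbed moisture"),
    (500.0, "physically adsorbed organic binders"),
    (700.0, "water of crystallization in clays such as kaolin"),
    (900.0, "salt-precursor sintering aids (decomposition)"),
    (1000.0, "chemically adsorbed water on fine-particle surfaces"),
]


def removed_by(temperature_c: float) -> list:
    """Species expected to be eliminated once temperature_c has been reached."""
    return [name for limit_c, name in BURNOUT_STEPS if temperature_c >= limit_c]


for t_c in (150.0, 600.0, 1050.0):
    print(t_c, removed_by(t_c))
```

In practice each species is removed over a temperature range that depends on heating rate, atmosphere, and part thickness, so a real schedule would include holds rather than single thresholds.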

C2.11.4.2 SINTERING 

Sintering involves the densification and microstructure development that transforms the loosely bound particles in 
a powder compact into a dense, cohesive body [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72 and 73]. The 
end-use properties of a finished ceramic are largely dependent on the degree of densification achieved during sintering, 
and on the microstructure produced; consequently, sintering is one of the most critical steps in ceramic processing. 
Sintering, which is often considered to be synonymous with densification, is usually accomplished by heating a 
powder compact to approximately two-thirds of its melting temperature for a given time. Sintering can also occur 
by subjecting a powder compact to externally applied pressure, or heat and pressure simultaneously (e.g. hot 
pressing and hot isostatic pressing). A ceramic densifies during sintering as the porosity (i.e. void space) between the solid particles is 
reduced in size with time. Concurrently, the cohesiveness of the body increases as interparticle contact (i.e. grain 
boundary) area increases during sintering. 
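
The two-thirds rule of thumb mentioned above can be turned into a quick estimate. The sketch below assumes the fraction applies to the absolute (kelvin) melting temperature, which is an interpretation rather than something stated in the text, and uses an assumed melting point of about 2072 °C for alumina as the example.

```python
KELVIN_OFFSET = 273.15


def rule_of_thumb_sintering_temp_c(melting_point_c: float,
                                   fraction: float = 2.0 / 3.0) -> float:
    """Estimate a sintering temperature (deg C) as a fraction of the absolute melting point."""
    melting_point_k = melting_point_c + KELVIN_OFFSET
    return fraction * melting_point_k - KELVIN_OFFSET


ASSUMED_ALUMINA_MELTING_POINT_C = 2072.0  # assumed approximate value

print(round(rule_of_thumb_sintering_temp_c(ASSUMED_ALUMINA_MELTING_POINT_C)))
```

Actual sintering temperatures depend strongly on particle size, dopants, and the target density, so an estimate of this kind is only a starting point.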

(a) Driving force for sintering. Ceramic powder compacts sinter as a result of the thermodynamic driving force to 
minimize the Gibbs' free energy, G, of a system [61, 62, 63, 64 and 74]. This includes minimizing the volume, 
interfacial, and surface energy in the system. In a powder compact, excess free energy is present primarily in the 
form of surface or interfacial energy (i.e. liquid-vapour and/or solid-vapour interfaces) associated with porosity. 
Under the influence of elevated temperature and/or pressure during sintering, atoms migrate to thermodynamically 
more stable positions within a powder compact. Material transport is driven by the chemical potential difference 
that exists between surfaces of dissimilar curvature within the system. Physically, in a particulate system, atoms or 
ions move from higher energy convex (i.e. as viewed from the particle centre out) particle surfaces to lower energy 
concave particle surfaces to decrease the curvature and chemical potential gradients in the system. 

Material transport can occur by solid-state, liquid-phase, and/or vapour-phase mechanisms. For polycrystalline 
ceramics, material transport commonly occurs as ions diffuse through the volume, along grain boundaries (i.e. 
particle-particle intersections) and on particle surfaces (figure C2.11.6). Additionally, ions can vaporize from, and 
subsequently recondense onto, particle surfaces (i.e. evaporation-condensation). A powder compact will densify 
(i.e. undergo volume contraction) when material transport occurs in a manner that allows particle centres to 
approach during sintering. Material transport by volume and grain boundary diffusion can result in densification. 
Material transport that changes the geometry of the system without densification is termed coarsening. Grain 
growth is perhaps the most prevalent form of coarsening during sintering. Coarsening can occur when material is 
transported by volume diffusion, surface diffusion, or evaporation-condensation. 


Figure C2.11.6. The classic two-particle sintering model illustrating material transport and neck growth at the 
particle contacts resulting in coarsening (left) and densification (right) during sintering. Surface diffusion (a), 
evaporation-condensation (b), and volume diffusion (c) contribute to coarsening, while volume diffusion (d), grain 
boundary diffusion (e), solution-precipitation (f), and dislocation motion (g) contribute to densification. 
(b) Densification and microstructure development. Microstructurally, material transport during sintering manifests 
itself as interparticle pore shrinkage, grain boundary formation, a decrease in the total volume of the system (i.e. 
densification), and an increase in the average size of the particles that make up the compact (i.e. grain growth) [61, 
62, 63 and 64]. Interparticle contacts flatten, the curvature within the system decreases, and the surface area and 
free energy of the system decrease during sintering. 

The ideal sintering process can be divided into three basic stages [74]. Initially, material is transported from convex 
particle surfaces to the pore-grain boundary intersection to form necks between adjacent particles. As this occurs, 
grain boundaries grow to create a three-dimensional array of approximately cylindrical, interconnected (i.e. 
continuous) pore channels at three-grain junctions throughout the compact. These pore channels shrink in diameter 
during intermediate-stage sintering. Ultimately, because of Rayleigh instability (i.e. the critical 'cylinder' length to 
diameter ratio), the channels pinch off to form approximately spherical, isolated (i.e. closed) pores at four-grain 
junctions within the ceramic matrix. The radial shrinkage of closed pores and the growth of larger grains at the 
expense of smaller ones constitute final-stage sintering. 

Sintering phenomena are generally similar in real powder compacts; however, factors including surface energy 
anisotropy and packing heterogeneities in real systems can contribute to heterogeneous (i.e. non-uniform) 
densification and microstructure development. To circumvent this problem, minor concentrations of select 
chemicals, referred to as sintering aids or dopants, are commonly added prior to sintering. These chemical 
impurities preferentially segregate to high-energy crystallographic planes to decrease the crystalline anisotropy in 
the system and provide improved control over microstructure development during sintering. MgO-doped Al2O3 is 
the classic example in ceramics [71]. Impurity segregation to high-energy grain boundaries will also produce 
lower-energy interfaces that reduce the overall driving force for material transport during sintering. 

(c) Solid-state sintering. Ceramics can be densified by solid-state [71, 72, 73, 75], liquid-phase [76], and viscous 
[77] sintering. Solid-state sintering refers to the process whereby densification occurs by solid-state, 
diffusion-controlled material transport. Densification occurs as higher-energy, solid-vapour (i.e. pore) interfaces are replaced 
by lower energy, solid-solid (i.e. grain boundary) interfaces. The change in free energy associated with the 
elimination of porosity, which drives densification, can be approximated by 


$$dG = \gamma_{ss}\,dA_{ss} - \gamma_{sv}\,dA_{sv} \qquad \text{(C2.11.5)}$$


After the pore surfaces are eliminated and densification is complete, grain growth can further reduce the free 
energy of the system by reducing the amount of high-energy, solid-solid interfacial area. The change in free energy 
associated with the elimination of particle-particle interfaces, which drives grain growth, can be approximated by 

$$dG = -\gamma_{ss}\,dA_{ss} \qquad \text{(C2.11.6)}$$
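
The two free-energy expressions for solid-state sintering can be evaluated numerically as a sanity check on their signs. The sketch below reads them as dG = γ_ss dA_ss - γ_sv dA_sv for densification and dG = -γ_ss dA_ss for grain growth, with the areas taken as positive magnitudes of interface created or eliminated. The interfacial energies and the geometric assumption that two pore surfaces of area A are replaced by one grain boundary of area A are hypothetical values chosen for illustration.

```python
def delta_g_densification(gamma_ss: float, gamma_sv: float,
                          d_area_ss: float, d_area_sv: float) -> float:
    """Free-energy change (J) when solid-vapour area is replaced by solid-solid area."""
    return gamma_ss * d_area_ss - gamma_sv * d_area_sv


def delta_g_grain_growth(gamma_ss: float, d_area_ss: float) -> float:
    """Free-energy change (J) when grain-boundary area d_area_ss is eliminated."""
    return -gamma_ss * d_area_ss


GAMMA_SV = 1.0  # J/m^2, hypothetical solid-vapour energy
GAMMA_SS = 0.5  # J/m^2, hypothetical grain-boundary energy
AREA = 1e-12    # m^2, grain-boundary area created (about 1 um^2)

# Assumed geometry: two facing pore surfaces (2 * AREA) become one boundary (AREA).
print(delta_g_densification(GAMMA_SS, GAMMA_SV, AREA, 2.0 * AREA))
print(delta_g_grain_growth(GAMMA_SS, AREA))
```

With this geometry the densification step lowers the free energy whenever γ_ss is less than 2γ_sv, which is the same condition under which a real (non-zero) dihedral angle exists.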

Because densification occurs via the shrinkage of thermodynamically unstable pores, densification and 
microstructure development can be assessed on the basis of the dihedral angle, $\theta$, formed as a result of the surface 
energy balance between the two solid-vapour and one solid-solid interface at the pore-grain boundary intersection 
[61, 78, 79 and 80], 

$$\theta = 2\cos^{-1}\left(\frac{\gamma_{ss}}{2\gamma_{sv}}\right) \qquad \text{(C2.11.7)}$$

where $\gamma_{ss}$ and $\gamma_{sv}$ are the solid-solid and solid-vapour interfacial energies, respectively (figure C2.11.7). Pore 
shrinkage and densification are favoured by a dihedral angle that is greater than the geometric dihedral angle, $\psi_g$, of 
the regular polyhedron whose number of sides equals the coordination number of the pore. Theoretically, for the 
ideal four-sided pore present during final-stage sintering, $\theta$ must be greater than 70.5° (i.e. the geometric dihedral 
angle for a tetrahedron) to achieve 100% theoretical density during sintering. A larger dihedral angle will be 
required to eliminate larger pores (i.e. surrounded by more grains) formed as a result of packing defects. The larger 
the dihedral angle, the larger the intergranular pores that can be eliminated during sintering and the greater the 
surface tension driving force for pore shrinkage. Thermodynamics and/or kinetics limit the shrinkage of pores 
trapped within grains (i.e. intragranular porosity) and pores above a critical size [78, 79 and 80 ]. 
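
The pore-stability criterion can be checked numerically. The sketch below evaluates the dihedral angle from the interfacial energy balance, 2 arccos(γ_ss / 2γ_sv), and compares it with the geometric dihedral angle of a regular tetrahedron, arccos(1/3) or about 70.5°. The interfacial energies in the example are hypothetical, and the function names are illustrative.

```python
import math

# Geometric dihedral angle of a regular tetrahedron, arccos(1/3), in degrees.
TETRAHEDRON_DIHEDRAL_DEG = math.degrees(math.acos(1.0 / 3.0))


def dihedral_angle_deg(gamma_ss: float, gamma_sv: float) -> float:
    """Equilibrium dihedral angle (degrees) at a pore-grain boundary intersection."""
    ratio = gamma_ss / (2.0 * gamma_sv)
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("no equilibrium dihedral angle: need 0 <= gamma_ss <= 2*gamma_sv")
    return math.degrees(2.0 * math.acos(ratio))


def pore_can_shrink(gamma_ss: float, gamma_sv: float,
                    geometric_angle_deg: float = TETRAHEDRON_DIHEDRAL_DEG) -> bool:
    """True when the dihedral angle exceeds the pore's geometric dihedral angle."""
    return dihedral_angle_deg(gamma_ss, gamma_sv) > geometric_angle_deg


# Hypothetical energies (J/m^2): grain boundary half as energetic as the free surface.
print(dihedral_angle_deg(0.5, 1.0), pore_can_shrink(0.5, 1.0))
```

For pores coordinated by more grains, a larger `geometric_angle_deg` would be supplied, reflecting the statement that larger pores require a larger dihedral angle to be eliminated.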





Figure C2.11.7. An illustration of the equilibrium dihedral angle, $\theta$, formed by the balance of interfacial energies 
at a pore-grain boundary intersection during solid-state sintering. 

(d) Liquid-phase sintering. To promote faster densification at lower temperatures, relatively small concentrations of 
chemical additives, referred to as sintering aids, are commonly used to create a liquid phase during sintering. 
Traditional liquid-phase sintering involves heating and melting crystalline solids to form a eutectic liquid during 
sintering [63, 76]. The requirements for liquid-phase sintering are that the liquid wets the solid particles, there is 
sufficient liquid present, and that the solid is soluble in the liquid. The concentration of the liquid and the solubility 
of the solid in the liquid (i.e. reactivity) increase dramatically with increasing temperature above the eutectic 
temperature. 

Liquid-phase sintering is significantly more complex than solid-state sintering in that there are more phases, 
interfaces, and material transport mechanisms to consider. In general, densification will occur as long as it is 
energetically favourable to replace liquid-vapour (subscript lv), solid-solid (subscript ss), and solid-vapour 
(subscript sv) interfaces with solid-liquid (subscript sl) interfaces during sintering: 

$$dG = \gamma_{sl}\,dA_{sl} - (\gamma_{lv}\,dA_{lv} + \gamma_{ss}\,dA_{ss} + \gamma_{sv}\,dA_{sv}) \qquad \text{(C2.11.8)}$$

Densification during liquid-phase sintering occurs in three stages. Initially, liquid forms at particle intersections and 
redistributes throughout the particulate mass under the influence of capillary action. Shear stresses due to the 
capillary pressure imbalance on different particles (e.g. different size particles) result in particle rearrangement to 
improve packing, and contribute to initial-stage densification. Solution-precipitation controls densification during 
intermediate-stage sintering. Material dissolves from higher energy, convex particle surfaces and migrates to lower 
energy, pore surfaces where it precipitates. This process is sometimes referred to as grain accommodation, because 
individual grains will actually change shape to fill void space. Densification by solution-precipitation continues 
until a rigid, three-dimensional skeletal structure is formed. The transition to final-stage liquid-phase sintering 
occurs when closed pores are formed at a compact relative density of ~90%. Final-stage liquid-phase sintering, as 
with solid-state sintering, is characterized by the shrinkage of isolated pores and by grain growth. 

In liquid-phase sintering, densification and microstructure development can be assessed on the basis of the liquid 
contact or wetting angle, $\phi$, formed as a result of the interfacial energy balance at the solid-liquid-vapour 
intersection as defined by the Young equation: 

$$\phi = \cos^{-1}\left(\frac{\gamma_{sv} - \gamma_{sl}}{\gamma_{lv}}\right) \qquad \text{(C2.11.9)}$$

where $\gamma_{sl}$ and $\gamma_{lv}$ are the solid-liquid and liquid-vapour interfacial energies, respectively (figure C2.11.8). A low 
contact angle favours liquid wetting of particle surfaces and densification during liquid-phase sintering. 
Theoretically, $\phi$ must be less than 60° to achieve 100% of the theoretical density. 
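
The wetting criterion can be evaluated directly from the Young equation, γ_sv = γ_sl + γ_lv cos(φ). The sketch below solves for the contact angle and applies the 60° threshold quoted above; the interfacial energies in the example are hypothetical, and the function names are illustrative.

```python
import math


def contact_angle_deg(gamma_sv: float, gamma_sl: float, gamma_lv: float) -> float:
    """Equilibrium contact angle (degrees) from the Young equation."""
    cos_phi = (gamma_sv - gamma_sl) / gamma_lv
    if not -1.0 <= cos_phi <= 1.0:
        # Outside this range no equilibrium angle exists (e.g. complete spreading).
        raise ValueError("no equilibrium contact angle for these interfacial energies")
    return math.degrees(math.acos(cos_phi))


def favours_full_densification(gamma_sv: float, gamma_sl: float, gamma_lv: float,
                               limit_deg: float = 60.0) -> bool:
    """Apply the contact-angle criterion (angle below limit_deg) quoted in the text."""
    return contact_angle_deg(gamma_sv, gamma_sl, gamma_lv) < limit_deg


# Hypothetical interfacial energies in J/m^2.
print(contact_angle_deg(1.0, 0.5, 0.6), favours_full_densification(1.0, 0.5, 0.6))
```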



Figure C2.11.8. An illustration of the equilibrium contact (i.e. wetting) angle, $\phi$, formed by the balance of 
interfacial energies for a liquid (sessile) drop on a flat solid surface. 

(e) Pressure sintering. Pressure sintering employs the simultaneous use of both pressure and temperature during 
sintering to effect densification. Externally applied pressure on a powder compact increases the compressive stress 
at particle contacts, increasing the chemical potential gradient and the driving force for material transport relative to 
conventional sintering [63, 79, 80]. During conventional solid-state sintering, the driving force (DF) for final-stage 
pore closure at any given time, t, is determined by the surface energy, $\gamma_{sv}$, of the sintering material and the radius, $r_t$, 
of the pore: 

$$DF(\gamma)_t = \frac{2\gamma_{sv}}{r_t} \qquad \text{(C2.11.10)}$$

During pressure sintering, interparticle compressive stress, approximated by the externally applied stress $\sigma_a$ and 
normalized by the relative density of the compact $\rho_t$, supplements the surface tension driving force for pore 
shrinkage: 

$$DF_t = DF(\gamma)_t + \frac{\sigma_a}{\rho_t} \qquad \text{(C2.11.11)}$$
As such, under an externally applied pressure a powder compact can densify faster and/or at a lower temperature 
during sintering. Pressure sintering is generally used to densify materials that are difficult or impossible to densify 
conventionally, and to produce dense fine-grain-size ceramics without the use of sintering aids. Additionally, the 
increased driving force for material transport and densification makes it possible to eliminate larger pores during 
pressure sintering, which can contribute to improved performance and reliability. 
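
A short calculation shows how strongly applied pressure can supplement the surface-tension driving force. The sketch below adds the surface-tension term, 2γ_sv/r, to the applied-stress term, σ_a/ρ, as described above. The surface energy, pore radius, applied stress, and relative density used in the example are hypothetical values chosen for illustration.

```python
def surface_tension_driving_force(gamma_sv: float, pore_radius_m: float) -> float:
    """Driving force (Pa) for pore closure from surface tension alone: 2*gamma_sv/r."""
    if pore_radius_m <= 0:
        raise ValueError("pore radius must be positive")
    return 2.0 * gamma_sv / pore_radius_m


def pressure_sintering_driving_force(gamma_sv: float, pore_radius_m: float,
                                     applied_stress_pa: float,
                                     relative_density: float) -> float:
    """Total driving force (Pa): surface-tension term plus applied stress / relative density."""
    if not 0.0 < relative_density <= 1.0:
        raise ValueError("relative density must lie in (0, 1]")
    return (surface_tension_driving_force(gamma_sv, pore_radius_m)
            + applied_stress_pa / relative_density)


# Hypothetical case: 1 J/m^2 surface energy, 0.5 um pore, 30 MPa load, 90% dense.
conventional = surface_tension_driving_force(1.0, 0.5e-6)
hot_pressed = pressure_sintering_driving_force(1.0, 0.5e-6, 30e6, 0.9)
print(conventional / 1e6, hot_pressed / 1e6)  # both in MPa
```

Because the surface-tension term falls as 1/r, the applied-stress term dominates for larger pores, which is consistent with pressure sintering being able to eliminate pores that conventional sintering cannot.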

(f) Sintering atmosphere. The sintering atmosphere plays an important role in determining how a material densifies, 
the ultimate density achieved, and the end-use properties of the finished ceramic [63]. In particular, in complex 
electronic ceramics like lead zirconate titanate, where the phases present during and after sintering are strongly 
dependent on oxygen stoichiometry, the sintering atmosphere can determine if the system densifies by solid-state 
or liquid-phase sintering, and what the resultant electrical properties are [61]. In addition, if gas from the sintering 
atmosphere becomes trapped in closed pores during final-stage sintering and cannot readily diffuse through the 
system, it will impede and ultimately limit densification [81]. The pressure, $P_t$, of the gas trapped within a closed 
pore will counteract the surface tension driving force to shrink the pore during sintering: 

$$DF_t = DF(\gamma)_t - P_t \qquad \text{(C2.11.12)}$$

Trapped gas in closed pores often limits densification when sintering with a liquid or viscous (glass) phase because 
rapid material transport through the liquid often results in pore closure early in the sintering process. 
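
The limiting effect of trapped gas can be sketched by balancing the surface-tension term, 2γ_sv/r, against the gas pressure in a shrinking pore. The example below assumes an insoluble ideal gas at constant temperature in a spherical pore, so that the pressure rises as (r_close/r)^3 after the pore closes at radius r_close and pressure p_close. These assumptions, the numerical values, and the function names are illustrative additions, not part of the original treatment.

```python
import math


def trapped_gas_pressure(p_close_pa: float, r_close_m: float, radius_m: float) -> float:
    """Gas pressure (Pa) in a spherical pore after isothermal shrinkage (ideal gas)."""
    return p_close_pa * (r_close_m / radius_m) ** 3


def net_driving_force(gamma_sv: float, radius_m: float, gas_pressure_pa: float) -> float:
    """Net driving force (Pa) for pore shrinkage: 2*gamma_sv/r minus the gas pressure."""
    return 2.0 * gamma_sv / radius_m - gas_pressure_pa


def limiting_pore_radius(gamma_sv: float, p_close_pa: float, r_close_m: float) -> float:
    """Radius (m) at which the net driving force vanishes.

    From 2*gamma/r = p_close*(r_close/r)**3, so r = sqrt(p_close*r_close**3/(2*gamma)).
    Only meaningful when the pore starts out shrinking (p_close < 2*gamma/r_close).
    """
    return math.sqrt(p_close_pa * r_close_m ** 3 / (2.0 * gamma_sv))


# Hypothetical case: 1 um pore closing at 1 atm, surface energy 1 J/m^2.
r_limit = limiting_pore_radius(1.0, 101325.0, 1e-6)
print(r_limit)
print(net_driving_force(1.0, r_limit, trapped_gas_pressure(101325.0, 1e-6, r_limit)))
```

Sintering in vacuum or in a gas that diffuses readily through the solid reduces the trapped pressure and therefore the limiting pore size.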


C2.11.5 SUMMARY 

The manufacture of ceramics starts with the constituent raw materials and carries through to thermal consolidation. 
Intermediate processing steps include raw material beneficiation, shape forming, and pre-sinter thermal processing. 
Surfaces are created, modified, and eliminated during ceramic powder processing. Optimizing ceramic 
manufacturing requires understanding and controlling the physical chemistry of surfaces and interfaces during the 
various unit process steps. The control and utilization of surface energy and surface curvature are critical. Surface 
tension creates pressure gradients that contribute to the agglomeration and rearrangement of particles in powders, 
to the migration of liquids during mixing, shape forming, and drying, and to pore shrinkage during sintering. 
Chemical potential gradients associated with surface curvature determine the solubility of any particles in liquids, 
control the rate of evaporation from solid surfaces, and drive material transport during sintering. In combination 
with a strong engineering component, robust ceramic processing requires understanding and controlling the 
physical chemistry of surfaces. 


ACKNOWLEDGMENTS 

The author thanks Dr James Voigt and Dr Donald Ellerby of Sandia National Laboratories for their technical 
review of this article, and Dr Ellerby for providing the SEM micrographs herein. 




REFERENCES 

[1] Adamson A W 1976 Physical Chemistry of Surfaces 3rd edn (New York: Wiley) 

[2] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) pp 18-27 

[3] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) 

[4] Richerson D W 1992 Modern Ceramic Engineering: Properties, Processing, and Use in Design 2nd edn (New York: 
Dekker) 

[5] Ewsuk K G 1993 Ceramics (processing) Kirk-Othmer Encyclopedia of Chemical Technology 4th edn, vol 5 (New 
York: Wiley) pp 603-33 

[6] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) pp 35-53 

[7] Richerson D W 1992 Modern Ceramic Engineering: Properties, Processing, and Use in Design 2nd edn (New York: 
Dekker) pp 373-81 

[8] Norton F H 1974 Elements of Ceramics 2nd edn (Reading, MA: Addison-Wesley) pp 24-71 

[9] Kingery W D 1960 Introduction to Ceramics (New York: Wiley) pp 15-31 

[10] Jones J T and Berard M F 1972 Ceramics: Industrial Processing and Testing (Ames, IA: The Iowa State University 
Press) pp 14-38 

[11] Ewsuk K G 1993 Ceramics (processing) Kirk-Othmer Encyclopedia of Chemical Technology 4th edn, vol 5 (New 
York: Wiley) pp 603-7 

[12] Brownell W E 1976 Structural Clay Products, Applied Mineralogy vol 9 (New York: Springer) pp 43-60 

[13] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) pp 54-66 

[14] Johnson D W Jr 1981 Nonconventional powder preparation techniques Am. Ceram. Soc. Bull. 60 221-4, 243 

[15] Johnson D W Jr 1987 Innovations in ceramic powder preparation Ceramic Powder Science, Advances in Ceramics 
vol 21, ed G L Messing et al (Westerville, OH: The American Ceramic Society) pp 3-19 

[16] Rhodes W H and Natansohn S 1989 Powders for advanced structural ceramics Am. Ceram. Soc. Bull. 68 1804-12 

[17] Anderson H, Kodas T T and Smith D M 1989 Vapor phase processing of powders; plasma synthesis and aerosol 
decomposition Am. Ceram. Soc. Bull. 68 996-1000 

[18] Ganguli D and Chatterjee M 1997 Ceramic Powder Preparation Handbook (Norwell, MA: Kluwer) 

[19] Voigt J A 1993 Powder and precursor preparation by solution techniques Characterization of Ceramics ed R E 
Loehman (Greenwich, CT: Butterworth-Heinemann) pp 1-27 

[20] McColm I J and Clark N J 1988 Forming, Shaping and Working of High Performance Ceramics (New York: Chapman 
and Hall) pp 60-140 

[21] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) pp 313-33 




[22] Richerson D W 1992 Modern Ceramic Engineering: Properties, Processing, and Use in Design 2nd edn (New 
York: Dekker) pp 381-96 

[23] Norton F H 1974 Elements of Ceramics 2nd edn (Reading, MA: Addison-Wesley) pp 55-71 

[24] Jones J T and Berard M F 1972 Ceramics: Industrial Processing and Testing (Ames, IA: The Iowa State University 
Press) pp 20-38 

[25] Hogg R 1981 Grinding and mixing of nonmetallic powders Am. Ceram. Soc. Bull. 60 206-11, 220

[26] Greskovich C 1976 Milling Ceramic Fabrication Processes, Treatise on Materials Science and Technology vol 9, 
ed F F Y Wang (New York: Academic) pp 15-33 


[27] Somasundaran P 1978 Theories of grinding Ceramic Processing Before Firing ed G Y Onoda Jr and L Hench 
(New York: Wiley) pp 105-23

[28] Ewsuk K G 1993 Ceramics (processing) Kirk-Othmer Encyclopedia of Chemical Technology 4th edn, vol 5 (New 
York: Wiley) pp 607-10 

[29] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) pp 378-90

[30] Richerson D W 1992 Modern Ceramic Engineering: Properties, Processing, and Use in Design 2nd edn (New 
York: Dekker) pp 411-13

[31] Ewsuk K G 1993 Ceramics (processing) Kirk-Othmer Encyclopedia of Chemical Technology 4th edn, vol 5 (New 
York: Wiley) p 610 

[32] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) pp 400-4

[33] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) pp 135-208

[34] Richerson D W 1992 Modern Ceramic Engineering: Properties, Processing, and Use in Design 2nd edn (New 
York: Dekker) pp 421-8 

[35] Morse T 1979 Handbook of Organic Additives for Use in Ceramic Body Formulation (Butte, MT: Montana Energy
and MHD Research and Development Institute) 

[36] Ewsuk K G 1993 Ceramics (processing) Kirk-Othmer Encyclopedia of Chemical Technology 4th edn, vol 5 (New 
York: Wiley) pp 610-12 

[37] Shanefield D J 1996 Organic Additives and Ceramic Processing: with Applications in Powder Metallurgy, Ink, and 
Paint 2nd edn (Boston, MA: Kluwer) 

[38] Allen T 1981 Particle Size Measurement 3rd edn (New York: Chapman and Hall) pp 246-66 

[39] Nelson R D 1988 Dispersing powders in liquids Handbook of Powder Technology vol 7, ed J C Williams and T 
Allen (New York: Elsevier) 

[40] Brindley G W 1960 Ion exchange in clay minerals Ceramic Fabrication Processes ed W D Kingery (New York: 
Wiley) pp 11-23 

[41] Michaels A S 1960 Rheological properties of aqueous clay systems Ceramic Fabrication Processes ed W D 
Kingery (New York: Wiley) pp 23-31 




[42] Onoda G Y Jr 1978 The rheology of organic binder solutions Ceramic Processing Before Firing ed G Y Onoda Jr
and L Hench (New York: Wiley) pp 235-51 

[43] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) pp 492-533

[44] Richerson D W 1992 Modern Ceramic Engineering: Properties, Processing, and Use in Design 2nd edn (New 
York: Dekker) pp 444-78 

[45] Cowan R E 1976 Slip casting Ceramic Fabrication Processes, Treatise on Materials Science and Technology vol 
9, ed F F Y Wang (New York: Academic) pp 153-71 

[46] Magid H S 1960 Controls required and problems encountered in production slip casting Ceramic Fabrication 
Processes ed W D Kingery (New York: Wiley) pp 40-5 

[47] St Pierre P D S 1960 Slip casting nonclay ceramics Ceramic Fabrication Processes ed W D Kingery (New York: 
Wiley) ch 5, pp 45-51 

[48] Ewsuk K G 1993 Ceramics (processing) Kirk-Othmer Encyclopedia of Chemical Technology 4th edn, vol 5 (New 
York: Wiley) pp 615-17 


[49] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) pp 545-58

[50] Norton F H 1974 Elements of Ceramics 2nd edn (Reading, MA: Addison-Wesley) pp 114-25

[51] Jones J T and Berard M F 1972 Ceramics: Industrial Processing and Testing (Ames, IA: The Iowa State University 
Press) pp 69-89 

[52] Brownell W E 1976 Structural Clay Products, Applied Mineralogy vol 9 (New York: Springer) pp 101-25 

[53] Brinker C J and Scherer G W 1990 Sol-gel science The Physics and Chemistry of Sol-Gel Processing (New York: 
Academic) pp 453-513 

[54] Ewsuk K G 1993 Ceramics (processing) Kirk-Othmer Encyclopedia of Chemical Technology 4th edn, vol 5 (New 
York: Wiley) pp 619-20 

[55] Reed J S and Runk R B 1976 Dry pressing Ceramic Fabrication Processes, Treatise on Materials Science and 
Technology vol 9, ed F F Y Wang (New York: Academic) pp 71-93 

[56] Thurnauer H 1960 Controls required and problems encountered in production dry pressing Ceramic Fabrication 
Processes ed W D Kingery (New York: Wiley) pp 62-70 

[57] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) pp 418-45

[58] Richerson D W 1992 Modern Ceramic Engineering: Properties, Processing, and Use in Design (New York: 
Dekker) pp 429-43 

[59] Ewsuk K G 1993 Ceramics (processing) Kirk-Othmer Encyclopedia of Chemical Technology 4th edn, vol 5 (New 
York: Wiley) pp 613-14 

[60] Norton F H 1974 Firing ceramic ware Elements of Ceramics 2nd edn (Reading, MA: Addison-Wesley) pp 126-53 




[61] Ewsuk K G 1993 Consolidation of bulk ceramics Characterization of Ceramics ed R E Loehman (Greenwich, CT: 
Butterworth-Heinemann) pp 77-101 

[62] Reed J S 1995 Introduction to the Principles of Ceramic Processing 2nd edn (New York: Wiley) pp 583-619

[63] Ewsuk K G 1993 Ceramics (processing) Kirk-Othmer Encyclopedia of Chemical Technology 4th edn, vol 5 (New 
York: Wiley) pp 620-7 

[64] Richerson D W 1992 Modern Ceramic Engineering: Properties, Processing, and Use in Design (New York: 
Dekker) pp 519-64 

[65] Kingery W D 1960 Introduction to Ceramics (New York: Wiley) pp 15-31 

[66] Kingery W D, Bowen H K and Uhlmann D R 1967 Introduction to Ceramics (New York: Wiley) pp 448-515 

[67] Herbert J M 1985 Ceramic dielectrics and capacitors Electrocomponent Science Monographs vol 6 (New York: 
Gordon and Breach) pp 63-94 

[68] Jones J T and Berard M F 1972 Ceramics: Industrial Processing and Testing (Ames, IA: The Iowa State University 
Press) pp 69-89 

[69] Brownell W E 1976 Structural clay products Applied Mineralogy vol 9 (New York: Springer) pp 126-64 

[70] McColm I J and Clark N J 1988 Forming, Shaping and Working of High Performance Ceramics (New York: 
Chapman and Hall) pp 208-31 

[71] Coble R L and Burke J E 1963 Sintering in ceramics Progress in Ceramic Science vol 3, ed J E Burke (New York: 
MacMillan) pp 197-251


[72] Thummler F and Thomma W 1967 The sintering process J. Inst. Metals 12 69-108 

[73] Burke J E and Rosolowski J H 1976 Sintering Reactivity of Solids (Treatise on Solid State Chemistry vol 4) ed N B 
Hannay (New York: Plenum) pp 621-59 

[74] Herring C 1949 Surface tension as a motivation for sintering The Physics of Powder Metallurgy ed W E Kingston 
(New York: McGraw-Hill) pp 143-79 

[75] Coble R L 1961 Sintering crystalline solids. I, intermediate and final state diffusion models J. Appl. Phys. 32
787-92

[76] German R M 1985 Liquid Phase Sintering (New York: Plenum) 

[77] Brinker C J and Scherer G W 1990 Sol-gel science The Physics and Chemistry of Sol-Gel Processing (New York: 
Academic) pp 675-742 

[78] Kingery W D and Francois B 1965 The sintering of crystalline oxides, I. Interactions between grain boundaries
and pores Sintering and Related Phenomena ed G C Kuczynski, N A Hooton and C F Gibbon (New York: Gordon
and Breach) pp 471-98

[79] Ewsuk K G 1986 Final stage densification of alumina during hot isostatic pressing PhD Thesis The Pennsylvania 
State University 

[80] Ewsuk K G and Messing G L 1986 A theoretical and experimental analysis of final-stage densification of alumina 
during hot isostatic pressing Hot Isostatic Pressing: Theories and Applications ed R J Schaefer and M Linzer 
(Materials Park, OH: ASM International) pp 23-33 




[81] Ewsuk K G 1992 Effects of trapped gases on ceramic-filled-glass composite densification Solid State Phenomena 
vol 25-26, ed A C D Chaklader and J A Lund (Brookfield, VT: Trans-Tech) pp 63-72 (Proc. Sintering '91)



C2.12 Zeolites

Andreas Kogelbauer and Roel Prins 


C2.12.1 INTRODUCTION AND HISTORY 

Compared to other crystalline inorganic oxides, zeolites represent a special class of materials. Their crystalline, 
microporous nature with well-defined pore dimensions in combination with high thermal stability, ion exchange 
and sorption capacity, as well as the ability to generate acidity has made them unique materials for practical 
applications. In recent decades, zeolites have gained tremendous importance both from an academic and an 
economic point of view. On the one hand, it is the versatility of zeolites that makes them such outstanding 
materials. They are well suited for a broad range of applications such as use as drying agents, use in gas separation 
processes, use as detergent additives and as catalysts (see section C2.12.7 ). On the other hand, the variety of 
structure types, the broad range of chemical modifications of the zeolite matrix, and the derived physico-chemical 
properties all carry a distinct fascination for the scientist which is much reflected in the ever growing number of
zeolite-related publications and people involved in zeolite research [1].

The term zeolite was coined by the Swedish mineralogist A F Cronstedt who in 1756 observed that certain
minerals exhibited intumescence upon heating. Since they seemed to boil he referred to them as 'boiling
stones' (Greek: zein = to boil, lithos = stone) [2]. Mineralogical studies that followed in the nineteenth century 
comprised mainly the determination of morphological, physical, and chemical properties of zeolitic minerals. The 
first systematic studies regarding the reversible hydration behaviour of zeolites and their chemical composition date 
back to the mid-1800s [3] and were followed by investigations regarding their adsorption properties around the
beginning of this century. To explain the selective adsorption of small molecules in chabazite, McBain coined the 
term molecular sieve [4]. With the availability of x-ray diffraction around the 1920s, structure determination of 
zeolites became possible. Analcime was the first zeolite structure to be determined in 1930 [5]. At the same time, 
the hydrated aluminosilicate framework with loosely bonded alkali and alkaline earth cations was established as the
common criterion to distinguish zeolites chemically from other materials. About 1938, Barrer started the systematic 
investigation of the properties of natural zeolites: in particular, he applied physico-chemical principles and thereby 
put the study of zeolites on a firm scientific base. His investigations regarding zeolite synthesis led to the first 
reproducible and substantiated synthesis of zeolites in a laboratory environment [6]. The industrial use of zeolites 
followed shortly afterwards. Researchers at Union Carbide Corporation discovered the commercially important 
zeolite types A, X, and Y which were commercialized in 1954. Most prominent among their applications was the use of
synthetic zeolite X as a cracking catalyst by Mobil Oil in 1962. The use of template molecules for zeolite synthesis
during the 1960s led to a variety of new synthetic structures with interesting properties. Ongoing synthesis efforts 
have led to more than 100 different synthetic structures known today. Zeolite nomenclature is based on a
three-letter code assigned to the different structure types by the International Zeolite Association (IZA) [7] and compiled
in the Atlas of Zeolite Structure Types [8]. It is, however, still common practice to use traditional designations as 
for instance in the case of X and Y zeolites which belong to the FAU structure type. Many of these traditional 
designations originate from the laboratory in which the materials were synthesized (e.g. ZSM for Zeolite Socony 
Mobil). For those materials whose framework topology has been confirmed, cross-references exist in the Atlas of 
Zeolite Structure Types enabling the assignment of traditional designations to the IZA structure type.

C2.12.2 COMPOSITION AND STRUCTURE OF ZEOLITES

The traditional definition of a zeolite refers to microporous, crystalline, hydrated aluminosilicates with a
three-dimensional framework consisting of corner-linked SiO4 or AlO4 tetrahedra, although today the definition is used
in a much broader sense, comprising microporous crystalline solids containing a variety of elements as tetrahedral
building units. The aluminosilicate-based zeolites are represented by the empirical formula

\[ \mathrm{M}_{1/n}\,\mathrm{AlO_2} \cdot x\,\mathrm{SiO_2} \cdot y\,\mathrm{H_2O} \tag{C2.12.1} \]

in which M represents a cation of valence n, and x ≥ 1 since no Al-O-Al bonds are permitted in a zeolite according
to Loewenstein's rule [9]. The latter states that the ratio of silicon-to-aluminium must be equal to or greater than
one due to local charge restrictions. The SiO4 tetrahedra are charge balanced but each AlO4 tetrahedron carries a
formal charge of -1 due to the +3 charge of the aluminium atom (see figure C2.12.1). Cations M are therefore
required for balancing the lattice charge. These cations are rather mobile if the zeolite is in a hydrated state and,
therefore, they can be easily exchanged. Typically sodium, potassium or organic tetraalkylammonium is present as
a monovalent charge-compensating cation in synthetic zeolites. Besides cations from the alkaline and alkaline earth 
series, transition metal cations are also frequently found in zeolites from natural sources. One of the outstanding 
properties of zeolites derives from the exchange of the charge-balancing cations by protons which can be attained 
by treatment in dilute mineral acids or by the exchange of ammonium cations that are subsequently thermally 
decomposed to yield ammonia and surface bonded protons (see figure C2.12.2 ). These protons are acidic which 
makes zeolites solid Brønsted acids with an acid strength comparable to that of 70% sulphuric acid [10]. The
concentration of these acid sites increases with the aluminium concentration in the zeolite lattice and is in the range
of 10 -10 mol per gram zeolite.

Figure C2.12.1. Origin of ion exchange capacity in zeolites. Since every oxygen atom contributes one negative
charge to the tetrahedron incorporated in the framework, the silicon tetrahedron carries no net charge while the 
aluminium tetrahedron carries a net charge of -1 which is compensated by cations M.

Figure C2.12.2. Formation of Brønsted acid sites in zeolites. Aqueous exchange of cation M with an ammonium
salt yields the ammonium form of the zeolite. Upon thermal decomposition ammonia is released and the proton
remains as charge-balancing species. Direct ion-exchange of M with acidic solutions is feasible for high-silica
zeolites.
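
The link between aluminium content and acid site concentration described above can be illustrated with a short calculation. The following is a minimal sketch, assuming an idealized, fully dehydrated proton form H[AlO2](SiO2)x in which every framework aluminium carries exactly one proton; real materials contain fewer accessible sites. The function name and the rounded atomic masses are choices made for this illustration only.

```python
# Standard atomic masses in g/mol, rounded to three decimals.
M_H, M_O, M_AL, M_SI = 1.008, 15.999, 26.982, 28.086


def acid_sites_per_gram(si_al_ratio: float) -> float:
    """Upper bound on Bronsted sites (mol/g) for dry H[AlO2](SiO2)_x, x = Si/Al."""
    if si_al_ratio < 1.0:
        # Loewenstein's rule: no Al-O-Al linkages, hence Si/Al >= 1.
        raise ValueError("Si/Al must be >= 1")
    # One proton balances each AlO2(-) unit; x SiO2 units accompany it.
    formula_mass = (M_H + M_AL + 2 * M_O) + si_al_ratio * (M_SI + 2 * M_O)
    return 1.0 / formula_mass


if __name__ == "__main__":
    for ratio in (1, 2.5, 15, 100):
        print(f"Si/Al = {ratio:>5}: {acid_sites_per_gram(ratio):.2e} mol/g")
```

Because the formula mass grows linearly with x, the site density falls roughly in inverse proportion to the Si/Al ratio, which is the trend stated in the text.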

The connection of the primary building unit, the tetrahedron, through oxygen bridges leads to the secondary 
building units (SBU), some of which are illustrated in figure C2.12.3 [11]. The way of depicting SBUs and whole 
zeolite structures, as in figure C2.12.3 , is common practice in zeolite science; only the central atoms of the 
tetrahedra (T-atoms) are drawn and lines represent oxygen bridges between tetrahedra. The pore openings that 
result from the arrangement of the primary building units are only referred to by the number of T-atoms; that is, a 
four-membered ring actually consists of four T-atoms and four bridging oxygen atoms in alternating arrangement. 
The prevalence of certain SBUs is used to classify zeolites but other ways of classifying framework topologies 
have also been developed [12].

Figure C2.12.3. Secondary building units in zeolites. Each corner represents a T-atom (Si, Al) while the
connecting lines represent oxygen bridges with the oxygen atom in the middle. 

Many zeolite structures can be envisaged as being constructed from polyhedra that are obtained by appropriate 
arrangement of the SBUs. Some of the more common polyhedra are depicted in figure C2.12.4 . The formation of 
different zeolite structures from the same polyhedron, the sodalite cage, is demonstrated in figure C2.12.5 . By 
connection of two sodalite cages through one shared 4-ring, sodalite is obtained (IZA structure code SOD) whose
largest pores are formed by 6-rings. The effective dimension of these pores is about 2.5 Å which is too small for
any molecule of interest to penetrate into the zeolite micropores. Sodalite is therefore unimportant for technical
applications. By connecting two sodalite cages with a double 4-ring prism zeolite A is obtained (IZA structure code
LTA). The larger void that is formed by the specific arrangement of eight sodalite cages in zeolite A is called the
α-cage. It is accessible through 8-membered rings with a pore opening of 4.1 Å. The α-cage is therefore accessible to
small molecules such as water which makes zeolite A an excellent drying agent. The pore diameter of zeolite A can
further be varied between 3 and 5 Å by substituting different cations such as K, Na or Ca. These materials are
commercially available as molecular sieve 3A, 4A and 5A. An even larger internal void space is obtained when
sodalite cages are connected through double 6-ring prisms such as in the cubic faujasite (IZA structure code FAU).
Zeolites X or Y are the synthetic equivalents and vary only in their Si/Al ratio. The pore opening of these zeolites is
formed by 12-membered rings with diameters of 7.4 Å which are big enough even for larger organic molecules
such as substituted aromatics. Reversing the stacking order results in hexagonal faujasite (IZA structure code 
EMT). This zeolite has distorted 12-membered rings as pore openings.

Figure C2.12.4. Typical polyhedra found in zeolites: (a) sodalite cage found in sodalite, zeolite A or faujasite; (b)
cancrinite or ε-cage found in cancrinite, erionite, offretite or gmelinite; (c) the 5-ring polyhedron found in ZSM-5
and ZSM-11; (d) the large cavity of the faujasite structure; and (e) the α-cage forming the large cavity in zeolite A.

Figure C2.12.5. Different framework topologies based on the sodalite cage obtained through different connection
patterns.
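
The molecular sieving argument made for sodalite, zeolite A and faujasite can be sketched as a simple size comparison. The pore apertures below are the values quoted in the text; the kinetic diameters are approximate literature values assumed here purely for illustration, and the rigid-sphere test ignores framework flexibility, temperature and cation position, all of which matter in practice.

```python
# Pore apertures in angstrom, as quoted in the text.
PORE_APERTURE = {
    "sodalite (SOD)": 2.5,
    "zeolite A (LTA)": 4.1,
    "faujasite (FAU)": 7.4,
}

# Kinetic diameters in angstrom: approximate values, assumed for illustration.
KINETIC_DIAMETER = {
    "water": 2.65,
    "nitrogen": 3.64,
    "benzene": 5.85,
}


def admits(zeolite: str, molecule: str) -> bool:
    """Rigid-sphere estimate: the molecule enters if it is no wider than the pore."""
    return KINETIC_DIAMETER[molecule] <= PORE_APERTURE[zeolite]


if __name__ == "__main__":
    for zeolite in PORE_APERTURE:
        guests = [m for m in KINETIC_DIAMETER if admits(zeolite, m)]
        print(f"{zeolite}: admits {guests if guests else 'none of the listed molecules'}")
```

Under these assumptions sodalite excludes even water, zeolite A admits water but not benzene, and faujasite admits all three, in line with the qualitative picture given above.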

Another technically relevant class of zeolites is the pentasil group [13]. Their dominating structural building units 
are 5-membered rings and they contain less aluminium than the sodalite cage-based zeolites. The polyhedron 
shown in figure C2.12.4(c) can be arranged to form chains, as shown in figure C2.12.6, which form the basis for the
ZSM-5 (IZA structure code MFI) and ZSM-11 (IZA structure code MEL) topologies. The pores of these zeolites,
which are classified as medium pore zeolites, are formed by 10-membered rings. Small pore zeolites are in analogy 
those having 8-membered ring pores and large pore zeolites those with 12-membered rings and above. The 
micropores in ZSM-5 form a two-dimensional channel system in which straight channels are intersected by 
sinusoidal channels as depicted in figure C2.12.7. In ZSM-11, perpendicular straight channels intersect each other.
Transport of molecules, however, is possible in all three main crystallographic directions by moving from one channel
system to the other. There are also zeolite structures with only mono- or two-dimensional channel systems such as L,
ZSM-12, ferrierite and mordenite. Some zeolite structures exhibit small pore openings but larger internal voids
(supercages) such as erionite or faujasite. A compilation of the structural and chemical characteristics of technically
important zeolite types is given in table C2.12.1.
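
The ring-size classification given in this paragraph can be written as a small lookup. This is only a sketch of the convention as stated here (8-membered rings small, 10-membered medium, 12-membered and above large); the function name and the "unclassified" fallback for other ring sizes are assumptions of this illustration.

```python
def pore_class(ring_size: int) -> str:
    """Classify a zeolite by the number of T-atoms in its largest pore-forming ring."""
    if ring_size >= 12:
        return "large"
    if ring_size == 10:
        return "medium"
    if ring_size == 8:
        return "small"
    # Ring sizes not covered by the convention in the text.
    return "unclassified"


# Largest ring per structure type, as stated in the text.
LARGEST_RING = {"LTA": 8, "MFI": 10, "MEL": 10, "FAU": 12}

if __name__ == "__main__":
    for code, ring in LARGEST_RING.items():
        print(f"{code}: {ring}-membered ring, {pore_class(ring)} pore")
```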

Table C2.12.1. Characteristics of technically important zeolites.

Zeolite        IZA structure  Typical unit cell composition      SiO2/Al2O3 range  Dimensionality of  Pore apertures (nm)
               code                                              by synthesis      channel system

A              LTA            Na12[(AlO2)12(SiO2)12]·27H2O       2.0-6.8                              0.41
L              LTL            K9[(AlO2)9(SiO2)27]·22H2O          6.0-10.0                             0.71
X              FAU            Na86[(AlO2)86(SiO2)106]·264H2O     2.0-3.0                              0.74
Y              FAU            Na56[(AlO2)56(SiO2)136]·250H2O     3.0-9.0                              0.74
Mordenite (a)  MOR            Na8[(AlO2)8(SiO2)40]·24H2O         9.0-32                               0.65x0.70; 0.26x0.57
ZSM-5          MFI            (Na,TPA)3[(AlO2)3(SiO2)93]·16H2O   30-∞                                 0.53x0.56; 0.51x0.55
Beta           BEA            (Na,TEA)5[(AlO2)5(SiO2)59]         20-∞                                 0.76x0.64; 0.55x0.55

(a) Free apertures in second channel system are too small for organic molecules to diffuse readily, making the channel
system of mordenite essentially monodimensional.

Figure C2.12.6. Framework topology of ZSM-5. The 5-ring polyhedron is connected into chains which form the
ZSM-5 structure with the 10-membered openings of the linear channels.
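
The unit cell compositions in table C2.12.1 can be cross-checked against the quoted SiO2/Al2O3 ranges and against the charge-balance requirement. The sketch below assumes monovalent cations and uses the cation, AlO2 and SiO2 counts listed in the table; the helper names are illustrative only.

```python
# (cation count, AlO2 units, SiO2 units) per unit cell, from table C2.12.1.
UNIT_CELLS = {
    "A": (12, 12, 12),
    "L": (9, 9, 27),
    "X": (86, 86, 106),
    "Y": (56, 56, 136),
    "Mordenite": (8, 8, 40),
}


def sio2_al2o3_ratio(n_al: int, n_si: int) -> float:
    """Molar SiO2/Al2O3 ratio; each Al2O3 holds two Al, so it is twice Si/Al."""
    return 2.0 * n_si / n_al


def is_charge_balanced(n_cations: int, n_al: int, valence: int = 1) -> bool:
    """Each AlO2 unit carries one negative charge that the cations must offset."""
    return n_cations * valence == n_al


if __name__ == "__main__":
    for name, (n_cat, n_al, n_si) in UNIT_CELLS.items():
        ratio = sio2_al2o3_ratio(n_al, n_si)
        print(f"{name}: SiO2/Al2O3 = {ratio:.2f}, balanced = {is_charge_balanced(n_cat, n_al)}")
```

For zeolite Y, for example, 136 SiO2 and 56 AlO2 units give a SiO2/Al2O3 ratio of about 4.9, inside the 3.0-9.0 synthesis range listed for that material.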




Figure C2.12.7. Channel system of MFI (top) and MEL (bottom). The linear channels are interconnected by zigzag
channels in ZSM-5 while exclusively straight running channels are present in ZSM-11. Larger internal openings
are present at the channel intersections. The arrows indicate the pathways for molecular transport through the
channel system.

Progress in zeolite synthesis throughout the last decade has led to ultra-large pore zeolites, that is, zeolitic materials 
with pore sizes larger than those of 12-membered rings. These are very attractive and sought after materials 
because they can admit very bulky organic molecules typical of fine chemicals into the interior of the zeolite matrix 
where adsorption and catalysis can be carried out [14]. Very few structures have been synthesized so far which 
were initially based on aluminophosphates (AlPO4-8, 14-membered ring, 7.9 × 8.7 Å pore dimension; VPI-5,
18-membered ring, 12.1 Å pore diameter, see also below), or gallophosphates (cloverite, 20-membered ring with clover
shape). Only recently true silicon-based zeolites with 14-membered rings have been synthesized (UTD-1 [15],
CIT-5 [16]) that exhibit high thermal stability, especially when compared with the aluminium or gallium phosphates,
and have comparable Brønsted acidity to other zeolites. Their synthesis, however, is based on the use of expensive
or commercially unavailable template molecules.

The search for zeolitic materials with pore sizes larger than 10 Å recently resulted in a new class of materials
termed mesoporous molecular sieves (known by the designation M41S), which are prepared using surfactant
micelles as templating agents [17]. The pore diameter of these materials is tunable in a wide range of about 30 to
100 Å but the matrix consists of amorphous rather than crystalline silica walls. Incorporation of hetero-atoms (Al,
Ti, B, Ni, Cr, Fe, Co, Mn) has been described, as was the synthesis of nonsiliceous materials such as oxides of W, 
Fe, Pb, Mo, and Sb [18]. Although these materials do not represent true zeolites, they are highly interesting 
materials which are commonly covered in the zeolite literature with great potential for shape-selective catalysis of 
bulky molecules. 

Besides structural variety, chemical diversity has also increased. Pure silicon forms of zeolite ZSM-5 and ZSM-11,
designated silicalite-1 [19] and silicalite-2 [20], have been synthesised. A number of other pure silicon analogues of
zeolites, called porosils, are known [21]. Various chemical elements other than silicon or aluminium have been 
incorporated into zeolite lattice structures [22, 23]. Most important among those from an applications point of view 
are the incorporation of titanium, cobalt, and iron for oxidation catalysts, boron for acid strength variation, and 
gallium for dehydrogenation/aromatization reactions. In some cases it remains questionable, however, whether 
incorporation into the zeolite lattice structure has really occurred. 

Compositional variety can also be achieved by ion exchange [24]. The cations are then located at the ion-exchange
positions rather than being incorporated in the zeolite lattice as oxygen tetrahedra. Ion exchange methods are used
for the preparation of a number of commercially important zeolitic materials. Most important is the exchange for
ammonium as discussed above because it represents the least damaging route for the preparation of the proton form
of zeolites used in acid catalysis. Cs-exchange leads to zeolites that act as solid base catalysts. Using reducible
metal salts for the ion exchange (e.g. Pt(NH3)4²⁺) with subsequent reduction in hydrogen, metal-loaded zeolites with
a high dispersion can be prepared which find applications in refinery processes as bifunctional catalysts (acidic and
reducing functionality). Alternatively, the metal can be introduced using uncharged carbonyl complexes such as
Ni(CO)4.

In addition to the aluminosilicate-based zeolites, a number of other crystalline microporous three-dimensional
oxides have been synthesized [25]. Most prominent among these are the aluminophosphates (AlPO4 series) [26,
27] whose framework is composed of strictly alternating (AlO4)⁻ and (PO4)⁺ tetrahedra. Since the pure AlPO4
framework does not require charge-balancing cations, further compositional modifications are required to make use 
of these materials as catalysts. This is achieved by the partial or complete isomorphic substitution of framework 
phosphorus or aluminium by other elements during the synthesis. A variety of aluminophosphate derivatives have 
been synthesized in this way such as the SAPO (silicoaluminophosphate), MeAPO (metal aluminophosphate, 
Me=Mg, V, Cr, Mn, Fe, Co, Ni), ZnPO (zinc phosphate), BePO (beryllium phosphate), GaPO4 (gallium phosphate),
and MeAPSO (metal silicoaluminophosphate) families. Although most elements substitute either for Al or P,
silicon can substitute for both P and Al. In SAPO-5 materials (IZA structure code AFI) silicon-rich
domains are present alongside SiAlP domains, the former being generated by substitution of silicon for Al-P
pairs, the latter, which essentially carry all the Brønsted acidity, arising from substitution of
phosphorus by silicon.


C2.12.3 SYNTHESIS OF ZEOLITES 

Zeolites are the product of a hydrothermal conversion process [28]. As such they can be found in sedimentary 
deposits especially in areas that show signs of former volcanic activity. There are about 40 naturally occurring 
zeolite types. Types such as chabazite, clinoptilolite, mordenite and phillipsite occur with up to 80% phase purity in 
quite large sedimentary deposits all over the world which makes mining economical. Due to the lack of purity and
consistency
in composition, the application of natural zeolites is rather limited and mainly adsorbent and ion exchange 
applications have been realized. Natural zeolites are, for instance, used as soil amendment, as cement additives or 
for the purification of municipal and nuclear wastewater [29]. 

For more demanding applications such as catalysis or the meaningful characterization of zeolitic phases by 
physico-chemical methods, only synthetic zeolites provide the required phase purity and compositional 
consistency. For their synthesis, hydrothermal conditions are commonly applied similar to those occurring during 
the formation of zeolitic phases in nature [23, 30]. The crystallization occurs from a gel formed from an aqueous
silicate and aluminate solution at temperatures between 60 and 100 °C and atmospheric pressure, with higher 
temperatures and elevated pressures being occasionally required (e.g. for the synthesis of mordenite). Syntheses 
from clear solutions have been described but they are mainly of academic interest due to the low zeolite yields. 
More recently, new strategies toward zeolite synthesis have been developed which aim mainly at the formation of 
zeolite films and zeolitic membranes [31]. They are based on solid-state transformation, vapour-phase synthesis, 
secondary growth and casting of nanoparticles [32]. Synthesis routes that lead to binder-free materials are
interesting from an applications point of view [33]. 

Since zeolites are metastable crystallization products they are subject to Ostwald's rule which states that metastable 
phases are initially formed and gradually transform into the thermodynamically most stable product. The least 
stable zeolitic phase (that with the lowest framework density) is therefore formed first and is consumed with further
synthesis time in favour of a more stable phase due to a continuous crystallization/redissolution equilibrium.
The synthesis time is therefore of great importance. The primary condition for the crystallization of a zeolitic
structure is a certain degree of supersaturation in the synthesis mixture leading to nucleation. If the degree of 
supersaturation is too high, rapid polycondensation occurs that does not permit the formation of highly organized 
crystals. The gel that is formed during this initial process, however, is sufficiently soluble to provide the right 
degree of supersaturation and zeolites can nucleate if other requirements such as the correct Si/Al ratio of the 
synthesis mixture or the presence of a specific template are fulfilled. 

On a laboratory scale, hydrothermal synthesis is usually carried out in Teflon-coated, stainless-steel autoclaves 
under autogenous pressure. A typical synthesis mixture consists of up to four major constituents, a T-atom source 
(silicon and aluminium, other elements may also be incorporated as indicated above), a solvent (almost exclusively
water), a mineralizer (OH⁻, F⁻), and a template. The T elements are usually provided as amorphous hydroxides,
oxides or aluminosilicates, but alkoxysilicates may be used when high reactivity is required. Various solids such as 
precipitated gels, fumed silicas or clays may also be used as T-atom sources and the choice depends mainly on the 
desired reactivity. The mineralizer can be present in the T-atom source itself such as in aqueous silicate solutions. 
The primary function of the mineralizer is the dissolution of the T-atom source and the formation of the gel but it 
also assists in the quick equilibration of monomeric and polycondensed silicate species in the solution. Optimum
concentration ranges exist for the mineralizing agent, depending on the desired zeolite structure (e.g.
aluminium-rich zeolites crystallize preferentially at higher pH). Distinct differences in the resulting zeolites also
result from the different pH ranges in which OH⁻ (strongly alkaline) and F⁻ (alkaline to slightly acidic) operate, the lower
concentration of crystal defects in fluoride synthesis being one. The template generally assists in the formation of
the solid by forming additional bonds with the zeolite (ionic, dipole, hydrogen bond or van der Waals interaction) 
and additionally exerts a structure-directing effect. The most commonly encountered templates are either the
hydrated alkali or alkaline earth cations present in solution (e.g. Na⁺ in the synthesis of zeolite X), or quaternary
organic ammonium cations such as tetrapropyl ammonium (TPA) for ZSM-5 synthesis. Substituting TPA with
tetrabutyl or tetraethyl ammonium reveals the structure-directing effect, since ZSM-11 and zeolite beta are obtained
instead of ZSM-5. Ongoing research has meanwhile
identified a vast array of suitable templates, and synthesis routes free of organic templates have been developed for
zeolites that were formerly only attainable through template synthesis such as ZSM-5. As well as the factors 
discussed above, other variables such as the concentration and ratio of the constituents of the synthesis mixture, the 
preparation of the synthesis mixture, ageing, seeding, agitation during the synthesis, and crystallization time and 
temperature have a distinct influence upon the synthesis and need to be considered. The IZA maintains a collection 
of verified recipes [34] that are available through their web page [7].


C2.12.4 POST-SYNTHETIC MODIFICATION OF ZEOLITES 

Only in very rare cases can zeolites be used directly in the form in which they were originally synthesized. For 
many larger-scale industrial applications, for instance, the synthetically obtained zeolite powders must be formed 
into larger attrition and crush-resistant particles using inorganic binder materials [33]. In most cases a thermal 
treatment in air (calcination) is at least required to decompose the organic template, to dehydrate the zeolite and to 
desorb impurities [35]. This holds particularly true if the proton form of a zeolite is desired from the ammonium 
form for acid catalysis. 

The simplest and most commonly applied modification method is ion exchange [24]. By far the vast majority of 
studies regarding ion exchange of zeolites were carried out in aqueous solutions. Ion exchange processes are 
described by equilibrium ion-exchange isotherms; these relate the equivalent fraction of the ion in the zeolite to that 
in solution. Zeolites exhibit different selectivities for ion exchange depending upon factors such as the silicon-to- 
aluminium ratio of the zeolite, the size, charge and polarizability of the cation, the solvation medium and the size 
and stability of the solvation sphere. For example, the selectivity series for exchange of monovalent cations into
NaY is Ag ≫ Tl > Cs > Rb > K > Na > Li, lithium being the smallest cation but with the largest hydration sphere.
The presence of ion exchange positions in small cages such as the sodalite cage, which is accessible through a
6-ring window with an effective pore diameter of about 2.5 Å, may hinder or exclude the exchange of certain cations
that are too bulky to penetrate it. As a consequence, the maximum theoretical exchange capacity, which is
determined by the number of lattice aluminium atoms, is not attainable and only 63-65% ion exchange is observed 
(e.g. for Rb and Cs exchange of NaY). Another undesired effect that is frequently observed is the simultaneous 
exchange of protons, particularly in the case of transition metals which tend to hydrolyse in aqueous solution.
Additionally, when di- and trivalent cations are exchanged they tend to hydrolyse and exchange as lower-valency
hydroxylated ions; subsequent migration from the ion exchange positions and clustering upon calcination is 
frequently observed. More recently, solid-state ion exchange using fused salts has been successfully applied, a 
procedure in which the zeolite and the salt are ground and subsequently heated to elevated temperatures either 
under aerobic or anaerobic conditions [36]. For exchange processes that are difficult in aqueous solution, vapour 
phase exchange using salts that are volatile at elevated temperatures such as FeCl₃ and GaCl₃ has yielded a high
degree of ion exchange [37]. 
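
The equilibrium isotherm described above is built from equivalent fractions. As a minimal sketch (the function names and all numerical values are illustrative assumptions, not data from the cited studies), the equivalent fraction of an ion and the resulting separation factor for a binary exchange could be computed as follows:

```python
def equivalent_fraction(conc_a, charge_a, conc_b, charge_b):
    """Equivalent fraction of ion A in a binary mixture of ions A and B.

    Concentrations may be in any common unit (e.g. mol/L for the solution,
    mol/kg for the zeolite); only their charge-weighted ratio matters.
    """
    eq_a = charge_a * conc_a
    eq_b = charge_b * conc_b
    return eq_a / (eq_a + eq_b)


def separation_factor(frac_zeolite, frac_solution):
    """Separation factor of A over B; a value above 1 means the zeolite prefers A."""
    return (frac_zeolite * (1.0 - frac_solution)) / (
        (1.0 - frac_zeolite) * frac_solution
    )


# Hypothetical K+/Na+ exchange: equal concentrations remain in solution,
# while the zeolite holds three K+ for every Na+ after equilibration.
s_k = equivalent_fraction(0.05, 1, 0.05, 1)  # equivalent fraction of K+ in solution
z_k = equivalent_fraction(3.0, 1, 1.0, 1)    # equivalent fraction of K+ in the zeolite
alpha = separation_factor(z_k, s_k)
print(f"S_K = {s_k:.2f}, Z_K = {z_k:.2f}, separation factor = {alpha:.2f}")
```

One point of an isotherm is the pair (S, Z); repeating the calculation over a range of solution compositions traces the full curve.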

Dealumination, the removal of aluminium from the zeolite framework, is generally applied for stabilizing zeolites 
and for the formation of mesopores which help in overcoming diffusional problems in the zeolite micropores. For 
instance ultrastable Y zeolite (USY), a major component of fluid-catalytic cracking (FCC) catalysts, is obtained by 
a twofold ammonium exchange with intermediate steam calcination leading to a dealuminated material that retains 
its structure upon heating up to 900 °C [38]. Due to the susceptibility of the Al-O bond to hydrolysis, one of the
issues associated with the use of zeolites is their stability in acidic and basic media or under hydrothermal 
conditions as commonly experienced in speciality chemical manufacture or hydrocarbon conversion. During such 
treatment, Si-O-Al bonds are broken and the aluminium is removed from its tetrahedral lattice position, leaving
silanol nests as lattice defects
(see figure C2.12.8). The extensive and uncontrolled removal of aluminium from the zeolite framework leads to the
destabilization of the remaining framework and ultimately to complete structural collapse. A controlled 
dealumination treatment is the most successfully applied postsynthetic modification for stabilizing zeolites. The 
most commonly used dealumination methods are acid leaching, steaming, chemical treatment with silicon 
fluorides, and direct replacement of framework aluminium by means of gaseous SiCl₄ [24]. Combinations of these
techniques have proven useful for the subsequent extraction of extra-framework species, for example steaming
followed by acid leaching or extraction with complexing agents such as EDTA or oxalic acid. Inadvertent and thus
undesired dealumination may occur during regular calcination of zeolites and during ion exchange in acidic 
medium, particularly in the case of zeolites with low lattice Si/Al ratio or a high concentration of lattice defects. 

Figure C2.12.8. Schematics of the dealumination of zeolites. Water adsorbed on a Brønsted site hydrolyses the
Al-O bond and forms the first silanol group. The remaining Al-O bonds are successively hydrolysed leaving a silanol
nest and extra-framework aluminium. Aluminium is cationic at low pH.

Another important modification method is the passivation of the external crystallite surface, which may improve
performance in shape selective catalysis (see C2.12.7). Treatment of zeolites with alkoxysilanes, SiCl₄ or silane,
and subsequent hydrolysis or poisoning with bulky bases, organophosphorus compounds and arylsilanes have been 
used for this purpose [39]. In some cases, the improved performance was, however, not related to the masking of 
unselective active sites on the outer surface but rather to a narrowing of the pore diameters due to silica deposits. 


C2.12.5 PHYSICAL AND CHEMICAL PROPERTIES OF ZEOLITES 


Zeolites form small crystallites with an average size of about 1-2 μm. Specially modified synthesis methods have
yielded crystals as small as 10 nm [40] and ranging up to millimetre size [41]. Crystal agglomeration and
intergrowth are commonly observed. The density range of zeolites is from 1.9 to 2.3 g cm⁻³. Due to the high
porosity, specific surface areas are in the range of several hundred m² g⁻¹ with a micropore volume of the dehydrated
zeolites in the range from 0.1 to 0.3 cm³ g⁻¹. Due to the mobility of the hydrated cations, zeolites exhibit electrical
conductivity. The sodium form of zeolites leads to a pH value between 9 and 12 in aqueous solution while the 
proton form reacts in an acidic manner. The lattice Si/Al ratio is the governing factor determining the overall 
physico-chemical properties. A schematic representation of the effect of the aluminium concentration on ion- 
exchange capacity, acid strength of the protons, resistance to acidic media, thermal stability and hydrophilicity is 
given in figure C2.12.9. Since every aluminium atom in the framework requires a charge balancing cation, the
ion-exchange capacity is proportional to the aluminium concentration. This also holds true if the cations are protons.
The acid strength of each of these protons, however, increases with decreasing aluminium concentration and levels 
off at a constant value at a Si/Al ratio characteristic for each zeolite type. This effect is related to the mutual 
influence of aluminium atoms in close proximity to one another, referred to as next nearest neighbour (NNN) 
aluminium that are separated only by one SiO₄ tetrahedron [42]. For instance, in zeolite A with a Si/Al ratio of 1
only NNN aluminium is present. With decreasing aluminium concentration, aluminium atoms become more and 
more isolated in the silica matrix and only Si can be found in the NNN coordination shell. The overall acidity, the 
product of acid site concentration and acid strength, therefore goes through a maximum dependent on the Si/Al 
ratio. Since the Al-O bond is susceptible to hydrolysis in acidic medium, the acid and thermal stability are also
higher when less aluminium is present in the zeolite lattice. The higher the aluminium concentration, the higher is 
the overall lattice charge leading to hydrophilic materials. The adsorption capacity of zeolite A decreases with 
decreasing polarity and polarizability of adsorbates. On the other hand, zeolites with a low aluminium 
concentration are increasingly more hydrophobic and selectively adsorb hydrocarbons over water. 
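
The proportionality between ion-exchange capacity and aluminium content can be made concrete with a short calculation. The sketch below assumes an idealized, anhydrous sodium zeolite of composition Na_x Al_x Si_(1-x) O_2 per T-atom (one monovalent cation per framework aluminium, no defects, no extra-framework species); the function name is hypothetical and real materials will deviate from this estimate.

```python
# Standard atomic masses in g/mol.
M_NA, M_AL, M_SI, M_O = 22.990, 26.982, 28.085, 15.999


def exchange_capacity_mmol_per_g(si_al_ratio):
    """Theoretical cation exchange capacity of an idealized anhydrous Na-zeolite.

    Assumes the composition Na_x Al_x Si_(1-x) O_2 per T-atom, i.e. exactly
    one monovalent cation per framework aluminium.
    """
    if si_al_ratio < 1.0:
        # Zeolite A with Si/Al = 1 is the aluminium-rich limit considered here.
        raise ValueError("Si/Al ratios below 1 are not covered by this sketch")
    x = 1.0 / (1.0 + si_al_ratio)  # aluminium fraction of the T-atoms
    molar_mass = x * (M_NA + M_AL) + (1.0 - x) * M_SI + 2.0 * M_O
    return 1000.0 * x / molar_mass


for ratio in (1.0, 2.5, 40.0):
    capacity = exchange_capacity_mmol_per_g(ratio)
    print(f"Si/Al = {ratio:5.1f}: {capacity:.2f} mmol/g")
```

Under these assumptions the capacity falls steadily as the Si/Al ratio rises, which is the trend shown schematically for the ion-exchange capacity.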

Figure C2.12.9. Effect of the Si/Al ratio of zeolites on their properties.

Factors other than the Si/Al ratio are also important. The alkali-form of zeolites, for instance, is per se not 
susceptible to hydrolysis of the Al-O bond by steam or acid attack. The concurrent ion exchange for protons,
however, creates Brønsted acid sites whose AlO₄ tetrahedron can be hydrolysed (e.g. leading to complete
dissolution of NaA zeolite in acidic aqueous solutions). 


C2.12.6 CHARACTERIZATION OF ZEOLITES 

Characterization of zeolites is primarily carried out to assess the quality of materials obtained from synthesis and 
postsynthetic modifications. Secondly, it facilitates the understanding of the relation between physical and 
chemical properties of zeolites and their behaviour in certain applications. For this task, especially, in situ 
characterization methods have become increasingly more important, that is, techniques which probe the zeolite 
under actual process conditions. 

The first analytical tool to assess the quality of a zeolite is powder x-ray diffraction. A collection of simulated 
powder XRD patterns of zeolites and some disordered intergrowths together with crystallographic data is available 
from the IZA [43]. Phase purity and x-ray crystallinity, which is arbitrarily defined as the ratio of the intensity of
selected reflections in the diffractogram with respect to a standard material, are determined in this way. Additional
routine characterization techniques comprise elemental analysis for the determination of the chemical composition,
N₂ and Ar adsorption for the determination of textural properties and scanning electron microscopy for the
determination of the crystal size and morphology. 
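
The definition of x-ray crystallinity given above amounts to a simple intensity ratio. A minimal sketch, assuming that the same set of reflections has been integrated for the sample and for the standard under identical measurement conditions (the function name and the intensities are hypothetical):

```python
def xray_crystallinity(sample_intensities, reference_intensities):
    """Relative x-ray crystallinity in percent.

    Both arguments list the intensities of the same selected reflections,
    for the sample and for the standard material respectively.
    """
    if len(sample_intensities) != len(reference_intensities):
        raise ValueError("sample and reference must cover the same reflections")
    reference_sum = sum(reference_intensities)
    if reference_sum <= 0:
        raise ValueError("reference intensities must be positive")
    return 100.0 * sum(sample_intensities) / reference_sum


# Hypothetical integrated intensities (arbitrary units) of three reflections.
crystallinity = xray_crystallinity([80.0, 40.0, 60.0], [100.0, 50.0, 75.0])
print(f"x-ray crystallinity: {crystallinity:.0f}%")
```

Because the choice of reflections and of the standard is arbitrary, values obtained this way are only comparable within one consistently measured series.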

Solid-state nuclear magnetic resonance (NMR) spectroscopy applying magic angle spinning (MAS) and infrared
(IR) spectroscopy have been traditionally used to characterize the zeolite itself. The widespread availability of 
modern Fourier transform spectrometers means that these instruments are becoming increasingly important in the 
study of dynamic processes occurring during adsorption and catalytic reactions [44]. In situ measurements of 
adsorbate interactions with precise control of temperature and partial pressure are considered state-of-the-art
regarding IR spectroscopic measurements. The interesting spectral region is between 3800 and 1200 cm⁻¹, covering
the stretching and deformation vibrations of the zeolite hydroxyl groups and most organic functional groups. Of
particular importance is the differentiation of Lewis and Brønsted acid sites by means of pyridine adsorption. IR
spectroscopy has meanwhile matured so far that in situ monitoring of the zeolite surface during catalytic reactions
is possible. In MAS-NMR it is mainly ²⁷Al and ²⁹Si NMR that provides useful information about the local
environment of aluminium and silicon in the zeolite matrix. For aluminium, differentiation between tetrahedral
aluminium in the zeolite lattice and octahedral extra-framework aluminium is, in principle, possible. The number of
AlO₄ tetrahedra directly linked to an SiO₄ tetrahedron can be determined from ²⁹Si NMR since different chemical
shifts are observed for the corresponding Si nuclei. In the absence of large concentrations of silanol defects, which
can be ascertained by ¹H cross-polarization measurements, the lattice Si/Al ratio can be determined from such data.
Using in situ dehydration techniques, ¹H NMR is also turning into a more commonly applied method providing
quantitative information about the nature of zeolite hydroxyl groups. Recent developments have even led to the 
possibility of studying catalytic reactions on zeolites in continuous flow by means of NMR spectroscopy [45]. 
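
The determination of the lattice Si/Al ratio from ²⁹Si NMR mentioned above is commonly done by weighting the integrated intensity of each Si(nAl) line by the number n of neighbouring AlO₄ tetrahedra. The sketch below uses that widely quoted relation, Si/Al = ΣI(n) / Σ(n/4)·I(n); it assumes that no Al-O-Al linkages are present and that silanol defects do not contribute to the lines. The function name and the intensities are hypothetical.

```python
def si_al_from_29si_nmr(intensities):
    """Lattice Si/Al ratio from integrated 29Si MAS NMR line intensities.

    intensities maps n (number of AlO4 tetrahedra linked to the SiO4
    tetrahedron, 0 to 4) to the integrated intensity of the Si(nAl) line.
    """
    if any(n not in (0, 1, 2, 3, 4) for n in intensities):
        raise ValueError("n must be an integer between 0 and 4")
    total = sum(intensities.values())
    aluminium_weighted = sum(0.25 * n * i for n, i in intensities.items())
    if aluminium_weighted <= 0:
        raise ValueError("no Si(nAl) intensity with n > 0: ratio is undefined")
    return total / aluminium_weighted


# Hypothetical relative intensities of the Si(0Al) ... Si(4Al) lines.
lines = {0: 10.0, 1: 40.0, 2: 35.0, 3: 12.0, 4: 3.0}
print(f"lattice Si/Al = {si_al_from_29si_nmr(lines):.2f}")
```

A spectrum consisting only of the Si(4Al) line gives a ratio of 1, the case of zeolite A discussed in the previous section.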

Alongside these techniques, microbalance measurements of adsorption capacities and kinetics, microcalorimetric 
measurements of adsorption processes and temperature-programmed desorption of base molecules have provided 
useful information about the thermochemistry of adsorption processes and the acidity characteristics of zeolites 
[46]. 


C2.12.7 APPLICATIONS OF ZEOLITES 

The applications of zeolites can be divided into three major categories: ion exchange, adsorption and catalysis. The 
largest amount of zeolites is used in ion exchange applications while the largest value is derived from catalytic 
applications [1, 33].

The most important example for ion exchange applications is the use of zeolites as detergent additives for the 
removal of mainly Ca²⁺ and partly Mg²⁺ from washing waters. As an environmentally acceptable alternative, zeolites
have taken over the traditional role of sodium polyphosphate which was a major contributor to the eutrophication 
of waters. Zeolite A is mainly used for this purpose and commercial syntheses have been optimized for the efficient 
preparation of large quantities from cheap, natural resources giving products with a homogeneous crystal size 
below 5 μm. The annual global production for this application has reached several hundred thousand tons [33].
Important natural zeolite-based ion exchange applications comprise the selective removal of ammonium from
industrial and municipal wastewater and the removal of Cs and Sr from radioactive wastewater [29].

The excellent suitability of zeolites as adsorbents derives from three main characteristics: namely, a high 
intracrystalline void volume, a high electrostatic field and the molecular sieving effect. The high adsorption 
capacity of zeolites for water has already been mentioned, which is exploited when zeolites are used as static 
drying agents for refrigerants, in double glazing, or as additives in the manufacture of solvent-free polyurethanes.
Industrial processes based on temperature-swing or pressure-swing adsorbers are applied for the desiccation of
natural gas and cracking gas and the purification of natural gas prior to liquefaction [33]. It is mainly the various
ion-exchanged forms of zeolite A that
are used for these applications. 

Impurities such as hydrocarbons, carbon dioxide and water that are present in ppm levels can be successfully 
removed using molecular sieves. In particular, the presence of hydrocarbons is hazardous in cryogenic air 
separation plants since they form explosive mixtures with liquefied oxygen. Furthermore, the low degree of 
interaction of hydrogen with zeolites is exploited in the production of ultrahigh purity hydrogen where trace 
amounts of CO, CO₂, N₂, O₂, Ar, CH₄ and water are removed to ppm levels. Zeolites are also an excellent means
of gas separation. The higher degree of interaction of N₂ with the cations in Ca-exchanged zeolite A or
Li-exchanged zeolite X as compared to O₂ can be used to enrich oxygen up to 95 wt%. Another application, the
selective removal of volatile organic compounds (VOC) from humid exhaust vents such as in commercially-used 
frying pans, derives from the selective adsorption of hydrocarbons on hydrophobic high-silica zeolites [47]. 
Molecular sieving effects, that is, selective adsorption due to size exclusion effects, are exploited in the separation 
of n-paraffins from iso-paraffins. Along the same lines ethylbenzene, para-ethyltoluene or para-xylene can be
separated from their isomers. 

Catalysis with zeolites has traditionally been located in the petroleum refining industry where zeolites have 
replaced the traditional amorphous silica-alumina as catalysts [48]. The zeolite catalysts (zeolite X at first and USY 
later) provided higher activity by orders of magnitude and also improved yields of motor gasoline. Their success as 
FCC catalysts led to the application in many refinery processes soon after. Over recent decades zeolites have 
moved increasingly into the manufacture of petrochemicals and base chemicals. The most recent trends are in an 
augmented use of zeolites for the production of fine and speciality chemicals [49, 50]. Additionally zeolites have 
shown remarkable potential in environmental catalysis, namely for exhaust purification (transition metal-exchanged
zeolites) [51] or as replacements for traditionally used liquid catalysts such as sulphuric and
hydrofluoric acid [50]. Alkylation, acylation, nitration or Beckmann rearrangement are some examples that are 
presently being studied. 

The reasons for their excellent suitability as catalysts are multifaceted. Depending on their chemical state, zeolites 
can be used for acid catalysis, base catalysis, bifunctional catalysis (hydrogenation-dehydrogenation reactions 
coupled with acid catalysis), or redox reactions. As solid acids they provide a high acidity, both in terms of acid 
strength and acid site concentration, which is imperative for achieving high reaction rates in acid-catalysed 
reactions; corrosion is less of a concern compared to liquid acids. In this respect, theoretical work based on 
quantum chemical calculations has significantly contributed to the understanding of zeolite acidity in relation to 
carbenium ion chemistry [52, 53]. As inorganic solids, zeolites are easily separable by filtration facilitating product
separation and permit fixed-bed flow-through operation. Further advantages arise from their excellent thermal 
stability and regenerability, properties sought for in heterogeneous catalysts. Some structures, like ZSM-5, show an 
exceptionally high intrinsic resistance to coke formation, which permits long reaction times between regeneration 
cycles. Of the utmost importance, however, is their ability to discriminate molecules based upon their size and 
shape, termed shape selectivity, which makes them attractive materials for intermediates and fine-chemicals
synthesis. Following the original definition, shape selectivity can become apparent in three different ways (see
figure C2.12.10) [54]. It refers to the exclusion of molecules (reactant selectivity), the retardation of
molecular transport within the zeolite pores depending on the molecular dimensions (product selectivity), or the 
confinement of the transition state for a certain reaction (transition state selectivity). A comprehensive discussion 
of the implications of shape selectivity was given recently [55]. 

Figure C2.12.10. Different manifestations of shape-selectivity in zeolite catalysis. Reactant selectivity (top),
product selectivity (middle) and transition state selectivity (bottom). 

A typical example of reactant selectivity is the selectoforming process, the selective cracking of n-paraffins from
reformates and naphthas using small-pore zeolites such as erionite. Only linear paraffins can penetrate into the 
zeolite micropore system and are converted while the desired branched and cyclic hydrocarbons remain unaffected. 
The same effect is being exploited in dewaxing processes and in the above-mentioned n-paraffin/iso-paraffin
separation by adsorption. The alkylation of toluene over ZSM-5 is an example of product selectivity. Due to its
higher diffusivity, para-xylene can leave the pore system much more rapidly than the bulkier ortho and meta
isomers which are continuously reequilibrated in the zeolite pores. Thereby, a higher yield of the para-isomer is
obtained. During xylene isomerization, transition-state selectivity is manifested through the absence of 
trimethylbenzenes in the product which would originate from transalkylation reactions. The confined space in the 
zeolite pore prohibits the formation of the sterically demanding transition state for the bimolecular transalkylation. 

Only a very few selected examples have been discussed. The number of processes based on shape-selective
catalysis by zeolites is ever increasing, particularly in the field of speciality and fine chemicals, and quite a few
have been commercialized. For a more comprehensive picture the reader is advised to consult the further reading
suggestions.


C2.13 Plasma chemistry

Martin Schmidt and Kurt Becker 


C2.13.1 INTRODUCTION 

The plasma state is often referred to as the fourth state of matter [1]. It is characterized by the presence of free 
positive (and sometimes also negative) ions and negatively charged electrons in a neutral background gas. The 

charge carrier concentration can vary from 10 5 m -3 in a dilute interstellar plasma to 10 m" 3 in a dense stellar 
plasma. Most matter in the universe is found in the plasma state. Examples include the sun and other stars,
interstellar matter and the terrestrial ionosphere. Naturally occurring plasmas on Earth are rare and include lightning and
flames. Plasmas generated for technological applications include, among others, welding arcs, plasma torches,
high-pressure lamps and the ignition spark in an internal combustion engine. In the efforts to solve the energy
problem on Earth, magnetically confined plasmas in nuclear fusion reactors are one of several choices to achieve 
the extreme conditions under which nuclear fusion might occur [1]. This chapter focuses primarily on gaseous 
plasmas at pressures ranging from a fraction of an atmosphere to at most atmospheric pressure. Plasmas are 
generally created by supplying a sufficient amount of energy to a volume containing a neutral gas, so that free 
electrons and ions are generated from the atoms and molecules in the gas. The energy may be supplied in the form 
of electrical energy, heat, ultraviolet radiation or particle beams. In technical plasma devices, the input energy is
generally supplied as electrical energy that causes the ignition of a gas discharge. Chemical reactions among the
different neutral and ionic atomic and molecular species occur in this gaseous atmosphere (volume processes) and
also at the surfaces that surround the plasma (surface or wall processes). The study and the technical utilization
of these chemical reactions is referred to as plasma chemistry [2, 3, 4, 5, 6 and 7].

A brief description of a low-density non-equilibrium plasma is given followed by a review of its characteristic 
features and of the relevant collision processes in the plasma. Principles for the generation of plasmas in technical
devices are discussed and examples of important plasma chemical processes and their technical applications are 
presented. 


C2.13.2 THE CHARACTERIZATION OF PLASMAS 

A plasma is a globally quasi-neutral system of free electrons and positive and negative ions in a neutral background 
gas consisting of atoms, molecules and free radicals (some of which may be in electronically and/or rotationally- 
vibrationally excited states) [8, 9]. Global quasi-neutrality means that the plasma contains overall an equal number 
of positive and negative charges. The electrons in the plasma have a mean kinetic energy that can range from less 
than 0.01 eV in an interstellar plasma to more than 10 keV in fusion plasmas. In most laboratory plasmas, the mean 
kinetic energy of the electrons is higher than the thermal energy corresponding to room temperature (0.025 eV) 
(figure C2.13.1). In some cases, the positive ions and the neutrals in the plasma also have temperatures
significantly above room temperature. An additional criterion for the existence of a plasma (as opposed to a mere 
mixture of electrons, ions and neutrals) is that the charge carriers and their mutual electromagnetic interaction 
determine the properties of the system. 

Figure C2.13.1. Electron energies and electron densities for different plasmas.


Any charge imbalance in a plasma (i.e. any local deviation from charge neutrality) results in a motion of the
electrons that, in turn, leads to oscillations of the electrons with the electron plasma frequency $\omega_{pe}$ (Langmuir
frequency)

\[ \omega_{pe} = \left( \frac{e_0^2 n_e}{\varepsilon_0 m_e} \right)^{1/2} \]

where $e_0$ is the elementary charge, $n_e$ is the electron density, $\varepsilon_0$ is the permittivity of free space, and $m_e$ denotes the
electron mass. Deviations from the global quasi-neutrality of the plasma are possible only locally in a small volume
referred to as the Debye sphere whose radius is characterized by $r_D$, the Debye length:

\[ r_D = \left( \frac{\varepsilon_0 k T_e}{e_0^2 n_e} \right)^{1/2} \]

where $k$ is the Boltzmann constant and $T_e$ refers to the temperature of the electrons. A plasma, as opposed to a
mixture of electrons and ions, exists if the linear dimensions of the plasma (diameter, length) are large compared to 
the Debye length. The number of charge carriers in the Debye sphere amounts to 10⁴ for an electron temperature of
10 000 K and an electron density of 10¹⁰ cm⁻³. Electromagnetic forces in such systems are important if the plasma
frequency is higher than the collision frequency of the charge carriers with the neutral particles in the plasma. 
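
The plasma frequency, the Debye length and the number of charge carriers in the Debye sphere can be evaluated numerically. The following sketch uses the standard SI expressions for these quantities and CODATA constants; the function names are hypothetical, and the electron density of 10¹⁰ cm⁻³ (10¹⁶ m⁻³) is the value taken here as an assumption together with an electron temperature of 10 000 K.

```python
import math

E_CHARGE = 1.602176634e-19      # elementary charge, C
EPSILON_0 = 8.8541878128e-12    # permittivity of free space, F/m
M_ELECTRON = 9.1093837015e-31   # electron mass, kg
K_B = 1.380649e-23              # Boltzmann constant, J/K


def plasma_frequency(n_e):
    """Electron plasma (Langmuir) angular frequency in rad/s; n_e in m^-3."""
    return math.sqrt(E_CHARGE**2 * n_e / (EPSILON_0 * M_ELECTRON))


def debye_length(n_e, t_e):
    """Debye length in m; n_e in m^-3, t_e in K."""
    return math.sqrt(EPSILON_0 * K_B * t_e / (E_CHARGE**2 * n_e))


def carriers_in_debye_sphere(n_e, t_e):
    """Number of electrons inside a sphere of radius one Debye length."""
    return 4.0 / 3.0 * math.pi * n_e * debye_length(n_e, t_e) ** 3


n_e = 1.0e16   # assumed electron density: 1e10 cm^-3 expressed in m^-3
t_e = 1.0e4    # electron temperature in K

omega_pe = plasma_frequency(n_e)
r_d = debye_length(n_e, t_e)
n_d = carriers_in_debye_sphere(n_e, t_e)
print(f"plasma frequency: {omega_pe:.3e} rad/s")
print(f"Debye length:     {r_d:.3e} m")
print(f"carriers in Debye sphere: {n_d:.3e}")
```

For these inputs the Debye length is about 0.07 mm and the Debye sphere contains on the order of 10⁴ electrons, so any laboratory plasma of centimetre dimensions easily satisfies the size criterion.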

A non-thermal, non-equilibrium plasma is characterized by an electron temperature $T_e$ much larger than the ion
temperature $T_i$ and the neutral gas temperature $T_n$ ($T_e \gg T_i, T_n$). Typical non-thermal, non-equilibrium plasmas
used in technological applications have electron temperatures of 10⁴ to 10⁵ K (corresponding to mean electron
energies of about 0.5-5 eV). In plasmas that are in or near thermal equilibrium ('thermal' plasmas), the electron,
ion and neutral temperatures are roughly equal. The degree of ionization $\alpha$ in a plasma is given by


\[ \alpha = \frac{n_e}{n_e + n_n} \]

to denote the fraction of charge carriers (e.g. positive ions) in the plasma. In the above relation $n_n$ is the neutral gas
density. Weakly ionized ('thin') plasmas have a degree of ionization in the range of 10 , whereas $\alpha$ approaches
unity in fully ionized plasmas.


The generation of non-thermal plasmas by externally supplied electrical energy is possible because of the efficient 
interaction of the light electrons in the plasma with the external electric field. This results in a plasma with a high 
mean electron energy compared to the low energy of the near-thermal ions and neutrals. Energy transfer from the 
light electrons to the heavy particles in elastic collisions is negligible due to the difference in their masses. A 
selective energy transfer from the electrons to the heavy particles occurs via the various inelastic electron collision 
processes. In a molecular plasma, electron collisions will also lead to the formation of new species via dissociative 
processes. On the other hand, the energy gained by the ions in the external electric field is transferred efficiently to 
the gas molecules via elastic collisions. But this energy is generally small, so that the plasma will consist of a 'hot' 
electron gas and a 'cold' ionic and neutral gas. 
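
The role of the mass ratio in elastic collisions can be illustrated with the classical hard-sphere result for a projectile of mass m striking a stationary target of mass M: averaged over isotropic scattering in the centre-of-mass frame, the projectile loses a fraction 2mM/(m + M)² of its kinetic energy. The sketch below applies this textbook estimate to argon, which is chosen only as an example gas; the function name is hypothetical.

```python
M_ELECTRON = 9.1093837015e-31   # electron mass, kg
ATOMIC_MASS_UNIT = 1.66053907e-27  # kg
M_ARGON = 39.948 * ATOMIC_MASS_UNIT  # mass of an argon atom, kg


def mean_elastic_energy_transfer(m_projectile, m_target):
    """Mean fraction of kinetic energy passed to a stationary target.

    Classical hard-sphere estimate, averaged over isotropic scattering
    in the centre-of-mass frame.
    """
    total = m_projectile + m_target
    return 2.0 * m_projectile * m_target / total**2


electron_on_atom = mean_elastic_energy_transfer(M_ELECTRON, M_ARGON)
ion_on_atom = mean_elastic_energy_transfer(M_ARGON, M_ARGON)
print(f"electron -> Ar atom: {electron_on_atom:.2e} of the energy per collision")
print(f"Ar ion   -> Ar atom: {ion_on_atom:.2f} of the energy per collision")
```

An electron therefore needs tens of thousands of elastic collisions to give up its energy, whereas an ion shares about half of its energy in a single collision with a gas atom of equal mass.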

The velocity distribution of the electrons in a plasma is generally a complicated function whose exact shape is 
determined by many factors. It is often assumed for reasons of convenience in calculations that such velocity 
distributions are Maxwellian and that the electrons are in thermodynamical equilibrium. The Maxwell distribution 
is given by 

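\[ f(v)\,dv = 4\pi \left( \frac{m_e}{2\pi k T_e} \right)^{3/2} v^2 \exp\left( -\frac{m_e v^2}{2 k T_e} \right) dv \]

and the energy distribution with $k T_e = e_0 U_e$ and $\tfrac{1}{2} m_e v^2 = e_0 U$ is given by

\[ h(U)\,dU = \frac{2}{\sqrt{\pi}}\, U_e^{-3/2}\, U^{1/2} \exp\left( -\frac{U}{U_e} \right) dU = U^{1/2} f(U)\,dU. \]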


In plasmas, Maxwellian distributions are generally found only in cases where the energy exchange between the 
electrons is an important process. In most plasmas generated by external electrical fields, the observed velocity 
distributions deviate significantly from a Maxwellian distribution. The energy distribution function of the electrons 
is determined by their energy gain in the electric field and by their losses in elastic and inelastic collisions. The 
distribution function can be calculated using the Boltzmann equation [10, 11]. The shape of realistic electron
energy distributions is often characterized by a lack of electrons with higher energies (figure C2.13.2) [12].

Figure C2.13.2. Electron energy distributions $f(U)$ for a mean electron energy of 4.2 eV, Maxwell distribution
(M), Druyvesteyn distribution (D) and a calculated distribution (Ar) for an Ar plasma [12].
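
The lack of high-energy electrons in non-Maxwellian distributions can be quantified by comparing the Maxwell and Druyvesteyn forms at the same mean energy. The sketch below assumes the usual normalized expressions, proportional to E^(1/2) exp(-E/kT) and E^(1/2) exp(-bE²) respectively, with the constants fixed by normalization and by the mean energy of 4.2 eV; the 15 eV threshold is an arbitrary illustrative choice and the function names are hypothetical.

```python
import math


def maxwell_eedf(energy, mean_energy):
    """Normalized Maxwell energy distribution (per eV); energies in eV."""
    kt = 2.0 * mean_energy / 3.0
    return 2.0 / math.sqrt(math.pi) * kt**-1.5 * math.sqrt(energy) * math.exp(-energy / kt)


def druyvesteyn_eedf(energy, mean_energy):
    """Normalized Druyvesteyn energy distribution (per eV); energies in eV."""
    b = (math.gamma(1.25) / (math.gamma(0.75) * mean_energy)) ** 2
    norm = 2.0 * b**0.75 / math.gamma(0.75)
    return norm * math.sqrt(energy) * math.exp(-b * energy**2)


def integrate(func, lower, upper, steps=20000):
    """Composite trapezoidal rule."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * h)
    return total * h


MEAN_ENERGY = 4.2   # eV, as in figure C2.13.2
THRESHOLD = 15.0    # eV, illustrative threshold for "high-energy" electrons

maxwell_tail = integrate(lambda e: maxwell_eedf(e, MEAN_ENERGY), THRESHOLD, 100.0)
druyvesteyn_tail = integrate(lambda e: druyvesteyn_eedf(e, MEAN_ENERGY), THRESHOLD, 100.0)
print(f"fraction above {THRESHOLD} eV, Maxwell:     {maxwell_tail:.2e}")
print(f"fraction above {THRESHOLD} eV, Druyvesteyn: {druyvesteyn_tail:.2e}")
```

Although both distributions have the same mean energy, the Druyvesteyn form contains far fewer electrons above the threshold, which matters for processes such as ionization that depend on the high-energy tail.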

The transport of particles in the plasma is diffusive or convective for the neutrals, whereas the charge carriers move 
under the influence of the external and internal electric and magnetic fields. The drift velocity $v$ of the charged
particles is proportional to the electric field $E$:

\[ v = \mu E \]

where $\mu$ denotes the mobility. The mobility is related to the diffusion coefficient $D$ by the Einstein relation

\[ \frac{D}{\mu} = \frac{k T}{e_0}. \]

The movement of the fast electrons leads to the formation of a space-charge field that impedes the motion of the 
electrons and increases the velocity of the ions (ambipolar diffusion). The ambipolar diffusion of positive ions and 
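negative electrons is described by the ambipolar diffusion coefficient $D_a$:

\[ D_a = \frac{\mu_i D_e + \mu_e D_i}{\mu_i + \mu_e} \]

where the subscripts e and i refer to electrons and ions, respectively.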
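
The relation between mobility, diffusion coefficient and ambipolar diffusion can be sketched numerically. The code below assumes the standard textbook forms, D = μkT/e for singly charged carriers (Einstein relation) and D_a = (μ_i D_e + μ_e D_i)/(μ_i + μ_e); the mobilities and temperatures are hypothetical order-of-magnitude values chosen only for illustration.

```python
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C


def einstein_diffusion(mobility, temperature):
    """Diffusion coefficient (m^2/s) of singly charged carriers.

    mobility in m^2/(V s), temperature in K.
    """
    return mobility * K_B * temperature / E_CHARGE


def ambipolar_diffusion(mu_e, mu_i, t_e, t_i):
    """Ambipolar diffusion coefficient (m^2/s) for electrons and positive ions."""
    d_e = einstein_diffusion(mu_e, t_e)
    d_i = einstein_diffusion(mu_i, t_i)
    return (mu_i * d_e + mu_e * d_i) / (mu_i + mu_e)


# Hypothetical values for a low-pressure non-thermal plasma.
MU_E, MU_I = 30.0, 0.15     # mobilities in m^2/(V s)
T_E, T_I = 3.0e4, 300.0     # electron and ion temperatures in K

d_e = einstein_diffusion(MU_E, T_E)
d_i = einstein_diffusion(MU_I, T_I)
d_a = ambipolar_diffusion(MU_E, MU_I, T_E, T_I)
d_a_approx = d_i * (1.0 + T_E / T_I)   # limit of mu_e much larger than mu_i
print(f"D_e = {d_e:.3g}, D_i = {d_i:.3g}, D_a = {d_a:.3g} m^2/s")
print(f"approximation D_i (1 + T_e/T_i) = {d_a_approx:.3g} m^2/s")
```

The ambipolar coefficient lies between the ion and electron values: the space-charge field slows the electrons and drags the ions along, so both species leave the plasma at the same rate.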




Non-thermal plasmas in contact with insulating walls (substrate) have an important property. The plasma with the 
hot electrons is positively charged relative to the wall (self-bias). A sheath with a positive space charge and an 
electric field is formed between the wall and the plasma. The hot electrons travel faster to the wall than the heavy 
ions, but the two currents must be equal. This is achieved by the negative potential (10-20 V) of the wall which 
reflects the slow electrons and accelerates the ions towards the wall. The sheath potential V_s is determined for a 
planar surface by 

where m_i denotes the ion mass. 
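As a numerical illustration, a form of the floating-wall sheath potential that is commonly quoted in textbooks, assumed here and not necessarily the exact expression intended above, is

\[ V_s = \frac{k_B T_e}{2e} \ln\left(\frac{m_i}{2\pi m_e}\right). \]

The sketch below evaluates this assumed form for an argon plasma; the electron temperature of 3 eV is a hypothetical input and the function name is illustrative.

```python
import math

M_E = 9.1093837015e-31   # electron mass, kg
AMU = 1.66053906660e-27  # atomic mass unit, kg


def floating_sheath_potential(kT_e_eV, ion_mass_amu):
    """Assumed textbook form V_s = (k_B T_e / 2e) * ln(m_i / (2*pi*m_e)).

    With k_B*T_e given in eV the result is directly in volts.
    """
    m_i = ion_mass_amu * AMU
    return 0.5 * kT_e_eV * math.log(m_i / (2.0 * math.pi * M_E))


# Argon ions (39.948 u) and an assumed electron temperature of 3 eV.
v_s = floating_sheath_potential(3.0, 39.948)
print(f"Sheath potential for Ar at kT_e = 3 eV: {v_s:.1f} V")
```

Under these assumptions the result lies in the 10-20 V range quoted above; because of the logarithm, the potential depends only weakly on the ion mass and scales linearly with the electron temperature.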


C2.13.3 COLLISION PROCESSES IN PLASMAS 

Collision processes involving the different plasma components play an important role in non-thermal plasmas 
(table C2.13.1) [13, 14, 15]. Electron collision processes are of particular importance because of the high 
temperature or high mean energy of the plasma electrons. Stepwise excitation and ionization, that is the 
excitation/ionization of an atom or molecule which is already in an excited or, in particular, in a metastable state, 
can occur with appreciable probability even though the concentration of excited/metastable species in a 
non-thermal plasma is generally low. The energy spacing between excited states is typically much smaller than the 
energy gap between the ground state and the first excited state. The number of low-energy electrons is typically 
much higher than the number of electrons with energies above about 10 eV and the excitation/ionization cross 
section out of an excited state is much larger than the cross section for excitation/ionization of ground-state species. 
This may result in rate coefficients (see below) for stepwise excitation/ionization that are quite large. Metastable 
species cannot decay via radiative dipole transitions to lower states. This results in a comparatively long lifetime of 
microseconds or even milliseconds for these species (compared to nanoseconds for excited states which can decay 
radiatively via dipole transitions). As a consequence, metastables can accumulate in the plasma and can be an 
efficient source of species for stepwise excitation/ionization processes or for super-elastic collisions in which the 
scattered electron gains energy. Ionization due to binary collisions involving metastable atoms or molecules is an 
efficient mechanism for charge carrier production. The generation of free radicals by electron collisions in 
molecular plasmas is an important precursor for plasma chemical reactions. Electron impact ionization is the 
fundamental process for sustaining a non-thermal plasma. Electron-impact-induced dissociation leading to the 
formation of free radicals is the most important reaction channel in plasma chemistry. In conventional chemistry, 
the formation of radicals is determined by the temperature of the entire system offering a different spectrum of 
secondary reactions from that resulting from radical production by electron collision in the cold neutral gas 
environment of a non-thermal plasma. 

Table C2.13.1 Collision processes of electrons and heavy particles in non-thermal plasmas. The asterisk * denotes 
short-lived excited particles, the superscript m denotes long-lived metastable excited atoms or molecules. 

Collisions of electrons 

e⁻ + A → A*/ᵐ + e⁻              excitation of atoms 
A*/ᵐ → A + hν                   spontaneous de-excitation 
e⁻ + A*/ᵐ → A + hν + e⁻         collisional-induced de-excitation 
e⁻ + A → A⁺ + 2e⁻               ionization of atoms 
e⁻ + AB → AB*/ᵐ + e⁻            excitation of molecules 
AB* → AB + hν                   spontaneous de-excitation 
e⁻ + AB* → AB + hν + e⁻         collisional-induced de-excitation 
e⁻ + AB* → A(*) + B + e⁻        dissociation of molecules 
e⁻ + AB → A + B⁺ + 2e⁻          dissociative ionization 
e⁻ + A*/ᵐ → A + e⁻ + E_kin      super-elastic collisions 
e⁻ + A*/ᵐ → A** + e⁻            stepwise excitation 
e⁻ + A*/ᵐ → A⁺ + 2e⁻            stepwise ionization 
e⁻ + A → A⁻                     attachment 
e⁻ + A⁻ → A + 2e⁻               detachment 
e⁻ + A⁺ → A                     recombination 
e⁻ + A⁺ + M → A + M             three-body collision recombination 

Collisions of heavy particles 

A⁺ + B → A + B⁺                 charge transfer 
Aᵐ + B → A + B⁺ + e⁻            Penning ionization 
Aᵐ + Aᵐ → A + A⁺ + e⁻           pair collision 
A* + A → A₂⁺ + e⁻               Hornbeck-Molnar ionization 
A⁺ + BC → AC⁺ + B               ion-molecule reaction 
A + BC → AC + B                 chemical reaction 
R + BC → RC + B                 chemical reactions with radical R produced in the plasma 
A* + BC → AC + B                chemical reactions with excited atom or molecule 


The probability for a particular electron collision process to occur is expressed in terms of the 
corresponding electron-impact cross section σ, which is a function of the energy of the colliding electron. 
All inelastic electron collision processes have a minimum energy (threshold) below which the process 
cannot occur for reasons of energy conservation. In plasmas, the electrons are not mono-energetic, but 
have an energy or velocity distribution, f(v). In those cases, it is often convenient to define a rate 
coefficient k for each two-body collision process: 

\[ k = \int \sigma(v)\, v\, f(v)\, \mathrm{d}v \]

where σ(v) denotes the cross section (here written as a function of velocity rather than energy) and f(v) 
represents the velocity distribution function of the electrons. Realistic plasmas typically exhibit 
complicated velocity distribution functions for the plasma electrons (see above). For the simple case of a 
Maxwellian velocity distribution of the electrons and a collision cross section whose low-energy 
behaviour (in a limited range of impact energies E above the threshold energy E_thr) can be described by 
the expression 

\[ \sigma(E) = \pi r^2 \left(1 - \frac{E_{thr}}{E}\right) \]

the resulting rate coefficient has the form 

\[ k(T_e) = \pi r^2 \left(\frac{8 k_B T_e}{\pi m_e}\right)^{1/2} \exp\left(-\frac{E_{thr}}{k_B T_e}\right). \]

This expression corresponds to the Arrhenius equation with an exponential dependence on the threshold 
energy and the electron temperature T_e. The factor in front of the exponential function contains the collision cross 
section and implicitly also the mean velocity of the electrons. 
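The Arrhenius-like result can be checked numerically. The sketch below integrates σ(E) v(E) f(E) over a Maxwellian energy distribution with a simple trapezoidal rule and compares the result with the closed form π r² (8 k_B T_e / π m_e)^{1/2} exp(-E_thr / k_B T_e). The radius r, the threshold energy and the electron temperature are assumed illustrative inputs, and the function names are hypothetical.

```python
import math

EV = 1.602176634e-19   # joules per electron volt
M_E = 9.1093837015e-31  # electron mass, kg


def sigma(E, r, E_thr):
    """Cross section pi*r^2*(1 - E_thr/E) above threshold, zero at or below it."""
    return math.pi * r**2 * (1.0 - E_thr / E) if E > E_thr else 0.0


def maxwell_energy_distribution(E, kT):
    """Maxwellian energy distribution, normalized to unity over E in [0, inf)."""
    return 2.0 * math.sqrt(E / math.pi) * kT**-1.5 * math.exp(-E / kT)


def rate_numeric(r, E_thr, kT, n=20000, span=40.0):
    """Trapezoidal estimate of k = integral of sigma(E) * v(E) * f(E) dE."""
    h = span * kT / n
    total = 0.0
    for i in range(n + 1):
        E = E_thr + i * h
        v = math.sqrt(2.0 * E / M_E)           # electron speed at energy E
        weight = 0.5 if i in (0, n) else 1.0   # trapezoid end-point weights
        total += weight * sigma(E, r, E_thr) * v * maxwell_energy_distribution(E, kT)
    return total * h


def rate_closed_form(r, E_thr, kT):
    """k = pi*r^2 * sqrt(8*kT/(pi*m_e)) * exp(-E_thr/kT)."""
    return math.pi * r**2 * math.sqrt(8.0 * kT / (math.pi * M_E)) * math.exp(-E_thr / kT)


# Assumed inputs: r = 1e-10 m, threshold 15.76 eV, k_B*T_e = 3 eV.
r, E_thr, kT = 1.0e-10, 15.76 * EV, 3.0 * EV
k_num = rate_numeric(r, E_thr, kT)
k_ana = rate_closed_form(r, E_thr, kT)
print(f"numerical k = {k_num:.4e} m^3/s, closed form k = {k_ana:.4e} m^3/s")
```

The two values agree closely, which reflects the fact that only the high-energy tail of the distribution above the threshold contributes to the rate coefficient. For a non-Maxwellian distribution the same numerical integral can be used with a different distribution function, but the closed form no longer applies.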

C2.13.4 PLASMA GENERATION 

The usual means of generating technological plasmas is to supply electrical energy to the gas in the 
plasma reactor (figure C2.13.3 and figure C2.13.4). The electrons are accelerated in the external electric field and 
transfer energy to the plasma by collisions with the other particles. Depending on the time dependence of the 
sustaining external electric field, discharges are classified as direct current (DC) or alternating current (AC) 
discharges. The DC low-pressure normal glow discharge between two plane electrodes in a cylindrical glass tube is 
the prototypical DC discharge and has been studied extensively for about 100 years [16]. Such discharges exhibit 
characteristic luminous structures. The brightest part of the discharge is the negative glow, which is separated from 
the cathode by the cathode dark space (Crookes or Hittorf dark space). The large drop of the electrical potential in 
this cathode dark space is called the cathode fall. The positive column and the negative glow are separated by the 
Faraday dark space. The positive column extends to the anode, which may be covered by the anode glow. 

Figure C2.13.3. Schematic illustrations of various electric discharges: (a) DC-glow discharge, R denotes a resistor; 
(b) capacitively coupled RF discharge, MN denotes a matching network; (c), (d) inductively coupled RF discharge, 
MN denotes matching network; (e) dielectric barrier discharge. 

Figure C2.13.4. Schematic illustrations of selected microwave discharges: (a) the discharge tube DT is inserted 
through a rectangular wave guide RW parallel to the electric field, M denotes the magnetron; (b) plasma reactor 
with slots, CP denotes the coupling probe, RC refers to the ring circulator, CS to the coupling slots and Q denotes 
the quartz tube [22]; (c) planar microwave plasma source: MW, microwave and MWW, microwave window [21]; 
(d) electron cyclotron plasma reactor, MC denotes the magnetic coils. 

The microscopic processes in the DC glow discharge are fairly well understood [17]. Positive ions are accelerated 
by the high electric field in the cathode fall towards the cathode surface. The collisions of the energetic ions with 
the surface sputter neutral atoms and, most importantly, produce secondary electrons which are accelerated in the 
cathode fall. The energetic electrons transfer most of their energy in the cathode dark space and the negative glow 
to heavy particles in inelastic collisions by excitation and dissociation and create charge carriers by impact 
ionization. The discharge regions close to the cathode (cathode fall and negative glow) are necessary for 
establishing a self-sustaining glow discharge. A positive column is formed only under conditions where there is a 
long narrow separation between cathode and anode so that significant charge carrier losses to the surrounding 
discharge wall occur. The electrons, which have lost most of their energy in the negative glow, gain energy in the 
longitudinal electrical field of the positive column. A steady-state electron energy distribution is formed that 
produces ions and electrons in sufficient numbers to balance the charge carrier losses at the wall. 


Discharges sustained by time-varying electric fields are referred to as pulsed or AC discharges. High-frequency 
electromagnetic fields in the radio-frequency (RF) [8] or microwave [18] range are of particular interest for 
generating discharge plasmas for various plasma chemical applications. 


In a capacitively coupled RF discharge, the electrodes are covered by sheath regions similar to the cathode dark 
space in the DC glow discharge. The bulk plasma occupies the region between the electrodes. The coupling 
capacitor between the RF generator and the powered electrode and the appropriate choice of different areas for the 
(smaller) powered electrode and (larger) grounded electrode lead to a negative DC potential between the plasma 
and the powered electrode, the so-called self-bias potential. The self-bias accelerates the ions near the powered 
electrode to energies of up to a few hundred electron volts. In contrast to the DC discharge, the electrodes of RF 
discharges need not be in conductive contact with the plasma. The most commonly used RF frequency is 13.56 
MHz; the pressure range is 0.1-100 Pa. Such discharges are successfully used for the plasma-assisted 
deposition of thin films, for plasma etching and for the sputtering of insulating materials. 

The inductively coupled plasma [19] is excited by an electric field which is generated by an RF current in an 
inductor. The changing magnetic field of this inductor induces an electric field in which the plasma electrons are 
accelerated. The helicon discharge [20] is a special type of inductively coupled RF discharge. 

Gas discharge plasmas have also been successfully excited by microwaves. Two characteristic features of 
microwaves are (i) the fact that the wavelength is of the order of the dimensions of the plasma apparatus (the 
standard frequency of 2.45 GHz corresponds to a wavelength of 12.2 cm) and (ii) the short period of the exciting 
microwave field. Only electromagnetic waves with a frequency higher than the electron plasma frequency f_pe can 
penetrate into the plasma. Waves with frequencies lower than f_pe are reflected. The electron density at which 
f_pe equals the wave frequency is called the cut-off density; however, electromagnetic fields can penetrate through 
a skin depth into the plasma. This skin sheath permits a partial absorption of electromagnetic power even above the 
cut-off density. These facts limit the electron density for a 2.45 GHz excitation to 10¹¹-10¹² cm⁻³. 
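The wavelength and cut-off density quoted for 2.45 GHz follow from two standard relations: the free-space wavelength λ = c/f, and the electron density n_c = ε₀ m_e (2πf)² / e² at which the electron plasma frequency equals the wave frequency. The short sketch below evaluates both; the function names are illustrative.

```python
import math

EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
M_E = 9.1093837015e-31      # electron mass, kg
E_CHARGE = 1.602176634e-19  # elementary charge, C
C_LIGHT = 2.99792458e8      # speed of light, m/s


def wavelength(frequency_hz):
    """Free-space wavelength in metres."""
    return C_LIGHT / frequency_hz


def cutoff_density(frequency_hz):
    """Electron density (m^-3) at which the plasma frequency equals frequency_hz."""
    omega = 2.0 * math.pi * frequency_hz
    return EPS0 * M_E * omega**2 / E_CHARGE**2


f = 2.45e9  # Hz
lambda_cm = wavelength(f) * 100.0
n_c_cm3 = cutoff_density(f) * 1.0e-6  # convert m^-3 to cm^-3
print(f"wavelength = {lambda_cm:.1f} cm, cut-off density = {n_c_cm3:.2e} cm^-3")
```

The cut-off density for 2.45 GHz comes out slightly below 10¹¹ cm⁻³, so electron densities above this value rely on power absorption within the skin depth.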

Plasmas excited by microwaves may be produced in closed structures, in open structures and in resonance with a 
magnetic field. In closed structures, the plasma vessel is surrounded by metallic walls. Depending on the resonance 
conditions either multi-mode or single-mode cavities are used. Discharges in open structures include microwave 
torches, slow-wave structures and surfatrons in which the plasma is generated by the excitation of surface waves. 
Various types of slotted waveguides are applied successfully for the excitation of specially shaped microwave 
plasmas for technical applications [21, 22]. Such configurations produce plasmas with electron concentrations up to 

i ^ o 

10 cm in a broad pressure range. 

Microwave discharges at pressures below 1 Pa with low collision frequencies can be generated in the presence of a 
magnetic field B where the electrons rotate with the electron cyclotron frequency. In a magnetic field of 875 G the 
rotational motion of the electrons is in resonance with the microwaves of 2.45 GHz. In such low-pressure electron 
cyclotron resonance plasma sources collisions between the atoms, molecules and ions are reduced and the 
formation of unwanted particles in the plasma volume ('dusty plasma') is largely avoided. 
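The resonance field quoted above follows from equating the electron cyclotron frequency, f_ce = eB / (2π m_e), with the microwave frequency. A minimal check, with an illustrative function name:

```python
import math

M_E = 9.1093837015e-31      # electron mass, kg
E_CHARGE = 1.602176634e-19  # elementary charge, C


def ecr_field_tesla(frequency_hz):
    """Magnetic field at which the electron cyclotron frequency equals frequency_hz."""
    return 2.0 * math.pi * frequency_hz * M_E / E_CHARGE


b_gauss = ecr_field_tesla(2.45e9) * 1.0e4  # 1 T = 10^4 G
print(f"ECR field at 2.45 GHz: {b_gauss:.0f} G")
```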

A special type of RF discharge is the silent or dielectric barrier discharge [23], which can be operated at 
pressures from 0.1 to 10 bar. Such a discharge was used as early as 1857 by Siemens [24] for the production of ozone 
from air or oxygen. The silent discharge is generated between two electrodes with a dielectric barrier covering at 
least one electrode. The gas-filled gap is small, rarely exceeding a few millimetres. Voltages of 5-100 kV with frequencies 
from 50 Hz up to 1 MHz are necessary to sustain these discharges. The breakdown is connected with the formation 
of a large number of statistically distributed filaments (filament diameter 0.1 mm). The charge carriers from the 
plasma remain on the dielectric barrier and compensate the external electric field. Therefore, the lifetime of the 
filaments is very short (1-10 ns). The current density in one filament can be as high as 100-1000 A cm⁻² with 
electron densities in the range 10 - 10 cm, and electron energies in the range 1-10 eV. Homogeneous 
dielectric barrier discharges are also observed. This type is called atmospheric pressure glow discharge [23]. 


C2.13.5 PLASMA CHEMICAL PROCESSES 


Plasmas are used for the processing of solid surfaces such as in etching, cleaning and oxidizing and for the 
deposition of various thin films. In these applications chemical processes occur in the plasma volume as well as on 
the surfaces. The substrate may be positioned within the active plasma region or outside (remote plasma 
processing). Plasma chemistry is used for the treatment of gases to create and to change gaseous compounds in 
homogeneous gaseous reactions or heterogeneous surface processes. Several examples are discussed below. 

C2.13.5.1 PLASMA SURFACE PROCESSES 

(A) PLASMA ETCHING 

Plasma etching is an important process in the manufacture of microelectronic devices [25]. It is used for pattern 
transfer during the fabrication of integrated circuits. Structured masks of photoresist determine where the etching 
should occur and where not. For structures that are small in comparison with the thickness of the mask, anisotropic 
etching using a non-thermal plasma is the only feasible technique. The removal (sputtering) of atoms or molecules 
from a solid surface is possible by momentum transfer from the heavy particles impinging on the surface and via 
collision cascades. Chemical surface reactions initiated by reactive particles from the plasma can lead to the 
formation of volatile reaction products. The nature of a gaseous reaction product formed in the surface reactions 
often determines the choice of etching gas used in a particular application (table C2.13.2) [3, 9]. For instance, 
fluorine-containing molecules are used in the etching of silicon because SiF₄, the main reaction product, is highly 
volatile. Hydrocarbons are etched using O₂ because of the benign by-products H₂O and CO₂. The etching of Al 
uses Cl-containing gases because of the high vapour pressure of AlCl₃. Ion impact leads to non-isotropic etching 
owing to the directed flow of the ions through the plasma sheath to the surface. Ion impact can also influence the 
formation of inhibitors, e.g. sidewall passivation (figure C2.13.5). Characteristic parameters to describe the etching 
process are the etch rate, the selectivity, the anisotropy, the uniformity of the process across the wafer surface and 
the possible material damage. Chemical reactions are isotropic and have a high etch rate and a high selectivity. 
Material damage is usually negligible. Physical sputtering processes are characterized by high anisotropy, low 
selectivity and low etch rates. The impact of energetic ions can damage the substrate. In these processes, the plasma 
serves two purposes. Firstly, it activates the often inert feed gases of the plasma gas mixtures through the formation 
of reactive radicals and, secondly, the ions are accelerated in the plasma sheath and impact on the substrate nearly 
perpendicularly to the surface and influence the surface processes. 

Table C2.13.2 Gases for etching of various materials [3, 4]. 

Material             Etching gases 

Silicon              CF₄/O₂, CF₂Cl₂, CF₃Cl, SF₆/O₂/Cl₂, Cl₂/H₂/Cl₂F₂/CCl₄, C₂ClF₅/O₂, 
                     SiF₄/O₂, NF₃, CCl₄, C₂ClF₅/SF₆, C₂F₆/CF₃Cl, Br₂, CF₃Cl/Br₂ 
SiO₂                 CF₄/H₂, C₂F₆, C₃F₈, CHF₃/O₂ 
Si₃N₄                CF₄/O₂/H₂, C₂F₆, C₃F₈, CHF₃, NF₃, CHF₃/O₂ 
Organics, polymers   O₂, CF₄/O₂, SF₆/O₂ 
Silicides            CF₄/O₂, NF₃, SF₆/Cl₂, CF₄/Cl₂ 
Al                   BCl₃, BCl₃/Cl₂, CCl₄/Cl₂/BCl₃, SiCl₄/Cl₂ 
Cr                   Cl₂, CCl₄/Cl₂ 
Mo, Nb, Ta, Ti, W    CF₄/O₂, SF₆/O₂, NF₃/H₂ 
Au                   C₂Cl₂F₄, Cl₂, CClF₃ 
GaAs                 BCl₃/Ar, Cl₂/O₂/H₂, CCl₂F₂/O₂/Ar/He, CCl₄, PCl₃, HCl, Br₂, COCl₂ 
InP                  CH₄/H₂, C₂H₆/H₂, Cl₂/Ar, Cl₂/O₂ 

Figure C2.13.5. Schematic illustrations of isotropic etching by a neutral gas and anisotropic plasma etching. 

As an example, we look at the etching of silicon in a CF₄ plasma in more detail. Flat Si wafers are typically etched 
using quasi-one-dimensional homogeneous capacitively or inductively coupled RF plasmas. The important process 
in the bulk plasma is the formation of fluorine atoms in collisions of CF₄ molecules with the plasma electrons: 

CF₄ + e⁻ → CF₃ + F + e⁻ 
CF₄ + e⁻ → CF₃⁺ + F + 2e⁻. 

The subsequent reaction of the F atoms with the silicon surface leads to the formation of the volatile product SiF₄: 

4F + Si → SiF₄. 

The flux of F radicals to the wafer is nearly isotropic. Anisotropic etching is due to ions that are incident on the 
wafer essentially perpendicular to the surface (see above). 

The sidewall protection by a thin polymeric film is an additional important process that occurs. The formation of F 
atoms in the gas phase leads to the formation of CF₃ radicals among other species. These radicals are the precursors 
for the deposition of a polymeric C_xF_y film on the substrate surface. The growth of this film is governed by a 
balance between the ion-induced fragmentation and the desorption of CF_x particles. The thickness of this film is 
typically 1-6 nm [26]. The F atoms do not react directly with the silicon surface. They diffuse through this film and 
then react. The volatile SiF₄ reaction products diffuse back into the gas phase after penetrating the thin film. The 
ion bombardment generates additional F atoms in the fragmentation of the C_xF_y film and increases the diffusion 
velocity of the F atoms and the SiF₄ molecules due to the energy deposited into the film. It is obvious that the ion 
current density at the sidewall of the trench is much smaller than the density at the bottom of the trench. This 
sidewall passivation mechanism is crucial for the success of anisotropic etching. The formation of sidewall layers 
also results in characteristic angular distributions of the ions at the wall. CF₃⁺ ions, which have a high etch potential, 
have a narrow angular distribution, whereas CF₂⁺ and CF⁺ ions, which are responsible for fluorocarbon layers, have 
a broad angular distribution [27]. The addition of oxygen to the CF₄ plasma leads to an increase of the etch rate, 
but decreases the anisotropy. This is understandable on the basis of the gas phase chemistry. CO₂ is formed, the 
CF_x concentration is reduced, and the formation of fluorine is enhanced. The decrease of the CF_x concentration 
impedes the formation of the protective sidewall layers. 


A special case of plasma etching is the etching of hydrocarbons in an oxygen discharge. The removal of photoresist 
in oxygen-containing plasmas is a frequently employed process in the semiconductor industry. The cleaning of 
metallic work pieces covered with organic surface layers is another widely applied technique in industry. The 
oxygen plasma generates oxygen atoms that react with the surface contaminant to form volatile CO₂ and H₂O. 
Plasma cleaning causes low thermal stress of the surface. Ecologically and environmentally harmful solvents are 
largely avoided. 

(B) SURFACE TREATMENT 

Plasmas are successfully applied in surface oxidation at low substrate temperatures. In the plasma oxidation 
process the substrate is held at a floating potential in an oxygen plasma [9]. Plasma anodization is usually carried 
out with a positively biased substrate. The high oxidation rates of plasma anodization are a result of the high 
currents of electrons and negative ions to the substrate. High-quality oxide layers with excellent electrical 
properties, which cannot be achieved by standard thermal oxidation methods [28], are produced by a remote 
oxygen plasma while the substrate surface is kept at room temperature. 

The surface properties of solids are controlled by the chemical structure of the surface layer: e.g., CH₃ and CF₃ 
groups at the surface produce a high degree of hydrophobicity. Oxygen- and nitrogen-containing groups (e.g., -OH, 
-NH₂) produce a hydrophilic surface with high adhesion and unique properties for printing, painting, glueing and 
cell growth. Treatment of polymers in 'mild' (low ion energy) remote plasmas in selected gases provides highly 
selective surface properties. Biomaterials prepared by plasma chemical methods are sterile [29]. 

(C) THIN-FILM DEPOSITION 

Thin films with a broad spectrum of properties can be deposited in plasma chemical processes [30, 31]. Hard 
coatings such as diamond films and TiN films, soft plasma polymer films, insulating SiO_x films, highly conducting 
Si films, anti-reflection coatings, semi-permeable membranes and very effective diffusion barriers can be 
deposited. Important parameters for the film deposition are (i) the nature of the precursor, (ii) the gas mixture and 
(iii) the selected plasma parameters. The advantage of plasma-assisted processes is the possibility to work with 
lower substrate temperatures than in pure chemical vapour deposition techniques, since the important reactions are 
initiated by energetic electrons. Plasma-deposited films have a high substrate adhesion because the substrate 
surface at the beginning of the deposition is activated by the plasma-surface interaction. 

The deposition of amorphous hydrogenated silicon (a-Si:H) from a silane plasma doped with diborane (B₂H₆) or 
phosphine (PH₃) to produce p-type or n-type silicon is important in the semiconductor industry. The plasma 
process produces films with a much lower defect density in comparison with deposition by sputtering or 
evaporation. 

The SiH₃ radical is the dominant growth precursor for the formation of the a-Si:H films in a low-temperature 
silane plasma [32]. Silane molecules are dissociated by energetic plasma electrons: 

e⁻ + SiH₄ → SiH₃ + H + e⁻ 
e⁻ + SiH₄ → SiH₂ + 2H + e⁻ 

followed by the reactions 

SiH₄ + SiH₂ → Si₂H₆ 
e⁻ + Si₂H₆ → SiH₃ + SiH₂ + H + e⁻. 

The SiH₃ radical physisorbs on the a-Si:H surface and recombines there with another SiH₃ radical to form disilane 
Si₂H₆, or abstracts H from the surface to form a dangling bond and SiH₄. The film growth is determined by the 
chemisorption of the SiH₃ radical on a free dangling bond site by formation of a Si-Si bond. The cross-linking of 
neighbouring Si-H bonds leads to the elimination of H₂. 

Admixtures of oxygen or oxidizing agents such as N₂O to the silane plasma enable the deposition of SiO₂ films. 
Other Si-containing compounds such as SiCl₄ or tetraethoxysilane (Si(OCH₂CH₃)₄) are used for plasma-enhanced 
SiO₂ deposition at lower temperatures [33]. 

The deposition of organic films by plasma polymerization is an important application of non-thermal plasmas [30]. 
Plasma polymers are formed at the electrodes and the walls of electrical discharges containing organic vapours. 
Oily products, soft soluble films as well as hard brittle deposits and powders are formed. The properties of plasma 
polymers are similar to those of conventional polymers, but their structure is different. The often highly 
cross-linked material is not characterized by a mere repetition of the basic units. The properties of the polymers are 
essentially determined by the deposition conditions as well as by the plasma and by the gas flow rather than by the 
particular monomer. All organic compounds can be used as monomers or precursors for plasma polymerization. 
Functional groups such as double bonds are not necessary. Thus methane can be used as a precursor for plasma 
polymerization. Plasma polymer films have numerous advantages over conventional films: good adhesion on very 
different substrates, freedom from pinholes, good conformation to various substrate surfaces, a high degree of 
cross-linkage, chemical inertness and low levels of leachables [29]. 

The reaction mechanisms of plasma polymerization processes are not understood in detail. Poll et al [34] 
(figure C2.13.6) proposed a possible generic reaction sequence. Plasma-initiated polymerization can lead to the 
polymerization of a suitable monomer directly at the surface. The reaction is probably triggered by collisions of 
energetic ions or electrons, energetic photons or interactions of metastables or free radicals produced in the plasma 
with the surface. Activation processes in the plasma and the film formation at the surface may also result in the 
formation of non-reactive products. 

Figure C2.13.6. Schematic illustrations of plasma-assisted thin-film deposition. 


An important and well studied example is the deposition of plasma-polymerized fluorinated monomer films [35]. 
Monomers are fluoroalkyls, fluorohydroalkyls, cyclo-fluoroalkyls, as well as unsaturated species. The actual 
monomers are the CF_x radicals from which the polymer deposit is built. Electron impact produces the active 
species: ions, F atoms and F₂ molecules, and CF_x radicals. CF⁺ and CF₂⁺ ions are more effective for the polymer 
deposition than CF₃⁺ ions (see above) [27]. The film growth occurs by the addition of CF_x radicals to previously 
activated sites. The activation of surface sites occurs via collisions of charged particles, via ion collisions on 
negatively charged substrates and via electron collisions on positively charged substrates. The fluorine-to-carbon 
ratio influences the fluorine content in the plasma which, in turn, determines whether polymerization or etching is 
the dominant process (figure C2.13.7) [36]. The admixture of oxygen enhances the fluorine concentration and thus 
the etching properties by reducing the recombination probability of F atoms by formation of CO, CO₂, COF₂. The 
addition of H₂ reduces the fluorine concentration via the formation of HF, thus enhancing the polymerization. The 
competition between etching and deposition is also influenced by other conditions such as the substrate bias. 

Figure C2.13.7. Change between polymerizing and etching conditions in a fluorocarbon plasma as determined by 
the fluorine-to-carbon ratio of chemically reactive species and the bias voltage applied to the substrate surface 
[36]. 
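As a simple illustration of the fluorine-to-carbon ratio, the sketch below computes the F/C ratio of a few fluorocarbon feed gases that appear in table C2.13.2 from their molecular formulas. This feed-gas ratio is only a first proxy: the quantity that matters in the plasma is the ratio for the chemically reactive species, which also depends on additives such as O₂ and H₂ and on the discharge conditions. The function names are hypothetical.

```python
import re


def element_counts(formula):
    """Count atoms in a simple formula such as 'C2F6' (no brackets or charges)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def fluorine_to_carbon(formula):
    """F/C ratio of a fluorocarbon feed gas; raises KeyError if no carbon is present."""
    counts = element_counts(formula)
    return counts.get("F", 0) / counts["C"]


for gas in ("CF4", "C2F6", "C3F8", "CHF3"):
    print(f"{gas}: F/C = {fluorine_to_carbon(gas):.2f}")
```

Within the qualitative picture described above, gases with a higher F/C ratio such as CF₄ favour etching, whereas a lower ratio shifts the balance towards polymer deposition.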

C2.13.5.2 PLASMA VOLUME PROCESSES 

The chemical reactions in plasmas find applications in the generation or conversion of gaseous products, primarily 
via homogeneous reactions or in surface treatment and modification processes via heterogeneous reactions. A 
classical example for the production of a gaseous product is the ozone synthesis in dielectric barrier discharges. 
The electrons are the most important species for ozone formation [23]. A non-thermal plasma generated in a 
dielectric barrier discharge reactor at atmospheric pressure in pure oxygen causes a significant fraction of the 
oxygen molecules to be dissociated as the result of electron collisions: 

e⁻ + O₂ → O₂(A ³Σu⁺) + e⁻ → O(³P) + O(³P) + e⁻ 
e⁻ + O₂ → O₂(B ³Σu⁻) + e⁻ → O(³P) + O(¹D) + e⁻. 

Nitrogen molecules, a major constituent of air, are excited by electron collisions and the excitation energy is 
transferred to the O₂ molecules, or the N₂ molecules may be dissociated and O atoms formed via the reactions 

N₂ + e⁻ → N₂(A ³Σu⁺) + e⁻ 
        → N₂(B ³Πg) + e⁻ 
N₂(A, B) + O₂ → N₂ + 2O 
              → N₂O + O 
N₂ + e⁻ → 2N + e⁻ 
N + O₂ → NO + O 
N + NO → N₂ + O. 


The ozone formation occurs in a three-body collision of O atoms with O₂ molecules: 

O + O₂ + M → O₃ + M. 

The probability for three-body collisions increases with increasing pressure making the use of an atmospheric 
pressure plasma desirable. The above process is used worldwide for ozone production for water purification. 
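The pressure dependence can be made explicit with a small sketch. For the three-body step the rate is k₃[O][O₂][M]; at fixed temperature and fixed mole fractions each concentration scales with the total gas density n = p / (k_B T), so the rate scales with the third power of the pressure. The rate coefficient and the O-atom mole fraction used below are assumed, illustrative values (the pressure scaling does not depend on them), and the function names are hypothetical.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K


def number_density(pressure_pa, temperature_k):
    """Total gas number density from the ideal gas law, in m^-3."""
    return pressure_pa / (K_B * temperature_k)


def three_body_rate(k3, x_o, x_o2, pressure_pa, temperature_k):
    """Rate k3*[O][O2][M] in m^-3 s^-1, with [M] taken as the total gas density."""
    n = number_density(pressure_pa, temperature_k)
    return k3 * (x_o * n) * (x_o2 * n) * n


K3 = 6.0e-46   # m^6/s, assumed order-of-magnitude value for illustration
X_O = 1.0e-3   # assumed O-atom mole fraction
X_O2 = 1.0     # pure oxygen feed gas

rate_1bar = three_body_rate(K3, X_O, X_O2, 1.0e5, 300.0)
rate_01bar = three_body_rate(K3, X_O, X_O2, 1.0e4, 300.0)
print(f"n(1 bar, 300 K) = {number_density(1.0e5, 300.0):.3e} m^-3")
print(f"rate ratio 1 bar / 0.1 bar = {rate_1bar / rate_01bar:.0f}")
```

A tenfold increase in pressure thus raises the three-body ozone formation rate by roughly a factor of one thousand under these assumptions.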

Pollution control such as the reduction of nitrogen oxides, halocarbons and hydrocarbons from flue gases [37] is 
another important field of plasma-assisted chemistry using non-thermal plasmas. The efficiency of plasma 
chemical reactions can be enhanced by introducing catalysts into the plasma [38, 39]. 

C2.13.5.3 PLASMA CHEMISTRY IN NATURE 

Naturally occurring plasma-induced processes on Earth are not common. One example is the formation of nitrogen 
oxides in lightning. Another area of naturally occurring plasmas is the ionosphere [40]. The charge carriers are 
mainly produced by UV radiation from the Sun. The neutral gas composition in the ionosphere is determined by 
oxygen and nitrogen with smaller admixtures of water molecules, CO₂ and inert gases such as He and Ar. The most 
effective ionization process is the photoionization of O₂ and N₂ leading to the formation of atomic and molecular 
ions. 

Mass spectrometric investigations of the ionosphere show an abundance of molecular ions such as NO⁺ and 
water cluster ions [41]. This indicates that ion-molecule reactions change the chemical state 
of the ions in this plasma: 

O⁺ + N₂ → NO⁺ + N 
N₂⁺ + O → NO⁺ + N 
NO⁺ + H₂O + M → NO⁺(H₂O) + M 
NO⁺(H₂O) + H₂O + M → NO⁺(H₂O)₂ + M. 

In addition, charge transfer processes such as 

N₂⁺ + O₂ → O₂⁺ + N₂ 

are efficient. 

Molecular ions have an important role in charge carrier losses in the ionosphere. The probability of 
electron-atom-ion recombination processes is very small, because the frequency of the necessary three-body collisions 
(momentum conservation) at this low pressure is very low. Dissociative electron-molecular ion recombination 
processes are more effective. Here the two atoms that are formed ensure momentum conservation: 

e⁻ + N₂⁺ → N + N. 

Negative ions [42] are the result of electron attachment processes such as 

e⁻ + 2O₂ → O₂⁻ + O₂. 

Loss processes for O₂⁻ ions are 

O₂⁻ + O₂ + M → O₄⁻ + M. 

The ozone formation in the atmosphere is induced by radiation and a result of three-body collisions of the oxygen 
atoms with O₂ molecules. This process requires a higher gas density and is, therefore, not efficient in the 
ionosphere. 


C2.13.6 PLASMA MODELLING 

C2.13.6.1 MICROSCOPIC KINETICS 

Modelling plasma chemical systems is a complex task, because these systems are far from thermodynamical 
equilibrium. A complete model includes the external electric circuit, the various physical volume and surface 
reactions, the space charges and the internal electric fields, the electron kinetics, the homogeneous chemical 
reactions in the plasma volume as well as the heterogeneous reactions at the walls or electrodes. These reactions are 
initiated primarily by the electrons. In most cases, plasma chemical reactors work with a flowing gas so that the 
flow conditions, laminar or turbulent, must be taken into account. As discussed before, the electron gas is not in 
thermodynamic equilibrium with the heavy particles. The velocity distribution function of the electrons is determined by the energy and 
momentum exchange in collisions with the heavy particles, in electron-electron collisions and by the energy gain in 
the electric field. It can be determined from the Boltzmann equation or by particle simulation methods (e.g., Monte 
Carlo, particle in cell). The concentrations of the various kinds of particles can be calculated by systems of balance 
equations including the generation and loss mechanisms of particles by electron impact, heavy-particle collisions 
and the interaction with surfaces. The reaction probabilities of molecules also depend on their electronic, 
vibrational and rotational states. Aside from the mathematical problems in solving a coupled system of balance 
equations, the collision cross sections, the natural life times of excited states, sticking and recombination 
coefficients at the wall etc must be known. Their knowledge, however, is limited in the case of most, if not all, 
realistic plasmas. Only a few very special cases can be solved, provided further simplifying assumptions are made 
[43,44]. 
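
As a minimal sketch of what such a system of balance equations looks like in practice, the following Python fragment integrates two coupled balance equations: a hypothetical radical A (generated by electron impact, lost at the wall and by conversion) and a stable product B (formed from A, removed by the gas flow). The species, rate coefficients and densities are illustrative assumptions, not values from the text, and explicit Euler stepping is used only for brevity; a realistic model would couple many more species and a stiff solver.

```python
def integrate_balances(gen_a, k_wall, k_conv, k_flow, dt=1.0e-4, steps=5000):
    """Explicit-Euler integration of two coupled balance equations:

        dA/dt = gen_a - (k_wall + k_conv) * A
        dB/dt = k_conv * A - k_flow * B

    Returns the densities (A, B) after `steps` time steps of length `dt`.
    """
    a = 0.0
    b = 0.0
    for _ in range(steps):
        da = gen_a - (k_wall + k_conv) * a   # generation minus losses of A
        db = k_conv * a - k_flow * b         # B formed from A, swept out by flow
        a += dt * da
        b += dt * db
    return a, b


# Hypothetical, illustrative parameters (not taken from the text)
K_IMPACT = 1.0e-16   # electron-impact rate coefficient, m^3 s^-1
N_E = 1.0e16         # electron density, m^-3
N_GAS = 1.0e22       # neutral gas density, m^-3
K_WALL = 500.0       # wall loss frequency of A, s^-1
K_CONV = 500.0       # conversion frequency A -> B, s^-1
K_FLOW = 100.0       # removal frequency of B by gas flow, s^-1

GEN_A = K_IMPACT * N_E * N_GAS               # generation rate of A, m^-3 s^-1

a_final, b_final = integrate_balances(GEN_A, K_WALL, K_CONV, K_FLOW)

# Steady state from setting both time derivatives to zero
a_steady = GEN_A / (K_WALL + K_CONV)
b_steady = K_CONV * a_steady / K_FLOW
```

After 0.5 s of simulated time (many multiples of the slowest time constant, 1/K_FLOW), the integrated densities should agree with the algebraic steady-state values.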

C2.13.6.2 MACROSCOPIC KINETICS 

The concept of macroscopic kinetics avoids the difficulties of microscopic kinetics [46, 47]. This method allows a 
very compact description of different non-thermal plasma chemical reactors working with continuous gas flows or 
closed reactor systems. The state of the plasma chemical reaction is investigated, not in the active plasma zone, but 
in the final state after all unstable reaction products have been converted into stable products. The investigation is 
restricted to the gross reaction; intermediate reaction steps are not considered. Chemical quasi-equilibrium states 
play an important role in this method. Quasi-equilibrium is reached in plasmas with high electron concentration 
(high-power plasma) after a comparatively short time, whereas this state is reached after a much longer time in a 
low-power plasma (low electron concentration). Experimentally determined reaction rates of the net reaction allow 
the computational treatment of the system. The specific energy P/Vτ (where P is the power, V is the plasma volume 
and τ is the residence time of the gas in the plasma) is a particularly crucial parameter. Up until now, only simple 
systems have been successfully modelled. 


C2.13.7 CONCLUSIONS 

Non-thermal plasmas with their hot electron gas extend the realm of conventional chemistry in many interesting 
directions. New technical applications have emerged such as surface treatment of materials in plasma etching and 
thin-film deposition in the microelectronics industry, plasma chemical surface modifications to achieve a variety of 
desired properties (increased wettability, biocompatibility) and the deposition of various coatings to improve the 
hardness and the tribological and optical properties of materials. As successfully and widely used as plasma 
chemistry has been in surface processing applications, the applications of plasma chemical methods for producing 
gaseous products are limited. The ozone synthesis is a unique process in that category and has been well known for 
more than a century. The development of new processes for the synthesis or decomposition of gaseous compounds 
will require a broader as well as a more detailed knowledge of the processes in non-thermal plasma, particularly at 
higher gas densities, and will necessitate the study of the effect of catalysts on the plasma. 

C2.14 Biophysical chemistry 

J J Ramsden 


C2.14.1 INTRODUCTION 

Biophysical chemistry may be defined as the application of physical chemistry to biological systems. The 
underlying question (e.g. [1]) is whether the laws of physics and chemistry suffice to understand biology. This 
question, whose origins go back a long way and which encompasses notions of vitalism and so on, has still not 
been definitively answered. Nowadays it is formulated somewhat differently, typically as 'What emergent 
properties are needed to characterize biological systems?' Half a century ago the prevailing view was encapsulated 
in a rather well-known statement of P A M Dirac: 'The underlying physical laws necessary for the mathematical 
theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only 
that the exact application of these laws leads to equations much too complicated to be soluble.' Biology being even 
more complicated than chemistry, the difficulty was expected to be correspondingly greater. Nevertheless, as P W 
Anderson [2] has pointed out, biology is not applied chemistry, any more than chemistry is applied physics. 
Emergent properties are expected to arise through increases in scale and complexity, and there is no need to endow 
them with mystical attributes: one hopes to formulate them in mathematical terms just like the descriptions of 
simpler systems. 

Biological systems are bewilderingly complex; the task of biophysical chemistry is to render this complexity 
intelligible. It is, challengingly, complexity of the most difficult kind, falling between the tractable extremes of 
elements few enough to be enumerated and analysed exactly, and huge crowds of similar or identical elements 
whose statistical properties are sufficient for understanding the behaviour of the whole. The molecules of biology 
are astonishingly diverse, and even apparently very minor changes (the single amino acid substitution in 
haemoglobin causing sickle cell anaemia is a paradigmatic example) have consequences at the levels of the 
structure of the molecules, the shape of the erythrocytes containing haemoglobin, the health of the individual 
human being, and the biology of the population. 

Most biochemists and molecular biologists make use of chemistry and physics in their investigations. Every time 
they run a 'gel' — a kind of chromatography used to separate protein (or DNA) mixtures, and appearing in almost 
every molecular biological lecture or publication — certain assumptions are made relating the position of a protein 
on the gel to its molecular mass, and a few 'standard' proteins are used to calibrate the positions. Is the procedure 
reliable? Certainly, many factors other than mass intervene. For example, the protein MARCKS, M ~ 30 000, 
appears near proteins almost three times heavier [3]. Why? Biophysical chemistry can provide the answer, and a 
correct apprehension of the reasons enhances the domain of application of the technique. Another example is the 
use of biosensors to investigate protein-protein association. Many investigators first coat the sensing platform with 
a thick dextran hydrogel, and covalently attach one of the pair of proteins whose association is to be investigated to 
the matrix. The hydrodynamics of the aqueous phase containing the binding partner are thereby greatly distorted 
compared with the free solution, and the measured binding rates tend to reflect highly retarded transport rather than 
affinity [4]. Even greater caution is required when using gene chips, two-dimensional arrays of ~10 to 10^5 single-stranded 
DNA fragments whose nucleic acid constitution and coordinates are, in principle, known. If it is desired to 
ascertain whether certain mutations are present in the gene of a patient, his or her DNA is isolated, separated into 
single-stranded molecules, labelled with a fluorescent marker, and brought into contact with a gene chip including 
sequences complementary to those sought. If they are present in the isolate, they will be bound specifically to the 
chip, and after dissociating nonspecifically bound material, the remaining fluorescence is scanned and the corresponding 
sequences identified from their spatial coordinates. If it is desired to ascertain which genes in a given cell are 
expressed at a given moment, the messenger RNA (selected copies of the gene corresponding to those amino acid 
sequences, proteins, which will subsequently be synthesized; see also section C2.14.8) is isolated from the cell, 
labelled and exposed to the gene chip. The pattern of residual fluorescence should bear some relation to the 
messenger RNA population in the cell. Potential sources of misinterpretation include: variable degrees of labelling; 
complementary binding hindered by the presence of the label, or by the immobilization of the DNA strand to the 
chip surface; insufficient time allowed for specific association, or for dissociation of nonspecifically bound 
material; binding of the same sequence to multiple sites, or of different sequences to the same site. Clearly the 
specificity of binding to a given site increases with increasing length of the immobilized DNA sequence, but then 
so does the likelihood of mistakes in the synthesis of the sequences on the chip. The pattern of fluorescent spots — 
their positions and intensities — may depend as much on biophysico-chemical details of the binding processes 
taking place at the molecular level as on the actual nucleic acid sequences themselves. Since the information 
potentially available from the pattern is huge, it must necessarily be analysed largely automatically, by algorithms 
which fix certain choices regarding the interpretation of the data, and hence it is important for their alternatives to 
be thoroughly investigated. 
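
The gel calibration mentioned earlier in this paragraph can be sketched as follows. The code assumes the common working rule that the logarithm of molecular mass varies linearly with relative migration distance over the calibrated range; the standards, their masses and their migration distances are invented for illustration. A protein that migrates anomalously, as MARCKS is said to do, would be assigned an apparent mass by exactly this procedure, which is why the assumptions behind the calibration matter.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mass(rel_migration, slope, intercept):
    """Mass read off the calibration line for a given relative migration."""
    return 10.0 ** (intercept + slope * rel_migration)


# Hypothetical standards: (relative migration distance, molecular mass)
STANDARDS = [(0.20, 100000.0), (0.45, 31623.0), (0.70, 10000.0)]

slope, intercept = fit_line(
    [rf for rf, _ in STANDARDS],
    [math.log10(mass) for _, mass in STANDARDS],
)

# A band observed at the position of the middle standard is assigned that
# standard's mass, whatever the true mass of the protein in the band.
mass_at_middle = apparent_mass(0.45, slope, intercept)
```

The calibration only reports where a band sits relative to the standards; it carries no information about whether factors other than mass determined that position.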

Classical biophysical chemistry has concentrated heavily on the elucidation of the structures of biomolecules and 
biomolecular assemblies. Since there are already several excellent texts and reviews dealing with techniques such 
as x-ray diffraction, electron microscopy, scanning probe microscopies and so on, and since they are all dealt with 
elsewhere in this encyclopaedia, it is superfluous to discuss them here. A few structural methods whose domain of 
application has been so exclusively concerned with biological material that they have not warranted individual 
entries elsewhere will be discussed in section C2.14.2. 

Apart from the sheer complexity of the static structures of biomolecules, they are also rather labile. On the one 
hand this means that especial consideration must be given to the fact (for example in electron microscopy) that 
samples have to be dried, possibly stained, and then measured in high vacuum, which may introduce artifacts into 
the observed images [5]. On the other, apart from the vexing question of whether a protein in a crystal has the same 
structure as one freely diffusing in solution, the static structure resulting from an x-ray diffraction experiment gives 
few clues to the molecular motions on which operation of an enzyme depends [6]. 

Biology has been defined as the organization of parts and processes, their reciprocal interactions and directedness. 
Given instantaneous and perfect mixing, could not metabolic duties be accomplished via specific intermolecular 
interactions, at least on the level of an individual cell? The description and understanding of specificity is another 
great area of classical biophysical chemistry. Compared with the rather high level of development of structural 
elucidation, the field of biomolecular interactions has progressed less; they are often described using oversimplified 
models, e.g. exponential decay for the dissociation of two molecules, or of a molecule from a surface, despite the 
fact that empirically this law has been found to be an exception to the more general occurrence of time-dependent 
dissociation rate 'constants' (section C2.14.4.3). Specificity is dealt with in section C2.14.6; one should keep in 
mind that mixing is neither instantaneous nor perfect, and the relative importance of directed transport over random 
diffusion within a cell is a major current preoccupation in biophysical chemistry. 

Biological reactions in vivo rarely operate under conditions even remotely approaching those of reversibility. For a 
living organism, the rate of a process is usually more important than the attainment of equilibrium, and large 
driving forces are employed to achieve a rapid rate, leading to profligate waste of free energy from the classical chemical 
thermodynamical viewpoint. The law of mass action has universal validity, but emphasis on the 'biochemical 
standard state' (in which all components are fully hydrated and dissolved at a concentration of 1 M in water, which 
itself has a concentration of 55.6 M), far removed from reality, is to the detriment of discussion of the actual 
concentrations of the molecules participating in biochemical pathways. Moreover, since biology is a realm of 
almost universally broken ergodicity, the premises of statistical mechanics approaches need to be carefully 
scrutinized; more emphasis on pathways would be appropriate. 

The degree of coarse graining appropriate to solve a biological problem is an ever-present preoccupation [7]. When 
attempting to predict the three-dimensional structure of a protein, is it enough to consider the amino acids as 
structureless blobs, or do their individual atoms need to be included? The structure of large, elongated extracellular 
matrix proteins such as fibronectin can be successfully modelled even more coarsely as a string of beads [8], each 
bead comprising tens of amino acids. Coarse graining seems to be carried too far, however, when human brain 
activity is investigated in vivo using nuclear magnetic resonance (NMR) imaging. 'High spatial resolution' in the 
context of this technique means averaging over a volume of ~10 mm^3 containing hundreds of thousands of neurons, 
each of which may be connected to thousands of other neurons elsewhere. It might be recalled that currently we are 
not even capable of giving a comprehensive description of the brain of the leech, with only a couple of dozen 
ganglions, each containing about 400 nerve cells. Moreover, although the NMR imaging technique is considered to 
be noninvasive, one may legitimately enquire whether a human being will, indeed can, behave naturally and 
normally when constrained to lie still in a narrow tube with his head tightly fixed. 

A further problem confronting the biophysical chemist is that biological systems are usually highly 
compartmentalized, and the actual numbers of each active species taking part in a reaction within a given 
compartment may be very small. Hence there can be no question of passing to the thermodynamic limit. 
Fluctuations are expected to play a dominant, sometimes enabling, sometimes destructive role in biology, and the 
way that living beings have evolved sufficient autonomy to deal with fluctuations is itself a fascinating area. 
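
To make the point about small numbers concrete, the sketch below converts a concentration and a compartment volume into a mean copy number and then estimates the relative size of number fluctuations as 1/sqrt(N). The Poisson-statistics estimate is an assumption introduced here for illustration, not a claim made in the text, and the concentration and volumes are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def mean_copy_number(conc_molar, volume_litres):
    """Mean number of molecules at a given molar concentration and volume."""
    return conc_molar * volume_litres * AVOGADRO


def relative_fluctuation(mean_count):
    """Relative standard deviation of a Poisson-distributed count, 1/sqrt(N)."""
    return 1.0 / math.sqrt(mean_count)


# Hypothetical example: a species at 1 nM
n_compartment = mean_copy_number(1.0e-9, 1.0e-15)  # 1 fL (1 cubic micrometre)
n_test_tube = mean_copy_number(1.0e-9, 1.0e-3)     # 1 mL

fluct_compartment = relative_fluctuation(n_compartment)
fluct_test_tube = relative_fluctuation(n_test_tube)
```

Under these assumptions the femtolitre compartment holds on average less than one molecule, so fluctuations are as large as the mean itself, whereas in a millilitre they are of order one part in a million.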

Compartmentalization not only leads to a greatly amplified role for the incoherent fluctuations of groups of 
molecules, but inevitably implies the possibility of interaction between those molecules and the compartment walls 
(often a bilayer lipid membrane), i.e. interfacial phenomena, added to which is the fact that the properties of the 
solvent (usually water) may be modified in the vicinity of the wall. In experiments now considered as classic, 
Kempner and Miller [9, 10] centrifuged intact cells and showed that virtually no enzymes were present in the 
soluble phase; the term cytosol refers to the solution, containing a wide variety of enzymes and other 
macromolecules, obtained from cell fractionation after disruption. Despite these and other observations, 
experiments to elucidate in vitro reactions between biomolecules under homogeneous conditions greatly 
preponderate over those in which the equivalent reaction is studied heterogeneously. Perhaps the recent 
introduction of some excellent new techniques for investigating biomolecular interactions at the solid-liquid 
interface [11] will start to shift this balance, complementing and ultimately replacing molecular biological 
techniques such as the two-hybrid system. 

Once pairwise biomolecular interactions have been correctly characterized, the next step is to understand how they 
mesh together in a network. During the past couple of decades, molecular biology has been rather successful at 
identifying individual genes and their protein products, and obtaining valuable mechanistic insights through site- 
directed mutagenesis. A cell is not a mosaic of individual reactions, however, any more than a protein is a (linear) 
mosaic of amino acids. To establish that process X is catalysed by Xase, which may then be cloned, sequenced and 
expressed, is only the beginning of understanding how the reactions are linked together in an elaborate network of 
control in which some gene products regulate the activity of others, and which is not merely hierarchical, but 
heterarchical. Moreover, it is essential to recognize that the intramolecular bonds which determine the structures of 
biological molecules are usually comparable in strength to the intermolecular interactions as well as interactions 
with the solvent (water). Hence the structures will in many cases be modified upon association, changing the 
affinity of the complex for a third and subsequent molecules, and hence a comprehensive catalogue of all the 
pairwise interactions in a cell will be a very incomplete representation of the network. 

In order to meet the challenge of understanding how hundreds or even thousands of partly interrelated reactions fit 
together to form a coherent functioning whole, systems theory was developed in the 1960s. It was supposed to lead 
to general principles governing complex organizations such as the living cell. A considerable oeuvre was achieved 
(e.g. [12]), but it must be conceded that it has had rather little influence on biological research. Perhaps it was too 
ambitious for the level of phenomenological knowledge extant when it was first developed, and since therefore no 
immediate application of its predictive power was possible, it fell into neglect. Moreover the theory quickly 
becomes intractable when nonlinear systems involving more than two elements are considered. After three decades 
of immensely diligent data-gathering, the situation may now be more propitious for the application of systems 
theory to biology, but to begin with a more modest programme may be in order, such as an investigation of the 
scaling laws of biological reactions, which are far more strongly nonlinear than those encountered in most 
nonbiological systems. 

Acknowledgement of the existence of biological systems refocuses attention on the old question of whether new 
physico-chemical concepts are required in order to understand their working. Is regulation and control theory, as 
developed mainly in departments of engineering and electronics, adequate to describe biological systems, or is their 
complexity — multilevelled and on multiple time scales — sufficiently great to make them quantitatively different in 
some way? This question is still open. The old dream was to predict protein structure (and possibly even some 
dynamical properties) from gene sequence, and then function from structure. The first part looks close to being 
achieved, although many proteins are post-translationally modified (glycosylated, lipidated, etc) and these 
modifications, carried out by other proteins, are crucial for regulating the specific molecular interactions which 
play such an essential role in metabolism. But the notion of the 'function' of a particular molecule embodies all its 
interactions with others, and it is not clear that studying the genome alone will enable one to understand them. Put 
even more starkly, could one predict the existence of a central nervous system merely from studying the genome? 
The reductionist view is to seek a molecular explanation, but a list of all the molecules in a cell, even if their 
spatially varying concentrations are given, is not sufficient to understand how the whole cell works, let alone 
multicellular organisms. 

It is of course futile to attempt to cover all of biophysical chemistry within one short chapter. The emphasis will be 
on necessary concepts, especially where these diverge from the mainstream of physical chemistry; inevitably there 
will be gaps. Material which is well known and readily found in standard texts (e.g. [13]), or in other chapters of 
this encyclopaedia, will be dealt with cursorily, if at all, but new material, and topics less well known than they 
should be, are accorded a more detailed treatment. Given the central role of macromolecular interactions in 
biological systems, they are covered extensively (section C2.14.6 and section C2.14.7 and part of section C2.14.3), 
except for protein folding (section C2.14.2) since it is also the topic of chapter 2.5. Biological membranes and 
associated topics such as the conduction of the nervous impulse [14, 15] will not be discussed. Another omission is 
the interaction of light with biological molecules, because it does not seem that in essence they diverge 
significantly from photochemical processes in general, without excluding the possibility that photons may be 
involved in intercellular signalling. For the same reason, intra- and intermolecular electron transfer phenomena 
have also been omitted. 

To some extent, the division into sections is arbitrary: for example, aspects of biological structure prediction 
(section C2.14.2.2) hinge on kinetic considerations (section C2.14.3.5 and section C2.14.4). Most obviously in a 
developing organism, but actually throughout life, morphogenesis and regulation depend on proteins which are 
being synthesized at definite rates. The essence of enzyme function lies in the dynamical aspect of structure 
(conformational relaxation). Indeed, if one had to name a single principle characterizing biophysical chemistry and 
how it differs from the rest of physical chemistry, it would be the emphasis on kinetics, and an alternative 
definition of biophysical chemistry might be the science of kinetics and pattern. The reason for including pattern, 
i.e. spatial inhomogeneity, will become clear in section C2.14.4.4. 

C2.14.2 BIOLOGICAL STRUCTURE 

Ever since it became apparent that many proteins can refold from a denatured random coil into a fully functional 
enzyme, implying that the amino acid sequence alone encodes the necessary information, this field has been 
divided into two: the experimental determination and the prediction of structure from sequence. It may be that one 
day it will be much easier and quicker to compute structures from sequences (section C2.14.2.2), but at present the 
experimental determination is more reliable. The choice is between classical methods capable of yielding 
three-dimensional coordinates of many or all the atoms in the molecule, although not under in vivo conditions; 
and a panoply of diverse methods capable of lower resolution, or of elucidating only one particular aspect of 
structure, but under physiological conditions. Nowadays it is realized that the determination of structure is a 
problem of inference using data from diverse sources which must be combined, and no one method is universally 
applicable. It is furthermore as well to remember that biological structure is set in a dynamical context, and that the 
characteristic patterns of biopolymer structural fluctuations are probably essential to understanding functional 
mechanisms; Ageno [16] gives an excellent example of the consequences of rapid transitions between different 
conformational states of DNA. 

C2.14.2.1 STRUCTURE DETERMINATION 

The classical methods of determining three-dimensional native structures of biopolymers, i.e. the spatial 
coordinates of all their atoms, are well documented by Cantor and Schimmel [13] and will not be reiterated in any 
detail here. The most comprehensively useful technique is x-ray diffraction (see chapter B1.9), for which the 
material must be prepared in crystalline form. The attainable resolution is strongly dependent on the size and 
quality of the crystal. Crystallization is still a highly empirical art. Intense x-ray sources (synchrotron radiation) 
enable smaller crystals to be used, although the rate of radiation damage is correspondingly faster. Until recently, 
membrane proteins, which may comprise about a third of the expressed protein repertoire of a typical cell, could 
not be crystallized in their native environment and their structure determination using x-rays was problematical; the 
use of cubic phase lipids offers a promising new route [17]. 

The main problem with x-ray (and neutron) diffraction is that the information it is made to yield is essentially 
static. X-ray diffraction generates data on the millisecond time scale, whereas amino acid residues in a protein can 
rotate about 10 times per second, and even proton exchange takes place on a submicrosecond scale. Hence an 
enormous amount of information is averaged out. Some attempts to quantify molecular motility have been made by 
analysing the temperature dependence of the broadening of the Debye-Waller factors [18], but large infrequent 
motions do not perceptibly contribute to the broadening. Since it is precisely these motions which may constitute 
the key part of enzyme action, their invisibility vitiates the structure → function path of inference. An equally 
serious problem is that many, or possibly most, proteins can exist in several stable conformational states and 
therefore possess the ability to remember. This polymorphism, proper characterization of which may be essential to understand the functional 
mechanism of a protein, usually remains undetected by the classical methods, for the simple reason that in the final 
stage of numerically refining the atomic coordinates derived from the diffraction data a computer is programmed to 
find a single optimal structure, not the optimal mix of (unknown) structures, which is probably anyway 
indeterminate from the available information. 

A further problem is the influence of the rather unusual — from the physiological viewpoint — salt conditions 
necessary for crystallization. It should not be presumed that proteins embedded in a crystal are in their most 
common native structure. It is well known that, with the exception of sodium or potassium chloride, which are not 
very useful for inducing crystallization, salts change key protein parameters such as the melting temperature [19]. 

A weakness with x-ray, but not neutron diffraction (although the latter is experimentally more difficult, mainly 
because neutrons are far harder to produce and focus than x-rays) is that the hydrogen atoms are invisible, and their 
positions must be inferred during numerical structure refinement. Given the primordial role of hydrogen bonding in 
determining biological structure, this omission is unfortunate. 

The spatial arrangement of atoms in two-dimensional protein arrays can be determined using high-resolution 
transmission electron microscopy [20]. The measurements have to be carried out in high vacuum, but since the 
method is used above all for investigating membrane proteins, it may be supposed that the presence of the lipid 
bilayer ensures that the protein remains essentially in its native configuration. 

Nuclear magnetic resonance spectroscopy (chapters B1.11, B1.12, B1.13 and B1.14) can provide cross-relaxation 
rates between two proton spins, from which a set of short range (up to ~5 Å) distance constraints can be generated 
[21], from which in turn a three dimensional conformation can be computed [22, 23]. The number of constraints is 
much larger than the number of degrees of freedom in the protein, which partly compensates for the limited 
accuracy of the constraints (uncertainties can be as much as ~1 Å), but renders the computational problem 
exceedingly difficult, comparable to the protein folding problem (section 2.5). The upper limit of protein molecular 
weight is about 200 000. Since the intrinsic time scale of NMR is about 1 ms, as in the case of x-ray diffraction, 
many conformations are averaged out. An attraction of the method is that the protein is not constrained in a crystal, 
but is presumably present in its native structure, although in order to measure signals of adequate intensity, rather 
high protein concentrations have to be used and there is a risk of aggregating the protein. 

Apart from these mainstream methods enabling one to gain a comprehensive and detailed structural picture of 
proteins, which may or may not be in their native state, there is a wide variety of other methods capable of yielding 
detailed information on one particular structural aspect, or comprehensive but lower resolution information while 
keeping the protein in its native environment. One of the earliest of such methods, which has recently undergone a 
notable renaissance, is analytical ultracentrifugation [24], which can yield information on molecular mass and 
hence subunit composition and their association/dissociation equilibria (via sedimentation equilibrium 
experiments), and on molecular shape (via sedimentation velocity experiments), albeit only at solution 
concentrations of at least a few tenths of a gram per litre. 

The new scanning probe microscopies (chapter B1.19) have been used enthusiastically by biologists almost since 
their invention, because biomolecules can be investigated in aqueous, physiological milieux at room temperature. 
Early hopes of using atomic force microscopy to sequence DNA, or scanning tunnelling microscopy to characterize 
individual ion channels, have not been realized, however. While a degree of resolution close to that achievable with 
good electron microscopy has been reported for fairly rigid protein arrays, notably bacteriorhodopsin, biological 
samples are on the whole too sticky and labile to be successfully imaged at submolecular resolution. Even the 
smallest achievable probe-sample interaction forces deform the sample, assuming that it does not slide about the 
surface, and the use of specially fine tips (radius < 1 nm) is vitiated by the rapid accumulation of biological debris 
at the tip. A brighter future lies in the more easily attainable realm of imaging the two-dimensional arrangement of 
proteins at a surface, from which the radial distribution function, a rich source of information on short and long 
range intermolecular interactions, can be obtained. For this application, it is sufficient to image each protein as a 
featureless blob. 
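
As a rough sketch of how a radial distribution function might be extracted once each protein has been reduced to a single pair of coordinates, the following code histograms pair separations and normalizes by the count expected for an ideal (uncorrelated) two-dimensional gas. It assumes, purely for simplicity, a square image of side `box` treated with periodic boundaries and the minimum-image convention; real images have edges and would need an edge correction instead. The function name and parameters are hypothetical.

```python
import math


def radial_distribution_2d(points, box, dr, r_max):
    """Two-dimensional radial distribution function g(r).

    points : list of (x, y) centre coordinates
    box    : side length of the square, periodic image
    dr     : bin width
    r_max  : largest separation considered (should not exceed box / 2)
    Returns a list of g values, one per bin [k*dr, (k+1)*dr).
    """
    n = len(points)
    density = n / (box * box)
    nbins = int(r_max / dr)
    counts = [0] * nbins
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            dx = points[j][0] - xi
            dy = points[j][1] - yi
            dx -= box * round(dx / box)   # minimum-image convention
            dy -= box * round(dy / box)
            k = int(math.hypot(dx, dy) / dr)
            if k < nbins:
                counts[k] += 2            # each pair counts for both members
    g = []
    for k in range(nbins):
        r_lo = k * dr
        r_hi = r_lo + dr
        ring_area = math.pi * (r_hi ** 2 - r_lo ** 2)
        g.append(counts[k] / (n * density * ring_area))
    return g


# Illustrative input: a perfect square lattice with unit spacing
lattice = [(float(i), float(j)) for i in range(10) for j in range(10)]
g_lattice = radial_distribution_2d(lattice, box=10.0, dr=0.5, r_max=2.0)
```

For the lattice, g(r) vanishes below the nearest-neighbour distance and peaks in the bin containing the first and second neighbour shells; for a disordered layer the positions and heights of such peaks carry the information on intermolecular interactions referred to above.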

Circular dichroism has been a useful servant to the biophysical chemist since it allows the non-invasive 
determination of secondary structure (α-helices and β-sheets) in dissolved biopolymers. Due to the dissymmetry of 
these structures (containing chiral centres) they are biaxial and show circular birefringence. Circular dichroism is 
the Kramers-Kronig transformation of the resulting optical rotatory dispersion. The spectral window useful for 
distinguishing between α-helices and so on lies in the region 200-250 nm and hence is masked by certain salts. The 
method as usually applied is only semi-quantitative, since the measured optical rotations also depend on the exact 
amino acid sequence. 

Another technique used for structural inference is dielectric dispersion in the frequency [25] or time [26] domains. 
The biopolymer under investigation must have a permanent dipole moment $\mu_0$. It is first dissolved in a 
dielectrically inert solvent, e.g. octanol, which may be considered to bear some resemblance to a biological lipid 
membrane, and then the complex impedance $\varepsilon^* = \varepsilon' + i\varepsilon''$ is measured over a range of frequencies $f$, typically from a 
few kHz up to several tens of MHz. The dielectric dispersion arises through the rotational relaxation of the 
molecules. One or more Debye relaxation functions: 

$$\varepsilon' = \varepsilon_\infty + \frac{\Delta\varepsilon_0}{1 + (f/f_0)^2} \qquad \text{(C2.14.1)}$$

and 

$$\varepsilon'' = \frac{\Delta\varepsilon_0\,(f/f_0)}{1 + (f/f_0)^2} \qquad \text{(C2.14.2)}$$

where $\Delta\varepsilon_0$ is the dielectric relaxation amplitude (related to the square of the permanent dipole moment $\mu_0$ [25]), are 
fitted to the data in order to determine the relaxation frequency $f_0$, which is related to the rotational friction 
coefficient and hence to the shape of the molecule. Water contributes significantly to the impedance spectrum and 
its contribution must be carefully assessed and eliminated. 
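
The single Debye relaxation of equations (C2.14.1) and (C2.14.2) can be sketched numerically as follows. This is only an illustration: the function name and the parameter values are hypothetical, and a real analysis would fit the parameters to measured spectra rather than read the relaxation frequency off a grid. The sketch relies on the fact that the loss term $\varepsilon''$ peaks at $f = f_0$.

```python
import math

def debye_permittivity(f, eps_inf, delta_eps0, f0):
    """Return (eps', eps'') for one Debye relaxation, eqs (C2.14.1)-(C2.14.2)."""
    x = f / f0
    denom = 1.0 + x * x
    return eps_inf + delta_eps0 / denom, delta_eps0 * x / denom

# Hypothetical parameters, chosen only for illustration.
EPS_INF, DELTA_EPS0, F0 = 3.0, 2.0, 1.0e6

# Logarithmic frequency grid from 1 kHz to 100 MHz.
freqs = [10 ** (3 + 0.01 * k) for k in range(501)]
loss = [debye_permittivity(f, EPS_INF, DELTA_EPS0, F0)[1] for f in freqs]

# The loss peak locates the relaxation frequency f0.
f_peak = freqs[loss.index(max(loss))]
```

With several overlapping relaxations the peaks merge, which is why the text speaks of fitting one or more Debye functions rather than locating maxima.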

Careful measurement of the kinetics of association of a molecule with a surface can also yield structural 
information at this level of resolution [27], and lateral clustering and crystallization can also be deduced. This is 
described in more detail in section C2.14.7.2. 

Another method applicable to interfaces is the determination of the partial molecular area a of a biopolymer 
partitioning into a lipid monolayer at the water-air interface using the Langmuir trough [28]. The first step is to 
record a series of pressure $\pi$-area ($A$) isotherms with different amounts $n$ of an amphiphilic biopolymer spread at 
the interface. A particular surface pressure is then chosen, and $n$ plotted against the values of $A$ at that pressure. From the 
conservation of mass, we must have 

$$n = \Gamma A + c_b V \qquad \text{(C2.14.3)}$$

where $\Gamma$ is the surface concentration and $c_b$ the concentration in the subphase bulk of volume $V$. Such plots yield 
straight lines from which $\Gamma$ and $c_b$ can be determined using equation (C2.14.3). A plot of $\Gamma$ (from which $a$ can be 
deduced) versus $n$ can then be constructed; typically such a plot has several features which can be assigned to 
conformational transitions. 
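
The straight-line analysis can be sketched as below, reading equation (C2.14.3) as $n = \Gamma A + c_b V$, so that the slope of $n$ against $A$ is the surface concentration and the intercept is $c_b V$. The helper names and the synthetic data are hypothetical and serve only to show the arithmetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def surface_and_bulk_concentration(areas, amounts, subphase_volume):
    """Recover (Gamma, c_b) from n = Gamma * A + c_b * V at one surface pressure."""
    slope, intercept = fit_line(areas, amounts)
    return slope, intercept / subphase_volume

# Synthetic, noise-free data (assumed values, SI units).
GAMMA_TRUE, CB_TRUE, VOLUME = 2.0e-6, 1.0e-5, 1.0e-4
areas = [0.010, 0.015, 0.020, 0.025]
amounts = [GAMMA_TRUE * a + CB_TRUE * VOLUME for a in areas]

gamma_fit, cb_fit = surface_and_bulk_concentration(areas, amounts, VOLUME)
```

Repeating the fit at each chosen surface pressure gives the set of surface concentrations from which the partial molecular area is deduced.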

A little known structural method for investigating protein motions is to measure the Rayleigh scattering of 
Mössbauer radiation (RSMR) [29]. A Mössbauer source moving with velocity $\pm v$ irradiates the sample, and both 
elastically and inelastically scattered radiation are measured as a function of the scattering angle $\theta$. The energy 
spectrum (resonant peak position and linewidth) and the fraction of elastic scattering embody information on the 
dynamics of the protein. These measurements are especially valuable for characterizing the main types of 
movements in a protein, namely: 

1. solid state motions (amplitudes $A_1 \sim$ 0.1-0.2 Å and correlation times $\tau_1 \sim$ 10 - 10 s); 

2. large scale individual motions of small groups of atoms (amplitudes $A_2$ up to 0.5 Å and correlation times 
$\tau_2 \sim 10^{-11}$-$10^{-9}$ s); 

3. complex cooperative motions of larger domains (amplitudes $A_3$ up to 1 Å and correlation times $\tau_3 \sim$ 10 
-$10^{-7}$ s). 

C2.14.2.2 THE PREDICTION OF STRUCTURE 

A native protein is folded from a linear chain comprising anything from about 30 to 2000 amino acids, mainly 
chosen (with differing probabilities) from a set of twenty or so different ones. The prediction of the stable, 
three-dimensional structure (or structures) of a biopolymer is a horrendously difficult problem. It is not even known if 
the stable structure corresponds to the global energy minimum; even if it does, the calculation of this minimum, of 
a rough energy surface with countless local minima, is an extremely difficult optimization problem. In any case, 
realistic estimates of the time needed to search through all possible conformations would exceed the lifetime of the 
universe, whereas proteins are known to fold spontaneously within typically a few seconds. This is sometimes 
referred to as the 'Levinthal paradox'. 
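
The scale of the paradox can be illustrated with a rough estimate. All numbers below are assumptions chosen for illustration and do not come from the text: 100 residues, three conformations per residue, and a sampling rate of $10^{13}$ conformations per second.

```python
def exhaustive_search_time(n_residues, states_per_residue, samples_per_second):
    """Seconds needed to visit every conformation once (assumed independent residues)."""
    return states_per_residue ** n_residues / samples_per_second

# Approximate age of the universe in seconds (about 13.8 billion years).
AGE_OF_UNIVERSE_S = 4.35e17

# Assumed illustrative values: 100 residues, 3 states each, 1e13 samples per second.
search_time_s = exhaustive_search_time(100, 3, 1e13)
ratio_to_universe_age = search_time_s / AGE_OF_UNIVERSE_S
```

Even with these modest assumptions the exhaustive search outlasts the universe by many orders of magnitude, whereas folding takes seconds.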

An excellent account of the statistical physics of polymer chains, with some consideration of biological 
macromolecules, is given by Lifschitz et al [30]. Much recent work on the protein folding problem seems to have 
been inspired by the concept of frustration in spin glasses [31] (whether the analogy is deep or superficial remains 
to be seen), and hence it has been proposed that proteins fold on a rough energy landscape [32], which implies 
certain generic features of the folding pathway, but whether these are sufficient to solve the folding problem is 
unclear. Perhaps the most delicate issue is the relative importance of local versus nonlocal (i.e. between residues 
distant from each other along the polypeptide chain) interactions [33]. From a practical viewpoint this approach has 
not led to a successful algorithm for predicting structure. It is a mark of the lack of progress that much attention is now being 
devoted to expert algorithms based on the (now quite large) data bases of known structures and the sequences 
specifying them; they merely compare a new sequence of unknown structure with extant sequences whose 
structures are known. Of course a causal basis is lacking and no fundamental insight into the underlying 
mechanisms governing folding is gained. 

A different approach is based on the realization that equilibrium thermodynamics cannot dictate a sequence of 
events under time constraints unless the contributions to the thermodynamic potential themselves represent kinetic 
parameters [34]. Life as a whole is well characterized as a thermodynamic system operating under kinetic 
constraints; individual life is essentially transient, and metastable structures are nearly always good enough. This 
naturally leads to viewing expedience rather than equilibrium as the driving principle of folding, and the 
preeminence of the Lagrangian $\mathcal{L}$ ($= T - V$ for conservative systems, where $T$ and $V$ are respectively the kinetic and 
potential energies), rather than the Hamiltonian $\mathcal{H}$ ($= T + V$). Minimization of the action (the integral of $\mathcal{L}$) is an 
inerrant principle for finding the correct solution of a dynamical problem [35]; the difficulty resides in the fact that 
there is no general recipe for constructing $\mathcal{L}$. 

A solution leading to a successful algorithm was recently found for the folding of ribonucleic acid (RNA) [36]. 
Natural RNA polymers (figure C2.14.1) are mainly made up from four different 'bases', A, C, G and U. As with 
DNA, multiple hydrogen bonding favours the formation of G-C and A-U pairs [16, 37, 38] which leads to the 
appearance of certain characteristic structures. Loop closure is considered to be the most important folding event. 

Figure C2.14.1. Diagram of a fragment of a folded RNA polymer, the Qβ replicase MDV-1 [176]. Note the 
various structural features: stems closed with a loop ('hairpins'), bows, and single strands. 

$V$ (the potential) is identified with the enthalpy, i.e. the number $n$ of base pairings (contacts), and $T$ corresponds to 
the entropy. At each stage in the folding process, as many as possible new favourable intramolecular interactions 
are formed, while minimizing the loss of conformational freedom (the principle of sequential minimization of 
entropy loss, SMEL). The entropy loss associated with loop closure is $\Delta S_{\text{loop}}$ (and the rate of loop closure $\sim \exp(\Delta S_{\text{loop}})$); 
the function to be minimized is $\exp(-\Delta S_{\text{loop}}/R)/n$ [36]. A quantitative expression for $\Delta S_{\text{loop}}$ can be 
found by noting that the $N$ monomers in an unstrained loop ($N > 4$) have essentially two possible conformations, 
pointing either inwards or outwards. For loops smaller than a critical size $N_0$, the inward ones are in an apolar 
environment, since the enclosed water no longer has bulk properties, and the outward ones are in polar bulk water; 
hence the electrostatic charges on the ionized phosphate moieties of the bases will tend to point outwards. For 
$N < N_0$, $\Delta S_{\text{loop}} = -RN \ln 2$, and for $N > N_0$, the Jacobson-Stockmayer approximation based on excluded volume 
yields $\Delta S_{\text{loop}} \sim R \ln N$. 

In the case of proteins, it is advantageous to make use of the fact that not all combinations of the two dihedral 
angles (the only degrees of freedom of the polypeptide chain) specifying the orientations of the Cα-C and Cα-N 
bonds are permitted [39]. Essentially there are just three basins of attraction, corresponding to left- and 
right-handed α-helices (compact conformations) and the β-sheet (extended conformation). Consensus sequences (runs of 
amino acid residues whose local conformations fall in the same basin) result in persistent, ultimately global 
structure being built up (the folding problem can be viewed as fixing the relation between local and global 
structures). As with RNA, the barriers are entropic, determined solely by loop closure (except where contacts have 
to be disassembled), and the SMEL principle applies. This approach has been successfully used to fold bovine 
pancreatic trypsin inhibitor [40]. 


C2.14.2.3 BIOMOLECULAR ASSEMBLIES 

The modern era of biochemistry and molecular biology has been shaped not least by the isolation and 
characterization of individual molecules. Recently, however, more and more polyfunctional macromolecular 
complexes are being discovered, including nonrandomly codistributed membrane-bound proteins [41]. These are 
made up of several individual proteins, which can assemble spontaneously, possibly in the presence of a lipid 
membrane or an element of the cytoskeleton [42], which are themselves supramolecular complexes. Some of these 
complexes, e.g. snail haemocyanin [43], are merely assembled from a very large number of identical subunits; 
viruses are much larger and more elaborate; and we are still some way from understanding the processes 
controlling the assembly of the wonderfully intricate and beautiful structures responsible for the iridescent colours 
of butterflies and moths [44]. 

Specific intramolecular interactions (section C2.14.6) can be expected to play a role in the spontaneous assembly of 
the constituent elements, so that simply mixing the constituents will result in a correctly assembled structure 
provided the environment is right. One wonders whether stigmergic building algorithms [45] are involved, in which 
individual elements communicate only through the local environment. If one observes a nest of ants after a 
disturbance, the impression is one of haphazard movement as the ants drag exposed eggs hither and thither, but 
within a very short time they have all been moved to safety, without any hierarchical command system in 
operation. 

C2.14.3 BIOLOGICAL EQUILIBRIUM 

Ergodicity is generally broken in biological systems, and hence the standard notion of equilibrium is not very 
useful for solving biological problems; as already mentioned in section C2.14.2.2, most living systems operate 
under time constraints. Processes are therefore not infinitesimally slow and perfectly reversible: an organism is 
willing to sacrifice free energy in order to ensure that events take place rapidly and are consequently irreversible. 

C2.14.3.1 TRANSFER AND STORAGE OF CHEMICAL POTENTIAL 

Many biochemical reactions are involved in converting and storing energy, and the primary consideration is the 
chemical potential at which the product is recovered, rather than the yield. Consider the simple reaction 


$$\mathrm{A} \rightleftharpoons \mathrm{B}. \qquad \text{(C2.14.4)}$$

The rate of storage of chemical potential (in other words, the power $P$ carried by the chemical reaction) is $j\mu_B$, 
where $j = \mathrm{d}b/\mathrm{d}t = J_f - J_b$, the net flux per unit volume (here, as elsewhere, lower case letters denote concentrations, 
and subscripts f and b refer to the forward and backward reactions). If A and B are at equilibrium, $j = 0$; if the 
forward reaction proceeds at a finite rate, then $\mu_B < \mu_A$, but if $j$ goes to its maximum value, $j = J_f$ (i.e. $J_b = 0$), 
no chemical potential is stored. At what intermediate flux is $P$ optimal [46]? The change in chemical 
potential is given by the van't Hoff isotherm 

$$\Delta\mu = \mu_B - \mu_A = -RT\left[\ln K - \ln(b^*/a^*)\right] \qquad \text{(C2.14.5)}$$

where the stars serve as a reminder that the concentrations should be multiplied by activity coefficients, assumed to 
equal unity in the following discussion. Provided one is allowed to assume that the equilibrium constant $K = k_f/k_b$, 
a valid assumption if Boltzmann equilibrium is maintained in the steady state and hence a single rate coefficient 
correctly characterizes the reaction, then 

$$\Delta\mu = RT \ln\frac{J_b}{J_f} = RT \ln\left(1 - \frac{j}{J_f}\right) \qquad \text{(C2.14.6)}$$


and the efficiency of free energy transfer is $(\mu_A + \Delta\mu)/\mu_A$. Writing $P$ as $j(\mu_A + \Delta\mu)$, it is a simple matter to 
substitute in expression (C2.14.6) for $\Delta\mu$, differentiate with respect to $j$ and set the derivative to zero, obtaining: 

$$-\ln\left(1 - \frac{j_{\max}}{J_f}\right) + \frac{j_{\max}/J_f}{1 - j_{\max}/J_f} = \frac{\mu_A}{RT}. \qquad \text{(C2.14.7)}$$

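
The optimum condition has no closed-form solution for the flux, but it is easily solved numerically. The sketch below assumes the reading $-\ln(1 - x) + x/(1 - x) = \mu_A/RT$ of equation (C2.14.7), with $x = j_{\max}/J_f$, which follows from differentiating $P = j[\mu_A + RT\ln(1 - j/J_f)]$ with respect to $j$. The function names are hypothetical, and bisection is used because the left-hand side increases monotonically from 0 on the interval $0 < x < 1$.

```python
import math

def optimal_flux_fraction(mu_a_over_rt, tol=1e-12):
    """Solve -ln(1-x) + x/(1-x) = mu_A/RT for x = j_max/J_f by bisection."""
    if mu_a_over_rt <= 0:
        raise ValueError("mu_A/RT must be positive")

    def residual(x):
        return -math.log(1.0 - x) + x / (1.0 - x) - mu_a_over_rt

    lo, hi = 0.0, 1.0 - 1e-15  # residual is negative at lo and positive at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def stored_power(x, mu_a_over_rt):
    """Dimensionless power P/(J_f R T) = x * (mu_A/RT + ln(1 - x))."""
    return x * (mu_a_over_rt + math.log(1.0 - x))

def transfer_efficiency(x, mu_a_over_rt):
    """Efficiency (mu_A + delta_mu)/mu_A at flux fraction x."""
    return 1.0 + math.log(1.0 - x) / mu_a_over_rt
```

For example, `optimal_flux_fraction(10.0)` returns the flux fraction at which the stored power peaks when $\mu_A = 10RT$ (an arbitrary illustrative value), and `transfer_efficiency` then gives the corresponding efficiency.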
Once the reaction is accomplished, it is usually desirable to store the product, i.e. a further reaction 

$$\mathrm{B} \rightleftharpoons \mathrm{B}_{\text{store}} \qquad \text{(C2.14.8)}$$

must follow (C2.14.4). Since all the reactions are reversible, at first sight it would appear impossible to store B 
indefinitely: the back reaction out of the store could only be prevented by an infinitely high energy barrier. To 
avoid this, the store is fitted with a door; only when this door is closed is the barrier infinitely high. Furthermore, 
the store has a variable volume such that the chemical potential is constant and independent of the amount of 
material contained. 

When the door is open, the optimal net flux into the store is $j_{\max}$, given by equation (C2.14.7). It may be that the 
stochastically gated diffusion treated by Szabo et al [47] (see also [48]) is a good representation of typical biological 
storage reactions (C2.14.8). 

C2.14.3.2 'MISSING ENTROPY' 

The calorimetrically measured $\Delta H$ is usually assigned to the formation and breaking of chemical bonds. The 
equation 

$$-RT \ln K = \Delta H - T\Delta S \qquad \text{(C2.14.9)}$$

linking enthalpy and entropy is, as has been aptly pointed out [49], 'as infallible as the laws of thermodynamics'; 
equally infallible statements can be generated by adding, pace Hall and Knight, some arbitrary quantity $\xi$ to both 
terms on the right-hand side, which becomes $(\Delta H + \xi) - (T\Delta S + \xi)$. As Weber has pointed out [50], the separation of 
$\Delta H$ and $\Delta S$ depends upon specific hypotheses relating them, among which there is no prima facie unique choice, 
and one may legitimately enquire whether a proposed choice of $\xi$ is appropriate. As hinted at by Planck [51], $\Delta H$ is 
a composite quantity comprising (a) the chemical bonding energy, and (b) the integral of the specific heat 
increments $\Delta C_p$ (defined as $\mathrm{d}Q/\mathrm{d}T$ in a reversible change, where $Q$ is the heat), i.e. $\int_0^T \Delta C_p\,\mathrm{d}T$, obtainable by 
painstaking measurements of $\Delta C_p$ from 0 K to $T$. For ideal gases and other small entities, this integral is practically 
zero and can be neglected, but for biological macromolecules it could well be more significant than the chemical 
bonding energy. Since $C_p = (\partial H/\partial T)_p$, this assertion can easily be verified by enquiring whether $\Delta H$ varies with 
temperature for the reaction under consideration [52, 53]. Since $\mathrm{d}Q = T\,\mathrm{d}S$ in a reversible change, equation 
(C2.14.9) can be rewritten as [49] 

$$-RT \ln K = \Delta H_0 - \Delta W \qquad \text{(C2.14.10)}$$

where the subscript stands as a reminder that this is the (defined) heat of reaction at 0 K, and 

$$\Delta W = T\int_0^T \frac{\Delta C_p}{T}\,\mathrm{d}T - \int_0^T \Delta C_p\,\mathrm{d}T \qquad \text{(C2.14.11)}$$

represents the work obtainable by conversion of 'thermal' heat expendable in the separation of chemically bonded 
atoms [49]. At all temperatures above absolute zero, the integral $\int_0^T \Delta C_p\,\mathrm{d}T$ must be subtracted from the measured 
enthalpy in order to obtain the true heat of reaction. 
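
As a purely illustrative check of equations (C2.14.9) to (C2.14.11), suppose that the specific heat increment is linear in temperature, $\Delta C_p = aT'$ with constant $a$. This functional form is an assumption made here for simplicity, not a claim about any real macromolecule. The measured enthalpy then varies with temperature while $\Delta H_0$ does not:

```latex
\begin{align*}
\int_0^T \Delta C_p \,\mathrm{d}T' &= \int_0^T aT'\,\mathrm{d}T' = \tfrac{1}{2}aT^2, \\
T\int_0^T \frac{\Delta C_p}{T'}\,\mathrm{d}T' &= T\int_0^T a\,\mathrm{d}T' = aT^2, \\
\Delta W &= aT^2 - \tfrac{1}{2}aT^2 = \tfrac{1}{2}aT^2, \\
\Delta H &= \Delta H_0 + \tfrac{1}{2}aT^2, \qquad T\Delta S = aT^2, \\
\Delta H - T\Delta S &= \Delta H_0 - \tfrac{1}{2}aT^2 = \Delta H_0 - \Delta W .
\end{align*}
```

The last line reproduces the right-hand side of equation (C2.14.10), and the term $\tfrac{1}{2}aT^2$ is the part of the measured enthalpy that does not correspond to chemical bonding.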

This argument was pointed out almost 30 years ago by Benzinger [49], in a paper which referred to some still 
earlier work of his, and yet its implications, even more pertinent today given the wider use of calorimetry in 
molecular biology, still appear to be largely ignored, an exception being Weber's work on the association enthalpy 
of protein subunits [50]. 

The 'missing entropy' $\int_0^T (\Delta C_p/T)\,\mathrm{d}T$ is proportional to the number of possible states. In typical biological 
macromolecules and also other nonergodic materials such as glasses and spin glasses, disorder is quenched: only 
one of the very large number of possible realizations occurs, and Nernst's third law is violated [54]. 

The entropy of a solution is itself a composite quantity comprising: (i) a part depending only on the amount of 
solvent and solute species, and independent from what they are, and (ii) a part characteristic of the actual species 
(A, B, . . .) involved (equal to zero for ideal solutions). These two parts have been denoted respectively cratic and 
unitary by Gurney [55]. At extreme dilution, (ii) becomes more or less negligible, and only the cratic term remains, 
whose contribution to the free energy of mixing is 

$$-kT \ln W = kT\,(n_A \ln x_A + n_B \ln x_B + \cdots) \qquad \text{(C2.14.12)}$$

where the $n_A$, $x_A$ etc represent the numbers and mole fractions of A, B, .... 

C2.14.3.3 COOPERATIVE EFFECTS IN BINDING 

Switching and control ('signal transduction') in biological systems as elsewhere usually strives to achieve a highly 
nonlinear response, which inter alia confers a certain immunity from noise onto the system. Cooperative binding is 
an easy way to achieve this end. Most signalling in biology is based on the binding of a ligand L to an unoccupied 
site S on a receptor B: 

$$\mathrm{L} + \mathrm{S(B)} \rightleftharpoons \mathrm{C}. \qquad \text{(C2.14.13)}$$

The ligand-receptor complex C has changed properties which typically allow it to undergo further, previously 
inaccessible reactions (e.g. binding to a DNA promoter sequence). The role of L is to switch B from one of its 
stable conformational states to another. The approximate equality of the intramolecular, molecule-solvent and L-B 
binding energies is an essential feature of such biological switching reactions. An equilibrium binding constant K 
is defined according to the law of mass action: 

$$K = \frac{c}{sl}. \qquad \text{(C2.14.14)}$$

If there are $n$ independent binding sites per receptor, conservation of mass dictates that $s = nb_0 - c$, where $b_0$ is the 
total concentration of B, and the binding ratio $r = c/b_0$ (number of bound ligands per biopolymer) becomes 

$$r = \frac{nKl}{1 + Kl}. \qquad \text{(C2.14.15)}$$


Suppose now that the sites are not independent, but that addition of a second (and subsequent) ligand next to a 
previously bound one (characterized by an equilibrium constant $K_q$) is easier than the addition of the first ligand. In 
the case of a linear receptor B, the problem is formally equivalent to the one-dimensional Ising model of 
ferromagnetism, and neglecting end effects, one has [56]: 

$$r = \frac{n}{2}\left\{1 - \frac{1 - K_q l}{\left[(1 - K_q l)^2 + 4K_q l/q\right]^{1/2}}\right\} \qquad \text{(C2.14.16)}$$

where the degree of cooperativity $q$ is determined by the ratio of the equilibrium constants, $q = K_q/K_1$. For $q > 1$ 
this yields a sigmoidal binding isotherm. Another interesting case, also yielding a sigmoidal relation between $r$ and 
$l$, is represented by the uncharged oligopeptide alamethicin which partitions as a monomer into bilayer lipid 
membranes and aggregates within the membrane [57]. 

If q < 1, then binding is anticooperative, for example when an electrically charged particle adsorbs at an initially 
neutral surface; the accumulated charge repels subsequent arrivals and makes their incorporation more difficult 
[58]. 
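
The two isotherms can be compared numerically. The sketch below assumes the readings $r = nKl/(1 + Kl)$ for equation (C2.14.15) and $r = (n/2)\{1 - (1 - K_q l)/[(1 - K_q l)^2 + 4K_q l/q]^{1/2}\}$ for equation (C2.14.16); the function names are hypothetical. A useful consistency check is that the cooperative form reduces to the independent-site form when $q = 1$.

```python
import math

def independent_binding_ratio(ligand_conc, K, n=1):
    """Binding ratio r for n independent sites (assumed form of eq. C2.14.15)."""
    return n * K * ligand_conc / (1.0 + K * ligand_conc)

def cooperative_binding_ratio(ligand_conc, K_q, q, n=1):
    """Binding ratio r for a linear receptor with nearest-neighbour cooperativity
    (assumed form of eq. C2.14.16, end effects neglected)."""
    s = K_q * ligand_conc
    root = math.sqrt((1.0 - s) ** 2 + 4.0 * s / q)
    return 0.5 * n * (1.0 - (1.0 - s) / root)
```

For large $q$ the transition around $K_q l = 1$ sharpens: occupancy stays low below that point and approaches saturation just above it, which is the nonlinear, noise-resistant response described in the text.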

Another important noise-reduction mechanism is to incorporate a threshold into the responsive apparatus. 
Essentially this is why antibodies are multidentate, why serial triggering of antigen-presenting cells [59] is 
necessary, and so on. The benefits of a response threshold $T$ have been thoroughly investigated in the context of 
radiation detectors [60], and the argument can be adapted to biological detectors. Suppose that $L$ ligands are 
incident on an area containing $R$ receptors. The number arriving at any particular receptor will fluctuate around 
$\lambda = L/R$, the mean number of ligands per receptor, and assuming a Poisson distribution for these fluctuations, the 
expected number $p$ of activated receptors (i.e. those receiving $T$ or more ligands) is $fR$, where 

$$f = 1 - e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \cdots + \frac{\lambda^{T-1}}{(T-1)!}\right). \qquad \text{(C2.14.17)}$$

$p$ is binomially distributed and its standard deviation $\sigma$ is $[Rf(1 - f)]^{1/2}$. The least detectable signal $L'$ must exceed 
$L$ by a certain amount; let us suppose that the least detectable increment is $\sigma$ (the argument remains unchanged if 
some other multiple of $\sigma$ is taken) and it is given by the solution of 

$$Rf' = Rf + \sigma \qquad \text{(C2.14.18)}$$

where $f'$ is given by equation (C2.14.17) with $\lambda = L'/R$. As an example, suppose that $R = 200$ and background $L = 
400$. If $T = 1$, then 79 additional ligands are needed to engender a response; but if $T = 2$, only 53 are required. The 
optimal threshold depends on the expected background level. Note that the most perfect possible detector is still 
subject to a basic limitation due to the inherently fluctuating nature of the input; in the case of a Poisson process 
$\sigma = \sqrt{L} = 20$ ligands would be the smallest detectable increment. 
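
The activated fraction of equation (C2.14.17) is simply the upper tail of a Poisson distribution, and can be evaluated as in the following sketch. The function names are hypothetical, and the noise term assumes the binomial reading $\sigma = [Rf(1 - f)]^{1/2}$. The sketch does not attempt to reproduce the worked figures quoted above, which depend on how the detection criterion is applied.

```python
import math

def activated_fraction(mean_ligands, threshold):
    """f = P(k >= T) for Poisson-distributed arrivals with mean lambda."""
    term = 1.0      # lambda**k / k!, starting at k = 0
    partial = 0.0   # running sum of terms for k = 0 .. T-1
    for k in range(threshold):
        partial += term
        term *= mean_ligands / (k + 1)
    return 1.0 - math.exp(-mean_ligands) * partial

def activation_noise(n_receptors, fraction):
    """Standard deviation of the number of activated receptors (binomial)."""
    return math.sqrt(n_receptors * fraction * (1.0 - fraction))
```

For instance, `activated_fraction(2.0, 1)` evaluates $1 - e^{-2}$ and `activated_fraction(2.0, 2)` evaluates $1 - 3e^{-2}$, corresponding to $\lambda = 2$ with thresholds of one and two ligands.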

C2.14.3.4 THE EXPERIMENTAL DETERMINATION OF ASSOCIATION (BINDING) CONSTANTS 


The two main difficulties facing the experimenter are (i) how to detect binding, and (ii) how to ensure that the 
system under investigation is truly in an equilibrium state. 

(i) Typically an atom or group of atoms is selected as a reporter whose measured property (e.g. intensity of a 
particular Raman line) is characteristic of the state of binding. One of the most popular reporters is the 
photoluminescence intensity at a certain wavelength, since it can be very easily measured using a commercial 
fluorimeter. Sometimes the intrinsic photoluminescence of an amino acid (tryptophan, tyrosine) can be used, but 
quantum yields are low and the sample has to be excited in the far ultraviolet. Hence a fluorescent group is often 
covalently bound to one of the reaction partners, although this may drastically change its binding properties, an 
obvious caveat all too often overlooked. Such assays have to be calibrated, typically by measuring the 
photoluminescence (or other property) at binding saturation [52], although since true saturation requires one partner 
to be in infinite excess more or less ingenious extrapolation procedures are required. Once this is done, the 
relationship linking photoluminescence intensity with intermediate degrees of binding must be established. These 
steps are not trivial and usually end up relying on assumptions (e.g. of linearity) which are far from 
incontrovertible. 

The easiest way to accomplish (ii) is to dilute the supposedly equilibrium bound state and check that the predicted 
degree of dissociation takes place on the time scale of the experiment. This is often inconvenient in homogeneous 
assays, however. Another check is to incubate the ligand and receptor for different times t: invariance of c with t 
would be evidence of equilibrium having been reached, provided that the range of t has been chosen judiciously. 
One may also compare the amount bound after adding successive small increments of ligand L to receptor B with 
the amount bound after having added L to B in a single large increment: if the system is in thermodynamic 
equilibrium, c should be path-independent. 

The development of extremely sensitive microcalorimeters has popularized the 'direct' determination of reaction 
enthalpy $\Delta H$ simply by measuring the heat evolved upon mixing L and B. The free energy is determined from the 
equilibrium constant K deduced from fitting a plot of r versus l to an appropriate isotherm, such as equation 
(C2.14.15). No labelling is required, but it is often awkward to properly establish the reversibility of the reaction, 
and the validity of the evaluation of K depends on the binding model assumed. A graver deficiency is the neglect of 
specific heat effects (section C2.14.3.2), manifested by temperature-dependent 'standard' reaction enthalpies. Here, 
as elsewhere, the very ease of experimentation can lead the investigator into error, and microcalorimetry is no 
exception. 

Great interest has recently developed in heterogeneous systems in which B is immobilized to a solid surface 
and ligand binding measured directly, e.g. using a quartz crystal microbalance (QCM) or an optical method such as 
optical waveguide lightmode spectroscopy (OWLS), ellipsometry or surface plasmon resonance (SPR) [61], i.e. the 
solid surface plays a dual role as both receptor and sensing platform. When carried out properly neither labelling of 
the participating molecules nor calibration of the response is required, and direct measurement of the reverse 
reaction can be accomplished with ease. These heterogeneous methods are discussed in more detail in section 
C2.14.7.2 . 

Attempts have been made to combine the sensitive detection possibilities of heterogeneous systems with the more 
familiar (to non-electrochemists) homogeneous ones by recreating a quasihomogeneous environment between the 
solid-liquid interface proper (the sensing platform) and the bulk liquid phase by interposing a hydrogel (e.g. 
carboxydextran) fixed to the solid phase. The receptors are covalently attached to the hydrogel scaffold. A 
drawback to this procedure is that it has been found empirically that association kinetics measured with such 
hydrogels do not correspond to those expected from truly homogeneous systems, the reason being that mass 
transport within the hydrogel is sufficiently drastically retarded that binding becomes diffusion limited for all but 
the slowest reactions [4]. Furthermore, upon dissociation the ligand has an extremely high probability of rebinding 
before it can diffuse away from the receptor (see also section C2.14.6.3). Hence binding parameters derived from 
the kinetics do not reflect the true chemical affinity of the receptor for its ligand. This is apart from the fact that 
covalent immobilization to the hydrogel may inactivate the receptors, e.g. by involving amino acids on an epitope 
or at the active site. A good way to avoid these difficulties is to anchor B to a lipid bilayer [62]. 


C2.14.3.5 CONFORMATIONAL RELAXATION 

Ageno [63], Blumenfeld [64] and possibly others have emphasized that biological systems are constructions: a 
living cell is much closer to a mechanical clock than to a bowl of consommé. To characterize the latter, a statistical 
approach is adequate, in which the motions of an immense number of individual particles are subsumed into a few 
macroscopic parameters such as temperature and pressure. But one does not usually need to know the pressure 
when analysing the working of a clock. The energy contained in a given system can be divided into two categories: 
(a) the multitude of microscopic or thermal motions sufficiently characterized by the temperature and (b) the 
(usually small number of) macroscopic, highly correlated motions, whose existence turns the construction into a 
machine. The total energy 
contained in the microscopic degrees of freedom may be far larger than those in the macroscopic ones, but 
nevertheless the microscopic energy can usually be successfully neglected in the analysis of a construction. In 
informational terms, the macrostates are remembered, but the microstates are not [65]. 

The question then is, to what degree can the microscopic motions influence the macroscopic ones: is there a flow of 
information between them [66]? Biological systems appear to be nonconservative par excellence and present at 
least the possibility that random thermal motions are continuously injecting new information into the macroscales. 
There is certainly no shortage of biological molecular machines for turning heat into correlated motion (e.g. [67] 
and section C2.14.5; note also [16]). 

A construction makes use of only an insignificant fraction of the Gibbs canonical ensemble and hence is essentially 
out of equilibrium. This is different from thermodynamic nonequilibrium — it arises because the system is being 
investigated at time scales much shorter than those required for true statistical equilibrium. Such systems exhibit 
'broken ergodicity' [68], as epitomized by a cup of coffee in a closed room to which cream is added and then 
stirred. The cream and coffee equilibrate within a few seconds (during which vast amounts of microinformation are 
generated within the whorled patterns); the cup attains room temperature within tens of minutes; and days may be 
required for the water in the cup to saturate the air in the room. 

Broken ergodicity may be regarded as a generalization of broken symmetry, a concept introduced by Landau (see 
[69]) in the context of phase transitions, and which leads to a new thermodynamic quantity, the order parameter $\xi$, 
whose value is zero in the symmetrical phase. $\xi$ may be thought of as conferring a kind of generalized rigidity on a 
system [70], allowing an external force applied at one point to be transferred to another. Some protein molecules 
demonstrate this very clearly: flash photolysis of oxygenated haemoglobin causes motion of the iron core of the 
haem which results in (much larger) movement at the distant intersubunit contacts, leading ultimately to an overall 
change in the protein conformation involving hundreds of atoms. 

In the case of enzymatic catalysis, it has been proposed that when substrate binds to the active site, local fast 
vibrational relaxation takes place on the picosecond time scale, but the active site is no longer in equilibrium with 
the rest of the molecule and the resulting strain modifies the energy surface on which the enzymatic reaction takes 
place [71, 72]. Subsequent conformational relaxation involves making and breaking a multiplicity of weak bonds, 
but at a slower rate than the reaction being catalysed. This description implies a definite and striking prediction: the 
reaction rate should exhibit an inverse Arrhenius temperature dependence, because increasing the temperature 
accelerates conformational relaxation, and hence shortens the time during which the strained molecule is able to 
accelerate the enzymatic reaction. Evidence for this mechanism is provided by the pulsed photolysis of 
carbonmonoxy (relaxed, R) haemoglobin at 532 nm. Time-resolved resonance Raman spectroscopy of aromatic 
amino acid residues associated with the intersubunit contact show that a strained tense (T) conformation 
(characterized by the tyr α42-asp β99 intersubunit hydrogen bond and a close trp β37-tyr α140 contact, indicated 
by an increased tyr 830/850 cm$^{-1}$ Fermi doublet intensity ratio and a decreased trp 880 cm$^{-1}$ band intensity 
respectively [73]) is produced within the 7 ns duration of the photolysis pulse. Strain is also inferred from the 
enhanced optical absorption difference (compared with the difference between equilibrium T and R forms) at 315 
nm (due to the trp β37-tyr α140 contact) which appears on the nanosecond time scale. The enhancement then 
relaxes with the same (microsecond) time constants [74] characteristic of tertiary structural changes in the vicinity 
of the haem iron which can be probed by the Soret band absorption (R A Copeland, S Dasgupta, J J Ramsden, R H 
Austin and T G Spiro, unpublished observations). Another intriguing piece of evidence comes from direct 
observation of the adenosine triphosphate (ATP)-induced generation of mechanical force by immobilized myosin 
interacting with actin tethered to beads held in optical traps. Upon hydrolysis 
of ATP, adenosine diphosphate (ADP) is released. Individual hydrolysis events could be monitored by microscopic 
observation of fluorescently labelled adenosine. The simultaneous monitoring of bead displacements due to the 
mechanical force exerted on the actin clearly showed that force is generated several hundred milliseconds after 
release of ADP [75]. 

C2.14.4 KINETICS 

It has already been emphasized (section C2.14.1, section C2.14.2.2 and section C2.14.3.1) that 
kinetics are of paramount importance in describing living systems [76]. The root of this may ultimately lie in the 
fact that whereas inanimate matter has endless time in which to undergo its transformations, mortal, animate matter 
is constantly racing against the clock. 

C2.14.4.1 TRANSPORT IN BIOLOGICAL SYSTEMS 

The determination of biological affinity by mixing two species and measuring their rates of association and 
dissociation presupposes that the contribution of transport to the association dynamics is precisely known. Well- 
defined hydrodynamic conditions are therefore a prerequisite for the experimental determination of affinities via 
rates. 

In a homogeneous system, the rate of mixing is governed by Smoluchowski's equations [72], according to which 
the diffusion-limited association rate of S and L ( equation (C2.14.13 )), supposed uncharged, equals that of the flux 
and is 


(C2.14.19) 

where d and D are the molecular radii and diffusivities respectively. In the presence of an energy barrier 
characterized by an association (forward) rate coefficient $k_f$, one introduces a vicinal concentration (subscript v) 
and writes the rate as $dc/dt = k_f l_v$ per S, and the flux of L from the bulk to the vicinity as 
$4\pi(d_S + d_L)(D_S + D_L)(l - l_v)$, giving the familiar expression: 

(C2.14.20) 
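
As a hedged sketch of how an expression of this kind arises (a standard steady-state argument, not a transcription of the original equation), let $k_D = 4\pi(d_S + d_L)(D_S + D_L)$ denote the diffusion-limited coefficient and let $l_v$ be the vicinal concentration of L; both symbols are introduced here for illustration only. Equating reactive consumption in the vicinity with diffusive supply from the bulk gives:

```latex
\begin{align}
k_f\, l_v &= k_D\,(l - l_v)
  &&\Rightarrow\quad l_v = \frac{k_D}{k_f + k_D}\, l, \\
\frac{dc}{dt} &= k_f\, l_v = \frac{k_f\, k_D}{k_f + k_D}\, l
  &&\text{(per S)}.
\end{align}
```

Under these assumptions the rate tends to the diffusion-limited value $k_D l$ when $k_f \gg k_D$, and to the reaction-limited value $k_f l$ when $k_f \ll k_D$.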

The equivalent equations for heterogeneous and quasi-heterogeneous systems (the latter are small vesicles which 
can practically be handled as homogeneous systems, but which are nevertheless large enough to possess a 
macroscopic solid-liquid interface) are dealt with in section C2.14.7 . 

At first sight it seems that biochemical reactions taking place in the cytoplasm can be modelled homogeneously, 
but in fact the cytoplasm is a complex, highly viscous medium belonging to the class of soft matter or complex 
fluids, which bears little relation to the cytosol [78], and in which diffusion may be anomalous. It is a current 
experimental challenge to reconstruct the cytoplasm in vitro and systematically investigate biomolecular reaction 
kinetics in such media, although since the majority of intracellular reactions actually seem to take place at the 
solid-liquid interface [78], it is even more important to correctly apply the methods of heterogeneous kinetics to 
biochemical reactions. 

An important aspect of biological transport is that nature makes extensive use of the reduction of dimensionality to 
speed up search and discovery (SD) (see also section C2. 14.6.2 ). SD is enormously enhanced upon moving from 
three to two or one dimensions, because the spatial extent to be explored is drastically reduced. Affinity follows 
kinetics in being enhanced upon moving from three dimensions to two dimensions [79]. 

C2.14.4.2 SMALL SYSTEMS 

Consider again the prototypical homogeneous reaction (C2.14.13), which Rényi has analysed in detail [80]. Taking 
the forward reaction only (other cases are also dealt with in [80]), and supposing that 
$k_f \ll 4\pi(d_S + d_L)(D_S + D_L)$ (cf equation (C2.14.20)), then 

$$\frac{d\langle\gamma_t\rangle}{dt} = k_f\left[\langle s\rangle\langle l\rangle + \Delta^2(\gamma_t)\right] = k_f\langle sl\rangle \qquad \text{(C2.14.21)}$$

where the angular brackets denote expected numbers, and $\gamma_t$ is the number of C molecules created up to time t. The 
term $\Delta^2(\gamma_t)$ expresses the fluctuations in $\gamma_t$: $\langle\gamma_t^2\rangle = \langle\gamma_t\rangle^2 + \Delta^2(\gamma_t)$; supposing that $\gamma_t$ approximates to a Poisson 
distribution, then $\Delta^2(\gamma_t)$ will be of the same order of magnitude as $\langle\gamma_t\rangle$. The so-called kinetic mass action law 
(KMAL), putting $\langle s\rangle = s_0 - c(t)$ and so on, the subscript 0 denoting initial concentration at t = 0, is a first 
approximation in which $\Delta^2(\gamma_t)$ is supposed negligibly small compared to $\langle s\rangle$ and $\langle l\rangle$, implying that $\langle s\rangle\langle l\rangle = \langle sl\rangle$, 
whereas strictly speaking it is not since s and l are not independent. The neglect of $\Delta^2(\gamma_t)$ is justified for molar 
quantities of starting reagents (except near the end of the process, when $\langle s\rangle$ and $\langle l\rangle$ become very small), but 
inconceivably so for reactions in minute subcellular compartments. 

These number fluctuations, i.e. the $\Delta^2(\gamma_t)$ term, will constantly tend to be eliminated by diffusion. On the other 
hand, because of the correlation between s and l, initial inhomogeneities in their spatial densities lead to the 
development of zones enriched in either one or the other faster than the enrichment can be eliminated by diffusion. 
Hence instead of L disappearing as $t^{-1}$ (when $l_0 = s_0$), it is consumed as $t^{-3/4}$ [82], and in the case of a reversible 
reaction, equilibrium is approached as $t^{-3/2}$ [82] (charged particles are dealt with in [83]). Deviations from perfect 
mixing are more pronounced in dimensions lower than three. 
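
The effect of the fluctuation term can be illustrated with a small numerical sketch. It assumes a well-mixed compartment containing a handful of molecules, a forward-only reaction S + L -> C, and a rate coefficient `kf` expressed per pair of molecules per unit time; the function names and the explicit Euler propagation of the master equation are choices made here for illustration, not taken from [80].

```python
def exact_mean_and_variance(s0, l0, kf, t_end, dt=1e-4):
    """Propagate the chemical master equation for S + L -> C.

    p[g] is the probability that g molecules of C have been formed. The
    propensity of the next event is kf * (s0 - g) * (l0 - g).
    Returns the mean and variance of the number of C molecules at t_end.
    """
    n_max = min(s0, l0)
    p = [0.0] * (n_max + 1)
    p[0] = 1.0
    for _ in range(int(round(t_end / dt))):
        new_p = p[:]
        for g in range(n_max):
            flow = kf * (s0 - g) * (l0 - g) * p[g] * dt  # probability moved g -> g+1
            new_p[g] -= flow
            new_p[g + 1] += flow
        p = new_p
    mean = sum(g * pg for g, pg in enumerate(p))
    var = sum(g * g * pg for g, pg in enumerate(p)) - mean ** 2
    return mean, var


def kmal_mean(s0, kf, t):
    """Mass-action solution of dc/dt = kf * (s0 - c)**2 for the case s0 == l0."""
    return s0 - s0 / (1.0 + kf * s0 * t)
```

Calling `exact_mean_and_variance(5, 5, 1.0, 0.1)` and comparing the mean with `kmal_mean(5, 1.0, 0.1)` shows the expected number of C molecules running ahead of the mass-action value, because the variance term adds to the product of the means in the exact rate.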

C2.14.4.3 NONEXPONENTIAL DECAY AND ITS ORIGINS 

The paradigmatical binding reaction (equation (C2.14.22)) is generally analysed as a second order forward reaction 
and a first order backward reaction, leading to the following rate law: 

$$\frac{dc}{dt} = k_f s l - k_b c \qquad \text{(C2.14.22)}$$

supposing $l \approx l_v$. Despite its beguiling simplicity, this equation cannot, in general, be solved analytically, but a 
numerical solution is straightforward and can be fitted to experimental data to determine the forward and backward 
rate coefficients $k_f$ and $k_b$. Ideally, the data collected should comprise both an association phase, during which S 
and L are brought into contact, and a dissociation phase, in which C is diluted into a large volume of pure solvent, and the 
fitting carried out globally over both phases. It is unfortunate that many of the traditional binding assays used in 
biochemistry are awkward or impossible to apply to dissociation. This has led to an underappreciation of the fact 
that simple Poisson dissociation (rate proportional to the amount remaining undissociated) giving familiar 
exponential decay is the exception rather than the rule in biomolecular interactions. This is very easy to 
demonstrate with a heterogeneous reaction such as the adsorption of serum albumin onto silica (e.g. [84, 85]): the 
dissociation rate coefficient is clearly time dependent. The amount of protein bound, $\nu(t)$, can be represented by the 
integral [86] 

$$\nu(t) = k_f \int_0^t \phi(t_1)\, Q(t, t_1)\, dt_1 \qquad \text{(C2.14.23)}$$

where $\phi$ is the fraction of unoccupied receptors. The memory kernel Q denotes the fraction of L bound at epoch $t_1$ 
which remain adsorbed at epoch t (if dissociation is indeed a first order (Poisson) process, $Q(t) = \exp(-k_b t)$). A 
necessary condition for the system to reach equilibrium is 

$$\lim_{t\to\infty} Q(t) = 0. \qquad \text{(C2.14.24)}$$
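
Returning to the rate law (C2.14.22), the numerical solution referred to above can be sketched as follows. This is a minimal illustration under stated assumptions: closed-system conservation ($s = s_0 - c$, $l = l_0 - c$) during association, and dilution strong enough during dissociation that rebinding is negligible. The fourth-order Runge-Kutta integrator and all function names are choices made here, not part of the original treatment.

```python
import math


def rk4_step(rhs, c, dt):
    """One classical Runge-Kutta step for the scalar ODE dc/dt = rhs(c)."""
    k1 = rhs(c)
    k2 = rhs(c + 0.5 * dt * k1)
    k3 = rhs(c + 0.5 * dt * k2)
    k4 = rhs(c + dt * k3)
    return c + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def simulate_binding(kf, kb, s0, l0, t_assoc, t_dissoc, dt=1e-3):
    """Return (times, c) for an association phase followed by dissociation."""

    def association(c):
        # dc/dt = kf * s * l - kb * c with s = s0 - c and l = l0 - c
        return kf * (s0 - c) * (l0 - c) - kb * c

    def dissociation(c):
        # After strong dilution the forward term is assumed to vanish.
        return -kb * c

    times, cs = [0.0], [0.0]
    t, c = 0.0, 0.0
    for rhs, duration in ((association, t_assoc), (dissociation, t_dissoc)):
        for _ in range(int(round(duration / dt))):
            c = rk4_step(rhs, c, dt)
            t += dt
            times.append(t)
            cs.append(c)
    return times, cs


def first_order_decay(c0, kb, t):
    """Analytic dissociation curve c0 * exp(-kb * t), used as a check."""
    return c0 * math.exp(-kb * t)
```

A global fit would wrap `simulate_binding` in a least-squares routine that varies `kf` and `kb` until the simulated curve matches data from both phases; a systematic misfit in the dissociation phase is then a direct sign of non-Poisson dissociation.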

Processes of this type have been analysed [84, 85] by adding an irreversible step, either in parallel: 

— — = ft^.Tfc (C2.14.25) 

dr 

for which the memory function is [ 87 ] : 

j2 = T- + *~* hS (C2. 14.26) 

if 


or in series 

dim _ 

— — = iin-C (C2.14.27) 

a/ 

for which the memory function is [ 86 ] 

Q = — (C2.14.28) 

k trr + *b 

to equation (C2.14.22), which is modified accordingly. Note that in neither of these cases does $\lim_{t\to\infty} Q(t) = 0$; the 
systems do not reach equilibrium and the usual KMAL assumption that the equilibrium constant K can be equated 
to the quotient of the forward and backward rate coefficients does not apply; the backward reaction (dissociation) 
coefficient is time dependent and can be obtained from the quotient [86] 


$$k_b(t) = -\frac{\int_0^t \phi(t_1)\, Q'(t, t_1)\, dt_1}{\int_0^t \phi(t_1)\, Q(t, t_1)\, dt_1} \qquad \text{(C2.14.29)}$$


The existence of multiple stable conformations with different affinities implies time-dependent dissociation: a 
molecule initially associated in a low-affinity conformation has the chance to switch to a high affinity one before 
dissociating. Even a single conformation may actually comprise many slightly different subconformations 
('conformational substates', CS, possibly rotamers) separated by finite barriers [7, 40]. At some finite temperature 
the molecule will exist in several different CS (i.e. ergodicity is broken), each of which may be presumed to make a 
slightly different contribution to the rate of any process in which that conformation of the biopolymer participates. 
In biological (and inanimate glassy) systems relaxation is empirically often found to follow 'stretched 
exponential' (Kohlrausch) decay: 

$$Q(t)/Q(0) = \exp[-(t/\tau)^{\beta}], \quad 0 < \beta < 1. \qquad \text{(C2.14.30)}$$

If the contributions from the different CS are additive and relax in parallel, 

$$Q(t) = \int_0^\infty w(k_b)\, e^{-k_b t}\, dk_b \qquad \text{(C2.14.31)}$$

with which equation (C2.14.30) can be simulated, but unless there is some independent way of determining the 
weight distribution $w(k_b)$, its choice is arbitrary. Series relaxation avoids this difficulty [88]: relaxation on the nth 
level is only possible if certain elements in the (n − 1)th level satisfy some condition. For example, in the case of 
biopolymer relaxation (cf section C2.14.3.5) the condition might be that $\mu_{n-1}$ monomer units in level n − 1 attain 
one particular state of their $2^{\mu_{n-1}}$ possible ones, giving an average relaxation time $\tau_n = 2^{\mu_{n-1}}\tau_{n-1}$. 
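
The parallel picture can be made concrete with a short sketch that replaces the integral over rate coefficients by a discrete sum. The weight distribution used here (equal weights on five logarithmically spaced rates) is purely hypothetical, which is exactly the arbitrariness noted above; the function names are likewise illustrative.

```python
import math


def parallel_relaxation(t, rates, weights):
    """Normalized Q(t) = sum_i w_i * exp(-k_i * t), a discrete superposition."""
    total = sum(weights)
    return sum(w * math.exp(-k * t) for k, w in zip(rates, weights)) / total


def kohlrausch(t, tau, beta):
    """Stretched exponential exp[-(t / tau)**beta] with 0 < beta < 1."""
    return math.exp(-((t / tau) ** beta))


def effective_rate(t, rates, weights, h=1e-6):
    """Instantaneous decay rate -d ln Q / dt, by central difference (t > h)."""
    q_plus = parallel_relaxation(t + h, rates, weights)
    q_minus = parallel_relaxation(t - h, rates, weights)
    return -(math.log(q_plus) - math.log(q_minus)) / (2.0 * h)


# Hypothetical distribution: equal weights on log-spaced rate coefficients.
RATES = [10.0 ** e for e in (-2, -1, 0, 1, 2)]
WEIGHTS = [1.0] * len(RATES)
```

With this distribution the effective rate falls steadily with time as the fast substates empty first, which is the qualitative signature shared with the Kohlrausch form; matching a particular `beta` would require tuning the weights.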

C2.14.4.4 PATTERN FORMATION 

Relative to the multicellular organism into which it develops, the fertilized egg is rather homogeneous, but within a 
few generations of cell division, an embryonic animal already shows remarkable spatial variation, which ultimately 
develops into the differentiated limbs and organs of the adult organism. The realization that diffusion and chemical 
reaction provide an adequate basis for the formulation of a mathematical model of morphogenesis dates back to a 
seminal paper published by Turing in 1952 [89]. The essential idea is that the initially homogeneous, stable state 
moves out of stability due to some random disturbance (not necessarily diffusive; wetting and percolation may also 
play a role [90]). The unstable state generates waves of morphogens, molecules capable of leading to the generation 
of differentiated forms. 

The great complexity of morphogenesis makes it rather difficult to formulate a theory of the process beyond stating 
the equations; particular cases have to be investigated with the help of numerical simulation. In this respect, cellular 
automata [ 91 ] seem to show great promise. An example is the model of neurogenesis in Drosophila recently 
described by Luthi et al [92]. The system is divided up into cells corresponding to the actual cellular divisions of 
the organism, each of which is initially assigned concentrations of a substance which promotes neuralization within 
the cell and inhibits it in the neighbours to which it is transmitted. Subsequent evolution is governed by plausible 
rules inspired by recent advances in the molecular biology of the developing embryo. 


C2.14.5 BIOLOGICAL MACHINES 

One of the most fascinating recent developments in biology has been the discovery of numerous highly complex 
biopolymer assemblies (see also section C2. 14.2.3) such as the ribosome or the bacterial flagellum [93, 94 and 95], 
the envy of nanotechnologists seeking to miniaturize man-made mechanical devices (note that the word 
'machinery' is also sometimes used to refer to multienzyme complexes such as the proteasome [96]), and an entire 


organism might indeed be considered as a machine [63]. Even very complex processes such as mitosis can now be 
analysed in considerable biophysicochemical detail [97]. Mitosis serves as an exemplar of the process design which 

seems to epitomize much of life, the so-called S architecture [97]: stochastic (influenced by noise and showing 
strong fluctuations); self-correcting (working by trial and error, with checkpoints and feedbacks to ensure efficient 
regulation); and synchronized (which may itself be self-correcting). 

Mitosis is characterized [99] by steady elongation (at a velocity $v_g$, depending inter alia on the concentration of 
monomeric tubulin) of the microtubule (polymerized tubulin) filaments which search for, and ultimately 
mechanically separate, freshly replicated DNA prior to cell division, punctuated by their abrupt shrinkage (with 
velocity $v_s$). This dynamic instability is characterized by length fluctuations of the order of the mean microtubule 
length, hinting at a phase transition. Let $f_{gs}$ denote the frequency of switching from growth to shrinkage 
('catastrophe'), and $f_{sg}$ that of the reverse switching back to growth (in a different direction; 'rescue'). When 
$v_g f_{sg} = v_s f_{gs}$, at which point the average tubule length $\ell = v_g v_s/(v_s f_{gs} - v_g f_{sg})$ diverges, growth switches from unbounded 
(during the so-called interphase, between cell division) to bounded (during mitosis, when the microtubules have to 
find and grab chromosomes) [97]. The molecular origin of catastrophe and rescue lies in the fact that tubulin 
monomers can bind to guanosine triphosphate (GTP), and the complex can spontaneously assemble to form 
filaments. But the GTP slowly hydrolyses to guanosine diphosphate (GDP), thereby somewhat changing the 
tubulin conformation such that it prefers to be monomeric. The microtubule can only be disassembled from the end, 
however: a catastrophe occurs if the rate of GTP hydrolysis exceeds that of tubulin addition for a while. After a 
catastrophe, growth occurs in a new direction. This dynamic instability-based mechanism is an extremely efficient 
way to search a volume [ 100 ]. 
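
The bounded/unbounded criterion quoted above is easy to evaluate numerically. The sketch below simply codes the mean-length expression $v_g v_s/(v_s f_{gs} - v_g f_{sg})$; the function name and the convention of returning infinity in the unbounded regime are assumptions made here.

```python
def mean_microtubule_length(v_g, v_s, f_gs, f_sg):
    """Mean filament length in the bounded-growth regime.

    v_g, v_s  : growth and shrinkage velocities
    f_gs, f_sg: catastrophe and rescue frequencies
    Returns float('inf') when v_g * f_sg >= v_s * f_gs (unbounded growth).
    """
    denominator = v_s * f_gs - v_g * f_sg
    if denominator <= 0.0:
        return float("inf")
    return v_g * v_s / denominator
```

Any consistent units may be used; for example, arbitrary values `v_g = 1`, `v_s = 2`, `f_gs = 1`, `f_sg = 0.5` lie in the bounded regime, and raising `f_sg` to 2 reaches the threshold at which the mean length diverges.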

C2.14.5.1 THE GENERATION OF TRANSLATION AND ROTATION 

The forces involved in muscle contraction [ 101 ] can now be directly scrutinized by attaching an actin filament to a 
small dielectric sphere which can be nanomanipulated using optical tweezers [ 102 , 103 ] and bringing the filament 
into contact with myosin. Using similar techniques it has become possible to directly observe kinesin molecules 
moving along microtubules [ 104 ]: the kinesin is labelled with a fluorescent molecule and imaged with low 
background total internal reflexion fluorescence microscopy, sensitive enough to detect single molecules. 

Apart from the development of imaging and force measurement devices, an important biophysico-chemical 
problem is understanding how motion occurs within the framework of molecular interactions. The general concept is based on 
Brownian particles moving along a periodic but asymmetric ('sawtooth') potential, resulting in directed 
('processive') motion [67] (see [95] and [105] for examples of rotatory motors). Either the potential or the force 
acting on the particle fluctuates. Beautiful experiments based on direct imaging of these molecular motions, and 
direct measurements of the fluctuating forces, have enabled this generic concept to be refined into realistic models 
whose parameters are closely bounded by the experimental observations [67]. Kinesin has two heads connected by 
a flexible hinge. At the start of a cycle, both heads are sitting in a potential well. ATP hydrolysis (see below; and cf 
the effect of GTP hydrolysis on tubulin, above, and the general mechanism of enzyme catalysis described in section 
C2. 14.3.5 ) results in a conformational change of one head which consequently advances to the next well. The hinge 
is now strained, and during relaxation bringing the two heads together again it is more probable (because of the 
asymmetric shape of the wells) that the laggard head is dragged to the advanced one, rather than vice versa. This 
relaxation together with the release of hydrolysed ATP completes the cycle [67]. 

A vital biophysico-chemical problem is to understand how chemical energy (released by ATP or GTP hydrolysis 
[ 105 ], or by protons falling down an electrochemical potential gradient [95]) is converted into mechanical energy. 
The thermodynamic constraints on the energy requirements of biological machines have been set out by Gray 
[106]. The force F which has to be applied to a molecular lever requires accurate knowledge of its position x if 
reversible work is to be performed. Specifying the positional accuracy as $\Delta x$, the uncertainty principle gives the 
energy requirement as 

$$\Delta E \geq hc/(4\Delta x) \qquad \text{(C2.14.32)}$$

and the uncertainty in the force generated at x is then 

$$\Delta F = F(x) \pm \Delta x\,(dF/dx). \qquad \text{(C2.14.33)}$$

To compute the work W done by the system, equation (C2.14.33) is integrated over the appropriate x interval. The 
first term on the right-hand side yields the reversible work $W_{\mathrm{rev}}$, and the second term yields $-\Delta x \sum_j |F_j - F_{j+1}|$ for 
any cycle involving j steps. The energy conversion factor $\varepsilon$ is 

$$\varepsilon = W/(Q + \Delta E) \qquad \text{(C2.14.34)}$$

where Q is the net energy input during the cycle. With the help of inequality (C2.14.32) and defining two 
dimensionless quantities: 

$$\alpha = hc \sum_j |F_j - F_{j+1}| / (4 Q W_{\mathrm{rev}}) \qquad \text{(C2.14.35)}$$

and 

$$z = \Delta E/Q \qquad \text{(C2.14.36)}$$

(the relative energy cost of control), one can write 

$$\varepsilon/\varepsilon_{\mathrm{rev}} \leq (1 - \alpha/z)/(1 + z) \qquad \text{(C2.14.37)}$$

where $\varepsilon_{\mathrm{rev}} = W_{\mathrm{rev}}/Q$, the classical conversion factor. The maximum possible value of this ratio is obtained by 
substituting z by its optimal value $z_{\mathrm{opt}}$, obtained from the turning point of equation (C2.14.37) and given by 

$$z_{\mathrm{opt}} = \alpha\left(1 + \sqrt{1 + 1/\alpha}\right) \qquad \text{(C2.14.38)}$$

which is 

$$(\varepsilon/\varepsilon_{\mathrm{rev}})_{\max} = \frac{1 - 1/\left(1 + \sqrt{1 + 1/\alpha}\right)}{1 + \alpha\left(1 + \sqrt{1 + 1/\alpha}\right)} \qquad \text{(C2.14.39)}$$

If more energy than $z_{\mathrm{opt}}$ is used, then $\varepsilon$ decreases because of the energy cost of information; if less, then $\varepsilon$ 
decreases because of the irreversibility (dissipation etc). For a macroscopic system these quantities are 
insignificant. But consider the myosin motor: taking $F_j \sim 2$ pN [102, 103], the displacement x ≈ 10 nm [102], and Q 
≈ 0.067 aJ (the energy released by hydrolysing a single ATP molecule), then the energy cost of optimum control, 
$Q z_{\mathrm{opt}}$, is equivalent to hydrolysing almost 150 ATP molecules (cf [107]) and $(\varepsilon/\varepsilon_{\mathrm{rev}})_{\max} = 0.0033$. As with the 
storage reaction discussed earlier (section C2.14.3.1), reversible operation is far from efficient. Chemical to 
mechanical conversion occurs at a finite rate which may be essentially uncontrolled, i.e. determined intrinsically. 
The parallel to the storage reaction may be developed further by noting that the duty ratio of a molecular machine 
(the fraction of time the motor spends attached and working) corresponds to the fraction of time the store is open. 
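
The myosin estimate can be reproduced with a few lines of arithmetic. The sketch below assumes a single-step cycle, so that the sum of force differences is taken as roughly F and the reversible work as roughly F times x; these simplifications, and the function names, are assumptions made here rather than details given in [106]. The formulas coded are $\alpha = hc\sum_j|F_j - F_{j+1}|/(4QW_{\mathrm{rev}})$, $z_{\mathrm{opt}} = \alpha(1 + \sqrt{1 + 1/\alpha})$ and the bound $(1 - \alpha/z)/(1 + z)$.

```python
import math

PLANCK_H = 6.62607015e-34       # Planck constant, J s
LIGHT_SPEED = 2.99792458e8      # speed of light, m/s


def control_parameter(sum_abs_dF, Q, W_rev):
    """Dimensionless alpha = h c sum|dF| / (4 Q W_rev)."""
    return PLANCK_H * LIGHT_SPEED * sum_abs_dF / (4.0 * Q * W_rev)


def optimal_control_cost(alpha):
    """z_opt = alpha * (1 + sqrt(1 + 1/alpha)), the turning point of the bound."""
    return alpha * (1.0 + math.sqrt(1.0 + 1.0 / alpha))


def max_efficiency_ratio(alpha):
    """Upper bound (1 - alpha/z) / (1 + z) evaluated at z = z_opt."""
    z = optimal_control_cost(alpha)
    return (1.0 - alpha / z) / (1.0 + z)


def myosin_estimate(force=2e-12, displacement=10e-9, Q=0.067e-18):
    """Return (alpha, z_opt, ratio) for the single-step approximation."""
    alpha = control_parameter(force, Q, force * displacement)
    return alpha, optimal_control_cost(alpha), max_efficiency_ratio(alpha)
```

Under these assumptions `myosin_estimate()` gives a control cost just under 150 ATP equivalents and an efficiency ratio of roughly 0.003, in line with the figures quoted in the text.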


C2.14.6 THE SPECIFICITY OF BIOMOLECULAR INTERACTIONS 

The marvellous intricacy of a living organism could not function without the multitude of highly specific 
interactions which pervade almost every aspect of physiology. The concept of molecular recognition can be traced 
at least as far back as Fischer's lock and key mechanism for the recognition of its substrate by an enzyme [ 108 ]. 
Given perfect mixing along with specificity, cell physiology could presumably function in a structureless medium 
on the basis of concentration gradients and diffusion; real cells are internally structured, but even the transport of 
molecules and organelles along cytoskeletal filaments requires specific binding of molecules to their carriers, and 
the assembly of large multimolecular, multifunctional complexes ( section C2. 14.2.3 ) also requires specific 
recognition between the constituents. 

From the viewpoint of biophysical chemistry, the main problems to be solved are: (i) what is the submolecular 
basis of recognition and (ii) what is the required degree of specificity? Beyond these two, the study of specific interactions 
inevitably leads to questions about the origin of specificity and how it evolved from more primitive and presumably 
less specific interactions, but these lie beyond the scope of this chapter. 

C2.14.6.1 THE SUBMOLECULAR BASIS OF RECOGNITION 

One of the earliest successes of biophysical chemistry in the postwar era, which helped to lay the foundations of 
modern molecular biology, was the discovery of the base pairing mechanism in nucleic acids [16, 37], based on the 
hydrogen bond [38, 109 ] (figure C2.14.2), which could be considered to be the most important bond in biology; as 
well as providing the basis for molecular recognition and all that implies, it also gives water its unusual and vital 
properties. As well as ensuring the fidelity of DNA replication and its transcription into RNA, nucleotide base 
pairing also allows RNA to adopt the unique structure ( figure C2.14.1 and section C2. 14.2.2 ) needed for its 
subsequent translation into protein [ 110 ]. Furthermore, hydrogen bonding determines the folding of a denatured 
protein in water into its three-dimensional conformation via hydrogen bonds between donor and acceptor groups on 
polar amino acids, and the inability of water to form hydrogen bonds with apolar amino acid residues, which drives 
as many of them as possible into the protein interior. 

Figure C2.14.2. The hydrogen bond in water. The oxygen lone pairs (shaded blobs) are the donors, and the 
hydrogen atoms the acceptors [177, 178]. 

Given that the hydrogen bond is rather weak, with bond energies ΔE ~ a few kT, biological recognition has to be 
based on multiple interactions (cf section C2.14.3.3). If dissociation requires the simultaneous, independent rupture 
of all the interactions, then $k_b$ (cf equation (C2.14.13)) ~ exp(−v ΔE/kT), where v is the number of bonds, even a 
few of which, taken together, thereby become equivalent to a single strong covalent bond. 


Hydrogen donor/acceptor complementarity, complemented by electrostatic complementarity, although this appears 
to play the minor role, is the basis for a vast effort in computational drug design based on putative receptor 
structures mostly derived from x-ray crystallography [ 111 ]. Calculations based on static structures without 
allowing for subtle structural modifications of the binding partners following initial association have produced less 
than spectacular results; attempts are now being made to incorporate flexibility into the simulated molecules. The 
simple idea of docking taking place much as two rigid spacecraft interact is further complicated by the ubiquitous 
presence of water, itself a strongly hydrogen bonding molecule. Some interactions may involve expulsion of 
solvent. The omission of water in numerical simulations of docking is likely to be fatal for the accuracy and 
relevance of the results. 

C2.14.6.2 THE REQUIRED DEGREE OF SPECIFICITY 

SEARCH AND DISCOVERY (SD) 

A somewhat naive view is that affinity should be maximal for the molecule to be recognized, and zero for all other 
molecules. This strategy may not overall be the most efficient for recognition. Given the essential similarity of 
many biomolecules, the complete absence of attraction or even the presence of repulsion between any given pair of 
dissimilar molecules is likely to be rather exceptional, but nature can make good use of weak, 'nonspecific' 
attractive interactions. Consider the recognition of a particular DNA base sequence, say 10 base pairs long, by a 
protein. This type of process is very common and is the basis of transcription regulation and DNA restriction 
(scission of foreign DNA). If out of all $4^{10}$ possible sequences only the correct sequence has any affinity, then it 
can only be found by a tedious process of trial and error in three-dimensional space. If the protein has weak affinity 
for all of the DNA however, it can quickly bind anywhere on the molecule, and then execute a fast one-dimensional 
walk along it until the recognition site is found [ 112 ]. Searching in one or two dimensions is much more efficient 
than in three [ 113 ], and it has been experimentally demonstrated that a dimerization reaction has a much higher 
affinity at the two-dimensional solid-liquid interface than in three-dimensional bulk liquid [79]. 

A conceptually related effect occurs in immune recognition, when a ligand (antigen) present at the surface of an 
antigen presenting cell (APC) is bound by a T lymphocyte (TL). Binding triggers a conformational change in the 
receptor protein to which the antigen is fixed, which initiates further processes within the APC, resulting in the 
synthesis of more receptors, and so on. Apparently, effective stimulation of these further processes depends on 
sustained activation at the surface (pace the noise-reduction effect of a response threshold discussed in section 
C2. 14.3.3 , and cf [88]). This can be accomplished with a few, or even only one TL, provided the affinity is not too 
high: the TL binds, triggers one receptor, then dissociates and binds anew to a nearby untriggered receptor 
(successive binding attempts in solution are highly correlated [ 114 , 115 and 116 ]). This 'serial triggering' [59] can 
formally be described by: 


$$\mathrm{L} + \mathrm{R} \rightarrow \mathrm{R^*L} \qquad \text{(C2.14.40)}$$

(with rate coefficient $k_a$) where the starred R denotes an activated receptor, and 

$$\mathrm{R^*L} \rightleftharpoons \mathrm{R^*} + \mathrm{L} \qquad \text{(C2.14.41)}$$

with rate coefficient $k_d$ for dissociation of the ligand from the activated receptor, and the same rate coefficient $k_a$ 
for reassociation of the ligand with an already activated receptor. The rate of activation (triggering) is $-dr/dt = k_a r l$, 
solvable by noting that $dl/dt = -k_a l\,(r + r^*) + k_d\,[\mathrm{R^*L}]$. One obtains 


(C2.14.42) 


« r > = ,_Kc-^ + 2*; 
where $x = \{4 l_0 k_a k_d + [k_a(l_0 - r_0) - k_d]^2\}^{-1/2}$ and $Y = (k_d + k_a[l_0 + r_0] - 1/x)/(k_d + k_a[l_0 + r_0] + 1/x)$, and the sought-for 
solution is then 


■(0 = *«,[to(i-J^l)-i] 


(C2.14.43) 
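
Independently of the closed-form solution, the serial triggering scheme can be explored by direct numerical integration. The sketch below is an illustration under assumptions made here: the four species are treated as continuous concentrations, the same $k_a$ applies to binding of fresh and of already activated receptors (as stated above), and the variable names `l`, `r`, `c` (ligand-bound activated receptor) and `a` (free activated receptor) are hypothetical labels.

```python
def serial_triggering(l0, r0, k_a, k_d, t_end, dt=1e-3):
    """Integrate L + R -> R*L and R*L <-> R* + L with classical RK4.

    Returns the final state (l, r, c, a): free ligand, untriggered receptor,
    ligand-bound activated receptor and free activated receptor.
    """

    def rhs(y):
        l, r, c, a = y
        bind_new = k_a * l * r      # triggering of a fresh receptor
        rebind = k_a * l * a        # rebinding to an activated receptor
        release = k_d * c           # dissociation from an activated receptor
        return (-bind_new - rebind + release,
                -bind_new,
                bind_new + rebind - release,
                release - rebind)

    def shifted(y, k, h):
        return tuple(yi + h * ki for yi, ki in zip(y, k))

    y = (float(l0), float(r0), 0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(y)
        k2 = rhs(shifted(y, k1, 0.5 * dt))
        k3 = rhs(shifted(y, k2, 0.5 * dt))
        k4 = rhs(shifted(y, k3, dt))
        y = tuple(yi + dt * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
                  for yi, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4))
    return y
```

With a receptor excess such as `l0 = 1`, `r0 = 50`, `k_a = 1`, `k_d = 10`, the number of triggered receptors `r0 - r` ends up far larger than the amount of ligand, which is the point of the serial mechanism; lowering `k_d` (raising the affinity) slows triggering by keeping the ligand sequestered on already activated receptors.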


CELLULAR RECOGNITION 


Yet another example of SD is provided by the leukocytes which are constantly circulating in the bloodstream but 
do not normally interact with tissue. Venules of inflamed and infected tissue are dilated, however, which changes 
the hydrodynamic regimen and allows some leukocytes to come into contact with the venule wall (endothelium) 
[ 117 ]. The leukocytes are coated with glycoproteins which can interact with selectins (proteins able to selectively 
bind oligosaccharides) present in the outer membranes of the venule wall tissue. The combination of weakened 
hydrodynamic flow and weak selectin-glycoprotein bonds induces the leukocytes to roll along the venule wall 
[117], until they encounter integrins, another class of membrane-embedded proteins, which are able to interact 
more strongly with complementary molecules embedded in the leukocyte outer membrane. The leukocyte then 
stops, spreads out over the endothelium and penetrates between its constituent cells in order to search for, and 
destroy, pathogens. 

THE IMMUNE REPERTOIRE 

Antibodies binding to an antigen interact with a relatively small portion of the molecule. The number N of foreign 
antigens which must be recognized by an organism is very large, perhaps greater than $10^{16}$, and there is a smaller 
number N′ (~$10^{6}$) of self-antigens which must not be recognized. Yet the immunoglobulin and T-cell receptors 
may only contain n ~ $10^{7}$ different motifs. Recognition is presumed to be accomplished by a generalized lock and 
key mechanism involving complementary amino acid sequences. How large should the complementary region be, 
supposing that the system has evolved to optimize the task [118]? (A similar problem is posed by the olfactory 
system [119].) If $P_S$ is the probability that a random receptor recognizes a random antigen, the value of its 
complement $P_F = 1 - P_S$ maximizing the product of the probabilities that each antigen is recognized by at least one 
receptor, and that none of the self-antigens is recognized, i.e. $(1 - P_F^{\,n})^{N} P_F^{\,nN'}$, is: 

$$P_F = (1 + N/N')^{-1/n}. \qquad \text{(C2.14.44)}$$

Using the above estimates for n, N and N′, one computes $P_S \approx 2 \times 10^{-6}$. Suppose that the complementary sequence 
is composed of m classes of amino acids and that at least c complementary pairs on a sequence of s amino acids are 
required for recognition. Since the probability of a long match is very small, to a good approximation the individual 
contributions to the match can be regarded as being independent. A pair is thus matched with probability 1/m, and 
mismatched with probability 1 − 1/m. Starting at one end of the sequence, runs of c matches occur with probability 
$m^{-c}$, and elsewhere they are preceded by a mismatch and can start at s − c possible sites. Hence 

$$P_S = [(s - c)(m - 1)/m + 1]/m^{c}. \qquad \text{(C2.14.45)}$$

If s ≫ c > 1, one obtains 

$$c = \log_m[s(m - 1)/m] - \log_m P_S. \qquad \text{(C2.14.46)}$$

Supposing s to be a few tens, m = 3 (positive, negative and neutral residues), and again using the numbers given 
above (since they all enter as logarithms the exact values are not critical) one estimates c ~ 15, which seems to be 
in good agreement with observation. 
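
The estimate of c can be checked numerically. The sketch below codes $P_F = (1 + N/N')^{-1/n}$ and $c = \log_m[s(m-1)/m] - \log_m P_S$; the input values used in the usage note ($N = 10^{16}$, $N' = 10^{6}$, $n = 10^{7}$, $s = 30$) are assumptions chosen to be consistent with the orders of magnitude in the text, and since they enter logarithmically the result is insensitive to them.

```python
import math


def optimal_failure_probability(N, N_self, n):
    """P_F = (1 + N/N')**(-1/n), maximizing (1 - P_F**n)**N * P_F**(n * N')."""
    return math.exp(-math.log1p(N / N_self) / n)


def recognition_probability(N, N_self, n):
    """P_S = 1 - P_F, computed with expm1 to avoid loss of precision."""
    return -math.expm1(-math.log1p(N / N_self) / n)


def match_length(s, m, P_S):
    """c = log_m[s (m - 1) / m] - log_m(P_S), valid for s >> c > 1."""
    return (math.log(s * (m - 1) / m) - math.log(P_S)) / math.log(m)
```

For example, `match_length(30, 3, recognition_probability(1e16, 1e6, 1e7))` returns a value between 14 and 15, consistent with the estimate c ~ 15.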

C2.14.6.3 MODELLING INTERACTIONS WITH BROWNIAN DYNAMICS 

The first predictions of antibody-antigen binding rates were made on the basis of the Smoluchowski equation 
(C2. 14.20) . Experimental work suggested a rate about 1000 times slower, which was understood to reflect the 
rather precise rotational alignment required for two molecules to dock specifically, since the area of the docking 
zone (epitope) is only a tiny fraction of the total surface area of the antibody. Careful calculations taking this into 
account indicated that the actual rates should be about a million times slower than those predicted from equation 
(C2. 14.20) , and that the experimentally measured rates were therefore a thousand times faster than expected. Two 
interpretations for the discrepancy were proposed: long range attractive forces steering the antigen to the 
complementary sequence on the antibody [ 120 ]; and the Franck-Rabinowitsch (cage) effect [ 114 , 116 ]. It was a 
notable early achievement of Brownian dynamics (BD) [ 120 , 121 and 122 ] to elucidate the conditions under which 
either or both apply. In these simulations, a large number of Brownian trajectories of the ligand are started on the 
surface of a sphere of radius b centred on the receptor. A fraction P terminate with a successful encounter, and the 
remainder reach the surface of a 'quitting sphere' of radius q > b. The bimolecular association rate coefficient is 
$P k_a(b)/[1 - Q(1 - P)]$, where 


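$$k_a(b) = 4\pi (D_S + D_L)\left[\int_b^\infty \frac{\exp[\Delta G_{\mathrm{IF}}(z)/kT]}{z^2}\, dz\right]^{-1}, \qquad \text{(C2.14.47)}$$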


$\Delta G_{\mathrm{IF}}(z)$ is the ligand-receptor interfacial interaction potential (section C2.14.7.1, equation (C2.14.52), equation 
(C2.14.53) and equation (C2.14.54)), and Q is the probability that a particle at q will return to b, equal to the ratio 
$k_a(b)/k_a(q)$ [121]. 

In favourable contrast to molecular dynamics, BD allows molecular movements of realistically long duration to be 
simulated. Nevertheless, the practical number of protein molecules which can be simulated is only two; since 
collective phenomena are often of crucial importance in determining the course of interaction events, other 
simulation techniques, such as cellular automata [ 115 ], need to be used to capture the behaviour of large numbers 
of particles. 

C2.14.7 INTERFACIAL PHENOMENA 

Interfaces play a predominant role in metabolic processes [78], as well as in immune recognition (e.g. T
lymphocytes recognizing antigens on the surface of antigen presenting cells, section C2.14.6.2). Historically, however, the
bulk of experimental work on the kinetics of biomolecular interactions has dealt with homogeneous systems. Heterogeneous
systems are in principle very attractive for investigating molecular interactions experimentally because the
contribution of transport to the kinetics can be precisely controlled, allowing diffusion and chemical effects to be
separated. Several excellent experimental techniques for investigating heterogeneous binding kinetics in situ have 
been developed during recent years. Compared with the methods available for homogeneous systems, they are 
generally more sensitive and less cumbersome, have better time resolution, and do not require the use of labelled 
molecules [61]. The in situ capability arises because the solid part of the interface can play an additional role as a 
transducer for converting the number of bound molecules into an electrical or optical signal. The various 
techniques may be classified as follows: 

i. electrochemical: typically the change of electrode impedance due to the adsorption of biomolecules is 
monitored. Another possibility is to monitor the collective oscillations (surface plasmons) of the electrons 
in a thin metallic film [ 123 ]. Since these oscillations are excited at optical frequencies the measured 
surface plasmon resonance (SPR) may be considered as a hybrid opto-electronic technique. The 
oscillations are retarded by interactions with biomolecules adsorbed at the metal surface; 

ii. mechanical: the oscillation frequency of a quartz crystal is inversely proportional to the mass of 
biomolecules (and their shape and viscoelasticity) attached to the crystal surface; 

iii. optical: the reflectance change of the solid-liquid interface due to the formation of a thin film of 
biomolecules is monitored. Since they are the most sensitive, the most versatile (especially regarding 
possible choices of solid materials) and the most informative, optical methods have become rather popular. 

The simplest approach conceptually is to directly measure the reflectance at different angles and fit the Fresnel
equations to the data (scanning angle reflectometry, SAR) [124]. A thin film of adsorbed biomolecules needs at
least two parameters to characterize it: its refractive index $n_A$ and geometrical thickness $d_A$. Even though an
adlayer composed of randomly adsorbed particles is nonuniform (heterogeneous on the nanometer scale), the
uniform thin film approximation appears to yield satisfactory results [125], although the optically determined
geometrical thickness depends on the refractive index profile perpendicular to the interface and may be smaller
than the largest dimension of the adsorbed molecule. The number $\nu$ of adsorbed molecules per unit area can be
calculated from the relation [126]

$$\nu = \frac{d_A (n_A - n_C)}{dn/dc}$$ (C2.14.48)

where $n_C$ is the refractive index of the solvent and $dn/dc$ is the refractive index increment of the biomolecules in
solution [127].
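
The relation can be evaluated directly once $n_A$ and $d_A$ have been fitted. The sketch below assumes equation (C2.14.48) in the form $\nu = d_A(n_A - n_C)/(dn/dc)$; the numerical values are hypothetical and only indicate typical orders of magnitude. With $dn/dc$ expressed in cm$^3$ g$^{-1}$ and $d_A$ in cm the result is a mass per unit area, which must be divided by the molecular mass to obtain a number of molecules.

```python
def adsorbed_amount(n_A, d_A, n_C, dn_dc):
    """Adsorbed amount per unit area from a uniform thin film model.

    n_A   : refractive index of the adlayer
    d_A   : geometrical thickness of the adlayer
    n_C   : refractive index of the solvent
    dn_dc : refractive index increment of the biomolecule in solution
    """
    if dn_dc <= 0.0:
        raise ValueError("dn/dc must be positive")
    return d_A * (n_A - n_C) / dn_dc

# Hypothetical film: 5 nm thick (5e-7 cm), n_A = 1.45, in water (n_C = 1.33),
# with dn/dc = 0.18 cm^3/g; the result is in g/cm^2.
nu = adsorbed_amount(n_A=1.45, d_A=5.0e-7, n_C=1.33, dn_dc=0.18)
print(f"adsorbed amount = {nu * 1e6:.3f} micrograms per cm^2")
```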

The Fresnel equations predict that reflexion changes the polarization of light, measurement of which forms the
basis of ellipsometry [128]. Although ellipsometry is more sensitive than SAR, the equations linking the
measured parameters with $n_A$ and $d_A$ cannot be solved in closed form, and hence $n_A$ and $d_A$ cannot be determined
unambiguously, although their product yielding $\nu$ (equation (C2.14.48)) appears to be robust.

The most recently introduced optical technique is based on the retardation of light guided in an optical waveguide
when biomolecules of a polarizability different from that of the solvent they displace are adsorbed at the waveguide
surface (optical waveguide lightmode spectroscopy, OWLS) [11]. It is even more sensitive than ellipsometry, and
the mode equations characterizing the phase velocities of the guided light can be solved analytically to yield $n_A$ and $d_A$. All
work reported so far appears to have been restricted to the measurement of two modes, which are sufficient to
completely characterize uniform isotropic films, but in principle higher modes can also be measured, enabling more
complex films to be characterized. The integrated optical interferometer [129], in which two orthogonally polarized
modes interfere with one another (since each polarization interacts differently with the adsorbed biomolecules, the
interference pattern shifts according to $\nu$), is a further development which offers even higher sensitivity,
proportional to the path length over which interference and adsorption take place.

C2.14.7.1 BIOCOMPATIBILITY

The reactions of biopolymers at interfaces form the basis of some extremely important industrial processes. The 
primary process in all cases is the adsorption of biomolecules, usually proteins. If ultimately living cells are 
adsorbed, this always takes place onto a preadsorbed protein layer (which may be secreted by the cells themselves 
[ 130 ]). These processes can be classified into three categories: 

i. minimal adsorption, as in the preparation of materials for surgical implants in contact with the blood (stents,
replacement tubing, heart valves, biosensors, etc.), filtration (including renal dialysis), storage of
pharmaceuticals in solution, and antifouling paint for ships' hulls;

ii. maximal adsorption, mainly for surgical implants in contact with tissue (e.g. replacement bones) which
should be mechanically firmly integrated into the host;

iii. variable adsorption, as in coatings for chromatographic separation materials.

Current emphasis is on the behaviour of proteins at the solid-liquid interface, but liquid-air and liquid-liquid 
interfaces, which were actually investigated much earlier [ 131 ], are still important. 

Much of the science of biocompatibility can be reduced to the principles of how to determine the interfacial 
energies between biopolymer and surface. The biopolymer is considered to be large enough to behave as bulk 
material with a surface; since (for example) a water cluster containing only 15 molecules and with a diameter of 0.5 
nm already behaves as a bulk liquid [ 132 ] it appears that most biological macromolecules can be considered to 
have surfaces. The interfacial energy $\Delta G_{IF}$ can be decomposed into three components:

$$\Delta G_{123} = \Delta G_{123}^{(LW)} + \Delta G_{123}^{(da)} + \Delta G_{123}^{(el)}$$ (C2.14.49)

corresponding to the Lifshitz-van der Waals (LW), electron donor-acceptor (da) and electrostatic (el) interactions;
subscripts 1, 2 and 3 refer to solid, solvent and biopolymer respectively.

A salient feature of natural surfaces is that they are overwhelmingly electron donors [ 133 ]. This is the basis for the 
ubiquitous 'hydrophilic repulsion' which ensures that a cell can function, since massive protein-protein
aggregation and protein-membrane adsorption are thereby prevented. In fact, for biomolecule interactions under
typical physiological conditions, i.e. aqueous solutions of moderately high ionic strength, the donor-acceptor 
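energy dominates.

Tables of single substance surface tensions $\gamma$ can be built up through the measurement of contact angles [134]; a
few examples are collected in table C2.14.1. These can be combined pairwise according to:

$$\gamma_{12}^{(LW)} = \left(\sqrt{\gamma_1^{(LW)}} - \sqrt{\gamma_2^{(LW)}}\right)^2$$ (C2.14.50)

and

$$\gamma_{12}^{(da)} = 2\left(\sqrt{\gamma_1^{\oplus}} - \sqrt{\gamma_2^{\oplus}}\right)\left(\sqrt{\gamma_1^{\ominus}} - \sqrt{\gamma_2^{\ominus}}\right).$$ (C2.14.51)

The interfacial free energies for infinite parallel surfaces at contact are given by the relation [134]:

$$\Delta G_{123}^{\parallel(LW\ \mathrm{or}\ da)} = \gamma_{13}^{(LW\ \mathrm{or}\ da)} - \gamma_{12}^{(LW\ \mathrm{or}\ da)} - \gamma_{23}^{(LW\ \mathrm{or}\ da)}$$ (C2.14.52)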

and these can already provide an indication whether a surface is suitable for promoting or hindering biopolymer 
adsorption. 
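
The combination of tabulated surface tensions can be illustrated with a short calculation. The sketch below assumes the combining rules in the form $\gamma_{ij}^{(LW)} = (\sqrt{\gamma_i^{(LW)}} - \sqrt{\gamma_j^{(LW)}})^2$ and $\gamma_{ij}^{(da)} = 2(\sqrt{\gamma_i^{\oplus}} - \sqrt{\gamma_j^{\oplus}})(\sqrt{\gamma_i^{\ominus}} - \sqrt{\gamma_j^{\ominus}})$, with the free energy at contact taken as $\gamma_{13} - \gamma_{12} - \gamma_{23}$ (1 = solid, 2 = solvent, 3 = biopolymer). The input values are those listed in table C2.14.1 for silica, water and serum albumin; the function and variable names are hypothetical.

```python
from math import sqrt

# (gamma_LW, gamma_plus, gamma_minus) in mJ m^-2, copied from table C2.14.1.
SURFACE_TENSIONS = {
    "silica": (39.0, 0.8, 41.0),
    "water": (22.0, 25.5, 25.5),
    "serum albumin": (27.0, 6.3, 51.0),
}

def gamma_lw(i, j):
    """Lifshitz-van der Waals interfacial tension between substances i and j."""
    return (sqrt(i[0]) - sqrt(j[0])) ** 2

def gamma_da(i, j):
    """Electron donor-acceptor interfacial tension between substances i and j."""
    return 2.0 * (sqrt(i[1]) - sqrt(j[1])) * (sqrt(i[2]) - sqrt(j[2]))

def delta_g_parallel(solid, solvent, biopolymer):
    """Return the (LW, da) free energies at contact for parallel surfaces."""
    s1, s2, s3 = (SURFACE_TENSIONS[name] for name in (solid, solvent, biopolymer))
    lw = gamma_lw(s1, s3) - gamma_lw(s1, s2) - gamma_lw(s2, s3)
    da = gamma_da(s1, s3) - gamma_da(s1, s2) - gamma_da(s2, s3)
    return lw, da

lw, da = delta_g_parallel("silica", "water", "serum albumin")
print(f"LW: {lw:.2f}  da: {da:.2f}  sum: {lw + da:.2f}  (mJ m^-2)")
```

With these inputs the LW term is weakly attractive (negative) while the da term is strongly repulsive (positive), so that the sum of the two is positive.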

A next step is to consider the surface-particle distance $z$ and curvature (interfacial radius $R$) dependence of the
interactions [134], for which approximate expressions are:

$$\Delta G^{(LW)}(z) = 2\pi \ell_0^2 \Delta G^{\parallel(LW)} R/z$$ (C2.14.53)

where $\ell_0$ is the equilibrium contact distance (distance of closest approach);

$$\Delta G^{(el)}(z) = 4\pi\varepsilon\varepsilon_0 \psi_1 \psi_3 \ln[1 + \exp(-\kappa z)] R$$ (C2.14.54)

where the $\psi$ are the electrostatic surface potentials (see [135] for an up-to-date discussion), and $1/\kappa$ is the Debye
length; and

$$\Delta G^{(da)}(z) = 2\pi\chi \Delta G^{\parallel(da)} \exp[(\ell_0 - z)/\chi] R$$ (C2.14.55)

where $\chi$ is the decay length for the da interaction. Equations (C2.14.53), (C2.14.54) and
(C2.14.55) are for a sphere approaching an infinite plane; for two spheres approaching each other these
energies must be halved. Their sum (C2.14.49) can be integrated to compute the association distance $\delta_a$
[136]:

$$\delta_a = \int_{\ell_0}^{\infty} [\exp(\Delta G(z)/kT) - 1]\, dz$$ (C2.14.56)

whence the adsorption rate constant (cf. $k_f$ in section C2.14.4) can be computed:

$$k_a = D/\delta_a.$$ (C2.14.57)
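
The integration is easily carried out numerically. The following sketch assumes that the association distance is $\delta_a = \int [\exp(\Delta G(z)/kT) - 1]\,dz$ taken from the contact distance outwards, with the upper limit truncated at a finite distance beyond which the potential is negligible. The barrier used in the example is a hypothetical exponentially decaying repulsion, not a fitted protein-surface potential, and all names are illustrative.

```python
import math

def association_distance(dG_over_kT, z0, z_max, n=100000):
    """Trapezoidal estimate of the integral of [exp(dG/kT) - 1] from z0 to z_max.

    dG_over_kT : function returning the interaction free energy in units of kT
    z0         : distance of closest approach
    z_max      : truncation distance (assumed large enough that dG ~ 0 beyond it)
    """
    h = (z_max - z0) / n
    total = 0.0
    for i in range(n + 1):
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * (math.exp(dG_over_kT(z0 + i * h)) - 1.0)
    return total * h

def adsorption_rate_constant(D, delta_a):
    """k_a = D / delta_a."""
    return D / delta_a

# Hypothetical barrier: 2 kT at contact (z0 = 0.2 nm), decay length 1 nm.
barrier = lambda z: 2.0 * math.exp(-(z - 0.2) / 1.0)
delta_a = association_distance(barrier, z0=0.2, z_max=20.0)   # in nm
k_a = adsorption_rate_constant(D=1.0e8, delta_a=delta_a)      # D in nm^2/s
print(f"delta_a = {delta_a:.2f} nm, k_a = {k_a:.3e} nm/s")
```

A repulsive barrier makes the integrand positive and so lengthens $\delta_a$, lowering $k_a$; a net attractive potential makes $\delta_a$ small or even negative, in which case this simple expression for $k_a$ ceases to be meaningful and transport alone limits the rate.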

Cases are known in which the use of the single substance surface tensions leads to predictions at variance with
observation. For example, using equations (C2.14.49), (C2.14.50), (C2.14.51), (C2.14.52), (C2.14.53), (C2.14.54)
and (C2.14.55) and the data in table C2.14.1, the interfacial free energy between serum albumin and silica is predicted
to be positive, and the protein should therefore be repelled, whereas, as is well known, it is strongly adsorbed. There
are several possible reasons for such discrepancies. One is that the characteristic length scale of the (macroscopic)
surface tension is different from (probably larger than) the characteristic length for protein adsorption; possibly it is
more appropriate to use the microsurface tension [137] for these calculations. Another is that the use of average
curvatures, surface potentials and so on is too crude for biomolecules with their intricate surface topography.
Finally, equations (C2.14.49) to (C2.14.55) assume that the protein remains unchanged upon interaction with the surface.
While this is likely to be true up to the moment of initial contact, native folded proteins are only marginally stable
and intramolecular contacts maintaining the native structure may be substituted by molecule-surface contacts with
concomitant unfolding. Since equations (C2.14.56) and (C2.14.57) apply to the approach of the protein up to
its initial contact with the surface, equations (C2.14.49) to (C2.14.55) should be valid for computing $k_a$ (equations
(C2.14.56) and (C2.14.57)), but a complete description of the adsorption process must take subsequent events into account.
Folding is entropically costly since compact configurations are restricted to fairly small regions of the 
Ramachandran map [39], but in the native conformation this cost is outweighed by the enthalpy-losing 
intramolecular contacts. At a surface, however, the possibility of losing enthalpy by protein-surface contacts 
enables the molecule to adopt an extended, less entropically costly (since the corresponding Ramachandran map 
regions are large) configuration. This view is corroborated by observed changes in the optical rotation (circular
dichroism) of protein solutions to which minute colloidal particles onto which the proteins can adsorb are added;
the changes are consistent with varying degrees of loss of α-helical secondary structure [138]. There are still some
puzzles, however: some protein-surface combinations appear to lead to an increase of α-helical structure [139], and
differential scanning calorimetry of proteins in the absence and presence of minute colloidal particles [140] has
shown that the temperature of the denaturation transition can be significantly lower for adsorbed proteins compared
with the native dissolved state. Clearly the nature of the protein-surface contacts needs more careful scrutiny: a
complicating feature is that if conformational rearrangement does take place, it will usually lead to a biopolymer 
surface chemically different from that of the native conformation. For example, essentially no polar amino acids 
are found in the interior of a globular protein, and therefore almost any conformational rearrangement must result 
in the dilution of polar residues on the surface. The diminished protein surface polarity should result in stronger 
adsorption to apolar surfaces. A plethora of non-native adsorptive contacts constraining the polypeptide chain could 
in principle cost even more entropy than the native folded structure. Desolvation of the protein-solvent and 
protein-surface interfaces will also contribute to the free energy [141]. A further complication at all but the lowest
coverages is that lateral interactions between adsorbed proteins will also affect $\Delta G$ [142]. Far too few different
proteins and surfaces have been investigated experimentally with sufficient care for reliable general conclusions
to be drawn on these matters at present.

Table C2.14.1 Single substance macroscopic surface tensions/(mJ m$^{-2}$) of various materials (data mostly from
[134]).

Material               γ(LW)    γ⊕       γ⊖

Biomaterials
  Cellulose            44       1.6      17
  Dextran T-150        42       -        55
  Fibrinogen           37       0.1      38
  Immunoglobulin       34       1.5      50
  Lecithin             29       2.7      60
  Serum albumin        27       6.3      51

Synthetic polymers
  Nylon 6,6            36       0.02     22
  Polyethylene         33       -        -
  Polyethylene oxide   43       -        64
  Polystyrene          42       -        1.1
  Polyvinyl chloride   43       0.04     3.5
  Teflon               18       -        -

Metal oxides
  SiO2                 39       0.8      41
  TiO2                 42       0.6      46
  ZrO2                 35       1.3      3.6

Liquids
  Water                22       25.5     25.5
  Ethanol              19       -        68
  Chloroform           27       3.8      -
  Hexadecane           28       -        -

Materials implanted in a living body should not engender an immune response. Even though the adsorption of a
few pure proteins to interfaces under diverse conditions appears now to be reasonably well characterized, at least
phenomenologically, adsorption from highly complex body fluids such as blood is only just beginning to be
investigated quantitatively using the same methods applied successfully to pure materials [143]. Among the
hundreds of different proteins in blood are enzymatic networks capable of triggering thrombus formation [144], or
an immune response as soon as the adsorbed layer acquires certain, essentially still unknown, attributes.
Furthermore, the implanted material must not corrode or disintegrate, releasing particles which could themselves
engender an immune response. In fact, a truly biocompatible material needs to have an adaptive capability, and
should thus qualify for the appellation 'smart', just as biological tissue itself does.

C2.14.7.2 KINETICS OF BIOPOLYMER ADSORPTION

The presence of the solid surface imposes new conditions onto the disposition of reactants, compared with the
homogeneous case (section C2.14.4.1). Adsorption is often observed to approach a plateau, yet is irreversible with
respect to dilution. The plateau must therefore arise because no more space is available for adsorption, rather than
through a dynamic adsorption-desorption equilibrium, and it can be inferred that the dissolved biopolymer does not
adsorb to its preadsorbed congeners. This behaviour is by no means universal: it has been proposed that the plaques
associated with spongiform encephalopathies arise through the native, normally soluble PrP$^{C}$ protein being
partially denatured upon contact with a surface to become the pathogenic PrP$^{Sc}$ form, to which the PrP$^{C}$ can adhere
to form multilayers.

A common feature of biopolymer adsorption is that its rate is usually one to three orders of magnitude smaller than 
the diffusion-limited rate to a perfect sink: 


$$(d\nu/dt)_{\max} = D c_b/\delta$$ (C2.14.58)

where $c_b$ is the bulk dissolved concentration and $\delta$ the thickness of the diffusion boundary layer [145] (this can be
quickly ascertained by drawing a tangent to a plot of $\nu$ versus $t$ at $t \to 0$ and comparing its slope with the right-hand
side of equation (C2.14.58)). This implies the existence of an energy barrier characterized by a rate coefficient $k_a$
(equations (C2.14.56) and (C2.14.57)). For adsorption to small particles (colloidal minerals, vesicles etc.) of radius
$R$, the effective $\delta$ is given by [146]:

$$1/\delta_{\mathrm{eff}} = 1/R + 1/\delta.$$ (C2.14.59)
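
The comparison between an observed initial slope and the diffusion-limited rate amounts to a few lines of arithmetic. The sketch below assumes the perfect-sink rate to be $D c_b/\delta$ and the effective boundary layer thickness for a particle of radius $R$ to follow $1/\delta_{\mathrm{eff}} = 1/R + 1/\delta$; the function names are hypothetical and any consistent set of units may be used.

```python
def effective_boundary_layer(R, delta):
    """Effective diffusion boundary layer thickness for a particle of radius R."""
    if R <= 0.0 or delta <= 0.0:
        raise ValueError("R and delta must be positive")
    return 1.0 / (1.0 / R + 1.0 / delta)

def perfect_sink_rate(D, c_b, delta):
    """Diffusion-limited adsorption rate to a perfectly absorbing surface."""
    return D * c_b / delta

def fraction_of_diffusion_limit(initial_slope, D, c_b, delta):
    """Ratio of the measured initial slope of nu(t) to the perfect-sink rate."""
    return initial_slope / perfect_sink_rate(D, c_b, delta)
```

For a particle much smaller than the boundary layer ($R \ll \delta$) the effective thickness is close to $R$, so transport to small particles is correspondingly faster than to a planar surface under the same hydrodynamic conditions.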

Even cursory inspection of typical ($\nu$, $t$) data shows that the evolution does not follow the single exponential
approach to saturation implied by, for example, equation (C2.14.22) with initial concentrations $x_0 \gg s_0$. Such data
are sometimes described as 'biphasic', and one encounters attempts to fit and interpret them with two exponentials,
even though there does not seem to be any theoretical justification for doing so. The basic kinetics of adsorption are
described by:

$$d\nu/dt = k_a c_v \phi$$ (C2.14.60)

where $c_v$ is the concentration in the vicinity of the surface and depends on $c_b$, $k_a$ and hydrodynamic factors, i.e. $D$
and $\delta$ [87], and $\phi$ is the fraction of the surface available for adsorption. The familiar Langmuir expression $\phi = 1 - \theta$,
where $\theta$ is the fraction of surface occupied, introduced to describe the adsorption of gases onto metals, whose
surface is assumed to consist of discrete, noninteracting sites larger than the ligand, indeed predicts an exponential
approach to saturation. For most cases of practical interest in biocompatibility, however, the Langmuir assumptions
are invalid: the surface is a continuum. The adsorption of one particle creates an exclusion zone around it, within
which the centre of another particle may not adsorb, since particles cannot overlap. The exclusion zone has twice
the diameter of the particle and its area is quadruple that of the particle; hence, for small $\theta$, $\phi = 1 - 4\theta$. As coverage
increases, exclusion zones begin to overlap and the factor $-4\theta$ overcompensates for the loss of available area; for
the overlap of two exclusion zones, a factor proportional to $\theta^2$, and for the overlap of three exclusion zones, a
factor proportional to $\theta^3$, must be added back [147], the proportionality constants depending on the shapes of the
particles and whether lateral diffusion or desorption is allowed. This problem of random sequential adsorption
(RSA) has been solved exactly in one dimension (useful for describing proteins adsorbing onto DNA) [148], and
accurate interpolation formulae for two dimensions are now available [149], which have been shown to describe
experimental adsorption data very well [150]. The RSA process has infinite memory and the configurations
generated are quite different in many respects from their equilibrium counterparts [151].
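
The one-dimensional case lends itself to a very short simulation. The sketch below deposits non-overlapping segments of unit length at random on a line until no gap can accommodate another one, with neither desorption nor lateral diffusion. It relies on the fact that the first segment to land in any gap is uniformly distributed within it and that the two resulting sub-gaps then fill independently, so the gaps may be processed in any order. The function name and parameters are hypothetical; the jammed coverage should lie close to the known exact one-dimensional limit of about 0.7476.

```python
import random

def rsa_jamming_coverage_1d(length, rng):
    """Jammed coverage for random sequential adsorption of unit segments on a line."""
    gaps = [length]   # lengths of empty intervals still to be examined
    placed = 0
    while gaps:
        gap = gaps.pop()
        if gap < 1.0:
            continue  # too short to hold a segment: this gap stays empty for ever
        left = rng.uniform(0.0, gap - 1.0)  # position of the new segment's left edge
        placed += 1
        gaps.append(left)                   # empty interval to the left of the segment
        gaps.append(gap - 1.0 - left)       # empty interval to the right of the segment
    return placed / length

coverage = rsa_jamming_coverage_1d(5000.0, random.Random(1))
print(f"jammed coverage = {coverage:.4f}")
```

The jammed state is visibly less dense than close packing (coverage 1), which is one manifestation of the infinite memory of the process: segments that have landed badly can never rearrange.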

RSA has turned out to be an extremely useful formalism for making structural inferences from adsorption kinetics.
Where pure random sequential adsorption is observed, the area $a$ occupied per molecule can be determined (note
that $a$ is the constant of proportionality between $\theta$ and $\nu$). If this area depends on $c_b$, conformational rearrangement
leading to spreading is inferred and its kinetics and magnitude can be determined [152]. Nucleation and growth of
two dimensional islands also have a characteristic kinetic signature [153]. Occasionally, Langmuir adsorption is
observed in protein adsorption onto a continuum [154], unambiguously implying that clustering or crystallization
of the adsorbed biopolymer takes place, thereby annihilating the exclusion zones. Both kinetic parameters and the
crystal unit cell dimensions can then be determined [154].
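
The kinetic signatures referred to here can be explored by integrating the rate law for different available area functions. The sketch below integrates $d\theta/dt = k\,\phi(\theta)$ with a fourth-order Runge-Kutta scheme, where the lumped constant $k$ stands for $k_a c_v a$ and is assumed not to change with time (itself a simplification). Two choices of $\phi$ are shown: the Langmuir form $1 - \theta$, and the leading-order continuum form $1 - 4\theta$, which is valid only at low coverage. All names are hypothetical.

```python
import math

def integrate_coverage(phi, rate, t_end, steps=10000):
    """Integrate d(theta)/dt = rate * phi(theta) from theta(0) = 0 using RK4."""
    h = t_end / steps
    theta = 0.0
    for _ in range(steps):
        k1 = rate * phi(theta)
        k2 = rate * phi(theta + 0.5 * h * k1)
        k3 = rate * phi(theta + 0.5 * h * k2)
        k4 = rate * phi(theta + h * k3)
        theta += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return theta

def langmuir(theta):
    """Available area function for discrete, noninteracting sites."""
    return 1.0 - theta

def continuum_low_coverage(theta):
    """Leading-order available area on a continuum (theta << 1 only)."""
    return 1.0 - 4.0 * theta

# Early-time comparison at the same lumped rate constant.
for t in (0.01, 0.05):
    print(t, integrate_coverage(langmuir, 1.0, t),
          integrate_coverage(continuum_low_coverage, 1.0, t))
```

With the Langmuir function the numerical solution reproduces the exponential approach $\theta = 1 - \exp(-kt)$, whereas on a continuum the adsorption rate falls off more rapidly with coverage from the very beginning.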

A simple mapping enables the RSA formalism to be applied to binding of a ligand L to receptors R embedded in a 
surface (the RSA-random site (RSA-RS) model) [ 149 ,

C2.14.8 BIOLOGICAL INFORMATION

Information, like energy, is an irreducible concept, but is surprisingly rarely mentioned in textbooks of biophysical 
chemistry (whereas bioenergetics has developed into a distinct field of its own). Yet information is maybe even 
more germane than energy to the very essence of life, starting with the DNA which, it is often stated, encodes our 
organism, initially via the amino acid sequence which encodes protein structure, and so on. The neologism 
'bioinformatics' usually denotes the analysis of DNA sequences, in particular the comparison of sequences derived 
from different organisms but apparently encoding the same protein. That this is a very difficult task is well 
illustrated by the analysis of transcriptional promoter sequences (to which a protein must bind in order for 
transcription to be initiated), which have few discernible common features which could be used to identify them. 


To get some flavour of the magnitude of the problem, consider that the familiar bacterium E. coli contains almost 
five million nucleic acid base pairs, which encode about 4000 genes and a comparable number of promoters. 
Eukaryotic organisms have much more DNA, of which only a small proportion appears to code for proteins. It is a 
puzzle that non-coding DNA (introns and intra-gene sequences) shows long range correlations whereas coding 
DNA does not [156]. 

Perhaps the central question in this field is whether the genome specifies the construction of the organism in a 
rather deterministic (and hierarchical, according to the homeotic gene concept [ 157 ]) fashion, or whether the genes 
merely specify rules with which the organism can be constructed, more in the spirit of the stigmergic building 
referred to previously [45], or the brain, in which it appears that connexions between specific cells are not 
preprogrammed, but grow according to an algorithm given genetically to select certain favourable system structures 
[ 158 ]; moving back a stage further, the genes could merely specify how to construct an algorithm for specifying the 
construction. It is actually difficult to establish the existence of a real command structure, and it is consequently 
legitimate to enquire whether so-called master genes are merely akin to the king who daily ordered the sun to set, 
and in the morning to rise again, and was considered by most of his subjects to be an omnipotent autocrat whose 
orders were infallibly obeyed. What is established is that genes are powerless in isolation: they specify protein, but 
the realization of the specifications (and indeed the synthesis of the genes themselves) itself involves (other) 
proteins. The scheme of organization thus appears to be heterarchical rather than hierarchical, rather like the brain 
[159]. 

Current views of metabolic regulation are largely inspired by the lac operon of E. coli, which was comprehensively 
described almost 40 years ago [ 160 ] and was for many years thereafter the sole exemplar discussed in textbooks. 
Much work has been dedicated to identifying and characterizing the molecules involved, but how all these elements 
fit together remains elusive [161], and the need to move beyond the treatment of individual elements in isolation, 
towards concepts such as distributive control and supramolecular organization has been stressed [ 162 ]. Systems 
theory [12, 163 , 164 ] was originally developed to render tractable this jungle of complexity, but it no longer seems 
to be part of mainstream research in the field, possibly because of the insufficiently close collaboration of the 
different disciplines which would have been needed to ensure its successful application to biological problems 
[ 165 ]. More recently an approach based on analogies between genetic and electric circuits seems capable of 
yielding valuable insights [ 166 , 167 , 168 ]. 

Biological information is also concerned with the analysis of biological messages and their import. The 
fundamental premise of the protein-folding problem (section C2.14.2.2) is that the full three-dimensional
arrangement of the protein molecule can be predicted, given only the amino acid sequence, together with the 
solvent composition, temperature and pressure. One test of the validity of this premise is to compare the 
information content of the sequence with the information contained in the structure [ 169 ]. The former can be 
obtained from Shannon's formula:

$$H = -\sum_{r} p_r \log_2 p_r$$ (C2.14.64)

where $p_r$ is the probability of occurrence of the $r$th amino acid, and the latter can be quantified via the algorithmic
complexity [169]; they are approximately 2 and 0.5 bits per amino acid respectively, i.e. the sequence contains
somewhat more information than is required by the structure. This is a puzzle. Clearly discussion of information 
content needs to be complemented by an appraisal of the value of the information [65]. The discrepancy between 
the two measures hints at the protein indeed being an object whose structure in part reflects noise expressed 
macroscopically [66, 158 ]. 
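
Shannon's formula is straightforward to apply to a sequence written in the one-letter amino acid code. The sketch below estimates the probabilities $p_r$ from the residue frequencies within the sequence itself, which is an assumption made for illustration (they could equally be taken from a database-wide composition); the example sequence is an arbitrary string, not a real protein.

```python
import math
from collections import Counter

def shannon_entropy_bits(sequence):
    """Shannon entropy H = -sum_r p_r log2 p_r, in bits per residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    counts = Counter(sequence)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Arbitrary illustrative string in the one-letter code.
print(shannon_entropy_bits("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```

For twenty equiprobable amino acids the formula gives its maximum value of $\log_2 20 \approx 4.32$ bits per residue; any bias in composition lowers the result.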

Analysis of the global statistics of protein sequences has recently allowed light to be shed on another puzzle, that of
the origin of extant sequences [170]. One proposition is that proteins evolved from random amino acid chains,
which predicts that their length distribution is a combination of the exponentially distributed random variable giving
the intervals between start and stop codons, and the probability that a given sequence can fold up to form a compact
structure, which increases with sequence length. An alternative view is that modern proteins evolved from a small
set of 'starter' sequences, but this does not provide a simple, natural explanation for the observed extant
distribution in the way that the first proposition does [170].

Reference to gene chips as a tool for investigating the expression of messenger RNA (mRNA), the precursor to
protein synthesis, has already been made (section C2.14.1). The alternative is to extract all the proteins in a cell and
separate them (according to molecular weight and isoelectric point) using two-dimensional gel electrophoresis 
[ 171 ], after which their abundances may be quantified. This is a much more onerous procedure than the gene chip 
method, and has its own drawbacks, such as poor recovery of membrane proteins, but on the other hand the 
relationship between mRNA and protein abundance is complex, nor can the gene chip take account of the 
numerous post-translation modifications such as glycosylation. 

The collection of proteins expressed in a cell is called its proteome (cf the genome, the collection of genes). Much 
effort has been expended on identifying the individual proteins separated by 2D gel electrophoresis, but this is 
rather like discussing an author's use of particular words, for example when the authorship of a work is disputed: as 
Yule has pointed out in that context, such discussion gives not the slightest notion as to what the vocabulary is like 
as a whole [172]. It is therefore of great interest to examine the properties of the entire proteome, and such an
investigation has yielded the curious result that the distribution of rates of protein synthesis $p_r$ (or protein
abundances) in prokaryotes follows the simple canonical law (scl):

$$p_r = P(r + \rho)^{-1/\theta}$$ (C2.14.65)

where $r$ is the rank, and $P$, $\rho$ and $\theta$ are parameters, remarkably well [173]. This might be regarded merely as a
useful exercise in data reduction were it not for the fact that the simple canonical law is precisely the distribution 
expected for a communication system minimizing its energy expenditure, while constrained by the given amount of 
information which has to be conveyed, word by word. 
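
A rank-frequency distribution of this kind, and its information content, can be generated numerically. The sketch below assumes the simple canonical law in the Mandelbrot form $p_r = P(r + \rho)^{-1/\theta}$ and fixes $P$ by normalizing over a finite number of ranks; the parameter values are hypothetical and are not fitted to any proteome.

```python
import math

def simple_canonical_law(n_ranks, rho, theta):
    """Normalized probabilities p_r = P (r + rho)^(-1/theta) for r = 1..n_ranks."""
    weights = [(r + rho) ** (-1.0 / theta) for r in range(1, n_ranks + 1)]
    total = sum(weights)  # 1/P, fixed by normalization over the finite rank range
    return [w / total for w in weights]

def entropy_bits(probabilities):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

# Hypothetical parameters for a proteome of 1000 ranked proteins.
p = simple_canonical_law(n_ranks=1000, rho=5.0, theta=0.8)
print(f"H = {entropy_bits(p):.2f} bits (upper bound {math.log2(1000):.2f})")
```

The entropy of such a distribution is necessarily smaller than $\log_2$ of the number of ranks, the value that would be reached if all proteins were synthesized at the same rate.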

The distribution also has a certain information content which can be calculated using equation (C2.14.64), and it
turns out that this is a rather low number: typically around 8 bits/protein [173], far less than the information contained in
the sequence, let alone in the structure [ 169 ]. Does this mean that much of the sequence-structure information is 
actually irrelevant to the protein, which has 'merely' to fulfil a certain specified function? For an enzyme or a 
protein which has to recognize a substrate or a binding site, the conformation and side-chain chemistry of the 
catalytic or binding site may be far more crucial than the rest of the protein. But one should also bear in mind that 
proteins do not exist in isolation. Their expression (and folding, for about 10% of proteins) requires the presence of 
other proteins, and nearly all proteins interact with others, either to build up structures, or in subtle and complex 
signalling pathways. Hence the information content and value of an individual protein cannot be assessed in 
isolation, but must be evaluated in the context of the entire repertoire, just as the genome is not a mosaic of 
individual genes, each coding for a single protein or attribute, but a highly complex interconnected, possibly even 
heterarchical, network [ 174 ]; and just as a little 'no' in a long paragraph could be absolutely crucial to the import of 
the entire message. 

C2.14.9 CONCLUDING REMARKS 

The study of living organisms, although traditionally reserved for the biologist, is a field in which biology, 
chemistry and physics must work together in order to make real advances. Ageno [16] has further emphasized that 
to view this development as the 'conquest' of one discipline by another is quaintly outmoded: the classification of 
this or that field as part of a particular discipline is rather arbitrary and mainly of historical interest. The fusion of 
the work of the biologist, chemist and physicist is irremeable; one may call this fusion biological physics, or, more 
comprehensively, biophysical chemistry; a toolbox with the help of which a mathematical description of biological
phenomena can be given. Sometimes these descriptions will be caricatures of reality, but one hopes that they at
least fulfil their purpose of capturing its essence. 

It has become fashionable to prefix the names of disciplines with 'bio', as in biophysics, bioinformatics and so on, 
giving the impression that in order to deal with biological systems, a different kind of physics, or information 
science, is needed. But no such necessity exists. Biological systems are often very complex and
compartmentalized, and their scaling laws may be different from those familiar in inanimate systems, but this 
merely means that different emphases from those useful in dealing with large uniform systems are required, not 
that a separate branch of knowledge should necessarily be developed. 

Experimental work in the field is often burdened by tension between the small amounts of pure materials generally 
available for investigation (a problem compounded by their instability) and the complexity of biological systems 
and hence the large number of possible interpretations of data, which calls for more detailed investigation than 
otherwise. The frequently encountered, seemingly poor, reproducibility of experiments with biological materials 
and systems also calls for more experiments than would suffice for simpler systems. This 'irreproducibility' might 
well be a manifestation of the fact that the measured phenomena are the result of multiplicative rather than additive 
processes. For the latter, the sum and its distribution converge rapidly enough to their asymptotic values; for the 
former, a principle comparable to the central limit theorem is lacking; in fact the average value of the product 
diverges exponentially from the most probable value as the number of random variables contributing to it increases 
[175]. 
C2.15 Optoelectronics 

William L Wilson 


C2.15.1 INTRODUCTION 

Optoelectronic technologies [1] encompass a wide variety of devices and structures used to generate, manipulate, 
detect and direct light signals. This interdisciplinary field, born at the intersection of microelectronics and optics, is 
poised to provide the primary building blocks of the communications and computing infrastructure of the 21st 
century. The development of high-bandwidth optical networks using wave division multiplexing [2], for example, 
requires a wide variety of components for network construction. The optical communications and optical 
networking revolution is being fought with fibre, optical routers, modulators and detectors. The development of 
these basic optoelectronic components will be key in determining the progression of communication technologies 
for years to come. 

In this chapter we review the fundamental processes which allow us to define and control optical sources and 
signals. The basic mechanisms for generation, transmission and manipulation are described. A large number of 
detailed treatises have been published describing many of the phenomena covered here [3, 4, 5, 6 and 7]. Because 
of space limitations, we offer only rudimentary descriptions of the subjects covered, and we have tried to 
ensure that the references listed allow the subject matter to be explored in whatever detail is desired. It is 
important to note that any description of optoelectronic technologies has roots deep in optical physics, 
quantum mechanics and electromagnetic theory. Here only essential formulae are derived; detailed derivations of all 
the expressions can be found in the references cited. Our goal is to give the reader a flavour of the technology and 
to stir the reader's imagination and interest in this exciting and rapidly expanding field. 


C2.15.2 ELECTROMAGNETIC WAVES 

In order to understand how light can be controlled, we must first review some of the basic properties of the 
electromagnetic field [8]. The electromagnetic theory of light is governed by the equations of James Clerk 
Maxwell. The field phenomena in free space with no sources are described by the basic set of relationships below: 

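$$\nabla \times \mathbf{H} = \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t} \qquad \text{(C2.15.1)}$$

$$\nabla \times \mathbf{E} = -\mu_0 \frac{\partial \mathbf{H}}{\partial t} \qquad \text{(C2.15.2)}$$

$$\nabla \cdot \mathbf{E} = 0 \qquad \text{(C2.15.3)}$$

$$\nabla \cdot \mathbf{H} = 0 \qquad \text{(C2.15.4)}$$

where $\mathbf{E}$ and $\mathbf{H}$ are the electric and magnetic fields respectively. Here the constants $\varepsilon_0$ and $\mu_0$ are the electric 
permittivity and the magnetic permeability of free space. 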

The necessary boundary conditions required for E and H to satisfy Maxwell's equations give rise to the well 
known wave equation for the electromagnetic field: 

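$$\nabla^2 U - \frac{1}{c^2} \frac{\partial^2 U}{\partial t^2} = 0 \qquad \text{(C2.15.5)}$$

where $c = 1/(\varepsilon_0 \mu_0)^{1/2} \approx 3 \times 10^8$ m s$^{-1}$ is the speed of light in a vacuum. 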

This wave equation is the basis of all wave optics and defines the fundamental structure of electromagnetic theory, 
with the scalar function U representing any of the components of the vector functions E and H. (Note that equation 
(C2.15.5) can be easily derived by taking the curl of equation (C2.15.1) and equation (C2.15.2) and substituting 
relations (C2.15.3) and (C2.15.4) into the results.) 

Although a complete treatment of optical phenomena generally requires a full quantum mechanical description of 
the light field, many of the devices of interest throughout optoelectronics can be described using the wave 
properties of the optical field. Several excellent treatments on the quantum mechanical theory of the 
electromagnetic field are listed in [9]. 

In general, the wave equation describes the propagation of a disturbance through a transparent 
medium; the electromagnetic wave is just one of the physical phenomena that satisfy this 
relationship. The solutions of this equation with appropriate boundary conditions give rise to all the behaviours we 
commonly associate with the wave properties of light. For monochromatic waves, all of the components of the 
electric field are harmonic functions of time and space. Electromagnetic waves are by definition transverse, i.e. the 
electric (E) and magnetic (H) field disturbances are orthogonal to the propagation direction (z), with the E and H 
fields orthogonal to each other (figure C2.15.1). The electric field is characterized by the amplitude (A), the 
wavelength ($\lambda$), the phase of the wave and the velocity of the wavefront. The plane wave is the most general and 
simplest example of a three-dimensional solution of the wave equation; in addition, it provides a somewhat idealized 
input field for all optoelectronic applications. The wave has the form: 

$$U(\mathbf{r}, t) = A\, e^{-\mathrm{i}(\mathbf{k} \cdot \mathbf{r} - \omega t)} \qquad \text{(C2.15.6)}$$

where A is the amplitude of the field disturbance, $\mathbf{k}$ is the propagation vector, which has the magnitude $2\pi/\lambda$, and $\omega$ is 
the angular frequency, $\omega = 2\pi\nu$. 
Figure C2.15.1. The transverse electromagnetic wave. 

This basic equation describes waves, whose properties are related as follows: 

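$$\nu = \frac{c}{\lambda} \qquad \text{(C2.15.7a)}$$

$$\frac{\omega}{k} = v_p = \frac{1}{\sqrt{\mu_0 \varepsilon_0}} \qquad \text{(C2.15.7b)}$$

where $\nu$ is the frequency of the wave and $v_p$ is the phase velocity. 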
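
As a quick numerical check of relations (C2.15.7a) and (C2.15.7b), the following minimal Python sketch evaluates c from the vacuum constants and then the frequency, angular frequency and wavenumber for a single wavelength. The 633 nm wavelength and the function name are illustrative assumptions and are not taken from the text.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
MU0 = 1.25663706212e-6    # vacuum permeability, H/m


def plane_wave_parameters(wavelength_m):
    """Return (c, nu, omega, k) for a vacuum plane wave of the given wavelength."""
    c = 1.0 / math.sqrt(EPS0 * MU0)      # speed of light from the vacuum constants
    nu = c / wavelength_m                # frequency, nu = c / lambda
    omega = 2.0 * math.pi * nu           # angular frequency, omega = 2 pi nu
    k = 2.0 * math.pi / wavelength_m     # magnitude of the propagation vector
    return c, nu, omega, k


# Illustrative (assumed) wavelength of 633 nm
c, nu, omega, k = plane_wave_parameters(633e-9)
print(f"c = {c:.4e} m/s, nu = {nu:.3e} Hz, omega/k = {omega / k:.4e} m/s")
```

The ratio omega/k reproduces c, as expected for a wave in free space.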

For the electromagnetic fields E and H the form of the waves of interest is 


(C2.15.8) 


for many of the applications described here. 

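Above we described Maxwell's equations in free space. In a medium, two more vector fields need to 
be introduced, the electric displacement $\mathbf{D}$ and the magnetic flux density $\mathbf{B}$, and Maxwell's equations become 

$$\nabla \times \mathbf{H} = \frac{\partial \mathbf{D}}{\partial t} \qquad \text{(C2.15.9)}$$

$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} \qquad \text{(C2.15.10)}$$

$$\nabla \cdot \mathbf{D} = 0 \qquad \text{(C2.15.11)}$$

$$\nabla \cdot \mathbf{B} = 0 \qquad \text{(C2.15.12)}$$

with 

$$\mathbf{D} = \varepsilon_0 \mathbf{E} + \mathbf{P} \qquad \text{(C2.15.13)}$$

$$\mathbf{B} = \mu_0 \mathbf{H} + \mu_0 \mathbf{M}. \qquad \text{(C2.15.14)}$$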

These new quantities allow us to relate the properties of the medium directly to E and H. In essence they afford us the 
opportunity to quantify the field-matter interaction. The response of the medium to the fields is described generally in terms 
of the polarization, P, and the magnetization, M. (We note that in free space P = M = 0 and we recover equation 
(C2.15.1), equation (C2.15.2), equation (C2.15.3) and equation (C2.15.4) above.) 

For isotropic media we will assume that P is parallel to E with the coefficient of proportionality independent of 
direction: 

$$\mathbf{P} = \chi_e \mathbf{E} \qquad \text{(C2.15.15)}$$

where the constant $\chi_e$ is the electric susceptibility of the medium. The electric displacement is therefore 
proportional to E: 

$$\mathbf{D} = \varepsilon \mathbf{E} \qquad \text{(C2.15.16)}$$

where $\varepsilon = 1 + 4\pi\chi_e$ is the dielectric constant. This parameter relates the material properties of the medium to the 
polarization generated through its interaction with the external field. This polarization becomes a source term in 
Maxwell's equations, giving rise to new fields mediated via the material-field interaction [9]. Absorption and 
dispersion processes can be attributed to $\varepsilon$, with 

$$\tilde{n} = \sqrt{\varepsilon} = n + \mathrm{i}\kappa \qquad \text{(C2.15.17)}$$

being the complex refractive index, where the real part is related to the dispersive properties of the medium and $\kappa$, the 
absorption coefficient, is determined by the imaginary part of the polarization [8]. 
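
The relation between a complex dielectric constant and the complex refractive index can be illustrated numerically. The sketch below is only a minimal example: the value of the dielectric constant is a hypothetical one chosen for a weakly absorbing medium, and the function name is an assumption.

```python
import cmath


def complex_index(eps):
    """Return (n, kappa) such that n + i*kappa = sqrt(eps), with n >= 0."""
    root = cmath.sqrt(eps)   # principal square root has a non-negative real part
    return root.real, root.imag


# Hypothetical weakly absorbing dielectric: eps = 2.25 + 0.03i
n, kappa = complex_index(2.25 + 0.03j)
print(f"n = {n:.5f}, kappa = {kappa:.5f}")
```

For a small imaginary part the real index is close to the square root of the real part of the dielectric constant, while kappa is approximately the imaginary part divided by 2n.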

The last attribute of the electromagnetic field we need to discuss is wave polarization. The nature of the transverse 
field is such that the oscillating field disturbance (which is perpendicular to the propagation direction) has a 
particular orientation in space. The polarization of light is determined by the time evolution of the direction of the 
electric field vector $\mathbf{E}(\mathbf{r}, t)$. In our description, $\mathbf{E}_0$ is modulated by a phase factor which maps out the field oscillation in space. If 
z is the axis of propagation, we can define the amplitude factor as 

$$\mathbf{E}_0 = E_x \hat{\mathbf{x}} + E_y \hat{\mathbf{y}} \qquad \text{(C2.15.18)}$$

where the unit vectors $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ are orthogonal and 

$$E_x = \mathcal{E}_x \exp(\mathrm{i}\phi_x) \qquad \text{(C2.15.19)}$$

with the analogous expression for $E_y$. The amplitudes $\mathcal{E}_x$ and $\mathcal{E}_y$ with their phase factors $\phi_x$ and $\phi_y$ map out the polarization vector in the x-y plane. The resultant E ranges 
from linearly polarized light, for $\phi_x$ or $\phi_y = 0$, through all possible combinations resulting in elliptical fields. In figure 
C2.15.2 a right circularly polarized wave is illustrated. As the wave propagates, E sweeps out a circle in the x-y 
plane. It is clear that, given a well characterized light source, there are many attributes we can attempt to control 
(wavelength, polarization, etc.); the question is how to generate such well-characterized light. 



Figure C2.15.2. Right circularly polarized light. As the wave propagates the resultant E sweeps out a circle in the 
x-y plane. 
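
The way the two amplitudes and phases determine the polarization state can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the tolerance and the classification rule (equal amplitudes with a quarter-cycle phase difference give circular light, zero phase difference or a vanishing component gives linear light) are assumptions added here, and no handedness convention is implied.

```python
import cmath
import math


def field_xy(amp_x, amp_y, phi_x, phi_y, phase):
    """Real x and y field components at a given value of the oscillation phase."""
    ex = (amp_x * cmath.exp(1j * (phi_x - phase))).real
    ey = (amp_y * cmath.exp(1j * (phi_y - phase))).real
    return ex, ey


def classify(amp_x, amp_y, phi_x, phi_y, tol=1e-9):
    """Classify the polarization as 'linear', 'circular' or 'elliptical'."""
    delta = phi_y - phi_x                      # relative phase of the two components
    if amp_x < tol or amp_y < tol or abs(math.sin(delta)) < tol:
        return "linear"
    if abs(amp_x - amp_y) < tol and abs(math.cos(delta)) < tol:
        return "circular"
    return "elliptical"


# Equal amplitudes and a quarter-cycle phase difference: the tip of E traces a circle
radii = [math.hypot(*field_xy(1.0, 1.0, 0.0, math.pi / 2, p))
         for p in (0.0, 0.7, 1.9, 3.3)]
print(classify(1.0, 1.0, 0.0, math.pi / 2), radii)
```

For the circular case the magnitude of the resultant field stays constant while its direction rotates, which is the behaviour shown in figure C2.15.2.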


C2.15.3 SOURCES: THE LASER 

Given the general description of the electromagnetic field, let us explore the sources available for optoelectronics. 
The primary light source for optoelectronic device and system architectures is the laser. The laser [10] is the 
source of choice simply because if we want to control light fields they need to be well defined at the start and the 
laser is the most 



The acronym LASER (Light Amplification by the Stimulated Emission of Radiation) defines the process of 
amplification. For all intents and purposes this process was elegantly outlined by Einstein in 1917 [11], wherein he 
derived a treatment of the dynamic equilibrium of a material in an electromagnetic field absorbing and emitting 
photons. Key here is the insight that, in addition to absorption and spontaneous emission processes, in an excited 
system one can stimulate the emission of a photon by interaction with the electromagnetic field. It is this 
'stimulated' emission process which lays the conceptual foundation of the laser. 


The essential result of quantum theory [12] is that each physical system can be found upon measurement to be in 
one of a pre-determined set of energy states, the eigenstates of the system. These eigenstates [13] result from the 
solution of the Schrödinger equation for the system under study with the Hamiltonian [13] chosen to give the most 
complete characterization of the total energy of the system. Some classic analyses of generic systems [13] include 
the harmonic oscillator, the hydrogen atom and the hydrogen molecule ion problems. In each of these cases the 
solutions allow us to adequately predict the energetic processes for the complex systems mentioned. Let us assume 
we have a system described by figure C2.15.3. Let us isolate two levels, $E_1$ and $E_2$. If the system is in $E_2$, there is a 
finite probability per unit time that the system will decay to $E_1$ with the emission of a photon of energy $h\nu_{21}$. (The 
energy difference is $(E_n - E_m) = h\nu_{nm}$.) This spontaneous emission is characterized by a lifetime for state $E_2$. 
For an ensemble of the systems described above, we can represent the time rate of change of the population density 
as 


$$\frac{\mathrm{d}\rho_2}{\mathrm{d}t} = -A_{21}\rho_2 = -\frac{\rho_2}{\tau_{21}} \qquad \text{(C2.15.20)}$$

Figure C2.15.3. Generalized energy level diagram. 
In the presence of an electromagnetic field with photon energy of about $h\nu_{21}$, our systems can undergo absorptive transitions 
from $E_1$ to $E_2$, extracting a photon from the field. In addition, as described by Einstein, the field can induce 
emission of photons from $E_2$ to $E_1$ (given that $E_2$ is occupied). Let the energy density of the external field be $E(\nu)$; then 

$$W_{21} = B_{21} E(\nu) \qquad \text{(C2.15.21)}$$

$$W_{12} = B_{12} E(\nu) \qquad \text{(C2.15.22)}$$

where the $B_{ij}$ are the Einstein coefficients for absorption and stimulated emission. The $W_{ij}$ are the associated 
transition probabilities. In this picture the total transition rate would be 

$$W'_{21} = B_{21} E(\nu) + A_{21} \qquad \text{(C2.15.23)}$$

where we have simply added the spontaneous contribution. In thermal equilibrium it can be shown that [1] 


(C2.15.24) 

where $W_{\mathrm{eq}}$ is the equilibrium transition probability and $g(\nu)$ is a spectral lineshape function. 

Using equation (C2.15.24), we can derive a general expression for the absorption coefficient for this simple two-level 
system: 

(C2.15.25) 

If we substitute equation (C2.15.25) into Beer's law 

(C2.15.26) 

it is clear that when the upper-state population exceeds that of the lower state there will be an exponential increase 
in the field intensity as the photon flux propagates through the active medium. This 'population inversion' is the 
primary condition required for laser action. Since this is a non-equilibrium condition, energy must be introduced 
into the system to reach this state. 'Pumping' is the introduction of the energy required to reach 
inversion, and the method depends on the system in which we are attempting to obtain laser action. The first optical lasers were 
developed more than 40 years after the introduction of the Einstein equations, direct evidence of the difficulty of 
creating inverted conditions. The key to laser design has been to find systems that can be efficiently pumped to 
produce gain. 
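
The role of the population difference can be illustrated with a minimal numerical sketch of Beer's law. It assumes, purely for illustration, that the absorption coefficient is proportional to the population difference between the lower and upper levels through a cross section; the numerical values of the cross section, densities and path length are hypothetical.

```python
import math


def intensity(i0, sigma, n1, n2, z):
    """Beer's law with alpha = sigma * (n1 - n2); a negative alpha means gain."""
    alpha = sigma * (n1 - n2)      # assumed form of the absorption coefficient
    return i0 * math.exp(-alpha * z)


SIGMA = 3e-19   # cm^2, hypothetical transition cross section
Z = 10.0        # cm, hypothetical path length

absorbing = intensity(1.0, SIGMA, n1=1e18, n2=0.0, z=Z)   # all atoms in the lower level
inverted = intensity(1.0, SIGMA, n1=0.0, n2=1e18, z=Z)    # fully inverted medium
print(f"absorbing: {absorbing:.4f}, inverted: {inverted:.2f}")
```

With the lower level populated the beam is attenuated; with the populations inverted the same expression gives exponential growth, and equal populations leave the intensity unchanged.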

It has been said that anything will lase if pumped with enough energy, but the efficiency of the pumping process is 
important for practical, economical devices. In this regard two-level lasers are of little interest because, except 
under extraordinary pumping conditions, one can only equalize the populations of the upper and lower levels. A 
three-level laser is illustrated in figure C2.15.4(a). The first solid-state laser, ruby (Cr$^{3+}$:Al$_2$O$_3$), was of the 
three-level variety (figure C2.15.4(b)). The scheme works as follows. Atoms are pumped optically from state 1 to state 3. 
Non-radiative relaxation moves the population from state 3 to state 2, creating (at sufficient pumping levels) a 
population inversion between level 2 and level 1. Lasing occurs on the 2-1 transition, with the linewidth of the 
emission determined by the kinetics of the system and the resonator design. 

Figure C2.15.4. (a) A three-level laser energy level diagram and (b) the ruby system. 

Four-level lasers offer a distinct advantage over their three-level counterparts (figure C2.15.5). The Nd$^{3+}$:YAG 
system is an excellent example of a four-level laser. Here the terminal level for the laser transition, |2⟩, is 
unoccupied, thus resulting in an inverted state as soon as any atom is pumped to state 3. Solid-state systems based 
on this pumping geometry dominate the marketplace for high-power laser devices. 

Figure C2.15.5. (a) A four-level laser energy level diagram and (b) the Nd$^{3+}$:YAG system. 
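
The difference between the three- and four-level schemes can be made concrete with a highly idealized steady-state model. The sketch below is an assumption-laden illustration, not a treatment from the text: it assumes that relaxation out of the pump level is instantaneous, that the lower laser level of the four-level system empties instantaneously, and that stimulated emission is negligible. The pump rate and lifetime symbols are hypothetical names.

```python
def three_level_inversion(pump_rate, lifetime, n_total=1.0):
    """Steady-state N_upper - N_lower when the lower laser level is the ground state.

    Balance: pump_rate * N_lower = N_upper / lifetime, N_lower + N_upper = n_total.
    """
    x = pump_rate * lifetime
    return n_total * (x - 1.0) / (x + 1.0)


def four_level_inversion(pump_rate, lifetime, n_total=1.0):
    """Steady-state inversion when the lower laser level is assumed to stay empty."""
    x = pump_rate * lifetime
    return n_total * x / (1.0 + x)


for x in (0.1, 1.0, 10.0):
    print(x, three_level_inversion(x, 1.0), four_level_inversion(x, 1.0))
```

In this simple picture the three-level system is inverted only when the pump rate exceeds the inverse lifetime of the upper level, whereas the four-level system is inverted for any non-zero pump rate.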

A wide variety of methods has been used to pump laser systems. Although optical pumping has been implied, there 
is an array of collisionally or electron-impact pumped systems, as well as electrically pumped ones. The 
efficiency of the pumping cycle in many ways defines the utility and applications of each scheme. The first 
material in which optical laser action was observed was the ruby system mentioned above. Here intense flashlamps 
were used to pump the system, which runs naturally in a pulsed mode. True continuous wave (CW) systems were 
first demonstrated in gaseous gain media. 

The He-Ne laser system was the first efficient CW laser. It is still one of the most common systems in use today. 
Its level diagram is shown in figure C2.15.6. Here a DC or RF discharge is used to excite the He atoms, which in 
turn collisionally excite Ne atoms. Lasing occurs between several S and P bands, with resonators designed to optimize 
the wavelength of interest. 

Figure C2.15.6. The He-Ne system. 


For primary optoelectronic device applications the most relevant laser source is the semiconductor laser. A detailed 
analysis of semiconductor laser theory can be found in several references [6, 14] and is treated elsewhere in this 
volume; the basic operation of the lasers will be described qualitatively here. Semiconductors [15] are complex 
multi-atom crystalline systems which can be characterized by dense bands of energy levels (derived from atom-atom 
interactions) separated by 'forbidden' gaps. These gaps are regions of phase space which do not match the 
boundary conditions required for electronic states. The simplest view of a semiconductor is as an ensemble of 
interacting atoms characterized by loosely bound valence electrons, coupled to a strong periodic potential derived 
from the atomic nuclei. The periodicity gives rise to the boundary conditions mentioned above, with the details of 
the energy levels determined by the atoms in the array and the specifics of the crystal structure. Figure C2.15.7 
shows a generic energy level diagram for metals, semimetals, insulators and semiconductors. Intrinsic 
semiconductors have full valence bands and generally have band gaps that range from 0.1 to 4 eV. Absorption and 
emission occur via the transfer of electrons between occupied and unoccupied band states. The key property of 
these materials is that the number of these states can be modified, thereby changing their electronic properties. 
It is the ability to change the electronic properties of these materials easily that has led to their extensive use 
as materials for complex electronic device fabrication. The addition ('doping') of electron-deficient or 
electron-rich atoms to the lattice can greatly modify the populations of the electrons and holes. Clearly, one now 
has another degree of freedom to adjust when attempting to reach inversion. The ability to create 'unoccupied' 
(hole) and/or 'occupied' (electron) states allows for 'chemical' pumping in addition to any other scheme designed. 
The p-n junction laser is a perfect example of this exploitation (figure C2.15.8). Here a p-type material (excess 
'holes') is joined with an n-type material (excess electrons), figure C2.15.8(a). At the junction a 'depletion 
layer' is created by the internal electric fields in the material, which results in potential barriers that 
spatially localize each of the carriers, figure C2.15.8(b). When the structure is forward biased these barriers are 
lowered, allowing charge injection into the depleted region and resulting in radiative recombination 
(figure C2.15.8(c)). This electrical pumping process is extremely efficient and results in the low-current, 
high-output devices that are common today. 

Figure C2.15.7. Generic band diagrams: insulator, metal, semimetal, and semiconductor. 

We do not have space, nor is it appropriate, to review all laser types and modes of operation (the references 
included will afford the reader ample opportunity to survey the field). For reference we include a table giving an 
overview of the common laser types and their modes of operation (table C2.15.1). In general, pulsed laser output 
results from Q-switching or mode locking the devices. In both of these cases the kinetics of the optical system and 
the configuration of the optical resonator define the modulation frequency limits. In a Q-switched system a 
controllable loss is introduced into the resonator, allowing the steady-state population inversion to reach a level far 
above that achieved by conventional pumping. When this additional loss is removed the system begins oscillating 
at a point well above threshold; lasing occurs with a rapid depletion of the gain, which eventually turns off the 
oscillation. The pulses generated have widths typically in the range of tens to hundreds of nanoseconds, with 
repetition rates of $10^4$-$10^5$ pulses s$^{-1}$. 

Figure C2.15.8. The p-n junction: (a) p-type and n-type materials, (b) depletion layer formation at the p-n 
interface or 'junction' and (c) p-n junction laser action. 

Table C2.15.1 Common laser sources (s denotes solid-state lasers and g denotes gaseous lasers). 

Laser type             Lasing wavelength λ   Efficiency η (%)   Mode of operation   Typical output power or pulse energy
ArF excimer (g)        193 nm                1                  Pulsed              500 mJ
KrF excimer (g)        248 nm                1                  Pulsed              500 mJ
He-Cd (g)              442 nm                0.1                CW                  10 mW
Ar+ (g)                514 nm                0.05               CW                  10 W
He-Ne                  633 nm                0.05               CW                  1 mW
Kr+ (g)                647 nm                0.01               CW                  500 mW
Dye laser (R6G)        550-650 nm            0.005              CW or pulsed        100 mW
Ruby (Cr3+) (s)        694 nm                0.1                Pulsed              5 J
Ti3+:Al2O3 (s)         650-1180 nm           0.01               CW                  1-10 W
Nd3+ glass (s)         1064 nm               1                  Pulsed              10-50 J
Nd3+:YAG (s)           1064 nm               0.5                CW or pulsed        10-30 W
KF colour centre (s)   1.25-1.45 μm          0.005              CW                  500 mW
CO2                    10.6 μm               10-20              CW                  100 W

Another method for producing pulsed laser output is longitudinal 'mode locking'. Here, the natural longitudinal 
modes of a laser resonator are phase locked, resulting in wavepacket formation. This method takes advantage of the 
coupled gain characteristics of laser modes in optical resonators. It is the ability to dynamically control the phase 
relationships between the lasing modes in the cavity that makes this phenomenon possible. The width of the resulting pulse 
is determined by the gain bandwidth, with the limits defined by the uncertainty principle. Mode locking is actually 
achieved by the intra-cavity modulation of the optical gain of the laser at the round-trip frequency of the resonator. 
This frequency is c/2l, where c is the speed of light and l is the cavity length of the laser. This modulation induces 
sidebands which couple the gain of adjacent cavity modes. 
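
The effect of phase locking can be illustrated by summing a set of equally spaced cavity modes. The following sketch is an idealization under stated assumptions: the modes have equal amplitudes and identical phases, the cavity is empty (refractive index of one), and the 1.5 m cavity length and the number of modes are hypothetical values.

```python
import cmath
import math

C = 2.99792458e8  # speed of light, m/s


def mode_spacing(cavity_length_m):
    """Round-trip frequency c/2l, which is also the spacing of adjacent modes."""
    return C / (2.0 * cavity_length_m)


def locked_intensity(n_modes, t, spacing_hz):
    """Intensity of n_modes equal-amplitude modes locked with zero relative phase."""
    field = sum(cmath.exp(2j * math.pi * q * spacing_hz * t) for q in range(n_modes))
    return abs(field) ** 2


spacing = mode_spacing(1.5)                        # hypothetical 1.5 m cavity
period = 1.0 / spacing                             # round-trip time 2l/c
peak = locked_intensity(20, 0.0, spacing)          # all modes add in phase
between = locked_intensity(20, 0.5 * period, spacing)
print(f"spacing = {spacing / 1e6:.1f} MHz, peak = {peak:.1f}, between pulses = {between:.2e}")
```

The locked modes produce a pulse once per round trip whose peak intensity scales as the square of the number of modes, and whose duration shortens as more modes (a wider gain bandwidth) are included.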

The importance of laser light, in brief, is that its basic characteristics (coherence, spectral and polarization purity, 
and high brilliance) allow us to manipulate its properties. Gain switching [1, 10] and mode locking [16] are prime 
examples of our ability to control the laser output very specifically. It is easy to see why lasers are the ideal sources 
for optoelectronic applications. 

C2.15.4 NONLINEAR OPTICS 

The high-field output of laser devices allows for a wide variety of 'nonlinear' interactions [17] between the 
radiation field and matter. Many of the initial relationships can be derived using engineering principles by 
simply expanding the polarization of the medium in a Taylor series in powers of the electric field: 

$$P = \varepsilon_0 \chi E + P_{NL} \qquad \text{(C2.15.27)}$$

where 

$$P_{NL} = 2\chi^{(2)} E^2 + 4\chi^{(3)} E^3 + \cdots . \qquad \text{(C2.15.28)}$$

A wide variety of useful phenomena that allow the manipulation of the wavelength, amplitude and phase of the 
optical fields are mediated by $\chi$, the first- and higher-order susceptibilities. In essence, the $\chi^{(n)}$ represent the 
complex interactions of the electric fields with the nonlinear media. They determine the explicit interaction of the 
quantum mechanical system (the propagation medium) with the quantized radiation field. Our engineering 
approach is a precursor to the well known semi-classical approach, where the radiation field is treated classically 
and the media quantum mechanically [18]. Assuming the electric dipole interaction represents the dominant 
contribution to the interaction Hamiltonian, the macroscopic polarization of the material of interest is the 
expectation value of the dipole matrix element scaled by the volume, 

$$\mathbf{P} = \frac{\langle \boldsymbol{\mu} \rangle}{\Delta V} \qquad \text{(C2.15.29)}$$

where the interaction Hamiltonian is 

$$H_I = -\boldsymbol{\mu} \cdot \mathbf{E}(\mathbf{r}, t) \qquad \text{(C2.15.30)}$$

(here E(r,t) represents all external driving fields). This quantity incorporates all matter-field interactions. A 
perturbative expansion of this system allows correlation of the terms of equation (C2.15.29), to the nth order, with 
the terms generated from the simple Taylor series expansion described initially. This analysis gives an 
atomic/molecular basis for the susceptibilities, allowing greater insight into the nonlinear processes observed. An 
excellent treatment of this analysis is found in [18]. For our purposes the simple view that the intense fields drive a 
nonlinear oscillation of the polarizable electronic states of the materials is sufficient. 
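
The simple picture of a driven nonlinear oscillation can be checked numerically. The sketch below is a scalar illustration only: it drives a polarization containing a linear term and a quadratic term of the form $2\chi^{(2)}E^2$ with a single-frequency field and projects the response onto its harmonics. The permittivity of free space is set to one, and the susceptibility and field values are arbitrary assumptions.

```python
import math


def harmonic_amplitude(samples, harmonic):
    """Amplitude of a given harmonic (>= 1) of a signal sampled over one full period."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * harmonic * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * harmonic * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def polarization(e, chi1, chi2):
    """Scalar expansion P = chi1*E + 2*chi2*E^2 in arbitrary units."""
    return chi1 * e + 2.0 * chi2 * e * e


N = 256                                   # samples over one optical period
E0 = 0.5                                  # arbitrary field amplitude
field = [E0 * math.cos(2 * math.pi * i / N) for i in range(N)]
p = [polarization(e, chi1=1.0, chi2=0.1) for e in field]

fundamental = harmonic_amplitude(p, 1)    # linear response at omega
second = harmonic_amplitude(p, 2)         # second-harmonic component at 2*omega
dc = sum(p) / N                           # static component (optical rectification)
print(fundamental, second, dc)
```

The quadratic term produces a component at twice the driving frequency together with a static component of equal size, and both scale with the square of the field amplitude.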

We will obtain a flavour of the nonlinear phenomena by exploring the processes generated via the matter-field 
interactions to second order. The polarization to second order in the electric field is 

$$P_i^{(2)} = \varepsilon_0 \sum_{j,k} \chi^{(2)}_{ijk} E_j E_k \qquad \text{(C2.15.31)}$$

where the various contributions arise through the permutation of the indices j and k. The most commonly encountered spectral 
component of $\chi^{(2)}$ arises from a two-photon coupling of a single electric field, resulting in the generation of a 
second harmonic of the field. Second harmonic generation (SHG) [19], discovered soon after the laser, is an 
essential wavelength conversion tool utilized throughout laser physics and engineering. The relevant terms in the 
polarization are of the form 

$$P_i^{(2)}(\omega, z) = \varepsilon_0 \sum_{j,k} \chi^{(2)}_{ijk}(\omega; 2\omega, -\omega)\, E_j(2\omega, z)\, E_k^*(\omega, z) \qquad \text{(C2.15.32)}$$

and 

$$P_i^{(2)}(2\omega, z) = \varepsilon_0 \sum_{j,k} \chi^{(2)}_{ijk}(2\omega; \omega, \omega)\, E_j(\omega, z)\, E_k(\omega, z). \qquad \text{(C2.15.33)}$$

$P^{(2)}$ will of course be the source term in the wave equation. It is clear that for SHG the generated polarization 
scales as $|E(\omega)|^2$. In general, the intensity scales with the incident power per cross sectional area. To maximize the 
SHG output, the interaction length of the $\omega$ and $2\omega$ fields needs to be as long as possible; otherwise power begins to be 
converted back to $\omega$. In most materials, dispersion and diffraction effects limit the conversion efficiency. A wide 
variety of techniques have been developed to solve this problem. The most widespread is the use of uniaxial 
nonlinear crystals for wavelength conversion. In these systems (which have orientation-dependent indices of 
refraction) the crystals are cut such that the propagating second harmonic and fundamental wavelength traverse the 
medium at the same speed, thus resulting in optimal conversion. This is commonly known as phase matching. (A 
detailed analysis of the phase matching arrangements can be found in [20].) In addition, several waveguide and 
fibre SHG devices have been developed [21]. Another second-order process of great utility is optical rectification 
[22] (the coupling and generation of DC fields using an optical field). An outgrowth of this process is the 
'electro-optic effect', which allows the manipulation of optical radiation with strong DC fields in appropriate media. In 
these materials, the change in index can be written as 

$$\Delta n \approx \frac{\chi^{(2)}}{n} E(0) = -\frac{1}{2} n^3 r E(0) \qquad \text{(C2.15.34)}$$

where r is the Pockels coefficient. This effect allows for electric field induced phase shifting of the optical fields. 
The nonlinear process allows the direct, rapid, and efficient coupling of electrical RF signals to high-frequency 
optical fields, making the Pockels effect an essential tool for high-frequency modulation. 
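
The size of the electro-optic phase shift can be estimated with a short calculation. The sketch below assumes the standard relation between the magnitude of the index change and the Pockels coefficient, namely one half of $n^3 r E(0)$, and a phase shift of $2\pi \Delta n L/\lambda_0$ over a length L. All numerical values (index, Pockels coefficient, length and wavelength) are hypothetical and chosen only to give a representative order of magnitude.

```python
import math


def index_change(n, r, e_dc):
    """Magnitude of the Pockels index change, (1/2) n^3 r E(0)."""
    return 0.5 * n**3 * r * e_dc


def phase_shift(n, r, e_dc, length, wavelength):
    """Phase shift accumulated over the given length, 2*pi*delta_n*L/lambda_0."""
    return 2.0 * math.pi * index_change(n, r, e_dc) * length / wavelength


def half_wave_field(n, r, length, wavelength):
    """DC field that produces a phase shift of pi."""
    return wavelength / (n**3 * r * length)


# Hypothetical parameters: n = 2.2, r = 30 pm/V, L = 1 cm, lambda_0 = 1.55 um
N_INDEX, R_COEFF, LENGTH, WAVELENGTH = 2.2, 30e-12, 0.01, 1.55e-6
e_pi = half_wave_field(N_INDEX, R_COEFF, LENGTH, WAVELENGTH)
print(f"half-wave field = {e_pi:.3e} V/m")
print(f"phase shift at that field = {phase_shift(N_INDEX, R_COEFF, e_pi, LENGTH, WAVELENGTH):.4f} rad")
```

Although the index change is tiny, the phase accumulated over a centimetre-scale path is of order pi, which is why modest voltages are sufficient for modulation.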

C2.15.5 OPTICAL LIGHT GUIDES 

Traditionally, light signals are directed using lenses, mirrors, and optical prisms. This free-space guiding of light, 
although exceedingly useful, is not very robust for optical transmission over long distances. The push to transmit 
'information' optically drove the development of optical conduits to transmit light signals from place to place. The 
development of guided-wave optics has been key to the advances of optical communication, leading to the 
optoelectronics revolution. In addition, the concentration of intense radiation afforded by these guides has impacted 
every area from laser design to nonlinear optical devices and has opened the door for the development of integrated 
optical devices. In this section we will outline the basics of guided-wave optics. As in [1], we will start with the 
planar-mirror waveguide as an introduction to the essential concepts, and then move on to dielectric structures. 

Consider a light conduit constructed using two parallel planar mirrored surfaces separated by a distance d (figure C2.15.9). Assuming the 
mirrors are lossless, a light ray at an appropriate angle will propagate along the conduit axis, reflecting without 
loss of energy. (Note that waveguides of this type are not made in practice due to the difficulty of fabricating 
mirrors with low enough losses.) Consider launching a monochromatic wave into the guide with wavelength 
$\lambda = \lambda_0/n$ (where n is the refractive index between the plates). Given our basic assumption of lossless surfaces, the guide 
constrains the propagating modes to those that maintain the same transverse distribution at all distances along the 
waveguide axis. To fulfil this requirement any launched wave must reproduce itself after two reflections. 
Only a limited number of angles satisfy this condition, and they must satisfy 
the relationship 

$$\sin\theta_m = m \frac{\lambda}{2d} \qquad \text{(C2.15.35)}$$

where m = 1, 2, 3, .... The guided-wave modes are composed of two distinct plane waves at $\pm\theta_m$. We define the 
propagation constant of the mth mode as 

$$\beta_m = k \cos\theta_m. \qquad \text{(C2.15.36)}$$

There is a maximum number of modes possible, defined by the range of accessible angles. Since $\sin\theta_m < 1$, the 
maximum number of modes is the greatest integer smaller than $1/(\lambda/2d)$, or 

$$M \doteq \frac{2d}{\lambda}. \qquad \text{(C2.15.37)}$$

As shown, the number of modes increases with increasing mirror separation and decreasing wavelength. If $2d/\lambda$ is 
less than one, M = 0 and no self-consistent modes are supported. 2d represents the cut-off wavelength of the guide: 
it is the longest wavelength supported by the guide. It is clear that if the spacing is adjusted properly, M can be set 
to one and only a single mode will be supported. For completeness, we note that the TM (transverse magnetic) and TE 
(transverse electric) mode distributions define the direction of E of the propagating field, as shown in figure 
C2.15.10. 
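
The mode condition for the planar-mirror guide is easily evaluated numerically. The sketch below lists the allowed bounce angles and the corresponding propagation constants along the guide axis, taken here as $k\cos\theta_m$ with $k = 2\pi/\lambda$; the mirror separation and wavelength are hypothetical values in arbitrary but consistent length units, and the function name is an assumption.

```python
import math


def mirror_guide_modes(d, wavelength):
    """Return a list of (m, theta_m, beta_m) for a planar-mirror waveguide."""
    k = 2.0 * math.pi / wavelength
    modes = []
    m = 1
    while m * wavelength / (2.0 * d) < 1.0:           # require sin(theta_m) < 1
        theta = math.asin(m * wavelength / (2.0 * d))
        modes.append((m, theta, k * math.cos(theta)))  # beta_m = k cos(theta_m)
        m += 1
    return modes


# Hypothetical guide: mirror separation d = 2.0, wavelength = 1.0 (same units)
modes = mirror_guide_modes(2.0, 1.0)
for m, theta, beta in modes:
    print(m, round(math.degrees(theta), 2), round(beta, 4))
```

For this example 2d divided by the wavelength equals four, so three modes are supported, the greatest integer smaller than four. Reducing the separation below half a wavelength leaves no supported mode.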

Figure C2.15.9. The planar-mirror light guide. 

Figure C2.15.10. Orientation of the TE and TM modes. 


C2.15.6 THE DIELECTRIC WAVEGUIDE 

Optical conduits as described above are generally not practical. The most common waveguide is the slab dielectric 
waveguide. In these devices, a high-transmission material is surrounded by a medium of lower refractive index. The 
light is guided within the device by total internal reflection. The basic structure is shown in figure C2.15.11. As with 
our initial description, light rays making an angle $\theta$ with the z-axis experience multiple total internal reflections at 
the interfaces, provided that $\theta$ is smaller than the complement of the critical angle. The slab boundaries define all 
of the properties of the guide. As before, rays making angles larger than the complement of the critical angle 
refract, losing a fraction of their optical power at each reflection and eventually vanishing (the unguided waves of 
figure C2.15.11). The detailed analysis of the waveguide modes requires a full solution of Maxwell's equations 
both inside and outside the high-index core with the appropriate boundary conditions. Such an analysis, which is 
beyond the scope of this review, can be found in [1, 23]. We will summarize the results here. As with our first 
analysis, a twice-reflected wave undergoes a phase shift that must be zero or a multiple of $2\pi$ to be self-consistent. 
The number of TE modes allowed is 

$M = \dfrac{\sin\bar{\theta}_c}{\lambda/2d}$    (C2.15.38)

where $\bar{\theta}_c$ is the complement of the critical angle, d is the slab thickness and M is increased to the nearest integer. More generally

$M = 2\,\dfrac{d}{\lambda_0}\,\mathrm{NA}$    (C2.15.39)

where the numerical aperture NA = $(n_1^2 - n_2^2)^{1/2}$, with $n_1$ and $n_2$ the refractive indices of the slab and of the surrounding medium. The NA is the sine of the angle of acceptance of rays from air
into the core of the slab. When $\lambda/2d > \sin\bar{\theta}_c$ or $(2d/\lambda_0)\,\mathrm{NA} < 1$ the waveguide supports a single mode. The TE and
TM modes in a dielectric planar waveguide are as shown in figure C2.15.12. The possible modes can be
characterized by a propagation constant, $\beta_m$, where

$\beta_m = n_1 k_0 \cos\theta_m$    (C2.15.40)
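
As a rough numerical illustration of the mode count, the sketch below evaluates M = 2(d/λ0)NA, rounded up to the next integer, for two slab thicknesses. The function name and the example indices (1.50 for the slab, 1.48 for the surrounding medium) are assumptions chosen for illustration and are not taken from the text.

```python
import math

def slab_te_modes(d, wavelength, n1, n2):
    """Return (M, NA) for a slab guide of thickness d; lengths in metres.

    M = 2 (d / lambda_0) NA, increased to the nearest integer.
    """
    na = math.sqrt(n1**2 - n2**2)        # numerical aperture of the slab
    m = 2.0 * d / wavelength * na        # non-integer mode number
    return max(1, math.ceil(m)), na

# Hypothetical slab: indices 1.50 / 1.48, free-space wavelength 1.3 um
modes_thick, na = slab_te_modes(d=10e-6, wavelength=1.3e-6, n1=1.50, n2=1.48)
modes_thin, _ = slab_te_modes(d=2e-6, wavelength=1.3e-6, n1=1.50, n2=1.48)

print(f"NA = {na:.3f}")
print(f"10 um slab: {modes_thick} TE modes")
print(f" 2 um slab: {modes_thin} TE mode (single mode, since (2d/lambda_0) NA < 1)")
```

For these assumed values the thinner slab satisfies the single-mode condition, while the thicker slab supports several TE modes.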


One can of course fabricate two-dimensional waveguides. These devices confine light in two transverse directions 
(x and y). An important example of two-dimensional waveguides is the optical fibre, which we will treat directly. 
Generally, two-dimensional waveguides are of the channel variety. An array of two-dimensional waveguide 
geometries is shown in figure C2.15.13 . So far, we have not considered modal interactions in guides, that is, the 
coupling of light from one mode to another, or the energy transfer between modes. The mode coupling of light is 
an important tool in optoelectronics. Although a full treatment of this process is beyond the scope of this chapter, 
we will describe one relatively simple device as an example and leave it to the reader to survey the references for 
greater detail on a wide variety of structures. It should be stated that the design and fabrication of these devices is 
an exciting area of current research. 

Figure C2.15.11. The dielectric waveguide.

Figure C2.15.12. (a) TE and (b) TM modes for the dielectric planar waveguide.

Figure C2.15.13. Two-dimensional waveguide configurations; the darker shading indicates the different indexes.

Our model device will be the directional coupler (figure C2.15.14). The basic function of the structure is to couple
two optical inputs to two optical outputs. Consider two single-mode waveguides on the same substrate, as shown in
the figure. Using a simple coupled-mode model, if the guides are non-interacting, the amplitudes of the input fields
as a function of propagation distance obey

$\dfrac{dA_1}{dz} = -i\beta A_1(z)$    (C2.15.41a)

$\dfrac{dA_2}{dz} = -i\beta A_2(z)$    (C2.15.41b)

with solutions

$A_i(z) = A_i(0)\,e^{-i\beta z}$    (C2.15.42)

where i = 1 and 2 respectively. (Note that the modes propagate unchanged.) If the two guides are brought close
enough together, the evanescent fields of the two modes interact, allowing energy exchange. One can define a
coupling constant κ that characterizes the perturbative interaction of the modes. Under these conditions equation
(C2.15.41a) and equation (C2.15.41b) become

$\dfrac{dA_1}{dz} = -i\beta A_1(z) - i\kappa A_2(z)$    (C2.15.43a)

$\dfrac{dA_2}{dz} = -i\beta A_2(z) - i\kappa A_1(z)$.    (C2.15.43b)

If the guides are lossless (here we ignore bending losses) and we only launch a field into guide 1 (i.e. $A_1(0) = 1$, $A_2(0) = 0$),
then, apart from the common propagation phase $e^{-i\beta z}$ and a constant relative phase between the guides, the amplitudes are

$A_1(z) = \cos(\kappa z)$    (C2.15.44a)

$A_2(z) = \sin(\kappa z)$.    (C2.15.44b)

The optical power $|A_i|^2$ oscillates between the guides depending upon the propagation distance. Clearly, by
controlling z, β and κ, a wide variety of passive devices (beam splitters, combiners, attenuators, and
interferometers) can be readily constructed.
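
The power exchange described above can be checked numerically. The following is a minimal sketch, not a device design tool: it integrates the coupled-mode equations with a fourth-order Runge-Kutta step and compares the result with the cosine and sine power laws. The values of the propagation constant and coupling constant are hypothetical and are given in arbitrary inverse-length units; the function names are likewise illustrative.

```python
import math

def coupler_powers(kappa, z):
    """Analytic powers in guides 1 and 2 for a unit input in guide 1."""
    return math.cos(kappa * z) ** 2, math.sin(kappa * z) ** 2

def integrate_coupled_modes(beta, kappa, z_end, steps=2000):
    """RK4 integration of dA1/dz = -i beta A1 - i kappa A2 (and 1 <-> 2)."""
    def rhs(a1, a2):
        return (-1j * beta * a1 - 1j * kappa * a2,
                -1j * beta * a2 - 1j * kappa * a1)

    a1, a2 = 1.0 + 0j, 0.0 + 0j          # field launched into guide 1 only
    h = z_end / steps
    for _ in range(steps):
        k1 = rhs(a1, a2)
        k2 = rhs(a1 + 0.5 * h * k1[0], a2 + 0.5 * h * k1[1])
        k3 = rhs(a1 + 0.5 * h * k2[0], a2 + 0.5 * h * k2[1])
        k4 = rhs(a1 + h * k3[0], a2 + h * k3[1])
        a1 += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        a2 += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return a1, a2

beta, kappa = 20.0, 1.0                   # hypothetical values, arbitrary units
z_half = math.pi / (4.0 * kappa)          # length for a 50/50 (3 dB) split
z_cross = math.pi / (2.0 * kappa)         # length for complete transfer

p1_half, p2_half = coupler_powers(kappa, z_half)
a1_cross, a2_cross = integrate_coupled_modes(beta, kappa, z_cross)

print(f"3 dB length:  P1 = {p1_half:.3f}, P2 = {p2_half:.3f}")
print(f"cross length: P1 = {abs(a1_cross)**2:.3e}, P2 = {abs(a2_cross)**2:.6f}")
```

In this idealized model a coupler of length π/(4κ) acts as a 3 dB beam splitter and one of length π/(2κ) transfers all of the power to the second guide. Note that β only contributes a common phase and does not affect the power split.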

Figure C2.15.14. The directional coupler.


C2.15.7 THE OPTICAL FIBRE

The optical fibre [4, 24] is the most extensively used optical waveguide. The cylindrical step index structure is
composed of an ultra-pure, extremely-low-loss, high-index core surrounded by a slightly lower index cladding. The
fibre properties are characterized by the ratio of the core and cladding radii, a/b (where a and b are
the core and cladding radii), and the delta of the two materials (figure C2.15.15). The delta, Δ, is defined as

$\Delta = \dfrac{n_1 - n_2}{n_1}$    (C2.15.45)

where $n_1$ and $n_2$ are the core and cladding indexes of refraction, respectively. (Typical deltas range from 0.001 to
0.02.) An optical field is guided in the fibre core just as in the dielectric waveguide, via total internal reflection
at the core-cladding boundary. Again, rays propagate if the angle they make with the fibre axis is less than the complement of the
critical angle. The fibre input is defined in terms of the numerical aperture NA, with

$\theta_a = \sin^{-1}\mathrm{NA}$    (C2.15.46)

where $\theta_a$ is the fibre acceptance angle. The NA can also be written in terms of the Δ of the fibre. Because
$n_1^2 - n_2^2 = (n_1 - n_2)(n_1 + n_2) \approx 2n_1^2\Delta$ when $n_1 \approx n_2$,

$\mathrm{NA} = (n_1^2 - n_2^2)^{1/2} \approx n_1 (2\Delta)^{1/2}$.    (C2.15.47)

Another important characterization parameter for fibres is the normalized frequency V:

$V = k_0 a\,(n_1^2 - n_2^2)^{1/2} = k_0 a\,\mathrm{NA}$.    (C2.15.48)

Note that here $k_0 = 2\pi/\lambda$ and a is the core radius. The parameter V determines the number of modes supported by
the fibre design, therefore defining the cut-off frequency of the fibre. For V < 2.405, only a single mode is
supported. Although written here very simply, the V parameter is exactly derived from solution of the complex
eigenvalue problem of the weakly guiding fibre [25]. A full solution of the Helmholtz equation representation of
Maxwell's equations in cylindrical coordinates must be obtained in the core and cladding, with the appropriate
boundary conditions. The Bessel function solutions of this problem give rise to the characteristic equations for V. In
practice, many of the solutions are obtained graphically. For large V there are a large number of roots to the
characteristic equations, allowing for a large number of propagating modes. In this limit, the number of modes is

$M \approx \dfrac{4}{\pi^2}\,V^2$    (C2.15.49)

for $V \gg 1$. Under these conditions the propagation constant can be found to be

$\beta_{lm} \approx n_1 k_0 \left[1 - \dfrac{(l + 2m)^2}{M}\,\Delta\right]$    (C2.15.50)

and the group velocity of the (l, m) mode is

$v_{lm} \approx \dfrac{c}{n_1}\left[1 - \dfrac{(l + 2m)^2}{M}\,\Delta\right]$.    (C2.15.51)

This spread in velocity is called 'modal dispersion' and is the principal limit to the use of multimode fibres for
long-distance transmission applications.

Figure C2.15.15. The structural profile of a step index fibre.

As described above, for small a and NA a fibre is single mode if V < 2.405. Here only one mode, with one group
velocity, is possible. This lack of 'modal dispersion' is why single-mode fibre dominates transport media in long-haul
communication systems.
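
To make the single-mode criterion concrete, the sketch below evaluates Δ, NA and V for two step index fibres and applies the large-V estimate of the mode count. Both sets of core radii, wavelengths and indices are hypothetical values chosen only to fall on either side of V = 2.405; the function names are illustrative.

```python
import math

def fibre_parameters(core_radius, wavelength, n1, n2):
    """Return (delta, NA, V) for a step index fibre; lengths in metres."""
    delta = (n1 - n2) / n1                               # fractional index step
    na = math.sqrt(n1**2 - n2**2)                        # numerical aperture
    v = 2.0 * math.pi / wavelength * core_radius * na    # normalized frequency
    return delta, na, v

def approx_mode_count(v):
    """Large-V estimate M ~ 4 V^2 / pi^2; only meaningful for V >> 1."""
    return 4.0 * v**2 / math.pi**2

# Hypothetical small-core fibre operated at 1.31 um
delta_sm, na_sm, v_sm = fibre_parameters(4.0e-6, 1.31e-6, 1.450, 1.446)
# Hypothetical large-core fibre operated at 0.85 um
delta_mm, na_mm, v_mm = fibre_parameters(25.0e-6, 0.85e-6, 1.460, 1.445)
modes_mm = approx_mode_count(v_mm)

print(f"small core: delta = {delta_sm:.4f}, NA = {na_sm:.3f}, V = {v_sm:.2f}, "
      f"single mode: {v_sm < 2.405}")
print(f"large core: delta = {delta_mm:.4f}, NA = {na_mm:.3f}, V = {v_mm:.1f}, "
      f"about {modes_mm:.0f} modes")
```

The small-core example falls below the cut-off and guides one mode, whereas the large-core example supports several hundred modes, each with a slightly different group velocity.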

Graded index structures allow greater control over fibre characteristics. In these structures, the core has a variable
refractive index, highest in the centre and gradually decreasing until it reaches that of the cladding at the
core-cladding interface. The result is that the phase velocity gradually increases with radial distance from the centre of the core.
Properly designed structures can greatly reduce the differences in the group velocity of the fibre modes, limiting
modal dispersion.

For optical transmission, the parameters of greatest importance are attenuation (i.e. loss) and 'material' dispersion.
In effect they define the limits of the optical communication system. Loss, due to absorption and scattering, limits
the lengths between the transmission nodes. In transmission quality fibre, the loss is quoted in units of decibels per
kilometre. (Attenuation of less than 0.2 dB km$^{-1}$ is common for telecom quality fibre.) Since modal dispersion can be greatly
mitigated by fibre design, real material dispersion is of greatest consequence. Silica is a dispersive medium: the
index of refraction depends on wavelength, and as a result an optical pulse of finite bandwidth, δλ, spreads as it
propagates along the core axis. This spread limits the spacing of successive pulses and hence the maximum
transmission frequency. Figure C2.15.16 shows a plot of the dispersion for silica as a function of wavelength
within the transmission window. Because the zero dispersion point is at approximately 1.31 μm, this wavelength
has become one of the base telecom transmission bands. The other key telecom transmission wavelength, 1.54 μm,
is near the loss minimum of the fibre.
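
Because loss is quoted in decibels per kilometre, the transmitted power falls by a factor of 10 for every 10 dB of accumulated attenuation. The short sketch below applies this to the 0.2 dB per kilometre figure quoted above; the 100 km span and the 30 dB power budget are hypothetical numbers used only for illustration.

```python
def transmitted_fraction(alpha_db_per_km, length_km):
    """Fraction of the launched power remaining after length_km of fibre."""
    return 10.0 ** (-alpha_db_per_km * length_km / 10.0)

def max_span_km(budget_db, alpha_db_per_km):
    """Longest span for which the fibre loss stays within budget_db."""
    return budget_db / alpha_db_per_km

fraction_100km = transmitted_fraction(0.2, 100.0)   # hypothetical 100 km span
span_30db = max_span_km(30.0, 0.2)                  # hypothetical 30 dB budget

print(f"power remaining after 100 km: {fraction_100km:.1%}")
print(f"span allowed by a 30 dB budget: {span_30db:.0f} km")
```

This simple estimate ignores connector and splice losses, which would shorten the usable span in a real link.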

Figure C2.15.16. Wavelength dependent loss (upper) and dispersion (lower) in a silica fibre.

Nonlinear dispersion becomes relevant at sufficient pulse powers. In some fibre structures the interplay between 
the nonlinear dispersion and the group velocity dispersion can be used to produce non-dispersive waves called 
solitons. Solitons, although beyond the scope of this treatment, may revolutionize the communication systems of 
the future. A full treatment of soliton theory can be found in [4, 26 ]. 


C2.15.8 OPTICAL MODULATION AND DETECTION 


We will complete our survey of optoelectronics with a brief discussion of optical modulation and optical detection.
These two categories of devices are important because they define the speed of the optoelectronic link. The ability to
generate and detect high-frequency signals determines the ultimate limits on any optical circuit. Transmission
modulation can be accomplished in two ways: (a) direct modulation, i.e. switching of an optical source, and (b) external
'shuttering' of a CW optical source. Although both are adequate, the latter allows for greater flexibility and
diversity of device structure.

The most useful direct modulation technique is the current gain switching of semiconductor laser devices. This
technique is unique to semiconductor sources; nearly all other lasers are modulated externally. In these devices the
excitation current of the laser is modulated, resulting in modulated gain and therefore modulated output power. A
detailed analysis of this process is found in [27]. Simply put, an oscillating current of the form

$I = I_0 + I_m e^{i\omega t}$    (C2.15.52)

is applied to the device. Since the laser output power is

$P_0 \propto \eta\,(I - I_{th})$    (C2.15.53)

there will be a modulation in the output power. Careful laser design can result in modulation frequencies in the
hundreds of megahertz to gigahertz range. In general, the high-frequency response of these laser
devices is limited by the broadening of the linewidth and spectral output caused by changes in the material absorption, gain and
index of refraction as a function of the carrier density. This 'chirping' of the laser output bounds the modulation
frequencies possible. To surpass this limitation, external modulation of CW lasers is employed, allowing
modulation frequencies greater than 10 GHz.

A wide variety of external modulators are used in practice. Electro-optic modulators can produce amplitude,
frequency, or phase modulation utilizing the Pockels effect (mentioned when we studied nonlinear optical
phenomena above). The input wave is polarized and the electro-optic device is set between crossed polarizers; a
controllable bias on the modulator determines the optical phase shift induced on the optical beam and therefore the
output. Waveguide electro-optic devices are also of great interest. Single-mode waveguide devices, such as
couplers and interferometers, were mentioned earlier and are becoming of great importance in the communications
industry. Waveguides fabricated using electro-optic materials such as LiNbO$_3$ can be made into active devices. The
Pockels effect allows dynamic index switching, enabling the modulation of an optical input. A diagram of an
optical Mach-Zehnder modulator is shown in figure C2.15.17. In addition, external electroabsorption has achieved
wide success in communication links. Here the process that limits the direct modulation of a semiconductor laser,
i.e. the change (shift) of the absorption spectrum with carrier density, is used for high-speed modulation. A
beneficial side effect is that fully integrated laser/modulator packages can be fabricated.

Figure C2.15.17. The Mach-Zehnder modulator. A 3 dB coupler splits the input wave into the two arms of the
device. The output 3 dB combiner recombines half the wave with its phase-shifted counterpart. By adjusting $V_0$ the
output transmission can be rapidly modulated.


Optical detectors generally fall into two broad categories: thermal and photoelectric detectors. Thermal devices 
operate by converting the photon detected into heat with the heat evolution used to quantify the emissive output. 
Photoelectric devices convert the photons directly to electrons or other charge carriers, with the subsequent current 
proportional to the photon input. In general thermal devices are slow and not easily integrated into complex 
devices. The dominant detectors for optoelectronic applications are of the photoelectric variety. Again, two general 
categories of devices are prevalent: those that rely on the photoelectric effect and those that are based on 
photoconductivity. Photodetectors based on the photoelectric effect are known as phototubes. In these devices a
photoemissive material is configured as a cathode. The input radiation induces the ejection of a photoelectron from 
the cathode, which is accelerated towards the anode (which is at a higher electric potential), generating a current. 
Often secondary emission devices (called dynodes) are used to produce a large amplification of the photocurrent. 
These dynodes are electron multipliers with successive stages, resulting in large current gains. Phototubes are 
commonly used as high-gain (10 ), high-sensitivity detectors for a wide variety of applications. 

For optoelectronic applications photoconductive devices are more common. Semiconductor photoconductors tend
to be inexpensive and can be easily integrated with other components. In these devices photo-illumination above
the bandgap of the material generates charge carriers, increasing the conductivity of the semiconductor and
resulting in a photocurrent. For example, in a p-n junction device, photons absorbed in the depletion layer generate
electrons and holes under the influence of an electric field; this directly generates the photocurrent described. If a
large reverse bias is placed across the junction, the large field produced can accelerate photogenerated carriers
to kinetic energies high enough to excite additional carriers by impact ionization. This 'avalanche' effect results in a
dramatic improvement in detector sensitivity. Avalanche photodiodes have become critical devices for many
photonic applications.


C2.15.9 OPTICAL COMMUNICATIONS 

The primary driver for the expansion of optoelectronic technologies is optical communications [2]. It was realized
in the second half of the 20th century that an increase of several orders of magnitude in bandwidth would be
possible if optical waves were used as the carrier for telephone signals. The basic configuration of an optical
communication system is shown in figure C2.15.18. All the components described in this review are used, and, in many cases, were
developed to complete the optical link. The primary application that drove optoelectronics research through the
1980s and early 1990s was long-distance business, the goal being to send gigabit data rate information efficiently
over very long distances (thousands of kilometres). The need to go farther and faster led to the development of
extremely-low-loss optical fibre, ultra-high-speed modulators and high-sensitivity avalanche photodetectors. This
important innovation enabled all-optical transmission. The development of the optical fibre amplifier
revolutionized optical communication, allowing digital optical signals to be transmitted vast distances before being
converted back to electronic pulses, thereby greatly simplifying long-haul links. (These devices amplify the
optical data signals via stimulated emission.)

Figure C2.15.18. The basic components of an optical communications link.

During the last 5-10 years optical networks of all varieties have become a very important part of the world's
communications infrastructure for data transmission. The new focus has become the transmission of multi-gigabit
information over moderate distances at low cost. These requirements are profoundly affecting the design criteria for
optoelectronic devices and components. In addition, coarse and dense wavelength division multiplexed systems
are being developed to increase bandwidth. The essence of wavelength division multiplexing is the simultaneous
transmission of many optical data channels, each at a different wavelength, on a single optical fibre. Systems with as
many as 40 optical channels of 1-10 Gbit each are becoming commercially available. Low-cost components will be the
key factor in determining how rapidly this technology will be deployed throughout the communications
infrastructure.

C2.15.10 CONCLUSION

In this chapter we have laid out much of the underlying basis for a wide variety of optoelectronic devices and
structures. I hope that it is clear from the text that optoelectronics is optical physics in action. While optical
technologies are beginning to mature, much new work is needed to provide new sources and devices for
next-generation photonic applications. The introduction of organic electronic devices [28] and novel photonic bandgap
materials [29] will no doubt add fuel to the fire. I expect these technologies to redefine computing and
communication well into the 21st century.

C2.16 Semiconductors

Henryk Temkin and Stefan K Estreicher 


I think there is a world market for maybe five computers. 

Thomas Watson Sr, IBM Chairman, 1943 

The history of semiconductor devices can be traced back to the paper of Braun, published in 1874, describing 
rectifying behavior of a contact [1]. However, for many years semiconductors were considered too difficult a 
subject and the science of semiconductors began only during World War II. 


The physics of semiconductors was understood rather quickly but the materials were far too poor for practical
applications. The first high-purity semiconductor grown in large quantities was Ge. Si emerged only in the
mid-1950s as the semiconductor of choice. Compound semiconductors, such as GaAs, began to play a role in the
mid-1970s. These developments are presented in a number of recent review articles [2, 3, 4 and 5].

Since the early days of this field, the scientific and technological advances have been chasing each other. Advances 
in technology enabled implementation of new ideas, which in turn suggested new applications. In the last 50 years 
this process has resulted in a series of unprecedented advances that have transformed our society. There is, as of 
yet, no sign of a slowdown. 

The present article reviews basic concepts of semiconductor physics and devices with emphasis on current 
problems. Further details can be found in the references. 


C2.16.1 INTRODUCTION 

Semiconductors are a class of materials whose conductivity, when highly pure, varies with temperature as
$\exp(-E_g/k_B T)$, where $E_g$ is the size of a forbidden energy gap. The conductivity of semiconductors can be made to vary
over orders of magnitude by doping, the intentional introduction of appropriate impurities. The range in which the
conductivity of Si can be made to vary is compared to that of typical insulators and metals in figure C2.16.1.


In an intrinsic semiconductor, the conductivity is limited by the thermal excitation of electrons from a filled
valence band (VB) into an empty conduction band (CB), across a forbidden energy gap of width $E_g$. The process
leaves holes (h$^+$) in the VB and electrons (e$^-$) in the CB, and both of these charge carriers participate in the
conduction.


In an extrinsic semiconductor, the conductivity is dominated by the e$^-$ (or h$^+$) in the CB (or VB) provided by
shallow donors (or acceptors). If the dominant charge carriers are negative (electrons), the material is called n type.
If the conduction is dominated by holes (positive charge carriers), the material is called p type.


Figure C2.16.1. A nomogram comparing electrical resistivity of pure (intrinsic) and doped Si with metals and
insulators.


C2.16.2 MATERIALS

There are hundreds of semiconductor materials, but silicon alone accounts for the overwhelming majority of the
applications world-wide today. The families of semiconductor materials include tetrahedrally coordinated and
mostly covalent solids such as group IV elemental semiconductors and III-V, II-VI and I-VII compounds, and
their ternary and quaternary alloys, as well as more exotic materials such as the adamantine, non-adamantine and
organic semiconductors. Only the key features of some of these materials will be mentioned here. For a more
complete description, the reader is referred to specialized publications [6, 7, 8 and 9].

C2.16.2.1 ELEMENTAL SEMICONDUCTORS

The group IV semiconductor materials are fourfold coordinated covalent solids from elements in column IV of
the periodic table. The elemental semiconductors are diamond, silicon and germanium. They crystallize in the
diamond lattice.

Diamond may never be used to make devices because it is nearly impossible to make it sufficiently n type, that is to 
obtain high electron concentration. Substitutional B is a good shallow acceptor, and interstitial Li has been reported 
to produce some n type conductivity. 

Silicon is used in many forms, from high-purity thin films to bulk material, which may be crystalline, multi- or 
polycrystalline and amorphous (usually hydrogenated). Silicon is the material discussed the most in this article. 
Substitutional B and P are the most common (of many) shallow acceptors and donors, respectively. 

Germanium is very similar to Si, but its band gap is too small for many practical applications. Large crystals of
ultra-high-purity Ge have been grown for use as gamma-ray detectors. In such crystals, the net concentration of
electrically active centres is incredibly low, of the order of $10^{12}$ cm$^{-3}$. Isotopically pure Ge crystals have been
grown as well [10].


C2.16.2.2 COMPOUND SEMICONDUCTORS


There is a great number of mostly covalent and tetrahedral binary IV-IV, III-V, II-VI and I-VII semiconductors.
Most crystallize in the zincblende structure, but some prefer the wurtzite structure, notably GaN [11, 12]. While the
bonding in all of these compounds (and their alloys) is mostly covalent, some ionic character is always present
because of the difference in electron affinity of the constituent atoms.

C2.16.2.3 IV-IVS

The ionic character of compounds of C, Si, and Ge [13] ranges from a few percent (SiGe) to as much as 16% (SiC).
In addition to compounds, many alloys can be grown, such as C x SiGe 1 , where x is of the order of 0.02 to 0.04. 
Compounds such as Si$_x$Ge$_{1-x}$ are used because their gap can be made to vary from the Ge to the Si value.

Compounds and alloys of group IV elements normally have the zincblende structure. A notable exception is SiC 
which can crystallize in hundreds of polytypes that differ in the way the Si-C units are stacked along the c-axis of 
the crystal. For example, the zincblende structure (3C) has the sequence ABC-ABC-. . . with cubic symmetry, and 
the wurtzite structure (2H) is hexagonal with the sequence AB-AB-... (figure C2.16.2). The lowest-energy 
structure of SiC is 6H, with sequence ABCACB-ABCACB-.... In all these polytypes, each atom is fourfold 
coordinated and makes (almost or exactly) tetrahedral angles with its neighbours. In addition to the cubic and 
hexagonal polytypes, many other structures of SiC exist with rhombohedral or trigonal symmetries. 



Figure C2.16.2. The sequence of atoms in the two polytypes of SiC, zincblende and wurtzite, along the c-direction. 
The zincblende lattice has perfect tetrahedral angles. 


C2.16.2.4 III-VS

III-V compound semiconductors with precisely controlled compositions and gaps can be prepared from several
material systems. Representative III-V compounds are shown in the gap-lattice constant plots of figure C2.16.3. 
The points representing binary semiconductors such as GaAs or InP are joined by lines indicating ternary and 
quaternary alloys. The special nature of the binary compounds arises from their availability as the substrate 
material needed for epitaxial growth of device structures. 


Figure C2.16.3. A plot of the energy gap and lattice constant for the most common III-V compound 
semiconductors. All the materials shown have cubic (zincblende) structure. Elemental semiconductors, Si and Ge, 
are included for comparison. The lines connecting binary semiconductors indicate possible ternary compounds with 
direct gaps. Dashed lines near GaP represent indirect gap regions. The line from InP to a point marked * represents 
the quaternary compound InGaAsP, lattice matched to InP. 

Figure C2.16.4. A plot of the energy gap and lattice constant for large-gap nitrides. These materials have wurtzite
structure.

Ternary and quaternary semiconductors are theoretically described by the virtual crystal approximation (VCA) [7].
Within the VCA, ternary alloys with the composition AB$_{1-x}$C$_x$ are considered to contain two sublattices. One of
them is occupied only by atoms A, the other is occupied by atoms B or C. The second sublattice consists of virtual
atoms, represented by a weighted average of atoms B and C. Many physical properties of ternary alloys are then
expressed as weighted linear combinations of the corresponding properties of the two binary compounds. For
example, the dependence of the lattice constant d on composition is written as:

$d = (1 - x)\,d_{AB} + x\,d_{AC}$.

This approximation, known as Vegard's law, accurately describes the average lattice constant (but not the
microscopic structure!) of most ternary compounds. However, the expression for the gap must be modified by the
inclusion of a quadratic term

$E_g = (1 - x)\,E_g^{AB} + x\,E_g^{AC} - Q\,x(1 - x)$

where the bowing coefficient Q is positive. The term -x(1 - x) is due to the random distribution of atoms B and C
within their sublattice. It represents the probability of finding sequences of atoms B-A-C, or C-A-B, in the
random alloy [14].
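
The linear interpolation and the quadratic bowing correction are easy to evaluate numerically. The sketch below is only an illustration under stated assumptions: the binary lattice constants, gaps and the bowing coefficient Q are hypothetical round numbers, not data for any real alloy, and the function names are our own.

```python
def vegard(x, value_ab, value_ac):
    """Linear interpolation between binaries AB (x = 0) and AC (x = 1)."""
    return (1.0 - x) * value_ab + x * value_ac

def alloy_gap(x, gap_ab, gap_ac, bowing):
    """Gap with the quadratic bowing term -Q x (1 - x); Q > 0 lowers the gap."""
    return vegard(x, gap_ab, gap_ac) - bowing * x * (1.0 - x)

# Hypothetical binaries: lattice constants in nm, gaps and Q in eV
d_ab, d_ac = 0.60, 0.56
gap_ab, gap_ac, q = 1.0, 2.0, 0.4

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f}: d = {vegard(x, d_ab, d_ac):.3f} nm, "
          f"E_g = {alloy_gap(x, gap_ab, gap_ac, q):.3f} eV")
```

The bowing term vanishes at both end points, so the binaries are recovered exactly, and it is largest at x = 0.5, where the gap lies Q/4 below the linear interpolation.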

Some semiconductors with compositions close to AB$_{0.5}$C$_{0.5}$ are known to become ordered. This results in changes
in the gap, and electrical and optical properties, compared to random alloys of the same composition.

Two of the material systems shown in figure C2.16.3 are of particular importance. These are the ternary
compounds formed from group III elements such as Al and Ga in combination with As and quaternary compounds
formed from Ga and In in combination with As and P [8, 15]. Ternary Al$_x$Ga$_{1-x}$As grown on GaAs is the best
known of the general class of compounds A$^{III}_x$B$^{III}_{1-x}$C$^V$. Quaternary Ga$_x$In$_{1-x}$As$_{1-y}$P$_y$ grown on InP is
representative of the general class A$^{III}_x$B$^{III}_{1-x}$C$^V_{1-y}$D$^V_y$. The lattice constants, gaps, indices of refraction and most other
parameters of these materials depend on their composition x and y.

Al$_x$Ga$_{1-x}$As grown on GaAs is used for the preparation of light-emitting diodes (LEDs), injection lasers and
bipolar transistors. The lattice constants of GaAs (0.565 nm) and AlAs (0.566 nm) are almost identical. Aluminum
atoms can be substituted for Ga atoms in the GaAs lattice to form Al$_x$Ga$_{1-x}$As without significant change in the
lattice constant. It is thus possible to vary the gap from $E_g$(GaAs) = 1.43 eV to $E_g$(AlAs) = 2.16 eV simply by
adjusting the Al fraction x in the epitaxial layer. This feature of Al$_x$Ga$_{1-x}$As is unique among the compound
semiconductors.

An even wider range of gaps can be reached with lattice-mismatched structures of Ga$_x$In$_{1-x}$As grown on GaAs.
This ternary system is shown in figure C2.16.3 by the line joining GaAs and InAs. The thickness of defect-free
layers of Ga$_x$In$_{1-x}$As is limited by the biaxial compressive strain arising from the lattice mismatch with the
substrate. Structures based on Ga$_x$In$_{1-x}$As are thus implemented in the form of quantum wells, typically less than
1 nm thick.

The usual acceptor and donor dopants for Al$_x$Ga$_{1-x}$As compounds are elements from groups II, IV and VI of the
periodic table. Group II elements are acceptors and group VI elements are donors. Depending on the growth
conditions, Si and Ge can be either donors or acceptors, i.e. amphoteric. This is of special interest in LEDs.

Quaternary Ga$_x$In$_{1-x}$As$_{1-y}$P$_y$ grown on InP is of major importance to fibre-optic communications. In quaternary
compounds, both the gap and the lattice constant can be tailored by changing the chemical composition. In thick
layers, in order to avoid the generation of strain-induced defects, care must be taken in adjusting the ratio of x and y
to maintain the lattice-matched composition x = 2.2y. The available gaps range from 1.34 eV in InP to ~0.75 eV in
lattice-matched In$_{0.53}$Ga$_{0.47}$As. A particularly interesting feature of this system is the high reliability of LEDs and
diode lasers [16].
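
The relevance of this range of gaps to fibre-optic communications becomes apparent when the gap energies are converted to photon wavelengths using λ = hc/E. The sketch below does this for the two limiting gaps quoted above; the physical constants are standard values and the function name is illustrative.

```python
PLANCK = 6.62607015e-34            # Planck constant, J s
LIGHT_SPEED = 2.99792458e8         # speed of light in vacuum, m / s
ELECTRON_CHARGE = 1.602176634e-19  # elementary charge, C (1 eV in joules)

def gap_to_wavelength_um(gap_ev):
    """Wavelength (in micrometres) of a photon whose energy equals the gap."""
    return PLANCK * LIGHT_SPEED / (gap_ev * ELECTRON_CHARGE) * 1.0e6

lambda_inp = gap_to_wavelength_um(1.34)      # gap quoted for InP
lambda_ingaas = gap_to_wavelength_um(0.75)   # approximate gap of the ternary

print(f"1.34 eV corresponds to {lambda_inp:.2f} um")
print(f"0.75 eV corresponds to {lambda_ingaas:.2f} um")
```

The resulting range, roughly 0.93 to 1.65 μm, spans both the 1.31 μm and 1.54 μm transmission wavelengths of silica fibre.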

While quaternary layers and structures can be exactly lattice matched to the InP substrate, strain is often used to
alter the gap or carrier transport properties. In Ga$_x$In$_{1-x}$As or Ga$_x$In$_{1-x}$As$_{1-y}$P$_y$ grown on InP, strain can be
introduced by moving away from the lattice-matched composition. In sufficiently thin layers, strain is
accommodated elastically, without any change in the in-plane lattice constant. In this material, strain can be either
compressive, when the unstrained lattice constant of the layer is larger than that of the substrate, or tensile.

C2.16.2.5 III-V NITRIDES

Figure C2.16.4 shows the gap-lattice constant plots for the III-V nitrides. These compounds can have either the
wurtzite or zincblende structures, with the wurtzite polytype having the most interesting device applications. The
large gaps of these materials make them particularly useful in the preparation of LEDs and diode lasers emitting in
the blue part of the visible spectrum. Unlike the smaller-gap III-V compounds illustrated in figure C2.16.3, single
crystals of the nitride binaries AlN, GaN and InN can be prepared only in very small sizes, too small for
epitaxial growth of device structures. Substrate materials such as sapphire and SiC are used instead.

There is also a possibility of preparing mixed III-V nitride alloys, e.g. GaAs$_{1-x}$N$_x$, connecting the two sets of
semiconductor materials. Their gap dependence on composition is the subject of active research.


C2.16.3 GENERAL PROPERTIES OF SEMICONDUCTORS 

C2.16.3.1 ENERGY BANDS

The optical and electrical characteristics of semiconductors are conveniently described by energy level diagrams
[17, 18, 19 and 20]. Electrons in atoms are restricted to sets of discrete energy states, separated by gaps in which
electrons are not allowed. In solids, formed by bringing isolated atoms close together, the allowed energy levels of
discrete atoms spread into essentially continuous energy bands. Two such bands are of particular importance in
semiconductors: the highest-lying filled VB and the lowest-lying empty CB. The VB and CB are separated by the
energy gap $E_g$.

In a defect-free, undoped semiconductor, there are no energy states within the gap. At T = 0 K, all of the VB states
are occupied by electrons and all of the CB states are empty, resulting in zero conductivity. The thermal excitation
of electrons across the gap becomes possible at T > 0 and a net electron concentration in the CB is established. The
electrons excited into the CB leave empty states in the VB. These holes behave like positively charged electrons.
Both the electrons in the CB and holes in the VB participate in the electrical conductivity.

Calculated plots of energy bands as a function of wavevector k, known as band diagrams, are shown in figure 
C2.16.5 for Si and GaAs. Semiconductors can be divided into materials with indirect and direct gaps. In direct-gap 
semiconductors (represented by GaAs) the minimum energy in the CB and the maximum in the VB occur for the 
same value of k, namely $k = 0$, the $\Gamma$ point. This is not the case in indirect materials (represented by Si), in which the 
maximum of the VB occurs at $k = 0$ but the minimum of the CB at $k \neq 0$. This difference has profound 
consequences for the rates of electron-hole recombination across the gap. 

Figure C2.16.5. Calculated plots of energy bands as a function of wavevector k, known as band diagrams, for Si 
and GaAs. Indirect (Si) and direct (GaAs) gaps are indicated. High-symmetry points of the Brillouin zone are 
indicated on the wavevector axis. 

All technologically important properties of semiconductors are determined by defect-associated energy levels in the 
gap. The conductivity of pure semiconductors varies as $\sigma \sim \exp(-E_g/k_B T)$, where $E_g$ is the gap. In most 
semiconductors with practical applications, the size of the gap, $E_g \sim$ 1-2 eV, makes the thermal excitation of 
electrons across the gap a relatively unimportant process. The introduction of shallow states into the gap through 
doping, with either donors or acceptors, allows for large changes in conductivity (figure C2.16.1). The donor and 
acceptor levels are typically a few meV below the CB and a few tens of meV above the VB, respectively. The 
depth of these levels usually scales with the size of the gap (see below). 
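
To give a feel for why thermal excitation across a gap of 1-2 eV contributes so little at room temperature, the following minimal sketch evaluates the exponential gap factor for the conductivity of a pure semiconductor. The function name `thermal_factor` and the parameter `denom` are illustrative assumptions, not part of the text; setting `denom=2` gives the exponent $E_g/2k_BT$ that many textbooks use for intrinsic carrier statistics.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def thermal_factor(gap_ev, temperature_k, denom=1.0):
    """Return exp(-E_g / (denom * k_B * T)) for a gap given in eV."""
    return math.exp(-gap_ev / (denom * K_B_EV * temperature_k))


# Compare gaps of 1 eV and 2 eV at room temperature (300 K).
for gap in (1.0, 2.0):
    print(f"E_g = {gap} eV: factor at 300 K = {thermal_factor(gap, 300.0):.3e}")
```

Whatever the precise prefactor, the factor is many orders of magnitude below unity at 300 K, which is why shallow dopant levels only a few meV to tens of meV from a band edge dominate the conductivity.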

C2.16.3.2 ELECTRICAL PROPERTIES


The application of a small external electric field E to a semiconductor results in a net average velocity component 
of the carriers (electrons or holes) called the drift velocity, $v_d$. The coefficient of proportionality between E and $v_d$ 
is known as the carrier mobility $\mu$. At higher fields, where the drift velocity becomes comparable to the thermal 
velocity of the carriers (which is about $10^7$ cm s$^{-1}$ in Si at room temperature), the carriers decelerate by scattering 
with charged impurities and lattice vibrations (phonons and local vibrational modes). The simple linear relationship 
between E and $v_d$ no longer applies. Thus the low-field mobility $\mu$ describes the mean free time between collisions. 
It depends strongly on the effective mass (see below) of the carriers, the purity of the semiconductor and the 
temperature. The effective mobility $\mu$ of carriers in a semiconductor reflects the contributions of various scattering 
mechanisms and is written as $1/\mu = 1/\mu_{lattice} + 1/\mu_{impurity}$. The temperature dependence of the mobility reflects 
these individual contributions. For instance, since $\mu_{lattice} \sim T^{-3/2}$ and $\mu_{impurity} \sim T^{3/2}$, most semiconductors show a peak in the 
mobility measured as a function of temperature [21]. In some of the modern semiconductor structures, it is possible 
to essentially eliminate the impurity term $1/\mu_{impurity}$ and thus reach very high, metallic, mobilities at low temperatures [22]. 
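
The origin of the mobility peak can be sketched numerically. The example below assumes the standard power laws $\mu_{lattice} = A\,T^{-3/2}$ and $\mu_{impurity} = B\,T^{3/2}$ and combines them as reciprocals; the prefactors `a_lattice` and `b_impurity` are hypothetical fitting constants chosen only for illustration.

```python
def combined_mobility(temperature_k, a_lattice, b_impurity):
    """Combine lattice and impurity mobilities via 1/mu = 1/mu_l + 1/mu_i."""
    mu_lattice = a_lattice * temperature_k ** -1.5   # falls with temperature
    mu_impurity = b_impurity * temperature_k ** 1.5  # rises with temperature
    return 1.0 / (1.0 / mu_lattice + 1.0 / mu_impurity)


def peak_temperature(a_lattice, b_impurity):
    """Temperature of maximum mobility for the two power laws above.

    Setting d(1/mu)/dT = 0 gives T**3 = A / B.
    """
    return (a_lattice / b_impurity) ** (1.0 / 3.0)


# Hypothetical prefactors, for illustration only.
A, B = 1.0e7, 1.0
t_peak = peak_temperature(A, B)
print(f"peak near {t_peak:.0f} K, mobility {combined_mobility(t_peak, A, B):.0f}")
```

Under these assumptions the two contributions are equal at the peak, so the combined mobility there is half of either one. Reducing impurity scattering (larger `b_impurity`) moves the peak to lower temperature and raises it.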

In terms of the carrier mobility, the electrical conductivity $\sigma$ of an n type semiconductor can be written as 

$\sigma = n e \mu$

where n is the conduction electron density and e the electron charge. Since n is a strong (exponential) function of 
temperature, $\sigma$ varies with temperature both through n and $\mu$. 
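
As a rough numerical illustration of the product of carrier density, charge and mobility, the sketch below evaluates the conductivity in S cm$^{-1}$. The function name and the input values are assumptions chosen for illustration, not data from the text.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def conductivity(n_per_cm3, mobility_cm2_per_vs):
    """Return sigma = n * e * mu in S/cm for n in cm^-3 and mu in cm^2/(V s)."""
    return n_per_cm3 * E_CHARGE * mobility_cm2_per_vs


# Illustrative values: 1e16 electrons per cm^3 with a mobility of 1000 cm^2/(V s).
print(f"sigma = {conductivity(1.0e16, 1000.0):.3f} S/cm")
```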

In thermal equilibrium at the temperature T, the distribution of electrons in the band is given by the Fermi-Dirac 
distribution function $f(E) = [1 + e^{(E - E_f)/kT}]^{-1}$, where k is the Boltzmann constant. The function $f(E)$ describes 
the probability that a state with an energy E is occupied at the temperature T. The quantity $E_f$, called the Fermi 
level, denotes the energy level with the occupation probability 1/2 at $T > 0$. At $T = 0$, all the available states below 
$E_f$ are filled, $f(E < E_f) = 1$, and all the states above $E_f$ are empty, $f(E > E_f) = 0$. At $T = 0$, the Fermi level coincides with 
the chemical potential. 
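
A minimal sketch of the Fermi-Dirac occupation probability follows. The function name is an assumption, energies are taken in eV, and returning 1/2 exactly at the Fermi level for $T = 0$ is a convention adopted here for the step-function limit.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fermi_dirac(energy_ev, fermi_ev, temperature_k):
    """Occupation probability of a state at energy_ev."""
    if temperature_k == 0.0:
        # Step-function limit: filled below the Fermi level, empty above.
        if energy_ev < fermi_ev:
            return 1.0
        if energy_ev > fermi_ev:
            return 0.0
        return 0.5  # convention at E = E_f
    x = (energy_ev - fermi_ev) / (K_B_EV * temperature_k)
    if x > 700.0:
        return 0.0  # avoid overflow in exp for states far above E_f
    return 1.0 / (1.0 + math.exp(x))


print(fermi_dirac(0.55, 0.50, 300.0))  # a state 50 meV above the Fermi level
```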

Instead of plotting the electron distribution function in the energy band diagram, it is convenient to indicate the 
position of the Fermi level. In a semiconductor of high purity, the Fermi level is close to mid-gap. In p type (n type) 
semiconductors, it lies near the VB (CB). In very heavily doped semiconductors the Fermi level can move into 
either the CB or VB, depending on the doping type. 

The distributions of states in the CB and VB are described by the effective density of states. The concentration of 
electrons in the CB can be calculated as $n = \int_{E_c}^{\infty} f(E) N(E)\,dE$, where $f(E)$ is the Fermi distribution and $N(E)\,dE$ 
is the density of states between E and E + dE. A simpler way of calculating n is to represent all the electron 
states in the CB by an effective density of states $N_c$ at the energy $E_c$ (band edge). The electron density is then 
simply $n = N_c f(E_c)$. 
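
The quality of the band-edge shortcut can be checked numerically. The sketch below assumes a parabolic band, $N(E) \propto (E - E_c)^{1/2}$, which is an assumption not stated in the text, and works with the dimensionless variable `eta` $= (E_f - E_c)/kT$. Both functions return $n/N_c$; their names are illustrative.

```python
import math


def exact_ratio(eta, t_max=8.0, steps=20000):
    """n/N_c from integrating f(E) N(E) over a parabolic conduction band.

    With x = (E - E_c)/kT the ratio is (2/sqrt(pi)) * integral of
    sqrt(x) / (1 + exp(x - eta)) dx.  Substituting x = t**2 removes the
    square-root singularity; the integral is done with the midpoint rule.
    """
    dt = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += 2.0 * t * t / (1.0 + math.exp(t * t - eta))
    return (2.0 / math.sqrt(math.pi)) * total * dt


def band_edge_ratio(eta):
    """n/N_c from the shortcut n = N_c f(E_c)."""
    return 1.0 / (1.0 + math.exp(-eta))


for eta in (-10.0, -3.0, 0.0):
    print(eta, exact_ratio(eta), band_edge_ratio(eta))
```

Under this assumption the two results agree closely when the Fermi level lies several kT below the band edge and drift apart as the Fermi level approaches the band, which is the heavily doped regime.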

Most of our ideas about carrier transport in semiconductors are based on the assumption of diffusive motion. When 
the electron concentration in a semiconductor is not uniform, the electrons move (diffuse) under the influence of 
concentration gradients, giving rise to an additional contribution to the current. In this motion, electrons also 
undergo collisions and their temporal and spatial distributions are described by the diffusion equation. The 
proportionality constant between the flux and the concentration gradient is called diffusivity, D (cm$^2$ s$^{-1}$). 
Diffusivity and mobility are related by Einstein's relationship $D = (kT/q)\mu$, where q is the carrier charge. In the 
context of diffusive motion, it is also straightforward to introduce the carrier lifetime $\tau$ as the average time between 
collisions and define a diffusion length $L = \sqrt{D\tau}$. This concept is particularly useful to describe minority carriers. 
The minority carrier diffusion length varies from the sub-micrometre range in heavily doped semiconductors to tens of 
micrometres in high-purity materials. 
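
The Einstein relationship and the diffusion length can be combined in a short worked example. The function names and the input values (a mobility of 1000 cm$^2$ V$^{-1}$ s$^{-1}$ and a lifetime of 1 microsecond) are assumptions chosen for illustration, and the carrier charge is taken to be one elementary charge.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # k/q in V/K for a carrier of one elementary charge


def diffusivity(mobility_cm2_per_vs, temperature_k):
    """Einstein relationship D = (kT/q) * mu, in cm^2/s."""
    return K_B_OVER_Q * temperature_k * mobility_cm2_per_vs


def diffusion_length_um(diffusivity_cm2_per_s, lifetime_s):
    """Diffusion length L = sqrt(D * tau), converted from cm to micrometres."""
    return math.sqrt(diffusivity_cm2_per_s * lifetime_s) * 1.0e4


d = diffusivity(1000.0, 300.0)
print(f"D = {d:.1f} cm^2/s, L = {diffusion_length_um(d, 1.0e-6):.0f} micrometres")
```

With these inputs the diffusion length comes out at a few tens of micrometres, within the range quoted for high-purity materials.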

C2.16.3.3 RECOMBINATION

In n type semiconductors, electrons are the majority carriers. Holes will also be present through accidental 
incorporation of acceptor impurities or, more importantly, through the intentional creation of electron-hole pairs. 
Holes in n type and electrons in p type semiconductors are minority carriers. 

There are many ways of increasing the equilibrium carrier population of a semiconductor. Most often this is done 
by generating electron-hole pairs as, for instance, in the process of absorption of a photon with $\hbar\omega > E_g$. Under 
reasonable levels of illumination and doping, the generation of electron-hole pairs affects primarily the minority 
carrier density. However, the excess population of minority carriers is not stable: it gradually disappears through a 
variety of recombination processes in which an electron in the CB fills a hole in the VB. The excess energy $E_g$ is 
released as a photon or phonons. The former case corresponds to a radiative recombination process, the latter to a 
non-radiative one. The radiative processes only rarely involve direct recombination across the gap. Usually, this 
type of process is assisted by shallow defects (impurities). Non-radiative recombination involves a defect-related 
deep level at which a carrier is trapped first, and a second transition is needed to complete the process. 

Radiative recombination of minority carriers is the most likely process in direct gap semiconductors. Since the 
carriers at the CB minimum and the VB maximum have the same momentum, very fast recombination can occur. 
The radiative recombination lifetimes in direct semiconductors are thus very short, of the order of nanoseconds. The 
presence of deep-level defects opens up a non-radiative recombination path and further shortens the carrier 
lifetime. 


The situation is very different in indirect gap materials where phonons must be involved to conserve momentum. 
Radiative recombination is inefficient, resulting in long lifetimes. The minority carrier lifetimes in Si reach many 
ms, again in the absence of defects. It should be noted that long minority carrier lifetimes imply long diffusion 
lengths. Minority carrier lifetime can be used as a convenient quality benchmark of a semiconductor. 


C2.16.4 DEFECTS AND IMPURITIES 

Intrinsic defects (or 'native' or simply 'defects') are imperfections in the crystal itself, such as a vacancy (a missing 
host atom), a self-interstitial (an extra host atom in an otherwise perfect crystalline environment), an anti-site defect 
(in an AB compound, this means an atom of type A at a B site or vice versa) or any combination of such defects. 
Extrinsic defects (or impurities) are atoms different from host atoms, trapped in the crystal. Some impurities are 
intentionally introduced because they provide charge carriers, reduce their lifetime, prevent the propagation of 
dislocations or are otherwise needed or useful, but most impurities and defects are not desired and must be 
eliminated or at least controlled. 

The presence of defects and impurities is unavoidable. They are created during the growth or penetrate into the 
material during the processing. For example, in a crystal grown from the melt, impurities come from the crucible 
and the ambient, and are present in the source material. Depending on factors such as the pressure, the pull rate and 
temperature gradients, the crystal may be rich in vacancies or self-interstitials (and their precipitates). 

After the growth, virtually all the processing steps create defects and/or add impurities. Ion implantation and 
electron irradiation create vacancy-self-interstitial pairs (Frenkel pairs). Wet or dry etching, the deposition of 
organic masks, metallic contacts, or other surface layers, furnace or rapid thermal anneals and other processes also 
result in defects or impurities penetrating into the material or diffusing through the bulk. 

Experimentally, local vibrational modes associated with a defect or impurity may appear in infra-red absorption or 
Raman spectra. The defect centre may also give rise to new photoluminescence bands and other experimentally 
observable signatures. Some defect-related energy levels may be visible by deep-level transient spectroscopy 
(DLTS) [23]. 

C2.16.4.1 NOMENCLATURE

Most electrical and optical properties of semiconductors are determined by the impurities and defects they contain. 
The underlying reason for this is that any kind of imperfection in the crystal means a different local potential and 
therefore new energy eigenvalues (energy levels) and eigenfunctions associated with the defect or impurity and its 
immediate environment. If the new levels are in the gap, as is often the case, they give rise to some electrical 
activity. The energy levels can be shallow (near a band edge) or deep (far from a band edge). 'Hyper-deep' refers 
to a localized level in the VB. 

A gap level is called an acceptor level if the defect is neutral when the state is empty (no electron). It is called a 
donor level if the defect is neutral when the state is occupied (one electron). The former is often labelled (0 / -) and 
the latter (+ / 0), where the first (second) sign refers to the charge of the defect when no electron (one electron) is 
present. Double or triple acceptor and donor levels are similarly labelled. 

Common terminology used to characterize impurities and defects in semiconductors includes point and line defects, 
complexes, precipitates and extended defects. These terms are somewhat loosely defined, and examples follow. 

A point defect refers to a localized defect (such as a monovacancy) or impurity (such as interstitial O). This 
includes any relaxation and/or distortion of the crystal around it. Many point defects are now rather well 
understood, especially in Si, thanks to a combination of experiments providing information of microscopic nature 
(such as electron paramagnetic resonance, local vibrational mode spectroscopy, or photoluminescence) and 'ab 
initio' or 'first-principles' theory (which means that no experimentally adjusted parameters are used). Tremendous 
progress has been achieved since the mid-1980s in the theory of defects in semiconductors, to the point where 
nearly quantitative predictions are common. 

A combination of a small number of point defects is called a complex. Examples are the boron-hydrogen pair in Si, 
a small cluster of vacancies such as the hexavacancy, or a C-C pair. Precipitates refer to more complicated and 
larger aggregates of impurities or defects, such as the O-related thermal donors in Si, or metallic impurities trapped 
at some internal void. Such precipitates can be of substantial size, involving anywhere from a handful to thousands 
of atoms. They can be permanent sinks for specific impurities (such as SiO$_2$ precipitates in Si) or serve as sources 
of impurities or defects, such as the {311} platelets of self-interstitials in Si. Such defects are difficult to study. The 
number of degrees of freedom and local minima of a multi-dimensional potential energy surface render the 
theorist's task very challenging. Experimentalists must deal with complicated spectra and often broad lines. As a 
result, only a few small complexes are well understood, and almost nothing microscopic is known about large 
complexes and precipitates. For example, despite almost 50 years of experimental studies and theoretical 
modelling, the structure and nature of O-related thermal donors in Si is still unknown [24, 25]. 

Extended defects range from well characterized dislocations to grain boundaries, interfaces, stacking faults, etch 
pits, D-defects, misfit dislocations (common in epitaxial growth), blisters induced by H or He implantation etc. 
Microscopic studies of such defects are very difficult, and crystal growers use years of experience and 
trial-and-error techniques to avoid or control them. Some extended defects can change in unpredictable ways upon heat 
treatments. Others become gettering centres for transition metals, a phenomenon which can be desirable or not, but 
is always difficult to control. Extended defects are sometimes cleverly used. For example, the 'smart-cut' process 
relies on the controlled implantation of H followed by heat treatments to create blisters. This allows a thin layer of 
clean material to be lifted from a bulk wafer [26]. 

Point defects and complexes exhibit metastability when more than one configuration can be realized in a given 
charge state. For example, neutral interstitial hydrogen is metastable in many semiconductors: one configuration 
has H at a relaxed bond-centred site, bound to the crystal, and the other has H atomic-like at the tetrahedral 
interstitial site. 

Bistability refers to defects which have different configurations in different charge states. The transition to a 
bistable state can be induced by the exposure to band-gap light for example. One example is the EL2 centre in 
GaAs [27]. A defect or impurity is called negative U when the energy gained by pairing two electrons (or holes) at 
the defect is greater than the Coulomb repulsion between them. The best known example is the vacancy (V) in Si 
which is stable in the spin singlet ++, 0 and -- charge states but unstable in the spin doublet + and - states [28]. 

The energetics associated with a metastable and/or bistable defect are often described using a configuration 
diagram. It is a semi-quantitative plot of the energy against a global coordinate which combines the position of the 
impurity and all the relaxations and distortions of the crystal, which can be substantial. The configuration diagram 
in figure C2.16.6 was obtained from muon spin rotation ($\mu$SR) data in Si [29], and relates to the states of muonium 
(Mu), a light isotope of hydrogen. This example illustrates configuration diagrams, acceptor and donor levels, 
metastability, bistability and negative-U behaviour. 

Figure C2.16.6. The energy states of a metastable and bistable muonium in Si are illustrated in a configuration 
diagram. It plots the defect energy as a function of a coordinate which combines position and all the relaxations 
and distortions of the crystal. The specific example, discussed in the text, illustrates acceptor and donor levels, 
metastability, bistability and negative-U [50] behaviour. 

The configuration diagram consists of three (potential) energy curves associated with the three charge states of 
hydrogen in Si. The two types of minima correspond to the impurity at the tetrahedral interstitial (T) and at the 
bond-centred (BC) sites, respectively. There is little relaxation of the host crystal when hydrogen is at the T site, 
but the BC site is a minimum of the energy only after an Si-Si bond relaxes substantially to become the bridged 
Si-H-Si bond. The $H_T^-$ state exists only in n type Si and is stable only at the T site. The $H_{BC}^+$ state is observed 
predominantly in p type Si, and is only stable at the BC site. The neutral state is found in two configurations, at the 
BC site (stable) and the T site (metastable). The energy difference between these two sites is shown as $\Delta$, the value 
of which is estimated at a few tenths of an eV. The $\mu$SR data show that all three charge states coexist at the 
microsecond time scale above room 
temperature and there is experimental evidence that this is the case for H as well. The energy differences marked in 
the figure were either measured directly (ionizations) or deduced from fitting the data. 

The ionization energies and impurity levels are shown in the flat-band figure next to the configuration diagram. 
The donor level (+ / 0) corresponds to the ionization energy $H_{BC}^0 \to H_{BC}^+$, since the transition occurs between 
essentially identical configurations. The ionization energy measured by $\mu$SR is very close to the donor level 
obtained for hydrogen by DLTS [30], 0.175 ± 0.005 eV. 

The situation is more complicated for the acceptor level, as the measured ionization energy corresponds to a 
transition from the stable state $H_T^-$ to the metastable state $H_T^0$. However, the acceptor level corresponds to the energy 
difference between stable states, $H_T^- \to H_{BC}^0$, that is, the ionization energy corrected by $\Delta$. The $\mu$SR data therefore 
imply that muonium (or hydrogen) is negative U if $\Delta$ < 0.35 eV, and positive U otherwise. The energy difference U 
between the acceptor and donor levels is in any case quite small, whether positive or negative. Note that quantum 
effects (such as the zero-point energy) do play a role in this case, as the impurity (a muon or hydrogen) is very 
light. 

C2.16.4.2 SHALLOW LEVELS


Shallow impurities have energy levels in the gap but very close to a band. If an impurity has an empty level close to 
the VB maximum, an electron can be thermally promoted from the VB into this level, leaving a hole in the VB. 
Such an impurity is a shallow acceptor. On the other hand, if an impurity has an occupied level very close to the 
CB minimum, the electron in that level can be thermally promoted into the CB where it participates in the 
conductivity. Such an impurity is a shallow donor. 


There are other, more exotic, possibilities. For example, if a defect has an empty level near the CB, an electron may 
become trapped in it. This localized electron may in turn bind a hole in a loose orbit, forming a bound exciton. 

Shallow donors (or acceptors) add new electrons to the CB (or new holes to the VB), resulting in a net increase in 
the number of a particular type of charge carrier. The implantation of shallow donors or acceptors is performed for 
this purpose. But this process can also occur unintentionally. For example, the precipitation around 450°C of 
interstitial oxygen in Si generates a series of shallow double donors called thermal donors. As-grown GaN crystals 
are always heavily n type, because of some intrinsic shallow-level defect. The presence and type of new charge 
carriers can be detected by Hall effect measurements. 

Since shallow-level impurities have energy eigenvalues very near those of the perfect crystal, they can be described 
using a perturbative approach first developed in the 1950s and known as effective mass theory (EMT). The idea is 
to approximate the band nearest to the shallow level by a parabola, the curvature of which is characterized by an 
effective mass parameter $m^*$. 

The simplest example is that of the shallow P donor in Si. Four of its five valence electrons participate in the 
covalent bonding to its four Si nearest neighbours at the substitutional site. The energy of the fifth electron which, 
at ~0 K, is in an energy level just below the minimum of the CB, is approximated by $\hbar^2 k^2/2m^*$ plus the screened 
Coulomb attraction to the P ion, $e^2/\varepsilon r$, where $\varepsilon$ is the dielectric constant or the frequency-dependent dielectric 
function. The Schrödinger equation for this electron reduces to that of the hydrogen atom, but $m^*$ replaces the 
electronic mass and $\varepsilon$ screens the Coulomb attraction. 

In Si, the binding energy is reduced to ~20-40 meV independently of the shallow donor. The solution further yields 
a hydrogenic series of levels analogous to the 1s, 2sp etc states of atomic hydrogen. This series bridges the shallow 
level of the donor and the CB minimum, facilitating the thermal ionization of the electron into the CB. This 
ionization, which begins below liquid nitrogen temperature, is what provides free electrons to the CB. 

The Bohr radius is very large, 3-5 nm, and the shallow impurity wavefunction extends over a large portion of the 
crystal. Doping up to the 'metallic limit' consists in implanting a sufficiently high concentration of donors so that 
the shallow-donor wavefunctions overlap, creating a half-filled impurity band in which the electrons move freely. 
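
The hydrogenic scaling behind effective mass theory can be sketched as follows: the hydrogen binding energy is multiplied by the effective mass ratio and divided by the square of the dielectric constant, while the Bohr radius is multiplied by the dielectric constant and divided by the mass ratio. The function names are illustrative, and the inputs (a mass ratio of 0.26 and a dielectric constant of 11.7, values often quoted for Si) are assumptions rather than figures taken from the text.

```python
RYDBERG_MEV = 13605.693  # binding energy of atomic hydrogen in meV
BOHR_NM = 0.0529177      # Bohr radius of atomic hydrogen in nm


def donor_binding_energy_mev(mass_ratio, eps_r):
    """Hydrogenic ground-state binding energy scaled by m*/m_e and 1/eps_r**2."""
    return RYDBERG_MEV * mass_ratio / eps_r ** 2


def donor_bohr_radius_nm(mass_ratio, eps_r):
    """Hydrogenic Bohr radius scaled by eps_r / (m*/m_e)."""
    return BOHR_NM * eps_r / mass_ratio


# Assumed illustrative inputs for a shallow donor in Si.
print(f"binding energy ~ {donor_binding_energy_mev(0.26, 11.7):.0f} meV")
print(f"Bohr radius    ~ {donor_bohr_radius_nm(0.26, 11.7):.1f} nm")
```

This single-mass model is only a rough estimate; it reproduces the order of magnitude of the binding energy and the nanometre-scale orbit, not the exact values for a particular donor.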

C2.16.4.3 DEEP LEVELS

If the level(s) associated with the defect are deep, they become electron-hole recombination centres. The result is a 
(sometimes dramatic) reduction in carrier lifetimes. Such an effect is often associated with the presence of 
transition metal impurities or certain extended defects in the material. For example, substitutional Au is used to 
make fast switches in Si. Many point defects have deep levels in the gap, such as vacancies or transition metals. In 
addition, complexes, precipitates and extended defects are often associated with recombination centres. Grain 
boundaries, dislocation tangles and metallic precipitates in poly-Si photovoltaic devices are major 
factors which reduce their efficiency. 

Deep-level defects cannot be described by EMT or be viewed as simple perturbations to the perfect crystal. Instead, 
the full crystal-plus-defect problem must be solved and the geometries around the defect optimized to account for 
lattice relaxations and distortions. The study of deep levels is an area of active research. 

In order to remove the unwanted electrical activity associated with deep-level impurities or defects, one can either 
physically displace the defect away from the active region of the device (gettering) or force it to react with another 
impurity to remove (or at least change) its energy eigenvalues and therefore its electrical activity (passivation). 

Gettering is a black art. It consists in forcing selected impurities (typically, transition metals) to diffuse toward 
unimportant regions of the device. This is often done by creating precipitation sites and performing heat treatments. 
The precipitation sites range from small oxygen complexes to layers such as an Al silicide. The formation of such a 
metallization injects vacancies into the bulk and they enhance the diffusivity of some transition metals. Phosphorus 
gettering occurs during the heat treatment that follows the implantation of a heavily doped n$^+$ layer in Si. However, 
boron-rich buried layers and H-induced platelets are efficient gettering sites as well, because transition metals are 
much more stable at such defective regions than dissolved in the bulk. Recent success in achieving copper contacts 
on Si involved creating a gettering layer (TiN) in the subsurface region to precipitate Cu and keep it out of the 
active region of the device. 

Passivation involves mostly the use of hydrogen [31]. H is a rapid diffuser in most semiconductors and is 
unavoidable. It may be present in an ambient (water vapour for example) and in many processing steps such as the 
deposition of Schottky contacts or antireflection coatings, the use of organic masks etc. Hydrogen diffuses and 
traps at a range of impurities and defects. The trapping always involves some covalent interaction, a change in the 
configuration and electronic structure of the complex and a shift in its energy eigenvalues. Passivation results when 
an energy level shifts from the gap into a band. The thermal stability of most H-impurity pairs is normally rather 
low (a few hundred degrees Celsius at the most), but that of H-defect pairs tends to be higher. This is why H is 
used to passivate grain boundaries and other defects in poly-Si solar cells for example. In GaN, hydrogen 
passivates the shallow Mg acceptor with an 
unusually high thermal stability. In order to obtain p type GaN, the material is often annealed at high temperatures 
to dissociate the {H, Mg} pairs and diffuse H out of the crystal. 

C2.16.4.4 DIFFUSION 

In addition to the configuration, electronic structure and thermal stability of point defects, it is essential to know 
how they diffuse. A variety of mechanisms have been identified. The simplest one involves the diffusion of an 
impurity through the interstitial sites. For example, copper in Si diffuses by hopping from one tetrahedral 
interstitial site to the next via a saddle point at the hexagonal interstitial site. 

However, most impurities and defects are Jahn-Teller unstable at high-symmetry sites and/or react covalently with 
the host crystal much more strongly than interstitial copper. The latter is obviously the case for substitutional 
impurities, but also for interstitials such as O (which sits at a relaxed, puckered bond-centred site in Si), H (which 
bridges a host atom-host atom bond in many semiconductors) or the self-interstitial (which often forms more 
exotic structures such as the 'split-⟨110⟩' configuration). Such point defects migrate by breaking and re-forming 
bonds with their host, and phonons play an important role in such processes. 

The vacancy is very mobile in many semiconductors. In Si, its activation energy for diffusion ranges from 0.18 to 
0.45 eV depending on its charge state, that is, on the position of the Fermi level. While the equilibrium 
concentration of vacancies is rather low, many processing steps inject vacancies into the bulk: ion implantation, 
electron irradiation, etching, the deposition of some thin films on the surface, such as Al contacts or nitride layers 
etc. Such non-equilibrium situations can greatly affect the mobility of impurities as vacancies flood the sample and 
trap interstitials. 
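
The sensitivity of vacancy diffusion to the charge state can be illustrated with a simple thermally activated (Arrhenius) factor, $\exp(-E_a/kT)$. The Arrhenius form and the function name are assumptions made for this sketch; the two activation energies are the limits quoted above for the vacancy in Si.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def hop_factor(activation_ev, temperature_k):
    """Boltzmann factor exp(-E_a / kT) for a thermally activated hop."""
    return math.exp(-activation_ev / (K_B_EV * temperature_k))


# Ratio of hop factors at 300 K for the lowest and highest activation energies.
ratio = hop_factor(0.18, 300.0) / hop_factor(0.45, 300.0)
print(f"hop-rate ratio at 300 K: {ratio:.1e}")
```

If the prefactors are comparable, a shift of the Fermi level that changes the charge state can therefore alter the room-temperature hop rate by several orders of magnitude.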

Self-interstitials are also mobile. In Si, the activation energy for diffusion is believed to be of the order of 1 eV, 
but this drops to zero when minority charge carriers are present. This is probably due to recombination-enhanced 
diffusion, a process in which the defect itself is a recombination centre for electrons and holes. The recombination 
releases an energy about equal to the size of the gap and this energy is used to propel the impurity above a diffusion 
barrier. Since many processing steps which inject self-interstitials also inject minority carriers, self-interstitials tend 
to diffuse excessively fast during the processing step and react with impurities and defects. They often kick out 
substitutional impurities, thus transforming a slow-diffusing impurity into a rapidly diffusing one. Important 
examples include boron in Si and the 3d transition metals which diffuse much faster as interstitial than 
substitutional impurities. 

More exotic diffusion processes have been identified, although they may not be fully understood. One example is 
the substantial enhancement [25] of the diffusivity of interstitial O by H, resulting in the increased formation rate of 
O-related thermal donors in Si in the 300-450°C range. 


C2.16.5 STRUCTURES AND DEVICES 


C2.16.5.1 P-N JUNCTION


In order to obtain appreciable conductivities, semiconductors must be doped with small amounts of selected 
impurities. It is possible to switch the doping type from n to p type, or vice versa, either during the growth of a 
crystal or by the selective introduction of impurities after the growth. The boundary region between the p type and 
n type regions is called a p-n junction [17, 32, 33 and 34]. 

The detailed spatial distribution of carriers in the immediate vicinity of the p-n junction is very important. Some of 
the majority carriers at the junction neutralize each other. This results in a thin region depleted of free carriers, 
known as the space-charge region. In this region, negatively ionized acceptors on the p side repel the mobile 
electrons from the n side of the junction. Similarly, the mobile holes from the p side are repelled by the positively 
ionized donors on the n side. The result is a built-in electric field which inhibits carrier diffusion across the p-n 
junction. The electrostatic potential, or contact potential, associated with this field bends the conduction and 
valence bands in the space-charge region by an amount called the barrier height, as is illustrated in figure C2.16.7. 

Figure C2.16.7. A schematic energy band diagram of a p-n junction without external bias (a) and under forward 
bias (b). Electrons and holes are indicated with - and + signs, respectively. It should be remembered that the energy 
of electrons increases by moving up, holes by moving down. Electrons injected into the p side of the junction 
become minority carriers. Approximate positions of donor and acceptor levels and the Fermi level, are indicated. 


V^ is the built-in potential of the p-n junction. 

A current flow across the p-n junction can be accomplished only by the application of an external voltage which 
opposes the contact potential and reduces the barrier height. This voltage, called a forward bias, supplies electrons 
at the n and holes at the p contact. Since the barrier height is similar in magnitude to the gap energy, the external 
bias needs to be fairly small, on the order of the gap energy divided by the electron charge. 



Under a forward bias, the majority carriers cross the p-n junction and become minority carriers, e.g. holes on the n
side of the junction. These holes rapidly come into thermal equilibrium with the crystal and reach an energy
approximately equal to that of the valence band edge on the n side of the junction (it should be remembered
that electrons increase their energy by moving up in the band diagram, while holes increase their energy by moving
down). Similarly, the minority electrons reach equilibrium at the p side of the junction. However, the minority
carriers are not in thermodynamic equilibrium with the majority carriers and must give up their excess energy. A
hole on the n side recombines with a majority electron, which then loses an energy about equal to $E_g$. The minority
electron loses a similar energy by recombining with a majority hole.

The distributions of excess, or injected, carriers are indicated in band diagrams by so-called quasi-Fermi levels for
electrons ($E_{fn}$) or holes ($E_{fp}$). These functions describe steady state concentrations of excess carriers in the same
form as the equilibrium concentration. In equilibrium we have $E_{fn} = E_{fp} = E_f$.

C2.16.5.2 PHOTODETECTORS AND SOLAR CELLS

The current-voltage characteristic of an ideal p-n junction is $I = I_s[\exp(qV/mkT) - 1]$, where q is the electron (hole)
charge, k is the Boltzmann constant, T is the temperature, m is the ideality factor, 1 < m < 2, and $I_s$ is the
saturation current. Under applied reverse bias, the current through the junction is limited to $I_s$. Illuminating a
reverse-biased diode with photons of energy greater than $E_g$ results in the generation of photocarriers. The
photocurrent is proportional to the intensity of the incident light. In low-doped diodes, $I_s$ can be quite low and even
small amounts of light can be detected. The diode is then being operated as a photodetector. Very sophisticated
detector structures have been designed, particularly those relying on high-field avalanche multiplication of
photocarriers [35].
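
As a rough numerical illustration of the ideal characteristic I = I_s[exp(qV/mkT) - 1], the sketch below evaluates the
dark current and the current of an illuminated, reverse-biased diode. The saturation current, photocurrent and
temperature are assumed values chosen only for illustration, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q = 1.602176634e-19    # elementary charge, C


def diode_current(v, i_s, m=1.0, temp=300.0):
    """Ideal-diode current I = I_s [exp(qV/mkT) - 1], in amperes."""
    return i_s * math.expm1(Q * v / (m * K_B * temp))


def photodiode_current(v, i_s, i_ph, m=1.0, temp=300.0):
    """Dark current minus a photocurrent i_ph (proportional to light intensity)."""
    return diode_current(v, i_s, m, temp) - i_ph


if __name__ == "__main__":
    i_s = 1e-12   # assumed saturation current, A
    i_ph = 1e-9   # assumed photocurrent, A
    for v in (-1.0, -0.1, 0.0, 0.3, 0.6):
        print(f"V = {v:+.1f} V  dark I = {diode_current(v, i_s):.3e} A")
    # Under reverse bias the dark current saturates at -I_s, so the
    # photocurrent dominates the measured signal.
    print(f"illuminated, V = -1 V: {photodiode_current(-1.0, i_s, i_ph):.3e} A")
```

With a small I_s, the reverse-bias dark current is negligible compared with even a nanoampere photocurrent, which is
why low-doped diodes make sensitive detectors.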

The p-n junction diode can also be used to convert optical energy directly to electrical power, without external
power supplies. Absorption of a photon with $E > E_g$ produces an $e^-$-$h^+$ pair. The internal electric field of the p-n
junction separates the carriers; the $e^-$ and $h^+$ move toward metallic contacts on opposite sides of the cell. The
resulting photocurrent is sent to an external load. The maximum power delivered to a load is obtained under small
forward bias. In Si cells, the largest voltage output, produced in the open circuit mode (i.e. with I = 0), is about 0.7
V. Si solar cell power efficiencies as high as 24% have been reported, close to the theoretical limit of 32%. The
power generated depends on the design of the diode itself and a match to the electrical load. The gap of the cell's
semiconductor must match the solar spectrum as closely as possible and the structure of the gap should allow for
efficient absorption of the solar photons. Perfection of the semiconductor material is also very important for high
efficiency. Electrons and holes generated far from the electrodes must be extracted from the bulk of the cell and
this requires long minority carrier diffusion lengths.
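
The open-circuit voltage and the maximum-power operating point can be estimated from the ideal-diode model by
subtracting the forward diode current from a constant photocurrent. This is a minimal sketch under that assumption;
the photocurrent and saturation current below are illustrative values, not data from the text, and series and shunt
resistances are ignored.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q = 1.602176634e-19    # elementary charge, C


def thermal_voltage(temp=300.0, m=1.0):
    """m k T / q in volts."""
    return m * K_B * temp / Q


def load_current(v, i_ph, i_s, temp=300.0, m=1.0):
    """Current delivered to the load: photocurrent minus forward diode current."""
    return i_ph - i_s * math.expm1(v / thermal_voltage(temp, m))


def open_circuit_voltage(i_ph, i_s, temp=300.0, m=1.0):
    """Voltage at which the load current vanishes (I = 0)."""
    return thermal_voltage(temp, m) * math.log1p(i_ph / i_s)


def max_power_point(i_ph, i_s, temp=300.0, m=1.0, steps=2000):
    """Scan 0..V_oc and return (V, P) where P = V * I is largest."""
    v_oc = open_circuit_voltage(i_ph, i_s, temp, m)
    best_v, best_p = 0.0, 0.0
    for k in range(steps + 1):
        v = v_oc * k / steps
        p = v * load_current(v, i_ph, i_s, temp, m)
        if p > best_p:
            best_v, best_p = v, p
    return best_v, best_p


if __name__ == "__main__":
    i_ph, i_s = 0.035, 1e-13   # assumed values, A per cm^2 of cell area
    v_oc = open_circuit_voltage(i_ph, i_s)
    v_mp, p_mp = max_power_point(i_ph, i_s)
    print(f"V_oc = {v_oc:.3f} V, V_mp = {v_mp:.3f} V, P_max = {p_mp * 1e3:.1f} mW")
```

With these assumed currents the open-circuit voltage comes out slightly below 0.7 V, and the maximum power is
delivered at a forward bias somewhat smaller than the open-circuit voltage.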

C2.16.5.3 LIGHT EMITTING DIODES

Light is generated in semiconductors in the process of radiative recombination. In a direct semiconductor, the minority
carrier population created by injection in a forward-biased p-n junction can recombine radiatively, generating
photons with energy about equal to $E_g$. The recombination process is spontaneous: individual electron-hole
recombination events are random and not related to each other. This process is the basis of LEDs [36].

LEDs can now be fabricated in all primary colours and with efficiencies much higher than those of light bulbs. Red
LEDs, based on InGaAlP, are sufficiently bright to be used in traffic lights and automobile brake lights. Blue
LEDs, based on GaInN, have become commercially available in the last few years. White light can be produced by
blue LEDs pumping appropriate phosphor materials. Inexpensive and very reliable infrared LEDs are common in
communication systems.

C2.16.5.4 BIPOLAR TRANSISTORS

The bipolar junction transistor (BJT) consists of three layers doped n-p-n or p-n-p that constitute the emitter, base 
and collector, respectively. This structure can be considered as two back-to-back p-n junctions. Under normal 
operation, the emitter-base junction is forward biased to inject minority carriers into the base region. For example, 
the n type emitter injects electrons into a p type base. The electrons in the base, now minority carriers, diffuse 
through the base layer. The base-collector junction is reverse biased and its electric field sweeps the carriers 
diffusing through the base into the collector. The BJT operates by transport of minority carriers, but both electrons 
and holes contribute to the overall current. 

A band diagram of a biased n-p-n BJT is shown in figure C2.16.8. Under forward bias, electrons are injected from
the n type emitter, giving rise to the current $I_{En}$ flowing into the p type base. Some of the carriers injected into the
base recombine in the base or at the surface. This results in a reduction of the base current by $I_r$, the lost
recombination current, and the base current becomes $I_B = I_{En} - I_r$. At the same time, holes are injected from the
base into the emitter giving rise to the $I_{Ep}$ component of the current. The two components, $I_r$ and $I_{Ep}$, must be
minimized since they reduce the collector current $I_C = (I_{En} - I_{Ep}) - I_r$. The 'current gain' of the transistor is the
ratio $I_C/I_B$. Since the base current is much smaller than the collector current, large gains are possible.

Figure C2.16.8. Schematic energy band diagram for an n-p-n bipolar junction transistor. Positions of quasi-Fermi
levels and bias voltages are indicated.

In a BJT, the width of the base region must be smaller than the minority carrier diffusion length, and the two 
junctions strongly influence each other. Small increases in the current flowing through the first junction result in 
large increases in the collector current. The ratio of the current at the collector to the base current is the current 
gain. In Si-based BJTs, the gain is limited by the injection of the base majority carriers into the emitter. In order to 
maintain an adequate current gain, the doping level of the base must be lower than that of the emitter. 
Unfortunately, this results in higher base resistance and larger RC time constant of the transistor. 

Some of these problems are avoided in heterojunction bipolar transistors (HBTs) [37, 38], the majority of which
are based on III-V compounds such as GaAs/AlGaAs. In an HBT, the gap of the emitter is larger than that of the
base. The conduction and valence band offsets that result from the matching up of the two different materials at the
heterojunction prevent or reduce the injection of the base majority carriers into the emitter. This permits the use of
the highly doped, very thin bases needed to achieve high gain and high speed of operation at the same time. The
conduction band offset at the emitter-base interface gives rise to the possibility of injecting carriers with high
kinetic energy and ballistic transport in the base.

C2.16.5.5 METAL-OXIDE-SEMICONDUCTOR TRANSISTORS

Metal-oxide-semiconductor field-effect transistors (MOSFETs) are the basic devices of modern electronics [32, 33
and 34]. They can be produced in large numbers with very reproducible characteristics and their low power
consumption makes large-scale integrated circuits possible. They are majority carrier devices and thus relatively
insensitive to materials defects. A schematic cross section of a MOSFET is shown in figure C2.16.9. The heart of
the device is the metal-oxide-semiconductor (MOS) gate structure that controls the current flow in the channel. In
normal operation of an n-channel enhancement mode device, the gate bias is used to produce a conducting channel
between two contact regions known as the source and drain. A positive bias attracts electrons into the channel; a
negative bias repels them. For low source-drain bias, the conductivity of the channel is proportional to the gate
voltage, the channel width is uniform from source to drain and the transistor behaves as a voltage-controlled
resistor. As the drain bias approaches a threshold value ($V_t$) the voltage drop across the oxide, in the vicinity of the
drain, decreases. This results in a non-uniform channel width, lower at the drain, and overall higher resistance. For
a drain bias greater than $V_t$ the channel is pinched off at the drain. Increased source-drain voltage does not produce
any incremental source-drain current. At this point, the slope of the output current versus source-drain bias curve
is zero. This bias region is known as the saturation region. In this region, small changes in the gate voltage produce
large changes in the drain current and the device is said to have large transconductance.
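
The triode and saturation regimes can be illustrated with the standard long-channel 'square-law' model. This model
is not given in the text; it is a common textbook approximation introduced here as an assumption, with the pinch-off
condition written as V_ds >= V_gs - V_t. The threshold voltage and the gain factor k are hypothetical values.

```python
def drain_current(v_gs, v_ds, v_t=0.7, k=1e-3):
    """Square-law drain current in amperes.

    k lumps mobility, oxide capacitance and gate geometry (A/V^2); both k
    and v_t are assumed, illustrative values.
    """
    v_ov = v_gs - v_t                      # gate overdrive
    if v_ov <= 0.0:
        return 0.0                         # no conducting channel
    if v_ds < v_ov:
        # Triode region: behaves like a voltage-controlled resistor at small v_ds.
        return k * (v_ov * v_ds - 0.5 * v_ds ** 2)
    # Saturation: channel pinched off at the drain, current independent of v_ds.
    return 0.5 * k * v_ov ** 2


def transconductance(v_gs, v_ds, v_t=0.7, k=1e-3, dv=1e-6):
    """Numerical derivative dI_D/dV_gs at fixed V_ds, in siemens."""
    hi = drain_current(v_gs + dv, v_ds, v_t, k)
    lo = drain_current(v_gs - dv, v_ds, v_t, k)
    return (hi - lo) / (2.0 * dv)


if __name__ == "__main__":
    for v_ds in (0.1, 0.5, 1.0, 2.0, 3.0):
        print(f"V_ds = {v_ds:.1f} V  I_D = {drain_current(1.7, v_ds) * 1e3:.3f} mA")
    print(f"g_m in saturation: {transconductance(1.7, 2.0) * 1e3:.3f} mS")
```

In this model the output curve flattens once the drain bias exceeds the gate overdrive, and the transconductance in
saturation grows linearly with the overdrive.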

The performance of MOSFETs depends critically on the gate dimension measured along the channel (the gate 
length), and the gate oxide thickness. Higher transconductance is obtained for shorter gate length and thinner 
oxides. State-of-the-art devices use gate lengths shorter than 200 nm and gate oxides thinner than 10 nm.

The carriers in the channel of an enhancement mode device exhibit unusually high mobility, particularly at low 
temperatures, a subject of considerable interest. The source-drain current is carried by electrons attracted to the 
interface. The ionized dopant atoms, which act as fixed charges and limit the carriers' mobility, are left behind, 
away from the interface. In a sense, the source-drain current is carried by the two-dimensional (2D) electron gas at 
the Si-gate oxide interface. 

Electron effective masses are much smaller in compound semiconductors and quantum effects in the 2D gas are much
more pronounced. It is also easier to engineer the 2D electron (or hole) gas by the use of heterostructures and
modulation doping. The channel of a compound semiconductor FET can be formed by two layers with different
gaps, for instance GaAs and InGaAs [39]. An undoped spacer layer may be introduced at the interface. The
electrons associated with the donors in the wider-gap layer see lower-lying energy states in the adjacent
narrower-gap material and transfer to it. The electrons and positively charged donors become spatially separated,
effectively eliminating ionized impurity scattering in the narrower-gap channel. High-mobility electrons can then be
maintained with high sheet charge densities down to very low temperatures.

Figure C2.16.9. Schematic cross-section and biasing of a metal-oxide-semiconductor transistor. A uniform
conducting channel is induced between source (S) and drain (D) for $V_{gs} > V_t$. Voltage $V_{gs}$ is applied between the
gate (G) and the source. Part (A) shows the channel for $V_{ds} < V_{gs} - V_t$; the transistor acts as a triode. The
source-drain voltage is increased in part (B) to $V_{ds} > V_{gs} - V_t$. The channel is now pinched off at the drain side and
$I_{ds}$ is saturated. This is the proper regime of operation of the MOS transistor.


C2.16.5.6 HETEROSTRUCTURES 

In the p-n junction illustrated in figure C2.16.7 both sides are made of the same semiconductor, and therefore have
the same energy gap. Such a junction is called a homojunction. In a homojunction, the minority carriers are free to
move away from the junction by a few diffusion lengths. It is therefore very difficult to achieve high carrier
densities. Significantly higher carrier densities can be obtained by introducing an energy barrier at, or very near, the
p-n junction. The energy barrier arises when two different semiconductors (therefore with different gaps) are
joined [8, 40]. Barriers in the CB and VB, called the band-edge discontinuities $\Delta E_c$ and $\Delta E_v$, are formed by
changing the composition of the semiconductor layers. The junction of a small- with a large-gap semiconductor is
called a single heterojunction. It confines one type of minority carrier (electrons or holes) to the p-n junction
region.

A more effective carrier confinement is offered by a double heterostructure in which a thin layer of a low-gap
material is sandwiched between larger-gap layers. The physical junction between two materials of different gaps is
called a heterointerface. A schematic representation of the band diagram of such a structure is shown in figure
C2.16.10. The electrons, injected under forward bias across the p-n junction into the lower-bandgap material,
encounter a potential barrier $\Delta E_c$ at the p-p junction which inhibits their motion away from the junction. The holes
see a potential barrier of $\Delta E_v$ at the n-p heterointerface which prevents their injection into the n region. The result
is that the injected minority carriers are confined to the thin narrow-bandgap region. If this region is thinner than the
average diffusion length, very high densities of injected carriers can be obtained in a forward-biased diode.
Heterojunctions are the basic structures of LEDs, semiconductor lasers and HBTs.

Figure C2.16.10. Band diagram of a p-p-n double heterostructure without external bias. The gap $E_{g1}$ is
smaller than $E_{g2}$. Conduction and valence band discontinuities are indicated. Structures of this type are used in
LEDs and diode lasers.

C2.16.5.7 QUANTUM WELL STRUCTURES

Advances in epitaxial crystal growth methods make it possible to prepare heterostructures with essentially arbitrary 
thickness of the small-gap layer. When the thickness of this layer is reduced to dimensions of the order of 10 nm 
(between 20 and 30 atomic planes) a quantum mechanical description of the confined carriers is needed. Such 
heterostructures are called quantum wells [41, 42].

In quantum wells, Heisenberg's uncertainty principle requires an increase in the carrier energy over the equilibrium
energy of the bulk semiconductor. The confined carriers are allowed only a few discrete states, with energies
inversely proportional to the carrier's effective mass and the square of the well width. The incorporation of
quantum wells into the material has a number of subtle consequences. The VB in direct bulk semiconductors is
degenerate at the Γ point (see figure C2.16.5), resulting in two types of holes with the same energy, the heavy holes
and light holes. The effective mass of the light hole is similar to that of the electron, while that of the heavy hole is
typically ten times larger than that of the electron. In quantum wells, the degeneracy of the two hole bands is lifted.
The energy shift due to quantum well confinement is larger for the light holes. This has important consequences for
quantum well lasers and modulation-doped FETs.
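
The scaling of the confinement energy with effective mass and well width can be checked with the simplest possible
model, a particle in an infinitely deep one-dimensional well, for which $E_n = n^2 \pi^2 \hbar^2 / (2 m^* L^2)$. This
idealization and the effective masses used below (roughly GaAs-like values) are assumptions made for illustration;
real wells have finite barriers, so the actual levels lie lower.

```python
import math

HBAR = 1.054571817e-34    # reduced Planck constant, J s
M_E = 9.1093837015e-31    # free-electron mass, kg
EV = 1.602176634e-19      # joules per electronvolt


def confinement_energy_ev(m_eff, width_m, n=1):
    """Energy of level n in an infinite well of width width_m, in eV.

    m_eff is the effective mass in units of the free-electron mass.
    """
    return (n * math.pi * HBAR) ** 2 / (2.0 * m_eff * M_E * width_m ** 2) / EV


if __name__ == "__main__":
    width = 10e-9  # 10 nm well
    # Assumed, illustrative effective masses (units of m_e).
    carriers = {"electron": 0.067, "light hole": 0.08, "heavy hole": 0.5}
    for name, m_eff in carriers.items():
        e1 = confinement_energy_ev(m_eff, width) * 1e3
        print(f"{name:10s}: E_1 = {e1:6.1f} meV")
```

Because the shift scales as 1/m*, the light-hole level moves further from the band edge than the heavy-hole level,
which is how confinement lifts the degeneracy of the two hole bands.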

C2.16.5.8 DIODE LASERS

The light emitted in the spontaneous recombination process can leave the semiconductor, be absorbed or cause 
additional transitions by stimulating electrons in the CB to make a transition to the VB. In this stimulated 
recombination process another photon is emitted. The rate of stimulated emission is governed by a detailed balance 
between absorption, and spontaneous and stimulated emission rates. Stimulated emission occurs when the 
probability of a photon causing a transition of an electron from the CB to VB with the emission of another photon 
is greater than that for the upward transition of an electron from the VB to the CB upon absorption of the photon. 
These rates are commonly described in terms of Einstein's A and B coefficients [8, 43]. For semiconductors, there 
is a simple condition describing the carrier density necessary for stimulated emission, or lasing. This carrier density 
is known as the threshold density. There is also a corresponding threshold current density that has to be supplied to
the p-n junction. Lasing can start when the density of electrons injected into the CB exceeds the hole density in the
VB, a condition of population inversion. It occurs when the separation of the quasi-Fermi levels for holes ($E_{fp}$) and
electrons ($E_{fn}$) is greater than the energy of the emitted photon,

$$E_{fn} - E_{fp} > \hbar\omega$$

and the photon energy $\hbar\omega$ must be at least equal to $E_g$. Thus, in semiconductor lasers, stimulated emission occurs
between distributions of states in the conduction and valence bands. In most other lasers, such as gas or glass
lasers, this transition occurs between discrete energy levels.

The emission wavelength of the laser is directly related to the size of the gap. The early lasers were based on GaAs
and therefore emitted in the near infrared. Lasers based on InGaAsP produce light between 1.3 and 1.55 µm,
specifically tailored to optical fibre communications [44]. Ongoing advances in GaN-based materials are resulting
in lasers emitting in the blue [11]. We thus have a very wide range of gaps and emission wavelengths at our
disposal.
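
The link between gap and emission wavelength follows from $\lambda = hc/E$ for a photon of energy E close to the gap.
The short sketch below converts between the two; the GaAs gap of about 1.42 eV is a commonly quoted room-temperature
value used here as an assumption, and the function names are hypothetical.

```python
H = 6.62607015e-34      # Planck constant, J s
C = 2.99792458e8        # speed of light, m/s
EV = 1.602176634e-19    # joules per electronvolt


def wavelength_um(energy_ev):
    """Vacuum wavelength in micrometres of a photon with the given energy in eV."""
    return H * C / (energy_ev * EV) * 1e6


def photon_energy_ev(wavelength_in_um):
    """Photon energy in eV for a vacuum wavelength given in micrometres."""
    return H * C / (wavelength_in_um * 1e-6) / EV


if __name__ == "__main__":
    print(f"GaAs, E_g ~ 1.42 eV -> {wavelength_um(1.42):.3f} um (near infrared)")
    for lam in (1.3, 1.55):
        print(f"{lam:.2f} um fibre window -> gap of about {photon_energy_ev(lam):.3f} eV")
```

The fibre-communication wavelengths of 1.3 and 1.55 µm thus correspond to gaps of roughly 0.95 and 0.80 eV.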

A diode laser requires the generation of spatially localized high concentrations of minority carriers, a medium to 
provide the gain and a way of providing feedback to the stimulated emission [43, 44 and 45]. The medium is the
semiconductor heterostructure arranged to help confine the carriers and the photons to the same region of space.
Light is generated by a p-n junction which injects electrons from the valence to the conduction band and thus 
provides the population inversion. This is followed by electron-hole recombination and the emission of light. 
Further recombination can be stimulated by light already present in the medium. The optical feedback is arranged 
by forming a cavity with two mirrors parallel to each other. Light generated within the cavity is then partially 
reflected back into the crystal. Such mirrors can be formed in most compound semiconductors by simply cleaving 
both ends of the heterostructure wafer. 

Figure C2.16.11 illustrates the evolution of the threshold current density of diode lasers with the structure of the
recombination region within the p-n junction, known as the active region. The early diode lasers were based on 
GaAs homojunctions. Their large threshold current densities resulted from poor carrier confinement and large 
effective active region thickness. This is because the diffusion length of electrons is fairly long in most of the 
semiconductors discussed here, of the order of several micrometres. A very high threshold current density limits 
operation to short pulses and cryogenic temperatures. 

Figure C2.16.11. Changes in the threshold current density of diode lasers resulting from new structure concepts. A
homojunction diode laser was first demonstrated in 1962. SH and DH stand for single and double heterostructure,
respectively. The best laser performance is now obtained in quantum well (QW) lasers.


In a heterostructure laser, the active region can be defined by epitaxial layers and made considerably thinner. In
GaAs/Al$_x$Ga$_{1-x}$As heterostructures, the active region can be made as thin as 100 nm, and the threshold current
density drops to less than 0.5 kA cm$^{-2}$. Such lasers readily achieve continuous operation at room temperature and
are capable of high power output.


A logical consequence of this trend is a quantum well laser in which the active region is reduced further, to less 
than 10 nm. The 2D carrier confinement in the wells (formed by the CB and VB discontinuities) changes many 
basic semiconductor parameters, in particular the density of states in the CB and VB, which is greatly reduced in 
quantum well lasers. This makes it easier to achieve population inversion and results in a significant reduction in 
the threshold carrier density. Indeed, quantum well lasers are characterized by threshold current densities lower
than 100 A cm$^{-2}$.

The history of the diode laser illustrated in figure C2.16.11 shows the interplay of basic device physics ideas and
technology. A new idea often does not produce a better device right away. It requires a certain leap of faith to see 
the improvement potential. However, once the belief exists, the technology can be developed to demonstrate its 
validity. In the case of diode lasers, the better technology was invariably associated with improved epitaxial 
growth. 


C2.16.6 OUTLOOK 

We are aware of the dangers inherent in predicting the future. It is much safer to summarize the present and to 
extrapolate to the near term only. 

The speed and general performance of semiconductor electronics have been doubling and the cost halving every 18
months for the last 50 years, a phenomenon known as Moore's law [46]. It is remarkable that this rate of advance
still holds today, even as the active volume of devices is becoming so small that quantum effects are critical. The
semiconductor industry has listed the expected challenges and specific goals in a 'road-map' extending well into 
the next decade [47]. The gate length of MOS transistors will be measured in nanometres, not in micrometres. The 
thickness of gate dielectrics is expected to drop to less than 20 nm. This reduction in size demands a much better 
microscopic understanding of materials and processes. 
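
To get a feeling for what a doubling every 18 months implies, the cumulative factor after t years of steady doubling
is 2^(t/1.5). The helper below is a hypothetical illustration of that arithmetic only; it assumes a perfectly constant
doubling time.

```python
def growth_factor(years, doubling_time_years=1.5):
    """Cumulative improvement factor after `years` of steady doubling."""
    return 2.0 ** (years / doubling_time_years)


if __name__ == "__main__":
    for years in (1.5, 15, 50):
        print(f"{years:>5} years: factor {growth_factor(years):.3g}")
```

Sustained for 50 years, an 18-month doubling time corresponds to an improvement of about ten orders of magnitude.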

Atomic-scale devices already projected pose design challenges at the quantum mechanical level. The framework of 
quantum computing is now being discussed in research laboratories [48, 49].

In addition to improvements in Si, a variety of devices based on compound semiconductors can be expected. Blue
lasers with high brightness and long operating lifetimes already exist in the laboratory. LEDs are likely to be used 
for all lighting purposes. The bandwidth of optical communications will continue to increase with ever faster 
semiconductor lasers. 

There appears to be a world market for an infinite number of computers and other electronic devices. 

C2.17 Nanocrystals

Vicki L Colvin and Dan M Mittleman 


C2.17.1 INTRODUCTION 

Throughout most of the twentieth century, the size of a solid has been an uninteresting parameter in the developing
areas of solid-state physics and chemistry. Millimetre-thick gold wires have the same colour, conductivity and
melting point as gold coins; even in state-of-the-art microelectronics, where structural features with
sub-micrometre sizes have become typical, semiconductors and metals behave as they do in their
macroscopic counterparts. These facts are not surprising, as most material properties, such as conduction and
colour, emerge from interactions between, at most, hundreds of unit cells, each less than a nanometre in size. Thus
size is not an important factor in understanding and controlling crystalline solids unless the size of the solid shrinks
to the nanometre length scale. Of course, this is the relevant size range for the next generation of microelectronics,
and is evidently an important length scale for many biological systems. These factors have spurred the development
of the field of nanoscience, which is the study of the influence of size on the properties of solids. Innumerable
studies of the properties of nanocrystalline materials have amply demonstrated that size really does matter; it can be
a powerful parameter in the systematic study of bulk behaviour, as well as in the design of new materials with
unique and special properties.

For the purposes of this review, a nanocrystal is defined as a crystalline solid, with feature sizes less than 50 nm,
recovered as a purified powder from a chemical synthesis and subsequently dissolved as isolated particles in an
appropriate solvent. This definition shares many features with that of a 'colloid', defined broadly as a
particle that has some linear dimension between 1 and 1000 nm [1]; the study of nanocrystals may be thought of as
a new kind of colloid science [2]. Much of the early work on colloidal metal and semiconductor particles stemmed
from their photophysics and applications to electrochemistry. (See, for example, the excellent review by Henglein
[3].) However, the definition of a colloid does not include any specification of the internal structure of the particle.
Therein lies the crucial distinction: in nanocrystals, the interior crystalline structure is of overwhelming importance.
Nanocrystals must truly be 'little solids' (figure C2.17.1), with internal structures equivalent (or nearly equivalent)
to those of bulk materials. This is a necessary condition if size-dependent studies of nanometre-sized objects are to
offer any insight into the behaviour of bulk solids.

The definition above is a particularly restrictive description of a nanocrystal, and necessarily limits the focus of this
brief review to studies of nanocrystals which are of relevance to chemical physics. Many nanoparticles, particularly
oxides, prepared through the sol-gel method are not included in this discussion as their internal structure is
amorphous and hydrated. Nevertheless, they are important nanomaterials; several textbooks deal with their
synthesis and properties [4, 5]. The material science community has also contributed to the general area of
nanocrystals; however, for most of their applications it is not necessary to prepare fully isolated nanocrystals with
well defined surface chemistry. A good discussion of the goals and progress can be found in references [6, 7, 8 and
9]. Finally, there is a rich history in gas-phase chemical physics of the study of clusters and size-dependent
evaluations of their behaviour. This topic is not addressed here, but covered instead in chapter C1.1, Clusters and
nanoscale structures, in this same volume.

Figure C2.17.1. Transmission electron micrograph of a TiO$_2$ (anatase) nanocrystal. The mottled and unstructured
background is an amorphous carbon support film. The nanocrystal is centred in the middle of the image. This
microscopy allows for the direct imaging of the crystal structure, as well as the overall nanocrystal shape. This
titania nanocrystal was synthesized using the nonhydrolytic method outlined in [79].

We begin our discussion of nanocrystals in this chapter with the most challenging problem faced in the field: the
preparation and characterization of nanocrystals. These systems present challenging problems for inorganic and
analytical chemists alike, and the success of any nanocrystal synthesis plays a major role in the further quantitative
study of nanocrystal properties. Next, we will address the unique size-dependent optical properties of both metal
and semiconductor nanocrystals. Indeed, it is the striking size-dependent colours of nanocrystals that first attracted
the interest of chemical physicists. Finally, the thermodynamic properties of nanocrystals will be reviewed. The
melting point reduction and unusual structural metastability observed in solids of confined size are important
results in understanding the physics of these systems.


C2.17.2 PREPARATION OF NANOCRYSTALS 

Obtaining high-quality nanocrystalline samples is the most important task faced by experimentalists working in the 
field of nanoscience. In the ideal sample, every cluster is crystalline, with a specific size and shape, and all clusters 
are identical. While such uniformity can be expected from a molecular sample, nanocrystal samples rarely attain 
this level of perfection; more typically, they consist of a collection of clusters with a distribution of sizes, shapes 
and structures. In order to evaluate size-dependent properties quantitatively, it is important that the variations 
between different clusters in a nanocrystal sample be minimized, or, at the very least, that the range and nature of 
the variations be well understood. 

Reaching the goal of the ideal nanocrystal sample is not an easy task. There are few commercial sources for 
nanocrystals, and the chemical reactions used to make them can require involved synthetic methodology. On the 
other hand, the last decade has seen enormous progress in this area and many solids have now been prepared in the
nanocrystalline phase. For the best studied materials, particle size distributions of less than 5% on the
diameter are routinely obtained [10, 11 and 12]; more typically, size distributions of 10-20% are reasonable. Note
that polydispersity for a colloidal system, referred to here as σ, is defined as the standard deviation of the particle
diameters divided by the average particle diameter.
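
The polydispersity defined here is straightforward to compute from a list of measured diameters, for example from
electron micrographs. The sketch below uses the sample standard deviation, which is an assumption since the text
does not specify sample or population statistics; the diameters are made-up illustrative numbers.

```python
import statistics


def polydispersity(diameters):
    """Standard deviation of the diameters divided by their mean."""
    mean = statistics.mean(diameters)
    # Sample standard deviation (n - 1 in the denominator) is assumed here.
    return statistics.stdev(diameters) / mean


if __name__ == "__main__":
    # Hypothetical diameters in nanometres.
    sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0]
    print(f"mean diameter = {statistics.mean(sample):.2f} nm")
    print(f"polydispersity = {100 * polydispersity(sample):.1f}%")
```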

While size distribution is important, control over the nanocrystal surface is equally important. The best nanocrystal 
syntheses provide avenues for nanocrystals to be purified, collected as powders, and then redissolved in appropriate 
solvents. This requires control over the surface chemistry, in order to control the solubility of the nanocrystals. 
Such flexibility allows these materials to be structurally characterized and then assembled into a wide variety of 
configurations for further experiments. 

There are many ingenious and successful routes now developed for nanocrystalline synthesis; some rely on gas 
phase reactions followed by product dispersal into solvents [7, 9, 13, 14 and 15]. Others are adaptations of classic
colloidal syntheses [16, 17, 18 and 19]. Electrochemical and related template methods can also be used to form 
nanostructures, especially those with anisotropic shapes [20, 21, 22 and 23]. Rather than outline all of the available 
methods, this section will focus on two different techniques of nanocrystal synthesis which together demonstrate 
the general strategies. 

C2.17.2.1 REACTIONS OCCURRING IN RESTRICTED ENVIRONMENTS 

A logical departure point for the synthesis of nanocrystals is to view the problem as one of limiting the growth of a 
bulk crystal; this is challenging because the large surface free energy of a nanocrystal makes it a metastable system, 
highly prone to fusion and aggregation. One particularly elegant and versatile solution is to precipitate solids inside 
reactors which are themselves of nanometre scale [24, 25, 26, 27 and 28]. Such nanometre-scale reaction 
environments can be formed by mixing water, surfactant and oil to create inverse micelles [29, 30]. These water
pools have radii that can be tuned from 5 to ~60 Å by varying the water/surfactant molar ratio, thus
providing a direct avenue for size control [31, 32 and 33]. Ionic salts can be solvated in the inverse micelles; when
such solutions are exposed to a counterion which forms an insoluble solid with the original salt, small crystals of
the solid form within the micelle environment. This process, referred to as arrested precipitation, can be used to
grow nanocrystals with sizes which are roughly equivalent to the original micellar size. Surface control can be
achieved by adding organic 'capping' agents to the final solutions. Depending on the solubility and capping group
affinity, the nanocrystals may directly precipitate out of the micellar solution [34, 35] or may be recovered as a
powder after the micelle phase is disrupted by the addition of an alcohol [36, 37]. The advantage of this approach is
that it is relatively easy, and can be applied to many different materials including metals [38, 39, 40 and 41],
ceramics [42, 43, 44, 45, 46 and 47] and some semiconductors [34, 35, 36 and 37, 48]. Shape control is also
possible in surfactant-based syntheses of metal nanocrystals [49, 50 and 51]. Cizeron et al [52] use surfactants to
control not only particle size, but more critically the growth rates of different crystal faces, producing highly
anisotropic nanoneedles with aspect ratios exceeding 100:1 (figure C2.17.2).

Figure C2.17.2. Transmission electron micrograph of a gold nanoneedle. Inverse micelle environments allow for a
great deal of control not only over particle size, but also particle shape. In this example, gold nanocrystals were
prepared using a photolytic method in surfactant-rich solutions; the surfactant interacts strongly with areas of low
curvature, thus continued growth can occur only at the sharp tips of nanocrystals, leading to the formation of
high-aspect-ratio nanostructures [52].

Though easy to implement and quite versatile, inverse micelle reactions have not generally produced the highest- 
quality nanocrystals. In such an environment the reaction necessarily occurs at room temperature. While crystalline 
metals are easily formed at these low temperatures, nanoparticles of semiconductors and oxides are often highly 
defective or even amorphous. This problem can be addressed to some extent by refluxing the nanocrystals in a high 
boiling point solvent after separating them from the micelle phase [35]. Another common problem with 
nanocrystals produced by this method is their relatively poor size distributions. Post-processing treatments, such as 
size-selective precipitation or filtration, are generally employed if nanocrystal monodispersity is of great 
importance [53, 54]. Pileni et al [54] recently used size-selective precipitation to narrow the size distribution of
silver nanocrystals prepared in an inverse micelle reaction from 37% to 15% polydispersity. Perhaps the most
severe limitation of the inverse micelle approach is that it can only form nanocrystals whose precursors are stable
and solvated by water. More covalent semiconductors like silicon and gallium arsenide, as well as many II-VI
materials like CdSe, require reactive organometallic precursors. Thus, alternative reactions which proceed in dry, 
organic solvents are necessary. 

C2.17.2.2 PRECURSOR APPROACHES TO NANOCRYSTAL SYNTHESIS 

This class of methods differs from inverse micelle methods in that the reactions are completed in organic solvents.
Such solvents permit the reactions to proceed at much higher temperatures, leading to nearly perfect crystalline 
solids [55]. In addition, the use of organic solvents permits nanocrystals to be prepared from a wide variety of 
molecular precursors under oxygen-free and water-free conditions. Metal [12, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65 
and 66], semiconductor [67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77 and 78] and ceramic nanocrystals [79, 80] have 
been generated using this basic strategy. These stringent controls over reaction conditions are important in the 
synthesis of covalent semiconductors, which require reactive organometallic starting materials. This can also be an 
advantage in the preparation of metal nanocrystals free from oxide or hydroxide contamination. Precursor methods 
have also shown remarkable success in producing highly monodisperse (σ < 5%) nanocrystals, especially for II-VI
semiconductor nanocrystals [10, 81]. For these reasons, this particular approach to nanocrystal synthesis is 
becoming a popular strategy despite the fact that it requires more involved synthetic methodology. 


The first step in designing a precursor synthesis is to pick precursor molecules that, when combined in organic 
solvents, yield the bulk crystalline solid. For metals, a usual approach is to react metal salts with reducing agents to 
produce bulk metals. The main challenge is to find appropriate metal salts that are soluble in an organic phase.
One prevalent strategy for this involves the use of a phase transfer agent, such as tetraoctylammonium bromide, to
bring gold and silver salts into an organic phase [12, 56, 57, 58, 59, 60, 61, 62 and 63]. Reduction of the metal salts 
occurs readily with the addition of sodium borohydride. Formation of the bulk solid is prevented by the saturation 
of the organic phase with capping molecules, which serve to limit crystallite growth [82]. Gold nanocrystals with 
average diameters of 2.5 nm and 20% polydispersity are the crude product of this reaction; fractional crystallization 
of these samples, however, can effectively size-separate these nanocrystals and provide samples with less than 1% 
polydispersity [12]. Rather than rely on phase transfer techniques, which do require exposure of the organic phase 
to water, gold complexes soluble in organic phases can be prepared directly. Upon heating, these decompose to 
form gold clusters [64, 65]. The extension of these general strategies to more reactive transition metals such as 
titanium and copper is also possible. For these systems, LiAlH4 or hydrotriorganoborates are used to reduce metal
salts in solvents such as tetrahydrofuran, producing nanocrystalline metals of a variety of sizes stabilized in organic
phases. Bönnemann et al [66] provide a comprehensive review of these versatile precursor strategies for
nanocrystalline metals.

For covalent semiconductors the problem of creating any crystalline material in liquids at relatively low 
temperatures is much more challenging. The precursors are generally chosen to provide by-products that are stable 
as well as volatile at the reaction temperatures, thus providing a driving force for the reaction. For example, in one 
of the early reactions for making nanocrystalline GaAs, GaCl3 and As(SiMe3)3 were used as precursors, since the
silyl halide by-product was stable and volatile, yielding a pure product of crystalline GaAs at temperatures as low as
240°C [78]. Since then, a variety of III-V semiconductor nanocrystals, including InP, have been produced through
similar high-temperature reactions between reactive precursors [71, 72, 73 and 74, 76, 77, 83]. A different
precursor strategy was employed for the formation of silicon and germanium nanocrystals. Here, the metal halides
were reduced by alkali metals at high temperatures, forming nanocrystals with salts as a by-product [67, 68 and 69].

Control over the size of the semiconductor nanocrystals formed in these reactions is possible, though the 
rationalization of the size control is not always straightforward. First, these strategies require the use of a strongly 
stabilizing solvent, such as tri-octyl phosphine oxide, which is thought to slow crystal growth because of its strong 
interactions with the growing crystallite surfaces. In perhaps the best characterized reaction, the formation of II-VI 
semiconductor nanocrystals, crystal growth is limited by a rapid quenching of the reaction temperature [10, 84, 85
and 86]. Particle nucleation occurs during a fixed period of time after injection of the cadmium precursor into hot
mixtures of tri-octyl phosphine oxide and tri-octyl phosphine. Growth of these nuclei is highly temperature
dependent, and the final nanocrystal size can be controlled by the reaction time at elevated temperatures, as well as
the ratio of Cd:Se [11]. Typical size distributions formed by these reactions can be σ < 5% in the II-VI material
systems [10, 81]. The III-V materials can achieve σ = 20% [87], and the group IV semiconductor nanocrystals
achieve σ ~ 30% [69].

C2.17.2.3 OTHER CHEMICAL APPROACHES TO NANOCRYSTAL SYNTHESIS 

The methods described above are by no means the only strategies for creating crystals of restricted size. Colloidal 
methods for creating nanoparticles are still used widely today, especially to make gold and semiconductor 
nanoparticles [16, 17]. These reactions have the advantage of providing nanoparticles in aqueous solutions, the
ideal environment for electrochemical applications. Also, while gas phase clusters are not considered nanocrystals
by the stringent definition given previously, it is possible to form nanocrystals in the gas phase and subsequently
collect and disperse them into solvents. Such methods have been applied to the production of silicon [13, 88] as
well as metal [89] nanocrystals.

C2.17.2.4 NANOCRYSTAL ASSEMBLY

In solution, nanocrystals are ideal spectroscopic samples; however many of their most important properties can 
only be realized when they are assembled into more complex structures. One way of building complex structures is 
to rely on the inherent tendency for monodisperse spheres to crystallize. Figure C2.17.3 shows the hexagonal
close-packed ordering of monodisperse silica nanoparticles; such crystallization does not require any inter-particle
interaction, but occurs when highly uniform objects are driven to a minimum volume packing. This ordering has
been observed in many nanocrystalline systems when crystallite size distributions fall below 5%; three-dimensional
supercrystals of nanocrystals are the result [90, 91 and 92]. Crystallization can also be induced at air-liquid
interfaces using Langmuir-Blodgett techniques. Thin monolayers of particle arrays thus formed can be transferred
to any surface [93, 94]. Finally, high-coverage monolayers of nanocrystals or spatially patterned sub-monolayers
can be formed on a variety of flat solid surfaces [14, 95, 96, 97, 98, 99, 100 and 101].


Figure C2.17.3. Close-packed array of sub-micrometre silica nanoparticles. When nanoparticles are very
monodisperse, they will spontaneously arrange into a hexagonal close-packed structure. This scanning electron
micrograph shows an example of this for very monodisperse silica nanoparticles of ~250 nm diameter, prepared in
a thin-film format following the techniques outlined in [236].

An equally important challenge for nanocrystal assembly is the formation of specific nanocrystal arrangements in
solution. By using complementary DNA strands as tethers, Mirkin et al [102, 103] formed aggregates of gold
nanocrystals with specific sizes; Alivisatos et al also used DNA to structure semiconductor nanocrystal molecules,
though in this case the molecules contained only a few nanocrystals placed at controlled distances from each other
[104, 105 and 106]. The potential applications of biomolecular techniques to this area of nanoscience are immense,
and the opportunities have been reviewed in several recent publications [107, 108, 109 and 110].


C2.17.3 CHARACTERIZATION OF NANOCRYSTALS 

The goal of any nanocrystal characterization is to identify the position of every atom in a single nanocrystal, as
well as to determine the distribution of sizes and shapes present in a sample. Such precise characterization is
possible in the case of some smaller metal and semiconductor clusters. These systems are amenable to new types of
mass spectrometry which can provide molecular weights of species up to 55 000 atomic mass units, thus permitting
molecular composition and size to be precisely determined [111, 112, 113 and 114]. These systems are the
exception, however, not the rule. More typically, nanocrystalline samples contain crystallites with thousands of
atoms, molecular weights in excess of 100 000 amu, and diameters larger than 1 nm.

The problem of characterizing these larger nanocrystals is challenging and is, in its own right, an area of research. 
Some of the simplest characterization techniques are indirect: for example, the mean size of semiconductor 
nanocrystals can often be inferred from the sample's optical properties. These inferences typically rely on very 
simple models for the nanocrystal properties, and thus provide only very rough sizing estimates and no information 
about structure. Classic analytical methods of polymer and colloid science, such as chromatography and light 
scattering, can also be used to evaluate nanocrystalline samples [13, 115, 116, 117 and 118]. While they provide a
measure of particle size and shape in solution, their applications to inorganic nanocrystals have been limited, in part
because they provide little direct structural information. The most successful tools in this area have been adapted
from materials science and inorganic chemistry. These direct structural methods, when used together, provide a
complete picture of the nanocrystal interior and its surface. 

C2.17.3.1 MICROSCOPIES

In many ways the nanocrystal characterization problem is an ideal one for transmission electron microscopy 
(TEM). Here, an electron beam is used to image a thin sample in transmission mode [119]. The resolution is a
sensitive function of the beam voltage and electron optics; a low-resolution microscope operating at 100 kV might
have a 2-3 Å resolution while a high-voltage machine designed for imaging can have a resolution approaching 1 Å.
Since nanocrystalline samples range from tens to hundreds of angstroms in size, this type of microscopy allows both
the interior crystal structure and the overall particle shape to be measured. 

A single TEM picture of a nanocrystalline sample can provide an enormous amount of information (figure
C2.17.1). Low-resolution TEM can also be used to determine sample size distributions and shapes (figures C2.17.2,
C2.17.4 and C2.17.5) [120]. Higher-resolution images show the discrete nature of the crystalline
interior of nanoparticles and can detect the presence of certain crystalline defects (figures C2.17.1 and
C2.17.6); the Fourier transform of such images (figure C2.17.6, left panel) provides a measure of the lattice
spacing and crystallographic parameters. Direct electron diffraction data can also be collected on fields of particles
to verify the phase of the nanocrystal (figure C2.17.7).

Figure C2.17.4. Transmission electron micrograph of a field of ZrO2 (tetragonal) nanocrystals. Lower-resolution
electron microscopy is useful for characterizing the size distribution of a collection of nanocrystals. This image is
an example of a typical particle field used for sizing purposes. Here, the nanocrystalline zirconia has an average
diameter of 3.6 nm with a polydispersity of only 5%.
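
As a rough illustration of how a particle field of this kind might be reduced to a mean diameter and a polydispersity, the sketch below assumes that polydispersity means the sample standard deviation of the measured diameters divided by their mean. This is a common convention, but it is an assumption here rather than a definition taken from the text. The function name `size_statistics` and the list of diameters are hypothetical.

```python
from statistics import mean, stdev


def size_statistics(diameters_nm):
    """Return (mean diameter, polydispersity) for measured particle diameters.

    Polydispersity is taken as sample standard deviation / mean,
    which is an assumed definition, not one stated in the text.
    """
    if len(diameters_nm) < 2:
        raise ValueError("need at least two measured particles")
    d_mean = mean(diameters_nm)
    return d_mean, stdev(diameters_nm) / d_mean


# Hypothetical diameters (nm) read off a TEM image; not data from the text.
measured = [3.4, 3.6, 3.7, 3.5, 3.8, 3.6, 3.9, 3.3, 3.6, 3.6]
d_mean, polydispersity = size_statistics(measured)
print(f"mean diameter = {d_mean:.2f} nm, polydispersity = {polydispersity:.1%}")
```

In practice several hundred particles would normally be measured before such a figure is quoted; ten values are used here only to keep the example short.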



Figure C2.17.5. Transmission electron micrograph of a field of anisotropic gold nanocrystals. In this example, a
lower magnification image of gold nanocrystals reveals their anisotropic shapes and faceted surfaces [36].

Figure C2.17.6. Transmission electron micrograph and its Fourier transform for a TiC nanocrystal. High-resolution
images of nanocrystals can be used to identify crystal structures. In this case, the image of a nanocrystal of titanium
carbide (right) was Fourier transformed to produce the pattern on the left. From an analysis of the spot geometry
and spacing, one can determine that the nanocrystal is oriented with its [100] zone axis parallel to the viewing
direction [217].

Figure C2.17.7. Selected area electron diffraction pattern from TiC nanocrystals. Electron diffraction from fields
of nanocrystals is used to determine the crystal structure of an ensemble of nanocrystals [119]. In this case, this
information was used to evaluate the phase of titanium carbide nanocrystals [217].

More sophisticated analyses of high-resolution TEM images can provide even deeper insight into subtle structural
aspects of nanocrystals [121, 122 and 123]. Simulation of high-resolution images can, in principle, provide data
concerning whether the average bond length in a nanocrystal is uniform or variable within the nanocrystal interior
[124]. Another exciting prospect is the use of TEM to provide direct information about nanocrystal surfaces,
including reconstructions and dynamic motions of atoms at surfaces [125]. In one case high-resolution studies of a
gold nanocrystal surface led to the identification of a 2 × 1 surface reconstruction [126, 127].

Other forms of microscopy have been used to evaluate nanocrystals. Scanning electron microscopy (SEM), while
having lower resolution than TEM, is able to image nanoparticles on bulk surfaces, for direct visualization of
nanocrystals in larger assemblies (figure C2.17.3). McEuen et al [128, 129] used a field emission SEM to detect the
presence or absence of single nanoparticles in small electrodes made lithographically; Bawendi et al [91] used
SEM to determine the three-dimensional structure of arrays of CdSe clusters. Scanning tunnelling microscopy
(STM) has also been successfully applied to metal nanocrystals; in these cases STM can be used both to image
nanocrystals and to study their electrical properties [111, 130, 131, 132 and 133]. The use of scanning force
microscopies for nanocrystal characterization is not as common, due to the relatively large probe sizes (d > 50 nm)
required for force microscopy [134, 135]; however, the height resolution of force microscopy is quite good, of the
order of 0.1 Å. The height of nanocrystals on surfaces has been used as a metric for nanocrystal diameter [58, 136].
Another difficulty with force microscopy is that lateral forces are large, and the probe can move nanocrystals
unless they are tightly bound to the underlying surface. Intermittent contact-mode, or Tapping Mode™, reduces
these lateral forces and is a more appropriate choice for nanocrystal sizing.

C2.17.3.2 X-RAY DIFFRACTION 

Although microscopic methods provide a direct visualization of nanocrystal samples, the images alone can give a
misleading view of a nanocrystalline sample. Unreacted molecular species as well as small amorphous particles are
difficult to see in many microscopies, yet they can comprise a large fraction of a supposedly nanocrystalline 
sample. In such low-purity samples, x-ray diffraction (XRD) studies would show no distinct crystalline features; 
highly pure nanocrystalline samples, in contrast, provide strong crystalline reflections (figure C2.17.8). The 
positions of these peaks provide an accurate fingerprint of the crystal structure of the nanocrystal. Such an 
unambiguous determination of nanocrystal structure is especially important, as many nanocrystalline materials 
adopt metastable crystal structures distinct from the bulk solid (figure C2.17.9) [137, 138].

Figure C2.17.8. Powder x-ray diffraction (PXRD) from amorphous and nanocrystalline TiO2 nanoparticles. Powder
x-ray diffraction is an important test for nanocrystal quality. In the top panel, nanoparticles of titania provide no 
crystalline reflections. These samples, while showing some evidence of crystallinity in TEM, have a major 
amorphous component. A similar reaction, performed with a crystallizing agent at high temperature, provides well 
defined reflections which allow the anatase phase to be clearly identified.

The breadth of the peaks in an x-ray diffractogram provides a determination of the average crystallite domain size,
assuming no lattice strain or defects, through the Debye-Scherrer formula:

$$ t = \frac{0.9\,\lambda}{B \cos\theta_B} $$

where $t$ is the thickness of the crystal, $\lambda$ is the wavelength of the x-rays, $B$ is the full width at half maximum of the
diffraction peak (in radians), and $\theta_B$ is the Bragg angle of the peak [139, 140]. Figure C2.17.9 shows an example of the line
broadening observed in nanocrystalline samples of different size. It is especially valuable to compare the domain 
sizes determined from XRD with the sizing found in TEM. Good agreement between these two measurements is 
conclusive evidence that the nanocrystals are free from crystalline defects. More extensive analysis and simulations 
of the XRD patterns of nanocrystals can provide information on nanocrystal defect density and type as well as the 
presence and distribution of strain in the nanocrystal lattice [ 141 ]. 
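
The domain-size estimate can be sketched numerically as follows. This is a minimal illustration of the Debye-Scherrer relation t = 0.9 λ / (B cos θ_B), assuming the peak width is read from a 2θ scan in degrees and that instrumental broadening has already been removed. The function name `scherrer_size`, the choice of Cu Kα radiation and the peak width are illustrative assumptions, not values from the text.

```python
import math


def scherrer_size(wavelength_A, fwhm_deg, two_theta_deg, shape_factor=0.9):
    """Crystallite thickness t = K * lambda / (B * cos(theta_B)), in angstroms.

    wavelength_A  : x-ray wavelength in angstroms
    fwhm_deg      : full width at half maximum of the peak on the 2-theta axis (degrees)
    two_theta_deg : peak position 2*theta_B (degrees)
    shape_factor  : the constant K, 0.9 in the formula quoted in the text
    """
    if fwhm_deg <= 0:
        raise ValueError("peak width must be positive")
    B = math.radians(fwhm_deg)                   # the width must be in radians
    theta_B = math.radians(two_theta_deg / 2.0)  # Bragg angle is half of 2-theta
    return shape_factor * wavelength_A / (B * math.cos(theta_B))


# Assumed example: Cu K-alpha radiation (1.5406 Å) and a 1.0 degree wide peak
# near 2-theta = 25.3 degrees, roughly where the anatase (101) reflection lies.
t = scherrer_size(1.5406, 1.0, 25.3)
print(f"estimated domain size: {t:.0f} Å")
```

Halving the peak width doubles the estimated size, which is the trend shown by the line broadening in figure C2.17.9.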



Figure C2.17.9. Size-dependent changes in PXRD linewidths. PXRD can be used to evaluate the average size of a 
sample. In these cases, different samples of nanocrystalline titania were analysed for their grain size using the 
Debye-Scherrer formula. As the domain size increases, the widths of the diffraction peaks decrease.

C2.17.3.3 SURFACE CHARACTERIZATION 

The characterization of nanocrystal surfaces is a crucial issue, as surface chemistry and defects can dominate
nanocrystal properties. A common question concerning nanocrystal surfaces is the nature of the bonding at the
crystal-organic interface. X-ray photoemission spectroscopy (XPS) is a useful method not only for identifying the
atomic composition of a sample, but also for determining the oxidation states of the nanocrystal atoms [84, 142]. In
CdSe nanocrystals, for example, XPS indicates the presence of both selenium bound to cadmium and oxidized
selenium at the surface [84]; similar XPS studies of gold nanocrystals have indicated that, even when passivated by
thiols, samples only contain gold in the zero-valent state. Information concerning the dynamics and also coverage
of the organic groups at the nanocrystal surface can be found using solution state NMR [58, 143, 144]. The size of
nanocrystals reduces the rotational averaging in the liquid phase; however, proton signals from capping groups can
be detected in many systems. Vibrational spectroscopies have also been applied to the problem of organic group
geometry and coverage, especially in metal systems [52, 59]. Also, extended x-ray absorption fine structure (EXAFS)
and other x-ray absorption spectroscopies have been applied to nanocrystal surface characterization [145, 146]. These
methods are particularly powerful, as they can, in principle, measure the spatial distribution of bond lengths within
a nanocrystal.


C2.17.4 OPTICAL PROPERTIES OF NANOCRYSTALS 


The striking size-dependent colours of many nanocrystal samples are one of their most compelling features; 
detailed studies of their optical properties have been among the most active research areas in nanocrystal science. 
Evidently, the optical properties of bulk materials are substantially different from those of isolated atoms of the
same material. In principle, one can describe the optical properties of a nanometre-sized object as an intermediate
between these two limiting cases, with a continuous evolution from atomic to bulk properties with increasing 
particle size. Practically speaking, the most common approaches have used the bulk optical behaviour as a starting 
point for predicting spectra; successive modifications are made to these descriptions in order to account for a 
variety of size-dependent effects. Schemes of this nature are necessarily limited, both because the size-dependent 
modifications are usually treated in a perturbative or approximate fashion, and because the modified theories do 
not, in general, converge to the atomic limit as the size of the nanoparticle decreases. Nonetheless, such approaches 
have enjoyed a great deal of success in predicting the optical properties of a wide variety of nanoparticles. 

This section will outline the simplest models for the spectra of both metal and semiconductor nanocrystals. The 
work described here has illustrated that, in order to achieve quantitative agreement between theory and experiment, 
a more detailed view of the molecular character of clusters must be incorporated. The nature and bonding of the 
surface, in particular, is often of crucial importance in modelling nanocrystal optical properties. While this section 
addresses the linear optical properties of nanocrystals, both nonlinear optical properties and the photophysics of 
these systems are also of great interest. The reader is referred to the many excellent review articles for more
in-depth discussions of these and other aspects of nanocrystal optical properties [147, 148, 149, 150, 151, 152, 153
and 154].

C2.17.4.1 THE OPTICAL PROPERTIES OF SEMICONDUCTOR NANOCRYSTALS

One of the most striking features of semiconductor nanocrystal samples is their vivid colours, which vary with the 
mean particle size. These changes in the optical properties are not due to structural changes in the material; rather 
they are a reflection of the fact that the nature of the excitations which determine the colour of these solids is 
perturbed by the change in the size of the system. In many bulk semiconductors, the characteristic size of the 
lowest-lying optical excitations is much larger than the size of a unit cell [155]. As the system shrinks, this
excitation is confined to a region which is smaller than its natural size, and the associated electronic energy levels 
shift as a result. Figure C2.17.10 shows typical absorption spectra of spherical CdSe semiconductor nanocrystals, 
for several different mean particle sizes. Evidently, the lowest-lying excitation shifts continuously to higher energy 
as the size of the particle decreases. Indeed, it is possible to follow the evolution of this state continuously from 
very large (nearly bulk-like) crystallites down to only ~10 Å radius. In bulk semiconductors, this lowest energy
absorption feature corresponds to an excited electron and hole which are bound via the Coulomb interaction,
known as an exciton. Thus, the corresponding excitation in the nanocrystals is generally assumed to be of the same
character, although perturbed by the confinement. In CdSe, for example, the bulk exciton is 57 Å in radius [155];
thus, when CdSe nanocrystals shrink to or below this characteristic length scale, their optical properties show an
exquisite sensitivity to nanocrystal size (figure C2.17.10). This basic blue shift in the absorption energy of the
excitonic transition in smaller clusters has been observed in many different semiconductor nanocrystals [72, 156, 157].

Figure C2.17.10. Optical absorption spectra of nanocrystalline CdSe. The spectra of several different samples in
the visible and near-UV are measured at low temperature, to minimize the effects of line broadening from lattice 
vibrations. In these samples, grown as described in [84], the lowest exciton state shifts dramatically to higher 
energy with decreasing particle size. Higher-lying exciton states are also visible in several of these spectra. For 
reference, the band gap of bulk CdSe is 1.85 eV. 

An explanation for these size-dependent optical properties, termed 'quantum confinement', was first outlined by
Brus and co-workers in the early 1980s [156, 158, 159, 160 and 161] and has formed the basis for nearly all
subsequent discussions of these systems. Though recent work has modified and elaborated on this simple model, its
basic predictions are surprisingly accurate. The energy of the lowest-lying exciton state is given by the following
simple formula:

$$ E(R) = E_g + \frac{\hbar^2 \pi^2}{2 \mu R^2} - \frac{1.8\, e^2}{\varepsilon R} $$

Here, $E_g$ and $\varepsilon$ are the band gap energy and the dielectric constant of the bulk semiconductor, and $\mu$ is the reduced
mass of the exciton system, $1/\mu = 1/m_e + 1/m_h$. The second term, proportional to $1/R^2$, arises from a simple
quantum confinement effect, just as in the well known particle-in-a-box problem of undergraduate quantum
mechanics. An additional adjustment to the energy arises from a Coulombic effect. Because the electron and hole
are forced to occupy a small volume, the wavefunctions of these two oppositely charged particles overlap spatially
to a greater degree than in the bulk crystal. The third term, proportional to $1/R$, accounts for this enhanced overlap.
Figure C2.17.11 illustrates this result for several different semiconductor materials.

Figure C2.17.11. Exciton energy as a function of particle size. The Brus formula is used to calculate the energy
shift of the exciton state as a function of nanocrystal radius, for several different direct-gap semiconductors. These 
estimates demonstrate the size below which quantum confinement effects become significant. 
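
A calculation of this kind can be sketched in a few lines. The example assumes the usual effective-mass form of the Brus formula, E(R) = E_g + ħ²π²/(2μR²) − 1.8e²/(εR), with the Coulomb term written in SI form (e²/4πε₀ = 14.40 eV Å). The function name `brus_energy_eV` is hypothetical. The CdSe effective masses and dielectric constant are rough illustrative assumptions; only the 1.85 eV bulk gap is quoted in this chapter.

```python
import math

HBAR2_OVER_2ME = 3.80998  # hbar^2 / (2 * free-electron mass), in eV * Å^2
E2_COULOMB = 14.3996      # e^2 / (4 * pi * eps0), in eV * Å


def brus_energy_eV(radius_A, gap_eV, m_e, m_h, eps):
    """Lowest exciton energy (eV) for a nanocrystal of the given radius (Å).

    m_e and m_h are the electron and hole effective masses in units of the
    free-electron mass; eps is the bulk dielectric constant.
    """
    if radius_A <= 0:
        raise ValueError("radius must be positive")
    mu = 1.0 / (1.0 / m_e + 1.0 / m_h)  # reduced mass of the exciton
    confinement = HBAR2_OVER_2ME * math.pi ** 2 / (mu * radius_A ** 2)  # ~ 1/R^2
    coulomb = 1.8 * E2_COULOMB / (eps * radius_A)                       # ~ 1/R
    return gap_eV + confinement - coulomb


# Assumed, illustrative CdSe-like parameters (not taken from the text,
# apart from the 1.85 eV bulk band gap).
cdse = dict(gap_eV=1.85, m_e=0.13, m_h=0.45, eps=10.0)
for radius in (10.0, 20.0, 40.0, 80.0):
    print(f"R = {radius:5.1f} Å  ->  E = {brus_energy_eV(radius, **cdse):.2f} eV")
```

Because the confinement term grows as 1/R² while the Coulomb correction grows only as 1/R, the computed energy rises steeply at small radii, which is the regime where the simple model is least reliable.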


Figure C2.17.12 depicts a comparison between experimentally determined exciton energies and those predicted by
the Brus model, for CdSe nanocrystals. Given the level of approximation, the agreement is surprisingly good.
Nonetheless, the simple theory clearly overestimates the energies for the smallest crystallites. Recent work, both
experimental and theoretical, has shown that the main deficiency of the quantum confinement model is that it fails
to include molecular-level detail. For example, the Brus model assumes that the confining potential is spherically
symmetric and infinitely high. High-resolution electron microscopy studies [162] as well as theoretical calculations
[163] have suggested that the lattice structure and facets of nanocrystals lower the particle symmetry, leading to
intrinsic shifts in the confinement energies. Dielectric spectroscopy [164], as well as Stark absorption spectroscopy
[165, 166], have demonstrated that the ground state of the CdSe nanocrystals has a large static dipole moment. These
measurements support the notion that the departure from perfect spherical symmetry can strongly influence the
energies. Interactions between internal and surface states have also been shown to strongly influence the dynamics
of the excited state, on time scales ranging from femtoseconds [167, 168] to nanoseconds [169, 170].

Figure C2.17.12. Exciton energy shift with particle size. The lowest exciton energy is measured by optical
absorption for a number of different CdSe nanocrystal samples, and plotted against the mean nanocrystal radius. 
The mean particle radii have been determined using either small-angle x-ray scattering (open circles) or TEM 
(squares). The solid curve is the predicted exciton energy from the Brus formula. 

While the Brus formula can be used to locate the spectral position of the excitonic state, there is no equivalent a
priori description of the spectral width of this state. These bandwidths have been attributed to a combination of
effects, including inhomogeneous broadening arising from size dispersion, optical dephasing from exciton-surface
and exciton-phonon scattering, and short lifetimes resulting from surface localization [167, 168, 170, 171]. Due to
the complex nature of these line shapes, there have been few quantitative calculations of absorption spectra. This 
situation is in contrast with that of metal nanoparticles, where a more quantitative level of prediction is possible. 

C2.17.4.2 THE OPTICAL PROPERTIES OF METAL NANOCRYSTALS 

Like semiconductor nanocrystals, solutions of metal nanocrystals exhibit striking size-dependent colours, a fact 
which has fascinated scientists and artists alike for centuries. Gold colloids, in particular, have been used since the 
Middle Ages as colouring pigments for paints and especially stained glass. Faraday was the first to recognize their 
metallic character in 1857, and he remarked on their vivid colours. This colour arises from a sharply peaked 
resonance in the visible region of the spectrum, which occurs at much lower energy than in the bulk metal. Both the 
spectral position and width of this resonance are observed to vary with the size of the metal nanoparticle, 
suggesting, as in the semiconductor nanocrystals, that a fundamental excitation of the system is influenced by the 
restricted size. 


The optical properties of metal nanoparticles have traditionally relied on Mie theory, a purely classical 
electromagnetic scattering theory for particles with known dielectrics [ 172 ]. For particles whose size is comparable 
to or larger than the wavelength of the incident radiation, this calculation is rather cumbersome. However, if the 
scatterers are smaller than -10% of the wavelength, as in nearly all nanocrystals, the lowest-order term of Mie 
theory is sufficient to describe the absorption and scattering of radiation. In this limit, the absorption is determined 
solely by the frequency-dependent dielectric function of the metal particles and the dielectric of the background 
matrix in which they are 
embedded. So, the size dependence of the optical properties enters only through the size-dependent dielectric 
function of the nanoparticles [ 172 ]. 
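
For reference, the lowest-order (dipole) term mentioned above is commonly written in the form below. This is the standard quasi-static expression rather than one quoted from the text, and the symbols are introduced here for illustration only: $V$ is the particle volume, $\lambda$ the vacuum wavelength, $\varepsilon_m$ the dielectric constant of the background matrix and $\varepsilon = \varepsilon_1 + i\varepsilon_2$ the dielectric function of the metal.

```latex
\sigma_{\mathrm{abs}}(\omega) =
  \frac{18\pi V \varepsilon_m^{3/2}}{\lambda}\,
  \frac{\varepsilon_2(\omega)}
       {\left[\varepsilon_1(\omega) + 2\varepsilon_m\right]^2 + \varepsilon_2(\omega)^2}
```

In this form the particle size appears only as the prefactor $V$ and through any size dependence of $\varepsilon_1$ and $\varepsilon_2$, and the resonance lies near $\varepsilon_1(\omega) = -2\varepsilon_m$.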

In the optical range, the dielectric function of a metal generally can be divided into two distinct contributions, 
arising from the intraband and the interband coupling. The former is generally described using a Drude formalism, 
whereas the latter is often described empirically, and differs dramatically for different metals. In nanometre-sized 
metals, the interband transitions, as in for example the 5d-to-conduction band transitions in gold, are generally 
assumed to be independent of the size of the crystallite. In contrast, the inelastic scattering time, which appears as a 
phenomenological parameter in the Drude model, is assumed to have a strong contribution from surface scattering 
[ 173 ]. A modified scattering rate of the form $\Gamma = \Gamma_0 + A v_F / R$ is often used [ 172 , 174 ]. Here $\Gamma_0$ is the scattering rate 
in the bulk metal, $v_F$ is the Fermi velocity, $R$ is the particle radius and $A$ is a constant of order unity. This form expresses the limitations on 
the mean free path of the free electrons as a result of the confining geometry. 
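
The size-dependent damping can be illustrated with a minimal Python sketch. The numerical values used here (a bulk damping of 0.07 eV, a Fermi velocity of 1.40 x 10^6 m/s, a plasma energy of 9.0 eV and A = 1) are representative assumptions for gold rather than values taken from the text, and the function names are hypothetical. Only the free-electron (Drude) part of the dielectric function is modelled; the interband contribution is omitted.

```python
HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV s


def damping_ev(radius_nm, gamma_bulk_ev=0.07, v_fermi_m_s=1.40e6, a_const=1.0):
    """Return hbar*Gamma(R) = hbar*Gamma_0 + A*hbar*v_F/R in eV.

    Default parameter values are assumed, representative numbers for gold.
    """
    if radius_nm <= 0:
        raise ValueError("radius must be positive")
    radius_m = radius_nm * 1e-9
    return gamma_bulk_ev + a_const * HBAR_EV_S * v_fermi_m_s / radius_m


def drude_epsilon(energy_ev, gamma_ev, plasma_ev=9.0):
    """Drude dielectric function 1 - wp^2 / (w^2 + i*w*Gamma), energies in eV."""
    return 1.0 - plasma_ev**2 / complex(energy_ev**2, energy_ev * gamma_ev)


if __name__ == "__main__":
    for radius in (3.0, 2.0, 1.0):
        gamma = damping_ev(radius)
        eps = drude_epsilon(2.0, gamma)
        print(f"R = {radius} nm: damping = {gamma:.3f} eV, Im(eps) at 2 eV = {eps.imag:.3f}")
```

With these assumed parameters the surface term dominates the damping for radii of a few nanometres, which is consistent with the strong size dependence of the peak width described in the text.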

Figure C2.17.13 presents a model calculation of the absorption of gold nanocrystals, using the formalism outlined 
above. The qualitative result is that, as metal colloids become smaller, the primary absorption peak shifts to lower 
energy, and broadens significantly. The peak shifts predicted are small, of the order of 0.1 eV for a 2 nm gold 
crystallite. In contrast, the peak widths are far more sensitive to size. This simple theory, and its variations, have 
been successful at explaining many experimental observations, especially for clusters greater than 3 nm in size 
[ 154 ]. This success is not surprising, given that the fundamental assumption has been the insensitivity of the 
interband transitions to the size of the crystallite. Unlike in the case of the semiconductor nanocrystals, where the 
transition involved highly delocalized exciton states, these band-to-band transitions involve the localized d-shell 
electrons of the metal. As a result, a clear quantum confinement effect has generally not been expected in metal 
nanocrystals. Recently, extremely detailed analyses of the optical spectra of highly monodisperse gold colloids 
suggest that some features of these spectra could be attributed to quantum confinement, in the form of a size- 
dependent interband dielectric function [ 111 , 112 ]. Further work on the nonlinear optical properties of these 
materials may permit a more precise quantification of these effects. 



[Figure C2.17.13: horizontal axis, energy (eV).]

Figure C2.17.13. A model calculation of the optical absorption of gold nanocrystals. The formalism outlined in the 
text is used to calculate the absorption cross section of bulk gold (solid curve) and of gold nanoparticles of 3 nm 
(long dashes), 2 nm (short dashes) and 1 nm (dots) radius. The bulk dielectric properties are obtained from a cubic 
spline fit to the data of [ 237 ]. The small blue shift and substantial broadening which result from the mean free path 
limitation are clearly evident. 

Some of the most interesting recent work on the optical properties of nanocrystals involves the study of single 
nanocrystals rather than ensembles, using near-field optical techniques. Addressing individual nanoparticles with 
these relatively new methods can potentially eliminate from 
consideration the effects of sample inhomogeneity and, in addition, can give rise to new and interesting optical 
effects. Studies of both metal [ 175 ] and semiconductor [ 176 , 177 ] nanocrystals have been reported. 


C2.17.5 THERMODYNAMIC PROPERTIES OF NANOCRYSTALS 

In order to use any material for commercial purposes, it is important to understand its phase behaviour. Bulk gold, 
for example, melts at 1065°C and thus would be an unwise choice as an electrical interconnect in a high- 
temperature environment; many ceramics can crack during thermal processing due to solid-solid phase 
transformations that lower the volume of the crystal [ 178 ]. The extrapolation of such bulk phase behaviour to the 
properties of nanocrystals is not a straightforward problem. Nanocrystals are intrinsically metastable materials 
which, given the right circumstances, would fuse to create bulk crystals. Indeed, metal nanocrystals prepared on 
surfaces under high-vacuum conditions do spontaneously fuse into larger grains [ 179 , 180 ]. On the other hand, 
solutions of nanocrystals stabilized with organic agents can exist for months or even years with unchanging sizes. 
Evidently, the metastability of nanocrystals is a sensitive function of their surface bonding, but the nanocrystal 
surface affects the crystallite in two distinct ways. First, surface atoms can make up 5^0% of the mass of a 
nanocrystal, and thus contribute significantly to the overall thermodynamic properties of the material. Second, the 
nanocrystal surface chemistry can raise the activation barriers for many thermodynamically favoured processes. 
Thus, it is important to consider the surface in both the kinetic and thermodynamic treatments of phase behaviour. 

This section will describe the current status of research in two different aspects of nanocrystal phase behaviour: 
melting and solid-solid phase transitions. In the case of melting, thermodynamic considerations of surface energies 
can explain the reduced melting point observed in many nanocrystals. Strictly thermodynamic models, however, 
are not adequate to describe solid-solid phase transitions in these materials. 

C2.17.5.1 NANOCRYSTALS AT HIGH TEMPERATURES 

In a classic study, Buffat et al used electron microscopy at high temperature to measure the melting point of gold 
nanocrystals as a function of their size [ 181 ]. They observed that smaller nanocrystals have lower melting 
temperatures; the melting-point depression varies roughly as $1/R$. Thus, a gold nanocrystal of 5 nm diameter, 
for example, melts at 885°C, 180°C lower than bulk gold. Since this original study, this behaviour has been 
observed in both metal and semiconductor nanocrystals [ 182 , 183 , 184 , 185, 186 , 187 , 188 , 189 and 190 ], using 
techniques ranging from electron microscopy [ 191 , 192 and 193 ] to nanocalorimetry [ 194 ]. 
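
The inverse-size scaling can be made concrete with a short Python sketch. It assumes a depression that is exactly proportional to the inverse diameter and calibrates the single constant from the figures quoted above (bulk gold at 1065°C, a 180°C depression at 5 nm diameter). This is an illustrative extrapolation only, not the model fitted by Buffat et al, and the function name is hypothetical.

```python
def melting_point_c(diameter_nm, t_bulk_c=1065.0, k_c_nm=900.0):
    """Estimate the melting point (deg C) from T_m(d) = T_bulk - K / d.

    K = 900 deg C nm is an assumed constant, chosen so that a 5 nm diameter
    particle melts 180 deg C below the bulk value quoted in the text.
    """
    if diameter_nm <= 0:
        raise ValueError("diameter must be positive")
    return t_bulk_c - k_c_nm / diameter_nm


if __name__ == "__main__":
    for d in (20.0, 10.0, 5.0):
        print(f"d = {d} nm: T_m = {melting_point_c(d):.0f} deg C")
```

Under this assumption a 10 nm particle would melt at 975°C. The simple scaling should not be trusted for the smallest clusters, where the more elaborate theories cited in the text become necessary.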

The simplest approach to understanding the reduced melting point in nanocrystals relies on a simple 
thermodynamic model which considers the volume and surface as separate components. Whether solid or melted, a 
nanocrystal surface contains atoms which are not bound to interior atoms. This raises the net free energy of the 
system because of the positive surface free energy, but the energetic cost of the surface is higher for a solid cluster 
than for a liquid cluster. Thus the free-energy difference between the two phases of a nanocrystal becomes smaller 
as the cluster size 
decreases and a reduction in the melting point is observed [ 181 , 187 , 193 ]. A variety of more elaborate theories 
which treat the nanocrystal surface structure, as well as the exact treatment of lattice strain, can provide quantitative 
agreement with measurements [ 195 , 196 and 197 ]. 

While the variation in the melting temperature of nanocrystals can be explained using classic thermodynamic 
arguments, the process by which nanocrystals, or even bulk solids, undergo melting is an active area of research 
[ 198 , 199 ]. Nanocrystals offer ideal systems with which to explore the role of kinetics in melting. Nanocrystals 
embedded in solid media have been observed to superheat, existing as solids well above their melting point [ 200 , 
201 ]. The extent of the superheating and its dependence on heating rate are sensitive to the nanocrystal size. One 
explanation is that long-range density fluctuations are responsible for inducing melting in solids [ 202 ]; in 
nanocrystals these modes may be restricted. Another important issue in the mechanism of bulk melting is the role 
of surfaces and defects [192, 203, 204 ]. Several studies suggest that surface melting occurs at temperatures below 
the bulk melting point. This phenomenon would have important consequences for the thermal stability of 
nanocrystals as it could lead to shape changes in isolated nanocrystals and fusion in tightly packed nanocrystal 
arrays [179, 180, 205, 206, 207, 208 and 209]. 

Melting is only one of many processes that nanocrystals can undergo when they are heated. Temperature-induced 
phase transitions are equally important in nanocrystals, especially in covalent materials such as oxides [ 210 ]. 
Unlike melting and the solid-solid phase transitions discussed in the next section, these phase changes are not 
reversible processes: they occur because the crystal structure of the nanocrystal is metastable. For example, titania 
made in the nanophase always adopts the anatase structure. At higher temperatures the material spontaneously 
transforms to the rutile bulk stable phase [ 211 , 212 and 213 ]. The role of grain size in these metastable-stable 
transitions is not well established; the issue is complicated by the fact that the transition is accompanied by grain 
growth which clouds the interpretation of size-dependent data [ 214 , 215 and 216 ]. In situ TEM studies, however, 
indicate that the surface chemistry of the nanocrystals plays a crucial role in determining the transition temperatures [ 217 , 218 ]. 

C2.17.5.2 NANOCRYSTALS AT HIGH PRESSURE 

The ability to control pressure in the laboratory environment is a powerful tool for investigating phase changes in 
materials. At high pressure, many solids will transform to denser crystal structures. The study of nanocrystals under 
high pressure, then, allows one to investigate the size dependence of the solid-solid phase transition pressures. 
Results from studies of both CdSe [ 219 , 220 , 221 and 222 ] and silicon nanocrystals [ 223 ] indicate that solid-solid 
phase transition pressures are elevated in smaller nanocrystals. 

The observation of elevated transition pressures in nanocrystals may at first appear to contradict the observation 
that the melting temperatures of these same systems are reduced. This is not a contradiction, though, because the 
important variable in both phase changes is the difference in the free energies of the two states. If the high-pressure 
crystal structure (rock-salt in the case of the II-VI nanocrystals) has a higher surface free energy than the low-pressure 
phase (wurtzite for II-VI nanocrystals), then the transition pressure will be elevated in small clusters for 
strictly thermodynamic reasons. This explanation is consistent with the elevated phase transition 
pressures observed in smaller nanocrystals, and was the first model proposed to describe this high-pressure 
behaviour [ 219 ]. 
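
The thermodynamic argument can be sketched to first order as follows. This is an illustrative derivation under simplifying assumptions, not the model of [ 219 ]: the free energy of each phase is taken as a bulk term plus a surface term, and the symbols are introduced here for the purpose ($N$ atoms, bulk chemical potential $\mu_i$ and volume $v_i$ per atom, surface free energy $\gamma_i$, surface area $A_i$, bulk transition pressure $P_0$).

```latex
G_i(P) = N\mu_i(P) + \gamma_i A_i ,
  \qquad i = 1\ (\text{low-pressure phase}),\ 2\ (\text{high-pressure phase})

\mu_2(P) - \mu_1(P) \approx -(v_1 - v_2)\,(P - P_0) ,
  \qquad v_1 > v_2

G_1(P_t) = G_2(P_t)
  \;\Longrightarrow\;
  P_t - P_0 \approx \frac{\gamma_2 A_2 - \gamma_1 A_1}{N\,(v_1 - v_2)}
```

Since $A_i/N$ scales as $1/R$, a higher surface free energy in the high-pressure phase raises the transition pressure by an amount that grows as the nanocrystal shrinks. The same bookkeeping, applied to a liquid with a lower surface free energy than the solid, lowers the melting temperature.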

More recently, studies of the hysteresis of these phase transitions have illuminated the importance of kinetic factors 
in solid-solid phase transitions [ 224 ]. The change between crystal structures does not occur at the same pressure when 
the pressure is increasing as when it is decreasing; the difference between these 'up-stroke' and 'down-stroke' pressures 
provides a measure of the activation energy required for the transition. These experiments have been performed 
as a function of both cluster size and temperature for II-VI nanocrystals [ 224 ]. As the nanocrystal size 
increases, a larger number of atoms in the crystallite interior must move during the phase transition, so the kinetic 
barrier increases. However, in sufficiently large nanocrystals, there is a larger volume available and defect- 
nucleation is a viable pathway. This lowers the kinetic barrier as the bulk limit is approached. For more complete 
reviews of studies of pressure-dependent properties in nanocrystals see [ 225 , 226 ]. 


C2.17.6 CONCLUSIONS 

This review has covered many of the essential features of the physical chemistry of nanocrystals. Rather than 
provide a detailed description of the latest results concerning this broad class of materials, we 
have instead outlined the fundamental concepts which serve as departure points for the most recent research. This 
necessarily limited us to a discussion of topics that have a long history in the community, leaving out some of the 
new and emerging areas, most notably nonlinear optical studies [ 152 ] and magnetic nanocrystals [ 227 ]. Also, the 
study of the electrical transport behaviour of nanocrystals and nanocrystal assemblies is an area of growing interest. 
Both single-electron transport and collective transport through organized assemblies [ 128 , 228 , 229 and 230 ] 
promise to be of great scientific as well as technological importance. We also did not discuss the 
many possible applications for nanocrystals. These little solids offer many unique and tunable features which 
promise to make them important materials in areas as diverse as microelectronics [64, 231, 232, 233 and 234 ] and 
biotechnology [ 107 , 235]. Finally, we note that a search of one particular on-line scientific abstract database, using 
the keyword 'nanocrystal', demonstrates a monotonic increase in the number of publications per year, each year 
since 1993. This highly unscientific measure is nonetheless quite satisfying, as it testifies to the health and vitality 
of this exciting field of chemical physics. 

C2.18 Etching and deposition 

HP Gillis 


C2.18.1 INTRODUCTION 

Numerous technological processes involve removal of material from a solid surface by etching, or addition of new 
material to a solid surface (in this case called the substrate) by deposition. Both classes of reactions have long been 
studied in liquid solution [1]. Since about 1980 process demands of the microelectronics industry have stimulated 
development of both etching and deposition processes involving solid surfaces and gas phase reactants [2]. The 
general field of gas-surface reactions can be unified by classification into four groups, according to whether the 
products are volatile or involatile, and whether the reaction products incorporate atoms from the surface (see table 
C2.18.1). These reaction groups all share common fundamental concepts, and can be investigated by common 
techniques developed during the very active period of modern surface science since about 1975 [3]. These concepts 
and techniques have been introduced at several earlier points in this book. Regarding applications, the discussions of 
catalysis and corrosion in chapter C2.7 and chapter C2.8, respectively, complement the present chapter. 


Table C2.18.1. Classification of gas-solid reactions. 

                            Volatile products    Involatile products 
Solid atoms incorporated    Etching              Corrosion 
No atoms from solid         Catalysis            Deposition 
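
The two-way classification in table C2.18.1 can be written as a simple lookup. The following Python sketch is purely illustrative; the function and argument names are hypothetical.

```python
def classify_gas_solid_reaction(volatile_products, incorporates_solid_atoms):
    """Return the reaction class of table C2.18.1 for the two yes/no criteria."""
    table = {
        (True, True): "etching",       # volatile products containing solid atoms
        (False, True): "corrosion",    # involatile products containing solid atoms
        (True, False): "catalysis",    # volatile products, no atoms from the solid
        (False, False): "deposition",  # involatile products, no atoms from the solid
    }
    return table[(bool(volatile_products), bool(incorporates_solid_atoms))]


if __name__ == "__main__":
    print(classify_gas_solid_reaction(True, True))
    print(classify_gas_solid_reaction(False, False))
```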


Because surface chemical reactions occur at a localized geometrical interface between phases, transport processes 
are strongly coupled with these reactions. The overall reaction can be decomposed into a series of steps — 
attachment or 'sticking' of reactants to the surface; diffusion of reactants on the surface; formation of reaction 
product; disposition of reaction product — any one of which can be rate-limiting for the overall reaction. 
Fundamental progress in understanding etching and deposition reactions relies upon isolating one of these steps for 
investigation in simplified model circumstances, either theoretical or experimental. A comprehensive review 
published in 1994 summarizes such fundamental results from reactive molecular beam studies of both etching and 
deposition reactions. That review is highly recommended as an introduction to the field, since it also describes the 
fundamental concepts in adsorption dynamics and chemisorption, as well as the relevant experimental techniques 
[4]. 
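
The idea of a rate-limiting step can be illustrated with a toy calculation. The sketch below assumes, as a deliberate simplification, that the four steps are irreversible, strictly sequential and first order, so that the mean time per product is the sum of the individual step times. The rate constants are hypothetical numbers chosen only to show the behaviour.

```python
def effective_rate(rate_constants):
    """Overall rate for sequential first-order steps: 1 / sum(1 / k_i)."""
    return 1.0 / sum(1.0 / k for k in rate_constants.values())


def rate_limiting_step(rate_constants):
    """Name of the step with the smallest rate constant."""
    return min(rate_constants, key=rate_constants.get)


if __name__ == "__main__":
    # Hypothetical rate constants in s^-1, for illustration only.
    steps = {
        "sticking": 1.0e3,
        "surface diffusion": 1.0e2,
        "product formation": 1.0,
        "product disposition": 1.0e4,
    }
    print(rate_limiting_step(steps), effective_rate(steps))
```

When one rate constant is much smaller than the others, the overall rate is almost equal to it, which is why isolating and measuring that single step is so informative.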

In practical applications, gas-surface etching reactions are carried out in plasma reactors over the approximate 
pressure range 10—1 Torr, and deposition reactions are carried out by molecular beam epitaxy (MBE) in ultrahigh 
vacuum (UHV: below 10 Torr) or by chemical vapour deposition (CVD) in the approximate range 10—10 Torr. 
These applied processes can be quite complex, and key individual reaction rate constants are needed as input for 
modelling and simulation studies — and ultimately for optimization — of the overall processes. 

The objective of this chapter is to provide an introduction to etching and deposition for chemical physicists and 
physical chemists so they can select key fundamental questions in these complex processes for experimental or 
theoretical studies in much simpler model configurations. Since modern etching and deposition methods have 
developed mainly empirically, they are closely tied to and named after the experimental techniques used. We 
provide a brief introduction to this terminology, as well as a brief guide to the process literature, including review 
articles and advanced references. The bulk of this chapter is devoted to a discussion of selected themes and 
examples in etching and deposition where fundamental studies have already made advances. The goal of this 
discussion is to illustrate opportunities and approaches to these problems, not to provide a critical review of the 
present state of these fields. 


C2.18.2 INDUSTRIAL IMPORTANCE OF ETCHING AND DEPOSITION 

Etching (e.g., for decorating glass objects, for removing oxides and other impurities from the surfaces of metals) 
and deposition (e.g., for coating and passivating metallic surfaces) have a long and distinguished history in the 
chemical processing industries. The present discussion is limited to gas-surface reactions important in the 
microelectronics and optoelectronics industries. 

C2.18.2.1 DRY ETCHING FOR PATTERN DEFINITION IN MICROELECTRONICS 

The purpose of etching is to transfer features from a device design mask to an underlying film of device material, 
replicating accurately the cross-sectional profile of each feature. Various possible results are schematically 
illustrated in figure C2.18.1 where the mask is still in place after etching to show the connection between etch 
profile and the mask dimensions. In the earliest days of integrated-circuit fabrication, features 20-50 µm wide 
were transferred by wet etching into films about 1 µm thick, producing rounded, undercut or isotropic etching, 
illustrated in region A in the figure. As new design rules to increase the speed of devices and to pack devices more 
densely on the wafer required lateral dimensions below 1 µm, the errors due to undercut became intolerable and the 
need arose for a new method that would achieve anisotropic etching, to define straight sidewalls (B). Various 
implementations of plasma etching described below achieve this goal under proper conditions. 

Figure C2.18.1. Schematic representation of various results of etching through a mask. The regions marked by 
letters are defined and described in the text. 

All versions of plasma etching expose the sample and mask to ion bombardment in the presence of chemically 
reactive species. The interplay between these two components determines the net result of the etching process. 
Although the mechanism is still not understood in detail, it is recognized that ion-enhanced reactions at the bottom 
of the opening in the pattern (but not along the sidewalls) of the feature being transferred are essential for 
anisotropy. If ion bombardment overwhelms the chemical component of the etching process, the mask edges will 
be eroded and the profile will be overcut (C). Excessive ion component also produces trenching (D) because ions 
reflected from the sidewalls appear at the edge of the bottom and increase the ion flux (and the etch rate) in that 
region. Ideally, etch processes will be highly selective between materials; non-selectivity is illustrated in region E. 
In regions A-E small amounts of material are removed through the mask to define a recess into the film. In region 
F, much more material is removed to define a mesa in the film. Regions B and F illustrate the ideal features from 
which advanced devices are constructed. 
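
The profiles in figure C2.18.1 are often summarized by two figures of merit. The definitions used in the Python sketch below (degree of anisotropy as one minus the ratio of lateral to vertical etch rate, and selectivity as a ratio of etch rates) are a common convention and are not defined in the text; the function names and rates are hypothetical.

```python
def anisotropy(vertical_rate, lateral_rate):
    """Degree of anisotropy: 0 for isotropic etching (region A), 1 for vertical walls (region B)."""
    if vertical_rate <= 0:
        raise ValueError("vertical etch rate must be positive")
    return 1.0 - lateral_rate / vertical_rate


def selectivity(film_rate, other_rate):
    """Etch-rate ratio of the film to another material such as the mask or underlayer."""
    if other_rate <= 0:
        raise ValueError("reference etch rate must be positive")
    return film_rate / other_rate


if __name__ == "__main__":
    # Hypothetical etch rates in nm/min, for illustration only.
    print(anisotropy(vertical_rate=100.0, lateral_rate=100.0))
    print(anisotropy(vertical_rate=100.0, lateral_rate=0.0))
    print(selectivity(film_rate=100.0, other_rate=5.0))
```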

Depending on conditions of ion energy and flux, temperature and chemical environment these features may receive 
ion bombardment damage while being etched. The term 'etch damage' includes effects such as creation of lattice 
defects or interstitials due to momentum transfer from the ions, 'knock-on' of various impurities into the substrate 
and electrical breakdown from non-uniform charge accumulations, all of which compromise the optical and 
electrical properties of the etched surface. Characterization of microstructural changes that constitute damage and 
development of new etching methods to reduce or eliminate damage, are major themes in present-day dry etching 
research. These newer methods of plasma etching are described below. 

The most common form of anisotropic plasma etching, called reactive ion etching (RIE), is achieved by an AC 
glow discharge (usually at 13.56 MHz) between two metal electrodes in a reactive feed gas (figure C2.18.2(a)). 
RIE delivers ions to the sample with energy ~300 eV and therefore inflicts substantial damage, which must be 
removed in subsequent process steps. In electron cyclotron resonance (ECR) etching (figure C2.18.2(b)), 
microwave power is coupled into the cylindrical source chamber by an antenna and one or more electromagnet 
coils around the source generate the cyclotron resonance frequency. Coupling the excitation energy into the reactor 
by various types of inductive coil leads to inductively coupled plasma (ICP) etching, for which the electromagnets 
are not necessary. Both ECR and ICP can generate a very high-density plasma, in which arrival energy of the ions 
is controlled by independent RF electrical bias of the sample stage. Arrival energies can be controlled below ~50 
eV, which is still sufficient to cause damage in many cases. A substantial literature exists on the design and 
performance of RIE, ICP and ECR systems [5]. Anisotropic etching is also achieved by chemically assisted ion 
beam etching (CAIBE) in which independent beams of ions and reactive species are directed to the sample (figure 
C2.18.2(c)). CAIBE provides much greater independent control over the identity, energy, direction of incidence 
and flux of both the ionic and reactive species than does RIE, ICP or ECR. Since the ion beams are extracted from 
broad-area plasma sources of the Kaufman type [6], ion energy must be ~200 eV to obtain useful current and 
ion-inflicted damage is common. 
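
The resonance condition in an ECR source can be estimated from the electron cyclotron frequency, f = eB/(2 pi m_e). The Python sketch below computes the magnetic field needed for resonance at a given microwave frequency. The value of 2.45 GHz used in the example is an assumed, commonly used microwave frequency and is not taken from the text; the function name is hypothetical.

```python
import math

ELECTRON_CHARGE_C = 1.602176634e-19   # elementary charge in coulombs
ELECTRON_MASS_KG = 9.1093837015e-31   # electron rest mass in kilograms


def ecr_field_tesla(frequency_hz):
    """Magnetic field at which the electron cyclotron frequency equals frequency_hz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 2.0 * math.pi * frequency_hz * ELECTRON_MASS_KG / ELECTRON_CHARGE_C


if __name__ == "__main__":
    field = ecr_field_tesla(2.45e9)  # assumed microwave frequency
    print(f"resonance field = {field * 1e4:.0f} gauss")
```

For the assumed frequency the resonance field is about 0.0875 T, a value that modest electromagnet coils around the source can supply.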


[Figure C2.18.2: schematic diagrams of the etching configurations; legible labels include gas feed, blocking capacitor, sample and vacuum pump.] 


Figure C2.18.2. Schematic representations of various experimental configurations for plasma etching. (a) Reactive 
ion etching (RIE). (b) Electron cyclotron resonance etching (ECR). (c) Chemically assisted ion beam etching 
(CAIBE). The configurations are described in the text. 

Dramatic progress has been achieved, largely through empirical engineering, in developing practical processes for 
etching silicon with fluorine species. Indeed, the ability to etch sub-micrometre features in Si CMOS device 
technology has played a key role in bringing the computer industry to its current highly developed state. Dry 
etching of III-V materials is less well developed, but is increasingly important due to the impact of III-Vs in the 
rapidly expanding areas of wireless and optical communication. The wide-bandgap group III nitride 
semiconductors, which are especially interesting as emitters in the blue region of the spectrum, present special 
challenges in dry etching [7]. 

C2.18.2.2 FUNDAMENTAL ISSUES IN DRY ETCHING 

In order to design and optimize anisotropic dry etching processes, several issues must be understood: 

(a) What is the mechanism by which ion-enhanced reactions give anisotropic etching? 

(b) What are the underlying surface thermochemical reactions that are being enhanced? 

(c) What controls the detailed evolution of the feature profile during etching? What is the relative importance of 
directed transport of neutral reactive species and of ion enhancement? 

(d) What are the structure and composition of the surface during etching? 

(e) How can damage be eliminated from the process? 

Although substantial progress has been made, much work remains to be done, especially in the characterization and 
elimination of damage. Each of these issues invites study by methods of chemical physics and physical chemistry, 
especially when examined in a simplified model environment. Representative themes and examples, illustrating 
both progress achieved and remaining questions, are presented in section C2.18.3. 


C2.18.2.3 DEPOSITION PROCESSES IN MICROELECTRONICS AND OPTOELECTRONICS 

While etching controls the lateral dimensions of microscopic devices in integrated circuits (figure C2.18.1), 
deposition controls their vertical dimensions, which are equally important in device function. Modern devices may 
comprise thin films of deposited material between 1 nm and 50 µm in thickness, which in turn may include 
numerous individual thinner layers. Each of these layers must be deposited with highly controlled and reproducible 
properties, including composition, thickness, strain, microstructure and surface morphology. Advanced optical and 
electronic devices from compound semiconductors require epitaxial growth, in which the crystalline orientation of 
the deposited film is registered with that of the substrate. 

Numerous techniques have been developed for depositing films from vapours, ranging from straightforward 
evaporation to advanced chemical transport in which reactions are activated by heat, light or plasma. These have 
been surveyed in two comprehensive reviews [8, 9] and two popular interdisciplinary textbooks [10, 11 ]. The three 
most widely used chemically based techniques are: 

(1) CVD: gaseous reactants (precursors) delivered to a heated substrate in a flow reactor undergo thermal 
reaction to deposit solid films at atmospheric or reduced pressure, and volatile side products are pumped 
away. CVD is used for conductors, insulators and dielectrics, elemental semiconductors and compound 
semiconductors and is a 'workhorse' in the silicon microelectronics industry. 

(2) Metallorganic chemical vapour deposition (MOCVD) or organometallic vapour phase epitaxy (OMVPE): in 
this variation of CVD, the precursors are organometallic compounds of the III-V or II-VI elements, a very 
large number of which are available. Because of chemical similarity in the various groups of elements, a wide 
range of compound semiconductor alloys can be produced, e.g. GaAs$_{1-x}$P$_x$, the basis of the familiar red light 
emitting diodes (LEDs) and lasers, and In$_x$Ga$_{1-x}$N, the basis of the new blue LEDs and lasers [12]. MOCVD 
is widely used for epitaxial growth of compound semiconductor heterostructures essential for optical and 
high-speed electronic devices. 

(3) Metallorganic MBE (MOMBE): the 'solid source' Knudsen cells in conventional MBE are replaced with 
gaseous beams of organometallic precursors, directed toward a heated substrate in UHV. Compared to 
MOCVD, MOMBE eliminates gas phase reactions that may complicate the deposition surface reactions, and 
provides lower growth temperatures. 

All three techniques have a vast and readily accessible literature, and are discussed regularly at numerous scientific 
conferences around the world. 

C2.18.2.4 FUNDAMENTAL ISSUES IN CHEMICAL DEPOSITION OF THIN FILMS 

The fundamental steps in CVD, MOCVD and MOMBE processes can be classified as follows [13]: 

(a) adsorption of precursors at the growth surface, 

(b) surface diffusion of precursors to growth sites, 

(c) surface reactions: incorporation of film constituents, 

(d) nucleation, followed by growth of film microstructure and topography, 

(e) desorption of by-products from surface reactions. 

It is difficult to observe these surface processes directly in CVD and MOCVD apparatus because they operate at 
pressures incompatible with most techniques for surface analysis. Consequently, most fundamental studies have 
selected one or more of these steps for examination by molecular beam scattering, or in simplified model reactors 
from which samples can be transferred into UHV surface spectrometers without air exposure. Reference [4] 
describes many such studies. Additional themes and examples, illustrating both progress achieved and remaining 
questions, are presented in section C2.18.4 . 


C2.18.3 SELECTED THEMES AND EXAMPLES IN ETCHING STUDIES 

Plasma etching was introduced into silicon device fabrication technology in the middle 1970s in order to obtain the 
anisotropic pattern definition needed for device features smaller than 1 µm. Early processes used complex mixtures 
of fluorocarbon gases and additives selected to optimize anisotropy, selectivity and rate through time-consuming 
empirical studies. Almost immediately, model studies of essential features of the reactions in simpler environments 
were undertaken to obtain fundamental insights on which optimization could be rationally designed. Twenty-five 
years later, this is still a rich and productive field of research. 

C2.18.3.1 EXPERIMENTAL STUDIES OF THERMAL ETCHING REACTION KINETICS AND DYNAMICS 

It was quickly recognized that a purely thermal reaction between F atoms and Si surfaces was a key component of 
plasma etching. This reaction was studied in a series of UHV experiments where an effusive molecular beam of 
XeF$_2$, which readily decomposed to give F atoms, was directed onto a Si surface. After some early controversies, 
careful mass spectrometric analysis in which the product flux was modulated demonstrated that the main reaction 
product was SiF$_4$ [14, 15]. Measured velocity and energy distributions of the SiF$_4$ products were non-Maxwellian 
for the temperature of the surface, showing excesses of both 'cool' and 'hot' molecules [16]. This suggests at least 
two modes of product desorption, both different from simple evaporation of weakly bound molecules. The overall 
dynamics of formation and desorption of SiF$_4$ is complicated, since the reaction occurs in a complex fluorosilyl 
'corrosion layer' readily formed at the Si surface by the small and highly reactive F atoms [17]. This layer is 
described further in section C2.18.3.3. A series of studies by Engel and co-workers examined the effect on this 
reactive adlayer at the Si surface of using chlorine versus fluorine, and molecules versus atoms. In all cases 
coverage of etchant adequate to produce steady-state etching led to complex reactive adlayers [18]. 
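
To judge what 'non-Maxwellian' means in practice, it helps to have the thermal reference distribution to hand. The Python sketch below evaluates the flux-weighted Maxwell-Boltzmann speed distribution, proportional to v^3 exp(-mv^2/2kT), that would be expected for molecules desorbing in equilibrium with the surface, and integrates it numerically to obtain the mean translational energy, which should equal 2kT. The surface temperature of 300 K is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant in J/K
AMU = 1.66053906660e-27   # atomic mass unit in kg
EV = 1.602176634e-19      # one electronvolt in joules


def flux_speed_distribution(v, mass_kg, temperature_k):
    """Normalized flux-weighted Maxwell-Boltzmann speed distribution."""
    a = mass_kg / (2.0 * K_B * temperature_k)
    return 2.0 * a**2 * v**3 * math.exp(-a * v**2)


def mean_energy_ev(mass_kg, temperature_k, n=20000):
    """Mean translational energy (eV) of the flux, by trapezoidal integration."""
    a = mass_kg / (2.0 * K_B * temperature_k)
    v_max = 10.0 / math.sqrt(a)  # integrand is negligible beyond this speed
    dv = v_max / n
    total = 0.0
    for i in range(1, n):  # the integrand vanishes at both end points
        v = i * dv
        total += 0.5 * mass_kg * v * v * flux_speed_distribution(v, mass_kg, temperature_k) * dv
    return total / EV


if __name__ == "__main__":
    m_sif4 = 104.08 * AMU  # molecular mass of SiF4
    t_surface = 300.0      # assumed surface temperature in K
    print(mean_energy_ev(m_sif4, t_surface), 2.0 * K_B * t_surface / EV)
```

Measured distributions with excess population below or above this reference correspond to the 'cool' and 'hot' molecules described in the text.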

In view of the complex kinetics and dynamics of the overall reaction, attention was directed to dynamical studies of 
individual steps, particularly the attachment of etchant molecules to surfaces. The emerging theme is that the 
sticking probability and the structure of the adsorbate layer depend strongly on the energy of the incoming 
molecules and on the structure of the surface, and that chemisorption can be highly site-selective. For example, at 
low incident kinetic energy where precursor-mediated chemisorption dominates, Cl2 forms large islands of SiCl on
Si(111)-(7 x 7), while at high incident energy direct activated chemisorption dominates, and Cl is adsorbed only at
isolated sites [19].
The dangling bonds of a Si surface abstract one F atom from an incident F2 molecule while the complementary F
atom is scattered back into the gas phase [20]. This abstractive mechanism leads to F adsorption at single sites 
rather than at adjacent pairs of sites, as observed directly by scanning tunnelling microscopy [21]. Br atoms adsorb 
only to Ga atoms in the second layer of GaAs(001)-(2 x 4), where empty dangling bonds on the Ga atoms can be 
filled by electrons from the Br atoms [22]. 

C2.18.3.2 EXPERIMENTAL STUDIES OF ION-ENHANCED ETCHING REACTION KINETICS AND DYNAMICS

In what may be the single most influential experiment in the field of dry etching, John Coburn and Harold Winters 
of the IBM Almaden Research Centre in San Jose, CA, USA demonstrated that the rate of the F-Si thermal etching
reaction is greatly enhanced if the Si surface is simultaneously bombarded with a beam of energetic Ar+ ions and
exposed to a flux of F atoms at thermal energies [23]; see figure C2.18.3. This phenomenon plausibly explained the
origin of anisotropic etching in regions B and F of figure C2.18.1 since, under proper conditions of pressure and 
power, the electric fields in the plasma reactor could steer ions from the plasma onto the sample along its normal 
direction and enhance the rate only at the bottom of the feature being etched [24]. Ion-enhanced etching model 
studies, in which independent beams of energetic ions and electrically neutral reactive species were simultaneously 
directed onto the substrate in high vacuum or UHV with various in situ reaction diagnostic tools, constituted an 
important simplification of an essential step in plasma etching; many such studies followed the original example of 
Coburn and Winters. 

The molecular mechanism of ion enhancement proved to be both subtle and complex. The complexity of the
intrinsic 'corrosion layer' in F-Si etching made it difficult to determine whether ion bombardment influenced the
adsorption of reactant, formation of product or desorption of product [25]. Ion-enhanced Cl-Si etching, which is
not thermally spontaneous at room temperature, demonstrated that ion-induced recoil implantation of Cl into the Si
lattice contributed substantially to the net reaction [26, 27 and 28]. Therefore, in steady-state ion-enhanced etching
the ions contribute simultaneously to the formation and removal of the 'corrosion layer', and the only well defined 
experiment is to measure the kinetic energy distribution of each departing product species, as a function of ion 
beam energy and current density, by analogy with the sputtering of clean solids by inert ion beams [29]. Sputtering 
had already been explained by the linear cascade theory in which the incident ion transfers energy through a series 
of binary collisions with atoms in the solid as it penetrates beneath the surface [30]. The resulting fast recoils
eventually cause atoms to be ejected from the solid with a kinetic energy distribution $Q(E) \propto E/(E+U_0)^3$, where $U_0$
is the surface escape energy, usually approximated as the sublimation energy of the solid [31]. Careful analysis of
the data reviewed from several different ion-enhanced etching studies (Si(Cl2; Ar+), Si(XeF2; Ar+), Si(SF6; Ar+),
Si/SiO2(Cl2; Ar+), Si/SiO2(XeF2; Ar+)) showed that in all cases more than 80% of the molecules in each product
species departed the surface with an energy distribution characteristic of a linear cascade [32, 33]. Each product
species required a different value of $U_0$; the values determined were consistent with reasonable qualitative models
of the ion-induced mixing layer or 'corrosion layer'. To date no definitive quantitative description has been
obtained, but the general trend is that in the energy range 0.5-2.5 keV ion-enhanced etching is dominated by 
cascade processes, the details of which vary with each gas-solid combination. 
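
The linear-cascade energy distribution quoted above can be explored numerically. The following is a minimal sketch, assuming the standard Thompson form Q(E) proportional to E/(E + U0)^3; the function names and the value of U0 are illustrative assumptions and are not taken from the studies cited.

```python
def cascade_energy_distribution(energy_ev, escape_energy_ev):
    """Unnormalised linear-cascade form Q(E) ~ E / (E + U0)**3 (assumed Thompson form)."""
    if energy_ev < 0 or escape_energy_ev <= 0:
        raise ValueError("energy must be >= 0 and escape energy must be > 0")
    return energy_ev / (energy_ev + escape_energy_ev) ** 3


def most_probable_energy(escape_energy_ev):
    """Setting dQ/dE = 0 for the form above gives a peak at E = U0 / 2."""
    return escape_energy_ev / 2.0


def numerical_peak(escape_energy_ev, e_max_ev=50.0, steps=50000):
    """Locate the maximum of Q(E) on a uniform grid as a cross-check."""
    best_e, best_q = 0.0, 0.0
    for i in range(1, steps + 1):
        e = e_max_ev * i / steps
        q = cascade_energy_distribution(e, escape_energy_ev)
        if q > best_q:
            best_e, best_q = e, q
    return best_e


if __name__ == "__main__":
    u0 = 4.7  # eV; hypothetical value, chosen near the sublimation energy of Si
    print(most_probable_energy(u0), numerical_peak(u0))
```

Because the peak of this form sits at U0/2, fitting the low-energy side of a measured product distribution is one way such an analysis could assign a separate escape energy to each product species.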

Model studies of the crystalline damage caused to the substrate by ion bombardment as a side effect of ion- 
enhanced etching relied upon Rutherford backscattering ion channelling measurements with glancing exit path 
geometry to enhance depth resolution of the damaged layer. During bombardment of Si with ion energies of 250-1000
eV, argon and neon ions caused shallow damage (65-80 Å) while hydrogen ions caused damage 275-475 Å in
depth depending on dose [34]. Progress in ion-enhanced etching up to 1992 has been reviewed by the founders of 
the field [35]. 

Figure C2.18.3. Relationship between ion-assisted etching and directionality in plasma etching. (a) Demonstration
of the synergy between ion bombardment and reactive species during ion-assisted etching. (b) Ions incident on an
etched feature. This situation prevails in glow discharges when the feature dimensions are much less than the
plasma sheath thickness. Reproduced from [35].

More recent model experimental studies of ion-enhanced etching have emphasized the Si-chlorine combination 
with ion energies below 100 eV to simulate conditions in the newer ECR and inductively coupled high-density 
plasma etching systems which give highly anisotropic etching of high aspect ratio features with reduced damage in 
Si technology [36, 37 and 38]. As at higher ion energy, the collision cascade model describes the essential features 
of the process, with energy thresholds for removal of SiClx etch products lower than that of pure Si, in consequence
of the reactive adlayer produced by chlorination of the surface. 

C2.18.3.3 COMPOSITION AND STRUCTURE OF SURFACES EXPOSED TO ETCHANTS

The complexity of observed etching kinetics suggested that several species are present at the surface during steady- 
state reaction. Studies were begun to identify these species by surface analysis techniques to aid in clarifying the 
etching reaction mechanisms. This work has been developed farthest for Si-F etching, for which selected highlights 
are summarized as follows. 


McFeely and co-workers used soft x-ray photoelectron spectroscopy (SXPS) to measure the changes in binding 
energies of Si(2p) levels after slight exposure to fluorine atoms via dissociative chemisorption of XeF2 [39]. Using
synchrotron radiation at 130 eV as the source enabled extreme surface sensitivity. Since this level is split into a
doublet by spin-orbit coupling, the 2p1/2 component was numerically removed from the data in order to simplify
the spectrum and facilitate interpretation of peaks due to multiple species. Figure C2.18.4 shows the results after
exposing a clean Si(100)-(2 x 1) surface to 50 L of XeF2. (The Langmuir (L), a convenient measure of exposing a
surface to a steady background pressure of some gas for a specified time, is defined by 1 L = 1 x 10^-6 Torr s. One
Langmuir provides approximately one monolayer of adsorbate, if all molecules stick to the surface.) In addition to 
the peak for the bulk un-reacted Si, three new peaks appear with binding energy increases of approximately 1, 2 
and 3 eV relative to the bulk peak. They have been assigned to SiF, SiF2 and SiF3 respectively. Examination of
several other Si surfaces showed the same fluorosilyl species, independent of surface structure. The relative
amounts of these species depended on surface structure in a manner rationalized by the number of dangling bonds
available at each surface. Subsequent studies extended the exposure to 5 min at 5 x 10 Torr of XeF2, to examine
conditions more typical of steady-state etching [40]. Figure C2.18.5 shows the results, analysed as above. A new
peak with binding energy approximately 4 eV higher than the peak for bulk Si is assigned to SiF4, and the most
abundant fluorosilyl species is SiF3. These results are quite striking because they indicate that the etching reaction
does not proceed by successive stripping of the outermost Si atomic layers, but rather by formation of a thick,
highly fluorinated reaction layer (estimated to be about seven monolayers in thickness) in which the volatile
reaction product SiF4 can be trapped. Moreover, conversion of SiF3 to SiF4 appears to be a kinetic bottleneck in
sequential fluorination reactions, but the complexity of the overall process prohibits identifying this step as rate- 
limiting. 
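
Exposures quoted in Langmuir are easy to convert to and from pressure and time. The helper below is a minimal sketch that assumes the usual definition 1 L = 10^-6 Torr s; the function names and the example pressures are hypothetical and do not come from the cited experiments.

```python
LANGMUIR_TORR_S = 1.0e-6  # 1 L = 1e-6 Torr s (standard definition, assumed here)


def exposure_langmuir(pressure_torr, time_s):
    """Exposure in Langmuir for a steady background pressure held for time_s."""
    if pressure_torr < 0 or time_s < 0:
        raise ValueError("pressure and time must be non-negative")
    return pressure_torr * time_s / LANGMUIR_TORR_S


def time_for_exposure(exposure_l, pressure_torr):
    """Seconds needed at a given pressure to accumulate exposure_l Langmuir."""
    if pressure_torr <= 0:
        raise ValueError("pressure must be positive")
    return exposure_l * LANGMUIR_TORR_S / pressure_torr


if __name__ == "__main__":
    # Hypothetical dosing conditions: 50 L delivered at 1e-7 Torr.
    print(time_for_exposure(50.0, 1.0e-7), "s")
```

Since one Langmuir corresponds to roughly one monolayer only when every molecule sticks, the converted exposure is an upper bound on coverage rather than a coverage itself.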

Figure C2.18.4. Upper panel shows the 2p photoemission spectrum of the Si(111)-(2 x 1) cleaved surface after
exposure to approximately 50 L of XeF2. The lower panel shows the 2p3/2 component of the spectrum after
background subtraction. In addition to the unshifted Si(2p3/2), there are three chemically shifted satellites
corresponding to SiF, SiF2 and SiF3 in order of increasing binding energy. The dashed curves show the separate
components, and the solid curve shows the sum of the four dashed components. Reproduced from [39].

Figure C2.18.5. Si(2p) spectrum of Si(111) reacted with 5x10 Torr of XeF2, using photon energy of 130 eV.
The top panel shows the raw data and the fitted background. The bottom panel shows the spectrum after
background has been subtracted and fitted into five components: bulk Si and the four fluorosilyl peaks. The solid
curve is the sum of the individual dashed component curves. Reproduced from [40].

Yarmoff and co-workers continued and extended this study, supplementing the SXPS measurements with photon 
stimulated desorption to obtain greater depth analysis of the fluorosilyl layer, and determined the thickness and 
composition of the fluorosilyl layer as a function of XeF2 exposure on a Si(111)-(7 x 7) surface [41]. As exposure
increases, the concentration of SiF3 at the surface increases, and buries a relatively constant concentration of SiF
and SiF2 that remains near the interface with the unreacted Si substrate. More precisely, the reaction layer proceeds
through four regimes, as shown in figure C2.18.6 . 

Figure C2.18.6. The coverages of fluorosilyl groups in the reaction layer shown as a function of exposure. The
coverages refer to monolayers of SiFx groups. The smooth curves are drawn through the data points. Reproduced
from [41].


(1) Initial exposure regime: This involves fluorination and etching of atoms in the 7 x 7 reconstruction, during
    which the destruction of the surface states in the reconstruction is correlated with creation of the species SiF2
    and SiF3.

(2) Quasi-equilibrium exposure regime: After the 7 x 7 structure has been removed, quasi-equilibrium between
    etching and growth of the reaction layer is established. The reaction layer is about one monolayer thick, and
    contains primarily SiF. Defects form near the surface, partly from the large reaction exothermicity.

(3) Transition to steady-state etching: The surface becomes sufficiently disordered to disrupt the
    quasi-equilibrium, and the reaction layer becomes a 'tree' structure of fluorosilyl chain structures terminated by
    SiF3 groups.

(4) Steady-state etching: Steady-state etching commences when the tree structure is fully developed.


The authors analyse these results in considerable detail, demonstrating that both the structure of the surface and 
steric interactions between F atoms on neighbouring SiF3 groups influence the reaction progress.

Analysis by SXPS rationalized an earlier demonstration that heavily n-doped Si etches more rapidly with F atoms 
than does heavily p-doped Si, and produces less SiF3 in the volatile etch products [42]. After exposure to sufficient
XeF2 to achieve steady-state etching, heavily doped p-Si(111) and n-Si(111) samples were analysed by SXPS as
described above [43]. The p-type sample showed a much thicker fluorosilyl layer and substantially greater SiF3
concentration. Formation of Si-F bonds involves considerable charge rearrangement, especially in converting the
electron-depleted SiF3 centre to SiF4. This conversion should be facilitated in n-type samples where the Fermi level
lies in the high density of filled electronic states, but impeded in p-type samples. Thus, the SXPS data explain both 
the relative rates and the relative product distributions between heavily p- and n-doped samples. 

Analysis by SXPS has provided insights into dry etching of III-V materials by the halogens. The general
conclusions from a comprehensive review of the field are summarized as follows [44]. Molecular halogens attach
by dissociative chemisorption, forming sequentially mono-, di- and tri-halides. The result is competition between
etching and surface passivation, governed by temperature, surface stoichiometry and surface crystallinity. When
passivation occurs, the adsorbate usually forms an ordered overlayer. When etching occurs, there appears to be
little preference between the III atom and the V atom; attachment occurs at whichever atoms are exposed at the
surface.

Detailed atomic-level description of the etching mechanisms requires data not only on composition and electronic 
structure of the surface, as revealed by SXPS, but also on the atomic structure of the surface. The scanning 
tunnelling microscope has been used to demonstrate that purely thermal etching reactions depend on, and in turn 
influence, surface morphology [45, 46 and 47]. Since thermal barriers to adsorption of etchant molecules and to 
desorption of etch products depend on local structural features, various competing reaction pathways can be 
observed as a function of temperature to determine the dominant effects. This method holds promise for describing 
the detailed evolution of surface morphology during etching [48]. 

C2.18.3.4 THEORETICAL MODELS AND SIMULATIONS OF ETCHING REACTIONS

The method of molecular dynamics (MD), described earlier in this book, is a powerful approach for simulating the 
dynamics and predicting the rates of chemical reactions. In the MD approach most commonly used, the potential of 
interaction is specified between atoms participating in the reaction, and the time evolution of their positions is 
obtained by solving Hamilton's equations for the classical motions of the nuclei. Because MD simulations of 
etching reactions must include a significant number of atoms from the substrate as well as the gaseous etchant 
species, the calculations become computationally intensive, and the time scale of the simulation is limited to the
order of 10 ps. Nonetheless, these simulations provide considerable insight into the early stages of the etching
reaction. 
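
The core loop of a classical MD calculation can be illustrated on a toy problem. The sketch below integrates the classical equations of motion with the velocity Verlet scheme for a single coordinate moving in a Morse potential. The potential, its parameters, the reduced units and all function names are assumptions chosen for illustration; this is not one of the Si-F potentials discussed in this section.

```python
import math


def morse_potential(r, depth=1.0, width=1.0, r_eq=1.0):
    """Toy Morse potential V(r) = D * (1 - exp(-a (r - r_eq)))**2 in reduced units."""
    x = 1.0 - math.exp(-width * (r - r_eq))
    return depth * x * x


def morse_force(r, depth=1.0, width=1.0, r_eq=1.0):
    """Force F = -dV/dr for the Morse potential above."""
    ex = math.exp(-width * (r - r_eq))
    return -2.0 * depth * width * (1.0 - ex) * ex


def velocity_verlet(r, v, mass, dt, n_steps, force=morse_force):
    """Propagate position and velocity; returns the list of (r, v) pairs."""
    trajectory = [(r, v)]
    f = force(r)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        r = r + dt * v_half                # full-step position update
        f = force(r)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((r, v))
    return trajectory


def total_energy(r, v, mass):
    """Kinetic plus potential energy, used to monitor integration quality."""
    return 0.5 * mass * v * v + morse_potential(r)


if __name__ == "__main__":
    traj = velocity_verlet(r=1.3, v=0.0, mass=1.0, dt=0.01, n_steps=10000)
    print(total_energy(1.3, 0.0, 1.0), total_energy(*traj[-1], 1.0))
```

Monitoring the total energy along the trajectory is a common check on the time step. In a realistic etching simulation the force call dominates the cost, which is one reason the accessible time scale is so short.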

The central ingredient in MD simulations is the interaction potential [49]. The first potential function used in MD 
of Si-F 2 etching reactions was developed by Stillinger and Weber by parameterizing an empirical functional form 
to fit data for bulk Si, gaseous F 2 , gaseous SiF^ and gaseous Si 2 F 6 [50]. MD simulations based on this potential 
predicted that etching would occur only at temperatures near the melting point of Si, and that molecular F 2 did not 
scatter on the clean Si(100) surface, both contrary to experimental results. Subsequent MD simulations with this 
same potential predicted that F atoms incident on Si(100) with kinetic energy 3 eV did react spontaneously [51]. 

A more accurate potential function for Si-F 2 etching was developed by Carter and co-workers by carrying out 
highly correlated ab initio quantum chemistry calculations of the interaction of F 2 with Si(100) (treated as a 
cluster) and fitting their results to the functional form of the Stillinger-Weber (SW) potential [52]. As a result, the 
SW Si-F bonding well became deeper and the non-bonding terms less repulsive. These changes enabled further 
MD calculations to observe buildup of a fluorosilyl layer, which is the first stage of etching [53, 54]. Running 
sufficient repetitions of the simulation to enable statistical predictions revealed the influence of surface steps and 
defects, surface coverage, molecular orientation and molecular internal energy on formation of the fluorosilyl layer 
[55, 56, 57 and 58]. Subsequent MD simulations of F 2 with Si(100) in which the original SW potential was 
compared directly with the reparametrized version due to Weakliem, Wu and Carter (WWC) demonstrated that 
while WWC was an improvement over SW, neither predicted adequate dissociative chemisorption when compared 
with experimental results [59]. More sophisticated ab initio calculations are therefore required to represent more 
accurately the interactions between F2 and Si(100).
These theoretical descriptions of the thermal etching reaction between F2 and Si(100) have been reviewed in some
detail in the context of ab initio methods in surface chemistry [60].

MD simulations based on the SW potential have been applied to ion-enhanced etching systems; selected examples 
are Ar+ bombardment of Cl/Si [61], Ar+ bombardment of fluorosilyl/Si [62], Cl+ and F+ bombardment of Si [63],
fluorosilyl ion bombardment of Si [64] and Cl+ bombardment of Si in the presence of a thermal background flux of
Cl atoms [65]. General conclusions are as follows: (a) sputter yield in the presence of reactive species is enhanced
over that for pure Si; (b) predicted reaction products agree qualitatively with experimental results; (c) the surface is
roughened during etching and (d) weakly bound product species are produced during ion bombardment and can be
removed by thermal desorption. However, no universal mechanism for ion-enhanced etching has been identified
and the detailed evolution of the product distribution is not revealed. This probably originates partly in the short
time scale of the simulations compared to the overall reaction time scale, and partly in limitations of the potential 
functions employed. 

Perhaps greater success has appeared in simulations that emphasize the short-time consequences of binary
collisions, namely energy loss from the ions and angular effects, to assess the effects of ion bombardment on the
profile of the etched features [66, 67 and 68]. Earlier studies of molecular beam scattering of energetic F atoms
with a fluorinated Si surface had demonstrated the influence of directed energy transport of neutral reactive species
on evolution of the etch profile [69]. These various effects can be combined to simulate the buildup of non-uniform
charge distributions on patterned surfaces, which have dramatic consequences both for the shape of the etched
profile and for inflicting etch damage [70, 71 and 72]. The relationships between plasma process parameters,
non-uniform charging, etched profile and etch damage are presently areas of intense research activity, discussed
regularly at the continuing conference series International Symposium on Plasma- and Process-Induced Damage 
[73]. 

C2.18.3.5 MODEL STUDIES OF PHOTON- AND ELECTRON-ENHANCED ETCHING

In order to avoid ion bombardment damage while achieving anisotropy, alternate means of enhanced etching have 
been explored. In a seminal early study, Houle showed that cw bandgap excitation generated hot carriers that 
enhanced the etching of both n- and p-Si(100) by XeF2 [74]. Later developments are summarized in a conference
proceedings [75]. At present, photoelectrochemical etching is the only wet etching method for the wide-bandgap
III-nitrides [76]; it has proved especially sensitive to defect structure in the films [77]. Coburn and Winters [23] and
Veprek [78] examined electron-enhanced etching of Si. Gillis and co-workers developed low-energy electron
enhanced etching (LE4), in which electrons with kinetic energy 1-15 eV and chemically reactive species at thermal
velocities are delivered simultaneously to the surface. LE4 is accomplished either in UHV with separate beams of
electrons and molecules [79] or in a DC plasma. In DC plasma it has produced excellent anisotropy and smooth
surfaces and maintained stoichiometry of compound semiconductor surfaces when etching Si [80], GaAs [81] and
GaN [82, 83 and 84]. Recently LE4 has been used to transfer a hexagonal array of 18 nm holes on a 22 nm lattice
constant from a biologically derived pattern into Si(100) [85]. High-resolution cross-sectional transmission electron 
microscopy showed Si lattice fringes at the perimeter of the etched holes, demonstrating that LE4 does not inflict 
lattice displacement damage on the substrate. 


C2.18.4 SELECTED EXAMPLES OF DEPOSITION STUDIES 

Deposition by chemical reaction is a vast field that cannot be surveyed in the limited space here. Two particular 
examples have been selected because they illustrate the close relation between fundamental surface chemistry 
research and process development. Moreover, both show great promise for nanofabrication, where film thickness must be
controlled at the atomic level.

C2.18.4.1 HOMOEPITAXY OF GALLIUM ARSENIDE BY ATOMIC LAYER EPITAXY

As outlined in section C2.18.2.4 above, nucleation is a key early step in film growth. This leads to
microcrystallites with uncontrolled boundaries, the coalescence of which may lead to high defect densities that require
thermal annealing or other post-growth treatment to produce high-quality films. The method of atomic layer 
epitaxy (ALE) was developed to achieve the final crystal form immediately in compound materials by exposing the 
growing surface sequentially to reactants providing one of the constituent atoms [86]. ALE relies on self-limiting
adsorption/reaction at each step to grow crystalline films layer by layer with precise control of thickness and superb 
uniformity of thickness. 


Substantial work has been devoted to achieving such results for homoepitaxy of GaAs(100) by sequential reactions
of the substrate with trimethylgallium (TMGa), (CH3)3Ga, and arsine, AsH3, in CVD reactors and in MOMBE
configurations [87, 88]. Successful ALE growth (1 monolayer/cycle) was achieved in narrow regions of
temperature and exposure. Surface science studies showed that in similar regions of temperature and exposure,
TMGa dissociatively chemisorbed on the Ga-rich reconstructions of GaAs(100) to produce significant coverage of
methyl groups that stabilized the complete Ga monolayer [89]. Outside these regions, the coverage of methyl
groups was insufficient, and self-limiting deposition in the TMGa cycle was lost. Various model mechanisms were
proposed and debated [90, 91]. A second factor influencing self-limiting deposition of Ga is the change in
stoichiometry of the GaAs(100) surface as it evolves from As-rich to Ga-rich during the TMGa cycle. None of the
known adsorbate-free reconstructions of the polar GaAs(100) surface is ideally terminated at one monolayer
coverage, so none can support the 'ideal' ALE process. By viewing step edges as reservoirs where surface
atoms may be added or removed in order to fill incomplete terminations, Creighton proposed the Ga-rich
GaAs(100)-(1 x 2)-CH3 reconstruction as the key participant in ALE [92]. This surface consists of 0.5 monolayer of CH3
adsorbed on a full monolayer of dimerized Ga atoms. The adsorbate stabilizes the surface at 1 monolayer of Ga,
and enables the self-limiting adsorption of Ga needed for ALE. Two candidate stabilized surfaces have been
identified for the arsine cycle: the As-rich surface of GaAs(100) has a γ-(2 x 4) reconstruction terminated with 1
monolayer of As, and in the presence of adsorbed H atoms a c(2 x 8)/(2 x 4) reconstruction that can be saturated
with 1 monolayer of As. ALE growth is controlled in a complex way by the competing reaction kinetics on the Ga- 
rich and As-rich surfaces. None of the proposed mechanisms explains all the experimental results. Progress until 
1996 has been reviewed in moderate detail [93]. 

C2.18.4.2 DEPOSITION OF OXIDE FILMS BY ATOMIC LAYER PROCESSING 

The Si/SiO2 interface is crucial to the function of silicon devices. As the dimensions of these devices continue to
shrink, the thickness of the oxide layers will be reduced to ~3 nm. The need will also arise for conformal and
uniform deposition on three-dimensional structures with high aspect ratios, which cannot be achieved with the 
standard line-of-sight deposition methods. These demands for precise thickness control, uniformity and 
conformality can be met by a variation of ALE in which oxides are deposited by an alternating sequence of self- 
limiting reactions; one element of the oxide is deposited in each of the reactions [94]. Because these layers are 
amorphous rather than epitaxial with the substrate, the method is known as atomic layer processing (ALP). George 
and co-workers have studied the fundamental surface chemistry in two ALP reaction sequences in order to identify 
conditions under which they are self-limiting, and have used the results to deposit high-quality oxide films. 

Deposition of SiO2 has been achieved by exposing the substrate to the binary reaction sequence SiCl4 + 2H2O →
SiO2 + 4HCl. This is divided into the following 'half-reactions' in which species at the surface are indicated by
asterisks [95]:

(A) Si-OH* + SiCl4 → SiO-Si-Cl3* + HCl;

(B) Si-Cl* + H2O → Si-OH* + HCl.

Surface species during the reaction were detected by Fourier transform infrared spectroscopy, using the Si-Cl
stretch at 625 cm^-1 and the SiO-H stretch at 3740 cm^-1. During the (A) half-reaction, the Si-Cl peak increased
while the SiO-H peak declined. The opposite behaviour was seen during the (B) half-reaction. Figure C2.18.7 
shows the integrated absorbances of these two peaks versus time during the (A) and (B) half-reactions. These 
results clearly demonstrate that both of the binary reactions are complete and self-limiting at 600 K and 10 Torr; at 
lower temperatures the reactions did not go to completion. Temperature-programmed desorption of SiO2 in UHV
after various numbers of AB reaction cycles demonstrated that the growth rate was ~0.11 nm per cycle. Analysis by
Auger electron spectroscopy (AES) showed a peak at 83 eV characteristic of the Si-SiO2 interface, which
decreased as the films grew thicker. The only AES peaks that increased with growth were those characteristic of
stoichiometric SiO2. This study not only produced chlorine-free, stoichiometric SiO2 films by self-limiting
reactions, but also provided detailed insight into the molecular mechanisms involved. A related study examined the
fundamental surface chemistry in the deposition of stoichiometric Al2O3 by the self-limiting sequential reactions of
trimethylaluminium, (CH3)3Al, and H2O [96]. Deposition of Al2O3 on well characterized porous membranes
demonstrated conformal coating of the pore walls [97].
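
Because each AB cycle is self-limiting, film thickness is set by counting cycles rather than by timing the exposure. The sketch below assumes a constant growth of about 0.11 nm per cycle, as quoted above for SiO2. The function names, the default value and the small rounding tolerance are illustrative assumptions.

```python
import math


def cycles_for_thickness(target_nm, growth_per_cycle_nm=0.11):
    """Smallest whole number of AB cycles that reaches at least target_nm."""
    if target_nm <= 0 or growth_per_cycle_nm <= 0:
        raise ValueError("thickness and growth per cycle must be positive")
    # The tolerance guards against floating-point ratios such as 10.000000000000002.
    return math.ceil(target_nm / growth_per_cycle_nm - 1e-9)


def thickness_after(cycles, growth_per_cycle_nm=0.11):
    """Nominal film thickness after a whole number of AB cycles."""
    if cycles < 0:
        raise ValueError("cycle count must be non-negative")
    return cycles * growth_per_cycle_nm


if __name__ == "__main__":
    n = cycles_for_thickness(3.0)  # a film of about 3 nm
    print(n, thickness_after(n))
```

With these assumptions a film of about 3 nm needs 28 cycles. The thickness step of one cycle, not the exposure time, sets the achievable precision.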

Figure C2.18.7. The integrated absorbance of the Si-Cl stretching vibration at 625 cm^-1 and the SiO-H stretching
vibration at 3740 cm^-1 as a function of time during the (A) SiCl4 and (B) H2O half-reactions at 600 K and 10 Torr.
Reproduced from [95].

Progress up to 1996 has been reviewed in moderate detail [98]. Subsequent developments have been summarized
for Al2O3 [99] and for SiO2 [100], and deposition of Si3N4 has been reported [101].


C2.18.5 CONCLUDING COMMENTS 

The examples discussed in this chapter show a strong synergy between fundamental physical chemistry and device 
processing methods. This is expected only to become richer as shrinking dimensions place ever more stringent 
demands on process reliability. Selecting key aspects of processes for fundamental study in simpler environments 
will not only enable finer control over processes, but also enable more sophisticated simulations that will reduce 
the cost and time required for process optimization. 


C3.1 Transient kinetic studies

Robert A Goldbeck and David S Kliger 


C3.1.1 INTRODUCTION AND HISTORICAL OVERVIEW 

Transient, or time-resolved, techniques measure the response of a substance after a rapid perturbation. A swift 
'kick' can be provided by any means that suddenly moves the system away from equilibrium — a change in reactant 
concentration, for instance, or the photodissociation of a chemical bond. Kinetic properties such as rate constants 
and amplitudes of chemical reactions or transformations of physical state taking place in a material are then 
determined by measuring the time course of relaxation to some, possibly new, equilibrium state. Determining how 
the kinetic rate constants vary with temperature can further yield information about the thermodynamic properties 
(activation enthalpies and entropies) of transition states, the exceedingly ephemeral species that lie between 
reactants, intermediates and products in a chemical reaction. 
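
The extraction of activation parameters from temperature-dependent rate constants can be sketched as follows. The example assumes the Eyring (transition state theory) expression k = (kB T/h) exp(ΔS/R) exp(-ΔH/RT) with a transmission coefficient of one, so that ln(k/T) is linear in 1/T. The function names and the synthetic data are hypothetical.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J s
R_GAS = 8.314462618        # gas constant, J/(mol K)


def eyring_rate(temp_k, dh_j_mol, ds_j_mol_k):
    """Rate constant from the Eyring equation (transmission coefficient assumed 1)."""
    prefactor = K_B * temp_k / H_PLANCK
    return prefactor * math.exp(ds_j_mol_k / R_GAS) * math.exp(-dh_j_mol / (R_GAS * temp_k))


def fit_eyring(temps_k, rates):
    """Least-squares line through ln(k/T) versus 1/T; returns (dH, dS)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k / t) for k, t in zip(rates, temps_k)]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    dh = -slope * R_GAS                                   # slope = -dH/R
    ds = (intercept - math.log(K_B / H_PLANCK)) * R_GAS   # intercept = ln(kB/h) + dS/R
    return dh, ds


if __name__ == "__main__":
    temps = [280.0, 290.0, 300.0, 310.0, 320.0]
    rates = [eyring_rate(t, 50.0e3, -20.0) for t in temps]  # synthetic, noise-free data
    print(fit_eyring(temps, rates))
```

With real data the scatter in the measured rate constants propagates strongly into the entropy term, because the intercept is obtained by extrapolating far outside the measured temperature range.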

Relaxation kinetics may be monitored in transient studies through a variety of methods, usually involving some 
form of spectroscopy. Transient techniques and spectrophotometry are combined in time resolved spectroscopy to 
provide both the structural information from spectral measurements and the dynamical information from kinetic 
measurements that are generally needed to characterize the mechanisms of relaxation processes. The presence and 
nature of kinetic intermediates, metastable chemical or physical states not present at equilibrium, may be directly 
examined in this way. 

The introduction in the 1920s of rapid mixing techniques to initiate chemical reactions first brought millisecond 
time resolution to the study of solution kinetics [1], overcoming the limitation of classical methods to reactions 
occurring in seconds or longer. Originally developed to use a continuous flow of reagents, the more reagent- 
conserving stopped flow approach is now commonly used and widely available in commercial instrumentation. Not 
only are bimolecular reactions studied by the rapid combination of reagents, but unimolecular reactions may also 
be initiated by rapid dilution, as in denaturant dilution studies of protein folding. 

The millisecond barrier to fast kinetic studies was broken in the late 1940s and early 1950s by two developments: 
the flash photolysis method of Norrish and Porter [2] and the chemical relaxation techniques of Eigen [3], advances
for which the three shared the 1967 Nobel Prize in chemistry. (The term relaxation techniques refers to kinetic
methods in which a sudden change in an intensive parameter such as temperature, pressure, or electric field
provides a perturbation from equilibrium small enough that any subsequent relaxation can be treated as a first order 
rate process, as discussed further below. This is distinguished from flash photolysis in which absorption of an 
optical photon creates a new physical or chemical state that is far from equilibrium.) The new techniques initially 
made possible transient kinetic studies of processes taking place on time scales as short as microseconds. The 
development of flash photolysis boosted the field of photochemistry tremendously by opening up transient 
photochemical and photophysical species such as free radicals and electronically excited states to direct 
observation and characterization. At the same time, the development of relaxation techniques opened up the field of 
fast solution kinetics by allowing researchers to directly follow the time evolutions of fast unimolecular and 
bimolecular reactions such as dissociations, isomerizations and near diffusion-controlled ionic association 
reactions. 
The flash lamp technology first used to photolyse samples has since been superseded by successive generations of 
increasingly faster pulsed laser technologies, leading to a time resolution for optical perturbation methods that now 
extends to femtoseconds. This time scale approaches the ultimate limit on time resolution ($\Delta t$) available to flash 
photolysis studies, the limit imposed by chemical bond energies ($\Delta E$) through the uncertainty principle, $\Delta E\,\Delta t \geq \tfrac{1}{2}\hbar$. 

Similarly, the most rapid relaxation method, temperature jumping by solvent absorption of a brief pulse of optical 
or IR photons, is ultimately limited in time resolution by the energy redistribution processes, such as rotational and 
vibrational relaxations, leading to thermal equilibrium. These are of the order of picoseconds in condensed phases 
but can be much slower in the gas phase. The time scales applicable to some transient techniques are summarized 
in table C3.1.1. This article focuses on transient kinetic studies in the $10^{-9}$ to 1 s time regime. Ultrafast 
(femtosecond and picosecond) methods are covered elsewhere in the encyclopedia. 

Table C3.1.1 Time-resolved methods and time scales. 

Method                      Time range (s) 

Flow techniques             $10^{3}$ - $10^{-4}$ 
Relaxation techniques 
    Temperature jump        1 - $10^{-11}$ 
    Pressure jump           1 - $10^{-6}$ 
    Electric field jump     $10^{-2}$ - $10^{-10}$ 
EPR                         $10^{-5}$ - $10^{-10}$ 
Flash photolysis            1 - $10^{-15}$ 
Pulsed radiolysis           1 - $10^{-11}$ 


C3.1.2 TIME RESOLVED PROCESSES 

Fast transient studies are largely focused on elementary kinetic processes in atoms and molecules, i.e., on 
unimolecular and bimolecular reactions with first and second order kinetics, respectively (although conformational 
heterogeneity in macromolecules may lead to the observation of more complicated unimolecular kinetics). 
Examples of fast thermally activated unimolecular processes include dissociation reactions in molecules as simple 
as diatomics, and isomerization and tautomerization reactions in polyatomic molecules. A very rough estimate of 
the minimum time scale required for an elementary unimolecular reaction may be obtained from the Arrhenius 
expression for the reaction rate constant, $k = A e^{-E_a/RT}$. The quantity $k_B T/h$ from transition state theory provides 
an upper limit on the pre-exponential factor $A$ that is of the order of $10^{13}$ s$^{-1}$, or a vibrational frequency, at room 
temperature. This leads to the estimate that a barrierless reaction can proceed over the course of tens or hundreds of 
femtoseconds. However, chemical reactions must often overcome a potential energy barrier associated with 
breaking bonds (while perhaps 
forming others) and the addition of even a modest barrier slows the previous estimate considerably. An activation 
energy of only 23 kJ mol$^{-1}$ (about 5% of a covalent bond energy), for example, will slow our hypothetical ultrafast 
reaction by four orders of magnitude and begin to bring it into the province of the fast kinetic methods discussed 
here. Moreover, entropic constraints often present in the transition state can further reduce the reaction rate 
constant by reducing A from the upper limit above. 
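
The estimate above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a temperature of 298.15 K and CODATA values for the constants; the function names are illustrative and do not come from the original text.

```python
import math

R = 8.314462618      # gas constant, J mol^-1 K^-1
K_B = 1.380649e-23   # Boltzmann constant, J K^-1
H = 6.62607015e-34   # Planck constant, J s


def tst_prefactor(temperature_k):
    """Upper-limit pre-exponential factor k_B*T/h, in s^-1."""
    return K_B * temperature_k / H


def arrhenius_rate(prefactor, activation_energy_j_mol, temperature_k):
    """Arrhenius rate constant k = A * exp(-Ea / (R*T))."""
    return prefactor * math.exp(-activation_energy_j_mol / (R * temperature_k))


T_ROOM = 298.15  # assumed room temperature, K
A_MAX = tst_prefactor(T_ROOM)                       # roughly 6e12 s^-1
k_barrierless = arrhenius_rate(A_MAX, 0.0, T_ROOM)  # equals A_MAX
k_with_barrier = arrhenius_rate(A_MAX, 23e3, T_ROOM)
slowdown = k_barrierless / k_with_barrier           # roughly 1e4

print(f"k_B*T/h = {A_MAX:.2e} s^-1")
print(f"slowdown for Ea = 23 kJ/mol: {slowdown:.1e}")
```

With these assumptions the barrierless time constant 1/A is of the order of 100 fs, and the 23 kJ mol$^{-1}$ barrier moves it into the nanosecond range.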

The fastest bimolecular reactions are rate limited by the time it takes for reactants to diffuse toward one another. A 
typical diffusion-controlled bimolecular rate constant, $k_D$, is about $6 \times 10^{9}$ l mol$^{-1}$ s$^{-1}$ for uncharged reactants 
dissolved in water at room temperature. (This is slower by a factor of 50 than the corresponding rate constant for 
binary collisions in a gas because of solvent viscosity.) We can define a pseudo-first-order rate constant, $k'$, for the 
bimolecular reaction of species A with B if one reactant, A for instance, is present in excess: $k' = k_D[\mathrm{A}]$. This leads 
to a limiting (shortest) bimolecular reaction time constant of about half of a nanosecond for solution 
concentrations of A approaching 1 M, an estimate that will be proportionately slower for lower concentrations of 
A. (The gas phase estimate is about 100 picoseconds for A at 1 atm pressure.) This suggests that the great majority 
of fast bimolecular processes, e.g., ionic associations, acid-base reactions, metal complexations and ligand-enzyme 
binding reactions, as well as many slower reactions that are rate limited by a transition state barrier can be 
conveniently studied with fast transient methods. 
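
The concentration scaling of the diffusion limit can be tabulated directly. The sketch below assumes the rate constant quoted above, $6 \times 10^{9}$ l mol$^{-1}$ s$^{-1}$, and simply inverts the pseudo-first-order rate constant; the function name and the chosen concentrations are illustrative.

```python
K_DIFFUSION = 6e9  # diffusion-controlled rate constant, l mol^-1 s^-1 (from the text)


def pseudo_first_order_time(k_bimolecular, excess_conc_molar):
    """Time constant 1/(k_D*[A]) for reaction with an excess reagent A."""
    return 1.0 / (k_bimolecular * excess_conc_molar)


# Illustrative concentrations of the excess reagent, in mol/l.
for conc in (1.0, 1e-3, 1e-6):
    tau = pseudo_first_order_time(K_DIFFUSION, conc)
    print(f"[A] = {conc:.0e} M  ->  tau = {tau:.2e} s")
```

At 1 M the result is sub-nanosecond, in order-of-magnitude agreement with the estimate in the text, while millimolar and micromolar concentrations give time constants of roughly 0.2 microseconds and 0.2 milliseconds.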

The absorption of a photon initiating photophysical and photochemical processes can itself be an extremely rapid 

event (as short as -10 s, for instance, given the ~ 10 cm bandwidth typical of transitions in condensed phase 
polyatomic molecules and available in lasing media for ultrashort pulsed lasers). This has made light absorption a 
widely used trigger for fast kinetic studies. Ensuing photophysical and photochemical processes can take place on 
fast to ultrafast time scales. Unimolecular photophysical processes and their characteristic time scales include: 
fluorescence emission, $10^{-11}$ - $10^{-6}$ s; phosphorescence, $10^{-3}$ - $10^{2}$ s; internal conversion (spin-conserving 
nonradiative relaxation) from higher excited states to the lowest excited state, $10^{-14}$ - $10^{-11}$ s; internal conversion 
from the lowest excited state to the ground state, 10 - 10 s; and intersystem crossing (spin-changing 

11 o 

nonradiative relaxation) from the lowest excited singlet state to a triplet state, 10 - 10 s. Primary unimolecular 
photochemical processes, such as photodissociation into molecules, ions, or radicals, photo-isomerization or 
rearrangement and photoionization, proceed on excited state potential surfaces and typically are kinetically 
independent of temperature, as are primary bimolecular photochemical processes such as photodimerization, 
photoaddition, hydrogen atom abstraction and electron transfer to or from an excited acceptor or donor. (The 
kinetics of secondary, or dark, photochemical reactions proceeding from the products of primary photochemical 
processes will in general be dependent on temperature, however, as they take place on the ground state potential 
surface.) Primary photochemical processes generally compete kinetically with the photophysical processes of 
radiative and nonradiative relaxation as decay routes for the initially excited state. Additional bimolecular 
processes that may be generated by light excitation but which do not necessarily lead to permanent photoproducts 
include excimer and exciplex formation, the association of an excited species with a like or dissimilar ground state 
species, respectively, and quenching processes in which excitation energy is transferred to other species in either a 
contact or long-range interaction. As the above time scales suggest, current understanding of these photochemical 
and photophysical processes has benefited greatly from the application of fast time-resolved spectroscopic 
techniques. 

C3.1.3 TRANSIENT SPECTROSCOPY 

An apparatus for time-resolved spectroscopy can be schematically reduced to the basic elements shown in figure 
C3.1.1. The pump source in this figure is some device providing the perturbation needed to initiate changes in the 
sample to be studied. As discussed further below, this usually refers to a light source such as a laser when 
measurements are to be carried out on a fast time scale. However, this could refer to other types of perturbing 
device such as a stopped-flow apparatus, for example, to rapidly mix different reagents or a capacitor discharged 
across a sample cell to suddenly jump the sample temperature. 

[Figure C3.1.1: block diagram; recoverable labels: sample, probe, detector.] 

Figure C3.1.1. The basic elements of a time-resolved spectral measurement. A pump source perturbs the sample 
and initiates changes to be studied. Lasers, capacitive-discharge Joule heaters and rapid reagent mixers are some 
examples of pump sources. The probe and detector monitor spectroscopic changes associated with absorption, 
fluorescence, Raman scattering or any other spectral approach that can distinguish the initial, intermediate and final 
states in a reaction. 

A light source probes changes in the sample at various times after perturbation using some type of spectroscopy 
that can distinguish the initial reactant, intermediates and final product. A laser can be used to further excite the 
sample, producing fluorescence or Raman scattering that may be monitored as a function of time, for instance, or, 
alternatively, the absorption spectrum of the sample may be monitored using a variety of light sources that may be 
polarized or unpolarized, lasing or incoherent. Non-optical spectral techniques such as EPR [4] or NMR [5] can 
also be used to probe reaction dynamics, but in this chapter we will emphasize optical spectroscopies as these are 
most commonly used, particularly on fast time scales. 

Finally, the detection system in figure C3.1.1 represents some device to detect the changes in the spectral properties 
of the probe beam caused by perturbing the sample. This is typically a photoelectric detector to record light coming 
from the sample, such as fluorescence or Raman scattering, or to record intensity changes of the probe light source 
in the case of absorption measurements. Probe and detection strategies can involve measurements made one 
wavelength at a time, using devices such as photomultipliers or photodiodes to record an intensity change as a 
function of time, or can involve multispectral measurements using photodiode arrays or charge-coupled devices to 
measure entire spectra at some specific time following application of the perturbation to the sample. In many cases, 
the goal is to obtain the time evolution of an entire spectrum monitoring some process of interest. This is 
accomplished with a single wavelength instrument by monitoring the time evolution of a signal at some wavelength 
and repeating this measurement at different wavelengths. Alternatively, with a multi wavelength instrument, one 
measures the spectrum at a specific time and repeats the spectral measurement at various times. It is also possible 
to accomplish this goal in a single measurement 
using a streak camera, which records the spectral evolution over a range of wavelengths and delay times after a 
single perturbation. Streak cameras have been used mainly in ultrafast applications, however, as their high cost has 
tended to discourage applications to slower time regimes [6]. 


C3.1.4 RAPID MIXING 

One can study a slow (minutes or longer) chemical reaction by mixing two chemicals in the sample compartment 
of a standard UV-vis spectrophotometer and measuring the spectrum as a function of time. Though perhaps not 
often thought of as such, this is a form of transient spectroscopy, albeit a slow one. To carry out such a 
measurement for a reaction which is complete on a time scale of milliseconds to seconds one needs to mix the 
chemicals and measure the spectra much more rapidly. For gases, this can be done by releasing reactants into a 
discharge flow apparatus, where they are mixed by diffusion and turbulence while being carried down a tube in an 
inert carrier gas such as helium. This is not, strictly speaking, a transient kinetic method, however, as the progress 
in time of the reaction is measured by the steady state detection of concentration as a function of distance travelled 
down the tube. For liquids, achieving satisfactory mixing times (not to mention conserving reactant) usually 
requires the use of a stopped-flow apparatus, as in figure C3.1.2 in which chemicals are rapidly forced into a 
sample cuvette by syringes whose plungers are quickly actuated at a specific time. A probe detection system is 
triggered immediately after the sample is mixed in this transient technique. Conductance may be monitored in the 
case of ionic solutions, whereas spectrophotometry provides a more general method for determining 
concentrations. Electronic detection methods provide the time resolution needed (oscilloscopes and transient 
recorders with response frequencies up to a few GHz are available) to monitor the conductance or spectral changes 
that accompany the reaction taking place. The rapid mixing approach is limited to the study of reactions taking 
place on time scales of milliseconds or longer simply because it takes this long for mixing to occur (although 
ultrarapid techniques have been developed to mix reactants on a 100 microsecond time scale [7]). 

Figure C3.1.2. Stopped-flow apparatus with motor-driven syringes. Syringe plungers force the reactants A and B 
through a mixing chamber into a spectral cell. Kinetic data collection begins when the effluent syringe plunger is 
pushed out to contact an activation switch, about a millisecond after the initiation of mixing. (Adapted from Pilling 
M J and Seakins P W 1995 Reaction Kinetics (Oxford: Oxford University Press).) 

The example above of the stopped-flow apparatus demonstrates some of the requirements important for all forms 
of transient spectroscopy. These are the ability to provide a perturbation (pump) to the physicochemical system 
under study on a time scale that is as fast or faster than the time evolution of the process to be studied, the ability to 
synchronize application of the pump and the probe on this time scale and the ability of the detection system to time 
resolve the changes of interest. 


C3.1.5 RELAXATION SPECTROSCOPY 

How does one monitor a chemical reaction that occurs on a time scale faster than milliseconds? The two 
approaches introduced above, relaxation spectroscopy and flash photolysis, are typically used for fast kinetic 
studies. Relaxation methods may be applied to reactions in which finite amounts of both reactants and products are 
present at final equilibrium. The time course of relaxation is monitored after application of a rapid perturbation to 
the equilibrium mixture. An important feature of relaxation approaches to kinetic studies is that the changes are 
always observed as first order kinetics (as long as the perturbation is relatively small). This linearization of the 
observed kinetics means 
that useful information about reactions involving higher order kinetic mechanisms may be obtained in a relatively 
simple manner. To see why this is so, consider the reaction: 


\[ \mathrm{A} + \mathrm{B} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \mathrm{C} \]

The rate equation describing the kinetics of this reaction is 

\[ -\frac{d[\mathrm{A}]}{dt} = -\frac{d[\mathrm{B}]}{dt} = \frac{d[\mathrm{C}]}{dt} = k_1[\mathrm{A}][\mathrm{B}] - k_{-1}[\mathrm{C}]. \]

If the equilibrium concentrations for A, B and C are $a$, $b$ and $c$, respectively, the concentration changes resulting 
from the application of the perturbation will be 

\[ x = a - [\mathrm{A}] = b - [\mathrm{B}] = [\mathrm{C}] - c \]

and we can then reduce the rate equation to 

\[ \frac{dx}{dt} = k_1 (a - x)(b - x) - k_{-1}(c + x). \]

Expanding the right-hand side of this equation yields 

\[ k_1 ab - k_1 (a + b)x + k_1 x^2 - k_{-1} c - k_{-1} x. \]

However, since the rate of change of all components is zero at equilibrium, 

\[ k_1 ab - k_{-1} c = 0. \]

The perturbation being small, $x^2$ is negligible, so that 

\[ \frac{dx}{dt} = -[k_1 (a + b) + k_{-1}]\,x. \]

This shows that the observed rate for this process will follow first order kinetics, even though the reaction being 
studied is second order. Furthermore, both $k_1$ and $k_{-1}$ may be determined by observing the kinetics at different 
starting concentrations that vary the quantity $(a + b)$. 
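
The linearization can be tested numerically by integrating the full nonlinear rate equation and comparing it with a single exponential of rate $k_1(a+b) + k_{-1}$. The sketch below is illustrative only: the rate constants, concentrations and the 1% initial displacement are assumed values, and a hand-written fourth-order Runge-Kutta step is used in place of any particular library.

```python
import math

# Assumed, illustrative parameters (not taken from the text).
K1 = 1.0e6        # forward rate constant, M^-1 s^-1
K_MINUS1 = 1.0e2  # reverse rate constant, s^-1
A_EQ = 1.0e-4     # equilibrium concentration a, M
B_EQ = 1.0e-4     # equilibrium concentration b, M
X0 = 1.0e-6       # initial displacement x (1% of a), M


def relaxation_rate(k1, k_minus1, a, b):
    """Observed first-order rate 1/tau = k1*(a + b) + k_-1."""
    return k1 * (a + b) + k_minus1


def integrate_full(k1, k_minus1, a, b, x0, t_end, steps=2000):
    """RK4 integration of dx/dt = k1*(a-x)*(b-x) - k_-1*(c+x)."""
    c = k1 * a * b / k_minus1  # equilibrium condition k1*a*b = k_-1*c

    def rhs(x):
        return k1 * (a - x) * (b - x) - k_minus1 * (c + x)

    dt = t_end / steps
    x = x0
    for _ in range(steps):
        s1 = rhs(x)
        s2 = rhs(x + 0.5 * dt * s1)
        s3 = rhs(x + 0.5 * dt * s2)
        s4 = rhs(x + dt * s3)
        x += dt * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return x


rate = relaxation_rate(K1, K_MINUS1, A_EQ, B_EQ)
tau = 1.0 / rate
x_full = integrate_full(K1, K_MINUS1, A_EQ, B_EQ, X0, tau)
x_linear = X0 * math.exp(-1.0)  # first-order prediction after one tau
print(f"1/tau = {rate:.1f} s^-1, x_full/x_linear = {x_full / x_linear:.4f}")
```

For a displacement this small the two results agree to well under 1%; increasing `X0` toward `A_EQ` makes the neglected $k_1 x^2$ term visible as a departure from single-exponential behaviour.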

C3.1.5.1 T-JUMP 

In a discharge T-jump apparatus, a capacitor is discharged across a sample cell containing conductive solution as 
shown in figure C3.1.3 in order to rapidly increase the temperature through Joule heating. Because equilibrium 
constants generally depend on temperature, the reaction mixture is rapidly triggered to change according to the 
kinetic scheme of the reaction under study. The change is given by van't Hoff's law: 

\[ \left( \frac{\partial \ln K}{\partial T} \right)_P = \frac{\Delta H^\circ}{RT^2} \]

which predicts that the equilibrium constant changes by about 2% per degree at room temperature for a reaction 
with a $\Delta H^\circ$ value of 10 kJ mol$^{-1}$, for example. 
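
The temperature sensitivity of an equilibrium constant follows directly from the van't Hoff relation, $(\partial \ln K/\partial T)_P = \Delta H^\circ/RT^2$. The short sketch below evaluates it for an assumed temperature of 298.15 K; the function name is illustrative.

```python
R = 8.314462618  # gas constant, J mol^-1 K^-1


def dlnk_dt(delta_h_j_mol, temperature_k):
    """van't Hoff slope d(ln K)/dT = delta_H / (R*T^2), in K^-1."""
    return delta_h_j_mol / (R * temperature_k ** 2)


# Fractional change in K per kelvin for a 10 kJ/mol reaction enthalpy.
slope_10kj = dlnk_dt(10e3, 298.15)
print(f"d(ln K)/dT = {100 * slope_10kj:.2f} % per K")
```

The result is of the order of a percent or two per degree, and it scales linearly with the reaction enthalpy, so strongly endothermic or exothermic equilibria respond proportionately more to a given temperature jump.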


[Figure C3.1.3: recoverable label: electrodes.] 

Figure C3.1.3. Schematic diagram of Joule heating T-jump apparatus for transient spectroscopy. (Adapted from 
French T C and Hammes G G 1969 Methods Enzymol 16 3.) 

One can follow reactions of the order of microseconds or longer using a discharge T-jump. In a typical example, 
discharging 45 J of electrical energy into 10 cm$^3$ of aqueous solution raises $T$ by about 1 °C with an RC time 
constant of 1 µs for $R$ = 20 Ω (~0.5 M NaCl) and $C$ = 0.1 µF (charged to $3 \times 10^{4}$ V). 
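
The figures in this example are mutually consistent, as a short calculation shows. The sketch below assumes the solution has the density (1 g cm$^{-3}$) and specific heat (4.18 J g$^{-1}$ K$^{-1}$) of pure water, which are not stated in the text; the function names are illustrative.

```python
def stored_energy_j(capacitance_f, voltage_v):
    """Energy stored in a capacitor, E = C*V^2/2, in joules."""
    return 0.5 * capacitance_f * voltage_v ** 2


def temperature_rise_k(energy_j, volume_cm3,
                       density_g_cm3=1.0, heat_capacity_j_g_k=4.18):
    """Temperature rise if all the energy heats the solution uniformly.

    Density and heat capacity default to assumed values for pure water.
    """
    return energy_j / (volume_cm3 * density_g_cm3 * heat_capacity_j_g_k)


def rc_time_s(resistance_ohm, capacitance_f):
    """Voltage decay time constant R*C, in seconds."""
    return resistance_ohm * capacitance_f


energy = stored_energy_j(0.1e-6, 3e4)       # 0.1 uF charged to 30 kV
delta_t = temperature_rise_k(energy, 10.0)  # 10 cm^3 of solution
rc = rc_time_s(20.0, 0.1e-6)                # 20 ohm cell resistance
heating_time = rc / 2.0                     # dissipated power scales as V^2

print(f"stored energy = {energy:.1f} J")
print(f"temperature rise = {delta_t:.2f} K")
print(f"RC = {rc * 1e6:.1f} us, heating time constant = {heating_time * 1e6:.1f} us")
```

Because the dissipated power is proportional to the square of the decaying voltage, the temperature rises with a time constant of RC/2, which is 1 µs for these component values.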

C3.1.5.2 P-JUMP 


A sudden change in pressure can also be used to shift a chemical equilibrium, the change being given by the 
thermodynamic relation 

\[ \left( \frac{\partial \ln K}{\partial P} \right)_T = -\frac{\Delta V^\circ}{RT} \]

where $\Delta V^\circ$ is the standard molar volume change for the reaction. This approach is less general than the T-jump 
method because relatively few reactions have significant volume changes. Examples are found, however, in the 
ionic association and dissociation reactions, in which solvation and electrostriction effects may produce a $\Delta V^\circ$ of 
~10 cm$^3$ mol$^{-1}$. In this case, a $\Delta P$ of 50 atm is required to produce a 2% change in $K$ at room temperature. (Care 
must be taken in applying the constant T expression above for the change in K as it neglects the fact that a rapid P- 
jump is adiabatic and may be significantly non-isothermal, particularly in nonaqueous solvent.) Pressure jumps of 

the order of 10 atm with 10 s rise times may be obtained by rupturing a disc to suddenly admit a pressurized gas 
into a sample cell or, conversely, to suddenly release a pressurized gas from the cell [8, 9]. Differential P-jumps (~ 
150 atm) can also be applied to study the kinetics of reactions at absolute pressures up to 2500 atm [10]. Because of 
the frequent need for high sensitivity in P-jump experiments, concentrations are often monitored using conductive 
measurements. 
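
The pressure sensitivity quoted above can be reproduced from the isothermal relation $\Delta \ln K \approx -\Delta V^\circ \Delta P / RT$. The sketch below assumes a temperature of 298.15 K and reports only the magnitude of the change; the function name is illustrative.

```python
R = 8.314462618         # gas constant, J mol^-1 K^-1
PA_PER_ATM = 101325.0   # pascals per standard atmosphere
M3_PER_CM3 = 1.0e-6     # cubic metres per cubic centimetre


def fractional_k_change(delta_v_cm3_mol, delta_p_atm, temperature_k=298.15):
    """Magnitude of delta(ln K) = |delta_V * delta_P| / (R*T), isothermal."""
    delta_v = delta_v_cm3_mol * M3_PER_CM3
    delta_p = delta_p_atm * PA_PER_ATM
    return abs(delta_v * delta_p) / (R * temperature_k)


change = fractional_k_change(10.0, 50.0)
print(f"|delta ln K| = {100 * change:.1f} %")
```

A volume change of 10 cm$^3$ mol$^{-1}$ and a 50 atm jump give a change of about 2% in $K$, matching the figure in the text. The estimate ignores the adiabatic temperature change mentioned above.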


C3.1.5.3 E FIELD JUMP 


The rapid application of a very high electric field ($|E| \sim 10^{5}$ V cm$^{-1}$) can perturb chemical equilibria. This effect is 
described by the thermodynamic relation 

\[ \left( \frac{\partial \ln K}{\partial |E|} \right)_{T,P} = \frac{\Delta M^\circ}{RT} \]

where $\Delta M^\circ$ is the change in standard molar polarization of the reaction. $\Delta M^\circ$ is highest for reactions involving 
charge separations, e.g., ionic dissociations. Because of the resulting focus on conductive ionic reaction mixtures, 

pulsed fields of 10 s or shorter duration are typically used in order to limit Joule heating to acceptable levels. An 
interesting early application of the pulsed field technique was to measurement of the reaction rate constant for the 

prototypical proton transfer reaction H$^+$ + OH$^-$ → H$_2$O [11]. The value measured, $1.3 \times 10^{11}$ M$^{-1}$ s$^{-1}$, is more than 
an order of magnitude faster than a typical diffusion controlled rate constant, reflecting the anomalously rapid 
diffusion of protons through water. Electric field jumps are also used to measure nanosecond electro-optic 
relaxation time constants for the dipole reorientations of biological macromolecules [12]. 


C3.1.6 FLASH PHOTOLYSIS 

Laser-based pump strategies are generally necessary to study reactions taking place on time scales faster than 
microseconds. Lasers can be used to produce T-jumps on time scales faster than microseconds or to initiate 
reactions through rapid photochemical or photophysical processes. Lasers can also initiate ultrarapid mixing via a 
wide variety 
of 'caged' compounds that release reactants upon photolysis [13]. Caged compounds contain a reactant moiety 
whose active site is blocked by a photolabile group. The cage can be rapidly photodissociated to produce a sample 
with reactants already microscopically mixed. One then studies the reaction on a time scale limited only by the time 
required for photodissociation and microscopic diffusion of reactants rather than by the time needed to 
macroscopically mix reagents. A similar approach can be used to rapidly change the pH of a solution to initiate a 
reaction or change a pH-dependent equilibrium. This can be done with compounds that exhibit dramatically 
different p$K_a$ values in their ground and excited states [14], such as sulphonated phenols [15, 16]. One can change 
the pH of a solution by several units within nanoseconds with such an approach. The acid-base chemistry involved 
is usually reversible, however, leading to eventual loss of the pH jump, often within a millisecond after excitation. 
Protons produced by irreversible photochemistry can provide more persistent jumps for use in single shot or flow 
experiments. The photoconversion of o-nitrobenzaldehyde to nitrosobenzoic acid, for example, produces a pH 
jump within a microsecond that persists for tens of milliseconds [17]. 

Lasers can also be used to produce T-jumps on time scales of picoseconds or longer in condensed phases such as 
liquid solutions. An intense pulsed laser, tuned to a solvent absorption, heats the solvent molecules on the time 
scale of the exciting laser pulse after rapid radiationless decay of the initial photoexcitation and rapid intra- and 
intermolecular energy transfer between the solvent and solute molecules' degrees of freedom. Red or near-IR lasers 
are often frequency down-converted to reach the IR region where the solvent absorbs most efficiently. In aqueous 
solutions, for example, shifting the laser to the region around 1.5 µm results in strong absorption and temperature 
jumps of tens of degrees when the laser pulse is focused into a small volume. In a typical application, a 300 mJ 
pulse of 1.06 µm fundamental from a ns Nd:YAG laser is converted by a Raman shifter or OPO to a longer IR 
wavelength for efficient absorption by water. A conversion efficiency of 5%, for instance, produces 15 mJ of 

down-converted IR, which can be focused into a 1.5 x 10 cm absorbing volume to give a temperature jump of 
10°C. 

Reaction kinetics can be initiated most rapidly by the photoinitiation of a unimolecular reaction. With a sufficiently 
fast excitation pulse, the speed of such reactions depends simply on intramolecular rates of energy transfer and the 
reaction dynamics themselves, precisely the properties one is often interested in studying. Researchers can study 
the photophysical properties of molecules by monitoring the spectra (e.g., absorption, emission or Raman) and 
dynamics of the excited states, or study the photochemical properties of excited states that decay through reactive 
channels such as isomerization, bond cleavage or ligand dissociation. It is also possible to initiate bimolecular 
photochemical reactions but these generally will occur on slower time scales involving the diffusion of reactant 
molecules to form reactive complexes. 

The sensitivities of particular spectroscopic techniques to specific chemical features are described more fully in the 
next section. Perhaps the most common and versatile probes of reaction dynamics are time-resolved UV-vis 
absorption and fluorescence measurements. When molecules contain chromophores which change their structure 
directly or experience a change of environment during a reaction, changes in absorption or fluorescence spectra can 
be expected and may be used to monitor the reaction dynamics. Although absorption measurements are less 
sensitive than fluorescence measurements, they are more versatile in that one need not rely on a substantial 
fluorescence yield for the reactants, products or intermediates to be studied. 

Unfortunately, the low resolution absorption spectra characteristic of condensed phase molecules at room 
temperature frequently do not provide a lot of information about the physicochemical nature of intermediates. 
Thus, time-resolved absorption measurements are often useful to initially characterize the kinetic characteristics of 
a reaction, but other spectroscopic methods may also be useful in probing more subtle or structure-specific 
mechanistic features. In the many cases in which one would like to obtain more information about the structural 
features of intermediates 
than is available from absorption data, vibrational spectroscopies, including infra-red absorption measurements and 
Raman scattering, can be very useful. Often exquisitely sensitive to molecular structure, vibrational spectra contain 
much structural information, at least in principle, although their interpretation in terms of molecular structures may 
not always be straightforward in larger molecules because of spectral crowding. 

It is also possible to obtain more structural information than is usually available from absorption data by making 
measurements with polarized light. Looking at linear dichroism (LD), the difference in absorption between linearly 
polarized light oriented parallel or perpendicular to a reference axis, as a function of time can provide detailed 
information about changes in orientation of that chromophore during the course of a reaction. An LD reference axis 
is determined in the molecular frame by the transition dipole of the chromophore used to photoinitiate the reaction, 
and in the laboratory frame by the polarization of the exciting laser. Measurements of differences in refractive 
index for parallel or perpendicularly polarized light (linear birefringence, LB) provide similar information and can 
sometimes be measured with greater sensitivity. 

The use of circularly polarized light can provide additional structural information. Time-resolved 
measurements of circular dichroism (CD), the difference in absorption intensity between left and right circularly 
polarized light, or optical rotatory dispersion (ORD), proportional to the difference in refractive index between 
circular polarizations, can provide information on kinetic changes in molecular structures exhibiting asymmetry. 
Changes in the helical content of proteins during the course of their reactions can be monitored by time-resolved 
CD measurements, for instance. It is also possible to induce a circular dichroism in a sample by the application of a 
magnetic field. Magnetic circular dichroism (and magnetic ORD) provides information complementary to that from 
natural CD (ORD) measurements, as discussed further below. 


C3.1.7 SPECTROSCOPIC METHODS 

C3.1.7.1 ULTRAVIOLET-VISIBLE ABSORPTION 

Several strategies commonly used for time resolved optical absorption spectroscopy (TROA) are shown 
schematically in figure C3.1.4. Perhaps most common for microsecond to millisecond time-resolved measurements 
is the cw (continuous wave) probe approach (figure C3.1.4(a)). The probe can be a cw laser, provided a suitable 
wavelength is available, but cw xenon arc, tungsten filament or halogen lamps are most commonly used for UV- 
visible measurements, while glow bars have been used for IR measurements. Continuous probe sources offer the 
advantage of simplified timing — it is necessary to synchronize only the detection system and pump pulse. Two 
problems can offset this advantage, however. The first is the signal to noise ratio (S/N) of the data obtained with 
this method; the second is sample stability. 

Figure C3.1.4. Schematic measurement traces depicting different probe strategies for transient spectroscopy and 
their typical time scales. Each panel represents the intensity of a probe beam against time. The small Gaussian at 
the bottom left of each panel represents the excitation (pump) pulse, defining the zero of time. (a) Strategy using a 
constant, cw light source as probe. The top line represents the probe light in the absence of laser pump excitation 
and the curved line represents a changing intensity after excitation of the sample. Time scales: milliseconds to 
seconds. (b) A cw arc lamp as in panel (a) augmented by a capacitive discharge across the arc to enhance intensity 
by a factor of 10-100. Time scales: microseconds to milliseconds. (c) A pulsed flashlamp is an alternative to the 
pulsed cw lamp in panel (b) that produces high power but low total energy from the probe light source, allowing for 
measurements on a faster time scale: nanoseconds to microseconds. (d) Pulsed lasers used for both pump and probe 
sources. Note that the pump pulse (left) is shown enlarged to emphasize the fact that the pump pulse must be much 
larger than the probe pulse. The probe pulse decreases in magnitude when the pump pulse excites the sample, 
creating a transient absorption. The probe pulse in the presence of the pump pulse is drawn wider only to make it 
easier to see in the figure. Time scales: nanoseconds to seconds. (e) With gated multichannel detection, the 
flashlamp probe in panel (c) is observed at a particular time delay after pump excitation, rather than monitored as a 
function of time as in single-wavelength detection. The probe pulse is diminished by the presence of transient 
absorption created by pump excitation, as measured by the spectra of the probe with and without pump sampled 
over a small range of delay times indicated by the 'gate pulse'. The probe pulse and detector gate are kept 
overlapped as their joint time delay is varied to yield spectra as a function of time on time scales from nanoseconds 
to seconds. 


The S/N of any light intensity measurement varies as the square root of the intensity (number of photons) produced 
by the source during the time of the measurement. The intensities typical of xenon arc lamps are sufficient for 
measurements of reasonable S/N on time scales longer than about a microsecond. However, a cw lamp will 
produce few photons during measurement times faster than this and the signal will be noisy. This problem can be 
dealt with in very stable samples by averaging many signals, but this approach is impractical for samples of moderate or high 
photolability. In fact, the cw probe approach can be problematic even with little signal averaging if the light source 
continues to irradiate a photolabile sample between data collection cycles. A modification of the cw probe 
approach improving the S/N of measurements on a nanosecond to microsecond time scale is shown in figure C3.1.4(b). 
The intensity of the cw lamp can be greatly increased (typically by a factor of several hundred) for times up to 
milliseconds by discharging a capacitor across an arc lamp just before firing the pump source [18]. 
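
The square-root dependence of S/N on photon number explains both the difficulty with short measurement windows and the benefit of signal averaging or of boosting the lamp intensity. The sketch below assumes ideal shot-noise-limited detection and a purely hypothetical detected photon rate of $10^{12}$ photons per second; the function names are illustrative.

```python
import math


def shot_noise_snr(photon_count):
    """Shot-noise-limited S/N for a mean count N: N / sqrt(N) = sqrt(N)."""
    return math.sqrt(photon_count)


def snr_for_window(photon_rate_per_s, window_s, n_averages=1, intensity_gain=1.0):
    """S/N for a detection window, with optional averaging and lamp boost."""
    photons = photon_rate_per_s * intensity_gain * window_s * n_averages
    return shot_noise_snr(photons)


PHOTON_RATE = 1.0e12  # hypothetical detected photon rate, photons per second

snr_microsecond = snr_for_window(PHOTON_RATE, 1e-6)
snr_nanosecond = snr_for_window(PHOTON_RATE, 1e-9)
snr_ns_averaged = snr_for_window(PHOTON_RATE, 1e-9, n_averages=100)
snr_ns_boosted = snr_for_window(PHOTON_RATE, 1e-9, intensity_gain=100.0)

print(f"1 us window: S/N = {snr_microsecond:.0f}")
print(f"1 ns window: S/N = {snr_nanosecond:.1f}")
print(f"1 ns window, 100 averages or 100x intensity: S/N = {snr_ns_averaged:.0f}")
```

In this idealized picture a hundredfold increase in lamp intensity gives the same tenfold S/N improvement as averaging one hundred shots, but without exposing a photolabile sample to one hundred pump pulses.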

The use of pulsed light sources such as lasers for both pump and probe, as shown in figure C3.1.4(c), figure C3.1.4(d) 
and figure C3.1.4(e), can achieve high S/N while avoiding photostability problems. The time delay between the 
probe and pump laser pulses can be varied over a wide range in measuring time-resolved absorption or emission. 
Sample photostability problems are avoided by proper adjustment of the (potentially very high) probe intensity. 
However, each single-wavelength, single-time measurement must be repeated at a number of wavelengths to obtain 
a spectrum. This process is further repeated at a number of time points to map out time-resolved spectra. This can 
again lead to sample deterioration in less photostable samples and can be tedious even for stable samples. A 
variation of this approach avoids the problem by using broad band pulsed light sources such as a laser-pumped dye, 
or 'soup' of laser dyes [19], or flashlamps as probe sources for multichannel measurements. 

Figure C3.1.4(d) illustrates use of a pulsed xenon flashlamp probe source with a typical pulse width of several
microseconds. This has the advantage of providing very high peak power for nanosecond measurements with high 
S/N, but low integrated intensity so that samples need not be exposed to excessive light from the probe source. For 
single-wavelength kinetic measurements, however, flashlamps have the disadvantage of being able to probe 
transient absorptions only for the duration of the flash, i.e., for microseconds or less. 

All of the approaches described above can be used in a kinetic mode where the time evolution of absorption or
emission signals is measured at one wavelength. As identifying transient intermediates often requires observing
many wavelengths, an alternative is to replace photomultipliers or photodiodes with gated multichannel detectors
such as intensified diode arrays (figure C3.1.5) or intensified CCDs to measure entire spectra. While these detectors
can be used with cw sources, an optimal approach is to use gated multichannel detectors with flashlamps as
depicted in figure C3.1.4(e) [20, 21].

Figure C3.1.5. Schematic diagram of an intensifier-gated optical multichannel analyser (OMA) detector. The
detector consists of a microchannel plate (MCP) image intensifier followed by a 1024-channel Reticon photodiode
array. Light dispersed across the semitransparent photocathode ejects photoelectrons. These are accelerated toward
the entrance of the microchannels by the gate pulse. The photoelectrons collide with the channel walls to produce
secondary electrons, which are accelerated in turn by the MCP bias voltage to produce further collisions and
electron multiplication. Electrons leaving the microchannels are further accelerated by the phosphor bias voltage,
about 6 kV, until they strike the phosphor and produce light. Several hundred photons are produced for every MCP
electron, providing further gain. This light, intensified by a factor of 10 over the amount produced when the gate
pulse is off, is detected by the photodiode array. (From Lewis J W, Yee G G and Kliger D S 1987 Rev. Sci.
Instrum. 58 939-44.)

Figure C3.1.6 schematically shows the use of a flashlamp probe source to efficiently measure entire spectra at a 
given delay time. The flashlamp output extends from the UV into the IR spectral region. Multichannel detectors 
can typically measure intensities simultaneously at 500 to 1000 wavelengths over this range. The multichannel 
detector is gated on at the peak intensity of the flashlamp to provide maximum S/N for sampling times as short as 
2-5 ns. The probe light source and detector gate can be delayed together to measure spectra with constant S/N over 
a wide time range. It is still necessary to carry out experiments at multiple delay times with this method to obtain 
kinetic information, but delay times can often be logarithmically spaced, as shown in figure C3.1.7, so that fewer
delay times are needed than the number of wavelengths needed to accurately record the spectra of intermediates. This
is particularly true when using global fitting approaches, described below, which extract the maximum amount of 
information from time-resolved spectra. 
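
The logarithmic spacing of delay times mentioned above can be illustrated with a minimal sketch. The function name, the 10 ns to 1 s range and the choice of four delays per decade are assumptions made only for illustration; they are not taken from the instruments described here.

```python
import numpy as np


def log_spaced_delays(t_min, t_max, points_per_decade):
    """Return pump-probe delay times (s) spaced evenly in log10(time)."""
    if not 0 < t_min < t_max:
        raise ValueError("require 0 < t_min < t_max")
    n_decades = np.log10(t_max / t_min)
    n_points = int(round(n_decades * points_per_decade)) + 1
    return np.logspace(np.log10(t_min), np.log10(t_max), n_points)


# Hypothetical range: 10 ns to 1 s, four delays per decade.
delays = log_spaced_delays(10e-9, 1.0, points_per_decade=4)
print(len(delays), delays[0], delays[-1])
```

With these assumed settings, eight decades of time are covered by 33 delay times, far fewer than the 500 to 1000 wavelengths recorded simultaneously at each delay.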

Figure C3.1.6. Block diagram for nanosecond absorption apparatus using multichannel detection. (From Goldbeck
R A and Kliger D S 1993 Methods Enzymol. 226 147-77.)

Figure C3.1.7. Time-resolved optical absorption data for the Soret band of photolysed haemoglobin-CO showing
six first-order (or pseudo-first-order) relaxation phases, I-VI, on a logarithmic time scale extending from
nanoseconds to seconds. Relaxations correspond to geminate and diffusive CO rebinding and to intramolecular
relaxations of tertiary and quaternary protein structure. (From Goldbeck R A, Paquette S J, Bjorling S C and Kliger
D S 1996 Biochemistry 35 8628-39.)

C3.1.7.2 FLUORESCENCE 

Fluorescence, the spontaneous emission of light in a spin-allowed transition from an excited state to a lower energy 
or ground state, can provide a very sensitive means for detecting the concentrations of atomic and molecular 
species in transient kinetic studies. Because fluorescence is a null measurement, its sensitivity can be extraordinarily high. Indeed, it
is possible to detect the presence of a single molecule in solution through its laser-induced fluorescence (LIF) [22].
Sensitivity levels more typical of kinetic studies are of the order of 10 molecules cm$^{-3}$. A schematic diagram of
an apparatus for kinetic LIF measurements is shown in figure C3.1.8. A limitation of this approach is that only
relative concentrations are easily measured, in contrast to absorption measurements, which yield absolute 
concentrations. Another important limitation is that not all molecules have measurable fluorescence, as 
radiationless transitions can be the dominant decay route for electronic excitation in polyatomic molecules. 
However, the latter situation can also be an advantage in complex molecules, such as proteins, where a lack of 
background fluorescence allows the selective introduction of fluorescent chromophores as probes for kinetic 
studies. (Tryptophan is the only strongly fluorescent amino acid naturally present in proteins, for instance.) 


Figure C3.1.8. Schematic diagram of a transient kinetic apparatus using laser-induced fluorescence (LIF) as a
probe and a CO$_2$ laser as a pump source. (From Steinfeld J I, Francisco J S and Hase W L 1989 Chemical Kinetics
and Dynamics (Englewood Cliffs, NJ: Prentice-Hall).)

Determining a molecule's fluorescence lifetime ($\tau_F$), typically of the order of nanoseconds for strong emitters, is
frequently an object of transient kinetic study. A transient measurement of lifetime after excitation with a
nanosecond flash lamp or pulsed laser (inexpensive subnanosecond pulsed nitrogen lasers are available for this
purpose) can be accomplished by directly monitoring the time course of fluorescence intensity using a fast
photomultiplier and transient recorder or boxcar integrator or, less directly, by measuring the statistical distribution
of times between absorption and emission events (under low intensity illumination conditions) using a
time-correlated single-photon counting apparatus [23, 24]. In general, the fluorescence lifetime can be shorter than the
radiative lifetime ($\tau_R$), given by the Einstein B coefficient for the emissive transition (usually estimated from the
corresponding absorption band, as the same upper and lower states are usually connected by absorption and
emission), because radiationless transitions and intermolecular excited state quenching can compete kinetically
with emission in depopulating the excited state. A measurement of $\tau_F$ can thus be used to determine the total rate
($k_{NR}$) of nonradiative relaxation processes (internal conversion, intersystem crossing) and bimolecular quenching
($k_Q$): $k_{NR} + k_Q[Q] = 1/\tau_F - 1/\tau_R$, where [Q] is the concentration of the quenching molecule.
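
As a small numerical illustration of the relation between the two lifetimes and the combined nonradiative and quenching rate, $k_{NR} + k_Q[Q] = 1/\tau_F - 1/\tau_R$, the following sketch may help. The function name and the lifetimes used (2 ns and 10 ns) are hypothetical values chosen only for the example.

```python
def nonradiative_plus_quenching_rate(tau_f, tau_r):
    """Return k_NR + k_Q[Q] (s^-1) from fluorescence and radiative lifetimes (s)."""
    if tau_f <= 0 or tau_r <= 0:
        raise ValueError("lifetimes must be positive")
    if tau_f > tau_r:
        raise ValueError("fluorescence lifetime cannot exceed radiative lifetime")
    return 1.0 / tau_f - 1.0 / tau_r


# Hypothetical values: tau_F = 2 ns (measured), tau_R = 10 ns (from absorption).
rate = nonradiative_plus_quenching_rate(2e-9, 10e-9)
print(f"k_NR + k_Q[Q] = {rate:.2e} s^-1")
```

For these assumed lifetimes the nonradiative channels together deplete the excited state at roughly four times the radiative rate.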

Nonradiative relaxation and quenching processes will also affect the quantum yield of fluorescence, $\Phi_F = k_R/(k_R + k_{NR} + k_Q[Q])$.
Relative measurements of fluorescence quantum yield at different quencher concentrations are easily
made in steady state measurements; absolute measurements (to determine $k_{NR}$) are most easily obtained by
comparisons of steady state fluorescence intensity with a fluorescence standard. The usefulness of this situation for
transient studies of large molecules, such as biopolymers, lies in the ability to use the quasi-steady state fluorescence intensity of an
embedded chromophore as a probe of dynamic changes in the solvent exposure of the chromophore (through
changes in $k_{NR}$) or in long distance chromophore-chromophore quenching interactions (Förster energy transfer)
over the 10 to 1 s time regime. The rate of Förster (very weak coupling excitonic) energy transfer varies inversely
as the sixth power of the chromophore to chromophore distance, making the observed fluorescence intensity a
potentially sensitive ruler of intramolecular distances in static and time-resolved studies. Thus, for example,
time-resolved measurements of tryptophan near-UV fluorescence intensity under steady state illumination can be used to
monitor conformational relaxations in aqueous proteins after rapid mixing with a denaturant that disrupts secondary
and tertiary structure. As a protein's native structure unfolds, increased distance from a quenching group, such as
the haem prosthetic group in haem proteins, can dramatically increase the fluorescence yield. In proteins lacking an
internal quencher, increased exposure of tryptophan to the solvent enhances $k_{NR}$ and reduces the fluorescence
intensity.
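
The inverse sixth-power distance dependence of Förster transfer can be made concrete with a short sketch. It assumes the standard Förster expressions, which are not written out in the text: a transfer rate $(1/\tau_D)(R_0/r)^6$ and a transfer efficiency $1/[1 + (r/R_0)^6]$, where $R_0$ is the distance at which transfer and donor decay are equally fast. The value $R_0$ = 3 nm and the function names are hypothetical.

```python
def forster_transfer_rate(r, r0, tau_donor):
    """Transfer rate (s^-1), assuming k_T = (1/tau_donor) * (r0 / r)**6."""
    return (r0 / r) ** 6 / tau_donor


def forster_efficiency(r, r0):
    """Fraction of donor excitations transferred to the quencher."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def relative_donor_intensity(r, r0):
    """Donor fluorescence relative to its value with no quencher present."""
    return 1.0 - forster_efficiency(r, r0)


R0_NM = 3.0  # hypothetical Forster distance, nm
for r_nm in (1.5, 3.0, 6.0):
    print(r_nm, round(relative_donor_intensity(r_nm, R0_NM), 4))
```

In this sketch, doubling the separation from half of $R_0$ to $R_0$, and again to twice $R_0$, takes the donor intensity from almost fully quenched to almost unquenched, which is why the fluorescence yield responds so sharply when unfolding moves a tryptophan away from a haem group.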

C3.1.7.3 INFRARED ABSORPTION 

Time-resolved spectroscopy in the IR region (TRIR) can give detailed information about structural changes in 
molecules by monitoring changes in the frequencies and absorption amplitudes of vibrational normal modes. The 
selection rule for vibrational transitions allows for those modes whose motions produce a change in electric dipole 
moment to be IR active. Changes in bond strengths or molecular symmetry accompanying intramolecular processes 
such as isomerizations, and the binding or dissociation of ligands in complexes, are examples of transient events 
that may be studied with TRIR. An apparatus for TRIR is shown schematically in figure C3.1.9. Figure C3.1.10
shows the results of a TRIR study of the transient binding of CO to a copper atom in cytochrome c oxidase after
photodissociation of the Fe-CO bond in this haem protein [25]. Evolution in the secondary structure of proteins
may also be followed by measuring the TRIR of several vibrational bands arising from the peptide backbone: the
N-H stretching vibration (3300 cm$^{-1}$), the C=O stretch, or amide I band (1600-1700 cm$^{-1}$) and the N-H bending
vibration, or amide II band (1520-1550 cm$^{-1}$). TRIR of these bands has been used to directly monitor fast folding
and unfolding reactions in the protein RNase A [26] and in small peptides [27], for example.

Figure C3.1.9. Block diagram for time-resolved infrared spectroscopy apparatus. (From Dyer R B, Einarsdóttir Ó,
Killough P M, Lopez-Garriga J J and Woodruff W H 1989 J. Am. Chem. Soc. 111 7657-9.)

Figure C3.1.10. (a) Steady state IR difference spectrum (dark minus light) of cytochrome c oxidase CO complex
measured at low temperature (127 K). This protein contains a copper atom situated immediately adjacent to a haem
iron, the latter binding CO with high affinity at equilibrium. The band at 2061 cm$^{-1}$ shows the presence of a Cu-CO
bond in the intermediate species frozen immediately after photolysis (light) of the equilibrium, Fe-CO bonded
protein (dark). (b) The time evolution of the Cu-CO bonded intermediate after room temperature photolysis of the
Fe-CO protein complex. The CO is transferred to the Cu atom within femtoseconds of photodissociation of the Fe-CO
bond. The Cu-CO bond then thermally dissociates with a time constant of about 1.5 μs and freed CO diffuses
out of the protein. (The two frequencies outside the Cu-CO band provide control measurements.) (From Dyer R B,
Einarsdóttir Ó, Killough P M, Lopez-Garriga J J and Woodruff W H 1989 J. Am. Chem. Soc. 111 7657-9.)

The absorption intensities characteristic of IR transitions, the intensities of IR light sources and the sensitivities of
IR detectors are relatively low compared with those for visible and UV wavelengths. These factors present a
challenge for experimentalists wishing to accumulate data with signal to noise ratios sufficient to resolve
time-dependent changes. An additional factor is the wide presence of IR active transitions in polyatomic molecules.
While on the one hand this constitutes one of the primary advantages of the TRIR technique, it means on the other
hand that transient IR signals are often detected against interfering background absorptions from the solvent or
from peripheral modes in large molecules. For example, water, an important solvent in TRIR studies of biological
molecules, is also a strong IR absorber. Differential TRIR techniques are often used to overcome interference from
such background absorptions.

Instead of the blackbody-radiation light sources commonly used in dispersive IR spectrometers, time-resolved 
studies often use pulsed xenon flash lamps or tunable CW diode lasers, which concentrate IR output intensity in 
time or frequency space, respectively. TRIR has typically been measured in single-wavelength, kinetic mode, as IR 
sensitive OMAs are not yet widely available. Kinetic IR signals are collected with photoconductive detectors using 
materials such as indium antimonide (InSb) or mercury cadmium telluride (HgCdTe). However, complete
nanosecond TRIR spectra have been recorded using a dispersive scanning spectrometer with an HgCdTe detector 
[28], although the spectral accumulation times tend to be long. Another approach to spectral mode TRIR has been 
to combine Fourier transform techniques (FTIR) with time-resolved spectroscopy [29]. 

C3.1.7.4 RESONANCE RAMAN SCATTERING 

The Raman effect, the inelastic scattering of photons resulting in frequency shifts that reflect the vibrational and 
rotational energies of the scattering molecule, is used in transient spectroscopy to obtain time-dependent vibrational 
information that often complements that obtained from TRIR. Raman scattering arises from changes in the 
polarizability of a molecule associated with vibrational (and rotational) motions, rather than from changes in the 
dipole moment itself, as is the case in IR absorption spectroscopy. Raman selection rules therefore can in general 
be different from those for absorption. The two methods thus complement one another, particularly in small 
molecules. This is most true for molecules with a centre of symmetry. In this case, the selection rules are exactly 
complementary — all Raman active transitions are IR inactive, and vice versa. Raman spectroscopy also offers an 
advantage for transient studies of solutes in solvents such as water that are not strongly Raman active, e.g. transient 
studies of aqueous biomolecules. 

Bringing the frequency of the photon into near resonance with an electronic transition enhances the intensity of the 
inherently weak Raman scattering process. The ratio of Raman to Rayleigh scattering intensity, typically $10^{-9}$ to
10 , is increased in resonance Raman by one to two orders of magnitude to give ratios of 10 to 10 .
Time-resolved resonance Raman spectroscopy (TR$^3$) thus offers greater S/N ratios and higher time resolution for
transient studies. It also offers greater specificity in the time-resolved vibrational spectroscopy of large molecules,
as compared with TRIR spectroscopy, in that only the vibrational modes associated with the nuclear structure of
the resonant chromophore are enhanced. This is a particularly important advantage in very large molecules, such as
biopolymers, where many overlapping IR and Raman active modes may be present [30, 31 and 32].

Raman spectroscopy requires an intense, monochromatic light source. The field thus developed rapidly when lasers
became commercially available in the 1960s. Nanosecond TR$^3$ measurements are now performed with several
configurations using pulsed or CW lasers. A pump-probe two-pulse method giving both kinetic and spectral
information about dynamic processes is frequently used. Kinetic information is obtained in this method by varying
the delay time between pump and probe, as provided for by a digital delay generator in the apparatus shown in
figure C3.1.11. The ability to choose different pump and probe wavelengths in this method reduces the likelihood
of spectral artifacts. For nanosecond TR$^3$ measurements, the pump source is typically a low repetition rate, high
pulse-energy laser, often a Nd:YAG or excimer laser with a pulse duration of several nanoseconds. The probe light
source in a Raman measurement must also be intense: intense enough to generate detectable Raman scattering, but
not so intense as to photoinitiate changes in the sample (recall that the probe is in near resonance with an electronic
transition of a photoreactive molecule). The same laser types used as pumps are also used as probe sources,
although they are generally coupled with a dye laser, hydrogen, deuterium or methane Raman shifter or, more
recently, a nonlinear harmonic generation crystal or an optical parametric oscillator (OPO) to increase the selection
of available wavelengths.

Figure C3.1.11. Apparatus for pump-probe time-resolved resonance Raman spectroscopy. (From Varotsis C and
Babcock G T 1993 Methods Enzymol. 226 409-31.)

Discriminating the small amount of light intensity in the Raman-shifted lines from the intense Rayleigh-scattered
light of the probe laser places great demands on monochromator design, which must balance high wavelength
resolution and stray-light rejection against signal throughput. Single, double and triple monochromators are used in
TR$^3$ measurements for correspondingly increased rejection of Rayleigh scattering. The addition of each grating
typically introduces a throughput efficiency factor of roughly 30%, however, so that the overall efficiency of a
triple monochromator is only about 3%. Single 0.75-1 m monochromators have output efficiencies ideal for TR$^3$
experiments and may be used to measure Raman frequencies greater than 500 cm$^{-1}$. Notch filters combined with a
single monochromator give Rayleigh rejection factors comparable to a double monochromator while maintaining
greater throughput and are suitable for TR$^3$ measurements at frequencies as low as 50-100 cm$^{-1}$ [30].
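
The throughput figures quoted above follow from multiplying the per-grating efficiency once for each grating. A minimal sketch of that arithmetic, assuming the same 30% factor for every grating and ignoring all other losses (the function name is hypothetical), is:

```python
def monochromator_throughput(n_gratings, per_grating=0.30):
    """Overall throughput, assuming each grating passes the same fraction."""
    if n_gratings < 1:
        raise ValueError("at least one grating is required")
    return per_grating ** n_gratings


for n, name in enumerate(("single", "double", "triple"), start=1):
    print(f"{name}: {100 * monochromator_throughput(n):.1f}%")
```

Under this assumption a double monochromator passes 9% and a triple monochromator 2.7% of the light, consistent with the figure of about 3% given for the triple instrument.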

TR$^3$ signals are preferably detected with a time-gated optical multichannel analyser (OMA), which allows the
entire spectral change associated with each pump-probe pair to be recorded for a given time delay. Intensified
diode array detectors, with optical quantum efficiencies of about 20%, can be gated with a high voltage pulse to
capture the transient Raman signal of interest while suppressing interference from Rayleigh scatter and
fluorescence falling outside the time window of the gate pulse. CCD (charge-coupled device) detectors are 2D
arrays that have more recently become available for TR$^3$ spectroscopy. They offer the advantages of low readout
noise and higher quantum efficiencies in the IR region. Although CCDs are also more sensitive to artifacts from cosmic rays,
the narrow spikes these produce can be removed from affected data by using commercially available software.

C3.1.7.5 CIRCULAR DICHROISM AND OPTICAL ROTATORY DISPERSION 


Circular dichroism (CD), the differential absorption of left versus right circularly polarized light, is the polarization 
spectroscopy perhaps best suited to detecting the presence of asymmetry in the structure or environment of 
molecular chromophores. Various time-resolved CD (TRCD) methods have been developed to take advantage of 
this sensitivity and obtain more detailed structural information about kinetic processes than is found from ordinary 
time-resolved absorption measurements [33]. Some examples of the processes studied with TRCD methods are: the 
effects of electronic excitation on the structure of chiral inorganic complexes, the changes in α-helical secondary
structure accompanying the folding reactions of proteins and the time evolutions of tertiary and quaternary
structure in allosteric proteins, as reflected in the protein-environment-induced CDs of 'reporter' chromophores.

CD is a small effect. $\Delta\varepsilon/\varepsilon$, the ratio of the difference in circularly polarized extinction coefficients, $\Delta\varepsilon = \varepsilon_L - \varepsilon_R$, to
total absorption, $\varepsilon = \frac{1}{2}(\varepsilon_L + \varepsilon_R)$, is typically only about 10 - 10 . Because the effect is so small, measuring CD with
signal to noise ratios sufficient for static and time-resolved studies requires special methods, each representing a
different tradeoff between the factors, such as time resolution, sensitivity to artifacts and experimental simplicity,
that determine the method of choice for a particular kinetic study.

Kinetic CD measurements on slow time scales may be made on commercial CD instruments, which can be 
equipped for stopped-flow studies. Commercial CD instruments use rapid polarization modulation methods, 
introduced in the 1960s, and phase-locked detection to increase sensitivity. Linearly polarized light is passed 
through a photoelastic modulator (PEM), essentially a small quartz plate undergoing resonantly driven acoustic 
vibration, to produce time-varying elliptically polarized light cycling between left and right circular polarizations. 
The CD signal is detected as the AC component of the light intensity transmitted through the sample, normalized to 
the magnitude of the DC component and to a calibration factor (determined by measuring the CD of a standard 
substance) reflecting the relative gain of the AC and DC electronic amplification stages. Noise from instrumental 
sources, such as arc wander, containing frequency components lower than the modulator frequency is effectively 
filtered from the AC-modulated CD signal. However, the PEM resonant frequency, typically 1-100 kHz, not only 
sets an upper limit on the frequencies of the instrumental noise components suppressed, it also limits the maximum 
characteristic frequencies of the kinetic processes that may be studied. TRCD measurements on time scales faster 
than about a millisecond thus require unconventional methods. 

An ellipsometric approach to CD measurements used for TRCD spectroscopy in the nanosecond regime is depicted 
schematically in figure C3.1.12 [34]. The instrument used in this technique is based on a nanosecond Nd:YAG 
laser photolysis apparatus using a broad band microsecond xenon flash lamp as a probe. Rather than detecting the 
differential absorption of left and right circularly polarized light, the polarization state of elliptically polarized light 
is detected in this method, i.e., ellipsometry. The linearly polarized probe beam is passed through a strain plate, a
fused silica plate under slight mechanical compression, to produce highly eccentric elliptically polarized light. The CD
signal is detected as the difference between right and left elliptically polarized light intensity transmitted through
the sample and an analysing polarizer, normalized to the sum of the intensities and a proportionality factor
determined by the pathlength and concentration of the sample and the magnitude of the strain plate retardance, $\delta$.
(No calibration against a CD standard is required in this method.) This can be written as
$\Delta\varepsilon = (\delta/cl)\,(I_{REP} - I_{LEP})/(I_{REP} + I_{LEP})$. The primary advantage of the near-null approach is that the signal is effectively amplified relative to
the noise from instrumental sources by a factor inversely proportional to $\delta$. The tradeoff for this increased
sensitivity and time resolution is that more care must be taken to avoid potential interference from light scattering,
fluorescence and optical artifacts (particularly those from linear birefringence present in the sample or optics)
than is necessary for conventional PEM-based measurements. An instrument for nanosecond far-UV TRCD using
this method is shown schematically in figure C3.1.13.
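
The ellipsometric relation described above, read here as $\Delta\varepsilon = (\delta/cl)(I_{REP} - I_{LEP})/(I_{REP} + I_{LEP})$, can be sketched numerically to show why a small retardance amplifies the signal. The function names, the sample values and the unit convention for $\delta$ (radians, with any numerical prefactor omitted) are assumptions made for illustration only.

```python
def ellipsometric_signal(i_rep, i_lep):
    """Normalized difference of right and left elliptically polarized intensities."""
    total = i_rep + i_lep
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return (i_rep - i_lep) / total


def delta_epsilon(i_rep, i_lep, delta, conc, path):
    """CD as (delta / (c * l)) times the normalized intensity difference."""
    return (delta / (conc * path)) * ellipsometric_signal(i_rep, i_lep)


def expected_signal(d_eps, delta, conc, path):
    """Normalized signal produced by a given CD; scales as 1 / delta."""
    return d_eps * conc * path / delta


# Hypothetical sample: CD of 5 M^-1 cm^-1, 0.1 mM, 1 cm path, delta about 1 degree.
s = expected_signal(5.0, delta=0.0175, conc=1.0e-4, path=1.0)
print(s, delta_epsilon(1.0 + s, 1.0 - s, 0.0175, 1.0e-4, 1.0))
```

Halving the assumed retardance doubles the normalized signal for the same CD, which is the amplification by a factor inversely proportional to $\delta$ referred to in the text.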

Figure C3.1.12. Schematic diagrams of optical configurations for quasi-null detection techniques used in transient
kinetic studies of (a) CD, (b) MCD and (c) ORD/LD. Elliptical polarization is provided in (a) and (b) by a
horizontal prism polarizer followed by a fused silica plate of linear retardance ±$\delta$, magnitude ~1°, induced by
mechanical compression indicated by arrows. CD of the sample adds to or subtracts from the net ellipticity of the
beam detected by the vertical analysing polarizer, giving rise to the differential signal shown in the text. A solvent
blank in an opposed applied field cancels the Faraday rotation of solvent and cell in (b). The polarizer axis is
rotated to ±$\beta$ from horizontal in (c), where $\beta$ ~ 1°. ORD or LD of the sample adds to or subtracts from the net
rotation of the beam detected by the analysing polarizer, giving rise to a differential signal. (From Chen E,
Goldbeck R A and Kliger D S 1997 Annu. Rev. Biophys. Biomol. Struct. 26 325-53.)

Figure C3.1.13. Experimental configuration for far-UV nanosecond CD measurements using a frequency-upconverted
Ti:sapphire laser as a probe source. P$_1$ and P$_2$ are MgF$_2$ Rochon polarizers at crossed orientations. SP$_1$ is
a strained transparent plate with about 1° of linear birefringence for quasi-null ellipsometric CD detection. Prism
PM$_1$ and the iris I 7 select the far-UV fourth harmonic of the argon laser-pumped Ti:sapphire laser's near-IR
fundamental output to probe the ellipticity of the sample. A second laser beam at 532 nm is used to pump CD
transients in the sample. (From Goldbeck R A, Kim-Shapiro D B and Kliger D S 1997 Annu. Rev. Phys. Chem. 48
453-79.)

Ultrafast TRCD has also been measured in chemical systems by incorporating a PEM into the probe beam optics of 
a picosecond laser pump-probe absorption apparatus [35]. The PEM resonant frequency is very low (1 kHz) in 
these experiments, compared with the characteristic frequencies of ultrafast processes and so does not interfere 
with the detection of ultrafast CD changes. 

In principle, optical rotatory dispersion (ORD) and circular dichroism contain identical information about 
molecular structure and can be interconverted using the Kramers-Kronig integral transforms [36]. CD, being 
limited to absorption bands and easier to interpret than ORD, became the preferred approach to the study of 
optically active molecules after the development of PEM-based CD spectrometers. From an experimental point of 
view, however, optical rotation (OR), the rotation of the polarization plane of light by a chiral substance, is easier 
to measure than circular dichroism. This fact accounts for the historical importance of OR (the beginnings of 
chemical kinetics as a quantitative discipline, for instance, can be traced to the use of OR measurements to 
determine reaction rates for sucrose hydrolysis by Wilhelmy in 1850 [37]) and for the recent development of rapid
time-resolved ORD methods for kinetic studies. A near-null polarimetric ORD method has been incorporated into 
several generations of flash photolysis instruments developed over the past few decades for time-resolved 
applications extending into the nanosecond time regime [33]. (This method also doubles as a very sensitive 
technique for measuring linear dichroism in anisotropic samples and is useful for studies of orientational relaxation 
after laser photoselection [38].) 

C3.1.7.6 MAGNETIC CIRCULAR DICHROISM 

Magnetic circular dichroism (MCD) is independent of, and thus complementary to, the natural CD associated with 
chirality of nuclear structure or solvation. Closely related to the Zeeman effect, MCD is most often associated with 
orbital and spin degeneracies in chromophores. Chemical applications are thus typically found in systems where a 
chromophore of high symmetry is present: metal complexes, porphyrins and other aromatics, and haem proteins are 
prominent examples. Time-resolved MCD (TRMCD) spectroscopy is best suited to the study of kinetic processes
directly affecting chromophore electronic structure, e.g., spin, oxidation and ligation state changes in metal 
complexes and metalloporphyrins. TRMCD is measured by adding a magnet — permanent, electric or 
superconducting — to the TRCD instrumentation described above. This is straightforward to do for PEM-based 
instruments, such as the ultrafast MCD apparatus of Xie and Simon [39]. Ellipsometric TRMCD measurements, on 
the other hand, require an additional optical component to compensate for rotation of the probe beam's polarization 
orientation by the Faraday effect of the transparent solvent and cell windows in the magnetic field [33]. 


C3.1.8 ANALYSIS OF TIME-RESOLVED SPECTRAL DATA 

Transient kinetic studies measure a time-resolved record of some property of the sample, such as absorption, 
emission or conductance, that can be analysed for its kinetic components. The data are usually stored as a digital 
computer file containing a linear array of observations against time. In the case of spectroscopic measurements, this 
may be generalized to a rectangular array of spectra against time. The particular form of the analysis that is applied 
to the data in order to obtain rate constants and amplitudes is determined by knowledge about, or assumption of, a 
particular kinetic mechanism. Determining an unknown mechanism is often an iterative process in which possible 
models are tested until the most parsimonious mechanism consistent with the data is found. This mechanism can 
frequently be assumed to involve only first order or pseudo-first order rate processes in fast kinetic studies. In this
case, the analysis of single-wavelength absorption data may be as simple as a linear regression plot of
ln(absorption) against time, the slope of which (with its sign reversed) provides the rate constant for a simple exponential decay. More
complicated analysis, i.e., nonlinear least squares multi-exponential fitting of absorption against time, is required if
more than one process is present.
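
For the single-exponential case, the regression of ln(absorption) against time can be sketched as follows. The synthetic, noise-free data, the time unit (microseconds) and the function name are assumptions for illustration; real data would also need a baseline that decays to zero and strictly positive absorbance values for the logarithm to be defined.

```python
import numpy as np


def first_order_rate_constant(times, absorbance):
    """Fit ln(absorbance) = ln(A0) - k*t by linear least squares; return (k, A0)."""
    slope, intercept = np.polyfit(times, np.log(absorbance), 1)
    return -slope, np.exp(intercept)


# Synthetic single-exponential decay: hypothetical k = 2 per microsecond.
times_us = np.linspace(0.0, 2.0, 50)
k_true = 2.0
absorbance = 0.25 * np.exp(-k_true * times_us)

k_fit, a0_fit = first_order_rate_constant(times_us, absorbance)
print(k_fit, a0_fit)
```

When more than one exponential is present the plot of ln(absorption) against time is no longer a straight line, which is the practical signal that nonlinear multi-exponential fitting is needed.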

Multichannel time-resolved spectral data are best analysed in a global fashion using nonlinear least squares 
algorithms, e.g., a simplex search, to fit multiple first order processes to all wavelength data simultaneously. The 
goal in this case is to find the time-dependent spectral contributions of all reactant, intermediate and final product 
species present. In matrix form this is $\mathbf{A}(\lambda, t) = \mathbf{B}\mathbf{C}$, where A is the data matrix, rows indexed by wavelength and
columns by time, B contains spectra as columns and C contains time-dependent concentrations of all species 
arranged in rows. 

A general first order mechanism can be written symbolically in matrix form as 

$$\frac{d\mathbf{c}(t)}{dt} = \mathbf{K}\mathbf{c}(t)$$


where K is the matrix of rate constants and c(t) is a column vector of time-dependent concentrations. A general 
solution for the concentrations (found using eigenvalue techniques, for instance) is 


$$c_i(t) = \sum_j M_{ij}\,[\mathbf{M}^{-1}\mathbf{c}(t=0)]_j\,\exp[(\mathbf{M}^{-1}\mathbf{K}\mathbf{M})_{jj}\,t]$$


where K is diagonalized by a similarity transform with matrix M. An efficient approach to fitting the kinetic 
mechanism represented by the elements of K to the data in A is to first apply singular value decomposition (SVD) 
to A, $A = USV^{T}$, where S is a diagonal matrix of singular values, and U and V are orthonormal matrices containing in
this case spectral and temporal information, respectively (superscript T indicates matrix transpose) [40]. (SVD is
very similar to principal component analysis [41, 42].) The data are then filtered of random noise by discarding
singular values that fall below a value determined by the magnitude of the noise, leaving the truncated matrix $S_r$.
(The corresponding columns of U and V are also discarded to obtain $U_r$ and $V_r$.) The singular values remaining
correspond to those columns of U and V containing spectrokinetic information filtered of noise. Their number, r, is
the effective rank of A. Besides providing a convenient deconvolution of the data into spectral and temporal
components and filtering these into a more compact representation, SVD also provides physical information in that
r gives a lower bound on the number of independent spectrotemporal components, or kinetic species, present in the
system. Although the individual spectra in $U_r$ and evolutions in $V_r$ do not generally correspond to those of physical
species, observed rate constants can be efficiently fitted to the aggregate temporal information in $V_r$. These
constants correspond to the diagonal elements of $M^{-1}KM$, which we will call $K_{obs}$. In the simplest case, that of
simple decays proceeding in parallel, $K = K_{obs}$, i.e., M is the identity matrix, I. The elements of C are calculated
from the solution for $c_i(t)$ using $K_{obs}$ and known, or trial, values of $c_i(t=0)$. The spectra of intermediates are then
calculated from $B = U_r S_r V_r^{T}\,\mathrm{pinv}(C)$, where pinv is the matrix pseudoinverse (essentially a least squares solution
to the overdetermined inversion problem). Finally, any assumptions about initial concentrations can be checked by
comparing B against known model spectra. For more general first order kinetic mechanisms, M is calculated from
$M^{-1}KM = K_{obs}$ and knowledge of, or assumptions about, branching ratios or equilibrium constants for any back
reactions that may be present. Such assumptions, if necessary, can be tested by comparing the calculated spectra in
B to known model spectra. (It may be expedient to obtain the observed rate constants, without determining a
mechanism, by simply choosing all $c_i(t=0) = 1$ and setting M = I, in which case the spectra in B are sometimes
referred to as 'b-spectra' in the literature.) The derivation of first order mechanisms from transient spectral data is
discussed in more detail in [43].
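The SVD-based procedure described above can be sketched numerically. The following is a minimal illustration, not the implementation used in the cited work: it assumes NumPy, a hypothetical sequential scheme X -> Y -> Z with made-up rate constants and Gaussian model spectra, and noise-free synthetic data. The function names are chosen here for illustration only.

```python
import numpy as np

def first_order_concentrations(K, c0, times):
    """Solve dc/dt = K c via the eigen-decomposition K = M diag(k_obs) M^-1."""
    k_obs, M = np.linalg.eig(K)
    weights = np.linalg.solve(M, c0)                 # M^-1 c(t=0)
    # c_i(t) = sum_j M_ij [M^-1 c(0)]_j exp(k_obs_j t)
    C = (M * weights) @ np.exp(np.outer(k_obs, times))
    return np.real(C)                                # rows: species, columns: time

def truncated_svd(A, noise_level):
    """Keep only singular values above the noise level; their number is the effective rank r."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > noise_level))
    return U[:, :r], s[:r], Vt[:r, :]

def species_spectra(U_r, s_r, Vt_r, C):
    """B = U_r S_r V_r^T pinv(C): least-squares spectra for trial concentrations C."""
    return (U_r * s_r) @ Vt_r @ np.linalg.pinv(C)

# Synthetic example (all numbers hypothetical): sequential scheme X -> Y -> Z
k1, k2 = 5.0, 1.0
K = np.array([[-k1, 0.0, 0.0],
              [ k1, -k2, 0.0],
              [0.0,  k2, 0.0]])
times = np.linspace(0.0, 5.0, 200)
wavelengths = np.linspace(400.0, 700.0, 50)
C_true = first_order_concentrations(K, np.array([1.0, 0.0, 0.0]), times)
B_true = np.stack(
    [np.exp(-((wavelengths - centre) / 30.0) ** 2) for centre in (450.0, 550.0, 650.0)],
    axis=1,
)
A = B_true @ C_true                                  # data matrix: wavelength x time

U_r, s_r, Vt_r = truncated_svd(A, noise_level=1e-8)
B_fit = species_spectra(U_r, s_r, Vt_r, C_true)      # recovers the model spectra
```

In a real analysis the rate constants in K would be varied by a nonlinear least squares routine until the modelled concentrations reproduce the temporal information in $V_r$; here the true concentrations are reused only to show the final pseudoinverse step.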

The Arrhenius relation given above for the temperature dependence of an elementary reaction rate is used to find 
the activation energy, $E_a$, and the pre-exponential factor, A, from the slope and intercept, respectively, of a (linear)
plot of $\ln(k(T))$ against $T^{-1}$. The standard enthalpy and entropy changes of the transition state (at constant
temperature and pressure) can be found for most types of reaction from

$$\Delta H^{\ddagger} = E_a - RT$$

and

$$\Delta S^{\ddagger} = R \ln\left(\frac{A h}{k_B T}\right) - R.$$


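These expressions are modified in the case of non-unimolecular gas phase reactions to

$$\Delta H^{\ddagger} = E_a - RT(1 - \Delta n^{\ddagger})$$

and

$$\Delta S^{\ddagger} = R \ln\left(\frac{A h}{k_B T}\right) - R(1 - \Delta n^{\ddagger})$$

where $\Delta n^{\ddagger} = P\Delta V^{\ddagger}/RT$, $\Delta V^{\ddagger}$ being the standard volume of activation. The free energy of activation is found
from $\Delta G^{\ddagger} = \Delta H^{\ddagger} - T\Delta S^{\ddagger}$.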

The presence of nonlinearity in an Arrhenius plot may indicate the presence of quantum mechanical tunnelling at 
low temperatures, a compound reaction mechanism (i.e., the reaction is not actually elementary) or the 'unfreezing' 
of vibrational degrees of freedom at high temperatures, to mention some possible sources. 
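The Arrhenius analysis described above can be illustrated with a short numerical sketch. It assumes NumPy, a unimolecular reaction with A in s^-1, and synthetic rate constants generated from hypothetical values of the activation energy and pre-exponential factor; the function names are illustrative only.

```python
import numpy as np

R = 8.314462618          # gas constant, J mol^-1 K^-1
H_PLANCK = 6.62607015e-34  # Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J K^-1

def arrhenius_fit(T, k):
    """Linear fit of ln k against 1/T; slope = -E_a/R, intercept = ln A."""
    slope, intercept = np.polyfit(1.0 / np.asarray(T), np.log(k), 1)
    return -slope * R, np.exp(intercept)

def activation_parameters(E_a, A, T):
    """Enthalpy, entropy and free energy of activation (unimolecular or solution case)."""
    dH = E_a - R * T
    dS = R * np.log(A * H_PLANCK / (K_B * T)) - R
    dG = dH - T * dS
    return dH, dS, dG

# Synthetic data from hypothetical parameters E_a = 50 kJ/mol, A = 1e13 s^-1
T = np.array([280.0, 290.0, 300.0, 310.0, 320.0])
k = 1.0e13 * np.exp(-5.0e4 / (R * T))

E_fit, A_fit = arrhenius_fit(T, k)
dH, dS, dG = activation_parameters(E_fit, A_fit, 298.15)
```

Curvature in the residuals of such a fit is one practical way to spot the non-Arrhenius behaviour mentioned above.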
C3.2 Electron transfer reactions

Gilbert C Walker and David N Beratan


C3.2.1 INTRODUCTION 

C3.2.1.1 WHAT IS CHEMICAL ELECTRON TRANSFER? 

Chemical reactions involve the redistribution of electronic charge and changes in chemical bonding. Chemical 
bonds can reorient, disconnect, and reform during chemical reactions. These bonding changes are coupled, in turn, 
to changes in the structure (orientation, hydrogen bonding, polarization etc.) of the surrounding solvent. Bonding 
changes can be subtle or dramatic. The subject of this chapter is the special class of reactions in which an electron 
is displaced by distances much larger than the length of a single chemical bond; such reactions are known as 
'electron-transfer reactions'.

Electron transfer reactions are conceptually simple. The coupled structural changes may be modest, as in the case
of 'outer-sphere' electron transport processes. Other electron transfer processes result in bond formation or
cleavage (inner-sphere electron transfer) and are more complex. Despite their apparent simplicity, outer-sphere
electron-transfer reactions play a central role in chemistry, from current flow at electrodes to the early events in
photosynthesis and to radiation damage in DNA. Theories that predict the relationship between chemical structure,
solvation, spectroscopy and electron transfer rates were developed extensively over the last 50 years; they provide a
valuable unifying thread in this field of research [1].

C3.2.1.2 THE DIVERSITY OF CHEMICAL ET SYSTEMS 

Much of this chapter concerns ET reactions in solution. However, gas phase ET processes are well known too (see
figure C3.2.1). The 'harpoon mechanism' by which halogens oxidize alkali metals is fundamentally an electron
transfer reaction [2]. One might guess, from this simple reaction, some of the structural parameters that control ET 
rates: relative electron affinities of reactants, reactant separation distance, bond length changes upon 
oxidation/reduction, vibrational frequencies, etc. 
Figure C3.2.1. A slice through the intersecting potential energy curves associated with the K + Br$_2$ electron transfer
reaction. At the crossing point between the curves ($R_c$), electron transfer occurs, thus 'harpooning' the Br$_2^-$ species,
which is then associated with K$^+$ at shorter distances. From [2].

Much of the motivation for examining ET processes comes from biology. Many processes in bioenergetics involve 
transmembrane electron transport. The cascade of biological reactions that leads to an electrochemical gradient 
starts with electron transfer [3]. In the photosynthetic charge separation of purple bacteria, for example, a 
photoexcited state of a magnesium chlorophyll 'special pair' undergoes picosecond time-scale electron transfer to a 
pheophytin and then to a quinone [4]. These redox-active species are imbedded in a membrane-spanning protein 
whose structure was determined in 1984. These reactions are the subject of intense experimental and theoretical 
interest. 

In the 1980s, considerable attention turned to ET reactions in fixed donor-acceptor geometries with the goal of 
understanding the control of the fixed distance biological reactions. Locking in the donor-acceptor separation 
distance simplifies the interpretation of measured ET kinetics [5], by removing uncertainties associated with 
intermolecular motion and docking. The simplest approach is to freeze donors and acceptors in a matrix [6], 
generating an ensemble of fixed distances. The interpretation was simplified even further by synthesizing
covalently linked donor-bridge-acceptor molecules [7, 8] (figure C3.2.2). Indeed, unimolecular ET structures have
been studied in considerable depth in both gas and solution-phase environments [9].
Figure C3.2.2. Long-range charge separation occurs from the $S_1$ state of the rigidly bridged molecule above. This
figure shows predominantly emission from the charge separated state. Absence of local emission from the $S_1$ state
indicates that ET occurs on a subnanosecond time-scale. Curves: (1) R = -H, (2) R = -CH$_3$, and (3) R = -OCH$_3$.
From Wegewijs B and Verhoeven J W 1999 Long-range charge separation in solvent-free donor-bridge-acceptor
systems Adv. Chem. Phys. 106 248.

Electron transport processes at surfaces often involve electron-tunnelling transport. For example, in the scanning 
tunnelling microscope (STM) (see section B1.19 and figure C3.2.3 ), electrons flow between delocalized initial and 
final states. Depending on the experimental design, the tunnelling can proceed through vacuum or through attached 
atoms and molecules. In closely related photochemical experiments, electrons are driven from a delocalized 
electrode state to a localized molecular species or another electrode through an 'insulating' molecular monolayer 
[10]. 

In solid state materials, single-step electron transport between dopant species is well known. For example,
electron-hole recombination accounts for luminescence in some materials [11]. Multistep hopping is also well known.
Models for single and multistep transport are enjoying renewed interest in the context of DNA electron transfer
[12, 13, 14 and 15]. Indeed, there are strong links between the ET literature and the literature of hopping
conductivity in polymers [16].
Figure C3.2.3. Schematic view of a scanning tunnelling microscope. From Chen C J 1993 Introduction to
Scanning Tunnelling Microscopy (Oxford: Oxford University Press). 


Consideration of the donor-bridge-acceptor systems just mentioned, in the presence of optical excitation, raises the 
question of whether the photoexcited state might have substantial charge transfer character. Indeed, excitation to 
charge-transfer excited states is observed in inorganic complexes; these transitions are known as metal-ligand, 
ligand-metal, and intervalence charge transfer bands. Intervalence bands are particularly diagnostic of bridge- 
mediated donor-acceptor interactions. The extent of excited state charge transfer is especially important in species 
such as the 'special pair' of chlorophylls in photosynthesis, as polarization of this excited state may influence its 
rate of ET [17]. 

Much of chemistry occurs in the condensed phase; solution phase ET reactions have been a major focus for theory 
and experiment for the last 50 years. Experiments, and quantitative theories, have probed how reaction free energy,
solvent polarity, donor-acceptor distance, bridging structures, solvent relaxation, and vibronic coupling influence 
ET kinetics. Important connections have also been drawn between optical charge transfer transitions and thermal 
ET. 

C3.2.1.3 ET IN BIOLOGY 

A substantial fraction of the named enzymes are oxido-reductases, responsible for shuttling electrons along 
metabolic pathways that reduce carbon dioxide to sugar (in the case of plants), or reduce oxygen to water (in the 
case of mammals). The oxido-reductases that drive these processes involve a small set of redox active 'cofactors', 
that is, small chemical groups that gain or lose electrons. These cofactors include iron porphyrins, iron-sulfur 
clusters and copper complexes as well as organic species that are ET active. 

Many key protein ET processes have become accessible to theoretical analysis recently because of high-resolution 
x-ray structural data. These proteins include the bacterial photosynthetic reaction centre [18], nitrogenase
(responsible for nitrogen fixation), and cytochrome c oxidase (the terminal ET protein in mammals) [19, 20].
Although much is understood about ET in these molecular machines, considerable debate persists about details of 
the molecular transformations. 


C3.2.1.4 ET TECHNOLOGY: MEDICAL DIAGNOSTICS AND NANOSCALE ELECTRONICS 

In addition to conventional applications in conducting polymers and electrooptical devices, a number of recent 
novel applications have emerged. Switching of DNA electron transfer upon single-strand/double-strand 
hybridization forms the basis for a new medical biosensor technology. Since the number of distinct base sequences of
length 20 is $4^{20}$ (or about $10^{12}$), a modest length DNA sequence, with ET properties that switch upon hybridization, can be
employed to detect the presence of a complementary sequence. Many research efforts are engaged in devising ET- 
based biosensors to detect human disease and to sense the contamination of foodstuffs. 

Small molecules are of the order of nanometres in linear dimensions. Conventional microelectronics technology 
employs features fully a hundred to a thousand times larger. As such, considerable interest is focused upon 
employing molecular species (together with the rules of molecular ET) to devise ultra-small computing
devices. Examples of recent advances include the demonstration of molecular-scale diodes (figure C3.2.4),
prototypes for molecular scale memories, and single-electron devices [21, 22]. Remarkable physics arises in these
devices. For example, in devices with dimensions of the order of the electron wavelength, conductivity is 
quantized; and current-voltage relations follow a stair-step pattern, rather than a simple linear relationship. 
Figure C3.2.4. Plot of the log of photocurrent against number of methyl units in an alkylsilane based monolayer
self-assembled on a silicon electrode. The electrode is immersed in a solution with an electron donor. Best fits of
experimental data collected at different light intensities: (•) 0.3 mW cm$^{-2}$; (!) 0.05 mW cm$^{-2}$. From [10].


C3.2.2 ET THEORY AND EXPERIMENT 

This section presents the basic theoretical principles of condensed phase electron transport in chemical and 
biochemical reactions. 
C3.2.2.1 ADIABATIC ET THEORY


Modern electron transfer theory has its conceptual origins in activated complex theory, and in theories of 
nonradiative decay. The analysis by Marcus in the 1950s provided quantitative connections between the solvent 
characteristics and the key parameters controlling the rate of ET. The Marcus theory predicts an adiabatic 
bimolecular ET rate as 


$$k_{ET} = A \exp\left(-\frac{\Delta G^{*}}{k_B T}\right) \tag{C3.2.1}$$


where A is a collision frequency and $\Delta G^{*}$ is the free energy of activation for the donor-acceptor ET process when
the redox species are in contact. Marcus theory leads to an 'intersecting parabola model' which results from the
assumption that the distribution of nuclear configurations about the equilibrium is Gaussian in the distortion
coordinate. In such a case, one finds that the activation energy is

$$\Delta G^{*} = \frac{(\Delta G + \lambda)^2}{4\lambda} \tag{C3.2.2}$$


Here $\lambda$ is the 'reorganization energy' associated with the curvature of the reactant and product free energy wells
and their displacement with respect to one another. Assuming a structureless polarizable medium, Marcus
computed the solvent or outer-sphere component of the reorganization energy to be

$$\lambda_o = (\Delta q)^2 \left(\frac{1}{\varepsilon_o} - \frac{1}{\varepsilon_s}\right) \left(\frac{1}{2a_1} + \frac{1}{2a_2} - \frac{1}{R}\right) \tag{C3.2.3}$$

where $\Delta q$ is the amount of charge transferred, $\varepsilon_o$ is the optical dielectric constant, $\varepsilon_s$ is the static dielectric constant,
$a_1$ is the donor radius, $a_2$ is the acceptor radius, and R is the donor-acceptor distance. Note that the outer-sphere
reorganization energy is always positive, and grows with donor-acceptor separation. For distances much larger than
the donor/acceptor species size, the dependence of $\lambda_o$ on separation distance is weak. Reorganization energy is
larger in polar solvents (compared to nonpolar solvents), where the difference between the reciprocals of the optical
and static dielectric constants is large. Reorganization energies are now computed routinely for much more
complex media with contacts between regions with high and low dielectric constants.
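As a rough numerical illustration of the adiabatic picture, the sketch below evaluates the two-sphere dielectric continuum estimate of the outer-sphere reorganization energy, the parabolic activation free energy and the resulting rate. It assumes energies in eV, lengths in Å, and the conversion $e^2/4\pi\varepsilon_0 \approx 14.4$ eV Å; the water-like dielectric constants and radii are hypothetical inputs, and the function names are illustrative.

```python
import math

K_B_EV = 8.617333262e-5     # Boltzmann constant, eV K^-1
E2_EV_ANGSTROM = 14.3996    # e^2 / (4 pi eps0), eV Angstrom

def outer_sphere_lambda(dq, eps_op, eps_s, a1, a2, R):
    """Two-sphere continuum estimate; dq in units of e, radii and R in Angstrom, result in eV."""
    return (E2_EV_ANGSTROM * dq ** 2
            * (1.0 / eps_op - 1.0 / eps_s)
            * (1.0 / (2.0 * a1) + 1.0 / (2.0 * a2) - 1.0 / R))

def activation_free_energy(dG, lam):
    """Intersecting-parabola result: (dG + lambda)^2 / (4 lambda), all in eV."""
    return (dG + lam) ** 2 / (4.0 * lam)

def adiabatic_rate(A, dG, lam, T=298.0):
    """k = A exp(-dG_act / k_B T), with A the collision frequency."""
    return A * math.exp(-activation_free_energy(dG, lam) / (K_B_EV * T))

# Hypothetical example: one electron, water-like solvent, 3.5 Angstrom radii in contact
lam_contact = outer_sphere_lambda(1.0, 1.78, 78.4, 3.5, 3.5, 7.0)
lam_far = outer_sphere_lambda(1.0, 1.78, 78.4, 3.5, 3.5, 10.0)
```

With these inputs the reorganization energy comes out on the order of 1 eV and increases with separation, in line with the qualitative statements above.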

C3.2.2.2 NONADIABATIC ET THEORY 

In many instances the adiabatic ET rate expression overestimates the rate by a considerable amount. In some 
circumstances simply forming the activated state geometry in the encounter complex does not lead to ET. This
situation arises when the donor and acceptor groups are very weakly coupled electronically, and the reaction is said 
to be nonadiabatic. As the geometry of the system fluctuates, the species do not move on the lowest potential 
energy surface from reactants to products. That is, fluctuations into activated complex geometries can occur 
millions of times prior to a productive electron transfer event. 
In this weakly coupled regime, ET in an encounter complex can be described approximately using a two-level
system model [23]. As such, the time-dependent wave function is

$$\Psi(t) = \Phi_D \cos\left(\frac{T_{DA} t}{\hbar}\right) - i\,\Phi_A \sin\left(\frac{T_{DA} t}{\hbar}\right) \tag{C3.2.4}$$


where $\Phi_D$ represents the donor wave function ($\Phi_A$ for the acceptor) and $T_{DA}$ is the donor-acceptor coupling. This
coupling can be enhanced by mixing of the D and A states with each other via intervening bridge orbitals. Note that
amplitude localized on donor or acceptor oscillates sinusoidally in time (neglecting relaxation processes) with a
frequency determined by the strength of the donor-acceptor coupling, $T_{DA}$. Fermi's golden rule of time-dependent
perturbation theory can be used to compute the rate of ET based upon the short-time evolution of the system:

$$k_{ET} = \frac{2\pi}{\hbar} |T_{DA}|^2 \frac{1}{\sqrt{4\pi\lambda k_B T}} \exp\left[-\frac{(\Delta G + \lambda)^2}{4\lambda k_B T}\right] \tag{C3.2.5}$$

Here we have treated the nuclear degrees of freedom classically as in the Marcus formulation [1].
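A small numerical sketch may help fix the orders of magnitude. It assumes the standard high-temperature (classical nuclei) golden-rule expression, $k = (2\pi/\hbar)|T_{DA}|^2 (4\pi\lambda k_B T)^{-1/2} \exp[-(\Delta G+\lambda)^2/4\lambda k_B T]$, with energies in eV; the coupling and energies used are hypothetical, and the function names are illustrative.

```python
import math

HBAR_EV_S = 6.582119569e-16   # reduced Planck constant, eV s
K_B_EV = 8.617333262e-5       # Boltzmann constant, eV K^-1

def donor_population(T_DA, t):
    """Two-level model without relaxation: probability of finding the electron on the donor."""
    return math.cos(T_DA * t / HBAR_EV_S) ** 2

def nonadiabatic_rate(T_DA, dG, lam, T=298.0):
    """Golden-rule rate with classical nuclei; T_DA, dG and lam in eV, result in s^-1."""
    kT = K_B_EV * T
    franck_condon = math.exp(-(dG + lam) ** 2 / (4.0 * lam * kT)) / math.sqrt(4.0 * math.pi * lam * kT)
    return 2.0 * math.pi / HBAR_EV_S * T_DA ** 2 * franck_condon

# Hypothetical activationless case: 1 meV coupling, lambda = 1 eV, dG = -1 eV
k_activationless = nonadiabatic_rate(1.0e-3, -1.0, 1.0)
# Time for complete transfer D -> A in the isolated two-level system
t_transfer = math.pi * HBAR_EV_S / (2.0 * 1.0e-3)
```

The quadratic dependence of the rate on $T_{DA}$ is the feature that makes the distance dependence of the coupling, discussed next, so important.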

C3.2.2.3 TUNNELLING BARRIERS 

The new challenge that arises in making predictions of nonadiabatic electron transfer rates is to determine the 
electronic coupling element, $T_{DA}$. Simple orbital tunnelling analysis predicts that if (a) the donor is delocalized
over $N_D$ orbitals and the acceptor is delocalized over $N_A$ orbitals, (b) the average of the donor and acceptor orbital
energies is 2 eV removed from the bridging levels (based upon the electronic absorption properties of proteins),
(c) the donor-acceptor distance $R_{DA}$ is large (measured edge-to-edge between donor and acceptor in Å), and (d) the
donor-acceptor interaction at 'contact' is 1 eV:

$$T_{DA}(\mathrm{eV}) = \frac{1}{\sqrt{N_A}} \frac{1}{\sqrt{N_D}} (2.7) \exp[-0.72 R_{DA}] \propto \exp[-(\beta/2) R_{DA}]. \tag{C3.2.6}$$


This conjecture was made for protein ET systems in particular [24]. It was later found that the exponential decay 
constant, $\beta/2$, varies strongly as a function of bridging orbital symmetry and donor/acceptor energetics in both
proteins and in smaller model compounds [25]. Considerable theoretical and experimental efforts have gone into 
determining this average decay parameter, and many studies of rigid donor-bridge-acceptor systems were 
motivated by a desire to address this question. 
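The square-barrier estimate quoted above is easy to evaluate directly. The sketch below simply codes the expression with prefactor 2.7 eV and decay constant 0.72 Å$^{-1}$ for the coupling (so that rates, which scale as the coupling squared, decay with $\beta$ = 1.44 Å$^{-1}$); the function name and the example distances are illustrative assumptions.

```python
import math

def square_barrier_coupling(R_DA, N_D=1, N_A=1):
    """Coupling in eV for an edge-to-edge distance R_DA in Angstrom (average-barrier estimate)."""
    return 2.7 / math.sqrt(N_D * N_A) * math.exp(-0.72 * R_DA)

# Ratio of rates (proportional to coupling squared) for two hypothetical distances
rate_ratio = (square_barrier_coupling(10.0) / square_barrier_coupling(15.0)) ** 2
```

Because a single average decay constant is assumed, this estimate cannot distinguish between bridges of different structure, which is the limitation the pathway model addresses.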

The distance decay of tunnelling through vacuum is more rapid than the decay for tunnelling through bond. 
Through space, the 'barrier' to tunnelling is essentially the binding energy of the donor/acceptor states. However, 
in a bridged system this barrier is generally much smaller, determined by the energy gap between the 
donor/acceptor states, the energies of the bridging orbitals, and the interaction strength among the orbitals. In large 
complex systems, the strength of this interaction is estimated from the tunnelling pathway model according to the 
formula [26]

$$T_{DA} = P \prod_i \varepsilon_i \tag{C3.2.7}$$

where $\varepsilon_{bond} = 0.6$, $\varepsilon_{space} = 0.6 \exp[-1.7(R - 1.4\ \text{Å})]$ and $\varepsilon_{H\text{-}bond} = 0.36$. These factors are chosen in such a
way that the 'pathway product' of equation (C3.2.7) joining the donor and acceptor sites in a protein or
protein-protein complex is a maximum. Because there is a unique decay factor ($\varepsilon$) for each contact defined in a protein
x-ray or NMR structure, the 'strongest' pathway can be determined by a relatively simple graph-search algorithm.
The prefactor P can be chosen as in the square-barrier model described above (equation (C3.2.6)), or can be fitted
to the experiment. Figure C3.2.5 shows the strongest pathways determined for a family of ruthenium
modified cytochromes c. Pathway analysis also predicts that the average decay of coupling with distance depends
upon protein secondary structure [26]. Ongoing studies utilizing modern quantum chemical methods are 'summing
up' the large number of pathway contributions to the donor-acceptor coupling in an effort to make predictions of
increasing quantitative reliability [27].
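The graph search mentioned above can be sketched as a shortest-path problem: maximizing a product of decay factors is equivalent to minimizing the sum of their negative logarithms. The following minimal example uses Dijkstra's algorithm on a toy contact graph; the node labels, the contacts and the helper names are hypothetical, and the decay factors are the covalent, hydrogen-bond and through-space values quoted in the text (the search assumes every factor is at most 1).

```python
import heapq
import math

BOND_DECAY = 0.6
HBOND_DECAY = 0.36

def space_decay(R):
    """Through-space decay factor for a gap of R Angstrom."""
    return 0.6 * math.exp(-1.7 * (R - 1.4))

def add_contact(graph, a, b, eps):
    """Register an undirected contact with decay factor eps (0 < eps <= 1)."""
    graph.setdefault(a, []).append((b, eps))
    graph.setdefault(b, []).append((a, eps))

def strongest_pathway(graph, donor, acceptor):
    """Return (product of decay factors, list of nodes) for the strongest pathway."""
    best = {donor: 0.0}            # smallest accumulated -ln(eps) found so far
    previous = {}
    heap = [(0.0, donor)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == acceptor:
            break
        if cost > best.get(node, math.inf):
            continue               # stale heap entry
        for neighbour, eps in graph.get(node, []):
            new_cost = cost - math.log(eps)
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    if acceptor not in best:
        return 0.0, []
    path = [acceptor]
    while path[-1] != donor:
        path.append(previous[path[-1]])
    return math.exp(-best[acceptor]), path[::-1]

# Toy graph: three covalent steps compete with one covalent step plus a 3 Angstrom jump
graph = {}
add_contact(graph, "D", "a", BOND_DECAY)
add_contact(graph, "a", "b", BOND_DECAY)
add_contact(graph, "b", "A", BOND_DECAY)
add_contact(graph, "D", "c", BOND_DECAY)
add_contact(graph, "c", "A", space_decay(3.0))

product, path = strongest_pathway(graph, "D", "A")
```

In this toy case the all-covalent route wins even though it has more steps, which illustrates why a through-space gap on the dominant pathway is so costly.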
Figure C3.2.5. Strongest tunnelling pathways between surface histidines and the iron atom in cytochrome c. Steps
in pathways are denoted by solid lines (covalent bonds), dashed lines (hydrogen bonds), and dotted lines
(through-space contacts). Electron transfer distance to His 72 is 5 Å shorter than in His 66, yet the two rates are approximately
the same. The long-distance through space contact on the dominant pathway of His 72 accounts for this dramatic
effect. From Langen R, Chang I J, Germanas J P, Richards J H, Winkler J R and Gray H B 1995 Science 268 1733.

The pathway model makes a number of key predictions, including: (a) a substantial role for hydrogen bond 
mediation of tunnelling, (b) a difference in mediation characteristics as a function of secondary and tertiary 
structure, (c) an intrinsically nonexponential decay of rate with distance, and (d) pathway specific 'hot and cold 
spots' for electron transfer. These predictions have been tested extensively. The most systematic and critical tests 
are provided with ruthenium-modified proteins, where a synthetic ET active group can be attached to the protein 
and the rate of ET via a specific medium structure can be probed (figure C3.2.5). 
The predictive power of pathway analysis is well illustrated with two of the Ru-modified systems of Gray and
coworkers [29]. Consider the His 72 and His 39 ruthenium-modified cytochromes c [28]. The ET rates in these
proteins are about the same, despite the fact that the transfer distance is fully 5 Å shorter in the His 72 derivative.
Average square barrier models with $\beta = 1.4$ Å$^{-1}$ (equation (C3.2.6)) would predict the His 72 rate to be 1000 times
faster. This equivalency of rates despite the great difference in distances is understood because the strongest
pathways in the His 72 derivative contain a through-space tunnelling gap.
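The factor of 1000 follows from a one-line estimate, assuming the rate decays as $\exp(-\beta R)$ with all other factors (driving force, reorganization energy, prefactor) equal for the two derivatives:

```latex
\frac{k_{\mathrm{His\,72}}}{k_{\mathrm{His\,39}}}
  = \exp\left[\beta\,(R_{39} - R_{72})\right]
  = \exp\left(1.4\ \text{\AA}^{-1} \times 5\ \text{\AA}\right)
  = e^{7} \approx 1.1 \times 10^{3}
```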

A large body of rate data in native and modified proteins was analysed recently in the context of the pathway
model. Correcting for differences in activation free energies in different proteins ($\Delta G^{*}$), rates fall into alpha-helical
and beta-sheet 'zones' when plotting ET rate against distance in a wide range of native and modified proteins [29]
(figure C3.2.6), consistent with one of the most fundamental predictions of the pathway model [26]. The average
decay exponents in the two regions are approximately 1.1 Å$^{-1}$ ($\beta$-sheet) and 1.4 Å$^{-1}$ ($\alpha$-helix), consistent with the
pathway predictions.
Figure C3.2.6. Zones associated with the distinctive decay of electronic coupling through $\alpha$-helical against $\beta$-sheet
structures in proteins. Points shown refer to specific rates in ruthenium-modified proteins and in the photosynthetic
reaction centre. From Gray H B and Winkler J R 1996 Electron transfer in proteins Ann. Rev. Biochem. 65 537.


In addition to testing predictions of the pathway model in proteins, experiments have also examined the prediction
that the decay across a hydrogen bond (from heteroatom to heteroatom) should be about as costly as the decay
across two covalent bonds. Indeed, by synthesizing a family of hydrogen bonded and covalently bonded systems
with equal bond counts (according to this recipe), it was demonstrated that coupling across hydrogen bonded
contacts is about as favourable as across covalent bonds [30] (figure C3.2.7).
Figure C3.2.7. A series of electron transfer model compounds with the donor and acceptor moieties linked by
(from top to bottom): (a) a hydrogen bond bridge; (b) all sigma-bond bridge; (c) partially unsaturated bridge. 
Studies with these compounds showed that hydrogen bonds can provide efficient donor-acceptor interactions. 
From Piotrowiak P 1999 Photoinduced electron transfer in molecular systems: recent developments Chem. Soc. 
Rev. 28 143-50. 

C3.2.2.4 BRIDGE ORBITAL SYMMETRY EFFECTS IN CHEMICAL SYSTEMS 

The simplest theoretical orbital-based estimate of the coupling interaction, $T_{DA}$, is provided by the McConnell
relation:

$$T_{DA} \propto \left(\frac{V}{\Delta E}\right)^{N} \tag{C3.2.8}$$

Rewriting $T_{DA}$ as an exponential,

$$\beta = -\frac{2}{a} \ln\left|\frac{V}{\Delta E}\right| \tag{C3.2.9}$$

where V is the interaction between neighbouring bonds in the bridge, N is the number of bonds in the bridge, $\Delta E$ is
the energy gap between the redox active D/A orbitals and the bridge mediating bonds, and a is the distance between
the neighbouring bonds. In a system that mediates coupling by more than one set of orbitals, this expression can be
generalized. Nevertheless, equation (C3.2.9) provides a rough means of describing the physical aspects of bridge
mediation (figure C3.2.8).
Figure C3.2.8. Dependence of the donor-acceptor coupling decay per bond, $|\varepsilon$ per bond$|$, upon tunnelling energy.
Average exponential decay parameter, $\beta$, is related to this decay per bond by $\beta = -(2/R_{unit}) \ln|\varepsilon$ per bond$|$ for
periodic bridges. $R_{unit}$ is the spacing between repeating units in the bridge. Decay of the coupling with distance is
softest ($|\varepsilon$ per bond$|$ is closest to 1) for tunnelling energies near the frontier orbital energies of the bridge (which lie
at -6 and +7 eV in this figure). From D N Beratan and J N Onuchic 1991 Electron Transfer in Inorganic, Organic,
and Biological Systems (Advances in Chemistry Series 228) ed J R Bolton, N Mataga and G McLendon
(Washington, DC: ACS Press).

Note that $\beta$ is large in the limit that V is small compared to $\Delta E$. However, as the coupling strength increases or as
the energy gap drops, $\beta$ can become smaller. In fact, as the donor/acceptor states approach to within thermal energies (kT) of
the bridge orbital energies, the golden rule (equation (C3.2.5)) treatments of the problem are no longer appropriate,
and more elaborate models are called for. These models would include the possibility of multistate hopping and 
would need to consider the time-scales of hopping compared to the time-scales of thermal trapping on the 
individual sites. 
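The trend just described can be checked with a few lines of code. The sketch assumes the superexchange form in which the coupling scales as $(V/\Delta E)^N$ for N bridge bonds of spacing a, so that the rate decay constant is $\beta = -(2/a)\ln|V/\Delta E|$; the numerical values of V, $\Delta E$ and a are hypothetical, and the function names are illustrative.

```python
import math

def mcconnell_coupling(V, dE, N, prefactor=1.0):
    """Superexchange coupling through N bridge bonds, proportional to (V / dE)^N."""
    return prefactor * (V / dE) ** N

def decay_constant(V, dE, a):
    """beta = -(2 / a) ln|V / dE|, with a the spacing between neighbouring bonds."""
    return -(2.0 / a) * math.log(abs(V / dE))

# Hypothetical bridge: V = -1 eV, gap 4 eV, 1.4 Angstrom per bond
beta_weak = decay_constant(-1.0, 4.0, 1.4)
# Stronger inter-bond interaction (or smaller gap) gives a softer decay
beta_strong = decay_constant(-2.0, 4.0, 1.4)
```

Note that with a negative V and positive $\Delta E$ the sign of the coupling alternates with N, which is the origin of the interference effects between pathways of different length.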

The McConnell relation does not provide quantitative estimates of electronic propagation because (a) it does not 
include the influence of antibonding orbitals, (b) it neglects 'through-space' nonnearest neighbour interactions, and 
(c) it does not include contributions from multiple interfering coupling pathways. 

It is now understood that inclusion of nearest and second-neighbour interactions is adequate for describing
tunnelling interactions in many bridged systems [31, 32]. Moreover, tunnelling interactions have been dissected for
various bridge structures. This kind of analysis permits one to understand the nature of constructive and destructive
interference interactions that arise in specific bridging species. The central nature of interference can be understood
from the decay term $V/\Delta E$ in the McConnell relation. In a one-orbital model V is a negative number (a 'resonance'
integral in the language of Hückel theory) but $\Delta E$ is positive (the D/A states lie at energies above the HOMO of the
bridge). These V elements are readily calculated using quantum chemical methods (figure C3.2.9). As such, as
longer-length pathways are introduced, they will make contributions with sign that oscillates as their contributions
drop in magnitude [31].



Figure C3.2.9. Both nearest neighbour and nonnearest neighbour coupling interactions mediate superexchange 
between the terminal pi-electron groups of rigid dienes with saturated bridging units. From [31]. 

Inclusion of coupling contributions from both bonding and antibonding orbitals gives rise to a U-shaped
dependence of coupling on D/A energetics ( figure C3.2.8 ). 

C3.2.2.5 INNER-SPHERE REORGANIZATION ENERGY 

Whether adiabatic or nonadiabatic, it is the case that both solvent and intramolecular degrees of freedom respond to
ET events. As such, the two rate expressions given above can be generalized such that

$$\lambda = \lambda_o + \lambda_i \tag{C3.2.10}$$

where $\lambda_o$ is the outer sphere (solvent) contribution to the reorganization energy and $\lambda_i$ is the intramolecular or inner
sphere contribution to the reorganization energy.




Rate formulations that treat the inner-sphere mode(s) quantum mechanically and the outer sphere modes classically
are used rather widely. The rate expression for a single harmonic quantum mode is

$$k_{ET} = \frac{2\pi}{\hbar} |T_{DA}|^2 \frac{1}{\sqrt{4\pi\lambda_o k_B T}} \sum_{n=0}^{\infty} \frac{e^{-S} S^n}{n!} \exp\left[-\frac{(\Delta G + \lambda_o + n\hbar\omega)^2}{4\lambda_o k_B T}\right] \tag{C3.2.11}$$

where S is the inner sphere reorganization energy divided by the energy of the vibrational quantum ($\hbar\omega$) and the
other terms are as defined above. This expression is used for interpreting experimental rate data. The major
qualitative effect of the quantum mode is to slow the drop off of the rate in the inverted region (where $-\Delta G > \lambda$;
see section C3.2.2.6). ET rates have been formulated to include the effects of multiple quantum modes and anharmonic
potentials; rates valid beyond the golden rule regime have been established as well [32].
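The effect of a quantum mode on the inverted region can be seen numerically. The sketch below assumes the widely used single-mode semiclassical form, in which the classical Gaussian factor is replaced by a Poisson-weighted sum over vibrational quanta of the product state, and compares it with the fully classical rate at the same total reorganization energy. Parameter values (in eV) are hypothetical, chosen to resemble those quoted for rigid donor-spacer-acceptor systems; the function names are illustrative.

```python
import math

HBAR_EV_S = 6.582119569e-16   # reduced Planck constant, eV s
K_B_EV = 8.617333262e-5       # Boltzmann constant, eV K^-1

def classical_rate(T_DA, dG, lam, T=298.0):
    """Golden-rule rate with all nuclear modes treated classically."""
    kT = K_B_EV * T
    gauss = math.exp(-(dG + lam) ** 2 / (4.0 * lam * kT))
    return 2.0 * math.pi / HBAR_EV_S * T_DA ** 2 * gauss / math.sqrt(4.0 * math.pi * lam * kT)

def semiclassical_rate(T_DA, dG, lam_o, lam_i, hw, T=298.0, n_max=50):
    """One quantum mode (quantum hw, reorganization lam_i) plus classical solvent (lam_o)."""
    kT = K_B_EV * T
    S = lam_i / hw                                   # Huang-Rhys factor
    prefactor = 2.0 * math.pi / HBAR_EV_S * T_DA ** 2 / math.sqrt(4.0 * math.pi * lam_o * kT)
    total = 0.0
    for n in range(n_max + 1):
        weight = math.exp(-S) * S ** n / math.factorial(n)   # Franck-Condon weight of n quanta
        total += weight * math.exp(-(dG + lam_o + n * hw) ** 2 / (4.0 * lam_o * kT))
    return prefactor * total

# Deep in the inverted region: -dG = 2.0 eV exceeds lambda_o + lambda_i = 1.2 eV
hw = 0.186                                           # about 1500 cm^-1 expressed in eV
k_quantum = semiclassical_rate(1.0e-3, -2.0, 0.75, 0.45, hw)
k_classical = classical_rate(1.0e-3, -2.0, 1.20)
```

For these inputs the quantum-mode rate is larger than the classical one, reflecting the slower fall-off in the inverted region.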

C3.2.2.6 FREE ENERGY-RATE RELATIONS

The forms of the classical (equation (C3.2.5)) and semiclassical (equation (C3.2.11)) rate equations are 'energy gap
laws'. That is, the equations reflect a free energy dependent rate. In contrast with many physical organic reactivity
indices, these rates are predicted to increase as $-\Delta G$ grows, and then to drop when $-\Delta G$ exceeds a critical value. In
the classical limit, $\log(k_{ET})$ has a parabolic dependence on $-\Delta G$. When high-frequency chemical bond vibrations
couple to the ET process, the dependence on $-\Delta G$ becomes asymmetrical, as mentioned above.
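The critical value can be made explicit in the classical limit. Assuming the rate has the Gaussian form $k_{ET} \propto \exp[-(\Delta G + \lambda)^2 / 4\lambda k_B T]$, differentiating with respect to the driving force gives:

```latex
\ln k_{ET} = \text{const} - \frac{(\Delta G + \lambda)^2}{4 \lambda k_B T},
\qquad
\frac{\partial \ln k_{ET}}{\partial(-\Delta G)} = \frac{\Delta G + \lambda}{2 \lambda k_B T}
\;
\begin{cases}
> 0, & -\Delta G < \lambda \quad \text{(normal region)} \\
= 0, & -\Delta G = \lambda \quad \text{(maximum rate)} \\
< 0, & -\Delta G > \lambda \quad \text{(inverted region)}
\end{cases}
```

The rate is therefore largest when the driving force equals the reorganization energy, and the parabola in $\log(k_{ET})$ is symmetric about that point.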

A tremendous effort was made in the 1980s to test the prediction of an inverted region. Covalently linked donor- 
acceptor species were constructed in such a way that the energies of the donor and acceptor groups (ionization 
potential of donor, electron affinity of acceptor) could be changed to vary $\Delta G$ (figure C3.2.10). Modelled loosely
on photosynthetic ET systems, many of the structures contained porphyrin electron donors and quinone acceptors.
Utilizing an essentially rigid chemical bridge removed ambiguity associated with a distribution of distances (and
hence of $T_{DA}$ values). Many practical limitations make mapping of the inverted region challenging, but there are
now several examples of inverted behaviour for charge separation. Inverted behaviour was recently investigated in
charge recombination reactions, which prove particularly amenable to study [33].
Figure C3.2.10. (a) Dependence of electron transfer rate upon reaction free energy for ET between biphenyl radical
anions and various organic acceptors. Experiments were performed with the donors and acceptors frozen into
organic (methyltetrahydrofuran) glasses. Parameters: 10 s; $\lambda_s$ = 0.4 eV; $\lambda_v$ = 0.4 eV; $\omega$ = 1500 cm$^{-1}$. From Miller J R
1987 New J. Chem. 11 83. (b) Dependence of electron transfer rate upon reaction free energy for ET between
biphenyl radical anions and various organic acceptors attached to each other by a rigid spacer, measured in
methyltetrahydrofuran solution. Parameters: $\lambda_s$ = 0.75 eV; $\lambda_v$ = 0.45 eV; $\omega$ = 1500 cm$^{-1}$; V = 6.2 cm$^{-1}$. From Closs G L and
Miller J R 1988 Intramolecular electron transfer in organic molecules Science 240 440.
C3.2.2.7 MARCUS CROSS-RELATION


A powerful application of outer-sphere electron transfer theory relates the ET rate between D and A to the rates of
self exchange for the individual species. Self-exchange rates correspond to electron transfer in D/D$^-$ ($k_{11}$) and A/A$^+$
($k_{22}$). These rates are related through the cross-relation to the D/A electron transfer reaction by the expression

$$k_{12} = (k_{11} k_{22} K_{12} f_{12})^{1/2} \tag{C3.2.12}$$

where $K_{12}$ is the equilibrium constant of the cross reaction and $f_{12}$ is a value that is often of order unity [34].


The cross relation has proven valuable to estimate ET rates of interest from data that might be more readily 
available for individual reaction partners. Simple application of the cross-relation is, of course, limited if the 
electronic coupling interactions associated with the self exchange processes are drastically different from those for 
the cross reaction. This is a particular concern in protein/protein ET reactions where the coupling may vary 
drastically as a function of docking geometry. 
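
A minimal numerical sketch of how the cross-relation is typically applied follows. It assumes the standard form $k_{12} = (k_{11} k_{22} K_{12} f_{12})^{1/2}$ with $\ln f_{12} = (\ln K_{12})^2 / [4 \ln(k_{11} k_{22}/Z^2)]$; the collision frequency `Z`, the function name and the input values are illustrative assumptions, not data from the text.

```python
import math

def cross_relation_rate(k11, k22, K12, Z=1.0e11):
    """Estimate the cross-reaction rate constant from self-exchange rates.

    k11, k22 : self-exchange rate constants (M^-1 s^-1)
    K12      : equilibrium constant of the cross reaction
    Z        : assumed bimolecular collision frequency (M^-1 s^-1)
    Returns (k12, f12).
    """
    # f12 corrects for the driving force; it tends to 1 as K12 tends to 1.
    ln_f12 = math.log(K12) ** 2 / (4.0 * math.log(k11 * k22 / Z**2))
    f12 = math.exp(ln_f12)
    k12 = math.sqrt(k11 * k22 * K12 * f12)
    return k12, f12

# Hypothetical inputs chosen only to illustrate the arithmetic.
k12, f12 = cross_relation_rate(k11=1.0e3, k22=1.0e5, K12=1.0e4)
print(f"f12 = {f12:.3f}, k12 = {k12:.3e} M^-1 s^-1")
```

For a thermoneutral cross reaction ($K_{12} = 1$) the sketch reduces to the geometric mean of the two self-exchange rates, which is a quick sanity check on any implementation.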

C3.2.2.8 SOLVENT POLARITY-RATE RELATIONS 

Electron transfer reaction rates can depend strongly on the polarity or dielectric properties of the solvent. This is 
because (a) a polar solvent serves to stabilize both the initial and final states, thus altering the driving force of the 
ET reaction, and (b) in a reaction coordinate system where the distance between reactants and products (DA and 
D$^+$A$^-$) is one, the strength of the solvent-electron coupling alters the curvature along a solvent nuclear coordinate 
of the diabatic representations for each electronic state. An incomplete but useful beginning point to understand this 
coupling is to think of the strength of the total dipolar field that a solvent, by reorganizing its nuclear charges, could 
project onto the axis of charge separation. It is easy to see that strongly polar solvents, like propylene carbonate or 
water, will exert a stronger coupling in this way than weakly polar solvents like toluene. 

Formally, these effects enter the ET rate expressions by contributing to the reorganization energy, $\lambda$, see equation 
(C3.2.3). The effects may be obtained either from spectroscopic parameters or from a priori models. The classical 
approach to obtaining $\lambda_o$ is to use dielectric continuum theory. In the case of two spherical reactants, the result 
appears in equation (C3.2.3) above. The validity of this approach has been tested often, though perhaps the best 
known examples are transition metal exchange reactions (figure C3.2.11). More recently, interest has focused on 
molecular representations of solvation shells and quantum effects on solvation rates that may be obtained from an 
analysis of the high-frequency solvent motions, particularly in the first solvent shell. 
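
To make the solvent dependence concrete, the sketch below evaluates the outer-sphere reorganization energy in the two-sphere dielectric continuum picture. It assumes the standard Marcus form $\lambda_o = (e^2/4\pi\epsilon_0)(1/2a_D + 1/2a_A - 1/R)(1/\epsilon_{op} - 1/\epsilon_s)$ for transfer of one electron; the radii, separation and solvent constants are illustrative assumptions rather than values from the text.

```python
# e^2 / (4*pi*eps0) expressed in eV * angstrom
COULOMB_EV_ANGSTROM = 14.3996

def outer_sphere_lambda(a_d, a_a, r, eps_op, eps_s):
    """Two-sphere continuum estimate of the solvent reorganization energy (eV).

    a_d, a_a : donor and acceptor radii (angstrom)
    r        : centre-to-centre distance (angstrom)
    eps_op   : optical dielectric constant (refractive index squared)
    eps_s    : static dielectric constant
    """
    geometry = 1.0 / (2.0 * a_d) + 1.0 / (2.0 * a_a) - 1.0 / r
    polarity = 1.0 / eps_op - 1.0 / eps_s
    return COULOMB_EV_ANGSTROM * geometry * polarity

# Assumed geometry: two 3.5 angstrom spheres in contact.
for name, eps_op, eps_s in (("water", 1.78, 78.4), ("toluene", 2.24, 2.38)):
    lam = outer_sphere_lambda(3.5, 3.5, 7.0, eps_op, eps_s)
    print(f"{name:8s} lambda_o = {lam:.3f} eV")
```

The polarity factor $(1/\epsilon_{op} - 1/\epsilon_s)$ is the only solvent-dependent term, which is why rate data are often plotted against it.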
Figure C3.2.11. Log of the ET rate ($k$) against ($1/\epsilon_{op} - 1/\epsilon_s$) for the bis(biphenyl) chromium self-exchange 
reaction. From [34]. 

C3.2.2.9 INTERRELATIONSHIPS IN ET THEORY: OPTICAL VERSUS THERMAL ET 


One of the most interesting aspects of the basic formulation of thermal electron-transfer theory (as portrayed in 
figure C3.2.1Q) is that it is closely related to the Franck-Condon model for optical electronic transitions. 
This establishes a strong connection between spectroscopy and kinetics. Optical charge transfer spectra can be used 
to determine most (and in some cases all) of the required parameters for making rate-constant predictions using 
thermal electron-transfer theory [35]. $\lambda$ and $T_{DA}$ can be obtained from spectroscopic properties. This was first 
outlined in the late 1960s when it was shown that in a symmetric ($\Delta G^0 = 0$) system the energy of the charge 
transfer band maximum is $E_{op} = \lambda$. In unsymmetrical systems, when $T_{DA}$ is small and can be neglected, 
$E_{op} = \lambda + \Delta G^0$. The use of the integrated oscillator strength of a charge 
transfer band to determine $T_{DA}$ was also introduced. Using a Mulliken formalism, Hush showed that the electronic 
coupling element is related to the intensity of the charge transfer transition by 

$$T_{DA} = \frac{0.0206\,(\epsilon_{max}\,\bar{\nu}_{max}\,\Delta\bar{\nu}_{1/2})^{1/2}}{r_{DA}}$$ (C3.2.13) 

where $\epsilon_{max}$ is the molar extinction coefficient at the band maximum, $\bar{\nu}_{max}$ and $\Delta\bar{\nu}_{1/2}$ are the frequency of 
the band maximum and the band width in wavenumbers, and $r_{DA}$ is the donor-acceptor centroid distance in 
angstroms [36, 37]. This Mulliken-Hush expression has been extensively 
applied to outer sphere [38] and bridged ET systems [39]. 
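
As a worked illustration, the sketch below evaluates the Mulliken-Hush coupling in its commonly quoted form, $T_{DA} = 0.0206\,(\epsilon_{max}\bar{\nu}_{max}\Delta\bar{\nu}_{1/2})^{1/2}/r_{DA}$, with the extinction coefficient in M$^{-1}$ cm$^{-1}$, band position and width in cm$^{-1}$ and distance in angstroms. The band parameters used are hypothetical, chosen only to show the order of magnitude.

```python
import math

def mulliken_hush_coupling(eps_max, nu_max, dnu_half, r_da):
    """Electronic coupling (cm^-1) from a charge-transfer absorption band.

    eps_max  : molar extinction coefficient at the band maximum (M^-1 cm^-1)
    nu_max   : band maximum (cm^-1)
    dnu_half : full width at half maximum (cm^-1)
    r_da     : donor-acceptor centroid distance (angstrom)
    """
    return 0.0206 * math.sqrt(eps_max * nu_max * dnu_half) / r_da

# Hypothetical band: eps_max = 1000, nu_max = 10000 cm^-1, width = 4000 cm^-1, r = 10 angstrom.
t_da = mulliken_hush_coupling(1000.0, 10000.0, 4000.0, 10.0)
print(f"T_DA = {t_da:.0f} cm^-1")
```

Because the coupling scales with the square root of the band intensity, a hundredfold weaker band at the same position and width corresponds to a tenfold smaller coupling.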

C3.2.2.10 DYNAMICAL SOLVENT CONTROLLED RATES 

Chemical changes are not irreversible unless there is some form of dissipation in the system. That is, the reaction 
free energy must be dispersed to a number of degrees of freedom distinct from the reaction coordinate. Models that 
include dissipation predict that the reaction mechanism itself can be switched from adiabatic to nonadiabatic based upon 
the nature of the solvent relaxation. Solvent that relaxes 'slowly' induces multiple recrossings of the activated 
complex prior to relaxation away from the crossing region. The effect of this is to cause the onset of adiabatic-like 
behaviour earlier than might be anticipated. The rate expression that spans both dynamical regimes, assuming a 
Debye solvent with a single relaxation time, is 


$$k_{ET} = \frac{k_{NA}}{1 + \kappa}$$ (C3.2.14) 

where the adiabaticity parameter is $\kappa = 4\pi T_{DA}^2 \tau_L / \hbar\lambda$, and $k_{NA}$ is the nonadiabatic ET rate. 

In Debye solvents, $\tau_L$ is the longitudinal relaxation time. The prediction that solvent polarization dynamics would 
limit intramolecular electron transfer rates was stated theoretically [40] and observed experimentally [41]. 
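
The interpolation between the two dynamical regimes can be sketched numerically. The code assumes the form $k_{ET} = k_{NA}/(1+\kappa)$ with $\kappa = 4\pi T_{DA}^2 \tau_L/\hbar\lambda$; the coupling, relaxation times and reorganization energy below are illustrative assumptions.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV s

def adiabaticity(t_da, tau_l, lam):
    """Adiabaticity parameter kappa for coupling t_da (eV),
    longitudinal relaxation time tau_l (s) and reorganization energy lam (eV)."""
    return 4.0 * math.pi * t_da**2 * tau_l / (HBAR_EV_S * lam)

def solvent_controlled_rate(k_na, t_da, tau_l, lam):
    """Rate spanning the nonadiabatic (kappa << 1) and
    solvent-controlled (kappa >> 1) regimes."""
    return k_na / (1.0 + adiabaticity(t_da, tau_l, lam))

# Assumed values: T_DA = 0.01 eV, lambda = 1 eV, nonadiabatic rate 1e10 s^-1.
for tau_l in (1.0e-14, 1.0e-12, 1.0e-10):
    kappa = adiabaticity(0.01, tau_l, 1.0)
    rate = solvent_controlled_rate(1.0e10, 0.01, tau_l, 1.0)
    print(f"tau_L = {tau_l:.0e} s  kappa = {kappa:8.3f}  k_ET = {rate:.3e} s^-1")
```

When $\kappa \gg 1$ the rate becomes inversely proportional to $\tau_L$ and independent of $T_{DA}$, since $k_{NA}$ itself scales as $T_{DA}^2$.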

C3.2.2.11 VIBRATIONAL MODE COUPLING TO ET 

The Franck-Condon principle reflected in the connection between optical and thermal ET also relates to the 
participation of high-frequency vibrational degrees of freedom. Charge transfer and resonance Raman intensity 
bandshape analysis has been used to determine effective vibrational and solvation parameters [ 42 , 43 ]. 

To make connection between the spectra and the ET process clearer, we note a simple model for the lineshape that 
includes a classical and a high-frequency degree of freedom. In this case the overall lineshape is 


$$I(\nu) \propto \sum_n \frac{e^{-S} S^n}{n!} \exp\left[-\frac{(\Delta G_n + \lambda_s - h\nu)^2}{4\lambda_s k_B T}\right]$$ (C3.2.15) 

where $\lambda_s$ is the classical reorganization energy, $\lambda_{QM}$ is the reorganization energy in the quantum nuclear mode of 
frequency $\nu_{QM}$, and $S = \lambda_{QM}/h\nu_{QM}$. The free-energy difference between the charge-separated states then depends on the quantum 
number of the quantum nuclear mode, $\Delta G_n = \Delta G + n h\nu_{QM}$; $n$ is the index for the quantum mode. 

This lineshape analysis also implies that electron-transfer rates should be vibrational-state dependent, which has 
been observed experimentally [44]. Spin-orbit relaxation has also been identified as an important factor in 
controlling the identity of both electron and vibrational-state distributions in radiationless ET reactions. 

C3.2.2.12 COMPETITION BETWEEN INNER SPHERE AND OUTER SPHERE NUCLEAR POLARIZATION DYNAMICS 

Early studies showed that the rates of ET are limited by solvation rates for certain barrierless electron transfer 
reactions. However, more recent studies showed that electron-transfer rates can far exceed the rates of diffusional 
solvation, which indicates critical roles for intramolecular (high frequency) vibrational mode couplings and inertial 
solvation. The interplay between inter- and intramolecular degrees of freedom is particularly significant in the 
Marcus inverted regime [45] (figure C3.2.12). 
Figure C3.2.12. Experimentally observed electron transfer time in ps (squares) and theoretical electron transfer 
times (survival times, $\tau_a$ and $\tau_b$) predicted by an extended Sumi-Marcus model. For fast solvents the 
survival times are a strong function of the characteristic solvent relaxation dynamics. For slower solvents the 
electron transfer occurs through the motion of intramolecular degrees of freedom. From [45]. 


C3.2.2.13 ORIENTATION IN INTERMOLECULAR ET 


The electrostatic forces that control orientation also influence the D/A overlap and thus tunnelling probability. This 
balance is known to be manipulated in rather subtle ways in biological ET. In a small molecule system, the relative 
orientation of electron donor (dimethylaniline) and acceptor (coumarin 337) in a solvent/solute ET reaction was 
examined [46]. Figure C3.2.13 shows the time-dependent response that is measured at 500 nm after 400 nm 
electronic excitation of the coumarin. At early times, the transition moment probed at 500 nm belongs to coumarin. 
At later times, after electron transfer from the dimethylaniline to the coumarin, the probed transition moment 
belongs to the dimethylaniline radical cation. The naive expectation might be that the solvent molecules most 
stably oriented relative to the solute would preferentially undergo ET. In fact, it was found that the electron transfer 
occurred when the permanent dipole moments of the donor and acceptor were perpendicular, not antiparallel. 


[Figure C3.2.13 plot: horizontal axis, delay time (ps); traces labelled 'Absorbance of DMA$^{\bullet+}$' and 'Stimulated emission of C337'.] 


Figure C3.2.13. Orientation in a photoinitiated electron transfer from dimethylaniline (DMA) solvent to a 
coumarin solute (C337). Change in anisotropy, r, reveals change in angle between the pumped and probed 
electronic transition moments. From [46]. 

C3.2.2.14 ET IN PHOTOSYNTHETIC REACTION CENTRES 

Electron transfer occurs in many steps in photosynthesis. Recently, considerable attention has been paid to the 
initial steps of charge separation. These initial steps occur in the picosecond time domain. A remarkable fact is that 
the photosynthetic reaction centre is highly symmetrical, with near $C_2$ symmetry in the cofactors, but electron 
transfer occurs down only one path. It is believed that a slight asymmetry in the protein dielectric environment 
contributes to electron preference for one branch. Just how strongly the electron interacts with all of the nearby 
cofactors has also been an issue of interest; some argue that the electron passes along a transport chain, with 
identifiable populations at each cofactor, while others have argued that some cofactors participate only as 
superexchange mediators. Because the initial electron transfer processes are very fast (<3.5 ps in Rhodobacter 
sphaeroides) in photosynthetic reactions, these centres have also provided a testbed for analysing the role of 
nonequilibrium vibrations and their possible coherent coupling to electron transfer. Such coherences have been 
examined in smaller molecular systems [47, 48, 49, 50] as well. 

C3.2.2.15 LASER-FIELD DRIVEN ET COHERENCES 

Calculations within the framework of a reaction-coordinate degree of freedom coupled to a bath of oscillators 
(solvent) suggest that coherent oscillations in the electronic-state populations of an electron-transfer reaction in a 
polar solvent can be induced by subjecting the system to a sequence of monochromatic laser pulses on the 
picosecond time scale. The ability to tailor electron transfer by such light fields is an ongoing area of interest [51] 
(figure C3.2.14). 
Figure C3.2.14. Electron population difference $x(t) = P_D(t) - P_A(t)$ for three electron transfer reactions in the 
presence of a pulsed laser field. Frequency of the field is tuned to solvent reorganization energy $\lambda$ = 1 eV and the 
field strength is such that coupling potential (charge transfer dipole moment times the field strength) is twice the 
laser field frequency. (a) Activationless reaction, $\Delta G = \lambda$, (b) reaction with $\Delta G = \lambda/2$, and (c) symmetric electron 
transfer with no bias. From [51]. 


C3.2.3 APPLICATIONS IN COMPLEX SYSTEMS 

This section describes the application of the theoretical principles described above to specific structures and 
processes of current interest in electron transfer research. 

C3.2.3.1 DNA ELECTRON TRANSFER 

Over the last decade attention has turned to the nature of electron transport in DNA [52]. Although DNA electron 
transfer has been of interest for a much longer time, new methodologies for attaching redox active donors and 
acceptors at defined positions in complex macromolecules have driven a resurgence of interest. One of the principal 
issues in this field concerns the mechanism of charge transfer, that is, whether the reactions proceed by long-range 
(single-step) tunnelling or multistep hopping. These mechanisms are summarized in figure C3.2.15. 
Figure C3.2.15. Schematic diagram showing (A) electron hopping between electron reservoirs via empty states of 
an intervening bridge, (B) tunnelling, and (C) hole hopping via filled states of an intervening bridge. From 
Bockrath M, Cobden D H, McEuen P L, Chopra N G, Zettl A, Thess A and Smalley R E 1997 Single-electron 
transport in ropes of nanotubes Science 275 1922-5. 

$\beta$ values reported for DNA electron transfer vary from 0.2 to 1.5 Å$^{-1}$. Assuming that the reactions proceed by 
single-step tunnelling (see equation (C3.2.5)), explanations of the physical origin for this wide range of values 
include: (a) the stacking interactions ($T$ of equation (C3.2.8)) might be highly variable because of variations in the 
nature of stacking or (b) changes in the energy denominator by changes in the average energetics of the redox 
active donor and acceptor orbitals. An alternative explanation for small apparent $\beta$ values is that the process 
proceeds by multistep hopping (figure C3.2.15), where many very rapid short distance steps lead to a weak 
apparent distance dependence. Indeed, recent experiments involving oxidation of guanine most likely fall in the 
regime of either tunnelling or multistate hopping, depending upon the details of the way in which the system is 
constructed. 
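
The practical meaning of the reported range of decay constants can be seen with a short calculation. The sketch assumes the usual single-step tunnelling form $k(R) = k_0 \exp(-\beta R)$; the prefactor and distances are hypothetical, and the helper that extracts an apparent $\beta$ from two rate measurements is an illustration rather than a method taken from the text.

```python
import math

def tunnelling_rate(k0, beta, distance):
    """Single-step tunnelling rate for decay constant beta (angstrom^-1)
    at donor-acceptor distance (angstrom); k0 is the contact rate."""
    return k0 * math.exp(-beta * distance)

def apparent_beta(k_short, r_short, k_long, r_long):
    """Apparent decay constant (angstrom^-1) from rates at two distances."""
    return -math.log(k_long / k_short) / (r_long - r_short)

# Rate attenuation over 10 angstrom at the two ends of the reported range.
for beta in (0.2, 1.5):
    attenuation = tunnelling_rate(1.0, beta, 10.0)
    print(f"beta = {beta:.1f} A^-1: rate falls by a factor of {1.0 / attenuation:.3g}")
```

A process whose rate drops by less than an order of magnitude over 10 Å therefore shows an apparent $\beta$ near the low end of the range, whatever the underlying mechanism.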

Although the challenge of determining the distance dependence of ET in DNA may seem academic, it is of 
considerably wider interest. There are now a number of companies developing medical diagnostic devices based 
upon changes in ET rates or electrode currents upon recognition and binding of single-stranded DNA. The 
sensitivity of these devices will depend upon how ET rates change upon oligomer binding, the nature of base pair 
mismatches, and the solvation environment upon recognition. 

C3.2.3.2 STM AND SINGLE MOLECULE CONDUCTIVITY 

The invention of the scanning tunnelling microscope in the 1980s opened up new directions for electron transfer 
chemistry. The STM is discussed in detail in section B1.20 of this encyclopedia. Measurements of tunnelling 
current propagating through empty space and through various adsorbates provide a relatively direct probe of 
tunnelling propagation. Measurement of the current as a function of adsorbate layer thickness probes the decay 
parameter $\beta$ directly (figure C3.2.16). Single molecule bridged STM studies have mirrored results seen in solution 
chemistry: as the effective energy barrier is decreased and as the effective interaction between bridging units 
increases, $\beta$ decreases. 
Figure C3.2.16. Dependence of measured resistance in an STM junction consisting of a 'bare tip', a tip with one Xe 
atom attached, and a tip with two Xe atoms. Note that the Xe atoms facilitate tunnelling (compared to empty 
space). From Yazdani A, Eigler D M and Lang N D 1996 Off resonance conduction through atomic wires Science 
272 1921-4. 


An expression for the current across a molecular junction is developed by analogy with the description of 
unimolecular solution phase electron transfer. The conduction is written [20] 

$$J = \frac{2\pi e}{\hbar} \sum_{i,f} f(E_i)\,[1 - f(E_f)]\,|T_{if}|^2\,\delta(E_i - E_f)$$ (C3.2.16) 

where $f$ represents the Fermi function and the sum is taken over states of the source electrode ($i$) and receiving 
electrode ($f$). When there are no molecular eigenstates of the molecule in the energy regime between the highest 
filled source electrode energy and lowest empty acceptor electrode, the conduction is limited by the tunnelling 
characteristics of the molecule. However, if eigenstates of the bridge fall in this gap, transport can involve a 
multistep hopping process. Indeed, as the voltage is varied and multiple eigenstates enter the gap, a molecular 
eigenstate staircase is seen in the current-applied voltage curves. Eigenstate staircases have been observed in 
carbon nanotube structures as well as in 'break junctions' bridged by molecules [21]. 


C3.2.4 FUTURE DIRECTIONS 

We have surveyed the remarkable progress in the field of ET reactions, and have examined some of the key 
applications and successes of the theory. Many of the current frontiers of ET research lie in biological systems and 
in molecular-scale electronic devices. 

C3.2.4.1 ENERGY FLOW INTO ET PROCESSES 

The nitrogenase system reduces hundreds of millions of kilograms of nitrogen gas to ammonia each year, 
catalysing the reaction at ambient temperatures and atmospheric pressure. Nitrogenase consists of two proteins that 
contain iron-sulfur and molybdenum-iron-sulfur clusters. Each (interprotein) electron transfer reaction is driven by the 
hydrolysis of two ATP molecules. The energy released as a consequence of ATP hydrolysis is about one-third of an 
electronvolt. A total of eight electrons must be delivered to the Mo-Fe protein to complete the catalytic cycle. It 
appears that the ATP consumed by nitrogenase drives substantial conformational changes that lead to 
protein-protein docking, an increase in ET driving force, and (following ET) protein-protein dissociation [53, 54]. Thus, 
ATP overcomes kinetic barriers to ET. This serves as an example of energy transduction from a storage species 
(ATP) to an ET process. The molecular mechanism of this process is the subject of great current interest. Equally 
challenging is to understand the way in which ET leads to transmembrane proton gradients, and how these 
gradients then drive the synthesis of ATP [3, 55] (figure C3.2.17). 



Figure C3.2.17. Diagram of a liposome-based artificial photosynthetic membrane showing the photocycle that 
pumps protons into the interior of the liposome and the CF$_0$F$_1$-ATP synthase enzyme. From [55]. 

C3.2.4.2 PATHWAY FUNCTION 

It is fairly clear that most biological electron transfer reactions fall in the weakly coupled regime (with a possible 
exception being the primary charge separation event in bacterial photosynthesis). However, there remains 
considerable debate concerning the functional significance of tunnelling pathways and of protein secondary and 
tertiary structure on the control of biological ET rates. The average decay constant for tunnelling through α-helix is 
predicted to be larger (rates drop more rapidly with distance) than for β-sheet structures. It was recently argued that 
this secondary structure effect is exploited to slow the rate of charge recombination in the photosynthetic reaction 
centre, which is dominantly helical [56]. In contrast, the structure of cytochrome c oxidase appears to have direct 
beta strand-like pathways linking the redox centres to the oxygen reduction site. Since ET in this protein is thermal 
rather than photochemical, and the reactions are exergonic, charge transfer recombination is not a competing 
reaction pathway. 

With the emergence of increasingly high-resolution structural data for complex proteins, the influence of specific 
tunnelling pathways on protein function will be subjected to greater scrutiny. A particularly intriguing potential 
effect appears in the protein system that comprises the nitrogenase nitrogen fixing protein. When the two proteins 
dock to exchange electrons, it appears that there is a substantial structure change that 'wires' a tunnelling pathway. 
Following ET, the direct pathway is broken, electronically disconnecting the two redox centres. Since the protein 
dissociation reaction is particularly slow and the driving force of the reaction is not large, this disconnection could 
prove functionally important for localizing the electron at the proper site in the enzyme. Protein dynamics causes 
donor/acceptor interactions mediated by the protein to fluctuate. If fluctuations are sufficiently rapid, the root mean 
square coupling value should control the observed ET rate. A considerable open challenge is to understand what 
kinds of protein folds might have coupling interactions that are particularly sensitive to thermal fluctuations [57]. 


C3.2.4.3 INTERPRETING THE QUANTUM NATURE OF PROTEINS: REDUCED HAMILTONIANS AND CURRENT LOOPS 


The complexity of protein structure motivates the development of new strategies for 'information reduction'. One 
approach has been to devise hot and cold spot maps at the pathway level [26]. Other Hamiltonian based strategies 
have built 'reduced' Hamiltonians composed of a subset of effective interactions that represent the interaction 
characteristics of the bridge [23] (figure C3.2.18). Density matrix approaches follow propagation of electron 
current density in two-level systems using equation (C3.2.4), rather than tracking the wave function propagation. 
That is, the time domain rather than the energy domain quantum picture is analysed. Whereas electron amplitude 
decay and oscillation are seen in the wave function amplitude picture, vortices — closely associated with the nodal 
structure of the decaying wave function — appear in the current density maps [58]. 
Figure C3.2.18. (a) Model α-helix, (b) hydrogen bonding contacts in the helix, and (c) schematic representation of 
the effective Hamiltonian interactions between atoms in the protein backbone. From [23]. 
Figure C3.2.19. In this ESDIAD experiment where H$^+$ ions are produced and collected (see text), an adsorbed 
acetate species is excited by an incoming electron. H$^+$ ions are emitted in the direction of the C-H bond in the 
upward pointing -CH$_3$ group in the species. Circular symmetry of figure indicates that C-H bonds are spinning 
around the vertical axis in the acetate species. From Lee J G, Ahner J, Mocutta D, Denev S and Yates J T Jr 2000 J. 
Chem. Phys. 112 335. 

Not all processes that involve charge redistribution move charge between spatially well-localized regions. Electron 
scattering events fall into this regime and lie at the boundaries of the topics that we have discussed. Electron 
scattering processes are often used to practical advantage to probe the structure and dynamics of chemisorbed 
molecules, for example. One of these, ESDIAD (electron stimulated desorption ion angular distribution), invented 
in 1974 [59], may be used to observe the bond directions in chemisorbed species. This method uses electrons to 
ionize adsorbed molecules, making either positive or negative fragment ions. The ions escape from the molecule in 
directions closely similar to those of the chemical bonds that are being ruptured by the excitation event. By 
measuring the emission directions of the ion fragments, the characteristics of the ruptured chemical bond can be 
observed. In figure C3.2.19, an adsorbed acetate species is excited by an incoming electron [60]. H$^+$ ions are 
emitted in the direction of the C-H bond in the upward pointing -CH$_3$ group in the species. In the example shown 
here, millions of individual acetate species have been ionized, and the statistical distribution of the H$^+$ emission 
directions is shown by the volcano-shaped figure at the top. The circular symmetry of the figure indicates that the 
C-H bonds are spinning around the vertical axis in the acetate species, so that an almost equal probability of H$^+$ 
emission exists in all azimuthal directions. If the surface is cooled to very low temperatures, the rotation of the 
-CH$_3$ group ceases, and a multibeam H$^+$ pattern is observed. Measuring the temperature dependence of the beam 
pattern broadening into the volcano pattern allows one to measure the energy required to make the -CH$_3$ group 
spin. Such information is of importance in many technologies dependent upon molecular motions on surfaces, such 
as semiconductor device fabrication, corrosion inhibition, and heterogeneous catalysis. 
C3.3 Energy transfer in gases 

George W Flynn 


C3.3.1 INTRODUCTION 

Almost all aspects of the field of chemistry involve the flow of energy either within or between molecules. Indeed, 
the occurrence of a chemical reaction between two species implies the availability of some minimum amount of 
energy in the reacting system. The study of energy transfer processes is thus a topic of fundamental importance in 
chemistry. Energy transfer in gases is of particular interest partly because very sophisticated methods have been 
developed to study such events and partly because gas phase processes lend themselves to very complete and 
detailed theoretical analysis. 

In the gas phase molecules are generally separated by distances large compared to their diameters, and energy 
exchange between two different molecules occurs only when they collide, much like billiard balls on a pool table, 
or two trucks in a road accident such as that depicted in figure C3.3.1 . Nevertheless, despite these large separations, 
gas molecules collide at a very high rate. The formula for the collision rate, $z$, of a single molecule is [1] 

$$z = \sqrt{2}\,\pi\sigma^2 \langle u \rangle n$$ 

where $\sigma$ is the molecular diameter, $\langle u \rangle$ the mean speed and $n$ the density of molecules. To get some feeling for the 
size of z, try to imagine a single nitrogen molecule in the atmosphere right in front of your nose. If you blink your 
eye, that molecule undergoes more than a billion collisions in the time it takes for your eye to open and close! 
Modern techniques for the study of such processes generally focus on single collision events. This would require a 
time resolution of better than a billionth of a second to investigate a nitrogen molecule in the atmosphere. 
Fortunately, for gases, the density $n$ can be easily controlled thereby reducing $z$ substantially. For the collisional 
energy transfer processes described in this chapter, pressures of 10$^{-5}$ atmospheres are typically used, thereby 
reducing the mean time between collisions to an experimentally manageable time scale of a few hundred 
thousandths of a second. 
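
The orders of magnitude quoted above can be checked with a short calculation. The sketch assumes the hard-sphere expression $z = \sqrt{2}\,\pi\sigma^2\langle u\rangle n$, an ideal-gas density $n = P/k_BT$ and the Maxwell mean speed $\langle u\rangle = (8k_BT/\pi m)^{1/2}$; the nitrogen diameter of 3.7 Å and the temperature of 298 K are assumed values, not taken from the text.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
AMU = 1.66053907e-27    # atomic mass unit, kg
ATM = 101325.0          # one atmosphere, Pa

def collision_rate(sigma, mass, temperature, pressure):
    """Hard-sphere collision rate (s^-1) of one molecule in a pure gas.

    sigma : molecular diameter (m); mass : molecular mass (kg)
    temperature in K; pressure in Pa.
    """
    density = pressure / (K_B * temperature)                       # molecules per m^3
    mean_speed = math.sqrt(8.0 * K_B * temperature / (math.pi * mass))
    return math.sqrt(2.0) * math.pi * sigma**2 * mean_speed * density

# Assumed hard-sphere diameter for N2: 3.7 angstrom.
for pressure_atm in (1.0, 1.0e-5):
    z = collision_rate(3.7e-10, 28.0134 * AMU, 298.0, pressure_atm * ATM)
    print(f"P = {pressure_atm:g} atm: z = {z:.2e} s^-1, mean time = {1.0 / z:.2e} s")
```

Because $z$ is proportional to $n$, lowering the pressure by five orders of magnitude lengthens the mean time between collisions by the same factor.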

While collisions between atoms are very simple, since only translational motion is of importance, collisions 
between molecules are more complicated because of the internal degrees of freedom. For example, in a relatively 
simple collision between an argon atom and a linear CO$_2$ molecule, the atom has three translational degrees of 
freedom while the molecule has three translational, two rotational and four vibrational degrees of freedom. In 
principle, energy exchange among all of these degrees of freedom must be accounted for in any complete 
description of such a collision. Loss or gain of vibrational energy in molecular collisions is of special interest in 
chemistry because all reactions ultimately involve the breaking and making of chemical bonds, a process that is 
accelerated by putting energy into the vibrational degrees of freedom of a molecule. 
Figure C3.3.1. A collision between a milk truck and a bread truck showing the well ordered truck contents at the 
top, the 'scattering event' in the middle and the post crash scrambling of the truck contents at the bottom. 

C3.3.1.1 UNIMOLECULAR REACTIONS AND ENERGY TRANSFER 

Of particular importance is loss of energy from molecules with 'chemically significant' amounts of vibrational 
energy. These are systems in which the molecule has sufficient energy to rupture a chemical bond. Chemical 
reactions of such highly vibrationally excited molecules, which normally take place on the ground electronic state 
potential energy surface, are often described by the Lindemann unimolecular reaction mechanism in which a 
substrate, S, is excited by collisions to S*, a level with energy sufficient to cause bond breaking or molecular 
rearrangement [2, 3 and 4]. S* is thus said to have a 'chemically significant' amount of energy. For large 
molecules, the time scale for decomposition of S* is sufficiently long that further collisions with the bath molecules 
can cause deactivation of the excited substrate, thus quenching the reaction process. The overall mechanism can be 
summarized by the equations 


S + B → S* + B (C3.3.1) 

S* → P (C3.3.2) 

S* + B → S + B (C3.3.3) 

where B is a generalized representation for a bath molecule and P is the product, chemically distinct from S. Once 
the molecule is excited in step (C3.3.1), the rate of production of product is a competition between unimolecular 
breakup of S* in step (C3.3.2) and quenching in step (C3.3.3). The importance of vibrational energy loss in step 
(C3.3.3) is therefore paramount in determining the overall efficiency with which S is converted to product via this 
mechanism. It would be difficult to exaggerate the importance of this quenching step, since an enormous number of 
thermal chemical reactions proceed via this mechanism or some variant of it. 
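
The competition between breakup and quenching can be made quantitative with the usual steady-state treatment of the Lindemann scheme. The rate constants $k_1$, $k_2$ and $k_3$ for steps (C3.3.1), (C3.3.2) and (C3.3.3) are labels introduced here for illustration; the derivation is a sketch under the assumption that the concentration of S* is small and stationary.

```latex
% Steady state for the activated species S*
\frac{d[S^*]}{dt} = k_1[S][B] - k_2[S^*] - k_3[S^*][B] \approx 0
\quad\Longrightarrow\quad
[S^*] = \frac{k_1[S][B]}{k_2 + k_3[B]}

% Rate of product formation
\frac{d[P]}{dt} = k_2[S^*] = \frac{k_1 k_2 [S][B]}{k_2 + k_3[B]}

% Limiting cases
k_3[B] \gg k_2:\quad \frac{d[P]}{dt} \approx \frac{k_1 k_2}{k_3}[S]
\qquad\qquad
k_3[B] \ll k_2:\quad \frac{d[P]}{dt} \approx k_1[S][B]
```

At high bath-gas pressure the quenching rate constant $k_3$ appears directly in the observed first-order rate, while at low pressure the rate is limited by collisional activation alone.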

One of the complexities that arises in studying the vibrational energy loss from highly excited molecules is the 
very, very large number of vibrational states (e.g. 10 vibrational states per cm") in molecules of even moderate 
size at chemically significant energies (100-400 kJ mol$^{-1}$). Because of this, directly probing the vibrational states 
of S*, with even the highest resolution laser devices, is essentially impossible. Nevertheless, such collisions can be 
monitored in great detail by using a simple trick provided that the bath molecule B is relatively small. This trick 
amounts to realizing that the collision of S* with B can be viewed through the 'eyes' of the bath molecule B [5, 6]. 
When B is a small molecule with well resolved and assigned vibrational and rotational spectroscopic transitions, 
more information about the quenching process (C3.3.3) can be obtained from probing the bath B than from probing 
the donor S*. If we return for a moment to the collision between the bread truck and the milk truck of figure C3.3.1, 
a typical approach for the police to use in reconstructing such an accident is to take pictures of the post-collision 
scene to establish the speed and position of the two trucks before the initial 'scattering event' occurred. The more 
detail available in the post-collision picture, the better the chance of accurately reconstructing the collision event. 
In principle, the condition and position of either truck is sufficient to establish most of the details of the collision. 


C3.3.2 EXPERIMENTAL APPROACH 

C3.3.2.1 GENERAL SCHEME 

Experiments of the type described above rely on the availability of both high resolution and very intense laser 
sources throughout the ultraviolet, visible and infrared spectral ranges [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 and 
18]. Figure C3.3.2 shows an energy level diagram and excitation scheme for a typical molecule. A short pulse of 
ultraviolet light from a laser excites a medium size molecule such as benzene or hexafluorobenzene to its first 
electronically excited singlet state, 

So + hv (248 nm) -* S\ (laser excitation). 

The natural processes of intersystem crossing and internal conversion will quickly (e.g. 50 ns) carry the molecule 
from this excited electronic surface to the ground electronic surface without a collision, 

S] -» % [inter system cross in g/i menial convcrsi on), 

The result is the preparation of molecules with a well defined energy E that is essentially pure vibrational energy in 
S$_0$, the ground electronic state. (Laser excitation adds only $h/2\pi$ to the total angular momentum of the molecule and 
produces no increase in the translational energy.) This highly excited molecule S$_0^*$ of energy E is our donor with 
chemically significant amounts of vibrational energy. Preparation of donor species in this way provides molecules 
with very well characterized energy and sets the origin of time at the laser pulse so that subsequent collisions can be 
studied if they occur on a time scale long compared to the laser pulse width and the molecular 
internal-conversion/intersystem-crossing time. Typical laser pulse widths are 10-30 ns while collision times can be 
controlled with pressure to take place on a time scale of several microseconds. (At a pressure of 20 mTorr, the 
mean time between collisions is about 4 μs.) 
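
The quoted collision time can be checked with a simple hard-sphere estimate. The sketch below is illustrative only: the collision diameter of 5.4 Å and the choice of a C$_6$F$_6$/CO$_2$ pair at 298 K are assumptions made here, not values taken from the text, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053907e-27      # atomic mass unit, kg
PA_PER_TORR = 133.322368  # pascals per torr


def mean_collision_time(pressure_torr, temp_k, mass_a_amu, mass_b_amu, diameter_m):
    """Hard-sphere mean time (s) between collisions of one A molecule with B molecules."""
    # Number density of the bath from the ideal gas law (molecules per m^3).
    number_density = pressure_torr * PA_PER_TORR / (K_B * temp_k)
    # Reduced mass of the colliding pair (kg).
    reduced_mass = mass_a_amu * mass_b_amu / (mass_a_amu + mass_b_amu) * AMU
    # Mean relative speed of the pair (m/s).
    mean_rel_speed = math.sqrt(8.0 * K_B * temp_k / (math.pi * reduced_mass))
    # Hard-sphere cross-section (m^2) for an assumed collision diameter.
    cross_section = math.pi * diameter_m ** 2
    return 1.0 / (number_density * cross_section * mean_rel_speed)


# Assumed inputs: C6F6 (186.05 amu) donor, CO2 (44.01 amu) bath at 20 mTorr and 298 K,
# with a hypothetical collision diameter of 5.4 angstrom.
tau = mean_collision_time(0.020, 298.0, 186.05, 44.01, 5.4e-10)
print(f"mean time between collisions: {tau * 1e6:.1f} microseconds")
```

With these assumed inputs the estimate comes out at a few microseconds, the same order as the figure quoted above; a different collision diameter shifts the result as the inverse square of the diameter.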



Figure C3.3.2. A simple diagram showing a clean method for preparing molecules with a specific and large 
amount of vibrational energy $E_{\mathrm{vib}}$ = 5 eV (roughly 460 kJ mol$^{-1}$). A pulse from an excimer laser excites an allowed 
(S$_0$ → S$_1$) electronic transition in the molecule. Intersystem crossing and internal conversion, occurring 
collision-free in the molecule, carry it onto the ground electronic surface. Collisions of the hot donor molecule, S*, with bath 
molecules, B, depicted by the equation at the bottom of the figure, cause the molecule to lose energy with a 
probability $P(E, E')$, where $\Delta E = E' - E$. 

Once prepared in S* with well defined energy E, donor molecules will begin to collide with bath molecules B at a 
rate determined by the bath-gas pressure. A typical process of this type is the collision between a C$_6$F$_6$ molecule 
with approximately 5 eV (40 000 cm$^{-1}$ or 460 kJ mol$^{-1}$) of internal vibrational energy and a CO$_2$ molecule in its 
ground vibrationless state 00$^0$0 to produce CO$_2$ in the first asymmetric stretch vibrational level 00$^0$1 [11, 12 and 
13]. This collision results in the loss of approximately $\Delta E$ = 2349 cm$^{-1}$ of internal energy from the C$_6$F$_6$, 

$$\mathrm{C_6F_6}(E = 41\,822\ \mathrm{cm^{-1}}) + \mathrm{CO_2}(00^{0}0;\ J', V') \rightarrow \mathrm{C_6F_6}(E' = 39\,473\ \mathrm{cm^{-1}}) + \mathrm{CO_2}(00^{0}1;\ J, V).$$

J and V represent the rotational angular momentum quantum number and the velocity of the CO$_2$, respectively. The 
hot, excited C$_6$F$_6$ donor can be produced via absorption of a 248 nm excimer-laser pulse followed by rapid internal 
conversion of electronic energy to vibrational energy as described above. Note that the result of this collision is to 
produce one quantum of vibrational energy in the CO$_2$ 00$^0$1 state with a corresponding loss of the same amount of 
energy from the internal vibrational degrees of freedom of the donor. 
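
The energy bookkeeping in this example is easy to verify. The short sketch below converts wavenumbers to eV and kJ mol$^{-1}$ using standard physical constants and subtracts one CO$_2$ asymmetric-stretch quantum from the donor energy; the function names are illustrative and not taken from the source.

```python
H_PLANCK = 6.62607015e-34   # Planck constant, J s
C_LIGHT_CM = 2.99792458e10  # speed of light, cm/s
EV_IN_J = 1.602176634e-19   # joules per electronvolt
N_AVOGADRO = 6.02214076e23  # molecules per mole


def wavenumber_to_ev(wavenumber_cm):
    """Convert an energy in cm^-1 to electronvolts."""
    return wavenumber_cm * H_PLANCK * C_LIGHT_CM / EV_IN_J


def wavenumber_to_kj_per_mol(wavenumber_cm):
    """Convert an energy in cm^-1 to kJ per mole."""
    return wavenumber_cm * H_PLANCK * C_LIGHT_CM * N_AVOGADRO / 1000.0


donor_initial = 41822   # cm^-1, donor energy E quoted in the text
co2_quantum = 2349      # cm^-1, one quantum of the CO2 asymmetric stretch
donor_final = donor_initial - co2_quantum   # donor energy E' after the collision

print(f"E' = {donor_final} cm^-1")
print(f"40 000 cm^-1 = {wavenumber_to_ev(40000):.2f} eV "
      f"= {wavenumber_to_kj_per_mol(40000):.0f} kJ/mol")
```

The conversion shows that 40 000 cm$^{-1}$ is close to 5 eV, as stated, and that the kJ mol$^{-1}$ figure quoted in the text is a rounded value.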

Actually, collisions in which the bath becomes vibrationally excited are relatively rare, occurring with a typical 
probability of 1% per gas-kinetic collision [6, 8, 11 and 13]. More common are processes that produce rotational 
and translational excitation in the bath acceptor while leaving the molecule in its ground (vibrationless) 00$^0$0 state, 

$$\mathrm{C_6F_6}(E = 41\,822\ \mathrm{cm^{-1}}) + \mathrm{CO_2}(00^{0}0;\ J', V') \rightarrow \mathrm{C_6F_6}(E') + \mathrm{CO_2}(00^{0}0;\ J, V).$$

Here the energy loss $\Delta E = E' - E$ from the donor shows up as translational energy through a change in the bath 
molecule velocity ($V'$ going to $V$) and as rotational energy through a change in the rotational-angular-momentum 
quantum number ($J'$ going to $J$). 

C3.3.2.2 INFRARED DIODE LASER PROBING 

To 'view' these collisions through the eyes of the bath acceptor molecule, we need only to probe the CO$_2$(00$^0$0; J, 
V) molecules exiting the collision. Much like the police visiting the post-collision accident scene (figure C3.3.1) 
we 'take a picture' of the CO$_2$(00$^0$0; J, V) with a high resolution camera. Our camera is an infrared laser with 
sufficient resolution to identify all of the quantum numbers for molecular vibration of CO$_2$, the rotational angular 
momentum quantum number J and the recoil velocity V. This probe, or picture-taking, step can be represented by 
the equation 

$$\mathrm{CO_2}(00^{0}0;\ J, V) + h\nu\ (4.3\ \mu\mathrm{m}) \rightarrow \mathrm{CO_2}(00^{0}1;\ J \pm 1, V) \quad \text{(infrared laser probe)}$$

where light from a narrow-band infrared diode laser (linewidth of 0.0003 cm$^{-1}$) operating at a wavelength of 
approximately 4.3 μm is used to sense the arrival of molecules in the 00$^0$0; J state by observing or probing the fully 
allowed infrared transition, 00$^0$0; J → 00$^0$1; J ± 1, in CO$_2$. Such is the resolution of these laser devices that 
essentially any state of the CO$_2$ bath molecule produced in the collision process can be probed and the population 
of each quantum state measured. 

Even more remarkable is the fact that these infrared diode lasers have sufficient resolution to measure the Doppler 
lineshape for the 00$^0$0; J → 00$^0$1; J ± 1 transition of the recoiling molecules, hence providing a probe of the CO$_2$ 
molecular recoil velocity, V. The absorption frequency for a molecule moving with a component of velocity $V_z$ 
parallel to the direction of propagation of the infrared laser beam is $\nu = \nu_0(1 \pm V_z/c)$, where $\nu_0$ is the absorption 
frequency for the molecule at rest and c is the speed of light. The ± sign determines whether the molecule is 
travelling in the same or opposite direction as the light beam. For an isotropic distribution of molecules whose 
translational motion is at equilibrium at a temperature T, there is a range of $V_z$ given by the Boltzmann distribution. 
The corresponding spread of absorption frequencies arising from this velocity distribution gives rise to an 
inhomogeneously broadened spectral line shape that can be described by a Gaussian function whose full width at 
half height is given by [19] 

$$\Delta\nu = 2(3.581 \times 10^{-7})\,\nu_0\,(T/M)^{1/2}$$

where M is the molecular weight of the molecule, $\nu_0$ the absorption frequency for a molecule at rest and T the 
absolute temperature. Figure C3.3.3 shows the relative Doppler lineshapes for a CO$_2$ molecule at a temperature of 
300 K corresponding to a moderate velocity spread along the laser probe direction, and 3000 K corresponding to a 
rather large velocity spread along the laser probe direction. Also shown for comparison is the laser line width that 
is much narrower than even the 300 K Doppler line shape for room-temperature CO$_2$. For an experiment conducted 
at room temperature, bath molecules start with a 300 K Doppler lineshape corresponding to a Boltzmann velocity 
distribution at this temperature, but collisions with hot donors produce recoiling bath molecules that have velocities 
comparable to those typical for temperatures in the 1500^1000 K range. Figure C3.3.3 indicates that such 
linewidths can easily be measured with modern, commercially available, high-resolution infrared lasers. While 
there is no a priori reason to expect that molecules scattered by collisional energy transfer will have an isotropic 
Boltzmann distribution of velocities, experiments performed under simple gas-bulb conditions with molecules initially 
thermalized at temperatures in the 200-400 K range have so far found that the spectral absorption lineshapes for recoiling 
molecules in these studies are Gaussian within experimental error and, therefore, can be characterized by a 
temperature T [5, 9, 12, 16]. 
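
The Doppler-width expression above can be evaluated directly. The sketch below applies it to the CO$_2$ asymmetric-stretch band near 2349 cm$^{-1}$ and also inverts it to turn a measured linewidth into a translational temperature. The function names are illustrative, and the use of the band origin as $\nu_0$ for every rotational line is a simplifying assumption.

```python
import math

DOPPLER_CONST = 2 * 3.581e-7   # prefactor of the full-width expression in the text
LASER_WIDTH = 0.0003           # diode-laser linewidth quoted in the text, cm^-1


def doppler_fwhm(nu0_cm, temp_k, mol_weight):
    """Gaussian full width at half height (cm^-1) for rest frequency nu0_cm."""
    return DOPPLER_CONST * nu0_cm * math.sqrt(temp_k / mol_weight)


def temperature_from_fwhm(fwhm_cm, nu0_cm, mol_weight):
    """Invert the width expression to obtain a translational temperature (K)."""
    return mol_weight * (fwhm_cm / (DOPPLER_CONST * nu0_cm)) ** 2


NU0 = 2349.0   # cm^-1, assumed line position (CO2 asymmetric-stretch band origin)
M_CO2 = 44.0   # molecular weight of CO2

for temp in (300.0, 3000.0):
    width = doppler_fwhm(NU0, temp, M_CO2)
    print(f"T = {temp:6.0f} K: FWHM = {width:.4f} cm^-1 "
          f"({width / LASER_WIDTH:.0f} laser linewidths)")
```

Because the width scales as the square root of T, a tenfold increase in temperature broadens the line by only about a factor of three, which is why the narrow laser linewidth matters for resolving recoil temperatures.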

Figure C3.3.3. A schematic drawing of the Doppler-lineshape profile for a typical infrared transition in a small 
molecule at T = 300 K and T = 3000 K. The width of the absorption profile scales as $T^{1/2}$, reflecting the thermal 
spread and isotropic distribution of molecular velocities. Such profiles are easily measured using a high-resolution 
infrared diode laser whose typical line profile, having a width of 0.0003 cm$^{-1}$, is also shown in the figure. 

C3.3.2.3 EXPERIMENTAL APPARATUS 

Figure C3.3.4 shows a schematic diagram of an apparatus that can be used to study collisions of the type described 
above [5, 9, 12, 16]. Donor molecules in a 3 m long collision cell (a cylindrical tube) are excited along the axis of 
the cell by a short-pulse excimer laser (typically 25 ns pulse width operating at 248 nm), and bath molecules are 
probed along this same axis by an infrared diode laser (wavelength in the mid-infrared with continuous light-output 
power of 100 μW in a bandwidth of 0.0003 cm$^{-1}$). The pump and probe beams are joined in front of the collision 
cell with a dichroic beamsplitter coated to reflect the ultraviolet laser light and pass the infrared laser light. The 
beams propagate collinearly along the cell axis. At the end of the cell the ultraviolet beam is discarded and the 
infrared beam passes through a monochromator to select the appropriate diode laser mode. About 10% of the 
infrared laser light is split from the main beam and sent through a reference cell, a scanning etalon, and a 
monochromator. This fraction of the light beam is detected with an InSb infrared detector and fed to a lock-in 
amplifier. The output of the lock-in amplifier can be used as an error signal and fed back to the diode-laser-control 
electronics to lock the diode either to a specific spectral reference line or to an etalon fringe. This reference loop is 
used to identify the exact frequency of the diode laser and to sweep the laser over small ranges (for Doppler profile 
measurements) by scanning the frequency of the etalon to which the laser can be locked. The reference cell also has 
electrodes for exciting a discharge in the reference gas in cases where the frequencies of interest originate in 
vibrational levels not populated at the ambient temperature of the cell. 

Figure C3.3.4. A schematic diagram of an apparatus described in the text for studying vibrational energy transfer 
to small bath molecules from donor molecules having chemically significant amounts of internal vibrational 
energy. 

The infrared light passing through the collision cell impinges on a second InSb-solid-state detector (cooled to 77 K) 
producing both DC and AC signals at the detector output. Since most of the IR light is not absorbed by the sample, 
the DC signal that measures the continuous output level of laser light is much larger than the AC signal. It is in fact 
fluctuations in the DC light level that constitute one of the main noise sources in the experiment. Both AC and DC 
signals are fed to a high-speed transient recorder with at least two channels where the time-resolved ratio of the AC 
and DC currents is recorded and stored in memory. Single-collision data are obtained from this time-dependent 
absorption data. Signals from a series of ultraviolet laser pulses can be added in memory with subsequent signal 
averaging and noise reduction. The ratio of the AC and DC infrared light levels, $\Delta I/I$, is related to the pressure of 
absorbing molecules, P, the molecular absorption coefficient, α, and the cell path length, L, through the Beer-Lambert law. 

The path length is set by the experimental configuration while α is known for each transition (such as 00$^0$0; J → 
00$^0$1; J ± 1 or 00$^0$1; J → 00$^0$2; J ± 1). Thus, a measurement of $\Delta I/I$ provides the partial pressure P of molecules 
produced in probed states such as 00$^0$0; J or 00$^0$1; J. (Strictly, optical probing measures the difference in the partial 
pressures between the upper and lower states of the probed transition; however, in practice, the lower state 
population is always much larger than the upper state population so that the probe senses only the lower state 
population in the experiment.) 
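
The link between fractional absorption and partial pressure can be written out as a worked relation. The form below is an assumption: it is the standard Beer-Lambert expression in the weak-absorption limit, with α taken per unit pressure and per unit length, and it is not necessarily the exact expression used in the original experiments.

```latex
\frac{\Delta I}{I} = 1 - e^{-\alpha P L} \approx \alpha P L
\quad (\alpha P L \ll 1),
\qquad\text{so that}\qquad
P \approx \frac{1}{\alpha L}\,\frac{\Delta I}{I}.
```

In this limit the transient absorption signal is directly proportional to the partial pressure of molecules in the probed state, consistent with the statement that most of the infrared light is not absorbed by the sample.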

C3.3.3 DATA ANALYSIS 


C3.3.3.1 KINETIC PROCESSES 


As a first step in understanding the analysis of energy transfer experiments, it is worthwhile to summarize the steps 
in a typical experiment where C$_6$F$_6$ is the hot donor and carbon dioxide is the bath receptor molecule. First, excited 
C$_6$F$_6$ molecules (C$_6$F$_6^{E}$) are produced at energy E = 41 822 cm$^{-1}$ by an excimer laser pulse (25 ns), 

$$\mathrm{C_6F_6} + h\nu\ (248\ \mathrm{nm}) \rightarrow \mathrm{C_6F_6}^{E}. \qquad \text{(C3.3.4)}$$

The C$_6$F$_6$ electronic state excited in this energy region rapidly interconverts (collision-free), on a time scale of less 
than 30 ns, to high vibrational energy levels of the electronic ground state. Collisions of this hot donor with CO$_2$ 
'inert' bath gas then cause translational, rotational and vibrational excitation of the $\nu_3$ stretching (00$^0$1; 2349 cm$^{-1}$) 
vibrational state, as well as rotational and translational excitation in the ground vibrationless (00$^0$0) level, 

$$\mathrm{C_6F_6}^{E} + \mathrm{CO_2}(00^{0}0) \rightarrow \mathrm{C_6F_6}^{E'} + \mathrm{CO_2}(00^{0}1;\ J, V) \quad \text{(bath vibrational excitation, } \nu_3\text{)} \qquad \text{(C3.3.5)}$$

$$\mathrm{C_6F_6}^{E} + \mathrm{CO_2}(00^{0}0) \rightarrow \mathrm{C_6F_6}^{E''} + \mathrm{CO_2}(00^{0}0;\ J', V') \quad \text{(bath translation/rotation excitation)}. \qquad \text{(C3.3.6)}$$

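A tunable diode laser operating at 4.3 μm is used to probe the P and/or R branch bands of the following transitions, 

$$\mathrm{CO_2}(00^{0}1;\ J, V) + h\nu\ (4.3\ \mu\mathrm{m}) \rightarrow \mathrm{CO_2}(00^{0}2;\ J \pm 1, V) \qquad \text{(C3.3.7)}$$

$$\mathrm{CO_2}(00^{0}0;\ J', V') + h\nu\ (4.3\ \mu\mathrm{m}) \rightarrow \mathrm{CO_2}(00^{0}1;\ J' \pm 1, V'). \qquad \text{(C3.3.8)}$$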

Velocity recoils are measured at short times after the initial ultraviolet excitation pulse by probing the 'nascent' 
Doppler profiles for the different spectral lines probed in these last steps. 

C3.3.3.2 INITIAL RATES TECHNIQUE 

Using the above equations, the rate equation for production of bath molecules in a given quantum state due to 
collisions with a hot donor molecule can be written (e.g. for equation (C3.3.5)) 


$$\mathrm{d}[\mathrm{CO_2}(00^{0}1;\ J, V)]/\mathrm{d}t = k_r\,[\mathrm{C_6F_6}^{E}]\,[\mathrm{CO_2}(00^{0}0)]$$


where $k_r$ is the rate constant for production of CO$_2$(00$^0$1; J, V) from collisions between C$_6$F$_6^{E}$ molecules excited 
to an energy E and initially unexcited CO$_2$(00$^0$0) bath molecules. For times t after the excimer laser excitation 
pulse that are short compared to the mean collision time in the gas, this equation can be solved to a good degree of 
approximation by using the initial rate technique with 

$$[\mathrm{CO_2}(00^{0}1;\ J, V)] = k_r\,[\mathrm{C_6F_6}^{E}]_0\,[\mathrm{CO_2}(00^{0}0)]\,t.$$

Here [C$_6$F$_6^{E}$]$_0$ is the initial concentration of excited donor molecules produced at time t = 0 by the excimer laser 
pulse, and [CO$_2$(00$^0$0)] is the concentration of bath molecules that, for all practical purposes, can be assumed 
constant. The infrared diode laser probe provides an experimental value for [CO$_2$(00$^0$1; J, V)], while the number of 
absorbed excimer laser photons provides a measure of [C$_6$F$_6^{E}$]$_0$. [CO$_2$(00$^0$0)] is known from the partial pressure 
of the bath gas in the cell and the time t is easily determined from the transient recorder time base, leaving only $k_r$ 
to be determined from these experimentally measured parameters. Experiments of this kind provide three important 
pieces of data: (1) the distribution of populations in the final bath vibration/rotation state (00$^0$1; J, V); (2) the 
distribution of recoil velocities, V, from a measure of the Doppler lineshape for the carbon dioxide spectral 
transition (00$^0$1; J, V) + $h\nu$ (4.3 μm) → (00$^0$2; J ± 1, V) and (3) the rate constant, $k_r$, or probability that a collision 
produces a CO$_2$ molecule with a given final J and V in the 00$^0$1 vibrational state. 
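
The initial-rate expression can be rearranged to give the rate constant from the measured quantities. The sketch below shows the rearrangement and a round-trip check; all numerical values are hypothetical placeholders chosen only to illustrate the units, not data from the experiments described here.

```python
def product_concentration(k_r, donor_conc0, bath_conc, t):
    """Initial-rate estimate of the product concentration at short time t."""
    return k_r * donor_conc0 * bath_conc * t


def initial_rate_constant(product_conc, donor_conc0, bath_conc, t):
    """Solve the initial-rate expression for k_r (cm^3 molecule^-1 s^-1)."""
    if t <= 0 or donor_conc0 <= 0 or bath_conc <= 0:
        raise ValueError("time and concentrations must be positive")
    return product_conc / (donor_conc0 * bath_conc * t)


# Hypothetical inputs (molecules cm^-3 and seconds), for illustration only.
bath = 6.5e14          # bath gas density, roughly 20 mTorr at room temperature
donor0 = 1.0e13        # excited donors created by the excimer pulse (assumed)
t_probe = 1.0e-6       # probe time, chosen short compared with the collision time
k_assumed = 1.0e-12    # assumed state-specific rate constant

signal = product_concentration(k_assumed, donor0, bath, t_probe)
k_recovered = initial_rate_constant(signal, donor0, bath, t_probe)
print(f"product density at t: {signal:.3e} cm^-3, recovered k_r: {k_recovered:.3e}")
```

The linear form holds only while t is a small fraction of the mean collision time, so that neither the donor nor the bath concentration has changed appreciably.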


The initially excited C$_6$F$_6^{E}$ molecules can produce de-excited species, such as C$_6$F$_6^{E'}$, that are also able to excite 
CO$_2$ via collisions. This fact emphasizes the importance of choosing t short enough that a given bath molecule has 
time for only a single collision with an excited donor and does not collide a second time with either another hot 
donor or a bath molecule. Collisions with other bath molecules tend to relax the 'nascent' population and velocity 
distributions formed initially by collisions with the hot donor, while second collisions with excited donors produce 
'doubly' excited bath molecules. In contrast, rotational and vibrational-state-population distributions, velocity 
distributions and $k_r$ values measured at a time t after the excimer laser pulse corresponding to 0.1-0.25 of a gas 
kinetic collision provide data specific to a collision of a hot donor of energy E with a cold bath molecule. Thus, in 
experiments of this type, it is possible to correlate the donor energy, its vibrational density of states and other 
molecular properties with excitation probabilities, population distributions and the kinetic energy distributions of 
the scattered bath species. 


C3.3.4 DEDUCING ENERGY TRANSFER MECHANISMS FROM 
POPULATION AND VELOCITY DISTRIBUTIONS OF THE SCATTERED 
BATH MOLECULES 

ROTATIONAL STATE POPULATION DISTRIBUTIONS FOR VIBRATIONAL EXCITATION OF THE BATH 

Figure C3.3.5 shows typical data obtained from experimental studies of the type described above, where the hot 
donor is the nitrogen heterocycle pyrazine, C$_4$H$_4$N$_2$, initially excited by an excimer laser to an energy of 40 640 
cm$^{-1}$. Here the process probed is excitation of a vibrationally-excited bath state where all three degrees of freedom 
of the bath (vibration, rotation and translation) can become excited. In this particular case the vibrational state 
excited by collision is the first asymmetric stretch level of CO$_2$, 00$^0$1, which has 2349 cm$^{-1}$ of vibrational energy, 
roughly ten times the mean thermal energy in these experiments (kT = 208 cm$^{-1}$, where k is Boltzmann's constant). 
Shown in the upper half of the figure is a 'Boltzmann plot' of the natural log of the measured rotational state 
populations for just the 00$^0$1 level, divided by their degeneracy, 2J + 1, versus J(J + 1), which is proportional to the 
molecular rotational energy. The slope of such plots ($-1/kT_R$) gives the temperature ($T_R$) describing the rotational 
state distribution for a system at equilibrium at temperature $T_R$ [6, 9, 10, 11, 13, 18]. There are two remarkable 
things about this figure. First, the rotational state population distribution does give a straight-line 'Boltzmann plot' 
suggesting that the CO$_2$ molecules, scattered into this excited vibrational level by collisions with vibrationally hot 
pyrazine molecules, have a 'pseudo-equilibrium' distribution. Second, and far more amazing, is that the 
temperature, $T_R$, deduced from the slope of this plot is only 383 ± 40 K, just slightly warmer than the initial, 
ambient cell temperature of 298 K! 
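
The Boltzmann-plot analysis described above amounts to a straight-line fit. The sketch below generates synthetic populations at a chosen rotational temperature and recovers that temperature from the slope of ln[population/(2J + 1)] versus J(J + 1). The rotational constant of 0.39 cm$^{-1}$ is an approximate value assumed here for CO$_2$, the set of J levels is arbitrary, and the data are synthetic rather than measured.

```python
import math

K_CM = 0.695035   # Boltzmann constant in cm^-1 per kelvin
B_ROT = 0.39      # assumed (approximate) CO2 rotational constant, cm^-1


def synthetic_populations(j_values, temp_k, b_rot=B_ROT):
    """Thermal rotational populations (unnormalized) at temperature temp_k."""
    return [(2 * j + 1) * math.exp(-b_rot * j * (j + 1) / (K_CM * temp_k))
            for j in j_values]


def rotational_temperature(j_values, populations, b_rot=B_ROT):
    """Least-squares slope of ln(pop/(2J+1)) vs J(J+1), converted to a temperature."""
    xs = [j * (j + 1) for j in j_values]
    ys = [math.log(p / (2 * j + 1)) for j, p in zip(j_values, populations)]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    # The rotational energy is b_rot * J(J+1), so slope = -b_rot / (k * T_R).
    return -b_rot / (K_CM * slope)


j_levels = list(range(5, 40, 4))               # arbitrary illustrative J values
pops = synthetic_populations(j_levels, 383.0)  # synthetic data at T_R = 383 K
t_rot = rotational_temperature(j_levels, pops)
print(f"fitted rotational temperature: {t_rot:.1f} K")
```

With real data the scatter of the points about the line, and hence the uncertainty in the slope, sets the error bar on the fitted temperature.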

Figure C3.3.5. The upper half of the figure is a 'Boltzmann' plot of the natural log of the population scattered into 
the CO$_2$(00$^0$1; J) vibration-rotation level divided by 2J + 1, for collisions with an excited pyrazine molecule as 
depicted by the equation at the top of the figure. The slope of such a plot is $-1/kT_R$ where k is Boltzmann's constant 
and $T_R$ is the temperature that characterizes the rotational-state distribution in the CO$_2$(00$^0$1) vibrational level. 
Shown in the lower half of the figure is a Doppler-lineshape profile for the CO$_2$(00$^0$1; J = 19) → CO$_2$(00$^0$2; J = 
18) transition, where the molecules in CO$_2$(00$^0$1; J = 19) have been excited by the collision process depicted at the 
top of the figure. The best fit of the data (circles) to a Gaussian profile gives a translational temperature of 328 K, 
indistinguishable in this case from the ambient 298 K Doppler profile. 

C3.3.4.1 VELOCITY PROFILES FOR VIBRATIONAL EXCITATION OF THE BATH 

In the lower half of figure C3.3.5 is a plot of the spectral lineshape for the transition 00$^0$1; J = 19 → 00$^0$2; J = 18 
that provides a measure of the recoil velocity distribution for molecules scattered into the 00$^0$1; J = 19 state. The 
width of this distribution is characterized by a temperature of 328 ± 30 K that is again only slightly larger than the 
298 K gas temperature of the pre-excited molecules. These collisions have managed to insert into the bath acceptor 
molecules an energy equivalent to more than ten times the mean ambient energy of the initial molecular ensemble 
(before excimer laser excitation) without significantly exciting either the translational or rotational degrees of 
freedom of the molecule [6, 9, 10, 11, 13, 18]! 

In fact, the two observations represented by the upper and lower halves of figure C3.3.5 paint a remarkably 
consistent picture of the physical process that leads to vibrational excitation of the bath molecules in collisions with 
molecules having chemically significant amounts of vibrational energy. Such collisions must be 'soft', taking place 
at some distance short of the repulsive potential wall for interaction of the two molecules. Note that a bath molecule 
undergoing an impulsive collision that samples the steep repulsive wall of the intermolecular potential of a highly 
vibrationally excited molecule, with its rapidly oscillating 'atomic pistons', would perforce be kicked rather hard, 
thereby developing substantial translational recoil. Furthermore, since CO$_2$ is cigar shaped, it is nearly impossible to strike the 
molecule anywhere along its molecular axis (except perpendicular to the axis at the C atom and parallel to the axis 
at the O atom end) without inducing significant rotational motion [20]. Long-range energy transfer of this type has 
been characterized for small molecules at low excitation energies [21, 22, 23, 24, 25, 26, 27, 28 and 29], but came 
as a complete surprise for encounters of the highly energetic nature described here. In such cases excitation of the 
vibrations of the acceptor is believed to arise from resonant vibrational-energy transfer in which the donor and the 
acceptor lose and gain, respectively, equal amounts of internal energy. Such an energy exchange is known to occur 
via long-range forces of which the most important is the transition-dipole-transition-dipole interaction. 
(The contribution to the dipole moments arises from vibrations of the molecules and is proportional to the 
derivative of the dipole moment with respect to the molecular, internal, nuclear coordinates [21, 22, 23, 24, 25, 26, 
27, 28 and 29].) It is worth emphasizing that such energy transfer processes do not occur with high probability in a 
typical collision. (The rate constant for the process depicted in figure C3.3.5 has been measured and indicates that 
the probability for exciting all of the 00$^0$1 molecules, without regard to their rotational state designation, is about 
1% per gas kinetic encounter [6, 8, 11, 13].) 

C3.3.4.2 VELOCITY PROFILES FOR TRANSLATIONAL-ROTATIONAL EXCITATION OF THE BATH 

In stark contrast to the results shown in figure C3.3.5 are data obtained for collision processes that lead to no 
vibrational excitation of the bath molecule, leaving CO$_2$ in its ground vibrationless state, 00$^0$0, but in highly 
rotationally excited levels [5, 9, 11, 12, 14, 16, 17]. The mean J for CO$_2$ at T = 298 K is 24, and figure C3.3.6 shows 
linewidth data obtained for the transition 00$^0$0; J = 72 → 00$^0$1; J = 71 of CO$_2$ produced by collisions with 
methylpyrazine molecules excited to energies of 37 000 and 41 000 cm$^{-1}$, respectively. Such high rotational levels 
of CO$_2$ are essentially unpopulated at the cell temperature of 298 K. The linewidths measured for this transition 
correspond to temperatures of 1340 ± 250 and 1160 ± 220 K, respectively, a little over four times the initial 
ambient value, indicating significant recoil for this rapidly rotating molecule. Again, the picture presented by these 
data is very consistent if the behaviour of both the rotational and translational motions of the recoiling CO$_2$ bath 
molecule are considered. A hard, impulsive collision that samples the repulsive wall of the methylpyrazine donor, 
with its rapidly oscillating atomic pistons, provides a significant kick to the bath acceptor molecule, causing large 
translational recoil. Again, because CO$_2$ is cigar shaped, such a violent hit on the molecule almost always leads to 
significant rotational excitation accompanying the translational recoil [20]! While rate constant measurements for 
such processes indicate that the probability for exciting a given J level of 00$^0$0 is of the order of 0.1-1% per 
gas-kinetic collision [5, 9, 11, 12, 14, 16, 17], the total excitation probability for this process, summed over all J levels 
in 00$^0$0, is probably greater than 90%. The key point is that a collision between a medium-sized donor, with very 
high levels of internal vibrational excitation, and a small bath acceptor molecule is most likely to put donor 
vibrational energy into rotation and translation of the bath with little going into bath vibration. 

Figure C3.3.6. Doppler-line profiles for molecules scattered into the CO$_2$(00$^0$0; J = 72) state by collisions with hot 
methylpyrazine molecules as depicted by the equations above each half of the figure. The energy of methylpyrazine 
in the upper half of the figure is 37 000 cm$^{-1}$ (excitation at 266 nm) while the energy of methylpyrazine in the 
lower half of the figure is 41 000 cm$^{-1}$ (excitation at 248 nm). 

C3.3.4.3 QUALITATIVE CORRELATION OF ENERGY TRANSFER DATA TO THE INTERMOLECULAR POTENTIAL 

Figure C3.3.7 shows a typical intermolecular potential and the types of recoil linewidth that are observed for 
collisions that sample the long- and short-range forces acting during a collision. For all cases studied so far, 
vibrational excitation of the bath acceptor species is accompanied by almost no excitation of translational or 
rotational motion. The clear signature of the kind of collision that samples the long-range, attractive part of the 
intermolecular potential is narrow recoil linewidths as shown in the upper half of figure C3.3.7. Figure C3.3.8 
shows what might be a typical trajectory for this kind of interaction, a distant fly-by in which energy exchange is 
brought about by long-range electrical forces acting at a distance. On the other hand, collisions that sample the 
steep repulsive part of the potential shown in the lower half of figure C3.3.7 have a signature characterized by 
wide recoil linewidths as shown in the upper half of figure C3.3.7. A typical 'direct hit' trajectory that samples 
this part of the potential is shown in figure C3.3.9. 

Close encounters of the bath molecule with the rapidly vibrating atoms of the donor lead to strong recoil (and 
corresponding rotational motion) with accompanying loss of energy from the donor vibrations. The distant fly-by is 
characterized by vibration-vibration (resonant) energy transfer, while the impulsive, violent collision is 
characterized by vibration-translation/rotation energy transfer. It is reasonable to ask why these mechanisms are so 
clearly separated in nature, for example, why the vibrationally excited bath molecules do not have a recoil 
linewidth intermediate between their original, ambient value and the large recoil linewidths exhibited in 
vibration-translation/rotation energy transfer. While a definitive answer to this question is at present lacking, some clues can 
be found in the nature of the bath vibrations that have been studied so far. In all cases investigated to date with this 
technique, the bath modes excited by collisional energy transfer have had vibrational states whose separation is 
large compared to the mean thermal energy ($h\nu \gg kT$). These are sometimes referred to as 'stiff' vibrations. There 
are good theoretical reasons to expect 'soft' acceptor modes with energy separations comparable to kT to be excited 
by a combination of both short- and long-range forces [30, 31, 32, 33, 34, 35, 36, 37, 38 and 39]. 

Figure C3.3.7. In the upper half of the figure are shown typical measured Doppler profiles for molecules scattered 
into the (00$^0$0; J = 72) or (00$^0$1; J = 17) states of CO$_2$ by collisions with hot pyrazine having an energy of 40 640 
cm$^{-1}$. In the lower half of the figure is shown a typical intermolecular potential identifying the 'hard' and 'soft' 
collision regimes and the kind of energy transfer they effect. 

Figure C3.3.8. A typical trajectory for a 'soft' collision between a hot pyrazine molecule and a CO$_2$ bath molecule 
in which the CO$_2$ becomes vibrationally excited. 


[Figure artwork not reproduced; panels labelled 'Approach' and 'Recoil'.] 


Figure C3.3.9. A typical trajectory for a 'hard' collision between a hot donor molecule and a CO$_2$ bath molecule in 
which the CO$_2$ becomes translationally and rotationally excited. 

C3.3.5 QUANTITATIVE DATA ANALYSIS 


C3.3.5.1 MASTER EQUATION ANALYSIS OF UNIMOLECULAR REACTION DYNAMICS 


The analysis of the data presented so far, while qualitative, has provided a clear picture of the general mechanisms 
for loss of energy from a molecule with chemically significant amounts of energy. Nevertheless, a quantitative 
representation of this information would be highly desirable to use as a benchmark for comparison with theory and 
in practical applications where rates of unimolecular reactions are modelled with master equation techniques [40, 
41, 42, 43 and 44]. Figure C3.3.10 shows a schematic, energy-level diagram for a molecule undergoing 
unimolecular decomposition. Above the reaction barrier the molecule has sufficient energy to undergo 
decomposition at a rate $k_E$ represented by the arrows going to the right. On the other hand, quenching collisions of 
the type we have been discussing carry molecules down to lower energy as represented by the downward arrows in 
the figure. When a quenching collision carries a molecule from an energy above the reaction barrier to an energy 
below this barrier, it snuffs out the reaction process. In the original Lindemann unimolecular reaction scheme [2], 
both $k_E$ and the rate constant for quenching collisions were assumed to be independent of energy. This 
assumption is too simple. Both $k_E$ and the quenching rate constant are energy dependent. Quenching collisions 
can not only bring molecules below the reaction barrier depicted in figure C3.3.10, they can also reshuffle 
molecules within the different energy states above the barrier, changing the rate of unimolecular decomposition 
because of the dependence of $k_E$ on E. An expression that takes into account these different energy dependences is 
the master equation giving the rate of loss of substrate S*(E) at an energy E [40, 41, 42, 43 and 44]: 

$$\frac{\mathrm{d}[\mathrm{S}^*(E)]}{\mathrm{d}t} = k_{LJ}[\mathrm{B}] \int \left( P(E', E)\,[\mathrm{S}^*(E')] - P(E, E')\,[\mathrm{S}^*(E)] \right) \mathrm{d}E' - k_E\,[\mathrm{S}^*(E)].$$

The last term in this expression is simply the unimolecular rate of loss of substrate S*(E), $k_{LJ}$ is the Lennard-Jones 
rate constant, [B] the concentration of bath molecules and $P(E, E')$ the energy transfer probability distribution 
function. $P(E, E')$ gives the probability that a substrate molecule S*(E) at energy E will be carried to an energy $E'$ 
in a collision with a bath quenching molecule. Note that, while $k_E$ is a property of the substrate molecule alone, 
$P(E, E')$ depends on the identity of both substrate and bath molecules (as well as on the energies E and $E'$). From this 
expression, a full description of a unimolecular reaction process clearly requires a knowledge of the distribution 
function $P(E, E')$ for energy loss from the substrate S*(E). 
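
To make the structure of the master equation concrete, the sketch below integrates a coarse-grained version of it on a grid of energy levels. Everything numerical here is an assumption made for illustration: the exponential-down form of the transfer kernel, the neglect of upward transitions and of density-of-states weighting, the barrier height, the linear rise of the reaction rate above the barrier, and the collision rate. The array `kernel[i][j]` plays the role of the probability that a molecule at grid energy i is carried to grid energy j in one collision.

```python
import math


def downward_kernel(energies, alpha):
    """Row-normalized exponential-down kernel; upward transfer is neglected (assumption)."""
    kernel = []
    for i, e_from in enumerate(energies):
        weights = [math.exp(-(e_from - energies[j]) / alpha) if j <= i else 0.0
                   for j in range(len(energies))]
        total = sum(weights)
        kernel.append([w / total for w in weights])
    return kernel


def step(pops, kernel, k_react, collision_rate, dt):
    """One explicit Euler step of the discretized master equation."""
    n = len(pops)
    gain = [0.0] * n
    for i in range(n):
        for j in range(i + 1):
            gain[j] += kernel[i][j] * pops[i]      # collisional flow from level i to level j
    new_pops, reacted = [], 0.0
    for i in range(n):
        lost = k_react[i] * pops[i] * dt           # unimolecular loss from level i
        reacted += lost
        new_pops.append(pops[i] + collision_rate * dt * (gain[i] - pops[i]) - lost)
    return new_pops, reacted


# Hypothetical model parameters (energies in cm^-1, rates in s^-1).
energies = [1000.0 * i for i in range(41)]         # grid from 0 to 40 000 cm^-1
barrier = 30000.0
k_react = [1.0e4 * (1.0 + (e - barrier) / 1000.0) if e >= barrier else 0.0
           for e in energies]
kernel = downward_kernel(energies, alpha=1000.0)
collision_rate = 2.5e5                             # stands in for k_LJ[B]
dt, n_steps = 2.0e-8, 1000                         # 20 microseconds in total

final_pops = [0.0] * len(energies)
final_pops[-1] = 1.0                               # all molecules start at the top level
total_reacted = 0.0
for _ in range(n_steps):
    final_pops, reacted_now = step(final_pops, kernel, k_react, collision_rate, dt)
    total_reacted += reacted_now

below = sum(p for p, e in zip(final_pops, energies) if e < barrier)
print(f"reacted: {total_reacted:.3f}, quenched below barrier: {below:.3f}")
```

Because each row of the kernel sums to one, the collisional terms conserve population and only the reaction term removes it, which mirrors the roles of the integral and the last term in the master equation. A realistic calculation would use a kernel that satisfies detailed balance and an energy-dependent rate constant from statistical theory.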

Figure C3.3.10. A schematic energy-level diagram for a molecule capable of undergoing unimolecular reaction 
above the energy depicted as the reaction barrier. Arrows to the right indicate reaction (collision-free) at a rate $k_E$ 
that depends on the energy E. Down arrows represent collisional redistribution of the hot molecules both above and 
below the reaction barrier. 

C3.3.5.2 EXTRACTING THE ENERGY TRANSFER PROBABILITY DISTRIBUTION FUNCTION P(E, E') 


The data obtained in the infrared-diode-laser-probe studies described above provides quenching information at a 
given substrate donor energy E. By varying the laser excitation wavelength for production of vibrationally hot 


species, equivalent data at other excitation energies can be obtained. Experiments of this type have already begun 
[45, 46 and 47]. Even to extract P(E, E) at a single donor energy E by inverting the experimental data from these 
studies is a formidable task [15]. First, the information obtained provides a measure of the probability, P(00⁰0; J,
V), for producing a given final quantum state of the bath acceptor molecule, such as 00⁰0, J, V, with a specific
vibrational (00⁰0), rotational (J) and velocity (V) signature. In order to turn such a distribution into a P(E, E')
distribution, some method of identifying the initial state of the bath molecule must be found so that the energy
change ΔE = E - E' occurring during the collision can be determined. (The conventional way to define ΔE gives it a
negative sign for energy loss (E < E') from the donor.) The initial state of the bath molecule consists of a
Boltzmann distribution of velocities and rotational state populations described by the cell temperature T. Thus, each
final state of the bath molecule can arise from a number of different initial states, leading to a distribution of ΔE
values. Fortunately, for large ΔE, this spread is not too significant because the initial distribution for cell
temperatures near T = 300 K is not broad. In addition, by studying the final P(00⁰0; J, V) distributions as a function
of cell temperature, the initial states of the bath that contribute significant population to a given scattered 00⁰0; J, V
state can be narrowed still further [15, 16]. Second, in the case of translational motion, the quantity of interest is the
energy transferred in the centre-of-mass frame, which takes into account the recoil of both the bath acceptor and the
donor. Thus, the data obtained in experiments that measure the laboratory-frame recoil velocities of the bath
molecules, as described here, must be transformed into the centre-of-mass frame. The procedure for doing this is
lengthy and has been described elsewhere [9, 12]. Third, the results for collision-induced scattering into a large
number of different final states of the bath molecule must be summed in order to obtain the complete distribution
function P(E, E'). Finally, there is at present no experimental measure of, and hence no way to take into account, the change
in rotational energy of the donor molecule during the collision. For heavy donors, this is not expected to cause
much error in the determination of the distribution functions because angular-momentum constraints limit the
maximum change in angular momentum during the collision [9, 20].
For heavy molecules with very small rotational state spacing, this limit on ΔJ puts severe upper limits on the
amount of energy that can be taken up in the rotations of a heavy molecule during a collision. Despite these
limitations, P(E, E') distributions have been obtained by inverting data of the type described here for values of ΔE
in the range -1500 cm⁻¹ > ΔE > -8000 cm⁻¹ for the two donor molecules pyrazine and hexafluorobenzene with
carbon dioxide as a bath acceptor molecule [15, 16]. Figure C3.3.11 shows these experimentally derived
probability distributions for events that leave the CO2 bath molecule in its ground vibrational level 00⁰0 ('pure'
vibration-rotation/translation energy transfer). Even though limited to large values of ΔE, these probability
distribution functions are very revealing. First and foremost, we see that the probability for very large energy
transfers (e.g. -ΔE = 6000-8000 cm⁻¹) is small but measurable. These so-called 'supercollisions' were a great
surprise when first discovered a number of years ago [48, 49, 50, 51, 52, 53, 54, 55 and 56]. The distributions in
figure C3.3.11 can be thought of as the 'supercollision tail' of the P(E, E') distribution function. A second
interesting feature of the data in figure C3.3.11 is the difference between the two molecules. Evidently, P(E, E') is
significantly larger for hexafluorobenzene than for pyrazine at small ΔE. Since the average energy transferred in
collisions of this type is always weighted heavily by low ΔE values, we would expect that the mean energy
transferred from hexafluorobenzene to bath acceptors would be larger than the mean energy transferred from
pyrazine to the same bath acceptors. Experimental measurements of these average energy transfer values have not
yet been made for the same CO2 quencher bath molecule, but the trends from self-quenching data indicate that
C6F6 will have a substantially larger mean energy transfer value than pyrazine [52, 58]. Such a trend is consistent
with the large number of low-frequency modes in C6F6 compared to pyrazine, a factor that usually drives up the
mean-energy-transfer values. Finally, the probabilities for transferring large amounts of energy become larger in
pyrazine than in C6F6 for ΔE values more negative than about -3000 cm⁻¹. While no data are generally available
on the variation in 'supercollision' probability with molecular parameters, the trend in figure C3.3.11 can be
rationalized by recognizing that pyrazine has more high-frequency vibrations than C6F6. In transferring, e.g., 6000
cm⁻¹ from pyrazine to CO2, only two C-H-stretch quanta are required, while four C-F-stretch quanta would be
needed in the case of C6F6 [16]. As a general rule, energy-transfer probabilities increase if the number of
vibrational quanta surrendered in the exchange process is minimized [30, 31, 32, 33, 34, 35, 36, 37, 38 and 39].
Figure C3.3.11. The energy transfer probability distribution function P(E, E') (see figure C3.3.2) for two
molecules, pyrazine and hexafluorobenzene, excited at 248 nm, arising from collisions with carbon dioxide
molecules. Only those collisions that leave the carbon dioxide bath molecule in its ground vibrationless state 00⁰0
have been included in computing this probability.
The probability distribution functions shown in figure C3.3.11 are limited to events that leave the bath molecule
vibrationally unexcited. Nevertheless, we know that the vibrations of the bath molecule are excited, albeit with low
probability, in collisions of the type being considered here. Figure C3.3.12 shows how the P(E, E') distribution
functions of figure C3.3.11 are changed if the probability for exciting the CO2 (00⁰1) level is also included.
Because such bath vibrational excitation is accompanied by essentially no translational or rotational energy gain,
the probability increase from this channel is confined to a narrow region at 2349 cm⁻¹, the energy of the CO2 00⁰0
→ 00⁰1 vibrational transition. Thus, the quantum nature of the bath vibrations, coupled with a mechanism in which
bath vibrations are resonantly excited by vibration-vibration energy exchange from a hot donor, leads to the
appearance of resonances or spikes in the P(E, E') distribution function [15, 16]!
Figure C3.3.12. The energy-transfer-probability-distribution function P(E, E') (see figure C3.3.2 and figure
C3.3.11) for two molecules, pyrazine and hexafluorobenzene, excited at 248 nm, arising from collisions with
carbon dioxide molecules. Both collisions that leave the carbon dioxide bath molecule in its ground vibrationless
state, 00⁰0, and those that excite the 00⁰1 vibrational state (2349 cm⁻¹), have been included in computing this
probability. The spikes in the distribution arise from excitation of the carbon dioxide bath 00⁰1 vibrational mode.
C3.3.6 SUMMARY


A variety of interconnected, bimolecular, collision studies have been described that employ laser devices to 
investigate the detailed energy disposal in product molecular species and to provide a direct experimental measure 
of the energy-transfer distribution function P(E, E'). A knowledge of this function is important both in testing
detailed theoretical energy-transfer calculations and in modelling unimolecular chemical reactions using master- 
equation methods. Although there have been a number of extremely informative studies of the energy relaxation of 
highly vibrationally excited molecules in the past, with rare exception these studies are not able to follow all the 
degrees of freedom of the quencher molecules. The experimental approach described here, designed as it is to 
investigate the detailed dynamics of these collisions by separately probing the vibrational, rotational and 
translational degrees of freedom, can significantly increase our understanding of the mechanisms for these 
fundamental processes that are of such importance in studies of photochemistry, unimolecular and bimolecular 
reactions. All of these experiments provide data of fundamental chemical interest since the information obtained is 
sensitive to molecular-potential-energy surfaces and can serve as a test for necessarily approximate dynamical 
theories. In addition, many of the experimental data obtained will be of practical interest in the study and control of
unimolecular chemical reactions and photochemical processes, in the development of optically pumped molecular
lasers and in the development of an improved understanding of atmospheric chemical reactions.
C3.4 Electronic energy transfer in condensed
phases 

Andrey A Demidov and David L Andrews


C3.4.1 INTRODUCTION 

In condensed phase matter, the process of electronic excitation through the absorption of light commonly results in 
a system with an initially high degree of energetic instability. One of the most important and rapid means by which 
the system begins to accommodate its energy is a redistribution of excitation between the component optical 
centres (molecules or chromophores). The fundamental mechanism for this redistribution is the phenomenon of 
electronic energy transfer. At the molecular level this is a pairwise process in which the energy of electronic 
excitation held by one molecule transfers to another. As a result the former molecule, called the donor, imparts its 
energy of electronic excitation to the latter acceptor. 

In modelling energy transfer in a bulk medium containing more than two chromophores or molecules, one has to 
consider all the individual pairwise processes and conduct the appropriate averaging. Such an approach is valid 
when excitation can be localized on individual molecules. This is called 'localized excitation'; the localized energy is
termed a localized exciton. The process of energy migration is then known as incoherent energy transfer, and the 
jump of an exciton from one molecule to another is a good way to describe it. The physical condition for incoherent 
energy transfer is weak coupling between the donor and acceptor species. 

In the opposite case, i.e. strong coupling, the model of energy transfer changes. Then the donor and acceptor form a 
dimer-like structure as their excited states mix together and form a joint excited state split by twice the coupling 
energy. With such excitonic states we can no longer specify the molecular location of the electronic excitation, as it 
is 'spread out' between both molecules. Now we have to use a different language, and the issue is energy transfer 
between excitonic states. 

In systems comprising a large number of molecules one can find various types of energy transfer. If the molecules 
are strongly coupled we observe multi-excitonic behaviour, where not just two but a number of excitonic excited 
states are formed. Energy transfer in this case is called coherent energy transfer. Generally it may happen that in a 
bulk medium some molecules have strong coupling, whereas others have weak coupling, resulting in a mixture of 
coherent and incoherent energy migration. Let us now consider in more detail the different mechanisms that lead to 
these types of electronic energy transfer. 


C3.4.2 INCOHERENT ENERGY TRANSFER 

In our everyday life we move in a world that we largely experience through processes of electronic energy transfer. 
For example, you can read the printed words on this page because the print absorbs radiation emitted by your 
source of light. At the atomic level that is a process of electronic energy transfer from, let us say, the excited 
tungsten atoms of your reading lamp to the atoms of carbon on the page. The same principle applies across cosmic 
scales as we peer up at distant stars in the night sky, the very process of vision requiring molecules of rhodopsin
in the retina to capture photons of starlight and so enter an electronic excited state. In a very real sense this
radiative process is a direct
transfer of electronic energy from the star to the retina. Obviously the coupling between the donor and acceptor in 
such a case is close to zero. 

The coupling remains relatively weak even at the other extreme of distance, where the energy transfer occurs 
within a single piece of matter and the donor and acceptor are separated by, let us say, only a few nanometres. 
Given the weak coupling, we still have incoherent energy transfer, but now the photon that conveys the energy has 
to be conceptualized as a virtual quantity — its involvement can only be inferred, and the energy transfer is for all 
practical purposes radiationless. This kind of energy transfer was first investigated by Förster in 1948 [1], and
addressed with a perturbation theory based on dipole-dipole interaction between the excited donor and unexcited 
acceptor. Later the theory was rectified in a number of works [2, 3] and became the proven workhorse for much of 
the modern research on energy transfer in condensed matter [4, 5]. 

More recently Andrews and Juzeliūnas [6, 7] developed a unified theory that embraces both radiationless (Förster)
and long-range radiative energy transfer. In other words, this theory is valid over the whole span of distances
ranging from those which characterize molecular structure (nanometres) up to cosmic distances. It also addresses
the intermediate range where neither the radiative nor the Förster mechanism is fully valid. Below is their
expression for the rate of pairwise energy transfer w from donor to acceptor, applicable to transfer in systems where
the donor and acceptor are embedded in a transparent medium of refractive index n:

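\[
w = w_F + w_{\mathrm{rad}} + w_I
\]

\[
w_F = \frac{9\kappa_3^2 c^4}{8\pi\tau_D n^4 R^6} \int F_D(\omega)\,\sigma_A(\omega)\,\frac{\mathrm{d}\omega}{\omega^4},
\qquad
w_{\mathrm{rad}} = \frac{9\kappa_1^2}{8\pi\tau_D R^2} \int F_D(\omega)\,\sigma_A(\omega)\,\mathrm{d}\omega
\]

\[
w_I = \frac{9c^2}{8\pi\tau_D n^2 R^4}\left(\kappa_3^2 - 2\kappa_1\kappa_3\right) \int F_D(\omega)\,\sigma_A(\omega)\,\frac{\mathrm{d}\omega}{\omega^2}
\qquad \text{(C3.4.1)}
\]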

In the above expression the three terms respectively represent a 'Förster' rate contribution, w_F; a radiative
(fluorescence) term, w_rad; and an 'intermediate' term, w_I. The various parameters featuring in (C3.4.1) are defined
as follows: F_D(ω) is the normalized spectrum of donor fluorescence; σ_A(ω) is the absorption cross section of the
acceptor; ω is the optical frequency (2πν); c is the speed of light; τ_D is the radiative lifetime of the donor; n is the
refractive index; and R is the distance between the donor and acceptor. Lastly, κ_1 and κ_3 are orientation factors (their
detailed form to be given below). It may be noted that if the medium across which energy transfer occurs is not
transparent, one has to take into account the frequency dependence of the refractive index, and in particular its
imaginary part, leading to a more complex result (see [7]). One can find elsewhere the following alternative form
of the Förster formula:

\[
w_F = \frac{3\kappa^2}{2\tau_D}\left(\frac{R_0}{R}\right)^6 \qquad \text{(C3.4.2)}
\]
where R_0 is called the 'critical' or Förster radius. Obviously (C3.4.2) and the Förster component of (C3.4.1) are
related, and the Förster radius is calculable from the overlap integral of the appropriate fluorescence and absorption
spectra. Typical values of the Förster radius range over tens of angstroms; see [4, 5].
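
The steep R⁻⁶ scaling of the Förster rate can be illustrated with a short numerical sketch. It assumes the common convention in which the Förster radius is defined for the isotropic average κ² = 2/3, so that the rate reads (3κ²/2)(1/τ_D)(R_0/R)⁶; the function name `forster_rate` and the sample values are hypothetical and chosen only for illustration.

```python
def forster_rate(r, r0, tau_d, kappa_sq=2.0 / 3.0):
    """Foerster transfer rate at donor-acceptor distance r.

    Assumption of this sketch: r0 is defined for kappa^2 = 2/3, so that
    rate = (3 * kappa^2 / 2) * (1 / tau_d) * (r0 / r)**6.
    r and r0 must share one length unit; the result is in 1/(unit of tau_d).
    """
    if r <= 0 or r0 <= 0 or tau_d <= 0:
        raise ValueError("r, r0 and tau_d must all be positive")
    return 1.5 * kappa_sq * (r0 / r) ** 6 / tau_d


# Hypothetical values: R0 = 5 nm, tau_D = 15 ns.
for r_nm in (2.5, 5.0, 10.0):
    print(r_nm, "nm ->", forster_rate(r_nm, 5.0, 15.0), "per ns")
```

With these conventions the rate at R = R_0 equals 1/τ_D, and halving the distance raises the rate by a factor of 2⁶ = 64.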

In (C3.4.1) we find three major forms of dependence on distance: R⁻⁶, R⁻⁴ and R⁻², which correspond to Förster,
intermediate and radiative types of energy transfer. The intermediate part usually makes a significant contribution
at distances of about a hundred or a few hundred nanometres, where all three components w_F, w_I and w_rad are
comparable in magnitude. At shorter distances w_F dominates, whereas at larger distances w_rad dominates. It is
important to clarify the meaning of τ_D. It is the radiative lifetime of the donor, i.e. the inverse of the rate of
radiative transition from its excited state to the ground state, and is related to the measured fluorescence lifetime
τ_fluor through the fluorescence quantum yield η, with τ_fluor = ητ_D. For example, chlorophyll a in various solutions
exhibits a fluorescence lifetime of about 5-7 ns; with η ≈ 0.3-0.35, that makes τ_D ≈ 15-20 ns [8, 9 and 10].
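
The two estimates in this paragraph can be reproduced with a few lines of code. The sketch below converts a measured fluorescence lifetime into a radiative lifetime using τ_fluor = ητ_D, and estimates the distance at which the near-field and far-field terms become comparable. The criterion kR = 1 with k = 2πn/λ is an assumption of this sketch (a single-wavelength estimate, not a statement from the text), and the wavelength and refractive index used are hypothetical.

```python
import math


def radiative_lifetime(tau_fluor, quantum_yield):
    """Radiative lifetime tau_D from tau_fluor = eta * tau_D."""
    if not 0.0 < quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie in (0, 1]")
    return tau_fluor / quantum_yield


def crossover_distance(wavelength, refractive_index=1.0):
    """Distance R at which k * R = 1, with k = 2 * pi * n / wavelength.

    Assumed criterion: below this distance the R^-6 term dominates,
    above it the R^-2 term does. Same length unit as the wavelength.
    """
    return wavelength / (2.0 * math.pi * refractive_index)


print(radiative_lifetime(5.0, 1.0 / 3.0))   # lower end of the quoted range, in ns
print(radiative_lifetime(7.0, 0.35))        # upper end of the quoted range, in ns
print(crossover_distance(680.0, 1.33))      # hypothetical: 680 nm light in water, in nm
```

The crossover estimate lands near a hundred nanometres for visible light, consistent with the distance range quoted above for the intermediate term.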

The κ factors in (C3.4.1) represent another very important facet of the energy transfer [4, 11]. These factors depend
on the orientations of the donor and acceptor. For certain orientations they can reduce the rate of energy transfer to
zero; for others they effect an 'enhancement' of the energy transfer to its maximum possible rate. Figure C3.4.1
exhibits the angles which define the mutual orientation of a donor and acceptor pair; in terms of those angles the
orientation factors κ_1 and κ_3 are given by [6, 7]

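\[
\kappa_j = (\hat{\mu}_D \cdot \hat{\mu}_A) - j\,(\hat{R} \cdot \hat{\mu}_D)(\hat{R} \cdot \hat{\mu}_A) = \cos\alpha - j\cos\beta\cos\gamma, \qquad j = 1, 3 \qquad \text{(C3.4.3)}
\]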

where the case j = 3 is the conventional orientation factor, usually written simply as κ.
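
As an illustration, the orientation factors can be evaluated directly from three vectors. The sketch below assumes the form κ_j = μ̂_D·μ̂_A - j(R̂·μ̂_D)(R̂·μ̂_A) with j = 1 or 3, where the hats denote unit vectors; the helper names are hypothetical.

```python
import math


def _unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / norm for c in v)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orientation_factor(mu_d, mu_a, r_vec, j):
    """kappa_j = (mu_D . mu_A) - j (R . mu_D)(R . mu_A), all as unit vectors."""
    d, a, r = _unit(mu_d), _unit(mu_a), _unit(r_vec)
    return _dot(d, a) - j * _dot(r, d) * _dot(r, a)


# Collinear (head-to-tail) dipoles along the separation vector.
print(orientation_factor((0, 0, 1), (0, 0, 1), (0, 0, 1), 3))
# Parallel dipoles, both perpendicular to the separation vector.
print(orientation_factor((1, 0, 0), (1, 0, 0), (0, 0, 1), 3))
# Mutually perpendicular dipoles, both perpendicular to the separation vector.
print(orientation_factor((1, 0, 0), (0, 1, 0), (0, 0, 1), 3))
```

The three cases show the range of behaviour described above: κ_3² is largest (4) for collinear dipoles, equals 1 for the side-by-side parallel arrangement, and vanishes for the mutually perpendicular one, which switches off the Förster term entirely.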



Figure C3.4.1. Example of the spatial orientation of three major vectors: μ̂_D, μ̂_A and R̂, the unit vectors of the donor
and acceptor transition dipole moments and of the donor-acceptor displacement vector.

The dipole-dipole interaction which leads to (C3.4.2) and (C3.4.3) for energy transfer is in certain cases not
applicable, for example if either the donor or acceptor transition is dipole (E1)-forbidden or exceptionally weak.
Then the coupling necessarily involves the electric quadrupole moment (E2), higher electric multipoles (En) or
even magnetic multipoles (Mn), in each case leading to an orientation and distance dependence of a different form.
In the most common case of predominantly electric coupling, if (En) and (Em) are the leading non-zero moments of the
donor and acceptor, the distance dependence takes the form R^-2(n+m+1). Details of the functional form of the
distance and angle dependence are discussed, for example, in [12, 13 and 14].
Knowledge of the pairwise energy transfer rates forms a basis for finding the average rate of energy transfer in an
ensemble of N molecules. To this end, a system of 'master equations' is commonly employed [15, 16 and 17].
Then the probability, p_i, to find excitation on molecule i can be calculated from:




\[
\frac{\mathrm{d}p_i}{\mathrm{d}t} = -\frac{p_i}{\tau_i} - \sum_j w_{ij}\,p_i + \sum_j w_{ji}\,p_j + F_i \qquad \text{(C3.4.4)}
\]

Here τ_i is the intrinsic lifetime of the excitation residing on molecule i (i.e. the fluorescence lifetime one would
observe for the isolated molecule), w_ij is the pairwise energy transfer rate and F_i is the rate of excitation of the
molecule i by the external source (the photon flux multiplied by the absorption cross section). The master equation
system (C3.4.4) allows one to calculate the complete dynamics of energy migration between all molecules in an 
ensemble, but the computation can become quite complicated if the number of molecules is large. Moreover, it is 
commonly the case that the ensemble contains molecules of two, three or more spectral types, and experimentally it 
is practically impossible to distinguish the contributions of individual molecules from each spectral pool. 
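
For a small ensemble the master equations can be integrated numerically without difficulty. The following is a minimal sketch, assuming the form dp_i/dt = -p_i/τ_i - Σ_j w_ij p_i + Σ_j w_ji p_j + F_i, with w[i][j] taken as the rate of transfer from molecule i to molecule j. The fourth-order Runge-Kutta integrator, the function names and the three-molecule rate matrix are choices made for this illustration only.

```python
def master_rhs(p, w, tau, source):
    """Right-hand side of the master equations for all molecules."""
    n = len(p)
    rhs = []
    for i in range(n):
        loss = p[i] * sum(w[i][j] for j in range(n) if j != i)   # transfer away from i
        gain = sum(w[j][i] * p[j] for j in range(n) if j != i)   # transfer onto i
        rhs.append(-p[i] / tau[i] - loss + gain + source[i])
    return rhs


def integrate(p0, w, tau, source, dt, steps):
    """Propagate the probabilities with classical fourth-order Runge-Kutta."""
    p = list(p0)
    for _ in range(steps):
        k1 = master_rhs(p, w, tau, source)
        k2 = master_rhs([x + 0.5 * dt * k for x, k in zip(p, k1)], w, tau, source)
        k3 = master_rhs([x + 0.5 * dt * k for x, k in zip(p, k2)], w, tau, source)
        k4 = master_rhs([x + dt * k for x, k in zip(p, k3)], w, tau, source)
        p = [x + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


# Hypothetical three-molecule chain with a downhill bias (rates in 1/ns).
w = [[0.0, 2.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
tau = [5.0, 5.0, 5.0]          # intrinsic lifetimes in ns
source = [0.0, 0.0, 0.0]       # no excitation after t = 0
p_final = integrate([1.0, 0.0, 0.0], w, tau, source, dt=0.01, steps=300)
print(p_final)
```

The cost of each step grows as the square of the number of molecules, which is one reason the pooled description discussed next is attractive for large ensembles.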

The measurement of fluorescence intensity from a compound containing chromophores of two spectral types is an 
example of a system for which it is reasonable to operate with the average rates of energy transfer between spectral 
pools of molecules. Let us consider the simple case of two spectral pools of donor and acceptor molecules, as 
illustrated in figure C3.4.2 [18]. The average rate of energy transfer can be calculated as 

\[
k_{DA} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} w_{ij}\,g_i \qquad \text{(C3.4.5)}
\]

where N and M are the numbers of donor and acceptor molecules respectively and g_i is the probability to find
excitation on the donor i; commonly we have g_i ≈ 1. In this case the master equation system can be simplified as:
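\[
\frac{\mathrm{d}n_D}{\mathrm{d}t} = -\frac{n_D}{\tau^D} - k_{DA}\,n_D + k_{AD}\,n_A + F_D, \qquad
\frac{\mathrm{d}n_A}{\mathrm{d}t} = -\frac{n_A}{\tau^A} - k_{AD}\,n_A + k_{DA}\,n_D + F_A \qquad \text{(C3.4.6)}
\]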


In these equations n_D and n_A are the excited state populations of the donor and acceptor molecules, and τ^D and τ^A
are the lifetimes of the donor and acceptor molecules in the excited state; the notation τ^D is used to distinguish it
from the radiative constant τ_D (in other words τ^D = τ_fluor for the donor); k_DA is given by (C3.4.5) and k_AD, the
corresponding rate constant for the backward energy transfer from acceptors to donors, can be found by the same
means. Finally, F_D and F_A represent external sources of excitation, for example the absorption of laser light by the
donor and acceptor molecules. Commonly, for example in the case of δ-pulse excitation (in practice an ultrashort
laser pulse), (C3.4.6) yields exponential decay kinetics for n_D(t) and n_A(t). The opposite case of steady excitation
(CW light) yields the equilibrium ratio


= : . (C3.4.7) 

Master equation methods are not the only option for calculating the kinetics of energy transfer and analytic 
approaches in general have certain drawbacks in not reflecting, for example, certain statistical aspects of coupled 
systems. Alternative approaches to the calculation of energy migration dynamics in molecular ensembles are Monte 
Carlo calculations [18, 19 and 20] and probability matrix iteration [21, 22], amongst others. 
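
For the two-pool description the steady state under CW excitation can be written down in closed form. The sketch below assumes the pool equations dn_D/dt = -n_D/τ_D - k_DA n_D + k_AD n_A + F_D and dn_A/dt = -n_A/τ_A - k_AD n_A + k_DA n_D + F_A, with τ_D and τ_A taken here as the excited-state lifetimes of the two pools. This is the assumed form of the model, not a quotation of the source equations, and all numerical values are hypothetical.

```python
def two_pool_steady_state(tau_d, tau_a, k_da, k_ad, f_d, f_a=0.0):
    """Steady-state populations (n_D, n_A) of the assumed two-pool model."""
    a = 1.0 / tau_d + k_da        # total loss rate from the donor pool
    b = 1.0 / tau_a + k_ad        # total loss rate from the acceptor pool
    det = a * b - k_da * k_ad     # always positive for finite lifetimes
    n_d = (b * f_d + k_ad * f_a) / det
    n_a = (a * f_a + k_da * f_d) / det
    return n_d, n_a


# Hypothetical parameters: only the donor pool is pumped.
n_d, n_a = two_pool_steady_state(tau_d=2.0, tau_a=4.0, k_da=3.0, k_ad=0.5, f_d=1.0)
print(n_d, n_a, n_a / n_d)
```

When only the donors are excited (F_A = 0) the ratio n_A/n_D reduces to k_DA/(1/τ_A + k_AD) in this model, independent of the pumping rate.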



Figure C3.4.2. Schematic presentation of energy transfer between: (a) two donor molecules and six acceptor
molecules; and (b) a general case of energy transfer involving a pool of N donor molecules and a pool of M
acceptor molecules.
C3.4.3 POLARIZATION ANISOTROPY

Let us consider the case of a donor-acceptor pair where the acceptor, after capturing excitation from the donor, can 
emit a photon of fluorescence. If the excitation light is linearly polarized, the acceptor emission generally has a 
different polarization. Common quantitative expressions of this effect are the anisotropy of fluorescence, r, or the 
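degree of polarization, P:

\[
r = \frac{I_\parallel - I_\perp}{I_\parallel + 2I_\perp}, \qquad
P = \frac{I_\parallel - I_\perp}{I_\parallel + I_\perp}, \qquad
r = \frac{2P}{3 - P} \qquad \text{(C3.4.8)}
\]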

where I_∥ and I_⊥ are the measured components of the acceptor fluorescence parallel and perpendicular to the
polarization of the incident excitation beam. The anisotropy of fluorescence is a valuable source of information
about the structure of molecular complexes. The key factor determining the change in polarization is the angle α
between the directions of the transition dipole moments of the donor, μ̂_D, and the acceptor, μ̂_A (see figure C3.4.1).
In the 1920s Levshin [23, 24] and Perrin [25] derived the following well known formula for an ensemble of 
randomly oriented donor-acceptor pairs 


\[
P = \frac{3\cos^2\alpha - 1}{3 + \cos^2\alpha} \qquad \text{(C3.4.9)}
\]

This formula allows one to directly calculate the angle α, a microscopic parameter, by measuring the macroscopic
value P (or equally the anisotropy r). 
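
The conversion between the measured polarization and the microscopic angle is easily automated. The sketch below uses the Levshin-Perrin relation P = (3cos²α - 1)/(3 + cos²α), its algebraic inverse cos²α = (1 + 3P)/(3 - P), and the standard relation r = 2P/(3 - P) between anisotropy and degree of polarization. The function names are hypothetical.

```python
import math


def polarization_from_angle(alpha):
    """Degree of polarization P for an angle alpha (radians) between dipoles."""
    c2 = math.cos(alpha) ** 2
    return (3.0 * c2 - 1.0) / (3.0 + c2)


def anisotropy_from_polarization(p):
    """Anisotropy r from the degree of polarization P."""
    return 2.0 * p / (3.0 - p)


def angle_from_polarization(p):
    """Invert the Levshin-Perrin formula; returns alpha in [0, pi/2] radians."""
    c2 = (1.0 + 3.0 * p) / (3.0 - p)
    if c2 < -1e-12 or c2 > 1.0 + 1e-12:
        raise ValueError("P must lie between -1/3 and 1/2")
    c2 = min(1.0, max(0.0, c2))   # guard against rounding just outside [0, 1]
    return math.acos(math.sqrt(c2))


for degrees in (0.0, 54.7356, 90.0):
    p = polarization_from_angle(math.radians(degrees))
    print(degrees, p, anisotropy_from_polarization(p))
```

Parallel dipoles give P = 1/2 (r = 0.4), perpendicular dipoles give P = -1/3 (r = -0.2), and both quantities vanish at the magic angle, where cos²α = 1/3. Because only cos²α enters, the measurement cannot distinguish α from 180° - α.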

When applying polarization techniques one has to bear in mind that the above result is derived for an ensemble of
independent donor-acceptor pairs with a 'rigid' structure, i.e. a system for which α is a constant. If we deal with
an ensemble of randomly oriented donor and acceptor molecules the result is dramatically different, reflecting a
very rapid loss of polarization 'memory'. Then one single act of energy transfer yields r_0 = 1/25 [2], and two or
more energy transfer jumps, to all intents and purposes, totally destroy any polarization in the emitted
fluorescence. This feature may be used as a powerful test for energy transfer in a sample of freely rotating or
disordered molecules. If energy transfer occurs in a non-ordered system, the acceptor fluorescence is depolarized; if
not, the fluorescence remains polarized. The one exception is where the angle α = 54.7°, the so-called 'magic
angle', when the polarization will be zero anyway. In the case of incoherent energy transfer, the maximum
magnitude of the anisotropy (r = 0.4) happens when the donor and acceptor have linear transition dipole moments
oriented in parallel.

Naturally occurring molecular ensembles such as proteins from photosynthetic systems (plants, algae,
photosynthetic bacteria, etc) are usually relatively rigid systems that contain various chromophores and hold them 
at fixed positions and orientations relative to each other. That is why, despite the numerous energy jumps between 
the chromophores, the resulting emitted fluorescence is polarized. The extent of this polarization thus affords 
invaluable information about the internal structure of molecular complexes. 
C3.4.4 NONLINEAR PHENOMENA

C3.4.4.1 BLEACHING OF THE GROUND STATE 

Equations (C3.4.5) and (C3.4.6) cover the common case when all molecules are initially in their ground electronic
state and able to accept excitation. The system is also assumed to be impinged upon by sources F. The latter are
usually expressible as the product σΦn_0, where σ is an absorption cross section, Φ is the photon flux and n_0 is the
population in the ground state. The common assumption is that n_0 ≈ n_0^0, the ground-state population in the absence
of excitation, i.e. practically all molecules are in the ground state because n_0^0 ≫ n_1, where n_1 is the excited-state
population. This is the assumption of linear excitation, where the system exhibits a linear
response to the excitation intensity. This assumption does not hold when the extent of excitation is significant, i.e.
when the rate of excitation inflow (σΦ) is larger than the rate of excitation dissipation, τ⁻¹. In this case we have a significant
depletion of the ground state: n_0 = (n_0^0 - n_1) < n_0^0, resulting in a nonlinear response from the ensemble.
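
The onset of ground-state bleaching can be made quantitative with a simple two-level estimate. The sketch below assumes the rate equation dn_1/dt = σΦ(n_tot - n_1) - n_1/τ, in which n_tot is the total population, n_1 the excited-state population, and stimulated emission is neglected. These are assumptions of this illustration rather than statements from the text. The steady state is then n_1/n_tot = σΦτ/(1 + σΦτ).

```python
def excited_fraction(sigma_phi_tau):
    """Steady-state excited fraction n_1/n_tot for the assumed two-level model.

    sigma_phi_tau is the dimensionless product of absorption cross section,
    photon flux and excited-state lifetime.
    """
    if sigma_phi_tau < 0.0:
        raise ValueError("sigma * Phi * tau must be non-negative")
    return sigma_phi_tau / (1.0 + sigma_phi_tau)


for s in (0.001, 0.01, 0.1, 1.0, 10.0):
    linear = s                       # prediction if bleaching is ignored
    actual = excited_fraction(s)
    print(s, actual, (linear - actual) / linear)
```

In this model the relative deviation from the linear prediction is σΦτ/(1 + σΦτ), about 1% when σΦτ = 0.01 and 50% when σΦτ = 1.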

C3.4.4.2 SINGLET-SINGLET (S-S) ANNIHILATION 

Let us now consider the case where there is more than one exciton in the given molecular ensemble. The presence 
of two or more excess excitons not only creates two or more 'holes' in the ground state (see case (a) above) but it 
also opens up the possibility of two excitons being found on neighbouring molecules. Then the following two-stage 
process can take place [26]: 


\[
S_1 + S_1 \rightarrow S_n + S_0 \rightarrow S_1 + S_0 \qquad \text{(C3.4.10)}
\]

In the first stage, where at first we have two excitons S_1, excitation jumps from one of the excited molecules to
another excited molecule. As a result, the latter molecule is promoted to a higher excited state S_n, while the former
loses its energy and finds itself in the ground state S_0. This is followed by the second step, a fast internal relaxation
of the highly excited molecule from S_n to S_1. Thus, where we started with two S_1 excitations we end up with only
one, i.e. one excitation has effectively been annihilated (the energy lost through intramolecular relaxation
ultimately manifests as heat). A simplified mathematical expression accounting for S-S annihilation in a
homogeneous ensemble of molecules is as follows:

\[
\frac{\mathrm{d}n}{\mathrm{d}t} = -\frac{n}{\tau} - \gamma n^2 + F \qquad \text{(C3.4.11)}
\]

The solution to this equation reflects a nonlinear response — the kinetics n(t) are strongly dependent on the 
magnitude of F and/or the initial conditions n(t = 0). 
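
The nonlinearity can be seen explicitly in a short calculation. The sketch below assumes the rate equation dn/dt = -n/τ - γn² + F, with γ the annihilation rate constant. For F = 0 this Bernoulli-type equation has the closed-form solution n(t) = n_0 e^(-t/τ) / [1 + γ n_0 τ (1 - e^(-t/τ))], and for constant F the steady state follows from a quadratic equation. Function names and parameter values are hypothetical.

```python
import math


def annihilation_decay(t, n0, tau, gamma):
    """Population n(t) after pulsed excitation (F = 0) with S-S annihilation."""
    decay = math.exp(-t / tau)
    return n0 * decay / (1.0 + gamma * n0 * tau * (1.0 - decay))


def steady_state_population(f, tau, gamma):
    """Steady-state n under constant excitation rate F."""
    if gamma == 0.0:
        return f * tau
    return (-1.0 / tau + math.sqrt(1.0 / tau ** 2 + 4.0 * gamma * f)) / (2.0 * gamma)


# Normalized decays for weak and strong initial excitation (tau = 1, gamma = 0.5).
for n0 in (0.1, 10.0):
    print(n0, [round(annihilation_decay(t, n0, 1.0, 0.5) / n0, 4) for t in (0.0, 0.5, 1.0, 2.0)])
```

The normalized decay is nearly exponential for weak excitation but becomes markedly faster at early times when n_0 is large, which is the signature of annihilation in intensity-dependent kinetics. Under CW excitation the steady-state population grows only as the square root of F once annihilation dominates.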

S-S annihilation phenomena can be considered as a powerful tool for investigating the exciton dynamics in
molecular complexes [26]. However, in systems where that is not the objective it can be a complication one would
prefer to avoid. To this end, a measure of suitably conservative excitation conditions is to have the parameter σΦτ <
0.01. Here τ is the effective lifetime (inverse rate) of intrinsic energy dissipation in the ensemble if the excitation
is by CW light, and τ = τ_las is the pulse duration if the source is a laser pulse faster than all dissipation
processes: 1/τ_las ≫ 1/τ.

One other common source of nonlinear response, singlet-triplet annihilation, is often the reason for a discrepancy 
between fluorometric and absorption kinetic measurements [27, 28 and 29 ]. 


C3.4.5 COHERENT ENERGY TRANSFER 

C3.4.5.1 DIMER 

Let us first consider the simplest case of two identical chromophores (or molecules) that are in interaction with 
each other such that their 'dimer' Hamiltonian can be written [30, 31] as 


\[
\hat{H} = \hat{H}_1 + \hat{H}_2 + \hat{V}_{12} \qquad \text{(C3.4.12)}
\]

where Ĥ_1 and Ĥ_2 are the Hamiltonians of the individual molecules and V̂_12 is the operator for their dipole-dipole
interaction

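\[
\hat{V}_{12} = (4\pi\varepsilon_0)^{-1}\left[(\boldsymbol{\mu}_1 \cdot \boldsymbol{\mu}_2)R_{12}^{-3} - 3(\boldsymbol{\mu}_1 \cdot \mathbf{R}_{12})(\boldsymbol{\mu}_2 \cdot \mathbf{R}_{12})R_{12}^{-5}\right] \qquad \text{(C3.4.13)}
\]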

each μ being an operator on molecular wavefunctions (see also figure C3.4.1 and equation (C3.4.3) for notation
and definitions). The solution of the Schrödinger equation with the above Hamiltonian yields two wavefunctions:

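\[
\Psi^{\pm} = \frac{1}{\sqrt{2}}\left(\Psi_1 \pm \Psi_2\right), \qquad \Psi_1 = \varphi_{1a}\varphi_{20} \quad \text{and} \quad \Psi_2 = \varphi_{10}\varphi_{2a} \qquad \text{(C3.4.14)}
\]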


Here φ_10, φ_20 and φ_1a, φ_2a are the wavefunctions of the non-excited and excited molecules if there is no interaction
between them. In the case we consider, the molecules do interact and as a result the dimer exhibits properties
different from those of the monomers it comprises. In particular, the energy level of the excited state is different
from that of the monomer: it is split into two states,


\[
h\nu^{\pm} = h\nu_{0a} \pm V_{12} \qquad \text{(C3.4.15)}
\]


termed symmetric and antisymmetric respectively. In (C3.4.15), V_12 is the energy of dipole-dipole interaction
between the monomers (the expectation value in the monomer product basis set of the operator given by
(C3.4.13)). The dimer has a common ground state and excitation may terminate in either the ν⁺ or ν⁻ excited state
(see the solid arrows in figure C3.4.3). The transition dipole moments of these transitions are defined as:
\[
\boldsymbol{\mu}^{\pm} = \frac{1}{\sqrt{2}}\left(\boldsymbol{\mu}_1 \pm \boldsymbol{\mu}_2\right) \qquad \text{(C3.4.16)}
\]


(see figure C3.4.4). The dipole strengths of these transitions are calculated as


\[
D^{\pm} = D_{0a}\left(1 \pm \cos\alpha\right) \qquad \text{(C3.4.17)}
\]


where cos α = μ̂_1 · μ̂_2, α being the angle between the transition dipole moments of the unperturbed molecules, and
D_0a is the dipole strength of the individual monomer. One can see from (C3.4.17) that the total dipole strength is
conserved, i.e. D⁺ + D⁻ = 2D_0a. The singly excited dimer, in either its symmetric or antisymmetric state, can
capture another photon and undergo transition to the doubly excited state (φ_1a φ_2a) of energy 2hν_0a. Depending on
which excited state the dimer was in originally, ν⁺ or ν⁻, the resonance energy for the double excitation would be
hν⁻ or hν⁺ respectively; see figure C3.4.3.
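
A compact numerical sketch of these dimer relations follows. It assumes the point-dipole coupling V_12 = k[(μ_1·μ_2)/R³ - 3(μ_1·R)(μ_2·R)/R⁵], with the prefactor k = 1/(4πε_0) left as a parameter so that any consistent unit system can be used, the split levels E± = E_0a ± V_12 and the dipole strengths D± = D_0a(1 ± cos α). The function names and input values are hypothetical.

```python
import math


def dipole_coupling(mu1, mu2, r12, prefactor=1.0):
    """Point-dipole interaction energy for transition dipoles mu1, mu2 separated by r12."""
    r = math.sqrt(sum(c * c for c in r12))
    if r == 0.0:
        raise ValueError("molecules must be separated")
    dot12 = sum(a * b for a, b in zip(mu1, mu2))
    dot1r = sum(a * b for a, b in zip(mu1, r12))
    dot2r = sum(a * b for a, b in zip(mu2, r12))
    return prefactor * (dot12 / r ** 3 - 3.0 * dot1r * dot2r / r ** 5)


def exciton_levels(e_monomer, v12):
    """Energies of the symmetric (+) and antisymmetric (-) dimer states."""
    return e_monomer + v12, e_monomer - v12


def dipole_strengths(d0, alpha):
    """Dipole strengths D+ and D- for an angle alpha (radians) between monomer dipoles."""
    c = math.cos(alpha)
    return d0 * (1.0 + c), d0 * (1.0 - c)


# Side-by-side parallel unit dipoles, unit separation, arbitrary units.
v = dipole_coupling((1, 0, 0), (1, 0, 0), (0, 0, 1))
print(v, exciton_levels(100.0, v), dipole_strengths(1.0, 0.0))
```

For parallel dipoles (α = 0) all of the dipole strength is carried by the '+' state, while for perpendicular dipoles it is shared equally; in every case the sum D⁺ + D⁻ equals 2D_0a and the two levels are separated by 2|V_12|.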
Figure C3.4.3. Energy levels of a dimer (complete description in text).
Figure C3.4.4. Definition of the dimer transition dipole moments μ⁺ and μ⁻ on the basis of the monomer transition
dipole moments μ_1 and μ_2.

Now let us consider the implications of these results for energy transfer. First we recognize that there is no directed 
energy transfer of the form considered in the incoherent case. Molecules in the dimer cannot be recognized as well 
defined separate entities that can capture and translate excitation from one to another. The captured excitation 
belongs to the dimer; in other words, it is shared by both molecules. The only counterpart to energy migration
relates to 'internal' transitions between the dimer states ν⁺ and ν⁻. In other words, if in the case of incoherent
transfer, energy migrates between two or more distinct sites and an exciton is a localized entity that can at any time 
be localized on either of the molecules, then in the case of a dimer we have to change our language and speak about 
an internal transfer between energy states, as the excitation is delocalized between the contributing molecules [32]. 

Before moving further to the multimer case we should outline one further significant feature of a dimer. Dimer 
excited states have a well defined rotational strength [33, 34]; 




(C3.4.18) 


i.e. the dimer has a definite circular dichroism (CD) in each of its excited bands, whereas the monomers it 
comprises may not. This is a useful test to verify if one has dimers or monomers in a sample. To confirm the 
presence of a dimer, the spectrum should exhibit two bands ('+' and '-') manifesting CD of equal magnitude and 
opposite sign. Moreover, the dimer transition dipole moments μ₊ and μ₋ are perpendicular to each other (see figure
C3.4.4), yielding the linear anisotropy r = (3cos² 90° − 1)/5 = −0.2 when the excitation is in the '+' band and the
response is recorded in the '-' band. Also, the bleaching of the ground state measured in the '+' and '-' bands must 
happen synchronously because their ground state origin is common, whereas the 'antibleach' (absorption from the 
excited state into a higher state) and also stimulated emission would reflect the kinetics of an internal energy 
transfer between the dimer states. 
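
The anisotropy value quoted for perpendicular dipoles can be checked numerically. The following is a minimal sketch, assuming the standard expression r = (3cos²θ − 1)/5 for an angle θ between the excited and the probed transition dipole moments; the function name is illustrative only.

```python
import math

def anisotropy(angle_deg):
    """Anisotropy r = (3*cos^2(theta) - 1) / 5 for an angle theta (degrees)
    between the excited and the probed transition dipole moments."""
    c = math.cos(math.radians(angle_deg))
    return (3.0 * c * c - 1.0) / 5.0

# Parallel dipoles, the 'magic angle' and perpendicular dipoles.
for angle in (0.0, 54.7356, 90.0):
    print(f"theta = {angle:8.4f} deg -> r = {anisotropy(angle):+.3f}")
```

Parallel dipoles give the uncoupled-molecule limit r = 0.4, and perpendicular dipoles give r = −0.2, the value expected between the '+' and '-' bands of the dimer.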

Recent theoretical [35, 36 and 37] and experimental [38] research has revealed anomalous behaviour of the dimer 
anisotropy under certain excitation conditions. If the dimer is excited by broadband light that covers both excitonic 
transitions, or by a relatively narrow band properly positioned between the maxima of the excitonic transitions, the 
anisotropy would have the maximum value r = 0.7. This magnitude far exceeds the theoretical limit r = 0.4 for
uncoupled molecules. 


C3.4.5.2 MULTIMER 


When more than two chromophores exhibit significant coupling, the ensemble can be described as a multimer. This
means that the energy levels of the individual molecules are replaced by a number of excitonic levels following the
same rule as for a dimer, and the wavefunctions of the multimer are linear combinations of the wavefunctions for
the unperturbed molecules [30, 39, 40]. The number of excitonic levels is equal to the number of chromophores if
there is no degeneracy. The light-harvesting antennae of photosynthetic bacteria and some other photosynthetic
preparations are believed to be examples of multimers [39, 40 and 41]. Energy migration in the multimer
proceeds in the form of energy equilibration between excitonic levels in the same manner as described above.


C3.4.6 EXCHANGE MECHANISM OF ENERGY TRANSFER IN FORBIDDEN TRANSITIONS

In the previous section we analysed cases of energy transfer where transitions between excited and non-excited 
states are allowed, i.e. the common case of dipole-dipole interaction. In 1953 Dexter [ 42 ] offered a theory that 
revealed the possibility of energy transfer in the case of dipole-forbidden transitions. The major idea of this theory 
is that when the electron distributions of the donor and acceptor are close enough to strongly overlap, the energy of 
electronic excitation can pass directly to the acceptor, essentially being channelled by the overlapped electron 
clouds. According to this theory the probability of energy transfer is described as 


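$$ W = \frac{2\pi}{\hbar}\, Z^{2} \int F_d(E)\, F_a(E)\, \mathrm{d}E \qquad \text{(C3.4.19)} $$

where

$$ Z^{2} = Y\, \frac{e^{4}}{\kappa^{2} R_0^{2}}\, \exp\!\left(-\frac{2R}{L}\right). \qquad \text{(C3.4.20)} $$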


Here κ is the dielectric constant of the medium; e is the charge of the electron; R₀ is the critical radius; R is
the donor-acceptor separation; and L is a parameter introduced as 'an effective Bohr radius for the excited donor
and unexcited acceptor' (see the original paper, [42]). The constant of proportionality Y is a dimensionless scaling
quantity ≪ 1. The coupling integral between the donor and acceptor (written in the energy domain) is analogous to
the integrals introduced in (C3.4.1). The problem with this integral is that F_a(E), which represents the absorption
spectrum of the acceptor, cannot always be measured directly (if the transition is forbidden) but must rather be
calculated, and that can be quite a complicated procedure [42, 43].

One can see that the Dexter exchange mechanism is exponentially dependent on the distance between the donor
and acceptor, and as such it begins to play a visible role only at very short distances, when the electron clouds
begin to overlap strongly. The exponential distance dependence manifest in (C3.4.20) essentially reflects a typical
asymptote for the wavefunctions of the molecular orbitals. It should be emphasized that the Dexter mechanism will
operate over short distances both for dipole-dipole forbidden and allowed transitions. With increasing distance
between the donor and acceptor, the Förster mechanism will come into play for dipole-dipole allowed transitions,
whereas in the forbidden case one should account for transitions via higher multipoles. A review of further
developments in this area, including superexchange mechanisms, can be found in [44], and for an account of how
the Dexter and Förster mechanisms seamlessly merge for the dipole-allowed case, the reader is referred to [45].
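
The contrast between the two distance dependences can be illustrated with a short numerical sketch. It assumes an exchange factor of the form exp(−2R/L) and the familiar (R₀/R)⁶ dipole-dipole factor; the values chosen for L and for the Förster radius are hypothetical and serve only to show the trend.

```python
import math

def dexter_factor(r_nm, bohr_radius_nm):
    """Relative exchange (Dexter) rate factor exp(-2R/L)."""
    return math.exp(-2.0 * r_nm / bohr_radius_nm)

def forster_factor(r_nm, forster_radius_nm):
    """Relative dipole-dipole (Forster) rate factor (R0/R)^6."""
    return (forster_radius_nm / r_nm) ** 6

L_NM = 0.1    # hypothetical effective Bohr radius
R0_NM = 5.0   # hypothetical Forster radius

for r in (0.5, 1.0, 2.0):
    print(f"R = {r:3.1f} nm  exchange: {dexter_factor(r, L_NM):.3e}  "
          f"dipole-dipole: {forster_factor(r, R0_NM):.3e}")
```

With these illustrative numbers, doubling the separation from 0.5 nm to 1 nm reduces the exchange factor by exp(−10), roughly 5 × 10⁻⁵, whereas the dipole-dipole factor falls only by 1/64.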


C3.4.7 NUCLEAR MOTIONS AND ENERGY TRANSFER 

With the development of femtosecond laser technology it has become possible to observe in resonance energy 
transfer some apparent manifestations of the coupling between nuclear and electronic motions. For example in 
photosynthetic preparations such as light-harvesting antennae and reaction centres [32, 46, 47 and 49] such 
observations are believed to result either from oscillations between the coupled excitonic levels of dimers 
(generally multimers), or the nuclear motions of the chromophores. This is a subject that is still very much open to 
debate, and for extensive discussion we refer the reader for example to [46, 47, 50, 51 and 53 ]. A simplified view 
of the subject can nonetheless be obtained from the following semiclassical picture. 

In light of the theory presented above one can understand that the rate of energy delivery to an acceptor site will be 
modified through the influence of nuclear motions on the mutual orientations and distances between donors and 
acceptors. One aspect is the fact that ultrafast excitation of the donor pool can lead to collective motion in the 
excited donor wavepacket on the potential surface of the excited electronic state. Another type of collective nuclear 
motion, which can also contribute to such observations, relates to the low-frequency vibrations of the matrix 
structure in which the chromophores are embedded, as for example a protein backbone. In the latter case the matrix
vibration effectively causes a collective motion of the chromophores together, without direct involvement of the
wavepacket motions of individual chromophores. For all such reasons, nuclear motions cannot in general be
neglected. In this connection it is notable that the low-frequency modes observed in protein complexes in the
range 40-150 cm⁻¹ correspond to vibrational periods of about 200-800 fs, comparable to typical timescales of
donor-acceptor energy transfer.
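
The correspondence between wavenumber and vibrational period follows from T = 1/(c·ν̃). A minimal sketch of the conversion (the function name is illustrative):

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def vibrational_period_fs(wavenumber_cm):
    """Period in femtoseconds of a vibration with the given wavenumber (cm^-1)."""
    return 1.0e15 / (C_CM_PER_S * wavenumber_cm)

for wn in (40.0, 150.0):
    print(f"{wn:6.1f} cm^-1 -> {vibrational_period_fs(wn):6.1f} fs")
```

The two ends of the quoted range give periods of roughly 834 fs and 222 fs.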


C3.4.8 SPECTROSCOPIC METHODS AND TECHNIQUES 

C3.4.8.1 PUMP-PROBE ABSORPTION 

The spectroscopic methods called 'pump-probe absorption', or 'pump-probe transient absorption' involve the use 
of at least two laser pulses. One pulse excites the sample and the second probes changes in optical properties 
caused by the first pulse — thus, the first pulse is called the 'pump' and the second one the 'probe'. Obviously there 
needs to be a time interval between these pulses as the pump pulse precedes the probe. A typical configuration for a 
one-colour pump-probe installation (probing at the same wavelength as the excitation) is presented in figure 
C3.4.5 . Experiments based on probing at a different wavelength from the excitation require the use of at least two 
source laser beams, though the two-colour experimental set-up has the same principal elements. In either case the
time delay between the pump and probe pulses is set by the difference in their optical paths to the sample. The
book [54] can be recommended as a sound source on the measurement techniques described in this and the following
sections.

Figure C3.4.5. Typical scheme of a single-colour pump-probe experiment utilizing lock-in-amplifier detection.
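
The link between optical path difference and timing can be made concrete with Δt = ΔL/c. The sketch below assumes propagation in air (treated as vacuum) and, for the second function, a retroreflector delay stage in which a stage displacement d changes the path by 2d; both function names are illustrative.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum, m/s

def delay_fs(path_difference_um):
    """Pump-probe delay (fs) produced by an optical path difference in micrometres."""
    return path_difference_um * 1.0e-6 / C_M_PER_S * 1.0e15

def delay_from_stage_fs(stage_travel_um):
    """Delay (fs) for a retroreflector stage: the beam travels the displacement twice."""
    return delay_fs(2.0 * stage_travel_um)

print(f"1 um of path   -> {delay_fs(1.0):.2f} fs")
print(f"150 um of path -> {delay_fs(150.0):.1f} fs")
print(f"1 um of stage  -> {delay_from_stage_fs(1.0):.2f} fs")
```

One micrometre of path corresponds to about 3.3 fs, which is why micrometre-precision translation stages are sufficient for femtosecond delay scans.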

Optical detectors can routinely measure only intensities (proportional to the square of the electric field), whether of 
optical pulses, CW beams or quasi-CW beams; the latter signifying conditions where the pulse train has an interval 
between pulses which is much shorter than the response time of the detector. It is clear that experiments must be 
designed in such a way that pump-induced changes in the sample cause changes in the intensity of the probe pulse 
or beam. It may happen, for example, that the absorption coefficient of the sample is affected by the pump pulse. In 
other words, due to the pump pulse the transparency of the sample becomes larger or smaller compared with the 
unperturbed sample. Let us stress that even when the optical density (OD) of the sample is large, let us say OD ~ 1,
and the pump-induced change is relatively weak, say 10 , it is the latter that carries positive information. 

Thus we are challenged by the problem of measuring a small signal against the background of one much stronger. 
The problem is usually solved by one of two means: (a) lock-in-amplifier detection; and (b) a boxcar type of 
detection (to some extent we can include double-input optical multichannel detection in this category). 

(a) Lock-in-amplifier detection. This method involves 'chopping', or rapidly switching, the pump beam with a
mechanical chopper or with an electro-optic or acousto-optic modulator. Figure C3.4.5 shows a set-up where
the chopping is achieved by an optical modulator (OM) of this kind. The chopping introduces alternating
periods when the pump beam is and is not affecting the sample, causing the detected signal to fluctuate at
the chopping frequency. The detector, for example a photodiode or photomultiplier tube (PMT), then sees a
change in the intensity of the probe beam corresponding to the small pump-induced change in OD. The next
stage is to extract this signal. This is achieved with a lock-in amplifier, which is in effect an amplifier
with a very narrow bandwidth. Tuned into resonance with the chopping frequency, it registers only the part of
the total detected signal that is modulated at the chopping frequency and rejects all other components.

(b) Boxcar detection. This also uses signals associated with 'pump on' versus 'pump off' conditions, but in a
different manner. Whereas the lock-in detector measures differences in the intensity of the probe beam, the
boxcar detector can measure the signal ratio. The scheme involves the use of two channels with two
time/space gates. Through one gate the boxcar records the unperturbed probe beam (with the pump off), whereas
through the other it records the probe beam with the pump on. The ratio of the perturbed to the unperturbed
signal from the probe beam is the desired output.
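
The principle of lock-in detection in (a) can be sketched numerically. The example below is a simplified model, not a description of any particular instrument: it assumes a sinusoidal (rather than square-wave) modulation at the chopping frequency, Gaussian detector noise, and hypothetical values for the sample rate, chopping frequency and signal levels. The weak modulated component is recovered by multiplying the detected signal by sine and cosine references and averaging.

```python
import math
import random

def lock_in_amplitude(samples, sample_rate_hz, ref_freq_hz):
    """Amplitude of the component of `samples` at ref_freq_hz, obtained by
    multiplying with quadrature references and averaging (digital lock-in)."""
    x = y = 0.0
    for i, s in enumerate(samples):
        phase = 2.0 * math.pi * ref_freq_hz * i / sample_rate_hz
        x += s * math.sin(phase)
        y += s * math.cos(phase)
    return 2.0 * math.hypot(x, y) / len(samples)

def simulate_probe(n, sample_rate_hz, chop_freq_hz, background, modulation,
                   noise_sigma, seed=0):
    """Detected probe intensity: strong constant background, weak modulation
    at the chopping frequency, and random noise (all values hypothetical)."""
    rng = random.Random(seed)
    return [background
            + modulation * math.sin(2.0 * math.pi * chop_freq_hz * i / sample_rate_hz)
            + rng.gauss(0.0, noise_sigma)
            for i in range(n)]

# 1 s of data at 100 kHz; 1 kHz chopping; modulation 1000 times weaker than background.
samples = simulate_probe(100_000, 100_000.0, 1_000.0, 1.0, 1.0e-3, 1.0e-2)
recovered = lock_in_amplitude(samples, 100_000.0, 1_000.0)
print(f"recovered modulation amplitude: {recovered:.2e}")
```

Although the noise on each sample is ten times larger than the modulation, averaging over many chopping periods recovers an amplitude close to the 10⁻³ that was put in, while the constant background averages to zero.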


The gates referred to above can be created in various ways. For example, suppose that the probe beam goes through
the sample, but only half of its physical width (in the sample) is crossed with the pump beam. Now, if we have two
photodiodes, one can measure the intensity of the perturbed part of the probe beam, whilst the second measures the
unperturbed part; as a result of creating spatial gates, the two recorded output signals can be used to measure the
requisite ratio. At the same time, the signals from the detectors can be gated in time to improve the signal/noise
ratio, i.e. the boxcar records signals only in the short time gate that opens in the presence of the light pulse
signal. The rest of the time the gate is closed, and the boxcar receives no input. Averaging over a large number of
input signals is another option that is used in boxcar instrumentation to improve the signal/noise ratio.
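
The ratio recorded by the boxcar maps directly onto a change in optical density. A minimal sketch, assuming the usual definition OD = −log₁₀(I/I₀) for the transmitted probe intensity; the function name and the sample numbers are illustrative and not taken from the text.

```python
import math

def delta_od(intensity_pump_on, intensity_pump_off):
    """Pump-induced change in optical density from the ratio of the probe
    intensities transmitted with the pump on and with the pump off."""
    return -math.log10(intensity_pump_on / intensity_pump_off)

# A 0.23% drop in transmitted probe intensity corresponds to about +0.001 OD.
print(f"{delta_od(0.9977, 1.0):+.4f} OD")
# A ratio above one (pump-induced bleaching) gives a negative change.
print(f"{delta_od(1.0023, 1.0):+.4f} OD")
```

Because only the ratio enters, slow drifts in the absolute probe intensity cancel, which is the practical attraction of the two-gate scheme.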

C3.4.8.2 UP/DOWN CONVERSION 

The nonlinear optical techniques of up- and down-conversion are based on mixing optical beams in a suitable
crystal (BBO, LiNbO₃, KDP, etc) with the generation of new optical frequencies. The physical principle is as
follows. If two beams having optical frequencies ω₁, ω₂ and wavevectors k₁, k₂ are mixed in a nonlinear optical
crystal at the appropriate angle, a new optical frequency ω₃ can be coherently generated with the following
conditions satisfied:

$$ \omega_3 = \omega_1 \pm \omega_2 \quad \text{and} \quad \mathbf{k}_3 = \mathbf{k}_1 \pm \mathbf{k}_2. $$

The sum-frequency case ω₃ = ω₁ + ω₂ is called up-conversion and the difference-frequency case ω₃ = ω₁ − ω₂
down-conversion, reflecting the increase or decrease of the generated optical frequency ω₃ relative to the input
frequencies ω₁ and ω₂.

Now, if we consider the ω₁ input as sample fluorescence caused by the excitation pulse (the pump) and ω₂ as the
probe pulse, the process of up/down conversion affords a means of analysing the kinetics of sample fluorescence
through observing the intensity of the ω₃ output. In this case the time selection happens in the nonlinear crystal,
not in the sample. It is as if the probe beam creates a very narrow time-gate that results in generation of the ω₃
signal only when the gate is open. The kinetics is measured by delaying the time-gate relative to the time of sample
excitation by the pump pulse.
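
Since frequency is inversely proportional to wavelength, the frequency condition for mixing translates into 1/λ₃ = 1/λ₁ ± 1/λ₂. The sketch below applies this to a hypothetical case of 600 nm fluorescence gated by an 800 nm probe pulse; the wavelengths and function name are illustrative.

```python
def mixed_wavelength_nm(lambda1_nm, lambda2_nm, up_conversion=True):
    """Wavelength of the generated beam for sum-frequency (up-conversion)
    or difference-frequency (down-conversion) mixing."""
    sign = 1.0 if up_conversion else -1.0
    inverse = 1.0 / lambda1_nm + sign / lambda2_nm
    if inverse <= 0.0:
        raise ValueError("difference frequency must be positive (lambda1 < lambda2)")
    return 1.0 / inverse

print(f"up-conversion:   {mixed_wavelength_nm(600.0, 800.0):.1f} nm")
print(f"down-conversion: {mixed_wavelength_nm(600.0, 800.0, up_conversion=False):.1f} nm")
```

In this example the up-converted signal appears near 343 nm, well separated from both input beams, which makes it easy to isolate with a filter or monochromator.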

C3.4.8.3 STREAK CAMERA 

Monitoring the kinetics of energy transfer in many systems calls for ultrafast (sub-picosecond) time resolution. 
Streak camera detection relies on fast electronics to achieve real-time detection of an ultrafast signal registering the 
transfer — usually a fluorescence signal. Modern streak cameras allow such measurements to be performed with 
picosecond resolution. The principle of detection is simple: the pump pulse excites the sample and triggers a fast 
electronic camera similar to the tube in a common oscilloscope. Fluorescence from the sample is collected by the 
input slit of the camera; the photons either hit the camera photocathode directly or are preamplified by use of an 
electro-optic amplifier. The beam of electrons so produced, created by the initial pulse of fluorescence and 
featuring its temporal profile, propagates towards the display phosphor screen. En route the beam is deflected by an 
electric field created by a fast generator triggered by the pump pulse. The result is a track on the phosphor screen 
with an intensity proportional to the intensity of the fluorescence. Usually the streak camera is coupled with a CCD
camera or a diode array to record this track.

C3.4.8.4 SAMPLE CELL 

When performing measurements of energy transfer one has to pay attention to a number of factors that could
invalidate the data. In particular, the 'reset time' of the sample (the time for conversion of excited molecules back
to the equilibrated ground state) could be longer than the period between laser pulses. This creates a build-up of
excited states and/or intermediate products, the latter significantly complicating analysis of the collected data.
Yet another complication relates to local heating of the sample, which modifies the local optical properties from
those which apply under equilibrium ambient conditions (thermal lensing, for example). The common way to
eliminate these problems is to refresh the sample in the laser-illuminated area at a rate equal to or faster than the
laser pulse repetition rate. This can be achieved using a flow cell, shaking cell, spinning cell or flow jet. Another
kind of problem occurs if the laser pulses have significant noise (exhibiting spatial/temporal spikes, for example)
and can no longer be considered as propagating with a smooth Gaussian profile. This is a problem which is difficult
to overcome without improving the quality of the pulses.

In a flow cell the sample flows through a cuvette by use of a pump, the most popular kind being a peristaltic pump. 
A shaking cell is usually a cell that resides on a mount that can move laterally in a plane perpendicular to the laser 
beam(s). A spinning cell is a disc-like cell mounted on a motor shaft that rotates with a speed up to thousands of 
revolutions per minute. Finally, the flow jet is an assembly in which the liquid sample is ejected through a special jet
nozzle to create a uniform fast stream of sample. The laser beam is focused into this stream. The spinning cell is
generally the best choice to achieve the fastest refreshing rate in anaerobic conditions without damaging the sample 
molecules. 
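
The refresh requirement can be turned into a rough design estimate. The sketch below assumes that the sample must move by at least one laser spot diameter between consecutive pulses, which is a simplification that ignores diffusion and beam overlap; the spot size, repetition rate and cell radius are hypothetical.

```python
import math

def min_sample_speed_m_per_s(spot_diameter_m, repetition_rate_hz):
    """Smallest linear sample speed that moves the sample by one spot
    diameter between consecutive laser pulses."""
    return spot_diameter_m * repetition_rate_hz

def spinning_cell_rpm(speed_m_per_s, beam_radius_on_cell_m):
    """Rotation rate (rev/min) giving the required linear speed at the
    distance of the beam from the rotation axis."""
    return speed_m_per_s / (2.0 * math.pi * beam_radius_on_cell_m) * 60.0

speed = min_sample_speed_m_per_s(100e-6, 1_000.0)   # 100 um spot, 1 kHz laser
print(f"required speed: {speed:.2f} m/s")
print(f"spinning cell, beam 2 cm off axis: {spinning_cell_rpm(speed, 0.02):.0f} rpm")
```

Under these assumptions a kilohertz laser is easily accommodated, whereas repetition rates in the megahertz range push the required speed to 100 m/s and beyond for the same spot size.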


i 
The population of excited states (n) and the probability to find excitation on an individual molecule (p) are related
by n = pN, where N is the total number of molecules.
FURTHER READING

We would like to recommend the following books for further reading. Recently (1999) Wiley 
published a book entitled Resonance Energy Transfer (ed D L Andrews and A A Demidov). This 
book contains a detailed overview of the major subjects briefly discussed in the present section. 
Previously, van der Meer et al [4] published a good account of the Förster mechanism of
energy transfer, and in particular the impact of the orientation factor on the efficiency of energy 
transfer. Those who are interested in theoretical aspects of energy transfer in condensed matter are 
referred to the classic book 'Electronic Excitation Energy Transfer in Condensed Matter' by 
Agranovich and Galanin [2]. Copious examples of energy transfer in photosynthetic systems are 
presented in the books entitled Bioenergetics of Photosynthesis [ 10 ] and Excitation Energy and 
Electron Transfer in Photosynthesis (1987, edited by Govindjee, Kluwer Academic Publishers). 
Finally, a great many details and tips on practical applications of laser spectroscopy for the 
investigation of energy transfer, and its instrumentation, can be found in the book Chemical 
Applications of Ultrafast Spectroscopy by Fleming [54]. 
C3.5 Vibrational energy transfer in condensed phases


Lawrence K Iwaki and Dana D Dlott 


C3.5.1 INTRODUCTION 

Vibrational energy relaxation (VER) of molecules in condensed phases is a fundamental dynamical process [1, 2, 3, 
4, 5 and 6]. Isolated molecules can undergo only intramolecular vibrational energy redistribution (IVR), and 
cannot lose vibrational energy except by slow radiative processes. VER is usually associated with condensed 
phases or high pressure gases [7]. VER refers to loss of energy from a specific vibrational mode (the 'system') to 
some or all of the other mechanical degrees of freedom (the 'bath'). VER in condensed phases ordinarily occurs on
the 10⁻¹²-10⁻⁹ s time scale, although slower VER has been observed in diatomics.

VER occurs as a result of fluctuating forces exerted by the bath on the system at the system's oscillation frequency
Ω [5]. Fluctuating dynamical forces are characterized by a force-force correlation function. The Fourier transform
of this force correlation function at Ω, denoted η(Ω), characterizes the quantum mechanical frequency-dependent
friction exerted on the system by the bath [5, 8].


The multiple roles of VER in essentially all condensed phase chemical processes have been extensively discussed
[8]. In chemical reactions, the 'system' is the specific mode of the reactant (or a coupled set of reactant and solvent
modes [8]) associated with the reaction coordinate. Chemical reactions are 'catalysed' by vibrational energy: the
system becomes activated by vibrational energy from the bath, the barrier is crossed, and then that vibrational
energy plus the enthalpy of reaction flows back into the bath (figure C3.5.1). Much has been written about
dynamical effects of VER on barrier crossings [8, 9 and 10]. Chemical reactions are slower when VER is too slow,
due to multiple barrier recrossings. Reactions are also slower when VER is too fast, due to VER during the barrier
crossing. The 'Kramers turnover' between these two regimes is located at the maximum possible rate, which is also
that given by transition-state theory [8, 9] (figure C3.5.1). The VER rate is varied in practice by pressure-tuning
the solvent density, as in classic studies of photoisomerization of stilbene [11, 12] and boat-chair isomerization of
cyclohexane [13, 14].

Figure C3.5.1. (a) Vibrational energy catalyses chemical reactions. The reactant R is activated by taking up the
enthalpy of activation ΔH‡ from the bath. That energy plus the heat of reaction is returned to the bath after barrier
crossing. (b) VER influences chemical reaction rates by modulating the system during barrier crossing. For a
particular VER rate, the reaction rate has a maximum at the Kramers turnover.
C3.5.2 BRIEF HISTORY OF VER


Condensed phase vibrational or vibronic lineshapes (vibronic transitions create vibrational excitations of electronic 
excited states) rarely provide information about VER (see example C3.5.6.4 ). Experimental measurements of VER 
need much more than just the vibrational spectrum. The earliest VER measurements in condensed phases were 
ultrasonic attenuation studies of liquids [15], which provided an overall relaxation time for slowly (>10 ns) relaxing 
small molecule liquids. 

Lasers revolutionized VER measurements, providing the needed time resolution and specific state resolution. Even 
early nanosecond solid-state lasers (not tunable) could produce large vibrational populations in liquids or solids via 
stimulated Raman scattering (SRS) [16]. Early infrared (IR) lasers had limited tuning ranges, but could pump 
certain molecules (e.g. CO laser pumping solid CO [17]). An excellent history of early VER studies of small 
molecules is given by Oxtoby [5]. An amazing result from this era, due to Brueck and Osgood [18], is the
incredible 56 s VER lifetime of liquid N₂, which shows that the average N₂ molecule undergoes ~4 × 10¹⁵ oscillations
before VER.
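
The number of oscillations completed during such a lifetime is simply the vibrational frequency multiplied by the lifetime. The estimate below assumes an N₂ stretching wavenumber of about 2330 cm⁻¹, a value that is not given in the text and is used here only to fix the order of magnitude.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def oscillations_during_lifetime(wavenumber_cm, lifetime_s):
    """Number of vibrational periods completed within the given lifetime."""
    frequency_hz = C_CM_PER_S * wavenumber_cm
    return frequency_hz * lifetime_s

N2_WAVENUMBER_CM = 2330.0  # assumed approximate N2 stretch wavenumber
count = oscillations_during_lifetime(N2_WAVENUMBER_CM, 56.0)
print(f"oscillations before VER: {count:.1e}")
```

The result is close to 4 × 10¹⁵ oscillations, which illustrates how weakly the N₂ vibration is coupled to its surroundings.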

Other early work, which continues to this day, involved vibronic relaxation [6] of large colored molecules such as 
chrysene [19], pyrene [20] and perylene [21], due to the relative ease of using visible or near-UV light to pump and 
probe these systems (see example C3.5.6.5 below). 

Major breakthroughs in early ultrafast VER measurements were made in 1972 by Laubereau et al [22], who used
picosecond lasers in an SRS pump-incoherent anti-Stokes Raman probe configuration to study VER of C-H
groups in ethanol and methanol (~3000 cm⁻¹), and by Alfano and Shapiro [23], who monitored both the decay of
the initially excited C-H stretch excitation and the appearance and subsequent decay of a daughter vibration, a C-H
bending vibration (~1460 cm⁻¹). Several reviews described these early studies of liquids [1, 6, 16].

Another important breakthrough occurred with the 1974 development by Laubereau et al [24] of tunable ultrafast 
IR pulse generation. IR excitation is more selective and reliable than SRS, and IR can be used in pump-probe 
experiments or combined with anti-Stokes Raman probing (IR-Raman method) [16]. Ultrashort IR pulses have
been used to study simple liquids and solids, complex liquids, glasses, polymers and even biological systems. 


C3.5.3 OVERVIEW OF VER PHENOMENA 

C3.5.3.1 SIMPLE (DIATOMIC) SYSTEMS

The simplest condensed phase VER system is a dilute solution of a diatomic in an atomic (e.g. Ar or Xe) liquid or 
crystal. Other simple systems include neat diatomic liquids or crystals, or a diatomic molecule bound to a surface. 
A major step up in complexity occurs with polyatomics, with several vibrations on the same molecule. This feature 
guarantees enormous qualitative differences between diatomic and polyatomic VER, and casts doubt on the 
likelihood of understanding polyatomics by studying diatomics alone. 
Diatomic molecules have only one vibrational mode, but VER mechanisms are paradoxically quite complex (see
examples C3.5.6.1 and C3.5.6.2). Consequently there is an enormous variability in VER lifetimes, which may
range from 56 s (liquid N₂ [18]) to 1 ps (e.g. XeF in Ar [25]), and a high level of sensitivity to environment. A
remarkable feature of simpler systems is spontaneous concentration and localization of vibrational energy due to
anharmonicity. Collisional up-pumping processes such as

2CO(v = 1) → CO(v = 0) + CO(v = 2) + 23 cm⁻¹

proceed spontaneously with a decrease in enthalpy and a decrease in entropy. For very long-lived vibrations
equilibrium may be reached with quite highly excited vibrational states. Anex and Ewing [26] have observed
collisional up-pumping of CO up to v = 20 for CO in liquid N₂, and high overtone emission was observed in a
monolayer of CO on NaCl by Chang and Ewing [27].

VER of diatomic molecules bound to surfaces [28] was first studied by Heilweil and co-workers [29], who used
ultrafast IR to find a surprisingly long VER lifetime of 150 ps for OH stretching vibrations (~3650 cm⁻¹) on the
surface of silica [30]. Guyot-Sionnest et al [31] found nanosecond lifetimes for H chemisorbed to Si(111) surfaces
(~2000 cm⁻¹). Surface-bound diatomics have much shorter lifetimes on metals than on dielectrics or
semiconductors, due to interactions with conduction electrons [32]. For example, when CO is bound to NaCl (100), 
its VER lifetime is in the millisecond range [27], but when CO is bound to Pt [33], its lifetime is a few picoseconds. 

C3.5.3.2 COMPLEX (POLYATOMIC) SYSTEMS

In polyatomics, a completely different VER process may occur, termed 'ladder relaxation', where energy is 
transferred from one excited vibrational mode to another mode on the same molecule (the 'rungs' of the 
'vibrational ladder'), while the bath takes up the remaining energy [34, 35 ]. Vibrational energy running down this 
ladder is termed a 'vibrational cascade'. Ladder processes are so efficient that VER lifetimes of polyatomics in 
condensed phases hardly ever exceed one nanosecond. Due to short VER lifetimes in polyatomics, intermolecular 
energy transfer is thought to be noncompetitive with intramolecular ladder processes. However, there are not many 
data yet, and intermolecular transfer has been directly observed in a few studies. Apkarian and co-workers 
observed energy transfer among CH₃F molecules in a rare-gas matrix, resulting in vibrational up-pumping to v = 2
[36]. Ambroseo and Hochstrasser [37] observed energy transfer from pyrrole to benzene. Hong et al [38] observed 
energy transfer from OH stretch of alcohols to C-H stretch of nitromethane. 

The most powerful technique for studying VER in polyatomic molecules is the IR-Raman method. Initial IR- 
Raman studies of a few systems appeared more than 20 years ago [16], but recently the technique has taken on new 
life with newer ultrafast lasers such as Ti:sapphire [39]. With more sensitive IR-Raman systems based on these
lasers, it has become possible to monitor VER by probing virtually every vibration of a polyatomic molecule, as
illustrated by recent studies of chloroform [40], acetonitrile [41, 42] (see example C3.5.6.6 below) and
nitromethane [39, 43].

There does not yet exist a simple theory, analogous say to Marcus theory for electron transfer, which predicts VER 
rates of polyatomic molecular vibrations, or how those rates depend on vibrational frequency. A simple rule was 
proposed by Nitzan and Jortner [35]. Those authors identified three frequency regimes for VER, a lower-energy 
regime I where short-lived vibrations decay directly into the bath, an intermediate regime II where longer-lived 
vibrations undergo ladder relaxation and a higher-energy regime III where vibrations undergo ultrafast
intramolecular vibrational relaxation. This hardly rigorous rule has been observed (always with numerous
exceptions) in several systems (see examples C3.5.6.5 and C3.5.6.6).

One simplifying motif for VER studies of polyatomic molecules is to probe a diatomic ligand bound to a more 
complex system. CO is a ubiquitous ligand and many studies have been made of VER of CO ligands, e.g. W(CO)₆
in CCl₄, where the VER lifetime is 280 ps but quite dependent on solvent [44, 44], or CO bound to
metalloporphyrins [46, 47, 48 and 49], where the VER lifetime is ~20 ps and dependent on haem structure. CO has
also been used to study VER of biomolecules, specifically haem proteins. For myoglobin and haemoglobin at 300
K, the CO VER lifetime was ~20 ps [46, 47, 50]. A study of CO bound to native and genetically engineered haem
proteins showed that vibrational energy flows from CO to haem via π-electron coupling, and that the VER rate could
be influenced by the protein [50]. Recently, VER measurements have been made of proteins themselves. The
~1600 cm⁻¹ amide I stretching vibrations of several proteins have been found to have an ~1 ps lifetime [51, 52 and
53].


C3.5.4 THEORY OF VIBRATIONAL ENERGY RELAXATION 


Consider an excited condensed-phase quantum oscillator Ω, with reduced mass μ and normal coordinate q_Ω. The
bath exerts fluctuating forces on the oscillator. These fluctuating forces induce VER. The quantum mechanical
Hamiltonian is [54, 55]

$$ H = H_\Omega(q_\Omega) + H_B(Q) + V(q_\Omega, Q) \qquad \text{(C3.5.1)} $$

where H_Ω(q_Ω) is the Hamiltonian for the oscillator Ω, H_B(Q) is the Hamiltonian for the bath, where Q represents
collective bath coordinates, and V represents the Hamiltonian for interaction between Ω and the bath. In solids these
collective bath states are phonons, which extend from zero frequency to a cut-off frequency, the Debye frequency,
ω_D. In liquids, the collective states have been termed instantaneous normal modes [56, 57 and 58]. Since these play
the same role as phonons in VER [34, 56, 59, 60], we will call them liquid phonons or simply phonons.

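The fluctuating forces F(t) on the rigid oscillator Ω are characterized by a time-dependent force-force correlation
function [54, 55],

$$ \langle F(t)F(0)\rangle_0 = \frac{\mathrm{Tr}\left[e^{-H_B/kT}\,F(t)\,F(0)\right]}{\mathrm{Tr}\left[e^{-H_B/kT}\right]} \qquad \text{(C3.5.2)} $$

where

$$ F(t) = e^{iH_B t/\hbar}\,F(0)\,e^{-iH_B t/\hbar} $$

is the Heisenberg operator for the fluctuating forces and Tr denotes trace.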
Equation (C3.5.2) is a function of bath coordinates only. The VER rate constant is proportional to the Fourier
transform, at the oscillator frequency Ω, of the bath force-correlation function. This Fourier transform is
proportional as well to the frequency-dependent friction η(Ω) mentioned previously. For example, the rate constant
for VER of the fundamental (v = 1) to the ground (v = 0) state of an oscillator with frequency Ω is [54]

$$ k_{1\to 0} = \frac{1}{2\mu\hbar\Omega}\int_{-\infty}^{\infty} \mathrm{d}t\, e^{i\Omega t}\,\langle F(t)F(0)\rangle_0. \qquad \text{(C3.5.3)} $$

Equation (C3.5.3) shows that the VER lifetime can be determined if the quantum mechanical force-correlation function
is computed. However, it is at present impossible to compute this function accurately for complex systems. It is
straightforward to compute the classical force-correlation function using classical molecular dynamics (MD)
simulations. With the classical force-correlation function, a 'quantum correction' factor Q (not to be confused with
the bath coordinates Q) is needed [5],

$$ k_{1\to 0} = \frac{Q}{2\mu\hbar\Omega}\int_{-\infty}^{\infty} \mathrm{d}t\, e^{i\Omega t}\,C(t) \qquad \text{(C3.5.4)} $$

where C(t) is the classical force-force correlation function C(t) = ⟨F(t)F(0)⟩. For a harmonic bath, Bader and
Berne [61] give the exact quantum correction,

$$ Q = \frac{\hbar\Omega/kT}{1 - \exp(-\hbar\Omega/kT)}. \qquad \text{(C3.5.5)} $$


Most realistic problems involve an anharmonic bath. How to determine Q for an arbitrary anharmonic bath is not 
yet known. Other correction methods have been discussed, including the method proposed by Egelstaff [62]. 
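
To give a feeling for the size of the quantum correction, the sketch below evaluates a harmonic-bath factor of the form Q = x/(1 − e⁻ˣ) with x = ħΩ/kT. This functional form is assumed here for illustration, and the wavenumbers and temperature are hypothetical examples.

```python
import math

HC_OVER_K_CM_K = 1.438777  # second radiation constant hc/k in cm K

def harmonic_quantum_correction(wavenumber_cm, temperature_k):
    """Assumed harmonic-bath correction Q = x / (1 - exp(-x)), x = hbar*Omega/kT,
    with the oscillator frequency given as a wavenumber in cm^-1."""
    x = HC_OVER_K_CM_K * wavenumber_cm / temperature_k
    return x / (-math.expm1(-x))   # -expm1(-x) equals 1 - exp(-x)

for wn in (10.0, 200.0, 2000.0):
    q = harmonic_quantum_correction(wn, 300.0)
    print(f"{wn:7.1f} cm^-1 at 300 K -> Q = {q:.3f}")
```

For low-frequency modes (ħΩ ≪ kT) the factor approaches unity and the classical result is adequate, whereas for a 2000 cm⁻¹ vibration at room temperature the correction is nearly an order of magnitude.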

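In diatomic VER, the frequency Ω is often much greater than ω_D, so VER requires a high-order multiphonon
process (see example C3.5.6.1). Because polyatomic molecules have several vibrations ranging from higher to
lower frequencies, only lower-order phonon processes are ordinarily needed [34]. The usual practice is to expand
the interaction Hamiltonian V(q_Ω, Q) in equation (C3.5.1) in powers of normal coordinates [34, 63],

$$ V = V_0 + \sum_i \left(\frac{\partial V}{\partial Q_i}\right)_0 Q_i + \frac{1}{2!}\sum_{i,j}\left(\frac{\partial^2 V}{\partial Q_i\,\partial Q_j}\right)_0 Q_i Q_j + \frac{1}{3!}\sum_{i,j,k}\left(\frac{\partial^3 V}{\partial Q_i\,\partial Q_j\,\partial Q_k}\right)_0 Q_i Q_j Q_k + \frac{1}{4!}\sum_{i,j,k,l}\left(\frac{\partial^4 V}{\partial Q_i\,\partial Q_j\,\partial Q_k\,\partial Q_l}\right)_0 Q_i Q_j Q_k Q_l \qquad \text{(C3.5.6)} $$

Here the Q_i denote the normal coordinates involved and the derivatives are evaluated at equilibrium.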

For polyatomics, ordinarily only the last two terms of equation (C3.5.6), the cubic and quartic anharmonic terms,
need be considered [34]. In a cubic anharmonic process, excited vibration Ω relaxes by interacting with two other
states, say another vibration ω and one phonon (or alternatively two phonons). In the quartic process, Ω relaxes by
interacting with three other states, say two vibrations and one phonon. The total rate constant for energy loss from
Ω for cubic coupling was given by Fayer and co-workers as [34]

K& — J^[/J^( 1 + Hft+u) jOft+u^Q+tu + ( 1 + fluXur + A |fi+aj|)/ta-u£ , |fi-a|] (C3.5.7) 

where n_ω is the thermal occupation number,

$$n_\omega = \left(e^{\hbar\omega/kT} - 1\right)^{-1} \qquad \text{(C3.5.8)}$$

ρ_ω is the density of phonon states at ω, C_ω is a product of coupling constants which contains factors such as
ħ/2μω and the derivatives of V in equation (C3.5.6), and a = 1 if Ω > ω or a = 0 if Ω < ω. When T → 0, all the
thermal occupation factors in equation (C3.5.7) vanish, but the VER rate does not vanish. VER is then said to occur
via spontaneous emission of phonons. As T is increased, two new thermally activated processes turn on. One involves
stimulated phonon emission and the other phonon absorption. Spontaneous and stimulated emission processes
convert Ω to a lower-energy vibration ω (down-conversion). Phonon absorption converts Ω to a higher-energy
vibration ω (up-conversion). Figure C3.5.2 from Kenkre et al [34] shows all the possible VER processes which can
occur via cubic or quartic anharmonic coupling.
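
The temperature dependence described above can be made concrete with a short Python sketch of the thermal
occupation number of equation (C3.5.8). It relies on the standard boson result that phonon emission scales as
1 + n (spontaneous plus stimulated) while phonon absorption scales as n. The 50 cm⁻¹ phonon and the function
names are hypothetical choices made for illustration; the sketch does not evaluate equation (C3.5.7) itself.

```python
import math

HC_OVER_K = 1.438777  # second radiation constant hc/k, in cm K


def occupation(wavenumber_cm, temperature_k):
    """Thermal occupation number n = 1 / (exp(hbar*omega/kT) - 1)."""
    if temperature_k <= 0.0:
        return 0.0
    x = HC_OVER_K * wavenumber_cm / temperature_k
    if x > 700.0:  # exp(x) would overflow; n is zero to machine precision
        return 0.0
    return 1.0 / math.expm1(x)


def phonon_factors(phonon_cm, temperature_k):
    """Relative weights of the three one-phonon processes for a single phonon mode."""
    n = occupation(phonon_cm, temperature_k)
    return {"spontaneous emission": 1.0, "stimulated emission": n, "absorption": n}


if __name__ == "__main__":
    for temp in (0.0, 10.0, 77.0, 300.0):
        print(temp, phonon_factors(50.0, temp))
```

At T = 0 only the spontaneous term survives, while at 300 K the occupation number of a 50 cm⁻¹ phonon is
already of order a few, so the stimulated and absorption channels become comparable to or larger than the
spontaneous one.
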

Figure C3.5.2. VER transitions involved in the decay of vibration Ω by cubic and quartic anharmonic coupling
(from [34]). Transitions involving discrete vibrations are represented by arrows. Transitions involving phonons
(continuous energy states) are represented by wiggly arrows. In (a), the transition denoted (i) is the ladder
down-conversion process, where Ω is annihilated and a lower-energy vibration ω_A and a phonon ω_r are created.

C3.5.5 EXPERIMENTAL TECHNIQUES

Experimental techniques have been reviewed extensively, e.g. [6]. Only a brief discussion will be presented here.

C3.5.5.1 CREATING VIBRATIONAL EXCITATIONS

The easiest method for creating many vibrational excitations is to use convenient pulsed visible or near-UV lasers
to pump electronic transitions of molecules which undergo fast nonradiative processes such as internal conversion
(e.g. porphyrin [64, 65] or near-IR dyes [66, 67, 68 and 69]), photoisomerization (e.g. stilbene [12]) or
photodissociation (e.g. HgI₂ [8]). Creating a specific vibrational excitation Ω in a controlled way requires more
finesse. The easiest method is to use visible or near-UV pulses to resonantly pump a vibronic transition (e.g.
$S_0^0 \to S_1^v$, where $S_n$ denotes an electronic singlet state) of a coloured molecule [6]. Vibronic relaxation may be
complicated by the presence of multiple electronic states (see example C3.5.6.1). Nonresonant pumping using
visible or near-IR pulses can pump an Ω vibration by stimulated Raman scattering (SRS) [70]. SRS excites a
specific vibration (the one with the largest Raman cross-section), but one cannot select which vibration that will
be, and the high intensities needed for SRS often produce unwanted parasitic optical effects. Resonant mid-IR
pumping of specific vibrational transitions ($S_0^0 \to S_0^v$) seems to produce fewer parasitic effects and has become
increasingly popular, given recent improvements in tunable mid-IR pulse generation.


C3.5.5.2 PROBING VIBRATIONAL EXCITATIONS 

Vibronic excitations are relatively easy to probe using resonant visible or near-UV processes such as vibronic 
absorption or vibronic fluorescence. Here we will concentrate mainly on probing vibrational excitations produced 
by mid-IR pumping. Two powerful but technically difficult techniques which are becoming increasingly important 
are probing by IR absorption (two-colour IR pump-probe method) or probing by incoherent anti-Stokes Raman 
scattering with a nonresonant pulse (IR-Raman method). 

C3.5.5.3 EXPERIMENTAL TECHNIQUES 


Schematic diagrams of modern experimental apparatus used for IR pump-probe by Fayer and co-workers [50] and
for IR-Raman experiments by Dlott and co-workers [39] are shown in figure C3.5.3. Ultrafast mid-IR pulse
generation by optical parametric amplification (OPA) [71] will not be discussed here. Single-colour IR pump-probe or
vibrational echo experiments have been performed with OPAs or free-electron lasers. Free-electron lasers use
relativistic electron bunches travelling through periodic magnetic fields to generate tunable light pulses [72].
Two-colour IR pump-probe experiments use either a pair of OPAs, or one OPA together with a continuous probe laser
and a high-speed optical detection scheme [73]. In pump-probe experiments, the modulation of a weaker probe pulse
by the pump pulse is monitored with an IR detector. Vibrational echo experiments (see example C3.5.6.4) use
essentially the same apparatus, but instead two intense pulses are directed to the sample and a third pulse (the
echo), emitted by the sample in a unique direction, is detected.

Figure C3.5.3. Schematic diagram of apparatus used for (a) IR pump-probe or vibrational echo spectroscopy by
Fayer and co-workers [50] and (b) IR-Raman spectroscopy by Dlott and co-workers [39]. Key: OPA = optical 
parametric amplifier; FEL = free-electron laser; MOD = high speed optical modulator; PMT = photomultiplier; 
OMA = optical multichannel analyser. 

For IR-Raman experiments, a mid-IR pump pulse from an OPA and a visible Raman probe pulse are used. The 
Raman probe is generated either by frequency doubling a solid-state laser which pumps the OPA [16], or by a two- 
colour OPA [39]. Transient anti-Stokes emission is detected with a monochromator and photomultiplier [39], or a 
spectrograph and optical multichannel analyser [40]. 

C3.5.5.4 INTERPRETING EXPERIMENTAL MEASUREMENTS 

Most molecular vibrations are well described as harmonic oscillators with small anharmonic perturbations [5]. For
a harmonic oscillator, all single-quantum transitions have the same frequency, and the intensity of single-quantum
transitions increases linearly with quantum number v. For the usual anharmonic oscillator, the single-quantum
transition frequency decreases as v increases. Ultrashort pulses have a non-negligible frequency bandwidth. For a
1 ps duration pulse, the bandwidth is at minimum ~15 cm⁻¹; for a 100 fs pulse, ~150 cm⁻¹. We need to consider two
cases: narrow-band pulses (bandwidth less than the anharmonicity) and broad-band pulses (bandwidth greater than
the anharmonicity).
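
The bandwidth figures quoted above follow from the time-bandwidth product of a transform-limited pulse. The
Python sketch below assumes a Gaussian pulse shape (FWHM duration times FWHM frequency width of about 0.441);
other pulse shapes give somewhat different constants. The function names and the 25 cm⁻¹ anharmonic shift used
in the example are hypothetical.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s
GAUSSIAN_TBP = 0.441        # FWHM time-bandwidth product of a transform-limited Gaussian pulse


def min_bandwidth_cm(duration_s, tbp=GAUSSIAN_TBP):
    """Minimum FWHM bandwidth (cm^-1) of a pulse with the given FWHM duration (s)."""
    return tbp / (duration_s * C_CM_PER_S)


def pulse_regime(duration_s, anharmonicity_cm):
    """Classify a pulse as narrow-band or broad-band relative to the anharmonic shift."""
    if min_bandwidth_cm(duration_s) < anharmonicity_cm:
        return "narrow-band"
    return "broad-band"


if __name__ == "__main__":
    for duration in (1e-12, 100e-15):
        print(duration, round(min_bandwidth_cm(duration), 1), pulse_regime(duration, 25.0))
```

With these assumptions a 1 ps pulse has a minimum bandwidth of about 15 cm⁻¹ and a 100 fs pulse about 150 cm⁻¹,
so for a 25 cm⁻¹ anharmonic shift the first is narrow-band and the second broad-band.
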


Consider narrow-band pulses pumping and probing the fundamental transition of an anharmonic oscillator Ω. The
pulse will not be resonant with overtone transitions, so the oscillator is viewed as a two-level system. A two-level
system can be saturated, so a pump-probe experiment can measure the VER rate by pumping the system and
measuring the absorption recovery rate [45]. There is, however, an interpretation problem. It is difficult to
distinguish between decay of Ω with direct repopulation of the ground state, and decay of Ω with population of a
different vibration ω by the ladder process Ω → ω [74]. What the probe sees depends on the frequency shift of the
Ω fundamental transition when ω is excited, caused by the anharmonic coupling terms in equation (C3.5.6). If this
shift is too small for the probe to resolve, then energy transfer from Ω to ω causes the Ω absorption to recover
with decay rate constant K_Ω from equation (C3.5.7). If the shift is larger, absorption does not recover with rate
constant K_Ω, but instead absorption recovery involves both decay constants K_Ω and K_ω. A two-colour pump-probe
experiment greatly alleviates this problem, since a broadly tunable probe pulse could monitor fundamental and
overtone transitions of both Ω [73] and ω. Anti-Stokes Raman probing is perhaps even better, since the Raman
pulse may simultaneously probe all transitions [40], including transitions of ω and Ω.

With broad-band pulses, pumping and probing processes become more complicated. With a broad-bandwidth pulse
it is easy to drive fundamental and overtone transitions simultaneously, generating a complicated population
distribution which depends on details of pulse structure [75]. Broad-band probe pulses may be unable to distinguish
between fundamental and overtone transitions. For example, in IR-Raman experiments with broad-band probe
pulses, excitation of the first overtone of a transition appears as a fundamental excitation with twice the intensity,
and excitation of a combination band Ω + ω appears as excitation of the two fundamentals [76].


C3.5.6 VIBRATIONAL RELAXATION EXAMPLES 

C3.5.6.1 DIATOMIC MOLECULES IN RARE GAS CRYSTALS 

In rare gas crystals [77] and liquids [78], diatomic molecule vibrational and vibronic relaxation have been studied.
In crystals, VER occurs by multiphonon emission. Everything else held constant, the VER rate should decrease
exponentially with the number of emitted phonons (exponential gap law) [79, 80]. The number of emitted phonons
scales as, and should be close to, the ratio Ω/ω_D, where ω_D is the Debye frequency. A possible complication is the
perturbation of the local phonon density of states by the diatomic molecule guest [77].

Apkarian and co-workers used ultrafast spectroscopy to investigate vibrational relaxation of I₂ (Ω = 212 cm⁻¹) in
solid Kr (ω_D = 50 cm⁻¹) and vibronic relaxation of XeF (Ω = 424 cm⁻¹) in solid Ar (ω_D = 65 cm⁻¹). I₂, in a Kr
lattice at 15 K, is photodissociated by a subpicosecond visible pulse and probed by transient absorption [81]. The
impulsive photodissociation hurls the two I atoms against the walls of a Kr cage. The atoms recoil and then
recombine to form nascent vibrationally excited I₂. Figure C3.5.4 [81] shows the rate of ensemble-averaged
vibrational energy loss from I₂. In the first 5 ps, energy loss is fast and stepwise due to individual binary collisions.
After that time, energy loss from any individual molecule most likely involves a stepwise relaxation, but ensemble
averaging wipes out the details, so in the longer time region the ensemble-averaged energy loss is exponential in
time with a time constant of 12 ps. This VER process is much faster than VER of liquid O₂ discussed below,
probably because the ratio Ω/ω_D ≈ 4 is relatively small.
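
The ratios Ω/ω_D quoted in this chapter can be tabulated with a few lines of Python, as sketched below. The
frequencies are those given in the text (the liquid O₂ values come from section C3.5.6.2); the exponential
form exp(−α Ω/ω_D) and the value α = 1 are illustrative assumptions that only show how steeply a gap law
separates these systems, not a fitted model.

```python
import math

# (vibrational frequency, characteristic bath frequency), both in cm^-1, as quoted in the text
SYSTEMS = {
    "I2 in solid Kr": (212.0, 50.0),
    "XeF in solid Ar": (424.0, 65.0),
    "liquid O2": (1552.0, 50.0),
}


def phonon_order(omega_cm, omega_d_cm):
    """Approximate number of phonons needed to take up one vibrational quantum."""
    return omega_cm / omega_d_cm


def gap_law_relative_rate(order, alpha=1.0):
    """Illustrative exponential gap law exp(-alpha * order); alpha is a hypothetical parameter."""
    return math.exp(-alpha * order)


if __name__ == "__main__":
    for name, (omega, omega_d) in SYSTEMS.items():
        order = phonon_order(omega, omega_d)
        print(f"{name}: order = {order:.1f}, relative rate = {gap_law_relative_rate(order):.2e}")
```

Under these assumptions the roughly fourth-order process for I₂ in Kr and the roughly thirtieth-order process
for liquid O₂ differ by many orders of magnitude in relative rate, which is the qualitative point of the
comparison made above.
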

Vibronic relaxation of XeF in solid Ar at 25 K was studied by pumping vibronic transitions with a subpicosecond
UV pulse, and detecting frequency-resolved emission with a fast optical gate [25]. XeF has two sites in Ar, one
which emits only from the $B(^2\Sigma_{1/2})$ state and one which emits only from the $C(^2\Pi_{3/2})$ state. Very fast VER
was observed in the C-emitting site. Excitation near v = 20 results in a return to v = 0 in 13 ps, about the same as
I₂ in Kr. In the B-emitting states, a slower stepwise relaxation was observed. Figure C3.5.5 shows the possible modes
of relaxation for B-emitting XeF and some experimentally determined time constants. Although a diatomic in an
atomic lattice seems to be a simple system, these vibronic relaxation experiments are rather complicated to
interpret, because multiple electronic states are involved owing to energy transfer between B and C sites.

Figure C3.5.4. Ensemble-averaged loss of energy from vibrationally excited I₂ created by photodissociation and
subsequent recombination in solid Kr, from [81]. The inset shows calculated transient absorption (pump-probe)
signals for inner turning points at 3.5, 3.4 or 3.3 Å.

Figure C3.5.5. Vibronic relaxation time constants for B- and C-state emitting sites of XeF in solid Ar for different
vibrational quantum numbers v, from [25]. Vibronic energy relaxation is complicated by electronic crossings
caused by energy transfer between sites.


C3.5.6.2 LIQUID OXYGEN 

The VER lifetime of liquid O₂ is 280 ms at 70 K [82]. Predicting such a slow rate from theory is a formidable
challenge taken up by Skinner and co-workers [54] in a recent paper. The fundamental frequency of O₂ is 1552 cm⁻¹,
whereas the maximum characteristic frequency for motion in the liquid (analogous to ω_D) is ~50 cm⁻¹, so
an extremely high-order multiphonon process (Ω/ω_D ≈ 30) is needed for VER [54].

VER in liquid O₂ is far too slow to be studied directly by nonequilibrium simulations. The force-correlation
function, equation (C3.5.2), was computed from an equilibrium simulation of rigid O₂. The VER rate constant
given in equation (C3.5.3) is proportional to the Fourier transform of the force-correlation function at the O₂
frequency. However, there are two significant practical difficulties. First, the Fourier transform, denoted C(Ω) in
[54] (C(Ω) is not an operator but rather a classical mechanical function), is needed at a frequency Ω ≫ ω_D where its
value is very small. It is difficult to compute C(Ω) accurately at large Ω given the statistical noise in the simulation.
Second, the simulation uses classical mechanics, and a quantum correction factor defined in equation (C3.5.4) is
needed. The first problem is alleviated using the Wiener-Khintchine theorem [54]. Instead of Fourier transforming
the correlation function over the time range of −∞ to +∞, it is more accurate to Fourier transform the fluctuating
force itself over a finite time range of −τ to +τ.
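
The identity behind this step can be checked on synthetic data. The Python sketch below uses the discrete,
periodic form of the Wiener-Khintchine theorem: the power spectrum obtained by Fourier transforming a force
record directly equals the Fourier transform of its (circular) autocorrelation function. The random "force"
series and all function names are illustrative assumptions; the sketch demonstrates only the equivalence of the
two routes, not the noise advantage reported in [54] for real simulation data.

```python
import cmath
import math
import random


def dft(values):
    """Plain discrete Fourier transform (O(N^2)), adequate for short illustrative series."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def periodogram(force):
    """Power spectrum from the Fourier transform of the force record itself."""
    n = len(force)
    return [abs(fk) ** 2 / n for fk in dft(force)]


def circular_autocorrelation(force):
    """Autocorrelation <F(t) F(t + lag)> with periodic wrap-around."""
    n = len(force)
    return [sum(force[t] * force[(t + lag) % n] for t in range(n)) / n for lag in range(n)]


def spectrum_from_correlation(force):
    """Power spectrum from the Fourier transform of the autocorrelation function."""
    return [ck.real for ck in dft(circular_autocorrelation(force))]


if __name__ == "__main__":
    random.seed(1)
    force = [random.gauss(0.0, 1.0) for _ in range(32)]
    direct = periodogram(force)
    via_corr = spectrum_from_correlation(force)
    print(max(abs(a - b) for a, b in zip(direct, via_corr)))
```

The printed maximum difference is at the level of floating-point rounding, confirming that the two routes give
the same spectrum for a finite record.
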


The computed Fourier transform is shown in figure C3.5.6. The O₂ frequency is Ω = 2.925 × 10¹⁴ s⁻¹. Above
Ω = 0.8 × 10¹⁴ s⁻¹, the directly computed transform is very noisy. The result from the Wiener-Khintchine theorem
in figure C3.5.6 is believed to be accurate up to Ω = 1.5 × 10¹⁴ s⁻¹. In order to extend the result to the needed
high frequency, an ansatz was made that the correlation function C(t) must have the form

$$C(t) = C_0\,\frac{\cos(bt)}{\cosh(at)} \qquad \text{(C3.5.9)}$$


where the coefficients a and b are determined so that equation (C3.5.9) has the same short-time expansion, through
order t⁴, as the exact correlation function [54]. By Fourier transforming equation (C3.5.9) analytically, a result for
the VER rate can be determined,

$$k_{1\to 0} = \frac{\pi Q C_0 \cosh(\pi\Omega/2a)\cosh(\pi b/2a)}{\mu\hbar\Omega a\left[\cosh(\pi\Omega/a) + \cosh(\pi b/a)\right]} . \qquad \text{(C3.5.10)}$$
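
The analytic Fourier transform of the ansatz can be verified numerically. The Python sketch below assumes the
form C(t) = C₀ cos(bt)/cosh(at), for which the integral of e^(iΩt) C(t) over all t is
(2πC₀/a) cosh(πΩ/2a) cosh(πb/2a) / [cosh(πΩ/a) + cosh(πb/a)], and compares it with a trapezoid-rule
integration. The parameter values a = 1, b = 0.5 (in arbitrary reciprocal time units) and the function names
are hypothetical.

```python
import math


def ansatz(t, a, b, c0=1.0):
    """Ansatz correlation function C(t) = C0 cos(bt) / cosh(at)."""
    return c0 * math.cos(b * t) / math.cosh(a * t)


def ansatz_transform(omega, a, b, c0=1.0):
    """Analytic integral of exp(i*omega*t) C(t) over all t."""
    x = math.pi * omega / (2.0 * a)
    y = math.pi * b / (2.0 * a)
    return (2.0 * math.pi * c0 / a) * math.cosh(x) * math.cosh(y) / (
        math.cosh(2.0 * x) + math.cosh(2.0 * y)
    )


def numeric_transform(omega, a, b, c0=1.0, t_max=40.0, steps=40000):
    """Trapezoid estimate; C(t) is even, so integrate cos(omega t) C(t) on [0, t_max] and double."""
    dt = t_max / steps
    total = 0.5 * (ansatz(0.0, a, b, c0) + ansatz(t_max, a, b, c0) * math.cos(omega * t_max))
    for i in range(1, steps):
        t = i * dt
        total += ansatz(t, a, b, c0) * math.cos(omega * t)
    return 2.0 * total * dt


if __name__ == "__main__":
    print(ansatz_transform(3.0, 1.0, 0.5), numeric_transform(3.0, 1.0, 0.5))
    # For omega much larger than b the transform falls off as exp(-pi*omega/(2a))
    print(ansatz_transform(22.0, 1.0, 0.5) / ansatz_transform(20.0, 1.0, 0.5), math.exp(-math.pi))
```

The second printed pair illustrates the exponential gap law: at high frequency, increasing Ω by 2a reduces the
transform by a factor e^(−π).
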


Figure C3.5.6 compares the result of this ansatz to the numerical result from the Wiener-Khintchine theorem. They
agree well and the ansatz exhibits the expected exponential energy-gap law (VER rate decreases exponentially with
Ω). The ansatz was used to determine the VER rate with no quantum correction (Q = 1), with the Bader-Berne
harmonic correction [61] and with a correction based [83, 84] on Egelstaff's method [62]. The Egelstaff-corrected
results were within a factor of five of experiment, whereas other corrections were off by orders of magnitude. This
calculation represents the present state of the art in computing VER rates in such difficult systems, inasmuch as the
authors used only a model potential and no adjustable parameters. However, the ansatz procedure is clearly not
extendible to polyatomic molecules or to diatomic molecules in polyatomic solvents.

Figure C3.5.6. The computed Fourier transform at frequency ω, C(ω), of the classical mechanical force-force
correlation function for liquid O₂ at 70 K from [54]. The VER rate is proportional to the value of C(ω) at the O₂
vibrational frequency 2.925 × 10¹⁴ s⁻¹, multiplied by the quantum correction Q. The solid line is obtained from
direct Fourier transformation of the simulation. The thick solid line is obtained from the Wiener-Khintchine
theorem, and the dashed line is an ansatz proposed by the authors.

C3.5.6.3 SMALL-MOLECULE REACTION DYNAMICS IN SOLUTION 

Chemical reaction dynamics is an attempt to understand chemical reactions at the level of individual quantum
states. Much work has been done on isolated molecules in molecular beams, but it is unlikely that this information
can be used to understand condensed phase chemistry at the same level [8]. In a bath, the reacting solute's potential
energy surface is altered by both dynamic and static effects. The static effect is characterized by a potential of mean
force. The dynamical effects are characterized by the force-correlation function or the frequency-dependent friction.

Photodissociation of a linear triatomic such as I₃⁻ [85, 86] or HgI₂ [8] to produce a vibrationally excited diatomic,
or cage recombination of a photodissociated diatomic such as I₂ [78, 81], are classic simple model systems for
reaction dynamics. Here we discuss the HgI₂ → HgI + I reaction studied by Hochstrasser and co-workers [87, 88 and 89].
The important issues are how energy is partitioned, the degree of coherence in the formation of the product and
whether the reaction is adiabatic (solvent easily follows reactant motions) or nonadiabatic [8]. A nonadiabatic
theory would be much more complicated.

Pumping HgI₂ in ethanol with a femtosecond UV pulse causes impulsive photodissociation, producing HgI with
average vibrational quantum number v = 15 [88]. There are several possibilities for nascent HgI, as depicted in
figure C3.5.7 [8]. The smooth Gaussian function represents the ensemble average of HgI vibrational displacements
(vibrational wavepacket). In possibility A, there is no VER and no vibrational dephasing. All HgI fragments simply
oscillate coherently. In B, VER is faster than a vibrational period, so HgI loses energy before vibrating even once.
An ensemble of coherently vibrating ground state HgI results. In C, the VER rate is comparable to a vibrational
period, so there will be some coherence in the ground state. In D, dephasing (caused by quartic anharmonic coupling
to a dynamic bath) is faster than VER, so the wavepacket spreads (dephases) much faster than it loses energy.






Figure C3.5.7. Possible modes of vibrational wavepacket (smooth Gaussian curve) motion for a highly
vibrationally excited diatomic molecule produced by photodissociation of a linear triatomic such as HgI₂, from [8].
(A) No VER and no dephasing. (B) VER is faster than a vibrational period. Once the vibrational ground state is
reached, the wavepacket begins to oscillate coherently. (C) The VER rate is comparable to a vibrational period so
some coherence is seen in both excited and ground states. (D) Dephasing is faster than VER.

In HgI, possibility C is the best description [8]. The dephasing time constant is ~150 fs and the overall time for
vibrational cooling is ~200 fs. Thus coherence is seen in the vibrational excited states, and in the ground state as
well. A molecular dynamics simulation of rigid HgI in ethanol was used to understand the VER mechanism [90].
The computed frequency-dependent friction is shown in figure C3.5.8 [90]. Notice this function is much more
complicated than in liquid O₂ (figure C3.5.6), and an exponential gap law is not observed. The simulation results
were used to conclude that VER at the 130 cm⁻¹ frequency of HgI was dominated by Lennard-Jones interactions
between solute and solvent. The solvent nuclear response is a few times faster than the vibrational period, so the
HgI-ethanol system is close to the adiabatic limit.

Figure C3.5.8. Computed frequency-dependent friction (inversely proportional to the VER lifetime T₁) from a
classical molecular dynamics simulation of rigid HgI molecules in ethanol solution, from [90]. The HgI vibrational
frequency is 125 cm⁻¹. The Lennard-Jones contribution to the friction dominates the Coulombic contribution at
that frequency.

C3.5.6.4 TEMPERATURE DEPENDENCE OF VIBRATIONAL LINESHAPES 

VER rates cannot be predicted from vibrational lineshapes alone [5] except for a few exceptional cases [2, 91]. The
most detailed deconstruction of vibrational lineshapes to date is the work of Fayer and co-workers [92], who
studied W(CO)₆ in glass-forming liquids including 2-methyl pentane (2-MP; T_g = 80 K). The triply degenerate
asymmetric C≡O stretch of W(CO)₆ (Ω ≈ 1980 cm⁻¹) has a VER lifetime which is solvent and temperature
dependent. In 2-MP at 300 K, the lifetime is ~150 ps.


Infrared absorption was used to measure the total lineshape. Vibrational echo spectroscopy (the vibrational 
analogue of spin echoes in magnetic resonance [93]) was used to remove inhomogeneous broadening, to reveal the 
underlying homogeneous lineshape. One-color pump-probe with magic-angle polarization was used to measure the 
VER lifetime. Pump-probe polarization anisotropy was used to measure the orientational relaxation rate. 


The vibrational echo experiments yielded exponential decays at all temperatures. The Fourier transform of the echo
decay gives the homogeneous lineshape, in this case Lorentzian. The echo decay time constant is T₂/4, where T₂ is
the vibrational dephasing time constant, and the corresponding homogeneous linewidth is Γ_hom = 1/πT₂. At low
temperature (~10 K) in 2-MP glass, the total linewidth is Γ_tot = 300 GHz (30 GHz = 1 cm⁻¹), whereas
Γ_hom ≈ 1.5 GHz. Therefore, the absorption line is massively inhomogeneously broadened at low temperature. An
inhomogeneous lineshape can be used to determine the static or quasistatic frequency spread of oscillators due to a
distribution of environments, but it provides no dynamical information whatsoever [94, 95]. As T is increased to
300 K, the absorption linewidth Γ_tot decreases and Γ_hom increases. At 300 K, the lineshape is nearly homogeneously
broadened and dominated by vibrational dephasing, because fast dephasing wipes out effects of inhomogeneous
environments, a well known phenomenon termed 'motional narrowing' [95].
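
The numbers in this paragraph can be related to one another with a short Python sketch. It assumes the usual
Lorentzian relation between homogeneous linewidth (FWHM) and dephasing time, Γ_hom = 1/(πT₂), and an echo decay
constant of T₂/4; the function names are illustrative, and the 1.5 GHz and 300 GHz inputs are the low-temperature
values quoted above.

```python
import math

GHZ_PER_WAVENUMBER = 29.9792458  # 1 cm^-1 expressed in GHz


def homogeneous_linewidth_ghz(t2_ps):
    """Lorentzian FWHM 1/(pi*T2), in GHz, for a dephasing time T2 given in ps."""
    return 1.0e3 / (math.pi * t2_ps)


def dephasing_time_ps(linewidth_ghz):
    """Dephasing time T2 in ps for a homogeneous FWHM given in GHz."""
    return 1.0e3 / (math.pi * linewidth_ghz)


def echo_decay_constant_ps(t2_ps):
    """Decay constant of the echo signal, taken here as T2/4."""
    return t2_ps / 4.0


if __name__ == "__main__":
    t2 = dephasing_time_ps(1.5)
    print(f"T2 = {t2:.0f} ps, echo decay = {echo_decay_constant_ps(t2):.0f} ps")
    print(f"total linewidth = {300.0 / GHZ_PER_WAVENUMBER:.1f} cm^-1")
    print(f"inhomogeneous / homogeneous width = {300.0 / 1.5:.0f}")
```

Under these assumptions a 1.5 GHz homogeneous width corresponds to T₂ of roughly 210 ps, and the inhomogeneous
line at ~10 K is about 200 times broader than the homogeneous line hidden beneath it.
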

The homogeneous linewidth Γ_hom can be subdivided into distinctly different contributions,

where T₂* is the time constant for 'pure dephasing' processes which modulate only the oscillator phase, and T_or is
the contribution from orientational relaxation [92]. Orientational relaxation refers either to molecular rotation
(impossible below T_g) or interconversion among nearly degenerate C≡O stretching transitions of different symmetry
[96]. The parameters T₂, T₁ and T_or are measured from echo and pump-probe experiments, and T₂* is computed by
substituting those parameters into this equation.

Figure C3.5.9 shows all contributions to the homogeneous linewidth of W(CO)₆ in 2-MP. Γ_hom increases from
about 1.5 to 100 GHz from 10 to 300 K. Pure dephasing and orientational contributions to Γ_hom vanish as T → 0,
leaving only the VER contribution. As T is increased, the VER rate decreases slightly (the lifetime lengthens).
This counterintuitive temperature dependence is not seen explicitly in equation (C3.5.7). However, a
thermal slowdown of VER can happen if the C_ω or ρ_ω factors decrease with increasing T and the occupation
numbers n_ω involve only high frequencies, ħω ≫ kT [34]. The orientational contribution increases with T, but it
never contributes much to the total lineshape. At ~50 K the homogeneous line undergoes a transition from VER
dominance to pure dephasing dominance. Thus the VER contribution to the lineshape is hidden at lower
temperatures by inhomogeneous broadening and, at higher temperatures, by pure dephasing.

Figure C3.5.9. Contributions to the homogeneous linewidth of a C≡O stretching transition of W(CO)₆ (Ω ≈ 2000
cm⁻¹) in a glass-forming liquid, 2-methyl pentane, from [92]. The total homogeneous linewidth Γ_hom is measured
with vibrational echo spectroscopy, which removes inhomogeneous broadening. The VER lifetime T₁ and
orientational relaxation contribution T_or are measured with pump-probe experiments. The pure dephasing time
constant T₂* is computed from the other results. VER dominates the homogeneous lineshape at low temperature.
Pure dephasing dominates at high temperature.

C3.5.6.5 POLYATOMIC MOLECULES IN LOW-TEMPERATURE CRYSTALS— FREQUENCY DEPENDENCE 

Much of our knowledge of the frequency dependence of VER rates in polyatomic molecules stems from
low-temperature studies of molecular crystals [2] such as pentacene (PTC; C₂₂H₁₄) guest molecules in a crystalline
naphthalene (N; C₁₀H₈) host. In naphthalene, the phonon cut-off frequency is ~180 cm⁻¹ [97]. At low temperature,
PTC has well resolved vibronic transitions (S₀ → S₁) in a convenient wavelength range for picosecond dye lasers
(560-605 nm).

Vibronic relaxation of PTC/N was studied by Hesselink and Wiersma [98, 99], who used ultrafast vibronic echoes
to measure dephasing rates of 16 PTC vibronic transitions at low temperature (1.5 K), where vibronic dephasing is
dominated by VER. Their results are shown in figure C3.5.10. The lower-frequency vibrations (< 350 cm⁻¹) have
shorter lifetimes in the 2 ps range. The mid-frequency vibrations (350 to 1000 cm⁻¹) have generally longer
lifetimes up to 40 ps. The higher-frequency vibrations (> 1000 cm⁻¹) have shorter lifetimes. These results generally
support the three-regime model proposed by Nitzan and Jortner [35]. In regime I (< 360 cm⁻¹), vibrations relax
efficiently by two-phonon emission. Above 360 cm⁻¹ vibrational lifetimes become longer (regime II) until IVR
becomes dominant in regime III, because the vibrational density of states becomes large enough (> 10 states per
cm⁻¹) for efficient IVR.

Figure C3.5.10. Frequency-dependent vibronic relaxation data for pentacene (PTC) in naphthalene (N) crystals at
1.5 K. (a) Vibrational echoes are used to measure VER lifetimes (from [99]). The lifetimes are shorter in regime I,
longer in regime II, and become shorter again in regime III. (b) Two-colour pump-probe experiments are used to
measure vibrational cooling (return to the ground state) from [102].

VER measurements of individual levels do not necessarily indicate the overall rate of vibrational energy loss from
PTC molecules. Vibrational energy loss is a multistep process termed 'vibrational cooling' (VC) if ladder processes
(vibrational cascades) are important [100]. Chang and Dlott measured VC rates in PTC/N with two-colour
pump-probe experiments [101, 102]. The first pulse pumps a vibronic transition and the second pulse measures
the return to the vibrational ground state by probing a second vibronic transition. The return kinetics are
nonexponential in time. By fitting the data to a VC model, time constants shown in figure C3.5.10 were extracted.
The decay of regime I vibrations directly repopulates the ground state, so VER and VC rates are identical. The decay
of regime II vibrations occurs by a ladder process, so the VC rate increases higher up the ladder. The decay of regime
III vibrations occurs by fast IVR. Some of that redistributed energy populates longer-lived regime II states, so in
regime III the VC rate levels out at about the maximum VC rate in regime II.

C3.5.6.6 VER OF A POLYATOMIC LIQUID

The decay of C-H stretching (and OH and NH) vibrations in liquids has been studied by IR-Raman spectroscopy
[6]. Early work on ethanol by Alfano and Shapiro [23] indicated that C-H stretch excitations (~3000 cm⁻¹)
decayed by populating C-H bending excitations (~1500 cm⁻¹). However, until recently it was not known where the
rest of the energy went, or the subsequent fate of the daughter C-H bending excitations. Dlott and co-workers used
IR-Raman techniques to monitor the flow of vibrational energy through several polyatomic molecule systems [39,
41, 42]. Data for one example, acetonitrile (CH₃C≡N), are shown in figure C3.5.11 [6]. Acetonitrile is a model for
nonassociated polar polyatomic liquids [103, 104 and 105].

Figure C3.5.11. IR-Raman measurements of vibrational energy flow through acetonitrile in a neat liquid at 300 K,
adapted from [41]. An ultrashort mid-IR pulse pumps the C-H stretch, which decays in 3 ps. Only 1% of the
energy is transferred to the C≡N stretch, which has an 80 ps lifetime. Most of the energy is transferred to the C-H
bend plus about four quanta of C-C≡N bend. The daughter C-H bend vibration relaxes by exciting the C-C stretch.
The build-up of energy in the C-C≡N bend mirrors the build-up of energy in the bath, which continues for about
250 ps after C-H stretch pumping.

An ultrashort mid-IR pulse excited a C-H stretching vibration (~3000 cm⁻¹) of neat acetonitrile at 300 K. The loss
of C-H stretching energy occurred in 3 ps. Only 1% of that energy was transferred to the C≡N stretch (2250 cm⁻¹),
where it remained for ~80 ps. Most of the energy was lost from the C-H stretch by the process,

C-H stretch (v = 1) → C-H bend (v = 1) + 4 C-C≡N bend

which accounts for the ~3 ps rise of excitation of the C-C≡N bending vibration (~400 cm⁻¹). The C-H bending
excitation decays in 15 ps, exciting lower-energy C-C stretch, C-C bend and C-C≡N bending modes. The C-C
stretch and C-C bend decay with quite long lifetimes, in the 50 ps range. All vibrational energy is dissipated to the
bath in ~250 ps. There is a net temperature increase of ~10 °C in the bath, which accounts for the long-time-scale
build-up of the C-C≡N bending population. Whereas C-H stretch lifetime measurements indicate a VER lifetime
of 3 ps, vibrational cooling actually takes ~250 ps. Frequency-dependent VER lifetimes in acetonitrile generally
support the model of Nitzan and Jortner [35], with an exception being the long-lived high-frequency C≡N stretch
[41].
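
The difference between a single-level VER lifetime and the overall cooling time can be illustrated with a
minimal ladder model. The Python sketch below treats cooling as a strictly sequential first-order cascade
through three levels with lifetimes of 3, 15 and 50 ps (the C-H stretch, C-H bend and C-C stretch values quoted
above) and integrates the rate equations with a simple Euler step. This is a deliberately crude assumption: it
ignores branching (for example the 1% channel into the C≡N stretch), multi-quantum steps and bath heating, and
the function name is hypothetical.

```python
def cascade_populations(lifetimes_ps, t_end_ps, dt_ps=0.01):
    """Populations of a ladder in which level i decays only to level i + 1.

    The returned list has one entry per excited level plus a final entry for the
    fully relaxed state. All population starts in the top level.
    """
    n = len(lifetimes_ps)
    pops = [1.0] + [0.0] * n
    steps = int(round(t_end_ps / dt_ps))
    for _ in range(steps):
        # Euler step: each level loses the fraction dt/lifetime of its population
        flows = [pops[i] * dt_ps / lifetimes_ps[i] for i in range(n)]
        for i, flow in enumerate(flows):
            pops[i] -= flow
            pops[i + 1] += flow
    return pops


if __name__ == "__main__":
    lifetimes = [3.0, 15.0, 50.0]
    for t in (3.0, 50.0, 250.0):
        pops = cascade_populations(lifetimes, t)
        print(t, [round(p, 3) for p in pops])
```

Even in this simplified picture the top level empties within a few picoseconds, yet fewer than half of the
molecules have fully relaxed after 50 ps and the cascade is essentially complete only after a few hundred
picoseconds, in qualitative accord with the contrast drawn above.
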


C3.5.7 CONCLUDING REMARKS 

Understanding VER in condensed phases has proven difficult. The experiments are hard. The structurally simple 
systems (diatomic molecules) involve complicated relaxation mechanisms. The structures of polyatomic molecules 
are obviously more complex, but polyatomic systems are tractable because the VER mechanisms are somewhat 
simpler. 

There are encouraging signs that condensed-phase VER is an important fundamental problem which is due for a
major breakthrough. Theoreticians are finally having success in predicting VER rates of simple systems, and have
begun to understand the multiple roles of VER in chemical reaction dynamics. Experimental technology has
improved a great deal. As a result, we are beginning to accumulate data on several different molecular liquids and
solids, vibrational energy flowing through a polyatomic molecule can be monitored in real time (even protein
VER can be measured), and the first direct observations of chemical reaction dynamics in solution have been
reported. Nevertheless, much remains to be done in the areas of predicting and understanding VER rates,
developing simple but robust conceptual frameworks and incorporating our emerging understanding of VER into
theories of chemical reaction dynamics, electron transfer, protein dynamics and other condensed-phase dynamical
processes.

C3.6 Chaos and complexity in chemical systems

Raymond Kapral and Simon J Fraser 


C3.6.1 INTRODUCTION 

Complex chemical mechanisms are written as sequences of elementary steps satisfying detailed balance where the 
forward and reverse reaction rates are equal at equilibrium. The laws of mass action kinetics are applied to each 
reaction step to write the overall rate law for the reaction. The form of chemical kinetic rate laws constructed in this 
manner ensures that the system will relax to a unique equilibrium state which can be characterized using the laws 
of thermodynamics. 

Most chemically reacting systems that we encounter are not thermodynamically controlled since reactions are often 
carried out under non-equilibrium conditions where flows of matter or energy prevent the system from relaxing to 
equilibrium. Almost all biochemical reactions in living systems are of this type as are industrial processes carried 
out in open chemical reactors. In addition, the transient dynamics of closed systems may occur on long time scales 
and resemble the sustained behaviour of systems in non-equilibrium conditions. A reacting system may behave in 
unusual ways: there may be more than one stable steady state, the system may oscillate, sometimes with a 
complicated pattern of oscillations, or even show chaotic variations of chemical concentrations. 

Analogous considerations apply to spatially distributed reacting media where diffusion is the only mechanism for 
mixing chemical species. Under equilibrium conditions any inhomogeneity in the system will be removed by 
diffusion and the system will relax to a state where chemical concentrations are uniform throughout the medium. 
However, under non-equilibrium conditions chemical patterns can form. These patterns may be regular, stationary 
variations of high and low chemical concentrations in space or may take the form of time-dependent structures 
where chemical concentrations vary in both space and time with complex or chaotic forms. 

In this chapter we shall examine how such temporal and spatial structures arise in far-from-equilibrium chemical 
systems. We first examine spatially uniform systems and develop the theoretical tools needed to analyse the 
behaviour of systems driven far from chemical equilibrium. We focus especially on the nature of chemical chaos, 
its characterization and the mechanisms for its onset. We then turn to spatially distributed systems and describe 
how regular and chaotic chemical patterns can form as a result of the interplay between reaction and diffusion. 

This account is not exhaustive but provides a guide to the main theoretical ideas and experimental methods that 
have emerged in this subject. Fuller accounts and broad background are given in recent books devoted to this topic 
[1, 2, 3, 4 and 5].


C3.6.2 CHEMICAL REACTIONS AS DYNAMICAL SYSTEMS 

Consider a spatially homogeneous reacting mixture where concentration gradients are removed by stirring or rapid 
diffusion of the chemical species. In this circumstance the instantaneous state of the system is described by a vector
of chemical concentrations for the $n$ chemical species, $\mathbf{c}(t) = (c_1(t), c_2(t), \ldots, c_n(t))$, whose evolution is
specified by the ordinary differential equations (ODEs) of mass action kinetics

$$\frac{\mathrm{d}\mathbf{c}(t)}{\mathrm{d}t} = \mathbf{R}_M(\mathbf{c}(t);\mathbf{k}). \qquad \text{(C3.6.1)}$$

Here $\mathbf{R}_M(\mathbf{c}(t);\mathbf{k})$ is a vector of reaction velocities which are usually nonlinear functions of the chemical
concentrations whose form is determined by the reaction mechanism. The reaction velocities also depend on the
chemical rate constants, collectively described by the vector $\mathbf{k} = (k_1, k_2, \ldots)$, for the steps in the reaction mechanism.

For a closed chemical system with a mass action rate law satisfying detailed balance these kinetic equations have a
unique stable (thermodynamic) equilibrium, $\lim_{t\to\infty} \mathbf{c}(t) = \mathbf{c}_{\mathrm{eq}}$. In general, however, we shall be concerned with
chemical reactions that are maintained far from chemical equilibrium by flows of reagents into and out of a
continuously stirred tank reactor (CSTR). In this case the chemical kinetic equation (C3.6.1) must be supplemented
with flow terms

$$\frac{\mathrm{d}\mathbf{c}(t)}{\mathrm{d}t} = \mathbf{R}_M(\mathbf{c}(t);\mathbf{k}) - k_f(\mathbf{c}(t) - \mathbf{c}_f) = \mathbf{R}(\mathbf{c}(t);\boldsymbol{\mu}) \qquad \text{(C3.6.2)}$$

where $k_f$ is the flow rate constant and $\mathbf{c}_f$ the vector of the feed concentrations. We have denoted the reaction rate in
this general non-equilibrium case by $\mathbf{R}(\mathbf{c}(t);\boldsymbol{\mu})$ with $\boldsymbol{\mu}$ as symbol for the collection of all parameters that
characterize $\mathbf{R}$: rate constants, feed concentrations and flow rates. Suppose the flow terms increase from zero; then
this open system's stable state moves from the thermodynamic equilibrium, $\mathbf{c}_{\mathrm{eq}}$, to a nearby, non-equilibrium
steady state, $\mathbf{c}_s$, on the so-called thermodynamic branch. This non-equilibrium stable state of the system is the
solution, $\mathbf{c} = \mathbf{c}_s$, of $\mathbf{R}(\mathbf{c};\boldsymbol{\mu}) = 0$. However, if the flow terms become sufficiently large, this steady state becomes unstable
and is replaced by new, non-equilibrium states characteristic of this well stirred system. (Transients mimicking the
behaviour of the non-equilibrium states can be observed in closed reactors starting from initial conditions that are
far from equilibrium.)
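
The structure of the flow-supplemented rate law can be made concrete with a small numerical sketch. The code below is only an illustration: the one-species mechanism `first_order_decay`, the function names, the explicit Euler integrator and all numerical values are assumptions introduced here, not part of the chapter. It adds flow terms of the form used in (C3.6.2) to an arbitrary reaction-velocity function and relaxes the system to its steady state.

```python
def cstr_rhs(rate, c, k_f, c_feed):
    """Reaction velocities plus CSTR flow terms, in the form of (C3.6.2)."""
    r = rate(c)
    return [ri - k_f * (ci - cfi) for ri, ci, cfi in zip(r, c, c_feed)]


def first_order_decay(c, k=1.0):
    # Hypothetical one-species mechanism A -> products (illustration only).
    return [-k * c[0]]


def relax(c, k_f, c_feed, dt=0.001, n_steps=20000):
    """Explicit Euler integration, long enough for transients to decay."""
    for _ in range(n_steps):
        dc = cstr_rhs(first_order_decay, c, k_f, c_feed)
        c = [ci + dt * di for ci, di in zip(c, dc)]
    return c


# Closed system (no flow): the concentration relaxes to equilibrium, here zero.
closed = relax([1.0], k_f=0.0, c_feed=[2.0])
# Open system: the steady state is displaced to k_f * c_feed / (k + k_f).
opened = relax([0.0], k_f=1.0, c_feed=[2.0])
```

For this linear toy mechanism the displaced steady state can be checked by hand, since setting the right-hand side to zero gives $c_s = k_f c_f/(k + k_f)$; nonlinear mechanisms generally require a numerical root search instead.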

It is convenient to analyse these rate equations from a dynamical systems point of view similar to that used in
classical mechanics where one follows the trajectories of particles in phase space. For the chemical rate law
(C3.6.2) the 'phase space', conventionally denoted by $\Gamma$, is $n$-dimensional and the chemical concentrations,
$c_1, c_2, \ldots, c_n$, are taken as orthogonal coordinates of $\Gamma$, rather than the particle positions and velocities used as the
coordinates in mechanics. In analogy to classical mechanical systems, as the concentrations evolve in time they
will trace out a trajectory in $\Gamma$. Since the velocity functions in the system of ODEs (C3.6.2) do not depend
explicitly on time, a given initial condition in $\Gamma$ will always produce the same trajectory. The vector $\mathbf{R}$ of velocity
functions in (C3.6.2) defines a phase-space (or trajectory) flow and it is often convenient to think of these ODEs
as describing the motion of a fluid in $\Gamma$ with velocity field $\mathbf{R}(\mathbf{c};\boldsymbol{\mu})$.

C3.6.2.1 CHEMICAL ATTRACTORS 


Because of the underlying dissipative nature of the chemical systems that the ODEs (C3.6.2) represent, they have 
another important property: any volume in $\Gamma$ will shrink as it evolves. For a given set of initial chemical
concentrations the time evolution under the chemical rate law will approach arbitrarily close to some final set of
points in $\Gamma$ after transients have decayed. This final set of phase-space points is the attractor, and the set of all initial
conditions that eventually reaches the attractor is called its basin of attraction.

Attractors can be simple time-independent states (points in $\Gamma$), limit cycles (simple closed loops in $\Gamma$)
corresponding to oscillatory variations of the chemical concentrations with a single amplitude, or chaotic states
(complicated trajectories in $\Gamma$) corresponding to aperiodic variations of the chemical concentrations. To illustrate
the representation of chemical dynamics in concentration phase space and the existence of chemical attractors, we
consider the Willamowski-Rössler (WR) model chemical system based on the following reaction mechanism [6]:


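$$
\begin{aligned}
A_1 + X_1 &\underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} 2X_1, &
X_1 + X_2 &\underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}} 2X_2, &
A_5 + X_2 &\underset{k_{-3}}{\overset{k_3}{\rightleftharpoons}} A_2, \\
X_1 + X_3 &\underset{k_{-4}}{\overset{k_4}{\rightleftharpoons}} A_3, &
A_4 + X_3 &\underset{k_{-5}}{\overset{k_5}{\rightleftharpoons}} 2X_3.
\end{aligned}
\qquad \text{(C3.6.3)}
$$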


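The species $A_1, A_2, \ldots, A_5$ are pool chemicals whose concentrations are assumed to be fixed by flows of reagents
into and out of the reactor while $X_1$, $X_2$ and $X_3$ are the species whose concentrations vary with time. For mechanism
(C3.6.3) the mass action rate law is the system of ODEs

$$
\begin{aligned}
\frac{\mathrm{d}c_1}{\mathrm{d}t} &= k_1 c_{A_1} c_1 - k_{-1} c_1^2 - k_2 c_1 c_2 + k_{-2} c_2^2 - k_4 c_1 c_3 + k_{-4} c_{A_3}, \\
\frac{\mathrm{d}c_2}{\mathrm{d}t} &= k_2 c_1 c_2 - k_{-2} c_2^2 - k_3 c_{A_5} c_2 + k_{-3} c_{A_2}, \\
\frac{\mathrm{d}c_3}{\mathrm{d}t} &= -k_4 c_1 c_3 + k_{-4} c_{A_3} + k_5 c_{A_4} c_3 - k_{-5} c_3^2.
\end{aligned}
\qquad \text{(C3.6.4)}
$$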


For this model the parameter set $\boldsymbol{\mu}$ consists of the rate constants and the constant pool chemical concentrations $c_{A_i}$.
(Most chemical rate laws are constructed phenomenologically and often have cubic or other nonlinearities and 
irreversible steps. Such rate laws are reductions of the full underlying reaction mechanism.) 

For certain parameter values this chemical system can exhibit fixed point, periodic or chaotic attractors in the three- 
dimensional concentration phase space. We consider the parameter set 

$\boldsymbol{\mu} = \{k_1 c_{A_1} = 31.2,\ k_{-1} = 0.2,\ k_2 = 1.45,\ k_3 c_{A_5} = 10.8,\ k_{-3} c_{A_2} = 0.12,\ k_4 = 1.02,\ k_{-4} c_{A_3} = 0.01,\ \ldots\}$. The rate
constant $k_{-2}$ will be taken as the control or bifurcation parameter which is varied to examine how the system
attractor changes. As an example, the single-banded chaotic attractor at $k_{-2} = 0.072$ for the WR model is shown in
figure C3.6.1(a).
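
A minimal sketch of how the mass action rate law of the WR mechanism might be coded and integrated is given below. Several things here are assumptions rather than statements from the text: the dictionary keys and function names are invented for illustration, the fixed-step fourth-order Runge-Kutta integrator is just one convenient choice, and the values used for $k_5 c_{A_4}$ and $k_{-5}$ (16.5 and 0.5) are assumed values chosen because they make the steady state quoted later in the chapter, (8.801, 11.494, 15.048) at $k_{-2} \approx 0.1715$, a zero of the rate law to within rounding.

```python
# Effective rate constants; pool concentrations are absorbed into the constants.
PARAMS = {
    "k1A1": 31.2,   # k_1 c_A1
    "km1": 0.2,     # k_-1
    "k2": 1.45,     # k_2
    "km2": 0.072,   # k_-2, the bifurcation parameter
    "k3A5": 10.8,   # k_3 c_A5
    "km3A2": 0.12,  # k_-3 c_A2
    "k4": 1.02,     # k_4
    "km4A3": 0.01,  # k_-4 c_A3
    "k5A4": 16.5,   # k_5 c_A4  (ASSUMED value, see lead-in)
    "km5": 0.5,     # k_-5      (ASSUMED value, see lead-in)
}


def wr_rhs(c, p):
    """Mass action velocities for the three variable species X1, X2, X3."""
    c1, c2, c3 = c
    dc1 = (p["k1A1"] * c1 - p["km1"] * c1 ** 2 - p["k2"] * c1 * c2
           + p["km2"] * c2 ** 2 - p["k4"] * c1 * c3 + p["km4A3"])
    dc2 = p["k2"] * c1 * c2 - p["km2"] * c2 ** 2 - p["k3A5"] * c2 + p["km3A2"]
    dc3 = -p["k4"] * c1 * c3 + p["km4A3"] + p["k5A4"] * c3 - p["km5"] * c3 ** 2
    return (dc1, dc2, dc3)


def rk4_step(c, p, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    s1 = wr_rhs(c, p)
    s2 = wr_rhs(shifted(c, s1, dt / 2), p)
    s3 = wr_rhs(shifted(c, s2, dt / 2), p)
    s4 = wr_rhs(shifted(c, s3, dt), p)
    return tuple(ci + dt / 6 * (a + 2 * b + 2 * d + e)
                 for ci, a, b, d, e in zip(c, s1, s2, s3, s4))


def integrate(c0, p, dt, n_steps):
    """Return the trajectory as a list of (c1, c2, c3) tuples."""
    traj = [tuple(c0)]
    for _ in range(n_steps):
        traj.append(rk4_step(traj[-1], p, dt))
    return traj


# Residual of the rate law at the steady state quoted in the text.
hopf_params = dict(PARAMS, km2=0.1715)
residual = wr_rhs((8.801, 11.494, 15.048), hopf_params)
```

A trajectory on the attractor would be obtained by calling `integrate` with `PARAMS`, a small time step and a long run, discarding the initial transient; an adaptive integrator would be a reasonable substitute for the fixed-step scheme used here.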

Figure C3.6.1 (a) WR single-banded chaotic attractor for $k_{-2} = 0.072$. This attractor is projected onto the $(c_1, c_2)$
plane. The maximum value reached by $c_1(t)$ is $c_1^{\max} \approx 54.1$ and the minimum reached is $c_1^{\min} \approx 2.5$. The vertical line,
at $c_1 = 8.5$ for $\dot{c}_1 < 0$, shows the position of the Poincaré section of the attractor used later. (b) A projection, onto
the $(c_1(t_1), c_1(t_2))$ plane, of the chaotic attractor reconstructed from the set of delayed coordinates
$\{c_1(t), c_1(t_1), c_1(t_2)\}$, where $t_1 = t + \tau_1$ and $t_2 = t + \tau_2$, for $0 < t < \infty$, and fixed delays $\tau_1 = 137$ and $\tau_2 = 200$.
Note that both $c_1(t_1)$ and $c_1(t_2)$ reach a maximum of $c_1^{\max}$ and a minimum of $c_1^{\min}$ so that the three-dimensional
reconstructed attractor is confined to a cube with sides of length $c_1^{\max} - c_1^{\min}$. The central hole of the attractor is
bounded along the diagonal in a similar way.

C3.6.2.2 PHASE-SPACE RECONSTRUCTION 

The description of chemical reactions as trajectories in phase space requires that the concentrations of all chemical 
species be measured as a function of time, something that is rarely done in reaction kinetics studies. In addition, the 
underlying set of reaction intermediates is often unknown and the number of these may be very large. Usually, 
experimental data on the time variation of the concentration of a single chemical species or a small number of 
species are collected. (Some experiments focus on the simultaneous measurement of the concentrations of many 
chemical species and correlations in such data can be used to deduce the chemical mechanism [7].) 

The trajectory description problem of chemical reactions is resolved by using phase-space reconstruction from a
single time series [8]; this method uses delayed data at times $t, t+\tau_1, t+\tau_2, \ldots, t+\tau_{n-1}$ for an $n$-dimensional attractor,
where usually $n < 3$. One may show that in place of the set of all chemical concentrations one may use, say,
$c_1(t), c_1(t+\tau_1), c_1(t+\tau_2), \ldots$ to represent trajectories in the concentration phase space. Such phase-space reconstruction
methods all rely on Whitney's embedding theorem [9] which allows a multi-dimensional attractor to be
reconstructed from a single time series. Since phase-space volumes contract for dissipative chemical systems, as
noted above, the final attractor may have a dimension much smaller than the original $n$-dimensional phase space.
The effective behaviour of the system may often be captured in a phase space of only a few dimensions even though
many chemical intermediates are involved. To illustrate this reconstruction method, the set of delayed $c_1$
coordinates for the chaotic attractor shown in figure C3.6.1(a), namely $(c_1(t), c_1(t+\tau_1), c_1(t+\tau_2))$ for $t$ going from zero
to some large value, was used to reconstruct the topologically equivalent attractor shown in figure C3.6.1(b).
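
The delay-coordinate construction described above can be sketched in a few lines. The function name `delay_embed` and the use of integer sample-index delays (rather than delays in time units) are assumptions made for this illustration; for data sampled at a uniform interval, a time delay $\tau$ corresponds to $\tau$ divided by the sampling interval, rounded to an integer number of samples.

```python
def delay_embed(series, delays):
    """Build delay vectors (x[i], x[i + d1], x[i + d2], ...) from one time series.

    `series` is a uniformly sampled scalar signal and `delays` holds the
    delays d1, d2, ... expressed as whole numbers of samples.
    """
    longest = max(delays)
    return [
        tuple([series[i]] + [series[i + d] for d in delays])
        for i in range(len(series) - longest)
    ]


# Toy usage: a short series embedded with sample delays 1 and 3.
vectors = delay_embed([0, 1, 2, 3, 4, 5], [1, 3])
```

Each returned tuple is one point of the reconstructed trajectory; plotting the first two components against each other gives a projection analogous to the one described for figure C3.6.1(b).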

C3.6.3 CHEMICAL CHAOS

We shall now analyse the structure of a chemical strange attractor and describe why the dynamics may be classified 
as chaotic. 

C3.6.3.1 STRANGE ATTRACTORS 

We begin by describing the features of a strange attractor. Figure C3.6.1(a) was constructed from a single chaotic 
trajectory which corresponds closely to a chaotic strange attractor, meaning that any such chaotic trajectory would 
look similar. Any point in the basin of the attractor approaches it asymptotically. A point moving on the strange 
attractor at some time comes arbitrarily close to any other point lying in the attractor, so motion on the attractor is 
ergodic; this is necessary but not sufficient for chaotic behaviour. To be chaotic the motion on the attractor must be 
sensitive to initial conditions, so that as time increases points on the attractor, however close together they may be 
initially, separate to distances comparable to the size of the attractor over a sufficiently long time. The rate of 
separation is measured by the Lyapunov number. In order that such separation be compatible with bounded motion, 
i.e. with the observation that the strange attractor lies in a finite volume of phase space, the chaotic (phase-space) 
flow must stretch and fold back onto itself. If we imagine a parcel of (compressible) fluid in phase space we see 
that this folding implies creation of infinitely many layers like the repeated folding of mille-feuille pastry. 
Correspondingly, the chaotic attractor is the result of an infinity of similar foldings. This dynamical recursion 
produces a self-similarity or fractal structure in the chaotic attractor. In summary, the chaotic attractor displays 
(exponential) separation of points or orbit segments and self-similar structure in the way these orbit segments are 
arranged in space. 

C3.6.3.2 POINCARE SECTIONS AND NEXT-AMPLITUDE MAPS 

For the strongly contracting phase volumes associated with chemical reactions, the three-dimensional continuous-time
flow can be reduced to a one-dimensional discrete-time map as follows. We first construct the Poincaré
section of the attractor flow. For the projection shown in figure C3.6.1(a) the flow is counter-clockwise, implying
$\dot{c}_1 < 0$ above the central hole in the single-banded attractor. Therefore this flow will circulate round this hole and
repeatedly intersect the Poincaré surface of section ($c_1 = 8.5$, $\dot{c}_1 < 0$), indicated by the heavy vertical line in
figure C3.6.1(a), from right to left. Suppose that at time $t_0$ the trajectory intersects this Poincaré surface at a point
$(c_2(t_0), c_3(t_0))$; at time $t_1$ it makes its next, or so-called first, return to the surface at point $(c_2(t_1), c_3(t_1))$. This process
continues for times $t_2, t_3, \ldots$, the difference $t_{n+1} - t_n$ being the period of the $n$th first-return trajectory segment. The
sequence of points generated by these intersections is the Poincaré section and is plotted in figure C3.6.2(a). The
thin line-like form arises from the strong contraction of the flow onto the attractor; thus the attractor resembles a
two-dimensional surface formed from extremely tightly compressed and folded layers; sufficiently close to the
attractor the trajectories tend to separate from one another across the attractor band. The function that takes
$(c_2(t_n), c_3(t_n))$ into $(c_2(t_{n+1}), c_3(t_{n+1}))$ is the Poincaré map. The line-like form of the Poincaré section and its
single-valuedness as a function of either coordinate permits a one-dimensional representation of the two-dimensional
Poincaré map. To do this second part of the next-amplitude map construction we plot the $c_2$ component of the
Poincaré map corresponding to the $n$th intersection of the chaotic trajectory with the Poincaré surface,
$c_2(n) = c_2(t_n)$, versus its value at the $(n+1)$th intersection, $c_2(n+1) = c_2(t_{n+1})$. This next-amplitude map is displayed in
figure C3.6.2(b). The map has a quadratic extremum, a roughly parabolic shape, and is densely filled by
intersection points. We may then represent trajectories of the flow by iterates of this
map. To represent these iterates graphically, we first draw the bisectrix of the map, i.e. the diagonal line B in
figure C3.6.2(b). By construction, any point on the map whose abscissa is $c_2(n)$ has ordinate $c_2(n+1)$. This
ordinate is given by moving horizontally to the bisectrix. Moving vertically from the bisectrix to the map makes
$c_2(n+1)$ the new abscissa and $c_2(n+2)$ the corresponding ordinate. These two steps correspond to an iteration of the
next-amplitude map and the procedure can be repeated to obtain a discrete trajectory indicating how the chaotic
attractor structure is built up. A portion of such a discrete chaotic trajectory is shown in figure C3.6.2(b).
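
The two-stage construction above (surface-of-section crossings, then pairs of successive amplitudes) might be sketched as follows. The function names are hypothetical, and the use of linear interpolation between the two samples that straddle the surface is an assumption of this sketch, not a method stated in the text. The usage example applies the functions to a counter-clockwise circle rather than to the WR flow, because for a circle the crossing points are known exactly.

```python
import math


def section_crossings(traj, coord, level):
    """Points where traj[i][coord] falls through `level` (decreasing direction).

    `traj` is a list of equally long tuples sampled along a trajectory; each
    crossing point is estimated by linear interpolation between samples.
    """
    points = []
    for p, q in zip(traj, traj[1:]):
        a, b = p[coord] - level, q[coord] - level
        if a > 0.0 and b <= 0.0:      # crossing with decreasing coordinate
            s = a / (a - b)           # fraction of the step at the crossing
            points.append(tuple(pi + s * (qi - pi) for pi, qi in zip(p, q)))
    return points


def next_amplitude_pairs(values):
    """Pairs (v_n, v_{n+1}) from the sequence of section values."""
    return list(zip(values, values[1:]))


# Usage on a counter-clockwise unit circle: x decreases through 0 at y = 1.
circle = [(math.cos(0.01 * i), math.sin(0.01 * i)) for i in range(2001)]
hits = section_crossings(circle, coord=0, level=0.0)
pairs = next_amplitude_pairs([h[1] for h in hits])
```

For the WR flow one would pass a trajectory of $(c_1, c_2, c_3)$ tuples with `coord=0` and `level=8.5`, then build the pairs from the $c_2$ component of each crossing point.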

Figure C3.6.2 (a) The $(c_2, c_3)$ Poincaré surface of section of the phase flow, taken at $c_1 = 8.5$ with $\dot{c}_1 < 0$, for the
WR chaotic attractor at $k_{-2} = 0.072$. (b) The next-amplitude map constructed from pairs of intersection coordinates
$\{\ldots, (c_2(n), c_2(n+1)), (c_2(n+1), c_2(n+2)), \ldots\}$. The sequence of horizontal and vertical line segments, each touching the
diagonal B and the map, comprises a discrete trajectory. The direction on the first four segments is indicated.



Figure C3.6.3 The spreading of an ensemble of four points on the WR chaotic attractor. (a) The initial tight,
four-point ensemble of open circles (o) at $c_2 = 5.287\ldots$, $c_3 = 24.065\ldots$ and variable $c_1 = 2.884\ldots, 2.984\ldots, 3.084\ldots$ and
$3.184\ldots$ spreading to the set of four filled circles (•) at time $t = 4.0$. The filled circles overlap in two pairs. (b) The
spread from the same initial ensemble at time $t = 800.0$. The dispersion of the initial ensemble has the size of a full
stop in the centre of the four overlapping open circles. The attractor is shown as a dust of stroboscopically plotted
points so that the final ensemble of four filled circles (•) can be seen. One point lies on the inner edge of the central
hole in the attractor. The density of the dust is an indicator of the coarse-grained density on the attractor.

C3.6.3.3 LYAPUNOV NUMBER AND FRACTAL DIMENSION

Chaotic attractors are complicated objects with intrinsically unpredictable dynamics. It is therefore useful to have 
some dynamical measure of the strength of the chaos associated with motion on the attractor and some geometrical 
measure of the structural complexity of the attractor. These two measures, the Lyapunov exponent or number [1] 
for the dynamics, and the fractal dimension [ 10 ] for the geometry, are related. To simplify the discussion we 
consider three-dimensional flows in phase space, but the ideas can be generalized to higher dimension. 

As already mentioned, the motion of a chaotic flow is sensitive to initial conditions [11]; points which initially lie 
close together on the attractor follow paths that separate exponentially fast. This behaviour is shown in figure 
C3.6.3 for the WR chaotic attractor at $k_{-2} = 0.072$. The instantaneous rate of separation depends on the position on
the attractor. However, a chaotic orbit visits any region of the attractor in a recurrent way so that an infinite time
average of this exponential separation taken along any trajectory in the attractor is an invariant quantity that
characterizes the attractor. If $\gamma(t)$ is a trajectory for the rate law (C3.6.2) then we can linearize the motion in the
neighbourhood of $\gamma$ to get

$$\frac{\mathrm{d}\,\delta\mathbf{c}}{\mathrm{d}t} = \frac{\partial \mathbf{R}}{\partial \mathbf{c}}\,\delta\mathbf{c}. \qquad \text{(C3.6.5)}$$

The formal (or numerical) integration of this equation can be written as

$$\delta\mathbf{c}(t) = \mathbf{L}(t)\,\delta\mathbf{c}(0) \qquad \text{(C3.6.6)}$$

where $\mathbf{L}(t)$ is the displacement evolution matrix along $\gamma$ and $\delta\mathbf{c}(t)$ is the solution of equation (C3.6.5) for initial
displacement $\delta\mathbf{c}(0)$. Then the Lyapunov number is defined by

$$\lambda = \lim_{t\to\infty} \frac{1}{2t} \ln\left[\mathrm{Tr}\,\mathbf{L}^*(t)\mathbf{L}(t)\right] \qquad \text{(C3.6.7)}$$

where $\mathbf{L}^*$ is the adjoint of $\mathbf{L}$. If the Lyapunov number $\lambda$ is positive this indicates chaotic behaviour since $\lambda$ is a
measure of (the exponent for) the average rate at which trajectories separate on the attractor.
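
For a one-dimensional map, such as a next-amplitude map, the same idea reduces to a long-time average of $\ln|f'(x)|$ along an orbit. The sketch below is an illustration under stated assumptions: it uses the standard quadratic map $x \to a\,x(1 - x)$ as a stand-in for a measured next-amplitude map, and the function names, the map parameter name `a`, the initial point and the run lengths are all choices made here, not taken from the text.

```python
import math


def quadratic_map(x, a):
    return a * x * (1.0 - x)


def map_lyapunov(a, x0=0.3, n_transient=1000, n_steps=20000):
    """Average of ln|f'(x_n)| along an orbit of the quadratic map."""
    x = x0
    for _ in range(n_transient):          # let transients decay first
        x = quadratic_map(x, a)
    total, count = 0.0, 0
    for _ in range(n_steps):
        slope = abs(a * (1.0 - 2.0 * x))  # |f'(x)| for the quadratic map
        if slope > 0.0:                   # skip the isolated point f'(x) = 0
            total += math.log(slope)
            count += 1
        x = quadratic_map(x, a)
    return total / count


periodic = map_lyapunov(3.2)   # orbit on a stable period-2 cycle
chaotic = map_lyapunov(3.9)    # orbit on a chaotic attractor
```

A negative value signals a stable periodic orbit and a positive value signals sensitive dependence on initial conditions. For a flow such as (C3.6.2) the analogous computation integrates the linearized equation (C3.6.5) along the trajectory, usually with periodic renormalization of the displacement vector to avoid overflow.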

A chaotic attractor comprises line-like trajectory segments and so is topologically a one-dimensional object. 
However, the trajectories may lie arbitrarily close together in some regions of space, at least in the infinite time 
limit. In such regions a chaotic attractor has almost surface-like 'filling' properties. This unusual structure 
motivates the definition of a geometrical measure of chaos: the fractal, or more often, the box counting dimension 
of an attractor 

$$D = \lim_{\varepsilon\to 0} \frac{\ln N(\varepsilon)}{\ln(1/\varepsilon)} = -\lim_{\varepsilon\to 0} \frac{\ln N(\varepsilon)}{\ln \varepsilon} \qquad \text{(C3.6.8)}$$

where, in two-dimensional or three-dimensional space, $N(\varepsilon)$ is the minimum number of squares or cubes,
respectively, of side $\varepsilon$ that covers the attractor. This dimension can be calculated for the Poincaré section of the
phase flow by covering it with successively smaller squares or for the entire attractor by covering it with
successively smaller cubes
and measuring $D$ as the slope of $\ln N(\varepsilon)$ versus $\ln(1/\varepsilon)$ implied in (C3.6.8). This dimension is typically non-integer
and is less than the phase-space dimension.
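
The box-counting procedure can be sketched as below. This is a simplified illustration: it counts occupied cells of a fixed grid, which approximates (but is not identical to) the minimum cover in the definition, and it estimates the slope from only two box sizes, whereas in practice one would fit a line over a range of sizes. The function names and the test point sets are assumptions made for the example.

```python
import math


def box_count(points, eps):
    """Number of grid cells of side `eps` that contain at least one point."""
    return len({tuple(math.floor(x / eps) for x in p) for p in points})


def box_dimension(points, eps_coarse, eps_fine):
    """Two-point estimate of the slope of ln N(eps) versus ln(1/eps)."""
    n_coarse = box_count(points, eps_coarse)
    n_fine = box_count(points, eps_fine)
    return ((math.log(n_fine) - math.log(n_coarse))
            / (math.log(1.0 / eps_fine) - math.log(1.0 / eps_coarse)))


# Sanity checks on sets of known dimension: a line segment and a filled square.
line = [(i / 10000, i / 10000) for i in range(10000)]
square = [(i / 200, j / 200) for i in range(200) for j in range(200)]
d_line = box_dimension(line, 1 / 16, 1 / 64)
d_square = box_dimension(square, 1 / 8, 1 / 32)
```

Applied to the points of a Poincaré section, the same functions would give an estimate between one and two for a thin, folded, line-like set, provided enough points are available at the finest box size used.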

C3.6.3.4 EXPERIMENTAL OBSERVATIONS OF CHEMICAL CHAOS 

The existence of chaotic oscillations has been documented in a variety of chemical systems. Some of the earliest 
observations of chemical chaos have been on biochemical systems like the peroxidase-oxidase reaction [ 12 ] and on 
the well known Belousov-Zhabotinskii (BZ) [13] reaction. The BZ reaction is the Ce-ion-catalyzed oxidation of 
citric or malonic acid by bromate ion. Early investigations of the BZ reaction used the techniques of dynamical 
systems theory outlined above to document the existence of chaos in this reaction. Apparent chaos in the BZ 
reaction was found by Hudson et al [14] and the data were analysed by Tomita and Tsuda [15] using a return-map 
method. Chaos was confirmed in the BZ reaction carried out in a CSTR by Roux et al [16, 17] and by Hudson and
Mankin [18] who also used reconstruction from the electrode potentials of Pt and Br⁻, and d[Pt]/dt as independent
variables. These demonstrations of true chemical chaos were achieved by a number of then new methods:
power-spectral analysis [16, 19], trajectory reconstruction in phase space [16], and next-amplitude-map analysis
[15, 20, 21]. The existence of true chemical chaos was signalled by a positive Lyapunov exponent calculated from the
experimental return map. Since these early investigations, chaos has been documented in a variety of chemical 
systems. One aspect of these CSTR experiments was the observation that the stirring rate moved the bifurcation 
point(s) even if this rate was very large [22]. This effect depends on turbulent mixing and can be controlled but not 
eliminated by keeping the stirring rate constant. We now give examples of two related dynamical systems 
techniques used by experimentalists: phase-space reconstruction of chaotic attractors and the analysis of the 
associated next-amplitude maps. First, we discuss a study where an attractor was reconstructed from experimental 
data and then used to obtain a next-amplitude map [17]. 

Figure C3.6.4(a) shows an experimental chaotic attractor reconstructed from the Br⁻ electrode potential, i.e. the
logarithm of the Br⁻ ion concentration, in the BZ reaction [17]. Such reconstruction is defined, in principle, for
continuous time $t$. However, in practice, data are recorded as a discrete time series of measurements
$\{X(t_i): i = 1, 2, \ldots, i_{\max}\}$, consisting of thousands ($i_{\max}$) of data points. In our example $X(t_i)$ is proportional to
$\ln[\mathrm{Br}^-](t_i)$. The experimental attractor was reconstructed [17] in the space of the three variables, $X(t_i)$, $X(t_i + \tau)$ and
$X(t_i + 2\tau)$, and figure C3.6.4(a) shows the projection of this attractor onto the $(X(t_i), X(t_i + \tau))$ plane. This attractor
resembles the chaotic attractor shown in figure C3.6.1(a) and it can be demonstrated that the reconstructed attractor
possesses the signatures of chaos: regions where trajectories locally spread or diverge and regions of re-injection and
folding of the phase-space flow. Furthermore, we see that because the chaotic attractor is surface-like it has a fractal
dimension close to two in spite of the fact that there are likely to be 30-40 chemical species involved in the
reaction so that the Euclidean dimension of the full concentration phase space is large. This points to the usefulness
of phase-space reconstruction methods for low-dimensional chaotic attractors, especially for systems with a high
but unknown phase-space dimensionality.

Figure C3.6.4 Single-banded chaotic attractor and next-amplitude map reconstructed from experimental data for
the BZ reaction. (a) The reconstructed attractor projected onto the $(X(t_i), X(t_i + \tau))$ plane (see the text for a
discussion of the notation). (b) The next-amplitude map in the $(X_k, X_{k+1})$ plane drawn from the surface of section
taken at the dashed curve on the lower part of the attractor in (a). See the text for an explanation of the map
construction. Reproduced by permission from Roux and Swinney [17].

We now examine how a next-amplitude map was obtained from the attractor shown in figure C3.6.4(a) [17].
Consider the plane in this space whose projection is the dashed curve; i.e. a plane orthogonal to the
$(X(t_i), X(t_i + \tau))$ plane. Then, for the $k$th intersection of the (continuous) trajectory with this plane, there will be a data point
$(X(t_{i_k}), X(t_{i_k} + \tau), X(t_{i_k} + 2\tau))$ on the attractor that lies closest to the intersection of the continuous trajectory. A
second discretization produces the set $\{X_k = X(t_{i_k} + 2\tau): k = 1, 2, \ldots, k_{\max}\}$. This set is used in the construction
of the next-amplitude map shown in figure C3.6.4(b) from the pairs of points $\{(X_k, X_{k+1}): k = 1, 2, \ldots, k_{\max} - 1\}$.

This map has a single quadratic extremum, similar to that of the WR model described in detail earlier. Such maps 
(together with the technical constraint of negative Schwarzian derivative) [23] possess universal properties. In 
particular, the universal (U) sequence in which the periodic orbits appear [ 24 ] was observed in the BZ reaction in 
accord with this picture of the chemical dynamics. 


C3.6.4 ROUTES TO CHAOS 

The next problem to consider is how chaotic attractors evolve from the steady state or oscillatory behaviour of 
chemical systems. There are, effectively, an infinite number of routes to chaos [25]. However, only some of these 
have been examined carefully. In the simplest models they depend on a single control or bifurcation parameter. In 
more complicated models or in experimental systems, variations along a suitable curve in the control parameter 
space allow at least a partial observation of these well known routes. For chemical systems we describe period 
doubling, mixed-mode oscillations, intermittency, and the quasi-periodic route to chaos. 

C3.6.4.1 THE PERIOD-DOUBLING ROUTE TO CHAOS

We first examine how chaos arises in the WR model using the rate constant $k_{-2}$ as the bifurcation parameter.
However, another parameter or set of parameters could be used to explore the behaviour. (Independent variation of
$p$ parameters produces a $p$-dimensional bifurcation diagram.) In the context of experiments carried out in CSTRs the
bifurcation parameter is usually taken to be the flow rate or a reservoir concentration. If we start with a value of
$k_{-2} > 0.1715$ with all other rate constants fixed at the values given in section C3.6.2, the WR reaction has a stable
steady state or fixed point. We examine the sequence of transformations that takes place as $k_{-2}$ decreases. At a certain
value of $k_{-2} = k_{-2}^{H} \approx 0.1715$ the fixed point, $(c_1^s, c_2^s, c_3^s) \approx (8.801, 11.494, 15.048)$, loses its stability and the
concentrations begin to oscillate with period $T_0$. This is the Hopf bifurcation point and for $k_{-2} < k_{-2}^{H}$ the chemical
attractor is a limit cycle. As $k_{-2}$ decreases further the amplitude of the limit cycle grows (initially as
$|k_{-2} - k_{-2}^{H}|^{1/2}$) until, at $k_{-2} \approx 0.1\ldots$, the orbit undergoes a subharmonic bifurcation and its
period doubles. To understand this bifurcation, imagine that the limit cycle lies on the surface of a Möbius band
which is effectively a strip with a single twist in it. Motion on this band represents the slow relaxation of the
system. At bifurcation the limit cycle becomes unstable but a stable orbit is born adjacent to it in the strip; this
newborn orbit is geometrically equivalent to the edge of the Möbius strip as the width of the strip becomes
arbitrarily small; because of the twist, the strip has only one edge of twice the length of the unstable limit cycle it
contains. Therefore, the new stable orbit has twice the period of its parent limit cycle. An infinite sequence of these
local twists or braids occurs in the phase flow, generating an infinite subharmonic sequence of period-doubled
orbits. The first two orbits of the main WR sequence are shown in figure C3.6.5; there is a period-4 attractor at
$k_{-2} = 0.095$. At the $n$th period doubling the period of the oscillation is $T_n \approx 2^n T_0$. In the limit $n \to \infty$ we arrive at the
strange attractor where the time variation of the concentrations is no longer periodic. This is the period-doubling
route to chaos.



Figure C3.6.5 The first two periodic orbits in the main subharmonic sequence are shown projected onto the $(c_1, c_2)$
plane. This sequence arises from a Hopf bifurcation of the stable fixed point for the parameters given in the text.
The arrows indicate the direction of motion. (a) The limit cycle or period-1 orbit at $k_{-2} = 0.11$. (b) The first
subharmonic or period-2 orbit at $k_{-2} = 0.095$.

It is instructive to view this sequence of transformations in terms of a bifurcation diagram. We use the procedure 
described earlier to examine the chaotic orbit: the intersections of the periodic trajectories with the Poincare surface 
are recorded for each value of the rate constant $k_{-2}$. In figure C3.6.6 we plot the concentration $c_2$ on the Poincaré
plane versus $k_{-2}$. One can clearly see the sequence of period-doubling bifurcations leading eventually to the chaotic
attractor. One can understand the origin of this sequence of bifurcations by considering the next-amplitude map
discussed earlier. We remarked in section C3.6.3.2 that this map has the nearly parabolic functional form shown in
figure C3.6.2(b) so that, after suitable scaling, we can write the next-amplitude map in the standard quadratic form
$c_2(n+1) = \lambda c_2(n)(1 - c_2(n))$, thereby preserving the local and global features of the bifurcation diagram in figure
C3.6.6. We know what happens
as the standard map parameter $\lambda$ is changed. This reduction of the problem to the study of a quadratic map allows
one to make a detailed examination of the universal properties of this route to chaos since the only requirement is
that the map function be quadratic in the vicinity of its maximum. Such an analysis was carried out by Feigenbaum
[26] where the following scaling relation was derived: let $\lambda_n$ be the value of $\lambda$ at the $n$th period doubling and $\lambda_c$ be
its value in the $n \to \infty$ limit. Then for sufficiently large $n$, $\lambda_n - \lambda_c = \delta(\lambda_{n+1} - \lambda_c)$ with $\delta = 4.6692\ldots$ a universal
number for such period-doubling cascades for quadratic maps.
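
The period-doubling sequence of the standard quadratic map can be explored numerically with a short sketch such as the one below. The function names, the initial point, the transient length, the tolerance and the sampled parameter values are all assumptions made for this illustration; the method is a plain brute-force check of when the orbit first returns to itself after transients have decayed.

```python
def quadratic_map(x, lam):
    """Standard quadratic map x -> lam * x * (1 - x)."""
    return lam * x * (1.0 - x)


def attractor_period(lam, x0=0.4, n_transient=20000, max_period=64, tol=1e-8):
    """Smallest p <= max_period with |x_{n+p} - x_n| < tol, or None if no
    such period is found (for example on a chaotic attractor)."""
    x = x0
    for _ in range(n_transient):          # discard the transient
        x = quadratic_map(x, lam)
    orbit = [x]
    for _ in range(max_period):
        orbit.append(quadratic_map(orbit[-1], lam))
    for p in range(1, max_period + 1):
        if abs(orbit[p] - orbit[0]) < tol:
            return p
    return None


# Sampling the cascade: fixed point, period 2, period 4, then chaos.
periods = {lam: attractor_period(lam) for lam in (2.8, 3.2, 3.5, 3.9)}
```

Scanning the map parameter finely and recording the values at which the detected period doubles gives successive bifurcation points, from which ratios of successive intervals can be formed and compared with the universal number quoted above. Convergence is slow very close to each bifurcation point, so a longer transient is needed there.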

Figure C3.6.6 The figure shows the $c_2$ coordinate, for $\dot{c}_1 < 0$, of the family of trajectories intersecting the $(c_2, c_3)$
Poincaré surface at $c_1 = 8.5$ as a function of bifurcation parameter $k_{-2}$. As the abscissa $k_{-2}$ decreases, the first
subharmonic cascade is visible between $k_{-2} \approx 0.1$, the value of the first subharmonic bifurcation, and $k_{-2} \approx 0.083$, the
subharmonic limit of the first cascade. Periodic orbits that arise by the tangent bifurcation mechanism associated
with type-I intermittency (see the text for references) can also be seen for values of $k_{-2}$ smaller than this
subharmonic limit. The left side of the figure ends at $k_{-2} = 0.072$, the value corresponding to the chaotic attractor
shown in figure C3.6.1(a). Other regions of chaos can also be seen.

C3.6.4.2 OTHER ROUTES TO CHAOS 

In addition to the period-doubling route to chaos there are other routes that are chemically important: mixed-mode 
oscillations (MMOs), intermittency and quasi-periodicity. Their signature is easily recognized in chemical 
experiments, so that they were seen early in the history of chemical chaos. 

MMOs have been observed in many experiments. Typically, an MMO consists of one or more large amplitude
oscillations followed by several small amplitude oscillations. The size of the small oscillations may grow slowly.
Suppose $L$ large oscillations are followed by $s$ small oscillations, where $L$ and $s$ are integers; then this MMO can be
encoded by $L^s$. For example, one large oscillation followed by one small oscillation is written $1^1$, and so on. Since
large and small have a specific meaning in a series of chemical experiments we may find only small oscillations,
encoded $0^1$, or only large oscillations, encoded $1^0$, in the series. Chaotic MMOs consist of $L^s$ oscillations
interspersed randomly by $L^{s'}$, $L'^{s}$ or $L'^{s'}$ oscillations. Experimental observations and theoretical descriptions for the
origins of such oscillations have been given [27].
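The encoding of an MMO time series can be sketched in a few lines of Python. The function names and the single amplitude threshold separating 'large' from 'small' peaks are assumptions made for illustration; in practice the distinction is fixed by the series of experiments under study.

```python
def encode_mmo(amplitudes, threshold):
    """Group successive peak amplitudes into (L, s) units.

    A peak counts as 'large' if its amplitude is at least `threshold`
    (a hypothetical, user-chosen cut-off) and as 'small' otherwise.
    Each unit is a run of L large peaks followed by a run of s small peaks.
    """
    units = []
    large = small = 0
    for amp in amplitudes:
        if amp >= threshold:
            if small > 0:               # a large peak after small ones opens a new unit
                units.append((large, small))
                large = small = 0
            large += 1
        else:
            small += 1
    if large or small:                  # flush the last (possibly incomplete) unit
        units.append((large, small))
    return units


def mmo_label(large, small):
    """Plain-text form of the L^s notation."""
    return f"{large}^{small}"


if __name__ == "__main__":
    peaks = [1.0, 0.2, 0.1, 1.1, 0.15, 0.2]
    print([mmo_label(L, s) for L, s in encode_mmo(peaks, threshold=0.5)])
```

A periodic MMO gives a repeating list of identical units, whereas a chaotic MMO shows units with varying L or s in an irregular order. A series containing only small (or only large) peaks is returned here as a single unit with L = 0 (or s = 0).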

Intermittency, in the context of chaotic dynamical systems, is characterized by long periods of nearly periodic or 
'laminar' motion interspersed by chaotic bursts of random duration [28]. Within this broad phenomenological 
description, three kinds of intermittency have been distinguished theoretically and some detected experimentally [1,
29]. The onset of the laminar phase is statistical but its subsequent evolution is deterministic until the start of the
next burst, whereas the behaviour of the chaotic phase is largely probabilistic. For this kind of onset of chaotic
motion the bifurcation parameter, $s$ say, is close to its critical bifurcation value $s_c$ for periodic motion. As $s$ passes
from a 'chaotic' value through $s_c$ the motion goes from intermittent to marginally stable, to strictly stable periodic
motion.


The quasiperiodic route to chaos is historically important. It arises from a succession of Hopf bifurcations. As
already noted, a single Hopf bifurcation results in a limit cycle. The next Hopf bifurcation produces a phase flow
that can be represented on the surface of a torus (doughnut). This flow is associated with two frequencies: if the
ratio of these frequencies is irrational then the torus surface is densely covered by the phase trajectory, whereas if
the ratio is rational the orbit winds periodically on the torus surface with both frequencies determining the overall
period. However, a further Hopf bifurcation leads to an unstable torus flow which deforms into a chaotic flow. The
nature of this instability was first discussed independently by Kupka and Smale; an equivalent theory of this
breakdown of quasi-periodic flow to chaotic flow was proposed by Ruelle and Takens and was developed by them
and Newhouse [30]. The quasiperiodic route to chaos was important because it was the first example of a
transition to chaos that involved few modes, in contrast to the classical model of Landau of a gradual wandering
into chaos as successive modes become unstable.


C3.6.5 CHEMICAL PATTERNS AND SPATIO-TEMPORAL CHAOS 

Thus far we have considered systems where stirring ensured homogeneity within the medium. If molecular 
diffusion is the only mechanism for mixing the chemical species then one must adopt a local description where 
time-dependent concentrations, c(r,t), are defined at each point r in space and the evolution of these local 
concentrations is given by a reaction-diffusion equation 


$$\frac{\partial \mathbf{c}(\mathbf{r},t)}{\partial t} = \mathbf{R}(\mathbf{c}(\mathbf{r},t);\boldsymbol{\mu}) + \mathbf{D}\nabla^{2}\mathbf{c}(\mathbf{r},t) \qquad \text{(C3.6.9)}$$

where D is a matrix of diffusion coefficients. In addition to the temporal behaviour described above, one now has
the possibility of chemical pattern formation which may lead to spatio-temporal chaos. In order to investigate
chemical pattern formation under controlled non-equilibrium conditions, experiments are now carried out in
continuously fed unstirred reactors (CFURs) [31]. In such reactors, well stirred reagent baths are in contact with a gel
or porous medium within which the chemicals mix and react in the absence of stirring effects other than diffusion. 
Since the reagent baths are CSTRs they continuously supply and remove reactants and products from the reaction- 
diffusion medium, and chemical pattern formation can be controlled and maintained indefinitely. This has allowed 
experimentalists to make detailed studies of chemical pattern formation. 
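To make the structure of a reaction-diffusion equation concrete, the following Python sketch advances a single concentration field on a one-dimensional periodic grid by one explicit Euler step. It is a deliberately simplified illustration: one species instead of a concentration vector, a scalar diffusion coefficient instead of the matrix D, and a hypothetical function name `rd_step`. The explicit scheme is only stable for time steps of roughly dt <= dx**2 / (2 * D).

```python
def rd_step(c, reaction, D, dx, dt):
    """One forward-Euler step of dc/dt = R(c) + D d2c/dx2 on a periodic grid.

    c        : list of concentrations at equally spaced grid points
    reaction : function giving the local rate R(c) (assumed, user supplied)
    D        : scalar diffusion coefficient
    """
    n = len(c)
    new_c = []
    for i in range(n):
        # Second-order central difference; c[i - 1] wraps around for i = 0.
        laplacian = (c[i - 1] - 2.0 * c[i] + c[(i + 1) % n]) / dx**2
        new_c.append(c[i] + dt * (reaction(c[i]) + D * laplacian))
    return new_c


if __name__ == "__main__":
    field = [0.0] * 10
    field[5] = 1.0                      # a localized pulse
    for _ in range(50):                 # pure diffusion: R(c) = 0
        field = rd_step(field, lambda c: 0.0, D=1.0, dx=1.0, dt=0.1)
    print(field)
```

With the reaction term switched off the pulse simply spreads while the total amount of material is conserved; supplying a nonlinear `reaction` is what allows fronts, waves and patterns of the kind discussed below to appear.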

We shall describe some of the common types of chemical patterns observed in such experiments and comment on 
the mechanisms for their appearance. In keeping with the theme of this chapter we focus on states of spatio- 
temporal chaos or on regular chemical patterns that lead to such turbulent states. We shall touch only upon the 
main aspects of this topic since there is a large variety of chemical patterns and many mechanisms for their onset
[2, 3, 5, 32].

C3.6.5.1 EXCITABLE MEDIA

Excitable media are some of the most commonly observed reaction-diffusion systems in nature. An excitable 
system possesses a stable fixed point which responds to perturbations in a characteristic way: small perturbations 
return quickly to the fixed point, while larger perturbations that exceed a certain threshold value make a long 
excursion in concentration phase space before the system returns to the stable state. In many physical systems this 
behaviour is captured by the dynamics of two concentration fields, a fast activator variable $u$ with cubic nullcline
and a slow inhibitor variable $v$ with linear nullcline [33]. The FitzHugh-Nagumo equation [34], derived as a simple
model for nerve impulse propagation but which can also apply to a chemical reaction scheme [35], is one of the
best known equations with such activator-inhibitor kinetics:

$$\frac{du}{dt} = -u^{3} + u - v = R_u(u,v),$$
$$\frac{dv}{dt} = \varepsilon(u - av + b) = R_v(u,v). \qquad \text{(C3.6.10)}$$

Figure C3.6.7(a) shows the $\dot{u} = 0$ and $\dot{v} = 0$ nullclines of this system along with trajectories corresponding to sub-
and super-threshold excitations. The trajectory arising from the sub-threshold perturbation quickly relaxes back to 
the stable fixed point. Three stages can be identified in the trajectory resulting from the super-threshold 
perturbation: an excited stage where the phase point quickly evolves far from the fixed point, a refractory stage 
where the system relaxes back to the stable state and is not susceptible to additional perturbation and the resting 
state where the system again resides at the stable fixed point. 
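The threshold response can be illustrated with a short Python sketch. It assumes the kinetics have the form du/dt = -u^3 + u - v, dv/dt = eps*(u - a*v + b), which is our reading of equation (C3.6.10). The parameter values eps = 0.01, a = 0.5, b = 0.5, the sizes of the two perturbations and the function names are illustrative assumptions chosen to place the system in an excitable regime; they are not taken from the text.

```python
def fhn_rates(u, v, eps, a, b):
    """Right-hand sides of the assumed FitzHugh-Nagumo kinetics."""
    return -u**3 + u - v, eps * (u - a * v + b)


def fixed_point(a, b, lo=-2.0, hi=2.0, iters=200):
    """Bisection for the intersection of the cubic and linear nullclines.

    Assumes 0 < a < 1, for which f(u) = u - u**3 - (u + b)/a is strictly
    decreasing, and that the root lies in [lo, hi].
    """
    def f(u):
        return u - u**3 - (u + b) / a

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    u_star = 0.5 * (lo + hi)
    return u_star, u_star - u_star**3


def max_excursion(kick, eps=0.01, a=0.5, b=0.5, dt=0.01, t_end=200.0):
    """Displace u from the resting state by `kick`; return the largest u reached."""
    u, v = fixed_point(a, b)
    u += kick
    u_max = u
    for _ in range(int(t_end / dt)):    # forward Euler time stepping
        du, dv = fhn_rates(u, v, eps, a, b)
        u, v = u + dt * du, v + dt * dv
        u_max = max(u_max, u)
    return u_max


if __name__ == "__main__":
    print("sub-threshold kick:  ", max_excursion(0.1))
    print("super-threshold kick:", max_excursion(0.5))
```

For the small perturbation the activator never rises above its initial displaced value, whereas the larger perturbation carries the phase point across to the right-hand branch of the cubic nullcline before the slow inhibitor brings it back.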

Figure C3.6.7 Cubic ($\dot{u} = 0$) and linear ($\dot{v} = 0$) nullclines for the FitzHugh-Nagumo equation. (a) The excitable
domain showing trajectories resulting from sub- and super-threshold excitations. (b) The oscillatory domain
showing limit cycle orbits: small inner limit cycle close to Hopf point; large outer limit cycle far from Hopf point.

An excitable medium is a diffusively coupled array of such local excitable elements described by the reaction- 
diffusion equation (C3.6.9) with R given by (C3.6.10) and c = (u,v). Imagine a local super-threshold perturbation
applied to the system in the homogeneous resting state. Due to diffusive coupling, the perturbation will excite
neighbouring regions of the medium. The originally perturbed region will then relax to the refractory stage where it
is no longer susceptible to perturbation, and finally back to the stable steady state. Consequently, a circular wave of
excitation with a refractory tail will propagate outward through the medium (see figure C3.6.8(a)). If the excitable
system is periodically stimulated in a local region of the medium (a pacemaker region) a target pattern comprising a
set of concentric rings of excitation will be observed.



Figure C3.6.8 (a) A growing ring of excitation in an excitable FitzHugh-Nagumo medium. (b) A spiral wave in
the same system.


If an excitable wave is broken, for instance by an obstacle or inhomogeneity in the medium, the free ends of the
wave front will curl, since the front velocity is smaller at the tip than along the rest of the wave front, leading to the
formation of spiral waves in the system. An example of a spiral wave is shown in figure C3.6.8(b). Excitable waves
are seen in many chemical and biological systems. The often studied Belousov-Zhabotinskii (BZ) reaction was one
of the first systems in which such waves were observed [13, 36]. Chemical waves of this type have been studied
extensively in catalytic oxidation of CO on Pt [37]. In biological contexts, waves of this type occur in the aggregation
stage of the slime mould Dictyostelium discoideum where the chemical signalling is through periodic waves of cAMP;
also the Ca$^{2+}$ waves in systems like Xenopus laevis oocytes and pancreatic $\beta$ cells fall into this category [38].
Electrochemical waves in cardiac and nerve tissue have this origin and the appearance and/or breakup of spiral
wave patterns in excitable media are believed to be responsible for various types of arrhythmias in the heart [39,
40]. Figure C3.6.9 shows an excitable spiral wave in dog epicardial muscle [41].






Figure C3.6.9 Spiral electrochemical wave in dog epicardial muscle visualized using a voltage-sensitive dye. 
Reproduced by permission from Pertsov and Jalife [41]. 

The cores of the spiral waves need not be stationary and can move in periodic, quasi-periodic or even chaotic
'flower' trajectories [42, 43]. In addition, spatio-temporal chaos can arise if such spiral waves break up and the
spiral wave fragments spawn pairs of new spirals [42, 44].

C3.6.5.2 OSCILLATORY AND CHAOTIC MEDIA 

We described earlier how a stable steady state may give rise to a periodic oscillation through a Hopf bifurcation. 
The steady state of the FitzHugh-Nagumo model can undergo such a Hopf bifurcation. Consider the situation 
shown in figure C3.6.7(b) for $b = 0$ where there is a single fixed point at the origin ($u^*$, $v^*$) = (0,0). This fixed point
is stable if $\varepsilon > a^{-1}$, $a < 1$, and becomes unstable at $\varepsilon = \varepsilon_H = a^{-1}$ through a Hopf bifurcation spawning a limit cycle
encircling the origin.
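The stability condition quoted above can be checked with a short linear stability calculation. The sketch below assumes the local kinetics have the form $\dot{u} = -u^3 + u - v$, $\dot{v} = \varepsilon(u - av + b)$ (our reading of equation (C3.6.10)) with $b = 0$ and $\varepsilon, a > 0$. The symbols $J$ for the Jacobian at the origin, $\sigma_\pm$ for its eigenvalues and $\omega_H$ for the frequency at the bifurcation are notation introduced here for illustration.

```latex
% Jacobian of the assumed kinetics evaluated at (u*, v*) = (0, 0)
\[
J = \begin{pmatrix} 1 & -1 \\ \varepsilon & -\varepsilon a \end{pmatrix},
\qquad
\mathrm{Tr}\, J = 1 - \varepsilon a,
\qquad
\det J = \varepsilon (1 - a).
\]
% The origin is linearly stable when Tr J < 0 and det J > 0
\[
\mathrm{Tr}\, J < 0 \iff \varepsilon > a^{-1},
\qquad
\det J > 0 \iff a < 1 .
\]
% At eps = eps_H = 1/a the trace vanishes and the eigenvalues are purely imaginary
\[
\sigma_{\pm} = \pm i\,\omega_H,
\qquad
\omega_H = \sqrt{\det J}\,\Big|_{\varepsilon = a^{-1}} = \sqrt{\frac{1-a}{a}} .
\]
```

A pair of complex-conjugate eigenvalues crossing the imaginary axis with non-zero frequency is the linear signature of a Hopf bifurcation.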


Consider the analogue of such a bifurcation in a spatially distributed system and imagine tuning a bifurcation
parameter $\mu$ (in the parameter set $\boldsymbol{\mu}$) in (C3.6.9) through such a bifurcation point, $\mu_H$, and let $\lambda = |\mu - \mu_H|^{1/2}$ gauge
the distance from the bifurcation point. One may then expand the local concentration about the steady state c* as
c(r,t) = c* + A(r,t)e + c.c., where A(r,t) is a complex amplitude and e is an eigenvector of the linearized reaction-
diffusion problem. Then, in the vicinity of the Hopf bifurcation point, it is possible to transform the reaction-
diffusion equation into a universal equation for the complex amplitude A(r,t) [45]:

$$\frac{\partial A}{\partial t} = A - (1 + i\beta)|A|^{2}A + (1 + i\alpha)\nabla^{2}A. \qquad \text{(C3.6.11)}$$


This complex Ginzburg-Landau equation describes the space and time variations of the amplitude A on long
distance and time scales determined by the parameter distance from the Hopf bifurcation point. The parameters $\alpha$
and $\beta$ can be determined from a knowledge of the parameter set $\boldsymbol{\mu}$ and the diffusion coefficients of the reaction-
diffusion equation. For example, for the FitzHugh-Nagumo equation we have $\alpha = (D_v - D_u)/[\omega_0(D_u + D_v)]$ and
$\beta = -1/\omega_0$, where $D_u$ and $D_v$ are the diffusion coefficients of $u$ and $v$. The Ginzburg-Landau equation parameters
may also be extracted from the experimental data and this has been done for the BZ reaction [46]. Through such an
analysis one can study general features of oscillatory media, independent of specific features of the reaction kinetics.

The complex Ginzburg-Landau equation also supports spiral wave solutions [47]. The core of a spiral wave is a 
point topological defect where the complex amplitude A vanishes [48]. In certain parameter regions, one finds a 
type of spatio-temporal chaos termed defect-mediated turbulence where the average number of topological defects 
is stationary but their instantaneous number fluctuates: defects of opposite topological charge may collide and 
annihilate or defects may nucleate in pairs as a result of 'pinching' of wave fronts [49]. Figure C3.6.10 shows the
system in the defect-mediated turbulence regime and illustrates the distribution of spiral defects in the turbulent
dynamics described above. Such defect-mediated turbulence has been observed in experiments on the BZ reaction
[50]. Figure C3.6.11(a) shows the chemical pattern near the onset of the instability giving rise to spatio-temporal
turbulence. Note that small well defined spirals can still be seen embedded in a sea of turbulent dynamics while in
figure C3.6.11(b), well beyond the instability, one sees fully developed turbulence.

Figure C3.6.10 Defect-mediated turbulence in the complex Ginzburg-Landau equation. (a) The phase, arg(A), as
grey shades. (b) The amplitude |A|, with a similar colour coding. In the left panel topological defects can be
identified as points around which one finds all shades of grey. Note the apparently random spatial pattern of
amplitudes.



Figure C3.6.11 Defect-mediated turbulence in the BZ reaction. (a) Spatial structure close to the instability. (b)
Fully developed spatio-temporal turbulence. The control parameter is the concentration of H$_2$SO$_4$ in the feed
reactor. Reproduced by permission from Ouyang and Flesselles [50].


The local dynamics of the systems considered thus far has been either steady or oscillatory. However, we may 
consider reaction-diffusion media where the local reaction rates give rise to chaotic temporal behaviour of the sort 
discussed earlier. Diffusional coupling of such local chaotic elements can lead to new types of spatio-temporal 
periodic and chaotic states. It is possible to find phase-synchronized states in such systems where the amplitude
varies chaotically from site to site in the medium whilst a suitably defined phase is synchronized throughout the
medium [51]. Such phase synchronization may play a role in layered neural networks and perceptive processes in
mammals. Somewhat surprisingly, even when the local dynamics is chaotic, the system may support spiral waves
[52, 51 and 54]. The origin of such spiral waves in chaotic media can again be traced to the phenomenon of phase
synchronization. The notion of a defect at the core of the spiral remains valid even for these chaotic media, so the
phase-coherent dynamics necessary for the existence of a spiral wave survives the amplitude turbulence. New
phenomena can arise: in addition to point topological defects one can find synchronization line defects whose
dynamics may be chaotic. Such synchronization line-defect dynamics has been observed in the BZ reaction
medium [54, 55 and 56].

C3.6.5.3 TURING PATTERNS 

If the diffusion coefficients of the chemical species are sufficiently different, new types of chemical instability arise 
which can lead to the formation of chemical patterns and ultimately to spatio-temporal chaotic behaviour. 

One of the best known such instabilities is the Turing bifurcation proposed in 1952 as a possible mechanism for 
morphogenesis [57]. While the relevance of this type of pattern-forming instability for biological systems is still a 
matter of debate, such Turing patterns have been observed in laboratory chemical experiments [58, 59 and 60]. A 
Turing bifurcation involves the destabilization of a homogeneous steady state to form an inhomogeneous state or 
chemical pattern whose wavelength depends on the kinetic parameters and diffusion coefficients of the system. 
Turing bifurcations are often discussed in terms of activator-inhibitor kinetics like that of the FitzHugh-Nagumo 
equation above [61]. Consider two chemical species, $X_1$ and $X_2$, with concentration vector, $\mathbf{c} = (c_1, c_2)$, that satisfies
a two-variable reaction-diffusion equation where, as usual, R(c) describes the kinetics and D is a diagonal
diffusion-coefficient matrix with elements $D_1$ and $D_2$. We suppose the system possesses a homogeneous stable
steady state, c*, obtained from the solution of R(c*) = 0. To determine the conditions for a Turing bifurcation to
occur we consider the response of this homogeneous steady state to inhomogeneous perturbations, c(r,t) = c* +
$\delta$c(r,t). We may linearize the reaction-diffusion equation to obtain

$$\frac{\partial\,\delta\mathbf{c}(\mathbf{r},t)}{\partial t} = \mathbf{A}\,\delta\mathbf{c}(\mathbf{r},t) + \mathbf{D}\nabla^{2}\delta\mathbf{c}(\mathbf{r},t) \qquad \text{(C3.6.12)}$$

where $\mathbf{A} = (\partial\mathbf{R}/\partial\mathbf{c})_{\mathbf{c}=\mathbf{c}^*}$ is the matrix that specifies the chemical rate evolution about the steady state c*. To
determine the stability of the steady state it is useful to examine the behaviour of the Fourier components of the
concentration field, $\delta\mathbf{c}_{\mathbf{k}}(t)$, which satisfy the Fourier transform of equation (C3.6.12):

$$\frac{d\,\delta\mathbf{c}_{\mathbf{k}}(t)}{dt} = (\mathbf{A} - k^{2}\mathbf{D})\,\delta\mathbf{c}_{\mathbf{k}}(t) \equiv \mathbf{B}\,\delta\mathbf{c}_{\mathbf{k}}(t). \qquad \text{(C3.6.13)}$$


Now we may state the well known conditions for a Turing bifurcation. If $A_{11} > 0$ and $A_{22} < 0$ we say species $X_1$ is
the activator and species $X_2$ is the inhibitor. Then, for a Turing bifurcation to occur we must have det B = 0, Tr B < 0
and $A_{11}D_2 + A_{22}D_1 > 0$. The (unique) wavenumber at the bifurcation is

$$k_c = \left(\frac{\det\mathbf{A}}{D_1 D_2}\right)^{1/4}. \qquad \text{(C3.6.14)}$$

Furthermore, since the bifurcation must occur from a stable homogeneous steady state we must have $D_1/D_2 < 1$, i.e.
the diffusion coefficient of the inhibitor is greater than that of the activator. The critical diffusion ratio at the
bifurcation is

$$\left(\frac{D_2}{D_1}\right)_c = A_{11}^{-2}\left(\det\mathbf{A} - A_{12}A_{21} + 2(-A_{12}A_{21}\det\mathbf{A})^{1/2}\right). \qquad \text{(C3.6.15)}$$

Consequently, when $D_2/D_1$ exceeds the critical value, close to the bifurcation one expects to see the appearance of
chemical patterns with characteristic length $\ell = 2\pi/k_c$. Beyond the bifurcation point a band of wave numbers is
unstable and the nature of the pattern selected (spots, stripes, etc.) depends on the nonlinearity and requires a more
detailed analysis. Chemical Turing patterns were observed in the chlorite-iodide-malonic acid (CIMA) system in a
gel reactor [58, 59 and 60]. Figure C3.6.12(a) shows an experimental CIMA Turing spot pattern [59].
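The Turing conditions can be checked numerically for a given linearization. The Python sketch below assumes a 2x2 rate matrix A with $A_{11} > 0$ (activator), $A_{22} < 0$ (inhibitor) and $A_{12}A_{21} < 0$, and uses the fact that det B, viewed as a function of $k^2$, is the quadratic det A - k^2 (A_11 D_2 + A_22 D_1) + k^4 D_1 D_2. The function name, the dictionary keys and the example matrix are hypothetical choices made for illustration only.

```python
import math


def turing_analysis(A, D1, D2):
    """Evaluate the Turing conditions for a 2x2 rate matrix A.

    D1 is the diffusion coefficient of the activator, D2 that of the inhibitor.
    """
    (a11, a12), (a21, a22) = A
    det_a = a11 * a22 - a12 * a21
    tr_a = a11 + a22
    # Critical ratio (D2/D1)_c at which the minimum of det B first reaches zero.
    ratio_c = (det_a - a12 * a21 + 2.0 * math.sqrt(-a12 * a21 * det_a)) / a11**2
    # det B(k) = det A - k^2 (a11 D2 + a22 D1) + k^4 D1 D2 is minimal at k2_min.
    k2_min = (a11 * D2 + a22 * D1) / (2.0 * D1 * D2)
    det_b_min = det_a - (a11 * D2 + a22 * D1) ** 2 / (4.0 * D1 * D2)
    return {
        "stable_without_diffusion": tr_a < 0 and det_a > 0,
        "ratio_c": ratio_c,
        "k_c": (det_a / (D1 * D2)) ** 0.25,   # meaningful at the bifurcation
        "k2_min": k2_min,
        "det_b_min": det_b_min,
        "turing_unstable": k2_min > 0 and det_b_min < 0,
    }


if __name__ == "__main__":
    A = ((1.0, -2.0), (3.0, -4.0))          # illustrative activator-inhibitor matrix
    at_onset = turing_analysis(A, 1.0, turing_analysis(A, 1.0, 1.0)["ratio_c"])
    print(at_onset)
```

Evaluating the function with $D_2/D_1$ set equal to the returned critical ratio gives a vanishing minimum of det B at $k^2 = k_c^2$; a larger ratio makes the minimum negative, which opens the band of unstable wave numbers mentioned above.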



Figure C3.6.12 (a) Turing spot pattern in the CIMA reaction. (b) Spatio-temporal turbulence near the Turing
bifurcation. Reproduced by permission from Ouyang and Swinney [59].

The Turing mechanism requires that the diffusion coefficients of the activator and inhibitor be sufficiently 
different; but the diffusion coefficients of small molecules in solution differ very little. The chemical Turing 
patterns seen in the CIMA reaction used starch as an indicator for iodine. The starch indicator complexes with 
iodide which is the activator species in the reaction. As a result, the complexing reaction with the immobilized 
starch molecules must be accounted for in the mechanism and leads to the possibility of Turing pattern formation 
even if the diffusion coefficients of the activator and inhibitor species are the same [62].

One may also observe a transition to a type of defect-mediated turbulence in this Turing system (see figure
C3.6.12(b)). Here the defects divide the system into domains of spots and stripes. The defects move erratically and lead to a
turbulent state characterized by exponential decay of correlations [59]. Turing bifurcations can interact with the 
Hopf bifurcations discussed above to give rise to very complicated spatio-temporal patterns [63, 64].

C3.6.5.4 CHEMICAL FRONT INSTABILITIES 

Another class of instabilities that are driven by differences in the diffusion coefficients of the chemical species 
determines the shapes of propagating chemical wave and flame fronts [65, 66].

As an example of chemical front instability consider a simple cubic autocatalytic reaction, A + 2B → 3B, occurring
in a two-dimensional geometry where the 'fuel' A occupies the right-hand region and the autocatalyst B occupies the
left-hand region [67]. We suppose the reaction occurs under isothermal conditions which can be achieved for
condensed phase reactions. The species B will consume the fuel A and the chemical front that separates the A and B
species will move to the right. (For flame fronts one must generally couple the reaction kinetics to the variations in
the temperature of the system.)


If the diffusion coefficient of species A is less than that of B ($D_A < D_B$) the propagating front will be planar.
However, if $D_A$ is sufficiently greater than $D_B$, the planar front will become unstable to transverse perturbations and
chaotic front motion will ensue. To understand the origin of the planar front destabilization
consider the following: suppose the interface is slightly non-planar. We would like to know if the dynamics will
tend to eliminate this non-planarity or accentuate it. Let $D_B \gg D_A$. The situation is depicted schematically in figure
C3.6.13 where large diffusion fluxes are indicated by $\longrightarrow$ and smaller diffusion fluxes by $\rightarrow$. For the part of the B
front that protrudes into the A region, fast diffusion of B leads to dispersal of B and suppresses the autocatalytic
reaction that requires two molecules of B. The front will have difficulty advancing here. In the region where A
protrudes into B, A will react leading to advancement of the front. The net effect is to remove any initial non-
planarity and give rise to a planar front.




Figure C3.6.13 Schematic illustration of how the front instability arises for the case (a) $D_B \gg D_A$ and (b) $D_B \ll D_A$.

If $D_B \ll D_A$, in regions where B protrudes into A, rapid A diffusion will lead to conversion of A to B leading to front
advance. In regions where A protrudes into B, small diffusion of B into the A region does not favour the
autocatalytic conversion so the front will not advance rapidly here. Consequently, any small non-planarity will
grow to make the front even more non-planar. Therefore, we expect that for some ratio of diffusion coefficients,
$d = D_A/D_B > 1$, the planar front will lose its stability [68]. An example of the front dynamics for $d = 5$ is shown in
figure C3.6.14 where the minima in the front profile are plotted versus time. The resulting space-time plot shows 
the chaotic nature of the front dynamics. The (black) minima act like 'particles' in the system: they move and when 
they collide they coalesce to form a single minimum. If the distance between two minima is too large, a new 
minimum is formed. Thus, the average density of 'particles' per unit length of the interface remains constant but 
the instantaneous number of 'particles' fluctuates due to the creation and annihilation events for the minima. 





Figure C3.6.14 Space-time (y,t) plot of the minima (black) in the cubic autocatalysis front $\phi(y,t)$ in equation
(C3.6.16) showing the nature of the spatio-temporal chaos.

In order to investigate such front instabilities quantitatively one may derive an equation for the profile $\phi(y,t)$ of the
front directly from the reaction-diffusion equation. This Kuramoto-Sivashinsky equation [69]




(C3.6.16) 


describes a number of general features of such front dynamics. The parameters v, u and k may be related to the 
parameters in the original reaction-diffusion equation. The nonlinear term accounts for the fact that the velocity of 
the front depends on its curvature, while the gradient terms arise from diffusive effects. The coefficient of the 
fourth-order gradient is positive while the sign of v depends on the diffusion coefficient ratio d: for d > d_c, where
d_c is a critical value of d, v changes from being positive to negative. This negative value of the 'diffusion
coefficient' leads to an instability whose growth is controlled by the stabilizing fourth-order term. Instead of 
studying the full reaction-diffusion equation we may now explore the front dynamics directly through equation 
(C3.6.16). This equation yields front dynamics like that described above. 

In addition to flame fronts, which have been extensively studied experimentally, front instabilities have been
investigated for the isothermal cubic autocatalytic iodate-arsenous acid system [70] as well as for polymerization
reactions where thermal and hydrodynamic effects lead to complicated front patterns [71]. Front instabilities also
play a role in determining the labyrinth patterns seen in recent chemical experiments [72].


C3.6.6 CONCLUSION 

Our understanding of the development of oscillations, multi-stability and chaos in well stirred chemical systems 
and pattern formation in spatially distributed systems has increased significantly since the early observations of 
these phenomena. Most of this development has taken place relatively recently, largely driven by development of 
experimental probes of the dynamics of such systems. In spite of this progress our knowledge of these systems is 
still rather limited, especially for spatially distributed systems. 


Several important topics have been omitted in this survey. We have described only a few of the routes by which 
chaos can arise in chemical systems and have made no attempt to describe in detail the features of the different 
kinds of chemical strange attractor seen in experiments. A wide variety of chemical patterns have been observed
and while many aspects of the mechanisms for their appearance are understood, some features like nonlinear
pattern selection still present challenges and new patterns continue to be discovered. A ubiquitous class of
chemical patterns not discussed here comprises those that arise from diffusion-limited aggregation (DLA) [73].
Such DLA clusters are seen in many contexts, including electrochemical deposition processes, and are often
analysed using the concepts of fractal geometry [10] and wavelets [74, 75]. Also, methods for controlling chemical
chaos [76] have not been discussed in this chapter although they have potential applications for both industrial
processes and biological systems.

In spite of these limitations it is hoped that this chapter will provide an introduction to the unusual phenomena that 
chemically reacting systems exhibit when driven far from equilibrium and an indication of how these phenomena 
may be analysed. Although such systems were often regarded as curiosities in the past, it is now clear that they are 
the rule rather than the exception in nature and deserve our full attention. 